GeoCommunity Mailing List
 
Mailing List Archives

Subject: Re: [gislist] Address Parsing for Standardization and Geocoding
Date:  03/25/2005 09:35:01 AM
From:  Bill Thoen



On Thu, 24 Mar 2005, Sonny Parafina wrote:

> You might want to look at Schuyler Erle's geocoder.us
>
> http://geocoder.us/
>
> Source is available and it's written in Perl. It's pretty nifty, but it
> doesn't handle misspellings; you would probably need Soundex to handle
> that.

Thanks! I had forgotten about that. But then I looked at the source code,
and realized that it's a bit beyond me. I think what I want to know is
right here in US.pm:

our %Addr_Match = (
    type    => join("|", keys %Geo::Coder::US::Codes::_Street_Type_List),
    number  => qr/\d+-?\d*/,
    state   => join("|", %Geo::Coder::US::Codes::State_Code),
    direct  => join("|", %Geo::Coder::US::Codes::Directional),
    dircode => join("|", keys %Geo::Coder::US::Codes::Direction_Code),
    zip     => qr/\d{5}(?:-\d{4})?/,
    corner  => qr/(?:and|at|&|\@)/i,
    unit    => qr/(?:pmb|ste|suite|dept|apt|room)\W+\w+/i,
);

{
    use re 'eval';
    $Addr_Match{street} = qr/
        (?:($Addr_Match{direct})\W+ (?{ $_{prefix} = $^N }))?
        (?:
            ([^,]+) (?{ $_{street} = $^N })
            (?:[^\w,]+($Addr_Match{type}) (?{ $_{type} = $^N }))
            (?:[^\w,]+($Addr_Match{direct}) (?{ $_{suffix} = $^N }))?
etc....

But I don't understand this. I've also received another response
suggesting that regular expressions are the way to go. Unless someone can
explain (in English) what's going on in the regex logic above, I might just
have to read the O'Reilly Camel book!

- Bill Thoen
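
In plain English, the quoted pattern reads an address from left to right: an optional direction word, then the street name as "anything up to a comma", then a street type taken from a known list, then an optional trailing direction. Each `(?{ $_{...} = $^N })` piece is embedded Perl code that stores the text of the group that just closed under a field name. The sketch below mirrors that idea in Python with named groups rather than Perl. It is a simplified illustration, not the module's actual logic: the `DIRECTIONS` and `STREET_TYPES` lists are short hypothetical stand-ins for the tables in Geo::Coder::US::Codes, and the house number and unit are folded into the same pattern for convenience.

```python
import re

# Hypothetical, deliberately short lists; the real module uses larger tables.
DIRECTIONS = ["N", "S", "E", "W", "NE", "NW", "SE", "SW",
              "North", "South", "East", "West"]
STREET_TYPES = ["St", "Street", "Rd", "Road", "Ave", "Avenue",
                "Blvd", "Dr", "Ln", "Ct"]


def alternation(words):
    """Build 'a|b|c', longest first so 'NE' is tried before 'N'."""
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


DIRECT = alternation(DIRECTIONS)
TYPE = alternation(STREET_TYPES)

ADDRESS_RE = re.compile(
    rf"""
    ^\s*
    (?P<number>\d+-?\d*)\W+               # house number: 123 or 123-4
    (?:(?P<prefix>{DIRECT})\W+)?          # optional leading direction
    (?P<street>[^,]+?)                    # street name: anything but a comma
    [^\w,]+(?P<type>{TYPE})\b             # street type from the known list
    (?:[^\w,]+(?P<suffix>{DIRECT})\b)?    # optional trailing direction
    (?:\W+(?P<unit>(?:pmb|ste|suite|dept|apt|room)\W+\w+))?  # optional unit
    \W*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_address(text):
    """Return a dict of address fields, or None if the text does not fit."""
    match = ADDRESS_RE.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for sample in ("123 N Woody Creek Rd", "456 E Court St SW", "12 Main St Apt 4"):
        print(sample, "->", parse_address(sample))
```

Because the street type must come from the list, the pattern can tell where the street name ends even when the name has several words. Addresses with no street type at all will not match this sketch and would need a separate fallback rule.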


>
> sonny
>
> -----Original Message-----
> From: gislist-bounces@lists.thinkburst.com
> [mailto:gislist-bounces@lists.thinkburst.com]On Behalf Of Bill Thoen
> Sent: Thursday, March 24, 2005 9:23 PM
> To: gislist@lists.thinkburst.com
> Subject: [gislist] Address Parsing for Standardization and Geocoding
>
>
> I'm looking for advice and algorithms for splitting US addresses into
> street number, prefix direction, street name, street type, suffix
> direction and unit. The problem I have is that the addresses I'm working
> with have all these logical fields combined into one physical field and
> the elements are not standardized. For example, the information in the
> street field may vary a lot. There may or may not be direction information
> or even street types. You can't be sure that the second word represents
> the prefix direction, and it's really hard to tell which word is the last
> one of the street name and whether the next word is the street type,
> suffix direction or unit. Also some street names are spelled differently,
> like "Woody Creek Rd" and "Woody Crk Rd."
>
> Any suggestions on how to approach this problem? I'm currently working on
> this in an Access database, and I can handle SQL and VBA programming
> without too much difficulty. I'm just wondering how big a problem this is,
> and how to break it down into smaller problems.
>
> I did find some general articles on street elements via Google, and I know
> where to get the USPS abbreviations for street types and directions, but I
> haven't found any technical details on how to parse and standardize
> addresses, so before I try to start from scratch, I thought I'd ask and
> see what ideas and pointers others might have.
>
> - Bill Thoen
>
>
> _______________________________________________
> gislist mailing list
> gislist@lists.geocomm.com
> http://lists.geocomm.com/mailman/listinfo/gislist
>
> _________________________________
> This list is brought to you by
> The GeoCommunity
> http://www.geocomm.com/
>
> Get Access to the latest GIS & Geospatial Industry RFPs and bids
> http://www.geobids.com
>
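
For the standardization half of the original question (spellings such as "Woody Creek Rd" versus "Woody Crk Rd."), one possible approach is to reduce each street string to a comparison key by upper-casing it, dropping punctuation, and replacing known long forms with abbreviations. Soundex, mentioned earlier in the thread, can then catch some misspellings the table misses. The sketch below assumes a tiny hypothetical `ABBREVIATIONS` table (a real one would be built from the USPS lists), and `match_key` and `soundex` are illustrative helper names, not part of any library discussed here.

```python
import re

# Hypothetical sample table; a real one would come from the USPS lists.
ABBREVIATIONS = {
    "CREEK": "CRK", "ROAD": "RD", "STREET": "ST", "AVENUE": "AVE",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}


def match_key(street):
    """Upper-case, strip punctuation, and abbreviate known words."""
    words = re.findall(r"[A-Za-z0-9]+", street.upper())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


# Letter-to-digit table for American Soundex.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


if __name__ == "__main__":
    print(match_key("Woody Creek Rd"), "|", match_key("Woody Crk Rd."))
    print(soundex("Woody"), soundex("Wody"))
```

Comparing keys first and falling back to Soundex only for the street-name words keeps exact matches cheap. Soundex is coarse, so any match it produces is best treated as a candidate for review rather than a confirmed match.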



Copyright© 1995-2012 MindSites Group / Privacy Policy

GeoCommunity™, Wireless Developer Network™, GIS Data Depot®, and Spatial News™
including all logos and other service marks
are registered trademarks and trade communities of
MindSites Group