Assessing the Certainty of Locations Produced by an Address Geocoding System
Addresses are the most common georeferencing resource people use to communicate a location within a city to others. Urban GIS applications that receive data directly from citizens, or from legacy information systems, need to obtain a spatial location from addresses quickly and efficiently. In this paper we take a broader view of addresses, one that considers not only the conventional elements of postal addresses but also other direct or indirect references to places, such as building names, postal codes, or telephone area codes, which are also valuable as locators of urban places. This broader view allows us to work from two perspectives. The first concerns the ontological definition, modeling, and implementation of an addressing database that is flexible enough to accommodate the variety of concepts and address formats used worldwide, along with direct and indirect references to places. The second concerns the definition of an indicator that quantifies the degree of certainty reached when a user-given, semi-structured address is geocoded into a spatial position, as a function of the type and completeness of the available addressing data and of the geocoding method employed. This indicator, which we call the Geocoding Certainty Indicator (GCI), can be used as a threshold, below which a geocoded event should be left out of any statistical analysis, or as a weight that allows spatial analysis methods to reduce the influence of events that have been located less reliably. To support geocoding activities and the determination of the GCI, we propose a conceptual schema for addressing databases. The schema is flexible enough to accommodate a variety of addressing systems, at various levels of detail, and in different countries. Our intention is to depart from the usual geocoding strategy employed in commercial GIS products, which is usually limited to the average American or British address format.
The schema also extends the notion of postal address to something broader, including popular names for places, building names, reference places, and other concepts. This approach extends the work of Simpson and Yu (Comput. Environ. Urban Syst., 27: 283–307, 2003) on postal codes to records of any kind, including place names and loosely formatted addresses.
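The two uses of the GCI mentioned in the abstract, as a cut-off and as a weight, can be illustrated with a minimal Python sketch. It does not reproduce how the paper computes the indicator. It assumes the GCI is already attached to each geocoded event and lies in the range 0 to 1, with higher values meaning a more reliable location. The names `GeocodedEvent`, `filter_by_threshold`, and `weighted_mean_center`, as well as the sample coordinates and GCI values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeocodedEvent:
    x: float
    y: float
    gci: float  # assumption: certainty in [0, 1], higher means more reliable


def filter_by_threshold(events, min_gci):
    """Use of the GCI as a threshold: keep events with gci >= min_gci."""
    return [e for e in events if e.gci >= min_gci]


def weighted_mean_center(events):
    """Use of the GCI as a weight: mean center with each event weighted by its gci."""
    total = sum(e.gci for e in events)
    if total == 0:
        raise ValueError("all events have zero certainty")
    cx = sum(e.x * e.gci for e in events) / total
    cy = sum(e.y * e.gci for e in events) / total
    return cx, cy


# Hypothetical sample data, not taken from the paper.
events = [
    GeocodedEvent(0.0, 0.0, 1.0),      # located at an exact address
    GeocodedEvent(10.0, 0.0, 0.5),     # located only approximately
    GeocodedEvent(100.0, 100.0, 0.0),  # no usable certainty
]

reliable = filter_by_threshold(events, min_gci=0.5)
center = weighted_mean_center(events)
```

With a threshold, an unreliable event is either kept or discarded outright. With a weight, it stays in the analysis but pulls the result less, which may be preferable when discarding events would leave too few observations.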
Keywords: address geocoding, geographic information systems, spatial databases, certainty assessment, postal addresses
Frederico Fonseca’s work was partially supported by the National Science Foundation under NSF ITR grant number 0219025 and by the generous support of Penn State’s College of Information Sciences and Technology. Clodoveu Davis’s work is partially supported by CNPq, the Brazilian governmental agency in charge of fostering scientific and technological development. His work in this paper is related to projects ChegoLá (FAPEMIG EDT 1461/03), Saudavel (CNPq grant number 552044/2002-4), and EndFlex (CNPq grant number 502853/2004-2). The authors thank PRODABEL, the information technology company for the city of Belo Horizonte, for providing data used in the development and testing of the software described in the paper. The authors also wish to thank Max Egenhofer for his comments and suggestions on an early draft of this paper.