Volume 11, Issue 1, pp 103–129

Assessing the Certainty of Locations Produced by an Address Geocoding System

  • Clodoveu A. Davis Jr.
  • Frederico T. Fonseca


Addresses are the most common georeferencing resource people use to communicate a location within a city. Urban GIS applications that receive data directly from citizens, or from legacy information systems, need to obtain a spatial location from addresses quickly and efficiently. In this paper we take a broader view of addresses, considering not only the conventional elements of postal addresses but also other direct or indirect references to places, such as building names, postal codes, or telephone area codes, which are also valuable as locators of urban places. This broader view allows us to work from two perspectives. The first is the ontological definition, modeling, and implementation of an addressing database that is flexible enough to accommodate the variety of concepts and address formats used worldwide, along with direct and indirect references to places. The second is the definition of an indicator that quantifies the degree of certainty reached when a user-given, semi-structured address is geocoded into a spatial position, as a function of the type and completeness of the available addressing data and of the geocoding method employed. This indicator, which we call the Geocoding Certainty Indicator (GCI), can be used as a threshold, beyond which the geocoded event should be left out of any statistical analysis, or as a weight that allows spatial analysis methods to reduce the influence of events that have been less reliably located. To support geocoding activities and the determination of the GCI, we propose a conceptual schema for addressing databases. The schema is flexible enough to accommodate a variety of addressing systems, at various levels of detail, and in different countries. Our intention is to depart from the geocoding strategy employed in commercial GIS products, which is usually limited to the average American or British address format. The schema also extends the notion of postal address to something broader, including popular names for places, building names, reference places, and other concepts. This approach extends the work of Simpson and Yu (Comput. Environ. Urban Syst., 27: 283–307, 2003) on postal codes to records of any kind, including place names and loosely formatted addresses.
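The two uses of the GCI mentioned in the abstract, as a cutoff and as a weight, can be illustrated with a minimal Python sketch. The abstract does not give the GCI formula or its scale, so this sketch assumes a value between 0.0 and 1.0 where higher means more certain. The names `GeocodedEvent`, `filter_by_threshold`, and `weighted_mean_center` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GeocodedEvent:
    """A located event; all field names are illustrative assumptions."""
    x: float
    y: float
    gci: float  # assumed scale: 0.0 (no certainty) to 1.0 (full certainty)


def filter_by_threshold(events, min_gci):
    """Use the GCI as a cutoff: keep events whose certainty reaches min_gci."""
    return [e for e in events if e.gci >= min_gci]


def weighted_mean_center(events):
    """Use the GCI as a weight: each event pulls the mean center
    in proportion to how reliably it was located."""
    total = sum(e.gci for e in events)
    if total == 0:
        raise ValueError("all events have zero certainty")
    cx = sum(e.x * e.gci for e in events) / total
    cy = sum(e.y * e.gci for e in events) / total
    return cx, cy


# Made-up sample data, for illustration only.
events = [
    GeocodedEvent(0.0, 0.0, 1.0),
    GeocodedEvent(10.0, 0.0, 0.5),
    GeocodedEvent(100.0, 100.0, 0.1),
]
kept = filter_by_threshold(events, 0.4)
center = weighted_mean_center(kept)
```

The weighted mean center is only one example of a spatial statistic that can accept per-event weights; the same weights could in principle be passed to other analysis methods that support them.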


Keywords: address geocoding; geographic information systems; spatial databases; certainty assessment; postal addresses



Frederico Fonseca’s work was partially supported by the National Science Foundation under NSF ITR grant number 0219025 and by the generous support of Penn State’s College of Information Sciences and Technology. Clodoveu Davis’s work is partially supported by CNPq, the Brazilian governmental agency in charge of fostering scientific and technological development. His work in this paper is related to projects ChegoLá (FAPEMIG EDT 1461/03), Saudavel (CNPq grant number 552044/2002-4), and EndFlex (CNPq grant number 502853/2004-2). The authors also thank PRODABEL, the information technology company for the city of Belo Horizonte, for providing data used in the development and testing of the software described in the paper, and Max Egenhofer for his comments and suggestions on an early draft of this paper.


  1. W. Aref and H. Samet. “Optimization strategies for spatial query processing,” in 17th International Conference on Very Large Data Bases, Barcelona, Spain, 1991.
  2. K.A.V. Borges, A.H.F. Laender, C.B. Medeiros, A.S. Silva, and C.A. Davis Jr. “The web as a data source for spatial databases,” in V Brazilian Symposium on GeoInformatics (GeoInfo 2003), Campos do Jordão (SP), 2003.
  3. K.A.V. Borges, C.A. Davis Jr., and A.H.F. Laender. “OMT-G: an object-oriented data model for geographic applications,” GeoInformatica, Vol. 5(3):221–260, 2001.
  4. Britannica Student Encyclopaedia. Seoul. Encyclopaedia Britannica Online Volume, 2006.
  5. C.A. Davis Jr. “Address base creation using raster–vector integration,” in URISA 1993 Annual Conference, URISA: Atlanta, Georgia, 1993.
  6. C.A. Davis Jr., F. Fonseca, and K.A.V. Borges. “A flexible addressing system for approximate geocoding,” in V Brazilian Symposium on GeoInformatics (GeoInfo 2003), Campos do Jordão (SP), 2003.
  7. G. Derekenaris, J. Garofalakis, C. Makris, J. Prentzas, S. Sioutas, and A. Tsakalidis. “Integrating GIS, GPS and GSM technologies for the effective management of ambulances,” Computers, Environment and Urban Systems, Vol. 25(3):267–278, 2001.
  8. M. Duckham, K. Mason, J. Stell, and M. Worboys. “A formal approach to imperfection in geographic information,” Computers, Environment and Urban Systems, Vol. 25(1):89–103, 2001.
  9. P. Eichelberger. “The importance of addresses—the locus of GIS,” in URISA 1993 Annual Conference, URISA: Atlanta, Georgia, 1993.
  10. Federal Geographic Data Committee. Draft Proposal for a National Spatial Data Infrastructure Standards Project—Address Content Standard, FGDC, 2003.
  11. M. Goodchild. “GIS and transportation: status and challenges,” GeoInformatica, Vol. 4(2):127–139, 2000.
  12. L.L. Hill. “Core elements of digital gazetteers: placenames, categories, and footprints,” in 4th European Conference on Research and Advanced Technology for Digital Libraries, 2000.
  13. K. Hiramatsu and T. Ishida. “An augmented web space for digital cities,” in IEEE/IPSJ Symposium on Applications and the Internet (SAINT—01), 2001.
  14. C.B. Jones, R. Purves, A. Ruas, M. Sanderson, M. Sester, M. Van Kreveld, and R. Weibel. “Spatial information retrieval and geographical ontologies: an overview of the SPIRIT project,” in The 25th Annual International ACM SIGIR Conference on Research and Development on Information Retrieval (SIGIR 2002), Tampere, Finland, 2002.
  15. P. Longley, M. Goodchild, D. Maguire, and D. Rhind (Eds.), Geographical Information Systems. Wiley: New York, 1999.
  16. K.S. McCurley. “Geospatial mapping and navigation on the web,” in Tenth International World Wide Web Conference (WWW10), ACM: Hong Kong, 2001.
  17. A.M.V. Monteiro, M.S. Carvalho, R. Assunção, W. Vieira, P.J. Ribeiro, C.A. Davis Jr., and L. Regis. SAUDAVEL: Bridging the Gap between Research and Services in Public Health Operational Programs by Multi-institutional Networking Development and Use of Spatial Information Technology Innovative Tools. Instituto Nacional de Pesquisas Espaciais: São José dos Campos (SP), Brazil, 2004; Available from:
  18. M. Morad. “British Standard 7666 as a framework for geocoding land and property information the UK,” Computers, Environment and Urban Systems, Vol. 26(5):483–492, 2002.
  19. G. Navarro. “A guided tour to approximate string matching,” ACM Computing Surveys, Vol. 33(1):31–88, 2001.
  20. G.R. Rhind. Global Sourcebook of Address Data Management: A Guide to Address Formats and Data in 194 Countries. Gower: Aldershot, 615, 1999.
  21. T.B. Richards, C.M. Croner, G. Rushton, C.K. Brown, and L. Fowler. “Geographic information systems and public health: mapping the future,” Public Health Reports, Vol. 114(4):359–373, 1999.
  22. RuaVista Magazine. The Numbering System of Buildings. [Web site] 2003 [cited 2003 05 may 2003]; Available from:
  23. G. Rushton, G. Elmes, and R. McMaster. “Considerations for improving geographic information system research in public health,” URISA Journal, Vol. 12(2):31–49, 2000.
  24. P. Scarponcini. “Generalized model for linear referencing in transportation,” GeoInformatica, Vol. 6(1):35–55, 2002.
  25. L. Simpson and A. Yu. “Public access to conversion of data between geographies, with multiple look up tables derived from a postal directory,” Computers, Environment and Urban Systems, Vol. 27(3):283–307, 2003.
  26. L.A. Souza, C.A. Davis Jr., K.A.V. Borges, T.M. Delboni, and A.H.F. Laender. “The role of gazetteers in geographic knowledge discovery on the web,” in 3rd Latin American Web Congress, Buenos Aires, Argentina, 2005.
  27. U.S. Census Bureau. 108th CD Census 2000 TIGER/Line Files Technical Documentation. 2003 March 2003 [cited; 321]. Available from:
  28. S. Wu and U. Manber. “Fast text searching allowing errors,” Communications of the ACM, Vol. 35(10):83–91, 1992.
  29. D.H. Yang, L.M. Bilaver, O. Hayes, and R. Goerge. “Improving geocoding practices: evaluation of geocoding tools,” Journal of Medical Systems, Vol. 28(4):361–370, 2004.
  30. J. Zobel and P. Dart. “Finding approximate matches in large lexicons,” Software—Practice and Experience, Vol. 25(3):331–345, 1995.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Instituto de Informática, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil
  2. College of Information Sciences and Technology, The Pennsylvania State University, University Park, USA
