Geocode Matching and Privacy Preservation

Christen, Peter

doi:10.1007/978-3-642-01718-6_2

Geocode Matching and Privacy Preservation

Peter Christen²⁰

Conference paper

619 Accesses
7 Citations

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 5456))

Abstract

Geocoding is the process of matching addresses to geographic locations, such as latitudes and longitudes, or local census areas. In many applications, addresses are the key to geo-spatial data analysis and mining. Privacy and confidentiality are of paramount importance when data from, for example, cancer registries or crime databases is geocoded. Various approaches to privacy-preserving data matching, also called record linkage or entity resolution, have been developed in recent times. However, most of these approaches have not considered the specific privacy issues involved in geocode matching. This paper provides a brief introduction to privacy-preserving data and geocode matching, and using several real-world scenarios the issues involved in privacy and confidentiality for data and geocode matching are illustrated. The challenges of making privacy-preserving matching practical for real-world applications are highlighted, and potential directions for future research are discussed.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

US Federal Geographic Data Committee. Homeland Security and Geographic Information Systems: How GIS and mapping technology can save lives and protect property in post-September 11th America. Public Health GIS News and Information (52), 21–23 (May 2003)
Google Scholar
Christen, P., Goiser, K.: Quality and complexity measures for data linkage and deduplication. In: Guillet, F., Hamilton, H.J. (eds.) Quality Measures in Data Mining. Studies in Computational Intelligence, vol. 43, pp. 127–151. Springer, Heidelberg (2007)
Chapter Google Scholar
Winkler, W.E.: Overview of record linkage and current research directions. Technical Report RRS2006/02, US Bureau of the Census (2006)
Google Scholar
Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering 19(1), 1–16 (2007)
Article Google Scholar
Kelman, C.W., Bass, J.A., Holman, D.: Research use of linked health data – A best practice protocol. ANZ Journal of Public Health 26(3), 251–255 (2002)
Google Scholar
Jonas, J., Harper, J.: Effective counterterrorism and the limited role of predictive data mining. Policy Analysis (584) (2006)
Google Scholar
Wang, G., Chen, H., Xu, J.J., Atabakhsh, H.: Automatically detecting criminal identity deception: An adaptive detection algorithm. IEEE Transactions on Systems, Man and Cybernetics (Part A) 36(5), 988–999 (2006)
Article Google Scholar
Bhattacharya, I., Getoor, L.: Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1) (2007)
Google Scholar
Hernandez, M.A., Stolfo, S.J.: Real-world data is dirty: Data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery 2(1), 9–37 (1998)
Article Google Scholar
Churches, T., Christen, P., Lim, K., Zhu, J.: Preparation of name and address data for record linkage using hidden Markov models. BioMed Central Medical Informatics and Decision Making 2(9) (2002)
Google Scholar
Baxter, R., Christen, P., Churches, T.: A comparison of fast blocking methods for record linkage. In: ACM KDD Workshop on Data Cleaning, Record Linkage and Object Consolidation, Washington, DC (2003)
Google Scholar
Christen, P.: Febrl – An open source data cleaning, deduplication and record linkage system with a graphical user interface. In: ACM International Conference on Knowledge Discovery and Data Mining, Las Vegas, pp. 1065–1068 (2008)
Google Scholar
Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: IJCAI Workshop on Information Integration on the Web, Acapulco, pp. 73–78 (2003)
Google Scholar
Christen, P.: A comparison of personal name matching: Techniques and practical issues. In: IEEE ICDM Workshop on Mining Complex Data, Hong Kong, pp. 290–294 (2006)
Google Scholar
Christen, P.: Automatic record linkage using seeded nearest neighbour and support vector machine classification. In: ACM International Conference on Knowledge Discovery and Data Mining, Las Vegas, pp. 151–159 (2008)
Google Scholar
Clarke, D.: Practical introduction to record linkage for injury research. Injury Prevention 10, 186–191 (2004)
Article Google Scholar
Christen, P., Willmore, A., Churches, T.: A probabilistic geocoding system utilising a parcel based address file. In: Williams, G.J., Simoff, S.J. (eds.) Data Mining. LNCS (LNAI), vol. 3755, pp. 130–145. Springer, Heidelberg (2006)
Chapter Google Scholar
Paull, D.: A geocoded national address file for Australia: The G-NAF what, why, who and when? PSMA Australia Limited, Griffith, ACT, Australia (2003), http://www.g-naf.com.au/
Cayo, M.R., Talbot, T.O.: Positional error in automated geocoding of residential addresses. International Journal of Health Geographics 2(10) (2003)
Google Scholar
Brownstein, J.S., Cassa, C., Kohane, I.S., Mandl, K.D.: Reverse geocoding: Concerns about patient confidentiality in the display of geospatial health data. In: AMIA Annual Symposium Proceedings 2005, p. 905 (2005)
Google Scholar
Brownstein, J.S., Cassa, C., Mandl, K.D.: No place to hide–reverse identification of patients from published maps. New England Journal of Medicine 355(16), 1741–1742 (2006)
Article Google Scholar
Curtis, A.J., Mills, J.W., Leitner, M.: Spatial confidentiality and GIS: Re-engineering mortality locations from published maps about Hurricane Katrina. International Journal of Health Geographics 5(1), 44–56 (2006)
Article Google Scholar
Australian Attorney-General’s Department, Standing Committee of Attorney’s-General: Model criminal law officers’ committee: Final report on identity crime. Canberra (March 2008)
Google Scholar
Chaytor, R., Brown, E., Wareham, T.: Privacy advisors for personal information management. In: SIGIR Workshop on Personal Information Management, Seattle, Washington, pp. 28–31 (2006)
Google Scholar
Fienberg, S.E.: Privacy and confidentiality in an e-Commerce world: Data mining, data warehousing, matching and disclosure limitation. Statistical Science 21(2), 143–154 (2006)
Article MathSciNet MATH Google Scholar
Sweeney, L.: K-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(5), 557–570 (2002)
Article MathSciNet MATH Google Scholar
Christen, P.: Privacy-preserving data linkage and geocoding: Current approaches and research directions. In: IEEE ICDM Workshop on Privacy Aspects of Data Mining, Hong Kong, pp. 497–501 (2006)
Google Scholar
Sweeney, L.: Privacy-enhanced linking. ACM SIGKDD Explorations 7(2), 72–75 (2005)
Article Google Scholar
Christen, P., Churches, T.: Secure health data linkage and geocoding: Current approaches and research directions. In: National e-Health Privacy and Security Symposium, Brisbane, Australia (2006)
Google Scholar
Wartell, J., McEwen, T.: Privacy in the information age: A guide for sharing crime maps and spatial data. Institute for Law and Justice, NCJ 188739 (July 2001)
Google Scholar
Rushton, G., Armstrong, M.P., Gittler, J., Greene, B.R., Pavlik, C.E., West, M.M., Zimmerman, D.L.: Geocoding in cancer research – A review. American Journal of Preventive Medicine 30(2S), 16–24 (2006)
Article Google Scholar
Rivest, R.L.: Chaffing and winnowing: Confidentiality without encryption. MIT Lab for Computer Science (1998), http://theory.lcs.mit.edu/~rivest/chaffing.txt
Churches, T.: A proposed architecture and method of operation for improving the protection of privacy and confidentiality in disease registers. BioMed. Central Medical Research Methodology 3(1) (2003)
Google Scholar
Bouzelat, H., Quantin, C., Dusserre, L.: Extraction and anonymity protocol of medical file. In: AMIA Fall Symposium, pp. 323–327 (1996)
Google Scholar
Dusserre, L., Quantin, C., Bouzelat, H.: A one way public key cryptosystem for the linkage of nominal files in epidemiological studies. Medinfo. 8(644–7) (1995)
Google Scholar
Quantin, C., Bouzelat, H., Allaert, F.A., Benhamiche, A.M., Faivre, J., Dusserre, L.: Automatic record hash coding and linkage for epidemiological follow-up data confidentiality. Methods of Information in Medicine 37(3), 271–277 (1998)
Google Scholar
Quantin, C., Bouzelat, H., Allaert, F.A., Benhamiche, A.M., Faivre, J., Dusserre, L.: How to ensure data quality of an epidemiological follow-up: Quality assessment of an anonymous record linkage procedure. International Journal of Medical Informatics 49(1), 117–122 (1998)
Article Google Scholar
Quantin, C., Bouzelat, H., Dusserre, L.: Irreversible encryption method by generation of polynomials. Medical Informatics and the Internet in Medicine 21(2), 113–121 (1996)
Article Google Scholar
Schneier, B.: Applied cryptography: Protocols, algorithms, and source code in C, 2nd edn. John Wiley & Sons, Inc., New York (1995)
MATH Google Scholar
Ravikumar, P., Cohen, W.W., Fienberg, S.E.: A secure protocol for computing string distance metrics. In: IEEE ICDM Workshop on Privacy and Security Aspects of Data Mining, Brighton, UK (2004)
Google Scholar
Atallah, M.J., Kerschbaum, F., Du, W.: Secure and private sequence comparisons. In: ACM Workshop on Privacy in the Electronic Society, Washington DC, pp. 39–44 (2003)
Google Scholar
O’Keefe, C.M., Yung, M., Gu, L., Baxter, R.: Privacy-preserving data linkage protocols. In: ACM Workshop on Privacy in the Electronic Society, Washington DC, pp. 94–102 (2004)
Google Scholar
Churches, T., Christen, P.: Some methods for blindfolded record linkage. BioMed. Central Medical Informatics and Decision Making 4(9) (2004)
Google Scholar
Al-Lawati, A., Lee, D., McDaniel, P.: Blocking-aware private record linkage. In: International Workshop on Information Quality in Information Systems, Baltimore, pp. 59–68 (2005)
Google Scholar
Inan, A., Kantarcioglu, M., Bertino, E., Scannapieco, M.: A hybrid approach to private record linkage. In: IEEE International Conference Data Engineering, pp. 496–505 (2008)
Google Scholar
Christen, P.: Automatic training example selection for scalable unsupervised record linkage. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 511–518. Springer, Heidelberg (2008)
Chapter Google Scholar
Guisado-Gamez, J., Prat-Perez, A., Nin, J., Muntes-Mulero, V., Larriba-Pey, J.L.: Parallelizing record linkage for disclosure risk assessment. In: Privacy in Statistical Databases, Istanbul, Turkey. LNCS, vol. 5262, pp. 190–202. Springer, Heidelberg (2008)
Chapter Google Scholar
Christen, P., Gayler, R.: Towards scalable real-time entity resolution using a similarity-aware inverted index approach. In: AusDM 2008, CRPIT, Glenelg, Australia, vol. 87, pp. 51–60 (2008)
Google Scholar
Winkler, W.E.: Masking and re-identification methods for public-use microdata: Overview and research problems. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 216–230. Springer, Heidelberg (2004)
Chapter Google Scholar
Malin, B., Sweeney, L.: A secure protocol to distribute unlinkable health data. In: American Medical Informatics Association 2005 Annual Symposium, Washington DC, pp. 485–489 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, The Australian National University, Canberra, ACT 0200, Australia
Peter Christen

Authors

Peter Christen
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Yahoo! Research Barcelona, Avinguda Drive 177, 08018, Barcelona, Spain
Francesco Bonchi
Department of Computer Science and Communication, University of Insubria, 22100, Varese, Italy
Elena Ferrari
311 Computer Science Building, 500W. 15th St., MO 65409, Rolla, USA
Wei Jiang
Department of Biomedical Informatics, Vanderbilt University, TN 37203, nashville, USA
Bradley Malin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Christen, P. (2009). Geocode Matching and Privacy Preservation. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds) Privacy, Security, and Trust in KDD. PInKDD 2008. Lecture Notes in Computer Science, vol 5456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01718-6_2

Download citation

DOI: https://doi.org/10.1007/978-3-642-01718-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01717-9
Online ISBN: 978-3-642-01718-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics