Abstract
Geocoding is the process of matching addresses to geographic locations, such as latitudes and longitudes, or to local census areas. In many applications, addresses are the key to geospatial data analysis and mining. Privacy and confidentiality are of paramount importance when data from, for example, cancer registries or crime databases is geocoded. Various approaches to privacy-preserving data matching, also called record linkage or entity resolution, have been developed in recent years. However, most of these approaches have not considered the specific privacy issues involved in geocode matching. This paper provides a brief introduction to privacy-preserving data and geocode matching, and uses several real-world scenarios to illustrate the privacy and confidentiality issues involved in data and geocode matching. The challenges of making privacy-preserving matching practical for real-world applications are highlighted, and potential directions for future research are discussed.
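To make the idea of privacy-preserving matching more concrete, the following is a minimal sketch of one simple baseline, not the method proposed in this paper. Two data custodians normalise their addresses, encode them with a keyed one-way hash (HMAC-SHA256) under a shared secret key, and send only the digests to a third-party linkage unit, which pairs records whose digests are identical. The abbreviation table, the shared key, and the function names `normalise`, `encode`, and `match` are hypothetical and chosen for illustration only.

```python
import hashlib
import hmac
import re

# Hypothetical abbreviation table for illustration only; real systems use
# much richer address standardisation.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalise(address: str) -> str:
    """Lowercase, replace punctuation with spaces, expand a few abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def encode(address: str, key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA256) of the normalised address."""
    return hmac.new(key, normalise(address).encode("utf-8"), hashlib.sha256).hexdigest()


def match(encoded_a: dict, encoded_b: dict) -> list:
    """Run by the linkage unit: pair record IDs whose digests are identical.

    The linkage unit only ever sees digests, never the plaintext addresses.
    """
    index = {}
    for rec_id, digest in encoded_b.items():
        index.setdefault(digest, []).append(rec_id)
    return [
        (a_id, b_id)
        for a_id, digest in encoded_a.items()
        for b_id in index.get(digest, [])
    ]


if __name__ == "__main__":
    # Hypothetical secret agreed between the two custodians, withheld from
    # the linkage unit.
    shared_key = b"secret agreed by the two data custodians"
    registry = {"r1": "12 Main St.", "r2": "7 Park Rd"}
    reference = {"g1": "12 main street", "g2": "7 Park Road", "g3": "9 Hill Ave"}
    enc_a = {k: encode(v, shared_key) for k, v in registry.items()}
    enc_b = {k: encode(v, shared_key) for k, v in reference.items()}
    print(match(enc_a, enc_b))  # [('r1', 'g1'), ('r2', 'g2')]
```

Because any difference in the normalised strings yields an unrelated digest, a single typo (for example "Mian" instead of "Main") prevents a match. This exact-match limitation is one reason why making privacy-preserving data and geocode matching practical remains challenging.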
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Christen, P. (2009). Geocode Matching and Privacy Preservation. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds) Privacy, Security, and Trust in KDD. PInKDD 2008. Lecture Notes in Computer Science, vol 5456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01718-6_2
DOI: https://doi.org/10.1007/978-3-642-01718-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01717-9
Online ISBN: 978-3-642-01718-6
eBook Packages: Computer Science, Computer Science (R0)