Abstract
In this paper we address the problem of matching data from different databases through a third party when the actual data cannot be disclosed. The aim is to provide a mechanism for improved matching results across databases while preserving the privacy of sensitive information in those databases. This is particularly relevant for health-related databases, where bringing together data about patients from multiple databases enables important medical research, but the sensitive nature of the data requires that identifying information never be disclosed.
The method described uses a public reference table to match people’s names across databases without revealing identifying information to any party outside the originating data source. An advantage of our algorithm is that it can deal with typographical and other errors in the data.
The key features of our proposed approach are: (1) original private data held by individual custodians are never revealed to any other party, because data comparison is performed locally by each custodian and only the comparison results, which are entries in the reference table, are sent; (2) the third party performs the match based on encrypted values from the public reference table and some distance information. Experimental results show that our proposed method performs fuzzy matching (similarity join) with accuracy comparable to that of conventional fuzzy matching algorithms, without revealing any identifying information.
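The two features above can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify in detail; it only shows one plausible shape of such a protocol under stated assumptions. The assumptions are: Levenshtein edit distance as the similarity measure, a keyed hash (HMAC-SHA256 with a key shared by the custodians but not the third party) standing in for the "encrypted values", and a matching rule that pairs two records when they share a reference entry and their two distances sum to at most a threshold (by the triangle inequality, that sum is an upper bound on the distance between the two private names). All function names, parameters, and sample data are hypothetical.

```python
import hashlib
import hmac


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def custodian_encode(records, reference, key, max_dist=2):
    """Runs locally at a custodian (hypothetical helper).

    `records` maps record id -> private name. Only tuples of
    (record id, keyed hash of a reference entry, distance) leave the site;
    the private name itself is never included.
    """
    out = []
    for rec_id, name in records.items():
        for ref in reference:
            d = edit_distance(name, ref)
            if d <= max_dist:
                token = hmac.new(key, ref.encode("utf-8"), hashlib.sha256).hexdigest()
                out.append((rec_id, token, d))
    return out


def third_party_match(enc_a, enc_b, threshold=2):
    """Runs at the third party, which sees only tokens and distances.

    Assumed rule: pair two records when they share a reference token and
    d_a + d_b <= threshold.
    """
    by_token = {}
    for rec_id, token, d in enc_b:
        by_token.setdefault(token, []).append((rec_id, d))
    matches = set()
    for rec_a, token, d_a in enc_a:
        for rec_b, d_b in by_token.get(token, []):
            if d_a + d_b <= threshold:
                matches.add((rec_a, rec_b))
    return matches


# Illustrative data only; none of it comes from the chapter.
reference = ["smith", "smyth", "jones", "johnson"]  # public reference table
shared_key = b"custodians-only-secret"              # unknown to the third party
db_a = {"a1": "smith", "a2": "jonse"}
db_b = {"b1": "smiht", "b2": "johnston"}

enc_a = custodian_encode(db_a, reference, shared_key)
enc_b = custodian_encode(db_b, reference, shared_key)
matches = third_party_match(enc_a, enc_b)
```

In this toy run the misspelled "smiht" is linked to "smith" because both lie close to the same reference entry, while the third party handles only hashed reference entries and small integers. A design consequence of the assumed rule is that two similar names with no nearby reference entry in common are missed, so the coverage of the reference table bounds the achievable recall.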
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pang, C., Gu, L., Hansen, D., Maeder, A. (2009). Privacy-Preserving Fuzzy Matching Using a Public Reference Table. In: McClean, S., Millard, P., El-Darzi, E., Nugent, C. (eds) Intelligent Patient Management. Studies in Computational Intelligence, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00179-6_5
DOI: https://doi.org/10.1007/978-3-642-00179-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00178-9
Online ISBN: 978-3-642-00179-6
eBook Packages: Engineering, Engineering (R0)