Privacy-Preserving Fuzzy Matching Using a Public Reference Table

  • Chapter
Intelligent Patient Management

Part of the book series: Studies in Computational Intelligence (SCI, volume 189)

Abstract

In this paper we address the problem of matching data from different databases through a third party when the actual data cannot be disclosed. The aim is to improve matching results across databases while preserving the privacy of the sensitive information they hold. This is particularly relevant to health-related databases: bringing together data about patients from multiple databases enables important medical research, but the sensitive nature of the data requires that identifying information never be disclosed.

The method described uses a public reference table to match people’s names in different databases without revealing identifying information to any party outside the originating data source. An advantage of our algorithm is that it can cope with typographical and other errors in the data.
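The custodian-side step can be illustrated with a minimal sketch. It assumes plain Levenshtein edit distance as the similarity measure and a tiny hypothetical reference table; the abstract does not specify either, so `REFERENCE_TABLE`, `nearest_references`, and the `max_distance` cut-off are illustrative names and values, not the authors' implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical public reference table: contains no private data.
REFERENCE_TABLE = ["smith", "smythe", "jones", "johnson", "brown"]


def nearest_references(name, table, max_distance=2):
    """Return (reference entry, distance) pairs within max_distance of name.

    The comparison runs entirely at the custodian, so the private name
    itself never leaves the originating data source.
    """
    scored = [(ref, edit_distance(name.lower(), ref)) for ref in table]
    close = [(ref, dist) for ref, dist in scored if dist <= max_distance]
    return sorted(close, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # A misspelled private name still lands near the same reference entries.
    print(nearest_references("Smyth", REFERENCE_TABLE))
```

Because a misspelled name and its correct form tend to sit close to the same reference entries, small typographical errors need not prevent a later match.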

The key features of our proposed approach are: (1) the original private data held by individual custodians are never revealed to any other party, because each custodian performs the data comparison locally and sends only the comparison results, which are entries from the reference table; (2) the third party performs the match using encrypted values from the public reference table together with some distance information. Experimental results show that our proposed method performs fuzzy matching (similarity join) with an accuracy comparable to that of conventional fuzzy matching algorithms, without revealing any identifying information.
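The two features above can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's protocol: a keyed hash (HMAC-SHA256 with a key shared by the custodians only) stands in for the unspecified encryption of reference entries, and the third party declares a candidate match when two records share a reference token and the sum of their distances to it stays within a threshold (an upper bound on their mutual distance by the triangle inequality). The names `token`, `custodian_report`, `third_party_match`, and `threshold` are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def token(ref: str, key: bytes) -> str:
    """Keyed hash of a reference entry (assumed stand-in for encryption)."""
    return hmac.new(key, ref.encode("utf-8"), hashlib.sha256).hexdigest()


def custodian_report(records, key):
    """Build what a custodian sends to the third party.

    records: iterable of (record_id, reference_entry, distance), where the
    distance between the private name and the reference entry was computed
    locally. Only the hashed reference entry and the distance are sent.
    """
    return [(rid, token(ref, key), dist) for rid, ref, dist in records]


def third_party_match(report_a, report_b, threshold=2):
    """Pair records that share a reference token and are close enough to it.

    dist_a + dist_b bounds the distance between the two private names from
    above, so the third party never needs to see the names themselves.
    """
    by_token = defaultdict(list)
    for rid, tok, dist in report_b:
        by_token[tok].append((rid, dist))
    matches = set()
    for rid_a, tok, dist_a in report_a:
        for rid_b, dist_b in by_token.get(tok, []):
            if dist_a + dist_b <= threshold:
                matches.add((rid_a, rid_b))
    return sorted(matches)


if __name__ == "__main__":
    key = b"shared-by-custodians-only"  # hypothetical key, unknown to the third party
    report_a = custodian_report([("a1", "smith", 1), ("a2", "jones", 0)], key)
    report_b = custodian_report([("b1", "smith", 1), ("b2", "jones", 3)], key)
    print(third_party_match(report_a, report_b))
```

The sum-of-distances test is deliberately conservative: it can miss pairs that are close to each other but far from any shared reference entry, which is one reason the choice of reference table matters.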

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pang, C., Gu, L., Hansen, D., Maeder, A. (2009). Privacy-Preserving Fuzzy Matching Using a Public Reference Table. In: McClean, S., Millard, P., El-Darzi, E., Nugent, C. (eds) Intelligent Patient Management. Studies in Computational Intelligence, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00179-6_5

  • DOI: https://doi.org/10.1007/978-3-642-00179-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00178-9

  • Online ISBN: 978-3-642-00179-6

  • eBook Packages: Engineering, Engineering (R0)