Skip to main content

Clustering-Based Scalable Indexing for Multi-party Privacy-Preserving Record Linkage

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9078))

Included in the following conference series:

Abstract

The identification of common sets of records in multiple databases has become an increasingly important subject in many application areas, including banking, health, and national security. Often privacy concerns and regulations prevent the owners of the databases from sharing any sensitive details of their records with each other, and with any other party. The linkage of records in multiple databases while preserving privacy and confidentiality is an emerging research discipline known as privacy-preserving record linkage (PPRL). We propose a novel two-step indexing (blocking) approach for PPRL between multiple (more than two) parties. First, we generate small mini-blocks using a multi-bit Bloom filter splitting method and second we merge these mini-blocks based on their similarity using a novel hierarchical canopy clustering technique. An empirical study conducted with large datasets of up-to one million records shows that our approach is scalable with the size of the datasets and the number of parties, while providing better privacy than previous multi-party indexing approaches.

Clustering-based Scalable Indexing for Multi-party Privacy-preserving Record Linkage - This research is funded by the Australian Research Council under Discovery Project DP130101801.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Al-Lawati, A., Lee, D., McDaniel, P.: Blocking-aware private record linkage. In: ACM IQIS, pp. 59–68, Baltimore (2005)

    Google Scholar 

  2. Bloom, B.: Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13(7), 422–426 (1970)

    Article  MATH  Google Scholar 

  3. Christen, P.: Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer (2012)

    Google Scholar 

  4. Christen, P.: A survey of indexing techniques for scalable record linkage and deduplication. IEEE TKDE 24(9), 1537–1555 (2012)

    Google Scholar 

  5. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., Zhu, M.: Tools for privacy preserving distributed data mining. SIGKDD Explorations 4(2), 28–34 (2002)

    Article  Google Scholar 

  6. Cohen, W.W., Richman, J.: Learning to match and cluster large high-dimensional data sets for data integration. In: ACM SIGKDD, pp. 475–480, Edmonton (2002)

    Google Scholar 

  7. Durham, E.: A framework for accurate, efficient private record linkage. Ph.D. thesis, Faculty of the Graduate School of Vanderbilt University, Nashville, TN (2012)

    Google Scholar 

  8. Karakasidis, A., Verykios, V.: Secure blocking+secure matching = secure record linkage. Journal of Computing Science and Engineering 5, 223–235 (2011)

    Article  Google Scholar 

  9. Kristensen, T.G., Nielsen, J., Pedersen, C.N.: A tree-based method for the rapid screening of chemical fingerprints. Algo. for Molecular Biology 5(1), 9 (2010)

    Article  Google Scholar 

  10. Kuzu, M., Kantarcioglu, M., Inan, A., Bertino, E., Durham, E., Malin, B.: Efficient privacy-aware record integration. In: ACM EDBT, Genoa, Italy (2013)

    Google Scholar 

  11. Lai, P., Yiu, S., Chow, K., Chong, C., Hui, L.: An efficient Bloom filter based solution for multiparty private matching. In: SAM, Las Vegas (2006)

    Google Scholar 

  12. McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: ACM SIGKDD, Boston (2000)

    Google Scholar 

  13. Ranbaduge, T., Vatsalan, D., Christen, P.: Tree based scalable indexing for multi-party privacy-preserving record linkage. In: AusDM, CRPIT 158, Brisbane (2014)

    Google Scholar 

  14. Schnell, R., Bachteler, T., Reiher, J.: Privacy-preserving record linkage using Bloom filters. BMC Medical Informatics and Decision Making 9(1) (2009)

    Google Scholar 

  15. Schnell, R.: Privacy-preserving record linkage and privacy-preserving blocking for large files with cryptographic keys using multibit trees. In: JSM, Montreal (2013)

    Google Scholar 

  16. Vatsalan, D., Christen, P., Verykios, V.: A taxonomy of privacy-preserving record linkage techniques. JIS 38(6), 946–969 (2013)

    Google Scholar 

  17. Vatsalan, D., Christen, P.: Scalable privacy-preserving record linkage for multiple databases. In: ACM CIKM, pp. 1795–1798, Shanghai (2014)

    Google Scholar 

  18. Vatsalan, D., Christen, P., O’Keefe, C.M., Verykios, V.: An evaluation framework for privacy-preserving record linkage. JPC 6(1), 35–75 (2014)

    Google Scholar 

  19. Vatsalan, D., Christen, P., Verykios, V.: Efficient two-party private blocking based on sorted nearest neighborhood clustering. In: ACM CIKM, San Francisco (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thilina Ranbaduge .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ranbaduge, T., Vatsalan, D., Christen, P. (2015). Clustering-Based Scalable Indexing for Multi-party Privacy-Preserving Record Linkage. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_43

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-18032-8_43

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18031-1

  • Online ISBN: 978-3-319-18032-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics