Canopy-Based Private Blocking

  • Conference paper
Data Mining (AusDM 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 996)

Abstract

Integrating data from different sources often involves using personal information to link records that correspond to the same real-world entities. This raises privacy concerns and has led to the development of privacy-preserving record linkage (PPRL) techniques, which aim to conduct linkage without revealing private or confidential information about the corresponding entities. To make such privacy-preserving methods scalable to large datasets, we propose a novel blocking approach that adapts canopy clustering to a private setting. Our approach uses public reference data as the basis for forming blocks and introduces redundancy in block assignments. We analyze the privacy of the approach and experimentally evaluate its efficiency and effectiveness. The results show that our approach scales with the size of the datasets and achieves better quality than state-of-the-art sorted-neighborhood-based approaches.
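The general idea of reference-based canopy blocking can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the bigram Jaccard similarity, the two thresholds `loose` and `tight`, the shared random seed, and all function names are assumptions made for illustration, and the sketch omits any encoding or protection of the exchanged blocks. It builds canopy centers from a public reference list, then lets each party assign its own records to every canopy whose center is similar enough, so a record may land in more than one block.

```python
import random


def bigrams(s):
    """Set of character bigrams of a lower-cased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def build_canopies(reference, loose, tight, seed=0):
    """Pick canopy centers from public reference values.

    Repeatedly draws a center at random and drops every remaining value
    whose similarity to it is at least `tight`. Both parties can run this
    on the same reference list with the same seed (an assumption of this
    sketch) to obtain identical centers.
    """
    assert 0.0 <= loose <= tight <= 1.0
    rng = random.Random(seed)
    pool = sorted(set(reference))
    centers = []
    while pool:
        center = rng.choice(pool)
        centers.append(center)
        pool = [v for v in pool if v != center and jaccard(center, v) < tight]
    return centers


def assign_blocks(records, centers, loose):
    """Map each canopy center to the ids of local records similar to it.

    A record joins every canopy whose center is at least `loose` similar,
    which is where the redundancy in block assignments comes from.
    """
    blocks = {}
    for rec_id, value in records.items():
        for center in centers:
            if jaccard(value, center) >= loose:
                blocks.setdefault(center, set()).add(rec_id)
    return blocks


def candidate_pairs(blocks_a, blocks_b):
    """Record pairs that share at least one block across the two parties."""
    pairs = set()
    for key in blocks_a.keys() & blocks_b.keys():
        pairs.update((a, b) for a in blocks_a[key] for b in blocks_b[key])
    return pairs


# Hypothetical public reference values and local blocking values (surnames).
reference = ["smith", "smithe", "smyth", "jones", "johns", "brown", "browne"]
party_a = {"a1": "smith", "a2": "jonse", "a3": "brown"}
party_b = {"b1": "smyth", "b2": "jones", "b3": "braun"}

centers = build_canopies(reference, loose=0.3, tight=0.6)
pairs = candidate_pairs(
    assign_blocks(party_a, centers, loose=0.3),
    assign_blocks(party_b, centers, loose=0.3),
)
print(sorted(pairs))
```

Only the pairs returned by `candidate_pairs` would go on to detailed comparison. Lowering `loose` increases redundancy and the number of candidate pairs, while raising `tight` yields more canopy centers.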

Notes

  1. Schema matching is another research topic beyond the scope of this paper.

  2. The values used for reference should be in the same domain as the values of the local blocking attributes (e.g. both are surnames).

Author information

Corresponding author

Correspondence to Yanfeng Shu.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shu, Y., Hardy, S., Thorne, B. (2019). Canopy-Based Private Blocking. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_16

  • DOI: https://doi.org/10.1007/978-981-13-6661-1_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6660-4

  • Online ISBN: 978-981-13-6661-1

  • eBook Packages: Computer Science, Computer Science (R0)
