
A scalable privacy-preserving framework for temporal record linkage

  • Thilina Ranbaduge
  • Peter Christen
Regular Paper

Abstract

Record linkage (RL) is the process of identifying records from different databases that refer to the same entity. In many applications, the attribute values of records that belong to the same entity evolve over time; for example, people can change their surname or address. To identify the records that refer to the same entity over time, RL should therefore make use of temporal information, such as the time-stamp of when a record was created and/or last updated. However, when RL is conducted on information about people, privacy and confidentiality concerns often mean that organisations are not willing or allowed to share sensitive data in their databases, such as personal medical records or location and financial details, with other organisations. This paper proposes a scalable framework for privacy-preserving temporal record linkage that can link different databases while ensuring the privacy of the sensitive data they contain. We propose two protocols that can be used in different linkage scenarios, with and without a third party. Our protocols use Bloom filter encoding that incorporates the temporal information available in records during the linkage process. Our approaches first securely calculate the probabilities of entities changing the attribute values in their records over a period of time. Based on these probabilities, we then generate a set of masking Bloom filters to adjust the similarities between record pairs. We provide a theoretical analysis of the complexity and privacy of our techniques and conduct an empirical study on large real databases containing several million records. The experimental results show that our approaches can achieve better linkage quality than non-temporal privacy-preserving record linkage (PPRL) while providing privacy to the individuals in the databases being linked.
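The abstract builds on Bloom filter encoding and on probabilities that attribute values change over time. The Python sketch below illustrates the general idea under stated assumptions: it encodes character bigrams of an attribute value into a Bloom filter using a keyed double-hashing scheme commonly used in Bloom filter-based PPRL, compares two filters with the Dice coefficient, and then applies a hypothetical down-weighting of attributes that are likely to have changed between two records. The parameters (`m`, `k`, the shared `key`) and the `temporal_similarity` weighting are illustrative assumptions only; this is not the masking Bloom filter method proposed in the paper, and it omits the secure computation of the change probabilities.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into overlapping character q-grams (no padding)."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bf_encode(value, m=1000, k=20, key=b"shared-secret"):
    """Encode a string's q-grams into an m-bit Bloom filter held as a Python int.

    Double hashing: position_i = (h1 + i * h2) mod m, with h1 and h2 taken from
    SHA-256 and MD5 digests of key + q-gram. m, k and key are assumed values.
    """
    bits = 0
    for gram in qgrams(value):
        data = key + gram.encode("utf-8")
        h1 = int(hashlib.sha256(data).hexdigest(), 16)
        h2 = int(hashlib.md5(data).hexdigest(), 16)
        for i in range(k):
            bits |= 1 << ((h1 + i * h2) % m)
    return bits


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def dice(bf_a, bf_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) over the set bits of two filters."""
    total = popcount(bf_a) + popcount(bf_b)
    return 0.0 if total == 0 else 2 * popcount(bf_a & bf_b) / total


def temporal_similarity(sims, change_probs):
    """HYPOTHETICAL temporal adjustment (not the paper's masking Bloom filters).

    sims maps attribute -> Dice similarity; change_probs maps attribute -> assumed
    probability that the value changed over the time gap between the two records.
    Attributes that are likely to have changed receive a lower weight.
    """
    weights = {a: 1.0 - change_probs.get(a, 0.0) for a in sims}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[a] * sims[a] for a in sims) / total


if __name__ == "__main__":
    rec_a = {"surname": "smith", "city": "canberra"}
    rec_b = {"surname": "miller", "city": "canberra"}
    sims = {a: dice(bf_encode(rec_a[a]), bf_encode(rec_b[a])) for a in rec_a}
    print(sims)
    print(temporal_similarity(sims, {"surname": 0.6, "city": 0.1}))
```

In this toy example a surname that differs between two records counts for less when surnames are assumed to change often, so the overall similarity stays higher than a plain average would give. Holding each Bloom filter as a Python int keeps the bitwise AND and bit counting simple; a real encoder would more likely use a fixed-size bit array and keyed HMACs.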

Keywords

Secure multiparty computation; Encryption; Temporal records

Notes

Acknowledgements

This work was funded by the Australian Research Council under Discovery Project DP160101934.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Research School of Computer Science, Australian National University, Canberra, Australia
