Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics

  • Conference paper
Intelligent Computing (SAI 2018)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 858))

Abstract

Electronic Health Records (EHRs) are frequently used in Health Information Exchanges to fuse data belonging to the same patients for public health informatics, with demographic attributes serving as the link. Fusing this information across multiple health care entities presents a two-fold complexity. First, privacy constraints on sharing demographic information across organizations are stringent, which requires encrypting or hashing records for anonymity. Second, fusing anonymized data raises the problem of finding duplicate records and linking incoming information accurately to existing records. This paper presents a methodology by which the office of any public health department can acquire health data while preserving the privacy, integrity and usefulness of the data. Our novel duplicate detection algorithm combines cryptographic hashing and machine learning techniques to link patients’ records approximately by identifying duplicate and unique records. Experimental results on three different datasets show that the proposed methodology effectively detects duplicates based on encoded demographic data from EHRs. In addition, the proposed methodology can potentially be applied to record matching in other domains with encoded data.
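The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: demographic fields are hashed before sharing, and a decision-tree-style rule then labels a pair of hashed records as duplicate or unique. The field names, the salt, the choice of SHA-256, and the hand-written tree are all illustrative assumptions, not the authors' method; in practice the tree would be learned from labeled record pairs.

```python
import hashlib

# Hypothetical demographic fields; the paper's actual attributes are not given here.
FIELDS = ("first_name", "last_name", "dob", "gender", "city")


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variations hash identically."""
    return " ".join(value.strip().lower().split())


def anonymize(record: dict, salt: str = "shared-secret") -> dict:
    """Replace each field with a salted SHA-256 digest (assumed scheme)."""
    hashed = {}
    for field in FIELDS:
        payload = f"{salt}|{field}|{normalize(record[field])}"
        hashed[field] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return hashed


def agreement(a: dict, b: dict) -> dict:
    """Per-field agreement between two hashed records; no plain text is needed."""
    return {field: a[field] == b[field] for field in FIELDS}


def is_duplicate(match: dict) -> bool:
    """Hand-written stand-in for a learned decision tree over agreement features."""
    if match["dob"]:
        if match["last_name"]:
            return True
        return match["first_name"] and match["gender"]
    return match["first_name"] and match["last_name"] and match["city"]


if __name__ == "__main__":
    rec_a = {"first_name": "Ayesha", "last_name": "Khan", "dob": "1985-03-12",
             "gender": "F", "city": "Lahore"}
    rec_b = {"first_name": " ayesha ", "last_name": "KHAN", "dob": "1985-03-12",
             "gender": "f", "city": "Karachi"}
    rec_c = {"first_name": "Bilal", "last_name": "Khan", "dob": "1990-07-01",
             "gender": "M", "city": "Lahore"}
    hashed_a, hashed_b, hashed_c = (anonymize(r) for r in (rec_a, rec_b, rec_c))
    print(is_duplicate(agreement(hashed_a, hashed_b)))
    print(is_duplicate(agreement(hashed_a, hashed_c)))
```

Hashing each field separately, rather than the whole record at once, is what lets the comparison tolerate a disagreement in one attribute (here, the city) while still linking the pair. A single digest over the full record would only support exact matching.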

Author information

Corresponding author

Correspondence to Fatima Khalique.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Khalique, F., Khan, S.A., Mubarak, Qua., Safdar, H. (2019). Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Computing. SAI 2018. Advances in Intelligent Systems and Computing, vol 858. Springer, Cham. https://doi.org/10.1007/978-3-030-01174-1_30
