Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics

Khalique, Fatima; Khan, Shoab Ahmed; Mubarak, Qurat-ul-ain; Safdar, Hasan

doi:10.1007/978-3-030-01174-1_30

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 858)

Included in the following conference series:

Science and Information Conference

Abstract

Electronic Health Records (EHRs) are frequently used in Health Information Exchanges, where records belonging to the same patient are fused through their demographic attributes for public health informatics. Fusing this information across multiple health care entities presents a two-fold complexity. First, privacy constraints on sharing demographic information across organizations are stringent, so records must be encrypted or hashed for anonymity. Second, fusing anonymized data raises the problem of finding duplicate records and accurately linking incoming information to existing records. This paper presents a methodology by which the office of any public health department can acquire health data while preserving the privacy, integrity and usefulness of the data. Our novel duplicate detection algorithm combines cryptographic hashing and machine learning techniques to approximately link patients’ records by identifying duplicate and unique records. Experimental results on three different datasets show that the proposed methodology detects duplicates effectively from encoded demographic data in EHRs. In addition, the proposed methodology can potentially be applied to record matching in other domains with encoded data.
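
The general idea of matching records on encoded demographic data can be illustrated with a minimal sketch. It is not the authors' algorithm: the field list, the use of keyed HMAC-SHA-256 rather than a plain hash, the normalization step, and the hand-written `is_duplicate` rule (a stand-in for a decision tree learned from labeled record pairs) are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical set of demographic attributes; the abstract does not list them.
FIELDS = ("first_name", "last_name", "dob", "gender")


def encode_record(record, key):
    """Replace each demographic value with a keyed SHA-256 digest (hex string).

    Assumption: values are trimmed and lower-cased before hashing so that
    trivial formatting differences do not change the digest.
    """
    encoded = {}
    for field in FIELDS:
        value = record[field].strip().lower().encode("utf-8")
        encoded[field] = hmac.new(key, value, hashlib.sha256).hexdigest()
    return encoded


def match_features(a, b):
    """Field-wise agreement of two encoded records: 1 if digests match, else 0."""
    return [int(hmac.compare_digest(a[f], b[f])) for f in FIELDS]


def is_duplicate(features):
    """Hand-written stand-in for a learned decision tree (illustrative only)."""
    first, last, dob, gender = features
    if dob and last:
        return bool(first or gender)
    return False


key = b"shared-secret"  # hypothetical key shared by the exchanging parties
r1 = {"first_name": "Fatima", "last_name": "Khan", "dob": "1990-01-01", "gender": "F"}
r2 = {"first_name": " fatima ", "last_name": "KHAN", "dob": "1990-01-01", "gender": "f"}

features = match_features(encode_record(r1, key), encode_record(r2, key))
print(features, is_duplicate(features))  # [1, 1, 1, 1] True
```

Exact digests only agree when the normalized values are identical, so a sketch like this cannot tolerate typos within a field; the approximate linking described in the abstract would need a richer encoding or comparison than field-wise equality.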

Author information

Authors and Affiliations

National University of Sciences and Technology, Islamabad, Pakistan
Fatima Khalique
College of Electrical and Mechanical Engineering, NUST, Islamabad, Pakistan
Shoab Ahmed Khan & Qurat-ul-ain Mubarak
Center for Advanced Studies in Engineering, Islamabad, Pakistan
Hasan Safdar

Corresponding author

Correspondence to Fatima Khalique.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Department of Information Science, Saga University, Honjo, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia

About this paper

Cite this paper

Khalique, F., Khan, S.A., Mubarak, Qua., Safdar, H. (2019). Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Computing. SAI 2018. Advances in Intelligent Systems and Computing, vol 858. Springer, Cham. https://doi.org/10.1007/978-3-030-01174-1_30

DOI: https://doi.org/10.1007/978-3-030-01174-1_30
Published: 02 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01173-4
Online ISBN: 978-3-030-01174-1
eBook Packages: Intelligent Technologies and Robotics (R0)
