Machine Learning, Volume 95, Issue 1, pp 87–101

Detecting inappropriate access to electronic health records using collaborative filtering

  • Aditya Krishna Menon
  • Xiaoqian Jiang
  • Jihoon Kim
  • Jaideep Vaidya
  • Lucila Ohno-Machado


Many healthcare facilities enforce security on their electronic health records (EHRs) through a corrective mechanism: some staff nominally have almost unrestricted access to the records, but there is a strict ex post facto audit process for inappropriate accesses, i.e., accesses that violate the facility’s security and privacy policies. This process is inefficient, as each suspicious access has to be reviewed by a security expert, and is purely retrospective, as it occurs after damage may have been incurred. This motivates automated approaches based on machine learning using historical data. Previous attempts at such a system have successfully applied supervised learning models, such as SVMs and logistic regression, to this end. While providing benefits over manual auditing, these approaches ignore the identity of the users and patients involved in a record access. Therefore, they cannot exploit the fact that a patient whose record was previously involved in a violation has an increased risk of being involved in a future violation. Motivated by this, we propose a collaborative filtering inspired approach to predicting inappropriate accesses. Our solution integrates both explicit and latent features for staff and patients, the latter acting as a personalized “fingerprint” based on historical access patterns. When applied to real EHR access data from two tertiary hospitals and a file-access dataset from Amazon, the proposed method not only shows significantly improved performance compared to existing methods, but also provides insights as to what indicates an inappropriate access.
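As a rough illustration of how explicit and latent features might be combined, the sketch below scores a single staff-patient access with a logistic model whose score is the sum of a linear term on explicit features and a dot product of latent vectors. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the plain SGD update, and the hyperparameters `lr` and `reg` are all hypothetical choices made for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def predict(w, x, alpha_u, beta_p):
    """Estimated probability that an access is inappropriate.

    w       -- weights on the explicit features (assumed name)
    x       -- explicit features of this access
    alpha_u -- latent vector for the staff member (assumed name)
    beta_p  -- latent vector for the patient (assumed name)
    """
    return sigmoid(dot(w, x) + dot(alpha_u, beta_p))


def sgd_step(w, x, alpha_u, beta_p, y, lr=0.1, reg=0.01):
    """One in-place gradient step on an L2-regularized logistic loss.

    y is 1 for an access labeled inappropriate and 0 otherwise.
    """
    # Derivative of the log-loss with respect to the score.
    err = predict(w, x, alpha_u, beta_p) - y
    for k in range(len(w)):
        w[k] -= lr * (err * x[k] + reg * w[k])
    for k in range(len(alpha_u)):
        a, b = alpha_u[k], beta_p[k]  # use pre-update values for both gradients
        alpha_u[k] -= lr * (err * b + reg * a)
        beta_p[k] -= lr * (err * a + reg * b)
```

In this sketch the latent vectors play the role of the per-user and per-patient “fingerprint”: they are learned only from the history of labeled accesses, so two accesses with identical explicit features can still receive different scores.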


Keywords: Access violation; Collaborative filtering; Electronic health records; Privacy breach detection



The authors were funded in part by the NIH grants K99LM011392, R01LM009520, U54HL108460, R01HS019913, and UL1RR031980. We thank Janice M. Grillo, Rose Mandelbaum, Debra Mikels, Bhakti Patel and Partners HealthCare for their contributions to this research. We also thank the anonymous reviewers for their valuable comments that improved the presentation of the paper.

Supplementary material

10994_2013_5376_MOESM1_ESM.pdf (429 kb)



Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Aditya Krishna Menon (1)
  • Xiaoqian Jiang (1)
  • Jihoon Kim (1)
  • Jaideep Vaidya (2)
  • Lucila Ohno-Machado (1)

  1. UC San Diego, La Jolla, USA
  2. Rutgers University, Newark, USA
