
Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding

  • Patient Facing Systems
Journal of Medical Systems

Abstract

Evidence-based medicine often requires identifying patients with similar conditions, which are frequently captured as sequences of ICD (International Classification of Diseases; World Health Organization 2013) codes. Because no satisfactory prior solution exists for matching ICD-10 code sequences, this paper presents a method that effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages recent progress in representation learning of individual ICD-10 codes and explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that the proposed method achieves significantly higher matching performance than state-of-the-art methods that ignore the sequential order. Our method better identifies similar patients across a number of clinical outcomes, including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, the method can be adapted to work with other codified sequence data.
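
To make the contrast between order-aware and order-ignoring matching concrete, the following Python sketch compares two toy code sequences in both ways. It is an illustration under stated assumptions, not the authors' WVM: the two-dimensional vectors in EMB are made-up values rather than learned embeddings, all function names are hypothetical, and dynamic time warping (DTW) is swapped in here as a generic order-sensitive alignment.

```python
import math

# ASSUMPTION: hypothetical 2-d embeddings with made-up values, for illustration only.
# Real ICD-10 code vectors would be learned from data and have many more dimensions.
EMB = {
    "C50": (1.0, 0.0),
    "I10": (0.0, 1.0),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(seq):
    """Average the code vectors of a sequence; the order of codes is lost."""
    vecs = [EMB[code] for code in seq]
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def bag_distance(s, t):
    """Order-ignoring distance: compare the averaged code vectors."""
    return cosine_distance(mean_vector(s), mean_vector(t))


def dtw_distance(s, t):
    """Order-aware distance: DTW over per-code cosine distances.

    ASSUMPTION: DTW is a generic stand-in, not the matching scheme of the paper.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(EMB[s[i - 1]], EMB[t[j - 1]])
            # Extend the cheapest of the three admissible alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two hypothetical patients with the same codes recorded in opposite order.
patient_a = ["C50", "I10"]
patient_b = ["I10", "C50"]

print("order-ignoring distance:", bag_distance(patient_a, patient_b))
print("order-aware distance:", dtw_distance(patient_a, patient_b))
```

In this toy case the two sequences contain the same codes in reverse order, so the averaged representation cannot tell them apart, while the alignment-based distance can.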




Notes

  1. The R code for WVM and the other methods is available at https://github.com/nphdang/WVM

  2. Ethics approval was obtained from the NSW Population and Health Services Research Ethics Committee (AU RED Reference: HREC/15/CIPHS/1)

  3. We exclude the runtime of learning ICD code vectors in CSM and our WVM because this step is negligible; it took only 106 seconds in our experiment.

References

  1. World Health Organization: International Classification of Diseases (ICD). http://www.who.int/classifications/icd/en/, 2013

  2. World Health Organization: International statistical classification of diseases and related health problems 10th revision. [Online]. Available: http://apps.who.int/classifications/icd10/browse/2010/en, 2010

  3. Australian Consortium for Classification Development: ICD-10-AM. [Online]. Available: https://www.accd.net.au/Icd10.aspx, 2017

  4. O’Malley, K., Cook, K., Price, M., Wildes, K. R., Hurdle, J., and Ashton, C., Measuring diagnoses: ICD code accuracy. Health Serv. Res. 40:1620–1639, 2005.


  5. Wang, F., Hu, J., and Sun, J.: Medical prognosis based on patient similarity and expert feedback. In: The 21st International Conference on Pattern Recognition, pp. 1799–1802, IEEE, 2012.

  6. Choi, E., Schuetz, A., Stewart, W. F., and Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv:1602.03686, 2016

  7. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119, 2013.

  8. Lee, J., Maslove, D.M., and Dubin, J., Personalized mortality prediction driven by electronic medical data and a patient similarity metric. PloS One 10(5):e0127428, 2015.


  9. Carnaby-Mann, G., and Crary, M., McNeill dysphagia therapy program: a case-control study. Arch. Phys. Med. Rehabil. 91(5):743–749, 2010.

  10. Hielscher, T., Spiliopoulou, M., Völzke, H., and Kühn, J.-P.: Using participant similarity for the classification of epidemiological data on hepatic steatosis. In: The 27th International Symposium on Computer-Based Medical Systems, pp. 1–7, IEEE, 2014.

  11. Le, Q., and Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196, 2014.

  12. Levy, O., Goldberg, Y., and Dagan, I., Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3:211–225, 2015.


  13. Grover, A., and Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD, pp. 855–864, ACM, 2016.

  14. Nguyen, D., Luo, W., Nguyen, T. D., Venkatesh, S., and Phung, D.: Learning graph representation via frequent subgraphs. In: SDM. Accepted, SIAM, 2018.

  15. Moen, H., Ginter, F., Marsi, E., Peltonen, L.-M., Salakoski, T., and Salanterä, S., Care episode retrieval: distributional semantic models for information retrieval in the clinical domain. BMC Med. Inform. Decis. Mak. 15(2):1, 2015.


  16. Nguyen, P., Tran, T., Wickramasinghe, N., and Venkatesh, S., Deepr: a convolutional net for medical records. IEEE J. Biomed. Health Inform. 21(1):22–30, 2017.


  17. Choi, E., Bahadori, M. T., Searles, E., Coffey, C., Thompson, M., Bost, J., Tejedor-Sojo, J., and Sun, J.: Multi-layer representation learning for medical concepts. In: KDD, pp. 1495–1504, ACM, 2016.

  18. Choi, Y., Chiu, C. Y.-I., and Sontag, D.: Learning low-dimensional representations of medical concepts. In: AMIA Summits on Translational Science Proceedings, pp. 41–51, 2016.

  19. Mikolov, T., Chen, K., Corrado, G., and Dean, J.: Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013

  20. Pearce, N., Analysis of matched case-control studies. BMJ 352:i969, 2016.


  21. Nguyen, D., Luo, W., Phung, D., and Venkatesh, S.: Exceptional contrast set mining: moving beyond the deluge of the obvious. In: Australasian Joint Conference on Artificial Intelligence, pp. 455–468. Springer, Berlin, 2016.

  22. Bigus, J., Campbell, M., Carmeli, B., Cefkin, M., Chang, H., Chen-Ritzo, C.-H., Cody, W., Ebadollahi, S., Evfimievski, A., Farkash, A., et al., Information technology for healthcare transformation. IBM Journal of Research and Development 55(5):6–20, 2011.


  23. Thomas, K., Rahman, M., Mor, V., and Intrator, O., Influence of hospital and nursing home quality on hospital readmissions. The American Journal of Managed Care 20(11):e523, 2014.


  24. Håkonsen, S., Pedersen, P., Bjerrum, M., Bygholm, A., and Peters, M., Nursing minimum data sets for documenting nutritional care for adults in primary healthcare: a scoping review. JBI Database of Systematic Reviews and Implementation Reports 16(1):117–139, 2018.


  25. Maaten, L. V. D., and Hinton, G., Visualizing data using t-SNE. Journal of Machine Learning Research 9:2579–2605, 2008.

  26. Futoma, J., Morris, J., and Lucas, J., A comparison of models for predicting early hospital readmissions. Journal of Biomedical Informatics 56:229–238, 2015.


  27. Pham, T., Tran, T., Phung, D., and Venkatesh, S.: Deepcare: a deep dynamic memory model for predictive medicine. In: PAKDD, pp. 30–41. Springer, Berlin, 2016.

  28. Turgeman, L., May, J., and Sciulli, R., Insights from a machine learning model for predicting the hospital length of stay (LOS) at the time of admission. Expert Systems with Applications 78:376–385, 2017.

  29. Chaou, C.-H., Chen, H.-H., Chang, S.-H., Tang, P., Pan, S.-L., Yen, A. M.-F., and Chiu, T.-F., Predicting length of stay among patients discharged from the emergency department using an accelerated failure time model. PloS One 12(1):e0165756, 2017.

  30. Nguyen, D., Nguyen, T. D., Luo, W., and Venkatesh, S.: Trans2vec: learning transaction embedding via items and frequent itemsets. In: PAKDD. Accepted. Springer, Berlin, 2018.

  31. Pobiedina, N., and Ichise, R., Citation count prediction as a link prediction problem. Applied Intelligence 44(2):252–268, 2016.



Acknowledgments

This work is partially supported by the Telstra-Deakin Centre of Excellence (CoE) in Big Data and Machine Learning. Dinh Phung gratefully acknowledges the partial support from the Australian Research Council (ARC).

Author information


Corresponding author

Correspondence to Dang Nguyen.

Ethics declarations

Conflict of Interest

The authors have no conflict of interest to declare.

Ethical Approval

Ethics approval was obtained from the New South Wales Population and Health Services Research Ethics Committee (AU RED Reference: HREC/15/CIPHS/1).

Informed Consent

This study is a secondary analysis of routinely collected data, and consent had been obtained by the original data guarantor.

Additional information

This article is part of the Topical Collection on Patient Facing Systems


About this article


Cite this article

Nguyen, D., Luo, W., Venkatesh, S. et al. Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding. J Med Syst 42, 94 (2018). https://doi.org/10.1007/s10916-018-0951-4


  • DOI: https://doi.org/10.1007/s10916-018-0951-4
