Interactive Deep Metric Learning for Healthcare Cohort Discovery

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1127))

Abstract

Given the continuous growth of large-scale, complex electronic healthcare data, data-driven healthcare cohort discovery, facilitated by machine learning tools combined with domain expert knowledge, is required to gain further insight into the healthcare system. Specifically, clustering plays a crucial role in healthcare cohort discovery, and metric learning is able to incorporate expert feedback to generate more fit-for-purpose clustering outputs. However, most existing metric learning methods assume that all labelled instances already exist, which is not always true in real-world applications. In addition, big data in healthcare brings new challenges to metric learning in handling complex structured data. In this paper, we propose a novel systematic method, namely Interactive Deep Metric Learning (IDML), which uses an interactive process to iteratively incorporate feedback from domain experts to identify cohorts that are more relevant to a particular pre-defined purpose. Moreover, the proposed method leverages powerful deep learning-based embedding techniques to incrementally obtain effective representations of the complex structures inherent in patient journey data. We experimentally evaluate the effectiveness of the proposed IDML on two public healthcare datasets. The proposed method has also been implemented in an interactive cohort discovery tool for a real-world application in healthcare.



Author information

Correspondence to Yang Wang, Guodong Long, or Xueping Peng.

Appendices

A LSTM Formulation

$$\begin{aligned} \begin{array}{l} i_t = \sigma (W^{i}v_t + U^{i}h_{t-1})\\ f_t = \sigma (W^{f}v_t + U^{f}h_{t-1})\\ o_t = \sigma (W^{o}v_t + U^{o}h_{t-1})\\ \widetilde{c}_{t} = \tanh (W^{c}v_t + U^{c}h_{t-1})\\ c_{t} = f_t * {c}_{t-1} + i_{t} * \widetilde{c}_{t}\\ h_t = \tanh (c_{t})*o_t \end{array} \end{aligned}$$
(6)

where \(i_t, f_t, o_t\) are the input, forget, and output gates respectively. Each gate is a small neural network that controls which information is allowed onto the cell state; during training the gates learn which information is relevant to keep and which to forget. \(v_t\) is the input to the network at time t, \(h_{t-1}\) is the output at time \(t-1\), and \({c}_{t-1}\) is the internal cell state at \(t-1\).
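As a concrete illustration, Eq. (6) can be sketched directly in NumPy. The dimensions, the random weight initialisation, and the helper name `lstm_step` are illustrative assumptions, not part of the paper; `*` denotes element-wise multiplication as in the formulation above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(v_t, h_prev, c_prev, W, U):
    """One LSTM step following Eq. (6); W and U map gate name -> weight matrix."""
    i = sigmoid(W["i"] @ v_t + U["i"] @ h_prev)         # input gate
    f = sigmoid(W["f"] @ v_t + U["f"] @ h_prev)         # forget gate
    o = sigmoid(W["o"] @ v_t + U["o"] @ h_prev)         # output gate
    c_tilde = np.tanh(W["c"] @ v_t + U["c"] @ h_prev)   # candidate cell state
    c = f * c_prev + i * c_tilde                        # new cell state (element-wise)
    h = np.tanh(c) * o                                  # new hidden state
    return h, c

# Illustrative dimensions and random weights (not from the paper).
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = {g: rng.standard_normal((d_h, d_in)) * 0.1 for g in "ifoc"}
U = {g: rng.standard_normal((d_h, d_h)) * 0.1 for g in "ifoc"}

h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(5):                 # unroll over a short input sequence
    h, c = lstm_step(rng.standard_normal(d_in), h, c, W, U)
```

In the paper's setting, \(v_t\) would be the embedded medical event at visit t, and the final \(h_t\) serves as the patient-journey representation.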

B Clustering Evaluation Measures

  • The Normalized Mutual Information (NMI) is defined as:

    $$\begin{aligned} NMI (\widehat{K};K) = \dfrac{2\times I(\widehat{K};K)}{\left[ H( \widehat{K}) +H(K) \right] } \end{aligned}$$
    (7)

    where \(I(\widehat{K};K)\) is the mutual information and the entropies \(H(\widehat{K})\) and H(K) are used for normalizing the mutual information to be in the range of [0, 1]. The higher the NMI is, the better the corresponding clustering is.

  • The Adjusted Rand Index (ARI) of clustering is defined as:

    $$\begin{aligned} ARI = \dfrac{RI - E[RI]}{\max (RI) - E[RI]} \end{aligned}$$
    (8)

    where \(RI = \dfrac{a+b}{C^{N}_{2}}\), a is the number of patient pairs that come from the same cohort and are grouped into the same cluster, and b is the number of patient pairs that belong to different cohorts and are grouped into different clusters. N is the total number of patients, so \(C^{N}_{2}\) is the total number of patient pairs. ARI values lie in the range [−1, 1]. The higher the ARI is, the better the corresponding clustering is.

  • The Purity of clustering is defined as:

    $$\begin{aligned} Purity (\widehat{K},K) = \dfrac{1}{N}\sum ^{|\widehat{K}|}_{i=1}\max _{j}\left| \widehat{k_{i}}\cap k_{j}\right| , \end{aligned}$$
    (9)

    where \(\widehat{K}=\{\widehat{k_{1}}, \widehat{k_{2}},\dots , \widehat{k_{|\widehat{K}|}}\}\) is the set of clusters produced by the chosen clustering algorithm and \(|\widehat{K}|\) is the total number of clusters. \(K=\{k_{1}, k_{2},\dots , k_{|K|}\}\) is the set of patient cohorts (i.e. the ground truth). \(\max _{j}\left| \widehat{k_{i}}\cap k_{j}\right| \) is the size of the intersection between cluster \(\widehat{k_{i}}\) and the patient cohort \(k_{j}\) that occurs most frequently within it. Purity values lie in the range [0, 1]. The higher the Purity is, the better the corresponding clustering is.
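The three measures above can be computed in a few lines. NMI and ARI are available in scikit-learn (whose default arithmetic averaging for NMI matches the \(2 I / (H(\widehat{K})+H(K))\) form of Eq. (7)); Purity is implemented directly from Eq. (9). The small `truth`/`pred` example is an illustrative assumption, not data from the paper.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def purity(labels_true, labels_pred):
    """Purity per Eq. (9): each cluster contributes the size of its most
    frequent ground-truth cohort, normalized by the number of patients N."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for k in np.unique(labels_pred):
        members = labels_true[labels_pred == k]      # true cohorts inside cluster k
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()                        # size of the dominant cohort
    return total / len(labels_true)

# Toy example: 6 patients, 2 ground-truth cohorts, one misassigned patient.
truth = [0, 0, 0, 1, 1, 1]
pred  = [0, 0, 1, 1, 1, 1]

print("NMI:   ", normalized_mutual_info_score(truth, pred))
print("ARI:   ", adjusted_rand_score(truth, pred))
print("Purity:", purity(truth, pred))   # (2 + 3) / 6 ≈ 0.833
```

All three scores are invariant to permutations of the cluster labels, which is why they are suitable for comparing a clustering against cohort ground truth.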


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, Y., Long, G., Peng, X., Clarke, A., Stevenson, R., Gerrard, L. (2019). Interactive Deep Metric Learning for Healthcare Cohort Discovery. In: Le, T., et al. Data Mining. AusDM 2019. Communications in Computer and Information Science, vol 1127. Springer, Singapore. https://doi.org/10.1007/978-981-15-1699-3_17

  • DOI: https://doi.org/10.1007/978-981-15-1699-3_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1698-6

  • Online ISBN: 978-981-15-1699-3

  • eBook Packages: Computer Science (R0)
