Embedded Word Representations for Rich Indexing: A Case Study for Medical Records

  • Conference paper
Case-Based Reasoning Research and Development (ICCBR 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11156)

Abstract

Case indexing decisions must often confront the tradeoff between rich semantic indexing schemes, which provide effective retrieval at large indexing cost, and shallower indexing schemes, which enable low-cost indexing but may be less reliable. Indexing for textual case-based reasoning is often based on information retrieval approaches that minimize index acquisition cost but sacrifice semantic information. This paper presents JointEmbed, a method for automatically generating rich indices. JointEmbed generates continuous vector space embeddings that implicitly capture semantic information, leveraging multiple knowledge sources such as free-text cases and pre-existing knowledge graphs. JointEmbed generates effective indices by applying pTransR, a novel approach for modelling knowledge graphs, to encode and summarize the contents of domain knowledge resources. JointEmbed is applied to the medical CBR task of retrieving relevant patient electronic health records, for which potential health consequences make retrieval quality paramount. An evaluation indicates that JointEmbed outperforms previous methods.
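
As a rough illustration of what retrieval over vector-space indices can look like, the sketch below ranks stored cases by cosine similarity to a query vector. It is not JointEmbed or pTransR: the function names (cosine, rank_cases), the record identifiers, and the three-dimensional toy vectors are all hypothetical, and cosine similarity is assumed here only as a common choice for comparing embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_cases(query_vec, case_index, k=3):
    """Return the k (case_id, similarity) pairs most similar to query_vec."""
    scored = [(case_id, cosine(query_vec, vec)) for case_id, vec in case_index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: case id -> embedding vector (made-up values).
case_index = {
    "record_a": [0.9, 0.1, 0.0],
    "record_b": [0.1, 0.8, 0.3],
    "record_c": [0.7, 0.2, 0.1],
}

# Rank the stored records against a made-up query embedding.
top = rank_cases([1.0, 0.0, 0.0], case_index, k=2)
for case_id, score in top:
    print(f"{case_id}: {score:.3f}")
```

In a real system the vectors would come from a learned embedding model rather than being written by hand, and the similarity measure would be whichever one the indexing method prescribes.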

Notes

  1. http://www.ncbi.nlm.nih.gov/pmc/tolls/openftlist.

  2. https://clinicaltrials.gov.

  3. https://www.ncbi.nlm.nih.gov/mesh.

Acknowledgement

This work is supported by the Indiana University Precision Health Initiative.

Author information

Correspondence to David Leake.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Metcalf, K., Leake, D. (2018). Embedded Word Representations for Rich Indexing: A Case Study for Medical Records. In: Cox, M., Funk, P., Begum, S. (eds) Case-Based Reasoning Research and Development. ICCBR 2018. Lecture Notes in Computer Science (LNAI), vol 11156. Springer, Cham. https://doi.org/10.1007/978-3-030-01081-2_18

  • DOI: https://doi.org/10.1007/978-3-030-01081-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01080-5

  • Online ISBN: 978-3-030-01081-2

  • eBook Packages: Computer Science, Computer Science (R0)
