Vector Space Models for Encoding and Retrieving Longitudinal Medical Record Data

  • Conference paper
Biomedical Data Management and Graph Online Querying (Big-O(Q) 2015, DMAH 2015)

Abstract

Vector space models (VSMs) are widely used as information retrieval methods and have been adapted to many applications. In this paper, we propose a novel use of VSMs for classification and retrieval of longitudinal electronic medical record data. These data contain sequences of clinical events that result from treatment decisions, but the treatment plan itself is not recorded with the events. The goals of our VSM methods are (1) to identify which plan a specific patient treatment sequence best matches and (2) to find patients whose treatment histories most closely follow a specific plan. We first build a traditional VSM that uses standard terms corresponding to the events found in clinical plans and treatment histories. We also consider temporal terms that represent binary relationships between these events: precedence or co-occurrence. We create four alternative VSMs that use different combinations of standard and temporal terms as dimensions, and we evaluate their performance using manually annotated data on chemotherapy plans and treatment histories for breast cancer patients. In classifying treatment histories, the best approach used temporal terms and achieved 87% accuracy in identifying the correct clinical plan. For information retrieval, the traditional VSM performed best. Our results indicate that VSMs perform well for classification and retrieval of longitudinal electronic medical records, but that performance depends on how the model is constructed.
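The two tasks described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the term weighting, the similarity measure, or the exact term combinations of the four models, so the sketch assumes raw term counts, cosine similarity, and two on/off flags for standard and temporal terms. The event labels (`A`, `B`, `C`), plan names, patient identifiers, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# A sequence is a list of (time_step, event_label) pairs. All labels are made up.

def standard_terms(seq):
    """Standard terms: one dimension per event label, weighted by raw count (assumed)."""
    return Counter(event for _, event in seq)

def temporal_terms(seq):
    """Temporal terms: 'a<b' when a precedes b, 'a=b' when they share a time step."""
    terms = Counter()
    for (t1, e1), (t2, e2) in combinations(sorted(seq), 2):
        if t1 == t2:
            a, b = sorted((e1, e2))  # co-occurrence is symmetric
            terms[f"{a}={b}"] += 1
        else:                        # sorted input guarantees t1 < t2 here
            terms[f"{e1}<{e2}"] += 1
    return terms

def encode(seq, use_standard=True, use_temporal=True):
    """Build a sparse vector from the selected term types."""
    vec = Counter()
    if use_standard:
        vec.update(standard_terms(seq))
    if use_temporal:
        vec.update(temporal_terms(seq))
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def classify(history, plans, **flags):
    """Goal (1): name of the plan whose vector is closest to the history."""
    h = encode(history, **flags)
    return max(plans, key=lambda name: cosine(h, encode(plans[name], **flags)))

def retrieve(plan, histories, **flags):
    """Goal (2): patient ids ranked by similarity of their history to the plan."""
    p = encode(plan, **flags)
    return sorted(histories,
                  key=lambda pid: cosine(p, encode(histories[pid], **flags)),
                  reverse=True)

# Two hypothetical plans with the same events in opposite order.
plans = {
    "plan_AB_then_C": [(0, "A"), (0, "B"), (1, "A"), (1, "B"), (2, "C"), (3, "C")],
    "plan_C_then_AB": [(0, "C"), (1, "C"), (2, "A"), (2, "B"), (3, "A"), (3, "B")],
}
histories = {
    "p1": [(0, "A"), (0, "B"), (1, "A"), (2, "C"), (3, "C")],  # one B dose missing
    "p2": [(0, "C"), (1, "C"), (2, "A"), (2, "B")],
}

best_plan = classify(histories["p1"], plans, use_standard=False)
ranking = retrieve(plans["plan_AB_then_C"], histories, use_standard=False)
```

In this toy data both plans contain the same events with the same counts, so standard terms alone give them identical vectors and cannot separate them; only the temporal terms encode the order. This is a property of the constructed example and says nothing about the results reported in the paper.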

Corresponding author

Correspondence to Haider Syed.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Syed, H., Das, A.K. (2016). Vector Space Models for Encoding and Retrieving Longitudinal Medical Record Data. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) 2015, DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_1

  • DOI: https://doi.org/10.1007/978-3-319-41576-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41575-8

  • Online ISBN: 978-3-319-41576-5

  • eBook Packages: Computer Science (R0)
