
Leaving No Stone Unturned: Using Machine Learning Based Approaches for Information Extraction from Full Texts of a Research Data Warehouse

  • Conference paper
Data Integration in the Life Sciences (DILS 2018)

Abstract

Data in healthcare and routine medical treatment are growing fast. Because of this growth and the variety of the data, possible correlations within them are becoming ever more complex. Tools that ease the daily routine of clinical researchers are increasingly based on machine learning (ML) algorithms. Such tools can support data management, data integration, or even content classification. Besides commercial products, there are many solutions that users develop themselves for their own specific research question or task. One such task is described in this work: qualifying the Weber fracture, an ankle joint fracture, from radiological findings with the help of supervised machine learning algorithms. To do so, the findings were first processed with common natural language processing (NLP) methods. For the classification step, we used the bag-of-words approach to combine the medical findings with their metadata, and compared several common classifiers to obtain the best results. To conduct this study, we used the data and the technology of the Enterprise Clinical Research Data Warehouse (ECRDW) of Hannover Medical School. This paper shows how machine learning and NLP techniques can be implemented in the data warehouse integration process in order to provide consolidated, processed, and qualified data that can be queried for teaching and research purposes.
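The bag-of-words idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the example findings are invented English toy strings (not clinical data), the three labels assume the usual Weber types A, B, and C, the `meta_` token prefix and the `modality` field are hypothetical, and a small hand-written multinomial naive Bayes stands in for whichever classifiers the study actually compared.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy examples (NOT real clinical data): (finding text, metadata, label).
TRAIN = [
    ("fracture of the fibula below the syndesmosis", {"modality": "xray"}, "A"),
    ("distal fibula fracture below syndesmosis level", {"modality": "xray"}, "A"),
    ("fibula fracture at the level of the syndesmosis", {"modality": "xray"}, "B"),
    ("oblique fracture at syndesmosis level", {"modality": "ct"}, "B"),
    ("fibula fracture above the syndesmosis", {"modality": "xray"}, "C"),
    ("high fibula fracture above syndesmosis with rupture", {"modality": "ct"}, "C"),
]

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"of", "the", "at", "with"}


def featurize(text, meta):
    """Build one bag of words from the finding text plus its metadata."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    # Hypothetical convention: metadata fields become prefixed pseudo-tokens.
    tokens += [f"meta_{key}={value}" for key, value in sorted(meta.items())]
    return Counter(tokens)


def train_nb(samples):
    """Collect the counts needed for a multinomial naive Bayes classifier."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, meta, label in samples:
        bag = featurize(text, meta)
        class_counts[label] += 1
        token_counts[label].update(bag)
        vocab.update(bag)
    return class_counts, token_counts, vocab


def predict(model, text, meta):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    bag = featurize(text, meta)
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for token, count in bag.items():
            if token in vocab:  # ignore tokens never seen in training
                score += count * math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
prediction = predict(model, "fracture above the syndesmosis", {"modality": "xray"})
```

Comparing several classifiers, as the abstract describes, would additionally require a held-out evaluation set and a common metric; that step is left out of this sketch.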


Notes

  1. ICD-GM: “International Classification of Diseases, German Modification” is the official classification for diagnoses in outpatient and inpatient health care in Germany.


Author information

Corresponding authors

Correspondence to Johanna Fiebeck or Svetlana Gerbel.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Fiebeck, J., Laser, H., Winther, H.B., Gerbel, S. (2019). Leaving No Stone Unturned: Using Machine Learning Based Approaches for Information Extraction from Full Texts of a Research Data Warehouse. In: Auer, S., Vidal, ME. (eds) Data Integration in the Life Sciences. DILS 2018. Lecture Notes in Computer Science, vol 11371. Springer, Cham. https://doi.org/10.1007/978-3-030-06016-9_5

  • DOI: https://doi.org/10.1007/978-3-030-06016-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-06015-2

  • Online ISBN: 978-3-030-06016-9

  • eBook Packages: Computer Science, Computer Science (R0)
