A Bootstrapping Approach to Symptom Entity Extraction on Chinese Electronic Medical Records

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2016, CCL 2016)

Abstract

Symptom entities are widely distributed in Chinese electronic medical records (EMRs). Previous approaches to symptom entity extraction usually extract continuous strings as symptom entities and require massive human effort for corpus annotation. We represent each symptom entity as a two-tuple <subject, lesion> and design a soft pattern matching method to locate these tuples in EMR sentences. Our bootstrapping approach requires only a few annotated symptom tuples and allows iterative extraction from large electronic medical record databases without human supervision. Furthermore, the method uses the extracted tuples to annotate symptom entities in the EMRs. Starting with 60 annotated entities, our approach reached an F-score of 81.40% in the task of extracting 3,150 entities from 992 sets of electronic medical records.
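The bootstrapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how patterns are built or scored, so this toy version assumes that a pattern is simply the text between a subject and a lesion, and that "soft" matching means a string similarity above a threshold (here `difflib.SequenceMatcher`). The names `candidate_pairs`, `bootstrap`, `threshold`, and `max_gap` are hypothetical, and the English sentences stand in for segmented Chinese EMR text.

```python
from difflib import SequenceMatcher


def candidate_pairs(sentence, max_gap=3):
    """Yield (subject, infix, lesion) for token pairs separated by 1..max_gap tokens."""
    tokens = sentence.split()
    for i in range(len(tokens)):
        for j in range(i + 2, min(i + max_gap + 2, len(tokens))):
            yield tokens[i], " ".join(tokens[i + 1:j]), tokens[j]


def bootstrap(sentences, seeds, threshold=0.7, max_rounds=10):
    """Grow a set of <subject, lesion> tuples from a few seed tuples.

    Assumption: a pattern is the infix between a known subject and lesion,
    and a candidate matches softly if its infix is similar enough to any pattern.
    """
    known = set(seeds)
    for _ in range(max_rounds):
        # Step 1: learn patterns from sentences that contain known tuples.
        patterns = {
            infix
            for sent in sentences
            for subj, infix, les in candidate_pairs(sent)
            if (subj, les) in known
        }
        # Step 2: apply the patterns softly to extract new tuples.
        found = {
            (subj, les)
            for sent in sentences
            for subj, infix, les in candidate_pairs(sent)
            if any(SequenceMatcher(None, infix, p).ratio() >= threshold
                   for p in patterns)
        }
        if found <= known:  # nothing new: stop iterating
            break
        known |= found
    return known


# Toy data (hypothetical, for illustration only).
SENTENCES = [
    "abdomen shows tenderness",
    "chest shows swelling",
    "liver showed enlargement",
    "patient denies fever",
]
SEEDS = {("abdomen", "tenderness")}

result = bootstrap(SENTENCES, SEEDS)
print(sorted(result))
```

With a single seed tuple, the loop learns the infix "shows", accepts the near-match "showed" through the similarity threshold, and rejects "denies". A real system would need a tighter notion of candidate subjects and lesions and some scoring of patterns to limit semantic drift across iterations.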

Corresponding author

Correspondence to Tianyi Qin.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Qin, T., Guan, Y. (2016). A Bootstrapping Approach to Symptom Entity Extraction on Chinese Electronic Medical Records. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_34

  • DOI: https://doi.org/10.1007/978-3-319-47674-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47673-5

  • Online ISBN: 978-3-319-47674-2

  • eBook Packages: Computer Science, Computer Science (R0)
