A Hybrid Approach for Biomedical Relation Extraction Using Finite State Automata and Random Forest-Weighted Fusion

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Abstract

The automatic extraction of relations between medical entities in text is an important task because of the many applications it supports, from question answering systems to the development of medical ontologies. Many methodologies have been proposed for this task over the years. Of particular interest are hybrid approaches, which combine different techniques to improve on the performance of each one individually. In this study, we extend a previously established hybrid framework for medical relation extraction by enhancing its pattern-based component and applying a more sophisticated weighting method. Most notably, we replace regular expressions with finite state automata in the pattern-building component, and we replace the fusion component with a weighting strategy based on the operational capabilities of the Random Forests algorithm. The experimental results indicate that the proposed approach outperforms the aforementioned well-established hybrid methodology and other state-of-the-art approaches.
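To make the two components named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the automata, the relation labels, or how the Random Forest-based weights are computed, so everything below is an assumption for illustration. The transition table, the `<TREATMENT>` and `<PROBLEM>` placeholders, the `TrIP` label, and the functions `fsa_accepts` and `fuse` are hypothetical, and the per-class weights are simply supplied by hand (in practice they might be derived from a classifier's per-class validation or out-of-bag accuracy).

```python
from typing import Dict, List, Tuple

# Hypothetical token-level automaton. States are ints; transitions are keyed
# by (state, symbol). The symbol "*" is a wildcard that consumes any token.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "<TREATMENT>"): 1,
    (1, "*"): 1,            # skip filler tokens after the first entity
    (1, "improves"): 2,
    (2, "*"): 2,            # skip filler tokens before the second entity
    (2, "<PROBLEM>"): 3,
}
ACCEPTING = {3}


def fsa_accepts(tokens: List[str]) -> bool:
    """Simulate the automaton over a token sequence, tracking all live states."""
    states = {0}
    for tok in tokens:
        nxt = set()
        for s in states:
            if (s, tok) in TRANSITIONS:
                nxt.add(TRANSITIONS[(s, tok)])
            if (s, "*") in TRANSITIONS:
                nxt.add(TRANSITIONS[(s, "*")])
        states = nxt
        if not states:          # no live state left: reject early
            return False
    return bool(states & ACCEPTING)


def fuse(
    pattern_scores: Dict[str, float],
    learner_scores: Dict[str, float],
    w_pattern: Dict[str, float],
    w_learner: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Per-class weighted sum of two score dicts; returns the top class and all fused scores."""
    classes = set(pattern_scores) | set(learner_scores)
    fused = {
        c: w_pattern.get(c, 0.0) * pattern_scores.get(c, 0.0)
        + w_learner.get(c, 0.0) * learner_scores.get(c, 0.0)
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Illustrative usage with made-up scores and weights.
sentence = ["<TREATMENT>", "significantly", "improves", "the", "<PROBLEM>"]
pattern_scores = {"TrIP": 1.0 if fsa_accepts(sentence) else 0.0, "none": 0.0}
learner_scores = {"TrIP": 0.4, "none": 0.6}
label, fused = fuse(
    pattern_scores,
    learner_scores,
    w_pattern={"TrIP": 0.7, "none": 0.3},
    w_learner={"TrIP": 0.3, "none": 0.7},
)
```

In this toy run the pattern matcher fires and carries a higher weight for the `TrIP` class, so the fused decision follows the pattern-based component even though the statistical scores alone would favor `none`.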

Acknowledgments

This work was supported by the project KRISTINA (H2020-645012), funded by the European Commission. Deidentified clinical records used in this research were provided by the i2b2 National Center for Biomedical Computing funded by U54LM008748 and were originally prepared for the Shared Tasks for Challenges in NLP for Clinical Data organized by Dr. Ozlem Uzuner, i2b2 and SUNY.

Author information

Corresponding author

Correspondence to Thanassis Mavropoulos.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Mavropoulos, T., Liparas, D., Symeonidis, S., Vrochidis, S., Kompatsiaris, I. (2018). A Hybrid Approach for Biomedical Relation Extraction Using Finite State Automata and Random Forest-Weighted Fusion. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_35

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science, Computer Science (R0)
