A Hybrid Approach for Biomedical Relation Extraction Using Finite State Automata and Random Forest-Weighted Fusion

Mavropoulos, Thanassis; Liparas, Dimitris; Symeonidis, Spyridon; Vrochidis, Stefanos; Kompatsiaris, Ioannis

doi:10.1007/978-3-319-77113-7_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

The automatic extraction of relations between medical entities found in medical texts is an important task because of the many applications it can support, from question answering systems to the development of medical ontologies. Many methodologies have been proposed and applied to this task over the years. Of particular interest are hybrid approaches, which combine different techniques so that the combination outperforms each technique used on its own. In this study, we extend a previously established hybrid framework for medical relation extraction by enhancing its pattern-based part and by applying a more sophisticated weighting method. Most notably, we replace regular expressions with finite state automata in the pattern-building part, and we replace the fusion part with a weighting strategy based on the operational capabilities of the Random Forests algorithm. The experimental results indicate that the proposed approach outperforms both the aforementioned well-established hybrid methodology and other state-of-the-art approaches.
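
As a rough illustration of the weighted-fusion idea only, the sketch below combines the per-class scores of a pattern-based component and a machine-learning component using class-specific weights. It is not the authors' implementation: the abstract does not say how the weights are derived from Random Forests, so the per-class accuracy estimates used here (which could, for instance, come from out-of-bag predictions) are an assumption, and the function names, relation labels, and numbers are all hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # relation label -> score or accuracy estimate


def class_weights(accuracies: List[Scores]) -> List[Scores]:
    """Turn per-class accuracy estimates into per-class fusion weights.

    For every label, the weights of all components sum to 1.
    ASSUMPTION: accuracy estimates are given; how the paper obtains
    them from Random Forests is not stated in the abstract.
    """
    weights: List[Scores] = [{} for _ in accuracies]
    for label in accuracies[0]:
        total = sum(acc[label] for acc in accuracies)
        for w, acc in zip(weights, accuracies):
            # Fall back to uniform weights if no component is ever right.
            w[label] = acc[label] / total if total > 0 else 1.0 / len(accuracies)
    return weights


def fuse(scores: List[Scores], weights: List[Scores]) -> str:
    """Return the label with the highest weighted sum of component scores."""
    fused = {
        label: sum(w[label] * s[label] for w, s in zip(weights, scores))
        for label in scores[0]
    }
    return max(fused, key=fused.get)


# Hypothetical labels and numbers, for illustration only.
pattern_acc = {"treats": 0.9, "causes": 0.5}    # pattern-based component
learner_acc = {"treats": 0.6, "causes": 0.75}   # machine-learning component
weights = class_weights([pattern_acc, learner_acc])

pattern_scores = {"treats": 0.5, "causes": 0.5}
learner_scores = {"treats": 0.3, "causes": 0.7}
predicted = fuse([pattern_scores, learner_scores], weights)
print(predicted)
```

Weighting per class rather than per component lets the fusion lean on whichever component is more reliable for a given relation type, which is the general motivation for this kind of late fusion.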

Acknowledgments

This work was supported by the project KRISTINA (H2020-645012), funded by the European Commission. Deidentified clinical records used in this research were provided by the i2b2 National Center for Biomedical Computing funded by U54LM008748 and were originally prepared for the Shared Tasks for Challenges in NLP for Clinical Data organized by Dr. Ozlem Uzuner, i2b2 and SUNY.

Author information

Authors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thermi-Thessaloniki, Greece
Thanassis Mavropoulos, Dimitris Liparas, Spyridon Symeonidis, Stefanos Vrochidis & Ioannis Kompatsiaris

Corresponding author

Correspondence to Thanassis Mavropoulos.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Mavropoulos, T., Liparas, D., Symeonidis, S., Vrochidis, S., Kompatsiaris, I. (2018). A Hybrid Approach for Biomedical Relation Extraction Using Finite State Automata and Random Forest-Weighted Fusion. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_35

DOI: https://doi.org/10.1007/978-3-319-77113-7_35
Published: 10 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)