Abstract
Open Information Extraction (OIE) is the process of extracting the clauses present in a text. Clause extraction is useful for several applications; however, existing OIE methods are not designed with the improvement of such applications in mind. In this paper, we present RCE-OIE, an OIE methodology based on a rule-based clause extraction engine that considers aspects such as the handling of coordinating conjunctions, negations, and relative clauses in order to improve semantic applications. We evaluated RCE-OIE on OIE datasets to show that our clause extraction approach is domain-independent and comparable to state-of-the-art OIE systems. RCE-OIE can also improve the performance of downstream applications. In particular, it significantly improves paraphrase identification on the Microsoft Research Paraphrase Corpus compared with existing OIE systems.
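The abstract names the phenomena the engine handles but does not give its rules. As an illustration only, the following sketch shows one way a rule-based extractor could treat coordinating conjunctions, negations, and relative clauses; it is not the RCE-OIE engine. It assumes the sentence has already been dependency-parsed with Universal Dependencies-style labels (`nsubj`, `obj`, `conj`, `acl:relcl`, `advmod`), it uses a hand-written parse in place of a real parser, and the names `Token`, `Clause`, and `extract_clauses` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Optional

@dataclass(frozen=True)
class Token:
    idx: int   # position in the sentence
    text: str
    pos: str   # coarse UD-style part-of-speech tag (assumed parser output)
    head: int  # index of the syntactic head; -1 for the root
    dep: str   # UD-style dependency relation to the head (assumed parser output)

class Clause(NamedTuple):
    subject: str
    relation: str
    obj: Optional[str]
    negated: bool

REL_PRONOUNS = {"who", "which", "that"}
NEGATION_WORDS = {"not", "n't", "never"}
NP_MODIFIERS = {"det", "amod", "compound"}

def children(tokens: List[Token], i: int) -> List[Token]:
    return [t for t in tokens if t.head == i]

def phrase(tokens: List[Token], head: Token) -> str:
    # Head word plus its determiners, adjectives and compounds, in sentence order.
    parts = [head] + [c for c in children(tokens, head.idx) if c.dep in NP_MODIFIERS]
    return " ".join(t.text for t in sorted(parts, key=lambda t: t.idx))

def find_subject(tokens: List[Token], verb: Token) -> Optional[Token]:
    subj = next((c for c in children(tokens, verb.idx) if c.dep == "nsubj"), None)
    # Relative-clause rule: in "Alice, who founded ...", the relative pronoun
    # is replaced by the noun that the relative clause modifies.
    if subj is not None and verb.dep == "acl:relcl" and subj.text.lower() in REL_PRONOUNS:
        return tokens[verb.head]
    # Coordinating-conjunction rule: a conjoined verb without its own subject
    # inherits the subject of the verb it is coordinated with.
    if subj is None and verb.dep == "conj":
        return find_subject(tokens, tokens[verb.head])
    return subj

def find_objects(tokens: List[Token], verb: Token) -> List[Token]:
    objs: List[Token] = []
    for c in children(tokens, verb.idx):
        if c.dep == "obj":
            objs.append(c)
            # Coordinated objects ("Bob and Carol") each yield their own clause.
            objs.extend(t for t in children(tokens, c.idx) if t.dep == "conj")
    return objs

def is_negated(tokens: List[Token], verb: Token) -> bool:
    # Negation rule: keep the clause but flag it rather than silently dropping "not".
    return any(
        c.dep == "neg" or (c.dep == "advmod" and c.text.lower() in NEGATION_WORDS)
        for c in children(tokens, verb.idx)
    )

def extract_clauses(tokens: List[Token]) -> List[Clause]:
    clauses: List[Clause] = []
    for verb in tokens:
        if verb.pos != "VERB":
            continue
        subj = find_subject(tokens, verb)
        if subj is None:
            continue  # no recoverable subject: skip rather than guess
        negated = is_negated(tokens, verb)
        for obj in find_objects(tokens, verb) or [None]:
            obj_text = phrase(tokens, obj) if obj is not None else None
            clauses.append(Clause(phrase(tokens, subj), verb.text, obj_text, negated))
    return clauses

def make_tokens(rows) -> List[Token]:
    return [Token(i, text, pos, head, dep) for i, (text, pos, head, dep) in enumerate(rows)]

# Hand-written parse of "Alice, who founded the company, hired Bob and Carol
# but did not fire Dave." (punctuation omitted for brevity).
SENTENCE = make_tokens([
    ("Alice", "PROPN", 5, "nsubj"),
    ("who", "PRON", 2, "nsubj"),
    ("founded", "VERB", 0, "acl:relcl"),
    ("the", "DET", 4, "det"),
    ("company", "NOUN", 2, "obj"),
    ("hired", "VERB", -1, "root"),
    ("Bob", "PROPN", 5, "obj"),
    ("and", "CCONJ", 8, "cc"),
    ("Carol", "PROPN", 6, "conj"),
    ("but", "CCONJ", 12, "cc"),
    ("did", "AUX", 12, "aux"),
    ("not", "PART", 12, "advmod"),
    ("fire", "VERB", 5, "conj"),
    ("Dave", "PROPN", 12, "obj"),
])

if __name__ == "__main__":
    for clause in extract_clauses(SENTENCE):
        print(clause)
```

Run as a script, it prints four clauses: "Alice founded the company", "Alice hired Bob", "Alice hired Carol", and "Alice fire Dave" with `negated=True`. Keeping negation as a flag, instead of discarding it, is one plausible choice for semantic applications such as paraphrase identification, where a negated clause should not match its affirmative counterpart.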
Acknowledgements
We would like to thank the management of SSN Institutions for funding the High Performance Computing (HPC) lab where this research is being carried out.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Thenmozhi, D., Aravindan, C. (2018). RCE-OIE: Open Information Extraction Using a Rule-Based Clause Extraction Engine for Semantic Applications. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-8633-5_20
DOI: https://doi.org/10.1007/978-981-10-8633-5_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8632-8
Online ISBN: 978-981-10-8633-5
eBook Packages: Engineering, Engineering (R0)