Relation Extraction for Open and Closed Domain Question Answering

Bouma, Gosse; Fahmi, Ismail; Mur, Jori

doi:10.1007/978-3-642-17525-1_8


Part of the book series: Theory and Applications of Natural Language Processing (NLP)


Abstract

One of the most accurate methods in Question Answering (QA) uses off-line information extraction to find answers for frequently asked questions. This requires automatically extracting from text all instances of the relations that users frequently ask about. In this chapter, two methods are presented for learning relation instances for relations relevant to an open domain and a closed domain (medical) QA system. Both methods try to learn automatically the dependency paths that typically connect the two arguments of a given relation. The first (lightly supervised) method starts from a seed list of argument instances, and extracts dependency paths from all sentences in which a seed pair occurs. This method works well for large text collections and for seeds that are easily identified, such as named entities, and is well suited for open domain QA. A second experiment concentrates on medical relation extraction for the question answering module of the IMIX system. The IMIX corpus is relatively small, and relation instances may contain complex noun phrases that do not occur frequently in the exact same form in the corpus. In this case, learning from annotated data is necessary. Dependency patterns enriched with semantic concept labels are shown to give accurate results for relations that are relevant for a medical QA system. Both methods improve the performance of the Dutch QA system Joost.
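
The seed-based step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the toy tree format (a dict mapping each token to its head and dependency label), the function names, the label names, and the assumption that tokens are unique within a sentence are all hypothetical choices made for illustration. The sketch collects, for every sentence in which both members of a seed pair occur, the dependency path that connects them, and counts how often each path is seen.

```python
from collections import Counter

# Assumed toy format: a parsed sentence is a dict mapping each token to
# (head token, dependency label); the root has head None. Tokens are
# assumed to be unique within a sentence.

def path_to_root(token, tree):
    """Return the tokens from `token` up to the root, inclusive."""
    path = [token]
    while tree[path[-1]][0] is not None:
        path.append(tree[path[-1]][0])
    return path

def dependency_path(arg1, arg2, tree):
    """Return the labelled path from arg1 to arg2 via their lowest common ancestor."""
    up1 = path_to_root(arg1, tree)
    up2 = path_to_root(arg2, tree)
    common = next(t for t in up1 if t in up2)
    i1, i2 = up1.index(common), up2.index(common)
    path = ["ARG1"]
    # Climb from arg1 to the common ancestor.
    for k in range(i1):
        head = up1[k + 1]
        path += ["up:" + tree[up1[k]][1], "ARG2" if head == arg2 else head]
    # Descend from the common ancestor to arg2.
    for k in reversed(range(i2)):
        path += ["down:" + tree[up2[k]][1], "ARG2" if k == 0 else up2[k]]
    return tuple(path)

def extract_patterns(seed_pairs, parsed_sentences):
    """Count the dependency paths connecting seed pairs that co-occur in a sentence."""
    counts = Counter()
    for tree in parsed_sentences:
        for arg1, arg2 in seed_pairs:
            if arg1 in tree and arg2 in tree:
                counts[dependency_path(arg1, arg2, tree)] += 1
    return counts

# Hypothetical example data; the labels are illustrative only.
sentences = [
    {"is": (None, "root"), "Paris": ("is", "su"),
     "capital": ("is", "predc"), "France": ("capital", "mod")},
    {"is": (None, "root"), "Berlin": ("is", "su"),
     "capital": ("is", "predc"), "Germany": ("capital", "mod")},
]
seeds = [("Paris", "France"), ("Berlin", "Germany")]
patterns = extract_patterns(seeds, sentences)
```

In a sketch like this, paths that recur across many seed pairs would be the candidates for extracting new instances of the relation; how patterns are actually ranked and filtered is described in the chapter itself and is not modelled here.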



Author information

Authors and Affiliations

University of Groningen, Groningen, The Netherlands
Gosse Bouma
Gresnews Media, Amsterdam, The Netherlands
Ismail Fahmi
De Rode Planeet, Zuidhorn, The Netherlands
Jori Mur


Corresponding author

Correspondence to Gosse Bouma.

Editor information

Editors and Affiliations

Fac. Humanities, Tilburg University, Tilburg, Netherlands
Antal van den Bosch
Information Science, University of Groningen, NL-9700 AS Groningen, Netherlands
Gosse Bouma


About this chapter

Cite this chapter

Bouma, G., Fahmi, I., Mur, J. (2011). Relation Extraction for Open and Closed Domain Question Answering. In: van den Bosch, A., Bouma, G. (eds) Interactive Multi-modal Question-Answering. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17525-1_8


DOI: https://doi.org/10.1007/978-3-642-17525-1_8
Published: 08 April 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17524-4
Online ISBN: 978-3-642-17525-1
eBook Packages: Computer Science (R0)
