Relation Extraction for Open and Closed Domain Question Answering

Chapter in: Interactive Multi-modal Question-Answering

Abstract

One of the most accurate methods in Question Answering (QA) uses off-line information extraction to find answers to frequently asked questions. It requires the automatic extraction from text of all instances of the relations that users frequently ask about. In this chapter, two methods are presented for learning relation instances for relations relevant to an open-domain and a closed-domain (medical) QA system. Both methods automatically learn the dependency paths that typically connect the two arguments of a given relation. The first (lightly supervised) method starts from a seed list of argument pairs and extracts dependency paths from all sentences in which a seed pair occurs. This method works well for large text collections and for seeds that are easily identified, such as named entities, and is well suited for open-domain QA. The second experiment concentrates on medical relation extraction for the question answering module of the IMIX system. The IMIX corpus is relatively small, and relation instances may contain complex noun phrases that rarely occur in exactly the same form in the corpus. In this case, learning from annotated data is necessary. Dependency patterns enriched with semantic concept labels are shown to give accurate results for relations that are relevant to a medical QA system. Both methods improve the performance of the Dutch QA system Joost.
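The seed-based (lightly supervised) step can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes single-token arguments matched by exact string, hand-written toy dependency triples whose labels (su, obj1, mod, aux) are only illustrative, and a plain frequency threshold (min_count) in place of whatever pattern scoring the authors use. All function names and data below are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List, Optional, Set, Tuple

# A parsed sentence: tokens plus (head_index, label, dependent_index) triples.
Sentence = Tuple[List[str], List[Tuple[int, str, int]]]
Path = Tuple[str, ...]


def dependency_path(sent: Sentence, src: int, dst: int) -> Optional[Path]:
    """Shortest labelled path from token src to token dst (BFS), or None.

    An upward step (dependent -> head) is written 'label^', a downward step
    (head -> dependent) 'label'. Words between the two arguments are kept;
    the argument tokens themselves are left out, so the path is a pattern
    with two open slots.
    """
    tokens, deps = sent
    adj: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(tokens))}
    for head, label, dep in deps:
        adj[head].append((dep, label))
        adj[dep].append((head, label + "^"))
    prev: Dict[int, Tuple[int, str]] = {}
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, step in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                prev[nxt] = (node, step)
                queue.append(nxt)
    if src == dst or dst not in seen:
        return None
    steps: List[str] = []
    node = dst
    while node != src:  # walk back from dst to src, then reverse
        parent, step = prev[node]
        steps.append(step)
        if parent != src:
            steps.append(tokens[parent])
        node = parent
    return tuple(reversed(steps))


def learn_patterns(corpus: List[Sentence], seeds: List[Tuple[str, str]],
                   min_count: int = 1) -> Counter:
    """Count paths linking seed pairs; keep paths seen at least min_count times."""
    counts: Counter = Counter()
    for sent in corpus:
        tokens, _ = sent
        for x, y in seeds:
            if x in tokens and y in tokens:  # first occurrence of each argument
                path = dependency_path(sent, tokens.index(x), tokens.index(y))
                if path is not None:
                    counts[path] += 1
    return Counter({p: c for p, c in counts.items() if c >= min_count})


def apply_patterns(corpus: List[Sentence], patterns: Counter) -> Set[Tuple[str, str]]:
    """Extract every token pair connected by one of the learned paths."""
    found: Set[Tuple[str, str]] = set()
    for sent in corpus:
        tokens, _ = sent
        for i in range(len(tokens)):
            for j in range(len(tokens)):
                if i != j and dependency_path(sent, i, j) in patterns:
                    found.add((tokens[i], tokens[j]))
    return found


def born_in(person: str, place: str) -> Sentence:
    """Toy parse of '<person> was born in <place>' (hypothetical labels)."""
    tokens = [person, "was", "born", "in", place]
    return (tokens, [(2, "su", 0), (2, "aux", 1), (2, "mod", 3), (3, "obj1", 4)])


corpus = [
    born_in("Mozart", "Salzburg"),
    born_in("Beethoven", "Bonn"),
    born_in("Bach", "Eisenach"),
    (["Mozart", "visited", "Salzburg"], [(1, "su", 0), (1, "obj1", 2)]),
]
seeds = [("Mozart", "Salzburg"), ("Beethoven", "Bonn")]
patterns = learn_patterns(corpus, seeds, min_count=2)
print(sorted(apply_patterns(corpus, patterns)))
```

With these toy inputs the "visited" path links a seed pair only once and is discarded, while the "born in" path survives and also yields the unseen pair (Bach, Eisenach). Keeping intermediate words such as "born" and "in" in the path makes patterns specific to the relation; a real system would also restrict candidate arguments (for example to named entities of the right type) rather than testing every token pair.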

Author information

Correspondence to Gosse Bouma.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bouma, G., Fahmi, I., Mur, J. (2011). Relation Extraction for Open and Closed Domain Question Answering. In: van den Bosch, A., Bouma, G. (eds) Interactive Multi-modal Question-Answering. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17525-1_8

  • DOI: https://doi.org/10.1007/978-3-642-17525-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17524-4

  • Online ISBN: 978-3-642-17525-1

  • eBook Packages: Computer Science, Computer Science (R0)