Answering Definition Questions: Dealing with Data Sparseness in Lexicalised Dependency Trees-Based Language Models

Figueroa, Alejandro; Atkinson, John

doi:10.1007/978-3-642-12436-5_22


Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 45)

Included in the following conference series:

International Conference on Web Information Systems and Technologies


Abstract

A crucial step in answering definition questions, such as “Who is Gordon Brown?”, is the ranking of answer candidates. In definition Question Answering (QA), sentences are normally treated as potential answers, and one of the most promising ranking strategies is based on Language Models (LMs).

However, LMs become less attractive when they suffer from data sparseness, which arises when the training material is insufficient or the candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded in part-of-speech (POS) tagging.
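As a rough illustration of the first idea (combining LMs learnt from different corpora), the sketch below linearly interpolates two toy unigram models. This is an assumption made purely for illustration: the abstract does not say how the models are combined, and the paper's LMs are built over lexicalised dependency trees rather than unigrams. The names `train_unigram`, `interpolate` and `weight_a` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def train_unigram(sentences: Iterable[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model from whitespace-tokenised sentences."""
    counts = Counter(tok for sentence in sentences for tok in sentence.split())
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}


def interpolate(lm_a: Dict[str, float],
                lm_b: Dict[str, float],
                weight_a: float = 0.5) -> Dict[str, float]:
    """Linear interpolation of two models (an assumed combination scheme).

    A token seen in only one corpus still receives non-zero probability,
    which is how mixing corpora can ease data sparseness.
    """
    vocab = set(lm_a) | set(lm_b)
    return {
        tok: weight_a * lm_a.get(tok, 0.0) + (1.0 - weight_a) * lm_b.get(tok, 0.0)
        for tok in vocab
    }


# Toy corpora that overlap on the token "b".
lm_a = train_unigram(["a b"])
lm_b = train_unigram(["b c"])
mixed = interpolate(lm_a, lm_b, weight_a=0.5)
```

With equal weights, the shared token keeps the largest share of the probability mass, while tokens unique to one corpus are no longer unseen events in the combined model.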

Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers while diminishing the average F-score of the final output. The impact of the second approach, by contrast, depends on the test corpus.
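For readers unfamiliar with the two metrics, the following minimal sketch shows their textbook forms. It is not the paper's evaluation code: the abstract does not specify the ranking cut-off or the F-score weighting, so `beta` defaults to 1.0 as an assumption, and average precision is normalised by the number of relevant items found in the ranked list rather than by all known relevant items.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of average precision over several ranked lists (one per question)."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


# Two hypothetical ranked answer lists: True marks a correct sentence.
map_score = mean_average_precision([[True, False, True], [False, True]])
```

MAP rewards placing correct sentences near the top of the ranking, whereas the F-score judges the final answer set as a whole, which is why one can rise while the other falls.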



Author information

Authors and Affiliations

Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, Stuhlsatzenhausweg 3, D - 66123, Saarbrücken, Germany
Alejandro Figueroa
Department of Computer Sciences, Universidad de Concepción, Concepción, Chile
John Atkinson


Editor information

Editors and Affiliations

Department of Systems and informatics, Institute for Systems and Technologies of Information, Control and Communication (INSTICC) and Instituto Politécnico de Setúbal (IPS), Rua do Vale de Chaves, Estefanilha, 2910-761, Setúbal, Portugal
José Cordeiro & Joaquim Filipe

About this paper

Cite this paper

Figueroa, A., Atkinson, J. (2010). Answering Definition Questions: Dealing with Data Sparseness in Lexicalised Dependency Trees-Based Language Models. In: Cordeiro, J., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2009. Lecture Notes in Business Information Processing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12436-5_22


DOI: https://doi.org/10.1007/978-3-642-12436-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12435-8
Online ISBN: 978-3-642-12436-5
eBook Packages: Computer Science, Computer Science (R0)
