Answering Definition Questions: Dealing with Data Sparseness in Lexicalised Dependency Trees-Based Language Models

  • Conference paper
Web Information Systems and Technologies (WEBIST 2009)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 45)

Abstract

A crucial step in answering definition questions, such as “Who is Gordon Brown?”, is the ranking of answer candidates. In definition Question Answering (QA), sentences are normally treated as potential answers, and one of the most promising ranking strategies is based on Language Models (LMs).
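To make LM-based ranking concrete, here is a minimal sketch, assuming a plain word-bigram model trained on known definition sentences and Jelinek-Mercer interpolation with an add-one unigram model. The paper's models are built over lexicalised dependency trees, not word sequences; the class `BigramLM`, the weight `lam=0.7` and the helper `rank` are illustrative assumptions, not the authors' implementation. Each candidate sentence is scored by its length-normalised log-probability, and candidates are sorted by that score.

```python
import math
from collections import Counter


class BigramLM:
    """Word-bigram LM with Jelinek-Mercer interpolation (illustrative stand-in
    for the paper's dependency-tree-based LMs)."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam  # assumed interpolation weight for the bigram estimate
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 reserves mass for unseen words

    def p_unigram(self, w):
        # Add-one smoothing: unseen words never receive zero probability.
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size)

    def p_bigram(self, prev, w):
        # Interpolate the maximum-likelihood bigram estimate with the unigram model.
        ml = self.bigrams[(prev, w)] / self.unigrams[prev] if self.unigrams[prev] else 0.0
        return self.lam * ml + (1 - self.lam) * self.p_unigram(w)

    def score(self, sent):
        # Average log-probability per transition, so longer candidates are not
        # penalised merely for containing more tokens.
        tokens = ["<s>"] + sent + ["</s>"]
        logp = sum(math.log(self.p_bigram(a, b)) for a, b in zip(tokens, tokens[1:]))
        return logp / (len(tokens) - 1)


def rank(lm, candidates):
    """Order candidate sentences from most to least definition-like."""
    return sorted(candidates, key=lm.score, reverse=True)
```

Even with smoothing, a candidate full of words never seen in training collects many low-probability transitions, which is the data-sparseness problem the paper sets out to tackle.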

However, one factor that makes LMs less attractive is that they can suffer from data sparseness when the training material is insufficient or candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded in part-of-speech (POS) tags.
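The two strategies can be illustrated with a small sketch. The abstract does not say how the LMs are combined or which words are substituted, so both choices here are assumptions: models from different corpora are mixed by linear interpolation (one standard way to combine LMs), and the hypothetical policy in `substitute_unseen` replaces every token missing from the training vocabulary with its POS tag, assuming the training corpora were preprocessed the same way. The names `train_bigram`, `interpolate`, `substitute_unseen` and `avg_logprob` are illustrative, and the POS tags are supplied by the caller rather than produced by a real tagger.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Set

# p(word | previous word); a deliberately simple stand-in for the paper's
# dependency-tree-based models.
ProbFn = Callable[[str, str], float]


def train_bigram(sentences: List[List[str]], vocab_size: int) -> ProbFn:
    """Add-one smoothed bigram LM; vocab_size is an assumed shared vocabulary size."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    for sent in sentences:
        seq = ["<s>"] + sent + ["</s>"]
        histories.update(seq[:-1])  # every token except </s> acts as a history
        bigrams.update(zip(seq, seq[1:]))
    return lambda prev, word: (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)


def interpolate(models: Sequence[ProbFn], weights: Sequence[float]) -> ProbFn:
    """Linear mixture p(w|h) = sum_i weights[i] * models[i](h, w)."""
    if len(models) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, summing to 1")

    def mixed(prev: str, word: str) -> float:
        return sum(wt * model(prev, word) for wt, model in zip(weights, models))

    return mixed


def substitute_unseen(tokens: List[str], tags: List[str], vocab: Set[str]) -> List[str]:
    """Hypothetical policy: replace each token absent from the vocabulary by its POS tag."""
    return [tok if tok in vocab else tag for tok, tag in zip(tokens, tags)]


def avg_logprob(p: ProbFn, tokens: List[str]) -> float:
    """Length-normalised log-probability of a token sequence under p."""
    seq = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(p(a, b)) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
```

A typical flow builds one model per corpus with `train_bigram`, mixes them with `interpolate`, and scores the output of `substitute_unseen` with `avg_logprob`. Mapping an unseen name such as "gordon" to a tag like `NNP` lets the model reuse statistics gathered for other proper nouns instead of falling back to the smoothing floor.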

Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers but, at the same time, lowers the average F-score of the final output. Conversely, the impact of the second approach depends on the test corpus.
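For readers unfamiliar with the two metrics, the sketch below computes standard (non-interpolated) average precision, its mean over questions, and the weighted F-measure. The abstract does not specify how MAP is restricted to the top-ranked answers or how precision and recall are obtained for the F-score. In TREC-style definition QA evaluation, precision is typically a length-based approximation over information nuggets and β > 1 favours recall. This sketch therefore takes precision and recall as given; the function names are illustrative.

```python
from typing import Sequence, Tuple


def average_precision(relevance: Sequence[bool], num_relevant: int) -> float:
    """AP of one ranked list.

    relevance[k] is True when the answer at rank k+1 is judged correct;
    num_relevant is the total number of correct answers for the question.
    """
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(relevance, start=1):
        if is_rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of per-question AP values; each run is (relevance list, num_relevant)."""
    return sum(average_precision(rel, n) for rel, n in runs) / len(runs)


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a ranking whose correct answers sit at ranks 1 and 3 (two correct answers in total) has AP = (1/1 + 2/3) / 2 = 5/6. Because MAP rewards putting correct sentences near the top while the F-score also depends on how much of the final output is correct, a method can raise one and lower the other.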

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Figueroa, A., Atkinson, J. (2010). Answering Definition Questions: Dealing with Data Sparseness in Lexicalised Dependency Trees-Based Language Models. In: Cordeiro, J., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2009. Lecture Notes in Business Information Processing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12436-5_22

  • DOI: https://doi.org/10.1007/978-3-642-12436-5_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12435-8

  • Online ISBN: 978-3-642-12436-5

  • eBook Packages: Computer Science, Computer Science (R0)
