Query Modulation For Web-Based Question Answering

Radev, Dragomir R.; Qi, Hong; Zheng, Zhiping; Blair-Goldensohn, Sasha; Zhang, Zhu; Fan, Weiguo; Prager, John

doi:10.1007/978-1-4020-4746-6_9

Part of the book series: Text, Speech and Language Technology (TLTB, volume 32)

The Web is becoming one of the largest information and knowledge repositories. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how to use these existing search engines effectively to mine the Web and discover the “correct” answers to factual natural language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural language question. We validate our approach on both local and web search engines using questions from the TREC evaluation.

Author information

Authors and Affiliations

University of Michigan, 48109, Ann Arbor, MI, USA
Dragomir R. Radev
Lu Jia Zui Finance and Trade Zone, 33 Hua Yuan Shi Qiao Road, 200120, Shanghai, China
Hong Qi
15322 NW Dane Lane, 97229, Portland, OR, USA
Zhiping Zheng
Columbia University, 1214 Amsterdam Avenue, 10027, New York, NY, USA
Sasha Blair-Goldensohn
The University of Arizona, 1130 E. Helen Street, 85721, Tucson, AZ, USA
Zhu Zhang
Virginia Polytechnic Institute and State University, 3007 Pamplin Hall, 24061, Blacksburg, VA, USA
Weiguo Fan
IBM T.J. Watson Research Center, 704, 10598, Yorktown Heights, NY, USA
John Prager

Editor information

Editors and Affiliations

State University of New York at Albany, 1400 Washington Avenue, 12222, Albany, NY, USA
Tomek Strzalkowski
University of Texas at Dallas, 75083, Richardson, TX, USA
Sanda M. Harabagiu

About this chapter

Cite this chapter

Radev, D.R. et al. (2008). Query Modulation For Web-Based Question Answering. In: Strzalkowski, T., Harabagiu, S.M. (eds) Advances in Open Domain Question Answering. Text, Speech and Language Technology, vol 32. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4746-6_9

DOI: https://doi.org/10.1007/978-1-4020-4746-6_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4744-2
Online ISBN: 978-1-4020-4746-6
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)
