Query Modulation for Web-Based Question Answering

  • Dragomir R. Radev
  • Hong Qi
  • Zhiping Zheng
  • Sasha Blair-Goldensohn
  • Zhu Zhang
  • Weiguo Fan
  • John Prager
Part of the Text, Speech and Language Technology book series (TLTB, volume 32)

The Web is rapidly becoming one of the largest repositories of information and knowledge. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how existing search engines can be used effectively to mine the Web and discover the “correct” answers to factual natural-language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural-language question. We validate our approach on both local and web search engines using questions from the TREC evaluation.
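The abstract does not spell out which paraphrasing operators QASM uses or how its statistical model is trained, so the following is only a minimal sketch of the general idea of query modulation, not the authors' algorithm. It assumes three hypothetical operators (`drop_wh_words`, `drop_stopwords`, `require_terms`), a hypothetical in-memory `toy_search` standing in for a real web search engine, and an invented three-document corpus. In place of QASM's probabilistic model it simply tries every operator pipeline on a set of training questions with known answers and keeps the one with the highest mean reciprocal rank of the first answer-bearing document.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# HYPOTHETICAL operator vocabulary: the abstract does not say which
# paraphrasing operators QASM actually uses.
WH_WORDS = {"what", "who", "when", "where", "which", "how", "why"}
STOPWORDS = {"is", "was", "the", "of", "a", "an", "did", "does", "in", "to"}

Operator = Callable[[List[str]], List[str]]


def drop_wh_words(tokens: List[str]) -> List[str]:
    """Remove question words such as 'What' or 'Who'."""
    return [t for t in tokens if t.lower() not in WH_WORDS]


def drop_stopwords(tokens: List[str]) -> List[str]:
    """Remove a small, fixed list of function words."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def require_terms(tokens: List[str]) -> List[str]:
    """Mark every remaining term as mandatory with a '+' prefix."""
    return ["+" + t for t in tokens]


OPERATORS: Dict[str, Operator] = {
    "drop_wh": drop_wh_words,
    "drop_stop": drop_stopwords,
    "require": require_terms,
}


def pipelines() -> List[Tuple[str, ...]]:
    """Every subset of OPERATORS, applied in declaration order (2**3 = 8)."""
    names = list(OPERATORS)
    return [c for k in range(len(names) + 1) for c in combinations(names, k)]


def apply_pipeline(question: str, pipeline: Sequence[str]) -> List[str]:
    """Turn a natural-language question into a keyword query."""
    tokens = question.rstrip("?").split()
    for name in pipeline:
        tokens = OPERATORS[name](tokens)
    return tokens


def toy_search(query: List[str], docs: List[str]) -> List[str]:
    """Stand-in engine: all '+' terms must occur; rank by term overlap."""
    terms = [t.lstrip("+").lower() for t in query]
    required = [t.lstrip("+").lower() for t in query if t.startswith("+")]
    scored = []
    for doc in docs:
        words = set(doc.lower().split())
        if not all(r in words for r in required):
            continue
        score = sum(t in words for t in terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep corpus order
    return [doc for _, doc in scored]


def reciprocal_rank(hits: List[str], answer: str) -> float:
    """1/rank of the first hit containing the answer string, else 0."""
    for rank, doc in enumerate(hits, start=1):
        if answer.lower() in doc.lower():
            return 1.0 / rank
    return 0.0


def learn_best_pipeline(
    training: Sequence[Tuple[str, str]], docs: List[str]
) -> Tuple[Tuple[str, ...], float]:
    """Return the pipeline with the highest mean reciprocal rank (MRR)."""
    best: Tuple[str, ...] = ()
    best_mrr = -1.0
    for pipe in pipelines():
        rrs = [reciprocal_rank(toy_search(apply_pipeline(q, pipe), docs), a)
               for q, a in training]
        mrr = sum(rrs) / len(rrs)
        if mrr > best_mrr:  # strict '>' keeps the earliest pipeline on ties
            best, best_mrr = pipe, mrr
    return best, best_mrr


# Toy corpus and training pairs, invented for illustration only.
DOCS = [
    "what is the best way to climb a mountain",
    "Everest is the tallest mountain on Earth",
    "the Nile is the longest river in Africa",
]
TRAINING = [
    ("What is the tallest mountain?", "Everest"),
    ("What is the longest river?", "Nile"),
]

if __name__ == "__main__":
    print(learn_best_pipeline(TRAINING, DOCS))
```

On this toy corpus the unmodified questions reach an MRR of 0.75, because the distractor document about climbing a mountain outranks the Everest document. Dropping the wh-word lifts both answer documents to rank 1 (MRR 1.0), whereas requiring every term returns no hits at all, which illustrates why the choice of paraphrase matters. Exhaustive search is only feasible here because there are eight pipelines; a learned model such as the one the abstract describes is what makes the approach scale to richer operator sets.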


Keywords (added by machine, not by the authors): Search Engine; Noun Phrase; Expectation Maximization Algorithm; Question Answering; Statistical Machine Translation

Copyright information

© Springer 2008

Authors and Affiliations

  • Dragomir R. Radev, University of Michigan, Ann Arbor, USA
  • Hong Qi, Lu Jia Zui Finance and Trade Zone, China
  • Zhiping Zheng, Portland, USA
  • Sasha Blair-Goldensohn, Columbia University, New York, USA
  • Zhu Zhang, The University of Arizona, Tucson, USA
  • Weiguo Fan, Virginia Polytechnic Institute and State University, Blacksburg, USA
  • John Prager, IBM T.J. Watson Research Center, Yorktown Heights, USA
