
Incorporating Passage Feature Within Language Model Framework for Information Retrieval

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

Passage-level features have proven very useful in document retrieval. In this paper, we incorporate passage features into the language modeling framework by extending Jelinek-Mercer smoothing. This scheme not only improves the precision of the document language model but also lets passage features work well for documents that are not very long. We compare our scheme with four baselines: the unigram language model and the passage language model, each with Jelinek-Mercer and Dirichlet smoothing. Experimental results on TREC collections indicate that our method significantly outperforms the unigram language model and performs better than the passage language model on collections whose documents are not very long.
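The abstract does not give the paper's exact formula, so the following is only a minimal sketch under stated assumptions. It implements the two standard query-likelihood baselines the abstract names (Jelinek-Mercer and Dirichlet smoothing) and one hypothetical way to extend Jelinek-Mercer with a passage component: a three-way mixture of document, passage, and collection models, where each document is scored by its best passage. The helper `windows`, the fixed-size overlapping passages, the weights `lam_d`/`lam_p`/`lam_c`, the max-over-passages rule, and skipping query terms unseen in the collection are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def ml_prob(counts: Counter, length: int, w: str) -> float:
    """Maximum-likelihood estimate c(w, x) / |x| (0.0 for an empty text)."""
    return counts[w] / length if length else 0.0


def jm_score(query, doc, coll_counts, coll_len, lam=0.5):
    """Query log-likelihood with Jelinek-Mercer smoothing:
    p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C)."""
    d = Counter(doc)
    score = 0.0
    for w in query:
        p_c = ml_prob(coll_counts, coll_len, w)
        if p_c == 0.0:
            continue  # assumption: ignore query terms unseen in the collection
        score += math.log((1 - lam) * ml_prob(d, len(doc), w) + lam * p_c)
    return score


def dirichlet_score(query, doc, coll_counts, coll_len, mu=2000.0):
    """Query log-likelihood with Dirichlet smoothing:
    p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu)."""
    d = Counter(doc)
    score = 0.0
    for w in query:
        p_c = ml_prob(coll_counts, coll_len, w)
        if p_c == 0.0:
            continue  # same assumption as above
        score += math.log((d[w] + mu * p_c) / (len(doc) + mu))
    return score


def windows(doc, size=50, step=25):
    """Hypothetical passage splitter: overlapping fixed-size word windows."""
    if len(doc) <= size:
        return [doc]
    return [doc[i:i + size] for i in range(0, len(doc) - size + step, step)]


def passage_jm_score(query, doc, coll_counts, coll_len,
                     lam_d=0.4, lam_p=0.3, lam_c=0.3, size=50, step=25):
    """HYPOTHETICAL three-way Jelinek-Mercer mixture, per passage g of d:
    p(w|d, g) = lam_d * p_ml(w|d) + lam_p * p_ml(w|g) + lam_c * p(w|C).
    The document receives the score of its best-scoring passage."""
    if abs(lam_d + lam_p + lam_c - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    d = Counter(doc)
    best = -math.inf
    for g in windows(doc, size, step):
        gc = Counter(g)
        score = 0.0
        for w in query:
            p_c = ml_prob(coll_counts, coll_len, w)
            if p_c == 0.0:
                continue
            score += math.log(lam_d * ml_prob(d, len(doc), w)
                              + lam_p * ml_prob(gc, len(g), w)
                              + lam_c * p_c)
        best = max(best, score)
    return best
```

Setting `lam_p=0` collapses the hypothetical mixture back to ordinary Jelinek-Mercer smoothing with `lam = lam_c`. Because the unigram baseline is a special case of the interpolation, an interpolation scheme is a natural place to add passage evidence.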




Editor information

Alexander Gelbukh


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dang, K., Zhao, T., Qi, H., Zheng, D. (2007). Incorporating Passage Feature Within Language Model Framework for Information Retrieval. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_42


  • DOI: https://doi.org/10.1007/978-3-540-70939-8_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
