Abstract
The passage feature has proved very useful in document retrieval. In this paper, we successfully incorporate the passage feature into the language modeling framework by extending Jelinek-Mercer smoothing. This scheme not only increases the precision of the document language model but also lets the passage feature work well on documents that are not very long. We compare our scheme with four baselines: the unigram language model and the passage language model, each with Jelinek-Mercer and Dirichlet smoothing. Experimental results on TREC collections indicate that our method significantly outperforms the unigram language model and performs better than the passage language model on collections whose documents are not very long.
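The abstract names Jelinek-Mercer and Dirichlet smoothing but does not state the paper's extended formula. The following is a minimal sketch under stated assumptions. It shows the two standard estimators, and alongside them a clearly hypothetical stand-in for the paper's scheme: a three-way linear interpolation of document, passage and collection models, where the passage is the best-scoring non-overlapping fixed-size window. The window size, the weights `lam_d` and `lam_p`, and every function name here are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Sequence


def p_ml(term: str, tokens: Sequence[str]) -> float:
    """Maximum-likelihood estimate c(w, x) / |x| (0.0 for an empty text)."""
    return Counter(tokens)[term] / len(tokens) if tokens else 0.0


def p_jm(term: str, doc: Sequence[str], coll: Sequence[str], lam: float = 0.5) -> float:
    """Standard Jelinek-Mercer: (1 - lam) * p_ml(w|d) + lam * p_ml(w|C)."""
    return (1 - lam) * p_ml(term, doc) + lam * p_ml(term, coll)


def p_dirichlet(term: str, doc: Sequence[str], coll: Sequence[str], mu: float = 2000.0) -> float:
    """Standard Dirichlet prior: (c(w, d) + mu * p_ml(w|C)) / (|d| + mu)."""
    return (Counter(doc)[term] + mu * p_ml(term, coll)) / (len(doc) + mu)


def windows(doc: List[str], size: int) -> List[List[str]]:
    """HYPOTHETICAL passage definition: non-overlapping fixed-size windows."""
    return [doc[i:i + size] for i in range(0, len(doc), size)] or [doc]


def p_passage_jm(term: str, doc: Sequence[str], passage: Sequence[str],
                 coll: Sequence[str], lam_d: float = 0.4, lam_p: float = 0.3) -> float:
    """HYPOTHETICAL passage-extended Jelinek-Mercer: interpolates document,
    passage and collection models with weights lam_d + lam_p + lam_c = 1."""
    lam_c = 1.0 - lam_d - lam_p
    if lam_c < 0:
        raise ValueError("lam_d + lam_p must not exceed 1")
    return (lam_d * p_ml(term, doc) + lam_p * p_ml(term, passage)
            + lam_c * p_ml(term, coll))


def score_jm(query: Sequence[str], doc: Sequence[str], coll: Sequence[str], lam: float = 0.5) -> float:
    """Query log-likelihood with plain Jelinek-Mercer smoothing.
    Every query term must occur in the collection, or log(0) fails."""
    return sum(math.log(p_jm(w, doc, coll, lam)) for w in query)


def score_passage_jm(query: Sequence[str], doc: List[str], coll: Sequence[str],
                     size: int = 3, lam_d: float = 0.4, lam_p: float = 0.3) -> float:
    """Query log-likelihood using the best-scoring window as the passage
    (the max-over-windows choice is an assumption of this sketch)."""
    def loglik(passage: Sequence[str]) -> float:
        return sum(math.log(p_passage_jm(w, doc, passage, coll, lam_d, lam_p))
                   for w in query)
    return max(loglik(p) for p in windows(doc, size))
```

With `doc = ["a", "b", "a", "c"]` and a collection that also contains `["b", "b", "d", "d"]`, `p_jm("a", doc, coll)` and `p_dirichlet("a", doc, coll, mu=4)` both give 0.375. Because every component is a proper distribution over the vocabulary, any convex combination of them (including the hypothetical three-way mix) still sums to 1.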
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dang, K., Zhao, T., Qi, H., Zheng, D. (2007). Incorporating Passage Feature Within Language Model Framework for Information Retrieval. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_42
DOI: https://doi.org/10.1007/978-3-540-70939-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)