Abstract
Techniques for automatic query expansion from top retrieved documents have recently shown promise for improving retrieval effectiveness on large collections, but systematic evaluations and comparative studies are still lacking. In this paper we focus on term-scoring methods based on the difference between the distribution of terms in (pseudo-)relevant documents and the distribution of terms in all documents, seen as a complement or an alternative to more conventional techniques. We show that when such distributional methods are used only to select expansion terms within Rocchio’s classical reweighting scheme, overall performance is not likely to improve. However, we also show that when the same distributional methods are used both to select and to weight expansion terms, retrieval effectiveness may improve considerably. We then argue, based on how their performance varies across individual queries, that the sets of ranked terms suggested by individual distributional methods can be combined to further improve mean performance, by analogy with ensembling classifiers, and we present experimental evidence supporting this view. Taken together, our experiments show that automatic query expansion can achieve performance gains as high as 21.34% over the non-expanded query (for non-interpolated average precision). We also discuss the effect that the main parameters involved in automatic query expansion, such as query difficulty, number of selected documents, and number of selected terms, have on retrieval effectiveness.
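To make the idea of distributional term scoring concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which distributional measures are used; the Kullback-Leibler-style contribution p_R(t) · log(p_R(t) / p_C(t)) is assumed here purely for illustration, as are the function names, the toy documents, and the `alpha` and `beta` values. The sketch scores candidate terms by comparing their relative frequency in the pseudo-relevant documents with their relative frequency in the whole collection, then uses the same scores both to select and to weight the expansion terms.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative frequency of each term over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def kl_term_scores(pseudo_relevant, collection):
    """Score each term by p_R(t) * log(p_R(t) / p_C(t)) (assumed measure).

    Higher scores mean the term is more characteristic of the
    pseudo-relevant documents than of the collection as a whole.
    """
    p_rel = term_distribution(pseudo_relevant)
    p_col = term_distribution(collection)
    return {t: p * math.log(p / p_col[t]) for t, p in p_rel.items() if t in p_col}


def expand_query(query_weights, scores, n_terms=1, alpha=1.0, beta=0.5):
    """Select the n_terms best-scoring terms and weight them by their score.

    alpha scales the original query weights; beta scales the expansion
    weights, which are normalized by the best score (illustrative values).
    """
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
    max_score = top[0][1] if top and top[0][1] > 0 else 1.0
    expanded = {t: alpha * w for t, w in query_weights.items()}
    for term, score in top:
        expanded[term] = expanded.get(term, 0.0) + beta * score / max_score
    return expanded


# Toy collection; the first two documents play the role of the top retrieved ones.
collection = [
    ["query", "expansion", "improves", "retrieval"],
    ["retrieval", "of", "documents"],
    ["the", "weather", "is", "fine"],
    ["documents", "about", "weather"],
]
pseudo_relevant = collection[:2]

scores = kl_term_scores(pseudo_relevant, collection)
expanded = expand_query({"query": 1.0, "expansion": 1.0}, scores)
print(expanded)
```

In this toy setting "retrieval" is selected because it is twice as frequent in the top documents as in the collection overall, whereas "documents" scores zero because its relative frequency is the same in both. Lists of ranked terms produced by several such scoring functions could be merged before the selection step, which is the combination idea the abstract alludes to.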
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carpineto, C., Romano, G. (1999). Towards More Effective Techniques for Automatic Query Expansion. In: Abiteboul, S., Vercoustre, AM. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1999. Lecture Notes in Computer Science, vol 1696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48155-9_10
DOI: https://doi.org/10.1007/3-540-48155-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66558-8
Online ISBN: 978-3-540-48155-3
eBook Packages: Springer Book Archive