Towards More Effective Techniques for Automatic Query Expansion

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1696)

Abstract

Techniques for automatic query expansion from top retrieved documents have recently shown promise for improving retrieval effectiveness on large collections, but systematic evaluations and comparative studies are still lacking. In this paper we focus on term-scoring methods based on the differences between the distribution of terms in (pseudo-)relevant documents and the distribution of terms in all documents, seen as a complement or an alternative to more conventional techniques. We show that when such distributional methods are used to select expansion terms within Rocchio’s classical reweighting scheme, overall performance is not likely to improve. However, we also show that when the same distributional methods are used both to select and to weight expansion terms, retrieval effectiveness may improve considerably. We then argue, based on their variation in performance on individual queries, that the sets of ranked terms suggested by individual distributional methods can be combined to further improve mean performance, by analogy with ensembling classifiers, and we present experimental evidence supporting this view. Taken together, our experiments show that with automatic query expansion it is possible to achieve performance gains as high as 21.34% over the non-expanded query (for non-interpolated average precision). We also discuss the effect that the main parameters involved in automatic query expansion, such as query difficulty, number of selected documents, and number of selected terms, have on retrieval effectiveness.
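
The idea of scoring terms by how differently they are distributed in the pseudo-relevant documents and in the whole collection can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distributional scores are used, so the Kullback-Leibler contribution below is one possible choice, and the function names (`kld_term_scores`, `expand_query`), the toy documents, and the normalisation of expansion weights are all hypothetical rather than the authors' method.

```python
import math
from collections import Counter


def kld_term_scores(pseudo_relevant_docs, collection_docs):
    """Score each term by p_R(t) * log(p_R(t) / p_C(t)).

    p_R is the term distribution in the pseudo-relevant documents and
    p_C the term distribution in the whole collection. Documents are
    lists of tokens. The choice of this score is an assumption.
    """
    rel_counts = Counter(t for doc in pseudo_relevant_docs for t in doc)
    col_counts = Counter(t for doc in collection_docs for t in doc)
    rel_total = sum(rel_counts.values())
    col_total = sum(col_counts.values())

    scores = {}
    for term, count in rel_counts.items():
        p_rel = count / rel_total
        p_col = col_counts[term] / col_total
        if p_col == 0:
            # Skip terms missing from the collection statistics.
            continue
        scores[term] = p_rel * math.log(p_rel / p_col)
    return scores


def expand_query(query_terms, scores, k=2):
    """Select the k best non-query terms and weight them by their score.

    Original terms keep weight 1.0; expansion weights are the scores
    divided by the best selected score (an illustrative choice).
    """
    candidates = [(t, s) for t, s in scores.items() if t not in query_terms]
    top = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]
    weights = {t: 1.0 for t in query_terms}
    if top and top[0][1] > 0:
        best = top[0][1]
        for term, score in top:
            weights[term] = score / best
    return weights


# Toy collection; the first two documents play the role of the
# top-ranked (pseudo-relevant) documents for the query ["query"].
collection = [
    ["query", "expansion", "retrieval"],
    ["retrieval", "system", "index"],
    ["weather", "rain", "index"],
    ["weather", "sun", "system"],
]
scores = kld_term_scores(collection[:2], collection)
weights = expand_query(["query"], scores, k=2)
print(weights)
```

Here the same score drives both the selection and the weighting of expansion terms, which is the configuration the abstract reports as more promising than using the scores for selection only within Rocchio's scheme.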

References

  1. Attar, R., and Fraenkel, A. S. (1977). Local feedback in full-text retrieval systems. Journal of the Association for Computing Machinery, 24(3), 397–417.

  2. Bartell, B., Cottrell, G., and Belew, R. (1994). Automatic combination of multiple ranked retrieval systems. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’94), pp. 173–181, Dublin.

  3. Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), pp. 123–140.

  4. Buckley, C., and Salton, G. (1995). Optimization of relevance feedback weights. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’95), pp. 351–357, Seattle.

  5. Buckley, C., Salton, G., Allan, J., and Singhal, A. (1995). Automatic query expansion using SMART: TREC3. Proceedings of the Third Text REtrieval Conference (TREC-3).

  6. Carpineto, C., and Romano, G. (1998). Effective reformulation of Boolean queries with concept lattices. Proceedings of the 3rd International Conference on Flexible Query-Answering Systems (FQAS’98), Lecture Notes in Artificial Intelligence, Springer Verlag, pp. 83–94.

  7. Carpineto, C., De Mori, R., and Romano, G. (1999). Informative term selection for automatic query expansion. Proceedings of the Seventh Text Retrieval Conference (TREC-7).

  8. Cooper, J., and Byrd, R. (1997). Lexical navigation: visually prompted query expansion and refinement. Proceedings of the 2nd ACM Digital Library Conference, pp. 237–246.

  9. Croft, B., and Harper, D. J. (1979). Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35, 285–295.

  10. Dietterich, T. (1997). Machine-learning research: four current directions. AI Magazine, Winter 1997, pp. 97–135.

  11. Doszkocs, T.E. (1978). AID: an associative interactive dictionary for online searching. Online Review, 2(2), pp. 163–174.

  12. Fitzpatrick, L., and Dent, M. (1997). Automatic feedback using past queries: social searching? Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’97), pp. 306–313, Philadelphia.

  13. Harman, D. (1992). Relevance feedback and other query modification techniques. In Information Retrieval–Data Structures and Algorithms, Frakes, W.B., and Baeza-Yates, R. (Eds.), pp. 241–263, Prentice Hall, Englewood Cliffs, NJ.

  14. Hawking, D., Thistlewaite, P., and Craswell, N. (1998). ANU/ACSys TREC-6 Experiments. In D. K. Harman, editor, Proceedings of the Sixth Text Retrieval Conference (TREC-6).

  15. Hull, D., Pedersen, J., and Schütze, H. (1996). Method combination for document filtering. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), pp. 279–287, Zurich.

  16. Larkey, L., and Croft, B. (1996). Combining classifiers in text categorization. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), pp. 289–297, Zurich.

  17. Mitra, M., Singhal, A., and Buckley, C. (1998). Improving automatic query expansion. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), pp. 206–214, Melbourne.

  18. Robertson, S.E. (1990). On term selection for query expansion. Journal of Documentation, 46(4), pp. 359–364.

  19. Robertson, S.E., Walker, S., Jones, G.J.F., Hancock-Beaulieu, M., and Gatford, M. (1995). Okapi at TREC-3. Proceedings of the Third Text REtrieval Conference (TREC-3).

  20. Rocchio, J. (1971). Relevance feedback in information retrieval. In Salton, G. (ed.), The SMART retrieval system–experiments in automatic document processing, chapter 14, Prentice Hall, Englewood Cliffs.

  21. Salton, G. and Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288–297.

  22. Schapire, R., Singer, Y., and Singhal, A. (1998). Boosting and Rocchio applied to text filtering. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), pp. 215–223, Melbourne.

  23. Singhal, A., Choi, J., Hindle, D., Lewis, D., and Pereira, F. (1999). AT&T at TREC-7. Proceedings of the Seventh Text Retrieval Conference (TREC-7).

  24. Srinivasan, P. (1996). Query expansion and MEDLINE. Information Processing & Management, 32(4), pp. 431–443.

  25. Xu, J., and Croft, B. (1996). Query expansion using local and global document analysis. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), pp. 4–11, Zurich.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carpineto, C., Romano, G. (1999). Towards More Effective Techniques for Automatic Query Expansion. In: Abiteboul, S., Vercoustre, AM. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1999. Lecture Notes in Computer Science, vol 1696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48155-9_10

  • DOI: https://doi.org/10.1007/3-540-48155-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66558-8

  • Online ISBN: 978-3-540-48155-3

  • eBook Packages: Springer Book Archive
