The Role of Multi-word Units in Interactive Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3408)

Abstract

The paper presents several techniques for selecting noun phrases for interactive query expansion following pseudo-relevance feedback, together with a new phrase search method. Phrases were selected with a combined syntactico-statistical method: first, candidate noun phrases were extracted using a part-of-speech tagger and a noun-phrase chunker; second, different statistical measures were applied to select phrases for query expansion. Experiments were also conducted to study the effectiveness of noun phrases in document ranking. We analyse the problems of phrase weighting and suggest new ways of addressing them. A new method of phrase matching and weighting was developed, which specifically addresses the problem of weighting overlapping and non-contiguous word sequences in documents.
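
The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the chunking rules or the statistical measures, so the sketch assumes input that is already tagged with Penn Treebank part-of-speech tags, a simple adjective/noun chunking pattern, and document frequency within the feedback set as a stand-in statistic. The function names `chunk_noun_phrases` and `rank_phrases` are hypothetical.

```python
from collections import Counter

# Assumed tag sets (Penn Treebank style); the paper's chunker may differ.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}


def chunk_noun_phrases(tagged):
    """Syntactic step: return multi-word noun phrases from (word, tag) pairs.

    A phrase is a maximal run of adjectives and nouns, trimmed so that it
    ends on a noun; single-word runs are discarded.
    """
    phrases, run = [], []
    # The sentinel pair flushes a run that reaches the end of the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            run.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so the phrase ends on a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if len(run) >= 2:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def rank_phrases(tagged_docs, top_k=5):
    """Statistical step (assumed measure): rank phrases by the number of
    feedback documents they occur in, breaking ties alphabetically."""
    doc_freq = Counter()
    for tagged in tagged_docs:
        doc_freq.update(set(chunk_noun_phrases(tagged)))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

In an interactive setting, the top-ranked phrases would be shown to the searcher, who picks the ones to add to the query. Any other association or termhood measure could replace the document-frequency count without changing the chunking step.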

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vechtomova, O. (2005). The Role of Multi-word Units in Interactive Information Retrieval. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_29

  • DOI: https://doi.org/10.1007/978-3-540-31865-1_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25295-5

  • Online ISBN: 978-3-540-31865-1

  • eBook Packages: Computer Science (R0)
