Combining Vector Space Model and Multi Word Term Extraction for Semantic Query Expansion

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2007)

Abstract

In this paper, we target document ranking in a highly technical field, with the aim of approximating a ranking obtained through an existing ontology (knowledge structure). We test and combine symbolic and vector space model (VSM) approaches. Our symbolic approach relies on shallow NLP and on internal linguistic relations between Multi-Word Terms (MWTs). Documents are ranked according to the different semantic relations they share with the query terms, either directly or indirectly after the MWTs have been clustered using the identified lexico-semantic relations. The VSM approach consisted of ranking documents with different functions, ranging from the classical tf.idf to more elaborate similarity functions. Results show that the ranking obtained by the symbolic approach performs better than the vector space model on most queries. However, the ranking obtained by combining both approaches outperforms, by a wide margin, the results obtained by the individual methods of either approach.
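To make the two ingredients of the abstract concrete, the sketch below ranks documents with the classical tf.idf weighting and cosine similarity, then fuses that ranking with a second score list standing in for the symbolic (MWT-relation) approach. It is an illustration under stated assumptions, not the authors' method: MWT extraction is assumed to have already happened (each MWT is a single token such as `gene_expression`), the "symbolic" scores are invented toy values, and the fusion rule (a weighted sum of min-max normalised scores, CombSUM-style, with a hypothetical weight `alpha`) is a swapped-in technique, because the abstract does not say how the two approaches were combined.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf.idf vector (term -> weight) per tokenised document.

    Weight = raw term frequency * log(N / df), the classical tf.idf scheme.
    Returns the vectors and the idf table so queries can be weighted the same way.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    idf = {t: math.log(n / d) for t, d in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vsm_scores(query, docs):
    """Score every document against the query with tf.idf + cosine."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection carry no idf and are dropped.
    qvec = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    return [cosine(qvec, d) for d in vecs]


def min_max(scores):
    """Rescale scores to [0, 1] so lists from different models are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(vsm, symbolic, alpha=0.5):
    """ASSUMED fusion rule (not from the paper): weighted sum of normalised scores."""
    a, b = min_max(vsm), min_max(symbolic)
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


def rank(scores):
    """Document indices ordered by decreasing score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy collection: MWTs are pre-joined into single tokens (assumption).
docs = [
    ["gene_expression", "microarray", "data"],
    ["protein", "interaction", "network"],
    ["gene_expression", "regulation", "network"],
]
query = ["gene_expression", "network"]

vsm = vsm_scores(query, docs)
symbolic = [0.9, 0.1, 0.5]  # invented stand-in for the symbolic ranking scores
fused = combine(vsm, symbolic)
print(rank(fused))  # [2, 0, 1]
```

Normalising before summing matters here: cosine scores and relation-based scores live on different scales, so adding them raw would let one model dominate the fused ranking regardless of `alpha`.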



Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

SanJuan, E., Ibekwe-SanJuan, F., Torres-Moreno, JM., Velázquez-Morales, P. (2007). Combining Vector Space Model and Multi Word Term Extraction for Semantic Query Expansion. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_22

  • DOI: https://doi.org/10.1007/978-3-540-73351-5_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73350-8

  • Online ISBN: 978-3-540-73351-5

  • eBook Packages: Computer Science, Computer Science (R0)

Publish with us

Policies and ethics