Skip to main content

An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5723))

Abstract

Explicit Semantic Analysis (ESA) has been recently proposed as an approach to computing semantic relatedness between words (and indirectly also between texts) and has thus a natural application in information retrieval, showing the potential to alleviate the vocabulary mismatch problem inherent in standard Bag-of-Word models. The ESA model has been also recently extended to cross-lingual retrieval settings, which can be considered as an extreme case of the vocabulary mismatch problem. The ESA approach actually represents a class of approaches and allows for various instantiations. As our first contribution, we generalize ESA in order to clearly show the degrees of freedom it provides. Second, we propose some variants of ESA along different dimensions, testing their impact on performance on a cross-lingual mate retrieval task on two datasets (JRC-ACQUIS and Multext). Our results are interesting as a systematic investigation has been missing so far and the variations between different basic design choices are significant. We also show that the settings adopted in the original ESA implementation are reasonably good, which to our knowledge has not been demonstrated so far, but can still be significantly improved by tuning the right parameters (yielding a relative improvement on a cross-lingual mate retrieval task of between 62% (Multext) and 237% (JRC-ACQUIS) with respect to the original ESA model).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Richardson, R., Smeaton, A.: Using wordnet in a knowledge-based approach to information retrieval. In: Proceedings of the BCS-IRSG-Colloquium (1995)

    Google Scholar 

  2. Schütze, H., Pedersen, J.: A cooccurrence-based thesaurus and two applications to information retrieval. Information Processing and Management 33(3), 307–318 (1997)

    Article  Google Scholar 

  3. Gurevych, I., Müller, C., Zesch, T.: What to be? - electronic career guidance based on semantic relatedness. In: Proceedings of ACL (2007)

    Google Scholar 

  4. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society of Information Science 41(6), 391–407 (1990)

    Article  Google Scholar 

  5. Gonzalo, J., Verdejo, F., Chugur, I., Cigarran, J.: Indexing with wordnet synsets can improve text retrieval. In: Proceedings of the COLING/ACL 1998 Workshop on Usage of WordNet for NLP, pp. 38–44 (1998)

    Google Scholar 

  6. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of IJCAI, pp. 1606–1611 (2007)

    Google Scholar 

  7. Furnas, G., Landauer, T., Gomez, L., Dumais, S.: The vocabulary problem in human-system communication. Communications of the ACM 30(1), 964–971 (1987)

    Article  Google Scholar 

  8. Sorg, P., Cimiano, P.: Cross-lingual information rerieval with explicit semantic analysis. In: Working Notes of the Annual CLEF Meeting (2008)

    Google Scholar 

  9. Potthast, M., Stein, B., Anderka, M.: A wikipedia-based multilingual retrieval model. In: Proceedings of ECIR, pp. 522–530 (2008)

    Google Scholar 

  10. Littman, M., Dumais, S., Landauer, T.: Automatic Cross-Language Information Retrieval using Latext Semantic Indexing. In: Cross-Language Information Retrieval, pp. 51–62. Kluwer, Dordrecht (1998)

    Google Scholar 

  11. Dumais, S., Letsche, T., Littman, M., Landauer, T.: Automatic cross-language retrieval using latent semantic indexing. In: Proceedings of the AAAI Symposium on Cross Language Text and Speech Retrieval (1997)

    Google Scholar 

  12. Müller, C., Gurevych, I.: Using wikipedia and wiktionary in domain-specific information retrieval. In: Peters, C., Deselaers, T., Ferro, N., Gonzalo, J., Jones, G.J.F., Kurimo, M., Mandl, T., Peñas, A., Petras, V. (eds.) Evaluating Systems for Multilingual and Multimodal Information Access. LNCS, vol. 5706, pp. 219–226. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  13. Gabrilovich, E.: Feature Generation for Textual Information Retrieval using World Knowledge. PhD thesis, Israel Institute of Technology, Haifa (2006)

    Google Scholar 

  14. Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at trec-3. In: Proceedings of TREC (1994)

    Google Scholar 

  15. Zhai, C.X., Lafferty, J.D.: Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of CIKM, pp. 403–410 (2001)

    Google Scholar 

  16. Lee, L.: Measures of distributional similarity. In: Proceedings of ACL (1999)

    Google Scholar 

  17. Egozi, O., Gabrilovich, E., Markovitch, S.: Concept-based feature generation and selection for information retrieval. In: Proceedings of AAAI (2008)

    Google Scholar 

  18. Gabrilovich, E., Markovitch, S.: Feature generation for text categorization using world knowledge. In: Proceedings of IJCAI (2005)

    Google Scholar 

  19. Gupta, R., Ratinov, L.: Text categorization with knowledge transfer from heterogeneous data sources. In: Proceedings of AAAI, pp. 842–847 (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sorg, P., Cimiano, P. (2010). An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval. In: Horacek, H., Métais, E., Muñoz, R., Wolska, M. (eds) Natural Language Processing and Information Systems. NLDB 2009. Lecture Notes in Computer Science, vol 5723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12550-8_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-12550-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12549-2

  • Online ISBN: 978-3-642-12550-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics