An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval

Sorg, Philipp; Cimiano, Philipp

doi:10.1007/978-3-642-12550-8_4

Philipp Sorg & Philipp Cimiano

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5723)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Explicit Semantic Analysis (ESA) has recently been proposed as an approach to computing semantic relatedness between words (and, indirectly, between texts). It thus has a natural application in information retrieval, where it has the potential to alleviate the vocabulary mismatch problem inherent in standard bag-of-words models. The ESA model has also recently been extended to cross-lingual retrieval, which can be considered an extreme case of the vocabulary mismatch problem. ESA in fact represents a class of approaches that admits various instantiations. As our first contribution, we generalize ESA to make explicit the degrees of freedom it provides. Second, we propose variants of ESA along different dimensions and test their impact on performance in a cross-lingual mate retrieval task on two datasets (JRC-ACQUIS and Multext). Our results are of interest because a systematic investigation has so far been missing, and because the differences in performance between basic design choices turn out to be significant. We also show that the settings adopted in the original ESA implementation are reasonably good, which to our knowledge has not been demonstrated before, but that they can still be improved significantly by tuning the right parameters, yielding a relative improvement on the cross-lingual mate retrieval task of between 62% (Multext) and 237% (JRC-ACQUIS) over the original ESA model.
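The abstract describes ESA and the mate retrieval task only at a high level. The following is a minimal Python sketch of the general idea, under stated assumptions: each text is projected onto a vector over a fixed set of concepts (in the usual setup, Wikipedia articles), using plain TF-IDF term-concept weights; for the cross-lingual case, concepts are assumed to be aligned across languages (for example via interlanguage links), so English and German documents land in the same concept space and can be compared with cosine similarity. Mate retrieval then checks whether each source document's translation is ranked first among the target documents. The toy concept texts, the function names (`term_concept_index`, `esa_vector`, `mate_retrieval_accuracy`) and the weighting scheme are hypothetical illustrations, not the authors' implementation or any of the specific variants evaluated in the paper.

```python
import math
from collections import Counter

# Hypothetical toy concept space: each concept (think: one Wikipedia article)
# has a description per language, aligned by concept ID as if via
# interlanguage links. These texts are illustrative, not real data.
CONCEPTS = {
    "Agriculture": {"en": "farm crop harvest farmer field",
                    "de": "bauernhof ernte landwirt feld getreide"},
    "Finance": {"en": "bank money loan interest market",
                "de": "bank geld kredit zinsen markt"},
    "Fishing": {"en": "fish boat sea catch fleet",
                "de": "fisch boot meer fang flotte"},
}


def term_concept_index(lang):
    """Map each term to {concept: tf-idf weight}, computed within one language."""
    docs = {c: Counter(texts[lang].split()) for c, texts in CONCEPTS.items()}
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    index = {}
    for concept, counts in docs.items():
        for term, tf in counts.items():
            index.setdefault(term, {})[concept] = tf * math.log(n / df[term])
    return index


def esa_vector(text, index):
    """Project a text onto the concept space: sum its terms' concept vectors,
    weighted by term frequency. Terms unknown to the index are ignored."""
    vec = Counter()
    for term, tf in Counter(text.lower().split()).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += tf * weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def mate_retrieval_accuracy(src_docs, tgt_docs, src_lang, tgt_lang):
    """Fraction of source docs whose mate (the target doc at the same
    position) is ranked first by ESA cosine similarity."""
    src_idx, tgt_idx = term_concept_index(src_lang), term_concept_index(tgt_lang)
    tgt_vecs = [esa_vector(d, tgt_idx) for d in tgt_docs]
    hits = 0
    for i, doc in enumerate(src_docs):
        query = esa_vector(doc, src_idx)
        scores = [cosine(query, t) for t in tgt_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == i:
            hits += 1
    return hits / len(src_docs)


en = ["the farmer brought the harvest from the field",
      "the bank raised the interest on the loan",
      "the fleet returned with a large catch of fish"]
de = ["der landwirt brachte die ernte vom feld",
      "die bank erhöhte die zinsen für den kredit",
      "die flotte kehrte mit einem großen fang fisch zurück"]
print(mate_retrieval_accuracy(en, de, "en", "de"))  # 1.0
```

Every choice in this sketch (the concept collection, the term weighting, how text vectors are built from term vectors, the similarity measure) is one point in the design space that the paper sets out to vary systematically; swapping any of them changes the retrieval results.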

Author information

Authors and Affiliations

Institute AIFB, University of Karlsruhe
Philipp Sorg
Web Information Systems Group, Delft University of Technology
Philipp Cimiano

Editor information

Editors and Affiliations

Institut für Computertechnologie, Technische Universität Wien, A-1040, Wien, Austria
Helmut Horacek
CNAM, Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Campus de San Vicente del Raspeig, Apdo 99, 03080, Alicante, Spain
Rafael Muñoz
Dept. of Computational Linguistics, Saarland University, Germany
Magdalena Wolska

About this paper

Cite this paper

Sorg, P., Cimiano, P. (2010). An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval. In: Horacek, H., Métais, E., Muñoz, R., Wolska, M. (eds) Natural Language Processing and Information Systems. NLDB 2009. Lecture Notes in Computer Science, vol 5723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12550-8_4

DOI: https://doi.org/10.1007/978-3-642-12550-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12549-2
Online ISBN: 978-3-642-12550-8
eBook Packages: Computer Science, Computer Science (R0)
