
Cross-Language Information Filtering: Word Sense Disambiguation vs. Distributional Models

  • Conference paper
AI*IA 2011: Artificial Intelligence Around Man and Beyond (AI*IA 2011)

Abstract

The exponential growth of the Web is the most influential factor contributing to the increasing importance of text retrieval and filtering systems. However, since information exists in many languages, users may also consider as relevant documents written in languages other than the one in which the query is formulated. In this context, an emerging requirement is to sift through the increasing flood of multilingual text, which poses a renewed challenge for designing effective multilingual Information Filtering systems. How could we represent user information needs or user preferences in a language-independent way?

In this paper, we compared two content-based techniques able to provide users with cross-language recommendations: the first relies on a knowledge-based word sense disambiguation technique that uses MultiWordNet as its sense inventory, while the second is based on a dimensionality reduction technique called Random Indexing and exploits the so-called distributional hypothesis to build language-independent user profiles.

Since the experiments conducted in a movie recommendation scenario show the effectiveness of both approaches, we also tried to highlight the strengths and weaknesses of each approach in order to identify the scenarios in which a specific technique fits better.
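
The distributional technique named in the abstract can be illustrated with a minimal sketch of Random Indexing. It is not the authors' implementation. It assumes that the same item is described in two languages and that both descriptions share one random index vector; this alignment strategy, the toy corpus, the constants `DIM` and `NONZERO`, and all function names are hypothetical choices made for illustration only.

```python
import math
import random
from collections import defaultdict

DIM = 300      # reduced dimensionality (hypothetical value)
NONZERO = 10   # number of +1/-1 entries per index vector (hypothetical value)


def index_vector(rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_term_space(docs, contexts):
    """Term vector = sum of the index vectors of the contexts the term occurs in."""
    space = defaultdict(lambda: [0.0] * DIM)
    for item_id, text in docs.items():
        for term in text.lower().split():
            space[term] = add(space[term], contexts[item_id])
    return space


def text_vector(text, space):
    """Represent a text (item description or profile) as the sum of its term vectors."""
    vec = [0.0] * DIM
    for term in text.lower().split():
        if term in space:
            vec = add(vec, space[term])
    return vec


# Toy aligned corpus, invented for illustration: the same items in two languages.
english = {"m1": "space alien war", "m2": "love wedding comedy"}
italian = {"m1": "spazio alieno guerra", "m2": "amore matrimonio commedia"}

rng = random.Random(42)
# Assumption: one index vector per item, shared by both language versions.
contexts = {item_id: index_vector(rng) for item_id in english}

en_space = build_term_space(english, contexts)
it_space = build_term_space(italian, contexts)

# Profile built from English text, candidate items described in Italian.
profile = text_vector("alien war", en_space)
score_scifi = cosine(profile, text_vector("guerra alieno", it_space))
score_romance = cosine(profile, text_vector("amore commedia", it_space))
print(score_scifi > score_romance)
```

Because terms that occur in the same contexts accumulate the same index vectors, the English profile and the Italian description of the science-fiction item end up close in the reduced space, even though they share no surface terms. A realistic setting would use much larger corpora and a tuned dimensionality.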




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Musto, C., Narducci, F., Basile, P., Lops, P., de Gemmis, M., Semeraro, G. (2011). Cross-Language Information Filtering: Word Sense Disambiguation vs. Distributional Models. In: Pirrone, R., Sorbello, F. (eds) AI*IA 2011: Artificial Intelligence Around Man and Beyond. AI*IA 2011. Lecture Notes in Computer Science, vol 6934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23954-0_24


  • DOI: https://doi.org/10.1007/978-3-642-23954-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23953-3

  • Online ISBN: 978-3-642-23954-0

  • eBook Packages: Computer Science, Computer Science (R0)
