Exploring Random Indexing for Profile Learning

  • Conference paper
Future and Emergent Trends in Language Technology (FETLT 2015)

Abstract

Random Indexing is a relatively recent dimensionality reduction technique that makes it possible to obtain a word space model from a set of contexts. It is less computationally expensive than alternatives such as LSI, Word2Vec, or LDA, which makes it an attractive option for online learning environments. In this work, we compare several Random Indexing variants reported in the literature with the aim of applying them to the profile learning task. Experiments conducted on a subcollection of the Reuters-21578 dataset show that Random Indexing produces promising results, and they identify some variants that offer no real advantage for this task. A comparison of Random Indexing with LDA, Word2Vec, and LSI also shows that the technique is a viable alternative for representing documents.
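
To make the mechanism concrete, the following is a minimal Python sketch of Random Indexing as it is commonly described in the literature: every term receives a fixed, sparse, ternary random index vector, and a term's context vector is the running sum of the index vectors of its neighbours within a sliding window. It is not the authors' implementation. The class name `RandomIndexing`, the dimensionality (512), the number of non-zero entries (8), the window size (2), the toy documents, and the document representation (sum of context vectors) are all assumptions chosen for illustration. The "profile" at the end is likewise a hypothetical centroid of documents a user marked relevant, scored with cosine similarity.

```python
import numpy as np
from collections import defaultdict


class RandomIndexing:
    """Minimal Random Indexing sketch; all parameter values are assumptions."""

    def __init__(self, dim=512, nonzeros=8, window=2, seed=0):
        self.dim = dim            # reduced dimensionality d
        self.nonzeros = nonzeros  # number of +1/-1 entries per index vector
        self.window = window      # symmetric context window (tokens on each side)
        self.rng = np.random.default_rng(seed)
        self.index = {}           # term -> fixed random index vector
        self.context = defaultdict(lambda: np.zeros(self.dim))  # term -> context vector

    def index_vector(self, term):
        # Assign each new term a sparse ternary random vector; such vectors
        # are nearly orthogonal with high probability when d is large.
        if term not in self.index:
            v = np.zeros(self.dim)
            pos = self.rng.choice(self.dim, size=self.nonzeros, replace=False)
            v[pos] = self.rng.choice([-1.0, 1.0], size=self.nonzeros)
            self.index[term] = v
        return self.index[term]

    def update(self, tokens):
        # Online step: add the index vectors of each term's neighbours to
        # that term's context vector. No full term-by-context matrix is built.
        for i, term in enumerate(tokens):
            lo = max(0, i - self.window)
            hi = min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j != i:
                    self.context[term] += self.index_vector(tokens[j])

    def doc_vector(self, tokens):
        # Assumed representation: a document is the sum of its known terms' context vectors.
        v = np.zeros(self.dim)
        for t in tokens:
            if t in self.context:
                v += self.context[t]
        return v


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


# Toy corpus (invented for illustration, not taken from Reuters-21578).
docs = [
    "oil prices rise as crude supply falls".split(),
    "crude oil supply drops and prices climb".split(),
    "central bank cuts interest rates".split(),
]
ri = RandomIndexing()
for d in docs:
    ri.update(d)
vecs = [ri.doc_vector(d) for d in docs]

# Hypothetical profile: centroid of the documents a user marked relevant.
profile = (vecs[0] + vecs[1]) / 2
new_doc = ri.doc_vector("oil supply falls".split())
print(cosine(profile, new_doc), cosine(profile, vecs[2]))
```

Because each update only adds fixed random vectors, new documents can be folded in without recomputing a decomposition, which is the property that makes the technique attractive for online settings; the two oil-related documents end up far closer to each other (and to the profile) than to the banking document.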

Acknowledgments

We thank the CPU Lab for the use of its facilities for running the experiments.

The first author was supported by Conacyt through scholarship 635046, and the second author was partially supported by SNI, México.

Author information

Correspondence to Adrian Fonseca Bruzón.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fonseca Bruzón, A., López-López, A., Medina Pagola, J. (2016). Exploring Random Indexing for Profile Learning. In: Quesada, J., Martín Mateos, FJ., Lopez-Soto, T. (eds) Future and Emergent Trends in Language Technology. FETLT 2015. Lecture Notes in Computer Science, vol 9577. Springer, Cham. https://doi.org/10.1007/978-3-319-33500-1_7

  • DOI: https://doi.org/10.1007/978-3-319-33500-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33499-8

  • Online ISBN: 978-3-319-33500-1

  • eBook Packages: Computer Science, Computer Science (R0)
