Exploring Random Indexing for Profile Learning

Fonseca Bruzón, Adrian; López-López, Aurelio; Medina Pagola, José

doi:10.1007/978-3-319-33500-1_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9577)

Included in the following conference series:

International Workshop on Future and Emerging Trends in Language Technology

Abstract

Random Indexing is a recent dimensionality reduction technique that builds a word space model from a set of contexts. It is less computationally expensive than alternatives such as LSI, Word2Vec, or LDA, which makes it an attractive candidate for online learning environments. In this work, we compare several variants reported in the Random Indexing literature with the aim of using them for the profile learning task. Experiments conducted on a subcollection of the Reuters-21578 dataset show that Random Indexing produces promising results, and they identify some variants that offer no real advantage for the task at hand. A comparison of Random Indexing with LDA, Word2Vec, and LSI also shows that the technique is a viable alternative for representing documents.
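To make the idea of building a word space from contexts concrete, the following is a minimal sketch of a basic document-based Random Indexing scheme. It is not the authors' implementation and does not reproduce any of the variants compared in the chapter. The function names, the toy documents, and the parameters `dim`, `nonzeros`, and `seed` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` random positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(docs, dim=64, nonzeros=4, seed=0):
    """Build word vectors by summing the index vectors of the documents
    in which each word occurs (one addition per occurrence)."""
    rng = random.Random(seed)
    context = defaultdict(lambda: [0] * dim)
    for doc in docs:
        # Each document (context) receives its own random index vector.
        index_vec = make_index_vector(dim, nonzeros, rng)
        for word in doc.split():
            ctx = context[word]
            for i, value in enumerate(index_vec):
                ctx[i] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy collection, for illustration only.
docs = [
    "oil prices rise",
    "oil exports fall",
    "wheat harvest grows",
]
vectors = random_indexing(docs)
print(cosine(vectors["prices"], vectors["rise"]))
print(cosine(vectors["oil"], vectors["wheat"]))
```

Because the word vectors are updated incrementally as each document arrives, with no global matrix factorization step, a scheme of this kind fits naturally into an online setting. Words that share contexts, such as "prices" and "rise" in the toy data, end up with similar vectors.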

Acknowledgments

We thank the CPU Lab for the use of its facilities for running the experiments.

The first author was supported by Conacyt through scholarship 635046, and the second author was partially supported by SNI, México.

Author information

Authors and Affiliations

National Institute of Astrophysics, Optics and Electronics, Sta María Tonantzintla, Puebla, Mexico
Adrian Fonseca Bruzón & Aurelio López-López
Center for Pattern Recognition and Data Mining, Santiago de Cuba, Cuba
Adrian Fonseca Bruzón
Advanced Technologies Application Center, Havana, Cuba
José Medina Pagola

Corresponding author

Correspondence to Adrian Fonseca Bruzón.

Editor information

Editors and Affiliations

University of Seville, Seville, Spain
José F. Quesada
University of Seville, Seville, Spain
Francisco-Jesús Martín Mateos
University of Seville, Seville, Spain
Teresa Lopez-Soto

Cite this paper

Fonseca Bruzón, A., López-López, A., Medina Pagola, J. (2016). Exploring Random Indexing for Profile Learning. In: Quesada, J., Martín Mateos, FJ., Lopez-Soto, T. (eds) Future and Emergent Trends in Language Technology. FETLT 2015. Lecture Notes in Computer Science, vol 9577. Springer, Cham. https://doi.org/10.1007/978-3-319-33500-1_7

DOI: https://doi.org/10.1007/978-3-319-33500-1_7
Published: 26 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33499-8
Online ISBN: 978-3-319-33500-1
eBook Packages: Computer Science, Computer Science (R0)
