Abstract
Random indexing is a method for constructing vector spaces at a reduced dimensionality. The method was previously described in terms of Kanerva’s sparse distributed memory model. Although intuitively plausible, that description does not provide a mathematical justification for setting the method’s parameters. This paper revisits random indexing using the principles of sparse random projections in Euclidean spaces, in order to complement the earlier description.
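As a rough illustration of the sparse-random-projection view mentioned in the abstract, the sketch below builds reduced vectors by summing sparse ternary index vectors, one per context. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy co-occurrence counts, the dimension `dim`, the number of non-zero entries `nonzeros`, and the `1/sqrt(nonzeros)` scaling (chosen here so that squared Euclidean distances are preserved in expectation) are all illustrative choices.

```python
import math
import random


def make_index_vector(dim, nonzeros, rng):
    """Return a sparse ternary index vector as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((-1, 1)) for p in positions}


def random_indexing(cooccurrences, dim=256, nonzeros=8, seed=0):
    """Build reduced vectors by accumulating index vectors of observed contexts.

    cooccurrences maps each target to a {context: count} dict.
    The parameters dim, nonzeros and seed are hypothetical defaults.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(nonzeros)  # assumed scaling, see lead-in
    index_vectors = {}
    reduced = {}
    for target, contexts in cooccurrences.items():
        vec = [0.0] * dim
        for context, count in contexts.items():
            # Each context receives one fixed random index vector on first use.
            if context not in index_vectors:
                index_vectors[context] = make_index_vector(dim, nonzeros, rng)
            for pos, sign in index_vectors[context].items():
                vec[pos] += sign * count * scale
        reduced[target] = vec
    return reduced


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy co-occurrence counts (invented for illustration only).
cooc = {
    "cat": {"purr": 3, "pet": 2},
    "kitten": {"purr": 3, "pet": 2},
    "dog": {"bark": 3, "pet": 2},
}
vectors = random_indexing(cooc)
```

Accumulating index vectors in this way amounts to multiplying the original high-dimensional count vectors by a sparse random matrix, so distances such as `euclidean(vectors["cat"], vectors["dog"])` approximate the distances between the original count vectors, with an error that depends on `dim`.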
B. QasemiZadeh—This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant Number SFI/12/RC/2289.
Notes
1. An extension to this discussion and some empirical experiments can be seen in [15].
References
Achlioptas, D.: Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2001, pp. 274–281. ACM, New York (2001)
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)
Damljanovic, D., Petrak, J., Lupu, M., Cunningham, H., Carlsson, M., Engstrom, G., Andersson, B.: Random indexing for finding similar nodes within large RDF graphs. In: Proceedings of the 8th International Conference on the Semantic Web, ESWC 2011, pp. 156–171. Springer, Heidelberg (2012). http://dx.doi.org/10.1007/978-3-642-25953-1_13
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)
De Vries, C.M., De Vine, L., Geva, S.: Random indexing k-tree (2010). CoRR abs/1001.0833, http://arxiv.org/abs/1001.0833
De Vries, C.M., Geva, S.: Pairwise similarity of TopSig document signatures. In: Proceedings of the Seventeenth Australasian Document Computing Symposium, ADCS 2012, pp. 128–134. ACM, New York (2012)
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.7546
Geva, S., De Vries, C.M.: TOPSIG: topology preserving document signatures. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011, pp. 333–338. ACM, New York (2011)
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Conference in Modern Analysis and Probability (New Haven, Connecticut, 1982), Contemporary Mathematics, vol. 26, pp. 189–206. American Mathematical Society (1984). http://www.ams.org/books/conm/026/
Kanerva, P., Kristoferson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, pp. 103–106. Erlbaum (2000). http://www.rni.org/kanerva/cogsci2k-poster.txt
Lapesa, G., Evert, S.: Evaluating neighbor rank and distance measures as predictors of semantic priming. In: Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pp. 66–74. Association for Computational Linguistics, Sofia, Bulgaria, August 2013. http://www.aclweb.org/anthology/W13-2608
Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 287–296. ACM, New York (2006)
Lupu, M.: On the usability of random indexing in patent retrieval. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds.) ICCS 2014. LNCS, vol. 8577, pp. 202–216. Springer, Heidelberg (2014)
Polajnar, T., Clark, S.: Improving distributional semantic vectors through context selection and normalisation. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014). ACL, Gothenburg, Sweden (2014). http://www.cl.cam.ac.uk/%7Esc609/pubs/eacl14tam.pdf
QasemiZadeh, B.: Random indexing explained with high probability (2015)
Sahlgren, M.: An introduction to random indexing. In: Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005 (2005). http://soda.swedish-ict.se/221/1/RI_intro.pdf
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Int. Res. 37(1), 141–188 (2010). http://dl.acm.org/citation.cfm?id=1861751.1861756
Zadeh, B.Q., Handschuh, S.: Evaluation of technology term recognition with random indexing. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association (ELRA), Reykjavik, Iceland, May 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/920_Paper.pdf, ACL Anthology Identifier: L14-1703
Zadeh, B.Q., Handschuh, S.: Random Manhattan indexing. In: 25th International Workshop on Database and Expert Systems Applications, DEXA 2014, pp. 203–208. IEEE (2014). http://dx.doi.org/10.1109/DEXA.2014.51
Zadeh, B.Q., Handschuh, S.: Random Manhattan integer indexing: incremental L1 normed vector space construction. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1713–1723. Association for Computational Linguistics (2014). http://aclweb.org/anthology/D14-1178
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
QasemiZadeh, B. (2015). Random Indexing Revisited. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science(), vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_43
DOI: https://doi.org/10.1007/978-3-319-19581-0_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19580-3
Online ISBN: 978-3-319-19581-0
eBook Packages: Computer Science, Computer Science (R0)