Skip to main content

Random Indexing Revisited

  • Conference paper
  • First Online:
Natural Language Processing and Information Systems (NLDB 2015)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9103))

Abstract

Random indexing is a method for constructing vector spaces at a reduced dimensionality. Previously, the method has been proposed using Kanerva’s sparse distributed memory model. Although intuitively plausible, this description fails to provide mathematical justification for setting the method’s parameters. The random indexing method is revisited using the principles of sparse random projections in Euclidean spaces in order to complement its previous delineation.

B. QasemiZadeh—This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant Number SFI/12/RC/2289.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    An extension to this discussion and some empirical experiments can be seen in [15].

References

  1. Achlioptas, D.: Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2001, pp. 274–281. ACM, New York (2001)

    Google Scholar 

  2. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  3. Damljanovic, D., Petrak, J., Lupu, M., Cunningham, H., Carlsson, M., Engstrom, G., Andersson, B.: Random indexing for finding similar nodes within large RDF graphs. In: Proceedings of the 8th International Conference on the Semantic Web, ESWC 2011, pp. 156–171. Springer, Heidelberg (2012). http://dx.doi.org/10.1007/978-3-642-25953-1_13

  4. Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  5. De Vries, C.M., De Vine, L., Geva, S.: Random indexing k-tree (2010). CoRR abs/1001.0833, http://arxiv.org/abs/1001.0833

  6. De Vries, C.M., Geva, S.: Pairwise similarity of TopSig document signatures. In: Proceedings of the Seventeenth Australasian Document Computing Symposium, ADCS 2012, pp. 128–134. ACM, New York (2012)

    Google Scholar 

  7. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.7546

  8. Geva, S., De Vries, C.M.: TOPSIG: topology preserving document signatures. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011, pp. 333–338. ACM, New York (2011)

    Google Scholar 

  9. Johnson, W., Lindenstrauss, J.: Extensions of lipschitz mappings into a Hilbert space. In: Conference in Modern Analysis and Probability (New Haven, Connecticut, 1982), Contemporary Mathematics, vol. 26, pp. 189–206. American Mathematical Society (1984). http://www.ams.org/books/conm/026/

  10. Kanerva, P., Kristoferson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, pp. 103–106. Erlbaum (2000). http://www.rni.org/kanerva/cogsci2k-poster.txt

  11. Lapesa, G., Evert, S.: Evaluating neighbor rank and distance measures as predictors of semantic priming. In: Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pp. 66–74. Association for Computational Linguistics, Sofia, Bulgaria, August 2013. http://www.aclweb.org/anthology/W13-2608

  12. Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 287–296. ACM, New York (2006)

    Google Scholar 

  13. Lupu, M.: On the usability of random indexing in patent retrieval. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds.) ICCS 2014. LNCS, vol. 8577, pp. 202–216. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  14. Polajnar, T., Clark, S.: Improving distributional semantic vectors through context selection and normalisation. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014). ACL, Gothenburg, Sweden (2014). http://www.cl.cam.ac.uk/%7Esc609/pubs/eacl14tam.pdf

  15. QasemiZadeh, B.: Random indexing explained with high probability (2015)

    Google Scholar 

  16. Sahlgren, M.: An introduction to random indexing. In: Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005 (2005). http://soda.swedish-ict.se/221/1/RI_intro.pdf

  17. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)

    Article  MATH  Google Scholar 

  18. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Int. Res. 37(1), 141–188 (2010). http://dl.acm.org/citation.cfm?id=1861751.1861756

  19. Zadeh, B.Q., Handschuh, S.: Evaluation of technology term recognition with random indexing. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association (ELRA), Reykjavik, Iceland, May 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/920_Paper.pdf, aCL Anthology Identifier: L14–1703

  20. Zadeh, B.Q., Handschuh, S.: Random Manhattan indexing. In: 25th International Workshop on Database and Expert Systems Applications, DEXA 2014, pp. 203–208. IEEE (2014). http://dx.doi.org/10.1109/DEXA.2014.51

  21. Zadeh, B.Q., Handschuh, S.: Random Manhattan integer indexing: incremental L1 normed vector space construction. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1713–1723. Association for Computational Linguistics (2014). http://aclweb.org/anthology/D14-1178

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Behrang QasemiZadeh .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

QasemiZadeh, B. (2015). Random Indexing Revisited. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science(), vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_43

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-19581-0_43

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19580-3

  • Online ISBN: 978-3-319-19581-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics