Orthogonality and Orthography: Introducing Measured Distance into Semantic Space

  • Conference paper
Quantum Interaction (QI 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8369)

Abstract

This paper explores a new technique for encoding structured information into a semantic model, for the construction of vector representations of words and sentences. As an illustrative application, we use this technique to compose robust representations of words based on sequences of letters, that are tolerant to changes such as transposition, insertion and deletion of characters. Since these vectors are generated from the written form or orthography of a word, we call them ‘orthographic vectors’. The representation of discrete letters in a continuous vector space is an interesting example of a Generalized Quantum model, and the process of generating semantic vectors for letters in a word is mathematically similar to the derivation of orbital angular momentum in quantum mechanics. The importance (and sometimes, the violation) of orthogonality is discussed in both mathematical settings. This work is grounded in psychological literature on word representation and recognition, and is also motivated by potential technological applications such as genre-appropriate spelling correction. The mathematical method, examples and experiments, and the implementation and availability of the technique in the Semantic Vectors package are also discussed.

Notes

  1. The same random number sequence must be used for all vectors in a demarcator set, so that a consistent random value for each bit position is compared to the relevant thresholds.

  2. For terms of different lengths, we elected to construct a set of demarcator vectors for each term. So while \(D(\alpha )\) and \(D(\omega )\) will be identical, the demarcator for a particular position may differ. It would also be possible to use identical demarcator vectors (by generating a set large enough to accommodate the longest term), which may be advantageous for some tasks.

  3. In this example we have drawn negative and positive positions, though in practice we have only experimented with nonnegative positions so far.

  4. As the randomization procedure makes it very unlikely that the estimates of similarity between any two pairs will be identical, we have considered a difference of \(\le 0.05\) to be approximately equal. This mirrors the relaxed criterion used by Hannagan and his colleagues for the stability constraint, under which a similarity of \(\ge 0.95\) is treated as approximately identical [15, 23].
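Notes 1 and 4 can be illustrated with a short sketch. It assumes that a binary demarcator vector for an intermediate position takes each bit from \(D(\omega )\) when a per-bit random value falls below a position-dependent threshold, and from \(D(\alpha )\) otherwise. The linear threshold schedule, the function names and the `seed` parameter are illustrative assumptions, not the Semantic Vectors implementation.

```python
import random


def demarcator_set(alpha, omega, n_positions, seed=0):
    """Build binary demarcator vectors running from alpha to omega.

    Hypothetical helper: the linear thresholds are an assumption.
    """
    if len(alpha) != len(omega):
        raise ValueError("alpha and omega must have the same dimension")
    if n_positions < 2:
        raise ValueError("need at least two positions")
    rng = random.Random(seed)
    # One random value per bit position, shared by every vector in the set
    # (the requirement stated in note 1).
    draws = [rng.random() for _ in alpha]
    vectors = []
    for k in range(n_positions):
        threshold = k / (n_positions - 1)  # 0 at alpha, 1 at omega
        vectors.append(
            [o if d < threshold else a for a, o, d in zip(alpha, omega, draws)]
        )
    return vectors


def hamming(u, v):
    """Number of bit positions at which u and v differ."""
    return sum(x != y for x, y in zip(u, v))


def approximately_equal(s1, s2, tol=0.05):
    """Tolerance from note 4: a difference of at most 0.05 counts as equal."""
    return abs(s1 - s2) <= tol


# Usage: random end points, then five demarcators between them.
bits = random.Random(1)
alpha = [bits.randint(0, 1) for _ in range(1000)]
omega = [bits.randint(0, 1) for _ in range(1000)]
demarcators = demarcator_set(alpha, omega, n_positions=5)
distances = [hamming(alpha, d) for d in demarcators]
print(distances)
```

Because the draws are shared, the set of bits copied from \(D(\omega )\) can only grow as the threshold rises, so the distance from \(D(\alpha )\) never decreases with position. With independent draws per vector this ordering would hold only on average.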

References

  1. Jones, M.N., Mewhort, D.J.K.: Representing word meaning and order information in a composite holographic lexicon. Psychol. Rev. 114, 1–37 (2007)

  2. Sahlgren, M., Holst, A., Kanerva, P.: Permutations as a means to encode order in word space. In: Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci’08), July 23–26, Washington D.C., USA (2008)

  3. Cox, G.E., Kachergis, G., Recchia, G., Jones, M.N.: Toward a scalable holographic word-form representation. Behav. Res. Methods 43, 602–615 (2011)

  4. Kachergis, G., Cox, G.E., Jones, M.N.: Orbeagle: integrating orthography into a holographic model of the lexicon. In: Honkela, T. (ed.) ICANN 2011, Part I. LNCS, vol. 6791, pp. 307–314. Springer, Heidelberg (2011)

  5. Plate, T.A.: Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications, Stanford (2003)

  6. Cohen, T., Widdows, D.: Empirical distributional semantics: methods and biomedical applications. J. Biomed. Inf. 42, 390–405 (2009)

  7. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37(1), 141–188 (2010)

  8. Landauer, T., Dumais, S.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211–240 (1997)

  9. Burgess, C., Livesay, K., Lund, K.: Explorations in context space: words, sentences, discourse. Discourse Process. 25, 211–257 (1998)

  10. De Vine, L., Bruza, P.: Semantic oscillations: encoding context and structure in complex valued holographic vectors. In: Proceedings of AAAI Fall Symposium on Quantum Informatics for Cognitive, Social, and Semantic Processes (2010)

  11. Basile, P., Caputo, A., Semeraro, G.: Encoding syntactic dependencies by vector permutation. In: Proceedings of the EMNLP 2011 Workshop on GEometrical Models of Natural Language Semantics, GEMS, vol. 11, pp. 43–51 (2011)

  12. Cohen, T., Widdows, D., Schvaneveldt, R., Rindflesch, T.: Finding schizophrenia’s Prozac: emergent relational similarity in predication space. In: Song, D., Melucci, M., Frommholz, I., Zhang, P., Wang, L., Arafat, S. (eds.) QI 2011. LNCS, vol. 7052, pp. 48–59. Springer, Heidelberg (2011)

  13. Gomez, P., Ratcliff, R., Perea, M.: The overlap model: a model of letter position coding. Psychol. Rev. 115, 577–600 (2008)

  14. Davis, C.J.: The spatial coding model of visual word identification. Psychol. Rev. 117(3), 713 (2010)

  15. Hannagan, T., Dupoux, E., Christophe, A.: Holographic string encoding. Cogn. Sci. 35(1), 79–118 (2011)

  16. Gayler, R.W.: Vector symbolic architectures answer Jackendoff’s challenges for cognitive neuroscience. In: Slezak, P. (ed.) ICCS/ASCS International Conference on Cognitive Science, pp. 133–138. University of New South Wales, Sydney (2004)

  17. Kanerva, P.: Binary spatter-coding of ordered k-tuples. In: Artificial Neural Networks – ICANN, vol. 96, pp. 869–873 (1996)

  18. Wahle, M., Widdows, D., Herskovic, J.R., Bernstam, E.V., Cohen, T.: Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts. In: AMIA Annual Symposium Proceedings 2012, pp. 940–949, November 2012

  19. Widdows, D., Peters, S.: Word vectors and quantum logic: experiments with negation and disjunction. In: Proceedings of 8th Mathematics of Language Conference, Bloomington, Indiana (2003)

  20. Cohen, T., Widdows, D., De Vine, L., Schvaneveldt, R., Rindflesch, T.C.: Many paths lead to discovery: analogical retrieval of cancer therapies. In: Busemeyer, J.R., Dubois, F., Lambert-Mogiliansky, A., Melucci, M. (eds.) QI 2012. LNCS, vol. 7620, pp. 90–101. Springer, Heidelberg (2012)

  21. Bohm, D.: Quantum Theory. Prentice-Hall, Upper Saddle River (1951). Republished by Dover (1989)

  22. Widdows, D., Cohen, T.: Real, complex, and binary semantic vectors. In: Sixth International Symposium on Quantum Interaction, Paris, France (2012)

  23. Hannagan, T., Grainger, J.: Protein analysis meets visual word recognition: a case for string kernels in the brain. Cogn. Sci. 36(4), 575–606 (2012)

  24. Yarowsky, D., Wicentowski, R.: Minimally supervised morphological analysis by multimodal alignment. In: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL ’00, pp. 207–216. Association for Computational Linguistics, Stroudsburg, PA, USA (2000)

  25. Baroni, M., Matiasek, J., Trost, H.: Unsupervised discovery of morphologically related words based on orthographic and semantic similarity. In: Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, MPL ’02, vol. 6, pp. 48–57. Association for Computational Linguistics, Stroudsburg, PA, USA (2002)

  26. Dennis, S.: Introducing word order within the LSA framework. In: Handbook of Latent Semantic Analysis (2007)

  27. Widdows, D., Ferraro, K.: Semantic vectors: a scalable open source package and online technology management application. In: Sixth International Conference on Language Resources and Evaluation (LREC 2008) (2008)

  28. Widdows, D., Cohen, T.: The semantic vectors package: New algorithms and public tools for distributional semantics. In: Fourth IEEE International Conference on Semantic Computing (ICSC), pp. 9–15 (2010)

Acknowledgments

This research was supported by US National Library of Medicine grant R21 LM010826. We would like to thank Lance DeVine for the CHRR implementation used in this research, and Tom Landauer for providing the TASA corpus.

Author information

Corresponding author

Correspondence to Trevor Cohen.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cohen, T., Widdows, D., Wahle, M., Schvaneveldt, R. (2014). Orthogonality and Orthography: Introducing Measured Distance into Semantic Space. In: Atmanspacher, H., Haven, E., Kitto, K., Raine, D. (eds) Quantum Interaction. QI 2013. Lecture Notes in Computer Science, vol 8369. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54943-4_4

  • DOI: https://doi.org/10.1007/978-3-642-54943-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54942-7

  • Online ISBN: 978-3-642-54943-4

  • eBook Packages: Computer Science, Computer Science (R0)
