Expansion-by-Analogy: A Vector Symbolic Approach to Semantic Search

  • Conference paper
Quantum Interaction (QI 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8951)

Abstract

In this paper, we develop an approach to semantic search that uses high-dimensional vector representations to infer the nature of the relationship between query concepts and other concepts in relevant documents. We do so by incorporating outside knowledge drawn from tens of millions of concept-relation-concept triplets, known as semantic predications, extracted from the biomedical literature using a Natural Language Processing (NLP) system called SemRep. Inference is accomplished in high-dimensional space using Expansion-by-Analogy, a novel analogical approach to pseudo-relevance feedback, in which the relationships between query concepts and other concepts in the documents in which they occur guide the query expansion process. The semantic-vector-based approaches developed in this work improve performance over a baseline bag-of-concepts model, and these improvements are most pronounced on queries that are not conducive to keyword-based search.
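
As an illustration of the general idea of encoding predications in high-dimensional vectors and recovering relations by analogy, the following is a minimal sketch and not the authors' implementation. It assumes random bipolar (+1/-1) vectors with elementwise multiplication as the binding operator, whereas the acknowledgments indicate that the paper's experiments used a CHRR implementation. The toy predications, the dimensionality, and the function names `infer_relation` and `complete_analogy` are all hypothetical.

```python
import random

DIM = 2048  # vector dimensionality; an arbitrary choice for this sketch


def random_vector(rng):
    """Return a random bipolar (+1/-1) vector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind by elementwise multiplication (self-inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy predications (subject, predicate, object);
# these are illustrative and not taken from SemRep output.
PREDICATIONS = [
    ("aspirin", "TREATS", "headache"),
    ("aspirin", "CAUSES", "bleeding"),
    ("fluoxetine", "TREATS", "depression"),
    ("fluoxetine", "CAUSES", "nausea"),
    ("insulin", "TREATS", "diabetes"),
]

rng = random.Random(0)
concepts = sorted({c for s, _, o in PREDICATIONS for c in (s, o)})
predicates = sorted({p for _, p, _ in PREDICATIONS})
E = {c: random_vector(rng) for c in concepts}    # elemental concept vectors
P = {p: random_vector(rng) for p in predicates}  # elemental predicate vectors

# Semantic vector of a subject: superposition of bind(predicate, object)
# over every predication in which the subject occurs.
S = {c: [0] * DIM for c in concepts}
for subj, pred, obj in PREDICATIONS:
    bound = bind(P[pred], E[obj])
    S[subj] = [x + y for x, y in zip(S[subj], bound)]


def infer_relation(subj, obj):
    """Unbind the object from the subject's semantic vector; pick the nearest predicate."""
    probe = bind(S[subj], E[obj])
    return max(predicates, key=lambda p: cosine(probe, P[p]))


def complete_analogy(cue_subj, cue_obj, target_obj):
    """Answer 'cue_subj is to cue_obj as ? is to target_obj'."""
    relation_estimate = bind(S[cue_subj], E[cue_obj])  # noisy copy of the predicate
    probe = bind(relation_estimate, E[target_obj])
    candidates = [c for c in concepts if c != cue_subj]
    return max(candidates, key=lambda c: cosine(probe, S[c]))


print(infer_relation("fluoxetine", "depression"))
print(complete_analogy("fluoxetine", "depression", "headache"))
```

With this toy data, the first call should recover `TREATS` and the second should retrieve `aspirin`, because the relation estimated from the cue pair is reapplied to a new object. In a retrieval setting, concepts found this way could serve as candidate expansion terms; how the paper actually selects and weights them is not specified in the abstract.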

Acknowledgments

This research was supported by US National Library of Medicine grants R21 LM010826 and R01 LM011563. It was also supported in part by the Intramural Research Program of the US National Institutes of Health, National Library of Medicine. We would like to thank Lance DeVine for contributing the CHRR implementation that was used in this research.

Corresponding author

Correspondence to Trevor Cohen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Cohen, T., Widdows, D., Rindflesch, T. (2015). Expansion-by-Analogy: A Vector Symbolic Approach to Semantic Search. In: Atmanspacher, H., Bergomi, C., Filk, T., Kitto, K. (eds) Quantum Interaction. QI 2014. Lecture Notes in Computer Science, vol 8951. Springer, Cham. https://doi.org/10.1007/978-3-319-15931-7_5

  • DOI: https://doi.org/10.1007/978-3-319-15931-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15930-0

  • Online ISBN: 978-3-319-15931-7

  • eBook Packages: Computer Science (R0)
