A Comparative Study of Two Short Text Semantic Similarity Measures

  • Conference paper
Agent and Multi-Agent Systems: Technologies and Applications (KES-AMSTA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4953)

Abstract

This paper describes a comparative study of STASIS and LSA, two measures of semantic similarity that can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. “Short texts” are typically 10–20 words long and need not be grammatically correct sentences; examples include spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind, specifically developed to evaluate such measures, and we believe it will be valuable to future researchers.
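As an illustration of how a benchmark of human-rated sentence pairs might be used, the sketch below correlates a measure's scores with human ratings using Pearson's product-moment coefficient. This is an assumption made for illustration: the abstract does not state which statistic the study uses, the function name `pearson` is hypothetical, and the numbers are placeholders, not values from the 65-pair data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only; NOT taken from the benchmark.
human_ratings = [0.1, 0.4, 0.5, 0.9]    # hypothetical mean human ratings
measure_scores = [0.2, 0.3, 0.6, 0.8]   # hypothetical scores from a measure

r = pearson(human_ratings, measure_scores)
print(f"r = {r:.3f}")
```

A coefficient near 1 would suggest that the measure ranks and spaces the sentence pairs much as the human raters do; a rank-based statistic could be substituted if only the ordering matters.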

Editor information

Ngoc Thanh Nguyen, Geun Sik Jo, Robert J. Howlett, Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

O’Shea, J., Bandar, Z., Crockett, K., McLean, D. (2008). A Comparative Study of Two Short Text Semantic Similarity Measures. In: Nguyen, N.T., Jo, G.S., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2008. Lecture Notes in Computer Science, vol 4953. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78582-8_18

  • DOI: https://doi.org/10.1007/978-3-540-78582-8_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78581-1

  • Online ISBN: 978-3-540-78582-8

  • eBook Packages: Computer Science, Computer Science (R0)
