
Semantic Textual Similarity as a Service

  • Conference paper
  • In: Service Research and Innovation (ASSRI 2015, ASSRI 2017)

Abstract

Ensembles of well-performing models have been shown to outperform individual models on the semantic textual similarity task; however, employing existing models still remains a challenge. In this paper, we tackle this issue by providing a service-oriented system that indexes text similarity models through RESTful services. We also propose a baseline approach, based on an effective penalty-award weighting scheme and word-level edit distance, in which sentence pairs are divided into two main categories according to the number of substitutions, insertions, and deletions required to convert the first sentence into the second. We argue that, when the word-level edit distance is very small, it is wiser to measure dissimilarity than similarity. Using knowledge bases along with common natural language processing tools, the proposed method aims to enhance the accuracy of measuring the similarity between two sentences. We compared the proposed method with existing approaches and found that it produces promising results. Our source code is freely available on GitLab.
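
The baseline rests on a word-level edit distance: a sentence pair is first categorized by how many word substitutions, insertions, and deletions separate the two sentences, and the penalty-award weighting is then applied within the resulting category. The sketch below illustrates only that first step, assuming a plain dynamic-programming Levenshtein distance over tokens and a hypothetical threshold for what counts as a "very small" distance; the authors' actual categorization rules and weights are those in their GitLab repository.

    from typing import List

    def word_edit_distance(a: List[str], b: List[str]) -> int:
        """Levenshtein distance over word tokens: the number of substitutions,
        insertions, and deletions needed to turn `a` into `b`."""
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i          # delete the remaining words of `a`
        for j in range(n + 1):
            dp[0][j] = j          # insert the remaining words of `b`
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution or match
        return dp[m][n]

    def categorize(s1: str, s2: str, small: int = 2) -> str:
        """Hypothetical split of a pair into the two categories described in the
        abstract; the threshold for 'very small' is an assumption, not the
        authors' value."""
        d = word_edit_distance(s1.lower().split(), s2.lower().split())
        return "measure dissimilarity" if d <= small else "measure similarity"

    print(categorize("A man is playing a guitar", "A man is playing a flute"))

The intuition behind the two categories is that when only a word or two differ, those few words carry almost all of the semantic signal, so scoring how strongly they disagree is more reliable than re-scoring everything the sentences already share.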


Notes

  1. International Workshop on Semantic Evaluation.

  2. https://gitlab.com/mysilver/semantic-text-similarity.

  3. That is, only word-similarity types 1 to 5 contribute positively to the semantic similarity of two sentences.

  4. http://www.rxnlp.com/api-reference/text-similarity-api-reference/.

  5. https://dandelion.eu/docs/api/datatxt/sim/v1/ (a minimal, hypothetical client sketch follows these notes).
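
The "as a service" side of the paper, like the public text-similarity APIs in notes 4 and 5, reduces to the same client pattern: post a sentence pair to an HTTP endpoint and read back a score. The sketch below only illustrates that pattern under assumed names; the endpoint URL, the JSON field names, and the "score" key are hypothetical placeholders, not the interface of the paper's GitLab service or of the APIs above, whose documentation defines the real request and response formats.

    import requests

    # Hypothetical endpoint of a text-similarity service; real services (including
    # the ones referenced in the notes above) define their own URLs and fields.
    SIMILARITY_URL = "http://localhost:8080/api/similarity"

    def similarity(sentence1: str, sentence2: str) -> float:
        """POST a sentence pair and return the service's similarity score,
        assumed here to be a float under the key 'score'."""
        resp = requests.post(
            SIMILARITY_URL,
            json={"text1": sentence1, "text2": sentence2},
            timeout=10,
        )
        resp.raise_for_status()
        return float(resp.json()["score"])

    if __name__ == "__main__":
        print(similarity("A man is playing a guitar.", "Someone plays the guitar."))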


Author information

Corresponding author

Correspondence to Mohammad-Ali Yaghoub-Zadeh-Fard.


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Fakouri-Kapourchali, R., Yaghoub-Zadeh-Fard, M.A., Khalili, M. (2018). Semantic Textual Similarity as a Service. In: Beheshti, A., Hashmi, M., Dong, H., Zhang, W. (eds.) Service Research and Innovation (ASSRI 2015, ASSRI 2017). Lecture Notes in Business Information Processing, vol. 234. Springer, Cham. https://doi.org/10.1007/978-3-319-76587-7_14


  • DOI: https://doi.org/10.1007/978-3-319-76587-7_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76586-0

  • Online ISBN: 978-3-319-76587-7

  • eBook Packages: Computer Science, Computer Science (R0)
