
Semantic Textual Similarity as a Service

  • Conference paper
  • In: Service Research and Innovation (ASSRI 2015, ASSRI 2017)

Abstract

Ensembles of well-performing models have been shown to outperform individual models on the semantic textual similarity task; however, employing existing models still remains a challenge. In this paper, we tackle this issue by providing a service-oriented system that indexes text similarity models through RESTful services. We also propose a baseline approach, based on an effective penalty-award weighting scheme and word-level edit distance, in which sentence pairs are divided into two main categories according to the number of substitutions, insertions, and deletions required to convert the first sentence into the second. We argue that, when the word-level edit distance is very small, it is wiser to measure dissimilarity than similarity. Using knowledge bases along with common natural language processing tools, the proposed method aims to enhance the accuracy of measuring the similarity between two sentences. We compared the proposed method with existing approaches and found that it produces promising results. Our source code is freely available on GitLab.
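
The baseline rests on a word-level edit distance: a sentence pair is first categorized by how many word substitutions, insertions, and deletions separate the two sentences, and the penalty-award weighting is then applied within the resulting category. The sketch below illustrates only that first step, assuming a plain dynamic-programming Levenshtein distance over tokens and a hypothetical threshold for what counts as a "very small" distance; the authors' actual categorization rules and weights are those in their GitLab repository.

    from typing import List

    def word_edit_distance(a: List[str], b: List[str]) -> int:
        """Levenshtein distance over word tokens: the number of substitutions,
        insertions, and deletions needed to turn `a` into `b`."""
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i          # delete the remaining words of `a`
        for j in range(n + 1):
            dp[0][j] = j          # insert the remaining words of `b`
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution or match
        return dp[m][n]

    def categorize(s1: str, s2: str, small: int = 2) -> str:
        """Hypothetical split of a pair into the two categories described in the
        abstract; the threshold for 'very small' is an assumption, not the
        authors' value."""
        d = word_edit_distance(s1.lower().split(), s2.lower().split())
        return "measure dissimilarity" if d <= small else "measure similarity"

    print(categorize("A man is playing a guitar", "A man is playing a flute"))

The intuition behind the two categories is that when only a word or two differ, those few words carry almost all of the semantic signal, so scoring how strongly they disagree is more reliable than re-scoring everything the sentences already share.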


Notes

  1. International Workshop on Semantic Evaluation.

  2. https://gitlab.com/mysilver/semantic-text-similarity.

  3. That is, only word-similarity types 1 to 5 contribute positively to the semantic similarity of two sentences.

  4. http://www.rxnlp.com/api-reference/text-similarity-api-reference/.

  5. https://dandelion.eu/docs/api/datatxt/sim/v1/ (a minimal, hypothetical client sketch follows these notes).
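
The "as a service" side of the paper, like the public text-similarity APIs in notes 4 and 5, reduces to the same client pattern: post a sentence pair to an HTTP endpoint and read back a score. The sketch below only illustrates that pattern under assumed names; the endpoint URL, the JSON field names, and the "score" key are hypothetical placeholders, not the interface of the paper's GitLab service or of the APIs above, whose documentation defines the real request and response formats.

    import requests

    # Hypothetical endpoint of a text-similarity service; real services (including
    # the ones referenced in the notes above) define their own URLs and fields.
    SIMILARITY_URL = "http://localhost:8080/api/similarity"

    def similarity(sentence1: str, sentence2: str) -> float:
        """POST a sentence pair and return the service's similarity score,
        assumed here to be a float under the key 'score'."""
        resp = requests.post(
            SIMILARITY_URL,
            json={"text1": sentence1, "text2": sentence2},
            timeout=10,
        )
        resp.raise_for_status()
        return float(resp.json()["score"])

    if __name__ == "__main__":
        print(similarity("A man is playing a guitar.", "Someone plays the guitar."))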


Author information

Corresponding author

Correspondence to Mohammad-Ali Yaghoub-Zadeh-Fard.


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Fakouri-Kapourchali, R., Yaghoub-Zadeh-Fard, M.A., Khalili, M. (2018). Semantic Textual Similarity as a Service. In: Beheshti, A., Hashmi, M., Dong, H., Zhang, W. (eds.) Service Research and Innovation (ASSRI 2015, ASSRI 2017). Lecture Notes in Business Information Processing, vol. 234. Springer, Cham. https://doi.org/10.1007/978-3-319-76587-7_14


  • DOI: https://doi.org/10.1007/978-3-319-76587-7_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76586-0

  • Online ISBN: 978-3-319-76587-7

  • eBook Packages: Computer Science, Computer Science (R0)
