Abstract
Detecting similar sentences plays a major role in many fundamental text-processing applications, such as information retrieval, categorization, paraphrase detection, summarization, and translation. In this work, we present a novel method for detecting similar sentences. The method uses character n-grams as its basic units. It relies on neither an online dictionary nor a search engine, which makes it a simple and efficient way to measure the similarity between two sentences. It also uses no grammar rules or syntactic analysis, so the approach is language-independent. We analyze a range of similarity measures and compare them with our method. The complexity of our method is O(N²), which we regard as favorable.
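To make the idea of comparing sentences through character n-grams concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of this paper: the abstract does not specify the similarity measure, so the Dice coefficient over n-gram counts is swapped in as a stand-in, and the function names `char_ngrams` and `dice_similarity`, the default n = 3, and the lowercasing and whitespace normalization are all hypothetical choices. The sketch also does not attempt to reproduce the stated complexity.

```python
from collections import Counter


def char_ngrams(sentence: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams of a sentence.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to a single space before n-grams are extracted.
    """
    text = " ".join(sentence.lower().split())
    # A text shorter than n yields an empty range and an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-gram counts (assumed measure).

    Returns a value in [0, 1]; 0.0 when neither sentence has any n-gram.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return 2.0 * shared / total


if __name__ == "__main__":
    print(dice_similarity("The cat sat on the mat.", "A cat sat on a mat."))
    print(dice_similarity("The cat sat on the mat.", "Stock prices fell sharply."))
```

Because the sketch works only on raw characters, it needs no dictionary, tokenizer, or grammar, which is the property that makes this family of approaches language-independent.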
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Sultana, S., Biskri, I. (2018). Identifying Similar Sentences by Using N-Grams of Characters. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_80
DOI: https://doi.org/10.1007/978-3-319-92058-0_80
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0
eBook Packages: Computer Science, Computer Science (R0)