Identifying Similar Sentences by Using N-Grams of Characters

  • Conference paper
Recent Trends and Future Technology in Applied Intelligence (IEA/AIE 2018)

Abstract

Detecting similar sentences plays a major role in many fundamental applications that read and analyze sentences, such as information retrieval, categorization, paraphrase detection, summarization, and translation. In this work, we present a novel method for detecting similar sentences. The method uses character n-grams as its basic units. It relies on neither an online dictionary nor a search engine, which makes it a simple and effective way to measure the similarity between two sentences. It also uses no grammar rules or syntactic analysis, so the approach is language-independent. We analyze and compare a range of similarity measures with our methodology. The complexity of our method is O(N²), which we consider favorable.
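
The abstract does not say which similarity measure the authors apply to the n-grams, so the following is only a minimal sketch of the general idea, not the method of the paper. It assumes trigrams (n = 3), lowercasing with whitespace normalization, and the Dice coefficient over n-gram multisets. The function names char_ngrams and dice_similarity are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def char_ngrams(sentence: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams of a sentence.

    Assumption for this sketch: the text is lowercased and runs of
    whitespace are collapsed to a single space before slicing.
    """
    text = " ".join(sentence.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-gram multisets, in [0, 1].

    This measure is an illustrative choice; the paper may use another.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Neither sentence is long enough to produce an n-gram.
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


if __name__ == "__main__":
    print(dice_similarity("The cat sat on the mat.", "A cat sat on a mat."))
```

Because the comparison works on raw character sequences, no dictionary, stemmer, or parser is needed, which is what makes this family of techniques language-independent.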

Author information

Corresponding author

Correspondence to Ismaïl Biskri.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sultana, S., Biskri, I. (2018). Identifying Similar Sentences by Using N-Grams of Characters. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_80

  • DOI: https://doi.org/10.1007/978-3-319-92058-0_80

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92057-3

  • Online ISBN: 978-3-319-92058-0

  • eBook Packages: Computer Science, Computer Science (R0)