Word Level Plagiarism Detection of Marathi Text Using N-Gram Approach

  • Ramesh R. Naik
  • Maheshkumar B. Landge
  • C. Namrata Mahender
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1037)


Plagiarism is increasing day by day. Plagiarism detection is a complex task, but an essential one. This paper deals with word-level plagiarism detection for Marathi text using an N-gram language model and a Marathi corpus. Although this is the simplest form of detection, it provides good depth for understanding and emphasizing copy-paste and paraphrased plagiarism detection. It forms the basis for sentence-level as well as paragraph-level processing.
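As a rough illustration of what a word-level N-gram comparison can look like, the following minimal sketch computes how many word N-grams of a suspicious text also occur in a source text. This is not the authors' implementation: it substitutes a simple N-gram overlap (containment) score for the N-gram language model mentioned above, and the function names `tokenize`, `word_ngrams`, and `containment`, the punctuation list, and the sample sentences are all assumptions made for the example.

```python
import re


def tokenize(text):
    # Assumption: whitespace tokenization after replacing common punctuation
    # (including the Devanagari danda and double danda) with spaces.
    cleaned = re.sub(r'[।॥,.;:!?"()\-]', " ", text)
    return cleaned.split()


def word_ngrams(tokens, n):
    # Set of all contiguous word sequences of length n.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=2):
    # Fraction of the suspicious text's word n-grams that also occur in the source.
    susp = word_ngrams(tokenize(suspicious), n)
    src = word_ngrams(tokenize(source), n)
    if not susp:
        return 0.0
    return len(susp & src) / len(susp)


# Hypothetical sample sentences that differ only in the last word.
source_text = "मी आज शाळेत गेलो होतो।"
suspicious_text = "मी आज शाळेत गेलो नाही।"
score = containment(suspicious_text, source_text, n=2)
```

A score near 1 suggests copy-paste reuse, while lower but non-zero scores may indicate partial reuse or paraphrase. Any decision threshold would have to be tuned on a real corpus.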


Plagiarism detection, N-gram, Marathi language



The authors would like to acknowledge and thank the CSRI DST Major Project, sanction No. SR/CSRI/71/2015(G), and the Computational and Psycholinguistic Research Lab Facility for supporting this work, as well as the Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of CS & IT, Dr. B. A. M. University, Aurangabad, India
