Influence of Language-Specific Features for Author Identification on Indian Literature in Marathi

  • Sunil Digambarrao Kale
  • Rajesh S. Prasad
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1118)


Author identification (AI) is the process of determining the author of an anonymous text document. AI is of great help in digital forensics, copyright disputes, plagiarism detection, etc., making legal processes quicker and more efficient. This paper presents AI for the Indian regional language Marathi and proposes 21 language-specific lexical features. These features are validated on the "Author wise Marathi Language Typewritten Text Corpus", which we published at the Indian Language Technology Proliferation and Deployment Center. Experiments are performed with the 21 proposed features, and performance is compared across several machine learning algorithms: Naïve Bayes, k-Nearest Neighbor and Sequential Minimal Optimization. k-Nearest Neighbor outperforms Naïve Bayes and Sequential Minimal Optimization, with an average accuracy of 82.06% on comedy articles and 85.44% on mixed articles. The proposed language-specific features yield a significant improvement in accuracy over traditional features.
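To make the abstract's pipeline concrete (lexical feature vectors per document, then a k-Nearest Neighbor classifier scored by accuracy), here is a minimal sketch under stated assumptions. The abstract does not list the 21 proposed features, so `FUNCTION_WORDS`, the type-token ratio and the mean token length below are hypothetical stand-ins, not the authors' feature set. The classifier is a plain from-scratch k-NN with min-max scaling and Euclidean distance, not the authors' implementation or parameter settings.

```python
import math
from collections import Counter

# Hypothetical Marathi function-word list, chosen only for illustration.
# It is NOT the 21-feature set proposed in the paper.
FUNCTION_WORDS = ["आणि", "हे", "की", "पण", "तर", "म्हणून"]


def lexical_features(text):
    """Return a small, hypothetical lexical feature vector for one document."""
    tokens = text.split()
    n = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(tokens)
    # Relative frequency of each function word.
    features = [counts[w] / n for w in FUNCTION_WORDS]
    features.append(len(counts) / n)  # type-token ratio (vocabulary richness)
    # Mean token length in Unicode code points (Devanagari vowel signs count separately).
    features.append(sum(len(t) for t in tokens) / n)
    return features


def min_max_scale(train_vectors, vector):
    """Rescale each feature using the training set's per-feature min and max."""
    scaled = []
    for j, value in enumerate(vector):
        column = [v[j] for v in train_vectors]
        lo, hi = min(column), max(column)
        scaled.append((value - lo) / (hi - lo) if hi > lo else 0.0)
    return scaled


def knn_predict(train, query, k=3):
    """Majority vote among the k training documents nearest to query.

    train is a list of (feature_vector, author) pairs. Ties in the vote go to
    the author whose neighbour ranked closest (Counter keeps insertion order).
    """
    vectors = [v for v, _ in train]
    q = min_max_scale(vectors, query)
    ranked = sorted(train, key=lambda item: math.dist(min_max_scale(vectors, item[0]), q))
    votes = Counter(author for _, author in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predicted, actual):
    """Percentage of documents whose predicted author matches the true author."""
    return 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For example, `knn_predict([(lexical_features(doc), author) for doc, author in labelled_docs], lexical_features(unknown_doc))` returns the predicted author for `unknown_doc`, where `labelled_docs` and `unknown_doc` are placeholders for a corpus split. Scaling matters here because the mean token length lives on a much larger scale than the frequency features and would otherwise dominate the Euclidean distance.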


Author identification, Feature extraction, Machine learning, Marathi language, Stylometry, Text mining



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Sunil Digambarrao Kale (1, 2)
  • Rajesh S. Prasad (3)
  1. Smt. Kashibai Navale College of Engineering, Pune, India
  2. Pune Institute of Computer Technology, Pune, India
  3. Sinhgad Institute of Technology and Science, Pune, India
