Author Identification in Bengali Literary Works

  • Suprabhat Das
  • Pabitra Mitra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6744)

Abstract

In this paper, we study the problem of authorship identification in Bengali literary works. We considered three authors: Rabindranath Tagore, Bankim Chandra Chattopadhyay, and Sukanta Bhattacharyay. We observed that simple unigram and bi-gram features, together with vocabulary richness, were sufficient to discriminate among these authors, although results degraded slightly when the training set was small. For larger training sets, classification accuracy was above 90% with unigram features and almost 100% with bi-gram features. Results could be improved further by using more sophisticated features.
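
As a rough illustration of the feature types named in the abstract, the following Python sketch computes unigram counts, bi-gram counts, and one vocabulary richness measure for a text. The abstract does not state whether the n-grams are word-level or character-level, how the text is tokenized, or which richness measure is used, so the whitespace tokenization, the word-level n-grams, the type-token ratio, and the function names `ngram_counts`, `type_token_ratio`, and `extract_features` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (stored as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for an empty list.

    Assumption: used here as a simple stand-in for "vocabulary richness".
    """
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def extract_features(text):
    """Build word-level unigram/bi-gram counts and a richness score."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    return {
        "unigrams": ngram_counts(tokens, 1),
        "bigrams": ngram_counts(tokens, 2),
        "ttr": type_token_ratio(tokens),
    }


# Toy input for illustration only; not taken from the paper's corpus.
sample = "আমি ভাত খাই আমি জল খাই"
features = extract_features(sample)
```

Feature dictionaries of this kind could be turned into fixed-length vectors and passed to any standard classifier; the abstract does not say which classifier was used.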

Keywords

Stylometry, authorship attribution, Bengali literary works, unigram, bi-gram


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Suprabhat Das (1)
  • Pabitra Mitra (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India
