
A Comparative Survey of Authorship Attribution on Short Arabic Texts

  • Conference paper
Speech and Computer (SPECOM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)


Abstract

In this paper, we address the problem of authorship attribution (AA) on short Arabic texts. We survey several features and classifiers commonly employed for AA. The investigation uses characters, character bigrams, character trigrams, character tetragrams, words, word bigrams and rare words. Attribution is performed with 4 different measures, 3 classifiers (Multi-Layer Perceptron (MLP), Support Vector Machines (SVM) and Linear Regression (LR)) and a newly proposed fusion method called VBF (Vote Based Fusion). The evaluation is carried out on short Arabic texts extracted from the AAAT dataset (AA of Ancient Arabic Texts). Although AA is known to be difficult on short texts, the results reveal useful information about the performance of the features and classification techniques on Arabic text data. For instance, character-based features appear to perform better than word-based features on short texts. Furthermore, the proposed VBF fusion achieved an attribution accuracy of 90%, which is higher than that of the original classifier using only one feature. Overall, the results of this investigation shed light on the efficiency and relevance of several features and classifiers for AA of short Arabic texts.
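The feature-then-fusion pipeline summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes each author is represented by one concatenated training text, uses cosine similarity as a stand-in for the paper's four measures (which the abstract does not name), attributes a text to the most similar author profile separately for each feature type, and then combines the per-feature decisions by simple majority vote, which is only our assumption about how VBF works. The MLP, SVM and LR classifiers and the rare-words feature are not reproduced, and the names `char_ngrams`, `word_ngrams`, `cosine`, `attribute` and `vote_fusion` are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Frequency profile of character n-grams (n=1 characters, 2 bigrams, ...)."""
    # Python str slicing works on Unicode code points, so Arabic text is handled as-is.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Frequency profile of word n-grams, using whitespace tokenization (an assumption)."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine(p, q):
    """Cosine similarity between two frequency profiles (stand-in for the paper's measures)."""
    dot = sum(count * q[key] for key, count in p.items() if key in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Feature types named in the abstract, except rare words (not reproduced here).
FEATURES = {
    "characters": lambda t: char_ngrams(t, 1),
    "char_bigrams": lambda t: char_ngrams(t, 2),
    "char_trigrams": lambda t: char_ngrams(t, 3),
    "char_tetragrams": lambda t: char_ngrams(t, 4),
    "words": lambda t: word_ngrams(t, 1),
    "word_bigrams": lambda t: word_ngrams(t, 2),
}


def attribute(text, training, extract):
    """Return the author whose training profile is most similar to the text."""
    query = extract(text)
    return max(training, key=lambda author: cosine(query, extract(training[author])))


def vote_fusion(text, training, features=FEATURES):
    """Hypothetical VBF sketch: majority vote over the per-feature decisions.

    Ties go to the author voted for first, since Counter.most_common keeps
    first-encountered order for equal counts.
    """
    votes = Counter(attribute(text, training, extract) for extract in features.values())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy Latin-script data for readability; real use would load texts from the AAAT dataset.
    training = {
        "author_A": "the cat sat on the mat the cat sat",
        "author_B": "a dog ran in a park a dog ran",
    }
    print(vote_fusion("the cat sat on a mat", training))
```

Each feature type acts as an independent weak attributor, and the vote lets the majority override individual feature errors; in this sketch every feature gets equal weight, whereas a real fusion scheme might weight or select features differently.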




Author information

Corresponding author

Correspondence to Halim Sayoud.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Ouamour, S., Sayoud, H. (2018). A Comparative Survey of Authorship Attribution on Short Arabic Texts. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_50


  • DOI: https://doi.org/10.1007/978-3-319-99579-3_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer Science, Computer Science (R0)
