Abstract
In this paper, we address the problem of authorship attribution (AA) on short Arabic texts. We survey a set of features and classifiers employed for the AA task. The investigation uses characters, character bigrams, character trigrams, character tetragrams, words, word bigrams and rare words. Attribution is performed with 4 different measures, 3 classifiers (Multi-Layer Perceptron (MLP), Support Vector Machines (SVM) and Linear Regression (LR)) and a newly proposed fusion technique called VBF (Vote Based Fusion). The evaluation is carried out on short Arabic texts extracted from the AAAT dataset (AA of Ancient Arabic Texts). Although AA is known to be difficult on short texts, the results reveal interesting information about the performance of the features and classification techniques on Arabic text data. For instance, character-based features appear to perform better than word-based features on short texts. Furthermore, the proposed VBF fusion achieved an attribution accuracy of 90%, which is higher than the score of the original classifier using only one feature. Overall, the results of this investigation shed light on the efficiency and pertinence of several features and classifiers for AA of short Arabic texts.
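To make the two central ideas of the abstract concrete, the following is a minimal Python sketch of character n-gram extraction and of a plain majority vote over per-feature predictions. The abstract does not specify how VBF combines votes, so the voting rule here (simple majority, ties resolved in favour of the label seen first) is an assumption, and the function names `char_ngrams` and `majority_vote` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count the overlapping character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def majority_vote(predictions):
    """Return the most frequent label among the predictions.

    Assumption: a simple majority rule; on a tie, the label that
    appears first in the list wins. The paper's VBF may differ.
    """
    return Counter(predictions).most_common(1)[0][0]


# Character bigrams of a short Arabic word (4 characters -> 3 bigrams).
bigrams = char_ngrams("كتاب", 2)

# Hypothetical author labels predicted by classifiers trained on
# different features (e.g. characters, bigrams, trigrams).
fused = majority_vote(["author_3", "author_7", "author_3"])
```

In such a scheme, each feature type would feed its own classifier, and the fused decision is taken over the individual predicted authors, which is one way a combination can outperform any single-feature classifier.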
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ouamour, S., Sayoud, H. (2018). A Comparative Survey of Authorship Attribution on Short Arabic Texts. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_50
DOI: https://doi.org/10.1007/978-3-319-99579-3_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)