Visual Analytics Based Authorship Discrimination Using Gaussian Mixture Models and Self Organising Maps: Application on Quran and Hadith

  • Halim Sayoud
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10868)


An interesting way to analyse the authenticity of a document's authorship is stylometry. However, conventional features and classifiers have some disadvantages, such as the automatic authorship decision. This decision usually yields a bare classification, often with no way to measure or interpret the consistency of the results.
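Stylometry represents each text as a vector of measurable style markers. The abstract does not list the paper's seven feature types, so what follows is only a minimal Python sketch. It assumes two common but hypothetical choices: the word-length distribution and character-bigram frequencies. The function names and the toy bigram vocabulary are illustrative assumptions, not the author's.

```python
from collections import Counter


def word_length_profile(text, max_len=10):
    """Relative frequency of word lengths 1..max_len; longer words share the last bin."""
    words = text.split()
    counts = Counter(min(len(w), max_len) for w in words)
    total = len(words) or 1
    return [counts[length] / total for length in range(1, max_len + 1)]


def char_ngram_profile(text, vocabulary, n=2):
    """Relative frequency of each n-gram in `vocabulary` among all character n-grams of text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams) or 1
    return [counts[g] / total for g in vocabulary]


def stylometric_vector(text, vocabulary):
    """Concatenate the two hypothetical feature types into one feature vector."""
    return word_length_profile(text) + char_ngram_profile(text, vocabulary)


vec = stylometric_vector("the cat sat on the mat", ["th", "at"])
print(len(vec))  # 12: ten word-length bins plus two bigram frequencies
```

Splitting on whitespace also works for Arabic script, but a real study would choose its feature set and tokenisation deliberately.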

In this paper, we present a visual analytics based approach to authorship discrimination. A specific application compares the authorship of two ancient religious books: the Quran and the Hadith. An important question arises: could these two ancient books have been written by the same Author?

Seven types of features are combined and normalized through PCA reduction. Three visual analytics clustering methods are then employed and discussed, namely Principal Component Analysis (PCA), Gaussian Mixture Models (GMM) and Self-Organizing Maps (SOM).
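The reduction-then-clustering step can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the paper's implementation:

* features are standardised per column before PCA (an assumption);
* principal components are found by power iteration with deflation;
* a two-component Gaussian mixture with diagonal covariances is fitted by EM on the projected points.

The data are synthetic, and `pca_project` and `gmm_cluster` are hypothetical names.

```python
import math
import random


def pca_project(rows, k=2, iters=200, seed=0):
    """Standardise columns, then project onto the top-k principal components.

    Components come from power iteration on the covariance matrix, with deflation.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0 for j in range(d)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    cov = [[sum(v[a] * v[b] for v in z) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        # Remove the found component so the next pass converges to the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
        components.append(v)
    return [[sum(x[j] * v[j] for j in range(d)) for v in components] for x in z]


def gmm_cluster(points, k=2, iters=50):
    """EM for a Gaussian mixture with diagonal covariances; returns hard labels."""
    n, d = len(points), len(points[0])
    # Deterministic farthest-point initialisation of the component means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in points:
            logs = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    var = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * var) + (p[j] - means[c][j]) ** 2 / var)
                logs.append(lp)
            top = max(logs)  # log-sum-exp trick for numerical stability
            ex = [math.exp(x - top) for x in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            nk = sum(r[c] for r in resp) + 1e-12
            weights[c] = nk / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, points)) / nk for j in range(d)]
            variances[c] = [sum(r[c] * (p[j] - means[c][j]) ** 2 for r, p in zip(resp, points)) / nk + 1e-6
                            for j in range(d)]
    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Usage with synthetic (made-up) feature vectors standing in for text segments of two books.
rng = random.Random(1)
book_a = [[rng.gauss(0.0, 0.2) for _ in range(4)] for _ in range(6)]
book_b = [[rng.gauss(3.0, 0.2) for _ in range(4)] for _ in range(6)]
projected = pca_project(book_a + book_b)
labels = gmm_cluster(projected)
print(labels)
```

Plotting `projected`, coloured by `labels`, would give the kind of 2-D view the abstract describes. A library such as scikit-learn would normally replace these hand-written routines.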

The new visual analytics approach appears promising. It not only shows the distinction between the authors' styles but also reveals, visually, how consistent that distinction is.
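A Self-Organizing Map gives the kind of visual consistency check described above. Each text segment is mapped to its best-matching unit on a small grid, and the per-book hit maps show whether the two styles occupy separate regions. The sketch below is a hypothetical, minimal online SOM in plain Python, using a 3x3 grid, a linearly decaying learning rate and a Gaussian neighbourhood. The grid size, schedule, function names and synthetic data are assumptions, not the paper's settings.

```python
import math
import random


def best_unit(weights, x):
    """Return (row, col) of the unit whose prototype is closest to x (squared Euclidean)."""
    best = None
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(w, x))
            if best is None or dist < best[0]:
                best = (dist, r, c)
    return best[1], best[2]


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training with a decaying learning rate and a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    # Each unit starts from a copy of a random sample, so training does not alter the data.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = best_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Units close to the winner on the grid move further towards x.
                    g = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for j, xj in enumerate(x):
                        w[j] += lr * g * (xj - w[j])
            step += 1
    return weights


def hit_map(weights, data):
    """Count how many samples land on each map unit."""
    counts = [[0] * len(weights[0]) for _ in weights]
    for x in data:
        r, c = best_unit(weights, x)
        counts[r][c] += 1
    return counts


# Usage with synthetic (made-up) vectors standing in for two books' text segments.
rng = random.Random(2)
book_a = [[rng.gauss(0.0, 0.2) for _ in range(4)] for _ in range(8)]
book_b = [[rng.gauss(3.0, 0.2) for _ in range(4)] for _ in range(8)]
som = train_som(book_a + book_b)
for row_a, row_b in zip(hit_map(som, book_a), hit_map(som, book_b)):
    print(row_a, "|", row_b)
units_a = {best_unit(som, x) for x in book_a}
units_b = {best_unit(som, x) for x in book_b}
```

If the two books' segments never share a best-matching unit, their hit maps do not overlap. That overlap, or its absence, is the visual cue for how consistent the separation is.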

Concerning the discrimination between the two ancient religious books, the results show two separate clusters: a Quran cluster and a Hadith cluster. This separation corresponds to a clear authorship difference between the two investigated documents, which implies that the two books (i.e. the Quran and the Hadith) come from two different Authors.


Keywords: Artificial intelligence, Data mining, Visual analytics, Natural language processing, Authorship attribution, Quran authorship


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. USTHB University, Algiers, Algeria
