Author Attribution Using Network Motifs

Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)

Abstract

The problem of recognizing the author of an unknown text has long concerned linguists and scientists. The authorship of the famous Federalist Papers remained unknown until Mosteller and Wallace solved the mystery in 1964 using the frequency of function words. Since then, many statistical and computational studies have been published in the fields of authorship attribution and stylistic analysis. Complex networks, which have gained much popularity in recent years, may have a role to play in this field. Furthermore, several studies show that network motifs, defined as statistically significant subgraphs within a network, can distinguish networks from different disciplines. In this paper, we use network motifs to distinguish the writing styles of 10 famous authors. Using statistical learning algorithms, we achieved an accuracy of 77% in classifying 100 books written by these ten authors, outperforming the results reported in other works. We believe these results demonstrate the importance of network motifs in author attribution.
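To make the idea concrete, the following is a minimal sketch of how small subgraphs might be counted in a word co-occurrence network. It is an illustration only, not the method of the paper: the abstract does not state the motif size, whether edges are directed, or how statistical significance is assessed. The function names `cooccurrence_graph` and `count_three_node_subgraphs` are hypothetical, edges are assumed to link adjacent words, and the graph is assumed to be undirected.

```python
from collections import defaultdict
from itertools import combinations
import re


def cooccurrence_graph(text):
    """Build an undirected graph linking each word to the word that follows it.

    Assumption: adjacency-based co-occurrence; the paper may use another rule.
    """
    words = re.findall(r"[a-z']+", text.lower())
    adj = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops caused by immediately repeated words
            adj[a].add(b)
            adj[b].add(a)
    return adj


def count_three_node_subgraphs(adj):
    """Count connected induced 3-node subgraphs: open paths and triangles."""
    paths = 0
    triangle_corners = 0
    for centre, neighbours in adj.items():
        for u, v in combinations(sorted(neighbours), 2):
            if v in adj[u]:
                triangle_corners += 1  # each triangle is seen from all 3 corners
            else:
                paths += 1  # open path u - centre - v, seen only from its centre
    return {"path": paths, "triangle": triangle_corners // 3}


if __name__ == "__main__":
    graph = cooccurrence_graph("the cat saw the dog")
    print(count_three_node_subgraphs(graph))
```

Raw counts like these are not yet motifs in the sense defined above: a subgraph counts as a motif only if it occurs significantly more often than in comparable randomized networks, a step this sketch leaves out. Per-book counts or significance scores could then serve as features for a classifier.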

Keywords

Word co-occurrence networks, Author attribution, Network motif, Classification

References

  1. Akimushkin, C., Amancio, D.R., Oliveira Jr., O.N.: Text authorship identified using the dynamics of word co-occurrence networks. PLoS ONE 12(1), e0170527 (2017)
  2. Al Rozz, Y., Hamoodat, H., Menezes, R.: Characterization of written languages using structural features from common corpora. In: Workshop on Complex Networks CompleNet, pp. 161–173. Springer, Berlin (2017)
  3. Amancio, D.R.: A complex network approach to stylometry. PLoS ONE 10(8), e0136076 (2015)
  4. Arefin, A.S., Vimieiro, R., Riveros, C., Craig, H., Moscato, P.: An information theoretic clustering approach for unveiling authorship affinities in Shakespearean era plays and poems. PLoS ONE 9(10), e111445 (2014)
  5. Biber, D.: Variation Across Speech and Writing. Cambridge University Press, Cambridge (1991)
  6. Biemann, C., Krumov, L., Roos, S., Weihe, K.: Network motifs are a powerful tool for semantic distinction. Towards a Theoretical Framework for Analyzing Complex Linguistic Networks, pp. 83–105. Springer, Berlin (2016)
  7. Cabatbat, J.J.T., Monsanto, J.P., Tapang, G.A.: Preserved network metrics across translated texts. Int. J. Mod. Phys. C 25(02), 1350092 (2014)
  8. Chen, X., Hao, P., Chandramouli, R., Subbalakshmi, K.P.: Authorship similarity detection from email messages. In: International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 375–386. Springer, Berlin (2011)
  9. Li, J., Xiao, F., Zhou, J., Yang, Z.: Motifs and motif generalization in Chinese word networks. Procedia Comput. Sci. 9, 550–556 (2012)
  10. Marinho, V.Q., Hirst, G., Amancio, D.R.: Authorship attribution via network motifs identification. In: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), pp. 355–360. IEEE (2016)
  11. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)
  12. Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., Alon, U.: Superfamilies of evolved and designed networks. Science 303(5663), 1538–1542 (2004)
  13. Mosteller, F., Wallace, D.L.: Inference in an authorship problem: a comparative study of discrimination methods applied to the authorship of the disputed Federalist papers. J. Am. Stat. Assoc. 58(302), 275–309 (1963)
  14. Nunberg, G.: The Linguistics of Punctuation. CSLI Lecture Notes. Cambridge University Press, Cambridge (1990)
  15. Rizvić, H., Martinčić-Ipšić, S., Meštrović, A.: Network motifs analysis of Croatian literature. arXiv:1411.4960 (2014)
  16. Rocha, A., Scheirer, W.J., Forstall, C.W., Cavalcante, T., Theophilo, A., Shen, B., Carvalho, A.R.B., Stamatatos, E.: Authorship attribution for social media forensics. IEEE Trans. Inf. Forensic Secur. 12(1), 5–33 (2017)
  17. Segarra, S., Eisen, M., Ribeiro, A.: Authorship attribution through function word adjacency networks. IEEE Trans. Sig. Process. 63(20), 5464–5478 (2015)
  18. Stamatatos, E.: A survey of modern authorship attribution methods. J. Assoc. Inf. Sci. Technol. 60(3), 538–556 (2009)
  19. Tran, N.T.L., DeLuccia, L., McDonald, A.F., Huang, C.-H.: Cross-disciplinary detection and analysis of network motifs. Bioinform. Biol. Insights 9, 49 (2015)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. BioComplex Laboratory, Computer Science, Florida Institute of Technology, Melbourne, USA