Authorship Analysis with Machine Learning

  • Living reference work entry
Encyclopedia of Machine Learning and Data Science

Abstract

The growth of the Internet has been accompanied by rapid growth in textual data, and crimes that involve online text have increased with it. Authorship attribution is a text classification technique that identifies the author of an unknown document by comparing it with documents written by multiple candidate authors. More broadly, authorship analysis is widely studied as a means of identifying individuals from the writing style extracted from their written discourse. Most studies find that writers leave traces of their personality traits and sociolinguistic characteristics in their writing, whether consciously or unconsciously. Authorship analysis methods range from manual, statistical, and computational approaches to machine learning and deep learning. In the current Fourth Industrial Revolution (4IR, or Industry 4.0) era, the digital world contains large amounts of data, including social media, mobile, network, and Internet of Things (IoT) data. The key to intelligence is analyzing this data and developing appropriate automated and smart applications using artificial intelligence (AI) and machine learning (ML) algorithms. From the perspective of data computing and analysis, AI and ML have developed rapidly, and such data generally allows applications to operate intelligently. Moreover, ML typically gives a system the ability to learn automatically from experience without being explicitly programmed, and it is therefore often called one of the latest technologies of the 4IR. This entry studies machine learning for authorship analysis, including supervised, unsupervised, semi-supervised, and reinforcement learning. Recently developed learning models such as recurrent neural networks (RNNs) and Bidirectional Encoder Representations from Transformers (BERT) are also explored. Finally, the entry presents the open issues and emerging challenges associated with authorship learning models.
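To make the idea of attribution as text classification concrete, the following is a minimal sketch and not the method of this entry. It assumes character trigram counts as the writing-style features and picks the candidate author whose profile has the highest cosine similarity to the unknown text. The function names (`char_ngrams`, `cosine`, `build_profiles`, `attribute`) and the toy samples are hypothetical and were invented for illustration only.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and
    collapsing whitespace (a simple stylometric feature)."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(samples, n=3):
    """Merge all known texts of each author into one n-gram profile."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(text, n))
    return profiles


def attribute(text, profiles, n=3):
    """Return the candidate author whose profile is most similar."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Toy labelled samples (invented for illustration, not real data).
SAMPLES = [
    ("alice", "I reckon the weather shall be splendid; indeed, quite splendid."),
    ("alice", "Indeed, I reckon we shall depart at dawn; quite so."),
    ("bob", "lol gonna grab food now, u coming or nah"),
    ("bob", "nah im gonna stay home lol, u go"),
]

profiles = build_profiles(SAMPLES)
print(attribute("I reckon it shall rain; indeed.", profiles))
print(attribute("lol u gonna come or nah", profiles))
```

A profile-based nearest-neighbour rule is used here only because it fits in a few lines of standard-library Python. In practice, the supervised learners the entry surveys would be trained on far larger corpora and richer feature sets.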

Recommended Reading

  • Addabe Y, Abu Hammad Y, Ayyad N, Yahya A (2021) Association for Computing Machinery, New York, NY, USA, 6, 1–6. https://doi.org/10.1145/3485557.3485563

  • Altakrori MH, Iqbal F, Fung BCM, Ding SHH, Tubaishat A (2018) Arabic authorship attribution: an extensive study on Twitter posts. ACM Trans Asian Low-Resour Lang Inf Process 18(1), 51 pp. https://doi.org/10.1145/3236391

  • Altakrori MH, Cheung JCK, Fung B (2021) The topic confusion task: a novel evaluation scenario for authorship attribution. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic. Association for Computational Linguistics, pp 4242–4256

  • Argamon S, Koppel M, Pennebaker JW, Schler J (2009) Automatically profiling the author of an anonymous text. Commun ACM 52(2):119–123

  • Bo H, Ding SH, Fung B, Iqbal F (2021) ER-AE: differentially-private text generation for authorship anonymization. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 3997–4007

  • Chen H, Huang Z, Li J, Zheng R (2006) A framework for authorship identification of online messages: writing-style features and classification techniques. JASIST:378–393

  • Garí Y, Monge DA, Pacini E, Mateos C, Garino CG (2021) Reinforcement learning-based application autoscaling in the cloud: a survey. Eng Appl Artif Intell 102:104288

  • Hina M, Ali M, Javed AR, Ghabban F, Khan LA, Jalil Z (2021) Sefaced: semantic-based forensic analysis and classification of E-mail data using deep learning. IEEE Access 9:98398–98411

  • Mateless R, Tsur O, Moskovitch R (2021) Pkg2Vec: hierarchical package embedding for code authorship attribution. Futur Gener Comput Syst 116:49–60

  • Mekala S, Tippireddy RR, Bulusu VV (2018) A novel document representation approach for authorship attribution. Int J Intell Eng Syst 11(3):261–270

  • Mendenhall T (1887) The characteristic curves of composition. Science IX:237–249

  • Mosteller F, Wallace DL (1964) Inference and disputed authorship: the federalist. Addison-Wesley

  • Rathore DS, Choudhary A (2021) An efficient classification technique of data mining for predicting heart disease. In: Emerging technologies in data mining and information security. Springer, Singapore, pp 15–24

  • Reddy PB, Mohan TM, Raja PVK, Reddy TR (2020) A novel approach for authorship verification. In: Data engineering and communication technology. Springer, Singapore, pp 441–448

  • Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A (2020) Cybersecurity data science: an overview from a machine learning perspective. J Big Data 7(1):1–29

Author information

Corresponding author

Correspondence to Farkhund Iqbal.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Ahmed, W., Javed, A.R., Jalil, Z., Iqbal, F. (2022). Authorship Analysis with Machine Learning. In: Phung, D., Webb, G.I., Sammut, C. (eds) Encyclopedia of Machine Learning and Data Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7502-7_986-1

  • DOI: https://doi.org/10.1007/978-1-4899-7502-7_986-1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4899-7502-7

  • Online ISBN: 978-1-4899-7502-7

  • eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering
