Abstract
The Internet has evolved alongside the growth of textual data, and crimes involving textual data on the Internet have increased with it. Authorship attribution is a text classification technique used to identify the author of an unknown document by analyzing documents from multiple authors. Authorship analysis is widely studied as a means of identifying individuals from the writing style extracted from their written discourse. Most studies reveal that humans leave (consciously or unconsciously) traces of their personality traits and sociolinguistic characteristics in their writing. Authorship analysis methods range from manual, statistical, and computational approaches to machine learning and deep learning. In the current Fourth Industrial Revolution (4IR, or Industry 4.0) era, the digital world contains large amounts of data, including social media, mobile, network, and Internet of Things (IoT) data. The key to intelligence is analyzing this data and developing appropriate automated and smart applications using artificial intelligence (AI) and machine learning (ML) algorithms. From the perspective of data computing and analysis, AI and ML have developed rapidly, and this data generally allows applications to operate intelligently. Moreover, ML typically provides systems that learn automatically from experience without being explicitly programmed; hence, it is often called the latest technology of the 4IR. This entry studies machine learning, including supervised, unsupervised, semi-supervised, and reinforcement learning, for authorship analysis. Recently developed learning models such as recurrent neural networks (RNNs) and Bidirectional Encoder Representations from Transformers (BERT) are also explored. Further, this entry presents the open issues and emerging challenges associated with authorship learning models.
© 2022 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Ahmed, W., Javed, A.R., Jalil, Z., Iqbal, F. (2022). Authorship Analysis with Machine Learning. In: Phung, D., Webb, G.I., Sammut, C. (eds) Encyclopedia of Machine Learning and Data Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7502-7_986-1
DOI: https://doi.org/10.1007/978-1-4899-7502-7_986-1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4899-7502-7
Online ISBN: 978-1-4899-7502-7
eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering