Abstract
The rapid growth of data on the Web has led to an increase in stolen, unidentified, and fraudulent data. Identifying such data is a prime objective for forensic departments, researchers, and governments. In this context, authorship analysis, which examines the properties of a text to predict who wrote it, is very useful for establishing the truth. Authorship analysis is rooted in stylometry, a linguistic research field that draws on machine learning techniques as well as statistics. Authorship attribution is a type of authorship analysis that aims to recognize the author of an anonymous text within a closed set of authors. Most researchers in authorship attribution have proposed various sets of stylistic features to differentiate authors by their style of writing. The literature shows, however, that the accuracy of author prediction with stylistic features has not been satisfactory. In this paper, experiments are carried out with various term weight measures, identified in different text processing domains, to predict the author of a new document. The results show that the term weight measures achieve good accuracy for author prediction compared with most of the existing approaches.
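The abstract does not name the specific term weight measures evaluated, so the following is only a minimal sketch of the general idea, assuming standard smoothed TF-IDF as a stand-in measure and a nearest-profile cosine classifier. The toy corpus, the author labels, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def fit_idf(token_lists):
    """Smoothed inverse document frequency over the training documents."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def weight(tokens, idf):
    """TF-IDF vector, L2-normalised; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train(labelled_docs):
    """Build one summed TF-IDF profile per author from (author, text) pairs."""
    token_lists = [tokenize(text) for _, text in labelled_docs]
    idf = fit_idf(token_lists)
    profiles = defaultdict(Counter)
    for (author, _), tokens in zip(labelled_docs, token_lists):
        profiles[author].update(weight(tokens, idf))
    return idf, profiles


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(text, idf, profiles):
    """Return the author whose profile is most similar to the new document."""
    vec = weight(tokenize(text), idf)
    return max(profiles, key=lambda author: cosine(vec, profiles[author]))


# Hypothetical closed set of two authors with two training texts each.
training = [
    ("alice", "the storm rolled over the quiet harbour at dusk"),
    ("alice", "quiet waves and a grey storm over the harbour"),
    ("bob", "the compiler rejected the patch after the build failed"),
    ("bob", "another build failed because the compiler flagged the patch"),
]
idf, profiles = train(training)
predicted = predict("a storm over the harbour", idf, profiles)
print(predicted)
```

Swapping `fit_idf` and `weight` for another term weight measure leaves the rest of the pipeline unchanged, which is the kind of comparison the abstract describes.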
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sreenivas, M., Raghunadha Reddy, T., Vishnu Vardhan, B. (2019). Impact of Term Weight Measures for Author Identification. In: Bapi, R., Rao, K., Prasad, M. (eds) First International Conference on Artificial Intelligence and Cognitive Computing. Advances in Intelligent Systems and Computing, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-13-1580-0_8
DOI: https://doi.org/10.1007/978-981-13-1580-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1579-4
Online ISBN: 978-981-13-1580-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)