Impact of Term Weight Measures for Author Identification

Sreenivas, M.; Raghunadha Reddy, T.; Vishnu Vardhan, B.

doi:10.1007/978-981-13-1580-0_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 815)

Abstract

The rapid growth of data on the Web has led to stolen, unattributed, and fraudulent content. Identifying such content is a prime objective for forensic departments, researchers, and governments. In this context, authorship analysis, which examines the properties of a text to predict who wrote it, is very useful for establishing the truth. Authorship analysis is rooted in stylometry, a linguistic research field that draws on machine learning techniques and statistics. Authorship Attribution is a type of authorship analysis that aims to recognize the author of an anonymous text within a closed set of authors or subjects. Most researchers in Authorship Attribution have proposed various sets of stylistic features to differentiate authors by their style of writing. The literature shows, however, that the accuracy of author prediction with stylistic features is not satisfactory. In this paper, experiments are carried out with various term weight measures, identified in various text processing domains, to predict the author of a new document. The results show that the term weight measures achieve good accuracy for author prediction compared with most existing approaches.
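
The abstract does not name the specific term weight measures that were evaluated, so the following is only an illustrative sketch of the general idea. It assumes TF-IDF as a familiar stand-in for a term weight measure and assumes a simple nearest-profile rule (cosine similarity against the summed vectors of each author's documents) as the classifier. The function names, the whitespace tokenizer, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def idf_table(token_lists):
    """Smoothed inverse document frequency for every term in the training set."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profiles(labelled_docs):
    """Sum the TF-IDF vectors of each author's documents into one profile."""
    token_lists = [tokenize(text) for _, text in labelled_docs]
    idf = idf_table(token_lists)
    profiles = defaultdict(lambda: defaultdict(float))
    for (author, _), tokens in zip(labelled_docs, token_lists):
        for term, weight in tfidf_vector(tokens, idf).items():
            profiles[author][term] += weight
    return profiles, idf


def predict_author(text, profiles, idf):
    """Return the author whose profile is most similar to the new document."""
    query = tfidf_vector(tokenize(text), idf)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy corpus: (author, document) pairs from a closed set of authors.
training = [
    ("alice", "the sea was calm and the sky was grey"),
    ("alice", "the sea and the wind were calm"),
    ("bob", "compile the code then run the tests"),
    ("bob", "the tests fail when the code is broken"),
]
profiles, idf = build_profiles(training)
print(predict_author("the wind over the sea was grey", profiles, idf))
```

Swapping in a different term weight measure would only require replacing `tfidf_vector`; the profile construction and the similarity rule stay the same, which is one way to compare several measures on the same closed set of authors.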


Author information

Authors and Affiliations

Research Scholar of RUK, Sreenidhi Institute of Science and Technology, Hyderabad, India
M. Sreenivas
Dept of IT, Vardhaman College of Engineering, Hyderabad, India
T. Raghunadha Reddy
Dept of CSE, JNTUH Jagtiyal, Karimnagar, India
B. Vishnu Vardhan

Corresponding author

Correspondence to T. Raghunadha Reddy.

Editor information

Editors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India
Raju Surampudi Bapi
Department of Computer Science and Engineering, MLR Institute of Technology, Hyderabad, Telangana, India
Koppula Srinivas Rao
IDRBT, Hyderabad, Telangana, India
Munaga V. N. K. Prasad

About this paper

Cite this paper

Sreenivas, M., Raghunadha Reddy, T., Vishnu Vardhan, B. (2019). Impact of Term Weight Measures for Author Identification. In: Bapi, R., Rao, K., Prasad, M. (eds) First International Conference on Artificial Intelligence and Cognitive Computing. Advances in Intelligent Systems and Computing, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-13-1580-0_8

DOI: https://doi.org/10.1007/978-981-13-1580-0_8
Published: 05 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1579-4
Online ISBN: 978-981-13-1580-0
eBook Packages: Intelligent Technologies and Robotics (R0)
