Impact of Term Weight Measures for Author Identification

  • Conference paper
  • First Online:
First International Conference on Artificial Intelligence and Cognitive Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 815)


Abstract

The rapid growth of data on the Web has led to stolen, unidentified, and fraudulent content. Identifying such content is a prime objective for forensic departments, researchers, and governments. In this context, authorship analysis is useful for revealing the truth by analyzing the text itself. Authorship analysis examines the properties of a text to predict who wrote it. Its roots lie in stylometry, a linguistic research field that draws on machine learning techniques as well as statistics. Authorship Attribution is an authorship analysis task that aims to recognize the author of an anonymous text within a closed set of candidate authors. Most Authorship Attribution approaches in the literature use various sets of stylistic features to differentiate authors by their writing style, and the literature shows that author prediction accuracy with stylistic features has not been satisfactory. In this paper, experiments are carried out with various term weight measures, drawn from several text processing domains, to predict the author of a new document. The results show that the term weight measures obtain good accuracy for author prediction compared with most existing approaches.
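The abstract does not name the specific term weight measures that were evaluated, so the following is only a minimal sketch of closed-set attribution with term weights, not the authors' method. It assumes plain TF-IDF (raw term frequency times idf(t) = log(N / df(t))) as the example weight and a profile-based decision rule: each candidate author is represented by the sum of the weighted vectors of their training documents, and a new document is assigned to the author whose profile has the highest cosine similarity. All function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_weights(train_docs):
    """Assumed weight component: idf(t) = log(N / df(t)) over the training set."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(text, idf):
    """Raw term frequency times idf; terms unseen in training are dropped."""
    tf = Counter(tokenize(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_profiles(train_docs, train_authors, idf):
    """Sum the weighted vectors of each author's documents into one profile."""
    profiles = defaultdict(Counter)
    for doc, author in zip(train_docs, train_authors):
        profiles[author].update(tfidf_vector(doc, idf))
    return dict(profiles)


def attribute(text, profiles, idf):
    """Return the candidate author whose profile is most similar to the text."""
    vec = tfidf_vector(text, idf)
    return max(profiles, key=lambda a: cosine(vec, profiles[a]))


# Hypothetical toy corpus with a closed set of two authors.
train_docs = [
    "the whale swam through the dark sea and the harpoon struck",
    "the sea was calm and the whale dove beneath the ship",
    "the garden bloomed with roses and the lady walked at dawn",
    "roses and lilies filled the garden where the lady sang",
]
train_authors = ["A", "A", "B", "B"]

idf = idf_weights(train_docs)
profiles = build_profiles(train_docs, train_authors, idf)
print(attribute("a whale rose from the sea near the ship", profiles, idf))  # -> A
```

Trying a different term weight measure only requires replacing `tfidf_vector`; the profile construction and the cosine-similarity decision rule stay the same, which keeps comparisons between weighting schemes focused on the weights themselves.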

Author information

Corresponding author

Correspondence to T. Raghunadha Reddy.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sreenivas, M., Raghunadha Reddy, T., Vishnu Vardhan, B. (2019). Impact of Term Weight Measures for Author Identification. In: Bapi, R., Rao, K., Prasad, M. (eds) First International Conference on Artificial Intelligence and Cognitive Computing. Advances in Intelligent Systems and Computing, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-13-1580-0_8
