A New Document Author Representation for Authorship Attribution

López-Monroy, Adrián Pastor; Montes-y-Gómez, Manuel; Villaseñor-Pineda, Luis; Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Fco.

doi:10.1007/978-3-642-31149-9_29

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7329)

Included in the following conference series:

Mexican Conference on Pattern Recognition

Abstract

This paper proposes a novel representation for Authorship Attribution (AA) based on Concise Semantic Analysis (CSA), a technique that has been used successfully in Text Categorization (TC). Our approach, called Document Author Representation (DAR), builds document vectors in a space of authors by calculating the relationship between textual features and authors. To evaluate the approach, we compare the proposed representation with conventional approaches and previous work on the c50 corpus. We found that DAR can be very useful in AA tasks: it performs well on imbalanced data and achieves comparable or better accuracy.
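
The abstract does not state the weighting formulas used by DAR, so the following is only a minimal sketch of the general idea of representing a document in a space of authors. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names `term_author_weights` and `document_vector`, the use of relative term frequencies, and the normalization of each term vector to sum to 1.

```python
from collections import Counter, defaultdict


def term_author_weights(train_docs):
    """Build one vector per term with one dimension per author.

    train_docs: list of (author, tokens) pairs.
    Assumed weighting (not from the paper): sum the term's relative
    frequency over each author's documents, then normalize the term
    vector so its entries sum to 1.
    """
    authors = sorted({author for author, _ in train_docs})
    index = {author: i for i, author in enumerate(authors)}
    raw = defaultdict(lambda: [0.0] * len(authors))
    for author, tokens in train_docs:
        counts = Counter(tokens)
        for term, count in counts.items():
            raw[term][index[author]] += count / len(tokens)
    weights = {}
    for term, vec in raw.items():
        total = sum(vec)
        weights[term] = [v / total for v in vec]
    return authors, weights


def document_vector(tokens, weights, n_authors):
    """Represent a document as a frequency-weighted average of the
    term vectors of its known terms (unknown terms are ignored)."""
    vec = [0.0] * n_authors
    counts = Counter(t for t in tokens if t in weights)
    total = sum(counts.values())
    if total == 0:
        return vec
    for term, count in counts.items():
        for i, w in enumerate(weights[term]):
            vec[i] += (count / total) * w
    return vec


# Toy data, invented for this example.
train = [
    ("alice", "the cat sat".split()),
    ("bob", "the dog ran".split()),
]
authors, weights = term_author_weights(train)
doc = document_vector("the cat".split(), weights, len(authors))
print(authors, doc)
```

In this sketch the resulting vector has as many dimensions as there are candidate authors, and it could be passed to any standard classifier. The actual feature types, weighting and normalization used by DAR are described in the full chapter.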

Author information

Authors and Affiliations

Computer Science Department, National Institute for Astrophysics, Optics and Electronics, Luis Enrique Erro #1, Tonantzintla, Puebla, Mexico
Adrián Pastor López-Monroy, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, Jesús Ariel Carrasco-Ochoa & José Fco. Martínez-Trinidad

Editor information

Editors and Affiliations

Computer Science Department, National Institute for Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, Sta. Maria Tonantzintla, 72840, Puebla, Mexico
Jesús Ariel Carrasco-Ochoa
Computer Science Department, National Institute for Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, Sta. Maria Tonantzintla, 72840, Puebla, Mexico
José Francisco Martínez-Trinidad
Faculty of Computer Sciences, Autonomous University of Puebla, Av. San Claudio y 14 Sur, Ciudad Universitaria, C.P. 7257, Puebla, Mexico
José Arturo Olvera López
Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, 110 Eighth Street, 12180, Troy, NY, USA
Kim L. Boyer

About this paper

Cite this paper

López-Monroy, A.P., Montes-y-Gómez, M., Villaseñor-Pineda, L., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2012). A New Document Author Representation for Authorship Attribution. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (eds) Pattern Recognition. MCPR 2012. Lecture Notes in Computer Science, vol 7329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31149-9_29

DOI: https://doi.org/10.1007/978-3-642-31149-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31148-2
Online ISBN: 978-3-642-31149-9
eBook Packages: Computer Science, Computer Science (R0)
