‘twazn me!!! ;(’ Automatic Authorship Analysis of Micro-Blogging Messages

Sousa Silva, Rui; Laboreiro, Gustavo; Sarmento, Luís; Grant, Tim; Oliveira, Eugénio; Maia, Belinda

doi:10.1007/978-3-642-22327-3_16

Rui Sousa Silva, Gustavo Laboreiro, Luís Sarmento, Tim Grant, Eugénio Oliveira & Belinda Maia

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6716)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

In this paper we propose a set of stylistic markers for automatically attributing authorship to micro-blogging messages. The proposed markers include highly personal and idiosyncratic editing options, such as ‘emoticons’, interjections, punctuation, abbreviations and other low-level features. We evaluate how well these features discriminate the authorship of Twitter messages among three authors. To that end, we train SVM classifiers to learn a stylometric model for each author from different combinations of the proposed feature groups. Results show relatively good performance in attributing authorship of micro-blogging messages (F = 0.63) using this set of features, even when the classifiers are trained with as few as 60 examples from each author (F = 0.54). Additionally, we conclude that emoticons are the most discriminating features in these groups.
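
As an illustration of the kind of low-level stylistic markers named in the abstract, the following is a minimal sketch of a feature extractor that counts emoticons, repeated punctuation, interjections and abbreviations in a single message. The regular expressions, the two small word lists and the function name `extract_features` are assumptions made for this example; they are not the feature definitions used in the paper, and the SVM training step is not shown.

```python
import re

# Hypothetical patterns and lexicons, chosen only for illustration.
# The paper's actual feature groups are not specified on this page.
EMOTICON_RE = re.compile(r"[:;=][\-']?[\)\(\]\[DPp]")  # e.g. ;(  :-)  :D
PUNCT_RUN_RE = re.compile(r"[!?.]{2,}")                # e.g. !!!  ???  ...
TOKEN_RE = re.compile(r"[a-z']+")
INTERJECTIONS = {"haha", "lol", "hmm", "ugh", "wow"}   # illustrative list
ABBREVIATIONS = {"u", "r", "thx", "pls", "btw"}        # illustrative list


def extract_features(message: str) -> dict:
    """Count a few stylistic markers in one micro-blogging message."""
    # Word-like tokens are matched on the lowercased text; emoticons are
    # matched on the original text so that capital letters (":D") survive.
    tokens = TOKEN_RE.findall(message.lower())
    return {
        "emoticons": len(EMOTICON_RE.findall(message)),
        "punctuation_runs": len(PUNCT_RUN_RE.findall(message)),
        "interjections": sum(1 for t in tokens if t in INTERJECTIONS),
        "abbreviations": sum(1 for t in tokens if t in ABBREVIATIONS),
    }


if __name__ == "__main__":
    # The message from the paper's title.
    print(extract_features("twazn me!!! ;("))
```

Count vectors of this kind, computed per message, could then be given to a classifier such as an SVM, with one model per candidate author; which feature groups to combine is the question the paper evaluates.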

Author information

Authors and Affiliations

Centre for Forensic Linguistics at Aston University, UK
Rui Sousa Silva & Tim Grant
Faculdade de Engenharia da Universidade do Porto - DEI - LIACC, Portugal
Gustavo Laboreiro, Luís Sarmento & Eugénio Oliveira
CLUP - Centro de Linguística da Universidade do Porto, Portugal
Rui Sousa Silva & Belinda Maia
SAPO Labs Porto, Portugal
Gustavo Laboreiro & Luís Sarmento

Editor information

Editors and Affiliations

School of Computing, University of Alicante, 03080, Alicante, Spain
Rafael Muñoz
Department of Software and Computing Systems, University of Alicante, Aptdo. de Correos 99, 03080, Alicante, Spain
Andrés Montoyo
CNAM- Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

About this paper

Cite this paper

Sousa Silva, R., Laboreiro, G., Sarmento, L., Grant, T., Oliveira, E., Maia, B. (2011). ‘twazn me!!! ;(’ Automatic Authorship Analysis of Micro-Blogging Messages. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_16

DOI: https://doi.org/10.1007/978-3-642-22327-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3
eBook Packages: Computer Science, Computer Science (R0)
