Is This a Joke? Detecting Humor in Spanish Tweets

Castro, Santiago; Cubero, Matías; Garat, Diego; Moncecchi, Guillermo

doi:10.1007/978-3-319-47955-2_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective is an area yet to be explored in Computational Linguistics. Some previous work exists, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of tweets labeled according to their humor value, letting the annotators subjectively decide which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%.

Notes

1.
http://dle.rae.es/.
2.
Taken from https://twitter.com/chistetipico/status/430549009812291584. It has been slightly adapted to keep the language appropriate.
3.
Perplexity is a measurement of how well a probability model predicts a sample. Low perplexity indicates the probability model is good at predicting the sample. It is defined as \(2^{- \frac{1}{n} \sum _{i=1}^n \log _2 p(x_i)}\), where \(x_1, \ldots , x_n\) are the sample data and \(p(x_i)\) is the probability assigned to each one.
4.
These tweets show traits of different varieties of the Spanish language. At least three countries were identified: Colombia, Spain and Uruguay.
5.
http://clasificahumor.com.
6.
https://play.google.com/store/apps/details?id=com.clasificahumor.android.
7.
Note that Kappa assumes a fixed number of annotators. For this reason, we measure it with 2 and with 6 annotators, to give an idea of the agreement: one value computed over many tweets but few annotators, and another over few tweets but many annotators.
8.
The codebase for the classifier and the corpus built can be found at https://github.com/pln-fing-udelar/pghumor.
9.
https://www.google.com.
10.
https://www.wiktionary.org.
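
The perplexity definition in note 3 can be illustrated with a minimal sketch. It assumes the model's probabilities \(p(x_i)\) are already available as plain floats; the function name `perplexity` and the sample values are hypothetical and are not taken from the authors' codebase.

```python
import math


def perplexity(probabilities):
    """Compute 2 ** (-(1/n) * sum(log2 p(x_i))) as defined in note 3.

    `probabilities` holds the probability the model assigned to each
    sample item x_1, ..., x_n (illustrative input, not the paper's data).
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("each probability must be in (0, 1]")
    n = len(probabilities)
    mean_log2 = sum(math.log2(p) for p in probabilities) / n
    return 2.0 ** (-mean_log2)


# A model that spreads its mass evenly over four outcomes has perplexity 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that assigns high probability to what it sees scores lower.
confident = perplexity([0.9, 0.8, 0.95])
```

Under these assumptions a lower value means the model predicts the sample better, which matches the reading given in the note.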

Author information

Authors and Affiliations

Universidad de la República, Montevideo, Uruguay
Santiago Castro, Matías Cubero, Diego Garat & Guillermo Moncecchi

Corresponding author

Correspondence to Santiago Castro.

Editor information

Editors and Affiliations

INAOE, Tonantzintla, Mexico
Manuel Montes y Gómez
Astrofisica Optica y Electronica, INAOE, Puebla, Mexico
Hugo Jair Escalante
Universidad Nacional de Costa Rica, Heredia, Costa Rica
Alberto Segura
Universidad Nacional de Costa Rica, Heredia, Costa Rica
Juan de Dios Murillo

About this paper

Cite this paper

Castro, S., Cubero, M., Garat, D., Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_12

DOI: https://doi.org/10.1007/978-3-319-47955-2_12
Published: 14 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47954-5
Online ISBN: 978-3-319-47955-2
eBook Packages: Computer Science (R0)
