Abstract
While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective remains largely unexplored in Computational Linguistics. Some previous work exists, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to their humor value, letting the annotators decide subjectively which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%.
Notes
- 1.
- 2.
Taken from https://twitter.com/chistetipico/status/430549009812291584. It has been slightly adapted to keep the language appropriate.
- 3.
Perplexity is a measurement of how well a probability model predicts a sample. Low perplexity indicates the probability model is good at predicting the sample. It is defined as \(2^{- \frac{1}{n} \sum _{i=1}^n \log _2 p(x_i)}\), where \(x_1, \ldots , x_n\) are the sample data and \(p(x_i)\) is the probability assigned to each one.
- 4.
These tweets show traits of different varieties of the Spanish language. At least three countries were identified: Colombia, Spain and Uruguay.
- 5.
- 6.
- 7.
Note that Kappa assumes a fixed number of annotators. For this reason, we measure it with 2 and with 6 annotators, to give an idea of the agreement: one value computed over many tweets with few annotators, and another over few tweets with many annotators.
- 8.
The codebase for the classifier and the corpus can be found at https://github.com/pln-fing-udelar/pghumor.
- 9.
- 10.
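As an illustration of the perplexity definition given in note 3, the formula can be evaluated directly from a list of per-item probabilities. The following is a minimal sketch, not code from the paper: the function name `perplexity` and the example probability lists are assumptions made for illustration only.

```python
import math


def perplexity(probabilities):
    """Compute 2 ** (-(1/n) * sum(log2 p(x_i))) for a non-empty sample.

    `probabilities` holds p(x_1), ..., p(x_n), the probability the model
    assigns to each item of the sample (hypothetical input format).
    """
    probs = list(probabilities)
    if not probs:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must be in (0, 1]")
    # Average base-2 log-probability over the sample.
    avg_log2 = sum(math.log2(p) for p in probs) / len(probs)
    return 2.0 ** (-avg_log2)


# Illustrative values only: a model assigning 0.25 to every item
# versus one assigning 0.5 to every item.
less_certain = perplexity([0.25, 0.25, 0.25, 0.25])
more_certain = perplexity([0.5, 0.5, 0.5, 0.5])
print(less_certain, more_certain)
```

The model that assigns higher probability to the observed items obtains the lower perplexity, which matches the reading in note 3 that low perplexity indicates a better predictor of the sample.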
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Castro, S., Cubero, M., Garat, D., Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_12
DOI: https://doi.org/10.1007/978-3-319-47955-2_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47954-5
Online ISBN: 978-3-319-47955-2
eBook Packages: Computer Science, Computer Science (R0)