Is This a Joke? Detecting Humor in Spanish Tweets

  • Conference paper
Advances in Artificial Intelligence - IBERAMIA 2016 (IBERAMIA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Abstract

While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective remains a largely unexplored area of Computational Linguistics. Some previous work exists, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to their humor value, letting the annotators subjectively decide which ones are humorous. A humor classifier for Spanish tweets is then built using supervised learning, reaching a precision of 84% and a recall of 69%.

Notes

  1. http://dle.rae.es/.

  2. Taken from https://twitter.com/chistetipico/status/430549009812291584. It has been slightly adapted to maintain appropriate language.

  3. Perplexity is a measurement of how well a probability model predicts a sample. Low perplexity indicates that the probability model is good at predicting the sample. It is defined as \(2^{- \frac{1}{n} \sum _{i=1}^n \log _2 p(x_i)}\), where \(x_1, \ldots , x_n\) are the sample data and \(p(x_i)\) is the probability assigned to each one.

  4. These tweets show traits of different varieties of the Spanish language. At least three countries were identified: Colombia, Spain and Uruguay.

  5. http://clasificahumor.com.

  6. https://play.google.com/store/apps/details?id=com.clasificahumor.android.

  7. Note that Kappa assumes a fixed number of annotators. For this reason, we measure it with 2 and with 6 annotators: the first value covers many tweets with few annotators, and the second covers few tweets with many annotators.

  8. The codebase for the classifier and the corpus built can be found at https://github.com/pln-fing-udelar/pghumor.

  9. https://www.google.com.

  10. https://www.wiktionary.org.
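
The perplexity definition given in note 3 can be sketched in Python as follows. This is a minimal illustration of the formula only, not code from the paper or its repository; the function name `perplexity` and the sample probabilities are assumptions made for the example.

```python
import math


def perplexity(probabilities):
    """Return 2 ** (-(1/n) * sum(log2 p(x_i))) for the given sample.

    `probabilities` holds p(x_i), the probability the model assigns to
    each item of the sample (hypothetical input for this sketch).
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("each probability must lie in (0, 1]")
    n = len(probabilities)
    # Average log2-probability of the sample under the model.
    avg_log2 = sum(math.log2(p) for p in probabilities) / n
    return 2.0 ** (-avg_log2)


# A model that assigns 0.25 to every item is as uncertain as a fair
# four-way choice; one that assigns 0.5 predicts the sample better.
weak_model = perplexity([0.25, 0.25, 0.25, 0.25])
better_model = perplexity([0.5, 0.5, 0.5, 0.5])
print(weak_model, better_model)
```

The model that assigns higher probabilities to the observed items gets the lower perplexity, which matches the note's reading that low perplexity indicates a better fit to the sample.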

Author information

Corresponding author

Correspondence to Santiago Castro.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Castro, S., Cubero, M., Garat, D., Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_12

  • DOI: https://doi.org/10.1007/978-3-319-47955-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47954-5

  • Online ISBN: 978-3-319-47955-2

  • eBook Packages: Computer Science, Computer Science (R0)
