Saturation Tests in Application to Validation of Opinion Corpora: A Tool for Corpora Processing

  • Zygmunt Vetulani
  • Marta Witkowska
  • Suleyman Menken
  • Umut Canbolat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)


Opinion processing has recently gained much interest among computational linguists, public relations experts, marketing companies, and politicians. Studies of the natural language expression of opinions, desires, emotions, and related phenomena require appropriate tools and methodologies. We propose tools for collecting empirical data in the form of a corpus, limiting our research field to customers’ written opinions about widely used on-line booking services in the area of hotel reservations (via …). In this paper, we present the corpus acquisition procedure and our data acquisition tool, and discuss our decisions about the selection of the source data. We also present some limitations of our proposal and propose a validation methodology for the resulting corpora.


Text corpora, Language resources, Opinion processing, Corpora validation, Saturation tests



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Zygmunt Vetulani (1)
  • Marta Witkowska (1)
  • Suleyman Menken (2)
  • Umut Canbolat (2)
  1. Adam Mickiewicz University in Poznań, Poznań, Poland
  2. University of Kocaeli, İzmit, Turkey
