Vowel Disharmony in Czech Words and Stems

Abstract

This corpus study describes vowel phonotactics in Czech words. The results suggest that Czech exhibits probabilistic patterns: some vowel combinations are overrepresented, while others are underrepresented. A syllable containing a short front vowel tends to be followed by a syllable with a long front vowel. A long front vowel is typically followed by a back vowel, and a long back vowel tends to be followed by a short vowel; thus, an interesting circular dissimilative pattern can be observed. Shannon’s theory of communication may help explain these phenomena. The analysis was performed on both words and word stems (i.e., words without endings), and the two analyses yielded different results.
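
The abstract does not say how over- and underrepresentation was measured. One common approach, shown here purely as an illustrative sketch, compares the observed count of each vowel pair with the count expected if the two vowels combined independently. The word list, the `VOWELS` set, and both function names are hypothetical. The sketch works on spelling only, so it ignores syllabic r and l and treats the diphthong ou as two separate vowels.

```python
from collections import Counter

# Assumption: orthographic vowels only; syllabic r/l and diphthongs are not modelled.
VOWELS = set("aeiouyáéíóúůýě")


def vowel_pairs(word):
    """Return the (first, second) pairs of successive vowel letters in a word."""
    vowels = [ch for ch in word.lower() if ch in VOWELS]
    return list(zip(vowels, vowels[1:]))


def observed_expected(words):
    """Map each attested vowel pair to its observed/expected ratio.

    The expected count assumes that the first and second vowel of a pair
    are chosen independently of each other.
    """
    pair_counts = Counter(p for w in words for p in vowel_pairs(w))
    total = sum(pair_counts.values())
    first, second = Counter(), Counter()
    for (a, b), n in pair_counts.items():
        first[a] += n
        second[b] += n
    return {
        (a, b): n / (first[a] * second[b] / total)
        for (a, b), n in pair_counts.items()
    }


# Hypothetical toy word list, far too small to support any linguistic claim.
toy_words = ["máma", "táta", "voda", "pivo", "kino"]
for pair, ratio in sorted(observed_expected(toy_words).items()):
    print(pair, round(ratio, 2))
```

A ratio above 1 would point to an overrepresented pair and a ratio below 1 to an underrepresented one. On real data the ratios would be computed from corpus frequencies rather than from a handful of words.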

Notes

  1. Paradigm in the Kuhnian sense (Kuhn, 1962).

  2. The Phonological Lexical Corpus is not a corpus in the traditional sense; it is a list of lexemes (Bičan, 2015c), available at http://www.ujc.cas.cz/phword.

  3. Details on the Hungarian National Corpus and the data are available at http://corpus.nytud.hu/mnsz/index_eng.html (Oravecz, Váradi, & Sass, 2014).

  4. The full dataset for this study can be found at http://www.milicka.cz/kestazeni/vowels.zip.

  5. As you can see, abbreviations were not excluded from the corpus. This is why some of the rare and underrepresented vowel pairs are instantiated by abbreviations; otherwise, their frequency would be even lower.

  6. The black “smudge” near the /r/ vertex is a thick “loop edge.” This means that the /r/–/r/ pairs are really rare.

  7. The examples show that stem extraction produces some errors; for instance, ubrousek (‘napkin’) is stemmed as ubrous because of the alternations in the (diminutive) suffix -ek (e.g., obdélníček (‘little rectangle nom sg’) vs. obdélníčku (‘little rectangle gen sg’)).

  8. The (l → é), (ú → ou), (é → ú), and (ú → é) pairs are so rare within stems that there is no example for them in the corpus; the (r → r) example is a long interjection.

  9. There might be other patterns that affect the assignment of a word to its paradigm, both diachronically and synchronically. Their effects might be even stronger than the effects of the phenomena under consideration, but this study does not focus on them.

  10. Because the number of statistical units in our corpora is very large, even a small effect size yields statistically significant differences. For example, the number of short front–short front pairs in the SYN2010 corpus is 14,328,194 out of 61,503,108 pairs; the corresponding figure for SYN2015 is 14,243,894 out of 60,963,320 pairs. According to Fisher’s test, the frequencies differ significantly (p < 0.001), but the practical significance of the difference is low: the 95% confidence interval of the risk ratio lies between 0.9964 and 0.9977 (calculated according to Altman, 1990), which is very close to 1, i.e., the relative frequency of this vowel pair is nearly identical in the two corpora.

  11. Here, we mean entropy in the Shannonian sense, i.e., \( H = -\sum_{a \in A} f(a) \log_2 f(a) \), where A is the set of all vowels in the language system. If the phonotactics are not taken into account, then the entropy of a vowel pair is just double the entropy of a single vowel.

  12. The entropy of the vowel pair is calculated like the entropy of a single vowel, i.e., \( H = -\sum_{a \in A} \sum_{b \in A} f(a;b) \log_2 f(a;b) \), where A is the set of all vowels in the language system.

  13. Admittedly, this principle belongs to the generativist linguistic framework rather than to corpus or quantitative linguistics, as it was developed to describe one of the possible transformations of “deep structure” into “surface structure.” It is nonetheless worth noting that even the generativist descriptions suggest that Czech vowel disharmony is not an isolated linguistic process.
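
Notes 10 to 12 rest on two small calculations, sketched below under stated assumptions. The first recomputes the risk ratio from the pair counts quoted in note 10, using the usual normal approximation on the log scale; this is assumed, not confirmed, to be the method the note attributes to Altman (1990). The second computes Shannon entropy for a toy vowel distribution and checks that, when the two vowels of a pair are independent, the pair entropy is double the single-vowel entropy. The function names and the toy frequencies are hypothetical.

```python
import math


def risk_ratio_ci(a, n1, b, n2, z=1.96):
    """Risk ratio (a/n1)/(b/n2) with an approximate 95% CI on the log scale."""
    rr = (a / n1) / (b / n2)
    # Standard error of log(RR) under the normal approximation.
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def entropy(freqs):
    """Shannon entropy in bits of a list of relative frequencies."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


# Counts quoted in note 10: SYN2010 first, SYN2015 second.
rr, lower, upper = risk_ratio_ci(14_328_194, 61_503_108, 14_243_894, 60_963_320)
print(f"risk ratio {rr:.4f}, approximate 95% CI {lower:.4f} to {upper:.4f}")

# Hypothetical single-vowel frequencies (not taken from the study).
single = [0.5, 0.25, 0.25]
# Pair frequencies if phonotactics are ignored, i.e. f(a;b) = f(a) * f(b).
independent_pairs = [fa * fb for fa in single for fb in single]

h_single = entropy(single)
h_pair = entropy(independent_pairs)
print(f"single vowel: {h_single} bits, independent pair: {h_pair} bits")
```

The interval should land close to the one reported in note 10. With real pair frequencies in place of `independent_pairs`, any shortfall of the pair entropy below twice the single-vowel entropy would reflect the phonotactic dependence between neighbouring vowels.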

References

  • Altman, D. G. (1990). Practical statistics for medical research. Cleveland, OH: CRC Press.

  • Altmann, G. (1980). Prolegomena to Menzerath’s law. In R. Grotjahn (Ed.), Glottometrika 2 (pp. 1–10). Bochum, Germany: Brockmeyer.

  • Anderson, L. B. (1980). Using asymmetrical and gradient data in the study of vowel harmony. In R. M. Vago (Ed.), Issues in vowel harmony (pp. 271–340). Amsterdam, The Netherlands: John Benjamins.

  • Bičan, A. (2011). Phonotactics of Czech. Ph.D. thesis, Masaryk University, Brno, Czech Republic. Retrieved October 12, 2017, from https://theses.cz/id/eguqrt

  • Bičan, A. (2015b). Corpus-based analysis of the Czech syllable. In E. Gutiérrez Rubio (Ed.), Beiträge der Europäischen Slavistischen Linguistik (POLYSLAV) 18 (pp. 26–36). Munich, Germany: Harrassowitz Verlag.

  • Bičan, A. (2015c). Fonologický lexikální korpus češtiny a slabičná struktura českého slova [Phonological Lexical Corpus of Czech Language and the Syllabic Structure of Czech Words]. Bohemica Olomucensia, 7(3-4), 45–59.

  • Bičan, A. (2015a). Distribution of vocalic quantity in Czech. Grazer Linguistische Studien, 83, 133–138.

  • Čermák, F., Doležalová-Spoustová, D., Hlaváčová, J., Hnátková, M., Jelínek, T., Kocek, J., et al. (2005). SYN2005: žánrově vyvážený korpus psané češtiny [SYN 2005: Genre-Balanced Corpus of Written Czech]. Praha, Czech Republic: Ústav Českého národního korpusu FF UK. Retrieved October 12, 2017, from http://www.korpus.cz

  • Cvrček, V., Čermáková, A., & Křen, M. (2016). Nová koncepce synchronních korpusů psané češtiny [New Conception of the Synchronic Corpora of Written Czech]. Slovo a slovesnost, 77(2), 83–101.

  • Dankovičová, J. (1999). Czech. In Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet (pp. 70–74). Cambridge, UK: Cambridge University Press.

  • Goldsmith, J. (1976). Autosegmental phonology. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.

  • Hnátková, M., Křen, M., Procházka, P., & Skoumalová, H. (2014). The SYN-series corpora of written Czech. Proceedings of the ninth international conference on Language Resources and Evaluation (LREC’14), 160–164.

  • Johnson, D. C. (1980). Regular disharmony in Kirghiz. In R. M. Vago (Ed.), Issues in vowel harmony (pp. 89–100). Amsterdam, The Netherlands: John Benjamins.

  • Křen, M., Cvrček, V., Čapka, T., Čermáková, A., Hnátková, M., Chlumská, L., Jelínek, T., Kováříková, D., Petkevič, V., Procházka, P., Skoumalová, H., Škrabal, M., Truneček, P., Vondřička, P., & Zasina, A. (2016). SYN2015: Representative corpus of contemporary written Czech. Proceedings of the tenth international conference on Language Resources and Evaluation (LREC’16), 2522–2528.

  • Křen, M., Bartoň, T., Cvrček, V., Hnátková, M., Jelínek, T., Kocek, J., Novotná, R., Petkevič, V., Procházka, P., Schmiedtová, V., & Skoumalová, H. (2010). SYN2010: žánrově vyvážený korpus psané češtiny [SYN 2010: Genre-Balanced Corpus of Written Czech]. Praha, Czech Republic: Ústav Českého národního korpusu FF UK. Retrieved October 12, 2017, from http://www.korpus.cz

  • Křen, M., Cvrček, V., Čapka, T., Čermáková, A., Hnátková, M., Chlumská, L., Jelínek, T., Kováříková, D., Petkevič, V., Procházka, P., Skoumalová, H., Škrabal, M., Truneček, P., Vondřička, P., & Zasina, A. J. (2015). SYN2015: reprezentativní korpus psané češtiny [SYN 2015: Representative Corpus of Written Czech]. Praha, Czech Republic: Ústav Českého národního korpusu FF UK. Retrieved October 12, 2017, from http://www.korpus.cz

  • Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.

  • Leben, W. R. (1973). Suprasegmental phonology. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.

  • MacKay, D. J. C. (2003). Information theory, inference and learning algorithms. Cambridge, UK: Cambridge University Press.

  • McCarthy, J. J. (1986). OCP effects: Gemination and antigemination. Linguistic Inquiry, 17(2), 207–263.

  • Menzerath, P. (1928). Über einige phonetische Probleme. In Actes du premier Congrès International de Linguistes. Leiden, Netherlands: Sijthoff.

  • Milička, J. (2016). Teorie komunikace jakožto explanatorní princip přirozené víceúrovňové segmentace textů [The Theory of Communication as an Explanatory Principle for Natural Multilevel Text Segmentation]. Ph.D. thesis, Charles University, Prague, Czech Republic. Retrieved October 12, 2017, from https://is.cuni.cz/webapps/zzp/detail/104810

  • Nguyen, N., & Fagyal, Z. (2008). Acoustic aspects of vowel harmony in French. Journal of Phonetics, 36(1), 1–27.

  • Ohala, J. J. (1994). Towards a universal, phonetically-based, theory of vowel harmony. Third international conference on spoken language processing, 491–494.

  • Oravecz, C., Váradi, T., & Sass, B. (2014). The Hungarian Gigaword Corpus. In Proceedings of LREC 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/681_Paper.pdf

  • Palková, Z. (1994). Fonetika a fonologie češtiny [Phonetics and Phonology of Czech]. Praha, Czech Republic: Karolinum.

  • Petkevič, V. (2014). Problémy automatické morfologické disambiguace češtiny [Problems of Automated Disambiguation of Czech Morphology]. Naše řeč, 97, 194–207.

  • Poldauf, I. (1969). Máme v češtině harmonii samohlásek? [Do We Have Vowel Harmony in Czech?]. Naše řeč, 52, 201–209.

  • Ringen, C. O., & Kontra, M. (1989). Hungarian neutral vowels. Lingua, 78(2-3), 181–191.

  • Rounds, C. (2001). Hungarian: An essential grammar. Hove, UK: Psychology Press.

  • Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423.

  • Suomi, K., McQueen, J. M., & Cutler, A. (1997). Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language, 36(3), 422–444.

  • Vago, R. M. (1976). Theoretical implications of Hungarian vowel harmony. Linguistic Inquiry, 7(2), 243–263.

Acknowledgments

This study was written within the programme Progres Q08 Czech National Corpus implemented at the Faculty of Arts, Charles University. We would like to thank Václav Cvrček and Masako Ueda Fidler (the editors of this volume), Alžběta Růžičková, Jakub Sláma, and Sadie Gold-Shapiro for their suggestions and comments.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Milička, J., Kalábová, H. (2018). Vowel Disharmony in Czech Words and Stems. In: Fidler, M., Cvrček, V. (eds) Taming the Corpus. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-98017-1_3
