Morphological Richness of Text

  • Radek Čech
  • Miroslav Kubát
Chapter
Part of the Quantitative Methods in the Humanities and Social Sciences book series (QMHSS)

Abstract

This study proposes a method for measuring the morphological richness of a text or a corpus, which allows its morphological complexity to be characterized. The method computes the difference between two measurements: the vocabulary richness of lemmas and the vocabulary richness of word forms. The greater the difference, the higher the morphological complexity of the text. Vocabulary richness is computed with the Moving Average Type Token Ratio (MATTR). We hypothesize that the proposed indicator, Moving Average Morphological Richness (MAMR), reflects the style of a text and could therefore be used in stylometry. To test this hypothesis, MAMR is applied in analyses of both genre and authorship.
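The computation described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the text is already tokenized and lemmatized into two aligned sequences, the difference is taken as MATTR of word forms minus MATTR of lemmas, and the window length is chosen by the caller. The function names `mattr` and `mamr` are hypothetical and are not taken from the authors' software.

```python
from collections import Counter
from typing import Sequence


def mattr(tokens: Sequence[str], window: int) -> float:
    """Moving Average Type Token Ratio: mean TTR over all windows of fixed length."""
    n = len(tokens)
    if window <= 0:
        raise ValueError("window must be positive")
    if n < window:
        raise ValueError("text is shorter than the window")

    # Type counts for the first window.
    counts = Counter(tokens[:window])
    type_total = len(counts)

    # Slide the window one token at a time, updating counts incrementally.
    for i in range(window, n):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1
        type_total += len(counts)

    n_windows = n - window + 1
    return type_total / (window * n_windows)


def mamr(word_forms: Sequence[str], lemmas: Sequence[str], window: int) -> float:
    """Difference between MATTR of word forms and MATTR of lemmas.

    Assumption: forms minus lemmas, so richer inflection gives a larger value.
    """
    if len(word_forms) != len(lemmas):
        raise ValueError("word forms and lemmas must be aligned token by token")
    return mattr(word_forms, window) - mattr(lemmas, window)


# Toy English example (hypothetical data, window of 3 tokens).
forms = ["cats", "cat", "ran", "run", "runs", "cat"]
lemmas = ["cat", "cat", "run", "run", "run", "cat"]
print(mamr(forms, lemmas, window=3))
```

In this toy example every window of word forms contains three distinct types, while the lemma windows contain fewer, so the difference is positive. In practice the window would be much longer than three tokens, and the lemmas would come from a morphological tagger.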

Keywords

Morphological richness; Vocabulary richness; Stylometry; Genre; Authorship; Czech language

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Radek Čech (1)
  • Miroslav Kubát (1)
  1. University of Ostrava, Ostrava, Czech Republic
