
Russian Linguistics, Volume 40, Issue 1, pp. 11–31

A stylometric approach to the study of differences between standard variants of Bosnian/Croatian/Serbian, or: is the Hobbit in Serbian more Hobbit or more Serbian?

  • Ruprecht von Waldenfels
  • Maciej Eder
Article

Abstract

The article uses a stylometric approach to study differences between the standard variants of the pluricentric standard language Bosnian/Croatian/Serbian (BCS) in a corpus of originals and of translations from other languages. Three experiments are reported. The first two show that the choice of the Croatian or the Serbian variant is not the most important factor shaping the frequency profiles of translations; author-specific and other stylistic factors have a stronger impact. In the third experiment, a classifier is trained and the word-form features it relies on are analyzed to pinpoint variant-specific differences in the frequencies of word forms that are used in both variants. Our results show that a stylometric approach is useful for the empirical investigation of recurrent differences between the varieties and standard variants of BCS.
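The abstract does not specify how frequency profiles are compared. As a rough illustration only, the sketch below assumes the texts are already tokenised into word forms and uses Burrows's Delta, a widely used stylometric distance, as the comparison measure. The function names and the default of 100 most frequent words are hypothetical choices, not taken from the study.

```python
from collections import Counter
import math

def most_frequent_words(texts, n):
    """The n most frequent word forms across all (tokenised) texts."""
    counts = Counter()
    for tokens in texts:
        counts.update(tokens)
    return [w for w, _ in counts.most_common(n)]

def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word form in one non-empty text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocabulary]

def delta_matrix(texts, n=100):
    """Pairwise Burrows's Delta: mean absolute difference of z-scored frequencies.

    Assumption: every text is a non-empty list of word-form tokens.
    """
    vocab = most_frequent_words(texts, n)
    profiles = [relative_frequencies(t, vocab) for t in texts]
    k, m = len(vocab), len(profiles)
    means = [sum(p[i] for p in profiles) / m for i in range(k)]
    # Population standard deviation of each word form's frequency across the corpus
    sds = [math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles) / m) for i in range(k)]
    # z-score each frequency; a word with zero spread cannot separate texts, so it scores 0
    z = [[(p[i] - means[i]) / sds[i] if sds[i] > 0 else 0.0 for i in range(k)]
         for p in profiles]
    return [[sum(abs(a - b) for a, b in zip(z[x], z[y])) / k for y in range(m)]
            for x in range(m)]
```

If author-specific style dominates, translations by the same translator or of the same original would sit close together in such a matrix regardless of variant; a clustering of the matrix could then be inspected for whether texts group by author or by Croatian vs. Serbian variant.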

Keywords

Word Frequency, Standard Variant, Word Form, Standard Language, Categorical Difference
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

A stylometric approach to the study of differences between the standard variants of Bosnian-Croatian-Serbian, or: is the Serbian Hobbit more hobbit or more Serb?

Abstract (Russian version, in translation)

The article applies a stylometric approach to the study of differences between the standard variants of the pluricentric Bosnian-Croatian-Serbian language, based on a corpus of original texts and parallel translations. The results of three experiments are presented. The first two experiments show that the choice between the Croatian and the Serbian variant is not the most significant factor determining the frequency profiles of word forms in translations; the individual author and other stylistic factors have a greater impact. The third experiment analyzes the performance of a statistical text classifier. This analysis makes it possible to identify word forms that occur in all variants but that, because of their differing frequencies, are diagnostic for distinguishing between them. The results of the experiments show that a stylometric approach can be useful for the empirical study of systematic differences between the variants of Bosnian-Croatian-Serbian.
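The diagnostic word forms mentioned here come from analysing a trained classifier. A much simpler stand-in, sketched under the assumption that the tokenised texts are already sorted into two groups (for example Croatian and Serbian translations), ranks the word forms attested in both groups by the log ratio of their mean relative frequencies. This is not the classifier used in the study, and the function names are hypothetical.

```python
from collections import Counter
import math

def mean_relative_frequencies(texts):
    """Average relative frequency of each word form over a group of non-empty texts."""
    totals = Counter()
    for tokens in texts:
        n = len(tokens)
        for w, c in Counter(tokens).items():
            totals[w] += c / n
    return {w: f / len(texts) for w, f in totals.items()}

def diagnostic_shared_forms(group_a, group_b, top=10):
    """Rank word forms found in both groups by |log(freq_a / freq_b)|.

    A positive score means the form is relatively more frequent in group_a.
    """
    fa = mean_relative_frequencies(group_a)
    fb = mean_relative_frequencies(group_b)
    shared = fa.keys() & fb.keys()  # only forms used in both groups
    scored = [(w, math.log(fa[w] / fb[w])) for w in shared]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top]
```

Restricting the ranking to shared forms mirrors the focus on word forms that occur in all variants: categorical lexical differences (forms found in only one group) are excluded, leaving purely frequency-based contrasts.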

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Department of Slavic Languages and Literatures, University of California, Berkeley, USA
  2. Institute of Polish Language, Polish Academy of Sciences, Cracow, Poland
  3. Pedagogical University of Cracow, Cracow, Poland
