A Corpus-Based Study of the Rate of Changes in Frequency of Syntactic Bigrams in English and Russian

Bochkarev, Vladimir; Solovyev, Valery; Shevlyakova, Anna

doi:10.1007/978-3-030-33749-0_37

Conference paper
First Online: 27 October 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11835)

Abstract

The article describes general regularities in the frequency dynamics of syntactic bigrams and the method used to analyse them. The objective of the work is to quantitatively estimate the typical rate of change in the frequency of syntactic bigrams in English and Russian. The total rate of change in the frequency of syntactic bigrams is influenced both by changes in the frequencies of the words that make up the bigrams and by changes in how often these words co-occur. The contribution of each of these factors to the total rate of frequency change was estimated by decomposing the symmetrized Kullback-Leibler divergence. The study also determined to what extent the frequencies of syntactic bigrams respond to major social events. Data on the frequencies of syntactic bigrams from the English and Russian sub-corpora of Google Books Ngram served as the study material. The regularities of syntactic bigram usage were found to be similar in English and Russian. The proposed approach can also be applied in other fields of science.
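The abstract says that the total rate of frequency change was split into a word-frequency component and a co-occurrence component by decomposing the symmetrized Kullback-Leibler divergence, but it gives no formula. The Python sketch below shows one exact decomposition of that kind. It is an illustration built on our own assumptions, not the authors' formulation. Write each bigram probability as p(a, b) = p_L(a) p_R(b) exp(PMI(a, b)), where p_L and p_R are the marginal frequencies of the first and second word and PMI is pointwise mutual information. The symmetrized divergence J(P, Q) = Σ (p − q) log(p / q) between two years then splits into J(P_L, Q_L) + J(P_R, Q_R) + Σ (p − q)(PMI_P − PMI_Q). The first two terms reflect changes in word frequencies, and the last reflects changes in co-occurrence. The function names, the restriction to bigrams attested in both years, and the toy counts are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# A syntactic bigram is modelled here as a (first word, second word) pair.
Bigram = Tuple[str, str]
Dist = Dict[Bigram, float]


def normalize(counts: Dist) -> Dist:
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def marginals(p: Dist) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Marginal frequencies of the first and second word of each bigram."""
    left: Dict[str, float] = defaultdict(float)
    right: Dict[str, float] = defaultdict(float)
    for (a, b), v in p.items():
        left[a] += v
        right[b] += v
    return dict(left), dict(right)


def jeffreys(p, q) -> float:
    """Symmetrized KL divergence: sum over keys of (p - q) * log(p / q).
    Assumes p and q share the same keys and are strictly positive."""
    return sum((p[k] - q[k]) * math.log(p[k] / q[k]) for k in p)


def pmi(j: Dist, left, right, a: str, b: str) -> float:
    """Pointwise mutual information of the pair (a, b) under distribution j."""
    return math.log(j[(a, b)] / (left[a] * right[b]))


def decompose(counts_t0: Dist, counts_t1: Dist):
    """Split J(P, Q) into first-word, second-word and co-occurrence terms."""
    # Hypothetical choice: keep only bigrams attested in both years,
    # so that every logarithm below is finite.
    common = counts_t0.keys() & counts_t1.keys()
    p = normalize({k: counts_t0[k] for k in common})
    q = normalize({k: counts_t1[k] for k in common})
    pl, pr = marginals(p)
    ql, qr = marginals(q)
    left_term = jeffreys(pl, ql)    # change in first-word frequencies
    right_term = jeffreys(pr, qr)   # change in second-word frequencies
    assoc_term = sum(               # change in co-occurrence (PMI)
        (p[(a, b)] - q[(a, b)]) * (pmi(p, pl, pr, a, b) - pmi(q, ql, qr, a, b))
        for (a, b) in common
    )
    return jeffreys(p, q), left_term, right_term, assoc_term


# Toy counts for two consecutive years (invented for illustration only).
counts_1900 = {("strong", "tea"): 30, ("strong", "wind"): 20,
               ("heavy", "rain"): 40, ("heavy", "tea"): 10}
counts_1901 = {("strong", "tea"): 25, ("strong", "wind"): 25,
               ("heavy", "rain"): 35, ("heavy", "tea"): 15}

total, left, right, assoc = decompose(counts_1900, counts_1901)
print(f"J = {total:.6f} = {left:.6f} + {right:.6f} + {assoc:.6f}")
```

The identity holds exactly because log(p / q) separates into the two marginal log-ratios plus the difference in PMI, and summing (p − q) over the second word collapses each marginal term into a divergence between marginals. Restricting both years to shared bigrams is only a simplification for the sketch. Real Google Books Ngram data would need an explicit policy, such as smoothing or frequency thresholds, for bigrams that occur in only one year.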

Acknowledgements

This research was financially supported by the Russian Government Program of Competitive Growth of Kazan Federal University, by the state assignment of the Ministry of Education and Science (grant agreement № 34.5517.2017/6.7), and by RFBR (grant № 17-29-09163).

Author information

Authors and Affiliations

Kazan Federal University, Kremlyovskaya Street 18, Kazan, 420008, Russia
Vladimir Bochkarev, Valery Solovyev & Anna Shevlyakova

Corresponding author

Correspondence to Vladimir Bochkarev.

Editor information

Editors and Affiliations

Universidad Panamericana, Mexico City, Mexico
Lourdes Martínez-Villaseñor
Instituto Politécnico Nacional, Mexico, Mexico
Ildar Batyrshin
Universidad Veracruzana, Xalapa, Mexico
Antonio Marín-Hernández

About this paper

Cite this paper

Bochkarev, V., Solovyev, V., Shevlyakova, A. (2019). A Corpus-Based Study of the Rate of Changes in Frequency of Syntactic Bigrams in English and Russian. In: Martínez-Villaseñor, L., Batyrshin, I., Marín-Hernández, A. (eds) Advances in Soft Computing. MICAI 2019. Lecture Notes in Computer Science, vol 11835. Springer, Cham. https://doi.org/10.1007/978-3-030-33749-0_37

DOI: https://doi.org/10.1007/978-3-030-33749-0_37
Published: 27 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33748-3
Online ISBN: 978-3-030-33749-0
eBook Packages: Computer Science, Computer Science (R0)
