On Text Corpora, Word Lengths, and Word Frequencies in Slovenian

  • Primož Jakopin
Part of the Text, Speech and Language Technology book series (TLTB, volume 31)

Since its first beginnings in the mid-1990s, the availability of electronic text corpora in Slovenian, all with an Internet user interface, has grown to a level comparable to that of many European languages with a long history of quantitative linguistic research. There are two established corpora of 100 million running words: an academic one, which is freely accessible, and a commercial one, prepared by industrial and academic partners. The two are complemented by a sizeable collection of works of fiction, available for reading in a free virtual library, and by several specialized corpora compiled for the needs of particular institutions. The majority of Slovenian newspapers are also accessible online, at least in the form of selected articles.





Copyright information

© Springer 2007

Authors and Affiliations

  • Primož Jakopin
  1. Laboratorij za korpus slovenskega jezika, Inštitut za slovenski jezik Frana Ramovša ZRC SAZU, Gosposka 13, Ljubljana, Slovenia
