Abstract
This paper describes an algorithm for enriching the main Belarusian dictionaries in NooJ, based on the first one-million-word corpus for the Belarusian NooJ module. The corpus covers a broad range of subject categories, including fiction, historical, medical, scientific, and sociological literature. It is therefore considered the best source of unknown words from different domains, and a specific algorithm for the automatic generation of word paradigms was developed for this purpose. The authors have devised a mechanism for further processing all unknown (unique) words extracted from the corpus and adding them to the existing dictionary of the Belarusian NooJ module. The algorithm relies on the grammatical information required for the entire word.
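The abstract only outlines the pipeline: extract unknown words from the corpus, generate their paradigms, and add them to the dictionary. The Python sketch below illustrates one way such a pipeline could work. It is not the authors' implementation. It assumes a toy dictionary that maps each lemma to a named paradigm, and the paradigm names (`N_MAMA`, `N_DOM`), their endings, and all helper functions are hypothetical. Assigning a paradigm by the longest shared word ending is a swapped-in heuristic used here for illustration. The output line follows the NooJ-style `lemma,CATEGORY+FLX=PARADIGM` pattern.

```python
import re

# HYPOTHETICAL toy paradigms: name -> (letters to strip from the lemma,
# endings attached to the remaining stem). Illustrative only; real
# Belarusian NooJ paradigms are far richer and handle stem alternations.
PARADIGMS = {
    "N_MAMA": (1, ["а", "ы", "е", "у", "ай", "аю"]),  # feminine nouns in -а
    "N_DOM": (0, ["", "а", "у", "ам", "е"]),          # masculine nouns, zero ending
}

# Assumed existing dictionary: lemma -> paradigm name.
DICTIONARY = {"мама": "N_MAMA", "дом": "N_DOM"}


def generate_forms(lemma: str, paradigm: str) -> list[str]:
    """Build all word forms of a lemma from its paradigm."""
    cut, endings = PARADIGMS[paradigm]
    stem = lemma[: len(lemma) - cut]
    # dict.fromkeys drops duplicate forms while keeping their order.
    return list(dict.fromkeys(stem + ending for ending in endings))


def known_forms(dictionary: dict[str, str]) -> set[str]:
    """Every form the current dictionary can already recognise."""
    return {f for lemma, p in dictionary.items() for f in generate_forms(lemma, p)}


def unknown_words(text: str, dictionary: dict[str, str]) -> list[str]:
    """Unique corpus tokens absent from the dictionary, in order of first occurrence."""
    # \w matches Unicode letters, including Belarusian і and ў.
    tokens = re.findall(r"\w+", text.lower())
    known = known_forms(dictionary)
    return list(dict.fromkeys(t for t in tokens if t not in known))


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest common ending of two words."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suggest_paradigm(word: str, dictionary: dict[str, str]) -> str:
    """Heuristic guess: copy the paradigm of the lemma sharing the longest ending."""
    best = max(dictionary, key=lambda lemma: common_suffix_len(word, lemma))
    return dictionary[best]


def nooj_entry(lemma: str, category: str, paradigm: str) -> str:
    """Format a NooJ-style dictionary line."""
    return f"{lemma},{category}+FLX={paradigm}"


if __name__ == "__main__":
    text = "Мама бачыла рыбу. Рыба была ў доме."
    print(unknown_words(text, DICTIONARY))
    paradigm = suggest_paradigm("рыба", DICTIONARY)
    print(nooj_entry("рыба", "N", paradigm), generate_forms("рыба", paradigm))
```

The heuristic treats every unknown token as a lemma, so inflected forms such as `рыбу` or verbs such as `бачыла` would be misclassified. That is why, as the abstract notes, grammatical information about the whole word (lemma and part of speech) has to be supplied before a paradigm is assigned.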
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Hetsevich, Y., Varanovich, V., Kachan, E., Reentovich, I., Lysy, S. (2016). Semi-automatic Part-of-Speech Annotating for Belarusian Dictionaries Enrichment in NooJ. In: Barone, L., Monteleone, M., Silberztein, M. (eds) Automatic Processing of Natural-Language Electronic Texts with NooJ. NooJ 2016. Communications in Computer and Information Science, vol 667. Springer, Cham. https://doi.org/10.1007/978-3-319-55002-2_9
DOI: https://doi.org/10.1007/978-3-319-55002-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55001-5
Online ISBN: 978-3-319-55002-2
eBook Packages: Computer Science, Computer Science (R0)