Abstract
This paper reports on the development of a grammar error labeling system for the Latvian language. We label six error types that a survey of native Latvian speakers identified as crucial for understanding a text: incorrect use of a preposition, incorrect agreement within a phrase, an incorrect verb form, an incorrect noun form, an incorrect choice of the definite or indefinite ending of an adjective, and a missing comma. Training a neural network model requires a large amount of error-annotated data. To cope with the lack of manually annotated data, we generate artificial errors in correct text. We chose a bidirectional Long Short-Term Memory (BiLSTM) neural network because several authors consider this architecture the best for erroneous word detection. We train several models: models that label a single error type and models that label all six error types. For all error types, precision reaches 94.61% and recall reaches 94.08%.
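The idea of generating artificial errors in correct text can be illustrated with a minimal sketch for one error type, the missing comma, together with the token-level precision and recall used to score a tagger. The function names, the `OK`/`COMMA` label set, the decision to label the word that precedes the deleted comma, and the deletion rate are illustrative assumptions, not details taken from the paper.

```python
import random


def drop_commas(tokens, rate=0.5, rng=None):
    """Delete commas at random from a correct, tokenized sentence.

    Returns (noisy_tokens, labels). labels[i] is "COMMA" when a comma was
    deleted right after noisy_tokens[i], otherwise "OK". This labeling
    convention is an assumption made for the sketch.
    """
    rng = rng or random.Random()
    noisy, labels = [], []
    for tok in tokens:
        # A sentence-initial comma has no preceding word to label, so keep it.
        if tok == "," and noisy and rng.random() < rate:
            labels[-1] = "COMMA"  # mark the word that lost its comma
            continue
        noisy.append(tok)
        labels.append("OK")
    return noisy, labels


def precision_recall(gold, pred, ok="OK"):
    """Token-level precision and recall over all non-OK labels."""
    true_pos = sum(1 for g, p in zip(gold, pred) if p != ok and p == g)
    predicted = sum(1 for p in pred if p != ok)
    actual = sum(1 for g in gold if g != ok)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# "Es zinu, ka viņš nāks." ("I know that he will come.")
sentence = ["Es", "zinu", ",", "ka", "viņš", "nāks", "."]

# rate=1.0 deletes every comma, which makes the example deterministic.
noisy, gold = drop_commas(sentence, rate=1.0)
print(noisy)  # ['Es', 'zinu', 'ka', 'viņš', 'nāks', '.']
print(gold)   # ['OK', 'COMMA', 'OK', 'OK', 'OK', 'OK']

# A hypothetical tagger output with one correct and one spurious label.
pred = ["OK", "COMMA", "OK", "COMMA", "OK", "OK"]
print(precision_recall(gold, pred))  # (0.5, 1.0)
```

The `(noisy, gold)` pairs have the shape of input that a sequence tagger such as a BiLSTM consumes: one label per token. The other five error types would need a morphological resource to produce wrong word forms, which this sketch does not attempt.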
Acknowledgment
The research has been supported by the European Regional Development Fund within the project “Neural Network Modelling for Inflected Natural Languages” No. 1.1.1.1/16/A/215.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Deksne, D. (2019). Bidirectional LSTM Tagger for Latvian Grammatical Error Detection. In: Ekštein, K. (eds) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_5
DOI: https://doi.org/10.1007/978-3-030-27947-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27946-2
Online ISBN: 978-3-030-27947-9
eBook Packages: Computer Science (R0)