Abstract
This paper describes the design and creation of a multilingual parallel corpus for South African languages. It also presents one application of the corpus: the induction of a part-of-speech tagger for Afrikaans from the data. The tagger is developed with a modified version of the method for inducing linguistic tools from parallel corpora originally proposed by Yarowsky and Ngai (2001).
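As a rough illustration of the projection idea behind this kind of induction, the sketch below copies part-of-speech tags from a tagged source sentence onto a target sentence through word alignments. It is not the modified method used in the paper, whose details the abstract does not give. The function name `project_tags`, the placeholder tag `X`, the tag set, and the example sentence pair are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def project_tags(src_tags, alignment, tgt_len, default="X"):
    """Copy each source tag to the target tokens it is aligned with.

    src_tags:  POS tags of the source sentence, one per token.
    alignment: (source_index, target_index) pairs from a word aligner.
    tgt_len:   number of tokens in the target sentence.
    default:   placeholder tag (an assumption) for unaligned target tokens.
    """
    votes = defaultdict(Counter)
    for s, t in alignment:
        votes[t][src_tags[s]] += 1
    # A target token aligned to several source tokens takes the most
    # frequent projected tag; unaligned tokens receive the placeholder.
    return [votes[t].most_common(1)[0][0] if t in votes else default
            for t in range(tgt_len)]


# Hypothetical English-Afrikaans pair: "the dog sleeps" / "die hond slaap"
english_tags = ["DT", "NN", "VBZ"]
alignment = [(0, 0), (1, 1), (2, 2)]
projected = project_tags(english_tags, alignment, tgt_len=3)
```

Tags projected this way are typically noisy, so in projection-based approaches they serve as training data for a tagger rather than as final annotations.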
References
G. Bouma, G. van Noord and R. Malouf. Alpino: Wide-coverage Computational Analysis of Dutch. Computational Linguistics in The Netherlands. 2001.
T. Brants. TnT: A Statistical Part-of-Speech Tagger. Proceedings of ANLP-2000. Seattle, 2000.
P. F. Brown, J. Cocke, S. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer and P. S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics 16(2):79–85, 1990.
E. Charniak. A Maximum-Entropy-Inspired Parser. Proceedings of ANLP/NAACL’2000. Seattle, 2000.
P. Danielsson and D. Ridings. Practical presentation of a vanilla aligner. Språkbanken, Institutionen för svenska språket, Göteborgs universitet, 1997.
L. Dimitrova, T. Erjavec, N. Ide, H.-J. Kaalep, V. Petkevic and D. Tufis. Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages. Proceedings of COLING’98. Montreal, 1998.
N. Ide and J. Véronis. Multext (multilingual tools and corpora). Proceedings of COLING’94, p. 90–96. Kyoto, 1994.
M. Marcus, B. Santorini and M. A. Marcinkiewicz. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330, 1993.
F. J. Och, C. Tillmann and H. Ney. Improved alignment models for statistical machine translation. Proceedings of the EMNLP/WVLC Conference. 1999.
F. J. Och and H. Ney. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1):19–51, 2003.
S. Pilon. Automatic part-of-speech tagging of Afrikaans. MA thesis, North-West University, 2006.
P. Resnik. Mining the Web for Bilingual Text. Proceedings of ACL’99. Maryland, 1999.
D. Yarowsky and G. Ngai. Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection across Aligned Corpora. Proceedings of NAACL 2001. Pittsburgh, 2001.
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Trushkina, J. (2006). Development of a Multilingual Parallel Corpus and a Part-of-Speech Tagger for Afrikaans. In: Shi, Z., Shimohara, K., Feng, D. (eds) Intelligent Information Processing III. IIP 2006. IFIP International Federation for Information Processing, vol 228. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-44641-7_47
DOI: https://doi.org/10.1007/978-0-387-44641-7_47
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-44639-4
Online ISBN: 978-0-387-44641-7
eBook Packages: Computer Science (R0)