Development of a Multilingual Parallel Corpus and a Part-of-Speech Tagger for Afrikaans

Trushkina, Julia

doi:10.1007/978-0-387-44641-7_47

Part of the book series: IFIP International Federation for Information Processing (IFIPAICT, volume 228)

Included in the following conference series:

International Conference on Intelligent Information Processing

Abstract

This paper describes the design and creation of a multilingual parallel corpus for South African languages. It also presents one application of the corpus: the induction of a part-of-speech tagger for Afrikaans from the data. The tagger is developed with a modified version of the method for inducing linguistic tools from parallel corpora originally proposed by Yarowsky and Ngai (2001).
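
The abstract does not spell out the modified method, so the following is only a minimal sketch of the general idea behind tag projection across an aligned corpus, not the procedure used in the paper. The function names, the toy sentence pair, the hand-written alignment and the Penn Treebank style tags are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def project_tags(source_tags, alignment, target_len, default="UNK"):
    """Copy each source token's tag onto the target token aligned to it.

    alignment is a list of (source_index, target_index) pairs. Unaligned
    target tokens keep the default tag; if a target token is aligned more
    than once, the last pair wins (a simplification).
    """
    projected = [default] * target_len
    for src_idx, tgt_idx in alignment:
        projected[tgt_idx] = source_tags[src_idx]
    return projected

def build_lexicon(projected_corpus):
    """Keep, for every target word, its most frequent projected tag."""
    counts = defaultdict(Counter)
    for words, tags in projected_corpus:
        for word, tag in zip(words, tags):
            if tag != "UNK":
                counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

# Toy English-Afrikaans pair with a hypothetical one-to-one alignment.
english_tags = ["DT", "NN", "VBZ"]        # the dog sleeps
afrikaans = ["die", "hond", "slaap"]
alignment = [(0, 0), (1, 1), (2, 2)]

tags = project_tags(english_tags, alignment, len(afrikaans))
lexicon = build_lexicon([(afrikaans, tags)])
```

A lexicon built this way from many aligned sentence pairs could serve as noisy training data for a statistical tagger; handling alignment errors and tagset mismatches is where a real system needs most of its care.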

References

G. Bouma, G. van Noord and R. Malouf. Alpino: Wide-coverage Computational Analysis of Dutch. Computational Linguistics in The Netherlands. 2001.
T. Brants. TnT - A Statistical Part-of-Speech Tagger. Proceedings of ANLP-2000. Seattle, 2000.
P. F. Brown, J. Cocke, S. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer and P. S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics 16(2):79–85, 1990.
E. Charniak. A Maximum-Entropy-Inspired Parser. Proceedings of ANLP/NAACL’2000. Seattle, 2000.
P. Danielsson and D. Ridings. Practical presentation of a vanilla aligner. Språkbanken, Institutionen för svenska språket, Göteborgs universitet, 1997.
L. Dimitrova, T. Erjavec, N. Ide, H.-J. Kaalep, V. Petkevic and D. Tufis. Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages. Proceedings of COLING’98. Montreal, 1998.
N. Ide and J. Varonis. Multext (multilingual tools and corpora). Proceedings of COLING’94, p. 90–96. Kyoto, 1994.
M. Marcus, B. Santorini and M. A. Marcinkiewicz. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330, 1993.
F. J. Och, C. Tillmann and H. Ney. Improved alignment models for statistical machine translation. Proceedings of the EMNLP/WVLC Conference. 1999.
F. J. Och and H. Ney. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1): 19–51, 2003.
S. Pilon. Automatic part-of-speech tagging of Afrikaans, MA thesis, North-West University, 2006.
P. Resnik. Mining the Web for Bilingual Text. Proceedings of ACL’99. Maryland, 1999.
D. Yarowsky and G. Ngai. Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection across Aligned Corpora. Proceedings of NAACL 2001. Pittsburgh, 2001.

Author information

Authors and Affiliations

Centre for Text Technology, North-West University, 2531 Potchefstroom, South Africa
Julia Trushkina

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, China
Zhongzhi Shi
ATR Network Informatics Laboratories, Japan
K. Shimohara
University of Sydney, Australia
D. Feng

About this paper

Cite this paper

Trushkina, J. (2006). Development of a Multilingual Parallel Corpus and a Part-of-Speech Tagger for Afrikaans. In: Shi, Z., Shimohara, K., Feng, D. (eds) Intelligent Information Processing III. IIP 2006. IFIP International Federation for Information Processing, vol 228. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-44641-7_47

DOI: https://doi.org/10.1007/978-0-387-44641-7_47
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-44639-4
Online ISBN: 978-0-387-44641-7
eBook Packages: Computer Science, Computer Science (R0)
