Abstract
This paper introduces our work on developing corpus annotation and management tools in the Japanese government-funded project, Balanced Corpus of Contemporary Written Japanese. We are investigating various levels of text annotation, covering morphological analysis and POS tagging, syntactic dependency parsing, predicate-argument analysis, and coreference analysis. Since automatic annotation is not perfect, we need annotated corpus management tools that facilitate corpus browsing and error correction. In particular, we focus on our corpus management tool ChaKi, explain its functions, and discuss how we are trying to maintain the consistency of corpus annotation.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matsumoto, Y. (2008). Corpus Annotation/Management Tools for the Project: Balanced Corpus of Contemporary Written Japanese. In: Tokunaga, T., Ortega, A. (eds) Large-Scale Knowledge Resources. Construction and Application. LKR 2008. Lecture Notes in Computer Science, vol 4938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78159-2_11
DOI: https://doi.org/10.1007/978-3-540-78159-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78158-5
Online ISBN: 978-3-540-78159-2
eBook Packages: Computer Science, Computer Science (R0)