Identification of Subject Shareness for Korean-English Machine Translation

Kim, Kye-Sung; Park, Seong-Bae; Song, Hyun-Je; Park, Se-Young; Lee, Sang-Jo

doi:10.1007/978-3-540-89197-0_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5351)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

One of the most critical issues in translating Korean into other languages is the frequent use of empty arguments. Unlike in English, even mandatory elements are often dropped in Korean, so the missing elements must be resolved during translation to obtain grammatical sentences. In this paper, we focus on missing subjects at the intra-sentential level, a problem that can be regarded as identifying whether clauses share a subject. To reflect syntactic information in resolving missing subjects, we use a parse tree kernel, a specialized convolution kernel. In the experimental evaluation, syntactic information turns out to contribute positively to the identification of subject shareness. Our method achieves an accuracy of 81.39% and outperforms a baseline system that assumes two adjacent clauses share a subject.
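
To make the idea of a convolution kernel over parse trees concrete, the sketch below counts the tree fragments that two parse trees have in common, in the style of the subset-tree kernel of Collins and Duffy (2001). It is an illustrative stand-in, not the parse tree kernel used in this paper, whose exact definition is not given in the abstract. The Node class, the function names, the decay parameter, and the toy English trees are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """A parse tree node (hypothetical structure); words are childless nodes."""
    label: str
    children: Tuple["Node", ...] = ()


def production(node: Node) -> Tuple[str, Tuple[str, ...]]:
    """The grammar production at a node: its label plus its children's labels."""
    return (node.label, tuple(child.label for child in node.children))


def all_nodes(tree: Node) -> List[Node]:
    """All nodes of the tree in pre-order."""
    found = [tree]
    for child in tree.children:
        found.extend(all_nodes(child))
    return found


def common_fragments(n1: Node, n2: Node, decay: float = 1.0) -> float:
    """Weighted count of the fragments rooted at both n1 and n2."""
    if not n1.children or not n2.children:
        return 0.0  # a bare word roots no fragment
    if production(n1) != production(n2):
        return 0.0  # different productions share no fragment
    score = decay
    for c1, c2 in zip(n1.children, n2.children):
        # Each child is either cut off here (the 1) or expanded further.
        score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1: Node, t2: Node, decay: float = 1.0) -> float:
    """Sum the shared-fragment counts over every pair of nodes."""
    return sum(
        common_fragments(a, b, decay)
        for a in all_nodes(t1)
        for b in all_nodes(t2)
    )


def normalized_tree_kernel(t1: Node, t2: Node, decay: float = 1.0) -> float:
    """Scale the kernel into [0, 1] so that tree size does not dominate."""
    denom = sqrt(tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))
    return tree_kernel(t1, t2, decay) / denom if denom else 0.0


def leaf(tag: str, word: str) -> Node:
    """Build a pre-terminal node (part-of-speech tag over a single word)."""
    return Node(tag, (Node(word),))


# Toy trees for illustration only; they are not data from the paper.
t1 = Node("S", (Node("NP", (leaf("N", "John"),)), Node("VP", (leaf("V", "runs"),))))
t2 = Node("S", (Node("NP", (leaf("N", "Mary"),)), Node("VP", (leaf("V", "runs"),))))

print(tree_kernel(t1, t2))             # 10.0
print(tree_kernel(t1, t1))             # 15.0
print(normalized_tree_kernel(t1, t2))  # about 0.667
```

A kernel of this kind can be handed to a kernel-based classifier such as a support vector machine as a precomputed similarity matrix, so that structurally similar clause pairs receive similar predictions about whether they share a subject. How the paper actually builds its trees and trains its classifier is not stated in the abstract, so this pairing is an assumption too.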

Author information

Authors and Affiliations

Department of Computer Engineering, Kyungpook National University, 702-701, Daegu, Korea
Kye-Sung Kim, Seong-Bae Park, Hyun-Je Song, Se-Young Park & Sang-Jo Lee

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu-Bao Ho
Department of Computer Science & Technology, Nanjing University, 22 Hankou Road, 210093, China
Zhi-Hua Zhou

About this paper

Cite this paper

Kim, KS., Park, SB., Song, HJ., Park, SY., Lee, SJ. (2008). Identification of Subject Shareness for Korean-English Machine Translation. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_22

DOI: https://doi.org/10.1007/978-3-540-89197-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science, Computer Science (R0)
