Abstract
In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a probabilistic neural network (P-NNT) classifier. A feature parameter vector is extracted from each candidate text pair; this vector contains text features such as length, punctuation score, and cognate score values. A set of manually aligned data was used to train the probabilistic neural network, and a separate set was used for testing. When applied to English-Arabic parallel documents, the probabilistic neural network approach achieved a 27% error reduction over the length-based approach.
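The abstract names the ingredients (a per-pair feature vector of length, punctuation, and cognate scores, classified by a probabilistic neural network) but not their exact definitions. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation. The feature functions `length_ratio`, `punctuation_score`, and `cognate_score`, the punctuation inventory, the digit-only cognate heuristic, the labels `"match"`/`"mismatch"`, and the smoothing parameter `sigma` are all hypothetical. The classifier uses the common Gaussian-kernel (Parzen-window) formulation of a PNN: each class score is the average kernel response over that class's stored training vectors, and the highest score wins.

```python
import math
import re
from collections import defaultdict

# Hypothetical punctuation inventory: ASCII marks plus the Arabic comma,
# semicolon, and question mark (U+060C, U+061B, U+061F).
PUNCT = set(".,;:?!\"'()") | {"\u060c", "\u061b", "\u061f"}


def length_ratio(src: str, tgt: str) -> float:
    """Shorter/longer character-length ratio, in [0, 1] (assumed feature)."""
    return min(len(src), len(tgt)) / max(len(src), len(tgt), 1)


def punctuation_score(src: str, tgt: str) -> float:
    """Agreement of punctuation counts, in (0, 1] (assumed feature)."""
    ps = sum(ch in PUNCT for ch in src)
    pt = sum(ch in PUNCT for ch in tgt)
    return (min(ps, pt) + 1) / (max(ps, pt) + 1)


def cognate_score(src: str, tgt: str) -> float:
    """Dice overlap of numbers, a crude cross-script cognate proxy (assumed).

    int() accepts any Unicode decimal digits, so Arabic-Indic digits
    (U+0660 to U+0669) and ASCII digits map to the same integer value.
    """
    a = {int(t) for t in re.findall(r"\d+", src)}
    b = {int(t) for t in re.findall(r"\d+", tgt)}
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def features(src: str, tgt: str) -> tuple:
    """Feature vector for one candidate sentence pair."""
    return (length_ratio(src, tgt), punctuation_score(src, tgt), cognate_score(src, tgt))


class ProbabilisticNeuralNetwork:
    """Parzen-window PNN: one stored pattern per training vector."""

    def __init__(self, sigma: float = 0.1):
        self.sigma = sigma  # kernel width; a hypothetical default
        self.patterns = defaultdict(list)

    def fit(self, vectors, labels):
        # Training just stores each vector under its class label.
        for x, y in zip(vectors, labels):
            self.patterns[y].append(x)
        return self

    def _class_density(self, x, stored) -> float:
        # Mean Gaussian kernel response; the (2*pi)^(d/2) * sigma^d
        # normalizer is omitted because it is identical for every class.
        total = 0.0
        for p in stored:
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-d2 / (2 * self.sigma ** 2))
        return total / len(stored)

    def predict(self, x):
        # Ties (including total underflow to 0) go to the first class seen in training.
        return max(self.patterns, key=lambda c: self._class_density(x, self.patterns[c]))


if __name__ == "__main__":
    # Toy, made-up training vectors standing in for manually aligned data.
    pnn = ProbabilisticNeuralNetwork(sigma=0.1).fit(
        [(0.9, 0.9, 1.0), (0.85, 1.0, 0.8), (0.3, 0.4, 0.0), (0.2, 0.5, 0.0)],
        ["match", "match", "mismatch", "mismatch"],
    )
    print(pnn.predict(features("In 2005 there were 12 cases.", "في عام 2005 كانت هناك 12 حالة.")))
```

Because averaging per class implicitly gives every class equal weight, a real system might instead weight classes by their training frequency; `sigma` would normally be tuned on held-out data rather than fixed.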
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fattah, M.A., Ren, F., Kuroiwa, S. (2006). Probabilistic Neural Network Based English-Arabic Sentence Alignment. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_11
DOI: https://doi.org/10.1007/11671299_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)