Bilingual Sentence Alignment Based on Punctuation Statistics and Lexicon

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)


Abstract

This paper presents a new method for aligning sentences in bilingual parallel texts based on punctuation statistics and lexical information. Sentence alignment is reportedly more difficult for texts written in disparate language pairs such as English and Chinese. We examine the feasibility of using punctuation for high-accuracy sentence alignment and show that punctuation statistics are an effective means of achieving good results: an encouraging precision rate is obtained when aligning sentences in bilingual parallel corpora based solely on punctuation statistics, and results improve further when punctuation statistics are combined with lexical information. We experimented with an implementation of the proposed method on two parallel corpora, Sinorama Magazine and the Records of the Hong Kong Legislative Council, with satisfactory results.
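
The abstract does not spell out the alignment model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a length-based dynamic programming aligner in which the usual sentence length is replaced by a punctuation count. The punctuation inventories, the bead types, the penalties, and the names punct_count and align are all illustrative assumptions.

```python
from typing import List, Set, Tuple

# Hypothetical punctuation inventories (assumption: the abstract does not list them).
EN_PUNCT: Set[str] = set(",.;:!?\"'()")
ZH_PUNCT: Set[str] = set("，。；：！？、「」『』（）“”")

# Alignment bead types (source sentences, target sentences) with
# illustrative penalties; these values are assumptions, not from the paper.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}

Bead = Tuple[List[int], List[int]]


def punct_count(sentence: str, inventory: Set[str]) -> int:
    """Count the characters of a sentence that belong to the inventory."""
    return sum(1 for ch in sentence if ch in inventory)


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Align two sentence lists by comparing punctuation counts only.

    Returns a list of beads, each a pair of index lists
    (source sentence indices, target sentence indices).
    """
    s = [punct_count(x, EN_PUNCT) for x in src]
    t = [punct_count(x, ZH_PUNCT) for x in tgt]
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward dynamic programming over prefixes (i source, j target sentences).
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Mismatch = absolute difference of punctuation counts in the bead.
                mismatch = abs(sum(s[i:ni]) - sum(t[j:nj]))
                c = cost[i][j] + penalty + mismatch
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the end to recover the cheapest bead sequence.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling align(["Hello, world.", "How are you?"], ["你好，世界。", "你好吗？"]) pairs the sentences one to one, because the punctuation counts match sentence by sentence. A lexicon-based score, as the abstract suggests, could be added to the mismatch term; that extension is not shown here.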

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chuang, T.C., Wu, JC., Lin, T., Shei, WC., Chang, J.S. (2005). Bilingual Sentence Alignment Based on Punctuation Statistics and Lexicon. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_24

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
