Improved Algorithm for Automatic Word Alignment for Hindi-Punjabi Parallel Corpus

  • Conference paper
Data Engineering and Management (ICDEM 2010)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6411)

Abstract

This paper describes a system that aligns text at the word level in a Hindi-Punjabi parallel corpus. The previous aligner was based on a length-based estimation approach. In that version, multi-word units, and sometimes one-to-one correspondences, produced alignment errors. The improved version uses several techniques to raise accuracy: Boundary Detection, Dictionary-Lookup (DL), Nearest-align-Neighbor (NAN), and a scoring-based minimum distance function. Word alignment means identifying correspondences between words in source-language and target-language sentences. Automatic word alignment of a Hindi-Punjabi corpus is very useful for automatically developing a Hindi-Punjabi dictionary. The accuracy of the previous version was claimed to be approximately 89.5%, but rigorous testing found it to be 65%. With the above techniques implemented, the improved system described here achieved 99.09% accuracy for one-to-one word alignment and 80% accuracy for multi-word alignment.
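The abstract names the techniques but not their details, so the following is only a minimal sketch of how a dictionary lookup pass followed by a nearest-aligned-neighbour fallback with a minimum-distance choice might fit together. It is not the authors' implementation. The function `align_words`, the `lexicon` structure, and the romanized sample tokens are all hypothetical and exist only for illustration. Boundary detection and multi-word units are not modelled.

```python
from typing import Dict, List, Optional, Set


def align_words(
    source: List[str],
    target: List[str],
    lexicon: Dict[str, Set[str]],
) -> List[Optional[int]]:
    """Return, for each source position, an aligned target position or None.

    Illustrative only: one-to-one alignment, no multi-word handling.
    """
    alignment: List[Optional[int]] = [None] * len(source)
    used: Set[int] = set()

    # Pass 1 (dictionary lookup): align a source word to an unused target
    # word that the lexicon lists as its translation. If several qualify,
    # take the one whose position is closest to the source position.
    for i, word in enumerate(source):
        translations = lexicon.get(word, set())
        candidates = [
            j for j, tok in enumerate(target)
            if j not in used and tok in translations
        ]
        if candidates:
            j = min(candidates, key=lambda pos: abs(pos - i))
            alignment[i] = j
            used.add(j)

    # Pass 2 (nearest aligned neighbour): for each remaining source word,
    # find the closest already-aligned source word, project its offset onto
    # the target side, and pick the free target position at minimum
    # distance from that projected position.
    for i in range(len(source)):
        if alignment[i] is not None:
            continue
        free = [j for j in range(len(target)) if j not in used]
        if not free:
            break  # no target words left; the rest stay unaligned
        anchors = [k for k in range(len(source)) if alignment[k] is not None]
        if anchors:
            k = min(anchors, key=lambda pos: abs(pos - i))
            expected = alignment[k] + (i - k)
        else:
            expected = i  # no anchor at all: assume the same position
        j = min(free, key=lambda pos: abs(pos - expected))
        alignment[i] = j
        used.add(j)

    return alignment


# Hypothetical romanized tokens, not data from the paper.
source_sentence = ["raam", "ghar", "jaataa", "hai"]
target_sentence = ["raam", "ghar", "jaandaa", "hai"]
sample_lexicon = {"raam": {"raam"}, "ghar": {"ghar"}, "hai": {"hai"}}

print(align_words(source_sentence, target_sentence, sample_lexicon))
# prints [0, 1, 2, 3]
```

In this sketch the word missing from the lexicon ("jaataa") is placed by its aligned neighbours rather than by sentence length alone, which is the kind of case where a purely length-based estimate could plausibly go wrong.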

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jindal, K., Goyal, V. (2012). Improved Algorithm for Automatic Word Alignment for Hindi-Punjabi Parallel Corpus. In: Kannan, R., Andres, F. (eds) Data Engineering and Management. ICDEM 2010. Lecture Notes in Computer Science, vol 6411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27872-3_39

  • DOI: https://doi.org/10.1007/978-3-642-27872-3_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27871-6

  • Online ISBN: 978-3-642-27872-3

  • eBook Packages: Computer Science, Computer Science (R0)
