Improved Algorithm for Automatic Word Alignment for Hindi-Punjabi Parallel Corpus

Jindal, Karuna; Goyal, Vishal

doi:10.1007/978-3-642-27872-3_39

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6411)

Included in the following conference series:

International Conference on Data Engineering and Management

Abstract

This paper describes a system that aligns text at the word level in a Hindi-Punjabi parallel corpus. Word alignment means identifying correspondences between words in source-language and target-language sentences. The previous aligner was based on a length-based estimation approach, and it produced alignment errors for multi-word units and sometimes for one-to-one alignments. The improved version uses several techniques to raise accuracy: Boundary Detection, Dictionary-Lookup (DL), Nearest-align-Neighbor (NAN), and a scoring-based minimum distance function. Automatic word alignment of a Hindi-Punjabi corpus is very useful for automatically developing a Hindi-Punjabi dictionary. The accuracy of the previous version was claimed to be approximately 89.5%, but rigorous testing found it to be 65%. After the above techniques were implemented, the improved system described here achieved 99.09% accuracy for one-to-one word alignment and 80% accuracy for multi-word alignment.
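The abstract names the techniques but does not specify them, so the following is only a minimal sketch of how a dictionary-lookup pass combined with a position-distance fallback could work for one-to-one alignment. It is not the authors' implementation: the function `align_words`, the relative-position distance, the two-pass structure, and the romanized example tokens are all assumptions made for illustration, and multi-word units and boundary detection are not handled.

```python
from typing import Dict, List, Optional, Set, Tuple


def align_words(
    source: List[str],
    target: List[str],
    dictionary: Dict[str, Set[str]],
) -> List[Tuple[int, Optional[int]]]:
    """Align each source word to at most one target word (hypothetical sketch).

    Pass 1 (dictionary lookup): link a source word to the nearest unused
    target word that the bilingual dictionary lists as a translation.
    Pass 2 (position fallback): link each remaining source word to the
    unused target word whose relative position is closest.
    """
    links: Dict[int, int] = {}
    used: Set[int] = set()

    def rel_distance(i: int, j: int) -> float:
        # Compare relative positions so sentences of different lengths work.
        return abs(i / max(len(source), 1) - j / max(len(target), 1))

    # Pass 1: dictionary-confirmed links.
    for i, word in enumerate(source):
        translations = dictionary.get(word, set())
        candidates = [
            j for j, t in enumerate(target)
            if j not in used and t in translations
        ]
        if candidates:
            best = min(candidates, key=lambda j: rel_distance(i, j))
            links[i] = best
            used.add(best)

    # Pass 2: minimum-distance fallback for words the dictionary missed.
    for i in range(len(source)):
        if i in links:
            continue
        free = [j for j in range(len(target)) if j not in used]
        if free:
            best = min(free, key=lambda j: rel_distance(i, j))
            links[i] = best
            used.add(best)

    # Source words left without a partner are reported with None.
    return [(i, links.get(i)) for i in range(len(source))]


# Illustrative romanized tokens and a tiny made-up dictionary.
hindi = ["main", "ghar", "jata", "hun"]
punjabi = ["main", "ghar", "janda", "han"]
lexicon = {"main": {"main"}, "ghar": {"ghar"}}
pairs = align_words(hindi, punjabi, lexicon)
```

In this sketch the dictionary pass anchors the words it can confirm, and the fallback only chooses among target positions that are still free, which is one simple way to keep the result one-to-one.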

Author information

Authors and Affiliations

Department of Computer Science, Punjabi University, Patiala, India
Karuna Jindal & Vishal Goyal

Editor information

Editors and Affiliations

Department of Computer Science, Bishop Heber College(Autonomous), 620017, Tiruchirappalli, India
Rajkumar Kannan
National Institute of Informatics (NII), 101-8430, Tokyo, Japan
Frederic Andres

About this paper

Cite this paper

Jindal, K., Goyal, V. (2012). Improved Algorithm for Automatic Word Alignment for Hindi-Punjabi Parallel Corpus. In: Kannan, R., Andres, F. (eds) Data Engineering and Management. ICDEM 2010. Lecture Notes in Computer Science, vol 6411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27872-3_39

DOI: https://doi.org/10.1007/978-3-642-27872-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27871-6
Online ISBN: 978-3-642-27872-3
eBook Packages: Computer Science, Computer Science (R0)
