Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna

  • Conference paper
Information and Communication Technology for Development for Africa (ICT4DA 2017)

Abstract

This research experiments with Amharic-Tigrigna machine translation to promote information sharing. Since no Amharic-Tigrigna parallel text corpus exists, we prepared one from the religious domain, specifically from the Bible. Data preparation involves sentence alignment, sentence splitting, tokenization, and normalization of the Amharic-Tigrigna parallel corpora, followed by splitting the dataset into training, tuning, and testing data. An Amharic-Tigrigna translation model was then constructed using the training data and further tuned for better translation. Finally, given a target language model, the Amharic-Tigrigna translation system generates target output with reference to the translation model, using the word and the morpheme as translation units. The results of the experiment are promising for designing an Amharic-Tigrigna machine translation system between resource-deficient languages. We are now working on post-editing to enhance the performance of the bilingual Amharic-Tigrigna translator.
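The final data preparation step, dividing the aligned corpus into training, tuning, and testing data, can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `split_parallel_corpus`, the 10% tuning and 10% testing fractions, and the fixed random seed are all assumptions made for illustration, since the abstract does not state the split sizes. The one property the sketch is meant to show is that source and target sentences are shuffled as pairs, so the alignment survives the split.

```python
import random


def split_parallel_corpus(src_sentences, tgt_sentences,
                          tune_frac=0.1, test_frac=0.1, seed=0):
    """Split aligned sentence pairs into (train, tune, test) lists.

    The fractions and seed are illustrative assumptions, not values
    reported in the paper.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("source and target must have the same number of sentences")

    # Pair each source sentence with its aligned target sentence so that
    # shuffling cannot break the alignment.
    pairs = list(zip(src_sentences, tgt_sentences))
    random.Random(seed).shuffle(pairs)

    n_test = int(len(pairs) * test_frac)
    n_tune = int(len(pairs) * tune_frac)

    test = pairs[:n_test]
    tune = pairs[n_test:n_test + n_tune]
    train = pairs[n_test + n_tune:]
    return train, tune, test
```

Each returned list holds `(source, target)` tuples, which can then be written out as two line-aligned files per subset, the layout that statistical machine translation toolkits commonly expect.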


Notes

  1. Available at https://www.perl.org.

  2. Available at https://www.python.org.

  3. The unit obtained with Morfessor segmentation is referred to here as a morpheme, without any linguistic definition of morpheme.


Acknowledgement

We would like to thank the Ethiopian Ministry of Communication and Information Technology (MCIT) for funding the collection of the parallel text corpus and the experiments for the bilingual Amharic-Tigrigna statistical machine translation research project.

Author information

Corresponding author

Correspondence to Michael Melese Woldeyohannis.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Woldeyohannis, M.M., Meshesha, M. (2018). Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna. In: Mekuria, F., Nigussie, E., Dargie, W., Edward, M., Tegegne, T. (eds) Information and Communication Technology for Development for Africa. ICT4DA 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-95153-9_13

  • DOI: https://doi.org/10.1007/978-3-319-95153-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95152-2

  • Online ISBN: 978-3-319-95153-9

  • eBook Packages: Computer Science, Computer Science (R0)
