Design and Development of Sentence Parser for Afan Oromo Language

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1026)

Abstract

Parsers are now efficient and accurate enough to be useful in many natural language processing systems, most notably in machine translation [1]. Many sentence parsers have been developed for foreign languages such as English and Arabic and, among the local languages of Ethiopia, for Amharic. However, to the best of the researchers' knowledge, there is no Afan Oromo sentence parser for simple and complex sentences. We therefore propose a sentence parser for the Afan Oromo language. Parsing Afan Oromo sentences is a necessary step for other Afan Oromo natural language processing applications such as machine translation, question answering, knowledge extraction, and information retrieval. This paper presents a rule-based parser for Afan Oromo sentences that uses a top-down chart parsing algorithm. A context-free grammar (CFG) is used to represent the grammar. A sample corpus of 500 sentences was prepared, and the CFG rules were extracted manually from the tagged sample corpus. We also developed a simple lexicon generator algorithm that automatically generates the lexical rules. The implementation uses the Python programming language and NLTK. Of the sample dataset, 70% are simple sentences, because we considered four types of simple sentence (declarative, interrogative, imperative, and exclamatory), and the remaining 30% are complex sentences. The parser was trained on a 400-sentence training dataset, with an accuracy of 98.25%, and tested on a 100-sentence testing dataset, with an accuracy of 91%. These results are encouraging, since this is the first work on parsing simple and complex sentences of the Afan Oromo language.
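The abstract mentions a lexicon generator that derives lexical rules from the tagged corpus but does not describe its algorithm. The following is only a minimal sketch of one way such a step could look, not the authors' implementation. The function name `generate_lexical_rules`, the tag names `NN` and `VB`, and the sample tokens are all illustrative assumptions and are not taken from the paper's corpus.

```python
from collections import defaultdict


def generate_lexical_rules(tagged_sentences):
    """Build one CFG lexical rule per POS tag from (word, tag) pairs.

    Hypothetical helper: the paper's actual generator is not shown
    in the abstract, so this only illustrates the general idea.
    """
    words_by_tag = defaultdict(list)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Keep first-seen order and skip repeated words.
            if word not in words_by_tag[tag]:
                words_by_tag[tag].append(word)

    rules = []
    for tag in sorted(words_by_tag):
        # Terminals are single-quoted; words containing an apostrophe
        # would need different quoting, which this sketch ignores.
        alternatives = " | ".join(f"'{word}'" for word in words_by_tag[tag])
        rules.append(f"{tag} -> {alternatives}")
    return rules


# Illustrative tagged input (assumed, not from the authors' corpus).
tagged = [
    [("Tolaan", "NN"), ("kitaaba", "NN"), ("bite", "VB")],
    [("Tolaan", "NN"), ("dhufe", "VB")],
]

lexical_rules = generate_lexical_rules(tagged)
print("\n".join(lexical_rules))
```

The rules are emitted as plain strings in the quoted-terminal style that NLTK's `CFG.fromstring` accepts, so they could be appended to manually written phrase-structure rules before building a chart parser. Whether the authors combined the two rule sets this way is an assumption.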

References

  1. Katz-Brown, J., et al.: Training a parser for machine translation reordering. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pp. 183–192 (2011)

  2. Genemo, A.S.: Afaan Oromo Named Entity Recognition Using Hybrid Approach, M.Sc. thesis, Department of Computer Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2015)

  3. Chomsky, N.: Syntactic Structures, 2nd edn. New York (2002)

  4. Megersa, D.: An automatic sentence parser for Oromo language using supervised learning technique, M.Sc. thesis, Department of Information Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2002)

  5. Mohammed, A.D.: A top-down chart parser for Amharic sentences, M.Sc. thesis, Department of Computer Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2015)

  6. Alemu, A.: Automatic sentence parsing for Amharic text an experiment using probabilistic context free grammars, M.Sc. thesis, Department of Information Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2002)

  7. Agonafer, D.G.: An integrated approach to automatic complex sentence parsing for Amharic text, M.Sc. thesis, Department of Information Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2003)

  8. Ibrahim, A.: A hybrid approach to Amharic base phrase chunking and parsing, M.Sc. thesis, Department of Computer Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2013)

  9. Sleator, D.D.K.: Parsing English with a Link Grammar, National Science Foundation under grant CCR-8658139, Oline Corporation. R. R. Donnelley and Sons, New York (1991)

  10. Al-Taani, A., Msallam, M., Wedian, S.: A top-down chart parser for analyzing Arabic sentences. Int. Arab. J. Inf. Technol. 9(3), 109–116 (2012)

  11. Khoufi, N., Aloulou, C., Hadrich, L., Anlp, B.: ARSYPAR: a tool for parsing the Arabic language. In: International Arab Conference on Information Technology, ACIT, University of Science & Technology (2013)

  12. Bataineh, B.M., Bataineh, E.A.: An efficient recursive transition network parser for Arabic language. In: Proceedings of the World Congress on Engineering 2009, vol. II, pp. 1307–1311 (2009)

  13. Hambir, N.: Hindi parser-based on CKY algorithm, vol. 3, no. 2, pp. 851–853 (2012)

  14. Thant, W.W., Htwe, T.M., Thein, N.L.: Context free grammar based top-down parsing of Myanmar sentences. In: International Conference on Information Technology, Pattaya, December 2011, pp. 71–75 (2011)

  15. Lian, H.: Chinese language parsing with maximum-entropy-inspired parser, M.S. thesis, pp. 1–6 (2005)

  16. Ouersighni, R.: Robust rule-based approach in Arabic processing. Int. J. Comput. Appl. 93(12), 31–37 (2014)

  17. Erbach, G.: A flexible parser for a linguistic development environment. In: Herzog, O., Rollinger, C.-R. (eds.) Text Understanding in LILOG. LNCS, vol. 546, pp. 74–87. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-54594-8_53

  18. Weikum, G.: Foundations of statistical natural language processing. ACM SIGMOD Rec. 31(3), 37 (2002)

  19. Jason: Parsing. https://www.cs.cornell.edu/courses/cs4740/2012sp/lectures/parsing-intro-4pp.pdf. Accessed 03 Feb 2017

  20. Thompson, B.I.: Afro Asiatic Language Family (2017). http://aboutworldlanguages.com/afro-asiatic-language-family. Accessed 04 Feb 2017

  21. Gamta, T.: The Oromo language and the Latin alphabet. J. Oromo Stud. 10–13 (1992). http://www.africa.upenn.edu/Hornet/Afaan_Oromo_19777.html. Accessed 06 Feb 2017

  22. Ganfure, G.O., Midekso, D.: Design and implementation of morphology based spell checker, vol. 3, no. 12, pp. 118–125 (2014)

  23. Yimam, B.: The phrase structures of Ethiopian Oromo, Ph.D. Dissertation, Addis Ababa University (1986)

  24. Alqrainy, S., Jordan, S., Alkoffash, M.S.: Context-free grammar analysis for Arabic sentences. Int. J. Comput. Appl. 53(3), 7–11 (2012)

  25. Kibble, R.: Introduction to natural language processing undergraduate study in computing and related programmes (2013)

  26. Zhu, S.C.: Ch 4 classic parsing algorithms chart parsing in NLP pp. 1–51

  27. Fox, H.J.: Lexicalized, edge-based, best-first chart parsing, M.Sc. thesis, Department of Computer Science, Massachusetts Institute of Technology, Brown University (1999)

  28. Nedjo, A.T., Huang, D., Liu, X.: Automatic part-of-speech tagging for Oromo language using Maximum Entropy Markov Model (MEMM), vol. 10, pp. 3319–3334 (2014)

Author information

Corresponding authors

Correspondence to Hailu Beshada Balcha or Tesfa Tegegne.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Balcha, H.B., Tegegne, T. (2019). Design and Development of Sentence Parser for Afan Oromo Language. In: Mekuria, F., Nigussie, E., Tegegne, T. (eds) Information and Communication Technology for Development for Africa. ICT4DA 2019. Communications in Computer and Information Science, vol 1026. Springer, Cham. https://doi.org/10.1007/978-3-030-26630-1_29

  • DOI: https://doi.org/10.1007/978-3-030-26630-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26629-5

  • Online ISBN: 978-3-030-26630-1

  • eBook Packages: Computer Science, Computer Science (R0)
