
Itemsets-Based Amharic Document Categorization Using an Extended A Priori Algorithm

  • Abraham Hailu
  • Yaregal Assabie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9561)

Abstract

Document categorization is gaining importance because the large volume of electronic information requires automatic organization and pattern identification. The morphological complexity of Amharic makes automatic categorization of Amharic documents a difficult task. This paper presents a system that categorizes Amharic documents based on the frequency of itemsets obtained after analyzing the morphology of the language. We selected seven categories into which a given document is to be classified. Categorization is achieved by employing an extended version of the a priori algorithm, which has traditionally been used for knowledge mining in the form of association rules. The system is tested with a corpus of Amharic news documents, and experimental results are reported.
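
The abstract does not spell out the extension to the a priori algorithm, so the following is only a minimal sketch of the general idea: mine frequent itemsets per category with the classic level-wise a priori procedure, then assign a document to the category whose itemsets it contains most often. The function names `frequent_itemsets` and `categorize`, the support threshold, the scoring rule, and the English stand-in terms are all assumptions made for illustration; a real system would work on Amharic stems produced by morphological analysis.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Classic level-wise a priori: return {itemset: number of docs containing it}
    for every itemset found in at least `min_support` docs."""
    transactions = [frozenset(d) for d in docs]

    # Level 1: count single terms.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def categorize(doc_terms, category_itemsets):
    """Hypothetical scoring rule: pick the category whose frequent itemsets
    are most often fully contained in the document."""
    terms = set(doc_terms)
    scores = {
        cat: sum(1 for s in itemsets if s <= terms)
        for cat, itemsets in category_itemsets.items()
    }
    return max(scores, key=scores.get)


# Toy training data (illustrative English stand-ins, not from the paper).
sports_docs = [["goal", "team", "match"], ["team", "match", "coach"], ["goal", "team"]]
economy_docs = [["bank", "market", "price"], ["market", "price"], ["bank", "loan"]]

sports_sets = frequent_itemsets(sports_docs, min_support=2)
economy_sets = frequent_itemsets(economy_docs, min_support=2)

label = categorize(
    ["team", "goal", "price"],
    {"sports": sports_sets, "economy": economy_sets},
)
```

In this sketch each category is mined separately, so a term set counts as evidence for a category only if it recurs across that category's training documents. Weighting itemsets by size or support would be a natural refinement, but nothing in the abstract indicates which weighting the authors use.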

Keywords

  • Amharic language processing
  • Text categorization
  • Document classification
  • A priori algorithm
  • Itemsets


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia
