Itemsets-Based Amharic Document Categorization Using an Extended A Priori Algorithm

Hailu, Abraham; Assabie, Yaregal

doi:10.1007/978-3-319-43808-5_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Included in the following conference series:

Language and Technology Conference

Abstract

Document categorization is gaining importance because the large volume of electronic information requires automatic organization and pattern identification. The morphological complexity of Amharic makes automatic categorization of Amharic documents a difficult task to carry out. This paper presents a system that categorizes Amharic documents based on the frequency of itemsets obtained after analyzing the morphology of the language. We selected seven categories into which a given document is to be classified. Categorization is achieved by employing an extended version of the a priori algorithm, which has traditionally been used for knowledge mining in the form of association rules. The system is tested with a corpus of Amharic news documents, and experimental results are reported.
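
The abstract does not spell out how the algorithm is extended, so the following is only a minimal sketch of the general idea under stated assumptions: classic Apriori frequent-itemset mining is run separately on the training documents of each category, and a new document is assigned to the category whose frequent itemsets it contains most heavily. The function names (`apriori`, `train`, `categorize`), the size-weighted scoring rule, the support threshold, and the English placeholder tokens (standing in for morphologically analyzed Amharic terms) are all illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def apriori(transactions, min_support, max_size=2):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count how many documents contain each single term.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current and k <= max_size:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def train(docs_by_category, min_support=2, max_size=2):
    """Mine frequent itemsets independently for each category."""
    return {cat: apriori(docs, min_support, max_size)
            for cat, docs in docs_by_category.items()}


def categorize(model, doc_terms):
    """Score each category by the matched itemsets, weighted by size."""
    terms = frozenset(doc_terms)
    scores = {cat: sum(len(s) for s in itemsets if s <= terms)
              for cat, itemsets in model.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy corpus: each document is a set of already-stemmed terms.
training = {
    "sport": [
        {"team", "goal", "match"},
        {"team", "goal", "coach"},
        {"match", "goal", "league"},
    ],
    "economy": [
        {"bank", "market", "price"},
        {"bank", "price", "export"},
        {"market", "price", "trade"},
    ],
}
model = train(training, min_support=2)
label, scores = categorize(model, {"team", "goal", "stadium"})
print(label, scores)
```

In this toy run the unseen document contains the sport itemsets {team}, {goal}, and {team, goal} and none of the economy itemsets, so it is assigned to the sport category. Weighting larger itemsets more heavily is one simple design choice; the paper's seven categories and its actual scoring may differ.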

Author information

Authors and Affiliations

Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia
Abraham Hailu & Yaregal Assabie

Corresponding author

Correspondence to Yaregal Assabie.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI GmbH), Saarbrücken, Saarland, Germany
Hans Uszkoreit
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

Cite this paper

Hailu, A., Assabie, Y. (2016). Itemsets-Based Amharic Document Categorization Using an Extended A Priori Algorithm. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_24

DOI: https://doi.org/10.1007/978-3-319-43808-5_24
Published: 30 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43807-8
Online ISBN: 978-3-319-43808-5
eBook Packages: Computer Science, Computer Science (R0)
