Document Classification Using Enhanced Grid Based Clustering Algorithm

Rashad, Mohamed Ahmed; El-Deeb, Hesham; Fakhr, Mohamed Waleed

doi:10.1007/978-3-319-06764-3_27

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 312)

Abstract

Automated document clustering is an important text mining task, especially given the rapid growth in the number of online documents written in Arabic. Text clustering aims to automatically assign a text to a predefined cluster based on its linguistic features. This research proposes an enhanced grid-based clustering algorithm. The main purpose of the algorithm is to divide the data space into clusters of arbitrary shape. These clusters are treated as dense regions of points in the data space, separated by low-density regions that represent noise. The algorithm also handles data sets with multiple densities and assigns noise points and outliers to the closest category. This reduces the time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which reduces the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. Accuracy is reported in terms of time consumption and the percentage of successfully clustered instances. Experiments carried out on an in-house collection of Arabic text demonstrate the effectiveness of the enhanced clustering algorithm, with an average accuracy of 89%.
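
The pipeline outlined in the abstract (stop-word removal, word-frequency vectors, dense grid regions, noise assigned to the closest category) can be illustrated with a minimal sketch. This is not the authors' enhanced algorithm, whose details are not given here. It is a generic uniform-grid clustering under stated assumptions: the stop-word list, the function names, and the `cell_size` and `min_points` parameters are all hypothetical, root extraction is omitted, and the grid step is shown on 2-D points because visiting every neighbouring cell is impractical for high-dimensional text vectors.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "is"}


def term_frequencies(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, drop stop words, count the rest."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return Counter(tokens)


def to_vector(tf, vocabulary):
    """Project a term-frequency Counter onto a fixed vocabulary order."""
    return tuple(tf.get(term, 0) for term in vocabulary)


def grid_cluster(points, cell_size, min_points):
    """Label points with cluster ids using a simple uniform grid.

    Cells holding at least `min_points` points count as dense. Dense cells
    that touch (including diagonally) are merged into one cluster. Points in
    sparse cells are treated as noise and then given the label of the
    nearest point that already belongs to a cluster.
    """
    # Bucket point indices by the grid cell they fall into.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(int(x // cell_size) for x in p)].append(idx)
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    # Flood-fill over adjacent dense cells to form clusters.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in cell_label:
                    cell_label[neighbour] = next_label
                    stack.append(neighbour)
        next_label += 1

    labels = [None] * len(points)
    for cell, label in cell_label.items():
        for idx in cells[cell]:
            labels[idx] = label

    # Assign each noise point to the cluster of its nearest clustered point.
    clustered = [i for i, label in enumerate(labels) if label is not None]
    for i, label in enumerate(labels):
        if label is None and clustered:
            nearest = min(
                clustered,
                key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
            )
            labels[i] = labels[nearest]
    return labels


# Two dense groups plus one isolated point at (1.5, 1.5).
demo_points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4),
               (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),
               (1.5, 1.5)]
demo_labels = grid_cluster(demo_points, cell_size=1.0, min_points=2)
print(demo_labels)
```

In the demo, the isolated point falls in a sparse cell and inherits the label of the first group, since its nearest clustered neighbour lies there. A real document pipeline would need a dimensionality reduction step or a different neighbourhood strategy before applying a grid to word-frequency vectors.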

Author information

Authors and Affiliations

College of Computing and IT, Arab Academy for Science and Technology (AASTMT), Latakia, Syria
Mohamed Ahmed Rashad & Mohamed Waleed Fakhr
College of Computer Science, Modern University for Technology and Information (M.T.I), Cairo, Egypt
Hesham El-Deeb

Corresponding author

Correspondence to Mohamed Ahmed Rashad.

Editor information

Editors and Affiliations

Computer Science and Engineering, University of Bridgeport Associate Dean for Graduate Programs, Bridgeport, Connecticut, USA
Khaled Elleithy
Engineering and Computer Science, University of Bridgeport Dean of the School of Engineering, Bridgeport, Connecticut, USA
Tarek Sobh

Cite this paper

Rashad, M.A., El-Deeb, H., Fakhr, M.W. (2015). Document Classification Using Enhanced Grid Based Clustering Algorithm. In: Elleithy, K., Sobh, T. (eds) New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-06764-3_27

DOI: https://doi.org/10.1007/978-3-319-06764-3_27
Published: 08 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06763-6
Online ISBN: 978-3-319-06764-3
eBook Packages: Engineering, Engineering (R0)
