Abstract
Traditional supervised clustering methods require the user to specify the number of clusters before any data exploration begins. The data engineer must also select the initial cluster seeds. In the c-means clustering method, the performance of the algorithm depends mainly on the initial choice of the number of clusters and of the cluster seeds. With real-world data, choosing the cluster count and the initial centroids becomes a tedious task. In this paper we propose a modified clustering algorithm based on the principles of fuzzy clustering. The proposed method uses a modified form of the popular fuzzy c-means algorithm for membership calculation. The algorithm begins with the assumption that every data point is an initial centroid. Clusters are then merged repeatedly, based on a threshold value, until the optimum number of clusters is reached. The algorithm is also capable of detecting outliers. It was tested on data from the Gross National Happiness (GNH) program of Bhutan and found to be highly efficient in segmenting natural data sets.
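The abstract does not give the modified membership formula or the exact merging rule, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the standard fuzzy c-means membership formula (not the authors' modified form), a plain Euclidean distance threshold between centroids as a stand-in for the paper's threshold, and treats clusters smaller than a hypothetical `min_size` as outliers. All function and parameter names are illustrative.

```python
import math


def fcm_memberships(points, centroids, m=2.0):
    """Standard fuzzy c-means memberships (assumption: the paper's
    modified formula is not reproduced here). m > 1 is the fuzzifier."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        if 0.0 in dists:
            # The point coincides with one or more centroids:
            # share full membership equally among those centroids.
            hits = [d == 0.0 for d in dists]
            rows.append([1.0 / sum(hits) if h else 0.0 for h in hits])
            continue
        rows.append(
            [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        )
    return rows


def centroid(cluster):
    """Arithmetic mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def merge_by_threshold(points, threshold, min_size=2):
    """Start with every point as its own cluster, then repeatedly merge
    the two clusters with the closest centroids while that distance does
    not exceed `threshold` (an assumed merging criterion)."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                d = math.dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # no remaining pair is close enough to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Assumption: clusters that never grew to min_size are outliers.
    kept = [c for c in clusters if len(c) >= min_size]
    outliers = [p for c in clusters if len(c) < min_size for p in c]
    return kept, outliers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    kept, outliers = merge_by_threshold(data, threshold=3.0)
    finals = [centroid(c) for c in kept]
    print(len(kept), "clusters; outliers:", outliers)
    print(fcm_memberships(data, finals))
```

The number of clusters is never passed in; it falls out of the threshold, which is the property the abstract emphasises. The pairwise search makes each merge step quadratic in the number of clusters, which is acceptable for a sketch but not for large data sets.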
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thomas, B., Raju, G. (2010). A Fuzzy Threshold Based Modified Clustering Algorithm for Natural Data Exploration. In: Chen, H., Chau, M., Li, Sh., Urs, S., Srinivasa, S., Wang, G.A. (eds) Intelligence and Security Informatics. PAISI 2010. Lecture Notes in Computer Science, vol 6122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13601-6_18
DOI: https://doi.org/10.1007/978-3-642-13601-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13600-9
Online ISBN: 978-3-642-13601-6
eBook Packages: Computer Science, Computer Science (R0)