Abstract
With advances in database technology, many real-world applications now hold large volumes of categorical data, which play an important role in data analysis and effective decision making. However, most clustering algorithms are designed for numerical data only, because of the similarity measures they rely on. A large body of work addresses clustering categorical data with predefined similarity measures defined explicitly over categorical values. An intricate problem in real-world domains, however, is that the features in the data may depend on some hidden context that is not given explicitly in the form of predictive features. This makes it challenging to deal with categorical data efficiently and effectively. In this paper, a stratified immune-based approach for clustering categorical data, CAIS, is proposed with a new similarity measure that minimizes the distance function. CAIS adopts an immunology-based approach for effective discovery of clusters over categorical data. It selects frequently occurring features as representative objects and groups the data into clusters with a new affinity measure. CAIS scales to a large number of attributes while minimizing the misclustering rate in the datasets. Extensive empirical analysis shows that CAIS attains better mining efficiency on various categorical datasets and outperforms Expectation Maximization (EM) in different settings.
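The abstract does not define the CAIS affinity measure or its immune-based selection step, so the following is only a minimal sketch of the general idea of grouping categorical records around frequency-based representatives. It substitutes two assumptions that are not taken from the paper: simple matching dissimilarity (the count of mismatched attributes) as the distance, and per-attribute modes as the representative object, in the style of k-modes. The function names and the toy records are hypothetical.

```python
from collections import Counter


def mismatch_distance(a, b):
    """Simple matching dissimilarity: number of attributes whose values differ.
    (Assumption: stands in for the paper's unspecified affinity measure.)"""
    return sum(x != y for x, y in zip(a, b))


def mode_representative(records):
    """Representative object: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def cluster_categorical(records, seeds, max_iter=10):
    """Assign each record to its nearest representative, then rebuild each
    representative from the modes of its members, until assignments settle."""
    reps = [tuple(seed) for seed in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(reps)), key=lambda k: mismatch_distance(record, reps[k]))
            for record in records
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for k in range(len(reps)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:  # keep the old representative if a cluster is empty
                reps[k] = mode_representative(members)
    return labels, reps


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
labels, reps = cluster_categorical(records, seeds=[records[0], records[3]])
print(labels)
print(reps)
```

On this toy input the first three records fall into one cluster and the last three into the other. Any comparison with EM, as reported in the abstract, would need the paper's actual measure and datasets.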
© 2017 Springer Science+Business Media Singapore
About this paper
Cite this paper
Surya Narayana, G., Vasumathi, D., Prasanna, K. (2017). A New Stratified Immune Based Approach for Clustering High Dimensional Categorical Data. In: Attele, K., Kumar, A., Sankar, V., Rao, N., Sarma, T. (eds) Emerging Trends in Electrical, Communications and Information Technologies. Lecture Notes in Electrical Engineering, vol 394. Springer, Singapore. https://doi.org/10.1007/978-981-10-1540-3_15
DOI: https://doi.org/10.1007/978-981-10-1540-3_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1538-0
Online ISBN: 978-981-10-1540-3
eBook Packages: Engineering, Engineering (R0)