A New Stratified Immune Based Approach for Clustering High Dimensional Categorical Data

Surya Narayana, G.; Vasumathi, D.; Prasanna, K.

doi:10.1007/978-981-10-1540-3_15

Conference paper
First Online: 15 November 2016


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 394)

Abstract

With developments in database technology, many real-world applications hold large volumes of categorical data, which play an important role in data analysis and effective decision making. However, most clustering algorithms are designed for numerical data only, because of the similarity measures they rely on. A large body of work addresses clustering categorical data with similarity measures explicitly defined over categorical attributes. However, a difficult problem in real-world domains is that the features in the data may depend on some hidden, transient context that is not explicitly present among the given predictive features. This makes it hard to handle categorical data effectively and efficiently. In this paper, a stratified immune-based approach for clustering categorical data, CAIS, is proposed together with a new similarity measure that minimizes the distance function. CAIS adopts an immunology-based approach for the effective discovery of clusters in categorical data. It selects frequently occurring nomadic features as representative objects and groups the data into clusters using a new affinity measure. CAIS scales to a large number of attributes while minimizing the misclustering rate on the datasets. An extensive empirical analysis shows that CAIS attains better mining efficiency on various categorical datasets and outperforms Expectation Maximization (EM) in different settings.
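The abstract outlines the general pattern (frequent values chosen as representatives, records grouped by affinity) but does not give the CAIS similarity or affinity formulas. As a rough illustration only, the Python sketch below assumes a simple-matching affinity (the fraction of attributes on which two records agree), scores each record by how frequent its attribute values are, greedily picks high-scoring records as cluster representatives while skipping near-duplicates (a loose nod to immune-network suppression), and assigns every record to its highest-affinity representative. The misclustering rate is assumed here to be the fraction of records outside the majority true class of their cluster. All function names, the `suppression` threshold, and the toy data are hypothetical; this is not the authors' CAIS algorithm.

```python
from collections import Counter
from typing import Sequence

Record = tuple[str, ...]


def affinity(a: Record, b: Record) -> float:
    """Simple-matching affinity: fraction of attributes on which two records agree.

    Assumed stand-in; NOT the new affinity measure proposed for CAIS.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def frequency_scores(data: Sequence[Record]) -> list[int]:
    """Score each record by summing the column-wise counts of its attribute values."""
    counts = [Counter(col) for col in zip(*data)]
    return [sum(counts[j][v] for j, v in enumerate(rec)) for rec in data]


def select_representatives(data: Sequence[Record], k: int,
                           suppression: float = 0.5) -> list[Record]:
    """Greedily pick up to k high-frequency records as representatives.

    A candidate is skipped if its affinity to an already chosen representative
    exceeds `suppression` (hypothetical threshold mimicking immune suppression).
    """
    scores = frequency_scores(data)
    order = sorted(range(len(data)), key=lambda i: (-scores[i], i))
    reps: list[Record] = []
    for i in order:
        if len(reps) == k:
            break
        if all(affinity(data[i], r) <= suppression for r in reps):
            reps.append(data[i])
    return reps


def assign(data: Sequence[Record], reps: Sequence[Record]) -> list[int]:
    """Assign each record to its highest-affinity representative (ties go to the lowest index)."""
    return [max(range(len(reps)), key=lambda c: (affinity(rec, reps[c]), -c))
            for rec in data]


def miscluster_rate(labels: Sequence[str], clusters: Sequence[int]) -> float:
    """Fraction of records that are not in the majority true class of their cluster."""
    by_cluster: dict[int, Counter] = {}
    for y, c in zip(labels, clusters):
        by_cluster.setdefault(c, Counter())[y] += 1
    correct = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return 1 - correct / len(labels)


# Hypothetical toy dataset with two obvious groups.
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "rect"),
]
labels = ["A", "A", "A", "B", "B", "B"]

reps = select_representatives(data, k=2)
clusters = assign(data, reps)
print(clusters)  # [0, 0, 0, 1, 1, 1]
print(miscluster_rate(labels, clusters))
```

With k = 2 the suppression step forces the second representative to come from the other colour group, so the toy records split cleanly and the misclustering rate is 0. Swapping in a different categorical similarity only requires replacing `affinity`.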

References

Nasraoui O, Soliman M, Saka E, Badia A, Germain R (2008) A web usage mining framework for mining evolving user profiles in dynamic web sites. IEEE Trans Knowl Data Eng 20(2):202–215
Anderberg MR (1973) Cluster analysis for applications. Academic Press
Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco
Jain A, Dubes R (1988) Algorithms for clustering data. Prentice Hall
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Gibson D, Kleinberg J, Raghavan P (1998) Clustering categorical data: an approach based on dynamical systems. In: Proceedings of the 24th international conference on very large databases, pp 311–323
Ganti V, Gehrke J, Ramakrishnan R (1999) CACTUS: clustering categorical data using summaries. In: Proceedings of 1999 international conference on knowledge discovery and data mining, pp 73–83
Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of 1999 international conference on data engineering, pp 512–521
He Z, Xu X, Deng S (2002) Squeezer: an efficient algorithm for clustering categorical data. J Comput Sci Technol 17(5):611–624
Zhendong P, Jiafu T, Mu L (2006) An immune-based clustering algorithm for large data sets with categorical values. World J Model Simul 2(1):28–35
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

Author information

Authors and Affiliations

Department of CSE, AITS, Rajampet, India
G. Surya Narayana & K. Prasanna
Department of CSE, JNTU, Hyderabad, India
D. Vasumathi

Corresponding author

Correspondence to G. Surya Narayana.

Editor information

Editors and Affiliations

Mathematics and Computer Science, Chicago State University, Chicago, Illinois, USA
Kapila Rohan Attele
BioAxis DNA Research Centre (P) Hyderabad, IEEE India Council, Hyderabad, India
Amit Kumar
Department of Electrical Engineering, JNTU College of Engineering, Ananthapur, India
V. Sankar
Department of Computer Science, CVR College of Engineering, Hyderabad, India
N. V. Rao
Department of Computer Science and Engineering, Srinivasa Ramanujan Institute of Technology, Ananthapur, India
T. Hitendra Sarma

About this paper

Cite this paper

Surya Narayana, G., Vasumathi, D., Prasanna, K. (2017). A New Stratified Immune Based Approach for Clustering High Dimensional Categorical Data. In: Attele, K., Kumar, A., Sankar, V., Rao, N., Sarma, T. (eds) Emerging Trends in Electrical, Communications and Information Technologies. Lecture Notes in Electrical Engineering, vol 394. Springer, Singapore. https://doi.org/10.1007/978-981-10-1540-3_15

DOI: https://doi.org/10.1007/978-981-10-1540-3_15
Published: 15 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1538-0
Online ISBN: 978-981-10-1540-3
eBook Packages: Engineering, Engineering (R0)