A New Stratified Immune Based Approach for Clustering High Dimensional Categorical Data

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 394)

Abstract

With advances in database technology, many real-world applications now contain large volumes of categorical data, which play an important role in data analysis and effective decision making. However, most clustering algorithms are designed for numerical data only, because of the similarity measures they rely on. A large body of work clusters categorical data using predefined similarity measures defined explicitly over categorical values. An intricate problem in real-world domains, however, is that the features in the data may depend on some hidden context that is not given explicitly in the form of predictive features. This creates a need to deal with categorical data competently and efficiently. In this paper, a stratified immune-based approach for clustering categorical data, CAIS, is proposed with a new similarity measure to minimize the distance function. CAIS adopts an immunology-based approach for effective discovery of clusters over categorical data. It selects frequently occurring features as representative objects and groups the data into clusters using a new affinity measure. CAIS scales to a large number of attributes and aims to minimize the misclustering rate in the datasets. Extensive empirical analysis shows that the proposed approach attains better mining efficiency on various categorical datasets and outperforms Expectation Maximization (EM) in different settings.
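
The abstract does not define the CAIS affinity measure, so it cannot be reproduced here. As a rough illustration of the general idea of grouping categorical records around representative objects, the following minimal sketch uses simple matching dissimilarity (the count of mismatched attributes) and a per-attribute most frequent value as representative. Both are standard stand-ins chosen for illustration, not the paper's method, and the function names and toy records are hypothetical.

```python
from collections import Counter

def mismatch_distance(a, b):
    """Simple matching dissimilarity: number of attributes on which two records differ.
    Assumption: a stand-in for the paper's (unspecified) affinity measure."""
    return sum(x != y for x, y in zip(a, b))

def representative(records):
    """Build a representative object from the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def assign(records, reps):
    """Assign each record to the index of its nearest representative."""
    return [
        min(range(len(reps)), key=lambda k: mismatch_distance(record, reps[k]))
        for record in records
    ]

# Hypothetical toy data: each record is a tuple of categorical attribute values.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

# Seed two representatives, assign records, then recompute one representative per group.
reps = [records[0], records[2]]
labels = assign(records, reps)
groups = [[r for r, lab in zip(records, labels) if lab == k] for k in range(len(reps))]
new_reps = [representative(g) for g in groups]
```

Alternating the assignment and representative-update steps until the labels stop changing gives a k-modes-style loop; an immune-inspired method such as CAIS would replace both the distance and the selection of representatives with its own definitions.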


Author information

Corresponding author

Correspondence to G. Surya Narayana.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this paper

Cite this paper

Surya Narayana, G., Vasumathi, D., Prasanna, K. (2017). A New Stratified Immune Based Approach for Clustering High Dimensional Categorical Data. In: Attele, K., Kumar, A., Sankar, V., Rao, N., Sarma, T. (eds) Emerging Trends in Electrical, Communications and Information Technologies. Lecture Notes in Electrical Engineering, vol 394. Springer, Singapore. https://doi.org/10.1007/978-981-10-1540-3_15

  • DOI: https://doi.org/10.1007/978-981-10-1540-3_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-1538-0

  • Online ISBN: 978-981-10-1540-3

  • eBook Packages: Engineering, Engineering (R0)
