Abstract

Clustering is a popular unsupervised learning technique in data mining that divides data into groups of similar objects and is used in various application areas. k-Means is the most popular of the partition-based clustering algorithms for partitioning a dataset into meaningful patterns. However, k-Means has some shortcomings. This paper addresses two of them: the number of centroids must be specified a priori, and the algorithm does not handle noise. The paper also presents an overview of cluster analysis, clustering algorithms, and the preprocessing and normalization techniques used in the modified k-Means to improve its effectiveness and efficiency.
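As a point of reference for the baseline discussed above, the following is a minimal sketch of standard k-Means preceded by min-max normalization. It is not the modified algorithm of this paper, whose details are not given in the abstract; the function names, the choice of min-max scaling, the random initialization, and the sample data are all assumptions made for illustration. Note that `k` has to be passed in by the caller, which is the first shortcoming named above, and that nothing in the sketch treats noisy points specially.

```python
import random


def min_max_normalize(points):
    """Rescale each feature to [0, 1]; constant features map to 0.0."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    return [
        tuple(
            (p[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        )
        for p in points
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Standard k-Means: k is supplied by the caller, noise is not handled."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random points
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical data: two features on very different scales.
data = [(1.0, 200.0), (1.2, 220.0), (0.8, 180.0),
        (9.0, 900.0), (9.5, 950.0), (8.7, 880.0)]
normalized = min_max_normalize(data)
labels, centroids = k_means(normalized, k=2)
```

Without the normalization step, the second feature would dominate the distance computation simply because its values are larger, which is one reason normalization is commonly applied before k-Means.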

Keywords

Algorithm, Clustering, k-Means, Preprocessing, Normalization

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Vaishali R. Patel (1)
  • Rupa G. Mehta (2)
  1. Department of Computer Science and Engineering, SVMIT, Bharuch, India
  2. Department of Computer Engineering, SVNIT, Surat, India
