An Empirical Comparative Study of Novel Clustering Algorithms for Class Imbalance Learning

Santhosh Kumar, Ch. N.; Rao, K. Nageswara; Govardhan, A.

doi:10.1007/978-81-322-2523-2_17


Conference paper
First Online: 01 January 2015


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 380)

Abstract

Data mining is the process of discovering knowledge from vast data sources. In data mining, classification and clustering are the two broad branches of study. In clustering, the k-means algorithm is one of the benchmark algorithms and is used in numerous applications. Its popularity is due to its efficiency and low memory usage. One shortcoming of k-means is that its performance degrades when it is applied to imbalanced data. The clusters generated by k-means tend to be relatively uniform in size even when the input data have non-uniform cluster sizes, a phenomenon referred to in the literature as the “uniform effect.” This paper proposes several novel algorithms to address this problem and compares them with one another. Experiments with the proposed algorithms on eleven UCI datasets, measured with the chosen evaluation metrics, show that the proposed algorithms are effective at addressing the “uniform effect.”
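The “uniform effect” described above can be illustrated with a small sketch. The code below is not one of the paper's proposed algorithms; it is plain Lloyd's k-means in one dimension, run on hypothetical toy data (90 evenly spaced majority points and 10 minority points) with an assumed deterministic initialisation at the two extreme values. The function names `lloyd_kmeans_1d`, `cluster_sizes` and `within_cluster_sse` are illustrative only.

```python
from typing import List, Sequence, Tuple


def lloyd_kmeans_1d(points: Sequence[float], init_centers: Sequence[float],
                    max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's k-means on 1-D data with caller-supplied initial centers.

    Returns the final centers and the cluster label of every point.
    """
    centers = list(init_centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (ties go to the lower index).
        new_labels = [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return centers, labels


def cluster_sizes(labels: Sequence[int], k: int) -> List[int]:
    """Number of points carrying each label 0..k-1."""
    return [sum(1 for lab in labels if lab == j) for j in range(k)]


def within_cluster_sse(points: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Sum of squared distances from each point to its own cluster mean."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            mean = sum(members) / len(members)
            total += sum((p - mean) ** 2 for p in members)
    return total


# Hypothetical toy data: 90 majority points evenly spaced on [0.0, 8.9]
# and 10 minority points on [10.0, 10.9], i.e. a true 90/10 split.
majority = [i * 0.1 for i in range(90)]
minority = [10.0 + i * 0.1 for i in range(10)]
data = majority + minority
true_labels = [0] * len(majority) + [1] * len(minority)

# Assumed deterministic initialisation: one center at each extreme value.
centers, labels = lloyd_kmeans_1d(data, [min(data), max(data)])

print(cluster_sizes(labels, 2))  # [53, 47]
print(round(within_cluster_sse(data, labels, 2)))       # SSE of the k-means partition
print(round(within_cluster_sse(data, true_labels, 2)))  # SSE of the true class partition
```

Although the true split is 90/10, the converged clustering is 53/47: k-means cuts through the majority class instead of isolating the minority, because that partition has a much lower within-cluster sum of squares (about 255, against about 608 for the true classes). The sketch only reproduces the problem; it says nothing about how the proposed algorithms address it.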



Author information

Authors and Affiliations

Department of CSE, JNTU-Hyderabad, Hyderabad, A.P, India
Ch. N. Santhosh Kumar
PSCMR College of Engineering and Technology, Vijayawada, A.P, India
K. Nageswara Rao
CSE & SIT, JNTU Hyderabad, Hyderabad, A.P, India
A. Govardhan


Corresponding author

Correspondence to Ch. N. Santhosh Kumar.

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
Department of CSE, CMR Technical Campus, Hyderabad, India
K. Srujan Raju
Computer Science & Engineering, Kalyani University, Nadia, West Bengal, India
Jyotsna Kumar Mandal
Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja


About this paper

Cite this paper

Santhosh Kumar, C.N., Rao, K.N., Govardhan, A. (2016). An Empirical Comparative Study of Novel Clustering Algorithms for Class Imbalance Learning. In: Satapathy, S., Raju, K., Mandal, J., Bhateja, V. (eds) Proceedings of the Second International Conference on Computer and Communication Technologies. Advances in Intelligent Systems and Computing, vol 380. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2523-2_17


DOI: https://doi.org/10.1007/978-81-322-2523-2_17
Published: 04 September 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2522-5
Online ISBN: 978-81-322-2523-2
eBook Packages: Engineering, Engineering (R0)
