Abstract
K-means is one of the most popular partition-based clustering algorithms; it partitions data objects, based on their attributes or features, into K groups or clusters. In this paper, we address the major issues affecting the performance of the k-means clustering algorithm. We propose and implement a new k-means-based clustering algorithm that detects and removes both global and local outliers and converges automatically to an optimal set of clusters. The clusters are formed by a two-part process: first, the initial clusters are split into subclusters based on a criterion applied at the local level; second, clusters that satisfy a nearness criterion are merged. Experiments show that our algorithm automatically generates an optimal number of clusters of different sizes and shapes that are free from global and local outliers.
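The outlier-removal, split, and merge stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the actual criteria, so the outlier rule (distance to the global centroid above mean + z standard deviations), the split rule (cluster radius above `max_radius`), the merge rule (centroid distance below `min_gap`), and every function and parameter name below are assumptions chosen for illustration. Local outlier detection is omitted entirely.

```python
import math
import statistics

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def remove_global_outliers(points, z=2.0):
    """ASSUMED rule: drop points farther than mean + z * stdev from the global centroid."""
    c = centroid(points)
    d = [math.dist(p, c) for p in points]
    cutoff = statistics.mean(d) + z * statistics.pstdev(d)
    return [p for p, di in zip(points, d) if di <= cutoff]

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]

def split(clusters, max_radius):
    """ASSUMED local criterion: bisect any cluster whose radius exceeds max_radius."""
    out, stack = [], list(clusters)
    while stack:
        g = stack.pop()
        c = centroid(g)
        if len(g) > 1 and max(math.dist(p, c) for p in g) > max_radius:
            stack.extend(kmeans(g, 2))
        else:
            out.append(g)
    return out

def merge(clusters, min_gap):
    """ASSUMED nearness criterion: merge the closest pair while centroids are under min_gap apart."""
    clusters = [list(g) for g in clusters]
    while len(clusters) > 1:
        d, i, j = min((math.dist(centroid(a), centroid(b)), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        if d >= min_gap:
            break
        absorbed = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(absorbed)
    return clusters

def cluster(points, k_init, max_radius, min_gap):
    """Outlier removal, initial k-means, then the split and merge passes."""
    clean = remove_global_outliers(points)
    return merge(split(kmeans(clean, k_init), max_radius), min_gap)

# Toy data: two tight groups and one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (100, 100)]
clusters = cluster(data, k_init=1, max_radius=2.0, min_gap=3.0)
print(sorted(len(c) for c in clusters))
```

In this sketch the final number of clusters is driven by `max_radius` and `min_gap` rather than by the initial `k_init`, which is the sense in which a split-then-merge scheme can reduce the dependence on choosing K in advance. On the toy data, starting from either one or four initial clusters should settle on the same two groups.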
Acknowledgements
We are thankful to Ms. Sana Balotrawala and Mr. Rushabh Shah, students of M.C.A., Department of Computer Science, Gujarat University, for helping with implementation.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jambudi, T., Gandhi, S. (2019). A New K-means-Based Algorithm for Automatic Clustering and Outlier Discovery. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems. Smart Innovation, Systems and Technologies, vol 107. Springer, Singapore. https://doi.org/10.1007/978-981-13-1747-7_44
DOI: https://doi.org/10.1007/978-981-13-1747-7_44
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1746-0
Online ISBN: 978-981-13-1747-7
eBook Packages: Intelligent Technologies and Robotics (R0)