A New K-means-Based Algorithm for Automatic Clustering and Outlier Discovery

Jambudi, Trushali; Gandhi, Savita

doi:10.1007/978-981-13-1747-7_44

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 107)

Abstract

K-means is one of the most popular partition-based clustering algorithms; it partitions data objects into K groups, or clusters, based on their attributes (features). In this paper, we address the major issues affecting the performance of the k-means clustering algorithm. We propose and implement a new k-means-based clustering algorithm that detects and removes both global and local outliers and converges automatically to optimal clusters. The clusters are formed by a two-part process: the first part splits the initial clusters into subclusters based on a criterion applied at the local level, and the second part merges the clusters that satisfy a nearness criterion. Experiments show that our algorithm automatically generates an optimal number of clusters of different sizes and shapes that are free from global and local outliers.
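
The abstract does not state the outlier rule, the split criterion, or the nearness criterion, so the following Python sketch only illustrates the overall shape of such a pipeline and is not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names, the "mean plus z standard deviations" distance rule for outliers, the SSE-reduction test for splitting (`split_gain`), the radius-based nearness test for merging (`merge_factor`), and the deterministic farthest-point seeding of k-means.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with farthest-point seeding (assumed, for determinism)."""
    centroids = [X[np.linalg.norm(X - X.mean(axis=0), axis=1).argmax()]]
    while len(centroids) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Recompute labels so they always match the returned centroids.
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return centroids, labels

def drop_far_points(X, z=3.0):
    """Assumed outlier rule: drop points farther from the mean than mean + z * std."""
    d = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return X[d <= d.mean() + z * d.std()]

def split_merge_clusters(X, k_init=2, z=3.0, split_gain=0.5, merge_factor=1.0):
    """Hypothetical pipeline: global outliers, k-means, local outliers, split, merge."""
    X = drop_far_points(np.asarray(X, dtype=float), z)       # global outliers
    _, labels = kmeans(X, k_init)
    stack = [drop_far_points(X[labels == j], z)               # local outliers
             for j in range(k_init) if np.any(labels == j)]

    # Split phase: accept a 2-way split only if it cuts the cluster's SSE enough.
    parts = []
    while stack:
        c = stack.pop()
        if len(c) < 4:
            parts.append(c)
            continue
        cen, lab = kmeans(c, 2)
        sse_one = ((c - c.mean(axis=0)) ** 2).sum()
        sse_two = sum(((c[lab == j] - cen[j]) ** 2).sum() for j in range(2))
        if sse_two < split_gain * sse_one:
            stack.extend([c[lab == 0], c[lab == 1]])
        else:
            parts.append(c)

    # Merge phase: join clusters whose centroids are nearer than their combined radii.
    merged = True
    while merged:
        merged = False
        for a in range(len(parts)):
            for b in range(a + 1, len(parts)):
                ca, cb = parts[a].mean(axis=0), parts[b].mean(axis=0)
                ra = np.linalg.norm(parts[a] - ca, axis=1).mean()
                rb = np.linalg.norm(parts[b] - cb, axis=1).mean()
                if np.linalg.norm(ca - cb) < merge_factor * (ra + rb):
                    parts[a] = np.vstack([parts[a], parts[b]])
                    del parts[b]
                    merged = True
                    break
            if merged:
                break
    return parts
```

Calling `split_merge_clusters(X)` on an `(n, d)` array returns a list of arrays, one per cluster, so the number of clusters is determined by the split and merge tests rather than fixed in advance. The thresholds interact: a loose `split_gain` combined with a large `merge_factor` can split a cluster and then merge it straight back, so both would need tuning for any real data set.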

Acknowledgements

We are thankful to Ms. Sana Balotrawala and Mr. Rushabh Shah, students of M.C.A., Department of Computer Science, Gujarat University, for helping with implementation.

Author information

Authors and Affiliations

School of Computer Studies, Ahmedabad University, Ahmedabad, Gujarat, India
Trushali Jambudi
Department of Computer Science, Gujarat University, Ahmedabad, Gujarat, India
Savita Gandhi

Corresponding author

Correspondence to Trushali Jambudi.

Editor information

Editors and Affiliations

School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India
Suresh Chandra Satapathy
Sabar Institute of Technology, Gujarat Technological University, Ahmedabad, Gujarat, India
Amit Joshi

Cite this paper

Jambudi, T., Gandhi, S. (2019). A New K-means-Based Algorithm for Automatic Clustering and Outlier Discovery. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems. Smart Innovation, Systems and Technologies, vol 107. Springer, Singapore. https://doi.org/10.1007/978-981-13-1747-7_44

DOI: https://doi.org/10.1007/978-981-13-1747-7_44
Published: 15 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1746-0
Online ISBN: 978-981-13-1747-7
eBook Packages: Intelligent Technologies and Robotics (R0)