A New K-means-Based Algorithm for Automatic Clustering and Outlier Discovery

  • Conference paper
Information and Communication Technology for Intelligent Systems

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 107)

Abstract

K-means is one of the most popular partition-based clustering algorithms; it partitions data objects into K groups, or clusters, based on their attributes (features). In this paper, we address the major issues affecting the performance of the k-means clustering algorithm. We propose and implement a new k-means-based clustering algorithm that detects and removes both global and local outliers and converges automatically to an optimal set of clusters. The clusters are formed by a two-part process: the first part splits the initial clusters into subclusters based on a criterion applied at the local level, and the second part merges the clusters that satisfy a nearness criterion. Experiments show that the algorithm automatically generates an optimal number of clusters of different sizes and shapes that are free from global and local outliers.
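The abstract does not state the outlier, splitting, or nearness criteria, so the following Python sketch is only a rough illustration of the general shape of such a pipeline and not the authors' algorithm. Everything specific in it is an assumption made for illustration: the outlier rule (distance to the centroid above mean + t * standard deviation), the use of a deliberately large initial cluster count `k_init` as a stand-in for the splitting step, the greedy centroid-distance merge with threshold `near`, and all function names.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def flag_outliers(points, t=3.0):
    """Assumed rule: flag points farther from the centroid than mean + t * std."""
    c = centroid(points)
    d = [math.dist(p, c) for p in points]
    mu = sum(d) / len(d)
    sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [p for p, x in zip(points, d) if x > mu + t * sd]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [
            centroid([p for p, l in zip(points, labels) if l == j]) if j in labels else cents[j]
            for j in range(k)
        ]
        if new == cents:
            break
        cents = new
    return cents, labels


def merge_close(clusters, near):
    """Greedily merge clusters whose centroids are closer than `near` (assumed criterion)."""
    merged = []
    for members in clusters:
        for target in merged:
            if math.dist(centroid(members), centroid(target)) < near:
                target.extend(members)
                break
        else:
            merged.append(list(members))
    return merged


def cluster(points, k_init=4, near=5.0, t=3.0):
    """Remove global outliers, over-segment with k-means, merge, then flag local outliers."""
    global_out = flag_outliers(points, t)
    kept = [p for p in points if p not in global_out]
    _, labels = kmeans(kept, k_init)
    groups = {}
    for p, l in zip(kept, labels):
        groups.setdefault(l, []).append(p)
    clusters = merge_close(list(groups.values()), near)
    local_out = [p for members in clusters for p in flag_outliers(members, t)]
    return clusters, global_out, local_out
```

Calling `cluster(points)` on a list of coordinate tuples returns the merged clusters together with the points flagged as global and local outliers. The thresholds `near` and `t` would need tuning per dataset, and the greedy merge depends on cluster order, so this is a starting point for experimentation only.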

Acknowledgements

We are thankful to Ms. Sana Balotrawala and Mr. Rushabh Shah, students of M.C.A., Department of Computer Science, Gujarat University, for helping with implementation.

Corresponding author

Correspondence to Trushali Jambudi.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Jambudi, T., Gandhi, S. (2019). A New K-means-Based Algorithm for Automatic Clustering and Outlier Discovery. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems. Smart Innovation, Systems and Technologies, vol 107. Springer, Singapore. https://doi.org/10.1007/978-981-13-1747-7_44
