Y-Means: An Autonomous Clustering Algorithm

  • Conference paper
Hybrid Artificial Intelligence Systems (HAIS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6076)

Abstract

This paper proposes an unsupervised clustering technique for data classification based on the K-means algorithm. The K-means algorithm is well known for its simplicity and low time complexity. However, it has three main drawbacks: dependency on the initial centroids, dependency on the number of clusters, and degeneracy (clusters that end up empty). Our solution addresses all three issues by automatically detecting a semi-optimal number of clusters according to the statistical nature of the data. As a side effect, the method also makes the choice of initial centroid seeds non-critical to the clustering results. The experimental results show the robustness of the Y-means algorithm, as well as its good performance compared with a set of other well-known unsupervised clustering techniques. Furthermore, we study the performance of the proposed solution with different distance and outlier-detection functions and recommend the best combinations.
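
The general idea in the abstract (run K-means, discard empty clusters to handle degeneracy, and let statistical outliers seed new clusters until none remain, so the final number of clusters no longer depends on the starting k) can be illustrated with the minimal sketch below. It is not the authors' implementation: the function names (`y_means_sketch`, `euclidean`, and the helpers), the Euclidean distance, the Tukey boxplot fence (Q3 + 1.5 × IQR of member-to-centroid distances) as the outlier-detection function, and the "seed a new cluster with the farthest outlier" rule are all assumptions chosen for illustration. The distance function is a parameter because the paper compares several; which ones, and which outlier rules, is not stated here.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (assumed default; any Distance can be passed in)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(members: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(members) for col in zip(*members)]


def _assign(points, centroids, dist):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        best = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[best].append(p)
    return clusters


def _kmeans(points, centroids, dist, max_iter=100):
    """Plain Lloyd-style K-means starting from the given centroids."""
    for _ in range(max_iter):
        clusters = _assign(points, centroids, dist)
        # An empty cluster keeps its old centroid here; the caller prunes it.
        updated = [_mean(m) if m else list(c) for c, m in zip(centroids, clusters)]
        if updated == centroids:
            break
        centroids = updated
    # Final assignment so centroids and clusters always correspond.
    return centroids, _assign(points, centroids, dist)


def y_means_sketch(points: List[Point], k0: int = 1, dist: Distance = euclidean,
                   fence: float = 1.5, max_rounds: int = 20, seed: int = 0):
    """Hypothetical Y-means-style loop: K-means, prune empty clusters, split on outliers."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k0)]
    clusters: List[List[Point]] = []
    for rnd in range(max_rounds):
        centroids, clusters = _kmeans(points, centroids, dist)
        # Degeneracy: drop centroids that attracted no points.
        kept = [(c, m) for c, m in zip(centroids, clusters) if m]
        centroids = [c for c, _ in kept]
        clusters = [m for _, m in kept]
        # Splitting (assumed rule): in each cluster, the farthest point beyond
        # Q3 + fence * IQR of member-to-centroid distances seeds a new cluster.
        new_seeds = []
        for c, members in kept:
            if len(members) < 4:  # too few points for meaningful quartiles
                continue
            ds = [dist(p, c) for p in members]
            q1, _, q3 = statistics.quantiles(ds, n=4, method="inclusive")
            limit = q3 + fence * (q3 - q1)
            far = [(d, p) for d, p in zip(ds, members) if d > limit]
            if far:
                new_seeds.append(list(max(far, key=lambda t: t[0])[1]))
        # Stop when no cluster has outliers (or the round budget is spent).
        if not new_seeds or rnd == max_rounds - 1:
            break
        centroids = centroids + new_seeds
    return centroids, clusters
```

For example, started from a single seed (`k0=1`) on a 3×3 grid of points around the origin plus three points near (50, 50), this sketch ends with two clusters of 9 and 3 points. A rule that looks only at distances to one centroid will not always split balanced groups, which is one reason the choice of outlier-detection function matters.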

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ghorbani, A.A., Onut, IV. (2010). Y-Means: An Autonomous Clustering Algorithm. In: Graña Romay, M., Corchado, E., Garcia Sebastian, M.T. (eds) Hybrid Artificial Intelligence Systems. HAIS 2010. Lecture Notes in Computer Science, vol. 6076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13769-3_1

  • DOI: https://doi.org/10.1007/978-3-642-13769-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13768-6

  • Online ISBN: 978-3-642-13769-3

  • eBook Packages: Computer Science, Computer Science (R0)
