Y-Means: An Autonomous Clustering Algorithm

Ghorbani, Ali A.; Onut, Iosif-Viorel

doi:10.1007/978-3-642-13769-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6076)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

This paper proposes an unsupervised clustering technique for data classification based on the K-means algorithm. K-means is well known for its simplicity and low time complexity, but it has three main drawbacks: dependency on the initial centroids, dependency on the number of clusters, and degeneracy. Our solution addresses all three by automatically detecting a semi-optimal number of clusters according to the statistical nature of the data. As a side effect, the choice of initial centroid seeds is no longer critical to the clustering results. Experimental results show the robustness of the Y-means algorithm and its good performance against a set of other well-known unsupervised clustering techniques. Furthermore, we study the performance of the proposed solution under different distance and outlier-detection functions and recommend the best combinations.
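
The K-means drawbacks named in the abstract can be illustrated with a minimal sketch of plain K-means. This is not the Y-means algorithm, whose details are not given on this page. The function names (`k_means`, `sse`), the toy data, and the seeds are all hypothetical and chosen only for illustration, and degeneracy is taken here to mean a cluster that ends up with no members.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], seeds: List[Point], max_iter: int = 100):
    """Plain K-means starting from caller-supplied seeds (illustrative only).

    Returns (centroids, labels, empty), where `empty` lists the indices of
    clusters that received no points.
    """
    centroids = [tuple(s) for s in seeds]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    empty = [j for j in range(len(centroids)) if j not in labels]
    return centroids, labels, empty


def sse(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))


# Toy data (an assumption): the four corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Dependency on the initial centroids: two seedings, two different results.
good = k_means(points, [(0.0, 0.0), (4.0, 0.0)])
poor = k_means(points, [(2.0, 0.0), (2.0, 1.0)])
print(sse(points, good[0], good[1]))  # 1.0
print(sse(points, poor[0], poor[1]))  # 16.0

# Degeneracy: a badly placed seed attracts no points at all.
print(k_means(points, [(0.0, 0.0), (100.0, 100.0)])[2])  # [1]
```

In this sketch the number of clusters is fixed by the number of seeds the caller passes in, which is the third dependency the abstract mentions.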

Author information

Authors and Affiliations

Faculty of Computer Science, University of New Brunswick, Fredericton, Canada
Ali A. Ghorbani & Iosif-Viorel Onut

Editor information

Editors and Affiliations

Facultad de informatica UPV/EHU, San Sebastian, Spain
Manuel Graña Romay & M. Teresa Garcia Sebastian
Universidad de Salamanca, Spain
Emilio Corchado

Cite this paper

Ghorbani, A.A., Onut, I.-V. (2010). Y-Means: An Autonomous Clustering Algorithm. In: Graña Romay, M., Corchado, E., Garcia Sebastian, M.T. (eds) Hybrid Artificial Intelligence Systems. HAIS 2010. Lecture Notes in Computer Science, vol 6076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13769-3_1

DOI: https://doi.org/10.1007/978-3-642-13769-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13768-6
Online ISBN: 978-3-642-13769-3
eBook Packages: Computer Science, Computer Science (R0)
