A New Clustering Algorithm Based on K-Means Using a Line Segment as Prototype

  • Juan Carlos Rojas Thomas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7042)

Abstract

This work presents the development of a new clustering algorithm, based on k-means, which addresses its problems with clusters of different variances. The new algorithm uses a line segment as prototype, which captures the axis of largest variance of the cluster. The line segment iteratively adjusts its length and direction as the data are classified. To perform the classification, a border region that approximates the boundary of the cluster is built from a geometric model that depends on the central line segment. The data are then classified according to their proximity to the different border regions. The process is repeated until the parameters of all the border regions associated with each cluster remain constant.
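The abstract describes an assign-and-refit loop in which each cluster is represented by a central line segment rather than a single centroid. The following is a minimal sketch of that general idea, not the paper's algorithm: the geometric border-region model is not specified in the abstract, so plain point-to-segment distance stands in for it, and the names `segment_kmeans`, `fit_segment`, and `dist_to_segment` are illustrative only.

```python
# Hypothetical sketch: k clusters, each represented by a central line segment
# oriented along its axis of largest variance; points are reassigned by
# distance to the nearest segment. The paper's border-region geometry is
# replaced here by plain point-to-segment distance (an assumption).
import numpy as np


def fit_segment(points):
    """Fit a segment to `points`: centered on the mean, oriented along the
    first principal axis, spanning the extent of the projected points."""
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    direction = vt[0]                       # axis of largest variance
    proj = (points - mean) @ direction      # scalar projections onto the axis
    return mean + proj.min() * direction, mean + proj.max() * direction


def dist_to_segment(x, a, b):
    """Euclidean distance from point x to the segment [a, b]."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else np.clip((x - a) @ ab / denom, 0.0, 1.0)
    return np.linalg.norm(x - (a + t * ab))


def segment_kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise each prototype as a degenerate segment at a random data point.
    idx = rng.choice(len(X), size=k, replace=False)
    segments = [(X[i].copy(), X[i].copy()) for i in idx]
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Assignment step: nearest central line segment.
        new_labels = np.array([
            np.argmin([dist_to_segment(x, a, b) for a, b in segments])
            for x in X
        ])
        if it > 0 and np.array_equal(new_labels, labels):
            break                           # prototypes no longer change
        labels = new_labels
        # Update step: refit each segment to its cluster members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 1:
                segments[j] = fit_segment(members)
    return labels, segments


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two elongated clusters with clearly different variances.
    c1 = rng.normal([0, 0], [4.0, 0.5], size=(200, 2))
    c2 = rng.normal([10, 5], [1.0, 0.3], size=(200, 2))
    labels, segments = segment_kmeans(np.vstack([c1, c2]), k=2)
    print("cluster sizes:", np.bincount(labels))
```

Because the prototype is a segment rather than a point, an elongated, high-variance cluster does not pull in points that a nearby compact cluster should own, which is the failure mode of standard k-means that the abstract targets.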

Keywords

Clustering · K-means · Variance · Central Line Segment · Border Region

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Juan Carlos Rojas Thomas
  1. Universidad de Atacama, Copiapó, Chile
