Novel Approaches to Unsupervised Clustering Through k-Windows Algorithm

  • Conference paper
Knowledge Mining

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 185)

Summary

The extraction of meaningful information from large collections of data is a fundamental issue in science. To this end, clustering algorithms are typically employed to identify groups (clusters) of similar objects. A critical issue for any clustering algorithm is the determination of the number of clusters present in a dataset. In this contribution we present a clustering algorithm that, in addition to partitioning the data into clusters, approximates the number of clusters during its execution. We further present modifications of this algorithm for different distributed environments and for dynamic databases. Finally, we present a modification of the algorithm that exploits the fractal dimension of the data to partition the dataset.
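
As a rough illustration of the quantity mentioned in the last sentence, the following minimal sketch estimates the fractal dimension of a point set by box counting: it counts the grid boxes N(r) of side r that contain at least one point and takes the least-squares slope of log N(r) against log(1/r). This is a generic textbook estimator, not the chapter's algorithm, and the function names `box_count` and `box_counting_dimension` are hypothetical names chosen for this example.

```python
import math


def box_count(points, box_size):
    """Count the distinct grid boxes of side `box_size` that hold at least one point."""
    occupied = {tuple(math.floor(c / box_size) for c in p) for p in points}
    return len(occupied)


def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting dimension of `points`.

    Returns the least-squares slope of log N(r) against log(1/r),
    where N(r) is the number of occupied boxes of side r.
    (Illustrative helper; not taken from the chapter.)
    """
    xs = [math.log(1.0 / r) for r in box_sizes]
    ys = [math.log(box_count(points, r)) for r in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
    # Points along the diagonal of the unit square (a one-dimensional set).
    line = [(i / 1024, i / 1024) for i in range(1024)]
    # Points filling the unit square on a regular grid (a two-dimensional set).
    square = [(i / 32, j / 32) for i in range(32) for j in range(32)]
    print("line:", box_counting_dimension(line, sizes))
    print("square:", box_counting_dimension(square, sizes))
```

For the two synthetic sets in the usage section, the estimates should come out near 1 and 2 respectively. How such an estimate is used to decide where to split a dataset is specific to the chapter and is not reproduced here.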

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tasoulis, D.K., Vrahatis, M.N. (2005). Novel Approaches to Unsupervised Clustering Through k-Windows Algorithm. In: Sirmakessis, S. (eds) Knowledge Mining. Studies in Fuzziness and Soft Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32394-5_5

  • DOI: https://doi.org/10.1007/3-540-32394-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25070-8

  • Online ISBN: 978-3-540-32394-5

  • eBook Packages: Engineering, Engineering (R0)
