Summary
The extraction of meaningful information from large collections of data is a fundamental issue in science. To this end, clustering algorithms are typically employed to identify groups (clusters) of similar objects. A critical issue for any clustering algorithm is determining the number of clusters present in a dataset. In this contribution we present a clustering algorithm that, in addition to partitioning the data into clusters, approximates the number of clusters during its execution. We further present modifications of this algorithm for different distributed environments and for dynamic databases. Finally, we present a modification of the algorithm that exploits the fractal dimension of the data to partition the dataset.
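The summary does not spell out how the number of clusters is approximated. As a rough illustration only, the sketch below assumes a simplified k-windows-style scheme: axis-aligned windows are moved to the mean of the points they enclose, enlarged one dimension at a time while this captures enough new points, and finally merged when they overlap heavily, so that the number of merged groups serves as the estimate of the number of clusters. All function names, the thresholds `theta_v`, `theta_e`, `theta_c`, `theta_m`, and their default values are hypothetical, and the single merge threshold is a simplification; this is not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Window:
    center: list  # centre of the axis-aligned box, one entry per dimension
    half: list    # half edge length of the box, one entry per dimension


def inside(p, w):
    """True if point p lies in the (closed) box described by window w."""
    return all(abs(x - c) <= h for x, c, h in zip(p, w.center, w.half))


def members(data, w):
    """Indices of the data points enclosed by window w."""
    return {i for i, p in enumerate(data) if inside(p, w)}


def move(data, w, theta_v=1e-9, max_iter=100):
    """Movement: shift the centre to the mean of the enclosed points."""
    for _ in range(max_iter):
        idx = members(data, w)
        if not idx:
            return
        pts = [data[i] for i in idx]
        mean = [sum(coord) / len(pts) for coord in zip(*pts)]
        shift = max(abs(m - c) for m, c in zip(mean, w.center))
        w.center = mean
        if shift < theta_v:
            return


def enlarge(data, w, theta_e=0.8, theta_c=0.05):
    """Enlargement: grow one dimension at a time while enough points are gained."""
    grew = True
    while grew:
        grew = False
        for d in range(len(w.half)):
            before = len(members(data, w))
            trial = Window(list(w.center), list(w.half))
            trial.half[d] *= 1 + theta_e
            move(data, trial)
            # Keep the larger window only if it encloses noticeably more points.
            if len(members(data, trial)) > before * (1 + theta_c):
                w.center, w.half = trial.center, trial.half
                grew = True


def unsupervised_k_windows(data, k, a, theta_m=0.5, seed=0):
    """Return (estimated number of clusters, label per point; -1 = unassigned)."""
    rng = random.Random(seed)
    # Start from k windows of half edge a centred on randomly chosen points.
    windows = [Window(list(p), [a] * len(p)) for p in rng.sample(data, k)]
    for w in windows:
        move(data, w)
        enlarge(data, w)

    sets = [members(data, w) for w in windows]
    parent = list(range(len(windows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merging: windows whose overlap holds at least theta_m of the smaller
    # window's points are taken to describe the same cluster.
    for i in range(len(windows)):
        for j in range(i + 1, len(windows)):
            small = min(len(sets[i]), len(sets[j]))
            if small and len(sets[i] & sets[j]) / small >= theta_m:
                parent[find(i)] = find(j)

    roots = sorted({find(i) for i in range(len(windows)) if sets[i]})
    cluster_id = {r: n for n, r in enumerate(roots)}
    labels = [-1] * len(data)
    for i, s in enumerate(sets):
        for p in s:
            if labels[p] == -1:
                labels[p] = cluster_id[find(i)]
    return len(roots), labels


if __name__ == "__main__":
    blob = [(0.05 * i, 0.05 * j) for i in range(5) for j in range(5)]
    data = blob + [(10 + x, 10 + y) for x, y in blob]
    n_clusters, labels = unsupervised_k_windows(data, k=len(data), a=1.0)
    print(n_clusters)  # 2
```

In the demo, one window is seeded at every point of two well-separated blobs; the windows inside each blob converge onto the same box and merge, so the estimate is 2 even though 50 windows were started. On real data the estimate depends on the initial half edge `a` and on the thresholds, which this sketch makes no attempt to tune.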
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tasoulis, D.K., Vrahatis, M.N. (2005). Novel Approaches to Unsupervised Clustering Through k-Windows Algorithm. In: Sirmakessis, S. (ed.) Knowledge Mining. Studies in Fuzziness and Soft Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32394-5_5
DOI: https://doi.org/10.1007/3-540-32394-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25070-8
Online ISBN: 978-3-540-32394-5
eBook Packages: Engineering, Engineering (R0)