Novel Approaches to Unsupervised Clustering Through k-Windows Algorithm

Tasoulis, D. K.; Vrahatis, M. N.

doi:10.1007/3-540-32394-5_5

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 185)

Summary

The extraction of meaningful information from large collections of data is a fundamental issue in science. To this end, clustering algorithms are typically employed to identify groups (clusters) of similar objects. A critical issue for any clustering algorithm is determining the number of clusters present in a dataset. In this contribution we present a clustering algorithm that, in addition to partitioning the data into clusters, approximates the number of clusters during its execution. We further present modifications of this algorithm for different distributed environments and for dynamic databases. Finally, we present a modification of the algorithm that exploits the fractal dimension of the data to partition the dataset.
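
The summary describes the unsupervised k-windows idea only in outline. The Python sketch below is a simplified, hypothetical illustration of that idea. It assumes the window movement, enlargement, and merging phases of the k-windows algorithm introduced by Vrahatis, Boutsinas, Alevizos, and Pavlides (Journal of Complexity, 2002): start with k boxes of edge length a, re-center each box on the mean of the points it encloses, enlarge it while doing so still captures enough new points, then merge boxes whose point sets overlap. All names (k_windows, move, enlarge, a, theta_v, theta_e, theta_c, theta_m, min_points) and all default thresholds are assumptions made for this sketch. The range query is a brute-force scan rather than a dedicated range-search structure, and the merge rule is simplified, so this is not the authors' exact procedure.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class Window:
    """Axis-aligned box: a center plus one half-width per dimension."""

    def __init__(self, center: Sequence[float], half: Sequence[float]):
        self.center = list(center)
        self.half = list(half)

    def members(self, data: Sequence[Point]) -> List[int]:
        # Brute-force orthogonal range query: indices of points inside the box.
        return [i for i, p in enumerate(data)
                if all(abs(x - c) <= h for x, c, h in zip(p, self.center, self.half))]


def move(w: Window, data: Sequence[Point], theta_v: float = 1e-4, max_iter: int = 100) -> None:
    # Movement (assumed): re-center the box on the mean of its points until it settles.
    for _ in range(max_iter):
        idx = w.members(data)
        if not idx:
            return
        new_center = [sum(data[i][d] for i in idx) / len(idx) for d in range(len(w.center))]
        shift = max(abs(a - b) for a, b in zip(new_center, w.center))
        w.center = new_center
        if shift < theta_v:
            return


def enlarge(w: Window, data: Sequence[Point], theta_e: float = 0.1,
            theta_c: float = 0.05, max_rounds: int = 50) -> None:
    # Enlargement (assumed): widen one dimension at a time, then re-center; keep the
    # change only if the point count rises by more than the fraction theta_c.
    for _ in range(max_rounds):
        grew = False
        for d in range(len(w.half)):
            before = len(w.members(data))
            saved_center, saved_half = list(w.center), w.half[d]
            w.half[d] = saved_half * (1.0 + theta_e)
            move(w, data)
            after = len(w.members(data))
            if before > 0 and (after - before) / before > theta_c:
                grew = True
            else:  # not worth it: undo both the widening and the re-centering
                w.center, w.half[d] = saved_center, saved_half
        if not grew:
            return


def k_windows(data: Sequence[Point], k: int, a: float, theta_m: float = 0.5,
              min_points: int = 5, seed: int = 0) -> Tuple[int, List[int]]:
    """Return (estimated number of clusters, cluster label per point; -1 = uncovered)."""
    rng = random.Random(seed)
    dim = len(data[0])
    windows = [Window(c, [a / 2] * dim) for c in rng.sample(list(data), k)]
    for w in windows:
        move(w, data)
        enlarge(w, data)
    # Discard windows that captured too few points.
    members = [set(w.members(data)) for w in windows]
    members = [m for m in members if len(m) >= max(1, min_points)]

    # Merging (simplified): windows sharing at least theta_m of the smaller window's
    # points are joined with union-find; each resulting group counts as one cluster.
    parent = list(range(len(members)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(members)):
        for j in range(i + 1, len(members)):
            if len(members[i] & members[j]) / min(len(members[i]), len(members[j])) >= theta_m:
                parent[find(i)] = find(j)

    group_id: dict = {}
    labels = []
    for p in range(len(data)):
        owner = next((i for i, m in enumerate(members) if p in m), None)
        labels.append(-1 if owner is None else group_id.setdefault(find(owner), len(group_id)))
    return len({find(i) for i in range(len(members))}), labels


# Usage: two well-separated blobs of 100 uniform points each, 40 initial windows.
rng = random.Random(42)
data = ([(rng.random(), rng.random()) for _ in range(100)]
        + [(10 + rng.random(), 10 + rng.random()) for _ in range(100)])
n_clusters, labels = k_windows(data, k=40, a=1.5)
print(n_clusters, labels[0], labels[-1])
```

Because overlapping windows are merged, the cluster estimate is the number of merged groups rather than the initial k; in this toy run the 40 starting windows should collapse into one group per blob. On real data the outcome depends on the chosen thresholds, and a practical implementation would replace the linear scan in Window.members with an orthogonal range-search structure.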

Author information

Authors and Affiliations

Computational Intelligence Laboratory, Department of Mathematics, University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, GR-26110, Patras, Greece
D. K. Tasoulis & M. N. Vrahatis

Editor information

Editors and Affiliations

Research Academic Computer Technology Institute, 61 Riga Feraiou Str., 26221 Patras, Greece
Spiros Sirmakessis

About this paper

Cite this paper

Tasoulis, D.K., Vrahatis, M.N. (2005). Novel Approaches to Unsupervised Clustering Through k-Windows Algorithm. In: Sirmakessis, S. (ed.) Knowledge Mining. Studies in Fuzziness and Soft Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32394-5_5

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25070-8
Online ISBN: 978-3-540-32394-5
eBook Packages: Engineering, Engineering (R0)
