Sequential Hierarchical Pattern Clustering
Clustering is a widely used unsupervised data analysis technique in machine learning. However, many existing clustering methods require all pairwise distances between patterns to be computed in advance. This requirement makes them computationally expensive and difficult to apply to the large-scale data found in several applications, such as bioinformatics. In this paper we propose a novel sequential hierarchical clustering technique that first builds a hierarchical tree from a small fraction of the data; the remaining data are then processed sequentially and the tree is adapted constructively. Preliminary results show that this approach reduces the computational cost without degrading the quality of the clusters obtained.
Keywords: On-line clustering, Hierarchical clustering, Large scale data, Gene expression
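As a rough illustration of the idea described in the abstract, the following Python sketch seeds a tree from a small fraction of the data and then inserts the remaining points one at a time. It is not the authors' algorithm: the centroid-linkage seed tree, the nearest-child routing rule, the leaf-splitting step, and all names (`Node`, `build_initial_tree`, `insert`, `cluster_sequentially`, `seed_fraction`) are assumptions made for this example.

```python
import math


class Node:
    """Tree node storing the centroid and number of points beneath it."""

    def __init__(self, centroid, count=1, left=None, right=None):
        self.centroid = list(centroid)
        self.count = count
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge(a, b):
    """Parent node whose centroid is the count-weighted mean of its children."""
    n = a.count + b.count
    centroid = [(a.count * x + b.count * y) / n
                for x, y in zip(a.centroid, b.centroid)]
    return Node(centroid, n, a, b)


def build_initial_tree(points):
    """Naive centroid-linkage agglomeration; quadratic, so used on the seed only."""
    nodes = [Node(p) for p in points]
    while len(nodes) > 1:
        # Pick the closest pair of current clusters.
        i, j = min(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda ij: dist(nodes[ij[0]].centroid, nodes[ij[1]].centroid),
        )
        merged = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def insert(root, point):
    """Assumed update rule: descend to the nearest leaf, then split that leaf."""
    new_leaf = Node(point)
    if root.is_leaf():
        return merge(root, new_leaf)
    parent, node = None, root
    while not node.is_leaf():
        # Update the running centroid and count of every node on the path.
        n = node.count + 1
        node.centroid = [(node.count * c + x) / n
                         for c, x in zip(node.centroid, point)]
        node.count = n
        parent = node
        if dist(point, node.left.centroid) <= dist(point, node.right.centroid):
            node = node.left
        else:
            node = node.right
    joined = merge(node, new_leaf)
    if parent.left is node:
        parent.left = joined
    else:
        parent.right = joined
    return root


def cluster_sequentially(points, seed_fraction=0.1):
    """Seed the tree from the first fraction of points, then stream the rest."""
    n_seed = max(2, int(len(points) * seed_fraction))
    root = build_initial_tree(points[:n_seed])
    for p in points[n_seed:]:
        root = insert(root, p)
    return root


def leaves(root):
    """Collect the centroids of all leaves (one per inserted point)."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        if node.is_leaf():
            out.append(node.centroid)
        else:
            stack.extend((node.left, node.right))
    return out


# Toy data: two well-separated groups, half used as the seed sample.
points = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9],
          [0.1, 0.3], [4.8, 5.0], [0.3, 0.2], [5.2, 5.1]]
root = cluster_sequentially(points, seed_fraction=0.5)
print(root.count, len(leaves(root)))
```

In this sketch, only the seed sample incurs the all-pairs distance cost; each later point is compared against two child centroids per level of the tree. Whether cluster quality is preserved depends on the actual update rule, which the abstract does not specify.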