Sequential Hierarchical Pattern Clustering

  • Bassam Farran
  • Amirthalingam Ramanan
  • Mahesan Niranjan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


Clustering is a widely used unsupervised data analysis technique in machine learning. However, many existing clustering methods require all pairwise distances between patterns to be computed in advance. This requirement makes them computationally expensive and poorly suited to the large-scale data found in several applications, such as bioinformatics. In this paper we propose a novel sequential hierarchical clustering technique that first builds a hierarchical tree from a small fraction of the data; the remaining data are then processed sequentially and the tree is adapted constructively. Preliminary results show that this approach reduces the computational cost without degrading the quality of the clusters obtained.
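
The two-phase scheme in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the linkage or the rule used to adapt the tree, so the centroid linkage, the nearest-centroid descent, and all names (`Node`, `build_initial_tree`, `insert`, `leaves`) are assumptions made purely for illustration.

```python
import math
import random


class Node:
    """Binary tree node: a leaf holds one pattern, an internal node two subtrees."""

    def __init__(self, centroid, size=1, left=None, right=None):
        self.centroid, self.size = centroid, size
        self.left, self.right = left, right

    def is_leaf(self):
        return self.left is None


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge(a, b):
    """Join two subtrees under a new parent with a size-weighted centroid."""
    n = a.size + b.size
    c = tuple((a.size * x + b.size * y) / n for x, y in zip(a.centroid, b.centroid))
    return Node(c, n, a, b)


def build_initial_tree(points):
    """Phase 1 (assumed centroid linkage): naive agglomerative clustering.

    All pairwise distances are needed here, but only for the small sample.
    """
    nodes = [Node(tuple(p)) for p in points]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes)))
        i, j = min(pairs, key=lambda ij: dist(nodes[ij[0]].centroid, nodes[ij[1]].centroid))
        merged = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def insert(root, point):
    """Phase 2 (assumed rule): descend to the nearest child centroid, updating
    centroids along the path, then pair the new pattern with the leaf reached."""
    point = tuple(point)
    new_leaf = Node(point)
    if root.is_leaf():
        return merge(root, new_leaf)
    node = root
    while True:
        # Running-mean update of the centroid of every node on the path.
        node.centroid = tuple(
            (node.size * c + p) / (node.size + 1) for c, p in zip(node.centroid, point)
        )
        node.size += 1
        child = min((node.left, node.right), key=lambda ch: dist(ch.centroid, point))
        if child.is_leaf():
            grown = merge(child, new_leaf)
            if child is node.left:
                node.left = grown
            else:
                node.right = grown
            return root
        node = child


def leaves(node):
    """Collect the patterns stored at the leaves of a subtree."""
    if node.is_leaf():
        return [node.centroid]
    return leaves(node.left) + leaves(node.right)


# Synthetic data: two Gaussian blobs, presented in random order.
random.seed(0)
data = [(random.gauss(cx, 0.3), random.gauss(cy, 0.3))
        for cx, cy in [(0, 0), (5, 5)] for _ in range(50)]
random.shuffle(data)

tree = build_initial_tree(data[:10])   # small initial fraction
for p in data[10:]:                    # remaining patterns, one at a time
    tree = insert(tree, p)
```

In this sketch each sequential insertion costs two distance evaluations per tree level visited, whereas a batch agglomerative method would need the distances between all pairs of patterns before it could start.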


Keywords: On-line clustering, Hierarchical clustering, Large scale data, Gene expression



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Bassam Farran (1)
  • Amirthalingam Ramanan (1)
  • Mahesan Niranjan (1)
  1. School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
