Self-Organizing Map and Other Clustering Methods in Transcriptomics

  • Xuhua Xia


The self-organizing map (SOM) is an artificial neural network algorithm that has been used frequently in transcriptomic data analysis, in particular for clustering co-expressed genes as a basis for inferring co-regulated genes. It can be applied to any set of objects as long as a distance function can be defined between objects. SOM is illustrated numerically alongside a simple UPGMA method to contrast the two. A less well-known application of SOM is discovering heterogeneous motifs in a set of sequences, which makes it more general than the Gibbs sampler for de novo motif discovery. Both approaches are illustrated: one takes a (gene × expression) matrix as input, and the other takes a set of sequences, each of which may contain multiple but heterogeneous protein-binding sites.
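To make the first use case concrete, the following is a minimal sketch of SOM clustering on a (gene × expression) matrix. It is not the chapter's own numerical illustration: the toy expression values, the two-node one-dimensional map, the Euclidean distance, the linear decay schedules, and all function and parameter names (train_som, best_matching_unit, lr0, sigma0, sigma_end) are assumptions chosen for illustration only.

```python
import math

# Toy (gene x condition) expression matrix. All values are invented for
# illustration: three genes rise across conditions, three genes fall.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 2.1, 2.9, 4.2],
    [0.8, 1.9, 3.1, 3.9],
    [4.0, 3.0, 2.0, 1.0],
    [4.1, 2.8, 2.2, 0.9],
    [3.9, 3.2, 1.8, 1.1],
]

def euclidean(a, b):
    """Distance between two expression profiles (any distance could be used)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matching_unit(weights, profile, dist=euclidean):
    """Index of the map node whose weight vector is closest to the profile."""
    return min(range(len(weights)), key=lambda k: dist(weights[k], profile))

def train_som(data, n_nodes=2, epochs=50, lr0=0.5, sigma0=1.0, sigma_end=0.1):
    """Train a 1-D SOM with n_nodes >= 2 nodes; returns the node weight vectors."""
    # Deterministic initialization (an assumption): copy evenly spaced data rows.
    step = (len(data) - 1) / (n_nodes - 1)
    weights = [list(data[round(k * step)]) for k in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                       # learning rate decays linearly
        sigma = sigma0 + (sigma_end - sigma0) * frac  # neighborhood width shrinks
        for profile in data:
            bmu = best_matching_unit(weights, profile)
            for k, w in enumerate(weights):
                # Gaussian neighborhood on the map grid: the winning node moves
                # most, and nodes farther away on the grid move less.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for j in range(len(w)):
                    w[j] += lr * h * (profile[j] - w[j])
    return weights

weights = train_som(expr)
# Each gene is assigned to the cluster of its best-matching node.
clusters = {g: best_matching_unit(weights, p) for g, p in zip(genes, expr)}
for g in genes:
    print(g, "-> node", clusters[g])
```

Because best_matching_unit accepts any dist function, the same loop would in principle apply to other kinds of objects once a suitable distance is defined, which is the property the abstract emphasizes. Unlike a hierarchical method such as UPGMA, this sketch yields a flat assignment of genes to map nodes rather than a tree.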



Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  • Xuhua Xia (1)
  1. University of Ottawa, CAREG and Biology Department, Ottawa, Canada
