Self-Organizing Map and Other Clustering Methods in Transcriptomics
The self-organizing map (SOM) is an artificial neural network algorithm that has been used frequently in transcriptomic data analysis, in particular for clustering co-expressed genes as a basis for inferring co-regulated genes. It can be applied to any set of objects as long as a distance function can be defined between them. SOM is illustrated numerically alongside the simple UPGMA method to contrast the two. A less well-known application of SOM is the discovery of heterogeneous motifs in a set of sequences, which makes it more general than the Gibbs sampler for de novo motif discovery. Both approaches are illustrated: one takes a (gene × expression) matrix as input, and the other takes a set of sequences, each of which may contain multiple, heterogeneous protein-binding sites.
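To make the first use case concrete, the following is a minimal sketch of SOM clustering on a (gene × expression) matrix, not the chapter's own implementation. It assumes a one-dimensional map (a line of nodes), Euclidean distance, a Gaussian neighbourhood, and linearly decaying learning rate and radius. The toy matrix and the names `train_som`, `assign`, and `expression` are hypothetical and introduced only for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_som(data, n_nodes=2, n_iter=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM (a line of nodes) and return the node weight vectors."""
    rng = random.Random(seed)
    # Initialise each node at a randomly chosen input profile (assumption).
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.05)   # neighbourhood shrinks, with a floor
        x = rng.choice(data)                         # sample one gene profile
        # Best-matching unit: the node closest to the sampled profile.
        bmu = min(range(n_nodes), key=lambda j: euclidean(weights[j], x))
        for j in range(n_nodes):
            # Gaussian neighbourhood measured on the node grid (here a line).
            h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
            weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def assign(data, weights):
    """Map each profile to the index of its nearest node."""
    return [
        min(range(len(weights)), key=lambda j: euclidean(weights[j], x))
        for x in data
    ]


# Hypothetical toy matrix: 6 genes (rows) x 4 conditions (columns).
# Genes 0-2 rise across conditions; genes 3-5 fall.
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.1, 3.9],
    [4.0, 3.0, 2.0, 1.0],
    [4.2, 2.9, 2.1, 1.1],
    [3.8, 3.1, 1.9, 0.8],
]

weights = train_som(expression)
labels = assign(expression, weights)
print(labels)  # one node index per gene
```

Because only `euclidean` touches the raw profiles, swapping in another distance function is enough to apply the same sketch to other kinds of objects, in line with the point that SOM needs nothing more than a distance between objects.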