Abstract
Long-lived genomics consortia generate massive high-dimensional datasets over many months or years, with substantial blocks of data added over time. Algorithms that characterize and cluster such data are typically designed to run once on a complete dataset, so any analysis of these collections must be redone from scratch every time a new block of data is added. We describe a novel progressive clustering approach based on a variation of the self-organizing map (SOM) algorithm, which we call the Living SOM. Our software package clusters high-dimensional data with all of the power of regular SOMs, with the added benefit of incorporating additional datasets as they become available while preserving the initial structure as much as possible. This allows us to evaluate the impact of new datasets on previous analyses, with the potential to keep classifications intact where appropriate. We demonstrate the power of this technique on a collection of gene expression experiments from an embryonic time course of mouse development generated by the ENCODE consortium.
Acknowledgments
We would like to thank the Wold lab at Caltech for providing the data for this work as well as Dana Wyman in the Mortazavi lab at UCI for feedback. Funding for this work was provided by NHGRI UM1 HG009443 to AM.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jansen, C., Mortazavi, A. (2020). Progressive Clustering and Characterization of Increasingly Higher Dimensional Datasets with Living Self-organizing Maps. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J. (eds) Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization. WSOM 2019. Advances in Intelligent Systems and Computing, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-19642-4_28
DOI: https://doi.org/10.1007/978-3-030-19642-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19641-7
Online ISBN: 978-3-030-19642-4
eBook Packages: Intelligent Technologies and Robotics (R0)