Abstract
Due to recent breakthroughs in high-throughput sequencing, it is now possible to use chromosome conformation capture (CCC) to study the three-dimensional conformation of DNA at the whole-genome level, and to characterize it with so-called contact maps. This is very useful, since many biological processes, such as DNA transcription, are correlated with DNA folding. However, the methods available for analyzing such conformations still lack mathematical guarantees and statistical power. To address this issue, we propose to use Mapper, a standard tool of Topological Data Analysis (TDA) that efficiently encodes the inherent continuity and topology of the biological processes underlying the data, in the form of a graph with features such as branches and loops. In this article, we show how recent statistical techniques developed in TDA for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures arising from biological phenomena, such as the cell cycle, in datasets of CCC contact maps.
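To make the idea of a Mapper graph with loops concrete, the following is a minimal sketch on toy data, not the pipeline used in this chapter. It assumes a one-dimensional filter, a cover of the filter range by overlapping intervals, and single-linkage clustering at a fixed distance threshold; the names `mapper_graph`, `count_loops`, `resolution`, `gain`, and `eps` are hypothetical and chosen only for illustration. The input is a set of points sampled on a circle rather than real contact maps.

```python
import math
from itertools import combinations


def components(n_nodes, edges):
    """Label connected components with union-find; returns one root per node."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n_nodes)]


def mapper_graph(points, filter_values, resolution, gain, eps):
    """Toy 1-D Mapper (illustrative assumption, not the chapter's method).

    Covers the filter range with `resolution` intervals whose consecutive
    overlap is a fraction `gain` of their length, clusters each preimage by
    linking points at distance <= eps, and connects clusters sharing points.
    """
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / (1 + (resolution - 1) * (1 - gain))
    step = length * (1 - gain)
    tol = 1e-9  # guards interval endpoints against rounding error
    nodes = []
    for i in range(resolution):
        a = lo + i * step
        b = a + length
        idx = [j for j, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        close = [(u, v) for u, v in combinations(range(len(idx)), 2)
                 if math.dist(points[idx[u]], points[idx[v]]) <= eps]
        labels = components(len(idx), close)
        for root in set(labels):
            nodes.append({idx[k] for k, lab in enumerate(labels) if lab == root})
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


def count_loops(nodes, edges):
    """First Betti number of a graph: edges - vertices + components."""
    n_comp = len(set(components(len(nodes), edges)))
    return len(edges) - len(nodes) + n_comp


# Toy data: 100 points on the unit circle, filtered by their x-coordinate.
n = 100
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]
heights = [p[0] for p in circle]
nodes, edges = mapper_graph(circle, heights, resolution=4, gain=0.3, eps=0.2)
print(len(nodes), len(edges), count_loops(nodes, edges))
```

On this toy circle the graph has six nodes and six edges forming one cycle, so `count_loops` reports a single loop. Whether a loop found in real data reflects a genuine signal rather than a choice of `resolution`, `gain`, or `eps` is exactly the kind of question the statistical techniques mentioned in the abstract are meant to address.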
Acknowledgements
This work has been funded by NIH grants (U54 CA193313 and U54 CA209997) and a Chan Zuckerberg Initiative pilot grant.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Carrière, M., Rabadán, R. (2020). Topological Data Analysis of Single-Cell Hi-C Contact Maps. In: Baas, N., Carlsson, G., Quick, G., Szymik, M., Thaule, M. (eds) Topological Data Analysis. Abel Symposia, vol 15. Springer, Cham. https://doi.org/10.1007/978-3-030-43408-3_6
DOI: https://doi.org/10.1007/978-3-030-43408-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43407-6
Online ISBN: 978-3-030-43408-3
eBook Packages: Mathematics and Statistics (R0)