Local Topological Data Analysis to Uncover the Global Structure of Data Approaching Graph-Structured Topologies

  • Robin Vandaele
  • Tijl De Bie
  • Yvan Saeys
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

Gene expression data of differentiating cells, galaxies distributed in space, and earthquake locations all share a common property: in their respective spaces, they lie close to a graph-structured topology [1, 4, 9, 10, 20], known in mathematics as a one-dimensional stratified space. Uncovering such topologies often offers great insight into these data sets. However, methods for dimensionality reduction are inappropriate for this purpose, and methods from the relatively new field of Topological Data Analysis (TDA) also fall short, due to noise sensitivity, computational complexity, or other limitations. In this paper we introduce a new method, termed Local TDA (LTDA), which resolves the issues of pre-existing methods by unveiling (global) graph-structured topologies in data by means of robust and computationally cheap local analyses. Our method rests on a simple graph-theoretic result that enables one to identify isolated, end-, edge- and multifurcation points in the topology underlying the data. It then uses this information to piece together a graph that is homeomorphic to the unknown one-dimensional stratified space underlying the point cloud data. We evaluate our method on a number of artificial and real-life data sets, demonstrating its superior effectiveness, robustness against noise, and scalability. Code related to this paper is available at: https://bitbucket.org/ghentdatascience/gltda-public.
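
To make the idea of classifying points by a local analysis concrete, the following is a minimal illustrative sketch and not the authors' implementation (which is available at the repository above). It assumes one simple local rule: around each point, take the other points lying in an annulus, link those closer than a threshold, and count the resulting connected components. Zero, one, two, or more components are read as isolated, end-, edge- and multifurcation points respectively. The function names and the parameters `r_inner`, `r_outer` and `eps` are hypothetical choices made for this example.

```python
import math

# Component count in the local annulus -> point type (3 or more: multifurcation).
KIND = {0: "isolated", 1: "end", 2: "edge"}


def count_components(points, indices, eps):
    """Count connected components among points[indices], linking pairs within eps."""
    parent = {j: j for j in indices}

    def find(j):
        # Union-find lookup with path halving.
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j

    for pos, a in enumerate(indices):
        for b in indices[pos + 1:]:
            if math.dist(points[a], points[b]) <= eps:
                parent[find(a)] = find(b)
    return len({find(j) for j in indices})


def classify_points(points, r_inner, r_outer, eps):
    """Label each point using only its annulus r_inner < d <= r_outer (assumed rule)."""
    labels = []
    for i, p in enumerate(points):
        shell = [j for j, q in enumerate(points)
                 if j != i and r_inner < math.dist(p, q) <= r_outer]
        n = count_components(points, shell, eps)
        labels.append(KIND.get(n, "multifurcation"))
    return labels


def y_shape(step=0.1, n=10):
    """Toy data: three straight arms meeting at the origin (index 0)."""
    pts = [(0.0, 0.0)]
    for deg in (90, 210, 330):
        a = math.radians(deg)
        pts += [(k * step * math.cos(a), k * step * math.sin(a))
                for k in range(1, n + 1)]
    return pts


points = y_shape() + [(5.0, 5.0)]  # one far-away point added to the Y shape
labels = classify_points(points, r_inner=0.15, r_outer=0.35, eps=0.15)
for idx in (0, 6, 10, 31):
    print(idx, labels[idx])
```

On this noise-free toy example the branch point, an interior arm point, an arm tip and the far-away point receive four different labels. The quadratic-time loops and the hand-picked radii are simplifications for readability; the sketch makes no claim about how the paper selects scales or handles noise.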

Keywords

  • Topological Data Analysis
  • Persistent homology
  • Metric spaces
  • Graph theory
  • Stratified spaces

Notes

Acknowledgments

This work was funded by the ERC under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement no. 615517, and the FWO (G091017N, G0F9816N).

References

  1. Aanjaneya, M., Chazal, F., Chen, D., Glisse, M., Guibas, L., Morozov, D.: Metric graph reconstruction from noisy data. Int. J. Comput. Geom. Appl. 22(04), 305–325 (2012)
  2. Korte, B., Vygen, J.: Combinatorial Optimization: Theory and Algorithms. Springer, Heidelberg (2012). https://doi.org/10.1007/3-540-29297-7
  3. Cámara, P.G., Rosenbloom, D.I.S., Emmett, K.J., Levine, A.J., Rabadán, R.: Topological data analysis generates high-resolution, genome-wide maps of human recombination. Cell Syst. 3(1), 83–94 (2016)
  4. Cannoodt, R., Saelens, W., Saeys, Y.: Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 46(11), 2496–2506 (2016)
  5. Carlsson, G.: Topology and data. Bull. Am. Math. Soc. 46(2), 255–308 (2009). https://doi.org/10.1090/S0273-0979-09-01249-X
  6. Carlsson, G.: Topological pattern recognition for point cloud data (2013)
  7. Carlsson, G., Ishkhanov, T., de Silva, V., Zomorodian, A.: On the local behavior of spaces of natural images. Int. J. Comput. Vis. 76(1), 1–12 (2008)
  8. Chazal, F., Cohen-Steiner, D., Mérigot, Q.: Geometric inference for measures based on distance functions (2009)
  9. Choi, E., Bond, N.A., Strauss, M.A., Coil, A.L., Davis, M., Willmer, C.N.A.: Tracing the filamentary structure of the galaxy distribution at \(z \sim 0.8\). Mon. Not. R. Astron. Soc. 406(1), 320–328 (2010)
  10. De Baets, L., Van Gassen, S., Dhaene, T., Saeys, Y.: Unsupervised trajectory inference using graph mining. In: Angelini, C., Rancoita, P.M.V., Rovetta, S. (eds.) CIBB 2015. LNCS, vol. 9874, pp. 84–97. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44332-4_7
  11. Fasy, B.T., Wang, B.: Exploring persistent local homology in topological data analysis. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6430–6434 (2016)
  12. Ghrist, R.: Barcodes: the persistent topology of data. Bull. (New Ser.) Am. Math. Soc. 45(107), 61–75 (2008)
  13. Giusti, C., Ghrist, R., Bassett, D.S.: Two’s company, three (or more) is a simplex. J. Comput. Neurosci. 41(1), 1–14 (2016)
  14. Hatcher, A.: Algebraic Topology. Cambridge University Press, Cambridge (2002)
  15. Hopcroft, J.E., Ullman, J.D.: Set merging algorithms. SIAM J. Comput. 2(4), 294–303 (1973)
  16. Lapaugh, A.S., Rivest, R.L.: The subgraph homeomorphism problem. J. Comput. Syst. Sci. 20(2), 133–149 (1980)
  17. Medina, P., Doerge, R.: Statistical methods in topological data analysis for complex, high-dimensional data. In: Annual Conference on Applied Statistics in Agriculture (2015)
  18. Nanda, V., Sazdanović, R.: Simplicial models and topological inference in biological systems. In: Jonoska, N., Saito, M. (eds.) Discrete and Topological Models in Molecular Biology. NCS, pp. 109–141. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-40193-0_6
  19. Nicolau, M., Levine, A.J., Carlsson, G.: Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc. Nat. Acad. Sci. 108(17), 7265–7270 (2011)
  20. Rizvi, A.H., et al.: Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Nat. Biotechnol. 35, 551–560 (2017)
  21. Wang, B., Summa, B., Pascucci, V., Vejdemo-Johansson, M.: Branching and circular features in high dimensional data. IEEE Trans. Visual. Comput. Graph. 17, 1902–1911 (2011)
  22. Wang, K.: The basic theory of persistent homology (2012)
  23. Wasserman, L.: Topological data analysis. Ann. Rev. Stat. Appl. 5(1), 501–532 (2018)
  24. Zomorodian, A., Carlsson, G.: Computing persistent homology. Discrete Comput. Geom. 33(2), 249–274 (2005)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IDLab, Department of Electronics and Information Systems, Ghent University, Gent, Belgium
  2. Data Mining and Modelling for Biomedicine (DaMBi), VIB Inflammation Research Center, Gent, Belgium
