On Topological Data Mining

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8401)

Abstract

Humans are very good at recognizing patterns in at most three dimensions. However, most data, e.g. in the biomedical domain, have far more than three dimensions, which makes manual analysis awkward and sometimes practically impossible. Mapping higher-dimensional data into lower dimensions is therefore a major task in Human–Computer Interaction and Interactive Data Visualization, and a concerted effort that includes recent advances in computational topology may help to make sense of such data. Topology has its roots in the works of Euler and Gauss, but for a long time it remained part of theoretical mathematics. Within the last ten years, computational topology has rapidly gained interest among computer scientists. Topology is basically the study of abstract shapes and spaces and of the mappings between them; it originated from the study of geometry and set theory. Topological methods can be applied to data represented by point clouds, that is, finite subsets of n-dimensional Euclidean space. We can think of the input as a sample of some unknown space that one wishes to reconstruct and understand, and we must distinguish between the ambient (embedding) dimension n and the intrinsic dimension of the data. Whilst n is usually high, the intrinsic dimension, which is of primary interest, is typically small. Knowing the intrinsic dimensionality of the data can therefore be seen as a first step towards understanding its structure. Consequently, applying topological techniques to data mining and knowledge discovery is an active and promising area for future research.
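
The distinction between ambient and intrinsic dimension can be illustrated with a small sketch. It is not taken from the chapter: the function name `linear_intrinsic_dimension`, the 99% variance threshold, and the synthetic point cloud are all assumptions made for illustration. The sketch uses principal component analysis via a singular value decomposition, which detects only linear structure; a curved space such as a circle would need a different estimator.

```python
import numpy as np


def linear_intrinsic_dimension(points, variance_threshold=0.99):
    """Count the principal directions needed to reach the variance threshold.

    Hypothetical helper: a linear (PCA-based) estimate only.
    """
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative share reaches the threshold, plus one.
    return int(np.searchsorted(cumulative, variance_threshold) + 1)


rng = np.random.default_rng(0)
n_ambient = 50    # ambient (embedding) dimension n
k_intrinsic = 2   # intrinsic dimension of the synthetic data

# Orthonormal basis of a random 2-dimensional plane inside R^50.
basis, _ = np.linalg.qr(rng.normal(size=(n_ambient, k_intrinsic)))

# Point cloud: 500 samples on that plane, expressed in ambient coordinates.
latent = rng.normal(size=(500, k_intrinsic))
cloud = latent @ basis.T

estimated = linear_intrinsic_dimension(cloud)
print(f"ambient dimension: {cloud.shape[1]}, estimated intrinsic dimension: {estimated}")
```

Every point here has 50 coordinates, yet two directions account for essentially all of the variance, so the estimate recovers the dimension of the plane from which the sample was drawn.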

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Holzinger, A. (2014). On Topological Data Mining. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_19

  • DOI: https://doi.org/10.1007/978-3-662-43968-5_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43967-8

  • Online ISBN: 978-3-662-43968-5

  • eBook Packages: Computer Science (R0)
