Cluster Identification via Persistent Homology and Other Clustering Techniques, with Application to Liver Transplant Data

Chapter in: Research in Computational Topology

Abstract

Clustering, an unsupervised learning method, can be very useful in detecting hidden patterns in complex and/or high-dimensional data. Persistent homology, a recently developed branch of computational topology, studies the evolution of topological features under a varying filtration parameter. At a fixed filtration parameter value, one can find different topological features in a dataset, such as connected components (zero-dimensional topological features), loops (one-dimensional topological features), and, more generally, k-dimensional holes (k-dimensional topological features). In the classical sense, clusters correspond to zero-dimensional topological features. We explore whether higher-dimensional homology can contribute to detecting hidden patterns in data. We observe that some loops formed in survival data appear to detect outliers that other clustering techniques miss. We analyze patterns of patients in terms of their covariates and survival time, and we identify the most important predictors of survival time for liver transplant patients by applying a random survival forest.
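To illustrate the statement that clusters correspond to zero-dimensional topological features, the following is a minimal sketch and not the authors' implementation. It assumes a Vietoris-Rips filtration on a small hypothetical point cloud, where every point starts as its own connected component at filtration value 0 and a component dies when an edge first merges it into another. The function names h0_persistence and n_clusters and the sample points are illustrative assumptions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return death times of connected components (zero-dimensional features).

    Assumes a Vietoris-Rips filtration: an edge between two points enters
    the filtration at their Euclidean distance. All components are born at 0.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the filtration value at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # This edge merges two components, so one of them dies here.
            parent[ri] = rj
            deaths.append(length)

    # One component survives for every filtration value.
    deaths.append(math.inf)
    return deaths


def n_clusters(deaths, eps):
    """Number of connected components still alive at filtration value eps."""
    return sum(1 for d in deaths if d > eps)


# Hypothetical data: two well-separated groups of points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
deaths = h0_persistence(points)
print(deaths)
print(n_clusters(deaths, 2.0))
```

In this sketch, components with long lifetimes play the role of clusters, while those that die at small filtration values are merged early. Loops and higher-dimensional holes need the full simplicial complex and are not covered by this union-find shortcut.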

Acknowledgements

We would like to thank Matthew Pietrosanu for his support with the computations. We also thank Jisu Kim and Dmitriy Morozov for their insightful discussions and suggestions. We would like to acknowledge funding support provided by the McIntyre Memorial Fund and NSERC DG 293180.

Author information

Corresponding author

Correspondence to Giseon Heo.

Copyright information

© 2018 The Author(s) and the Association for Women in Mathematics

About this chapter

Cite this chapter

Wubie, B.A. et al. (2018). Cluster Identification via Persistent Homology and Other Clustering Techniques, with Application to Liver Transplant Data. In: Chambers, E., Fasy, B., Ziegelmeier, L. (eds) Research in Computational Topology. Association for Women in Mathematics Series, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-89593-2_9
