Abstract
Clustering, an unsupervised learning method, can reveal hidden patterns in complex and/or high-dimensional data. Persistent homology, a recently developed branch of computational topology, studies how topological features evolve as a filtration parameter varies. At a fixed value of the filtration parameter, a dataset can exhibit different topological features: connected components (zero-dimensional features), loops (one-dimensional features), and, more generally, k-dimensional holes (k-dimensional features). In the classical sense, clusters correspond to zero-dimensional topological features. We explore whether higher-dimensional homology can contribute to detecting hidden patterns in data. We observe that some loops formed in survival data appear to detect outliers that other clustering techniques miss. We analyze patterns of patients in terms of their covariates and survival time, and we identify the most important predictors of survival time for liver transplant patients by applying a random survival forest.
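The correspondence between clusters and zero-dimensional features can be made concrete with a small sketch. The Python code below is an illustration only and is not the chapter's computation. It assumes a Euclidean point cloud and the convention that two points are joined once their distance is at most the scale value; the function name `connected_components` and the toy points are hypothetical. It counts connected components at several fixed scales with a union-find structure.

```python
from itertools import combinations
from math import dist


def connected_components(points, scale):
    """Group point indices whose chains of pairwise distances are <= scale."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that is close enough at this scale.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= scale:
            parent[find(i)] = find(j)

    # Collect indices by the root of their component.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data, made up for illustration: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (5.0, 5.0), (5.5, 5.0),
          (10.0, 0.0)]

for scale in (0.1, 1.0, 6.0, 8.0):
    print(scale, len(connected_components(points, scale)))
```

As the scale grows, components merge and their number can only decrease. Persistent homology records the scales at which such merges happen, and it tracks loops and higher-dimensional holes in the same way, which this sketch does not attempt.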
Acknowledgements
We thank Matthew Pietrosanu for his support in the computation process, and Jisu Kim and Dmitriy Morozov for their insightful discussions and suggestions. We acknowledge funding support provided by the McIntyre Memorial Fund and NSERC DG 293180.
Copyright information
© 2018 The Author(s) and the Association for Women in Mathematics
About this chapter
Cite this chapter
Wubie, B.A. et al. (2018). Cluster Identification via Persistent Homology and Other Clustering Techniques, with Application to Liver Transplant Data. In: Chambers, E., Fasy, B., Ziegelmeier, L. (eds) Research in Computational Topology. Association for Women in Mathematics Series, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-89593-2_9
DOI: https://doi.org/10.1007/978-3-319-89593-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89592-5
Online ISBN: 978-3-319-89593-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)