Cluster Identification via Persistent Homology and Other Clustering Techniques, with Application to Liver Transplant Data

Wubie, Berhanu A.; Andres, Axel; Greiner, Russell; Hoehn, Bret; Montano-Loza, Aldo; Kneteman, Norman; Heo, Giseon

doi:10.1007/978-3-319-89593-2_9

Part of the book series: Association for Women in Mathematics Series (AWMS, volume 13)

Abstract

Clustering, an unsupervised learning method, can help detect hidden patterns in complex or high-dimensional data. Persistent homology, a recently developed branch of computational topology, studies how topological features evolve as a filtration parameter varies. At a fixed value of the filtration parameter, a dataset exhibits different topological features, such as connected components (zero-dimensional features), loops (one-dimensional features), and, more generally, k-dimensional holes (k-dimensional features). In the classical sense, clusters correspond to zero-dimensional topological features. We explore whether higher-dimensional homology can contribute to detecting hidden patterns in data. We observe that some loops formed in survival data appear to detect outliers that other clustering techniques miss. We analyze patterns of patients in terms of their covariates and survival time, and we apply a random survival forest to identify the predictor variables most important for predicting the survival times of liver transplant patients.
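
To make the link between clusters and zero-dimensional features concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Vietoris-Rips filtration whose parameter is the pairwise Euclidean distance threshold (some conventions use half that value), and the names `h0_barcode` and `n_clusters` are hypothetical helpers introduced here for illustration. Every point is born as its own component at scale 0, and a component dies when an edge first merges it into another, so the death times are the edge lengths picked up by a union-find pass over the sorted edges.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Zero-dimensional persistence bars (birth, death) of a point cloud.

    Assumption: Rips-style filtration indexed by edge length, so an edge
    of length d enters the complex at scale d.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            bars.append((0.0, length))
    # One component survives at every scale.
    bars.append((0.0, inf))
    return bars


def n_clusters(bars, scale):
    """Number of connected components alive at a fixed filtration value."""
    return sum(1 for birth, death in bars if birth <= scale < death)


if __name__ == "__main__":
    # Toy data: a tight group of three points and a distant pair.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
    bars = h0_barcode(pts)
    for scale in (0.5, 2.0, 20.0):
        print(scale, n_clusters(bars, scale))
```

Long bars correspond to well-separated clusters, and reading off the number of bars alive at one scale is equivalent to cutting a single-linkage dendrogram at that height. Loops and higher-dimensional holes cannot be obtained this way; they require a boundary-matrix reduction, which this sketch omits.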

Acknowledgements

We would like to thank Matthew Pietrosanu for his support in the computation process. We also thank Jisu Kim and Dmitriy Morozov for their insightful discussions and suggestions. We would like to acknowledge funding support provided by McIntyre Memorial Fund and NSERC DG 293180.

Author information

Authors and Affiliations

Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Canada
Berhanu A. Wubie
Service de chirurgie viscérale et transplantation, Hôpitaux Universitaires de Genève, Geneva, Switzerland
Axel Andres
Computing Science, University of Alberta, Edmonton, Canada
Russell Greiner
Alberta Innovates Centre for Machine Learning, Edmonton, Canada
Bret Hoehn
Hepatology, Department of Medicine, University of Alberta Hospital, Edmonton, Canada
Aldo Montano-Loza
Transplantation surgery, Dept. of Surgery, University of Alberta Hospital, Edmonton, Canada
Norman Kneteman
School of Dentistry, University of Alberta, Edmonton, Alberta, Canada
Giseon Heo

Corresponding author

Correspondence to Giseon Heo.

Editor information

Editors and Affiliations

Department of Computer Science, St Louis University, St Louis, Missouri, USA
Erin Wolf Chambers
Gianforte School of Computing, and Department of Mathematical Sciences, Montana State University, Bozeman, Montana, USA
Brittany Terese Fasy
Department of Mathematics, Statistics, & Computer Science, Macalester College, Saint Paul, Minnesota, USA
Lori Ziegelmeier

About this chapter

Cite this chapter

Wubie, B.A. et al. (2018). Cluster Identification via Persistent Homology and Other Clustering Techniques, with Application to Liver Transplant Data. In: Chambers, E., Fasy, B., Ziegelmeier, L. (eds) Research in Computational Topology. Association for Women in Mathematics Series, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-89593-2_9

DOI: https://doi.org/10.1007/978-3-319-89593-2_9
Published: 31 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89592-5
Online ISBN: 978-3-319-89593-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
