Applied Intelligence, Volume 49, Issue 5, pp 1675–1687

Large-margin learning of Cox proportional hazard models for survival analysis

  • Minyoung Kim


Machine learning approaches have recently been applied to prediction tasks in survival analysis. However, most existing methods learn the prognostic function directly via linear regression or ranking models and therefore cannot exploit an underlying density family, notably the well-known Cox proportional hazards (CoxPH) model. In this paper we propose a novel estimator for the CoxPH model based on the margin maximization principle, which has been shown to yield strong generalization performance in standard classification problems in machine learning. Censored data are handled by incorporating a cost-sensitive margin-violation loss. We demonstrate improved prediction performance on several survival datasets.
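For context, the CoxPH model assumes a hazard of the form h(t | x) = h_0(t) exp(βᵀx), and it is conventionally fitted by maximizing Cox's partial likelihood, in which right-censored subjects contribute only through the risk sets of observed events. The minimal NumPy sketch below computes that standard negative log partial likelihood, with tied event times handled by the Breslow approximation. It illustrates the classical likelihood-based objective, not the large-margin estimator proposed in the paper; the function name `cox_neg_log_partial_likelihood` and its argument layout are illustrative assumptions.

```python
import numpy as np


def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative log partial likelihood of the CoxPH model (Breslow ties).

    Illustrative sketch (hypothetical helper, not from the paper).
    beta:   (d,) coefficient vector
    X:      (n, d) covariate matrix
    times:  (n,) observed times (event time or censoring time)
    events: (n,) 1 if the event was observed, 0 if right-censored
    """
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    eta = X @ np.asarray(beta, dtype=float)  # linear predictor beta^T x_i

    nll = 0.0
    # Only observed events contribute a term; censored subjects appear
    # solely inside the risk sets of earlier events.
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]  # risk set R(t_i), ties included (Breslow)
        # Stable log-sum-exp of eta over the risk set.
        m = eta[at_risk].max()
        log_denom = m + np.log(np.exp(eta[at_risk] - m).sum())
        nll -= eta[i] - log_denom
    return nll


if __name__ == "__main__":
    # Toy data: three subjects, the second one right-censored.
    X = [[0.5], [1.0], [-0.3]]
    times = [2.0, 3.0, 5.0]
    events = [1, 0, 1]
    print(cox_neg_log_partial_likelihood([0.2], X, times, events))
```

Minimizing this function over β (for example with a generic optimizer such as `scipy.optimize.minimize`) gives the classical maximum partial likelihood estimate, which is the baseline that a margin-based estimator of the CoxPH model would be compared against.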


Keywords: Large-margin learning · Density estimation · Cox proportional hazard models · Survival analysis


Funding Information

This study was supported by the Research Program funded by SeoulTech (Seoul National University of Science & Technology).

Compliance with Ethical Standards

Conflict of interest

The authors have no conflict of interest.

Consent for Publication

Consent to submit this manuscript has been received tacitly from the authors’ institution, Seoul National University of Science & Technology.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electronics & IT Media Engineering, Seoul National University of Science & Technology, Seoul, Korea
