Variable Selection via Generalized SELO-Penalized Cox Regression Models

  • Yueyong Shi
  • Deyi Xu
  • Yongxiu Cao
  • Yuling Jiao


The seamless-L0 (SELO) penalty is a smooth function that closely resembles the L0 penalty and has been shown, both theoretically and in practice, to be effective for nonconvex penalized variable selection. In this paper, the authors first generalize the SELO penalty to a class of penalties that retain the good features of SELO, and then develop variable selection and parameter estimation in Cox models using the proposed generalized SELO (GSELO) penalized log partial likelihood (PPL) approach. The authors show that the GSELO-PPL procedure possesses the oracle property with a diverging number of predictors under certain mild, interpretable regularity conditions. The entire solution path of GSELO-PPL estimates can be computed efficiently by a smoothing quasi-Newton (SQN) algorithm with continuation. The authors also propose a consistent modified BIC (MBIC) tuning parameter selector for GSELO-PPL and show that, under some regularity conditions, the GSELO-PPL-MBIC procedure consistently identifies the true model. Simulation studies and a real data analysis are conducted to evaluate the finite-sample performance of the proposed method.
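To make the objects named in the abstract concrete, the sketch below codes the original SELO penalty of Dicker, Huang and Lin (2013), p(β) = (λ/log 2)·log(|β|/(|β|+τ) + 1), the negative Cox log partial likelihood, and a penalized objective combining the two. It is a minimal illustration under stated assumptions, not the authors' implementation. The GSELO form `λ·P(|β|/(|β|+τ))/P(1)` with an increasing `P` satisfying `P(0) = 0` is an assumption about how the generalization is parameterized. Ties are handled by Breslow's approximation, and the 1/n scaling of the partial likelihood is one common convention. All function names are hypothetical, and the SQN-with-continuation solver and the MBIC selector are not reproduced.

```python
import math
from typing import Callable, Sequence


def selo_penalty(beta: float, lam: float, tau: float) -> float:
    """SELO penalty (Dicker, Huang and Lin, 2013):
    (lam / log 2) * log(|beta| / (|beta| + tau) + 1).
    Equals 0 at beta = 0 and tends to lam as |beta| grows, so for small tau
    it behaves like lam * 1{beta != 0}."""
    t = abs(beta)
    return lam / math.log(2.0) * math.log(t / (t + tau) + 1.0)


def gselo_penalty(beta: float, lam: float, tau: float,
                  P: Callable[[float], float] = lambda x: math.log(x + 1.0)) -> float:
    """ASSUMED GSELO form: lam * P(|beta| / (|beta| + tau)) / P(1),
    with P increasing and P(0) = 0. The default P(x) = log(x + 1) gives SELO."""
    t = abs(beta)
    return lam * P(t / (t + tau)) / P(1.0)


def neg_log_partial_likelihood(beta: Sequence[float], times: Sequence[float],
                               events: Sequence[int],
                               X: Sequence[Sequence[float]]) -> float:
    """Negative Cox log partial likelihood (Breslow handling of ties):
    -sum_{i: event} [eta_i - log sum_{j: T_j >= T_i} exp(eta_j)], eta = X beta."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            # Risk set: subjects still under observation at time t_i.
            risk = sum(math.exp(e) for e, t_j in zip(eta, times) if t_j >= t_i)
            loglik += eta[i] - math.log(risk)
    return -loglik


def penalized_objective(beta, times, events, X, lam, tau,
                        penalty=selo_penalty) -> float:
    """ASSUMED scaling: -l(beta)/n + sum_j p(beta_j)."""
    n = len(times)
    return (neg_log_partial_likelihood(beta, times, events, X) / n
            + sum(penalty(b, lam, tau) for b in beta))


if __name__ == "__main__":
    times, events = [1.0, 2.0, 3.0], [1, 1, 1]
    X = [[0.5], [1.0], [-0.2]]
    print(round(neg_log_partial_likelihood([0.0], times, events, X), 4))  # 1.7918 (= log 6)
    print(round(selo_penalty(1e6, lam=1.0, tau=0.01), 6))               # 1.0
```

With small τ the penalty rises from 0 at β = 0 to almost λ once |β| is well above τ, which is the L0-like behaviour the abstract refers to. The kink at β = 0, coming from |β|, is the nonsmoothness that a smoothing method such as SQN has to handle. For three distinct, uncensored times and β = 0, the risk sets have sizes 3, 2 and 1, so the negative log partial likelihood is log 3 + log 2 + log 1 = log 6 ≈ 1.7918.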


Keywords: Continuation; Cox models; generalized SELO; modified BIC; penalized likelihood; smoothing quasi-Newton



Copyright information

© Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Yueyong Shi (1, 2; corresponding author)
  • Deyi Xu (1, 2)
  • Yongxiu Cao (3)
  • Yuling Jiao (3)

  1. School of Economics and Management, China University of Geosciences, Wuhan, China
  2. Center for Resources and Environmental Economic Research, China University of Geosciences, Wuhan, China
  3. School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
