High-Dimensional Classification

Part of the Springer Handbooks of Computational Statistics book series (SHCS)


There are three fundamental goals in constructing a good high-dimensional classifier: high accuracy, interpretable feature selection, and efficient computation. In the past 15 years, several popular high-dimensional classifiers have been developed and studied in the literature. These classifiers can be roughly divided into two categories: sparse penalized margin-based classifiers and sparse discriminant analysis. In this chapter we give a comprehensive review of these classifiers.
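To illustrate the first category, here is a minimal sketch (not taken from the chapter) of a sparse penalized classifier: L1-penalized logistic regression, which pursues all three goals at once by fitting an accurate margin-based rule whose penalty drives most coefficients exactly to zero, so the surviving features are the selected variables. The data, sample sizes, and penalty level below are illustrative assumptions; scikit-learn and NumPy are assumed available.

```python
# Sketch: sparse penalized classification via L1-penalized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5            # n samples, p >> n features, s truly relevant
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                   # only the first s features carry signal
y = (X @ beta + 0.5 * rng.standard_normal(n) > 0).astype(int)

# The L1 penalty shrinks most coefficients to exactly zero, giving an
# interpretable sparse decision rule; C controls the penalty strength.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_)
print(f"{selected.size} of {p} features selected")
```

Sparse discriminant analysis methods, the second category reviewed in the chapter, instead impose sparsity on estimates of the discriminant direction rather than on a penalized margin-based loss.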


Keywords: Classifier · Bayes rule · High-dimensional data · Regularization · Sparsity · Variable selection



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

School of Statistics, University of Minnesota, Minneapolis, USA
