High-Dimensional Data Classification

  • Vijay Pappu
  • Panos M. Pardalos
Part of the Springer Optimization and Its Applications book series (SOIA, volume 92)


High-dimensional classification problems have become ubiquitous in recent years owing to significant advances in technology. High dimensionality poses significant statistical challenges and renders many traditional classification algorithms impractical. In this chapter, we present a comprehensive overview of classifiers that have been highly successful on high-dimensional data classification problems. We start with popular methods such as Support Vector Machines and variants of discriminant functions, and discuss in detail their applications and modifications for several problems in high-dimensional settings. We also examine regularization techniques and their integration into several existing algorithms. We then discuss more recent methods, namely hybrid classifiers and ensemble classifiers. Feature selection techniques are introduced as part of hybrid classifiers, and their relative merits and drawbacks are examined. Lastly, we describe two ensemble classifiers, AdaBoost and Random Forests, and discuss their recent rise as useful algorithms for solving high-dimensional data problems.


High dimensional data classification; Ensemble methods; Feature selection; Curse of dimensionality; Regularization



This research is partially supported by NSF and DTRA grants.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Industrial and Systems Engineering, University of Florida, Gainesville, USA
