Machine Learning

Abstract

This tutorial provides a brief overview of a number of important tools that form the crux of the modern machine learning toolbox. These tools can be used for supervised learning, unsupervised learning, reinforcement learning, and the numerous variants of these developed over the years. Owing to space limitations, this tutorial is not intended to be comprehensive; interested readers are referred to conference proceedings such as Neural Information Processing Systems (NIPS) and the International Conference on Machine Learning (ICML) for the most recent advances.

Keywords

Manifold · Covariance · Beach · Harness · Lasso
Abbreviations

BIC – Bayesian information criterion
BMA – Bayes model averaging
BMF – binary matrix factorization
BYY – Bayesian Ying-Yang
DCA – de-correlated component analysis
EM – expectation maximization
FA – factor analysis
HMM – hidden Markov model
HT – Hough transform
ICA – independent component analysis
ICML – International Conference on Machine Learning
IFA – independent factor analysis
KPCA – kernel principal component analysis
LDA – linear discriminant analysis
LFA – local factor analysis
MCA – minor component analysis
MDL – minimum description length
MDP – Markov decision process
MIL – multi-instance learning
MIML – multi-instance multi-label learning
MLR – multi-response linear regression
MSA – minor subspace analysis
MTFL – multi-task feature learning
MTL – multi-task learning
NFA – non-Gaussian factor analysis
NIPS – Neural Information Processing Systems
NMF – nonnegative matrix factorization
PCA – principal component analysis
PSA – principal subspace analysis
RBF – radial basis function
RHT – randomized Hough transform
RMTL – regularized multi-task learning
RPCL – rival penalized competitive learning
S3VM – semi-supervised support vector machine
SARSA – state-action-reward-state-action
SBF – subspace-based function
SSM – state-space model
TD – temporal difference
TFA – temporal factor analysis
WTA – winner-take-all


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong
  2. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  3. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong