Linear Discriminant Dimensionality Reduction

  • Quanquan Gu
  • Zhenhui Li
  • Jiawei Han
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)


The Fisher criterion has achieved great success in dimensionality reduction. Two representative methods based on the Fisher criterion are Fisher Score and Linear Discriminant Analysis (LDA): the former was developed for feature selection, while the latter is designed for subspace learning. Over the past decade, these two approaches have usually been studied independently. In this paper, based on the observation that Fisher Score and LDA are complementary, we propose to integrate them in a unified framework, named Linear Discriminant Dimensionality Reduction (LDDR). We aim to find a subset of features such that the linear transformation learnt on them via LDA maximizes the Fisher criterion. LDDR inherits the advantages of Fisher Score and LDA and is able to perform feature selection and subspace learning simultaneously; both Fisher Score and LDA can be seen as special cases of the proposed method. The resulting optimization problem is a mixed integer program, which is difficult to solve. It is relaxed into an ℓ2,1-norm constrained least squares problem and solved by an accelerated proximal gradient descent algorithm. Experiments on benchmark face recognition data sets illustrate that the proposed method outperforms state-of-the-art methods.
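The relaxation mentioned above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it solves the closely related regularized (Lagrangian) form of the ℓ2,1-norm problem, min_W 0.5·||XW − Y||_F² + λ·||W||_{2,1}, by FISTA-style accelerated proximal gradient descent. The synthetic data and the regularization weight `lam` are assumptions; the key ingredient is the row-wise soft-thresholding proximal operator of the ℓ2,1 norm, whose zeroed-out rows correspond to discarded features.

```python
import numpy as np

def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: row-wise soft-thresholding."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def l21_least_squares(X, Y, lam=0.1, n_iter=200):
    """Minimize 0.5*||XW - Y||_F^2 + lam*||W||_{2,1} by accelerated
    (FISTA-style) proximal gradient descent."""
    d, k = X.shape[1], Y.shape[1]
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    W = np.zeros((d, k))
    Z = W.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ Z - Y)           # gradient of the least-squares term
        W_new = prox_l21(Z - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        Z = W_new + ((t - 1) / t_new) * (W_new - W)   # momentum extrapolation
        W, t = W_new, t_new
    return W

# Toy example (assumed data): only the first 3 of 10 features generate Y,
# so rows 3..9 of the recovered W should shrink to (near-)zero norm.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
W_true = np.zeros((10, 2))
W_true[:3] = rng.standard_normal((3, 2))
Y = X @ W_true
W = l21_least_squares(X, Y, lam=1.0)
row_norms = np.linalg.norm(W, axis=1)
```

Because the ℓ2,1 penalty couples each row of W across all output dimensions, entire rows (i.e. entire features) are driven to zero together, which is what lets a single transformation matrix encode both feature selection and subspace learning.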


Keywords: Feature Selection · Linear Discriminant Analysis · Face Image · Feature Selection Method · Locality Preserving Projection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Quanquan Gu (1)
  • Zhenhui Li (1)
  • Jiawei Han (1)

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, US
