Partially Supervised Sparse Factor Regression For Multi-Class Classification

  • Conference paper

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Classical linear discriminant analysis (LDA) may perform poorly in multi-class classification with high-dimensional data. We propose a partially supervised sparse factor regression (PSFAR) approach to jointly explore potential low-dimensional structures in both the high-dimensional class mean vectors and the common covariance matrix required in LDA. The problem is formulated as a multivariate regression analysis, with predictors constructed from the class labels and responses given by the high-dimensional features. The regression coefficient matrix is then composed of the class means, for which we explore a sparse and low-rank structure; we further explore a parsimonious factor analysis representation of the covariance matrix. As such, our model assumes that the high-dimensional features are best separated in their means within a low-dimensional subspace, subject to a few unobserved latent factors. We propose a regularized log-likelihood criterion for model estimation, for which an efficient Expectation-Maximization (EM) algorithm is developed. The efficacy of PSFAR is demonstrated through simulation studies and a real application to handwritten digit data.
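The regression reformulation in the abstract can be illustrated with a minimal NumPy sketch. It assumes one-hot class indicators without an intercept as the predictors (the abstract does not state the exact coding PSFAR uses), so that ordinary least squares returns the class means as the rows of the coefficient matrix. The residuals then give a pooled within-class covariance, and these two pieces feed a standard LDA rule. The optional `rank` argument applies a plain SVD truncation to the centered class means as a simple stand-in for a low-rank mean structure; it is not the sparse, regularized PSFAR estimator, and the sketch does not implement the factor-analysis covariance or the EM algorithm. All function names are hypothetical.

```python
import numpy as np

def indicator_matrix(labels, n_classes):
    """Hypothetical helper: one-hot design matrix X (n x K) from labels 0..K-1."""
    X = np.zeros((labels.shape[0], n_classes))
    X[np.arange(labels.shape[0]), labels] = 1.0
    return X

def fit_regression_lda(Y, labels, n_classes, rank=None):
    """Fit Y = X C + E by least squares; with one-hot X, row k of C is the
    sample mean of class k. Sigma is the pooled within-class covariance.

    If `rank` is given, the centered class means are truncated to that rank
    with an SVD (a plain reduced-rank stand-in, NOT the PSFAR penalty)."""
    X = indicator_matrix(labels, n_classes)
    C, *_ = np.linalg.lstsq(X, Y, rcond=None)     # K x p coefficient matrix
    E = Y - X @ C                                  # residuals of the full fit
    Sigma = E.T @ E / (Y.shape[0] - n_classes)     # pooled covariance, df = n - K
    if rank is not None:
        grand = C.mean(axis=0)                     # unweighted average of class means
        U, s, Vt = np.linalg.svd(C - grand, full_matrices=False)
        C = grand + (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return C, Sigma

def predict_lda(Ynew, C, Sigma, priors):
    """Linear discriminant scores with a common covariance; returns argmax class."""
    P = np.linalg.pinv(Sigma)                      # pseudo-inverse guards against singular Sigma
    scores = Ynew @ P @ C.T - 0.5 * np.sum((C @ P) * C, axis=1) + np.log(priors)
    return np.argmax(scores, axis=1)
```

For example, with three classes of simulated Gaussian features, `C, Sigma = fit_regression_lda(Y, labels, 3)` recovers the per-class sample means in `C`, and `predict_lda(Y, C, Sigma, np.full(3, 1 / 3))` assigns each row to the class with the largest discriminant score. When p is large relative to n, the pooled covariance becomes singular and the class means are noisy, which is where the structured (sparse, low-rank, factor) estimators described in the abstract are intended to help.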

Author information

Corresponding author

Correspondence to Kun Chen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Luo, C., Dey, D., Chen, K. (2016). Partially Supervised Sparse Factor Regression For Multi-Class Classification. In: Lin, J., Wang, B., Hu, X., Chen, K., Liu, R. (eds) Statistical Applications from Clinical Trials and Personalized Medicine to Finance and Business Analytics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-42568-9_24
