Discriminant Analysis and Other Linear Classification Models

Abstract

In this chapter we discuss models that classify samples using linear classification boundaries. We begin by describing a grant applications case study data set (Section 12.1), which is used to illustrate models throughout this chapter and in Chapters 13–15. As foundational models, we discuss logistic regression (Section 12.2) and linear discriminant analysis (Section 12.3). In Section 12.4 we define and illustrate partial least squares discriminant analysis and its fundamental connection to linear discriminant analysis. Penalized models, such as logistic regression with a ridge penalty, the glmnet model, and penalized linear discriminant analysis, are discussed in Section 12.5. Nearest shrunken centroids, an approach tailored to high-dimensional data, is presented in Section 12.6. The Computing Section (12.7) demonstrates how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
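As a brief preview of the Computing Section, the sketch below shows one way models of this kind can be trained in R with the caret package. It uses simulated data and illustrative tuning values; it is not the chapter's actual code for the grant data.

    ## A minimal sketch, not the chapter's actual code: two of the linear
    ## classifiers discussed here, fit with caret on simulated data.
    library(caret)

    set.seed(100)
    simDat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
    simDat$Class <- factor(ifelse(simDat$X1 + simDat$X2 + rnorm(200) > 0,
                                  "successful", "unsuccessful"))

    ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                         summaryFunction = twoClassSummary)

    ## Logistic regression (Section 12.2)
    set.seed(476)
    lrFit <- train(Class ~ ., data = simDat, method = "glm",
                   metric = "ROC", trControl = ctrl)

    ## Penalized model via glmnet (Section 12.5); the tuning grid
    ## values are illustrative only
    set.seed(476)
    glmnetFit <- train(Class ~ ., data = simDat, method = "glmnet",
                       metric = "ROC",
                       tuneGrid = expand.grid(alpha = c(0, 0.5, 1),
                                              lambda = seq(0.01, 0.2, length.out = 5)),
                       trControl = ctrl)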


Notes

  1. http://blog.kaggle.com/.

  2. http://www.kaggle.com/c/unimelb.

  3. The RFCD codes can be found at http://tinyurl.com/25zvts while the SEO codes can be found at http://tinyurl.com/8435ae4.

  4. However, there are several tree-based methods described in Chap. 14 that are more effective if the categorical predictors are not converted to dummy variables. In these cases, the full set of categories is used.

  5. Data with three or more classes are usually modeled using the multinomial distribution. See Agresti (2002) for more details.

  6. Bayes’ Rule is examined in more detail in Sect. 13.6.

  7. This situation is addressed again for the naïve Bayes models in the next chapter.

  8. Mathematically, if we know C − 1 of the dummy variables, then the value of the last dummy variable is directly implied. Hence, it is also possible to use only C − 1 dummy variables (see the first R sketch following these notes).

  9. The perturbed covariance structure is due to the optimality constraints for the response matrix. Barker and Rayens (2003) astutely recognized that the response optimality constraint in this setting did not make sense, removed the constraint, and resolved the problem. Without the response-space constraint, the PLS solution is one that involves exactly the between-group covariance matrix.

  10. As a reminder, the set of predictors is not being selected on the basis of their association with the outcome. This unsupervised selection should not produce selection bias, which is an issue described in Sect. 19.5.

  11. Recall that the models are built on the pre-2008 data and then tuned based on the year 2008 holdout set. These predictions are from the PLSDA model with four components created using only the pre-2008 data.

  12. Another method for adding this penalty is discussed in the next chapter using neural networks. In this case, a neural network with weight decay and a single hidden unit constitutes a penalized logistic regression model. However, neural networks do not necessarily use the binomial likelihood when determining parameter estimates (see Sect. 13.2). A sketch of both approaches appears after these notes.

  13. http://www.kaggle.com/c/unimelb.

  14. In a later chapter, several models are discussed that can represent the categorical predictors in different ways. For example, trees can use the dummy variables in splits but can often create splits based on one or more groupings of categories. In that chapter, factor versions of these predictors are discussed at length.

  15. Another duplicate naming issue may occur here. A function called sda in the sda package (for shrinkage discriminant analysis) may cause confusion. If both packages are loaded, using sparseLDA:::sda and sda:::sda will mitigate the issue.

  16. In microarray data, the number of predictors is usually much larger than the number of samples. Because of this, and the limited number of columns in popular spreadsheet software, the convention is reversed: predictors are stored in rows and samples in columns (see the final sketch after these notes).

  17. http://www.sgi.com/tech/mlc.
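The following is a small illustration of note 8, using a made-up factor rather than a case-study predictor; it shows the full set of C dummy variables next to the C − 1 set produced by R's default contrasts.

    ## Note 8: full set of C dummy variables vs. the implied C - 1 set.
    ## The factor `sponsor` is a made-up example, not a case-study predictor.
    sponsor <- factor(c("A", "B", "C", "A", "C"))

    ## Full set: one indicator column per level (C = 3 columns)
    model.matrix(~ sponsor - 1)

    ## Default contrasts: an intercept plus C - 1 indicators; the dropped
    ## level ("A") is implied whenever the remaining indicators are all zero
    model.matrix(~ sponsor)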
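Note 12 contrasts two routes to a penalized logistic regression. The sketch below, on simulated data with illustrative penalty values, fits a ridge-penalized logistic regression with glmnet and, for comparison, a single-hidden-unit neural network with weight decay via nnet.

    ## Note 12: two routes to a penalized logistic regression (simulated
    ## data, illustrative penalty values; not the grant-data code).
    library(glmnet)
    library(nnet)

    set.seed(101)
    x <- matrix(rnorm(200 * 10), ncol = 10)
    simDat <- data.frame(x,
                         Class = factor(ifelse(x[, 1] - x[, 2] + rnorm(200) > 0,
                                               "yes", "no")))

    ## Ridge-penalized logistic regression: alpha = 0 gives the pure ridge
    ## penalty and cv.glmnet chooses the amount of penalization
    ridgeFit <- cv.glmnet(x = x, y = simDat$Class, family = "binomial", alpha = 0)

    ## A single hidden unit with weight decay, as described in the note.
    ## With a two-level factor outcome, nnet's formula interface fits by
    ## entropy (the binomial likelihood); other target encodings default to
    ## least squares, which is the caveat raised in the note.
    nnetFit <- nnet(Class ~ ., data = simDat, size = 1, decay = 0.01, trace = FALSE)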
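Finally, the reversed convention in note 16 matters in practice: the pamr package, which implements nearest shrunken centroids (Section 12.6), expects the predictors with samples in columns. The sketch below, again on simulated rather than case-study data, shows the transpose that is typically required.

    ## Note 16: pamr follows the microarray convention (predictors in rows,
    ## samples in columns), so the usual samples-in-rows matrix is transposed.
    library(pamr)

    set.seed(102)
    predictors <- matrix(rnorm(40 * 1000), nrow = 40)  # 40 samples, 1000 predictors
    classes <- factor(rep(c("class1", "class2"), each = 20))

    nscData <- list(x = t(predictors), y = classes)
    nscFit  <- pamr.train(data = nscData)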

References

  • Agresti A (2002). Categorical Data Analysis. Wiley–Interscience.

  • Barker M, Rayens W (2003). “Partial Least Squares for Discrimination.” Journal of Chemometrics, 17(3), 166–173.

  • Berntsson P, Wold S (1986). “Comparison Between X-ray Crystallographic Data and Physiochemical Parameters with Respect to Their Information About the Calcium Channel Antagonist Activity of 4-Phenyl-1,4-Dihydropyridines.” Quantitative Structure-Activity Relationships, 5, 45–50.

  • Chung D, Keles S (2010). “Sparse Partial Least Squares Classification for High Dimensional Data.” Statistical Applications in Genetics and Molecular Biology, 9(1), 17.

  • Clemmensen L, Hastie T, Witten D, Ersboll B (2011). “Sparse Discriminant Analysis.” Technometrics, 53(4), 406–413.

  • Dobson A (2002). An Introduction to Generalized Linear Models. Chapman & Hall/CRC.

  • Dunn W, Wold S (1990). “Pattern Recognition Techniques in Drug Design.” In C Hansch, P Sammes, J Taylor (eds.), “Comprehensive Medicinal Chemistry,” pp. 691–714. Pergamon Press, Oxford.

  • Eilers P, Boer J, van Ommen G, van Houwelingen H (2001). “Classification of Microarray Data with Penalized Logistic Regression.” In “Proceedings of SPIE,” volume 4266, p. 187.

  • Fisher R (1936). “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, 7(2), 179–188.

  • Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1–22.

  • Guo Y, Hastie T, Tibshirani R (2007). “Regularized Linear Discriminant Analysis and its Application in Microarrays.” Biostatistics, 8(1), 86–100.

  • Harrell F (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.

  • Hastie T, Tibshirani R (1990). Generalized Additive Models. Chapman & Hall/CRC.

  • Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition.

  • Liu Y, Rayens W (2007). “PLS and Dimension Reduction for Classification.” Computational Statistics, pp. 189–208.

  • Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2002). “Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression.” Proceedings of the National Academy of Sciences, 99(10), 6567–6572.

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2003). “Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays.” Statistical Science, 18(1), 104–117.

  • Welch B (1939). “Note on Discriminant Functions.” Biometrika, 31, 218–220.

  • Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.

  • Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kuhn, M., Johnson, K. (2013). Discriminant Analysis and Other Linear Classification Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_12
