Discriminant Analysis and Other Linear Classification Models

Abstract

In this chapter we discuss models that classify samples using linear classification boundaries. We begin by describing a grant applications case study data set (Section 12.1), which is used to illustrate models throughout this chapter and in Chapters 13–15. As foundational models, we discuss logistic regression (Section 12.2) and linear discriminant analysis (Section 12.3). In Section 12.4 we define and illustrate partial least squares discriminant analysis and its fundamental connection to linear discriminant analysis. Penalized models, such as logistic regression with a ridge penalty, the glmnet model, and penalized linear discriminant analysis, are discussed in Section 12.5. Nearest shrunken centroids, an approach tailored to high-dimensional data, is presented in Section 12.6. The Computing Section (12.7) demonstrates how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
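As a brief preview of the Computing Section, the sketch below shows one way models of this kind can be trained in R with the caret package. It uses simulated data and illustrative tuning values; it is not the chapter's actual code for the grant data.

    ## A minimal sketch, not the chapter's actual code: two of the linear
    ## classifiers discussed here, fit with caret on simulated data.
    library(caret)

    set.seed(100)
    simDat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
    simDat$Class <- factor(ifelse(simDat$X1 + simDat$X2 + rnorm(200) > 0,
                                  "successful", "unsuccessful"))

    ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                         summaryFunction = twoClassSummary)

    ## Logistic regression (Section 12.2)
    set.seed(476)
    lrFit <- train(Class ~ ., data = simDat, method = "glm",
                   metric = "ROC", trControl = ctrl)

    ## Penalized model via glmnet (Section 12.5); the tuning grid
    ## values are illustrative only
    set.seed(476)
    glmnetFit <- train(Class ~ ., data = simDat, method = "glmnet",
                       metric = "ROC",
                       tuneGrid = expand.grid(alpha = c(0, 0.5, 1),
                                              lambda = seq(0.01, 0.2, length.out = 5)),
                       trControl = ctrl)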


Notes

  1. http://blog.kaggle.com/.

  2. http://www.kaggle.com/c/unimelb.

  3. The RFCD codes can be found at http://tinyurl.com/25zvts while the SEO codes can be found at http://tinyurl.com/8435ae4.

  4. However, there are several tree-based methods described in Chap. 14 that are more effective if the categorical predictors are not converted to dummy variables. In these cases, the full set of categories is used.

  5. Data with three or more classes are usually modeled using the multinomial distribution. See Agresti (2002) for more details.

  6. Bayes’ Rule is examined in more detail in Sect. 13.6.

  7. This situation is addressed again for the naïve Bayes models in the next chapter.

  8. Mathematically, if we know C − 1 of the dummy variables, then the value of the last dummy variable is directly implied. Hence, it is also possible to use only C − 1 dummy variables (see the first R sketch following these notes).

  9. The perturbed covariance structure is due to the optimality constraints for the response matrix. Barker and Rayens (2003) astutely recognized that the response optimality constraint in this setting did not make sense, removed the constraint, and resolved the problem. Without the response-space constraint, the PLS solution is one that involves exactly the between-group covariance matrix.

  10. As a reminder, the set of predictors is not being selected on the basis of their association with the outcome. This unsupervised selection should not produce selection bias, which is an issue described in Sect. 19.5.

  11. Recall that the models are built on the pre-2008 data and then tuned based on the year 2008 holdout set. These predictions are from the PLSDA model with four components created using only the pre-2008 data.

  12. Another method for adding this penalty is discussed in the next chapter using neural networks. In this case, a neural network with weight decay and a single hidden unit constitutes a penalized logistic regression model. However, neural networks do not necessarily use the binomial likelihood when determining parameter estimates (see Sect. 13.2). A sketch of both approaches appears after these notes.

  13. http://www.kaggle.com/c/unimelb.

  14. In a later chapter, several models are discussed that can represent the categorical predictors in different ways. For example, trees can use the dummy variables in splits but can often create splits based on one or more groupings of categories. In that chapter, factor versions of these predictors are discussed at length.

  15. Another duplicate naming issue may occur here. A function called sda in the sda package (for shrinkage discriminant analysis) may cause confusion. If both packages are loaded, using sparseLDA:::sda and sda:::sda will mitigate the issue.

  16. In microarray data, the number of predictors is usually much larger than the number of samples. Because of this, and the limited number of columns in popular spreadsheet software, the convention is reversed: predictors are stored in rows and samples in columns (see the final sketch after these notes).

  17. http://www.sgi.com/tech/mlc.
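The following is a small illustration of note 8, using a made-up factor rather than a case-study predictor; it shows the full set of C dummy variables next to the C − 1 set produced by R's default contrasts.

    ## Note 8: full set of C dummy variables vs. the implied C - 1 set.
    ## The factor `sponsor` is a made-up example, not a case-study predictor.
    sponsor <- factor(c("A", "B", "C", "A", "C"))

    ## Full set: one indicator column per level (C = 3 columns)
    model.matrix(~ sponsor - 1)

    ## Default contrasts: an intercept plus C - 1 indicators; the dropped
    ## level ("A") is implied whenever the remaining indicators are all zero
    model.matrix(~ sponsor)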
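Note 12 contrasts two routes to a penalized logistic regression. The sketch below, on simulated data with illustrative penalty values, fits a ridge-penalized logistic regression with glmnet and, for comparison, a single-hidden-unit neural network with weight decay via nnet.

    ## Note 12: two routes to a penalized logistic regression (simulated
    ## data, illustrative penalty values; not the grant-data code).
    library(glmnet)
    library(nnet)

    set.seed(101)
    x <- matrix(rnorm(200 * 10), ncol = 10)
    simDat <- data.frame(x,
                         Class = factor(ifelse(x[, 1] - x[, 2] + rnorm(200) > 0,
                                               "yes", "no")))

    ## Ridge-penalized logistic regression: alpha = 0 gives the pure ridge
    ## penalty and cv.glmnet chooses the amount of penalization
    ridgeFit <- cv.glmnet(x = x, y = simDat$Class, family = "binomial", alpha = 0)

    ## A single hidden unit with weight decay, as described in the note.
    ## With a two-level factor outcome, nnet's formula interface fits by
    ## entropy (the binomial likelihood); other target encodings default to
    ## least squares, which is the caveat raised in the note.
    nnetFit <- nnet(Class ~ ., data = simDat, size = 1, decay = 0.01, trace = FALSE)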
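Finally, the reversed convention in note 16 matters in practice: the pamr package, which implements nearest shrunken centroids (Section 12.6), expects the predictors with samples in columns. The sketch below, again on simulated rather than case-study data, shows the transpose that is typically required.

    ## Note 16: pamr follows the microarray convention (predictors in rows,
    ## samples in columns), so the usual samples-in-rows matrix is transposed.
    library(pamr)

    set.seed(102)
    predictors <- matrix(rnorm(40 * 1000), nrow = 40)  # 40 samples, 1000 predictors
    classes <- factor(rep(c("class1", "class2"), each = 20))

    nscData <- list(x = t(predictors), y = classes)
    nscFit  <- pamr.train(data = nscData)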

References

  • Agresti A (2002). Categorical Data Analysis. Wiley–Interscience.

  • Barker M, Rayens W (2003). “Partial Least Squares for Discrimination.” Journal of Chemometrics, 17(3), 166–173.

  • Berntsson P, Wold S (1986). “Comparison Between X-ray Crystallographic Data and Physiochemical Parameters with Respect to Their Information About the Calcium Channel Antagonist Activity of 4-Phenyl-1,4-Dihydropyridines.” Quantitative Structure-Activity Relationships, 5, 45–50.

  • Chung D, Keles S (2010). “Sparse Partial Least Squares Classification for High Dimensional Data.” Statistical Applications in Genetics and Molecular Biology, 9(1), 17.

  • Clemmensen L, Hastie T, Witten D, Ersboll B (2011). “Sparse Discriminant Analysis.” Technometrics, 53(4), 406–413.

  • Dobson A (2002). An Introduction to Generalized Linear Models. Chapman & Hall/CRC.

  • Dunn W, Wold S (1990). “Pattern Recognition Techniques in Drug Design.” In C Hansch, P Sammes, J Taylor (eds.), “Comprehensive Medicinal Chemistry,” pp. 691–714. Pergamon Press, Oxford.

  • Eilers P, Boer J, van Ommen G, van Houwelingen H (2001). “Classification of Microarray Data with Penalized Logistic Regression.” In “Proceedings of SPIE,” volume 4266, p. 187.

  • Fisher R (1936). “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, 7(2), 179–188.

  • Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1–22.

  • Guo Y, Hastie T, Tibshirani R (2007). “Regularized Linear Discriminant Analysis and its Application in Microarrays.” Biostatistics, 8(1), 86–100.

  • Harrell F (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.

  • Hastie T, Tibshirani R (1990). Generalized Additive Models. Chapman & Hall/CRC.

  • Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition.

  • Liu Y, Rayens W (2007). “PLS and Dimension Reduction for Classification.” Computational Statistics, pp. 189–208.

  • Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2002). “Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression.” Proceedings of the National Academy of Sciences, 99(10), 6567–6572.

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2003). “Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays.” Statistical Science, 18(1), 104–117.

  • Welch B (1939). “Note on Discriminant Functions.” Biometrika, 31, 218–220.

  • Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.

  • Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kuhn, M., Johnson, K. (2013). Discriminant Analysis and Other Linear Classification Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_12
