Chapter

Applied Predictive Modeling

pp 419-443

Remedies for Severe Class Imbalance

  • Max Kuhn (Division of Nonclinical Statistics, Pfizer Global Research and Development)
  • Kjell Johnson (Arbor Analytics)

Abstract

When modeling discrete classes, the relative frequencies of the classes can have a significant impact on the effectiveness of the model. An imbalance occurs when one or more classes have very low proportions in the training data compared to the other classes. Imbalance can be present in any data set or application, so the practitioner should be aware of the implications of modeling this type of data. To illustrate the impacts of and remedies for severe class imbalance, we present a case study (Section 16.1) and the impact of class imbalance on performance measures (Section 16.2). Sections 16.3-16.6 describe approaches for handling imbalance using the existing data, such as maximizing minority class accuracy, adjusting classification cut-offs or prior probabilities, or adjusting sample weights prior to model tuning. Imbalance can also be handled through sophisticated up- or down-sampling methods (Section 16.7) or by applying costs to the classification errors (Section 16.8). In the Computing Section (16.9) we demonstrate how to implement these remedies in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
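As a rough illustration of two ideas named in the abstract (the effect of imbalance on performance measures, and down-sampling), the following is a minimal sketch in plain Python rather than the R used in the chapter. The data, the 950/50 class split, and the helper names `downsample`, `accuracy`, and `sensitivity` are all hypothetical and are not taken from the chapter; the down-sampling shown is simple random sampling, not any of the more sophisticated methods the chapter covers.

```python
import random

def downsample(rows, label_of, seed=0):
    """Randomly reduce every class to the size of the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced

def accuracy(truth, pred):
    """Fraction of samples predicted correctly, over all classes."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def sensitivity(truth, pred, positive):
    """Fraction of the `positive` class that was predicted correctly."""
    hits = [p == positive for t, p in zip(truth, pred) if t == positive]
    return sum(hits) / len(hits)

# Hypothetical imbalanced outcome: 950 "no" events and 50 "yes" events.
labels = ["no"] * 950 + ["yes"] * 50

# A trivial model that always predicts the majority class.
always_no = ["no"] * len(labels)

acc = accuracy(labels, always_no)            # high overall accuracy
sens = sensitivity(labels, always_no, "yes") # but no minority events found

# Random down-sampling gives each class the same number of samples.
balanced = downsample(labels, label_of=lambda y: y)
```

Under these assumptions the majority-class predictor looks accurate overall while detecting none of the minority class, which is why overall accuracy alone can be misleading for imbalanced data.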