Beta-Boosted Ensemble for Big Credit Scoring Data
In this work we present a novel ensemble model for the credit scoring problem. The main idea of the approach is to use a separate beta-binomial distribution for each class to generate balanced datasets, which are then used to train the base learners that constitute the final ensemble. Sampling is performed on two separate ranking lists, one per class, where the ranking is based on the probability of observing the positive class. Two strategies are considered: one mines easy examples, and the other forces good classification of hard cases. The proposed solutions are tested on two big datasets from the credit scoring domain.
Keywords: Credit scoring, Ensemble model, Beta distribution, Beta boost, Big data
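The sampling step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the shape parameters `a` and `b`, and the choice to mirror the parameters for the negative class are all assumptions made for illustration. Each class is ranked by the estimated probability of the positive class, and positions in each ranking are drawn from a beta-binomial distribution (a beta draw followed by a binomial draw), so that each class contributes the same number of examples.

```python
import numpy as np


def beta_binomial_indices(n_items, n_draws, a, b, rng):
    """Draw positions in [0, n_items - 1] from a beta-binomial distribution."""
    p = rng.beta(a, b, size=n_draws)      # success probability per draw
    return rng.binomial(n_items - 1, p)   # position in the ranking list


def sample_balanced(scores, y, n_per_class, a, b, rng):
    """Return indices of a balanced sample drawn from two per-class rankings.

    scores: estimated probability of the positive class for each example.
    y: binary labels (1 = positive, 0 = negative).
    Assumption: a < b favours easy examples, a > b favours hard ones.
    """
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    # Both lists are ranked by descending probability of the positive class.
    pos_ranked = pos[np.argsort(-scores[pos])]
    neg_ranked = neg[np.argsort(-scores[neg])]
    # Easy positives sit at the top of their list, easy negatives at the
    # bottom, so the shape parameters are swapped for the negative class
    # (an illustrative choice, not taken from the paper).
    pos_pick = pos_ranked[beta_binomial_indices(len(pos_ranked), n_per_class, a, b, rng)]
    neg_pick = neg_ranked[beta_binomial_indices(len(neg_ranked), n_per_class, b, a, rng)]
    return np.concatenate([pos_pick, neg_pick])
```

In an ensemble, a call such as `sample_balanced(scores, y, 50, 1.0, 5.0, rng)` would be repeated once per base learner, each time producing a different balanced training subset. Sampling is with replacement here, which is another simplifying assumption.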
- Give Me Some Credit (2011) Give me some credit. https://www.kaggle.com/c/GiveMeSomeCredit
- Härdle WK, Prastyo DD, Hafner C (2012) Support vector machines with evolutionary feature selection for default prediction. In: Handbook of applied nonparametric and semi-parametric econometrics and statistics. Oxford University Press, Oxford, pp 346–373
- Kumar MP, Packer B, Koller D (2010) Self-paced learning for latent variable models. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 1189–1197
- Lending Club (2016) Lending club loan data. https://www.kaggle.com/wendykan/lending-club-loan-data