An Optimized k-NN Approach for Classification on Imbalanced Datasets with Missing Data
In this paper, we describe our solution to the machine learning prediction challenge of IDA 2016. For the given problem, 2-class classification on an imbalanced dataset with missing data, we first develop a k-NN-based imputation method to estimate the missing values. We then formulate a representation tailored to the problem as an optimization scheme that learns the distance and the voting weights used for k-NN classification. On the challenge metric, the proposed solution outperforms traditional classification methods such as SVM, AdaBoost, and Random Forests.
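The abstract does not specify the exact imputation rule or how the weights are parameterized, so the following is only a minimal Python sketch of the two stages under stated assumptions. Missing values (`None`) are filled with the mean of the k nearest rows that observe the feature, using a Euclidean distance over co-observed features rescaled by the fraction of features present (the same idea as scikit-learn's nan-Euclidean distance). Classification then uses a feature-weighted Euclidean distance and per-class vote weights. The names `knn_impute` and `weighted_knn_predict` are hypothetical, and the weights are fixed by hand here; in the described solution they are learned by an optimization scheme that this sketch does not reproduce.

```python
import math
from collections import defaultdict


def partial_distance(a, b):
    """Euclidean distance over features observed (not None) in both rows,
    rescaled by total/shared feature count; inf if no feature is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Hypothetical helper: fill each missing value with the mean of that
    feature over the k nearest other rows that observe it. Distances are
    computed on the original (unimputed) rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (partial_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:  # leave the gap if no row observes feature j
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed


def weighted_knn_predict(X, y, query, k, feature_weights, class_weights):
    """Hypothetical helper: classify `query` by its k nearest training rows
    under a feature-weighted Euclidean distance; each neighbour's vote
    counts class_weights[label] (e.g. larger for the minority class)."""
    dists = sorted(
        (
            math.sqrt(sum(w * (a - b) ** 2
                          for w, a, b in zip(feature_weights, row, query))),
            label,
        )
        for row, label in zip(X, y)
    )
    votes = defaultdict(float)
    for _, label in dists[:k]:
        votes[label] += class_weights[label]
    return max(votes, key=votes.get)


# Usage: impute one missing value, then classify with and without
# up-weighting the minority class (label 1).
data = [[1.0, 2.0], [1.1, None], [0.9, 2.1], [5.0, 8.0]]
filled = knn_impute(data, k=2)

X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0]]
y = [0, 0, 0, 1]
plain = weighted_knn_predict(X, y, [0.8, 0.8], 3, [1.0, 1.0], {0: 1.0, 1: 1.0})
boosted = weighted_knn_predict(X, y, [0.8, 0.8], 3, [1.0, 1.0], {0: 1.0, 1: 3.0})
print(filled, plain, boosted)
```

In the usage example, the query's three nearest neighbours contain one minority-class row and two majority-class rows. Uniform vote weights therefore predict the majority class, while tripling the minority vote flips the prediction. This illustrates why voting weights matter on imbalanced data, though it says nothing about how the paper chose them.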
Keywords: k-NN classifier, Missing data, Imbalanced datasets