Parameter-Free Imputation for Imbalance Datasets

  • Jintana Takum
  • Chumphol Bunkhumpornpat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8839)

Abstract

Class imbalance is a problem in which the goal is to improve the classification accuracy of a minority class, while imputation is the process of replacing missing values. Traditionally, the two problems are considered independently. In addition, the minority-class values filled in by traditional methods are not sufficient for imbalanced datasets. In this paper, we propose a new parameter-free imputation method for imbalanced datasets. It estimates each missing value as a random value between the mean of the attribute containing the missing value and the value of that attribute in the instance closest to the record with the missing value. Our proposed algorithm ignores the mean of instances to avoid an over-fitting problem. Experimental results on imbalanced datasets show that our imputation outperforms other techniques when class imbalance measures are used.
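The estimation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `impute_random_between`, the use of `None` to mark missing values, the Euclidean distance over attributes observed in both records, the uniform random draw, and the computation of means over all observed values (rather than per class) are all assumptions made for illustration.

```python
import math
import random


def impute_random_between(records, rng=None):
    """Fill None entries with a random value between the attribute mean
    and the same attribute's value in the nearest record.

    Hypothetical sketch: assumes numeric attributes and that every
    attribute is observed in at least one other record.
    """
    rng = rng or random.Random()
    n_attrs = len(records[0])

    # Mean of each attribute, computed over observed values only.
    means = []
    for j in range(n_attrs):
        observed = [r[j] for r in records if r[j] is not None]
        means.append(sum(observed) / len(observed))

    def distance(a, b):
        # Assumption: Euclidean distance over attributes present in both records.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    filled = [list(r) for r in records]  # copy, so the input is not mutated
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is not None:
                continue
            # Only records that actually have attribute j can supply a value.
            candidates = [r for k, r in enumerate(records)
                          if k != i and r[j] is not None]
            nearest = min(candidates, key=lambda r: distance(rec, r))
            low, high = sorted((means[j], nearest[j]))
            filled[i][j] = rng.uniform(low, high)
    return filled


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, None], [5.0, 10.0]]
    print(impute_random_between(data, random.Random(0)))
```

In the toy data, the second record is closest to the first, so its missing attribute is drawn from the interval between the nearest value 2.0 and the attribute mean 6.0. No neighbour count or other tuning value is required, which is one way to read "parameter-free" here.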

Keywords

Imputation, Parameter-Free, Class Imbalance, Classification, K-Nearest Neighbours



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jintana Takum (1)
  • Chumphol Bunkhumpornpat (1)
  1. Theoretical and Empirical Research Group, Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
