Machine Learning, Volume 107, Issue 3, pp 605–637

Manifold-based synthetic oversampling with manifold conformance estimation

  • Colin Bellinger
  • Christopher Drummond
  • Nathalie Japkowicz

Abstract

Classification domains such as those in medicine, national security and the environment regularly suffer from a lack of training instances for the class of interest. In many cases, classification models induced under these conditions have poor predictive performance on the important minority class. Synthetic oversampling can be applied to mitigate the impact of imbalance by generating additional training instances. In this field, the majority of research has focused on refining the SMOTE algorithm. We note, however, that the generative bias of SMOTE is not appropriate for the large class of learning problems that conform to the manifold property. These are high-dimensional problems, such as image and spectral classification, with implicit feature spaces that are lower-dimensional than their physical data spaces. We show that ignoring this property can lead to instances being generated in erroneous regions of the data space. We propose a general framework for manifold-based synthetic oversampling that helps users select a domain-appropriate manifold learning method, such as PCA or an autoencoder, and apply it to model the minority class and generate additional training samples. We evaluate data generation on theoretical distributions and on image classification tasks that are standard in the manifold learning literature, and empirically show the framework's positive impact on the classification of high-dimensional images and gamma-ray spectra, along with 16 UCI datasets.
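
The core idea of the framework, modelling the lower-dimensional manifold of the minority class and generating new instances that conform to it rather than to the ambient data space, can be illustrated with a short sketch. The following is a minimal PCA-based version assuming numpy and scikit-learn; the function name, noise model and parameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of manifold-based synthetic oversampling with PCA.
# Illustrative only: the noise model and parameters are assumptions,
# not the method published in the paper.
import numpy as np
from sklearn.decomposition import PCA

def pca_synthetic_oversample(X_min, n_new, n_components=2,
                             noise_scale=0.1, seed=0):
    """Generate n_new synthetic minority samples that conform to the
    low-dimensional manifold estimated from X_min by PCA."""
    rng = np.random.default_rng(seed)
    pca = PCA(n_components=n_components)
    Z = pca.fit_transform(X_min)            # embed minority class on the manifold
    # Perturb existing embeddings with Gaussian noise scaled to the
    # per-component spread, so new points stay near the manifold.
    idx = rng.integers(0, len(Z), size=n_new)
    noise = rng.normal(scale=noise_scale * Z.std(axis=0),
                       size=(n_new, n_components))
    Z_new = Z[idx] + noise
    return pca.inverse_transform(Z_new)     # map back to the data space

# Usage: given an (n, d) array X_minority of minority-class instances,
# X_syn = pca_synthetic_oversample(X_minority, n_new=100)
```

An autoencoder-based variant would swap fit_transform and inverse_transform for a trained encoder and decoder; the paper's framework additionally estimates how well a domain conforms to the manifold property before committing to a method.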

Keywords

Class imbalance · Synthetic oversampling · Manifold learning · SMOTE

Acknowledgements

Funding was provided by the Canadian Network for Research and Innovation in Machining Technology, the Natural Sciences and Engineering Research Council of Canada, Health Canada, and the Ontario Graduate Scholarship program.

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Department of Computing Science, University of Alberta, Edmonton, Canada
  2. Department of Computer Science, American University, Washington, USA
  3. National Research Council of Canada, Ottawa, Canada
