Strategies for Tackling the Class Imbalance Problem in Marine Image Classification

  • Daniel Langenkämper
  • Robin van Kevelaer
  • Tim W. Nattkemper
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11188)


Research on deep learning algorithms, especially convolutional neural networks (CNNs), has made significant progress. The application of CNNs to image analysis and pattern recognition has attracted considerable attention in this regard, but so far only a few applications classifying a small number of common taxa in marine image collections have been reported.

In this paper, we address the problem of class imbalance in marine image data, i.e. the common observation that 80%–90% of the data belong to a small subset of \(L^\prime \) classes among the total number of L observed classes, with \(L^\prime \ll L\). A small number of methods to compensate for the class imbalance problem during training have been proposed for common computer vision benchmark datasets. But marine image collections (showing, for instance, megafauna, as considered in this study) pose a greater challenge, as the observed imbalance is more extreme: habitats can feature a high biodiversity but a low species density.
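The imbalance described above can be quantified with a small sketch: given a list of class labels, find the total number of classes L and the smallest number of classes \(L^\prime \) that together hold a given fraction (here 90%) of all samples. The function name and the toy label distribution are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def imbalance_profile(labels, mass=0.9):
    """Return L (number of classes) and L' (the smallest number of
    classes that together hold at least `mass` of all samples)."""
    counts = sorted(Counter(labels).values(), reverse=True)
    total = sum(counts)
    covered, l_prime = 0, 0
    for c in counts:
        covered += c
        l_prime += 1
        if covered >= mass * total:
            break
    return len(counts), l_prime

# A toy long-tailed distribution: two dominant classes, four rare ones.
labels = ["a"] * 500 + ["b"] * 400 + ["c"] * 40 + ["d"] * 30 + ["e"] * 20 + ["f"] * 10
L, L_prime = imbalance_profile(labels)  # here 90% of samples sit in 2 of 6 classes
```

In this toy example, \(L = 6\) and \(L^\prime = 2\): two classes alone cover 90% of the data, mirroring the 80%–90% concentration observed in marine image collections.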

In this paper, we investigate the potential of various over-/undersampling methods to compensate for the class imbalance problem in marine imaging. In addition, five different balancing rules are proposed and analyzed to examine the extent to which sampling should be used, i.e. how many samples should be created or removed to get the most out of the sampling algorithms. We evaluate these methods with AlexNet trained to classify benthic image data recorded at the Porcupine Abyssal Plain (PAP) and use a Support Vector Machine as a baseline classifier. We can report that the best of our proposed strategies, combined with data augmentation and applied to AlexNet, yields an increase of 13 basis points over AlexNet without sampling. Furthermore, examples are presented which show that the combination of oversampling and augmentation leads to a better generalization than pure augmentation.
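The simplest of the balancing strategies discussed above, random oversampling up to the majority class count, can be sketched as follows. This is only an illustration of the general technique; the paper's five specific balancing rules and its augmentation pipeline are not reproduced here, and the function name is an assumption.

```python
import random
from collections import defaultdict

def oversample_to_majority(samples, labels, seed=0):
    """Random oversampling: duplicate minority-class samples (with
    replacement) until every class matches the majority class count.
    In practice the duplicates would then be passed through data
    augmentation (e.g. flips, rotations) before training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels
```

Undersampling would be the mirror image (randomly discarding majority-class samples down to a target count); the paper's balancing rules govern where between these two extremes the per-class target counts are set.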


Keywords: Class imbalance · CNN · Marine imaging · Deep learning · Taxonomic classification



We thank the National Oceanography Centre for providing the data and consultation and NVIDIA Corporation for donating the GPU used in this project. This project has received funding by Projektträger Jülich (grant no 03F0707C) under the framework of JPI Oceans.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Daniel Langenkämper (1)
  • Robin van Kevelaer (1)
  • Tim W. Nattkemper (1)

  1. Biodata Mining Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany
