If We Did Not Have ImageNet: Comparison of Fisher Encodings and Convolutional Neural Networks on Limited Training Data

  • Christian Hentschel
  • Timur Pratama Wiradarma
  • Harald Sack
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9475)


This work compares two competing approaches to image classification, Bag-of-Visual-Words (BoVW) and Convolutional Neural Networks (CNNs). Recent work has shown that CNNs surpass hand-crafted feature extraction techniques on image classification problems. Their success is partly attributed to benchmarking initiatives such as ImageNet, which in a massive crowdsourcing effort gathered enough data to train deep neural networks with a very large number of model parameters. Obviously, manually annotated training datasets of a similar scale cannot be provided in every classification scenario, given the resources and time required. In this paper, we therefore analyze and compare the performance of BoVW- and CNN-based approaches to image classification as a function of the available training data. We show that CNNs benefit from growing datasets, while BoVW-based classifiers outperform CNNs when only limited data is available. Evidence is given by experiments with gradually increasing training data and by visualizations of the classification models.
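The evaluation protocol described in the abstract, measuring classification accuracy as a function of gradually increasing training-set size, can be sketched as follows. This is an illustrative sketch only: the `nearest_centroid` classifier and the toy Gaussian data are placeholders standing in for the paper's actual BoVW/Fisher and CNN pipelines, and the function names are our own.

```python
import numpy as np

def learning_curve(train_fn, X, y, X_test, y_test, sizes, seed=0):
    """Evaluate a classifier at gradually increasing training-set sizes.

    train_fn(X_sub, y_sub) must return a predict(X) callable.
    Returns a list of (size, test_accuracy) pairs.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # fix one shuffle so subsets are nested
    curve = []
    for n in sizes:
        idx = order[:n]
        predict = train_fn(X[idx], y[idx])
        acc = float(np.mean(predict(X_test) == y_test))
        curve.append((n, acc))
    return curve

def nearest_centroid(X_train, y_train):
    """Stand-in for a shallow classifier with very few parameters."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    def predict(X):
        # squared Euclidean distance to each class centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        return classes[np.argmin(d, axis=1)]
    return predict

# toy two-class Gaussian data in place of real image features
rng = np.random.default_rng(42)
X = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
X_test = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_test = np.array([0] * 100 + [1] * 100)

curve = learning_curve(nearest_centroid, X, y, X_test, y_test,
                       sizes=[10, 50, 200, 400])
```

Comparing two such curves (one per classifier family) over the same nested subsets is what allows the crossover point, where the data-hungry model overtakes the shallow one, to be located.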


Keywords: Training Image · Convolutional Neural Network · Deep Neural Network · Linear Support Vector Machine · Image Pane



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Christian Hentschel¹
  • Timur Pratama Wiradarma¹
  • Harald Sack¹

  1. Hasso Plattner Institute for Software Systems Engineering, Potsdam, Germany
