Sharing ConvNet Across Heterogeneous Tasks

  • Takumi Kobayashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)

Abstract

The deep convolutional neural network (ConvNet) is one of the most promising approaches for achieving state-of-the-art performance in image recognition. A ConvNet performs well on the task it was trained for and also transfers favorably to other datasets and tasks. It nevertheless remains dependent on the characteristics of its training dataset, so its performance deteriorates on other types of task, for example when a ConvNet pre-trained on ImageNet is transferred from object classification to scene classification. In this paper, we propose a method to improve the generalization performance of ConvNets. In the proposed method, the ConvNet layers are partially shared across heterogeneous tasks (datasets) in end-to-end learning, while the remaining layers are tailored to the respective datasets. By controlling the degree of layer sharing, the method provides models of varying generality and specialization, which are trained effectively by introducing diversity into mini-batches. The method is also applicable to fine-tuning a ConvNet, especially on a smaller-scale dataset. Experimental results on image classification using the ImageNet and Places-365 datasets show that our method improves performance on those datasets and provides a pre-trained ConvNet with higher generalization power and favorable transferability.
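
The layer-sharing idea described in the abstract can be illustrated with a small forward-only sketch. This is not the paper's implementation: it swaps convolutional layers for tiny fully connected ones, uses only the Python standard library, and omits training. All names (`PartiallySharedNet`, `mixed_minibatch`, `n_shared`), the task labels, the layer sizes, and the equal-split sampling rule are hypothetical choices made for illustration. The abstract does not say how diversity is introduced into mini-batches, so drawing an equal number of samples from each dataset is an assumption.

```python
import random


def dense(layer, x, relu=True):
    """Apply one fully connected layer y = W x + b, optionally followed by ReLU."""
    weights, bias = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y


def init_layer(rng, n_in, n_out):
    """Create a randomly initialised (weights, bias) pair."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class PartiallySharedNet:
    """Hypothetical toy model: the first `n_shared` layers are shared by all
    tasks; the remaining layers and the classifier are duplicated per task."""

    def __init__(self, hidden_sizes, n_shared, n_classes, seed=0):
        rng = random.Random(seed)
        pairs = list(zip(hidden_sizes[:-1], hidden_sizes[1:]))
        # Shared trunk: a single set of parameters used by every task.
        self.shared = [init_layer(rng, i, o) for i, o in pairs[:n_shared]]
        # Task-specific branches: separate parameters for each dataset.
        self.specific = {}
        for task, k in n_classes.items():
            layers = [init_layer(rng, i, o) for i, o in pairs[n_shared:]]
            layers.append(init_layer(rng, hidden_sizes[-1], k))  # classifier
            self.specific[task] = layers

    def forward(self, x, task):
        for layer in self.shared:
            x = dense(layer, x)
        branch = self.specific[task]
        for layer in branch[:-1]:
            x = dense(layer, x)
        return dense(branch[-1], x, relu=False)  # raw class scores


def mixed_minibatch(datasets, batch_size, rng):
    """Assumed sampling rule: draw an equal share of the batch from each task."""
    per_task = batch_size // len(datasets)
    batch = []
    for task, samples in datasets.items():
        batch += [(task, s) for s in rng.sample(samples, per_task)]
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
net = PartiallySharedNet(hidden_sizes=[8, 6, 4], n_shared=1,
                         n_classes={"objects": 5, "scenes": 3})
# Random stand-in data; real inputs would be images from the two datasets.
datasets = {
    "objects": [[rng.random() for _ in range(8)] for _ in range(10)],
    "scenes": [[rng.random() for _ in range(8)] for _ in range(10)],
}
batch = mixed_minibatch(datasets, batch_size=4, rng=rng)
outputs = [(task, net.forward(x, task)) for task, x in batch]
```

In this sketch, raising `n_shared` moves layers from the per-task branches into the common trunk, which is one way to picture the trade-off between generality and specialization that the abstract mentions.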

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
