Adding New Tasks to a Single Network with Weight Transformations Using Binary Masks

  • Massimiliano ManciniEmail author
  • Elisa Ricci
  • Barbara Caputo
  • Samuel Rota Bulò
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)


Visual recognition algorithms are required today to exhibit adaptive abilities. Given a deep model trained on a specific, given task, it would be highly desirable to be able to adapt incrementally to new tasks, preserving scalability as the number of new tasks increases, while at the same time avoiding catastrophic forgetting issues. Recent work has shown that masking the internal weights of a given original conv-net through learned binary variables is a promising strategy. We build upon this intuition and take into account more elaborated affine transformations of the convolutional weights that include learned binary masks. We show that with our generalization it is possible to achieve significantly higher levels of adaptation to new tasks, enabling the approach to compete with fine tuning strategies by requiring slightly more than 1 bit per network parameter per additional task. Experiments on two popular benchmarks showcase the power of our approach, that achieves the new state of the art on the Visual Decathlon Challenge.


Incremental learning Multi-task learning 


  1. 1.
    Bendale, A., Boult, T.E.: Towards open set deep networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 1563–1572 (2016)Google Scholar
  2. 2.
    Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
  3. 3.
    Bilen, H., Fernando, B., Gavves, E., Vedaldi, A., Gould, S.: Dynamic image networks for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3034–3042 (2016)Google Scholar
  4. 4.
    Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3606–3613. IEEE (2014)Google Scholar
  5. 5.
    Eitz, M., Hays, J., Alexa, M.: How do humans sketch objects? ACM Trans. Graph. 31(4), Article no. 44–1 (2012)Google Scholar
  6. 6.
    French, R.M.: Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3(4), 128–135 (1999)CrossRefGoogle Scholar
  7. 7.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)Google Scholar
  8. 8.
    Goodfellow, I.J., Mirza, M., Xiao, D., Courville, A., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013)
  9. 9.
    Goodman, R.M., Zeng, Z.: A learning algorithm for multi-layer perceptrons with hard-limiting threshold units. In: Proceedings of the 1994 IEEE Workshop Neural Networks for Signal Processing 1994 IV, pp. 219–228. IEEE (1994)Google Scholar
  10. 10.
    Guerriero, S., Caputo, B., Mensink, T.: Deep nearest class mean classifiers. In: International Conference on Learning Representations, Worskhop Track (2018)Google Scholar
  11. 11.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  12. 12.
    Hinton, G.: Neural networks for machine learning (2012). Coursera, video lecturesGoogle Scholar
  13. 13.
    Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
  14. 14.
    Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Nat. Acad. Sci. U.S.A. 114(13), 3521–3526 (2017)MathSciNetCrossRefGoogle Scholar
  15. 15.
    Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: 2013 IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 554–561. IEEE (2013)Google Scholar
  16. 16.
    Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)Google Scholar
  17. 17.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  18. 18.
    Kuzborskij, I., Orabona, F., Caputo, B.: From N to N+1: multiclass transfer incremental learning. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013, pp. 3358–3365 (2013)Google Scholar
  19. 19.
    Kuzborskij, I., Orabona, F., Caputo, B.: Scalable greedy algorithms for transfer learning. Comput. Vis. Image Underst. 156, 174–185 (2017)CrossRefGoogle Scholar
  20. 20.
    Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015)MathSciNetCrossRefGoogle Scholar
  21. 21.
    Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2017)CrossRefGoogle Scholar
  22. 22.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)Google Scholar
  23. 23.
    Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)
  24. 24.
    Mallya, A., Lazebnik, S.: PackNet: adding multiple tasks to a single network by iterative pruning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018Google Scholar
  25. 25.
    Mallya, A., Davis, D., Lazebnik, S.: Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 72–88. Springer, Cham (2018). Scholar
  26. 26.
    McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 109–165 (1989)CrossRefGoogle Scholar
  27. 27.
    Mensink, T., Verbeek, J.J., Perronnin, F., Csurka, G.: Distance-based image classification: generalizing to new classes at near-zero cost. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2624–2637 (2013)CrossRefGoogle Scholar
  28. 28.
    Munder, S., Gavrila, D.M.: An experimental study on pedestrian classification. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1863–1868 (2006)CrossRefGoogle Scholar
  29. 29.
    Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning, vol. 2011, p. 5 (2011)Google Scholar
  30. 30.
    Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Sixth Indian Conference on Computer Vision, Graphics & Image Processing 2008, ICVGIP 2008, pp. 722–729. IEEE (2008)Google Scholar
  31. 31.
    Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: Advances in Neural Information Processing Systems, pp. 506–516 (2017)Google Scholar
  32. 32.
    Rebuffi, S.A., Bilen, H., Vedaldi, A.: Efficient parametrization of multi-domain deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8119–8127 (2018)Google Scholar
  33. 33.
    Ristin, M., Guillaumin, M., Gall, J., Van Gool, L.: Incremental learning of random forests for large-scale image classification. IEEE Trans. Pattern Anal. Mach. Intell. 38(3), 490–503 (2016)CrossRefGoogle Scholar
  34. 34.
    Rosenfeld, A., Tsotsos, J.K.: Incremental learning through deep adaptation. arXiv preprint arXiv:1705.04228 (2017)
  35. 35.
    Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  36. 36.
    Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
  37. 37.
    Saleh, B., Elgammal, A.: Large-scale classification of fine-art paintings: Learning the right metric on the right feature. arXiv preprint arXiv:1505.00855 (2015)
  38. 38.
    Silver, D.L., Yang, Q., Li, L.: Lifelong machine learning systems: beyond learning algorithms. In: AAAI Spring Symposium: Lifelong Machine Learning, vol. 13, p. 05 (2013)Google Scholar
  39. 39.
    Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
  40. 40.
    Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 32, 323–332 (2012)CrossRefGoogle Scholar
  41. 41.
    Thrun, S., Mitchell, T.M.: Lifelong robot learning. Robot. Auton. Syst. 15(1–2), 25–46 (1995)CrossRefGoogle Scholar
  42. 42.
    Thrun, S., Pratt, L.: Learning to Learn. Springer. Heidelberg (2012)Google Scholar
  43. 43.
    Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-UCSD birds-200-2011 dataset (2011)Google Scholar
  44. 44.
    Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Massimiliano Mancini
    • 1
    • 2
    Email author
  • Elisa Ricci
    • 2
    • 3
  • Barbara Caputo
    • 4
  • Samuel Rota Bulò
    • 5
  1. 1.Sapienza University of RomeRomeItaly
  2. 2.Fondazione Bruno KesslerTrentoItaly
  3. 3.University of TrentoTrentoItaly
  4. 4.Italian Institute of TechnologyGenoaItaly
  5. 5.Mapillary ResearchGrazAustria

Personalised recommendations