Advertisement

A Modulation Module for Multi-task Learning with Applications in Image Retrieval

  • Xiangyun Zhao
  • Haoxiang LiEmail author
  • Xiaohui Shen
  • Xiaodan Liang
  • Ying Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

Multi-task learning has been widely adopted in many computer vision tasks to improve overall computation efficiency or boost the performance of individual tasks, under the assumption that those tasks are correlated and complementary to each other. However, the relationships between the tasks are complicated in practice, especially when the number of involved tasks scales up. When two tasks are of weak relevance, they may compete or even distract each other during joint training of shared parameters, and as a consequence undermine the learning of all the tasks. This will raise destructive interference which decreases learning efficiency of shared parameters and lead to low quality loss local optimum w.r.t. shared parameters. To address the this problem, we propose a general modulation module, which can be inserted into any convolutional neural network architecture, to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks with minor parameters addition. Equipped with this module, gradient directions from different tasks can be enforced to be consistent for those shared parameters, which benefits multi-task joint training. The module is end-to-end learnable without ad-hoc design for specific tasks, and can naturally handle many tasks at the same time. We apply our approach on two retrieval tasks, face retrieval on the CelebA dataset [12] and product retrieval on the UT-Zappos50K dataset [34, 35], and demonstrate its advantage over other multi-task learning methods in both accuracy and storage efficiency.

Notes

Acknowledgements

This work was supported in part by National Science Foundation grant IIS-1217302, IIS-1619078, the Army Research Office ARO W911NF-16-1-0138, and Adobe Collaboration Funding.

References

  1. 1.
    Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)MathSciNetzbMATHGoogle Scholar
  2. 2.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer Vision and Pattern Recognition (2014)Google Scholar
  3. 3.
    Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1735–1742. IEEE (2006)Google Scholar
  4. 4.
    He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: The IEEE International Conference on Computer Vision (ICCV), October 2017Google Scholar
  5. 5.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  6. 6.
    Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017Google Scholar
  7. 7.
    Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011)CrossRefGoogle Scholar
  8. 8.
    Jou, B., Chang, S.F.: Deep cross residual learning for multitask visual recognition. In: Proceedings of the 2016 ACM on Multimedia Conference, pp. 998–1007. ACM (2016)Google Scholar
  9. 9.
    Kokkinos, I.: Ubernet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)Google Scholar
  10. 10.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  11. 11.
    Lin, K., Yang, H.F., Hsiao, J.H., Chen, C.S.: Deep learning of binary hash codes for fast image retrieval. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2015Google Scholar
  12. 12.
    Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of International Conference on Computer Vision (ICCV) (2015)Google Scholar
  13. 13.
    Lu, Y., et al.: Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017Google Scholar
  14. 14.
    Misra, I., Shrivastava, A., Gupta, A., Hebert, M.: Cross-stitch networks for multi-task learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3994–4003 (2016)Google Scholar
  15. 15.
    Mousavian, A., Pirsiavash, H., Košecká, J.: Joint semantic segmentation and depth estimation with deep convolutional networks. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 611–619. IEEE (2016)Google Scholar
  16. 16.
    Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3384–3391. IEEE (2010)Google Scholar
  17. 17.
    Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2007, pp. 1–8. IEEE (2007)Google Scholar
  18. 18.
    Ranjan, R., Patel, V.M., Chellappa, R.: Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. arXiv preprint arXiv:1603.01249 (2016)
  19. 19.
    Ranjan, R., Sankaranarayanan, S., Castillo, C.D., Chellappa, R.: An all-in-one convolutional neural network for face analysis. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 17–24. IEEE (2017)Google Scholar
  20. 20.
    Rothe, R., Timofte, R., Van Gool, L.: Dex: deep expectation of apparent age from a single image. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 10–15 (2015)Google Scholar
  21. 21.
    Schapire, R.E., Singer, Y.: Boostexter: a boosting-based system for text categorization. Mach. Learn. 39(2–3), 135–168 (2000)CrossRefGoogle Scholar
  22. 22.
    Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)Google Scholar
  23. 23.
    Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. In: Advances in Neural Information Processing Systems, pp. 41–48 (2004)Google Scholar
  24. 24.
    Shazeer, N., et al.: Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 (2017)
  25. 25.
    Shen, X., Lin, Z., Brandt, J., Avidan, S., Wu, Y.: Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3013–3020. IEEE (2012)Google Scholar
  26. 26.
    Veit, A., Belongie, S., Karaletsos, T.: Conditional similarity networks. In: Computer Vision and Pattern Recognition (CVPR) (2017)Google Scholar
  27. 27.
    Wan, J., et al.: Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 157–166. ACM (2014)Google Scholar
  28. 28.
    Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for scalable image retrieval. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3424–3431. IEEE (2010)Google Scholar
  29. 29.
    Wang, P., et al.: Towards unified depth and semantic prediction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2800–2809 (2015)Google Scholar
  30. 30.
    Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10, 207–244 (2009)Google Scholar
  31. 31.
    Xing, E.P., Jordan, M.I., Russell, S.J., Ng, A.Y.: Distance metric learning with application to clustering with side-information. In: Advances in Neural Information Processing Systems, pp. 521–528 (2003)Google Scholar
  32. 32.
    Yang, Y., Hospedales, T.M.: Deep multi-task representation learning: a tensor factorisation approach. CoRR abs/1605.06391 (2016)Google Scholar
  33. 33.
    Yin, X., Liu, X.: Multi-task convolutional neural network for face recognition. arXiv preprint arXiv:1702.04710 (2017)
  34. 34.
    Yu, A., Grauman, K.: Fine-grained visual comparisons with local learning. In: Computer Vision and Pattern Recognition (CVPR), June 2014Google Scholar
  35. 35.
    Yu, A., Grauman, K.: Semantic jitter: dense supervision for visual comparisons via synthetic images. In: International Conference on Computer Vision (ICCV), October 2017Google Scholar
  36. 36.
    Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10590-1_53CrossRefGoogle Scholar
  37. 37.
    Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiangyun Zhao
    • 1
  • Haoxiang Li
    • 2
    Email author
  • Xiaohui Shen
    • 3
  • Xiaodan Liang
    • 4
  • Ying Wu
    • 1
  1. 1.EECS DepartmentNorthwestern UniversityEvanstonUSA
  2. 2.AIBeePalo AltoUSA
  3. 3.Bytedance AI LabBeijingChina
  4. 4.Carnegie Mellon UniversityPittsburghUSA

Personalised recommendations