Deep Fashion Analysis with Feature Map Upsampling and Landmark-Driven Attention

  • Jingyuan Liu
  • Hong Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)


In this paper, we propose an attentive fashion network to address three problems in fashion analysis: landmark localization, category classification, and attribute prediction. A landmark prediction branch with an upsampling network structure boosts the accuracy of fashion landmark localization. Guided by the predicted landmarks, a landmark-driven attention mechanism then improves the precision of fashion category classification and attribute prediction. Experimental results show that our approach outperforms state-of-the-art methods on the DeepFashion dataset.
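The abstract names two ingredients, feature map upsampling and landmark-driven attention, without giving formulas. The NumPy sketch below is one possible reading, not the authors' implementation: nearest-neighbour repetition stands in for the learned upsampling layers, and the attention is assumed to be a residual reweighting `features * (1 + mask)`, where `mask` is the per-pixel maximum over the landmark heatmaps. The function names `upsample_nearest` and `landmark_attention`, and all array shapes, are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) array on both spatial axes.

    Stand-in for a learned upsampling layer (an assumption for illustration).
    """
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def landmark_attention(features, heatmaps):
    """Reweight a feature map with a spatial mask built from landmark heatmaps.

    features: (C, H, W) feature map
    heatmaps: (K, H, W) array, one heatmap per landmark, values in [0, 1]
    """
    # Assumed aggregation: merge the K heatmaps with a per-pixel maximum.
    mask = heatmaps.max(axis=0, keepdims=True)  # shape (1, H, W)
    # Assumed residual form: features are kept everywhere and boosted
    # near predicted landmarks; the mask broadcasts over the channel axis.
    return features * (1.0 + mask)

rng = np.random.default_rng(0)
coarse = rng.random((8, 7, 7))                 # low-resolution features
features = upsample_nearest(coarse, factor=2)  # shape (8, 14, 14)

heatmaps = np.zeros((2, 14, 14))               # two toy landmark heatmaps
heatmaps[0, 3, 4] = 1.0
heatmaps[1, 10, 9] = 1.0

attended = landmark_attention(features, heatmaps)
```

With this residual form, locations far from any landmark pass through unchanged, so the classification branches still see the whole garment while landmark regions are emphasized.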


Fashion analysis, Landmark detection, Clothing category classification, Attention mechanism, Deep learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
