An improved landmark-driven and spatial–channel attentive convolutional neural network for fashion clothes classification

Abstract

Fashion clothes classification encompasses detecting and identifying items of clothing in an image. Research in this area has applied deep neural networks with impact on social media, e-commerce and the fashion industry. In this paper, we propose an attention-driven technique for visual fashion analysis in images, aiming to achieve clothing category classification and attribute prediction by producing regularised landmark layouts. To enhance clothing classification, our fashion model incorporates two attention pipelines: landmark-driven attention and spatial–channel attention. These attention pipelines allow our model to represent multiscale contextual information around landmarks, improving classification by identifying the important features and locating where they occur in an input image. We evaluated the proposed network on two large-scale benchmark datasets: DeepFashion-C and Fashion Landmark Detection (FLD). Experimental results show that the proposed deep neural network architecture outperforms other recently reported state-of-the-art techniques in the classification of fashion clothes.
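To illustrate the kind of mechanism the abstract describes, the sketch below shows a minimal spatial–channel attention block in PyTorch: a squeeze-and-excitation-style channel gate followed by a 1×1-convolution spatial gate. This is an illustrative assumption, not the authors' implementation; the module name, layer choices and reduction ratio are hypothetical.

```python
import torch
import torch.nn as nn


class SpatialChannelAttention(nn.Module):
    """Hypothetical sketch: channel attention (global pooling + bottleneck
    MLP) followed by a 1x1-conv spatial attention map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel gate: squeeze (global average pool) then excite (bottleneck).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial gate: one attention weight per spatial location.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)   # reweight feature channels
        x = x * self.spatial_gate(x)   # reweight spatial locations
        return x


feat = torch.randn(2, 64, 28, 28)          # a batch of CNN feature maps
out = SpatialChannelAttention(64)(feat)
print(tuple(out.shape))                    # (2, 64, 28, 28)
```

Both gates produce values in (0, 1), so the block rescales rather than replaces features and preserves the input's shape, which lets it be dropped between existing convolutional stages.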


Figures 1–5 are available in the full text.


Author information


Corresponding author

Correspondence to Majuran Shajini.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shajini, M., Ramanan, A. An improved landmark-driven and spatial–channel attentive convolutional neural network for fashion clothes classification. Vis Comput (2020). https://doi.org/10.1007/s00371-020-01885-7


Keywords

  • Dilated convolutions
  • Fashion clothes classification
  • Landmark-driven
  • Spatial–channel attention