Cluster Computing, Volume 22, Supplement 3, pp 7123–7134

Retrieving real world clothing images via multi-weight deep convolutional neural networks

  • Ruifan Li (corresponding author)
  • Fangxiang Feng
  • Ibrar Ahmad
  • Xiaojie Wang

Abstract

Clothing images are abundantly available on the Internet, especially on e-commerce platforms. Retrieving those images is important for commercial and social applications and has recently received tremendous attention from communities such as multimedia processing and computer vision. However, the large variations in the appearance and style of clothing, together with the large number of categories and attributes, make the problem challenging. Furthermore, for real-world images, the labels provided by shop retailers on webpages are largely erroneous or incomplete, and the imbalance among image categories hinders effective learning. To overcome these problems, we adopt a multi-task deep learning framework to learn the representation, and we propose multi-weight deep convolutional neural networks for imbalanced learning. The network topology contains two groups of layers: shared layers at the bottom and task-dependent layers at the top. In addition, category-relevant parameters are incorporated to regularize the backward gradients for the categories. A mathematical proof shows how this scheme relates to regulating the learning rates. Experiments demonstrate that the proposed joint framework and multi-weight neural networks can effectively learn robust representations and achieve better performance.
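
The link between category-relevant gradient weights and learning rates can be illustrated with a small sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: the inverse-frequency weights, the class counts, and the simplification of updating the logits directly instead of network parameters. Under those assumptions, scaling the backward gradient of a sample by its category weight gives the same update as an unweighted step with the learning rate multiplied by that weight.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_gradient(logits, label):
    """Gradient of the cross-entropy loss w.r.t. the logits:
    softmax(logits) - one_hot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def sgd_step(logits, label, lr, category_weights=None):
    """One gradient step on the logits (a simplification, not a full network).
    The gradient is scaled by the weight of the sample's category."""
    scale = 1.0 if category_weights is None else category_weights[label]
    grad = logit_gradient(logits, label)
    return [z - lr * scale * g for z, g in zip(logits, grad)]


# Hypothetical class counts for an imbalanced three-category data set.
counts = [900, 90, 10]

# Assumed weighting rule: inverse frequency, normalised to a mean of 1.
inverse = [1.0 / c for c in counts]
category_weights = [w * len(inverse) / sum(inverse) for w in inverse]

logits = [0.2, -0.1, 0.4]
label = 2  # a sample from the rarest category
lr = 0.1

# Weighted gradient with the base learning rate ...
weighted = sgd_step(logits, label, lr, category_weights)
# ... versus a plain gradient with a category-specific learning rate.
rescaled_lr = sgd_step(logits, label, lr * category_weights[label])
```

In this toy setting the two updates coincide, which is one way to read the abstract's statement that the category-relevant parameters relate to regulating the learning rates; the paper's actual weights and proof may differ.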

Keywords

Clothing image retrieval, Convolutional neural network, Multi-task, Multi-weight

Notes

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Nos. 61273365, 61472046, and 61472048) and the Discipline Building Plan in 111 Base (No. B08004). The authors thank Prof. Chuan Shi at Beijing University of Posts and Telecommunications for reading the draft of this paper and for giving helpful comments. The authors would also like to thank the editor and the anonymous reviewers for useful comments and suggestions that allowed them to improve the final version of this paper.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Ruifan Li (1, 2), corresponding author
  • Fangxiang Feng (3)
  • Ibrar Ahmad (1, 4)
  • Xiaojie Wang (1, 2)

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
  2. Engineering Research Center of Information Networks, Ministry of Education, Beijing, China
  3. School of Digital Media and Design Arts, Beijing University of Posts and Telecommunications, Beijing, China
  4. Department of Computer Science, University of Peshawar, Peshawar, Pakistan
