Unsupervised Deep Clustering for Fashion Images

  • Cairong Yan
  • Umar Subhan Malhi
  • Yongfeng Huang
  • Ran Tao
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1027)


In many visual domains such as fashion, building an effective unsupervised clustering model depends on visual feature representations rather than on structured or semi-structured data. In this paper, we propose a fashion image deep clustering (FiDC) model that consists of two parts: feature representation and clustering. Fashion images serve as the input and are processed by a deep stacked autoencoder to produce a latent feature representation, and the output of this autoencoder is used as the input to the clustering task. Since the output of the former has a great influence on the latter, the model integrates the learning processes of the autoencoder and the clustering. The autoencoder is trained with the optimal number of neurons per hidden layer to avoid overfitting, and we optimize the cluster centroids using stochastic gradient descent and the backpropagation algorithm. We evaluate the FiDC model on a real-world fashion dataset downloaded from Amazon, in which the images have already been converted into 4096-dimensional visual feature vectors by convolutional neural networks. The experimental results show that our model achieves state-of-the-art performance.
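The centroid optimization mentioned above can be illustrated with a minimal sketch. The abstract does not state the clustering objective, so the code below assumes a Student's t soft assignment with a KL-divergence loss against a sharpened target distribution, as commonly used in deep embedded clustering. All function names, the learning rate, and the batch size are hypothetical, and the latent features `z` stand in for the autoencoder output. Only the centroids are updated here; the joint update of the autoencoder weights described in the abstract is omitted.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q[i, j] of point i to centroid j (assumed form)."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster size, renormalise rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

def centroid_gradient(z, centroids, p, q, alpha=1.0):
    """Gradient of KL(P || Q) with respect to the centroids, with P held fixed."""
    diff = z[:, None, :] - centroids[None, :, :]      # shape (n, k, d)
    d2 = (diff ** 2).sum(axis=2)                      # shape (n, k)
    coeff = (p - q) / (1.0 + d2 / alpha)
    return -((alpha + 1.0) / alpha) * (coeff[:, :, None] * diff).sum(axis=0)

def refine_centroids(z, centroids, lr=0.01, epochs=20, batch_size=32, seed=0):
    """Mini-batch SGD on the centroids only (hypothetical hyperparameters)."""
    rng = np.random.default_rng(seed)
    mu = np.array(centroids, dtype=float)
    for _ in range(epochs):
        # Targets are recomputed once per epoch and held fixed within it.
        p = target_distribution(soft_assign(z, mu))
        order = rng.permutation(len(z))
        for start in range(0, len(z), batch_size):
            idx = order[start:start + batch_size]
            q_batch = soft_assign(z[idx], mu)
            mu -= lr * centroid_gradient(z[idx], mu, p[idx], q_batch)
    return mu
```

In practice the centroids would typically be initialised by running k-means on the latent features, and `z` would be re-encoded as the autoencoder weights change; both steps are left out of this sketch.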


Unsupervised clustering, Representation learning, Autoencoder, Fashion images



This research was partly funded by the National Natural Science Foundation of China (No. 61402100).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Cairong Yan (1)
  • Umar Subhan Malhi (1)
  • Yongfeng Huang (1)
  • Ran Tao (1)
  1. School of Computer Science and Technology, Donghua University, Shanghai, China
