SP-VITON: shape-preserving image-based virtual try-on network

  • Dan Song
  • Tianbao Li
  • Zhendong Mao
  • An-An Liu


Image-based virtual try-on networks, which change the outfit of a person in an image to the desired clothes from another image, have attracted increasing research interest. Previous works try to extract a clothing-agnostic person representation from the original person image and then synthesize it with the given clothes image through a try-on network. However, their body shape representation simply downsamples the clothed body segmentation to a low resolution, which is too coarse, still contains noise from the original clothes, and may result in unrealistic artifacts. To address this, we propose SP-VITON (Shape-Preserving VIrtual Try-On Network), which keeps the user’s original body shape while getting rid of the original clothes. First, we augment the shape variety of the dataset and estimate the person's 2D shape under clothes using DensePose. Then, a try-on network is trained with the augmented dataset and the new shape representation. Experimental results show that our method improves over state-of-the-art image-based try-on methods across various body shapes and clothing types of the input person image.
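The coarseness of a downsampled body shape representation can be illustrated with a toy example. The sketch below is not the authors' code: the block-wise majority vote, the nearest-neighbour upsampling, the factor of 4, and all function names are assumptions chosen only for illustration. It shrinks a small binary silhouette and counts how many pixels disagree with the original once it is scaled back up.

```python
from typing import List

Mask = List[List[int]]  # binary silhouette: 1 = body, 0 = background


def downsample_mask(mask: Mask, factor: int) -> Mask:
    """Reduce each factor x factor block to one pixel by majority vote (assumed scheme)."""
    h, w = len(mask), len(mask[0])
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            total = sum(
                mask[y][x]
                for y in range(by, by + factor)
                for x in range(bx, bx + factor)
            )
            row.append(1 if 2 * total >= factor * factor else 0)
        out.append(row)
    return out


def upsample_mask(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upsampling back to the original grid size."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def coarse_shape(mask: Mask, factor: int) -> Mask:
    """Coarse shape representation: downsample, then scale back up."""
    return upsample_mask(downsample_mask(mask, factor), factor)


def pixel_disagreement(a: Mask, b: Mask) -> int:
    """Number of pixels where two same-sized masks differ."""
    return sum(va != vb for ra, rb in zip(a, b) for va, vb in zip(ra, rb))


# Hypothetical 8 x 8 silhouette (rough torso and two legs).
toy_mask = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
]

coarse = coarse_shape(toy_mask, factor=4)
lost = pixel_disagreement(toy_mask, coarse)
print(f"pixels that differ after coarsening: {lost} of {8 * 8}")
```

In this toy case the legs vanish entirely and the torso becomes a rectangle, which is the kind of lost detail a finer shape estimate is meant to avoid.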


Virtual try-on · Shape-preserving · Person image synthesis · Image alignment



This work was supported in part by the National Natural Science Foundation of China (61902277, 61772359, 61872267, 61702471), the grant of the 2019 Tianjin New Generation Artificial Intelligence Major Program, the grant of the 2018 Tianjin New Generation Artificial Intelligence Major Program (18ZXZNGX00150), the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University (Grant No. A1907), and the grant of the Elite Scholar Program of Tianjin University (2019XRX-0035).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Multimedia Institute, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
  2. Institute of Information Engineering, Chinese Academy of Sciences (CAS) and the School of Cyber Security, University of CAS, Beijing, China
