Macro-Micro Adversarial Network for Human Parsing

  • Yawei Luo
  • Zhedong Zheng
  • Liang Zheng
  • Tao Guan
  • Junqing Yu
  • Yi Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)

Abstract

In human parsing, the pixel-wise classification loss has drawbacks in its low-level local inconsistency and high-level semantic inconsistency. The introduction of an adversarial network tackles the two problems using a single discriminator. However, the two types of parsing inconsistency arise from distinct mechanisms, so it is difficult for a single discriminator to solve them both. To address the two kinds of inconsistency, this paper proposes the Macro-Micro Adversarial Net (MMAN). It has two discriminators. One discriminator, Macro D, acts on the low-resolution label map and penalizes semantic inconsistency, e.g., misplaced body parts. The other discriminator, Micro D, focuses on multiple patches of the high-resolution label map to address local inconsistency, e.g., blurring and holes. Compared with traditional adversarial networks, MMAN not only enforces local and semantic consistency explicitly, but also avoids the poor convergence problem of adversarial networks when handling high-resolution images. In our experiments, we validate that the two discriminators are complementary to each other in improving human parsing accuracy. The proposed framework produces competitive parsing performance compared with state-of-the-art methods, i.e., mIoU = 46.81% and 59.91% on LIP and PASCAL-Person-Part, respectively. On the relatively small PPSS dataset, our pre-trained model demonstrates impressive generalization ability. The code is publicly available at https://github.com/RoyalVane/MMAN.
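The combined objective described above — a pixel-wise classification loss plus two adversarial terms, one from the macro (low-resolution) discriminator and one averaged over the micro (patch-level) discriminator's outputs — can be sketched in a framework-free way. This is an illustrative simplification, not the paper's exact formulation: the weighting coefficients `lam_macro` and `lam_micro`, the scalar discriminator scores, and the uniform averaging over patches are all assumptions for exposition.

```python
import math

def bce(pred, target):
    """Binary cross-entropy between a predicted probability and a 0/1 target."""
    eps = 1e-7
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def mman_generator_loss(ce_loss, macro_score, micro_scores,
                        lam_macro=1.0, lam_micro=1.0):
    """Generator-side loss: pixel-wise CE plus two adversarial terms.

    ce_loss      -- pixel-wise cross-entropy of the predicted label map
    macro_score  -- Macro D's probability that the low-res label map is real
    micro_scores -- Micro D's real-probabilities for each high-res patch
    The generator is rewarded when both discriminators output values near 1.
    """
    adv_macro = bce(macro_score, 1.0)
    adv_micro = sum(bce(s, 1.0) for s in micro_scores) / len(micro_scores)
    return ce_loss + lam_macro * adv_macro + lam_micro * adv_micro

# Fooling both discriminators (scores near 1) lowers the total loss
# relative to leaving them undecided (scores near 0.5).
good = mman_generator_loss(0.5, 0.9, [0.9, 0.9])
bad = mman_generator_loss(0.5, 0.5, [0.5, 0.5])
```

In practice both adversarial terms would be computed from discriminator networks over dense feature maps rather than scalar scores; the point here is only how the macro and micro penalties enter the same objective.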

Keywords

Human parsing · Adversarial network · Inconsistency · Macro-Micro

Notes

Acknowledgment

This work is partially supported by the National Natural Science Foundation of China (No. 61572211). We acknowledge the Data to Decisions CRC (D2D CRC) and the Cooperative Research Centers Programme for funding this research.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yawei Luo (1, 2)
  • Zhedong Zheng (2)
  • Liang Zheng (2, 3)
  • Tao Guan (1)
  • Junqing Yu (1)
  • Yi Yang (2)

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. CAI, University of Technology Sydney, Sydney, Australia
  3. Singapore University of Technology and Design, Singapore, Singapore