Abstract
Human segmentation from a single image using deep learning models has seen significant performance improvements. However, directly applying a deep human segmentation model to video yields unsatisfactory results: the segmentation is discontinuous across frames, and the segmentation process is slow. To address these issues, we propose a new real-time video-based human segmentation framework, designed for single-person videos, that produces smooth, accurate, and fast human segmentation results. The proposed framework consists of a fully convolutional network and a tracking module based on a level set algorithm. The fully convolutional network segments the human in the first frame of the video sequence, and the tracking module obtains the segmentation of each subsequent frame using the segmentation result of the preceding frame as the initial segmentation. The fully convolutional network is trained on human image datasets. To evaluate the proposed framework, we have created and annotated a new single-person video dataset. The experimental results demonstrate accurate and temporally smooth human segmentation at a much higher speed than using a deep human segmentation model alone.
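The two-stage structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the network architecture or the level set formulation, so everything below is an assumption made for illustration: `segment_first_frame` is a hypothetical threshold-based placeholder standing in for the fully convolutional network, and `track_mask` is a simplified narrow-band region-competition update (in the spirit of Chan-Vese level sets), not the authors' tracking module. All function names are hypothetical.

```python
import numpy as np


def segment_first_frame(frame, threshold=0.5):
    """Hypothetical placeholder for the fully convolutional network.

    A real system would run a trained network here; a plain intensity
    threshold is used only so the sketch runs on its own.
    """
    return frame > threshold


def dilate(mask):
    """4-neighbour binary dilation implemented with array shifts."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out


def track_mask(frame, prev_mask, iterations=20):
    """Evolve the previous frame's mask so that it fits the current frame.

    Assumption: a simplified region-competition update restricted to a
    narrow band around the contour, used as a stand-in for a level set
    tracker.
    """
    mask = prev_mask.copy()
    for _ in range(iterations):
        if mask.all() or not mask.any():
            break  # one region is empty, so region means are undefined
        c_in = frame[mask].mean()    # mean intensity inside the contour
        c_out = frame[~mask].mean()  # mean intensity outside the contour
        # Narrow band: pixels within one step of the contour on either side.
        band = dilate(mask) & dilate(~mask)
        # A band pixel joins the region whose mean intensity it is closer to.
        closer_in = (frame - c_in) ** 2 < (frame - c_out) ** 2
        new_mask = np.where(band, closer_in, mask)
        if np.array_equal(new_mask, mask):
            break  # contour has stopped moving
        mask = new_mask
    return mask


def segment_video(frames):
    """Segment frame 0 with the 'network', then track through the rest."""
    masks = [segment_first_frame(frames[0])]
    for frame in frames[1:]:
        # The previous frame's result initialises the current frame.
        masks.append(track_mask(frame, masks[-1]))
    return masks
```

Calling `segment_video` on a list of 2-D grayscale arrays returns one boolean mask per frame. The point of the design is that the expensive model runs once, while every later frame only needs a few cheap contour updates near the previous boundary, which is also why consecutive masks change gradually rather than independently.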
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, T., Lang, C., Xing, J. (2019). Realtime Human Segmentation in Video. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_17
DOI: https://doi.org/10.1007/978-3-030-05716-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science, Computer Science (R0)