Hierarchical Supervoxel Graph for Interactive Video Object Representation and Segmentation

  • Xiang Fu
  • Changhu Wang
  • C.-C. Jay Kuo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10111)


In this paper, we study how to represent and segment objects in a video. To handle the motion of objects and the variations of their internal regions, we present an interactive hierarchical supervoxel representation for video object segmentation. First, a hierarchical supervoxel graph with multiple granularities is built by local clustering and region merging to represent the video; color histograms and motion information both serve as features, and visual saliency guides the merging. Then, a supervoxel selection algorithm chooses supervoxels of diverse granularities to represent the object(s) labeled by the user. Finally, based on these representations, an interactive video object segmentation framework is proposed to handle complex and diverse scenes with large motion and occlusions. Experimental results show the effectiveness of the proposed algorithms in supervoxel graph construction and video object segmentation.
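The abstract does not give the merging criterion, so the following is only a minimal sketch of how a region-merging hierarchy of this general kind might be built, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the chi-square histogram distance, the saliency-weighted cost in `merge_cost`, the size-weighted averaging of merged features, and the names `build_hierarchy`, `hist`, `sal`, and `size`. The single `hist` vector stands in for whatever color and motion features are used.

```python
from itertools import count


def chi2(h1, h2):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def merge_cost(r1, r2, saliency_weight=1.0):
    # Hypothetical cost: feature distance, inflated when saliency differs,
    # so that salient and non-salient regions tend to merge late.
    return chi2(r1["hist"], r2["hist"]) * (1.0 + saliency_weight * abs(r1["sal"] - r2["sal"]))


def build_hierarchy(regions, edges):
    """Greedily merge the cheapest adjacent pair until no edges remain.

    Returns the merge tree as a list of (child_a, child_b, parent) tuples,
    finest merges first. Region ids are assumed to be integers.
    """
    regions = {k: dict(v) for k, v in regions.items()}
    adj = {k: set() for k in regions}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    next_id = count(max(regions) + 1)
    merges = []
    while True:
        pairs = [(merge_cost(regions[a], regions[b]), a, b)
                 for a in adj for b in adj[a] if a < b]
        if not pairs:
            break
        _, a, b = min(pairs)
        ra, rb = regions.pop(a), regions.pop(b)
        n = ra["size"] + rb["size"]
        parent = next(next_id)
        # Merged features are size-weighted averages of the children (an assumption).
        regions[parent] = {
            "hist": [(x * ra["size"] + y * rb["size"]) / n
                     for x, y in zip(ra["hist"], rb["hist"])],
            "sal": (ra["sal"] * ra["size"] + rb["sal"] * rb["size"]) / n,
            "size": n,
        }
        # Rewire the neighbours of both children to the new parent node.
        neighbours = (adj.pop(a) | adj.pop(b)) - {a, b}
        adj[parent] = neighbours
        for q in neighbours:
            adj[q] -= {a, b}
            adj[q].add(parent)
        merges.append((a, b, parent))
    return merges


# Toy data: regions 0 and 1 look like a salient object, 2 and 3 like background.
toy_regions = {
    0: {"hist": [0.9, 0.1], "sal": 0.9, "size": 10},
    1: {"hist": [0.8, 0.2], "sal": 0.8, "size": 10},
    2: {"hist": [0.1, 0.9], "sal": 0.1, "size": 20},
    3: {"hist": [0.3, 0.7], "sal": 0.2, "size": 20},
}
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_merges = build_hierarchy(toy_regions, toy_edges)
```

On this toy input the two object regions merge first, then the two background regions, and object and background are joined only at the root. Cutting the merge tree at different depths gives coarser or finer granularities, which is the kind of choice a supervoxel selection step would make.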



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Southern California, Los Angeles, USA
  2. Microsoft Research, Beijing, China
