
Abstract

We present a novel semi-supervised video segmentation algorithm which delivers pixel labels along with their uncertainty estimates. The underlying probabilistic model is a temporal tree-structured Markov Random Field. The algorithm takes user-labeled key frame(s) of a video sequence as input and infers the marginal class posteriors of the unlabeled pixels. These posteriors are used to learn pixel unaries by training a decision forest in a semi-supervised manner. We term this the soft label Random Forest (slRF), in which the posterior of each pixel is treated as its vector label at training time. This allows us to use the standard Shannon entropy-based information gain as the objective function in an iterative, self-training semi-supervised framework. This contrasts with the transductive forest of Chap. 8, which uses separate entropy measures for labeled and unlabeled data. We demonstrate the efficacy of our approach on foreground/background segmentation problems through quantitative studies on the challenging SegTrack dataset. We envisage that our results will have wide applicability, including harvesting labeled video data for applications such as action recognition, shape learning and developing priors for video segmentation.
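The idea of treating a pixel posterior as a vector label can be illustrated with a small sketch. This is one plausible reading of a soft-label information gain, not necessarily the chapter's exact formulation: each posterior is assumed to act as a fractional vote, the votes in a node are summed and normalised into a class distribution, and the standard Shannon entropy is taken over that distribution. The function names `soft_entropy` and `soft_information_gain` are hypothetical, and weighting the children by sample count is an assumption.

```python
import math
from typing import Sequence


def soft_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Shannon entropy (in bits) of the class distribution obtained by
    summing and normalising the soft labels of the samples in a node.
    Assumption: each posterior vector counts as a fractional vote."""
    if not posteriors:
        return 0.0
    totals = [0.0] * len(posteriors[0])
    for p in posteriors:
        for k, pk in enumerate(p):
            totals[k] += pk
    mass = sum(totals)
    if mass == 0.0:
        return 0.0
    # Zero-probability classes contribute nothing to the entropy.
    return -sum((t / mass) * math.log2(t / mass) for t in totals if t > 0.0)


def soft_information_gain(
    parent: Sequence[Sequence[float]],
    left: Sequence[Sequence[float]],
    right: Sequence[Sequence[float]],
) -> float:
    """Entropy of the parent minus the size-weighted entropies of the
    two children (assumption: children are weighted by sample count)."""
    n = len(parent)
    if n == 0:
        return 0.0
    return (
        soft_entropy(parent)
        - (len(left) / n) * soft_entropy(left)
        - (len(right) / n) * soft_entropy(right)
    )


# Illustrative foreground/background posteriors for four pixels.
pixels = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
good_split = soft_information_gain(pixels, pixels[:2], pixels[2:])
poor_split = soft_information_gain(
    pixels, [pixels[0], pixels[3]], [pixels[1], pixels[2]]
)
print(f"good split gain: {good_split:.3f}, poor split gain: {poor_split:.3f}")
```

With one-hot posteriors the sketch reduces to the usual hard-label information gain, which is why a single entropy measure can serve both user-labeled and inferred pixels under this reading.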




Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Badrinarayanan, V., Budvytis, I., Cipolla, R. (2013). Semi-supervised Video Segmentation Using Decision Forests. In: Criminisi, A., Shotton, J. (eds) Decision Forests for Computer Vision and Medical Image Analysis. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4929-3_16


  • DOI: https://doi.org/10.1007/978-1-4471-4929-3_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4928-6

  • Online ISBN: 978-1-4471-4929-3

  • eBook Packages: Computer Science, Computer Science (R0)
