Semi-supervised Video Segmentation Using Decision Forests

Badrinarayanan, V.; Budvytis, I.; Cipolla, R.

doi:10.1007/978-1-4471-4929-3_16

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

We present a novel semi-supervised video segmentation algorithm that delivers pixel labels along with their uncertainty estimates. The underlying probabilistic model is a temporal tree-structured Markov Random Field. Our algorithm takes as input one or more user-labeled key frames of a video sequence, from which we infer the marginal class posteriors of the unlabeled pixels. These posteriors are used to learn pixel unaries by training a decision forest in a semi-supervised manner. We term this the soft label Random Forest (slRF), in which each pixel's posterior is treated as its vector label at training time. This allows us to use the standard Shannon entropy-based information gain as the objective function in an iterative, self-training semi-supervised framework, in contrast to the transductive forest of Chap. 8, which uses separate entropy measures for labeled and unlabeled data. We demonstrate the efficacy of our approach on foreground/background segmentation problems through quantitative studies on the challenging SegTrack dataset. We expect our results to have wide applicability, including harvesting labeled video data for applications such as action recognition, shape learning, and developing priors for video segmentation.
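
As an illustration of the soft-label idea in the abstract, the sketch below computes a Shannon entropy-based information gain when each training pixel carries a posterior vector instead of a single hard class. It is not the authors' implementation: the function names are hypothetical, and forming a node's class distribution as the normalised sum of its pixels' posteriors is an assumption made here for illustration.

```python
import math


def soft_histogram(posteriors):
    """Sum per-pixel posterior vectors and normalise to a class distribution.

    Assumption: a node's class distribution is the normalised sum of the
    soft labels of the pixels that reach it.
    """
    n_classes = len(posteriors[0])
    totals = [0.0] * n_classes
    for p in posteriors:
        for k, value in enumerate(p):
            totals[k] += value
    mass = sum(totals)
    return [t / mass for t in totals]


def shannon_entropy(dist):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def information_gain(posteriors, goes_left):
    """Entropy reduction of a binary split over soft-labelled samples.

    posteriors: list of per-pixel class posterior vectors (each sums to 1).
    goes_left:  list of booleans, True if the sample is sent to the left child.
    """
    left = [p for p, g in zip(posteriors, goes_left) if g]
    right = [p for p, g in zip(posteriors, goes_left) if not g]
    if not left or not right:
        return 0.0  # a degenerate split separates nothing
    n = len(posteriors)
    parent = shannon_entropy(soft_histogram(posteriors))
    children = (len(left) / n) * shannon_entropy(soft_histogram(left)) \
        + (len(right) / n) * shannon_entropy(soft_histogram(right))
    return parent - children


# Hypothetical foreground/background posteriors for four pixels.
soft_labels = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
gain_good = information_gain(soft_labels, [True, True, False, False])
gain_bad = information_gain(soft_labels, [True, False, True, False])
```

With hard one-hot labels this reduces to the usual information gain, which is why a single entropy measure can serve both user-labeled pixels and pixels labeled only by inferred posteriors.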

Author information

Authors and Affiliations

University of Cambridge, Cambridge, UK
V. Badrinarayanan, I. Budvytis & R. Cipolla

Editor information

Editors and Affiliations

Microsoft Research Ltd., 7 J.J. Thomson Avenue, Cambridge, CB3 0FB, United Kingdom
A. Criminisi
Microsoft Research Ltd., 7 J.J. Thomson Avenue, Cambridge, CB3 0FB, United Kingdom
J. Shotton

About this chapter

Cite this chapter

Badrinarayanan, V., Budvytis, I., Cipolla, R. (2013). Semi-supervised Video Segmentation Using Decision Forests. In: Criminisi, A., Shotton, J. (eds) Decision Forests for Computer Vision and Medical Image Analysis. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4929-3_16

DOI: https://doi.org/10.1007/978-1-4471-4929-3_16
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4928-6
Online ISBN: 978-1-4471-4929-3
eBook Packages: Computer Science, Computer Science (R0)
