Abstract
We present a novel semi-supervised video segmentation algorithm that delivers pixel labels along with their uncertainty estimates. The underlying probabilistic model is a temporal tree-structured Markov Random Field. The algorithm takes as input one or more user-labeled key frames of a video sequence and infers the marginal class posteriors of the unlabeled pixels. These posteriors are used to learn pixel unaries by training a decision forest in a semi-supervised manner. We term this the soft label Random Forest (slRF), in which each pixel's posterior is treated as its vector label at training time. This allows us to use the standard Shannon entropy-based information gain as the objective function in an iterative, self-training semi-supervised framework. This is in contrast to the transductive forest of Chap. 8, which uses separate entropy measures for labeled and unlabeled data. We demonstrate the efficacy of our approach on foreground/background segmentation problems through quantitative studies on the challenging SegTrack dataset. We expect our results to have wide applicability, including harvesting labeled video data for applications such as action recognition, shape learning and developing priors for video segmentation.
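To make the soft-label idea concrete, the following is a minimal sketch of how a Shannon entropy-based information gain could be evaluated when each pixel carries a posterior vector rather than a hard class label. It is an illustration under stated assumptions, not the chapter's implementation: the function names (`soft_histogram`, `shannon_entropy`, `soft_information_gain`) are hypothetical, and the choice to form a node's class distribution by summing and normalizing the posteriors of the pixels that reach it is an assumption about how the vector labels might be aggregated.

```python
import math


def soft_histogram(posteriors):
    """Sum per-pixel posterior vectors and normalize to a class distribution.

    Assumption: a node's class distribution is the normalized sum of the
    soft labels of the pixels reaching it.
    """
    n_classes = len(posteriors[0])
    totals = [0.0] * n_classes
    for p in posteriors:
        for k, value in enumerate(p):
            totals[k] += value
    mass = sum(totals)
    return [t / mass for t in totals]


def shannon_entropy(dist):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def soft_information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    h_parent = shannon_entropy(soft_histogram(parent))
    h_children = sum(
        len(child) / n * shannon_entropy(soft_histogram(child))
        for child in (left, right)
        if child  # skip an empty child to avoid dividing by zero mass
    )
    return h_parent - h_children


# Hypothetical two-class example (foreground, background) with uncertain labels.
left = [[0.9, 0.1], [0.8, 0.2]]
right = [[0.2, 0.8], [0.1, 0.9]]
gain = soft_information_gain(left + right, left, right)
```

With hard one-hot labels this reduces to the usual information gain of a classification forest, while uncertain posteriors yield a smaller gain for the same split, which is one way the label uncertainty can influence training.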
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Badrinarayanan, V., Budvytis, I., Cipolla, R. (2013). Semi-supervised Video Segmentation Using Decision Forests. In: Criminisi, A., Shotton, J. (eds) Decision Forests for Computer Vision and Medical Image Analysis. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4929-3_16
DOI: https://doi.org/10.1007/978-1-4471-4929-3_16
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4928-6
Online ISBN: 978-1-4471-4929-3
eBook Packages: Computer Science, Computer Science (R0)