Motion Segmentation

Vidal, René; Ma, Yi; Sastry, S. Shankar

doi:10.1007/978-0-387-87811-9_11


Part of the book series: Interdisciplinary Applied Mathematics (IAM, volume 40)


Abstract

The previous two chapters have shown how to use a mixture of subspaces to represent and segment static images. In those cases, different subspaces were used to account for multiple characteristics of natural images, e.g., different textures. In this chapter, we will show how to use a mixture of subspaces to represent and segment time series, e.g., video and motion capture data. In particular, we will use different subspaces to account for multiple characteristics of the dynamics of a time series, such as multiple moving objects or multiple temporal events.

I can calculate the motion of heavenly bodies, but not the madness of people.

—Isaac Newton
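The following minimal Python sketch shows one concrete way that independently moving objects give rise to different subspaces. It assumes an orthographic (affine) camera and purely synthetic data, neither of which the abstract specifies, and all function names are hypothetical. Stacking the image coordinates of a tracked point over \(F\) frames gives a \(2F\)-dimensional trajectory vector. The trajectories of points on one rigidly moving object then span a subspace of dimension at most 4, while trajectories from two independent motions together typically span a larger one.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Motion = Tuple[Matrix, List[float]]  # (R_f, T_f) for one frame


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rotation(a: float, b: float, c: float) -> Matrix:
    """R = Rz(a) Ry(b) Rx(c), an element of SO(3)."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    Rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    Ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    Rx = [[1.0, 0.0, 0.0], [0.0, cc, -sc], [0.0, sc, cc]]
    return matmul(matmul(Rz, Ry), Rx)


def random_motion(num_frames: int, rng: random.Random) -> List[Motion]:
    """One random rigid motion (R_f, T_f) per frame."""
    return [(rotation(*(rng.uniform(-math.pi, math.pi) for _ in range(3))),
             [rng.uniform(-5.0, 5.0) for _ in range(3)])
            for _ in range(num_frames)]


def trajectories(points: Sequence[List[float]], motion: List[Motion]) -> Matrix:
    """2F x P matrix whose column p stacks (x_fp, y_fp) over frames f.

    Orthographic camera (an assumption): (x_fp, y_fp) are the first two
    coordinates of R_f X_p + T_f.
    """
    cols = []
    for X in points:
        col = []
        for R, T in motion:
            for i in range(2):  # keep only the x and y image coordinates
                col.append(sum(R[i][k] * X[k] for k in range(3)) + T[i])
        cols.append(col)
    return [list(row) for row in zip(*cols)]  # transpose to 2F x P


def numerical_rank(A: Matrix, rel_tol: float = 1e-9) -> int:
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    M = [row[:] for row in A]
    rows, cols = len(M), len(M[0])
    scale = max(abs(x) for row in M for x in row) or 1.0
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r


rng = random.Random(0)
F, P = 6, 10  # frames, and points per object
obj1 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(P)]
obj2 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(P)]
W1 = trajectories(obj1, random_motion(F, rng))
W2 = trajectories(obj2, random_motion(F, rng))
W = [r1 + r2 for r1, r2 in zip(W1, W2)]  # both objects' trajectories side by side

print(numerical_rank(W1), numerical_rank(W2), numerical_rank(W))
```

With generic random motions, each object's trajectory matrix should have rank 4 and the combined matrix a larger rank (generically 8). Segmenting the motions then amounts to deciding which columns of the combined matrix belong to which low-dimensional subspace.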


Notes

1. The special Euclidean group is defined as \(SE(3) =\{ (R,\boldsymbol{T}): R \in SO(3),\boldsymbol{T} \in \mathbb{R}^{3}\}\), where \(SO(3) =\{ R \in \mathbb{R}^{3\times 3}: R^{\top }R = I\ \text{and}\ \det (R) = 1\}\) is the special orthogonal group.
2. The inverse of \(g \in SE(3)\) is \(g^{-1} = (R^{\top },-R^{\top }\boldsymbol{T}) \in SE(3)\), and the product of two transformations \(g_{1} = (R_{1},\boldsymbol{T}_{1})\) and \(g_{2} = (R_{2},\boldsymbol{T}_{2})\) is defined as \(g_{1}g_{2} = (R_{1}R_{2},R_{1}\boldsymbol{T}_{2} +\boldsymbol{T}_{1})\).
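The definitions in the notes above can be checked numerically. The following minimal Python sketch assumes that \(g = (R,\boldsymbol{T})\) acts on a point by \(X \mapsto RX + \boldsymbol{T}\), the convention implied by the inverse \((R^{\top},-R^{\top}\boldsymbol{T})\); the helper names (`act`, `inverse`, `compose`, `in_so3`) are hypothetical. It tests membership in \(SO(3)\), and it checks that the product \(g_1g_2 = (R_1R_2, R_1\boldsymbol{T}_2 + \boldsymbol{T}_1)\) agrees with applying \(g_2\) first and then \(g_1\), and that \(g^{-1}\) undoes \(g\).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Pose = Tuple[Matrix, Vector]  # g = (R, T) with R in SO(3), T in R^3


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """3x3 matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A: Matrix, v: Vector) -> Vector:
    """3x3 matrix times a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(A: Matrix) -> Matrix:
    return [[A[j][i] for j in range(3)] for i in range(3)]


def det3(A: Matrix) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def in_so3(R: Matrix, tol: float = 1e-9) -> bool:
    """Check R^T R = I and det(R) = 1."""
    RtR = matmul(transpose(R), R)
    orthogonal = all(abs(RtR[i][j] - (1.0 if i == j else 0.0)) <= tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(R) - 1.0) <= tol


def rot_z(t: float) -> Matrix:
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(t: float) -> Matrix:
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def act(g: Pose, X: Vector) -> Vector:
    """Apply g = (R, T) to a point (assumed convention): X -> R X + T."""
    R, T = g
    return [a + b for a, b in zip(matvec(R, X), T)]


def inverse(g: Pose) -> Pose:
    """g^{-1} = (R^T, -R^T T)."""
    R, T = g
    Rt = transpose(R)
    return Rt, [-x for x in matvec(Rt, T)]


def compose(g1: Pose, g2: Pose) -> Pose:
    """g1 g2 = (R1 R2, R1 T2 + T1): apply g2 first, then g1."""
    (R1, T1), (R2, T2) = g1, g2
    return matmul(R1, R2), [a + b for a, b in zip(matvec(R1, T2), T1)]


def close(u: Vector, v: Vector, tol: float = 1e-9) -> bool:
    return all(abs(a - b) <= tol for a, b in zip(u, v))


g1: Pose = (rot_z(0.3), [1.0, 2.0, 3.0])
g2: Pose = (matmul(rot_x(-0.7), rot_z(1.1)), [-0.5, 0.25, 4.0])
X = [0.2, -1.0, 0.5]

assert in_so3(g1[0]) and in_so3(g2[0])
# The product agrees with applying the two transformations in sequence.
assert close(act(compose(g1, g2), X), act(g1, act(g2, X)))
# The inverse undoes the transformation.
assert close(act(inverse(g1), act(g1, X)), X)
```

Representing \(g\) as a pair rather than a \(4\times 4\) homogeneous matrix mirrors the notation of the notes; the homogeneous form \(\begin{bmatrix} R & \boldsymbol{T} \\ 0 & 1 \end{bmatrix}\) yields the same product rule through ordinary matrix multiplication.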


Author information

Authors and Affiliations

Center for Imaging Science, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
René Vidal
School of Information Science and Technology, ShanghaiTech University, Shanghai, China
Yi Ma
Department of Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, CA, USA
S. Shankar Sastry



About this chapter

Cite this chapter

Vidal, R., Ma, Y., Sastry, S.S. (2016). Motion Segmentation. In: Generalized Principal Component Analysis. Interdisciplinary Applied Mathematics, vol 40. Springer, New York, NY. https://doi.org/10.1007/978-0-387-87811-9_11


DOI: https://doi.org/10.1007/978-0-387-87811-9_11
Published: 12 April 2016
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-87810-2
Online ISBN: 978-0-387-87811-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
