Abstract
The previous two chapters have shown how to use a mixture of subspaces to represent and segment static images. In those cases, different subspaces were used to account for multiple characteristics of natural images, e.g., different textures. In this chapter, we will show how to use a mixture of subspaces to represent and segment time series, e.g., video and motion capture data. In particular, we will use different subspaces to account for multiple characteristics of the dynamics of a time series, such as multiple moving objects or multiple temporal events.
I can calculate the motion of heavenly bodies, but not the madness of people.
—Isaac Newton
Notes
1. The special Euclidean group is defined as \(SE(3) =\{ (R,\boldsymbol{T}): R \in SO(3),\boldsymbol{T} \in \mathbb{R}^{3}\}\), where \(SO(3) =\{ R \in \mathbb{R}^{3\times 3}: R^{\top }R = I\ \text{and}\ \det (R) = 1\}\) is the special orthogonal group.
2. The inverse of \(g \in SE(3)\) is \(g^{-1} = (R^{\top },-R^{\top }\boldsymbol{T}) \in SE(3)\), and the product of two transformations \(g_{1} = (R_{1},\boldsymbol{T}_{1})\) and \(g_{2} = (R_{2},\boldsymbol{T}_{2})\) is defined as \(g_{1}g_{2} = (R_{1}R_{2},R_{1}\boldsymbol{T}_{2} +\boldsymbol{T}_{1})\).
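The group operations in these notes can be checked numerically. The following is a minimal sketch, assuming that \(g = (R,\boldsymbol{T})\) acts on a point \(\boldsymbol{X} \in \mathbb{R}^{3}\) as \(R\boldsymbol{X} +\boldsymbol{T}\), so that applying \(g_{2}\) and then \(g_{1}\) gives \(R_{1}R_{2}\boldsymbol{X} + R_{1}\boldsymbol{T}_{2} +\boldsymbol{T}_{1}\), which is the product consistent with the stated inverse. The helper names (`compose`, `inverse`, `act`, `rot_x`, `rot_z`) and the sample values are illustrative choices, not taken from the chapter.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def act(g, X):
    """Apply g = (R, T) to a point X (assumed action: R X + T)."""
    R, T = g
    RX = matvec(R, X)
    return [RX[i] + T[i] for i in range(3)]

def compose(g1, g2):
    """Product g1 g2 = (R1 R2, R1 T2 + T1): apply g2 first, then g1."""
    R1, T1 = g1
    R2, T2 = g2
    R1T2 = matvec(R1, T2)
    return matmul(R1, R2), [R1T2[i] + T1[i] for i in range(3)]

def inverse(g):
    """Inverse g^{-1} = (R^T, -R^T T)."""
    R, T = g
    Rt = transpose(R)
    return Rt, [-x for x in matvec(Rt, T)]

def rot_z(theta):
    """Rotation about the z-axis, an element of SO(3)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    """Rotation about the x-axis, an element of SO(3)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

# Arbitrary sample transformations and point (illustrative values only).
g1 = (rot_z(0.3), [1.0, 2.0, 3.0])
g2 = (rot_x(-1.1), [-0.5, 0.0, 4.0])
X = [0.2, -0.7, 1.5]

# The product should agree with applying g2 and then g1 to the point.
lhs = act(compose(g1, g2), X)
rhs = act(g1, act(g2, X))
compose_err = max(abs(a - b) for a, b in zip(lhs, rhs))

# g1 composed with its inverse should be the identity (I, 0).
R_id, T_id = compose(g1, inverse(g1))
rot_err = max(abs(R_id[i][j] - (1.0 if i == j else 0.0))
              for i in range(3) for j in range(3))
trans_err = max(abs(t) for t in T_id)

print(compose_err, rot_err, trans_err)  # all at floating-point round-off level
```

Swapping the roles of the two translations in `compose` makes the identity check fail for a generic rotation, which is a quick way to confirm the order of the terms under this convention.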
© 2016 Springer-Verlag New York
About this chapter
Cite this chapter
Vidal, R., Ma, Y., Sastry, S.S. (2016). Motion Segmentation. In: Generalized Principal Component Analysis. Interdisciplinary Applied Mathematics, vol 40. Springer, New York, NY. https://doi.org/10.1007/978-0-387-87811-9_11
DOI: https://doi.org/10.1007/978-0-387-87811-9_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-87810-2
Online ISBN: 978-0-387-87811-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)