Journal of Computer Science and Technology, Volume 19, Issue 4, pp 489–500

Learning-based tracking of complex non-rigid motion

Pattern Recognition and Image Processing


This paper describes a novel method for tracking complex non-rigid motions by learning the intrinsic object structure. The approach builds on and extends prior work on non-linear dimensionality reduction for object representation, object dynamics modeling, and particle filter style tracking. First, a dimensionality reduction and density estimation algorithm is derived for unsupervised learning of the intrinsic object representation; the resulting non-rigid part of the object state is reduced to as few as 2–3 dimensions. Second, a dynamical model is derived and trained on this intrinsic representation. Third, the learned intrinsic object structure is integrated into a particle filter style tracker. The intrinsic object representation is shown to have some interesting properties, and the dynamical model built on it makes the particle filter style tracker more robust and reliable. Extensive experiments are carried out on challenging non-rigid motions such as fish twisting with self-occlusion, large inter-frame lip motion, and facial expressions with global head rotation. Quantitative results compare the proposed tracker with the existing tracker. The proposed method also has the potential to solve other types of tracking problems.
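As a rough illustration of the particle filter style tracking loop mentioned in the abstract, the following is a minimal sketch and not the paper's tracker. It assumes a 2-D state standing in for the learned intrinsic representation, random-walk dynamics in place of the trained dynamical model, and a Gaussian observation likelihood. All function names and parameter values are hypothetical.

```python
import math
import random


def gaussian_likelihood(observation, obs_std):
    """Return a function scoring a state against one observation (assumed model)."""
    def likelihood(state):
        sq = sum((s - o) ** 2 for s, o in zip(state, observation))
        return math.exp(-0.5 * sq / obs_std ** 2)
    return likelihood


def particle_filter_step(particles, weights, likelihood, noise_std, rng):
    """One resample-predict-weight cycle of a bootstrap particle filter."""
    n = len(particles)
    # Resample: draw particles in proportion to their previous weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # Predict: diffuse each particle with Gaussian noise. This random walk is
    # an assumption; the paper trains a dynamical model on the learned space.
    predicted = [
        tuple(x + rng.gauss(0.0, noise_std) for x in p) for p in resampled
    ]
    # Weight: score each predicted state against the current observation.
    raw = [likelihood(p) for p in predicted]
    total = sum(raw)
    if total <= 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in raw]
    return predicted, new_weights


def weighted_mean(particles, weights):
    """Point estimate of the state: the weighted mean of the particle set."""
    dim = len(particles[0])
    return tuple(
        sum(w * p[d] for p, w in zip(particles, weights)) for d in range(dim)
    )


def track(observations, n_particles=500, noise_std=0.1, obs_std=0.2, seed=0):
    """Track a sequence of low-dimensional observations; returns one estimate per frame."""
    rng = random.Random(seed)
    dim = len(observations[0])
    # Initialise particles uniformly over an assumed state range of [-1, 1].
    particles = [
        tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))
        for _ in range(n_particles)
    ]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for obs in observations:
        particles, weights = particle_filter_step(
            particles, weights, gaussian_likelihood(obs, obs_std), noise_std, rng
        )
        estimates.append(weighted_mean(particles, weights))
    return estimates
```

Calling `track` on a short sequence of 2-D points, for example points moving slowly along a circle of radius 0.5, returns one estimated state per frame. Working in only 2–3 dimensions is what keeps the number of particles needed for such a filter small.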


non-linear dimensionality reduction, particle filter, tracking





Copyright information

© Science Press, Beijing China and Allerton Press Inc. 2004

Authors and Affiliations

  1. State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
