Monocular Multiview Object Tracking with 3D Aspect Parts

  • Yu Xiang
  • Changkyu Song
  • Roozbeh Mottaghi
  • Silvio Savarese
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


In this work, we focus on the problem of tracking objects under significant viewpoint variations, which poses a big challenge to traditional object tracking methods. We propose a novel method to track an object and estimate its continuous pose and part locations under severe viewpoint change. In order to handle the change in topological appearance introduced by viewpoint transformations, we represent objects with 3D aspect parts and model the relationship between viewpoint and 3D aspect parts in a part-based particle filtering framework. Moreover, we show that instance-level online-learned part appearance can be incorporated into our model, which makes it more robust in difficult scenarios with occlusions. Experiments are conducted on a new dataset of challenging YouTube videos and a subset of the KITTI dataset [14] that include significant viewpoint variations, as well as a standard sequence for car tracking. We demonstrate that our method is able to track the 3D aspect parts and the viewpoint of objects accurately despite significant changes in viewpoint.
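To make the particle filtering idea concrete, the following is a minimal sketch of a generic bootstrap particle filter that tracks a single continuous viewpoint angle (azimuth). It is not the authors' model: the part-based framework described above also tracks 3D aspect part locations and scores particles with learned appearance models. The function names, the Gaussian random-walk motion prior, the Gaussian angular likelihood, and all parameter values here are assumptions chosen purely for illustration.

```python
import math
import random


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def circular_mean_deg(angles, weights):
    """Weighted circular mean of angles given in degrees."""
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles, weights))
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles, weights))
    return math.degrees(math.atan2(s, c))


def particle_filter_step(particles, observation, rng,
                         motion_std=5.0, obs_std=10.0):
    """One predict / weight / resample step over azimuth particles (degrees)."""
    # Predict: diffuse each particle with a Gaussian random walk
    # (an assumed motion prior, not taken from the paper).
    predicted = [wrap_deg(p + rng.gauss(0.0, motion_std)) for p in particles]

    # Weight: Gaussian likelihood on the wrapped angular error. This is a
    # hypothetical stand-in for an appearance-based likelihood.
    weights = [
        math.exp(-0.5 * (wrap_deg(observation - p) / obs_std) ** 2)
        for p in predicted
    ]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted circular mean of the predicted particles.
    estimate = circular_mean_deg(predicted, weights)

    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track_azimuth(observations, num_particles=500, seed=0):
    """Run the filter over a sequence of noisy azimuth observations."""
    rng = random.Random(seed)
    particles = [rng.uniform(-180.0, 180.0) for _ in range(num_particles)]
    estimates = []
    for z in observations:
        particles, est = particle_filter_step(particles, z, rng)
        estimates.append(est)
    return estimates
```

Calling `track_azimuth` on a sequence of per-frame azimuth measurements returns one viewpoint estimate per frame. Because the motion prior is a random walk, the estimate lags slightly behind a steadily rotating object; a tracker of the kind described above would replace the scalar state with viewpoint plus part locations and the toy likelihood with image evidence.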


Keywords: multiview object tracking, 3D aspect part representation

Supplementary material

Electronic Supplementary Material: 978-3-319-10599-4_15_MOESM1_ESM.pdf (PDF, 787 KB)


  1.
  2. Babenko, B., Yang, M.H., Belongie, S.: Robust object tracking with online multiple instance learning. TPAMI 33(8), 1619–1632 (2011)
  3. Bao, C., Wu, Y., Ling, H., Ji, H.: Real time robust l1 tracker using accelerated proximal gradient approach. In: CVPR (2012)
  4. Breitenstein, M.D., Reichlin, F., Leibe, B., Koller-Meier, E., Van Gool, L.: Online multiperson tracking-by-detection from a single, uncalibrated camera. TPAMI 33(9), 1820–1833 (2011)
  5. Butt, A.A., Collins, R.T.: Multi-target tracking by lagrangian relaxation to min-cost network flow. In: CVPR (2013)
  6. Choi, C., Christensen, H.I.: Real-time 3D model-based tracking using edge and keypoint features for robotic manipulation. In: ICRA, pp. 4048–4055 (2010)
  7. Choi, W., Pantofaru, C., Savarese, S.: A general framework for tracking multiple people from a moving camera. TPAMI (2012)
  8. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  9. Dickinson, S.J., Pentland, A.P., Rosenfeld, A.: From volumes to views: An approach to 3-d object recognition. CVGIP: Image Understanding 55(2), 130–154 (1992)
  10. Drummond, T., Cipolla, R.: Real-time visual tracking of complex structures. TPAMI 24(7), 932–946 (2002)
  11. Feldman, A., Hybinette, M., Balch, T.: The multi-iterative closest point tracker: An online algorithm for tracking multiple interacting targets. Journal of Field Robotics (2012)
  12. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI (2010)
  13. Fidler, S., Dickinson, S., Urtasun, R.: 3D object detection and viewpoint estimation with a deformable 3D cuboid model. In: NIPS (2012)
  14. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: CVPR (2012)
  15. Hare, S., Saffari, A., Torr, P.H.: Struck: Structured output tracking with kernels. In: ICCV (2011)
  16. Held, D., Levinson, J., Thrun, S.: Precision tracking with sparse 3D and dense color 2D data. In: ICRA (2013)
  17. Hofmann, M., Wolf, D., Rigoll, G.: Hypergraphs for joint multi-view reconstruction and multi-object tracking. In: CVPR (2012)
  18. Huang, Q.X., Adams, B., Wand, M.: Bayesian surface reconstruction via iterative scan alignment to an optimized prototype. In: Eurographics Symposium on Geometry Processing (2007)
  19. Kaestner, R., Maye, J., Pilat, Y., Siegwart, R.: Generative object detection and tracking in 3D range data. In: ICRA (2012)
  20. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. TPAMI 34(7), 1409–1422 (2012)
  21. Khan, S.M., Shah, M.: Tracking multiple occluding people by localizing on multiple scene planes. TPAMI 31(3), 505–519 (2009)
  22. Khan, Z., Balch, T., Dellaert, F.: Mcmc-based particle filtering for tracking a variable number of interacting targets. TPAMI 27(11), 1805–1819 (2005)
  23. Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.: Branch-and-price global optimization for multi-view multi-target tracking. In: CVPR (2012)
  24. Lepetit, V., Fua, P.: Monocular model-based 3d tracking of rigid objects: A survey. Foundations and Trends in Computer Graphics and Vision 1(1), 1–89 (2005)
  25. Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3D feature maps. In: CVPR (2008)
  26. Lim, J.J., Pirsiavash, H., Torralba, A.: Parsing ikea objects: Fine pose estimation. In: ICCV (2013)
  27. Lowe, D.G.: Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence 31(3), 355–395 (1987)
  28. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of Imaging Understanding Workshop (1981)
  29. Oron, S., Bar-Hillel, A., Avidan, S.: Extended lucas kanade tracking. In: ECCV (2014)
  30. Pauwels, K., Rubio, L., Diaz, J., Ros, E.: Real-time model-based rigid object pose estimation and tracking combining dense and sparse visual cues. In: CVPR, pp. 2347–2354 (2013)
  31. Pepik, B., Stark, M., Gehler, P., Schiele, B.: Teaching 3D geometry to deformable part models. In: CVPR (2012)
  32. Petrovskaya, A., Thrun, S.: Model based vehicle tracking for autonomous driving in urban environments. In: RSS (2008)
  33. Pirsiavash, H., Ramanan, D., Fowlkes, C.C.: Globally-optimal greedy algorithms for tracking a variable number of objects. In: CVPR (2011)
  34. Prisacariu, V.A., Reid, I.D.: Pwp3D: Real-time segmentation and tracking of 3D objects. IJCV 98(3), 335–354 (2012)
  35. Roller, D., Daniilidis, K., Nagel, H.H.: Model-based object tracking in monocular image sequences of road traffic scenes. IJCV 10(3), 257–281 (1993)
  36. Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: ICCV (2007)
  37. Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: ICCV (2009)
  38. Supancic III, J.S., Ramanan, D.: Self-paced learning for long-term tracking. In: CVPR (2013)
  39. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Van Gool, L.: Towards multi-view object class detection. In: CVPR (2006)
  40. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR (2001)
  41. Wu, Y., Lim, J., Yang, M.H.: Online object tracking: A benchmark. In: CVPR (2013)
  42. Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: A benchmark for 3D object detection in the wild. In: WACV (2014)
  43. Xiang, Y., Savarese, S.: Estimating the aspect layout of object categories. In: CVPR (2012)
  44. Yang, B., Nevatia, R.: An online learned crf model for multi-target tracking. In: CVPR (2012)
  45. Yao, R., Shi, Q., Shen, C., Zhang, Y., van den Hengel, A.: Part-based visual tracking with online latent structural learning. In: CVPR (2013)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yu Xiang (1, 2)
  • Changkyu Song (2)
  • Roozbeh Mottaghi (1)
  • Silvio Savarese (1)
  1. Computer Science Department, Stanford University, Palo Alto, USA
  2. Department of EECS, University of Michigan at Ann Arbor, Ann Arbor, USA
