Untangling Object-View Manifold for Multiview Recognition and Pose Estimation

  • Amr Bakry
  • Ahmed Elgammal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8692)


Multi-view (view-invariant) recognition remains one of the most fundamental challenges in computer vision. In this paper we consider the problem of modeling the combined object-viewpoint manifold. The shape and appearance of an object in a given image are a function of its category, its style within that category, the viewpoint, and several other factors. Given all of these sources of variability together, the visual manifold (in any chosen feature representation space) is very hard, if not impossible, to model. We propose an efficient computational framework that untangles this complex manifold and yields a model that separates a view-invariant category representation from a category-invariant pose representation. We outperform the state of the art on three widely used multiview datasets, for both category recognition and pose estimation.


Category Recognition, Generic Object Recognition, Visual Manifold, Style Vector, Pose Estimation Accuracy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Amr Bakry (1)
  • Ahmed Elgammal (1)
  1. Department of Computer Science, Rutgers University, Piscataway, USA
