View Synthesis for Recognizing Unseen Poses of Object Classes

  • Silvio Savarese
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5304)


An important task in object recognition is to enable algorithms to categorize objects under arbitrary poses in a cluttered 3D world. A recent paper by Savarese & Fei-Fei [1] proposed a novel representation for modeling 3D object classes. In this representation, stable parts of objects from one class are linked together to capture both the appearance and the shape properties of the object class. We propose to extend this framework and improve the ability of the model to recognize poses that have not been seen in training. Inspired by work on single-object view synthesis (e.g., Seitz & Dyer [2]), our new representation allows the model to synthesize novel views of an object class at recognition time. This mechanism is incorporated in a novel two-step algorithm that is able to classify objects under arbitrary and/or unseen poses. We compare our results on pose categorization with the model and dataset presented in [1]. In a second experiment, we collect a new, more challenging dataset of 8 object classes by crawling the web. In both experiments, our model shows competitive performance compared to [1] in classifying objects in unseen poses.
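As a rough illustration of the view-synthesis idea that the abstract borrows from view morphing [2], the sketch below linearly interpolates the 2D positions of corresponding parts between two views. It is not the authors' algorithm, which the abstract does not spell out. It assumes the two views are parallel (rectified), the setting in which linear interpolation of corresponding points is known to give a geometrically valid in-between view. The function name `interpolate_view` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_view(points_a: List[Point], points_b: List[Point], s: float) -> List[Point]:
    """Linearly interpolate corresponding 2D part positions between two views.

    s = 0 reproduces view A and s = 1 reproduces view B.
    Assumption: the views are parallel (rectified), so the interpolated
    positions correspond to a physically plausible intermediate viewpoint.
    """
    if len(points_a) != len(points_b):
        raise ValueError("both views must list the same parts in the same order")
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    return [
        ((1 - s) * xa + s * xb, (1 - s) * ya + s * yb)
        for (xa, ya), (xb, yb) in zip(points_a, points_b)
    ]


# Hypothetical part centres observed in two training views of the same object.
view_a = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)]
view_b = [(2.0, 0.0), (8.0, 0.0), (8.0, 2.0)]

# Part positions for an unseen pose halfway between the two training views.
halfway = interpolate_view(view_a, view_b, 0.5)
print(halfway)
```

A recognizer built along these lines could compare a query image against such synthesized part layouts as well as against the training views, which is one plausible reading of how synthesis at recognition time helps with unseen poses.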


Keywords: Object Recognition, Object Class, Linkage Structure, View Versus, View Synthesis


  1. Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: IEEE Int. Conf. on Computer Vision, Rio de Janeiro, Brazil (October 2007)
  2. Seitz, S., Dyer, C.: View morphing. In: SIGGRAPH, pp. 21–30 (1996)
  3. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proc. Computer Vision and Pattern Recognition (2001)
  4. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1842, pp. 18–32. Springer, Heidelberg (2000)
  5. Dance, C., Willamowski, J., Fan, L., Bray, C., Csurka, G.: Visual categorization with bags of keypoints. In: ECCV International Workshop on Statistical Learning in Computer Vision, Prague (2004)
  6. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proc. Comp. Vis. and Pattern Recogn. (2003)
  7. Grauman, K., Darrell, T.: The pyramid match kernel: Discriminative classification with sets of image features. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), Beijing, China (2005)
  8. Leibe, B., Schiele, B.: Combined object categorization and segmentation with an implicit shape model. In: Proc. Workshop on statistical learning in computer vision, Prague, Czech Republic (2004)
  9. Berg, A., Berg, T., Malik, J.: Shape matching and object recognition using low distortion correspondences. In: Proc. Computer Vis. and Pattern Recog. (2005)
  10. Todorovic, S., Ahuja, N.: Extracting subimages of an unknown category from a set of images. In: CVPR (2006)
  11. Schneiderman, H., Kanade, T.: A statistical approach to 3D object detection applied to faces and cars. In: Proc. CVPR, pp. 746–751 (2000)
  12. Weber, M., Einhaeuser, W., Welling, M., Perona, P.: Viewpoint-invariant learning and detection of human heads. In: Int. Conf. Autom. Face and Gesture Rec. (2000)
  13. Torralba, A., Murphy, K., Freeman, W.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proc. Conference on Computer Vision and Pattern Recognition (CVPR) (2004)
  14. Beier, T., Neely, S.: Feature-based image metamorphosis. In: SIGGRAPH (1992)
  15. Chen, S., Williams, L.: View interpolation for image synthesis. Computer Graphics 27, 279–288 (1993)
  16. Szeliski, R.: Video mosaics for virtual environments. Computer Graphics and Applications 16, 22–30 (1996)
  17. Avidan, S., Shashua, A.: Novel view synthesis in tensor space. In: Proc. Computer Vision and Pattern Recognition, vol. 1, pp. 1034–1040 (1997)
  18. Laveau, S., Faugeras, O.: 3-d scene representation as a collection of images. In: Proc. International Conference on Pattern Recognition (1994)
  19. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)
  20. Xiao, J., Shah, M.: Tri-view morphing. CVIU 96 (2004)
  21. Brown, M., Lowe, D.: Unsupervised 3D object recognition and reconstruction in unordered datasets. In: 5th International Conference on 3D Imaging and Modelling (3DIM 2005), Ottawa, Canada (2005)
  22. Lowe, D.: Object recognition from local scale-invariant features. In: Proc. International Conference on Computer Vision, pp. 1150–1157 (1999)
  23. Ullman, S., Basri, R.: Recognition by linear combination of models. Technical report, Cambridge, MA, USA (1989)
  24. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. IJCV 66(3), 231–259 (2006)
  25. Ferrari, V., Tuytelaars, T., Van Gool, L.: Simultaneous object recognition and segmentation from single or multiple model views. IJCV (2006)
  26. Lazebnik, S., Schmid, C., Ponce, J.: Semi-local affine parts for object recognition. In: Proceedings of BMVC, Kingston, UK, vol. 2, pp. 959–968 (2004)
  27. Bart, E., Byvatov, E., Ullman, S.: View-invariant recognition using corresponding object fragments. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3022, pp. 152–165. Springer, Heidelberg (2004)
  28. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Van Gool, L.: Towards multi-view object class detection. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1589–1596 (2006)
  29. Kushal, A., Schmid, C., Ponce, J.: Flexible object models for category-level 3d object recognition. In: Proc. Conf. on Comp. Vis. and Patt. Recogn. (2007)
  30. Hoiem, D., Rother, C., Winn, J.: 3D LayoutCRF for multi-view object class recognition and segmentation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (2007)
  31. Yan, P., Khan, D., Shah, M.: 3d model based object class detection in an arbitrary view. In: ICCV (2007)
  32. Chiu, H., Kaelbling, L., Lozano-Perez, T.: Virtual training for multi-view object class recognition. In: CVPR (2007)
  33.
  34. Russell, B., Torralba, A., Murphy, K., Freeman, W.: Labelme: a database and web-based tool for image annotation. Int. Journal of Computer Vision (2007)
  35.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Silvio Savarese (1)
  • Li Fei-Fei (2)
  1. Department of Electrical Engineering, University of Michigan at Ann Arbor, USA
  2. Department of Computer Science, Princeton University, USA
