Learning to Recognize Objects from Unseen Modalities

  • C. Mario Christoudias
  • Raquel Urtasun
  • Mathieu Salzmann
  • Trevor Darrell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


In this paper we investigate the problem of exploiting multiple sources of information for object recognition when additional modalities, absent from the labeled training set, are available at inference time. This scenario is common in many robotic sensing applications and contrasts with existing approaches, which assume at least some labeled examples for each modality. To leverage the previously unseen features, we use unlabeled data to learn a mapping from the existing modalities to the new ones. This allows us to predict the missing data for the labeled examples and exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks, including object recognition from multi-resolution imagery, from grayscale and color images, and from images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.
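The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only states that a mapping is learned from unlabeled data and that the modalities are combined with multiple kernel learning. Here kernel ridge regression stands in for the mapping, a fixed equal-weight sum of RBF kernels stands in for multiple kernel learning (which would learn the weights), and a kernel ridge classifier stands in for the final classifier. The toy data and all names (`rbf_kernel`, `fit_modality_map`, `make_views`, `combined_kernel`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_modality_map(Xa, Xb, gamma=1.0, ridge=1e-3):
    """Assumed stand-in: kernel ridge regression from modality A to modality B,
    fitted on unlabeled examples for which both modalities are observed."""
    K = rbf_kernel(Xa, Xa, gamma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(K)), Xb)
    return lambda Xa_new: rbf_kernel(Xa_new, Xa, gamma) @ alpha

rng = np.random.default_rng(0)

def make_views(n):
    """Toy data: two modalities that are noisy functions of a shared latent z."""
    z = rng.uniform(-1.0, 1.0, size=(n, 1))
    xa = np.hstack([z, z ** 2]) + 0.05 * rng.normal(size=(n, 2))
    xb = np.hstack([np.sin(3 * z), np.cos(3 * z)]) + 0.05 * rng.normal(size=(n, 2))
    y = np.where(z[:, 0] > 0, 1.0, -1.0)
    return xa, xb, y

Xa_u, Xb_u, _ = make_views(200)     # unlabeled: both modalities, labels unused
Xa_l, _, y_l = make_views(40)       # labeled: modality A only
Xa_t, Xb_t, y_t = make_views(100)   # test time: both modalities available

# Step 1: learn the A -> B mapping from the unlabeled pairs.
predict_b = fit_modality_map(Xa_u, Xb_u)

# Step 2: predict the missing modality B for the labeled examples.
Xb_l_hat = predict_b(Xa_l)

def combined_kernel(Xa1, Xb1, Xa2, Xb2, w=(0.5, 0.5)):
    """Fixed-weight sum of per-modality kernels (MKL would learn w instead)."""
    return w[0] * rbf_kernel(Xa1, Xa2) + w[1] * rbf_kernel(Xb1, Xb2)

# Step 3: train on (observed A, predicted B) and test on (observed A, observed B).
K_train = combined_kernel(Xa_l, Xb_l_hat, Xa_l, Xb_l_hat)
K_test = combined_kernel(Xa_t, Xb_t, Xa_l, Xb_l_hat)
beta = np.linalg.solve(K_train + 1e-3 * np.eye(len(K_train)), y_l)
accuracy = float(np.mean(np.sign(K_test @ beta) == y_t))
```

The key design point is that the labeled set never contains modality B: its B features are filled in by the learned mapping, so the classifier can still use the observed B features at test time.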


Keywords (machine-generated, not supplied by the authors): Object Recognition, Unlabeled Data, Correct Classification Rate, Kernel Principal Component Analysis, Multiple Kernel



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • C. Mario Christoudias (1)
  • Raquel Urtasun (2)
  • Mathieu Salzmann (1)
  • Trevor Darrell (1)

  1. UC Berkeley EECS & ICSI
  2. TTI Chicago
