Exemplar-Specific Patch Features for Fine-Grained Recognition

  • Alexander Freytag
  • Erik Rodner
  • Trevor Darrell
  • Joachim Denzler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8753)

Abstract

In this paper, we present a new approach for fine-grained recognition or subordinate categorization, tasks where an algorithm needs to reliably differentiate between visually similar categories, e.g., different bird species. While previous approaches aim at learning a single generic representation and models with increasing complexity, we propose an orthogonal approach that learns patch representations specifically tailored to every single test exemplar. Since we query a constant number of images similar to a given test image, we obtain very compact features and avoid large-scale training with all classes and examples. Our learned mid-level features are built on shape and color detectors estimated from discovered patches reflecting small highly discriminative structures in the queried images. We evaluate our approach for fine-grained recognition on the CUB-2011 birds dataset and show that high recognition rates can be obtained by model combination.
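The exemplar-specific idea in the abstract (query a constant number of training images similar to the test image, then learn only from that neighborhood) can be illustrated with a minimal local-learning sketch. This is not the authors' pipeline: the Euclidean retrieval, the nearest-class-mean model, and the names `query_neighbors` and `classify_locally` are assumptions chosen for illustration, and the patch discovery and shape/color detectors described above are omitted entirely.

```python
import numpy as np


def query_neighbors(train_feats, test_feat, k):
    """Return indices of the k training images closest to the test image.

    Euclidean distance on global features is an assumption of this sketch.
    """
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return np.argsort(dists, kind="stable")[:k]


def classify_locally(train_feats, train_labels, test_feat, k=5):
    """Fit a tiny model on the k queried neighbors only, then predict."""
    idx = query_neighbors(train_feats, test_feat, k)
    local_feats, local_labels = train_feats[idx], train_labels[idx]
    # Nearest-class-mean classifier: a hypothetical stand-in for whatever
    # model is actually learned on the queried neighborhood.
    classes = np.unique(local_labels)
    means = np.stack(
        [local_feats[local_labels == c].mean(axis=0) for c in classes]
    )
    return classes[np.argmin(np.linalg.norm(means - test_feat, axis=1))]


# Toy data: two well-separated "categories" in a 2-D feature space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

prediction = classify_locally(X, y, np.array([4.5, 5.0]), k=3)
```

Because `k` is fixed, the cost of the per-exemplar model depends on the neighborhood size rather than on the number of classes or training images, which is the property the abstract highlights.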

Keywords

Training Image, Feature Representation, Local Learning, Semantic Part, Unseen Image
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Alexander Freytag (1), Email author
  • Erik Rodner (1)
  • Trevor Darrell (2)
  • Joachim Denzler (1)

  1. Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany
  2. UC Berkeley ICSI and EECS, Berkeley, USA