Semantic Segmentation with Second-Order Pooling

  • João Carreira
  • Rui Caseiro
  • Jorge Batista
  • Cristian Sminchisescu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7578)


Feature extraction, coding, and pooling are important components of many contemporary object recognition paradigms. In this paper we explore novel pooling techniques that encode the second-order statistics of local descriptors inside a region. To achieve this effect, we introduce multiplicative second-order analogues of average and max-pooling that, together with appropriate non-linearities, lead to state-of-the-art performance on free-form region recognition, without any type of feature coding. Instead of coding, we found that enriching local descriptors with additional image information leads to large performance gains, especially in conjunction with the proposed pooling methodology. We show that second-order pooling over free-form regions produces results superior to those of the winning systems in the Pascal VOC 2011 semantic segmentation challenge, with models that are 20,000 times faster.
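The abstract does not spell out the pooling formulas, so the following is only a minimal sketch of what "multiplicative second-order analogues of average and max-pooling" could look like. It assumes each local descriptor in a region is a plain vector, that the second-order statistic is the outer product of a descriptor with itself, and that the non-linearity is a signed power normalization with a hypothetical exponent `h`. All function names and the choice of non-linearity are illustrative assumptions, not taken from the text above.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def second_order_avg_pool(descriptors: Sequence[Sequence[float]]) -> Matrix:
    """Average of the outer products x x^T over all descriptors in a region."""
    n = len(descriptors)
    d = len(descriptors[0])
    acc = [[0.0] * d for _ in range(d)]
    for x in descriptors:
        for i in range(d):
            for j in range(d):
                acc[i][j] += x[i] * x[j]
    return [[v / n for v in row] for row in acc]


def second_order_max_pool(descriptors: Sequence[Sequence[float]]) -> Matrix:
    """Element-wise maximum of the outer products x x^T over a region."""
    d = len(descriptors[0])
    first = descriptors[0]
    pooled = [[first[i] * first[j] for j in range(d)] for i in range(d)]
    for x in descriptors[1:]:
        for i in range(d):
            for j in range(d):
                pooled[i][j] = max(pooled[i][j], x[i] * x[j])
    return pooled


def power_normalize(m: Matrix, h: float = 0.75) -> Matrix:
    """Assumed non-linearity: sign(v) * |v|**h applied to every entry."""
    return [[math.copysign(abs(v) ** h, v) for v in row] for row in m]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the upper triangle; the pooled matrix is symmetric."""
    d = len(m)
    return [m[i][j] for i in range(d) for j in range(i, d)]


# Two toy 2-D descriptors standing in for the local features of one region.
region = [[1.0, 2.0], [3.0, -1.0]]
avg_feature = upper_triangle(power_normalize(second_order_avg_pool(region)))
max_feature = upper_triangle(power_normalize(second_order_max_pool(region)))
```

Because the pooled matrix is symmetric, only its upper triangle is kept as the region feature. A descriptor of dimension d then yields d(d+1)/2 values, which could be passed to a linear classifier without an intermediate coding step.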


Keywords: Semantic Segmentation, Feature Pooling





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • João Carreira (1, 2)
  • Rui Caseiro (1)
  • Jorge Batista (1)
  • Cristian Sminchisescu (2)

  1. Institute of Systems and Robotics, University of Coimbra, Portugal
  2. Faculty of Mathematics and Natural Sciences, University of Bonn, Germany
