
Stacked Deformable Part Model with Shape Regression for Object Part Localization

  • Junjie Yan
  • Zhen Lei
  • Yang Yang
  • Stan Z. Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)

Abstract

This paper explores the localization of pre-defined semantic object parts, a task that is much more challenging than traditional object detection and is important for applications such as face recognition, human-computer interaction (HCI) and fine-grained object recognition. To address this problem, we make two critical improvements over the widely used deformable part model (DPM). First, we use appearance-based shape regression to globally estimate the anchor location of each part, and then locally refine each part around its estimated anchor location under the constraint of the DPM. By relaxing the fixed anchor location of each part, the DPM with shape regression (SR-DPM) is more flexible than the traditional DPM. Like the traditional DPM, it admits efficient dynamic programming inference, and it can be discriminatively trained via a coordinate descent procedure. Second, we propose to stack multiple SR-DPMs, where each layer takes the output of the previous SR-DPM as its input and progressively refines the result. This stacking is analogous to a deep neural network while still benefiting from hand-crafted features and models. The proposed methods are applied to human pose estimation, face alignment and general object part localization tasks, and achieve state-of-the-art performance.
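The two-step layer described in the abstract (a global anchor estimate followed by local refinement, repeated across stacked layers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several simplifying assumptions: each part is refined independently, so the pairwise DPM terms and the dynamic programming inference are omitted; the deformation cost is a plain quadratic penalty; and the appearance-based regressor is replaced by a hypothetical callable `regress_anchors`. All function and parameter names are illustrative.

```python
import numpy as np


def refine_part(response, anchor, weight=0.1, radius=3):
    """Local step: search a window around `anchor` for the location that
    maximizes the part response minus a quadratic deformation penalty."""
    h, w = response.shape
    ay, ax = anchor
    best, best_score = (ay, ax), -np.inf
    for y in range(max(0, ay - radius), min(h, ay + radius + 1)):
        for x in range(max(0, ax - radius), min(w, ax + radius + 1)):
            score = response[y, x] - weight * ((y - ay) ** 2 + (x - ax) ** 2)
            if score > best_score:
                best, best_score = (y, x), score
    return best


def sr_dpm_layer(responses, shape, regress_anchors, weight=0.1, radius=3):
    """One simplified layer: estimate anchors globally, then refine locally.

    `regress_anchors` is a hypothetical stand-in for the appearance-based
    shape regressor; here it only maps the current shape to new anchors.
    """
    anchors = regress_anchors(shape)
    return [refine_part(r, a, weight, radius) for r, a in zip(responses, anchors)]


def stacked_sr_dpm(responses, init_shape, regressors, weight=0.1, radius=3):
    """Stacking: each layer takes the previous layer's output as its input."""
    shape = init_shape
    for regress_anchors in regressors:
        shape = sr_dpm_layer(responses, shape, regress_anchors, weight, radius)
    return shape


# Toy data: one part whose response map has a single peak at (10, 12).
resp = np.zeros((20, 20))
resp[10, 12] = 5.0


def shift(shape):
    # Toy "regressor" (an assumption): moves every anchor by (+2, +2).
    return [(y + 2, x + 2) for (y, x) in shape]


result = stacked_sr_dpm([resp], [(6, 8)], [shift, shift])
```

In this toy setting the peak lies outside the search window of the initial location `(6, 8)`, so local refinement alone cannot reach it; once the anchor is moved, the refinement step locks onto the peak. This mirrors, in a very reduced form, the abstract's argument for relaxing fixed anchor locations.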

Keywords

Object Detection, Deep Neural Network, Object Part, Active Appearance Model, Regression Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Junjie Yan (1)
  • Zhen Lei (1)
  • Yang Yang (1)
  • Stan Z. Li (1)
  1. Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China
