Human Detection Using Learned Part Alphabet and Pose Dictionary

  • Cong Yao
  • Xiang Bai
  • Wenyu Liu
  • Longin Jan Latecki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)


As structured data, the human body and text are similar in many respects. In this paper, we exploit the analogy between the human body and text to build a compositional model for human detection in natural scenes. Basic concepts and mature techniques from text recognition are introduced into this model. A discriminative alphabet, each grapheme of which is a mid-level element representing a body part, is learned automatically from bounding box labels. Based on this alphabet, the flexible structure of the human body is expressed as symbolic sequences, which correspond to various human poses and allow for robust, efficient matching. A pose dictionary, constructed from training examples, is used to verify hypotheses at runtime. Experiments on standard benchmarks demonstrate that the proposed algorithm achieves state-of-the-art or competitive performance.
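To make the dictionary-verification idea concrete, here is a minimal sketch of checking a symbolic pose sequence against a pose dictionary. It rests on assumptions not stated in the abstract: the matching metric is taken to be plain Levenshtein (edit) distance, the grapheme symbols ("H", "T", "A", "L"), the example dictionary, and the `max_dist` threshold are all hypothetical, and the function names are illustrative only. It should not be read as the authors' implementation.

```python
from typing import Sequence, Tuple

Pose = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def verify(hypothesis: Pose, dictionary: Sequence[Pose],
           max_dist: int = 1) -> Tuple[bool, Pose, int]:
    """Accept a hypothesis if its nearest dictionary entry is within max_dist.

    The threshold is an assumed parameter, not a value from the paper.
    """
    best = min(dictionary, key=lambda entry: edit_distance(hypothesis, entry))
    dist = edit_distance(hypothesis, best)
    return dist <= max_dist, best, dist


# Hypothetical grapheme symbols: H = head, T = torso, A = arm, L = leg.
POSE_DICTIONARY = [
    ("H", "T", "L", "L"),
    ("H", "A", "T", "A", "L"),
]

if __name__ == "__main__":
    # A hypothesis with one missing leg grapheme, e.g. due to occlusion.
    accepted, nearest, dist = verify(("H", "T", "L"), POSE_DICTIONARY)
    print(accepted, nearest, dist)
```

Under these assumptions, a hypothesis with one missed part still matches a dictionary pose at distance 1, which illustrates why sequence matching can tolerate detection errors on individual parts.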


Keywords: human detection, mid-level elements, part alphabet, pose dictionary, matching



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Cong Yao (1)
  • Xiang Bai (1)
  • Wenyu Liu (1)
  • Longin Jan Latecki (2)
  1. Department of Electronics and Information Engineering, Huazhong University of Science and Technology, China
  2. Department of Computer and Information Sciences, Temple University, USA
