Deep Learning of Scene-Specific Classifier for Pedestrian Detection

  • Xingyu Zeng
  • Wanli Ouyang
  • Meng Wang
  • Xiaogang Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8691)


The performance of a detector depends heavily on its training dataset and drops significantly when the detector is applied to a new scene, because of the large variations between the source training dataset and the target scene. To bridge this appearance gap, we propose a deep model that automatically learns scene-specific features and visual patterns in static video surveillance without any manual labels from the target scene. The model jointly learns a scene-specific classifier and the distribution of the target samples. Both tasks share multi-scale feature representations that have both discriminative and representative power. We also propose a cluster layer in the deep model that uses the scene-specific visual patterns for pedestrian detection. Our specifically designed objective function not only incorporates the confidence scores of target training samples but also automatically weights the importance of source training samples by fitting the marginal distribution of the target samples. The proposed approach significantly improves the detection rate at 1 FPPI (false positive per image) by 10% compared with state-of-the-art domain adaptation methods on the MIT Traffic Dataset and the CUHK Square Dataset.
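The weighting idea in the objective function can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not specify how the marginal distribution of target samples is fitted, so the sketch substitutes a simple Gaussian kernel density estimate, and the function names, the `bandwidth` parameter, and the logistic loss are all assumptions made for illustration. Source samples that resemble target samples receive larger weights, and target samples are weighted by their confidence scores.

```python
import math


def kernel_density_weight(x, target_features, bandwidth=1.0):
    """Hypothetical importance weight for a source sample.

    Assumption: the weight is the mean Gaussian kernel similarity between
    the source feature vector x and the target feature vectors. This is a
    stand-in for "fitting the marginal distribution of target samples".
    """
    total = 0.0
    for t in target_features:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(target_features)


def log_loss(score, label):
    """Numerically stable logistic loss for a raw score and a label in {0, 1}."""
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def weighted_objective(source, target, target_features, bandwidth=1.0):
    """Sum of weighted losses over source and target samples.

    source: list of (features, score, label) tuples
    target: list of (score, label, confidence) tuples
    """
    loss = 0.0
    for x, score, label in source:
        # Source samples count more when they look like target samples.
        weight = kernel_density_weight(x, target_features, bandwidth)
        loss += weight * log_loss(score, label)
    for score, label, confidence in target:
        # Target samples count in proportion to their confidence score.
        loss += confidence * log_loss(score, label)
    return loss
```

In this sketch, a source sample far from every target sample contributes almost nothing to the objective, which mirrors the stated goal of down-weighting source data that does not match the target scene.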


Keywords: Domain Adaptation; Convolutional Neural Network; Transfer Learning; Pedestrian Detection; Convolutional Layer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Xingyu Zeng (1)
  • Wanli Ouyang (1)
  • Meng Wang (1)
  • Xiaogang Wang (1, 2)
  1. The Chinese University of Hong Kong, Shatin, Hong Kong
  2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
