A Method of Detecting Human Head by Eliminating Redundancy in Dataset

  • Chao Le
  • Huimin Ma (corresponding author)
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 875)


Constructing an image dataset by sampling frames from videos at short intervals preserves the information in the video, but it also introduces redundancy and significantly increases training cost. In this paper, we propose a method for detecting human heads at lower training cost and with higher performance. It comprises (1) a filtering criterion that screens out redundant images from a video-based image dataset while keeping almost the same average precision, and (2) an effective head detection model that fuses shoulder context. We evaluate our method on the HollywoodHeads human head dataset and achieve reasonably good performance, which indicates that the method is useful for the human head detection task.


Keywords: Convolutional neural network, Dataset filtering, Head detection



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Electronic Engineering, Tsinghua University, Beijing, China
