Human position and head direction tracking in fisheye camera using randomized ferns and fisheye histograms of oriented gradients

  • Veerachart Srisamosorn
  • Noriaki Kuwahara
  • Atsushi Yamashita
  • Taiki Ogata
  • Shouhei Shirafuji
  • Jun Ota
Original Article


This paper proposes a system for tracking human position and head direction using a fisheye camera mounted on the ceiling. This is believed to be the first system to estimate head direction from a ceiling-mounted fisheye camera. A fisheye histograms of oriented gradients descriptor is developed as a substitute for the histograms of oriented gradients descriptor, which has been widely used for human detection with perspective cameras. The human body and head are detected with the proposed descriptor and tracked to extract the head area for direction estimation. Direction estimation using randomized ferns is adapted to work with fisheye images by using the proposed descriptor, guided by the direction of movement. In experiments on an available dataset and a new dataset with ground truth, the direction is estimated with an average error below \(40^{\circ}\), with a head position error of half the head size.
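As an illustration of the randomized-ferns idea mentioned above, the following is a minimal sketch of a generic ferns classifier, not the authors' implementation. It assumes each binary test compares two entries of a feature vector (standing in for a descriptor of the head area) and that direction is quantized into a small number of classes. The class name `RandomFerns`, its parameters, and the toy data are all hypothetical.

```python
import math
import random
from collections import defaultdict


class RandomFerns:
    """Semi-naive Bayes classifier built from random pairwise feature comparisons."""

    def __init__(self, n_ferns, n_tests, n_features, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_classes = n_classes
        # Each fern is a list of (i, j) index pairs with i != j (assumed test form).
        self.ferns = [
            [tuple(rng.sample(range(n_features), 2)) for _ in range(n_tests)]
            for _ in range(n_ferns)
        ]
        # counts[f][code][c]: how often class c produced this binary code in fern f.
        self.counts = [defaultdict(lambda: [0] * n_classes) for _ in self.ferns]
        self.class_totals = [0] * n_classes

    def _code(self, fern, x):
        # Pack the outcomes of the binary tests x[i] < x[j] into one integer.
        code = 0
        for i, j in fern:
            code = (code << 1) | (1 if x[i] < x[j] else 0)
        return code

    def fit(self, samples, labels):
        for x, c in zip(samples, labels):
            self.class_totals[c] += 1
            for fern, table in zip(self.ferns, self.counts):
                table[self._code(fern, x)][c] += 1
        return self

    def log_scores(self, x):
        # Sum per-fern log-likelihoods (ferns are treated as independent),
        # with add-one smoothing so unseen codes never give log(0).
        scores = [0.0] * self.n_classes
        for fern, table in zip(self.ferns, self.counts):
            n_codes = 2 ** len(fern)
            row = table.get(self._code(fern, x), [0] * self.n_classes)
            for c in range(self.n_classes):
                scores[c] += math.log((row[c] + 1) / (self.class_totals[c] + n_codes))
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(range(self.n_classes), key=scores.__getitem__)


# Toy data standing in for descriptor vectors: class 0 increases, class 1 decreases.
train_x = [[0, 1, 2, 3], [1, 2, 3, 4], [3, 2, 1, 0], [4, 3, 2, 1]]
train_y = [0, 0, 1, 1]
model = RandomFerns(n_ferns=5, n_tests=3, n_features=4, n_classes=2).fit(train_x, train_y)
print(model.predict([2, 5, 7, 9]), model.predict([9, 7, 5, 2]))
```

In a real system the two toy classes would be replaced by direction bins and the input vectors by descriptors computed on the tracked head area. The number of ferns and tests per fern trades accuracy against memory, since each fern stores a table with up to 2^n_tests entries per class.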


Human tracking, Fisheye camera, Video surveillance, Head direction estimation



This work was partially supported by JSPS KAKENHI (Grant Number 15H01698).

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1 (mp4 14768 KB)

Supplementary material 2 (mp4 41017 KB)

Supplementary material 3 (mp4 40135 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Precision Engineering, School of Engineering, The University of Tokyo, Tokyo, Japan
  2. Graduate School of Science and Technology, Kyoto Institute of Technology, Kyoto, Japan
  3. Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
  4. Research Into Artifacts, Center for Engineering (RACE), School of Engineering, The University of Tokyo, Tokyo, Japan
