3D pedestrian tracking and frontal face image capture based on head point detection

  • Zhongchuan Zhang
  • Fernand Cohen


This paper proposes a method to track pedestrians in crowded scenes and to capture close-up frontal face images of a person of interest (POI) for recognition. Pedestrians are tracked via the 3D positions of their head points (the highest point of a person) using two static overhead cameras. Head points are located and tracked using geometric and color cues in the scene. Possible head areas in a frame acquired from one of the overhead cameras are determined using projective geometry, and head areas belonging to the same person are clustered. Without creating a full disparity map of the scene, the 3D position of a pedestrian is obtained from the disparity along the line segment that passes through the top of his or her head. The 3D head position is then tracked using common assumptions on motion velocity. If the tracking is not accurate enough, the color distribution of the head top is integrated as a complementary cue. Given the 3D head point information, a set of pan-tilt-zoom (PTZ) cameras is scheduled to capture frontal face images of the POI. The most suitable PTZ camera is selected by evaluating the capture quality of each PTZ camera and its current state. The approach is tested on a publicly available visual surveillance simulation test bed. The experiments show that the 3D tracking errors are around 4 cm and that high-quality frontal face images are captured.
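As a rough illustration of how a disparity measured at a head point can yield a 3D position, the sketch below uses the standard pinhole relation depth = focal length × baseline / disparity. It assumes a rectified pair of downward-looking cameras with parallel optical axes, which the abstract does not state; the `StereoRig` class, the `head_point_3d` function, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    """Hypothetical rectified overhead stereo pair (all values are assumptions)."""
    focal_px: float      # focal length, in pixels
    baseline_m: float    # distance between the two camera centres, in metres
    cx: float            # principal point x, in pixels
    cy: float            # principal point y, in pixels
    cam_height_m: float  # camera height above the floor, in metres


def head_point_3d(rig: StereoRig, u: float, v: float, disparity_px: float):
    """Return (x, y, height) of a head point seen at pixel (u, v).

    x and y are horizontal offsets from the reference camera axis, in metres;
    height is measured from the floor, assuming the cameras look straight down.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Distance from the camera to the head point along the optical axis.
    depth = rig.focal_px * rig.baseline_m / disparity_px
    # Back-project the pixel to metric offsets at that depth.
    x = (u - rig.cx) * depth / rig.focal_px
    y = (v - rig.cy) * depth / rig.focal_px
    # With a downward-looking camera, height is camera height minus depth.
    height = rig.cam_height_m - depth
    return x, y, height


# Example with made-up calibration values.
rig = StereoRig(focal_px=800.0, baseline_m=0.5, cx=320.0, cy=240.0, cam_height_m=4.0)
x, y, height = head_point_3d(rig, u=400.0, v=240.0, disparity_px=160.0)
print(x, y, height)
```

Because only the disparity at the head point is needed, a computation of this kind can be run per candidate head area rather than over a dense disparity map.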


Keywords: 3D head position detection; Pedestrian tracking; Overhead camera; Crowded scene; Facial image capture; Pan-tilt-zoom camera scheduling




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Applied Materials Inc, Santa Clara, USA
  2. Electrical and Computer Engineering Department, Drexel University, Philadelphia, USA
