
Human segmentation by geometrically fusing visible-light and thermal imageries

Multimedia Tools and Applications

Abstract

From depth sensors to thermal cameras, the increasing availability of camera sensors beyond the visible spectrum has enabled many exciting applications. Most of these applications require combining the information from such non-visible-spectrum cameras with that of a regular RGB camera. Fusing information from multiple heterogeneous cameras can be a very complex problem: the fusion can take place at different levels, from pixels to voxels or even semantic objects, with large variations in accuracy, communication cost, and computation cost. In this paper, we propose a system for robustly segmenting human figures in video sequences by fusing visible-light and thermal imagery. Our system focuses on the geometric transformation between the visual blobs that correspond to the same human figure observed by both cameras. This approach provides the most reliable fusion, at the expense of high computation and communication costs. To reduce the computational complexity of the geometric fusion, an efficient calibration procedure first rectifies the two camera views without the complex procedure of estimating the cameras' intrinsic parameters. To register different blobs geometrically at the pixel level, a blob-to-blob homography in the rectified domain is then computed in real time by estimating the disparity of each blob pair. Precise segmentation is finally achieved using a two-tier tracking algorithm and a unified background model. Our experimental results show that the proposed system provides significant improvements over existing schemes under various conditions.
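The abstract describes registering each thermal/visible blob pair in the rectified domain by estimating one disparity per pair. As an illustration only, the sketch below assumes the simplest instance of such a blob-to-blob homography, a pure horizontal shift (the matrix [[1, 0, d], [0, 1, 0], [0, 0, 1]]), and chooses d by exhaustively maximizing the intersection-over-union of the two binary blob masks. The mask representation (sets of pixel coordinates), the IoU criterion, and the names `warp_blob`, `overlap_score`, and `estimate_disparity` are hypothetical choices, not the authors' method.

```python
# Hypothetical sketch: estimate one disparity per thermal/visible blob pair in
# the rectified domain, assuming the blob-to-blob homography is a pure
# horizontal shift [[1, 0, d], [0, 1, 0], [0, 0, 1]].
# Blobs are binary masks stored as sets of (x, y) pixel coordinates.


def warp_blob(blob, d):
    """Shift every pixel of a blob right by d columns; rows are unchanged."""
    return {(x + d, y) for (x, y) in blob}


def overlap_score(src, dst, d):
    """Intersection-over-union of dst and src shifted by d pixels."""
    shifted = warp_blob(src, d)
    union = len(shifted | dst)
    return len(shifted & dst) / union if union else 0.0


def estimate_disparity(src, dst, d_min, d_max):
    """Try every integer shift in [d_min, d_max]; return (best_d, best_iou)."""
    best_d, best_score = d_min, -1.0
    for d in range(d_min, d_max + 1):
        score = overlap_score(src, dst, d)
        if score > best_score:  # strict '>' keeps the first shift on ties
            best_d, best_score = d, score
    return best_d, best_score


if __name__ == "__main__":
    # Toy blob pair: a 4 x 6 rectangle seen 7 pixels further right in the
    # visible image than in the thermal image.
    thermal = {(x, y) for x in range(10, 14) for y in range(20, 26)}
    visible = warp_blob(thermal, 7)
    print(estimate_disparity(thermal, visible, -15, 15))  # (7, 1.0)
```

An exhaustive one-dimensional search is affordable here because rectification confines corresponding pixels to the same row. The general rectified-domain homography derived in the appendix also allows a horizontal scale and shear, which this simplified sketch ignores.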


Figures 1 to 15 (images not reproduced here)


Acknowledgements

We would like to thank the anonymous reviewers and the guest editors for their valuable comments.

Author information


Corresponding author

Correspondence to Sen-ching S. Cheung.

Additional information

Part of this material is based upon work supported by the National Science Foundation under Grant No. 1018241. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Appendix


A. Proof of Theorem 3.1

Since the homography matrix H′ is defined only up to scale, we can normalize it so that its bottom-right entry equals 1:

$$H'=\left[ \begin{array}{ccc} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{array} \right] $$

By the definition of image rectification, the epipoles of the two images lie at infinity, in the form $[1\ 0\ 0]^T$ and $[a\ 0\ 0]^T$, and they must also be related by the homography. Plugging them into (3), we have

$$\begin{array}{lll} a_{11}& =& a \\ a_{21}& = &0 \\ a_{31}& =& 0 \end{array}$$
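Assuming that (3) is the point-transfer relation $\mathbf{x}_2 \simeq H'\mathbf{x}_1$ (equation (3) itself is not reproduced in this excerpt), the step above amounts to mapping the first epipole through $H'$ and matching the result component by component with the second epipole:

$$ H'\left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] = \left[ \begin{array}{c} a_{11} \\ a_{21} \\ a_{31} \end{array} \right] = \left[ \begin{array}{c} a \\ 0 \\ 0 \end{array} \right] $$

Only the first column of $H'$ enters this product, which is why the epipole constraint fixes exactly $a_{11}$, $a_{21}$, and $a_{31}$.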

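Substituting $a_{21}=a_{31}=0$, and since corresponding points in a rectified pair lie on the same scanline, we require

$$ y_2'=\frac{a_{22}y_1'+a_{23}}{a_{32}y_1'+1}=y_1' $$

for every $y_1'$. Clearing the denominator, the following equation must hold identically in $y_1'$:

$$ a_{32}y_1'^2-(a_{22}-1)y_1'-a_{23}=0 $$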

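Since this quadratic in $y_1'$ vanishes for every $y_1'$, all of its coefficients must be zero, which gives $a_{32}=0$, $a_{22}=1$, and $a_{23}=0$. Hence

$$H'=\left[ \begin{array}{ccc} a & a_{12} & a_{13}\\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] $$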
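The constraints derived above ($a_{11}=a$, $a_{21}=a_{31}=a_{32}=a_{23}=0$, $a_{22}=1$) can be sanity-checked numerically. The sketch below is not from the paper: it builds a homography of that constrained form with arbitrary illustrative values, confirms that it keeps every point on its original row and maps the epipole $[1\ 0\ 0]^T$ to $[a\ 0\ 0]^T$, and shows that a nonzero $a_{32}$ breaks row preservation. The helper names (`apply_homography`, `mat_vec`, `rectified_homography`, `preserves_rows`) are hypothetical; exact `Fraction` arithmetic avoids floating-point noise in the equality tests.

```python
from fractions import Fraction


def apply_homography(H, x, y):
    """Map an inhomogeneous point (x, y) through a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def mat_vec(H, v):
    """Multiply a 3x3 matrix by a homogeneous 3-vector."""
    return [sum(H[i][j] * v[j] for j in range(3)) for i in range(3)]


def rectified_homography(a, a12, a13):
    """Build H' with a21 = a31 = a32 = a23 = 0 and a22 = 1, as derived above."""
    return [[a, a12, a13], [0, 1, 0], [0, 0, 1]]


def preserves_rows(H, points):
    """True if every point keeps its row coordinate y after the mapping."""
    return all(apply_homography(H, x, y)[1] == y for (x, y) in points)


if __name__ == "__main__":
    pts = [(Fraction(x), Fraction(y)) for x in range(-3, 4) for y in range(-3, 4)]
    H_ok = rectified_homography(Fraction(3, 2), Fraction(1, 4), Fraction(5))
    # Same matrix but with a32 = 1/10, so y' = y / (y/10 + 1) no longer equals y.
    H_bad = [[Fraction(3, 2), Fraction(1, 4), Fraction(5)],
             [0, 1, 0],
             [0, Fraction(1, 10), 1]]
    print(preserves_rows(H_ok, pts), preserves_rows(H_bad, pts))  # True False
    print(mat_vec(H_ok, [1, 0, 0]))  # epipole [1, 0, 0] maps to [a, 0, 0]
```

The remaining free parameters $a$, $a_{12}$, and $a_{13}$ act only on the horizontal coordinate, which is consistent with the abstract's description of a blob-to-blob homography estimated from disparity in the rectified domain.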


About this article

Cite this article

Zhao, J., Cheung, Sc.S. Human segmentation by geometrically fusing visible-light and thermal imageries. Multimed Tools Appl 73, 61–89 (2014). https://doi.org/10.1007/s11042-012-1299-2
