Abstract
From depth sensors to thermal cameras, the increased availability of camera sensors beyond the visible spectrum has created many exciting applications. Most of these applications require combining information from these hyperspectral cameras with a regular RGB camera. Fusing information from multiple heterogeneous cameras can be a very complex problem: the data can be fused at different levels, from pixels to voxels or even semantic objects, with large variations in accuracy, communication, and computation costs. In this paper, we propose a system for robust segmentation of human figures in video sequences by fusing visible-light and thermal imageries. Our system focuses on the geometric transformation between visual blobs corresponding to human figures observed by both cameras. This approach provides the most reliable fusion at the expense of high computation and communication costs. To reduce the computational complexity of the geometric fusion, an efficient calibration procedure is first applied to rectify the two camera views without the complex procedure of estimating the intrinsic parameters of the cameras. To geometrically register different blobs at the pixel level, a blob-to-blob homography in the rectified domain is then computed in real time by estimating the disparity for each blob pair. Precise segmentation is finally achieved using a two-tier tracking algorithm and a unified background model. Our experimental results show that the proposed system provides significant improvements over existing schemes under various conditions.
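The per-blob disparity step can be illustrated with a small sketch. It assumes, for illustration only, that each human blob is roughly planar and fronto-parallel, so that in the rectified domain the blob-to-blob mapping reduces to a horizontal shift by a single disparity d, and it chooses d by exhaustively maximizing the intersection-over-union (IoU) of the two binary foreground masks. The IoU search and the function names (`shift_mask`, `iou`, `estimate_blob_disparity`) are hypothetical and are not the authors' estimator.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary foreground mask, 1 = blob pixel


def shift_mask(mask: Mask, d: int) -> Mask:
    """Translate a mask horizontally by d pixels (d > 0 moves right), zero-filling."""
    width = len(mask[0])
    return [
        [row[c - d] if 0 <= c - d < width else 0 for c in range(width)]
        for row in mask
    ]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            inter += int(bool(va) and bool(vb))
            union += int(bool(va) or bool(vb))
    return inter / union if union else 0.0


def estimate_blob_disparity(thermal: Mask, visible: Mask, max_disp: int) -> Tuple[int, float]:
    """Hypothetical estimator: try every horizontal shift in [-max_disp, max_disp]
    and keep the one whose shifted thermal blob best overlaps the visible blob.
    Only horizontal shifts are searched because rectified views share rows."""
    best_d, best_score = 0, -1.0
    for d in range(-max_disp, max_disp + 1):
        score = iou(shift_mask(thermal, d), visible)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


if __name__ == "__main__":
    # Toy 6x10 rectified masks: the same blob, offset by 3 columns.
    thermal = [[1 if 1 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(10)] for r in range(6)]
    visible = [[1 if 1 <= r <= 4 and 5 <= c <= 7 else 0 for c in range(10)] for r in range(6)]
    print(estimate_blob_disparity(thermal, visible, max_disp=5))  # (3, 1.0)
```

Because rectification puts corresponding points on the same row, the search is one-dimensional; real blobs with imperfect masks would need a more robust matching score than this toy IoU.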
Acknowledgements
We would like to thank the anonymous reviewers and the guest editors for their valuable comments.
Additional information
Part of this material is based upon work supported by the National Science Foundation under Grant No. 1018241. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Appendix
A Proof of Theorem 3.1
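Since the homography matrix H′ is defined only up to scale, we can assume it has the form

$$H' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{bmatrix}.$$

By the definition of image rectification, the epipoles of the two images lie at infinity, in the form $[1\;0\;0]^T$ and $[a\;0\;0]^T$, and they are also related by the homography. Plugging them into (3), we have

$$\begin{bmatrix} a \\ 0 \\ 0 \end{bmatrix} \simeq H' \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix} \;\Rightarrow\; a_{21} = a_{31} = 0.$$

Since corresponding points in the two rectified views lie on the same row, i.e. $y' = y$ with

$$y' = \frac{a_{22}\,y + a_{23}}{a_{32}\,y + 1},$$

the following equation will always hold for every $y$:

$$a_{32}\,y^2 + (1 - a_{22})\,y - a_{23} = 0.$$

Therefore, the coefficients of every power of $y$ must be zero, which gives $a_{32} = 0$, $a_{22} = 1$, and $a_{23} = 0$.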
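The conclusion of this proof can be sanity-checked numerically. The sketch below assumes H′ is normalized so that its bottom-right entry is 1 and, following the epipole step, that a21 = a31 = 0; with a32 = 0, a22 = 1, and a23 = 0 only the first row remains free, so every point should keep its row coordinate. The helper names are hypothetical, and testing a finite set of sample points is an illustration, not a proof.

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def apply_homography(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map the inhomogeneous point (x, y) through a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def preserves_rows(H: Matrix, points: Iterable[Tuple[float, float]], tol: float = 1e-9) -> bool:
    """True if every sample point keeps its row coordinate, i.e. y' == y."""
    return all(abs(apply_homography(H, x, y)[1] - y) <= tol for x, y in points)


def rectified_homography(a: float, a12: float, a13: float) -> List[List[float]]:
    """Form implied by the appendix (assuming a33 = 1 and a21 = a31 = 0):
    a32 = a23 = 0 and a22 = 1, so only the first row is free."""
    return [[a, a12, a13], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    samples = [(x, y) for x in (0.0, 5.0, 40.0) for y in (-3.0, 0.0, 7.0, 120.0)]
    print(preserves_rows(rectified_homography(1.1, 0.2, -4.0), samples))  # True
    # Violating a32 = 0 bends rows, so the check fails.
    print(preserves_rows([[1, 0, 0], [0, 1, 0], [0, 0.01, 1]], samples))  # False
```

The remaining free entries (a, a12, a13) act only on the column coordinate, which is consistent with registering each blob pair by a horizontal disparity in the rectified domain.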
About this article
Cite this article
Zhao, J., Cheung, Sc.S. Human segmentation by geometrically fusing visible-light and thermal imageries. Multimed Tools Appl 73, 61–89 (2014). https://doi.org/10.1007/s11042-012-1299-2
DOI: https://doi.org/10.1007/s11042-012-1299-2