Human segmentation by geometrically fusing visible-light and thermal imageries

Zhao, Jian; Cheung, Sen-ching S.

doi:10.1007/s11042-012-1299-2

Published: 05 December 2012

Volume 73, pages 61–89, (2014)

Multimedia Tools and Applications

Jian Zhao¹ &
Sen-ching S. Cheung²

Abstract

From depth sensors to thermal cameras, the increased availability of camera sensors beyond the visible spectrum has created many exciting applications. Most of these applications require combining information from these hyperspectral cameras with a regular RGB camera. Information fusion from multiple heterogeneous cameras can be a very complex problem. The data can be fused at different levels, from pixels to voxels or even semantic objects, with large variations in accuracy, communication, and computation costs. In this paper, we propose a system for robust segmentation of human figures in video sequences by fusing visible-light and thermal imageries. Our system focuses on the geometric transformation between visual blobs corresponding to human figures observed by both cameras. This approach provides the most reliable fusion at the expense of high computation and communication costs. To reduce the computational complexity of the geometric fusion, an efficient calibration procedure is first applied to rectify the two camera views without the complex procedure of estimating the intrinsic parameters of the cameras. To geometrically register different blobs at the pixel level, a blob-to-blob homography in the rectified domain is then computed in real time by estimating the disparity for each blob pair. Precise segmentation is finally achieved using a two-tier tracking algorithm and a unified background model. Our experimental results show that our proposed system provides significant improvements over existing schemes under various conditions.

Acknowledgements

We would like to thank the anonymous reviewers and the guest editors for their valuable comments.

Author information

Authors and Affiliations

Windows Phone, Microsoft Corporation, One Microsoft Way, Redmond, WA, 98052, USA
Jian Zhao
Center for Visualization and Virtual Environments, University of Kentucky, 329 Rose Street, Lexington, KY, 40506, USA
Sen-ching S. Cheung

Corresponding author

Correspondence to Sen-ching S. Cheung.

Additional information

Part of this material is based upon work supported by the National Science Foundation under Grant No. 1018241. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Appendix

1.1 A Proof of Theorem 31

Since the homography matrix H′ is defined only up to scale, we can assume it has the form

$$H'=\left[ \begin{array}{ccc} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{array} \right] $$

By the definition of image rectification, the epipoles of the two images are at infinity and take the form [1 0 0]^T and [a 0 0]^T; they are also related by the homography. Substituting them into (3), we have

$$\begin{array}{lll} a_{11}& =& a \\ a_{21}& = &0 \\ a_{31}& =& 0 \end{array}$$

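With a_{21} = a_{31} = 0, the vertical coordinate of a mapped point depends only on y_1'. Since corresponding points in the rectified images share the same vertical coordinate,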

$$ y_2'=\frac{a_{22}y_1'+a_{23}}{a_{32}y_1'+1}=y_1' $$

the following equation holds for every value of y_1':

$$ a_{32}y_1'^2-(a_{22}-1)y_1'-a_{23}=0 $$

Because this polynomial in y_1' vanishes identically, the coefficient of every power of y_1' must be zero. Hence a₃₂ = 0, a₂₂ = 1, and a₂₃ = 0.
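As a numerical illustration of this result, the following minimal Python sketch builds a homography with the constraints derived in the proof (a₂₁ = a₃₁ = a₃₂ = a₂₃ = 0, a₂₂ = 1) and checks that it leaves the vertical coordinate unchanged. The function names `rectified_homography` and `apply_homography` and the numeric values are hypothetical, chosen only for this example; they are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rectified_homography(a: float, a12: float, a13: float) -> Matrix:
    """Build H' in the constrained form obtained in the proof.

    Only the first row is free; the other two rows are fixed so that
    the vertical coordinate is preserved. (Hypothetical helper.)
    """
    return [
        [a, a12, a13],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]


def apply_homography(h: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map the inhomogeneous point (x, y) through the 3x3 matrix h."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


if __name__ == "__main__":
    # Illustrative parameter values, not measurements from the paper.
    h = rectified_homography(a=2.0, a12=0.5, a13=-8.0)
    for x, y in [(0.0, 0.0), (10.0, 4.0), (-3.0, 12.0)]:
        x2, y2 = apply_homography(h, x, y)
        # y2 equals y for every point; only the horizontal position moves.
        print((x, y), "->", (x2, y2), "horizontal shift:", x2 - x)
```

In this form the mapped horizontal coordinate is a·x + a₁₂·y + a₁₃, so the transformation acts along image rows only. A matrix with a nonzero a₃₂, by contrast, would rescale the vertical coordinate and break the row alignment that rectification provides.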

About this article

Cite this article

Zhao, J., Cheung, Sc.S. Human segmentation by geometrically fusing visible-light and thermal imageries. Multimed Tools Appl 73, 61–89 (2014). https://doi.org/10.1007/s11042-012-1299-2

Published: 05 December 2012
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11042-012-1299-2
