Multimedia Tools and Applications, Volume 77, Issue 22, pp 29283–29301

Real-time video fire smoke detection by utilizing spatial-temporal ConvNet features

  • Yaocong Hu
  • Xiaobo Lu (corresponding author)


Fire is one of the most dangerous disasters threatening human life and property worldwide. To reduce fire losses, research on video analysis for early smoke detection has become particularly important. However, extracting stable features for smoke recognition remains challenging, largely because smoke varies in color, shape and texture. Classical convolutional neural networks can automatically learn appearance features from a single frame but fail to capture motion information between frames. To address this issue, we propose a spatial-temporal convolutional neural network for video smoke detection. For real-time detection, we further propose an enhanced architecture that uses a multi-task learning strategy to jointly recognize smoke and estimate optical flow, capturing intra-frame appearance features and inter-frame motion features simultaneously. Experiments on our self-created dataset validate the effectiveness and efficiency of the proposed method, which achieves a 97.0% detection rate and a 3.5% false alarm rate with a processing time of 5 ms per frame, clearly outperforming existing methods.
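The joint objective and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss formulation, so the cross-entropy term, the endpoint-error term, the weighted sum and the `flow_weight` value are all assumptions, and every function name is hypothetical. The detection rate and false alarm rate are computed with the common definitions TP / (TP + FN) and FP / (FP + TN), which the paper may define differently.

```python
import math


def smoke_cross_entropy(p_smoke, label):
    """Binary cross-entropy for one frame; label is 1 (smoke) or 0 (no smoke)."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_smoke, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and reference (u, v) vectors."""
    dists = [math.hypot(pu - tu, pv - tv)
             for (pu, pv), (tu, tv) in zip(pred_flow, true_flow)]
    return sum(dists) / len(dists)


def multitask_loss(p_smoke, label, pred_flow, true_flow, flow_weight=0.5):
    """Assumed joint objective: classification loss plus weighted flow loss.

    flow_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    cls_loss = smoke_cross_entropy(p_smoke, label)
    flow_loss = endpoint_error(pred_flow, true_flow)
    return cls_loss + flow_weight * flow_loss


def detection_and_false_alarm_rate(predictions, labels):
    """Return (detection rate, false alarm rate) for binary predictions.

    Assumes at least one smoke sample and one non-smoke sample are present.
    """
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    return tp / (tp + fn), fp / (fp + tn)
```

Under these assumptions, a shared network would be trained by minimizing `multitask_loss` over batches of frame pairs, so that the same features serve both smoke recognition and optical flow estimation.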


Smoke detection, Convolutional neural networks, Spatial-temporal, Multi-task learning



The authors would like to thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions. This work was supported by the National Key Science & Technology Pillar Program of China (No. 2014BAG01B03), the National Natural Science Foundation of China (No. 61374194), Key Research and Development Program of Jiangsu Province (No. BE2016739), and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Automation, Southeast University, Nanjing, China
  2. Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing, China
