
Driving behaviour recognition from still images by using multi-stream fusion CNN

  • Special Issue Paper
  • Published in Machine Vision and Applications

Abstract

Abnormal driving behaviour is a leading cause of serious traffic accidents that threaten human life and public property worldwide. In this paper, we investigate a deep learning approach to automatically recognize driving behaviour (such as normal driving, driving with hands off the wheel, making a phone call, operating a mobile phone, smoking and talking with passengers) from a single image. Driving behaviour recognition can be regarded as a multi-class classification problem, and we address it from two aspects: (1) we employ a multi-stream CNN to extract multi-scale features by filtering images with receptive fields of different kernel sizes, and (2) we investigate different fusion strategies to combine the multi-scale information and generate the final decision for driving behaviour recognition. The effectiveness of the proposed method is validated by extensive experiments on our self-created simulated driving behaviour dataset as well as a real driving behaviour dataset, and the results demonstrate that the proposed multi-stream CNN-based method achieves significant performance improvements over the state of the art.
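
As a rough illustration of the two aspects above, the sketch below (an assumption, not the authors' released implementation) builds parallel CNN streams whose convolutions use different kernel sizes (3, 5 and 7 are hypothetical choices) and fuses them either at the feature level, by concatenating per-stream descriptors before a shared classifier, or at the score level, by averaging per-stream class scores over the six behaviour classes named in the abstract. Stream depth, channel widths and input size are placeholders; PyTorch is used only for convenience.

import torch
import torch.nn as nn


class Stream(nn.Module):
    """One CNN stream; kernel_size sets the receptive field of its filters."""

    def __init__(self, kernel_size, num_classes=6):
        super().__init__()
        pad = kernel_size // 2
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size, padding=pad), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size, padding=pad), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),           # global pooling -> 64-d descriptor
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)        # per-stream feature vector
        return f, self.classifier(f)           # features and per-stream scores


class MultiStreamFusionCNN(nn.Module):
    """Parallel streams with different kernel sizes and two fusion options."""

    def __init__(self, kernel_sizes=(3, 5, 7), num_classes=6, fusion="feature"):
        super().__init__()
        self.fusion = fusion
        self.streams = nn.ModuleList(Stream(k, num_classes) for k in kernel_sizes)
        # Feature-level fusion: concatenate stream descriptors, then classify.
        self.fusion_fc = nn.Linear(64 * len(kernel_sizes), num_classes)

    def forward(self, x):
        feats, scores = zip(*(s(x) for s in self.streams))
        if self.fusion == "feature":
            return self.fusion_fc(torch.cat(feats, dim=1))
        # Score-level fusion: average the per-stream class scores.
        return torch.stack(scores).mean(dim=0)


if __name__ == "__main__":
    model = MultiStreamFusionCNN(fusion="feature")
    logits = model(torch.randn(2, 3, 224, 224))   # two 224x224 RGB images
    print(logits.shape)                            # torch.Size([2, 6])

Switching fusion="feature" to any other value in this sketch falls back to score averaging, which mirrors the paper's contrast between combining multi-scale information before versus after the per-stream decisions.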



Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions. This work was supported by the National Natural Science Foundation of China (No. 61871123), the Key Research and Development Program of Jiangsu Province (No. BE2016739) and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information


Corresponding author

Correspondence to Xiaobo Lu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Hu, Y., Lu, M. & Lu, X. Driving behaviour recognition from still images by using multi-stream fusion CNN. Machine Vision and Applications 30, 851–865 (2019). https://doi.org/10.1007/s00138-018-0994-z

