
Complexity and Accuracy of Hand-Crafted Detection Methods Compared to Convolutional Neural Networks

  • Valeria Tomaselli
  • Emanuele Plebani
  • Mauro Strano
  • Danilo Pau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10484)

Abstract

Although Convolutional Neural Networks have achieved the best accuracy in recent years, this comes at a price in terms of computational complexity and memory footprint, owing to their large number of multiply-accumulate operations and model parameters. For embedded systems, this complexity severely limits the opportunities to reduce power consumption, which is dominated by memory read and write operations. Anticipating the upcoming integration into intelligent sensor devices, we compare hand-crafted features for the detection of a limited number of objects against some typical convolutional neural network architectures. Experiments on some state-of-the-art datasets for detection tasks show that, for some problems, the increased complexity of neural networks is not matched by a large increase in accuracy. Moreover, our analysis suggests that for embedded devices hand-crafted features remain competitive in terms of the accuracy/complexity trade-off.

Keywords

Aggregated channel features · Convolutional neural networks · Detection

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Valeria Tomaselli (1)
  • Emanuele Plebani (2)
  • Mauro Strano (1)
  • Danilo Pau (2)
  1. STMicroelectronics, Catania, Italy
  2. STMicroelectronics, Agrate Brianza, Italy
