A Deep Learning Approach for Object Recognition with NAO Soccer Robots

  • Dario Albani
  • Ali Youssef
  • Vincenzo Suriani
  • Daniele Nardi
  • Domenico Daniele Bloisi

Conference paper. Part of the Lecture Notes in Computer Science book series (LNCS, volume 9776).


The use of identical robots in the RoboCup Standard Platform League (SPL) has made software development the key factor in achieving good results in competitions. In particular, the visual detection process is crucial for extracting information about the environment. In this paper, we present a novel approach to object detection and classification based on Convolutional Neural Networks (CNNs). The approach is designed for NAO robots and consists of two stages: image region segmentation, which reduces the search space, and Deep Learning, which validates the candidate regions. The proposed method can easily be extended to deal with different objects and adapted for use in other RoboCup leagues. Quantitative experiments have been conducted on a data set of annotated images captured in real conditions from NAO robots in action. This data set is made available to the community.
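The two-stage pipeline described in the abstract (a cheap segmentation step that proposes candidate regions, followed by a learned classifier that validates them) can be sketched as below. This is a minimal illustration under assumed names, not the authors' implementation: `segment_candidates`, `validate`, and the brightness-based toy predicate are invented stand-ins for the paper's color-based segmentation and CNN validator.

```python
import numpy as np

def segment_candidates(img, thresh=200):
    """Stage 1: coarse region segmentation.
    Find pixels above a brightness threshold and return one bounding
    box around them -- a stand-in for region proposals that shrink
    the search space handed to the classifier."""
    mask = img > thresh
    if not mask.any():
        return []
    ys, xs = np.where(mask)
    return [(int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)]

def validate(img, box, classifier):
    """Stage 2: crop each candidate and let a classifier accept or
    reject it (in the paper this role is played by a small CNN)."""
    x0, y0, x1, y1 = box
    return classifier(img[y0:y1, x0:x1])

# Toy "classifier": accepts patches whose mean intensity is high enough.
bright_patch = lambda patch: patch.mean() > 128

img = np.zeros((60, 80), dtype=np.uint8)
img[20:30, 40:55] = 255  # synthetic bright blob standing in for a ball
boxes = segment_candidates(img)
detections = [b for b in boxes if validate(img, b, bright_patch)]
print(detections)  # → [(40, 20, 55, 30)]
```

The design point is that stage 1 is fast and over-inclusive, so the expensive stage-2 classifier only runs on a few small crops rather than on the full image.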


Keywords: Robot vision, Deep Learning, RoboCup SPL, NAO robots



We wish to acknowledge the Multi-Sensor Interactive Systems Group, Faculty 3 (Mathematics and Computer Science), University of Bremen, for providing a large part of the images used in the SPQR NAO image data set.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Dario Albani (1)
  • Ali Youssef (1)
  • Vincenzo Suriani (1)
  • Daniele Nardi (1)
  • Domenico Daniele Bloisi (1)

  1. Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Rome, Italy
