Exploring Food Detection Using CNNs

  • Eduardo Aguilar
  • Marc Bolaños
  • Petia Radeva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10672)


One of the most common critical factors directly related to the cause of chronic disease is an unhealthy diet. An automatic system for food analysis could enable a better understanding of the nutritional information associated with the food consumed and thus help us take corrective actions on our diet. The Computer Vision community has focused its efforts on several areas of visual food analysis, including food detection, food recognition, food localization, and portion estimation. For food detection, the best state-of-the-art results have been obtained using Convolutional Neural Networks. However, the different approaches were evaluated on different datasets and, therefore, their results are not directly comparable. This article presents an overview of recent advances in food detection and proposes an optimal model based on the GoogLeNet architecture, Principal Component Analysis, and a Support Vector Machine that outperforms the state of the art on two public food/non-food datasets.
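The combination named above (CNN features, then Principal Component Analysis, then a Support Vector Machine) can be illustrated with a minimal sketch. It is not the authors' implementation: the features here are synthetic Gaussian vectors standing in for GoogLeNet activations, and the 1024-dimensional feature size, the 64 retained components, the RBF kernel, and the helper names `build_food_detector` and `make_synthetic_features` are all assumptions chosen for illustration. scikit-learn is used for the PCA and SVM stages.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_food_detector(n_components=64):
    """Standardize features, reduce them with PCA, then classify with an SVM.

    The number of components and the RBF kernel are illustrative assumptions.
    """
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        SVC(kernel="rbf", C=1.0),
    )


def make_synthetic_features(n_per_class=200, dim=1024, seed=0):
    """Hypothetical stand-in for CNN features: two shifted Gaussian clouds.

    In a real system these rows would be activations extracted from a
    GoogLeNet model for each image (label 1 = food, 0 = non-food).
    """
    rng = np.random.default_rng(seed)
    non_food = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    food = rng.normal(0.5, 1.0, size=(n_per_class, dim))
    X = np.vstack([non_food, food])
    y = np.concatenate([
        np.zeros(n_per_class, dtype=int),
        np.ones(n_per_class, dtype=int),
    ])
    return X, y


X, y = make_synthetic_features()

# Shuffle, then hold out 20% of the samples for evaluation.
order = np.random.default_rng(1).permutation(len(y))
split = int(0.8 * len(y))
train_idx, test_idx = order[:split], order[split:]

detector = build_food_detector()
detector.fit(X[train_idx], y[train_idx])
accuracy = detector.score(X[test_idx], y[test_idx])
print(f"held-out accuracy on synthetic features: {accuracy:.3f}")
```

Fitting the scaler and PCA inside the pipeline means both are estimated on the training split only, so the held-out score is not contaminated by test data. The accuracy printed here says nothing about real food images; it only confirms that the stages compose.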


Keywords: CNN, PCA, GoogLeNet, SVM, Food detection



This work was partially funded by TIN2015-66951-C2, SGR 1219, CERCA, ICREA Academia’2014, CONICYT Becas Chile, FPU15/01347 and Grant 20141510 (Marató TV3). The funders had no role in the study design, data collection, analysis, and preparation of the manuscript. We acknowledge Nvidia Corporation for the donation of a Titan X GPU.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Universitat de Barcelona and Computer Vision Center, Barcelona, Spain
