Where and What Am I Eating? Image-Based Food Menu Recognition

  • Marc Bolaños
  • Marc Valdivia
  • Petia Radeva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11134)


Food has become a very important aspect of our social activities. Since the appearance of social networks and websites like Yelp, users have been uploading photos of their meals to the Internet. This phenomenon opens up a wide range of possibilities for developing food analysis and recognition models on huge amounts of real-world data. One clear application is recognizing the dish in a food image by using the restaurant's menu. Our model, based on Convolutional Neural Networks and Recurrent Neural Networks, is able to learn a language model that generalizes to previously unseen dish names without needing to be retrained. According to the Ranking Loss metric, the model improves on the baseline by 15%.
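The abstract reports its result with the Ranking Loss metric but does not define it here. As a rough illustration only, assuming the standard multi-label definition (the fraction of incorrectly ordered label pairs) and a single correct dish per photo, the metric for one image could be sketched as follows. The function name, the dish names, and the scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ranking_loss(scores: Sequence[float], correct_index: int) -> float:
    """Fraction of wrong menu items scored at least as high as the correct one.

    Assumption: exactly one menu item is correct for the image, so the
    multi-label ranking loss reduces to counting the wrong items that are
    not ranked strictly below the correct item. Ties count as errors here,
    which is a design choice of this sketch.
    """
    if len(scores) < 2:
        return 0.0  # nothing to misorder on a one-item menu
    target = scores[correct_index]
    misordered = sum(
        1 for i, s in enumerate(scores) if i != correct_index and s >= target
    )
    return misordered / (len(scores) - 1)


# Hypothetical menu and image-to-dish similarity scores (higher is better).
menu = ["paella", "gazpacho", "tortilla", "croquetas"]
scores = [0.62, 0.10, 0.81, 0.35]

# The photo shows paella (index 0); only tortilla is scored above it.
loss = ranking_loss(scores, correct_index=0)
print(f"ranking loss: {loss:.3f}")
```

A value of 0 means the correct dish is ranked first on the menu, and a value of 1 means every other dish is ranked above it, so lower is better.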


Keywords: Multimodal learning, Computer vision, Food recognition



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Universitat de Barcelona, Barcelona, Spain
  2. Computer Vision Center, Bellaterra, Spain
