Food-101 – Mining Discriminative Components with Random Forests

  • Lukas Bossard
  • Matthieu Guillaumin
  • Luc Van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


In this paper we address the problem of automatically recognizing pictured dishes. To this end, we introduce a novel method to mine discriminative parts using Random Forests (RF), which allows us to mine for parts simultaneously for all classes and to share knowledge among them. To improve the efficiency of mining and classification, we only consider patches that are aligned with image superpixels, which we call components. To measure the performance of our RF component mining for food recognition, we introduce a novel and challenging dataset of 101 food categories, with 101,000 images. With an average accuracy of 50.76%, our model outperforms all alternative classification methods except CNN, including SVM classification on Improved Fisher Vectors and existing discriminative part-mining algorithms, by 11.88% and 8.13%, respectively. On the challenging MIT-Indoor dataset, our method compares nicely to other state-of-the-art component-based classification methods.
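The abstract gives no algorithmic detail, so the following is only a minimal sketch of how leaves of a trained forest could be ranked as candidate discriminative components for all classes at once. It assumes that each training patch has already been routed to a leaf, and that a leaf is scored for a class by the empirical posterior p(class | leaf). The function names, the scoring rule, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def leaf_class_posteriors(leaf_ids, labels):
    """Empirical p(class | leaf) from the training patches reaching each leaf.

    leaf_ids and labels are parallel sequences (hypothetical inputs): the leaf
    a patch landed in, and the class label of the image it came from.
    """
    counts = defaultdict(Counter)
    for leaf, label in zip(leaf_ids, labels):
        counts[leaf][label] += 1
    posteriors = {}
    for leaf, ctr in counts.items():
        total = sum(ctr.values())
        posteriors[leaf] = {label: n / total for label, n in ctr.items()}
    return posteriors


def top_leaves_per_class(posteriors, top_n):
    """For every class, keep the top_n leaves with the highest p(class | leaf).

    All classes are ranked over the same set of leaves, so one forest serves
    every class (assumed scoring rule, ties broken by leaf id).
    """
    classes = {c for dist in posteriors.values() for c in dist}
    selected = {}
    for c in sorted(classes):
        ranked = sorted(
            posteriors,
            key=lambda leaf: (-posteriors[leaf].get(c, 0.0), leaf),
        )
        selected[c] = ranked[:top_n]
    return selected


# Toy data: eight patches falling into three leaves.
leaf_ids = ["a", "a", "a", "b", "b", "c", "c", "c"]
labels = ["pizza", "pizza", "sushi", "sushi", "sushi", "pizza", "ramen", "ramen"]

posteriors = leaf_class_posteriors(leaf_ids, labels)
selected = top_leaves_per_class(posteriors, top_n=1)
```

In this toy run, leaf "b" contains only sushi patches and is therefore the strongest candidate for that class, while leaf "a" is shared between pizza and sushi and scores lower for both.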


Keywords: Image classification · Discriminative part mining · Random Forest · Food recognition



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Lukas Bossard (1)
  • Matthieu Guillaumin (1)
  • Luc Van Gool (1, 2)

  1. Computer Vision Lab, ETH Zürich, Switzerland
  2. ESAT, PSI-VISICS, K.U. Leuven, Belgium
