Bag of Tricks for Retail Product Image Classification

  • Muktabh Mayank Srivastava
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12131)


Retail product image classification is an important computer vision and machine learning problem for building real-world systems such as self-checkout stores and automated retail execution evaluation. In this work, we present several tricks that increase the accuracy of deep learning models on different types of retail product image classification datasets. These tricks increase the accuracy of fine-tuned ConvNets for retail product image classification by a large margin. As the most prominent trick, we introduce a new neural network layer, the Local-Concepts-Accumulation (LCA) layer, which gives consistent gains across multiple datasets. Two other tricks that we find increase accuracy on retail product identification are using an Instagram-pretrained ConvNet and using Maximum Entropy as an auxiliary loss for classification.
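The abstract does not spell out how Maximum Entropy enters the loss. A minimal sketch, assuming the common formulation in which the entropy of the predicted class distribution, scaled by a weight, is subtracted from the cross-entropy, is given below for a single example. The function names and the default `weight=0.1` are illustrative assumptions, not values taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]


def max_entropy_loss(logits, target, weight=0.1):
    """Cross-entropy minus weight * entropy of the predicted distribution.

    Assumption: `weight` is a hypothetical hyperparameter that trades off
    fitting the label against keeping the prediction high-entropy.
    """
    log_probs = log_softmax(logits)
    cross_entropy = -log_probs[target]
    # Entropy H(p) = -sum_i p_i * log p_i, with p_i = exp(log p_i).
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return cross_entropy - weight * entropy
```

With `weight=0.0` the function reduces to plain cross-entropy. A positive weight rewards less peaked predictions, which is the usual motivation for this regularizer when classes are visually similar.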


Keywords: Convolutional Neural Networks; Retail image recognition; Grocery image recognition; Image classification



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. ParallelDots, Inc., Gurugram, India
