
Naïve Approach for Bounding Box Annotation and Object Detection Towards Smart Retail Systems

  • Pubudu Ekanayake
  • Zhaoli Deng
  • Chenhui Yang
  • Xin Hong
  • Jang Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11637)

Abstract

Companies increasingly deploy smart retail stores, built on sensor technologies, to reduce selling costs. Deep convolutional neural network models pre-trained for the object detection task achieve state-of-the-art results on many benchmarks. However, applying these algorithms in an intelligent retail system that supports automated checkout requires reducing the manual labelling cost of building retail datasets while meeting real-time requirements without sacrificing accuracy. In this paper, we propose a naive approach for obtaining the first portion of the bounding box annotations for a given custom image dataset, in order to reduce manual labelling cost. Experimental results show that our approach helps to label the first set of images in a short period of time. Furthermore, the custom module we designed reduces the number of parameters of the YOLO model by 41.77% while maintaining the original model's accuracy (85.8 mAP).
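The core of the approach, as the abstract describes it, is to bootstrap bounding box annotations for a custom dataset with minimal manual effort. As an illustration of how such bootstrapping can work, the sketch below propagates a single hand-drawn box across a sequence of frames of the same product using the correlation tracker from the Dlib toolkit (reference [22]); this is a minimal sketch of the general idea, not the authors' exact pipeline, and the names `frames`, `init_box`, and `bootstrap_annotations` are hypothetical.

```python
import dlib

def bootstrap_annotations(frames, init_box):
    """Propagate one manually drawn box across frames.

    frames:   list of RGB images (numpy arrays) of the same product
    init_box: (x1, y1, x2, y2) drawn by hand on frames[0]
    """
    x1, y1, x2, y2 = init_box
    tracker = dlib.correlation_tracker()
    tracker.start_track(frames[0], dlib.rectangle(x1, y1, x2, y2))

    annotations = [init_box]  # the only box that needed manual drawing
    for frame in frames[1:]:
        tracker.update(frame)  # returns a tracking-quality score (unused here)
        pos = tracker.get_position()
        annotations.append((int(pos.left()), int(pos.top()),
                            int(pos.right()), int(pos.bottom())))
    return annotations
```

Propagated boxes would typically still be verified, and occasionally corrected, by a human, but verifying a box is far cheaper than drawing one from scratch, which is where the labelling-cost reduction comes from.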
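The abstract does not describe the custom module's internals, but the cited MobileNets [23] and Xception [24] suggest depthwise separable convolutions as a plausible mechanism for the 41.77% parameter reduction. The arithmetic below is a sketch under that assumption, with illustrative layer sizes that are not taken from the paper.

```python
# Parameter count of a k x k convolution, ignoring biases:
#   standard:            k * k * c_in * c_out
#   depthwise separable: k * k * c_in (depthwise) + c_in * c_out (pointwise)
def standard_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out

# Illustrative 3x3 layer mapping 256 -> 512 channels:
print(standard_params(3, 256, 512))   # 1179648
print(separable_params(3, 256, 512))  # 133376 (~8.8x fewer parameters)
```

Because only some layers of a detector are typically replaced (the detection head, for instance, may be left intact), a per-layer saving of this magnitude can translate into a whole-model reduction in the ~40% range the paper reports.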

Keywords

Smart retail system · Object detection · Bounding box annotation · Convolutional neural network · YOLO

References

  1. Domdouzis, K., Kumar, B., Anumba, C.: Radio-frequency identification (RFID) applications: a brief introduction. Adv. Eng. Inf. 21(4), 350–355 (2007)
  2. Wankhede, K., Wukkadada, B., Nadar, V.: Just walk-out technology and its challenges: a case of Amazon Go. In: 2018 International Conference on Inventive Research in Computing Applications (ICIRCA). IEEE (2018)
  3. Wu, B.-F., et al.: An intelligent self-checkout system for smart retail. In: 2016 International Conference on System Science and Engineering (ICSSE). IEEE (2016)
  4. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. arXiv preprint arXiv:1612.08242 (2016)
  5. Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  6. Karlinsky, L., et al.: Fine-grained recognition of thousands of object categories with single-example training. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  7. George, M., et al.: Fine-grained product class recognition for assisted shopping. In: Proceedings of the IEEE International Conference on Computer Vision Workshops (2015)
  8. Tonioni, A., Di Stefano, L.: Product recognition in store shelves as a sub-graph isomorphism problem. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) ICIAP 2017. LNCS, vol. 10484, pp. 682–693. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68560-1_61
  9. Franco, A., Maltoni, D., Papi, S.: Grocery product detection and recognition. Expert Syst. Appl. 81, 163–176 (2017)
  10. Solti, A., et al.: Misplaced product detection using sensor data without planograms. Decis. Support Syst. 112, 76–87 (2018)
  11. Papadopoulos, D.P., et al.: We don’t need no bounding-boxes: training object class detectors using only human verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  12. Mettes, P., van Gemert, J.C., Snoek, C.G.M.: Spot on: action localization from pointly-supervised proposals. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 437–453. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_27
  13. Papadopoulos, D.P., et al.: Training object class detectors with click supervision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  14. Papadopoulos, D.P., Clarke, A.D.F., Keller, F., Ferrari, V.: Training object class detectors from eye tracking data. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 361–376. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_24
  15. Bilen, H., Vedaldi, A.: Weakly supervised deep detection networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  16. Kantorov, V., Oquab, M., Cho, M., Laptev, I.: ContextLocNet: context-aware deep network models for weakly supervised localization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 350–365. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_22
  17. Zhu, Y., et al.: Soft proposal networks for weakly supervised object localization. In: Proceedings of the IEEE International Conference on Computer Vision (2017)
  18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  19. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
  20. Van Horn, G., et al.: The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  21. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
  22. King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10(7), 1755–1758 (2009)
  23. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  24. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  25. Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 404–419. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_24
  26. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  27. Wei, X.-S., et al.: RPC: a large-scale retail product checkout dataset. arXiv preprint arXiv:1901.07249 (2019)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computer Science Department, School of Information Science and Engineering, Xiamen University, Xiamen, China
  2. College of Computer Science and Technology, Huaqiao University, Xiamen, China
  3. Cognitive Science Department, Sixth College, University of California, San Diego, USA
