
CNN-Based Erratic Cigarette Code Recognition

  • Zhi-Feng Xie
  • Shu-Han Zhang
  • Peng Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11901)

Abstract

The cigarette code is a string printed on the wrapper of a cigarette packet that tobacco administrations use to detect illegal sales. In practice, the code is transcribed and entered into the administration system manually during on-site inspection, which is time-consuming and laborious. In this paper, we propose a new solution based on convolutional neural networks for intelligent transcription. Our recognition method consists of four components: detection, identification, alignment, and regularization. First, the detection component fine-tunes an end-to-end detection network to obtain the bounding-box region of the cigarette code. Then the identification component applies an optimized CNN architecture to recognize each character in that region. Meanwhile, the alignment component trains a CPM-based network to estimate the positions of all characters, including any missing ones. Finally, the regularization component uses a matching algorithm to produce a regularized result containing all characters. The experimental results demonstrate that our proposed method makes cigarette code transcription more accurate, faster, and less labor-intensive.
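The final regularization step pairs each character returned by the identification component with one of the character positions estimated by the alignment component, so that unrecognized characters appear as explicit gaps in the output. The sketch below is a minimal illustration of that idea, not the authors' actual algorithm: it assumes one-dimensional slot coordinates and uses a simple greedy nearest-slot assignment (an optimal assignment method such as the Hungarian algorithm could be substituted). All names and data are hypothetical.

```python
def regularize(slot_positions, recognized):
    """Assign recognized characters to estimated character slots.

    slot_positions -- x-coordinates of all expected character slots
                      (from the alignment component)
    recognized     -- list of (char, x) pairs actually read by the
                      identification component
    Returns the full-length code string, with '?' marking slots for
    which no character was recognized.
    """
    result = ['?'] * len(slot_positions)
    used = set()
    for ch, x in recognized:
        # Greedily pick the nearest still-unassigned slot for this character.
        free = [i for i in range(len(slot_positions)) if i not in used]
        best = min(free, key=lambda i: abs(slot_positions[i] - x))
        used.add(best)
        result[best] = ch
    return ''.join(result)


# Four expected slots; the second character was missed by the recognizer.
code = regularize([0, 10, 20, 30], [('A', 1), ('C', 21), ('D', 29)])
print(code)  # -> A?CD
```

The placeholder output makes missing characters explicit, so a downstream check (or a human inspector) can see exactly which positions failed rather than receiving a silently shortened string.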

Keywords

Cigarette code · Optical character recognition · Convolutional neural network


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Film and Television Engineering, Shanghai University, Shanghai, China
  2. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai, China
