
An Evaluation of OCR Systems Against Adversarial Machine Learning

  • Dan Sporici
  • Mihai Chiroiu
  • Dan Ciocîrlan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11359)

Abstract

Optical Character Recognition (OCR), while representing significant progress in the field of computer vision, can also contribute to malicious acts that rely on automation. For example, copycats of whole books use OCR technologies to eliminate the effort of retyping text by hand whenever a clear text version is not available; the same OCR process is also used by various bots to bypass CAPTCHA filters and gain access to certain functionality. In this paper, we propose an approach for automatically converting text into characters that OCR systems cannot recognize. The approach uses adversarial machine learning techniques, based on crafting inputs in an evolutionary manner, to adapt documents through a relatively small number of changes that render the text unrecognizable to the OCR engine. We show that our mechanism preserves the readability of the text while achieving strong results against OCR services.
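As an illustration of the evolutionary strategy summarized above, the following minimal sketch perturbs a rendered page image and keeps only those changes that degrade the OCR output. It is not the authors' implementation: the target engine (Tesseract accessed through pytesseract), the pixel-flipping perturbation operator, the difflib-based fitness measure, and the file names are all illustrative assumptions; the paper's actual method may differ substantially.

```python
# A minimal sketch of the evolutionary evasion idea summarized above.
# NOT the authors' implementation: the target engine (Tesseract via
# pytesseract), the perturbation operator (flipping a handful of pixels in a
# rendered page image), and the fitness function (dissimilarity between the
# OCR output and the ground-truth text) are simplifying assumptions.
import difflib
import random

from PIL import Image
import pytesseract


def ocr_text(img: Image.Image) -> str:
    """Query the target OCR engine on a candidate image."""
    return pytesseract.image_to_string(img).strip()


def mutate(img: Image.Image, n_pixels: int = 20) -> Image.Image:
    """Flip a small number of pixels -- a crude perturbation operator.

    A real system would constrain the changes so the text stays readable
    to humans (e.g. by penalizing large or clustered modifications).
    """
    out = img.copy()
    px = out.load()
    w, h = out.size
    for _ in range(n_pixels):
        x, y = random.randrange(w), random.randrange(h)
        px[x, y] = 255 if random.random() < 0.5 else 0
    return out


def fitness(img: Image.Image, ground_truth: str) -> float:
    """Higher is better: reward candidates whose OCR output diverges from
    the true text (0 = perfectly recognized, 1 = fully evaded)."""
    recognized = ocr_text(img)
    similarity = difflib.SequenceMatcher(None, recognized, ground_truth).ratio()
    return 1.0 - similarity


def evolve(original: Image.Image, ground_truth: str,
           pop_size: int = 16, generations: int = 50) -> Image.Image:
    """Simple (1 + lambda)-style loop: mutate the current best candidate
    and keep any offspring that evades the OCR engine more effectively."""
    best, best_fit = original, fitness(original, ground_truth)
    for _ in range(generations):
        for candidate in (mutate(best) for _ in range(pop_size)):
            cand_fit = fitness(candidate, ground_truth)
            if cand_fit > best_fit:
                best, best_fit = candidate, cand_fit
    return best


if __name__ == "__main__":
    # "page.png" and the ground-truth string are hypothetical placeholders.
    page = Image.open("page.png").convert("L")  # grayscale page image
    adversarial = evolve(page, ground_truth="the quick brown fox")
    adversarial.save("page_adversarial.png")
```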

Keywords

Optical character recognition · Genetic algorithm · Adversarial machine learning

Notes

Acknowledgements

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0272/17PCCDI-2018, within PNCDI III.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Bucharest, Romania
