Abstract
Optical Character Recognition (OCR), while representing significant progress in the field of computer vision, can also facilitate malicious acts that rely on automation. For example, those who copy whole books use OCR to avoid typing the text by hand whenever a plain-text version is not available; the same OCR process is also used by various bots to bypass CAPTCHA filters and gain access to certain functionalities. In this paper, we propose an approach for automatically converting text into a form that OCR systems cannot recognize. This approach uses adversarial machine learning techniques, crafting inputs in an evolutionary manner, to adapt documents by performing a relatively small number of changes that should, in turn, make the text unrecognizable. We show that our mechanism can preserve the readability of the text while achieving strong results against OCR services.
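The abstract does not give implementation details, so the following is only a minimal sketch of what crafting inputs in an evolutionary manner under a small change budget could look like; it is not the authors' method. Everything in it is an assumption made for illustration: the binary pixel grid, the hypothetical toy_ocr_confidence function (a stand-in for querying a real OCR engine), the flip budget max_flips (a stand-in for the "relatively small number of changes"), and the simple elitist selection scheme.

```python
import random

def toy_ocr_confidence(image, template):
    """Hypothetical stand-in for an OCR engine (assumption): returns the
    fraction of pixels in `image` that match `template`."""
    total = sum(len(row) for row in template)
    same = sum(p == t for ri, rt in zip(image, template) for p, t in zip(ri, rt))
    return same / total

def apply_flips(image, flips):
    """Return a copy of `image` with every (row, col) in `flips` inverted."""
    out = [row[:] for row in image]
    for r, c in flips:
        out[r][c] ^= 1
    return out

def evolve_perturbation(image, max_flips=6, pop_size=20, generations=40, seed=0):
    """Search for at most `max_flips` (>= 1) pixel flips that lower the toy
    recognizer's confidence, using mutation and elitist selection."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])

    def fitness(flips):
        # Lower is better: confidence of the recognizer on the perturbed image.
        return toy_ocr_confidence(apply_flips(image, flips), image)

    def mutate(flips):
        child = set(flips)
        # Remove a flip when the budget is exhausted (or occasionally at
        # random); otherwise add a new random flip.
        if child and (len(child) >= max_flips or rng.random() < 0.3):
            child.discard(rng.choice(sorted(child)))
        else:
            child.add((rng.randrange(height), rng.randrange(width)))
        return frozenset(child)

    population = [frozenset()]  # start from the unmodified image
    for _ in range(generations):
        offspring = [mutate(rng.choice(population)) for _ in range(pop_size)]
        candidates = set(population) | set(offspring)
        # Keep the candidates with the lowest confidence, preferring fewer flips.
        population = sorted(candidates, key=lambda f: (fitness(f), len(f)))[:pop_size]
    return population[0]

# A 5x5 glyph roughly shaped like the letter "A".
GLYPH = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

best = evolve_perturbation(GLYPH)
print("flipped pixels:", sorted(best))
print("confidence:", toy_ocr_confidence(apply_flips(GLYPH, best), GLYPH))
```

In a realistic setting, the toy confidence function would presumably be replaced by a score derived from an actual OCR engine's output, and the flip budget by whatever readability constraint the experiment imposes; neither is specified in the abstract.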
Acknowledgements
This work was supported by a grant from the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0272/17PCCDI-2018, within PNCDI III.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sporici, D., Chiroiu, M., Ciocîrlan, D. (2019). An Evaluation of OCR Systems Against Adversarial Machine Learning. In: Lanet, JL., Toma, C. (eds) Innovative Security Solutions for Information Technology and Communications. SECITC 2018. Lecture Notes in Computer Science, vol 11359. Springer, Cham. https://doi.org/10.1007/978-3-030-12942-2_11
DOI: https://doi.org/10.1007/978-3-030-12942-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12941-5
Online ISBN: 978-3-030-12942-2
eBook Packages: Computer Science, Computer Science (R0)