An Evaluation of OCR Systems Against Adversarial Machine Learning

Sporici, Dan; Chiroiu, Mihai; Ciocîrlan, Dan

doi:10.1007/978-3-030-12942-2_11

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11359)

Included in the following conference series: International Conference on Security for Information Technology and Communications

Abstract

Optical Character Recognition (OCR), while representing significant progress in the field of computer vision, can also contribute to malicious acts that rely on automation. For example, those who copy whole books use OCR to avoid typing by hand whenever a plain-text version is not available; various bots use the same OCR process to bypass CAPTCHA filters and gain access to certain functionality. In this paper, we propose an approach for automatically converting text into characters that OCR systems cannot recognize. The approach uses adversarial machine learning techniques, crafting inputs in an evolutionary manner, to adapt documents through a relatively small number of changes that should, in turn, make the text unrecognizable. We show that our mechanism can preserve the readability of the text while achieving great results against OCR services.
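
The idea of crafting inputs in an evolutionary manner can be illustrated with a minimal sketch. This is not the authors' implementation: the 3x3 bitmap font, the `toy_ocr` nearest-template recognizer, and the fitness weights are all hypothetical stand-ins chosen so that the example runs without an OCR engine. The fitness rewards a larger edit (Levenshtein) distance between the recognized text and the true text, and penalizes every changed pixel so that the perturbation stays small.

```python
import random

# Hypothetical 3x3 bitmap font (assumption): stands in for rendered text.
GLYPHS = {
    "I": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
    "O": (1, 1, 1, 1, 0, 1, 1, 1, 1),
    "T": (1, 1, 1, 0, 1, 0, 0, 1, 0),
}
SIZE = 9  # pixels per glyph


def render(text):
    """Flatten the glyphs of `text` into one list of 0/1 pixels."""
    return [p for ch in text for p in GLYPHS[ch]]


def toy_ocr(pixels):
    """Toy recognizer (assumption): nearest glyph template by Hamming distance."""
    out = []
    for i in range(0, len(pixels), SIZE):
        cell = pixels[i:i + SIZE]
        out.append(min(
            GLYPHS,
            key=lambda ch: sum(a != b for a, b in zip(cell, GLYPHS[ch])),
        ))
    return "".join(out)


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fitness(candidate, original, text):
    """Reward misrecognition, penalize each changed pixel (weights are assumptions)."""
    misread = levenshtein(toy_ocr(candidate), text)
    changed = sum(a != b for a, b in zip(candidate, original))
    return 10 * misread - changed


def evolve(text, generations=50, pop_size=20, keep=5, seed=0):
    """Evolve a perturbed image of `text` that the toy recognizer misreads."""
    rng = random.Random(seed)
    original = render(text)
    population = [original]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            child = list(rng.choice(population))
            child[rng.randrange(len(child))] ^= 1  # mutation: flip one pixel
            children.append(child)
        # Elitist selection: parents compete with children, the best survive.
        population = sorted(
            population + children,
            key=lambda c: fitness(c, original, text),
            reverse=True,
        )[:keep]
    return population[0]
```

Calling `evolve("TOIL")` returns a pixel list that `toy_ocr` no longer reads as the original string. Against a real OCR service, the recognizer call would be replaced by a query to that service and the mutation operator would act on an actual document image; neither detail is specified in the abstract.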

Acknowledgements

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0272/17PCCDI-2018, within PNCDI III.

Author information

Authors and Affiliations

Bucharest, Romania
Dan Sporici, Mihai Chiroiu & Dan Ciocîrlan

Corresponding author

Correspondence to Dan Sporici.

Editor information

Editors and Affiliations

Inria-RBA, Rennes, France
Jean-Louis Lanet
Bucharest University of Economic Studies, Bucharest, Romania
Cristian Toma

About this paper

Cite this paper

Sporici, D., Chiroiu, M., Ciocîrlan, D. (2019). An Evaluation of OCR Systems Against Adversarial Machine Learning. In: Lanet, JL., Toma, C. (eds) Innovative Security Solutions for Information Technology and Communications. SECITC 2018. Lecture Notes in Computer Science, vol 11359. Springer, Cham. https://doi.org/10.1007/978-3-030-12942-2_11

DOI: https://doi.org/10.1007/978-3-030-12942-2_11
Published: 06 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12941-5
Online ISBN: 978-3-030-12942-2
eBook Packages: Computer Science, Computer Science (R0)
