Abstract
In Mongolian historical document recognition, preprocessing mainly involves image binarization and denoising. This is a challenging task and greatly affects recognition accuracy. Because image binarization and denoising are both image-to-image tasks, this paper proposes an end-to-end preprocessor for Mongolian historical document OCR. The preprocessor is trained in an adversarial learning fashion and deals with binarization and denoising simultaneously. Its input is a color image of a Mongolian document, and its output is a clean binary image that can be used for word recognition. The preprocessor was trained on a limited dataset and performed better than the combination of binarization and denoising methods used in earlier Mongolian historical document OCR systems.
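The abstract does not spell out the training objective, so the following is only a minimal sketch of what adversarial training for an image-to-image preprocessor could look like, assuming a conditional-GAN setup in the pix2pix style. The function names, the L1 term, and the weight of 100 are illustrative assumptions, not details confirmed by the paper. Images are represented as flattened lists of pixel values in [0, 1], and the discriminator outputs are plain probabilities.

```python
import math

def bce(p, label, eps=1e-7):
    """Binary cross-entropy of one probability p against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """D should score (input, ground-truth) pairs as 1 and (input, generated) pairs as 0."""
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake, generated, target, l1_weight=100.0):
    """G tries to fool D and, in this sketch, also stays close to the clean target in L1.

    l1_weight=100.0 is an assumed value, not one taken from the paper.
    """
    l1 = sum(abs(g - t) for g, t in zip(generated, target)) / len(target)
    return bce(d_fake, 1) + l1_weight * l1

# Toy example with hypothetical 4-pixel images (0 = ink, 1 = background).
target = [0.0, 1.0, 1.0, 0.0]      # hypothetical clean binary ground truth
generated = [0.1, 0.9, 0.8, 0.0]   # hypothetical generator output
print(discriminator_loss(d_real=0.9, d_fake=0.2))
print(generator_loss(d_fake=0.2, generated=generated, target=target))
```

In a setup like this, a single generator learns binarization and denoising together because the target image is already both binary and clean. The adversarial term pushes outputs toward plausible clean pages, while the L1 term keeps them aligned with the ground truth.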
Acknowledgment
This work was funded by the National Natural Science Foundation of China (Grant No. 61762069), the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant Nos. 2017BS0601 and 2018MS06025), and the program of higher-level talents of Inner Mongolia University (Grant No. 21500-5165161).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Su, X., Xu, H., Zhang, Y., Kang, Y., Gao, G., Batusiren (2019). An End-to-End Preprocessor Based on Adversiarial Learning for Mongolian Historical Document OCR. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_21
DOI: https://doi.org/10.1007/978-3-030-29894-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29893-7
Online ISBN: 978-3-030-29894-4
eBook Packages: Computer Science, Computer Science (R0)