Abstract
The article reviews existing methods and algorithms for clearing printed and handwritten texts of noise and proposes an alternative approach. Among the solutions analyzed, a group of methods based on adaptive threshold transformation is distinguished. Our method for clearing printed and handwritten documents of noise is based on the use of an ensemble of a convolutional neural network with a U-Net architecture and a multilayer perceptron. Applying the convolutional neural network and the multilayer perceptron sequentially demonstrates high efficiency on small training sets. Applying our method to the entire test sample achieved an image cleaning degree of 93%. In the future, these methods can be introduced in libraries, hospitals, and news companies, where people work with non-digitized papers and digitization is needed.
1 Introduction
Recognizing noisy text is difficult for most OCR algorithms [16]. Many documents that need to be digitized contain spots, curled page corners, and wrinkles - the so-called noise. This often results in recognition errors, whereas recognition accuracy on a cleaned image can approach 100%. The quality of character recognition varies greatly depending on the text recognition, filtering, and image segmentation algorithms used [13].
Currently, there are many different solutions to this problem, but most of them either do not provide satisfactory results, or require significant hardware resources and are highly time-consuming [3].
This article proposes an effective method for clearing printed and handwritten texts of noise [2], based on the sequential use of a convolutional neural network with a U-Net architecture and a multilayer perceptron.
2 The Concept of a Neural Network
Neural networks belong to the field of artificial intelligence and are based on attempts to reproduce the human nervous system, in particular its ability to learn and to correct errors.
A mathematical neuron is a computational unit that receives input data, performs calculations, and transmits the result further along the network.
Each neuron has inputs (x_1, x_2, x_3, …, x_n) through which it receives data, and stores a weight for each of its connections. When a neuron is activated, a nonlinear transformation (the activation function) of the weighted sum of the neuron's input values is computed. In other words, the output value of the neuron is calculated as:

$$y = f_a\left(\sum_{i=1}^{n} w_i x_i\right)$$

where x_1, …, x_n are the input values of the neuron, w_1, …, w_n are the weight coefficients of the neuron, and f_a is the activation function. Training a neuron consists in adjusting its weights by a particular method.
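As an illustration, the activation above can be sketched in Python with NumPy; the sigmoid activation and the input and weight values here are arbitrary examples, not taken from the paper:

```python
import numpy as np

def sigmoid(s):
    # A common choice of nonlinear activation function f_a
    return 1.0 / (1.0 + np.exp(-s))

def neuron_output(x, w):
    # Weighted sum of the inputs passed through the activation function
    return sigmoid(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # example input values x_1..x_n
w = np.array([0.1, 0.4, -0.2])   # example weight coefficients w_1..w_n
y = neuron_output(x, w)          # a single scalar output in (0, 1)
```

Training adjusts the entries of `w` so that `y` moves toward a desired target value.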
We considered feedforward network architectures [11]: sequentially connected layers of mathematical neurons. The input values of each subsequent layer are the output values of the neurons of the previous layer. When the network is activated, values are fed to the input layer and each layer is activated in sequence; the output values of the last layer are the result of the network activation.
Feedforward networks are usually trained with the backpropagation method [12] and its modifications. This method belongs to supervised learning and is itself a form of gradient descent. The output values of the network are subtracted from the desired values; the resulting error is propagated through the network in the opposite direction, and the weights are adjusted so as to bring the network output closer to the desired values.
3 Overview of Existing Methods
To date, classical computer vision algorithms remain the most popular in the task of clearing images of noise.
One way to eliminate dark spots and similar artifacts in an image is adaptive thresholding [8]. This operation does not binarize the image by a constant threshold but takes into account the values of the neighboring pixels, so the spot areas are eliminated. However, after the threshold transformation, noise remains in the image in the form of a set of small dots where the spots used to be (some parts of a spot exceed the threshold). The result of adaptive thresholding is shown in Fig. 1.
Successively applying erosion and dilation filters [9] helps to get rid of this effect, but such operations can damage the letters of the text. This is shown in Fig. 2.
This result is not sufficient for recognition, so it can be improved using the non-local means method [10]. This method is applied before the adaptive threshold transformation and reduces the values at the points where small noise occurs. The result shown in Fig. 3 is much better, but small artifacts such as lines and dots remain.
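For illustration, a simplified mean-based adaptive threshold can be sketched as follows; the window size and offset here are illustrative, and production code would normally use OpenCV's `cv2.adaptiveThreshold` instead:

```python
import numpy as np

def adaptive_threshold(img, block=15, c=10):
    """Binarize img (a 2-D uint8 array) against the local mean.

    Each pixel is compared with the mean of its (block x block)
    neighborhood minus a constant c, so large dark spots that shift
    the local mean are suppressed rather than binarized wholesale.
    """
    h, w = img.shape
    pad = block // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    # Integral image gives O(1) window sums
    ii = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            s = (ii[y + block, x + block] - ii[y, x + block]
                 - ii[y + block, x] + ii[y, x])
            mean = s / (block * block)
            out[y, x] = 255 if img[y, x] > mean - c else 0
    return out
```

A dark spot covering many pixels lowers the local mean with it, so most of the spot survives binarization as background, which is exactly the behavior a global threshold cannot provide.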
Analysis of existing methods has shown that classical computer vision algorithms do not always yield good results and need improvement.
4 Description of the Method Developed for Clearing Print and Handwritten Texts from Noise
4.1 Task Setting
The task of clearing text from noise, recognizing text in an image, and converting it into text format consists of a number of subtasks [4, 6]:

1. Selecting a test image;
2. Converting color images to grayscale;
3. Scaling and cutting images to a certain size;
4. Clearing the text of noise with a convolutional neural network and a multilayer perceptron.
4.2 Preparation of Images Before Training
After reading, the input image is converted to a single channel (grayscale), where the value of each pixel is calculated as:

$$Y = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B$$

This equation is used in the OpenCV library and is justified by the characteristics of human color perception [14, 15].
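For illustration, this conversion can be sketched directly in NumPy (the channel order is assumed to be RGB here; OpenCV itself stores images in BGR order):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to single-channel grayscale
    using the same weights as OpenCV: Y = 0.299 R + 0.587 G + 0.114 B.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(np.float64) @ weights).astype(np.uint8)
```

In practice the equivalent OpenCV call is `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`.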
Image sizes can be arbitrary, but too large sizes are undesirable.
For training, 144 pictures are used. Since the available training sample was not large enough to train the convolutional network, it was decided to divide the images into parts. Because the training sample consists of images of different sizes, each image was first scaled to 448 × 448 pixels using linear interpolation. After that, each image was cut into non-overlapping windows of 112 × 112 pixels, and all images were additionally rotated by 90, 180, and 270°, giving four versions of each. Thus, 9216 images were obtained in the training sample (144 × 16 × 4). As a result, an array with the dimensions (16, 112, 112, 1) is fed to the input of the network for each source image. The test sample was processed in the same way; it consisted of similar images, differing only in the noise texture and the text. The process of slicing and resizing an image is shown in Fig. 4.
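The slicing and augmentation steps above can be sketched as follows; the resizing step is omitted, and the image is assumed to be already scaled to 448 × 448:

```python
import numpy as np

def make_patches(img):
    """Cut a 448x448 grayscale image into 16 non-overlapping
    112x112 windows, shaped (16, 112, 112, 1) for the network input."""
    assert img.shape == (448, 448)
    return (img.reshape(4, 112, 4, 112)
               .transpose(0, 2, 1, 3)
               .reshape(16, 112, 112, 1))

def augment(img):
    # The original plus 90/180/270-degree rotations -> 4 versions
    return [np.rot90(img, k) for k in range(4)]
```

With 144 source images, 16 patches each, and 4 rotations, this yields the 144 × 16 × 4 = 9216 training images mentioned above.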
The training sample of the multilayer perceptron is formed as follows [7]:
1. The images of the training set are passed through the pre-trained network of the U-Net architecture. Of the 144 images, only 36 were processed;
2. 28 different filters are applied to the resulting images. Thus, together with the original, 29 different versions of each image are obtained;
3. Next, pairs are formed (input vector; resulting vector). The input vector is formed from the pixels located at the same position in the 29 resulting images. The resulting vector consists of the one corresponding pixel from the cleaned image;
4. Step (3) is performed for each of the 36 images. As a result, the training sample contains 36 × 448 × 448 elements.
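The formation of the per-pixel training pairs can be sketched as follows; the filter bank itself is not specified in the paper, so the 29 image versions are taken as a given input here:

```python
import numpy as np

def make_training_pairs(versions, clean):
    """versions: array (29, H, W) - the U-Net output plus its 28
    filtered copies; clean: array (H, W) - the noise-free target.
    Returns X of shape (H*W, 29) and y of shape (H*W,): one
    training pair per pixel position."""
    n, h, w = versions.shape
    X = versions.reshape(n, h * w).T  # row i: pixel i in all 29 images
    y = clean.reshape(h * w)
    return X, y
```

For 36 images of 448 × 448 pixels this produces 36 × 448 × 448 pairs, matching the sample size stated above.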
4.3 Artificial Neural Network Training
In the proposed method, a sequential combination of a convolutional neural network with the U-Net architecture [17] and a multilayer perceptron is used to clean printed and handwritten texts of noise. An array of non-overlapping 112 × 112 areas of the original image is fed to the input of the network, and the output is a similar array with processed areas.
A smaller version of the U-Net architecture was selected, consisting of only two blocks instead of the original four (Fig. 5).
The advantage of this architecture is that a small amount of training data is required for network training. At the same time, the network has a relatively small number of weights due to its convolutional architecture.
The architecture is a sequence of convolution and pooling layers [18] that reduce the spatial resolution of the image, followed by layers that increase it again, concatenating the upsampled image with the corresponding feature maps from the contracting path and passing the result through further convolution layers.
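A two-block U-Net of the kind described might look as follows in Keras; the filter counts and activations are illustrative assumptions, not the authors' exact configuration:

```python
from tensorflow.keras import layers, Model

def small_unet(size=112):
    inp = layers.Input((size, size, 1))
    # Contracting path: two convolution + pooling blocks
    c1 = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(64, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D()(c2)
    b = layers.Conv2D(128, 3, activation="relu", padding="same")(p2)
    # Expanding path: upsample and concatenate with the skip connections
    u1 = layers.Concatenate()([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(64, 3, activation="relu", padding="same")(u1)
    u2 = layers.Concatenate()([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    out = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inp, out)
```

The skip connections let the expanding path recover the fine spatial detail lost during pooling, which is why the architecture works with relatively few weights.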
Although the convolutional network coped with the majority of the noise, the image became less sharp and some artifacts remained. To improve the quality of the text in the image, another artificial neural network is used: the multilayer perceptron.
The output array of the convolutional network is stitched together into a single image of 448 × 448 pixels, which is then fed to the input of the multilayer perceptron. The format of the training set was described in Sect. 4.2.
The multilayer perceptron consists of 3 layers: 29 input neurons, 500 neurons in the hidden layer, and one output neuron [1].
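A perceptron with this structure can be sketched in Keras as follows; the activation functions, optimizer, and loss are assumptions, since the paper does not specify them:

```python
from tensorflow.keras import layers, Model

# 29 inputs (the same pixel taken from each of the 29 image versions),
# 500 hidden neurons, and one output neuron (the cleaned pixel value)
inp = layers.Input((29,))
hidden = layers.Dense(500, activation="relu")(inp)
out = layers.Dense(1, activation="sigmoid")(hidden)
mlp = Model(inp, out)
mlp.compile(optimizer="adam", loss="mse")
```

At inference time the network is applied once per pixel position, producing the final cleaned image pixel by pixel.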
4.4 Testing of an Artificial Neural Network
The results of processing the original image using the reduced U-Net architecture are shown in Fig. 6.
Subsequent processing of the obtained image significantly increased its sharpness and contrast, and small artifacts were also removed. An example is shown in Fig. 7.
5 Developed Solution
During the study, a software module was developed for digitizing damaged documents using this method. Python 3 was chosen as the development language, together with the Keras neural network library, the NumPy library, and the OpenCV computer vision library.
The module also has the ability to recognize text from the processed image using the Tesseract OCR engine [5].
Given a noisy text image as input, the module outputs the recognized text in a format suitable for further processing.
6 Conclusion
We reviewed several existing methods for clearing noisy printed documents, identified their shortcomings, and proposed a method with higher efficiency. The method described in this work requires a small training sample, works quickly, and removes noise with an average accuracy of 93% [20].
Thus, the image processed by the method described in this article is quite clean, has no significant distortion and is easily recognized by most OCR engines and applications.
In the future, these methods can be used in libraries, hospitals [19], news companies where people work with non-digitized papers and their digitization is needed.
References
Khorosheva, T.: Neural network control interface of the speaker dependent computer system «Deep Interactive Voice Assistant DIVA» to help people with speech impairments. In: International Conference on Intelligent Information Technologies for Industry. Springer, Cham (2018)
Cai, J., Liu, Z.-Q.: Off-line unconstrained handwritten word recognition. Int. J. Pattern Recognit. Artif. Intell. 14(03), 259–280 (2000)
Fan, K.-C., Wang, L.-S., Tu, Y.-T.: Classification of machine-printed and handwritten texts using character block layout variance. Pattern Recogn. 31(9), 1275–1284 (1998)
Imade, S., Tatsuta, S., Wada, T.: Segmentation and classification for mixed text/image documents using neural network. In: Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR 1993). IEEE (1993)
Impedovo, S., Ottaviano, L., Occhinegro, S.: Optical character recognition—a survey. Int. J. Pattern Recognit. Artif. Intell. 5(01n02), 1–24 (1991)
Rehman, A., Kurniawan, F., Saba, T.: An automatic approach for line detection and removal without smash-up characters. Imaging Sci. J. 59(3), 177–182 (2011)
Brown, M.K., Ganapathy, S.: Preprocessing techniques for cursive script word recognition. Pattern Recogn. 16(5), 447–458 (1983)
Bradley, D., Roth, G.: Adaptive thresholding using the integral image. J. Graph. Tools 12(2), 13–21 (2007)
Jawas, N., Nanik, S.: Image inpainting using erosion and dilation operation. Int. J. Adv. Sci. Technol. 51, 127–134 (2013)
Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2. IEEE (2005)
Hornik, K., Maxwell, S., Halbert, W.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989)
Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: Neural Networks for Perception, pp. 65–93. Academic Press, New York (1992)
Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition, vol. 2. IEEE (2007)
OpenCV: Color Conversions. https://docs.opencv.org/3.4/de/d25/imgproc_color_conversions.html. Accessed 01 May 2019
Güneş, A., Habil, K., Efkan, D.: Optimizing the color-to-grayscale conversion for image classification. Signal Image Video Process. 10(5), 853–860 (2016)
Sahare, P., Dhok, S.B.: Review of text extraction algorithms for scene-text and document images. IETE Tech. Rev. 34(2), 144–164 (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham (2015)
He, K.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
Adamo, F.: An automatic document processing system for medical data extraction. Measurement 61, 88–99 (2015)
Denoising dirty documents. https://www.kaggle.com/c/denoising-dirty-documents. Accessed 01 July 2019
© 2020 Springer Nature Switzerland AG
Chernenko, S. et al. (2020). The Method of Clearing Printed and Handwritten Texts from Noise. In: Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Fourth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’19). IITI 2019. Advances in Intelligent Systems and Computing, vol 1156. Springer, Cham. https://doi.org/10.1007/978-3-030-50097-9_2
Print ISBN: 978-3-030-50096-2
Online ISBN: 978-3-030-50097-9
eBook Packages: Intelligent Technologies and Robotics