1 Introduction

Recognizing noisy text is difficult for most OCR algorithms [16]. Many documents that need to be digitized contain spots, curled page corners, and wrinkles: the so-called noise. This often leads to recognition errors, whereas on a cleaned image the recognition accuracy can increase, in some cases up to 100%. The quality of character recognition also varies greatly depending on the text recognition, filtering, and image segmentation algorithms used [13].

Currently, there are many different solutions to this problem, but most of them either do not provide satisfactory results, or demand significant hardware resources and are highly time-consuming [3].

This article proposes an effective method for cleaning printed and handwritten texts of noise [2], based on the sequential combination of a convolutional neural network with the U-Net architecture and a multilayer perceptron.


2 The Concept of a Neural Network

Neural networks belong to the field of artificial intelligence and are based on attempts to reproduce the human nervous system. Their key capability is the ability to learn and to correct errors.

A mathematical neuron is a computational unit that receives input data, performs a calculation, and transmits the result further along the network.

Each neuron has inputs (x1, x2, x3, …, xn) through which it receives data, and stores the weights of its connections. When a neuron is activated, a nonlinear transformation (the activation function) of the weighted sum of its input values is computed. In other words, the output value of the neuron is calculated as:

$$ O = f_a \left( x_1 w_1 + x_2 w_2 + \cdots + x_n w_n \right) $$
(1)

where $x_1, \ldots, x_n$ are the input values of the neuron, $w_1, \ldots, w_n$ are its weight coefficients, and $f_a$ is the activation function. Training a neuron means adjusting its weights by some particular method.
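
As a minimal illustration, equation (1) can be written in a few lines of NumPy; the sigmoid activation below is only an assumed choice of $f_a$, since the text does not fix a particular activation function:

```python
import numpy as np

def sigmoid(z):
    # An assumed choice of activation function f_a.
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w):
    # Equation (1): activation of the weighted sum of the inputs.
    return sigmoid(np.dot(x, w))

x = np.array([0.5, -1.2, 3.0])   # input values x_1 ... x_n
w = np.array([0.8, 0.1, -0.4])   # weight coefficients w_1 ... w_n
print(neuron_output(x, w))       # output value O of the neuron
```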

We considered feedforward network architectures [11]: sequentially connected layers of mathematical neurons, where the input values of each layer are the output values of the neurons of the previous layer. When the network is activated, values are fed to the input layer and each layer is activated in turn; the result of network activation is the set of output values of the last layer.

Feedforward networks are usually trained with the backpropagation method [12] and its modifications. This method belongs to the supervised training methods and is itself a form of gradient descent. The output values of the network are subtracted from the desired values; the resulting error is propagated through the network in the opposite direction, and the weights are adjusted to bring the output of the network as close as possible to the desired values.
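
As a sketch of this training scheme, the following toy example trains a small feedforward network in Keras with plain stochastic gradient descent; the XOR data, layer sizes, and learning rate are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from tensorflow import keras

# A small feedforward network: two sequentially connected dense layers.
model = keras.Sequential([
    keras.layers.Dense(4, activation="sigmoid", input_shape=(2,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
# Backpropagation with gradient descent minimizes the error between the
# network outputs and the desired values.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.5), loss="mse")

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")  # desired XOR outputs
model.fit(x, y, epochs=2000, verbose=0)
print(model.predict(x, verbose=0).round(2))
```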

3 Overview of Existing Methods

To date, classical computer vision algorithms remain the most popular choice for cleaning images of noise.

One way to eliminate dark spots and other similar artifacts in an image is adaptive thresholding [8]. This operation does not binarize the image with a constant threshold but takes the values of neighboring pixels into account, so the spot areas are removed. However, after the threshold transformation some noise remains in the image as a scattering of small dots where the spots were (some parts of a spot exceed the threshold). The result of adaptive thresholding is shown in Fig. 1.

Fig. 1. Result of adaptive threshold
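
A minimal OpenCV sketch of this step; the file name and the neighborhood parameters are illustrative assumptions:

```python
import cv2

img = cv2.imread("noisy_page.png", cv2.IMREAD_GRAYSCALE)
binary = cv2.adaptiveThreshold(
    img, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,  # threshold from a Gaussian-weighted neighborhood
    cv2.THRESH_BINARY,
    11,  # blockSize: size of the pixel neighborhood
    8,   # C: constant subtracted from the weighted mean
)
cv2.imwrite("adaptive_threshold.png", binary)
```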

Applying erosion and dilation filters in succession [9] helps to remove this effect, but such operations can damage the letters of the text. This is shown in Fig. 2.

Fig. 2. Erosion and dilation filters applied
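
A sketch of this morphological clean-up, again with assumed kernel size and iteration counts:

```python
import cv2
import numpy as np

binary = cv2.imread("adaptive_threshold.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((2, 2), np.uint8)
# On a page with dark text on a white background, dilating the white areas
# erases small dark dots; the subsequent erosion restores stroke thickness,
# though thin letter parts may still be damaged.
cleaned = cv2.dilate(binary, kernel, iterations=1)
cleaned = cv2.erode(cleaned, kernel, iterations=1)
cv2.imwrite("morphology.png", cleaned)
```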

This result is not sufficient for recognition, so it can be improved with the non-local means method [10]. This method is applied before the adaptive threshold transformation and reduces the values of the points where small noise occurs. The result shown in Fig. 3 is much better, but small artifacts such as lines and dots still remain.

Fig. 3. Non-local means algorithm applied first
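
A sketch of the combined pipeline, applying non-local means before the adaptive threshold; the filter strength h and the window sizes are assumed values:

```python
import cv2

img = cv2.imread("noisy_page.png", cv2.IMREAD_GRAYSCALE)
# Non-local means averages similar patches, suppressing small-scale noise.
denoised = cv2.fastNlMeansDenoising(img, None, h=30,
                                    templateWindowSize=7, searchWindowSize=21)
binary = cv2.adaptiveThreshold(denoised, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 8)
cv2.imwrite("nl_means_then_threshold.png", binary)
```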

Analysis of the existing methods has shown that classical computer vision algorithms do not always produce a good result and need to be improved.

4 Description of the Method Developed for Cleaning Printed and Handwritten Texts of Noise

4.1 Task Setting

The task of cleaning text of noise, recognizing the text in the image, and converting it into text format consists of a number of subtasks [4, 6]:

  1. Selecting a test image;

  2. Converting the color image to grayscale;

  3. Scaling and cutting the image to a fixed size;

  4. Cleaning the text of noise with a convolutional neural network and a multilayer perceptron.

4.2 Preparation of Images Before Training

After reading, the input image is converted to a single channel (grayscale), where the value of each pixel is calculated as:

$$ Y' = 0.299R + 0.587G + 0.114B $$
(2)

This equation is used in the OpenCV library and is justified by the characteristics of human color perception [14, 15].
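
In OpenCV this is a single call; the manual computation below merely illustrates equation (2) (note that OpenCV stores channels in B, G, R order):

```python
import cv2
import numpy as np

img = cv2.imread("input.png")                 # 3-channel BGR image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # applies Y' = 0.299R + 0.587G + 0.114B

# Equivalent manual computation of equation (2):
b, g, r = img[..., 0], img[..., 1], img[..., 2]
gray_manual = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
```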

The image size can be arbitrary, but excessively large images are undesirable.

For training, 144 images are used. Since the available training sample was too small to train the convolutional network, it was decided to divide the images into parts. Because the training sample consists of images of different sizes, each image was first scaled to 448 × 448 pixels using linear interpolation and then cut into non-overlapping windows of 112 × 112 pixels. All windows were additionally rotated by 90, 180, and 270°, so about 9216 images were obtained for the training sample. As a result, an array with the dimensions (16, 112, 112, 1) is fed to the network input. The test sample was processed in the same way; it consisted of similar images that differed only in the noise texture and the text. The process of slicing and resizing an image is shown in Fig. 4.

Fig. 4. Process of slicing and resizing of an image
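
A minimal sketch of this preparation step for a single grayscale image, assuming NumPy and OpenCV:

```python
import cv2
import numpy as np

def prepare_tiles(image):
    # Scale to 448 x 448 with linear interpolation.
    resized = cv2.resize(image, (448, 448), interpolation=cv2.INTER_LINEAR)
    tiles = []
    # Cut into 16 non-overlapping 112 x 112 windows.
    for y in range(0, 448, 112):
        for x in range(0, 448, 112):
            tile = resized[y:y + 112, x:x + 112]
            # Augment with rotations by 90, 180, and 270 degrees.
            for k in range(4):
                tiles.append(np.rot90(tile, k))
    # 16 windows x 4 orientations, single channel.
    return np.array(tiles)[..., np.newaxis]   # shape (64, 112, 112, 1)

# 144 images x 16 windows x 4 orientations = 9216 training tiles
```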

The training sample of the multilayer perceptron is formed as follows [7] (see the sketch after this list):

  1. The images of the training set are passed through the pre-trained network of the U-Net architecture. Of the 144 images, only 36 were processed;

  2. 28 different filters are applied to each resulting image, so together with the initial one, 29 variants of each image are obtained;

  3. Pairs (input vector; target vector) are then formed. The input vector consists of the pixels located at the same position in the 29 resulting images; the target vector consists of the one corresponding pixel from the cleaned image;

  4. Step 3 is performed for each of the 36 images. As a result, the training sample contains 36 × 448 × 448 elements.

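A sketch of how such pairs can be assembled; the paper does not list the 28 filters, so the two shown here are illustrative placeholders:

```python
import cv2
import numpy as np

def make_filtered_stack(unet_output):
    variants = [unet_output]                                   # the initial image
    variants.append(cv2.GaussianBlur(unet_output, (3, 3), 0))  # filter 1 (assumed)
    variants.append(cv2.medianBlur(unet_output, 3))            # filter 2 (assumed)
    # ... 26 further filters, for 29 variants in total
    return np.stack(variants, axis=-1)                         # (448, 448, n_variants)

def make_training_pairs(unet_output, clean_image):
    stack = make_filtered_stack(unet_output)
    # One input vector per pixel position (one value from each variant),
    # paired with the corresponding pixel of the cleaned reference image.
    inputs = stack.reshape(-1, stack.shape[-1])                # (448*448, n_variants)
    targets = clean_image.reshape(-1, 1)                       # (448*448, 1)
    return inputs, targets
```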

4.3 Artificial Neural Network Training

In the proposed method, a convolutional neural network with the U-Net architecture [17] and a multilayer perceptron are used in sequence to clean printed and handwritten texts of noise. An array of non-overlapping 112 × 112 areas of the original image is fed to the network input, and the output is a similar array of processed areas.

A reduced version of the U-Net architecture was selected, consisting of only two blocks instead of the original four (Fig. 5).

Fig. 5. Abbreviated architecture of U-Net

The advantage of this architecture is that it requires only a small amount of training data. At the same time, the network has a relatively small number of weights because of its convolutional structure.

The architecture is a sequence of convolution and pooling layers [18] that first reduce the spatial resolution of the image and then increase it again, combining the upsampled image with the corresponding encoder feature maps and passing it through further convolution layers.
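
A minimal Keras sketch of such a two-block U-Net; the filter counts, activations, and loss are assumptions, since the paper does not list its hyperparameters:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(112, 112, 1))

# Contracting path: convolution + pooling reduce spatial resolution.
c1 = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
p1 = layers.MaxPooling2D(2)(c1)                            # 112 -> 56
c2 = layers.Conv2D(64, 3, activation="relu", padding="same")(p1)
p2 = layers.MaxPooling2D(2)(c2)                            # 56 -> 28

b = layers.Conv2D(128, 3, activation="relu", padding="same")(p2)

# Expanding path: upsample and concatenate with the matching encoder output.
u2 = layers.concatenate([layers.UpSampling2D(2)(b), c2])   # 28 -> 56
c3 = layers.Conv2D(64, 3, activation="relu", padding="same")(u2)
u1 = layers.concatenate([layers.UpSampling2D(2)(c3), c1])  # 56 -> 112
c4 = layers.Conv2D(32, 3, activation="relu", padding="same")(u1)

outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```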

Although the convolutional network coped with the majority of the noise, the image became less sharp and retained artifacts. To improve the quality of the text in the image, another artificial neural network is used: the multilayer perceptron.

The output array of the convolutional network is stitched together into a single image of 448 × 448 pixels, after which it is fed to the input of the multilayer perceptron. The format of its training set was described in Sect. 4.2.

The multilayer perceptron consists of 3 layers: 29 input neurons, 500 neurons in the hidden layer, and one output neuron [1].
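
In Keras this perceptron can be sketched as follows; the activation functions, optimizer, and loss are assumed, as the paper does not specify them:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 29 inputs (one per image variant of a pixel), 500 hidden neurons,
# one output: the cleaned value of that pixel.
mlp = keras.Sequential([
    layers.Dense(500, activation="sigmoid", input_shape=(29,)),
    layers.Dense(1, activation="sigmoid"),
])
mlp.compile(optimizer="adam", loss="mse")
```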

4.4 Testing of an Artificial Neural Network

The results of processing the original image using the reduced U-Net architecture are shown in Fig. 6.

Fig. 6. Comparison of the original image and the image processed with the convolutional neural network

As a result of the subsequent processing of the obtained image, its sharpness and contrast increased significantly, and small artifacts were removed. An example is shown in Fig. 7.

Fig. 7. Cleared image

5 Developed Solution

In the course of the study, a software module was developed for digitizing damaged documents with the proposed method. Python 3 was chosen as the development language, together with the Keras neural network library, the NumPy library, and the OpenCV computer vision library.

The module also has the ability to recognize text from the processed image using the Tesseract OCR engine [5].
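
One possible wiring for this step, assuming the pytesseract bindings (the paper does not specify which Tesseract interface the module uses):

```python
import cv2
import pytesseract

cleaned = cv2.imread("cleared_image.png", cv2.IMREAD_GRAYSCALE)  # illustrative file name
text = pytesseract.image_to_string(cleaned)  # OCR on the cleaned image
print(text)
```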

When a noisy image with text is processed by this module, the output is the text in a format suitable for further processing.

6 Conclusion

We reviewed several existing methods for cleaning noisy printed documents, identified their shortcomings, and proposed a method with higher efficiency. The method described in this work requires a small training sample, works quickly, and removes noise with an average accuracy of 93% [20].

Thus, an image processed by the method described in this article is quite clean, has no significant distortions, and is easily recognized by most OCR engines and applications.

In the future, these methods can be applied in libraries, hospitals [19], and news companies, where people work with non-digitized papers that need to be digitized.