
Handwritten Text Line Segmentation Method by Writing Pheromone Diffusion and Convergence

  • Yintong Wang
  • Wenjie Xiao
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1227)

Abstract

Text line segmentation in offline handwritten documents remains a challenge because offline handwritten text lines are often inconsistently curved and skewed. More seriously, the space between lines is often insufficient to distinguish them. In this paper, we propose a novel offline handwritten text line segmentation method based on writing pheromone diffusion and convergence. Inspired by the principle of gravity, we apply it to locating the lines of offline handwritten texts: the pheromone diffusion and convergence generate a pheromone matrix from which the key locations and fragments of each text line are extracted, which makes the method robust to various offline handwritten documents with curved and multi-skewed text lines. In experiments on a commonly used database of offline handwritten text images, our method achieves results comparable to the best state-of-the-art text line segmentation methods.

Keywords

Handwritten text, Line segmentation, Pheromone matrix, Diffusion and convergence

1 Introduction

Text line segmentation from offline handwritten text images is one of the major issues in offline handwritten text image analysis. It provides crucial information for the subsequent tasks of text fragment and character segmentation and of text string and character recognition. Compared with machine-printed document analysis, which already has to cope with complex layout structure and degraded image quality, offline handwritten document image analysis faces more complex character shapes and more irregular layouts, owing to the variability of individual writing styles [1, 2, 3]. For offline handwritten document images, text line segmentation is not completely solved, although great effort has been devoted to it and great advances have been achieved [4, 5, 6, 7].

In this paper, we present a handwritten text line segmentation method based on writing pheromone diffusion and convergence. The main components of the proposed approach are (i) the writing pheromone diffusion and convergence mode, (ii) the key locations and fragments of text lines, and (iii) text line segmentation. The main contribution of the proposed approach is that the pheromone diffusion and convergence generate the pheromone matrix used to extract the key locations and fragments of each text line, which makes the method robust to various offline handwritten document images with curved and multi-skewed text lines.

The rest of this paper is organized as follows. Section 2 briefly reviews previous related work on text line segmentation of offline handwritten document images. In Sect. 3, we explain the proposed methodology for text line segmentation by writing pheromone diffusion and convergence. In Sect. 4, we present the experimental results. Finally, we draw conclusions and discuss future work in Sect. 5.

2 Related Works

In this section, we review previous related work on text line segmentation and character segmentation of offline handwritten text images. As far as we know, the following methods either achieved the best results on the relevant public offline handwritten document datasets or are elements of comprehensive systems for specific tasks.

The main approaches to text line segmentation in offline handwritten document images can be divided into three categories: global, local, and hybrid. The global approach [8, 9, 10] focuses on isolated characters and references them by their position in the offline handwritten document. That is, it first estimates the positions of the text lines, then allocates the character strings and components to the most likely text lines, and finally divides the components into multiple independent text lines. Zhang et al. [11] propose a constrained seam carving method to capture the global characteristics of offline handwritten document images. This method calculates an energy map by passing along the connected components and extracts the representative text line positions from the energy map. Vo et al. [12] use a fully convolutional network to segment the text line structure in offline handwritten documents. This method roughly estimates the text lines with a line map and then constructs text strings that pass through the characters of the corresponding text line. Sindhushree et al. [3] proposed a global text line segmentation method based on entropy, in which text regions have higher entropy than non-text regions in offline handwritten documents, and text lines are segmented by separating the text from the non-text parts.

The local approach goes beyond the limits of the global methods: its goal is to find local characters or graphemes first and then aggregate them into text lines [13, 14, 15]. These approaches differ in how they represent and gather the local characters or graphemes. Nguyen and Lee [14] proposed a grouping approach to segment text lines, in which a text string connecting the center points of the characters in a text line is built, each tensor consists of the center location of a connected text string, and the curve saliency values and normal vectors are then computed to construct the character sequences. Zhang et al. [16, 17, 18] proposed a text line segmentation method based on the Hough transform, which treats the text lines as partitioned character or grapheme blocks of connected components (CCs). The CCs are split into small, middle, and large sized CCs according to the average height and width of all CCs; each middle sized CC is divided into equally sized blocks; and the Hough transform is applied to the centers of these blocks to extract the corresponding text lines from the accumulator array. Each CC is then either allocated to the nearest text line or decomposed into multiple parts that are again allocated to the nearest text lines.

The hybrid approach extracts text lines flexibly by combining the advantages of the local and global approaches [6, 19, 20, 21]. Guo et al. [19] treat the connected character or grapheme components as symbols and then compute the direction of the text lines using a special criterion framework. All three approaches have their own advantages and disadvantages. The global approach does not perform well on abnormal offline handwritten documents, such as those with touching, overlapping, and curved text lines. The performance of the local approach relies on preset parameters or heuristic rules, such as the nearest-component distance metric for the corresponding text lines. The hybrid approach is computationally complicated, and designing a robust combination framework is non-trivial.

3 Handwritten Text Line Segmentation Method

In this section, we introduce the handwritten text line segmentation method based on writing pheromone diffusion and convergence. The objective is to handle documents in which text lines have arbitrary skew angles and the characters or graphemes of neighboring text lines may be connected, so that the method is robust to various document images with multi-skewed, curved, and connected text lines.

3.1 Writing Pheromone Diffusion and Convergence Mode

The character height is important for estimating many parameters in handwritten text processing, such as character width, line height, and line spacing. In this paper, we randomly select multiple positions in the handwritten text, obtain the nearest handwriting CC to each position, and calculate the heights of these CCs, \( \left\{ {h_{1} ,h_{2} , \cdots ,h_{n} } \right\} \). The estimated character height (ch) is set to the median of these CC heights.
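
The estimation step can be sketched as follows. This is only an illustration under assumptions that are not in the paper: connected components are given as bounding boxes `(top, left, bottom, right)`, the "nearest" CC is the one whose box center is closest to the sampled position, and the names `estimate_char_height`, `n_samples`, and `seed` are hypothetical.

```python
import random
from statistics import median


def estimate_char_height(boxes, image_size, n_samples=9, seed=0):
    """Estimate ch as the median height of CCs nearest to random positions.

    boxes      -- list of (top, left, bottom, right) CC bounding boxes (assumed input)
    image_size -- (rows, cols) of the document image
    """
    rng = random.Random(seed)
    rows, cols = image_size
    heights = []
    for _ in range(n_samples):
        # Randomly selected position in the handwritten text image.
        r, c = rng.uniform(0, rows), rng.uniform(0, cols)
        # Nearest CC, measured from the position to the box center (assumption).
        top, left, bottom, right = min(
            boxes,
            key=lambda b: ((b[0] + b[2]) / 2 - r) ** 2 + ((b[1] + b[3]) / 2 - c) ** 2,
        )
        heights.append(bottom - top + 1)
    return median(heights)
```

Using the median rather than the mean keeps a few unusually tall or short components, such as merged characters or punctuation, from shifting the estimate.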

In the writing pheromone diffusion and convergence mode, each writing pixel is treated as a pheromone source that propagates information to its surroundings, and the information is then accumulated at each point of the handwritten text. For a handwritten text image X, the pheromone matrix \( PM_{n \times m} \) has size \( n \times m \) and is initialized to 0. In the pheromone diffusion, the writing pixel \( x_{ij} \) in the ith row and jth column carries one unit of information, and the information it propagates to a neighboring pixel is inversely related to their distance. The farthest propagation distance of the pheromone is set to k; that is, \( x^{\prime}_{ij} \) is the farthest point affected by \( x_{ij} \), and beyond it the information received from \( x_{ij} \) is zero or negligibly small. The pheromone information propagation matrix of \( x_{ij} \) is \( IN_{2k - 1,2k - 1} \), where \( IN_{k,k} = 1 \) means that \( x_{ij} \) contributes 1 unit of information to itself, and the information propagated to the pixels within the neighboring range k is
$$ IN_{k \pm \delta ,k \pm \delta } = fun\_inv\left( {dist_{k \pm \delta ,k \pm \delta } } \right),\quad 1 \le \delta < k . $$
(1)
The distance between \( x_{ij} \) and \( x_{i \pm \delta ,j \pm \delta } \), that is, between the center entry \( (k,k) \) of the propagation matrix and the entry \( (k \pm \delta ,k \pm \delta ) \), is:
$$ dist_{k \pm \delta ,k \pm \delta } = \sqrt {\left( {\left( {k \pm \delta } \right) - k} \right)^{2} + \left( {\left( {k \pm \delta } \right) - k} \right)^{2} } = \sqrt 2 \,\delta . $$
(2)
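
A minimal sketch of the propagation matrix follows. The paper does not specify `fun_inv`, so the form `1 / (1 + dist)` used here is an assumption, chosen only because it decreases with distance and gives 1 unit at distance 0. Distances are measured from the center entry \( IN_{k,k} \), and the row and column offsets are allowed to differ so that the whole \( (2k - 1) \times (2k - 1) \) square is filled; `propagation_kernel` is a hypothetical name.

```python
import math


def fun_inv(dist):
    # Assumed inverse-distance law (not given in the paper): 1 unit at distance 0.
    return 1.0 / (1.0 + dist)


def propagation_kernel(k):
    """Build the (2k-1) x (2k-1) propagation matrix IN of one writing pixel."""
    size = 2 * k - 1
    center = k - 1  # zero-based index of the entry IN_{k,k}
    kernel = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            # Euclidean distance from the center entry to entry (r, c).
            dist = math.hypot(r - center, c - center)
            kernel[r][c] = fun_inv(dist)
    return kernel
```

With `k = 3` the matrix is 5 x 5, the center entry is 1, and its four direct neighbors are 0.5 under the assumed `fun_inv`.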

In the pheromone convergence, the pheromone matrix \( PM_{n \times m} \) gathers all the writing pheromone of the handwritten text. The information of the pixel in the ith row and jth column is given by \( PM_{ij} = IN_{ij}^{ + } \), where \( IN_{ij}^{ + } \) represents the accumulated pheromone information that the pixel receives from the writing pixels within its neighboring range k. Note that the edge pixels of the image X need special handling: for example, the pixel \( x_{1,1} \) in the 1st row and 1st column is affected only by the pheromone of pixels in the fourth quadrant of the coordinate axes, and the pixel \( x_{n,m} \) in the nth row and mth column is affected only by the pheromone of pixels in the second quadrant.
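
The convergence step can be sketched as an accumulation over all writing pixels. The sketch assumes a binary image stored as a list of rows (1 for a writing pixel), reuses the assumed inverse law `1 / (1 + dist)` that the paper leaves unspecified, and handles image edges by simply skipping positions that fall outside the image; `pheromone_matrix` is a hypothetical name.

```python
import math


def pheromone_matrix(image, k):
    """Accumulate the pheromone of every writing pixel into PM (n x m).

    image -- list of rows with 1 for writing pixels and 0 for background
    k     -- farthest propagation range; offsets run from -(k-1) to k-1
    """
    n, m = len(image), len(image[0])
    pm = [[0.0] * m for _ in range(n)]
    reach = k - 1
    for i in range(n):
        for j in range(m):
            if not image[i][j]:
                continue  # only writing pixels emit pheromone
            for di in range(-reach, reach + 1):
                for dj in range(-reach, reach + 1):
                    r, c = i + di, j + dj
                    # Edge handling: contributions outside the image are dropped.
                    if 0 <= r < n and 0 <= c < m:
                        pm[r][c] += 1.0 / (1.0 + math.hypot(di, dj))
    return pm
```

This direct loop costs on the order of the number of writing pixels times \( (2k - 1)^{2} \); for full-size scans a convolution routine from a numerical library would normally replace it.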

3.2 Key Locations and Fragments

The pheromone matrix is composed of the pheromone diffusion and convergence of every handwriting pixel: high pheromone values correspond to regions of handwritten character pixels, and low pheromone values tend to correspond to non-text-line areas. Therefore, the key locations and fragments of a text line can be determined by analyzing the peaks and peak regions of the pheromone matrix, and the text line segmentation of the handwritten text is then carried out on this basis.

Min-max normalization is one of the simplest normalization techniques and is suited to cases where the maximum and minimum bounds of the scores produced by a matcher are known. In the pheromone computation, we can easily shift the minimum and maximum scores of the pheromone matrix to zero and one, respectively. The peaks of the pheromone matrix are the positions where the pheromone value is the highest in a local region, and they are also the most representative positions of a character belonging to the corresponding text line. In handwritten text, each writing pixel has an influence of one unit on itself and a gradually decreasing influence on the pixels within distance k, which forms the peak areas of characters and the valley areas between characters. As shown in Fig. 1(c), a local peak \( p_{ij} \) of the pheromone matrix \( PM_{n \times m} \), located at the ith row and jth column or at the central coordinate of several adjacent pixels, is considered to be a character center, that is, a key peak of the text lines. Formally, it satisfies the inequality:
$$ PM_{ij} \ge fun\_nei(p_{ij} ) . $$
(3)
where \( fun\_nei \) returns the maximum pheromone value among the pixels adjacent to \( p_{ij} \).

Fig. 1. Text line segmentation result on HIT-MW handwritten document
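
The normalization and the peak test of Eq. (3) can be sketched as below. The 8-neighborhood used for the adjacent pixels and the `min_value` cut-off that discards flat background are assumptions, and merging a plateau of equal adjacent pixels into one central coordinate is left out; `min_max_normalize` and `local_peaks` are hypothetical names.

```python
def min_max_normalize(pm):
    """Shift the minimum of pm to 0 and the maximum to 1."""
    lo = min(min(row) for row in pm)
    hi = max(max(row) for row in pm)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in pm]  # constant matrix
    return [[(v - lo) / span for v in row] for row in pm]


def local_peaks(pm, min_value=0.0):
    """Return (i, j) positions whose value is >= all adjacent pixels (Eq. 3)."""
    n, m = len(pm), len(pm[0])
    peaks = []
    for i in range(n):
        for j in range(m):
            # fun_nei: maximum over the 8-neighborhood, clipped at the edges.
            neighbors = [
                pm[r][c]
                for r in range(max(0, i - 1), min(n, i + 2))
                for c in range(max(0, j - 1), min(m, j + 2))
                if (r, c) != (i, j)
            ]
            # min_value skips flat background (assumed cut-off, not in the paper).
            if pm[i][j] > min_value and pm[i][j] >= max(neighbors, default=pm[i][j]):
                peaks.append((i, j))
    return peaks
```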

To further analyze the directionality of text lines, the pheromone matrix is considered as a surface and sliced at height \( \varphi \left( { 0\,{ < }\,\varphi \,{ < }\, 1} \right) \) to obtain cross sections of the pheromone matrix; these cross sections correspond to the regions with higher pheromone values, as shown in Fig. 1(d). If a cross section has a unique long axis (its height is less than 3/2 * ch and its length is greater than 2 * ch), then the long axis is a fragment of a text line. Otherwise, if there are multiple long axes in the cross section, the cross section contains multiple text lines, which indicates skew and slant in the text lines.
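
One way to sketch the slicing step is to threshold the normalized pheromone matrix at \( \varphi \), collect the connected regions, and test each region against the two size conditions. Measuring height and length on the axis-aligned bounding box and using 8-connectivity are simplifying assumptions (a strongly skewed fragment would need an oriented axis), and the names `cross_section_boxes` and `is_line_fragment` are hypothetical.

```python
from collections import deque


def cross_section_boxes(pm, phi):
    """Bounding boxes (top, left, bottom, right) of regions with pm >= phi."""
    n, m = len(pm), len(pm[0])
    seen = [[False] * m for _ in range(n)]
    boxes = []
    for i in range(n):
        for j in range(m):
            if seen[i][j] or pm[i][j] < phi:
                continue
            top, left, bottom, right = i, j, i, j
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:  # breadth-first flood fill of one cross section
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n and 0 <= cc < m
                                and not seen[rr][cc] and pm[rr][cc] >= phi):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            boxes.append((top, left, bottom, right))
    return boxes


def is_line_fragment(box, ch):
    """Size test from the text: height < 3/2 * ch and length > 2 * ch."""
    top, left, bottom, right = box
    height = bottom - top + 1
    length = right - left + 1
    return height < 1.5 * ch and length > 2 * ch
```

Regions that fail the height condition are the candidates for cross sections containing several text lines.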

3.3 Text Line Segmentation

The text line segmentation stage is a three-step process. First, in normal text line segmentation, the key peaks are mapped to the corresponding text line fragments, and a least squares polynomial fitting function is used to obtain the text line. Second, in abnormal text line segmentation, the remaining key peaks are assigned by referring to the nearest text line, so that the key peak cluster and the fit of each text line are obtained gradually. Finally, in text line post-processing, scattered key peaks and unconnected lines are either merged into the corresponding text line or treated as a single text line.

Least squares polynomial fitting is one of the simplest and most common forms of linear and non-linear regression; it finds the best-fitting straight line or curve through a given point set. In text line segmentation, it extracts the best-fitting text line for a given set of key peaks and text line fragments by minimizing the sum of the squares of the offsets of the points from the text line, so that the fitted line follows the overall trend of the key peaks and fragments rather than any single point. Let \( f(x,p) \) be a known function of \( x \), parametrized by \( p \), which consists of a minimal number of coefficients. The function is uniquely determined once the parameter set \( p \) is known. Fitting a curved text line amounts to computing the optimal parameter set \( p \) by minimizing the sum of squared differences between the measured values \( y^{\prime}_{i} \) of \( f \) and the values \( f(x^{\prime}_{i} ,p) \) determined from the model for the measured values \( x^{\prime}_{i} \) of \( x \). That is, given n measurement pairs \( \left( {x^{\prime}_{i} ,y^{\prime}_{i} } \right) \), find the \( p \) that minimizes [22]:
$$ s(p) = \sum\limits_{i = 1}^{n} {\left( {y^{\prime}_{i} - f(x^{\prime}_{i} ,p)} \right)^{2} } . $$
(4)
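
For a polynomial model \( f(x,p) = p_{0} + p_{1} x + \cdots + p_{d} x^{d} \), minimizing Eq. (4) reduces to solving the normal equations. The sketch below does this with plain Gaussian elimination; it assumes the key peaks are given as column and row coordinates with enough distinct columns for the chosen degree, and the name `fit_polynomial` is hypothetical. The paper does not state which polynomial degree or solver it uses.

```python
def fit_polynomial(xs, ys, degree):
    """Return [p0, p1, ..., pd] minimizing sum((y - f(x, p))**2)."""
    size = degree + 1
    # Normal equations A p = b with A[r][c] = sum x^(r+c), b[r] = sum y * x^r.
    a = [[sum(x ** (r + c) for x in xs) for c in range(size)] for r in range(size)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(size)]
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    p = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(a[r][c] * p[c] for c in range(r + 1, size))
        p[r] = (b[r] - tail) / a[r][r]
    return p


def evaluate(p, x):
    """Evaluate the fitted text line f(x, p) at column x."""
    return sum(coef * x ** power for power, coef in enumerate(p))
```

A degree of 1 gives a straight, possibly skewed line, while a degree of 2 or 3 can follow a curved text line; a higher degree fits the key peaks more closely but is more easily bent by scattered peaks.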

4 Experimental Results

To evaluate the effectiveness of our method, we used a commonly used database of offline handwritten Chinese documents and compared our method with several state-of-the-art text line segmentation methods.

4.1 Database Preparation

HIT-MW [23] is a commonly used database of offline handwritten Chinese document images collected by Harbin Institute of Technology. The database consists of 853 document images written by more than 780 writers. It contains 8,677 text lines, and each line has 21.51 characters on average. Each handwritten document image is scanned at a resolution of 300 dots per inch, so a typical document image is approximately \( 1700 \times 1500 \) pixels, and each document image contains 530 connected components on average.

4.2 Performance of Text Line Segmentation

To evaluate the performance of our method, a matching score table is introduced to describe the degree of matching between the text line segmentation regions and the ground truth. When the matching score is equal to or above 95%, the text line segmentation region is deemed a one-to-one match to the ground truth region. Let \( M \) be the number of text line segmentation regions, \( N \) the number of ground truth regions, and \( o2o \) the number of one-to-one match pairs; then the detection rate (\( DR \)) and recognition accuracy (\( RA \)) are defined as follows:
$$ DR = \frac{o2o}{N} . $$
(5)
$$ RA = \frac{o2o}{M} . $$
(6)
Combining \( DR \) and \( RA \), \( F{-}M{\text{easure}} \) is the evaluation metric:
$$ F{-}M{\text{easure}} = \frac{2 \times DR \times RA}{DR + RA} . $$
(7)
The offline handwritten Chinese text line segmentation results are shown in Table 1. Our proposed method achieves good results compared with the other state-of-the-art methods: the detection rate is 98.36%, the recognition accuracy is 98.20%, and the \( F{-}M{\text{easure}} \) is 98.28%. However, there are still some errors in cases with complex layout structure due to the imperfect separation of characters or graphemes in the text line segmentation process.
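
Eqs. (5) to (7) can be checked with a few lines of code. In this sketch the matching score table is assumed to be a list of rows, one per segmented region, holding scores in [0, 1] against each ground truth region; the paper does not define how the score itself is computed, and the function names are hypothetical.

```python
def count_one_to_one(scores, threshold=0.95):
    """Count region pairs whose matching score reaches the threshold.

    Assumes each region exceeds the threshold for at most one partner.
    """
    return sum(1 for row in scores for s in row if s >= threshold)


def detection_metrics(o2o, n_truth, m_result):
    """Return (DR, RA, F-Measure) from Eqs. (5), (6) and (7)."""
    dr = o2o / n_truth
    ra = o2o / m_result
    f = 2 * dr * ra / (dr + ra) if dr + ra else 0.0
    return dr, ra, f


def f_measure(dr, ra):
    """Harmonic mean of DR and RA (Eq. 7); works for fractions or percentages."""
    return 2 * dr * ra / (dr + ra)
```

For instance, `f_measure(98.36, 98.20)` rounds to 98.28, which agrees with the F-Measure reported for the proposed method.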
Table 1. Comparative experimental results

Method                        DR (%)    RA (%)    F-Measure (%)
X-Y projection [8]            45.67     46.13     45.90
Stroke skew correction [9]    55.34     55.12     55.20
Piece-wise projection [4]     92.07     92.51     92.29
MST clustering [5]            95.35     94.96     95.15
CUBS [15]                     97.56     96.81     97.18
INMC [10]                     98.38     98.24     98.31
NUS [13]                      98.24     98.19     98.21
Proposed method               98.36     98.20     98.28

5 Conclusion and Future Work

In this paper, we have proposed a text line segmentation method based on writing pheromone diffusion and convergence for processing handwritten document images. The main components of the proposed approach are (i) the writing pheromone diffusion and convergence mode, (ii) the key locations and fragments of text lines, and (iii) text line segmentation. The experimental results in Table 1 show that our proposed method is competitive with the best state-of-the-art text line segmentation methods on offline handwritten document images. However, the segmentation of documents with complex layout structure still needs to be improved. In addition, we will consider applying the writing pheromone diffusion and convergence mode to word segmentation in future research.

Notes

Acknowledgement

This work is sponsored by the National Natural Science Fund of China (61976118, 61806098), Jiangsu Province Natural Science Foundation (BK20180142), Jiangsu Province Natural Science Foundation for Colleges and Universities (17KJB520020, 18KJB520029).

References

  1. Ryu, J., Koo, H.I., Cho, N.I.: Language-independent text-line extraction algorithm for handwritten documents. IEEE Signal Process. Lett. 21(9), 1115–1119 (2014)
  2. Renton, G., Soullard, Y., Chatelain, C., Adam, S., Kermorvant, C., Paquet, T.: Fully convolutional network with dilated convolutions for handwritten text line segmentation. Int. J. Doc. Anal. Recogn. (IJDAR) 21(3), 177–186 (2018). https://doi.org/10.1007/s10032-018-0304-3
  3. Sindhushree, G.S., Amarnath, R., Nagabhushan, P.: Entropy-based approach for enabling text line segmentation in handwritten documents. In: Nagabhushan, P., Guru, D.S., Shekar, B.H., Kumar, Y.H.S. (eds.) Data Analytics and Learning. LNNS, vol. 43, pp. 169–184. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2514-4_15
  4. Arivazhagan, M., Srinivasan, H., Srihari, S.: A statistical approach to line segmentation in handwritten documents. In: International Society for Optics and Photonics in Document Recognition and Retrieval, vol. 65000, pp. 1–11 (2007)
  5. Yin, F., Liu, C.-L.: Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recogn. 42(12), 3146–3157 (2009)
  6. Deshmukh, M.S., Patil, M.P., Kolhe, S.R.: A hybrid text line segmentation approach for the ancient handwritten unconstrained freestyle modi script documents. Imaging Sci. J. 66(7), 433–442 (2018)
  7. Pak, I., Teh, P.L.: Text segmentation techniques: a critical review. In: Zelinka, I., Vasant, P., Duy, V.H., Dao, T.T. (eds.) Innovative Computing, Optimization and Its Applications. SCI, vol. 741, pp. 167–181. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-66984-7_10
  8. Nagy, G., Seth, S., Viswanathan, M.: A prototype document image analysis system for technical journals. Computer 25(7), 10–22 (1992)
  9. Su, T.-H., Zhang, T.-W., Huang, H.-J., Zhou, Y.: Skew detection for Chinese handwriting by horizontal stroke histogram. In: Ninth International Conference on Document Analysis and Recognition, vol. 2, pp. 899–903. IEEE (2007)
  10. Koo, H.I., Cho, N.I.: Text-line extraction in handwritten Chinese documents based on an energy minimization framework. IEEE Trans. Image Process. 21(3), 1169–1175 (2012)
  11. Zhang, X., Tan, C.L.: Text line segmentation for handwritten documents using constrained seam carving. In: International Conference on Frontiers in Handwriting Recognition, pp. 98–103. IEEE (2014)
  12. Vo, Q.N., Kim, S.H., Yang, H.J., Lee, G.S.: Text line segmentation using a fully convolutional network in handwritten document images. IET Image Proc. 12(3), 438–446 (2017)
  13. Shi, Z., Setlur, S., Govindaraju, V.: Text extraction from gray scale historical document images using adaptive local connectivity map. In: Eighth International Conference on Document Analysis and Recognition, pp. 794–798. IEEE (2005)
  14. Nguyen, T.D., Lee, G.: Text line segmentation in handwritten document images using tensor voting. Trans. Fund. Electron. Commun. Comput. Sci. 94(11), 2434–2441 (2011)
  15. Shi, Z., Setlur, S., Govindaraju, V.: A steerable directional local profile technique for extraction of handwritten Arabic text lines. In: International Conference on Document Analysis and Recognition, pp. 176–180. IEEE (2009)
  16. Zezhong, X., Shin, B.-S., Klette, R.: Closed form line-segment extraction using the hough transform. Pattern Recogn. 48(12), 4012–4023 (2015)
  17. Boukharouba, A.: A new algorithm for skew correction and baseline detection based on the randomized hough transform. J. King Saud Univ. Comput. Inf. Sci. 29(1), 29–38 (2017)
  18. Zhang, L., Weidong, Yu.: Orientation image analysis of electrospun submicro-fibers based on hough transform and regionprops function. Text. Res. J. 87(18), 2263–2274 (2017)
  19. Guo, Y., Sun, Y., Bauer, P., Allebach, J.P., Bouman, C.A.: Text line detection based on cost optimized local text line direction estimation. In: The International Society for Optical Engineering, vol. 9395, pp. 1–7 (2015)
  20. Adiguzel, H., Sahin, E., Duygulu, P.: A hybrid for line segmentation in handwritten documents. In: International Conference on Frontiers in Handwriting Recognition, pp. 503–508 (2012)
  21. Ali, A.A.A., Suresha, M.: Efficient algorithms for text lines and words segmentation for recognition of Arabic handwritten script. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds.) Emerging Research in Computing, Information, Communication and Applications. AISC, vol. 882, pp. 387–401. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-5953-8_32
  22. Motulsky, H., Christopoulos, A.: Fitting Models to Biological Data Using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting. Oxford University Press, Oxford (2004)
  23. Su, T.: Chinese Handwriting Recognition: An Algorithmic Perspective. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-31812-2

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Key Laboratory of Intelligent Information Processing, Nanjing Xiaozhuang University, Nanjing, People’s Republic of China
