Abstract
Many classical Mongolian historical documents are preserved only as images, which makes it inconvenient to search them and mine their content. To facilitate word recognition during document digitization, this paper proposes a novel approach to segmenting historical words, in which the characters are intrinsically connected and exhibit considerable overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the polygonal approximation of the word's external contour, (2) baseline locating based on a logistic regression model, and (3) segment path generation and validation based on heuristic rules and a neural network. The SCPs assist in both baseline locating and segment path generation. Experiments on the historical Mongolian Kanjur demonstrate that our approach can effectively locate the words' baselines and segment the words into characters.
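Step (1) operates on a polygonal approximation of the word's external contour. The abstract does not state which approximation algorithm is used or how a contour point qualifies as "significant", so the following is only a minimal sketch under stated assumptions. It simplifies a closed contour with the Ramer-Douglas-Peucker algorithm, then, as a hypothetical stand-in for the paper's SCP criterion, keeps polygon vertices where the contour turns by at least `min_turn_deg` degrees. The function names and parameters (`rdp`, `turning_angle`, `candidate_points`, `epsilon`, `min_turn_deg`) are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (distance to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    # Find the interior point farthest from the chord a-b.
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right  # drop the shared split point once
    return [a, b]


def turning_angle(prev: Point, cur: Point, nxt: Point) -> float:
    """Absolute change of direction at cur, in degrees (0 = straight on)."""
    a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    d = (math.degrees(a2 - a1) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return abs(d)


def candidate_points(contour: List[Point], epsilon: float = 1.0,
                     min_turn_deg: float = 30.0) -> List[Point]:
    """HYPOTHETICAL SCP stand-in: sharp vertices of the simplified contour."""
    if len(contour) < 3:
        return list(contour)
    closed = list(contour) + [contour[0]]  # close the loop for RDP
    poly = rdp(closed, epsilon)[:-1]       # drop the repeated start point
    n = len(poly)
    return [poly[i] for i in range(n)
            if turning_angle(poly[i - 1], poly[i], poly[(i + 1) % n]) >= min_turn_deg]
```

For example, a 10 x 10 square traced with one point per unit step reduces to its four corners, each with a 90-degree turn. In practice the contour would come from a binarized word image (for instance via OpenCV's `findContours` with `RETR_EXTERNAL`), and OpenCV's `approxPolyDP` performs the same Douglas-Peucker simplification; the pure-Python version keeps the logic visible. Recursion depth grows with contour length, so very long contours may call for an iterative variant.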
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, X., Gao, G., Wang, W., Bao, F., Wei, H. (2014). Character Segmentation for Classical Mongolian Words in Historical Documents. In: Li, S., Liu, C., Wang, Y. (eds) Pattern Recognition. CCPR 2014. Communications in Computer and Information Science, vol 484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45643-9_49
DOI: https://doi.org/10.1007/978-3-662-45643-9_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45642-2
Online ISBN: 978-3-662-45643-9
eBook Packages: Computer Science, Computer Science (R0)