Character Segmentation for Classical Mongolian Words in Historical Documents

Su, Xiangdong; Gao, Guanglai; Wang, Weihua; Bao, Feilong; Wei, Hongxi

doi:10.1007/978-3-662-45643-9_49


Part of the book series: Communications in Computer and Information Science (CCIS, volume 484)

Included in the following conference series:

Chinese Conference on Pattern Recognition


Abstract

Many classical Mongolian historical documents are preserved in image form, which makes it inconvenient to search them and mine the desired content. To facilitate word recognition during document digitization, this paper proposes a novel approach to segmenting historical words whose characters are intrinsically connected and exhibit considerable overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the approximated polygon of the word's external contour, (2) baseline location based on a logistic regression model, and (3) segment path generation and validation based on heuristic rules and a neural network. The SCPs assist in both baseline location and segment path generation. Experiments on the historical Mongolian Kanjur demonstrate that the approach can effectively locate word baselines and segment words into characters.
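Step (1) operates on a polygonal approximation of the word's external contour. The abstract does not say which approximation method is used; a common choice is the Ramer–Douglas–Peucker algorithm, sketched below under that assumption. The function names (`rdp`, `approximate_closed_contour`) and the tolerance `epsilon` are hypothetical, and the SCP detection criteria applied to the resulting vertices are not reproduced here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the chord segment ab (endpoint distance if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker simplification of an open polyline (hypothetical helper)."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        # Every interior point lies within tolerance: keep only the endpoints.
        return [first, last]
    # Otherwise split at the farthest point and simplify both halves recursively.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def approximate_closed_contour(contour: List[Point], epsilon: float) -> List[Point]:
    """Approximate a closed contour by splitting it at the point farthest from contour[0]."""
    start = contour[0]
    split = max(range(len(contour)), key=lambda i: math.dist(start, contour[i]))
    half_a = contour[: split + 1]
    half_b = contour[split:] + [start]
    # Each half is an open polyline; join them without repeating shared endpoints.
    return rdp(half_a, epsilon)[:-1] + rdp(half_b, epsilon)[:-1]


if __name__ == "__main__":
    # A square traced with redundant midpoints on each edge.
    square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    print(approximate_closed_contour(square, 0.1))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

In a real pipeline the contour would come from a binarized word image; OpenCV's `cv2.findContours` and `cv2.approxPolyDP` (a Douglas–Peucker implementation) provide equivalent functionality. The tolerance controls the trade-off between keeping small stroke details and suppressing pixel-level noise along the contour.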



Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, China, 010021
Xiangdong Su, Guanglai Gao, Weihua Wang, Feilong Bao & Hongxi Wei


Editor information

Editors and Affiliations

College of Electrical and Information Engineering, Hunan University, 410082, Changsha, P.R. China
Shutao Li
Chinese Academy of Sciences, Beijing, China
Chenglin Liu
College of Electrical and Information Engineering, Hunan University, 410082, Changsha, P.R. China
Yaonan Wang


About this paper

Cite this paper

Su, X., Gao, G., Wang, W., Bao, F., Wei, H. (2014). Character Segmentation for Classical Mongolian Words in Historical Documents. In: Li, S., Liu, C., Wang, Y. (eds) Pattern Recognition. CCPR 2014. Communications in Computer and Information Science, vol 484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45643-9_49


DOI: https://doi.org/10.1007/978-3-662-45643-9_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45642-2
Online ISBN: 978-3-662-45643-9
eBook Packages: Computer Science, Computer Science (R0)
