Character Segmentation for Classical Mongolian Words in Historical Documents

  • Conference paper
Pattern Recognition (CCPR 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 484)

Abstract

Many classical Mongolian historical documents are preserved in image form, which makes it inconvenient to search and mine them for the desired content. To facilitate word recognition during document digitization, this paper proposes a novel approach to segmenting historical words, in which the characters are intrinsically connected and exhibit marked overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the approximated polygon of the word’s external contour, (2) baseline location based on a logistic regression model, and (3) segmentation path generation and validation based on heuristic rules and a neural network. The SCPs support both baseline location and segmentation path generation. Experiments on the historical Mongolian Kanjur demonstrate that the approach can effectively locate the words’ baselines and segment the words into characters.
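
Step (1) operates on a polygonal approximation of the word's external contour. The abstract does not state which approximation method is used, so the following is only a minimal sketch that assumes the widely used Ramer-Douglas-Peucker procedure. The function names, the tolerance `epsilon`, and the toy point list are illustrative assumptions, not taken from the paper, and the sketch handles an open polyline rather than a closed contour.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate_polyline(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline.

    Keeps the two endpoints and, recursively, every point that lies
    farther than `epsilon` from the chord joining the current endpoints.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        return [first, last]
    left = approximate_polyline(points[: index + 1], epsilon)
    right = approximate_polyline(points[index:], epsilon)
    return left[:-1] + right  # drop the split point duplicated in `left`


# Hypothetical toy data: a slightly noisy L-shaped stroke.
contour = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0), (4, 1), (4, 2), (4, 3)]
simplified = approximate_polyline(contour, epsilon=0.5)
print(simplified)  # [(0, 0), (4, 0), (4, 3)]
```

The vertices that survive the simplification are natural candidates for further analysis, since they mark where the outline changes direction. For a closed contour, one option is to split it at two distant points and simplify each half separately. OpenCV's `cv2.approxPolyDP` offers a comparable approximation with a `closed` flag.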

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, X., Gao, G., Wang, W., Bao, F., Wei, H. (2014). Character Segmentation for Classical Mongolian Words in Historical Documents. In: Li, S., Liu, C., Wang, Y. (eds) Pattern Recognition. CCPR 2014. Communications in Computer and Information Science, vol 484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45643-9_49

  • DOI: https://doi.org/10.1007/978-3-662-45643-9_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45642-2

  • Online ISBN: 978-3-662-45643-9

  • eBook Packages: Computer Science, Computer Science (R0)
