
Tree Based Shape Similarity Measurement for Chinese Characters

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9403)

Abstract

Many Chinese characters are similar in shape, and this similarity frequently causes writing errors. Shape similarity measurement is an important part of automatic spelling correction, yet it remains a challenging problem. To address it, we propose a component-tree-based method built on the hypothesis “characters are similar if their construction and components are both similar”. First, we recursively decompose each character into a tree whose root node is the character itself and whose leaf nodes are atomic parts, namely strokes. Next, we align each pair of trees using their minimal super-tree and compute their similarity bottom-up using a weighted edit distance. Finally, we use cognitive prominence to adjust the similarity scores. In text-proofreading experiments, our method achieved 97% precision and 95.6% recall, which indicates that it can be applied in practical systems.
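The decompose, align, and score pipeline described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the `layout` labels (`"LR"`, `"strokes"`), the per-node `prominence` weights, the `layout_weight` of 0.3, and the pinyin stroke names (`heng`, `shu`, `pie`, `na`, ...) are all hypothetical. In particular, the sketch does not construct a minimal super-tree; it swaps in a simpler recursive alignment in which the children of two nodes are aligned by a weighted sequence edit distance whose substitution cost is the recursive dissimilarity of the two sub-trees. Cognitive prominence is modelled only as a per-node weight on edit costs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One node of a (hypothetical) component tree.

    label:      the character, component, or stroke name
    layout:     how the children are arranged, e.g. "LR" (left-right);
                leaf strokes have no layout
    children:   sub-components or strokes, in writing order
    prominence: hypothetical cognitive-prominence weight used as edit cost
    """
    label: str
    layout: str | None = None
    children: tuple[Node, ...] = ()
    prominence: float = 1.0

    def is_leaf(self) -> bool:
        return not self.children


def similarity(a: Node, b: Node, layout_weight: float = 0.3) -> float:
    """Bottom-up similarity in [0, 1]; layout_weight is an assumed constant."""
    if a.label == b.label:
        return 1.0  # identical character, component, or stroke
    if a.is_leaf() or b.is_leaf():
        return 0.0  # two different strokes, or a stroke vs. a component
    layout_sim = 1.0 if a.layout == b.layout else 0.0
    component_sim = 1.0 - normalized_edit_distance(a.children, b.children)
    return layout_weight * layout_sim + (1.0 - layout_weight) * component_sim


def normalized_edit_distance(xs: tuple[Node, ...], ys: tuple[Node, ...]) -> float:
    """Weighted edit distance between two child sequences, scaled to [0, 1].

    Deleting or inserting a node costs its prominence; substituting x by y
    costs (x.prominence + y.prominence) * (1 - similarity(x, y)), so a
    substitution is never dearer than a deletion plus an insertion.
    """
    n, m = len(xs), len(ys)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]  # d[i][j]: xs[:i] -> ys[:j]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + xs[i - 1].prominence
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ys[j - 1].prominence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = xs[i - 1], ys[j - 1]
            substitute = (x.prominence + y.prominence) * (1.0 - similarity(x, y))
            d[i][j] = min(d[i - 1][j] + x.prominence,   # delete x
                          d[i][j - 1] + y.prominence,   # insert y
                          d[i - 1][j - 1] + substitute)  # substitute x -> y
    total = d[n][0] + d[0][m]  # cost of deleting all of xs and inserting all of ys
    return d[n][m] / total if total else 0.0


# Hypothetical stroke-level decompositions for three characters sharing 亻.
heng, shu, pie, na = (Node(s) for s in ("heng", "shu", "pie", "na"))
ren = Node("亻", "strokes", (pie, shu))
mu = Node("木", "strokes", (heng, shu, pie, na))
ben = Node("本", "strokes", (heng, shu, pie, na, heng))
ye = Node("也", "strokes", (Node("hengzhegou"), shu, Node("shuwangou")))
xiu = Node("休", "LR", (ren, mu))
ti = Node("体", "LR", (ren, ben))
ta = Node("他", "LR", (ren, ye))

print(round(similarity(xiu, ti), 3))  # 0.973
print(round(similarity(xiu, ta), 3))  # 0.825
```

With these toy decompositions, 休 and 体 (whose right-hand components differ by a single extra stroke) score higher than 休 and 他 (whose right-hand components share only one stroke). Normalizing by the total weight of both child sequences keeps every score in [0, 1], because the all-delete-then-insert path bounds the edit distance from above.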

Author information

Correspondence to Shi Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Cao, Y., Wang, S., Cao, C. (2015). Tree Based Shape Similarity Measurement for Chinese Characters. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_26

  • DOI: https://doi.org/10.1007/978-3-319-25159-2_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25158-5

  • Online ISBN: 978-3-319-25159-2

  • eBook Packages: Computer Science, Computer Science (R0)
