Tree Based Shape Similarity Measurement for Chinese Characters

Cao, Yanan; Wang, Shi; Cao, Cungen

doi:10.1007/978-3-319-25159-2_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9403)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Many Chinese characters are similar in shape, and this similarity often induces writing errors. Shape similarity measurement is an important part of automatic spelling correction, yet it remains a challenging problem. To address it, we propose a component-tree based method built on the hypothesis that “characters are similar if their construction and components are both similar”. First, we recursively decompose each character into a tree whose root node is the character and whose leaf nodes are atomic parts, called strokes. Then, we align any pair of trees using their minimal super-tree and calculate their similarity bottom-up based on weighted edit distance. Finally, cognitive prominence is used to adjust the similarity scores. In text proofreading experiments, our method achieved 97% precision and 95.6% recall, which suggests it can be applied in practical systems.
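
To make the bottom-up idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified scheme: identical labels score 1.0, differing leaves score 0.0, and two internal nodes are compared by a weighted edit distance over their child sequences, where the substitution cost is one minus the children's similarity. It omits the minimal super-tree alignment and the cognitive-prominence adjustment described in the abstract. The `Node` class, the function names, and the stroke decompositions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    label: str                                  # character, component, or stroke name
    children: List["Node"] = field(default_factory=list)


def edit_distance(a: List[Node], b: List[Node],
                  sub_cost: Callable[[Node, Node], float],
                  indel: float = 1.0) -> float:
    """Weighted edit distance between two child sequences."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                             # delete a[i-1]
                d[i][j - 1] + indel,                             # insert b[j-1]
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitute
            )
    return d[m][n]


def similarity(x: Node, y: Node) -> float:
    """Similarity in [0, 1], computed recursively from the leaves upward."""
    if x.label == y.label:
        return 1.0
    if not x.children or not y.children:
        return 0.0  # differing strokes, or a stroke compared with a component
    dist = edit_distance(x.children, y.children,
                         lambda p, q: 1.0 - similarity(p, q))
    return 1.0 - dist / max(len(x.children), len(y.children))


def strokes(*names: str) -> List[Node]:
    return [Node(name) for name in names]


# Illustrative decompositions only; not taken from any component standard.
def ri() -> Node:
    return Node("日", strokes("shu", "hengzhe", "heng", "heng"))


def yue() -> Node:
    return Node("月", strokes("pie", "hengzhegou", "heng", "heng"))


ming = Node("明", [ri(), yue()])
peng = Node("朋", [yue(), yue()])
da = Node("大", strokes("heng", "pie", "na"))
tai = Node("太", strokes("heng", "pie", "na", "dian"))

print(similarity(ming, peng))  # 0.75
print(similarity(da, tai))     # 0.75
```

In this sketch 明 and 朋 share one component exactly and differ in the other by two of four strokes, so the component-level score of 0.5 feeds into the character-level score. Trees of different depth are compared poorly by this simplification, which is the kind of mismatch a super-tree alignment is meant to handle.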

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Yanan Cao
Institute of Computer Science, Chinese Academy of Sciences, Beijing, China
Shi Wang & Cungen Cao

Corresponding author

Correspondence to Shi Wang.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Songmao Zhang
Ludwig-Maximilians-Universität München, Munich, Germany
Martin Wirsing
Southwest University, Chongqing, China
Zili Zhang

About this paper

Cite this paper

Cao, Y., Wang, S., Cao, C. (2015). Tree Based Shape Similarity Measurement for Chinese Characters. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_26

DOI: https://doi.org/10.1007/978-3-319-25159-2_26
Published: 03 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer Science, Computer Science (R0)
