
Method Combining Rule-Based and Corpus-Based Approaches for Oracle-Bone Inscription Information Processing

  • Huiying Cai
  • Minghu Jiang
  • Beixing Deng
  • Lin Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4114)

Abstract

Word segmentation and part-of-speech (POS) tagging are the basis of computer processing of oracle-bone inscriptions. It is hard to build a large tagged oracle-bone inscription corpus with grammatical information, which is an obstacle to using statistical methods. In this paper, we propose to solve both problems with methods that combine corpus-based and rule-based approaches. The accuracies of the segmenter and the tagger are 98.33% and 96.75%, respectively. Our experimental results show that the combined method is quite practical for processing oracle-bone inscriptions, especially when the corpus is too sparse. Finally, we briefly discuss how to use the tagged results to complete syntactic analysis with a rule-based method.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Huiying Cai (1)
  • Minghu Jiang (1)
  • Beixing Deng (2)
  • Lin Wang (3)
  1. Lab of Computational Linguistics, School of Humanities and Social Sciences, Tsinghua University, Beijing, 100084, China
  2. Dept. of Electronic Eng., Tsinghua University, Beijing, 100084, China
  3. School of Electronic Eng., Beijing Univ. of Post and Telecom, Beijing, 100876, China
