Method Combining Rule-Based and Corpus-Based Approaches for Oracle-Bone Inscription Information Processing
Word segmentation and part-of-speech (POS) tagging are the basis of computer processing of oracle-bone inscriptions. However, it is hard to build a large oracle-bone inscription corpus tagged with grammatical information, and this is an obstacle to applying statistical methods. In this paper, we propose to address both tasks with a method that combines corpus-based and rule-based approaches. The segmenter and the tagger achieve accuracies of 98.33% and 96.75%, respectively. Our experimental results show that the combined method is practical for processing oracle-bone inscriptions, especially when the corpus is sparse. Finally, we briefly discuss how the tagged output can be used to perform syntactic analysis with a rule-based method.
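The abstract does not specify the algorithms, so the following is only a hypothetical sketch of one way corpus-based and rule-based evidence might be combined when the corpus is sparse. It is not the authors' method. It assumes a dynamic-programming segmenter that maximizes the product of corpus-derived word probabilities, with a hand-written rule lexicon and a "any single character may be a word" rule as fallbacks. The tagger then picks the most frequent corpus tag and falls back to rules. All counts, lexicon entries, the tag set (`T`, `V`, `N`, `ADV`, `NUM`), and the names `CORPUS`, `RULE_LEXICON`, `segment`, `tag`, and `analyze` are invented for illustration.

```python
import math

# Hypothetical toy corpus counts: word -> {POS tag: count}. Not data from the paper.
CORPUS = {
    "癸卯": {"T": 3},
    "卜": {"V": 5},
    "貞": {"V": 5},
    "今日": {"T": 4},
    "其": {"ADV": 4},
    "雨": {"V": 3, "N": 1},
}

# Hypothetical hand-written (rule-based) lexicon for words absent from the corpus.
RULE_LEXICON = {"翌日": "T", "王": "N"}
NUMERALS = set("一二三四五六七八九十百千")
DEFAULT_TAG = "N"
MAX_WORD_LEN = 4
TOTAL = sum(sum(tags.values()) for tags in CORPUS.values())


def word_logprob(word):
    """Corpus-based score if the word was seen; rule-based fallback score otherwise."""
    count = sum(CORPUS.get(word, {}).values())
    if count:
        return math.log(count / TOTAL)
    if word in RULE_LEXICON or len(word) == 1:
        return math.log(0.5 / TOTAL)  # small smoothed score for rule-licensed words
    return None  # unseen multi-character strings are not treated as words


def segment(text):
    """Maximum-probability segmentation by dynamic programming."""
    # best[i] = (log score, start index of last word) for the best split of text[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            lp = word_logprob(text[start:end])
            if lp is not None and best[start][0] + lp > best[end][0]:
                best[end] = (best[start][0] + lp, start)
    words, end = [], len(text)
    while end > 0:  # follow back-pointers from the end of the string
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


def tag(word):
    """Most frequent corpus tag; otherwise apply hand-written rules."""
    tags = CORPUS.get(word)
    if tags:
        return max(tags, key=tags.get)
    if word in RULE_LEXICON:
        return RULE_LEXICON[word]
    if all(ch in NUMERALS for ch in word):
        return "NUM"
    return DEFAULT_TAG


def analyze(text):
    return [(w, tag(w)) for w in segment(text)]


print(analyze("癸卯卜貞今日其雨"))
```

In this sketch the corpus decides whenever it has evidence, and the rules only license or label words the corpus has never seen, which is one plausible reading of "combining" the two approaches under data sparsity.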