Boundary Recognition of Light-Pause Marks via Grammar Testing Method

  • Yiwen Mo
  • Bo Chen
  • Pei Lei
Computer Science


Boundary recognition is an important research topic in natural language processing, and it provides a basis for applications such as Chinese word segmentation, chunk analysis, and named entity recognition. Motivated by the ambiguity in recognizing the boundaries of Chinese punctuation marks, this paper proposes grammar testing methods for recognizing the boundaries of slight-pause marks (、, the Chinese enumeration comma) and then calculates the annotation consistency achieved with these methods. The statistical results show that grammar testing methods can greatly improve the annotation consistency of slight-pause mark boundary recognition: the consistency in the second round of annotation is 0.0303 higher than in the first. This will help guarantee the consistency of large-scale corpus annotation and improve its quality.
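The abstract reports annotation consistency with Kappa statistics but does not spell out the formula. As a minimal sketch, assuming the measure is Cohen's kappa between two annotators who each assign one boundary label to every slight-pause mark, the computation looks like this. The function name `cohens_kappa` and the label values (`"word"`, `"phrase"`, `"clause"`) are hypothetical illustrations, not the paper's annotation scheme or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label distribution.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: for each label, product of the two marginal proportions.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item: kappa is undefined.
        raise ValueError("chance agreement equals 1; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: the grammatical unit each slight-pause mark separates.
annotator_1 = ["word", "word", "phrase", "clause", "phrase", "word"]
annotator_2 = ["word", "phrase", "phrase", "clause", "phrase", "word"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.4f}")  # → kappa = 0.7391
```

Computing kappa separately for each annotation round and subtracting the two values gives a difference of the kind the abstract reports (0.0303); the toy labels above are invented and do not reproduce the paper's figures.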

Key words

slight-pause marks; boundary; grammar testing; corpus annotation; Kappa statistics

CLC number

TP301; H085



Copyright information

© Wuhan University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. College of Chinese Language and Literature, Wuhan University, Wuhan, Hubei, China
  2. School of Computer, Wuhan University, Wuhan, Hubei, China
  3. Department of Language & Literature, Hubei University of Arts and Science, Xiangyang, Hubei, China
