An Automatic Approach to Treebank Error Detection Using a Dependency Parser

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

Treebanks play an important role in the development of natural language processing tools. Among other things, they provide crucial language-specific patterns that machine learning techniques exploit. Quality control is therefore extremely important in any treebanking project. Manual validation of the treebank is generally a necessary step for ensuring good annotation quality, but it demands considerable human time and effort. In this paper, we present an automatic approach that helps detect potential errors in a treebank by using a dependency parser. With this tool, validators can validate a treebank in less time and with less effort.
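The abstract does not describe how the parser is applied, so the following is only an illustrative sketch of one simple way a dependency parser could surface candidate errors: flag every token where the parser's output disagrees with the manual annotation, and hand only those tokens to a validator. The `Token` type, the `flag_disagreements` function, the example sentence, and the labels (`nsubj`, `obj`, `root`) are all hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """One annotated token (hypothetical structure, not from the paper)."""
    form: str     # surface word
    head: int     # 1-based position of the head token, 0 for the root
    deprel: str   # dependency label


def flag_disagreements(
    gold: List[Token], parsed: List[Token]
) -> List[Tuple[int, Token, Token]]:
    """Return (position, gold_token, parsed_token) wherever the manual
    annotation and the parser output differ in head or label.

    A disagreement is only a *potential* error: the parser may be the
    one that is wrong, so a human validator still makes the final call.
    """
    if len(gold) != len(parsed):
        raise ValueError("sentences must have the same number of tokens")
    flagged = []
    for position, (g, p) in enumerate(zip(gold, parsed), start=1):
        if g.head != p.head or g.deprel != p.deprel:
            flagged.append((position, g, p))
    return flagged


# Hypothetical manual annotation with a suspicious label on "mangoes".
gold = [
    Token("Ram", 2, "nsubj"),
    Token("ate", 0, "root"),
    Token("mangoes", 2, "nsubj"),
]
# Hypothetical parser output for the same sentence.
parsed = [
    Token("Ram", 2, "nsubj"),
    Token("ate", 0, "root"),
    Token("mangoes", 2, "obj"),
]

flagged = flag_disagreements(gold, parsed)
for position, g, p in flagged:
    print(f"token {position} ({g.form}): annotated {g.deprel}, parser says {p.deprel}")
```

Under this assumption, the validator's workload shrinks from every token in the treebank to only the flagged ones, which is the kind of saving in time and effort the abstract claims.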

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agrawal, B., Agarwal, R., Husain, S., Sharma, D.M. (2013). An Automatic Approach to Treebank Error Detection Using a Dependency Parser. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_24

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
