Error Annotation of the Arabic Learner Corpus

A New Error Tagset

  • Conference paper
Language Processing and Knowledge in the Web

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), to be used for annotating the Arabic Learner Corpora (ALC). The new tagset includes six broad classes, subdivided into 37 more specific error types or subcategories. It is easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ARIDA, created by Abuhakema et al. [1], and on a number of other error-analysis studies. It was used to annotate texts of the Arabic Learner Corpus [2]. The paper presents the tagset's broad classes and their types or subcategories, together with an example of annotation. The understandability of AALETA was measured against that of ARIDA, and the preliminary results showed that AALETA achieved a slightly higher score. Annotators also reported that they preferred using AALETA over ARIDA.

References

  1. Abuhakema, G., Feldman, A., Fitzpatrick, E.: ARIDA: An Arabic Interlanguage Database and Its Applications: A Pilot Study. Journal of the National Council of Less Commonly Taught Languages (JNCOLCTL) 7, 161–184 (2009)

  2. Alfaifi, A., Atwell, E.: المدونات اللغوية لمتعلمي اللغة العربية: نظامٌ لتصنيف وترميز الأخطاء اللغوية (in Arabic) "Arabic Learner Corpora (ALC): A Taxonomy of Coding Errors". In: 8th International Computing Conference in Arabic (ICCA 2012), Cairo, Egypt, December 26–28 (2012)

  3. Granger, S.: The International Corpus of Learner English: A New Resource for Foreign Language Learning and Teaching and Second Language Acquisition Research. TESOL Quarterly 37(3), 538–546 (2003)

  4. Nesselhauf, N.: Learner Corpora and Their Potential in Language Teaching. In: Sinclair, J. (ed.) How to Use Corpora in Language Teaching, pp. 125–152. Benjamins, Amsterdam (2004)

  5. Buttery, P., Caines, A.: Normalising Frequency Counts to Account for ‘opportunity of use’ in Learner Corpora. In: Tono, Y., Kawaguchi, Y., Minegishi, M. (eds.) Developmental and Crosslinguistic Perspectives in Learner Corpus Research, pp. 187–204. John Benjamins, Amsterdam (2012)

  6. Meunier, F., et al.: The LONGDALE (Longitudinal Database of Learner English), [cited 2012, September 14] (2010), http://www.uclouvain.be/en-cecl-longdale.html

  7. Diez-Bedmar, M.B.: Written Learner Corpora by Spanish Students of English: an overview. In: Gómez, P.C., Pére, A.S. (eds.) A Survey on Corpus-based Research, Proceedings of the AELINCO Conference, pp. 920–933. Asociación Española de Lingüística del Corpus, Murcia (2009)

  8. Hammarberg, B.: Introduction to the ASU Corpus, a Longitudinal Oral and Written Text Corpus of Adult Learners’ Swedish with a Corresponding Part from Native Swedes. Stockholm University, Department of Linguistics (2010)

  9. Dagneaux, E., et al.: Error tagging manual (1996)

  10. Granger, S.: Error-tagged Learner Corpora and CALL: A Promising Synergy. CALICO Journal 20(3), 465–480 (2003)

  11. Nicholls, D.: The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT. In: Corpus Linguistics 2003 Conference (CL 2003), Lancaster, UK (2003)

  12. Izumi, E., Uchimoto, K., Isahara, H.: Error annotation for corpus of Japanese learner English. In: Sixth International Workshop on Linguistically Interpreted Corpora (LINC 2005), Jeju Island, Korea, October 15 (2005)

  13. Alosaili, A.I.: الأخطاء الشائعة في الكلام لدى طلاب اللغة العربية الناطقين بلغات أخرى: دراسة وصفية تحليلية (in Arabic) "Common Errors in Speech Production of Non-Native Arabic Learners". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1985)

  14. Alateeq, Z.M.: تحليل الأخطاء الدلالية لدى دارسي اللغة العربية من غير الناطقين بها في مادة التعبير الكتابي (in Arabic) "Semantic Errors Analysis of Non-Native Arabic Learners in Writing". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1992)

  15. Alhamad, M.M.: تحليل أخطاء التعبير الكتابي لدى المستوى المتقدم من دارسي العربية غير الناطقين بها في جامعة الملك سعود (in Arabic) "Writing Errors Analysis of Advanced-Level Arabic Learners at King Saud University". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1994)

  16. Alaqeeli, A.S.: تحليل الأخطاء في بعض أنماط الجملة الفعلية للغة العربية في الأداء الكتابي لدى دارسي المستوى المتقدم (in Arabic) "Error Analysis in Some Verbal Sentence Patterns of Arabic in Writing Production of Advanced-Level Learners". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1995)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alfaifi, A., Atwell, E., Abuhakema, G. (2013). Error Annotation of the Arabic Learner Corpus. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_2

  • DOI: https://doi.org/10.1007/978-3-642-40722-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40721-5

  • Online ISBN: 978-3-642-40722-2

  • eBook Packages: Computer Science, Computer Science (R0)
