Error Annotation of the Arabic Learner Corpus

Alfaifi, Abdullah; Atwell, Eric; Abuhakema, Ghazi

doi:10.1007/978-3-642-40722-2_2


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)


Abstract

This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), for annotating Arabic Learner Corpora (ALC). The tagset comprises six broad classes, subdivided into 37 more specific error types or subcategories, and is easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ARIDA, created by Abuhakema et al. [1], and on a number of other error-analysis studies. It was used to annotate texts of the Arabic Learner Corpus [2]. The paper presents the tagset's broad classes and their types or subcategories, together with an example of annotation. The understandability of AALETA was measured against that of ARIDA, and preliminary results showed that AALETA achieved a slightly higher score. Annotators also reported that they preferred using AALETA over ARIDA.
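As a rough illustration of what "two-level" means here, the following minimal Python sketch represents a tagset as broad classes that each own a list of specific error types, and rejects any annotation whose class/type pair is not in the tagset. It is only a sketch under stated assumptions: the names `ErrorTag`, `build_tagset`, `summarize`, and `annotate` are hypothetical, and the class and type labels are placeholders, not the actual AALETA classes or subcategories, which this abstract does not list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ErrorTag:
    """A two-level tag: one broad class plus one specific error type."""
    broad_class: str
    error_type: str


def build_tagset(classes: Dict[str, List[str]]) -> Set[ErrorTag]:
    """Flatten {broad_class: [error_type, ...]} into a set of two-level tags."""
    return {ErrorTag(c, t) for c, types in classes.items() for t in types}


def summarize(classes: Dict[str, List[str]]) -> Tuple[int, int]:
    """Return (number of broad classes, total number of error types)."""
    return len(classes), sum(len(types) for types in classes.values())


# Placeholder labels only (assumption): NOT the real AALETA classes or types.
TOY_CLASSES: Dict[str, List[str]] = {
    "CLASS_A": ["TYPE_A1", "TYPE_A2"],
    "CLASS_B": ["TYPE_B1"],
}
TOY_TAGSET = build_tagset(TOY_CLASSES)


def annotate(token: str, broad_class: str, error_type: str) -> Tuple[str, ErrorTag]:
    """Attach a tag to a token, rejecting pairs that are not in the tagset."""
    tag = ErrorTag(broad_class, error_type)
    if tag not in TOY_TAGSET:
        raise ValueError(f"unknown tag: {broad_class}/{error_type}")
    return token, tag
```

With a full tagset loaded in place of the toy one, `summarize` would be expected to report six classes and 37 types, matching the counts given in the abstract.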


References

Abuhakema, G., Feldman, A., Fitzpatrick, E.: ARIDA: An Arabic Interlanguage Database and Its Applications: A Pilot Study. Journal of the National Council of Less Commonly Taught Languages (JNCOLCTL) 7, 161–184 (2009)
Alfaifi, A., Atwell, E.: المدونات اللغوية لمتعلمي اللغة العربية: نظامٌ لتصنيف وترميز الأخطاء اللغوية (in Arabic) "Arabic Learner Corpora (ALC): A Taxonomy of Coding Errors". In: 8th International Computing Conference in Arabic (ICCA 2012), Cairo, Egypt, 26-28 December (2012)
Granger, S.: The International Corpus of Learner English: A New Resource for Foreign Language Learning and Teaching and Second Language Acquisition Research. TESOL Quarterly 37(3), 538–546 (2003)
Nesselhauf, N.: Learner Corpora and Their Potential in Language Teaching. In: Sinclair, J. (ed.) How to Use Corpora in Language Teaching, pp. 125–152. Benjamins, Amsterdam (2004)
Buttery, P., Caines, A.: Normalising Frequency Counts to Account for ‘opportunity of use’ in Learner Corpora. In: Tono, Y., Kawaguchi, Y., Minegishi, M. (eds.) Developmental and Crosslinguistic Perspectives in Learner Corpus Research, pp. 187–204. John Benjamins, Amsterdam (2012)
Meunier, F., et al.: The LONGDALE (Longitudinal Database of Learner English), [cited 2012, September 14] (2010), http://www.uclouvain.be/en-cecl-longdale.html
Diez-Bedmar, M.B.: Written Learner Corpora by Spanish Students of English: an overview. In: Gómez, P.C., Pére, A.S. (eds.) A Survey on Corpus-based Research, Proceedings of the AELINCO Conference, pp. 920–933. Asociación Española de Lingüística del Corpus, Murcia (2009)
Hammarberg, B.: Introduction to the ASU Corpus, a Longitudinal Oral and Written Text Corpus of Adult Learners’ Swedish with a Corresponding Part from Native Swedes. Stockholm University, Department of Linguistics (2010)
Dagneaux, E., et al.: Error tagging manual (1996)
Granger, S.: Error-tagged Learner Corpora and CALL: A Promising Synergy. CALICO Journal 20(3), 465–480 (2003)
Nicholls, D.: The Cambridge Learner Corpus - error coding and analysis for lexicography and ELT. In: Corpus Linguistics 2003 Conference (CL 2003), Lancaster, UK (2003)
Izumi, E., Uchimoto, K., Isahara, H.: Error annotation for corpus of Japanese learner English. In: Sixth International Workshop on Linguistically Interpreted Corpora (LINC 2005), Jeju Island, Korea, October 15 (2005)
Alosaili, A.I.: الأخطاء الشائعة في الكلام لدى طلاب اللغة العربية الناطقين بلغات أخرى: دراسة وصفية تحليلية (in Arabic) "Common Errors in Speech Production of Non-Native Arabic Learners". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1985)
Alateeq, Z.M.: تحليل الأخطاء الدلالية لدى دارسي اللغة العربية من غير الناطقين بها في مادة التعبير الكتابي (in Arabic) "Semantic Errors Analysis of Non-Native Arabic Learners in Writing". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1992)
Alhamad, M.M.: تحليل أخطاء التعبير الكتابي لدى المستوى المتقدم من دارسي العربية غير الناطقين بها في جامعة الملك سعود (in Arabic) "Writing Errors Analysis of Advanced-Level Arabic Learners at King Saud University". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1994)
Alaqeeli, A.S.: تحليل الأخطاء في بعض أنماط الجملة الفعلية للغة العربية في الأداء الكتابي لدى دارسي المستوى المتقدم (in Arabic) "Error Analysis in Some Verbal Sentence Patterns of Arabic in Writing Production of Advanced-Level Learners". Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1995)

Author information

Authors and Affiliations

University of Leeds, Leeds, UK
Abdullah Alfaifi & Eric Atwell
College of Charleston, SC, USA
Ghazi Abuhakema


Editor information

Editors and Affiliations

Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Iryna Gurevych
Technical University Darmstadt, 64289 Darmstadt, Germany
Chris Biemann
Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Torsten Zesch


Cite this paper

Alfaifi, A., Atwell, E., Abuhakema, G. (2013). Error Annotation of the Arabic Learner Corpus. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_2


DOI: https://doi.org/10.1007/978-3-642-40722-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)
