
A Property Grammar-Based Method to Enrich the Arabic Treebank ATB

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015)

Abstract

We present a method based on the Property Grammar formalism to enrich the Arabic treebank ATB with syntactic constraints (so-called properties). Property Grammars are a constraint-based formalism that specifies constraints directly on information categories, which facilitates the enrichment process. The process has three phases: formalization of the problem, induction of a Property Grammar from the ATB, and regeneration of the treebank with a new property-based syntactic representation. Enriching the ATB makes it more useful for NLP applications such as ambiguity resolution. It also enables the acquisition of new linguistic resources and eases probabilistic parsing. The enrichment process is fully automatic and independent of both the language and the formalism of the source corpus, which motivates its reuse. Our experiments gave encouraging results and yielded a variety of properties of different types.
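To make the induction phase more concrete, the following is a minimal sketch of how properties could be derived from the daughter sequences observed under one category in a treebank. It is not the authors' implementation: the toy `NP_EXPANSIONS` data, the function name `induce_properties`, and the restriction to three property types (constituency, uniqueness and linearity, as they are commonly defined in the Property Grammar literature) are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy data: each entry is the ordered list of daughter
# categories seen under an NP node in some treebank tree.
NP_EXPANSIONS = [
    ["DET", "NOUN"],
    ["DET", "NOUN", "ADJ"],
    ["NOUN", "ADJ"],
    ["NOUN", "ADJ", "ADJ"],
]


def induce_properties(expansions):
    """Induce three simple property types from observed daughter sequences."""
    # Constituency: every category observed as a daughter.
    constituents = sorted({c for seq in expansions for c in seq})

    # Uniqueness: categories that never occur twice in one expansion.
    unique = sorted(
        c for c in constituents
        if all(seq.count(c) <= 1 for seq in expansions)
    )

    # Linearity: (x, y) holds if x and y co-occur at least once and, in
    # every such expansion, all occurrences of x precede all those of y.
    linearity = []
    for a, b in combinations(constituents, 2):
        for x, y in ((a, b), (b, a)):
            together = [s for s in expansions if x in s and y in s]
            if together and all(
                max(i for i, c in enumerate(s) if c == x)
                < min(i for i, c in enumerate(s) if c == y)
                for s in together
            ):
                linearity.append((x, y))

    return {
        "constituency": constituents,
        "uniqueness": unique,
        "linearity": sorted(linearity),
    }


props = induce_properties(NP_EXPANSIONS)
```

In this sketch, the regeneration phase would correspond to attaching the induced properties to each node of the relevant category. How the authors actually represent and attach properties is not specified in the abstract.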

Notes

  1. Arabic Transliteration Table on Tim Buckwalter's site: www.qamus.org/transliteration.htm.

Author information

Corresponding author

Correspondence to Raja Bensalem Bahloul.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Bahloul, R.B., Haddar, K., Blache, P. (2016). A Property Grammar-Based Method to Enrich the Arabic Treebank ATB. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2015. Communications in Computer and Information Science, vol 631. Springer, Cham. https://doi.org/10.1007/978-3-319-52758-1_17

  • DOI: https://doi.org/10.1007/978-3-319-52758-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52757-4

  • Online ISBN: 978-3-319-52758-1

  • eBook Packages: Computer Science, Computer Science (R0)
