Supertagging for a Statistical HPSG Parser for Spanish

  • Conference paper
Statistical Language and Speech Processing (SLSP 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Abstract

We created a supertagger for Spanish that disambiguates the HPSG lexical frames of the verbs in a sentence. The supertagger uses a conditional random field (CRF) model and achieves an accuracy of 83.58% for the verb classes on the test set. The tagset contains 92 verb classes, extracted from a Spanish HPSG-compatible annotated corpus that was created by automatically transforming the AnCora Spanish corpus. The verb tags encode the argument structure and the syntactic categories of the arguments, so they can be easily translated into HPSG lexical entries.
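
As a rough illustration of how an accuracy figure restricted to verb classes might be computed, the sketch below scores only the tokens whose gold tag is a verb supertag. The tag names, the `v_` prefix convention, and the function `verb_accuracy` are hypothetical and are not taken from the paper; this is a minimal sketch under those assumptions, not the authors' evaluation code.

```python
from typing import List


def verb_accuracy(gold: List[List[str]],
                  pred: List[List[str]],
                  verb_prefix: str = "v_") -> float:
    """Accuracy over tokens whose gold tag is a verb supertag.

    Assumption (hypothetical): verb supertags start with `verb_prefix`,
    while all other tokens carry a plain part-of-speech tag.
    """
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for g, p in zip(gold_sent, pred_sent):
            if g.startswith(verb_prefix):
                total += 1
                if g == p:
                    correct += 1
    # Return 0.0 when there are no verbs, to avoid division by zero.
    return correct / total if total else 0.0


# Toy data with invented tags: two sentences, one verb each.
gold = [["d", "n", "v_arg0n_arg1n", "d", "n"], ["n", "v_arg0n"]]
pred = [["d", "n", "v_arg0n_arg1n", "d", "n"], ["n", "v_arg0n_arg1n"]]

accuracy = verb_accuracy(gold, pred)
print(f"{accuracy:.2%}")
```

Non-verb tokens are ignored entirely here, so errors on determiners or nouns would not affect the score; whether the paper's figure is computed the same way is not stated in the abstract.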

Author information

Corresponding author

Correspondence to Luis Chiruzzo.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chiruzzo, L., Wonsever, D. (2015). Supertagging for a Statistical HPSG Parser for Spanish. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_3

  • DOI: https://doi.org/10.1007/978-3-319-25789-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)
