Using Syntactic and Semantic Features for Classifying Modal Values in the Portuguese Language

Sequeira, João; Gonçalves, Teresa; Quaresma, Paulo; Mendes, Amália; Hendrickx, Iris

doi:10.1007/978-3-319-75487-1_28

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper presents a study in an area that remains poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for building automatic taggers that outperform the bag-of-words (bow) approach. Performance was measured using precision, recall and \(F_1\). Because the field is relatively unexplored, the study covers the creation of the corpus (composed of eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes, drawn from the trigger itself, from the trigger’s path in the parse tree, and from the trigger’s context, the system creates a tagger for each verb and achieves, for almost every verb, an improvement in \(F_1\) over the traditional bow approach.
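As a rough illustration of the evaluation setup described in the abstract, the sketch below builds a bag-of-words representation of a sentence and computes precision, recall and \(F_1\) for one modality value. It is not the authors' implementation: the function names, the whitespace tokenisation, the example sentence and the labels `epistemic` and `deontic` are assumptions made only for this example.

```python
from collections import Counter


def bag_of_words(sentence):
    """Return lowercased token counts (whitespace tokenisation is a simplification)."""
    return Counter(sentence.lower().split())


def precision_recall_f1(gold, predicted, label):
    """Compute precision, recall and F1 for a single label.

    gold and predicted are equal-length lists of labels; the label names
    used in the example below are hypothetical.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sentence containing the trigger verb "pode" twice.
features = bag_of_words("O João pode sair e pode voltar")

# Hypothetical gold and predicted modality values for four sentences.
gold = ["epistemic", "deontic", "epistemic", "deontic"]
predicted = ["epistemic", "epistemic", "epistemic", "deontic"]
p, r, f1 = precision_recall_f1(gold, predicted, "epistemic")
```

A per-verb comparison in the spirit of the abstract would compute these scores once with bag-of-words features and once with the richer attribute sets, then compare the resulting \(F_1\) values verb by verb.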

Notes

1. The MMAX2 software is platform-independent, written in Java, and can be freely downloaded from http://mmax2.sourceforge.net/.

Acknowledgements

This work was partially supported by national funds through FCT – Fundação para a Ciência e Tecnologia, under project Pest-OE/EEI/LA0021/2013 and project PEst-OE/LIN/UI0214/2013.

Author information

Authors and Affiliations

Department of Informatics, University of Évora, Évora, Portugal
João Sequeira, Teresa Gonçalves & Paulo Quaresma
Center for Linguistics of the University of Lisbon, Lisbon, Portugal
Amália Mendes & Iris Hendrickx
Center for Language Studies, Radboud University Nijmegen, Nijmegen, The Netherlands
Iris Hendrickx
L2F – Spoken Language Systems Laboratory, INESC-ID, Lisbon, Portugal
Paulo Quaresma

Corresponding author

Correspondence to Teresa Gonçalves.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Sequeira, J., Gonçalves, T., Quaresma, P., Mendes, A., Hendrickx, I. (2018). Using Syntactic and Semantic Features for Classifying Modal Values in the Portuguese Language. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_28

DOI: https://doi.org/10.1007/978-3-319-75487-1_28
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)
