Informal Mathematical Discourse Parsing with Conditional Random Fields

Gutierrez de Piñerez Reyes, Raúl Ernesto; Díaz-Frías, Juan Francisco

doi:10.1007/978-3-319-11397-5_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Discourse parsing of Informal Mathematical Discourse (IMD) is a difficult task, partly because data sets are scarce and partly because Natural Language Processing (NLP) techniques must be adapted to the informality of IMD. In this paper, we present an end-to-end discourse parser for Spanish that works as a sequential classifier of informal deductive argumentations (IDA). The parser uses sequence labeling based on Conditional Random Fields (CRFs), applied to lexical, syntactic and semantic features extracted from a discourse corpus, the Mathematical Discourse TreeBank (MD-TreeBank). We describe this Penn Discourse TreeBank (PDTB)-styled end-to-end discourse parser in the context of Controlled Natural Languages (CNLs). We approach discourse parsing from a low-level perspective, identifying the IDA connectives while avoiding complex linguistic phenomena. The parser treats parsing as a connective-level sequence labeling task and classifies several types of informal deductive argumentation within the mathematical proof.
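
To make the idea of connective-level sequence labeling concrete, the following is a minimal sketch of the decoding step of a linear-chain model, written with the Python standard library only. It is not the parser described in this paper. The BIO tag set, the feature templates, the example sentence and all weights are assumptions chosen for illustration. In a real CRF the weights would be learned from an annotated corpus rather than set by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for connective spans (an assumption, not from the paper).
LABELS = ["O", "B-CONN", "I-CONN"]

Weights = Dict[Tuple[str, str], float]


def token_features(tokens: List[str], i: int) -> List[str]:
    """Toy lexical features for token i: a bias, the word, and the previous word."""
    word = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias", f"w={word}", f"prev={prev}"]


def viterbi(tokens: List[str], emit: Weights, trans: Weights) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model."""
    n = len(tokens)
    if n == 0:
        return []
    # score[i][y]: best score of any label path that ends with label y at position i.
    score: List[Dict[str, float]] = [{} for _ in range(n)]
    back: List[Dict[str, str]] = [{} for _ in range(n)]
    for i in range(n):
        feats = token_features(tokens, i)
        for y in LABELS:
            e = sum(emit.get((f, y), 0.0) for f in feats)
            if i == 0:
                score[i][y] = e
                continue
            best_prev = max(
                LABELS, key=lambda p: score[i - 1][p] + trans.get((p, y), 0.0)
            )
            score[i][y] = score[i - 1][best_prev] + trans.get((best_prev, y), 0.0) + e
            back[i][y] = best_prev
    # Follow the back-pointers from the best final label.
    y = max(LABELS, key=lambda lab: score[n - 1][lab])
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hand-set illustrative weights (assumptions; a trained CRF would estimate these).
EMIT: Weights = {
    ("bias", "O"): 1.0,
    ("w=por", "B-CONN"): 3.0,
    ("w=lo", "I-CONN"): 2.5,
    ("w=tanto", "I-CONN"): 2.5,
    ("prev=lo", "I-CONN"): 0.5,
    ("w=entonces", "B-CONN"): 3.0,
}
TRANS: Weights = {
    ("O", "I-CONN"): -5.0,      # a span cannot continue without having started
    ("B-CONN", "I-CONN"): 1.0,
    ("I-CONN", "I-CONN"): 0.5,
}

if __name__ == "__main__":
    sentence = "Por lo tanto x pertenece a B".split()
    for tok, lab in zip(sentence, viterbi(sentence, EMIT, TRANS)):
        print(f"{tok}\t{lab}")
```

The sketch covers decoding only. Training and the syntactic and semantic features mentioned in the abstract are outside its scope.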

Notes

1. http://nlp.lsi.upc.edu/freeling/

References

Bikel, D.: Design of a multilingual, parallel processing statistical parsing engine. In: Proceedings of the 2nd International Conference on Human Language Technology Research HLT’02, pp. 178–182. Morgan Kaufmann Publishers Inc., San Francisco (2002)
Dinesh, N., Lee, A., Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: Attribution and the (non-)alignment of syntactic and discourse arguments of connectives. In: Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky, CorpusAnno ’05, Stroudsburg, PA, USA, pp. 29–36. Association for Computational Linguistics (2005). http://dl.acm.org/citation.cfm?id=1608829.1608834
Ghosh, S., Johansson, R., Riccardi, G., Tonelli, S.: Shallow discourse parsing with conditional random fields. In: Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, pp. 1071–1079 (2011)
Humayoun, M., Raffalli, C.: Mathabs: A representational language for mathematics. In: Proceedings of the 8th International Conference on Frontiers of Information Technology, FIT ’10, pp. 37:1–37:7. ACM, New York (2010). http://doi.acm.org/10.1145/1943628.1943665
Kamareddine, F., Maarek, M., Retel, K., Wells, J.B.: Narrative structure of mathematical texts. In: Kauers, M., Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM/CALCULEMUS 2007. LNCS (LNAI), vol. 4573, pp. 296–312. Springer, Heidelberg (2007). http://dx.doi.org/10.1007/978-3-540-73086-6_24
Lin, Z., Ng, H.T., Kan, M.: A PDTB-styled end-to-end discourse parser. Comput. Res. Repository (2011)
Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993). http://dl.acm.org/citation.cfm?id=972470.972475
Pitler, E., Nenkova, A.: Using syntax to disambiguate explicit discourse connectives in text. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (ACLShort 2009), Stroudsburg, PA, USA, pp. 13–16. Association for Computational Linguistics (2009). http://dl.acm.org/citation.cfm?id=1667583.1667589
Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., Webber, B.: The Penn Discourse TreeBank 2.0. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco (2008)
Qi, L., Chen, L.: A linear-chain CRF-based learning approach for web opinion mining. In: Chen, L., Triantafillou, P., Suel, T. (eds.) WISE 2010. LNCS, vol. 6488, pp. 128–141. Springer, Heidelberg (2010)
Gutierrez de Piñerez Reyes, R.E., Díaz Frías, J.F.: Preprocessing of informal mathematical discourse in context of controlled natural language. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pp. 1632–1636. ACM, New York (2012). http://doi.acm.org/10.1145/2396761.2398487
Gutierrez de Piñerez Reyes, R.E., Díaz Frías, J.F.: Building a discourse parser for informal mathematical discourse in the context of a controlled natural language. In: Gelbukh, A. (ed.) CICLing 2013, Part I. LNCS, vol. 7816, pp. 533–544. Springer, Heidelberg (2013). http://dx.doi.org/10.1007/978-3-642-37247-6_43
Ruesga, S.L., Sandoval, S.L., León, L.F.: Spanish treebank: specifications version 5. Technical report, Universidad Autónoma de Madrid (1999)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Wellner, B.: Sequence models and ranking methods for discourse parsing. Ph.D. thesis, Brandeis University (2009)
Wellner, B., Pustejovsky, J.: Automatically identifying the arguments of discourse connectives. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, June 2007, pp. 92–101. Association for Computational Linguistics (2007). http://www.aclweb.org/anthology/D/D07/D07-1010
Wolska, M.: A language engineering architecture for processing informal mathematical discourse. In: Towards Digital Mathematics Library, pp. 131–136. Masaryk University (2008)
Zinn, C.: Understanding informal mathematical discourse. Ph.D. thesis. Universität Erlangen-Nürnberg Institut für Informatik (2004)

Author information

Authors and Affiliations

EISC, Universidad del Valle, Cali, Colombia
Raúl Ernesto Gutierrez de Piñerez Reyes & Juan Francisco Díaz-Frías

Corresponding author

Correspondence to Raúl Ernesto Gutierrez de Piñerez Reyes.

Editor information

Editors and Affiliations

University Joseph Fourier, Grenoble, France
Laurent Besacier
Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

Cite this paper

Gutierrez de Piñerez Reyes, R.E., Díaz-Frías, J.F. (2014). Informal Mathematical Discourse Parsing with Conditional Random Fields. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_20

DOI: https://doi.org/10.1007/978-3-319-11397-5_20
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)