Grammar Debugging

Maxwell, Michael

doi:10.1007/978-3-319-23980-4_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 537)

Included in the following conference series:

International Workshop on Systems and Frameworks for Computational Morphology

Abstract

Perhaps the dominant method for building morphological parsers is to use finite state transducer toolkits. The problem with this approach is that finite state transducers require one to think of grammar writing as a programming task, rather than as providing a declarative linguistic description. We have therefore developed a method for representing the morphology and phonology of natural languages in a way which is closer to traditional linguistic descriptions, together with a method for automatically converting these descriptions into parsers, thus allowing the linguistic descriptions to be tested against real language data.

But there is a drawback to this approach: the fact that the descriptive level is different from the implementation level makes debugging of the grammars difficult, and in particular it provides no aid to visualizing the steps in deriving surface forms from underlying forms. We have therefore developed a debugging tool, which allows the linguist to see each intermediate step in the generation of words, without needing to know anything about the finite state implementation. The tool runs in generation mode; that is, the linguist provides an expected parse, and the debugger shows how that underlying form is converted into a surface form given the grammar. (Debugging in the opposite direction—starting from an expected surface form—might seem more natural, but in fact is much harder if that form cannot be parsed, as presumably it cannot be if the grammar needs debugging.)

The tool allows tracing the application of feature checking constraints (important when there is multiple exponence) and phonological rules. It will soon allow viewing the application of suppletive allomorphy constraints, although we describe some theoretical linguistic issues with how the latter should work. The tool can be run from the command line (useful when repeatedly testing the same wordforms while tweaking the grammar), or from a Graphical User Interface (GUI) which prompts the user for the necessary information. The output can be displayed in a browser.
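
The kind of step-by-step trace described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation (which is built on a finite state toolkit): the rule names, the regular expressions, and the `trace_derivation` and `surface` helpers are hypothetical, and the rules are a textbook-style analysis of the English plural with an underlying /z/ suffix, with `+` marking the morpheme boundary.

```python
import re

# Hypothetical ordered rules (not taken from the paper): (name, pattern, replacement).
# Epenthesis must precede devoicing, as in the textbook analysis of English plurals.
RULES = [
    ("epenthesis", re.compile(r"([szʃʒ])\+z$"), r"\1+ɪz"),
    ("devoicing", re.compile(r"([ptkθ])\+z$"), r"\1+s"),
    ("boundary deletion", re.compile(r"\+"), ""),
]


def trace_derivation(underlying):
    """Apply each rule in order, recording (label, form, changed) after every step."""
    steps = [("underlying", underlying, False)]
    form = underlying
    for name, pattern, replacement in RULES:
        new_form = pattern.sub(replacement, form)
        steps.append((name, new_form, new_form != form))
        form = new_form
    return steps


def surface(underlying):
    """Return the last form of the derivation, i.e. the surface form."""
    return trace_derivation(underlying)[-1][1]


if __name__ == "__main__":
    for label, form, changed in trace_derivation("bʌs+z"):
        marker = "*" if changed else " "
        print(f"{marker} {label:<18} {form}")
```

Running the script prints one line per rule, starred where the rule changed the form, so a rule that unexpectedly fails to apply (or applies where it should not) is visible at the step where it happens.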

In addition to its use in debugging, the debugger could have an educational use in explicating the forms in a paradigm chart: each cell of the paradigm could be run through the debugger to produce the cell’s derivation, showing how forms which might seem counter-intuitive or irregular are derived. We have not yet implemented this.

Notes

1.
Another approach to writing computational grammars is the ‘Grammatical Framework’ (http://www.grammaticalframework.org/). To an even greater extent than most finite state toolkits intended for linguists, the Grammatical Framework takes a programming language approach to writing rules. Indeed, the first lesson in the tutorial (http://www.grammaticalframework.org/doc/tutorial/gf-tutorial.html#toc4) starts out by saying “we learn the way of thinking in the GF theory.” In contrast, our approach is to assume the linguist already knows linguistics, and prefers to think in linguistic terms, not in “a typed functional language, borrowing many of its constructs from ML and Haskell” (http://www.grammaticalframework.org/doc/gf-refman.html#toc1). For the same reason, we have not attempted to model two-level sorts of analyses; as [1] note, that formalism tends not to appeal to most linguists.
2.
Some linguistic issues arise when combining different models. For example, should phonological conditions on listed allomorphs be applied before or after phonological rules are applied?
3.
To my knowledge, the only morphological parser that directly supports the use of phonological features is SIL’s Hermit Crab. An outdated description of this program is at http://www.sil.org/computing/hermitcrab/. Hermit Crab has been re-implemented in SIL’s Fieldworks Language Explorer system, FLEx: http://www-01.sil.org/sil/news/2009/flex3.htm. Alternatively, it would be possible to convert feature-based descriptions of phonological environments into phoneme-based descriptions; so far as I know, there is no computational tool which does that. There has been only a little work on implementing Optimality Theory-based descriptions, see [9].
4.
Rules of epenthesis require additional steps, as do rules which apply to a lexically defined subclass of words (e.g., to a particular conjugation class).
5.
The motivations behind this work have been more thoroughly documented elsewhere, e.g., [5, 14, 15, 17].
6.
The caveat “as much as possible” refers to the inherent conflict between informal verbal descriptions and formal descriptions which can be processed computationally. One example of this is affix processes (such as reduplication), for which the model presents an explicit formalism based on early work in reduplication by generative linguists [13].
7.
http://www.xmlmind.com/xmleditor/.
8.
http://www.oxygenxml.com/.
9.
SIL’s Fieldworks Language Explorer (FLEx) provides similar capabilities for editing grammars conforming to a slightly different schema.
10.
The scare quotes around the word “compile” are intended to indicate that this compilation is not the same thing as compiling a C program, say, to executable code. Rather, it represents the conversion of some text-based format into a highly compressed and rapidly interpretable internal format as a finite state network.
11.
Descriptions of FST implementations use the terms “upper” and “lower,” but inconsistently: documentation of xfst refers to the lexical side as upper and the surface as lower, while documentation of SFST uses the opposite convention.
12.
FST tools often provide for multi-character symbols, which can be useful to represent such non-phonemic entities. The FST engine generally views them as if they were single characters.
13.
We use this affix and its allomorphs for illustrative purposes precisely because of its simplicity. A full regular expression capability is available in the XML formalism, and can be translated into SFST code, allowing much more complex allomorph (and phonological rule) environments.
14.
We will not discuss unusual plurals, such as the –en of oxen or the –i/–us alternation of words like octopus~octopi; nor irregular plurals such as geese, mice. I also ignore stems ending in /f/, since in many words this voices to become /v/, which then takes the /z/ suffix allomorph: wife/wives.
15.
The examples are represented using IPA characters, since the standard orthography does not make the distinctions in a consistent way.
16.
If the allomorph environments were stated intensionally in terms of phonological features, rather than extensionally as lists of phonemes, the first and second allomorphs of this affix would need to be extrinsically ordered with respect to each other as well.
17.
I have simplified this somewhat, e.g., the prefixes on the XML tags designating the namespace for our XML schema are not shown.
18.
One point that may be unclear is the use of the ‘idref’ attribute to refer to phonemes. The phonemes are defined elsewhere in the XML grammar, with unique IDs, and these IDs are referred to here. The representation of morphosyntactic features would use a similar notation, but for reasons of compatibility with the TEI and ISO encodings of features and feature structures, we instead use the ‘name’ and ‘value’ attributes.
19.
The name we use is a multi-character symbol containing the affix gloss. The tag appears on both sides of the allomorph, bracketing it; this allows us to avoid confusion between the allomorph and any homographic sequence of characters.
20.
One can build a parser-as-guesser, in which the “lexicon” is a regular expression representing all possible lexemes, including those which do not correspond to real lexemes. But this seldom reveals the problem with a failed parse either.
21.
Or forms, should there be optional phonological rules.
22.
This problem happens when two features are not completely orthogonal. For example, in Spanish the feature value of [Mood subjunctive] is incompatible with the feature value [Tense future]: while there is a present subjunctive and a past subjunctive in Spanish, there is no future subjunctive.
23.
The situation is somewhat more complicated than this, since the formal grammar treats features as part of feature structures. There may thus be conflicting values for features which appear in distinct parts of the feature structure. For example, a language which marked transitive verbs for agreement with the person of both subject and object could have distinct values of the number feature for subject and for object, without conflict.
24.
In the case of multiple application of a rule, e.g., vowel harmony rules, the output of the entire rule application is shown.
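
As a rough illustration of the point made in notes 13 through 16, the following Python sketch selects an English plural allomorph from environments stated extensionally as lists of phonemes. The phoneme lists, the `ALLOMORPHS` table, and both function names are assumptions made for this example; they are not the paper's XML formalism or its SFST translation. Affricates are left out to keep one character per phoneme, and stems ending in /f/ are left out as in note 14.

```python
# Hypothetical extensional environments: each allomorph lists the stem-final
# phonemes it may follow. None marks the elsewhere case.
ALLOMORPHS = [
    ("ɪz", frozenset("szʃʒ")),  # after sibilants
    ("s", frozenset("ptkθ")),   # after voiceless non-sibilants
    ("z", None),                # elsewhere: must be tried after the listed cases
]


def select_plural(stem):
    """Return the first allomorph whose environment matches the stem-final phoneme."""
    final = stem[-1]
    for allomorph, environment in ALLOMORPHS:
        if environment is None or final in environment:
            return allomorph
    raise ValueError(f"no allomorph matches stem {stem!r}")


def environments_disjoint():
    """True if no phoneme appears in more than one listed environment."""
    listed = [env for _, env in ALLOMORPHS if env is not None]
    return all(a.isdisjoint(b) for i, a in enumerate(listed) for b in listed[i + 1:])


if __name__ == "__main__":
    for stem in ("kæt", "bʌs", "dɔɡ"):
        print(stem, "+", select_plural(stem))
```

Because the two listed environments share no phoneme, swapping the first two entries would not change any result; only the elsewhere entry has to come last. If the environments were instead stated with features (for example, "voiceless" for /s/), a stem-final /s/ would satisfy both of the first two environments, which is why those allomorphs would then need extrinsic ordering.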

References

1. Beesley, K.R., Karttunen, L.: Finite State Morphology. University of Chicago Press, Chicago (2003)
2. Berry, D.M., Kamsties, E.: Ambiguity in requirements specification. In: do Prado Leite, J.C.S., Doorn, J.H. (eds.) Perspectives on Software Requirements, vol. 753. Springer, US (2003)
3. Bonet, E., Harbour, D.: Contextual allomorphy. In: Trommer, J. (ed.) The Morphology and Phonology of Exponence, Oxford Studies in Theoretical Linguistics, vol. 41. Oxford University Press (2012)
4. Carstairs, A.D.: Allomorphy in Inflexion. Croom Helm, London (1987)
5. David, A., Maxwell, M.: Joint grammar development by linguists and computer scientists. In: Third International Joint Conference on Natural Language Processing, IJCNLP 2008, Hyderabad, India, 7–12 January, 2008, pp. 27–34. The Association for Computer Linguistics (2008). http://aclweb.org/anthology/I/I08/I08-3007.pdf
6. Dixon, R.M.W.: Basic Linguistic Theory. Methodology, vol. 1. Oxford University Press, Oxford (2009)
7. Dixon, R.M.W.: Basic Linguistic Theory. Grammatical Topics, vol. 2. Oxford University Press, Oxford (2009)
8. Hulden, M.: Foma: a finite-state compiler and library. In: Proceedings of the ACL, pp. 29–32. ACL, Athens (2009). http://www.aclweb.org/anthology/E/E09/E09-2008.pdf
9. Karttunen, L.: The proper treatment of optimality in computational phonology. In: Karttunen, L., Oflazer, K. (eds.) Proceedings of the International Workshop on Finite State Methods in Natural Language Processing, pp. 1–12. Bilkent University, Ankara (1998). http://www.aclweb.org/anthology/W/W98/W98-1301.pdf
10. Kiparsky, P.: ‘Elsewhere’ in phonology. In: Anderson, S.R. (ed.) A Festschrift for Morris Halle, Holt, New York, pp. 93–106 (1973)
11. Kiparsky, P.: Allomorphy or morphophonology? In: Singh, R. (ed.) Trubetzkoy’s Orphan: Proceedings of the Montreal Roundtable “Morphonology: Contemporary Responses”, Montreal, 30 Sept–2 Oct, 1994, pp. 13–31. Benjamins, Amsterdam (1996)
12. Koskenniemi, K.: Two-level Morphology: A General Computational Model for Word-Form Recognition and Production. Ph.D. thesis, University of Helsinki (1983). http://www.ling.helsinki.fi/koskenni/doc/Two-LevelMorphology.pdf
13. Marantz, A.: Re reduplication. Linguist. Inquiry 13, 435–482 (1982)
14. Maxwell, M.: Standardization as a means to sustainability. In: Workshop on Language Resources: From Storyboard to Sustainability and LR Lifecycle Management, LREC 2010, pp. 30–33 (2010)
15. Maxwell, M.: Electronic grammars and reproducible research. In: Nordoff, S., Poggeman, K.L.G. (eds.) Electronic Grammaticography, pp. 207–235. University of Hawaii Press (2012)
16. Maxwell, M.: Accounting for allomorphy in finite state transducers. In: Finite-State Methods and Natural Language Processing (2015)
17. Maxwell, M., David, A.: Interoperable grammars. In: Webster, J., Ide, N., Fang, A.C. (eds.) First International Conference on Global Interoperability for Language Resources (ICGL 2008), Hong Kong, pp. 155–162 (2008). http://hdl.handle.net/1903/11611
18. Paster, M.E.: Explaining phonological conditions on affixation: Evidence from suppletive allomorphy and affix ordering. Word Structure 2(1), 18–37 (2009)
19. Schmid, H.: A programming language for finite state transducers. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds.) FSMNLP 2005. LNCS (LNAI), vol. 4002, pp. 308–309. Springer, Heidelberg (2006)

Author information

Authors and Affiliations

University of Maryland, College Park, MD, 20742, USA
Michael Maxwell

Corresponding author

Correspondence to Michael Maxwell.

Editor information

Editors and Affiliations

Institut für Deutsche Sprache, Mannheim, Germany
Cerstin Mahlow
Leibniz Institute of European History, Mainz, Germany
Michael Piotrowski

Cite this paper

Maxwell, M. (2015). Grammar Debugging. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2015. Communications in Computer and Information Science, vol 537. Springer, Cham. https://doi.org/10.1007/978-3-319-23980-4_11

DOI: https://doi.org/10.1007/978-3-319-23980-4_11
Published: 09 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23978-1
Online ISBN: 978-3-319-23980-4
eBook Packages: Computer Science, Computer Science (R0)
