A Model-Based Multilingual Natural Language Parser — Implementing Chomsky’s X-bar Theory in ModelCC
Natural language support is a powerful feature that enhances user interaction with query systems, but natural language processing (NLP) requires dealing with ambiguities. Traditional probabilistic parsers provide a convenient means of disambiguation. However, they return wrong sequences of tokens that cannot be corrected, they impose hard constraints on the way lexical and syntactic ambiguities can be resolved, and they are limited in the mechanisms they offer for taking context into account. In comparison, model-based parser generators allow for flexible constraint specification and reference resolution, which makes it easier to take context into account. In this paper, we explain how the ModelCC model-based parser generator supports statistical language models and arbitrary probability estimators. Then, we present the ModelCC implementation of a natural language parser based on the syntax of most Romance and Germanic languages. This natural language parser can be instantiated for a specific language by connecting it with a thesaurus (for lexical analysis), a linguistic corpus (for syntax-driven disambiguation), and an ontology or semantic database (for semantics-driven disambiguation).
Keywords: natural languages, disambiguation, query parsing
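The idea of combining several probability estimators to choose among ambiguous readings can be illustrated with a minimal sketch. It is not ModelCC code and does not use the ModelCC API: the `Candidate` class, the `rank` function, the two estimators, and every probability value are hypothetical and were chosen only to show how a syntax-driven score and a semantics-driven score might be multiplied to rank two attachments of the phrase "with the telescope".

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One possible reading of an ambiguous sentence (hypothetical structure)."""
    label: str
    sentence: str


# An estimator maps a candidate reading to a probability in [0, 1].
Estimator = Callable[[Candidate], float]


def syntax_estimator(candidate: Candidate) -> float:
    # Made-up values standing in for frequencies learned from a corpus.
    return {"noun-attachment": 0.6, "verb-attachment": 0.4}[candidate.label]


def semantic_estimator(candidate: Candidate) -> float:
    # Made-up values standing in for plausibility drawn from an ontology.
    return {"noun-attachment": 0.2, "verb-attachment": 0.8}[candidate.label]


def rank(candidates: Sequence[Candidate],
         estimators: Sequence[Estimator]) -> List[Tuple[float, Candidate]]:
    """Score each candidate by the product of all estimators, best first."""
    scored = [(prod(e(c) for e in estimators), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sentence = "I saw the man with the telescope"
candidates = [
    Candidate("noun-attachment", sentence),  # the man has the telescope
    Candidate("verb-attachment", sentence),  # the seeing is done with it
]
ranked = rank(candidates, [syntax_estimator, semantic_estimator])

for score, candidate in ranked:
    print(f"{candidate.label}: {score:.2f}")
```

Multiplying the scores treats the estimators as independent, which is a simplifying assumption of this sketch; any other combination rule could be substituted inside `rank` without changing the rest.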