Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods

  • Conference paper

In: Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing (IJCAI 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent neural networks, Williams & Zipser and Elman recurrent neural networks, Euclidean and edit-distance nearest-neighbor classifiers, and decision trees. The non-neural-network machine learning methods are included primarily for comparison. We find that the Elman and Williams & Zipser recurrent neural networks are able to find a representation for the grammar that we believe is more parsimonious than those found by the other models; these two models also exhibit the best performance.
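To make the classification setup concrete, the sketch below shows an Elman-style recurrent network reading a sentence one token at a time and predicting grammaticality from its final hidden state. This is an illustrative sketch only, not the authors' implementation: the vocabulary size, layer dimensions, and integer token encoding are assumptions made for the demo.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: an Elman-style recurrent network for binary
# grammaticality classification. Vocabulary size, embedding and hidden
# dimensions, and the integer token encoding are assumptions for this demo,
# not the paper's experimental setup.

class ElmanClassifier(nn.Module):
    def __init__(self, vocab_size=16, embed_dim=8, hidden_dim=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # nn.RNN with the default tanh nonlinearity is the classic Elman
        # network: the hidden state at step t is fed back at step t+1.
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):
        # tokens: (batch, seq_len) tensor of integer-encoded words
        hidden_states, _ = self.rnn(self.embed(tokens))
        # Classify from the hidden state after the last token, i.e. after
        # the whole sentence has been read.
        return torch.sigmoid(self.out(hidden_states[:, -1, :]))

model = ElmanClassifier()
sentence = torch.randint(0, 16, (1, 7))  # one dummy 7-token sentence
print(model(sentence))  # estimated probability the sentence is grammatical
```

The recurrent feedback lets the hidden state summarize the entire prefix of the sentence read so far, which is what allows a recurrent network to represent a grammar more parsimoniously than a fixed-window feed-forward network.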


References

  • Allen, R. B. (1983). Sequential connectionist networks for answering simple questions about a microworld. In Proceedings of the 5th Annual Conference of the Cognitive Science Society, pages 489–495.

  • Back, A. and Tsoi, A. (1991). FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Computation, 3(3):375–385.

  • Berg, G. (1992). A connectionist parser with recursive sentence structure and lexical disambiguation. In Proceedings of AAAI-92, pages 32–37.

  • Chomsky, N. (1981). Lectures on Government and Binding. Foris Publications.

  • Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Praeger.

  • Clouse, D., Giles, C., Horne, B., and Cottrell, G. (1994). Learning large DeBruijn automata with feed-forward neural networks. Technical Report CS94-398, Computer Science and Engineering, University of California at San Diego, La Jolla, CA.

  • Crutchfield, J. P. and Young, K. (1989). Computation at the onset of chaos. In Zurek, W., editor, Complexity, Entropy and the Physics of Information. Addison-Wesley, Reading, MA.

  • Darken, C. and Moody, J. (1991). Note on learning rate schedules for stochastic optimization. In Advances in Neural Information Processing Systems 3, pages 832–838. Morgan Kaufmann.

  • Das, S., Giles, C., and Sun, G. (1992). Learning context-free grammars: Limitations of a recurrent neural network with an external stack memory. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 791–795, San Mateo, CA. Morgan Kaufmann.

  • Elman, J. (1990). Finding structure in time. Cognitive Science, 14:179–211.

  • Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2/3):195–226.

  • Elman, J. L. (1984). Structured representations and connectionist models. In Proceedings of the 6th Annual Conference of the Cognitive Science Society, pages 17–25.

  • Frasconi, P., Gori, M., and Soda, G. (1992). Local feedback multilayered networks. Neural Computation, 4(1):120–130.

  • Gasser, M. and Lee, C. (1990). Networks that learn phonology. Technical report, Computer Science Department, Indiana University.

  • Giles, C., Horne, B., and Lin, T. (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks. In press.

  • Giles, C., Miller, C., Chen, D., Chen, H., Sun, G., and Lee, Y. (1992). Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4(3):393–405.

  • Hare, M. (1990). The role of similarity in Hungarian vowel harmony: A connectionist account. Technical Report CRL 9004, Center for Research in Language, University of California, San Diego.

  • Hare, M., Corina, D., and Cottrell, G. (1989). Connectionist perspective on prosodic structure. CRL Newsletter, 3(2), Center for Research in Language, University of California, San Diego.

  • Harris, C. L. and Elman, J. L. (1984). Representing variable information with simple recurrent networks. In Proceedings of the 6th Annual Conference of the Cognitive Science Society, pages 635–642.

  • Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan, New York, NY.

  • St. John, M. F. and McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46:5–46.

  • Joshi, A. K. (1985). Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In Dowty, D. R., Karttunen, L., and Zwicky, A. M., editors, Natural Language Parsing. Cambridge University Press, Cambridge.

  • Kruskal, J. B. (1983). An overview of sequence comparison. In Sankoff, D. and Kruskal, J. B., editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA.

  • Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.

  • Lasnik, H. and Uriagereka, J. (1988). A Course in GB Syntax: Lectures on Binding and Empty Categories. MIT Press, Cambridge, MA.

  • Lawrence, S., Giles, C. L., and Fong, S. (1995). On the applicability of neural network and machine learning methodologies to natural language processing. Technical Report UMIACS-TR-95-64 and CS-TR-3479, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.

  • MacWhinney, B., Leinbach, J., Taraban, R., and McDonald, J. (1989). Language learning: Cues or rules? Journal of Memory and Language, 28:255–277.

  • Miikkulainen, R. and Dyer, M. (1989). Encoding input/output representations in connectionist cognitive systems. In Touretzky, D. S., Hinton, G. E., and Sejnowski, T. J., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 188–195, Los Altos, CA. Morgan Kaufmann.

  • Pereira, F. and Schabes, Y. (1992). Inside-outside reestimation from partially bracketed corpora. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics (ACL '92).

  • Pesetsky, D. M. (1982). Paths and Categories. PhD thesis, MIT.

  • Pollack, J. (1990). Recursive distributed representations. Artificial Intelligence, 46:77–105.

  • Pollack, J. (1991). The induction of dynamical recognizers. Machine Learning, 7:227–252.

  • Pollard, C. (1984). Generalised Context-Free Grammars, Head Grammars and Natural Language. PhD thesis, Department of Linguistics, Stanford University, Palo Alto, CA.

  • Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.

  • Siegelmann, H. and Sontag, E. (1995). On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132–150.

  • Sperduti, A., Starita, A., and Goller, C. (1995). Learning distributed representations for the classification of terms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 509–515.

  • Stolcke, A. (1990). Learning feature-based semantics with simple recurrent networks. Technical Report TR-90-015, International Computer Science Institute, Berkeley, CA.

  • Sun, G., Giles, C., Chen, H., and Lee, Y. (1993). The neural network push-down automaton: Model, stack and learning simulations. Technical Report UMIACS-TR-93-77, Institute for Advanced Computer Studies, University of Maryland, College Park, MD.

  • Touretzky, D. S. (1989a). Rules and maps in connectionist symbol processing. Technical Report CMU-CS-89-158, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

  • Touretzky, D. S. (1989b). Towards a connectionist phonology: The 'many maps' approach to sequence manipulation. In Proceedings of the 11th Annual Conference of the Cognitive Science Society, pages 188–195.

  • Watrous, R. and Kuhn, G. (1992). Induction of finite-state languages using second-order recurrent networks. Neural Computation, 4(3):406.

  • Williams, R. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280.

  • Williams, R. and Zipser, D. (1990). Gradient-based learning algorithms for recurrent connectionist networks. In Chauvin, Y. and Rumelhart, D., editors, Backpropagation: Theory, Architectures, and Applications. Erlbaum, Hillsdale, NJ.

  • Zeng, Z., Goodman, R., and Smyth, P. (1994). Discrete recurrent neural networks for grammatical inference. IEEE Transactions on Neural Networks, 5(2):320–330.

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

Cite this paper

Lawrence, S., Fong, S., Giles, C.L. (1996). Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_36

  • DOI: https://doi.org/10.1007/3-540-60925-3_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60925-4

  • Online ISBN: 978-3-540-49738-7

  • eBook Packages: Springer Book Archive
