Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods

  • Conference paper

In: Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing (IJCAI 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent neural networks, Williams & Zipser and Elman recurrent neural networks, Euclidean and edit-distance nearest-neighbor classifiers, and decision trees. The non-neural-network machine learning methods are included primarily for comparison. We find that the Elman and Williams & Zipser recurrent neural networks are able to find a representation for the grammar that we believe is more parsimonious than those found by the other models; these two models also exhibit the best performance.
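To make the classification setup concrete, the sketch below shows an Elman-style recurrent network reading a sentence one token at a time and predicting grammaticality from its final hidden state. This is an illustrative sketch only, not the authors' implementation: the vocabulary size, layer dimensions, and integer token encoding are assumptions made for the demo.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: an Elman-style recurrent network for binary
# grammaticality classification. Vocabulary size, embedding and hidden
# dimensions, and the integer token encoding are assumptions for this demo,
# not the paper's experimental setup.

class ElmanClassifier(nn.Module):
    def __init__(self, vocab_size=16, embed_dim=8, hidden_dim=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # nn.RNN with the default tanh nonlinearity is the classic Elman
        # network: the hidden state at step t is fed back at step t+1.
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):
        # tokens: (batch, seq_len) tensor of integer-encoded words
        hidden_states, _ = self.rnn(self.embed(tokens))
        # Classify from the hidden state after the last token, i.e. after
        # the whole sentence has been read.
        return torch.sigmoid(self.out(hidden_states[:, -1, :]))

model = ElmanClassifier()
sentence = torch.randint(0, 16, (1, 7))  # one dummy 7-token sentence
print(model(sentence))  # estimated probability the sentence is grammatical
```

The recurrent feedback lets the hidden state summarize the entire prefix of the sentence read so far, which is what allows a recurrent network to represent a grammar more parsimoniously than a fixed-window feed-forward network.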


References

  • Allen, R. B. (1983). Sequential connectionist networks for answering simple questions about a microworld. In Proceedings of the 5th Annual Conference of the Cognitive Science Society, pages 489–495.

  • Back, A. and Tsoi, A. (1991). FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Computation, 3(3):375–385.

  • Berg, G. (1992). A connectionist parser with recursive sentence structure and lexical disambiguation. In Proceedings of AAAI-92, pages 32–37.

  • Chomsky, N. (1981). Lectures on Government and Binding. Foris Publications.

  • Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Praeger.

  • Clouse, D., Giles, C., Horne, B., and Cottrell, G. (1994). Learning large DeBruijn automata with feed-forward neural networks. Technical Report CS94-398, Computer Science and Engineering, University of California at San Diego, La Jolla, CA.

  • Crutchfield, J. P. and Young, K. (1989). Computation at the onset of chaos. In Zurek, W., editor, Complexity, Entropy and the Physics of Information. Addison-Wesley, Reading, MA.

  • Darken, C. and Moody, J. (1991). Note on learning rate schedules for stochastic optimization. In Advances in Neural Information Processing Systems 3, pages 832–838. Morgan Kaufmann.

  • Das, S., Giles, C., and Sun, G. (1992). Learning context-free grammars: Limitations of a recurrent neural network with an external stack memory. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 791–795, San Mateo, CA. Morgan Kaufmann.

  • Elman, J. (1990). Finding structure in time. Cognitive Science, 14:179–211.

  • Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2/3):195–226.

  • Elman, J. L. (1984). Structured representations and connectionist models. In Proceedings of the 6th Annual Conference of the Cognitive Science Society, pages 17–25.

  • Frasconi, P., Gori, M., and Soda, G. (1992). Local feedback multilayered networks. Neural Computation, 4(1):120–130.

  • Gasser, M. and Lee, C. (1990). Networks that learn phonology. Technical report, Computer Science Department, Indiana University.

  • Giles, C., Horne, B., and Lin, T. (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks. In press.

  • Giles, C., Miller, C., Chen, D., Chen, H., Sun, G., and Lee, Y. (1992). Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4(3):393–405.

  • Hare, M. (1990). The role of similarity in Hungarian vowel harmony: A connectionist account. Technical Report CRL 9004, Center for Research in Language, University of California, San Diego.

  • Hare, M., Corina, D., and Cottrell, G. (1989). Connectionist perspective on prosodic structure. CRL Newsletter, 3(2), Center for Research in Language, University of California, San Diego.

  • Harris, C. L. and Elman, J. L. (1984). Representing variable information with simple recurrent networks. In Proceedings of the 6th Annual Conference of the Cognitive Science Society, pages 635–642.

  • Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan, New York, NY.

  • St. John, M. F. and McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46:5–46.

  • Joshi, A. K. (1985). Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In Dowty, D. R., Karttunen, L., and Zwicky, A. M., editors, Natural Language Parsing. Cambridge University Press, Cambridge.

  • Kruskal, J. B. (1983). An overview of sequence comparison. In Sankoff, D. and Kruskal, J. B., editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA.

  • Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.

  • Lasnik, H. and Uriagereka, J. (1988). A Course in GB Syntax: Lectures on Binding and Empty Categories. MIT Press, Cambridge, MA.

  • Lawrence, S., Giles, C. L., and Fong, S. (1995). On the applicability of neural network and machine learning methodologies to natural language processing. Technical Report UMIACS-TR-95-64 and CS-TR-3479, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.

  • MacWhinney, B., Leinbach, J., Taraban, R., and McDonald, J. (1989). Language learning: Cues or rules? Journal of Memory and Language, 28:255–277.

  • Miikkulainen, R. and Dyer, M. (1989). Encoding input/output representations in connectionist cognitive systems. In Touretzky, D. S., Hinton, G. E., and Sejnowski, T. J., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 188–195, Los Altos, CA. Morgan Kaufmann.

  • Pereira, F. and Schabes, Y. (1992). Inside-outside reestimation from partially bracketed corpora. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics (ACL '92).

  • Pesetsky, D. M. (1982). Paths and Categories. PhD thesis, MIT.

  • Pollack, J. (1990). Recursive distributed representations. Artificial Intelligence, 46:77–105.

  • Pollack, J. (1991). The induction of dynamical recognizers. Machine Learning, 7:227–252.

  • Pollard, C. (1984). Generalised Context-Free Grammars, Head Grammars and Natural Language. PhD thesis, Department of Linguistics, Stanford University, Palo Alto, CA.

  • Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.

  • Siegelmann, H. and Sontag, E. (1995). On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132–150.

  • Sperduti, A., Starita, A., and Goller, C. (1995). Learning distributed representations for the classification of terms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 509–515.

  • Stolcke, A. (1990). Learning feature-based semantics with simple recurrent networks. Technical Report TR-90-015, International Computer Science Institute, Berkeley, CA.

  • Sun, G., Giles, C., Chen, H., and Lee, Y. (1993). The neural network push-down automaton: Model, stack and learning simulations. Technical Report UMIACS-TR-93-77, Institute for Advanced Computer Studies, University of Maryland, College Park, MD.

  • Touretzky, D. S. (1989a). Rules and maps in connectionist symbol processing. Technical Report CMU-CS-89-158, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

  • Touretzky, D. S. (1989b). Towards a connectionist phonology: The 'many maps' approach to sequence manipulation. In Proceedings of the 11th Annual Conference of the Cognitive Science Society, pages 188–195.

  • Watrous, R. and Kuhn, G. (1992). Induction of finite-state languages using second-order recurrent networks. Neural Computation, 4(3):406.

  • Williams, R. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280.

  • Williams, R. and Zipser, D. (1990). Gradient-based learning algorithms for recurrent connectionist networks. In Chauvin, Y. and Rumelhart, D., editors, Backpropagation: Theory, Architectures, and Applications. Erlbaum, Hillsdale, NJ.

  • Zeng, Z., Goodman, R., and Smyth, P. (1994). Discrete recurrent neural networks for grammatical inference. IEEE Transactions on Neural Networks, 5(2):320–330.

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

Cite this paper

Lawrence, S., Fong, S., Giles, C.L. (1996). Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_36

  • DOI: https://doi.org/10.1007/3-540-60925-3_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60925-4

  • Online ISBN: 978-3-540-49738-7

  • eBook Packages: Springer Book Archive
