Abstract
The behavior of simple recurrent neural networks trained on regular languages is analyzed in terms of accuracy and interpretability. We use controlled amounts of noise and L1 regularization to obtain responses that are stable and accurate yet highly interpretable, and we introduce a shocking mechanism that reactivates silent neurons when learning stalls due to excessive regularization. Proper parameter tuning allows the networks to develop a strong generalization capacity while providing solutions that may be interpreted as finite automata. Experiments carried out with different regular languages show that, in all cases, the trained networks display activation patterns that automatically cluster into a set of discrete states, with no need to explicitly perform quantization. Analysis of the transitions between states in response to the input symbols reveals that the networks are in fact implementing a finite state machine that, in every case, matches the regular expression used to generate the training data.
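Only the abstract is available on this page, so the following is a minimal, purely illustrative sketch of the training setup it describes: an Elman-style recurrent network with additive Gaussian noise injected into the hidden state at every step, plus an L1 weight penalty. It assumes PyTorch; every name and value here (noise_std, the 1e-4 L1 coefficient, the network sizes, the placeholder data and labels) is our assumption, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyElmanRNN(nn.Module):
    """Elman RNN with additive Gaussian noise injected into the hidden
    state at every time step; the noise pushes units toward saturated,
    near-binary activations that tend to cluster into discrete states."""

    def __init__(self, n_symbols, n_hidden, noise_std=0.1):
        super().__init__()
        self.in2hid = nn.Linear(n_symbols, n_hidden)
        self.hid2hid = nn.Linear(n_hidden, n_hidden)
        self.readout = nn.Linear(n_hidden, 1)   # accept/reject logit
        self.n_hidden = n_hidden
        self.noise_std = noise_std               # assumed value, not the paper's

    def forward(self, x):                        # x: (batch, time, symbols)
        h = x.new_zeros(x.shape[0], self.n_hidden)
        for t in range(x.shape[1]):
            h = torch.tanh(self.in2hid(x[:, t]) + self.hid2hid(h))
            if self.training:                    # noise enters the recurrence
                h = h + self.noise_std * torch.randn_like(h)
        return self.readout(h)                   # classify from the final state

def l1_penalty(model):
    """L1 regularizer: the sum of absolute values of all weights."""
    return sum(p.abs().sum() for p in model.parameters())

# Hypothetical training step on random binary strings with placeholder
# labels; a real run would label strings by a target regular expression.
model = NoisyElmanRNN(n_symbols=2, n_hidden=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = F.one_hot(torch.randint(0, 2, (32, 12)), 2).float()  # (32, 12, 2) one-hot
y = torch.randint(0, 2, (32, 1)).float()                  # accept/reject labels

opt.zero_grad()
loss = F.binary_cross_entropy_with_logits(model(x), y) + 1e-4 * l1_penalty(model)
loss.backward()
opt.step()
```

Collecting the hidden state h over a validation set and projecting it (e.g. with Isomap, as the authors do) would reveal the discrete clusters the abstract refers to.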
Notes
1. In all the experiments carried out, we set C = 5000. Whenever the network's weights are shocked, the number of training epochs is increased in order to give the training more time to converge.
2. We have performed experiments with different values of the shocking parameter ζ, and found that any value in the range [0.5, 1.2] yields similar results (a hypothetical implementation sketch follows these notes).
3. The results for the complete set of Tomita grammars, including the activation color plots, the Isomap projections of the hidden layer activation space, and the extracted automata (see Sect. 4.3), are publicly available at our GitHub repo: https://github.com/slyder095/coliva_llago_icann2019.
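Notes 1 and 2 pin down the shock hyperparameters (C = 5000, ζ ∈ [0.5, 1.2]) but the mechanism itself is described in the paper body, not on this page. The sketch below is one plausible reading under stated assumptions: when learning stalls, hidden units whose activation no longer varies are "shocked" by adding noise of magnitude ζ to their incoming weights. The silence criterion (activation std below eps) and all function and variable names are our assumptions, not the authors' definitions.

```python
import torch

@torch.no_grad()
def shock_silent_neurons(rnn_layer, hidden_states, zeta=1.0, eps=1e-3):
    """Perturb the weights feeding 'silent' hidden units.

    rnn_layer: a module holding the recurrent weights (e.g. torch.nn.RNN).
    hidden_states: (batch, time, hidden) activations recorded on a
        validation batch.
    zeta: shock magnitude; note 2 reports that any value in [0.5, 1.2]
        behaves similarly.
    eps: silence threshold (our criterion: a unit whose activation
        barely varies across the data is considered silent).
    """
    std = hidden_states.reshape(-1, hidden_states.shape[-1]).std(dim=0)
    silent = std < eps                      # boolean mask over hidden units
    for p in rnn_layer.parameters():
        # every nn.RNN parameter (weight_ih, weight_hh, biases) has the
        # hidden dimension first, so a row mask perturbs exactly the
        # incoming weights and biases of the silent units
        p[silent] += zeta * torch.randn_like(p[silent])
    return int(silent.sum())                # number of units shocked
```

Per note 1, the training budget would then be extended after each shock so the perturbed network has time to re-converge.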
Acknowledgments
This work has been partially funded by grant S2017/BMD-3688 from Comunidad de Madrid and by Spanish project MINECO/FEDER TIN2017-84452-R (http://www.mineco.gob.es/).