Patterns Versus Characters in Subword-Aware Neural Language Modeling

  • Rustem Takhanov
  • Zhenisbek Assylbekov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)


Words in some natural languages can have a composite structure. Elements of this structure include the root (which may itself be composite), prefixes, and suffixes, through which various nuances and relations to other words can be expressed. Thus, to build a proper word representation one must take the word's internal structure into account. From a corpus of texts we extract a set of frequent subwords, and from that set we select patterns, i.e. subwords that encapsulate information on character n-gram regularities. The selection is made using the pattern-based Conditional Random Field model [19, 23] with \(l_1\) regularization. Further, for every word we construct a new sequence over an alphabet of patterns. The symbols of the new alphabet capture local statistical context more strongly than characters do; they therefore allow better representations in \({\mathbb {R}}^n\) and are better building blocks for word representation. In the task of subword-aware language modeling, pattern-based models outperform their character-based analogues by 2–20 perplexity points. Also, a recurrent neural network in which a word is represented as a sum of embeddings of its patterns is on par with a competitive and significantly more sophisticated character-based convolutional architecture.
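The idea of representing a word as a sum of the embeddings of its patterns can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the pattern inventory is hand-picked rather than selected by a pattern-based CRF with \(l_1\) regularization, the boundary markers `^` and `$` are a hypothetical convention, the embeddings are random rather than trained, and the helper names `find_patterns` and `word_vector` are invented.

```python
import random


def find_patterns(word, patterns):
    """Return every pattern occurrence in the boundary-marked word, left to right."""
    marked = "^" + word + "$"  # hypothetical word-boundary markers
    found = []
    for start in range(len(marked)):
        for pat in patterns:
            if marked.startswith(pat, start):
                found.append(pat)
    return found


def word_vector(word, embeddings, dim):
    """Sum the embedding vectors of all patterns found in the word."""
    vec = [0.0] * dim
    for pat in find_patterns(word, embeddings):
        for i, x in enumerate(embeddings[pat]):
            vec[i] += x
    return vec


# Hypothetical, hand-picked pattern inventory with random (untrained) embeddings.
rng = random.Random(0)
DIM = 4
PATTERNS = ["^un", "break", "able$", "ing$"]
EMBEDDINGS = {p: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for p in PATTERNS}

print(find_patterns("unbreakable", PATTERNS))
print(word_vector("unbreakable", EMBEDDINGS, DIM))
```

In a language model such a vector would take the place of a word-level embedding at the network input. A word containing none of the patterns maps to the zero vector in this sketch, so a real system would need some fallback for that case.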


Subword-aware language modeling · Pattern-based conditional random field · Word representation · Deep learning



We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.


  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: Large-scale machine learning on heterogeneous distributed systems (2016). arXiv preprint: arXiv:1603.04467
  2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information (2016). arXiv preprint: arXiv:1607.04606
  3. Botha, J., Blunsom, P.: Compositional morphology for word representations and language modelling. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1899–1907 (2014)
  4. Chomsky, N.: Three models for the description of language. IRE Trans. Inf. Theor. 2(3), 113–124 (1956)
  5. Gal, Y., Ghahramani, Z.: A theoretically grounded application of dropout in recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 1019–1027 (2016)
  6. Graves, A.: Generating sequences with recurrent neural networks (2013). arXiv preprint: arXiv:1308.0850
  7. Hardt, M., Ma, T.: Identity matters in deep learning (2016). arXiv preprint: arXiv:1611.04231
  8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  9. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 2741–2749. AAAI Press (2016)
  10. Lankinen, M., Heikinheimo, H., Takala, P., Raiko, T., Karhunen, J.: A character-word compositional neural language model for Finnish (2016). arXiv preprint: arXiv:1612.03266
  11. Ling, W., Dyer, C., Black, A.W., Trancoso, I., Fermandez, R., Amir, S., Marujo, L., Luis, T.: Finding function in form: compositional character models for open vocabulary word representation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1520–1530. Association for Computational Linguistics, September 2015
  12. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
  13. Merity, S., Xiong, C., Bradbury, J., Socher, R.: Pointer sentinel mixture models. In: Proceedings of ICLR 2017 (2017)
  14. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: Interspeech, vol. 2, p. 3 (2010)
  15. Mikolov, T., Sutskever, I., Deoras, A., Le, H.S., Kombrink, S., Černocký, J.: Subword language modeling with neural networks (2012). Preprint (
  16. Shannon, C.E., Weaver, W.: A mathematical theory of communication (1963)
  17. Sperr, H., Niehues, J., Waibel, A.: Letter n-gram-based input encoding for continuous space language models. In: Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pp. 30–39 (2013)
  18. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in Neural Information Processing Systems, pp. 2377–2385 (2015)
  19. Takhanov, R., Kolmogorov, V.: Inference algorithms for pattern-based CRFs on sequence data. In: ICML, vol. 3, pp. 145–153 (2013)
  20. Verwimp, L., Pelemans, J., Wambacq, P., et al.: Character-word LSTM language models. In: Proceedings of EACL 2017 (2017)
  21. Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
  22. Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Charagram: embedding words and sentences via character n-grams. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, pp. 1504–1515, 1–4 November 2016
  23. Ye, N., Lee, W.S., Chieu, H.L., Wu, D.: Conditional random fields with high-order features for sequence labeling. In: Advances in Neural Information Processing Systems, pp. 2196–2204 (2009)
  24. Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization (2014). arXiv preprint: arXiv:1409.2329
  25. Zilly, J.G., Srivastava, R.K., Koutník, J., Schmidhuber, J.: Recurrent highway networks (2016). arXiv preprint: arXiv:1607.03474
  26. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: Proceedings of ICLR 2017 (2017)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Nazarbayev University, Astana, Kazakhstan