Abstract
State-of-the-art sequence labeling systems require large amounts of task-specific knowledge in the form of handcrafted features and data pre-processing, and most such systems are trained on news corpora. An English-as-a-second-language (ESL) corpus, by contrast, is collected from articles written by English learners. Such a corpus is full of grammatical mistakes, which makes sequence labeling considerably more difficult. We propose a two-stage deep neural network architecture for sequence labeling, which enables the higher stage to make use of the coarse-grained labeling information produced by the lower stage. We evaluate our model on three datasets for three sequence labeling tasks: the Penn Treebank WSJ corpus for part-of-speech (POS) tagging, the CoNLL 2003 corpus for named entity recognition (NER), and the CoNLL 2013 corpus for grammatical error correction (GEC). We obtain state-of-the-art performance on all three datasets: 97.60% accuracy for POS tagging, 91.38% F1 for NER, and, for GEC, 38% F1 on determiner error correction and 28.89% F1 on preposition error correction. We also evaluate our system on the ESL corpus PiGai for POS tagging and obtain 96.73% accuracy. The implementation of our network is publicly available.
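The core idea of the two-stage architecture can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses simple linear scorers in place of the paper's deep recurrent layers, and all names and dimensions are assumptions. The point it shows is the data flow — stage 1 predicts a coarse label distribution (e.g. POS tags), and stage 2 receives that distribution concatenated with the original token features when predicting fine-grained labels (e.g. NER tags or GEC edits).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TwoStageTagger:
    """Toy two-stage sequence labeler.

    Stage 1 maps token features to a coarse label distribution;
    stage 2 maps [features ; coarse distribution] to fine labels,
    so the higher stage can exploit the lower stage's output.
    """

    def __init__(self, d_feat, n_coarse, n_fine, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((d_feat, n_coarse)) * 0.1
        self.W2 = rng.standard_normal((d_feat + n_coarse, n_fine)) * 0.1

    def forward(self, X):
        # X: (seq_len, d_feat) token feature vectors
        coarse = softmax(X @ self.W1)              # stage-1 label distribution
        X2 = np.concatenate([X, coarse], axis=1)   # feed coarse labels upward
        fine = softmax(X2 @ self.W2)               # stage-2 predictions
        return coarse, fine
```

In the paper's setting, the linear scorers would be replaced by bidirectional recurrent encoders with a CRF-style output layer, but the stage-to-stage coupling shown here is the same.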
Acknowledgments
The authors would like to thank the anonymous reviewers for their constructive comments.
© 2019 Springer Nature Switzerland AG
Cite this paper
Tan, Y., Yang, L., Niu, S., Zhu, H., Zhang, Y. (2019). A Two-Stage Deep Neural Network for Sequence Labeling. In: Sun, X., Pan, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2019. Lecture Notes in Computer Science(), vol 11633. Springer, Cham. https://doi.org/10.1007/978-3-030-24265-7_12
DOI: https://doi.org/10.1007/978-3-030-24265-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24264-0
Online ISBN: 978-3-030-24265-7
eBook Packages: Computer Science (R0)