Abstract
The core of sentence classification is extracting sentence-level semantic features. Existing hybrid methods have huge numbers of parameters and complex model structures, and on limited datasets they are prone to feature redundancy and overfitting. To address this issue, this paper proposes an orthogonal independent Bi-GRU sentence classification model with multi-scale feature extraction, called Multi-scale Orthogonal Independent Bi-GRU (MODE-Bi-GRU). First, the hidden state of the Bi-GRU is split into multiple small hidden states, and the corresponding recurrent weight matrices are constrained to be orthogonal. Then, sliding windows of several different sizes are applied to the sentence in both the forward and backward directions to obtain sentence fragments at multiple scales. Finally, the different sentence fragments are stacked and fed into the model, and the outputs of the multiple small Bi-GRU models are concatenated and processed by soft pooling. An improved focal loss function is adopted to speed up the convergence of the model. Compared with existing models, the proposed model achieves better results on four benchmark datasets and shows better generalization ability with fewer parameters.
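Two of the components the abstract names can be sketched compactly: the multi-scale sliding windows that cut a sentence into fragments of several sizes, and the focal loss that down-weights well-classified examples. The sketch below is illustrative only, assuming example window sizes of 2–4 tokens and the standard focal-loss form; it is not the paper's exact implementation (the paper uses an *improved* focal loss whose details are not given in the abstract).

```python
import math

def multiscale_windows(tokens, sizes=(2, 3, 4)):
    """Collect sentence fragments by sliding windows of several sizes
    over the token sequence (illustrative window sizes)."""
    fragments = []
    for k in sizes:
        for i in range(len(tokens) - k + 1):
            fragments.append(tuple(tokens[i:i + k]))
    return fragments

def focal_loss(p, gamma=2.0):
    """Standard focal loss for true-class probability p:
    the factor (1 - p)**gamma shrinks the loss of easy examples."""
    return -((1.0 - p) ** gamma) * math.log(p)
```

With gamma = 0 this reduces to ordinary cross-entropy, while larger gamma focuses training on hard, misclassified examples; the abstract's "improved" variant would modify this baseline.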
Ethics declarations
Conflict of interest
The authors have no conflict of interest to declare.
Additional information
Responsible editor: Charalampos Tsourakakis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Wang, W., Ruan, W. & Meng, X. MODE-Bi-GRU: orthogonal independent Bi-GRU model with multiscale feature extraction. Data Min Knowl Disc 38, 154–172 (2024). https://doi.org/10.1007/s10618-023-00964-2