
MODE-Bi-GRU: orthogonal independent Bi-GRU model with multiscale feature extraction

Published in Data Mining and Knowledge Discovery

Abstract

The core of sentence classification is extracting sentence-level semantic features. Existing hybrid methods have large parameter counts and complex architectures, and on limited datasets they are prone to feature redundancy and overfitting. To address this issue, this paper proposes an orthogonal independent Bi-GRU sentence classification model with multi-scale feature extraction, called Multi-scale Orthogonal Independent Bi-GRU (MODE-Bi-GRU). First, the hidden state of the Bi-GRU is split into multiple small hidden states, and the corresponding recurrent matrices are constrained to be orthogonal. Then, sliding windows of several sizes are applied over the sentence in both the forward and backward directions to obtain sentence fragments at multiple scales. Finally, the fragments at different scales are stacked and fed to the model, and the outputs of the multiple small Bi-GRU models are concatenated and aggregated by soft pooling. An improved focal loss function is adopted to accelerate convergence. Compared with existing models, the proposed model achieves better results on four benchmark datasets and generalizes better with fewer parameters.
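The abstract only outlines the architecture, so the following is a minimal, illustrative sketch (not the authors' implementation) of the main ingredients it describes: several small independent Bi-GRUs in place of one large hidden state, a soft orthogonality penalty on the recurrent matrices, multi-scale sliding windows over the sentence, soft pooling over window-level features, and a focal-loss objective. All names, dimensions, the exact penalty form, the pooling weights, and the plain (rather than "improved") focal loss are assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MODEBiGRUSketch(nn.Module):
    """Hypothetical sketch of a multi-scale orthogonal independent Bi-GRU classifier."""

    def __init__(self, emb_dim=300, n_splits=4, split_hidden=32,
                 window_sizes=(3, 5, 7), n_classes=2):
        super().__init__()
        # Several small independent Bi-GRUs instead of one large Bi-GRU.
        self.grus = nn.ModuleList(
            nn.GRU(emb_dim, split_hidden, bidirectional=True, batch_first=True)
            for _ in range(n_splits)
        )
        self.window_sizes = window_sizes
        feat_dim = n_splits * 2 * split_hidden
        self.classifier = nn.Linear(feat_dim, n_classes)

    def orthogonal_penalty(self):
        # Soft orthogonality constraint on each recurrent (hidden-to-hidden) matrix:
        # ||W W^T - I||_F^2, to be added to the training loss with a small weight.
        penalty = 0.0
        for gru in self.grus:
            for name, w in gru.named_parameters():
                if name.startswith("weight_hh"):
                    # weight_hh has shape (3*H, H); penalize each gate's HxH block.
                    for block in w.chunk(3, dim=0):
                        eye = torch.eye(block.size(1), device=block.device)
                        penalty = penalty + ((block @ block.t() - eye) ** 2).sum()
        return penalty

    def forward(self, x):  # x: (batch, seq_len, emb_dim) padded word embeddings
        window_feats = []
        for k in self.window_sizes:
            # Slide a window of size k over the sentence (stride 1) and run every
            # fragment through all small Bi-GRUs, keeping their final hidden states.
            for start in range(0, max(1, x.size(1) - k + 1)):
                frag = x[:, start:start + k, :]
                outs = []
                for gru in self.grus:
                    _, h = gru(frag)              # h: (2, batch, split_hidden)
                    outs.append(h.transpose(0, 1).reshape(x.size(0), -1))
                window_feats.append(torch.cat(outs, dim=-1))
        feats = torch.stack(window_feats, dim=1)   # (batch, n_windows, feat_dim)
        # Soft pooling: softmax-weighted sum over the window dimension.
        weights = F.softmax(feats.sum(-1, keepdim=True), dim=1)
        pooled = (weights * feats).sum(dim=1)
        return self.classifier(pooled)

def focal_loss(logits, targets, gamma=2.0):
    # Standard focal loss FL = -(1 - p_t)^gamma * log(p_t); the paper's "improved"
    # variant is not specified in the abstract, so the plain form is shown here.
    log_pt = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return ((1.0 - pt) ** gamma * (-log_pt)).mean()
```

Under these assumptions, a training step would combine the two terms, e.g. `loss = focal_loss(model(x), y) + 1e-3 * model.orthogonal_penalty()`, so that the recurrent matrices are pushed toward orthogonality while the focal term down-weights easy examples.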




Author information


Corresponding author

Correspondence to Wenhan Ruan.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to declare.

Additional information

Responsible editor: Charalampos Tsourakakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, W., Ruan, W. & Meng, X. MODE-Bi-GRU: orthogonal independent Bi-GRU model with multiscale feature extraction. Data Min Knowl Disc 38, 154–172 (2024). https://doi.org/10.1007/s10618-023-00964-2

