Abstract
The core of sentence classification is extracting sentence-level semantic features. Existing hybrid methods have huge numbers of parameters and complex model structures, and on limited datasets they are prone to feature redundancy and overfitting. To address this issue, this paper proposes an orthogonal independent Bi-GRU sentence classification model with multi-scale feature extraction, called Multi-scale Orthogonal Independent Bi-GRU (MODE-Bi-GRU). First, the hidden state of the Bi-GRU is split into multiple small hidden states, and the corresponding recurrent weight matrices are constrained to be orthogonal. Then, sliding windows of several different sizes are applied to the sentence in both the forward and backward directions to obtain sentence fragments at multiple scales. Finally, the different sentence fragments are stacked and fed into the model, and the outputs of the multiple small Bi-GRU models are concatenated and processed by soft pooling. An improved focal loss function is adopted to speed up the convergence of the model. Compared with existing models, the proposed model achieves better results on four benchmark datasets and shows better generalization ability with fewer parameters.
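Two of the components the abstract names can be sketched compactly: the multi-scale sliding windows that cut a sentence into fragments of several sizes, and the focal loss that down-weights well-classified examples. The sketch below is illustrative only, assuming example window sizes of 2–4 tokens and the standard focal-loss form; it is not the paper's exact implementation (the paper uses an *improved* focal loss whose details are not given in the abstract).

```python
import math

def multiscale_windows(tokens, sizes=(2, 3, 4)):
    """Collect sentence fragments by sliding windows of several sizes
    over the token sequence (illustrative window sizes)."""
    fragments = []
    for k in sizes:
        for i in range(len(tokens) - k + 1):
            fragments.append(tuple(tokens[i:i + k]))
    return fragments

def focal_loss(p, gamma=2.0):
    """Standard focal loss for true-class probability p:
    the factor (1 - p)**gamma shrinks the loss of easy examples."""
    return -((1.0 - p) ** gamma) * math.log(p)
```

With gamma = 0 this reduces to ordinary cross-entropy, while larger gamma focuses training on hard, misclassified examples; the abstract's "improved" variant would modify this baseline.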
Ethics declarations
Conflict of interest
The authors have no conflict of interest to declare.
Additional information
Responsible editor: Charalampos Tsourakakis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Wang, W., Ruan, W. & Meng, X. MODE-Bi-GRU: orthogonal independent Bi-GRU model with multiscale feature extraction. Data Min Knowl Disc 38, 154–172 (2024). https://doi.org/10.1007/s10618-023-00964-2