Convolutional Neural Network for Core Sections Identification in Scientific Research Publications

Muhammad, Bello Aliyu; Iqbal, Rahat; James, Anne; Nkantah, Dianabasi

doi:10.1007/978-3-030-33607-3_29


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11871)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning


Abstract

The overwhelming volume of data generated online continues to grow at an exponential and unprecedented rate. Over 80% of such data is unstructured, and scientific research publications constitute a significant portion of it. Systematic literature review (SLR) is a rigorous and challenging process whose key challenge is the automatic extraction of relevant data from the sheer volume of research publications. The lack of a unified framework has been identified as the key problem. A canonical model, based on the structure of the papers, was previously proposed as the framework for data extraction in SLR. Treated as a classification problem, the canonical model was realised with traditional machine learning models, which reported good accuracy but left room for improvement. This paper presents results on the same problem using a convolutional neural network (CNN), a deeper and more sophisticated model. The results show an improvement over the traditional machine learning models, with an accuracy of 85%. Unlike previous CNN work in NLP, this work also demonstrates the application of a CNN to a larger NLP dataset, namely data from scientific research publications. The results also show that the CNN performs even better on NLP tasks with larger datasets.
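
To make the classification setup in the abstract concrete, the following is a minimal sketch of the forward pass of a sentence-level CNN classifier: word embeddings, convolution filters of several widths, max-over-time pooling, and a softmax layer. It is not the authors' architecture. The abstract does not specify one, so the label set, vocabulary, filter widths, and all function names here are hypothetical, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical section labels; the abstract does not list the actual classes.
LABELS = ["background", "method", "result"]


def embed(tokens, vocab, table):
    """Look up one embedding vector per token; unknown tokens use index 0."""
    return [table[vocab.get(t, 0)] for t in tokens]


def conv1d_relu(seq, filt):
    """Slide one filter (len(filt) word positions wide) over seq, then apply ReLU."""
    width = len(filt)
    out = []
    for i in range(len(seq) - width + 1):
        s = 0.0
        for k in range(width):
            s += sum(w * x for w, x in zip(filt[k], seq[i + k]))
        out.append(max(0.0, s))
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_model(vocab_size, dim=8, widths=(2, 3), filters_per_width=4, seed=0):
    """Create random (untrained) parameters; all sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    table = [[rand() for _ in range(dim)] for _ in range(vocab_size)]
    filters = [[[rand() for _ in range(dim)] for _ in range(w)]
               for w in widths for _ in range(filters_per_width)]
    dense = [[rand() for _ in range(len(filters))] for _ in LABELS]
    return {"table": table, "filters": filters, "dense": dense}


def predict_proba(tokens, vocab, model):
    """Return one probability per label for a tokenised sentence."""
    dim = len(model["table"][0])
    seq = embed(tokens, vocab, model["table"])
    # Zero-pad short sentences so every filter fits at least once.
    widest = max(len(f) for f in model["filters"])
    seq += [[0.0] * dim] * max(0, widest - len(seq))
    # Max-over-time pooling: keep the strongest response of each filter.
    feats = [max(conv1d_relu(seq, f)) for f in model["filters"]]
    scores = [sum(w * x for w, x in zip(row, feats)) for row in model["dense"]]
    return softmax(scores)


# Toy usage with a hypothetical five-word vocabulary.
vocab = {"<unk>": 0, "we": 1, "propose": 2, "a": 3, "model": 4}
model = init_model(len(vocab))
probs = predict_proba("we propose a model".split(), vocab, model)
print(dict(zip(LABELS, probs)))
```

With untrained weights the printed probabilities carry no meaning. In practice the embeddings, filters, and dense layer would be learned from labelled sentences, typically with a deep learning framework rather than hand-written loops.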



Author information

Authors and Affiliations

Coventry University, Coventry, UK
Bello Aliyu Muhammad, Rahat Iqbal, Anne James & Dianabasi Nkantah


Corresponding author

Correspondence to Bello Aliyu Muhammad.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Technical University of Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
University of Exeter, Exeter, UK
Ronaldo Menezes
University of Manchester, Manchester, UK
Richard Allmendinger


About this paper

Cite this paper

Muhammad, B.A., Iqbal, R., James, A., Nkantah, D. (2019). Convolutional Neural Network for Core Sections Identification in Scientific Research Publications. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. IDEAL 2019. Lecture Notes in Computer Science, vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_29


DOI: https://doi.org/10.1007/978-3-030-33607-3_29
Published: 18 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33606-6
Online ISBN: 978-3-030-33607-3
eBook Packages: Computer Science, Computer Science (R0)
