Convolutional Neural Network for Core Sections Identification in Scientific Research Publications

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2019 (IDEAL 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11871)

Abstract

The volume of data generated online continues to grow at an unprecedented, exponential rate. Over 80% of such data is unstructured, and scientific research publications constitute a significant portion of it. Systematic literature review (SLR) is a rigorous and challenging process. The key challenge in SLR is the automatic extraction of relevant data from the sheer volume of research publications, and the lack of a unified framework has been identified as the key problem. A canonical model, based on the structure of the papers, was proposed as the framework for data extraction in SLR. Framed as a classification problem, the canonical model was realised with traditional machine learning models, which reported good accuracy but left room for improvement. This paper presents results on the same problem using a convolutional neural network (CNN), a deeper and more sophisticated model. The results show an improvement over the traditional machine learning models, with an accuracy of 85%. Unlike previous CNN work in NLP, this work also demonstrates the application of CNN to a larger NLP dataset, such as data from scientific research publications. The results also show that the CNN performs even better on NLP tasks with larger datasets.

Author information

Corresponding author

Correspondence to Bello Aliyu Muhammad.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Muhammad, B.A., Iqbal, R., James, A., Nkantah, D. (2019). Convolutional Neural Network for Core Sections Identification in Scientific Research Publications. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. IDEAL 2019. Lecture Notes in Computer Science, vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_29

  • DOI: https://doi.org/10.1007/978-3-030-33607-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33606-6

  • Online ISBN: 978-3-030-33607-3

  • eBook Packages: Computer Science, Computer Science (R0)
