Automatic zone identification in scientific papers via fusion techniques

Scientometrics

Abstract

Zone identification is a text-mining task that helps researchers make effective use of the content of scientific papers. Its main aim is to classify the sentences of scientific texts into predefined zone categories, which can be useful for summarization as well as information extraction. In this paper, we propose a two-level approach to zone identification. The first level classifies the sentences of a given paper based on semantic and lexical features; for this purpose, several machine learning algorithms, including Simple Logistic, Logistic Model Trees and Sequential Minimal Optimization, are applied. The second level applies fusion to the first-level classification results of consecutive sentences to make the final decision. The proposed method is evaluated on two well-known data sets, the ART and DRI corpora, achieving zone identification accuracies of 65.75% and 84.15%, respectively, which appear promising compared with the results of previous approaches.
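
The abstract does not state which fusion rule the second level uses. The following is a minimal sketch assuming that each first-level classifier outputs a probability distribution over zones for every sentence, and that the second level takes a weighted sum of the distributions of a sentence and its immediate neighbours before choosing the highest-scoring zone (a standard weighted-average combination rule of the kind surveyed by Kuncheva, 2014). The zone names, the window weights, the example probabilities and the function name `fuse_consecutive` are all hypothetical illustrations, not the authors' settings.

```python
from typing import Dict, List, Sequence

# Hypothetical zone labels for illustration only; ART and DRI each
# define their own (larger) category sets.
ZONES = ("Background", "Method", "Result")

# A first-level output: class-probability distribution for one sentence.
Distribution = Dict[str, float]


def fuse_consecutive(
    probs: Sequence[Distribution],
    weights: Sequence[float] = (0.25, 0.5, 0.25),
) -> List[str]:
    """Hypothetical second level: weighted-sum fusion over a window of
    consecutive sentences, followed by an argmax over zones.

    `weights` gives the weight of each window position, centred on the
    current sentence (default: previous, current, next). The values are
    illustrative assumptions, not settings reported in the paper.
    """
    if len(weights) % 2 != 1:
        raise ValueError("window must have odd length so it is centred")
    half = len(weights) // 2
    labels: List[str] = []
    for i in range(len(probs)):
        scores = {zone: 0.0 for zone in ZONES}
        # Offsets run from -half to +half around sentence i.
        for offset, w in enumerate(weights, start=-half):
            j = i + offset
            if 0 <= j < len(probs):  # ignore positions before/after the paper
                for zone in ZONES:
                    scores[zone] += w * probs[j].get(zone, 0.0)
        labels.append(max(ZONES, key=lambda z: scores[z]))
    return labels


# Made-up first-level outputs for four consecutive sentences.
first_level = [
    {"Background": 0.70, "Method": 0.20, "Result": 0.10},
    {"Background": 0.40, "Method": 0.45, "Result": 0.15},  # ambiguous
    {"Background": 0.60, "Method": 0.30, "Result": 0.10},
    {"Background": 0.10, "Method": 0.20, "Result": 0.70},
]

print(fuse_consecutive(first_level, weights=(1.0,)))  # no fusion
# ['Background', 'Method', 'Background', 'Result']
print(fuse_consecutive(first_level))
# ['Background', 'Background', 'Background', 'Result']
```

On its own, the second sentence would be labelled Method, but its two Background neighbours pull the fused decision to Background. Whether this kind of smoothing helps depends on how zones are distributed across consecutive sentences in a corpus, and the rule used in the paper may differ from this one.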

Figs. 1–5 (images not included)

References

  • Agarwal, S., & Yu, H. (2009). Automatically classifying sentences in full-text biomedical articles into introduction, methods, results and discussion. Bioinformatics, 25(23), 3174–3180.

  • Badie, K., Asadi, N., & Tayefeh Mahmoudi, M. (2018). Zone identification based on features with high semantic richness and combining results of separate classifiers. Journal of Information and Telecommunication, 2(4), 411–427.

  • Barua, S. (2013). Multi-sensor information fusion for classification of driver’s physiological sensor data. Master's thesis, Mälardalen University, Sweden.

  • Castanedo, F. (2013). A review of data fusion techniques. The Scientific World Journal, 2013, 1–19.

  • Dasigi, V., Mann, R. C., & Protopopescu, V. A. (2001). Information fusion for text classification: An experimental comparison. Pattern Recognition, 34(12), 2413–2425.

  • Fisas, B., Saggion, H., & Ronzano, F. (2015). On the discoursive structure of computer graphics research papers. In LAW@NAACL-HLT (pp. 42–51).

  • Groza, T. (2013). Using typed dependencies to study and recognise conceptualisation zones in biomedical literature. PLoS ONE, 8(11), e79570.

  • Groza, T., Hassanzadeh, H., & Hunter, J. (2013). Recognizing scientific artifacts in biomedical literature. Biomedical Informatics Insights, 6, 15.

  • Guo, Y., Korhonen, A., & Poibeau, T. (2011). A weakly-supervised approach to argumentative zoning of scientific documents. In Proceedings of the conference on empirical methods in natural language processing (pp. 273–283). Association for Computational Linguistics.

  • Guo, Y., Korhonen, A., Silins, I., & Stenius, U. (2011). Weakly supervised learning of information structure of scientific abstracts: Is it accurate enough to benefit real-world tasks in biomedicine? Bioinformatics, 27(22), 3179–3185.

  • Guo, Y., Reichart, R., & Korhonen, A. (2015). Unsupervised declarative knowledge induction for constraint-based learning of information structure in scientific documents. Transactions of the Association for Computational Linguistics, 3, 131–143.

  • Guo, Y., Silins, I., Stenius, U., & Korhonen, A. (2013). Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review. Bioinformatics, 29(11), 1440–1447.

  • Heffernan, K., & Teufel, S. (2018). Identifying problems and solutions in scientific text. Scientometrics, 116(2), 1367–1382.

  • Hirohata, K., Okazaki, N., Ananiadou, S., & Ishizuka, M. (2008). Identifying sections in scientific abstracts using conditional random fields. In Proceedings of the third international joint conference on natural language processing: Volume-I.

  • Holmes, G., Donkin, A., & Witten, I. H. (1994). WEKA: A machine learning workbench. In Proceedings of the second Australian and New Zealand conference on intelligent information systems (pp. 357–361). IEEE.

  • Kiela, D., Guo, Y., Stenius, U., & Korhonen, A. (2014). Unsupervised discovery of information structure in biomedical documents. Bioinformatics, 31(7), 1084–1092.

  • Kilicoglu, H. (2018). Biomedical text mining for research rigor and integrity: Tasks, challenges, directions. Briefings in Bioinformatics, 19(6), 1400–1414.

  • Kuncheva, L. I. (2014). Combining pattern classifiers: Methods and algorithms (2nd ed.). New York: Wiley.

  • Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1–2), 161–205.

  • Liakata, M., Dobnik, S., Saha, S., Batchelor, C. R., & Rebholz-Schuhmann, D. (2013). A discourse-driven content model for summarising scientific articles evaluated in a complex question answering task. In EMNLP (pp. 747–757).

  • Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C. R., et al. (2010). Corpora for the conceptualisation and zoning of scientific papers. In LREC.

  • Liakata, M., Saha, S., Dobnik, S., Batchelor, C., & Rebholz-Schuhmann, D. (2012). Automatic recognition of conceptualization zones in scientific articles and two life science applications. Bioinformatics, 28(7), 991–1000.

  • Mangai, U. G., Samanta, S., Das, S., & Chowdhury, P. R. (2010). A survey of decision fusion and feature fusion strategies for pattern classification. IETE Technical Review, 27(4), 293–307.

  • Mann, G. S., & McCallum, A. (2010). Generalized expectation criteria for semi-supervised learning with weakly labeled data. Journal of Machine Learning Research, 11, 955–984.

  • Mizuta, Y., & Collier, N. (2004). Zone identification in biology articles as a basis for information extraction. In Proceedings of the international joint workshop on natural language processing in biomedicine and its applications (pp. 29–35). Association for Computational Linguistics.

  • Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization. In Advances in kernel methods (pp. 185–208).

  • Rajesh, P., & Karthikeyan, M. (2017). A comparative study of data mining algorithms for decision tree approaches using WEKA tool. Advances in Natural and Applied Sciences, 11(9), 230–243.

  • Ronzano, F., & Saggion, H. (2016). Knowledge extraction and modeling from scientific publications. In International workshop on semantic, analytics, visualization (pp. 11–25). Springer.

  • Saggion, H., & Ronzano, F. (2016). Natural language processing for intelligent access to scientific information. In COLING (Tutorials) (pp. 9–13).

  • Sarinnapakorn, K., & Kubat, M. (2007). Combining subclassifiers in text categorization: A DST-based solution and a case study. IEEE Transactions on Knowledge and Data Engineering, 19(12), 1638–1651.

  • Soldatova, L., & Liakata, M. (2007). An ontology methodology and CISP: The proposed core information about scientific papers. JISC Project Report.

  • Suanmali, L., Binwahlan, M. S., & Salim, N. (2009). Sentence features fusion for text summarization using fuzzy logic. In Ninth international conference on hybrid intelligent systems (Vol. 1, pp. 142–146). IEEE.

  • Sumner, M., Frank, E., & Hall, M. (2005). Speeding up logistic model tree induction. In European conference on principles of data mining and knowledge discovery (pp. 675–683). Springer.

  • Teufel, S. (2000). Argumentative zoning: Information extraction from scientific text. Ph.D. thesis, University of Edinburgh.

  • Teufel, S., & Kan, M. Y. (2011). Robust argumentative zoning for sensemaking in scholarly documents. In Advanced language technologies for digital libraries (pp. 154–170). Springer.

  • Teufel, S., Siddharthan, A., & Batchelor, C. (2009). Towards discipline-independent argumentative zoning: Evidence from chemistry and computational linguistics. In Proceedings of the 2009 conference on empirical methods in natural language processing (Vol. 3, pp. 1493–1502). Association for Computational Linguistics.

  • Teufel, S., & Moens, M. (1999). Argumentative classification of extracted sentences as a first step towards flexible abstracting. In Advances in automatic text summarization (pp. 155–171).

  • Uma Shankar, B., Meher, S., Ghosh, A., & Bruzzone, L. (2006). Remote sensing image classification: A neuro-fuzzy MCS approach. In Computer vision, graphics and image processing (pp. 128–139).

  • Uysal, A. K. (2016). An improved global feature selection scheme for text classification. Expert Systems with Applications, 43, 82–92.

  • Ware, M., & Mabe, M. (2015). The STM report: An overview of scientific and scholarly journal publishing. Oxford: International Association of Scientific, Technical and Medical Publishers.

  • Wilbur, W. J., Rzhetsky, A., & Shatkay, H. (2006). New directions in biomedical text annotation: Definitions, guidelines and corpus construction. BMC Bioinformatics, 7(1), 356.

Author information

Correspondence to Maryam Tayefeh Mahmoudi.

About this article

Cite this article

Asadi, N., Badie, K. & Mahmoudi, M.T. Automatic zone identification in scientific papers via fusion techniques. Scientometrics 119, 845–862 (2019). https://doi.org/10.1007/s11192-019-03060-9

  • DOI: https://doi.org/10.1007/s11192-019-03060-9