Skip to main content

Knowledge Extraction and Modeling from Scientific Publications

  • Conference paper
  • First Online:
Semantics, Analytics, Visualization. Enhancing Scholarly Data (SAVE-SD 2016)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9792))

Included in the following conference series:

Abstract

During the last decade the amount of scientific articles available online has substantially grown in parallel with the adoption of the Open Access publishing model. Nowadays researchers, as well as any other interested actor, are often overwhelmed by the enormous and continuously growing amount of publications to consider in order to perform any complete and careful assessment of scientific literature. As a consequence, new methodologies and automated tools to ease the extraction, semantic representation and browsing of information from papers are necessary. We propose a platform to automatically extract, enrich and characterize several structural and semantic aspects of scientific publications, representing them as RDF datasets. We analyze papers by relying on the scientific Text Mining Framework developed in the context of the European Project Dr. Inventor. We evaluate how the Framework supports two core scientific text analysis tasks: rhetorical sentence classification and extractive text summarization. To ease the exploration of the distinct facets of scientific knowledge extracted by our platform, we present a set of tailored Web visualizations. We provide on-line access to both the RDF datasets and the Web visualizations generated by mining the papers of the 2015 ACL-IJCNLP Conference.

This work is (partly) supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502) and by the European Project Dr. Inventor (FP7-ICT-2013.8.1 - Grant no: 611383).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://www.ncbi.nlm.nih.gov/pubmed.

  2. 2.

    http://www.scopus.com/.

  3. 3.

    http://www.webofknowledge.com/.

  4. 4.

    https://doaj.org/.

  5. 5.

    http://jats.nlm.nih.gov/.

  6. 6.

    http://www.elsevier.com/author-schemas/elsevier-xml-dtds-and-transport-schemas.

  7. 7.

    https://rawgit.com/essepuntato/rash/master/documentation/index.html.

  8. 8.

    http://backingdata.org/dri/library/.

  9. 9.

    http://pdfx.cs.man.ac.uk/.

  10. 10.

    http://www.bibsonomy.org/help/doc/api.html.

  11. 11.

    http://search.crossref.org/help/api.

  12. 12.

    http://freecite.library.brown.edu/welcome.

  13. 13.

    https://code.google.com/p/mate-tools/.

  14. 14.

    http://sempub.taln.upf.edu/dricorpus.

  15. 15.

    http://babelfy.org/.

  16. 16.

    http://babelnet.org/.

  17. 17.

    http://www.taln.upf.edu/pages/summa.upf/.

  18. 18.

    Rouge-2 is a measure which compares n-grams in automatic summaries to n-grams in gold stadard summaries.

  19. 19.

    Download link: http://backingdata.org/dri/viz/.

  20. 20.

    http://www.sparontologies.net/.

  21. 21.

    http://backingdata.org/dri/viz/.

References

  1. Munroe, R.: The rise of open access. Science 342(6154), 58–59 (2013). https://www.sciencemag.org/content/342/6154/58.full

    Article  Google Scholar 

  2. Björk, B.C., Laakso, M., Welling, P., Paetau, P.: Anatomy of green open access. J. Assoc. Inf. Sci. Technol. 65(2), 237–250 (2014)

    Article  Google Scholar 

  3. Solomon, D.J., Laakso, M., Björk, B.C.: A longitudinal comparison of citation rates and growth among open access journals. J. Inf. 7(3), 642–650 (2013)

    Article  Google Scholar 

  4. Lewis, D.W.: The inevitability of open access. Coll. Res. Libr. 73(5), 493–506 (2012)

    Article  Google Scholar 

  5. Huh, S.: Coding practice of the journal article tag suite extensible markup language. Sci. Editing 1(2), 105–112 (2014)

    Article  Google Scholar 

  6. Constantin, A., Pettifer, S., Voronkov, A.: PDFX: fully-automated PDF-to-XML conversion of scientific literature. In: Proceedings of the 2013 ACM Symposium on Document Engineering, pp. 177–180. ACM (2013)

    Google Scholar 

  7. Bohnet, B.: Very high accuracy and fast dependency parsing is not a contradiction. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 89–97. Association for Computational Linguistics (2010)

    Google Scholar 

  8. Tkaczyk, D., Szostek, P., Dendek, P.J., Fedoryszak, M., Bolikowski, L.: CERMINE-automatic extraction of metadata and references from scientific literature. In: 2014 11th IAPR International Workshop on Document Analysis Systems (DAS), pp. 217–221. IEEE (2014)

    Google Scholar 

  9. Ramakrishnan, C., Patnia, A., Hovy, E.H., Burns, G.A.: Layout-aware text extraction from full-text PDF of scientific articles. Sour. Code Biol. Med. 7(1), 7 (2012)

    Article  Google Scholar 

  10. Peng, F., McCallum, A.: Information extraction from research papers using conditional random fields. Inf. Process. Manage. 42(4), 963–979 (2006)

    Article  Google Scholar 

  11. Do, H.H.N., Chandrasekaran, M.K., Cho, P.S., Kan, M.Y.: Extracting and matching authors and affiliations in scholarly documents. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 219–228. ACM (2013)

    Google Scholar 

  12. Councill, I.G., Giles, C.L., Kan, M.Y.: ParsCit: an open-source CRF reference string parsing package. In: LREC (2008)

    Google Scholar 

  13. Luong, M.T., Nguyen, T.D., Kan, M.Y.: Logical structure recovery in scholarly articles with rich document features. In: Multimedia Storage and Retrieval Innovations for Digital Library Systems, vol. 270 (2012)

    Google Scholar 

  14. Liakata, M., Saha, S., Dobnik, S., Batchelor, C., Rebholz-Schuhmann, D.: Automatic recognition of conceptualization zones in scientific articles and two life science applications. Bioinformatics 28(7), 991–1000 (2012)

    Article  Google Scholar 

  15. Teufel, S.: The structure of scientific articles: applications to citation indexing and summarization. Comput. Linguist. 38(2), 443–445 (2012)

    Article  Google Scholar 

  16. Nakov, P.I., Schwartz, A.S., Hearst, M.: Citances: citation sentences for semantic analysis of bioscience text. In: Proceedings of the SIGIR 2004 Workshop on Search and Discovery in Bioinformatics, pp. 81–88 (2004)

    Google Scholar 

  17. Abu-Jbara, A., Ezra, J., Radev, D.R.: Purpose and polarity of citation: towards NLP-based bibliometrics. In: HLT-NAACL, pp. 596–606 (2013)

    Google Scholar 

  18. Abu-Jbara, A., Radev, D.: Coherent citation-based summarization of scientific papers. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 500–509. Association for Computational Linguistics (2011)

    Google Scholar 

  19. Ronzano, F., Saggion, H.: Taking advantage of citances: citation scope identification and citation-based summarization. In: Text Analytics Conference (2014)

    Google Scholar 

  20. Smit, E., Van Der Graaf, M.: Journal article mining: the scholarly publishers’ perspective. Learn. Publ. 25(1), 35–46 (2012)

    Article  Google Scholar 

  21. Ciancarini, P., Iorio, A., Nuzzolese, A.G., Peroni, S., Vitali, F.: Semantic annotation of scholarly documents and citations. In: Baldoni, M., Baroglio, C., Boella, G., Micalizio, R. (eds.) AI*IA 2013. LNCS (LNAI), vol. 8249, pp. 336–347. Springer, Cham (2013). doi:10.1007/978-3-319-03524-6_29

    Chapter  Google Scholar 

  22. Sateli, B., Witte, R.: What’s in this paper?: Combining rhetorical entities with linked open data for semantic literature querying. In: Proceedings of the 24th International Conference on World Wide Web Companion, pp. 1023–1028 (2015)

    Google Scholar 

  23. Shotton, D.: Semantic publishing: the coming revolution in scientific journal publishing. Learn. Publ. 22(2), 85–94 (2009)

    Article  Google Scholar 

  24. Iorio, A.D., Lange, C., Dimou, A., Vahdati, S.: Semantic publishing challenge – assessing the quality of scientific output by information extraction and interlinking. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds.) SemWebEval 2015. CCIS, vol. 548, pp. 65–80. Springer, Cham (2015). doi:10.1007/978-3-319-25518-7_6

    Chapter  Google Scholar 

  25. Tkaczyk, D., Szostek, P., Fedoryszak, M., Dendek, P.J., Bolikowski, Ł.: CERMINE: automatic extraction of structured metadata from scientific literature. Int. J. Doc. Anal. Recogn. (IJDAR) 18(4), 317–335 (2015)

    Article  Google Scholar 

  26. Ronzano, F., Saggion, H.: Dr. Inventor framework: extracting structured information from scientific publications. In: Japkowicz, N., Matwin, S. (eds.) DS 2015. LNCS (LNAI), vol. 9356, pp. 209–220. Springer, Cham (2015). doi:10.1007/978-3-319-24282-8_18

    Chapter  Google Scholar 

  27. Cunningham, H., Tablan, V., Roberts, A., Bontcheva, K.: Getting more out of biomedical documents with GATE’s full lifecycle open source text analytics. PLoS Comput. Biol. 9(2), e1002854 (2013)

    Article  Google Scholar 

  28. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT press, Cambridge (2002)

    Google Scholar 

  29. Fisas, B., Ronzano, F., Saggion, H.: On the discoursive structure of computer graphics research papers. In: The 9th Linguistic Annotation Workshop held in Conjuncion with NAACL 2015, p. 42 (2015)

    Google Scholar 

  30. Fisas, B., Ronzano, F., Saggion, H.: A multi-layered annotated corpus of scientific papers. In: The Language Resource and Evaluation Conference (2016)

    Google Scholar 

  31. Mihalcea, R.: Graph-based ranking algorithms for sentence extraction, applied to text summarization. In: Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, p. 20. Association for Computational Linguistics (2004)

    Google Scholar 

  32. Lin, C.Y.: Rouge: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, vol. 8 (2004)

    Google Scholar 

  33. Moro, A., Cecconi, F., Navigli, R.: Multilingual word sense disambiguation and entity linking for everybody. In: Proceedings of ISWC (P&D), pp. 25–28 (2014)

    Google Scholar 

  34. Saggion, H.: SUMMA: a robust and adaptable summarization tool. Traitement Automatique des Langues 49(2), 103–125 (2008)

    Google Scholar 

  35. Ronzano, F., Fisas, B., Bosque, G.C., Saggion, H.: On the automated generation of scholarly publishing linked datasets: the case of CEUR-WS proceedings. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds.) SemWebEval 2015. CCIS, vol. 548, pp. 177–188. Springer, Cham (2015). doi:10.1007/978-3-319-25518-7_15

    Chapter  Google Scholar 

  36. Peroni, S.: The semantic publishing and referencing ontologies. In: Peroni, S. (ed.) Semantic Web Technologies and Legal Scholarly Publishing. Law, Governance and Technology Series, vol. 15, pp. 121–193. Springer, Heidelberg (2014)

    Google Scholar 

  37. Thakker, D., Osman, T., Lakin, P.: Gate jape grammar tutorial. Nottingham Trent University, UK, Phil Lakin, UK, Version 1 (2009)

    Google Scholar 

  38. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco (2005)

    MATH  Google Scholar 

  39. O’Donoghue, D.P., Abgaz, Y., Hurley, D., Ronzano, F., Saggion, H.: Stimulating and simulating creativity with Dr. Inventor. In: The Proceedings of the International Conference on Computational Creativity (2015)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Francesco Ronzano .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ronzano, F., Saggion, H. (2016). Knowledge Extraction and Modeling from Scientific Publications. In: González-Beltrán, A., Osborne, F., Peroni, S. (eds) Semantics, Analytics, Visualization. Enhancing Scholarly Data. SAVE-SD 2016. Lecture Notes in Computer Science(), vol 9792. Springer, Cham. https://doi.org/10.1007/978-3-319-53637-8_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-53637-8_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53636-1

  • Online ISBN: 978-3-319-53637-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics