Abstract
High-precision linguistic and semantic analysis of scientific texts is an emerging research area. We describe methods and an application for extracting interesting factual relations from scientific texts in computational linguistics and language technology. We use a hybrid NLP architecture with shallow preprocessing for increased robustness and domain-specific, ontology-based named entity recognition, followed by a deep HPSG parser running the English Resource Grammar (ERG). The extracted relations, represented in the Minimal Recursion Semantics (MRS) format, are simplified and generalized using WordNet. The resulting ‘quriples’ are stored in a database, from which they can be retrieved by relation-based search. The query interface is embedded in a web browser-based application we call the Scientist’s Workbench, which supports researchers in editing scientific papers and searching them online.
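The storage and retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the structure of a quriple or the database schema, so the subject/predicate/object/source columns, the function names, and the small `PREDICATE_CLASSES` table (a hand-written stand-in for WordNet-based generalization) are all assumptions made for illustration, and the sample sentences are invented.

```python
import sqlite3

# Hypothetical stand-in for WordNet-based generalization: each predicate
# lemma is mapped to a representative lemma. A real system would consult
# WordNet instead of this hand-written table.
PREDICATE_CLASSES = {
    "outperform": "outperform",
    "surpass": "outperform",
    "beat": "outperform",
    "use": "use",
    "employ": "use",
}


def generalize(predicate):
    """Return the representative lemma, or the lowercased input if unknown."""
    lemma = predicate.lower()
    return PREDICATE_CLASSES.get(lemma, lemma)


def create_store():
    """Create an in-memory database with an assumed relation schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relations "
        "(subject TEXT, predicate TEXT, object TEXT, source TEXT)"
    )
    return conn


def add_relation(conn, subject, predicate, obj, source):
    """Store one simplified relation with a generalized predicate."""
    conn.execute(
        "INSERT INTO relations VALUES (?, ?, ?, ?)",
        (subject.lower(), generalize(predicate), obj.lower(), source),
    )


def search(conn, subject=None, predicate=None, obj=None):
    """Relation-based search: any argument left as None acts as a wildcard."""
    clauses, params = [], []
    if subject is not None:
        clauses.append("subject = ?")
        params.append(subject.lower())
    if predicate is not None:
        clauses.append("predicate = ?")
        params.append(generalize(predicate))
    if obj is not None:
        clauses.append("object = ?")
        params.append(obj.lower())
    sql = "SELECT subject, predicate, object, source FROM relations"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY rowid"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    store = create_store()
    # Invented example relations, not taken from the paper.
    add_relation(store, "The parser", "employ", "a grammar", "doc-1")
    add_relation(store, "Method A", "surpass", "Method B", "doc-2")
    # A query for "use" also finds the relation stored from "employ".
    for row in search(store, predicate="use"):
        print(row)
```

Because predicates are generalized both when a relation is stored and when a query is issued, a search for one verb also retrieves relations extracted from its listed synonyms, which is the practical benefit of the generalization step in a relation-based search.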
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schäfer, U., Uszkoreit, H., Federmann, C., Marek, T., Zhang, Y. (2008). Extracting and Querying Relations in Scientific Papers. In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius, F., Roth-Berghofer, T.R. (eds) KI 2008: Advances in Artificial Intelligence. KI 2008. Lecture Notes in Computer Science, vol 5243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85845-4_16
DOI: https://doi.org/10.1007/978-3-540-85845-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85844-7
Online ISBN: 978-3-540-85845-4
eBook Packages: Computer Science, Computer Science (R0)