Advances in Deep Parsing of Scholarly Paper Content

  • Ulrich Schäfer
  • Bernd Kiefer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6699)


We report on advances in deep linguistic parsing of the full textual content of 8200 papers from the ACL Anthology, a collection of electronically available scientific papers in the fields of Computational Linguistics and Language Technology.

We describe how, by incorporating new techniques, we increase both the speed and the robustness of deep analysis, specifically on long sentences, where deep parsing often failed in former approaches. With the current open-source HPSG (Head-driven Phrase Structure Grammar) grammar for English, the English Resource Grammar (ERG), we obtain deep parses for more than 85% of the sentences in the 1.5-million-sentence corpus, whereas the former approaches achieved only approximately 65% coverage.

The resulting sentence-wise semantic representations are used in the Scientist’s Workbench, a platform demonstrating the use and benefit of natural language processing (NLP) in giving scientists and other knowledge workers faster and better access to digital document content. With the generated NLP annotations, we are able to implement important, novel applications such as robust semantic search, citation classification, and (in the future) question answering and definition exploration.


Keywords: Semantic Similarity; Sentence Length; Computational Linguistics; Paper Corpus; Parse Time
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ulrich Schäfer (1)
  • Bernd Kiefer (1)
  1. Language Technology Lab, German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
