(German) Language Processing for Lucene

  • Bastian Entrup
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9103)


This paper introduces an open-source Java package called German Language Processing for Lucene (glp4lucene). Although it was originally developed for German texts, it is largely language-independent. It aims to make four language processing steps available to anyone working with non-English texts in Apache Lucene/Solr: lemmatizing words, weighting terms by their part of speech, adding synonyms, and decompounding nouns. Using these steps does not require a thorough understanding of natural language processing.
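The abstract names the four steps but not how they might combine when a query is built. The following plain-Java class is a minimal sketch, explicitly not glp4lucene's own API. It assumes the tokens have already been lemmatized and tagged with STTS part-of-speech tags by some external tool. Each lemma is boosted by a per-POS weight, expanded with synonyms from a toy table (standing in for a lexical resource such as GermaNet), and, for common nouns, expanded with the parts found by a greedy dictionary-based decompounder. The result is a string in Lucene's classic query-parser syntax. All names, weights, and word lists (`GermanQuerySketch`, `POS_WEIGHTS`, `SYNONYMS`, `LEXICON`) are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

/** Hypothetical sketch, NOT the glp4lucene API: four steps applied to pre-analysed query tokens. */
public class GermanQuerySketch {

    // A token as an external lemmatizer/tagger might deliver it (assumption): surface form, lemma, STTS tag.
    record Token(String word, String lemma, String pos) {}

    // Hypothetical weights per STTS tag; tags missing from the map (e.g. ART, APPRART) are dropped.
    static final Map<String, Float> POS_WEIGHTS = Map.of(
            "NN", 2.0f, "NE", 2.0f, "VVFIN", 1.0f, "ADJA", 0.5f);

    // Toy synonym table standing in for a real lexical-semantic resource.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "haus", List.of("gebäude"));

    // Toy lexicon of known simplex words used for decompounding.
    static final Set<String> LEXICON = Set.of("bibliothek", "katalog", "haus", "tür");

    /**
     * Greedy longest-prefix split into lexicon words, allowing a linking "s" (Fugen-s)
     * between parts. Returns null if the word cannot be split completely.
     */
    static List<String> decompound(String word) {
        if (word.isEmpty()) {
            return new ArrayList<>(); // nothing left: successful split
        }
        for (int i = word.length(); i >= 3; i--) { // parts shorter than 3 letters are ignored
            String head = word.substring(0, i);
            if (!LEXICON.contains(head)) {
                continue;
            }
            String rest = word.substring(i);
            List<String> tail = decompound(rest);
            if (tail == null && rest.startsWith("s")) {
                tail = decompound(rest.substring(1)); // skip the linking "s"
            }
            if (tail != null) {
                tail.add(0, head);
                return tail;
            }
        }
        return null;
    }

    /** Builds a boosted query string such as "(haus OR gebäude)^2.0". */
    static String toQuery(List<Token> tokens) {
        StringJoiner clauses = new StringJoiner(" ");
        for (Token t : tokens) {
            Float weight = POS_WEIGHTS.get(t.pos());
            if (weight == null) {
                continue; // POS carries no weight in this toy table
            }
            String lemma = t.lemma().toLowerCase(Locale.GERMAN);
            List<String> terms = new ArrayList<>();
            terms.add(lemma);
            terms.addAll(SYNONYMS.getOrDefault(lemma, List.of()));
            if (t.pos().equals("NN")) { // only common nouns are decompounded
                List<String> parts = decompound(lemma);
                if (parts != null && parts.size() > 1) {
                    terms.addAll(parts);
                }
            }
            clauses.add("(" + String.join(" OR ", terms) + ")^" + weight);
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        List<Token> query = List.of(
                new Token("Bibliothekskataloge", "Bibliothekskatalog", "NN"),
                new Token("im", "in", "APPRART"),
                new Token("Haus", "Haus", "NN"));
        System.out.println(toQuery(query));
    }
}
```

Running `main` prints `(bibliothekskatalog OR bibliothek OR katalog)^2.0 (haus OR gebäude)^2.0`; the preposition is dropped because the toy table assigns its tag no weight. Greedy longest-prefix matching is only the simplest decompounding strategy and can produce wrong splits; a practical decompounder would typically rank candidate splits, for example by corpus frequency.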



This package was created for and within the GeoBib project to facilitate searching the project’s data set and will be used on the project’s planned website. GeoBib is funded by the German Federal Ministry of Education and Research (grant no. 01UG1238A-B).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Applied and Computational Linguistics, Justus-Liebig-Universität Gießen, Giessen, Germany
