PAPAGENO: A Parallel Parser Generator for Operator Precedence Grammars

  • Alessandro Barenghi
  • Ermes Viviani
  • Stefano Crespi Reghizzi
  • Dino Mandrioli
  • Matteo Pradella
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7745)

Abstract

In almost all language processing applications, languages are parsed employing classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential due to their left-to-right state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speedups on abstract parallel machines using a data-parallel approach, practical developments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PAPAGENO) able to produce efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines, while not penalizing sequential operation. The generated parser relies on the properties of Floyd’s operator precedence grammars to provide a naturally parallel implementation of the parsing process. Parsing of each text portion proceeds in parallel and independently, without communication or synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive experiments on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% for most of the tested scenarios.
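
The split, parse-independently, and recombine scheme described in the abstract can be illustrated with a minimal sketch. This is not PAPAGENO's generated code or its API. It is a toy written under several assumptions: a hypothetical expression grammar `E -> E + E | E * E | n` with a hand-written precedence table `PREC`, hypothetical helper names (`tokenize`, `parse_symbols`, `try_reduce`, `parallel_eval`), and a deliberately conservative rule under which a worker reduces a handle only when its left boundary is visible inside its own chunk. Python threads are used only to show that the workers share no state; CPython's global interpreter lock means this sketch is not expected to show a real speedup.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Precedence relations between terminals for the toy grammar
# E -> E + E | E * E | n   ('<' yields, '>' takes precedence, '=' accepts).
PREC = {
    ("#", "n"): "<", ("#", "+"): "<", ("#", "*"): "<", ("#", "#"): "=",
    ("+", "n"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "#"): ">",
    ("*", "n"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "#"): ">",
    ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
}

def tokenize(text):
    """Turn text into ("n", value) and (operator, None) symbols."""
    return [("n", int(t)) if t.isdigit() else (t, None)
            for t in re.findall(r"\d+|[+*]", text)]

def top_terminal(stack):
    """Return the index of the topmost terminal on the stack, or -1."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i][0] != "E":
            return i
    return -1

def try_reduce(stack):
    """Reduce the handle on top of the stack if it lies entirely inside it.

    Returns False when no terminal in a '<' relation is visible to the left,
    i.e. the handle may extend into the previous chunk (assumed rule).
    """
    i = top_terminal(stack)
    kind = stack[i][0]
    start = i if kind == "n" else i - 1
    if start < 1:
        return False
    handle, left = stack[start:], stack[start - 1][0]
    if PREC.get((left, kind)) != "<":
        return False
    if kind == "n" and len(handle) == 1:
        value = handle[0][1]
    elif kind in "+*" and [s[0] for s in handle] == ["E", kind, "E"]:
        a, b = handle[0][1], handle[2][1]
        value = a + b if kind == "+" else a * b
    else:
        return False
    stack[start:] = [("E", value)]
    return True

def parse_symbols(symbols, stack):
    """Shift-reduce over symbols; a reduction that cannot be done is deferred."""
    for sym in symbols:
        if sym[0] != "E":  # already-reduced nonterminals are pushed as-is
            while (i := top_terminal(stack)) >= 0:
                rel = PREC.get((stack[i][0], sym[0]))
                if rel is None:
                    raise SyntaxError(f"unexpected {sym[0]!r} after {stack[i][0]!r}")
                if rel != ">" or not try_reduce(stack):
                    break
        stack.append(sym)
    return stack

def parallel_eval(text, workers=4):
    """Parse chunks independently, then re-parse the joined partial stacks."""
    tokens = tokenize(text)
    size = max(1, -(-len(tokens) // workers))  # ceiling division
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: parse_symbols(c, []), chunks))
    merged = [sym for stack in partial for sym in stack] + [("#", None)]
    final = parse_symbols(merged, [("#", None)])
    if [s[0] for s in final] != ["#", "E", "#"]:
        raise SyntaxError("input is not a valid expression")
    return final[1][1]
```

For example, `parallel_eval("2 + 3 * 4 + 5 * 6 + 7", workers=3)` splits the eleven tokens into three chunks, lets each worker reduce whatever handles are fully visible to it, and finishes with one sequential pass over the concatenated partial stacks. The final pass is the sequential portion of the work; in this sketch its cost depends on how much each worker managed to reduce locally.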

Keywords

Parser generation, Parallel Parsing, Floyd Operator Precedence Grammars

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alessandro Barenghi (1)
  • Ermes Viviani (1)
  • Stefano Crespi Reghizzi (1)
  • Dino Mandrioli (1)
  • Matteo Pradella (1)
  1. Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy