PAPAGENO: A Parallel Parser Generator for Operator Precedence Grammars
In almost all language processing applications, languages are parsed with classical algorithms (such as the LR(1) parsers generated by Bison), which are inherently sequential because of their left-to-right, state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speedups on abstract parallel machines using a data-parallel approach, practical developments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PAPAGENO) that produces efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines without penalizing sequential operation. The generated parsers rely on the properties of Floyd’s operator precedence grammars to provide a naturally parallel implementation of the parsing process. Each text portion is parsed in parallel and independently, without communication or synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive experiments on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% for most of the tested scenarios.
Keywords: Parser generation, Parallel parsing, Floyd operator precedence grammars
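
To give a feel for what a parallel code portion of 80% implies, the sketch below applies Amdahl's law to that figure. It assumes that the "effective parallel code portion" reported in the abstract can be read as the parallelizable fraction in Amdahl's formula; the abstract does not say how the figure was derived, so this is an interpretation. The function name `amdahl_speedup` is hypothetical and is not part of PAPAGENO.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup predicted by Amdahl's law.

    parallel_fraction: share of the running time that can be parallelized
                       (assumed here to match the abstract's 80% figure).
    workers:           number of cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; the parallel part is divided among workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative core counts only; these are not measurements from the paper.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} workers: speedup of at most {amdahl_speedup(0.80, n):.2f}x")
```

Under this reading, a parallel fraction of 0.80 caps the achievable speedup at 5x regardless of the number of cores, since the remaining 20% of the work stays sequential. The speedups actually measured are reported in the paper itself and are not reproduced here.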