Confluent Orthogonal Drawings of Syntax Diagrams
Abstract
We provide a pipeline for generating syntax diagrams (also called railroad diagrams) from context-free grammars. Syntax diagrams are a graphical representation of a context-free language, which we formalize abstractly as a set of mutually recursive nondeterministic finite automata and draw by combining elements from the confluent drawing, layered drawing, and smooth orthogonal drawing styles. Within our pipeline we introduce several heuristics that modify the grammar but preserve the language, improving the aesthetics of the final drawing.
1 Introduction
The languages of computing, such as programming languages and data exchange formats, are typically specified using a finite set of rules called a grammar, and these rules are usually given in Backus–Naur Form or one of its extensions. Backus–Naur Form provides a notation rich enough to express all context-free grammars, and in turn most grammars of practical interest, while being easily machine readable. However, being a purely textual representation, it is perhaps less readable by humans. For this reason, Jensen and Wirth used a graphical representation of context-free grammars, called syntax diagrams, when defining the programming language Pascal [1]. We investigate the problem of generating syntax diagrams for context-free grammars and provide several heuristics optimizing the aesthetics of the resulting drawing. Our work provides the first algorithmic study of this problem and the first system that attempts to optimize the resulting diagram for readability rather than directly translating a given grammar into a diagram.
A context-free grammar for the language of S-expressions in LISP 1.5 [3]

A regular grammar is one in which the production rules all have the form \(A \rightarrow b\), \(A \rightarrow bC\) or \(A \rightarrow \epsilon \), where A and C are nonterminals, b is a terminal, and \(\epsilon \) is the empty string. An example of a regular grammar is the part of the LISP 1.5 grammar defining \(\langle \text {atom part}\rangle \). Languages definable by regular grammars are exactly the regular languages, whose equivalent characterizations include being recognizable by nondeterministic finite automata (NFAs). For these languages, we could use graph drawings of an NFA state graph as a graphical representation, by drawing an st-digraph with edges labeled by terminal symbols. A string \(\sigma \) is in the language if and only if there is a directed path through the graph from s to t such that the concatenation of the edge labels is equal to \(\sigma \). Unfortunately, such a representation will not work for non-regular languages.
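The path condition above can be illustrated with a minimal Python sketch. It is an illustration only: the edge-list representation, the function name `accepts`, and the example graph for the language a b* c are assumptions made for this example, with the empty string standing for an \(\epsilon \) label.

```python
from collections import deque


def accepts(edges, s, t, string):
    """Return True if some directed s-t path spells out `string`.

    `edges` is a list of (u, v, label) triples; the label "" stands for epsilon.
    """
    # Breadth-first search over pairs (vertex, number of characters consumed).
    start = (s, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        u, i = queue.popleft()
        if u == t and i == len(string):
            return True
        for a, b, label in edges:
            # Follow an edge only if its label matches the next characters.
            if a == u and string.startswith(label, i):
                nxt = (b, i + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# Hypothetical st-digraph for the regular language a b* c.
EXAMPLE_EDGES = [("s", "m", "a"), ("m", "m", "b"), ("m", "t", "c")]
```

For instance, `accepts(EXAMPLE_EDGES, "s", "t", "abbc")` holds, while the string `"ab"` is rejected because no path for it ends at t.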
To graphically represent context-free languages we turn to syntax diagrams. Although other authors used syntax diagrams earlier [2], they were popularized by the Pascal User Manual and Report by Jensen and Wirth [1]. The style has been praised for its readability [4] and pedagogical value [5], and has been used by the Smalltalk-80 Blue Book [6], JSON Data Interchange Standard [7], and the W3C technical report on CSS [8]. Several software packages have been created to automate the drawing of syntax diagrams [9, 10, 11]. These packages perform little to no optimization of the drawing, providing only a one-to-one translation of Extended Backus–Naur grammars into syntax diagrams. To our knowledge, there has been no prior algorithmic research on the generation and optimization of syntax diagrams.
We introduce a new formalization for syntax diagrams consisting of a collection of st-digraphs (see, e.g., Fig. 3), each representing the possible expansions of a single nonterminal symbol, with each edge in each graph labeled by either a terminal or a nonterminal symbol. As before, a string is in the language if and only if the string can be represented by a directed path from s to t in the start symbol’s st-digraph. However, when this path would contain a nonterminal symbol, we recurse into the st-digraph corresponding to that symbol. The concatenation of the terminal symbols in the resulting system of recursively generated paths should match the sequence of terminal symbols in the given string.
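The recursive membership test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any existing implementation: the dictionary layout, the names `match_ends` and `in_language`, and the one-nonterminal example system are hypothetical; terminal labels are assumed never to coincide with nonterminal names; and the system is assumed not to be left-recursive, since otherwise the recursion would not terminate.

```python
def match_ends(system, name, text, i):
    """Positions j such that text[i:j] can be derived from nonterminal `name`.

    `system` maps each nonterminal to (edges, s, t), where `edges` is a list
    of (u, v, label) triples and the label "" stands for epsilon.
    """
    edges, s, t = system[name]
    ends = set()
    seen = set()
    stack = [(s, i)]
    while stack:
        u, pos = stack.pop()
        if (u, pos) in seen:
            continue
        seen.add((u, pos))
        if u == t:
            ends.add(pos)
        for a, b, label in edges:
            if a != u:
                continue
            if label in system:
                # Nonterminal edge: recurse into the st-digraph of that symbol.
                for j in match_ends(system, label, text, pos):
                    stack.append((b, j))
            elif text.startswith(label, pos):
                # Terminal edge (or epsilon): consume the label directly.
                stack.append((b, pos + len(label)))
    return ends


def in_language(system, start, text):
    """True if the whole of `text` is derivable from the start symbol."""
    return len(text) in match_ends(system, start, text, 0)


# Hypothetical system with one nonterminal: S -> "a" | "(" S ")".
EXAMPLE_SYSTEM = {
    "S": ([(0, 1, "a"), (0, 2, "("), (2, 3, "S"), (3, 1, ")")], 0, 1),
}
```

With this example system, `in_language(EXAMPLE_SYSTEM, "S", "((a))")` holds, whereas the unbalanced string `"((a)"` is rejected, which no single NFA path test could decide for arbitrary nesting depth.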
Without further optimization, this formalization merely gives a new notation for writing production rules, but it has two advantages over extended BNF. First, it gives us additional freedom in our representation: a BNF grammar can only describe syntax diagrams formed by a collection of disjoint paths between the two terminals, and extended BNF can still only describe syntax diagrams in the form of series-parallel graphs, while our diagrams are not restricted in these ways. Second, as we describe below, we can use this notation to directly represent the junctions and tracks of a confluent drawing style [12], in which a path through the graph is only valid if it is a smooth path, such as in Fig. 1 (right). It is this drawing style that gives rise to the occasionally used alternative name “railroad diagrams” for syntax diagrams.
1.1 Software Pipeline
The second and third steps in the pipeline attempt to reduce the number of total symbols in the NFA representation, through both global optimizations that act on the entire system of graphs and local optimizations that act on a single graph. The local optimization part of the pipeline is a form of the well-studied problem of NFA minimization. In general, exact NFA minimization is \(\mathsf {PSPACE}\)-hard [17, 18], and furthermore approximating the minimum NFA efficiently to within an o(n) approximation ratio is also \(\mathsf {PSPACE}\)-hard [19]. However, since the problem is of practical importance there are many heuristic approaches [20, 21]. In this paper, we use simple heuristics motivated by the structure of real-world grammars and typical simplifications found in hand-drawn syntax diagrams, rather than attempting to implement the more complex heuristics devised for minimizing NFAs without regard to their appearance as a diagram.
Once the NFA representation is optimized, we draw each of the st-digraphs in a layered Sugiyama style [13, 14], rotated horizontally to direct edges from left to right. In these graphs, the only directed cycles come from tail recursion elimination, so rather than searching for a small feedback arc set to determine the reversed edges in the drawing, we maintain such a set during the process of NFA minimization and add to it whenever we perform a tail recursion elimination step. In this way, we can ensure that all the tokens in the drawing are traversed from left to right. Standard layered drawing optimizations are applicable in this stage, but were not implemented in our experiments as we were primarily interested in optimizing the NFA representation. Finally, we convert the layered drawing into a confluent syntax diagram.
1.2 Contributions
Our contributions in this paper are summarized below.

We formalize an abstract representation of syntax diagrams as a collection of mutually recursive NFAs, allowing the application of NFA minimization heuristics beyond what is possible with EBNF.

We formulate a software pipeline for producing syntax diagrams, based on NFA minimization and confluent layered graph drawing.

We develop a family of fast and simple NFA minimization heuristics, together with global heuristics that recombine multiple NFAs.

We describe a geometric layout method based on a horizontal Sugiyama layered drawing, where we reinterpret the vertices and edges in a layered drawing of an NFA as the junctions and vertices of a confluent drawing.

We provide a proof-of-concept implementation that produces human-quality syntax diagrams for real-world context-free languages.

Finally, we experimentally evaluate the quality of our heuristics.
2 Global Minimization Heuristics

A cannot be the start symbol.

G and H must be two distinct graphs.

If H has more than one non-\(\epsilon \) edge, then A must occur only once in the whole system of digraphs, and its occurrence must be in G.

The number of symbols in the graph produced by nesting H into G must be less than a predefined threshold k.
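The four conditions above can be read as a single predicate, sketched below in Python. The sketch rests on assumptions that the text does not state explicitly: H is taken to be the st-digraph of the nonterminal A, "symbols" are counted as non-\(\epsilon \) edges, nesting replaces every A-labeled edge of G by a copy of H, and the pair is only considered when A actually occurs in G. The names `symbol_count`, `occurrences`, and `can_nest` are hypothetical.

```python
def symbol_count(edges):
    """Number of non-epsilon edges; the label "" stands for epsilon."""
    return sum(1 for _, _, label in edges if label != "")


def occurrences(edges, a):
    """Number of edges labeled by the nonterminal `a`."""
    return sum(1 for _, _, label in edges if label == a)


def can_nest(system, start, a, g_name, k):
    """Check the nesting conditions for H = system[a] and G = system[g_name].

    `system` maps each nonterminal to its list of (u, v, label) edges.
    """
    if a == start:           # A cannot be the start symbol.
        return False
    if a == g_name:          # G and H must be two distinct graphs.
        return False
    g, h = system[g_name], system[a]
    in_g = occurrences(g, a)
    if in_g == 0:            # Assumption: nothing to nest if A is absent from G.
        return False
    total = sum(occurrences(edges, a) for edges in system.values())
    if symbol_count(h) > 1 and not (total == 1 and in_g == 1):
        return False         # A large H may be nested only at its unique use.
    # Assumed size of the nested graph: each A-edge of G is replaced by H.
    nested = symbol_count(g) - in_g + in_g * symbol_count(h)
    return nested < k


# Hypothetical two-graph system: S uses L once, and L has two symbols.
NEST_EXAMPLE = {
    "S": [(0, 1, "("), (1, 2, "L"), (2, 3, ")")],
    "L": [(0, 1, "x"), (1, 2, "y")],
}
```

In this example the nested graph would have four symbols, so `can_nest(NEST_EXAMPLE, "S", "L", "S", 10)` holds while a threshold of `k = 4` rejects the nesting.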
3 Local Minimization Heuristics
A local minimization heuristic seeks to minimize the total number of labeled edges in a single st-digraph within the NFA representation. Many of these optimizations can be seen in hand-drawn syntax diagrams.
3.1 Tail Recursion Loop Back
3.2 Parallel State Elimination with Squish Heuristic
3.3 Epsilon Transition Removal
Our previous optimizations may introduce \(\epsilon \)-labeled edges. We attempt to remove redundant \(\epsilon \)-edges using the epsilon removal heuristic. If \(e = (u,v)\), with \(u\ne s\) and \(v \ne t\), is an \(\epsilon \)-labeled edge, such that e is not a reversed edge (introduced via the loop back heuristic), and either e is the only outgoing edge of u or the only incoming edge to v, then the edge e is removed by merging u and v. We iteratively find and remove such edges until no such edge exists.
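A minimal Python sketch of this rule follows. The edge representation, with an explicit flag for reversed edges and the empty string for \(\epsilon \), and the name `remove_epsilon_edges` are assumptions made for illustration. The sketch also assumes that non-reversed edges never enter s or leave t, so that merging u and v never renames s or t.

```python
def remove_epsilon_edges(edges, s, t):
    """Repeatedly merge the endpoints of removable epsilon edges.

    `edges` is a list of (u, v, label, is_reversed) tuples; the label ""
    stands for epsilon. Returns the new edge list.
    """
    edges = list(edges)
    while True:
        for idx, (u, v, label, is_reversed) in enumerate(edges):
            # Only non-reversed epsilon edges with u != s and v != t qualify.
            if label != "" or is_reversed or u == s or v == t:
                continue
            out_u = sum(1 for e in edges if e[0] == u)
            in_v = sum(1 for e in edges if e[1] == v)
            if out_u == 1 or in_v == 1:
                break  # found a removable edge at position idx
        else:
            return edges  # no removable edge remains
        # Remove the edge and merge v into u by renaming v everywhere.
        del edges[idx]
        edges = [
            (u if a == v else a, u if b == v else b, lab, rev)
            for a, b, lab, rev in edges
        ]
```

For example, on the path 0 → 1 → 2 → 3 with labels a, \(\epsilon \), b and s = 0, t = 3, the middle edge is the only outgoing edge of vertex 1, so vertices 1 and 2 are merged and two labeled edges remain. Each iteration deletes one edge, so the loop terminates.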
3.4 Confluent Pinch
3.5 Implementing the Heuristics
4 Sugiyama Layering
Once the NFA representation has been minimized, we give each of the st-digraphs a Sugiyama-style layered drawing, using the standard layered-drawing pipeline for layout and crossing minimization. One modification that we make to this pipeline is that it is neither necessary nor desirable to compute a feedback arc set of the st-digraphs. Instead, the set of edges introduced by the loop back heuristic already forms a feedback arc set whose edges should loop back in the drawing. Since we are using an orthogonal drawing style, we add bends to edges to allow them to shift their vertical positions from one layer to the next, and use an interval-graph coloring algorithm to place the vertical connectors of these bent edges into a small number of columns.
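The column assignment can be illustrated by the standard greedy coloring of an interval graph, sketched below in Python. The modeling is an assumption made for this example: each vertical connector between two adjacent layers is represented by the closed range of rows it spans, and two connectors may share a column only if their ranges are disjoint. The name `assign_columns` is hypothetical. Processing intervals in order of their lower endpoint uses the minimum possible number of columns for such intervals.

```python
import heapq


def assign_columns(intervals):
    """Assign a column index to each closed row interval (low, high).

    Intervals that overlap, even at a single row, receive different columns.
    """
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    columns = [None] * len(intervals)
    busy = []      # heap of (high, column) for columns currently in use
    free = []      # heap of column indices that can be reused
    next_col = 0
    for i in order:
        low, high = intervals[i]
        # Release every column whose connector ends strictly above `low`.
        while busy and busy[0][0] < low:
            _, col = heapq.heappop(busy)
            heapq.heappush(free, col)
        if free:
            col = heapq.heappop(free)
        else:
            col = next_col
            next_col += 1
        columns[i] = col
        heapq.heappush(busy, (high, col))
    return columns
```

For the connectors `[(0, 3), (1, 2), (4, 6), (2, 5)]`, three of them overlap at row 2, so three columns are needed; the sketch reuses the first column for the connector spanning rows 4 to 6.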
Table 2. Experimental results

Name                              Optimized?    Area  Tokens  Components
Canadian post codes (simple)      unoptimized     17       6           1
                                  optimized       17       6           1
Canadian post codes (complex)     unoptimized    693      69           9
                                  optimized     1121      65           5
Ottawa course codes               unoptimized    520      46          15
                                  optimized      570      36           5
Palindromes                       unoptimized    583     105           2
                                  optimized      583     105           2
Nonempty data files (repetitive)  unoptimized    182      22           8
                                  optimized      132      11           3
Nonempty data files (recursive)   unoptimized    143      22           7
                                  optimized      130       7           1
Pascal variable declarations      unoptimized    156      21           7
                                  optimized      247      12           3
Pascal type declarations          unoptimized    475      52          16
                                  optimized      486      30           6
LISP 1.5                          unoptimized    165      19           6
                                  optimized      105       9           1
JSON                              unoptimized    539      90          15
                                  optimized      651      42           5
5 Experimental Results
In order to validate the heuristic optimizations performed by our implementation, we tested them on a set of eight real-world context-free grammars collected by Neal Wagner at the web site http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples2.html together with the LISP 1.5 and JSON grammars. For each grammar, we measured the area of our drawing (in units of rows and columns), the number of tokens (boxes) in the drawing, and the total number of connected components, both before and after optimization. The results are shown in Table 2.
As these results show, our optimizations were not always effective at reducing the total area of our drawings, and in some cases even increased the area. However, we typically achieved more significant reductions in the numbers of tokens and connected components of the drawings, which we believe to be helpful in reducing their visual clutter. Additionally, it can be seen that our optimizations are typically more effective on grammars with larger numbers of nonterminals, and less effective on grammars that have only a very small number of nonterminals, because in those cases no nesting will be possible.
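The pattern described above can be checked directly against the numbers in Table 2. The short Python sketch below is only a convenience for the reader: it transcribes the table, and the names `TABLE_2` and `tally` are introduced here for illustration.

```python
# Rows of Table 2: (name, unoptimized, optimized), with each measurement
# given as (area, tokens, components).
TABLE_2 = [
    ("Canadian post codes (simple)", (17, 6, 1), (17, 6, 1)),
    ("Canadian post codes (complex)", (693, 69, 9), (1121, 65, 5)),
    ("Ottawa course codes", (520, 46, 15), (570, 36, 5)),
    ("Palindromes", (583, 105, 2), (583, 105, 2)),
    ("Nonempty data files (repetitive)", (182, 22, 8), (132, 11, 3)),
    ("Nonempty data files (recursive)", (143, 22, 7), (130, 7, 1)),
    ("Pascal variable declarations", (156, 21, 7), (247, 12, 3)),
    ("Pascal type declarations", (475, 52, 16), (486, 30, 6)),
    ("LISP 1.5", (165, 19, 6), (105, 9, 1)),
    ("JSON", (539, 90, 15), (651, 42, 5)),
]


def tally(metric):
    """Count grammars whose metric decreased, stayed equal, or increased.

    `metric` is 0 for area, 1 for tokens, and 2 for components.
    """
    decreased = unchanged = increased = 0
    for _, before, after in TABLE_2:
        if after[metric] < before[metric]:
            decreased += 1
        elif after[metric] == before[metric]:
            unchanged += 1
        else:
            increased += 1
    return decreased, unchanged, increased
```

On these data the area decreases for three grammars, is unchanged for two, and increases for five, whereas the token and component counts each decrease for eight grammars and never increase.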
6 Gallery of Examples
References
1. Jensen, K., Wirth, N.: PASCAL User Manual and Report. Springer, New York (1974)
2. Burroughs Corporation: Command and Edit (CANDE) Language Information Manual (1971)
3. McCarthy, J.: LISP 1.5 Programmer’s Manual. MIT Press, Cambridge (1965)
4. Braz, L.M.: Visual syntax diagrams for programming language statements. SIGDOC Asterisk J. Comput. Doc. 14, 23–27 (1990)
5. Bell, S., Gilbert, E.J.: Learning recursion with syntax diagrams. SIGCSE Bull. 6, 44–45 (1974)
6. Goldberg, A., Robson, D.: Smalltalk-80: The Language and Its Implementation. Addison-Wesley Longman Publishing Co., Inc., Boston (1983)
7. Crockford, D.: Introducing JSON (2015). http://json.org. Accessed: 04 June 2015
8. Atkins, Jr., T., Sapin, S.: CSS Syntax Module Level 3 (2015). http://www.w3.org/TR/csssyntax3. Accessed: 04 June 2015
9. Dopler, M., Schörgenhumer, S.: EBNF Visualizer (2015). http://dotnet.jku.at/applications/Visualizer. Accessed: 04 June 2015
10. Thiemann, P.: Ebnf2ps: Peter’s Syntax Diagram Drawing Tool (2015). http://www2.informatik.unifreiburg.de/thiemann/haskell/ebnf2ps. Accessed: 04 June 2015
11. Rademacher, G.: Railroad Diagram Generator (2015). http://bottlecaps.de/rr/ui. Accessed: 04 June 2015
12. Dickerson, M.T., Eppstein, D., Goodrich, M.T., Meng, J.Y.: Confluent drawings: visualizing non-planar diagrams in a planar way. In: Liotta, G. (ed.) GD 2003. LNCS, vol. 2912, pp. 1–12. Springer, Heidelberg (2004)
13. Sugiyama, K., Tagawa, S., Toda, M.: Methods for visual understanding of hierarchical system structures. IEEE Trans. Systems Man Cybernet. 11, 109–125 (1981)
14. Bastert, O., Matuszewski, C.: Layered drawings of digraphs. In: Kaufmann, M., Wagner, D. (eds.) Drawing Graphs. LNCS, vol. 2025, pp. 87–120. Springer, Heidelberg (2001)
15. Bekos, M.A., Kaufmann, M., Kobourov, S.G., Symvonis, A.: Smooth orthogonal layouts. J. Graph Algorithms Appl. 17, 575–595 (2013)
16. Eppstein, D., Goodrich, M.T., Meng, J.Y.: Confluent layered drawings. In: Pach, J. (ed.) GD 2004. LNCS, vol. 3383, pp. 184–194. Springer, Heidelberg (2005)
17. Hunt III, H.B., Rosenkrantz, D.J., Szymanski, T.G.: On the equivalence, containment, and covering problems for the regular and context-free languages. J. Comput. Syst. Sci. 12, 222–268 (1976)
18. Stockmeyer, L.J., Meyer, A.R.: Word problems requiring exponential time. In: Proceedings 5th ACM Symposium on Theory of Computing (STOC 1973), pp. 1–9 (1973)
19. Gramlich, G., Schnitger, G.: Minimizing NFA’s and regular expressions. J. Comput. Syst. Sci. 73, 908–923 (2007)
20. Champarnaud, J.-M., Coulon, F.: NFA reduction algorithms by means of regular inequalities. Theor. Comput. Sci. 327, 241–253 (2004)
21. Han, Y.-S., Wood, D.: Obtaining shorter regular expressions from finite-state automata. Theor. Comput. Sci. 370, 110–120 (2007)