Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Choreographic Programming is an emerging paradigm for developing concurrent software based on message passing [16]. Its key aspect is that programs are choreographies – global descriptions of communications based on an “Alice and Bob” security protocol notation. Since this notation disallows mismatched I/O actions, choreographies always describe deadlock-free systems by construction. Given a choreography, a distributed implementation can be projected automatically (synthesis) onto terms of a process model – a transformation called EndPoint Projection (EPP) [2, 3]. A correct definition of EPP yields a correctness-by-construction result: since a choreography cannot describe deadlocks, the generated process implementations are also deadlock-free. Previous works presented formal models capturing different aspects of choreographic programming, e.g., web services [2, 12], asynchronous multiparty sessions [3], runtime adaptation [9], modular development [18], protocol compliance [3, 4], and computational expressivity [7]. Choreography models have also been investigated in the realms of type theory [14], automata theory [11], formal logics [5], and service contracts [1].

Despite the rising interest in choreographic programming, there is still a lack of evidence about what nontrivial programs can actually be written with this paradigm. This is due to its young age [17]. Indeed, most works on languages for choreographic programming still focus on showcasing representative toy examples (e.g., [2, 3, 6, 12, 16, 18]), rather than giving a comprehensive practical evaluation based on standard computational problems.

In this work, we contribute to filling this gap. Our investigation uses the language of Procedural Choreographies (PC) [8], summarised in Sect. 2, which extends previous choreography models with primitives for parameterised procedures. Like other choreography languages (e.g., [3, 18]), PC supports implicit parallelism: non-interfering communications can take place in any order. We provide an empirical evaluation of the expressivity of PC, by using it to program some representative and standard concurrent algorithms: Quicksort (Sect. 3), Gaussian elimination (Sect. 4), and Fast Fourier Transform (Sect. 5). As a consequence of using choreographies, all these implementations are guaranteed to be deadlock-free. We also illustrate how implicit parallelism has the surprising effect of automatically giving concurrent behaviour to traditional sequential implementations of these algorithms. Our exploration brings us to the limits of the expressivity of PC, which arise when trying to tackle distributed graph algorithms (Sect. 6), due to the lack of primitives for accessing the structure of process networks, e.g., broadcasting a message to neighbouring processes.

2 Background

In this section, we recap the language and properties of Procedural Choreographies (PC). We refer the reader to [8] for a more comprehensive presentation.

Procedural Choreographies. The syntax of PC is given below:

A procedural choreography is a pair \(\langle \mathcal {D},C\rangle \), where C is a choreography and \(\mathcal {D}\) is a set of procedure definitions. Process names, ranged over by \(\mathsf {p},\mathsf {q},\mathsf {r},\ldots \), identify processes that execute concurrently. Each process \(\mathsf {p}\) is equipped with a memory cell storing a single value of a fixed type. In the remainder, we omit type information since they can always be inferred using the technique given in [8]. Statements in a choreography can either be communication actions (\(\eta \)) or compound instructions (I), and both can have continuations. Term \(\varvec{0}\) is the terminated choreography, often omitted, and \(\varvec{0};A\) is used only at runtime.

Processes communicate (synchronously) via direct references (names) to each other. In a value communication \(\mathsf {p}.e\;\texttt {-\!\!>}\;\mathsf {q}.f\), process \(\mathsf {p}\) evaluates expression e and sends the result to \(\mathsf {q}\); e can contain the placeholder \(\mathtt {c}\), replaced at runtime with the data stored at \(\mathsf {p}\). When \(\mathsf {q}\) receives the value from \(\mathsf {p}\), it applies to it the (total) function f and stores the result. The body of f can also contain \(\mathtt {c}\), which is replaced by the contents of \(\mathsf {q}\)’s memory. Expressions and functions are written in a pure functional language, left unspecified.

In a selection \(\mathsf {p}\;\texttt {-\!\!>}\;\mathsf {q} [l]\), \(\mathsf {p}\) communicates to \(\mathsf {q}\) its choice of label l.

In term \(\mathsf {p} \, \mathsf {start} \, \mathsf {q}\), process \(\mathsf {p}\) spawns the new process \(\mathsf {q}\), whose name is bound in the continuation C of \(\mathsf {p} \, \mathsf {start} \, \mathsf {q};C\). After executing \(\mathsf {p} \, \mathsf {start} \, \mathsf {q}\), \(\mathsf {p}\) is the only process who knows the name of \(\mathsf {q}\). This knowledge is propagated to other processes by the action \(\mathsf {p}\!: \mathsf {q}\, \texttt {<\!\!-\!\!>}\, \mathsf {r}\), read “\(\mathsf {p}\) introduces \(\mathsf {q}\) and \(\mathsf {r}\)”, where \(\mathsf {p}\), \(\mathsf {q}\) and \(\mathsf {r}\) are distinct.

In a conditional term \(\mathsf {if}\, \mathsf {p}.e \, \mathsf {then} \, C_1 \, \mathsf {else} \, C_2\), process \(\mathsf {p}\) evaluates expression e to choose between the possible continuations \(C_1\) and \(C_2\).

The set \(\mathcal {D}\) defines global procedures that can be invoked in choreographies. Term \(X(\widetilde{\mathsf {q}}) = C\) defines a procedure X with body C, which can be used anywhere in \(\langle \mathcal {D},C\rangle \) – in particular, inside the definitions of X and other procedures. The names \(\tilde{\mathsf {q}}\) are bound to C, and are assumed to be exactly the free process names in C. The set \(\mathcal {D}\) contains at most one definition for each procedure name. Term \(X\langle \tilde{\mathsf {p}} \rangle \) invokes procedure X, instantiating its parameters with the processes \(\tilde{\mathsf {p}}\).

The semantics of PC, which we do not detail, is a reduction semantics that relies on two extra elements: a total state function that assigns to each process the value it stores, and a connection graph that keeps track of which processes know (are connected to) each other [8]. In particular, processes can only communicate if there is an edge between them in the connection graph. Therefore, choreographies can deadlock because of errors in the programming of communications: if two processes try to communicate but they are not connected, the choreography gets stuck. This issue is addressed by a simple typing discipline, which guarantees that well-typed PC choreographies are deadlock-free [8].

Procedural Processes. Choreographies in PC are compiled into terms of the calculus of Procedural Processes (PP), which has the following syntax:

A term \({\mathsf {p}} \triangleright _{v} {B}\) is a process, where \(\mathsf {p}\) is its name, v is the value it stores, and B is its behaviour. Networks, ranged over by NM, are parallel compositions of processes, where \(\varvec{0}\) is the inactive network. Finally, \(\langle \mathcal {B},N\rangle \) is a procedural network, where \(\mathcal {B}\) defines the procedures that the processes in N may invoke.

We comment on behaviours. A send term \({\mathsf {q}}!{e};B\) sends the evaluation of expression e to process \(\mathsf {q}\), and then proceeds as B. Dually, term \(\mathsf {p}?f;B\) receives a value from process \(\mathsf {p}\), combines it with the value in memory cell of the process executing the behaviour as specified by f, and then proceeds as B. Term \({\mathsf {q}}!!{\mathsf {r}}\) sends process name \(\mathsf {r}\) to \(\mathsf {q}\) and process name \(\mathsf {q}\) to \(\mathsf {r}\), making \(\mathsf {q}\) and \(\mathsf {r}\) “aware” of each other. The dual action is \(\mathsf {p}?\mathsf {r}\), which receives a process name from \(\mathsf {p}\) that replaces the bound variable \(\mathsf {r}\) in the continuation. Term \({\mathsf {q}}\oplus l;B\) sends the selection of a label l to process \(\mathsf {q}\). Selections are received by the branching term \( {\mathsf {p}} \& {\{ l_i : B_i\}_{i\in I}}\) (I nonempty), which receives a selection for a label \(l_i\) and proceeds as \(B_i\). Term \(\mathsf {start} \, \mathsf {q} \triangleright B_2;B_1\) starts a new process (with a fresh name) executing \(B_2\), proceeding in parallel as \(B_1\). Other terms (conditionals, procedure calls, and termination) are standard; procedural definitions are stored globally as in PC.

Term \(\mathsf {start} \, \mathsf {q} \triangleright B_2;B_1\) binds \(\mathsf {q}\) in \(B_1\), and \(\mathsf {p}?\mathsf {r};B\) binds \(\mathsf {r}\) in B. We omit the formal semantics of PP, which follows the intuitions given above.

EndPoint Projection (EPP). In [8] we show how every well-typed choreography can be projected into a PP network by means of an EndPoint Projection (EPP). EPP guarantees a strict operational correspondence: the projection of a choreography implements exactly the behaviour of the originating choreography. As a consequence, projections of typable PC terms never deadlock.

3 Quicksort

In this section, we illustrate PC’s capability of supporting divide-and-conquer algorithms, by providing a detailed implementation of (concurrent) Quicksort.

We begin by defining procedure split, which splits the (non-empty) list stored at p among three processes: , and . We assume that all processes store objects of type List(T), where T is some type, endowed with the following constant-time operations: get the first element (fst); get the second element (snd); check that the length of a list is at most 1 (short); append an element (add); and append another list (append). Also, and test whether the first element of the list is, respectively, smaller or greater than the second. Procedure pop2 (omitted) removes the second element from the list.

We write p -> q1,...,qn[l] as an abbreviation for the sequence of selections p -> q1[l]; ...; p -> qn[l]. We can now define split.

figure a

When split terminates, we know that all elements in and are respectively smaller or greater than those in .Footnote 1 Using split we can implement a robust version of Quicksort (lists may contain duplicates), the procedure QS below. We write p start q1,..., qn for p start q1;...; p start qn. Note that split is only called when p stores a non-empty list.

figure b

Procedure QS implements Quicksort using its standard recursive structure. Since the created processes , and do not have references to each other, they cannot exchange messages, and thus the recursive calls run completely in parallel. Applying EPP, we get the following process procedures (among others).

figure c

4 Gauss Elimination

Let \(A\mathbf {x}=\mathbf {b}\) be a system of linear equations in matrix form. We define a procedure gauss that applies Gaussian elimination to transform it into an equivalent system \(U\mathbf {x}=\mathbf {y}\), with U upper triangular (so this system can be solved by direct substitution). We use parameter processes \(\mathsf {a}_{ij}\), with \(1\le i\le n\) and \(1\le j\le n+1\). For \(1\le i,j\le n\), \(\mathsf {a}_{ij}\) stores one value from the coefficient matrix; \(\mathsf {a}_{i,{n+1}}\) stores the independent term in one equation. (Including \(\mathbf {b}\) in the coefficient matrix simplifies the notation.) After execution, each \(\mathsf {a}_{ij}\) stores the corresponding term in the new system. We assume A to be non-singular and numerically stable.

This algorithm cannot be implemented in PC directly, as gauss takes a variable number of parameters (the \(\mathsf {a}_{ij}\)). However, it is easy to extend PC so that procedures can also take process lists as parameters, as we describe.

  • Syntax of PC and PP. The arguments of parametric procedures are now lists of process names, all with the same type. These lists can only be used in procedure calls, where they can be manipulated by means of pure functions that take a list as their only argument. Our examples use uppercase letters to identify process lists and lowercase letters for normal process identifiers.

  • Semantics of PC. We assume that a procedure that is called with an empty list as one of its arguments is equivalent to the terminated process \(\varvec{0}\).

  • Connections. Connections between processes are uniform wrt argument lists, i.e., if p and A are arguments to some procedure X, then X requires/guarantees that p be connected to none or all of the processes in A.

The definition of gauss uses: hd and tl (computing the head and tail of a list of processes); fst and rest (taking a list of processes representing a matrix and returning the first row of the matrix, or the matrix without its first row); and minor (removing the first row and the first column from a matrix). Processes use standard arithmetic operations to combine their value with values received.

figure d

Procedure solve divides the first equation by the pivot. Then, eliminate uses this row to perform an elimination step, setting the first column of the coefficient matrix to zeroes. The auxiliary procedure eli_row performs this step at the row level, using elim_all to iterate through a single row and elim1 to perform the actual computations. The first row and the first column of the matrix are then removed in the recursive call, as they will not change further.

This implementation follows the standard sequential algorithm for Gaussian elimination (Algorithm 8.4 in [13]). However, it runs concurrently due to the implicit parallelism in the semantics of choreographies. We explain this behaviour by focusing on a concrete example. Assume that A is a \(3\times 3\) matrix, so there are 12 processes in total. For legibility, we will write b1 for the independent term a14 etc.; A= \(\langle \) a11,a12,a13,b1,a21,a22,a23,b2,a31,a32,a33,b3 \(\rangle \) for the matrix; A1= \(\langle \) a11,a12,a13,b1 \(\rangle \) for the first row (likewise for A2 and A3); and, A’2= \(\langle \) a22,a23,b2 \(\rangle \) and likewise for A’3. Calling gauss(A) unfolds to

figure e

Fully expanding the sequence elim_row(A1,A3); solve(A’2) yields

figure f

and the semantics of PC allows the communications in the second line to be interleaved with those in the first line in any possible way; in the terminology of [7], the calls to elim_row(A1,A3) and solve(A’2) run in parallel.

This corresponds to implementing Gaussian elimination with pipelined communication and computation as in Sect. 8.3 of [13]. Indeed, as soon as any row has been reduced by all rows above it, it can apply solve to itself and try to begin reducing the rows below. It is a bit surprising that we get such parallel behaviour by straightforwardly implementing an imperative algorithm; the explanation is that EPP encapsulates the part of determining which communications can take place in parallel, removing this burden from the programmer.

5 Fast Fourier Transform

We now present a more complex example: computing the discrete Fourier transform of a vector via the Fast Fourier Transform (FFT), as in Algorithm 13.1 of [13]. We assume that n is a power of 2. In the first call, \(\omega =e^{2\pi i/n}\).

figure g

Implementing this procedure in PC requires two procedures gsel_then(p,Q) and gsel_else(p,Q), where p broadcasts a selection of label then or else, respectively, to every process in Q.Footnote 2 We also use auxiliary procedures intro(n,m,P), where n introduces m to all processes in P, and power(n,m,nm), where at the end nm stores the result of exponentiating the value in m to the power of the value stored in n (see [7] for a possible implementation in a sublanguage of PC).

The one major difference between our implementation of FFT and the algorithm R_FFT reported above is that we cannot create a variable number of fresh processes and pass them as arguments to other procedures (the auxiliary vectors \(\mathbf {q}\) and \(\mathbf {t}\)). Instead, we use \(\mathbf {y}\) to store the result of the recursive calls, and create two auxiliary processes inside each iteration of the final for loop.

figure h

The level of parallelism in this implementation is suboptimal, as both recursive calls to fft use n’ and w’. By duplicating these processes, these calls can run in parallel as in the previous example. (We chose the current formulation for simplicity.) Process n’ is actually the main orchestrator of the whole execution.

6 Graphs

Another prototypical application of distributed algorithms is graph problems. In this section, we focus on a simple example (broadcasting a token to all nodes of a graph) and discuss the limitations of implementing these algorithms in PC.

The idea of broadcasting a token in a graph is very simple: each node receiving the token for the first time should communicate it to all its neighbours. The catch is that, in PC, there are no primitives for accessing the connection graph structure from within the language. Nevertheless, we can implement our simple example of token broadcasting if we assume that the graph structure is statically encoded in the set of available functions over parameters of procedures. To be precise, assume that we have a function neighb(p,V), returning the neighbours of p in the set of vertices V. (The actual graph is encapsulated in this function.) We also use ++ and \(\mathtt{{\backslash }}\) for appending two lists and computing the set difference of two lists. We can then write a procedure broadcast(P,V), propagating a token from every element of P to every element of V, as follows.

figure i

Calling broadcast( \(\langle \) p \(\rangle \) ,G), where G is the full set of vertices of the graph and p is one vertex, will broadcast p’s contents to all the vertices in the connected component of G containing p. Implicit parallelism ensures that each node starts broadcasting after it receives the token, independently of the remaining ones.

This approach is not very satisfactory as a graph algorithm, as it requires encoding the whole graph in the definition of broadcast, and does not generalise easily to more sophisticated graph algorithms. Adding primitives for accessing the network structure at runtime would however heavily influence EPP and the type system of PC [8]. We leave this as an interesting direction for future work, which we plan to pursue in order to be able to implement more sophisticated graph algorithms, e.g., for computing a minimum spanning tree.

7 Related Work and Conclusions

To the best of our knowledge, this is the first experience report on using choreographic programming for writing real-world, complex computational algorithms.

Related Work. The work nearest to ours is the evaluation of the Chor language [16], which implements the choreographic programming model in [3]. Chor supports multiparty sessions (as \(\pi \)-calculus channels [15]) and their mobility, similar to introductions in PC. Chor is evaluated by encoding representative examples from Service-Oriented Computing (e.g. distributed authentication and streaming), but these do not cover interesting algorithms as in here.

Previous works based on Multiparty Session Types (MPST) [14] have explored the use of choreographies as protocol specifications for the coordination of message exchanges in some real-world scenarios [10, 19, 20]. Differently from our approach, these works fall back to a standard process calculus model for defining implementations. Instead, our programs are choreographies. As a consequence, programming the composition of separate algorithms in PC is done on the level of choreographies, whereas in MPST composition requires using the low-level process calculus. Also, our choreography model is arguably much simpler and more approachable by newcomers, since much of the expressive power of PC comes from allowing parameterised procedures, a standard feature of most programming languages. The key twist in PC is that parameters are process names.

Conclusions. Our main conclusion is that choreographies make it easy to produce simple concurrent implementations of sequential algorithms, by carefully choosing process identifiers and relying on EPP for maximising implicit parallelism. This is distinct from how concurrent algorithms usually differ from their sequential counterparts. Although we do not necessarily get the most efficient possible distributed algorithm, this automatic concurrency is pleasant to observe.

The second interesting realisation is that it is relatively easy to implement nontrivial algorithms in choreographies. This is an important deviation from the typical use of toy examples, of limited practical significance, that characterises previous works in this programming paradigm.