1 Introduction

Visual, block-based environments such as ALICE [4] or Scratch [17] have recently transformed the teaching of computing [1, 9].

Yet this development in procedural and object-oriented programming tools has not disseminated to analysing and processing data. For example, the nifty assignments repository of computing assessment ideas [14, 15] contains 107 assignments, collected for their quality, but only eight of these incorporate work with a real data set.

Of particular concern to us, at Sheffield Hallam University, is adapting our tools, teaching methods and resources to make it easier for students at any level to access and process data. Specific areas of interest have been working with open data advocacy groups [12] and making data analytic tools more available [19].

The Open Piping project pursues this idea with an open-source functional programming environment and visual data flow interface for data processing.

2 Project Motivations

Open Piping is a visual functional programming environment, based on a boxes and wires model, intended for data processing applications.

Visual boxes and wires environments are common [11, 13], including some in commercial [8] and scientific [7] use. But in many cases, the value of the tools is limited due to the poor transparency of the processes and technology they implement.

Take the case of Yahoo pipes [13], which was popular until it closed in 2015. To execute pipes on systems of their choice, users had to go through a complex export process, and this became their only option when Yahoo support ended.

Open Piping aims to offer ease of use comparable to that of commercial tools, within an open architecture that facilitates flexible development and reuse, and allows richer exchanges between users.

2.1 Open by Design

Our ambition is to propose a graphical tool for user-defined data processes, which would include, by design, the transparency and flexibility needed to apply these processes in a range of languages and environments. Open Piping aims to be at once:

Open. That is, Open Source; the system’s source code is available under the GNU licence. But so is the notation used to define processes. Any user process can then be transformed from this notation into executable code in a target programming language.

Interoperable. The process specification format is openly available, and uses a human-readable, JSON formatted S-expression. This is needed to ensure the interoperability of the system with any manner of services, such as alternative end-user interfaces, new languages or process hosting and remote execution tools.

Easy to use. The user interface makes it easy to define data flows and shows clearly the relation between data flow, resulting S-expression, and executable functionality.

Easy to deploy. The ability to choose from multiple languages and standards for service and content integration would facilitate the re-use of user-defined processes in different environments, such as within content-management systems, as web or application widgets, or within a service-oriented architecture.

Altogether, these characteristics aim to ensure that users can easily define the processes they want to operate on data, while also retaining control of these processes to use them in new environments.

3 Open Piping Operation

3.1 System Architecture

The boxes-and-wires model describes the directed acyclic graph for a function, with the boxes representing functions and the wires, the data to which they apply.

Configuration data defines base functions available to the end-user. This information at once determines primary graphical blocks, provides access to basic processing capabilities, and limits that access, for security, to a chosen set with defined functionality.

The end-user defines a function by wiring elementary blocks. This function is translated into an S-expression in JSON, which can be compiled into an executable function in any number of languages, provided that calls to the primitive functions can be defined.

The interface elements presented in Fig. 1 sum up the use of Open Piping. The end-user chooses elementary blocks (1) to define a flow (2), which is translated to a symbolic expression (3) encoded in JSON, so as to benefit from the many existing tools for this format. The expression is then interpreted (4) and executed (5).

Fig. 1. Open Piping main interface elements

3.2 Defining and Encoding a Data Flow

The block description and interface configuration also use the JSON format. For instance, Fig. 2 shows the configuration lines that define the box representing arithmetic operations. The user can choose add, subtract, divide, or multiply from a single ‘arithmetic’ box.

Fig. 2. Defining and graphically representing a box of arithmetic functions

An example data flow is presented in Fig. 3. The web interface uses the JSPlumb library [10] to manipulate and represent the screen objects. Traversing the graph recursively provides a symbolic expression. An advantage of symbolic expressions is that the code remains close to existing languages such as LISP or Scheme. For instance, in a LISP-like language, the workflow in Fig. 3 results in the structure:

figure a

Fig. 3. An example workflow
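The recursive traversal mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the graph model (`boxes`, with `fn`, `inputs` and `from` fields), the function `toExpression` and the example boxes are all hypothetical, and the example is not the workflow of Fig. 3.

```javascript
// Hypothetical graph model: each box names a function and lists its inputs.
// An input is either a literal value or { from: boxId }, i.e. a wire
// coming from the output of another box.
const boxes = {
  out: { fn: "plus", inputs: [{ from: "a" }, 3] },
  a: { fn: "times", inputs: [2, 5] },
};

// Recursively turn the box with the given id into a nested array of the
// form [functionName, arg1, arg2, ...].
function toExpression(graph, id) {
  const box = graph[id];
  if (box === undefined) {
    throw new Error(`Unknown box: ${id}`);
  }
  const args = box.inputs.map((input) =>
    input !== null && typeof input === "object"
      ? toExpression(graph, input.from) // follow the wire upstream
      : input // literal value entered in the box
  );
  return [box.fn, ...args];
}

const expression = toExpression(boxes, "out");
console.log(JSON.stringify(expression)); // ["plus",["times",2,5],3]
```

Because the graph is acyclic, the recursion terminates. A box whose output feeds several wires would simply be expanded once per use in this sketch.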

Another benefit of S-expressions is the original argument for this notation: executable code and data follow the same conventions. This facilitates the processing of an expression like line [1] in multiple environments.

The expression is encoded in JSON before being passed to the interpreter. JSON’s wide use and readability make it particularly suitable for this purpose. The encoding follows these simple rules:

  • JSON notation defines objects, arrays, strings, numbers and the values true, false, and null. Our encoding relies on all but objects.

  • Atomic values are strings, numbers and the values true, false, and null.

  • Lists are represented by a JSON array. Each element of the list can be an atomic value or a list, and so on recursively.

Respecting this convention, the process shown in Fig. 3 is written:

figure b
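The encoding rules above can be checked mechanically. The sketch below assumes nothing beyond those rules; the function names `isAtom` and `isValidExpression` and the sample expressions are illustrative and are not taken from the Open Piping code base.

```javascript
// Atoms are strings, numbers, true, false and null.
function isAtom(value) {
  return (
    value === null ||
    typeof value === "string" ||
    typeof value === "boolean" ||
    (typeof value === "number" && Number.isFinite(value))
  );
}

// A valid expression is an atom, or an array whose elements are all valid
// expressions. JSON objects are rejected, since the encoding does not use them.
function isValidExpression(value) {
  if (Array.isArray(value)) {
    return value.every(isValidExpression);
  }
  return isAtom(value);
}

const nested = JSON.parse('["plus", ["times", 2, 5], 3]');
console.log(isValidExpression(nested)); // true

const withObject = JSON.parse('["plus", {"x": 1}, 3]');
console.log(isValidExpression(withObject)); // false
```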

3.3 Interpreting a Symbolic Expression in an Executable Language

To allow the execution of the same expression in diverse environments, we rely on characteristics present in most programming languages - use of variables, of a means of conditional execution, of functions - but we must provide elementary information to support the interpretation in each language. These data are themselves written in JSON.

To illustrate the interpretation process, let us study the case of interpreting expression [2] above in JavaScript and JQuery.

The interpretation relies on a list of predefined functions and string substitutions for the language:

figure c

Some operators are interpreted by substituting character strings to form the target code. Arithmetic operators like + use this technique, but so do conditionals, which we interpret in JavaScript with the ternary operator. Functions are identified and composed from arguments and body information. So [4] contains all the information needed to interpret the example completely.

Using this data, the expression is interpreted recursively. First the expression

figure d

results in the definition of function isNumber,

figure e

and in one function call. The plus function is then interpreted by substituting strings, and finally the if conditional is interpreted to compose the overall result:

figure f

We can see that the interpretation of a user-defined function is simple; to be able to execute a process in a given language, we simply need to define and execute safely the primitive functions required.
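To make the process concrete, here is a minimal sketch of a recursive interpreter that produces JavaScript source by string substitution. It is an illustration under stated assumptions rather than the project's interpreter: the `templates` table, its `$1`, `$2`, `$3` placeholder syntax, the `primitives` object and the `compile` function are all hypothetical, and `isNumber` is supplied here as a host-side primitive rather than composed from argument and body information as described above.

```javascript
// Hypothetical language description: operators rendered by string
// substitution, where $1, $2, $3 stand for the compiled arguments.
const templates = {
  plus: "($1 + $2)",
  times: "($1 * $2)",
  if: "($1 ? $2 : $3)", // conditionals use the ternary operator
};

// Primitive functions that generated code may call by name.
const primitives = {
  isNumber: (x) => typeof x === "number",
};

const has = (table, key) => Object.prototype.hasOwnProperty.call(table, key);

// Compile a JSON-encoded S-expression into a JavaScript source string.
function compile(expr) {
  if (!Array.isArray(expr)) {
    return JSON.stringify(expr); // atoms become JavaScript literals
  }
  const [head, ...rest] = expr;
  const args = rest.map(compile); // interpret the arguments recursively
  if (has(templates, head)) {
    return templates[head].replace(/\$(\d)/g, (_, n) => args[Number(n) - 1]);
  }
  if (has(primitives, head)) {
    return `primitives.${head}(${args.join(", ")})`;
  }
  // Anything outside the two tables is refused.
  throw new Error(`Unknown function: ${String(head)}`);
}

const source = compile(["if", ["isNumber", 4], ["plus", 4, 1], 0]);
console.log(source); // (primitives.isNumber(4) ? (4 + 1) : 0)

// Run the generated code, passing the primitives in as a parameter.
const result = new Function("primitives", `return ${source};`)(primitives);
console.log(result); // 5
```

In this sketch, supporting another target language would amount to providing a different `templates` table and a different way of emitting calls to primitives; the traversal itself would stay the same.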

3.4 Overcoming Visual Limitations

The graphical model shown above should support end-users’ understanding and programming of simple processes. However, based on our experience and prior research such as [2, 3, 18], we speculate that several aspects of the visualisation are not easily represented in ways that end-users spontaneously understand. Here, we present a number of potential solutions to support end-users as programs become more complex.

Coordinating Visual Code with Results. Visual programming can support end-users with a number of displays - the results of a program, of its code, of its execution. The wires and boxes model is a form of visual code, but many systems show a visual representation of execution results.

Coordinated views can also apply to viewing code. Yahoo pipes [16] is an example of this approach: its visualisation showed code, in boxes and wires form, along with a sample of the data resulting from it. Users could also select subsets of the code to view its result. This supported end-users with a presentation of the code, of some results, and of execution information (as partial execution results), as well as debugging support by means of choosing code subsets to test.

Fig. 4. Viskell shows data types explicitly

Data Typing. The boxes and wires model shown in our example Fig. 3 does not show any type information. Typing has many advantages for novice programmers, in particular limiting errors by constraining the validity of constructs, ensuring security, and facilitating debugging.

Typing can be presented in textual form, a solution adopted by Viskell as shown in Fig. 4 [20]. An alternative is visual cues, such as colour, shape, or icons: languages like MIT Scratch [17] adopt this approach, and use the added advantage of shape as a metaphor for syntactic validity. Type can also be implemented in the language and enforced in the interaction, yet not presented visually: that is the solution adopted by Yahoo pipes, which enforces type checking by making it impossible to connect a wire to a box if types do not match, but gives no visual typing cue.
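Enforcing types in the interaction can be sketched as a check performed before a wire is accepted. The example below is purely illustrative: the `signatures` table, the type names and the `canConnect` function are assumptions, and they describe neither Yahoo pipes nor the current Open Piping prototype.

```javascript
// Hypothetical type signatures for two boxes.
const signatures = {
  isNumber: { inputs: ["any"], output: "boolean" },
  plus: { inputs: ["number", "number"], output: "number" },
};

// A wire from box `source` to input slot `slot` of box `target` is allowed
// when the slot accepts "any" type or exactly the type that `source` produces.
function canConnect(source, target, slot) {
  const produced = signatures[source].output;
  const expected = signatures[target].inputs[slot];
  return expected === "any" || expected === produced;
}

console.log(canConnect("plus", "isNumber", 0)); // true
console.log(canConnect("isNumber", "plus", 0)); // false
```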

Representing Conditionals. Conditional execution is one of the basic elements of programming. A three-argument function, taking the Boolean that determines which branch is executed and each of the two branches, is a suitable technical answer but, as the prototype workflow shown earlier in Fig. 3 illustrates, visual cues to support the user are clearly lacking.

Fig. 5. Prograph shows the conditional branches within two frames for clarity.

Prograph [5] solves this problem by adding to boxes and wires a third construct, frames, for sections of code that end-users should consider separately.

First-Class Functions. First-class functions are a fundamental benefit of functional programming, but also a difficult concept to represent in ways that users can understand and control. The earlier illustration of Viskell (Fig. 4) shows a lambda-expression within the model, supported by textual type annotation: not every end-user will find it clear.

An alternative relies on the same notion of frames as for conditionals: a function that accepts another as a parameter represents that parameter within a frame. Yahoo pipes adopts that solution, albeit for a limited use of first-class functions: it implements user actions to drop a box into a functional parameter slot. The authors of [6] have investigated the primitives needed to represent completely the power of first-class functions within frames, but the solution does not make the notion easy to visualise.

4 Conclusion and Future Work

The structure of our system lets users retain control of their processes. In particular:

Limits to processing capabilities are not inherent to the system, but instead to the environment in which the process is deployed, for example by setting a processing time limit.

The visual language is loosely coupled to the execution environment, by producing a function definition in an open intermediate representation; this ensures that changes to the visual interface, to the target language, and to the execution environment are independent.

Risks of code injection are limited by transmitting the symbolic expression to an interpretation environment hosted with the execution environment, rather than communicating executable code, as well as by defining in the interpreter which primitive functions are allowable.

We believe that these characteristics can support adoption and self-learning through greater open access to computation.

Currently, our prototype ensures that end-users can define processes, and demonstrates the compilation from the S-expression to JavaScript and its execution. Multiple environments common on web servers and clients are being considered, such as JQuery, PHP and node.js, as well as the deployment of executable results in new systems.

Developing this prototype’s capabilities to support users further will require a balance of technical feasibility, theoretical clarity and empirical evidence to identify the most appropriate solutions.