1 Introduction

The primary aim of this paper is to describe the implementation and applications of a scripting language, the Interactive Sonification Markup Language (ISML), designed to expedite the creation of parameter sets and specifications for interactive sonification [1] research. Sonification [2], and interactive sonification in particular, is multidisciplinary by nature, spanning computer science, psychology, HCI, acoustics, music, and sound design. We hope to promote greater efficiency in implementing experimental parameters by demonstrating a method that allows sonification researchers, even those with no programming experience, to efficiently alter the software specifications of an interactive sonification research platform.

To this end, ISML needs to be flexible enough to support a wide variety of experimental setups, yet simple enough to be easily produced and parsed. Thus, once the initial language specification was complete, our work focused on creating a graphical user interface (GUI) to facilitate the creation of ISML scripts by non-programmers. As the auditory display and sonification community has grown, this type of effort [e.g., the Auditory Menu Library for auditory menu research, [3]] is expected to increase the efficiency and accessibility of sonification research.

The present paper briefly describes the sonification research platform that gave rise to ISML, the motivation for ISML’s development, its effectiveness in actual research, and its language features and GUI. Finally, limitations of the current system and potential expansions and improvements are discussed. A description of the ISML specification is given in the appendix.

2 Background and Motivation

Interactive sonification research has various applications, such as enhancing learning and overall user experience. For example, research [4–6] suggests that combining physical interaction with sonified feedback improves users’ learning. Other studies [7–9] have shown that interactive sonification can also provide users with a more enjoyable experience and improve accessibility for more diverse audiences.

To conduct interactive sonification research across these applications, we have developed the Immersive Interactive Sonification Platform (iISoP) [10], which generates sounds based on tracking users’ motion: location, movements, and gestures. We built the iISoP system using the JFugue library (a Java API for music programming, www.jfugue.org). iISoP leverages the university’s Immersive Visualization Studio (IVS), which consists of twenty-four 42″ monitors driven by a computing cluster of eight machines with high-end graphics cards, along with a Vicon tracking system composed of 12 infrared cameras for tracking objects inside the lab. In constructing iISoP, we developed phased projects as testbeds: an interactive map, a virtual instrument, and dance-based sonification [see [10] for more details].

The third phase of iISoP uses parameter mapping [11] to map data features—in this case, users’ physical location, movements, and gestures—into sound. This is accomplished through the following process. Users (dancers, performing artists, kids, and robots) wear special reflective markers so that their movements can be tracked by the Vicon system, which relays the users’ positional data to the head node of the IVS cluster. There, the data is parsed to generate an abstract visualization of the users’ movements on the wall of monitors. The data is simultaneously forwarded to another computer running a program that parses the positional data to dynamically generate sounds or improvise real-time music in response to the users’ movements.
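
Although the concrete motion-to-sound mapping is configurable (see Sect. 4), the flavor of this final step can be illustrated with a minimal sketch. The following Java fragment assumes the JFugue 5 API and an invented MotionSonifier class (not part of the actual iISoP source); it maps a tracked object’s speed onto a rising scale:

    // Minimal sketch, not the actual iISoP parser: map a tracked
    // object's speed (m/s) onto a one-octave scale and play a note.
    import org.jfugue.player.Player;

    public class MotionSonifier {
        private static final String[] SCALE =
            {"C5", "D5", "E5", "F5", "G5", "A5", "B5", "C6"};
        private final Player player = new Player();

        public void sonify(double speedMetersPerSec) {
            // Clamp speed into [0, 4] m/s, then index into the scale.
            double clamped = Math.max(0.0, Math.min(4.0, speedMetersPerSec));
            int idx = (int) Math.round(clamped / 4.0 * (SCALE.length - 1));
            // Staccato string: piano instrument, then a quarter note.
            player.play("I[Piano] " + SCALE[idx] + "q");
        }
    }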

However, if the method the audio parser uses for generating sound were fixed, it would limit the usefulness of the platform. Therefore, the motion-to-sound mapping should be configurable, so as to facilitate various kinds of research and experiments. Because many of the researchers (psychologists, sound designers, performing artists, etc.) using iISoP may lack programming experience, expecting them to configure the system by altering the source code by hand would be unreasonable.

ISML was developed to bridge this gap between non-technical researchers and the research platform. ISML is a simple scripting language for configuring iISoP’s motion-to-sound mapping. To make generating these scripts even easier, ISML includes a web-based GUI, so that researchers can create a script simply by responding to prompts rather than learning ISML’s syntax and semantics. This method of configuring the mappings is far more efficient and accessible to non-technical researchers than reprogramming the system for every new experiment. Figure 1 illustrates ISML’s role in the iISoP system.

Fig. 1. ISML’s role in the iISoP system

The format of ISML’s syntax is loosely inspired by markup languages, a system for annotating documents so that the annotations are distinguishable from the actual content [12]. Two major markup language standards are Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML), the latter of which is a simplified version of the former designed to maintain its most useful aspects [13, 14]. While markup files are typically “documents” of various kinds, they can also represent arbitrary data structures [15]. ISML files, which can be thought of as an activity flowchart or state machine defining application behavior in response to external inputs, are more similar to the latter. Note that ISML is only inspired by these standards and does not make any attempt to conform to them.

ISML also bears similarity to scripting languages, since it is designed to automate the behavior of iISoP’s sound generation application. Specifically, it is an example of an audio synthesis scripting language, of which several others exist: a non-exhaustive list includes the ChucK audio programming language [16], the C-based audio programming language Csound [17], the commercial Reaktor software, and the MPEG-4 Structured Audio standard [18]. Some of these systems use graphical interfaces for their scripting languages, while others are textual. Additionally, some of them support live coding, the ability to change a program’s behavior while it is running.

By comparison, ISML scripts can be created in either a graphical or a textual manner. The present version of ISML does not support live coding. To the authors’ knowledge, ISML is the only scripting language specifically designed to translate physical movements into dynamically generated sound, which sets it apart from other audio scripting languages.

3 Cost Efficiency: Programmer Time and Effort

Based on our own experience developing the system (two CS/ECE Ph.D. graduates and one CS undergraduate), creating a complete iISoP specification by programming it from scratch takes about 50 h for an experienced programmer. Reusing a preexisting specification in the creation of a new one would likely cut that time to about 20–40 %, or 10–20 h. By contrast, creating an equivalent specification for a research study using ISML takes about 5 h from scratch, or 1–2 h when reusing an existing specification, with the added advantage of not requiring extensive training in programming. Getting ISML fully operational requires an initial time investment of approximately 70 h. Taking all of this into account, Table 1 compares the total specification development times of each approach, demonstrating that the time savings of using ISML, as opposed to hard coding new specifications from scratch for each research study, rapidly accumulate. The development cost of ISML “pays for itself” after approximately 3–4 specifications.
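
As a rough check using midpoint estimates, hard coding costs about 50 + 15(n − 1) h for n specifications (50 h for the first, roughly 15 h for each reuse-based one), while ISML costs about 75 + 1.5(n − 1) h (70 h of up-front development, 5 h for the first specification, and roughly 1.5 h per reuse). The two totals cross near n ≈ 3, in line with the break-even estimate above.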

Table 1. Comparison of spec development times with and without ISML

4 System Description

4.1 Overview

The iISoP system dynamically changes its sound output via the following procedure. First, the system checks whether a set of conditions has been met; for example, “the user is currently moving at a speed equal to or greater than 1 meter per second”. If these conditions are met, the system executes one or more actions; for example, “change the key signature to C major and the time signature to 4/4”. Conditions are optional: it is possible to specify a set of actions that executes unconditionally.

Each set of conditions and actions is organized into an activity. Activities serve as a mechanism for grouping conditions and actions together, and having multiple activities makes it possible to maintain several independent sets of conditions and actions. Within an activity, all of the conditions must be satisfied for its actions to be executed; conditions in one activity have no bearing on any other. Thus, considered collectively, activities and their conditions form a disjunction of conjunctions.

Lastly, activities are organized into items. Items provide a scoping mechanism for variables: ISML has 26 variables (a–z) available for use, but each variable exists independently within the item in which it is used; that is, each variable has item scope. Within each item, activities are executed in sequential order from top to bottom. Every cycle of the system, each item and all of its activities are executed (all conditions are checked and the corresponding actions are executed). There is no limit on the number of items an ISML script may contain.
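
A schematic sketch of one such cycle, written in Java with hypothetical Item, Activity, Condition, and Action types (these names are invented for illustration and are not drawn from the actual iISoP source), may help make the semantics concrete:

    import java.util.List;

    interface TrackingState {}  // positional data from the Vicon system
    interface SoundEngine {}    // wrapper around the JFugue-based generator

    interface Condition { boolean holds(TrackingState s, double[] vars); }
    interface Action    { void execute(SoundEngine e, double[] vars); }

    record Activity(List<Condition> conditions, List<Action> actions) {}
    record Item(double[] variables, List<Activity> activities) {}

    class IsmlInterpreter {
        /** One evaluation cycle: visit every item and every activity;
         *  an activity's actions run only if all of its conditions hold. */
        static void cycle(List<Item> items, TrackingState state, SoundEngine engine) {
            for (Item item : items) {
                double[] vars = item.variables();              // item-scoped a-z
                for (Activity activity : item.activities()) {  // top-to-bottom order
                    boolean satisfied = activity.conditions().stream()
                            .allMatch(c -> c.holds(state, vars));
                    if (satisfied) {                           // conjunction within an activity
                        activity.actions().forEach(a -> a.execute(engine, vars));
                    }
                }
            }
        }
    }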

The current version of ISML allows for two basic types of conditions: object and comparison. An object condition applies only to an object in the Vicon tracking system that bears the user-specified name, for example, “left_foot”. A comparison condition tests whether two values are equal to, greater than, or less than each other. Values that can be compared include constants, variables, current beats per minute, current velocity (along the X, Y, or Z axis, or the composite velocity across all three), average velocity, acceleration, proximity (the average distance between all tracked objects), current position, and elapsed time (which may be made repeatable for conditions checked at regular intervals).

Additionally, the current version of ISML allows for the following types of actions (see the illustrative sketch after this list):

  • Assignment. Sets one value equal to an expression. An expression may be another value, or two values combined by an arithmetic operation (e.g., a = b*c).

  • Set the key signature.

  • Set the time signature.

  • Set the instruments being used.

  • Play a specific sequence of notes.
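
For concreteness, a hypothetical script fragment in the spirit of ISML’s markup-inspired syntax might pair a condition set with actions as follows; the element and attribute names here are invented for illustration, and the authoritative grammar is given in the appendix:

    <item>
      <activity>
        <conditions>
          <object name="left_foot"/>
          <compare op="gte" left="velocity" right="1.0"/>
        </conditions>
        <actions>
          <assign var="a" expr="(* b c)"/>
          <key signature="C-major"/>
          <time signature="4/4"/>
        </actions>
      </activity>
    </item>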

4.2 Graphical Editor

Since ISML is designed to be usable by non-programmers, it includes a GUI for creating scripts via prompts. The GUI is written entirely in JavaScript, so it runs in a Web browser; and because it uses no server-side code, it can be run from any computer.

The GUI’s appearance is shown in Fig. 2. Upon startup, the user is presented with a blank script and can access a comprehensive help guide, load an existing script, or begin creating a new script by clicking the “Add Item” button. From there, the GUI dynamically generates additional buttons for adding and deleting the various components of the script under development. To prevent mishaps, a confirmation checkbox must be selected before script components can be deleted.

Fig. 2. Screen capture of the ISML graphical editor (Color figure online)

To aid user comprehension, each component of the GUI is indented and color-coded so that distinguishing different activities, condition sets, and so on is more intuitive. Additionally, a “mapping summary” is generated at the side of the screen, summarizing the motion-to-sound mapping created so far. These features lighten the cognitive burden on the user by keeping the visual representation of the script organized.

Once editing is complete, the user can download a valid ISML-formatted file by clicking the “Download ISML File” button. This file can be further tweaked by hand if desired, or loaded back into the GUI editor at any time for revision.

4.3 Feature Development

The ISML specification and accompanying editor have gone through several small revisions. After initial development, the specification was changed from infix to prefix notation for easier parsing, and semantic error prevention was added to the GUI editor. Various other tweaks were applied to the specification to simplify parsing, and then the “mapping summary window” (Fig. 2, right side) feature was added. Finally, the color scheme was changed to be more visually pleasing. Both the ISML specification and the editor are in a fluid state and will undergo iterative development cycles with end users.
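
For instance (an invented illustration, since the concrete grammar appears in the appendix), an assignment written in infix as a = b * c becomes (= a (* b c)) in prefix notation, which a simple recursive-descent parser can consume without any operator-precedence handling.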

5 Discussion and Future Work

Faste and Faste [19] proposed four categories of design research. With ISML, we have engaged two of those categories: “research on design,” by investigating our own research process for possible enhancements, and “design of research,” by planning and preparing for future research on the system. In our experience, the ability to rapidly generate new sonification configurations has already shown benefits, and the time savings will continue to accumulate as more research is conducted with the system. This approach reduces repetitive development tasks across experiments and increases accessibility for researchers from various disciplines who do not necessarily have programming experience.

In the future, we intend to strengthen our claims of ISML’s effectiveness with an empirical study and to continue developing ISML to improve its efficacy; for example, by adding features for volume control, panning, and audio filters. Additionally, the current version of ISML only supports the configuration of audio. In the future, we plan to add the ability to configure the visualization parameters and emotional parameters through ISML as well.