10.1  Introduction

Large-scale testing continues to play an important role in earthquake engineering, generating research results that lead to improved safety and security of European society. However, realistic tests of large or complex structural systems are constrained by the capacities of existing laboratories.

Distributed hybrid testing enables the use of resources at different locations to perform more complex, larger-scale tests than are possible in most individual laboratories. In a distributed experiment, the system under test is split into several sub-structures which can be tested or simulated in different locations. Data are passed between the sub-structures at each time-step to ensure that the distributed experiment realistically simulates the full system under test.

Conducting an experiment in a distributed manner brings multiple benefits but also some difficulties: organizing and planning distributed experiments is considerably more complex than a single-laboratory hybrid test. In a distributed test, the experimental procedure and the systems that support the experiment execution must be closely linked, and multi-site experiment coordination can entail many technical and social challenges (De la Flor et al. 2010).

Moreover, tracing errors caused by the distributed environment is harder than in a local one. Errors can occur in any participant of the experiment or in the communication between them, and tracing where and why something failed can be extremely complicated. When a system raises an error during an experiment, it can be detected and fixed. Unfortunately, not every error is raised and recognised at its source, and many have to be traced through several distributed systems or devices. Even when no major or obvious error appears to have occurred, bugs in programs or unexpected situations might give the impression of a test being conducted successfully yet produce incorrect results. Tracing where the test failed involves following all testing steps carefully, and sometimes the erroneous situation is impossible, or at least difficult, to reproduce. All of this points to the importance of a platform to support testing activities, dividing them into two groups: high-level activities (experiment planning, organization, participant location…) and low-level activities (the actual data exchange).

Several programs exist to conduct distributed tests, such as UK-NEES DHT (developed at the University of Oxford (Ojaghi et al. 2010)), OpenFresco (developed at the University of California, Berkeley (Schellenberg et al. 2010)) and the Platform for Networked Structural Experiments (developed at the National Center for Research on Earthquake Engineering, Taiwan (Yang et al. 2004; Wang et al. 2004)). All of these programs expect to communicate with a counterpart copy of the same program at the other end, so it is not possible, for instance, to use UK-NEES DHT at one end and OpenFresco at the other. In other words, there is currently no service integration between distributed testing programs, which is one of the motivations for the creation of a common framework called Celestina.

The Celestina framework is explained in Sect. 10.2 of the paper. In Sect. 10.3 we describe a first implementation of the framework and its evaluation through a series of distributed experiments. In a typical substructured test, nodes at Oxford and Kassel Universities were used to simulate the response of a 33-degree-of-freedom steel frame fitted with a tuned mass damper, with both nodes conducting testing (in simulation) according to instructions from a Celestina-based program running in Oxford. Finally, conclusions and future developments are presented in Sect. 10.4.

10.2  The Celestina Framework

The Celestina framework is a specification to support high-level activities such as identification and location of participants, compatibility verification of participants, experimental planning and results collection.

When conducting an experiment on the Celestina framework, one computer acts as a manager, sending orders to the rest of the experiment participants to instruct them how to proceed and to organise the experimental workflow. The experiment plan is defined only by the manager, so researchers need to specify the experiment steps just once, in one place. The manager then contacts every participant to instruct them specifically what to do at each workflow step.

The Celestina framework specifies what to do but not how to do it. The experiment manager sends orders about what to do, but it is the decision of each experiment participant how to carry out the instructions given by the manager. Likewise, the Celestina framework does not dictate how the actual data exchange of the low-level testing activities is conducted between participants; it only defines the data types that have to be exchanged at each experiment step.

To enable understanding between the manager and the experiment participants sharing laboratory facilities, the specification defines a set of services. These services must be implemented by any testing software integrated into the Celestina framework, but their specific implementation is actually decided by the software. The Celestina services are divided into three groups, as depicted in Fig. 10.1:

Fig. 10.1  Celestina services overview

  • Networking services: deal with locating and discovering new Celestina nodes in the network (the experiment participants), and with identifying nodes and the resources they share.

  • Definition services: deal with compatibility, preparation of an experiment and the verification that the experiment is feasible and can be achieved. It involves verification of connectivity, checking that the data types used in potentially heterogeneous systems are compatible and validating an agreed experiment plan.

  • Testing services: deal with the actual experiment execution. This involves establishing the communication, running the data exchange over the network and collecting the results.

These services will be discussed in the following sections.
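As a rough sketch, the three service groups could map onto a ground-node interface such as the following. All class and method names here are assumptions made for illustration; the specification deliberately leaves the concrete realisation to each piece of testing software.

```python
from abc import ABC, abstractmethod


class GroundNodeServices(ABC):
    """Illustrative sketch of the three Celestina service groups;
    names are assumptions, not part of the specification."""

    # Networking services: node identity and shared resources
    @abstractmethod
    def identify(self) -> dict: ...

    # Definition services: feasibility checks before a test
    @abstractmethod
    def verify_link(self, peer_id: str) -> bool: ...

    @abstractmethod
    def validate_plan(self, plan: dict) -> bool: ...

    # Testing services: the actual experiment execution
    @abstractmethod
    def run_step(self, step: dict) -> dict: ...


class StubGroundNode(GroundNodeServices):
    """A trivial implementation: how each service is realised in
    practice is entirely up to the testing software."""

    def identify(self):
        return {"id": "stub", "resources": []}

    def verify_link(self, peer_id):
        return True

    def validate_plan(self, plan):
        return True

    def run_step(self, step):
        return {"status": "done"}
```

A sky node would then only depend on this interface, never on the concrete implementation behind it.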

10.2.1  Networking Services

The Celestina framework is organised as a hybrid peer-to-peer (P2P) network operated on top of the Internet infrastructure. A peer-to-peer network is a network in which all machines share some of their hardware resources and any machine is accessible by any other machine in the network without intermediaries (Schollmeier 2010). The Celestina hybrid P2P network is divided into sky nodes, which are pure managers in charge of controlling the experiment plan, and ground nodes, which are in charge of the simulation execution. Sky and ground nodes are connected as depicted in Fig. 10.2. In terms of the network, sky and ground nodes are treated equally; the only practical difference is that sky nodes are expected to be online all the time and are not required to share laboratory facilities.

Fig. 10.2  P2P Celestina network divided into sky and ground nodes

Every Celestina node has a unique ID that identifies it within the network. Since the network is totally unstructured and decentralised, every node keeps a list of known nodes (also called the friendbook). Friendbooks are exchanged between nodes on a one-to-one basis when they get in contact. When a node joins the Celestina network, it should know the contact details of at least one other node (preferably a sky node) already in the network, so that its own contact details are propagated through the network and it starts building a larger friendbook. To make friendbook information persistent, a file, database or other structure must be used.
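A minimal sketch of the join-and-exchange mechanism might look as follows. The class and method names are assumptions for illustration only; the specification does not prescribe how friendbooks are represented or exchanged.

```python
import uuid


class Node:
    """Illustrative sketch of friendbook exchange between nodes;
    names and structures are assumptions, not the specification."""

    def __init__(self, node_id=None, contact="host:port"):
        self.node_id = node_id or str(uuid.uuid4())  # unique network-wide ID
        self.contact = contact
        self.friendbook = {}  # known nodes: id -> contact details

    def join(self, bootstrap):
        """Join the network via a node already in it (preferably a
        sky node), swapping friendbooks on a one-to-one basis."""
        self.friendbook[bootstrap.node_id] = bootstrap.contact
        bootstrap.friendbook[self.node_id] = self.contact
        # each side learns the nodes the other already knows
        merged = {**self.friendbook, **bootstrap.friendbook}
        self.friendbook.update(merged)
        bootstrap.friendbook.update(merged)
```

In a real node the friendbook would additionally be persisted to a file or database, as noted above.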

10.2.2  Definition Services

The definition services verify the feasibility of an experiment, in terms of the participants' understanding of the nature and configuration of the experiment, the possibility of establishing network connections and an agreement on the terms under which the experiment is to be conducted. Each definition service is commanded by a sky node and executed by the respective ground nodes, which return the results once completed. The main services are discussed below.

One of the services verifies the link connection between two nodes. This is normally done by means of a ping command, which is considered a low-level network command, but in practice there is no restriction on how nodes test the link. Typically, two nodes will each be ordered to execute a ping command towards the other and return the results to the sky node. Note that this is only a first, illustrative verification: ping might work in one direction but not the other, and it does not accurately reflect whether a test can actually be conducted or the time-step that can be achieved.
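The bidirectional nature of this check can be sketched as below. Here `ping` is a hypothetical callable standing in for whatever low-level command the nodes choose; nothing in this sketch comes from the specification itself.

```python
def verify_link(node_a, node_b, ping):
    """Order each node to ping the other and collect both results
    for the sky node. A success in one direction does not imply the
    other, and a successful ping does not guarantee that a usable
    test time-step can be achieved."""
    return {
        f"{node_a}->{node_b}": ping(node_a, node_b),
        f"{node_b}->{node_a}": ping(node_b, node_a),
    }
```

Because both directions are reported separately, the sky node can detect asymmetric connectivity (for example, a firewall blocking traffic one way).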

Another service verifies data compatibility, checking whether two nodes understand each other in terms of the exchanged data types. To do so, the sky node sends two values to a node (normally similar to the ones that will be used during the experiment) together with an operation. The node then sends this information to the second node (so the network link is verified at a higher level), which applies the operation to the two values and returns the result. The first node verifies that the received result is exactly the expected result and reports this to the sky node. For example, a sky node might send the values "7.803" and "124.986" with the operation "multiplication". The first ground node multiplies the two values to obtain the expected result, then sends the information to the second ground node, which performs the multiplication and sends its result back. This guards against different machines interpreting values in different ways, in which case the result sent by the second ground node would differ from the expected one.
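The check performed by the first ground node can be sketched as follows. The operation names and the `send_to_peer` callable are assumptions standing in for the real network exchange.

```python
import operator

# Supported operations; the names here are illustrative assumptions.
OPS = {"multiplication": operator.mul, "addition": operator.add}


def compatibility_check(value_a, value_b, op_name, send_to_peer):
    """Sketch of the data-compatibility service: the first ground
    node computes the result locally, forwards the values and the
    operation to the second node via `send_to_peer`, and compares
    the peer's answer with its own."""
    expected = OPS[op_name](value_a, value_b)
    received = send_to_peer(value_a, value_b, op_name)
    return received == expected
```

A peer that silently loses precision (for instance, by truncating values on the wire) would return a slightly different product and fail the exact comparison.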

A third service ensures that the experiment plan is agreed between all participants. Ground nodes must be able to analyse the experiment plan sent by the sky node and determine whether they are able to conduct it. For example, if a ground node is only capable of conducting a two-site test, it will reject any multi-site test. Similarly, if the experiment uses unknown data types or the proposed date is not suitable, the experiment plan can be rejected.
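Such a check might be sketched as below; every field name is an assumption made for this example, since the specification leaves the plan representation to the implementation.

```python
def validate_plan(plan, capabilities):
    """Hypothetical ground-node validation of an experiment plan
    sent by the sky node; all field names are illustrative."""
    if plan["sites"] > capabilities["max_sites"]:
        return False  # e.g. a two-site-only node rejects a multi-site test
    if not set(plan["data_types"]) <= capabilities["known_types"]:
        return False  # plan uses data types unknown to this node
    if plan["date"] not in capabilities["available_dates"]:
        return False  # the proposed date is not suitable
    return True
```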

10.2.3  Testing Services

The testing services are the most complex of the three groups of services. They define an experiment workflow and, unlike the network or definition services, they have to be called in a strict order. The experiment workflow defines the following phases:

  • Test locking: This prevents other tests from running at the same time.

  • Establishment of connections: Every node that needs to communicate with others will establish network connections (if necessary) and leave them open ready to start the test.

  • Preparation: During this phase, nodes will process all instructions in the experiment to have them ready for the final data exchange. Researchers might need to conduct some additional tasks in the laboratory to get the physical devices ready.

  • Experiment execution: This is when the experiment takes place and the actual data exchange occurs. At this point, no other task on the computers should interfere while the experiment is running. The experiment will finish either because an error arises or because an “end-of-experiment” signal is transmitted.

  • Result collection: After the sky node is informed of the end of the test, it will attempt to collect results (mainly experimental data) from each of the participants. Every participant will normally generate a signal file for each command they managed during the test.

  • Test unlocking and closing resources: Once the experiment is finished, every node receives the order of cleaning resources and reverting to a stable initial state, ready to conduct a new test.
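The strict ordering of these phases can be sketched as a small state machine; the short phase names below abbreviate the phases listed above and are otherwise an assumption of this sketch.

```python
PHASES = ("lock", "connect", "prepare", "execute", "collect", "unlock")


class TestingWorkflow:
    """Sketch of the strict ordering imposed on the testing
    services: each phase is accepted only as the next step."""

    def __init__(self):
        self._next = 0  # index of the next phase allowed to run

    def advance(self, phase):
        """Accept `phase` only if it is the next one in the workflow,
        otherwise signal an ordering violation."""
        if self._next >= len(PHASES) or phase != PHASES[self._next]:
            raise RuntimeError(f"phase {phase!r} called out of order")
        self._next += 1
```

This contrasts with the networking and definition services, which may be invoked in any order.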

10.2.4  Services Implementation in the Nodes

There are two types of implementations in the Celestina framework. Every ground node in Celestina has to implement the three groups of services and provide mechanisms to conduct each of them (nodes are free to decide how to implement them in practice). Sky nodes have to implement methods to call ground-node services, managing orders and data involved in the process. The nodes should normally provide a user-friendly interface for the operator.

While it is very useful to have many different implementations of ground-node software, sky-node software is less critical: it can be implemented once and reused in many places. The reason is that ground-node software may need to implement different testing behaviours, whereas the main operation of sky nodes rarely varies.

The next section will discuss how the Celestina framework can be currently used.

10.3  A First Celestina Implementation

The Celestina specification has been implemented as functional software using Java and Web Services, and tested in a distributed environment between the universities of Oxford and Kassel. This implementation covers both the Celestina high-level activities and the data exchange required by the low-level activities. It is very flexible in terms of the experiment plans that can be conducted: all the experiment steps can be configured at the sky node, and the ground nodes execute these steps automatically without requiring any code modification. For instance, an experiment could be configured to send two “float” values from one node to another and receive three “float” values as a response. Should it be decided that the experiment uses “double” values instead of “float”, sending one value and receiving four, this can be configured in the experiment plan without any code modification. Ground nodes simply adapt themselves to the experiment configuration.
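The idea of a plan-driven exchange can be sketched as below. The encoding shown is an assumption for illustration only, not the actual Celestina wire format.

```python
import struct

# Hypothetical mapping from plan-level type names to wire formats.
# Because the count and type come from the experiment plan, switching
# from two "float" values to four "double" values needs no code change.
FORMATS = {"float": "f", "double": "d"}


def pack_step(values, data_type):
    """Encode one exchange step's values using the plan's type."""
    return struct.pack(f"!{len(values)}{FORMATS[data_type]}", *values)


def unpack_step(payload, count, data_type):
    """Decode `count` values of the plan's type from the payload."""
    return list(struct.unpack(f"!{count}{FORMATS[data_type]}", payload))
```

The ground node never hard-codes the message layout; it derives it from the plan at run time.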

Several experiments were conducted to validate the operation of the Celestina framework and its implementation. Nodes at Oxford and Kassel were used to simulate the response of a 33-degree-of-freedom steel frame fitted with a tuned mass damper, with both nodes conducting testing (in simulation) according to the instructions given by a Celestina sky node installed at Oxford. Two different implementations of Celestina were tested (“with Debug” and “FastJava”). Figure 10.3 shows the performance of the Celestina implementations compared to other testing software (NCREE and OpenFresco) under similar testing circumstances. The “direct communication” shows the performance of the numerical testing program by itself—with no additional software. As the figure shows, there is no significant overhead added by the operation of the Celestina FastJava implementation.

Fig. 10.3  Celestina performance compared to other testing software (average time required per test step; latency not guaranteed)

In order to use Celestina and take advantage of the framework, testing software has to implement the Celestina specification and be aware of its existence. Existing software can be operated under Celestina, but if it does not implement any of the services, the real benefit of the framework is rather limited.

Current options to use the framework are the following:

  • Using the Celestina implementation created by the University of Oxford and validated with the University of Kassel. This is the implementation that has been used to validate the whole framework and to demonstrate the capabilities of service integration as well as the execution of low-level testing activities without significant overhead.

  • Integrating existing software, or creating new software, following the Celestina specification. Once the software is aware of the specification, it can take full advantage of the service integration.

The success of Celestina is bound to the commitment of testing software to implement the specification.

10.4  Conclusions

In this paper we have discussed the Celestina framework, which is a specification to support high-level activities (such as experiment planning and verification of experiment feasibility) as well as to promote the integration of heterogeneous testing software. Celestina defines three groups of services that must be implemented by every node conducting experiments in the Celestina network. Two types of nodes, sky and ground, are considered, and they are integrated into a P2P network. Sky nodes are managers of the experiment, whereas ground nodes share their laboratory facilities and conduct the actual data exchange.

Several compatibility checks can be conducted under Celestina, such as testing the network link or verifying data-type compatibility. A first implementation has been developed at the University of Oxford and tested with the University of Kassel to validate the framework and demonstrate its capability to conduct distributed testing without significant overhead.

Future efforts should focus on promoting the framework for use in existing and new testing software, and on improving current implementations and creating new ones.