
1 Introduction

The impact of big data has been increasingly felt across many industry sectors in recent years. Its specific impact on the health sector was examined in 2013 in a report by McKinsey [1], which outlined several key factors likely to drive changes in healthcare policy and delivery. The first was growing demand for greater economic value to be generated from data, including improvements to policy-making processes drawing on existing stored datasets that had not yet been re-used (that is, used for secondary purposes). The second was the expanding volume of data being collected from organisational sources such as clinical devices and processes, as well as personal sources such as social media and mobile devices. The third was the steady growth of open data being published by governments in order to catalyse private sector innovation. Finally, it was noted that technological change was driving predictable improvements in the capture, storage and processing of data from diverse sources.

It was in this context that the MIDAS Project (Meaningful Integration of Data Analytics and Services) was funded under the EU’s Horizon 2020 programme in order to capitalise on big data trends and drive policy improvements in the European health sector. The project is intended to meet the needs of citizens and policy-makers through the provision of a state-of-the-art big data platform. In this paper we present a usability study of the first iteration of the MIDAS platform, focusing specifically on the dashboard layer, which is designed to facilitate the analysis and visualisation of big data for policy-making. We begin by providing a brief overview of the project and the technical architecture of the platform, including the data integration, analytics and visualisation layers. We then explain the usability evaluation methods that were used and present a summary of the main findings from the study. Finally, we discuss the most important implications of the results, review which parts of the methodology were most useful, and outline how the usability analysis of future software development iterations will be carried out in light of the findings.

2 Background

2.1 MIDAS Project Overview

The MIDAS Project consortium consists of a variety of academic, industrial and policy experts from six European countries and the United States. A key focus of the project is ensuring that the information delivered by the MIDAS platform is relevant and actionable, and to this end the consortium has adopted a user-centred design ethos. A broad selection of experts from a diverse spectrum of backgrounds was chosen in order to effectively capture user needs and provide realistic engagement in the co-design of the system [2]. This approach has been driven by a recognition that the ultimate users of the system are likely to be very diverse, and will include stakeholders with very different backgrounds and levels of technical knowledge - from statisticians and data scientists to senior civil servants and politicians.

The core elements of the MIDAS platform will include technical components to enable the collection and preparation of heterogeneous data, and an agreed architecture for data storage, data integration, data virtualisation, data cleansing, deployment and management. The secure management of personal data is a key deliverable, and will be addressed through the development of a ‘privacy by design’ approach. This approach will enable the anonymisation and aggregation of data, while simultaneously making it available to the analytics layer in an interoperable form. The platform will also incorporate a variety of tools and algorithms that will allow meaning to be extracted from data, building on existing machine learning and other analytics methods. From the user’s perspective, a key part of the platform will be the user interface (UI) layer, which will draw aggregated outputs from the analytics layer for display via management dashboards and easy-to-use interactive tools.
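As a purely illustrative example of the kind of aggregation step a ‘privacy by design’ pipeline might apply before record-level data reaches the analytics layer, the following sketch groups individual records into counts and suppresses small cells. The field names, the example data and the suppression threshold are assumptions made for the sketch, not details of the MIDAS implementation.

```python
from collections import Counter

# Illustrative only: aggregate individual-level records into counts per
# (region, age_band) cell and suppress small cells before they are exposed
# to the analytics layer. Field names and the threshold of 5 are
# assumptions for this sketch, not the MIDAS implementation.
SUPPRESSION_THRESHOLD = 5

records = [
    {"region": "North", "age_band": "40-49", "diagnosis": "T2DM"},
    {"region": "North", "age_band": "40-49", "diagnosis": "T2DM"},
    {"region": "South", "age_band": "60-69", "diagnosis": "T2DM"},
]

def aggregate_and_suppress(rows):
    counts = Counter((r["region"], r["age_band"]) for r in rows)
    # Replace counts below the threshold with None so that rare,
    # potentially identifying combinations are never released.
    return {
        key: (n if n >= SUPPRESSION_THRESHOLD else None)
        for key, n in counts.items()
    }

print(aggregate_and_suppress(records))
```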

2.2 MIDAS Platform Architecture

As stated above, a key goal of the MIDAS project is to access and integrate a range of heterogeneous datasets located in many separate systems. A common metadata model will enable these datasets to be harmonised so that common analytics and processing can occur, but a flexible underlying data architecture is required to support the storage and processing of these disparate datasets, as well as the integration of data from diverse sources. The MIDAS platform will be based on Analytics Engines XDP™, a proprietary, scalable and modular data analytics platform, and Analytics Engines Unified Data View (UDV), which enables a single, unified view of the data being analysed. Together these form the core data storage and processing portion of the platform, upon which the analytics and visualisation layers are built.

The architecture of the MIDAS Dashboard is shown in Fig. 1. It is composed of three main layers: the User Interface, a Middleware layer and a Closed Intranet layer. The User Interface is developed with standard HTML, Bootstrap, jQuery and JavaScript technologies. It has a modular design, which will enable any third party to develop their own visualisations in JavaScript. OpenVA is a Java framework developed by VTT, which is used to handle the analytics results received from the analytics layer. A Flask-based REST server manages communications between the user interface and the analytics system, allowing the frontend to retrieve specific parameters or analytics results.
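To make the role of the middleware more concrete, the following is a minimal sketch of the kind of Flask endpoints such a REST server might expose to the frontend. The route names, payloads and the in-memory result store are illustrative assumptions, not the actual MIDAS API.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory store standing in for results received from the
# analytics layer; in the real platform these are handled via OpenVA.
ANALYTICS_RESULTS = {
    "job-42": {"analytic": "kmeans", "clusters": 3, "status": "complete"},
}

@app.route("/api/parameters/<widget_type>", methods=["GET"])
def get_parameters(widget_type):
    # The frontend asks which parameters a given widget type needs
    # before the wizard renders its input form.
    return jsonify({"widget_type": widget_type,
                    "parameters": ["dataset", "variables"]})

@app.route("/api/results/<job_id>", methods=["GET"])
def get_results(job_id):
    # The frontend polls for the results of a previously submitted analytic.
    result = ANALYTICS_RESULTS.get(job_id)
    if result is None:
        return jsonify({"error": "unknown job"}), 404
    return jsonify(result)

if __name__ == "__main__":
    app.run(port=5000)
```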

Fig. 1. MIDAS Dashboard architecture.

2.3 MIDAS Interface

In this section we describe the main components of the MIDAS Platform User Interface, which were the primary focus of the evaluation.

MIDAS Dashboard.

The MIDAS Dashboard has two views for the end user: the sign-in page and the main page. The main page is only shown to the user after they have successfully signed in. The three primary components of the main page are the menu bar at the top of the page, the name bar below the menu bar, and the workspace for the widgets. The workspace can extend beyond the displayed area, in which case scrollbars appear on the bottom and right borders; this is normally triggered when there are more widgets than will fit on the current screen. An example of the MIDAS Dashboard is shown in Fig. 2.

Fig. 2. Screen capture of MIDAS dashboard prototype.

Wizard.

When a user wants to add a widget to the workspace, the relevant widget type can be selected from the top menu, triggering the corresponding widget wizard. The content of the wizard depends on the selected widget type, and some widget types require further input from the user. For example, the analytics widget requires the user to select the appropriate datasets and variables from a list of available datasets. Once these have been selected, the wizard asks the user to choose suitable analytics and visualisation types; the available options are limited to those which are suitable for the selected data. As a final step, the wizard asks for parameters for the selected analytics and visualisations. Once these are chosen, a widget appears on the dashboard and can be placed wherever the user sees fit.
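A hedged sketch of the kind of configuration object such a wizard might assemble step by step is shown below; the field names, analytics options and validation rule are illustrative assumptions rather than the actual MIDAS data model.

```python
# Illustrative mapping from analytic type to the visualisations the wizard
# would offer for it; the options themselves are assumptions for the sketch.
ALLOWED_VISUALISATIONS = {
    "time_series": ["line_chart", "bar_chart"],
    "clustering": ["scatter_plot", "heatmap"],
}

def build_widget_config(dataset, variables, analytic, visualisation, params):
    # Mirror the wizard's final validation step: only visualisations
    # suitable for the chosen analytic are accepted.
    if visualisation not in ALLOWED_VISUALISATIONS.get(analytic, []):
        raise ValueError(f"{visualisation!r} is not valid for {analytic!r}")
    return {
        "dataset": dataset,
        "variables": variables,
        "analytic": analytic,
        "visualisation": visualisation,
        "parameters": params,
    }

config = build_widget_config(
    dataset="hospital_admissions",
    variables=["age", "length_of_stay"],
    analytic="clustering",
    visualisation="scatter_plot",
    params={"k": 3},
)
print(config)
```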

Widgets.

A widget is an object on the MIDAS Dashboard which displays the results of a specific data analytics option. Depending on the specific widget type, the data and analytics can be provided by either external or internal systems, enabling the system to support multiple information sources on the same workspace. External widgets import results from external analytics systems such as the MEDLINE abstract search or the MIDAS social media platform. Internal widgets display the results of internal data and analytics. Each widget has four main functions: move, resize, update and remove. Moving and resizing can be done by clicking and dragging the edges of the widget. The update function is triggered by clicking an icon at the top right corner of the widget; this re-opens the wizard to allow the user to modify the widget parameters. Widgets can be removed by clicking the red removal button at the top right corner of the widget.
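The sketch below illustrates one way a widget and its four functions could be modelled; the class name, attributes and default values are assumptions made for illustration only, not the MIDAS implementation.

```python
from dataclasses import dataclass, field

# Illustrative widget model; attribute names are assumptions for the sketch.
@dataclass
class Widget:
    widget_id: str
    source: str          # "internal" analytics layer or an "external" system
    x: int = 0
    y: int = 0
    width: int = 4
    height: int = 3
    config: dict = field(default_factory=dict)

    def move(self, x, y):
        self.x, self.y = x, y

    def resize(self, width, height):
        self.width, self.height = width, height

    def update(self, **new_config):
        # Corresponds to re-opening the wizard and changing parameters.
        self.config.update(new_config)

# Example: an external widget showing MEDLINE search results.
w = Widget("w1", source="external", config={"service": "medline_search"})
w.move(2, 0)
w.update(query="diabetes type 2")
print(w)
```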

Social Media Widget.

Trying to get a sense of how the public feel about a new policy can be a challenge. The MIDAS social media platform is designed to allow policy-makers to create chatbots to ask questions on their behalf. Responses are analysed using natural language processing techniques and the resulting insights can be displayed via a social media widget. This widget will enable policy-makers to get a sense of how the public feel about specific policy matters by visualising the sentiment detected in the public’s responses. This will indicate, for example, whether people are responding positively or negatively to individual policy proposals.
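As an illustration of the general idea, the following minimal, self-contained sketch scores the sentiment of chatbot responses using a small word lexicon. The actual MIDAS platform uses natural language processing techniques that are not detailed here, so the word lists and scoring rule are assumptions made purely for the example.

```python
# Minimal lexicon-based sentiment scoring over chatbot responses.
# The word lists below are illustrative assumptions only and do not
# reflect the NLP techniques used by the MIDAS social media platform.
POSITIVE = {"good", "great", "support", "helpful", "agree"}
NEGATIVE = {"bad", "worse", "oppose", "unfair", "disagree"}

def sentiment_score(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Score in [-1, 1]; 0 when no sentiment-bearing words are found.
    return 0.0 if total == 0 else (pos - neg) / total

responses = [
    "I agree, this is a great proposal",
    "This policy is unfair and I oppose it",
]
for r in responses:
    print(f"{sentiment_score(r):+.2f}  {r}")
```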

MEDLINE Widget.

The MEDLINE custom widget is an interactive visual tool that helps to surface information by re-indexing search results based on user input. More specifically, the user drags a cursor over a graph showing precomputed clusters of search terms, which triggers updates to the list of relevant documents. For example, when the search term ‘diabetes type 2’ is entered, the system performs an Elasticsearch query over the MEDLINE dataset and extracts groups of keywords that best describe different subgroups of results. By moving the cursor over word-groups, the user provides the relevance criteria, bringing the most relevant articles to the top of the list. The user can read the title and first lines of each abstract and, by clicking on an entry, open the article in the browser at its PubMed location.
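A hedged sketch of the search step behind such a widget is shown below, using the Python Elasticsearch client to query an index of MEDLINE abstracts and return candidate documents together with frequent terms that could be grouped into keyword clusters. The index name, field names and aggregation are assumptions for the sketch, not the MIDAS configuration.

```python
from elasticsearch import Elasticsearch

# Assumed local Elasticsearch instance holding an index of MEDLINE abstracts.
es = Elasticsearch("http://localhost:9200")

# Match documents for the query and aggregate frequent indexing terms;
# "medline", "abstract", "title" and "mesh_terms" are assumed names.
response = es.search(
    index="medline",
    query={"match": {"abstract": "diabetes type 2"}},
    aggs={"keywords": {"terms": {"field": "mesh_terms", "size": 20}}},
    size=50,
)

# Candidate articles for the result list.
for hit in response["hits"]["hits"]:
    print(hit["_source"]["title"])

# Frequent terms that could be clustered into the word-groups the user
# drags the cursor over to re-rank the document list.
print([b["key"] for b in response["aggregations"]["keywords"]["buckets"]])
```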

External Platforms.

A number of linked external analytics resources will be available in later versions of the MIDAS Dashboard. These were not included in the current evaluation study but will be included in future UX testing iterations once connectivity with the external components has been improved.

3 Methodology

3.1 Heuristic Evaluation

The first stage of the protocol was to carry out a heuristic evaluation. This involves an expert auditing the user interface against a series of established usability and design principles. In this case, Ulster University UX lab experts used well-established usability principles such as Jakob Nielsen’s 10 heuristics [3], laws from Gestalt psychology [4] and Ben Shneiderman’s 8 golden rules [5]. The test team included three specialists with extensive experience of UX testing. Each expert applied the whole protocol to the current system, and after these tests the severity of the identified problems was assessed. Following the evaluation, usability issues were set out in a series of slides for sharing with the development team, along with suggestions for improving the system. Most findings were addressed according to these suggestions before the formative evaluation took place.

3.2 Formative Evaluation

For the formative evaluation, a number of subjects were recruited to attempt a series of tasks using the MIDAS prototypes. It had been agreed at the outset of the MIDAS Project that the platform would be tested and validated with real data and representative users of the system from across the partner countries. Twelve users in total were selected from the Basque Country, Finland, Northern Ireland and the Republic of Ireland, drawn from a diverse range of technical and policy backgrounds. Although the final test protocol (see below) focused primarily on two main user personas - a data scientist and a policy-maker - it was decided that each user should perform the tasks associated with both personas in order to maximise data collection. The specific tasks for each persona are presented in Table 1.

Table 1. Tasks for each persona for usability testing protocol.

A rigorous test protocol was jointly developed by consortium members, led by usability testers from Ulster University’s UX Lab. The protocol was informed by the UX Lab’s experience of carrying out a wide range of usability tests on medical devices, software and data visualisations [6,7,8,9,10,11,12,13,14,15,16] (see Fig. 3). Each subject attempted the tasks whilst ‘thinking aloud’. The Think-Aloud Protocol (TAP) [17] allows the assessor to understand the user’s cognitive processes, thereby eliciting usability issues and cognitive errors. Before and after each task, the user was asked how difficult they expected the task to be and how difficult it actually was, thus measuring whether the system met the user’s expectations; this is known as the Single Ease Question (SEQ) [18]. A questionnaire based on the System Usability Scale (SUS) [19] was also given to each user in order to measure perceived usability, and the resulting SUS score was computed and benchmarked against a reference SUS distribution. In addition, after the usability test, we measured the frequency of usability errors/issues, task completion rates, and task completion times.
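For reference, SUS scoring follows a fixed formula: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the total is multiplied by 2.5 to give a score out of 100. The short sketch below implements this standard calculation; the example responses are illustrative only.

```python
# Standard SUS scoring: odd-numbered items contribute (response - 1),
# even-numbered items contribute (5 - response), scaled by 2.5 to 0-100.
def sus_score(responses):
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd-numbered item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# Illustrative set of responses for a single participant.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```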

Fig. 3. Typical usability testing protocol.

Testing sessions using the shared protocol took place in Northern Ireland (covering both Northern Ireland and Republic of Ireland users), Finland and the Basque Country. Notes were taken by moderators or a separate note taker during the sessions, and video was recorded for subsequent detailed analysis. The testing data was gathered and shared in a Google Doc, and the Ulster University team carried out an analysis of the aggregated results. This analysis included summaries of user demographics, task completion rates and times, SEQ (Single Ease Question) scores for each task, and SUS (System Usability Scale) scores for each persona, as well as a summary and analysis of the qualitative comments and insights gathered during the testing sessions. The findings of the UX Evaluation Report were shared with developers to facilitate bug fixes, enhance features, and drive usability improvements as the project progressed.
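As an indication of the kind of aggregation applied to the shared testing data, the sketch below summarises per-task completion rates, mean completion times and mean SEQ scores. The field names and example values are assumptions for illustration, not the recorded study data.

```python
from statistics import mean

# Illustrative per-session records of the kind collected in the shared
# document; field names and values are assumptions for this sketch.
sessions = [
    {"task": "DS-2", "completed": True, "seconds": 95, "seq": 4},
    {"task": "DS-2", "completed": False, "seconds": 140, "seq": 3},
    {"task": "PM-2", "completed": True, "seconds": 40, "seq": 6},
]

def summarise(rows):
    summary = {}
    for task in sorted({r["task"] for r in rows}):
        subset = [r for r in rows if r["task"] == task]
        summary[task] = {
            "completion_rate": sum(r["completed"] for r in subset) / len(subset),
            "mean_seconds": mean(r["seconds"] for r in subset),
            "mean_seq": mean(r["seq"] for r in subset),
        }
    return summary

print(summarise(sessions))
```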

4 Results

4.1 Heuristic Evaluation Results

The primary aim of heuristic evaluation in this study was to ensure that the system was ready for formative testing. Experts evaluated the MIDAS Dashboard interfaces to find the most critical problems and root causes for issues which might confuse end users. Most issues were corrected before end user representatives began the formative testing phase. Based on the heuristic evaluation the technical team fixed a number of issues, including visual fixes on boxes and text alignment, adding better search functionality, including a help menu on the menu bar, reordering menu items, and adding a “save as” option. Some problems and suggested fixes were not corrected before the formative evaluation due to time constraints, including a “forgot password” button for sign-in, better registration information on the sign-in page, an incorrect password lock on sign-in, and coloured sharing symbols for the dashboard menu.

4.2 Formative Evaluation Results

In this section we summarise the key findings and recommendations from the UX Evaluation Report.

Task Completion Times.

An analysis of the task completion rates and times showed that most users were able to complete most tasks without assistance. With the exception of Data Scientist Tasks 2 and 5 and Policy Maker Task 1, the majority of tasks were completed in less than a minute. Data Scientist Tasks 2 and 5 raised a number of specific issues that are explored in more detail in the Qualitative Analysis (below), while Policy Maker Task 1 was an exploratory task for which the time taken was not a particularly relevant metric.

SEQ Results Summary.

An analysis of the Single Ease Question (SEQ) scores showed that, in general, the platform was slightly easier to use than anticipated. One exception was Data Scientist Task 5, which users found more difficult than they had expected; this finding is reflected in the longer task completion times and in the issues raised in the Qualitative Analysis. For the Policy Maker persona, the SEQ scores indicated that users felt that Task 1 - the interpretation of shared dashboards - was not at all straightforward (scoring on average only 3/7). On the other hand, Task 2 - modifying an existing dashboard - was perceived to be relatively easy (scoring on average 6/7).

SUS Results Summary.

In the context of the SUS (System Usability Scale) scores, statements 8 and 10 were outliers for both the Data Scientist and Policy Maker persona tasks. Statement 8 relates to how “cumbersome” users find the interface, although it is important to note that some studies have suggested that this score can be affected by language factors with non-native English speakers [19]. The Statement 10 score suggests that users found the system easy to learn. Another interesting finding from the SUS analysis is that the platform scored better for the Data Scientist persona than for the Policy Maker persona (75.0 vs. 56.7), implying that users perceived that policy makers might find the current prototype more challenging to use, and in particular to interpret its visualisations. Figures 4 and 5 show the distribution of the aggregated results for the Data Scientist and Policy Maker tasks.

Fig. 4. Data Scientist SUS results (box plot showing overall scores for all tasks).

Fig. 5. Policy Maker SUS results (box plot showing overall scores for both tasks).

Qualitative Results Summary.

The qualitative data gives perhaps the richest understanding of the issues and insights that emerged during testing. Despite the diverse nature of the users involved in the usability testing, there were a number of clear recurring themes in the user feedback across all sites. Perhaps the most commonly identified issue was the need for more information to guide users through the system. Options identified included more descriptive text in the menus and wizards, more hover text and pop-up hints and tips, videos on how to create, share and interpret dashboards, and richer user documentation. Dashboard sharing was generally seen as a key feature for development. Many users suggested that the platform should support annotations and comments for widgets and dashboards, as well as chat functionality so that dashboard creators could assist policy makers in interpreting the data. Sharing dashboards would also require some consideration of the need for effective version control, as well as user search functionality.

Alongside these and other high-level feature requests, there were numerous observations on how the interaction with individual widgets and menu items might be improved. Additionally, one user who had spent a lot of time training clinicians on how to use dashboards felt that the model used by the system conflicted with her expectations. For example, this user expected the system to provide the ability to have a number of different tabs or dashboards open simultaneously. While this user’s expectations were tied to a specific product that she had personally spent a number of years working with, it raised an important question about the extent to which the MIDAS design paradigm would align with user expectations based on established products and services.

5 Discussion

As stated earlier, the usability evaluation comprised two main parts - a heuristic analysis and a formative analysis. The advantages of heuristic evaluations are that they are relatively fast and inexpensive to deploy [20]. They also allow problems to be identified early in the design cycle, when they are less costly to fix. One drawback is that usability experts can be hard to come by [20], although this did not apply in the case of the MIDAS Project. Most scholars agree that heuristic evaluations are best seen as a complement rather than an alternative to user testing [21]. For this reason, we decided to apply both heuristic and formative methods in the evaluation. By employing the heuristic method first, and fixing as many of the identified problems as possible before the formative evaluation, the intention was to ensure that maximum value was derived from the user testing. This was particularly important given the logistical challenges of engaging with relevant users from the various MIDAS sites, which in many cases required relatively senior staff to give up significant amounts of time to assist with the testing protocol.

The user-centred formative evaluation phase involved a variety of users from different backgrounds carrying out tasks associated with two key user personas - the Data Scientist and the Policy Maker. The findings showed very little overlap with the results of the heuristic evaluation, which was perhaps not surprising given that developers had already addressed most of the major concerns raised by the usability experts. Task completion times were generally satisfactory, and the Single Ease Question (SEQ) scores for the tasks showed that overall the system was slightly easier to use than most users had anticipated. Both the SEQ and the System Usability Scale (SUS) results indicated that the application was harder to use from a Policy Maker perspective than from a Data Scientist perspective.

Arguably the most valuable feedback came from the qualitative results, which were enabled by adopting a Think-Aloud Protocol approach to the testing. The qualitative feedback enabled the testers to identify key issues which were common to all testing sites. These included the need for more metadata and guidance via the user interface, which many users initially found somewhat confusing. In addition to the need to make functionality more discoverable, the complexity of the underlying datasets meant that users often required help in understanding the information they were attempting to use. Another challenge was raised by one user who had already built up significant experience of an alternative dashboard-based analytics product; in their case, the UX paradigm adopted by the MIDAS platform did not align with prior expectations. Ultimately, the researchers were able to reach a compromise proposal on how dashboards should be presented to users. This does raise an interesting question for developers, however: how can they balance the need to follow established industry practices and user expectations with the desire to pursue more innovative and perhaps “superior” interface solutions?

6 Conclusions

Although research indicates that heuristic evaluation tends to uncover similar problems to more user-centred methods, it is generally accepted that it should be seen as a complement to usability testing [21]. In order to reduce the potential for duplicated effort in this study, the heuristic evaluation was completed and the software was updated before the formative user-centred evaluation was carried out. This approach meant that the majority of concerns raised during the formative testing were newly identified. Testing with a variety of users across multiple sites allowed the researchers to identify recurring issues that should be prioritised for future development. While the task completion times, SEQ scores and SUS scores were useful for generating quantitative usability measures, the most valuable feedback for the development team came from the qualitative data. Analysing users’ comments as they spoke about their experience of using the platform provided the richest insights for determining how ongoing development work should be targeted. This perhaps reflects the fact that software usability is by its nature a complex construct, requiring a depth of analysis that can only come from a qualitative approach.

It should be borne in mind that this evaluation study of the MIDAS platform focused on an early prototype, with incomplete functionality and limited access to client data. For future iterations of the platform and usability testing, it is anticipated that the functionality of the application will be significantly extended and a richer ecosystem of datasets will be available for analysis. It is likely that large parts of the existing test protocols will be re-used for subsequent evaluation exercises. One important lesson learned through this process was the scale of the challenge in recruiting suitable users and coordinating times to attend lab-based tests; it was clear that future test events should be scheduled well in advance, preferably two to three months beforehand. Future test protocols will be adapted to focus on the specific needs of each MIDAS pilot site, and are expected to involve a richer user interface that includes external data sources and visualisation tools.