1 Introduction

This paper describes the design of an experimental methodology that aims to evaluate the User eXperience (UX) of a new Web platform, called UTAssistant (Usability Tool Assistant) [1]. This is a semi-automatic usability evaluation tool that supports practitioners in usability evaluations of Web systems and services provided by a public administration (PA), according to the eGLU 2.1 technical protocol [2]. This protocol provides a set of principles and procedures to support specialized usability assessments in a controlled and predictable way.

The UX evaluation design of UTAssistant described in this paper is an experimental methodology for assessing the UTAssistant platform with end-users and Web managers of PA Web sites, both in a laboratory setting and using a Web-based recruitment platform. The methodology proposed here involves several types of end-users, with the aim of (i) assessing the UTAssistant platform through bio-behavioral measurements; (ii) assessing the usability evaluation process of UTAssistant with Web managers in Italian PA; (iii) conducting a heuristic evaluation of UTAssistant with UX experts; and (iv) conducting a usability evaluation of UTAssistant with a highly representative number of end-users recruited through a Web-based platform.

2 Usability Testing of Italian Public Administration Web Services

In October 2012, the Department of Public Function of the Italian Ministry for Simplification and Public Administration formed a working group called GLU (Working Group on Usability). The GLU team was composed of Italian universities, central and local Italian PAs, and independent information and communication companies. The purpose of GLU is to support PA practitioners involved in Web content management, website development, or e-government systems development in performing usability evaluations, particularly those who are not usability experts. The primary goal of GLU is to collect and identify golden rules for developing and evaluating systems that are easy to use and fit for this purpose. To this end, GLU developed a set of guiding protocols that operatively support both the analysis and the evaluation of graphical user interfaces for the Web. GLU can guide Web masters, and its protocols are explorative tools for investigating how good or satisfactory the experience of a user is when using a PA Web service, e.g. searching for certain information, consulting or downloading a digital document, or completing an online form. GLU protocols guide PA practitioners in exploratory analyses to better understand the problems (or strengths) of their Web services, in order to collect use cases for future development. Since 2013, GLU has developed four different usability evaluation protocols, called eGLU 1.0, eGLU 2.0, eGLU 2.1 [2], and eGLU-M [3]. Three of these protocols (eGLU 1.0, 2.0, and 2.1) are designed for desktop solutions, while the fourth (eGLU-M) is designed for mobile platforms [3].

The eGLU 1.0 protocol was developed in May 2013 [4]. The protocol involves two levels of analysis, basic and advanced, which can be used independently of each other according to the testing period and the practitioner’s skill. The basic level is specifically recommended for performing quick analyses to check the main problems affecting the usability of a small number of Web pages. It is a macroscopic analysis that asks users to freely navigate the content of the main pages of a given Web service, and then to complete a questionnaire investigating the quality of the interaction. In a basic level analysis, practitioners primarily collect information on how many navigation tasks users achieved or failed, how difficult it was for users to perceive or understand Web interface elements, and user satisfaction. The advanced level analysis is recommended for practitioners who need a more detailed analysis of interaction problems. At this level, participants are required to report their actions and thoughts during their interaction with the system. Compared to the basic level, an advanced analysis gives practitioners a greater level of detail and information on user interactions, both in terms of the users’ navigation paths and the difficulties they encountered in perceiving or understanding information during the tasks. Both the basic and advanced levels describe how to create and describe tasks for users, how to set parameters, the apparatus involved, and the selection of participants. The eGLU 1.0 protocol provides practitioners with practical advice on how to properly conduct the test, including how to verbally describe both the goals of the test and the instructions to participants. Both levels follow five phases, which describe: (i) how to prepare testing documents; (ii) how to prepare tools and materials; (iii) how to conduct the test; (iv) how to handle the collected data; and (v) how to draw up the evaluation report. eGLU 1.0 recommends the use of at least one of two usability assessment questionnaires: (i) the System Usability Scale (SUS) [5, 6] or (ii) the Usability Evaluation (Us.E. 2.0) questionnaire [7].

The eGLU 2.0 protocol was released in 2014. Compared to eGLU 1.0, eGLU 2.0 provides practitioners with an easier and simpler methodology for conducting evaluation tests, together with a wide range of design and evaluation approaches and methods from which practitioners can freely choose according to their needs. eGLU 2.0 consists of two parts: the first gives recommendations and instructions to practitioners on how to design and conduct tests, while the second focuses on advanced design methods and evaluation techniques, and describes which alternative and/or complementary usability methods can be used.

In the same way as eGLU 1.0, eGLU 2.0 offers a first-level usability test methodology that is suitable for both expert and non-expert usability evaluation practitioners. eGLU 2.0 involves three phases, which describe how to (i) prepare the test, (ii) execute it, and (iii) analyze the results. The protocol recommends using at least one of three usability assessment questionnaires: (i) the SUS [5, 6]; (ii) the Us.E. 2.0 questionnaire [7]; and (iii) the Usability Metric for User Experience, lite version (UMUX-LITE) [8,9,10]. The second part of eGLU 2.0 involves several in-depth analyses of and extensions to the basic procedure. These extensions can be useful in planning, conducting or analyzing the interaction, and increase the possibility of intervention via Web site redesign by providing elements from a broader and more complex range of methodological approaches than the basic protocol procedure. The advanced techniques described in eGLU 2.0 are the kanban board, scenarios and personas, evaluation strategies using the think-aloud verbal protocol, the methodology of the non-profit foundation ASPHI (http://www.asphi.it/), and the usability cards method (http://www.usabilitycards.com/).

An updated version of the methods and techniques proposed in eGLU 2.0 was developed in 2015 [2] with the eGLU 2.1 protocol. eGLU 2.1 is distributed together with the eGLU-M (eGLU-mobile) protocol [3], which is specifically designed for usability evaluations using mobile devices. Although the evaluation of mobile websites and Web services has some aspects that are operationally different from evaluations using desktop devices, the approach, methodology and phases of the exploratory analysis procedure remain substantially unchanged. The development of a new version of the protocol is currently in progress, and its release is expected in 2018.

3 UTAssistant: A New Usability Testing Support Web Platform for Italian Public Administration

UTAssistant is a Web platform designed and developed within the PA++ Project. The goal of this platform is to provide Italian PA with a lightweight and simple tool for conducting user studies based on the eGLU 2.1 protocol, without requiring installation on user devices.

One of the most important requirements driving the development of this platform was the need to perform remote usability tests, with the aim of making participation simpler and more comfortable for users. To accomplish this, UTAssistant was developed as a Web platform so that the stakeholders involved, namely the evaluator (the Web manager of a PA site) and the users (typically of PA Web sites), can interact using their PCs, wherever and whenever they prefer. This is possible due to recent evolutions of the HTML5 and JavaScript standards, which allow Web browsers to gather data from PC devices such as the webcam, microphone, mouse, and keyboard. This represents an important contribution to the state of the art in usability testing tools, since remote participation fosters wider adoption of these tools and, consequently, of the usability testing technique. Indeed, existing tools for usability testing typically require software installation on a PC meeting specific requirements (e.g. Morae® https://www.techsmith.com/morae.html [11]).
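As a minimal sketch of what these browser capabilities make possible (the upload endpoint and overall structure below are illustrative assumptions, not UTAssistant's actual code), webcam/microphone recording and input logging can be implemented roughly as follows:

```typescript
// Sketch only: record webcam/microphone via HTML5 media APIs and log
// mouse/keyboard events with plain DOM listeners.

async function startCapture(): Promise<MediaRecorder> {
  // Ask the participant for permission to use the webcam and microphone.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () => {
    // Upload the recording; "/api/recordings" is a hypothetical endpoint.
    const recording = new Blob(chunks, { type: "video/webm" });
    void fetch("/api/recordings", { method: "POST", body: recording });
  };
  recorder.start();
  return recorder;
}

// Mouse and keyboard logs require no plugin: ordinary event listeners suffice.
document.addEventListener("click", (e) =>
  console.log(`click at (${e.clientX}, ${e.clientY}) t=${Date.now()}`)
);
document.addEventListener("keydown", (e) =>
  console.log(`key "${e.key}" t=${Date.now()}`)
);
```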

The following sub-sections describe how UTAssistant supports evaluators in designing a usability test and analyzing the results, and how users are supported by UTAssistant in completing the evaluation tasks.

3.1 Usability Test Design

A usability test starts from the test design, which mainly consists of: (i) creating a script to introduce the users to the test; (ii) defining a set of tasks; (iii) identifying the data to be gathered (e.g. the number of clicks and the time required by the user to accomplish a task, audio/video/desktop recording, logs, etc.); and (iv) deciding which questionnaire(s) to administer to users.

UTAssistant facilitates evaluators in performing these activities by means of three wizard procedures. The first guides evaluators in specifying: (a) general information (e.g. a title, the script); (b) the data to gather during execution of the user task (e.g. mouse/keyboard data logs, webcam/microphone/desktop recordings); and (c) the post-test questionnaire(s) to administer. The second procedure assists evaluators in creating the task lists; for each task, start/end URLs, the goal and the duration have to be specified. Finally, the third procedure requires evaluators to select the users, either from a list of users already registered to the platform, or by typing their email addresses. The invited users receive an email including the instructions for participating in the usability test. The following sub-section illustrates how UTAssistant aids users in performing the test.
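The output of these three wizard steps can be pictured as a single test-design object. The following TypeScript interfaces are an illustrative assumption about its shape, not UTAssistant's actual schema:

```typescript
// Hypothetical shape of a test design produced by the three wizards.
interface UsabilityTest {
  title: string;
  script: string;                      // introduction shown to participants
  capture: {                           // data to gather during task execution
    mouseKeyboardLogs: boolean;
    webcam: boolean;
    microphone: boolean;
    desktop: boolean;
  };
  questionnaires: ("SUS" | "UMUX-LITE" | "Us.E. 2.0")[]; // post-test questionnaires
  tasks: Task[];                       // created by the second wizard
  participantEmails: string[];         // invitations are sent to these addresses
}

interface Task {
  startUrl: string;                    // page where the task begins
  endUrl: string;                      // reaching this page marks completion
  goal: string;
  maxDurationSeconds: number;
}
```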

3.2 Usability Test Execution

Following the creation of the usability test design, users receive an email with information about the evaluation they are asked to complete, and a link to access UTAssistant. After clicking on this link, users can carry out the evaluation test, which starts by giving general information about the platform use (e.g. a short description of the toolbar with useful commands), the script for the evaluation and, finally, privacy policies indicating which data will be captured, such as mouse/keyboard logs and webcam/microphone/desktop recordings.

Following this, UTAssistant administers each task, one at a time. The execution of each task is closely guided by the platform, which shows the task description in a pop-up window, and then opens the Web page at which users are asked to start the task (Fig. 1). To keep the platform as minimally invasive as possible during execution of the evaluation test, we grouped all the functions and indications in a toolbar placed at the top of the Web page. This toolbar indicates the title of the current task, its goal, the duration of the task, the task number, and a button to move to the next task, which shows the message “Complete Questionnaire” when the user finishes the last task and is asked to complete the questionnaire(s). During execution of the task, the platform collects all data identified by the evaluator at the design stage, in a transparent and non-invasive way.
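The task-administration loop described above can be summarized in a short sketch; the class and the "/questionnaire" route are assumptions for illustration, not UTAssistant's implementation:

```typescript
// Simplified task runner: shows each task description, opens its start page,
// and relabels the toolbar button after the last task.
interface RunnerTask { goal: string; startUrl: string; }

class TaskRunner {
  private current = 0;
  constructor(private tasks: RunnerTask[]) {}

  showTask(): void {
    const task = this.tasks[this.current];
    // In UTAssistant the description appears in a pop-up window; a plain
    // alert stands in for it here.
    alert(`Task ${this.current + 1}/${this.tasks.length}: ${task.goal}`);
    window.location.href = task.startUrl;
  }

  // Toolbar button label: changes on the last task, as described above.
  nextButtonLabel(): string {
    return this.current === this.tasks.length - 1
      ? "Complete Questionnaire"
      : "Next Task";
  }

  next(): void {
    if (this.current < this.tasks.length - 1) {
      this.current += 1;
      this.showTask();
    } else {
      window.location.href = "/questionnaire"; // hypothetical route
    }
  }
}
```

In practice the toolbar must survive page navigations, e.g. by persisting its state in sessionStorage or by being re-injected into each visited page; that detail is omitted here.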

Fig. 1. An example of the execution of a task. The UTAssistant toolbar is shown at the top of the evaluated website page.

3.3 Usability Test Data Analysis

One of the most time-consuming phases of a usability test is the data analysis, since evaluators are required to manually collect, store, merge and analyze a huge amount of data such as mouse logs, video/audio recordings and questionnaire results. Due to the effort required, this phase can become a deterrent to the adoption of usability testing techniques. UTAssistant automates all of these activities, thus removing the barriers to the analysis of usability test data. Evaluators access the data analysis results via the control panel and can exploit several functionalities that provide useful support in discovering usability issues. The next sub-sections present an overview of some of these tools.

3.4 Task Success Rate (Effectiveness)

Analysis of the results of the usability test often starts by investigating the task success rate, an essential indicator of the effectiveness of the website in supporting the execution of a set of tasks. This metric is calculated as the percentage of tasks correctly completed by users. It can also be calculated for each task, thereby estimating the percentage of users who completed that task. UTAssistant calculates these frequencies and displays them in a table (Fig. 2).
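Computationally, the metric is straightforward. A minimal sketch, assuming a boolean matrix with one row per user and one column per task (true meaning the task was completed):

```typescript
// Per-task, per-user, and overall success rates from a completion matrix.
function successRates(results: boolean[][]) {
  const nUsers = results.length;
  const nTasks = results[0].length;
  // Share of users who completed each task (last row of Fig. 2).
  const perTask = Array.from({ length: nTasks }, (_, t) =>
    results.filter((row) => row[t]).length / nUsers
  );
  // Share of tasks completed by each user (last column of Fig. 2).
  const perUser = results.map((row) => row.filter(Boolean).length / nTasks);
  // Overall rate: completed cells over all cells.
  const overall = results.flat().filter(Boolean).length / (nUsers * nTasks);
  return { perTask, perUser, overall };
}
```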

Fig. 2. Example of a table reporting the success rates of a study. The columns display the tasks, while the rows show a list of users. The last row reports the success rate for each task, while the last column depicts the success rate for each user. The overall success rate is reported below the table.

3.5 Questionnaire Results

Another phase requiring a great deal of effort by evaluators is the analysis of the results of the questionnaire. Using UTAssistant, evaluators can administer one or more questionnaires at the end of each usability evaluation. The platform automatically stores the user’s answers and produces results in the form of statistics and graphs. For example, if the SUS [5, 6] questionnaire is used, UTAssistant calculates the global SUS score (a unidimensional measure of perceived usability [12]), the usability score and the learnability score. In addition, different visualizations can display these results from different perspectives, e.g. a histogram of each user’s SUS scores, a box-plot of SUS score/learnability/usability (Fig. 3), and a score compared with the SUS evaluation scales (Fig. 3).
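For reference, the standard SUS scoring formulas are as follows: odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is scaled to 0–100; the learnability and usability scores follow the Lewis and Sauro two-factor decomposition, in which learnability comprises items 4 and 10. Whether UTAssistant implements exactly this computation is an assumption:

```typescript
// Classic SUS scoring plus the two-factor usability/learnability split.
function susScores(ratings: number[]) {
  if (ratings.length !== 10) throw new Error("SUS has exactly 10 items");
  // Indices are 0-based, so even indices correspond to odd-numbered items.
  const contrib = ratings.map((r, i) => (i % 2 === 0 ? r - 1 : 5 - r));
  const learnSum = contrib[3] + contrib[9];                  // items 4 and 10
  const usabSum = contrib.reduce((a, b) => a + b, 0) - learnSum;
  return {
    global: (learnSum + usabSum) * 2.5, // 0-100 SUS score
    usability: usabSum * 3.125,         // 8 items scaled to 0-100
    learnability: learnSum * 12.5,      // 2 items scaled to 0-100
  };
}
```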

Fig. 3. Example of SUS score plotted on the three SUS scales.

3.6 Audio/Video Analysis

While users are executing the tasks, UTAssistant can record the user’s voice using the microphone, their facial expressions using the webcam, and the desktop display using a browser plugin. This recorded content can be analyzed by evaluators in order to understand, for example, the reasons for low performance in executing a particular task or for a low success rate. To support a more effective audio/video analysis, UTAssistant provides annotation tools: when evaluators detect difficulties, indicated by verbal comments or facial expressions, they can annotate the recorded audio/video tracks. If the evaluators decide to record both camera and desktop videos, the video tracks are merged and displayed together.

3.7 Mouse/Keyboard Logs Analysis (Efficiency)

Important information about the efficiency of performing tasks is given by metrics such as the time and number of clicks required to complete each task. UTAssistant tracks the user’s behavior by collecting mouse and keyboard logs. Based on the collected data, the platform shows performance statistics for each task, such as the number of pages visited, the average number of clicks and the time that each user needed to complete the task (Fig. 4).
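A sketch of how such statistics can be derived from raw logs follows; the LogEntry shape is an assumption for illustration, not UTAssistant's actual log format:

```typescript
// Per-task efficiency metrics (clicks, pages visited, duration) from raw logs.
interface LogEntry {
  userId: string;
  taskId: string;
  type: "click" | "key" | "pageview";
  timestamp: number; // milliseconds since epoch
}

function taskEfficiency(logs: LogEntry[], taskId: string) {
  const entries = logs.filter((e) => e.taskId === taskId);
  const users = [...new Set(entries.map((e) => e.userId))];
  const perUser = users.map((u) => {
    const own = entries.filter((e) => e.userId === u);
    const times = own.map((e) => e.timestamp);
    return {
      userId: u,
      clicks: own.filter((e) => e.type === "click").length,
      pagesVisited: own.filter((e) => e.type === "pageview").length,
      durationMs: Math.max(...times) - Math.min(...times),
    };
  });
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    perUser,
    avgClicks: avg(perUser.map((p) => p.clicks)),
    avgDurationMs: avg(perUser.map((p) => p.durationMs)),
  };
}
```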

Fig. 4. Summary of metrics measuring performance related to three tasks.

4 UX Evaluation Methodology for Assessing UTAssistant

4.1 Methodology

The UX evaluation design proposed here is an experimental methodology, consisting of four phases:

  • Phase 1. Heuristic evaluation of the UTAssistant platform;

  • Phase 2. Usability evaluation with PA practitioners, under workplace conditions;

  • Phase 3. Usability evaluation with Web end-users, under experimental laboratory conditions;

  • Phase 4. Usability evaluation with Web end-users, under remote online conditions.

4.2 Objective

This experimental methodology aims to provide a new approach to the assessment of the UTAssistant semi-automatic usability evaluation tool. It combines expert assessment methods with usability evaluation models under workplace, laboratory, and remote online conditions. The implementation of the UX evaluation methodology for the UTAssistant platform is planned as future work.

4.3 Methods and Techniques

The experimental methodology proposed here involves both usability assessment and psychophysiological measurement methods. Different methods and techniques are used in each phase, as described below.

Phase 1. Heuristic Evaluation.

This is an inspection method [13,14,15,16] in which experts assess the usability of a product. In general, the experts involved in a heuristic evaluation use a list of principles, also called heuristics, to compare the product against a baseline representing how the product should meet the main usability requirements. Heuristics describe the sets of features an ideal interface should exhibit to match a user model. At the end of each evaluation, the expert carrying out the heuristic evaluation provides a list of problems and related suggestions.
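A finding produced by such an evaluation is essentially a structured record. The sketch below shows one plausible shape; the field names and the 0–4 severity scale, commonly used in heuristic evaluation, are assumptions rather than a prescribed format:

```typescript
// One problem reported by an expert during a heuristic evaluation.
interface HeuristicFinding {
  heuristic: string;           // e.g. one of the Nielsen/Molich heuristics
  location: string;            // page or UI element where the problem occurs
  description: string;         // how the interface violates the heuristic
  severity: 0 | 1 | 2 | 3 | 4; // 0 = no problem .. 4 = usability catastrophe
  suggestion: string;          // proposed fix accompanying the problem
}

// Hypothetical example entry.
const finding: HeuristicFinding = {
  heuristic: "Visibility of system status",
  location: "Test-creation wizard, step 2",
  description: "No progress indicator while the task list is being saved",
  severity: 2,
  suggestion: "Show a progress indicator and disable saving until completion",
};
```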

Phase 2. Usability Evaluation Under Workplace Conditions.

During Phase 2, users follow the evaluation methodology provided in the eGLU 2.1 protocol, as explained above in Sect. 2. A tailored procedure that applies the eGLU 2.1 protocol for usability evaluation tests is provided to the PAs involved in the UTAssistant experimental project. This protocol differs from eGLU 2.1 in that it uses the think-aloud (TA) technique rather than the partial concurrent think-aloud (PCTA) technique (see Sect. 4.3, Phase 3). The TA technique is used in traditional usability testing methods [17,18,19,20,21,22,23,24] and is especially useful in indoor conditions such as laboratories or workplaces. The TA technique asks users to verbalize (“think aloud”) each action and the problems they encounter during their interaction with the system. Evaluators are asked to transcribe and analyze each user action in order to identify interaction problems.

Phase 3. Usability Evaluation Under Laboratory Conditions.

Usability testing of the UTAssistant platform is also conducted under laboratory conditions. In this phase, evaluators use the PCTA technique, created by some of the current authors [25,26,27,28] to allow data collected from blind, cognitively disabled, and non-disabled users to be compared easily. The PCTA technique asks users to interact silently with the interface and to ring a bell on the desk whenever they identify a problem; all user interactions are recorded. As soon as the test is complete, the user is invited to identify and verbalize any problems experienced during the interaction [29].

Any psychophysiological reactions that users may have during this interaction are measured using two bio-behavioral measurement techniques: (i) facial expression recognition; and (ii) electroencephalography (EEG). The EEG method allows practitioners to record the electrical activity generated by the brain using electrodes placed on the user’s scalp; due to its high temporal resolution, EEG can show which areas of the brain are active at any given moment. The scientific community also recognizes a limited number of facial expressions (about 45) as universally able to express hundreds of emotions resulting from combinations of seven basic emotions [30]: joy, anger, surprise, fear, contempt, sadness, and disgust. Human beings are mostly unaware of the ways in which their facial muscles express these basic emotions [31]. An analysis of involuntary facial expressions therefore returns information about the emotional impact on users of an interaction with a given interface.

Phase 4. Usability Evaluation Under Remote Online Conditions.

In this phase, users are recruited through a Web recruitment platform and redirected to the UTAssistant Web platform. This methodology has previously been validated for psychological studies [32, 33].

4.4 Material and Equipment

Phases 1, 3, and 4 use the UTAssistant platform to evaluate the Ministry of Economic Development (MiSE) website (http://www.sviluppoeconomico.gov.it), while Phase 2 uses the platform to evaluate the websites of each PA involved, under workplace conditions. All phases are conducted using either a desktop or a laptop computer with a screen size of between 13″ and 15″ and a minimum resolution of 1024 × 640. Computers should be equipped with the Google Chrome browser (http://www.google.com/intl/en/chrome). Computers should be plugged into a power source, and the brightness of the display should be set to the maximum level. Different materials and equipment are used in each phase, as described below.
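A browser-side preflight check of these requirements could look like the hypothetical sketch below (not part of UTAssistant; note that user-agent sniffing is approximate, since other Chromium-based browsers also report "Chrome"):

```typescript
// Verify the minimum study requirements before starting a session.
function checkEnvironment(): string[] {
  const problems: string[] = [];
  if (window.screen.width < 1024 || window.screen.height < 640) {
    problems.push(
      `Resolution ${window.screen.width}x${window.screen.height} is below 1024x640`
    );
  }
  if (!navigator.userAgent.includes("Chrome")) {
    problems.push("Google Chrome is required for this study");
  }
  return problems;
}
```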

Phase 1. Heuristic Evaluation.

Many heuristic lists have been proposed in the literature [29]. In this work, we use the 10 heuristics for Web interface analysis created by Nielsen and Molich [16]; these take into account many aspects of user interaction, such as safety, flexibility, and efficiency of use. The Nielsen heuristics are based on 10 principles derived from a factor analysis carried out on a list of 249 problems detected in many usability evaluations.

Phase 2. Usability Evaluation Under Workplace Conditions.

This phase uses a tailored protocol asking managers of PA websites to evaluate them in conjunction with users. This evaluation should be done using the UTAssistant platform with a desktop or laptop computer.

Phase 3. Usability Evaluation Under Laboratory Conditions.

In this phase, UX experts are asked to measure user interaction by means of two bio-behavioral measurement devices: a facial expression recognition system, and an EEG. Both devices return data that can be synchronized using a biometric synchronization platform called iMotions (http://imotions.com).

Phase 4. Usability Evaluation Under Remote Online Conditions.

Tests are administered using an online recruitment procedure involving a crowdsourcing platform for psychological research called Prolific Academic (http://www.prolific.ac).

4.5 Subjects

Phase 1. Heuristic Evaluation.

A heuristic evaluation requires a small set of three to five expert evaluators.

Phase 2. Usability Evaluation Under Workplace Conditions.

PA Web managers are asked to conduct their tests with a minimum of five participants.

Phase 3. Usability Evaluation Under Laboratory Conditions.

Ten participants are involved, equally divided by gender.

Phase 4. Usability Evaluation Under Remote Online Conditions.

One hundred users should be recruited. Participants should be equally divided by gender and language (50 native English speakers, and 50 native Italian speakers).

4.6 Procedure

Phase 1. Heuristic Evaluation.

Experts are asked to evaluate the main actions required by the UTAssistant platform to assess a website. In particular, experts are asked to evaluate the user experience of an evaluator using UTAssistant during the following actions:

  • Create a new usability test with UTAssistant in order to evaluate the MiSE website.

  • Define four user tasks.

  • Determine which questionnaires will be administered to users at the end of the test.

  • Define which data the system should record and export during the interaction.

  • Export navigation, questionnaire and log data.

  • Use the help function.

Phase 2. Usability Evaluation Under Workplace Conditions.

The Web managers involved in this phase are asked to evaluate the usability of their PA website. They are asked to perform the same actions as required in Phase 1, and then to evaluate their websites with users recruited from within their workplace. Users should be asked to navigate the administration website to carry out four tasks, presented in the form of usage scenarios. A help service embedded in the platform is available to users; when activated, it shows an error message and automatically sends a request to a remote help service.

Phase 3. Usability Evaluation Under Laboratory Conditions.

In this phase, users are required to perform the test in a quiet and sufficiently bright environment, using a comfortable chair placed at least 50 cm from the screen of a desktop or laptop computer. Users are asked to navigate the MiSE website to carry out the four tasks previously created by the UX expert conducting the sessions. Tasks should be presented to the users in the form of usage scenarios.

Phase 4. Usability Evaluation Under Remote Online Conditions.

Online participants should be redirected to the UTAssistant platform to evaluate the MiSE website. In this phase, participants are asked to set up their devices as required in Phases 1, 2, and 3, and to perform the same tasks as defined in Phase 3, presented in the form of scenarios.

4.7 Data Collection

At the end of each phase, evaluators are asked to store their collected data in a database hosted by the Superior Institute of Communication and Information Technologies (ISCOM), Italy. Stored data will be analyzed, phase by phase, in aggregate form. Statistical analyses and comparisons will be carried out using the IBM SPSS platform, and then discussed and disseminated through reports and conference papers.

5 Conclusion

This paper describes UTAssistant, a semi-automatic usability assessment Web platform for Italian PA, and proposes a new experimental evaluation methodology for assessing the UX of the platform. Both UTAssistant and the experimental assessment methodology were developed as part of a multidisciplinary project involving design engineers, UX experts and PA Web managers. UTAssistant is a new tool aimed at the international scientific community; its goal is to provide a standardized model that guides non-experts in evaluating the usability of PA websites, in a quick and straightforward way, in accordance with international usability protocols and standards. Unlike the most common usability evaluation methods, the assessment methodology proposed for evaluating the UTAssistant platform uses bio-behavioral measures in addition to standard validated usability assessment methodologies. The methodology proposed here provides an evaluation strategy that avoids the involvement of social desirability factors (often associated with explicit satisfaction questionnaires), since bio-behavioral measures are hidden from users. This work is part of a two-year project (2017–2018) involving Italian PAs, the University of Bari and the University of Perugia. In future work, the proposed experimental methodology will be implemented to assess the UX of the UTAssistant platform.