1 Introduction

Usability testing with users is now one of the most popular methods for website evaluation in user-centered interaction design [1]. There are different ways to conduct usability tests in which we can observe users’ task performance. Nowadays we can identify over one hundred different usability testing methods [4], for example experimentation with a moderator in a dedicated usability laboratory [5], or remote testing in the user’s environment [6]. Many methods utilize different equipment such as video recorders, eye trackers, thermal cameras and EEG devices.

In our research we decided to enhance eye tracking usability testing with EEG metrics analysis, so that experts can draw conclusions about the usability of web applications. The analysis is based on results obtained for the selected web-based system, SimplyTick [11]. Because webpage elements were grouped into different sections (e.g. highlight_boxes), it was possible to evaluate whether they were easily noticeable and helped users to complete the task, or whether they rather distracted users’ attention. The data we worked with were collected with a Tobii X2-60 eye tracker and an EMOTIV EPOC EEG headset.

The paper is organized as follows. Section 2 presents the application of the Tobii X2-60 eye tracker in usability testing. Section 3 introduces the use of EMOTIV EPOC for determining user emotions. Section 4 describes the usability testing experiment conducted on the SimplyTick web-based system with the eye tracker and EEG analysis. Sections 5 and 6 discuss the obtained experimental results and further implications.

2 Eye Tracking Testing with Tobii X2-60

Eye tracking is one of the most advanced methods used in usability testing [2]. Gathering eye tracking data gives us considerably more information about a user’s behavior than standard user tests. In our study we aimed to test the usability of the SimplyTick web application with the Tobii X2-60, a small and portable eye tracker [3] that shows approximately where people are looking, using a 60 Hz sampling rate. It enables both qualitative and quantitative research based on several calculated metrics.

To conduct the usability testing experiments with the Tobii X2-60 and analyze the gathered experimental data, we used the Tobii Studio software. With this software it is possible to define areas of interest (AOIs), i.e. the areas of the stimulus that are of interest within the scope of the eye tracking analysis [7]. AOIs enable defining and tracking experimental events such as dwells, transitions and AOI hits. To improve our eye tracking data analysis, we divided each subpage of the application into elements with different purposes and then assigned them to the main AOI groups.

From the Tobii Studio software package we chose several metrics predefined by the manufacturer that, in our opinion, would best describe the usability characteristics. These were [3]:

  • Fixation Count Mean (FCM) – measuring the number of times the participant fixates on an AOI. This metric indicates how many times a user has fixated on the element (a particular AOI) and, consequently, whether and to what extent it held respondents’ attention.

  • First Fixation Duration Mean (FFDM) – measuring the duration of the first fixation on an AOI. This metric indicates the ‘attractiveness’ of an element, i.e. whether it seemed interesting to a user or not.

  • Fixation Duration Mean (FDM) – measuring the duration of each individual fixation within an AOI. With this metric we can calculate the average duration of a respondent’s single look at a particular AOI, and hence judge whether the AOI is interesting or easy to process.

  • Time To First Fixation Mean (TTFFM) – measuring how long it takes before a test participant fixates on an active AOI. The measurement of time starts when the AOI is displayed for the first time. This metric describes how eye-catching an element is and identifies the most noticeable part of the webpage.

  • Visit Duration Mean (VDM) – measuring the duration of each individual visit within an AOI, where the visit is defined as the time interval between the first fixation on the AOI and the end of the last fixation within the same AOI, without outside fixations. With this metric we can investigate the average time that users dedicated to each element and, consequently, how long it took them, for example, to find information.

  • Visit Count Mean (VCM) – measuring the number of visits within a particular AOI, where the visit is defined as the time interval between the first fixation on the AOI and the end of the last fixation within the same AOI, without outside fixations. This metric shows how often a user visited an AOI.

By combining the values of these metrics, eye tracking experts can draw conclusions about the usability of each element [7].
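The metric definitions above can be illustrated with a short sketch. It is not Tobii Studio's implementation: the `Fixation` record, the `aoi_metrics` function and the sample data are hypothetical, and it assumes that fixations arrive in chronological order with onsets measured from the moment the AOI is first displayed. It computes the values for a single recording; the "Mean" variants would then be obtained by averaging over recordings.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Fixation:
    start_ms: int        # onset, measured from the first display of the AOI (assumption)
    duration_ms: int     # length of the fixation
    aoi: Optional[str]   # name of the AOI that was hit, or None if outside all AOIs


def aoi_metrics(fixations, aoi):
    """Per-recording metrics for one AOI; returns None if the AOI was never fixated."""
    hits = [f for f in fixations if f.aoi == aoi]
    if not hits:
        return None

    # A visit runs from the first fixation on the AOI to the end of the last
    # consecutive fixation on it; any fixation elsewhere closes the visit.
    visit_durations = []
    current = None  # [visit start, visit end] while a visit is open
    for f in fixations:
        if f.aoi == aoi:
            end = f.start_ms + f.duration_ms
            if current is None:
                current = [f.start_ms, end]
            else:
                current[1] = end
        elif current is not None:
            visit_durations.append(current[1] - current[0])
            current = None
    if current is not None:
        visit_durations.append(current[1] - current[0])

    return {
        "fixation_count": len(hits),
        "first_fixation_duration": hits[0].duration_ms,
        "fixation_duration_mean": mean(f.duration_ms for f in hits),
        "time_to_first_fixation": hits[0].start_ms,
        "visit_durations": visit_durations,
        "visit_count": len(visit_durations),
    }


# Invented sample recording, for illustration only.
recording = [
    Fixation(0, 200, None),
    Fixation(250, 300, "graphs"),
    Fixation(600, 200, "graphs"),
    Fixation(850, 150, "information"),
    Fixation(1050, 400, "graphs"),
]
graphs_metrics = aoi_metrics(recording, "graphs")
```

In this invented recording the graphs AOI receives three fixations grouped into two visits, because the fixation on the information AOI interrupts the first visit.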

3 EEG Emotion Recognition with EMOTIV EPOC

It is believed that there are many similarities between eye tracking and electroencephalography (EEG), because the sampling frequencies are in the same range and both signals can be investigated as the process is measured [7]. There are many different EEG technologies, based on high or low impedance, which need different post-processing. Most EEG measurements are non-invasive and are based on measurements taken at the surface of the head, which must take into account individual variance in the thickness of the skull and scalp. EEG data are usually presented in the form of waves that correspond to the activity of the brain. These waves can be interpreted manually or automatically; for example, it is possible to detect some emotions connected with decision making or with reacting to a particular stimulus.

There are many commercial EEG devices available nowadays, e.g. Neurosky, Mindflex and EMOTIV, and it is believed that the best low-cost (about 750 USD) device is the last of these, the EMOTIV EPOC headset [8]. Its Software Development Kit for research supports 14 channels based on the international 10–20 electrode location system (plus CMS/DRL references). The headset uses saline sensors and a wireless connection, and offers a fairly long operating time of 12 h without an external power supply. Using saline liquid decreases the electrode impedance to 10–20 kΩ. The device collects neural data by recording raw brain signals at a rate of 128 samples per second and then converts them into emotional indicators [10]. No prior training or calibration of the user is needed.

The EMOTIV EPOC is provided with a software suite consisting of the following three different detection applications working in real-time: Expressive, Affective and Cognitive. The first interprets the user’s facial expressions, the second monitors the user’s emotional states and the last enables standard BCI-like control.

EMOTIV EPOC is quite easy to manage and completely non-invasive. The headset is found to cause almost no discomfort whilst installed on the participant’s head, so it can be successfully applied to usability testing, where users’ comfort during data collection is very important.

The EMOTIV Affective suite monitors several emotional states; however, the company does not reveal the exact algorithms used to identify these emotions. The results are reported to have been validated with data collected over recording sessions, and to relate to the distribution and correlations in brain networks. The Affective suite monitors the following emotional states [9]:

  • Frustration (frst) – described as an unpleasant feeling arising when a person is not able to perform a task or cannot satisfy their need. The more helpless they feel, the higher the frustration score gets.

  • Short term excitement (shrt) – experienced when the subject feels psychological arousal of positive value. The level of short term excitement rises in response to both surprising and distracting situations.

  • Meditation (med) – its score represents a person’s composure and calmness. Its level gets higher as a person settles.

  • Engagement (eng) – experienced when the subject is alert and consciously directs attention towards task-relevant stimuli; its opposite is “Boredom”. The level of engagement may increase when a person is concentrating, for example while performing calculations. Closing the eyes decreases the engagement score.

  • Long term excitement (long) – it reflects a person’s general mood (or emotional state), rather than reactions to short surprising stimuli. It is based on the weighted running average of the short-term excitement.
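To make the idea of a weighted running average concrete, the sketch below uses an exponentially weighted average. This is only one possible form and is an assumption: EMOTIV does not publish its algorithm, so the function `running_weighted_average`, the `weight` parameter and the sample scores are all hypothetical.

```python
def running_weighted_average(scores, weight=0.1):
    """Exponentially weighted running average of a score series.

    Assumed form for illustration only; this is not EMOTIV's algorithm.
    `weight` is the share given to the newest sample (0 < weight <= 1).
    """
    if not 0 < weight <= 1:
        raise ValueError("weight must be in the interval (0, 1]")
    averaged = []
    current = None
    for score in scores:
        # The first sample initializes the average; later samples are blended in.
        current = score if current is None else weight * score + (1 - weight) * current
        averaged.append(current)
    return averaged


# Invented short-term excitement scores with one brief spike.
short_term = [0.2, 0.2, 0.9, 0.2, 0.2]
long_term = running_weighted_average(short_term, weight=0.5)
```

With such smoothing a brief spike in the short-term score raises the averaged value only partially, which matches the description of an indicator that reflects general mood rather than reactions to short stimuli.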

To record EEG data from the EMOTIV EPOC and integrate them with Tobii eye tracking data, we used the NXRecorder software (owned by Eyetracking sp. z o.o.), which made it possible to synchronize the recorded data, and NXWebAnalizer (owned by Eyetracking sp. z o.o.) to visualize the results (see Fig. 1).

Fig. 1. SimplyTick Basic page with defined AOIs (left), and NXRecorder output with AOIs (right).
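Once both data streams share a common clock, emotional indicator samples can be related to the AOI that was being fixated at the same moment. The sketch below shows one simple way of doing this and is not a description of how NXRecorder works: the function `mean_score_per_aoi`, the tuple layouts and the sample values are hypothetical, and the timestamps are assumed to be already synchronized.

```python
from statistics import mean


def mean_score_per_aoi(fixations, eeg_samples):
    """Average an emotional indicator over the time spent fixating each AOI.

    fixations:   list of (start_ms, end_ms, aoi_name) tuples
    eeg_samples: list of (timestamp_ms, score) tuples on the same clock (assumption)
    """
    scores_by_aoi = {}
    for start_ms, end_ms, aoi in fixations:
        for timestamp_ms, score in eeg_samples:
            # Keep the sample if it was recorded while this fixation lasted.
            if start_ms <= timestamp_ms < end_ms:
                scores_by_aoi.setdefault(aoi, []).append(score)
    return {aoi: mean(scores) for aoi, scores in scores_by_aoi.items()}


# Invented data: two fixations and five indicator samples.
fixations = [(0, 100, "graphs"), (100, 200, "information")]
eeg_samples = [(10, 0.2), (50, 0.4), (120, 0.6), (180, 0.8), (250, 0.9)]
averages = mean_score_per_aoi(fixations, eeg_samples)
```

The last sample falls outside both fixations and is therefore ignored.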

4 Experiment Description

To conduct the usability testing using eye tracking and EEG analysis, we selected a prototype of the SimplyTick [11] web-based application. It serves as a management dashboard for online stores that keeps track of a business’s sales in real time and identifies customers’ preferences. The application was developed within the scope of the BIWiSS grant (mentioned in the Acknowledgement section). It is planned to be available for general use by the end of March 2016. For the purpose of our study we used an early prototype version with the following subpages: Basic, Customers and Sale (see Fig. 1).

4.1 Participants

Eleven people took part in the experiment: 7 women and 4 men from Warsaw and Wrocław, Poland. All of them were native speakers of Polish. Data collected from 10 of them (6 women, 4 men) were included in the further analysis. The participants’ age ranged from 21 to 40, with a mean of 28 years. Seven participants had a master’s or bachelor’s degree, 2 had incomplete higher education and 1 had finished secondary school with a maturity diploma. All declared that their knowledge of English was at least at B1 level.

4.2 Detailed Experiment Description

The experiment was conducted on December 2nd (Wrocław) and December 3rd (Warsaw). It took place in the Interactive Systems Laboratory at Wrocław University of Technology and in an eye tracking laboratory in Warsaw, with one or two experimenters present. The main aim of the study was to enhance eye tracking usability testing with EEG analysis and to find possible correlations between eye tracking measurements and emotional reactions during task performance. As mentioned before, we used the Tobii X2-60 eye tracker with the Tobii Studio software and the EMOTIV EPOC headset with the Affective software suite.

Four tasks were prepared, each testing one of the main functionalities of the SimplyTick system. Instructions for the tasks were given in Polish, and participants were asked to perform all of them, stating their answer out loud. The order of the tasks was random and was generated separately for each respondent. There was also a pre-task, which gave users the opportunity to get acquainted with the webpage and allowed us to check that all the devices were connected and worked properly. The results of the pre-task were not included in the further analysis. None of the tasks had a time limit, and participants were allowed to ask questions whenever they needed (for example when they had forgotten the instruction). The tasks performed are listed below:

  • Provide the difference in the number of visits to a website between 2014 and 2015.

  • Provide the most frequently used source of entering the website.

  • Provide information about incomes in 2014 and check users whose operating system had the largest share in these incomes.

  • Check in which month of 2015 the smallest number of orders was placed. Next, provide the mean number of all the orders for this month.

The procedure of the experiment was as follows:

  1. Participant reception and explanation of main premises of the study.

  2. Introduction to the SimplyTick application in the context of its specification. Each participant was given exactly the same information in the form of a short presentation in English – the application’s characteristics and a role they were asked to play (an owner of some business).

  3. Pre-test questionnaire including basic information about the participant – their gender, age, profession and educational background.

  4. Installation of EEG data recorder and eye tracker calibration.

  5. Test performing with simultaneous registration of eye tracking and EEG data.

  6. Post-test questionnaire concerning task realization – difficulty level, description of helpful and disturbing elements, and general readability of the service.

SimplyTick is a web-based application allowing business owners to monitor and analyze sales, campaign efficiency and customers’ preferences in real time [11]. We divided its subpages with regard to specific elements and defined 4 AOI groups, which were further divided as shown in Table 1.

Table 1. SimplyTick elements division

By this definition of AOIs and their groups we aimed to evaluate whether specific elements are easily noticeable and help users to complete the task or if they rather distract users’ attention.

5 Experimental Results

With regard to the outcomes of the recorded data, our initial hypothesis that emotions are reflected in eye tracking usability testing was partially confirmed.

5.1 Eye Tracking Usability Testing

To investigate whether the differences in eye tracking measurements were significant, we used Student’s t-test for each pair of AOIs, calculated with the SPSS software from IBM. In the tables presented below, each row highlighted in green indicates a statistically significant difference (for example, Sig. (2-tailed) = 4 should be read as 0.004; the assumed significance level is 0.05).
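The pairwise comparison can be sketched as follows. With 4 AOI groups there are 6 pairs to test. The sketch assumes a paired-samples t-test, because the same participants looked at every AOI; the text does not state which variant was run in SPSS. The function names and all data values are invented for illustration, and instead of a p-value the statistic is compared with the two-tailed critical value of Student's t for 9 degrees of freedom (10 participants) at the 0.05 level.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical value of Student's t for alpha = 0.05 and 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262


def paired_t(a, b):
    """t statistic of a paired-samples t-test (assumed variant)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def compare_aoi_pairs(values_by_aoi, t_critical):
    """Test every pair of AOI groups; returns {(aoi_a, aoi_b): (t, is_significant)}."""
    results = {}
    for first, second in combinations(sorted(values_by_aoi), 2):
        t = paired_t(values_by_aoi[first], values_by_aoi[second])
        results[(first, second)] = (t, abs(t) > t_critical)
    return results


# Invented fixation counts for 10 participants; not the study's data.
fixation_counts = {
    "graphs": [41, 38, 45, 50, 36, 44, 39, 47, 42, 40],
    "highlight_boxes": [12, 15, 10, 14, 11, 13, 16, 12, 9, 14],
    "information": [8, 7, 9, 6, 10, 8, 7, 9, 6, 8],
    "operation": [5, 6, 4, 7, 5, 6, 4, 5, 6, 5],
}
pair_results = compare_aoi_pairs(fixation_counts, T_CRITICAL_DF9)
```

Running many pairwise tests increases the chance of a false positive, so a stricter threshold or a correction for multiple comparisons may be worth considering in a replication.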

Fixation Count Mean.

Table 2 indicates that all the differences between AOI groups in Fixation Count Mean were statistically significant. As shown in Fig. 2, graphs from all the SimplyTick subpages attracted the greatest number of fixations of all the AOI groups, which means that respondents looked at the graphs most frequently. The greater the number of fixations, the more attention was paid to the element.

Table 2. Fixation Count Mean table

Fig. 2. Fixation Count Mean chart

First Fixation Duration Mean.

Table 3 shows that there was a significant difference for only 1 out of 6 pairs of AOIs: highlight_boxes vs. operation. As can be read from the chart, the longest first fixation concerned the operation AOI, which may mean that respondents needed more time to understand elements from this section or that its elements were more interesting than those in highlight_boxes.

Table 3. First Fixation Duration Mean table

Fixation Duration Mean.

Statistically significant differences were detected in 5 of 6 pairs in the Fixation Duration Mean analysis. The longest average duration of a single fixation was found for the operation AOI, meaning that respondents probably needed more time to process elements from this section than elements from other sections (Table 4).

Table 4. Fixation Duration Mean table

Time To First Fixation Mean.

For the Time To First Fixation Mean parameter, only one pair of AOIs showed a statistically insignificant difference (0.051, with a significance level of 0.050). From the remaining comparisons we can conclude that elements in the information section took the longest to notice. This may indicate that this AOI seemed less visually interesting than the others (Table 5).

Table 5. Time To First Fixation Mean table

Visit Duration Mean.

For Visit Duration Mean, half of the pairs exhibited a statistically significant difference. All of them involved the graphs AOI, indicating that, on average, visits to this section were longer than visits to the others (Table 6).

Table 6. Visit Duration Mean table

Visit Count Mean.

The data in Table 7 may mean that elements from highlight_boxes attracted participants’ attention the most, either because of their attractiveness or because they were difficult to process.

Table 7. Visit Count Mean table

5.2 Eye Tracking Enhanced with EEG

We compared all emotional indicators (x axis in Fig. 3–10) with the Tobii Studio parameters (y axis in Fig. 3–10). To check whether there were any correlations, we used the Pearson product-moment correlation coefficient (R in Fig. 3–10) with a significance level of 0.1. In MS Excel we created charts presenting the 10 detected correlations that turned out to be statistically significant.
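The correlation check can be sketched as follows. The function name and the data are invented for illustration and do not reproduce the study's values. The significance decision uses the two-tailed critical value of |r| for 10 observations at the 0.1 level, which follows from the critical t value of 1.860 for 8 degrees of freedom.

```python
from math import sqrt

# Two-tailed critical |r| for alpha = 0.1 and n = 10 (8 degrees of freedom),
# derived from t = 1.860 via r = t / sqrt(t^2 + df).
R_CRITICAL_N10 = 0.549


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented values for 10 participants: an emotional indicator score (x axis)
# and an eye tracking metric in seconds (y axis).
excitement = [0.31, 0.45, 0.52, 0.28, 0.60, 0.41, 0.38, 0.55, 0.47, 0.33]
visit_duration = [1.9, 2.4, 2.8, 1.7, 3.1, 2.0, 2.3, 2.6, 2.7, 1.8]

r = pearson_r(excitement, visit_duration)
is_significant = abs(r) > R_CRITICAL_N10
```

A positive `r` above the critical value would correspond to the kind of relationship reported below, where a higher indicator score goes together with a larger value of the eye tracking metric.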

Graphs AOI.

As shown in Fig. 3, a positive correlation was found between Short Term Excitement and Visit Duration Mean. A single visit to the graphs AOI lengthens as instantaneous excitement increases.

Fig. 3. shrt-VDM correlation

A positive correlation was also found between Engagement and Time To First Fixation Mean (Fig. 4). This relationship shows that the more engaged the respondent is, the longer they need to notice the first element from the graphs AOI.

Fig. 4. eng-TTFFM correlation

The last correlation detected in the graphs AOI was also positive. It was found between Long Term Excitement and Fixation Duration Mean, meaning that as the Long Term Excitement score increases, fixations on the graphs AOI lengthen.

Highlight_boxes AOI.

In the highlight_boxes AOI two correlations were found, both positive. The first of them (Fig. 5) concerned the Frustration score and Time To First Fixation Mean. As frustration increased, respondents needed more time to notice any element from the highlight_boxes AOI. The second positive correlation was found between Long Term Excitement and Visit Duration Mean (Fig. 6). It indicates that the higher the Long Term Excitement score, the longer a single visit to the highlight_boxes AOI lasts.

Fig. 5. frst-TTFFM correlation

Fig. 6. long-VDM correlation

Information AOI.

The greatest number of correlations was found in the information AOI: 4 of them were negative and 1 was positive.

In Fig. 7 a correlation between Frustration and Fixation Count Mean is presented. It suggests that with the increase of frustration, respondents fixate on information AOI elements less often.

Fig. 7. frst-FCM correlation

Another negative correlation was found between the Frustration score and Visit Count Mean (Fig. 8). The more frustrated respondents get, the less often they visit the information AOI.

Fig. 8. frst-VCM correlation

A correlation shown in Fig. 9 is the only positive one found in information AOI. It indicates that as the respondent calms down (the Meditation score gets higher), the duration of a single visit lengthens.

Fig. 9. med-VDM correlation

Interestingly, with increasing levels of Meditation, the number of visits to the information AOI decreases (Fig. 10). Taken together with the previous correlation, this means that even though the number of visits decreases, the visits last longer on average.

Fig. 10. med-VCM correlation

The last correlation found concerned Short Term Excitement and its influence on Visit Count Mean. In information AOI, the higher the level of Short Term Excitement, the less often the respondents visited an element.

Operation AOI.

There were no significant correlations found between emotional indicators and eye tracking measurements in operation AOI.

6 Discussion and Further Implications

By analyzing the eye tracking data for each AOI on a website, it is possible to provide information about the order in which elements were seen (gaze plot). With such an important clue, UX experts are able to trace the exploration path for each visited website and then create an optimal layout.

In our experiment, all examined respondents were Polish. The language of the studied application was English, which could have caused cognitive dissonance and should be considered a distracting factor. This would explain why Frustration and Engagement, both connected with concentration and task solving, reached the highest levels among the EMOTIV indicators. It could have been frustrating for respondents that all instructions were shown or spoken in their native language, while the website included foreign-language vocabulary, some of which they might not have understood. Therefore, to translate the instructions and the expressions appearing in the application, they had to pay more attention and, as a result, increase their engagement level in order to find an answer.

If further research is conducted, it should consider the differences in both eye tracking and EEG data with regard to the difficulty level of a specific task. All the instructions and applications should be written or spoken in the same language, and more emotional states could be measured with the help of other EEG devices. Moreover, the research group should be increased to around 35 participants.