1 Introduction

Heuristic Evaluation (HE), proposed by Nielsen and Molich [17], has been widely applied as a low-cost method to evaluate the usability of software products [5, 8, 13]. Our previous study showed that the traditional set of heuristics from Nielsen and Molich is still widely applied, even to evaluate new technologies such as mobile applications [8, 14, 17]. Despite its success, the quality of the outcomes of a HE depends on the knowledge of the evaluators [6, 14, 24].

The participation of usability experts in HEs still represents a high cost for organizations with limited budgets; only a few research groups and organizations conduct HEs with the participation of expert evaluators [1, 2, 7, 9, 14, 15, 18, 23]. It is intuitive that adapting a popular usability evaluation method for novice evaluators could empower such organizations. However, an exhaustive literature review reveals only a few pieces of evidence about adapting the HE method for novice evaluators, to support practitioners who resort to them to conduct HEs. Studies have investigated adaptations of the HE method for specific profiles of novice evaluators, but their results are not sufficient to generalize to the entire profile of novice evaluators [3, 10, 11, 19, 21, 22, 25].

To investigate the causes that make HE so heavily dependent on the people who apply it, we began by identifying possible issues related to the characteristics of HE as a method. Based on the literature of the field, and on our accumulated experience teaching Human-Computer Interaction, we believe that one of the possible difficulties of novice evaluators during a HE is distinguishing among the traditional heuristics of Nielsen and Molich [10, 21, 25]. During one of the courses taught by the authors, 12 out of 15 participants (80%) said that they had difficulty distinguishing among the ten heuristics of Nielsen and Molich during a HE - this feedback was the first motivation for this study.

Our goal was to explore situations in which novice evaluators might misunderstand different heuristics - from the traditional set of Nielsen and Molich - as similar, and to develop adaptations for the heuristics in order to mitigate these misunderstandings. Based on three surveys, the results presented in this paper identify heuristics that are probably misunderstood as similar by novice evaluators. In addition, new descriptions for specific heuristics were produced in order to mitigate this problem.

The remainder of this paper presents a review of the literature of the field, the design of this study, the surveys conducted, with their results and discussion, and our conclusions.

2 Literature Review

2.1 Heuristic Evaluation

The Heuristic Evaluation (HE) method was proposed by Nielsen and Molich [17]. A HE consists of three main sessions: preliminary, evaluation and results [20]. In the preliminary session, the evaluators receive the same instructions on how to conduct the HE from the person responsible for organizing the evaluation. In the evaluation session, the evaluators analyze the interface, aiming to find discordances between the interface and any of the heuristics. In their study, Nielsen and Molich considered a group of nine heuristics to define the method [17]. Later, Nielsen [14] identified the need to add a tenth heuristic to the group. These ten heuristics have become known as the traditional usability heuristics of Nielsen and Molich. The title of each heuristic is as follows:

  • Heuristic 1 - Visibility of system status.

  • Heuristic 2 - Match between system and the real world.

  • Heuristic 3 - User control and freedom.

  • Heuristic 4 - Consistency and standards.

  • Heuristic 5 - Error prevention.

  • Heuristic 6 - Recognition rather than recall.

  • Heuristic 7 - Flexibility and efficiency of use.

  • Heuristic 8 - Aesthetic and minimalist design.

  • Heuristic 9 - Help users recognize, diagnose, and recover from errors.

  • Heuristic 10 - Help and documentation.

A full description of the heuristics can be retrieved at Nielsen Norman Group website [16].

In the last session of a HE, the evaluators define a final list of the usability problems identified, assigning a severity rating to each of them and suggesting solutions [15, 20].

2.2 Heuristic Evaluation for Novice Evaluators

In the next subsections, we present approaches from the literature, classified according to their main goals, that intended to involve novice evaluators in a HE.

Classifying the Expertise of Evaluators. The literature on classifying expertise in usability evaluations is still scarce. To the best of our knowledge, no classification scheme is widely considered a standard for classifying expertise in the usability area. In this context, we highlight a few important studies that presented a classification of the expertise levels of evaluators in HE.

According to Nielsen [14], to be an expert evaluator one needs several years of job experience in the usability area or a graduate degree in a usability-related area. Professionals who do not meet this minimum qualification are classified as novice evaluators.

Slavkovic and Cross [24] studied HE for novice evaluators and, in their study, they qualified “graduate and undergraduate students in an introductory course on HCI evaluation methods” as novice evaluators.

To the best of our knowledge, the most structured scheme for classifying proficiency in usability evaluation was proposed by Botella et al. [4], who classify usability professionals into five levels:

  • Novice: Professional without a university degree, but with at least one training course on HCI and a few hours of practice in usability evaluation.

  • Beginner: Professional without a university degree, but with several training courses on HCI and less than 2,500 h of practice in usability evaluation.

  • Intermediate: Professional with a bachelor's degree in a usability-related area and less than 5,000 h of practice in usability evaluation.

  • Senior: Professional with a master's degree in a usability-related area and less than 7,500 h of practice in usability evaluation.

  • Expert: Professional with at least a master's degree in a usability-related area and more than 10,000 h of practice in usability evaluation.

Nonetheless, this classification is still recent, and further discussion is needed to understand how well it generalizes to different contexts. For the purposes of this study, we consider that, to be an expert in the usability area - in the Brazilian context - an evaluator should have at least four years of job or research experience in the field of usability.

Adapting HE for Novice Evaluators. A literature review shows that studies on adapting HE for novice evaluators are still scarce. Adapting HE implies that it must not lose its main characteristic as a simple method. The need for studies on adaptations of the HE method was first shown by Slavkovic and Cross [24], who studied HEs conducted by 43 novice evaluators. In their study, they showed that novice evaluators performed superficial analyses of the interface and had difficulties with specific areas of the interface.

A group of studies addressed adaptations of HE to be conducted by specific profiles of novice evaluators, such as children and teenagers. Most of these studies tried to simplify the HE for children as evaluators [10, 11, 21, 22]. These studies showed that children have a better understanding of the child user profile and, for this reason, can be considered as evaluators. MacFarlane and Pasiali [10] showed that the following adaptations can be made to HE when aiming at children as evaluators:

  • simplifying the heuristic descriptions; and

  • changing the severity rating model to a Likert scale using smiley faces to represent different degrees of satisfaction.

The results of these studies provided evidence that children can conduct a HE adapted for them. Regarding the standard HE, these studies identified that child evaluators may face the following difficulties [22]:

  • understanding the heuristic descriptions,

  • understanding the severity ratings, and

  • identifying similar issues in the results session, if the group needs to generate a unified list of usability issues.

Similarly, Wodike et al. [25] reported a study adapting HE for teenagers as evaluators. Their adaptation used one teenager as a facilitator for a group of teenagers. The role of the facilitator was to instruct his/her group on how to conduct a HE and also to motivate them to evaluate the interface. The evaluations occurred in periods of 30 min. After each period, the evaluators had 15 min to freely explore the interface. The results of this study did not show satisfactory evidence regarding the participation of the facilitator. Nevertheless, Wodike et al. [25] provided a helpful discussion on the theme; according to them, the following characteristics of a HE still need adaptation in order to help teenage evaluators:

  • the set of heuristics,

  • the severity rating scale, and

  • the forms for reporting usability problems.

To the best of our knowledge, previous studies on novice evaluators and HE have not been sufficient to provide an adaptation of HE for the whole profile of novice evaluators, beyond children and teenagers. In this context, a gap remains in the literature on the development of adaptations of the HE method that can help novice evaluators improve their performance.

3 Study Design

The purpose of this study was to adapt HE for novice evaluators. Specifically, we investigated situations in which novice evaluators might misunderstand different heuristics - from the traditional set of Nielsen and Molich - as similar, in order to adapt them to be better understood by novices. For this reason, we designed three surveys.

Surveys 1 and 2 were planned to obtain data about the situation(s) in which novice evaluators possibly misunderstand different heuristics. Survey 1 was applied to 13 usability experts, and Survey 2 to 15 usability novices. Survey 3 was applied to 7 usability experts and was planned to find a suggested solution for the situation(s) identified in Surveys 1 and 2. All participants took part in the surveys voluntarily. The surveys were limited to the Brazilian context for reasons of cost.

4 Survey 1

In Survey 1, we aimed to obtain the experts' view of situations in which novice evaluators could possibly misunderstand different heuristics as similar. For this reason, we applied this survey to experts with previous experience teaching or coaching novice evaluators (e.g. in a software industry context); this requirement ensured that the experts had knowledge of the challenges that novice evaluators may face.

A total of thirteen (13) usability experts took part in this survey. Among them, five (5) respondents held a PhD in a usability-related area, and three (3) held an MSc in a usability-related area. The other five (5) respondents had at least four years of research or job experience in a usability-related area.

Each respondent was asked to indicate whether novice evaluators could possibly misunderstand one specific heuristic as similar to another. The respondents were asked to fill in an online form containing all ten heuristics of Nielsen and Molich. For each heuristic, respondents could mark the other heuristic(s) - or the option “none” - that they believed was/were possibly misunderstood as similar (p.m.a.s.) by novice evaluators.

4.1 Results and Discussion

Table 1 shows the results of Survey 1. Each column corresponds to one of the heuristics of Nielsen and Molich. Each row shows the number of times each possible answer was marked.

Table 1. Results of Survey 1. Number of times that a response (rows) was selected for each heuristic (columns). The abbreviation p.m.a.s. means “possibly misunderstood as similar”.

Analyzing the results shown in Table 1, one can see that, in each column (heuristic), some values appear to be much higher than the others. However, further analysis was needed to verify the significance of these higher values relative to the others in the same column. For this reason, we applied a box plot analysis to each column to verify the presence of outliers. We understood that, by analyzing the presence of outliers, we could filter the data to focus our study on the most important cases of possible similarity.
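The per-column outlier check can be sketched with the standard 1.5 × IQR box plot rule. The sketch below is illustrative only: the response counts are hypothetical placeholders, not the actual data from Table 1.

```python
# Sketch of a per-column box plot outlier check (1.5 * IQR rule).
# The counts used here are hypothetical, not the real Table 1 data.

def boxplot_outliers(counts):
    """Return values in `counts` outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    s = sorted(counts)
    n = len(s)

    def quartile(q):
        # Linear interpolation between closest ranks.
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        frac = pos - lo
        return s[lo] + (s[hi] - s[lo]) * frac

    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in counts if v < low or v > high]

# Hypothetical p.m.a.s. counts for one heuristic's column
# (one entry per candidate heuristic, excluding the heuristic itself).
column = [0, 1, 0, 2, 1, 0, 9, 1, 0]
print(boxplot_outliers(column))  # the single high count is flagged
```

Running this check on each column of the table flags the heuristics whose p.m.a.s. counts stand out from the rest of that column.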

The box plot analysis showed the presence of five outliers among all the results (see Fig. 1). For each box in Fig. 1, we indicate the heuristic whose p.m.a.s. value was identified as an outlier. The following outliers were detected:

  • The value of p.m.a.s. of heuristic 9 was an outlier among the responses for heuristic 1.

  • The value of p.m.a.s. of heuristic 7 was an outlier among the responses for heuristic 3.

  • The value of p.m.a.s. of heuristic 2 was an outlier among the responses for heuristic 4.

  • The value of p.m.a.s. of heuristic 9 was an outlier among the responses for heuristic 5.

  • The value of p.m.a.s. of heuristic 3 was an outlier among the responses for heuristic 7.

Fig. 1. Box plot showing the presence of outliers among the p.m.a.s. values for each heuristic (values of each column in Table 1).

Among the outliers detected, the pair of heuristics 3 and 7 stands out. Heuristic 7 was an outlier among the responses for heuristic 3, and heuristic 3 was an outlier among the responses for heuristic 7. This kind of reciprocity occurred only between heuristics 3 and 7. For this reason, we believed we should focus this study on adapting both heuristics 3 and 7 in order to mitigate the problem of possible misunderstandings. Furthermore, we designed Survey 2 to gain more insight and possibly confirm this finding.

5 Survey 2

Survey 2 was prepared to collect data complementary to the results of Survey 1, in order to help us define which heuristic(s) should be adapted for novice evaluators. In Survey 2, we asked novices about their difficulty in understanding the different heuristics. We did not ask novices to point out similarities, as we had asked the experts in Survey 1, because it would be contradictory: if novice evaluators were aware of the misunderstandings they experience when distinguishing the heuristics, they would also be capable of distinguishing them. We understand that each heuristic of Nielsen and Molich is unique and distinguishable from the others; consequently, if novice evaluators understand a heuristic's description, they will also understand its difference from the other heuristics.

A total of 15 novice evaluators took part in this survey. All of them had taken only an introductory course on Human-Computer Interaction. The respondents were asked to indicate a level of difficulty in understanding each of the ten heuristics of Nielsen and Molich. The possible responses were distributed on a 5-option scale varying from “Very Easy” to “Very Difficult” to understand.

5.1 Results and Discussion

The results of Survey 2 are summarized in the graphs of Fig. 2. Each graph shows the responses regarding a specific heuristic. Only one response was possible per heuristic. Each graph has a five-degree scale (horizontal axis) representing the possible responses (levels of ease of understanding): “Very Easy”; “Easy”; “Neutral”; “Diff.” (Difficult); and “Very Diff.” (Very Difficult). The vertical axis shows the number of times each response was checked by the novices.

Fig. 2. Number of responses of novice evaluators (vertical axis) at each level of difficulty (horizontal axis) in understanding each heuristic.

We analyzed the graphs shown in Fig. 2 according to two regions: an easiness region (from the option “Very Easy” to “Neutral”) and a difficulty region (from the option “Neutral” to “Very Diff.”). Most of the graphs concentrated the majority of responses in the easiness region, which may mean that novice evaluators do not have much difficulty understanding those heuristics. However, two graphs presented most of the responses in the difficulty region: graphs (c) and (g), which refer to heuristics 3 and 7, respectively. These graphs were in accordance with the findings of Survey 1 and, for this reason, we believe that Surveys 1 and 2 together provided evidence that some kind of clarification of heuristics 3 and 7 should be made for novice evaluators. In this context, we prepared Survey 3 in order to discuss and develop the adaptations needed for heuristics 3 and 7.
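The two-region split of the 5-point responses can be sketched as below. The tallies are hypothetical placeholders, not the actual Survey 2 data; note that “Neutral” falls into both regions, mirroring the overlapping region boundaries described above.

```python
# Sketch of the easiness/difficulty region split over a 5-point scale.
# The tallies below are hypothetical, not the real Survey 2 responses.

SCALE = ["Very Easy", "Easy", "Neutral", "Diff.", "Very Diff."]

def region_shares(tallies):
    """Return (easiness, difficulty) shares of the responses.
    Easiness spans "Very Easy".."Neutral"; difficulty spans
    "Neutral".."Very Diff."; "Neutral" counts toward both."""
    total = sum(tallies.values())
    easiness = sum(tallies.get(k, 0) for k in SCALE[:3])
    difficulty = sum(tallies.get(k, 0) for k in SCALE[2:])
    return easiness / total, difficulty / total

# Hypothetical responses of 15 novices for one heuristic.
h3 = {"Very Easy": 1, "Easy": 2, "Neutral": 3, "Diff.": 6, "Very Diff.": 3}
easy, hard = region_shares(h3)
print(f"easiness: {easy:.2f}, difficulty: {hard:.2f}")
```

A heuristic whose difficulty share dominates its easiness share (as in this hypothetical tally) would be a candidate for adaptation.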

6 Survey 3

Survey 3 aimed to develop adaptations for heuristics 3 and 7 of the traditional set of usability heuristics of Nielsen and Molich. In this survey, we showed the results of Surveys 1 and 2 to usability experts and asked them to suggest adaptations for both heuristics, in order to mitigate the difficulty novices have in understanding these heuristics and to enable novices to distinguish each heuristic from the other. The method for this survey was based on the method used in [12].

A total of seven experts took part in Survey 3. Among them, two held a PhD in a usability-related area. All the others had at least four years of experience with heuristic evaluation and with teaching novice evaluators about the method. Two usability researchers, the authors of this study, analyzed the suggestions made by the experts in order to synthesize them into a final new description for heuristic 3 and for heuristic 7.

6.1 Results and Discussion

The results of Survey 3 showed that the experts preferred to adapt the titles of the heuristics, while only small changes were made to the descriptions. Two usability researchers compiled the experts' suggestions together with the traditional titles and descriptions of the heuristics (retrieved from [16]) to produce the adaptations aimed at by this study. The adapted heuristics are as follows:

  • Heuristic 3 - Control to undo and redo actions: Users often choose system functions by mistake - e.g. after actions of trial and error - and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.

  • Heuristic 7 - Accelerators, shortcuts and efficiency of use: Accelerators (e.g. shortcuts) - unseen by the novice user - may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

The evidence from this study is not sufficient to ensure that these adapted heuristics are actually easier for novice evaluators to understand or distinguish. However, they are initial contributions to this new field of heuristic evaluation for novice evaluators. Further studies are planned to explore the generalization of these findings. In addition, we suggest, as future work, validating the use of these adapted heuristics in comparison with the use of the traditional heuristics. Future research can also explore new possibilities for adapting HE for novice evaluators.

7 Conclusions

Heuristic Evaluation (HE) is a popular usability inspection method. It has many advantages in comparison with other methods. However, it still depends on the expertise of the evaluators to produce quality results. The conduction of HEs by novice evaluators is still not well supported.

In this study, we investigated reasons and ways to adapt HE for novice evaluators. The results showed evidence that heuristics 3 and 7 of the traditional set are probably the most difficult for novice evaluators to understand and to distinguish from each other. For this reason, we developed adaptations for these two heuristics, based on the knowledge of seven usability experts. The effectiveness of these adaptations in helping novice evaluators still needs to be tested.

Much work is still needed to achieve a well-defined adaptation of HE for novice evaluators. This study was limited to a sample from the Brazilian scenario. Future studies can replicate our method with larger samples, also considering contexts from other countries. In addition, further research can verify the validity of the adapted heuristics and compare the performance of groups of novice evaluators using the traditional set of heuristics with that of groups using our adapted set.