1 Introduction

Heuristic evaluation is a low-cost yet effective usability evaluation method [1]. According to the Usability Professionals' Association (UPA, now UXPA) Salary Survey 2010, heuristic evaluation is the second most widely used testing method among organizations worldwide. In heuristic evaluation, usability practitioners review and evaluate an interface design [2] based on their expertise and a set of usability heuristics. Methods such as user acceptance testing and focus groups require users to be recruited and moderated. Heuristic evaluation avoids this and therefore consumes less time and fewer resources.

Nielsen [3] proposed a set of usability heuristics that can serve as a framework for general-purpose usability testing. Sivaji et al. [4] later extended this set (Table 1) and reported favourable outcomes when generic heuristics were combined with domain-specific heuristics.

Table 1. General-purpose heuristics for usability testing

The heuristics in [4] originated from software usability expertise gained when interaction was mainly via mouse and keyboard, before the proliferation of mobile touchscreen devices such as smartphones [5] and tablets [6]. Touchscreen mobile devices primarily use finger gestures [7–9] as their main input. These gestures range from simple ones such as tap, swipe, and pinch to more complex ones such as edge swipe, multi-finger swipe, tilting, and shaking. Because typical touchscreen technology supports multiple concurrent touch points, a practically unlimited number of gestures can be programmed as inputs.

Norman and Nielsen [10] were critical of the gestural interactions of the time (i.e., those in Apple iOS and Google Android). They argued that these interactions were riddled with usability issues because developers disregarded established fundamental principles of interaction design. Similarly, when the touch-oriented Microsoft Windows 8 operating system was launched, its gesture-based interface was found to provide a poor user experience because its gestures were hard to discover and error-prone [11]. It is therefore desirable to minimize the usability problems that arise when designing and developing gesture vocabularies for current touchscreen devices and for emerging technologies such as motion tracking [12], augmented and virtual reality, and holograms.

2 Problem Statement

Organizations with a dedicated usability testing team usually have established usability testing methods and a corresponding usability heuristic framework (e.g., Table 1). These methods and frameworks are designed for general-purpose usability testing. It is hypothesized that, without gesture-specific heuristics, heuristic evaluators would find fewer gesture-related problems [10]. Capturing gesture-specific domain knowledge as new gesture usability heuristics could close this gap, provided the new heuristics overlap minimally with the existing general-purpose heuristics.

In addition, without gesture-specific heuristics, usability problems are harder to find during the early and middle stages of the product development life cycle [13]. This problem is more severe for complex gestures [14]. A usability problem found late in product development may be costly or impossible to fix.

Further, designing gestures is a time-consuming and delicate process [14]. Gesture-specific heuristics could help interaction designers shorten the gesture design process [15] and avoid spending time benchmarking impractical gestures.

3 Literature Review

A systematic literature review is carried out to study previous work. Its objective is to adopt a suitable gesture definition and framework, which are needed to identify, differentiate, and filter gesture-specific heuristics in the subsequent literature review.

3.1 Definition and Framework

The study adopts the gesture framework of Karam [16] because of its comprehensiveness. Here, the term “gestures” refers to an expansive range of interactions enabled through a variety of gesture styles, enabling technologies, system responses, and application domains (see Fig. 1). In addition, the types of gesture styles (the physical movements of gestures) are reviewed and emphasized. Gesture styles can be grouped into five categories: deictic, gesticulation, manipulation, semaphores, and sign languages.

Fig. 1.

Classification of gestures, highlighting the gesture styles within the study scope (sign language is omitted from the list of gesture styles).

Deictic gestures involve pointing to establish the spatial location or identity of an object. This is very similar to direct-manipulation input with a mouse. Deictic gestures are among the simplest gestures to implement.

Manipulation gestures are those in which the movements of the gesturing fingers, hand, or arm are tightly coupled to the entity being manipulated [17]. Depending on the user interface, manipulations can involve two- or three-dimensional movements. They can also use tangible objects as the input medium (e.g., a model of the object being controlled). The controlled object can be an on-screen digital object or a physical mechanical object such as a robot.

Gesticulation gestures derive from the non-verbal gestures humans use to accompany or substitute for speech. They rely on computational analysis of body, hand, or arm movements in the context of speech. When used together with speech commands, these gestures add clarity to the speech. Gesticulation gestures such as iconic and pantomime gestures can also be used in lieu of speech.

The semaphore style can be found in any gesturing system that employs a stylized dictionary of static or dynamic hand or arm gestures. Semaphore gestures are widely applied in the literature because they are practical: they are not tied to the factors that define the other gesture styles. In addition, combining dynamic and static poses offers a practically unlimited choice of gestures.

Sign language gestures are linguistically based and independent of the other gesture styles. There are more than a hundred known sign languages in the world [18]. Sign languages cannot be designed by an interface designer; they are therefore not testable and are excluded from the scope of this study.

Further, the gesture framework proposed in [16] describes gestural interaction as the composition of two intersecting human-computer interaction (HCI) task-artifact cycles [19]. These cycles comprise the four main components in Fig. 1 (gesture styles, application domain, enabling technologies, and system response), whose roles are either to provide possibilities (i.e., methods and media of interaction) or to produce system responses.

3.2 Gathering of Gesture Usability Heuristics

Once the gesture definition and framework have been determined, the study proceeds to review and gather gesture-related usability heuristics. The studies selected from the literature review are Norman and Nielsen [10], Wach et al. [20], Baudel and Beaudouin-Lafon [21], Wu et al. [22], Yee [23], and Ryu et al. [24].

As mentioned, [10] provides a critical analysis of the usability issues in mainstream touchscreen gestural interactions. It offers six fundamental principles of general interaction design: Visibility, Feedback, Consistency and Standards, Discoverability, Scalability, and Reliability. However, these heuristics overlap with the general-purpose usability heuristics in Table 1.

The study in [20] is an earlier work on the general state and potential of gestural interaction, predating the proliferation of smartphone and tablet gestural devices. The heuristics it outlines are Learnability, Intuitiveness, User Feedback, Low Mental Load, User Adaptability, Reconfigurability, and Comfort. These heuristics are gesture-specific and could be applied in a wide range of fields, from gaming to medical surgery.

In [21], a glove-based gestural interaction system named Charade was created. The study proposed a set of heuristics for designing and testing the device's gestures. The five major heuristics proposed are Fatigue, Non-Self-Revealing, Lack of Comfort, Immersion Syndrome, and Segmentation of Hand Gestures.

The study in [22] presents a gestural interaction device based on a small, table-sized touchscreen surface operated by hand and stylus. It also proposes a set of heuristics for designing and evaluating the gestural interaction: Gesture Registration, Gesture Relaxation, and Gesture and Tool Reuse. Compared with [20], it focuses more on reusing a small set of gestures for multiple functions.

The study in [23] reviews the usability problems of indirect or abstract gestures, which, according to the classification in [16], could be a mixture of gesticulation and semaphores. The study identified five major needs and the corresponding guidelines for gestural interaction: “achieve high effectiveness”, “deviate potential limitation in productivity application”, “minimize learning among users and increase differentiation among gestures”, “design efficient gestures to increase user adoption”, and “maximize the value of finger gestures”.

Lastly, [24] conducted a systematic gathering and evaluation of information and guidelines on gesture applicability. Not all of the heuristics it offers are gesture-specific. Using the definition of gestural interaction in [16], two gesture-specific heuristics were identified: Naturalness and Expressiveness.

4 Proposed Heuristics

After gathering the heuristics, the study uses a simple phenomenon- or phase-based classification method [25] to group and combine them. The gesture heuristics are sorted into four phases: “Before using”, “During using”, “After using”, and “Prolonged using” (Table 2).

Table 2. Proposed heuristics by phase classification

This classification method is simple to understand yet covers all stages of gesture use. Each group is labelled by generalizing the heuristics it contains. These labels are also the names of the four proposed heuristics: Learnability, Cognitive Workload, Adaptability, and Ergonomics (Table 3).

Table 3. Naming of the four groups of gesture heuristics

5 Experiment

An experiment was conducted to verify the gesture heuristics. Five usability practitioners were invited to participate in a heuristic evaluation using the gesture-specific heuristics. The participants were aged between 25 and 55 years and had at least three years of usability-related experience. This number of participants is considered sufficient because heuristic evaluations are usually performed with a recommended three to five evaluators [2].

Testing was carried out on tasks involving basic touchscreen gestures on an iPhone 5 running iOS 8.1.1. The tasks were phone unlocking, internet browsing, map navigation, note taking, and miscellaneous home-screen operations. The tasks involved the gestures available on the device [7].

Initially, the usability practitioners were instructed to perform a heuristic evaluation to identify issues and good design features of the gestures used during the tasks. At this point, the practitioners could rely only on the existing heuristic guidelines in Table 1.

Subsequently, a waiting period of two days was imposed to reduce carry-over bias from the first session. The GH (gesture heuristics) descriptions and their related literature [10, 20–24] were then taught to the practitioners, who performed the heuristic evaluation again with the addition of the new gesture heuristics. The null and alternative hypotheses are therefore as follows:

  • \( H_0 : \mu_{\text{before GH}} = \mu_{\text{after GH}} \): there is no difference in the mean number of defects found before and after using the gesture heuristics.

  • \( H_a : \mu_{\text{before GH}} \ne \mu_{\text{after GH}} \): there is a difference in the mean number of defects found before and after using the gesture heuristics.
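Because every evaluator produced a defect count in both sessions, these hypotheses concern paired observations. The standard paired-sample t statistic is sketched below; the symbols \(x_i\), \(d_i\), \(\bar{d}\), \(s_d\), and \(n\) are notation introduced here for illustration and are not taken from the original analysis.

\[
d_i = x_i^{\text{after GH}} - x_i^{\text{before GH}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
\]

\[
t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \mathrm{df} = n - 1
\]

Testing \(H_0\) is equivalent to testing \(\mu_d = \mu_{\text{after GH}} - \mu_{\text{before GH}} = 0\). With \(n = 5\) evaluators, df = 4. Using the values reported in the Results section, \(\bar{d} = 6.2 - 4.4 = 1.8\) and \(t(4) = 4.811\) imply \(s_d = 1.8\sqrt{5}/4.811 \approx 0.837\); the statistic depends on the spread of the per-evaluator differences rather than on the two per-condition spreads.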

6 Results

A Shapiro-Wilk test did not reject normality for either condition, with or without GH (p > 0.05). A paired-sample t-test was conducted in SPSS version 22 to compare the number of defects found before and after the GH knowledge was transferred to the evaluators. The results reveal a statistically significant difference between the number of defects found before (mean = 4.4, SD = 0.6782) and after the knowledge transfer (mean = 6.2, SD = 0.6633); t(4) = 4.811, p = 0.009. \(H_0\) is therefore rejected, and it can be concluded that heuristic evaluation performed with GH finds significantly more defects than heuristic evaluation performed without GH.
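The analysis above was run in SPSS. For readers who want to reproduce the computation elsewhere, the following is a minimal, standard-library-only Python sketch of a paired-sample t-test. The function names are hypothetical, the two-tailed p-value is obtained by numerically integrating the Student t density with Simpson's rule (an assumption of this sketch, not the SPSS procedure), the Shapiro-Wilk check is not reimplemented, and the example counts are illustrative placeholders, not the study's per-evaluator data.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 10_000) -> float:
    """Two-tailed p-value P(|T| >= |t|) via Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule to apply.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def paired_t_test(before: list[float], after: list[float]) -> tuple[float, int, float]:
    """Paired-sample t-test on per-evaluator defect counts (hypothetical helper).

    Returns (t, df, two-tailed p); t > 0 means more defects were found after GH.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    s_d = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if s_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (s_d / math.sqrt(n))
    df = n - 1
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    # Illustrative placeholder counts for five evaluators (NOT the study's data).
    before_gh = [3, 4, 5, 4, 6]
    after_gh = [5, 5, 6, 6, 7]
    t, df, p = paired_t_test(before_gh, after_gh)
    print(f"t({df}) = {t:.3f}, p = {p:.4f}")
```

With the placeholder counts, the script prints `t(4) = 5.715, p = 0.0046`. Passing the reported statistic directly, `two_tailed_p(4.811, 4)` gives approximately 0.0086, which rounds to the p = 0.009 reported above. Numerical integration was chosen only to avoid a third-party dependency; in practice, a statistics package's paired t-test and Shapiro-Wilk routines would normally be used instead.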

On average, each participant found 1.8 more issues when using GH. Fig. 2 shows the number of issues found per participant, and Table 4 shows the statistical analysis results.

Fig. 2.

Experimental results showing the number of issues found without and with GH (gesture heuristics).

Table 4. Paired sample t-test analysis results

An important finding is that more ergonomics issues were detected with the gesture heuristics. This is because the general-purpose heuristics do not consider prolonged use of a product.

In addition, learnability issues are more likely to be considered valid when the gesture heuristics are used, because the gesture heuristics literature provides better clarity and information on past errors. Further, the gesture Learnability heuristic requires that users be able to explore menus systematically to learn gestures.

Overall, the participants’ feedback was that the gesture heuristics provide additional domain-specific knowledge that is useful in heuristic evaluation.

7 Conclusion

The study proposed four usability heuristics for heuristic evaluation that target the gesture vocabulary used in human-computer interaction: Learnability, Cognitive Workload, Adaptability, and Ergonomics. These heuristics exclude general-purpose software heuristics and hardware-performance-based heuristics, which can be evaluated with existing usability models. The proposed heuristics were derived from six existing sets of usability heuristics. An experiment was conducted to verify the effectiveness of the gesture-specific heuristics, and, as hypothesized, the results show that they complement the general-purpose heuristics and enable heuristic evaluators to discover more usability issues.

8 Future Works

One limitation of this study is that the sample of heuristic evaluators was small and drawn from a single organization, culture, and country. Future experiments with more diverse participants could address these limitations. The gesture heuristics could also be used to design gestures for new products, such as those that use hand and body movements as input in large-screen environments. In such cases, the effectiveness, efficiency, and satisfaction of a gesture set created with the heuristics in this paper could be gauged and compared with gesture vocabularies designed on general-purpose design or usability principles. In addition, the time taken to modify or replace gestures in later stages of product development could be documented to estimate the time savings gained from using the gesture heuristics.