1 Introduction

Usability evaluation enables researchers to identify problems in existing systems and products and provides insight for system improvement. Recognized as a key quality attribute of software, usability is defined as the degree to which users can operate a product effortlessly to accomplish particular objectives with efficiency, effectiveness, and satisfaction (Harrati et al. 2016). In the domain of human-computer interaction, the usability of mobile applications has been highlighted because well-designed applications improve the user experience (Hoehle et al. 2016).

Meanwhile, with the prevalence of mobile devices, mobile health (mHealth) applications supporting medical care are becoming more common, among them applications targeting diabetes self-care. Diabetes, a chronic illness, affects around 415 million people worldwide, and an estimated 193 million people have undiagnosed diabetes (Chatterjee et al. 2017). Moreover, a study pointed out that diabetes is a main cause of hospitalization and in-hospital mortality (Li et al. 2018). Diabetes requires patients to monitor their own health condition. Poor glycemic control leads to complications including heart disease or stroke, visual impairment, hyperglycemic crisis causing death, and limb amputation (Fu et al. 2017). In a study of 3419 adults with type 2 diabetes mellitus (T2DM), the mean (± SD) age was 62.9 ± 12.5 years (Nanayakkara et al. 2018).

Despite the prevalence and complications of diabetes, patients’ conditions can be alleviated through appropriate management. Diabetes self-care is essential to controlling blood glucose levels (Lin et al. 2017). According to studies, mobile applications assisting self-care have been found to be beneficial to diabetes patients. Mobile health applications could assist diabetes self-care through monitoring of blood glucose, weight, and diet (Hoppe et al. 2017). Research found that diabetes applications were associated with improved glycemic control and have great potential to assist diabetes self-care (Fu et al. 2017).

Although technology could improve diabetes self-care, studies have also pointed out the need to assess the usability that users experience with mobile health applications. Furthermore, insufficient usability has been identified as one of the obstacles to the adoption of health information systems. Although the number of new healthcare applications has greatly increased in the last few years, the usefulness of the applications is inconsistent (Rose et al. 2017). In addition, studies found that the prevailing obstacles to the adoption and operation of health information systems include ambiguous design and low usability (Khajouei et al. 2018). With the continuously growing number of patients with diabetes and of elderly patients, the usability of applications assisting diabetes self-care is especially important.

The objective of this study was to evaluate the usability of three existing diabetes self-care applications and to explore the main pros and cons that users pointed out while operating the applications. A usability evaluation study explores the problems of a system and assists designers in generating improved design solutions. Therefore, this study used the usability test, an effective and widely adopted method that assesses a product by means of practical task scenarios of product usage (Sonderegger et al. 2016).

To collect more detailed information about the operation process of the applications, participants followed the think-aloud protocol, which has been widely applied in usability evaluation studies. With the think-aloud data collection method, participants share their thoughts with the researcher (Verkuyl et al. 2018). Researchers conducting the usability testing of an electronic system used a concurrent think-aloud moderating technique to encourage participants to vocalize their thoughts during the test sessions (Aiyegbusi et al. 2018). Another study showed that think-aloud protocol analysis and realistic simulations provided a successful usability evaluation (Chrimes et al. 2014).

2 Methods

2.1 Participants

A total of 30 participants were enrolled from a diabetes clinic in the usability evaluation of diabetes self-care mobile applications. The basic criteria included a type 2 diabetes diagnosis, regular use of a smartphone, and normal vision.

The participants consisted of 15 women and 15 men between 41 and 78 years of age, with a mean age of 60.03 years (SD = 8.92). Participants did not report any vision problems that interfered with mobile application operation. All participants had a type 2 diabetes diagnosis and habitually used smartphones and tablets. Table 1 presents the participant demographics.

Table 1. Participant demographics (n = 30).

2.2 Experimental Materials

The main experimental materials in this study were three different mobile applications for diabetes self-care and the System Usability Scale (SUS), which was administered in a paper-based format. The three applications were actual products that could be downloaded for either iOS or Android from the App Store or Google Play. For each app, the content was the same on both systems. All three diabetes applications could present the entered blood glucose levels in a graph view.

Since the study included elderly participants, a mobile phone with a larger screen was used to present the experimental stimuli: an iPhone 7 Plus with a 5.5 in. (diagonal) widescreen LCD and a resolution of 1,920 by 1,080 pixels. The three applications were installed on the iPhone used in the experiment (Fig. 1).

Fig. 1. The interfaces of the three evaluated diabetes applications.

2.3 Experimental Task Scenarios Design

As a widely adopted and very effective method of usability evaluation, a usability test assesses a product by constructing a practical product usage task scenario that involves prospective users (Sonderegger et al. 2016). Therefore, to design practical experimental tasks, this study conducted interviews at a diabetes clinic, one of the certified organizations of the National Diabetes Shared Care Network. In the interviews, six medical personnel, comprising two diabetes dietitians and four diabetes educators (also registered nurses), identified the diabetes application functions most important and fundamental to diabetes patients’ self-care. A feature list with 27 functions categorized from a study of 40 diabetes-targeted applications (Hoppe et al. 2017) was provided to the personnel as the basis for identifying important functions. The selected functions became the basis of the task scenario design for the usability evaluation.

The task scenarios included (1) Enter blood glucose value before breakfast into the application, (2) Enter blood glucose value after breakfast and enter food intake by adding a photo of the meal into the application, and (3) View the blood glucose measurement in a graph format.

Since the target blood glucose value differs before and after a meal, recording “before” or “after” meal with each entry is important. For diabetes patients aiming for a target A1C of below 7%, blood glucose levels should mostly be under 130 mg/dl before meals and under 180 mg/dl after meals (Gebel 2011). For task one, a before-meal blood glucose value was provided to the participant. For task two, an after-meal blood glucose value was given. In actual use of the applications, a patient first takes a picture of the food to be consumed and, after finishing the meal, measures the glucose value and enters it into the application together with the meal photo. Therefore, a meal photo was provided in the photo album of the iPhone.
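The dependence of the target on meal timing can be illustrated with a minimal sketch. It uses only the two thresholds quoted above (Gebel 2011); the function name, the timing labels, and the strict "under" comparison are assumptions made for illustration and do not describe how any of the evaluated applications work.

```python
# Upper targets quoted in the text (Gebel 2011) for a target A1C below 7%.
# The dictionary keys are hypothetical labels chosen for this sketch.
TARGET_UPPER_MG_DL = {"before_meal": 130, "after_meal": 180}


def is_within_target(value_mg_dl: float, timing: str) -> bool:
    """Return True when a reading is under the target for its meal timing."""
    if timing not in TARGET_UPPER_MG_DL:
        # Without the timing label the reading cannot be interpreted.
        raise ValueError(f"unknown meal timing: {timing!r}")
    return value_mg_dl < TARGET_UPPER_MG_DL[timing]


# The same reading is judged differently depending on the timing label.
print(is_within_target(150, "before_meal"))
print(is_within_target(150, "after_meal"))
```

A reading of 150 mg/dl is above the before-meal target but under the after-meal target, which is why a record saved without the timing label is hard to interpret.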

2.4 Procedures

The procedure of the experiment was as follows. All participants were informed about the purpose and process of the experiment. To reduce the stress of participants, the moderator emphasized that the purpose of the experiment was to evaluate the usability of the mobile applications rather than the performance of the participants. Basic information was collected from the participants, including their age, frequency of smartphone and tablet use, and educational background. Participants were asked to follow the think-aloud protocol during the operation process.

Each task was printed on a sheet of A4 paper. The information necessary for task completion was provided below the task scenario description, including the given blood glucose values and whether they were measured “before” or “after” breakfast. For the task of entering food intake, a meal photo was prepared in the photo album of the iPhone. During the operation, when participants were not sure where to tap and asked for assistance, a hint was provided: the moderator suggested an area, such as the upper or lower part of the screen, instead of pointing directly at the button. Following the think-aloud protocol, the participants were asked to describe the issues that influenced the usability of the systems while operating the mobile applications. After completing the three tasks, the participants completed the SUS subjective assessment to evaluate the overall usability of the application. Each participant repeated the process of completing the three task scenarios and the SUS for each of the three diabetes self-care applications.

3 Results and Discussion

3.1 Quantitative Results

The mean System Usability Scale (SUS) scores of the three evaluated diabetes applications were first compared against the scale that matches SUS scores to acceptability ranges and a grade scale. This study used repeated measures ANOVA to analyze the SUS scores of the three applications. The effect of gender on the SUS scores was analyzed with a t-test. The three diabetes applications are referred to as App 1, App 2, and App 3 in the results and discussion sections.

Developed by Brooke (1996), the ten-item SUS has been widely applied in usability studies. Researchers further established the five-grade scale and six adjective ratings that correlate with the SUS score (Bangor et al. 2009) (see Fig. 2). Based on the mean SUS scores, in terms of acceptability ranges, App 1 (M = 69.75) and App 2 (M = 65.42) were “marginal” while App 3 (M = 82.50) was “acceptable.” In terms of the grade scale, App 1 and App 2 received a “D” grade while App 3 received a “B” grade.
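For readers unfamiliar with how a single SUS score is obtained, the following sketch applies the standard scoring rule of the ten-item questionnaire: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name and the example responses are illustrative assumptions, not data from this study.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute the SUS score (0 to 100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("the SUS has exactly ten items")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


# Made-up responses from one hypothetical participant.
print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))
```

The mean reported for each application is then the average of these per-participant scores.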

Fig. 2. A comparison of the average SUS score in relation to acceptability scores, grade scale, and adjective ratings (Bangor et al. 2009).
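The mapping from a mean SUS score to an acceptability range and a grade can be sketched as below. The cut-offs (50 and 70 for acceptability; 60, 70, 80, and 90 for grades) follow the scale as it is commonly reproduced from Bangor et al. (2009); the function names and the treatment of scores that fall exactly on a boundary are assumptions of this sketch.

```python
def acceptability(score: float) -> str:
    """Map a SUS score to an acceptability range (boundary handling assumed)."""
    if score < 50:
        return "not acceptable"
    if score < 70:
        return "marginal"
    return "acceptable"


def grade(score: float) -> str:
    """Map a SUS score to a letter grade (boundary handling assumed)."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


# Mean SUS scores reported in the text for the three applications.
for name, mean_score in (("App 1", 69.75), ("App 2", 65.42), ("App 3", 82.50)):
    print(name, acceptability(mean_score), grade(mean_score))
```

Under these cut-offs the three reported means fall into the same categories as stated in the text.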

There was a statistically significant difference in SUS scores among the three applications (F(2,87) = 10.231, p < .001). The results of the repeated measures analysis of variance revealed that the SUS scores of App 3 were significantly higher than those of App 1 and App 2. Post-hoc analysis using Scheffé’s test indicated significant differences between the SUS scores of App 3 and App 1, as well as App 3 and App 2. The SUS scores did not differ significantly between App 1 and App 2 (Tables 2 and 3).

Table 2. SUS mean and standard deviation of the experimental data.
Table 3. Multiple comparisons: SUS scores of the three applications.
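As an illustration of the analysis named above, the following is a minimal sketch of the sum-of-squares partition behind a one-way repeated-measures ANOVA, in which each participant rates every application. The function name and the toy scores are hypothetical; the sketch does not use the study data and computes only the F statistic and its degrees of freedom, not the p-value.

```python
from typing import List, Tuple


def repeated_measures_anova(scores: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    scores[i][j] is the score of participant i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions (e.g. applications)
    grand_mean = sum(sum(row) for row in scores) / (n * k)

    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    # Between-participant variation is removed from the error term.
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_value, df_conditions, df_error


# Toy data: three hypothetical participants, three conditions.
toy_scores = [[1, 2, 3], [2, 4, 6], [3, 3, 3]]
f_value, df1, df2 = repeated_measures_anova(toy_scores)
print(f"F({df1},{df2}) = {f_value:.3f}")
```

For the toy data this prints F(2,4) = 3.000. With 30 participants and three applications, this partition gives degrees of freedom of (2, 58), whereas (2, 87) equals 3 − 1 and 90 − 3, the degrees of freedom of a between-groups partition of 90 scores, so the sketch should not be read as reproducing the reported statistic.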

An independent-samples t-test revealed no significant effect of gender on the SUS scores of any of the three applications (App 1: t(28) = 0.807, p = 0.426; App 2: t(28) = 0.391, p = 0.698; App 3: t(28) = 1.678, p = 0.104). For each application, male and female participants gave similar usability evaluations (Table 4).

Table 4. Independent sample t-test for gender effects on different apps.
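The 28 degrees of freedom reported for the gender comparison correspond to a pooled-variance (Student) t-test on two groups of 15 participants, since 15 + 15 − 2 = 28. A minimal sketch of that computation is given below; the function name and the example values are hypothetical, and a p-value would additionally require the t distribution (for example via `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_test(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Student's t-test for two independent groups, assuming equal variances.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance weights each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_value = (mean(group_a) - mean(group_b)) / standard_error
    return t_value, df


# Made-up scores for two small hypothetical groups.
t_value, df = pooled_t_test([1, 2, 3], [2, 4, 6])
print(f"t({df}) = {t_value:.3f}")
```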

3.2 Qualitative Results

The following sections present the usability issues identified by means of the think-aloud protocol when the participants operated the applications to complete the assigned tasks. Together with the screenshots of the operation flow shown in the figures and the description of the operation procedure, these findings could serve as a reference for diabetes application usability improvement and user interface (UI) design.

App 1.

When opening App 1, the button entitled “Add a new record” at the bottom of the interface (see Fig. 3(a)) made its function clear to the participants. When entering blood glucose (see Fig. 3(b)), many participants were not sure where to tap even though the “Tap here” hint was shown in the text field next to the mg/dl label. Below the mg/dl label, blood glucose could also be entered using the slider, a horizontal track with a control, and the value could be adjusted by tapping the plus and minus symbols at either end of the slider. However, the function of the colorful slider was not clear to most of the participants.

Fig. 3. Operation flow screenshots of App 1.

In the section for entering the before or after meal timing, the title “Event” was not clear to participants (see Fig. 3(c)). After tapping the text field with rounded corners, a picker with a scrollable list was displayed. In addition, even if the participants overlooked the text field for entering the before or after meal information, the blood glucose record could still be saved. However, this information is critical to blood glucose self-care since the target values before and after a meal differ. Therefore, an error prevention mechanism that enforces the correct procedure was required.

In App 1, the items for entering various information were placed on the same interface as a long form, including insulin amount, exercise, health condition, medicine taken, blood pressure, and body weight. However, participants found the interface complex and confusing with too much content. Instead, only the items most important to diabetes self-care should be displayed.

Except for the “add a new record” button, the buttons in App 1 were ghost buttons, that is, buttons bordered by a thin line (see Fig. 3(b)). Participants found these buttons difficult to find among the text in the interface layout (Table 5).

Table 5. App 1 issues identified in think-aloud protocol.

App 2.

When opening App 2, there were twelve buttons (see Fig. 1(b)). Therefore, the participants needed to spend more time searching for the place to enter blood glucose. The button entitled “Physical measurement” was not immediately associated with blood glucose. Instead of “Physical measurement”, before or after meal could be a better description. Since the first button covers the functions of entering blood glucose, blood pressure, heartbeat, and body weight measurements, the title of the button should be adjusted to cover these functions while remaining simple to understand.

Fig. 4. Operation flow screenshots of App 2.

According to participants, after entering the blood glucose entry section, the buttons at the top could not be noticed immediately since the green background color blended with the color at the top of the interface (see Fig. 4(b)). However, when the participants forgot to tap the before or after meal button, the application provided a reminder and prevented them from going to the next section without recording before or after meal. The font size of the description above the buttons was too small, and participants with presbyopia had difficulty reading it.

In addition, the transparent design of the “after breakfast” button was not clear enough. To several elderly participants, the color was too light to be noticed. Participants mentioned that both the before and after meal labels should be clearly displayed simultaneously, while an underline could be used to indicate the selected button. When the entered blood glucose value exceeded the standard level, App 2 provided an instant reminder to the users.

The buttons in App 2 were ghost buttons bordered by a thin line, with a transparent internal area containing plain text, as shown in the upper right corner of the figure. In the layout, important buttons were placed in the upper right corner, including the “add a new record” and “complete” buttons. Compared to App 3, participants spent more time searching for buttons when using App 2 (Table 6).

Table 6. App 2 issues identified in think-aloud protocol.

App 3.

In App 3, participants found the “Next step” button useful because it provided guidance through the process (see Fig. 5(a)). After entering the blood glucose value, the “Next step” button led users to the next section for entering the before or after meal timing, where eight buttons were clearly displayed: before meal, after meal, before exercising, after exercising, fasting, before sleeping, midnight, and others (see Fig. 5(b)). Once a button was tapped, the follow-up question “Which meal?” was presented below, together with four additional buttons: breakfast, lunch, dinner, and dessert (see Fig. 5(c)).

Fig. 5. Operation flow screenshots of App 3.

The buttons in App 3 were filled buttons with rounded corners. The tapped buttons were colored while the other buttons remained unchanged. The buttons of App 3 were twice as large in area as those of App 2 (Table 7).

Table 7. App 3 issues identified in think-aloud protocol.

Comparison of the Applications.

The major findings of the think-aloud protocol in this study are compared and discussed in the following points. First, “Guidance” was highlighted by the participants. Most participants found the guidance of App 3 helpful since they did not need to spend much time thinking about what to do next. With the “Next Step” filled button, the system directed users to the next step of the procedure. In contrast, the other two applications did not provide clear guidance. In App 1 and App 2, various text fields for information input were displayed on a single interface. Although participants could input information in any sequence, most of them found the user interface layout too complex.

Second, choices presented as buttons tended to help participants input information more efficiently. In App 1, all choices of before or after meal were integrated in the picker below the “Event” button. Therefore, the participants had to tap the “Event” button first in order to see the scrollable list with eleven choices including before breakfast, after breakfast, before lunch, after lunch, etc. In App 2, the choices of meal were displayed as six buttons: fasting, breakfast, lunch, dinner, before sleep, and midnight. After tapping any of the six buttons, the submenu of before or after meal was displayed. App 3 presented all before or after meal choices at once with eight buttons: before meal, after meal, before exercise, after exercise, fasting, before sleep, midnight, and others. Users did not need to tap a button to see a submenu. According to many participants, the choices should be presented as buttons instead of in a picker for better usability.

Third, an issue with the interaction area was identified. During the observation, when the participants tried to tap a button but did not successfully hit the interaction area, they tended to give up and try tapping other buttons, especially in App 2, the application with the smallest buttons among the three applications. Therefore, the size and interaction area of the buttons could be extended for improved usability. According to research on interface use by the elderly, the surface of the interaction area should be extended instead of being limited to a small area (Castilla et al. 2018). For instance, the buttons in App 3 were larger in area than those of the other two applications, and participants found larger buttons easier to tap. Previous research on the effect of button size on performance and perceptions found that users generally performed better with medium to large buttons when using a touchscreen (Tao et al. 2018).

Finally, error prevention was important to usability. For blood glucose monitoring, the before or after meal information is critical since the targets before and after a meal differ, so each record must include it. Therefore, the applications should assist users in selecting the before or after meal choice that labels the entered blood glucose value. In App 1, the before or after meal selection could be skipped by mistake. In App 2, if the before or after meal buttons were not tapped, a reminder appeared that prevented users from saving the record. However, many users could not find the “after meal” button due to its transparency. App 3 actively guided the participants to select before or after meal buttons during the information input process. Error prevention, one of the identified usability principles (Fung et al. 2016), should be highlighted in the design of diabetes applications.
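The error prevention behavior discussed here, refusing to save a reading until its meal timing has been selected, can be sketched as a simple validation step. All names below (the record class, the timing labels, and the messages) are hypothetical and chosen for illustration; the sketch does not describe the implementation of any of the evaluated applications.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical timing labels for this sketch.
MEAL_TIMINGS = ("before_meal", "after_meal")


@dataclass
class GlucoseRecord:
    value_mg_dl: Optional[float] = None
    meal_timing: Optional[str] = None


def validation_errors(record: GlucoseRecord) -> List[str]:
    """Return the messages that should block saving; empty when the record is complete."""
    errors = []
    if record.value_mg_dl is None or record.value_mg_dl <= 0:
        errors.append("Enter a blood glucose value.")
    if record.meal_timing not in MEAL_TIMINGS:
        errors.append("Select whether the reading was taken before or after the meal.")
    return errors


def can_save(record: GlucoseRecord) -> bool:
    """A record may be saved only when no validation message remains."""
    return not validation_errors(record)


# A value without a timing label is rejected; a labeled value is accepted.
print(can_save(GlucoseRecord(value_mg_dl=120)))
print(can_save(GlucoseRecord(value_mg_dl=120, meal_timing="before_meal")))
```

Returning the messages, rather than a bare failure, lets the interface show the user which selection is still missing.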

In sum, based on the think-aloud protocol, participants mentioned that a clear and simple user interface layout would increase the usability of the diabetes applications, especially for elderly users. Most importantly, “guidance” was frequently highlighted by participants. Applications for diabetes self-care should actively provide clear guidance that leads to the following steps instead of displaying complex information on the interface. Font size and buttons should be larger for clear recognition, especially for users with presbyopia. If the usability of the applications is insufficient, elderly users may give up using them.

4 Conclusion

This study evaluated and compared the usability of three existing diabetes self-care applications by means of the SUS usability survey. Based on the SUS results, the usability of App 3 was significantly better than that of the other two applications. According to the independent t-test, gender did not have a significant effect on the SUS scores. Through the think-aloud protocol, usability issues were identified in all three applications. Issues highlighted by elderly participants should be taken into consideration by designers since diabetes patients are largely elderly people. Together, the quantitative scores, the qualitative results of the think-aloud protocol, and the presented screenshots of the operation flow could be applied to the interface design and usability improvement of diabetes self-care applications.