1 Introduction

With advances in the miniaturization of computational devices and in wireless communications, recent years have seen an increase in the development of ubiquitous applications [1]. Such applications completely change the way users interact with technology, since their services must be available everywhere and at any time. They must support users in everyday activities and remain transparent, requiring little or no attention. Thus, an essential characteristic for the acceptance of these applications is calmness, which was first described by Weiser and Brown (1997) as a new approach to properly fit computing into people’s lives [2].

As ubiquitous applications are embedded within everyday objects (e.g., cell phones, watches) and environments (e.g., home, office), there is a high risk of users feeling annoyed and overwhelmed by them. A calm application must support the user’s activities at the right time and place, delivering the best service possible [3, 4]. We believe that calmness has a great impact on user satisfaction and, therefore, on the usability and acceptance of the ubiquitous application.

In this paper, we propose to evaluate calmness by using software measurements. To perform these measurements, it is first necessary to define which software measures should be collected. For this purpose, there is a well-known method in the Software Quality area called Goal-Question-Metric (GQM) [5]. GQM is a goal-oriented approach that follows a hierarchical model: it starts with the definition of a measurement goal, which is refined into several questions and, finally, into metrics, which provide the information needed to answer the questions. By answering these questions, it is possible to analyze whether the goal is achieved.

This paper presents a model composed of a goal, questions and software measures, defined by the GQM method, for evaluating the calmness characteristic in ubiquitous applications. By applying this model, it is possible to verify whether the application presents a good level of calmness and what could be changed within the application to improve that level. We also present results from a case study involving three mobile ubiquitous applications.

2 Calmness Evaluation in Ubiquitous Applications

Ubiquitous applications are those capable of monitoring the environment and users in order to provide services as naturally as possible. To achieve this goal, they have to comply with challenging requirements such as autonomy, heterogeneity, coordination of activities, mobility and context-awareness [6]. In the scope of Human-Computer Interaction (HCI), there is another characteristic that is indispensable for ubiquitous applications to be used and accepted by users: calmness.

According to [2], a calm technology should move easily from the periphery of attention to the center, and back. The periphery describes what we are attuned to without focusing on it explicitly. For example, when we are driving, we focus on the road rather than on sounds. However, when a specific sound related to an event occurs, this information comes to the center of attention. That is, we can keep information in the periphery and attend to it only when necessary.

Through a literature search, we found one study [3] that carefully defines calmness and how it can be evaluated. It proposes a conceptual framework for evaluating calmness in ubiquitous applications. The authors characterize calmness through two statements: calm timing and calm interaction. Calm timing means that the ubiquitous application should interact with the user in the right situation. Calm interaction means that the application should remain out of the user’s attention whenever possible. They propose a subjective evaluation using values such as High, Medium, Low and Very Low. However, they do not apply the proposed framework to evaluate ubiquitous applications in a real usage situation, nor do they define a measurement function or a collection method for the proposed values.

Other previous work related to calmness was found in [4, 7]. These studies state that calm technology should allow users to access new information peripherally, enabling them to decide whether to divert their attention and change their focus. They aim to create a model for evaluating whether a technology is calm. Anthropology-Based Computing (ABC) and Peripheral Interaction (PI) are the objects of their studies. However, they are still testing fade-ups and calm ringtones.

We argue that an evaluation using software measurement is also needed for calmness in ubiquitous applications. According to [10], measurement is the process of defining, gathering and analyzing data about products in order to provide meaningful information for the purpose of improving them. Several papers in the literature use measures to evaluate ubiquitous applications [8–11], including a paper with measures to evaluate context-awareness [12]. However, our literature study did not find any papers with software measures to evaluate calmness.

3 A GQM Model to Evaluate Calmness in Ubiquitous Applications

Using the GQM method, we defined our goal as follows: “Analyze the ubiquitous application, for the purpose of evaluating, with respect to calmness, from the viewpoint of the user”. Then, we derived three questions and eleven measures based on both a literature review [3, 8–14] and interviews with five experts on ubiquity. The resulting GQM model is presented in Fig. 1.

Fig. 1. The GQM model to evaluate calmness in ubiquitous applications
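
As an illustration of the hierarchical goal, question and measure structure, the following minimal Python sketch encodes the parts of the model that are named in this paper. The class names and the `user_reported` flag are assumptions made for this example, and only the two questions whose measures are detailed in the text are included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measure:
    name: str
    # True when the text says the measure is collected from the user after use.
    user_reported: bool = False


@dataclass
class Question:
    text: str
    measures: List[Measure] = field(default_factory=list)


@dataclass
class GQMModel:
    goal: str
    questions: List[Question] = field(default_factory=list)

    def measure_names(self) -> List[str]:
        """Flatten the hierarchy into a list of measure names."""
        return [m.name for q in self.questions for m in q.measures]


model = GQMModel(
    goal=("Analyze the ubiquitous application, for the purpose of evaluating, "
          "with respect to calmness, from the viewpoint of the user"),
    questions=[
        Question(
            "Is the application capable of interacting with users in the right moment?",
            [
                Measure("Adaptation Degree"),
                Measure("Adaptation Correctness Degree"),
                Measure("Indicator of Transparent Mobility"),
                Measure("Availability Degree", user_reported=True),
                Measure("Context-awareness Timing Degree", user_reported=True),
            ],
        ),
        Question(
            "Does the application effectively use the periphery and the center "
            "of the user's attention?",
            [
                Measure("Number of irrelevant Focus Changes"),
                Measure("Proactivity of the application"),
                Measure("Number of failures"),
                Measure("Relevancy Degree", user_reported=True),
                Measure("Courtesy Timing Degree", user_reported=True),
            ],
        ),
    ],
)
```

Keeping the measures attached to their question makes it straightforward to report results per question, which is how the GQM method answers whether the goal is achieved.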

3.1 Question 1: Is the Application Capable of Interacting with Users in the Right Moment?

Users need to feel that the ubiquitous application is available anytime and anywhere. This can be achieved under the following conditions: (i) a context-aware adaptation should happen whenever a contextual change requires it; (ii) this adaptation should be correct; and (iii) the application should support mobility, which refers to the continuous, uninterrupted use of the system as the user moves across several devices. Based on these conditions, we defined the five measures shown in Table 1.

Table 1. Software measures for question 1

The first two software measures are related to context-awareness adaptation. When the context changes (e.g., location or activity), the application has to adapt to the new context. If this does not happen, the user will probably feel that the application has not worked well. Thus, the measure Adaptation Degree counts how many times the system adapts when requested by a context change.

Adaptation Correctness Degree aims to identify whether an adaptation happened in the way the user expected, that is, whether the services and/or information were delivered correctly with respect to what the user expected at that specific moment. To calculate it, it is first necessary to identify which of the system’s adaptations can be collected (N). It is then possible to count how many times a particular identified adaptation occurred during a certain usage period of the application (Bi) and how many of those occurrences were correct (Ai). As a system can have several adaptations, the measure sums over all the identified adaptations and then computes an average by dividing by N.
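
The calculation described above can be sketched as follows. This is a minimal illustration that assumes each identified adaptation contributes the ratio Ai/Bi to the summation and that every identified adaptation occurred at least once (Bi > 0). The function name and the input format are hypothetical.

```python
from typing import List, Tuple


def adaptation_correctness_degree(counts: List[Tuple[int, int]]) -> float:
    """Average, over the N identified adaptations, of Ai / Bi.

    counts holds one (Ai, Bi) pair per identified adaptation, where Bi is the
    number of times the adaptation occurred and Ai is the number of times it
    occurred correctly. Assumption: each term of the summation is Ai / Bi.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("at least one identified adaptation is required")
    total = 0.0
    for correct, occurred in counts:
        if occurred <= 0 or not 0 <= correct <= occurred:
            raise ValueError("expected 0 <= Ai <= Bi and Bi > 0")
        total += correct / occurred
    return total / n


# Two identified adaptations: 3 of 4 correct, and 1 of 2 correct.
degree = adaptation_correctness_degree([(3, 4), (1, 2)])  # 0.625
```

Under these assumptions the result lies between 0 and 1, with 1 meaning that every observed adaptation was correct.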

The Indicator of Transparent Mobility is based on the information about mobility from the work of [13]:

  • High is when the application can move from one device to another, keeping the past interactions and adapting resources to the new device (e.g., screen size), so the user can continue their tasks seamlessly.

  • Medium is when the application can move from one device to another, keeping the past interactions and adapting to the new device’s features. However, the user is likely to have to wait a long time to start interacting with the application on the new device.

  • Low is when the application can move to another device. However, the application does not adapt to it. For example, the new screen size is not taken into account.

  • Nonexistent is when the application cannot move from one device to another.
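
The four levels above can be read as a small decision procedure. The sketch below is one possible encoding. The boolean parameter names are hypothetical, and the handling of an application that adapts to the new device but loses past interactions (a case the list does not cover) is an assumption of this example, which classifies it as Low.

```python
from enum import Enum


class MobilityLevel(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    NONEXISTENT = "Nonexistent"


def transparent_mobility(can_move: bool,
                         keeps_interactions: bool,
                         adapts_to_device: bool,
                         long_wait: bool) -> MobilityLevel:
    """Classify an application according to the four levels listed above."""
    if not can_move:
        return MobilityLevel.NONEXISTENT
    # Assumption: losing past interactions is treated like not adapting.
    if not (keeps_interactions and adapts_to_device):
        return MobilityLevel.LOW
    if long_wait:
        return MobilityLevel.MEDIUM
    return MobilityLevel.HIGH
```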

The last two measures (Availability Degree and Context-awareness Timing Degree) are qualitative and collected from the user after the use of the application.

3.2 Question 2: Does the Application Effectively Use the Periphery and the Center of the User’s Attention?

This question is related to the first consideration in Weiser and Brown’s paper [2]: a calm technology should move easily from the periphery of our attention to the center, and back. This leads us to measure whether the application is proactive enough to reduce the user’s decision-making time, the number of times the user is unnecessarily interrupted, and the number of failures that occur during use.

It is necessary to know if the application interacts with the user only when needed, delivering relevant information and requests. To answer this question, five measures were defined and presented in Table 2.

Table 2. Measures for question 2

The first measure (Number of irrelevant Focus Changes) aims to identify the number of times the user had to change focus because of the technology. This shift in focus happens when the user has to perform some action for the application to work correctly during use. For example, restarting a particular sensor or restarting the application during use are actions that divert the user’s attention unnecessarily.

The second measure (Proactivity of the application) aims to identify the degree of proactivity by counting how many user actions the application is able to replace. This measure is calculated by counting how many actions in the application can be supported by sensors and how many of these actions are actually replaced by the sensors. The closer to zero the better, as this means that every action capable of being performed by the sensors was actually performed by them.
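
The text does not give the exact formula for this measure. One reading that is consistent with "the closer to zero the better" is the fraction of sensor-supportable actions that the user still has to perform manually. The sketch below implements that reading as an assumption, not as the definitive function, and the names are hypothetical.

```python
def proactivity(supportable_actions: int, replaced_actions: int) -> float:
    """Fraction of sensor-supportable actions NOT replaced by sensors.

    Assumed reading of the measure: 0.0 means every action that sensors
    could perform is actually performed by them.
    """
    if supportable_actions <= 0:
        raise ValueError("at least one sensor-supportable action is required")
    if not 0 <= replaced_actions <= supportable_actions:
        raise ValueError("expected 0 <= replaced_actions <= supportable_actions")
    return 1 - replaced_actions / supportable_actions


# Four actions could be supported by sensors; three are actually replaced.
value = proactivity(supportable_actions=4, replaced_actions=3)  # 0.25
```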

The third measure (Number of failures) aims to identify how many failures happened while the user was using the application. The more failures that occur, the more distractions the user will face.

Finally, the last two measures (Relevancy Degree and Courtesy Timing Degree) are qualitative and collected through the form that the user answers after using the application.

3.3 Collection Methods

Each measure presented above is collected by one or more of the following collection methods:

  • User interaction logs: Logs of the user’s interaction with the application are generated through code instrumentation. These logs should be collected during the use of the application. Each application to be evaluated needs to be instrumented.

  • Questionnaires: Two questionnaires were developed to collect some of the measures. One questionnaire is composed of questions for the application developer; the other is answered by the user after using the application, to collect the qualitative measures.

  • Observation by an Evaluator: A form for manual observation is filled in by an evaluator who follows the user during the application use.
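
To make the log-based collection method more concrete, the following sketch shows how two of the count-based measures could be derived from instrumentation logs. The log record format and the event labels ("failure", "focus_change") are hypothetical. The paper does not specify the format produced by the instrumented applications.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LogEvent:
    timestamp: float  # seconds since the session started (assumed unit)
    kind: str         # hypothetical label written by the instrumentation


def summarize(events: Iterable[LogEvent]) -> Dict[str, int]:
    """Count the events that feed the two count-based measures."""
    counts = Counter(event.kind for event in events)
    return {
        "number_of_failures": counts["failure"],
        "number_of_irrelevant_focus_changes": counts["focus_change"],
    }


session = [
    LogEvent(1.0, "adaptation"),
    LogEvent(4.2, "focus_change"),
    LogEvent(9.7, "failure"),
    LogEvent(12.3, "focus_change"),
]
summary = summarize(session)
```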

4 Case Studies

The proposed measures were collected through case studies with three ubiquitous applications developed for mobile devices on the Android platform: GREatPrint, GREatMute and GREatTour.

4.1 GREatPrint

This application aims to print documents on the printer nearest to the user. The application works as follows: after choosing a file, the user clicks on the print button for the given document. The application then searches for the Wi-Fi network with the highest signal intensity, which suggests that this network is probably the closest one to the mobile device. With this information, the application checks which printer is in the range of that network. Finally, the application sends the document to be printed on that printer and informs the user which printer was selected.
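
The selection step described above can be sketched as follows. This is an illustration only, not the application's Android code: the function name, the network names and the mapping from networks to printers are hypothetical. Signal strength is assumed to be reported in dBm, where a higher (less negative) value indicates a stronger signal.

```python
from typing import Dict, List, Optional, Tuple


def select_printer(scan_results: List[Tuple[str, int]],
                   printers_by_network: Dict[str, str]) -> Optional[str]:
    """Return the printer associated with the strongest visible network.

    scan_results holds (network_name, signal_dbm) pairs. Returns None when
    no network is visible or the strongest network has no known printer.
    """
    if not scan_results:
        return None
    strongest_network, _ = max(scan_results, key=lambda result: result[1])
    return printers_by_network.get(strongest_network)


# Hypothetical scan: the second network has the stronger signal.
chosen = select_printer(
    [("lab-floor1", -70), ("lab-floor2", -48)],
    {"lab-floor1": "printer-A", "lab-floor2": "printer-B"},
)
```

Because signal strength is only a proxy for distance, a result of this kind may occasionally select a printer that is not physically the nearest one.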

Twelve users participated in the evaluation of this application. The developer also participated by answering a form specific to application developers. The twelve users were divided into four groups of three people to run the application on different floors and in different rooms. The task to be performed by the users was to print a pre-established document with the application. Users were asked to use the application in the GREat research lab, because GREatPrint is targeted at this environment.

An evaluator was present during the usage, noting, for example, if the captured context was correct, if the application failed, and other valid information. After usage, users were asked to answer the user questionnaire. Table 3 presents the results of the measures collected.

Table 3. Results from the GREatPrint evaluation

We can conclude that the application must be improved to better address context-awareness and mobility. One suggested improvement is to add more context information, using additional sensors such as the accelerometer or the magnetometer, to infer the actual location of the user more precisely. Another improvement would be to allow the use of the application on other device types, for example, desktops and notebooks.

4.2 GREatMute

GREatMute is a service that runs in the background of the user’s mobile phone. It monitors the user’s Google Calendar for events during which the user cannot receive calls, e.g., “meeting” or “class”. On discovering such events, the application places the user’s phone in silent mode for the duration of the event, so that the user is not disturbed.

This application also allows the user to specify the events during which they would like the device to be in silent mode. This is done by registering keywords to be monitored. When the application finds an event whose title contains one of the registered keywords, GREatMute schedules the silent period based on the start time and end time extracted from the event. During this time, the phone uses the silent profile.
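
The keyword-based scheduling can be illustrated with a short sketch. The event structure and function name are hypothetical, and case-insensitive substring matching is an assumption of this example. The paper does not state how titles are compared with keywords, and the real application reads events from Google Calendar rather than from an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CalendarEvent:
    title: str
    start: datetime
    end: datetime


def silent_intervals(events: Iterable[CalendarEvent],
                     keywords: Iterable[str]) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) periods during which the phone should be silent."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        (event.start, event.end)
        for event in events
        # Assumption: a keyword matches when it appears anywhere in the title.
        if any(keyword in event.title.lower() for keyword in lowered)
    ]


events = [
    CalendarEvent("Project Meeting", datetime(2024, 5, 6, 10), datetime(2024, 5, 6, 11)),
    CalendarEvent("Lunch", datetime(2024, 5, 6, 12), datetime(2024, 5, 6, 13)),
]
intervals = silent_intervals(events, ["meeting", "class"])
```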

To execute the tests with GREatMute, users were asked to install the application on their own devices and use it for a week. Thus, users experienced the use of the application in their actual day-to-day environments. The task for the user was to register on the calendar at least one event that would actually happen during the week, and to register a keyword corresponding to the name of this event in the GREatMute application. A period of one week was selected because it was considered sufficient time for real events to happen that could be monitored.

For the testing of GREatMute, it was important to recruit people who knew how to use Google Calendar. Users who had previously used Google Calendar to schedule events would not have to learn how to use it in addition to GREatMute.

Only eight of the twelve invited users participated in the GREatMute evaluation. Table 4 presents the results.

Table 4. Results from the GREatMute evaluation

The application needs to improve in relation to the capability of interacting at the right moment. GREatMute could take into account two additional pieces of context information: the user’s current location and the location of the event that they registered on the calendar. Thus, it is possible to know whether the user is at the registered event and change the mobile profile more reliably. If the user is not within a given radius of the registered venue, the application does not need to put the phone in silent mode.

4.3 GREatTour

GREatTour is a mobile guide for visiting the GREat Lab. It provides information about the environments of the laboratory that the user is visiting. The application works as follows: the user must scan a QR code found on the door of the environment to update the user’s location. Then, a map of the lab is displayed, highlighting the environment where the user is. The user can then view media options related to this environment (texts, photos and videos). However, the rendering of this media depends on the battery level of the device: when the battery is low (0–9 %), only text is displayed; when it is medium (10–20 %), text and images are displayed; and when it is high (21–100 %), text, images and videos are displayed.
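
The battery-dependent adaptation follows directly from the three ranges given above. The sketch below encodes them, assuming the battery level is available as an integer percentage. The function name and the media labels are hypothetical.

```python
from typing import Tuple


def media_for_battery(level_percent: int) -> Tuple[str, ...]:
    """Return the media types to render for a given battery level (0-100)."""
    if not 0 <= level_percent <= 100:
        raise ValueError("battery level must be between 0 and 100")
    if level_percent <= 9:       # low: 0-9 %
        return ("text",)
    if level_percent <= 20:      # medium: 10-20 %
        return ("text", "images")
    return ("text", "images", "videos")  # high: 21-100 %
```

The boundary values 9, 10, 20 and 21 are the cases that exercise every transition between the three application states.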

As with GREatPrint, the users were asked to use the application in the GREat research lab, since this application is targeted at this environment. An evaluator was also present during the usage. In this case study, only six of the twelve users actually used the application, because the other users were unavailable. Each user made three visits (tours) to the GREat laboratory, with different battery levels in order to test all application states. Each visit consisted of seeing three environments. In each environment, users updated the map and accessed all available media, according to the battery level of the phone. Table 5 presents the results.

Table 5. Results from the GREatTour evaluation

GREatTour provides a good degree of context-awareness correctness. However, it must be improved to better address adaptation degree, mobility, availability, context-awareness timing, focus changing and proactivity. The following changes could improve GREatTour: (i) allow mobility between different devices (different phones and tablets) without loss of information on the new device’s screen, while maintaining the previous tour information; and (ii) use mobile sensors (e.g., magnetometer, accelerometer and Wi-Fi) to detect the user location more transparently, without the need for user intervention (QR code scans).

5 Conclusion and Future Work

By using the GQM method, it was possible to develop a model to evaluate calmness in ubiquitous applications. This model is composed of software measures that were applied to three ubiquitous applications developed for mobile devices: GREatPrint, GREatMute and GREatTour. The measures collected in these case studies indicate that these applications still need to improve to reach good levels of calmness. This suggests that the proposed model was able to assess calmness.

However, due to the variety of application fields within ubiquitous computing, each application may require different test procedures. For instance, GREatMute required users to use the application for a week on their own mobile phones, after which the subjective measures were collected. In contrast, GREatPrint and GREatTour were used inside the GREat Lab, where an evaluator could observe the use. Even so, each of them required different scenario planning, because the context is not the same between these applications.

Therefore, the proposed model does not eliminate the need for a plan describing how the evaluation must be conducted for different ubiquitous applications. As future work, we intend to create a methodology capable of systematically guiding evaluators in assessing calmness, and to investigate the influence of calmness on other HCI characteristics, for example, usability.