1 Introduction

Smartwatches have emerged in the last few years as a viable commercial product. Market analyses repeatedly forecast growth, yet the predicted sales numbers are rarely met by actual sales, and wearable sales are largely attributable to activity trackers rather than actual smartwatches. With smartwatch market growth slowing down [3, 60], there is concern that future consumer adoption may be diminished. It is not yet clear how smartwatches are appropriated for long-term use, and the current literature on the long-term usage of smartwatches specifically is limited. Current smartwatch users can be categorised as early adopters [49], who explore the capabilities of new technologies. The answer as to why large-scale adoption is yet to be reached may lie in observing how current users leverage their smartwatches, and how they explore and adopt different functionalities.

Clawson et al. [11] report on the frequent abandonment of wearable technologies, with the most cited reason being a mismatch between consumers’ expectations of the technology and what the technology actually offers. Similar reasons concerning consumer expectations have limited the growth of other smart technologies, e.g. smartglasses [29], and also affected the early development of smartphones [33]. The similarities between smartphones and smartwatches influence how smartwatches are perceived by users in terms of usefulness and customer expectations [53], and long-term use will diminish if the device’s usefulness is not perceived.

Our extensive dataset consists of long-term smartwatch usage data from 79 users over a period of 14 months, from December 2015 to March 2017. We aim to understand whether self-generated usage patterns exist for distinct usage behaviours during long-term use, and how users adapt these usage behaviours over time. This paper offers a novel data-driven quantitative approach for exploring how smartwatch usage behaviour organically (i.e. without outside interference) fluctuates and adapts over time.

The analysis methods presented in this paper are exploratory, and this type of longitudinal, data-driven analysis of usage behaviour based on mobile sensor data has not previously been conducted on this scale. This is partly due to the lack of sufficient data collection infrastructure, e.g. when smartphones were becoming prominent (the early to mid-2000s), and partly due to the lack of a technology at a suitable phase of its development and adoption. We describe our analysis pipeline in detail. Our analysis offers an understanding of which user-selected device usage characteristics (e.g. recharging habits, notification management, frequency of use) organically evolve over time to more strongly support potential specific types of smartwatch use.

2 Related work

Modern smartwatches have mostly overcome previous technical limitations [44], which have been said to be the main cause of their lack of popularity. The key to user acceptance of smartwatches is result demonstrability [28, 57], i.e. indicating tangible end-user benefits of such devices. Typical beneficial characteristics of smartwatches are increased user availability via increased device proximity [5, 51], quicker and more convenient responses to prompts such as notifications [61] and reduced social tension which can be caused by frequent smartphone use [32].

2.1 Consumer adoption

By analysing consumer adoption, one can identify key factors that influence the adoption of upcoming and current technologies [56]. The emergence of new technologies to consumer markets follows the diffusion of innovations [49], which distinguishes early and late adopters amongst other specified groups. Ram et al. [39] claim to develop a marketing strategy that overcomes barriers such as usage, value and risk, and psychological barriers such as tradition and image as key factors in creating innovation. The adoption of smartphones, now ubiquitous devices in our connected world, was not instantaneous and went through similar stages of consumer adoption. How smartphone use evolved over time and was appropriated is investigated in works such as the study of acceptance and innovations regarding smartphone use by Park et al. [33] and more recent investigations of smartphone usage habits [20, 31].

One can draw some guidance from the literature on wearable devices, which suggests that users who manage to derive long-term value from their wearable gadgets habitually and proactively use their devices, creating the usefulness from their own device usage behaviour [16]. As such, user innovativeness directly impacts the usefulness of a smartwatch [18]. We can create a generic picture of device use [6] from embedded sensors [55] or device logs [50], and different types of users and user characteristics can be discovered from such information [66].

Wu et al. [62] explored consumer perspectives and note that attitudes towards the new technology and demand for enjoyment influence acceptance and that external factors such as ease of use, price, aesthetics and expectations impact technological adoption. However, it has been frequently reported that experienced usefulness of a product has the highest long-term impact [10]. Furthermore, Hong et al. [18] report that the impact of utilitarian satisfaction (usefulness) is higher than the hedonic satisfaction (enjoyment) on consumers. This can result in abandonment in case of lacking usefulness, once the novelty effect wears off, and if users do not successfully adapt their usage style to match the technology and its capabilities.

2.2 Mobile sensing

Previous studies have investigated smartphone and mobile device use using sensor information [14, 55]. Falaki et al. [13] reveal distinguishable differences between users from observing their interactions and used applications, network and battery. Banerjee et al. [2] study and theorise how users interact with the mobile device batteries and showcase how by adapting to the user’s battery interactions the battery life can be prolonged. Min et al. [27] study similar user-battery interactions on smartwatches and report the same theorems applying to smartwatches, although the interval between charging events is generally longer for smartwatches. Both the work of Min et al. [27] and Visuri et al. [58] reveal findings on how users interact with their smartwatches in comparison to smartphones; smartwatches are generally interacted with more frequently and for shorter durations.

Clustering methods can be applied to further the understanding of user behaviour. Xu et al. [64] reveal how users can be grouped, and these groups identified, according to the users’ web browser history. Similarly, Rawassizadeh et al. [43] used pattern identification algorithms to identify smartphone users. Zhao et al. [66] showcase how smartphone user groups can be identified by clustering users based on their used applications. Lastly, Meyer et al. [26] identify different types of usage patterns—periodic changes, and changes over time—of activity tracker users, according to their daily (in)activity.

2.3 Usage behaviour and understanding smartwatch use

The majority of smartwatch research focuses on hardware and software improvements [4, 38, 65]. In terms of device use, communication applications and their associated notifications drive most of our smartphone use, while the smartwatch is seen as a timekeeping device [36]. It is often assumed that a smartwatch is an extension of a smartphone: ‘It’s kind of like an extra screen for my phone’ [52]. Previous work introduces Duet [9], an interactive system that explores the interconnectivity of these two devices and enables users to perform multi-device gestures. Similarly, work presented in [8, 61] reveals how users prefer to receive information and use different devices in multi-device environments.

The use of smartphones elicits specific behaviours, e.g. in using and revisiting specific applications [20], and how such behaviour transforms, prolongs and facilitates long-term device usage [31]. Karapanos et al. [21] performed a longitudinal investigation of smartphone user experiences and highlighted that frequently occurring habits become tied to aspects of device use that are meaningful in one’s life. In [26], Meyer et al. revealed that long-term use of activity trackers (a technology similar to smartwatches due to its wearable nature) does not necessarily involve continuous use, and that the periodicity of use and usage breaks should also be accounted for.

Visuri et al. [58] quantify smartwatch use over a period of 6 months and reveal our smartwatch use is heavily impacted by arriving notifications. Cecchinato et al. [8] perform a qualitative analysis on users’ usage preferences of smartwatches, and the capabilities of smartwatches—namely timekeeping and receiving short pieces of information via notification. Both works showcase similarities in what device capabilities users prefer and how users interact with their devices. Gouveia et al. [17] analysed the use of activity trackers and noted how their use is heavily driven by glances. However, comparing the use of activity trackers to smartwatches directly is not appropriate, since smartwatches have vastly higher usage capabilities, i.e. more options and applications. McMillan et al. [25] investigate smartwatch use according to surrounding context with the use of video recordings and highlight how location, time and surroundings (e.g. the presence of others) impact usage in terms of session duration and frequency. Lastly, Min et al. [27] sample smartwatch users, both via interviews and mobile sensing, and their work highlights (1) the transformation from being interested in timekeeping capabilities of the smartwatch (for novel users) to preferring notification capabilities (for experienced users) and (2) recharging and battery associated tendencies of the users, namely the increased recharging frequencies on smartwatches.

3 Dataset overview

Our initial dataset consists of data collected from a total of 81 users over a period of 4–14 months. Two participants are later omitted due to inconsistencies that emerge during our data processing. The dataset is collected in-the-wild using the ‘Insight4Wear’ application [45], freely available for users to download in the Google Play application store. After installing, the user is prompted to accept the end-user license agreement to allow the application to store the information locally and upload the data to a secure server every 3 h. The data collection process has been approved by an IRB. The participants are anonymous, and no personal identifiers or contact information is retrieved. We use Android’s unique device ID to distinguish between devices. As such, we are unable to contact the users or make further inquiries about their device usage habits. The dataset was collected between December 2015 and March 2017. The majority of the users have Android 6.0 (48.88%) or higher (42.98%) installed on their devices. According to the Google Play application store installation statistics, 32.8% of installations are from the USA, 12.5% from the UK, 13% from Central Europe, 8% from Asia, 3.9% from Canada and 2.3% from Australia. The anonymity of our participants and the lack of knowledge about their intended use and purpose for obtaining and using smartwatches do signal potential biases within the dataset. This is a necessary evil, however, as obtaining data concerning long-term usage differences that is not influenced by being part of an experiment is highly valuable in itself. Considering that the end outcome of our analysis methods is a high-level conceptual understanding of usage style changes, and considering the size of the collected dataset, we are confident that these personal biases do not significantly impact the results of our analysis. The device base in this dataset also comprises earlier smartwatch models; thus, the more recent models with newer designs and their users are not part of this analysis.

3.1 Method

We deploy an exploratory analysis pipeline consisting of four steps, as illustrated in the overview in Fig. 1. The next chapters describe each of the four steps in detail and end with a summary of the end product of each step.

Fig. 1 Overview of the dataset processing and analysis pipeline steps (1–4)

3.2 Step 1: extracting usage characteristics

With the collected data of notifications, screen events and battery information, we define a total of 98 different types of usage characteristics (i.e. features), of which 72 (3 × 24) are three different types of characteristics logged hourly (e.g. battery level per hour). For each day of use and each individual user, we average the daily data into the 98 individual daily usage characteristics. The characteristics of notifications are adopted from previous research on smartwatch notifications [58] and smartphone notifications [50]. Usage session characteristics are based on previous work on smartwatches [27, 58]. Battery-related characteristics are derived from work by Min et al. on smartwatch users’ battery usage habits [27], the theorems concerning energy management provided in [2] and users’ reactions to low battery levels, as explored in [19]. The daily usage patterns of the 30-day windows are based on work by Meyer et al. [26], and device use is further analysed in terms of the SIRB (short, isolated, reward-based) categorisation first introduced in [31]. All analysed variables are detailed in Table 1. The work of Meyer et al. [26] also serves as the motivation for using 30-day time windows.

Table 1 List of features extracted from the raw dataset of notification, screen and battery data. Number in parenthesis indicates the number of distinct features of the same type

3.2.1 Notification-related characteristics

We categorise each notification into a general application category (e.g. media, health, maps and travel), based on each application’s Google Play Store category. Certain applications display an unusually high number of daily notifications; these applications tend to use notifications as their main user interface and constantly refresh them (e.g. navigation applications). We scan the data for applications that emit a high number of notifications (N > 100K) and automatically discard excess notifications from a total of four applications: a GPS status application, an internet bandwidth measurement application, an automatic Wi-Fi controller and the Android system UI. Furthermore, we calculate a daily downsampling threshold (948.43 daily notifications, aggregated mean and SD) for arriving notifications and discard notifications from any application that sends more notifications than the threshold during a day of logging, as these are also considered abnormally behaving applications.
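The two filtering rules described above can be illustrated with a minimal sketch. The record layout (a `(day, app)` pair per notification), the function name and the sample values are assumptions made for illustration only, and the sketch simply drops every notification from a listed application, which may be stricter than the "excess" filtering applied in our pipeline.

```python
from collections import Counter

def filter_notifications(notifications, noisy_apps, daily_threshold):
    """Drop notifications from listed noisy apps, then drop every (day, app)
    group whose daily count exceeds the threshold.

    notifications: list of (day, app) pairs (hypothetical record layout).
    """
    kept = [(day, app) for day, app in notifications if app not in noisy_apps]
    per_day_app = Counter(kept)  # notifications per application per day
    return [(day, app) for day, app in kept
            if per_day_app[(day, app)] <= daily_threshold]

# Made-up sample: "gps" is a listed noisy app, "nav" floods on day 1 only.
sample = [(1, "chat")] * 3 + [(1, "gps")] * 5 + [(1, "nav")] * 4 + [(2, "nav")]
filtered = filter_notifications(sample, noisy_apps={"gps"}, daily_threshold=3)
```

In this toy example the threshold is 3; in our data the corresponding value is 948.43 daily notifications.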

We measure the total number of notifications that arrive on the smartwatch per day, the ratio of notifications from different general application categories over each day, the ratio of how many of the arriving notifications are promptly observed (user turns the screen on within 60 s of a notification arriving [58]), and whether the device use was initiated by a notification from an application that can be deemed as reward-based (application category in communication, news, social or entertainment), according to the SIRB categorisation [31].

3.2.2 Screen-related characteristics

We measure different types of standardised session characteristics: the total number of usage sessions per day, the distribution of these sessions across the hours of the day, the mean duration of daily sessions and the mean duration between usage sessions. We also analyse the source of the usage session, by specifying each session as notification-initiated (at least one notification arrived within 60 s prior to the usage session) or user-initiated. We also measure the ratio of the daily sessions that are considered glance sessions, i.e. sessions that last exactly 5 s, indicating no user interaction with the device during the session and the screen automatically turning off after 5 s. Finally, using the SIRB categorisation [31], we calculate a daily ratio of short sessions (duration less than 30 s) and isolated sessions (minimum of 10 min between sessions).
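As a rough illustration of these session ratios, the following sketch computes them for a single day. The input layout (sessions as `(start, duration)` pairs in seconds) and all names are hypothetical; treating a session as isolated when at least 10 min have passed since the previous session ended, and counting the first session of the day as isolated, are assumptions of this sketch.

```python
def session_ratios(sessions, notification_times):
    """Daily ratios of glance, short, isolated and notification-initiated sessions.

    sessions: list of (start, duration) in seconds, sorted by start time.
    notification_times: arrival times of notifications in seconds.
    """
    n = len(sessions)
    if n == 0:
        return {"glance": 0.0, "short": 0.0, "isolated": 0.0, "notification": 0.0}
    glance = sum(1 for _, dur in sessions if dur == 5)   # screen timed out untouched
    short = sum(1 for _, dur in sessions if dur < 30)    # SIRB "short"
    isolated, prev_end = 0, None
    for start, dur in sessions:
        # SIRB "isolated": at least 10 min since the previous session ended
        if prev_end is None or start - prev_end >= 600:
            isolated += 1
        prev_end = start + dur
    # notification-initiated: a notification arrived at most 60 s before the session
    notified = sum(1 for start, _ in sessions
                   if any(0 <= start - t <= 60 for t in notification_times))
    return {"glance": glance / n, "short": short / n,
            "isolated": isolated / n, "notification": notified / n}

# Made-up day with four sessions and one notification.
ratios = session_ratios([(0, 5), (100, 20), (1000, 45), (2000, 5)], [950])
```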

3.2.3 Battery-related characteristics

For battery events, we measure the starting battery level of each charging event and the interval between two subsequent charging events. Previous work has shown that users put a higher value on the remaining battery percentages the lower their current battery level is [19]; we flag each day where the battery level dropped below low (20%), and similarly if the battery level dropped below critical (10%). Additionally, we measure the hourly frequency of charging events and hourly battery drain rate for each logged day.
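The battery flags and charging intervals can be sketched as follows. The function names, the sampled-percentage input and the use of hours as the time unit are illustrative assumptions rather than details of our logging format.

```python
def battery_day_flags(levels, low=20, critical=10):
    """Flag a day whose battery level dropped below the low / critical marks.

    levels: battery percentages sampled during one day (hypothetical input).
    """
    lowest = min(levels)
    return {"low": lowest < low, "critical": lowest < critical}

def charging_intervals(charge_starts):
    """Intervals between subsequent charging events.

    charge_starts: sorted start times of charging events, in hours.
    """
    return [later - earlier
            for earlier, later in zip(charge_starts, charge_starts[1:])]

# Made-up values for one day and three charging events.
flags = battery_day_flags([80, 45, 18, 60])
intervals = charging_intervals([8, 30, 55])
```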

3.2.4 Daily usage characteristics

Each day with a marked usage session is considered a day when the user actively used the smartwatch, and when performing our analysis, we aggregate the data in 30-day time windows. For each of these 30-day windows (for each user), we then measure the longest usage streak, the longest break in smartwatch usage, and the longest usage phase—continuous use with breaks of less than a week in between usage streaks—all three according to [26].
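The three window-level measures can be sketched for a single 30-day window as below. The input (a set of active day indices) is a hypothetical representation, and measuring a phase from its first to its last active day, with breaks shorter than a week included in its length, is our reading of the definition in [26] rather than a confirmed detail.

```python
def window_usage_stats(active_days, window=30, max_gap=7):
    """Return (longest streak, longest break, longest phase) in days.

    active_days: set of day indices in 0..window-1 with at least one session.
    A phase continues across breaks shorter than max_gap days.
    """
    longest_streak = longest_break = longest_phase = 0
    streak = gap = 0
    phase_start = last_active = None
    for day in range(window):
        if day in active_days:
            streak += 1
            gap = 0
            # start a new phase after a break of max_gap days or more
            if phase_start is None or day - last_active - 1 >= max_gap:
                phase_start = day
            last_active = day
            longest_phase = max(longest_phase, day - phase_start + 1)
        else:
            gap += 1
            streak = 0
        longest_streak = max(longest_streak, streak)
        longest_break = max(longest_break, gap)
    return longest_streak, longest_break, longest_phase

# Made-up window: use on days 0-2, 5-6 and 20-23.
stats = window_usage_stats({0, 1, 2, 5, 6, 20, 21, 22, 23})
```

Here the two-day break between days 2 and 5 does not end the phase, whereas the 13-day break before day 20 does.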

3.2.5 Post-processing and standardisation

This method initially creates 593K daily usage characteristics, which are then aggregated to 468 30-day usage characteristics, which are further analysed in the next steps. The last step of the pre-processing is to mitigate any overfitting of individual features in the following processing and analysis steps. We negate the impact of overfitting by normalising all features individually; i.e. the 30-day summaries are normalised for each feature column in an A × B formatted matrix, where rows A denote entries for a user in a specified 30-day time window, and columns B denote either identifiable information (such as DEVICE_ID or timestamp) or distinct features. We use z-score normalisation since it retains outlier behaviour within the provided dataset, which is important in our next processing step.
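A minimal sketch of the column-wise z-score normalisation follows. It assumes the identifier columns have already been stripped so that each row holds only numeric features; using the population standard deviation and mapping constant columns to zero are choices of this sketch, not confirmed details of our implementation.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Z-score normalise each feature column independently.

    rows: list of equal-length numeric feature lists, one per user and window.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population SD (assumption)
    return [[(value - m) / s if s > 0 else 0.0
             for value, m, s in zip(row, means, stds)]
            for row in rows]

# Made-up matrix: two features on very different scales.
z = zscore_columns([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
```

After normalisation both columns carry the same values, so neither feature dominates a distance computation purely because of its scale, while samples far from the column mean remain far from it.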

Step 1: Key takeaways

In the first step of our data-driven approach, we have aggregated data points collected from 81 individual Android smartwatches’ notification, screen and battery sensors, to 468 unique 30-day usage characteristics that highlight the differences in individuals’ smartwatch usage behaviour.

3.3 Step 2: clustering

In step 2, we generate distinct usage behaviour types using the k-means clustering method, with the aggregated 30-day usage characteristics as features. Each group (usage behaviour) has its own identifiable usage characteristics, which are averaged from the users within that group and function as the centroid for the cluster.

3.3.1 Grouping usage segments via clustering

Clustering is commonly used as a way to generate knowledge of users and combine users into groups instead of relying on analysing individual users [64]. The choice of a clustering method must be carefully considered in order to capture the required types of characteristics, and to generate meaningful user groups. In our preliminary analysis, we explored the use of the MeanShift algorithm, or a combination of MeanShift and k-means, similar to the method in [66], and concluded that the clusters generated by MeanShift are often one-dimensional; clusters are based on extreme qualities, without properly attending to a combination of different characteristics. The initial approaches created clusters that combined all users that shared one extreme trait (e.g. high number of notifications from media applications) into one cluster, ignoring the other traits of these users. We wanted to create more comprehensive user groups, which consider the combined dimensions. Thus, from the two considered algorithms or their combination, we deployed the k-means approach.

We test various k-values (number of user groups) between 5 and 100 and benchmark the results based on the Dunn index (D) [24] and Shannon’s entropy (E) [34]. The configurations where the majority of users are in a single cluster, or where the within-cluster consistency is low, are penalised. The two factors have equal weight (0.5 for both) for the resulting benchmark score. The combination of these two factors ensures that the selected cluster configuration is compact and balanced, and the clusters are clearly distinct from one another. Using the parameter sensitivity analysis, we find that the optimal cluster configuration is iteration one with k = 33 (D = 7.15 × 10⁻⁵, E = 3.06).
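The benchmark can be sketched as below. The Dunn index and Shannon's entropy follow their standard definitions, but the way the two are brought onto a common scale before the equal weighting (dividing each by its maximum over all candidate k) and the use of the natural logarithm are assumptions of this sketch, as are all function names and sample values.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    clusters: list of clusters, each a list of points (tuples of floats).
    """
    min_between = min(math.dist(p, q)
                      for a, b in combinations(clusters, 2)
                      for p in a for q in b)
    max_within = max((math.dist(p, q)
                      for c in clusters for p, q in combinations(c, 2)),
                     default=0.0)
    return min_between / max_within if max_within > 0 else math.inf

def shannon_entropy(sizes):
    """Entropy of the cluster size distribution; higher means more balanced."""
    total = sum(sizes)
    return -sum((s / total) * math.log(s / total) for s in sizes if s > 0)

def best_k(candidates):
    """Pick k with the highest equally weighted score.

    candidates: {k: (dunn, entropy)}; both factors are rescaled by their
    maximum over all candidates before weighting (assumption).
    """
    d_max = max(d for d, _ in candidates.values())
    e_max = max(e for _, e in candidates.values())
    return max(candidates,
               key=lambda k: 0.5 * candidates[k][0] / d_max
                             + 0.5 * candidates[k][1] / e_max)

# Made-up toy data: two tight, well-separated clusters of equal size.
toy = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
d = dunn_index(toy)
e = shannon_entropy([len(c) for c in toy])
chosen = best_k({5: (1.0, 1.0), 10: (0.8, 2.0), 20: (0.2, 2.2)})
```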

3.3.2 Outlier detection and removal

The k-means clustering method has two drawbacks: (a) the actual number of distinct groups is unknown, so researchers are required to rely on arbitrary k-values and use methods like our benchmarking approach to validate their choice of k; and (b) samples can be placed into a cluster merely because it is the least bad choice. This often leads to situations where clusters are populated by potential outliers. The fit of each sample to a cluster can be measured using the Euclidean distance. We first check for outliers within the existing clusters using a standard IQR × 1.5 measurement and detect 24 (out of 468) outlier behaviours. These outliers are discarded, and the discarded behaviours eliminate two of the users completely from our initial dataset of 81, bringing it down to 79 users. Apparently, these two users were highly inconsistent in their usage styles during their smartwatch use. Of the 30-day usage behaviours, 56.7% fall within the overall mean distance to a cluster centroid (M = 12.43, where the unit is the summed Euclidean distance over all 98 normalised features).
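The IQR-based check can be illustrated on a list of sample-to-centroid distances. Applying only the upper fence (Q3 + 1.5 × IQR), since only unusually distant samples are of interest, and the quartile method used by Python's `statistics.quantiles` are assumptions of this sketch; the distances are made up.

```python
from statistics import quantiles

def iqr_outliers(distances, factor=1.5):
    """Indices of samples whose centroid distance exceeds Q3 + factor * IQR."""
    q1, _, q3 = quantiles(distances, n=4)  # quartiles, default "exclusive" method
    upper_fence = q3 + factor * (q3 - q1)
    return [i for i, dist in enumerate(distances) if dist > upper_fence]

# Made-up distances within one cluster; the last sample fits poorly.
outliers = iqr_outliers([10, 11, 12, 12, 13, 14, 40])
```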

We further evaluate the remaining daily behaviours (each daily entry within the 30-day time window is assigned to the cluster according to the aggregated 30-day sample) and calculate the ratio of these daily samples’ fit to each assigned cluster. 87.4% of the daily usage characteristics fall within one standard deviation (M + 1 × SD) inside the specified cluster, and 96.2% of all usage behaviours are within three standard deviations (M + 3 × SD). This evaluation indicates that the created clusters and the behaviours within these clusters (accounting also for variance during individual days within the 30-day time period) are compact and the within-cluster behaviour is consistent over each cluster’s population.

Step 2: Key takeaways

We have created 33 unique usage behaviour types using a k-means clustering approach, each averaged from samples over a 30-day time period.

The daily behaviour variance within the 30-day segments is low enough to warrant generalising the behaviour within the specified time period.

3.4 Step 3: separation

The goal of our analysis is to uncover usage behaviours that either (1) become more prominent over time or (2) are actively discarded over time. Closely examining the attributes of each of these 33 behaviours individually is ineffective at this scale, as there are simply too many details to uncover. Instead, we analyse how users transition into and out of these behavioural clusters over time.

First, for each behaviour, we can calculate whether it is becoming more popular or less popular over time. To do this, we look at the number of unique users that exhibit that behaviour in their 1st 30-day period, their 2nd 30-day period, etc. Intuitively, a cluster that becomes more popular over time will have an increasing number of users as the months progress, and we use Pearson’s correlation to measure this trend.
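The popularity trend of a single behaviour can be sketched by correlating the period index with the number of unique users exhibiting that behaviour in each period. The counts below are made up for illustration; a positive coefficient corresponds to a behaviour that gains users over time.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

periods = [1, 2, 3, 4, 5]            # index of the 30-day period
users_in_cluster = [2, 3, 5, 6, 9]   # made-up unique-user counts per period
r = pearson_r(periods, users_in_cluster)
```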

For each user, we can reconstruct their behavioural transitions over time. For example, if a user’s first 30-day segment is classified as usage behaviour 17 (B17), their second 30-day segment as behaviour 9 (B9) and their third 30-day segment as behaviour 11 (B11), we use the notation B17 → B9 → B11 to denote how that user’s behaviour varied over time. The results show that 26 behaviours (out of 33) occur at least twice, and we classify those into:

  • Exploratory behaviours: 10 behaviours decreased in popularity over time.

  • Accepted behaviours: 16 behaviours became increasingly popular over time.

Due to repeated tests for correlation, we use p < 0.01 as a threshold for statistical significance and reveal one accepted behaviour (B17) to be statistically significantly increasing over time. As an additional measurement for the exploratory nature (or lack thereof) of behaviours, we measure the probability of transitioning away from a behaviour. Using a t test to cross-compare the probability of transition between accepted and exploratory behaviours, we can verify that the accepted behaviours are less likely (48.2% < 68.8%, p < 0.001, Shapiro-Wilk for normality) to lead to a behaviour transition. This verifies that the categorisation of the two behaviour types is valid.

On average, users in our dataset exhibit three (M = 3.4) different behaviours (SD = 1.23, min = 2, max = 6), and users rarely (89 out of 444 transitions, 20.1%) return to prior behaviours over time. Typically, if the user returns to a prior behaviour, the behaviour is the penultimate one; i.e. following our notation, the transitions where users return to prior behaviour typically look as follows: ABCCBB. A large portion of transitions (56.7%) involve no changes in usage behaviours; i.e. the user maintains the same usage style in subsequent periods (CC), and usage styles tend to stabilise later during longitudinal use, rather than in the beginning. For example, the transitions for one specific participant (P71) are as follows:

$$ B14\to B17\to B21\to B21\to B21\to B21\to B32\to B32\to B32\to B32\to B32 $$

In the previous example, the user had logged data for approx. 330 days and employed the same behaviour (B32) for five 30-day segments. Another participant (P59) employs the following transitions, with accepted behaviours highlighted in bold. This participant’s choices highlight transitions between accepted behaviours (B10, B21), which tend to last longer, and exploratory behaviours (B13, B14). A more thorough investigation would be necessary to uncover transition details and the underlying reasons for these changes, and is out of scope for this work due to the volume of our dataset.

$$ \boldsymbol{B}\mathbf{10}\to \boldsymbol{B}\mathbf{10}\to \boldsymbol{B}\mathbf{10}\to \boldsymbol{B}\mathbf{10}\to \boldsymbol{B}\mathbf{10}\to B13\to B13\to B14\to \boldsymbol{B}\mathbf{21}\to \boldsymbol{B}\mathbf{21}\to B31 $$
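Transition sequences such as the two examples above can be summarised with a short sketch. The function and field names are hypothetical; every pair of consecutive 30-day segments is counted as a transition, including those where the behaviour does not change, and a return is any change into a behaviour the user has exhibited before.

```python
def transition_summary(sequence):
    """Summarise one user's behaviour sequence over consecutive 30-day segments."""
    stays = changes = returns = 0
    seen = set()
    for previous, current in zip(sequence, sequence[1:]):
        seen.add(previous)
        if current == previous:
            stays += 1        # same usage style maintained (e.g. CC)
        else:
            changes += 1
            if current in seen:
                returns += 1  # user goes back to an earlier behaviour
    return {"transitions": max(len(sequence) - 1, 0), "stays": stays,
            "changes": changes, "returns": returns,
            "distinct": len(set(sequence))}

# The sequence of participant P71, followed by the generic ABCCBB pattern.
p71 = transition_summary(["B14", "B17"] + ["B21"] * 4 + ["B32"] * 5)
abccbb = transition_summary(list("ABCCBB"))
```

For P71 this yields ten transitions, seven of which keep the same behaviour, and no returns; the ABCCBB pattern contains a single return to the penultimate behaviour.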

Figure 2 illustrates the transitions between usage behaviours within the complete dataset. We can see how the transition trend progresses from earlier behaviours (light cyan) to later ones (darker), with participants infrequently returning to earlier (lighter) usage styles. Users also frequently maintain a usage style over multiple 30-day periods (indicated by the same colour replicated when proceeding upwards on the y-axis). There is no significant association between an individual usage behaviour and whether that behaviour consistently leads to a transition (p = 0.24, χ² = 25.051 using a chi-squared test), which indicates that regardless of the behaviour style, some users tend to occasionally remain within those behaviours for longer than 30-day periods.

Fig. 2 Visualisation of behaviour transitions for each user. The first observed behaviour is colour-coded with light cyan, and each new subsequent behaviour is coded with an increasingly darker colour

3.4.1 Limitations

A limitation exists in the collected dataset, as a user can install the logging application at any point during their device use. The starting point (day 0) in the dataset is not necessarily the same moment in each user’s device usage; i.e. one user might install the logging application a week after obtaining a smartwatch, while another user might only install the application after already using the smartwatch for 3 months. However, our analysis aims to understand the change in usage behaviour, so simply having access to the transitions in behaviours is sufficient.

Another factor that can be considered a limitation is the sheer number of identified usage behaviours, as this leads to difficulties in analysing individual traits of said behaviours. With 33 distinct behaviours (each comprising 72 features), and each behaviour consisting of a distinct combination of ranges of values in a multitude of these features as a means of categorisation, analysing the characteristics of these behaviours becomes infeasible. Such detailed analysis could be an important goal of future work extending our approach. The next chapter takes a high-level approach to understanding similarities in the two behavioural categories but does not take into consideration differences between individual behaviours.

Step 3: Key takeaways

We have identified two distinct types of usage behaviours, exploratory and accepted behaviours, and initially analysed how users transition between these behaviours.

Users tend to explore new usage behaviours (new ways to use their technology), and if they fall back to previously used behaviours, they tend to favour the most recent behaviour (one prior to change).

3.5 Step 4: analysis and results

Next, we analyse how the exploratory and accepted behaviours vary in terms of the classification features. Each 30-day usage behaviour contains the normalised characteristics of each user within that behaviour (i.e. cluster), so we use a t test to compare the values on a feature-to-feature basis. In case of usage ratios that are not normally distributed (using Shapiro-Wilk to verify normality), we use the Kolmogorov-Smirnov to assess similarities in the distribution. When cross-comparing exploratory behaviours to the accepted behaviours, we can observe statistically significant changes in 24 classification features at the 0.05 level of significance and eleven features at the 0.01 level. For presenting the cross-comparison results, we refer to our non-normalised dataset (as the features were cross-compared independently of others, thus no need for normalisation) for better transparency of the results. The findings can be summarised into three general categories:

  1. Notification-related use: Accepted behaviours elicit fewer daily notifications (M = −64.8, p < 0.01), fewer promptly observed notifications (difference of −0.30 in the ratio, p < 0.05), more notifications from the ‘Other’ category (+0.04, p < 0.01) and fewer from the ‘Games’ category (−0.001, p < 0.05). Lastly, the notification-initiated sessions are less likely for accepted behaviours (difference of −0.03 in the ratio, p < 0.01), albeit by a small margin and with more variance within the exploratory behaviours. The differences are illustrated in Fig. 3.

Fig. 3 Differences in notification-related usage characteristics between accepted and exploratory behaviours

  2. Battery-related use: There are noticeable differences in both diurnal charging and draining behaviours between the accepted and exploratory behaviours. The accepted behaviours generally elicit a higher charging likelihood (+0.06, p < 0.05) during evening hours (21–23), and a higher hourly drain rate (ranging between +0.4 and +1.4, p < 0.05) during night-time (0–2) and afternoon (14–17). The differences are illustrated in Fig. 4.

Fig. 4 Differences in battery charging and battery drain features between the accepted and exploratory behaviours

  3. Frequency of use: Accepted behaviours show fewer daily usage sessions (−25.6, p < 0.01), a lower frequency of use (−0.01, p < 0.05) during morning hours (7–9), and shorter usage streaks (−2.7 days, p < 0.01) and usage phases (−3.7 days, p < 0.01). (See Fig. 5 for details.)
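The feature-by-feature comparison behind these findings can be sketched with the two test statistics involved. The sketch uses Welch's unequal-variance form of the t statistic, which is an assumption since the exact t test variant is not specified above, and it omits the p-values, which would normally come from a statistics library. The sample values are made up.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (assumption)."""
    var_a, var_b = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / math.sqrt(var_a + var_b)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical cumulative distributions of the two samples."""
    points = sorted(set(a) | set(b))
    return max(abs(sum(x <= p for x in a) / len(a)
                   - sum(x <= p for x in b) / len(b))
               for p in points)

# Made-up values of one feature in accepted and exploratory behaviours.
accepted = [1, 2, 3, 4]
exploratory = [5, 6, 7, 8]
t = welch_t(accepted, exploratory)
ks = ks_statistic(accepted, exploratory)
```

A negative t statistic indicates a lower mean for the accepted behaviours, matching the sign convention of the differences reported above.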

Step 4: Key takeaways

After separating elicited usage behaviours into accepted and exploratory, we reveal 24 key usage differences between the behaviours.

The accepted behaviours tend to be affiliated with less active use, showcasing fewer daily usage sessions, fewer notifications and shorter longitudinal usage streaks and phases.

In general, the accepted smartwatch usage behaviour tends to be more passive.

Fig. 5

Differences in usage frequency between the accepted and exploratory behaviours
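
The feature-wise comparison used in this section can be sketched in a few lines. The following is a minimal illustration, not the authors' analysis code: it computes only the two test statistics, using the Python standard library. The function names are hypothetical, the Welch (unequal-variance) form of the t statistic is an assumption, and the `looks_normal` flag stands in for the Shapiro-Wilk check. In practice, a statistics library such as SciPy (`scipy.stats.shapiro`, `ttest_ind`, `ks_2samp`) would supply the normality test and the p-values.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    sorted_a, sorted_b = sorted(a), sorted(b)
    # The gap between the two step functions can only change at observed values.
    return max(
        abs(bisect_right(sorted_a, x) / len(sorted_a)
            - bisect_right(sorted_b, x) / len(sorted_b))
        for x in sorted_a + sorted_b
    )


def compare_feature(accepted, exploratory, looks_normal):
    """Choose the statistic for one feature; `looks_normal` is a hypothetical
    stand-in for the outcome of a Shapiro-Wilk normality check."""
    if looks_normal:
        return "t", welch_t(accepted, exploratory)
    return "ks", ks_statistic(accepted, exploratory)
```

A negative t statistic here means the accepted behaviours have the lower mean for that feature, which matches the sign convention of the differences reported above.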

4 Discussion

We set out to investigate via our dataset how smartwatch usage behaviours change over time. This offers us a reflection on how the user expectations match what the current smartwatches have to offer and how the two entities (i.e. the technology and the user) potentially meet in the middle due to the user’s learning or adaptation phase.

In previous work, it has been reported that users of wearable devices elicit phases of device use, accompanied by phases of device neglect [26]. It is understandable that a technology (smartwatch) that is designed to be in the role of a supporting device (to a smartphone) is not constantly required to be in use. However, it has also been shown that by adopting distinct usage behaviours, use of technologies can become pervasive [31] and that users self-employ distinct strategies (e.g. revisiting and using specific applications [20]) to create these behaviours. We revealed how in our dataset, comprising 79 individuals, the users exhibit a total of 33 distinct usage behaviour styles over a period of 14 months. However, individual users tend to showcase only a limited selection of such behaviours: most users used three distinct styles, even with data collection periods of up to 14 months. The users often remain in the same usage trend over long periods of time and rarely transition back to previously abandoned (or consciously unlearned) behaviours.

4.1 Adapting to support long-term smartwatch use

We use the annotation of accepted vs. exploratory behaviours to identify usage styles that the users prefer (the accepted) over those that often end up being abandoned (the exploratory). Notification, device use frequency and battery use related features are strongly associated with the accepted behaviours. Surprisingly often, the accepted long-term behaviours are associated with less frequent and more passive usage styles. These characteristics are the primary culprits for mismatches between customer expectations and actual device use. Based on our findings, the initial (heightened) customer expectations could be to blame for smartwatches’ lack of widespread adoption. Alternatively, the findings also relate strongly to the diffusion of innovations theory [49], which describes the transition from early adopters (or early explorers) to everyday majority use in a similar fashion.

The key findings of our work are specific usage styles becoming prominent over time, and the usage characteristics shared by these styles. The smartwatch is a technology that clearly supports certain styles of use, and users who start with expectations differing from this type of use (and do not subsequently abandon their devices) eventually adapt their use to match. Most significantly, users become increasingly likely to adapt their use of the notification capabilities of the smartwatch, a result also highlighted in recent works [8, 25, 36, 52]. Similarly, Min et al. [27] suggest that while inexperienced smartwatch users prefer the traditional watch capabilities of the device, experienced users prefer the notification capabilities.

4.2 Reflecting on previous assumptions of user preferences

From previous literature, we can identify four key assumptions on what smartwatch users perceive as important capabilities and how long-term users would adapt their device use. These preferences are often associated with overall device engagement [8, 26, 36, 49], the timekeeping capabilities of smartwatches [8, 36], notification mechanisms [8, 25, 36, 52, 58] and battery or recharging-related habits [27, 45]. Here, we validate these assumptions against our findings and assess how the design of smartwatches could be changed to reflect them.

4.2.1 Long-term usage elicits decline in activity

According to the diffusion of innovations [49] and the results presented in [26], users of wearable devices often exhibit recurring periods of activity and inactivity. Usage of new technologies, such as smartwatches, often begins with a novelty period. During initial use, engagement is heightened, and the level of engagement then reduces over time. This behaviour was previously observed amongst smartphone users, e.g. in work by Rawassizadeh et al. [46]. After the initial interest in the technology, users exhibit more periods of inactivity and overall have reduced engagement with their devices.

This suggests that the assumption holds: smartwatches start off as interesting pieces of technology that users then begin to neglect over time. As users become more experienced with their devices, they tend to come to the realisation that not all the device’s capabilities are of use to them. Clearly, the technology can still offer benefits, but the gained benefits can also be user-driven and self-generated, indicating learning and selective behaviour.

4.2.2 Notifications become less important

Device usage is often driven by arriving notifications [58], but this can often be experienced as a burden [8]. Users often opt to modify their device settings to reduce the number of incoming notifications, but at the same time learn to live with the increased availability. This can eventually increase the number of social interactions via notifications, and the number of notifications themselves.

The second assumption we consider is that users both manage their notification settings to fit their preference and adapt their use of communications applications due to increased availability, and the subsequent increased prompts and interruptions. This result is observed as the number of arriving notifications declines, and notification-initiated device use and the amount of promptly observed notifications significantly decrease. How effectively users observe the incoming notifications can also be a by-product of the more passive usage style overall; the user simply is not available as often. Additionally, the larger variance showcased with the exploratory behaviours indicates that smartwatch users attempt to be more available, but eventually see this as an unwanted usage behaviour style.

Thus, our results confirm the findings in the literature on notification management, however, not from the perspective of constant increased availability or an increase in arriving notifications. As with new notification delivery management methods on mobile devices, it is doubly important for developers to consider which notifications should be delivered to one’s smartwatch: the smartwatch does not necessarily have the means to interact with all incoming cues, nor does all information need to be delivered via the smartwatch. The new notification channel mechanism (Footnote 1) in Android O is a step in the right direction, as managing the channels is easier for the user than attempting to manage all individual applications, as long as we are still lacking a ubiquitous intelligent mechanism for notification management. From our dataset, it is as yet unclear whether current smartwatch users already actively filter out notifications from unwanted sources to warrant the difference of 64 fewer daily notifications, or if this result is merely due to user-opted decreased connectivity between the smartphone and smartwatch. Regardless, the user-driven choice to not be constantly available is evident.

4.2.3 More than a wristwatch

The timekeeping capabilities of the smartwatch can be associated with frequent usage sessions with short durations [58]. As users become more experienced they begin to explore more advanced capabilities of the smartwatch [27], and the short sessions associated with e.g. observing time will become relatively less frequent.

The third assumption from the previous literature, that experienced users explore more advanced capabilities of the smartwatch instead of settling for it being a glorified wristwatch, is partly validated. This topic is debatable, as the key metric for assessing timekeeping capabilities of the smartwatch is the ratio of glance sessions, which showed no statistically significant difference between the accepted and exploratory behaviours. However, the fact that user-initiated usage increased (a direct by-product of decreased notification-initiated usage) shows evidence of user-driven exploration of more complex applications. The way different users enable themselves to use more complex features on their smartwatches is nevertheless highly individual. Thus, from our findings we cannot offer comprehensive design guidelines for increasing user engagement, which remains an interesting challenge for future work.

4.2.4 Taking care of your smartwatch

Smartwatch battery life is often criticised and highlighted as the main reason for smartwatches’ lack of appeal [44]. However, the literature shows [27] that recent users often adapt their recharging behaviours to tackle this issue. New users often recharge their smartwatches at a specific time of day (e.g. during the night), but a better method for ensuring prolonged battery life, which is not as reliant on habits, can be taking advantage of available charging opportunities regardless of the current charge level and the time of day. The fourth assumption is that users will explore and learn new ways to circumvent the limitations of short smartwatch battery lifetimes. Novice users are often forgetful in their charging habits, and this can lead to increased battery stress [1, 15].

Our findings indicate this assumption to be true on two fronts. The accepted behaviours showcase increases in recharging frequency, as well as decreases in battery usage in general. A smartwatch battery can last for longer than 24 h, but experienced users are more likely to remember to charge their devices every night. With frequent micro-charges (shown as increased recharging events during the day, Fig. 4), the daily battery drain reduces the battery to 30–50% before the next nightly recharge. Device design can support this behaviour by influencing the end-user via marketing approaches (attempting to transform the recharging behaviour) and by making the recharging process easier to perform, either by making the connected cables easier to manage or by leveraging other approaches, like motion-chargers or wireless chargers (Table 2).

Table 2 Summary of previous research on smartwatch use and what type of benefits long-term smartwatch users prefer, annotated with how our findings reflect on the assumptions

In summary, our comprehensive dataset of long-term smartwatch use validates many key assumptions made in previous literature regarding smartwatch use and end-user choices. Many of the factors are related to each other, leading to the overarching theme for prolonged smartwatch use: there is no need to constantly rely on the smartwatch or to be constantly connected. Our findings do not indicate a lack of engagement with the smartwatch, as the usage streaks (consecutive days) and phases (less than 7 days between streaks) still average 10–20 days (Fig. 5).
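
To make the streak and phase definitions concrete, the sketch below derives both from a list of days on which the watch was used. It is an illustration under stated assumptions rather than the paper's implementation: `group_days` and `span_in_days` are hypothetical names, the gap between streaks is counted as the number of inactive days, and a phase's length is measured from its first to its last active day inclusive.

```python
from datetime import date


def group_days(days, max_gap):
    """Group active days into runs, tolerating up to `max_gap` inactive days inside a run."""
    runs = []
    for day in sorted(set(days)):
        # Inactive days between this day and the last day of the current run.
        if runs and (day - runs[-1][-1]).days - 1 <= max_gap:
            runs[-1].append(day)
        else:
            runs.append([day])
    return runs


def span_in_days(run):
    """Length of a run from its first to its last active day, inclusive (assumed measure)."""
    return (run[-1] - run[0]).days + 1


# Illustrative data only: six active days in January 2016.
days = [date(2016, 1, d) for d in (1, 2, 3, 6, 7, 20)]

streaks = group_days(days, max_gap=0)  # strictly consecutive days
phases = group_days(days, max_gap=6)   # fewer than 7 inactive days between streaks

streak_lengths = [span_in_days(run) for run in streaks]
phase_lengths = [span_in_days(run) for run in phases]
```

With this sample, the first two streaks are separated by two inactive days and merge into one phase, while the isolated last day forms a streak and a phase of its own.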

4.3 Guidelines for Design

The purpose of the following suggestions is not to advocate for certain specific technological decisions or interactions but to direct designers’ and researchers’ attention to the factors revealed in our results.

One of the earliest marketed motivations for obtaining a smartwatch has been its ability to synchronise with a smartphone, effectively functioning as an extension to the user’s current smartphone. However, as long-term users tend to have large periods of inactivity in their smartwatch use, these benefits largely manifest at chosen times and contexts rather than continuously throughout time. A key factor in increasing quality of experience with smartwatches is identifying these contexts and providing required interactions, functionalities and application areas that effectively leverage the smartwatch capabilities. Such context awareness has been explored by e.g. Kubo et al. [23] in designing cross-device interactions that support context changes.

Similarly, while the smartwatch can technically display all smartphone notifications, interacting with many of these notifications is not always feasible. User-driven decisions in managing the number and type of arriving notifications on smartwatches should be taken into consideration, and efforts should be made to (i) automatically manage the notifications shown through the smartwatch and (ii) offer the user more effortless tools for managing notifications. Current methods mostly comprise manual settings for specific applications, which is far from optimal. A data-driven approach might reveal user tendencies which could be taken into consideration when designing the core OS and its functionalities. Recent work by Khurana et al. [22] reports that interaction fatigue sets in very rapidly on smartwatches, which significantly reduces their sustained usage.

With a relatively high price point (most smartwatches cost between USD 200 and USD 400), the reason behind device abandonment is not always a lack of capabilities, but rather a misalignment between expected capabilities and price. Cost was included as one of the most frequent user complaints in parallel with functional problems [30]. The high cost-to-capability ratio can drive the device to become more of a novelty product, when compared to the capabilities of e.g. laptops or smartphones. The increasing trend of activity trackers in wearable markets offers some validation of this, as the capabilities offered by activity trackers (timekeeping, health tracking, GPS, etc.) are minimal in terms of what smartwatches have to offer but are therefore capable of offering a better fit for the consumer’s price-to-capabilities ratio. A more minimalistic approach to smartwatches’ technical capabilities could lead to a better fit for consumers.

The TicWatch Pro offers consumers a peek into what the future of a smartwatch could be like, by offering two usage modalities. Two screens are overlaid on each other: the top screen is a transparent LCD display which displays a minimal amount of information (time, heart rate), while the full-colour OLED screen beneath offers all typical smartwatch OS capabilities and interactions. Using only the top screen (coined as ‘essential mode’) also boosts the battery lifetime from 1–2 days to five or more. Ensuring a battery lifetime of longer than just one full day is also considered key in designing smartwatches, and the method designed for the TicWatch Pro seems to solve some of the key design issues prohibiting consumers from adopting smartwatches: contextual transformation of device capabilities (albeit the modalities are still manually controlled) and ensuring effortlessly long battery lifetimes.

A number of recent works have explored novel interaction methods with smartwatches, from gesture detection [47, 59] to leveraging acoustics [48]. Another recent smartwatch design that aims to mitigate end-user issues with the device is the Samsung Galaxy Watch. The device leverages approaches similar to the circular list design explored in [37] and the mechanical rotating pan proposed by Xiao et al. [63]. A mechanical rotating bezel can be used to select items and navigate through the OS, without the need for touch interaction. The device also uses a more limited operating system (the Tizen OS), which fits our design proposal of a more minimalistic design. Our dataset was collected before these devices arrived on the market, so a follow-up experiment or analysis comparing the usage behaviours with more recent devices could reveal further details.

The increase in popularity of activity trackers and the observed challenge in sustained usage and adoption of smartwatches, combined with the usage traces from the accepted behaviours, indicate that the smartwatch as a completely functional sidekick for the smartphone might be neither the optimal nor the desired design.

4.4 Reflections and future work

This paper offers a data-driven approach to identifying usage adoption in a technology that is still in the early-adopter phase, indicating that the user base still tends to explore means of obtaining tangible benefits from their devices. This can cause misalignment of expectations, and the effect can be amplified in situations where the ultimate use cases for the technology remain ambiguous. This effect is a natural part of any new (or re-emerging) technology, but the means of measuring the progress of this effect have previously been mostly unexplored in a data-driven manner. Reasons for this can be the lack of an appropriate technology, as the effect is hard to simulate in laboratory settings, and nigh impossible in the wild. Additionally, when new technologies did emerge, the technological advancements at that time might not have allowed such methods. A similar pattern can be seen in smartphone usage: starting as a potential tool for work environments like hospitals, early smartphones included neither the means for data collection on the device via physical sensors nor the means for centralised collection of this information. Ultimately, smartphones found their users and became an integral part of our daily lives, but the steps taken to get here are still partially clouded as we have no tangible data from this period.

The latest in wearable technologies includes two other potential candidates for similar explorations: smartglasses and activity trackers. Rauschnabel has worked on investigating potential drivers for smartwatch adoption, naming functional benefits, ease of use, as well as brand attitudes and social norms as influential [42]. Personality also has an influence on the intention to adopt wearable technologies [40], and for many wearable technologies, the novelty factor (is the device fashion or technology?) can have a large influence on the intent for adoption [41]. Spil et al. [54] approach potential use cases for wearable technologies from a personal health standpoint and group them into five main categories: sense, process, store, transmit and apply. Dibia [12] writes about design principles for wearables, aiming to address the challenges of sustained device use.

Wearable technologies in general have been focusing more on personal health, and an article by Piwek et al. [35] discusses the use of these personal devices for assisting medical practitioners. While a single device can monitor a range of medical risk factors and offers potential for direct personal health analytics, it is still unclear how trustworthy this information is and how the information might best serve medicine. Canhoto et al. [7] highlight the importance of wearable user adoption in materialising these benefits and divide the adoption process into two steps: adoption (exploratory behaviour using our terminology) and sustained use (accepted behaviour). Different features support the two phases; e.g. the capability to collect data is important for adoption, whereas resilience (e.g. battery lifetime) is key for sustained use.

All these works ultimately address similar challenges to our approach but take a more qualitative and theoretical approach. Sadly, the use of device-derived sensor information has not been fully leveraged in emerging technologies, and our work is an initial step in moving from theory to long-term in-the-wild observations. ‘What is long-term?’ is always debatable, but the span of several months offers valuable insights into adopters of new technologies. This paper serves as an initial stepping stone for validating the concept of a data-driven approach and provides sufficient room for future work to explore this area. We provide a high-level conceptual understanding of usage behaviour, but an alternative approach could look at detailed differences, also accounting for demographic information and previously known traits of users. Simulating (or prompting) the complete process of adoption intent, obtaining a new technology and observing the user during initial and later device usage is challenging but would ultimately be the best option for experimenting and investigating this idea further. This will be a challenge from a staging standpoint, but also because the experimental subjects are easily influenced by the experimental conditions.

5 Conclusion

In this work, we investigate how smartwatch users adapt their device use over time and how this change reflects the initial expectations users potentially have of the technology. We analyse a dataset from 79 long-term users with four different types of characteristics—notification, battery, usage session and overall use frequency.

We categorise usage behaviours into those that users employ to explore new ways to benefit from their smartwatches and those that are deemed accepted, indicating that the user is content with the benefits the smartwatch offers in their daily life. We highlight how users tend to adjust their device usage towards specific desired usage behaviours and move away from unwanted or suboptimal usage styles. Our results indicate how the smartwatch supports specific types of use, and users show similar changes in their usage behaviour during long-term use.

Our findings relate strongly to previous literature on smartwatch users’ preferences. This offers further validation to our results and our contribution to the ongoing research on smartwatches, how they are used and how they could be further improved to support a wider user base and increasing user requirements.