Introduction

Actigraphy is established as a valuable tool for research into sleep, physical and mental disorders. Actigraphy provides an ecologically valid assessment of sleep with a modest, but manageable loss of data reliability compared to polysomnography [1], and it is an acceptable measure of energy expenditure if indirect calorimetry is unfeasible [2]. Objective monitoring of both sleep and daytime activities is preferable to self-report, as self-reports show significantly lower levels of reliability. For example, whilst adults self-reported that they spent 38% of their time in moderate or vigorous physical activity (MVPA), actigraphic recordings revealed that they spent only 5% of their time at this intensity [3]. Overall, there is growing recognition that actigraphy can be applied to real-time monitoring of free-living activities in general-population and clinical studies [4].

Actigraphy is increasingly employed in research on mental disorders, including in individuals at risk of developing mood or bipolar disorders [5,6,7]. Evidence demonstrates that actiwatches have utility in evaluating sleep-wake cycles in clinical or natural environments, can be employed prospectively to monitor longitudinal changes in symptoms or the evolution of an illness prodrome, and might be employed to assess response to clinical interventions or patterns of recovery [8, 9]. This has led to suggestions that actiwatches could be used more widely in day-to-day practice, especially in adolescent and young adult populations, where both irregular sleep patterns and low daytime activity may trigger or exacerbate health problems [4, 10]. However, research grade actiwatches, with their required device readers and software programmes, are relatively expensive. This, plus the fact that actiwatches may be lost or not returned to the clinic, may prohibit their routine use in youth mental health (YMH) settings. Unsurprisingly, it has been suggested that the rapid evolution of technology may offer opportunities to employ cheaper, consumer-based monitors in research and clinical practice [11].

The use of commercially available fitness tracking devices and other wearable technology is a key emerging trend of the last decade [11]. This ‘quantified self’ movement has been made possible by the development of consumer-acceptable, low-cost, streamlined alternatives to research grade medical devices such as actigraphic watches [12,13,14]. As such, it is now commonplace for individuals to track their daily activity levels and sleep patterns via an on-screen display on a small wearable or transportable gadget. These devices usually link to a web-based interface that allows summary data (e.g. weekly number of steps) to be displayed on a smartphone, tablet or computer [13]. However, it is unclear whether commercial grade tracking devices could or should be used as an alternative to research grade actiwatches in YMH settings. Critically, using a commercial device to record rest-activity patterns and monitor personal wellness or lifestyle behaviours is a different proposition from monitoring putative clinical markers of mental disorders. This is especially likely to be true if the latter results in the individual receiving a clinical diagnosis or a recommendation for treatment. In addition, despite widespread adoption of commercial devices by youth populations, there is relatively little information on their accuracy in clinical settings, and there has been limited testing of the validity of the various rest-activity outputs against ‘criterion’ measures [11, 15, 16].

This study examines the validity, feasibility and acceptability of using a consumer grade activity device (ConD) as a substitute for a commonly used research grade actigraphic device (ResD) in the measurement of the five sleep and activity variables that are reported most frequently in actigraphy studies in youth (see, e.g. [1, 6]). The specific aims were to:

  (a) compare seven consecutive days of 24-h recordings of sleep-wake cycles and assess the cross-validity of the ConD with the ResD on five metrics associated with mental and physical health, namely sleep duration, waking after sleep onset, sleep efficiency, minutes spent in MVPA and proportion of time sedentary; and

  (b) determine the device feasibility and usability by assessing study dropout and reviewing participant feedback regarding the use of ConD and ResD.

Methods

Ethical approval was granted by the north east branch of the National Research and Ethical Committee (NREC) in the UK (reference: 12/NE/0325) and the University of Sydney Human Research Ethics Committee (HREC: 2015/4961) in Australia.

Prior to commencing recruitment to the comparison study, a literature review was undertaken to identify the recording duration required [17] and the most appropriate wrist-worn ConD, including acceptable battery life and inter-device reliability of the chosen ConD [18,19,20,21,22,23] (further information regarding the device selection process is available from the corresponding author). This established that Fitbit© devices were the most appropriate for the present study, with good inter-device reliability between different models and ease of access to minute-level data if required (via a Fitbit application programming interface; Fitabase). The reference device (Actiwatch-64; Philips Respironics, USA) was selected because it was the actiwatch that had been used in youth studies undertaken by these and other researchers.

Adherence to the study protocol, recruitment of participants and collection of data was overseen at both sites by a senior researcher (JS).

Sample

A convenience sample was recruited via clinicians involved in clinical and research programmes that undertook actigraphic monitoring of sleep and/or activity in youth (e.g. a pilot study for a new therapy for individuals at risk of bipolar disorders; a study of subjective and objective measures of sleep and mood problems in YMH clinic attendees). Eligibility criteria were those used in several previous studies by our research groups (see, e.g. [6, 10]).

The inclusion criteria were that the individual was (a) aged 16–25 years and (b) willing and able to give written informed consent to participate. The exclusion criteria were: (1) clinically assessed IQ < 70, evidence of intellectual impairment and/or history of head injury; (2) mental disorder secondary to a medical condition; (3) substance or alcohol use disorder; (4) elevated risk of suicide or self-harm; (5) regular use of medications that affect sleep, melatonin secretion, circadian rhythms or alertness; (6) evidence of other sleep (e.g. sleep apnoea, narcolepsy), neurological (e.g. epilepsy) or primary medical conditions associated with sleep-wake dysfunction; (7) recent trans-meridian travel (i.e. potential for jet lag) or regular shift work; and (8) the presence of mobility problems (e.g. inability to walk unaided).

Sleep and activity metrics

We identified the five most commonly reported markers of physical and mental health that were available from the ResD and could be extracted from the ConD without any additional input from the study participant or any need for further calculations (e.g. this excluded sleep onset latency as this requires additional information). There were three sleep and two activity metrics:

  (a) total sleep time (TST) and waking after sleep onset (WASO), both in minutes, and sleep efficiency (SE; reported as a value from 0 to 1), as extracted from the device recordings.

  (b) time spent in physical activities of different intensities, which is reported directly by the ConD and derived for the ResD by applying published algorithms [24, 25]. The selected metrics, which are easily interpretable and have established positive or negative associations with health [3, 26], were MVPA (time spent in active minutes) and Sedentary Behaviour (reported as a value from 0 to 1, representing the number of minutes sedentary divided by the monitoring time).
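To make the three sleep definitions and the sedentary proportion concrete, the following is a minimal sketch that computes them from a hypothetical string of 1-min epochs already scored as sleep ("S") or wake ("W"). It assumes that the string spans the whole time in bed, that WASO counts wake epochs between the first and last sleep epoch, and that SE is TST divided by time in bed. Both devices apply their own proprietary scoring rules, so this illustrates the arithmetic only and is not either device's method; the function and class names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NightSummary:
    tst_min: int   # total sleep time: minutes scored asleep
    waso_min: int  # wake minutes between the first and last sleep epoch
    se: float      # sleep efficiency: TST / time in bed, as a 0-1 value


def summarise_night(epochs: str) -> NightSummary:
    """Summarise one night of 1-min epochs ("S" = asleep, "W" = awake).

    Assumes (hypothetically) that the string covers the whole time in bed.
    """
    if not epochs:
        raise ValueError("no epochs recorded")
    if "S" not in epochs:
        return NightSummary(tst_min=0, waso_min=0, se=0.0)
    onset = epochs.index("S")        # first epoch scored asleep (sleep onset)
    last_sleep = epochs.rindex("S")  # last epoch scored asleep
    tst = epochs.count("S")
    waso = epochs[onset:last_sleep + 1].count("W")
    return NightSummary(tst_min=tst, waso_min=waso, se=tst / len(epochs))


def sedentary_proportion(sedentary_min: float, monitored_min: float) -> float:
    """Minutes sedentary divided by monitoring time, as a 0-1 value."""
    if monitored_min <= 0:
        raise ValueError("monitoring time must be positive")
    return sedentary_min / monitored_min
```

For example, summarise_night("WWSSSSWWSSSSW") gives a TST of 8 min, a WASO of 2 min and an SE of 8/13; the final wake epoch is not counted as WASO because it follows the last sleep epoch.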

Procedure

A researcher discussed the study protocol and completed the consent procedures with the participant, then basic demographic and health information was recorded or estimated (e.g. height and weight were used to calculate body mass index: BMI). Each participant received instructions on how to use the ResD and ConD (as required). Individuals who did not own a Fitbit© were provided with the basic model and a charger for the duration of the study. Time was synchronized on both devices and individuals were asked to wear the ConD and ResD concurrently for seven consecutive days and nights on their non-dominant wrist (for the ConD, participants were reminded to recharge the unit on day 5). The ‘normal’ setting was selected for detection of sleep and activity on the ConD. For the ResD, the ‘usual’ threshold (medium sensitivity) was selected for sleep-wake detection.
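The ResD sensitivity setting refers to a count-based sleep-wake scoring rule. The sketch below illustrates that kind of rule, assuming 1-min epochs and a weighted window over neighbouring epochs of the sort commonly described for Actiwatch software. The weights and the low/medium/high thresholds are assumptions for illustration and should be checked against the manufacturer's documentation; the function name is hypothetical. The sketch shows why a more sensitive (lower) threshold scores more epochs as wake.

```python
from __future__ import annotations

# ASSUMED weights for the epoch being scored (offset 0) and its neighbours.
WEIGHTS = {-2: 1 / 25, -1: 1 / 5, 0: 1.0, 1: 1 / 5, 2: 1 / 25}
# ASSUMED activity-count thresholds for each sensitivity setting.
THRESHOLDS = {"low": 80, "medium": 40, "high": 20}


def score_epochs(counts: list[int], sensitivity: str = "medium") -> str:
    """Label each 1-min epoch "W" (wake) or "S" (sleep) from activity counts."""
    threshold = THRESHOLDS[sensitivity]
    labels = []
    for i in range(len(counts)):
        weighted_total = 0.0
        for offset, weight in WEIGHTS.items():
            j = i + offset
            if 0 <= j < len(counts):  # neighbours outside the record are skipped
                weighted_total += weight * counts[j]
        # Wake if the weighted total exceeds the threshold, otherwise sleep.
        labels.append("W" if weighted_total > threshold else "S")
    return "".join(labels)
```

With a single burst of 150 counts, score_epochs([0, 0, 150, 0, 0]) returns "SSWSS" at medium sensitivity, whereas the "high" setting returns "SWWWS" because the down-weighted neighbouring epochs now also exceed the lower threshold.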

At the end of the monitoring, the researchers collected data from the ConD for each study parameter using the same epochs as the ResD (ConD data were obtained by the participant downloading reports from the website and giving the data to a researcher; by giving the researchers direct access to the ConD or raw data from Fitabase; or by returning the ConD to a researcher). The participant returned the ResD to a researcher who downloaded and extracted recordings for each study parameter from the actiwatch. The two sets of recordings were combined into a data file and any personal identifying information was removed.

Individuals who commenced the monitoring week were asked to provide verbal feedback about their views and/or preferences for using a ConD or ResD. The four questions covered the acceptability of wearing each device (in terms of, e.g. being seen wearing it by their peers); how intrusive it was to wear and manage the device; any preference in terms of using or interacting with the device; and any other personal comments or feedback.

Statistical analyses

All analyses were planned a priori and undertaken using SPSS (version 23). In free-living conditions, it is known that adolescent sleep-wake patterns with entrainment (weekdays with regular scheduled activities) may differ from those with reduced or no entrainment (e.g. weekends), so analyses take this into account (see below).
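Because weekday and weekend patterns are analysed separately, each participant's daily values first need to be split and averaged by period. The following is a minimal sketch, assuming each daily value is keyed by its calendar date and that Monday to Friday count as weekdays; how nights are assigned to weekdays is an assumption here (a night beginning on Sunday evening could reasonably be treated as a weeknight). Returning None for a period with no data covers cases such as a participant who did not wear a device at the weekend. The function name is hypothetical.

```python
from __future__ import annotations

from datetime import date
from statistics import mean
from typing import Optional


def weekday_weekend_means(
    daily: dict[date, float],
) -> tuple[Optional[float], Optional[float]]:
    """Return (weekday mean, weekend mean) for one participant's daily values.

    Monday-Friday (date.weekday() 0-4) are treated as weekdays and
    Saturday-Sunday (5-6) as weekend days; a period with no data gives None.
    """
    weekday = [v for d, v in daily.items() if d.weekday() < 5]
    weekend = [v for d, v in daily.items() if d.weekday() >= 5]
    return (
        mean(weekday) if weekday else None,
        mean(weekend) if weekend else None,
    )
```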

Normality of distributions for all measured variables was assessed using the Shapiro–Wilk test, and statistical significance was set at p < 0.05.

Descriptive statistics were used to characterize the sample; the ConD and ResD data were then compared using three approaches:

  1. Paired samples t tests were used to assess systematic differences between the recordings obtained from the ConD and ResD for five consecutive weekdays/weeknights.

  2. Mean absolute percentage error (MAPE) values were then calculated. The MAPE indicates the absolute size of the error and is estimated for the ConD by dividing the absolute bias (ResD–ConD) by the ResD (criterion) value and multiplying by 100 [11, 13].

  3. Bland–Altman (difference) plots were used to determine whether the ConD over- or under-estimates any metric compared to the ResD [27]. We plotted two points on each Bland–Altman graph for each participant (one for the comparison of weekday values and the other for the weekend values). To create the graph, it is first necessary to compute the mean bias for the measure (i.e. the mean of the ConD minus ResD differences), along with the standard deviation (s.d.) of the bias. Next, the lower and upper limits of agreement are calculated (mean bias ± 1.96 × s.d. of the bias). Each difference is then plotted against the average of the two measures (i.e. ConD plus ResD divided by two). A positive bias indicates that the ConD over-estimates the ResD values, whilst a negative bias indicates that the ConD under-estimates the ResD values.
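The three comparisons can be sketched in plain Python. The sketch assumes one value per participant per device (e.g. a weekday mean) and uses ConD − ResD as the difference, so that a positive bias means the ConD over-estimates, matching the interpretation given above. It computes only the t statistic and its degrees of freedom; the p-value additionally requires the t distribution (scipy.stats.ttest_rel, for instance, reports it directly). The function names are hypothetical, and this is not the SPSS procedure used in the study.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, stdev


def _differences(cond: list[float], resd: list[float]) -> list[float]:
    """Per-participant ConD - ResD differences (positive = ConD higher)."""
    if len(cond) != len(resd) or len(cond) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    return [c - r for c, r in zip(cond, resd)]


def paired_t(cond: list[float], resd: list[float]) -> tuple[float, int]:
    """t statistic and degrees of freedom for a paired samples t test.

    Raises ZeroDivisionError if all differences are identical (s.d. = 0).
    """
    diffs = _differences(cond, resd)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


def mape(cond: list[float], resd: list[float]) -> float:
    """Mean absolute percentage error of the ConD, with ResD as criterion."""
    _differences(cond, resd)  # validates the inputs
    if any(r == 0 for r in resd):
        raise ValueError("MAPE is undefined when a criterion value is zero")
    return mean(abs(r - c) / r * 100 for c, r in zip(cond, resd))


def bland_altman(cond: list[float], resd: list[float]) -> dict:
    """Mean bias, 95% limits of agreement and (average, difference) points."""
    diffs = _differences(cond, resd)
    bias = mean(diffs)
    sd = stdev(diffs)
    averages = [(c + r) / 2 for c, r in zip(cond, resd)]
    return {
        "bias": bias,  # > 0: ConD over-estimates on average
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "points": list(zip(averages, diffs)),  # x = average, y = difference
    }
```

Note that MAPE is undefined for any participant whose criterion value is zero (e.g. a night with no WASO on the ResD), so such cases need to be handled explicitly.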

Power calculation

We used data from our own and other previous studies to determine the sample size required for the paired t tests (using http://samplesizecalculator.com). Assuming a TST of 7–8 h, a 10% difference between TST recorded on the ConD and ResD (42–48 min) and a predicted s.d. of the difference (s.d.-diff) of about 30 min, 7–8 individuals are required to achieve 80% statistical power for identifying a statistically significant difference (at p < 0.05). Likewise, six participants are required if we assume the SE is 0.8–0.9, with a 10% difference in the SE between devices (and an s.d.-diff of 0.05). Based on previously reported dropout rates from actigraphy (10–35%) and objective monitoring studies of youth (20–40%), and to allow for random missing recordings of sleep or activity variables, we estimated that if a minimum of 12 individuals commenced the study, we would obtain the required data from 8 to 10 individuals.
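The sample-size reasoning can be approximated with the Python standard library. This sketch uses a normal approximation for a two-sided paired t test with a small-sample correction of z²/2 added to n; that approximation is an assumption, and the online calculator named in the paragraph above may use an exact or more conservative method, so its results need not match the numbers reported there. The function name is hypothetical.

```python
from __future__ import annotations

from math import ceil
from statistics import NormalDist


def paired_sample_size(
    mean_diff: float, sd_diff: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Approximate number of pairs for a two-sided paired t test.

    Normal approximation plus a z**2 / 2 small-sample correction
    (an assumption; exact noncentral-t calculations can differ).
    """
    if sd_diff <= 0 or mean_diff == 0:
        raise ValueError("need a non-zero difference and a positive s.d.")
    effect = abs(mean_diff) / sd_diff  # standardized effect size (d)
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value
    z_beta = z(power)
    n = ((z_alpha + z_beta) / effect) ** 2 + z_alpha ** 2 / 2
    return ceil(n)
```

For the TST scenarios in the text, the calls would be paired_sample_size(42, 30) and paired_sample_size(48, 30); for SE, paired_sample_size(0.08, 0.05) and paired_sample_size(0.09, 0.05). Smaller assumed differences or larger s.d.-diff values increase the required n.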

Results

Thirteen individuals commenced data collection, but two did not attend the follow-up appointment (and did not provide data) and ConD data were unavailable for another individual.

The included sample comprised 10 youth (6 females) with a median age of 19.3 years (interquartile range, IQR 17–21) and a median BMI of 22.3 kg/m2 (IQR 18.9–25.7). All individuals reported depressive symptoms, four also reported anxiety symptoms, three reported symptoms of hypomania and one had a history of hallucinations. Five individuals had a family history of mood disorders (unipolar and/or bipolar) and four individuals were currently prescribed psychotropic medication. Characteristics of individuals recruited in England (n = 6) did not differ from those recruited in Australia (n = 4).

Table 1 Paired t tests for five selected health markers measured on consecutive weekdays using a commercial grade (ConD) and research grade device (ResD) and the estimated mean absolute percentage error (MAPE) of the ConD

All ten individuals provided sleep and activity data for five consecutive weekdays, but one individual did not wear an actiwatch for the weekend. As shown in the paired t tests reported in Table 1, the ConD gives significantly higher values for weeknight TST and SE and for weekday MVPA compared to the ResD and significantly lower values for weeknight WASO and weekday sedentary behaviour. The MAPE indicates that the percentage error for measurements undertaken by the ConD is particularly high for WASO (45%), exceeds 10% for TST and MVPA, but is lower for the SE (9%) and Sedentary Behaviour (5%).

Table 2 Bias and limits of agreement between a commercial grade (ConD) and research grade device (ResD) for five selected health markers (see plots in Figs. 1–5)

As shown in Table 2 and Figs. 1–3, the above findings translate into a high level of systematic bias for sleep measures, with the ConD over-estimating TST by about one hour (+56.41 min) and under-estimating WASO by about half an hour (−29.65 min) compared to the ResD. In addition, the ConD over-estimated SE by about seven percentage points (+0.074). Daytime activity parameters were less prone to bias (MVPA over-estimated by about 6 min; Sedentary Behaviour under-estimated by about 4%) (Figs. 4, 5).

Fig. 1 Sleep duration in minutes

Fig. 2 Wake after sleep onset (WASO) in minutes

Fig. 3 Sleep efficiency

Fig. 4 Minutes spent in moderate or vigorous physical activity (MVPA)

Fig. 5 Percentage of time sedentary

Feedback from nine participants suggested that, whilst the ConD was highly acceptable, there was less enthusiasm for the ResD. This appeared to focus on two main issues. First, whilst no individual refused to wear the ResD, five individuals expressed concern that wearing an actiwatch identified them as ‘a patient’. Three of these individuals indicated that they regarded wearing a medical device as potentially stigmatising and said they would be reluctant to use an actiwatch for an extended period. Second, six individuals reported that they were disappointed that there was no option to review the ResD recordings for themselves and that they preferred the ConD because it allowed them to examine their daytime activity and sleep patterns in real time using the device display or web-based interface on their phone or tablet.

Conclusions

Low levels of daytime activity and disrupted sleep patterns in youth are not only a public health concern, but are also increasingly regarded as important targets for assessment and monitoring in YMH settings [28]. This prompted us to examine whether a consumer grade rest-activity tracker might be used as a substitute for a research grade actigraphic device, and in what circumstances it might be employed. We discuss the findings and limitations of the study and consider the implications for both research and clinical practice from the perspective of utility and acceptability of the different tools.

Seventy-seven percent (10 of 13) of YMH attendees who consented to participate in this study provided simultaneously recorded data from the ResD and ConD. Using ResD outputs as the criterion values, the findings indicate that the ConD shows only modest levels of accuracy overall (MAPE ranged from 4 to 45%). As in some previous studies, the use of ConD for monitoring sleep is undermined by the significant over-estimation of TST and under-estimation of WASO (and the consequent impact on SE estimation) [20, 23, 29, 30]. Although MVPA is over-estimated by the ConD, the daytime activity metrics appear to be slightly more representative of the ResD values than the sleep metrics [18, 19, 31].

The strength of this study is that metrics and analyses were selected a priori and the sample size was calculated to allow use of paired t tests, MAPE and bias estimates (using Bland–Altman methods). This approach is preferable to correlational analyses, as these may overstate the agreement between ConD and ResD [32], whilst concealing the magnitude of any errors and/or whether discrepancies are primarily due to over- or under-estimation of values [33]. However, the current study has several limitations. For example, we cannot definitively state that other rest-activity metrics extracted from ConD or ResD will show the biases reported for the five variables selected, although the emerging literature is consistent with our findings (and indicates problems with other measures such as sleep onset latency) [18,19,20,21,22,23, 31]. The metrics we chose were selected because of their widespread use in research and clinical practice, but also because it was easy to extract data for the variables and it was possible to examine them without recourse to sleep or activity diaries (although the ideal would be to use self-report diaries alongside objective data collection, we wished to minimize participant burden). Further, we chose an activity tracker produced by one manufacturer. Whilst this decision is justifiable, we cannot simply extrapolate findings from this ConD to other devices. The pre-study review indicated that the device we chose was the optimal ConD available for the study, but this is a rapidly developing field and testing of new ConD or new models of existing devices will be necessary. In addition, the 7-day recording period was appropriate for comparing the mean values for sleep and activity parameters, but a longer duration is required to extend analyses to, e.g. the study of variability [34]. This may create further issues in collecting data from commercial devices (as batteries need to be recharged every 5 days). Lastly, the sample size was sufficient for the primary analyses, but is insufficient to explore potential confounders of the reported findings or predictors of the biases observed.

The data recordings from ConD and ResD are derived from accelerometers and the presumption is that wrist movements can be used as a proxy for monitoring daytime activity patterns, whilst the absence of movement at night equates to sleep. We found that the ConD was insufficiently sensitive to night-time movement; thus, the current study suggests ConD are a relatively poor substitute for ResD for research targeted specifically at sleep patterns. We speculate that this could indicate that, although the ConD contains a triaxial accelerometer system, the calibration of the system for the horizontal plane may be less accurate (the problem of over-estimating TST and underestimating WASO was reversed but not resolved using the higher sensitivity setting on the ConD). We emphasize that this is a hypothesis that requires testing, but the normal setting produced significantly different values for the sleep metrics compared to the ResD. In contrast, daytime estimates of MVPA and sedentary behaviour (mostly sitting rather than lying down) appeared to be more comparable with the actiwatch recordings. However, from a technological point of view, we do not know for certain if it is the hardware or the software that performs less well in the ConD (or both). As such, the findings warrant further testing in larger studies with, e.g. samples undertaking more detailed, systematic clinical assessments; use of self-report diaries to clarify the nature of rest-activity behaviours; comparisons with other gold standard measures; using different types of ConD; experimenting with different device sensitivity settings; and longer periods of recording.

One of the reasons for increased interest in objective measures of sleep and activity cycles is their potential use in personalised or precision medicine [35]. At this stage, assuming our findings are confirmed by others, we do not recommend employing the ConD for personalised diagnostics in youth. For instance, using a ConD is unlikely to help to screen for or reliably determine whether an individual is experiencing a clinically meaningful delayed sleep phase; nor is a ConD likely to provide sufficient information to enhance treatment selection. However, ConD might be useful, e.g. to assess within-subject changes in daytime activity or sleep patterns in situations where absolute accuracy is not required. This application could help in clinical practice, where intra-individual monitoring may help to ascertain whether a specific intervention or treatment (that has already been selected) is having a positive effect on activity levels or sleep patterns [8, 36]. Further, the availability and acceptability of ConD and the opportunity for real-time self-monitoring may increase their potential utility as an adjunct to repeated prospective assessments of clinical progress. Among youth, a preference for using a ConD may increase the likelihood of engagement in monitoring rest-activity patterns and therefore increase the amount of data available to a clinician or researcher. In addition, its use may enhance collaboration between clinicians and clients.

In conclusion, an advantage of research grade actiwatches is that they measure and store much more data and the algorithms allow evaluation of many more sleep-wake cycle parameters than ConD. A major disadvantage is that actiwatch data are not readily available to the wearer and youth are ambivalent or against their routine use. However, the perceived benefits of ConD in youth may be insufficient in situations where reliable data on a sophisticated set of parameters, as available from ResD, are critical to the decision-making process. Thus, the current study suggests that ResD is preferable for precision diagnostics or attempts to stratify cases into treatment relevant subgroups. For other purposes, there may be a need to consider the ‘trade off’ between quality of data recording versus the likelihood of obtaining the required quantity of sleep and activity data.