Simple and rationale-providing SMS reminders to promote accelerometer use: a within-trial randomised trial comparing persuasive messages
Literature on persuasion suggests compliance increases when requests are accompanied with a reason (i.e. the “because-heuristic”). The reliability of outcomes in physical activity research is dependent on sufficient accelerometer wear-time. This study tested whether SMS reminders—especially those that provided a rationale—are associated with increased accelerometer wear-time.
We conducted a within-trial partially randomised controlled trial during baseline data collection in a school-based physical activity intervention trial. Of 375 participants (mean age = 18.1), 280 (75%) opted to receive daily SMS reminders to wear their accelerometers. These 280 participants were then randomised to receive either succinct reminders or reminders including a rationale. Data was analyzed across groups using both frequentist and Bayesian methods.
No differences in total accelerometer wear minutes were detected between the succinct reminder group (Mdn = 4909, IQR = 3429–5857) and the rationale group (Mdn = 4808, IQR = 3571–5743); W = 8860, p = 0.65, CI95 = − 280.90–447.20. Similarly, we found no differences in wear time between participants receiving SMS reminders (Mdn = 4859, IQR = 3527–5808) and those not receiving them (Mdn = 5067, IQR = 3201–5885); W = 10,642.5, p = 0.77, CI95 = − 424.20–305.30. Bayesian ANOVA favored a model of equal weartime means, over one of unequal means, by a Bayes Factor of 12.05. Accumulated days of valid accelerometer wear data did not differ either. Equivalence testing indicated rejection of effects more extreme than a Cohen’s d (standardised mean difference) of ±~0.3.
This study casts doubt on the effectiveness of using the because-heuristic via SMS messaging, to promote accelerometer wear time among youth. The because-heuristic might be limited to face-to-face communication and situations where no intention for or commitment to the behavior has yet been made. Other explanations for null effects include non-reading of messages, and reminder messages undermining the self-reminding strategies which would occur naturally in the absence of reminders.
DRKS DRKS00007721. Registered 14.04.2015. Retrospectively registered.
KeywordsAccelerometry Intervention Text messaging SMS Persuasion Adherence Behaviour change Adolescents School-based research Partially randomised trial
Let’s Move It intervention to increase physical activity and decrease sedentary behavior in older adolescents
Randomised controlled trial
Short Message Service, also known as “text messaging”
Compliance with accelerometer wear instructions
Reliable and valid assessment is necessary when evaluating whether public health policies or interventions change physical activity (PA) levels in the target group. Little consensus exists about what to measure, when, with what and for how long in PA research [1, 2]. While an inability of individuals to accurately remember their past PA and social desirability are clear problems with self-reported PA measures , objective measurements of PA (e.g. pedometers and accelerometers) have issues too. Zhuang et al.  found that missing accelerometry data was more common in 15- to 17-year-olds than among younger participants, especially during weekends (Sundays in particular), with missing data occurring increasingly from the first recording day to the last. This exemplifies a key issue in measurement: the proportion of an individual’s day or week captured by the measure. An extreme example would be an individual, who only wears the measurement device when undertaking PA. Thus, some guidelines suggest that a person should wear an accelerometer for a minimum of 10 h daily for at least 4 days in a 7-day measurement period in order to obtain an accurate reading of PA [1, 2]. Participants’ compliance with instructions on wearing the accelerometer is clearly very important in obtaining accurate PA measurements .
Research on enhancing accelerometer instruction compliance rates is rare [2, 6], particularly among older adolescents. One strategy has been monetary incentives contingent on proper wear-time . Sallis et al.  used an alternative strategy, asking participants to re-wear the accelerometer if they had not worn it for at least 5 valid days (> 10 valid hours of data) or a minimum of 66 valid hours across 7 days.
Barak et al.  suggest that new opportunities to promote compliance—such as text messaging (SMS; Short Messaging Service)—may be more reliable and effective than traditional methods, such as written or verbal wear instructions by the investigator. Zhuang et al. , too, recommend SMS reminders. Toftager et al.  used SMS reminders to increase compliance but did not report effects or acceptability. In a self-selected Irish sample of adolescents , daily SMS reminders were associated with putting on the accelerometer in the morning, but not in increased overall compliance (defined as valid days of data or minutes of non-wear). The study did not report levels of wear or effects of the reminders. The discrepancy between remembering to put on the device and actually wearing it for a sufficient amount of time indicates that these may be separate behaviors.
Compliance and the ‘because-heuristic’
A well-known principle of human behavior says that when we ask someone to do us a favor we will be more successful if we provide a reason. People simply like to have reasons for what they do. 
Following the terminology used by Key, Edlund, Sagaring and Bizer , the phenomenon of increased compliance by providing reasons is referred to as “the because-heuristic.” Let us accordingly define the naïve because-heuristic as “reasons increase compliance.”
In the Langer, Blank and Chanowitz study 1, this effect of reasons increasing compliance was only found when the confederate asked for ‘a small favor’ (five instead of ten pages, translating to effect sizes of d = 0.87 and d = 0.13, respectively) . Still, the results in general, as well as their implications have been questioned [21, 22]. A study by Folkes suggests, that instead of the size of the request, the effect is moderated by controllability . Pooling Folkes’ reason conditions results to an effect size of d = − 0.026, speaking against the quote above, and pointing out that the “power of reasons” effect is malleable, in the least.
To our knowledge, only one published direct replication of the Langer, Blank and Chanowitz study 1 exists . The main effect of the study replicated (d = 0.67 for placebic over no reason and d = 0.69 for real over no reason conditions), although over 20% (34 out of 163) of the participants needed to be excluded for various reasons. Lack of published replication studies, of course, is not new in the field of psychology .
In a conceptual replication of the phenomenon, in small request conditions, reasons (either placebic or real) increased compliance by an equivalent of d = 0.43 (calculated from Table 1 of ) when including their additional persuasion group and d = 0.22 when excluding it. Another conceptual replication  found d = 0.15 for requests perceived as small, and d = 0.21 for requests perceived as large (as calculated from Figure 3 of ).
These studies seem to temper earlier claims for the power of reasons in increasing compliance. In contrast to the naïve because-heuristic, let us define the weak because-heuristic as “reasons increase compliance, but only if the perceived favour is small”.
This study will investigate the effects of the because heuristic on compliance with the physical activity measurement procedures in the context of baseline measurements of a large school-based intervention.
The Let’s Move It cluster randomized trial
Inadequate PA predicts increased morbidity and mortality in people of low socioeconomic status (SES) , with SES differences in PA emerging already in adolescence . Finnish vocational school students are less physically active than those in high school . The Let’s Move It intervention aimed to increase PA and decrease sedentary behaviors in older adolescents in vocational schools.
The current study was conducted as a sub-study of the cluster randomised effectiveness evaluation trial of the Let’s Move It intervention . In a preceding feasibility study , participants’ accelerometer wear times were suboptimal; 47% (18/38) of baseline participants reached the cutoff of 10 h per day for at least 4 days, 63% (17/27) for the first and 75% (9/12) for the second follow-up. A frequently cited explanation for not wearing the accelerometer was forgetting to put on the device.
Aims and hypotheses
Are SMS-reminders associated with greater accelerometer wear times?
The current study investigated this by comparing the compliance rates across a) participants who opted to receive SMS reminders to wear their accelerometer, and b) participants who opted not to receive the reminders (non-randomised control group). If forgetting is an important reason for non-compliance, in the absence of intervening factors, reminders should increase compliance.
Does offering reasons to comply affect accelerometer wear time?
If reasons increase compliance, SMS reminders containing reasons to wear an accelerometer should lead to greater compliance.
Statistical hypothesis H2: Those who receive reasons in the SMS reminders have more minutes of accelerometer wear and more days of valid data (≥10 h of activity) than those who do not receive reminders containing a reason.
An additional research question, on whether providing reasons to comply with accelerometer wear increases trial retention, is omitted here. These null results are reported in .
Participants and sampling procedures
To be included in the study, the participants had to fulfill inclusion criteria of the Let’s Move It study  and had to have consented to the accelerometry measurements: all were at least 16 years old and were vocational school students. The reminder arms consisted of the participants who opted in to receive reminders for accelerometer wear.
During baseline recruitment of the first two recruitment waves of the Let’s Move It trial, students in two vocational schools were approached during class and informed about their school’s study participation in the study. After the invitation to participate in the main trial and collection of signed informed consent forms, those who consented were given an online questionnaire to complete. Details of trial procedures are reported in the protocol .
After 1–3 days, research assistants gave the participants a waist-worn accelerometer (Hookie AM 20, Traxmeet Ltd., Espoo, Finland) and instructed them on how to wear it for a duration of seven consecutive days (including the day of receiving the device). The used Hookie accelerometer is a tri-axial accelerometer that collects data at 100 Hz sampling rate without preprocessing. The measurement range of the accelerometer is ±16 g and the resolution is 4 mg (milligravity). The Hookie accelerometer employs the same tri-axial acceleration sensor component (ADXL345; Analog Devices, Norwood MA) that is used in widely used research-grade accelerometers . The validation of the Hookie accelerometer has been reported in both children  and adults  in studies comparing analysis of raw acceleration data from different accelerometers.
When participants received the accelerometers, they were asked whether they would like to receive SMS messages to help them remember to put it on every morning. Those who consented to the messages were subsequently randomised to one of two message conditions, and those who opted not to receive the reminders were treated as a self-selected control arm.
After 7 days, participants returned their devices to research assistants and were asked to fill out a short questionnaire assessing process measures (see Additional file 1: Appendix S1 and Additional file 2: Appendix S2.
Participants were assigned to the reason and succinct arms after they were recruited. The first author extracted the phone numbers from the list and used R code to create an amount of random numbers equal to the number of new participants. The vector of random numbers was then assigned to the participants. Participants with a number equal to or smaller than the median of the vector were allocated to the reason-condition. Others were allocated to the succinct condition. Research assistants working in the field to assess were blind to group allocation. Recruitment and randomisation took place on the same day, and restrictions such as blocking or stratification were not used.
Recruitment took place in two waves, alongside the recruitment of the main trial. In order to increase the rates of participants opting in for the reminders, the recruitment prompt was slightly modified for the second wave. The research assistants presented the SMS reminders as the default option, and asked whether this is acceptable to the participants.
Random assignment was not visible to the participants and the research assistants did not mention that different kinds of messages were going to be sent. The statistician who analysed the raw accelerometer data was blind to group assignment.
An important issue regarding the current study was to avoid tampering with the effects of the main trial. In other words, it should not affect main trial outcome measures in any other ways except for increased data quality. Care was taken to formulate the SMS messages to not pressure participants or provoke changes in main trial outcome measures such as PA.
We altered a previous procedure  by varying the message content slightly each day to reduce habituation and thus expected to increase the chances of the message being read, for both arms.
Succinct reminder condition: 1. a greeting – 2. a reminder – 3. a thank you
Reminder and reason: 1. a greeting – 2. a reason beginning with “Because…”, followed up with a reminder – 3. a thank you
SMS content, translated to English
Reminder with rationale (the “because heuristic”)
Morning! Because your participation is precious, please remember to put on the motion measurement device and wear it until you go to sleep (except in the shower etc.) - thanks!
Morning! This is a reminder to put on the motion measurement device and wear it until you go to sleep (except in the shower etc.) - thanks!
Hi! Because you’re aboard in producing very important knowledge, please remember to put on the motion measurement device now and wear it as instructed until you go to sleep. Thanks a lot!
Hi! Please remember to put on the motion measurement device now and wear it as instructed until you go to sleep. Thanks a lot!
Hello! Because the study wouldn’t succeed without your help, please remember to put on the motion measurement device again and wear it until you go to sleep (except in the shower etc.) - thanks!
Hello! Please remember to put on the motion measurement device again and wear it until you go to sleep (except in the shower etc.) - thanks!
Morning! Because the data you gather is highly valued, please remember to put on the motion measurement device and wear it until you go to sleep. Thanks (we’re already past midpoint)!
Morning! Please remember to put on the motion measurement device and wear it until you go to sleep. Thanks (we’re already past midpoint)!
Howdy! Because your participation produces very important knowledge, please remember to put on the motion measurement device and wear it until you go to sleep (except in the shower etc.) - thanks!
Howdy! Please remember to put on the motion measurement device and wear it until you go to sleep (except in the shower etc.) - thanks!
Hi! Because even this last day is important, please remember to put on the motion measurement device and wear it until you go to sleep. Return the motion measurement device to school tomorrow - thanks!
Hi! Please remember, even on this last day, to put on the motion measurement device and wear it until you go to sleep. Return the motion measurement device to school tomorrow - thanks!
We sent the messages using an SMS Gateway device MT-SF100-G-EU (MultiModem iSMS Server 1-port) by Multi-Tech Systems (http://www.multitech.com/brands/multimodem-isms). We used a manufacturer-designed guided user interface for the first recruitment wave and a custom interface designed by a local service provider for the second wave.
Registration and deviations from registered plan
The study plan was reviewed by the Ethics Committee for Gynaecology and Obstetrics, Pediatrics and Psychiatry of the Hospital District of Helsinki and Uusimaa (decision number 367/13/03/03/2014).
Official public registration in the German Clinical Trials Register (DRKS-ID: DRKS00007721) was completed 3 months after recruitment of the first wave had been initiated, but before data was available. Pre-registration (before starting data collection) failed due to lack of available resources at the time.
The original plan was to establish the additive effect of messages containing a reason and those not containing one over a no-message condition during the baseline measurement of the first batch. With the sample size we expected (n = 140), we would have had over 95% power to detect an effect of d = 0.6 (slightly smaller than the one discovered in the Langer, Blank and Chanowitz replication ). We had then planned to pit the more successful message type against a third message in the second wave. Instead of going forward with the plan of using a third message, we made the decision to gather another wave of participants with the same message types after the data from the first wave was analysed. This was due to the fact that, contrary to our expectations, no difference between the two messages was detected. This is important to note, as it means we can no longer rely on a long-term error rate of 5%  and—as p-values depend on the sampling distribution—default p-values from common statistical programs no longer apply .
To address the issue of inadequate reporting in the sciences , the current report complies with the Consolidated Standards of Reporting Trials (CONSORT) statement . Contributor roles are clarified in Additional file 3: Appendix S3, according to a taxonomy for this purpose .
Primary outcome measures
Primary outcome measures were 1) accelerometer wear time minutes and 2) days with ≥10 h of valid accelerometer data. As this trial was conducted within a larger trial, several other measures were collected and are listed in the Let’s Move It protocol . The main trial used a 3-axis accelerometer with a 2GB internal memory (Hookie Meter v2.0, Hookie Technologies Ltd., Espoo, Finland). The activity data was registered using raw data and a 100 Hz sampling rate.
Implementation assessment measures
Self-reported message receipt. As we could not gather objective log data on the number of messages opened, we asked participants to assess on how many mornings they had opened and read the SMS. Response options were: Not on a single morning, On 1 morning, On 2–3 mornings, On 4–5 mornings and Every morning.
Manipulation and contamination check. As participants were randomised individually, as opposed to clusters at school class level, discussing the SMS messages with their classmates could have led to students finding out that not everyone received the same messages, and perhaps also reveal the study hypotheses. We attempted to gauge the extent of this by asking them how often they had discussed the messages with peers. Response options were: Not once, Once, 2–3 times, 4–5 times and More often.
Acceptability of SMS message content was assessed by asking the participants, how much they agree with the statement “I was satisfied with the content of the messages”. Response options again had a 5-point scale: Completely disagree, Somewhat disagree, Do not agree nor disagree, Somewhat agree and Completely agree.
All non-Bayesian analyses were conducted using RStudio running R [41, 42]. Plots were drawn using R packages ‘ggplot2’  and ‘yarrr’ . Distributions between the reason and succinct groups in the implementation assessment questions were compared using the chi-square test.
Accelometer wear times were analysed using bootstrapping methods. A 95% bootstrap confidence interval for a mean can be acquired by resampling observed data to simulate a sampling distribution, obtaining the values for the 0.025th and 0.975th percentiles of resampled means . A kernel density plot, bootstrap confidence interval and a bootstrap test of equivalence were conducted using R package ‘sm’  for differences of distributions of the two reminder arms. Wilcoxon rank sum test with continuity correction was used to compare medians between groups.
ANOVA for equivalence of means between the two reminder groups and the no-reminder group, as well as its illustration, was performed using R package ‘userfriendlyscience’ . Additionally, a MANOVA with wear time minutes and wear days with valid data as dependent variables, and SMS group as an independent variable, was used to test robustness of results.
A 95% Bayesian Highest Density Interval (HDI)  of the means of valid wear days was plotted using R package ‘yarrr’. HDI refers to the most likely population parameter values (here: means) given the data; information which is not delivered by frequentist confidence intervals [48, 49].
Due to our sampling methods (e.g. decision to collect more data was based on observed data), traditional frequentist statistics faced limitations. Thus, we also calculated Bayes Factors [50, 51, 52] for our main outcome measures. A Bayes factor BF01 is essentially the ratio of two likelihoods, answering questions such as “Given the data, how many times more likely is the null hypothesis, compared to a specific alternative hypothesis”. We used the R package BayesFactor ; For comparing means, this package assigns the alternative hypothesis a Cauchy prior. We used a prior scale of 0.3, in accordance with common effects in health psychological research . This reflects a prior belief that 50% of the effects lie between d = − 0.3 and 0.3. For contingency tables, priors are described in Jamil et al. . The minimum value is 1, and an increase reflects the belief, that the distribution of observations in the given categories under H1 is relatively more similar to H0. Additional information on inference using Bayes Factors, and prior robustness checks are found in the supplementary website (https://git.io/vNl8X).
In the frequentist statistical paradigm, support for the null hypothesis is indicated by the practice of equivalence testing . For a difference between means, one essentially first establishes a region of equivalence to zero, then conducts and combines two t-tests. The first one tests whether the effect is higher than the lower bound (in our case, − 0.3), and the other tests whether the effect is smaller than the higher bound (in our case, 0.3). The tests were conducted using R package “TOSTER” .
We did not conduct multi-level analyses to account for the intra-class correlation of 0.09 for total accelerometer wear time. Heterogeneity analysis is presented in the supplementary website (https://git.io/vNl8X) file under “Heterogeneity among clusters”.
Using standard deviations estimated from feasibility study  data, we determined a practically significant effect size for wear time hours to be d = 0.42 – enough to bring a person from 9.5 h of daily data to reach the cutoff of 10 h. For our purposes, we decided to consider effect sizes between − 0.3 and 0.3 as equivalent to zero. Additional details are presented in the supplementary website under “Statistical power”.
We also evaluated Type S and type M error probabilities , and the v-statistic . The analysis is presented in the supplementary website (https://git.io/vNl8X). In brief; our design was relatively well-equipped to handle medium-sized effects, but is subject to considerable bias under small effects.
A participant flow diagram presented in Fig. 1 indicates how the messages were sent to almost all participants as intended.
Of the 375 participants consenting to accelerometer measurements as part of the main trial, 95 opted out of receiving reminders and an additional 7 did not receive messages due to technical difficulties. In the end, the SMS messages with reasons were sent to 138 and the succinct messages to 135 participants. Consent rate for reminders was 54% (101 out of 186) in the first wave and 95% for the second wave (179 out of 189).
Weartime data available
M age (SD)
Implementation and process measures
Open comments did not reveal unforeseen negative effects. In addition, 13% (9 out of 70) of participants who answered the question explicitly added, that remembering to wear the device was due to receiving the messages.
Wear time minutes
Bootstrap tests of equal densities indicated no differences in total wear time minutes between the two message types (p = 0.28), nor between those who received and did not receive messages (p = 0.35). Wilcoxon rank sum test showed no differences in distributions between message groups (W = 8860, p = 0.647, CI95 = − 280.90–447.20) or whether one opted in the messages or not (W = 10,642.5, p = 0.771, CI95 = − 424.20–305.30). Differences were neither detected between the two schools (W = 17,398.5, p = 0.051, CI95 = − 1.60–619.60) or recruitment waves (W = 17,310.5, p = 0.067, CI95 = − 19.0–586.3).
Equivalence tests indicated, that the mean wear time differences between message types (69.92 min, 90% CI [− 262.37; 402.21]) and the reminder/opt out groups (1.98 min, 90% CI [− 347.12; 351.08]) were statistically significantly larger than d = − 0.3 and smaller than d = 0.3. In other words, the effect size for the difference in means was deemed less than |0.3|.
Valid measurement days
Differences between the distributions of measurement days with > 10 h of data were not detected between the reason and succinct groups, χ2(7) = 7.893, p = 0.342. A Bayesian contingency tables test provided BF01 = 6.96 (Poisson sampling, prior concentration = 1.0; prior robustness test depicts a concave function where, as concentration approaches 2, BF01 approaches 22.97).
Differences were not detected in valid wear day distributions between participants for whom reminders were sent, and for whom they were not: χ2(7) = 8.344, p = 0.303. A BF01 = 34.79 (Poisson sampling, prior concentration = 1.0; robustness function is concave as before. As concentration approaches 2, BF01 approaches 93.50).
Again, equivalence tests of mean differences between message types (− 0.07 days, 90% CI [− 0.47; 0.33]) was statistically significantly larger than d = − 0.3 and smaller than d = 0.3. The mean difference between reminder and opt out groups (− 0.18 days, 90% CI [− 0.60; 0.24]) was statistically significantly smaller than d = 0.3, but we could not reject the hypothesis that the effect was higher than d = − 0.3.
A MANOVA with both total wear time minutes and valid wear days as dependent variables neither detected differences between the reason, succinct and opt out groups (F(4, 682) = 2.335, p = 0.054, Wilk’s Λ = 0.973), although multicollinearity may have posed a problem to the model (τ = 0.81, ρ = 0.93).
In an attempt to improve measurement of physical activity and sedentary behaviour—key public health issues—this study evaluated the effects of two interventions to increase accelerometer wear times during the first two recruitment waves of the Let’s Move It trial. Specifically, it tested the effects of the because-heuristic on accelerometer wear time in older adolescents. We did not detect increased wear times among participants who received a reason in their daily SMS reminders, nor did we detect different wear times between those receiving the reminder messages and those who opted out. In all cases, null models were supported over those with small-to-medium sized effects (see supplementary website (https://git.io/vNl8X) sections “Interpreting Bayes Factors” and “Bayesian ANOVA” for details). As it is neither logically nor statistically appropriate to conclude the absence of an effect from a non-significant hypothesis test [60, 61], we hope the analyses contribute to a long-overdue inferential development in the field.
Our main results are in line with the results of Belton et al. , of reminders not being able to increase wear time, despite attempting to improve on the earlier studies by not having exactly the same message sent every day. We do not have data on whether the reminder caused our participants put on the accelerometer more often, in spite of not increasing wear time as found in .
Although the xerox machine study  has been highly publicised for 30 years, the contextual framework of the effect remains unclear. Thus, many possible reasons could explain the null results obtained in this study, including the impersonal nature of SMS communication (as compared to face-to-face interaction), the source of the information, being incapable to complete the requested task, and a several other factors varying in plausibility – demographic factors, the target behaviour, contextualised cognitive processes and so forth. Accordingly, the effect of reasons on this particular behavior, given our context and delivery method, has proved smaller than what would be considered minimally interesting (although participants did attribute remembering to wear their accelerometer to receipt of the reminders), and possibly zero. Thus, the naïve because-heuristic does not receive support in the current study. We can not make conclusions regarding the weaker claim of reasons affecting only tasks which are easy to carry out, due to design and sample size considerations.
The flat dose-dependence curve can indicate several things, including the possibility that text messages do not affect wear times. Attributing remembering to getting reminded could be a case of a post hoc reasoning error . Another possibility is that the messages could have had a small effect, but opening and reading the message provided no additional benefit. For example, the participant could have looked at the preview of the message on the cell phone screen and remembered without reading the whole message.
As there were no differences between the SMS and no-SMS arms, this effect may have been masked by selection bias, with those people who expect to experience problems with remembering, opting in to receive SMS reminders. As consent was almost fully dependent on the recruitment prompt, an additional assumption is needed that the two recruitment waves differ qualitatively (on an unobserved confounder). So, for example, the second wave may have consisted of more compliant participants or the potential interactions with the first wave participants might have made the opinion of the study more favorable. Thirdly, the effect of reminders may not have been linear, or only a small dose is needed to form a habit, and thus achieve maximal effect. This explanation requires the same assumptions as the one described above. Fourth, the flat curve may also be caused by unreliable measurement: dose should be operationalized in a way not dependent on self-report. Finally, it is possible that receiving reminders causes an undermining of one’s own responsibility, so that those who receive reminders relinquish control and do not carry out the remembering techniques (e.g. placing of the accelerometer in a conspicuous place as a prompt to put it on) they would have, in the absence of reminders.
It may be that daily accelerometer wear is not determined by heuristic/automatic processes, but rather, is under more reflective reasoning processes. In this case, these reminders should have provided justifications and rationales that truly are important for this target group. We do not have any evidence what thoughts and connotations our reminder content evoked in the youth’s minds, and whether it was counterproductive. Finally, it is possible that participants who had agreed to take part in the accelerometer data collection already had made the reflective decision and proceeded to “implemental mindset” where persuasion messages are less relevant; e.g. as speculated in .
Limitations and strengths
There are a number of ways this study could have been improved on.
Opening and reading the messages (manipulation success)
Number of participants who opened and read the messages was assessed with a questionnaire instead of objective log data. This self-report measure (as well as the other post-intervention questionnaire items) was only a non-validated single item, thus probably far from optimal in terms of reliability. We had no reliable way to certify at which times the messages were received or whether they were opened at all. Anecdotal evidence indicated that the messages were too late for some students (i.e. they had already left the house and forgotten the accelerometer when receiving the message). On the other hand, we deemed sending the messages too early might pose an acceptability issue. The SMS queue in the gateway device presented a difficulty: larger number of message recipients heavily affected the deviation of delivery times, making the last messages in the queue arrive late for some students. During the second recruitment wave, time of initiating the send process was changed to be 45 min earlier (06:15 instead of 07:00), but we do not have data on the effect of this change.
We attempted to alleviate effects of not opening the messages by starting the each with the word “because”, so that message preview would render it visible on many devices even when not opened. Unfortunately we did not have access to a gateway system that could have sent e.g. MMS-messages, where a small picture could have been added, thus providing log data on how many times the picture was downloaded.
Contamination effects and masking the different message conditions
Participants may have found out their group allocation when discussing the messages with peers. This would require the discussion to have been about the nuances of message content and assumes that the participants are intrigued enough to spend time on making such inferences in the first place; an assumption perhaps not warranted. It is unclear how the discovery of SMS arm would have affected the results, but the possibility of confounding cannot be excluded. Randomising the groups by clusters could have helped to avoid this, but would have led to a reduction in statistical power. Still, the participants reported mainly not having discussed the messages with peers.
The stopping rule for data collection was not defined in advance. The decision to collect another wave of participants with the same design was made, when it became apparent that the messages did not have the strong impact we had anticipated. This leads to uninformative p-values in terms of error control , whereas Bayesian analyses are not as crucially affected by stopping rules (, but see also ).
Lack of a randomised no-SMS control group
In order to avoid distortion of main trial outcomes (e.g. increased PA), care had to be taken in this within-trial RCT. The risk of sabotage due to disappointment of being allocated to a no-SMS control group was deemed too high, and thus participants were not randomised into a no-SMS group. This, in turn, lessens the strength of conclusions based on wear times between the participants receiving the reminder and those not receiving one. People who know they do not need a reminder may have thus ended up self-selecting to the no-SMS group.
This presumes that teenagers studying in a vocational school have the capacity to make accurate predictions about their future self-regulation capabilities in an unfamiliar task (putting on an accelerometer). On the other hand, as described, the wording of the recruitment prompt was slightly modified from wave 1 to wave 2, and consent to reminders was increased from 53% (85 out of 97) to 95% (176 out of 186), whereas wear times did not differ. Thus, strong selection effects seem unlikely. Although this indicates that opting out was more a result of the recruitment procedure than knowledge of not needing the reminders, future research should aim to randomise when feasible.
One way to address this problem would have been an n-of-1 design, where each day is randomised to one of the three message conditions. With this design, one should be careful to not leave learning effects undetected, as participants could habituate to reminders and forget in the concurrent absence of them.
Message content and size of request
The intervention was not piloted, nor was extensive testing of it’s component parts done, which may have affected the results. The pre-testing of the message content was limited, too, and we thus do not have data on whether our participants considered the messages persuasive. This could be important theoretically, especially if the request size was considered large and our reasons were perceived as placebic or near-placebic. However, this might not be an issue in the first place, as participants had already agreed to wear the accelerometer as part of the trial. Message content (as explicated in hypothesis H2) may not play a role at all, if the real reason for non-compliance is e.g. leaving the house in a rush. In such a case, though, we would still expect those who are reminded to have increased wear times compared to those who are not reminded (hypothesis H1).
In this paper, we attempted to answer to the call of more stringent methodology by pre-registration. Optimally, this would have been done prior to beginning data collection. In these cases, it has been proposed that analyses should be considered exploratory —especially in the presence of researcher degrees of freedom or data-dependent analysis decisions —and can render p-values meaningless. In our case, this mistake turned out to be nonconsequential. We used Bayes factors to avoid claiming findings based on p-values alone, as recently warned against by the American Statistical Association . Other approaches we used to address the replicability problem were transparent reporting and open data.
Implications for practice
Our results, in line with some other studies e.g.  indicate that researchers should not expect simple reminders to have strong effects on accelerometer wear times among youth. Also, despite previous strong claims, the because-heuristic in this context lacks the strength attributed to it in the popular literature. When considering using SMS reminders for youth, we suggest ensuring that remembering plausibly plays the key role in compliance with the behavior and target group in question, instead of other determinants/factors (such as social norms or motivation). Participants’ coping skills and attention span may act as a ceiling to the potential effect of the reminder in situations where the target behavior can not immediately be carried out, so suitability of SMS reminders could be assessed in these respects as well.
Implications for future research
To an extent, the findings here apply to situations where cost-effective reminders can potentially improve compliance. These areas may range from medication adherence  to sunscreen use . An interesting hypothesis to test, would be whether reminders actually reduce active coping strategies that people use spontaneously – this could partly explain some null findings in the literature on technical reminder systems . Second, the delivery of the reminders should optimally be objectively trackable, in order to make firm conclusions about the independent effects of delivery and receipt. Third, the context (including timing and location) where the participant receives the reminder is likely to be important, as well as the coping behaviour of the control group. It may also be worthwhile to gauge whether altering frequency of reminders affects the target behavior , or if the system can be made such that it adapts to the users and their environments . Lastly, it might be worthwhile to investigate, if personally meaningful persuasive arguments work better than vague and general ones (e.g. contributing to science), which were used in order to minimise risk of participants changing their activity behaviour instead of merely the wear time behaviour. As the literature presented earlier suggests, any reasons should be enough for heuristic decision making, whereas good reasons may be needed for more reflective decisions. Further theorising and additional measures to test hypotheses based on dual process models may be fruitful – but we encourage researchers to see [74, 75, 76, 77, 78, 79], and also consider a wider perspective from complexity and systems theory [80, 81, 82, 83], which have recently been applied in public health [84, 85, 86].
In this research, we have found evidence against the assumed superiority of the naïve because-heuristic; providing reasons in simple compliance requests having a general persuasive effect on behaviour. By using Bayesian methods and equivalence testing, we were able to claim evidence of no effect for the because heuristic in this setting. Likewise, sending SMS reminders was not associated with improved accelerometer wear times. Although we did not randomise the no-SMS group, the changed recruitment procedure plausibly accounts for majority of the selection effect, and a more potent explanation for the lack of differing weartimes is reaching the ceiling of the participants’ ability to wear in the absence of very high motivation.
Our design had several limitations, which should be improved upon in future research. All in all, we remain pessimistic of the efficacy of the naïve because-heuristic and of simple reminders, even if they have a potent effect in participants’ perceptions.
We conclude that despite strong claims, there is reason to consider the study of the because-heuristic a degenerating research programme , although there may be some contexts where the technique works as intended. Seeking to increase accelerometry wear time in participants may benefit from a design using the intervention mapping approach , including a plausible theoretical framework.
We would like to thank Falko Sniehotta for his helpful comments during the conception of this study. We would also like to thank the research participants and the schools, as well as the research staff who aided in collecting the data.
The Let’s Move It study, within which this study was conducted, was funded by the Ministry of Education and Culture, funding number 34/626/2012 (years 2012–14), and OKM/81/626/2014, (years 2015–17), as well as the Ministry of Social Affairs and Health, funding number 201310238 (years 2013–15). Finalisation of the manuscript was done under funding by the Academy of Finland (MH: grant number 295765, NH: grant number 285283). The funding bodies played no role in the design of the study or writing the manuscript, nor the data collection, analysis, or interpretation.
Availability of data and materials
Data and materials will be available at https://osf.io/tbyaz/, when the anonymisation process of the full Let’s Move It trial data has been completed.
MH participated in study conception, methodology, performing the experiments, and writing the manuscript. He also wrote the initial draft of the paper and the code for analysis and visualisations, created the supplementary website, and was responsible for data management. KK and AH participated in methodology and preparing the manuscript, and in addition AH contributed to funding acquisition. TV provided resources (including funding and measurement instruments). NH participated in study conception, methodology and performing the experiments, as well as providing resources (including funding) and being responsible for research supervision and project administration. In addition, MH, KK, AH, TV and NH provided critical review, commentary and revision of the manuscript. All authors read and approved the final manuscript. Detailed authors’ contributions are presented in the CRediT contributor role taxonomy (Additional file 1: Appendix S1).
Ethics approval and consent to participate
This study was approved by the Hospital District of Helsinki and Uusimaa, The Ethics Committee for gynaecology and obstetrics, pediatrics and psychiatry (decision number 367/13/03/03/2014). All participants consented to participate in the study.
Consent for publication
The manuscript does not contain any individual person’s data.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 6.Audrey S, Bell S, Hughes R, Campbell R. Adolescent perspectives on wearing accelerometers to measure physical activity in population-based trials. Eur J Pub Health. 2012;cks081:475-80.Google Scholar
- 13.Pratkanis AR. Social influence analysis: an index of tactics. In: Pratkanis AR, editor. The science of social influence: advances and future progress. New York: Psychology Press; 2007. p. 17–82.Google Scholar
- 14.Cialdini RB, Goldstein NJ, Martin SJ. Influence: Science and practice. Boston: Pearson Education; 2009.Google Scholar
- 15.Blount J. Fanatical prospecting: the ultimate guide to opening sales conversations and filling the pipeline by leveraging social selling, telephone, email, text, and cold calling. New Jersey: John Wiley & Sons; 2015.Google Scholar
- 16.Goldman B. The Science of Settlement: Ideas for Negotiators. Pennsylvania: ALI-ABA; 2008.Google Scholar
- 17.Mortensen KW. Maximum influence: the 12 universal Laws of power persuasion. 2nd ed. New York: American Management Association; 2013.Google Scholar
- 18.Weinschenk S. The power of the word “because” to get people to do stuff. Psychol Today. 2013; https://www.psychologytoday.com/blog/brain-wise/201310/the-power-the-word-because-get-people-do-stuff. Accessed 5 Nov 2015.
- 19.Cialdini RB. Influence: Science and practice. 4th edition. USA: Arizona State University: Allyn & Bacon; 2001.Google Scholar
- 23.Makel MC, Plucker JA, Hegarty B. Replications in psychology research how often do they really occur? Perspect Psychol Sci. 2012;7:537–42.Google Scholar
- 28.National institute for Health and Welfare. School health survey 2015 results: Lifestyle. Terveyden ja hyvinvoinnin laitos. 2015. https://web.archive.org/web/20170306230805/https://www.thl.fi/fi/tutkimus-ja-asiantuntijatyo/vaestotutkimukset/kouluterveyskysely/tulokset/tulokset-aiheittain/elintavat. Accessed 4 Dec 2015.
- 29.Hankonen N, Heino MTJ, Araujo-Soares V, Sniehotta FF, Sund R, Vasankari T, et al. ‘Let’s move it’ – a school-based multilevel intervention to increase physical activity and reduce sedentary behaviour among older adolescents in vocational secondary schools: a study protocol for a cluster-randomised trial. BMC Public Health. 2016;16:451–66.CrossRefPubMedPubMedCentralGoogle Scholar
- 30.Hankonen N, Heino MTJ, Hynynen S-T, Laine H, Araújo-Soares V, Sniehotta FF, et al. Randomised controlled feasibility study of a school-based multi-level intervention to increase physical activity and decrease sedentary behaviour among vocational school students. Int J Behav Nutr Phys Act. 2017;14. https://doi.org/10.1186/s12966-017-0484-0.
- 31.Heino MTJ. No use reasoning with adolescents? A randomised controlled trial comparing persuasive messages. 2016. https://helda.helsinki.fi/handle/10138/163800. Accessed 7 Jun 2017.
- 32.Heino MTJ. Comparing persuasive SMS reminders: supplementary website. 2018. https://web.archive.org/web/20180828180801/https://heinonmatti.github.io/sms-persuasion/sms-persuasion-supplement.html. Accessed 21 Feb 2018.
- 33.Rowlands, A. V., Fraysse, F., Catt, M., Stiles, V. H., Stanley, R. M., Eston, R. G., & Olds, T. S. Comparability of Measured Acceleration from Accelerometry-Based Activity Monitors. Medicine & Science in Sports & Exercise, 2015;47(1), 201–210. https://doi.org/10.1249/MSS.0000000000000394
- 34.Aittasalo M, Vähä-Ypyä H, Vasankari T, Husu P, Jussila A-M, Sievänen H. Mean amplitude deviation calculated from raw acceleration data: a novel method for classifying the intensity of adolescents’ physical activity irrespective of accelerometer brand. BMC Sports Sci Med Rehabil. 2015;7:18.CrossRefPubMedPubMedCentralGoogle Scholar
- 36.Dienes Z. Understanding psychology as a science: an introduction to scientific and statistical inference. New York: Palgrave Macmillan; 2008.Google Scholar
- 41.R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2015.Google Scholar
- 44.Phillips N. yarrr: A companion to the e-book YaRrr!: The Pirate’s Guide to R. 2016. https://cran.r-project.org/web/packages/yarrr/vignettes/pirateplot.html.
- 47.Peters G.-J. Y. userfriendlyscience: Quantitative analysis made accessible. 2016. http://CRAN.R-project.org/package=userfriendlyscience.
- 48.Morey RD, Hoekstra R, Rouder JN, Lee MD, Wagenmakers E-J. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev. 2015. https://doi.org/10.3758/s13423-015-0947-8.
- 49.Heino MTJ, Vuorre M, Hankonen N. Bayesian evaluation of behavior change interventions: A brief introduction and a practical example. PsyArXiv. 2017. doi: https://doi.org/10.17605/OSF.IO/XMGWV.
- 50.Morey RD, Romeijn J-W, Rouder JN. The philosophy of Bayes factors and the quantification of statistical evidence. J Math Psychol. 2016. https://doi.org/10.1016/j.jmp.2015.11.001.
- 51.Etz A, Vandekerckhove J. Introduction to Bayesian Inference for Psychology. Psychon Bull Rev. 2018;25:5–34.Google Scholar
- 53.Morey RD, Rouder JN. BayesFactor: Computation of Bayes Factors for Common Designs. 2015. https://CRAN.R-project.org/package=BayesFactor.
- 55.Jamil T, Ly A, Morey RD, Love J, Marsman M, Wagenmakers E-J. Default “Gunel and dickey” Bayes factors for contingency tables. Behav Res Methods. 2015:1–15.Google Scholar
- 60.Lakens D, McLatchie N, Isager PM, Scheel AM, Dienes Z. Improving Inferences about Null Effects with Bayes Factors and Equivalence Tests. J Gerontol B. Psychol Sci Soc Sci. 2018;:gby065.Google Scholar
- 61.Harms C, Lakens D. Making “Null Effects” Informative: Statistical Techniques and Inferential Frameworks. J Clin Transl Res in press. doi: https://doi.org/10.31234/osf.io/48zca.
- 62.Hansen H. Fallacies. In: Zalta EN, editor. The Stanford encyclopedia of philosophy. Summer 2015. 2015. https://plato.stanford.edu/entries/fallacies/. Accessed 12 Mar 2016.
- 65.Dienes Z. Using Bayes to get the most out of non-significant results. Quant Psychol Meas. 2014;5:781.Google Scholar
- 69.Wasserstein RL, Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician. 2016;70:129–33.Google Scholar
- 72.Demonceau J, Ruppar T, Kristanto P, Hughes DA, Fargher E, Kardas P, et al. Identification and assessment of adherence-enhancing interventions in studies assessing medication adherence through electronically compiled drug dosing histories: a systematic literature review and meta-analysis. Drugs. 2013;73:545–62.CrossRefPubMedPubMedCentralGoogle Scholar
- 87.Lakatos I. History of science and its rational reconstructions: Springer; 1971. http://link.springer.com/chapter/10.1007/978-94-010-3142-4_7. Accessed 2 Dec 2015
- 88.Bartholomew Eldredge LK, Markham CM, Ruiter RA, Fernández ME, Kok G, Parcel GS. Planning health promotion programs: an intervention mapping approach. New Jersey: John Wiley & Sons; 2016.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.