Can smartphones be used to bring computer-based tasks from the lab to the field? A mobile experience-sampling method study about the pace of life
Researchers are increasingly using smartphones to collect scientific data. To date, most smartphone studies have collected questionnaire data or data from the built-in sensors. So far, few studies have analyzed whether smartphones can also be used to conduct computer-based tasks (CBTs). Using a mobile experience-sampling method study and a computer-based tapping task as examples (N = 246; twice a day for three weeks, 6,000+ measurements), we analyzed how well smartphones can be used to conduct a CBT. We assessed methodological aspects such as potential technologically induced problems, dropout, task noncompliance, and the accuracy of millisecond measurements. Overall, we found few problems: Dropout rate was low, and the time measurements were very accurate. Nevertheless, particularly at the beginning of the study, some participants did not comply with the task instructions, probably because they did not read the instructions before beginning the task. To summarize, the results suggest that smartphones can be used to transfer CBTs from the lab to the field, and that real-world variations across device manufacturers, OS types, and CPU load conditions did not substantially distort the results.
Keywords: Pace of life; Experience sampling; Smartphone; Well-being; Psychological pressure
In recent years, researchers have been increasingly using smartphones to collect scientific data (Dufau et al., 2011; Miller, 2012; Raento, Oulasvirta, & Eagle, 2009). Smartphones offer several advantages over conventional data collection devices (e.g., printed diaries; e.g., Harari et al., 2016) and have the potential to broaden knowledge about psychological concepts by allowing researchers to do research in the field instead of in the lab (Wrzus & Mehl, 2015; for an early similar attempt using a microphone sensor, see Mehl, Pennebaker, Crow, Dabbs, & Price, 2001). To date, smartphones have mostly been used to collect questionnaire data and/or data from the built-in sensors. In contrast, the potential for using smartphones to conduct computer-based tasks (CBTs)—that is, tasks that rely on computer programming to collect some form of nonquestionnaire data (e.g., reaction times or the results of a sorting task)—has been largely neglected. In fact, to date, only a handful of studies have used smartphones to conduct CBTs (Dufau et al., 2011; Kassavetis et al., 2016; Lee et al., 2016). Thus, there is little methodological information about how successfully smartphones can be used to collect such data. In the present study, we therefore use the example of a smartphone-based “tapping task” to explore how well smartphones can be used to conduct CBTs according to a range of criteria, including technologically induced dropout, task compliance, and the accuracy of time measurements.
CBTs are used widely in psychological research. Well-known examples of CBTs include the implicit association test (IAT; Greenwald, McGhee, & Schwartz, 1998) and the Stroop task (Stroop, 1935).¹ Nowadays, many CBTs are designed so that the task can be accessed from the Internet and completed with a desktop computer. These CBTs are either created using specific Web browser plugins and technologies (e.g., Flash, Shockwave, Java applets), or require participants to install specific players in order to perform the task on their own computers (e.g., Inquisit from Millisecond). Although these approaches to CBT design have their advantages, they also suffer from several drawbacks, such as discontinued support of certain web technologies (e.g., Java applets, Flash), the refusal of participants to install unknown software, or participants’ inability to install software due to missing computer administrator rights. It is well established that such CBT designs can lead to technologically induced dropout, which can potentially bias the results (e.g., Schwarz & Reips, 2001; Stieger, Göritz, & Voracek, 2011).
Unlike CBTs designed for desktop computers, CBTs designed for smartphones usually do not rely on an Internet browser. Instead, smartphone tasks typically use apps (i.e., applications), which are installed on the smartphone itself. Apps can be used to present questionnaires, retrieve information from the built-in sensors, store a participant’s data and send them via the Internet or phone line to the researcher’s server, and send bings/signals (i.e., reminders) in longitudinal studies (e.g., experience-sampling method [ESM] designs; Bolger & Laurenceau, 2013), along with many other functions. As such, smartphone apps can serve a variety of purposes, which can greatly increase the richness of the data that can be collected. For example, with digital devices such as personal digital assistants (PDAs) or smartphones, it is possible to record the time at which the participant completes a questionnaire, which provides more information about the participants’ compliance (Stone, Shiffman, Schwartz, Broderick, & Hufford, 2002).
Interestingly, despite the potential advantages of smartphone data collection procedures, to date few studies have used smartphone apps to conduct CBTs (for exceptions, see Dufau et al., 2011; Kassavetis et al., 2016; Lee et al., 2016). Thus, currently little information is available about how successfully smartphones can be used to conduct CBTs from a methodological point of view. Dufau and colleagues conducted a lexical decision task using Internet-connected iPhone/iPad devices. They found that response time distributions in their online tasks were very similar to those found in lab studies, but it is unclear whether this also applies to smartphones using an Android operating system (OS), which are heterogeneous with regard to OS type and manufacturer (Götz, Stieger, & Reips, 2017). Furthermore, it is largely unknown whether using smartphones to conduct CBTs might lead to technologically induced or design-specific dropout or measurement inaccuracy (e.g., due to incompatibilities or tasks not being displayed as intended by the researcher). For example, Stisen and colleagues (2015) found substantial differences in the accuracy of the accelerometer sensor across different smartphone devices (i.e., there might also be differences in the accuracy and precision of other sensors).
In the present study, we addressed the gap in methodological knowledge about how well smartphones can be used to conduct CBTs. We embedded our methodological questions into a project about the pace of life and its correlates (e.g., Garhammer, 2002; Levine & Bartlett, 1984; Levine & Norenzayan, 1999; Rosa, 2003). Specifically, we conducted a smartphone app study using an ESM design. We assessed pace of life twice a day for three weeks. We measured pace of life in two ways. First, we used the classical direct approach by asking participants to use a visual analogue scale (VAS; Reips & Funke, 2008) to answer the question, “How do you perceive your pace of life at the moment?” Second, we developed a computer-based tapping task in which participants had to tap on the smartphone’s touchscreen display for 10 s according to their current pace of life. Past research has shown that walking speed can be used as a measure of pace of life (e.g., Levine & Bartlett, 1984). Hence, a faster pace of life seems to be reflected in the speed of body movements. If this is the case, then tapping with one’s finger might also be an indicator of one’s pace of life. Tapping tasks in general have frequently been used in functional neuroimaging studies (e.g., Witt, Laird, & Meyerand, 2008) and within the medical sciences for assessing motor system abnormalities such as with Parkinson’s disease (e.g., Lee et al., 2016).
We evaluated how well the CBT task could be implemented using smartphones, on the basis of a number of criteria. First, we explored whether there were indications of technological problems (e.g., technologically induced dropout). Given that CBTs are generally not as self-explanatory as simple questionnaires (e.g., sets of questions with a predefined answering format such as Likert-type scales), we assessed participants’ task compliance. In the laboratory, experimenters can easily provide instructions and address any arising problems, but this is difficult when tasks are conducted on the Internet or in the field. With Internet-based tasks, participants are usually given written instructions, but it is questionable whether participants always comply with such instructions (see Stieger & Reips, 2010). Furthermore, we assessed the accuracy of millisecond measurements. Millisecond measurements formed the basis of our dependent variable (the number of finger taps, described below) and are a critical part of many other CBTs (e.g., reaction time tasks). The running of many parallel processes on a computer or smartphone can result in distorted time measurements. In lab experiments using desktop computers, it is possible to turn off any potentially interfering processes. The number of parallel processes can hardly be controlled in smartphone studies. Thus, inaccurate millisecond measurement represents a potential problem associated with using smartphones to conduct CBTs. Finally, we compared the predictive validity of the CBT to the VAS.
The sample was recruited through word of mouth in southern Germany. A total of 295 people installed the study app and indicated informed consent. After excluding people who did not participate or only participated once during the longitudinal part of the study, the data from 246 participants remained for analyses (53% women, 44% men, 3% did not disclose their sex). Reported participant age ranged from 16 to 70 years (M = 25.2, SD = 10.9). About half of the participants indicated that they were students (54.1%).
The participants who dropped out of the study (n = 49) did not differ from the participants who filled in the longitudinal questionnaire at least twice (n = 246) with regard to reported sex (χ² < 0.1, df = 1, p = .88; odds ratio OR = 1.06, 95% CI: 0.48, 2.33) or age (Mann–Whitney U: z = −1.28, p = .20). However, dropouts were more frequently iOS users (χ² = 5.5, df = 1, p = .02; odds ratio OR = 2.23, 95% CI: 1.13, 4.40).
The smartphone study on pace of life used a mobile experience sampling methodology (mESM; real-time and multiple time point measurements using mobile devices). The app prompted participants to provide ratings twice a day. The longitudinal part of the study lasted three weeks for a total of 42 measurement occasions. Throughout the study, participants were in their natural surroundings and evaluated their present situation. Real-time data is usually more accurate than retrospective self-report data (e.g., Conner, Tennen, Fleeson, & Barrett, 2009). The design allowed us to analyze the methodological characteristics of the task longitudinally (e.g., whether possible reaction time measurement distortions appeared only once or systematically across time).
After completing the longitudinal part of the study, participants completed an Internet-based post-test questionnaire, which is not part of this study. Participation was compensated with either the chance to win one of two Amazon gift cards worth €20 each (entry was optional) or with course credit. The study was conducted in German.
Pace of life
Paradata: Time to complete the questionnaire²
As an indirect measure of pace of life, we assessed the time that participants needed to complete the whole app questionnaire. Pace of life seems to be associated with a faster working speed (see Levine & Bartlett, 1984). Participants with a faster pace of life should therefore also complete the questionnaire more quickly than participants with a comparatively slower pace of life.
Correlates of pace of life from previous studies: Well-being and psychological pressure
To explore the predictive validity of the tap measure, we assessed well-being and psychological pressure as two variables that had been related to pace of life in previous research (Garhammer, 2002). Well-being was assessed with the item, “How is your current well-being?” and a VAS (0 = very bad; 100 = very good). Psychological pressure was assessed with the item, “How strongly are you currently bothered by things you actually have to do, but haven’t yet?” and a VAS (0 = not bothered, 100 = very bothered).
When the app was first opened, participants had to provide informed consent and were asked about their basic demographics (age, sex, and country of origin). The first two screens (informed consent, demographics) were only presented once, during the first administration. Afterward, the main screen appeared. The main screen showed a counter indicating when the next measurement would take place. The app randomly produced a trigger (i.e., in-app reminder) to conduct the measurement within two predefined time windows (morning 5:00 to 12:00; afternoon/evening 15:00 to 24:00). Participants could select their own time frame within these predefined times (i.e., personal time windows could only be smaller, not larger).³ Furthermore, on the main screen, participants had the possibility to request personal statistics in a graphical format (for an example, see Fig. 1).⁴
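The trigger logic described above can be illustrated with a minimal Python sketch. It is not the study app's code: the function names (narrow_window, draw_trigger_minute, format_minute), the representation of times as minutes after midnight, and the uniform distribution of triggers within a window are all assumptions made for illustration. Only the two default windows are taken from the text.

```python
import random

# Default windows from the text, in minutes after midnight.
DEFAULT_WINDOWS = {"morning": (5 * 60, 12 * 60), "evening": (15 * 60, 24 * 60)}

def narrow_window(default, personal):
    """Clip a personal window so it can only be smaller than the default."""
    start = max(default[0], personal[0])
    end = min(default[1], personal[1])
    if start >= end:
        raise ValueError("personal window does not overlap the default window")
    return (start, end)

def draw_trigger_minute(window, rng):
    """Draw one trigger minute inside [start, end); uniformity is an assumption."""
    start, end = window
    return rng.randrange(start, end)

def format_minute(minute):
    """Render minutes after midnight as HH:MM."""
    return f"{minute // 60:02d}:{minute % 60:02d}"

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
# Hypothetical participant asking for 7:30 to 13:00; the end is clipped to 12:00.
personal = narrow_window(DEFAULT_WINDOWS["morning"], (7 * 60 + 30, 13 * 60))
trigger = draw_trigger_minute(personal, rng)
print(personal, format_minute(trigger))
```

Clipping rather than rejecting an oversized personal window is one possible design choice; an app could equally refuse the input and ask the participant to choose again.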
When the app generated a trigger, the participant’s smartphone alerted him or her that it was time to answer the study questions. After the participant tapped on the alert, the app automatically started, and the participant was presented with the study questions on three successive screens (see Fig. 1).
First, we examined how many participants dropped out of the study. We also analyzed whether technological problems with the tapping task might have led to dropout. The tapping task was on the third page and was followed by another page with questionnaire items. We thus compared the rate of missing data on the different pages. If missing data were higher on the fourth than on the other pages, it would suggest that the automatic forwarding after 10 s from the tapping task page to the last page did not work.
To assess millisecond measurement accuracy, we calculated how long a particular participant at a particular measurement occasion had his or her finger on and away from the touchscreen (i.e., the sum of all durations that the finger rested on the touchscreen, plus the sum of all durations the finger was away from the touchscreen). If the millisecond measurement was perfectly accurate, the total duration should sum up to 10,000 ms, the time at which the app automatically loaded the next questionnaire page. We examined the distribution and central tendencies of the total recorded time on and away from the touchscreen during the tapping task to determine the extent to which the millisecond measurements were accurate.
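As a rough illustration of this accuracy check, the following Python sketch sums the finger-down and finger-up durations of each tapping trial and compares the total with the 10,000 ms task length. The data layout (lists named touch_ms and release_ms), the function names, and the three example trials are hypothetical and not taken from the study data.

```python
from statistics import median

TASK_DURATION_MS = 10_000  # task length stated in the text

def total_recorded_ms(touch_ms, release_ms):
    """Sum of all finger-down durations plus all finger-up durations."""
    return sum(touch_ms) + sum(release_ms)

def summarize(trials, expected=TASK_DURATION_MS):
    """Describe how far the recorded totals deviate from the expected length."""
    totals = [total_recorded_ms(t["touch_ms"], t["release_ms"]) for t in trials]
    over = [x for x in totals if x > expected]
    return {
        "median_total": median(totals),
        "n_over": len(over),
        "share_over": len(over) / len(totals),
        "max_abs_deviation": max(abs(x - expected) for x in totals),
    }

# Hypothetical trials (milliseconds), for illustration only.
trials = [
    {"touch_ms": [120, 110, 130], "release_ms": [3200, 3100, 3319]},
    {"touch_ms": [100, 100], "release_ms": [4900, 4919]},
    {"touch_ms": [90, 95, 85, 100], "release_ms": [2400, 2410, 2405, 2415]},
]
summary = summarize(trials)
print(summary)
```

Totals below the expected value would correspond to a final interval that was not stored, and totals above it to a delayed page change, in line with the interpretation given in the results.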
Finally, to examine the predictive validity of the tap measure and the VAS measure of pace of life, we first calculated the correlation between the two measures. Next, we assessed the extent to which fill-in time, psychological well-being, and psychological pressure predicted each of the two measures of pace of life. The distribution of fill-in time was skewed. We therefore log-transformed this variable prior to analysis (log + 1). We also included the time-dependent number of the assessment as a predictor to assess a possible effect of time on pace of life, in which case the separation of the between- from within-subjects variance would be biased but solvable through detrending (Curran & Bauer, 2011). Because the current dataset represents a mixture of independent (data across participants) as well as dependent (re-tests within participants) data, we used R (package “nlme”) to calculate two multilevel models with random intercepts and coefficients and the two pace of life measures as the dependent variables. Daily observations (Level 1) were nested within participants (Level 2). To maximize information from the available data and to separate within- from between-participants variance, we followed the CWC(M) approach (i.e., centered within context with reintroduction of the mean at Level 2; Curran & Bauer, 2011). Specifically, we centered the Level 1 variables around the person-means (i.e., the personal average of each individual) and included the person means as predictors at Level 2. Thus, the Level 1 variables capture the within-person variance (i.e., the extent to which a particular measurement deviated from that person’s personal average) whereas the variable means on Level 2 capture the between-person variance (i.e., differences between participants). Because Level 1 variables represent data from multiple retests, which are often correlated, we controlled for autocorrelations. 
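The person-mean centering step of the CWC(M) approach can be sketched as follows. This is an illustrative Python example, not the R code used for the analyses; the function name cwc_m, the record layout, and the example values are assumptions. Each Level 1 value is split into a within-person deviation and a person mean that can be entered as a Level 2 predictor.

```python
from collections import defaultdict
from statistics import mean

def cwc_m(records, person_key, var):
    """Add a within-person deviation and the person mean for one variable."""
    values_by_person = defaultdict(list)
    for rec in records:
        values_by_person[rec[person_key]].append(rec[var])
    person_means = {p: mean(v) for p, v in values_by_person.items()}

    centered = []
    for rec in records:
        pm = person_means[rec[person_key]]
        centered.append({
            **rec,
            f"{var}_within": rec[var] - pm,  # Level 1: deviation from own average
            f"{var}_between": pm,            # Level 2: the person's average
        })
    return centered

# Hypothetical well-being ratings (0-100 VAS) for two participants.
data = [
    {"id": "A", "wellbeing": 40.0},
    {"id": "A", "wellbeing": 60.0},
    {"id": "B", "wellbeing": 80.0},
]
for row in cwc_m(data, "id", "wellbeing"):
    print(row)
```

By construction, the within-person deviations of each participant sum to zero, so the Level 1 predictor carries no between-person information.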
We compared the results of the multilevel model to the results of earlier research on pace of life (Garhammer, 2002). We also examined the intraclass correlations (ICC) for the VAS and tap measures to assess the extent to which the variance could be attributed to between- and within-subjects differences. Because we questioned the validity of low tap values (see above), we excluded cases with fewer than four taps from these analyses.
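The exclusion rule and the intraclass correlation can be illustrated with a short Python sketch. The ICC is written here in its common form for a random-intercept model, between-person variance divided by total variance, which is an assumption about the exact estimator. The function names, the record layout, and all numeric values are hypothetical.

```python
MIN_TAPS = 4  # cases with fewer than four taps are excluded, as stated in the text

def drop_low_taps(records, min_taps=MIN_TAPS):
    """Keep only measurement occasions with at least `min_taps` taps."""
    return [rec for rec in records if rec["taps"] >= min_taps]

def icc(between_var, within_var):
    """Share of total variance attributable to between-person differences."""
    return between_var / (between_var + within_var)

# Hypothetical measurement occasions.
occasions = [
    {"id": "A", "taps": 2},
    {"id": "A", "taps": 17},
    {"id": "B", "taps": 4},
    {"id": "B", "taps": 31},
]
kept = drop_low_taps(occasions)
print(len(kept), icc(between_var=2.0, within_var=6.0))
```

An ICC of 0.25 in this hypothetical case would mean that a quarter of the variance lies between participants and the remainder within participants across occasions.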
Results and discussion
Technologically induced problems and dropout
The dropout rate was very low for a longitudinal design. Only 14 participants dropped out before the end of the first week, and another 11 participants by the end of the second week. Of the 6,000+ completed questionnaires, there were only 18 instances in which a participant did not complete the fourth (last) questionnaire page. This suggests that the automatic forwarding after 10 s from the tapping task page to the last page functioned correctly.
As can be seen from Fig. 2 (panel A), the VAS data were just about normally distributed, but there were slightly more responses in the middle of the scale (= 50) than in a normal distribution. Participants may have more frequently used the middle category because they really did have a “medium” pace of life more frequently, or because they were not willing or able to provide information about their pace of life (i.e., a sign of noncompliance; Kulas & Stachowski, 2009).
Figure 2B displays the distribution of the number of taps from the tapping task. This distribution is rather skewed, with substantially more values between 1 and 3 than would be expected in a normal distribution. This probably occurred because some participants did not read the instructions (i.e., they first tapped the circle a couple of times, which started the counter, and then read the instructions). Furthermore, the high frequency of low tap values could also represent technical problems during the task (for more details, see Fig. 2C, gray rectangle).
Inspection of the bar charts indicated a high frequency of low tap values (<4) even at the last measurement occasion (graphs have been omitted for brevity). Thus, the bar graphs did not suggest that low tap values were due to a misunderstanding of the task instructions that faded over time. The frequency of low tap values did not differ significantly across the different time points, χ² = 19.6, df = 43, p = .99. However, we did observe a significant, positive Kendall’s tau-b correlation between low-versus-high tap values, on the one hand, and measurement occasion, on the other hand (r = .023, p = .028). Specifically, there were more low tap values at the beginning of the study, although the effect was very small. The standardized residuals from the cross-tabulation were higher than expected for the low taps at the beginning of the study (standardized residuals #1, #3, #5, and #6 were larger than 1), and lower than expected at the end of the study (e.g., 16 of the last 20 low tap counts had negative standardized residuals; i.e., counts were lower than expected). This again suggested that low tap values were more frequent at the beginning than at the end of the study. Nevertheless, low tap values were recorded throughout the entire course of the study (e.g., on the last day of the study, six participants [2.4%] still had fewer than four taps). Furthermore, 11 participants (4.5%) had very high rates of low tap values over the entire course of the study, potentially representing either technological problems or noncompliance.
Millisecond measurement accuracy
The median time recorded on and away from the touchscreen during the tapping task was 9,779 ms (M = 9,289, SD = 1,700). Closer inspection of the data revealed that the deviation from 10,000 occurred due to the last millisecond value sometimes not being stored because the app had already loaded the next question. In 306 cases (4.1%), the sum of all milliseconds was more than 10,000, with a median of 10,019 ms (M = 10,436, SD = 988; range 10,001 to 19,034). It is highly probable that values higher than 10,000 occurred because other processes were running on the smartphone’s CPU, which led to a slowdown of the processor. In general, there were few cases in which the sum was greater than 10,000 ms, and the bias was very small (the mode was 10,001). This is indicative of the high accuracy of the millisecond measurements.
Predictive validity of the VAS and tapping task measures of pace of life
[Table: Results of the multilevel model analysis with visual analog scale (VAS) score and tap measures of pace of life as the dependent measures. Only the caption and the row label "Time of assessment" are recoverable.]
In the present study, we used a tapping task to measure pace of life to analyze whether smartphones can be used to successfully transfer CBTs from the lab to the field (e.g., Dufau et al., 2011). We found that the smartphone CBT functioned quite well. First, although some participants did not appear to read the instructions and/or did not comply with the task’s requirements (for a similar result when using online questionnaires, see Stieger & Reips, 2010), task noncompliance was small and decreased over time. Nevertheless, researchers need to be aware that more complex CBTs might produce higher rates of noncompliance.
Second, we found low rates of dropout, and no evidence of more dropout on the tapping task page relative to the other pages. This is a very encouraging result because technology-induced dropout has been frequently found in online questionnaire studies using technologies other than HTML (Schwarz & Reips, 2001; Stieger et al., 2011). Nevertheless, we recommend that researchers using smartphones to conduct CBTs check dropout rates in great detail.
Third, in the present study we found evidence that smartphones could measure milliseconds quite accurately. Thus, other processes running in parallel to the study app did not appear to substantially influence the accuracy of smartphone millisecond measurements. This result is in line with the results of Dufau et al. (2011), who used iPads and iPhones to measure milliseconds. In the present study, participants had smartphones from many different manufacturers with many different Android operating system versions. The results with regard to measurement accuracy therefore appear to generalize to Android smartphones as well (see also Götz, Stieger, & Reips, 2017). Importantly, it seems that using smartphones as opposed to computers will not substantially affect the results of CBTs involving millisecond measurements (e.g., Implicit Association Test; Greenwald et al., 1998) or exact timing (e.g., speeded computer tasks; MouseTracker; Freeman & Ambady, 2010). Hence, smartphones appear to have considerable potential for allowing researchers to transfer CBTs from the lab to the field. Nevertheless, more systematic research will be needed before smartphones can be used for tasks in which single-millisecond measurements are of importance (for an example regarding Web experiments, see Keller, Gunasekharan, Mayo, & Corley, 2009).
Despite these limitations, we believe that the present results point to the potential of using smartphones to conduct CBT studies. The results suggest that variations in smartphone manufacturers, OS types, and CPU load conditions probably do not substantially distort the results of CBTs when transferred from the lab to the field.
³ During the installation process, 70 participants accepted the early time frame 5:00 to 12:00, and 101 participants accepted the later time frame 15:00 to 24:00 as is (61 participants accepted both time frames). All the other participants used the option to adjust the time frames. During the study, 57 participants changed the time frames; 45 changed them once, six changed them twice, and another six changed them more than twice. The mean time frames chosen by all participants were from 7:36 (SD = 1:56) to 11:17 (SD = 1:24) and from 16:18 (SD = 1:48) to 22:10 (SD = 1:57).
⁴ Participants requested the following graphics at least once: 86% the calendar, 83% the mean pace-of-life score across all participants, 72% a line chart of one’s own well-being over the course of a day, 75% a scatterplot with a regression line displaying the association between pace of life and well-being, and 71% a world map with data about the participants’ countries.
- Bolger, N., & Laurenceau, J.-P. (2013). Intensive longitudinal methods: An introduction to diary and experience sampling research. New York, NY: Guilford.
- Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
- Dufau, S., Duñabeitia, J. A., Moret-Tatay, C., McGonigal, A., Peeters, D., Alario, F.-X., . . . Grainger, J. (2011). Smart phone, smart science: How the use of smartphones can revolutionize research in cognitive science. PLoS ONE, 6, e24974. https://doi.org/10.1371/journal.pone.0024974
- Harari, G. M., Lane, N. D., Wang, R., Crosier, B. S., Campbell, A. T., & Gosling, S. D. (2016). Using smartphones to collect behavioral data in psychological science: Opportunities, practical considerations, and challenges. Perspectives on Psychological Science, 11, 838–854. https://doi.org/10.1177/1745691616650285
- Lee, C. Y., Kang, S. J., Hong, S.-K., Ma, H.-I., Lee, U., & Kim, Y. J. (2016). A validation study of a smartphone-based finger tapping: Application for quantitative assessment of bradykinesia in Parkinson’s disease. PLoS ONE, 11, e0158852. https://doi.org/10.1371/journal.pone.0158852
- Mehl, M. R., Pennebaker, J. W., Crow, D. M., Dabbs, J., & Price, J. H. (2001). The Electronically Activated Recorder (EAR): A device for sampling naturalistic daily activities and conversations. Behavior Research Methods, Instruments, & Computers, 33, 517–523. https://doi.org/10.3758/BF03195410
- Stisen, A., Blunck, H., Bhattacharya, S., Prentow, T. S., Kjærgaard, M. B., Dey, A., . . . Jensen, M. M. (2015). Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In J. Song, T. Abdelzaher, & C. Mascolo (Eds.), Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (SenSys) (pp. 127–140). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/2809695.2809718