1 Introduction

1.1 Participant Retention and Completion in MOOCs

Massive Open Online Courses (MOOCs) demonstrate the potential of scaling higher education by means of digital media and the Internet. More than 100 million participants have signed up for 11,400 courses from 900 universities around the globe [27]. MOOCs enable participants of different academic backgrounds to study at any time and in any place, to enhance their learning experience and to gain important 21st-century skills for free or at significantly lower cost. Despite this potential, MOOCs have been criticized for low retention and completion rates [10, 22], which often drop below 10% of the participants who registered for the course [5, 13, 18].

1.2 Intention-Fulfillment

Some researchers have questioned whether completion rates and completion certificates are the appropriate measures for evaluating the success of this new form of lifelong learning [12, 21]. Their basic claim was that the success of lifelong learning in MOOCs should be evaluated not through traditional instructor-focused measures such as dropout rates and the earning of completion certificates, but rather through learner-centered measures that take into account the informal nature of MOOC learning. One such measure is intention-fulfillment (IF), which measures the extent to which learners fulfilled the initial intentions they had when accessing the course. This measure takes into account the personal objectives that the learners intend to achieve, rather than external success criteria [12]. In MOOCs and in other forms of open education, students may enroll with different intentions that affect their learning behavior [17, 19, 30]. From that point of view, a successful learning experience can take a variety of forms, ranging from viewing a single lecture, attaining a specific skill, or studying a topic of interest, to studying a whole course and fulfilling all of its formal requirements. Thus, the participants' intentions and their fulfillment should take center stage when evaluating the participants' success in the course.

1.3 Learning Activity Sequences

Learning behavior in MOOCs is mostly visible through logs, which record access and usage patterns of the different course resources (e.g. video lectures and quizzes). Many MOOC studies are based on simple access logs, counting each time the learner accessed or used a course resource, but ignore the order of the activities and their sequential nature [16]. Taking into consideration only the number of activities that the participants performed, and ignoring their sequence, provides only a partial picture. For example, as demonstrated by Li et al. [16], consider three imaginary participants who watched videos (V) and answered quiz questions (Q). One of them might watch all the videos and then answer the quizzes (V-V-V-Q-Q-Q), while another might first try to answer the quiz questions and only then watch the video lectures (Q-Q-Q-V-V-V). A third participant might follow each video with a quiz (V-Q-V-Q-V-Q). Although all three fictional participants watched three videos and answered three quizzes, their learning paths, or sequences, are fundamentally different.
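The contrast between simple counts and sequences can be illustrated with a minimal Python sketch. The three sequences are the fictional ones from the example above; the participant labels A, B and C and the helper name `bigrams` are assumptions introduced only for illustration.

```python
from collections import Counter

# The three fictional participants from the example above.
sequences = {
    "A": ["V", "V", "V", "Q", "Q", "Q"],
    "B": ["Q", "Q", "Q", "V", "V", "V"],
    "C": ["V", "Q", "V", "Q", "V", "Q"],
}


def bigrams(seq):
    """Return the list of adjacent activity pairs in a sequence."""
    return list(zip(seq, seq[1:]))


# Simple access counts are identical for all three participants.
activity_counts = {p: Counter(s) for p, s in sequences.items()}

# The bigram counts differ, exposing the different learning paths.
bigram_counts = {p: Counter(bigrams(s)) for p, s in sequences.items()}

for p in sequences:
    print(p, dict(activity_counts[p]), dict(bigram_counts[p]))
```

All three participants have the same activity counts (three V and three Q), while their bigram profiles differ: only the third participant, for instance, ever moves from a quiz back to a video more than once.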

Several researchers have attempted to understand differences between the learning paths of MOOC participants who passed or failed a course. Learners who passed the course were found to follow a path with different characteristics than those who did not pass [7, 11]. For example, replaying videos more than once, and watching a relatively high percentage of the course videos, were positively correlated with finishing the MOOC [28]. Similarly, Van den Beemt, Buijs and Van der Aalst [29] found that successful students exhibit steadier learning behavior and that this behavior is highly related to regularly watching successive course videos in batches.

Several studies have used natural language processing (NLP) features to study MOOC participants' dropout and retention, mainly by studying the language students use [6, 14, 23]. However, we found only a few studies that applied NLP methods, such as n-gram analysis, to study learner activity sequences [16]. None of those studies used NLP methods to predict subjective success outcomes in MOOCs such as intention-fulfillment. In this study, we apply methods that originate from the NLP realm to analyze learning activities and learning activity sequences, and to compare those activities and activity sequences between participants who report high-IF and participants who report low-IF.

2 Method

2.1 Sample

In the current study, we used clickstream data gathered from the log files of 462 participants in a MOOC teaching English as a Second Language (ESL) to identify the learning process of the participants. The data collection was carried out between July 2016 and February 2018. During this period, the participants were able to join and leave the MOOC whenever they liked.

2.2 Course Activities and Their Annotations

MOOCs usually comprise modules such as video lectures, quizzes and other resources [15]. The manner in which students interact with these course resources is considered a conceptualization of their higher-order thinking, which leads to knowledge construction [4]. In this ESL-MOOC, the participants were able to choose among ten different types of activities in any order, place and time. The course was arranged by units. Each unit contained an introductory page (I). This page pointed participants to several additional resources: a list of learning strategy videos (S), a PDF reading comprehension text that is used throughout the unit (P), a recommended learning track (T), several lessons (L), quizzes (Q) and a final exam (E). Each of the lessons comprises a single video (V) and links to specific learning strategy videos (S). Participants who watched videos could click the video play/pause button according to their personal progress during the video lecture. Although the course does not provide academic credit, the participants could get a participation badge (B) if they answered all the questions in the quizzes and achieved a predefined minimum score. The participants were also able to view the list of rights (R) (credits) of the course materials. In total, we harvested 61,713 activities. It is important to note that the logs recorded only clicks, and did not record other activities (e.g. reading text, feedback on quizzes). Table 1 summarizes the course's activities, their codes, and a short description of each.

Table 1. Course activities – codes and description.

2.3 Computational Tool Kit for Sequence Analysis

Preprocessing: In order to use the NLP tools to analyze learning sequences, each participant’s sequence of learning activities was coded as mentioned above in Table 1.

For the sequence analysis, we used AntConc 3.5.7, a multiplatform toolkit developed for carrying out corpus linguistics research and data-driven learning [1, 2]. Specifically, we used two NLP methods: the n-gram tool and the keyness tool.

The n-gram tool allows us to find common “expressions”, i.e., common sequences of activities, and their transitional probabilities. In the current study, the n-gram analysis consisted of uni-, bi-, tri-, and four-gram calculations by AntConc. For each group separately (high-IF or low-IF), we sorted the n-gram lists according to their probability values. We then excluded activities with a probability below 0.1, and calculated two measures:

  1. The relative frequency of each n-gram sequence was calculated by dividing the absolute frequency of that n-gram sequence of activities by the total number of n-grams in that group. For example, the bi-gram sequence V-V occurred 6,767 times in the low-IF group, which was divided by 25,742 (total number of bi-grams in that group), resulting in a relative frequency of 26%.

  2. Participation range was calculated by dividing the number of participants that performed each n-gram sequence of activities by the total number of participants in that group. Thus, the participation range is the relative distribution (entropy) of each n-gram sequence. For example, 186 participants out of the 231 participants in the low-IF group performed the V-V sequence. Therefore, the relative distribution of this sequence is 81%.
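The two measures can be sketched in a few lines of Python. This is an illustrative reconstruction and not the AntConc implementation: the function names and the toy group are hypothetical, n-grams are assumed to be counted within each participant's sequence (never across participants), and the probability threshold of 0.1 is not applied.

```python
from collections import Counter


def ngrams(seq, n):
    """All contiguous n-grams of one participant's activity sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def relative_frequency(group, n):
    """Share of each n-gram among all n-grams produced by the group."""
    counts = Counter(g for seq in group for g in ngrams(seq, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def participation_range(group, n):
    """Share of participants who performed each n-gram at least once."""
    counts = Counter(g for seq in group for g in set(ngrams(seq, n)))
    return {g: c / len(group) for g, c in counts.items()}


# Hypothetical toy group of three coded sequences (not the study data).
toy_group = [
    ["I", "V", "V", "Q"],
    ["I", "V", "V", "V"],
    ["T", "Q", "Q", "E"],
]

freq = relative_frequency(toy_group, 2)
rng = participation_range(toy_group, 2)
print(freq[("V", "V")], rng[("V", "V")])

# The two worked figures reported in the text for V-V in the low-IF group.
reported_frequency = round(6767 / 25742, 2)
reported_range = round(186 / 231, 2)
print(reported_frequency, reported_range)
```

In the toy group, V-V accounts for three of the nine bigrams but is performed by only two of the three participants, which shows why the two measures are reported separately.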

The keyness analysis was carried out in order to identify the activities that are unusually frequent (or infrequent) in one group in comparison with the activities in the other group. The keyness analysis provides an indication of a keyword’s importance as a content descriptor in a given corpus relative to a reference corpus [3]. “A word is said to be ‘key’ if […] its frequency in the text when compared with its frequency in a reference corpus is such that the statistical probability as computed by an appropriate procedure is smaller than or equal to a p-value specified by the user” [25]. The statistical significance of keyness is calculated by using the value of log likelihood [2, 26] and the size of the differences is calculated by effect size [9].
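As an illustration of how a log-likelihood keyness value can be obtained, the following Python sketch implements the common two-term log-likelihood formula used in corpus linguistics. It is an assumption that this variant matches the one applied in the study; the text does not state which log-likelihood variant or effect-size measure was selected, so the effect size is not sketched here. The function name and all counts are hypothetical.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood keyness.

    a, b: frequency of the item in the target and reference corpus.
    c, d: total size (number of tokens) of the target and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts, not the study data.
ll_same = log_likelihood(100, 100, 1000, 1000)  # equally frequent item
ll_diff = log_likelihood(150, 50, 1000, 1000)   # item over-used in the target
print(ll_same, ll_diff)
```

An item that is equally frequent in both corpora yields a value of zero, and the value grows as the observed frequencies move away from the expected ones. Values above 3.84 correspond to p < .05 and values above 10.83 to p < .001 (chi-square distribution with one degree of freedom).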

2.4 Dependent Variable

The fulfillment of the initial intention (IF) was measured by four items on a 7-point Likert scale ranging from 1 ‘totally don’t agree’ to 7 ‘strongly agree’ (e.g. ‘I achieved my personal learning goals by participating in this MOOC’, ‘the MOOC met my expectations’; Cronbach’s alpha = .89). The participants were split into two groups according to their post-course IF level, divided at the sample median (med = 4.75). Two hundred and twenty participants were identified as high-IF and 242 participants were identified as low-IF. Participants who carried out fewer than four activities were not included in the sample, leaving a total of 445 participants: 214 with high-IF and 231 with low-IF. Due to the anonymization process, no demographic information was available about the participants.
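The grouping step can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: each participant's IF score is the mean of the four items, and scores equal to the median are assigned to the low-IF group (which would be consistent with the low-IF group being the larger one). The function name, participant identifiers and scores are hypothetical.

```python
from statistics import median


def median_split(scores):
    """Split participants into high and low groups around the sample median.

    Assumption: only scores strictly above the median are labeled "high";
    scores equal to the median fall into the "low" group.
    """
    med = median(scores.values())
    groups = {pid: ("high" if s > med else "low") for pid, s in scores.items()}
    return med, groups


# Hypothetical IF scores, each taken to be the mean of four 7-point items.
if_scores = {"p1": 6.5, "p2": 3.25, "p3": 4.75, "p4": 5.0, "p5": 2.0}

med, groups = median_split(if_scores)
print(med, groups)
```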

3 Results

In the following section, we first present the differences between the two groups (high and low IF) in total activities per participant. We then present the learning sequence findings using the n-gram and keyness measurements.

Table 2 shows the descriptive statistics of the number of activities per participant in each group. In total, 61,713 activities were analyzed (high-IF = 35,790; low-IF = 25,973). The non-parametric Mann-Whitney U test indicated that the number of activities per participant was significantly higher for the high-IF group compared to the low-IF group (U = 17223.5, p < .001). To check whether the two groups differ in their level of heterogeneity, we tested whether the standard deviations in the number of activities are significantly different between the low and the high IF groups. Levene’s test of the homogeneity of group variances showed a significant difference (F(1,443) = 1.46, p < .05). Although the average number of activities is higher in the high-IF group than in the low-IF group, the standard deviation of the number of activities and the maximum activities per participant are both higher in the low-IF group (see Table 2).
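For readers unfamiliar with the Mann-Whitney U statistic, the following Python sketch shows how it is obtained from two samples by pairwise comparison. It computes only the statistic, not the p-value, and it is not the procedure or software used in the study. The function name and the activity counts are hypothetical.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.

    Counts, over all pairs (xi, yj), how often xi exceeds yj;
    ties count as one half. No p-value is computed here.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical activity counts per participant, not the study data.
high_if = [120, 95, 180, 160, 140]
low_if = [60, 95, 40, 300, 20]

u_high = mann_whitney_u(high_if, low_if)
u_low = mann_whitney_u(low_if, high_if)
print(u_high, u_low)
```

The two statistics always sum to the product of the sample sizes. The toy data also mirror the pattern described above: the low group contains the single largest value and is more dispersed, yet most of its members are less active than those in the high group.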

Table 2. Descriptive statistics of the number of activities per participant and the activity frequencies in the high and low IF groups.

3.1 N-gram Analysis

In order to identify the learning sequences of the two groups, we used n-gram analysis to compare sequences of activities (activities’ relative frequency analysis) and their distribution among the participants (range analysis). The two analyses are complementary to each other. While the activities’ relative frequency analysis answers the question of what is the relative prevalence of an activity or sequence of activities in a specific group of participants, the range analysis answers the question, what is the percentage of participants that participated in an activity or sequence of activities?

The number of the unique tokens in the unigram analysis is 10 (representing the 10 codes of activities), the bigrams – 95, the trigrams – 682 and the four-grams – 3,134.

Figures 1a–d present the results of the activities’ relative frequency n-gram analysis and Figs. 1e–h present the results of the range n-gram analysis. In both cases, only activities with probability above 0.1 were included.

Fig. 1. (a–h) Relative frequency of activities (a–d) and relative range distribution (e–h) among the two groups in uni-, bi-, tri- and four-grams.

Figure 1a presents the comparison of the unique unigrams in both groups (the figure represents the information in Table 2). The video activity (V) is more salient in the high-IF group compared to the low-IF one. On the other hand, the track (T), lessons (L), quiz (Q) and exam (E) activities have higher occurrences in the low-IF group compared to the high-IF group.

Figure 1b presents a difference in the V-V bigram between the low-IF and high-IF groups that is larger than the differences in the other bigrams. The participants in the high-IF group sequentially press the video play/pause button more than the participants in the low-IF group. Interestingly, five of the bigrams (Q-Q, P-Q, S-L, V-L, and T-Q) are unique to the low-IF group.

Figure 1c presents the trigram activities, which show a similar pattern to the bigrams, with more participants in the high-IF group sequentially pressing the video play/pause button (V-V-V). Looking at the sequences that are unique to one of the groups, the low-IF group has a unique sequence of practicing the final exam (E-E-E), a sequence that does not exist in the high-IF group.

The four-gram figure (Fig. 1d) shows a prominent presence of the high-IF group compared to a minor presence of the low-IF group. The participants in the high-IF group made more four-gram sequences of video watching (V-V-V-V), and sequences of video watching after viewing the recommended learning track (T-V-V-V), accessing the lessons (L-V-V-V), answering a quiz (Q-V-V-V), accessing the reading comprehension text (P-V-V-V), self-practicing the final exam (E-V-V-V), etc.

The results of the range n-gram analysis show similar trends. The range shows the percentage of participants who actually performed each activity (or sequence of activities) out of all participants in each group. The calculation of the range enables us to calculate the relative distribution (entropy) of each activity. Figure 1e shows that, in the high-IF group, four activities were performed by more than 80% of participants, while in the low-IF group only two activities were carried out by 80% or more of participants. Two activities in the high-IF group were performed by 50% to 79% of the participants, compared to five activities in this range of participation in the low-IF group. In both groups, three activities (S, R, and B) were carried out by less than 40% of participants. A higher percentage of participants in the high-IF group pressed the play/pause video button (V), accessed the quizzes (Q), accessed the reading comprehension PDF text (P), accessed the introductory page of the course (I), and accessed the video lessons dealing with learning strategies (S). No differences were found between the two groups in the range of participants who accessed the recommended learning track (T), the self-practice exam (E), the rights of use (R), and the achievements page (B).

The differences in the range parameters between the two groups increase when we look at the bi-, tri- and four-grams (Fig. 1f–h). This is evident from the fact that the longer the n-gram, the higher the participation range in the high-IF group compared to the low-IF group. The low-IF participants, on the other hand, performed five unique bi-gram sequences, one unique tri-gram sequence, and no unique four-gram sequence of activities. The decrease in unique sequences, together with the fact that we only analyzed n-grams with relatively high probability (>0.1), means that the low-IF participants use more varied sequences, each performed by fewer and fewer participants. This also means that in the range parameter, the high-IF group behaves more consistently and that more participants behave similarly (lower entropy).

3.2 Keyness Results

Video play/pause activity (V) was identified as a key activity in the high-IF group compared to the low-IF group. Participants in the high-IF group pressed the play/pause video (V) button 1.28 times more than the participants in the low-IF group (log(.25) = 232.11, p < .001, Effect Size = 1.28).

In the low-IF group, we found that lessons (L), track (T), exam (E) and quiz (Q) activities are key activities compared to the high-IF group. Participants in the low-IF group accessed more lessons (log(.25) = 84.28, p < .001, Effect Size = 1.27), followed the recommended learning track more often (log(.25) = 49.44, p < .001, Effect Size = 1.71), accessed more exams (log(.25) = 36.64, p < .001, Effect Size = 1.23) and participated in more quizzes (log(.25) = 11.21, p < .001, Effect Size = 1.10) compared to the high-IF group. These results are reflected in the relative frequency unigram analysis mentioned above.

4 Discussion

The purpose of the current study was to compare behavioral patterns and learning sequences between participants with high and low IF in a MOOC. The comparison was conducted in order to identify behavioral differences between activities and activity sequences of these two groups using NLP techniques, namely n-gram and keyness.

In order to achieve those aims, we compared the differences in the relative frequencies of learning behavior sequences and in the participation range (participation entropy) by using n-gram analyses and keyness analysis.

As might be expected, participants with high-IF are more active in the course compared to participants with low-IF. Furthermore, the unigram analysis and the keyness analysis revealed that participants in the high-IF group pressed the play/pause video button more often than the participants in the low-IF group did. On the other hand, participants in the low-IF group more frequently accessed lessons and recommended learning tracks, and took exams and quizzes. These results suggest that the participants in the high-IF group were more focused on acquiring knowledge, as evidenced by watching the video lectures, which contained the course content. The participants in the low-IF group, by contrast, showed a more diverse and less orderly (“messy”) learning behavior. Our interpretation of these patterns is that the participants in the low-IF group were less sure what to do in the course. They paid more attention to understanding what and how to learn, and to quizzes and final exams, and less to knowledge acquisition. These results are similar to the results by Mukala, Buijs, & Van Der Aalst (2015), who showed that students who passed a Coursera MOOC followed a more structured process in submitting their weekly quizzes until the final quiz and in watching videos, when compared to students who did not pass the course. It is important to note that our conceptual replication of the results uses a broader perspective on success and failure in MOOCs. We see that the activities of the participants in the current MOOC can predict more subjective success outcomes, namely intention-fulfillment.

The n-gram analysis enabled us to compare the most probable sequences of activities and their distribution among the participants. Although Li et al. [16] showed that the most effective n-gram for predicting students’ activity in MOOCs is the trigram, our analysis suggests that we can differentiate between the groups even with a shorter string of annotations, namely a bi-gram. The bigram analysis reveals that the high-IF group was characterized mostly by a two-step sequence of the knowledge acquisition activity of watching video lectures sequentially (V-V), while the low-IF group was characterized by diverse bigram activities such as repeating the assessment tasks (Q-Q), moving from the reading comprehension to the quizzes without watching the video lecture (P-Q), moving from the short and focused videos dealing with learning strategies to the lesson (S-L), moving from the video lecture to the lesson (V-L), and moving from the recommended learning track to the quizzes (T-Q). These results are similar to the findings of Van den Beemt et al. [29], who used other success criteria such as passing rates. The researchers showed that regularly watching successive videos in batches leads to high passing rates.

Nevertheless, for the two parameters (activity frequency and participation range) we found that looking at longer n-gram sequences is beneficial in predicting the level of IF. The longer the n-gram, the higher the divergence between the two groups. Moreover, the longer the n-gram, the more prominent are the participants from the high-IF group. The results showed that the activities of the high-IF group are more predictable, suggesting that this group behaves more consistently and similarly. When we analyze longer sequences, it is clearer that the participants in the high-IF group are following the designed path, i.e. the learning path suggested by the course designers in this particular MOOC.

Several limitations should be considered. First, we used median splits in order to distinguish between participants with high and low IF. This technique helped us to simplify our analyses and discussion. Recoding continuous variables into categorical variables is often criticized due to the rough segmentation of the continuous variable [8], but this simplification was useful in our case. The results showed that we could easily differentiate between, and predict, the learning sequences of the different participants. Future work could use a more sensitive segmentation and a larger number of clusters. Another simplification in this research is the use of only one learner-centered success measure, namely IF. Future research should use additional subjective success measures such as learner satisfaction [21] and perceived achievement [20, 24, 31].

Future research could also look at additional kinds of knowledge acquisition with video lectures. The MOOC studied here offered two kinds of video lectures: content-based lectures (V) and learning strategy lectures (S). As shown in Fig. 1e and f, in the high-IF group, a wider range of participants accessed the learning strategy videos (S) and learning strategy videos followed by video lecture watching (S-V) compared to the low-IF group. Further investigation of the effect of using those learning strategy lectures on the level of IF is outside the scope of this study, but could be productive.

5 Conclusions

To conclude, the purpose of the current research was to distinguish between the low and the high IF groups based on their learning behavior. The results suggest that the single activities and sequential behavior of the participants enable us to identify the group to which they belong. As has been shown by the keyness analysis, the two groups differ in the pattern of single activities, and bigger differences become apparent in the longer n-grams, both in terms of the relative prevalence of the activity and in terms of the number of participants who performed it. The high-IF group showed more homogeneous behavior. One of the contributions of our study is to demonstrate the feasibility of developing automatic intervention systems, which will analyze learning sequences in real time and identify inconsistent participant behavior, to support the participants in real time. For example, such a system could propose a different learning track for learners, depending on their behavioral pattern. Alternatively, learning strategies could be proposed for specific sub-groups to support their self-regulated learning.