1 Introduction

The spread of the novel coronavirus disease 2019 (COVID-19) started in December 2019 in Wuhan, China [32]. Due to rising health concerns, many universities around the world transitioned from face-to-face to online course delivery and assessment [24]. In the province of British Columbia (BC), Canada, all on-campus classes and activities at post-secondary institutions were cancelled starting March 15, 2020. At Thompson Rivers University (TRU), face-to-face classes shifted to alternative modes of delivery, mostly conducted through online learning management systems. To adapt to these changes, most, if not all, of the remaining course assessments were switched to open-book non-invigilated exams with adjusted weights for the evaluation components. In this study, we propose a Bayesian statistical model to evaluate the effects of this sudden change from classroom-based to online delivery and assessment, made in response to the COVID-19 pandemic, on students’ academic performance. It is important to understand comprehensively how students’ performance changed after March 15, 2020 due to the COVID-19 effect and to investigate the factors that potentially contributed to the changes.

Due to the unique situation of COVID-19 and its social and educational ramifications, a few published articles have discussed how COVID-19 and the shutdown of universities have impacted students’ performance in post-secondary education. For instance, Sintema [26] investigated the possible impacts that the closure of secondary schools due to COVID-19 in Zambia would have on the general performance of students in specific subject areas. Having interviewed STEMFootnote 1 educators at a public secondary school in Zambia, the study concluded that learners’ performance in STEM subjects would be negatively affected in the upcoming national examination if the COVID-19 epidemic were not curbed in the shortest possible time. This is because of not only the lack of contact and meaningful interaction between learners and teachers, but also insufficient e-learning tools to facilitate such interaction. Basilaia and Kvavadze [3] studied the capacity of the country of Georgia and its population to continue the education process in schools through online distance learning. Their study, conducted in a private school during the COVID-19 pandemic, reviewed the different available platforms that were used, with the support of the government, for online education and live communication. Their findings showed that the transition from the traditional to the online education system was successful in terms of adaptability and the skills gained by students, teachers and administrative staff. Nevertheless, none of these studies compared students’ performances empirically.

During the Severe Acute Respiratory Syndrome (SARS) epidemic in Hong Kong, when all schools and universities were ordered closed and governments invoked quarantine laws to isolate those who might be carriers, Wong [29] conducted a study that briefly described the impact of e-commerce on the local community, with an emphasis on the use of e-learning technology as a contingency measure in tertiary institutions. Their study showed that, given the limited time available for course design and delivery, the examination result of the e-learning class was slightly better than that of the traditional class. However, their study lacked a rigorous empirical comparison of students’ marks using a sound statistical model, and the authors suggested further rigorous study.

There are some studies evaluating how students perform in online versus face-to-face delivery in post-secondary mathematics education. Jones and Vena [13] focused on equity in learning as reflected in the final grades of online and on-site students from the same post-secondary mathematics course taught repeatedly over 10 semesters. On-site students attended regular class sessions, while online students only attended an orientation session and a final exam. For both groups, the evaluations were invigilated. Their findings revealed statistically significant differences between online and on-site students’ final grades, in favor of on-site student achievement. Rey [23] examined the associations between taking basic skills mathematics courses online versus face-to-face and student success and persistence. The study stressed the difficulties associated with effective communication of mathematical topics and ideas via the Internet and found no noticeable associations with learning outcomes or persistence. Their study also pointed out that the quality of education gained from online basic skills mathematics courses is relatively equivalent to that of face-to-face courses. Weems [27] compared two sections of a beginning algebra course, one taught online and the other taught on site. The study reported no significant difference in exam averages between the two formats, but highlighted a decrease in performance by the online students across the exams. Ashby et al. [2] compared student success in a Developmental Math course offered in three different learning environments: online, blended, and face-to-face. Using a one-way analysis of variance (ANOVA), the authors showed that there were significant differences between learning environments, with the students in the blended courses having the least success. With regard to the efficiency of learning outcomes in online pedagogy, Arias et al. [1] studied the effectiveness of online delivery relative to face-to-face delivery by randomly assigning students into two categories: online and face-to-face. The two sections were taught by the same instructor, and the course objectives and exams were the same for both sections. The authors concluded that “both course objectives and the mechanism used to assess the relative effectiveness of the two modes of education may play an important role in determining the relative effectiveness of alternative delivery methods.” One other area of interest in this respect is the notion of exams and their competence in the digital space. In a study by Williams and Wong [28], the authors examined the efficacy of closed-book invigilated final exams versus open-book and open-web (OBOW) final exams in a completely online university. They analyzed the experience of students who had completed both exam formats by surveying them on the merits of each format. Their findings showed that \(100\%\) of students found OBOW preferable to traditional closed-book exams. On the issue of academic integrity, their results indicated that in both exam formats “there has been equal opportunity of plagiarism in the view of students.” On the other hand, some studies indicate that online exams and access to technology provide more opportunities for dishonest behaviors [14]. Even though there are some similarities and differences with the aforementioned studies, the research environment of our study is qualitatively different.
In our study, a Bayesian statistical model is proposed to compare students’ performances, as reflected in their empirical marks, in a sudden and unexpected scenario where both classes and evaluations were effectively forced online.

On the other hand, the closure of universities and the transition to online teaching may have prompted mental, psychological and educational challenges for students and faculty members. A recent study by Statistics Canada, based on crowdsourced data completed by over 100,000 post-secondary students from April 19 to May 1, 2020, provides insight into how students’ academic life was impacted by the COVID-19 pandemic [11]. Admittedly, the adoption of alternative modes of delivery and distance learning under these circumstances is not ideal. Students accustomed to on-campus learning have expressed concerns over the loss of the social and interactive side of education. In addition, the rapid transition to online instruction poses challenges to students who were not equipped to adjust to this mode of learning, either because they learn better in person, lack appropriate tools, suffer financial hardship, or do not have a home environment suitable for learning online [11]. The transition from in-person lectures to online classes due to COVID-19 had an even more severe impact on students with mental health disabilities who needed extra care and frequent face-to-face interactions with the instructors in order to maintain motivation, interest and persistence to succeed in the course [9]. In a recent study, Sahu [24] highlighted the impacts of COVID-19 on the education and mental health of students and academic staff and indicated the challenges posed by the closure of universities and other restrictive measures, such as the shift in the delivery mode of courses, the change in the format of assessments, and the travel restrictions on students. In another study, Cao et al. [5] investigated students’ anxiety level during COVID-19 in a Chinese medical college located in Hubei Province. Their findings showed that about a quarter of the students in their sample had experienced mild, moderate or severe levels of anxiety during the COVID-19 outbreak. In this paper, we study some aspects of stress among students with disabilities and special needs that contributed to their disengagement from the remaining evaluation components after the emergence of COVID-19.

We developed a Bayesian hierarchical linear mixed effects model to measure the effects of COVID-19 on students’ marks. The literature on random effects models goes back to Laird and Ware [16]. Classical maximum likelihood estimation and inference for the linear mixed effects model using the expectation-maximization (EM) algorithm were presented by Laird et al. [15] and Lindstrom et al. [19]. These articles approached model building and inference from a frequentist viewpoint. On the other hand, the general design of the Bayesian method for the linear mixed effects model was described by Zhao et al. [31]. Missing value imputation for the linear mixed effects model, from a frequentist viewpoint, was proposed by Schafer et al. [25]. In this paper, we redesigned the linear mixed effects model for the change in students’ raw marks before and after the COVID-19 disruption in the Winter 2020 semester at TRU. We described the complete Bayesian methodology for the proposed model using conjugate and semi-conjugate prior distributions. We then derived the full conditional posterior distributions of the parameters and proposed a Gibbs sampler [17] to generate Markov chain Monte Carlo (MCMC) samples [12] from the posterior distributions. In order to impute missing marks, we assumed that they were missing completely at random [4]. Our novel contributions are the design and implementation of a fully Bayesian missing value imputation method in a linear mixed effects modeling setup. While most classical statistical missing value imputation methods are performed only once, given complete data, before model building [7], our Bayesian missing value imputation is flexible: it seamlessly regenerates the missing values from their posterior distribution given the observed data before every scan of the Markov chain. In our model, we allow the student-specific error variances to vary, which is a considerable extension of the methodology in a Bayesian linear mixed effects hierarchical modeling setup. We wrote the code for the proposed fully Bayesian hierarchical model in R [22]. The R code and data are available on GitHub (https://github.com/jhtomal/covid19_impact.git). We are in the process of wrapping the code into an R package for publication in the Comprehensive R Archive Network (CRAN) so that the broader scientific community can apply our method in their applications.

We hypothesize that COVID-19 has negative effects on students’ performance, as reflected in their marks, and on their stress levels, especially for students needing special support. As such, this article aims to explore the following questions:

  (i) How do the raw marks of all students in a course compare before and after the university switched to online delivery due to COVID-19?

  (ii) How do the raw marks of individual students within a course compare before and after the transition to online delivery?

  (iii) Are the stronger or weaker students getting higher or lower average raw marks due to online delivery relative to in-person delivery?

  (iv) Are there disengaged students who did not participate in any evaluation components in a course after the university shifted to online delivery? Are these students on special support?

  (v) Are the trends in marks consistent across all courses and departments in this study? If there are different trends, what are the potential factors that make the difference?

In order to cope with the unexpected transition to online delivery and open-book non-invigilated evaluations, instructors from different departments in the Faculty of Science at TRU came up with new weighting schemes that were strikingly different from those in place at the beginning of the semester. We therefore analyze students’ raw marks, as opposed to weighted aggregated marks, because the raw numbers reflect performance without the bias introduced by the reweighting. Subsequently, in Sect. 5, we discuss the results according to the level of cognitive skills and hands-on experience required in the courses we investigated. In our analysis, we classified the cognitive skills with reference to Bloom’s Taxonomy of Knowledge.Footnote 2

2 Design of the study and description of data

The analysis of the empirical data starts with some summary statistics, followed by a fully Bayesian linear mixed effects model. Posterior statistical inference on the model parameters proceeds after posterior model checks.

The study population consists of all students in the Faculty of Science at Thompson Rivers University. We collected longitudinal data by sampling 11 courses (with a total of 326 students; see Table 1 for details) across three departments in the Faculty of Science: seven courses in Mathematics (MATH) and Statistics (STAT), two courses in Computing Science (COMP) and two courses in Architectural and Engineering Technology (ARET). Each course was taught, marked,Footnote 3 and graded by the same instructor from the start to the end of the Winter 2020 semester (before and after the COVID-19 effects), which ensures that instructor effects are held constant, leaving only student effects to compare. Moreover, a Cronbach’s alpha [8] reliability analysis was performed for each course to ensure that the course data are reliable. After the selection of the courses, student-specific raw marks were recorded over time for a series of evaluations such as multiple assignments, quizzes, tests, exams, and projects. Specific to each course, we analyzed marks from all types of evaluations and ensured coverage of all grading practices in place. The raw marks for each evaluation component were then converted into percentages (from 0 to 100), where larger numbers indicate better marks. We also recorded whether each mark was observed or missing. Given that all classes went online starting March 15, 2020, the index variable is simply defined as a vector of Boolean values indicating whether the effects of COVID-19 had occurred (before March 15 vs after March 15). As there is no randomness in the index variable, we consider it deterministic. A Bayesian hierarchical model with a missing value imputation technique is then applied to determine whether students’ raw marks increased or decreased after March 15. After model fitting, statistical analysis and inference are performed on student-specific raw marks in each course. Our findings generalize to the Faculty of Science at TRU, and may eventually extend to faculties of science across universities in Canada. Figure 1 shows the overall design of the study.
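As an illustration of the reliability check mentioned above, the sketch below computes Cronbach’s alpha directly from a course’s marks matrix. It is a minimal example on simulated data and is not the authors’ code; the matrix `marks` and the function name `cronbach_alpha` are our own illustrative choices.

```r
## A minimal sketch (not the authors' code) of the Cronbach's alpha reliability
## check: rows of the marks matrix are students, columns are evaluation
## components; the matrix below is simulated purely for illustration.
set.seed(1)
marks <- matrix(round(runif(30 * 6, 40, 100)), nrow = 30, ncol = 6)  # 30 students, 6 components

cronbach_alpha <- function(M) {
  M <- M[stats::complete.cases(M), , drop = FALSE]  # keep rows with no missing marks
  p <- ncol(M)                                      # number of evaluation components
  item_var  <- sum(apply(M, 2, var))                # sum of component variances
  total_var <- var(rowSums(M))                      # variance of the total mark
  p / (p - 1) * (1 - item_var / total_var)
}

cronbach_alpha(marks)  # values near 1 indicate internally consistent course data
```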

Table 1 Number of students, percentage of disengaged students, and percentage of missing marks
Fig. 1
figure 1

Overall design of the study

3 Methods

3.1 The linear mixed effects model

Let \(Y_{it}\) be the raw marks in percent for the ith student \((i = 1, \ldots , k)\) evaluated at time t \((t = 1, \ldots , n)\) in a particular course. The subscript t represents a series of evaluations conducted over time from multiple assignments, quizzes, tests, exams, and projects. Let \(x_{it}\) be another variable measured at time t for the ith student which can explain the variation in \(Y_{it}\). We consider the following linear mixed effects model

$$\begin{aligned} Y_{it} = \beta _{0i} + \beta _{1i} x_{it} + \epsilon _{it}, \end{aligned}$$
(1)

where

$$\begin{aligned} \ x_{it} = \left\{ \begin{array}{cl} 1 &{}\quad \text {evaluation taken after March 15, 2020}\\ 0 &{}\quad \text {otherwise}, \end{array}\right. \end{aligned}$$

and \(\epsilon _{it} \sim \text {Normal}(0, \sigma ^2_i)\) with student-specific error variance \(\sigma ^2_i\). The sampling distribution of \(Y_{it}\) is

$$\begin{aligned} Y_{it} | \beta _{0i}, \beta _{1i}, \sigma ^2_i \sim \text {Normal}\left( \beta _{0i} + \beta _{1i} x_{it}, \sigma ^2_i\right) . \end{aligned}$$
(2)

In this model, \(\beta _{0i}\) and \(\beta _{0i} + \beta _{1i}\) are the average marks of the ith student before and after March 15, 2020, respectively. Here, \(\beta _{1i}\) is the difference in the average marks of the ith student before and after March 15, 2020. Let \(\varvec{\beta }_i = \left( \beta _{0i}, \beta _{1i}\right) ^T\) represent the vector of regression coefficients for the ith student. We assume that the ith student is randomly and independently selected from the pool of students in a specific course in the Faculty of Science at Thompson Rivers University (TRU). This leads us to consider the sampling distribution for \(\varvec{\beta }_i\) as

$$\begin{aligned} \varvec{\beta }_i | \varSigma \sim \text {Multivariate Normal}\left( \varvec{\theta }, \varSigma \right) , \end{aligned}$$
(3)

where \(\varvec{\theta } = \left( \theta _0, \theta _1\right) ^T\) and \(\varSigma \) is the variance-covariance matrix. This part of the model explains that \(\theta _0\) and \(\theta _0 + \theta _1\) are the average marks of all students in a course before and after March 15, 2020, respectively.

The sampling distribution for the student-specific error variance is specified through its precision:

$$\begin{aligned} \frac{1}{\sigma ^2_i} \sim \text {Gamma}\left( \frac{\nu _0}{2}, \frac{\nu _0 \sigma _0^2}{2}\right) , \end{aligned}$$
(4)

defined in terms of shape and rate parameters, where \(\sigma ^2_0\) is the course-specific error variance and \(\nu _0\) \((> 0)\) is its strength. Large values of \(\nu _0\) force the student-specific error variances \(\sigma ^2_i\) to be tightly clustered around \(\sigma _0^2\), whereas small values of \(\nu _0\) allow them to vary widely from student to student. Likewise, large values of \(\sigma ^2_0\) correspond to large typical within-student error variances \(\sigma ^{2}_i\). The overall picture of the hierarchical model is shown below in Fig. 2.
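To make the data-generating process concrete, the following minimal R sketch simulates marks for one course from Eqs. (1)–(4). The hyper-parameter values and object names are illustrative only and are not those used in the study.

```r
## A minimal sketch simulating marks for one course from the hierarchical model
## of Eqs. (1)-(4); all numerical values are illustrative, not from the paper.
library(MASS)  # mvrnorm

set.seed(1)
k <- 30; n <- 8                              # students and evaluation components
x <- rep(c(0, 1), each = n / 2)              # 0 before March 15, 1 after

theta   <- c(65, 5)                          # course-level mean of (beta_0i, beta_1i)
Sigma   <- matrix(c(60, 10, 10, 40), 2, 2)   # between-student covariance (Eq. 3)
nu0     <- 4; sigma02 <- 50                  # strength and course-specific error variance (Eq. 4)

beta   <- mvrnorm(k, theta, Sigma)           # student-specific coefficients (Eq. 3)
sigma2 <- 1 / rgamma(k, nu0 / 2, nu0 * sigma02 / 2)  # student-specific error variances (Eq. 4)

Y <- t(sapply(1:k, function(i)               # marks from Eq. (1)
  beta[i, 1] + beta[i, 2] * x + rnorm(n, 0, sqrt(sigma2[i]))))
```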

Fig. 2
figure 2

Graphical representation of the hierarchical model

The likelihood function for the parameters of the linear mixed effects model is:

$$\begin{aligned}&L\left( \varvec{\beta }_1, \ldots , \varvec{\beta }_k, \sigma ^2_1, \ldots , \sigma ^2_k, \varvec{\theta }, \varSigma , \nu _0, \sigma _0^2 | \mathbf {y}_1, \ldots , \mathbf {y}_k, \mathbf {x}_1, \ldots , \mathbf {x}_k\right) \nonumber \\&\quad =\prod _{i=1}^k \prod _{t=1}^n \text {dnorm}\left( y_{it}, \beta _{0i} +\beta _{1i}x_{it}, \sigma ^2_i\right) \nonumber \\&\qquad \times \prod _{i=1}^k \text {dmvnorm}\left( \varvec{\beta }_i, \varvec{\theta }, \varSigma \right) \times \prod _{i=1}^k \text {dgamma}\left( \frac{1}{\sigma ^2_i}, \frac{\nu _0}{2}, \frac{\nu _0\sigma ^2_0}{2}\right) , \end{aligned}$$
(5)

where dnorm, dmvnorm, and dgamma are the density functions of the normal, multivariate normal, and gamma distributions, respectively. Note that the likelihood function quantifies the information contained in the data about the unknown parameters of interest.
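Continuing the simulation sketch shown after Eq. (4) (it reuses the objects Y, x, beta, sigma2, theta, Sigma, nu0 and sigma02 defined there), the lines below evaluate the log of the likelihood in Eq. (5); mvtnorm::dmvnorm supplies the multivariate normal density. This is only an illustration, not part of the authors’ code.

```r
## A minimal sketch evaluating the log-likelihood of Eq. (5) at the simulated
## parameter values from the previous block.
library(mvtnorm)

loglik <- sum(sapply(1:k, function(i) {
  sum(dnorm(Y[i, ], beta[i, 1] + beta[i, 2] * x, sqrt(sigma2[i]), log = TRUE)) +  # first product in Eq. (5)
    dmvnorm(beta[i, ], theta, Sigma, log = TRUE) +                                # second product
    dgamma(1 / sigma2[i], nu0 / 2, nu0 * sigma02 / 2, log = TRUE)                 # third product
}))
loglik
```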

3.2 The prior distributions

In addition to the information contained in the data, extra knowledge may come from the experimenter’s prior experience with the system giving rise to the data. Incorporating this prior knowledge into model building may increase the precision of the estimates of the parameters of interest.

The prior distribution of a parameter represents the experimenter’s prior belief via hyper-parameters. We note that the prior belief should be unbiased: a belief that reflects the expected truth of the system that generates the data. In addition, we emphasize that the prior belief should not be strong unless there is enough evidence to support it, because a strong prior belief can pull the posterior belief towards itself. In situations where there is weak or no prior belief, we suggest being objective and letting the data lead.

The prior distribution for the course-specific error variance \(\sigma _0^2\) is

$$\begin{aligned} \sigma ^2_0 \sim \text {Gamma}\left( \alpha _1, \alpha _2\right) , \end{aligned}$$
(6)

where the prior belief regarding \(\sigma ^2_0\) is represented by the hyper-parameters \(\alpha _1\) (the shape parameter) and \(\alpha _2\) (the rate parameter). Specifically, the expected prior belief about \(\sigma ^2_0\) is \(E(\sigma ^2_0) = \alpha _1/\alpha _2\). Here, small and large values of \(\alpha _1\) represent weak and strong prior belief, respectively, regarding the course-specific error variance \(\sigma ^2_0\).

We restrict \(\nu _0\) to be a whole number and choose the prior on \(\nu _0\) to be a discrete analogue of the exponential distribution on \(\{1, 2, 3, \ldots \}\) as follows:

$$\begin{aligned} p(\nu _0) \propto (1 - e^{-\alpha _3}) e^{-\alpha _3 \nu _0}, \end{aligned}$$
(7)

where \(\alpha _3\) reflects the strength of prior belief about \(\nu _0\). Specifically, small values of \(\alpha _3\) represent weak belief about \(\nu _0\) and vice versa.

The parameter vector for the course-specific mean, \(\varvec{\theta }\), is assumed to follow the distribution

$$\begin{aligned} \varvec{\theta } \sim \text {Multivariate-Normal}\left( \varvec{\theta }_0, \varSigma _0\right) . \end{aligned}$$
(8)

The prior belief about the course-specific mean vector is \(\varvec{\theta _0}\) (i.e., \(E(\varvec{\theta }) = \varvec{\theta }_0\)) and the strength of the prior belief is represented by the variance-covariance matrix \(\varSigma _0\). Here, \(\varSigma _0\) is a positive-definite matrix whose diagonal elements contain the variances (large variances represent weak prior belief) and whose off-diagonal elements contain the covariances (small absolute covariances represent weak correlation between the elements of \(\varvec{\theta }\)).

The prior distribution corresponding to the variance-covariance matrix \(\varSigma \) of the course-specific mean vector \(\varvec{\theta }\) is

$$\begin{aligned} \varSigma \sim \text {Inverse-Wishart}\left( \eta _0, S_0^{-1}\right) , \end{aligned}$$
(9)

where the prior belief about \(\varSigma \) is represented by \(S_0\) (i.e., \(E(\varSigma ) = S_0/(\eta _0 - d - 1)\), where d is the dimension of \(\varvec{\theta }\)) and the strength of the prior belief is represented by \(\eta _0\). As with the other prior distributions, large and small values of \(\eta _0\) represent strong and weak prior belief, respectively.
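Because inverse-Wishart conventions differ across references, the short sketch below (illustrative, not from the paper) draws from the prior in Eq. (9) using base R’s rWishart and checks by Monte Carlo that the mean of the draws is approximately \(S_0/(\eta_0 - d - 1)\). The value of \(\eta_0\) used here is larger than the diffuse choice in Sect. 3.6 so that the prior mean exists and the Monte Carlo average stabilizes.

```r
## A minimal sketch (not from the paper) of drawing Sigma from the
## Inverse-Wishart prior of Eq. (9): draw W ~ Wishart(eta0, S0^{-1}) and set
## Sigma = W^{-1}; eta0 and S0 below are illustrative values only.
set.seed(1)
d    <- 2
eta0 <- 10                        # illustrative; large enough for the prior mean to exist
S0   <- matrix(c(40, 5, 5, 20), d, d)

riwish <- function(df, S) solve(rWishart(1, df, solve(S))[, , 1])  # one Inverse-Wishart draw

draws <- replicate(5000, riwish(eta0, S0))
apply(draws, c(1, 2), mean)       # Monte Carlo mean of the draws
S0 / (eta0 - d - 1)               # prior expectation E(Sigma) quoted in the text
```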

3.3 The posterior distributions

After collecting data, we combine the information from the data with prior belief to obtain posterior belief. In other words, the posterior belief is our updated belief after observing the data.

The posterior distribution for the course-specific error variance \(\sigma _0^2\) is Gamma, obtained using Eqs. (4) and (6):

$$\begin{aligned} \sigma _0^2 | \sigma _1^2, \ldots , \sigma _k^2, \nu _0 \sim \text {Gamma}\left( \alpha _1 + \frac{k\nu _0}{2}, \alpha _2 + \frac{\nu _0}{2} \sum _{i=1}^k \frac{1}{\sigma _i^2}\right) . \end{aligned}$$
(10)

The posterior distribution for the strength parameter \(\nu _0\) of the course-specific error variance \(\sigma _0^2\) is obtained, up to a normalizing constant, using Eqs. (4) and (7):

$$\begin{aligned} p\left( \nu _0 | \sigma _0^2, \sigma _1^2, \ldots , \sigma _k^2\right) \propto \left( \frac{\left( \frac{\nu _0\sigma _0^2}{2}\right) ^{\frac{\nu _0}{2}}}{\varGamma {(\frac{\nu _0}{2})}}\right) ^k \times \left( \prod _{i=1}^k \frac{1}{\sigma _i^2}\right) ^{\frac{\nu _0}{2}-1} e^{-\nu _0 \left[ \alpha _3 +\frac{\sigma _0^2}{2}\sum _{i=1}^k \frac{1}{\sigma _i^2}\right] }. \end{aligned}$$
(11)

The posterior distribution for each student-specific precision \(1/\sigma ^2_i\) is an independent Gamma distribution, obtained using Eqs. (2) and (4):

$$\begin{aligned} \frac{1}{\sigma ^2_i}|\mathbf {y}_i, \mathbf {x}_i, \varvec{\beta }_i, \nu _0, \sigma _0^2 \sim \text {Gamma}\left( \frac{\nu _0+n}{2},\frac{\nu _0\sigma _0^2 + \text {SSE}(\varvec{\beta }_i)}{2}\right) , \end{aligned}$$
(12)

where \(\mathbf {x}_i\) denotes the \(n \times 2\) design matrix whose first column is ones and whose second column contains the indicators \(x_{it}\), and \(\text {SSE}(\varvec{\beta }_i) =\left( \mathbf {y}_i -\mathbf {x}_i\varvec{\beta }_i\right) ^T\left( \mathbf {y}_i -\mathbf {x}_i\varvec{\beta }_i\right) \) is the student-specific sum of squares of errors.

The posterior distribution for the student-specific coefficient vector \(\varvec{\beta }_i\) is an independent Multivariate-Normal distribution, obtained using Eqs. (2) and (3):

$$\begin{aligned} \varvec{\beta }_i|\mathbf {y}_i, \mathbf {x}_i, \varvec{\theta }, \sigma _i, \varSigma \sim \text {Multivariate-Normal} \left( E\left( \varvec{\beta }_i\right) _p, \text {Var-Cov}\left( \varvec{\beta }_i\right) _p\right) , \end{aligned}$$
(13)

with posterior mean

$$\begin{aligned} E\left( \varvec{\beta }_i\right) _p = \left( \frac{\mathbf {x}_i^T\mathbf {x}_i}{\sigma ^2_i} +\varSigma ^{-1}\right) ^{-1}\left( \frac{\mathbf {x}_i^T\mathbf {y}_i}{\sigma ^2_i}+\varSigma ^{-1}\varvec{\theta }\right) \end{aligned}$$

and posterior variance-covariance matrix

$$\begin{aligned} \text {Var-Cov}\left( \varvec{\beta }_i\right) _p =\left( \frac{\mathbf {x}_i^T\mathbf {x}_i}{\sigma ^2_i}+\varSigma ^{-1}\right) ^{-1}. \end{aligned}$$

The posterior distribution for the course-specific mean vector \(\varvec{\theta }\) is Multivariate-Normal, obtained using Eqs. (3) and (8):

$$\begin{aligned} \varvec{\theta }|\varvec{\beta }_1, \ldots , \varvec{\beta }_k, \varSigma \sim \text {Multivariate-Normal}\left( E\left( \varvec{\theta }\right) _p, \text {Var-Cov}\left( \varvec{\theta }\right) _p\right) , \end{aligned}$$
(14)

with posterior mean

$$\begin{aligned} E\left( \varvec{\theta }\right) _p = \left( \varSigma _0^{-1} + k \varSigma ^{-1}\right) ^{-1} \left( \varSigma _0^{-1}\varvec{\theta }_0+\varSigma ^{-1}\sum _{i=1}^k \varvec{\beta }_i\right) \end{aligned}$$

and posterior variance-covariance matrix

$$\begin{aligned} \text {Var-Cov}\left( \varvec{\theta }\right) _p = \left( \varSigma _0^{-1} + k \varSigma ^{-1}\right) ^{-1}. \end{aligned}$$

The posterior distribution of \(\varSigma \) is Inverse-Wishart, obtained using Eqs. (3) and (9):

$$\begin{aligned} \varSigma | \varvec{\theta }, \varvec{\beta }_1, \ldots , \varvec{\beta }_k \sim \text {Inverse-Wishart}\left( \eta _0 + k, \left( S_{\varvec{\theta }} + S_0\right) ^{-1}\right) , \end{aligned}$$
(15)

where \(S_{\varvec{\theta }} = \sum _{i=1}^k \left( \varvec{\beta }_i -\varvec{\theta }\right) \left( \varvec{\beta }_i - \varvec{\theta }\right) ^T\).

3.4 Distribution of missing values

Let \(Y_{i, o}\) and \(Y_{i, m}\) be the marks that are observed and missing, respectively, for the ith student in a course. Given the distribution of the marks

$$\begin{aligned} \left\{ \mathbf {Y}_{i, o}, \mathbf {Y}_{i, m} | \mathbf {x}_i, \varvec{\beta }_i, \varSigma , \sigma ^2_i \right\} \sim \text {Multivariate-Normal} \left( \varvec{\mu }_{i} = \mathbf {x}_i\varvec{\beta }_i, V_{i} =\mathbf {x}_i \varSigma \mathbf {x}_i^T + \sigma ^2_i I_{n\times n}\right) , \end{aligned}$$
(16)

the missing marks for the ith student are imputed independently by generating data from the following distribution

$$\begin{aligned} \left\{ Y_{i,m}|y_{i,o}, \varvec{\beta }_i, \varSigma , \sigma ^2_i\right\} \sim \text {Multivariate-Normal}\left( \varvec{\mu }_{i, m| i, o}, V_{i, m| i, o}\right) , \end{aligned}$$
(17)

where

$$\begin{aligned} \varvec{\mu }_{i, m| i, o} = \varvec{\mu }_{i, m} + V_{i, m, o} \left( V_{i, o, o}\right) ^{-1} (y_{i, o} - \varvec{\mu }_{i, o}) \end{aligned}$$

and

$$\begin{aligned} V_{i,m | i, o} = V_{i, m, m} - V_{i, m, o} \left( V_{i, o, o}\right) ^{-1} V_{i, o, m} \end{aligned}$$

are obtained using the properties of conditional multivariate normal distribution with

$$\begin{aligned} \varvec{\mu }_{i} = \left[ \begin{array}{l} \varvec{\mu }_{i, o}\\ \varvec{\mu }_{i, m} \end{array}\right] \end{aligned}$$

and

$$\begin{aligned} V_{i} = \left[ \begin{array}{ll} V_{i, o, o} &{}\quad V_{i, o, m}\\ V_{i, m, o} &{}\quad V_{i, m, m} \end{array}\right] . \end{aligned}$$

Note that, in generating the missing values, \(\varvec{\beta }_i\), \(\varSigma \) and \(\sigma ^2_i\) are generated first from their respective posterior distributions. The computational details of the missing value imputation are provided in Sect. 3.5, specifically in step 7 of the Gibbs sampling algorithm.
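The conditional draw in Eq. (17) can be coded directly. Below is a minimal sketch for a single student with two missing components, assuming current draws of \(\varvec{\beta }_i\), \(\varSigma \) and \(\sigma ^2_i\) are available; all numerical values and object names are illustrative only and are not taken from the paper.

```r
## A minimal sketch of the conditional multivariate-normal imputation of
## Eq. (17) for one student; the inputs below are illustrative current draws.
library(MASS)  # mvrnorm

set.seed(1)
n  <- 6                                   # evaluation components for student i
Xi <- cbind(1, c(0, 0, 0, 1, 1, 1))       # design matrix: intercept + post-March-15 indicator
beta_i  <- c(65, 8)                       # current draw of (beta_0i, beta_1i)
Sigma   <- matrix(c(40, 5, 5, 20), 2, 2)  # current draw of Sigma
sigma2i <- 30                             # current draw of sigma_i^2
y  <- c(62, 70, NA, 75, NA, 80)           # observed marks with missing components

mu <- as.vector(Xi %*% beta_i)                    # marginal mean (Eq. 16)
V  <- Xi %*% Sigma %*% t(Xi) + sigma2i * diag(n)  # marginal covariance (Eq. 16)

m <- which(is.na(y)); o <- which(!is.na(y))
mu_cond <- mu[m] + V[m, o] %*% solve(V[o, o], y[o] - mu[o])  # conditional mean
V_cond  <- V[m, m] - V[m, o] %*% solve(V[o, o], V[o, m])     # conditional covariance
y[m] <- mvrnorm(1, as.vector(mu_cond), V_cond)    # one imputation of the missing marks
```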

Many missing value imputation methods exist in the literature, such as mean imputation, row average imputation, ordinary least squares imputation, linear model based imputation, local least squares imputation [7], regression imputation, imputation of longitudinal data [30], singular value decomposition, principal component analysis [18], and expectation maximization [25]. Most of these are classical methods which impute the missing values only once given the observed data. Our method of missing value imputation is fully Bayesian, seamlessly imputing the missing values anew before every MCMC scan of the parameters from their posterior distributions.

3.5 The Gibbs sampling algorithm

The approximation of the posterior distribution via Gibbs sampling is briefly presented below. For a given state of the parameters

$$\begin{aligned} \left\{ \nu _0^{(s)}, \sigma _0^{2(s)}, \varvec{\beta }_1^{(s)}, \ldots , \varvec{\beta }_k^{(s)}, \sigma _1^{2(s)}, \ldots , \sigma _k^{2(s)}, \varvec{\theta }^{(s)}, \varSigma ^{(s)}, \mathbf {y}_m^{(s)}\right\} \end{aligned}$$

and \(\mathbf {y}_i = \{\mathbf {y}_{i, o}, \mathbf {y}_{i, m}^{(s)}\}\), a new state is generated as follows:

  1. 1.

    Sample \(\nu _0\) using

    $$\begin{aligned} \nu _0^{(s+1)} \sim P\left( \nu _0 | \sigma _0^{2(s)}, \sigma _1^{2(s)}, \ldots , \sigma _k^{2(s)}\right) , \end{aligned}$$

    where the posterior distribution is specified in Eq. (11).

  2. 2.

    Sample \(\sigma _0^2\) using

    $$\begin{aligned} \sigma _0^{2(s+1)} \sim P\left( \sigma _0^2 | \sigma _1^{2(s)}, \ldots , \sigma _k^{2(s)}, \nu _0^{(s+1)}\right) , \end{aligned}$$

    with posterior distribution specified in Eq. (10).

  3. 3.

    For each \(i \in \{1, 2, \ldots , k\}\), independently sample \(\sigma ^2_i\) using

    $$\begin{aligned} \sigma _i^{2(s+1)} \sim P\left( \sigma _i^2 | \mathbf {y}_{i, o}, \mathbf {y}_{i, m}^{(s)}, \mathbf {x}_i, \varvec{\beta }_i^{(s)}, \nu _0^{(s+1)}, \sigma _0^{2(s+1)} \right) , \end{aligned}$$

    where the posterior distribution is specified in Eq. (12).

  4. 4.

    For each \(i \in \{1, 2, \ldots , k\}\), independently sample \(\varvec{\beta }_i\) using

    $$\begin{aligned} \varvec{\beta }_i^{(s+1)} \sim P\left( \varvec{\beta }_i|\mathbf {y}_{i,o}, \mathbf {y}_{i,m}^{(s)}, \mathbf {x}_i, \varvec{\theta }^{(s)}, \sigma _i^{2(s+1)}, \varSigma ^{(s)}\right) , \end{aligned}$$

    with posterior distribution specified in Eq. (13).

  5. 5.

    Sample \(\varvec{\theta }\) using

    $$\begin{aligned} \varvec{\theta }^{(s+1)} \sim P\left( \varvec{\theta }|\varvec{\beta }_1^{(s+1)}, \ldots , \varvec{\beta }_k^{(s+1)}, \varSigma ^{(s)}\right) , \end{aligned}$$

    where the posterior distribution is specified in Eq. (14).

  6. 6.

    Sample \(\varSigma \) using

    $$\begin{aligned} \varSigma ^{(s+1)} \sim P\left( \varSigma | \varvec{\theta }^{(s+1)}, \varvec{\beta }_1^{(s+1)}, \ldots , \varvec{\beta }_k^{(s+1)}\right) , \end{aligned}$$

    with posterior distribution specified in Eq. (15).

  7. 7.

    For each \(i \in \left\{ 1, 2, \ldots , k\right\} \), independently sample the missing marks using

    $$\begin{aligned} \mathbf {y}_{i,m}^{(s+1)} \sim P\left( \mathbf {Y}_{i,m} | \mathbf {y}_{i, o}, \mathbf {y}_{i, m}^{(s)}, \mathbf {x}_i, \varvec{\beta }_i^{(s+1)}, \varSigma ^{(s+1)}, \sigma _i^{2(s+1)}\right) , \end{aligned}$$

    where the posterior distribution of missing data given the observed data is specified in Eq. (17).

The order in which the new parameters and missing data are generated does not matter. What matters is that each parameter is updated conditional on the current values of the remaining parameters and the imputed missing values. The Gibbs sampling algorithm is implemented in the R [22] language.
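To make the algorithmic steps concrete, the following is a minimal, self-contained sketch of such a sampler in R; it is not the authors’ released code. For brevity it uses simulated complete data, holds \(\nu_0\) fixed (omitting the discrete update of step 1 and the missing-value step 7), and uses hyper-parameter values only loosely in the spirit of Sect. 3.6; all object names are our own.

```r
## A minimal, self-contained sketch of Gibbs steps 2-6, assuming complete data
## and a fixed nu0; values and names are illustrative, not the authors' code.
library(MASS)  # mvrnorm

set.seed(1)
k <- 30; n <- 8                              # students and evaluation components
X <- cbind(1, rep(c(0, 1), each = n / 2))    # design: intercept + post-March-15 indicator
Y <- matrix(rnorm(k * n, 70, 10), k, n)      # toy marks (rows = students)

## hyper-parameters (illustrative, in the spirit of Sect. 3.6)
a1 <- 1; a2 <- 1 / 100; nu0 <- 2
theta0 <- c(70, 0); Sigma0 <- diag(c(100, 100))
S0 <- diag(c(100, 100)); eta0 <- 4

## initial values from student-wise OLS fits
B  <- t(apply(Y, 1, function(y) coef(lm(y ~ X[, 2]))))
s2 <- apply(Y, 1, function(y) summary(lm(y ~ X[, 2]))$sigma^2)
theta <- colMeans(B); Sigma <- cov(B); s02 <- mean(s2)

riwish <- function(df, S) solve(rWishart(1, df, solve(S))[, , 1])  # Inverse-Wishart draw

n_scans <- 1000
keep <- matrix(NA, n_scans, 2, dimnames = list(NULL, c("theta0", "theta1")))
for (s in 1:n_scans) {
  ## sigma_0^2 | ... (Eq. 10)
  s02 <- rgamma(1, a1 + k * nu0 / 2, a2 + nu0 / 2 * sum(1 / s2))
  iSigma <- solve(Sigma)
  ## sigma_i^2 | ... (Eq. 12) and beta_i | ... (Eq. 13), student by student
  for (i in 1:k) {
    SSE   <- sum((Y[i, ] - X %*% B[i, ])^2)
    s2[i] <- 1 / rgamma(1, (nu0 + n) / 2, (nu0 * s02 + SSE) / 2)
    V <- solve(crossprod(X) / s2[i] + iSigma)
    m <- V %*% (crossprod(X, Y[i, ]) / s2[i] + iSigma %*% theta)
    B[i, ] <- mvrnorm(1, as.vector(m), V)
  }
  ## theta | ... (Eq. 14)
  Vt <- solve(solve(Sigma0) + k * iSigma)
  mt <- Vt %*% (solve(Sigma0) %*% theta0 + iSigma %*% colSums(B))
  theta <- mvrnorm(1, as.vector(mt), Vt)
  ## Sigma | ... (Eq. 15)
  Stheta <- crossprod(sweep(B, 2, theta))
  Sigma  <- riwish(eta0 + k, S0 + Stheta)
  keep[s, ] <- theta
}
colMeans(keep[-(1:200), ])  # crude posterior means of theta_0 and theta_1 after burn-in
```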

3.6 Specification of hyperparameters

The hyper-parameters for the prior distributions are chosen as described below. The hyper-parameters for the prior distribution of the course-specific error variance \(\sigma _0^2\) (Eq. (6)) are chosen as \(\alpha _1 = 1\) and \(\alpha _2 = 1/100\). This specification implies \(E(\sigma ^2_0) = 100\) with a prior standard deviation of 100, a diffuse choice that places little constraint on \(\sigma ^2_0\) and hence on the student-specific error variances \(\sigma _1^{2}\), \(\sigma _2^{2}, \ldots , \sigma _k^{2}\). At the same time, we consider \(\alpha _3\) (the hyper-parameter for the prior of \(\nu _0\)) to be 1, as small values of \(\alpha _3\) represent weak prior belief about \(\nu _0\). Given that we have no prior knowledge about the course-specific and the student-specific error variances, a weak prior imposes less subjectivity and lets the data objectively determine the parameter values.

To choose the hyper-parameters for the prior distribution of \(\varvec{\theta }\) (Eq. (8)), we fitted ordinary least squares (OLS) regression to student-specific marks and saved the OLS coefficients. The OLS coefficients are averaged to specify \(\varvec{\theta }_0\) (the prior mean vector for \(\varvec{\theta }\)). The prior variance-covariance matrix \(\varSigma _0\) is considered to be the sample covariance of the OLS coefficients. Such a prior distribution represents belief that is aligned with the information contained in the data.

Similarly, we consider the prior sum of squares matrix \(S_0\) (Eq. (9)) to be equal to the sample covariance of the ordinary least squares estimates of the coefficients. This specification lets the data lead, as the prior expectation of \(\varSigma \) then equals the sample covariance of the OLS coefficient estimates. At the same time, we consider \(\eta _0 = d + 2 = 4\). This specification makes the prior distribution flat, or diffuse, keeping the prior belief weak. Such a prior specification ensures less subjectivity, or more objectivity.
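As a concrete illustration of this OLS-based specification, the sketch below computes \(\varvec{\theta}_0\), \(\varSigma_0\) and \(S_0\) from student-wise OLS fits on a simulated marks matrix. It is an assumed reconstruction for illustration, not the authors’ exact code.

```r
## A minimal sketch of the OLS-based hyper-parameter specification described
## above; Y is a toy marks matrix (rows = students) and x the post-March-15 indicator.
set.seed(1)
k <- 30; n <- 8
x <- rep(c(0, 1), each = n / 2)
Y <- matrix(rnorm(k * n, 70, 10), k, n)

ols <- t(apply(Y, 1, function(y) coef(lm(y ~ x))))  # student-wise OLS intercepts and slopes

theta_0 <- colMeans(ols)  # prior mean vector for theta (Eq. 8)
Sigma_0 <- cov(ols)       # prior variance-covariance matrix for theta (Eq. 8)
S_0     <- cov(ols)       # prior sum-of-squares matrix for Sigma (Eq. 9)
eta_0   <- 2 + 2          # d + 2 with d = 2, keeping the prior on Sigma diffuse
```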

To initiate the Gibbs sampling algorithm, the initial parameter values are obtained from the OLS estimates of the coefficients, and the initial missing values are obtained as simple averages of the student-specific observed marks. We used the first 1000 scans of the Gibbs sampler as burn-in and discarded those realizations; this eliminates the effect of the initial values chosen to start the Gibbs sampler. We then ran the Gibbs sampler for another 10,000 scans and saved every 10th scan to produce a sequence of 1000 values for each parameter. We checked whether the Markov chain for each parameter had reached stationarity. The autocorrelation values and plots for the sequence of saved scans of each parameter were also examined. After confirming convergence in terms of stationarity and minimal autocorrelation, we proceeded to perform Bayesian posterior inference.
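The burn-in, thinning and convergence checks described above can be reproduced along the following lines. The chain here is simulated only to keep the snippet self-contained, and coda::effectiveSize is our assumed choice for the effective sample size calculation, not a package named by the authors.

```r
## A minimal sketch of burn-in, thinning and convergence diagnostics for one
## parameter's chain of posterior draws; the chain is simulated for illustration.
library(coda)

set.seed(1)
chain <- arima.sim(list(ar = 0.3), n = 11000) + 70  # stand-in for 11,000 Gibbs scans

kept <- chain[-(1:1000)]                      # discard the first 1000 scans as burn-in
kept <- kept[seq(10, length(kept), by = 10)]  # keep every 10th scan (thinning)

plot(kept, type = "l")        # trace plot to assess stationarity
acf(kept)                     # autocorrelation of the thinned chain
effectiveSize(mcmc(kept))     # effective sample size (near 1000 if nearly uncorrelated)
```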

Figure 3 shows the MCMC trace plots against the thinned scans of the chains for \(\theta _0\) (top-left) and \(\theta _1\) (top-right). These plots show the values of \(\theta _0\) and \(\theta _1\) generated from their respective posterior distributions, saved at every 10th scan of the chain after discarding the burn-in scans. It is visible that the chain has achieved stationarity. The bottom panels of Fig. 3 show the autocorrelation functions (ACFs) for \(\theta _0\) (bottom-left) and \(\theta _1\) (bottom-right). It is also clear that the thinned scans of the chains are nearly uncorrelated; in fact, their effective sample sizes are exactly 1000 each. Similarly, we have confirmed stationarity of the parameters for every other course selected in our study.

Fig. 3
figure 3

Trace plots for MCMC samples (top) and autocorrelation functions (bottom) for \(\theta _0\) and \(\theta _1\) for MATH 1070: Mathematics for Business and Economics

4 Results

We summarize the number of students, the percentage of disengaged students, and the percentage of missing marks within each course in Table 1. The percentage of disengaged students is defined as the percentage of students within a course who did not participate in any of the assessment/evaluation components after classes and assessments transitioned to online. The percentage of disengaged students varies from \(0\%\) to \(5.56\%\) across the different courses. Even though these numbers are relatively small, the implications are significant. We investigated why students disengaged after March 15, 2020. According to students’ records and responses, at least 43% of them were in need of special support and accommodations due to mental health conditions caused by concussion, severe difficulty in coping with stress, or slow information processing. Most of these students had difficulties following the course content in an online delivery mode without face-to-face interactions with the instructors. The other 57% of disengaged students were those who struggled the most with the course content. Since there were no marks available for the evaluation components for the disengaged students after March 15, 2020, we excluded them from further analysis.

The percentage of missing marks is defined as the percentage of missing evaluation components relative to all evaluation components for the remaining students in a course. These numbers vary from \(3.73\%\) to \(14.07\%\). As shown in Table 2, the percentage of missing marks is higher for the courses in which more direct hands-on support was required in the form of labs, coding, programming, and seminars than for the courses evaluated mainly using assignments, quizzes, tests, and exams.

Table 2 Number of evaluation components and percentage of hands-on (lab, coding and programming) components

4.1 Course-specific analysis

Figure 4 and Table 4 show the comparisons of overall performances (top-left panel) and three student-specific performances (top-right, bottom-left and bottom-right panels) before and after March 15 for the course MATH 1070: Mathematics for Business and Economics. The overall marks for all the students in this course went up from a median of \(66.683\%\) (\(95\%\) credible interval of 61.203–71.763%) to \(74.991\%\) (with \(95\%\) credible interval of 69.353–80.647%). Even though there is a small overlap in the \(95\%\) credible intervals for the overall marks, student-specific marks for some students increased by a large margin. The first case (Student 5) made a significant shift of marks from a failing letter grade to a passing letter grade of C (TRU grading scales are summarized in Table 3). This student was most likely failing as his/her average marks had a \(95\%\) credible interval ranging from \(47.384\%\) to \(50.470\%\). The second case (Student 14) made an even larger shift from a failing letter grade to a strong letter grade of \(A-\). The third case (Student 27) also showed a large shift from a barely passing letter grade of D to a passing letter grade of A. None of the distributions of marks before and after March 15 had any overlaps, making the shifts highly statistically significant.

Table 3 Grading scale for undergraduate academic programs at Thompson Rivers University
Table 4 Overall performances and interesting cases for MATH 1070: Mathematics for Business and Economics
Fig. 4
figure 4

Overall performances and interesting cases for MATH 1070: Mathematics for Business and Economics

Figures 5, 6, 7 and Tables 5, 6, 7 show the comparisons of overall performance and student-specific cases in each of the courses MATH 1240: Calculus 2, MATH 1250: Calculus for Biological Sciences 2, and MATH 1640: Technical Mathematics 1. After the transition to online delivery, the overall medians in these courses increased by about \(20\%\), \(8\%\) and \(7\%\), respectively. The overall increase of marks was statistically significant for the course MATH 1240. On the other hand, the increase of student-specific marks within a course was not uniform. For example, the performance of Student 5 in MATH 1240 improved by only \(1.67\%\), while Students 6 and 29 had increases in marks of \(33.66\%\) and \(29.18\%\), respectively. The latter two students shifted their marks from a failing grade to passing letter grades of \(A^-\) and B, respectively. A more or less similar trend is observed for Students 6 and 24 in MATH 1250, with a small variability in their grades. In other words, a few weaker students managed to improve their marks consistently to a significantly higher range of grades after March 15. However, some good-standing students, such as Student 26 in MATH 1250 and Student 5 in MATH 1640, experienced a decline in their marks. Note that the decline in marks for these good students was statistically insignificant. While the decline for Student 5 in MATH 1640 was practically insignificant, the decline for Student 26 in MATH 1250 was considered practically significant, as the letter grade changed from \(A^+\) to A.

Table 5 Overall performances and interesting cases for MATH 1240: Calculus 2
Table 6 Overall performances and interesting cases for MATH 1250: Calculus for Biological Sciences 2
Table 7 Overall performances and interesting cases for MATH 1640: Technical Mathematics 1
Fig. 5
figure 5

Overall performances and interesting cases for MATH 1240: Calculus 2

Fig. 6
figure 6

Overall performances and interesting cases for MATH 1250: Calculus for Biological Sciences 2

Fig. 7
figure 7

Overall performances and interesting cases for MATH 1640: Technical Mathematics 1

Figures 8, 9 and Tables 8, 9 show the comparisons of overall marks and student-specific cases for the second-year courses MATH 2200: Introduction to Analysis and MATH 2240: Differential Equations 1, respectively. MATH 2200 is a proof-based course, required for math majors and a gateway to proof-heavy math courses. After the transition to online delivery, the overall median marks increased by about \(10\%\). For most of the students the improvement was not notable. However, for some students, such as Students 5 and 9, the improvements in marks (\(41.96\%\) and \(41.28\%\), respectively) were practically and statistically significant. Both of these students’ marks moved from a failing letter grade to passing letter grades of \(B^+\) and \(C^-\), respectively. In MATH 2240, the overall marks increased by about \(11\%\) with almost zero overlap before and after COVID-19. A similar trend of growth in the marks, especially for the struggling students, was observed in this course. For instance, Students 14 and 21 showed a significant jump in their marks after March 15. On the other hand, the \(A^+\)-level Student 29 experienced a minimal change in his/her grade. This change is insignificant both practically and statistically.

Table 8 Overall performances and interesting cases for MATH 2200: Introduction to Analysis
Table 9 Overall performances and interesting cases for MATH 2240: Differential Equations 1
Fig. 8
figure 8

Overall performances and interesting cases for MATH 2200: Introduction to Analysis

Fig. 9
figure 9

Overall performances and interesting cases for MATH 2240: Differential Equations 1

Figure 10 and Table 10 display the comparisons of marks before and after March 15 for the course STAT 2000: Probability and Statistics. The overall marks went up slightly, from a median of \(65.653\%\) to \(67.675\%\). As there is an overlap between the two distributions, the increase is not statistically significant. Regarding the student-specific cases, Students 5 and 22, who were barely passing the course, had increases of about \(15\%\) and \(16\%\), respectively. Neither pair of distributions before and after March 15 overlaps; hence, the shifts are highly statistically significant. On the other hand, a few good-standing students experienced a decline in their marks from higher to lower percentages. For example, Student 12 was in the \(A^+\) grade range before March 15, while his/her marks went down significantly, by around \(11\%\), to the letter grade of \(B^+\) after March 15.

Table 10 Overall performances and interesting cases for STAT 2000: Probability and Statistics
Fig. 10
figure 10

Overall performances and interesting cases for STAT 2000: Probability and Statistics

Figures 11, 12 and Tables 11, 12 compare the overall performance and student-specific cases for the courses ARET 1400: Civil Technology 1 and ARET 2600: Statics and Strength of Materials. The overall students’ performances in these two courses declined after the transition to online delivery. In ARET 1400, the median marks decreased significantly, by \(12.14\%\), whereas in ARET 2600 the median marks decreased by \(4.93\%\). The decrease in ARET 2600 is not statistically significant. The percentage decrease or increase of marks varies from one student to another. For example, the decreases in marks for Student 29 in ARET 1400 and Student 15 in ARET 2600 were \(30.75\%\) and \(18.74\%\), respectively. On the other hand, Student 6 in ARET 1400 and Student 4 in ARET 2600 did not experience a large drop in their marks. For Student 4 in ARET 2600, the decrease is insignificant both statistically and practically. There are a few exceptions to this trend as well. For example, Student 12 in ARET 1400 and Student 12 in ARET 2600 did experience some increase in their marks after transitioning to the online delivery mode. Again, the increase of \(2.36\%\) for Student 12 in ARET 1400 was insignificant both statistically and practically.

Table 11 Overall performances and interesting cases for ARET 1400: Civil Technology 1
Table 12 Overall performances and interesting cases for ARET 2600: Statics and Strength of Materials
Fig. 11
figure 11

Overall performances and interesting cases for ARET 1400: Civil Technology 1

Fig. 12
figure 12

Overall performances and interesting cases for ARET 2600: Statics and Strength of Materials

Figures 13, 14 and Tables 13, 14 show overall performances and student-specific cases for the courses COMP 2680: Web Development and COMP 4980: Bioinformatics. Students’ performances in these Computing Science courses were negatively affected by the transition to the online delivery mode. The median of overall marks went down by \(3\%\) for COMP 2680 and by \(19\%\) for COMP 4980. The decrease is statistically insignificant for COMP 2680, but significant for COMP 4980. The marks decreased for a large number of students after March 15. For example, the performance of Students 18 and 22 in COMP 2680 decreased by \(14.86\%\) and \(33.82\%\), and the performance of Students 8 and 11 in COMP 4980 decreased by \(1.95\%\) and \(40.56\%\), respectively. On the other hand, some good-standing students were able to maintain their good performance after March 15, such as Student 16 in COMP 2680 and Student 19 in COMP 4980; their changes in marks were insignificant both practically and statistically.

Table 13 Overall performances and interesting cases for COMP 2680: Web Site Design and Development
Table 14 Overall performances and interesting cases for COMP 4980: Introduction to Bioinformatics
Fig. 13
figure 13

Overall performances and interesting cases for COMP 2680: Web Site Design and Development

Fig. 14
figure 14

Overall performances and interesting cases for COMP 4980: Introduction to Bioinformatics

5 Discussion of results

Among the 11 courses in this study, both increasing and decreasing trends in students’ marks were observed. Specifically, marks generally increased in theory-based courses requiring lower-level cognitive skills according to Bloom’s Taxonomy of Knowledge, whereas marks generally decreased in courses requiring either interactive hands-on support or higher-level cognitive skills.

5.1 Rising trend in courses requiring lower-level cognitive skills

University-level math courses are normally delivered in a traditional lecture format, with the instructor teaching core concepts and theories accompanied by related examples and applications ranging from direct applications to more conceptual and intricate ones. Student assessment is then composed of several in-class quizzes, written homework assignments, one or two midterms and a final exam, with generally more weight on summative assessments than formative ones. Since a standard first- or second-year math course is often taken by a large group of students enrolled in various university programs with a wide range of backgrounds in math, assessments in these courses mainly focus on questions of medium difficulty in order to reasonably evaluate students’ learning. In other words, according to Bloom’s Taxonomy of Knowledge, in-person math exams normally test low- to medium-level skills and abilities, with limited allocation of questions to higher-level skills such as analysis and synthesis. However, transitioning to online and open-book exams changed all of that.

The unprecedented closure of universities due to COVID-19 imposed an unexpected shock on academia, in particular on the traditional culture of course delivery and assessment in mathematics. Given the very limited time for instructors to prepare for the switch to online modes of delivery, creating online open-book tests and restructuring in-person exams to suit the online format was rather infeasible. Moreover, although some instructors did attempt to design tests relatively different from in-person exams in order to target deeper levels of understanding, they were faced with students’ complaints and resistance. This is understandable, though, on account of the lack of training throughout the semester for such exams. Students in MATH 1250, for example, even in normal circumstances, mostly fall into the struggling group with some level of math anxiety; hence it is unrealistic to expect them to perform well in an exam format they are not used to. Therefore, most MATH and STAT courses maintained a format similar to in-person exams, with the online assessments being open-book.

Nevertheless, with the availability of resources in an open-book exam, the low- and medium-level question types that target memory and comprehension skills, such as recalling, defining, describing or explaining concepts, were no longer truly examining students’ learning, as the answers can easily be found in textbooks, class notes, etc. It was observed that students performed better on these question types in the online version of the exams compared to similar face-to-face exams prior to the university closure. For instance, students in MATH 1240 are normally in the category of moderately strong science students, and their performance improved after the transition to online delivery. The improvement took place mostly because the assessment components and their structures did not change, except that the exams became open-book, and students could adapt to the new delivery method with relative ease. Similarly, in MATH 2240, a second-year course, the overall performance grew significantly, which can be partly attributed to the open-book exams maintaining a structure similar to the face-to-face ones.

Furthermore, one can consider the role of technology and online resources available to students during an online math exam. Tools were no longer limited to a basic scientific calculator: advanced online calculators and math programs, along with many online forums, were at hand during an open-book exam. For example, at the beginning of the Winter 2020 semester, all students in MATH 2200 were struggling with the course content, as writing a rigorous proof is a skill never taught in first-year courses. After about a month, many students in this class had improved their proof-writing skills and consequently improved their grades to some extent. When the course transitioned to online delivery, the assessment components remained unchanged; most of the students continued their upward trend and their overall performance improved. But this transition might have provided some weaker students the opportunity to seek other resources for answers in non-invigilated tests. This pattern can be seen for Students 5 and 9 (Fig. 8). It is also observed for Student 24 in MATH 1250 and Student 21 in MATH 2240, whose large improvements turned their grades from failing to the letter grades of \(\mathrm {C}^+\) and B, respectively. The performance of Student 29 in MATH 2240 and Student 17 in MATH 2200, however, did not change significantly after March 15. These students were among the average- to high-performing students.

The statistics course (STAT 2000) investigated in this study tests medium-level cognitive skills, as per Bloom’s Taxonomy, and falls slightly above the math courses. This course requires students to understand the methods, organize the information in the data, apply the methodology to the data to gain insightful knowledge, and provide a summary and explanation of the findings. Nurturing these skills requires some discussion between the instructor and the students. After the transition to online delivery, the support that students needed was provided to the best of the instructor’s ability, especially to the students who requested support via online meetings and discussions. As the structure of the course is very similar to the math courses, with some added applications, the overall performance of students improved slightly, by about \(2\%\). This increase in overall performance is not as large as in the math courses, which are aligned with lower-level cognitive skills. Moreover, while most of the weaker students in this course improved their grades by a large margin, the top students experienced a slight decrease in their marks despite their potential.

5.2 Decreasing trend in courses requiring higher-level cognitive skills and interactive hands-on support

In ARET courses, application and analysis of learned theories and concepts play a key role in the assessments. While assessments in math courses revolve around defining, calculating or reproducing facts pertaining to a topic, ARET courses demand students’ expertise in applying the knowledge learned to novel applications and situations. Assessments for the ARET courses investigated in this study entail developing problem-solving skills, with emphasis on analyzing and applying the concepts and principles in applied situations. These skills are categorized as medium- to high-level cognitive skills, as per Bloom’s Taxonomy. Students in these ARET courses experienced a decline in their grades after March 15 mainly because they needed hands-on and face-to-face support to develop the analytical and problem-solving skills required for the final exam. Providing such hands-on support was not feasible after March 15 and, as a result, many students in ARET 1400 and ARET 2600 did not perform well in their final exam. A few of these students happened to be among the stronger cohort of students: for example, Student 29 in ARET 1400 and Student 15 in ARET 2600.

Computing Science is known as a ‘learning-by-doing’ subject, and most COMP courses require substantial interactive support in hands-on programming and laboratories, without which it is hard for students to succeed. Table 2 shows that the Computing Science courses at TRU are more geared towards hands-on practice in lab components than the MATH and STAT courses. Of the two COMP courses investigated in this study, COMP 2680 has students learn web development skills in lectures and then practice and apply these skills in hands-on laboratories. COMP 4980 is an interdisciplinary course in which students learn how to apply computing science skills to analyze, synthesize and interpret biological data. It requires not only hands-on laboratories to practice problem-solving skills, but also support from the instructor’s domain-specific knowledge to help students connect the biological problems they are trying to solve with computational models and interpret their results both biologically and mathematically. In other words, the COMP courses investigated in this study test students’ medium- to high-level cognitive skills, as per Bloom’s Taxonomy.

Unsurprisingly, students’ performances in both courses were negatively affected by the rapid switch from face-to-face to online delivery mode. The marks dropped after March 15 for most students, as shown in Figs. 13 and 14, including students in good standing (e.g., Student 18 in COMP 2680 and Student 8 in COMP 4980) and relatively weak students (e.g., Student 22 in COMP 2680 and Student 11 in COMP 4980). A few factors potentially caused the decrease in students’ marks after March 15 in these courses. For instance, most students did better in the first half of the semester because the first few topics of the course were introductory and easy to pick up, while more difficult topics were introduced in the second half of the semester (after March 15). During face-to-face delivery, students could also get hands-on help from instructors or TAs in lectures, labs, or the Computing Science Help Center, but the hands-on labs and the Help Center were no longer accessible after March 15. For COMP 4980, there was a group project in the last three weeks of the semester, but students could not interact face-to-face with each other and did not receive the same level of support from the instructor due to the online delivery mode; therefore, some students struggled with the term project.

6 Conclusion

We report results from a moderately large-scale study of 11 courses, in which the effects of COVID-19 on students’ performance were compared with empirical rigor. This study shows that a sudden change of delivery mode has an immense impact on students’ marks. After switching to online delivery and assessment due to COVID-19, students’ marks increased in theory-based courses that required lower-level cognitive skills based on Bloom’s Taxonomy, whereas in courses with hands-on lab, coding and programming components, or courses that required higher-level cognitive skills, the marks decreased. The larger increases (for MATH and STAT courses) or decreases (for COMP and ARET courses) in marks are mainly observed for weaker students as opposed to stronger students. The group of stronger students experienced a smaller decrease in marks, while some very hard-working students were able to maintain good standing towards their credentials. The impact has been much more significant on students with special needs who disengaged from the course after March 15. We also emphasize that the COVID-19 outbreak, lockdown and closure of schools have exposed students to an extraordinary level of stress. Students faced the sudden shock of the online transition with virtually no education or training on how to take ownership of their submitted work in the online space and be accountable for it.

The authors of this paper observed similar trends in results across Canada, as discussed in many educational workshops and meetings, such as the SSC Webinar on Teaching Statistics OnlineFootnote 4 and the CMS COVID-19 Research and Education Meeting (CCREM).Footnote 5 Hence, the results of the study may also generalize across Canada. This is because most, if not all, universities across Canada follow a similar educational system and moved towards online teaching and assessment at around the same time.

Our novel contribution is the analysis and comparison of COVID-19 effects on students’ marks. In addition to this novel application, we designed and developed novel computational models: a Bayesian linear mixed effects model to fully address the comparison of marks, together with a fully Bayesian missing value imputation scheme that is novel both statistically and in its application.

In this paper, we considered a normal distribution (Eq. (2)) for the response variable of interest. As alternatives, one may wish to use other probability distributions as appropriate. For example, in the presence of unusually small or large values in student-specific data, one may wish to use a heavy-tailed non-central t-distribution. The use of alternative distributions may complicate the computation of posterior realizations when the posterior distributions are not available in closed form. In such situations, one may need to use the Metropolis or Metropolis-Hastings algorithm instead of Gibbs sampling. Alternatively, open-source MCMC software, such as JAGS [10], WinBUGS [20], or Stan [6], may come in handy to address these computational issues.

This paper considered STEM courses offered by the Faculty of Science at TRU. Consideration of more courses across multiple faculties within a university might be of interest, and such interest may also extend to multiple universities in a country or across the world. However, such augmentation of the data may require the use of a multilevel linear mixed effects model [21].