Cloud and edge based data analytics for privacy-preserving multi-modal engagement monitoring in the classroom


Learning management systems are service platforms that support the administration and delivery of training programs and educational courses. Prerecorded, real-time or interactive lectures can be offered in blended, flipped or fully online classrooms. A key challenge with such service platforms is the adequate monitoring of engagement, as it is an early indicator of a student’s learning achievements. Indeed, observing the behavior of the audience and keeping the participants engaged is not only a challenge in a face-to-face setting where students and teachers share the same physical learning environment, but even more so when students participate remotely. In this work, we present a hybrid cloud and edge-based service orchestration framework for multi-modal engagement analysis. We implemented and evaluated an edge-based browser solution for the analysis of different behavior modalities with cross-user aggregation through secure multiparty computation. Compared to contemporary online learning systems, the advantages of our hybrid cloud-edge based solution are twofold. It scales up with a growing number of students, and also mitigates privacy concerns in an era where the rise of analytics in online learning raises questions about the responsible use of data.


Student engagement is a topic that has sparked a fair amount of interest over the past decade within the e-learning research community as well as with higher education institutions. Previous research (Atherton et al. 2017; Kahu and Nelson 2018) has shown that students who are engaged with their studies are more likely to be successful. However, engagement is a complex construct. Multiple theories have been proposed in the literature and compared by Kahu (2013). In general, engagement can be understood as a mix of behavioral, cognitive and emotional (or affective) factors. In this work, we propose a multi-modal solution for analyzing behavioral engagement in online and blended (i.e. combination of traditional and online) learning environments. After quantitative self-reports by means of student engagement questionnaires, behavioral engagement is the most frequently reported style of engagement. Furthermore, behavior is easier to measure than cognitive and emotional styles of engagement.

Maintaining a continuous awareness of the engagement level of students is not only a challenge in a face-to-face classroom setting, but an even bigger one in remote settings in which students participate in interactive online lectures. The research that we present in this paper tackles the following three challenges:

  1. First, to capture behavioral engagement, the system should process audiovisual, interaction and physiological data of the audience in near-real time. This requires a considerable amount of computational resources and network capacity, which may not be readily available within or near the classroom.

  2. Second, the effectiveness of different behavioral and interaction modalities to measure the engagement of an individual or a group of students is highly context-dependent. For example, gaze and head pose estimation may not always be a good indication of whether students are paying attention, e.g. when the teacher is walking around or the web camera is ill-positioned.

  3. Last but not least, the continuous tracking and centralized analysis of sensitive personal information may invade the privacy of remote students (May and George 2011; Lorenz et al. 2013).

This remains an issue todayFootnote 1 and is a growing concern. In a Washington Post article of December 2018, Valerie Strauss (Strauss 2018) highlights why parents and students are protesting against an online learning program backed by Mark Zuckerberg and Facebook.

To address these three concerns, we present a decentralized multi-modal engagement analysis solution that benefits from the bring-your-own-device (BYOD) trend in the classroom by distributing the student engagement analysis towards the client devices (e.g. laptops of students). Not only do the students get insights into their own individual engagement levels, but the values are also aggregated over all participants and reported to the teacher in near real-time, so that the latter can act accordingly during interactive lectures, e.g. by launching a poll or quiz.

Our edge computing software framework analyzes different kinds of student behaviors across different online learning contexts. It is implemented as a JavaScript based web browser extension to simplify the deployment and upgrade cycle of our engagement analysis software framework across different operating systems and web browsers. Our decentralized framework federates the data analytics, machine and deep learning, not only to easily scale up with a growing number of students, but also to mitigate any privacy concerns due to the continuous student behavior tracking.

The main contributions of this research are the application of well-known and proven enabling technologies for decentralized data analytics, as well as best practices for preserving privacy, and this in a non-trivial educational context and real-world use case. The contributions can be summarized as follows:

  1. A multi-modal engagement model that runs in the cloud and at the edge to easily scale with a growing number of participants

  2. Privacy enhanced analysis of student engagement using federated behavior data processing and the application of secure multi-party computation

  3. Performance and practical feasibility assessment using real-world data collected in the classroom

While the distributed deployment of the different enabling technologies resembles a typical client-server architecture, the data can be processed and stored close to the location where it is used (e.g. the client), as well as in resource-richer environments (e.g. the server). The deployment strategies – i.e. the decisions of where to process certain data – are mainly driven by privacy and computational complexity constraints. It is this flexibility to shift parts of the computations elsewhere, and the ability of the student to decide which data may be analyzed, that we believe is the main novelty of our hybrid cloud and edge based data analytics framework for engagement monitoring.

This research extends previous work reported in (Preuveneers and Joosen 2019), and was carried out in the frame of the LECTURE+ project,Footnote 2 which aims to research, design and evaluate a next-generation platform that improves decision support for teachers, room operators and students to enhance learner engagement in face-to-face and remote lectures. For this study, 25 students and various researchers participated in field trials during real-world lectures, during which browser-based engagement data was captured and analyzed. A detailed description of the data will be provided in a later section.

The remainder of this paper is structured as follows. In section 2 we review relevant related works on behavioral engagement monitoring. Our edge-based privacy-preserving multi-modal engagement monitoring solution and its building blocks are presented in section 3. In section 4, we discuss the experimental results with our framework from a performance and privacy perspective, and compare the outcome with the related work. Finally, section 5 summarizes the main results and gives concluding remarks.

Related work

As our research focuses on the application of edge computing in an online learning context, we mainly discuss relevant related work on engagement analysis to then investigate the feasibility for a scalable and real-time solution in an edge computing deployment, and implications from a privacy point of view.

Learning analytics for engagement analysis

The domain of higher education (Daniel 2016) is exploring big data and learning analytics technology to (1) identify useful data, and (2) transform that data to identify useful patterns and deviations from patterns in real-time for educational decision making. Such learning analytics systems can engage students to create new insights, investigate and compare models on learning factors, and detect and appreciate insights generated by their peers. We refer to the work by Henrie et al. (2015) in which the authors review relevant related work on measuring student engagement. They argue that consensus is needed for the definition and operationalization of student engagement. While most technology enhanced learning research relies on the self-reporting of engagement, they conclude that physiological and systems data offer an alternative method to measuring engagement, but that more research is necessary to assess their value.

Rather than using quantitative self-reporting with student engagement questionnaires, Thomas et al. (2017) relied on facial expressions, head pose and eye gaze using computer vision techniques, as well as machine learning methods to analyze the engagement of students in a classroom setting. Their method relies on the analysis of 10 s video clips recorded in a 1920 × 1080 resolution at 25 frames per second. From these videos, they extracted a 30-dimensional feature vector of which 27 features were statistically significant. They tested Support Vector Machine (SVM) and Logistic Regression machine learning methods to classify the engagement level. They were able to improve the accuracy by up to 10% over a baseline classifier (which assigns everything to the majority class). Contrary to our work, the evaluation of the individual video clips was carried out offline, rather than in a real-time manner. Also, the high-end video equipment may not be readily available to students.

Engagement in a collaborative learning setting supported by the Slack communication tool was explored by Zhang et al. (2017). They investigated to what extent the openness and visibility of each group member’s contribution on Slack can promote students’ engagement. Their empirical study shows that mutual trust, social influence and reward valence have positive influence on teamwork engagement. Furthermore, this engagement has positive effects on personal success in terms of learning and work satisfaction. Our solution analyzes interaction events in collaboration hubs as well. However, it is not tied to any specific online communication platform, as long as students interact through a browser.

More recently, Gray et al. (2019) evaluated several candidate machine learning methods based on a variety of student attendance and engagement information to produce predictions of student outcome. They demonstrate how students can be identified as early as week 3 with approximately 97% accuracy. Our contribution depends on machine learning methods as well, with deep learning methods for facial expressions being the most sophisticated. Our contribution focuses on the technical challenges in terms of performance to monitor engagement in near-real time. Rather than using the data to predict student outcomes, our goal is to be able to adapt the lecture to retain student engagement.

Data analytics with edge and fog computing

With the proliferation of Internet of Things devices, edge (Shi et al. 2016; Satyanarayanan 2017) and fog computing (Bonomi et al. 2012; Bonomi et al. 2014) have emerged as viable paradigms to process many data streams and push the communication, computation, control and storage resources from the cloud to the edge of the network. Cao et al. (2017) proposed an edge computing platform to uncover meaningful patterns from real-time transit data streams and generate actionable insights. Their platform supports mobile edge nodes and adopts descriptive analytics for monitoring bus routes.

Similar work was proposed by He et al. (2018). They present a multi-tier fog computing platform based on Raspberry Pi devices that consists of ad-hoc fogs and dedicated fogs with opportunistic and dedicated computing resources. The platform offers functionalities for QoS aware admission control, offloading and resource allocation schemes. The application area of the platform is smart city analytics services, and experimental results demonstrate that fogs can largely improve the performance of such services compared to when they are deployed in the cloud.

A detailed discussion of the many platforms presented in the literature is beyond the scope of this work. We refer interested readers to the following application overviews and surveys (de Assuncao et al. 2018; Yi et al. 2015; Abbas et al. 2018).

Data analytics and privacy

With the collection and processing of large amounts of data come severe security and privacy challenges. Roman et al. (2018) analyze the security threats, challenges, and mechanisms inherent to the edge computing paradigm.

To address privacy concerns with data analytics, various theoretical frameworks have been proposed – including k-anonymity, l-diversity, t-closeness (Li et al. 2007), and differential privacy (Dwork 2011) – to mitigate the leakage of potentially sensitive information during data analytics. In (Mohan et al. 2012), Mohan et al. presented and evaluated GUPT, a system for privacy preserving data analysis for applications not designed with privacy in mind (Danezis et al. 2015). GUPT enables differential privacy while minimizing the loss in output accuracy, which makes it suitable for a wide variety of data analysis programs while providing both utility and privacy. More recently, Abadi et al. (2016b) illustrated that it is feasible to train deep neural networks with differential privacy, making sure that the learned models do not expose private information from the datasets. Their experimental results show that it is feasible with an acceptable overhead in software complexity, training efficiency, and model quality.

Orthogonal to these approaches are the privacy enhancing technologies (PETs) that exploit cryptographic protocols – such as secure multi-party computation (Cramer et al. 2015; Holzer et al. 2012; Prabhakaran and Sahai 2013) or homomorphic encryption (van Dijk et al. 2010) – to gain insights without having access to the complete datasets or to outsource data analytics to the cloud. An illustration of the feasibility of secure multi-party computation is the work by Ma et al. (2018). They presented a framework for privacy-preserving multi-party deep learning in cloud computing. The training data is shared across many parties, and the privacy of the local dataset as well as the learned model are protected against the cloud server. An example of the second approach is described in (El-Yahyaoui and El Kettani 2017), in which El-Yahyaoui et al. leverage fully homomorphic encryption to search over encrypted data in a cloud context. The following recent surveys (Acar et al. 2018; Beimel 2011; Shan et al. 2018) provide more background on these cryptographic protocols and their applications.

Bridging the gap

The gap that we aim to address with this work is not a further improvement of the accuracy for one of the individual classification methods, nor designing novel cryptographic protocols, but rather providing an integrated framework that offers the same multi-modal engagement monitoring capabilities as in the related work in a unified manner. Our goal is to offer a practical solution that is easy to deploy and for which the feasibility of near-real time analysis is demonstrated in a realistic online or blended learning environment, while at the same time addressing performance, scalability and privacy challenges that were not considered as key concerns in the related work.

An edge-based framework for privacy-preserving engagement analysis of multi-modal behaviors

This section first elaborates on a motivating scenario from which the requirements regarding multi-modal engagement analytics are elicited. We then continue with the identification and implementation of the individual building blocks of our edge-based framework for privacy-preserving engagement analysis.

A blended learning motivating scenario

As motivating use case, we consider a blended learning (Gong et al. 2018) scenario as a combination of face-to-face education with distance learning, where the latter is offered via an online learning environment in which the student can access digital material, web lectures and other external digital tools and resources. Contemporary video streaming solutions allow remote students to follow courses in much the same way as in a face-to-face setting. The computer assisted learning platform allows both remotely and face-to-face participating students to engage in interactive quizzes, polls and assignments launched by the teacher. Online tools and resources offered by third parties can further augment the learning experience. These may include communication tools, such as Slack (Zhang et al. 2017) for collaboration on assignments and exchanging ideas, simulations for a physics course, online translation tools for language courses, etc. In such a scenario, measuring the engagement of a student is not straightforward for two reasons:

  1. The learning context continuously changes, and a good engagement indicator in one learning context may not be adequate in another context.

  2. Online content and resources of third parties cannot be modified to capture relevant interaction events that are meaningful to measure engagement.

Regarding the first observation, we not only need to distinguish between on-task and off-task behavior (Baker 2007; Cetintas et al. 2009), but also identify the appropriate modality for measuring engagement during on-task behavior. For example, the webcam (Thomas and Jayagopi 2017) may detect whether a remote student is paying attention and looking at the screen during a video stream of the teacher. However, the webcam is not a good engagement indicator when the student is participating face-to-face, or watching irrelevant YouTube videos while the lecture is streaming in another browser tab. It is also not a useful indicator during an interactive quiz, as a webcam can only monitor facial expressions but not the interaction with the quiz itself. Mouse and keystroke dynamics may provide insights into engagement during an interactive quiz or some online simulation tools, but not when the student is chatting with friends on Facebook.

The second observation is particularly a concern for courses that rely on online resources (e.g. collaboration tools, interactive simulations, scientific articles) offered by third parties. An online learning platform can be augmented with logging functionality to not only audit the learning progress of the student, but also the participation in, for example, polls and quizzes. However, this is not easily achieved with third party resources if they do not offer any integration capabilities to monitor student interactions in a fine-grained manner. Furthermore, if such functionality were available, care must be taken to not invade a student’s privacy when such online resources are used both during and outside lectures (e.g. YouTube videos or Slack communication).

Edge-based multi-modal engagement analytics

To support teachers with retaining the engagement of their students, our engagement analytics framework (as depicted in Fig. 1) must be able to analyze, in real time, heterogeneous static data silos (e.g. user profile data of students) and dynamic data streams (e.g. behavioral and physiological measures, click streams, evolving collaborations with peers). For multi-modal learning and engagement analytics, a key challenge is that different end-points are responsible for collecting and preprocessing the raw engagement data (e.g. audiovisual data from cameras and click streams from interactions with online applications and content). That is why our edge computing solution leverages the standardized extension capabilities of the web browser to analyze the behavior of the student:

  1. Keyboard usage

  2. Mouse motion events and clickstream data

  3. Web camera

  4. Browser tab activations, updates and removals, and website snapshots

  5. Browser information and window focus listener

Fig. 1

Technology-enhanced multi-modal engagement monitoring of students in blended learning environments

The main benefits of our browser extension are that it (1) can analyze on- and off-task behavior on the client via the browser of the student across different operating systems and web browsers, (2) reduces the cloud-based workload by shifting computations towards the client, (3) measures the level of engagement in an application- or website-agnostic manner, which simplifies integration with third party resources, and (4) avoids the transmission of privacy-sensitive information to a centralized virtual server for analysis.

All business logic to monitor and analyze engagement is implemented in cross-platform JavaScript code. It runs in the background as part of the browser extension (see Fig. 2), and does not interfere directly with visited websites (i.e. not in the way that ad blockers do). Our browser extension maintains a list of URLs of websites that are considered on-task for each course. After installation of the browser extension, the student can configure and grant consent to which events can be captured and analyzed. Our browser extension has been tested in recent versions of the Google Chrome, Mozilla Firefox and Opera browsers. More details about the browser extension are available at

Fig. 2

Browser extension for multi-modal and cross-site behavior engagement analysis

User interaction through keyboard usage, mouse motion events and clickstream analysis

Both the keyboard usage and mouse interactions give an explicit indication of user activity. For privacy reasons, we discard the actual keys pressed, as it could otherwise incidentally capture sensitive information such as private communications or passwords. Mouse motion and click events only contain x,y coordinates. Neither type of event holds any website-specific details (e.g. buttons or input fields). The derived interaction features are aggregated over 10 s tumbling (i.e. non-overlapping) window intervals and interpreted in context with respect to the active URL.
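In a minimal stand-alone sketch, the tumbling-window aggregation could look as follows. Note that the event shapes and feature names (`keystrokes`, `mouseMoves`, etc.) are illustrative, not the extension’s actual data model:

```javascript
// Aggregates raw interaction events into non-overlapping 10 s windows.
// Only counts are kept -- never the actual keys pressed.
function aggregateInteractions(events, windowMs = 10000) {
  const windows = new Map(); // window start timestamp -> feature object
  for (const e of events) {
    const start = Math.floor(e.timestamp / windowMs) * windowMs;
    if (!windows.has(start)) {
      windows.set(start, { keystrokes: 0, clicks: 0, mouseMoves: 0, url: e.url });
    }
    const w = windows.get(start);
    if (e.type === 'keydown') w.keystrokes += 1;       // key identity discarded
    else if (e.type === 'click') w.clicks += 1;        // only x,y coordinates kept
    else if (e.type === 'mousemove') w.mouseMoves += 1;
  }
  return [...windows.values()];
}
```

Each resulting feature object can then be interpreted in context with respect to the active URL of its window.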

Head pose estimation via the webcam

If available, the webcam is used for real-time head pose estimation. Our browser extension uses a deep learning model similar to (Toshev and Szegedy 2014) for facial landmark detection. To evaluate the model within the web browser, it was re-implemented with TensorFlow.js,Footnote 3 a low-level JavaScript implementation of the TensorFlow deep learning library (Abadi et al. 2016a). Our browser extension only uses a subset of the landmarks for presence detection and eye-gaze estimation. Our solution uses 5 landmarks (nose, eyes and ears) after calibration to identify whether a person is looking at the screen or not (see Fig. 3). From a practical point of view, note that webcams usually do not offer the same high-end capturing capabilities as the equipment used in (Thomas and Jayagopi 2017).
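To give an idea of how a looking-at-screen decision can be derived from such landmarks, the following is a simplified, hypothetical symmetry check using only the nose and ear landmarks; the real extension calibrates per user at startup and also uses the eye landmarks, and the threshold below is an assumption:

```javascript
// Decides whether a user is roughly facing the screen, based on landmarks
// given as {x, y} points. A frontal face has near-symmetric horizontal
// distances from the nose to each ear; a turned head does not.
function isLookingAtScreen(lm, threshold = 0.35) {
  const dLeft = Math.abs(lm.nose.x - lm.leftEar.x);
  const dRight = Math.abs(lm.rightEar.x - lm.nose.x);
  const faceWidth = Math.abs(lm.rightEar.x - lm.leftEar.x);
  if (faceWidth === 0) return false; // degenerate or failed detection
  const asymmetry = Math.abs(dLeft - dRight) / faceWidth;
  return asymmetry < threshold;
}
```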

Fig. 3

Calibration of face orientation based on 5 landmarks (nose, eyes and ears)

Browser tab activity for context-dependent interpretation of engagement indicators

The browser extension keeps track of whether the web browser or another application is running in the foreground. Furthermore, it monitors which browser tab is currently active and which website/URL is being visited. This way, the active browser tab can indicate whether the student is interacting with (1) the main online learning platform (e.g. slides, assignment, poll, quiz), (2) an external resource associated with the course (e.g. a collaboration hub such as Slack) or (3) a website considered as off-task (e.g. social media websites).

The tab snapshots (i.e. a snapshot image of the website) are used to compute a perceptual hash (Yang et al. 2006) of the website. The advantage of these hashes is that the Hamming distance between two hashes indicates how far visually apart two images are, irrespective of their resolutions. If only a few bits of the hash differ, it is a good indication that the two images are nearly identical. Our browser extension relies on perceptual hashes when the URL of a website is not a good discriminator (e.g. in the case of dynamic web content). This way, the websites that individual students visit can be automatically compared for similarity without disclosing the actual current content of the website. This further improves the privacy of the engagement monitor.
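The comparison of perceptual hashes boils down to a bitwise Hamming distance. A minimal sketch, assuming the hashes are hex strings of equal length and with an illustrative similarity threshold:

```javascript
// Bitwise Hamming distance between two perceptual hashes (hex strings).
function hammingDistance(hashA, hashB) {
  if (hashA.length !== hashB.length) throw new Error('hash length mismatch');
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    let diff = parseInt(hashA[i], 16) ^ parseInt(hashB[i], 16);
    while (diff) { distance += diff & 1; diff >>= 1; } // count differing bits
  }
  return distance;
}

// Two page snapshots are considered visually similar when only a few
// bits of their hashes differ; the cut-off of 4 bits is an assumption.
function similarPages(hashA, hashB, maxBitsDifferent = 4) {
  return hammingDistance(hashA, hashB) <= maxBitsDifferent;
}
```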

Context-dependent aggregation of engagement indicators

The teacher can influence the importance of each of the engagement indicators during the lecture. By launching a quiz or poll, students are expected to interact with the online learning platform. During these brief periods, the explicit user interaction events have a higher weight compared to the webcam analytics. The assumption is that in order to interact with the poll or quiz, the student has to look at the screen anyhow. When the teacher is presenting the content of the course or drawing on the blackboard, the expected level of interaction by the students will be limited (e.g. raising hand to ask a question). In that case, the webcam will be the only source of information to confirm the student is engaged, unless off-task behavior – such as interaction on social media – already provides strong indication that the student is not engaged in the lecture.

The browser extension computes an aggregated score between −1.0 and 1.0 for each 10 s interval. Every second, it checks for off-task and on-task behavior, and computes the engagement score as the cumulative sum as follows:

$$ \mathrm{score}=\sum_{t=1}^{10} w_t \qquad \text{with}\quad w_t=\begin{cases} -0.1, & \text{off-task behavior}\\ 0.0, & \text{no engagement events}\\ 0.05, & \text{webcam engaged}\\ 0.1, & \text{webcam and interaction engaged} \end{cases} $$

This simple cumulative weighted sum is fairly easy to extend with new engagement indicators (e.g. physiological data). To analyze or visualize longer term engagement trends, the above score is further aggregated in a 5 min sliding interval (i.e. the normalized sum of the last 30 scores). The actual values for the duration of the tumbling interval (∆t = 10 s), the sliding interval (∆t = 5 min), and the weights wt for (dis)engaged behavior can be customized, though they should be consistent across all students.
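The scoring scheme can be sketched directly in JavaScript. This is a stand-alone illustration; the per-second state fields such as `webcamEngaged` are hypothetical names:

```javascript
// Per-second engagement weight, matching the cumulative sum above.
function secondWeight(state) {
  if (state.offTask) return -0.1;
  if (state.webcamEngaged && state.interactionEngaged) return 0.1;
  if (state.webcamEngaged) return 0.05;
  return 0.0; // no engagement events
}

// One score per 10 s tumbling window: the sum of the per-second weights,
// yielding a value between -1.0 and 1.0.
function windowScore(states) {
  return states.reduce((sum, s) => sum + secondWeight(s), 0);
}

// Longer-term trend: normalized sum of the most recent 30 window scores,
// i.e. a 5 min sliding interval.
function trendScore(recentScores) {
  const last = recentScores.slice(-30);
  return last.reduce((a, b) => a + b, 0) / last.length;
}
```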

The browser extension provides a visualization of the above values, as well as a slider for users to self-report their engagement level. This value is currently logged but not taken into consideration for cross-student analysis, as it is not trivial to calibrate the self-reported engagement levels.

Server-side aggregation of engagement streams with different data processing pipelines

The browser-based multi-modal engagement analysis is complemented with a server-side backend (see Fig. 4) that aggregates the high-level engagement scores of the individual students. It offers a WebSocket interface to easily integrate the high-level aggregated event stream into a dedicated visualization dashboard for the teacher. When overall engagement drops, it may be time for the teacher to launch another quiz or poll, and the system also makes it possible to evaluate the effect of this intervention on students’ engagement.

Fig. 4

Cloud-based aggregation of student engagement streams on top of the Spring Cloud Data Flow framework

Furthermore, the backend implements the same multi-modal engagement analysis functionality available in the browser. This allows us to systematically compare the performance gains of analyzing the raw data streams on the server versus shifting (part of) the processing workload towards the student’s browser. The performance and throughput of the event stream processing is visualized by means of an online dashboard, as depicted in Fig. 5.

Fig. 5

Visualization of performance and throughput through online dashboard

Secure cross-user aggregation of engagement scores

Our edge-based framework further improves the privacy of its users by aggregating the engagement scores of the individual students in a privacy preserving manner so that a teacher only has an aggregated engagement score for the entire audience. This is achieved by summing the engagement scores by means of secure multiparty computation. Secure multi-party computation (Ben-David et al. 2008) allows computations on encrypted values. This way it is possible to compute an average engagement score without any individual group member revealing her personal engagement score to the others. We make use of the JIFF JavaScript libraryFootnote 4 to compute the overall engagement level.
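To illustrate the principle behind this secure sum (not JIFF’s actual API), the following sketch uses additive secret sharing over a prime field, assuming engagement scores have been scaled to small non-negative integers. Each student splits their score into random shares, one per aggregating party; each party only ever sees meaningless random shares, yet the combined partial sums reveal the class total:

```javascript
// Illustrative additive secret sharing; a simplified sketch of the
// principle the JIFF library implements, not its real interface.
const PRIME = 2147483647n; // prime modulus (2^31 - 1)

// Split one score into numParties shares that sum to it modulo PRIME.
function shareSecret(value, numParties) {
  const shares = [];
  let sum = 0n;
  for (let i = 0; i < numParties - 1; i++) {
    const r = BigInt(Math.floor(Math.random() * 1e9)) % PRIME;
    shares.push(r);
    sum = (sum + r) % PRIME;
  }
  shares.push(((BigInt(value) - sum) % PRIME + PRIME) % PRIME);
  return shares;
}

// Each party p sums the shares it receives from every student; combining
// only these partial sums yields the total without exposing any score.
function secureSum(allScores, numParties) {
  const partials = new Array(numParties).fill(0n);
  for (const score of allScores) {
    shareSecret(score, numParties).forEach((sh, p) => {
      partials[p] = (partials[p] + sh) % PRIME;
    });
  }
  return Number(partials.reduce((a, b) => (a + b) % PRIME, 0n));
}
```

Dividing the resulting total by the number of participants gives the average engagement score for the entire audience.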


This section reports on various qualitative and quantitative evaluation metrics of our solution, identifying performance and impact trade-offs. We measure the overhead of the engagement analysis at the client-side, and carry out a systematic performance analysis of centralized versus edge-based engagement analysis on the server – hosting an Intel Core i7-7700 CPU @ 3.60GHz with 32GB of memory – for a growing number of students.

Performance impact on server for centralized and edge-based engagement analysis configurations

We collected real-world interaction traces of student behavior during various interactive lectures.Footnote 5 The data being collected by the web browser extension consists of the following types of information and events:

  • Personal details: The first and last name of the individual, as well as the email address.

  • Consent: Explicit consent from the individual to collect and process information, as well as the selection of events that may be processed.

  • Device configuration: Name and version of the web browser, as well as the operating system on top of which it is running.

  • Self-reported engagement events: The individual can provide a score, from −2 to 2, indicating the level of (dis)engagement.

  • Browser focus events: Events indicating whether the web browser gains focus from or loses focus to another application running on the individual’s device.

  • Browser tab events: Events indicating when a browser tab was created, deleted, activated, or updated when visiting a new website.

  • Keyboard events: The amount of keystrokes in a given fixed time interval (e.g. every 10 s).

  • Mouse events: The amount of mouse clicks and mouse motions in a given fixed time interval (e.g. every 10 s).

  • Head pose events: Events every second indicating whether the individual is looking at the screen, looking away, or is not present (see Fig. 3).

For privacy reasons, we only capture the number of keyboard and mouse events, but not their details. So, it is not possible to reconstruct what was typed, or where on the website the individual clicked. We correlate these events with the URL of the website. Again, for privacy reasons, we check whether the website is white-listed as on-task (as an indication the website is relevant for the ongoing lecture), and classify it into 5 different categories: on-task, news site, social network, search engine, or other. After classification, the original URL is discarded.
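A hypothetical sketch of this classification step is shown below; the domain lists are illustrative, and in practice the on-task whitelist is configured per course. Only the resulting category survives, the URL itself is discarded:

```javascript
// Maps a visited URL to one of the five categories, then discards the URL.
function classifyUrl(url, onTaskWhitelist) {
  const host = new URL(url).hostname.replace(/^www\./, '');
  const matches = (d) => host === d || host.endsWith('.' + d);
  if (onTaskWhitelist.some(matches)) return 'on-task';
  if (['facebook.com', 'twitter.com', 'instagram.com'].some(matches)) return 'social network';
  if (['bbc.com', 'cnn.com', 'nytimes.com'].some(matches)) return 'news site';
  if (['google.com', 'bing.com', 'duckduckgo.com'].some(matches)) return 'search engine';
  return 'other';
}
```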

From the 25 students that consented to have their data collected, we filtered the top 3 that created the most events, and used those traces to create a baseline for a “worst-case scenario” from a performance point of view. These traces were replicated and replayed to simulate from 10 up to 500 concurrent students to analyze the scalability of the back-end of our solution. Figure 6 compares the performance impact in terms of CPU load on the server for both an edge-based and a centralized data processing pipeline. For the centralized variant, we distinguish two variations where the deep learning model for head pose detection is either run on the CPU or delegated to a CUDA card (more specifically, an NVIDIA Titan V graphics card). It is clear that the edge-based configuration scales to many more users (the maximum capacity of the server was not reached at 500 concurrent students). Furthermore, by sending the raw data to the server in the centralized scenario, the network bandwidth usage is more than 50 times higher due to the need to stream webcam images to the server. The additional overhead is less significant if the webcam video of each student is streamed to all participants anyhow (e.g. via the WebRTC protocol). However, in such a scenario, the bottleneck in terms of concurrent users is the network capacity on the server.

Fig. 6

Server-side performance impact in cloud

Performance impact on edge device for client-side engagement analysis

Given that the deep learning component for head pose estimation and eye-gaze analysis is computationally the most intensive, we tested the practical feasibility of this component with Google Chrome 69 running on two Dell laptops, the oldest being a Dell Latitude E6330 of more than 5 years old and the newest a Dell Latitude 7480 of more than 1 year old:

  • 5 year old laptop: Dell Latitude E6330 with Intel Core i5-3320M CPU @ 2.60GHz, 8 GB memory

  • 1 year old laptop: Dell Latitude 7480 with Intel Core i7-7600U CPU @ 2.80GHz, 32 GB memory

For a 640 × 480 webcam configuration, the maximum framerate including face landmark detection varied between 12 and 22 frames per second, but for engagement analysis one frame per second suffices. Neither of the laptops relied on CUDA hardware acceleration to evaluate the deep learning models. Figure 7 provides a breakdown of the computational overhead of the browser extension into the individual engagement indicators. These values were obtained through the built-in task manager of Google Chrome, which makes it possible to measure the memory footprint, CPU and network usage of each browser tab and extension. It is clear that the JavaScript-based TensorFlow model for head pose estimation has the biggest overhead. TensorFlow.js can use a GPU to accelerate math operations, so if a GPU is available on the device, we expect the performance overhead on the client to be significantly lower.

Fig. 7

Client-side performance impact (average CPU load over 1 min) on edge device, measured through task manager of Google Chrome browser

Impact and limitations for student engagement

Complementary to this work, and within the frame of the same ICON LECTURE+ research project, Raes et al. (2020) previously investigated the effect on students' engagement of technology-enhanced quizzes launched throughout the online education platform. The interactions of the students with this education platform are also monitored by our browser-based solution.

The authors identified a positive effect of the quizzes on the students' motivation, but concluded that further research is necessary to validate the impact of different kinds and timings of quizzes. The authors also acknowledge that hybrid virtual classrooms are a promising educational environment for flexible learning, but that they remain challenging to teach in and to learn in as a remote participant.

One shortcoming of the web browser extension is its currently limited support for the notion of synchrony as an indicator of engagement. Kawamura et al. (2019) recently investigated engagement based on the synchrony of head movements in e-learning scenarios. Changes in appearance, such as looking away, may indicate that a student is disengaged. However, when all students look away at the same time and in the same direction, this synchrony may instead indicate that the teacher is walking around. In an online environment, students may be looking away while working on a pen-and-paper assignment. In both situations, the students are in fact engaged, but an individual assessment may lead to contradictory results. Indeed, synchrony is an engagement indicator that operates at the group level rather than individually. As head pose and movements are currently analyzed within the browser for privacy reasons, novel techniques are needed to use synchrony as a collective engagement indicator without jeopardizing the individual's privacy.
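A group-level synchrony check could, in its simplest form, look as follows. This is a sketch of the idea only; the window size, coarse gaze labels and majority threshold are assumptions, and the privacy-preserving aggregation of the per-student directions is exactly the open problem noted above:

```javascript
// Detect whether a clear majority of students look away in the SAME
// direction at the same moment (a hint of teacher movement or a shared
// off-screen task), rather than individual disengagement.
// `directions` holds one coarse label per student, e.g. "screen", "left".
function isSynchronousLookAway(directions, threshold = 0.7) {
  const counts = {};
  for (const d of directions) counts[d] = (counts[d] || 0) + 1;
  for (const [dir, n] of Object.entries(counts)) {
    if (dir !== "screen" && n / directions.length >= threshold) {
      return dir; // synchronized away-gaze in this direction
    }
  }
  return null; // no synchronized away-gaze detected
}
```

Individual "looking-away" events that coincide with a synchronous episode could then be discounted instead of lowering each student's engagement score.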

Security analysis and threat model

The web browser extension collects and analyzes a variety of multi-modal event data and can therefore be considered fairly invasive. An immediate concern is whether the web browser extension may be abused and therefore poses a security risk. To analyze this risk, we conceive a threat model with either an honest-but-curious party or a malicious party. The former only aims to intercept possibly sensitive information, whereas the latter also manipulates the data. The parties to be considered in the given scenario are (1) the student, (2) the teacher, and (3) a curious or malicious website.

The web browser extension interacts with any website to intercept useful events, but a website cannot directly interact with the extension to intercept or exfiltrate sensitive data. It may, however, use JavaScript code to emulate mouse click or keyboard events that may be mistakenly perceived by the web browser extension as genuine student interaction. However, only events collected at white-listed URLs contribute positively to a student's engagement level, whereas all other websites have a negative influence. As such, a malicious website would not have a significant impact on the engagement level, as the website would not be white-listed.
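One additional browser-side defence worth noting (not part of the original design, so purely a sketch): the DOM exposes a read-only `Event.isTrusted` property, which is `true` for events generated by genuine user input and `false` for events dispatched from page scripts. An extension could use it to ignore emulated events:

```javascript
// Count an input event only if the browser marks it as genuine user input.
// Events synthesized by page JavaScript carry isTrusted === false.
// (Sketch; the real extension's counting logic may differ.)
function countIfGenuine(event, counters) {
  if (!event.isTrusted) return counters; // ignore script-emulated events
  counters[event.type] = (counters[event.type] || 0) + 1;
  return counters;
}
```

This does not stop a student who modifies the extension itself, but it neutralizes the specific attack in which a web page fabricates clicks and keystrokes.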

The teacher cannot interact with the software running client-side in the student's web browser. However, anyone with the necessary privileges to interact with the back-end infrastructure may be able to manipulate the aggregated data. In theory, such a teacher could modify the engagement results, but such interventions would defeat the purpose of the framework, which is to help the teacher keep the students engaged. The goal of the framework is not to evaluate the lecturing skills of the teacher.

From a fraud point of view, the student is still able to cheat, as the software runs on their own device and the code of the browser extension could be modified to send fabricated data. As there is no way to make client-side code tamper-proof, this risk is difficult to mitigate without proper hardware support, such as Trusted Execution Environments (TEEs) like Intel SGX. Previous work by Krawiecka et al. (2018) proposed a solution, called SafeKeeper, to securely store web passwords using TEEs. A similar solution would be required to make the engagement analysis tamper-proof, whereby the client-side code that analyzes engagement would need to run within the TEE. However, even in that scenario, there is still an opportunity for a student to cheat using an additional web browser extension that emulates fake interaction events for the white-listed websites.

Privacy impact analysis

Compared to online learning analytics solutions that centralize and process all raw data in the cloud, our edge-based system provides the following benefits:

  • The user remains in control of which data is collected, and must provide consent before any data is captured.

  • The individual engagement score is computed on the user’s client. No sensitive data (e.g. keystrokes, webcam or website snapshots) are sent to the cloud.

  • The cross-user engagement score is aggregated by means of secure multi-party computation so that individual scores are not revealed to others.
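To make the last point concrete, the following is a minimal additive secret-sharing sketch of the cross-user aggregation idea. It is illustrative only, not the production protocol: the modulus, the share generation and the trust assumptions (non-colluding aggregators) are all simplified:

```javascript
// Additive secret sharing over a prime field (sketch).
const P = 2147483647n; // an arbitrary large prime modulus

// Split one engagement score into n shares that sum to the score mod P.
// Each share individually reveals nothing about the score.
function share(score, n) {
  const shares = [];
  let sum = 0n;
  for (let i = 0; i < n - 1; i++) {
    const r = BigInt(Math.floor(Math.random() * 1e9)) % P;
    shares.push(r);
    sum = (sum + r) % P;
  }
  shares.push(((BigInt(score) - sum) % P + P) % P);
  return shares;
}

// Each of the n aggregators sums the i-th share of every student; the
// per-aggregator totals combine into the group sum without any party
// ever seeing an individual score.
function reconstructSum(perStudentShares) {
  const n = perStudentShares[0].length;
  let total = 0n;
  for (let i = 0; i < n; i++) {
    let partial = 0n;
    for (const shares of perStudentShares) partial = (partial + shares[i]) % P;
    total = (total + partial) % P;
  }
  return Number(total);
}
```

In practice a dedicated MPC framework would be used, but the sketch captures why the aggregate can be published while individual scores remain hidden.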

While efficiently preserving students' privacy, our approach reduces engagement monitoring to a single aggregated value per student or per group of students. Yet, student engagement is regarded as a complex construct built upon multiple factors. Given that teachers hold a prominent position in shaping levels of engagement to improve student outcomes (Fredricks et al. 2004), leveraging richer data might boost their ability to provide personalized feedback to their students. In an online scenario, this translates into exposing possibly sensitive student data for the sake of conveying richer information and creating a better characterization of student engagement. For example, automatic engagement monitoring from facial expressions has been regarded as a potential future asset in remote learning environments (Monkaresi et al. 2017; Whitehill et al. 2014). Whitehill et al. (2014) used state-of-the-art machine learning algorithms employed in emotion recognition to classify the levels of user engagement, finding a positive correlation between the automatic recognition and student outcomes after receiving feedback.

Enhancing our current solution with emotion recognition requires the students to collaborate in training the model, which can be done in two ways. The first is decentralized training of a common machine learning-based engagement recognition model, which allows the students to keep their data local while contributing to the process by only sharing incremental training updates or model parameters. Federated learning embodies this concept and enhances privacy by protecting sensitive data (Konečný et al. 2016). However, it has been proven vulnerable to data leakage under certain attack scenarios (Melis et al. 2018). Additionally, adopting federated learning precludes central pre-processing, such as teacher-aided labelling of video frames, ultimately affecting the classification outcome. The second solution consists of sharing video frames with a central node which is in charge of training the engagement recognition model based on all students' input. However, sharing sensitive raw input might lead to various privacy issues: (1) a face image represents a biometric trait from which a number of personal details can be deduced, a harbinger of function creep by curious third parties (Dantcheva et al. 2016); (2) sensitive data might leak through an untrusted communication channel or due to mismanagement by the service provider. By anonymizing data upon sharing, we can protect against the aforementioned threats. Traditional techniques, like differential privacy, provide formal guarantees w.r.t. private data release at the expense of a drop in performance (Abadi et al. 2016b). Conversely, context-aware techniques involving deep feature extraction may allow for a better balance between privacy and utility (Hukkelås et al. 2019). By learning a compressed representation of the original input, context-aware privacy models may also help overcome network bandwidth constraints, which is especially advantageous for client devices that lack the resources to perform sophisticated machine learning tasks.
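As a concrete illustration of the differential-privacy trade-off mentioned above, a minimal sketch that perturbs a score with Laplace noise before release. The epsilon and sensitivity values are illustrative choices, not calibrated for this application:

```javascript
// Sample from Laplace(0, scale) via inverse transform sampling.
function laplaceNoise(scale) {
  const u = Math.random() - 0.5;
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Release a differentially private version of a numeric score.
// Smaller epsilon => stronger privacy but noisier (less useful) output,
// which is exactly the performance drop noted above.
function privatize(score, epsilon = 1.0, sensitivity = 1.0) {
  return score + laplaceNoise(sensitivity / epsilon);
}
```

A teacher dashboard consuming `privatize(score)` values would see approximately correct trends across many samples while any single released value no longer pins down the underlying score.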

Conclusion and future work

In this work, we presented an edge-based multi-modal engagement solution that runs as an extension in contemporary web browsers. It supports the monitoring of interaction data with the online learning platform as well as with third-party resources, and can perform head pose estimation using a deep learning model implemented and evaluated in JavaScript. The added value of our contribution is that the multi-modal engagement analysis is off-loaded to the students' browsers, allowing our framework to easily scale up with a growing number of students. At the same time, it mitigates privacy concerns that students may have due to the continuous tracking of their on-task and off-task behavior. One limitation of our solution is that interaction with non-browser tools (e.g. a native word processing or presentation application) used in the frame of a course is ignored when monitoring engagement.

As future work, we will enhance the autotuning of the browser extension's configuration to automatically keep the performance overhead on the client below a certain threshold. Also, the current solution allows students to cheat by manipulating the client-side data collection process. We will therefore investigate the feasibility of temporarily replicating the engagement analysis on both the clients and the server to detect adversarial behavior.
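The envisioned autotuning could take the form of a simple feedback loop over the analysis frame rate. The following sketch is purely hypothetical; the CPU budget, bounds and step size are assumptions, not values from the implementation:

```javascript
// Adjust the engagement-analysis frame rate based on measured CPU load.
// budget: target fraction of CPU the extension may consume (assumed 15%).
function autotuneFrameRate(currentFps, cpuLoad, budget = 0.15) {
  if (cpuLoad > budget && currentFps > 1) return currentFps - 1; // back off
  if (cpuLoad < budget * 0.5 && currentFps < 5) return currentFps + 1; // recover
  return currentFps; // within budget: keep the current rate
}
```

Called periodically with load figures such as those exposed by Chrome's task manager, this would let older laptops degrade gracefully to the one-frame-per-second minimum that the engagement analysis requires.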



The data collection was approved in April 2018 by the Social and Societal Ethics Committee of the university, with case number G-2018 041206.


  1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X. (2016a) TensorFlow: A system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. pp. 265–283. Berkeley: OSDI'16, USENIX Association.

  2. Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L. (2016b) Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. pp. 308–318. ACM.

  3. Abbas, N., Zhang, Y., Taherkordi, A., & Skeie, T. (2018). Mobile edge computing: A survey. IEEE Internet of Things Journal, 5(1), 450–465.


  4. Acar, A., Aksu, H., Uluagac, A. S., & Conti, M. (2018). A survey on homomorphic encryption schemes: Theory and implementation. ACM Computing Surveys, 51(4), 79:1–79:35.


  5. de Assuncao, M. D., da Silva Veith, A., & Buyya, R. (2018). Distributed data stream processing and edge computing: A survey on resource elasticity and future directions. Journal of Network and Computer Applications, 103, 1–17.


  6. Atherton, M., Shah, M., Vazquez, J., Griffiths, Z., Jackson, B., & Burgess, C. (2017). Using learning analytics to assess student engagement and academic outcomes in open access enabling programmes. Open Learning: The Journal of Open, Distance and e-Learning, 32(2), 119–136.


  7. Baker, R.S. (2007) Modeling and understanding students' off-task behavior in intelligent tutoring systems. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 1059–1068. New York: CHI '07, ACM.

  8. Beimel, A. (2011) Secret-sharing schemes: a survey. In: International Conference on Coding and Cryptology. pp. 11–46. Springer.

  9. Ben-David, A., Nisan, N., Pinkas, B. (2008) FairplayMP: A system for secure multi-party computation. In: Proceedings of the 15th ACM Conference on Computer and Communications Security. pp. 257–266. New York: CCS '08, ACM.

  10. Bonomi, F., Milito, R., Natarajan, P., Zhu, J. (2014) Fog computing: A platform for internet of things and analytics. In: Big data and internet of things: A roadmap for smart environments, pp. 169–186. Springer.

  11. Bonomi, F., Milito, R., Zhu, J., Addepalli, S. (2012) Fog computing and its role in the internet of things. In: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. pp. 13–16. New York: MCC '12, ACM.

  12. Cao, H., Wachowicz, M., Cha, S. (2017) Developing an edge computing platform for real-time descriptive analytics. In: 2017 IEEE International Conference on Big Data (Big Data). pp. 4546–4554.

  13. Cetintas, S., Si, L., Xin, Y.P., Hord, C., Zhang, D. (2009) Learning to identify students' off-task behavior in intelligent tutoring systems. In: Proceedings of the 2009 Conference on Artificial Intelligence in Education: Building Learning Systems That Care: From Knowledge Representation to Affective Modelling. pp. 701–703. Amsterdam: IOS Press.

  14. Cramer, R., Damgård, I. B., & Nielsen, J. B. (2015). Secure Multiparty Computation and Secret Sharing (1st ed.). New York, NY, USA: Cambridge University Press.


  15. Danezis, G., Domingo-Ferrer, J., Hansen, M., Hoepman, J., Métayer, D.L., Tirtea, R., Schiffner, S. (2015) Privacy and data protection by design - from policy to engineering. CoRR abs/1501.03726,

  16. Daniel, B.K. (2016) Big data and learning analytics in higher education. Springer.

  17. Dantcheva, A., Elia, P., & Ross, A. (2016). What else does your biometric data reveal? a survey on soft biometrics. IEEE Transactions on Information Forensics and Security, 11(3), 441–467.


  18. van Dijk, M., Gentry, C., Halevi, S., & Vaikuntanathan, V. (2010). Fully homomorphic encryption over the integers. In H. Gilbert (Ed.), Advances in Cryptology – EUROCRYPT 2010 (pp. 24–43). Berlin Heidelberg, Berlin, Heidelberg: Springer.


  19. Dwork, C. (2011) Differential privacy. Encyclopedia of Cryptography and Security pp. 338–340.

  20. El-Yahyaoui, A., El Kettani, M.D.E.C. (2017) Fully homomorphic encryption: Searching over encrypted cloud data. In: Proceedings of the 2nd International Conference on Big Data, Cloud and Applications. pp. 10:1–10:5. New York: BDCA'17, ACM.

  21. Fredricks, J. A., Blumenfeld, P. C., & Paris, A. H. (2004). School engagement: Potential of the concept, state of the evidence. Review of Educational Research, 74(1), 59–109.


  22. Gong, L., Liu, Y., Zhao, W. (2018) Using learning analytics to promote student engagement and achievement in blended learning: An empirical study. In: Proceedings of the 2nd International Conference on E-Education, E-Business and E-Technology. pp. 19–24. New York: ICEBT 2018, ACM.

  23. Gray, C. C., & Perkins, D. (2019). Utilizing early engagement and machine learning to predict student outcomes. Computers & Education, 131, 22–32.


  24. He, J., Wei, J., Chen, K., Tang, Z., Zhou, Y., & Zhang, Y. (2018). Multitier fog computing with large-scale iot data analytics for smart cities. IEEE Internet of Things Journal, 5(2), 677–686.


  25. Henrie, C. R., Halverson, L. R., & Graham, C. R. (2015). Measuring student engagement in technology-mediated learning: A review. Computers & Education, 90, 36–53.


  26. Holzer, A., Franz, M., Katzenbeisser, S., Veith, H. (2012) Secure two-party computations in ansi c. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security. pp. 772–783. New York: CCS '12, ACM .

  27. Hukkelås, H., Mester, R., Lindseth, F.(2019) DeepPrivacy: A Generative Adversarial Network for Face Anonymization. arXiv e-prints arXiv:1909.04538.

  28. Kahu, E. R. (2013). Framing student engagement in higher education. Studies in Higher Education, 38(5), 758–773.


  29. Kahu, E. R., & Nelson, K. (2018). Student engagement in the educational interface: understanding the mechanisms of student success. Higher Education Research & Development, 37(1), 58–71.


  30. Kawamura, R., Toyoda, Y., Niinuma, K. (2019) Engagement estimation based on synchrony of head movements: Application to actual e-learning scenarios. In: Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion. pp. 25–26. New York: IUI '19, ACM.

  31. Konečný, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., Bacon, D. (2016) Federated learning: Strategies for improving communication efficiency. CoRR abs/1610.05492.

  32. Krawiecka, K., Kurnikov, A., Paverd, A., Mannan, M., Asokan, N. (2018) SafeKeeper: Protecting web passwords using trusted execution environments. In: Proceedings of the 2018 World Wide Web Conference. pp. 349–358. WWW '18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland.

  33. Li, N., Li, T., Venkatasubramanian, S. (2007) t-closeness: Privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering. pp. 106–115. IEEE.

  34. Lorenz, B., Sousa, S., & Tomberg, V. (2013). Privacy awareness of students and its impact on online learning participation – a case study. In T. Ley, M. Ruohonen, M. Laanpere, & A. Tatnall (Eds.), Open and Social Technologies for Networked Learning. pp. 189–192. Berlin Heidelberg, Berlin, Heidelberg: Springer.


  35. Ma, X., Zhang, F., Chen, X., & Shen, J. (2018). Privacy preserving multi-party computation delegation for deep learning in cloud computing. Information Sciences, 459, 103–116.


  36. May, M., George, S. (2011) Using students' tracking data in e-learning: Are we always aware of security and privacy concerns? In: 2011 IEEE 3rd International Conference on Communication Software and Networks. pp. 10–14. IEEE.

  37. Melis, L., Song, C., Cristofaro, E.D., Shmatikov, V. (2018) Inference attacks against collaborative learning. CoRR abs/1805.04049.

  38. Mohan, P., Thakurta, A., Shi, E., Song, D., Culler, D. (2012) Gupt: Privacy preserving data analysis made easy. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. pp. 349–360. New York: SIGMOD '12, ACM

  39. Monkaresi, H., Bosch, N., Calvo, R.A., D'Mello, S.K. (2017) Automated detection of engagement using video-based estimation of facial expressions and heart rate. IEEE Transactions on Affective Computing, 8(1), 15–28.

  40. Prabhakaran, M., & Sahai, A. (2013). Secure Multi-Party Computation. Amsterdam, The Netherlands, The Netherlands: IOS Press.


  41. Preuveneers, D., Joosen, W. (2019) Edge-based and privacy-preserving multi-modal monitoring of student engagement in online learning environments. In: Proceedings of the IEEE International Conference on Edge Computing (IEEE EDGE 2019). pp. 1–3. IEEE.

  42. Raes, A., Vanneste, P., Pieters, M., Windey, I., Noortgate, W. V. D., & Depaepe, F. (2020). Learning and instruction in the hybrid virtual classroom: An investigation of students' engagement and the effect of quizzes. Computers & Education, 143, 103682.

  43. Roman, R., Lopez, J., & Mambo, M. (2018). Mobile edge computing, fog et al.: A survey and analysis of security threats and challenges. Future Generation Computer Systems, 78, 680–698.


  44. Satyanarayanan, M. (2017). The emergence of edge computing. Computer, 50(1), 30–39.


  45. Shan, Z., Ren, K., Blanton, M., & Wang, C. (2018). Practical secure computation outsourcing: a survey. ACM Computing Surveys (CSUR), 51(2), 31.


  46. Shi, W., Cao, J., Zhang, Q., Li, Y., & Xu, L. (2016). Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5), 637–646.


  47. Strauss, V. (2018) Why parents and students are protesting an online learning program backed by Mark Zuckerberg and Facebook.

  48. Thomas, C., Jayagopi, D.B. (2017) Predicting student engagement in classrooms using facial behavioral cues. In: Proceedings of the 1st ACM SIGCHI International Workshop on Multimodal Interaction for Education. pp. 33–40. New York: MIE 2017, ACM

  49. Toshev, A., Szegedy, C. (2014) Deeppose: Human pose estimation via deep neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 1653–1660.

  50. Whitehill, J., Serpell, Z., Lin, Y., Foster, A., & Movellan, J. R. (2014). The faces of engagement: Automatic recognition of student engagement from facial expressions. IEEE Transactions on Affective Computing, 5(1), 86–98.


  51. Yang, B., Gu, F., Niu, X. (2006) Block mean value based image perceptual hashing. In: Proceedings of the 2006 International Conference on Intelligent Information Hiding and Multimedia. pp. 167–172. Washington: IIH-MSP '06, IEEE Computer Society.

  52. Yi, S., Hao, Z., Qin, Z., Li, Q. (2015) Fog computing: Platform and applications. In: 2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). pp. 73–78. IEEE.

  53. Zhang, X., Meng, Y., de Pablos, P. O., & Sun, Y. (2017). Learning analytics in collaborative learning supported by slack: From the perspective of engagement. Computers in Human Behavior.



This research is partially funded by the Research Fund KU Leuven, and by imec through ICON LECTURE+. LECTURE+ is a project realized in collaboration with imec, with Barco, Televic Education and Limecraft as project partners and with project support from VLAIO.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research.

Author information



Corresponding author

Correspondence to Davy Preuveneers.




Cite this article

Preuveneers, D., Garofalo, G. & Joosen, W. Cloud and edge based data analytics for privacy-preserving multi-modal engagement monitoring in the classroom. Inf Syst Front 23, 151–164 (2021).



  • data analytics
  • multi-modal engagement monitoring
  • privacy
  • cloud and edge computing
  • browser