1 Introduction

Adaptive video game research aims at creating games that adapt to players in order to create more enjoyable, engaging gaming experiences. In the line of affective computing research, most studies so far have focused on using the player’s emotional state, physiologically inferred using fuzzy or supervised learning models, as the target of adaptation. Instead of using basic emotions [1] in the adaptation loop, this paper investigates the use of fun levels through a combined Physiological and Behavioural Model of Fun (PBMF) to model the player’s gameplay preferences.

In the literature, Ravaja et al. [2] were the first to show that instantaneous video game events (e.g., gameplay or story elements) could elicit phasic psychophysiological responses indexing emotional valence and arousal, thus highlighting the potential of physiological measures for gaining insight into the player’s experience. Consequently, most affective gaming research has used physiological data as a means of assessing the emotional states of players in relation to game content. For example, Gilleade and Dix [3] explored frustration in adaptive games using physiological indicators of emotional arousal. The authors argued that monitoring (and eventually manipulating) the player’s frustration level could lead to the development of more complex emotional gaming experiences. Interestingly, other authors later suggested that in-game frustration could also increase player engagement, possibly resulting in an overall more satisfying gaming experience [4]. Martínez et al. [5] studied the generality of physiological features of heart rate and skin conductance as predictors of a player’s affective state. They showed that heart rate (HR) and skin conductance (SC) features could be used to predict affective states across different game genres and game mechanics. Finally, Nogueira et al. [6] proposed a model to investigate relationships between emotions (represented as valence and arousal) and game events using the fuzzy physiological model of valence and arousal proposed by Mandryk and Atkins [7].

While some authors have worked on integrating physiological data as direct inputs to a game (i.e. biofeedback games, e.g. [8]), most research using physiological signals has focused on ways of integrating emotions into game design. These studies have focused mainly on specific emotional states, such as anxiety or fear, that could have more or less straightforward applications in adaptive games. For instance, Liu et al. [9] and Rani et al. [10] used peripheral physiological signals (ECG, EDA, EMG, etc.) to model the player’s anxiety level. In accordance with the concept of challenge-skill balance in flow theory [11,12,13], they used anxiety as an indicator of when dynamic difficulty adjustments were required in a game. Similar work has been carried out by Chanel et al. [14], who used both central and peripheral physiological signals for the same purpose. Other authors have employed physiological (ECG and EEG) and behavioural data in order to monitor the suspense level in an adaptive survival horror game [15].

Although some authors in the affective gaming community have proposed frameworks to integrate emotions in the design of adaptive games, such as Hudlicka [16, 17] and Tijs et al. [18], more research is required to bring this technology to industry-ready levels. Indeed, little is known about how emotion-driven adaptations should be carried out, since different players will most probably react in dissimilar ways to the same adaptation. Thus, emotion-driven adaptations also require player modelling [19], which constitutes an ongoing research topic.

Alternatively, recent studies have focused on developing physiological models of the player’s fun level. Using the Assassin’s Creed video game series, previous phases of the FUNii project demonstrated that fun variations were detectable through players’ physiological signals and behavioural cues (i.e. physio-behavioural measures). Using the continuous fun rating (Fun Trace) proposed in [20], Clerico et al. [21] and Fortin-Cote et al. [22] trained a supervised machine learning model that detects changes in a player’s fun throughout a game session, making continuous monitoring of the player’s fun possible. To train this model, they used the FUNii Database, which contains the physiological and behavioural data along with subjective Fun Traces of 218 players, totalling over 400 game sessions.

Monitoring fun using physio-behavioural modalities rather than post-game questionnaires provides continuous assessment of player experience in real-time, without disturbing gameplay. Furthermore, modelling player enjoyment directly from physiology, instead of modelling emotions from physiology, as in most works in the literature, circumvents the problem of having to provide a model that maps emotions to player enjoyment afterwards. Finally, monitoring the player’s fun level gives insight into what game events likely yielded increases or decreases of a player’s fun during a game session. This kind of information can then be used to build, in real-time, a model of players’ preferences.

The FUNii project aims at developing an adaptive gaming system that uses a physio-behavioural model of fun (heart rate, respiratory activity, skin conductance, eye-tracking and head movements) to detect and maximize players’ enjoyment in real-time. The first phase of this project was conducted in [20,21,22]; it aimed at designing a PBMF and involved training supervised classification models on over 200 video game players’ physio-behavioural data. This paper presents the second and third phases of the FUNii project, which focus on integrating the system inside an adaptive game and testing its effectiveness.

This paper investigates the use of fun detection to model a player’s preferences in real-time using physio-behavioural modalities in the context of an adaptive game. The goal of this paper is two-fold. First, we test the reliability of a Physiological and Behavioural Model of Fun (PBMF), trained on 218 players in previous works, to model a player’s preferences using gameplay events and according to a predetermined stereotypical model of players. Second, as a proof-of-concept, we use the inferred preferences to tailor the gaming experience and test which of two versions of the same game (an adaptive and a non-adaptive one) is perceived as more fun by the players.

2 Method

Participants were invited to play missions of Assassin’s Creed: Odyssey that were custom-built by Ubisoft Québec. They first played a baseline mission followed by two variants of a second mission, one predicted by the model to be the player’s preferred one and the other, the least preferred one among three possibilities. Their level of fun was measured subjectively afterwards using both questionnaires and the Fun Trace tool in order to determine whether the player’s preferred variant of the mission could be correctly identified by the PBMF and gameplay events.

2.1 Experimental Design

The participants played a baseline mission during which they were exposed to a mix of three different styles of gameplay, namely the Fight, Exploration and Stealth gameplay styles, within the same game (Assassin’s Creed: Odyssey). These styles respectively involved fighting one or more enemies (Fight), exploring the game world to discover cues or just wandering (Exploration), and sneaking around enemies while trying to remain unseen (Stealth). The gameplay styles were detected by the game itself, based on in-game events such as entering combat, moving crouched and the like. If the game character was not performing an action associated with fight or stealth, the gameplay was classified within the exploration category. While playing, fun increases were detected from physiological signals in real time using our PBMF, previously trained on 218 participants during prior phases of the project [21, 22]. At the end of the baseline mission, the player’s preference level for each gameplay style was compiled by a process further detailed in Sect. 2.6. These preference levels allowed us to infer the most and the least preferred gameplay styles. Following the baseline mission, players were asked to play two variants of a second mission in a counterbalanced order: one tailored to their most preferred gameplay style and one tailored to their least preferred one.
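The fallback rule described above (anything that is neither fight nor stealth counts as exploration) can be illustrated with a minimal sketch. The event names below are hypothetical, since the actual in-game events are not listed here, and giving Fight priority over Stealth when both kinds of event are active is likewise an assumption.

```python
from enum import Enum


class Style(Enum):
    FIGHT = "fight"
    STEALTH = "stealth"
    EXPLORATION = "exploration"


# Hypothetical event names, for illustration only.
FIGHT_EVENTS = {"enter_combat", "attack"}
STEALTH_EVENTS = {"move_crouched", "hide"}


def classify_style(active_events):
    """Return the gameplay style for the events active at one instant."""
    events = set(active_events)
    if events & FIGHT_EVENTS:      # assumed priority: fight first
        return Style.FIGHT
    if events & STEALTH_EVENTS:
        return Style.STEALTH
    return Style.EXPLORATION       # default category, as in the text
```

With this rule, a moment with no recognised event, or only unrelated events, is attributed to Exploration.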

2.2 Participants

A group of 39 participants (5 women) aged between 20 and 28 years old (M: 23, SD: 2.5) were recruited through Université Laval’s student email list as well as the Ubisoft Québec player database. Selected participants reported having no diagnosed mental illness, cognitive, neurological or nervous system disorder, nor any uncorrected visual impairment. They also needed to have played the previous instalment of the Assassin’s Creed series: Origins. This was required so that all participants would already be familiar with the game controls and the new mechanics introduced in that opus, which are to a great extent the same as in Assassin’s Creed: Odyssey, and would not have to learn them before the experiment.

2.3 Material

Participants played a custom-built version of the most recent opus of the series, Assassin’s Creed: Odyssey, which had not been released to the public at the time of the experiment. A total of 4 custom missions were developed by Ubisoft Québec developers: a baseline mission and three variants of a second mission, namely a fight-tailored mission, a stealth-tailored mission and an exploration-tailored mission. A summary of the missions’ objectives is presented in Table 1.

Table 1. Missions tested in the experiment along with their descriptions. Each variant was designed to fit one of the 3 stereotypical preference profiles, while Baseline mission presented a balanced mix of fight, stealth and exploration game events.

Physiological and behavioural measures were recorded during every mission by a Biopac MP150 system at a sampling rate of 100 Hz and the Smart Eye Pro eye-tracking system at a sampling rate of 60 Hz. Measurement details are presented in Table 2. Also, a webcam was used to record video of the participant and the OBS Studio screen capture software was used to record gameplay.

Table 2. Physiological measures recorded during experiment. ECG, RSP and EDA were recorded using a Biopac MP150 (with sampling rate of 100 Hz), while a Smart Eye Pro system (60 Hz) was used for ET and HM.

2.4 Fun Assessment

For this experiment, three methods were used to assess subjective fun and gameplay style preferences during each mission: Fun Trace, a fun assessment questionnaire, and a gameplay preference questionnaire. First, Fun Trace, which is a continuous rating (analogue scale from −100 to 100) of fun throughout the whole game session, was recorded after a mission playthrough. The in-house Fun Trace software, which is similar to GTrace [24], shows participants their gameplay recordings while also presenting a scrolling analogue trace as visual feedback of their fun annotation. Participants controlled the Fun Trace through a physical control knob: the PowerMate USB from Griffin Technology. Figure 1 presents the application as well as the control knob.

One thing to note is that when participants turned the knob to a value below 0, the scale turned red, while it was green otherwise, making a clear demarcation between positive and negative values. Also, the concept of “fun” itself was deliberately left undefined (to gain insight into participants’ own conceptions of “fun”). During Fun Trace recording, the playback speed of the video was set to \(1.5 \times \) in order to minimize task boredom, which could affect the validity of the Fun Trace ratings.

Fig. 1. The Fun Trace application.

The second subjective fun assessment was through questionnaires using a six-point Likert scale. The first fun related question was asked after each mission:

Question 1

How pleasant was this mission?

The second question was asked at the end of the two mission variants:

Question 2

What version of the mission did I prefer?

The responses ranged from 1 to 6, 1 signifying that the first mission was strongly preferred and 6 signifying that the second mission was strongly preferred. The use of a six-point Likert scale forced participants to make a categorical choice between the two variants.

Finally, to get a self-reported measure of the gameplay style preferences (fight, stealth and exploration) of the participants, they were asked, at the end of the study, the following three questions to answer on a six-point Likert scale:

Question 3

I prefer the fighting gameplay style, meaning that I prefer direct confrontations with the enemy.

Question 4

I prefer the stealthy gameplay style, meaning that I prefer moving stealthily while avoiding direct confrontations.

Question 5

I prefer the exploration gameplay style, meaning that I prefer to explore the world around to find the best path and hidden treasure.

2.5 Experiment Protocol

The total duration of the experiment ranged from 2 h 30 min to 3 h, and participants received a $20 compensation for taking part in the study. Participants were first welcomed and invited to sign the required agreements. Electrodes for the Biopac MP150 system were placed before participants sat at the computer. They were asked to fill out a profile questionnaire which included questions about their gaming habits, self-reported skill level and favourite types of game. The calibration phase of the Smart Eye Pro software was then performed. Afterwards, baseline physio-behavioural signals were recorded for 30 s while participants were asked to remain still and look at a fixation cross. Participants were then presented with a tutorial that served only as a quick refresher, since they were already familiar with the previous instalment of the Assassin’s Creed series, where controls were very similar. Participants were then presented with a training mission, where they had to fulfil a set of goals ensuring that they possessed the minimal abilities to succeed in the following missions. A schematic representation of the following phases of the experiment is shown in Fig. 2 to help visualize the process. Following the training mission, participants played the baseline mission, where they experienced the three types of gameplay. Subsequently, they used the Fun Trace app to generate a Fun Trace for the baseline mission, which was used as a “ground truth” to assess the validity of the inference afterwards. They then played a first variant of the second mission (either their predicted most preferred or least preferred variant, in counterbalanced order), responded to Question 1 and used the Fun Trace app. They then played the second variant of the second mission, responded to Question 1 again and used the Fun Trace app for the last time. Finally, they were asked to fill out the gameplay style preferences questionnaire, including Questions 2–5. Participants were then debriefed and given the monetary compensation.

Fig. 2. A schematic representation of the experimental procedure.

2.6 Profiles Generation

Participants’ profiles were generated by detecting their fun increases during the baseline mission using the PBMF. This mission gave players the opportunity to experiment with each of the three gameplay styles. Figure 3 displays a summary of the relative amount of time participants experienced each style. One thing to note is that the time spent in the fight style was lower than in exploration and stealth; this is a consequence of the game architecture, since fight sequences are inherently shorter than sequences of the two other gameplay styles. By contrasting the rate of fun increases (number of fun increases divided by time spent in the gameplay style) across gameplay styles, the preferred style could be predicted. The rate of fun increases was used instead of an average of the fun level because the inference algorithm is better at detecting discrete increases than the absolute level of fun reported with the Fun Trace.
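The rate-based comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the numbers in the usage note are invented, and how ties or styles that were never played were handled is not specified here (this sketch simply skips styles with no play time).

```python
def fun_increase_rates(increase_counts, seconds_in_style):
    """Fun increases per second of play, for each gameplay style."""
    rates = {}
    for style, seconds in seconds_in_style.items():
        if seconds > 0:  # assumption: styles never played are skipped
            rates[style] = increase_counts.get(style, 0) / seconds
    return rates


def most_and_least_preferred(rates):
    """Return the styles with the highest and lowest rate of fun increases."""
    most = max(rates, key=rates.get)
    least = min(rates, key=rates.get)
    return most, least
```

For instance, with hypothetical counts of 6, 9 and 5 detected increases over 120 s of fight, 300 s of stealth and 400 s of exploration, the rates are 0.05, 0.03 and 0.0125 increases per second, so fight would be inferred as the most preferred style and exploration as the least preferred one, even though stealth produced the largest raw count.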

Fig. 3. Distribution of the ratio of time played under each different gameplay style in the baseline mission.

The supervised machine learning model used in this study for inferring these fun increases has been trained to detect Fun Trace increases on 218 previous participants who played Assassin’s Creed: Unity (2014) or Assassin’s Creed: Syndicate (2015). The labelling and feature extraction are illustrated in Fig. 4.

For a single mission of the game, the 20 largest increases of Fun Trace were identified. Two main reasons justified this method. First, using the largest increases alleviates concerns about border effects, which arise when participants hit one of the two boundaries of Fun Trace (i.e. −100 or 100). Indeed, when this happened, participants were forced to increase or decrease Fun Trace, which introduced noise. Second, using increases of Fun Trace instead of the value itself is justified by the fact that human preferences are arguably more ordinal than cardinal in nature [25], meaning that relative levels of Fun Trace in a game session (e.g., going from a low to a high Fun Trace value) are more likely to capture relevant information about a player’s experience than the absolute Fun Trace value. With those increases identified, a 20 s temporal window of the physio-behavioural signals was extracted around each increase to capture its physio-behavioural signature. Examples of constant (no change) Fun Trace and of decreases in the Fun Trace were extracted in a similar fashion. We ended up with 7623 labelled samples, 2496 of which correspond to Fun Trace increases, while the remaining ones were examples of constant (2706) or decreasing (2421) Fun Trace. Temporal and spectral statistics were extracted from each of the physio-behavioural signals so as to compile a vector of 201 features for each sample. Inter-participant K-fold cross-validation, meaning that no samples from a given participant were used to train the model that made predictions on that participant’s subset of data, was used to train the model, tune the hyperparameters and select the most accurate model with the Scikit-learn library [26]. In this case, the most accurate model was an extreme gradient boosting classifier (XGBoost implementation [27]), with an F1 score of 65%, compared to an F1 score of 56% for a Stratified Dummy Classifier.
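The labelling step described above can be sketched roughly as follows. Several details are assumptions made for illustration: an “increase” is taken to be a positive sample-to-sample rise of the trace, the window is centred on the increase, and windows are clipped at the edges of the recording. The function names are hypothetical.

```python
def largest_increases(trace, trace_rate_hz, k=20):
    """Times (in seconds) of the k largest sample-to-sample rises of a trace."""
    rises = [(trace[i] - trace[i - 1], i) for i in range(1, len(trace))]
    # Keep only genuine increases, largest first, then restore time order.
    top = sorted((r for r in rises if r[0] > 0), reverse=True)[:k]
    return sorted(i / trace_rate_hz for _, i in top)


def window_around(signal, t_center_s, rate_hz, width_s=20.0):
    """Slice of `signal` spanning `width_s` seconds centred on `t_center_s`."""
    center = int(round(t_center_s * rate_hz))
    half = int(round(width_s * rate_hz / 2))
    start = max(0, center - half)          # clip at the start of the recording
    stop = min(len(signal), center + half)  # clip at the end of the recording
    return signal[start:stop]
```

Each extracted window would then be summarised by temporal and spectral statistics to form one labelled feature vector; that feature extraction step is not reproduced here.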

Fig. 4. Examples of the Fun Trace labelling and corresponding physio-behavioural signals. A total of 7 significant increases in the Fun Trace are represented by vertical dotted lines. The 20 s windows extracted around fun increases are represented by shaded regions in each of the plots. The physio-behavioural signals shown are electrodermal activity (EDA), heart rate (HR) and pupil dilation (PUP).
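The inter-participant cross-validation scheme described in this section keeps all samples from a given participant on the same side of each split. The following is a minimal standard-library sketch of that idea, not the authors’ implementation; the number of folds and the round-robin assignment of participants to folds are assumptions. Scikit-learn’s `GroupKFold` provides comparable group-wise splitting.

```python
def inter_participant_folds(participant_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs with no participant on both sides."""
    unique = sorted(set(participant_ids))
    # Assumption: participants are assigned to folds round-robin.
    fold_of = {pid: n % n_folds for n, pid in enumerate(unique)}
    for fold in range(n_folds):
        test = [i for i, pid in enumerate(participant_ids)
                if fold_of[pid] == fold]
        train = [i for i, pid in enumerate(participant_ids)
                 if fold_of[pid] != fold]
        yield train, test
```

Splitting by participant rather than by sample avoids evaluating the model on windows from a player it has already seen during training, which would tend to overestimate how well it generalizes to new players.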

3 Results

3.1 Validation of the Generated Profile on the Baseline Mission

In order to validate the generated profile, the profile computed by the real-time algorithm was compared to the one computed using the actual Fun Trace of the baseline mission. Using the number of fun increases in the “ground-truth” Fun Trace for the preferred and the least preferred variants, as determined by the model, the observed agreement between the two was \(69\%\, (p = 0.03)\). This means that the actual Fun Trace correctly showed more fun increases in the preferred profile \(69\%\) of the time. This is statistically significant under the one-sided binomial test (\(\alpha =0.05\)), where the null hypothesis is that the detected profile is random (probability of success 0.5). To further confirm that the detected profile was valid, it was compared to the self-reported gameplay style preferences provided by the participants on a six-point Likert scale at the end of the experiment (Questions 3–5). Results revealed that participants rated the preferred gameplay style detected by the model higher than the least preferred one \(76\%\) of the time \((p = 0.004)\), which was again statistically significant under the one-sided binomial test (\(\alpha =0.05\)).
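For reference, the one-sided binomial test used above computes the probability of observing at least the given number of agreements if each agreement were a coin flip. The sketch below is a generic implementation of that tail probability; it does not reproduce the reported p-values, because the number of participants entering each comparison is not stated here, and the counts in the usage note are hypothetical.

```python
from math import comb


def one_sided_binomial_p(successes, trials, p_null=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )
```

As a hypothetical example, 8 agreements out of 10 comparisons give a p-value of 56/1024, about 0.055, which would not be significant at \(\alpha =0.05\) despite an 80% agreement rate; the same rate over more comparisons would be.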

The participants’ preference level for each played variant was assessed using Questions 1 and 2. According to the answers to Question 1, participants’ most pleasant mission variant agreed with the preferred mission as selected by the PBMF model only \(52\%\) of the time. Similar results were observed in answers to Question 2, which matched the preferred profile identified by the PBMF model only \(48\%\) of the time. Therefore, neither metric differed significantly from a random choice between the missions by the participants. One interesting thing to note is that participants were not always consistent in their answers. Indeed, the preferred variant (as determined from Question 2) matched the most pleasant variant (Question 1) only \(69\%\) of the time.

4 Discussion

A discrepancy was observed between profile metrics stemming from the baseline mission and the profiles generated from subjective appreciation questionnaires following both variants of the second mission. There is an indication that the detected profile stemming from the baseline mission was valid because of its concordance with self-reported gameplay style preferences. The validity of the generated preference profile is also supported by the Fun Trace of the baseline mission. This suggests valid inference of the player’s preference profile from the baseline mission, and therefore supports the claim that our PBMF can be used to model players’ preferences in the context of a predetermined stereotypical preference model. The discordance with the fun reports for the second mission variants could stem from the adaptation strategy or from the measure of the response to the adaptation, issues similar to those raised by Fuchs [28]. The adaptation strategy could be at fault in that the categorisation of the variants by type might be too coarse. Indeed, while each variant was designed to favour its corresponding style, the game did not enforce a particular way to play. For example, in the stealth variant of the second mission, a participant might still have tried to fight their way through the level, which was difficult (if not impossible) and prompted the experimenter to redirect the participant to the more streamlined path.

A failure in the measure of the response to the game’s adaptation could also be at fault. Although Fun Traces are also subjective, their temporal resolution is higher than that of questionnaires; they therefore allow for a much more precise inspection of differences in game experience between gameplay styles, something that was not reflected in the answers to the questionnaires.

Furthermore, the overall improvement of game experience caused by tailoring a single mission to inferred preferences might not be large enough to be measurable with a six-point Likert scale, especially considering that participants rated the mission between 3 and 5 in most cases (85% of the time). Another possibility is that the 3-class preference model that we used in this paper, although strongly tied to the mechanics of our test-bench game, is too simplistic to properly orient the adaptation process. Indeed, it is not obvious that a player’s preferences are fixed throughout the game [29], and it is even less obvious that they unfold in only one gameplay dimension (e.g. a player who enjoys Fight might also enjoy Stealth, even if to a lesser extent). Thus, an adaptive game using the same PBMF would most likely benefit from using a game-agnostic model of player types [30].

Finally, there is also the possibility that a single adaptive mission is not long enough to measure fun increases, but that a sequence of multiple missions tailored to the evolving player profile might generate an adaptive game that is perceived as more fun overall. This would necessitate further study, including multiple, longer, play sessions with the same players as they progress through several tailored missions.

5 Conclusion

This research is a step towards integrating real-time player modelling, using objective measurements of players’ experience through physio-behavioural data, into the design of an adaptive video game. Using real-time prediction of the player’s fun level can help steer the game towards the player’s preferences and even adapt to changing preferences during gameplay. While the real-time generated profile seems accurate under two different metrics (the “ground truth” Fun Trace as well as the self-reported profile), the adaptation strategy did not provide measurable improvements in enjoyment of the subsequent mission. Further work might include investigating which types of adaptation strategies show a measurable improvement in enjoyment. A simpler game that allows easier adaptation to different styles might provide opportunities to test more real-time adaptation and leverage further benefits from real-time profile generation. For example, a player becoming tired of a particular gameplay style could be detected by using a rolling average of the profiles, allowing for varying preferences inside a mission and more fluid adaptations.
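The rolling-average idea mentioned above could look like the following minimal sketch. It is purely illustrative of proposed future work: the class name, the window length and the use of per-style fun-increase rates as the profile are all assumptions, not part of the system evaluated in this paper.

```python
from collections import deque


class RollingProfile:
    """Average of the most recent per-style profiles (hypothetical helper)."""

    def __init__(self, styles, window=5):
        self.styles = tuple(styles)
        self.history = deque(maxlen=window)  # older profiles drop out

    def update(self, rates):
        """Add the latest profile, e.g. fun-increase rates per style."""
        self.history.append({s: rates.get(s, 0.0) for s in self.styles})

    def average(self):
        n = len(self.history)
        if n == 0:
            return {s: 0.0 for s in self.styles}
        return {s: sum(h[s] for h in self.history) / n for s in self.styles}

    def preferred(self):
        """Style with the highest rolling average (first listed wins ties)."""
        avg = self.average()
        return max(avg, key=avg.get)
```

Because old profiles fall out of the window, a player who starts by enjoying fights but later responds more to stealth would see the preferred style switch after a few updates, which is the kind of within-mission drift the rolling average is meant to capture.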