1 Background

1.1 Overview extended reality

Case simulations as an imitation of clinical experience are accepted educational resources for training technical skills [1] and non-technical skills such as teamwork, communication, and situation awareness [2, 3]. Case simulations can take place in conventional settings (Fig. 1, left picture, “Real World”), but also using highly immersive technologies such as Virtual Reality (VR).

Fig. 1

Extended Reality and Mixed Reality continuum

VR has been researched for many decades and has also gained popularity in medical education as well as, to some extent, in Emergency Medical Services (EMS) [4]. VR plays an important role in the professional training of healthcare professionals and is comparable to traditional media in learning gains [5].

Let us start by distinguishing the terms Extended Reality (XR), Virtual Reality (VR), Augmented Reality (AR), Augmented Virtuality (AV), and Mixed Reality (MR). XR is an umbrella term for technologies that create computer-generated environments or objects. VR describes a completely computer-generated world, diametrically opposed to the real environment [6]. AR means that the user still perceives the physical reality, but it is additionally enriched or extended with digital information. MR combines real and virtual objects. AV consists predominantly of virtual spaces that incorporate real physical objects and people that can interact with the virtual world in real time.

Figure 1 shows the relationship between these technologies. The left image in Fig. 1 shows a case simulation for EMS without virtual elements. Toward the right, the proportion of virtual elements increases. On the far right, a complete treatment situation is virtualized (VR). The two middle images are variants of MR. In the center left image, a skin phenomenon (e.g., an allergic reaction) is projected onto the physically present manikin. In the center right image, the environment of the physically present trainees is virtualized, while physically existing objects remain part of the scene.

This article focuses on the results of the ViTAWiN project [7]. ViTAWiN is an acronym for the German equivalent of "Virtually augmented training for education and training in interprofessional emergency care" (German: "Virtuell-augmentiertes Training für die Aus- und Weiterbildung in der interprofessionellen Notfallversorgung"). The project developed a multi-user Virtual Learning Environment (VLE) to enable interprofessional training of paramedics and emergency nurses in a virtualized, highly immersive scenario, independent of location. This means that paramedics, emergency nurses, and physicians can train at the same time in the same case scenario, but from different locations [8]. At the time of the study, ViTAWiN was the only research project that addressed both structured diagnostic and treatment processes and interprofessional aspects in the context of EMS and emergency care. An interprofessional aspect is, for example, the cooperation between paramedics and emergency care in the shock room. This is a special treatment unit for severely injured or seriously ill people. The interface between EMS and hospital is characterized by complexity resulting from the large number of decision variables, unknown information, the large treatment team from various disciplines with different mental models of care, and the imbalance between the time needed to structure the situation and the dynamic course of the illness or injury. The ViTAWiN project, which was based on the experiences of the predecessor project EPICSAVE [9, 10], was launched to enable simulating structured courses of action on the one hand and, on the other hand, to raise awareness of this critical phase of patient transfer among students of emergency medical services and emergency care during their school education. The common human resources of the treatment team are important aspects of the treatment phase. 
“Virtual team training” in this study also addressed the team dynamics within the emergency medical services and emergency care teams, as well as the joint behavior and targeted synergetic actions of the two ad-hoc teams. These aspects are explored in the presentation of the relevant literature. In addition to the team training component, a patient manikin was integrated into the VR environment as a haptic element to enable examination steps such as palpating the pulse, which was expected to yield a higher degree of realism. During the course of the project, facial expressions and gestures were implemented on the virtual patient, in particular to visualize expressions of pain and fear (Fig. 2).

Fig. 2

(Source: ViTAWiN project)

Virtualized settings

In the project, an evaluation concept was developed first, which included an evaluation at the different milestones. In this article, we report the results of the final evaluation.

1.2 Review of relevant research

This narrative review focuses primarily on VR and MR in EMS and paramedic education.

In their "Trauma bay VR study", Colonna et al. [11] describe the learning successes of their convenience sample in clinical decision-making using critical decision points, i.e., intubation, cricothyroidotomy, chest tube, and intravenous access. Chabaane et al. [12] showed that a single VR application can increase competence perception and significantly reduce anxiety in paramedics. How long this effect lasts is unknown. Effect sizes were small (anxiety reduction: d = 0.33, increase in competence perception: d = 0.35). When training triage processes during mass casualty incidents, Mills et al. [13] showed that VR simulation was nearly identical in simulation efficiency. They concluded that there were no measurable differences in the mental demand, temporal demand, performance, effort, or frustration domains between the VR group and control groups. Equal levels of satisfaction and triage results could be measured in both groups. Harrington et al. [14] used the example of participants in an Advanced Trauma Life Support course to show that participants enjoyed using VR, rated it as a platform of choice, and considered VR to be a cost-effective tool. Regarding the intensity of cognitive-emotional stress, it can be assumed that case simulations in VR can reach high intensity [15]. Birtill et al. [4] describe three VR studies in their scoping review (up to February 2020) and describe the technology as promising in terms of teaching methods.

With regard to the use of tactile elements, the first promising approaches already exist in the field of emergency medicine, as Schild et al. point out [8]. Girau et al. [16] also report on the implementation of a manikin in first aid training and document stable integration of the manikin into the VLE and positive feedback from the user experience. Furthermore, Melo et al. [17] show in their systematic review that haptic integration can have a positive impact on user performance and enhance the sense of actually being “there” (sense of presence) in the virtualized environment. Melo et al. also note that studies with haptic integration examined user performance significantly more frequently than user perception. They further note that the existing literature has a distinctly technical focus and underemphasizes human-centered aspects. Dinh et al. [18] describe the strong positive correlation between multisensory stimuli and presence, while Biocca specifically highlights the potential of haptics to improve presence [19].

The reason for focusing the project and this study on patient transfer and trauma care is also supported by the literature regarding the chosen scenario. Interprofessional handovers in ad-hoc teams pose high risks due to the lack of clear agreements, different mental models of the treatment situation, and/or difficult team briefings, making the process less straightforward [16,17,18]. Boosting team performance in dynamic settings requires a mix of strategies involving individual readiness, teamwork, and environmental optimization [19].

Apart from ViTAWiN [7], no other XR simulations were found at the time of the study that addressed patient handover in interprofessional teams. Furthermore, there are no known XR applications that specifically address the shock room phase. The combination of EMS treatment and patient transfer with tactile elements was also new at the time of the study, although multisensory input, e.g., through tactile augmentation, seems promising.

1.3 Objectives

Aim: As there is still a lack of systematic data, especially in the area of haptic augmented training, team training, and location-independent settings, our study aims to contribute to the evaluation of MR and VR in the context of paramedic education.

Objectives: In our study, we analyze the media use factors associated with the use of a highly immersive VLE for paramedic students. In order to extrapolate the relevance for highly immersive training in paramedic education, we analyzed the differences that occur in terms of media perception and motivation when a haptic tactile patient manikin is incorporated into a highly immersive simulation. The question of the usefulness of haptic augmentation is important because multidimensional approaches have been shown to enhance students' perception of realism [4, 18, 19].

  • RQ 1: To what extent do situational motivation, VR sickness, presence, and usability impact paramedic trainees after a VR and MR training sequence as measured by a questionnaire survey?

  • RQ 2: How do paramedic trainees in a VR training sequence differ from a group of paramedic trainees using an integrated haptic, palpable manikin in an MR setting in terms of situational motivation, VR sickness, presence, and usability?

2 Methods

2.1 Research design and organizational background

The cross-sectional quasi-experimental study with controlled comparison is based on the examination of a VR and MR training sequence in paramedic trainees, with a subsequent survey on media effect factors, motivation, and experience of the immersive environment conducted via an online questionnaire.

2.2 Data collection & eligibility criteria

The data was collected from the two educational EMS project partners in the ViTAWiN project. The two educational partners are part of the vocational training scene for EMS in Germany. To be included in the study, participants had to be trainees in the German emergency medical services profession and be physically and mentally healthy according to their own subjective assessment. In addition, the requirements of the Covid-19 regulations at the time had to be met in order to identify asymptomatic patients. In addition to the absence of symptoms, this also included a negative rapid Covid-19 test. These criteria applied to both the control group and the experimental group.

2.3 Inclusion and exclusion criteria

The participants were interviewed about their health status on the day of the simulation. Only participants who did not report being acutely ill or in any way acutely impaired were included. Participants were given verbal and written information about the study and had the option of not participating or dropping out, as well as withdrawing consent.

2.4 Participants’ characteristics

The participants were students in the German three-year paramedic training program. Unfortunately, due to staffing difficulties during the pandemic, we were not able to recruit emergency nurses from our educational partner. Prior to MR/VR exposure, experienced instructors gave the participants the opportunity to familiarize themselves with the hardware and software as well as the elements of the virtual environment. The participants had little or no prior experience with VR/MR.

2.5 Technical setup & hygiene measures

All simulations were performed in a play area setup, allowing the participants to roam freely. During the entire simulation, the participants wore hair nets to improve hygienic conditions regarding the head-mounted displays (HMD). In addition, disposable gloves and FFP2 masks (comparable to N95 and KN95) were used due to the Covid-19 pandemic.

While a 3 × 4 m play area was used, the size can be adapted to local conditions. In this study, we used the VR set Valve Index® (Valve Corporation, Bellevue, Washington/USA). The position of the HMD and the controllers is detected by laser sensors. This position information is transmitted to and processed by a computer, which renders the updated VLE and sends it back to the HMD. In the classroom, the VLE can be additionally displayed on a screen from different perspectives so that other learners can observe the virtualized patient care.

2.6 Experimental manipulation

In the comparison group (which used VR; hereinafter referred to as VR Group), prehospital patient care was completed without a manikin (Fig. 3); in the experimental group (which used MR; hereinafter referred to as MR Group), a patient manikin was integrated as a haptic element (Fig. 4). In the MR Group, the patient manikin was integrated into the virtualized environment by a simulation technician. A simulation technician was also available for the VR Group. As a consequence of the experimental design, the participants of the MR Group could "feel" the patient on a rudimentary level. This enabled examination steps such as palpating the pulse. The VR Group did not have this possibility and could only rely on the optically virtualized patient. The lack of a haptic reference could lead to less realistic perception as, for example, when auscultating the lungs, there is no corresponding resistance on the patient's body surface. Other important examination steps, such as palpation of the pulse, can be simulated by haptic feedback. Both groups received an individual debriefing in the form of an after-action review by an educational professional from the education partners. This involved discussing the objectives and achievement of the medical treatment with the participants.

Fig. 3

VR setting without manikin

Fig. 4

MR setting with manikin

Figure 5 shows the main steps of the study implementation on site.

Fig. 5

Main steps of the study implementation on site

The participants had to assess the scene in terms of safety and had to provide an initial assessment and structured care to a burn patient in the MR/VR environment using medical guidelines and regional protocols. The initial assessment referred to may indicate the need for some type of action. For example, an assessment of the respiratory tract may reveal that a foreign body has been inhaled and needs to be removed. As an additional resource, we provided a detailed description of the assessment and treatment steps. This also served as a basis for the debriefing.

2.7 Assignment method

Assignment to an experimental condition was made in a non-randomized manner on a class-by-class basis due to the extensive technical setting. Assignment at the individual level was not feasible from a research economics perspective as it would have involved a lot more time and personnel. The reasons for the increased resource requirements were the technical setup, the room arrangement, and especially the still very complex integration and calibration of the tracking of the manikin.

2.8 Instrumentation

After the VR/MR exposure, the participants were asked to complete an online questionnaire. This comprised the collection of sociodemographic data, the Situational Motivation Scale – SIMS [20], the System Usability Scale – SUS [21, 22], and the IGroup Presence Questionnaire – IPQ [23] (Table 1). The Simulator Sickness Questionnaire – SSQ [24] was completed after the evaluation. Approximately three to five minutes elapsed between the end of the exposure and the start of the questionnaire. The instruments used are described below.

Table 1 Constructs, scale, and sample items

Motivation and engagement are predictors of academic performance and may indicate planning problems (e.g., integrating the new learning method just before an exam) [25]. Based on the self-determination theory [26], the SIMS measures Intrinsic Motivation, Identified Regulation, External Regulation, and Amotivation. The higher the score, the higher the level. The instrument was tested with moderate to excellent reliability (Cronbach’s alpha: intrinsic motivation = 0.95, identified regulation = 0.85, external regulation = 0.62, amotivation = 0.83) and factor validity. The results of the confirmatory factor analysis showed χ2 to be significant (χ2 (98, n = 907) = 856.50, p < 0.05), and NNFI (0.89) to be somewhat lower than the 0.90 cut-off value. However, CFI (0.90) was shown to be satisfactory. All hypothesized factor loadings, covariances, error residuals, and factor residuals were found to be significant [20].

Presence, i.e., the feeling of actually being in the virtual environment, as well as the illusion of location and plausibility are, in our view, key quality criteria for the user experience of an XR application. The measurement of presence is described in more detail below. However, we want to establish the connection to the technical perspective here and thus state that HMDs have an outstanding reputation with regard to the creation of a sense of presence, i.e., they have high immersive potential compared to other virtualization methods [27]. In this respect, Salomoni et al. [27] point out the high importance of the graphical user interface and the way in which the user interface can be operated by the users, which also has an effect on the expression of the sense of presence.

The IPQ addresses questions representing Involvement, Spatial Presence, Realism, and a General Factor, with responses using seven-point rating scales (0–6) and high scores being positive for each construct. At this point, we quote Schubert et al. because there seem to be some misunderstandings about scaling: “All items have a range from 0 to 6. The left endpoint of the scale is always 0, the right endpoint is always 6” [28]. The IPQ was tested with good reliability (Cronbach’s alpha > 0.7 for all subscales) and factor validity. For details, see Schubert et al. [23].

In highly immersive HMD-based XR, user experience can be affected by side effects, which are called simulator sickness. Symptoms include headaches, impaired vision, and dizziness. There are objective and subjective methods for assessing the severity of simulator sickness, with the most popular being Kennedy’s SSQ [29]. The SSQ includes three constructs – Disorientation, Nausea, and Oculomotor – which are answered using a four-point rating scale. Each construct as well as the total score has its own weighting (Table 3). Weighted scores < 5 are considered "negligible", scores of 5–10 are considered "minimal", scores of 10–15 are considered "significant", and scores of 15–20 are considered "of concern". A total score above 20 is considered "bad" [30]. The use of the SSQ is critically debated in the application domain of VR due to its development, which was aimed at flight simulators rather than the latest generation of HMD-VR [30, 31]. Nevertheless, because SSQ has been used in numerous studies for reporting VR-related adverse events [29, 32], it was used here to allow comparability. Symptoms of simulator sickness can last from minutes to hours and increase with the duration of VR exposure, although no general pattern is currently discernible. Studies in VR settings have shown that the probability of symptoms is heterogeneous [32, 33].
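The severity bands quoted above can be written as a small lookup. The following Python sketch is illustrative only: the function name `ssq_category` is hypothetical, and because the quoted ranges share their endpoints (10 and 15), assigning those boundary values to the higher band is an assumption rather than something stated in the cited source. A score of exactly 20 is placed in "of concern" because only scores above 20 are described as "bad".

```python
def ssq_category(score: float) -> str:
    """Map a weighted SSQ score to the verbal bands quoted in the text.

    Assumption: shared endpoints 10 and 15 fall into the higher band.
    """
    if score < 0:
        raise ValueError("SSQ scores cannot be negative")
    if score < 5:
        return "negligible"
    if score < 10:
        return "minimal"
    if score < 15:
        return "significant"
    if score <= 20:
        return "of concern"
    return "bad"
```

For example, `ssq_category(12.0)` falls into the "significant" band under these assumptions.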

The usability of a virtual learning system should be as high as possible to allow participants to focus on the learning objectives. A system with poor usability runs the risk of participants losing engagement and using cognitive resources to operate the system that are actually needed to master the learning objectives. The SUS is a simple and technology-independent questionnaire to evaluate the usability of a system. It comprises ten questions, which are answered on a five-point rating scale and evaluated as a percentage on the grade scale. Rule of thumb: “[…] products that scored in the 90s were exceptional, products that scored in the 80s were good, and products that scored in the 70s were acceptable. Anything below a 70 had usability issues that were cause for concern.” [22]. The SUS was tested with excellent reliability (Cronbach’s alpha = 0.91) and acceptable factorial validity [22, 34].
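To make the scoring concrete, the sketch below applies the commonly used SUS scoring rule: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a value between 0 and 100. It assumes that the questionnaire used the standard item order with responses coded 1 to 5. The function names `sus_score` and `sus_band` are hypothetical, and the bands simply restate the rule of thumb quoted above.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for ten responses coded 1-5.

    Assumes the standard item order: odd items positively worded,
    even items negatively worded.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1      # odd item: higher is better
        else:
            total += 5 - response      # even item: lower is better
    return total * 2.5


def sus_band(score):
    """Verbal band following the rule of thumb quoted in the text."""
    if score >= 90:
        return "exceptional"
    if score >= 80:
        return "good"
    if score >= 70:
        return "acceptable"
    return "cause for concern"
```

A respondent who answers 3 on every item obtains `sus_score([3] * 10) == 50.0`, which `sus_band` places below the cut-off of 70.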

2.9 Measures and covariates, power and precision

All statistical operations were performed using RStudio (Version 2022.02.1 + 461). First, the data was prepared. This was followed by a pattern analysis of missing values and a descriptive-statistical evaluation.

Subgroups were compared using Welch’s t-Test (α = 0.05, two-sided) after ruling out extreme deviation from the normal distribution (R package "stats"). While Student's pooled t-test is based on a pooled calculation of standard errors, the t-test with Welch estimation applied here is based on an unpooled calculation of standard errors, which does not require homoscedasticity but is at least as reliable as the pooled calculation in terms of Type I error [35,36,37]. The power of Welch's t-test is similar to that of Student's t-test even when the population variances are equal. With regard to unequal sample sizes, we note the robustness of the Welch approximation compared to Student's t-test, especially in the absence of gross violations of normal distribution [38].
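The difference between the pooled and the unpooled approach is easiest to see in the formulas. The following Python sketch, which is not the authors' R code, computes the Welch test statistic and the Welch–Satterthwaite degrees of freedom from two samples using only the standard library. The function name `welch_t` is hypothetical. The p-value is deliberately omitted because it requires the t distribution, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Each group keeps its own variance estimate (unpooled standard error).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

When both groups have the same size and the same sample variance, the approximated degrees of freedom equal those of the pooled test (n_a + n_b - 2); with unequal variances they shrink, which is what protects the Type I error rate.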

Correlations were tested with Pearson correlation with Holm correction (R-package “rstatix”). The sample size of the two subgroups resulted in a beta error probability of 0.43 for the t-Test for two independent means (effect size d = 0.5) and 0.49 for the calculation of the correlation coefficient by Pearson (effect size q = 0.6) for independent samples (“g*power”, Version 3.1). Considering that the size of the two subgroups is not extremely unbalanced, it can be assumed that the calculation of the Welch’s t-Tests and the Pearson correlation is robust for the present sample [39].
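The Holm correction mentioned above is a step-down procedure: the p-values are sorted in ascending order, the smallest is multiplied by the number of tests m, the next by m - 1, and so on, and the adjusted values are then forced to be non-decreasing and capped at 1. The sketch below illustrates this in Python; it is not the rstatix implementation, and the function name `holm_adjust` is hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so that the ordering of results is preserved.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A correlation is then reported as significant if its adjusted p-value is below the chosen α.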

The three genders were compared with the Tukey–Kramer-Test (α = 0.05) under adjustments for multiple comparisons (“stats” package). This test uses the harmonic mean of sample numbers and can therefore be used when sample numbers in the groups being compared are different [40].

To test interrelatedness, Cronbach’s alpha and McDonald´s omega were computed (R-package “psych”) (Table 2). The raw data is publicly available at the FORDATIS research data repository [41].
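For readers unfamiliar with the coefficient, Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) × (1 - Σ item variances / variance of totals), where k is the number of items. The Python sketch below illustrates this formula only; it is not the "psych" implementation, the function name `cronbach_alpha` is hypothetical, and McDonald's omega is not covered because it requires a factor model.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` holds one list of scores per item; all lists refer to the
    same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(column) != n for column in items):
        raise ValueError("all items need the same number of respondents")
    # Total score per respondent across all items.
    totals = [sum(column[j] for column in items) for j in range(n)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Items that rank respondents identically yield an alpha of 1, while weakly related items pull the coefficient toward 0.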

Table 2 Reliability measurements

3 Results

3.1 Participant flow

A total of 42 participants were distributed class-wise between the experimental group and the comparison group. After the three unsubmitted questionnaires described below were excluded, 21 data sets remained in the experimental group and 18 in the comparison group. The sample sizes of the two groups differ because the participants were recruited from the paramedic courses of two schools. The courses have different numbers of participants because there are no uniform guidelines for the size of the courses. Other factors, such as voluntary participation and absence due to illness, also come into play and are difficult to plan in advance. All participants in both groups completed the experiment. In the experimental group, one person opened the online questionnaire but did not submit it. In the comparison group, two persons did the same. Figure 6 shows the flow of participants.

Fig. 6

Flow of participants

3.2 Baseline data

Age was distributed without practice-relevant differences in the two groups, but gender was not. There were significantly more males than females and non-binary persons (Table 3).

Table 3 Age and sex – Complete sample and subgroups

3.3 Descriptive statistics

Missing data: One person did not answer any of the sociodemographic questions, and three persons did not answer the age question. The remaining variables were partially not answered by one person, without any specific pattern. Three persons did not submit the questionnaire at all. Table 4 shows the location and dispersion parameters (RQ1 & RQ2).

Table 4 Means and standard deviations of constructs and factors

Figure 7 shows the results of the SSQ dimensions for the total sample (RQ1) grouped by experimental conditions (RQ2). Extreme outliers have been removed; i.e., we removed values that were more than 1.5 times the interquartile range above the third quartile or below the first quartile. An analysis at the item level shows that all those who reported moderate to severe problems with focusing (Item: “Difficulty focusing”) and/or seeing clearly (Item: “Blurred vision”) wore a visual aid (glasses or contact lenses).
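The outlier rule described above can be sketched as follows. This Python example is illustrative only: the function name `remove_iqr_outliers` is hypothetical, and the quartile method (`method="inclusive"` in the standard library) is an assumption, since the text does not state how the quartiles were computed.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Drop values more than k * IQR below Q1 or above Q3."""
    # Assumption: quartiles via the inclusive method of the stdlib.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]
```

With the values 1 to 9 plus a single value of 100, only the value 100 lies outside the fences and is removed.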

Fig. 7

Boxplot of SSQ scores

The SUS is additionally interpreted as a percentage on the grade scale [22, 34]. Table 5 shows the grouped results. Here, the SUS score was divided into grades (F to A) and underlaid with acceptability ranges to simplify interpretation according to Bangor et al. [34]. In each case, the numbers represent the frequency of participants allocated to the respective grade after weighting.

Table 5 System Usability Scale – Grade scale (according to Bangor et al., 2009 [34])

3.4 Inferential statistics

Mean score differences

Construct-level and factor-level t-Tests revealed a significant mean score difference for Amotivation (p = 0.021 [CI95 = -1.47, -0.13]). No other significant differences were found regarding the groups. A significant mean difference (p = 0.03 [CI95 = -26.47, -1.33]) between the three genders was found only for SUS between females (mean = 60.42, SD = 18.49) and males (mean = 74.32, SD = 11.42).

Correlations

With reference to RQ2, significant correlations of r > 0.3 are reported below. In both the VR group and the MR group, Identified Regulation (VR group: r = 0.67, MR group: r = 0.64) and Intrinsic Motivation (VR group: r = 0.70, MR group: r = 0.83) are highly correlated with Usability. Similarly, Spatial Presence is highly correlated with Usability (VR group: r = 0.61, MR group: r = 0.71), Identified Regulation (VR group: r = 0.55, MR group: r = 0.70), and Intrinsic Motivation (VR group: r = 0.51, MR group: r = 0.61) in both groups. Involvement is highly negatively correlated with External Regulation (r = -0.67) and Amotivation (r = -0.53) only in the VR group. The MR group is further distinguished by the high negative correlation of VR Sickness with Intrinsic Motivation (r = -0.53) and Usability (r = -0.63). Further differences of the MR group are the high negative correlation of Usability with External Regulation (r = -0.69) and Amotivation (r = -0.51). Table 6 shows a summary of the correlations.

Table 6 Significant correlations

3.5 Secondary findings

In addition to the quantifiable results presented so far, we present three secondary findings. These are to be understood as observations of the study team. First: The technical environment can be challenging. This is mainly due to the various system components that need to be connected. Each component is critical to success. As a trivial example, empty batteries in one of the controllers will cause significant delays. In the course of the project, the development and subsequent use of a checklist became established. An example is provided in the appendix. Second: Wired HMDs can affect the sense of security and presence. At the time of the study, there were no high-performance wireless solutions available that could be integrated into the image setup without latencies. Third: Small-scale operations, such as the dispensing of medication with the controllers, were time-consuming to learn and perform. Suitable data gloves that allow intuitive movement were not available at the time the study was conducted.

4 Discussion

4.1 Interpretation

Regarding RQ 1, which includes the complete sample, it can be stated that the participants had a high level of Intrinsic Motivation and Identified Regulation. Low Amotivation and low External Regulation complete the picture. Overall, this means that the participants indicated a high level of intrinsic interest. The low External Regulation may have been caused by the integration into the curriculum itself, but, like Amotivation, it should not be overinterpreted given its low expression. Nevertheless, it is worth questioning how the participants were prepared for the simulation and the evaluation by the training centers and whether competing events, such as imminent examinations, should be better considered in terms of time. Identified Regulation and Intrinsic Motivation were highly correlated with Spatial Presence, similar to Lerner et al. [42].

VR Sickness: The complete sample shows a mean value of the Total Score that must be considered “significant”. Likewise, the mean values for Disorientation and Oculomotor need to be discussed in order to initiate improvement measures. The high expression of Disorientation and Oculomotor is due to the structure of the SSQ: Both constructs contain items that address eye problems, and in part the same item, such as "Difficulty focusing", is used in several constructs. In both Oculomotor and Disorientation, this item was the leading problem. Therefore, it must be emphasized that individual adaptation of the HMD is essential. Not only must the HMD fit securely without slipping, but the interpupillary distance and the distance from eye to lens must also be adjusted. This adjustment is crucial for a good visual user experience. It should be noted here that a secure fit was difficult to achieve due to personal protective precautions (i.e., hair net), which can possibly explain the high symptom expression in our case. Also, the limitations of the system for people with visual impairments need to be explored. In case of continuous use, individual solutions, such as corrective lenses for the HMD, may have to be provided.

Presence: The participants indicated a medium to high level for all facets. In detail, this means that Spatial Presence has already reached a satisfactory level, while Realism and Involvement are not a cause for concern, although there is still room for improvement. A comparison with the study by Girau et al. [16], which deals with a similar scenario in MR and reports better results for realism, spatial presence, and involvement, supports the assumption that improvements are possible. In this respect, reference can be made to Muckler [43], who makes effective simulation dependent on the ability of teachers to help learners overcome their disbelief. Here, the pedagogical professional must also be held responsible. The goal, according to Muckler, is to enable learners to accept the otherwise unrealistic aspects of a clinical simulation. Muckler further notes: "The ability to suspend disbelief enhances the participant's level of immersion in the simulation." To support the suspension of disbelief, a fiction contract can be used. This is a "psychological contract" between learners and teachers, in the sense of establishing a "safe container", which includes the following components: clarifying expectations, attending to logistic details, and declaring and enacting a commitment to respecting learners and concern for their psychological safety [44]. Furthermore, the way the application is controlled via the graphical user interface can be discussed, for example in terms of a diegetic user interface as described by Salomoni et al. [27]. Concretely, it can be discussed whether a diegetic interface would improve the virtualized fine motor tasks (e.g., drawing up medications) in terms of usability and realism.

The high correlation of Usability and Spatial Presence in both groups highlights the great importance of the spatial dimension of presence. Due to the connection between embodiment and presence, improvements could also be achieved through stronger embodiment [45]. This includes, for example, individualizable avatars (which are very important in embodiment [46]), wearing, e.g., professional clothing with which the respective group can identify themselves. In terms of embodiment and self-identification through avatars, we would like to highlight the paper by Dalgarno and Lee [47] for further explanations.

In terms of usability, the scores are close to the cut-off of 70, which suggests that improvements are necessary, but also that an intermediate level has been reached. Participants' comments indicate that the wired HMD was particularly annoying. There was also criticism that very small-scale movements, such as drawing up medications, require a high level of attention and distract from the actual learning goal, because the controllers imitate the movement unnaturally and tracking problems occasionally occur with very fine movements. Unfortunately, due to a lack of devices with adequate performance, it was not possible to introduce "data gloves" or wireless HMDs as originally planned. This idea should nevertheless be pursued in future projects: there is a lot of ongoing development in this field, and we expect XR to reach a new level of immersion with the release of improved, affordable "data gloves". It is also worth investigating whether increasing the number of sensors could enhance the precision of movement tracking. In the present configuration, the position of the HMDs and controllers is determined through triangulation using two external sensors, both of which require an unobstructed line of sight to the devices. Any obstruction of this line of sight can cause momentary position loss and sudden "jumps" within the simulation. Introducing additional "lighthouses" might increase the stability and accuracy of device positioning. The high positive correlation between Usability and Identified Regulation in both groups, as well as the high negative correlation of Involvement with External Regulation and Amotivation in the VR Group, suggests the immense importance of targeted, planned pedagogical guidance.
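The suggestion that additional "lighthouses" might stabilize tracking can be illustrated with a rough probability sketch. It is not a model of the actual tracking system used in this study: it assumes that each sensor is occluded independently with the same probability, and that a position fix needs a fixed minimum number of sensors with line of sight. The function name, the parameters, and the example occlusion probability of 0.1 are hypothetical and chosen for illustration only.

```python
from math import comb


def dropout_probability(num_sensors: int, required_visible: int, p_occluded: float) -> float:
    """Probability that fewer than `required_visible` sensors see the device.

    Assumption (not from the study): every sensor is occluded independently
    with the same probability `p_occluded`.
    """
    if not 0.0 <= p_occluded <= 1.0:
        raise ValueError("p_occluded must lie between 0 and 1")
    if required_visible > num_sensors:
        return 1.0  # a position fix is impossible with too few sensors
    p_visible = 1.0 - p_occluded
    # Sum the binomial probabilities of exactly k visible sensors for all
    # k below the required minimum.
    return sum(
        comb(num_sensors, k) * p_visible**k * p_occluded ** (num_sensors - k)
        for k in range(required_visible)
    )


# Hypothetical comparison: two sensors (both required) versus four sensors
# (any two sufficient), each occluded 10 % of the time.
two_sensors = dropout_probability(num_sensors=2, required_visible=2, p_occluded=0.1)
four_sensors = dropout_probability(num_sensors=4, required_visible=2, p_occluded=0.1)
print(f"2 sensors: {two_sensors:.4f}, 4 sensors: {four_sensors:.4f}")
```

Under these simplifying assumptions, the chance of a momentary position loss drops markedly once redundant sensors are available. In a real classroom, occlusions are correlated (for example, a trainee kneeling over the manikin may block several sensors at once), so the actual benefit would have to be measured empirically.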

The results regarding Usability and Presence are similar to Lerner et al. [42] in the predecessor VR project EPICSAVE and to Schild et al. [8] from previous evaluations.

With respect to RQ 2, a significant mean difference between the MR Group and the VR Group was found only for Amotivation. It is surprising that Amotivation was lower in the experimental group, while no significant differences could be found for Intrinsic Motivation, Identified Regulation, and External Regulation. This appears to contrast with [4, 18, 19]. However, the objectives and methodologies of these studies differ considerably, so a comparison would require a deeper discussion of the methods and methodological effects. One possible explanation for the different findings is that the extraneous cognitive load [48] required by the operation outweighed the perception of the manikin during training. In future studies, it would be important to perform cognitive load measurements in addition to the current study design.

Understanding the underlying learning effectiveness is inherently intricate and methodologically challenging, as Hattie suggests [49]. Therefore, a longitudinal study should be conducted over several training sessions on different days to account for learning effects.

In summary, the findings regarding Realism, Involvement, and Usability imply that certain prerequisites must be met as a basic criterion for the acquisition of competencies: The technical setting must be fully mastered in a suitable classroom in order to avoid breaks in presence and distractions. VR/MR can only develop its full effect if used in a planned manner by pedagogical professionals. Pre-briefing, fiction contract, and structured debriefing can be assumed to be vital, especially with regard to the perception of Realism and Involvement. This hypothesis needs to be tested in future learning-outcome-oriented studies. Since competence-based assessment of learning outcomes is effort-intensive, a first step may be to measure objectively assessable performance metrics such as accuracy, timing, and sequencing of actions [50]. A practical example that can be further extended is the measurement of no-flow time during BLS-CPR in VR by Issleib et al. [51]. Independent variables to be considered include the impact of familiarization and habituation on learning outcome. Abelsson et al. [1] also highlight the importance of familiarization in prehospital case simulations.
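As an illustration of how such an objectively assessable timing metric could be derived from a simulation log, the following minimal sketch computes a no-flow time from the timestamps of chest compressions. It does not reproduce the method of Issleib et al. [51]: the function name, the log format (timestamps in seconds), and the pause threshold of 2 s are hypothetical assumptions made for this example.

```python
def no_flow_time(compression_times, scenario_start, scenario_end, pause_threshold=2.0):
    """Sum all intervals without chest compressions that exceed a threshold.

    Assumptions (for illustration only): timestamps are in seconds, and any
    gap longer than `pause_threshold` counts in full as no-flow time. The
    threshold value is hypothetical.
    """
    # Include scenario start and end so that delays before the first and
    # after the last compression are counted as well.
    events = [scenario_start] + sorted(compression_times) + [scenario_end]
    total = 0.0
    for earlier, later in zip(events, events[1:]):
        gap = later - earlier
        if gap > pause_threshold:
            total += gap
    return total


# Hypothetical 30 s scenario: compressions every 0.5 s from second 10 to 20.
timestamps = [10 + 0.5 * i for i in range(21)]
print(no_flow_time(timestamps, scenario_start=0.0, scenario_end=30.0))
```

Accuracy and sequencing metrics could be logged in a similar way, for example by comparing the recorded order of actions with the order defined in the treatment algorithm.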

4.2 Generalizability and limitations

The results show that trainees at paramedic training schools are open to technology and highly motivated. These are two important foundations for the use of “new” educational technologies.

In this study, internal validity may be biased by the novelty effect [52]. It should be noted that (without competent pedagogical supervision) a trainee's engagement gradually wanes as the now familiar game elements and mechanisms no longer entertain, challenge, or satisfy the user [53]. Furthermore, it can be assumed that the operation of the virtual learning environment imposes a moderate to high extraneous cognitive load. This can lead to failure to achieve the learning outcomes. Both of these limitations can be addressed by repeating the measurement of the MR and VR application over multiple instructional sessions on different days. The low interrelatedness of the presence dimension Involvement has to be tested in larger samples in the field of HMD simulation.

In view of the sample size, the generalizability of this study must be discussed critically. Reproducibility should be examined in further studies. We would once again like to draw attention to the particularities of the restrictions imposed by Covid-19 at the time of the study, which did not allow larger samples to be included. Furthermore, in the case of self-report measures, it must be considered in the research design that considerable bias may occur [54].

Due to the restrictions on teaching associated with the Covid-19 pandemic, it was not possible to establish larger longitudinal learning outcome measurements. Regarding the determination of knowledge gains in VR settings for paramedics and nursing students, we refer to the study by Schild et al. [8] in comparable settings.

VR and MR are inconceivable without an interface between human and machine. Following Joisten et al. [55], there is a relationship to the human lifeworld, a certain depth of intervention in the human lifeworld, and an influence on professional and learning practice. This repercussion on humans requires a techno-ethical perspective, which has not been adopted in this paper in order to keep the focus on the two research questions. However, a more detailed techno-ethical analysis is needed in future research.

With regard to external validity, an important strength of this study should be highlighted, namely, its practical relevance, as the evaluation was done in real learning situations at the educational partners. Thus, it can be assumed that the results can be transferred very well to social reality. However, the findings cannot be extrapolated uncritically to an international sample. Different training prerequisites as well as differences in professional socialization and understanding of roles must be considered. This is, however, mainly a question of pedagogical design and less one of basic technology-related impact factors.

4.3 Implications

The results of the survey and the evaluation itself have shown that the technical setting is demanding. Aspects such as customization of the HMD must be mastered in order to enable appropriate usability and a positive learning experience. To realize both, a "simulation faculty" is needed that can synergistically combine technical aspects and pedagogical aspects. Working with a checklist is recommended.

The results show that paramedic trainees are highly motivated to learn in XR scenarios, a potential that should be exploited. Where this potential should be used instead of conventional settings must be weighed in terms of the relationship between cost, time required for the setup, and learning success. Primarily, learning situations are conceivable that are difficult to produce using conventional means, such as certain injuries or mass casualty incidents.

This leads to further considerations that are familiar from conventional settings: Educational variables must be clarified and planned in advance. Even before the simulation, competence-based learning objectives must be defined in terms of cognitive, affective, and psychomotor learning goals, including their achievability within the available time. In order to ensure an adequate level of stress and strain, the extraneous cognitive load caused by the learning environment must also be included in these considerations. To strengthen involvement, a fiction contract is recommended. To enable reflection as an integral part of a case simulation, a structured debriefing should be established.

This study could not show that an MR scenario is superior to a VR scenario. As a consequence for research, further studies with repeated measures in a within-subjects design are recommended to investigate the role of confounding variables such as familiarization effects, learning effects, and cognitive load.

For future studies, we first recommend an extension of the methodology, for example by focusing on competence-oriented outcome parameters. These could be structured according to curricular learning objectives. Second, we recommend that the sample be expanded and varied in terms of sample size and professional affiliation. In addition to larger samples, emergency nurses, medical staff, and other health professionals could be included. Third, variation in setting and intervention is recommended, such as comparing wired HMDs with wireless HMDs. In general, the challenging technical environment, the demanding didactic considerations, the pedagogical interventions, and the organization on site make an interprofessional study team indispensable.

4.4 Conclusion

This study has a high degree of innovation as it is the first to address location-independent training from two different sites and to combine this with the pedagogical implications of interprofessional team training in EMS.

Looking at the results in terms of participant motivation, MR and VR appear to be learning environments that provide a sense of presence and realism and allow for psychological engagement of the participants. Participants in paramedic training were found to have high intrinsic and self-regulated motivation. For a planned and goal-oriented approach, we recommend creating a simulation checklist that addresses specific points of the XR simulation. Apparent trivialities, such as adequate fit of the HMD, can be critical factors for success. A competence-oriented approach is also recommended for XR simulations, as are a fiction contract and a structured debriefing, just like in a conventional case simulation. Technically, it is desirable to implement both wireless HMDs and "data gloves". In the overall appraisal, we conclude that XR is not just a gimmick, but a promising learning method when used by a competent simulation faculty. XR case simulations are difficult for teachers to run as "lone wolves". Further studies addressing competence-based learning outcomes, techno-ethical perspectives, and the role of habituation effects, learning effects, and cognitive load are needed.