1 Introduction

Within the next few years, automated vehicles (AVs) will be integrated into today’s traffic. From SAE level 4 onwards, no driver will be needed to steer the vehicle, and automation should handle nearly every situation independently [1]. AVs will communicate seamlessly with each other by means of new technologies such as Car2X and Car2Car [2]. AVs will, however, be integrated into manually driven traffic and must be able to communicate with human road users (HRUs) in this complex sociotechnical system [3]. Insufficient integration might raise communication issues [4]. This can be particularly problematic when it comes to high-risk traffic groups such as pedestrians [5]. The lack of communication between driver and pedestrian may decrease trust [6] and confidence [7]. Communication has been formed over time through social and cultural influences [8]. AVs therefore not only need to understand manual traffic, but also have to adapt their communication to cultural specificities.

To communicate appropriately, AVs can try to emulate human behavior. They can adapt their driving behavior and communicate intentions in an implicit way, such as by adapting speed and trajectories [9], or communicate directly [10] via already existing features, such as turn indicators or the horn [11]. Equipping AVs with novel external human-machine interfaces (eHMIs) that communicate additional messages might also be a beneficial approach [7, 12, 13] to replacing current driver behavior such as eye gaze, facial expressions, and head and hand gestures. In a video survey, it was found that pedestrians would cross the road more often when the AV was equipped with an eHMI [14].

Most eHMI solutions prevalent in showcars and academic research predominantly use technologies which rely on visuals, such as projection onto the street, externally legible displays showing either text or icons, or direct light from a light bar [15]. An overview of different eHMI approaches can be found in [16]. Replacing direct communication is a major challenge in the development of AVs, but also creates an opportunity to improve current interaction by establishing clear and consistent interaction patterns with eHMIs specifically designed for this purpose. Besides increasing the trust and acceptance in AVs, improvement of traffic flow is a major goal of introducing eHMIs. To improve traffic flow, fast and correct intention recognition is crucial.

References [17, 18] developed different eHMI solutions in a user-centered design process: one is a light-band wrapped around the vehicle that emits different signals, and the other is an external display showing icons (Fig. 1). These eHMIs are able to transmit the intention of the AV through the messages “AV will give way” and “AV will pass” when encountering other traffic participants. Indeed, communicating the intention and awareness of the vehicle has been considered a more fruitful approach for eHMIs than showing commands of what the pedestrian should do [19,20,21,22].

Fig. 1. eHMI variants used in the study (from left: Baseline, Light-band, Icon)

HMIs have generally been found to differ depending on the cultural background. For instance, design patterns in websites were found to differ depending on the cultural background that the website was created in [23]. This suggests that mental models and expectations of HMIs differ between cultures. If external HMIs for AVs are introduced into different cultural contexts, there are two possible solutions: eHMIs are either not adapted by the manufacturer and must therefore work cross-culturally, or they are tailored to the specific markets into which the AV is introduced. However, cultural influence might not be limited to explicit communication through eHMIs. Reference [4] states that driving behavior differs between the US, Europe, and China.

The author argues that in the US and Germany drivers show more consistent behavior than Chinese drivers, who are more prone to traffic violations. Indeed, the general rate of cars stopping to give way to pedestrians was found to be very low in China [24]. As expectancy is formed through experience, people from a Chinese cultural background might expect different stopping rates than Westerners. Furthermore, attitudes towards automated systems have been found to differ between Western and Asian cultures. In a study run in the aviation context [25], Asian and Western pilots differed in their preferences and enthusiasm towards automated systems, with Asian pilots being more enthusiastic towards automation than pilots from Western societies.

To examine the feasibility of transferring eHMIs cross-culturally, we conducted a virtual reality (VR) study on the influences of two different eHMIs on pedestrians’ intention recognition in the US, Germany, and China. We expect eHMI to improve intention recognition across all scenarios compared to baseline. As the eHMIs used in this study [17, 18] have been developed in Germany, we expect both eHMIs to generally work best in the Western and especially German background compared to a baseline without eHMI. One of the two eHMI solutions included icons, which are known to be culturally dependent [26, 27]. The other solution was purely light-based, featuring a light-band – a solution that resembles already cross-culturally used features such as turn indicators – which we thought should be more universally understood. We therefore expected the light-based eHMI to perform in a more stable way across the three studies.

2 Method

2.1 VR Pedestrian Simulator

The study took place at test studios in Mountain View (CA, USA), Mannheim (Germany), and Shanghai (China). In all three surveys, the same BMW research pedestrian simulator was used (Fig. 2). The pedestrian simulator consists of a standard HTC Vive Pro VR setup (head-mounted display, two infrared trackers, and the HTC Vive’s remote control) and a computer running the simulation software, which is based on Unity 3D. During the simulation, participants were immersed in an urban environment, standing on the sidewalk of a street and encountering an AV, a BMW i3.

Fig. 2. BMW research pedestrian simulator

2.2 Study Design and Measures

The intention of the AV was manipulated by simulating different driving behaviors. When the AV’s intention was to give way to the pedestrian, it started to decelerate 20 m before the pedestrian at a constant deceleration of −3.5 m/s²; when the AV’s intention was to pass the pedestrian, it continued onwards at a constant speed of 25 mph (about 40 km/h).
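The braking profile described above can be checked with basic constant-deceleration kinematics. The following is a minimal sketch, not the simulator's implementation: the function name `stopping_profile` is hypothetical, and it assumes the AV holds exactly 25 mph until braking starts 20 m before the pedestrian.

```python
MPH_TO_MS = 0.44704  # exact conversion factor: 1 mph = 0.44704 m/s


def stopping_profile(speed_mph, decel_ms2, brake_onset_m):
    """Return (initial speed in m/s, stopping distance in m,
    stopping time in s, remaining gap to the pedestrian in m).

    Assumes constant deceleration from the braking onset to standstill.
    """
    v0 = speed_mph * MPH_TO_MS
    stop_dist = v0 ** 2 / (2.0 * decel_ms2)  # d = v0^2 / (2a)
    stop_time = v0 / decel_ms2               # t = v0 / a
    gap = brake_onset_m - stop_dist          # distance left when stopped
    return v0, stop_dist, stop_time, gap


# Values taken from the study description: 25 mph, 3.5 m/s^2, 20 m.
v0, stop_dist, stop_time, gap = stopping_profile(25.0, 3.5, 20.0)
print(f"initial speed:     {v0:.2f} m/s")
print(f"stopping distance: {stop_dist:.2f} m")
print(f"stopping time:     {stop_time:.2f} s")
print(f"gap to pedestrian: {gap:.2f} m")
```

Under these assumptions the vehicle needs somewhat less than the available 20 m, so it comes to rest a short distance before the pedestrian.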

For the intention “Give Way”, a 3 × 3 factorial study plan was used (Table 1), including three priorities (AV, HRU, undefined) and three eHMI solutions (none, icon, light-band). For the intention “Pass”, a 2 × 3 factorial study plan with the factors priority (AV, undefined) and eHMI solution (none, icon, light-band) was used (Table 2). We did not include this intention for the HRU priority condition, because this condition would imply a violation of the traffic code, which might have influenced the overall study results.

Table 1. Study plan for the intention “Give Way”.
Table 2. Study plan for the intention “Pass”.
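The two study plans can be enumerated to confirm the number of distinct conditions per participant. This is an illustrative sketch only; the names `EHMI`, `PRIORITIES`, and `build_conditions` are hypothetical and no trial order or blocking is modeled.

```python
from itertools import product

# Factor levels as described in the study plans.
EHMI = ("none", "icon", "light-band")
PRIORITIES = {
    "give way": ("AV", "HRU", "undefined"),  # 3 x 3 plan
    "pass": ("AV", "undefined"),             # 2 x 3 plan, no HRU priority
}


def build_conditions():
    """Return every (intention, priority, eHMI) combination in the design."""
    return [
        (intention, priority, ehmi)
        for intention, priorities in PRIORITIES.items()
        for priority, ehmi in product(priorities, EHMI)
    ]


conditions = build_conditions()
for condition in conditions:
    print(condition)
print("number of conditions:", len(conditions))
```

The count of 9 + 6 conditions agrees with the 15 trials reported in the procedure.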

Independent Variables

External Human Machine Interface

To account for the effects of the technologies and potential differences in the comprehensibility of eHMIs across cultures, two eHMI solutions [17, 18] were used in this study. Furthermore, a baseline without eHMI was included.

The first eHMI consisted of an exterior display mounted behind the windscreen (see Fig. 3), which displayed two different icons. For the AV intention “Give Way”, there was an icon displaying a car with a “stop” line in front of the car to symbolize a stopping car (Fig. 3, right). For the intention “Pass”, an icon showing an open hand (Fig. 3, left) was displayed to communicate that the pedestrian should stay back. The second eHMI was a light-band integrated into the chassis of the AV, showing two different states for the two different intentions of the AV. The light-band pulsed slowly to communicate the intention “Give Way” and pulsed rapidly when communicating the intention “Pass”.

Fig. 3. Icons for intentions (left: “Pass” with open hands, right: “Give Way” with stopping car)

In the baseline condition, no eHMI was displayed and participants had to derive the AV’s intentions solely from the driving behavior. The icons as well as light-band states were displayed in white to prevent attentional biases due to color [27]. The signals were presented 6 s after the start of each trial, at a distance of 40 m from the pedestrian.
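To relate the signal onset to the braking onset, the timing can be worked out from the reported distances. The sketch below assumes the AV travels at a constant 25 mph until it begins to brake 20 m before the pedestrian; the implied starting distance is a derived value under that assumption and is not reported in the study.

```python
MPH_TO_MS = 0.44704  # exact conversion factor: 1 mph = 0.44704 m/s

speed_ms = 25.0 * MPH_TO_MS   # approach speed
signal_distance_m = 40.0      # eHMI signal shown at this distance
brake_distance_m = 20.0       # braking onset in "Give Way" trials
signal_onset_s = 6.0          # signal shown this long after trial start

# Time during which the eHMI is the only cue, before braking can be seen.
ehmi_lead_time_s = (signal_distance_m - brake_distance_m) / speed_ms

# Derived (assumption, not reported): distance at the start of the trial.
implied_start_distance_m = signal_distance_m + speed_ms * signal_onset_s

print(f"eHMI lead time before braking: {ehmi_lead_time_s:.2f} s")
print(f"implied start distance:        {implied_start_distance_m:.1f} m")
```

Under these assumptions the eHMI precedes any visible deceleration by a little under two seconds, which is the window in which an eHMI could shorten the IRT relative to the baseline.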

Priority

Three different traffic scenes with different priority regulations were included in the experimental setup: a street with a zebra crossing, where the HRU has priority (HRU), a two-lane street with no crossing markings, where the AV has priority (AV), and a parking space scenario with undefined priorities (Undefined). Speed and longitudinal and lateral distances to the pedestrian at the respective points in time were identical in all trials.

In the HRU priority, the pedestrian was standing at the curb waiting at a zebra crossing. The street was an urban two-lane street with no middle lane markings. In the US, a traffic guard was additionally placed at the zebra crossing, ready to stop the approaching car. In the AV priority condition, the pedestrian was placed at the sidewalk of the same two-lane street as in the zebra crossing condition, but distant from the zebra crossing. In the Undefined priority condition, participants were standing in a parking lot next to a parked car. The parked car was placed next to the pedestrian to create the same physical barriers as in the other conditions while not impeding visibility. The parking lot was large with multiple parked cars and no lane markings except the parking spots.

Dependent Variables

The intention recognition time (IRT) was recorded at the moment participants recognized the AV’s intention [9] and pressed the button on the HTC Vive’s remote control. In a short interview after each trial, two further variables were measured in all three countries. Correct recognition of the AV’s intention was measured by asking participants to judge the intent of the AV (give way or pass). Certainty of choice was rated on a scale from 1 (very uncertain) to 5 (very certain).

2.3 Procedure

After filling out their demographic data, participants were introduced to the pedestrian simulator and familiarized with the experimental setup. Participants were instructed that they would be participating in the study as a pedestrian encountering an AV in an urban environment. Following this, participants put on the head mounted display and ran three practice trials to become familiar with the setup, and the remote control they were holding in their right hands, as well as with the rating scales.

Participants were placed in identical starting positions for each trial. The study consisted of three study blocks which differed in terms of priority. Altogether 15 trials were executed. In the simulation, participants stood on the side of the street (on the sidewalk or next to a parked car, depending on the traffic scene). Each trial started with the AV driving at a constant speed of 25 mph (about 40 km/h). Participants were instructed that an AV would be approaching from the left, and were asked to press a button on the remote control once they decided that they had understood the intention of the AV. The moment they pressed the button the simulation froze. The IRT was measured and the experimenter completed a short interview. Afterwards the next trial started. In the end, short debriefing interviews regarding the interpretation of the eHMIs were conducted.

2.4 Participants

Across all countries, N = 82 participants took part in this study. Table 3 shows the description of the US, German (GER), and Chinese (CN) samples. Participants’ ages ranged between 20 and 65 years. They either had no visual impairment or had corrected vision such as contact lenses or glasses. In the US, Germany, and China, participants were recruited externally and received compensation.

Table 3. Sample description.

2.5 Data Analysis

Data were analyzed using SPSS statistical software, version 23. As the goal was to compare the mechanisms of different eHMIs for the different AV intentions, six repeated-measures ANOVAs were run: one for each combination of country (US, GER, CN) and AV intention (pass, give way). Where sphericity was violated, degrees of freedom were corrected according to Huynh-Feldt. The main effects of the two factors, eHMI (baseline, icon, light-band) and priority (AV, HRU, undefined), are reported separately. For the factor eHMI, two-tailed planned contrasts were conducted between baseline and the eHMI conditions, and are reported together with the effect size r. For the factor priority, post hoc tests were conducted. Contrasts and post hoc tests were Bonferroni adjusted.
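For readers who want to relate the reported contrast statistics to the effect size r, the following sketch uses the common conversion r = sqrt(F / (F + df_error)) for contrasts with one numerator degree of freedom. That this is the formula used in the analysis is an assumption; it is not stated in the text, although it reproduces reported values such as r = .56 for F(1, 28) = 12.72. The function names are hypothetical, and the Bonferroni helper shows the standard adjustment rather than SPSS output.

```python
import math


def contrast_effect_size_r(f_value, df_error):
    """Effect size r for a contrast with 1 numerator degree of freedom.

    Assumed conversion: r = sqrt(F / (F + df_error)).
    """
    return math.sqrt(f_value / (f_value + df_error))


def bonferroni_adjust(p_value, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1.0."""
    return min(1.0, p_value * n_comparisons)


# Contrast statistics as reported for the light-band vs. baseline
# comparison (intention "Give Way", correct interpretation).
print(round(contrast_effect_size_r(12.72, 28), 2))  # US sample
print(round(contrast_effect_size_r(6.35, 29), 2))   # German sample

# Two planned contrasts per ANOVA (icon vs. baseline, light-band vs.
# baseline); the raw p value here is purely illustrative.
print(bonferroni_adjust(0.03, 2))
```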

3 Results

3.1 External Human Machine Interface (eHMI)

Correct Interpretation of Vehicle Intention

Intention “Give Way”

In the US (F(2, 56) = 7.95, p = .001) and in Germany (F(2, 56) = 4.07, p = .022), the type of eHMI had a significant main effect on correct interpretation rates of the vehicle intention (Fig. 4). In both countries, the light-band significantly enhanced the number of correct interpretations, improving the rate of understanding in the US from 59% (baseline) to 87% (F(1, 28) = 12.72, p = .003, r = .56) and in Germany from 64% to 86% (F(1, 29) = 6.35, p = .035, r = .42). For the Chinese sample, the type of eHMI did not influence the correct interpretation significantly (F(2, 42) = 0.58, p = .563).

Fig. 4. Correct interpretation across countries and eHMI for the intention “Give Way”

Intention “Pass”

Overall, eHMI influenced the rate of correct interpretation in the opposite direction compared to the intention “Give Way” and decreased the rate of understanding (Fig. 5). In the US (F(2, 56) = 6.38, p = .003), the icon eHMI significantly decreased the correct interpretation rate from 93% in the baseline to 69% (F(1, 28) = 10.98, p = .005, r = .53). For the German sample, no significant main effect can be reported (F(1.65, 47.98) = 2.73, p = .082). However, contrasts revealed that significantly fewer participants correctly detected the intention of an AV with a light-band (83%) compared to baseline (97%) (F(1, 29) = 7.86, p = .018, r = .46). In China, the type of eHMI had a significant main effect on correct interpretation rates (F(1.44, 30.18) = 4.09, p = .041), but none of the Bonferroni-adjusted contrasts reached significance: the intention recognition rates for light-band (82%) and icon (80%) did not significantly differ from the baseline (97%).

Fig. 5. Correct interpretation across countries and eHMI for the intention “Pass”

Intention Recognition Time (IRT)

Intention “Give Way”

In the US (F(2, 56) = 7.49, p = .001) and in China (F(2, 42) = 8.32, p = .001), the main effects of eHMI showed that responses were faster with an eHMI present (Fig. 6). Contrasts revealed that in both countries the icon (US: F(1, 28) = 13.04, p = .002, r = .56; CN: F(1, 21) = 11.26, p = .006, r = .59) as well as the light-band (US: F(1, 28) = 6.23, p = .037, r = .4; CN: F(1, 21) = 9.28, p = .012, r = .55) led to faster IRTs than the baseline condition. In Germany the IRT did not significantly differ between eHMIs (F(1.74, 48.75) = 1.80, p = .174).

Fig. 6. IRT across countries and eHMI for the intention “Give Way”

Intention “Pass”

In the US, results were similar to the intention “Give Way”, with eHMI conditions leading to faster IRTs (F(2, 56) = 7.74, p = .009) (Fig. 7). The icon (F(1, 28) = 11.93, p = .004, r = .55) and the light-band (F(1, 28) = 10.46, p = .006, r = .52) differed significantly from the baseline condition. In China (F(2, 42) = 3.13, p = .054) as well as in Germany (F(1.39, 40.33) = 3.26, p = .065) eHMI did not influence IRT significantly when the AV’s intention was to pass. In Germany, however, planned contrasts revealed that responses were significantly faster with an icon present as compared to the baseline condition (F(1, 29) = 8.24, p = .015, r = .47).

Fig. 7. IRT across countries and eHMI for the intention “Pass”

Certainty of Choice

Intention “Give Way”

When the AV’s intention was to give way, participants in all three countries did not differ in their certainty of choice between eHMI variants (US: F(1.62, 45.38) = 0.07, p = .899; GER: F(2, 56) = 0.42, p = .582; CN: F(2, 42) = 0.02, p = .984).

Intention “Pass”

For the intention “Pass” there were significant effects of the eHMIs in the US (F(2, 56) = 5.41, p = .007) and in Germany (F(2, 56) = 4.06, p = .022) which can be attributed to the light-band leading to a lower certainty of choice (US: F(1, 28) = 11.73, p = .004, r = .54; GER: F(1, 29) = 6.28, p = .036, r = .42). In China, certainty did not differ significantly between eHMIs (F(2, 40) = 1.20, p = .311).

3.2 Priority

Intention “Give Way”

Across all countries and variables, there were three significant main effects of the factor priority. In the US, priority influenced the correct interpretation (F(2, 56) = 3.33, p = .043) and the certainty of choice (F(2, 56) = 3.97, p = .024). For the German sample, priority had an influence on IRT (F(1.52, 42.61) = 3.63, p = .046). However, for all three effects, Bonferroni-adjusted post hoc tests revealed no significant differences between the types of priority. In China, no main effects of priority can be reported.

Intention “Pass”

For the intention “Pass”, there was only one significant main effect of priority, namely on correct interpretation rates in China (F(1, 21) = 6.83, p = .016). The post hoc test revealed that the parking scene with undefined priority led to a better understanding of the AV’s intention than the two-lane street condition where the AV had priority (Mdiff = −0.14, 95% CI [−0.25, −0.03], p = .016).

3.3 Interaction Effect (eHMI and Priority)

Intention “Give Way”

In total, three significant interaction effects between eHMI and priority occurred, one per country.

In the US (F(4, 112) = 3.71, p = .007) as well as in China (F(4, 84) = 2.60, p = .041), there were significant interactions associated with the correct interpretation rates. For the US sample, the light-band eHMI had a consistently high rate of correct interpretation for all traffic scenarios, while the icon eHMI only improved the correct ratings in the pedestrian priority situation and did not differ from baseline in the other scenarios. In China, the rate of correct understanding for the light-band eHMI and the icon eHMI was rather stable across all traffic scenarios, while the baseline condition differed strongly between the priorities (best for HRU priority and worst for undefined priority).

In Germany, there was a significant interaction effect regarding the certainty of choice (F(4, 112) = 2.79, p = .030), which can be attributed to the HRU priority condition, in which the icon and the light-band improved certainty compared to the baseline. For the AV priority and undefined priority conditions, eHMIs and baseline did not differ.

Intention “Pass”

When the AV was passing, no significant interaction effects between the eHMIs and priority occurred.

4 Discussion

This pedestrian VR simulator study investigated the influence of eHMI on pedestrians’ intention recognition across three studies and three cultural backgrounds.

We predicted that an eHMI would generally improve intention recognition across all scenarios compared to the baseline. Additionally, since the eHMIs were developed in Germany, we expected them to generally work best in a Western and especially German environment. One of the two eHMI solutions included icons which are known to be culturally dependent [26, 27]. The other solution was a light-band. The light-band solution resembles common cross-culturally used features such as turn indicators, which we predicted to be more universally understandable. We therefore expected the light-based eHMI to perform in a more stable way across the three studies.

4.1 Main Findings: eHMI and Intentions of the AV

We expected eHMIs to improve intention recognition across all priorities and all cultures. We found that priority did not have systematic influences across all variables. Results show that eHMIs improved correct intention recognition rates in the US and Germany when the AV’s intention was to give way. These effects were caused by the light-band eHMI, while the icon eHMI did not differ in terms of intention recognition from the baseline across all cultures. IRT was lowered in the US and China, while participants in all three studies felt equally confident in their selection. EHMIs, however, led to deteriorated intention recognition when the AV’s intention was not to yield. Intention recognition was most accurate in the baseline condition for all three countries (US: 93%, GER: 97%, CN: 95%) when the car did not yield to the pedestrian. Certainty of choice was lowered in the US and Germany when an eHMI was present and the AV passed and did not yield.

These findings suggest that using an eHMI to communicate that the AV will not yield to a pedestrian is not beneficial, but rather potentially confusing or even detrimental. The event of a car approaching at 40 km/h itself seems to be dissuasive enough for pedestrians to decide not to cross in front of the arriving AV. Showing eHMIs in this condition seems to influence pedestrians’ safe legacy behavior, which could ultimately lead to an inappropriate or dangerous decision to cross the street. As scenarios were stopped at the point in time participants made their decision and pressed the button, we cannot conclude that participants would ultimately have walked into the street. The lower error rate without the eHMI, however, suggests that it would be beneficial to refrain from showing any eHMI signals unless the car is engaged in an already safe interaction with vulnerable road users, such as stopping.

When the AV was about to yield, eHMIs provided significant benefits for the pedestrians compared to baseline in the US and Germany. It could therefore be beneficial to use eHMIs to reinforce the AV’s intention to let the pedestrian cross. This finding is in accordance with other findings of significant benefits for communicating the intention to yield to pedestrians [28, 29].

4.2 EHMIs and Culture

We expected the light-based eHMI to be less susceptible to cultural influence, thus showing the most stable effects across all three samples. While the German and US samples showed improved intention recognition with an eHMI when the AV was about to yield, Chinese participants did not profit from an eHMI in this scenario. IRTs were shortened in the US and China with an eHMI present, while remaining constant in Germany. The fact that the icon eHMI did not show any significant benefits in intention recognition rates might be caused by shortcomings of the specific eHMI design: the icon eHMI might be more difficult to process, or it might have become visible only at a later point in time, making the two icons difficult to distinguish.

The Chinese participants did not seem to fully profit from any eHMI showing the AV’s intention to yield. While they responded significantly more quickly, intention recognition was not improved. Insights from the post-study interviews suggest that this might be due to misinterpretations of the eHMI. Participants associated the slow pulsing with warning signals or even interpreted the eHMI as a design element without any further meaning. Thus, there are indications that the specific eHMI type that showed positive effects in Germany and the US was not suitable for Chinese pedestrians.

Cultural differences regarding the general traffic behavior in China might be a further explanation for this study’s results. The traffic scenarios used were rated by all study participants as very suitable for the Chinese market at the study debriefing. However, the typical behavior of Chinese drivers encountering pedestrians in these scenarios might differ fundamentally from German or US drivers in equal scenarios. For instance, the general stopping rate of cars to give way to pedestrians was found to be very low in China [24]. This habitual behavior of traffic participants might lead to a very low expectancy of Chinese pedestrians that any car, manually driven or automated, will yield to them, thus influencing overall probabilities that an eHMI will be considered in their decisions. The interaction effect of improved intention recognition with eHMI in the parking space scene found in this study further supports this hypothesis.

4.3 Limitations

This study’s results are inherently limited since very controlled traffic scenarios were used in order to isolate differences in the comprehensibility of the eHMI. Therefore, the results cannot be directly transferred to real traffic. All trials in this study included only one pedestrian and one AV, which only represents an excerpt of actual traffic. Furthermore, the findings are limited by constraints of the VR setting, such as resolution, brightness, or angle of view, which might also have had an impact on effects such as the difference between a light-band and an icon eHMI. The visibility of the icon eHMI might have been reduced due to a lack of sufficient resolution or a limited field of view. In addition, the pedestrians’ perception of the AV’s braking behavior in VR might be different from the one they have for manually driven vehicles. If braking behavior is less predictable in real traffic, for instance due to other influencing factors, eHMIs could have a greater impact than in VR. Thus, VR-specific effects that reduce visibility compared to the real-world experience with eHMIs should lead to reduced effects for the eHMIs compared to the baseline.

4.4 Future Research

To generalize this study’s findings, more complex scenarios which also include additional traffic participants besides the study participant and the AV should be investigated. In addition, different speeds for the approaching AV should be included as this might profoundly impact the perception and interpretation of the eHMI. EHMIs should be tested on the real road to overcome the technological shortcomings of VR or other simulators. Furthermore, different methodological setups to evaluate eHMIs should be part of future research. Besides IRTs, critical gap acceptance [30] as an indicator of traffic efficiency or actual crossing behavior, such as crossing initiation time, might be of interest. Even in the scenarios in this study, which yielded significant benefits, negative side effects such as pedestrian distraction and the lack of safety glances at other vehicles present might be observed by using eye tracking in different setups. Pedestrians’ attention might be captured by an eHMI solution and lead to neglect of the road traffic around them. This should therefore be investigated further.

No “one size fits all” eHMI solution was found in this study. It seems questionable to just deploy existing eHMI solutions to differing cultures. In further research and development, eHMIs should be adapted to their respective markets, such as the Chinese one, by means of new and improved design concepts by and for the respective market. The interaction effect between priority and eHMIs found in the Chinese sample suggests that eHMIs have further potential for Chinese pedestrians. Once developed, these localized eHMIs have to undergo thorough evaluation in different, culturally adapted traffic scenarios, also considering that we cannot know the cultural background of the recipient of the AV’s message.

5 Conclusion

The authors conclude from the results presented that from a safety point of view it is not necessary or may even be counterproductive to display eHMIs in situations that might be harmful when the eHMI is misinterpreted. We therefore argue that situations in which an eHMI is displayed should be selected cautiously and benefits and potential problems should be studied carefully.

From a cultural point of view, the results of this study might have several implications. First, it might be concluded that eHMIs have to be localized and adapted to the respective market and the expectancy of the traffic participants in the culture they are introduced into. It will, however, be challenging to deploy localized eHMI solutions to different markets as, unlike in most other HMIs, an AV does not know the cultural background of the recipient of its messages. For instance, a European pedestrian encountering the same AV type in Europe and then China might be confused if the same vehicle interacts with him or her in a different way in each country.

Second, traffic scenarios in which an eHMI is used might differ between cultures. It might for instance be suitable to use eHMI in the Chinese market only when interacting in shared space scenarios, as interactions might be resolved equally well without eHMI in other traffic scenarios.

Third, traffic scenarios used for testing eHMIs have to be selected carefully, taking into account cultural differences. These scenarios not only have to be comparable in terms of measurable context factors such as priorities, distances, or velocities [11], but should also take into account habitual behavioral patterns of current traffic and the expectations of road users as to what happens in such situations.