1 Introduction

“Social stories” are a storytelling approach used in educational and therapeutic interventions for persons with Neurodevelopmental Disorder (NDD), particularly autism. A social story is a short narrative that uses visuals and, in most cases, written text to describe a particular social situation, event, or activity in a clear and reassuring manner that is easily understood by the individual with disability. Social stories are used as learning material to promote the development of autonomy and appropriate behaviors, and to teach particular social skills, such as identifying important cues in a given situation, understanding rules, routines, and expectations, or taking another’s point of view. We exploit Wearable Immersive Virtual Reality (WIVR) technology to create a novel form of social story, i.e., the Wearable Immersive Social Story (WISS).

The digital content of a WISS consists of 360° videos that reproduce real environments of everyday life and the social situations taking place there (e.g., “taking the metro”, “visiting a museum”, “shopping at the supermarket”). These videos are played on a smartphone and viewed through a low-cost Head-Mounted Display (Google Cardboard) that makes the user feel inside the virtual space. Interaction is achieved through head movements, gaze pointing, or gaze focus. In a WISS, videos are organized into a hypertextual structure. They are enriched with visual cues, which help users gain a better understanding of the social situation, and with interactive elements. The latter make the virtual experience more fun and engaging, while the action-feedback mechanism of interaction reinforces cause-effect understanding and promotes a sense of purpose and of active control over the stimulation.

The paper discusses the design process, i.e., 4 workshops with NDD experts interleaved with prototyping activities, which was performed in cooperation with 6 NDD specialists and led to the definition of WISS and to the creation of two examples of Wearable Immersive Social Stories. We briefly describe the technological framework (called XOOM) that enables the authoring and personalization of a WISS and its execution at run-time. Finally, we discuss the highlights that emerged from the final workshop concerning the usability of XOOM and the benefits and drawbacks of Wearable Immersive Social Stories for persons with Neurodevelopmental Disorder.

2 Related Work

Social stories have been proposed as an effective intervention for persons with Autism Spectrum Disorders (ASD) since the early 1990s. The term “Social Story” has been trademarked by its original creator to denote a narrative characterised by 10 detailed criteria [7] that define and guide the story creation in order to help the individual with disability understand the entirety of a situation: who, what, when, where, and why. In most of the existing literature [1, 3, 5, 10, 20, 21, 22], the term is used in the broader sense given at the beginning of this paper. Research on the effectiveness of this instrument is limited and reports highly variable effects on the learning process [13, 15]. Still, social stories remain widely used in therapeutic and educational interventions for subjects with ASD as well as with other forms of disability in the NDD spectrum.

Originally based only on paper-based visual and textual materials, social stories in today’s practice often use digital media, e.g., images, animations, and videos on computer displays [8, 16]. In the research arena of Virtual Reality (VR) and Wearable Immersive Virtual Reality (WIVR), there are some examples of applications created for subjects with NDD that focus on social situations and can be regarded as transpositions of social stories. Strickland et al. [19] developed desktop 2D virtual environments to teach fire safety skills to young (3–6 years old) children with ASD. Josman et al. [9] used 2D VR to teach students on the autism spectrum aged 8–16 years to cross the road safely. In [12], participants with autism used a WIVR application for Oculus Rift that aims at preparing individuals with ASD to use public transportation by placing them in a 3D city and setting tasks that involve taking the bus to reach specific destinations. Cheng et al. [4] present a system that employs a 3D virtual environment on I-Glasses PC 3D Pro to help children concentrate on social situations and learn non-verbal communication and social behavior. VR is considered appropriate for this target group to help them learn about real-life situations, because in the virtual space behaviors and responses can be practiced in a safe and repeatable environment, while interactivity promotes engagement and cause-effect understanding [2, 11].

WIVR applications have been found to improve attention skills because the head-mounted display (HMD) removes the distractions of the outside world, a feature that is important for subjects with NDD, who often have severe attention deficits. Benefits in this area were observed in the study reported in [5], in which low-to-medium-functioning children with NDD used a low-cost viewer to interact with immersive digital transpositions of paper-based fantasy tales. Two important concerns related to the use of WIVR in interventions for persons with NDD are the acceptability of the headset and the risk of physical side effects that are typical of experiences with wearable VR environments [17]: motion sickness (due to a mismatch between the visually perceived movement in the simulated world and the vestibular system’s sense of movement of our body), double vision (a condition under which virtual elements are seen as two overlapping copies instead of being perceived as one) and eye fatigue (the feeling that our eyes are burning, itchy, and tired). First-generation VR headsets were characterized by poor viewing angles, high latency, and considerable weight. In 1996, Strickland et al. [18] explored the acceptability of WIVR in a study with two autistic children, aged 7.5 and 9 years, aimed at teaching them to recognize the colors of cars and to cross the street safely. The authors described the HMD as “heavy and awkward”: eventually the children agreed to wear it, but they manifested dizziness and eye fatigue during the experience. Today HMDs are much more comfortable, including those commercially available at an affordable cost (e.g., Samsung Gear VR and Google Cardboard). In [5], the majority of study participants using WIVR on Google Cardboard had an enjoyable experience, were fascinated by the immersion, and expressed the willingness to play with it again.

VR headsets are increasingly being used to view 360° videos, e.g., in tourism, cultural heritage, and professional training. There are examples of the use of wearable immersive 360° videos in regular education, e.g., the Google Expeditions Program and the Immersive Education Initiative. To our knowledge, the VR environments adopted in existing studies on WIVR for subjects with NDD are created in computer graphics, and the use of wearable 360° videos has not yet been explored in learning interventions for persons with NDD. In our research, we use Google Cardboard, the cheapest WIVR solution on the market (5€ for the paper version, 30€ for more resistant plastic variants). The VR viewer is composed of two biconvex lenses mounted on a plastic or cardboard structure available in different colors and shapes. The smartphone positioned inside the visor displays the visual contents, splitting them into two near-identical two-dimensional images (Fig. 1 – left). The illusion of depth and immersion in the virtual environment is created by the stereoscopic effect generated by the viewer lenses (Fig. 1 – right). Interaction is achieved through gaze pointing and focus, assuming that the gaze direction is defined by head orientation (detected by the smartphone sensors) and is always at the center of the screen. Users navigate the virtual world by rotating their head, which consequently rotates the virtual scene projected on the display.
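The gaze-based interaction just described (gaze direction equal to head orientation, always at the screen center) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our platform: we assume head orientation is available as yaw/pitch angles in degrees, that “gaze pointing” means the gaze ray falls within a small angular radius of a target, and that “gaze focus” means it stays there for a dwell time. The names `gaze_vector`, `angle_between` and `DwellSelector`, the 5° radius and the 1.5 s dwell are hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple

Vector = Tuple[float, float, float]


def gaze_vector(yaw_deg: float, pitch_deg: float) -> Vector:
    """Unit direction of the gaze ray, assumed to coincide with head orientation
    (i.e., with the center of the screen). Yaw turns left/right, pitch up/down."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def angle_between(a: Vector, b: Vector) -> float:
    """Angle in degrees between two unit vectors (dot product clamped for safety)."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


@dataclass
class DwellSelector:
    """Fires when the gaze stays on a target for dwell_s seconds (gaze focus)."""
    target_yaw: float
    target_pitch: float
    radius_deg: float = 5.0   # hypothetical angular tolerance around the target
    dwell_s: float = 1.5      # hypothetical dwell time
    _since: Optional[float] = field(default=None, init=False)

    def update(self, t: float, yaw: float, pitch: float) -> bool:
        """Feed one head-orientation sample taken at time t (seconds)."""
        target = gaze_vector(self.target_yaw, self.target_pitch)
        if angle_between(gaze_vector(yaw, pitch), target) > self.radius_deg:
            self._since = None    # gaze left the target: restart the timer
            return False
        if self._since is None:
            self._since = t       # gaze just entered the target
        return t - self._since >= self.dwell_s
```

Feeding the selector one sample per rendered frame keeps the logic independent of the frame rate, since only timestamps are compared.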

Fig. 1. Google Cardboard viewer (top-left), view on the smartphone screen (bottom-left) and conceptual view during the experience (right)

3 Designing and Developing Wearable Immersive Social Stories

The design of Wearable Immersive Social Stories was a collaborative process between our university group (4 computer engineers and 1 designer, hereinafter referred to as the “technical team”) and a team of 6 NDD specialists (special educators, neuropsychiatrists, therapists, hereinafter referred to as “experts”) from two local care centers. We did not involve subjects with NDD in the design activities because of the nature and severity of the disabilities of the persons attending the centers. The experts and their patients had previously participated in an empirical study to evaluate our earlier WIVR applications. They were enthusiastic about this technology and had the idea of using it for the purposes of their “M4A - Museum4All” project, an initiative devoted to improving the accessibility of museums for persons with NDD. In this project they had created a set of social stories about visits to different museums (see example in Fig. 2 – left), and they wanted to transpose them into WIVR environments. The iterative co-design process comprised 4 workshops interleaved with the development of progressive prototypes.

Fig. 2. Left: paper-based social story about the museum visit; right: script for the 360° video (scenes 1 and 2) defined during workshop 1

3.1 Workshop 1

The first workshop was devoted to identifying the contents and the technology for the social stories to be rendered in a virtual environment. We discussed the tradeoffs between computer graphics contents and 360° videos, and among the different VR devices. We selected some examples of 360° videos available on various museum websites (e.g., the Louvre), and the experts experienced them on different devices (HTC Vive, Samsung Gear VR, and Google Cardboard). We agreed to use 360° videos and Google Cardboard for several reasons. Story contents based on 360° videos can be created at low cost, by recording real-life situations (using cheap commercial 360° cameras) or by retrieving existing videos from the web.

In principle, caregivers (specialists and parents) can create these contents autonomously, without learning any particular technical skill except video editing and without involving technology experts. 360° videos were perceived as more immersive and realistic than computer graphics applications, and they enable the person with NDD to experience a naturalistic setting and to perceive the visual stimuli of the social space “as it really is”. Inside the immersive digital space, the user is required to build and process a representation of the virtual environment “as a location” in order to navigate it successfully. Outside the virtual experience, the user is expected to generalize from this construct, linking his/her mental representation of the virtual environment to the real world in order to understand the physical environment and the social situation taking place in it. Subjects with NDD have limited generalization capabilities, and this process may require long exposure to the virtual environment; 360° videos of real contexts would facilitate this mapping. Finally, although high-end headsets (e.g., HTC Vive) offer higher-quality VR experiences in terms of accuracy of head movement detection and screen resolution, the very low cost of Google Cardboard increases the potential adoption of WIVR experiences at care centers and facilitates their use at home.

After this discussion, we selected a specific social story as a case study, and the rest of the workshop was devoted to designing the videos for its WIVR transposition. We decided to record a different situation for each scene of the paper-based social story. We explored the pros and cons of recording each scene with a fixed or a mobile camera. In the first case, the camera is placed at a specific point of the environment and remains fixed, so that the user of the 360° video perceives himself/herself as standing in the scene. In contrast, a video produced with a mobile camera suggests to the user the idea of walking through the scene, following the movement of the camera. In the end, we generated a detailed video script, defining for each scene what the camera should record and in which mode (fixed or mobile) (Fig. 2 – right).

3.2 First Prototype and Workshop 2

During the 3 weeks between workshop 1 and workshop 2, two members of the technical team and one therapist went to the museum and shot the videos for each scene according to the script, using a Samsung Gear 360 camera. In doing so, we followed existing guidelines on how to correctly record 360° videos. For example, to make viewers feel present in first person in the story, the camera should be mounted about 150 cm from the floor to simulate an average person’s height, and all key elements in the scene should be kept roughly 1 to 1.5 m away from the camera: if they are too close, they look distorted, while if they are too far, they may be indistinguishable. Other important recommendations are to avoid unexpected and fast camera movements and, in mobile camera videos, to keep the camera’s front side facing the direction of movement; otherwise, watching the final product can cause dizziness or nausea. To reduce vibrations, the camera was mounted on a bike helmet worn by the smallest team member (1.55 m), so as to best meet the height criterion.
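The two numeric recording guidelines above (camera at about 150 cm; key elements roughly 1 to 1.5 m from the camera) lend themselves to a simple pre-shoot checklist. The sketch below is purely illustrative: the names `KeyElement` and `check_shot`, the ±10 cm height tolerance, and the treatment of “roughly 1 to 1.5 m” as a strict band are our assumptions, not part of the cited guidelines.

```python
from dataclasses import dataclass
from typing import List

TARGET_HEIGHT_CM = 150.0          # approximate average person's height, per the guidelines
HEIGHT_TOLERANCE_CM = 10.0        # hypothetical tolerance (not from the guidelines)
MIN_DIST_M, MAX_DIST_M = 1.0, 1.5 # "roughly 1 to 1.5 m", treated here as a strict band


@dataclass
class KeyElement:
    name: str
    distance_m: float  # planned distance from the camera


def check_shot(camera_height_cm: float, elements: List[KeyElement]) -> List[str]:
    """Return human-readable warnings for one planned 360° shot."""
    warnings = []
    if abs(camera_height_cm - TARGET_HEIGHT_CM) > HEIGHT_TOLERANCE_CM:
        warnings.append(f"camera at {camera_height_cm:.0f} cm, "
                        f"expected about {TARGET_HEIGHT_CM:.0f} cm")
    for el in elements:
        if el.distance_m < MIN_DIST_M:
            warnings.append(f"{el.name}: too close ({el.distance_m} m), may look distorted")
        elif el.distance_m > MAX_DIST_M:
            warnings.append(f"{el.name}: too far ({el.distance_m} m), may be indistinguishable")
    return warnings
```

Such a check only covers the measurable criteria; steadiness and movement direction still have to be judged while shooting.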

After merging the individual scenes, the resulting 360° video was experienced and discussed by the experts in workshop 2 to identify requirements for improvements and extensions.

Some of the videos we recorded had quality issues: in some cases, the camera was not steady enough to guarantee a smooth VR experience; in a few others, the camera was out of focus, resulting in blurred fragments. Low-quality video fragments could induce a sense of sickness, so we identified the problematic fragments and decided to re-shoot them with greater care after the workshop.

The therapists pinpointed the need for facilitators, i.e., visual cues superimposed on the video that attract users’ attention and help them focus on the elements of the virtual environment that are most relevant for understanding the current situation or the expected behavior in a given context. To make the experience more engaging and fun, some facilitators should be interactive, i.e., they generate visual (animated) effects when the user focuses his/her gaze on them. We defined the following facilitators:

  • Geometric shape (e.g., arrow or circle): it draws the user’s attention with clear visual signals contrasting with the realistic background.

  • Highlight: it lights up a specific area of a scene while “shading” the rest of the environment, to drive the user’s attention to relevant details of the scene by removing the surrounding stimuli.

  • PCS (Picture Communication Symbols): widely used in Augmentative and Alternative Communication (AAC), PCSs are color or black & white picture cards that represent objects, actions, activities, people, events, or more abstract concepts like feelings (e.g., happiness, sadness, disappointment).

  • Sound: it adds more realism to the scene or provides voice instructions and details in a specific situation.

  • Textual popup: it contains textual instructions, social cues, or comments for subjects who can read.

The experts also recommended introducing “pause points” at which the video is suspended. Pause points would give the user some time to explore the surrounding environment, to discover facilitators, and to understand their meaning; they would also ease interaction with interactive facilitators, which would be difficult while the video is running. A pause can either have a fixed duration, or end when the user focuses his/her gaze on an interactive element, which restarts the video.
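The two pause-point variants described above (fixed duration, or resume on gaze selection of an interactive element) can be captured by a small decision function. This is a sketch under assumptions: the names `ResumeMode`, `PausePoint` and `should_resume`, and the idea that exactly one designated element resumes the video, are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ResumeMode(Enum):
    FIXED_DURATION = auto()  # the video restarts after a set number of seconds
    ON_GAZE = auto()         # the video restarts when the user selects an element by gaze


@dataclass
class PausePoint:
    at_s: float                           # video time at which playback is suspended
    mode: ResumeMode
    duration_s: Optional[float] = None    # used only with FIXED_DURATION
    resume_element: Optional[str] = None  # hypothetical id of the resuming element (ON_GAZE)


def should_resume(p: PausePoint, paused_for_s: float,
                  selected_element: Optional[str]) -> bool:
    """Decide whether playback should restart after pause point p."""
    if p.mode is ResumeMode.FIXED_DURATION:
        return p.duration_s is not None and paused_for_s >= p.duration_s
    # ON_GAZE: only the designated interactive element restarts the video
    return selected_element is not None and selected_element == p.resume_element
```

In the gaze variant the pause has no upper bound, which is one reason a caregiver may need a way to restart the video manually.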

Workshop 2 led to the design of a new script for the videos of the “museum” WISS, extending the initial one with pause points allocated at specific times and with the specification of facilitators (type, content, and spatial and temporal characteristics, i.e., when they had to appear, where, and for how long).
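A script of this kind, listing for each facilitator its type, content, position, and time window, maps naturally onto a small data model. The sketch below is a hypothetical encoding, not the script format we actually used: positions are assumed to be yaw/pitch angles in the 360° frame, content is assumed to be a file name or text, and the names `FacilitatorType`, `Facilitator` and `SceneScript` are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FacilitatorType(Enum):
    SHAPE = "geometric shape"
    HIGHLIGHT = "highlight"
    PCS = "PCS"
    SOUND = "sound"
    TEXT_POPUP = "textual popup"


@dataclass(frozen=True)
class Facilitator:
    kind: FacilitatorType
    content: str          # e.g., an image or audio file name, or popup text (hypothetical)
    yaw_deg: float        # where: horizontal position in the 360° frame
    pitch_deg: float      # where: vertical position
    start_s: float        # when it appears in the video
    duration_s: float     # how long it stays visible
    interactive: bool = False

    def visible_at(self, t: float) -> bool:
        return self.start_s <= t < self.start_s + self.duration_s


@dataclass
class SceneScript:
    video: str
    pause_points_s: List[float]
    facilitators: List[Facilitator]

    def active(self, t: float) -> List[Facilitator]:
        """Facilitators to render at video time t."""
        return [f for f in self.facilitators if f.visible_at(t)]
```

Using half-open time windows avoids two consecutive facilitators being shown together at the exact switching instant.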

3.3 Second Prototype and Workshop 3

Between workshops 2 and 3 (4 weeks), the technical team re-shot the low-quality video fragments and implemented the second version of the Wearable Immersive Social Story, adding facilitators (Fig. 3) and pause points according to the specifications defined in Workshop 2.

Fig. 3. Examples of facilitators: an interactive sphere (left) and a highlight (right)

In workshop 3, the therapists brought an additional social story, “going to the supermarket”, which they normally use to teach persons with NDD to search for and buy specific products in a large store. This story was used as a second case study. As a group activity, we defined the script for a second WISS based on this story, in order to validate the overall approach and to elicit additional requirements. Four new ideas emerged: “Distractor”, “Hyper-story”, “Caregiver’s Real-Time Monitoring and Control”, and “Personalization”.

Distractor.

Reexamining the role of the stimuli associated with facilitators, we realized that, depending on the user, the context, and the learning goal, the same stimulus could have different meanings and trigger different reactions. For instance, the noise of a car in a city environment would act as a facilitator, since users expect it when they see cars in the video: its absence would somehow break the immersion. Still, the same car roaring in a video showing a natural landscape would probably distract the user’s attention. Similarly, a graphic element such as an arrow pointing to a door could indicate that the door should be opened and that the user should interact with it; but in the case of an emergency door, the user should ignore it or at least avoid interacting with it. We therefore came up with the idea of “distractors”, i.e., visual elements (of the same types as facilitators) that can be introduced in advanced sessions to train selective attention and improve understanding of a social situation.
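Since distractors share their types with facilitators, one simple way to model them is a role attribute on each visual element, plus a rule deciding from which session an element is shown. The sketch below is a hypothetical illustration: the `min_session` field, the session-based filtering, and the count of gaze selections per role are our assumptions about how selective-attention training could be supported, not features described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Role(Enum):
    FACILITATOR = "facilitator"
    DISTRACTOR = "distractor"


@dataclass(frozen=True)
class VisualElement:
    element_id: str
    role: Role
    min_session: int = 1  # hypothetical: first session in which the element is shown


def elements_for_session(elements: List[VisualElement], session: int) -> List[VisualElement]:
    """Elements shown in a given session; distractors would typically get a
    higher min_session so that they appear only in advanced sessions."""
    return [e for e in elements if session >= e.min_session]


def attention_summary(selected_ids: List[str], elements: List[VisualElement]) -> Dict[str, int]:
    """Count how many gaze selections hit facilitators vs. distractors."""
    roles = {e.element_id: e.role for e in elements}
    summary = {Role.FACILITATOR.value: 0, Role.DISTRACTOR.value: 0}
    for element_id in selected_ids:
        role = roles.get(element_id)
        if role is not None:  # ignore selections of unknown elements
            summary[role.value] += 1
    return summary
```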

Hyper-story.

The experience of a real situation is intrinsically sequential in time but not always linear in space. The physical environment (e.g., a large supermarket with main aisles) may have a “hyper” structure, with different paths available from a given position. To help persons with NDD master the complexity of “hyper” physical spaces, the experts suggested creating hyper-stories, in which at some moments users must choose among alternative directions, i.e., among alternative videos to play in order to continue the experience. In these situations, some interactive elements on the videos are “links” that act as “portals” towards different physical contexts. For instance, in the supermarket scenario, selecting an interactive element on a specific aisle would activate the video fragment that “moves along” that aisle. Hyper-stories increase the “realism” of the experience and stimulate some basic skills that are often impaired in subjects with NDD: the capability of “making choices” and the sense of “agency” (the subjective awareness of initiating, executing, and controlling one’s own volitional actions in the world).
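A hyper-story can be viewed as a directed graph whose nodes are video fragments and whose edges are the interactive “link” elements. The sketch below assumes that representation; the class names, the fragment identifiers, and the reachability check (useful to spot fragments that no path leads to) are hypothetical illustrations rather than part of the design described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VideoNode:
    video: str
    # link element id -> id of the fragment it leads to (the "portal")
    links: Dict[str, str] = field(default_factory=dict)


@dataclass
class HyperStory:
    nodes: Dict[str, VideoNode]
    start: str

    def next_video(self, current: str, chosen_link: str) -> str:
        """Fragment activated by selecting chosen_link while in current."""
        node = self.nodes[current]
        if chosen_link not in node.links:
            raise KeyError(f"{chosen_link!r} is not a link in {current!r}")
        return node.links[chosen_link]

    def is_end(self, current: str) -> bool:
        """A fragment without links ends the story."""
        return not self.nodes[current].links

    def reachable(self) -> Set[str]:
        """All fragments reachable from the start (depth-first traversal)."""
        seen: Set[str] = set()
        stack = [self.start]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(self.nodes[n].links.values())
        return seen
```

For example, an entrance fragment with one link per aisle sign, both aisles converging on a checkout fragment, reproduces the supermarket scenario.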

Caregiver’s Real-Time Monitoring and Control.

To monitor, support, and stimulate the user during the WISS experience, the therapist should be able to see what the person is currently watching in the head-mounted display and, in some cases, to control the video execution (e.g., pausing the video because a person needs more time to explore the space or to use facilitators, or restarting the video when the subject is unable to proceed after a pause point).

Personalization.

Each person with NDD is unique, and the value of any interactive technology for this target group is directly related to its ability to meet the specific characteristics and needs of each individual [14]. Hence the experts’ final requirement concerned a tool enabling them to personalize stories for each patient.

3.4 Technological Tools and Workshop 4

In the four months between workshops 3 and 4, an intense development activity took place, involving the implementation of a software platform for WISS creation, personalization, and execution, and the construction of a tele-operated mobile robot for video shooting. Workshop 4 then focused on hands-on testing of the authoring tool and on the evaluation and discussion of the overall results of the project.

Software Platform.

The software platform, called XOOM [6], is web-based and integrates two main components: Creator and Runtime Controller. The Creator is an authoring tool for Wearable Immersive Social Stories. It enables therapists to create a new WISS starting from a set of 360° videos, and to personalize an existing WISS (Fig. 4). The authoring functionalities include: allocation of videos on a timeline; definition of visual elements (facilitators and distractors) in terms of graphic and interactive properties and of their positioning on the videos; and definition of pause points. The Runtime Controller manages the execution of, and the interaction with, fully featured Wearable Immersive Social Stories on smartphones. This component also enables the visualization of the running WISS on an external PC and the manual control of its flow (launching a WISS, pausing/starting a video, launching a different WISS).
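The manual flow control offered by the Runtime Controller (launching a WISS, pausing/starting a video, switching to a different WISS) can be illustrated with a small command handler. This is a hedged sketch: the JSON message format, the action names `launch`, `pause` and `resume`, and the `PlayerState` type are hypothetical and do not describe XOOM’s actual protocol or implementation, which are not detailed here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerState:
    story: Optional[str] = None  # currently loaded WISS, if any
    playing: bool = False


def handle_command(state: PlayerState, message: str) -> PlayerState:
    """Apply one caregiver command, received as a JSON string, to the player state.
    Supported (hypothetical) actions: launch, pause, resume."""
    cmd = json.loads(message)
    action = cmd.get("action")
    if action == "launch":
        # launching a WISS (or switching to a different one) makes it current and plays it
        return PlayerState(story=cmd["story"], playing=True)
    if action == "pause" and state.story is not None:
        return PlayerState(story=state.story, playing=False)
    if action == "resume" and state.story is not None:
        return PlayerState(story=state.story, playing=True)
    # unknown actions, or pause/resume with no story loaded, leave the state unchanged
    return state
```

Returning a new state instead of mutating the old one makes it easy to log the sequence of caregiver interventions during a session.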

Fig. 4. The Creator component of the technological platform XOOM

The Robot.

Our simple remotely operated mobile robot, called Bob (Fig. 5), was created to shoot 360° videos of better quality than the ones recorded manually. Bob’s body is composed of a plastic pipe inserted in a hard plastic cone, with a “hat” on top of the pipe. The cone serves to stabilize movement and is mounted on a mobile base built on the commercial iRobot iCreate programmable platform. A Samsung Gear 360 camera is placed on the hat. The body height can be adjusted to specific needs (e.g., to record videos from a child’s or an adult’s perspective). A therapist and two members of our team used Bob at a local supermarket (outside opening hours) to record the videos for the WISS “shopping at the supermarket” (Fig. 5 – left), whose specifications had been defined in Workshop 3.

Fig. 5. Left: Bob video-recording outside and inside the supermarket; right: resulting WISS with a superimposed PCS card as facilitator

Activities During Workshop 4.

During the workshop, we used the videos recorded by Bob and the complete script of the “shopping at the supermarket” WISS as testbed materials for a final evaluation and brainstorming session. Each expert was asked to use XOOM to create a new WISS according to the “shopping at the supermarket” specifications. At the end of this work, the experts were asked to write down their opinions about the usability of the system, the benefits of Wearable Immersive Social Stories for specific target groups in the NDD spectrum, and the potential drawbacks of this instrument. We discussed these results in a final brainstorming session.

4 Discussion

A number of interesting themes emerged from the written answers and the subsequent discussion. Only 50% of the experts completed the creation of all features of the WISS in the required timeframe (20 min). Some of them skipped the most complex task: creating an interactive visual element to simulate “placing a box in the shopping cart” (by creating a pause point and defining a 3D interactive shape that disappears when pointed at and reappears on top of the shopping cart). Still, all experts expressed an overall positive judgment about the usability of the authoring tool and the control functionalities. All participants understood the structure and the different functionalities offered by XOOM, considered them intuitive, and used them with progressively increasing autonomy.

Concerning the WISS, all therapists and educators stated their intention to adopt the two Wearable Immersive Social Stories as a complement to their current learning practices for individuals with NDD. All experts confirmed the impression that had emerged since workshop 1, i.e., that they expect subjects with NDD to find it easy to navigate the 360° videos and to interact with the interactive elements. Still, they also pinpointed some potential drawbacks of this technological approach. Individuals with NDD may show initial resistance to wearing the HMD and exploring a WISS, as these persons generally feel a strong need for routine and tend to become distressed when a situation or a pattern of behavior changes. Some educators suggested preliminary familiarization activities without the viewer to mitigate the risk of resistance to using a WISS, e.g., wearing a hand-crafted cardboard mask like the ones used in some physical games and looking at the 360° videos on a regular PC screen. The experts also argued that a WISS, and WIVR technology in general, may not be suitable for patients who suffer from psychosis or hallucinations, since they already experience detachment from reality and immersion in a virtual environment could worsen their condition.

With the above caveats, the experts highlighted the motivational benefit that a WISS can have for persons with NDD, particularly children and young adults. Once the distress caused by novelty is overcome, a WISS offers a playful and enjoyable experience that can promote learning. It would be particularly useful for patients who need to develop autonomy, e.g., in taking the metro and other means of public transport alone, or in learning new street routes. The virtual experience can act as a preparation for reality, helping users direct their attention towards the relevant elements and teaching them to cope with unexpected events and distractions. The experts also pointed out that Wearable Immersive Social Stories should be offered at school as well, and in non-structured learning environments, e.g., at home or in the social contexts that are the subjects of the social story. Finally, the experts were convinced that the combination of Wearable Immersive Social Stories with a tool like XOOM has many advantages over traditional social stories. XOOM makes it possible to create and control visual stimuli inside a WISS in a way that is appropriate for each specific subject, and to make a story progressively more complex by adding new facilitators and distractors at each repetition. XOOM also offers a simple and smooth way to play the same story as many times as desired, or to change it when the subject is bored. It was hypothesized that subjects who suffer from anxiety would particularly benefit from the possibility of repeating the experience again and again in order to become familiar with it.

5 Conclusions

The main contribution of our work lies in the definition of the concept of Wearable Immersive Social Story for persons with NDD and in the presentation of a co-design process for these learning tools that can guide designers of WIVR experiences for this target group. The next step in our research agenda is to validate the considerations that emerged during the co-design process reported in this paper. Concerning XOOM, we will perform a more systematic usability study involving a larger number of NDD experts who have never experienced WIVR before. Concerning WISS, we have planned a long-term empirical study involving 40 subjects with NDD at the 2 centers participating in our research. The study is designed as a controlled study to investigate the learning benefits of Wearable Immersive Social Stories, including a comparison with more traditional forms of social stories.