1 Introduction

In the 1920s, Otto Neurath, among others, formulated the idea that one should use images to make an impression, and with Isotype introduced a graphic system intended to make data understandable to a broad, non-informed, and possibly non-educated audience through means of data visualization (cf. [13]). Ever since, VR and AR visualizations have been regarded as utilities that, by mixing and blending reality with overlaid annotations and information, create such clear, unambiguous views. Especially in industrial applications such as maintenance, production, or training, this is not only crucial but regarded as the key advantage and added value of AR and mixed reality technology. However, even with the third AR wave and the technology push from smartphones in 2007, HoloLens in 2015, Pokémon GO in 2016, and ARKit in 2017, AR apps and prototypes stick to rather simple superimpositions that present 3D data merely affixed to reality or, worse, eventually block or occlude it. By doing so, they showcase technological improvements instead of focusing on communication and mediation aspects or on the creation of a clear and consistent information context. In that sense, AR visuals are presented in reality but are often hardly, or not at all, connected to it.

With AR tracking technology in mind that is capable of tracking objects precisely (e.g. via vision-based techniques such as [17]), information can be positioned precisely and spatially at an object of interest, and the tracking can additionally be used e.g. for a discrepancy check. Consequently, this enables both: adding complementary information to such target objects (e.g. 2D annotations or 3D graphics), and employing these objects during visualization, e.g. by digitally highlighting parts of a machine.

Ideally, such techniques allow the user to focus on the content and on the perception of an augmented or extended reality. This requires a visual language and an idea of how to strategically use visual elements to convey information appropriately. In this paper we discuss such AR elements in the context of industrial use cases.

Through a bottom-up approach, we present a collection of elements that we have found useful throughout industrial AR projects, such as (a) communication of a novel product's main features, (b) task- and procedure-based maintenance support, and (c) spatially located and aligned expert information.

With a background of around ten years in the field, we still find a coherent system or classification missing that could guide the design phase on how AR is utilized to (ideally) convey the information of such industrial apps. We think such a system is needed in order to create a distinct information context while presenting visual overlays in either video- or optical-see-through setups. Given AR's ability to display information relevant to the context at hand, i.e. a situation or a task users perform, the underlying hypothesis is that AR eases a user's cognitive transfer performance through distinct visuals that guide the eye, and thereby the user. Hence, visual elements should ideally come in an appropriate visual form and, at best, supplement mediation approaches.

We discuss the state of AR visual elements and aim at answering which visual elements are available and what can be gained from them in terms of mediation and communicative goals, beyond the technological issues that have been addressed intensively in the literature so far.

2 Related Work

We have been looking into mediation and visualization techniques in AR with a bottom-up approach, using various publication sources, from research to public project documentation on the web, as well as our own experience and work. A sound collection can nowadays be found in [18], which focuses on visualization techniques on a technical level, detailing algorithmic and implementation aspects rather than application and use case. However, it identifies and describes the most relevant recent visual elements. Regarding the latter, at least the work of Xu [14] is relevant. The paper introduces and describes pre-patterns as a systematic attempt to cover visuals and interaction alike, but focuses primarily on AR games.

The work of [15] and our study around [16] have illustrated that there is a strong need for smart view management and an informed usage of visual elements for information presentation in AR, and that such presentations can benefit from camera- and motion-controlled interactions. Ideally, such techniques allow the user to focus on the content and on the perception of an augmented or extended reality.

A formalized framing, or more concretely standardization, also has a long history in this area. Many groups have tried to formalize information models and data structures, as well as the semantics in data modeling, for the AR-related usage of content. One of the first efforts was AREL, a proprietary declaration driven by the private company Metaio. Today, NIST [20] and AREA [21] are driving such efforts, especially in terms of industrial utilization.

3 Visualizations and Visual Elements in AR

In industrial cases, we think AR unfolds its potential especially in MRO scenarios (Maintenance, Repair, Overhaul), as it enables the superimposition of maintenance instructions and assisting information. Self-evidently, the usage of AR should be meaningful and should not increase the inherent complexity of maintenance tasks in industrial environments. Due to the information density, there is a necessity to account for users' perceptual limitations through adaptive visualizations, and this should be considered in the design process of AR applications. However, as Grasset et al. [15] summarize, various visualization techniques for AR have been proposed, but adaptive visualization techniques are still poorly studied in AR, although they are a common methodology in disciplines like geographic information systems and cartography. To some extent this transfer is considered on a technological level by some approaches, e.g. by Julier et al. [11], who proposed filtering methods to reduce information overload and visual clutter, or by the adaptive visual aids of Webel et al. [10], where the strength of information changes dynamically.

With such a goal in mind, we collected and identified different visual elements with a focus on mediation quality instead of technical feasibility. The aim is to describe, categorize, and organize visual elements in such a way that we are able to discuss their suitability for different tasks and mediation levels in industrial AR cases. With such a canvas, the idea is to derive a template or toolbox of common visual AR elements, together with a set of design criteria, to (visually) guide the user's focus and attention, e.g. by virtually highlighting elements of a physical object or through adaptive annotations.

3.1 Common Visual Elements

In AR on smartphones, tablets, or glasses like HoloLens, one can consider several layers that compose the overall perceptual impression for a user: the first is the real surroundings, perceived either via video see-through or optically; the second is the 3D world or AR space, where superimposed 3D renderings appear aligned to real objects; and the third is the 2D or semi-3D screen space, which typically holds traditional, non-AR UI elements that stick to the screen even if the device is moved around.
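To make this layering concrete, the following minimal sketch shows how a frame could be composed from the three layers; it is an illustration only, and all class and function names are invented rather than taken from any real AR SDK.

```python
# Illustrative sketch of the three perceptual layers; names are invented.

class WorldAnchoredElement:
    """Layer 2: rendered relative to a tracked pose (the AR space)."""
    def __init__(self, name, position):
        self.name, self.position = name, position

    def render(self, camera_pose):
        # a real renderer would transform self.position by camera_pose here
        print(f"AR space: {self.name} at {self.position}, pose={camera_pose}")

class ScreenSpaceElement:
    """Layer 3: sticks to the display regardless of device motion."""
    def __init__(self, name, pixel_xy):
        self.name, self.pixel_xy = name, pixel_xy

    def render(self):
        print(f"screen space: {self.name} at pixels {self.pixel_xy}")

def compose_frame(video_frame, ar_elements, ui_elements, camera_pose):
    print(f"background: {video_frame}")   # layer 1: the real surroundings
    for element in ar_elements:           # layer 2: pose-dependent overlays
        element.render(camera_pose)
    for widget in ui_elements:            # layer 3: pose-independent UI
        widget.render()

compose_frame("frame_0042",
              [WorldAnchoredElement("valve highlight", (0.1, 0.0, 1.2))],
              [ScreenSpaceElement("menu button", (32, 32))],
              camera_pose="identity")
```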

3.2 Annotations and Labels

Appearances: By definition, annotations are spatially dependent components, which are anchored to an object and have to be registered to the real world, or spatially independent components not contained in the real world [12]. While these could be any graphical element, with usages ranging from navigation in AR up to maintenance information, we mainly see textual labels used frequently. Here, three major forms appear useful: (a) icon-like, billboarded pointers that are directly centered at the anchor position; (b) twin elements of (2D or 3D) labels with leader lines that connect the label to the anchor, positioned relative to the object in the 3D space; and (c) such labels with a fixed position in the 2D screen space. Either way, the leader line's anchor point will move and appear sticky to its origin at the object, thereby maintaining a visual connection between the two. Sometimes, pointer annotations also behave like targets that trigger events or reveal more details once activated.
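As a sketch of the lead-line mechanics, the snippet below projects a 3D anchor into screen space with a simple pinhole model and connects it to a label pinned in 2D; the camera intrinsics and positions are invented for illustration.

```python
import numpy as np

def project_to_screen(point_cam, fx=800.0, fy=800.0, cx=640.0, cy=360.0):
    """Pinhole projection of a point given in camera coordinates (meters)."""
    x, y, z = point_cam
    return np.array([fx * x / z + cx, fy * y / z + cy])

anchor_cam = np.array([0.05, -0.02, 1.5])   # tracked part, 1.5 m in front
label_px = np.array([1100.0, 80.0])         # label pinned near a screen corner

anchor_px = project_to_screen(anchor_cam)
print("leader line from label", label_px, "to anchor", anchor_px)
# per frame, only anchor_px moves with the tracked object, so the label stays
# readable while the leader line preserves the visual connection to its origin
```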

Usage: On a mediation level, we tend to treat annotations and labels as elements that extend the real world, adding contextually useful information without the need for this information to otherwise fit the augmented object in visual terms (e.g. in contrast to a visually highlighted part of a target object). Such elements are nowadays heavily used and popular in AR browsers like Layar, Wikitude, or YELP Monocle. They act as visual proxies, with either a descriptive labeling role or a link to detailed information, for instance linking a Wikipedia entry to a visually recognized book or sight, where the link is presented as an AR icon.

3.3 Highlights

Appearances: Being able to precisely track objects with edge tracking, by means of a target object's 3D model, makes it possible to visually highlight parts or entire objects in a highly fitting and convincing way through the use of their shape. Highlights can also be animated, e.g. smoothly fading from none to the highlight color while staying slightly transparent, so as not to occlude the real part entirely with the rendering. When the geometry or shape of an element is not available, pointers or other proxy geometry can be used as indicating or highlighting elements instead.
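The fade-in behaviour can be sketched as a simple easing of the highlight's alpha toward a semi-transparent maximum, so the real part always remains visible underneath; the timing constants below are arbitrary.

```python
import math

MAX_ALPHA = 0.55      # keep the real part visible underneath the highlight
FADE_SECONDS = 0.8

def highlight_alpha(t):
    """Smooth fade via a half-cosine ease; t is seconds since activation."""
    if t >= FADE_SECONDS:
        return MAX_ALPHA
    return MAX_ALPHA * 0.5 * (1.0 - math.cos(math.pi * t / FADE_SECONDS))

for t in (0.0, 0.2, 0.4, 0.8):
    print(f"t={t:.1f}s alpha={highlight_alpha(t):.2f}")
```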

Usage: While such proxies are technically close to annotations, on a mediation level they serve a different purpose. Instead of extending reality, they rather emphasize it, just like a text marker would. Geometrically fitting highlights make parts or objects stand out and catch attention, an effect that is even enhanced when they are animated. Highlights not only lead attention; they can also be used as a selection indication in AR selection techniques, where real physical objects become digitally selectable through the AR view.

3.4 Assisting Visual Aids: Helper and Guiding-Geometry

Appearances: In this category we summarize supplemental visual elements, like arrows and other pointers, guideline elements, or metaphorical indicators such as Torch-Light or Magic Lens effects (cf. [5]), that are basically composed of 2D or 3D sprites or geometry and that refer to, or are anchored at, particular points of interest on the target object. Like highlights, they can be animated too. Such elements can either appear in the AR space only, e.g. fixed to their anchor position, or, like the leader line of annotations and labels, be connected to the screen space in some way (e.g. as in [19]).

Usage: Again, such elements are close to annotations in technical terms; the latter, however, basically serve as markers, links, and containers for textual content within the AR space, whereas visual aids communicate stronger meaning through their own appearance. In contrast to highlights, visual aids do not merely emphasize elements on a target object or points of interest in the view. Instead, their shape and motion (when animated) themselves serve a communication goal, for instance to clarify a screwing direction, or to draw caution and attention to details during assemblies that users would otherwise miss. Magic Lenses are a special case, as they typically come with a distinct visible bounding box that visually sets the superimposition apart from the rest of the scene. In doing so, such elements draw attention but are also one way to cope with visual clutter, which is considered a notorious problem of information display in AR (cf. [18]).
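As an illustration of the bounded Magic Lens region, the following sketch composites overlay content only inside a screen-space rectangle, here on plain numpy images; a GPU implementation would typically use a scissor test or stencil buffer instead.

```python
import numpy as np

def apply_magic_lens(background, overlay, lens_rect, alpha=0.7):
    """Blend overlay into background only inside lens_rect = (x, y, w, h)."""
    out = background.copy()
    x, y, w, h = lens_rect
    roi_bg = out[y:y + h, x:x + w].astype(float)
    roi_ov = overlay[y:y + h, x:x + w].astype(float)
    # everything outside the lens stays untouched reality; inside, the
    # superimposition appears, visibly framed by the lens boundary
    out[y:y + h, x:x + w] = (alpha * roi_ov + (1 - alpha) * roi_bg).astype(np.uint8)
    return out

bg = np.zeros((720, 1280, 3), np.uint8)        # stand-in camera frame
ov = np.full((720, 1280, 3), 200, np.uint8)    # stand-in rendered overlay
framed = apply_magic_lens(bg, ov, lens_rect=(400, 200, 480, 320))
```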

3.5 Additive Elements: XRay

Appearances: XRay visuals uncover hidden, occluded, or otherwise imperceptible parts or structures, e.g. a car's engine underneath a closed hood. The illusion is created by artificially removing occluding parts from real world objects, as if looking through solid matter. The literature describes such visualizations as composed of phantom or occlusion geometry, or of a ghosting render technique, which occupy, clip, or occlude the real objects in the areas where the "look-behind-the-curtain" effect should happen, while keeping or introducing additional geometry as depth cues in order to support depth perception. Especially the ghosting technique, with shape outlines or transparent outer hulls, aims to reveal enough of the XRay objects underneath (cf. [6]) to be comprehensible, while preserving major structural elements of the occluder, i.e. the physical object, in order to retain valuable depth cues for understanding the presented AR view.
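A minimal sketch of the ghosting composition, assuming the occluder region and its edges are already available as masks (in practice they would come from the phantom geometry): the hidden object is blended into the occluded region while the occluder's outline pixels stay untouched as depth cues.

```python
import numpy as np

def ghost_composite(camera, hidden, occluder_mask, edge_mask, ghost_alpha=0.6):
    """camera/hidden: HxWx3 uint8 images; masks: HxW bool arrays."""
    out = camera.astype(float)
    hid = hidden.astype(float)
    m = occluder_mask & ~edge_mask        # reveal region, minus the edges
    out[m] = ghost_alpha * hid[m] + (1 - ghost_alpha) * out[m]
    # edge_mask pixels stay untouched: the occluder's outline survives as a
    # structural cue, which keeps the depth ordering of the view readable
    return out.astype(np.uint8)

h, w = 480, 640
cam = np.full((h, w, 3), 90, np.uint8)         # stand-in camera image
hid = np.full((h, w, 3), 220, np.uint8)        # stand-in hidden-object render
occ = np.zeros((h, w), bool); occ[120:360, 160:480] = True
edges = np.zeros((h, w), bool); edges[120:360, 160:164] = True  # toy edge strip
result = ghost_composite(cam, hid, occ, edges)
```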

Usage: On a mediation level, we can think of such add-ins as an enrichment of reality, as they reveal spatial and semantic relationships between hidden and visible objects. In that sense, they are not only a sole extension, but create meaning immediately within the viewing context in AR. Examples are the revelation of pipes in the ground or of a cable tree inside a car, which through superimposition communicates its meticulous, network-like character while visually helping to find and identify e.g. connected control units during a diagnosis phase (rendering the prior physical dismantling unnecessary).

3.6 Additive Elements: Explosion Diagrams

Appearances: AR explosion diagrams are a quite common visual element in industrial AR. Originating in technical illustration, such elements are used to present object assemblies at their real physical counterparts in such a way that it becomes possible to memorize and mentally reassemble the object, and to show relations, assembly steps, and the structural layering of parts. In the literature (cf. [18]), these elements are often addressed as spatial manipulation.

Usage: The goal is to stimulate the user's cognitive process of creating a mental model of complex objects (cf. [10]). Like the former elements, such visuals can be seen as an additional layer in which a virtual representation enriches the current scene. The challenge, however, is to ensure that the virtual element coherently co-exists with the real one, i.e. does not generate clutter, ambiguity, or mutual occlusion between virtual and real that hinders the mediation strength. Connected to interaction, explosions not only convey structural assembly hierarchies; they can also be treated as extending, linking elements: e.g. in the case of a car's brake system, each part could be connected to a replacement part catalogue.
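The geometric core of such a diagram can be sketched in a few lines: each part is translated along its direction from the assembly centroid, scaled by an explosion factor that can be animated between assembled and fully exploded. The part data below is invented (a toy brake assembly).

```python
import numpy as np

parts = {
    "disc":    np.array([0.00, 0.00, 0.00]),
    "caliper": np.array([0.06, 0.02, 0.00]),
    "pads":    np.array([0.04, 0.00, 0.01]),
}

def exploded_positions(parts, factor, spread=0.25):
    """factor = 0.0 (assembled) .. 1.0 (fully exploded)."""
    centroid = np.mean(list(parts.values()), axis=0)
    out = {}
    for name, pos in parts.items():
        direction = pos - centroid
        norm = np.linalg.norm(direction)
        if norm < 1e-9:                   # part at the centroid: push along +z
            direction, norm = np.array([0.0, 0.0, 1.0]), 1.0
        out[name] = pos + spread * factor * direction / norm
    return out

for f in (0.0, 0.5, 1.0):
    print(f, {k: v.round(3).tolist() for k, v in exploded_positions(parts, f).items()})
```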

3.7 Trans-Media Material

Appearances: As a category of its own, we can think of trans-media material of any form, from 2D or 3D sprites to video footage, which is superimposed into the user's view while still aligned to the target object and viewing context. For instance, historical 2D photo material or the virtual recoloring of historical art pieces and statues belong in this category. Depending on the material used, not all media per se meet AR requirements: while 3D models suit rather well, because they can be augmented and viewed perspective- and view-independently, 2D and 2.5D images or videos might not. However, through the use of a ghost-like, billboarded presentation, such media can very well be integrated into the AR view, as it allows them to stay sticky relative to the target object or within the surroundings, while not appearing odd or misplaced once users move the device.
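The billboarding itself reduces to re-orienting the media quad toward the camera every frame while keeping its anchor position fixed, as in this small vector-math sketch with illustrative values.

```python
import numpy as np

def billboard_rotation(quad_pos, camera_pos, world_up=np.array([0.0, 1.0, 0.0])):
    """Return a 3x3 rotation whose columns are the quad's right/up/normal axes."""
    normal = camera_pos - quad_pos          # quad normal points at the camera
    normal /= np.linalg.norm(normal)
    right = np.cross(world_up, normal)
    right /= np.linalg.norm(right)
    up = np.cross(normal, right)            # re-orthogonalized up vector
    return np.stack([right, up, normal], axis=1)

# quad anchored in the scene; camera moves, quad keeps facing the user
R = billboard_rotation(np.array([0.0, 1.2, 4.0]), np.array([0.5, 1.6, 0.0]))
print(R.round(3))
```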

Usage: In AR terms, and in those of our understanding, such visuals are again an enrichment of reality. But in contrast to the prior visual elements, they serve the wish for creating highly immersive imagery, without necessarily adding many more mediation or communicative aspects. For instance, in the area of heritage and tourism, the restitution of a ruined temple would belong to this classification. Ghost-like presentations typically do not block or occlude real world elements, as they are rendered transparently; in doing so, they usually also come with an appealing aesthetic that, metaphorically speaking, gives augmented objects a digital aura. Such visuals work as a whole: they do not necessarily proxy or encode information in a visualization manner.

4 Discussion

4.1 Debriefing the Visual Elements

Summing up the common elements already reveals that some have become such a commodity in their usage that we can consider them quasi-standard elements, which are both well elaborated and useful.

This especially concerns labels and annotations, as they are among the most important AR elements in industrial applications. We think this is mainly because they are easy to generate in technical terms. However, they are also among the most challenging elements, because their misuse can easily produce clutter and ambiguity instead of being of assistance.

Next to annotations, highlights (Sect. 3.3) and what we refer to as assisting visual aids (Sect. 3.4) belong to the category of elaborated visual elements, too. Especially contour-bound highlights allow objects to stand out visually while being appealing and unobtrusive at the same time (Fig. 2b+d). However, if the highlighted elements are too small, they can easily be overlooked or not recognized. Here, visual aids that complement or act as highlights are of great help, as they can overcome such shortcomings (Fig. 2a).

Other categories, like XRay or explosion diagrams, are less clear to grasp. In terms of their composition, they are comprised of several media elements to such an extent that not single elements, but rather the whole picture, serves a communicative purpose; be it more or less only to give a whole picture, as in the case of virtual restitution. However, we think such visuals only add value if they align with and integrate into the real view well enough, at least to some degree. For instance, maintaining depth perception in XRay blendings is essential in order to frame and comprehend the AR view. Using appropriate occlusion geometry (cf. Fig. 4c) or other visual cues, such as ghostings of the outer hull of a target object, makes it possible to deliver a coherent AR presentation.

In terms of trans-media material – which provides the visually richest and, if well done, most appealing superimposition elements – interaction can enhance the experience beyond sole presentation. For instance, with Magic Lenses the superimposed content is not visible all at once, thereby encouraging the user to move the device and explore the overlaid content, which becomes incrementally visible by doing so (cf. Fig. 1d, which presents an XRay of a rocket's interior in a Magic Lens). However, the better the visuals are fused and integrated into the real scene, the more credible and convincing the AR impression and immersion will become (cf. Fig. 5, where the shape in (a) is almost seamlessly blended, while the video in (d) is merely pinned to a building's blank wall). In order to connect information precisely and clearly to objects, such integration is a necessity (Fig. 3). We think this is also true for the type of AR apps common today, where, with SLAM and visual odometry – as used by ARKit and HoloLens – media is superimposed and placed into reality like holograms: they rely on the AR illusion and appear to exist in reality, but in contrast to the above they need not be further related or connected to objects in it.

Fig. 1. Examples of annotation and label types (left to right): (a) situated icons linking to online references of the sights [23], (b) pointer indicators as hints to further information, (c) lead-lined labels with contextual expert information, (d) Magic Lens and lead-lined labels, which in contrast to the prior are sticky in the right-hand screen space.

Fig. 2. Range of highlighting elements (left to right): (a) semi-transparent, circular indicator enclosing the point of interest, (b+c) shape- and geometry-based yellow highlights make target objects protrude [22], (d) highlight as selection indicator: similar to the prior, the object's contour is used as highlight boundary, here interactively triggered at the target object through a gaze technique.

Fig. 3. Types of helper and guidance geometry (left to right): (a+b) circular arrow-like overlays communicating working directions on a machine. (c) Superimposed guidelines on a see-through device as a leading tool during assemblies [19]. (d) Red sickle-shaped element indicating the distance control's preceding target in a head-up display [25].

Fig. 4. Range of XRay, explode, and ghost-like visuals (left to right): (a) revealing sub-surface infrastructure with XRay visuals, and (b) with ghostings [24]. (c) XRay with semi-transparent cover structure of the target object's shell. (d) Interactive explosion diagram revealing a braking system's topology.

Fig. 5. Examples of trans-media as visual overlays in AR. (a) One of the first AR-based restitutions of the ruins of the Acropolis in Archeoguide [1]. (b) 2D restitution of a Medusa relief and (c) a recoloring of a maiden statue in CHESS [2]. (d) Historical video overlay superimposed in-situ in the Berlin Wall App [26].

4.2 Mediation and Communication Goals

Looking into the literature around AR visuals and visualization techniques reveals that, first of all, developers and AR app designers need to decide what they want to do in the AR view and what they want to achieve with it. With the previously mentioned visual elements in mind, we identified three major aspects:

(a) an extension of reality: similar to visual search, where the AR device acts as a bridge between physical and virtual via a "point-and-show" or "show-and-tell" metaphor. The added value then lies in the ease of connecting digital information to the target objects, which would otherwise require users to search manually, e.g. online. Interaction with the device and the visuals helps to sort, filter, or ease access to information and to fuse it, utilizing annotations and labels as visual elements. Thereby, real objects are immediately linked with, or linked to, complementary digital information. This creates a quick and unobtrusive experience that nicely bridges the physical/digital gap.

(b) an emphasis of reality: by highlighting or otherwise emphasizing relevant parts. Doing so helps users identify and focus on important aspects. For example, during a step-by-step maintenance procedure, highlighting directly leads users to focus on the machine parts of interest in the current step, or indicates virtual selection or activity at the target object. Another aspect, which we see as relevant in this context but will not discuss in detail here, works the other way around: diminishing reality in order to emphasize an AR view and lead the eye. In video-see-through AR, this basically means manipulating and filtering the video; at its simplest, turning irrelevant information in the video-acquired reality gray while relevant information stays colored (see the sketch after this list).

(c) an enrichment of reality: where augmenting rich visuals allows users to immerse themselves in the AR view and creates a picture as a whole. In our sense, going stepwise back from photo-realistic depiction to explosion diagrams, ghost-like add-ins, or XRay visuals, such techniques are more than an extension or emphasis: they allow us to reveal the invisible, to restore, reconstruct, or mix real and virtual quite vividly, and to depict different temporal states of the target object or scenery – but in technical terms they also take more effort to create.
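To illustrate the diminishing technique mentioned under (b), the following sketch turns a video frame gray except for a region of interest around the relevant part; it uses OpenCV, and the hard-coded ROI stands in for one that a real system would derive from tracking.

```python
import cv2
import numpy as np

def diminish_except_roi(frame_bgr, roi):
    """roi = (x, y, w, h) in pixels; returns the filtered frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    out = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # gray, but 3 channels
    x, y, w, h = roi
    out[y:y + h, x:x + w] = frame_bgr[y:y + h, x:x + w]  # keep ROI in color
    return out

frame = np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8)  # stand-in
filtered = diminish_except_roi(frame, roi=(500, 250, 280, 220))
```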

In terms of industrial application, an additional aspect concerns the elements' usage during authoring. Here, elements that can easily be created automatically and deliver quickly encoded information are likely preferred over what we collected under trans-media elements. The latter typically rely on time-intensive manual creation work during the media production phase. Talking to industrial leads, automated authoring is seen as a key element during the design and authoring phase of industrial AR maintenance systems.

Another huge topic is whether it is possible to transfer the often dense traditional information into AR views at all – or, more precisely, to find metrics for the degree to which such information should be transferred into AR. On tablets and smartphones, we nowadays see many UI concepts that do not rely solely on AR presentations, but combine state-of-the-art 2D UI dialogues with AR visuals. While this appears appropriate at first, in research related to the CHESS project [2] we found that the more (2D) UI elements exist in screen space, the less attention is given to the AR space. In a more recent follow-up evaluation with 40 students, who used our demonstrator from [3], all participants stated that this effect already occurs even if only a small number of additional UI elements are visible: the moment people need to cope with or touch UI elements while holding the AR device up, they tend to immerse less in the AR view and to lose the impression of the media "existing" in reality.

Our identified visuals and categorizations are grounded in the core promise of mixed reality: to bridge the gap between real world objects and the digital information space about them. In that context, two central questions arise: are these elements working well? And to what extent do they suit as mediation elements? Especially for the latter, we see the question as closely connected to possible interactions (e.g. while annotations and labels might be less intriguing aesthetically compared to more compelling renderings, in terms of UX and interaction they are highly relevant because of the previously mentioned show-and-tell principle). Although we cannot cover interactivity in AR and with UI elements in more detail here, we think it is an essential element that goes hand in hand with AR visuals in terms of our discussion regarding mediation and communication adequacy.

5 Proposing a Consistent Model

5.1 The AR Visualization Cube

Identifying the taxonomy of the common visual elements as a holistic view has exposed that manifold concepts of visualization techniques have been investigated and proposed for augmented reality applications. Especially in complex maintenance scenarios, AR should assist users in accomplishing their tasks by only showing the currently relevant information; superimposing unfiltered information leads to ambiguities and disturbs the user's focus. The approaches in this paper and our work in this field have shown that there is a strong necessity to evolve the core visual elements toward adaptive behaviour. This includes a multifaceted consideration of adaptive visualization techniques, where the strength of superimposed information changes dynamically through means of interaction techniques (cf. [16]).

As mentioned before, we see very similar demands on visualizations in AR as on visualization techniques in geographic information systems and cartography. MacEachren [4] introduced a model called 'the cartography/visualization cube', which defines visualization in terms of map use, conceptualized along three dimensions (user, task, and interactivity). Synoptically, this model splits map usage into visualizing/exploring the unknown and communicating the known. This model is an appropriate way to describe visualizations for information mediation on maps, and we think that, due to its similarities (in terms of interactivity and modality), it is transferable to AR visualizations (Fig. 6).

Fig. 6. 'AR visualization cube', transferred from the visualization cube defined by MacEachren [4].

In analogy to MacEachren's 'cartography cube', our approach defines visualizations in terms of their usage in AR applications. In essence, the diagonal axis describes the effort that needs to be expended to transfer or generate knowledge for the respective visualization. This effort depends on the degree of informativeness (a scale between discovering the unknown and presenting the known) and on the level of interaction as a ratio of exploration. In other words, with less informativeness and a high degree of interaction, the information extraction needs to be carried out through a sophisticated process of exploration. In contrast, the communication of aggregated and pre-processed information significantly decreases the effort of knowledge transfer.
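One way to make this diagonal explicit – our own illustrative reading, not a formula given by MacEachren [4] – is to write the knowledge-transfer effort as growing with the required interaction and shrinking with the informativeness of the presentation:

```latex
% Illustrative sketch only: E = knowledge-transfer effort,
% I in [0,1] = informativeness (0 = discover the unknown, 1 = present the known),
% X in [0,1] = required interaction, with weights alpha, beta > 0.
E \;\propto\; \alpha\,(1 - I) + \beta\,X
```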

5.2 Applying the Cube

The cube's principle can be illustrated with the example of annotations. Extending objects with annotations and labels should ideally create added value and discreetly give the user a hint about additional context without distracting the user's attention. It should be noted, however, that displaying multiple, excessive, unfiltered visual elements (i.e. annotations or labels) at once leads to information overload and to ambiguities caused by occlusion or visual clutter. Furthermore, such elements tend to appear visually incoherent with regard to the target object they are connected to, and become illegible and hard to interact with (cf. [16, 18]). Considering our model, unfiltered visualizations therefore unveil less information, and a sophisticated exploration process is required to gain insights. This could not only interrupt the user's perception but, even worse, unintentionally lead to wrong conclusions. Looking beyond AR, this visual impairment is also well known in cartography, where it is solved by generalization methods that allow the simplification of map contents by deriving the information density in relation to the level of detail (LOD) [9]. Even though adaptive rendering has been poorly studied in AR [15], there are several approaches for filtering and ordering techniques [11]. Particularly for annotations, the effort of knowledge transfer can be decreased by spatial, knowledge-based, or combined filtering methods [7], as they make it possible to decompose the information density and keep the visual coherence by only showing the relevant information.

Fig. 7. Unfiltered [8] vs. proximity- and angle-sensitive clustered annotations.

With regard to e.g. adaptive annotations, this paradigm facilitates user-centric, spatial- and motion-driven interaction methods in AR. By connecting superimposed annotations and other content to motion-based interaction and to camera-acquired targets, the information density can be controlled intuitively by proximity. In a simplified scheme, this is illustrated in Fig. 7, where the level of detail of the presented information is influenced by the distance/proximity and the angle to the tracking target. By clarifying the view through meaningful filtering and thereby reducing the cognitive load on users, the motion-driven, user-controlled adaptive behaviour of the visuals supports the user's perception and improves mediation strength. We see that this effect and the trade-off between exploration and presentation also hold for the majority of the aforementioned visual elements (especially for indicative visualizations rather than complementary/supplementary ones like XRay, explosion diagrams, or trans-media material). Being applicable to each of the presented common visuals in AR, we think our conceptual 'AR visualization cube' is an appropriate and consistent model and a starting point to describe visualizations and their requirements with regard to Augmented Reality, as well as implications for interaction.
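A compact sketch of the proximity- and angle-sensitive behaviour of Fig. 7: the level of detail per annotation grows as the user moves closer and views the target more frontally. All thresholds are invented for illustration.

```python
import numpy as np

def annotation_lod(target_pos, target_normal, camera_pos,
                   near=0.5, far=3.0, max_angle_deg=60.0):
    """Return 0 (hidden), 1 (icon only), or 2 (icon + full label)."""
    view = camera_pos - target_pos
    dist = np.linalg.norm(view)
    angle = np.degrees(np.arccos(np.clip(
        np.dot(view / dist, target_normal), -1.0, 1.0)))
    if dist > far or angle > max_angle_deg:
        return 0        # too far away or too oblique: filter out entirely
    if dist > near:
        return 1        # mid range: cluster to a compact icon
    return 2            # close and frontal: show the full expert detail

lod = annotation_lod(np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0]),
                     camera_pos=np.array([0.1, 1.1, 1.2]))
print("LOD level:", lod)
```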

6 Conclusion and Future Work

In this paper we collected common visual elements in AR in terms of mediation and communication aspects, especially focusing on industrial use cases, with AR's core promise in mind: being able to bridge the gap between real world objects and the digital information space about them. Reflecting on today's common elements taken from public sources, the literature, and our own experience, we categorized and discussed their suitability regarding mediation and communicative goals instead of technical and implementation considerations. In doing so, we have introduced (a) three mediation principles, namely extend, emphasize, enrich, and (b) the 'AR visualization cube', a framing meta-model that on the one hand helps clarify the mediation strength of these elements and on the other enables reflection on their suitability on a more strategic level.

While we see high relevance in shifting attention from the technology to the application of AR visual elements as visualization-based aids in industrial usage and similar mediation-intensive cases, we also see the need for further investigation. Especially in terms of assessing the performance of AR visuals and their suitability regarding didactic and physiological models of cognition, a deeper formal investigation would contribute to the discussion (Webel et al. [10] started to do so, but focused on maintenance-based AR training).

As a result of this paper, we raise the question of a visual language, or at least a toolbox or framework of visual strategies, in terms of communication and mediation aspects, since we found appropriate guidelines missing. With our presented model and the idea behind it of linking the topic of AR visualization to other, more elaborated disciplines such as geo-visualization, we aim to contribute to the discussion, without claiming to be complete or to have considered all possibilities.