1 Introduction

Human–computer interaction (HCI) methods influence how physicians manipulate anatomy-related mental models during interventional radiology procedures. These procedures are complex and are performed in an environment where the medical staff have to operate numerous devices. This situation places high perceptual, physical, and cognitive demands on the interventional radiologist (IR). To make use of information originating from multiple sources, designers of complex systems must focus not only on providing the necessary information (information design) and technical system capabilities, but also on presenting that information in a way that facilitates situation awareness under dynamic operational constraints (Endsley 2000). Therefore, we argue that information design in the field of interventional radiology should be complemented by interaction design in order to develop better user interfaces. Interaction design is defined as “developing interactive products that are easy, effective, and enjoyable to use from the users’ perspective” (Preece et al. 2002) and takes into account both the physical and the cognitive processes of users (Hartson 2003).

Mental manipulation of spatial images is a key task in interventional radiology. These spatial images are related to human anatomy, and their manipulation is necessary to perform successful procedures. These processes reportedly require considerable mental effort, and their accuracy depends on the experience level of the IR (Keehner et al. 2004). The form of mental models and the elementary actions of their mental manipulation are hard to reveal, because they exist only in the mind and are not directly accessible to an outside observer (Staggers and Norcio 1993). It is also often difficult for people to externalize and verbalize mental models and processes. For these reasons, it is hard for an equipment designer to define what kind of information the IR needs in specific situations and how this information should be presented to support decision making.

The ultimate goal of this research is to obtain a general understanding of the cognitive and physical processes (e.g., eye–hand coordination, information gathering from hand position, and tissue resistance during needle insertion) of the IR in order to reveal fundamental and relevant information that is applicable for UI design. Furthermore, the aim is to understand their mental model properties and their mental model construction and manipulation processes. With this knowledge, it could be anticipated how doctors behave in certain situations, and user interfaces could be designed taking into account human factors, such as human abilities and skills, individual strategies, and experience levels.

This paper attempts to answer the following questions: (1) How is a mental model of anatomy constructed, and in what form is it stored in memory? (2) What are the major mental manipulation processes related to interventional radiology procedures? (3) How are the mental manipulation processes performed, and how do they relate to findings of vision science? (4) What are the consequences of mental manipulations of anatomy for designing HCI in the field of interventional radiology?

Mental models of anatomy and their manipulation were studied in multiple ways. Besides a study of the relevant literature, ethnographic studies (observations and interviews) were performed to analyze the current situation in the interventional suite. Furthermore, an interactive prototype was developed and applied in a user study in order to investigate the cognitive processes of IRs when confronted with a new user interface offering interaction techniques that differ from those in current medical practice.

1.1 Interventional radiology

Interventional radiology is a rapidly emerging field due to its positive effects on the health of the patient (Becker 2001; Ahmed et al. 2010). Compared to open surgery, interventional radiology procedures reduce infection risks and recovery time because of their minimally invasive nature. In open surgery, the body is accessed through large incisions that give a direct view of the inner structures of the patient. Interventional radiology procedures, in contrast, are planned, performed, and evaluated using artificial views provided by imaging techniques. Imaging modalities are either morphologic, depicting anatomy [e.g., computed tomography (CT), fluoroscopy, magnetic resonance imaging (MRI), X-ray, ultrasound (US)], or functional, depicting information on metabolism (e.g., SPECT, PET, fMRI). Nowadays, imaging equipment is connected to computer systems that provide control of images and the possibility to draw on them digitally, for example, in order to measure the size of a tumor or its distance to vital structures. More recently, image registration/fusion techniques have made it possible to combine two modalities in a single image in order to exploit complementary information (Giesel et al. 2009).

One type of interventional radiology procedure is the direct percutaneous approach, in which the IR inserts a needle into the body by puncturing the skin and navigating the needle to the target tissue or lesion in an approximately straight line under image guidance. Our investigation focuses on radiofrequency ablation (RFA) of abdominal tumors and on biopsies. In order to navigate the needle successfully, the IR needs to maintain spatial orientation and avoid disorientation. To do so, the IR needs to be able to correctly identify anatomical locations in the human body and to understand the spatial relationships of surrounding organs and tissues (Stüdeli 2009; Hugh 2002). In this spatial navigation, the IR can only rely on the imaging techniques available in the interventional suite.

1.2 Human–computer interaction

The problem of designing HCI in the field of interventional radiology is that there is only very limited literature about the cognitive processes of the IR. Most of the literature focuses on preoperative situations (van der Heyden et al. 2001) and rarely on intraoperative situations (Johnson et al. 2006). Due to the lack of appropriate information, it is hard to predict which aspects of intraoperative navigation require special attention. Moreover, the literature focuses on high-level decision-making processes, while our goal is to identify and interpret elementary cognitive operations that are part of the complex decision-making processes.

The way images are presented on the computer screen and the controls used to operate them are defined by HCI techniques (Carrol 1997; Karray et al. 2008). HCI involves all software and hardware elements that take part in the interaction between users and computers. Cognitive factors of HCI affect human behavior, and a high cognitive load may lead to human errors (Zhang et al. 2004). Interaction occurs at the user interface (UI), and designers of the UI are responsible for the overall usability of the system (Gould and Lewis 1985). HCI is inherently a multidisciplinary field. Studies are conducted in various fields including cognitive psychology, industrial design, and computer science. A major goal of UI design is to provide interfaces that conform to the cognitive processes of users, that is, to make interfaces usable (Janß et al. 2007). ISO 9241-11 (1998) defines usability as the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use. In this research, the user is the IR, whose major goal is to perform successful procedures with regard to treatment effectiveness and patient safety. Besides adapting interaction techniques to the cognitive processes of the user, the UI also has to fit into the medical workflow and take into account environmental constraints.

2 Mental models and mental manipulation

2.1 Mental models and visual imagery

The concept of mental models was introduced by Craik (1943) and nowadays is used in several disciplines, such as psychology, cognitive ergonomics, HCI, and industrial design (Doyle and Ford 1998). A mental model is generally defined as an internal representation of an external reality, such as a situation, system, or a device. In HCI, a mental model is defined as a set of beliefs about how a particular system works (Norman 1983). In cognitive science, mental models are studied with the final aim to understand human cognitive processes and to develop detailed theoretical explanations of mind and behavior (Johnson-Laird 1980). Industrial designers and HCI researchers study mental models with the goal to have a proper (but possibly incomplete) understanding of user behavior and the underlying cognitive processes in specified situations in order to develop easy-to-learn and easy-to-use products (Freudenthal 1999; Jih and Reeves 1992).

Mental models can be static, dynamic, or both. A static mental model is an internal image of what things, such as objects or environments, look like, while a dynamic mental model captures processes, such as working procedures or situations. Combined mental models have both static and dynamic features. For example, a static mental model is the mental map of the country where someone lives (Tversky 1993), a dynamic mental model is someone’s idea of how a calculator works (Halasz and Moran 1983), and a combined mental model resembles an internal movie depicting the visual elements of an event in a certain order, such as the collision of two cars (Thagard 2010).

Mental models are consciously or unconsciously used to predict situations or to make decisions in relation to product usage (Endsley 1995). The way mental models are constructed is influenced by certain user group characteristics, for example, age or experience (Demming 2004; Pattison and Stedmon 2006; Ziefle and Bay 2004). Although users may have some knowledge of a product before first using it, mental models become fully developed only during actual use. However, a number of characteristics seem to hinder people from using mental models in a straightforward manner. Norman found that mental models are often incomplete, limited in use, unstable, unscientific, and parsimonious (Norman 1983).

There is a debate about the similarities and differences between mental models and visual images (Schwartz 1996; Liu and Stasko 2010), as well as about the form of their internal representation (Pylyshyn 1973; Kosslyn and Pomerantz 1977; Knauff and Johnson-Laird 2002). Visual imagery deals with processes that are involved in generating, examining, and manipulating visual images in the mind. Visual images are defined as vision-related experiences, and relative to perception—similar to mental models—they are ill-defined and internal (Palmer 1999). Visual images are generated while perceiving a picture and are stored in visual memory after the optical source is no longer available. Concerning the form of visual images, the analog position argues that they are picturelike representations, while the propositional position proposes that visual images are actually languagelike descriptions that are complemented with more detail at retrieval. However, a hybrid theory of imagery also exists, which assumes that visual images contain both analog and propositional components (Farah and Hammond 1988).

Despite this debate, in this research the terms mental model and visual image are used to cover the same concept. In this paper, the term spatial mental model refers to human anatomy as perceived, interpreted, and mentally constructed by the IR based upon 2D radiology images and stored in memory for interventional purposes. This spatial mental model allows for different forms of mental manipulation, such as mentally observing it from different viewpoints or mentally positioning objects in it, for instance, a needle or a tumor. Furthermore, it also involves models of dynamic features of the human body, such as how organs move due to respiration or how temperature changes due to blood flow.

2.2 Mental manipulations

In the study reported in the following sections of this paper, IRs stated that they build a 3D model of the anatomy in their head by scrolling through 2D radiology images preoperatively. At certain points in the intervention, this spatial mental model has to be retrieved and manipulated. Findings of vision science show that complex visual routines (processes for extracting useful information from optical information) are constructed from a set of elemental operations and are stored in long-term visual procedural memory (Ullman 1984). Visual images are also stored in long-term memory; however, they have to be pulled into the visual buffer, a type of short-term memory, before they can be transformed (Kirby and Kosslyn 1990).

Mental image transformations are similar to real-picture transformations in a sense that more time is needed to compensate for larger rotation angles (mental rotation) or larger differences in size (mental resizing). Similar to the way the eye inspects physical pictures, the so-called mind’s eye inspects mental images (Pearson 2010). So the mind’s eye sees objects continuously rotating (Shepard and Metzler 1971) or growing/shrinking (Bundesen and Larsen 1975) with all of its intermediate steps. Interestingly, people seem to apply mental rotations instead of other transformations that require a quick change in orientation, such as mirroring that requires a 180° rotation, even after hundreds of trials (Cooper and Shepard 1973).

In interventional radiology procedures, mental registration of images is necessary. Mental registration is required to spatially align a mental image to an image on the computer screen, for example, in order to be able to compare real-time intraoperative US images visualized on a screen with a mental image created from a set of CT/MRI slices. Mental registration is done by applying a set of mental image transformations, mostly mental rotations (Zacks et al. 2002). 2D and 3D combination view studies also predict that the bigger the angle of the necessary mental rotation is, the more time is needed to reach the required orientation and the more mistakes are made (Tory 2003). It is worth mentioning that in everyday life tasks, the mental rotation strategy is combined with a pattern matching strategy, that is, unique features of one image are matched with features of the other image (Corballis et al. 2007). In interventional radiology, bones and blood vessels are frequently used as registration landmarks, as easily recognizable image features.
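The rotation angle referred to above can be made concrete with a small calculation. The following is a minimal sketch, not part of the study: it assumes that the orientation of the mental image and that of the on-screen image are each described by a 3×3 rotation matrix, and it computes the smallest rotation angle between them. The function names `rotation_z` and `rotation_angle_deg` are hypothetical helpers introduced only for this illustration.

```python
import math


def rotation_z(deg):
    """3x3 rotation matrix for a rotation of `deg` degrees about the z axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_angle_deg(r_from, r_to):
    """Smallest rotation angle (in degrees) that takes orientation r_from to r_to.

    Uses trace(R) = 1 + 2*cos(angle) for the relative rotation R = r_from^T r_to;
    the trace of r_from^T r_to equals the sum of element-wise products.
    """
    trace = sum(r_from[i][j] * r_to[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    # Hypothetical example: two views that differ by a rotation about one axis.
    print(rotation_angle_deg(rotation_z(30), rotation_z(75)))
```

Under the findings cited above, a larger value returned by such a calculation would correspond to a longer and more error-prone mental rotation; the sketch itself makes no claim about the size of that effect.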

Much research has been done to support the work of the radiologist with smart computer systems. Image registration algorithms have been developed to bring images into spatial alignment and to allow the combined display of data by overlaying two images and visualizing them as one anatomically consistent image. When two modalities are shown in one computer view, the mental registration that is currently performed by the IR is no longer required. However, computer-based image registration techniques currently have limitations (e.g., computational time, no compensation for organ deformation, the need for human interaction) that have to be overcome before they can be fully exploited in practice. Interested readers can consult (Shams et al. 2010; Maintz and Viergever 1997; Pluim et al. 2003; Zitová and Flusser 2003; Wyawahare et al. 2009; Hill et al. 2001) for more information about medical image registration techniques. Despite the promising computational algorithms for image registration, it is hard to anticipate that external visualizations would completely substitute internal visualizations (Khooshabeh and Hegarty 2010).

We expect that manipulation of mental models is more complex in interventional radiology as compared to laboratory tasks in psychological experiments. In interventional radiology, high-accuracy results are necessary, and dynamic features of the mental model—such as organ deformation due to breathing and tool manipulations—have to be considered. We also expect that the IR has to start the manipulation process with a spatial mental model of a body part in order to find a specific intersection of the volume, possibly through a set of transformations. In other words, a specific orthogonal or oblique 2D slice of a complex 3D image has to be located and interpreted. There are a couple of aids that provide information for the IR in this situation, such as the orientation of the patient, the angle of the US probe in the hand, and anatomical landmarks, for example, blood vessels located in the US image.

3 Materials and methods

3.1 Materials

3.1.1 The interventional suite and its equipment

The interventional suite consists of a control room and a patient room. These two rooms are typically divided by a lead–glass wall in order to protect medical staff from radiation. The rooms are largely occupied by equipment, so that little space remains for the medical staff to move. Interventional suites differ in their imaging equipment: some are equipped with a CT scanner, others with fluoroscopy or MRI.

An important decision during an RFA procedure is the selection of the needle. Typically, umbrella-shaped needles are able to ablate larger volumes, but at the same time, they are more difficult to use, because the ablation zone is less predictable. The straight needle is suitable only for smaller ablation volumes, but it is more straightforward to use, and the outcome is more predictable. It is also possible to apply three straight needles at the same time to increase the ablation volume, but in this case, the difficulty of mentally designing and executing the ablation zone is extremely high. The three presumably spherical ablation zones have to overlap in such a way that no cancer tissue remains in the combined ablation area.
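The overlap requirement for three spherical ablation zones can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not taken from the study: the tumor is modeled as a sphere, each ablation zone is a perfect sphere, and coverage is estimated by sampling points on a regular grid rather than computed exactly. The function name `covered_fraction` and all coordinates are hypothetical.

```python
import math


def covered_fraction(tumor_center, tumor_radius, zones, step=1.0):
    """Fraction of sampled tumor points that lie inside at least one ablation sphere.

    zones: list of (center, radius) tuples, one per needle.
    Points are sampled on a cubic grid with spacing `step` (same unit as the radii).
    """
    cx, cy, cz = tumor_center
    n = int(math.ceil(tumor_radius / step))
    inside = total = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                p = (cx + i * step, cy + j * step, cz + k * step)
                if math.dist(p, tumor_center) > tumor_radius:
                    continue  # grid point lies outside the tumor sphere
                total += 1
                if any(math.dist(p, c) <= r for c, r in zones):
                    inside += 1
    return inside / total


if __name__ == "__main__":
    # Hypothetical layout: three 8 mm zones around a 10 mm tumor.
    zones = [((-6, 0, 0), 8), ((6, 0, 0), 8), ((0, 6, 0), 8)]
    fraction = covered_fraction((0, 0, 0), 10, zones)
    print("complete coverage" if fraction == 1.0 else "tumor tissue would remain")
```

A result below 1.0 indicates that some sampled tumor points fall outside all three zones, which is the situation the IR has to rule out mentally when planning a multi-needle ablation.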

3.1.2 The prototype

The prototype includes a number of hardware and software elements. Visual Studio 2005 was used as the programming environment, and the code was written in C++. Coin3D was used for 3D graphics (www.coin3d.org), and Qt (http://qt.nokia.com/products) was applied for interface development. Coin3D and Qt were integrated using the SoQt libraries. SIMVoleon was applied for handling volumetric data. VRPN (http://www.cs.unc.edu/Research/vrpn/) serves as an interface between the application and Ascension’s Flock of Birds tracking device (http://www.ascension-tech.com/). A MATLAB (http://www.mathworks.com/) script was employed to create a volume, usable by SIMVoleon, out of a series of axial CT slices.
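The step of building a volume from a series of axial slices can be sketched as follows. This is not the MATLAB script used for the prototype, which is not reproduced in the paper; it is a minimal Python illustration that assumes each slice is given as a pair of its position along the body axis and a 2D array of intensities, and that slices are equally spaced. The function name `stack_axial_slices` is hypothetical, and the conversion to the file format expected by SIMVoleon is not covered.

```python
def stack_axial_slices(slices):
    """Build a volume indexed as volume[z][y][x] from axial slices.

    slices: list of (z_position_mm, image), where image is a list of rows.
    Returns (volume, slice_spacing_mm); spacing is None for a single slice.
    """
    ordered = sorted(slices, key=lambda s: s[0])  # order slices along the body axis
    rows, cols = len(ordered[0][1]), len(ordered[0][1][0])
    for _, image in ordered:
        if len(image) != rows or any(len(r) != cols for r in image):
            raise ValueError("all slices must share the same dimensions")
    zs = [z for z, _ in ordered]
    gaps = [b - a for a, b in zip(zs, zs[1:])]
    if any(abs(g - gaps[0]) > 1e-6 for g in gaps):
        raise ValueError("slice spacing is not uniform")
    volume = [image for _, image in ordered]
    return volume, (gaps[0] if gaps else None)


if __name__ == "__main__":
    # Hypothetical 2x2 slices given out of order.
    demo = [(5.0, [[1, 2], [3, 4]]), (0.0, [[5, 6], [7, 8]]), (2.5, [[9, 9], [9, 9]])]
    volume, spacing = stack_axial_slices(demo)
    print(len(volume), spacing)
```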

The main purpose of the prototype was to integrate interaction methods that are not available in current practice in the hospital where the study was performed. These methods were the display of a preoperatively planned needle trajectory in the views, real-time feedback about the current position of the needle, and arbitrary control of CT images while observing them from different viewpoints. Visual feedback was provided through a regular computer monitor. A foam model served as a phantom to allow for needle insertion. Interaction took place through two hand-held devices. A planar device was used for exploring the scanned CT volume and for checking the planned needle line. As the user moved the planar device in space, each of the views was updated accordingly. The planar form of the device represented the imaging plane, and a position and orientation sensor was attached to its center point to track spatial information. The other input device was an actual RFA needle equipped with a position and orientation sensor. As the needle moved, its motion was visualized on the computer screen. The volume in which the input devices moved was calibrated to allow participants to move comfortably and at the same time to provide a good viewing distance and angle to the monitor (Fig. 1).
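How a tracked planar device can select an arbitrary imaging plane in a CT volume can be sketched in a few lines. The following is a simplified illustration, not the prototype's C++ implementation: it assumes the tracker pose has already been converted to voxel coordinates, describes the plane by its center and two in-plane unit vectors, and uses nearest-neighbor sampling. The function name `sample_oblique_slice` and its parameters are hypothetical.

```python
def sample_oblique_slice(volume, origin, u_axis, v_axis, size, fill=0):
    """Resample a size x size image from volume[z][y][x] along an arbitrary plane.

    origin: (x, y, z) voxel coordinates of the plane center.
    u_axis, v_axis: in-plane unit vectors (x, y, z) for image columns and rows.
    Samples that fall outside the volume receive the value `fill`.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    half = size // 2
    image = []
    for r in range(size):
        row = []
        for c in range(size):
            du, dv = c - half, r - half  # offsets from the plane center
            x = round(origin[0] + du * u_axis[0] + dv * v_axis[0])
            y = round(origin[1] + du * u_axis[1] + dv * v_axis[1])
            z = round(origin[2] + du * u_axis[2] + dv * v_axis[2])
            inside = 0 <= x < width and 0 <= y < height and 0 <= z < depth
            row.append(volume[z][y][x] if inside else fill)
        image.append(row)
    return image


if __name__ == "__main__":
    # Hypothetical 3x3x3 volume whose voxel value encodes its own coordinates.
    vol = [[[100 * z + 10 * y + x for x in range(3)] for y in range(3)] for z in range(3)]
    # A plane spanned by the y and z axes through the volume center.
    for line in sample_oblique_slice(vol, (1, 1, 1), (0, 1, 0), (0, 0, 1), 3):
        print(line)
```

Calling such a function each time the tracker reports a new pose would update the displayed slice as the device moves; interpolation, voxel spacing, and tracker calibration are deliberately left out of this sketch.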

Fig. 1 The input devices

3.2 Methods

3.2.1 Ethnographic studies

Interventional radiology procedures were observed in the interventional suite of Erasmus Medical Center in Rotterdam, the Netherlands. Three liver, three kidney, and two bone RFAs, as well as three lung biopsies, were observed. Liver and kidney RFAs are the most commonly performed RFA procedures and are therefore highly relevant for clinical practice. During the procedures, the performing IR gave short explanations to the researcher, and the sequence of actions was recorded in written form. Retrospective interviews were conducted to check the correctness of the recordings and to gain a deeper insight into the procedure from the point of view of the physician. The recorded sequence of actions was transformed into a graphical workflow.

3.2.2 Task analysis

RFA was selected as a first case for task analysis. Of all RFA procedures, liver RFA is reportedly the most challenging with respect to mental model manipulations; therefore, special attention was given to this case. A task analysis was performed to study the environment, including the equipment, as well as the roles of the interventional staff. A detailed task analysis was centered on the IR’s role, breaking activities down into the most elementary decision-making subtasks. The task analysis included physical actions but, more importantly, also the cognitive processes involved. Our main interest was how medical images are utilized preoperatively, intraoperatively, and postoperatively. For this, the following steps were taken:

  • mental model–related literature was studied (Sect. 2)

  • a user study was performed with the help of an interactive hardware and software prototype (Sect. 3.2.3)

  • a task analysis was carried out based on clinical observations and retrospective interviews and based on the results of preceding research (Sect. 4.1)

  • working strategies and cognitive processes were identified based on the user study (Sect. 4.3)

  • quotes from the interviews and from the user study were put into categories to deduce mental manipulation tasks (Table 1)

  • mental manipulation processes of liver RFA were identified (Sect. 4.2)

  • results were compared to findings of vision science (Sect. 5.1).

Table 1 Quotes and inferred mental manipulation processes

3.2.3 Prototype-based testing

3.2.3.1 Participants and tasks

Five participants were involved in the prototype study: two experienced IRs and three residents. Each session took 40 min on average. Each participant had to perform two tasks. In the first task, the participant was shown a 3D CT dataset containing a target area marked with a sphere. The orientation of the patient was explained verbally. The participant was told that the diagnostic team had decided to perform a biopsy and that a colleague had proposed two trajectories for needle insertion to hit the target area. The task was to explore the dataset using the given input device in order to become oriented and familiar with the patient data, and then to compare the two trajectories and decide which one they would prefer for the needle insertion. The decision process and the final decision had to be communicated to the researcher in clinical terms, based on the participant’s knowledge and experience.

In the second task, the same CT dataset as in the first task was shown together with the trajectory the participant had chosen. The task was to align the needle with the planned trajectory and insert it along that trajectory toward the tumor. The patient was represented by a block of foam positioned in front of the participant. To warn the participating interventionists about the imperfections of the test equipment, they were told that (1) the foam model did not fully correspond to the patient in size and shape, so that when they touched the surface of the foam, the tip of the needle on the screen might not exactly touch the skin; (2) the input device was very sensitive to movements, and they were therefore advised not to make fast movements; (3) the foam was softer than the human body and allowed unrealistic needle movements, and they were therefore asked to approach the foam in the same way as they would approach a real patient; and (4) the needle bent easily and only the grip of the needle, not the tip, could be tracked (due to the large size of the sensor), so that the trajectory could change during insertion even if it was perfectly aligned beforehand. Participants were asked to ignore these effects and to focus only on the navigation.
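The consequence of tracking only the grip can be illustrated with a small calculation. The following is a minimal sketch, not the prototype's code: it assumes a perfectly rigid, straight needle, so that the tip position can be extrapolated from the tracked grip position, the tracked direction, and the needle length. It then reports how far the needle deviates from a planned trajectory. The function names and all numeric values are hypothetical, and any bending of the real needle would make the estimated tip position wrong.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def estimate_tip(grip_pos, direction, needle_length):
    """Extrapolate the tip position from the grip pose (rigid-needle assumption)."""
    n = _norm(direction)
    return tuple(g + needle_length * d / n for g, d in zip(grip_pos, direction))


def alignment_error(entry, target, grip_pos, direction, needle_length):
    """Return (angle in degrees between needle and planned trajectory,
    distance of the estimated tip from the planned line)."""
    plan = _sub(target, entry)
    cos_a = _dot(plan, direction) / (_norm(plan) * _norm(direction))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    rel = _sub(estimate_tip(grip_pos, direction, needle_length), entry)
    along = _dot(rel, plan) / _norm(plan)  # component along the planned line
    lateral = math.sqrt(max(0.0, _dot(rel, rel) - along * along))
    return angle, lateral


if __name__ == "__main__":
    # Hypothetical plan along the z axis; the needle is parallel but shifted sideways.
    print(alignment_error((0, 0, 0), (0, 0, 100), (3, 4, -50), (0, 0, 1), 150))
```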

3.2.3.2 Views

The user was provided with six different windows, which were categorized into two groups, namely traditional and non-traditional views. In each window, a certain set of planes could be accessed. The traditional views were the transverse (axial) (A), sagittal (B), and coronal (C) planes. This was similar to what radiologists use now: they scroll through these standard planes to assess the medical situation. The prototype enabled scrolling through the planes using the planar input device instead of the mouse, which is currently used in general practice. Interpretation of the traditional imaging planes is part of the daily routine of IRs (Fig. 2).

Fig. 2 Basics of anatomical orientation

The non-traditional views were the volumetric view, the oblique view, and the needle-line view. The volumetric view (D) showed the scanned area of the body as a volume. This was a block constructed from the series of scanned transverse slices. This window was designed to facilitate the mental task of orientation in the dataset and did not involve segmentation of organs and tissues. The oblique view had two functionalities. Based on user selection, it showed either the current oblique plane (E′) that was selected by the planar input device, with the needle in it, or the needle-dot view (E′′) (Fig. 3), that is, the slices perpendicular to the needle. The needle-line view (F) (Fig. 4) showed all planes that contained the needle line and was controlled by the planar input device. The planned trajectory and the needle were shown in all views of the interface, which was intended to help the user orient the needle. The complete user interface is presented in Fig. 5.

Fig. 3 Two parallel needle-dot views

Fig. 4 Two intersecting needle-line views

Fig. 5 The user interface

The well-known orientation cues [left (L), right (R), anterior (A), posterior (P), superior (S), and inferior (I)] were added to the transverse, sagittal, and coronal views. They were expected to aid quick and easy orientation. An active reference frame with transverse, sagittal, and coronal axes was added to the needle and needle plane views to support spatial orientation in the oblique slices, which move in real time.

3.2.3.3 Assumptions

Regarding the views, the following assumptions were made. It was assumed that the needle-dot view is useful for checking all oblique slices perpendicular to the needle in order to see where the needle intersects the body and to decide whether the trajectory is a good one. The idea behind it is to look from the point of view of the needle, from the skin of the patient to the tip of the needle (or the other way around), slice by slice, with a close view of the needle trajectory and its immediate surroundings. Similarly, it was assumed that the needle-line view clearly shows the route of the needle by keeping the entire needle trajectory in view, which can then be inspected from different angles. These are two alternative ways of assessing critical tissues and trajectories. Finally, it was expected that the volumetric view helps the user to interpret the orientation of oblique slices and the orientation of the needle in relation to patient orientation.

4 Results

4.1 Task analysis of the RFA procedure

The work of the IR is commonly separated into three phases: preoperative, intraoperative, and postoperative. Stüdeli et al. (2008) further divided the intraoperative tasks for percutaneous needle placements and identified the related subtasks and user interface design requirements as well. They distinguished four major tasks of the RFA procedure: search, plan, act, and check. A model for surgical navigation was constructed by Stüdeli (2008), building on the well-established theory of quality control and applying a plan-do-check-act control loop. Freudenthal and Pattynama (2007) drew upon Rasmussen’s theory of cognitive control of behavior (Rasmussen 1987) and applied knowledge-, rule-, and skill-based behavior to several surgical situations by providing examples. Rasmussen’s abstraction hierarchy theory (Rasmussen and Lind 1981) was used to structure initial findings. The study by Meijs et al. (2008) with IRs resulted in a detailed analysis of several procedures and characteristics of the environment, as well as proposals for future technological solutions. Jalote-Parmar et al. (2007) provided a high-level overview of decision-making processes focusing on the RFA procedure and organized the workflow into a matrix form including several aspects, such as goals, constraints, critical factors, and teamwork, among other issues.

The above-listed investigations and the study presented in this paper together provided the basis for the following description of the RFA procedure.

The RFA procedure can be divided into four main phases:

  1. planning the approach: the IR decides how to reach the tumor and which ablation needle to use,

  2. needle insertion: IR navigates the needle from skin to target under US and/or CT guidance,

  3. tumor ablation: cancer tissues are destroyed, and

  4. checking ablation: the IR decides whether the outcome is according to expectations: redo or finish procedure

The disease is diagnosed, and the procedure is planned using CT, MRI, or US images or combinations of these. The difficulty of the intervention depends on many factors, for example, the number, size, and shape of the tumors, or the location of the tumor in the liver and its proximity to vital structures, such as vessels or other organs. Before the procedure, a trajectory is selected that ends in the middle of the tumor and avoids bony and vital structures while respecting a safety zone. If the tumor is visible on the US image, the needle is navigated to the target using real-time US. If it is not visible, the IR applies an intermittent CT technique. In this case, the needle is gradually directed toward the tumor, and its position is checked with CT. Usually, it takes several trials to find a good trajectory. The needle is inserted into the tumor, and cancer cells are necrotized by connecting the needle to the ablation machine and applying a specific ablation time and temperature setting. The outcome of the ablation is checked by making a new CT scan and comparing it to the preoperative CT/MRI. If the doctor concludes that the size of the resulting ablation zone is not according to plan, or that tumor tissue may remain, the procedure has to be repeated.
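The trajectory criterion described above, a straight path that keeps a safety zone around vital structures, can be expressed as a simple geometric check. The following is a minimal sketch under assumptions that are not taken from the paper: the trajectory is a straight line segment from skin entry to target, each structure to be avoided is represented by a set of sample points, and the safety zone is a single margin in millimeters. The function names and the numbers are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    ab = tuple(y - x for x, y in zip(a, b))
    ap = tuple(y - x for x, y in zip(a, p))
    denom = sum(c * c for c in ab)
    if denom == 0:
        t = 0.0  # degenerate segment: entry and target coincide
    else:
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = tuple(x + t * c for x, c in zip(a, ab))
    return math.dist(p, closest)


def trajectory_is_safe(entry, target, structure_points, margin_mm):
    """True if every structure point keeps at least margin_mm from the trajectory."""
    return all(point_segment_distance(p, entry, target) >= margin_mm
               for p in structure_points)


if __name__ == "__main__":
    # Hypothetical vessel sample points near a trajectory along the z axis.
    vessel = [(6.0, 0.0, 50.0), (8.0, 2.0, 60.0)]
    print(trajectory_is_safe((0, 0, 0), (0, 0, 100), vessel, 5.0))
    print(trajectory_is_safe((0, 0, 0), (0, 0, 100), vessel, 10.0))
```

In practice, the IR performs this assessment visually and mentally; the sketch only shows what a computer-assisted version of the same check might compute.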

Based on our ethnographic studies, we also identified elementary decision moments in the intraoperative phase of the procedure related to the usage of different imaging modalities and needle navigation. The detailed workflow and the decision moments are presented in graphical form in Fig. 6. The ethnographic studies, combined with the prototype-based study, were the means to identify the major cognitive tasks during the procedure. The identified cognitive tasks and quotes from the participating doctors are listed in Table 1, and the following section reports on mental model manipulations in detail.

Fig. 6 RFA workflow

4.2 Mental models and their manipulation during RFA

Creation of mental models of anatomy and their manipulation are important tasks in interventional radiology. These spatial mental models are created based on two-dimensional radiology images, and their manipulation is often required to perform successful procedures.

Manipulation of mental models is presented through the example of RFA of liver tumors. In this procedure, the IR navigates a needle by puncturing the skin to the target tumor under CT and/or US guidance, and then, electrical current is introduced through the needle to destroy cancer cells. Radiology images are complemented with mental models in order to acquire necessary information. The following list describes the main activities related to mental models of anatomy in the intraoperative phase of liver RFA.

  • Data to build a spatial mental model of patient-specific anatomy are acquired by scrolling through 2D axial preoperative images.

  • A safe trajectory is selected by (digitally) drawing on one of the orthogonal images and/or mentally placing a trajectory in the spatial mental model. The center of the tumor is defined by the axial image that shows the largest diameter of the lesion.

  • The tumor is either visible in the US image or not, depending mostly on the patient’s tissue characteristics. If the tumor can be located, the needle trajectory can be defined intraoperatively using a standard planning line in US that has a fixed angle to the hand-held transducer: in this case, the IR has real-time feedback about the location of the needle in the process of moving it toward the tumor. The IR compares the intraoperative US image to the preoperative CT or MRI, in other words, mentally registers the US image from the computer screen to a 2D slice of the spatial mental model.

  • It is also possible that the tumor is not visible with US, but a good reference plane can be found that is comparable to a CT image and can be used for needle guidance. If the IR is sure that the intraoperative US image is the same as (or close to) a previously observed CT image, the tumor can be mentally positioned on the US image as it was seen in the CT image with high accuracy. In this case, the IR extends the real-time US image with the mental model of anatomy in order to locate the tumor. This process requires mental registration of the US image to the spatial mental model.

  • Comparing intraoperative and preoperative images becomes more complex when the preferred plane is oblique and deviates significantly from the orthogonal planes. The IR was observed to go back and forth between the intervention room and the control room several times to check the preoperative images and refresh the spatial mental model.

  • When US does not provide useful information, the IR has to completely rely on the spatial mental model to insert the needle.

  • There is a set of dynamic features that influence the trajectory of the needle during insertion. These features are mentally anticipated using knowledge and experience. The spatial mental model has to be adjusted taking into account the following:

    • breathing of the patient and related tissue deformation. Inhalation and exhalation states can be used as reference phases, and breathing of the patient can be stopped for a short amount of time;

    • tissue resistance that causes bending of the needle. For instance, cirrhotic livers become harder and make the needle difficult to insert;

    • tissue deformation due to needle insertion;

    • cooling by blood flow at the ablation area.

  • To check the actual position of the needle, an intraoperative CT scan is made, which shows the route of the needle and the position of the needle tip. The spatial mental model is updated with the position of the needle, and the actual needle trajectory is mentally compared to the planned trajectory. If it deviates, the needle has to be reinserted. This process is repeated until a good trajectory is found. Each trial can be used as a reference to adjust the needle.

  • The predicted ablation area has to be taken into account when placing the needle line in the mental model. The center of the ablation sphere is defined by the tip of the needle. This sphere has to completely cover the tumor and a preferred safety margin. In case of large tumors, multiple needles can be used at the same time. For instance, in case of three needles, three ablation zones have to be imagined, and their sizes have to be calculated in a way that cancer tissues do not remain inside or outside the entire ablation zone.

  • After ablating the tumor, preoperative CT/MRI images are compared to postoperative CT images. This is a mental comparison aided by computer measurements on each 2D image. If the result is not satisfactory, for example, there is residual tumor, or the ablation size is smaller than the planned ablation size, a redo of the intervention is necessary.
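The coverage requirement described in the list above (the ablation sphere is centered on the needle tip and has to cover the tumor plus a safety margin, possibly using several needles) can be illustrated with a minimal geometric sketch. It is not part of the studied workflow or prototype. The function name, the representation of the tumor as sampled surface points, and the millimeter units are assumptions made for illustration, and the check is deliberately conservative: each tumor point together with its margin must fit inside a single sphere.

```python
import math

def covers_tumor(tumor_points, needle_tips, ablation_radius, safety_margin):
    """Conservative check that ablation spheres cover the tumor plus margin.

    tumor_points: (x, y, z) samples on the tumor boundary (hypothetical input).
    needle_tips: (x, y, z) needle tip positions, one ablation sphere per tip.
    A point counts as covered if its distance to some tip, plus the safety
    margin, does not exceed the ablation radius.
    """
    effective_radius = ablation_radius - safety_margin
    if effective_radius <= 0:
        raise ValueError("ablation radius must exceed the safety margin")
    return all(
        any(math.dist(point, tip) <= effective_radius for tip in needle_tips)
        for point in tumor_points
    )

# Six boundary samples of a spherical tumor with a 15 mm radius.
tumor = [(15, 0, 0), (-15, 0, 0), (0, 15, 0), (0, -15, 0), (0, 0, 15), (0, 0, -15)]

centered = covers_tumor(tumor, [(0, 0, 0)], ablation_radius=25, safety_margin=10)
off_center = covers_tumor(tumor, [(5, 0, 0)], ablation_radius=25, safety_margin=10)
```

In this example a needle tip placed 5 mm off center already fails the check, which mirrors why the IR has to imagine the ablation zone, and not only the needle line, when placing the needle.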

4.3 Exploratory study with the prototype

According to participants of the exploratory study, in interventional radiology training, the standard approach to diagnostics is the following: first the transverse views are checked, and at a later stage, the sagittal and coronal views, in this order (Table 1, Q2). A 3D image is made in the head of the radiologist by scrolling through the transverse slices, then the sagittal and coronal views are used to better localize structures in the mind. Oblique views are generally not used, although there are computer programs to create them, and CT scanners are also capable of rotation through a small angle.

Participants of the study said that they had difficulty using oblique slices, mainly because of losing orientation and also due to distortion of structures. It was observed that when the oblique view is used, participants still try to make it similar to one of the orthogonal views, typically to the transverse view. Even those who find oblique views easier to use go back to transverse slices for an extra check, because reportedly these provide the most trustworthy information. Participants supported the idea of having oblique views that contain the needle; however, the interface should be improved to make it more usable.

Participants preferred to use the views that relate to body orientation as opposed to views related to needle position. Interestingly, however, the needle-dot and needle-line views were used as extra help, because they provided a quick reference for finding, for example, the tip of the needle in the body-related views. It turned out that it was also easier to interpret an oblique view that is derived from an orthogonal view. Although it was assumed that the needle-related views would be most useful in the checking task, they proved to be more useful in the needle alignment task (which is a new task compared to current medical practice). The position of the needle line as compared to the planning line was often checked in the needle-dot view, as it clearly showed how far the needle was from the planned trajectory and also because its orientation could easily be checked in a visual form (it was only correct when the needle and planning dot merged into one spot).

It became clear that in subsequent research not only proper views but also corresponding navigation strategies have to be developed for needle insertion, so that users can quickly and efficiently learn to use the new interface. One example is that participants tried to look at multiple views at the same time, which proved to be difficult. It was much more effective to focus on one view at a time and then move to another one.

Table 1 presents unprocessed data (quotes from participants of the exploratory study and from the task-analysis interviews) regarding the identified mental manipulation tasks.

5 Discussion

5.1 Comparing results to vision science findings

It is interesting how the identified mental manipulation processes of interventional radiology relate to theories of vision science. In this section, results of the study are compared to findings that may provide explanations about mental manipulation of anatomy.

5.1.1 Establishing a spatial mental model preoperatively

When scrolling through CT/MRI slices, the doctor needs to construct a spatial mental model that can be retrieved from memory during the intervention. According to theories of visual memory and visual imagery, slices are mentally scanned and interpreted, with intermediate results and visual routines held in short-term visual memory, and the complete interpreted result is then stored in long-term memory (Kosslyn et al. 2007). Based on the comments in Table 1, it can be hypothesized that the form of stored information is hybrid, containing (picturelike) visual images and (languagelike) propositional descriptions as well. This is in line with Schultheis et al. (2007). According to the literature (Kosslyn and Shwartz 1977), humans memorize spatial relations in the form of structural descriptions, and parts of these descriptions are stored as visual entities. Considering that doctors regularly discuss anatomical situations using directional terms, and at the same time often draw on paper for a better understanding, a hybrid theory seems valid.

5.1.2 Selecting an oblique slice in the spatial mental model

Selecting an oblique slice in the spatial mental model is usually part of several more complex operations, which will be discussed in the following subsections. One might logically speculate that this operation is a cutting operation—selecting an arbitrary plane in the volume in a single step. However, according to theories of visual imagery, people do not learn orientation-invariant features of manipulation (Cooper and Shepard 1973; Tarr and Pinker 1989). Research suggests that manipulation tasks are probably performed as a set of mental rotations. A mental rotation consists of a continuous path of intermediate orientations, not only a discrete orientation in the beginning and one at the end of rotation (Shepard and Metzler 1971).

Unfortunately, most studies on mental manipulation focus on 2D to 2D (Cooper 1975) or 3D to 3D (Shepard and Metzler 1971). Studies are rarely concerned with 3D to 2D transformations (Tory et al. 2006). Therefore, it is hard to compare the current results to the literature. Nevertheless, it can be assumed that selecting an oblique slice from a volume in a single cognitive step is beyond human capability, and therefore, other strategies are necessary, most likely gradual rotational operations. It is confirmed that IRs (as a first mental step) select orthogonal slices they are familiar with. Probably, this slice is then rotated continuously until it matches the required orientation. This operation, however, is much more difficult than what we know in daily life and, for example, from psychological IQ tests on spatial abilities. In those IQ tests, the image is fixed on the blocks and moves along as a reference that can be seen. In radiology, every 2D plane (View E′ in Fig. 5) shows different features in view: the intersection with the 3D dataset. The radiologist has to trace this intersection back to the 3D volume or, even more complicated, has to compare mentally constructed oblique intersections with a view from US (or has to search the right US angle). It can be assumed that rotation angles further away from well-known planes are more challenging and the level of accuracy decreases. This could also be an explanation why oblique slices are difficult to interpret in general.
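The geometric side of this assumption, namely that an oblique slice can be reached from a familiar orthogonal slice through a sequence of intermediate orientations, can be sketched as follows. This is only an illustration of the geometry and not a model of the cognitive process. The axis convention (x as the left-right axis, z as the head-feet axis), the restriction to a tilt about a single axis, the step size, and all function names are assumptions introduced here.

```python
import math

def rotate_about_x(vec, angle_deg):
    """Rotate a 3D vector about the x-axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = vec
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

def slice_frame(tilt_deg):
    """In-plane axes (u, v) and normal n of an axial slice tilted about x."""
    u = rotate_about_x((1.0, 0.0, 0.0), tilt_deg)  # unchanged by this rotation
    v = rotate_about_x((0.0, 1.0, 0.0), tilt_deg)
    n = rotate_about_x((0.0, 0.0, 1.0), tilt_deg)
    return u, v, n

def gradual_tilts(target_deg, step_deg=10.0):
    """Intermediate tilt angles from the axial plane (0 degrees) to target_deg."""
    steps = max(1, math.ceil(abs(target_deg) / step_deg))
    return [target_deg * i / steps for i in range(steps + 1)]

def slice_point(origin, tilt_deg, s, t):
    """3D position of the in-plane coordinates (s, t) on the tilted slice."""
    u, v, _ = slice_frame(tilt_deg)
    return tuple(o + s * a + t * b for o, a, b in zip(origin, u, v))

# Orientations passed through on the way to a 30 degree oblique slice.
for tilt in gradual_tilts(30.0):
    _, _, normal = slice_frame(tilt)
    print(f"tilt {tilt:5.1f} deg, slice normal {normal}")
```

Every intermediate tilt produces a different intersection with the volume, which is consistent with the observation that each 2D plane shows different features in view.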

5.1.3 Selecting a safe trajectory (needle planning)

The spatial mental model of anatomy has to be pulled into the visual buffer that allows inspecting and manipulating the mental image similar to the way visible pictures are manipulated using short-term memory (Shelton and Pippitt 2006). Probably, the desired route of the needle is designed as a static object in relation to patient anatomy. An exception will be when irregularities are expected that have to be compensated for upfront, for example, in the case of a cirrhotic liver that makes the needle more difficult to insert. In this case, needle insertion could be designed as a process considering the needle to irregularly interact with different tissues. In general, not all patient characteristics are known beforehand.

The first requirement indicated by patient safety is that the distance of the needle line to vital structures should be enough to assure that they will not be punctured, taking into account organ deformation due to needle insertion and breathing. The second requirement is that the needle should end in a predefined position of the tumor. Liver tumor shapes are often (approximately) spherical, so the needle tip should be positioned in the center of the sphere. If the shape of the tumor is different, either one sphere or a set of overlapping spheres is mentally drawn around the tumor: “Then in your mind you place a circle around it, which depends on the largest diameter of the lesion. Plus 1 cm on each side. And I only know it on plane and it makes more difficult. Usually you assume it is a perfect sphere. Weird shapes make it less accurate. Multiple ablations can be applied. It is nice to draw it on 2D, but I don’t know what happens in 3D. That is the real problem.” If the tumor is too large, multiple ablations are necessary due to the limitations of ablation technology. Another factor is the ablation of healthy tissue caused by the spherical shape of ablation. In some situations, it is more efficient to apply multiple ablations to save healthy tissue: “If there is tumor with difficult shape, I make a big circle around it rather than two smaller ones. Because if there is some lesion remained you have to make it 3D in your head and that is difficult. It could be easier when you would have some model or some help to assist you in placing the next needle. But sometimes this approach is not preferable. When you could do a smaller ablation it would be nicer. If I would be sure, I would do the smallest ablation possible, because you always ablate normal tissue as well. And those patients are sometimes unstable in their liver functions, so the more tissue you ablate the worst it gets. It would be nice if you could do as small as possible, especially in those not spherical lesions. And the larger the ablation zone the more complications you could have, because there are more structures around. The number of inconsistencies of this approach at the moment is larger than when you take a large ablation zone. That is why I always use the larger ablation zone. You always try to avoid getting back your patient with residual tumor.”
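The two requirements in this subsection can be made concrete with a minimal sketch. It assumes a straight needle line between an entry point and a target, vital structures given as sample points, and millimeter units, and it ignores the organ deformation and breathing that the text says must be taken into account. The function names and the default 10 mm margin (the "plus 1 cm on each side" rule quoted above) are introduced only for illustration.

```python
import math

def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0:
        return math.dist(p, a)  # degenerate segment: entry equals target
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / ab_len2))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)

def trajectory_is_safe(entry, target, vital_points, safety_zone_mm):
    """First requirement: every vital point keeps the required clearance."""
    return all(
        point_to_segment_distance(p, entry, target) >= safety_zone_mm
        for p in vital_points
    )

def planned_ablation_diameter(largest_lesion_diameter_mm, margin_mm=10.0):
    """Second requirement: largest lesion diameter plus a margin on each side."""
    return largest_lesion_diameter_mm + 2 * margin_mm

clearance_ok = trajectory_is_safe((0, 0, 0), (100, 0, 0), [(50, 12, 0)], safety_zone_mm=10)
diameter_mm = planned_ablation_diameter(30)  # 30 mm lesion with a 10 mm margin per side
```

The sketch treats the tumor as a sphere, which is the same simplification the quoted participant describes; irregular shapes are exactly where this rule becomes less accurate.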

5.1.4 Combining mental models with digital images (mental registration)

Mental combination of images is first presented through the example of combining preoperatively established mental models with intraoperative US images. This activity is reported to be extremely challenging, which is understandable given the series of underlying mental processes. First of all, the spatial mental model has to be pulled into the visual buffer and rotated in a way that matches patient orientation. Probably, a set of other transformations needs to be applied as well, such as sizing, so that the preoperatively established spatial mental model is comparable to the actual anatomy seen in the US image. The next step is to locate the slice in the 3D mental model that fits the intraoperative US image, as was described in Sect. 5.1.2. Our interviews showed that the doctor positions the US probe orthogonally to the body (although the probe is suitable for visualizing any oblique plane), so that the image is easier to interpret and to compare to the previously seen CT image (Table 1, Q8).

5.1.5 Creating spatial reference systems

CT images show the route of the needle in a way that it can be clearly interpreted by the experienced IR. It can be immediately seen whether it follows the desired route or not. If not, the needle has to be reinserted, but now the doctor has a new frame of reference—the recently scanned actual needle orientation—to which the desired orientation can be compared. Therefore, it is very likely that the information used from this moment on is propositional rather than visual. The IR can plan the new trajectory in relation to the previous needle trajectory (Table 1, Q11).
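The comparison between the scanned needle and the desired trajectory can be expressed as a small calculation. The following sketch assumes straight trajectories given by an entry point and a tip (or target) in a common coordinate system with millimeter units. The function names are hypothetical, and the sketch is not taken from the prototype.

```python
import math

def direction(start, end):
    """Unit vector pointing from start to end."""
    d = [e - s for s, e in zip(start, end)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("start and end must differ")
    return [c / norm for c in d]

def angular_deviation_deg(actual_entry, actual_tip, planned_entry, planned_target):
    """Angle in degrees between the scanned needle and the planned trajectory."""
    a = direction(actual_entry, actual_tip)
    b = direction(planned_entry, planned_target)
    # Clamp to [-1, 1] to guard acos against floating-point rounding.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))

def tip_offset(actual_tip, planned_target):
    """Vector from the scanned needle tip to the planned target."""
    return tuple(p - a for a, p in zip(actual_tip, planned_target))

deviation = angular_deviation_deg((0, 0, 0), (10, 0, 0), (0, 0, 0), (10, 10, 0))
offset = tip_offset((10, 0, 0), (10, 2, -1))
```

Expressing the correction relative to the previous needle (an angle and an offset) corresponds to the propositional, reference-based description hypothesized above, as opposed to a purely visual comparison.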

5.2 Prototype-related issues

The interactive prototype proved to be very useful as a facilitator for in-depth discussions. Not only were new techniques explored and commented on, but participants were also more likely to bring up difficulties of the current situation or, in some cases, to propose solutions to these problems by theoretically adding new features to the prototype. These experiences showed that although mental models are in general hard to externalize in any form (verbal or visual), a well-chosen tool and a corresponding clinical case can provide in-depth insight into the cognitive processes of IRs. Especially in this field, where ergonomics knowledge is missing, these fast practical solutions reveal design directions that are worthy of further, more elaborate research.

5.3 Human–computer interaction

Findings of the exploratory study related to HCI and their implications are discussed in this section. Table 2 presents these findings, provides examples, and addresses design consequences.

Table 2 HCI related findings

5.3.1 Interpretation of radiology images

The three aspects of interpreting a radiology image are as follows: (1) maintaining spatial orientation, for example, in order to know which part of the body is seen from which angle, (2) locating structures, for example, in order to assess anatomical information or to find a target, and (3) identifying shapes of structures based on their contours, for example, to recognize which structures are in the image and to assess individual anatomical variations. These three aspects are interconnected, because shapes (contours) are used to locate structures, and structures are used for spatial orientation.

In interventional radiology, orthogonal cross-sectional images are the main means for diagnostics and communication among doctors. These orthogonal cross-sections are used in anatomy training to build up knowledge about human anatomy in the physician’s mind. As a discussion with an IR revealed, there are several sources of information used when assessing anatomy: (1) reference images of general anatomy as it is depicted in anatomy handbooks, (2) knowledge about variations in anatomy regarding population, and (3) reference images of different imaging modalities. Reference images are recalled from memory to compare to the perceived image on the computer screen. Among the three orthogonal orientations (axial, sagittal, and coronal), the axial view, which intersects the body from the head to the feet, is the most used. Sagittal and coronal views are used to better localize structures. Our study showed that interventional radiologists always refer to an orthogonal view for mental reconstruction of anatomy and for verifying the acquired information. Orthogonal views are also used in CT/MRI imaging. Even when using US, which is an inherently oblique modality, physicians have tactics to make use of the well-known orthogonal views. The US probe is moved in orthogonal directions until the required structure is located, and then the probe is rotated to obtain oblique views for assessing fine anatomical and pathological details.

5.3.2 Trust

Physicians naturally use orthogonal views when scanning the patient, because they are directly comparable with the reference images in their mind; therefore, these views provide the most trustworthy information. Images taken in oblique orientations differ from the well-known orthogonal views, and as it turned out in our study, they are not considered a useful source of information. Oblique views can even cause disorientation, because organ contours are distorted as compared to the reference images.

Our study showed that oblique orientations are useful or even necessary in certain situations, for example, when planning or evaluating a needle line. Nowadays, there are also products on the market that promote the use of oblique CT/MRI views. In order to make use of the combined information, US images are registered with CT/MRI images to better localize structures intraoperatively. In order to maintain spatial orientation when using oblique views, a solution could be to apply spatial reference aids that help in quick orientation in each view, also when a quick jump is necessary between different views.

5.3.3 Experience

Two aspects of experience were discovered in the exploratory study. The first relates to years spent in training and practice, and the second relates to frequently performed procedures and related imaging modalities and tactics. Regarding the first aspect, it could be seen that novice IRs follow the standard procedure of diagnostics, that is, first having a look at the transverse, then the sagittal, and then the coronal views. More experienced IRs were more flexible in their approach of combining the provided views and were more open to unfamiliar views. Regarding the second aspect, it was found that different tactics were applied, which were sometimes explained by tactics used in other procedures. Please consult Table 2 for examples.

The discovered differences related to experience imply the design of different modes of the system (novice, expert, practice) and a flexible user interface. As it was pointed out by one participant, a practice mode would help physicians with less experience to gain a better understanding of spatial orientation and needle navigation without the stress of causing damage to a human being.

5.3.4 Interaction devices

In current practice, the mouse is the most widely used device to interact with radiology images. Participants of our study mentioned that they would prefer to use the mouse over the planar interaction device. However, the device was used correctly after a short explanation by each participant. It was observed that more experienced IRs found it more intuitive than less experienced ones, probably because it took less effort for them to match the orientation of the device to the orientation of the dataset representing the patient. The study also showed that because physicians have an extensive experience in navigating a needle, a tracked needle or a needle-like device could be an intuitive interaction means in different phases of percutaneous procedures.

5.3.5 Strategies and training

The interface presented a range of views at once to allow freedom in choosing preferred tactics and views. Participants had individual strategies for choosing views, both in deciding which ones to use and in the order of their usage. While some strategies are best left to the user’s choice, the best strategies could possibly also be identified and trained with the newly developed system. For example, strategies that relate to common psychological or psychomotor properties of humans are probably worth discovering and training, while the system should remain flexible enough to accommodate strategies that relate to experience. Table 2 gives the example of focusing attention on one view at a time during needle alignment. Another example is that participants found the rotation of the volumetric view very helpful in aligning the needle to the planning line. This action was also natural to non-medical people who tried out our system. A logical explanation is that depth perception was improved by the moving planning line on the computer screen, a phenomenon defined as optical flow caused by moving objects (ref.). Because of this, the orientation of the planning line was better understood, which was followed by an intuitive and more accurate physical action.

6 Conclusions

In a percutaneous procedure, the IR has to navigate the needle to a target area in the human body following a safe trajectory. In this process—due to the lack of direct visual feedback—the IR collects information about the location of the needle using radiology images, for example, US and/or CT images that are visualized on computer screens. Crucial information is often unavailable or inappropriately presented, and therefore, the IR has to make decisions based on knowledge about the current situation that was obtained preoperatively. Therefore, it is crucial how accurate this knowledge is and how it can be retrieved and applied in the decision-making processes.

The preoperatively generated knowledge is related to human anatomy and to the optimal trajectory of the needle and therefore has a spatial structure. The IR uses a patient-specific spatial mental model of anatomy. This spatial mental model has to be manipulated in order to define the entry point and set the orientation of the needle before insertion, and later to ascertain that the trajectory is in fact safe. When the (planned) needle trajectory differs significantly from the preferred orthogonal directions and is oblique, manipulation of the spatial mental model is highly difficult. Findings of visual imagery studies suggest that a continuous and gradual mental rotation of images occurs when manipulating mental models, in order to arrive at an unfamiliar view from a well-known one.

Oblique views are also hard to interpret outside the intraoperative situation. The most important aspect of interpretation is maintaining spatial orientation. Sometimes in the liver, oblique views are the only possibility to support planning and insertion. Therefore, an urgent task of UI design is to develop visual feedback that provides a spatial reference system for easier interpretation of oblique views and that is transferable to a wide range of situations. Visualization techniques have to conform to human capacities and cognitive processes as well as to techniques and workflows of current medical practice.

In future research, UI elements will be designed and tested in order to find patterns in cognitive reactions. Results of these tests will then be used to develop general interface solutions that can be applied in 3D radiology navigation. Action research will be used to iteratively prototype and test variations of user interface designs in the context of interventional radiology.