1 Introduction

Visual disability has a major impact on the quality of life of people who have it, including their ability to study, work, and develop personal relationships [1]. In this context, technologies such as serious games [2] have been designed to assist people who are blind in daily life activities. These technologies work as aids that facilitate independence, autonomy, and safety. Thus, they improve the quality of life of people with visual disabilities and can stimulate and develop several skills, such as cognitive skills [3].

Even though there is technology specialized for blind people (i.e., visually impaired), they are still using applications similar to older applications for the sighted population. For example, Battleship was one of the earliest board games to be released as a computer game, in 1979 [4]; AudioBattleShip, a version in which blind and sighted children play together, came many years later [5]. In general, blind people have particular human-computer interaction needs, and the user interface should be suitable for them.

There are many efforts toward developing accessible multimodal interfaces for visually impaired people, especially in multimodal games [6, 7]. Despite this effort, and in contrast to the visual interface evolution of games and applications for sighted people, interfaces for people who are blind explore other ways to interact with the user. In general, technologies for people with visual disabilities combine different sources of perceptual input and output. The combined modes (sources of perceptual input and output), typically audio and haptics [5, 8], provide multimodal interfaces that enable multimode channels combining different user senses [2]. Although multimodal interfaces could help to improve the learning skills of people with visual disabilities, most of these technologies have not been fully validated; mostly, they remain in the prototype phase without being integrated into people's everyday lives [9].

In relation to the quality of applications, the No Child Left Behind (NCLB) Act defined that research in inclusive education must (a) utilize the scientific method, (b) be replicated in more than one setting by more than one investigator, and (c) result in findings that converge to a clear conclusion [10]. Thus, some studies in this area use Evidence-Based Practice [5, 11], which meets prescribed criteria related to the research design, quality, quantity, and effect size of supporting research [12, 13]. This method thereby provides a measurement of the effectiveness of using technology.

Considering these multimodal interactions with interfaces for blind people, it is necessary to verify whether the technologies designed for them are effective and how they impact users in cognitive dimensions [14]. Effective impact evaluation, which has been used as evidence-based practice in other domains for users with and without disabilities, should therefore be able to assess precisely the mechanisms by which people with visual disabilities develop or enhance cognitive skills [3]. According to Darin et al. (2015), there is a lack of studies proposing instruments and methods for evaluating cognitive impact in the context of multimodal video games for the cognitive enhancement of people who are blind. In general, studies in the literature do not follow guidelines to support the cognitive impact evaluation of multimodal interfaces for blind people.

To shed some light on this issue, our work presents a state-of-the-art study on the cognitive impact evaluation of multimodal interfaces for blind people, i.e., people with a profound inability to distinguish light from dark or a total inability to see [15]. This state-of-the-art study consists of a systematic review analyzing how studies evaluate cognitive impact in this context. The systematic review is part of an ongoing work on the design of guidelines for evaluating the development and enhancement of cognitive impact in multimodal interfaces for blind people. As related literature, some works propose guidelines for other concepts in the context of accessibility. For example, the study in [16] defines game accessibility guidelines that help developers design products whose interface and parameters are adapted to the needs of users with disabilities.

The remainder of this paper is organized as follows: Sect. 2 presents the theoretical background; Sect. 3 presents the methodology used in this work; Sect. 4 covers the results of the systematic review; Sect. 5 discusses the results; and, finally, Sect. 6 concludes the study.

2 Background

2.1 Cognitive Impact Evaluation

The technological aspects of these systems are investigated from the perspective of how they affect use [17]. Cognitive impact concerns the interaction between humans and systems: the field of human-computer interaction pioneered the formal study of the cognitive relationship between a person's activities, the computer as an artifact, and the task [18]. Technologies can thus enhance human cognitive capabilities.

The literature mapping study by Darin et al. (2015) analyzes applications for the cognition of people who are blind according to a four-dimensional classification (Interface, Interaction, Cognition, and Evaluation). The Evaluation dimension includes two main aspects: usability and cognitive impact. The latter assures that an application can develop or enhance cognitive skills in people with visual disabilities.

Still in this study, the Cognition dimension comprises six skills: mental models, mental maps, spatial structures, Orientation and Mobility (O&M), problem-solving, and social collaboration. This approach addresses the main cognitive skills developed and enhanced for impact evaluation purposes. These dimensions can provide directions for defining tasks in an experiment to measure cognitive impact, such as the detection of obstacles, which yields useful data for evaluating O&M [19].

The study also shows that most papers classified under the main cognitive skills concern mental maps and O&M. The O&M skill is a broad concept that is also related to wayfinding and navigation. According to Pissaloux and Velázquez (2018, p. 1), “Human mobility is one of the most important cognitive tasks. Indeed, independent and secure mobility in real physical space has a direct impact on the quality of life, on well-being, and on integration in the numeric society.”

Darin et al. [19] define mobility as a four-dimensional problem: walking, wayfinding (or orientation), space awareness (or space knowledge), and navigation. According to this definition, walking is a low-consciousness cognitive task involving displacement in near space; it takes into account obstacle detection and localization. Wayfinding is a set of processes for knowing one's current position in space in order to reach a target. Space awareness requires a high level of consciousness; it includes forming mental maps, e.g., knowing the name of a street on a plan. Navigation, the highest-level cognitive task, results from carrying out all the functions listed above while traveling.

2.2 Experiment in Software Engineering

The impact evaluation of software follows an experiment process that includes several steps: scoping; planning; operation; analysis and interpretation; and presentation and packaging [20]. This process provides a high level of control, using a formal, rigorous, and controlled investigation.

The main concepts involved in the experiment, shown in Table 1, are used to understand how cognitive impact is evaluated and to guide the design of the guidelines. Figure 1 shows how these concepts relate to the experimental process.

Table 1. The main concepts of experimental design [20]
Fig. 1. Variable relationship in the experiment process (Source: [20])

As an example of the experiment process, we consider the measurements, instrumentation, and variables of the experiment conducted in [21]. In this study, the authors evaluate the navigational performance of the virtual environment called Audio-based Environment Simulator (AbES), which can be explored to learn the layout of an unfamiliar, complex indoor environment. The dependent variable evaluated was navigation performance (Orientation & Mobility, O&M).

Some information about the participants is controlled and works as independent variables, such as the etiology of blindness, age, gender, hand preference, and verbal memory (assessed using the Wechsler Memory Scale). These variables are controlled and fixed to ensure correct measurement. The factors in the experiment are the age of blindness onset and the interaction condition with AbES; they divide the participants into groups: early blind and late blind; and gamers, directed navigators, and a control group. The measurement variables used are task success, navigation time, and shortest-possible-path score.
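For illustration, the design just described can be captured in a small data structure. The sketch below is ours, not the authors' code; it merely encodes the variables reported for the AbES experiment [21]:

```python
# Sketch of the AbES experiment design as a plain data structure.
# The groupings follow the paper's account; the structure itself is
# an illustration, not code from the original study.
abes_experiment = {
    "dependent_variable": "navigation performance (O&M)",
    "independent_variables": [            # controlled / fixed per participant
        "etiology of blindness", "age", "gender",
        "hand preference", "verbal memory (Wechsler Memory Scale)",
    ],
    "factors": {                          # varied to form the groups
        "blindness_onset": ["early blind", "late blind"],
        "interaction_condition": ["gamers", "directed navigators", "control"],
    },
    "measurements": ["task success", "navigation time",
                     "shortest possible path score"],
}

for factor, levels in abes_experiment["factors"].items():
    print(f"{factor}: {' / '.join(levels)}")
```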

3 Methodology

The methodology consists of a systematic review [22], which aims to review the existing evidence concerning the impact evaluation of multimodal interfaces and to summarize the empirical evidence concerning the strengths and limitations of a specific evaluation method. In contrast to an ad hoc literature review, a systematic review is a methodologically rigorous analysis of research results.

To achieve our goal, the main research question for this first part of the proposal was: “How is the cognitive impact evaluated on multimodal interfaces for people who are blind?” As a secondary question, we aim to identify the challenges regarding impact evaluation in this scenario.

The process of a systematic review includes three main phases: planning the review, conducting the review, and reporting the review [22]. Throughout the systematic review, we used the StArt tool [23] and Microsoft Excel to create the protocol, apply the filters, select the papers, and present the results. We organized all references in Mendeley. As the papers retrieved from PubMed Central are in MEDLINE format, we developed the tool Medline2bibtex, which works as a parser so that the list can be read by both StArt and Mendeley.
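The paper does not detail the internals of Medline2bibtex, but a minimal MEDLINE-to-BibTeX converter along the following lines would suffice. The tag names (PMID, TI, AU, JT, DP) follow the public MEDLINE format; the input file name is hypothetical:

```python
# Minimal sketch of a MEDLINE-to-BibTeX converter (assumed behavior;
# the actual Medline2bibtex tool is not described in the paper).
import re

def parse_medline(text):
    """Split MEDLINE records into dicts mapping tags to value lists."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():                     # blank line ends a record
            if current:
                records.append(current)
            current, tag = {}, None
        elif re.match(r"^[A-Z0-9]{2,4}\s*-", line):  # a new "TAG - value" line
            tag, value = line.split("-", 1)
            tag = tag.strip()
            current.setdefault(tag, []).append(value.strip())
        elif tag:                                # indented continuation line
            current[tag][-1] += " " + line.strip()
    if current:
        records.append(current)
    return records

def to_bibtex(record):
    """Map common MEDLINE tags to a BibTeX @article entry."""
    key = record.get("PMID", ["unknown"])[0]
    fields = {
        "title":   " ".join(record.get("TI", [])),
        "author":  " and ".join(record.get("AU", [])),
        "journal": " ".join(record.get("JT", [])),
        "year":    record.get("DP", [""])[0][:4],
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{pmid{key},\n{body}\n}}"

with open("pubmed_central_results.txt") as f:    # hypothetical export file
    print("\n\n".join(to_bibtex(r) for r in parse_medline(f.read())))
```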

The next subsections describe the planning phase (the study selection criteria and the research sources selected) and the conducting phase (the search process, the data extraction form fields, and the study quality evaluation). The entire process was stored in an Excel worksheet available online.

3.1 Planning: Definition of the Protocol

In the planning phase, we defined a review protocol specifying the research question being addressed and the methods used to perform the review [22]. Based on the goal of the systematic review, we defined the search string shown in Fig. 2.

Fig. 2. Search string

Study Selection Criteria.

We defined the search criteria as studies that present a technology for people who are blind or visually impaired and apply a cognitive impact evaluation. The inclusion and exclusion criteria follow the goal of the systematic review. These requirements comply with the general objective of this study, but also aim at a broader view of the assessment of various technologies in the area. Table 2 presents the inclusion (I) and exclusion (E) selection criteria. To be accepted, a scientific paper must meet all inclusion criteria and no exclusion criterion.

Table 2. Inclusion and exclusion criteria

The I.01 criterion defines that the scientific articles must be in English, because it is the mandatory language of the main events and scientific journals in the search area. They must also have been published between 1 January 1998 and 2 August 2017. The year 1998 is a milestone due to the paper by Lumbreras and Sánchez (1998), which works with 3D acoustic interfaces for blind children and is the earliest such study known [24].

The technologies defined in the I.02 criterion include mobile applications, computer software, IoT systems, virtual environments, and video games with multimodal interfaces. To broaden the results, we also accepted technologies that are not exclusively for people who are blind or visually impaired, as long as the studies present the technology focused on users with visual disabilities. We excluded all technologies that use Sensory Substitution Devices (SSDs) [19], which substitute one sense for another. SSDs, out of our scope, include sensory replacement, haptics as sensory augmentation, bionic eyes, retinal visual prostheses, cortical implants, and others. This definition is important to plan the proposed methodology and to delimit the focus.

We define the study types in E.03. This criterion excludes all study types other than primary studies that present a technology for people who are blind and its evaluation. We accepted articles, conference papers, short papers, and book chapters; these documents have the minimum information needed to understand the evaluation. We did not cover books because the information is dispersed inside them.

3.2 Conducting

In the conducting phase, we first identified and selected studies. To identify them, we performed a manual string search in five scientific databases: Scopus, Springer Link, PubMed, PubMed Central, and Web of Science. We chose the main databases in the research area or the databases that index them [25]. Other databases were not included because they are indexed by the ones considered.

Search Process.

The conducting phase starts with the initial search in the proposed scientific databases. The string was applied to the metadata of the papers, which includes the abstract, index terms, and bibliographic citation data (such as document title, publication title, etc.). A total of 2136 papers was retrieved. Figure 3 summarizes the conducting phase process.

Fig. 3. Filters in the conducting phase

The first filter excluded duplicated papers and document types out of scope due to their format (E.03). The second filter identified which papers were in and out of scope by reading their titles and abstracts (E.01). Many papers were excluded in the first filter because the PubMed Central (PMC) database returns many medical papers focused on disease effectiveness and specific medical statements. Even though the area of this study is computer science, we decided to include PMC in the list of databases due to the nature of the subject.
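As an illustration only (the actual screening was done manually in StArt and Excel), the three filters can be viewed as a pipeline over the retrieved records. The column names below are hypothetical, standing in for reviewer decisions recorded during screening:

```python
# Hypothetical sketch of the screening pipeline; field names are
# illustrative, not the actual StArt/Excel structure used in the study.
import pandas as pd

papers = pd.read_csv("search_results.csv")   # hypothetical export of the 2136 hits

# Filter 1: remove duplicates and out-of-scope document types (E.03).
papers = papers.drop_duplicates(subset=["title", "year"])
accepted_types = {"article", "conference paper", "short paper", "book chapter"}
papers = papers[papers["doc_type"].str.lower().isin(accepted_types)]

# Filter 2: title/abstract screening (E.01) - decisions made manually,
# recorded here as a reviewer-filled column.
papers = papers[papers["title_abstract_decision"] == "include"]

# Filter 3: full-text screening (E.02), again a manual decision column.
selected = papers[papers["full_text_decision"] == "include"]
print(f"{len(selected)} papers selected for data extraction")
```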

Next, in the third filter, we evaluated each retrieved paper in its entirety (E.02). When necessary, besides the full text, we searched for more information about the technologies and processes described, such as project and institutional websites, videos, newspaper articles, and others.

Once we selected the papers, we extracted all the data required (detailed in the protocol) to achieve the objective. The organization of the data generates the data synthesis, which is shown in Sect. 5.1. The study quality assessment of each retrieved paper remains to be done. Table 3 shows the number of papers selected in each filter per scientific database. The main reason for withdrawing papers in the last filter was that the evaluation performed was out of scope, most often relating to system performance, e.g., sensor performance evaluation.

Table 3. Quantity of papers accepted per filter

Data Extraction Form Fields.

The data extraction form was designed to answer the main and secondary questions and to understand the context of each paper. We divided the collected data into three categories: (i) General, (ii) Research, and (iii) Empirical. The General category comprises bibliographic information.

The Research category comprises the attributes of the research and of the technology presented in the paper. In this category, we appraise each scientific paper under two classifications. The first fits the paper according to the research type [26]: validation research, evaluation research, solution research, philosophical research, opinion paper, or experience paper. The second classification fits the technology according to the key features of multimodal interfaces for the cognition of people who are blind [3]. This classification is divided into four dimensions: Interface, Interaction, Cognition, and Evaluation; it is applied to video games and virtual environments (Fig. 4).

Fig. 4. Key features in multimodal interfaces (Image from [3])

For our purposes, we classify only along the Interaction, Interface, and Cognition dimensions, and we apply the classification beyond video games and virtual environments, since we also found these features in the selected technologies. These features provide the insights necessary for a practical understanding of the issues involved in their design and evaluation [3]. They are useful in our research for giving a comprehensive overview of the technologies and evaluations regarding multimodal interfaces. The Research category also records further information about the research, such as other strategies used to evaluate.

The Empirical category provides information specifically about how the empirical method evaluates the cognitive impact. The empirical method classification [27], retrieved in this category, distinguishes three types: experiment, survey, and case study.

Finally, all data passed through a manual analysis to acquire qualitative and quantitative results. Section 4 of this paper presents the compilation of the data extraction.

Studies Quality Evaluation.

The quality evaluation was based on [28]. Although the areas differ, we applied adaptations that resulted in the quality checklist of Table 4. The checklist assesses the studies and weighs each study in the final results. Each question adds or subtracts points, producing an overall score for the empirical method. The scores were defined according to the importance of the requested data, and all items are linked to the data extraction form fields.

Table 4. Quality assessment form
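Since Table 4 is not reproduced here in full, the sketch below shows only the scoring mechanism: each checklist answer adds or subtracts its weight, yielding the study's overall score. The items and weights are illustrative, not the actual checklist:

```python
# Hypothetical scoring sketch for a quality checklist like Table 4.
# Items and point values below are made up for illustration.
CHECKLIST = {
    "reports_sample_characteristics": +2,
    "describes_instruments":          +2,
    "specifies_statistical_method":   +1,
    "mentions_ethical_consent":       +1,
    "omits_evaluation_variables":     -2,
}

def quality_score(answers):
    """Sum the weights of every checklist item answered 'yes'."""
    return sum(w for item, w in CHECKLIST.items() if answers.get(item))

# Example: a paper that reports its sample and instruments but
# leaves the experiment variables implicit.
paper = {"reports_sample_characteristics": True,
         "describes_instruments": True,
         "omits_evaluation_variables": True}
print(quality_score(paper))   # 2 + 2 - 2 = 2
```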

4 First Results of the Systematic Review

The systematic review yielded 25 papers containing 28 experiments, since three papers report two experiments each [19, 29, 30]. Among these 25 papers, some perform a cognitive impact evaluation on the same technology; the papers therefore present 23 technologies for people who are blind or visually impaired, as shown in Table 5. The next subsections detail the results obtained in each data category.

Table 5. Technologies encountered

General Data.

Among all papers, 14 are conference papers [5, 8, 11, 30, 33, 34, 36, 38–40, 45–48], one [19] is a book chapter, and the other 10 are from scientific journals. Figure 5 places the selected papers on a timeline, which starts in 2004 and peaks in 2011 and 2014. The most frequent affiliation is the University of Chile, with 15 papers.

Fig. 5. Timeline of the papers

Research Data.

The selected papers present two types of research [26]: one paper is validation research [19], and 24 papers are evaluation research. The only validation research paper is classified this way because the technology presented is novel and has not yet been implemented in practice.

The interfaces, interactions, and cognition skills of the technologies presented in the papers are classified according to the key features of multimodal interfaces [3]. These characteristics are important to understand which types of interfaces are assessed and how the impact is evaluated on them. Figure 6 shows that the most notable mode of interaction is the keyboard and the least used is the mouse. The keyboard does not add the complexity and expense of devices such as the Novint Falcon used in [34], a force feedback device that provides tactile and kinesthetic feedback. Natural language is also little used, and the mouse is replaced by buttons or other specific devices for better interaction, as shown in [33]. The most common feedback is sonorous, and the main audio interface used is iconic sound, i.e., sounds associated with each available object and action in the environment [8].

Fig. 6. Results of key features in multimodal interfaces

Many papers apply another approach to evaluate criteria not covered by the cognitive evaluation. Concerning other strategies used to evaluate the interface, 14 papers [5, 11, 29, 30, 34–39, 41, 42, 47, 48] applied a usability evaluation together with the cognitive evaluation. Other strategies used were: evaluation of tactile perception [19, 46]; system performance [38], which tested the hardware used; evaluation based on HCI heuristics [33]; pattern recognition [19]; obstacle awareness [19]; homing and obstacle avoidance [19]; and iconic evaluation [39]. Eight papers do not apply any other strategy to evaluate the system or the user interaction.

Usability evaluation is the assessment most used alongside cognitive impact evaluation. It is mainly used to obtain information about the user's acceptance of the software and the match between his or her mental model and the representation embedded in the software [36].

Empirical Data.

Concerning the 28 empirical strategies that evaluate cognitive impact, most papers applied experiments; only one applied a case study [38], due to the small number of participants. As our focus is experiments, we apply the experiment criteria to both types and therefore refer to all the empirical methods as experiments. The data from the Empirical category provided a substantial part of the comprehension of this research. Figure 7 presents all the data extracted from each experiment. Note that not all papers presented the data sought; these omissions were included in the quality criteria and considered in the discussion and conclusions. The empirical data are counted by the number of experiments (28), not papers.

Fig. 7. Data extracted in the Empirical category

Instruments.

The instruments are the means of data collection in the cognitive impact evaluation. They either identify some user ability controlled in the experiment (as an independent variable), e.g., the mathematics knowledge test in [37], or guide the evaluation process, e.g., the observation guideline for assessing O&M skills in [39]. Among them, there are 15 checklists [5, 8, 11, 32, 34, 37, 39–41, 45, 46], which include guidelines and specific tests; 11 questionnaires [11, 29, 32, 35, 36, 38, 42, 43, 46, 47]; 7 interviews [8, 29, 33, 39, 41, 42]; 6 modeling kits [5, 19, 30, 32, 37, 42]; and 9 logs [5, 21, 29–33, 36, 42, 48], which include, in addition to the system log, video and audio logs. Many studies produce their own instruments (7 experiments); the others work with instruments found in the scientific literature.

Statistical Methods.

Figure 8 shows the statistical methods used in the analysis of the experimental data. We do not count simple statistical methods, such as averages and gain percentages, which are the only methods used by 11 experiments [19, 30–33, 35, 37, 38, 42]. Three experiments [36, 39, 45] do not specify the statistical method used, not even in the references. The t-test, which applies statistical concepts to reject or not reject a null hypothesis, is the most used, followed by ANOVA and Pearson's correlation coefficient; these two methods analyze the variation between groups. The ANOVA procedure was applied in one-, two-, and three-way designs.

Fig. 8. Statistical methods
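As a minimal illustration of these three methods, the following sketch runs a t-test, a one-way ANOVA, and a Pearson correlation. The scores are invented; the sketch mirrors the kind of group comparison reported in the experiments, not any specific study's data:

```python
# Illustrative group-comparison analyses (t-test, one-way ANOVA,
# Pearson correlation); all data below are made up for demonstration.
from scipy import stats

early_blind = [8.1, 7.4, 9.0, 6.8, 7.9]      # e.g., navigation scores
late_blind  = [6.2, 5.9, 7.1, 6.5, 5.4]
blindfolded = [5.0, 5.8, 4.9, 6.1, 5.2]

# t-test: is the difference between two groups significant?
t, p = stats.ttest_ind(early_blind, late_blind)
print(f"t = {t:.2f}, p = {p:.3f}")

# One-way ANOVA: variation across three groups at once.
f, p = stats.f_oneway(early_blind, late_blind, blindfolded)
print(f"F = {f:.2f}, p = {p:.3f}")

# Pearson correlation: e.g., navigation score vs. hours of training.
hours = [10, 8, 12, 7, 9]
r, p = stats.pearsonr(early_blind, hours)
print(f"r = {r:.2f}, p = {p:.3f}")
```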

Resources.

Concerning the resources, 7 experiments [11, 30, 36–38, 41] give information about the time spent on the evaluation, and 2 experiments in the same paper [29] specify the human resources. The mean time among these is 3.4 months, and the longest is 6 months. None of them mention financial resources.

Ethical Concepts.

Five experiments [21, 29, 43, 44] mention signed consent. One of them also applies stop rules to enforce ethical conduct [21] and names the ethics council that approved it.

Sample.

From the number of users to the onset age of blindness, there are many sample combinations in the selected experiments. The sample choice can be based on the level of experience required for the task, on age, or on the level of blindness. Some characteristics controlled in the sample relate to the disability, such as the onset of blindness, the etiology of the visual impairment, or the presence of another disability. The number of users varies, as shown in Fig. 9; most of the experiments (75%) involve 3 to 12 users. Regarding age, most experiments (9) involve teenagers (10 to 15 years old); the age ranges are shown in Fig. 10.

Fig. 9. Quantity of users in the samples

Fig. 10. Users' age range per experiment

The gender distribution in the samples is balanced in most cases, not counting the 11 experiments that do not describe this information. The mean gender proportion is 50% women and men, with a variance of 2%.

The distribution of blindness levels varies. There are experiments where the sample consists entirely of people who are blind [49], and there are samples formed only by people who are blindfolded, as in the experiments in [19]. Figure 11 shows the blindness level distribution across the samples.

Fig. 11. Blindness level distribution

Tasks.

The tasks explored in the experiments are related to the technology assessed. The tasks assessing the development of O&M skills are based on virtual and real environments [30, 34]. Some of them use modeling kits to represent the virtual environment in the real world and to analyze each participant's space awareness and cognitive improvement [30]. Another example of a cognitive task is reading a text with the guidance of virtual sounds. Some works use levels of complexity in the tasks to quantify the cognitive impact; for example, the work in [45] rates task performance on 5 levels.

Variables.

The most common independent variables, which are controlled in the experiment, are related to the sample choice: characteristics such as the etiology of blindness, age, blindness level, and gender [41]. The dependent variables, in which we want to see the effect, are related to the measures and the impact. The factor, a type of controlled variable, is modified in the experiment to observe the cognitive effect. For example, the experiment proposed in [8] uses as a factor the different outputs of the multimodal interface (audio group, haptic group, and haptic-audio group).

Measures.

The measures used to evaluate cognitive impact focus on performance, comparing it before and after the use of the assessed technology, e.g., [37], or between two groups with and without the technology, e.g., [21]. The measures are strongly related to the instruments used; for example, the checklists of [34] assess the task using scores for sensory perception, tempo-spatial development, and O&M skills. In some works, each measure has its own scale or options, such as the Likert scale used in [41]. Eleven experiments [8, 11, 30, 34, 37–40, 42, 48] apply instruments as a pretest and a posttest to measure the impact.
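A pretest/posttest design of this kind is typically analyzed with a paired test on the same participants' scores. The sketch below uses invented numbers and is our illustration, not data from any of the reviewed experiments:

```python
# Sketch of a pretest/posttest comparison (paired t-test), the design
# used by eleven of the reviewed experiments; scores are invented.
from scipy import stats

pretest  = [4.0, 5.5, 3.8, 6.0, 4.6, 5.1]   # O&M score before the technology
posttest = [5.2, 6.1, 4.9, 6.8, 5.5, 6.0]   # same participants afterwards

t, p = stats.ttest_rel(pretest, posttest)    # paired samples
gain = [b - a for a, b in zip(pretest, posttest)]
print(f"mean gain = {sum(gain)/len(gain):.2f}, t = {t:.2f}, p = {p:.3f}")
```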

5 Discussion

As shown in the previous section, there has been consistent research evaluating cognitive impact since 2004, showing the importance of this type of study on technologies for blind people. Among all papers retrieved in the initial search of the systematic review, we could see that, in general, the papers do not evaluate cognitive impact; the preference lies in evaluating the system, be it hardware or software. The same result is seen in the papers retrieved in the third filter: one of the most used strategies is system evaluation alongside cognitive evaluation. Despite this finding, the leading role of these technologies is often to support a user's cognitive activity. Thus, assessing whether the cognitive purpose was achieved is an important part of constructing software for blind people.

The research data give us an understanding of how interaction works in the multimodal interfaces of technologies for blind people. The interaction, cognition skills, and interface characteristics encountered shape the planning of the experiments. Among the interactions encountered, the combination of the keyboard as the interaction mode with sonorous feedback and iconic sound in the interface stands out.

In the empirical data, we found a huge diversity of experiments. Although we understand the differences between the experiments, we came across differences in how the experiments and the acquired data are planned and presented. Instruments, tasks, variables, and measures are strongly related to the technology assessed and the cognitive purpose. Almost no work presented the variables explicitly according to the classification into dependent variables, independent variables, and factors. To understand the experiments, we had to identify these three kinds of variables in each one; this adds a cost to fully understanding the experiment.

As noted, the selected experiments combine samples in many ways, from the number of users to the onset age of blindness. One point that stands out in the sample selection is the number of blind users who actually perform the experiments and how these data are reported. There is, for example, one work whose sample is exclusively blindfolded users, which refocuses the conclusions of the experiment.

Statistical power is an inherent part of empirical studies, underpinning the results found and the study conclusions [50]. However, 11 experiments analyzed only the percentage of gain or the average. Wohlin et al. [20] mention that one of the advantages of an experiment is the ability to perform statistical analysis using hypothesis testing methods, along with the opportunities for replication.

Almost no experiment dealt with resources and ethical concepts. Although some papers give information about resources, in general this information is not clear in the text and is often incomplete. The missing information makes it difficult to repeat the experiments. Ethical concepts are also not well covered in the papers, even though they are an essential step in producing an experiment with people who have disabilities [20].

6 Conclusion

The goal of a state-of-the-art study is to review the existing evidence concerning the impact evaluation of multimodal interfaces and to summarize the empirical evidence concerning the strengths and limitations of a specific evaluation method [22]. With this work, we expect to have created a bibliographic review of cognitive impact evaluation based on the steps of the systematic review approach. Our scope was bounded by multimodal interfaces for people who are blind (used in this paper as a synonym of visually impaired).

Technologies for people who are blind have many requirements due to the target audience and the special characteristics of multimodal interfaces. Moreover, many applications for blind people aim to improve cognitive skills, such as O&M, wayfinding, and navigation, thus supporting users in their daily lives. It is therefore important to point out that the use of evidence-based practice is essential to measure the real impact of these technologies.

After compiling the data from the systematic review and analyzing the theoretical foundations, we conclude that there is a need to better plan and present data from experiments on technologies for blind people. With this, we guarantee the quality of the experiment itself and of the interaction of the technology with respect to the cognitive objective. Faced with this need, we propose as future work to further explore the preliminary results, improving the data analysis using Grounded Theory, and to create a set of guidelines that appropriately guide experiments evaluating tools for blind people. These guidelines will provide a way to evaluate the cognitive impact on the development and enhancement of skills in people who are blind or visually impaired, considering the main aspects of multimodal interfaces.