I see, you design: user interface intelligent design system with eye tracking and interactive genetic algorithm

  • Shiwei Cheng
  • Anind K. Dey
Regular Paper


User interfaces are important for streamlining the interactions between humans and computers. However, there are few effective approaches for collecting users’ preferences implicitly and objectively for the purpose of user interface (UI) design optimization. This paper presents an effective approach to interactive genetic algorithm (IGA) optimization-based UI design via eye tracking, including eye-movement data based users’ preference inferring, gene coding for real UI components and design features, and the visualization and interaction mechanisms. Then we design and build a prototype system that applies IGA to generate and optimize music player UI solutions automatically. An evaluation of our prototype system suggests that it can generate and identify personalized UI designs reliably with minimal user intervention and high efficiency and user satisfaction.


Keywords: Eye movement · Interactive evolutionary computation

1 Introduction

To improve the quality of user interface (UI) design, most UI designers make efforts to build multiple, alternative UI solutions, then evaluate and refine them iteratively. Despite the introduction of focus groups and personas, the traditional UI design process rarely takes users’ personal preferences into account. We can consider UI design as an optimization problem and automatically optimize a UI based on users’ personalized preferences.

The key to personalized UI optimization design is the evaluation function. The evaluation function characterizes how the UI contents need to be adjusted to match the preferences of users, and the content itself is optimized based on the evaluation outcome (Liapis et al. 2012). However, for some UI design features, e.g., visual aesthetics, it is difficult to construct an explicit evaluation function, as evaluating visual aesthetics involves subjective decisions on stylistic preferences and lacks quantitative standards (Eisenstein and Puerta 2000).

In this context, a widely used evolutionary computation algorithm, the interactive genetic algorithm (IGA), is used to tackle this kind of optimization problem (Takagi 2001). IGA does not require a mathematical expression of an evaluation function to indicate the quality of a candidate solution; instead, subjective feedback from a human user can be used. Although an IGA-based optimization mechanism could help generate more personalized UIs, this approach still has some limitations (Takagi 2001; Semet 2002; Pallez et al. 2010). First, to produce new sets of candidate solutions (referred to as new generations), IGA requires manual evaluation of each individual solution in each generation, including mouse clicks or key strokes for selections, ratings and so on. This kind of explicit interaction increases users' mental and physical workload, and is likely to cause fatigue, particularly as the number of generations of candidates to evaluate increases. This increased workload can sharply reduce the accuracy and efficiency of the optimization process. A second issue is that explicit interaction consumes user time: the process only proceeds as quickly as users can evaluate candidate UIs and explicitly select the best candidate, which slows the optimization process.

To address these problems, in this paper we use eye tracking to collect users' preferences and model fitness functions over existing UI designs implicitly, and then drive the UI optimization process using IGA. We evaluate the approach on a music player UI design task. In this task, participants are not required to make any selections or express any explicit preferences; they only browse the candidate UI solutions. Our evaluation shows that our system can generate personalized designs by tracking users' eye gaze implicitly, while reducing workload and time cost.

Before we discuss our novel approach to UI optimization, we discuss past efforts in this research area.

2 Related work

2.1 Genetic algorithm and UI optimization design

According to Darwin's theory of "survival of the fittest", candidate solutions are evolved using biological genetics and natural selection, driven by their "survival" ability (Engelbrecht 2007). Evolutionary computation uses an evaluation function, called the fitness function, that expresses the "survival" ability of a candidate solution. Different from classical evolutionary computation, IGA takes users' subjective feedback via their interactions with a system, and then models this feedback to create the fitness function (Lee et al. 2001). IGA has obtained optimized solutions with varying degrees of success in graphic design, such as brochure design (Quiroz et al. 2009), fashion design (Ito et al. 2008) and poster design (Kitamura and Kanoh 2011); in web page design (Monmarche et al. 1999; Brabrand 2008); and in other types of UI design.

As discussed in the introduction, the key to personalized UI optimization design is the evaluation function, which characterizes how the UI contents need to be adjusted to match users' preferences; the content itself is optimized based on the evaluation outcome (Liapis et al. 2012). For example, some researchers defined the cost function (evaluation function) as corresponding to some quantitative measure of interaction cost, such as the ability to navigate between widgets or manipulate widgets, and a user's performance in doing so (Gajos et al. 2010). They developed a prototype system, SUPPLE, which minimized the cost values to rebuild UI layouts automatically according to these measures. However, SUPPLE required users to complete multiple sets of pointing and selection tasks to collect their performance explicitly. Moreover, for visual aesthetics, it is difficult to construct an explicit evaluation function, as evaluating visual aesthetics involves subjective decisions on stylistic preferences and lacks quantitative standards (Eisenstein and Puerta 2000).

On the other hand, interactive evolutionary computation algorithms, such as IGA, can be used to tackle this kind of optimization problem (Takagi 2001). Unlike classical evolutionary computation, interactive evolutionary computation does not require a mathematical expression of an evaluation function to indicate the quality of a candidate solution; instead, subjective feedback from a human user can be used. This feedback is used to produce a new set of candidates, referred to as a new generation. This process of evaluation and candidate production is repeated until a desired solution has been reached. Researchers have employed interactive evolutionary computation algorithms and obtained optimized solutions with varying degrees of success in graphic design, such as brochure design (Quiroz et al. 2009), fashion design (Ito et al. 2008) and poster design (Kitamura and Kanoh 2011); in web page design (Monmarche et al. 1999; Brabrand 2008); and in other types of UI design. All of this past work collected the user's evaluation feedback by recording their operations in a task trace. For example, some researchers developed a system that displayed web page style sheets, asked users to manually select those that looked best and best represented their preferences, and then followed these UI styles to make further optimizations accordingly (Monmarche et al. 1999). Similarly, others developed a system for designing the visual layout of posters (Kitamura and Kanoh 2011); users were asked to explicitly rate each design solution, and the system then generated new posters whose layouts were similar to the highest-rated ones.

2.2 Eye tracking based interactive optimization

Some studies have used implicit interaction schemes to obtain users’ subjective feedback, such as eye tracking-based interactions. For example, Holmes and Zanker used an eye tracker to capture gaze fixations, and confirmed the reliability of such a measure of visual attention as representative of aesthetic preferences (Holmes and Zanker 2008). Furthermore, psychologists have found that eye-movement measures can produce good fitness functions for visually evolutionary algorithms (Holmes and Zanker 2012). When conducting research on gaze-data driven optimization, Pallez, Brisson and Baccino asked users to find the lightest color square amongst a set of presented colored squares, and they found that users’ eye-movement data had specific patterns (e.g., concentrated on some light color squares) during the searching process (Pallez et al. 2008). Pallez et al. (2007) recorded eye-movement data measures, such as total amount of time spent on a screen region, and combined them with IGA to propose the Eye-Tracking Evolutionary Algorithm (E-TEA). Our own previous work initially explored how to employ eye tracking-based IGA for optimizing the visual appearance of products (Cheng and Liu 2012; Cheng et al. 2010).

However, these eye tracking-based approaches have only been applied to very simple optimization problems, for example, those that involve a small number of features to be optimized. In addition, they did not propose effective visualization and interaction mechanisms for collecting eye-movement data input, such as how to display the solutions on the screen so as to filter out involuntary eye-movement data. Finally, they did not perform detailed evaluations to validate the effectiveness and feasibility of the eye tracking-based systems.

2.3 Challenges

This past research combining eye tracking and IGA helped us to understand how to generate personalized UIs; however, there are still several challenges left to be addressed:
  1. Existing work has only been applied to simple optimization problems, focusing on optimizing one or two visual attributes without involving the design features of real UIs. For example, the research concentrated only on color (Pallez et al. 2007) or geometric shape optimization (Holmes and Zanker 2008);

  2. Existing work did not analyze why specific eye-movement measures and parameters can be used to infer users' subjective preferences, e.g., there are no quantitative models for predicting users' preferences or analyses of the visual cognition process;

  3. There has been no research on how to support users' gaze-based operation with appropriate interaction and visualization mechanisms during the optimization process, e.g., effective visualization and interaction mechanisms for collecting eye-movement data input, or for filtering out noisy eye-movement data;

  4. There has also been no specific evaluation to validate the effectiveness and feasibility of the resulting applications or prototype systems.


So in this paper, we develop a novel approach that builds on the combination of eye tracking and IGA. The user study shows that it not only reduces or eliminates the explicit interactions during the human evaluation process, but also maintains quick optimization with high quality. We describe our approach next in detail.

3 Eye tracking-based preference inference

3.1 Visual attention mechanism of top-down and bottom-up

To select useful eye-movement measures for preference inference, we should first clarify what the user's eye-movement patterns are. All eye-movement patterns are driven by our visual attention mechanisms, of which there are two main kinds: "top-down" and "bottom-up" (Borji and Itti 2013).

On one hand, the “top-down” mechanism drives the majority of the users’ eye gaze through their tasks, goals, motivations or other subjective factors; it is a task-driven, voluntary process (Itti and Koch 2001). Previous research found that different kinds of tasks would lead to different eye-movement patterns. For example, Koivunen et al. asked users to look at a set of images repeatedly with different tasks, such as reporting the first impression, and rating usability and aesthetics levels (Koivunen et al. 2004). They found that users looked at the same images in different patterns (e.g., as seen through the resulting spatial distribution of eye gaze fixation) depending on the tasks.

On the other hand, even given the specific tasks we mentioned above that drive top-down eye-movement, users’ eye-movement will still be influenced by other non-task factors. The “bottom-up” mechanism represents users’ visual focus based on characteristics of a visual scene (Nothdurft 2005), some of whose parts appear to an observer to stand out relative to their neighboring parts (Borji and Itti 2013), for example, one red dot among other black dots, referred to as the “pop-up” effect (Baldassi and Burr 2004). In particular, the “bottom-up” mechanism can drive some stimulus-based eye-movement, and it is a stimulus-driven and involuntary process (Borji and Itti 2013). For example, it is often the case that a single area of interest (AOI) will draw the user’s visual attention in an image within the first 200 ms (Tzanidou et al. 2005), mostly due to a “pop-up” effect, where one part of the image appears more salient than the others. In our previous research, we also found that the AOIs identified in users’ first fixation were not the preferred AOIs selected by the user (Cheng and Liu 2012). In fact, these initial AOIs only included visual objects with very salient colors or shapes compared to other objects on the same screen.

Both visual attention mechanisms usually occur in the same scenario: one of them dominates a person's visual attention for some time, and then yields to the other. However, previous work has not considered both mechanisms at the same time. Hence, we analyze the eye-movement patterns during the UI optimization process based on these two visual attention mechanisms. Based on the top-down mechanism, we define the task used in the following experiment to stimulate participants' search behavior; and based on the bottom-up mechanism, we filter out "noisy" data. For example, in the first few moments of browsing the UI solutions to find the task object, the first fixation is usually generated by visual "distraction", so it can be treated as noise when performing preference inference and fitness computation.

3.2 Preference inference based on eye-movement measures

3.2.1 Pre-study

When conducting research on gaze-data driven optimization, some researchers asked users to find (using visual search) the lightest color square among a set of presented colored squares, and found that users' eye-movement data had specific patterns (e.g., concentrating on some light color squares) during the search process (Pallez et al. 2008). Our own previous research found that task-driven eye-movement during a product's color or shape evaluation tasks can indicate a user's visual preference (Cheng and Liu 2012; Cheng et al. 2010).

So we conducted a pre-study to explore how to use eye-movement patterns for inferring users' preferences. We recruited 8 participants (3 females and 5 males) from the local participant pool, aged from 21 to 31 years old; none of them were color-blind or had any visual impairment. We asked each participant to browse eight different music player UI design solutions shown on a computer screen (denoted as a solution set), and then tell us which was their most preferred one (based on visual aesthetics). Each participant repeated this task 10 times, each time with a different UI design set. We recorded participants' eye movements with an SMI iView X RED 250 Hz eye tracker. As shown in Fig. 1, we divided the screen into nine regions (each considered one area of interest, AOI), and each of them included one UI solution except the center one. In total, there were 10 screens for each participant to browse. In Fig. 1, the small black dots represent fixations, and the black lines between two fixations represent saccades.
Fig. 1

Example of eye-movement data visualizations from a sample participant

3.2.2 Task driven eye-movement analysis

In our approach, on one hand, we should define a specific task to generate eye-movement data that can reveal users' visual preferences about possible UI solutions. On the other hand, the task we assign to users should be non-intrusive from an interaction standpoint. However, many previous eye tracking-based interaction applications have been intrusive. For example, in the context of gaze-based typing (Majaranta and Räihä 2002), users had to fix their gaze on a virtual key on the screen for more than a pre-set amount of time, and the system then analyzed the gaze dwell time to determine whether to trigger the keystroke. Obviously, this type of task requires unnatural eye movements, and it is hard for users to execute them well. In contrast, Pallez, Brisson and Baccino found that participants could understand and execute visual search tasks easily (Pallez et al. 2008). Building on this past work, we do not ask users to fix their gaze for a long time on a preferred UI solution, nor for a short time on a disliked UI solution. Instead, we allow participants to simply browse (a kind of visual search) the UI candidates for their most preferred one. We can then analyze the eye-movement patterns during their visual search process to infer their preferred UI solutions.

During the task execution process of searching or browsing for the most preferred UI solutions, we found that there were three kinds of visual thinking activities that drove participants to generate different eye-movement patterns accordingly:
  • Locate: in this phase, the participant tracked the most visually salient elements (e.g., "pop-up") with his first few gazes, or looked at the white space in the image, and then located the main interaction area (UI solution) for further information;

  • Evaluate: in this phase, the participant checked specific UI solutions (already found in the Locate phase) that were in line with his personal preference;

  • Confirm: in this phase, the participant compared some candidate UI solutions, and made a final decision to confirm which UI solution was the most preferred.

We now describe these different eye-movement patterns with more detail. As shown in Fig. 1, we identified saccadic eye-movements that covered all the AOIs that were “Located” by the participant. Furthermore, the fixations within the AOIs represented the “Evaluate” phase. The amount of time spent during these fixations can often reveal the outcome of the evaluation. As the users are very sensitive to the most satisfactory and unsatisfactory solutions, they spend a short amount of time in choosing them from the population (Anderson 2009). In fact, Gong, Yao and Yuan used the time users spent on manual selections for “satisfactory” and “unsatisfactory” individuals to indicate users’ preferences for design optimization applications (Gong et al. 2009). Similarly, we can also use the gaze dwell time on each individual UI solution to infer the user’s preference (evaluation outcome) for each UI solution. Dwell time is defined as the time during one gaze visit to an AOI, from entry to exit (Holmqvist et al. 2011). For ease of computation, we use the number of fixations on the same AOI to replace its dwell time duration, as there is a high positive correlation with the dwell time, for a given user and task (Cheng and Liu 2012; Cheng et al. 2010). Additionally, we also observed some repeated saccades among some AOIs, indicating that the participant was comparing these AOIs and trying to “Confirm” which one was the most preferred.

We also asked the participants for their subjective feedback after the pre-study to help us validate these eye-movement patterns with their real impressions. On one screen, a participant told us that he disliked the UI solution in all AOIs, except AOIs A, B and C (labeled in Fig. 1). We found that these AOIs were the only ones with a significant number of fixations. The other AOIs had very few fixations. The participant told us that he compared the UI solutions in AOIs A, B, and C a number of times because he thought all of them could have been his preferred choice. In the end, he selected AOI A as his most preferred UI solution. This behavior was consistent among all of our pre-study participants. They did not have many fixations on their least preferred solutions, and had many on their most preferred solutions. However, during the “Confirm” process, fixation count is not sufficient for inferring which is the most “satisfactory” UI solution. The number of saccade transitions between the preferred solutions is higher, matching our earlier assessment that participants made many comparisons among them.
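The measures discussed above can be extracted from a time-ordered sequence of fixations labeled by AOI. The sketch below is our own illustration (the function name and data layout are assumptions, not the prototype's code): it counts fixations per AOI and counts saccade transitions whenever two consecutive fixations land in different AOIs.

```python
from collections import Counter

def aoi_measures(fixation_aois):
    """Compute per-AOI fixation counts and pairwise saccade transition
    frequencies from a time-ordered list of AOI labels (one per fixation)."""
    fixation_count = Counter(fixation_aois)
    transitions = Counter()
    for prev, curr in zip(fixation_aois, fixation_aois[1:]):
        if prev != curr:
            # Count the unordered pair: A -> B and B -> A are the same comparison.
            transitions[frozenset((prev, curr))] += 1
    return fixation_count, transitions

# Example: a scanpath that repeatedly compares AOIs A and B (a "Confirm" pattern).
counts, trans = aoi_measures(["A", "A", "B", "A", "B", "C"])
```

In this example, AOI A accumulates three fixations and the A–B pair accumulates three transitions, matching the pattern of repeated comparison between preferred candidates described above.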

3.2.3 Eye-movement measures based preference inference

Based on the above analysis, fixation count and saccade transition frequency (i.e., number of saccades between different pairs of AOIs) can serve as an indication of users’ preferences. In addition, other research has found that a user’s pupil diameter also represents interest in a visual stimulus; positive feelings about a visual object will increase the user’s pupil diameter (Pallez et al. 2010). Hence, we selected fixation count, saccade transition frequency and pupil diameter as the measures to use to infer users’ preferences when browsing potential UI candidates.

The most preferred UI solution on each screen (each presenting 8 UI candidates), as specified by each participant, was given a score of "1", while the others were given a score of "0". To identify the relationship between the preference for UI designs and the predictors (fixation count, saccade transition frequency and pupil diameter), we carried out a binary logistic regression analysis (Peng et al. 2002). We found that the full model, tested against a constant-only model (a simple prediction of preference vs. non-preference without using any predictors), was statistically significant, indicating that the set of 3 eye-movement features reliably distinguished the most preferred UI solution from the other solutions (χ²(3) = 179.625, p < 0.05). Nagelkerke's R² of 0.576 indicated a moderately strong relationship between the predictors and the preference grouping.
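A binary logistic regression of this kind can be reproduced in miniature. The sketch below fits the model by plain gradient ascent on the log-likelihood over a small synthetic data set; the `fit_logistic` helper and the toy rows are illustrative assumptions, not the study's actual data or statistical tooling.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a binary logistic regression by gradient ascent on the
    log-likelihood; returns (weights, intercept)."""
    n = len(X[0])
    w, c = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gc = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + c
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p                      # gradient term of the log-likelihood
            for j in range(n):
                gw[j] += err * xi[j]
            gc += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, gw)]
        c += lr * gc / len(X)
    return w, c

# Toy rows: [fixation count, saccade transition frequency];
# label 1 = the screen's most preferred solution, 0 = the rest.
X = [[12, 5], [11, 6], [10, 4], [2, 1], [3, 0], [1, 1], [2, 2], [4, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w, c = fit_logistic(X, y)
```

On this toy data, the fitted weight on fixation count is positive, so solutions that attract many fixations receive higher predicted preference probabilities, mirroring the direction of the result reported above.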

As can be seen in Table 1, the Wald chi-square test demonstrated that only fixation count (p < 0.05) and saccade transition frequency (p < 0.05) made significant contributions to the prediction; pupil diameter (p > 0.05) was not a significant predictor. With these results, we can create a predictive equation (Engelbrecht 2007) representing the probability of participants' preferences when browsing the UI design solutions. Given the non-significance of pupil diameter, we left it out of the equation. Hence, the equation is:
Table 1

Binary logistic regression analysis of participants' preferences in browsing the UI solutions

Variables in the equation          B
 Fixation count                    0.062
 Saccade transition frequency      − 0.203
 Pupil diameter                    − 0.280
 Constant                          − 3.794
$$ fitness = \frac{{e^{{w_{1} \cdot fc + w_{2} \cdot fst + c}} }}{{1 + e^{{w_{1} \cdot fc + w_{2} \cdot fst + c}} }} $$
where fitness is the likelihood that a UI solution is selected as the preferred UI solution, fc denotes fixation count, and fst denotes saccade transition frequency. w1 = 0.062 and w2 = − 0.203 are the regression coefficients, and c = − 3.794 is the intercept. As fitness nears 1, the user is more likely to prefer the given UI solution. Next, we discuss how to use this equation as part of our UI optimization approach.
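The equation above can be transcribed directly into code using the reported coefficients (the function name is ours):

```python
import math

def fitness(fc, fst):
    """fitness = e^z / (1 + e^z), with z = w1*fc + w2*fst + c,
    using the regression coefficients reported in Sect. 3.2.3."""
    w1, w2, c = 0.062, -0.203, -3.794
    z = w1 * fc + w2 * fst + c
    return math.exp(z) / (1.0 + math.exp(z))
```

Because w1 is positive and w2 is negative, fitness increases with fixation count and decreases with saccade transition frequency, and it always lies strictly between 0 and 1.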

4 IGA and eye tracking based UI optimization

4.1 Fitness computation

The fitness function guides evolution operations such as reproduction (e.g., copy), mutation, and recombination (e.g., crossover); these operations direct the search through the solution space, thus generating optimized solutions (Takagi 2001; Kim and Cho 2000). For example, a solution with a high fitness score has more opportunities to survive and generate offspring, which means it (or its offspring) is more likely to be selected as the best solution by the end of the evolution process.

Different from classical evolutionary computation, IGA takes users' subjective feedback via their interactions with a system, and then models this feedback to create the fitness function (Lee et al. 2001). Psychologists have found that eye-movement measures can produce good fitness functions for visual evolutionary algorithms (Holmes and Zanker 2012). Linear regression has been used to model such fitness functions from eye-movement measures, but the coefficients for each variable were hard to identify. Previous research specified these coefficients from empirical experience (Pallez et al. 2007; Cheng and Liu 2012). Pallez et al. (2010) generated a linear regression formula by training linear neural networks, but this formula was only verified on color optimization, and the resulting accuracy (87%) was somewhat low. In contrast, in the previous section we found that a binary logistic regression predicted UI design preferences in our pre-study with high accuracy. Hence, we selected this logistic regression to model the fitness function. Additionally, to exclude the negative effect of the first fixation caused by the "bottom-up" mechanism, we compute the fixation count for the AOI that received the first fixation as fixation count = (fixation count − 1); and if the fixation count ≤ 1, we set the fitness to 0 directly. Fitness values in this paper always lie between 0 and 1.
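The first-fixation correction can be sketched as follows. We apply the "fixation count ≤ 1 sets fitness to 0" rule to every AOI's adjusted count, which is one reading of the rule above; the helper names and the example fitness function passed in are illustrative, not the paper's.

```python
def corrected_fitness(fixation_counts, first_fixation_aoi, transition_freqs, fitness_fn):
    """Score each AOI, first discounting the involuntary ("bottom-up") first
    fixation: the AOI that received the very first fixation has one fixation
    subtracted, and any AOI whose (adjusted) count is <= 1 gets fitness 0."""
    scores = {}
    for aoi, fc in fixation_counts.items():
        if aoi == first_fixation_aoi:
            fc -= 1  # drop the "pop-up"-driven first fixation
        if fc <= 1:
            scores[aoi] = 0.0  # too little voluntary attention to score
        else:
            scores[aoi] = fitness_fn(fc, transition_freqs.get(aoi, 0))
    return scores

# Example with an illustrative (not the paper's) fitness function:
scores = corrected_fitness({"A": 6, "B": 2, "C": 1}, "B",
                           {"A": 3, "B": 1, "C": 0},
                           lambda fc, fst: fc / (fc + fst + 1))
```

Here AOI B received the first fixation, so its count drops from 2 to 1 and its fitness is forced to 0, while AOI A is scored normally.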

4.2 Gene coding

For IGA, each individual solution is considered to be a chromosome composed of several genes. In our setting, each UI solution is a chromosome, and the UI components and design features are the genes (gene coding). Eye tracking studies have found that panels, buttons, icons and other UI components that constitute the visual appearance of the UI influence the user's evaluation outcomes (Goldberg and Kotval 1999), so we selected these components and their features as the genes in this work. To compute over them easily, the components and their features are coded as binary strings; for the music player UI optimization in this paper, each chromosome was represented as a 30-bit binary string. The genes include absolute and relative positions (e.g., the distance between two components), sizes (or text font sizes), shapes (e.g., the ratio of width to height for a panel) and colors. We divided the wireframe of the UI into 11 different components, each of which had several genes. Some of these genes are shown in Fig. 2, and some examples of gene definitions and their value ranges (for ease of understanding, the values are presented in their decimal forms) are shown in Table 2.
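A chromosome of this kind can be packed and unpacked as a fixed-width bit string. The field names and bit widths below are hypothetical (the text does not give the exact 30-bit layout); only the 30-bit total matches the paper.

```python
# Hypothetical gene layout: names and widths are illustrative, not the
# paper's actual layout. The widths sum to the paper's 30 bits.
GENE_LAYOUT = [("panel_ratio", 4), ("u1_x_offset", 5), ("u2_radius", 5),
               ("color_r", 4), ("color_g", 4), ("color_b", 4), ("u5_height", 4)]

def encode(genes):
    """Pack decimal gene values into one 30-bit binary string (a chromosome)."""
    bits = ""
    for name, width in GENE_LAYOUT:
        v = genes[name]
        assert 0 <= v < 2 ** width, f"{name} out of range"
        bits += format(v, f"0{width}b")
    return bits

def decode(bits):
    """Unpack a chromosome back into named decimal gene values."""
    genes, pos = {}, 0
    for name, width in GENE_LAYOUT:
        genes[name] = int(bits[pos:pos + width], 2)
        pos += width
    return genes
```

Encoding and decoding are exact inverses, which is what lets the evolution operations below work on raw bit strings while the renderer works on decimal gene values.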
Fig. 2

Components and their genes of the music player UI

Table 2

Examples of the gene definitions and value ranges

Gene definition
 Ratio of panel width to panel height
 Distance from u1's center to the left border of the layout / height of the whole layout
 u2's radius / height of the whole layout
 Color (RGB)
 u5's height / u4's radius

4.3 Evolution operations

IGA has three evolutionary operations: copy, crossover, and mutation (Kim and Cho 2000). Copy passes a gene directly from a parent generation to an offspring generation. Crossover swaps parts of the binary strings from the parent generation to generate new offspring. Mutation inverts some bits of the binary strings from the parent generation to generate offspring. It is also important to understand when the evolutionary operations are applied. We want to avoid the search quickly falling into local optimum solutions (i.e., the algorithm quickly identifies a small set of good solutions and then cannot improve on them, even though better solutions, known as global optimum solutions, exist) or failing to converge (i.e., the algorithm cannot identify any good solutions, or identifying them takes a very long time). Additionally, we use design constraints (specific value ranges for genes) to keep the evolution smooth and the UI solutions of good quality, e.g., avoiding drastic mutations such as moving a UI component from the top-left to the bottom-right corner, or generating drastically different colors.
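The crossover and mutation operations on binary-string chromosomes can be sketched as below; the helper names are ours, and the per-bit mutation rate is an illustrative parameter standing in for the design constraints just discussed (a low rate keeps mutations small).

```python
import random

def single_point_crossover(p1, p2, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom, rng, rate=0.05):
    """Flip each bit independently with probability `rate`; keeping the
    rate low avoids the drastic jumps discussed above."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b
                   for b in chrom)

rng = random.Random(42)
child1, child2 = single_point_crossover("1" * 30, "0" * 30, rng)
```

Copy needs no helper: the parent string is passed to the offspring generation unchanged.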

4.4 Algorithm flow

Our algorithm is inspired by the Eye-Tracking Evolutionary Algorithm (Pallez et al. 2007), and it includes the following steps:

Step1. Define the number of generations to use, Max; set the probabilities for the crossover and mutation operations to a = 0.8 and b = 0.2, respectively, for each gene; initialize the count n of new UI solutions for the next generation; generate the first generation of UI solutions randomly (UI components and their design features, such as positions and colors, are coded by genes, denoted as binary strings, and these binary strings are randomly generated) and display them on the screen;

Step2. Ask the user to browse the candidate UI solutions with the goal of identifying her most preferred one; record the eye-movement data measures: fixation count, and saccade transition frequency over the presented UI solutions; when the user identifies her most preferred UI solution for this generation, she presses a key to stop the data collection;

Step3. Compute the fitness value for each UI solution (f(i) = fitness(i), i = 1, 2, …, 8, where f(i) is the fitness of UI solution i, equivalent to the value of fitness from the equation in Sect. 3.2.3); then use the fitness-proportionate roulette-wheel selection algorithm (De Jong, An analysis of the behavior of a class of genetic adaptive systems, Ph.D. dissertation, University of Michigan, 1975) to select UI solutions to be copied, crossed over, or mutated to generate the next generation; denote the two UI solutions with the highest f(i) as s and t, to be used in Step4;

Step4. If n < 8, generate a random number m (0 ≤ m ≤ 1). If b ≤ m < a, apply single-point crossover to s and t to obtain two new UI solutions s′ and t′ for the next generation; if m < b, apply mutation to s and t to obtain two new UI solutions s″ and t″ for the next generation; if a ≤ m, copy s and t to the next generation directly; then update the count of new UI solutions, n = n + 2. Otherwise, go to Step5;

Step5. Display the UI solutions for next generation;

Step6. If current generation number is smaller than predefined Max, go to Step2; else go to Step7;

Step7. Stop the IGA process;

Step8. Display all predicted preferred solutions from each generation at the same time, and have the user indicate which is her preferred solution(s); and then export it (them) as images.
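The fitness-proportionate roulette-wheel selection used in Step3 can be sketched as follows (the function name and the uniform fallback for an all-zero-fitness population are our own assumptions):

```python
import random

def roulette_select(population, fitnesses, k, rng):
    """Fitness-proportionate (roulette-wheel) selection: draw k solutions,
    each with probability proportional to its fitness; fall back to a
    uniform draw when every fitness is zero."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(k):
        if total == 0:
            chosen.append(rng.choice(population))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for sol, fit in zip(population, fitnesses):
            acc += fit  # walk the wheel until the pointer r is covered
            if acc >= r:
                chosen.append(sol)
                break
    return chosen

picked = roulette_select(["s1", "s2", "s3", "s4"],
                         [0.0, 0.9, 0.0, 0.0], 5, random.Random(1))
```

Because selection is proportional to fitness, a solution with most of the total fitness mass is drawn almost every time, which is how high-fitness UI solutions gain more opportunities to produce offspring.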

Although eye tracking-based IGA can reduce user fatigue, users can still become tired if they experience too many generations. This fatigue will negatively affect their eye-movement data, which then may not represent their real preferences. To avoid fatigue, Takagi recommended that the maximum number of generations for IGA should be 10–20 (Takagi 2001). However, with a limited number of generations, IGA will probably find local optimum solutions rather than global optimum solutions. To reduce the likelihood of this occurring, we can adjust the evolution operations, e.g., increase the probability of mutation so that many different UI solutions appear in each generation. We will discuss this issue in more detail later.

5 Prototype system

5.1 Hardware

We used the SMI iView X RED remote eye tracker and the iView X SDK V3.1.0 (also used in the pre-study) to record gaze data (the gaze sampling rate was configured to 250 Hz). Its operating distance is 60–80 cm, and it supports free head movement (40 cm × 20 cm at a 70 cm distance). In this configuration, the gaze sampling accuracy is 0.5°. It works with most glasses and contact lenses. The UI solutions were shown to participants on a 19-inch LCD with a resolution of 1600 × 1200.

5.2 Implementation of IGA

We divided the screen into a 3 × 3 grid of equally sized squares (each grid square contained an AOI except for the center square). The eye tracker could distinguish each AOI on the screen; each AOI was defined by two points, its top-left and bottom-right corners. If the coordinates of a fixation fell between an AOI's two points, that AOI's fixation count was increased by one. We also recorded the first fixation for each generation, which occurred within the first 200 ms of the participant seeing the screen, so that it could be filtered out.
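The AOI hit test and first-fixation filter described above can be sketched as follows; representing fixations as (x, y, onset_ms) tuples is our assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class AOI:
    """Axis-aligned AOI defined by its top-left and bottom-right corners."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom

def count_fixations(fixations, aois, skip_before_ms=200):
    """Count fixations per AOI, discarding any fixation that begins
    within the first skip_before_ms of the screen being shown
    (the first-fixation filter described above)."""
    counts = [0] * len(aois)
    for x, y, onset_ms in fixations:
        if onset_ms < skip_before_ms:
            continue                      # drop the initial orienting fixation
        for i, aoi in enumerate(aois):
            if aoi.contains(x, y):
                counts[i] += 1
                break                     # AOIs do not overlap
    return counts
```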

We developed two versions of the prototype system, eDesign and mDesign. eDesign uses preferences gathered implicitly via eye tracking, while mDesign does not use eye tracking; instead, it has users explicitly rate each UI solution on a 5-point Likert scale (1 = really dislike, 5 = really like).

5.3 Visualization mechanism

eDesign has three different screens, as shown in Fig. 3.
Fig. 3

Visualization frames represented on the screen

Initial screen displays a countdown number from three to one in the center of the screen. It is used to lead a participant’s gaze to the center of the screen at the beginning of each generation. This reduces the impact of “position effects” (e.g., most people browse screens from top-left to bottom-right) on fixation locations. After the three-second countdown, the Solution Screen appears automatically.

Solution screen displays eight candidate UI solutions in eight AOIs (one solution per AOI) with nothing in the center of the display. We avoid displaying solutions in the center because participants' eyes are naturally attracted to it. Moreover, if participants compare two solutions that are diametrically opposite, their gaze will cross through the center area; a solution placed there would have its estimated fitness disrupted by these crossing fixations (Pallez et al. 2007). Once the participant is done viewing the current generation of solutions, she presses the spacebar, causing the Mask Screen to appear.

Mask screen displays a blank gray screen for a given period (e.g., 2 s). This screen acts as a visual mask, allowing any high contrast stimuli from the previous screen to fade, reducing the persistence of vision (Holmes and Zanker 2008). In addition, such a screen can reduce visual fatigue.

On the Solution Screen, we randomly assign UI solutions to positions on the screen to avoid "position effects", i.e., the UI solution at the top-left may otherwise receive the most fixations simply because users tend to browse from the top-left. As an individual UI solution can be presented more than once across multiple generations, the random presentation also limits the impact of contextual effects, as users' decisions about a preferred solution are always relative to the surrounding solutions (Holmes and Zanker 2008).
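The randomized placement can be sketched as shuffling the eight solutions into the eight non-center cells of the 3 × 3 grid (the function and parameter names here are ours):

```python
import random

def assign_positions(solutions, grid=3, seed=None):
    """Randomly map len(solutions) UI solutions onto the non-center
    cells of a grid x grid layout, returning {(row, col): solution}.
    The center cell is deliberately left empty, as in the Solution Screen."""
    center = (grid // 2, grid // 2)
    cells = [(r, c) for r in range(grid) for c in range(grid)
             if (r, c) != center]
    assert len(solutions) == len(cells), "one solution per non-center cell"
    order = list(solutions)
    random.Random(seed).shuffle(order)    # fresh random layout each generation
    return dict(zip(cells, order))
```

Calling this once per generation gives each solution an equal chance of appearing at any position, which is what counters the position and contextual effects described above.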

6 User study

6.1 Experimental procedure

We recruited 15 new participants (denoted P1–P15; 5 female, 10 male) from the local participant pool. All were university students with no relationship to our research team, and none majored in computer science or any other engineering discipline. Their ages ranged from 20 to 25 years (M = 23.9, SD = 0.9). None were color-blind, but five were nearsighted and wore glasses or contact lenses. We used a within-subjects design and randomly assigned the participants to two groups: one group used the eDesign system first and then the mDesign system, while the other used the systems in the reverse order.

Before each experiment, the eye tracker was calibrated for each participant by having them gaze at nine different points on the screen. When using the eDesign system, participants were initially shown two screens of UI solutions as training (with none of these solutions appearing later in the optimization process). When using the mDesign system, for training, we asked each participant to browse two screens of UI solutions (these solutions did not appear later in the optimization process) and then rate (for visual aesthetics) each UI solution on our 5-point Likert scale. In both systems, we set the maximum number of generations to 10, to avoid the impact of user fatigue.

Both the eDesign and mDesign systems recorded the most preferred UI solution of each generation inferred by the system, and re-displayed them all again after the optimization process. Then each participant selected several re-displayed UI solutions to confirm their most preferred ones.

Finally, we conducted brief interviews with the participants via an anonymous online survey to elicit their subjective feedback about their fatigue in using each system, their preference for using them, and their satisfaction with the results.

6.2 Results

6.2.1 Evolution convergence

After several generations of evolution, a population tends to lose its diversity due to convergence, and the result is a set of similar instances being presented to the user (Liapis et al. 2012). In our case, that means most UI design solutions on the same screen (i.e., in a single generation) will be similar, and most will be considered as being preferred by participants. Hence, convergence is an important criterion for validating the feasibility of our eDesign system.

Figure 4 shows the UI design solutions generated by the eDesign system from generation 1 to 10 for one participant, P2. We find that generation 1 (in which UI solutions are generated randomly) has the most variability in the UI design. Gradually, more and more similar UI designs appear, especially in generations 9 and 10, where the designs on the screen are nearly all the same.
Fig. 4

Examples of UI design solutions from generation 1–10 for P2

Figure 5 shows the fitness, calculated using Eq. (1) described earlier (with normalization applied at the end), that each solution in a given generation is the participant's most preferred solution. In particular, it shows the average fitness across all the solutions in each generation. As the design solutions converge, we expect the fitness of any one solution being the preferred one to decrease. In addition, the graphs in Fig. 5 show some fluctuations. This indicates inconsistencies in the individual solution fitness values provided by the user (Pallez et al. 2010), because participants look for what they like, but their preferences are influenced by what they see (i.e., what is available to be viewed) (Holmes and Zanker 2012; Glaholt and Reingold 2009; Shimojo et al. 2003). On the other hand, after the experiments, some participants told us that in the early generations they were not sure which UI solution was the most preferred. For the last one or two generations, participants' preferences were more stable, as they could more easily select a preferred UI solution, so the curve was almost horizontal.
Fig. 5

Fitness (average) for the UI solutions in each generation, from generation 1 to 10 for all participants

Generally speaking, it is difficult to show the convergence of our approach with a quantitative analysis as is done with classical evolutionary algorithms, because our approach is based on subjective feedback (Kim and Cho 1999). However, our results and analysis show that our participants identified their preferred UI solutions using the eDesign system, validating the feasibility of our approach. The screenshots show that the UI solutions converge to very similar styles, and our analysis shows that these match participants' preferences. In the following sections, we describe the advantages of the eDesign system compared to the mDesign system.

6.2.2 Efficiency

We recorded the task completion time in seconds (ten generations of evolution, by mouse click and by eye tracking) for each participant. We found that participants completed their tasks faster with the eDesign system (M = 195.9, SD = 48.7) than with the mDesign system (M = 263.3, SD = 85.4). The difference was significant by a paired t test (t(9) = 2.317, p < 0.05).
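For reference, the paired t statistic reported here is computed from per-participant differences in completion time. The standard-library sketch below (with made-up sample data in the usage note) shows the calculation only; it is not the authors' analysis code:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for matched samples x and y, e.g. each
    participant's completion time under two systems.
    t = mean(d) / (stdev(d) / sqrt(n)), where d are pairwise differences."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

For example, `paired_t([1, 2, 3, 4], [2, 3, 5, 6])` evaluates to about −5.196; a negative t indicates the first condition was faster.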

6.2.3 Subjective feedback

We collected the participants’ subjective feedback using a 5-point Likert scale questionnaire (1: very low; 5: very high).
  1. For the question "How do you rate your fatigue when using the system?", the rating for the mDesign system was M = 2.8, SD = 0.9, and the rating for the eDesign system was M = 2.1, SD = 0.3. There was a statistically significant difference between these ratings via a Friedman analysis of variance by ranks (χ²(1) = 5.000, p < 0.05). This means that participants felt less fatigue when using the eDesign system than the mDesign system.

  2. For the question "How do you rate your satisfaction with the UI solutions generated by the system?", the mDesign system was rated M = 3.6, SD = 0.7, and the eDesign system M = 4.1, SD = 0.6. There was no statistically significant difference in this rating using a Friedman analysis of variance by ranks (χ²(1) = 2.667, p = 0.102). While participants did not feel that the eDesign system produced better solutions than the mDesign system, they did provide some useful feedback. P8 said "In my opinion, the eye tracker may capture my gaze with some errors, so I thought the eDesign system may have generated some UI solutions I dislike". In fact, the eye tracker was quite accurate in capturing participants' gaze data; we believe this misperception (expressed by multiple participants) reduced their subjective satisfaction with the eDesign system.

  3. For the question "How do you rate your preference about the system's interaction?", the rating for the mDesign system was M = 3.1, SD = 0.6, and the rating for the eDesign system was M = 3.9, SD = 0.7. This difference was statistically significant using a Friedman analysis of variance by ranks (χ²(1) = 5.444, p < 0.05). This means that participants preferred the eye tracking-based interaction over the manual selection interaction. One participant (P3) told us in the interview: "The eye tracking-based system was very interesting, and it could guess my thoughts, and then displayed my loved design styles", and "I hate clicking the items when somebody asks me which is my preferred; but this system didn't need me do that any more, I just moved my eyes, and everything was done". Another participant (P4) said: "Eyes are the windows of our hearts; and the eye tracking-based system can reveal my preferences directly without using mouse clicks. And on the other hand, I think I change my preferences more or less during mouse clicks. So I think the eye tracking based interaction had much more accuracy." Only one participant (P8) did not prefer the eye tracking-based interaction; he said in the interview: "It's hard to use the eye tracking based interaction. But if [calibration of the eye tracking is improved], I'd like to use it". In fact, the eye tracker performed calibration quite well, but as P8 was nervous during the calibration process, he failed the gaze calibration several times.


Furthermore, we recorded the number of UI solutions selected by the participants in the confirmation phase. For both the eDesign and mDesign systems, a total of ten UI solutions were displayed (one per generation). The more of these UI solutions participants selected, the more preferred UI solutions the system had generated, indicating that it had inferred the participants' preferences well. We conducted a Mann–Whitney U test to compare the number of preferred UI solutions participants ultimately confirmed, and found no significant difference between the eDesign system (M = 4.1, SD = 1.9) and the mDesign system (M = 3.1, SD = 1.9); U = 33.5, p > 0.05. This means that the eDesign system performed similarly to the mDesign system, but without requiring users to provide explicit feedback about their preferred solutions. Although both systems found ten preferred UI solutions for each participant, in the confirmation phase participants selected only some of them as their most preferred, not all. The explanation is that in the experiment phase, the participant selected one UI solution as the most preferred for a given generation by comparing the UI solutions within that generation; in the confirmation phase, the participant was not asked to make comparisons, but instead to select all solutions that she liked. In most cases, participants left out some solutions as not preferred.

7 Discussion

We now discuss some issues with our approach that impact how it can be applied, and address some of the limitations of our research.

7.1 The trade-off of visualization and optimization performance

How to configure the makeup of each generation is key in IGA-based applications. Most IGA-based UI optimization design systems present only between eight and twelve solutions on each screen (Monmarche et al. 1999; Gong et al. 2009). The population size (number of solutions per generation) is limited by the number of individual images that can be spatially displayed on a screen (Takagi 2001). With this limitation, these IGA-based UI optimization design systems usually converged relatively quickly, but likely to local optimum solutions instead of global optimum solutions. The same problem existed in our eDesign system, which usually converged in around 6–8 generations (population size of eight, in a 3 × 3 grid), limiting our participants from seeing more of the different UI solutions they might have preferred. Obviously, increasing the population size will generate more diverse UI solutions. On the other hand, presenting too many solutions on the same screen will decrease efficiency accordingly. We therefore increased the visualization density to 4 × 4, 5 × 5 and larger grids to display more UI solutions on each screen. We conducted a quick pilot study to explore these alternative grid sizes: we displayed 4 × 4, 5 × 5, 6 × 6, 7 × 7 and 8 × 8 grids, respectively, and asked a participant to perform the same task described earlier (browse the displayed UI solutions for the most preferred one, for ten generations). The task completion time for each grid size (averaged over the ten generations) is shown in Fig. 6.
Fig. 6

Time cost (error bar: standard deviation) for the different grid sizes displayed on the screen

These pilot results indicate that the participant had to pay more visual attention when more UI solutions were displayed, increasing the time and decreasing efficiency accordingly. Moreover, it also influenced the participant's evaluation accuracy, because: (1) the relative size of each AOI or UI solution decreased, so although the eye tracker could still track eye movements with sufficient fidelity, participants may not have been able to see design details clearly; and (2) the difficulty of making a decision by comparing many different solutions also increased.

To address this trade-off, as an alternative for displaying more solutions (or increasing population size), we can use multiple screens for a single generation. For example, we could generate 16 solutions, and randomly split these into 2 groups of 8. We could show one grouping and then the other, and use the combined results to produce the next generation. We expect that this approach will more easily achieve improved solutions, if not global optimum solutions. We will conduct a user study to validate this method in the future.
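The multi-screen idea can be sketched as a random partition of an enlarged population into screen-sized groups; the names and sizes below are illustrative:

```python
import random

def split_screens(population, per_screen=8, seed=None):
    """Randomly partition a population (e.g. 16 solutions) into groups
    of per_screen solutions, one group per Solution Screen. Fitness
    gathered across all screens would then be pooled before breeding
    the next generation."""
    shuffled = list(population)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + per_screen]
            for i in range(0, len(shuffled), per_screen)]
```

Because each screen still shows only eight solutions, the per-screen viewing cost stays constant while the effective population size doubles.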

7.2 Complexity of gene coding

For the music player UI optimization in this paper, each chromosome was represented as a 30-bit binary string. However, for a more complex UI, the binary string will be much longer, requiring more time for the IGA to execute. This would inhibit our ability to instantly produce the next generation of solutions to view, likely reducing satisfaction with our system. We compared performance with different binary string lengths (copying and extending the original 30-bit string): 100, 1000, 10,000, 100,000, 1,000,000 and 10,000,000 bits, for 100 generations. We executed our test program 3 times on a PC with Windows 7 Ultimate 32-bit (OS), an AMD Athlon II X2 240 2.81 GHz (CPU) and 3 GB of RAM. The time cost (averaged across the 3 trials) for solution generation at each gene string length was: 100-bit (≈ 0.000 ms), 1000-bit (0.160 ms), 10,000-bit (1.097 ms), 100,000-bit (26.520 ms), 1,000,000-bit (256.933 ms), and 10,000,000-bit (program crashed). While we did not take rendering time (displaying the solutions) into account, this time should be small relative to the time for solution generation.
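A rough way to reproduce this kind of measurement is to time single-point crossover over progressively longer bit strings. The sketch below is ours (the original test program is not described in detail), so absolute timings will differ by machine:

```python
import random
import time

def crossover(s, t):
    """Single-point crossover on two equal-length binary strings."""
    p = random.randrange(1, len(s))
    return s[:p] + t[p:], t[:p] + s[p:]

def time_generation(bits, pairs=4, reps=100):
    """Average time in milliseconds to breed one generation
    (`pairs` crossovers) on chromosomes of the given bit length."""
    s, t = '0' * bits, '1' * bits
    start = time.perf_counter()
    for _ in range(reps):
        for _ in range(pairs):
            crossover(s, t)
    return (time.perf_counter() - start) / reps * 1000.0
```

Sweeping `bits` over 100 to 10,000,000 reproduces the shape of the growth reported above: string slicing is linear in chromosome length, so the per-generation cost grows roughly linearly until memory limits are hit.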

7.3 Design support and limitations

Our approach can be applied to many design domains, such as mobile UI and Web UI design, and even product design. Generally, it can help designers streamline their workflow and refine UI solutions by incorporating end users' feedback, such as preferences, driving the design iteration loop with greater efficiency and effect. For example, the early phases of the design life-cycle, such as concept design, require designers to propose many rough solutions in a short period of time. Our eDesign system should support this process with high satisfaction and efficiency: when a designer completes a design solution, she can input it into the eDesign system to optimize that solution using her own gaze data, and after each generation she obtains many alternative design solutions. On the other hand, our approach can also support custom design by end users. To address individuals' personal choices, designers can create a set of design templates and then allow customers to use the eDesign system to optimize the templates according to their own preferences via gaze (this may be especially appropriate for customers with disabilities who cannot use a mouse or keyboard).

Despite the potential application value of our approach, it has some limitations for design practice. First, the UI optimization only considered static UI components and cannot support dynamically changing content, e.g., pop-up menus; we currently cannot distinguish eye-movement data from different UI layers or from components moving around the screen. Second, our evaluation focused on the visual aesthetics of UI design, not on functional aspects or usability problems, e.g., the affordances of the UI components or consistency across the interface. These problems cannot be evaluated by eye tracking alone and may require combining other forms of user feedback. Third, we defined each candidate UI solution as an AOI and analyzed eye gaze data at the AOI level, not at the UI-component level; for example, when the user paid attention to a particular component, e.g., a button, we did not analyze how many fixations fell on that button. Component-level information could provide preferences at finer granularity for more complex optimization.

We believe that these limitations can be addressed, and we will work towards this in our future research.

8 Conclusion and future work

We used eye-tracking data to infer users' visual preferences and model the fitness function, proposed a general approach to applying IGA in user interface design optimization, and conducted a user study to validate that our approach helps users optimize a personalized UI with high satisfaction and efficiency. This indicates that users can help UI designers design or customize personal UIs in an implicit and natural interaction manner, simply by browsing the screen.

In future work, on one hand, we will explore eye-movement data recording and analysis methods for creating dynamic UI solutions, and then explore usability problems in UI optimization that go beyond visual preference. On the other hand, to avoid local optima and overly fast convergence, we plan to refine our eye gaze-based IGA for more creative design (using divergent optimization to generate more diverse designs), e.g., allowing some less-preferred UI solutions an increased probability of becoming parents of offspring for the next generation (Kelly et al. 2008). Additionally, we will explore other optimization algorithms beyond IGA, such as Bayesian optimization (Brochu et al. 2010) and gradient-based optimization (Michalek et al. 2002), to drive UI design optimization with more efficiency and accuracy.



Acknowledgements

We thank all the participants who took part in the pre-study and the user study. This research was sponsored by the National Natural Science Foundation of China (61772468). We also appreciate all the reviewers for their constructive comments on this paper.


References

  1. Anderson, J.R.: Cognitive Psychology and its Implications. Macmillan, Basingstoke (2009)
  2. Baldassi, S., Burr, D.C.: 'Pop-out' of targets modulated in luminance or colour: the effect of intrinsic and extrinsic uncertainty. Vis. Res. 44(12), 1227–1233 (2004)
  3. Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 185–207 (2013)
  4. Brabrand, G.B.: Dynamic layout optimization for newspaper web sites using a controlled annealed genetic algorithm. Dissertation, Gjøvik University College (2008)
  5. Brochu, E., Brochu, T., de Freitas, N.: A Bayesian interactive optimization approach to procedural animation design. In: Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 103–112 (2010)
  6. Cheng, S., Liu, Y.: Eye-tracking based adaptive user interface: implicit human-computer interaction for preference indication. J. Multimodal User Interfaces 5(1–2), 77–84 (2012)
  7. Cheng, S., Liu, X., Yan, P., Zhou, J., Sun, S.: Adaptive user interface of product recommendation based on eye-tracking. In: Proc. EGIHMI, pp. 94–10 (2010)
  8. De Jong, K.A.: An analysis of the behavior of a class of genetic adaptive systems. Dissertation, University of Michigan (1975)
  9. Eisenstein, J., Puerta, A.: Adaptation in automated user-interface design. In: Proc. IUI, pp. 74–81. ACM Press, New York (2000)
  10. Engelbrecht, A.P.: Computational Intelligence: An Introduction, 2nd edn. Wiley, New York (2007)
  11. Gajos, K.Z., Weld, D.S., Wobbrock, J.O.: Automatically generating personalized user interfaces with Supple. Artif. Intell. 174(12), 910–950 (2010)
  12. Glaholt, M.G., Reingold, E.M.: Stimulus exposure and gaze bias: a further test of the gaze cascade model. Atten. Percept. Psychophys. 71(3), 445–450 (2009)
  13. Goldberg, J.H., Kotval, X.P.: Computer interface evaluation using eye movements: methods and constructs. Int. J. Ind. Ergon. 24(6), 631–645 (1999)
  14. Gong, D., Yao, X., Yuan, J.: Interactive genetic algorithms with individual fitness not assigned by human. J. Univ. Comput. Sci. 15(13), 2446–2462 (2009)
  15. Holmes, T., Zanker, J.: Eye on the prize: using overt visual attention to drive fitness for interactive evolutionary computation. In: Proc. GECCO, pp. 1531–1538. ACM Press, New York (2008)
  16. Holmes, T., Zanker, J.: Using an oculomotor signature as an indicator of aesthetic preference. i-Perception 3(7), 426–439 (2012)
  17. Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., Van de Weijer, J.: Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford University Press, Oxford (2011)
  18. Ito, F., Hiroyasu, T., Miki, M., Yokouchi, H.: Discussion of offspring generation method for interactive genetic algorithms with consideration of multimodal preference. In: Proc. SEAL, pp. 349–359 (2008)
  19. Itti, L., Koch, C.: Computational modeling of visual attention. Nat. Rev. Neurosci. 2(3), 194–203 (2001)
  20. Kelly, J., Papalambros, P.Y., Seifert, C.M.: Interactive genetic algorithms for use as creativity enhancement tools. In: Proc. AAAI Spring Symposium: Creative Intelligent Systems, pp. 34–39 (2008)
  21. Kim, H.S., Cho, S.B.: Development of an IGA-based fashion design aid system with domain specific knowledge. In: Proc. IEEE SMC, vol. 3, pp. 663–668 (1999)
  22. Kim, H.S., Cho, S.B.: Application of interactive genetic algorithm to fashion design. Eng. Appl. Artif. Intell. 13(6), 635–644 (2000)
  23. Kitamura, S., Kanoh, H.: Developing support system for making posters with interactive evolutionary computation. In: Proc. ISCID, vol. 1, pp. 48–51 (2011)
  24. Koivunen, K., Kukkonen, S., Lahtinen, S., Rantala, H., Sharmin, S.: Towards deeper understanding of how people perceive design in products. In: Proc. CADE (2004)
  25. Lee, J.H., Kim, H.S., Cho, S.B.: Accelerating evolution by direct manipulation for interactive fashion design. In: Proc. ICCIMA, pp. 343–347 (2001)
  26. Liapis, A., Yannakakis, G.N., Togelius, J.: Adapting models of visual aesthetics for personalized content creation. IEEE Trans. Comput. Intell. AI Games 4(3), 213–228 (2012)
  27. Majaranta, P., Räihä, K.J.: Twenty years of eye typing: systems and design issues. In: Proc. ETRA, pp. 15–22 (2002)
  28. Michalek, J., Choudhary, R., Papalambros, P.: Architectural layout design optimization. Eng. Optim. 34(5), 461–484 (2002)
  29. Monmarche, N., Nocent, G., Venturini, G., Santini, P.: On generating HTML style sheets with an interactive genetic algorithm based on gene frequencies. In: Proc. AE, pp. 99–110 (1999)
  30. Nothdurft, H.C.: Salience of feature contrast. In: Neurobiology of Attention, pp. 233–239 (2005)
  31. Pallez, D., Brisson, L., Baccino, T.: Towards a human eye behavior model by applying data mining techniques on gaze information from IEC. In: Proc. HCP, pp. 51–64 (2008)
  32. Pallez, D., Collard, P., Baccino, T., Dumercy, L.: Eye-tracking evolutionary algorithm to minimize user fatigue in IEC applied to interactive one-max problem. In: Proc. GECCO, pp. 2883–2886. ACM Press, New York (2007)
  33. Pallez, D., Cremene, M., Baccino, T., Sabou, O.: Analyzing human gaze path during an interactive optimization task. In: Proc. EGIHMI, pp. 12–19 (2010)
  34. Peng, C.J., Lee, K.L., Ingersoll, G.M.: An introduction to logistic regression analysis and reporting. J. Educ. Res. 96(1), 3–14 (2002)
  35. Quiroz, J.C., Banerjee, A., Louis, S.J., Dascalu, S.M.: Document design with interactive evolution. In: New Directions in Intelligent Interactive Multimedia Systems and Services-2, pp. 309–319 (2009)
  36. Semet, Y.: Interactive Evolutionary Computation: A Survey of Existing Theory. University of Illinois, Illinois (2002)
  37. Shimojo, S., Simion, C., Shimojo, E., Scheier, C.: Gaze bias both reflects and influences preference. Nat. Neurosci. 6(12), 1317–1322 (2003)
  38. Takagi, H.: Interactive evolutionary computation: fusion of the capacities of EC optimization and human evaluation. Proc. IEEE 89(9), 1275–1296 (2001)
  39. Tzanidou, E., Petre, M., Minocha, S., Grayson, A.: Combining eye tracking and conventional techniques for indications of user-adaptability. In: Proc. INTERACT, LNCS, vol. 3585, pp. 753–766 (2005)

Copyright information

© China Computer Federation (CCF) 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
  2. Information School, University of Washington, Seattle, USA
