1 Introduction

Nowadays, online shops such as amazon.com or zalando.com are a well-established and indispensable part of our everyday life. They often offer better product availability, time savings and higher comfort compared to physical stores, since purchases can be made from the comfort of one’s home [35]. These factors have led to an enormous growth that is anticipated to continue [23, 34]. While only \(10.1\%\) of worldwide purchases were made online in 2017, this share is expected to grow to \(15.5\%\) by 2021 (see footnote 1).

While the design and interaction techniques of online shops have changed and improved over the last years due to a stronger focus on user experience and usability [11, 16, 41], they are still mainly focused on product presentation and purchase transactions. The search bar in particular, where powerful algorithms are used to improve the efficiency of customer searches, has received the most attention [26]. In comparison, product search and exploration using a menu interface have been largely neglected, even though, contrary to common belief, product searches via a search bar have been shown to be neither necessarily preferred by users nor generally more effective [14]. Especially when an online shopper does not know the explicit name of the desired product or is simply “just looking”, a search bar might not be a suitable solution. Customers who want to browse through the products of a specific category, e.g. because they are looking for a gift, rely on menus for product exploration. Thus, it is important to further investigate possibilities for menu optimization.

Although the benefit of visualizations in online shops is well known, menu representations are still mainly text-based. Furthermore, the underlying categorization depicted by the menu is often inconsistent across different online shops, which makes it difficult for customers to understand the underlying classification [27]. Therefore, in this paper we explore the use of the Apartment metaphor [1] as a menu representation for online shops. It exploits the users’ familiarity with an apartment environment and categorizes products based on their association with rooms and furniture to make them easier to explore. We present the development of the product categorization as well as a spatial visualization that shows the floor plan of an apartment or a store for exploring the different categories. Our comparative evaluation shows that an apartment-based online shop menu outperforms classic linear store-based menu structures in task performance, usability and user experience. Hence, this work contributes to the further development of online shop menus by:

  • Conducting a study with an online shop prototype that evaluated four combinations of menu representation and categorization, based on related work, with respect to task performance and user preference.

  • Providing user insights and actionable improvements for designing and developing future shopping environments.

2 Background

Prior work relevant to the approach presented here addresses (1) menu representations and (2) menu categorization; both are discussed in this section. Furthermore, we present an analysis of the menu structures of current popular online stores.

2.1 Menu Representation

Menu-guided interfaces have long been a prominent user interface component of software applications and websites and serve to structure the underlying information hierarchically. Accordingly, a variety of different menu representations have been explored in the past. Miller et al. [21] examined the effects of menu width and depth on speed and accuracy. Their results were confirmed by Zaphiris et al. [42], who showed that flat hierarchies are easier and faster to use, lead to better orientation and satisfaction, and shorten interaction times. This has recently also been confirmed by Zhang et al. [43] in their extensive study on web shop menus, which additionally investigated the effect of menu positioning. While early work gave no clear recommendation and only showed a tendency towards menus at the top of the screen [24], Zhang et al. showed that top-positioned menus were faster for product search, while top- and left-positioned menus were preferred. This confirms that linear menu variants are recommended for use in online shops. Based on these findings, we developed a reference menu for our comparison study.

Cockburn et al. [5] investigated different approaches to improve traditional linear menus in terms of performance and preference. They compared standard and split menus [33], and showed that frequency split menus were the fastest. However, the use of frequency split menus in an online shop might be problematic, as highlighting frequently selected but potentially unwanted objects would not lead to any improvement and could cause frustration and confusion [22]. Findlater et al. [6] used a prediction algorithm to determine the menu items that the user probably needs and showed that this enables faster menu selection. As shopping is a changing process (i.e. the offer and the interests of customers vary constantly), the use of such predictive menus in online shops does not appear promising for the process of exploration and is therefore not considered in this work.

Ahlstrom et al. [2] compared different menu designs, and their results show that a squared menu is faster and preferred, which indicates that a spatial and rasterized arrangement could lead to better usability. Similarly, Scarr et al. structured items hierarchically in a grid in their CommandMaps menu [30]. Their comparison to a ribbon menu interface (known from Microsoft Word) demonstrated that CommandMaps are faster and require fewer pointing activities, since the user no longer needs to change menu levels and benefits from spatial memory. Along the same line of exploiting spatial memory, Vrechopoulos et al. [39] examined how different “brick-and-mortar” layouts can be transferred into a virtual retail environment. They showed that a grid based on a tree structure is easier to handle, which is further supported by Griffith [7].

In summary, an adjustment of the menu representation can lead to higher objective performance [6, 21, 30] and/or better subjective preference [2, 5, 42]. Deeper menus, i.e. menus with many layers, increase complexity and slow down interaction, so that width (the number of menu items in a layer) is preferable to depth [13, 14, 21, 25, 40, 42]. Moreover, it is advisable to minimize the menu depth without excessively increasing the width, provided that the semantic data allow such a distribution [21]. The reference menu used in this paper is based on [43], representing the state of the art of traditional linear menus in current online shops. Furthermore, we developed a spatial menu representation along the lines of Vrechopoulos et al. and Griffith [7, 39].

2.2 Menu Categorization

Besides representation and arrangement, a meaningful and comprehensible categorization of the menu items is especially important for efficient menu interaction. Katz and Byrne’s investigation of menu interfaces for e-commerce environments highlighted the importance of high-quality categories [12]. This is also confirmed by the work of Tuch et al. [38], who showed that a good categorization with a high information content can lead to a stronger feeling of user-friendliness. Larson and Czerwinski [14] recognized the importance of a semantically founded categorization and integrated it into their research on the width and depth of menus. In addition, Miller and Remington [20] emphasize the necessity of considering both aspects, menu representation and categorization, due to their direct interdependence, i.e. both aspects influence each other in terms of task performance and user preferences.

Usually one differentiates between hierarchical (or faceted) categories and automated clustering [10]. While fully automated clustering according to the similarity of words or phrases has the advantage of quickly structuring information collections, it often leads to logical inaccuracies, in contrast to manually created hierarchies, which users tend to prefer [10]. Practical examples also show how the categorization of menu items influences user performance, more precisely that users work faster with optimized menu categorizations [32]. Resnick and Sanchez [28] confirm the positive effect of high-quality menu labels compared to those of lower quality. The use of meaningful and comprehensible menu labels is therefore essential.

Adam et al. [1] presented a new categorization and representation scheme to enable intuitive menu navigation. Their spatial “Apartment” metaphor maps the mental model of an apartment with different rooms (living room, kitchen, etc.) to the structure of a smart-home control interface. The top level of the categorization corresponds to the room category, followed by a device level and finally a task level that contains the possible system tasks of the selected device. A similar approach has been utilized by Speicher et al. [36] for a virtual reality store. They found that the Apartment metaphor provided excellent customer satisfaction, as well as a high level of immersion and user experience. The positive effects of the Apartment metaphor on task performance and user preferences serve as the basis and motivation for the menu categorization of the online shop prototype developed in our work.

2.3 State-of-the-Art Analysis

Table 1. Overview of state-of-the-art online shops for groceries, electronics, furniture or fashion. For the representation the first number is the menu depth, the second is the width of the top-level. These online shops have been accessed on August 21st, 2018.

Even though related work recommends abandoning linear store-based menu interfaces, most current online shops still employ them. Table 1 lists a selection of popular online shops and the categorization and representation they use. While we do not claim this to be the most representative selection, it contains some of the most frequented stores. All of them use a linear representation in which the items are arranged horizontally or vertically. Some of them also integrate a multi-column menu display, e.g. the IKEA interface contains a two-column menu at the top level. The menu width varies considerably, from 3 to 24 items, whereas the depth only ranges between 3 and 5 levels. This is in line with findings from previous work [14]. In current online shop menus, the individual menu items are predominantly text-based, while current research recommends more graphical methods. Only three of the eight online shops (REWE, CARREFOUR, IKEA) additionally integrate icons illustrating the associated text label, while three menus (CONRAD, REWE, TESCO direct) also show the number of sub-level items.

Besides the representation, the logical meaning of the underlying categorization is important. All menus considered in Table 1 use mixed categories, i.e. they follow different sorting strategies within a menu level. Mostly, however, the categories are based on a combination of assortments and themes known from physical shops. For example, the category “beverages” refers to an assortment, the category “baby” to a theme. Studies have shown that such a mixture of categories can be unclear to users, since the labels do not clearly describe the underlying information space, which is essential to give the user an overall impression of the search space [10]. Occasional users or new customers who are not familiar with the specifics of a shop interface could be particularly frustrated, resulting in a negative effect on performance and preference. In the worst case, this could lead to shopping attempts being cancelled and the shop not being visited again [18]. Our state-of-the-art analysis indicates that current online shops do not follow findings on menu optimization and tend to use established classical methods.

3 Approach

The purpose of this paper is to explore the use of the Apartment metaphor for categorization, in combination with a spatial representation, as a new menu type for online shops. To this end, as part of a pilot study, we re-categorized products from their previous categories (departments or themes and shelf names), which are more typical of a physical market, into residential categories. Based on these categories we developed a spatial menu representation as an alternative to the traditional linear menu. This results in the following menu representations and categorizations, which are compared in our main experiment:

  • Representations

    • Linear

    • Spatial

  • Categorizations

    • Store

    • Apartment

To test and compare the four menu types resulting from the combination of both representations and categorizations, they were integrated into an online shop prototype. As the basis for our menu we selected a set of 36 products that represent the core areas of online trading. These products were selected from a local hypermarket with an associated online shop, based on its frequently searched products. The 36 products belong to various traditional product areas such as food, office supplies, clothing, electronics and others.

3.1 Representations

Representations of menus vary in the arrangement of the items on the screen. This can affect the search time to find a desired menu item visually, as well as the time to point and click [2, 4, 37]. We examine two different representations here: linear and spatial.

Fig. 1. Linear (left) and Spatial (right) representations, and product area relating to the sub-level “cabinet” under the top-level “kitchen”.

Linear. The linear menu is the most common representation in today’s web interfaces such as online shops [43]. Here the menu items are arranged so that they form either a horizontal or a vertical line, usually with text labels. Since horizontal linear menus are recommended at the top of the screen [43], this combination is used as the reference menu representation. The depth and width of a linear menu depend on the underlying categorization (see Fig. 1).

Spatial. Our spatial menu representation is influenced by earlier findings on grid arrangements [2, 30] and floor maps [19], which are often used in shopping malls, e.g. as printed maps. We use a spatial representation (a map) based on real environments, where menu items are arranged according to the position they occupy in the real world, either inside a store or an apartment, depending on the categorization used (see Fig. 1). In this way we hope to exploit the users’ spatial memory and improve the performance of the menus [8, 29]. As with the linear menu representation, the depth and width of the spatial menu depend on the underlying categorization.

3.2 Categorizations

The categorization determines the semantic structure of the menu and is usually hierarchical. In this paper, the categorizations are based on a three-level hierarchy (or menu depth) with top, sub and product levels. Overall, two different categorizations are investigated in this work: a traditional store-based and an apartment-based categorization.

Store-Based. The traditional store-based categorization with different departments and themes serves as the reference point for the evaluation, as it represents the de facto standard in current online shops [43]. The top-level categories of the hierarchy correspond to product ranges that typically form separate departments, such as “milk & cheese” or “beverages”. The subordinate categories describe shelves in these delimited market areas, e.g. “cream cheese” or “lemonades”, followed by the product level as the lowest level (see Fig. 1).

Apartment-Based. The Apartment metaphor used in our approach exploits the fact that users are familiar with the structure of an apartment and the items (products) placed in it, based on everyday habits and experiences [1]. As with the store-based categorization, it is based on a three-level hierarchy: rooms (e.g. kitchen), furniture (e.g. refrigerator), and products (e.g. mustard). In order to develop these categories and define the corresponding product assignments, a pilot study was carried out to find out where users expect these products to be located inside an apartment.

4 Pilot Study

To create the needed reclassification of the selected products from store- to apartment-based categorization, we conducted a pilot study. This study is divided into two phases. In the first phase, we focus on the categorization and product allocation inside the Apartment metaphor. In the second phase, we aim to create two groups of products with the same size and approximately the same average level of difficulty (product search error rate) from the pool of the 36 selected products. Each product is then classified according to its average search error rate in order to ensure comparability of product data. This will serve as the logical basis for the main study design.

Fig. 2. The tree visualization representing the hierarchy of the apartment categories with room nodes on the first level and furniture nodes on the second level.

4.1 Phase 1: Categorization and Product Allocation

In total, 42 participants (20 female) between 18 and 58 years (\(M=31.12, SD=12.35\)) volunteered for this phase of the pilot study, which consisted of two sessions. In the first, qualitative session, we conducted a semi-structured on-site interview and asked the participants where, i.e. in which room and on or in which piece of furniture, they store or would store each of the selected products, followed by a demographic questionnaire. Most participants lived in a two-room apartment (\(M=2.17, SD=0.85\); not counting bathroom, kitchen or hallway) with an average of two inhabitants (\(M=2.38, SD=1.19\)). One third lived in a partnership (\(N=14\)), followed by singles (\(N=10\)), participants living with parents/family (\(N=9\)) and those living in a shared flat (\(N=9\)). All of them had already shopped online, most of them at least once a month (\(N=34\)). We also gathered information about the housing situation of the participants, more precisely the room types belonging to the apartment as well as the product storage habits, i.e. which furniture is used as storage space, such as milk in the refrigerator or newspapers on the table. From the interview answers we formed a set of room and furniture categories (top and sub level).

The result of the first session was a preliminary set of apartment locations (rooms and associated furniture), which are an essential part of the apartment categorization, as they form the basis for the product allocation. A total of seven room categories were considered, as these were named by well over half of the participants: bathroom (\(N=41\)), kitchen (\(N=41\)), bedroom (\(N=41\)), hallway (\(N=37\)), living room (\(N=35\)), pantry/cellar (\(N=32\)) and office (\(N=30\)). Beyond these there was very little agreement between participants; the next room in this list would be the garden with only 5 mentions. Therefore these seven rooms form the top level of our categorization.
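The room selection described above can be sketched in a few lines. The mention counts are those reported in the text; the strict-majority threshold (more than half of the 42 participants) is an assumption standing in for “well over half”, and all variable names are illustrative.

```python
# Room mentions from the first pilot session (42 participants), as reported in the text.
N_PARTICIPANTS = 42
room_mentions = {
    "bathroom": 41,
    "kitchen": 41,
    "bedroom": 41,
    "hallway": 37,
    "living room": 35,
    "pantry/cellar": 32,
    "office": 30,
    "garden": 5,
}

# Assumption: a room becomes a top-level category if more than half of the
# participants named it. The paper only says "well over half".
threshold = N_PARTICIPANTS / 2
top_level_rooms = [room for room, n in room_mentions.items() if n > threshold]

print(top_level_rooms)
```

With these counts, any threshold between 5 and 30 mentions yields the same seven rooms, so the exact cut-off is not critical here.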

After defining the rooms used for product allocation, the furniture or places within these rooms had to be specified. Since the data collected in the interviews was qualitative, characteristic keywords were chosen to organize the answers given, and furniture that is very similar in use was combined. For the example “bed linen” in the bedroom, the “built-in cupboard” (\(N=1\)), the “linen cupboard” (\(N=2\)), the “chest of drawers” (\(N=2\)) and the “cupboard” (\(N=11\)) were combined under the most frequently used keyword “cupboard”. The complete overview of the rooms and associated furniture (sub-level) in our categorization is as follows: Bathroom: cabinet, sink, sink cabinet, hook, toilet; Bedroom: bed, bedside table, cabinet, wardrobe; Storage: cabinet, washing machine; Hallway: cabinet, shoe cabinet; Kitchen: cabinet, drawer, fridge, sink, table, counter; Living Room: cabinet, computer, TV area, table; Office: cabinet, computer, desk.
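As an illustration of the merging step, the “bed linen” example can be written out as follows. The counts are taken from the text; deciding which keywords are “very similar in use” was a manual, qualitative decision, so the list of similar keywords below is simply given as input rather than computed.

```python
from collections import Counter

# Answers for "bed linen" in the bedroom, with counts as reported in the text.
answers = Counter({
    "built-in cupboard": 1,
    "linen cupboard": 2,
    "chest of drawers": 2,
    "cupboard": 11,
})

# Assumption: these keywords were judged similar in use (a manual decision).
similar_keywords = ["built-in cupboard", "linen cupboard", "chest of drawers", "cupboard"]

# The merged category is labeled with the most frequently used keyword.
label = max(similar_keywords, key=lambda keyword: answers[keyword])
merged_count = sum(answers[keyword] for keyword in similar_keywords)

print(label, merged_count)  # cupboard 16
```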

In the second session, after we had conducted and analyzed all 42 interviews, the same participants were asked to map the 36 selected products to one or multiple room-furniture pairs using an interactive web application. The web application consisted of three steps and was used to map the products to furniture and rooms (see Fig. 2). The current question ‘Where do you expect product “X”?’ was displayed at the top of the screen. The current selection was listed on the left, together with a next button to confirm the selection and proceed to the next question. On the right, the apartment categories were visualized by an interactive hierarchical tree representation. The root represented the apartment itself, followed by the rooms as top-level and the associated furniture as sub-level. This process supported multiple placements per product. In the main study, however, only one room-furniture pair was assigned to each product, namely the pair selected most frequently in this phase of the pilot study.
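A minimal sketch of how the single placement per product could be derived from the collected selections is given below. The selections are hypothetical placeholders, not study data, and the tie-breaking rule is an assumption, since the text does not describe how ties were handled.

```python
from collections import Counter

# Hypothetical selections: each entry is a (room, furniture) pair chosen by a
# participant; one participant may contribute several pairs per product.
selections = {
    "mustard": [
        ("kitchen", "fridge"),
        ("kitchen", "fridge"),
        ("kitchen", "cabinet"),
        ("storage", "cabinet"),
        ("kitchen", "fridge"),
    ],
}

def most_frequent_pair(pairs):
    """Return the (room, furniture) pair that was selected most often.

    Assumption: ties go to the pair that was seen first.
    """
    return Counter(pairs).most_common(1)[0][0]

placement = {product: most_frequent_pair(pairs) for product, pairs in selections.items()}
print(placement)
```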

4.2 Phase 2: Product Groups

After selecting a suitable product set in the conceptual process, each product was assigned a level of difficulty to ensure better comparability. This was necessary to control learning effects within a categorization during the main study. The second phase took place in the local hypermarket, which provided us with the selected product data set, the floor plan for the spatial store-based interface, and the department and shelf names. Each participant (\(N=30\), 14 female) was provided with a worksheet on which the products had to be assigned to one of the store-based top categories. For each product, all incorrect answers were identified and the error rate was calculated. Overall, the average error rate over all products was \(21.67\%\) (\(SD=27.20\)). This finding also shows that there is considerable potential for improvement, at least for this particular retailer, since customer expectations often do not correspond to the actual categorization. Based on these per-product error rates, two comparable product groups were formed for the main study in order to eliminate learning effects in relation to the different categorizations. To this end, the total set of 36 products was divided into two groups of 18 products each with comparable error rates (A: \({\sim }21.1\%\), B: \({\sim }22.2\%\)).
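The text does not state how the two groups were constructed. One simple way to obtain two equally sized groups with similar average error rates is to rank the products by error rate and deal them out in an A-B-B-A pattern; the sketch below shows this under that assumption. The error rates are generated placeholders, not the study data, and the function names are illustrative.

```python
import random

def split_balanced(error_rates):
    """Split products into two equally sized groups with similar mean error rate.

    Products are ranked by error rate (highest first) and dealt out in an
    A-B-B-A pattern: in every block of four, group A receives the hardest and
    the easiest product, group B the two in between.
    """
    ranked = sorted(error_rates, key=error_rates.get, reverse=True)
    group_a = [p for i, p in enumerate(ranked) if i % 4 in (0, 3)]
    group_b = [p for i, p in enumerate(ranked) if i % 4 in (1, 2)]
    return group_a, group_b

def mean_rate(group, error_rates):
    """Average error rate (in percent) of the products in a group."""
    return sum(error_rates[p] for p in group) / len(group)

# Hypothetical per-product error rates in percent (placeholders only).
rng = random.Random(0)
error_rates = {f"product_{i:02d}": rng.choice(range(0, 101, 10)) for i in range(1, 37)}

group_a, group_b = split_balanced(error_rates)
print(len(group_a), len(group_b))
print(round(mean_rate(group_a, error_rates), 1), round(mean_rate(group_b, error_rates), 1))
```

For 36 products this pattern always yields two groups of 18, and the difference between the group means cannot exceed the range of the error rates divided by 18.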

5 Main Study

We conducted an experiment to compare the developed Apartment metaphor for online shopping with more common menu representations with respect to task performance (success rate, task completion time), user preferences (user experience, usability and task workload), and unmet needs. We evaluated two menu representations (Linear vs. Spatial) and two categorizations (Store vs. Apartment). Our main hypotheses were defined as:

  • \(H_{1-1}\): The task can be performed more efficiently using Apartment categorization with regard to task performance.

  • \(H_{1-2}\): Apartment is preferred over Store categorization with regard to user preferences.

  • \(H_{2-1}\): The task can be performed more efficiently using Spatial representation with regard to task performance.

  • \(H_{2-2}\): Spatial is preferred over Linear representation with regard to user preferences.

5.1 Participants

For the main experiment, 24 different unpaid participants (12 female) were recruited from the university’s campus; they were aged between 20 and 33 years (\(M = 25.3\), \(SD = 3.6\)). Most of the participants live in a two-room apartment (\(Median=2, M=2.04, SD=0.86\); not counting bathroom, kitchen or hallway) with two inhabitants (\(Median=2\)). Seven participants live with their parents/family (\(29.17\%\)), six each in a partnership and in a shared apartment (\(25\%\) each) and five live alone (\(20.83\%\)). On a 7-point scale from never to daily with regard to online shopping frequency, most participants purchase online regularly, i.e. \(62.5\%\) at least several times per month. All participants shop online at least once per month with a computer/laptop (\(N=24\)), and fewer do so with tablets (\(N=7\)) or smartphones (\(N=14\)).

5.2 Apparatus

The experiment was conducted on a MacBook Pro running Mac OS (10.11.6) connected to a 24-in. monitor. A standard wireless mouse with medium speed settings was used as input device. The software was displayed in Google Chrome (v58.0.3029.110, 64-bit). HTML, CSS and JavaScript were used for the different menu interfaces in the prototype. Additionally, the JavaScript D3 library was used for data visualization purposes in the spatial menu condition. A database was set up using XAMPP (v7.0.5-0); data exchange was realized using PHP.

5.3 Evaluated Conditions

We evaluated two menu representations (linear, spatial) and two categorizations (store, apartment) for product search. In the linear menus, the items of each of the three menu levels (top, sub and product level) are arranged horizontally, with the levels stacked vertically. In the spatial representations, items are arranged in a grid representing a virtual floor plan. Here, the participant clicks on the area with the corresponding text label to select a top-level category (see Fig. 1). As a result, the corresponding furniture or shelf icons are displayed within the selected area. After clicking on a shelf or piece of furniture, the associated products are displayed in the product area below the menu area (see Fig. 1). Although some products in the apartment concept would normally be placed in several positions, only one position was considered for each product in the main study in order to improve the comparability of the conditions. This unique product placement is the most frequently mentioned placement from the pilot study.

5.4 Design

The experiment used a within-subjects design with two independent variables with two levels each (representation: linear and spatial; categorization: store- and apartment-based) and five dependent variables related to the performance (task completion time, success rate) and preference of users (user experience, usability, workload). All representation and categorization conditions were counterbalanced using a Latin square. In order to eliminate learning effects concerning the different categorizations, the total set of 36 selected products was split into two comparable and equally difficult groups of 18 products each. Aside from training, this amounted to: 24 participants \(\times \) 2 representations \(\times \) 2 categorizations \(\times \) 18 product searches \(=\) 1728 trials.
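The counterbalancing and the trial count can be made concrete with a short sketch. The text only states that a Latin square was used; the balanced variant below (each condition also follows every other condition exactly once) and the assignment of participants to rows by participant index are assumptions for illustration.

```python
def balanced_latin_square(n):
    """Build a balanced Latin square for an even number of conditions n.

    Each row is one presentation order. Every condition appears once per
    position and directly follows every other condition exactly once.
    """
    first = [0]
    lo, hi = 1, n - 1
    take_lo = True
    while lo <= hi:
        if take_lo:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_lo = not take_lo
    return [[(x + row) % n for x in first] for row in range(n)]

conditions = ["Linear/Store", "Linear/Apartment", "Spatial/Store", "Spatial/Apartment"]
square = balanced_latin_square(len(conditions))
orders = [[conditions[i] for i in row] for row in square]

# Assumption: participant p (0-based) receives row p mod 4, i.e. 6 participants per order.
participants = 24
order_for_participant = {p: orders[p % len(orders)] for p in range(participants)}

total_trials = participants * 2 * 2 * 18
print(orders[0])
print(total_trials)  # 1728
```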

5.5 Task

During the main study, each participant performed a series of 18 search trials for each combination of the two representations (linear and spatial) and categorizations (store and apartment). The goal was to find and select a specific product and confirm the selection. The top-level categories were the departments of a local store for the store-based categorization and rooms for the apartment-based categorization. The sub-level represented the shelves for the store-based and the furniture for the apartment-based categorization (as developed in our pilot study). While the top- and sub-level visualizations differed depending on categorization and representation, the product level was displayed similarly in all conditions by a product image and a text label. A trial was successfully completed when the correct target product was selected and confirmed within a time limit of 30 s, and counted as failed otherwise.

5.6 Procedure

After being welcomed by the experimenter, the participant was given an informed consent form. Each participant used all four menu types in Latin square order to search for products in the prototypical online shop. Before using a particular type of menu, the participant was introduced to the tested condition by watching a demonstration video that showed an example search task step by step. Then a set of 18 search trials was carried out in random order. Before each trial, the name and image of the target product appeared for five seconds. Then the product had to be found in the three-level menu and selected by clicking on it and confirming the selection. After each trial set per menu type, three post-task questionnaires (UEQ [15], SUS [3], NASA TLX [9]) were filled out by the participant to collect user preference ratings. The entire process was then repeated for the other three menu types. Afterwards, a final post-study questionnaire was answered, which included demographic questions. In total, the main study lasted about 50–60 min.

5.7 Evaluation Metrics

We measured task performance in the form of objective data (task completion time, success rate) and collected data describing users’ preferences regarding the methods in the form of subjective feedback (user experience, usability, workload).

Task Performance. For each participant, we measured task completion time and success rate across the 18 product searches, in accordance with the common standards for product searches in online shops [17], as follows:

  • Task Completion Time (s) was measured from the moment the countdown reached zero until the product selection was confirmed.

  • Success Rate (%) was computed as the number of correct product searches divided by the total number of product searches per set of trials for a tested menu interface.
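The two performance metrics can be sketched as follows. The trial records are hypothetical; the 30 s limit comes from the task description, and restricting the completion time to successful trials follows the analysis reported in the results. Treating a trial that ends exactly at 30 s as successful is an assumption.

```python
TIME_LIMIT_S = 30.0  # time limit per trial, as stated in the task description

def is_success(trial):
    """A trial succeeds if the correct product was confirmed within the time limit."""
    return trial["correct"] and trial["time_s"] <= TIME_LIMIT_S

def success_rate(trials):
    """Percentage of successful trials in one set of trials."""
    return 100.0 * sum(is_success(t) for t in trials) / len(trials)

def mean_completion_time(trials):
    """Mean completion time in seconds over successful trials only."""
    times = [t["time_s"] for t in trials if is_success(t)]
    return sum(times) / len(times) if times else None

# Hypothetical trial records for one participant and one menu type.
trials = [
    {"correct": True, "time_s": 8.0},
    {"correct": True, "time_s": 12.0},
    {"correct": False, "time_s": 30.0},
    {"correct": True, "time_s": 10.0},
]

print(success_rate(trials))          # 75.0
print(mean_completion_time(trials))  # 10.0
```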

User Preference. We collected a variety of subjective feedback to assess user experience and workload as well as usability, which is important in online shop applications. For this purpose, we used the following questionnaires:

  • User Experience Questionnaire (UEQ): rated on a 7-point scale [15]; the higher the score the better.

  • NASA Task Load Index (NASA TLX): rated on a 21-point scale [9]; the lower the rating the better.

  • System Usability Scale (SUS): rated on a 5-point scale [3]; the higher the score the better.
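For the SUS, the 5-point item ratings are usually converted into a single score between 0 and 100. The sketch below shows this standard scoring; that the study used exactly this procedure is an assumption, although the SUS scores reported on a 0 to 100 scale in the results are consistent with it.

```python
def sus_score(responses):
    """Compute the standard SUS score (0-100) from ten ratings on a 1-5 scale.

    Odd-numbered items are positively worded and contribute (rating - 1);
    even-numbered items are negatively worded and contribute (5 - rating).
    The sum of the contributions is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5

print(sus_score([3] * 10))    # 50.0
print(sus_score([5, 1] * 5))  # 100.0
```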

6 Results

Throughout this results section and the following discussion, we use abbreviations, fill patterns and colors to indicate the two menu representations and categorizations we tested: Linear (striped), Spatial (solid), Store-based (orange), Apartment-based (blue). The results of the experiment were analyzed using IBM SPSS Statistics 25. For the data analysis, we calculated a repeated measures MANOVA and follow-up univariate ANOVAs. Using Wilks’ lambda, there were significant differences between representations (\(\varLambda =0.13, F(6,18)=19.38, p < 0.01\)) and between categorizations (\(\varLambda =0.03, F(6,18)=92.70, p < 0.01\)) for all tested dependent variables.

6.1 Task Performance

Task performance is measured quantitatively through task completion time and success rate. These metrics indicate to what extent users are able to cope with the menu interfaces. They are computed per participant and condition as the average over the 18 trials.

Success Rate. The success rate describes the ratio between the number of successful and the total number of product searches. A product search is considered successful if the correct product has been selected and confirmed within the maximum execution time of 30 s. Spatial/Apartment achieved the highest average success rate (\(M=98.61, SD=11.72\)), and Linear/Store the lowest (\(M=69.44, SD=46.12\)). Univariate ANOVAs showed significant differences with regard to success rate between the representations (\(F_{1,23}=58.97, p<0.01, \eta ^2=0.72\)), with Spatial (\(M=90.05, SD=29.96\)) better than Linear (\(M=80.44, SD=39.69\)), as well as between the categorizations (\(F_{1,23}=212.25, p<0.01, \eta ^2=0.90\)), with Apartment (\(M=95.02, SD=21.76\)) better than Store (\(M=75.46, SD=43.06\)). There was also an interaction effect between representation and categorization for success rate (\(F_{1,23}=4.38, p<0.05, \eta ^2=0.16\)).
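The reported effect sizes can be related to the F statistics. The text does not say which variant of \(\eta ^2\) is reported; the values above appear consistent with partial eta squared, which can be recovered from an F value and its degrees of freedom. The sketch below rests on that assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared derived from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values for success rate as reported in the text, each with (1, 23) degrees of freedom.
for name, f_value in [("representation", 58.97), ("categorization", 212.25), ("interaction", 4.38)]:
    print(name, round(partial_eta_squared(f_value, 1, 23), 2))
```

Under this assumption the three success-rate effects come out as 0.72, 0.90 and 0.16, matching the reported values.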

Task Completion Time. Task completion time was measured as the elapsed time in seconds to complete a single product search. The timer started when the countdown reached zero and stopped automatically when the correct product had been selected and confirmed. In this analysis, we only included successful product searches. Univariate ANOVAs showed significant differences in speed between the representations (\(F_{1,23}=100.44, p<0.01, \eta ^2=0.81\)), with Spatial (\(M=9.47, SD=5.28\)) faster than Linear (\(M=10.38, SD=5.77\)), and between the categorizations (\(F_{1,23}=610.66, p<0.01, \eta ^2=0.97\)), with Apartment (\(M=8.79, SD=4.82\)) faster than Store (\(M=11.31, SD=6.04\)).

Fig. 3. Speed measurements (seconds) of successful trials. (Color figure online)

Fig. 4. User Experience Questionnaire (UEQ) results with respect to comparison benchmarks [31] (see shaded boxes). For readability, this figure shows only the range between −1.5 and 2.5, while the full scale ranges from −3 to 3. (Color figure online)

6.2 User Preferences

User Experience. We chose the UEQ [15] as an end-user questionnaire to measure user experience (UX) in a quick and straightforward way. On the overall UX scale from \(-3\) to 3, Spatial/Apartment achieved the highest average score (\(M=2.10, SD=0.53\)), and Linear/Store the lowest (\(M=-0.76, SD=1.15\)). Representations (\(F_{1,23}=26.35, p<0.01, \eta ^2=0.53\)) and categorizations (\(F_{1,23}=99.20, p<0.01, \eta ^2=0.81\)) differed significantly regarding the overall UX score. With respect to representation, Spatial was rated higher (\(M=1.24, SD=1.30\)) than Linear (\(M=0.20, SD=1.50\)); with respect to categorization, Apartment was rated higher (\(M=1.63, SD=1.02\)) than Store (\(M=-0.19, SD=1.33\)). The data was also analyzed with respect to the six UEQ factors. Spatial/Apartment outperformed all other menu interfaces across the UEQ subscales and was even rated ‘excellent’ on all of them, followed by Linear/Apartment and Spatial/Store, and finally Linear/Store (see Fig. 4).

Usability. The SUS [3] is one of the most popular questionnaires for measuring attitudes toward system usability. It is a reliable and valid measure of perceived usability. Spatial/Apartment had the best average score (\(M=89.17, SD=8.16\)), and Linear/Store the worst (\(M=50.73, SD=26.66\)). Univariate ANOVAs showed significant differences between the representations (\(F_{1,23}=8.32, p<0.01, \eta ^2=0.27\)) and between the categorizations (\(F_{1,23}=41.44, p<0.01, \eta ^2=0.64\)). An interaction for usability between categorizations and representations was also found (\(F_{1,23}=8.32, p<0.01, \eta ^2=0.27\)). Comparing the categorizations, Apartment had a higher average usability score (\(M=85.05, SD=13.95\)) than Store (\(M=56.35, SD=23.54\)). With regard to the representations, Spatial (\(M=75.57, SD=19.89\)) scored higher than Linear (\(M=65.83, SD=26.93\)).
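For readers unfamiliar with how a SUS score on the 0 to 100 scale is obtained, the standard scoring rule can be sketched as follows. This is the commonly used SUS procedure, not code from the study; the function name and input format are illustrative assumptions.

```python
def sus_score(responses: list[int]) -> float:
    """Compute a SUS score (0-100) from ten responses on a 1-5 scale.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        total += (response - 1) if item_number % 2 == 1 else (5 - response)
    return total * 2.5


print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0, best possible rating
print(sus_score([3] * 10))                        # 50.0, all-neutral rating
```

The per-condition means reported above would then be averages of such scores over the 24 participants.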

Fig. 5. The overall NASA TLX workload scores. (Color figure online)

Task Workload. The task workload of the tested menu representations and categorizations was assessed with NASA TLX [9], where lower scores indicate lower workload. On average, Spatial/Apartment was rated the best (\(M=22.10, SD=10.78\)) and Linear/Store the worst (\(M=61.04, SD=18.69\)) (see Fig. 5). Univariate ANOVAs showed significant differences between the representations (\(F_{1,23}=18.00, p<0.01, \eta ^2=0.44\)) and between the categorizations (\(F_{1,23}=134.24, p<0.01, \eta ^2=0.85\)). Spatial received lower workload scores (\(M=35.68, SD=22.06\)) than Linear (\(M=45.96, SD=24.27\)), and Apartment received lower scores (\(M=26.49, SD=16.19\)) than Store (\(M=55.15, SD=21.17\)). We also conducted a multivariate ANOVA with regard to the six NASA TLX subscales. Between the four conditions, we found significant differences for all subscales except physical demand; between the representations, only for temporal demand, effort and frustration; and between the categorizations, for all subscales (Table 2):

Table 2. NASA TLX contains six subscales (MD: Mental Demand, PD: Physical Demand, TD: Temporal Demand, PF: Performance, EF: Effort, FR: Frustration). The values refer to this format: (\(F(x,92)=..~, p<..~, \eta ^2=..~\)), with \(x=3\) for the menu types and \(x=1\) for representations and categorizations.
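How the six subscales combine into an overall workload score can be sketched as below. The text does not state whether the weighted or the unweighted variant of NASA TLX was used, so this sketch assumes the unweighted "Raw TLX" mean of the six subscale ratings on a 0 to 100 scale; the function name and the example ratings are made up for illustration.

```python
SUBSCALES = ("MD", "PD", "TD", "PF", "EF", "FR")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Unweighted mean of the six NASA TLX subscale ratings (assumed Raw TLX)."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


# Made-up ratings of one participant for one menu condition.
example = {"MD": 30, "PD": 10, "TD": 25, "PF": 20, "EF": 25, "FR": 10}
print(raw_tlx(example))  # 20.0
```

If the weighted variant was used instead, each subscale would additionally be multiplied by a weight obtained from pairwise comparisons before averaging.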

7 Discussion

7.1 Task Performance

The average success rate of about 99% of the spatial apartment-based menu is significantly higher than that of all other tested conditions, and contrasts with the linear store-based menu, which had the lowest rate of 69%. The speed results are based on successful product searches only and show that the task was executed faster with spatial apartment-based menus than with all other menus. This suggests that its intuitive categorization and spatial representation help the user to better understand the underlying information space. In addition, the visual cues in the spatial menus seem to facilitate the visual search process.
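As a worked check of the gap between these two conditions, using the mean success rates reported in Sect. 6.1, the absolute difference is about 29 percentage points, which corresponds to a relative improvement of roughly 42% over the linear store-based menu:

\[
\frac{98.61 - 69.44}{69.44} = \frac{29.17}{69.44} \approx 0.42
\]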

The clear differences in task performance indicate that a spatial grid-based menu in conjunction with an apartment-based categorization was more efficient than all other tested combinations, which supports \(H_{1-1}\) and \(H_{2-1}\). Since the menu with the worst average task performance is the menu type commonly used in today’s online shops (see Sect. 2.3), our results show a remarkable potential for improvement.

7.2 User Preferences

The overall user experience results and highest ratings in all six UX subscales show that there is a clear advantage of the spatial apartment-based menu over all other tested menus. In particular, the significantly higher ratings of “Perspicuity”, “Efficiency” and “Dependability” speak for more understanding, user-friendliness and reliability. Here, too, the visual hints of the spatial representation seem to facilitate the search process. High ratings in “Attractiveness”, “Stimulation” and “Novelty” indicate that the more realistic and vivid presentation of the apartment categories seems to lead to a new and appealing experience.

Similar trends can be observed in the usability results. Here, too, the two apartment-based menus achieved significantly better results. In addition, spatial menus achieved significantly higher usability values than linear menus within the respective categorization. Since values around 68 can already be interpreted as average to moderate (see footnote 2), the two apartment-based interfaces with an average score of 85.05 can be described as ‘excellent’. The spatial apartment-based menu even reaches a value of 89.17, which suggests that comprehensibility is further supported by the illustrative character of the spatial representation. Overall, the results clearly show that spatial apartment-based menus were more usable than the other tested menus.

The task workload results show that the store-based menus scored more than twice as high (55.15) as the apartment-based menus (26.49). This indicates that the classification by rooms and furniture is cognitively less demanding than the classification by product worlds and shelves. The spatial menus also led to a significantly lower workload than the linear menus. On the individual subscales of the NASA TLX, the spatial menus led to significantly less temporal demand, effort and frustration, which indicates that the visual cues facilitate and accelerate the orientation process. The apartment categorization additionally reduces the mental demand, since no complex and strenuous considerations were necessary. Overall, the new categorization and representation are less demanding and frustrating.

In summary, taking into account the results of user preference, it can be stated that there is a significantly higher preference with regard to user experience, workload and usability for the spatial apartment-based menu. Thus \(H_{1-2}\) and \(H_{2-2}\) can be accepted, since they are fulfilled in all aspects considered.

7.3 Observations and Comments

The participants’ comments also confirm the overall impression of the previously discussed results. In the post-study questionnaire, the participants were explicitly asked for their opinion of the tested menus. Here, 23 out of 24 participants preferred the spatial apartment-based menu, and only one preferred the linear apartment-based menu. Participants justified this choice with terms such as “intuitive”, “easy”, “entertaining”, “clear” or “fast”. The results of a pair comparison also showed that the spatial apartment-based menu was the most preferred (\(98.61\%\)) and the linear store-based menu the least preferred (\(15.28\%\)). This was also confirmed by comments made after the corresponding demonstration videos were shown, such as “so hard” or “it will take a long time” for the linear store-based menu and “this was cool”, “great” or “very intuitive” for the spatial apartment-based menu. The majority of participants would like this combination to be integrated into current online shops.

7.4 Limitations

The major drawback of this study is the limited number of 36 products tested. This affects in particular the remarkably high ratings in the preference questionnaires, which are often close to the optimal rating. Such ‘excellent’ results rarely occur in practice and are probably due to the limited test conditions. The scope, and thus the number of products and categories, of real online shops is usually much larger and therefore more complex. It might well be the case that the apartment metaphor in its current form does not scale to a large number of products. A larger number of products would imply more levels of categorization, e.g. different parts of furniture. A fridge would then have to be partitioned into sublevels such as door, vegetable drawer, and other layers. Each product could further be categorized into different product variations and brands. We chose our 3-level approach for a better overview and for feasibility in our experiment. Our concept of an online shop using the apartment metaphor is aimed at products that can logically be found in a standard apartment, which excludes, e.g., garage or garden products. To cover these, we would extend the apartment to, e.g., a “home” metaphor. In addition, the exact categorization might be subject to cultural differences.

Furthermore, online stores have a wider range of functions. The implemented prototype can therefore certainly not reflect the complexity of a real online shop, but it provides a basis and new insights for rethinking menus in online shops. In addition, the selected products are mainly based on a list of frequently sought-after products from a particular market. Thus, it cannot be completely excluded that other products can be found more easily with the traditional categorization. Overall, expectations for preference evaluations and objective measurements in a fully functional online shop should be lowered to a realistic level. However, the clear significant differences show that spatial apartment-based menus should still be preferred to the others.

8 Conclusion and Outlook

Especially when it comes to explorative settings in which a user tries to find something from a menu, current online shops can be significantly improved. Even though related work recommends abandoning linear and store-based menu interfaces [2, 19, 30], most current online shops still employ them. Therefore, we investigated two representations (Linear, Spatial) and two categorizations (Store, Apartment) in an online shop prototype. The Apartment metaphor [1, 36] turned out to be an effective way to help consumers quickly and easily understand and use the offered information in terms of filtering out desired parts. Compared to the linear store-based reference menu, the spatial apartment-based menu achieved a 42% higher success rate and 42% faster search times. Spatial grid-based menus performed significantly better than linear menus, with a success rate about 12% higher on average. Excellent usability and user experience ratings indicate that spatial apartment-based interfaces increase understanding and reliability. The workload results also indicate that the intuitive apartment categories are less complex and could lead to less frustration. In addition, 23 out of 24 participants explicitly indicated that they preferred the spatial apartment-based menu. Hence, our study confirmed previous approaches and demonstrated that our approach leads to significant performance increases. While we do not claim absolute generalizability for all online stores, this work highlights the potential for improvement.

In summary, this work opens up a large new field of research for the realization of menus in online shops. New methods for an enriched and facilitated shopping experience have been introduced. While the new menu types have the potential to improve menu interaction in online shops, further research is needed to ensure that this result is maintained in a complete and comprehensive shop system. We would extend the apartment to a Home metaphor that includes a garage and a garden. Furthermore, hybrid representations could also solve problems that might occur with a larger number of products.