1 Introduction

The increasing volume of music imposes a cognitive challenge when users explore preferred music from a large collection. To overcome this, music streaming services try to organize their collections in such a way that users can easily browse and find what they would like to hear. For this purpose, different categorization methods from music information retrieval (MIR) are used to organize music (for an overview see [4, 40]).

Whereas the “genre” taxonomy has been most commonly used to organize music, popular music streaming services, such as 8tracks (http://www.8tracks.com), AccuRadio (http://www.accuradio.com), Songza (http://www.songza.com), and Spotify (http://www.spotify.com), have started to provide additional, user-centric taxonomies to better serve users with diverse music browsing needs. Derived from research that investigated how people use music in their everyday life (e.g., [67]), taxonomies such as mood and activity are being used.

Although providing different taxonomies serves the different browsing needs of users, taxonomies can start to compete with each other and thereby influence the overall satisfaction [47]. Even taxonomies that are not relevant for the search goal can distract the user [74], complicate the search process [9], and increase the search effort because of conflicting attention [54]. Therefore, understanding taxonomy preferences on an individual level is important to provide a personalized music experience. For example, music streaming services could emphasize the taxonomy that is important to the user while moving less preferred taxonomies to the background, or not showing them at all.

The amount of content within a taxonomy can further influence users’ preference strength and satisfaction with the eventually chosen item [53]. Ample research has shown that presenting more options may not always have positive effects. More options can cause overchoice (also referred to as “choice overload”), which in turn increases the difficulty of making a choice and decreases satisfaction with the eventually chosen item [6, 46, 51, 80, 86, 88].

In this work we look at the two aforementioned aspects (i.e., taxonomic music browsing strategies and the overchoice effect within taxonomies). To investigate taxonomic music browsing strategies, we explore their relation with personality traits. Personality has been shown to be a reliable predictor of human preferences. It has been shown to be an enduring factor that influences people’s behavior [56], interests, and taste [32, 59, 78]. Hence, the preference for a certain taxonomic music browsing style may be reflected through users’ personality as well. Furthermore, we look at how the amount of content presented within a taxonomy affects the occurrence of overchoice. As the effect of overchoice has been shown to be influenced by different moderators (e.g., expertise and choice set attractiveness; for an overview see Scheibehenne et al. [85]), we investigate whether the musical expertise of users influences the preference for a smaller or larger choice set. Because we rely on stable constructs, such as personality and expertise, systems can be adapted towards the specific behaviors, preferences, and needs of users, which allows them to provide a better user experience.

The research questions (RQs) that we try to answer in this work are:

RQ 1: How do personality traits relate to taxonomy (mood, activity, genre) preferences in music streaming services?

RQ 2: How does the size of the choice set influence the user experience (i.e., choice satisfaction, perceived system usefulness, and perceived system quality), and how is this moderated by expertise?

To investigate these research questions, we conducted a user study (preceded by a preliminary study) in which we simulated a music streaming service application. Among 297 participants we found that personality is related to different music browsing strategies. Furthermore, looking at the effect of the choice set size within a chosen music taxonomy, we found that musical expertise plays a moderating role in how the system is evaluated by the user. Within the mood taxonomy, participants with higher musical expertise rated the system as more useful and of higher quality when they were facing the choice set with fewer options. However, this was reversed for the genre taxonomy, where participants with higher musical expertise rated the system as more useful and of higher quality when facing a bigger choice set. The presented findings have important implications for how music interfaces should be designed in order to maximize the user experience by accommodating the specific music preferences and needs of users. Based on our work, music services could be further personalized by adapting the user interface depending on the user’s personality and level of music expertise. This allows for counteracting a decrease in user experience caused by the user interface.

Overall, we provide new insights on how music streaming services can adapt their interfaces by targeting the user browsing needs, hence supporting users in finding music that they would like to listen to. Next to that, our work makes contributions to several research fields. Firstly, we contribute to the field of personality-based preferences. We show that personality does not only explain music genre preferences [78], but that it extends to the overarching music categorizations (i.e., taxonomies) by showing that personality traits are related to different music browsing strategies (i.e., browsing for music by mood, activity, or genre). Secondly, we contribute to the decision-making literature by extending the knowledge about when and how overchoice occurs in the context of music. For this we look at the categories within a taxonomy, and show that music expertise is an important influencing factor on the evaluation of the system and chosen item.

We investigated two different RQs within one study. Therefore, the remainder of the paper is structured as follows. We first discuss the related work separately for each RQ in Section 2. After the related work, we continue with the materials (Section 3) that were used for the user study to answer RQ 1 and 2. In Section 4 we discuss the preliminary study that was necessary to define the content for the user study. Subsequently, we divided Section 5 into Study A and B, where we will treat the hypotheses, findings, and discussion related to RQ 1 and 2. We discuss the limitations and future work in Section 6. Finally we round off the paper by drawing conclusions in Section 7.

2 Related work

We review the literature about taxonomies and categories according to the two parts of our user study. The first part discusses work that is related to the taxonomies and personality traits (Study A; Section 2.1), and the second part focuses on the overchoice effect (Study B; Section 2.2).

2.1 Study A - taxonomies

In the following sections we discuss how taxonomies influence users’ decision making, and how personality is able to predict the preference for a taxonomy.

2.1.1 Taxonomic influence

The effects of overchoice have been well studied. However, most research on overchoice in consumer decision making has investigated choice satisfaction by focusing on choices in isolation (i.e., choices within a taxonomy; e.g., [6, 46, 51, 80, 86, 88]). For example, Iyengar and Lepper [51] investigated overchoice by using a specific assortment of jams, whereas Bollen et al. [6] created movie recommendations by using only the Top-5 and Top-20 movies. Although they found effects of overchoice on choices in isolation, others have shown that the satisfaction with the eventually chosen item already starts at the overarching categorizations: the taxonomies (e.g., [43, 47, 74]). Herpen et al. [47] asked their participants to choose a shirt from clothing brochures and found that taxonomies can distract in the decision making process. Their participants experienced higher decision effort and had more difficulties grasping the selection when the brochure showed several clothing taxonomies (e.g., shirts, pants, shoes) than when it showed content of one taxonomy (i.e., only shirts). Complementary taxonomies can cause consumers to extend their decision making time even when these taxonomies are not relevant for the initial search goal [74]. When taxonomies are placed next to each other, they start to compete, and this is exacerbated when they consist of features that are unique and not directly comparable [43].

Although different taxonomies in music streaming services serve the same goal of providing users with music that they would want to listen to, they also consist of unique features that are not directly comparable: the taxonomies provide users the possibility to browse for music in different ways. In general, the mood taxonomy provides users with music that is similar to how they feel, the activity taxonomy provides music that fits a specific activity, and the genre taxonomy has music categorized based on a set of stylistic criteria. Given that the features of the taxonomies are not directly comparable, they can distract the user and increase the effort of picking the right music taxonomy to continue the music browsing. In the end, it can influence the satisfaction with the eventually chosen music item.

To minimize the negative influence of competing taxonomies, we identify the intrinsic music browsing preference of the user. By identifying the user’s most preferred music browsing strategy, the system can anticipate the desired user interface. For example, the system can display the preferred music browsing taxonomy or already recommend music that is in line with a user’s music browsing strategy (e.g., [26]). In order to identify the music browsing preference of users, we rely on personality traits. We will discuss prior work related to personality in the next section.

2.1.2 Personality

Personality has been shown to be an enduring factor that influences an individual’s behavior [56], interests, and tastes [59, 78]. As personality plays such a prominent role in shaping human preferences, one can expect similar patterns (i.e., behavior, interests, and tastes) to emerge between similar personality traits [10]. Different models have been created to categorize personality, of which the five-factor model (FFM) is the most well known and widely used [69]. The FFM consists of five general dimensions that describe personality. Each of the five dimensions consists of clusters of correlated primary factors. Table 1 shows the general dimensions with the corresponding primary factors.

Table 1 The five-factor model adapted from McCrae and John [69]

There is a growing amount of psychological literature investigating the relationship between personality traits and music consumption (e.g., [32, 38, 39, 76, 78, 79, 94]). For example, music preferences were found to be correlated with personality traits. Rentfrow and Gosling [78] categorized music pieces into four music-preference dimensions (reflective and complex, intense and rebellious, upbeat and conventional, and energetic and rhythmic), and found correlations with the five general personality dimensions (i.e., openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism), such as a relationship between energetic and rhythmic music and both extraversion and agreeableness. The psychological work on personality provides valuable information for the development of domain-specific recommender systems.

Personality in personalized systems

There has been an emergent interest in how to use personality in personalized systems (e.g., recommender systems), and several directions have been proposed (e.g., [14, 21, 23, 24, 26, 28, 34, 92, 93]). For example, Tkalcic et al. [93] propose a method to overcome the “cold-start problem”Footnote 1 by including personality information to enhance the neighborhood measurement. Hu and Pu [49] have shown that personality-based recommender systems are more effective in increasing users’ loyalty towards the system and decreasing cognitive effort compared to systems that do not use personality information.

2.2 Study B - categories

In this work, we further look into the influence of the number of categories presented within each taxonomy (Section 5.3). With this we join decision-making research on overchoice. Overchoice (or choice overload) refers to the increase of choice difficulty and eventual decrease of satisfaction as the number of choices increases. Iyengar and Lepper [51] were among the first to define overchoice by testing the attractiveness of a set of 6 versus 24 types of jam. Although their results show that participants were initially more attracted to the larger (24-item) set, those who were exposed to the smaller (6-item) set were more inclined to actually buy a pot of jam (30% of the participants exposed to the smaller set bought jam, compared to 3% of those exposed to the larger set). Additionally, assessment of satisfaction showed that those who bought jam from the larger choice set were less satisfied compared to those with a purchase from the smaller choice set.

The overchoice effect has been replicated numerous times in different contexts, and was shown to affect motivation to choose as well as satisfaction with the chosen item (e.g., [6, 46, 52, 80, 88]). Shah and Wolford [88] found that the motivation to purchase black pens decreased as the assortment size increased: participants’ motivation to purchase dropped from 70% to 33%. Reutskaja and Hogarth [80] investigated overchoice in the context of gift-box prices, and found that satisfaction with the chosen gift box decreased when the number of gift boxes to choose from increased. Similar findings were shown by Haynes [46] in the context of the number of lottery prizes to choose from. Likewise, Bollen et al. [6] demonstrated a decrease in choice satisfaction in a movie recommender system as the number of movies to choose from increased.

Although there is an increased chance of a decrease in choice satisfaction, people still cherish more choice, and studies have shown that shops with a large variety of products even create a competitive advantage by providing more choices (e.g., [1, 8, 12, 13, 50, 57, 63, 68, 72]). So it seems that even though consumers risk being more dissatisfied with their choice in the end, they are still attracted to more choices. A larger choice set becomes more attractive because of the summed benefit of each option, and thereby the total benefit of the set increases [18]. However, satisfaction decreases because making the right choice becomes more difficult. The psychological cost increases as a consequence of an increased number of choices. In other words, the summed benefit of a larger choice set is outweighed by the cost of comparing each option in order to make the right decision, the increased risk of making a wrong choice, and the increased expectations of the chosen item [18, 86, 87]. As a result, a larger choice set has a higher chance of leading to decreased satisfaction or to no choice being made at all. Reutskaja and Hogarth [80] showed that satisfaction follows an inverted U-shaped curve, where at one point the total cost of the choice set grows faster than the total benefit, causing a decrease of satisfaction.
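
The cost-benefit reasoning above can be made concrete with a toy calculation. The sketch below is only an illustration: the square-root benefit, the linear cost, and both constants are arbitrary assumptions chosen to produce an inverted U shape, and they are not taken from Reutskaja and Hogarth [80] or from our data.

```python
import math

def net_satisfaction(n_options, benefit_scale=6.0, cost_per_option=1.0):
    """Toy model: concave total benefit minus linear total cost.

    Both functional forms and both constants are illustrative assumptions.
    """
    total_benefit = benefit_scale * math.sqrt(n_options)  # grows ever more slowly
    total_cost = cost_per_option * n_options              # grows at a constant rate
    return total_benefit - total_cost

set_sizes = range(1, 31)
# Size at which the toy net satisfaction peaks before costs take over.
best_size = max(set_sizes, key=net_satisfaction)

for n in (1, 6, best_size, 24):
    print(n, round(net_satisfaction(n), 2))
```

Under these assumed values, net satisfaction rises for small sets, peaks at an intermediate size, and then falls once the cost term grows faster than the benefit term.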

Apart from studies that have shown the overchoice effect, there are also studies that demonstrate an opposing view (e.g., [5, 7, 19, 58, 91, 95]). They found that reducing the variety in retail shops often results in decreased sales or no change at all. Scheibehenne et al. [85] performed a meta-analysis of 50 studies arguing against and in favor of the overchoice hypothesis, and found that the overall effect size comes close to zero. There seem to be necessary preconditions for a choice set before overchoice occurs [84, 85]. One factor is the attractiveness of the choice set. When items of a choice set are comparably attractive, and especially when they additionally consist of incomparable features, the chance of overchoice increases [15, 22]. Furthermore, a factor that has been shown to play a significant role is domain expertise [70, 85, 96]. Domain experts are less prone to be overwhelmed by the increasing number of choices, and therefore, overchoice is less likely to occur.

In order to investigate the overchoice within a music taxonomy, we first needed to create a choice set that meets the precondition (i.e., a choice set with attractive items). We conducted a preliminary study where we identified the categories that would be most attractive to our participants (see Section 4). Additionally, the moderator effect of musical expertise on overchoice is further investigated in Section 5.3.

3 Method

To investigate music taxonomy preferences for music browsing, and overchoice of categories within a music taxonomy, we created an online experiment where we simulated a music streaming service application. This application allowed us to study both RQ1 and RQ2 at the same time. The studies are divided into Study A and Study B, respectively. In the following sections we will discuss the experiment and the materials used in detail.

3.1 Procedure

To answer RQ1 and RQ2, we simulated a music streaming service application named “Tune-A-Find” (see Fig. 1 for the work-flow of the experiment). Before participants started the experiment, instructions were given stating that they were about to test a new music streaming service. We emphasized that it is important that they interact with the application in the most ideal way for them. This allowed us to minimize experience bias with any of the taxonomies. After participants agreed with the instructions they continued by interacting with Tune-A-Find.

Fig. 1

Experiment work-flow. Participants were given instructions about the study, then continued by interacting with the music streaming application (see Section 5.2 for details). After choosing a music taxonomy to continue the music browsing, participants were randomly assigned to either the small choice set (i.e., 6-categories) condition or the large choice set (i.e., 24-categories) condition. After picking a category, participants continued to the concluding questionnaires

Tune-A-Find consists of a simple interface with three taxonomies (i.e., mood, activity, and genre) for participants to browse for music (see Fig. 2 and Section 5.2). A tooltip provided users a description of each taxonomy.Footnote 2 The order of the taxonomies was randomized to prevent order effects. After participants chose a taxonomy to search music by (i.e., mood, activity, or genre), they continued on by choosing a category (i.e., type of mood, type of activity, or type of genre) within the chosen taxonomy.

Fig. 2

Screen shot of Tune-A-Find with the “Mood” tooltip

For the categories within a chosen taxonomy, participants were randomly assigned to either the small choice set (i.e., 6-categories) condition or the large choice set (i.e., 24-categories) condition (Fig. 3 and Section 5.3). The categories within each taxonomy were based on the results of the preliminary study (Section 4). We did not allow participants to go back to pick a different taxonomy. Therefore, we included a “None of the items” option. Category order was randomized, with the “None of the items” option always placed last to increase the chance that participants would naturally assess all the options first. After participants picked a category, they continued with the concluding questionnaires (i.e., user experience, musical expertise, personality, and demographics questionnaires). We tried to maximize ecological validity by not including real music recommendations (so that evaluations of participants were not influenced by the algorithm) and by stressing that the application concerned a prototype of a new music streaming service.
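
As an illustration, the assignment and ordering steps described above could be sketched as follows. The function name `build_choice_set`, the placeholder category labels, and the use of a simple unweighted random draw for the condition are assumptions made for this sketch; the actual implementation of Tune-A-Find may differ.

```python
import random

NONE_OPTION = "None of the items"

def build_choice_set(ranked_categories, rng):
    """Randomly assign a condition and return (condition_size, displayed_options).

    ranked_categories: category labels ordered from most to least attractive.
    rng: a random.Random instance, passed in so the draw is reproducible.
    """
    condition_size = rng.choice([6, 24])           # small vs. large choice set
    options = list(ranked_categories[:condition_size])
    rng.shuffle(options)                           # randomize category order
    options.append(NONE_OPTION)                    # opt-out option is always last
    return condition_size, options

# Placeholder labels (hypothetical); the real labels came from the preliminary study.
ranked = [f"category_{i:02d}" for i in range(1, 25)]
size, shown = build_choice_set(ranked, random.Random(42))
print(size, len(shown), shown[-1])
```

Appending the opt-out option after shuffling keeps it in the final position regardless of the randomized order of the other categories.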

Fig. 3

Screen shots of the 6- and 24-categories conditions (top and bottom, respectively) with an extra option of “None of the items”

3.2 Materials

The taxonomies used in Tune-A-Find (i.e., mood, activity, and genre) are based on a close observation of current music streaming services. We found that these labels are increasingly being used (see Table 2).

Table 2 Overview of the observed music streaming services and the taxonomies they use to organize music

For the number of categories to present within each taxonomy, we followed the original work of Iyengar and Lepper [51] on overchoice. They observed the occurrence of overchoice between choice sets consisting of 6 and 24-items. We conducted a separate user study to determine which categorical labels (types of moods, activities, or genres) to include within each taxonomy (see Section 4).

For the concluding questionnaires we made use of existing questionnaires measuring user experience, musical expertise, and personality. To measure user experience factors we adapted the original user experience questionnaire of Knijnenburg et al. [61] to fit the music streaming context of our study. The questionnaire captures different parts of the user experience. It measures participants’ choice difficulty, choice satisfaction, perceived system usefulness, and perceived system quality.Footnote 3

To measure participants’ musical expertise, we relied on the Goldsmiths Musical Sophistication Index (Gold-MSI; [71]). Although recent research has shown that personality traits can predict music sophistication [25, 44], we decided to explicitly measure music sophistication in order to obtain a more accurate music sophistication measurement. The Gold-MSI questionnaire measures music sophistication based on the following dimensions:

  • Active engagement (how much time and money one spends on music)

  • Perceptual abilities (cognitive musical ability related to music listening skills)

  • Musical training (musical training and practice)

  • Singing abilities (skills and activities related to singing)

  • Emotions (active behaviors related to emotional responses to music)

In the remainder of this paper, we will talk about “dimension expertise” to refer to the separate dimensions of the Gold-MSI. For this study we adopted parts of Gold-MSI that are related to the taxonomies (i.e., active engagement, perceptual abilities, and emotions).Footnote 4

To measure personality, we relied on the widely used, 44-item Big Five Inventory (5-point Likert scale; disagree strongly - agree strongly; [55]). Finally, standard demographic questions were asked (i.e., age and gender).
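
Likert-type questionnaires such as these are commonly scored by averaging the items that belong to a scale after reversing negatively worded items. The sketch below shows that general procedure only; the item numbers, the choice of which item is reverse-keyed, and the example answers are hypothetical and do not reproduce the actual keying of the Big Five Inventory or the Gold-MSI.

```python
def score_scale(responses, items, reverse_keyed=(), scale_max=5):
    """Return the mean score of one scale on a 1..scale_max Likert format.

    responses: dict mapping item number -> raw answer.
    items: item numbers that belong to the scale.
    reverse_keyed: item numbers whose answers are flipped (1<->5, 2<->4 on a 5-point scale).
    """
    values = []
    for item in items:
        raw = responses[item]
        values.append(scale_max + 1 - raw if item in reverse_keyed else raw)
    return sum(values) / len(values)

# Hypothetical four-item scale; item 2 is treated as reverse-keyed for illustration.
answers = {1: 4, 2: 2, 3: 5, 4: 3}
trait_score = score_scale(answers, items=[1, 2, 3, 4], reverse_keyed={2})
print(trait_score)
```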

4 Preliminary study

To determine which categories to use in each taxonomy, we conducted a preliminary study. Prior research has shown that overchoice occurs only when the items in the choice set satisfy certain preconditions, for example, when the differences in attractiveness between the items are small, and especially when the items consist of incomparable features [15, 22].

In the following sections we outline the method and findings.

4.1 Method

For this preliminary study we recruited 45 participants through Amazon Mechanical Turk, a popular recruitment tool for user-experiments [60]. Only those located in the United States, and with a very good reputation were allowed to participate (≥95% Human Intelligence Task [HIT]Footnote 5 approval rate and ≥1000 HITs approved). We compensated participants with $1 for their participation.

We extracted the categories provided by Songza,Footnote 6 as they have a clear separation of categories between taxonomies whereas others (e.g., Spotify) have a mixed taxonomy view. For each taxonomy we asked participants to pick 12 categoriesFootnote 7 that they would most likely use when browsing for music.

4.2 Findings & conclusion

In line with the prior work of Iyengar and Lepper [51] on overchoice, and work defining the preconditions of the choice set [15, 22], we picked the 6 and 24 most attractive categories (i.e., the categories that participants indicated they would most likely use in their music browsing; Table 3). These categories were used for Study B (see Section 5.3), where we investigate overchoice within a music taxonomy.
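
The selection step amounts to counting how many participants picked each category and keeping the most frequently picked ones. A minimal sketch, assuming each participant's picks are available as a list of labels; the function name `top_categories`, the alphabetical tie-breaking rule, and the example labels are assumptions for illustration and not part of the original procedure.

```python
from collections import Counter

def top_categories(picks_per_participant, k):
    """Return the k category labels with the most votes.

    Each participant contributes at most one vote per category.
    Ties are broken alphabetically (an assumption of this sketch).
    """
    votes = Counter()
    for picks in picks_per_participant:
        votes.update(set(picks))
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:k]]

# Hypothetical picks from three participants.
example_picks = [
    ["happy", "calm", "sad"],
    ["happy", "calm"],
    ["happy", "energetic"],
]
print(top_categories(example_picks, 2))
```

With real data, `k` would be set to 6 and 24 to obtain the small and large choice sets.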

Table 3 Top 6- and 24-categories chosen by participants. # represents the number of votes

5 Main studies

In the following subsections, we discuss the main studies where we treat the hypotheses, findings, and discussion for each study separately. Study A depicts the taxonomy preferences (Section 5.2), and Study B addresses the overchoice effect within a chosen taxonomy (Section 5.3).Footnote 8

5.1 Participants

We recruited 326 participants through Amazon Mechanical Turk. Participation was restricted to those located in the United States, and also to those with very good reputation (≥95% HIT approval rate and ≥1000 HITs approved) to avoid careless contributions. Participants were recruited at various times of the day to balance night and day time music application usage. Several comprehension-testing questions were used to filter out fake and careless entries. This left us with 297 completed and valid responses. Age (19 to 68, with a median of 31) and gender (159 males and 138 females) information indicated an adequate distribution. Participants were compensated with $2 for their participation.

5.2 Study A

In Study A, we looked at how taxonomy preferences are related to different personality traits. To investigate this relation we simulated a music streaming service (Fig. 2). The application consists of a simple interface with three taxonomies (mood, activity, and genre) for participants to browse for music. A tooltip provided users a description of each taxonomy. The order of the taxonomies was randomized to prevent order effects. Once a taxonomy was picked, participants continued by choosing a category within the chosen taxonomy (this is addressed in Study B in Section 5.3). As we are interested in users’ intrinsic taxonomy preferences, participants were not able to go back once a taxonomy was picked. For those who wanted to choose a different taxonomy, we included an additional option of “None of the items” among the available categories. For those who picked this option, we included an additional question in the concluding questionnaire where they could indicate what they would have picked otherwise in terms of taxonomy (i.e., mood, activity, or genre) as well as the category within a taxonomy.

In order to prevent an experience bias with one of the music taxonomies (i.e., mood, activity, or genre), participants were told during the instructions of the user study that they were going to test a new music streaming service, and therefore it is important that they interact with the system in the most ideal way for them.

As there is no strong evidence from the literature to form hypotheses, we decided to adopt an exploratory approach. We try to draw relationships between our findings to what is known from prior research in the discussion section.

5.2.1 Findings

Using a chi-square test of independence, we explored the relationship between participants’ five personality dimensions and the chosen music taxonomy (mood, activity, and genre). We used a median split to divide each personality trait into a low and a high measure, and a binary value was assigned to each taxonomy representing whether or not a participant chose that taxonomy. The distribution of the music taxonomy choices made by the participants is shown in Table 5. Participants most often chose the genre taxonomy, followed by the mood and activity taxonomies. In the following sections we discuss the relationship between personality traits and the music taxonomy chosen by the participants (see Table 4 for an overview).
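
For readers who want to reproduce this type of analysis, the sketch below shows a median split followed by a Pearson chi-square test on the resulting 2 × 2 table. It is an illustration under several assumptions: scores strictly above the median count as "high", no continuity correction is applied, and the example data are invented. The handling of ties at the median and the exact test settings used in the study are not specified here.

```python
import math
from statistics import median

def median_split(scores):
    """Label each score as high (True) if it lies strictly above the median."""
    cut = median(scores)
    return [s > cut for s in scores]

def chi_square_2x2(high_trait, chose_taxonomy):
    """Pearson chi-square test (df = 1, no continuity correction) for two binary lists.

    Assumes every row and column of the 2 x 2 table is non-empty.
    """
    n = len(high_trait)
    observed = [[0, 0], [0, 0]]
    for h, c in zip(high_trait, chose_taxonomy):
        observed[int(h)][int(c)] += 1
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented example: eight participants, one trait, one taxonomy.
trait_scores = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
chose_mood = [0, 0, 0, 1, 1, 1, 1, 0]
chi2, p = chi_square_2x2(median_split(trait_scores), chose_mood)
print(round(chi2, 3), round(p, 4))
```

The closed-form expression for the p-value holds only for one degree of freedom, which is the case for every 2 × 2 comparison of a dichotomized trait with a binary taxonomy choice.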

Table 4 Summary of the results for each taxonomy (i.e., mood, activity, and genre) with each personality trait: (O)penness to experience, (C)onscientiousness, (E)xtraversion, (A)greeableness, and (N)euroticism

Mood taxonomy

Results of the chi-square test indicated a positive relationship between openness to experience and mood, χ²(1, N = 297) = 3.117, p = .05. This means that those who scored high on the openness to experience dimension were more likely to choose the mood taxonomy than the activity or genre taxonomy. We did not find any significant effects of the other personality traits: conscientiousness, χ²(1, N = 297) = .934, p = .334; extraversion, χ²(1, N = 297) = .870, p = .351; agreeableness, χ²(1, N = 297) = .044, p = .833; and neuroticism, χ²(1, N = 297) = .703, p = .402.

Activity taxonomy

When looking at the chi-square test results for the activity taxonomy, we found a positive significant effect of conscientiousness, χ²(1, N = 297) = 3.210, p = .05. Additionally, we found a positive relationship with neuroticism, χ²(1, N = 297) = 12.663, p < .001. These results indicate that those who scored high on neuroticism or conscientiousness were more likely to choose the activity taxonomy. We did not find significant effects for openness to experience, χ²(1, N = 297) = .046, p = .830; extraversion, χ²(1, N = 297) = .507, p = .477; and agreeableness, χ²(1, N = 297) = .406, p = .524.

Genre taxonomy

The chi-square test results for the genre taxonomy indicated a positive significant effect of neuroticism, χ²(1, N = 297) = 6.583, p = .01, which implies that those who scored high on neuroticism were more inclined to choose the genre taxonomy than the other taxonomies. All the other personality traits were not significant: openness to experience, χ²(1, N = 297) = 3.079, p = .11; conscientiousness, χ²(1, N = 297) = 0, p = .997; extraversion, χ²(1, N = 297) = 1.506, p = .220; and agreeableness, χ²(1, N = 297) = .266, p = .606.

Additional finding

Additionally, we looked for effects of gender and age. Controlling for gender and age did not result in any significant effects. However, as seen in Table 5, the distribution of gender is interesting and indicates some trends. The proportion of women is higher for the mood and activity taxonomies, while the reverse holds for genre.

Table 5 Distribution of men and women across the music taxonomy preference

5.2.2 Discussion

In this study, we investigated whether music taxonomy (mood, activity, and genre) preferences can be inferred from personality traits. We found that there is a relationship between personality traits and preferences for the taxonomies that are used by music streaming services. We visualized our findings in Fig. 4.

Fig. 4

Visualization of our findings: (O)penness to experience, (C)onscientiousness, (E)xtraversion, (A)greeableness, (N)euroticism

We found a positive relationship between openness to experience and the mood taxonomy. This indicates that those scoring high on openness to experience are likely to choose music organized by mood. Knoll et al. [62] found that open individuals show reciprocal behavior towards emotional support. Those scoring high on openness to experience are more aware of, and more capable of judging, their own emotions. Therefore, music can play a supportive role for them, and they would benefit more from browsing for music by mood.

Furthermore, a positive relationship between conscientiousness and the activity taxonomy was found. In other words, highly conscientious people show an increased preference for activity, but not for genre. Conscientiousness refers to characteristics such as self-discipline. People who score high on the conscientiousness scale tend to be more plan- and goal-oriented, organized, and determined compared to those scoring low [10]. As conscientious people are more plan- and goal-oriented, they would benefit from taxonomies that consist of concrete music categories (e.g., activities) to support their plans and goals.

Lastly, we found relationships between neuroticism and the activity and genre taxonomies. This indicates that those scoring high on neuroticism are more likely to choose activity or genre. The neuroticism dimension indicates emotional stability and personal adjustment. Those scoring high on neuroticism frequently experience emotional distress and wide swings in emotions, while those scoring low on neuroticism tend to be calm, well adjusted, and not prone to extreme emotional reactions [10]. Additionally, those who are highly neurotic do not believe that emotions are malleable, but rather that they are difficult to control and strong in their expression [45]. As neurotic people do not consider emotions to be easily changed, they will not benefit much from the mood taxonomy, but more from the activity or genre taxonomies instead.

5.3 Study B

In Study B we looked into how the number of categories presented within a chosen music taxonomy influences the user experience (i.e., category choice satisfaction and difficulty, perceived system quality and usefulness), and how this effect is moderated by the participant’s musical dimension expertise (i.e., active engagement, emotion, and perceptual abilities). The conditions (6- and 24-categories) of Study B originate from the behavior in Study A, where participants picked a music taxonomy to continue their music browsing (Study A; Section 5.2). In the following subsections we continue with hypotheses building, findings, and discussion.

5.3.1 Hypotheses

Overchoice is not always bound to occur; the choice set needs to satisfy preconditions. We covered the choice set preconditions in Section 4. However, overchoice does not only depend on choice set characteristics, but the user’s characteristics play a role as well. A significant moderator for overchoice is the expertise of the user [11, 12, 64, 70, 85].

In line with findings showing expertise as a moderator for overchoice, we hypothesize that expertise also plays a role in the context of this study. In order to measure expertise, we rely on the different dimensions (i.e., active engagement, emotion, and perceptual abilities) of the Gold-MSI. The active engagement dimension depicts general music expertise (e.g., how much time and money one spends on music listening), while the emotion and perceptual abilities dimensions depict expertise related to the individual music taxonomies (the mood and genre taxonomies, respectively). For example, the emotion dimension is related to how often someone might choose music that will send shivers down their spine or how often music can evoke memories of past people and places, thereby mapping to the mood taxonomy. Likewise, the perceptual abilities dimension is related to how well someone can compare two pieces of music or how well someone can identify genres of music, thereby mapping to the genre taxonomy. As the active engagement dimension depicts general behavior, we believe that it has a positive effect in both the mood and genre taxonomies.

Furthermore, we do not only investigate the effects of overchoice on choice satisfaction, but assess other parts of the user experience as well. Besides satisfaction, we also include choice difficulty, perceived system quality, and perceived system usefulness. Unless otherwise specified, we will refer to these factors as the user experience.

We hypothesize:

H1: The number of categories within any of the taxonomies will have a positive effect on the user experience for dimension experts in active engagement, but not for non-experts.

The dimensions emotion and perceptual abilities are more specifically oriented towards the mood and genre music taxonomy. Therefore we hypothesize:

H2: The number of categories within the mood taxonomy will have a positive effect on the user experience for dimension experts in emotion, but not for non-experts.

H3: The number of categories within the genre taxonomy will have a positive effect on the user experience for dimension experts in perceptual abilities, but not for non-experts.

We do not hypothesize overchoice within the activity taxonomy as it depicts specific activities, and is unrelated to any kind of expertise or ambiguity.

5.3.2 Findings

A multivariate analysis of variance (MANOVA) was conducted to test for user experience (i.e., perceived system usefulness, perceived system quality, choice difficulty, and choice satisfaction) differences between 6- and 24-categories. With the MANOVA we first tested differences between the number of categories within each music taxonomy (i.e., without controlling for expertise). Tables 6 and 7 show the categories that the participants chose, and the total distribution across the music taxonomies respectively. Results show that for the mood, activity, and genre taxonomies, participants did not experience any significant difference whether it was the smaller choice set or the bigger choice set that they chose from (see Table 8 for means and standard deviations).

Table 6 Distribution of chosen categories within each taxonomy (6-categories condition)
Table 7 Distribution of chosen categories within each taxonomy (24-categories condition)
Table 8 Mean and standard deviations of the user experience factors on category size per taxonomy

In order to investigate the effects of expertise, we conducted a moderated multiple regression (MMR) analysis. We used the dimensions of the Gold-MSI (i.e., active engagement, perceptual abilities, and emotions) to assess participants’ expertise level, and added these as moderators to the analyses. This allowed us to investigate how expertise influences the overchoice effect and the user experience factors (i.e., perceived system usefulness, perceived system quality, choice difficulty, and choice satisfaction).

The analyses were conducted in two steps (see Footnote 9). In the first step we tested for main effects. This allowed us to see the general effects of expertise on the user experience factors within each music taxonomy, regardless of the number of categories. The second step involved the moderators (i.e., emotion, perceptual abilities, and active engagement dimension expertise). By including the moderators, we were able to look at how expertise influences overchoice, and in turn the user experience.
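The two-step procedure can be illustrated with a minimal sketch. It is not the analysis script used in this study: the data are synthetic, the 0/1 coding of the 6- and 24-categories conditions, the mean-centering of the expertise score, and the inclusion of both predictors in step 1 are assumptions, and the helper names (solve, ols, center) are hypothetical. The sketch only estimates regression coefficients and does not compute the t-tests reported below.

```python
# Illustrative two-step moderated multiple regression (MMR) on synthetic data.
# Assumptions: condition coded 0 (6 categories) / 1 (24 categories),
# expertise mean-centered; none of the numbers come from the study.

def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations; intercept comes first."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def center(values):
    """Subtract the mean so the main effects are evaluated at average expertise."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


condition = [0] * 7 + [1] * 7
expertise = center([1, 2, 3, 4, 5, 6, 7] * 2)
# Synthetic outcome with a built-in negative condition x expertise interaction.
outcome = [3.0 + 0.5 * c + 0.2 * e - 0.8 * c * e
           for c, e in zip(condition, expertise)]

# Step 1: main effects only.
step1 = ols([(c, e) for c, e in zip(condition, expertise)], outcome)
# Step 2: add the product term that carries the moderator effect.
step2 = ols([(c, e, c * e) for c, e in zip(condition, expertise)], outcome)

print("step 1 (intercept, condition, expertise):", step1)
print("step 2 (intercept, condition, expertise, interaction):", step2)
```

In such a model, the coefficient of the product term in step 2 expresses how the effect of the number of categories changes with expertise, which is the quantity the moderator effects in this section refer to.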

We separately discuss the significant findings of each music taxonomy on the user experience factors (i.e., perceived system usefulness, perceived system quality, choice difficulty, and choice satisfaction) below. In each of the following result sections, we first start with the significant main effects (i.e., the effect of expertise on the user experience without taking into account the different number of categories). After that we continue with the significant moderator effects (i.e., the effect of expertise on overchoice and the user experience).

Mood taxonomy

When looking at the results of perceived system usefulness, we found a significant main effect of emotion expertise (t(1, 63) = 1.939, p = 0.05). This indicates that, in general, participants who are emotion experts found the system more useful than non-experts. For perceived system quality, we found a significant main effect of active engagement expertise (t(1, 63) = −2.379, p = 0.02), as well as emotion expertise (t(1, 63) = 2.285, p = 0.02). This means that actively engaged participants perceived the system to be of lower quality, while participants with emotion expertise rated the system as being of higher quality. Furthermore, we found a main effect of emotion expertise on choice satisfaction (t(1, 63) = 1.764, p = 0.08), indicating that those who use music for emotional activities are in general more satisfied with their category label choice.

When looking at differences between the number of categories while controlling for the expertise dimensions, we found the following moderator effects on the different factors of the user experience. For perceived system usefulness, we found a significant moderator effect of emotion expertise (t(1, 63) = −2.147, p = 0.03). The results of the moderator effect indicate that emotion experts perceived the system as less useful when given more choices, while non-emotion experts perceived the system as more useful when given more choices (Fig. 5). When looking at perceived system quality, we found a moderator effect of emotion expertise (t(1, 63) = −1.834, p = 0.07), indicating that emotion experts perceived the system as being of lower quality when given more choices (Fig. 6). Lastly, we identified moderator effects on choice difficulty by emotion expertise and active engagement expertise. Emotion experts show a decrease in choice difficulty when given fewer choices (t(1, 63) = −1.754, p = 0.08; Fig. 7), whereas active engagement experts show a decrease in choice difficulty when given more choices (t(1, 63) = 2.385, p = 0.02; Fig. 8). No significant effects were found on choice satisfaction.

Fig. 5 Moderator effect of emotion (E) expertise on perceived system usefulness (higher means more useful) within the mood taxonomy

Fig. 6 Moderator effect of emotion (E) expertise on perceived system quality (higher means higher quality) within the mood taxonomy

Fig. 7 Moderator effect of emotion (E) expertise on choice difficulty (higher means easier) within the mood taxonomy

Fig. 8 Moderator effect of active engagement (AE) expertise on choice difficulty (higher means easier) within the mood taxonomy

Activity taxonomy

As expected, no main or moderator effects were found for the categories within the activity taxonomy.

Genre taxonomy

No main effects of the different expertise dimensions on the user experience were found. However, moderator effects were observed on the user experience factors when looking at the differences between the number of categories. A significant moderator effect was found on perceived system usefulness when controlling for perceptual abilities expertise (t(1, 197) = 2.260, p = 0.02). Participants with expertise in perceptual abilities rated the system as more useful when given more choices. On the other hand, those with low perceptual abilities rated the system as more useful when given fewer choices (Fig. 9). For perceived system quality, we found a moderator effect of perceptual abilities expertise (t(1, 197) = 1.838, p = 0.06). The results show that perceptual experts rated the system as being of higher quality when given more choices, while it hardly made a difference for non-experts (Fig. 10). No significant effects of expertise in perceptual abilities were found on choice satisfaction or choice difficulty, nor did we find any effects of active engagement on the user experience factors.

Fig. 9 Moderator effect of perceptual abilities (PA) expertise on perceived system usefulness (higher means more useful) within the genre taxonomy

Fig. 10 Moderator effect of perceptual abilities (PA) expertise on perceived system quality (higher means higher quality) within the genre taxonomy

5.3.3 Discussion

Our results show that expertise plays a role in whether overchoice occurs or not. With regard to H1, we only found partial support. We hypothesized that general musical expertise (active engagement) would play a role in whether overchoice occurs. However, we only found an overchoice effect in the mood taxonomy on choice difficulty. Experts indicated that they found it more difficult to choose a category when given fewer choices, whereas non-experts indicated that they experienced more difficulty when given more choices. As this was the only effect found, the effect of expertise seems to be very specific, and cannot take any general form.

The effect of emotion expertise within the mood taxonomy is remarkable. Here, emotion expertise seems to have the opposite of the hypothesized effect on overchoice. Therefore, we need to reject our hypothesis (H2). Instead of an increase in the user experience factors when given more choice, emotion experts show a decrease. In other words, they perceived the system as more useful and of higher quality, and indicated having less difficulty picking a category, when provided with fewer choices. Non-experts indicated the opposite effect and reported a better user experience when given more choices. A possible explanation for this could be that emotion experts are in general more emotionally aroused and therefore prefer less choice because it takes less cognitive effort. This is in line with findings that show that emotional arousal can have an adverse effect on decision making because of reduced cognitive processing [20, 66]. In other words, information processing decreases as a result of emotional arousal. Making a choice from a bigger choice set would then take more effort to assess every option. Also, especially for those who rely more on the emotional triggers of music, making a bad choice will have bigger consequences than making a good choice [3]. Hence, as the choice sets within each music taxonomy were designed to be most attractive, choice difficulty within the mood taxonomy is exacerbated for the more experienced participants.

The effect of expertise in the genre taxonomy is partially in line with our hypothesis (H3). Prior research suggests that expertise is a moderator for overchoice [11, 12, 64, 70, 85]. Those who indicated being experts in perceptual abilities rated the system as being of higher quality and more useful when more choices were provided.

It is striking that we did not observe a clear overchoice effect on the choice that was made (i.e., choice difficulty and choice satisfaction), but only on the evaluation of the system (i.e., perceived system usefulness and perceived system quality). The preconditions for overchoice state that the user needs to lack familiarity with the items and should not have a clear prior preference for an item [51]. However, not meeting these preconditions should lead to preferring more choice [11, 12], whereas our results show no differences. Others argue that overchoice can only occur when all options are attractive: there should be no dominant option, and the proportion of non-dominant options should be large [16, 17, 48, 77]. Otherwise, making a decision would be easy, regardless of the size of the choice set. In this study, we tried to control for that by creating choice sets with the most attractive items (see the preliminary study in Section 4). Also, the distribution of the choices made by the participants (see Tables 6 and 7) shows no category that was chosen disproportionately often. The most plausible explanation for why we did not observe the hypothesized effects comes from Hutchinson [50]. He argued that overchoice seldom occurs among animals, because they seem to have adapted to the different sizes of choice sets that naturally occur in their environment. Although this hypothesis has not been verified in humans so far, it would best explain why the overchoice effect on the choices made (i.e., choice difficulty and choice satisfaction) was not found in our study. The sizes of the choice sets we used are not uncommon for music streaming services. We picked the size of our largest choice set (24 categories) to be in line with the original work on overchoice by Iyengar and Lepper [51]. However, this was just a subset of what would be presented to actual users of such a service. It could be that participants are accustomed to the sizes of the presented choice sets, as current music streaming services require them to deal with even larger choice sets than those used in this study.

Although we did not observe the overchoice effect on the category items, this does not mean that our choice sets did not have any effects. We did find effects on the factors evaluating the system (i.e., perceived system usefulness and perceived system quality). These are important factors that help to form users’ general perspective of the system as a whole.

Aside from the fact that our results contribute to knowledge on how to design online music systems (see Section 5.4), our results also contribute to knowledge in other domains (see Fig. 11 for a visualization of our results). We show that personality traits relate to interactions within online music systems and thereby provide insights into how personality relates to online behaviors through the new interaction methods that technologies are facilitating. Furthermore, our results provide additional insights important for decision making research by showing the versatility of expertise in the overchoice paradigm. Although expertise proved to be an important influencing factor, we show that it is case dependent whether it contributes to overcoming the overchoice effect.

5.4 Implications

The results of these studies support the creation of personalized user interfaces by taking into account the user’s personality and expertise (a proposed user model can be found in Fig. 11). With applications getting more and more connected and sharing resources (e.g., applications connect with social networking sites such as Facebook, Twitter, or Instagram), the automatic extraction of personality and expertise becomes more feasible. A possible scenario could be:

A user has the music application connected to his Facebook account. Based on his Facebook profile, the application inferred that he is someone open to new experiences. Therefore, the music application adjusts the user interface by emphasizing the mood taxonomy to let the user continue browsing for music. By analyzing his profile (e.g., he filled in artists and bands that he likes) and postings (e.g., posting often that he goes to concerts), the system may infer that he is actively engaged with music. Based on this, the system decides to provide him more categories to choose from within the mood taxonomy.

Fig. 11 Proposed user model. Personality traits: (O)penness to experience, (C)onscientiousness, (E)xtraversion, (A)greeableness, (N)euroticism. Music expertise dimensions: active engagement (AE), perceptual abilities (PA), emotions (E)

In the last couple of years, it has been demonstrated that personality information can be extracted from social networking sites (SNSs) like Facebook (e.g., [2, 35, 42, 73, 81]), Twitter (e.g., [41, 75]), and Instagram (e.g., [29, 30, 31, 36, 65]), or a combination of these [89]. Being able to extract personality traits from SNSs makes it possible for (music) applications to adjust their user interface based on our results. For example, when someone appears to be open to new experiences, the mood taxonomy could be emphasized while other taxonomies could be placed more in the background of the interface. In addition, music recommendations could be given based on the mood taxonomy (e.g., music with similar mood expression).

Although recent work has shown that personality can predict music sophistication [25, 44], we believe that the expertise dimensions (i.e., active engagement, perceptual abilities, and emotions) that we used in Study B can also be inferred from the same increased connectedness with SNSs. For example, active engagement can be inferred by extracting information on concert attendance (e.g., Facebook events, SongKick; http://www.songkick.com) as well as purchase behavior (e.g., iTunes store, Amazon; http://www.amazon.com). The “About” section, or the posted activities and status updates in SNSs, can provide cues to infer perceptual abilities. Analyzing postings of an SNS user could give an indication of the emotion expertise dimension (e.g., postings about induced feelings when listening to a song). Also, there seems to be some relationship between factors of the emotion expertise dimension and the openness to experience personality factor. This could serve as an additional indicator. Music applications could adjust the size of the choice set based on the expertise dimensions of the user.
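How such a user model could drive the interface can be sketched as a small set of rules. This is only an illustration under stated assumptions: the UserProfile class, the function names, the normalization of all scores to the range 0 to 1, and the cut-off of 0.5 are hypothetical and were not part of our studies; only the direction of each rule follows the relationships reported in Sections 5.2 and 5.3.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off: all scores are assumed to be normalized to 0..1.
HIGH = 0.5
SMALL, LARGE = 6, 24  # the two choice-set sizes used in Study B


@dataclass
class UserProfile:
    openness: float
    conscientiousness: float
    neuroticism: float
    emotion: float               # Gold-MSI emotion dimension
    perceptual_abilities: float  # Gold-MSI perceptual abilities dimension


def emphasized_taxonomies(p: UserProfile) -> List[str]:
    """Taxonomies to emphasize, following the Study A relationships."""
    chosen = []
    if p.openness >= HIGH:
        chosen.append("mood")
    if p.conscientiousness >= HIGH or p.neuroticism >= HIGH:
        chosen.append("activity")
    if p.neuroticism >= HIGH:
        chosen.append("genre")
    # No trait above the cut-off: show all taxonomies equally (assumption).
    return chosen or ["mood", "activity", "genre"]


def category_count(taxonomy: str, p: UserProfile) -> int:
    """Choice-set size, following the direction of the Study B moderator effects."""
    if taxonomy == "mood":
        # Emotion experts rated the system lower when given more choices.
        return SMALL if p.emotion >= HIGH else LARGE
    if taxonomy == "genre":
        # Perceptual abilities experts rated the system higher with more choices.
        return LARGE if p.perceptual_abilities >= HIGH else SMALL
    return SMALL  # activity: no effect was found, so this default is arbitrary


user = UserProfile(openness=0.9, conscientiousness=0.3, neuroticism=0.2,
                   emotion=0.8, perceptual_abilities=0.4)
for taxonomy in emphasized_taxonomies(user):
    print(taxonomy, category_count(taxonomy, user))
```

A deployed system would replace the fixed cut-off with scores and thresholds estimated from the SNS data discussed above; the rules here merely make the proposed user model concrete.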

6 Limitations & future work

There are several limitations in this study that should be addressed in future work. Our sample focused only on participants situated in the United States. Recent work showed that there are cultural differences in music consumption (e.g., [27, 37, 82, 83, 90]). Hence, cultural differences may also play a role in taxonomy usage and category preferences. Future work should address this.

We tested the relationship between personality traits and independent music taxonomies (i.e., mood, activity, and genre). One of our results shows a relationship between neuroticism and both the activity and genre taxonomies. It could well be that people prefer combinations of taxonomies (e.g., sad pop music, funky road trip music, or happy cooking music).

In the studies we conducted, we intentionally did not include real music recommendations, as we believed this could interfere with properly answering our research questions. Since this study only simulated the decision making stage of using a music streaming service and did not play any actual music, it may capture the holistic user experience only to a limited extent.

7 Conclusion

The goal of this work was to investigate whether music browsing strategies are related to personality traits, by looking at the decision making of picking a music taxonomy (mood, activity, or genre) to browse for music. Additionally, we looked at the occurrence of overchoice with the number of categories within the music taxonomies, and how this effect is moderated by expertise.

We found that users’ choice of a taxonomy (mood, activity, or genre) to browse for music is related to their personality. We found significant relationships between openness to experience and the mood taxonomy, conscientiousness and the activity taxonomy, neuroticism and the activity taxonomy, and neuroticism and the genre taxonomy. Furthermore, our results show that overchoice is moderated by expertise. We found that the effects of overchoice are counteracted by expertise in the genre taxonomy (i.e., a positive relationship between expertise and more choices). However, having more expertise/experience does not always make choosing easier. In our case, emotion experts (e.g., those who easily identify with emotions in music) had more difficulty making a decision with an increased choice set (i.e., a negative relationship with expertise). Although expertise may serve as a proxy measure for cognitive processing, on the assumption that expertise and experience with a topic make processing information about that topic easier, this does not always seem to be true. In some cases, expertise or more experience can create adverse effects.

Finally, while the majority of prior research focuses on the influence of overchoice on choice satisfaction and/or choice difficulty, we show with our results that overchoice does not necessarily limit its influence to these two factors. Our results show that even when choice satisfaction or difficulty are not affected by the overchoice effect, it may still influence other aspects of the user experience (e.g., system usefulness, and system quality). These other factors of the user experience should not be neglected, and could play an important role in the recurring use of the system by users.