1 Introduction

What is the difference between music and sound? Music is the art of sound. Sound is invisible waves moving through the air around us. When something vibrates, it disturbs the air molecules around it. Music is sound that is organized by people on purpose, to dance to, to tell a story, to make other people feel a certain way, or just to sound pretty or be entertaining [1, 2].

From the player’s perspective, the most important theories are [3]:

  • Sounds aid the learning curve for gamers - interactivity is crucial in a game, feedback sounds (positive and negative ones) are important to mark how players progress during the game

  • Sounds affect the degree of game immersion - auditory and positioning sounds give certain clues and shape the perception a player has in any given situation. Game sounds enhance the extension of the physical body

  • Sounds create a community outside the game itself - music can be customizable and players can share their own music remixes with an online community

These theories treat sound as a game accessory that supports the game’s main features, but what happens when the music itself is the core of the game? Music games most commonly challenge the player to follow sequences of movement or develop specific rhythms. Some games require the player to input rhythms by stepping with their feet on a dance pad (e.g. Dance Dance Revolution), or by using a device similar to a specific musical instrument, like a replica drum set (e.g. Guitar Hero, Rock Band, DJ Hero). These games have changed the way players interact with their consoles by making the gaming experience more active and sociable, and by paving the way for exergaming [4–6] (e.g. Lips, Just Dance, Dance Central).

When music meets casual games on mobile platforms and the web, the music guessing game (MGG) genre arises. It can be considered a mix of two distinct game genres: music games and trivia games. Trivia games are constantly growing in popularity, especially on mobile phones, where people may only have a few minutes to play. In trivia games, the objective is to answer questions in order to obtain points. In music guessing games, the main question to be answered is “What is the song being played?”.

According to [7], Consumer Musical Intelligence (CMI) is a multi-faceted construct whose core components comprise the capacity to feel the emotion expression in music, to respond to musical stimuli, and to understand the music in a discerning manner. They also propose three factors underlying CMI:

  • Affective musical intelligence is the extent of spontaneity and intensity with which people identify with emotional content in music

  • Behavioral musical intelligence is the extent of semiconscious motor reactivity to musical stimuli

  • Cognitive musical intelligence constitutes the efficiency of the processes in the perception, encoding, and recall of musical information

This work focuses on the cognitive musical intelligence factor. We try to answer the following question: “How to make MGGs more interesting, challenging and pleasurable to players?”. Our main hypothesis is that it is possible to improve MGGs by simply altering the way music is presented to the player. Instead of playing a music section as it is, we think that a constructive approach sequentially presenting a combination of instruments being played can hold the player’s attention and improve the overall game experience.

The remainder of this paper is organized as follows. Section 2 presents an analysis of 67 MGGs on mobile platforms, none of which use the proposed approach. Section 3 details how the proposed approach was implemented, how problems were overcome, and the flow of the game prototype. Section 4 describes the user experiments and the results obtained. Finally, Section 5 concludes the paper and provides directions for future work.

2 Music Guessing Games

Currently, the most used mobile platforms are Android and iOS. Given that the majority of games have versions for both of them, and in order to facilitate the MGG analysis, we compared only games available for iOS. The “AppCrawlr” App Discovery Engine (http://appcrawlr.com/) was used with the following parameters: “guess the song” as search query, “free application” as general filter, “iOS” as platform filter and “relevance” as sorting option. A total of 465 results were found, of which the first 67 were analyzed. Five different aspects were compared:

  • Response type: Alternative (user has to select as correct answer one of n alternatives), Letter (user has to construct the correct song/artist name from a mixed set of letters), Typing (user has to type the answer from standard input)

  • Clues: Yes (the game has clues to help users guess correctly), No (there are no clues available)

  • Full song playback: Yes (player can listen to the entire song), No (only part of the song is played)

  • Content: Single instrument (usually guitar or piano versions), Original song (part of the original song is played), No song (no song is played, user has to guess based on visual clues), Voice (player has to sing), Backwards (song in opposite direction)

  • Score calculation: None (there is no score computed), Correct/Wrong (score is calculated based on the number of correct guesses), Correct/Wrong & Time (score also considers how much time was spent to correctly guess)

Regarding response type, the three categories analyzed (in increasing order of difficulty) were Alternative, Letter and Typing. Alternative and Letter were the most frequent ones, with 45 and 42 %, respectively, as shown in Fig. 2. Typing represents the most difficult interaction, since the user has fewer clues to guess the answer.

Most of the analyzed games provide clues to help users guess the correct answers. As shown in Fig. 2, 78 % of the games have some type of clue to aid players, ranging from removing wrong alternatives/letters to skipping the current song.

We also verified how many of the analyzed games played the entire song to the user or just a part of it. All 67 games use only song segments. This is probably because short segments reduce the amount of data to be transferred over the network.

How music content is presented to players is the focus of this work. From the analysis, we could conclude that none of the 67 MGGs use the proposed constructive approach, which is to sequentially increase the number of instruments played using isolated tracks. Approximately 75 % of the games use part of the original song to be guessed. This, besides making the game easier, does not provide the player with information on how the music is composed. According to Fig. 3, 15 % do not play any kind of music at all, while 3 % use voice and 6 % use a single instrument. One of the games analyzed plays the original music, but backwards.

Regarding score calculation, three possible categories are shown in Fig. 3. There is no score calculation in 34 % of the games analyzed, which means that the score reverts to the number of songs guessed right. Also, 50 % do not take the time to answer into consideration. The smallest portion (16 %) considers both time and right guesses in the score computation.

3 The Proposed Approach

In order to verify this work’s hypothesis, we decided to implement an MGG prototype for mobile platforms, called “What’s the Song”.

Our hypothesis rests on the availability of a music database composed of different song samples, each one having its tracks stored separately. Initially, the idea was to allow players to choose which tracks would play at a certain time.

The main problem with this approach is that each track would have to synchronously play in a separate processor thread. Since most songs have more than six different tracks, this would elevate the prototype processing requirements.

The Karaoke Version site specializes in producing and selling music tracks for karaoke games. Once a song is bought, its owner can customize it and download each track separately, or combine any set of tracks and then download the resulting mp3 file. Fortunately, the site provides random samples of 30 to 45 s for each song stored in its database.

In order to overcome the high processing demand stated before, we fixed a specific order in which the music would be presented to the user. This means, for example, that the first track to be heard would be the drum-based one, followed by the bass, keyboard, brass, guitar and vocal ones. To make sure that each song would have at most six tracks, we had to create six instrument classes for grouping the tracks, as shown in Table 1.

Table 1. Categories used to classify and combine instruments

After creating the track groups, we implemented a web crawler in Java. This tool was responsible for downloading the isolated tracks from the Karaoke Version website and combining them according to their group. For each song, a folder was generated containing song information along with up to six combined tracks. The tracks were cumulative, which means that track #6 is composed of all previous tracks (1 to 5) plus the instruments from the Vocal instrument group. This allowed us to play a single mp3 every time the player listens to the music, which significantly decreased the prototype requirements and eliminated the need for syncing the song playback. The tool used for combining the mp3 tracks was SoX (Sound eXchange) [8].
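The cumulative combination described above can be sketched as follows. This is a minimal illustration in Python rather than the Java crawler used in this work: the function name, the file names and the output naming scheme (`track1.mp3`, `track2.mp3`, ...) are assumptions, and the code only builds the SoX command lines instead of running them. The `-m` option asks SoX to mix its input files; mp3 support depends on how SoX was built.

```python
from typing import Dict, List

# Presentation order of the six instrument groups, as described in the text.
GROUP_ORDER = ["drums", "bass", "keyboard", "brass", "guitar", "vocal"]


def cumulative_mix_commands(tracks_by_group: Dict[str, List[str]]) -> List[List[str]]:
    """Return one SoX command per non-empty group, each mixing all tracks so far."""
    commands: List[List[str]] = []
    inputs_so_far: List[str] = []
    for group in GROUP_ORDER:
        group_files = tracks_by_group.get(group, [])
        if not group_files:
            continue  # a song may use fewer than six groups
        inputs_so_far.extend(group_files)
        output = f"track{len(commands) + 1}.mp3"  # hypothetical naming scheme
        if len(inputs_so_far) == 1:
            # A single input needs no mixing; SoX simply rewrites the file.
            cmd = ["sox", inputs_so_far[0], output]
        else:
            # "-m" mixes every input gathered so far into one output file.
            cmd = ["sox", "-m", *inputs_so_far, output]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    example = {"drums": ["drums.mp3"], "bass": ["bass.mp3"], "vocal": ["lead.mp3"]}
    for command in cumulative_mix_commands(example):
        print(" ".join(command))
```

Because each output already contains every earlier group, the client only ever has to play one file at a time, which is the property the prototype relies on.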

The music selection process was performed as follows. We asked 14 people (all computer science students, aged from 20 to 30) to select songs known to them from the Karaoke Version website. After that, we created a shared table in Google Docs containing all 274 previously selected songs. All 14 people were given edit access and were asked to mark with an “x” every song they knew. Based on this voting process, we ordered the songs by number of votes and selected the 100 best known.
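The voting step amounts to counting marks per song and keeping the highest-ranked entries. A minimal sketch is shown below; the function name and the data layout (one set of known songs per participant) are assumptions, and the alphabetical tie-break is added only to make the result deterministic, since the text does not say how ties were resolved.

```python
from collections import Counter
from typing import Dict, List, Set


def most_known_songs(marks: Dict[str, Set[str]], limit: int) -> List[str]:
    """Rank songs by how many participants marked them and keep the top `limit`."""
    votes: Counter = Counter()
    for known_songs in marks.values():
        votes.update(known_songs)  # one vote per participant per song
    # Sort by descending vote count; ties are broken alphabetically (assumption).
    ranked = sorted(votes, key=lambda song: (-votes[song], song))
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical marks from three participants.
    marks = {"p1": {"Song A", "Song B"}, "p2": {"Song B", "Song C"}, "p3": {"Song A", "Song B"}}
    print(most_known_songs(marks, limit=2))
```

In the setting described here, `limit` would be 100 and `marks` would hold the 14 participants’ marks over the 274 candidate songs.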

The prototype was implemented in a client-server fashion. Server side was developed using PHP, while client side was developed using Objective-C, since the iOS platform was chosen as target.

The server implementation provides the mobile client with the possible answers and links to each of the tracks to be played in a turn, in an XML-like structure. Since it does not take into account which songs were already used, repetitions can occur. The link information provided is relative to the song folder. It must be concatenated to the server’s base address before being accessed on the client side.
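The two behaviours described in this paragraph, stateless turn selection and relative track links, can be illustrated with a short sketch. It is written in Python for brevity and is not the PHP server used in the prototype; the function names, the dictionary layout and the example address are hypothetical, and the XML-like wire format is not reproduced.

```python
import random
from typing import Dict, List


def build_turn(songs: Dict[str, List[str]], num_alternatives: int,
               rng: random.Random) -> dict:
    """Draw one turn at random; no record of past turns is kept."""
    # Sample distinct titles, then treat the first one as the right answer.
    chosen = rng.sample(sorted(songs), num_alternatives)
    answer = chosen[0]
    alternatives = list(chosen)
    rng.shuffle(alternatives)  # scrambled here, displayed as-is by the client
    return {"answer": answer, "alternatives": alternatives,
            "tracks": songs[answer]}  # links relative to the song folder


def absolute_track_urls(server_base: str, relative_links: List[str]) -> List[str]:
    """Prefix each relative link with the server's base address."""
    base = server_base.rstrip("/")
    return [f"{base}/{link.lstrip('/')}" for link in relative_links]


if __name__ == "__main__":
    catalogue = {  # hypothetical song folders and track links
        "Song A": ["song_a/track1.mp3", "song_a/track2.mp3"],
        "Song B": ["song_b/track1.mp3"],
        "Song C": ["song_c/track1.mp3"],
    }
    turn = build_turn(catalogue, num_alternatives=3, rng=random.Random())
    print(turn["alternatives"])
    print(absolute_track_urls("http://example.com/songs/", turn["tracks"]))
```

Since `build_turn` keeps no history, two consecutive calls may return the same answer, which matches the repetition noted above; tracking already used songs on the server or the client would avoid it.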

The client side was implemented as a Single View Application in Xcode. The game is mainly composed of three stages/screens, as shown in Fig. 1. Figure 4 illustrates the complete game flow.

Fig. 1. Screenshots from the game prototype: initial screen, main screen, positive and negative result screens

Fig. 2. Response type comparison (Alternative, Letter, Typing) (left) and comparison regarding clues provided by the games (right)

Fig. 3. Comparison on how music content is presented to players (left) and comparison on how score is calculated (right)

Fig. 4. “What’s the Song” game flow

The first screen is the starting point of the application. At this point, only the game logo is presented and no network access is made. When the user presses the play button, information regarding a turn is requested from the server and the data is downloaded. As soon as the mp3s related to the songs finish downloading, the first track starts playing. Since all tests were performed on a local network, a loading screen was not necessary. In a more realistic scenario, we think it is important to have such a loading screen, since approximately 3.5 MB are downloaded before each turn starts.

The second screen is where most game interaction occurs. At the top of the screen, the user can press the back button to return to the initial screen (when this happens, the score is reset and any music playing is stopped) and can view the current score on the right. The discs are shown below the score area. A total of six different discs, each one mapping to an instrument group, indicate which instruments are playing at that moment. In order to advance to the next track, the user must swipe from right to left. It is important to notice that the discs are stacked, to give the impression that the pile of instruments being played only grows, in line with the cumulative tracks explained earlier. This way, only a single mp3 needs to be played at any time. The bottom of the screen contains the alternatives to choose from. Since the server already scrambles the alternative positions, the client just needs to display them. The user selects the answer that seems to be the correct one by tapping it. When an alternative is pressed, the last track is automatically played, revealing all the instruments.

The third screen is shown whenever the user taps an answer. It displays a positive or negative message, according to the answer given. When the user presses the “next” button, a new turn is started, fetching new information from the server.

The user score is cumulative, meaning that each turn score is added to the final score. The turn score is based on the number of tracks needed to correctly guess the song and on how much time was used. Considering time in the score calculation makes the game more dynamic, since users feel compelled to guess faster. Only non-negative scores are calculated, from 0 to 100. A wrong guess means that the turn’s score is zero. The score calculation uses the following formula:

$$ \mathit{Score} = \left( 1 - \frac{(\mathit{CurrentTrack} \times \mathit{TrackDuration}) + \mathit{TrackTime}}{\mathit{TrackDuration} \times \mathit{NumberOfTracks}} \right) \times 100 $$
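As a worked illustration of the formula, the sketch below computes a turn score in Python. Several details are assumptions not stated in the text: CurrentTrack is taken as a zero-based index (so that a guess at the very start of the first track yields 100), TrackTime is the number of seconds elapsed within the current track, and the result is clamped to the 0 to 100 range. The 30 s duration and six tracks in the example are illustrative values.

```python
def turn_score(current_track: int, track_time: float,
               track_duration: float, number_of_tracks: int,
               correct: bool = True) -> float:
    """Score for one turn, following the formula given in the text."""
    if not correct:
        return 0.0  # a wrong guess scores zero for the turn
    # Listening time consumed so far, across all tracks already heard.
    elapsed = current_track * track_duration + track_time
    total = track_duration * number_of_tracks
    score = (1 - elapsed / total) * 100
    # Clamping is an added safeguard, not part of the original formula.
    return max(0.0, min(100.0, score))


if __name__ == "__main__":
    # Guess made 15 s into the third track (index 2) of a six-track, 30 s sample.
    print(round(turn_score(2, 15, 30, 6), 2))
```

With these values, 75 of the 180 available seconds have been used, so the turn is worth a little over 58 points; waiting until the end of the last track brings the score down to zero.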

4 Experiments

The primary purpose of this work was to understand if a constructive approach could contribute to a more engaging and overall better MGG. For this we conducted an experiment aiming to stimulate people to interact with the implemented MGG prototype. Two versions of the game were used: in the first one, the usual guessing game approach was chosen, in which the music was played with all instruments available at a time; in the second one, the proposed approach was chosen, giving the opportunity of guessing the song by progressively adding instruments to it.

A total of 20 participants (17 male and 3 female), aged 18 to 29 (20.55 ± 2.79), took part in the experiments. They had to play both versions of the game for about 5 min each and then answer a questionnaire about their experience. In order not to influence the questionnaire answers, the version played first was chosen at random.

The questionnaire was based on the one proposed by [9], and is related to the theory of the Core Elements of the Gaming Experience (CEGE). The CEGE are the necessary but not sufficient conditions to provide a positive experience while playing video games. The questionnaire allows the gaming experience to be studied objectively. From the original 38 questions, we selected 19, since our game was just a prototype and only the song presentation approach was being evaluated. The selected questions are listed below. Answers were given using a 7-level Likert scale [10].

  1. I enjoyed playing the game

  2. I was frustrated whilst playing the game

  3. I liked the game

  4. I would play this game again

  5. I was in control of the game

  6. The controllers responded as I expected

  7. I remember the actions the controllers performed

  8. I was able to see on the screen everything I needed during the game

  9. There was time when I was doing nothing in the game

  10. I got bored playing this game

  11. The game kept constantly motivating me to keep playing

  12. I felt what was happening in the game was my own doing

  13. I challenged myself even if the game did not require it

  14. I played with my own rules

  15. The game was unfair

  16. I understood the rules of the game

  17. The game was challenging

  18. The game was difficult

  19. I knew all the actions that could be performed in the game

The experimental results showed a significant difference between the usual and constructive approaches. Figs. 5 and 6 show that the constructive approach scored higher on 13 of the 19 statements. These statements were mainly related to enjoyment, ownership and gameplay. The interaction offered by this approach may have caused this effect, expanding the sense of ownership and engagement.

Fig. 5. Usual approach response compilation

Fig. 6. Constructive approach response compilation

The statements in which the usual approach received higher scores were mainly related to frustration and control. This could be explained by the fact that the usual version is simpler to operate than the constructive one.

As can be observed in Table 2, the most significant differences among the answers occurred in the following questions: “17. The game was challenging” and “18. The game was difficult”. Thus, we can say that players found progressively adding instruments 53.7 % more challenging and 75.9 % more difficult than the usual approach. According to question “13. I challenged myself even if the game did not require it”, the constructive approach also gave players a stronger sense of challenging themselves (21 % more).

Table 2. Mean and Standard Deviation values for user answers in both scenarios

Other results show that the constructive approach was 20 % more enjoyable (question 1) and 14.6 % more liked (question 3). Players also thought that the usual way was 38.8 % more boring (question 10), even though it had a low score. On the other hand, the users found the controls of the usual approach 27.2 % easier to understand than those of the constructive one (question 19), which was expected since there were fewer controls. These results indicate that the difference in the way of interaction did have an impact on the level of gaming experience.
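The text does not state how the percentage differences were derived. One plausible reading is that each figure is the relative difference between the mean answers of the two versions for a given question, which can be sketched as follows. The function name and the numbers in the example are hypothetical and are not taken from Table 2.

```python
def percent_more(mean_a: float, mean_b: float) -> float:
    """How much larger mean_a is than mean_b, as a percentage of mean_b."""
    if mean_b == 0:
        raise ValueError("mean_b must be non-zero")
    return (mean_a - mean_b) / mean_b * 100


if __name__ == "__main__":
    # Hypothetical mean answers on the 7-level scale for one question.
    constructive_mean, usual_mean = 6.0, 5.0
    print(f"{percent_more(constructive_mean, usual_mean):.1f} % higher")
```

Under this reading, a statement such as "X % more challenging" compares the constructive mean against the usual mean, while "X % more boring" compares the usual mean against the constructive one.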

The participants were also encouraged to leave suggestions and comments after answering the questionnaire. Some participants suggested making the usual approach more difficult and challenging, adding more songs to both approaches and dividing them into categories. The main criticisms concerned the feedback given, frequently repeated songs (this happens because the server does not keep a record of songs already played) and the delay after the “next” button is touched (this delay occurred because there was no loading screen and, as said before, corresponds to the time needed to download all mp3 tracks used in the turn).

Many players were very enthusiastic about the constructive approach and reported that it was more interesting than the traditional version. Compliments about the user interface were also made.

5 Conclusion

This work proposed a new approach for music presentation on MGGs. The original hypothesis that players would be more engaged and have an overall better experience was validated by 20 participants during the test sessions.

The experiment showed that adding instruments progressively improved some aspects of the gaming experience, such as enjoyment, ownership and gameplay. The main reason for this was that the new approach increased the sense of difficulty, and consequently the sense of challenge of the game. Despite its simplicity, players liked the implemented prototype and most of them suggested that an enhanced version of it should be made available for public download. As future work, and according to the participants’ suggestions, we should make the following modifications to the game:

  • Group songs by genre

  • Add multiplayer functionality to the game

  • Allow users to post their achievements in social networks

  • Implement a global rank in order to increase engagement and competitiveness among users.