1 Introduction

In recent years the use of the Internet, especially the mobile Internet, has grown steadily, and users now produce large amounts of data. As a result, users are overwhelmed with information: a person can spend a long time searching for what they need and still end the search with an option that is not the most appropriate for them.

On the other hand, the growth in Internet use has increased consumer participation in e-commerce sites. Consumers write comments or product reviews, which help other buyers in the purchasing process. User feedback on a product is very important and can have a positive or negative impact on other visitors [1, 2]. Users carefully read the opinions and experiences of other users before making a purchase. There is growing evidence that such forums inform and influence consumers’ purchase decisions [3, 4]. Information from user reviews is useful for learning users’ preferences and for recommending new products.

In this paper we propose a method to acquire the user experience from opinions written on the web. Text mining techniques and an ontology are used to process user comments about a product.

2 Evaluating User Experience (UX)

The user experience is the process that develops while the user interacts with a product/service. The concept of user experience is often confused with usability. Usability is the ease of using a product/service/computer tool in order to achieve a specific goal [5, 6]. In contrast, the user experience is composed of a set of factors and elements related to the user's perception during the interaction with the product/service. The total benefit to a user is achieved when the product/service is usable and generates a positive experience. The user’s experience involves social, cultural and contextual factors, as well as expectations and previous experiences.

The result is a positive or negative perception of the product/service. For a product to be accepted, it is necessary to know not only the end users but also the opinions and perceptions of those who are using or have used the product/service. The success of a product depends on corrective measures and new versions based on that experience. Users who have a good experience are satisfied and continue to consume the product or use the service. Users who have a bad experience stop using the product/service and tell their friends.

The evaluation of the user experience is a basic part of user-centered design; it reveals the degree to which users’ expectations are fulfilled. One of the most widely used methods is to have users test the product and then answer a questionnaire that captures their perceptions of the interaction with the product/service.

Numerous questionnaires have been developed to obtain information about the experience of users in different domains [7, 8, 9, 10, 11]. For the purpose of this paper, we are interested in the questionnaire developed by María Rauschenberger et al. [8]. This questionnaire captures the feelings, impressions and attitudes that arise when a user uses the product/service. The user answers the questionnaire after using the product/service. The questionnaire consists of 6 categories to evaluate and 26 items. The categories are: Attractiveness, Efficiency, Perspicuity, Dependability, Stimulation and Novelty.

Table 1 shows the scale, definition and items that make up each scale to be evaluated.

Table 1. Scale questionnaire from [8]

Users should answer the questions by selecting one of the items. Figure 1 shows some values of answers that users should select.

Fig. 1. Values of answers that users should select for the User Experience questionnaire from [8]

3 Opinion Mining

Obtaining user experience information through a questionnaire requires that users first use the product/service and then answer the questions. There are other sources that contain users’ experiences with products/services: social networks, forums and opinion blogs [12, 13, 14].

Extracting the user experience from these sources makes it possible to make sound decisions, improve new versions of products/services, and correct or modify a product/service when a problem is detected from the customer experience. The essential difference between the information posted on these networks and the information obtained through surveys is immediacy. This information is spontaneous and unstructured. Companies and communication agencies understand its value for their business strategies, customer service and trend detection.

However, analyzing all these opinions manually would consume a lot of time, given their volume, variety and velocity.

Sentiment mining, or opinion mining, emerged to automate the analysis of users’ opinions. Automatic analysis makes it possible to process large volumes of data with minimal delay, high accuracy and consistency, and low cost, so that it can complement human analysis in a multitude of scenarios [15, 16].

Sentiment analysis measures aspects such as the opinion, intention and emotion of social network users. The main objective of automatically analyzing the content of these networks is to learn users’ opinions about products, services, brands, people and institutions.

Research in opinion mining focuses on three main tasks:

  • Polarity detection: determines whether an opinion is positive or negative. Beyond a basic polarity, a numerical value within a given range may also be obtained.

  • Feature-based sentiment analysis: determines the different characteristics of the product discussed in the opinion or review written by the user and extracts a polarity for each characteristic mentioned. This type of approach is much more complex than polarity detection.

  • Emotion analysis: tries to detect automatically the emotions involved in the opinion expressed by the users.

4 Mining User Experience (UX)

Opinion mining is concerned with the analysis of the subjective components that are implicit in user-generated content. As mentioned in the previous section, obtaining polarity and performing feature-based sentiment analysis are fundamental objectives of opinion mining. In [17] the opinions of users of digital cameras are analyzed. The objective of that work is to obtain a numerical value (positive or negative) for each characteristic of the product from the opinions of the users. For example, if the opinion states “… the battery is bad …”, the value –1 is obtained for the battery characteristic of the camera.

An ontology is used to structure the information of the opinions. The concepts of the ontology are the characteristics of the digital camera. Text mining techniques are used to obtain a set of rules that classify each sentence of the opinion as positive or negative. The ontology and a list of related words and synonyms are used to identify which concepts of the ontology are involved in the sentences classified as positive or negative (Fig. 2).

Fig. 2. Sentiment analysis process from [17]

In that work [17], a positive or negative value is obtained for each of the characteristics (Sentiment Analysis), and a single numerical value for the opinion (Polarity) is inferred from these values. Throughout the process, only words such as good, bad, broken, not working, no, yes, etc. are taken into account. These words represent the state or condition of the product or of its parts.

As mentioned in previous sections, the user experience is composed of factors and elements related to the perception of the user in the interaction with the product/service.

It is therefore important to identify, in the text of the opinions, words that help infer perception factors.

In this work we use the scales of [8] to evaluate the user experience from the text written by the user, rather than only its positive or negative opinion. Figure 3 shows the process designed to obtain users’ experiences from text.

Fig. 3. Text mining process to obtain user’s experience information

The first phase consists of determining which words are irrelevant for the analysis, in order to clean the text by removing those words and meaningless symbols. The elimination is based not only on frequency and length but also on the importance of each word in the text; articles and punctuation marks, for example, are removed. This phase also determines the unit of analysis, that is, the size of the sentences to be analyzed.

The second phase consists of analyzing the polarity of each sentence using an ontology. The ontology represents the categorization of the user experience of [8]. At this stage a syntactic analysis of the sentences is performed to establish a relationship between their components and the ontology. Each text is segmented into sentences and these into words (tokens). Then the words related to the user experience are identified.

4.1 Ontology Construction

The ontology was built based on the user experience questionnaire presented in Sect. 2. It is composed of three parts. The first contains information about the user, such as an identifier and demographic data. The second part contains information about the product/service that is being discussed. This information represents the characteristics of the product/service as described in [17]. The third part represents information about the experience in the interaction with the product/service. In this paper, we focus on explaining the third part of the ontology. As can be seen in Fig. 4, the concepts involved in the ontology are the classification scales of the user experience of [8]: Attractiveness, Efficiency, Perspicuity, Dependability, Stimulation and Novelty. These categories are concepts in the ontology, and each concept is composed of attributes, which are the words and related words of Table 1.

Fig. 4. Ontology representing the user’s experience information

For example, the attributes of the Attractiveness concept are annoying, enjoyable, good, bad, unlikable, pleasing, unpleasant, pleasant, attractive, unattractive, friendly, and unfriendly.

The values of these attributes can be positive or negative, representing the polarity of these words.

To expand the text analysis, each attribute contains a list of related words and synonyms created from WordNet [18].
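
The structure described above can be pictured as a mapping from each category to its attribute words and their polarity. The following is only a minimal sketch, not the ontology actually built for this work: the names `UX_ONTOLOGY`, `RELATED_WORDS` and `lookup` are hypothetical, the polarity signs are assumed from the ordinary meaning of each word, and the related-word lists are invented placeholders for the lists derived from WordNet.

```python
# Hypothetical in-memory stand-in for the third part of the ontology.
# Polarity signs (+1 / -1) are assumed from the ordinary meaning of each word.
UX_ONTOLOGY = {
    "Attractiveness": {
        "annoying": -1, "enjoyable": 1, "good": 1, "bad": -1,
        "unlikable": -1, "pleasing": 1, "unpleasant": -1, "pleasant": 1,
        "attractive": 1, "unattractive": -1, "friendly": 1, "unfriendly": -1,
    },
}

# Hypothetical related-word lists (placeholders for the WordNet-derived lists).
RELATED_WORDS = {
    "pleasant": ["nice", "agreeable"],
    "bad": ["poor", "awful"],
}


def lookup(word):
    """Return (category, attribute, polarity) for a word, or None if unknown."""
    word = word.lower()
    for category, attributes in UX_ONTOLOGY.items():
        # Direct match against an attribute of the category.
        if word in attributes:
            return category, word, attributes[word]
        # Otherwise try the related words of each attribute.
        for attribute, related in RELATED_WORDS.items():
            if attribute in attributes and word in related:
                return category, attribute, attributes[attribute]
    return None


print(lookup("unfriendly"))
print(lookup("Nice"))
```

A related word inherits the polarity of the attribute it is attached to, which is one simple way to read "each attribute contains a list of related words and synonyms".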

4.2 Preprocessing

In the first step, the texts are loaded and stopwords are removed. This procedure consists of deleting from the text those words that do not contribute relevant information, such as articles, auxiliary verbs and context-dependent words.

In the second step, the text is segmented, that is, divided into independent sentences.

In the third step, the segmented sentences are split into tokens. This step yields the most basic elements of a sentence structure, where a token is a block of text characterized by the function it performs within the sentence.
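
The three steps can be sketched with the Python standard library as follows. This is an illustrative approximation under stated assumptions rather than the implementation used in this work: the stopword list is a tiny hypothetical sample, sentences are assumed to end in `.`, `!` or `?`, and the function names are invented. The sketch does not assign a grammatical function to each token.

```python
import re

# Hypothetical, deliberately tiny stopword list (articles and auxiliary verbs).
STOPWORDS = {"a", "an", "the", "is", "was", "were", "are",
             "be", "been", "has", "have", "had"}


def remove_stopwords(text):
    """Step 1: drop stopwords, keeping end-of-sentence marks for step 2."""
    pieces = re.findall(r"[A-Za-z']+|[.!?]", text)
    kept = [p for p in pieces if p.lower() not in STOPWORDS]
    return " ".join(kept)


def split_sentences(text):
    """Step 2: split the text into independent sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Step 3: split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def preprocess(text):
    """Run the three steps and return one token list per sentence."""
    cleaned = remove_stopwords(text)
    return [tokenize(sentence) for sentence in split_sentences(cleaned)]


print(preprocess("The room was pleasant. The staff were unfriendly!"))
```

Punctuation is kept during stopword removal only because the sentence boundaries are still needed in the second step.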

4.3 Identification of Experiences

Starting from the pre-processed corpus of opinions:

  1. Each token is tagged as a noun, adjective or verb form and is then looked up in the ontology developed for this purpose. Each word found in the ontology has a semantic polarity value (the value of the corresponding attribute in the ontology). In this way, the semantic orientation (positive or negative) of each token is obtained.

  2. Then, all the intensifiers and attenuators in the sentence are identified, together with the token each one modifies and its semantic orientation. A dictionary of terms identified as intensifiers/attenuators is used for this purpose. Some structures carry an enhanced value in themselves; among the most frequent are: up to/even/not even. There are also intensifying comparative structures such as “It is easier to use than …” and exclamations such as “How problematic!”. Dictionaries containing these grammatical structures were created to identify and process them.

  3. Finally, once the two previous steps are completed, an ontology instance is obtained for each user’s opinion. The instance contains the user experience categories, the terms belonging to each category, the polarity and the intensifiers identified.
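
The identification steps above can be sketched as a dictionary lookup over the tokens of a sentence. The code below is a simplified illustration, not the procedure implemented in this work: the mini-ontology, the intensifier list and the function `identify` are hypothetical, part-of-speech tagging is omitted, and an intensifier is assumed to modify the token that immediately follows it.

```python
# Hypothetical mini-ontology: category -> {attribute word: polarity}.
ONTOLOGY = {
    "Attractiveness": {"pleasant": 1, "unfriendly": -1},
    "Efficiency": {"fast": 1, "slow": -1},
}

# Hypothetical intensifier/attenuator dictionary.
INTENSIFIERS = {"very", "really", "extremely", "even"}


def identify(tokens):
    """Return one record per ontology term found in a tokenized sentence."""
    found = []
    for i, token in enumerate(tokens):
        for category, attributes in ONTOLOGY.items():
            if token in attributes:
                # Assumption: an intensifier modifies the token right after it.
                previous = tokens[i - 1] if i > 0 else None
                found.append({
                    "category": category,
                    "term": token,
                    "polarity": attributes[token],
                    "intensifier": previous if previous in INTENSIFIERS else None,
                })
    return found


for record in identify(["staff", "very", "unfriendly", "checkin", "fast"]):
    print(record)
```

Each record plays the role of one row of an ontology instance: category, term, polarity and the intensifier attached to the term, if any.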

An example of an ontology instance is shown in Table 2.

Table 2. Ontology instance

4.4 Obtaining the Polarity of User Experience

This last step calculates the polarity of the sentences, which summarizes the experience that the user expressed in the opinion. We obtain a general value, a numerical quantification that, framed in numerical ranges, approximates the experience contained in the analyzed information. In other words, the goal of calculating the polarity of sentences is to turn interpretation into numerical estimates.

For each category the polarity value is obtained as:

$$ PC_{i} = \begin{cases} 0 & \text{if there are no opinions in } C_{i} \\ \dfrac{\sum\limits_{k = 1}^{n} \left( a_{k} + \text{int}_{k} \right)}{n} & \text{otherwise} \end{cases} $$
(1)

where PC_i is the polarity value of category C_i, with i = 1 … 6; a_k is the k-th attribute of the category identified in the opinion, with a value of 1 (positive polarity) or –1 (negative polarity); n is the number of attributes identified in the opinion; and int_k is the intensifier associated with attribute a_k. In this work int_k takes the value 0.5 if it is a positive enhancer or –0.5 if it is a negative enhancer.
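
As a worked illustration of Eq. 1, the category polarity can be computed as the mean of the attribute values plus their intensifiers. This is a minimal sketch: the function name `category_polarity` is hypothetical, and an attribute without an intensifier is assumed to contribute an intensifier value of 0, which the text does not state explicitly.

```python
def category_polarity(attributes):
    """Compute the polarity of one category following Eq. 1.

    `attributes` is a list of (a, intensifier) pairs, where a is +1 or -1
    and the intensifier is +0.5, -0.5, or 0 when absent (assumption).
    """
    if not attributes:
        # No opinions were found for this category.
        return 0.0
    total = sum(a + intensifier for a, intensifier in attributes)
    return total / len(attributes)


# Invented example: one intensified positive term, one negative, one positive.
print(category_polarity([(1, 0.5), (-1, 0.0), (1, 0.0)]))  # (1.5 - 1 + 1) / 3
print(category_polarity([]))
```

With these invented inputs the sum is 1.5 over n = 3 attributes, giving 0.5, while a category with no identified attributes yields 0.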

For the example the polarity values of the opinion are (see Table 3):

Table 3. Polarity values of the user’s experience categories

5 Case Study

A case study was developed in the tourism domain. It consisted of obtaining user experience information for each category presented in Table 1. The data set available in [19, 20] was used. It consists of more than 100,000 hotel reviews retrieved from TripAdvisor in the period from February 2009 to March 2009. These data contain information such as the author, date, content and rating for certain features of the hotel.

From this data set, we took only 50 hotels and 1000 reviews for those 50 hotels. The process described in Sect. 4 was applied to this subset. Stage 1 reads the opinions specified in an input XML file and selects the terms that provide useful information, eliminating irrelevant words. Tokens were then obtained, and the words were identified using the dictionaries created for this purpose. Finally, the words representing the user experience were identified and the polarity value was calculated for each category.

Figure 5 shows the polarity values obtained for each of the user experience categories of the 50 hotels. These values summarize the experience of the users regarding their stay at the hotel. Instead of giving hotel users a questionnaire, which often goes unanswered, the values obtained through this method can indicate to decision makers what to improve based on the experience of previous users.

Fig. 5. Polarity values obtained for each of the user experience categories of the 50 hotels

6 Validation

Below we present the measure used to evaluate the quality of the classification of opinions with respect to the user experience using the proposed method. For the validation of the results, the same 1000 opinions of the 50 hotels were taken and manually tagged.

Each sentence was classified into one of the categories of user experience: Attractiveness, Efficiency, Perspicuity, Dependability, Stimulation and Novelty. Within each category, the sentence was classified according to its polarity (positive or negative) and the intensifiers and attenuators involved in the sentence were identified.

An accuracy measure was used to validate the automatic classification against the manual one (Eq. 2). Accuracy represents the proportion of the total number of predictions that were correct.

$$ A = \frac{v}{f + v} $$
(2)

where v is the number of cases in which the classification was correct (agreement between manual and automatic classification) and f is the number of incorrect cases (mismatch between manual and automatic classification).

Applying this formula at the level of sentence classification, we obtain the following result:

$$ A = \frac{7526}{2594 + 7526} \approx 0.74 $$

The correct classification of 7526 sentences out of a set of 10120 indicates that the proposed method performs well at acquiring the user experience automatically.
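
The accuracy figure can be reproduced directly from the reported counts. The sketch below is only illustrative; the function names are hypothetical, and the label lists in the second part are invented examples, not data from the validation.

```python
def accuracy_from_counts(correct, incorrect):
    """Eq. 2: A = v / (f + v), with v correct and f incorrect cases."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no classified sentences")
    return correct / total


def accuracy_from_labels(manual, automatic):
    """Count agreements between manual and automatic labels, then apply Eq. 2."""
    if len(manual) != len(automatic):
        raise ValueError("label lists must have the same length")
    correct = sum(m == a for m, a in zip(manual, automatic))
    return accuracy_from_counts(correct, len(manual) - correct)


# Counts reported in the text: 7526 correct and 2594 incorrect sentences.
print(round(accuracy_from_counts(7526, 2594), 2))

# Invented toy example with four sentences, three of them matching.
manual = ["Attractiveness", "Efficiency", "Novelty", "Efficiency"]
automatic = ["Attractiveness", "Efficiency", "Stimulation", "Efficiency"]
print(accuracy_from_labels(manual, automatic))
```

The first call gives 7526 / 10120, which rounds to 0.74.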

7 Conclusion

The user experience is composed of factors and elements related to the perception of the user in the interaction with the product/service. Usually, questionnaires are used to obtain information about the experience in the interaction with the product/service.

Answering questionnaires is sometimes frustrating for people. In this work a method was proposed to obtain the same information that a questionnaire would provide, but from the opinions voluntarily written by users on the web. It is important to identify words in the text that help infer perceptions. In this work we use the scales of [8] to evaluate the user experience from the text written by the user, rather than only its positive or negative opinion.

An ontology was created, together with dictionaries of intensifying and attenuating words and grammatical structures. These make it possible to identify, in addition to the words representing the user experience, the intensity of the perception and the orientation of the opinion.

The experiments carried out reflect the importance of the syntactic analysis of texts and, in particular, of the processing of intensifiers, which improves the classification of the polarity of opinions.

As future work, we propose to link the user experience information with the feature-based sentiment analysis proposed in [17]. We also propose to process adversative sentences, that is, sentences that contain a contrast or contradiction, so that the same sentence can be classified as both positive and negative. These cases were detected as common errors in the validation of the results. The proposed method cannot currently handle cases where there is a contradiction within the sentence.