1 Introduction

Appraisal Theory, presented by Martin [10], provides a useful framework for distinguishing between different types of attitudes (Affect, Judgment or Appreciation). It describes how writers or speakers use language to reveal their engagement with the reader or hearer, and to amplify or diminish the strength of their attitudes and engagements. This theoretical study has opened an interesting research area for analyzing opinions. Even though opinions about a specific target may have a similar polarity, they can differ in their evaluative purpose. Some refer to personal or emotional reactions, others evaluate the properties of objects and entities with reference to aesthetic aspects, and others assess human behavior with reference to ethical and social norms. Going beyond the polarity of opinions, recognizing the attitudes they express is a step towards more fine-grained models for sentiment analysis. Consequently, methods and resources for automatically classifying the attitude of words or phrases are a cornerstone for creating knowledge-based systems able to identify not only the valence but also the evaluative purpose of opinions. Mining the attitudes behind opinions can enhance other sentiment analysis tasks and can be useful for decision making.

Some authors have turned their attention to the computational treatment of evaluation in language. Taboada and Grieve [16] tried to determine the attitude type of a word by calculating the strength of its semantic association with pronoun-copula pairs, each composed of a pronoun and the verb form “was”: “I was” for Affect, “he was” for Judgment, and “it was” for Appreciation. The association was computed with Pointwise Mutual Information (PMI) using the AltaVista search engine. Bloom and Argamon (2010) proposed an automatic method to derive complex appraisal extraction patterns [1] in English. They manually created a lexicon of targets commonly used in two given domains (Digital Camera and Movie) and annotated a list of 29 syntactic dependencies to associate these targets with attitude expressions, and also to identify targets that were not included in the lexicon [2]. A notable advance in attitude recognition at the sentence level was presented in [12, 13]. These works analyze complex contextual attitude on the basis of a deep analysis of syntactic and dependency structures, compositional linguistic rules applied at various grammatical levels, rules elaborated for semantically distinct verb classes, and a strategy for considering the hierarchy of concepts based on WordNet. A corpus-based method for classifying attitudinal words in Spanish was introduced by Hernández et al. [7, 8]. In these works, binary classifiers were trained to recognize attitude type and orientation at the word level. The corpus and lexicon used for training the classifiers were manually created. To the best of our knowledge, these works are the first to address the problem of attitude word classification in Spanish, and they constitute the closest approach to the work introduced in this research.

Previous studies show that most approaches focus on the English language and rely on prior lexicons with attitude annotations to develop complex models based on machine learning or knowledge-based systems. Hence, the quality and size of these lexicons affect the effectiveness of these methods. Given these limitations, proposing effective algorithms to build new lexicons of attitude words, especially for Spanish, is an interesting research direction, since few advances on this task in this language have been reported in the scientific literature. This work has two main goals: (i) to propose a new solution for the automatic classification of both attitude types and orientations of Spanish words that improves on existing ones; (ii) to explore large unlabeled datasets available on-line using neural-network-based word embedding techniques, in order to obtain a good semantic representation of words that is useful for identifying their attitude types and orientations. The remainder of this paper is organized as follows. Sect. 2 presents the theoretical background of the Appraisal Framework. Sect. 3 introduces our proposal for classifying attitude types and orientation at the word level. Sect. 4 summarizes the experiments and discusses the results. Finally, Sect. 5 presents conclusions and the main directions for future investigation and improvement.

Fig. 1. General structure of the Appraisal Framework

2 Appraisal Framework Background

According to Martin’s research work [10], appraisal can be divided into three distinct but closely related systems (cf. Fig. 1): Attitude, Engagement and Gradation. The Attitude system is concerned with words or expressions used to reveal feelings, including emotional states or reactions, judgments of behavior and evaluations of things. Engagement is concerned with words or expressions used by the writer or speaker to position her statements and points of view. Finally, Gradation considers words that grade evaluations expressed in language (intensify, diminish, soften or sharpen them). The central aspect of the Appraisal Framework is the Attitude system, which is divided into three refined subsystems: Affect, Judgment and Appreciation. These define the specific type of appraisal used by the writer for negotiating her feelings and private states. Specifically, Affect deals with language resources (words or expressions) for constructing emotional states and reactions in texts (e.g. the words: relaxed, disgusted, excited, miserable). Judgment is concerned with resources for assessing behavior according to various normative principles and ethical rules (e.g. the words: honest, skillful and loser). Lastly, Appreciation looks at resources for evaluating the aesthetic qualities of objects and natural phenomena (e.g. the words: awful, magnificent, fabulous and horrible). The Attitude system also deals with the orientation of the appraisal and distinguishes whether it has a positive or negative semantic orientation. One difficult point that must be considered in the Attitude system is that the type and orientation of appraisal expressions depend on the context in which they occur. This is a hard challenge, also related to the ambiguity of subjective language and to contextual sentiment identification.

Fig. 2. Overall architecture proposed for the attitude classification method

3 Attitude Word Embedding Classification

This section introduces our method for attitude word classification. It constitutes a step towards more refined methods for understanding evaluative language from the Appraisal Theory perspective, and an advance beyond traditional systems for polarity classification in Spanish. Fig. 2 depicts the overall architecture of the proposal. It can be divided into two main parts: the word representation based on neural networks, and the training of one classifier for each attitude type and orientation. As illustrated in Fig. 2, the method takes as input an unlabeled corpus and a lexicon of words annotated with attitude types and orientation. First, the corpus is preprocessed, including sentence recognition, word segmentation, and word stemming, using the FreeLing tool [14]. Once the corpus has been preprocessed, it is given as input to the word embedding method in order to obtain a vector representation of the terms. This model creates vectors in a much lower dimensional space than the original Vector Space Model (VSM) [15], so it is a more efficient representation. Moreover, it is more expressive because the words are encoded as dense vectors with syntactic and semantic properties. As a consequence, semantically related words are close in this new vector space. Given this semantic representation of words, the next step is to use it for solving the problem of attitude classification. To achieve this objective, the words in the lexicon are represented by the vectors learned in the previous step. Then, considering this representation and the attitude labels associated with each entry in the lexicon, a training set is built for each type of attitude and each type of orientation. Using these training sets, five classifiers are trained to automatically recognize the attitude and orientation of unseen terms.

3.1 Unlabeled Corpora and Attitude Lexicon

Our proposal relies on unlabeled background corpora and an attitude lexicon to learn word representations. In contrast to the works presented in [7, 8], which use a specific corpus of evaluative sentences (LAM11), in this work SBW16 [4] was considered as an open-domain corpus useful for encoding the semantic and syntactic information associated with words, based on the distributional semantics principle. This has the advantage that a huge volume of textual information can be considered without requiring specific manually tagged corpora, which demand great effort from human experts. In order to establish a comparative analysis, LAM11 and RMC17 were also explored as background corpora. SBW16 consists of raw texts in Spanish with approximately 1.5 billion words, extracted from different resources on the Web. In it, all non-alphanumeric characters are replaced with whitespace, all numbers with the token “DIGITO”, and all multiple whitespaces with a single one. The capitalization of the words remains unchanged. LAM11 contains 56970 “attitudinal sentences” in Spanish, with around 1.3 million words. During the creation of the corpus, sentences were retrieved from distinct sources, specifically movie reviews, Mexican news, letters, stories and poems. Finally, RMC17 is composed of all sentences in LAM11 and SBW16. This merge was motivated by the need to take advantage of the huge volume of sentences in SBW16, useful for capturing the semantic properties associated with words, while also considering the appraisal information explicitly encoded in LAM11.
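The normalization applied to SBW16 can be illustrated with a short sketch. The function name, the order of the three substitutions, the treatment of the underscore as non-alphanumeric, and the final stripping of leading and trailing spaces are our own assumptions; the actual script used to build SBW16 [4] may differ in these details.

```python
import re

def normalize_sbw_style(text: str) -> str:
    """Approximate the SBW16-style normalization (hypothetical helper)."""
    # Replace every run of non-alphanumeric characters (including "_") with a space.
    text = re.sub(r"[\W_]+", " ", text)
    # Replace every run of digits with the placeholder token.
    text = re.sub(r"\d+", "DIGITO", text)
    # Collapse repeated whitespace; stripping the ends is an assumption.
    return re.sub(r"\s+", " ", text).strip()

# Capitalization and accented letters are left untouched.
print(normalize_sbw_style("¡Hola, mundo!  Tengo 25 años."))
```

Note that, under these assumptions, a decimal number such as 3.50 yields two “DIGITO” tokens, because the decimal point is itself a non-alphanumeric character.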

Table 1. Description of corpora w.r.t. number of sentences, words, and vocabulary size
Table 2. Distribution of words in the lexicons w.r.t. attitude and orientation

The distribution of sentences, words, and vocabulary size for each corpus is presented in Table 1. As can be observed, SBW16 and RMC17 are much larger than LAM11. This dramatic difference might have a direct impact on word representation, and on the effectiveness of the proposed method.

The attitude lexicon is a key resource because it is used to train and validate the attitude classifiers. Specifically, the LAM11-LEX lexicon [8] was considered. It has 3,005 word entries. Each word was manually annotated using three integer values (0, 1, and 2) to establish its polarity (positive (Pos), negative (Neg)) and its correspondence to the Attitude system (i.e., Judgment (Jud), Appreciation (App), Affect (Aff)), where 0 indicates the lowest and 2 the highest strength. The distribution of words in the lexicon with respect to attitude and orientation labels is shown in Table 2. Notice that there is a dramatic disproportion: the affect words (minority class) represent only 23.75% of the judgment words (majority class), and the appreciation words represent 50.56% of the majority class. Regarding positive and negative words, the problem is similar but less pronounced than for the attitude classes. As a consequence, the misclassification rate for the minority class may increase during attitude classification.

3.2 Attitude Word Representation Based on Neural Network Embedding

Word embeddings learned in an unsupervised manner have been a successful representation in numerous Natural Language Processing tasks in recent years. This technique has obtained results better than or similar to those of other complex models for representing words, such as Latent Semantic Analysis (LSA) [6] and Random Indexing (RI) [9], specifically when large corpora are used to learn the word vectors. The major contribution of the word embedding representation lies in its capability of capturing and encoding semantic similarities between words or phrases based on their distributional properties. In contrast to the representation used in [8], which assumes that the vectors of words (VSM) within sentences are good for capturing the attitude type and orientation of the words, this proposal uses word embeddings as a semantically richer representation, useful for improving the effectiveness of attitude word classification. To validate this assumption, Word2Vec [11] and FastText [3] were used for word representation.
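The notion that semantically related words lie close together in the embedding space is commonly measured with cosine similarity. The following sketch illustrates it with hypothetical four-dimensional vectors; the values are invented for illustration and are not taken from any model trained in this work, where the vectors have 300 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (assumption, for illustration only).
toy_vectors = {
    "magnífico": [0.9, 0.1, 0.3, 0.0],
    "fabuloso": [0.8, 0.2, 0.4, 0.1],
    "deshonesto": [-0.7, 0.6, 0.0, 0.2],
}

related = cosine_similarity(toy_vectors["magnífico"], toy_vectors["fabuloso"])
unrelated = cosine_similarity(toy_vectors["magnífico"], toy_vectors["deshonesto"])
# With these toy values the first pair is the more similar one.
print(related > unrelated)
```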

3.3 Classification Schema Proposed

Considering the theoretical aspects of the Appraisal Framework, attitude classification is a multi-class and multi-label problem, due to the great overlap between words in distinct attitude types (Appreciation, Judgment and Affect). However, in this work the problem was simplified, and we treated neither attitude type nor orientation classification as a multi-class or multi-label problem. Instead, the problem is modeled as a single-label binary classification task.

For recognizing attitude type and orientation, five binary classifiers were trained separately: three of them for identifying the type of attitude and the remaining two for classifying the semantic orientation. The training set for each class was built in the following way. First, all words associated with the class to be recognized were taken as positive instances, and the remaining words were considered as negative instances of the class (Footnote 1). It is important to clarify that words that belong to both classes (positive class and negative class) are removed from the negative instances and only considered as instances of the positive class. After that, the words were represented by their vectors obtained through the word embedding method. Once the training set was created, the next problem addressed was the imbalance of the minority class (positive) with respect to the majority class (negative). For that, an oversampling technique was applied. Specifically, the method called SMOTE (Synthetic Minority Over-sampling Technique) [5] was applied using the Python package Scikit-Learn-Contrib (Footnote 2). Its purpose is to increase the number of instances in the minority class and hence reduce the imbalance. Finally, different classifiers were trained for identifying each attitude type and orientation of the words. Regarding the classification methods, five distinct models were evaluated in this work. The motivation was to assess the quality of the vectors learned using word embeddings over large corpora for attitude word classification. No great effort was dedicated to choosing the best classification model and its optimal parameter settings. Specifically, the implementations of Linear Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), and Gaussian Naive Bayes (NB) provided by the Python package Scikit-Learn (Footnote 3) were applied. In addition, an ensemble of classifiers was evaluated, combining the four previous classifiers in a soft voting schema.
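The construction of a binary training set and the oversampling step can be sketched as follows. This is a simplified illustration written with the standard library only, not the SMOTE implementation used in the experiments: the function names, the toy lexicon entries, the two-dimensional vectors and the choice of k are all assumptions. The core idea it keeps is that each synthetic instance is an interpolation between a minority instance and one of its nearest minority neighbours.

```python
import random

def build_binary_dataset(lexicon, vectors, target):
    """Return (X, y) for one binary task: 1 if the word carries `target`, else 0."""
    X, y = [], []
    for word, labels in lexicon.items():
        if word not in vectors:  # words without an embedding are skipped
            continue
        X.append(vectors[word])
        # A word carrying the target label is always a positive instance,
        # even if it also carries other labels.
        y.append(1 if target in labels else 0)
    return X, y

def smote_like_oversample(X, y, k=2, seed=0):
    """Add synthetic minority points until both classes have the same size."""
    rng = random.Random(seed)
    minority_label = 1 if y.count(1) <= y.count(0) else 0
    minority = [x for x, label in zip(X, y) if label == minority_label]
    needed = abs(y.count(0) - y.count(1))
    X_new, y_new = list(X), list(y)
    if len(minority) < 2:
        return X_new, y_new
    for _ in range(needed):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to `base`.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        # Interpolate between the base point and the chosen neighbour.
        X_new.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_new.append(minority_label)
    return X_new, y_new

# Hypothetical lexicon entries and 2-dimensional vectors (assumptions).
lexicon = {
    "feliz": {"Aff", "Pos"}, "triste": {"Aff", "Neg"},
    "honesto": {"Jud", "Pos"}, "cobarde": {"Jud", "Neg"}, "torpe": {"Jud", "Neg"},
    "hermoso": {"App", "Pos"}, "horrible": {"App", "Neg"},
}
vectors = {
    "feliz": [0.9, 0.8], "triste": [0.7, 0.9],
    "honesto": [0.1, 0.2], "cobarde": [0.2, 0.1], "torpe": [0.3, 0.3],
    "hermoso": [0.5, 0.1], "horrible": [0.6, 0.2],
}
X, y = build_binary_dataset(lexicon, vectors, "Aff")
X_bal, y_bal = smote_like_oversample(X, y)
print(y.count(1), y.count(0))          # class sizes before oversampling
print(y_bal.count(1), y_bal.count(0))  # class sizes after oversampling
```

In this toy case the Affect task starts with two positive and five negative instances, and three synthetic positive instances are added to balance it.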

4 Experiments and Results

The evaluation of the proposed method is hampered by the lack of benchmark collections in the scientific literature that could be used to validate the results and establish a strict comparison with previous approaches. As explained in Sect. 1, the proposal introduced by Hernández et al. (2011) [8] is the most similar to this work. For that reason, their validation strategy was adopted here. This allows us to consider their results as a baseline and to establish a comparative analysis. The validation process relies on the manual annotations provided by the attitude lexicon. In this case, the labels associated with each word in the lexicon LAM11-LEX were used as the “gold standard”. The words in the lexicon were partitioned into training and test subsets using stratified five-fold cross-validation. For measuring the effectiveness of our classification method, two global measures were considered: the first one (F1-ATT) measures the overall quality of attitude type recognition, and the second one (F1-SO) that of attitude orientation recognition. In addition, the F1 measure was reported for each class.
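The per-class F1 measure and the two global measures can be sketched as follows. We assume here that F1-ATT and F1-SO are unweighted (macro) averages of the per-class F1 values, over the three attitude types and the two orientations respectively; the function names and all numeric values in the example are hypothetical.

```python
def f1_score_binary(y_true, y_pred):
    """F1 of the positive class (label 1) for one binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 values (assumed form of F1-ATT / F1-SO)."""
    return sum(per_class_f1.values()) / len(per_class_f1)

# Toy predictions: 2 true positives, 1 false positive, 1 false negative.
toy_f1 = f1_score_binary([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])

# Hypothetical per-class values, not results from this paper.
f1_att = macro_f1({"App": 0.8, "Aff": 0.6, "Jud": 0.7})
f1_so = macro_f1({"Pos": 0.9, "Neg": 0.8})
print(toy_f1, f1_att, f1_so)
```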

In the validation of the proposed method, the main interest was in analyzing the impact of using word embedding techniques for representing attitudinal words. To this end, the Word2Vec and FastText models were applied to the three background corpora (LAM11, SBW16, and RMC17) in order to learn dense semantic vectors for representing the words. The Word2Vec parameters were modified to use the skip-gram model and to fix the size of the word embedding vectors at 300; the default values of the remaining parameters were maintained as reported in [11]. For FastText, the default values reported in [3] were used, apart from the vector size, which we set to 300 to match Word2Vec. Notice that FastText uses the sub-word structure for learning word vectors. For this reason, when this model was used, the word stemming carried out in the preprocessing stage was skipped.
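The sub-word structure exploited by FastText consists of character n-grams extracted from each word after adding boundary markers. The sketch below illustrates that extraction; the function name is our own, the range of 3 to 6 characters corresponds to the usual FastText defaults, and the special sequence for the whole word that FastText also keeps is omitted here for brevity.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word` wrapped in the boundary markers < and >."""
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

# Trigrams only, to keep the output short.
print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

# Morphologically related words share n-grams without any stemming step.
shared = set(char_ngrams("feliz", 3, 3)) & set(char_ngrams("felicidad", 3, 3))
print(sorted(shared))
```

Because inflected and derived forms share many n-grams, their vectors share parameters, which is consistent with skipping the stemming step when FastText is used.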

Regarding the parameter settings of the classification methods, no great effort was made in this work to estimate the optimal parameters. The main goal is to discover how the embedding vector representation contributes to the task of attitude word classification. Therefore, only a slight modification of parameters was carried out, based on empirical knowledge. Specifically, for the Random Forest classifier (RF) the number of trees in the forest was set to 300. For the Nearest Neighbors classifier (KNN) the number of neighbors was set to 3. The linear kernel was chosen for the Support Vector Machine method (SVM), and the Gaussian Naive Bayes classifier (NB) was used with default settings. These classifiers were combined in an ensemble (ENS) using a weighted voting schema. The weight assigned to each classifier was set empirically (limiting the coefficient values to the interval [0,1]), taking into account the performance of the base methods.
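A weighted soft voting schema of this kind can be sketched as follows. We assume that each base classifier provides a probability for the positive class and that the ensemble score is their weighted average; the weights and probabilities shown are hypothetical, since the values actually used in the experiments are not reported here.

```python
def weighted_soft_vote(probabilities, weights, threshold=0.5):
    """Combine positive-class probabilities with per-classifier weights.

    probabilities: {classifier_name: P(positive class)}
    weights:       {classifier_name: weight in [0, 1]}
    Returns the weighted average score and the predicted binary label.
    """
    total_weight = sum(weights[name] for name in probabilities)
    score = sum(weights[name] * p for name, p in probabilities.items()) / total_weight
    return score, int(score >= threshold)

# Hypothetical outputs of the four base classifiers for one word.
probs = {"SVM": 0.80, "RF": 0.60, "KNN": 0.33, "NB": 0.20}
# Hypothetical weights favouring the stronger base methods.
weights = {"SVM": 1.0, "RF": 0.8, "KNN": 0.4, "NB": 0.2}

score, label = weighted_soft_vote(probs, weights)
print(round(score, 3), label)
```

With equal weights the same function reduces to plain soft voting, which makes it easy to check how much the weighting changes the ensemble decision.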

In the first experiment, vector representations for the words in LAM11-LEX were learned using the Word2Vec model over the LAM11, SBW16 and RMC17 corpora separately. Based on these representations, the five classifiers were trained and validated using stratified five-fold cross-validation. Table 3 shows the F1 measure by class and the macro-averaged values obtained. The first column shows the corpus used for learning the vector representation and the second shows the classification method used. The third to fifth columns show the F1 measure achieved for the three types of attitude considered in this work, whereas the sixth and seventh show the F1 measure for the attitude orientation. The next two columns show the global effectiveness in attitude recognition and orientation classification, respectively. The last column (WV) gives the number of words in the lexicon for which it was possible to build a vector representation with the model and corpus used.

Table 3. Results achieved for attitude type and orientation classification in LAM11-LEX using Word2Vec model on three distinct background corpora

Several observations can be made by analyzing the results in Table 3. First, the number of words in LAM11-LEX that can be represented from SBW16 is greater than from LAM11. Moreover, RMC17 is the corpus that best covers the words in the lexicon, although vector representations for 193 words could not be built from it. The constraint that a word should appear at least 5 times in the corpus passed as input to the Word2Vec model may be the reason why these words were not represented. This problem will be addressed in further research.
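The effect of the minimum frequency constraint on lexicon coverage can be illustrated with a small sketch. The function name and the toy corpus are assumptions; only the threshold of 5 occurrences comes from the Word2Vec setting discussed above.

```python
from collections import Counter

def covered_words(lexicon_words, corpus_tokens, min_count=5):
    """Split lexicon words into those frequent enough to get a vector and the rest."""
    counts = Counter(corpus_tokens)
    covered = {w for w in lexicon_words if counts[w] >= min_count}
    missing = set(lexicon_words) - covered
    return covered, missing

# Hypothetical toy corpus: "atroz" is too rare and "sublime" never occurs.
tokens = ["bueno"] * 7 + ["malo"] * 5 + ["atroz"] * 2
covered, missing = covered_words(["bueno", "malo", "atroz", "sublime"], tokens)
print(sorted(covered), sorted(missing))
```

The size of the covered set plays the role of the WV column: words below the threshold, or absent from the corpus, are excluded from both training and validation.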

Regarding corpora, it is clear that the lowest results were obtained using the LAM11 corpus. This performance could be conditioned by the corpus size: it probably does not provide enough sentences to adequately train the Word2Vec model, hence noisy word vectors were obtained. The results achieved using SBW16 show a considerable increase with respect to those obtained from LAM11. The classes Appreciation, Affect and Judgment are learned most clearly with the SVM and ENS methods, achieving F1-ATT values of 0.692 and 0.683, respectively. On the other hand, positive and negative words are better recognized with the RF and ENS methods, obtaining F1-SO values of 0.871 and 0.868, respectively. The results obtained on RMC17 show a performance similar to those derived from SBW16. Surprisingly, no increase in terms of F1-ATT and F1-SO was obtained when this corpus was considered. The values of F1-ATT (0.688) and F1-SO (0.858) obtained by SVM and RF show a slight drop with respect to the previous results. Notice that, with this corpus, 99 more words than with SBW16 were represented and incorporated into the training and validation sets. Adding these words may be related to the decrease in results.

The second experiment follows a structure similar to the previous one. The same classifiers, attitude lexicon and background corpora were used. It differs from the experiment above in the model applied to build the word representation. The main goal was to evaluate the impact of using FastText as the technique for representing words. Table 4 shows the results obtained by using this representation. As can be observed, the number of words represented with this model increases from LAM11 to SBW16 and from SBW16 to RMC17. In the case of LAM11, the classifiers that achieved the best performance for Appreciation, Affect and Judgment were the ENS (0.640) and SVM (0.634) methods, whereas the best values for positive and negative word classification were obtained with SVM (0.730) and RF (0.722). A remarkable increase in terms of F1-ATT (0.719) and F1-SO (0.886) was achieved when SBW16 was used as background corpus (most words in LAM11-LEX could be represented). In addition, the most significant improvement in F1-ATT (0.722) and F1-SO (0.889) was achieved using the RMC17 corpus combined with the ENS method.

Analyzing Tables 3 and 4 together, several observations can be made. First, the proposed method obtains very good results in terms of macro-averaged F1 for both attitude classification and orientation recognition when the word representation is built on large unlabeled corpora (SBW16 and RMC17). These results support the hypothesis that, relying on large unlabeled corpora, good semantic vector representations can be learned with word embedding techniques to improve the task of attitude word classification in Spanish. Second, the results achieved with FastText show an important increase in both F1-ATT and F1-SO. More words can also be represented using it. With respect to the classifiers, in general, SVM, RF and ENS showed the best performance. Contrary to expectations, no significant differences were found among ENS, SVM and RF; this suggests that the weights assigned to each base classifier need to be tuned in order to increase the quality of the ENS method. Finally, it can be clearly observed that Appreciation was the best recognized attitude class, whereas Judgment was the most difficult. This could be related to the fact that, even from the Appraisal Theory perspective, words or phrases used to express judgments are more subjective than words used for evaluating aesthetic aspects of objects.

Fig. 3. Comparison w.r.t. F1 measure against the best results achieved by Hernández et al. (2011) [8]

Table 4. Results achieved for attitude type and orientation classification in LAM11-LEX using FastText model on three distinct background corpora

Finally, a comparison is made between the best results achieved by the proposed method (henceforth ENS-FAST) and the most significant results (henceforth LH11+SVM and LH11-RF) obtained by Hernández et al. (2011) [8]. As can be seen in Fig. 3, the proposed method obtained a remarkable increase in the recognition of all classes. In particular, the positive and negative classes were the most improved by ENS-FAST, whereas Affect was the attitude class with the largest increase in F1 measure. Based on these considerations, it is possible to conclude that the proposed method clearly outperforms LH11+SVM and LH11-RF.

5 Conclusions

Recognizing and classifying words according to the attitude type and orientation that they convey is an important step towards applying fine-grained models based on Appraisal Theory to the analysis of evaluative language. In this work we showed an improvement in the automatic classification of the attitude type and orientation of Spanish words. These results were achieved using two approaches that rely on neural network word embeddings (Word2Vec and FastText) for learning good vectors from large unlabeled corpora. One direction for future work is to study the impact of different parameter settings of the neural network word embeddings on the classification method. We also plan to analyze the effectiveness of the proposal for classifying new attitude words outside LAM11-LEX and for extending popular Spanish opinion lexicons with attitudes. We will also work on the classification of multi-word expressions and idioms rather than individual words. Finally, great effort will be made to tackle the problem with a multi-class and multi-label approach, considering the overlap inherent to the Attitude system of the Appraisal Framework.