1 Introduction

With the growth of the Internet and social networking services such as Twitter and Facebook, people can easily share evaluations, opinions, emotions, and impressions of the products and services they use. Many companies collect such consumer review text and use it as a source of information for product development and marketing. Identifying and summarizing opinions in online reviews is therefore a valuable and challenging task. Sentiment analysis (SA), or opinion mining, is the research field that addresses this task. SA research can be divided into three levels: document-level, sentence-level, and aspect-level. This paper focuses on aspect-level SA. Aspect-level SA generally consists of three processes: sentiment identification, sentiment classification, and aggregation. Figure 1 illustrates these processes. In the first process, both sentiment expression words (SEWs) and aspects are identified; some surveys address only part of this process, focusing on either SEWs or aspects. In the second process, SEWs and aspects are classified by sentiment value (e.g. positive or negative). In the third process, the sentiment values are aggregated for each aspect to provide a brief overview. This paper focuses on the first process, sentiment identification.

Fig. 1. Aspect-level sentiment analysis process

As mentioned above, both SEWs and aspects are identified in the first process. Because aspect detection is a primary task of sentiment identification, many aspect detection methods have been proposed. However, SEW identification is also an important task. In many cases, SEWs are identified with a sentiment dictionary, which contains a large set of expression words, and the sentiment values of aspects can then be scored using the sentiment values of SEWs obtained from the dictionary. Most sentiment identification and classification methods therefore depend on sentiment dictionaries. However, SEWs are reported to take many different forms in each entity domain, so some SEWs cannot be extracted with a sentiment dictionary, and sentiment identification and classification may perform poorly as a result. SEW identification is therefore an important task, and many syntax-based models have been applied to it. Syntax-based models can handle low frequency words, but they must take many syntactic relations into account, which may not be practical. It is therefore difficult to identify SEWs with a syntax-based model.

This paper proposes a method for SEW identification that aims to identify unique SEWs, that is, SEWs that do not exist in sentiment dictionaries. We show that, given word sets that are likely to contain SEWs, SEWs can be identified from these sets efficiently and unique SEWs can be extracted effectively with supervised learning. Data sets for supervised learning are created based on the characteristics of a quality table (QT), a binary table used in the quality function deployment methodology. The structure of a QT gives each word a set of features, obtained with pointwise mutual information (PMI), and a supervised classifier learns these features. In supervised classification the data set may be unbalanced, so the synthetic minority over-sampling technique (SMOTE), an algorithm designed for unbalanced data, is applied. The proposed model is non-syntactic and relation-based, which avoids the problems of syntax-based models. An experimental test shows how many unique SEWs are extracted, and the coverage of SEWs is verified with annotated text.

2 Related Work

Many aspect-level SA methods have been developed for both English and Japanese, and the same three processes are employed in both languages. In this section, we discuss related work in English and Japanese and clarify how this paper differs from it.

Aspect-level SA in English involves three challenging tasks: aspect detection, sentiment analysis, and joint aspect detection and sentiment analysis. Aspect detection identifies the aspects of products or services in text data about a target entity. Two types of model are frequently used for aspect detection: frequency-based and syntax-based. Frequency-based models treat high frequency single nouns or compound nouns as aspects. This straightforward method is powerful, and many approaches apply it. However, it clearly ignores low frequency words, even though such words can be valuable aspects for both customers and companies, and it is also prone to mistaking high frequency words for aspects. A method that can also treat low frequency words as aspects is therefore needed, and to address this problem, a method that determines the threshold value statistically has been proposed. Syntax-based models analyze the syntactic relationships in sentences and identify words that fit those relationships as aspects. The simplest syntactic relationship is dependency: for example, in "wonderful design", the expression word "wonderful" is an adjective modifying the aspect "design". The strength of syntax-based models is that low frequency words can be extracted as aspects. However, text written by Internet users, such as customer reviews, contains many irregular expressions that do not fit standard syntactic relations, so many syntactic relations must be considered to achieve good results. In addition to these two types of model, methods using supervised and unsupervised learning have also been proposed.

In Japanese, SEW identification is also an important task. Aspect detection is likewise mostly frequency-based or syntax-based, and the same problems appear. Many syntax-based models, such as anaphora analysis and dependency-based models, have been proposed for Japanese aspect-level SA. Japanese in particular has more ways than English to express the same meaning; for example, the meaning "good" can be expressed with many different words. In addition, text from the Internet is hard to analyze syntactically. It is therefore difficult to identify SEWs with a syntax-based model.

Sentiment analysis involves inferring the sentiment value of each aspect from predefined values such as "positive" or "negative" and classifying the aspects accordingly. Dictionary-based models and supervised learning are frequently applied. Dictionary-based models infer sentiment values with a dictionary of prepared SEWs and their sentiment values. Such dictionaries are either created directly from the corpus or prepared in advance, as with open source dictionaries. Open source dictionaries contain large numbers of SEW and sentiment value pairs; however, SEWs that do not appear in the dictionary cannot be taken into consideration. Because customer reviews contain different SEWs for each entity, a sentiment dictionary needs to be constructed for each entity, and methods that create sentiment dictionaries directly from customer reviews are actively being proposed. Methods for estimating sentiment values include estimation based on association with SEWs (such as the SO-Score) and direct inference from customer reviews with supervised learning.
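The limitation of dictionary-based models can be illustrated with a minimal Python sketch. The toy dictionary and the function `score_aspect` are hypothetical, chosen only to show that an SEW missing from the dictionary contributes nothing to the score.

```python
# Hypothetical toy sentiment dictionary mapping SEWs to polarity scores
# (an assumption for illustration; not a dictionary used in this paper).
SENTIMENT_DICT = {"good": 1, "satisfied": 1, "bad": -1, "dissatisfied": -1}


def score_aspect(sews, dictionary=SENTIMENT_DICT):
    """Sum the dictionary scores of the SEWs attached to one aspect.

    Returns (score, unknown), where `unknown` lists SEWs absent from the
    dictionary; a dictionary-based model simply cannot score them.
    """
    score = 0
    unknown = []
    for word in sews:
        if word in dictionary:
            score += dictionary[word]
        else:
            unknown.append(word)
    return score, unknown


# "slippery" is a domain-specific SEW missing from the toy dictionary.
print(score_aspect(["good", "slippery"]))  # (1, ['slippery'])
```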

In summary, both sentiment identification and sentiment classification depend on sentiment dictionaries. However, Internet text frequently contains expressions that are not included in sentiment dictionaries and that are hard to handle with syntax-based models, which makes identifying and classifying aspects difficult.

This paper proposes a quality table-based method for SEW identification. The method is non-syntactic and relation-based, and it is intended to solve the problems of both frequency-based and syntax-based models.

3 Techniques and Method

3.1 Quality Table

The QT is a binary table representing the relationship between required qualities, which are the customer's demands for a product or service, and product characteristics, which are aspects of the product or service. The QT is used in the quality function deployment methodology. With the QT, required qualities can be translated into product characteristics. In other words, the voice of the customer can be converted into the voice of the engineer, so that product design can reflect the customers' requests.

This research focuses on this property of the QT: it represents the relationship between each required quality and each product characteristic. Each required quality can therefore be described as a feature vector consisting of its relationships with the product characteristics. In the same way, a word can be characterized by its relationship with each aspect. Characterizing words by aspects makes it clear that some words are likely to be SEWs and others are not, so that some are extracted as SEWs and the others are not. On this basis, this research proposes a QT-based method for SEW identification. This paper discusses only sentiment identification. However, because the QT-based method preserves the relationship between each SEW and its aspect, it is expected to be applicable to sentiment classification and aggregation as well, using conventional sentiment analysis methods (e.g. SO-score) and statistical methods for aggregation (e.g. principal component analysis).
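A small binary QT can be sketched in Python to show how each row acts as a feature vector and how a required quality is translated into characteristics. The required qualities, characteristics, and 0/1 entries below are hypothetical examples, not data from this study; in the proposed method the binary entries are replaced by PMI values between words and aspects.

```python
# Hypothetical binary quality table: rows are required qualities, columns
# are product characteristics (aspects). All entries are invented.
CHARACTERISTICS = ["weight", "size", "battery", "lens"]
QUALITY_TABLE = {
    "easy to carry": [1, 1, 0, 0],
    "long shooting time": [0, 0, 1, 0],
    "sharp photos": [0, 0, 0, 1],
}


def to_characteristics(required_quality):
    """Translate a required quality into the characteristics it relates to."""
    row = QUALITY_TABLE[required_quality]
    return [c for c, flag in zip(CHARACTERISTICS, row) if flag]


def feature_vector(required_quality):
    """The QT row itself serves as the feature vector of a required quality."""
    return tuple(QUALITY_TABLE[required_quality])


print(to_characteristics("easy to carry"))  # ['weight', 'size']
```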

3.2 SMOTE Algorithm

One challenge in classification arises when some classes have far fewer samples than others. Such data sets are called unbalanced, and general supervised learning approaches may fail to solve such classification problems. Many algorithms have been proposed to address this issue. In this research, the number of seed expressions in the positive class is much smaller than the number of words in the unlabeled class, so the data are unbalanced, and the synthetic minority over-sampling technique (SMOTE) is applied to solve the problem. SMOTE (Chawla et al. 2002) is a well-known algorithm for this purpose. Here it is used to over-sample the minority class and under-sample the majority class, as illustrated in Fig. 2; increasing the number of minority samples and decreasing the number of majority samples yields balanced data from unbalanced data. The minority samples are increased by creating synthetic samples, and the majority samples are decreased by random under-sampling. For each minority sample, one of its k nearest minority-class neighbors is chosen at random, and a synthetic sample is created at a random point on the line segment between the two.
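The SMOTE procedure described above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this study (which was carried out in R); the function name `smote`, its parameters, and the default of under-sampling the majority class to the size of the enlarged minority class are assumptions.

```python
import math
import random


def smote(minority, majority, n_synthetic_per_sample=2, k=3,
          n_majority_keep=None, seed=0):
    """Minimal SMOTE sketch combined with random under-sampling.

    minority, majority: lists of feature vectors (lists of floats);
    the minority class needs at least two samples.
    For each minority sample, `n_synthetic_per_sample` synthetic samples are
    placed at random points on the segment joining it to one of its k nearest
    minority-class neighbours (Euclidean distance). The majority class is then
    randomly under-sampled to `n_majority_keep` samples (by default, the size
    of the enlarged minority class).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(n_synthetic_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # random position in [0, 1) along x -> nb
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    new_minority = minority + synthetic
    if n_majority_keep is None:
        n_majority_keep = len(new_minority)
    n_majority_keep = min(n_majority_keep, len(majority))
    new_majority = rng.sample(majority, n_majority_keep)
    return new_minority, new_majority


# Toy data: 4 "seed expression" vectors and 20 "unlabeled word" vectors.
minority = [[2.0, 1.5], [1.8, 1.7], [1.9, 1.4], [2.1, 1.6]]
majority = [[0.1 * i, 0.05 * i] for i in range(20)]
new_min, new_maj = smote(minority, majority)
print(len(new_min), len(new_maj))  # 12 12
```

Because each synthetic sample is an interpolation between two real minority samples, the new points stay inside the region occupied by the seed expressions rather than duplicating them exactly.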

Fig. 2. SMOTE algorithm

3.3 Pointwise Mutual Information

Pointwise mutual information (PMI) is a measure of association. The PMI between two words, \( {\text{word}}_{1} \) and \( {\text{word}}_{2} \), is defined as follows (Church and Hanks 1989):

$$ {\text{PMI}}\left( {{\text{word}}_{1} ,{\text{word}}_{2} } \right) = \log_{2} \left( {\frac{{{\text{p}}({\text{word}}_{1} \,\& \,{\text{word}}_{2} )}}{{{\text{p}}({\text{word}}_{1} ){\text{p}}({\text{word}}_{2} )}}} \right) $$

Here, \( {\text{p}}({\text{word}}_{1} \,\& \,{\text{word}}_{2} ) \) is the probability that \( {\text{word}}_{1} \) and \( {\text{word}}_{2} \) co-occur. If the words are statistically independent, the probability that they co-occur is given by \( {\text{p}}({\text{word}}_{1} ){\text{p}}({\text{word}}_{2} ) \). The ratio between \( {\text{p}}({\text{word}}_{1} \,\& \,{\text{word}}_{2} ) \) and \( {\text{p}}({\text{word}}_{1} ){\text{p}}({\text{word}}_{2} ) \) is therefore a measure of the degree of statistical dependence between the words. In the QT-based method, the relationships in the QT are given by PMI. Given a word set \( \{ {\text{word}}_{1} , \ldots ,{\text{word}}_{\text{n}} \} \) and an aspect set \( \{ {\text{aspect}}_{1} , \ldots ,{\text{aspect}}_{\text{m}} \} \), \( {\text{PMI}}_{\text{ij}} \) is defined as follows:

$$ {\text{PMI}}_{\text{ij}} = {\text{PMI}}\left( {{\text{word}}_{\text{i}} ,{\text{aspect}}_{\text{j}} } \right) = \log_{2} \left( {\frac{{{\text{p}}({\text{word}}_{\text{i}} \,\& \,{\text{aspect}}_{\text{j}} )}}{{{\text{p}}({\text{word}}_{\text{i}} ){\text{p}}({\text{aspect}}_{\text{j}} )}}} \right) \left( {\begin{array}{*{20}c} {{\text{i}} = 1, \ldots ,{\text{n}}} \\ {{\text{j}} = 1, \ldots ,{\text{m}}} \\ \end{array} } \right) $$

\( {\text{PMI}}_{\text{ij}} \) is a measure of association between \( {\text{word}}_{\text{i}} \) and \( {\text{aspect}}_{\text{j}} \). Thus \( {\text{word}}_{1} \), for example, can be characterized by \( {\text{PMI}}_{{1{\text{j}}}} \) for each \( {\text{aspect}}_{\text{j}} \). In the same way, each seed expression can be characterized with PMI. A classifier then learns the features of the seed expressions and classifies each word according to whether its features are similar to those of the seed expressions. Words with similar features can be expected to be SEWs; identifying them is the main purpose of this QT-based method.
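The word-aspect feature matrix \( {\text{PMI}}_{\text{ij}} \) can be sketched in Python as follows, estimating each probability as the fraction of sentence units that contain the terms (co-occurrence meaning appearance in the same sentence). The function `pmi_matrix`, the toy sentences, and the `missing` fill value for pairs that never co-occur are hypothetical; the paper does not state how zero co-occurrence, where the logarithm is undefined, is handled.

```python
import math


def pmi_matrix(sentences, words, aspects, missing=0.0):
    """Return rows of PMI(word_i, aspect_j) estimated from sentence units.

    sentences: list of token lists, one per sentence unit.
    Pairs that never co-occur get `missing` (an arbitrary choice here).
    """
    sets = [set(s) for s in sentences]
    n = len(sets)

    def p(*terms):
        # fraction of sentence units containing all the given terms
        return sum(all(t in s for t in terms) for s in sets) / n

    matrix = []
    for w in words:
        row = []
        for a in aspects:
            joint = p(w, a)
            if joint > 0:
                row.append(math.log2(joint / (p(w) * p(a))))
            else:
                row.append(missing)  # log2(0) is undefined
        matrix.append(row)
    return matrix


sentences = [["good", "design"], ["good", "design"],
             ["image quality", "beautiful"], ["image quality"]]
m = pmi_matrix(sentences, ["good", "beautiful"], ["design", "image quality"])
print(m)  # [[1.0, 0.0], [0.0, 1.0]]
```

Each row is the feature vector of one word over the aspects, playing the role that a required-quality row plays in a QT.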

3.4 Proposed Method

In this section, the QT-based method for SEW identification is described. Before the method is applied, preprocessing is performed as shown in Fig. 3.

Fig. 3. Structure of this paper

The input data are a set of customer reviews and a set of seed expressions. Seed expressions are often used as clues to identify SEWs in the sentiment identification process. Four seed expressions are commonly used: "良い" (good), "最高" (satisfied), "悪い" (bad), and "不満" (dissatisfied). This research uses the same four seed expressions.

Preprocessing consists of three steps: dividing reviews into sentence units, morphological parsing, and aspect detection. Consumer reviews are divided into sentence units at end-of-sentence punctuation such as ".", "!", and "?". Morphological parsing is then performed to determine the morphemes that make up each word. Finally, aspect detection is performed.

Frequency-based aspect detection models perform well but cannot consider low frequency words, and they also tend to extract high frequency words erroneously as aspects. Syntax-based models can also cover low frequency words, but many syntactic relations must be considered. This paper therefore focuses on the co-occurrence relationship with seed expressions, where two words are said to co-occur if they appear in the same sentence. Considering co-occurrence makes it possible to discard weakly related words even when they are frequent and to keep strongly related words even when they are rare. Aspects are therefore extracted based on co-occurrence, measured with PMI: the PMI of each single noun and compound noun is calculated, a PMI threshold is determined statistically, and only the single or compound nouns that exceed the threshold are extracted. However, not all words obtained in this way are necessarily aspects, so aspects must still be selected from the extracted words.
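A minimal Python sketch of co-occurrence-based aspect screening follows. It assumes, since the paper does not specify these details, that a noun's score is its maximum PMI with any seed expression and that the "statistically determined" threshold is simply the mean score; the function `screen_aspects` and the toy sentences are hypothetical. The example shows the point made above: the frequent noun "camera" is discarded while the rare noun "strap" is kept.

```python
import math
from statistics import mean


def screen_aspects(sentences, nouns, seeds):
    """Keep nouns strongly associated with the seed expressions.

    Assumptions (not specified in the paper): a noun's score is its maximum
    PMI over the seed expressions it co-occurs with, and the threshold is the
    mean of all scores. Nouns that never co-occur with a seed are dropped.
    """
    sets = [set(s) for s in sentences]
    n = len(sets)

    def p(*terms):
        # fraction of sentence units that contain all given terms
        return sum(all(t in s for t in terms) for s in sets) / n

    scores = {}
    for noun in nouns:
        pmis = [math.log2(p(noun, s) / (p(noun) * p(s)))
                for s in seeds if p(noun, s) > 0]
        if pmis:
            scores[noun] = max(pmis)
    if not scores:
        return [], scores
    threshold = mean(list(scores.values()))
    kept = sorted(w for w, v in scores.items() if v > threshold)
    return kept, scores


# "camera" appears in 6 of 8 sentence units, "strap" in only 1.
sentences = [
    ["camera", "good"], ["camera"], ["camera"], ["camera", "bad"],
    ["strap", "good"], ["camera"], ["good", "design"], ["camera"],
]
kept, scores = screen_aspects(sentences, ["camera", "strap", "design"], ["good", "bad"])
print(kept)  # ['design', 'strap']
```

A pure frequency threshold on the same data would keep "camera" and drop "strap", which is the behavior the co-occurrence criterion is meant to avoid.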

The selection is carried out by comparison with the product characteristics, which can be categorized into the seven elements shown in Table 1; aspects are extracted according to these seven elements. The method is then applied to identify sentiment expressions. Many syntax-based models have been applied to SEW identification, but they need many syntactic relations to produce good results. This paper instead proposes a QT-based method for SEW identification, which consists of two processes: word extraction and classification. Figure 4 describes these processes.

Table 1. Seven elements of product characteristics

Fig. 4. Processes of the QT-based method

In the word extraction process, words that may be SEWs are extracted. SEWs consist of single words or compound words, and word extraction is generally based on part of speech (POS). In Japanese text, the most frequent POS of single-word SEWs are adjectives, nouns, verbs, and adverbs. However, the POS combinations of compound SEWs cannot be fixed in advance because there are too many possible patterns. In this paper, customer reviews are annotated, the POS patterns of both single and compound SEWs are determined from the annotation results, and the word set is obtained with these patterns.
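Candidate extraction by POS pattern can be sketched in Python as below, using the single-word POS (noun, adjective, verb, adverb) and the two double-word patterns, Verb + Adjective and Noun + Verb, found in the annotation. The input format, a list of `(surface, pos)` pairs per sentence with IPADIC-style Japanese POS labels such as MeCab-based parsers output, is an assumption, and the example tokens are hand-written rather than real parser output.

```python
# IPADIC-style POS labels (assumed input format).
SINGLE_POS = {"名詞", "形容詞", "動詞", "副詞"}  # noun, adjective, verb, adverb
DOUBLE_PATTERNS = {("動詞", "形容詞"), ("名詞", "動詞")}  # verb+adjective, noun+verb


def extract_candidates(sentences):
    """Collect candidate SEWs (single and double words) from POS-tagged sentences.

    sentences: list of sentences, each a list of (surface, pos) tuples.
    """
    candidates = set()
    for tokens in sentences:
        for surface, pos in tokens:
            if pos in SINGLE_POS:
                candidates.add(surface)
        # adjacent token pairs matching a double-word POS pattern
        for (s1, p1), (s2, p2) in zip(tokens, tokens[1:]):
            if (p1, p2) in DOUBLE_PATTERNS:
                candidates.add(s1 + s2)
    return candidates


tagged = [
    [("レンズ", "名詞"), ("が", "助詞"), ("傷付き", "動詞"), ("やすい", "形容詞")],
    [("表面", "名詞"), ("が", "助詞"), ("つるつる", "名詞"), ("する", "動詞")],
]
cands = extract_candidates(tagged)
print("傷付きやすい" in cands, "つるつるする" in cands)  # True True
```

Particles such as "が" match no pattern and are never proposed; the candidate set is deliberately broad, since the classification step decides which candidates are SEWs.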

In the classification and identification process, supervised classification is used to obtain SEWs from the word set. The training set is created based on the relationships of a quality table. First, the relationship between each aspect and each seed expression is computed with PMI, producing a seed expression-aspect deployment. Next, in the same way, the relationship between each word in the word set and each aspect is computed with PMI, producing a words-aspect deployment. The two deployments are combined into the data set for supervised learning; Table 2 illustrates the training set. The classification target is the word set, which is to be classified into two classes, "positive" or unlabeled, and the features of each target are its PMI values with each aspect. The classifier learns the features of the seed expressions and finds words in the word set whose features are similar.

Because the number of seed expressions is much smaller than the number of words in the word set, the "positive" class is extremely small compared with the unlabeled class, and the data set is therefore unbalanced. SMOTE is applied to address this. Finally, words from the word set classified as "positive" are extracted as SEWs. The extracted SEWs are added to the seed expressions, and the classification and identification process is repeated several times to obtain more SEWs. The classification results are evaluated by precision. In the experimental test, the number of SEWs obtained from the word set is compared with the number of SEWs in a sentiment dictionary. In the verification test, precision and recall are calculated to verify the coverage of the annotated SEWs.
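The iterative loop described above (classify, move newly found SEWs into the seed set, repeat) can be sketched in Python as follows. To keep the sketch self-contained with only the standard library, the SVM used in this study is replaced by a simple nearest-centroid rule and the SMOTE balancing step is omitted; both are substitutions for illustration, not the author's method. The function names and the two-aspect feature vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def iterate_sew_identification(features, seeds, n_rounds=6):
    """Repeat classification, moving newly found SEWs into the seed set.

    features: dict word -> PMI feature vector over aspects (seeds included).
    Stand-in classifier (NOT the SVM used in the paper): a word is predicted
    'positive' if it is closer to the positive centroid than to the
    unlabeled centroid. Returns SEWs in order of discovery.
    """
    positive = set(seeds)
    found = []
    for _ in range(n_rounds):
        unlabeled = [w for w in features if w not in positive]
        if not unlabeled:
            break
        cp = centroid([features[w] for w in positive])
        cu = centroid([features[w] for w in unlabeled])
        new = [w for w in unlabeled
               if math.dist(features[w], cp) < math.dist(features[w], cu)]
        if not new:
            break
        positive.update(new)  # newly found SEWs join the seed set
        found.extend(new)
    return found


features = {
    "good": [2.0, 1.5], "bad": [1.8, 1.7],  # seed expressions
    "slippery": [1.9, 1.6], "box": [-0.5, 0.2], "cable": [0.1, -0.4],
    "manual": [0.0, 0.0], "fragile": [1.2, 1.1],
}
print(iterate_sew_identification(features, ["good", "bad"]))  # ['slippery', 'fragile']
```

The default of six rounds mirrors the number of repetitions used in the verification; the loop stops early once a round finds no new SEWs.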

Table 2. Training set for supervised classification

4 Results

Following the method, an experiment was performed on Japanese text. The results were evaluated for effectiveness, and for how many new SEWs that do not exist in an open source sentiment dictionary could be extracted, by comparing the extracted SEWs with those already in the dictionary. From RAKUTEN market reviews (RAKUTEN is an electronic commerce service in Japan), 450 customer reviews were used for the experiment and 113 for verification. The seed expressions were the four words "良い" (good), "最高" (satisfied), "悪い" (bad), and "不満" (dissatisfied). R, the software environment for statistical computing, was used for all processes.

4.1 Pre-experiment

Aspects and SEWs were annotated in the verification review set, each aspect being annotated together with its SEW as one pair. In total, 223 aspect-SEW pairs were obtained, containing 110 unique annotated aspects. Of these 110 aspects, 95 consisted of single nouns or compound nouns; that is, most annotated aspects were single or compound nouns, which agrees with conventional aspect annotation. In addition, 113 unique SEWs were obtained. Table 3 shows the results of SEW annotation.

Table 3. Result of SEW annotation

As shown in Table 3, single words are more likely than compound words to appear as SEWs. Single-word SEWs consisted of nouns, adjectives, verbs, or adverbs, which is similar to another researcher's annotation of Japanese text. Compound words have various POS patterns, and many of these patterns appear at low frequency. The frequent POS patterns of compound words are "Verb + Adjective" (e.g. "傷付きやすい" (easily scratched)) and "Noun + Verb" (e.g. "つるつるする" (slippery)). For combinations of three or more words, the POS patterns are very uneven. This paper therefore focuses on single and double words.

4.2 Experiment

In preprocessing, the 450 customer reviews were divided into 1602 sentence units, and morphological parsing was carried out with RMeCab. Aspect detection was then performed: 3208 single and compound nouns were extracted first, and screening with the PMI threshold reduced them to 329 words. Finally, 98 aspects were selected according to the elements of product characteristics.

The method was then applied to the 450 customer reviews. In the first process, 965 words were extracted from the reviews. In the second process, a training set was created with the 965 words as unlabeled samples and the 4 seed expressions as positive samples, with features generated by PMI. A support vector machine was used as the classifier and trained on the SMOTEd set, and the model was evaluated with a cross-validation test on the SMOTEd set, as shown in Table 4. SEWs were obtained from the unlabeled samples classified as positive, i.e. those falling into the false positive (FP) domain. Table 4 lists the SEWs obtained in each repetition of classification. In total, 64 SEWs were extracted: 34 single and 30 double SEWs. The highest FP-precision (SEW/FP) was 50.0%, at the 5th repetition, and the lowest was 21.0%, at the 3rd repetition; the average was 36.8%.

Table 4. Number of SEWs extracted from 450 reviews

It was confirmed that mistyped words, such as "使い安い" (easy to operate), and expressions unique to compact digital camera reviews, such as "サスガキャノン" (Canon is great as expected) and "フォーカスする" (focus on), can be extracted.

4.3 Evaluation of Experiment

A Japanese open resource sentiment dictionary of declinable sentiment words was obtained from the website of the Okazaki laboratory [10]. In Japanese, "declinable word" is a general term for the declinable parts of speech, namely verbs and adjectives. The aim is to compare the experimental results with this dictionary and determine how many unique SEWs that are not in the dictionary can be obtained. The dictionary contains 5280 declinable SEWs. Of the 34 single SEWs extracted, 8 are declinable, and 4 of these are not included in the dictionary, so unique SEWs account for 50% of the declinable SEWs. In other words, 4 declinable SEWs could not have been obtained with a dictionary-based model.
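The comparison with the dictionary amounts to a set-difference count. In the Python sketch below, the extracted words, their POS labels, and the toy dictionary are hypothetical stand-ins (the real dictionary has 5280 entries); only the procedure of counting declinable SEWs absent from the dictionary follows the text.

```python
DECLINABLE_POS = {"動詞", "形容詞"}  # verbs and adjectives


def unique_declinable_sews(extracted, dictionary):
    """Return (declinable SEWs, those absent from the dictionary, ratio).

    extracted: list of (word, pos) pairs for the extracted single SEWs.
    dictionary: set of declinable SEWs in the sentiment dictionary.
    """
    declinable = [w for w, pos in extracted if pos in DECLINABLE_POS]
    unique = [w for w in declinable if w not in dictionary]
    ratio = len(unique) / len(declinable) if declinable else 0.0
    return declinable, unique, ratio


# Hypothetical extracted SEWs and toy dictionary.
extracted = [("良い", "形容詞"), ("使い安い", "形容詞"), ("傷付く", "動詞"),
             ("光る", "動詞"), ("画質", "名詞")]
dictionary = {"良い", "光る"}
declinable, unique, ratio = unique_declinable_sews(extracted, dictionary)
print(unique, ratio)  # ['使い安い', '傷付く'] 0.5
```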

4.4 Verification of Coverage

The coverage of annotated SEWs was verified by applying the method to the 113 customer reviews. As in the experiment, preprocessing was carried out, yielding 333 sentence units, and the same 98 aspects were used. The total number of annotated SEWs is 113: 80 single and 33 compound SEWs. However, the method cannot handle all 33 annotated compound SEWs, because only the two POS patterns "Verb + Adjective" and "Noun + Verb" are implemented. That fact can be confirmed in Table 2. The verification is therefore restricted to the 94 annotated SEWs that the method can handle. The method was then applied to the customer reviews, and classification was repeated 6 times as in the experiment. Table 5 shows the final verification results in terms of precision, recall, and F-measure.

Table 5. Prediction table

In total, 46 annotated SEWs were extracted correctly and 48 were missed; 101 words were misclassified as SEWs, and 476 words were correctly classified as non-SEWs.
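Precision, recall, and F-measure follow from these counts (TP = 46, FN = 48, FP = 101; the 476 true negatives do not enter these three measures). The Python sketch below assumes the standard definitions, with F-measure as the harmonic mean of precision and recall; it reproduces the precision of 31.3% and recall of 48.9% reported in this paper and gives an F-measure of about 38.2%.

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall, and F-measure (harmonic mean)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = precision_recall_f(tp=46, fp=101, fn=48)
print(f"precision={p:.1%} recall={r:.1%} F={f:.1%}")
# precision=31.3% recall=48.9% F=38.2%
```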

5 Discussion

The precision of the method in verification was 31.3%. We assume that two problems cause this low precision and discuss them here. The first problem is that dividing reviews into sentence units during preprocessing does not work well. This step affects the creation of the training set, so if it fails, classification also fails and precision drops. Reviews are divided at end-of-sentence punctuation, but some customer reviews end without it: of 563 customer reviews, 44 contain no end-of-sentence punctuation. These reviews were therefore not split correctly, and the PMI values were not calculated correctly as a result. The accuracy of review division needs to be verified and improved.
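The effect of missing punctuation can be illustrated with a minimal Python splitter. The regular expression splits after the ASCII marks named in Section 3.4 and, as an added assumption, after their full-width Japanese forms (。！？); the function name `split_sentences` and the example reviews are hypothetical. A review without any such mark stays a single unit, so words from different statements, such as "デザイン" (design) and "悪い" (bad) below, are counted as co-occurring, which distorts PMI.

```python
import re

# Split after ".", "!", "?" (named in the paper) and, as an assumption,
# after the full-width Japanese forms "。", "！", "？".
SENTENCE_END = re.compile(r"(?<=[.!?。！？])")


def split_sentences(review):
    """Split a review into sentence units; empty pieces are dropped."""
    return [s.strip() for s in SENTENCE_END.split(review) if s.strip()]


print(split_sentences("デザインが良い。電池が悪い。"))  # ['デザインが良い。', '電池が悪い。']
print(split_sentences("デザインが良い 電池が悪い"))    # ['デザインが良い 電池が悪い']
```

This naive rule would also split decimal numbers such as "3.5", so a practical splitter needs additional rules.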

The second problem may be the inclusion of nouns in the word extraction process. Single nouns were considered because they are likely to be SEWs. However, not only single-noun SEWs but also many single-noun aspects were extracted. As shown in Table 5, 101 words were misclassified into the false positive domain, and 20 of them were aspects. Excluding these 20 aspects raises the precision to 36.2%.
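Both precision values follow directly from the counts reported for Table 5, treating the 20 aspects as removed from the false positives:

$$ {\text{Precision}} = \frac{46}{46 + 101} = \frac{46}{147} \approx 31.3\% ,\qquad {\text{Precision}}_{\text{excl. aspects}} = \frac{46}{46 + (101 - 20)} = \frac{46}{127} \approx 36.2\% $$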

6 Conclusion and Future Work

This paper proposed a quality table-based method for SEW identification. The method extracted SEWs, including mistyped and domain-unique ones, with an average FP-precision of 36.8%, and the extracted SEWs included effective SEWs that are not in the sentiment dictionary. The coverage of the method was verified with a recall of 48.9% and a precision of 31.3%.

About half of the annotated SEWs were missed, and many words were wrongly extracted as SEWs, so improving both precision and recall is future work. One likely cause is the process of dividing reviews into sentence units, and improving this process can be expected to raise precision. In addition, the method yields an SEW-aspect deployment, which may be helpful for sentiment classification and aggregation; utilizing this deployment is also future work.
