1 Introduction

The semantic annotation of opinions is an important task in opinion mining. Semantic annotations are needed both for training machine learning approaches and for evaluating opinion mining methods. Unfortunately, there were hardly any serious proposals for appropriate annotation schemas until recently, when SentiML [2], OpinionMiningML [9] and EmotionML [10] were proposed. In this paper, we discuss and compare these annotation schemes and identify their strengths and weaknesses. Following this overview, we propose SentiML++, an extension of SentiML that addresses several shortcomings of the state of the art.

SentiML. The SentiML annotation schema [2] follows a conventional sentiment annotation style and is based on the Appraisal Framework (AF) [5], a strong, linguistically grounded theory. AF helps to define appraisal types (affect, judgement and appreciation) within the modifier tag, which is another strength of SentiML. Because its annotation scheme is very simple, SentiML is popular: adopting it does not require any specific skills. However, SentiML also has shortcomings, which we discuss in Section 3.

OpinionMiningML. OpinionMiningML [9] is an XML-based formalism that allows tagging of attitude expressions for features or objects found in a textual segment. It targets the extraction of feature-based opinion expressions, but its scope is limited to proposing an annotation schema. In addition, the structure of OpinionMiningML is not straightforward, and developing an automatic tagger for this annotation scheme is likely to be hampered by the challenges of feature and relation extraction.

EmotionML. EmotionML [10] aims to make concepts from major emotion theories available in a broad range of technological contexts. Being informed by the affective sciences, EmotionML recognises that there is no single agreed representation of affective states, nor of the vocabularies to use. Therefore, an emotional state \({<}emotion{>}\) can be characterised using four types of descriptions: \({<}category{>}, {<}dimension{>}, {<}appraisal{>}\) and \({<}action-tendency{>}\). Furthermore, the vocabulary used for each description can be identified explicitly.
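To make the four description types concrete, the following minimal Python sketch builds a single \({<}emotion{>}\) element and records which vocabulary its description is drawn from. The helper name `build_emotion`, the chosen emotion name and the vocabulary URI are illustrative assumptions rather than material from [10]; the attribute naming (`category-set`, `dimension-set`, and so on) follows our reading of the EmotionML specification and should be checked against it before use.

```python
import xml.etree.ElementTree as ET

# Namespace of EmotionML 1.0 (assumed here from the W3C specification).
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

# The four description types named in the text.
DESCRIPTION_TYPES = ("category", "dimension", "appraisal", "action-tendency")


def build_emotion(kind, name, vocabulary, value=None):
    """Build an <emotionml> tree holding one <emotion> with one description.

    Hypothetical helper: `kind` is one of DESCRIPTION_TYPES, `vocabulary` is
    the URI of the vocabulary that `name` is taken from.
    """
    if kind not in DESCRIPTION_TYPES:
        raise ValueError(f"unknown description type: {kind}")
    # The namespace is written as a plain attribute to keep the sketch short.
    root = ET.Element("emotionml", {"xmlns": EMOTIONML_NS, "version": "1.0"})
    # The "<kind>-set" attribute identifies the vocabulary in use.
    emotion = ET.SubElement(root, "emotion", {f"{kind}-set": vocabulary})
    description = ET.SubElement(emotion, kind, {"name": name})
    if value is not None:
        description.set("value", str(value))
    return root


# Illustrative values only; they are not taken from the annotated example.
doc = build_emotion("category", "anger", "http://www.w3.org/TR/emotion-voc/xml#big6")
print(ET.tostring(doc, encoding="unicode"))
```

Passing a different `kind`, for instance `"dimension"` together with a numeric `value`, yields the corresponding description element with its own `dimension-set` vocabulary reference.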

SentiML Example. Throughout the article, we will use the following sentence as a running example: “The U.S. State Department on Tuesday (KST) rated the human rights situation in North Korea “poor” in its annual human rights report, casting dark clouds on the already tense relationship between Pyongyang and Washington.” Relevant annotations in SentiML are given below:

figure a

OpinionMiningML Example. The same sentence annotated using the OpinionMiningML syntax is given below:

figure b

EmotionML Example. The same sentence annotated using the EmotionML syntax is given below:

figure c

2 Comparison

In this section, we compare the annotation schemes from different perspectives; the comparison is summarized in Table 1. From this comparison, it can be concluded that SentiML has a larger scope and is equipped with a more accessible vocabulary than previous work [1, 4, 6, 8]. Hence, we find it the most suitable choice for our current work.

3 SentiML++

In this section, we propose an extension of SentiML. Taking the work of Bing Liu [3] as a reference, we find that most of the aspects defining an opinion appear to be missing from SentiML, completely or partially. For example, SentiML works on the sub-sentence level and hence leaves the actual holder and target entities of the sentiment of a sentence unmarked. As far as opinion orientation is concerned, it deals with prior polarities better than with contextual ambiguities. It does not recognize the topic of a sentence and hence fails to identify topic-based contextual ambiguities. Similarly, the contexts defined by cultural phrases and emoticons cannot be identified using SentiML. Flexibility and completeness are important characteristics of an annotation scheme [7] and, unfortunately, SentiML lacks both.

Identifying opinion words and their polarity with respect to a given topic could help resolve contextual ambiguities. Therefore, we propose to take topic identification into account in SentiML++ by introducing the \(<\)TOPIC\(>\) element. A good share of the opinions found on the web are expressed informally. This includes the use of emoticons, sarcastic phrases and cultural expressions (e.g., “bored to death”, “dressed to kill”) that can invert the semantics of the text surrounding them. Identifying such contexts could aid the automated detection of these opinion inversions. In SentiML++, we deal with this problem using the \(<\)INFORMAL\(>\) element.

Table 1. Comparison of annotation schemes

SentiML++ Example. SentiML++ operates on both the phrase level and the sentence level. Here, we annotate the same running example sentence using SentiML++. The annotation introduces the sentence-level markups \(<\)HOLDER\(>\), \(<\)TARGET\(>\), \(<\)ORIENTATION\(>\), \(<\)TOPIC\(>\) and \(<\)INFORMAL\(>\), while only \(<\)HOLDER\(>\) is newly introduced on the sub-sentence level. All of the sentence-level markups come under the main markup \(<\)SENTENCE\(>\), while the sub-sentence level annotation comes under \(<\)PHRASES\(>\). The \(<\)SENTENCE\(>\) markup has one attribute called “type” with the possible values “general” and “informal”. The value “general” is used for ordinary sentences, while the value “informal” is used only when the whole identified sentence is an idiom, a metaphor or an emoticon. When the value “informal” is used, only the \(<\)INFORMAL\(>\) markup plays a role; the other markups are discarded because they are rendered meaningless. The \(<\)HOLDER\(>\) markup was introduced on the sub-sentence level so that holders can be identified at this smaller granularity as well, if needed.

figure d

It must be noted that \(<\)APPRAISALGROUP\(>\) in SentiML++ links the modifier, target and holder identified at the sub-sentence level. This is in contrast to SentiML [2], where only the modifier and target are linked. The \(<\)HOLDER\(>\) and \(<\)TARGET\(>\) elements are found on both levels, i.e., the sentence level and the sub-sentence level. Natural language processing techniques such as syntactic parsing can be helpful in identifying these elements on both granularity levels.
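The sentence-level rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `build_sentence` is hypothetical, storing each markup's value as element text is our own simplification, the sub-sentence level (\(<\)PHRASES\(>\), \(<\)APPRAISALGROUP\(>\)) is omitted, and the values passed in are illustrative readings of the running example rather than the reference annotation.

```python
import xml.etree.ElementTree as ET

# Sentence-level markups of SentiML++ as listed in the text.
SENTENCE_LEVEL = ("HOLDER", "TARGET", "ORIENTATION", "TOPIC", "INFORMAL")


def build_sentence(sentence_type, markups):
    """Build a <SENTENCE> element from a dict of markup name -> text.

    Hypothetical helper. For type "informal" only INFORMAL is kept, because
    the other markups are described as meaningless in that case.
    """
    if sentence_type not in ("general", "informal"):
        raise ValueError(f"unknown sentence type: {sentence_type}")
    sentence = ET.Element("SENTENCE", {"type": sentence_type})
    for tag in SENTENCE_LEVEL:
        if tag not in markups:
            continue
        if sentence_type == "informal" and tag != "INFORMAL":
            continue  # discarded for informal sentences
        # Assumption: the annotated value is stored as element text.
        ET.SubElement(sentence, tag).text = markups[tag]
    return sentence


# Illustrative values for the running example (not the reference annotation).
general = build_sentence("general", {
    "HOLDER": "The U.S. State Department",
    "TARGET": "the human rights situation in North Korea",
    "ORIENTATION": "negative",
    "TOPIC": "human rights",
})

# A sentence consisting only of an idiom: HOLDER is supplied but dropped.
informal = build_sentence("informal", {
    "HOLDER": "unknown",
    "INFORMAL": "bored to death",
})

print(ET.tostring(general, encoding="unicode"))
print(ET.tostring(informal, encoding="unicode"))
```

Keeping the discard rule inside the builder means an annotator or tool cannot accidentally emit holder or target markups for a sentence typed as “informal”.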

4 Conclusions and Future Work

In this paper, we proposed SentiML++, an extension of SentiML. We proposed to add target (on sentence level), holder, topic and informal sentence identification as part of SentiML++. SentiML++ adds flexibility to SentiML by giving freedom of choice for a taxonomy when annotating the topic of a sentence. As part of our future work, we plan to further enhance SentiML++ by modeling relations between its elements. The idea is to propose a more suitable model for the semantic web so that state-of-the-art semantic web tools can be leveraged to exploit its semantics.