
1 Introduction

SVM models have often been described as black box models because there is no comprehensible explanation or justification of the trained model and the classifications derived from it [1]. This stems from a lack of evidence as to how the model was derived from the analysis of the input data. Other machine learning methods, such as decision trees, allow humans to interpret classification explanations and justifications simply by observing a visualization of the model. However, there are cases where a trained SVM model is a better fit for a classification problem and will outperform other methods that generate models that are simple to interpret. A method often used in the interpretation of Linear SVM models for text classification is the extraction of feature weights [2]. However, this method does not explain how different features in the model are related. In Sect. 2, we propose that the relation structure of features can be extracted from support vectors to help interpret the characteristics of an SVM model for text classification. Then, in Sect. 3, the extracted relation characteristics are analyzed to generate visualizations of the model in the form of positive and negative feature trees. These two trees can be thought of as representing the support vectors that are placed on the positive and negative sides of the decision hyperplane, and will be referred to as the positive feature tree and negative feature tree, respectively. The root node of each tree is the feature with the greatest positive or negative weight across all support vectors. A case study of the proposed method is provided in Sect. 4 to demonstrate its application on real-world data and how the generated visualizations can be interpreted. An overview of the proposed method is shown in Fig. 1.

Fig. 1. Overview of the visualization of SVM models as a Mind Map

2 Feature Scores and Relations

2.1 Extraction from Support Vectors

The output of a linear kernel model trained with SVM\(^{light}\) [3] contains the support vectors V that describe the decision hyperplane, in which each support vector \(v_{j}\in V\) consists of a vector weight \(\alpha _{j}y_{j}\) and the original feature vector \(\{w_{1},\dots ,w_{n}\}\) from the data analyzed in the training process. Equation 1 determines the weight of a feature word \(w_{i}\).

$$\begin{aligned} weight(w_{i}) = \sum _{v_{j} \in V} \alpha _{j}y_{j} TF(w_{i},v_{j}) \end{aligned}$$
(1)

Here, \(TF(w_{i},v_{j})\) is the term frequency of the word \(w_{i}\) in the support vector \(v_{j}\). To classify a document \(d_{l} \in D\), the \(score(d_{l})\) in Eq. 2 is evaluated with respect to the model bias b.

$$\begin{aligned} score(d_{l}) = \sum _{w_{i} \in d_{l}}weight(w_{i}) \end{aligned}$$
(2)
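As a concrete illustration of Eqs. 1 and 2, the following Python sketch computes feature weights and document scores from a set of support vectors. Representing each support vector as a pair of its \(\alpha _{j}y_{j}\) weight and a term-frequency dictionary is an assumption made for illustration only; it is not the on-disk format of an SVM\(^{light}\) model file, and the numbers in the toy example are invented.

```python
from collections import defaultdict

def feature_weights(support_vectors):
    """Eq. 1: weight(w_i) = sum_j alpha_j*y_j * TF(w_i, v_j).

    `support_vectors` is assumed to be a list of (alpha_y, tf) pairs, where
    alpha_y is the combined vector weight alpha_j*y_j and tf maps each
    feature word to its term frequency in that support vector.
    """
    weights = defaultdict(float)
    for alpha_y, tf in support_vectors:
        for word, freq in tf.items():
            weights[word] += alpha_y * freq
    return dict(weights)

def document_score(document_words, weights):
    """Eq. 2: score(d_l) = sum of weight(w_i) over the words of the document,
    which is then compared against the model bias b for classification."""
    return sum(weights.get(word, 0.0) for word in document_words)

# Toy example (alpha*y values and term frequencies are illustrative only).
svs = [(+0.8, {"finish": 2, "hint": 1}), (-0.5, {"aroma": 1, "finish": 1})]
w = feature_weights(svs)
print(w["finish"])                            # 0.8*2 + (-0.5)*1 = 1.1
print(document_score(["finish", "hint"], w))  # 1.1 + 0.8 = 1.9
```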

A linear SVM model can be interpreted through analysis and visualization of the feature vectors and the \(\alpha _{j}y_{j}\) weights of its support vectors. In the present paper, we propose such a visualization by automatically generating trees for the positive and negative features contained in the support vectors of the model. To generate the visualizations, we first analyze the relations of the feature vectors, then apply the vector weights and other characteristics of the features to create a ranking of the features and their relations.

2.2 Feature Scoring and Relation Representation

As described in the previous section, the weight of an individual feature word can be calculated from the support vectors contained in the model, as seen in Eq. 1. In addition to this method, attributes of features can be used in calculating a word score for ranking. In Eq. 3, the document frequency of a word \(DF(w_{i})\) is used to give more importance to words that occur in many documents.

$$\begin{aligned} WS(w_i) = \sum _{v_{j}\in V}\alpha _{j}y_{j}TF(w_i,v_j)DF(w_i) \end{aligned}$$
(3)

A word co-occurrence matrix that describes the relations of features can be generated by analyzing the features that occur within the same support vector. A naïve co-occurrence frequency could simply be the number of support vectors in which two feature words have occurred. However, this does not take into account the weights of the support vectors in the model. We therefore calculate the score describing co-occurring feature words as seen in Eq. 4.

$$\begin{aligned} CS(w_{u},w_{v}) = \sum _{v_{j}\in V}\alpha _{j}y_{j}\frac{1}{2}\sum _{w_{t}\in \{w_{u},w_{v}\}}TF(w_t,v_j)DF(w_t) \end{aligned}$$
(4)
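The sketch below shows how \(WS\) and \(CS\) might be computed under the same assumed representation as above, with document frequencies supplied as a separate dictionary. For \(CS\), we assume the sum runs only over support vectors in which both words actually co-occur; without that restriction, Eq. 4 would reduce to \((WS(w_{u})+WS(w_{v}))/2\) and would not reflect co-occurrence at all.

```python
from collections import defaultdict

def word_scores(support_vectors, df):
    """Eq. 3: WS(w_i) = sum_j alpha_j*y_j * TF(w_i, v_j) * DF(w_i)."""
    ws = defaultdict(float)
    for alpha_y, tf in support_vectors:
        for word, freq in tf.items():
            ws[word] += alpha_y * freq * df[word]
    return dict(ws)

def cooccurrence_scores(support_vectors, df):
    """Eq. 4, restricted (by assumption) to support vectors where both words occur:
    CS(w_u,w_v) = sum_j alpha_j*y_j * 1/2 * (TF(w_u,v_j)*DF(w_u) + TF(w_v,v_j)*DF(w_v))."""
    cs = defaultdict(float)
    for alpha_y, tf in support_vectors:
        words = sorted(tf)
        for i, wu in enumerate(words):
            for wv in words[i + 1:]:
                cs[(wu, wv)] += alpha_y * 0.5 * (tf[wu] * df[wu] + tf[wv] * df[wv])
    return dict(cs)
```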

3 Visualization Method

By analyzing the relations of features in the support vectors of Linear SVM models, we propose that the visualization of two trees representing the positive and negative features can be useful in model interpretation. A complete graph of the relations of features in the support vectors of the Linear SVM model can be generated by analyzing the Jaccard similarity of pairs of features. The similarity of two feature word nodes \(w_{u}\) and \(w_{v}\) is calculated using the formula in Eq. 5.

$$\begin{aligned} Similarity(w_{u},w_{v})=\frac{CS(w_{u},w_{v})}{WS(w_u)+WS(w_v)-CS(w_{u},w_{v})} \end{aligned}$$
(5)

Here, \(CS(w_{u},w_{v})\) represents the score of the support vectors in which the two features co-occur, and \(WS(w_u)\) and \(WS(w_v)\) represent the scores of the support vectors in which the features \(w_{u}\) and \(w_{v}\) occur, respectively. Visualization of the relations of the features as a complete graph would be difficult to interpret, as there is a large number of edges connecting all the nodes of the graph [4]. To help overcome this problem, we search for a minimum spanning tree of the complete graph that is made up of the strongest relations between the feature nodes. The pseudocode in Algorithm 1 searches for the minimum spanning tree by first creating a matrix of edges that are selected by finding the maximum similarity for nodes of decreasing importance.

Algorithm 1. Spanning tree search over the feature similarity graph.

After the maximum similarity edge matrix is determined, the graph is constructed by creating all the nodes and joining the edges found by the search. When generating the tree for positive or negative features, the set of features w is limited to only positive or negative features, respectively. This ensures that the two trees do not contain overlapping features.
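Since the pseudocode of Algorithm 1 is not reproduced here, the following sketch gives one possible reading of the procedure described above: features of the chosen sign are visited in decreasing order of importance (\(|WS|\)), the most important feature becomes the root, and every subsequent feature is attached to the already-placed node with which it has the maximum Eq. 5 similarity. Function names, the importance ordering, and tie-breaking details are assumptions, not the authors' exact algorithm.

```python
def similarity(wu, wv, ws, cs):
    """Eq. 5: Jaccard-style similarity built from the WS and CS scores."""
    c = cs.get((wu, wv), cs.get((wv, wu), 0.0))
    denom = ws[wu] + ws[wv] - c
    return c / denom if denom else 0.0

def build_feature_tree(ws, cs, positive=True):
    """Build the positive (or negative) feature tree as a child -> parent map.

    The feature set is limited to features whose WS has the requested sign,
    so the two generated trees never share features.
    """
    sign = 1 if positive else -1
    nodes = sorted((w for w, s in ws.items() if sign * s > 0),
                   key=lambda w: abs(ws[w]), reverse=True)
    if not nodes:
        return {}
    parent_of = {}           # child -> parent edges of the spanning tree
    placed = [nodes[0]]      # the highest-|WS| feature becomes the root node
    for w in nodes[1:]:
        # Attach w to the already-placed node with the strongest relation.
        parent = max(placed, key=lambda p: similarity(w, p, ws, cs))
        parent_of[w] = parent
        placed.append(w)
    return parent_of
```

Under these assumptions, calling `build_feature_tree(ws, cs, positive=True)` and `positive=False` on the same scores would yield two disjoint trees analogous to those rendered in Figs. 2 and 3.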

Fig. 2. Positive feature tree of the taste sensory model.

Fig. 3. Negative feature tree of the taste sensory model.

4 Case Study: Interpreting SVM Models of Wine Sensory Viewpoints

In previous work, we have analyzed wine tasting notes using SVM [5]. The data analyzed is a corpus of 91,010 wine tasting notes, or 255,966 sentences, collected from the Wine Enthusiast website (Footnote 1). A subset of the data consisting of 992 sentences from wine tasting notes was randomly selected for use in the training, testing, and evaluation of sensory sentiment models. This data subset was manually classified into four different sensory category viewpoints, as defined by Paradis and Eeg-Olofsson [6]. Optimal feature selection was achieved with a subset of the top 600 positive and negative features.

In the present paper, we visualize the taste sensory viewpoint SVM model as a case study of interpreting linear models, using our system to automatically generate the positive and negative feature trees seen in Figs. 2 and 3. Overall, the two trees have quite different structures. The positive tree has groupings of words around hub words, whereas the negative tree has fewer groupings of features.

Table 1. Child nodes of finish feature node from the positive tree and example wine tasting notes they represent.

A possible explanation for this is that the positive tree contains features from only one class, the taste class, whereas the negative tree contains features from at least three different classes: smell, touch, and vision. As these trees have numerous nodes, we will focus our interpretation on a small section of the positive tree. We decided to target the finish feature node as it has many first-generation child nodes. Table 1 contains a sample of the child nodes and an example wine tasting note that is representative of the support vectors in which the features co-occurred.

The child node word and all the parent node words have been highlighted, showing that some of the support vectors extend from the root node, while others are only locally represented by finish and a child node. This hierarchical structure represents the finish characteristic of a wine's taste. This structure would not be apparent from a simple ranking of feature words by WS, as seen in Table 2, because other features in separate surrounding branches share similar scores but belong to a different taste sub-characteristic. The proposed method enables the interpretation of the structure of related features and their relevance to the SVM model.

Table 2. Rank of features by \(WS(w_i)\) from finish to hint.

5 Related Work

Previous work on the extraction of features for the interpretation of SVM models has focused on creating rules that describe the classifications made by the model. Barakat et al. [1] argued that the interpretation of a model is an important step in gaining acceptance when applying black box machine learning techniques in medical settings. They proposed the extraction of rules from models to aid medical understanding of the classifications made for the diagnosis of type 2 diabetes. Few features were used in that study, which makes interpretation by rules applicable. In the present paper, we focus on methods for interpreting text classification models, which have numerous features, making rule-based interpretation unfeasible.

In other previous work, the authors have visualized the contents of wine tasting notes from the perspective of words describing sensory modalities, using a system that automatically generates radar charts of the values predicted by an SVM model [5]. This method only analyzes the predicted document \(score(d_{l})\) and does not provide insight into the interpretation of the model itself. We previously examined the visualization of a corpus of documents by analyzing the SMART weight of single words and pairs of words selected by Boolean AND search [4]. However, it was not possible to use a search engine for the visualization of Linear SVM models, as the system needs to take into account both positively and negatively scored words. To overcome this problem, we propose analyzing the ranking of features by \(WS(w_{i})\) and the relations of features by a co-occurrence matrix.

6 Conclusion

Black box machine learning techniques are difficult for humans to interpret due to the lack of explanation of how classifications are derived. Other machine learning techniques, such as decision trees, offer models that are easy for humans to interpret. However, a shortcoming of these techniques is that they are often outperformed by black box trained models such as SVM. In this paper, we proposed a method for extracting the relations from support vectors contained within a trained SVM model. These relations were then analyzed to automatically generate two trees that represent the positive and negative features of the SVM model. In a case study on the interpretation of an SVM model that classifies the taste sensory viewpoint of wine tasting notes, we found that the proposed method can reveal structures in the model that can be interpreted as sub-characteristics.

In future work, the proposed method should be compared to other machine learning methods to evaluate the effectiveness of the visualization for model interpretation.