Using Clustering Labels to Supervise Mashup Service Classification
With the rapid growth of mashup resources, clustering mashup services by function has become an effective way to improve the management of mashup services. Clustering is a learning task that groups individuals or objects into clusters based on their similarity. Its purpose is to maximize the homogeneity of elements within the same cluster and the heterogeneity of elements in different clusters. It is a multivariate statistical method for classification. However, compared with supervised classification, clustering is much weaker at categorizing. Existing methods for mashup service clustering mostly focus on directly utilizing key features from WSDL documents. In this paper, we propose a method that improves the categorizing ability of clustering by applying the idea of supervised learning to the clustering of mashup services. First, we perform basic clustering operations on the WSDL documents of the mashups to obtain a clustering result for each element. Then, we train a classification learner, using the WSDL documents as training data and the clustering results from the first step as pseudo-tags. Finally, we classify the mashups with this classification learner to obtain the final clustering results.
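The three steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method evaluated in the paper: a simple cosine-based k-means stands in for the basic clustering operations, a multinomial naive Bayes classifier stands in for the classification learner, and the short strings in `documents` are hypothetical stand-ins for the text of WSDL documents. All function names are illustrative.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case whitespace tokenizer; real WSDL text would need proper parsing."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(vectors, k, iterations=10):
    """Step 1 (stand-in): cosine k-means seeded with the first k documents."""
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                # The sum of member vectors points the same way as their mean.
                centroids[c] = sum(members, Counter())
    return labels

def train_classifier(vectors, pseudo_tags, k):
    """Step 2 (stand-in): fit multinomial naive Bayes on the pseudo-tags."""
    vocab = {term for v in vectors for term in v}
    word_counts = [Counter() for _ in range(k)]
    doc_counts = [0] * k
    for v, tag in zip(vectors, pseudo_tags):
        word_counts[tag].update(v)
        doc_counts[tag] += 1
    return vocab, word_counts, doc_counts

def classify(model, vector):
    """Step 3: return the most probable class under the trained model."""
    vocab, word_counts, doc_counts = model
    k, n_docs = len(doc_counts), sum(doc_counts)
    scores = []
    for c in range(k):
        total = sum(word_counts[c].values())
        score = math.log((doc_counts[c] + 1) / (n_docs + k))  # smoothed prior
        for term, count in vector.items():
            if term in vocab:  # terms unseen in training are ignored
                score += count * math.log(
                    (word_counts[c][term] + 1) / (total + len(vocab)))
        scores.append(score)
    return max(range(k), key=lambda c: scores[c])

# Hypothetical toy descriptions standing in for mashup WSDL documents.
documents = [
    "map location geocode route map",
    "photo image upload share photo",
    "route map travel location",
    "image photo gallery share",
    "geocode location map address",
    "upload image photo album",
]
vectors = [Counter(tokenize(d)) for d in documents]

pseudo_tags = cluster(vectors, k=2)                      # step 1
model = train_classifier(vectors, pseudo_tags, k=2)      # step 2
final_labels = [classify(model, v) for v in vectors]     # step 3
print(pseudo_tags, final_labels)
```

On data this cleanly separated, the final labels simply reproduce the pseudo-tags. The second pass matters only when the classifier generalizes differently from the clustering step and so reassigns some documents.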
Keywords: Semi-supervised clustering, Ensemble clustering, Supervised LDA, Pseudo-tag
This work is supported by the National Social Science Foundation of China (Grant No. 15BGL048), the Hubei Province Science and Technology Support Project (Grant No. 2015BAA072), the Hubei Provincial Natural Science Foundation of China (Grant No. 2017CFA012), and the Fundamental Research Funds for the Central Universities (WUT: 2017II39GX).