Gaussian LDA and Word Embedding for Semantic Sparse Web Service Discovery
In recent years, a growing number of Web services have been published in API marketplaces run by cloud service providers or third-party registries. In this setting, users rely heavily on search engines to retrieve the Web services they need. However, because the Web services registered in API marketplaces are described by short texts, search-engine-based discovery suffers from the semantic sparsity problem, which in turn leads to poor recall during service discovery. To address this issue, we propose a novel Web service discovery approach based on Gaussian Latent Dirichlet Allocation (Gaussian LDA) and word embedding. Rather than clustering Web services, as most existing service discovery approaches do, we use word embedding to map words to continuous vectors, which extends and enriches the semantics of service descriptions. We further apply Gaussian LDA to service discovery. Gaussian LDA takes continuous word embeddings as input and models each service description hierarchically through two distributions: a distribution over topics for the description and, for each topic, a Gaussian distribution over the embedding space. Building on Gaussian LDA and word embedding, we propose a Web service query and ranking approach. Experiments on a real-world Web service dataset demonstrate the effectiveness of the proposed approach.
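The abstract leaves out the algorithmic details, so the following Python sketch only illustrates the two ideas it names, under heavily simplified assumptions. Short service descriptions are enriched by appending each word's nearest neighbours in embedding space (cosine similarity). Topics are modelled as Gaussians over that space, so each description receives topic proportions from the per-word posterior over topics, and services are ranked by the cosine similarity between their proportions and the query's. The toy embeddings, the two fixed isotropic topics, the helper names (`expand`, `topic_proportions`, `rank`) and the cosine ranking rule are all hypothetical and are not the authors' method. Full Gaussian LDA learns topic means and covariances by collapsed Gibbs sampling under a Normal-Inverse-Wishart prior, with a Dirichlet prior on per-document topic proportions; none of that is shown here.

```python
import math

# Hypothetical toy 2-D word embeddings; a real system would load pretrained
# vectors (e.g. word2vec) with hundreds of dimensions.
EMB = {
    "weather": (0.9, 0.1),
    "forecast": (0.85, 0.2),
    "climate": (0.8, 0.15),
    "map": (0.1, 0.9),
    "location": (0.15, 0.85),
    "gps": (0.2, 0.8),
}

# Hypothetical fixed topics: each is an isotropic Gaussian (mean, variance)
# over the embedding space. Gaussian LDA would instead learn means and full
# covariances by collapsed Gibbs sampling.
TOPICS = [((0.85, 0.15), 0.01), ((0.15, 0.85), 0.01)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def expand(words, k=1):
    """Enrich a short description with each known word's k nearest neighbours."""
    expanded = list(words)
    for w in words:
        if w not in EMB:
            continue  # out-of-vocabulary words are kept but not expanded
        others = [x for x in EMB if x != w]
        others.sort(key=lambda x: cosine(EMB[w], EMB[x]), reverse=True)
        expanded.extend(others[:k])
    return expanded


def log_gauss(x, mean, var):
    """Log density of the isotropic Gaussian N(mean, var * I) at x."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)


def topic_proportions(words):
    """Average each known word's posterior over topics (uniform topic prior)."""
    vecs = [EMB[w] for w in words if w in EMB]
    props = [0.0] * len(TOPICS)
    for x in vecs:
        logs = [log_gauss(x, m, v) for m, v in TOPICS]
        top = max(logs)  # subtract the max for numerical stability
        weights = [math.exp(lg - top) for lg in logs]
        total = sum(weights)
        for i, wt in enumerate(weights):
            props[i] += wt / total
    return [p / len(vecs) for p in props] if vecs else props


def rank(query, services, k=1):
    """Rank services by cosine similarity of expanded topic proportions."""
    q = topic_proportions(expand(query, k))
    scored = [(cosine(q, topic_proportions(expand(desc, k))), name)
              for name, desc in services.items()]
    return [name for _, name in sorted(scored, reverse=True)]


SERVICES = {"WeatherAPI": ["forecast"], "MapAPI": ["gps"]}
print(rank(["climate"], SERVICES))  # → ['WeatherAPI', 'MapAPI']
```

Because the topics are fixed and the per-word posteriors are simply averaged, this is closer to scoring with a pre-trained Gaussian mixture than to Gaussian LDA inference. It only shows why embedding-based expansion plus embedding-space topics can match a query for "climate" to a service described only as "forecast", even though the two share no surface words.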
Keywords: Word embedding, Gaussian LDA, Semantic sparsity, Web service discovery
This work is supported by the National Basic Research Program of China under grant No. 2014CB340404, the National Natural Science Foundation of China under grant Nos. 61672387 and 61373037, the State Key Laboratory of Software Engineering Foundation under grant No. SKLSE 2014-10-07, the University Science and Technology Program of Shandong Province under grant No. J16LN08, and the Scientific Research Foundation of Shandong University of Science and Technology for Recruited Talents under grant No. 2016RCJJ045. Jian Wang is the corresponding author.