1 Introduction

The emerging cloud computing technology offers a new computing environment which enables us to access computing resources, storage and network infrastructure through the Internet without up-front infrastructure costs [1, 2]. With the rapid development of cloud computing technology, many information resources are wrapped and released as cloud services on public servers [3], and companies such as Google, IBM, Microsoft and Amazon opt to provide cloud service products through public servers [4]. Because a public server usually hosts massive numbers of cloud service products, cloud service recommendation is necessary to deliver the right services to the right users.

Many methods have been proposed to construct selection and ranking models for service products. Among them, the QoS (quality of service)-based service selection model [7,8,9], the AHP-based cloud service ranking model [10], the trust-aware service selection model [11] and the selection method based on the collaborative filtering mechanism [12] are popular. In these models, quantitative criteria are employed to evaluate service quality, and textual information (e.g., service descriptions) is rarely considered.

This paper proposes an approach to recommend cloud services based on textual description information and tags. We first propose a non-parametric Bayesian model to cluster cloud services. The model is constructed based on the Dirichlet process mixture model (DPMM), which can infer the number of clusters automatically without specifying it in advance and works well with large-scale datasets [6]. Then, we propose a personalized PageRank algorithm to generate cloud service rankings based on service tags and the clusters we obtain.

The major contributions of this paper are summarized as follows:

  1. (1)

    This paper employs textual information to recommend cloud services. Compared with service title and click records, the textual information implies rich service features which can help us understand the service functions and make accurate recommendations. To the best of our knowledge, this is the first research to recommend cloud services based on textual description information.

  2. (2)

We propose a nonparametric DPMM to classify cloud services into an optimal number of clusters, where the number of clusters is identified endogenously. When clustering cloud services, managers usually do not know how many clusters exist or which cloud services belong to which cluster. The nonparametric model is particularly suitable for cloud service clustering because it requires no predefined number of clusters; instead, it infers the number automatically from the data.

  3. (3)

    We propose a personalized PageRank algorithm to rank the cloud services in each cluster obtained by the proposed DPMM method. The personalized PageRank algorithm can rank cloud services by tags and textual descriptions, and recommend services to meet users’ personalized requirements.

  4. (4)

We conduct a set of experiments based on a real-world dataset from Programmable Web. Our experiments show that, compared with the baseline methods, the proposed model achieves a significant improvement.

The remainder of this paper is organized as follows: Sect. 2 reviews the related work in the literature. Section 3 introduces the proposed approach. Section 4 carries out experiments on real-world data sets to validate the performance of our approach. Finally, Sect. 5 concludes our work with a summary and future directions.

2 Related Work

2.1 Cloud Service Recommendation

Since Weiss [13] first proposed the concept of cloud computing, research on cloud computing has become increasingly popular. Early research on service selection and recommendation was mostly based on QoS values. However, exact QoS values are often difficult to obtain, so scholars began to focus on evaluating and predicting missing QoS values [14]. The authors of [7] presented an evaluation approach for QoCS (Quality of Cloud Service) in service-oriented cloud computing, which combines cloud users’ preference evaluation of cloud service providers, using fuzzy synthetic decision, with uncertainty calculation of cloud services based on monitored QoCS data. Han [8] proposed a recommendation system which ranks different cloud services based on the network QoS and Virtual Machine (VM) platform factors of different cloud providers. Since collaborative filtering (CF) is the most mature and widely used technology in recommender systems, CF has also been widely applied to QoS-based service recommendation [12, 15]. In practice, however, collaborative filtering is vulnerable to sparse data and becomes extremely time-consuming as the data grow.

In [16], the author introduced a cloud broker responsible for service selection and developed effective service selection algorithms to rank and aggregate potential service providers. Yu [17] proposed integrating Matrix Factorization (MF) with decision tree learning to bootstrap service recommendation systems. Ding [18] proposed a ranking-oriented prediction method consisting of two parts, ranking similarity estimation and cloud service ranking prediction, which takes the customer’s attitude toward and expectations of service quality into account.

2.2 Text Clustering Based on Topic Model

Clustering is a widely researched data mining problem in the text domain, and a popular method for probabilistic description clustering is topic modeling [19]. A topic model is a probabilistic generative model for finding abstract topics in a collection of documents, and it has been widely applied in information retrieval, natural language processing and machine learning.

Topic models such as Probabilistic Latent Semantic Analysis (PLSA) have been applied to service discovery [20]. Zhang [22] applied the LDA model to cluster services and extracted service goals from the textual descriptions of services, so that users can improve their initial queries through recommendations of similar service goals. However, the above service clustering models need the number of clusters to be specified in advance. Given the limits of managers’ expertise, time and energy, such models may not be flexible enough.

Existing cloud service selection approaches rarely consider some important data sources, such as tags, which have proved to be very powerful in many domains and have been widely used in search engines and social media platforms such as Facebook [23].

For cloud service recommendation, we develop a novel model consisting of two phases: cloud services clustering based on Dirichlet Process Multinomial Mixture model (DPMM) and cloud service ranking based on service tags and clusters we obtained. Details of our model are discussed next.

3 The Proposed Model

Our cloud service recommendation system recommends a set of related cloud service products for users by jointly leveraging the textual description information and tag data. Our approach consists of two main phases. In Phase 1, we propose a non-parametric DPMM model to cluster cloud services based on the textual information. In Phase 2, we propose the Personalized PageRank algorithm to rank the cloud services in each cluster obtained by the proposed DPMM method. The approach framework is illustrated in Fig. 1.

Fig. 1.
figure 1

The framework of the cloud services recommendation.

3.1 Phase 1: Topic Modeling of Cloud Services Using DPMM

The DPMM Model.

The DPMM is a powerful non-parametric Bayesian method [24], which means that it can cluster according to the actual data without specifying the number of clusters in advance. The probabilistic graph of DPMM is shown in Fig. 2. Here, \( d \) represents each cloud service description and \( z \) represents the cluster label of a cloud service description. The multinomial \( \varPhi \) is distributed according to the Dirichlet prior \( \beta \), and the multinomial \( \varTheta \) is distributed according to the stick-breaking prior \( \alpha \) (Table 1).

Fig. 2.
figure 2

The probabilistic graph of DPMM.

Table 1. Notations

The generative process of our DPMM is described as follows:

  1. (1)

When generating a description, the DPMM first selects the cluster \( z_{d} |\varTheta \sim Multinomial\left( \varTheta \right) \) for description \( d \); that is, \( z_{d} \) is distributed according to the multinomial \( \varTheta \).

  2. (2)

Then, the description is generated as \( d|z_{d} ,\left\{ {\varPhi_{k} } \right\}_{k = 1}^{\infty } \sim Multinomial\left( {\varPhi_{z_{d}} } \right) \); that is, given the selected cluster \( z_{d} \), the words of \( d \) are drawn from the multinomial \( \varPhi_{z_{d}} \).

  3. (3)

    Generating the weight vector of clusters, \( \varTheta |\alpha \sim GEM\left( {1,\alpha } \right) \) by a stick-breaking construction with the hyper-parameter \( \alpha \).

  4. (4)

    Generating the cluster parameters \( \varPhi_{z} |\beta \sim Dirichlet\left( \beta \right) \) by a Dirichlet distribution with a hyper-parameter \( \beta \).
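To make the four generative steps above concrete, the following minimal Python sketch simulates the process with a truncated stick-breaking construction. The truncation at a fixed maximum number of clusters is a simulation convenience, not part of the model, and all function and parameter names are illustrative assumptions.

```python
import numpy as np

def dpmm_generate(num_docs, vocab_size, doc_len, alpha=1.0, beta=0.01,
                  max_clusters=50, seed=0):
    """Sketch of the DPMM generative process (truncated for simulation)."""
    rng = np.random.default_rng(seed)

    # Step (3): cluster weights Theta ~ GEM(1, alpha) via stick-breaking,
    # truncated at max_clusters and renormalised.
    sticks = rng.beta(1.0, alpha, size=max_clusters)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
    theta = sticks * remaining
    theta /= theta.sum()

    # Step (4): cluster word distributions Phi_z ~ Dirichlet(beta).
    phi = rng.dirichlet(np.full(vocab_size, beta), size=max_clusters)

    docs, labels = [], []
    for _ in range(num_docs):
        z = rng.choice(max_clusters, p=theta)          # step (1): z_d ~ Mult(Theta)
        words = rng.choice(vocab_size, size=doc_len,   # step (2): d ~ Mult(Phi_{z_d})
                           p=phi[z])
        docs.append(words)
        labels.append(int(z))
    return docs, labels
```

Note that in the simulation the global quantities of steps (3) and (4) are drawn first, since every description depends on them.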

Choosing an Existing Cluster.

To classify description d to an existing cluster \( z \), the conditional probability can be calculated as follows:

$$ p\left( {z_{d} = z |\varvec{z}_{\neg d} ,\varvec{d},\alpha ,\beta } \right) $$
$$ \propto p\left( {z_{d} = z |\varvec{z}_{\neg d} ,\varvec{d}_{\neg d} ,\alpha ,\beta } \right)p\left( {d |z_{d} = z,\varvec{z}_{\neg d} ,\varvec{d}_{\neg d} ,\alpha ,\beta } \right) $$
$$ \propto p\left( {z_{d} = z |\varvec{z}_{\neg d} ,\alpha } \right) p\left( {d |z_{d} = z,\varvec{d}_{z,\neg d} ,\beta } \right) $$
(1)

Here, we apply the Bayes rule and the properties of D-Separation [24] in Eq. (1), where \( \neg d \) means that description \( d \) is excluded and \( \varvec{d}_{z,\neg d} \) represents the other descriptions allocated to cluster \( z \).

The first expression in Eq. (1) means the probability of description d choosing cluster z given the cluster assignments of other descriptions. It can be derived as follows:

$$ p\left( {z_{d} = z |\varvec{z}_{\neg d} ,\alpha } \right) $$
$$ = \smallint p\left( {\varTheta |\varvec{z}_{\neg d} ,\alpha } \right)p\left( {z_{d} = z |\varTheta } \right)d\varTheta $$
$$ = \smallint Dir\left( {\varTheta |\varvec{m}_{\neg d} } \right)Mult\left( {z_{d} = z |\varTheta } \right)d\varTheta $$
$$ = \frac{{m_{z,\neg d} }}{D - 1 + \alpha } $$
(2)

The second expression in Eq. (1) indicates a predictive probability of description d given \( \varvec{d}_{z,\neg d} \). We can derive the second expression as follows:

$$ p\left( {d |z_{d} = z,\varvec{d}_{z,\neg d} ,\beta } \right) $$
$$ = \smallint p\left( {\varPhi_{z} |\varvec{d}_{z,\neg d} ,\beta } \right)p\left( {d |\varPhi_{z} ,z_{d} = z} \right)d\varPhi_{z} $$
$$ = \smallint Dir\left( {\varPhi_{z} |\varvec{n}_{z,\neg d} + \beta } \right)\prod\nolimits_{\omega \in d} {Mult\left( {\omega |\varPhi_{z} } \right)d\varPhi_{z} } $$
$$ = \frac{{\mathop \prod \nolimits_{\omega \in d} \mathop \prod \nolimits_{j = 1}^{{N_{d}^{\omega } }} \left( {n_{z,\neg d}^{\omega } + \beta + j - 1} \right)}}{{\mathop \prod \nolimits_{i = 1}^{{N_{d} }} \left( {n_{z,\neg d} + V\beta + i - 1} \right)}} $$
(3)

Now we can obtain the probability of description \( d \) choosing an existing cluster \( z \), given the other descriptions and their cluster assignments, as follows:

$$ p\left( {z_{d} = z |\varvec{z}_{\neg d} ,\varvec{d},\alpha ,\beta } \right) \propto \frac{{m_{z,\neg d} }}{D - 1 + \alpha } *\frac{{\mathop \prod \nolimits_{\omega \in d} \mathop \prod \nolimits_{j = 1}^{{N_{d}^{\omega } }} \left( {n_{z,\neg d}^{\omega } + \beta + j - 1} \right)}}{{\mathop \prod \nolimits_{i = 1}^{{N_{d} }} \left( {n_{z,\neg d} + V\beta + i - 1} \right)}} $$
(4)

Choosing a New Cluster.

Denoting a new cluster as \( K + 1 \), the conditional probability of description \( d \) belonging to a new cluster can be calculated as follows:

$$ p\left( {z_{d} = K + 1 |\varvec{z}_{\neg d} ,\varvec{d},\alpha ,\beta } \right) $$
$$ \propto p\left( {z_{d} = K + 1 |\varvec{z}_{\neg d} ,\varvec{d}_{\neg d} ,\alpha ,\beta } \right)p\left( {d |z_{d} = K + 1,\varvec{z}_{\neg d} ,\varvec{d}_{\neg d} ,\alpha ,\beta } \right) $$
$$ \propto p\left( {z_{d} = K + 1 |\varvec{z}_{\neg d} ,\alpha } \right) p(d|z_{d} = K + 1,\varvec{d}_{z,\neg d} ,\beta ) $$
(5)

We can derive the first expression in Eq. (5) as follows:

$$ p\left( {z_{d} = K + 1 |\varvec{z}_{\neg d} ,\alpha } \right) = 1 - \sum\nolimits_{k = 1}^{K} {p\left( {z_{d} = k |\varvec{z}_{\neg d} ,\alpha } \right)} = \frac{\alpha }{D - 1 + \alpha } $$
(6)

Then, the second expression in Eq. (5) can be derived as follows:

$$ p\left( {d |z_{d} = K + 1,\varvec{d}_{z,\neg d} ,\beta } \right) $$
$$ = \smallint Dir(\varPhi_{K + 1} |\beta )\prod\nolimits_{\omega \in d} {Mult(\omega |\varPhi_{K + 1} )d\varPhi_{K + 1} } $$
$$ = \frac{{\mathop \prod \nolimits_{\omega \in d} \mathop \prod \nolimits_{j = 1}^{{N_{d}^{\omega } }} \left( {\beta + j - 1} \right)}}{{\mathop \prod \nolimits_{i = 1}^{{N_{d} }} \left( {V\beta + i - 1} \right)}} $$
(7)

Finally, we can get the probability of description \( d \) choosing a new cluster:

$$ p(z_{d} = K + 1|\varvec{z}_{\neg d} ,\varvec{d},\alpha ,\beta ) \propto \frac{\alpha }{D - 1 + \alpha } *\frac{{\mathop \prod \nolimits_{\omega \in d} \mathop \prod \nolimits_{j = 1}^{{N_{d}^{\omega } }} \left( {\beta + j - 1} \right)}}{{\mathop \prod \nolimits_{i = 1}^{{N_{d} }} \left( {V\beta + i - 1} \right)}} $$
(8)
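The two sampling probabilities, Eq. (4) for an existing cluster and Eq. (8) for a new one, can be combined into a single collapsed Gibbs step. The following is a minimal Python sketch, not the authors’ implementation: the count structures (`m`, `n_total`, `n_word`, all maintained with description `d` removed) and their names are our own assumptions.

```python
import numpy as np
from collections import Counter

def sample_cluster(doc, K, m, n_total, n_word, D, V, alpha, beta, rng):
    """One collapsed Gibbs step for one description (a list of word ids).
    m[z]: #descriptions in cluster z; n_total[z]: #words in cluster z;
    n_word[z][w]: count of word w in cluster z (all excluding doc)."""
    counts = Counter(doc)
    log_p = np.empty(K + 1)
    for z in range(K):                                  # existing clusters, Eq. (4)
        lp = np.log(m[z] / (D - 1 + alpha))
        for w, c in counts.items():                     # numerator products
            for j in range(c):                          # beta + j - 1 with j = 1..c
                lp += np.log(n_word[z].get(w, 0) + beta + j)
        for i in range(len(doc)):                       # denominator product
            lp -= np.log(n_total[z] + V * beta + i)
        log_p[z] = lp
    lp = np.log(alpha / (D - 1 + alpha))                # new cluster, Eq. (8)
    for w, c in counts.items():
        for j in range(c):
            lp += np.log(beta + j)
    for i in range(len(doc)):
        lp -= np.log(V * beta + i)
    log_p[K] = lp
    p = np.exp(log_p - log_p.max())                     # normalise in log space
    return int(rng.choice(K + 1, p=p / p.sum()))
```

Working in log space avoids underflow from the long products over words.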

After Gibbs sampling, we can obtain the representation of each cluster via \( \varPhi \). For each cluster \( z \), we can derive the posterior of \( \varPhi_{z} \) as follows:

$$ p\left( {\varPhi_{z} |\varvec{d},\varvec{z},\alpha ,\beta } \right) = \frac{1}{{{\varDelta} \left( {\varvec{n}_{z} + \beta } \right)}}\prod\nolimits_{\omega = 1}^{V} {\varPhi_{z,\omega }^{{n_{z}^{w} + \beta - 1}} = Dir\left( {\varPhi_{z} |\varvec{n}_{\varvec{z}} + \beta } \right)} $$
(9)

where \( \varvec{n}_{z} = \left\{ {n_{z}^{\omega } } \right\}_{\omega = 1}^{V} \).

Using the expectation of the Dirichlet distribution, we can infer \( \varPhi_{z,\omega } \) as follows:

$$ \varPhi_{z,\omega } = \frac{{n_{z}^{\omega } + \beta }}{{n_{z} + V\beta }} $$
(10)
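Eq. (10) translates directly into code. The sketch below assumes the same per-cluster word counts used in the sampling step; the names are illustrative.

```python
def estimate_phi(n_word, n_total, V, beta=0.01):
    """Posterior mean of Phi_{z,w}, Eq. (10): (n_z^w + beta) / (n_z + V*beta).
    n_word: dict word id -> count in cluster z; n_total: total words in z."""
    return {w: (n_word.get(w, 0) + beta) / (n_total + V * beta)
            for w in range(V)}
```

The resulting distribution sums to one over the vocabulary, and its most probable words serve as the cluster representation used in the word clouds of Sect. 4.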

3.2 Phase 2: Cloud Service Ranking Using the Personalized PageRank Algorithm

In Phase 1, cloud service products are classified into different clusters by the proposed DPMM algorithm. However, it is still difficult to recommend appropriate services among the massive cloud service products within the same cluster. Here we apply the Personalized PageRank algorithm [25] to rank the cloud service products within each cluster.

The proposed Personalized PageRank algorithm, a variation of PageRank [26], employs a random walk to rank the nodes of a graph whose nodes are cloud services and tags. PageRank models a random-walk process on the web graph, whose nodes are pages: at each step, a random surfer jumps to a random page with probability 1-ε and, with probability ε, follows a hyperlink chosen uniformly at random from the current page. Thus, the stationary access probability of each page can be calculated as:

$$ PR\left( i \right) = \frac{{\left( {1 - \varepsilon } \right)}}{N} + \varepsilon \sum\nolimits_{j \in in\left( i \right)} {\frac{{PR\left( j \right)}}{{\left| {out\left( j \right)} \right|}}} $$
(11)

where \( PR(i) \) represents the probability of node \( i \) being visited, \( N \) is the number of nodes, \( in\left( i \right) \) represents the set of nodes pointing to node \( i \), and \( out\left( j \right) \) represents the set of nodes pointed to by node \( j \). The first term of Eq. (11) is the probability of reaching page \( i \) by a random jump, and the second term is the probability of reaching page \( i \) by following a hyperlink from one of the pages linking to it.

To calculate the access probability of a cloud service node in Personalized PageRank, we replace \( \frac{{\left( {1 - \varepsilon } \right)}}{N} \) with \( \left( {1 - \varepsilon } \right)\gamma_{i} \), where \( \gamma_{i} = 1 \) if node \( i \) is the target service and \( \gamma_{i} = 0 \) otherwise. In this way, we obtain the relevance of all services relative to the target cloud service.

The Personalized PageRank algorithm quickly converges to a stable state by iteratively computing and updating the probability of each node. As a result, we can use the value \( PR\left( i \right) \) of each node as its rank score and recommend the Top-k cloud services for the target cloud service.
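The ranking procedure above can be sketched as follows. This is a minimal Python illustration with the teleport vector γ concentrated on the target service; the adjacency-dictionary graph representation, the dangling-node handling and the parameter defaults are our own assumptions, not details from the original.

```python
import numpy as np

def personalized_pagerank(adj, target, eps=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank on a service-tag graph.
    adj: dict node -> list of out-neighbour nodes; target: query service."""
    nodes = sorted(adj)
    idx = {n: i for i, n in enumerate(nodes)}
    gamma = np.zeros(len(nodes))
    gamma[idx[target]] = 1.0                # teleport only to the target service
    pr = np.full(len(nodes), 1.0 / len(nodes))
    for _ in range(max_iter):
        new = (1 - eps) * gamma             # (1-eps) * gamma_i term
        for j, outs in adj.items():
            if outs:                        # spread rank along out-edges
                share = eps * pr[idx[j]] / len(outs)
                for i in outs:
                    new[idx[i]] += share
            else:                           # dangling node: return mass to target
                new[idx[target]] += eps * pr[idx[j]]
        if np.abs(new - pr).sum() < tol:
            pr = new
            break
        pr = new
    return {n: float(pr[idx[n]]) for n in nodes}
```

Sorting the resulting scores in descending order and taking the first k service nodes yields the Top-k recommendation list for the target service.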

4 Experiments and Results

4.1 Data Sets and Preprocessing

Experimental data is obtained from Programmable Web, which provides detailed profile information for massive numbers of cloud services. The information for each cloud service contains its name, descriptive text and tags. Our data set consists of 799 cloud services and 790 distinct tags. Many tags appear in multiple services; in total, 2,745 tag occurrences are contained in these services. In addition, the average length (i.e., number of words) of a text description is 71.

Because the raw descriptive texts are very noisy, we conduct the following preprocessing: (1) Convert letters into lowercase; (2) Remove meaningless words such as stop words, low-frequency words and high-frequency words, as well as non-Latin characters.
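The preprocessing steps can be sketched as below. The stop-word list and the frequency thresholds are illustrative assumptions; the paper does not specify exact values.

```python
import re
from collections import Counter

# Illustrative stop-word list; a full list would be used in practice.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

def preprocess(descriptions, min_freq=2, max_ratio=0.5):
    """Lowercase, keep Latin-alphabet tokens only, drop stop words and
    very rare / very frequent words (thresholds are assumptions)."""
    token_lists = []
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())   # lowercase + Latin only
        token_lists.append([t for t in tokens if t not in STOP_WORDS])
    # Corpus-level document-frequency filtering.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    n_docs = len(token_lists)
    keep = {t for t, c in df.items()
            if c >= min_freq and c / n_docs <= max_ratio}
    return [[t for t in tokens if t in keep] for tokens in token_lists]
```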

4.2 Baseline Methods

In the experimental study, we compare DPMM with two typical clustering methods for service texts. Their details are given below.

K-Means:

K-means [27] is probably the most widely used clustering method. Before applying k-means to a set of text descriptions, the texts must be represented as mutually comparable vectors. To achieve this, each text description is represented by its TF-IDF scores [28].
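This baseline can be sketched as follows. To keep the example self-contained we hand-roll a small TF-IDF and Lloyd’s k-means with a deterministic farthest-point initialisation, rather than using a library implementation; this is an illustration of the baseline setup, not the authors’ code.

```python
import numpy as np
from collections import Counter

def tfidf_matrix(docs):
    """TF-IDF vectors for tokenised documents (rows are documents)."""
    vocab = sorted({t for d in docs for t in d})
    idx = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))     # document frequencies
    X = np.zeros((n, len(vocab)))
    for r, d in enumerate(docs):
        for t, c in Counter(d).items():
            X[r, idx[t]] = (c / len(d)) * np.log(n / df[t])
    return X, vocab

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):                             # pick farthest row as next seed
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):                            # recompute non-empty centroids
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```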

LDA:

We consider the topics found by LDA [29] as clusters and assign each cloud service to the cluster with the highest value in its topic proportion vector.

Several automatic evaluation metrics have been proposed in recent years to measure the quality of the discovered clusters. A typical metric is the coherence score [30], under which a cluster (or topic) is more coherent if its most probable words co-occur more frequently in the corpus. The coherence of a cluster \( k \) is calculated as follows:

$$ C_{k} = \sum\nolimits_{m = 2}^{M} {\sum\nolimits_{l = 1}^{m - 1} {\log \frac{{D\left( {v_{m}^{\left( k \right)} ,v_{l}^{\left( k \right)} } \right)}}{{D\left( {v_{l}^{\left( k \right)} } \right)}}} } $$
(12)

where \( v_{m}^{\left( k \right)} \) is one of the \( M \) most probable words in cluster \( k \); \( D\left( {v_{l}^{\left( k \right)} } \right) \) represents the description frequency of word \( v_{l}^{\left( k \right)} \) (the number of descriptions containing it); and \( D\left( {v_{m}^{\left( k \right)} ,v_{l}^{\left( k \right)} } \right) \) is the co-description frequency of the two words.
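Eq. (12) can be computed as below. Note that we add a +1 smoothing term in the numerator to avoid taking the logarithm of zero, a common variant of the metric that is not shown in Eq. (12); this is an assumption on our part.

```python
import math

def coherence(top_words, docs):
    """Coherence of one cluster per Eq. (12), with +1 numerator smoothing.
    top_words: the M most probable words; docs: tokenised descriptions."""
    doc_sets = [set(d) for d in docs]
    score = 0.0
    M = len(top_words)
    for m in range(1, M):              # m = 2..M in Eq. (12), 0-based here
        for l in range(m):             # l = 1..m-1
            v_m, v_l = top_words[m], top_words[l]
            co = sum(1 for s in doc_sets if v_m in s and v_l in s)
            df = sum(1 for s in doc_sets if v_l in s)
            score += math.log((co + 1) / df)
    return score
```

Averaging this score over all clusters, as in Sect. 4.4, gives a single quality figure per method.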

4.3 Parameter Setting

For DPMM, we set the initial K = 1 and β = 0.01. We also place a \( Gamma\left( {1,1} \right) \) prior over the parameter \( \alpha \), which can be optimized during the Gibbs sampling procedure [31]. For the LDA model, we set α = 50/K and β = 0.1, where \( K \) is the number of topics assumed by LDA.

4.4 Results of Service Clustering

Before presenting the final comparisons with the baseline methods, we first show the results of cloud service clustering discovered by DPMM. We run the Gibbs sampler for 3000 iterations and finally obtain 26 clusters. Figure 3 shows the cluster results with word clouds. Our method is effective in grouping related cloud services and semantically coherent words together. For instance, Cluster 1 includes cloud-based services designed to handle description, optical character recognition (OCR) and email formats. Cluster 2 offers cloud-based software-as-a-service platforms for enterprises or businesses. Cluster 3 presents dedicated servers and cloud hosting services for computing. Cluster 4 is about Internet of Things (IoT) platforms for connections between the clouds and different kinds of devices or appliances. Cluster 5 is about communication technologies that can integrate voice, messaging and email into applications.

Fig. 3.
figure 3

Word clouds of the cluster results.

To evaluate the overall quality of a cluster set, we analyze the average coherence score, namely \( \frac{1}{K}\sum\nolimits_{k = 1}^{K} {C_{k} } \), for each method. The results are listed in Table 2, where the number of top words ranges from 5 to 25. As shown in Table 2, DPMM obtains the highest coherence score under all settings, demonstrating that DPMM achieves better cluster quality than K-means and LDA.

Table 2. Comparison of coherence scores among different methods. A larger score indicates better performance for cluster quality.

4.5 Results of Recommendation

In this section, we show the results of cloud service recommendation. Using the personalized PageRank algorithm on each cluster discovered by DPMM, we obtain a ranking list for each cloud service based on the relevance score. To assess the performance of our results, we adopt the Jaccard coefficient, an alternative approach to measuring the correlation between products [32, 33]. The Jaccard coefficient is defined as:

$$ Jaccard\left( {A,B} \right) = \frac{{\left| {d_{A} \;{\bigcap }\;d_{B} } \right|}}{{\left| {d_{A} \;{\bigcup }\;d_{B} } \right|}} $$
(13)

where \( A \) is the given product and \( B \) the recommended product; \( d_{A} \) and \( d_{B} \) are the sets of words in the textual descriptions of products \( A \) and \( B \), respectively. \( d_{A} \;{\bigcap }\;d_{B} \) is the intersection of the two sets, i.e., the words appearing in both descriptions, and \( d_{A} \;{\bigcup }\;d_{B} \) is their union, i.e., all words appearing in either description.
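Eq. (13) and its average over a Top-L list can be sketched as follows; the function names are illustrative.

```python
def jaccard(desc_a, desc_b):
    """Jaccard coefficient between two word sets, Eq. (13)."""
    A, B = set(desc_a), set(desc_b)
    return len(A & B) / len(A | B) if A | B else 0.0

def avg_jaccard(target_desc, recommended_descs):
    """Average Jaccard of a target description over its Top-L list."""
    return (sum(jaccard(target_desc, d) for d in recommended_descs)
            / len(recommended_descs))
```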

In our task, we calculate the averaged Jaccard coefficient of the recommendation lists obtained by three methods: cosine similarity with TF-IDF on textual descriptions, Personalized PageRank on tags, and our two-phase approach jointly leveraging textual descriptions and tags. Each recommendation list contains the \( L \) highest-ranked cloud services. For a given \( L \), a higher averaged Jaccard coefficient indicates a better result. The averaged Jaccard coefficients for some typical lengths of the recommendation list are shown in Fig. 4. As the figure shows, our recommendation results outperform the other two methods, which supports the validity of our two-phase approach.

Fig. 4.
figure 4

Comparison of the averaged Jaccard coefficient of different methods.

5 Conclusion

In this paper, we have presented a novel two-phase method that utilizes service text descriptions and tags to extract latent relations among cloud services and to generate relevant recommendation results, aiding users in discovering suitable cloud services. Our method addresses both cloud service clustering and cloud service recommendation. Through experiments on a real-world dataset of 799 cloud services and 790 distinct tags obtained from Programmable Web, we demonstrate the effectiveness of the method.