Keywords

1 Introduction

Cloud computing is widely adopted by businesses and governments, and a large number of them prefer to deploy and run their software applications and enterprise systems on the cloud platform [1, 2]. Since the cloud is a multi-tenancy environment shared by multiple users, a significant concern about cloud systems is their security [2, 3]. A survey from the research firm Gartner in 2015 found that around 95% of the consumers of cloud computing reported cloud security issues [4]. A survey conducted by the Cloud Security Alliance (CSA) in 2016 indicated that at least 35% of the business owners did not trust the security of cloud as much as internal IT systems [5]. Cloud security has become a significant concern for ensuring business continuity [3].

In order to address the security issues of cloud systems and ensure the business continuity despite potential attacks, a useful procedure is to perform security risk assessment for cloud systems at runtime [6, 7]. Existing methods of cloud risk assessment [6, 8,9,10] largely focus on individual system components. They are concerned with either the application-level attacks that have an impact on service availability [10] or the intrinsic vendor-level risks of the cloud providers themselves [6, 8], and some of them do not fully leverage the risks that result from the on-demand nature of cloud [6, 8, 10]. Another problem with existing cloud risk assessment mechanisms is that they do not consider the complete dependencies of the cloud resources of the cloud system or the full conflictions of the cloud system’s threats [9, 10].

As such, in this paper we propose a novel cloud risk assessment framework, called ARA-Assessor, for determining the runtime risk value of the cloud system provided by the system owner. ARA-Assessor is application-aware, which means that the risk assessment leverages the system specification model provided by the system owner. ARA-Assessor relies on the infrastructure-level threats of the cloud system to calculate the risk. We implement ARA-Assessor and evaluate it with three representative types of cloud systems on AWS cloud. The experimental results show that our proposed method is able to continuously and quantitatively assess the runtime risk of cloud systems in an automated way and it is generalizable for multiple cloud systems.

The research contributions of this paper are: (1) we propose a generalizable cloud runtime risk assessment method; (2) we propose a generic cloud system modelling approach and a generic cloud system threats modelling approach; (3) we propose a cloud resource dependencies propagation mechanism and a recursive mechanism for resolving the threats conflictions issue for analyzing cloud system risk.

The remainder of this paper is organized as follows: Sect. 2 describes a motivating example; Sect. 3 discusses cloud system modelling; Sect. 4 discusses cloud system threats modelling; Sect. 5 illustrates our risk assessment method; Sect. 6 presents our experimental evaluation; Sect. 7 discusses the validity and general applicability of our model; Sect. 8 discusses the related work; Sect. 9 provides the conclusion and our future work.

2 A Motivating Example

We use a sample cloud system, as shown in Fig. 1, to discuss the risk management. This cloud system follows the typical 2-tier architecture [11]. The E-Business service and the report generation service run in the web instances. These two services are auto-scaled by the auto-scaling service provided by the Auto Scaling Group (ASG), and the workload requests on these two services are dispatched by the load balancing service provided by the Elastic Load Balancer (ELB). The E-Business service triggers the production database service running inside the production database instance, and the report generation service triggers the report database service running inside the reporting database instance. There is periodical synchronization from the production database store to the reporting database store. A number of potential cloud infrastructure-level threats could occur to this sample cloud system. We categorize these threats into the following categories: (1) threats on the cloud login credentials; (2) threats on the ASG; (3) threats on the LC; (4) threats on web instances; (5) threats on database instances; (6) threats on the ELB; (7) threats on each instance’s Amazon Machine Image (AMI); (8) threats on the security group. To study the occurrence frequency of the threats on the cloud system, we analyzed the cloud security report from Alert Logic [12], and obtained the month-to-month attack spread for real-world cloud systems in 2014, as shown in Fig. 2. How to accurately determine the system’s risk is a question.

Fig. 1.
figure 1

2-tier cloud system.

Fig. 2.
figure 2

Attacks for real cloud systems in 2014 [12].

3 Generic Modelling of Cloud Systems

A cloud system is deployed on a set of allocated cloud resources. Each cloud resource has a resource id, belongs to a cloud resource type (e.g. ELB), and has a significance value which reflects the importance of the resource. Some cloud resources in the system have dependencies, i.e. the attacks on such a resource can affect its dependent cloud resources. Hence, we can use DAG (Directed Acyclic Graph) to model the cloud system resources. The cloud system model, denoted as S, is represented as:

$$ S = \left( {\varvec{R},\varvec{E}} \right) $$
(1)

where R refers to the set of cloud resources, and E refers to the set of cloud resource dependencies. Each element of R, denoted as R i (1 ≤ i ≤ |R|), is represented as:

$$ R_{i} = \left( {N,V,W} \right) $$
(2)

where N denotes resource id, V denotes resource type, and W denotes resource significance value. Each element of E, denoted as E i (1 ≤ i ≤ |E|), is denoted as:

$$ E_{i} = \left( {R_{k} ,R_{m} } \right),R_{k} \in \varvec{R},R_{m} \in \varvec{R} $$
(3)

where R k refers to any cloud resource that can affect another cloud resource, and R m refers to R k ’s affected cloud resource. Taking the cloud system mentioned in Sect. 2 as an example, its system model is shown in Fig. 3. The resources are represented as R 1 to R 10 . The id, type and significance value of each cloud resource are presented. The cloud resource dependencies are represented by the directed arrows.

Fig. 3.
figure 3

DAG graph for the sample cloud system.

4 Generic Modelling of Cloud System’s Threats

A cloud system’s threats refer to all the potential infrastructure-level cloud threats that can occur to the system. Each cloud threat consists of the following information: (1) threat name; (2) threat feature which specifies the types of cloud resources that can be directly affected by the threat; (3) threat’s directly attacked cloud resources; (4) threat’s overall affected cloud resources propagated from the dependencies of the directly attacked resources; (5) threat impact value; (6) threat occurrence probability. Some threats have conflictions with each other, i.e., they are unable to occur at the same time. For example, terminating the database instance and changing database instance type cannot occur simultaneously. Hence, we can model the cloud system’s threats as a graph. The cloud system threats model, denoted as T S , is represented as:

$$ T_{S} = \left( {\varvec{T},\varvec{E}} \right) $$
(4)

where T refers to the set of cloud system threats, and E refers to the set of cloud system threats conflictions. Each element of T, denoted as T i (1 ≤ i ≤ |T|), is represented as:

$$ T_{i} = \left( {N, F,\varvec{R}_{\varvec{D}} ,\varvec{R}_{\varvec{A}} ,I,P} \right) $$
(5)

where N denotes each threat’s name, F denotes threat’s feature, R D denotes each threat’s directly attacked cloud resources set and each element in R D follows the model defined in Formula (2) in Sect. 3, R A denotes each threat’s overall affected cloud resources set and each element in R A follows the model defined in Formula (2) in Sect. 3, I denotes each threat’s impact value, and P denotes each threat’s occurrence probability. Each element of E, denoted as E i (1 ≤ i ≤ |E|), is denoted as:

$$ E_{i} = \left( {T_{k} ,T_{m} } \right),T_{k} \in \varvec{T},T_{m} \in \varvec{T} $$
(6)

where T k refers to any threat that conflicts with another threat, and T m refers to T k ’s conflicted threat. Taking the sample cloud system mentioned in Sect. 2 as an example, the threats graph of the system is shown in Fig. 4. The threats are T 1 to T n . The threats conflictions are represented by the undirected edges. T 1 and T 2 cannot occur simultaneously (T 1 conflicts with T 2 ); T 3 and T 4 cannot occur simultaneously (T 3 conflicts with T 4 ); T 5 and T 6 cannot occur simultaneously (T 5 conflicts with T 6 ).

Fig. 4.
figure 4

Threats graph for the sample cloud system.

5 Our Risk Assessment Method

In order for ARA-Assessor to assess the runtime risk for a cloud system, it requires two inputs: (1) the cloud system model S and (2) the cloud full threats model T F (T F  = (T, E)). S is manually provided by the cloud system owner who has enough system domain knowledge. The significance value of each cloud resource ranges from 1 to 5. It is determined according to the importance of the internal service. Resource dependencies are determined according to the interactions of the services inside the resources and the dependencies specified in cloud resources documentations [17]. For example, a web instance interacts with a database instance because the web service inside the web instance triggers the database service inside the database instance. For another example, an elastic load balancer (ELB) or an auto-scaling group (ASG) contains multiple web instances. T F resembles the threats model defined in Sect. 4. We assume T F is manually prepared by the system owner. The threats set and threats conflictions set of it can be determined by analyzing and understanding the domain knowledge on cloud threats and published dataset [1, 18,19,20,21]. The threats in T F include the threats related to all types of cloud resources, e.g. cloud web instance related threats, cloud database instance related threats, ASG/ELB related threats, etc.

Since attacks on a cloud system are usually unpredictable and can occur at any time, ARA-Assessor periodically assesses the cloud system’s risk. We implement ARA-Assessor as a dedicated service, which embodies the concept of “Security as a Service” [16]. Prior to performing a periodical risk assessment, ARA-Assessor first conducts the one-off procedure, which consists of five activities: (1) ARA-Assessor automatically determines the system’s threats subset model T S using the inputs of cloud system model S and full cloud threats model T F ; (2) ARA-Assessor obtains the initial occurrence probability of each threat T S  → T i (1 ≤ i ≤ |T S  → T|); (3) ARA-Assessor automatically determines the affected cloud resources for each threat T S  → T i ; (4) the impact value of each threat T S  → T i is calculated; (5) the cloud consumer specifies the frequency of performing risk assessment, e.g. every minute. Then four activities are conducted upon each time tick: (1) ARA-Assessor relies on external attack detection services [13,14,15] to detect the runtime threats and events that occur to the system, denoted as T RT ; (2) we derive a threats sub-model from T S , denoted as \( T^{\prime}_{S} \) (\( T^{\prime}_{S} \; = \;\left( {\varvec{T^{\prime}},\,\varvec{E^{\prime}}} \right) \), where removes all the conflicted threats of each runtime threat in T RT from T S  → T, and removes all the conflictions with regard to each runtime threat in T RT from T S  → E); (3) the occurrence probability of each threat in \( T^{\prime}_{S} \) is updated based on T RT ; (4) ARA-Assessor uses the threats sub-model \( T^{\prime}_{S} \) to calculate the system’s risk value for the time tick, denoted as RI S , as below:

$$ RI_{S} = Max\left( {\sum\nolimits_{j = 1}^{{\left| {M(T_{S}^{{\prime }} )\left[ i \right] \to \varvec{T}} \right|}} {\left( {M(T_{S}^{{\prime }} )\left[ i \right] \to T\left[ j \right] \to I} \right)} \times \left( {M(T_{S}^{{\prime }} )\left[ i \right] \to T\left[ j \right] \to P} \right)} \right) $$
(7)

where M(\( T^{\prime}_{S} \)) refers to an array of threats sub-models derived from \( T^{\prime}_{S} \), each threats sub-model M(\( T^{\prime}_{S} \))[i] (i ranges from 1 to |M(\( T^{\prime}_{S} \))|) represents a case of threats model that contains all the threats from \( T^{\prime}_{S} \) which are independent of each other and do not conflict with each other, and this array enumerates complete cases of such threats models for \( T^{\prime}_{S} \). For each case, the risk value is calculated, and the maximum of the calculated risk values is the quantified risk of the cloud system for that time tick.

5.1 Determination of Cloud System’s Threats Subset Model

The cloud infrastructure-level threats that can occur to the cloud system are a subset of all cloud infrastructure-level threats that can occur to all cloud resources. When we rely on external attack detection tools to detect threats, we should only subscribe the system’s threats subset in order to save cost. Hence, we need to determine the cloud system’s threats subset. Using the two inputs of cloud system model S and full cloud threats model T F , the cloud system’s threats subset model T S is determined as below:

$$ T_{S} = \left( {\varvec{T},\varvec{E}} \right) $$
(8)

where T is a subset of T F  → T, and E is a subset of T F  → E. Each threat in T S , denoted as T S  → T i , satisfies such a condition: (T S  → T i  → F) ∩ V(S → R) ≠ Ø, where V(S → R) represents the cloud system’s overall resources types set.

Taking the sample cloud system mentioned in Sect. 2 as an illustrating example, the determined threats subset model is the one represented in Fig. 4 in Sect. 4.

5.2 Determination of Threats Initial Occurrence Probabilities

The initial occurrence probabilities of the cloud system’s threats can be determined by analyzing cloud attacks historical data such as the security reports from Symantec Corporation [22]. Based on the research on cloud security threats analysis done by the University of Tunis [23], we are able to obtain the threats probabilities as shown in Fig. 5. According to the research, the probability of no cloud threats occurring is 0.97, so the upper bound probability for each cloud threat is 0.03. For simplicity, we assume that the occurrence probabilities of all the threats for the cloud system are 0.03.

Fig. 5.
figure 5

Threats probabilities for cloud [23].

5.3 Determination of Cloud Resources Affected by System Threats

The affected cloud resources for a threat refer to the system cloud resources that are affected either directly or indirectly by the threat. The indirectly affected resources are propagated from the directly attacked resources. A challenge with the resource propagation is that the resource dependencies can be multi-layer, which means that a cloud resource’s dependent resources can further have dependent resources, and so on. We address this challenge and design the affected resources determination mechanism, as illustrated in Algorithm 1. For each threat, we first determine its directly attacked cloud resources by mapping the feature of the threat with the cloud resources in the system model (DetermineDirectlyAffectedResources(T → F, S → R)); second, for each directly attacked resource of the threat R, we add it into the threat’s affected resources set, and then we use a recursive function to add its overall propagated dependent affected resources into the threat’s affected resources set (RecursivelyDetermineAffectedResources(R, T → R A )). Inside the recursive function, we first get R’s dependent affected cloud resources set (GetAffectedResources(R, S → E)). If this set is empty, we exit the recursion; otherwise, for each of R’s dependent affected resources, , we add it into the threat’s affected resources set, and then we further add its overall propagated dependent resources into the threat’s affected resources set.

figure a

Taking the sample cloud system mentioned in Sect. 2 as an illustrating example, its system model is represented by Fig. 3 in Sect. 3. One of its threats is “Attack the web instance AMI”. This threat’s attacking point is R 4 . R 4 affects R 3 , R 3 further affects R 1 , and R 1 further affects R 5 and R 6 , and hence the overall affected cloud resources of this threat are R 1 , R 3 , R 4 , R 5 and R 6 .

5.4 Calculation of Threats Impact Values

For each threat in the cloud system’s infrastructure-level threats set, with its affected cloud resources determined, we are able to compute its impact value based on the significance value of each affected cloud resource. The impact value of each threat in the cloud system’s threats subset, denoted as T S  → T i  → I (1 ≤ i ≤ |T S  → T|), is calculated as below:

$$ T_{S} \to T_{i} \to I = \sum\nolimits_{j = 1}^{{\left| {T_{S} \to T_{i} \to \varvec{R}_{\varvec{A}} } \right|}} {\left( {T_{S} \to T_{i} \to R_{A} \left[ j \right] \to W} \right)} $$
(9)

where T S  → T i  → R A denotes the overall affected cloud resources set of each threat, and T S  → T i  → R A [j] → W denotes the significance value of each affected cloud resource of each threat.

Taking the sample cloud system mentioned in Sect. 2 as an illustrating example, according to Sect. 5.4, the threat of “Attack the web instance AMI” affects the cloud resources of R 1 , R 3 , R 4 , R 5 , and R 6 , and hence the impact value of this threat is calculated to be 19 (3 + 2 + 4 + 5 + 5).

5.5 Threats Sub-model Derivation

Upon a time tick, the detected the runtime threats and events are denoted as T RT ; and the threats sub-model derived from the system’s threats set T S is denoted as \( T^{\prime}_{S} \), which removes all the threats in T RT from T S . If T RT is empty, \( T^{\prime}_{S} \) is equal to T S . We define GetConflictedThreats(T, E) as the function to get the conflicted threats set of threat T, and define GetConflictions(T, E) as the function to get the set of conflictions with regard to threat T. Then, \( T^{\prime}_{S} \) is determined as below:

$$ T_{S}^{{\prime }} = \left( {\varvec{T}^{{\mathbf{\prime }}} ,\varvec{E}^{{\mathbf{\prime }}} } \right) $$
(10)
$$ \varvec{T}^{{\mathbf{\prime }}} = T_{S} \to \varvec{T} - \sum\nolimits_{i = 1}^{{\left| {T_{RT} \to \varvec{T}} \right|}} {{\text{GetConflictedThreats}}\left( {T_{RT} \to T_{i} ,T_{S} \to \varvec{E}} \right)} $$
(11)
$$ \varvec{E}^{{\mathbf{\prime }}} = T_{S} \to \varvec{E} - \sum\nolimits_{i = 1}^{{\left| {T_{RT} \to \varvec{T}} \right|}} {{\text{GetConflictions}}\left( {T_{RT} \to T_{i} ,T_{S} \to \varvec{E}} \right)} $$
(12)

5.6 Updating of Latest Occurrence Probabilities of Threats

Now, ARA-Assessor needs to perform probability updating for each threat in \( T^{\prime}_{S} \) based on the detected runtime threats and events, denoted as T RT . The runtime threats detected are those threats that are factually occurring to the cloud system. The runtime events consist of two attacks: (1) CPU-intensive user requests explosion, which means that the attackers send excessive workload requests that significantly affect the CPU utilization of cloud instances to the cloud system; (2) data-intensive user requests explosion, which means that the attackers send excessive workload requests that significantly affect the database to the cloud system. Cloud systems are not necessarily faced with both of the two runtime events. If the cloud system only contains web servers (e.g. web instances running Tomcat service), then it can only have the runtime event of “CPU-intensive user requests explosion”; if the cloud system contains both web servers and database servers, then it can have both runtime events.

For the runtime threats detected upon a time tick, the updated occurrence probability of each detected runtime threat is set to be 1 because it has factually occurred to the cloud system and is causing certain negative consequence on the cloud system.

When either of the two runtime events happens upon a time tick, the probability of the correspondent threat must be updated, and we argue that the updated probability (denoted as ) is relevant to the number of CPU-intensive user requests or data-intensive user requests at that time tick (denoted as W), the threshold number of CPU-intensive user requests or data-intensive user requests for the cloud system (denoted as W threshold ), and the initial occurrence probability of the correspondent threat (denoted as P). Pʹ must satisfy three requirements: (1) is greater than P; (2) increases with W; (3) converges to 1. Hence, we calculate Pʹ as below:

$$ P^{{\prime }} = 1 - \frac{1 - p}{{a^{{\left( {W - W_{threshold} } \right)}} }} $$
(13)

where a is a constant greater than 1, in order to make an increasing function (i.e. increases with W). The value of the constant a (a > 1) is determined as below:

$$ a = \sqrt[{\left( {\frac{{10000 + W_{threshold} }}{2} - W_{threshold} } \right)}]{{\frac{1 - p}{{1 - \frac{1 + p}{2}}}}} = \sqrt[{\frac{{10000 - W_{threshold} }}{2} }]{2} $$
(14)

5.7 The System Risk Determination Mechanism

The mechanism of calculating the risk based on \( T^{\prime}_{S} \) is shown in Algorithm 2. We first derive a list of threats models each of which only contains the threats that do not conflict with each other (GetValidThreatsModelsList(\( T^{\prime}_{S} \))), then we calculate the risk value for each threats model in the list, and we return the maximum risk value as the final risk for the system. The function of GetValidThreatsModelsList(\( T^{\prime}_{S} \)) utilizes the recursive mechanism in order to enumerate all the cases where the threats in the threat model of the system are able to occur simultaneously. Inside this function, we first obtain all the threats that do not conflict with any other threats (GetThreatswithoutConflictions(\( T^{\prime}_{S} \))). If the number of such threats is equal to the overall threats number, we return such threats as the output of the function; otherwise, if the number of such threats is greater than 0, we first divide the threats model into threats with conflictions (TSubModel) and threats without conflictions (T W ), and then we recursively call the same function (GetValidThreatsModelsList(TSubModel)) using TSubModel as the input; otherwise, if the number of threats without conflictions is 0, we loop through each threat and divide the threats model into each threat and the threats that do not conflict with it, and then we recursively call the same function to obtain the output (GetValidThreatsModelsList(T C )) and merge it into the final output (TMList).

figure b

6 Experimental Evaluation

We implemented the prototype of ARA-Assessor and evaluated it with three typical cloud systems deployed on AWS EC2 [21]. They are: (1) the “all services in one instance” cloud system; (2) the 2-tier cloud system with a production database; (3) the 2-tier cloud system with a production database and a reporting database. Each of them is a simplified version of the real-world cloud system. These three cloud systems are a good representation of all types of cloud systems because they consider a variety of cloud resources composition scenarios of different cloud systems, so they are complete enough to verify the feasibility and generalizability of ARA-Assessor. The experimental environment is shown in Fig. 6. Since we assume we rely on external attack detection services to detect threats and events, we simulate the detection of runtime threats and events. ARA-Assessor triggers the attack detection services by simulation to obtain the simulated detection results, using the generated cloud system model as the input. The output is the cloud system’s quantified ongoing runtime risk.

Fig. 6.
figure 6

Experimental environment.

6.1 Experimental Procedure

For each of the three cloud systems, the frequency of performing risk assessment is set to be 1 min. This is in accordance with the monitoring frequency of the CloudWatch function provided by AWS [24]. We simulate the running of each system for 60 min and dynamically inject one or more infrastructure-level threats and events at random time points during each system’s running by simulation. The infrastructure-level threats randomly injected are able to occur simultaneously. The events can be injected by using an open source tool named Httperf [25], which is used for generating user workload requests from the client side. Since we use the free-tier cloud resources in our experiments, the allowed maximum number of CPU-intensive workload requests and the allowed maximum number of data-intensive workload requests should follow the requests number threshold of a free-tier instance. Based on our previous empirical study [26], we know that the requests threshold for a free-tier instance is 360 simultaneous requests per machine, including both CPU-intensive requests and data-intensive requests. As such, for the first cloud system, we determine that the CPU-intensive requests threshold is 180 and the data-intensive requests threshold is 180. For the second cloud system, there are initially eight free-tier web instances attached to an ASG and registered with an ELB, and one free-tier database instance shared by the web instances. So, the overall requests number threshold for the system is 2880, so the CPU-intensive requests threshold is 1440 and the data-intensive requests threshold is 1440. For the third cloud system, there are initially six free-tier web instances attached to an ASG and registered with an ELB, six free-tier reporting instances attached to another ASG and registered with another ELB, one free-tier production database instance shared by the web instances, and one free-tier reporting database instance shared by the reporting instances. So, the overall requests number threshold for the system is 4320, and hence the CPU-intensive requests threshold is 2160 and the data-intensive requests threshold is 2160. In the real-world case, the values of the two thresholds of a real industry system should be determined by the system owner. Since we rely on external attack detection tools, we simulate the detection results of threats and events and perform risk assessment. For each of the three cloud systems we run the experiment 50 times and obtain the ongoing risk values and the average execution time of performing risk assessment. The hardware configuration of the ARA-Assessor server is: CPU-Dual Core 2.6 GHz and RAM-8 GB.

6.2 Experimental Results

The assessed risk values and the average execution time for the three cloud systems are shown in Figs. 7, 8 and 9. The results are based on 50 runs.

Fig. 7.
figure 7

Risk assessment results for the first cloud system.

Fig. 8.
figure 8

Risk assessment results for the second cloud system.

Fig. 9.
figure 9

Risk assessment results for the third cloud system.

The risk threshold for each cloud system is determined to be 10.7, 20.5 and 36.3, respectively. Where the risk value is greater than the threshold, it is considered to be high risk and appropriate responses such as performing system recovery should be considered. The average execution time of performing risk assessment for each time point for each cloud system is less than 1300 ms, which is well below the risk assessment time frequency set by the system owner (60000 ms). This also buffers enough time to perform threats and events detection for the systems. The maximum relative standard deviation for the execution time of performing risk assessment for each time point for each cloud system is 1.9%.

7 Validity and Applicability of the Model

First, while the cloud threats concerned by ARA-Assessor are only cloud infrastructure-level, we assume that the business stakeholders and the system owners are capable of figuring out the full set of all infrastructure-level cloud threats. In the implementation of our proposed method, we only consider certain types of infrastructure-level cloud threats when constructing the full cloud threats set, and we argue that by doing this it does not influence our method’s validity.

Second, the determination of the initial occurrence probabilities of a cloud system’s threats is only based on a limited study of existing research work, and hence the determined probabilities might not be comprehensive enough. We expect the businesses to figure out the probabilities in a more sophisticated manner.

Third, for evaluating the generalizability of ARA-Assessor, we only used three types of cloud systems. Although we argue that these three systems are a good representation of all cloud systems, it is still worthwhile to evaluate our method against more types of cloud systems. Moreover, it would be even better if we could evaluate our method with systems deployed on other cloud platforms.

8 Related Work

8.1 Risks and Threat Models in Cloud Computing

The European Network & Information Systems Agency (ENISA) [18] classifies cloud computing risks into three categories: Organizational, Technical and Legal [1]. The organizational risks refer to “all the risks that may impact the structure of the organization or the business as an entity”, e.g. “loss of business reputation due to the tenants sharing the same resources” [1]. The technical risks refer to “problems or failures associated with the provided services or technologies contacted from the cloud service provider” [1], e.g. “malicious insiders/outsiders attacks on cloud” [1]. The legal risks refer to “issues that surround data being exchanged across multiple countries that have different laws and regulations concerning data traversal, protection requirements and privacy laws” [1]. The Cloud Security Alliance (CSA) [19] lists the following threats as the top cloud computing risks: malicious insiders, data loss/leakage, abuse and nefarious use of cloud computing and shared technology vulnerabilities. From the perspective of cloud infrastructure, the cloud threats include attacks on cloud instances (virtual machines), attacks on cloud data storage and attacks on cloud networking facilities such as elastic load balancers or auto scaling groups [20, 27]. From the perspective of SaaS providers or cloud consumers, cloud threats include attacks on different application functions, attacks on the business workflows of cloud systems, and attacks on the service modules of cloud systems [28].

8.2 Existing Risk Assessment Methods for Cloud Computing

Risk is measured in terms of the consequence (or impact) and the likelihood of the attacking event or threat [29]. Researchers from the University of Leeds proposed a cloud risk assessment framework used by cloud service providers and service consumers to assess risk during service deployment and operation [8]. This framework quantitatively assesses the risks in various stages of the service lifecycle, and it considers the risks of both cloud service providers (the cloud consumers) and infrastructure providers (cloud vendors) [8]. The threat impacts are mainly determined by looking at the seven security criteria (e.g. past SLA performance) of cloud providers and the three performance criteria (e.g. past SLA performance) of cloud consumers [8]. This framework calculates risk by computing the cross-product of the threat impacts vector and the threat probabilities vector [8]. However, the main drawback of this risk assessment framework is that it does not consider the threats resulting from the on-demand nature of cloud since it largely focuses on the seven typical security evaluation criteria of various cloud infrastructure providers and the three performance criteria of the cloud consumers. Researchers from Lincoln Laboratory of MIT proposed another risk assessment tool for cloud services, which can be used for evaluating the runtime risk of particular cloud services [9]. This tool assesses system runtime risk based on analyzing a list of possible runtime threats on cloud services. The impact of each threat is determined by investigating how many virtual machines can be affected by the threat, and the probability of each threat is derived from external historical data [9]. However, the major problem with this tool is that its way of calculating threat impacts does not well capture the real natures of the consequences of security failures on cloud, and it does not fully consider the dependencies of cloud resources of the cloud system or the conflictions of cloud system threats. In comparison, our proposed method addresses all of these drawbacks. To the best of our knowledge, it is the first time that such a cloud risk assessment framework is ever proposed.

9 Conclusion and Future Work

Systems deployed on the cloud are prone to security attacks, which is one of the greatest issues with cloud computing. Cloud system risk assessment is helpful for managing and analyzing the security of cloud systems. Since existing methods of cloud system risk assessment usually do not fully consider cloud resources dependencies or cloud system threats conflictions, we proposed ARA-Assessor to continuously perform risk assessment for cloud systems. ARA-Assessor is application-aware and leverages cloud infrastructure-level threats. We implemented the prototype of ARA-Assessor and evaluated it using three typical cloud systems. Based on the experimental results, we can see that our method is able to automatically assess the runtime risk of cloud systems in a continuous manner, and it is generalizable for multiple types of cloud systems. Our future work includes: (1) include the application-level and service-level threats into our risk assessment method; (2) evaluate the feasibility of our method with more types of cloud systems and more cloud platforms.