Schedulability of probabilistic mixed-criticality systems

Abstract

Mixed-criticality systems often need to fulfill safety standards that dictate different requirements for each criticality level, for example given in the ‘probability of failure per hour’ format. A recent trend suggests designing this kind of system by jointly scheduling tasks of different criticality levels on a shared platform. When this is done, the usual assumption is that tasks of lower criticality are degraded when a higher criticality task needs more resources, for example when it overruns a bound on its execution time. However, a way to quantify the impact this degradation has on the overall system is not well understood. Meanwhile, to improve schedulability and to avoid over-provisioning of resources due to overly pessimistic worst-case execution time estimates of higher criticality tasks, a new paradigm emerged where tasks’ execution times are modeled with random variables. In this paper, we analyze a system with probabilistic execution times, and propose metrics that are inspired by safety standards. Among these metrics are the probability of deadline miss per hour, the expected time before degradation happens, and the duration of the degradation. We argue that these quantities provide a holistic view of the system’s operation and schedulability.

Introduction

Mixed-criticality (MC) systems are real-time systems that feature tasks of different criticality levels. Typical application domains include avionics and automotive (Burns and Davis 2017). In MC systems, each task has an associated criticality level. Depending on the criticality level, a failure of a task, for example due to deadline miss, can have a more or less severe impact on the overall safety of the system. Due to possible catastrophic consequences of a system failure, MC systems for some application domains are subject to certification standards. For example, DO-178C (RTCA/DO-178C 2012) is a standard for avionics systems. It defines five criticality levels, ‘A’ to ‘E’, with ‘A’ being the highest criticality level. Here, a failure of a task of criticality ‘A’ can have a negative impact on the overall safety of the aircraft, while a failure of a task of criticality ‘D’ may only slightly increase the aircraft crew’s workload. Quantitatively, an application’s criticality correlates to a tolerable failure rate under a given certification standard. The failure rates of all tasks, under their respective criticality levels, have to be guaranteed for certification of the overall system. As an example, Table 1 states the tolerable failure rates for DO-178B.

Table 1 Failure rate specification for different criticality levels

Traditionally, industry favors physical segregation of tasks based on their criticality level (Tămaş-Selicean and Pop 2015). This implies, for example, that tasks of each criticality level execute on their own hardware, and tasks of different criticality levels do not interfere. However, such a physical separation based on criticality levels can lead to system under-utilization and complex distributed multi-processor architectures. Recently, there has been a push towards integrating tasks of different criticality levels on a single hardware platform (Burns and Davis 2017). The advantages for such consolidation include reduction in cost, power dissipation, weight, as well as maintenance.

Unfortunately, this consolidation of criticality levels makes isolating tasks of different criticality levels problematic. Essentially, a low criticality level ‘D’ task may hinder the execution of a higher criticality level ‘B’ task, possibly resulting in a deadline miss, which can be considered a type of failure. To counter this, researchers have proposed several schemes which are covered in detail in Sect. 2. Broadly speaking, the approaches are based on an execution time abstraction proposed by Vestal (2007). Vestal’s model builds on the Worst-case Execution Time (WCET) abstraction. He assumes that tasks have a set of WCET estimates with different levels of confidence. The system is required to meet the deadline of a criticality level ‘A’ task for the highest confidence and most pessimistic WCET estimates. For lower criticality tasks, correct execution needs to be guaranteed for less pessimistic WCET estimates. Prominent approaches that build on Vestal’s model feature mode-based scheduling schemes. These schemes ensure that the system executes tasks of all criticality levels correctly as long as the less pessimistic WCET estimates are not overrun, and provide reduced service to tasks of lower criticality levels once they are.

In this paper, instead of taking a single WCET estimate as in the traditional real-time model, or a criticality dependent set of WCET estimates as per Vestal’s model, we assume a stochastic model of execution times. For each task, the execution time is modeled with an independent random variable. This additional information on the execution time allows us to improve schedulability due to the so-called multiplexer gain, i.e., the likelihood of high execution times of many tasks occurring simultaneously is very small. Under the proposed scheme there is a non-zero probability of a high criticality task missing its deadline. If the probability is less than the failure rate specification of the criticality level, see for example Table 1, then the MC system can still be schedulable according to the probabilistic bounds on deadline misses.

Individual tasks are assumed to be periodic with constrained deadlines. The platform is assumed to have a single core. We assume a dual-criticality model, where the criticality of tasks can be either lo or hi. The system is also assumed to have two modes of operation: lo- and hi-criticality mode. In the lo-criticality mode, all tasks are executed normally. In the hi-criticality mode, newly released jobs of lo tasks start in a degraded mode so that preference is given to hi tasks.

The application of stochastic execution to MC systems is not new and several recent works exist (Maxim et al. 2017; Masrur 2016; Guo et al. 2015). However, existing results do not provide a holistic scheduling scheme and analysis covering all execution modes and transitions. A detailed accounting of existing schemes and their limitations is given in Sect. 2. In the following, we suppose that an MC scheduling scheme fulfills the following requirements:

  • Schedulability analysis of tasks is provided for each criticality level in each system mode.

  • Conditions that should trigger a mode switch are defined.

  • Analysis of the time spent in each system mode is provided.

  • A method to consolidate these individual components and compute a metric comparable to the Probability of Failure per Hour for tasks of each criticality level is given.

In this paper, we address all of these individual components. Specifically, we make the following contributions:

  1. We propose conditions that trigger a mode switch, both from lo- to hi-criticality mode (lo \(\rightarrow \) hi), and from hi- to lo-criticality mode (hi \(\rightarrow \) lo).

  2. We provide a detailed stochastic analysis of lo-criticality mode. Using the analysis, the Probability of Deadline Miss per Hour in this mode is computed for tasks of both criticality levels.

  3. We provide a first stochastic analysis of hi-criticality mode. Using the analysis, the maximal time spent in hi-criticality mode is obtained, along with the Probability of Deadline Miss per Hour for tasks of both criticality levels. Also taken into account is the probability the system enters hi-criticality mode.

  4. Using contributions 1–3, we compute the overall Probability of Deadline Miss per Hour values for all tasks by consolidating the respective values for lo- and hi-criticality mode. This allows us to compare these probabilities with the permitted ones found in typical certification standards.

  5. We determine the probability that a lo task is started in its degraded mode.

Due to these contributions, we claim that this is the first work which provides a system-wide approach to MC scheduling, while considering a stochastic model of task execution times.

Organization: This paper is organized as follows: Sect. 2 highlights the related research in Mixed-criticality scheduling and in stochastic analysis. It also highlights the limitations of existing research which are addressed by this work. Section 3 states our system model. The model includes the task model and the model of the MC system. This is followed by Sect. 4, which states and explains important definitions and operations for stochastic analysis of systems with non-deterministic execution times. Section 5 covers the proposed analysis for getting Probability of Deadline Miss per Hour values, both for all lo and for all hi tasks. This section also has important intermediate results such as the duration of lo- and hi-criticality mode, and the probability of each event that causes a system mode switch. Results are covered in Sect. 6. In this section, we evaluate various schedulability metrics and design trade-offs for MC systems. Conclusion is given in Sect. 7, followed by references.

Related work

Vestal’s paper (2007) is the first paper that presents the MC model, where safety-critical tasks have multiple WCET estimates with different levels of assurance. Based on the model, a preemptive fixed priority scheduling scheme for sporadic task sets is presented: Static Mixed Criticality (SMC). In the widely examined dual-criticality case, hard guarantees are given to hi tasks, but lo jobs might miss their deadline if a hi job overruns its optimistic WCET. In addition, a lo job is de-scheduled if it overruns its WCET.

Baruah et al. (2011) introduced an important fixed priority scheduling scheme, Adaptive Mixed Criticality (AMC), which defines a system that can operate in different modes. The system starts in lo-criticality mode where all tasks are scheduled to execute according to their optimistic WCET estimates. If any job overruns its optimistic WCET, a switch to hi-criticality mode happens, where all lo tasks are de-scheduled. This way, hi tasks are guaranteed to meet their deadlines all the time, whereas lo tasks have this guarantee only in lo-criticality mode.

EDF scheduling has been adapted to Vestal’s model as well. Baruah et al. (2011) propose a scheduling scheme for sporadic task sets based on EDF, called EDF-VD. In this scheme, the deadlines of all hi tasks are scaled down by a single scaling factor so that an overrun is detected early. Once an overrun is detected, the system enters hi-criticality mode where all lo tasks are de-scheduled. In this scheme, all tasks meet their deadlines if no optimistic WCET is overrun, while only hi tasks meet their deadlines if some of them are overrun. Ekberg and Yi (2012) use demand-bound functions to scale the deadlines of hi tasks individually, by a heuristic search strategy. Deadlines are chosen so that the schedulability of the system is maximized. The lo- and hi-criticality mode model in this scheme is similar to the one used in Baruah et al. (2011). Huang et al. (2014) amend EDF-VD to include degraded service for low criticality tasks while the system is in hi-criticality mode. The paper also presents an upper bound on the duration of this mode. Park and Kim (2011) present another EDF-based scheme, CBEDF. Here, high criticality tasks are always guaranteed to execute, while some guarantees are given to tasks of low criticality using offline empty slack location discovery. Vestal’s model with two modes of operation was also investigated for time-triggered scheduling, most notably in Baruah and Fohler (2011). For a comprehensive overview of research into Mixed Criticality, we refer the reader to the review by Burns and Davis (2017), while for a discussion on the applicability of Mixed Criticality systems to industry and its safety-critical practices see Ernst and Di Natale (2016).

As for probabilistic MC systems, related work often models them with probabilistic Worst-Case Execution Time (pWCET) distributions, which are seen as extending Vestal’s model such that each task has a large number of WCETs with various levels of confidence (Burns and Davis 2017; Davis and Cucu-Grosjean 2019). A pWCET distribution comes from either the randomness inherent in a system and its environment, or the lack of knowledge we have about a system, or possibly both (Davis et al. 2017). To derive these distributions, well established methods like static probabilistic timing analysis (Devgan and Kashyap 2003), or measurement based probabilistic timing analysis techniques (Cucu-Grosjean et al. 2012) already exist. Ideally, modeling tasks with pWCET distributions removes dependency between them, meaning any task-set can be analyzed as though all tasks had independent execution times. In practice, by using pWCET distributions, these dependencies are reduced but not removed completely. This still poses a major problem in applying pWCET methodologies for real-time computing. For an extensive survey of timing analysis techniques, we refer the reader to Davis and Cucu-Grosjean (2019). In this paper we assume that tasks’ execution times are modeled with random variables which are given, and these random variables can be seen as an abstraction of ideal pWCETs.

For the analysis of probabilistic MC systems, obtaining probabilistic response times is key. The survey on probabilistic schedulability analyses by Davis and Cucu-Grosjean (2019) lists various approaches to response time analysis. Our paper builds mainly upon the work of Díaz et al. (2002, 2004), as their analysis of real-time systems is pessimistic. Using probabilistic analysis, existing work often presents scheduling schemes where individual tasks have certain permissible deadline miss probabilities. Examples are Maxim et al. (2017) and Abdeddaïm and Maxim (2017), where SMC and AMC scheduling are adapted to a probabilistic MC model, demonstrating the improvement in schedulability. Masrur (2016) proposes a scheme with no mode switches, where lo tasks have a soft guarantee on meeting their deadline as well. Alahmad and Gopalakrishnan (2016, 2018) use a Markov decision process to provide probabilistic guarantees to jobs, and also formulate an optimization problem that provides the scheduling policy. Santinelli and George (2015), Santinelli and Guo (2018), and Santinelli et al. (2016) examine probabilistic MC systems by doing a sensitivity analysis, which focuses on the impact made by varying execution times. However, we observe that a holistic characterization of probabilistic mixed-criticality systems remains largely unexplored in the state-of-the-art. Deadline miss probabilities of individual jobs are often not aggregated into system-wide metrics, for example in Masrur (2016) and Maxim et al. (2017). We note that giving soft guarantees to individual tasks is not equivalent to guaranteeing a probability of deadline miss per hour. Another related work, Guo et al. (2015), analyzes a simple probabilistic model, where a hi task has just two WCETs and their corresponding probabilities of occurrence. Using the model, they propose an EDF-based scheduling algorithm which has an allowed probability of a timing fault happening system-wide. Finally, Küttler et al. (2017) consider a model where some guarantees are available to tasks of lower criticality. They propose lowering the priorities of lower criticality tasks in certain modes of operation. Still, without characterizing the duration of modes, we believe that the impact of degradation of lo tasks cannot be properly quantified.

Finally, our own previous work (Draskovic et al. 2016) addresses the probability of deadline miss in lo-criticality mode of a dual mode system, while also commenting on the time before a transition to hi-criticality mode happens. However, a system-wide overview of the system is not given as hi-criticality mode is not analyzed. In this paper, we address the aforementioned limitations of the state-of-the-art.

System model

We start this section with an informal overview of our system model, before precise definitions are presented. The model is an extension of Vestal’s original model (2007), and as with Adaptive Mixed Criticality (Baruah et al. 2011), there are two modes of operation, lo- and hi-criticality mode.

lo-criticality mode can be considered a normal mode of operation, and the system is expected to operate in this mode most of the time. hi-criticality mode can be considered an emergency mode, where newly instantiated lo jobs are started and run in degraded mode so that preference is given to the execution of hi jobs. More specifically, hi criticality tasks are not affected by the mode of operation: these tasks are always released and executed until their completion. lo criticality tasks have two variants: each lo job can be released in degraded or regular mode, and it always finishes in the mode in which it started. Though lo tasks are never dropped, they are released with degradation when the system is in hi-criticality mode. In practice, this means that there are two implementations of each lo task, and the degraded variant offers reduced functionality, for example by computing a numerical result with less precision. Vestal’s original model specifies dropping lo jobs when hi jobs need more resources, and our model can be seen as a generalization where not executing a job is the extreme case.

The system starts in lo-criticality mode, and remains there until a mode switching event occurs. The first mode switching event is the only one discussed for non-probabilistic MC systems, and is thus found in previous work, for example (Baruah et al. 2011; Ekberg and Yi 2012; Huang et al. 2014; Maxim et al. 2017): a hi job’s execution lasts longer than a provided threshold. The second mode switching event is when a hi job misses its deadline. It is introduced to reduce the probability of consecutive deadline misses of hi jobs. Note that a hi job might miss its deadline without overrunning its threshold execution time, for example because it was blocked by jobs of higher priority. Finally, the third mode switch event is when a long backlog of lo jobs accumulates, which could in turn produce an arbitrarily high backlog when entering hi mode. Once in hi-criticality mode, the system switches back to lo-mode the first time it is idle.

Using this model, we say that a task-set is schedulable under fixed priority preemptive scheduling if the probability that any job misses its deadline during an hour of operation is sufficiently small, and if the ratio of lo jobs released in degraded mode is acceptable.

General notation on random variables This work deals with discrete random variables, and they are denoted using calligraphic symbols, for example \({\mathcal {A}}\). The probability function of \({\mathcal {A}}\), noted \(p_{{\mathcal {A}}}(\cdot )\), tells us the probability that \({\mathcal {A}}\) takes a specific value u: \(p_{{\mathcal {A}}}(u) = {\mathbb {P}}({\mathcal {A}} = u)\). Without loss of generality, we assume that the possible values of all random variables span the full range of natural numbers. If the maximal and minimal values with non-zero probability of \({\mathcal {A}}\) exist, and are noted \(u_{\max {}}\) and \(u_{\min {}}\), then the probability function can be represented in vector notation:

$$ p_{\mathcal {A}} = [p_{{\mathcal {A}}}(u_{\min {}}), \ldots , p_{{\mathcal {A}}}(u_{\max {}})]^{\intercal } . $$
(1)

Let us define a relation to compare two random variables \({\mathcal {A}}\) and \({\mathcal {B}}\), as was done by Díaz et al. (2002).

Definition 1

(First-Order Stochastic Dominance) \({\mathcal {A}}\) is greater than or equal to \({\mathcal {B}}\), written as \({\mathcal {A}} \succeq {\mathcal {B}}\), if and only if

$$ \forall l \ge 0: \quad \displaystyle \sum \limits _{u = l}^{\infty } p_{\mathcal {A}}(u) \ge \displaystyle \sum \limits _{u = l}^{\infty } p_{\mathcal {B}}(u). $$
(2)

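Note that two random variables can be incomparable under this relation, i.e., neither \({\mathcal {A}} \succeq {\mathcal {B}}\) nor \({\mathcal {B}} \succeq {\mathcal {A}}\) holds.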
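As an illustration of Definition 1, the following minimal Python sketch checks the relation by comparing tail sums. It assumes that a probability function is stored as a list whose index u holds the probability of value u; the function name `dominates` and the numerical tolerance are our own choices for this sketch and are not part of the original analysis.

```python
from itertools import accumulate


def dominates(p_a, p_b, eps=1e-12):
    """Return True if A >= B in the sense of Definition 1.

    p_a[u] and p_b[u] hold P(A = u) and P(B = u) for u = 0, 1, 2, ...
    """
    n = max(len(p_a), len(p_b))
    a = list(p_a) + [0.0] * (n - len(p_a))
    b = list(p_b) + [0.0] * (n - len(p_b))
    # tail[l] = sum of p(u) over all u >= l, built by accumulating from the right.
    tail_a = list(accumulate(reversed(a)))[::-1]
    tail_b = list(accumulate(reversed(b)))[::-1]
    return all(ta >= tb - eps for ta, tb in zip(tail_a, tail_b))


# A puts its mass on larger values than B, so A dominates B but not vice versa.
p_a = [0.0, 0.5, 0.5]
p_b = [0.5, 0.5, 0.0]
print(dominates(p_a, p_b), dominates(p_b, p_a))  # True False

# An incomparable pair: neither tail-sum vector bounds the other.
p_c = [0.5, 0.0, 0.5]
p_d = [0.0, 1.0, 0.0]
print(dominates(p_c, p_d), dominates(p_d, p_c))  # False False
```

The second pair shows why the relation is only a partial order: one variable has the heavier tail at value 2, while the other has the heavier tail at value 1.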

We introduce a shorthand notation for the probability that a variable modeled with random variable \({\mathcal {A}}\) has a value greater than scalar s. Instead of the cumbersome expression \(\sum _{s < i}{{\mathbb {P}}\left( i ={\mathcal {A}} \right) }\), we use \({\mathbb {P}}\left( s < {\mathcal {A}} \right) \).

Finally, we introduce a simple notation \([s]^{1}\) to indicate that a scalar or expression s is limited to a maximum value of 1, \([s]^{1} = \min {(s,1)}\).
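The two shorthands can be written down directly. The sketch below is illustrative only: it assumes the same list-based representation of a probability function (index u holds the probability of value u), and the helper names `prob_greater` and `clamp1` are hypothetical.

```python
def prob_greater(p_a, s):
    """P(s < A): total probability of all values strictly greater than s."""
    return sum(p for u, p in enumerate(p_a) if u > s)


def clamp1(s):
    """[s]^1: limit a scalar to a maximum value of 1."""
    return min(s, 1.0)


p_a = [0.5, 0.25, 0.125, 0.125]  # values 0..3
print(prob_greater(p_a, 1))       # 0.25
print(clamp1(1.3), clamp1(0.2))   # 1.0 0.2
```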

Task model A task-set \(\Pi \) consists of N independent tasks. Each task is periodic, constrained deadline, with an initial phase and a criticality level. A single task \(\tau _{i}\) is characterized by tuple \((T_{i}, D_{i}, \phi _{i}, \chi _{i}, {\mathcal {C}}_{i})\), where \(T_{i}\) is the period, \(D_{i}\) is the relative deadline, \(\phi _{i}\) is the phase, \(\chi _{i}\in \{{{\textsc {lo}}}{},{{\textsc {hi}}}{}\}\) is the task’s criticality level, and \({\mathcal {C}}_{i}\) models the probabilistic execution time. \({\mathcal {C}}_{i}\) has a maximal value with non-zero probability, which is the WCET, noted \(C_{i}^{\max {}}\). Tasks with criticality level lo and hi are referred to as ‘lo tasks’ and ‘hi tasks’, respectively. An instance j of task \(\tau _{i}\) is called a job, and denoted as \({\tau _{{i,j}}}\). Each job \({\tau _{{i,j}}}\) has its release time \(r_{{i,j}} = \phi _{i} + (j-1) \cdot T_{i}\), and its absolute deadline \(d_{{i,j}}= r_{{i,j}}+ D_{i}\). The hyperperiod hp of a set of tasks is defined to be the least common multiple of all task periods.
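To make the timing parameters concrete, the following sketch computes release times, absolute deadlines and the hyperperiod for a small task set. The `Task` container, its field names and the numerical values are assumptions made for illustration; jobs are indexed from j = 1 as in the formulas above, and the probabilistic execution time is omitted.

```python
from functools import reduce
from math import gcd
from typing import NamedTuple


class Task(NamedTuple):
    period: int    # T_i
    deadline: int  # D_i (relative, D_i <= T_i)
    phase: int     # phi_i
    crit: str      # chi_i, either "lo" or "hi"


def release_time(task, j):
    """r_{i,j} = phi_i + (j - 1) * T_i, with jobs indexed from j = 1."""
    return task.phase + (j - 1) * task.period


def absolute_deadline(task, j):
    """d_{i,j} = r_{i,j} + D_i."""
    return release_time(task, j) + task.deadline


def hyperperiod(tasks):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), (t.period for t in tasks))


tasks = [Task(period=5, deadline=5, phase=0, crit="hi"),
         Task(period=10, deadline=8, phase=2, crit="lo")]
print(release_time(tasks[1], 3))       # 22
print(absolute_deadline(tasks[1], 3))  # 30
print(hyperperiod(tasks))              # 10
```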

We model the execution times of each task \(\tau _{i}\) with known independent and identically distributed random variables \({\mathcal {C}}_{i}\). This means that there is no dependency between the execution times of any two jobs, regardless of whether they are of the same task or not, and execution times of all jobs of one task are modeled with the same random variable. However, the provided analysis is safe, i.e., if the computed bounds hold for a given set of probabilistic execution times, they also hold if the execution times are smaller or equal according to Definition 1. Therefore, the probabilistic execution times \({\mathcal {C}}_{i}\) can also be regarded as ideal probabilistic worst case execution times (pWCETs), which would remove the requirement that execution times of jobs are independent.

In the standard MC model (Vestal 2007), hi tasks have an optimistic and a pessimistic WCET estimate, and lo tasks are executed by the processor only if hi tasks meet their optimistic WCET estimates during operation. The reasoning behind this is the assumption that most of the time hi tasks will not execute for longer than their optimistic WCET estimate, so less computational resources are needed for the correct operation of the system. In this paper, we assume that the distribution of the execution time of each task \({\mathcal {C}}_{i}\) is known. Therefore, instead of the optimistic WCET estimate, for each hi task we define a threshold execution time value \(C_{i}^{{\text {thr}}}\). We assume this value is a given design choice. Note that the probability that a hi task executes for longer than this threshold is \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\). The precise way this threshold is used in scheduling of jobs is described later in this section. Additionally, instead of not executing lo jobs in order to free up resources, we let each lo job be released in degraded or regular mode. If it executes with degradation, its WCET is \(C_{i}^{{\text {deg}}}\). The \(C_{i}^{{\text {deg}}}\) value is assumed to be given as a design choice. It can be zero if the task is not to be run in hi-criticality mode, or it can be any value less than its WCET, in which case it is assumed that a lower functionality is provided.

For the execution time of hi tasks, it is useful to introduce the following random variable that describes a worst-case behavior as long as the analyzed system is still in lo-criticality mode.

Definition 2

(Trimmed Execution Time) Random variable \({\mathcal {C}}_{i}^{{{\textsc {lo}}}}\) models the execution time of hi tasks \(\tau _{i}\), but modified such that they do not execute for longer than \(C_{i}^{{\text {thr}}}\):

$$\begin{aligned} p_{{\mathcal {C}}_{i}^{{{\textsc {lo}}}}}(u)&=\left\{ \begin{array}{ll} p_{{\mathcal {C}}_{i}}(u) &{}\,\, u < C_{i}^{{\text {thr}}}, \\ \sum _{v = C_{i}^{{\text {thr}}}}^{C_{i}^{\max {}}} p_{{\mathcal {C}}_{i}}(v) &{} \,\,u = C_{i}^{{\text {thr}}}, \\ 0 &{} \,\,u > C_{i}^{{\text {thr}}}. \end{array}\right. \end{aligned}$$
(3)
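Equation (3) amounts to moving all probability mass above the threshold onto the threshold itself. A minimal sketch, assuming the probability function is a list indexed by execution time starting at 0 (the function name `trim` and the example values are ours):

```python
def trim(p_c, c_thr):
    """Trimmed execution time per Eq. (3).

    p_c[u] is P(C_i = u) for u = 0 .. C_i^max. Mass at values above c_thr is
    accumulated at c_thr; values below c_thr keep their probability.
    """
    if c_thr >= len(p_c):
        # Threshold lies above the WCET: nothing to trim.
        return list(p_c)
    return list(p_c[:c_thr]) + [sum(p_c[c_thr:])]


p_c = [0.0, 0.5, 0.25, 0.125, 0.125]  # execution times 0..4, WCET = 4
print(trim(p_c, 2))                    # [0.0, 0.5, 0.5]
```

The result still sums to one, so it is a valid probability function whose largest value with non-zero probability is the threshold.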

Figure 1a illustrates the \({\mathcal {C}}_{i}\) of a lo task, as well as the WCET denoted as \(C_{i}^{{\text {deg}}}\) in degraded mode. Figure 1b illustrates the \({\mathcal {C}}_{i}\) of a hi task as well as the trimmed execution time \({\mathcal {C}}_{i}^{{{\textsc {lo}}}}\) with the corresponding \(C^{{\text {thr}}}_{i}\) and \(C^{\max {}}_{i}\) values.

Fig. 1 Task execution times, with named values and trimmed execution time \({\mathcal {C}}_{i}^{{{\textsc {lo}}}}\)

This definition differs from the one found in many related works, e.g. Draskovic et al. (2016), where the execution time of hi tasks in lo-criticality mode is defined through the conditional probability \({\mathbb {P}}({\mathcal {C}}_{i} = u \mid {\mathcal {C}}_{i} \le C_{i}^{{\text {thr}}})\), often called ‘truncated’ execution time. The ‘trimmed’ execution times, as defined in this paper, are by definition greater than or equal to the equivalent ‘truncated’ execution times in the sense of Definition 1. This paper uses ‘trimmed’ execution times because they simplify the analysis of hi-criticality mode, namely by simplifying initial conditions noted by Definition 12. The cost of this simplification is that it introduces pessimism in the lo-criticality mode analysis; however, this has been found to be numerically negligible through simulations. Nevertheless, using the ‘truncated’ execution times option with a more complex analysis is also possible. For more information, see the comment on future work in the conclusion.
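The difference between the two variants can be seen on a small example. The sketch below is only an illustration under our own assumptions: probability functions are lists indexed by execution time, `truncate` renormalizes the mass at or below the threshold, and `trim` follows Eq. (3).

```python
def trim(p_c, c_thr):
    """'Trimmed': mass above c_thr is accumulated at c_thr (Eq. 3)."""
    return list(p_c[:c_thr]) + [sum(p_c[c_thr:])]


def truncate(p_c, c_thr):
    """'Truncated': distribution conditioned on C_i <= c_thr."""
    kept = list(p_c[:c_thr + 1])
    total = sum(kept)
    return [p / total for p in kept]


def tail_sums(p):
    """tail[l] = P(X >= l), the quantity compared in Definition 1."""
    return [sum(p[l:]) for l in range(len(p))]


p_c = [0.0, 0.5, 0.25, 0.125, 0.125]  # execution times 0..4
trimmed = trim(p_c, 2)                 # [0.0, 0.5, 0.5]
truncated = truncate(p_c, 2)           # mass 0.5 and 0.25, renormalized by 0.75

# Trimmed tails are never below truncated tails, i.e. trimmed >= truncated.
print(all(a >= b - 1e-12
          for a, b in zip(tail_sums(trimmed), tail_sums(truncated))))  # True
```

In this example the trimmed variant places probability 0.5 on the threshold, whereas the truncated variant places only 1/3 there, which is the source of the additional pessimism mentioned above.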

The response time of job \({\tau _{{i,j}}}\) is modeled with random variable \({\mathcal {R}}_{{i,j}}\). The way this variable can be obtained and upper-bounded is presented in Sect. 4. The deadline miss probability of job \({\tau _{{i,j}}}\) is the probability that this job finishes after its deadline, i.e., that its response time exceeds the relative deadline: \(\textsf {DMP}_{{i,j}}= {\mathbb {P}}({\mathcal {R}}_{{i,j}}> D_{i})\).

Schedulability In this paper, we consider a single-core platform. A simple execution model is used, where task preemption overhead is zero.

As in the standard MC model, the system is defined to operate in two modes of operation, lo- and hi-criticality mode. When the system is operating in lo-criticality mode, both lo and hi jobs are released. When the system is operating in hi-criticality mode, hi jobs are released normally, while lo jobs are released in degraded mode.

In this paper the definition of schedulability is inspired by the probability-of-failure-per-hour notion. Therefore, we first define the probability of deadline miss per hour, before defining schedulability. We also define the probability of degraded job, a proportion of how many lo jobs execute in degraded mode in the long run.

Definition 3

(Failure Probabilities) The probability of deadline miss per time interval T for hi or lo jobs is denoted as \(\textsf {DMP}_{{{\textsc {hi}}}}(T)\) or \(\textsf {DMP}_{{{\textsc {lo}}}}(T)\), respectively. It is the probability that at least one hi or lo job misses its deadline during a time interval of length T.

Formally, we define \(\textsf {DMP}_{{{\textsc {hi}}}}(T)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(T)\) as:

$$ \textsf {DMP}_{\chi }(T) = \max _{\forall t}{\;{\mathbb {P}}\left( \exists {\tau _{{i,j}}}\in S_{\chi }(t) \, : \; {\tau _{{i,j}}}\text { misses its deadline} \right) }, $$
(4)

where \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\), and \(S_{\chi }(t) = \{ {\tau _{{i,j}}}\; | \; \chi _{i} = \chi \; \wedge \; t \le r_{{i,j}}< t+T \} \).

Definition 4

(Probability of Degraded Job) The probability of degraded lo jobs \(\textsf {PDJ}_{{\text {deg}}}\) is the probability that any individual lo job is released in degraded mode:

$$ \textsf {PDJ}_{{\text {deg}}} = \max _{\forall t}{\frac{|S_{{{\textsc {lo}}}\text {-}{\text {deg}}}(t)|}{|S_{{{\textsc {lo}}}}(t)|}}, $$
(5)

where

$$\begin{aligned} S_{{{\textsc {lo}}}}(t)&= \{ {\tau _{{i,j}}}\; | \; \chi _{i} = {{\textsc {lo}}}\; \wedge \; t \le r_{{i,j}}< t+T \} \\ S_{{{\textsc {lo}}}\text {-}{\text {deg}}}(t)&= \{ {\tau _{{i,j}}}\; | \; \chi _{i} = {{\textsc {lo}}}\; \wedge \; t \le r_{{i,j}}< t+T \; \wedge \; {\tau _{{i,j}}}\text { is in degraded mode} \} \end{aligned}$$

Definition 5

(Schedulability) An MC system is \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}})\)-schedulable if \(\textsf {DMP}_{{{\textsc {hi}}}}(1h) \le \sigma _{{{\textsc {hi}}}}\), \(\textsf {DMP}_{{{\textsc {lo}}}}(1h) \le \sigma _{{{\textsc {lo}}}}\), and \(\textsf {PDJ}_{{\text {deg}}} \le \sigma _{{\text {deg}}}\), where 1h denotes the duration of 1 h.
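Once the three metrics are available, checking Definition 5 is a direct comparison against the three bounds. The sketch below assumes the metric values have already been computed by some analysis; the function name and all numerical values are hypothetical and chosen only for illustration.

```python
def is_schedulable(dmp_hi_1h, dmp_lo_1h, pdj_deg,
                   sigma_hi, sigma_lo, sigma_deg):
    """Check (sigma_hi, sigma_lo, sigma_deg)-schedulability per Definition 5."""
    return (dmp_hi_1h <= sigma_hi
            and dmp_lo_1h <= sigma_lo
            and pdj_deg <= sigma_deg)


# Hypothetical analysis results and bounds, not taken from any standard.
print(is_schedulable(1e-10, 1e-6, 0.01,
                     sigma_hi=1e-9, sigma_lo=1e-5, sigma_deg=0.05))  # True
print(is_schedulable(1e-8, 1e-6, 0.01,
                     sigma_hi=1e-9, sigma_lo=1e-5, sigma_deg=0.05))  # False
```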

The probabilistic MC scheduling scheme used in this paper can now be defined:

Definition 6

(Probabilistic MC Scheduling) In lo-criticality mode, all tasks are scheduled using a provided fixed-priority preemptive schedule. The system starts in lo-criticality mode, and remains in it until one of the following events causes a transition to hi-criticality mode:

  1. A hi job overruns its threshold execution time \(C_{i}^{{\text {thr}}}\).

  2. A hi job misses its deadline.

  3. The system-level backlog, meaning the amount of pending execution, becomes higher than a predefined threshold \(B_{\max }\).

In hi-criticality mode, the same fixed-priority preemptive schedule is used, but lo jobs are released with degradation in order to free up the processor. lo jobs that were released in lo-criticality mode continue in their regular mode with execution time \({\mathcal {C}}_{i}\). The system remains in hi-criticality mode until it becomes idle for the first time.
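The mode-switching rules of Definition 6 can be summarized as a small state machine. The following sketch is a simplified illustration, not a scheduler implementation: it assumes that a hypothetical runtime monitor supplies the boolean observations and the current system-level backlog, and all names are our own.

```python
from enum import Enum


class Mode(Enum):
    LO = "lo"
    HI = "hi"


def next_mode(mode, *, hi_overrun, hi_deadline_miss, backlog, b_max, idle):
    """Apply the mode-switch rules of Definition 6 to one observation."""
    if mode is Mode.LO:
        # Any of the three events triggers lo -> hi.
        if hi_overrun or hi_deadline_miss or backlog > b_max:
            return Mode.HI
        return Mode.LO
    # hi -> lo happens the first time the system is idle.
    return Mode.LO if idle else Mode.HI


def lo_release_variant(mode):
    """Variant in which a newly released lo job starts; it keeps it until completion."""
    return "degraded" if mode is Mode.HI else "regular"


mode = Mode.LO
mode = next_mode(mode, hi_overrun=False, hi_deadline_miss=False,
                 backlog=12, b_max=10, idle=False)
print(mode, lo_release_variant(mode))  # Mode.HI degraded
mode = next_mode(mode, hi_overrun=False, hi_deadline_miss=False,
                 backlog=0, b_max=10, idle=True)
print(mode, lo_release_variant(mode))  # Mode.LO regular
```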

Preliminaries

With tasks having probabilistic execution times, a set of computational primitives is required to perform the schedulability analysis. A probabilistic analysis of real-time systems, on which our analysis is based, was described by Díaz et al. (2002, 2004). We summarize the analysis technique in this section. The analysis and its primitives are used extensively in the following sections to perform the schedulability analysis of mixed-criticality systems.

The analysis requires computation of the backlog, i.e., the sum of pending execution times of all ready jobs. For each priority level i there is a backlog containing the execution times of all pending jobs with priority i or higher. When a new job with priority i arrives, all backlogs with level i or lower are increased by adding its execution time. Adding the execution time random variable to a backlog is done using convolution. Executing a job decreases the backlogs of all levels i that are equal to or lower than the priority of the job. Decreasing the backlog is done using shrinking.

Definition 7

(Backlog) The ith priority backlog at time t, \({\mathcal {B}}_{i}(t)\), is a random variable that describes the sum of all remaining execution times of pending jobs of priority not less than i, at time t. The backlog \({\mathcal {B}}_{i}(t-)\) is the same as \({\mathcal {B}}_{i}(t)\), except it does not take into account jobs released at time t.

Using convolution to compute backlog after arrival of a job Suppose that a job \({\tau _{{i,j}}}\) is released at time \(r_{{i,j}}\), and \({\mathcal {B}}_{k}(r_{{i,j}}{}-)\) is the kth priority backlog at time \(r_{{i,j}}\), but excludes the newly released job. Assuming that \(i\ge k\), and that no other job is released at the same time, backlog \({\mathcal {B}}_{k}(r_{{i,j}})\) can be computed using the convolution operator \(\otimes \):

$$ p_{{\mathcal {B}}_{k}(r_{{i,j}})} = p_{{\mathcal {B}}_{k}(r_{{i,j}}{}-)} \otimes p_{{\mathcal {C}}_{i}}. $$
(6)
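As an illustration of Eq. (6), the following minimal sketch convolves two discrete probability functions. It assumes that a probability function is stored as a plain Python list whose index is the value in time units; the function name `convolve` and the example numbers are hypothetical and not taken from the analysis above.

```python
def convolve(p, q):
    """Probability function of the sum of two independent discrete
    random variables; list index = value, list entry = probability."""
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


# Hypothetical backlog just before the release: 0 or 1 time unit pending.
backlog_before_release = [0.5, 0.5]
# Hypothetical execution time of the released job: always 1 time unit.
execution_time = [0.0, 1.0]

backlog_after_release = convolve(backlog_before_release, execution_time)
```

The resulting list places the probability mass at 1 and 2 time units, i.e., the whole backlog is shifted by the deterministic execution time.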

Backlog reduction due to execution of highest priority job Let us assume that in the interval \(t_{0}< t < t_{1}\) there are no job arrivals. During this interval, the backlog is decreased as the processor executes pending jobs. If \({\mathcal {B}}_{i}(t_{0})\) is the ith priority backlog at time \(t_{0}\), the corresponding backlog at time t can be computed using the so-called shrinking operation. Specifically, for computing backlog at time \(t_{0}< t < t_{1}\), the following equation can be used:

$$p_{{\mathcal {B}}_{i}(t)}(u) = \left\{ \begin{array}{ll} \displaystyle \sum \nolimits _{j=0}^{t-t_{0}} p_{{\mathcal {B}}_{i}(t_{0})}(j) &{}\quad u = 0, \\ p_{{\mathcal {B}}_{i}(t_{0})}(u + t-t_{0}) &{}\quad u > 0. \end{array}\right. $$
(7)

In other words, the backlog after an execution of \(t-t_{0}\) time units is computed by left-shifting the initial backlog by \(t-t_{0}\), while truncating at zero since the processor is idle when no pending execution is present. For brevity, we define the corresponding shrinking function of a random variable \({\mathcal {B}}\):

$$ \text {shrink}({\mathcal {B}}, m)(u) = \left\{ \begin{array}{ll} \displaystyle \sum \nolimits _{j=0}^{m} p_{{\mathcal {B}}}(j) &{}\quad u = 0, \\ p_{{\mathcal {B}}}(u + m) &{}\quad u > 0. \end{array}\right. $$
(8)
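The shrinking function of Eq. (8) can be sketched in a few lines. As before, the list-based representation (index = value) and the example probabilities are assumptions made for illustration only.

```python
def shrink(p, m):
    """Eq. (8): probability function after m time units of execution.
    All mass at values 0..m collapses to 0, the rest is shifted left by m."""
    head = sum(p[: m + 1])
    return [head] + list(p[m + 1:])


# Hypothetical backlog with values 0..3.
backlog = [0.1, 0.2, 0.3, 0.4]
after_two_units = shrink(backlog, 2)
```

Here the mass at values 0, 1 and 2 accumulates at zero, while the mass at value 3 moves to value 1.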

Backlog State Space Exploration First, we define the function \(\texttt {bsse}\) for computing the backlog at some time \(t+u\) given the backlog at time t.

Definition 8

(Backlog Computation) \(\texttt {bsse}\left( {\mathcal {B}}_{i}(t),\, \Pi ,\, i,\, t,\, u \right) \) is a function for computing the ith priority backlog at time \(t+u\), i.e., \({\mathcal {B}}_{i}(t+u)\). We assume that the ith priority backlog at time t is \({\mathcal {B}}_{i}(t)\), and that the task arrivals and execution times in the interval \([t, t+u)\) are in accordance with task set \(\Pi \).

The computation of \(\texttt {bsse}\) can be done by applying the definition of a task set as well as the previously described operations, namely convolution and shrinking. We demonstrate this using the following example.

Example

Task-set \(\Pi\) is given, and consists of task \(\tau_{1} = (T_1 = 5, \, \pi_1 = 0, \, D_1 = 5, \, {\mathcal {C}}_1)\), and of task \(\tau_{2} = (10,\, 0,\, 10,\,{\mathcal {C}}_2)\). Task \(\tau_{2}\) has a higher priority. The backlogs at time 0− at priority levels 1 and 2 are given as \({\mathcal {B}}_{1}(0-)\) and \({\mathcal {B}}_{2}(0-)\), respectively. For this set-up, find the backlog at time 10− at priority level 1, as well as the backlog at time 7 at priority level 2.

Solution. The following combination of convolution and shrinking computes \({\mathcal {B}}_{1}(10-) = \texttt {bsse}\left( {\mathcal {B}}_{1}(0-),\, \Pi ,\, 1,\, 0-,\, 10- \right) \), by taking into account the execution times of all jobs:

$$\begin{aligned} p_{{\mathcal {B}}_{1}(0)}&= p_{{\mathcal {B}}_{1}(0-)} \otimes p_{{\mathcal {C}}_{1}} \otimes p_{{\mathcal {C}}_{2}},\\ p_{{\mathcal {B}}_{1}(5-)}&= \text {shrink}({\mathcal {B}}_{1}(0), 5),\\ p_{{\mathcal {B}}_{1}(5)}&= p_{{\mathcal {B}}_{1}(5-)} \otimes p_{{\mathcal {C}}_{1}},\\ p_{{\mathcal {B}}_{1}(10-)}&= \text {shrink}({\mathcal {B}}_{1}(5), 5). \end{aligned}$$

For computing the highest priority backlog, task \(\tau _{1}\) is ignored. Using the same procedure, we obtain \({\mathcal {B}}_{2}(7) = \texttt {bsse}\left( {\mathcal {B}}_{2}(0-),\, \Pi ,\, 2,\, 0-,\, 7 \right) \):

$$\begin{aligned} p_{{\mathcal {B}}_{2}(0)}&= p_{{\mathcal {B}}_{2}(0-)} \otimes p_{{\mathcal {C}}_{2}}, \\ p_{{\mathcal {B}}_{2}(7)}&= \text {shrink}({\mathcal {B}}_{2}(0), 7). \end{aligned}$$
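The two computations of the example can be replayed numerically. The sketch below is one possible reading of \(\texttt {bsse}\) for tasks with zero phase; the function name `backlog_before`, the tuple layout of a task, and the concrete execution time distributions are assumptions introduced for illustration. The function returns the backlog just before the end of the interval, which coincides with \({\mathcal {B}}_{2}(7)\) here because no job is released at time 7.

```python
def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def shrink(p, m):
    return [sum(p[: m + 1])] + list(p[m + 1:])


def backlog_before(backlog, tasks, level, horizon):
    """Backlog of priority `level` just before time `horizon`, starting from
    the backlog just before time 0. Each task is (period, priority, pmf);
    a larger priority number means a higher priority, phases are zero."""
    now = 0
    for t in range(horizon):
        arrivals = [pmf for (period, prio, pmf) in tasks
                    if prio >= level and t % period == 0]
        if arrivals:
            backlog = shrink(backlog, t - now)   # execute since last event
            now = t
            for pmf in arrivals:                 # add released jobs
                backlog = convolve(backlog, pmf)
    return shrink(backlog, horizon - now)


# Hypothetical distributions: C1 is 1 or 2, C2 is 3 (90%) or 8 (10%).
c1 = [0.0, 0.5, 0.5]
c2 = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.1]
tasks = [(5, 1, c1), (10, 2, c2)]

b1_at_10_minus = backlog_before([1.0], tasks, level=1, horizon=10)
b2_at_7 = backlog_before([1.0], tasks, level=2, horizon=7)
```

With an empty initial backlog, the level 1 backlog just before time 10 is non-zero only when \(\tau _{2}\) takes its long execution time, and the level 2 backlog at time 7 has one remaining time unit with probability 0.1.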

Upper bound of backlog

In order to provide a holistic schedulability analysis, we need to determine upper bounds of the backlogs for all time instances within any future hyperperiod, i.e., we are interested in a set of random variables \(\overline{{\mathcal {B}}}_{i}(t)\) such that \({\mathcal {B}}_{i}(n\cdot {\textsc {hp}}+ t) \preceq \overline{{\mathcal {B}}}_{i}(t)\) for all priority levels i, future hyperperiods \(n \ge 0\) and time instances within a hyperperiod \(0 \le t < {\textsc {hp}}\). We start by computing the steady-state backlog and proceed by showing that it provides the desired upper bound.

Computation of the steady state backlog The ith priority backlog at the start of the nth hyperperiod is \({\mathcal {B}}_{i}(n\cdot {\textsc {hp}})\), but this backlog may be different for each n. However, the sequence of random variables \(\{{\mathcal {B}}_{i}(n\cdot {\textsc {hp}})\}\) can be viewed as a Markov process as shown by Díaz et al. (2002). Specifically, they present the following theorem about the existence of a limit to the above mentioned sequence, including the corresponding proof:

Theorem 1

(Section 4.2 of Díaz et al. 2002) The sequence of backlogs \(\{{\mathcal {B}}_{i}(n\cdot {\textsc {hp}})\}\) for \(n\ge 0,\) where i is a priority level, has a limit if the average system utilization is less than one, and if the sequence of jobs remains the same each hyperperiod. If it exists, this limit is called the ith priority steady state backlog at the beginning of the hyperperiod, and denoted \({\overline{{\mathcal {B}}}}_{i}(0)\).

For computing the steady state backlog at the start of a hyperperiod \(\overline{{\mathcal {B}}}_{i}(0)\), Díaz et al. propose three methods. The first method is an exact one stated in Sect. 4.3.2 of Díaz et al. (2002) and exploits the structure of the infinite dimension transition matrix \(\varvec{P}\). A second method (Sect. 4.3.3 of Díaz et al. (2002)) finds an approximate value of \(\overline{{\mathcal {B}}}_{i}(0)\) by truncating \(\varvec{P}\) to make its dimension finite. Finally, a third method is to iterate over hyperperiods until the following relaxed steady state condition is satisfied:

$$ \max _{i, x}\left\{ \left| p_{{\mathcal {B}}_{i}(k\cdot {\textsc {hp}})}(x) - p_{{\mathcal {B}}_{i}((k-1)\cdot {\textsc {hp}})}(x)\right| \right\} <\epsilon . $$
(9)

This condition states that the maximum difference between the ith priority backlogs at the start of two consecutive hyperperiods, taken over all priority levels i and all backlog values x, must be smaller than a configurable small value \(\epsilon \). This method requires neither computation nor truncation of the transition matrix \(\varvec{P}\). For further details on choosing appropriate initial backlogs, please refer to Díaz et al. (2004).
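The iterative method with the stopping rule of Eq. (9) can be sketched as follows. This is a simplified illustration, not the procedure of Díaz et al.: it assumes a single task whose period equals the hyperperiod, a single priority level, and a zero initial backlog. The names `steady_state_backlog`, `eps` and `max_iter`, as well as the example distribution, are hypothetical.

```python
def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def shrink(p, m):
    return [sum(p[: m + 1])] + list(p[m + 1:])


def steady_state_backlog(exec_pmf, period, eps=1e-9, max_iter=10_000):
    """Iterate hyperperiod by hyperperiod until Eq. (9) holds.
    Assumption: one task, hyperperiod == period, initial backlog zero."""
    current = [1.0]
    for _ in range(max_iter):
        # Release one job, then execute for one period.
        nxt = shrink(convolve(current, exec_pmf), period)
        width = max(len(current), len(nxt))
        a = current + [0.0] * (width - len(current))
        b = nxt + [0.0] * (width - len(nxt))
        if max(abs(x - y) for x, y in zip(a, b)) < eps:
            return nxt
        current = nxt
    raise RuntimeError("steady state condition not reached")


# Hypothetical task: period 2, execution time 1 (80%) or 3 (20%),
# i.e., an average utilization of 0.7.
limit = steady_state_backlog([0.0, 0.8, 0.0, 0.2], period=2)
```

For this particular example the backlog at hyperperiod boundaries behaves like a reflected random walk, so the limit is geometric with \(p(0) = 0.75\) and \(p(1) = 0.1875\), which the iteration approaches from a zero initial backlog.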

Pessimism of the steady state backlog Assuming that the initial backlog is zero at every priority level, and that the sequence of jobs remains the same each hyperperiod, it has been shown in Díaz et al. (2004) that the ith priority steady state backlog is an upper bound to all ith priority backlogs at the start of the hyperperiod. The following two Lemmas can be used to show that the backlogs at the beginning of a hyperperiod are increasing from hyperperiod to hyperperiod. They state that the operations of convolution and shrinking preserve the partial ordering of random variables.

Lemma 1

(Property 3 in Díaz et al. 2004) Given three positive random variables \({\mathcal {A}}, {\mathcal {B}}, {\mathcal {C}}\). If \({\mathcal {A}}\preceq {\mathcal {B}},\) then \({\mathcal {A}}+{\mathcal {C}}\preceq {\mathcal {B}}+{\mathcal {C}}\).

Lemma 2

(Property 6 in Díaz et al. 2004) Given two positive random variables \({\mathcal {A}}, {\mathcal {B}}\). If \({\mathcal {A}}\preceq {\mathcal {B}},\) then \(\text {shrink}({\mathcal {A}}, m) \preceq \text {shrink}({\mathcal {B}}, m)\).

Now, the following Theorem can be shown by means of the above considerations: We have, by definition, \(\overline{{\mathcal {B}}}_{i}(t) = \lim _{n \rightarrow \infty }{{\mathcal {B}}_{i}(t + n \cdot {\textsc {hp}})}\) for all \(0 \le t < {\textsc {hp}}\), and we know from Theorem 1 and the monotonicity discussed above that \({\mathcal {B}}_{i}(n \cdot {\textsc {hp}}) \preceq \overline{{\mathcal {B}}}_{i}(0)\) for all \(n \ge 0\).

Theorem 2

(Theorem 1 in Díaz et al. 2004) Assuming that the initial backlog is zero, and that the sequence of jobs remains the same each hyperperiod, the ith priority backlog at time t inside every hyperperiod is upper bounded by the ith priority steady state backlog at time t inside the hyperperiod:

$$ \forall i:\; p_{{\mathcal {B}}_{i}(0)}(0)=1 \, \Rightarrow \, \forall t\in [0, {\textsc {hp}}),\; \forall n\in {\mathbb {N}}:\; {\mathcal {B}}_{i}(n\cdot {\textsc {hp}}+ t) \preceq \overline{{\mathcal {B}}}_{i}(t) $$

In summary, if the initial backlog is zero, the steady-state backlog \(\overline{{\mathcal {B}}}_{i}(t)\) provides an upper bound for all backlogs within any future hyperperiod. This result will be used extensively in the response time analysis described next.

Response time analysis

The response time of a job \({\mathcal {R}}_{{i,j}}\) tells us when this job will finish its execution, relative to its release time. We summarize the procedure as proposed by Díaz et al. (2002). The response time of a given job \({\tau _{{i,j}}}\) is influenced by the initial backlog at its release time \({\mathcal {B}}_{i}(r_{{i,j}})\), and the computation times of all jobs that preempt the job. Therefore we can define a function:

$$ {\mathcal {R}}_{{i,j}}= \texttt {rta}\left( {\mathcal {B}}_{i}(r_{{i,j}}),\, \Pi ,\, {\tau _{{i,j}}} \right) . $$
(10)

The pseudocode for computing response times is given in Algorithm 1. For a given job \({\tau _{{i,j}}}\), first \({\mathcal {C}}_{i}\) is convolved with the current ith priority backlog (line 2). This would provide us with the response time of \({\tau _{{i,j}}}\), if there were no preempting jobs. When a preempting job is released at a given point in time, then the probability function vector of \({\tau _{{i,j}}}\)’s response time is split in two portions (line 6): the part before preemption (\({R}_{l}\)), and the part after preemption (\({R}_{u}\)). The part after preemption is convolved with the probability function vector of the preempting job’s computation time, and the result is added to \(R_{l}\) in order to get \({\tau _{{i,j}}}\)’s response time after this preemption (lines 7 and 8). The probability function of \({\mathcal {R}}_{{i,j}}\) is only computed until the job’s deadline \(d_{i,j}\).
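The split-and-convolve step described above can be sketched as follows. This is an illustrative reading of the prose, not a reproduction of Algorithm 1. It assumes that the backlog passed in excludes the job's own execution time, that preemption offsets are integers measured from the job's release, and that a job finishing exactly at a preemption instant is not delayed. The names `response_time_pmf` and `deadline_miss_probability` are hypothetical, and the sketch does not truncate the probability function at the deadline.

```python
def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def response_time_pmf(backlog, exec_pmf, preemptions):
    """backlog: pending work of equal or higher priority at release.
    preemptions: (offset, pmf) pairs of higher priority jobs released
    `offset` time units after the analysed job."""
    r = convolve(backlog, exec_pmf)
    for offset, pmf in sorted(preemptions, key=lambda e: e[0]):
        lower = r[: offset + 1]                        # finished by `offset`
        upper = [0.0] * len(lower) + r[offset + 1:]    # still running
        if not any(upper):
            continue
        shifted = convolve(upper, pmf)                 # delayed by preemption
        r = [(lower[k] if k < len(lower) else 0.0) + shifted[k]
             for k in range(len(shifted))]
    return r


def deadline_miss_probability(r, relative_deadline):
    return sum(r[relative_deadline + 1:])


# Hypothetical job: execution time 2 or 4 (50% each), empty backlog,
# one preempting job of length 1 released 3 time units after release.
r = response_time_pmf([1.0], [0.0, 0.0, 0.5, 0.0, 0.5], [(3, [0.0, 1.0])])
dmp = deadline_miss_probability(r, relative_deadline=4)
```

In this example the short execution finishes before the preemption, whereas the long one is pushed from 4 to 5 time units and therefore misses a relative deadline of 4.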

Next, we present a theorem that we will use to obtain the worst-case hourly deadline miss probability. First, the following Lemma shows that the response time function \(\texttt {rta}\) is monotone in the backlog at the release time of the job.

Lemma 3

(Theorem 1, Property 3 of López et al. 2008) Given two random variables \({\mathcal {A}},\) \({\mathcal {B}}\). If \({\mathcal {A}}\preceq {\mathcal {B}}\), then \(\texttt {rta}\left( {\mathcal {A}},\, \Pi ,\, {\tau _{{i,j}}} \right) \preceq \texttt {rta}\left( {\mathcal {B}},\, \Pi ,\, {\tau _{{i,j}}} \right) \).

As the steady-state backlog at any time within a hyperperiod is always greater than or equal to the backlog at the corresponding time within any hyperperiod, the following Lemma can be obtained.

Lemma 4

Assuming the initial backlog is zero, substituting any backlog \({\mathcal {B}}_{i}(r_{{i,j}})\) with the appropriate steady state backlog \(\overline{{\mathcal {B}}}_{i}(r_{{i,j}})\) in the response time analysis produces a value greater than or equal to the response time.

$$ \forall i:\; p_{{\mathcal {B}}_{i}(0)}(0)=1 \, \Rightarrow \, \texttt {rta}\left( {\mathcal {B}}_{i}(r_{i,j}),\, \Pi ,\, {\tau _{{i,j}}} \right) \preceq \texttt {rta}\left( \overline{{\mathcal {B}}}_{i}(r_{i,j} \bmod {\textsc {hp}}),\, \Pi ,\, {\tau _{{i,j}}} \right) . $$

Proof

This Lemma is a direct consequence of Lemma 3 and Theorem 2 as well as the results in López et al. (2008). \(\square \)

The value \(\texttt {rta}\left( {\overline{{\mathcal {B}}}}_{i}(r_{i,j} \bmod {\textsc {hp}}),\, \Pi ,\, {\tau _{{i,j}}} \right) \) will be named the steady state response time, and denoted as \({\overline{{\mathcal {R}}}}_{{i,j}}\). Note that use of the steady-state backlog \(\overline{{\mathcal {B}}}_{i}\) leads to an upper bound of the response time \({\mathcal {R}}_{{i,j}}\). Based on these results, we can now determine an upper bound on the response time of each job. Due to the fact that we defined the steady-state (worst case) hyperperiod, we can finally determine the worst-case deadline miss probability of a job \({\tau _{{i,j}}}\) within any hyperperiod. Instead of using the modulo operation as in Lemma 4 we can also just look at jobs \({\tau _{{i,j}}}\) within the single worst case hyperperiod with \(0 \le j < {\textsc {hp}}/T_{i}\).

Theorem 3

The deadline miss probability of a job \({\tau _{{i,j}}}\) denoted as \({\textsf {DMP}_{{i,j}}}\) can be bounded as follows:

$$ \forall i, 0 \le j< {\textsc {hp}}/T_{i} \;:\; {\textsf {DMP}_{{i,j}}\le {\overline{\textsf {DMP}}}_{{i,j}}} = {\mathbb {P}}\left( d_{{i,j}}< \texttt {rta}\left( \overline{{\mathcal {B}}}_{i}(r_{i,j}),\, \Pi ,\, {\tau _{{i,j}}} \right) \right) . $$

Proof

The proof follows directly from the results described in López et al. (2008) as well as Lemma 4. \(\square \)

Analysis of mixed-criticality systems with stochastic task execution times

In this section, we determine the \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}})\)-schedulability of a mixed-criticality task set \(\Pi \) as defined in Definition 5. To this end, we compute upper bounds on probabilities that there is at least one deadline miss of a \({{\textsc {hi}}}\) or \({{\textsc {lo}}}\) job within 1 h, i.e., \(\textsf {DMP}_{{{\textsc {hi}}}}(T)\) or \(\textsf {DMP}_{{{\textsc {lo}}}}(T)\), respectively, for a time interval of length \(T = 1\) h. In addition, we will compute an upper bound on the probability \(\textsf {PDJ}_{{\text {deg}}}\) that a \({{\textsc {lo}}}\) job operates in degraded mode. The underlying concept of the forthcoming analysis is described next.

Let us start with the computation of the probability \(\textsf {PDJ}_{{\text {deg}}}\) that a \({{\textsc {lo}}}\) job operates in degraded mode. This probability can be upper bounded by noting that \({{\textsc {lo}}}\) jobs are executed in their degraded mode only if their release time \(r_{{i,j}}\) happens during \({{\textsc {hi}}}\)-criticality mode. Therefore, we will first determine the maximal length \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) of any \({{\textsc {hi}}}\)-criticality mode execution. In addition, we determine an upper bound on the probability that there is at least one mode switch within a single hyperperiod, denoted as \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\). Using these two values, we can bound the relative time the system is in \({{\textsc {hi}}}\) mode and therefore, the probability that a \({{\textsc {lo}}}\) job operates in degraded mode.

To determine upper bounds on probabilities \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\), \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) that there is at least one deadline miss of a \({{\textsc {hi}}}\) or \({{\textsc {lo}}}\) job within 1 h, we first look at upper bounds on the probabilities that at least one \({{\textsc {hi}}}\) or \({{\textsc {lo}}}\) job misses its deadline during any \({{\textsc {hi}}}\)-criticality mode execution that is started within a hyperperiod, denoted as \(\textsf {DMP}_{{{\textsc {hi}}}}^{{{\textsc {hi}}}}\) or \(\textsf {DMP}_{{{\textsc {lo}}}}^{{{\textsc {hi}}}}\), respectively. Note that the upper index denotes the mode, whereas the lower one denotes the criticality of the jobs we are considering. In addition, we determine an upper bound on the probability that at least one \({{\textsc {hi}}}\) or \({{\textsc {lo}}}\) job misses its deadline during a hyperperiod under the conditions that first, no mode switches take place and second, \({{\textsc {hi}}}\) jobs do not overrun their threshold \(C^{{\text {thr}}}\). We denote these values as \(\textsf {DMP}_{{{\textsc {hi}}}}^{{{\textsc {lo}}}}\) or \(\textsf {DMP}_{{{\textsc {lo}}}}^{{{\textsc {lo}}}}\), respectively. Again, the upper index concerns the mode and the lower one the criticality of the considered jobs. Now we can determine the desired probabilities \(\textsf {DMP}_{{{\textsc {hi}}}}(T)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(T)\) by combining (a) the worst-case probabilities \(\textsf {DMP}_{{{\textsc {hi}}}}^{{{\textsc {lo}}}}\) and \(\textsf {DMP}_{{{\textsc {lo}}}}^{{{\textsc {lo}}}}\) that a deadline miss happens during a hyperperiod if the system is in \({{\textsc {lo}}}\)-criticality mode, and (b) the worst-case probabilities \(\textsf {DMP}_{{{\textsc {hi}}}}^{{{\textsc {hi}}}}\) and \(\textsf {DMP}_{{{\textsc {lo}}}}^{{{\textsc {hi}}}}\) that at least one \({{\textsc {hi}}}\) or \({{\textsc {lo}}}\) job misses its deadline during any \({{\textsc {hi}}}\)-criticality mode started within a hyperperiod.

We will now first determine bounds \(\textsf {PDJ}_{{\text {deg}}}\) and \(\textsf {DMP}_{\chi }(1h)\) using the above defined quantities: \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\), \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\), \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\) and \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\) for \({{\textsc {hi}}}\) and \({{\textsc {lo}}}\) jobs, i.e., for \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\). Afterwards, we will explain how these quantities can be determined.

Probability of job degradation

In this section, we will compute an upper bound on the probability that a \({{\textsc {lo}}}\) job operates in degraded mode, i.e., \(\textsf {PDJ}_{{\text {deg}}}\). As described above, we will make use of the maximal duration of a \({{\textsc {hi}}}\)-criticality mode execution and the probability that there is at least one mode switch within a hyperperiod.

Definition 9

(Maximal Duration of High-Criticality-Mode) The quantity \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) denotes the maximal duration the system is continuously executing in \({{\textsc {hi}}}\)-criticality mode.

Definition 10

(Mode Switch Probability) The quantity \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\) denotes an upper bound on the probability that there is at least one mode switch lo \(\rightarrow \) hi within a single hyperperiod.

Using these definitions, we can determine an upper bound on the desired quantity.

Theorem 4

The probability of degradation of a \({{\textsc {lo}}}\) job can be bounded as follows:

$$ \textsf {PDJ}_{{\text {deg}}} \le \left\lceil \frac{\Delta ^{{{\textsc {hi}}}}_{\max {}}}{{\textsc {hp}}} + 1 \right\rceil \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}. $$
(11)

Proof

We obtain this value by multiplying the probability that hi-criticality mode is entered during one hyperperiod with the number of lo jobs that are released in degraded mode when it does.

First, note that there is some constant number K of lo jobs that are released every hyperperiod. From the moment one \({{\textsc {hi}}}\)-criticality mode is entered, it executes at least partly in at most \(\lceil 1 + \Delta ^{{{\textsc {hi}}}}_{\max {}}/ {\textsc {hp}}\rceil \) hyperperiods. Therefore, whatever the number of mode switches is inside one hyperperiod, in the worst case, all lo jobs from this and the next \(\lceil \Delta ^{{{\textsc {hi}}}}_{\max {}}/ {\textsc {hp}}\rceil \) hyperperiods are executed in degraded mode. In other words, \(K\cdot \left\lceil {\Delta ^{{{\textsc {hi}}}}_{\max {}}}/{{\textsc {hp}}} + 1 \right\rceil \) lo jobs are degraded.

Second, let us note that there is at least one mode switch within a hyperperiod with probability \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\). Combining this probability with the number of lo jobs that are degraded if a mode switch happens, we get:

$$\begin{aligned} \textsf {PDJ}_{{\text {deg}}}&\le \left( K\cdot \left\lceil \frac{\Delta ^{{{\textsc {hi}}}}_{\max {}}}{{\textsc {hp}}} + 1 \right\rceil \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}} + 0\cdot (1 - \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}) \right) K^{-1} \\&= \left\lceil \frac{\Delta ^{{{\textsc {hi}}}}_{\max {}}}{{\textsc {hp}}} + 1 \right\rceil \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}. \end{aligned}$$

\(\square \)

This upper bound on the probability of degradation of a \({{\textsc {lo}}}\) job may be overly pessimistic in the case when the hyperperiod is much larger than the maximal duration of \({{\textsc {hi}}}\)-criticality mode, \({\textsc {hp}}\gg \Delta ^{{{\textsc {hi}}}}_{\max {}}\). Still, in practical scenarios, it is not considered usual practice to design a system with a very long hyperperiod. We therefore accept the upper bound as satisfactory.
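The bound of Eq. (11) is straightforward to evaluate once its two inputs are known. The following sketch assumes that both durations are given in the same time unit; the function name `pdj_deg_bound` and all numbers are hypothetical.

```python
import math


def pdj_deg_bound(delta_hi_max, hyperperiod, p_switch_hp):
    """Eq. (11): bound on the probability that a lo job is degraded."""
    return math.ceil(delta_hi_max / hyperperiod + 1) * p_switch_hp


# hi mode lasts at most 25 time units, hyperperiod of 10 time units.
typical = pdj_deg_bound(25, 10, 1e-6)

# Hyperperiod much longer than the hi mode: the factor is still 2,
# which illustrates the pessimism discussed above.
long_hp = pdj_deg_bound(1, 1000, 1e-6)
```

In the first case the factor is \(\lceil 3.5 \rceil = 4\); in the second case the factor cannot drop below 2 even though hi-criticality mode covers only a small fraction of the hyperperiod.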

The necessary quantities \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) and \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\) will be determined later as part of our analysis of the \({{\textsc {hi}}}\)- and \({{\textsc {lo}}}\)-criticality modes.

Probabilities of deadline misses

Let us now determine the deadline miss probabilities \(\textsf {DMP}_{{{\textsc {hi}}}}(T)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(T)\), i.e., the probabilities that at least one \({{\textsc {hi}}}\) criticality job or one \({{\textsc {lo}}}\) criticality job misses its deadline within the time interval T. With \(T=1\) h we get the quantities as required by the schedulability test according to Definition 5. For the following theorem, let us suppose that \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\) denotes the criticality of jobs in the deadline miss probabilities.

In principle, the analysis investigates two coupled systems. The first one which is denoted as the \({{\textsc {lo}}}\)-system never does a mode switch, i.e., all mode switch events are ignored. In addition, it uses modified execution time probabilities of \({{\textsc {hi}}}\) criticality jobs such that the \({{\textsc {lo}}}\)-system pessimistically describes the behavior of the original system if operating in \({{\textsc {lo}}}\)-criticality mode. In particular, all execution times of \({{\textsc {hi}}}\) jobs that are higher than the threshold are trimmed to it, see Definition 2. The worst-case steady-state probability that at least one \(\chi \) job misses its deadline during a hyperperiod in the \({{\textsc {lo}}}\)-system is denoted as \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\). This probability is determined using the worst-case steady-state backlog and response-time analysis as provided in Lemma 4, but using the trimmed execution times of \({{\textsc {hi}}}\) jobs. The other system is denoted as the \({{\textsc {hi}}}\)-system and considers the case that at least one lo \(\rightarrow \) hi mode switch happened within a hyperperiod, i.e., at least one \({{\textsc {hi}}}\)-criticality mode is executed.

Definition 11

(Deadline Miss Probabilities in Different Modes) The worst-case probability that at least one \(\chi \) critical job misses its deadline during any \({{\textsc {hi}}}\)-criticality mode started in a single hyperperiod is denoted as \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\). The worst-case steady-state probability that at least one \(\chi \) critical job misses its deadline during a hyperperiod in a system where (a) all mode switch events are ignored and (b) execution times of \({{\textsc {hi}}}\) jobs are trimmed to their threshold according to Definition 2 is denoted as \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\).

Note that \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\) can be computed according to Lemma 4. Using these definitions, we can determine bounds on the requested deadline miss probabilities using the following result. The desired probabilities per hour can be obtained by setting \(T=1\) h.

Theorem 5

(Deadline Miss Probabilities) The deadline miss probabilities \(\textsf {DMP}_{\chi }(T)\) for \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\) can be bounded as follows:

$$ \textsf {DMP}_{\chi }(T) \le 2 - \left( 1 - \textsf {DMP}_{\chi }^{{{\textsc {lo}}}} \right) ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil } - \left( 1 - \textsf {DMP}_{\chi }^{{{\textsc {hi}}}} \right) ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil }. $$
(12)

Proof

It needs to be proven that the probability that there is no deadline miss of any \(\chi \) job within time interval T is bounded by

$$ 1 - \textsf {DMP}_{\chi }(T) \ge \left( 1 - \textsf {DMP}_{\chi }^{{{\textsc {lo}}}} \right) ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil } + \left( 1 - \textsf {DMP}_{\chi }^{{{\textsc {hi}}}} \right) ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil } - 1. $$

There is no deadline miss within T if there is no deadline miss when the system executes in \({{\textsc {lo}}}\)-criticality mode and there is no deadline miss if it operates in \({{\textsc {hi}}}\)-criticality mode. Suppose the first event is named a and the second one b, then we know that \(p(a \cap b) = p(a) + p(b) - p(a\cup b) \ge p(a) + p(b) - 1\) even if both events are not independent. Therefore, the theorem is true if

$$\left( 1 - \textsf {DMP}_{\chi }^{{{\textsc {lo}}}} \right) ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil } $$

lower bounds the probability that there is no deadline miss when the system is in \({{\textsc {lo}}}\)-criticality mode and

$$ \left( 1 - \textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\right) ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil } $$

lower bounds the probability that there is no deadline miss when the system is in \({{\textsc {hi}}}\)-criticality mode.

Let us first look at the \({{\textsc {lo}}}\)-criticality mode. At first, note that \(\lceil T/{\textsc {hp}}\rceil \) is the number of hyperperiods that completely cover an interval of length T. Therefore, we can safely assume that our interval has the length of \(\lceil T/{\textsc {hp}}\rceil \) full hyperperiods. Remember that the backlogs during a steady-state computation are monotonically increasing, see Theorem 2. In a similar way, response times of jobs are monotonically increasing from hyperperiod to hyperperiod, see Lemma 4. As a result, the deadline miss probabilities of jobs are increasing from hyperperiod to hyperperiod as well and \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\) is a safe upper bound for every hyperperiod in our modified \({{\textsc {lo}}}\)-system. We model the system as a worst-case Bernoulli process, acting from hyperperiod to hyperperiod. As a result, \(\left[ 1 - \textsf {DMP}_{\chi }^{{{\textsc {lo}}}} \right] ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil }\) is a lower bound on the probability that there is no deadline miss in the \({{\textsc {lo}}}\)-system, i.e. all switching events are disabled and the execution times of \({{\textsc {hi}}}\) jobs are trimmed.

It remains to be shown that the response times in our \({{\textsc {lo}}}\)-system are always larger than or equal to those in the original system when it is in \({{\textsc {lo}}}\)-criticality mode. This is certainly true as after a hi \(\rightarrow \) lo mode switch, the backlogs are 0 for sure and therefore, they are lower than those in the modified \({{\textsc {lo}}}\)-system. Due to Lemma 4, the response times are larger in the modified \({{\textsc {lo}}}\)-system. Moreover, trimming of execution times of \({{\textsc {hi}}}\) criticality jobs has no influence on the backlogs as long as there is no lo \(\rightarrow \) hi mode switch, i.e., the original system operates in \({{\textsc {lo}}}\)-mode.

Now let us look at the \({{\textsc {hi}}}\)-mode. Again, note that \(\lceil T/{\textsc {hp}}\rceil \) is the number of hyperperiods that completely cover an interval of length T. The worst-case probability that at least one \(\chi \) critical job misses its deadline during any \({{\textsc {hi}}}\)-criticality mode started in a single hyperperiod is denoted as \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\), see Definition 11. Therefore, \(\left[ 1 - \textsf {DMP}_{\chi }^{{{\textsc {hi}}}} \right] ^{\left\lceil \frac{T}{{\textsc {hp}}}\right\rceil }\) is a lower bound on the probability that there is no deadline miss caused by a lo \(\rightarrow \) hi switch within the interval of length T.

This concludes the proof as we considered the case that the system operates in \({{\textsc {lo}}}\)-criticality mode somewhere within a hyperperiod (bounded by the case that it is always in this mode during the hyperperiod) and the case that one or more \({{\textsc {hi}}}\)-criticality modes are started within a hyperperiod (all corresponding deadline misses are accounted for in the hyperperiod where the \({{\textsc {hi}}}\)-criticality mode was started). \(\square \)
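To get a feeling for the magnitude of the bound in Eq. (12), it can be evaluated numerically. The sketch below rewrites the bound as the sum of two terms of the form \(1 - (1 - p)^{n}\) and evaluates each with `math.log1p` and `math.expm1`, since the per-hyperperiod probabilities are typically far below double-precision resolution around 1. The function names and all input values are hypothetical; the interval and the hyperperiod are assumed to be given in the same time unit.

```python
import math


def one_minus_power(p, n):
    """1 - (1 - p)**n, evaluated stably for very small p."""
    if p >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p))


def dmp_bound(dmp_lo_mode, dmp_hi_mode, interval, hyperperiod):
    """Eq. (12): 2 - (1 - DMP^lo)^n - (1 - DMP^hi)^n with n = ceil(T / hp)."""
    n = math.ceil(interval / hyperperiod)
    return one_minus_power(dmp_lo_mode, n) + one_minus_power(dmp_hi_mode, n)


# One hour expressed in milliseconds, hyperperiod of 1000 ms (assumed values).
hourly = dmp_bound(1e-12, 1e-13, interval=3_600_000, hyperperiod=1000)
```

For such small probabilities the bound is close to \(\lceil T/{\textsc {hp}}\rceil \cdot (\textsf {DMP}_{\chi }^{{{\textsc {lo}}}} + \textsf {DMP}_{\chi }^{{{\textsc {hi}}}})\), here roughly \(3.96 \cdot 10^{-9}\) per hour.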

Now we will determine the quantities \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\), \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\), \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\) and \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\) required to compute \(\textsf {PDJ}_{{\text {deg}}}\), \(\textsf {DMP}_{{{\textsc {hi}}}}(T)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(T)\). We start by analyzing the behavior of the MC system in \({{\textsc {lo}}}\)-criticality mode.

LO-criticality mode

The analysis of the \({{\textsc {lo}}}\)-criticality mode will allow us to determine some of the required quantities, namely the worst case probability \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}\) of at least one lo \(\rightarrow \) hi mode switch within a hyperperiod and the worst-case probability \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\) that at least one \(\chi \) critical job misses its deadline within a hyperperiod if operating in the modified \({{\textsc {lo}}}\)-system, see Sect. 5.2. Moreover, we will determine the worst-case probability of a lo \(\rightarrow \) hi mode switch at time instance \(t\in \{0,\ldots , {\textsc {hp}}-1\}\) within any hyperperiod, as this quantity will allow us to analyse the \({{\textsc {hi}}}\)-criticality mode later on.

Lemma 5

Given a modified task system where no lo \(\rightarrow \) hi mode switch is executed and all \({{\textsc {hi}}}\) critical jobs are trimmed to their execution time threshold \(C_{i}^{{\text {thr}}},\) see Definition 2. Then,

$$\begin{aligned} \textsf {DMP}_{\chi }^{{{\textsc {lo}}}}&= \left[ \sum _{{\tau _{{i,j}}}\in S} {{\overline{\textsf {DMP}}}}_{{i,j}}\right] ^1, \\ S&= \{ {\tau _{{i,j}}}\; | \; \chi _{i} = \chi \; \wedge \; 0 \le j < {\textsc {hp}}/T_{i} \} \end{aligned}$$

is an upper bound on the probability of at least one deadline miss of any \(\chi \) job during \({{\textsc {lo}}}\)-criticality mode execution within any hyperperiod, where \({\overline{\textsf {DMP}}}_{{i,j}}\) denotes an upper bound on the deadline miss probability of job \({\tau _{{i,j}}}\) according to Theorem 3. Note, \([\ldots ]^{1}\) indicates the expression is limited to a maximum value of 1.

Proof

We will show that the response times in the modified system are always larger than or equal to those in the original system when it is in \({{\textsc {lo}}}\)-criticality mode. According to Theorem 3, the upper bound on the deadline miss probability \({\overline{\textsf {DMP}}}_{{i,j}}\) holds for any hyperperiod. On the other hand, we cannot assume that the miss probabilities for the jobs within a hyperperiod are independent. Therefore, we upper bound the probability of the union of events by their sum. It remains to be shown that the modified \({{\textsc {lo}}}\)-system with all lo \(\rightarrow \) hi mode switches disabled and the trimmed execution times of \({{\textsc {hi}}}\) critical jobs provides upper bounds on the original system when operating in \({{\textsc {lo}}}\)-criticality mode. This is certainly true as after a hi \(\rightarrow \) lo mode switch in the original system, the backlogs are 0 for sure and therefore, they are lower than those in the modified \({{\textsc {lo}}}\)-system. Due to Lemma 4, the response times are larger in the modified \({{\textsc {lo}}}\)-system. Moreover, trimming of execution times of \({{\textsc {hi}}}\) criticality jobs has no influence on the backlogs as long as there is no lo \(\rightarrow \) hi mode switch, i.e., the original system operates in \({{\textsc {lo}}}\)-mode.

Bounding the value \(\textsf {DMP}_{\chi }^{{{\textsc {lo}}}}\) by 1 is safe, as for any two events a and b we have \(p(a\cup b) \le p(a) + p(b)\) and \(p(a\cup b) \le 1\), leading to \(p(a\cup b) \le \min {(1, p(a) + p(b))}\). \(\square \)

Now, we will determine an upper bound on the worst-case probability \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t)\) of a lo \(\rightarrow \) hi mode switch at time instance \(t\in \{0,\ldots , {\textsc {hp}}-1\}\) within any hyperperiod. Remember that there are three triggering events for a lo \(\rightarrow \) hi mode switch, namely (a) a \({{\textsc {hi}}}\) critical job misses its deadline, (b) the system-level backlog, meaning the amount of pending execution, becomes higher than a predefined threshold \(B_{\max {}}\), and (c) a \({{\textsc {hi}}}\) critical job overruns its threshold execution time \(C_{{\text {thr}}}\). We will analyze the three mechanisms one after the other and finally combine the results.

Let us start with the deadline miss probability at time instance \(0 \le t < {\textsc {hp}}\) which we denote as \(\textsc {P}_{dm}(t)\).

Lemma 6

Given a modified task system where no lo \(\rightarrow \) hi mode switch is executed and all \({{\textsc {hi}}}\) critical jobs are trimmed to their execution time threshold \(C_{i}^{{\text {thr}}},\) see Definition 2. Then,

$$\begin{aligned} \forall 0 \le t < {\textsc {hp}}\; : \, \textsc {P}_{dm}(t)&= \left[ \sum _{{\tau _{{i,j}}}\in S(t)} {{\overline{\textsf {DMP}}}_{{i,j}}} \right] ^1 \\ S(t)&= \{ {\tau _{{i,j}}}\; | \; \chi _{i} = {{\textsc {hi}}}\; \wedge \; d_{{i,j}}= t \} \end{aligned}$$

is an upper bound on the probability of at least one deadline miss of any \({{\textsc {hi}}}\) critical job during \({{\textsc {lo}}}\)-criticality mode execution at time t, \(0 \le t < {\textsc {hp}},\) where \({\overline{\textsf {DMP}}}_{{i,j}}\) denotes an upper bound on the deadline miss probability of job \({\tau _{{i,j}}}\) in the modified task system according to Theorem 3. Note, \([\ldots ]^{1}\) indicates the expression is limited to a maximum value of 1.

Proof

We cannot assume that the deadline miss events at time t are independent. Therefore, we upper bound the probability of the union of these events by the sum of the individual probabilities.

Bounding the value \(\textsc {P}_{dm}(t)\) by 1 is safe, as for any two events a and b we have \(p(a\cup b) \le p(a) + p(b)\) and \(p(a\cup b) \le 1\), leading to \(p(a\cup b) \le \min {(1, p(a) + p(b))}\). S(t) denotes the set of all \({{\textsc {hi}}}\) critical jobs with deadline at time t. \(\square \)
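The bound of Lemma 6 can be sketched in a few lines of Python. This is only an illustration under assumptions: the tuple layout `(criticality, deadline, dmp_bound)` and the function name are hypothetical, the per-job bounds are taken as given inputs (in the analysis they come from Theorem 3), and deadlines are assumed to be expressed relative to the start of the hyperperiod.

```python
from collections import defaultdict


def deadline_miss_bound_per_instant(jobs, hp):
    """Return a list p_dm of length hp, where p_dm[t] is the union bound
    (capped at 1) on the probability that at least one HI job whose
    deadline is t misses that deadline.

    jobs: iterable of (criticality, deadline, dmp_bound) tuples.
          This layout is a hypothetical choice for the sketch.
    """
    sums = defaultdict(float)
    for criticality, deadline, dmp_bound in jobs:
        if criticality == "HI":          # S(t) only contains HI jobs
            sums[deadline] += dmp_bound  # sum of individual bounds
    return [min(1.0, sums[t]) for t in range(hp)]  # the [...]^1 cap


# Illustrative values only, not taken from the experiments.
jobs = [
    ("HI", 2, 0.25),
    ("HI", 2, 0.5),
    ("LO", 2, 0.5),   # ignored: LO jobs do not trigger a mode switch
    ("HI", 3, 0.75),
    ("HI", 3, 0.5),   # the sum at t = 3 exceeds 1 and is capped
]
p_dm = deadline_miss_bound_per_instant(jobs, hp=4)
```

The cap only matters when the summed bounds exceed 1, as at t = 3 in the example.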

We continue with the probability that at time instance \(0 \le t < {\textsc {hp}}\) the total backlog exceeds the upper bound \(B_{\max {}}\) which we denote as \(\textsc {P}_{be}(t)\).

Lemma 7

Given a modified task system where no lo \(\rightarrow \) hi mode switch is executed and all \({{\textsc {hi}}}\) critical jobs are trimmed to their execution time threshold \(C_{i}^{{\text {thr}}},\) see Definition 2. Then,

$$ \forall 0 \le t < {\textsc {hp}}\; : \, \textsc {P}_{be}(t) = {\mathbb {P}}({\overline{{\mathcal {B}}}}_{N}(t) > B_{\max {}}) $$

is an upper bound on the probability that the total backlog at time t exceeds \(B_{\max {}}\) during \({{\textsc {lo}}}\)-criticality mode execution within any hyperperiod, where \({\overline{{\mathcal {B}}}}_{N}(t)\) denotes an upper bound on the lowest priority backlog in the modified task system according to Theorem 2.

Proof

The total backlog equals \({\overline{{\mathcal {B}}}}_{N}(t)\) according to Definition 7. Then, the Lemma directly follows from Theorem 2. \(\square \)
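As a minimal sketch of Lemma 7, assume the backlog bound at time t is available as a discrete probability mass function, stored here as a dictionary from backlog value to probability (a representation chosen for this example only). The exceedance probability is then the tail mass above \(B_{\max {}}\):

```python
def backlog_exceedance(backlog_pmf, b_max):
    """P(B > b_max) for a discrete backlog pmf given as {value: probability}."""
    return sum(p for value, p in backlog_pmf.items() if value > b_max)


# Illustrative pmf, not taken from the experiments.
pmf = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
p_be = backlog_exceedance(pmf, b_max=1)
```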

Unfortunately, the computation of the probability \(\textsc {P}_{ov}(t)\) that at time instance \(0 \le t < {\textsc {hp}}\) at least one \({{\textsc {hi}}}\) critical job overruns its threshold execution time \(C_{i}^{{\text {thr}}}\) is more involved. Whereas the overrun probability \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\) can be calculated directly, it is more complex to determine at what time instance such an event happens, due to interference from other jobs. We first compute the upper bound on the backlog for our modified \({{\textsc {lo}}}\)-system as usual. Based on this, we consider each \({{\textsc {hi}}}\) critical job individually and compute its response time if the job had the execution time \(C_{i}^{{\text {thr}}}\). If this response time plus the release time \(r_{{i,j}}\) of the job equals t, then the job overruns at t under the condition that it overruns at all. The following Lemma summarizes the corresponding result.

Lemma 8

Given a modified task system where no lo \(\rightarrow \) hi mode switch is executed and all \({{\textsc {hi}}}\) critical jobs are trimmed to their execution time threshold \(C_{i}^{{\text {thr}}},\) see Definition 2. Then, \(\forall 0 \le t < {\textsc {hp}}\)

$$\begin{aligned} \textsc {P}_{ov}(t)&= \left[ \sum _{{\tau _{{i,j}}}\in S} {\mathbb {P}}\left( (\texttt {rta}\left( {\overline{{\mathcal {B}}}}_{i}(r_{{i,j}}),\, \Pi ,\, \tau _{i,j}^{\textsf {ov}} \right) + r_{{i,j}}) \mod {\textsc {hp}}= t \right) \cdot {\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) \right] ^{1}, \\ S&= \{ {\tau _{{i,j}}}\; | \; \chi _{i} = {{\textsc {hi}}}\} \end{aligned}$$

is an upper bound on the probability that at time instance \(0 \le t < {\textsc {hp}}\) at least one \({{\textsc {hi}}}\) critical job overruns its threshold execution time \(C_{i}^{{\text {thr}}}\). Here, \({\overline{{\mathcal {B}}}}_{i}(t)\) denotes an upper bound on the level i backlog in the modified task system according to Theorem 2 and \(\tau _{i,j}^\textsf {ov}\) denotes a modified job \({\tau _{{i,j}}}\) with a deterministic computation time of \(C_{i}^{{\text {thr}}}\). Note, \([\ldots ]^{1}\) indicates the expression is limited to a maximum value of 1.

Proof

First, note that we do not assume that the events of overrunning the threshold execution time \(C_{i}^{{\text {thr}}}\) are independent. Therefore, the probability of at least one overrun at time t is bounded by the sum of the individual probabilities for each \({{\textsc {hi}}}\) job, see the definition of S. Moreover, \({\mathbb {P}}(a) = {\mathbb {P}}(a | b) \cdot {\mathbb {P}}(b)\) for events a and b whenever a implies b. In our case, b is the event that job \({\tau _{{i,j}}}\) overruns its threshold execution time, with \({\mathbb {P}}(b) = {\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\), and a is the event that this overrun happens at time t. We now need to show that the term \({\mathbb {P}}((\texttt {rta}\left( {\overline{{\mathcal {B}}}}_{i}(r_{{i,j}}),\, \Pi ,\, \tau _{i,j}^{\textsf {ov}} \right) + r_{{i,j}}) \mod {\textsc {hp}}= t)\) denotes the probability that an overrun due to job \({\tau _{{i,j}}}\) happens at time t under the condition that the overrun happens at all, i.e., it represents \({\mathbb {P}}(a | b)\). Note that the term \([\texttt {rta}\left( {\overline{{\mathcal {B}}}}_{i}(r_{{i,j}}),\, \Pi ,\, \tau _{i,j}^{\textsf {ov}} \right) + r_{{i,j}}]\) denotes the finishing time of job \(\tau _{i,j}\) when using the worst-case steady-state backlogs \({\overline{{\mathcal {B}}}}\) and the execution time \(C_{i}^{{\text {thr}}}\). Therefore, under the assumption that the job overruns, it determines the distribution of the time when the overrun actually happens. As this time may be in the next hyperperiod, we use the modulo operation.

Bounding the value \(\textsc {P}_{ov}(t)\) by 1 is safe, as for any two events a and b we have \(p(a\cup b) \le p(a) + p(b)\) and \(p(a\cup b) \le 1\), leading to \(p(a\cup b) \le \min {(1, p(a) + p(b))}\). \(\square \)
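The structure of the bound in Lemma 8 can be illustrated as follows. The sketch assumes that, for every \({{\textsc {hi}}}\) job, the response-time distribution of the modified job \(\tau _{i,j}^{\textsf {ov}}\) has already been computed (by the response time analysis, which is not reproduced here) and is given as a dictionary from response time to probability. The dictionary keys `release`, `response_pmf` and `p_overrun` are hypothetical names introduced for this example.

```python
def overrun_instant_bound(hi_jobs, hp):
    """Return p_ov, where p_ov[t] bounds the probability that at least one
    HI job overruns its threshold at time t (relative to the hyperperiod).

    hi_jobs: list of dicts with the (hypothetical) keys
        'release'      : release time r_ij
        'response_pmf' : {response_time: probability} of the job when its
                         execution time is fixed to C_thr
        'p_overrun'    : P(C_i > C_thr)
    """
    p_ov = [0.0] * hp
    for job in hi_jobs:
        for response, prob in job["response_pmf"].items():
            # The overrun instant may fall into the next hyperperiod,
            # hence the modulo operation.
            t = (job["release"] + response) % hp
            # P(overrun at t) = P(at t | overrun) * P(overrun)
            p_ov[t] += prob * job["p_overrun"]
    return [min(1.0, p) for p in p_ov]  # the [...]^1 cap


# Illustrative values only.
hi_jobs = [
    {"release": 0, "response_pmf": {1: 0.5, 2: 0.5}, "p_overrun": 0.5},
    {"release": 2, "response_pmf": {1: 0.5, 3: 0.5}, "p_overrun": 0.25},
]
p_ov = overrun_instant_bound(hi_jobs, hp=4)
```

In the example, the second job contributes to t = 1 because its finishing time 5 wraps around the hyperperiod of length 4.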

Based on the previous three Lemmas we can conclude this section with the desired worst-case probability \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t)\) of a lo \(\rightarrow \) hi mode switch at time instance \(0 \le t < {\textsc {hp}}\) within any hyperperiod.

Theorem 6

\(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t)\) is an upper bound on the worst-case probability of a lo \(\rightarrow \) hi mode switch at time instance \(0 \le t < {\textsc {hp}}\) within any hyperperiod with

$$ \forall 0 \le t < {\textsc {hp}}\; : \, \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t) = \left[ \textsc {P}_{dm}(t) + \textsc {P}_{be}(t) + \textsc {P}_{ov}(t) \right] ^{1}, $$

where \(\textsc {P}_{dm}(t),\) \(\textsc {P}_{be}(t)\) and \(\textsc {P}_{ov}(t)\) are computed according to Lemmas 6, 7 and 8, respectively. An upper bound on the probability of at least one lo \(\rightarrow \) hi mode switch within a hyperperiod can be determined as

$$ \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}} = \left[ \sum _{0 \le t < {\textsc {hp}}} \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t) \right] ^{1}. $$

Note, \([\ldots ]^{1}\) indicates the expression is limited to a maximum value of 1.

Proof

The Theorem is a simple consequence of the previous Lemmas, as we cannot assume independence of events within a hyperperiod.

\(\square \)
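The combination step of Theorem 6 amounts to two capped sums. A minimal sketch, assuming the three per-instant bounds are already available as lists of length \({\textsc {hp}}\) (the function and variable names are chosen for this example):

```python
def mode_switch_bounds(p_dm, p_be, p_ov):
    """Combine the three trigger bounds into the per-instant bound
    P_lo->hi(t) and the per-hyperperiod bound, both capped at 1."""
    per_instant = [min(1.0, a + b + c) for a, b, c in zip(p_dm, p_be, p_ov)]
    per_hyperperiod = min(1.0, sum(per_instant))
    return per_instant, per_hyperperiod


# Illustrative values only, for a hyperperiod of length 4.
p_dm = [0.0, 0.125, 0.0, 0.0]
p_be = [0.0, 0.0, 0.25, 0.0]
p_ov = [0.0625, 0.125, 0.0, 0.0]
p_switch, p_switch_hp = mode_switch_bounds(p_dm, p_be, p_ov)
```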

As a simple corollary to the above Theorem, one can compute a lower bound on the expected length of a single \({{\textsc {lo}}}\)-criticality mode execution as

$$ \Delta ^{{{\textsc {lo}}}}_{\textit{exp}}= \left( \left\lceil \frac{1}{\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}}} \right\rceil - 1\right) \cdot {\textsc {hp}}. $$
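As a purely illustrative calculation with an assumed value (not taken from the experiments), suppose \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}^{{\textsc {hp}}} = 0.003\). Then

$$ \Delta ^{{{\textsc {lo}}}}_{\textit{exp}}= \left( \left\lceil \frac{1}{0.003} \right\rceil - 1\right) \cdot {\textsc {hp}}= (334 - 1) \cdot {\textsc {hp}}= 333 \cdot {\textsc {hp}}, $$

so in this example the bound on the expected length of a \({{\textsc {lo}}}\)-criticality mode execution is 333 hyperperiods.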

This result concludes the analysis of the \({{\textsc {lo}}}\)-criticality mode. We now analyze the \({{\textsc {hi}}}\)-criticality mode in order to determine the remaining quantities necessary for Theorems 4 and 5.

HI-criticality mode

We are still missing the computation of the maximal duration \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) of a \({{\textsc {hi}}}\)-criticality mode execution, as well as the worst-case probability \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\) of at least one deadline miss of any \(\chi \) job during any \({{\textsc {hi}}}\)-criticality mode started within a hyperperiod, where \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\).

To this end, we will determine \({\textsc {hp}}\) different worst-case \({{\textsc {hi}}}\)-criticality mode scenarios, one for each starting time \(0 \le t < {\textsc {hp}}\) relative to the beginning of a hyperperiod. In other words, we will investigate \({\textsc {hp}}\) different \({{\textsc {hi}}}\)-criticality mode executions and then use the maximum of their durations as \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\), and the maximum of their deadline miss probabilities to determine upper bounds that at least one \({{\textsc {hi}}}\) or \({{\textsc {lo}}}\) task misses its deadline during a single \({{\textsc {hi}}}\)-criticality mode execution. These quantities will then be combined with the probability \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t)\) that a lo \(\rightarrow \) hi switch happens at relative starting time t in order to determine \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\), i.e., the worst-case probability of at least one deadline miss of any \(\chi \) critical job during any \({{\textsc {hi}}}\)-criticality mode started within a hyperperiod.

Broadly speaking, hi-criticality mode differs from lo-criticality mode in three ways. First, jobs released in hi-mode have different execution times: lo jobs are released in degraded mode, and hi jobs are no longer subject to the condition that they do not overrun their \(C_{i}^{{\text {thr}}}\) execution time threshold. Second, ‘carry-over’ jobs, which are released in lo-criticality mode but whose deadlines are after the mode switch, are present in hi-criticality mode and need to be accounted for. Third, the initial system-level backlog is not zero, but depends on the trigger of the mode switch. To account for these differences, we present the following worst-case \({{\textsc {hi}}}\)-critical execution task-set. It is created such that it is pessimistic whatever the mode switch trigger may be, and it accounts for both carry-over jobs and jobs released during hi-mode.

The worst-case \({{\textsc {hi}}}\)-mode scenario for starting at time t will be defined as follows:

Definition 12

(Worst-Case HI-Criticality Execution) We define \({\textsc {hp}}\) task sets \({\widehat{\Pi }}(t)\), one for each starting time \(0 \le t < {\textsc {hp}}\). Each differs from the original task set \(\Pi \) as follows:

  1.

    The phase offsets \(\phi _{i}\) are implicitly changed such that all jobs are already available in \(0 \le t < {\textsc {hp}}\), i.e., we allow for negative job indices j.

  2.

    We consider all jobs with starting times after t, i.e., \(j \ge (t - \phi _{i}) / T_{i} + 1\). They have a known execution time \({\widehat{{\mathcal {C}}}}_{i}\) which is not larger than the degraded mode WCET \(C_{i}^{{\text {deg}}}\) for \({{\textsc {lo}}}\) criticality jobs, and a known execution time \({\widehat{{\mathcal {C}}}}_{i} = {\mathcal {C}}_{i}\) for \({{\textsc {hi}}}\) criticality jobs.

  3.

    We consider jobs whose release time is smaller than t and deadline is larger than t. These included jobs \({\tau _{{i,j}}}\in {\widehat{T}}\) with \((t - \phi _{i}) / T_{i} + 1< j < (t + D_{i} - \phi _{i}) / T_{i} + 1\) have execution times \({\widehat{{\mathcal {C}}}}_{i} = {\mathcal {C}}_{i}\) for both lo and hi criticality jobs; i.e. for lo jobs the execution times are not degraded, and for hi jobs they may or may not overrun their \(C_{i}^{{\text {thr}}}\) threshold.

  4.

    In addition, for each hi-criticality mode starting time t, \(0 \le t < {\textsc {hp}}\), we introduce the initial backlog \({\widehat{{\mathcal {B}}}}_{i}(t)\) at time t for priority levels \(1 \le i \le N\). If an overrun cannot happen at time t, because no hi job released before t has its deadline after t, the initial backlog is as follows:

    $$\begin{aligned} {\mathbb {P}}({\widehat{{\mathcal {B}}}}_i(t) = u)&= {\left\{ \begin{array}{ll} {\mathbb {P}}({\overline{{\mathcal {B}}}}_i(t) = u) &{} u < B_{\max {}}\\ \sum _{v = B_{\max {}}}^{\infty }{\mathbb {P}}({\overline{{\mathcal {B}}}}_i(t) = v) &{} u = B_{\max {}}\\ 0 &{} u > B_{\max {}}\end{array}\right. } \end{aligned}$$

    where \({\overline{{\mathcal {B}}}}_{i}(t)\) denotes an upper bound on the ith priority backlog in the modified lo-criticality system according to Theorem 2. If an overrun can happen at time t, due to at least one hi job having its release time before t and its deadline after, then the initial backlog at time t is the following:

    $$\begin{aligned} {\mathbb {P}}({\widehat{{\mathcal {B}}}}_i(t) = u)&= {\left\{ \begin{array}{ll} {\mathbb {P}}({\overline{{\mathcal {B}}}}^\textsf {ov}_i(t) = u) &{} u < B_{\max {}}\\ \sum _{v = B_{\max {}}}^{\infty }{\mathbb {P}}({\overline{{\mathcal {B}}}}^\textsf {ov}_i(t) = v) &{} u = B_{\max {}}\\ 0 &{} u > B_{\max {}}\end{array}\right. } \end{aligned}$$

    where \({\overline{{\mathcal {B}}}}^{\textsf {ov}}_{i}(t)\) denotes an upper bound on the ith priority backlog in the modified lo-criticality system according to Theorem 2, but with the added condition that at least one of the released hi jobs whose deadline is after time t has overrun its threshold execution time \(C_{i}^{{\text {thr}}}\).

    Let us now describe how \({\overline{{\mathcal {B}}}}^{\textsf {ov}}_{i}(t)\) can be computed.

    To this end, we solve

    $$ {\mathbb {P}}({\overline{{\mathcal {B}}}}^{\textsf {no+ov}}_{i}(t) = u) = {\mathbb {P}}({\textsf {no}}) \cdot {\mathbb {P}}({\overline{{\mathcal {B}}}}_{i}(t) = u) + {\mathbb {P}}({\textsf {ov}}) \cdot {\mathbb {P}}({\overline{{\mathcal {B}}}}^{\textsf {ov}}_{i}(t) = u). $$

    Here, \({\overline{{\mathcal {B}}}}_{i}(t)\) denotes an upper bound on the ith priority backlog in the modified lo-criticality system according to Theorem 2. \({\overline{{\mathcal {B}}}}^{\textsf {no+ov}}_{i}(t)\) is also an upper bound on the ith priority backlog according to Theorem 2, but the system used for its computation is slightly modified. It is the lo-criticality system with the difference that hi jobs released before time t whose deadlines are after that time have no condition on whether they overrun their \(C_{i}^{{\text {thr}}}\) execution time or not; we use their normal execution times \({\mathcal {C}}_{i}\) in calculating the backlog. The probability that none of these hi jobs overrun their respective \(C_{i}^{{\text {thr}}}\) execution times is denoted \({\mathbb {P}}({\textsf {no}})\), while \({\mathbb {P}}({\textsf {ov}}) = 1 - {\mathbb {P}}({\textsf {no}})\) is the probability that at least one of these hi jobs overruns. \({\mathbb {P}}({\textsf {no}})\) is obtained directly from execution times of these hi jobs, \({\mathbb {P}}({\textsf {no}}) = \sum _{{\tau _{{i,j}}}\in S}{{\mathbb {P}}\left( {\mathcal {C}}_{i} > C^{{\text {thr}}}_{i}\right) }\), where \(S = \left\{ {\tau _{{i,j}}}\; | \; \chi _{i} = {{\textsc {hi}}}\; \wedge \; r_{{i,j}}\le t \; \wedge \; d_{{i,j}}> t \right\} \).

Condition 2 includes all jobs which are released during \({{\textsc {hi}}}\)-criticality mode, noting that lo jobs are degraded and hi jobs have \({\mathcal {C}}_{i}\) execution times. The third condition deals with carry-over jobs from \({{\textsc {lo}}}\)- to \({{\textsc {hi}}}\)-criticality mode, whose deadline misses have not yet been accounted for in the \({{\textsc {lo}}}\)-criticality mode analysis. Note that here the worst case comes from the assumption that all hi jobs may overrun. Finally, condition 4 includes the worst-case backlog at the starting time t: it is the backlog under the condition that an overrun of at least one hi job occurred, limited by the maximal backlog \(B_{\max {}}\). Simpler constructions of the worst-case task-set lead to large overestimations of the length and deadline miss probabilities of hi-criticality mode.
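The two operations used for the initial backlog in condition 4 can be sketched as follows. The sketch assumes discrete backlog distributions stored as dictionaries from backlog value to probability, and it takes \({\mathbb {P}}({\textsf {no}})\) as a given input; both function names are hypothetical. The first function moves all probability mass above \(B_{\max {}}\) onto \(B_{\max {}}\); the second solves the mixture equation for the overrun-conditioned backlog.

```python
def clip_backlog(pmf, b_max):
    """Move all probability mass at or above b_max onto b_max."""
    clipped = {}
    for value, p in pmf.items():
        key = min(value, b_max)
        clipped[key] = clipped.get(key, 0.0) + p
    return clipped


def overrun_conditioned_backlog(pmf_mixed, pmf_no_overrun, p_no):
    """Solve, for every backlog value u,
        P(B_mixed = u) = p_no * P(B = u) + (1 - p_no) * P(B_ov = u)
    for the overrun-conditioned distribution B_ov."""
    p_ov = 1.0 - p_no
    if p_ov <= 0.0:
        raise ValueError("the overrun probability must be positive")
    support = sorted(set(pmf_mixed) | set(pmf_no_overrun))
    return {
        u: (pmf_mixed.get(u, 0.0) - p_no * pmf_no_overrun.get(u, 0.0)) / p_ov
        for u in support
    }


# Illustrative distributions only.
pmf_no_overrun = {0: 0.5, 1: 0.5}        # backlog without any overrun
pmf_mixed = {0: 0.375, 1: 0.5, 3: 0.125}  # backlog with unconditioned HI jobs
pmf_ov = overrun_conditioned_backlog(pmf_mixed, pmf_no_overrun, p_no=0.75)
initial_backlog = clip_backlog(pmf_ov, b_max=2)
```

Note that the sketch does not guard against small negative values that could arise if the two input distributions are pessimistic bounds rather than exact distributions.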

Starting from the worst-case scenarios for the \({{\textsc {hi}}}\)-mode for each time instant t, \(0 \le t < {\textsc {hp}}\), we now evaluate each scenario and determine the corresponding worst-case durations as well as the deadline miss probabilities. To do this, we apply the results from Sect. 4 and use the function \(\texttt {bsse}\left( \widehat{{\mathcal {B}}}_{i}(t),\, {\widehat{\Pi }},\, i,\, t,\, u \right) \) to compute all relevant backlogs for the task sets from Definition 12. The successive computation of the backlogs stops when the system becomes idle for the first time: \(\widehat{{\mathcal {B}}}_{i}(t_{s}) = 0\) for all priority levels i. This time is an upper bound on the hi \(\rightarrow \) lo switching time. Using the response time analysis, see (10), we can finally determine all jobs that miss their deadline during the \({{\textsc {hi}}}\)-mode. Additionally, when calculating the deadline miss probabilities of hi carry-over jobs in the response time analysis, we substitute the execution time \({\widehat{{\mathcal {C}}}}_{i}\) of the carry-over job under analysis with the execution time conditioned on \({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}\), in order to get the deadline miss probability under the condition that the hi carry-over job overran its \(C_{i}^{{\text {thr}}}\) execution time threshold.

Lemma 9

The first time \(t_{\text {idle}}\) at which the execution of the task set \({\widehat{\Pi }}(t)\) from Definition 12 yields a system-level backlog of zero determines an upper bound \(\Delta ^{{{\textsc {hi}}}}_{\max {}}(t)\) on the duration of a \({{\textsc {hi}}}\)-criticality mode starting at time t relative to the beginning of any hyperperiod of the original task system \(\Pi \):

$$ \forall 0 \le t < {\textsc {hp}}\; : \, \Delta ^{{{\textsc {hi}}}}_{\max {}}(t) = t_{\text {idle}} - t. $$

Let us define the probability \(p_{{i,j}}(t)\) that some job \({\tau _{{i,j}}}\) of task set \({\widehat{\Pi }}(t)\) from Definition 12 misses its deadline in the time interval \([t, t+\Delta ^{{{\textsc {hi}}}}_{\max {}}(t)]\). Then \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}(t)\) is an upper bound on the probability that there is at least one deadline miss of any \(\chi \) critical job with \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\) within a \({{\textsc {hi}}}\)-criticality mode execution starting at time t relative to the beginning of any hyperperiod in the original task system \(\Pi \):

$$\begin{aligned} \forall 0 \le t < {\textsc {hp}}\; : \, {\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}(t)}&= \left[ \sum _{{\tau _{{i,j}}}\in S(t)} p_{{i,j}}(t) \right] ^{1} \\ S(t)&= \{ {\tau _{{i,j}}}\in {\widehat{\Pi }}_{i}(t) \; | \; \chi _{i} = \chi \}. \end{aligned}$$

Note, \([\ldots ]^{1}\) indicates the expression is limited to a maximum value of 1.

Proof

The main part of the proof is to show that the task set \({\widehat{\Pi }}(t)\) indeed defines a worst-case scenario in terms of duration and deadline miss probabilities, when the \({{\textsc {hi}}}\)-criticality mode starts at time t relative to the beginning of any hyperperiod. Note that the second condition in Definition 12 ensures that all jobs which are released during a \({{\textsc {hi}}}\)-criticality mode in the worst case are included in the \({{\textsc {hi}}}\)-criticality task set as well. Moreover, we consider the exact execution times for all of these jobs, namely the degraded execution times \({\widehat{{\mathcal {C}}}}_{i}\) which are not longer than \(C_{i}^{{\text {deg}}}\) for \({{\textsc {lo}}}\) criticality jobs, and \({\widehat{{\mathcal {C}}}}_{i} = {\mathcal {C}}_{i}\) for \({{\textsc {hi}}}\) criticality jobs. The third condition adds the worst-case carry-over jobs from \({{\textsc {lo}}}\)- to \({{\textsc {hi}}}\)-criticality mode whose deadline misses have not yet been accounted for in the \({{\textsc {lo}}}\)-mode analysis. All jobs that missed their deadline before the lo \(\rightarrow \) hi mode switch have been considered already in the \({{\textsc {lo}}}\)-mode analysis, but their possible backlog at t will be considered. Therefore, we just need to explicitly include jobs whose release time is before and whose deadline is after the lo \(\rightarrow \) hi mode switch. The corresponding execution times are taken as worst-case as well: for each carry-over hi job individually, we assume that it overruns its execution time threshold when calculating its deadline miss probability. Finally, we look at the worst-case backlog at the starting time t. It encompasses the remaining execution times of jobs that were released before t but have not yet finished. Due to the triggering condition of a mode switch, we assume the worst case that at least one hi job has overrun its \(C_{i}^{{\text {thr}}}\) execution time. Also according to the triggering conditions, the backlog is never larger than \(B_{\max {}}\) for all priority levels. Note that the backlog also contains jobs whose deadline is within the \({{\textsc {hi}}}\)-mode, i.e., the carry-over jobs which have been explicitly included as tasks.

In order to determine the upper bound on the deadline miss probability \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}(t)\) of any \(\chi \)-critical job we again do not assume independence of individual miss events and use the sum of the corresponding probabilities as an upper bound. \(\square \)

As a result of this Lemma we can determine the desired quantities, namely maximal duration and upper bound on deadline misses, for each time point t relative to the start of a hyperperiod. The computations are based on simple simulations of \({\textsc {hp}}\) executions of worst-case \({{\textsc {hi}}}\)-criticality mode scenarios. The simulation times are finite as long as there exists a finite time in \({\widehat{\Pi }}(t)\) at which the system becomes idle for the first time. The following Lemma leads to a necessary and sufficient condition.

Lemma 10

A set of finite bounds \(\Delta ^{{{\textsc {hi}}}}_{\max {}}(t)\) on the duration of \({{\textsc {hi}}}\)-criticality modes exists if and only if the maximal system utilization in hi-criticality mode in the original system is less than one.

Proof

Let us look at the modified task set \({\widehat{\Pi }}(t)\) starting at time t. If the maximal system utilization in hi-criticality mode is less than one, then the maximal system level backlog at time \(t+(n+1)\cdot {\textsc {hp}}\) is strictly smaller than the maximal system level backlog at time \(t+n\cdot {\textsc {hp}}\) for \(n>1\), because the arriving jobs in time interval \([t+n\cdot {\textsc {hp}}, t+ (n+1)\cdot {\textsc {hp}})\) are identical for all \(n>1\) and the additional accumulated computation time from all arriving jobs is less than the interval length \({\textsc {hp}}\). Therefore, a time instance will exist when the maximal system level backlog is zero and the system is idle. If the maximal system utilization in hi-criticality mode is greater than or equal to one, then the maximal system level backlog at time \(t+(n+1)\cdot {\textsc {hp}}\) could be equal to or greater than the maximal system level backlog at time \(t+n\cdot {\textsc {hp}}\). Therefore, in the worst case, the system level backlog never reaches zero and the hi-criticality mode could last forever. \(\square \)
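The condition of Lemma 10 is cheap to check before running the hi-mode simulations. The following sketch rests on an assumption about how the maximal hi-mode utilization is computed, since the definition is not restated in this section: hi tasks are taken to contribute \(C_{i}^{\max {}}/T_{i}\) and lo tasks their degraded \(C_{i}^{{\text {deg}}}/T_{i}\). The dictionary keys are hypothetical, and exact rational arithmetic is used to avoid rounding effects near the boundary value 1.

```python
from fractions import Fraction


def hi_mode_utilization(tasks):
    """Maximal utilization in HI mode under the stated assumption:
    HI tasks use their WCET c_max, LO tasks their degraded WCET c_deg."""
    total = Fraction(0)
    for task in tasks:
        wcet = task["c_max"] if task["crit"] == "HI" else task["c_deg"]
        total += Fraction(wcet, task["period"])
    return total


def hi_mode_duration_is_bounded(tasks):
    """Finite bounds on the HI-mode duration exist iff utilization < 1."""
    return hi_mode_utilization(tasks) < 1


# Illustrative task set with integer time units.
tasks = [
    {"crit": "HI", "c_max": 5, "period": 10},
    {"crit": "LO", "c_deg": 2, "period": 8},
]
utilization = hi_mode_utilization(tasks)
bounded = hi_mode_duration_is_bounded(tasks)
```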

Based on these results, we can now aggregate the computed quantities in order to determine the maximal duration \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) of a \({{\textsc {hi}}}\)-criticality mode execution, as well as the worst-case probability \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\) of at least one deadline miss of any \(\chi \) job during any \({{\textsc {hi}}}\)-criticality mode started within a hyperperiod, where \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\).

Theorem 7

\(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) is an upper bound on the maximal duration of any \({{\textsc {hi}}}\)-criticality mode in the original task system \(\Pi ,\) where

$$ \Delta ^{{{\textsc {hi}}}}_{\max {}}= \max _{0 \le t < {\textsc {hp}}} \Delta ^{{{\textsc {hi}}}}_{\max {}}(t). $$

\(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}\) is a bound on the worst-case probability of at least one deadline miss of any \(\chi \) critical job with \(\chi \in \{ {{\textsc {lo}}}, {{\textsc {hi}}}\}\) during any \({{\textsc {hi}}}\)-criticality mode started within a hyperperiod in the original task system, where

$$ \textsf {DMP}_{\chi }^{{{\textsc {hi}}}} = \left[ \sum _{0 \le t < {\textsc {hp}}} \textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t) \cdot \textsf {DMP}_{\chi }^{{{\textsc {hi}}}}(t) \right] ^{1} $$

with \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t)\) as determined in Theorem 6. Note, \([\ldots ]^{1}\) indicates the expression is limited to a maximum value of 1.

Proof

According to Lemma 9, \(\Delta ^{{{\textsc {hi}}}}_{\max {}}(t)\) is an upper bound on the duration of a \({{\textsc {hi}}}\)-criticality mode starting at relative time t within a hyperperiod. Clearly, the maximum over all relative time instances provides the maximal duration for any time instance. The probability of a deadline miss within a \({{\textsc {hi}}}\)-mode execution is the probability of the union of deadline misses at any time instance within the hyperperiod. As we cannot assume independence, we upper bound this probability by the sum of individual probabilities. The probability of a deadline miss within a \({{\textsc {hi}}}\)-mode starting at relative time t is the probability that a mode switch happens at t, i.e., \(\textsc {P}_{{{\textsc {lo}}}\rightarrow {{\textsc {hi}}}}(t)\), times the probability that a deadline miss happens within the \({{\textsc {hi}}}\)-mode, i.e., \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}(t)\).

\(\square \)
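The aggregation of Theorem 7 can be sketched as follows, assuming the per-instant quantities from Theorem 6 and Lemma 9 are available as lists indexed by the relative starting time t. The function name and the input values are chosen for illustration only.

```python
def aggregate_hi_mode(p_switch, duration, dmp_hi):
    """Return (delta_hi_max, dmp) where
    delta_hi_max = max over t of the HI-mode duration bound, and
    dmp = min(1, sum over t of P_lo->hi(t) * DMP_hi(t))."""
    delta_hi_max = max(duration)
    weighted = sum(p * d for p, d in zip(p_switch, dmp_hi))
    return delta_hi_max, min(1.0, weighted)


# Illustrative values for a hyperperiod of length 4.
p_switch = [0.0, 0.25, 0.5, 0.0]   # P_lo->hi(t)
duration = [3, 7, 5, 2]            # duration bound per starting time t
dmp_hi = [1.0, 0.5, 0.25, 0.0]     # DMP_hi(t) for one criticality level
delta_hi_max, dmp = aggregate_hi_mode(p_switch, duration, dmp_hi)
```

In practice the aggregation would be evaluated once per criticality level \(\chi \), with the corresponding list of \(\textsf {DMP}_{\chi }^{{{\textsc {hi}}}}(t)\) values.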

This concludes the schedulability analysis of probabilistic Mixed-Criticality Systems according to Definition 5, as all required quantities for Theorems 4 and 5 have been determined in Sects. 5.3 and 5.4.

Of course, the tightness of the analysis can be improved through various approaches. Some of them as well as limitations of the described analysis are noted in the conclusion.

Experimental results

In order to illustrate our probabilistic Mixed Criticality (pMC) schedulability analysis, this section first shows one sample task-set. The sample task-set is inspired by applications from the avionics industry. Then, experiments on randomly generated task-sets are used to compare pMC scheduling with other schemes: a probabilistic but non-Mixed Criticality scheme ‘Probabilistic Deadline Monotonic Priority Ordering’ pDMPO, the deterministic ‘Adaptive Mixed Criticality’ scheme (AMC), and a deterministic non-MC ‘Deadline Monotonic Priority Ordering’ scheme. These are all listed in Table 2, and described in detail below. For the experiments, we generated randomized task-sets with all but one parameter the same, in order to see the effect this one parameter has.

Table 2 Scheduling schemes used throughout Sect. 6

Three experiments are conducted. The first experiment serves to show the impact of the system utilization, the second experiment varies the probability each hi task overruns its \(C_{i}^{{\text {thr}}}\) execution time threshold \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\), and finally the impact of the maximal system-level backlog is visualized in the third experiment. In general, we show that pMC dominates all other schemes, except in situations when hi-criticality mode is entered too often. In these cases, we find that there is too much degradation of lo jobs, therefore scheduling using the probabilistic but non-Mixed Criticality pDMPO yields better results.

Baseline schemes To evaluate pMC scheduling, we have used three deterministic and one probabilistic baseline scheme, as listed in Table 2. All schemes are based on fixed-priority preemptive scheduling. The first deterministic scheme is a non-Mixed Criticality one, Deadline Monotonic Priority Ordering (DMPO). As the name suggests, tasks are prioritized only by their deadlines, and scheduled according to their \(C_{i}^{\max {}}\) WCETs.

The next scheme is Adaptive Mixed Criticality (AMC), as described by Baruah et al. (2011). The scheme features two modes of operation. The system starts in lo-criticality mode where hi tasks are scheduled according to their \(C_{i}^{{\text {thr}}}\) threshold execution times. If any hi job overruns this value, a switch to hi-criticality mode happens, where all lo tasks are released in degraded mode. The scheme does not quantify the duration of these two modes, only the schedulability of them.

As a further deterministic baseline we introduce the UB-HL bound (Baruah et al. 2011). The bound is a necessary test for all fixed priority preemptive MC schemes, and as such it provides an upper bound on the performance of all fixed priority preemptive deterministic MC schemes.

Finally, the Probabilistic Deadline Monotonic Priority Ordering (pDMPO) scheme represents the analysis as introduced by Díaz et al. (2002). In pDMPO, tasks are given priorities based on their deadlines, they are scheduled using their complete \({\mathcal {C}}_{i}\) execution times, and there is only one mode of operation. The scheme can be viewed as a border case of pMC, where hi-criticality mode is never entered.

Task Execution Times To model task execution times \({\mathcal {C}}_{i}\), Weibull distributions were used, with a condition that they do not take values greater than the task’s WCET \(C_{i}^{\max {}}\). These distributions have been used in related work for modeling the distribution of long but unlikely execution times (Cucu-Grosjean et al. 2012).

Weibull distributions are parameterized by a shape parameter k and a scale parameter \(\lambda \). To generate an execution time distribution, we first choose k uniformly from [1.5, 3]. Then, the parameter \(\lambda \) is computed in the following way. For lo tasks, \(\lambda \) is chosen such that the cumulative distribution function at the task’s WCET \(C_{i}^{\max {}}\) is \(1 - 10^{-8}\). Similarly, for hi tasks, we choose \(\lambda \) so that the cumulative distribution function at the task’s execution time threshold \(C_{i}^{{\text {thr}}}\) is \(1 - 10^{-8}\), unless stated otherwise. This is how we set the probability that a hi task overruns its threshold execution time. Finally, all values of the probability density function above \(C_{i}^{\max {}}\) are set to 0, and the rest of the distribution is normalized. This way, we obtain a valid execution time modeled by a Weibull distribution, with the condition that it never exceeds the task’s WCET \(C_{i}^{\max {}}\), and for which the probability that a hi task overruns its execution time threshold \(C_{i}^{{\text {thr}}}\) is set as desired.
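The generation procedure can be sketched as follows. Setting the Weibull cumulative distribution function \(1 - e^{-(x/\lambda )^{k}}\) equal to \(1 - 10^{-8}\) at a reference value c gives \(\lambda = c / (-\ln 10^{-8})^{1/k}\). The discretization onto integer time units and all function names are assumptions of this sketch, and k is fixed here instead of being drawn uniformly from [1.5, 3] so that the example is reproducible.

```python
import math


def weibull_cdf(x, k, lam):
    """CDF of a Weibull distribution with shape k and scale lam."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0


def weibull_scale(k, c_ref, tail=1e-8):
    """Scale lam such that the Weibull CDF at c_ref equals 1 - tail."""
    return c_ref / (-math.log(tail)) ** (1.0 / k)


def truncated_pmf(k, lam, c_max):
    """Discretize onto execution times 1..c_max (assumed integer time
    units), drop all mass above c_max, and renormalize.
    Entry c - 1 of the result is P(C = c)."""
    masses = [
        weibull_cdf(c, k, lam) - weibull_cdf(c - 1, k, lam)
        for c in range(1, c_max + 1)
    ]
    total = sum(masses)
    return [m / total for m in masses]


# Illustrative HI task: threshold 40, WCET 100 (2.5 times the threshold).
k = 2.0
c_thr, c_max = 40, 100
lam = weibull_scale(k, c_thr)
pmf = truncated_pmf(k, lam, c_max)
p_overrun = sum(pmf[c_thr:])  # P(C > c_thr), close to the target 1e-8
```

For a lo task, the same functions would be used with `c_ref` set to the task's WCET instead of the threshold.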

Sample system

Here we introduce a task-set modelling a sample system, to which we applied our proposed schedulability analysis. We explored the task-set, first by varying execution times of all tasks, and then by varying deadlines. This was done to illustrate probabilistic Mixed Criticality scheduling. We present the three schedulability values: \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\), \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), and \(\textsf {PDJ}_{{\text {deg}}}\), and we also show the expected duration of lo-criticality mode \(\Delta ^{{{\textsc {lo}}}}_{\textit{exp}}\), and the maximal duration of hi-criticality mode \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\).

The system’s lo and hi tasks are inspired by the ROSACE (Pagetti et al. 2014) and FMS (Durrieu et al. 2014) applications, respectively. The hi tasks are modeled on an industrial implementation of the flight management system (FMS). This application consists of one task which reads sensor data, and four tasks that compute the location of the aircraft. For the lo tasks, the open source avionic benchmark ROSACE was modeled. It is made up of three tasks which simulate the pilot’s instructions, and eight tasks implementing a controller.

Setup Table 3 lists the tasks’ periods and execution time values: worst-case execution times (WCETs) \(C_{i}^{\max {}}\), thresholds for hi tasks \(C_{i}^{{\text {thr}}}\), and degraded WCETs \(C^{{\text {deg}}}_{i}\) for lo tasks. Execution time values are functions of the parameter \(f_c\), which we vary from 0.05 to 7.5 in 0.05 steps. Note that for hi tasks, \(C_{i}^{\max {}}\) values are 2.5 times larger than the corresponding \(C_{i}^{{\text {thr}}}\), while for lo tasks the worst-case execution time in degraded mode is \(C^{{\text {deg}}}_{i} = 0.33 \cdot C_{i}^{\max {}}\), rounded up to the nearest integer. The deadline of each task has been constrained by a factor of \(f_{d}\), \(D_{i} = T_{i} \cdot f_{d}\), where \(f_{d}\) is varied from 0.005 to 1 in steps of 0.005. Next, initial phases for tasks are 0, while tasks’ priority assignments are given in the table. Note that we use deadline monotonic priority assignment.

Table 3 The sample system’s parameters

We model probabilistic execution times of tasks with Weibull distributions, as described in the beginning of this section. The probability that a hi task executes for longer than its threshold execution time \(C_{i}^{{\text {thr}}}\) is \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) = 10^{-8}\), for every hi task. For the maximal system-level backlog, we used \(B_{\max {}} = 5\,\)ms. The hyperperiod lasts for \(60\,\)ms, and inside one there are 500 lo jobs and 19 hi jobs. Regardless of the parameter \(f_c\), the utilization of lo tasks is 5.73 times higher than the utilization of hi tasks.

In Fig. 2, the two left plots show results when deadlines are fixed (\(f_{d} = 1\)) and the execution time values from Table 3 are varied with \(f_{c} \in (0, 7.5]\). The two right plots of Fig. 2 show results when deadlines are varied with \(f_{d} \in (0, 1]\) and all execution time values are fixed (\(f_{c} = 2\)).

Fig. 2 Metrics characterizing the sample task-set. Left: with fixed deadlines \(f_{d}=1\) but varying utilization. Right: with fixed utilization \(f_{c} = 2\) but scaled deadlines

Results As expected, the deadline miss probability per hour for both hi and lo jobs, \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), increases as the utilization increases, or as the deadlines become more constrained. In this example, \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) is larger than \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\), even though hi criticality tasks have the lowest priority. This is mainly because there are more lo than hi jobs, i.e. 500 versus 19 jobs per hyperperiod. As for the probability that a lo job is released in degraded mode, \(\textsf {PDJ}_{{\text {deg}}}\), we notice it follows a similar trend. In this experiment, the value never goes to zero, because there is always a non-zero probability a lo \(\rightarrow \) hi criticality mode switch occurs.

In the bottom right plot of Fig. 2, the expected duration of lo-mode is shown to resemble the inverse of \(\textsf {PDJ}_{{\text {deg}}}\). Except when the deadlines are very constrained (\(f_{d} < 0.12\)), lo-criticality mode lasts for an expected \(\Delta ^{{{\textsc {lo}}}}_{\textit{exp}}= 88\,\)h before a trigger event occurs. The maximal duration of hi-criticality mode \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) depends only on the system utilization. This is shown in the bottom left plot as a function of \(f_{c}\). The value is \(1.1\,\)ms for \(f_{c} = 2\), and \(21.7\,\)ms for \(f_{c} = 7.5\). Both values are smaller than \(\Delta ^{{{\textsc {lo}}}}_{\textit{exp}}\) by orders of magnitude.
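As a rough check of this statement, using only the values quoted above and taking the \(88\,\)h figure (reported for \(f_{c} = 2\)) as the reference in both cases:

\[ 88\,\text {h} = 88 \cdot 3600 \cdot 10^{3}\,\text {ms} \approx 3.17 \cdot 10^{8}\,\text {ms}, \qquad \frac{3.17 \cdot 10^{8}\,\text {ms}}{1.1\,\text {ms}} \approx 2.9 \cdot 10^{8}, \qquad \frac{3.17 \cdot 10^{8}\,\text {ms}}{21.7\,\text {ms}} \approx 1.5 \cdot 10^{7}. \]

Under this assumption, the maximal duration of hi-criticality mode is roughly seven to eight orders of magnitude shorter than the expected duration of lo-criticality mode.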

Randomized systems

We now present three further experiments. They demonstrate the impact of three design parameters on schedulability: the system utilization, the probability that a hi task overruns its execution time threshold \(C_{i}^{{\text {thr}}}\), and the choice of the maximal system-level backlog.

More specifically, the first experiment shows whether task-sets of different system utilizations are \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}})\)-schedulable using probabilistic Mixed Criticality (pMC) scheduling, as well as other scheduling schemes.

The second and third experiments compare pMC with the probabilistic but non-MC scheme pDMPO. They demonstrate that pMC leads to improved schedulability, except when hi-criticality mode is entered too often, either because of the first or the third mode switch trigger, respectively.

For all three experiments, tasks were randomly generated as described below.

Task-Set Generation For each of the three experiments presented, the UUnifast-Discard algorithm (Davis and Burns 2011) was used to randomly generate task-sets, with the following parameters we found reasonable.

  • First, periods and maximal execution times in lo-criticality mode (\(C_{i}^{{\text {thr}}}\) values for hi tasks and \(C_{i}^{\max {}}\) for lo tasks) were generated by the UUnifast algorithm. Periods were chosen from the set \(\{50\,\upmu \)s, \(100\,\upmu \)s, \(200\,\upmu \)s, \(250\,\upmu \)s, \(500\,\upmu \)s, \(1000\,\upmu \)s\(\}\).

  • All initial phases were set to 0, and tasks’ deadlines are equal to their period.

  • Then, every task’s criticality is assigned to be hi with a probability of 0.5 (i.e. parameter \(CP = 0.5\)).

  • For every hi task, the worst case execution time (WCET) \(C_{i}^{\max {}}\) is a fixed multiplier of the corresponding threshold \(C_{i}^{{\text {thr}}}\), \(C_{i}^{\max {}} = 1.5 \cdot C_{i}^{{\text {thr}}}\) (i.e. parameter \(CF = 1.5\)). For lo tasks, their degraded WCET is set to be a third of their actual WCET, \(C_{i}^{{\text {deg}}} = 0.33 \cdot C_{i}^{\max {}}\).

  • To model task execution times \({\mathcal {C}}_{i}\), we have used Weibull distributions as explained at the beginning of this Section. The probability each hi job \({\tau _{{i,j}}}\) overruns its execution time threshold is \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) = 10^{-8}\), unless stated otherwise.

  • The number of tasks per task-set is 60.

  • Finally, the maximum backlog \(B_{\max {}}\) is \(500\,\upmu \)s, unless stated otherwise.

For the system utilization and other details, we refer the reader to the setup section of each experiment.
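The generation steps listed above can be sketched as follows. This is a minimal illustration, not the code used for our experiments: the function and field names are hypothetical, periods are assumed to be drawn uniformly from the given set, execution times are kept as real numbers without rounding, and the target utilization is assumed to be defined over the maximal execution times in lo-criticality mode.

```python
import random

PERIODS_US = [50, 100, 200, 250, 500, 1000]


def uunifast(n, total_u, rng):
    """UUniFast: draw n task utilizations that sum to total_u."""
    utils, remaining = [], total_u
    for i in range(1, n):
        nxt = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - nxt)
        remaining = nxt
    utils.append(remaining)
    return utils


def uunifast_discard(n, total_u, rng, max_tries=10000):
    """UUniFast-Discard: redraw until no single utilization exceeds 1."""
    for _ in range(max_tries):
        utils = uunifast(n, total_u, rng)
        if all(u <= 1.0 for u in utils):
            return utils
    raise RuntimeError("no valid utilization vector found")


def generate_task_set(n=60, total_u=1.4, cp=0.5, cf=1.5, deg=0.33, seed=0):
    """Generate one task-set; all times are in microseconds."""
    rng = random.Random(seed)
    tasks = []
    for u in uunifast_discard(n, total_u, rng):
        period = rng.choice(PERIODS_US)   # assumed uniform choice
        c_lo = u * period                 # maximal execution time in lo-mode
        task = {"T": period, "D": period, "phase": 0}
        if rng.random() < cp:             # hi with probability CP
            task.update(crit="hi", C_thr=c_lo, C_max=cf * c_lo)
        else:                             # lo task with degraded WCET
            task.update(crit="lo", C_max=c_lo, C_deg=deg * c_lo)
        tasks.append(task)
    return tasks


tasks = generate_task_set(n=60, total_u=1.4, seed=1)
```

Each generated task would then receive a truncated Weibull execution time distribution and a deadline monotonic priority, which the sketch leaves out.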

Priority Assignment For the probabilistic scheduling schemes pMC and pDMPO, we have used deadline monotonic priority assignment. Note that Maxim et al. (2011) show that this assignment is in general not optimal for probabilistic systems; they suggest Audsley’s priority assignment algorithm instead. For the deterministic scheduling schemes, AMC uses Audsley’s priority assignment, which is optimal for this scheme, while DMPO by definition uses deadline monotonic priorities.

‘Utilization’ experiment

In this first experiment, we examine the schedulability of systems with various system utilizations. More precisely, we check whether randomly generated systems of utilization 0.1 through 2.0 are \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}}) = (10^{-8}, 10^{-6}, 10^{-5})\)-schedulable under probabilistic Mixed Criticality (pMC) scheduling, under a probabilistic but non-MC scheme (pDMPO), as well as under deterministic baseline schemes: DMPO, AMC, and UB-HL. We also examine the values relevant to pMC scheduling as functions of maximum system utilization: the probability of deadline miss per hour for hi or lo jobs \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), and the probability of degraded lo jobs \(\textsf {PDJ}_{{\text {deg}}}\).

Setup We ranged the system utilization from 0.1 to 2.0 with 0.1 steps, and for each step we created 1000 task-sets according to the previously given description. To reiterate, the following parameters were used: the ratio between the WCET \(C_{i}^{\max {}}\) and execution time threshold \(C_{i}^{{\text {thr}}}\) for every hi task is \(CF = C_{i}^{\max {}} / C_{i}^{{\text {thr}}} = 1.5\), the probability each task is assigned hi criticality is \(CP = 0.5\), the probability a hi job overruns its execution time threshold \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) = 10^{-8}\), the degradation of lo tasks is \(C_{i}^{{\text {deg}}} = 0.33 \cdot C_{i}^{\max {}}\), there are 60 tasks in each task-set, and the maximal system-level backlog is \(B_{\max {}} = 500\,\upmu \)s.

Tasks’ execution times \({\mathcal {C}}_{i}\) depend on the utilization and task-set in question. We found the mean of the execution times to be between 2.84 and \(16.38\,\upmu \)s, with the maximal execution time \(C_{i}^{\max {}}\) among all tasks in a task-set being between 21 and \(387\,\upmu \)s.

Results Figure 3 presents the most important result of our experiments. For task-sets of different system utilizations, the \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulability under various scheduling schemes is given in Fig. 3 Top. To understand better how utilization impacts pMC schedulability, Figure 3 Middle and Bottom show statistics on the \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\), \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) and \(\textsf {PDJ}_{{\text {deg}}}\) metrics. The box-plots visualize the 10th, 25th, 50th, 75th, and 90th percentile of each metric.

Fig. 3 The \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulability of task-sets, as a function of utilization under pMC and other schemes (Top), and the impact utilization has on \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) (Middle) and \(\textsf {PDJ}_{{\text {deg}}}\) (Bottom)

Regarding the three deterministic schemes, we see that they perform similarly to what is reported in related work, for example (Baruah et al. 2011). Recall that for deterministic schemes, a task-set is either ‘completely’ schedulable or it is not, as there is no notion of probabilities.

In Fig. 3 Top, we can see that deadline monotonic priority ordering (DMPO) has the lowest schedulability among all tested schemes. This is because DMPO attempts to schedule a task-set using only WCET (\(C^{\max {}}_{i}\)) values. The adaptive Mixed Criticality (AMC) scheme performs better, as it performs a lo \(\rightarrow \) hi mode switch every time hi jobs need more execution time. Still, the schedulability of deterministic fixed priority preemptive schemes is upper-bounded by the UB-HL bound.

For the probabilistic schemes pDMPO and pMC, we can confirm that they outperform deterministic schemes. Probabilistic schemes allow a system with a utilization greater than one to be schedulable, because they take into account the low probability that a long execution time is observed. Let us first focus on probabilistic deadline monotonic priority ordering (pDMPO). We understand from Díaz et al. (2002) that deadline misses under pDMPO happen when the backlog is large, i.e. when one or more jobs take a long time to execute. The bigger the utilization is, the likelier it is that the backlog is large.
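To illustrate why a larger utilization makes a large backlog likelier, the following sketch iterates a heavily simplified backlog recursion, B' = max(0, B + C - T), for a single periodic task with a discrete execution time distribution. It is not the full analysis of Díaz et al. (2002), which handles multiple tasks and priorities; the function names and the two example distributions are assumptions chosen for illustration.

```python
def convolve(a, b):
    """Distribution of the sum of two independent discrete variables.

    Lists are indexed by value: a[v] = P(A = v).
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def next_backlog(backlog, exec_pmf, period):
    """Distribution of max(0, B + C - T) at the next release."""
    work = convolve(backlog, exec_pmf)
    # All outcomes with B + C <= T leave no backlog behind.
    out = [sum(work[: period + 1])]
    out.extend(work[period + 1:])
    return out


def backlog_after(exec_pmf, period, releases):
    """Backlog distribution after a number of releases, starting empty."""
    backlog = [1.0]
    for _ in range(releases):
        backlog = next_backlog(backlog, exec_pmf, period)
    return backlog


def tail(pmf, b):
    """P(backlog >= b)."""
    return sum(pmf[b:])


# Period T = 2; execution time is 1 or 3 time units.
low = backlog_after([0.0, 0.9, 0.0, 0.1], period=2, releases=50)   # utilization 0.6
high = backlog_after([0.0, 0.5, 0.0, 0.5], period=2, releases=50)  # utilization 1.0
```

Comparing `tail(low, 2)` with `tail(high, 2)` shows that the higher-utilization example carries far more probability mass at large backlog values.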

As for probabilistic Mixed Criticality (pMC), it features three lo \(\rightarrow \) hi mode switch triggers. All three triggers are indicators that the backlog is large: the first trigger activates when a hi job executes for a long time, the second trigger indicates that a hi job missed its deadline due to a large backlog blocking its execution, and finally the third trigger explicitly notes that the system-level backlog is too large. After detecting these high-backlog situations, the system under pMC transitions to hi-criticality mode where lo jobs are degraded, and thus the backlog is decreased. This ensures that deadline miss probabilities of both lo and hi tasks are reduced, at the cost of having some lo jobs released in degraded mode. Most importantly, this is demonstrated in Fig. 3 Top, where pMC outperforms pDMPO as well as all other schemes. Furthermore, in Fig. 3 Middle, we see how both \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) increase gradually as the utilization increases. The small difference between \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) comes from the fact that the system switches to hi-criticality mode whenever a hi job overruns its \(C_{i}^{{\text {thr}}}\) threshold, which helps hi jobs meet their deadlines. Finally, Fig. 3 Bottom shows the probability that a lo job is released with degradation. Its slight increase with utilization is a sign of being in hi-criticality mode more often, and it quantifies the cost of probabilistic Mixed Criticality scheduling.
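The three lo \(\rightarrow \) hi mode switch triggers described above can be summarized as a simple predicate. The sketch below is only an illustration: the function name, its arguments, and the order in which the triggers are checked are assumptions, not part of the scheme’s definition.

```python
def lo_to_hi_trigger(hi_exec_time, c_thr, hi_missed_deadline, backlog, b_max):
    """Return which trigger (if any) fires for the observed state.

    hi_exec_time:       time a hi job has executed so far
    c_thr:              that job's execution time threshold
    hi_missed_deadline: True if a hi job missed its deadline
    backlog:            current system-level backlog
    b_max:              maximal system-level backlog
    """
    if hi_exec_time > c_thr:
        return "overrun"        # first trigger: hi job exceeds its threshold
    if hi_missed_deadline:
        return "deadline_miss"  # second trigger: hi job missed its deadline
    if backlog > b_max:
        return "backlog"        # third trigger: system-level backlog too large
    return None                 # stay in lo-criticality mode


# Hypothetical observations, times in microseconds.
print(lo_to_hi_trigger(12, 10, False, 100, 500))
print(lo_to_hi_trigger(5, 10, False, 600, 500))
print(lo_to_hi_trigger(5, 10, False, 100, 500))
```

Whenever the function returns something other than `None`, the system would enter hi-criticality mode, degrade lo jobs, and stay in that mode until it becomes idle.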

‘Execution Threshold’ experiment

In this experiment, we varied a design parameter relating to tasks’ execution times \({\mathcal {C}}_{i}\): the probability that a hi job overruns its execution time threshold \(C_{i}^{{\text {thr}}}\). We then inspected how this impacts schedulability under probabilistic Mixed Criticality (pMC) and the probabilistic non-Mixed Criticality pDMPO scheme. Because we used a utilization of 1.4, deterministic schemes could not schedule any task-sets. The probability that each hi job \({\tau _{{i,j}}}\) overruns its execution time threshold, \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\), ranges from \(5\cdot 10^{-12}\) to \(10^{-4}\). Ultimately, this experiment demonstrates that probabilistic Mixed Criticality scheduling makes sense if hi-criticality mode is not entered too often, and it justifies the importance of the \(\textsf {PDJ}_{{\text {deg}}}\) metric.

Setup A total of 16 configurations, each with 1000 task-sets, were generated for this experiment. The configurations have the same parameters, except for the probability each hi job \({\tau _{{i,j}}}\) overruns its execution time threshold \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\). The following values for \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\) were used: \(\{5\cdot 10^{-12},\) \(10^{-11},\) \(5\cdot 10^{-11},\) ..., \(10^{-4}\}\). Besides this, the system utilization for all configurations is 1.4, while all other parameters are according to the description mentioned at the beginning of Sect. 6.2.

Regardless of the fact that \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\) is varied by 8 orders of magnitude, we found that the mean execution time per configuration changes little. It is between 8.69 and \(8.70\,\upmu \)s. Among all tasks in every task-set, the worst case execution time \(C_{i}^{\max {}}\) is \(287\,\upmu \)s.

Results The results of this experiment are shown in Fig. 4. In the top figure, we present \((10^{-8}, 10^{-6}, 10^{-5})\)- and \((10^{-8}, 10^{-6}, 1)\)-schedulability under pMC, as well as \((10^{-8}, 10^{-6}, -)\)-schedulability under pDMPO. Since by definition \(\textsf {PDJ}_{{\text {deg}}} \le 1\), we can interpret \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}}=1)\)-schedulability under pMC as a schedulability test which ignores the \(\textsf {PDJ}_{{\text {deg}}}\) metric. In the middle and bottom figures, the box-plots visualize the 10th, 25th, 50th, 75th, and 90th percentile of each metric.
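The role of \(\sigma _{{\text {deg}}}\) in the test can be made concrete with a small sketch. The class and function names are hypothetical, the comparisons are assumed to be non-strict, and the metric values in the example are illustrative rather than measured.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    dmp_hi: float   # probability of deadline miss per hour, hi jobs
    dmp_lo: float   # probability of deadline miss per hour, lo jobs
    pdj_deg: float  # probability a lo job is released in degraded mode


def is_schedulable(m, sigma_hi, sigma_lo, sigma_deg):
    """Check all three metrics against their limits (assumed non-strict)."""
    return (m.dmp_hi <= sigma_hi
            and m.dmp_lo <= sigma_lo
            and m.pdj_deg <= sigma_deg)


# Illustrative task-set with good deadline miss probabilities
# but frequent degradation of lo jobs.
m = Metrics(dmp_hi=1e-9, dmp_lo=1e-7, pdj_deg=0.046)
strict = is_schedulable(m, 1e-8, 1e-6, 1e-5)   # degradation is bounded
relaxed = is_schedulable(m, 1e-8, 1e-6, 1.0)   # degradation is ignored
```

With \(\sigma _{{\text {deg}}} = 1\), the third condition holds for any valid probability, so only the two deadline miss probabilities decide the outcome.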

Fig. 4 \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulability of task-sets under pMC and pDMPO, and \((10^{-8}, 10^{-6}, 1)\)-schedulability under pMC, for various probabilities that a hi job overruns its execution time threshold \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\) (Top), and the impact this value has on \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) (Middle) and \(\textsf {PDJ}_{{\text {deg}}}\) (Bottom)

In Fig. 4 Top, let us first focus on comparing pDMPO and pMC when \(\sigma _{{\text {deg}}} = 1\). In this case, when the \(\textsf {PDJ}_{{\text {deg}}}\) metric is ignored, we see that more task-sets are always schedulable under pMC than under pDMPO. The reasons pMC scheduling is better in this case are the same reasons as in the ‘utilization’ experiment: by switching to hi-criticality mode after certain triggering events, the system under pMC scheduling reduces the backlog in these situations, which ultimately makes deadline misses less likely.

Now, let us examine pMC with a realistic \(\textsf {PDJ}_{{\text {deg}}}\) bound, i.e. \(\sigma _{{\text {deg}}} = 10^{-5}\). As shown in the top figure, there exists a limit beyond which pMC scheduling is not useful at all, as it leads to too much degradation. This can be understood by viewing Fig. 4 Bottom, where we see the cost of switching to hi-mode. In one extreme case, when \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) = 10^{-4}\), the system switches to hi-mode often, on average once every \(48.93\,\)ms (not shown in figure). Then, an average ratio of 0.046 of lo jobs are released in degraded mode. In a moderate case, for \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) = 10^{-8}\), hi jobs overrun their execution time threshold \(C_{i}^{{\text {thr}}}\) less often, and lo-mode lasts on average \(8.34\,\)min. Here, an average ratio of \(4.19\cdot 10^{-6}\) of lo jobs are degraded. Finally, in the other extreme case, when \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}}) = 5\cdot 10^{-12}\), lo-mode lasts for \(278.00\,\)h on average, and only a tiny fraction of \(2.09\cdot 10^{-9}\) of lo jobs are released in degraded mode. For many realistic applications, there exists a limit on the degradation which can be tolerated before a complete loss of function happens. Thus we argue that this experiment demonstrates why the \(\textsf {PDJ}_{{\text {deg}}}\) metric is crucial for probabilistic Mixed Criticality scheduling.

Finally, let us comment on \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\), found in Fig. 4 Middle. These are similar, except that \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) is larger for higher \({\mathbb {P}}({\mathcal {C}}_{i} > C_{i}^{{\text {thr}}})\) values. We have found that this increase in \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) appears as a result of pessimistic assumptions introduced in Definition 12. We comment more on this pessimism in the next experiment.

‘Maximal Backlog’ experiment

In the final experiment on randomized systems, the maximum system-level backlog \(B_{\max {}}\) was varied. This affects how often hi-criticality mode is entered, while it has no effect on the lo-criticality mode. When the occurrence of hi-criticality mode is artificially increased, we can see the pessimism in the analysis of that mode—which we found mostly to be introduced by pessimistic assumptions on the initial conditions in hi-mode, as per Definition 12. As in the previous experiment, we tested the \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulability of task-sets under pMC and pDMPO, and \((10^{-8}, 10^{-6}, 1)\)-schedulability under pMC scheduling.

Setup For this experiment, we first generated 1000 task-sets with a system utilization of 1.2. This high utilization guarantees that no deterministic scheme can be used to schedule task-sets. All parameters except for the maximum system-level backlog are according to the description at the beginning of this section. Then, the maximum system-level backlog \(B_{\max {}}\) was varied from 40 to \(600\,\upmu \)s, and all of the 1000 task-sets are analyzed for every \(B_{\max {}}\) value. Each generated task-set has 60 tasks, the mean execution time among all tasks in every task-set is \(10.61\,\upmu \)s, while the maximum execution time overall is \(255\,\upmu \)s.

Results Figure 5 visualizes the results of this experiment. As done in the previous experiment, we conducted a \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulability test under pMC and pDMPO, as well as a schedulability test under pMC when the \(\textsf {PDJ}_{{\text {deg}}}\) metric is ignored (i.e. \( \sigma _{{\text {deg}}}=1\)). The box-plots visualize the 10th, 25th, 50th, 75th, and 90th percentile of each evaluated metric. By definition, the maximum system-level backlog \(B_{\max {}}\) does not impact scheduling under pDMPO at all, so the schedulability under this scheme is constant.

Fig. 5 \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulability of task-sets under pMC and pDMPO, and \((10^{-8}, 10^{-6}, 1)\)-schedulability under pMC, for various \(B_{\max {}}\) values (Top), and the impact this value has on \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\), and \(\textsf {PDJ}_{{\text {deg}}}\) (Bottom)

Regarding the impact on pMC scheduling, specifically on \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and on \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\), we see two cases. On the one hand, when the maximum system-level backlog \(B_{\max {}}\) is sufficiently large, i.e. \(\ge 200\,\upmu \)s, we see that it has a negligible impact on \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) values. On the other hand, when a small \(B_{\max {}}\) causes hi-mode to be entered often, \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\) both increase. Ideally, how often hi-mode is entered should not impact \(\textsf {DMP}_{{{\textsc {hi}}}}(1h)\) and \(\textsf {DMP}_{{{\textsc {lo}}}}(1h)\). The increase is a result of pessimism introduced in point 4 of Definition 12. As the reader recalls, there we make a pessimistic assumption that all hi jobs are overrunning their execution time thresholds \(C_{i}^{{\text {thr}}}\) at the time of the mode switch. This pessimistic assumption is mainly introduced to reduce the number of cases under which hi-criticality mode is analyzed.

The impact the backlog \(B_{\max {}}\) has on \(\textsf {PDJ}_{{\text {deg}}}\) is straightforward. As hi-mode is entered more often, \(\textsf {PDJ}_{{\text {deg}}}\) increases. Because of this increase, we find that few task-sets are \((10^{-8}, 10^{-6}, 10^{-5})\)-schedulable under pMC for \(B_{\max {}}\) values less than \(200\,\upmu \)s. We can therefore conclude that the pessimism of hi-criticality mode analysis does not play a major role in the schedulability analysis of task-sets under realistic requirements for the maximal permitted degradation of lo jobs \(\sigma _{{\text {deg}}}\). Finally, we observe again the main result from the ‘execution threshold’ experiment: probabilistic Mixed Criticality (pMC) scheduling is better than the non-MC scheme pDMPO, except when hi-criticality mode is entered too often.

Conclusion

Modeling tasks’ execution times with random variables in Vestal’s mixed-criticality model allows for a schedulability analysis based on the ‘probability of deadline miss per hour’. We presented a dual-criticality system which operates in either lo- or hi-criticality mode. In lo-criticality mode, both lo and hi jobs run normally, but certain optimism towards hi jobs exists: they are required not to overrun their \(C_{i}^{{\text {thr}}}\) execution time threshold, a value analogous to the optimistic WCET in Vestal’s model. hi-criticality mode is entered when a violation of this optimistic condition is detected, or when one of the following two events happens: a hi job misses its deadline, or the system-level backlog exceeds its maximal value. In this mode, lo jobs are degraded by having a shorter time budget for execution, so hi jobs have more resources available. This mode lasts until the system becomes idle.

To characterize such a system, we first defined \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}})\)-schedulability, which quantifies the soft schedulability of a probabilistic mixed-criticality system. The schedulability conditions determine whether the probability of deadline miss per hour for hi jobs, the probability of deadline miss per hour for lo jobs, and the probability that a lo job is started in its degraded mode are less than the given \((\sigma _{{{\textsc {hi}}}}, \sigma _{{{\textsc {lo}}}}, \sigma _{{\text {deg}}})\) limits.

Then, we presented an analysis approach. This was done by splitting the system into two—the lo- and the hi-criticality mode system—and combining the results. On one hand, a steady state analysis was carried out for lo-criticality mode, in which the system is expected to stay for a long time. This enabled us to pessimistically bound the deadline miss probability of each job, which we then used to find the probability that any job misses its deadline while in lo-mode in a certain time period. On the other hand, a simulation of the transient hi-criticality mode was used to bound its duration, and to obtain the probability of deadline miss of jobs inside it. This, together with the probability a lo \(\rightarrow \) hi mode switch happens, enabled us to find the probability any job misses its deadline while in hi-mode in a certain time period.

Finally, simulation results illustrate all of the metrics on a sample task-set, and experiments involving schedulability analysis show how various design choices impact schedulability. Here, we show how probabilistic Mixed Criticality scheduling compares to other schemes, and make a clear case that using pMC makes sense for most cases, except when the amount of lo job degradation is too high.

Limitations and Future Work Our analysis applies to fixed priority preemptive scheduling, but it could be extended to dynamic scheduling schemes as well. Probabilistic response-time calculus already exists for dynamic schemes (Díaz et al. 2002). In addition, dynamic-priority Mixed-Criticality schemes are found to be relevant (Baruah et al. 2011; Guo et al. 2015).

Regarding our proposed scheme, its main limitation is the pessimism of the analysis of hi-criticality mode. This pessimism is due to the fact that we have a single analysis whatever the reason for making the lo \(\rightarrow \) hi transition was.

In future work, it would be possible to do a less pessimistic analysis of hi-mode by deconstructing the analysis into three sub-classes, one for each lo \(\rightarrow \) hi mode switch reason. For example, if a mode switch was caused by a maximal system-level backlog exceedance, the initial backlog would surely be exactly \(B_{\max {}}\). If the mode switch was not caused by an overrunning job, there would be no need to assume that carry-over jobs of hi criticality surely overrun. If the mode switch was caused by an overrunning hi job, one could introduce cases depending on which job caused the mode switch.

The pessimism of the analysis for the lo-criticality mode could be reduced as well, but arguably this would bear less fruit. One idea here is to estimate the percentage of time a system spends in lo-criticality mode. In calculating \(\textsf {DMP}_{\chi }(T)\) in our work, we assumed the system is in lo-mode all the time. Replacing this assumption with a better estimate would bring improvement, however only for systems which spend a non-negligible amount of time in hi-criticality mode, which is usually not assumed to be the case. Another idea is to use a less pessimistic model of hi tasks in lo-mode, by modeling their executions with conditional ‘truncated’ execution times as is done in several related works (Draskovic et al. 2016; Maxim et al. 2017). However, this would require performing two lo-mode analyses: the one presented here would be used to calculate initial conditions in hi-mode, and the other with the less pessimistic model of hi tasks would be used to calculate deadline miss probabilities in lo-criticality mode.

Notes

  1. DO-178B was replaced by DO-178C in 2012.

References

  1. Rtca/do-178c software considerations in airborne systems and equipment certification (2012)

  2. Abdeddaïm Y, Maxim D (2017) Probabilistic schedulability analysis for fixed priority mixed criticality real-time systems. In: Design, automation and test in Europe conference and exhibition (DATE), 2017. IEEE, pp 596–601

  3. Alahmad B, Gopalakrishnan S (2016) A risk-constrained Markov decision process approach to scheduling mixed-criticality job sets. In: Workshop on mixed criticality systems (WMC 2016)

  4. Alahmad BN, Gopalakrishnan S (2018) Risk-aware scheduling of dual criticality job systems using demand distributions. Leibniz Trans Embed Syst 5(1):01-1


  5. Audsley NC, Burns A, Richardson MF, Wellings AJ (1991) Hard real-time scheduling: the deadline-monotonic approach. IFAC Proc Vol 24(2):127–132


  6. Baruah S, Fohler G (2011) Certification-cognizant time-triggered scheduling of mixed-criticality systems. In: 2011 IEEE 32nd real-time systems symposium (RTSS). IEEE, pp 3–12

  7. Baruah SK, Bonifaci V, D’Angelo G, Marchetti-Spaccamela A, Van Der Ster S, Stougie L (2011) Mixed-criticality scheduling of sporadic task systems. In: ESA. Springer, pp 555–566

  8. Baruah SK, Burns A, Davis RI (2011) Response-time analysis for mixed criticality systems. In: 2011 IEEE 32nd real-time systems symposium (RTSS). IEEE, pp 34–43

  9. Burns A, Davis RI (2017) A survey of research into mixed criticality systems. ACM Comput Surv (CSUR) 50(6):1–37


  10. Cucu-Grosjean L, Santinelli L, Houston M, Lo C, Vardanega T, Kosmidis L, Abella J, Mezzetti E, Quinones E, Cazorla FJ (2012) Measurement-based probabilistic timing analysis for multi-path programs. In: 2012 24th Euromicro conference on real-time systems (ECRTS). IEEE, pp 91–101

  11. Davis RI, Burns A (2011) Improved priority assignment for global fixed priority pre-emptive scheduling in multiprocessor real-time systems. Real Time Syst 47(1):1–40


  12. Davis RI, Burns A, Griffin D (2017) On the meaning of pWCET distributions and their use in schedulability analysis. In: Real-time scheduling open problems seminar at ECRTS

  13. Davis RI, Cucu-Grosjean L (2019) A survey of probabilistic schedulability analysis techniques for real-time systems. Leibniz Trans Embed Syst 6(1):1–53


  14. Davis RI, Cucu-Grosjean L (2019) A survey of probabilistic timing analysis techniques for real-time systems. Leibniz Trans Embed Syst 6(1):03-1


  15. Devgan A, Kashyap C (2003) Block-based static timing analysis with uncertainty. In: Proceedings of the 2003 IEEE/ACM international conference on computer-aided design. IEEE Computer Society, p 607

  16. Díaz JL, García DF, Kim K, Lee CG, Bello LL, López JM, Min SL, Mirabella O (2002) Stochastic analysis of periodic real-time systems. In: 23rd IEEE real-time systems symposium, 2002. RTSS 2002. IEEE, pp 289–300

  17. Díaz JL, López JM (2004) Safe extensions to the stochastic analysis of real-time systems. Technical report. Departamento de Informatica, University of Oviedo. http://www.atc.uniovi.es/research/SESARTS04.pdf. Accessed 1 Oct 2019

  18. Díaz JL, López JM, Garcia M, Campos AM, Kim K, Bello LL (2004) Pessimism in the stochastic analysis of real-time systems: concept and applications. In: 25th IEEE international real-time systems Symposium, 2004. Proceedings. IEEE, pp 197–207

  19. Draskovic S, Huang P, Thiele L (2016) On the safety of mixed-criticality scheduling. In: Workshop on mixed criticality systems (WMC 2016)

  20. Durrieu G, Faugere M, Girbal S, Pérez DG, Pagetti C, Puffitsch W (2014) Predictable flight management system implementation on a multicore processor. In: Embedded real time software (ERTS’14)

  21. Ekberg P, Yi W (2012) Bounding and shaping the demand of mixed-criticality sporadic tasks. In: 2012 24th Euromicro conference on real-time systems (ECRTS). IEEE, pp 135–144

  22. Ernst R, Di Natale M (2016) Mixed criticality systems—a history of misconceptions? IEEE Des Test 33(5):65–74

  23. Guo Z, Santinelli L, Yang K (2015) EDF schedulability analysis on mixed-criticality systems with permitted failure probability. In: 2015 IEEE 21st international conference on embedded and real-time computing systems and applications (RTCSA). IEEE, pp 187–196

  24. Huang P, Giannopoulou G, Stoimenov N, Thiele L (2014) Service adaptions for mixed-criticality systems. In: 2014 19th Asia and South Pacific design automation conference (ASP-DAC). IEEE, pp 125–130

  25. Küttler M, Roitzsch M, Hamann CJ, Volp M (2017) Probabilistic analysis of low-criticality execution. In: Workshop on mixed criticality systems (WMC 2017)

  26. López JM, Díaz JL, Entrialgo J, García D (2008) Stochastic analysis of real-time systems under preemptive priority-driven scheduling. Real Time Syst 40(2):180

  27. Masrur A (2016) A probabilistic scheduling framework for mixed-criticality systems. In: Proceedings of the 53rd annual design automation conference. ACM, p 132

  28. Maxim D, Buffet O, Santinelli L, Cucu-Grosjean L, Davis RI (2011) Optimal priority assignment algorithms for probabilistic real-time systems. In: RTNS, pp 129–138

  29. Maxim D, Davis RI, Cucu-Grosjean L, Easwaran A (2017) Probabilistic analysis for mixed criticality systems using fixed priority preemptive scheduling. In: Proceedings of the 25th international conference on real-time networks and systems. ACM, pp 237–246

  30. Pagetti C, Saussié D, Gratia R, Noulard E, Siron P (2014) The ROSACE case study: from simulink specification to multi/many-core execution. In: 2014 IEEE 20th real-time and embedded technology and applications symposium (RTAS). IEEE, pp 309–318

  31. Park T, Kim S (2011) Dynamic scheduling algorithm and its schedulability analysis for certifiable dual-criticality systems. In: Proceedings of the ninth ACM international conference on Embedded software. ACM, pp 253–262

  32. Santinelli L, George L (2015) Probabilities and mixed-criticalities: the probabilistic c-space. In: Proceedings of the workshop on mixed criticality systems (WMC 2015)

  33. Santinelli L, Guo Z (2018) A sensitivity analysis for mixed criticality: trading criticality with computational resource. In: 2018 IEEE 23rd international conference on emerging technologies and factory automation (ETFA), vol 1. IEEE, pp 313–320

  34. Santinelli L, Guo Z, George L (2016) Fault-aware sensitivity analysis for probabilistic real-time systems. In: 2016 IEEE international symposium on defect and fault tolerance in VLSI and nanotechnology systems (DFT), pp 69–74

  35. Tămaş-Selicean D, Pop P (2015) Design optimization of mixed-criticality real-time embedded systems. ACM Trans Embed Comput Syst 14(3):50

  36. Vestal S (2007) Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance. In: 28th IEEE international real-time systems symposium, 2007. RTSS 2007. IEEE, pp 239–243

Funding

Open Access funding provided by ETH Zurich.

Author information

Corresponding author

Correspondence to Stefan Draskovic.

Appendices

Appendix: Computational complexity of the analysis

Here we comment on the computational complexity of our proposed probabilistic Mixed Criticality (pMC) schedulability analysis. Algorithm A presents a high-level recapitulation of the analysis, where all pseudo-commands are as explained in Sect. 5.

The computational complexity of the analysis is \({\mathcal {O}}(n^{2} \cdot {\textsc {hp}}\cdot c\log {c})\), where n is the number of jobs in one hyperperiod, \({\textsc {hp}}\) is the length of one hyperperiod, and c is the length of the execution time distributions, i.e. the number of values they contain.

In the analysis, the most complex atomic command is the convolution. When using FFT, one convolution has a cost of \({\mathcal {O}}(c\log {c})\).
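To illustrate the convolution step, the following is a minimal sketch of an FFT-based convolution of two discrete distributions, written with the Python standard library only. The function names and the example distributions are illustrative assumptions and are not taken from the analysis itself; in practice a library routine such as `scipy.signal.fftconvolve` would normally be used instead of a hand-written FFT.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two.

    The inverse transform is returned unnormalized (the caller divides by n).
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def convolve_pmf(p, q):
    """Distribution of the sum of two independent discrete random variables.

    p[i] and q[j] are the probabilities of the values i and j (in time units).
    """
    size = len(p) + len(q) - 1           # support of the sum
    padded = 1
    while padded < size:                 # zero-pad to a power of two
        padded *= 2
    fp = fft(list(p) + [0.0] * (padded - len(p)))
    fq = fft(list(q) + [0.0] * (padded - len(q)))
    # Pointwise product of the spectra corresponds to linear convolution.
    raw = fft([a * b for a, b in zip(fp, fq)], inverse=True)
    # Normalize the inverse transform and clip round-off negatives.
    return [max(x.real / padded, 0.0) for x in raw[:size]]

# Hypothetical example: a backlog of 0 or 1 time units with equal probability,
# plus a job whose execution time is 1 (prob. 0.9) or 2 (prob. 0.1) time units.
backlog = [0.5, 0.5]
exec_time = [0.0, 0.9, 0.1]
total = convolve_pmf(backlog, exec_time)
```

For distributions of length c, the three transforms dominate the cost, which is where the \({\mathcal {O}}(c\log {c})\) term comes from.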

Let us now comment on the complexity of the analysis in detail. According to Sect. 4.1, the steady state backlog is approximated by \({\mathcal {B}}_{i}{(k\cdot {\textsc {hp}})}\), where k is the smallest natural number satisfying inequality (9). To calculate \({\mathcal {B}}_{i}{(k\cdot {\textsc {hp}})}\), a convolution is needed for every one of the \(n\cdot k\) jobs, thus the cost of line 2 is \({\mathcal {O}}( n \cdot k \cdot c\log {c})\). Similarly, according to point 4 of Definition 12, backlog \({\widehat{{\mathcal {B}}}}_{i}(t)\) is defined as a combination of two steady state backlogs, and the cost of line 14 is also \({\mathcal {O}}( n \cdot k \cdot c\log {c})\). The number k depends on the required numerical precision (9), but we have found it to be in the same order of magnitude as n, \(k\sim n\).

To compute deadline miss probabilities, i.e. in lines 4 and 17, response time analysis is used as defined by Algorithm 1. Line 6 is based on response time analysis as well (Lemma 8). To find the response time of a job, we need as many convolutions as there are jobs preempting that job. Thus, the cost of these lines is \({\mathcal {O}}( n \cdot c\log {c})\).

Finally, when analyzing hi-mode, the maximal duration of the mode \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) plays a role. When calculating \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) in line 15, and when computing deadline miss probabilities of jobs in lines 16 and 17, we need to take into account all jobs that are released in hi-mode. Regardless of when hi-mode is entered or exited, the number of these jobs is at most \(n \cdot \Delta ^{{{\textsc {hi}}}}_{\max {}}/ {\textsc {hp}}\). For schedulable systems, we found that \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\) is of the same order of magnitude as \({\textsc {hp}}\), i.e. \(\Delta ^{{{\textsc {hi}}}}_{\max {}}\sim {\textsc {hp}}\) and \(\Delta ^{{{\textsc {hi}}}}_{\max {}}/ {\textsc {hp}}\sim 1\).
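The per-line costs above can be combined into the overall bound as sketched below. The sketch treats the empirical observations \(k\sim n\) and \(\Delta ^{{{\textsc {hi}}}}_{\max {}}/ {\textsc {hp}}\sim 1\) as assumptions, and further assumes that the hi-mode analysis is repeated for \({\textsc {hp}}\) points in time, one per iteration of the for-loop in line 13.

```latex
\begin{align*}
\text{steady state backlog:}\quad
  & {\mathcal {O}}(n \cdot k \cdot c\log {c})
    \;\overset{k\sim n}{=}\; {\mathcal {O}}(n^{2} \cdot c\log {c}) \\
\text{jobs released in hi-mode:}\quad
  & n \cdot \frac{\Delta ^{{{\textsc {hi}}}}_{\max {}}}{{\textsc {hp}}}
    \cdot {\mathcal {O}}(n \cdot c\log {c})
    \;=\; {\mathcal {O}}(n^{2} \cdot c\log {c}) \\
\text{all iterations of line 13:}\quad
  & {\textsc {hp}} \cdot {\mathcal {O}}(n^{2} \cdot c\log {c})
    \;=\; {\mathcal {O}}(n^{2} \cdot {\textsc {hp}} \cdot c\log {c})
\end{align*}
```

The first two terms have the same order, so a single iteration of the hi-mode analysis costs \({\mathcal {O}}(n^{2} \cdot c\log {c})\), and the repetition over \({\textsc {hp}}\) points in time yields the stated total.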

[Algorithm A: high-level recapitulation of the pMC schedulability analysis]

Even though the computational complexity of this scheme is high, we find it to be acceptable. The analysis only needs to be done offline, while designing the system. Furthermore, parts of Algorithm A are parallelizable. Each iteration of the for-loop in line 13 can be run independently, meaning that the analysis of hi-mode can be done in parallel on \({\textsc {hp}}\) processes, each of complexity \({\mathcal {O}}( n^{2} \cdot c\log {c})\). Consequently, with unlimited resources, this would be the computational complexity of the whole algorithm.
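The independence of the loop iterations can be exploited with a standard process pool. The following is a minimal sketch of that pattern only: `analyze_switch_time` is a hypothetical placeholder for the body of the for-loop in line 13 and does not implement the analysis, and all function names are assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def analyze_switch_time(t):
    """Hypothetical stand-in for one iteration of the for-loop in line 13.

    A real implementation would compute the hi-mode backlog, the maximal
    hi-mode duration, and the deadline miss probabilities (lines 14 to 17).
    Here only a placeholder record is returned.
    """
    return t, {"switch_time": t}

def analyze_all_switch_times(hp, map_fn=map):
    """Run the per-time-point analysis for t = 0, ..., hp - 1.

    map_fn decides how iterations are executed: the built-in map runs them
    sequentially, while an executor's map distributes them over workers.
    """
    return dict(map_fn(analyze_switch_time, range(hp)))

def analyze_in_parallel(hp, workers=None):
    """Distribute the independent iterations over worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return analyze_all_switch_times(hp, map_fn=pool.map)

# Sequential run; the parallel variant returns the same dictionary.
results = analyze_all_switch_times(8)
```

Because the iterations share no state, the results do not depend on the execution order. On platforms that start worker processes by spawning, a call to `analyze_in_parallel` should be placed under an `if __name__ == "__main__":` guard.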

Runtimes. For Sect. 6.2, we ran the analysis of each task-set on a single core of a Dual Deca-Core Intel Xeon E5-2690 v2, running at \(3.00\,\)GHz. As defined in the task-set generation, all task-sets have \({\textsc {hp}}= 1000\) and \(c \sim 1000\). Table 4 reports the average analysis runtimes for task-sets of different utilizations and numbers of jobs.

Table 4 pMC analysis runtimes for different numbers of jobs n in a hyperperiod

Appendix: Notation

See Table 5.

Table 5 Notations

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Cite this article

Draskovic, S., Ahmed, R., Huang, P. et al. Schedulability of probabilistic mixed-criticality systems. Real-Time Syst (2021). https://doi.org/10.1007/s11241-021-09365-4

Keywords

  • Mixed-criticality scheduling
  • Probabilistic execution times
  • Stochastic analysis