Priority Inheritance Protocol Proved Correct
Abstract
In real-time systems with threads, resource locking and priority scheduling, one faces the problem of Priority Inversion. This problem can make the behaviour of threads unpredictable, and the resulting bugs can be hard to find. The Priority Inheritance Protocol is one solution implemented in many systems for solving this problem, but the correctness of this solution has never been formally verified in a theorem prover. As already pointed out in the literature, the original informal investigation of the Priority Inheritance Protocol presents a correctness “proof” for an incorrect algorithm. In this paper we fix the problem of this proof by making all notions precise and implementing a variant of a solution proposed earlier. We also generalise the scheduling problem to the practically relevant case where critical sections can overlap. Our formalisation in Isabelle/HOL is based on Paulson’s inductive approach to protocol verification. The formalisation not only uncovers facts overlooked in the literature, but also helps with an efficient implementation of this protocol. Earlier implementations were criticised as too inefficient. Our implementation builds on top of the small PINTOS operating system used for teaching.
Keywords
Priority Inheritance Protocol, Formal correctness proof, Real-time systems, Isabelle/HOL
1 Introduction
Many real-time systems need to support threads involving priorities and locking of resources. Locking of resources ensures mutual exclusion when accessing shared data or devices that cannot be preempted. Priorities allow scheduling of threads that need to finish their work within deadlines. Unfortunately, both features can interact in subtle ways, leading to a problem called Priority Inversion. Suppose three threads have the priorities H(igh), M(edium) and L(ow). We would expect that the thread H blocks any other thread with lower priority and that H itself cannot be blocked indefinitely by threads with lower priority. Alas, in a naive implementation of resource locking and priorities, this property can be violated. To see this, let L be in possession of a lock for a resource that H also needs. H must therefore wait for L to exit the critical section and release this lock. The problem is that L might in turn be blocked by any thread with priority M, and so H sits there potentially waiting indefinitely (consider the case where threads with priority M continuously need to be processed). Since H is blocked by threads with lower priorities, the problem is called Priority Inversion. It was first described in [12] in the context of the Mesa programming language designed for concurrent programming.
If the problem of Priority Inversion is ignored, real-time systems can become unpredictable and resulting bugs can be hard to diagnose. The classic example where this happened is the software that controlled the Mars Pathfinder mission in 1997 [21]. On Earth, the software ran mostly without any problem, but once the spacecraft landed on Mars, it shut down at irregular, but frequent, intervals. This led to loss of project time as normal operation of the craft could only resume the next day (the mission and data already collected were fortunately not lost, because of a clever system design). The reason for the shutdowns was that the scheduling software fell victim to Priority Inversion: a low priority thread locking a resource prevented a high priority thread from running in time, leading to a system reset. Once the problem was found, it was rectified by enabling the Priority Inheritance Protocol (PIP) [24] in the scheduling software.
The idea behind PIP is to let the thread L temporarily inherit the high priority from H until L leaves the critical section unlocking the resource. This solves the problem of H having to wait indefinitely, because L cannot be blocked by threads having priority M. While a few other solutions exist for the Priority Inversion problem, PIP is one that is widely deployed and implemented. This includes VxWorks (a proprietary real-time OS used in the Mars Pathfinder mission, in Boeing’s 787 Dreamliner, Honda’s ASIMO robot, etc.) and ThreadX (another proprietary real-time OS used in nearly all HP inkjet printers [28]), but also the POSIX 1003.1c Standard realised for example in libraries for FreeBSD, Solaris and Linux.
However, PIP has also been criticised. Yodaiken [30] writes:
“Priority inheritance is neither efficient nor reliable. Implementations are either incomplete (and unreliable) or surprisingly complex and intrusive.”
He suggests avoiding PIP altogether by designing the system so that no priority inversion may happen in the first place. However, such ideal designs may not always be achievable in practice.
Baker writes about the Linux implementation of PIP:
“I observed in the kernel code (to my disgust), the Linux PIP implementation is a nightmare: extremely heavy weight, involving maintenance of a full wait-for graph, and requiring updates for a range of events, including priority changes and interruptions of wait operations.”
The criticism by Yodaiken, Baker and others calls for another look at PIP from a more abstract level (but still concrete enough to inform an implementation), and makes PIP a good candidate for a formal verification. An additional reason is that the original specification of PIP [24], despite being informally “proved” correct, is actually flawed.
Yodaiken [30] and also Moylan et al. [16] point to a subtlety that had been overlooked in the informal proof by Sha et al. They specify PIP in [24, Section III] so that after the thread (whose priority has been raised) completes its critical section and releases the lock, it “returns to its original priority level”. This leads them to believe that an implementation of PIP is “rather straightforward” [24]. Unfortunately, as Yodaiken and Moylan et al. point out, this behaviour is too simplistic. Moylan et al. write that there are “some hidden traps” [16]. Consider the case where the low-priority thread L locks two resources, and two high-priority threads H and \(H'\) each wait for one of them. If L releases one resource so that H, say, can proceed, then we still have Priority Inversion with \(H'\) (which waits for the other resource). The correct behaviour for L is to switch to the highest remaining priority of the threads that it blocks. A similar error is made in the textbook [20, Section 2.3.1], which specifies for a process that inherited a higher priority and exits a critical section that “it resumes the priority it had at the point of entry into the critical section”. This error can also be found in the textbook [14, Section 16.4.1], where the authors write about this process: “its priority is immediately lowered to the level originally assigned”; and also in the more recent textbook [13, Page 119], where the authors state: “when [the task] exits the critical section that caused the block, it reverts to the priority it had when it entered that section”. The textbook [15, Page 286] contains a similar flawed specification and even goes on to develop pseudocode based on this flawed specification. Accordingly, the operating system primitives for inheritance and restoration of priorities in [15] depend on maintaining a data structure called inheritance log.
This log is maintained for every thread and broadly specified as containing “[h]istorical information on how the thread inherited its current priority” [15, Page 527]. Unfortunately, the important information about actually computing the priority to be restored solely from this log is not explained in [15] but left as an “exercise” to the reader. As we shall see, a correct version of PIP does not need to maintain this (potentially expensive) log data structure at all. Surprisingly, even the widely read and frequently updated textbook [25] gives the wrong specification. On Page 254 the authors write: “Upon releasing the lock, the [low-priority] thread will revert to its original priority.” The same error is also repeated later in this popular textbook.
While [13, 14, 15, 20, 24, 25] are the only formal publications we have found that specify the incorrect behaviour, it seems that many informal descriptions of PIP also overlook the possibility that another high-priority process might wait for a low-priority process to finish. A notable exception is the textbook [3], which gives the correct behaviour of resetting the priority of a thread to the highest remaining priority of the threads it blocks. This textbook also gives an informal proof for the correctness of PIP in the style of Sha et al. Unfortunately, this informal proof is too vague to be useful for formalising the correctness of PIP, and the specification leaves out nearly all the details needed to implement PIP efficiently.
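To make the difference between the flawed and the correct restoration rule concrete, here is a minimal C sketch under a deliberately simplified, hypothetical model: each thread records its own assigned priority and the priorities of the threads it still blocks (assumed to already include threads blocked indirectly). The type `thread_t` and both function names are illustrative only and do not come from any of the cited systems.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread record. */
typedef struct {
    int base_priority;       /* priority assigned by the programmer */
    int waiter_priority[8];  /* priorities of threads it still blocks
                                (directly or indirectly) */
    size_t n_waiters;
} thread_t;

/* Flawed rule found in several textbooks: after releasing a lock,
   drop straight back to the originally assigned priority. */
int priority_after_release_flawed(const thread_t *t) {
    return t->base_priority;
}

/* Correct rule: the highest of the thread's own priority and the
   priorities of all threads it still blocks. */
int priority_after_release_correct(const thread_t *t) {
    int p = t->base_priority;
    for (size_t i = 0; i < t->n_waiters; i++)
        if (t->waiter_priority[i] > p)
            p = t->waiter_priority[i];
    return p;
}
```

In the scenario above, once L (priority 1, say) has released the resource H waits for while \(H'\) (priority 9, say) still waits for the other one, the flawed rule returns 1 and so reintroduces Priority Inversion for \(H'\), whereas the correct rule keeps L at 9.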
2 Formal Model of the Priority Inheritance Protocol
We also use a corresponding notation for the precedences of a set of threads in a given state. The point of precedences is to schedule threads not according to priorities (what should we do when two threads have the same priority?), but according to precedences. Precedences allow us to always discriminate between two threads with equal priority by taking into account the time when the priority was last set. We order precedences so that threads with the same priority get a higher precedence if their priority has been set earlier, since for such threads it is more urgent to finish their work. In an implementation this choice would translate to a quite straightforward FIFO-scheduling of threads with the same priority.
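The ordering of precedences can be illustrated by a small comparison function. This is a sketch under assumptions: a precedence is represented here as a pair of an integer priority and a hypothetical event counter recording when that priority was last set, and `prec_t` and `prec_cmp` are illustrative names, not part of the formalisation.

```c
#include <assert.h>

/* Hypothetical precedence: priority plus the time (event counter)
   at which the priority was last set. */
typedef struct {
    int priority;
    unsigned long last_set;
} prec_t;

/* Returns >0 if a has higher precedence than b, <0 if lower, 0 if equal.
   A higher priority wins; for equal priorities the one set earlier wins,
   which yields FIFO order among threads of the same priority. */
int prec_cmp(prec_t a, prec_t b) {
    if (a.priority != b.priority)
        return a.priority > b.priority ? 1 : -1;
    if (a.last_set != b.last_set)
        return a.last_set < b.last_set ? 1 : -1;
    return 0;
}
```

Since no two threads have their priority set at the same moment, two distinct threads never compare as equal under this ordering.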
Moylan et al. [16] considered the alternative of “time-slicing” threads with equal priority, but found that it does not lead to advantages in practice. On the contrary, according to their work, a policy like our FIFO-scheduling of threads with equal priority reduces the number of tasks involved in the inheritance process and thus minimises the number of potentially expensive thread-switches.
If there is no cycle, then every RAG can be pictured as a forest of trees, as for example in Fig. 2.
Note that forests can have trees with infinite depth and nodes with infinitely many children. A finite forest is a forest whose underlying relation is well-founded and in which every node has finitely many children (that is, it is only finitely branching).
The locking mechanism ensures that for each thread node there can be many incoming holding edges in the RAG, but at most one outgoing waiting edge. The reason is that when a thread asks for a resource that is already locked, the thread is blocked and cannot ask for another resource. Clearly, every resource can also have at most one outgoing holding edge, indicating that the resource is locked. So if the RAG is well-founded and finite, we can always start at a thread waiting for a resource and “chase” outgoing arrows leading to a single root of a tree, which must be a ready thread.
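The chase along outgoing arrows can be sketched in C as follows. This is a minimal illustration, not code from the formalisation: the array-based encoding of the RAG (`waiting_for`, `holder`), the type `rag_t` and the function `chase_to_root` are hypothetical, and the loop terminates only under the assumption stated above that the RAG is acyclic.

```c
#include <assert.h>

#define NO_LOCK   (-1)
#define NO_THREAD (-1)

/* Hypothetical array encoding of a RAG: every thread has at most one
   outgoing waiting edge, every lock at most one outgoing holding edge. */
typedef struct {
    const int *waiting_for; /* waiting_for[th]: lock th waits for, or NO_LOCK */
    const int *holder;      /* holder[cs]: thread holding cs, or NO_THREAD */
} rag_t;

/* Follow waiting and holding edges from th until reaching a thread that
   waits for nothing: the root of th's tree, which is a ready thread.
   Assumes the RAG is acyclic. */
int chase_to_root(const rag_t *g, int th) {
    while (g->waiting_for[th] != NO_LOCK) {
        int cs = g->waiting_for[th];
        assert(g->holder[cs] != NO_THREAD); /* threads only wait for taken locks */
        th = g->holder[cs];
    }
    return th;
}
```

For example, if thread 0 holds lock 0, thread 1 waits for lock 0 while holding lock 1, and thread 2 waits for lock 1, then chasing from thread 2 visits thread 1 and ends at the ready thread 0.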
This definition captures the relation that one thread is waiting for another to release a resource, but the corresponding resource is “hidden”. In Fig. 2 this means that the relation connects the threads waiting for a resource directly to the thread holding that resource, even when that holder is itself blocked and cannot make any progress unless its own blocker makes progress. Similarly for the other threads. If there is a cycle of dependencies in a RAG (and thus in this relation), then clearly we have a deadlock. Therefore, when a thread requests a resource, we must ensure that the resulting RAG and the resulting relation are not circular. In practice, the programmer has to ensure this. Our model will enforce that critical resources can only be requested provided no circularity can arise (but critical sections can overlap, see Fig. 1).
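One way an implementation might check the no-circularity condition before letting a thread block is to chase from the holder of the requested resource and test whether the requesting thread is reached. The following C sketch assumes a hypothetical array encoding of the RAG (`waiting_for`, `holder`) and that the current RAG is already acyclic; `request_is_safe` is an illustrative name, and in our model such a check is simply part of the precondition of a request.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_LOCK   (-1)
#define NO_THREAD (-1)

/* Hypothetical check: may thread th request lock cs without closing a
   cycle?  waiting_for[t] is the lock thread t waits for (or NO_LOCK),
   holder[c] the thread holding lock c (or NO_THREAD).  The current RAG
   is assumed to be acyclic, so the walk terminates. */
bool request_is_safe(const int *waiting_for, const int *holder, int th, int cs) {
    assert(th >= 0 && cs >= 0);
    int t = holder[cs];
    while (t != NO_THREAD) {
        if (t == th)
            return false;     /* th would (indirectly) wait for itself */
        int next = waiting_for[t];
        if (next == NO_LOCK)
            return true;      /* reached a ready root without meeting th */
        t = holder[next];
    }
    return true;              /* cs is free: th simply acquires it */
}
```

If thread 1 holds lock 1 and waits for lock 0, which is held by thread 0, then a request by thread 0 for lock 1 is rejected, since it would make threads 0 and 1 wait for each other.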
The first function is a waiting queue function (that is, it takes a resource and returns the corresponding list of threads that lock or wait for it); the second is a function that takes a thread and returns its current precedence [see (5)]. We assume the usual getter and setter methods for such records.
This is because the event for setting a priority is issued by a thread to change its own priority; therefore the thread must be running.
Note, however, that apart from the circularity condition, we do not make any assumption on how different resources can be locked and released relative to each other. In our model it is possible that critical sections overlap. This is in contrast to Sha et al. [24] who require that critical sections are properly nested (recall Fig. 1).
This completes our formal model of PIP. In the next section we present a series of desirable properties derived from this model of PIP. This can be regarded as a validation of the correctness of our model.
3 The Correctness Proof
Sha et al. state their first correctness criterion for PIP in terms of the number of low-priority threads [24, Theorem 3]: if there are \(m\) low-priority threads, then a blocked job with high priority can only be blocked a maximum of \(m\) times. Their second correctness criterion is given in terms of the number of critical resources [24, Theorem 6]: if there are \(k\) critical resources, then a blocked job with high priority can only be blocked a maximum of \(k\) times. Both results on their own, strictly speaking, do not prevent indefinite, or unbounded, Priority Inversion, because if a low-priority thread does not give up its critical resource (the one the high-priority thread is waiting for), then the high-priority thread can never run. The argument of Sha et al. is that if threads release locked resources in a finite amount of time, then indefinite Priority Inversion cannot occur: the high-priority thread is guaranteed to run eventually. The assumption is that programmers must ensure that threads are programmed in this way. However, even taking this assumption into account, the correctness properties of Sha et al. are not true for their version of PIP, despite being “proved”. As Yodaiken [30] and Moylan et al. [16] pointed out, if a low-priority thread possesses locks to two resources for which two high-priority threads are waiting, then lowering the priority prematurely after giving up only one lock can cause indefinite Priority Inversion for one of the high-priority threads, invalidating their two bounds (recall the counterexample described in the Introduction).
Even when fixed, their proof idea does not seem to go through for us, because of the way we have set up our formal model of PIP. One reason is that we allow critical sections, which start with an event requesting a resource and finish with a corresponding event releasing it, to arbitrarily overlap (something Sha et al. explicitly exclude). Therefore we have designed a different correctness criterion for PIP. The idea behind our criterion is as follows: for every state \(s\), we know the corresponding thread \(th\) with the highest precedence; we show that in every future state (denoted by \(t @ s\), where \(t\) is the list of events that happened after \(s\)) in which \(th\) is still alive, either \(th\) is running or it is blocked by a thread that was alive in the state \(s\) and was waiting for or in possession of a lock in \(s\). Since in \(s\), as in every state, the set of alive threads is finite, \(th\) can only be blocked by a finite number of threads.
Assumptions on the states \(s\) and \(t @ s\): We need to require that \(s\) and \(t @ s\) are valid states.
Assumptions on the thread \(th\): The thread \(th\) must be alive in \(s\) and have the highest precedence of all alive threads in \(s\). Furthermore, the priority of \(th\) is \(prio\) (we need this in the next assumptions).
Assumptions on the events in \(t\): To make sure \(th\) keeps the highest precedence, we have to assume that events in \(t\) can only create (respectively set) threads with equal or lower priority than the priority \(prio\) of \(th\). For the same reason, we also need to assume that the priority of \(th\) does not get reset and that all other reset priorities are less than or equal to \(prio\). Moreover, we assume that \(th\) does not get “exited” in \(t\). This can be ensured by assuming the following three implications.
The locale mechanism of Isabelle helps us to conveniently manage such assumptions [9]. Under these assumptions we shall prove the following correctness property:
Theorem 1

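Either \(th\) is running in \(t @ s\), or there exists a thread \(th'\) that is an ancestor of \(th\) in the RAG of \(t @ s\) and is running in \(t @ s\), such that \(th'\) is alive in \(s\), \(th'\) is not detached in \(s\), and the current precedence of \(th'\) in \(t @ s\) equals the precedence of \(th\) in \(s\).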
This theorem ensures that the thread \(th\), which has the highest precedence in the state \(s\), is either running in state \(t @ s\), or can only be blocked in the state \(t @ s\) by a thread \(th'\) that already existed in \(s\) and was waiting for a resource or had a lock on at least one resource in \(s\); that means the thread was not detached in \(s\). As we shall see shortly, this means there are only finitely many threads that can block \(th\) in this way.
The next lemma is part of the proof for Theorem 1: given our assumptions (on \(th\)), the first property we show is that a thread \(th'\) that is running in \(t @ s\) and is different from \(th\) must either wait for or hold a resource in state \(s\).
Lemma 1
If \(th'\) is running in \(t @ s\) and \(th' \neq th\), then \(th'\) is not detached in \(s\).
Proof
Let us assume otherwise, that is, \(th'\) is detached in state \(s\). Then, according to the definition of detached, \(th'\) does not hold or wait for any resource. Hence the current precedence of \(th'\) in \(s\) is not boosted, that is, it equals the precedence of \(th'\), and is therefore lower than the precedence (as well as the current precedence) of \(th\). This means \(th'\) will not run as long as \(th\) is alive. In turn this means \(th'\) cannot take any action in state \(t @ s\) to change its current status; therefore \(th'\) is still detached in state \(t @ s\). Consequently, \(th'\) is also not boosted in state \(t @ s\) and would not run. This contradicts our assumption. \(\square \)
Proof (of Theorem 1)
If \(th\) is running in \(t @ s\), then there is nothing to show. So let us assume otherwise. Since the RAG is well-founded, we know there exists an ancestor of \(th\) that is the root of the corresponding subtree and therefore is ready (it does not request any resources). Let us call this thread \(th'\). Since in PIP the current precedence of any thread equals the maximum precedence of all threads in its subtree, and \(th\) is in the subtree of \(th'\), the current precedence of \(th'\) cannot be lower than the precedence of \(th\). But it also cannot be higher, because the precedence of \(th\) is the maximum among all threads. Therefore we know that the current precedence of \(th'\) is the same as the precedence of \(th\). The result is that \(th'\) must be running. This is because the current precedence of \(th'\) is the highest of all ready threads. This follows from the fact that the current precedence of any ready thread is the maximum of the precedences of all threads in its subtree (with \(th\) having the highest of all threads and being in the subtree of \(th'\)). We also have that \(th' \neq th\), since we assumed \(th\) is not running. By Lemma 1 we have that \(th'\) is not detached in \(s\). If \(th'\) is not detached in \(s\), that is, it is either holding or waiting for a resource, it must be that \(th'\) is alive in \(s\).
This concludes the Proof of Theorem 1. \(\square \)
4 A Finite Bound on Priority Inversion
Like in the work by Sha et al., our result in Theorem 1 does not yet guarantee the absence of indefinite Priority Inversion. For this we further need the property that every thread gives up its resources after a finite amount of time. We found that this property is not so straightforward to formalise in our model. There are mainly two reasons for this: First, we do not specify what “running the code” of a thread means, for example by giving an operational semantics for machine instructions. Therefore we cannot characterise what “good” programs are, namely those that contain for every locking request for a resource also a corresponding unlocking request. Second, we need to distinguish between a thread that “just” locks a resource for a finite amount of time (even if it is very long) and one that locks it forever (there might be an unbounded loop in between the locking and unlocking requests).
Because of these problems, we decided in our earlier paper [31] to leave out this property and let the programmer take on the responsibility to program threads in such a benign manner (in addition to causing no circularity in the RAG). This leave-it-to-the-programmer approach was also taken by Sha et al. in their paper. However, in this paper we can make an improvement by establishing a finite bound on the duration of Priority Inversion measured by the number of events. The events can be seen as a rough(!) abstraction of the “runtime behaviour” of threads and also as an abstract notion of “time”: when a new event happens, some time must have passed.
Assumption on the number of threads created after the state \(s\): Given the state \(s\), in every “future” valid state \(t @ s\), we require that the number of created threads is less than a fixed bound, that is, the number of thread-creation events in \(t\) is less than this bound, whereby \(t\) is a list of events.
Note that it is not enough to just state that there are only a finite number of threads created up until a single state \(t @ s\) after \(s\). Instead, we need to put this bound on the thread-creation events for all valid states after \(s\). This ensures that no matter which “future” state is reached, the number of thread-creation events is finite. This bound is assumed with respect to all future states \(t @ s\) of \(s\), not just a single one.
Assumptions on the threads that can block \(th\) (those alive and not detached in \(s\)): For each such thread \(th'\) there exists a finite bound such that for all future valid states \(t @ s\), if \(th'\) has issued more events in \(t\) than this bound, then \(th'\) is detached in \(t @ s\).
By this assumption we enforce that any thread potentially blocking \(th\) must become detached (that is, it owns no resource anymore) after a finite number of events in \(t\). Again we have to state this bound to hold in all valid states after \(s\). The bound reflects how each thread \(th'\) is programmed: though we cannot express what instructions a thread is executing, the events in our model correspond to the system calls made by a thread. Our bound limits the number of these “calls”.
Our theorem can then be stated as follows:
Theorem 2
This theorem uses Isabelle’s list-comprehension notation, which lists all intermediate states between \(s\) and \(t @ s\), and then filters this list according to states in which \(th\) is not running. By calculating the number of elements in the filtered list using the function length, we obtain the number of intermediate states in which \(th\) is not running, which by the theorem is bounded by the term on the right-hand side.
Proof
This theorem is the main conclusion we obtain for the Priority Inheritance Protocol. It is based on the fact that the set of threads that can block \(th\) is fixed at state \(s\) when \(th\) becomes the thread with the highest priority. Then no additional blocker of \(th\) can appear after the state \(s\). In this way we can bound the number of states where the thread \(th\) with the highest priority is prevented from running. Our bound does not depend on the restriction to well-nested critical sections in the Priority Inheritance Protocol as imposed by Sha et al.
5 Properties for an Implementation
While our formalised proof gives us confidence about the correctness of our model of PIP, we found that the formalisation can even help us with implementing it efficiently. For example, Baker complained that calculating the current precedence in PIP is quite “heavy weight” in Linux (see the Introduction). In our model of PIP the current precedence of a thread in a state \(s\) depends on the precedences of all threads in its subtree, a “global” transitive notion, which is indeed heavy weight [see the equation for the current precedence shown in (6)]. We can, however, improve upon this. For this, recall the notion of the children of a thread \(th\) defined in (3). There a child is a thread that is only one “hop” away from the thread \(th\) in the RAG (and waiting for \(th\) to release a resource). Using children, we can prove the following lemma for more efficiently calculating the current precedence of a thread \(th\).
That means the current precedence of a thread \(th\) can be computed by considering the static precedence of \(th\) and the current precedences of the children of \(th\). Their current precedences, in general, need to be computed by recursively descending into deeper “levels” of the RAG. However, the current precedence of a thread \(th\), say, only needs to be recomputed when (i) its static precedence is reset, (ii) one of its children changes its current precedence, or (iii) its children set changes (for example, when a resource is requested or released). If only the static precedence or the children set changes, then we can avoid the recursion and compute the current precedence of \(th\) locally. In such cases the recursion does not need to descend into the corresponding subtree. Once the current precedence is computed in this more efficient manner, the selection of the thread with the highest precedence from a set of ready threads is a standard scheduling operation implemented in most operating systems.
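The contrast between the global, recursive definition and the local recomputation from the children can be sketched in C. This is an illustration under assumptions: precedences are simplified to integers (larger means higher), the RAG is encoded by hypothetical arrays, and `model_t`, `cp_recursive` and `cp_local` are illustrative names. Children are found here by scanning all threads for brevity; an implementation would keep them reachable through the waiting queues of the locks a thread holds.

```c
#include <assert.h>

#define NO_LOCK (-1)

/* Hypothetical model of n threads with integer precedences. */
typedef struct {
    int n;
    const int *waiting_for; /* lock each thread waits for, or NO_LOCK */
    const int *holder;      /* thread holding each lock */
    const int *prec;        /* static precedence of each thread */
} model_t;

/* c is a child of th if c waits for a lock that th holds (one hop). */
static int is_child(const model_t *m, int c, int th) {
    int cs = m->waiting_for[c];
    return cs != NO_LOCK && m->holder[cs] == th;
}

/* Global definition: maximum over th and its whole subtree. */
int cp_recursive(const model_t *m, int th) {
    int best = m->prec[th];
    for (int c = 0; c < m->n; c++)
        if (is_child(m, c, th)) {
            int v = cp_recursive(m, c);
            if (v > best)
                best = v;
        }
    return best;
}

/* Local recomputation: only th's static precedence and the cached,
   already up-to-date current precedences of its children are used. */
int cp_local(const model_t *m, const int *cp_cache, int th) {
    int best = m->prec[th];
    for (int c = 0; c < m->n; c++)
        if (is_child(m, c, th) && cp_cache[c] > best)
            best = cp_cache[c];
    return best;
}
```

When a thread's static precedence or its children set changes, `cp_local` avoids descending into the subtree, provided the cached values of the children are correct.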
Below we outline how our formalisation guides the efficient calculation of the current precedence in response to each kind of event.
This means in an implementation we do not have to recalculate the RAG, nor any of the current precedences of the other threads. The current precedence of the created thread \(th\) is just its precedence, namely the pair of its assigned priority and the time at which it was created.
This means again we do not have to recalculate the RAG, nor the current precedences of the other threads. Since \(th\) is not alive anymore in the state after the event, there is no need to calculate its current precedence.
The first property again tells us that we do not need to change the RAG. The second shows that the current precedences of all threads other than \(th\) are unchanged. The reason for this is more subtle: since \(th\) must be running, it does not wait for any resource to be released and therefore cannot be in any subtree of any other thread. So all current precedences of other threads are unchanged.
This means the recalculation of the current precedences of \(th\) and \(th'\) can be done independently and also locally by only looking at the children: according to (9) and (10), none of the current precedences of the children changes; only the children sets change by a release event.
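The release case can be sketched as a small C simulation. It rests on assumptions not fixed by the text above: the state is kept in hypothetical global arrays, precedences are integers (larger is higher), the released lock is handed to the waiter with the highest current precedence, and only the releasing thread and the new holder are then recomputed from their children. All names (`init`, `cp_local`, `release`) are illustrative.

```c
#include <assert.h>

#define MAX_T 8
#define NO_LOCK   (-1)
#define NO_THREAD (-1)

/* Hypothetical global state; locks are indexed 0..MAX_T-1. */
int n_threads;
int waiting_for[MAX_T]; /* lock each thread waits for, or NO_LOCK */
int holder[MAX_T];      /* holder of each lock, or NO_THREAD */
int prec[MAX_T];        /* static precedences */
int cp[MAX_T];          /* cached current precedences */

void init(int n) {
    n_threads = n;
    for (int i = 0; i < MAX_T; i++) {
        waiting_for[i] = NO_LOCK;
        holder[i] = NO_THREAD;
        prec[i] = cp[i] = 0;
    }
}

/* Local recomputation from the children's cached current precedences. */
static int cp_local(int th) {
    int best = prec[th];
    for (int c = 0; c < n_threads; c++) {
        int cs = waiting_for[c];
        if (cs != NO_LOCK && holder[cs] == th && cp[c] > best)
            best = cp[c];
    }
    return best;
}

/* Thread th releases cs: hand the lock to the waiter with the highest
   current precedence (ties, impossible with real precedences, go to the
   lowest index), then recompute only th and the new holder. */
void release(int th, int cs) {
    assert(holder[cs] == th);
    int next = NO_THREAD;
    for (int c = 0; c < n_threads; c++)
        if (waiting_for[c] == cs && (next == NO_THREAD || cp[c] > cp[next]))
            next = c;
    holder[cs] = next;
    if (next != NO_THREAD)
        waiting_for[next] = NO_LOCK;  /* next now holds cs and is ready */
    cp[th] = cp_local(th);
    if (next != NO_THREAD)
        cp[next] = cp_local(next);
}
```

This also replays the counterexample from the Introduction: if thread 0 still holds a second lock that a thread with precedence 9 waits for, its current precedence stays at 9 after the first release instead of dropping back.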
This means we need to add a holding edge to the RAG. However, note that while the RAG changes, the corresponding relation between threads does not change. Together with the fact that the precedences of all threads are unchanged, no current precedence value is changed. Therefore, no recalculation of the current precedence of any thread is needed.
This property states that if an intermediate current precedence value does not change (in this case, that of the thread currently being visited), then the procedure can also stop, because none of that thread's ancestor threads will have their current precedence changed.
As can be seen, a pleasing by-product of our formalisation is that the properties in this section closely inform an implementation of PIP, namely whether the RAG needs to be reconfigured or current precedences need to be recalculated for an event. This information is provided by the lemmas we proved. We confirmed that our observations translate into practice by implementing our version of PIP on top of PINTOS, a small operating system written in C and used for teaching at Stanford University [19]. While there is no formal connection between our formalisation and the C code described below, the results of the formalisation clearly shine through in the design of the code.
To implement PIP in PINTOS, we only need to modify the kernel functions corresponding to the events in our formal model. The events translate to the following function interface in PINTOS:
Our implicit assumption that every event is an atomic operation is ensured by the architecture of PINTOS (which allows disabling interrupts while some operations are performed). The case where an unlocked resource is given next to the waiting thread with the highest precedence is realised in our implementation by priority queues. We implemented them as Braun trees [17], which provide efficient \(O(\log n)\) operations for accessing and updating. In the code we shall describe below, we use one function for inserting a new element into a priority queue and another for updating the position of an element that is already in a queue. Both functions take an extra argument that specifies the comparison function used for organising the priority queue.
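A priority queue of this kind could look roughly like the following C sketch of a Braun-tree heap. It is an illustration under assumptions, not the PINTOS code: elements are plain integers, `braun_insert`, `braun_top` and `higher_first` are hypothetical names, and only insertion and access to the top element are shown (the position update used by the implementation is omitted). As in the interface described above, the comparison function is passed as an extra argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical Braun-tree heap node. */
typedef struct node {
    int val;
    struct node *left, *right;
} node_t;

/* cmp(a, b) > 0 means a belongs closer to the top than b. */
typedef int (*cmp_fn)(int, int);

/* Insert x: keep the better of x and the root value at the root, push
   the other value into the right subtree, then swap the subtrees.  This
   keeps |left| == |right| or |left| == |right| + 1, so depth is O(log n). */
node_t *braun_insert(node_t *t, int x, cmp_fn cmp) {
    if (t == NULL) {
        node_t *n = malloc(sizeof *n);
        assert(n != NULL);
        n->val = x;
        n->left = n->right = NULL;
        return n;
    }
    int top = t->val, down = x;
    if (cmp(x, t->val) > 0) {
        top = x;
        down = t->val;
    }
    node_t *new_left = braun_insert(t->right, down, cmp);
    t->right = t->left;
    t->left = new_left;
    t->val = top;
    return t;
}

/* The element with the highest priority sits at the root. */
int braun_top(const node_t *t) {
    assert(t != NULL);
    return t->val;
}

/* Example comparison: larger integers first (a max-heap). */
int higher_first(int a, int b) { return (a > b) - (a < b); }
```

Inserting into the right subtree and then swapping the two subtrees is what keeps the tree balanced without storing any size information in the nodes.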
Lines 6 and 7 of lock_acquire make the operation of acquiring a lock atomic by disabling all interrupts, but saving them for resumption at the end of the function (Line 31). In Line 8, the interesting code with respect to scheduling starts: we first check whether the lock is already taken (its value is then 0, indicating “already taken”, or 1 for being “free”). In case the lock is taken, we enter the if-branch, inserting the current thread into the waiting queue of this lock (Line 9). The waiting queue is referenced in the usual C way as a field of the lock structure. Next, we record that the current thread is waiting for the lock (Line 10). Thus we have established two pointers: one in the waiting queue of the lock pointing to the current thread, and the other from the current thread pointing to the lock. According to our specification in Sect. 2 and the properties we were able to prove for this event, we need to “chase” all the ancestor threads in the RAG and update their current precedence; however, we only have to do this as long as there is a change in the current precedence.
The “chase” is implemented in the while-loop in Lines 13–24. To initialise the loop, Lines 11 and 12 assign the owner of the lock to a thread variable. Inside the loop, we first update the precedence of the lock held by this thread (Line 14). Next, we check whether the current precedence of this thread has changed. If not, we leave the loop, since nothing else needs to be updated (Lines 15 and 16). If it has changed, we have to continue our “chase”: we check which lock the thread is waiting for (Lines 17 and 18). If there is none, then the thread is ready (the “chase” is finished, having found a root of the graph). In this case we update the ready queue accordingly (Lines 19 and 20). If there is a lock the thread is waiting for, we update the waiting queue of this lock and continue the loop with the holder of that lock (Lines 22 and 23). After all current precedences have been updated, we finally need to block the current thread, because the lock it asked for was taken (Line 25).
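The blocking path and the “chase” can be sketched in plain C as follows. This is an illustration under simplifying assumptions, not the PINTOS code: waiting queues are unordered arrays scanned linearly instead of priority queues, precedences are plain integers where larger means more urgent, interrupt handling and the actual blocking of the thread are omitted, and all type and function names (`struct thread`, `struct lock`, `block_on`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: hypothetical types, linear scans instead of priority queues. */
#define MAX_HELD 4
#define MAX_WAITERS 8

struct lock;

struct thread {
    int priority;             /* static priority */
    int cprec;                /* current (possibly inherited) precedence */
    struct lock *waiting_for; /* lock this thread is blocked on, or NULL */
    struct lock *held[MAX_HELD];
    size_t n_held;
};

struct lock {
    struct thread *holder;
    struct thread *waiters[MAX_WAITERS];
    size_t n_waiters;
    int prec; /* highest current precedence among the waiters, 0 if none */
};

static int max_int(int a, int b) { return a > b ? a : b; }

/* Recompute a lock's precedence from the threads waiting for it. */
static void lock_update_prec(struct lock *l) {
    int p = 0; /* assumption: real precedences are positive */
    for (size_t i = 0; i < l->n_waiters; i++)
        p = max_int(p, l->waiters[i]->cprec);
    l->prec = p;
}

/* Recompute a thread's current precedence; return nonzero if it changed. */
static int thread_update_cprec(struct thread *t) {
    int p = t->priority;
    for (size_t i = 0; i < t->n_held; i++)
        p = max_int(p, t->held[i]->prec);
    int changed = (p != t->cprec);
    t->cprec = p;
    return changed;
}

/* Blocking path: set up the two pointers, then chase the holders. */
void block_on(struct thread *cur, struct lock *l) {
    assert(l->holder != NULL && l->n_waiters < MAX_WAITERS);
    l->waiters[l->n_waiters++] = cur; /* waiting queue -> current thread */
    cur->waiting_for = l;             /* current thread -> lock */

    struct lock *lk = l;
    struct thread *pth = l->holder;   /* start the chase at the lock's owner */
    while (pth != NULL) {
        lock_update_prec(lk);         /* lock held by pth may now rank higher */
        if (!thread_update_cprec(pth))
            break;                    /* no change: stop the chase early */
        lk = pth->waiting_for;
        if (lk == NULL)
            break;                    /* pth is ready: a root is reached */
        pth = lk->holder;             /* continue with the holder of that lock */
    }
}
```

In this sketch, a thread of priority 3 blocking on a lock held by a thread that itself waits for a lock raises the current precedence of both holders along the chain to 3, while a later low-priority waiter stops the chase at the first holder because nothing changes there.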
If the lock the current thread asked for is not taken, we proceed with the else-branch (Lines 26–30). We first decrease the value of the lock to 0, meaning it is now taken (Line 27). Second, we update the reference to the holder of the lock (Line 28), and finally we update the queue of locks the current thread already possesses (Line 29). The very last step is to enable interrupts again (Line 31), thus leaving the protected section.
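The else-branch admits an equally small sketch, with the same caveats: the lock value follows the convention described above (1 for free, 0 for taken), the queue of held locks is reduced to a fixed-size array, interrupt handling is left out, and `try_take_free_lock` together with both struct layouts is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: hypothetical types for the "lock is free" path. */
#define MAX_HELD 4

struct lock;

struct thread {
    int priority;
    struct lock *held[MAX_HELD]; /* locks this thread currently possesses */
    size_t n_held;
};

struct lock {
    int value;             /* 1 = free, 0 = taken, as in the text */
    struct thread *holder;
};

/* Returns 1 if the lock was free and now belongs to cur; returns 0 if it is
   already taken, in which case the caller must take the blocking path. */
int try_take_free_lock(struct thread *cur, struct lock *l) {
    if (l->value == 0)
        return 0;                  /* already taken */
    l->value--;                    /* 1 -> 0: the lock is now taken */
    l->holder = cur;               /* record the new holder */
    assert(cur->n_held < MAX_HELD);
    cur->held[cur->n_held++] = l;  /* add to the locks cur possesses */
    return 1;
}
```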
Similar operations need to be implemented for the function that releases a lock, which we, however, do not show. The reader should note, though, that we did not verify our C code. This is in contrast, for example, to the work on seL4, which verified in Isabelle/HOL that its C code satisfies its specification, though this specification does not contain anything about PIP [11]. Our verification of PIP, however, provided us with (formally proven) insights on how to design the C code. It gave us confidence that leaving the “chase” early, whenever there is no change in the calculated current precedence, does not break the correctness of the algorithm.
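The intuition behind the early exit is that a thread's current precedence depends only on its own priority and on the current precedences of the threads waiting for it; once one value along the chain is unchanged, nothing above it can change either. The sketch below exercises this on small examples. It is neither a proof nor the formal argument: it abstracts locks away into a direct pointer from each blocked thread to the thread holding the lock it waits for (a hypothetical simplification), maintains current precedences with an early-exit chase, and compares them against a from-scratch recursive recomputation that assumes the wait-for chains are acyclic.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: locks are abstracted into a "blocker" pointer (hypothetical). */
#define N_THREADS 6

struct thread {
    int priority;
    int cprec;              /* maintained incrementally */
    struct thread *blocker; /* holder of the lock this thread waits for, or NULL */
};

struct thread threads[N_THREADS];

static int max_int(int a, int b) { return a > b ? a : b; }

/* Current precedence of t from the stored values of its direct waiters. */
static int local_cprec(const struct thread *t) {
    int p = t->priority;
    for (size_t i = 0; i < N_THREADS; i++)
        if (threads[i].blocker == t)
            p = max_int(p, threads[i].cprec);
    return p;
}

/* Reference: recompute from scratch over all transitive waiters (no cycles). */
static int full_cprec(const struct thread *t) {
    int p = t->priority;
    for (size_t i = 0; i < N_THREADS; i++)
        if (threads[i].blocker == t)
            p = max_int(p, full_cprec(&threads[i]));
    return p;
}

/* A running thread cur starts waiting for a lock held by holder. */
void start_waiting(struct thread *cur, struct thread *holder) {
    cur->blocker = holder;
    for (struct thread *p = holder; p != NULL; p = p->blocker) {
        int np = local_cprec(p);
        if (np == p->cprec)
            break; /* unchanged: no thread further up can change */
        p->cprec = np;
    }
}

/* Do the incrementally maintained values agree with the reference? */
int consistent(void) {
    for (size_t i = 0; i < N_THREADS; i++)
        if (threads[i].cprec != full_cprec(&threads[i]))
            return 0;
    return 1;
}
```

A randomised version of the same comparison would give broader coverage, but, as with any testing, only the formal proof covers all cases.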
6 Conclusion
The Priority Inheritance Protocol (PIP) is a classic textbook algorithm used in many real-time operating systems in order to avoid the problem of Priority Inversion. Although classic and widely used, PIP does have its faults: for example, it does not prevent deadlocks in cases where threads have circular lock dependencies.
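Such a deadlock shows up as a cycle when one repeatedly follows the lock a thread waits for to the holder of that lock, and PIP itself does nothing to break it. The sketch below, which is not part of PIP nor of the implementation described above, detects such a cycle with Floyd's tortoise-and-hare algorithm, assuming each thread waits for at most one lock and each lock has at most one holder; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: minimal hypothetical types for a wait-for chain. */
struct lock;

struct thread {
    struct lock *waiting_for; /* lock this thread is blocked on, or NULL */
};

struct lock {
    struct thread *holder;    /* thread currently holding the lock, or NULL */
};

/* Next thread in the chain: the holder of the lock t waits for. */
static const struct thread *next_thread(const struct thread *t) {
    if (t == NULL || t->waiting_for == NULL)
        return NULL;
    return t->waiting_for->holder;
}

/* Floyd's cycle detection along start -> holder -> holder -> ... */
int chain_has_cycle(const struct thread *start) {
    const struct thread *slow = start, *fast = start;
    while (fast != NULL && next_thread(fast) != NULL) {
        slow = next_thread(slow);              /* one step */
        fast = next_thread(next_thread(fast)); /* two steps */
        if (slow == fast)
            return 1;                          /* circular lock dependency */
    }
    return 0;                                  /* chain ends at a ready thread */
}
```

For two threads that each hold one lock and wait for the other's, the check reports a cycle from either thread, and also from any third thread waiting for one of those locks.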
We had two goals in mind with our formalisation of PIP: One is to make the notions in the correctness proof by Sha et al. [24] precise so that they can be processed by a theorem prover. The reason is that a mechanically checked proof avoids the flaws that crept into their informal reasoning. We achieved this goal: The correctness of PIP now only hinges on the assumptions behind our formal model. The reasoning, which is sometimes quite intricate and tedious, has been checked by Isabelle/HOL. We can also confirm that Paulson’s inductive method for protocol verification [18] is quite suitable for our formal model and proof. The traditional application area of this method is security protocols.
The second goal of our formalisation is to provide a specification for actually implementing PIP. Textbooks, for example Vahalia [26, Section 5.6.5], explain how to use various implementations of PIP and abstractly discuss their properties, but surprisingly lack most details important for a programmer who wants to implement PIP (similarly Sha et al. [24]). That this is an issue in practice is illustrated by the email from Baker we cited in the Introduction. We also achieved this goal: the formalisation allowed us to efficiently implement our version of PIP on top of PINTOS, a simple instructional operating system for the x86 architecture implemented by Pfaff [19]. It also gives the first author enough data to enable his undergraduate students to implement PIP (as part of their OS course). A by-product of our formalisation effort is that nearly all design choices for the implementation of the PIP scheduler are backed up with a proved lemma. We were also able to establish that the choice of the next thread which takes over a lock is irrelevant for the correctness of PIP. Moreover, we eliminated a crucial restriction present in the proof of Sha et al.: they require that critical sections nest properly, whereas our scheduler allows critical sections to overlap. What we are not able to do is to mechanically “synthesise” an actual implementation from our formalisation. To do so for C code seems quite hard and is beyond the technology currently available for Isabelle. Also, our proof method based on events is not “computational” in the sense of having a concrete algorithm behind it: our formalisation is really more about the specification of PIP and ensuring that it has the desired properties (the informal specification by Sha et al. did not).
PIP is a scheduling algorithm for single-processor systems. We are now living in a multiprocessor world. Priority Inversion certainly occurs there too; see, for example, the work by Brandenburg, and by Davis and Burns [1, 6]. However, there is very little “foundational” work about PIP algorithms on multiprocessor systems. We are not aware of any correctness proofs, not even informal ones. There is an implementation of a PIP algorithm for multiprocessors as part of the “real-time” effort in Linux, including an informal description of the implemented scheduling algorithm given by Rostedt in [23]. We estimate that the formal verification of this algorithm, involving more fine-grained events, is an order of magnitude harder than the one we presented here, but still within reach of current theorem proving technology. We leave this for future work.
To us, it seems that sound reasoning about scheduling algorithms is fiendishly difficult if done informally by “pencil-and-paper”. We infer this from the flawed proof in the paper by Sha et al. [24] and also from [22], where Regehr points out an error in a paper about Preemption Threshold Scheduling by Wang and Saksena [28]. The use of a theorem prover was invaluable to us in order to be confident about the correctness of our reasoning (for example, no corner case can be overlooked). The most closely related work to ours is the formal verification in PVS of the Priority Ceiling Protocol by Dutertre [7]. The Priority Ceiling Protocol is another solution to the Priority Inversion problem, which, however, needs static analysis of programs in order to avoid it. There have been earlier formal investigations into PIP [8, 10, 29], but they employ model checking techniques. The results obtained by them apply, however, only to systems with a fixed size, such as a fixed number of events and threads. In contrast, our result applies to systems of arbitrary size. Moreover, our result is a good witness for one of the major reasons to be interested in machine-checked reasoning: gaining a deeper understanding of the subject matter.
Our formalisation consists of around 600 lemmas and overall 9200 lines of readable and commented Isabelle/Isar code with a few apply-scripts interspersed. The formal model of PIP is 310 lines long; our graph theory implementation using relations is 1615 lines; the basic properties of PIP take around 5000 lines of code; and the formal correctness proof 1250 lines. The properties relevant for an implementation require 1000 lines. The code of our formalisation can be downloaded from the Mercurial repository at http://talisker.inf.kcl.ac.uk/cgi-bin/repos.cgi/pip.
Footnotes
 1.
Sha et al. call it the Basic Priority Inheritance Protocol [24] and others sometimes also call it Priority Boosting, Priority Donation or Priority Lending.
 2.
We shall come back later to the case of PIP on multiprocessor systems.
 3.
For well-founded we use the quite natural definition from Isabelle/HOL.
 4.
This situation is similar to the infamous occurs check in Prolog: In order to say anything meaningful about unification, one needs to perform an occurs check. But in practice the occurs check is omitted and the responsibility for avoiding problems rests with the programmer.
 5.
Notes
Acknowledgements
We are grateful for the comments we received from anonymous referees. We are also deeply saddened by the tragic death of our co-author, colleague and friend, Chunhan, who suddenly died on 22 December 2016. He drove this work forward very much and extended it in his PhD thesis with a formal verification of an SELinux-style access control system. He was a stellar student and a very promising young researcher in the field of interactive theorem proving. He was liked by many and indispensable for organising the ITP’15 conference in Nanjing. Chunhan left behind a grieving wife and an 8-year-old son.
References
 1. Brandenburg, B.B.: Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill (2011)
 2. Budin, L., Jelenkovic, L.: Time-constrained programming in Windows NT environment. In: Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE), vol. 1, pp. 90–94 (1999)
 3. Buttazzo, G.C.: Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications, 3rd edn. Springer, Berlin (2011)
 4. Cox, R., Kaashoek, F., Morris, R.: Xv6. http://pdos.csail.mit.edu/6.828/2012/xv6.html
 5. Cox, R., Kaashoek, F., Morris, R.: Xv6: A Simple, Unix-like Teaching Operating System. Technical report, MIT (2012)
 6. Davis, R.I., Burns, A.: A survey of hard real-time scheduling for multiprocessor systems. ACM Comput. Surv. 43(4), 35:1–35:44 (2011)
 7. Dutertre, B.: The priority ceiling protocol: formalization and analysis using PVS. In: Proceedings of the 21st IEEE Conference on Real-Time Systems Symposium (RTSS), pp. 151–160. IEEE Computer Society (2000)
 8. Faria, J.M.S.: Formal Development of Solutions for Real-Time Operating Systems with TLA+/TLC. PhD thesis, University of Porto (2008)
 9. Haftmann, F., Wenzel, M.: Local theory specifications in Isabelle/Isar. In: Proceedings of the International Conference on Types, Proofs and Programs (TYPES), vol. 5497 of LNCS, pp. 153–168 (2008)
 10. Jahier, E., Halbwachs, B., Raymond, P.: Synchronous modeling and validation of priority inheritance schedulers. In: Proceedings of the 12th International Conference on Fundamental Approaches to Software Engineering (FASE), vol. 5503 of LNCS, pp. 140–154 (2009)
 11. Klein, G., Andronick, J., Elphinstone, K., Heiser, G., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: formal verification of an OS kernel. Commun. ACM 53(6), 107–115 (2010)
 12. Lampson, B.W., Redell, D.D.: Experiences with processes and monitors in Mesa. Commun. ACM 23(2), 105–117 (1980)
 13. Laplante, P.A., Ovaska, S.J.: Real-Time Systems Design and Analysis: Tools for the Practitioner, 4th edn. Wiley, Hoboken (2011)
 14. Li, Q., Yao, C.: Real-Time Concepts for Embedded Systems. CRC Press, Boca Raton (2003)
 15. Liu, J.W.S.: Real-Time Systems. Prentice Hall, Upper Saddle River (2000)
 16. Moylan, P.J., Betz, R.E., Middleton, R.H.: The Priority Disinheritance Problem. Technical Report EE9345, University of Newcastle (1993)
 17. Paulson, L.C.: ML for the Working Programmer. Cambridge University Press, Cambridge (1996)
 18. Paulson, L.C.: The inductive approach to verifying cryptographic protocols. J. Comput. Secur. 6(1–2), 85–128 (1998)
 19. Pfaff, B.: PINTOS. http://www.stanford.edu/class/cs140/projects/
 20. Rajkumar, R.: Synchronization in Real-Time Systems: A Priority Inheritance Approach. Kluwer, Dordrecht (1991)
 21. Reeves, G.E.: Re: What Really Happened on Mars? Risks Forum 19(54) (1998)
 22. Regehr, J.: Scheduling tasks with mixed preemption relations for robustness to timing faults. In: Proceedings of the 23rd IEEE Real-Time Systems Symposium (RTSS), pp. 315–326 (2002)
 23. Rostedt, S.: RT-Mutex Implementation Design. Linux Kernel Distribution, www.kernel.org/doc/Documentation/rt-mutex-design.txt
 24. Sha, L., Rajkumar, R., Lehoczky, J.P.: Priority inheritance protocols: an approach to real-time synchronization. IEEE Trans. Comput. 39(9), 1175–1185 (1990)
 25. Silberschatz, A., Galvin, P.B., Gagne, G.: Operating System Concepts, 9th edn. Wiley, Hoboken (2013)
 26. Vahalia, U.: UNIX Internals: The New Frontiers. Prentice-Hall, Upper Saddle River (1996)
 27. Wang, J., Yang, H., Zhang, X.: Liveness reasoning with Isabelle/HOL. In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs), vol. 5674 of LNCS, pp. 485–499 (2009)
 28. Wang, Y., Saksena, M.: Scheduling fixed-priority tasks with preemption threshold. In: Proceedings of the 6th Workshop on Real-Time Computing Systems and Applications (RTCSA), pp. 328–337 (1999)
 29. Wellings, A., Burns, A., Santos, O.M., Brosgol, B.M.: Integrating priority inheritance algorithms in the real-time specification for Java. In: Proceedings of the 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC), pp. 115–123. IEEE Computer Society (2007)
 30. Yodaiken, V.: Against Priority Inheritance. Technical report, Finite State Machine Labs (FSMLabs) (2004)
 31. Zhang, X., Urban, C., Wu, C.: Priority inheritance protocol proved correct. In: Proceedings of the 3rd Conference on Interactive Theorem Proving (ITP), vol. 7406 of LNCS, pp. 217–232 (2012)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.