Efficient Verified Implementation of Introsort and Pdqsort

  • Peter LammichEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12167)


Sorting algorithms are an important part of most standard libraries, and both, their correctness and efficiency is crucial for many applications.

As generic sorting algorithm, the GNU C++ Standard Library implements the introsort algorithm, a combination of quicksort, heapsort, and insertion sort. The Boost C++ Libraries implement pdqsort, an extension of introsort that achieves linear runtime on inputs with certain patterns.

We verify introsort and pdqsort in the Isabelle LLVM verification framework, closely following the state-of-the-art implementations from GNU and Boost. On an extensive benchmark set, our verified implementations perform on par with the originals.

1 Introduction

Sorting algorithms are an important part of any standard library. The GNU C++ Library (libstdc++) [15] implements Musser’s introspective sorting algorithm (introsort) [28]. It is a combination of quicksort, heapsort, and insertion sort, which has the fast average case runtime of quicksort and the optimal \(O(n\log (n))\) worst-case runtime of heapsort. The Boost C++ Libraries [6] provide a state-of-the-art implementation of pattern-defeating quicksort (pdqsort) [29], an extension of introsort to achieve better performance on inputs that contain certain patterns like already sorted sequences. Verification of these algorithms and their state-of-the-art implementations is far from trivial, but turns out to be manageable when handled with adequate tools.

Sorting algorithms in standard libraries have not always been correct. The timsort [30] algorithm in the Java standard library has a history of bugs1, the (hopefully) last of which was only found by a formal verification effort [10]. Also, many real-world mergesort implementations suffered from an overflow bug [5]. Finally, LLVM’s libc++ [26] implements a different quicksort based sorting algorithm. While it may be functionally correct, it definitely violates the C++ standard by having a quadratic worst-case run time2.

In this paper, we present efficient implementations of introsort and pdqsort that are verified down to their LLVM intermediate representation [27]. The verification uses the Isabelle Refinement Framework [24], and its recent Isabelle-LLVM backend [23]. We also report on two extensions of Isabelle-LLVM, to handle nested container data structures and to automatically generate C-header files to interface the generated code. Thanks to the modularity of the Isabelle Refinement Framework, our verified algorithms can easily be reused in larger verification projects.

While sorting algorithms are a standard benchmark for theorem provers and program verification tools, verified real-world implementations seem to be rare: apart from our work, we are only aware of two verified sorting algorithms [3, 10] from the Java standard library.

The complete Isabelle/HOL formalization and the benchmarks are available at

2 The Introsort and Pdqsort Algorithms

The introsort algorithm by Musser [28] is a generic unstable sorting algorithm that combines the good average-case runtime of quicksort [18] with the optimal \(O(n\log (n))\) worst-case complexity of heapsort [1]. The basic idea is to use quicksort as main sorting algorithm, insertion sort for small partitions, and heapsort when the recursion depth exceeds a given limit, usually \(2\lfloor \log _2 n\rfloor \) for n elements.

Algorithm 1 shows our implementation of introsort, which closely follows the implementation in libstdc++ [15]. The function introsort sorts the slice from index Open image in new window (inclusive) up to index Open image in new window (exclusive) of the list3 xs. If there is more than one element (line 2), it initializes a depth counter and calls the function introsort_aux (line 3), which partially sorts the list such that every element is no more than threshold positions away from its final position in the sorted list. The remaining sorting is then done by insertion sort (line 4). The function introsort_aux implements a recursive quicksort scheme: recursion stops if the slice becomes smaller than the threshold (line 6). If the maximum recursion depth is exhausted, heapsort is used to sort the slice (line 7). Otherwise, the slice is partitioned (line 9), and the procedure is recursively invoked for the two partitions (line 10–11). Here, partition_pivot moves the pivot element to the first element of the left partition, and returns the start index of the right partition.

Note that we do not try to invent our own implementation, but closely follow the existing (and hopefully well-thought) libstdc++ implementation. This includes the slightly idiosyncratic partitioning scheme, which leaves the pivot-element as first element of the left partition. Moreover, the libstdc++ implementation contains a manual tail-call optimization, replacing the recursive call in line 11 by a loop. While we could easily add this optimization in an additional refinement step, it turned out to be unnecessary, as LLVM recognizes and eliminates this tail call automatically.

Algorithm 2 shows our implementation of pdqsort. As for introsort, the wrapper pdqsort just initializes a depth counter, and then calls the function pdqsort_aux (line 2), which, in contrast to introsort, completely sorts the list, such that no final insertion sort is necessary. Again, the pdqsort_aux function implements a recursive quicksort scheme, however, with a few additional optimizations. Slices smaller than the threshold are sorted with insertion sort (line 4). If the current slice is not the leftmost one of the list, as indicated by the parameter lm, the element before the start of the slice is guaranteed to be smaller than any element of the slice itself. This can be exploited to omit a comparison in the inner loop of insertion sort (cf. Sect. 3.3). If the slice is not smaller than the threshold, a pivot element is selected and moved to the front of the slice (line 6). If the pivot is equal to the element before the current slice (line 7), this indicates a lot of equal elements. The partition_left function (line 8) will put them in the left partition, and then only the right partition needs to be sorted recursively (line 10). Otherwise, partition_right (line 12) places elements equal to the pivot in the right partition. Additionally, it returns a flag ap that indicates that the slice was already partitioned. Next, we check for a highly unbalanced partitioning (line 13), i.e., if one partition is less than 1/8th of the overall size. After encountering a certain number of highly unbalanced partitionings, pdqsort switches to heapsort (line14). Otherwise, it will shuffle some elements in both partitions, trying to break up patterns in the input (line 15). If the input was already partitioned wrt. the selected pivot (indicated by the flag ap), pdqsort will optimistically try to sort both partitions with insertion sort (line 17). However, these insertion sorts abort if they cannot sort the list with a small number of swaps, limiting the penalty for being too optimistic. Finally, the two partitions are recursively sorted (lines 18–19).

Our implementation of pdqsort closely follows the implementation we found in Boost [6]. Again, we omitted a manual tail call optimization that LLVM does automatically. Moreover, for certain comparison functions, Boost’s pdqsort uses a special branch-aware partitioning algorithm [11]. We leave its verification to future work, but note that it will easily integrate in our existing formalization.

While introsort and pdqsort are based on the same idea, this presentation focuses on the more complex pdqsort: apart from the more involved pdqsort_aux function, pivot_to_front uses Tukey’s ‘ninther’ pivot selection [4], while introsort uses the simpler median-of-three scheme. It has two partitioning algorithms used in different situations, and the partition_right algorithm also checks for already partitioned slices. Finally, with insort and maybe_sort, it uses two different versions of insertion sort.

3 Verification

We use the Isabelle Refinement Framework [23, 24] to formally verify our algorithms. It provides tools to develop algorithms by stepwise refinement, and generates code in the LLVM intermediate representation [27].

A program returns an element of the following datatype:
Here Open image in new window represents possible non-termination or assertion violation, and Open image in new window a result nondeterministically chosen to satisfy predicate P. Note that we use Open image in new window to indicate defining equations. We define a refinement ordering on Open image in new window by
Intuitively, Open image in new window means that Open image in new window returns fewer possible results than Open image in new window , and may only fail if Open image in new window may fail. Note that Open image in new window is a complete lattice, with top element Open image in new window . The monad combinators are then defined as

Here, Open image in new window deterministically returns Open image in new window , and Open image in new window chooses a result of Open image in new window and then applies Open image in new window to it. If Open image in new window may fail, then the bind may also fail. We write Open image in new window for Open image in new window , and Open image in new window for Open image in new window .

Arbitrary recursive programs can be defined via a fixed-point construction [20]. An assertion fails if its condition is not met, otherwise it returns the unit value:
Assertions are used to express that a program Open image in new window satisfies the Hoare triple with precondition Open image in new window and postcondition  Open image in new window :

If the precondition is false, the right hand side is Open image in new window , and the statement trivially holds. Otherwise, m cannot fail, and every possible result x of m must satisfy Q.

While the Isabelle Refinement Framework provides some syntax to express programs, for better readability, we use the slightly more informal syntax that we have already used in Algorithms 1 and 2. In particular, we treat lists as if they were updated in place, while our actual formalization is purely functional, i.e., generates a new version of the list on each update, which is explicitly threaded through the program. Destructively updated arrays will only be introduced in a later refinement step (cf. Sect. 4).

3.1 Specification of Sorting Algorithms

The first step to verify a sorting algorithm is to specify the desired result. We specify a sorting algorithm as follows:

here Open image in new window is the length of the list Open image in new window and Open image in new window is the slice of the list Open image in new window for indexes in the interval Open image in new window4. The equivalence relation xs =\(_{l,h}\,\,xs'\) relates lists xs and \(xs'\) iff they are equal outside the slice Open image in new window and xs is a permutation of \(xs'\). To simplify the presentation, we assume a linear ordering on the elements. Note that both C++ and our actual formalization support arbitrary weak orderings [19].

3.2 Quicksort Scheme

We split a call of pdqsort_aux into phases, described by the following predicates:

Let Open image in new window denote the original list. First, a pivot element is selected and moved to the beginning of the slice (phase Open image in new window ). The pivot is selected in a way such there is at least one smaller ( Open image in new window ) and one greater ( Open image in new window ) element, e.g., by a median-of-three selection. This knowledge can later be exploited to optimize the inner loops of the partitioning algorithm. After the partitioning (phase Open image in new window ), m points to the pivot element, and all elements before m are smaller, and all elements after m are greater. Then, first the left (phase Open image in new window ), and then the right (phase Open image in new window ) partition gets sorted, while the list remains partitioned around m.

This approach allows us to prove correct the algorithm, without assuming too many details of the underlying subroutines. The following is all we need to know about the subroutines:

where Open image in new window states that the element Open image in new window before the slice is smaller than any element of the slice. Note that we explicitly mention the changed list \(xs'\) in these specifications, while we left the list changes implicit in the algorithm description.

Intuitively, (a,d,f) state correctness of the sorting subroutines, (b) states that pivot selection goes to phase Open image in new window , (c) states that partitioning transitions from phase Open image in new window to phase Open image in new window , and (e) states that shuffling preserves phase Open image in new window . From the above, we easily prove the following lemmas:
The correctness statement for pdqsort_aux is:

The proof is done by using the Refinement Framework’s verification condition generator, and then discharging the generated VCs using the above lemmas. The line numbers in the following brief sketch refer to Algorithm 2. As termination measure for the recursion, we use the size \(h-l\) of the slice to be sorted. If we switch to insertion sort in line 4, (a) implies that the slice gets sorted, and we are done. Otherwise, we select a pivot in line 6, going to phase Open image in new window (b). When the equals optimization is triggered in line 7, we transition to phase Open image in new window (c), and the left partition is already sorted5 (g), such that we can transition to phase Open image in new window (j), and, via a recursive call in line 10 to phase Open image in new window (k). This implies that the slice is sorted (l), and we are done. When the equals optimization is not used, (c) shows that we transition to phase Open image in new window in line 12. If the partition is unbalanced, we either use heapsort (line 14) to directly sort the slice (d), or shuffle the elements (line 15) and stay in phase Open image in new window (e). In line 17, the algorithm may attempt to sort the slice. If this succeeds, we are done (f). Otherwise, we stay in phase Open image in new window (h), and the recursive calls in lines 18 and 19 will take us to phase Open image in new window (j,k), which implies sortedness of the slice (l).

Using the above statement, and an analogous statement for introsort, we can prove the main correctness theorem:

Note that we could prove the correctness of our algorithm with only minimal assumptions about the used subroutines. This decoupling of the algorithm from its subroutines simplifies the proof, as it is not obfuscated with unnecessary details. For example, correctness of the algorithm does not depend on the exact partitioning scheme being used, as long as it implements a transition from the Open image in new window to the Open image in new window phase. It also simplifies changing the subroutines later, e.g., adding further optimizations such as branch-aware partitioning [11].

Breaking down an algorithm into small and decoupled modules is often the key to its successful verification. Note that the original implementation in Boost is more coarse grained, inlining much of the functionality into the main algorithm. After having proved correct an algorithm, we can always do the inlining in a later refinement step, or rely on the LLVM optimizer to do the inlining for us. In our formalization, we use the inlining feature of Isabelle-LLVM’s preprocessor.

3.3 Insertion Sort

Algorithm 3 shows our implementation of insertion sort. The insort procedure repeatedly calls insert to add elements to a sorted prefix of the list. The flag G controls the unguarded optimization: if it is false, we assume that insert will hit a smaller element before underflowing the list index i, and thus omit the comparison \(l<i\) (line 9) in the inner loop. We later specialize the insort algorithm for the two cases of G, and simplify the loop conditions accordingly.

Again, we split the insertion sort algorithm into two smaller parts, which are proved separately via the following specification for insert:

This captures the intuition that insert goes from a slice that is sorted up to index i to one that is sorted up to index \(i+1\).

3.4 The Remaining Subroutines

The proofs of the remaining subroutines follow a similar plot, and are not displayed here in full. Most of them were straightforward, and we could use existing Isabelle proofs as guideline [16, 22, 25]. For the shuffle and pivot_to_front procedures, which contain a large number of indexing and update operations, we ran into a scalability problem: the many partially redundant in-bound statements for the indexes overwhelmed the linear arithmetic solver that is hard-wired into the simplifier. We worked around this problem by introducing auxiliary definitions, which hide the in-bound statements from the simplifier, and allow us to precisely control when it sees them.

Finally, we point out another interesting application of refinement: the sift_down function of heapsort restores the heap property by floating down an element6. A straightforward implementation swaps the element with one of its children, until the heap property is restored (Algorithm 4 (left)). However, the element that is written to \(xs[\text {right}(i)]\) or \(xs[\text {left}(i)]\) by the swap will get overwritten in the next iteration. A common optimization to save half of the writes is to store the element to be moved down in a temporary variable, and only assign it to its final position after the loop (Algorithm 4 (right)). Note that the insert procedure of insertion sort (cf. Algorithm 3) does a similar optimization. However, for insert, it was feasible to prove the optimization together with the actual algorithm. For the slightly more complicated sift-down procedure, we first prove correct the simpler algorithm with swaps, and then refine it to the optimized version. Inside the loop, the refinement relation between the abstract list Open image in new window and the concrete list Open image in new window is Open image in new window . Using the tool support of the Isabelle Refinement Framework, the proof that the optimized version refines the version with swaps requires only about 20 lines of straightforward Isabelle script.

4 Imperative Implementation

We have presented a refinement based approach to verify the introsort and pdqsort algorithms, including most optimizations we found in their libstdc++ and Boost implementations. However, the algorithms are still expressed as nondetermistic programs on functional lists and unbounded natural numbers. In this section, we use the Isabelle-LLVM framework [23] to (semi-)automatically refine them to LLVM programs on arrays and 64 bit integers.

4.1 The Sepref Tool

The Sepref tool [21, 23] symbolically executes an abstract program in the Open image in new window -monad, keeping track of refinements for every abstract variable to a concrete representation, which may use pointers to dynamically allocated memory. During the symbolic execution, the tool synthesizes an imperative Isabelle-LLVM program, together with a refinement proof. The synthesis is automatic, but usually requires some program-specific setup and boilerplate. For a detailed discussion of Sepref and Isabelle-LLVM, we refer the reader to [21, 23].

Sepref comes with standard setup to refine lists to arrays. List updates are refined to destructive array updates, as long as the old version of the list is not used after the update. It also provides setup to refine unbounded natural numbers to bounded integers. It tries to discharge the resulting in-bounds proof obligations automatically. If this is not possible, it relies on hints from the user.

A common technique to provide such hints is to insert additional assertions into the abstract program. Usually, these can be proved easily. For example, in the pdqsort_aux algorithm (Algorithm 2, line 9), the assertion Open image in new window ensures that the addition Open image in new window in the next line cannot overflow. This assertion adds a proof obligation to the correctness proof of pdqsort_aux, which is easily discharged (we are in phase Open image in new window , which guarantees Open image in new window ). When refining pdqsort_aux to an implementation with bounded integers, one can assume Open image in new window to discharge the non-overflow proof obligation. Note that re-proving Open image in new window when doing the refinement would require duplicating large parts of the correctness proof. Thus, assertions provide a convenient tool to pass properties down the refinement chain. Our actual formalization contains multiple such assertions, which we have omitted in this presentation for the sake of readability.
Fig. 1.

Implementation of the insert procedure for 64 bit unsigned integer elements and Open image in new window , which is generated by the Sepref tool. This definition lies within the executable fragment of Isabelle-LLVM, i.e., the Isabelle LLVM code generator can translate it to LLVM intermediate representation. Note that the function does not depend on the lower bound parameter Open image in new window any more, as this was only required in the guarded version. Inlining will remove this bogus parameter.

Using the Sepref tool, it is straightforward to refine the sorting algorithms and their subroutines to an Isabelle-LLVM program. For example, Fig. 1 shows the Isabelle-LLVM code that is generated for the insert procedure for unsigned 64 bit integer elements and Open image in new window (cf. Sect. 3.3). Moreover, the Sepref tool proves that the generated program actually implements the abstract one:

This specifies the refinement relations for the parameters and the result, where Open image in new window relates arrays with lists, and \(nat_{64}\) relates 64-bit integers with natural numbers. The \(\cdot ^d\) annotation means that the parameter will be destroyed by the function call, while \(\cdot ^k\) means that the parameter is kept. Here, the insertion is done in place, such that the original array is destroyed.

The final correctness statement for our implementations is:

Here, Open image in new window and Open image in new window are the Isabelle-LLVM programs generated by Sepref from introsort and pdqsort (Algorithms 1 and 2). The theorem is easily proved by combining Theorem 1 with the theorems generated by Sepref.

4.2 Separation Logic and Ownership

Internally, the Sepref tool represents the symbolic state that contains all abstract variables and their refinements to concrete variables as an assertion in separation logic [8, 31]. Thus, two variables can never reference the same memory. This is a problem for nested container data structures like arrays of strings: when indexing the array, both the array element and the result of the indexing operation would point to the same string. In the original Sepref tool [21], which targeted Standard ML, we worked around this problem by always using functional data types (e.g. lists) to represent the inner type of a nested container. This workaround is no longer applicable for the purely imperative LLVM, such that we could not use Sepref for nested container data structures7.

We now describe an approach towards solving this problem for Sepref. Abstractly, we model an array by the type Open image in new window8, where Open image in new window means that the array does currently not own the respective element. The abstract indexing operation then moves the element from the list to the result:
As no memory is shared between the result and the array, we can show the following refinement:

where Open image in new window is the relation between an array and an Open image in new window , and Open image in new window is the relation for the array elements. Note that this operation does not change the concrete array Open image in new window . The movement of ownership is a purely abstract concept, which results in no implementation overhead.

The transition from Open image in new window to Open image in new window can typically be done in an additional refinement step, and thus does not obfuscate the actual correctness proofs, which are still done on plain Open image in new window . Moreover, the Open image in new window representation is only required for subroutines where extracted array elements are actually visible. For example, we define an operation to compare two array elements:
Inside this operation, we have to temporarily extract the elements i and j from the array, requiring an intermediate refinement step to Open image in new window . However, at the start and end of this operation, the array owns all its elements. For the whole operation, we thus get a refinement on plain arrays:

In our case, we only have to explicitly refine insert and sift_down. The other subroutines use Open image in new window and Open image in new window operations on plain lists.

4.3 The Isabelle-LLVM Code Generator

The programs that are generated by Sepref (cf. Fig. 1) lie in the fragment for which Isabelle-LLVM [23] can generate LLVM text. For example, the pdqsort algorithm for strings yields an LLVM function with the signature:

Here, the type Open image in new window represents dynamic arrays of characters9, represented by length, capacity, and a data pointer.

The generated LLVM text is then compiled to machine code using the LLVM toolchain. To make the generated program usable, one has to link it to a C wrapper, which handles parsing of command line options and printing of results. However, the original Isabelle-LLVM framework provides no support for interfacing the generated code from C: one has to manually write a C header file, which hopefully matches the object file generated by the LLVM compiler. If it doesn’t, the program has undefined behaviour10.

To this end, we extended Isabelle-LLVM to also generate a header file for the exported functions. For example, the Isabelle command

will check that the specified signature actually matches the Isabelle definition, and generate the following C declarations:

5 Benchmarks

The Boost library comes with a sorting algorithm benchmark suite, which we extended with further benchmarks indicated in [4]: apart from sorting random lists of elements that are mostly different (random), we also sort lists of length n that contain only n/10 different elements (random-dup-10), and lists of only two different elements (random-boolean), as well as lists where all elements are equal (equal). We also consider already sorted sequences (sorted, rev-sorted), as well as a sequence of n/2 elements in ascending order, followed by the same elements in descending order (organ-pipe). We also consider sorted sequences where we applied pn/100 random swap operations (almost-sorted-p). Finally, we consider sorted sequences with pn/100 random elements inserted at the end or in the middle ([rev-]sorted-end-p, [rev-]sorted-middle-p).
Fig. 2.

Benchmarking our verified implementations against the unverified originals. For each element type, machine, and distribution, the value \((t_1/t_2-1)*s\) is shown, where \(t_1\) is the slower time, \(t_2\) is the faster time, and \(s = 100\) if our implementation is slower, and \(s=-100\) if the original implementation is slower. That is, a positive value p indicates that our implementation is slower, requiring \(100+p\) percent of the run time of the original. Analogously, a negative value \(-p\) means that the original implementation is slower, requiring \(100+p\) percent of our implementation’s run time.

We sorted integer arrays with \(n=10^8\) elements, and string arrays with \(n=10^7\) elements. For strings, all implementations use the same data structure and compare function. For integers, we disable pdqsorts branch-aware partitioning, which we have not yet verified. For strings, it does not apply anyway.

We compile both, the verified and unverified algorithms with clang-6.0.0, and run them on a laptop with an Intel(R) Core(TM) i7-8665U CPU and 32 GiB of RAM, as well as on a server machine with 24 AMD Opteron 6176 cores and 128 GiB of RAM. Ideally, the same algorithm should take exactly the same time when repeatedly run on the same data and machine. However, in practice, we encountered some noise up to 17%. Thus, we have repeated each experiment at least ten times, and more often to confirm outliers where the verified and unverified algorithms’ run times differ significantly. Assuming that the noise only slows down an algorithm, we take the fastest time measured over all repetitions. The results are displayed in Fig. 2.

They indicate that both our pdqsort and introsort implementations are competitive. There is one outlier for pdqsort for already sorted integer arrays on the laptop. We have not yet understood its exact reason. The remaining cases differ by less than 20%, and in many cases our verified algorithm is actually faster.

6 Conclusions

We have presented the first verification of the introsort and pdqsort algorithms. We verified state-of-the-art implementations, down to LLVM intermediate representation. On an extensive set of benchmarks, our verified implementations perform on par with their unverified counterparts from the GNU C++ and Boost C++ libraries. Apart from our work, the only other verified real-world implementations of sorting algorithms are Java implementations of timsort and dual-pivot quicksort that have been verified with KeY [3, 10].

Compared to other program verification methods, the trusted code base of our approach is small: apart from the well-tested and widely used LLVM compiler, it only includes Isabelle’s logical inference kernel, and the relatively straightforward Isabelle-LLVM semantics and code generation [23]. In contrast, deductive verification tools like KeY [2] depend on the correct axiomatization of the highly complex Java semantics, as well as on several automatic theorem provers, which, themselves, are highly complex and optimized C programs.

Our verified algorithms can readily be used in larger verification projects, and we have already replaced a naive quicksort implementation that caused a stack overflow in an ongoing SAT-solver verification project [13]. For fixed element types and containers based on arrays (e.g. std::vector), we can use our verified algorithms as a drop-in replacement for C++’s std::sort. A direct verification of the C++ code of std::sort, however, would require a formal semantics of C++, including templates and the relevant concepts from the standard template library (orderings, iterators, etc.). To the best of our knowledge, such a semantics has not yet been formalized, let alone been used to verify non-trivial algorithms.

The verification of introsort took us about 100 person hours. After we had set up most of the infrastructure for introsort, we could verify the more complex pdqsort in about 30 h. The development consists of roughly 8700 lines of Isabelle text, of which 2400 lines are for introsort and 2200 lines for pdqsort. The rest is boilerplate and libraries shared between both algorithms, among them 1500 lines for the verification of heapsort.

6.1 Related Work

Sorting algorithms are a standard benchmark for program verification tools, such that we cannot give an exhaustive overview here. Nevertheless, we discuss a few notable examples: the arguably first formal proof of quicksort was given by Foley and Hoare himself [14], though, due to the lack of powerful enough theorem provers at these times, it was only done on paper.

One of the first mechanical verifications of imperative sorting algorithms is by Filliâtre and Magaud [12], who prove correct imperative versions of quicksort, heapsort, and insertion sort in Coq. However, they use a simplistic partitioning scheme, do not report on code generation or benchmarking, nor do they combine their separate algorithms to get introsort.

The timsort algorithm, which was used in the Java standard library, has been verified with the KeY tool [10]. A bug was found and fixed during the verification. Subsequently, KeY has been used to also verify the dual-pivot quicksort algorithm from the Java standard library [3]. This time, no bugs were found.

6.2 Future Work

An obvious next step is to verify a branch-aware partitioning algorithm [11]. Thanks to our modular approach, this will easily integrate with our existing formalization. We also plan to extend our work to stable sorting algorithms. Recently, we have extended the Refinement Framework to support reasoning about algorithmic complexity [17]. Once this work has been integrated with Isabelle-LLVM, we can also prove that our implementations have a worst-case complexity of \(O(n\log (n))\), as required by the C++ standard. Finally, we proposed an explicit ownership model for nested lists. We plan to extend this to more advanced concepts like read-only shared ownership, inspired by Rust’s [32] ownership system. Formally, this could be realized with fractional permission separation logic [7].


  1. 1.
  2. 2.

    See This has not been fixed by April 2020.

  3. 3.

    Our formalization initially uses lists to represent the sequence of elements to be sorted, and refines them to arrays later (cf. Sect. 4).

  4. 4.

    An interval from index Open image in new window (inclusive) to Open image in new window (exclusive) is denoted as Open image in new window . If both indexes are exclusive, we write Open image in new window .

  5. 5.

    Actually all elements in the left partition are equal to the pivot.

  6. 6.

    See, e.g., [9, Ch. 6] or [33, Ch. 2.4] for a description of heapsort.

  7. 7.

    We could still reason about such structures on a lower level.

  8. 8.

    Here, Open image in new window is Isabelle’s option datatype, and Open image in new window is the corresponding selector function.

  9. 9.

    For strings, we use the verified dynamic array implementation provided by Isabelle-LLVM. Note that C++ uses a similar representation, with an additional optimization for small strings.

  10. 10.

    In practice, this means it will probably SEGFAULT. However, it also might return wrong results, or be prone to various kinds of exploits.



We received funding from DFG grant LA 3292/1 “Verifizierte Model Checker” and VeTSS grant “Formal Verification of Information Flow Security for Relational Databases”.


  1. 1.
    Williams, J.W.J.: Algorithm 232: heapsort. Commun. ACM 7(6), 347–349 (1964)Google Scholar
  2. 2.
    Beckert, B., Hähnle, R., Schmitt, P.H.: Verification of Object-oriented Software: The KeY Approach. Springer, Heidelberg (2007). Scholar
  3. 3.
    Beckert, B., Schiffl, J., Schmitt, P.H., Ulbrich, M.: Proving JDK’s dual pivot quicksort correct. In: Paskevich, A., Wies, T. (eds.) VSTTE 2017. LNCS, vol. 10712, pp. 35–48. Springer, Cham (2017). Scholar
  4. 4.
    Bentley, J.L., McIlroy, M.D.: Engineering a sort function. Softw. Pract. Exp. 23(11), 1249–1265 (1993)CrossRefGoogle Scholar
  5. 5.
    Bloch, J.: Extra, extra - read all about it: nearly all binary searches and mergesorts are brokenGoogle Scholar
  6. 6.
    Boost C++ libraries (2011)Google Scholar
  7. 7.
    Bornat, R., Calcagno, C., O’Hearn, P., Parkinson, M.: Permission accounting in separation logic. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2005, pp. 259–270. ACM, New York (2005)Google Scholar
  8. 8.
    Calcagno, C., O’Hearn, P., Yang, H.: Local action and abstract separation logic. In: LICS 2007, pp. 366–378, July 2007Google Scholar
  9. 9.
    Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)zbMATHGoogle Scholar
  10. 10.
    de Gouw, S., Rot, J., de Boer, F.S., Bubel, R., Hähnle, R.: OpenJDK’s Java.utils.Collection.sort() is broken: the good, the bad and the worst case. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 273–289. Springer, Cham (2015). Scholar
  11. 11.
    Edelkamp, S., Weiß, A.: Blockquicksort: how branch mispredictions don’t affect quicksort. CoRR, abs/1604.06697 (2016)Google Scholar
  12. 12.
    Filliâtre, J.-C., Magaud, N.: Certification of sorting algorithms in the Coq system (1999)Google Scholar
  13. 13.
    Fleury, M., Blanchette, J.C., Lammich, P.: A verified SAT solver with watched literals using Imperative HOL. In: Proceedings of CPP, pp. 158–171 (2018)Google Scholar
  14. 14.
    Foley, M., Hoare, C.A.R.: Proof of a recursive program: quicksort. Comput. J. 14(4), 391–395 (1971)MathSciNetCrossRefGoogle Scholar
  15. 15.
    The GNU C++ library. Version 7.4.0Google Scholar
  16. 16.
    Griebel, S.: Binary heaps for imp2. Archive of Formal Proofs, June 2019. Formal proof development
  17. 17.
    Haslbeck, M., Lammich, P.: Refinement with time - refining the run-time of algorithms in Isabelle/HOL. In: ITP2019: Interactive Theorem Proving, June 2019Google Scholar
  18. 18.
    Hoare, C.A.R.: Algorithm 64: quicksort. Commun. ACM 4(7), 321 (1961)CrossRefGoogle Scholar
  19. 19.
    Josuttis, N.M.: The C++ Standard Library: A Tutorial and Reference, 2nd edn. Addison-Wesley Professional, Boston (2012)Google Scholar
  20. 20.
    Krauss, A.: Recursive definitions of monadic functions. In: Proceedings of the PAR, vol. 42, pp. 1–13 (2010)Google Scholar
  21. 21.
    Lammich, P.: Refinement to Imperative/HOL. In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 253–269. Springer, Cham (2015). Scholar
  22. 22.
    Lammich, P.: Refinement based verification of imperative data structures. In: Avigad, J., Chlipala, A. (eds.) CPP 2016, pp. 27–36. ACM (2016)Google Scholar
  23. 23.
    Lammich, P.: Generating verified LLVM from Isabelle/HOL. In: Harrison, J., O’Leary, J., Tolmach, A. (eds.) 10th International Conference on Interactive Theorem Proving (ITP 2019), Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, vol. 141, pp. 22:1–22:19. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2019)Google Scholar
  24. 24.
    Lammich, P., Tuerk, T.: Applying data refinement for monadic programs to Hopcroft’s algorithm. In: Beringer, L., Felty, A. (eds.) ITP 2012. LNCS, vol. 7406, pp. 166–182. Springer, Heidelberg (2012). Scholar
  25. 25.
    Lammich, P., Wimmer, S.: Imp2 – simple program verification in Isabelle/HOL. Archive of Formal Proofs, January 2019. Formal proof development
  26. 26.
    “libc++” C++ standard libraryGoogle Scholar
  27. 27.
    LLVM language reference manualGoogle Scholar
  28. 28.
    Musser, D.R.: Introspective sorting and selection algorithms. Softw. Pract. Exp. 27(8), 983–99 (1997)CrossRefGoogle Scholar
  29. 29.
    Pattern-defeating quicksortGoogle Scholar
  30. 30.
    Peters, T.: Original description of timsort. Accessed 21 Oct 2019Google Scholar
  31. 31.
    Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In Proceedings of Logic in Computer Science (LICS), pp. 55–74. IEEE (2002)Google Scholar
  32. 32.
    The rust programming languageGoogle Scholar
  33. 33.
    Sedgewick, R., Wayne, K.: Algorithms, 4th edn. Addison-Wesley Professional, Boston (2011)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.The University of ManchesterManchesterUK

Personalised recommendations