Store Buffer Reduction in the Presence of Mixed-Size Accesses and Misalignment

Oberhauser, Jonas

doi:10.1007/978-3-030-03592-1_19

Jonas Oberhauser¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11294))

Included in the following conference series:

Working Conference on Verified Software: Theories, Tools, and Experiments

567 Accesses
1 Citations

Abstract

Naïve programmers believe that a multi-threaded execution of their program is some simple interleaving of steps of individual threads. To increase performance, modern Intel and AMD processors make use of store buffers, which cause unexpected behaviors that can not be explained by the simple interleaving model.

Programs that in the simple interleaving model obey one of various programming disciplines do not suffer from these unexpected behaviors in the presence of store buffers. These disciplines require that the program does not make use of several concrete features of modern processors, such as mixed-size/misaligned memory accesses and inter-processor interrupts. A common assumption is that this requirement is posed only to make the formal description and soundness proof of these disciplines tractable, but that the disciplines can be extended to programs that make use of these features with a lot of elbow grease and straightforward refinements of the programming discipline.

In this paper we discuss several of such features where that assumption is correct and two such features where it is not, namely mixed-size/misaligned accesses and inter-processor interrupts. We base our discussion on two programming disciplines from the literature. We present solutions and discuss some context, including a claim in the C11 standard that contradicts our findings.

Our work is based directly on the roughly 500 page PhD thesis of the author, which includes a formal treatment of the extensions and a detailed soundness proof.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
We call the discipline the dirty-bit discipline because in the original paper, Cohen and Schirmer use a dirty bit for each thread that records whether a shared write has been executed by a thread since its last memory barrier; to check whether a program obeys the discipline, one checks that the dirty bit is zero when executing a shared read (which is not itself a memory barrier).
2.
For technical reasons, store buffer actions are allowed in the idealized machine, but are simply no-ops.
3.
The instantiation with MIPS86 can be found in the PhD thesis of the author [Obe17] and takes up 36 pages (not counting arithmetic and logical operations, for which we could literally reuse the definitions of Schmaltz). For comparison, the formal specification of MIPS86 takes up 52 pages, the condensed formal model of x86 by Degenbaev takes up 200 pages, and the informal instruction manual of x86 [Int10] takes up over 2000 pages.
4.
During testing, a few bugs were still found and fixed in this design; mostly indexing and spelling errors that we overlooked when checking the pencil and paper proofs. Mechanizing these proofs and thus fully eliminating all such bugs is future work.
5.
In contrast, in [Obe17], operations of untrusted code are considered shared by default, so technically all accesses are shared, but the rule DB must only be satisfied by trusted code. The results are the same.
6.
After transport-triggered architectures [Cor97], which forego normal instructions in favor of transport-triggered registers.
7.
In real x86, the ISR is too large to allow this simple test for zero; this does not matter for the sake of argument given here.
8.
The PhD thesis uses a stronger rule, where all accesses to transport-triggered memory regions are considered shared. This rule is only used in a single lemma, which is also implied by the rule below.
9.
In the presence of transport-triggered registers, all rules below concerning shared writes also apply to writes to such registers.

References

Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 508–524. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_33
Chapter Google Scholar
Boehm, H.-J., Adve, S.V.: Foundations of the C++ concurrency memory model. In: ACM SIGPLAN Notices, vol. 43, pp. 68–78. ACM (2008)
Google Scholar
Bouajjani, A., Derevenetc, E., Meyer, R.: Checking and enforcing robustness against TSO. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 533–553. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_29
Chapter MATH Google Scholar
Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ concurrency. SIGPLAN Not. 46(1), 55–66 (2011)
Article Google Scholar
C11 draft n1570. https://port70.net/~nsz/c/c11/n1570.html. Accessed 14 Apr 2018
Comments on the C++ memory model following a partial formalization attempt. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2955.html. Accessed 14 Apr 2018
Chen, G., Cohen, E., Kovalev, M.: Store buffer reduction with MMUs. In: Giannakopoulou, D., Kroening, D. (eds.) VSTTE 2014. LNCS, vol. 8471, pp. 117–132. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12154-3_8
Chapter Google Scholar
Chen, G.: Store buffer reduction theorem and application. Ph.D. thesis, Saarland University (2016)
Google Scholar
Corporaal, H.: Microprocessor architectures: from VLIW to TTA (1997)
Google Scholar
Cohen, E., Schirmer, B.: From total store order to sequential consistency: a practical reduction theorem. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 403–418. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_28
Chapter MATH Google Scholar
Degenbaev, U.: Formal specification of the x86 instruction set architecture. Ph.D. thesis, Saarland University (2012)
Google Scholar
Flur, S., et al.: Modelling the ARMv8 architecture, operationally: concurrency and ISA. SIGPLAN Not. 51(1), 608–621 (2016)
Article Google Scholar
Intel, Santa Clara, CA, USA. Intel®64 and IA-32 Architectures Software Developer’s Manual: Volumes 1–3b, June 2010
Google Scholar
Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd edn. Addison Wesley Longman Publishing Co. Inc., Redwood City (1998)
Google Scholar
Introducing lockrefs. https://lwn.net/Articles/565734/. Accessed 14 Apr 2018
Lutsyk, P., Oberhauser, J., Paul, W.: Multicore system architecture. Lecture notes (2016)
Google Scholar
Oberhauser, J.: A simpler reduction theorem for x86-TSO. In: Gurfinkel, A., Seshia, S.A. (eds.) VSTTE 2015. LNCS, vol. 9593, pp. 142–164. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29613-5_9
Chapter Google Scholar
Oberhauser, J.: Justifying the strong memory semantics of concurrent high-level programming languages for system programming. Ph.D. thesis, Saarland University (2017). https://dx.doi.org/10.22028/D291-27208
Owens, S., Sarkar, S., Sewell, P.: A better x86 memory model: x86-TSO. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 391–407. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_27
Chapter Google Scholar
Owens, S.: Reasoning about the implementation of concurrency abstractions on x86-TSO. In: D’Hondt, T. (ed.) ECOOP 2010. LNCS, vol. 6183, pp. 478–503. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14107-2_23
Chapter Google Scholar
Schmaltz, S.: MIPS-86-a multi-core MIPS ISA specification. Technical report, Saarland University, Saarbrücken (2013)
Google Scholar
Zahran, S.: Implementing and debugging a pipelined MIPS machine with interrupts and multi-level address translation. Master’s thesis, Saarland University (2016)
Google Scholar

Download references

Author information

Authors and Affiliations

Saarland University, Saarbrücken, Germany
Jonas Oberhauser

Authors

Jonas Oberhauser
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jonas Oberhauser .

Editor information

Editors and Affiliations

Yale University, New Haven, CT, USA
Ruzica Piskac
Uppsala University, Uppsala, Sweden
Philipp Rümmer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Oberhauser, J. (2018). Store Buffer Reduction in the Presence of Mixed-Size Accesses and Misalignment. In: Piskac, R., Rümmer, P. (eds) Verified Software. Theories, Tools, and Experiments. VSTTE 2018. Lecture Notes in Computer Science(), vol 11294. Springer, Cham. https://doi.org/10.1007/978-3-030-03592-1_19

Download citation

DOI: https://doi.org/10.1007/978-3-030-03592-1_19
Published: 24 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03591-4
Online ISBN: 978-3-030-03592-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics