Skip to main content

Store Buffer Reduction in the Presence of Mixed-Size Accesses and Misalignment

  • Conference paper
  • First Online:
Verified Software. Theories, Tools, and Experiments (VSTTE 2018)

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11294))

Abstract

Naïve programmers believe that a multi-threaded execution of their program is some simple interleaving of steps of individual threads. To increase performance, modern Intel and AMD processors make use of store buffers, which cause unexpected behaviors that can not be explained by the simple interleaving model.

Programs that in the simple interleaving model obey one of various programming disciplines do not suffer from these unexpected behaviors in the presence of store buffers. These disciplines require that the program does not make use of several concrete features of modern processors, such as mixed-size/misaligned memory accesses and inter-processor interrupts. A common assumption is that this requirement is posed only to make the formal description and soundness proof of these disciplines tractable, but that the disciplines can be extended to programs that make use of these features with a lot of elbow grease and straightforward refinements of the programming discipline.

In this paper we discuss several of such features where that assumption is correct and two such features where it is not, namely mixed-size/misaligned accesses and inter-processor interrupts. We base our discussion on two programming disciplines from the literature. We present solutions and discuss some context, including a claim in the C11 standard that contradicts our findings.

Our work is based directly on the roughly 500 page PhD thesis of the author, which includes a formal treatment of the extensions and a detailed soundness proof.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    We call the discipline the dirty-bit discipline because in the original paper, Cohen and Schirmer use a dirty bit for each thread that records whether a shared write has been executed by a thread since its last memory barrier; to check whether a program obeys the discipline, one checks that the dirty bit is zero when executing a shared read (which is not itself a memory barrier).

  2. 2.

    For technical reasons, store buffer actions are allowed in the idealized machine, but are simply no-ops.

  3. 3.

    The instantiation with MIPS86 can be found in the PhD thesis of the author [Obe17] and takes up 36 pages (not counting arithmetic and logical operations, for which we could literally reuse the definitions of Schmaltz). For comparison, the formal specification of MIPS86 takes up 52 pages, the condensed formal model of x86 by Degenbaev takes up 200 pages, and the informal instruction manual of x86 [Int10] takes up over 2000 pages.

  4. 4.

    During testing, a few bugs were still found and fixed in this design; mostly indexing and spelling errors that we overlooked when checking the pencil and paper proofs. Mechanizing these proofs and thus fully eliminating all such bugs is future work.

  5. 5.

    In contrast, in [Obe17], operations of untrusted code are considered shared by default, so technically all accesses are shared, but the rule DB must only be satisfied by trusted code. The results are the same.

  6. 6.

    After transport-triggered architectures [Cor97], which forego normal instructions in favor of transport-triggered registers.

  7. 7.

    In real x86, the ISR is too large to allow this simple test for zero; this does not matter for the sake of argument given here.

  8. 8.

    The PhD thesis uses a stronger rule, where all accesses to transport-triggered memory regions are considered shared. This rule is only used in a single lemma, which is also implied by the rule below.

  9. 9.

    In the presence of transport-triggered registers, all rules below concerning shared writes also apply to writes to such registers.

References

  1. Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 508–524. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_33

    Chapter  Google Scholar 

  2. Boehm, H.-J., Adve, S.V.: Foundations of the C++ concurrency memory model. In: ACM SIGPLAN Notices, vol. 43, pp. 68–78. ACM (2008)

    Google Scholar 

  3. Bouajjani, A., Derevenetc, E., Meyer, R.: Checking and enforcing robustness against TSO. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 533–553. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_29

    Chapter  MATH  Google Scholar 

  4. Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ concurrency. SIGPLAN Not. 46(1), 55–66 (2011)

    Article  Google Scholar 

  5. C11 draft n1570. https://port70.net/~nsz/c/c11/n1570.html. Accessed 14 Apr 2018

  6. Comments on the C++ memory model following a partial formalization attempt. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2955.html. Accessed 14 Apr 2018

  7. Chen, G., Cohen, E., Kovalev, M.: Store buffer reduction with MMUs. In: Giannakopoulou, D., Kroening, D. (eds.) VSTTE 2014. LNCS, vol. 8471, pp. 117–132. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12154-3_8

    Chapter  Google Scholar 

  8. Chen, G.: Store buffer reduction theorem and application. Ph.D. thesis, Saarland University (2016)

    Google Scholar 

  9. Corporaal, H.: Microprocessor architectures: from VLIW to TTA (1997)

    Google Scholar 

  10. Cohen, E., Schirmer, B.: From total store order to sequential consistency: a practical reduction theorem. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 403–418. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_28

    Chapter  MATH  Google Scholar 

  11. Degenbaev, U.: Formal specification of the x86 instruction set architecture. Ph.D. thesis, Saarland University (2012)

    Google Scholar 

  12. Flur, S., et al.: Modelling the ARMv8 architecture, operationally: concurrency and ISA. SIGPLAN Not. 51(1), 608–621 (2016)

    Article  Google Scholar 

  13. Intel, Santa Clara, CA, USA. Intel®64 and IA-32 Architectures Software Developer’s Manual: Volumes 1–3b, June 2010

    Google Scholar 

  14. Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd edn. Addison Wesley Longman Publishing Co. Inc., Redwood City (1998)

    Google Scholar 

  15. Introducing lockrefs. https://lwn.net/Articles/565734/. Accessed 14 Apr 2018

  16. Lutsyk, P., Oberhauser, J., Paul, W.: Multicore system architecture. Lecture notes (2016)

    Google Scholar 

  17. Oberhauser, J.: A simpler reduction theorem for x86-TSO. In: Gurfinkel, A., Seshia, S.A. (eds.) VSTTE 2015. LNCS, vol. 9593, pp. 142–164. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29613-5_9

    Chapter  Google Scholar 

  18. Oberhauser, J.: Justifying the strong memory semantics of concurrent high-level programming languages for system programming. Ph.D. thesis, Saarland University (2017). https://dx.doi.org/10.22028/D291-27208

  19. Owens, S., Sarkar, S., Sewell, P.: A better x86 memory model: x86-TSO. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 391–407. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_27

    Chapter  Google Scholar 

  20. Owens, S.: Reasoning about the implementation of concurrency abstractions on x86-TSO. In: D’Hondt, T. (ed.) ECOOP 2010. LNCS, vol. 6183, pp. 478–503. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14107-2_23

    Chapter  Google Scholar 

  21. Schmaltz, S.: MIPS-86-a multi-core MIPS ISA specification. Technical report, Saarland University, Saarbrücken (2013)

    Google Scholar 

  22. Zahran, S.: Implementing and debugging a pipelined MIPS machine with interrupts and multi-level address translation. Master’s thesis, Saarland University (2016)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jonas Oberhauser .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Oberhauser, J. (2018). Store Buffer Reduction in the Presence of Mixed-Size Accesses and Misalignment. In: Piskac, R., Rümmer, P. (eds) Verified Software. Theories, Tools, and Experiments. VSTTE 2018. Lecture Notes in Computer Science(), vol 11294. Springer, Cham. https://doi.org/10.1007/978-3-030-03592-1_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-03592-1_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03591-4

  • Online ISBN: 978-3-030-03592-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics