Skip to main content

Modeling of Architectures

  • Chapter
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9104))

Abstract

Concurrent programming is known to be quite hard. It is made even harder by the fact that, very often, the execution models of the machines we run our software on are not precisely defined.

This document is a tutorial on the herd tool and the cat language, in which one can define consistency models.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Alglave, J.: A shared memory poetics. Ph.D. thesis, Université Paris 7 (2010)

    Google Scholar 

  2. Alglave, J.: A formal hierarchy of weak memory models. Formal Methods Syst. Des. 41(2), 178–210 (2012)

    Article  MATH  Google Scholar 

  3. Alglave, J., Batty, M., Donaldson, A.F., Gopalakrishnan, G., Ketema, J., Poetzl, D., Sorensen, T., Wickerson, J.: GPU concurrency: weak behaviours and programming assumptions. In: ASPLOS (2015)

    Google Scholar 

  4. Alglave, J., Kroening, D., Lugton, J., Nimal, V., Tautschnig, M.: Soundness of data flow analyses for weak memory models. In: Yang, H. (ed.) APLAS 2011. LNCS, vol. 7078, pp. 272–288. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  5. Alglave, J., Maranget, L., Tautschnig, M.: Herding cats: modelling, simulation, testing, and data-mining for weak memory. TOPLAS 36(2) (2014)

    Google Scholar 

  6. The Beatles: The Continuing Story of Bungalow Bill. In: White Album (1968)

    Google Scholar 

  7. Hower, D.R., Hechtman, B.A., Beckmann, B.M., Gaster, B.R., Hill, M.D., Reinhardt, S.K., Wood, D.A.: Heterogeneous-race-free memory models. In: ASPLOS 14 (2014)

    Google Scholar 

  8. Lamport, L.: How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. 28(9), 690–691 (1979)

    Article  MATH  Google Scholar 

  9. Roxy Music: Re-Make Re-Model. In: Roxy Music (1972)

    Google Scholar 

  10. Owens, S., Sarkar, S., Sewell, P.: A better x86 memory model: x86-TSO. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 391–407. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  11. SPARC International Inc.: The SPARC Architecture Manual Version 9 (1994)

    Google Scholar 

  12. Sparks: Here Kitty. In: Hello Young Lovers (2006)

    Google Scholar 

  13. T-Rex: Jeepster. In: Electric Warrior (1971)

    Google Scholar 

  14. Vafeiadis, V., Balabonski, T., Chakraborty, S., Morisset, R., Nardelli, F.Z.: Common compiler optimisations are invalid in the C11 memory model and what we can do about it. In: POPL (2015)

    Google Scholar 

  15. Sonic Youth: Kool Thing. In: Goo (1990)

    Google Scholar 

  16. Led Zeppelin: Black Dog. In: Led Zeppelin IV (1971)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jade Alglave .

Editor information

Editors and Affiliations

Appendices

A Answers to Exercises

1.1 A.1 First Step in Herding Cats

Exercise: what do you think?. Do you think such an execution is possible?

Yes! On IBM Power or ARM machines for example [5].

1.2 A.2 First Big Cat: Tiger

Exercise: SC per Location with Load-Load Hazard. Certain architectures, such as Sparc RMO [11] allow what is sometimes called load-load hazard, i.e. a situation where the coRR test that we’ve just seen is allowed to yield the result 0:r1=1; 0:r2=0;. How do you think we can build a check that forbids all tests coWW, coRW1, coRW2 and coWR, but allows the test coRR?

In cat speak:

figure dw

Intuitively what we’re doing here is removing the read-read pairs (R*R) from po-loc to build the po-loc-llh relation, then enforcing that this new relation is compatible with the communication relations com.

Let’s run herd on coWW, coRW1, coRW2, and coWR, and observe that they are still forbidden. Now let’s run it on coRR and observe that it is allowed. We’ve built a check that enforces SC per location but allows for load-load hazards!

Exercise: Implementing Address Dependencies. How do you think we can implement address dependencies? By this I mean that I’d like to see a sequence of instructions which, placed between two reads, implement e.g. an address dependency.

Consider the following variant of MP; the reading thread P1 has an address dependency between its two reads:

figure dx

This dependency, in conjunction with the lightweight fence on P0, should be enough to forbid the non-SC execution of MP.

Exercise: Distributed. 2+2w Consider the following litmus test, which is essentially a distributed variant of 2+2w:

figure dy

Which fences should we use to forbid the final state? Why? Where should we put them?

We should put a lightweight fence between the read-write pair on P1, and a lightweight fence between the write-write pair on P2, like so:

figure dz

This is because the A-cumulativity of the lightweight fence on P1 will impose an ordering between the write of x on P0 and the write of y on P1.

Exercise: Independent Reads of Independent Writes. Consider the following litmus test, known as IRIW:

figure ea

Which fences should we use to forbid the final state? Why? Where should we put them?

We should put a heavyweight fence between the read-read pairs on P1 and P3, like so:

figure eb

This is because IRIW essentially is a distributed variant of the store buffering example, therefore reacts to fences in much the same way as SB, just like ISA2 or WRC react to fences (lightweight in their case) in much the same way as MP.

Thus the A-cumulativity of the heavyweight fence will impose an ordering between the write of x on P0 and the read of y on P1 (idem for the write of y on P2 and the read of x on P3).

Exercise: SC and TSO Re-Make Re-Model [9]. Remember that we’ve defined SC and (not quite) TSO earlier. Try to reformulate both models in terms of the five principles we’ve just learnt: sc per location, no thin air, propagation, observation and restoring sc. The SC model you’ll end up with will be equivalent to the one above. The TSO model you’ll end up with will be TSO proper!

Take the tiger cat file, and use the following bell files. For SC:

figure ec

and for TSO:

figure ed

1.3 A.3 Second Big Cat: Jaguar

Exercise: Difference Between RMO and Power or ARM What’s a test that distinguishes RMO from Power or ARM (as we’ve defined them previously)? More precisely, what’s a test that’s forbidden on RMO but allowed on Power or ARM?

IRIW+deps distinguishes RMO from Power or ARM:

figure ee

because it is forbidden on RMO [1, 2], but allowed on Power and ARM [5].

Exercise: Implementing Scope Inclusion. How would you implement an RMO per scope model as above, but where the fences have an effect not only at their scopes, but also at narrower scopes? That is, the fence for system also has an effect at gpu and cta level for example? This procedure should forbid mp-mit-scopes+fgpu+fsys, but not mp-mit-scopes+fcta+fgpu.

We need to invent a recursive notion of wider, that will return, given a scope level s, all scope levels wider than s, not just the immediately wider one:

figure ef

We can then use this new notion in the way we define our rmo-fences:

figure eg

Note how we replace the call to tag2events by a call to wider2 in the definition of rmo-fences.

Exercise: Release Sequence. In C++ there’s a notion of release sequence, which says that synchronisation does not just happen via special-rf, i.e. from the special write of a special read-from to the corresponding special read. Rather, it can happen from any write of the same location and on the same thread that precedes (in program order) a special write.

How would you modify our model to include this notion?

Like so (where ? means zero or one step):

figure eh

B Complete Bell and Cat Files

1.1 B.1 Kittens

Here’s kittens.bell :

figure ei

Here’s kittens.cat :

figure ej

1.2 B.2 Tiger

Here’s tiger.bell :

figure ek

Here’s tiger.cat :

figure el

1.3 B.3 Jaguar

Here’s jaguar.bell :

figure em

Here’s jaguar.cat :

figure en

1.4 B.4 Panther

Here’s panther.bell :

figure eo

Here’s panther.cat :

figure ep

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Alglave, J. (2015). Modeling of Architectures. In: Bernardo, M., Johnsen, E. (eds) Formal Methods for Multicore Programming. SFM 2015. Lecture Notes in Computer Science(), vol 9104. Springer, Cham. https://doi.org/10.1007/978-3-319-18941-3_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-18941-3_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18940-6

  • Online ISBN: 978-3-319-18941-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics