A Pipelined Multi-core MIPS Machine pp 161-206 | Cite as

# Pipelining

## Abstract

In this chapter we deviate from [12] and present somewhat simpler proofs in the spirit of [6]. Pipelining without speculative instruction fetch introduces delay slots after branch and jump instruction. The corresponding simple changes to ISA and reference implementation are presented in Sect. 7.1.

*invisible registers*to partition the reference implementation into pipeline stages. Replacing the invisible registers by pipeline registers and controlling the updates of the pipeline stages by a very simple stall engine we produce a basic pipelined implementation of the MIPS ISA. As in [12] and [6] we use scheduling functions which, for all pipeline stages

*k*and hardware cycles

*t*, keep track of the number of the sequential instruction

*I*(

*k*,

*t*) which is processed in cycle

*t*in stage

*k*of the pipelined hardware. The correctness proof intuitively then hinges on two observations:

- 1
The circuits of stage

*k*in the sequential hardware*σ*and the pipelined hardware*π*are almost identical; the one difference (for the instruction address*ima*) is handled by an interesting special case in the proof. - 2
If

*X*_{ π }is a signal of circuit stage*k*of the pipelined machine and*X*_{ σ }is its counter part in the sequential reference machine, then the value of*X*_{ π }in cycle*t*equals the value of*X*_{ σ }before execution of instruction*I*(*k*,*t*). In algebra \(X_{\pi}^t = X_{\sigma}^{I(k,t)}\).

*used*in the processing of the instruction

*I*(

*k*,

*t*) currently processed in stage

*k*. Proofs with slight gaps are wrong and should be fixed. Fixing the gap discussed here is not hard: one formalizes the concept of signals used for the processing of an instruction and then does the bookkeeping, which is lengthy and should not be presented fully in the classroom. In [6], where the correctness of the pipelined hardware was formally proven, the author clearly had to fix this problem, but he dealt with it in different way: he introduced extra hardware in the reference implementation, which forced signals, which are not used to zero. This makes observation 2 above strictly true.

Forwarding circuits and their correctness are studied in Sect. 7.3. The material is basically from [12] but we work out the details of the pipe fill phase more carefully.

The elegant general stall engine in Sect. 7.4 is from [6]. Like in [6], where the liveness of pipelined processors is formally proven, the theory of scheduling functions with general stall engines is presented here in much greater detail than in [12]. The reason for this effort becomes only evident at the very end of this book: due to possible interference between bus scheduler of the memory system and stall engines of the processors, liveness of pipelined multi-core machines is a delicate and nontrivial matter.