CS 3853 Computer Architecture Notes on Appendix C Section 2

Read Appendix C

C2: Pipeline Hazards

A hazard prevents the next instruction from executing during its designated clock cycle.

Hazard Classifications

  1. structural hazard: a resource conflict; the hardware cannot support every combination of instructions in overlapped execution.
  2. data hazard: an instruction needs data from a previous instruction before it is available.
  3. control hazard: a branch changes the PC after a later instruction has already been fetched.
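A hazard may require the pipeline to stall until the hazard is cleared.
For now, when an instruction is stalled, all instructions issued after it are also stalled, while instructions issued before it continue through the pipeline. No new instructions are fetched during the stall.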

Performance with stalls

We will compare an unpipelined machine in which instructions take several cycles to a pipelined machine with the same clock rate.
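With the same clock rate on both machines and an ideal pipelined CPI of 1, the comparison reduces to speedup = (unpipelined CPI) / (1 + pipeline stall cycles per instruction). For a balanced pipeline, the unpipelined CPI equals the pipeline depth. The Python sketch below just evaluates that ratio. The function name `pipeline_speedup` is illustrative and does not come from the text.

```python
def pipeline_speedup(unpipelined_cpi: float, stalls_per_instr: float) -> float:
    """Speedup of a pipeline over an unpipelined machine with the same clock rate.

    Assumes the ideal pipelined CPI is 1, so the actual pipelined CPI is
    1 + (pipeline stall cycles per instruction).
    """
    pipelined_cpi = 1.0 + stalls_per_instr
    return unpipelined_cpi / pipelined_cpi


# Balanced 5-stage machine: every unpipelined instruction takes 5 cycles.
print(pipeline_speedup(5, 0.0))   # 5.0: with no stalls the speedup equals the depth
print(pipeline_speedup(5, 0.25))  # 4.0: 0.25 stall cycles per instruction
```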

Structural Hazards

At some stage of the pipeline, two instructions require the same resource.
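A typical case is a single memory port shared by data and instruction accesses: a load or store in its MEM stage conflicts with the instruction fetch (IF) of a later instruction.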
Examples:
ClassQue: Memory structural hazard
  1. Compare the corresponding balanced unpipelined machine with a 5-stage pipelined machine that has a single memory port shared by instructions and data, assuming loads and stores together make up 30% of the instructions.
  2. Since most computers use the same memory for data and instructions, why is the above not a problem for modern machines?
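To see where the single-port stalls come from, here is a small Python model under stated assumptions: a 5-stage pipeline (IF ID EX MEM WB), one memory port shared by instruction fetch and data access, and no other hazards. The hypothetical function `fetch_cycles` delays an instruction fetch whenever an earlier load or store is using the port in its MEM stage, which is three cycles after that instruction's own IF.

```python
def fetch_cycles(is_mem_op: list[bool]) -> list[int]:
    """Return the clock cycle in which each instruction is fetched (IF).

    Models a 5-stage pipeline (IF ID EX MEM WB) with ONE memory port shared by
    instruction fetch and data access. An instruction fetched in cycle f is in
    MEM in cycle f + 3; if it is a load or store, the port is busy that cycle,
    so a fetch that would happen then must wait. No other hazards are modeled.
    """
    port_busy = set()  # cycles in which a load/store uses the memory port
    fetched = []
    cycle = 1
    for mem in is_mem_op:
        while cycle in port_busy:  # structural hazard: stall the fetch
            cycle += 1
        fetched.append(cycle)
        if mem:
            port_busy.add(cycle + 3)  # this instruction's MEM cycle
        cycle += 1
    return fetched


# A load followed by three ALU instructions: the 4th fetch collides with the
# load's MEM stage in cycle 4 and slips to cycle 5 (one stall).
f = fetch_cycles([True, False, False, False])
print(f)         # [1, 2, 3, 5]
print(f[-1] + 4) # 9: cycle in which the last instruction finishes WB
```

In this model each load or store steals one fetch cycle, so the load/store frequency gives the extra stall cycles per instruction used in the speedup comparison.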

Data Hazards

These occur when the pipeline would change the order of read/write accesses so that they differ from the order of unpipelined execution.
Consider:
DADD    R1, R2, R3
DSUB    R4, R1, R5
AND     R6, R1, R7
OR      R8, R1, R9
XOR     R10, R1, R11
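Without forwarding, the DADD writes R1 in WB (clock cycle 5), but the DSUB and AND read R1 in ID in cycles 3 and 4, so they would get the old value. The OR reads R1 in cycle 5 and gets the new value, assuming (as the textbook does) that the register file is written in the first half of a cycle and read in the second half. The XOR reads it later still. The hypothetical Python helper `raw_hazards` applies that timing rule. It only handles register-register instructions written as `OP Rd, Rs, Rt`; loads, stores, and branches are not parsed.

```python
import re


def raw_hazards(program: list[str]) -> list[tuple[int, int, str]]:
    """Find read-after-write hazards in a 5-stage pipeline WITHOUT forwarding.

    Instruction k (0-based) reads registers in ID during cycle k + 2 and writes
    its result in WB during cycle k + 5. With a split-cycle register file
    (write first half, read second half), a reader d instructions after the
    writer sees the new value only if d >= 3.
    Returns (writer_index, reader_index, register) triples.
    """
    parsed = []  # (dest, sources) for each instruction
    for line in program:
        _op, operands = line.split(maxsplit=1)
        regs = re.findall(r"R\d+", operands)
        parsed.append((regs[0], regs[1:]))
    hazards = []
    for w, (dest, _) in enumerate(parsed):
        # Only readers 1 or 2 instructions later read too early.
        for r in range(w + 1, min(w + 3, len(parsed))):
            if dest in parsed[r][1]:
                hazards.append((w, r, dest))
            if parsed[r][0] == dest:  # a newer write to dest supersedes this one
                break
    return hazards


program = [
    "DADD    R1, R2, R3",
    "DSUB    R4, R1, R5",
    "AND     R6, R1, R7",
    "OR      R8, R1, R9",
    "XOR     R10, R1, R11",
]
print(raw_hazards(program))  # [(0, 1, 'R1'), (0, 2, 'R1')]
```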

Today's News: September 17
No news yet.

Forwarding
The idea of forwarding is that even though a result is not stored in the register file until WB, it is often available several cycles earlier. For an ALU instruction it is available in the EX/MEM pipeline register.
Figure C.7 shows how the two stalls from the previous example can be eliminated by forwarding part of the contents of the pipeline registers back to the ALU inputs.
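The operand selection can be sketched in Python by comparing register numbers. If the instruction one ahead (now in EX/MEM) writes the source register, use its result. Otherwise, if the instruction two ahead (in MEM/WB) writes it, use that. Otherwise, use the value read from the register file. `PipeReg` and `alu_operand` are hypothetical names. The sketch also assumes an interlock has already stalled any instruction that needs a load result one cycle too early, so EX/MEM never holds a load whose data is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipeReg:
    """Hypothetical view of a pipeline register holding a result."""
    writes_reg: bool     # does the instruction here write a register?
    dest: Optional[int]  # destination register number, if any
    value: int           # ALU result (EX/MEM) or result/loaded data (MEM/WB)


def alu_operand(src: int, reg_file_value: int,
                ex_mem: PipeReg, mem_wb: PipeReg) -> tuple[int, str]:
    """Choose one ALU input and report where it came from.

    The newest producer wins, so EX/MEM is checked before MEM/WB.
    R0 is hard-wired to zero in MIPS and is never forwarded.
    """
    if ex_mem.writes_reg and ex_mem.dest == src and src != 0:
        return ex_mem.value, "EX/MEM"
    if mem_wb.writes_reg and mem_wb.dest == src and src != 0:
        return mem_wb.value, "MEM/WB"
    return reg_file_value, "ID/EX (register file)"


# DADD R1,R2,R3 is in EX/MEM while DSUB R4,R1,R5 is in EX.
ex_mem = PipeReg(writes_reg=True, dest=1, value=42)
mem_wb = PipeReg(writes_reg=False, dest=None, value=0)
print(alu_operand(1, 7, ex_mem, mem_wb))  # (42, 'EX/MEM')      R1 is forwarded
print(alu_operand(5, 7, ex_mem, mem_wb))  # (7, 'ID/EX (register file)')  R5 is not
```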
Problem
ClassQue: forwarding hardware
To implement forwarding for the first two instructions in the example above, one of the ALU inputs must be selectable from two different sources, depending on the previous instruction.
  1. From which pipeline register(s) does the ALU get its inputs?
  2. What type of circuit is required to implement this?
Examples:
Consider:
DADD    R1, R2, R3
LD      R4, 0(R1)
SD      R4, 12(R1)
The LD and SD use the ALU to calculate the effective address in EX.
Figure C.8 shows how forwarding can be used to get R1 before it is stored back in the register file.
Also, the value of R4 loaded by the LD is forwarded to the SD, which needs it in its MEM stage, before it is written to the register file.
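Whether forwarding alone is enough can be checked with simple timing arithmetic. Assume a 5-stage pipeline with full forwarding and no other stalls. Instruction k (0-based) is in stage s (IF = 1, ..., WB = 5) during clock cycle k + s. A result is available at the end of the stage that produces it: EX for an ALU result, MEM for loaded data. It is needed at the start of the consuming stage: EX for ALU operands and address calculation, MEM for the data an SD stores. `stalls_needed` is a hypothetical helper.

```python
# Stage numbers within an instruction's lifetime (IF is that instruction's 1st cycle).
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5


def stalls_needed(producer_idx: int, ready_stage: int,
                  consumer_idx: int, need_stage: int) -> int:
    """Stall cycles needed so a forwarded value arrives in time.

    Assumes full forwarding and no other stalls: instruction k (0-based) is in
    stage s during clock cycle k + s. The value exists at the END of the
    producer's ready_stage and is needed at the START of the consumer's
    need_stage, so it must be ready at least one cycle earlier.
    """
    ready_cycle = producer_idx + ready_stage
    need_cycle = consumer_idx + need_stage
    return max(0, ready_cycle - need_cycle + 1)


# DADD R1,R2,R3 / LD R4,0(R1) / SD R4,12(R1)
print(stalls_needed(0, EX, 1, EX))    # 0: R1 forwarded to the LD address calculation
print(stalls_needed(0, EX, 2, EX))    # 0: R1 forwarded to the SD address calculation
print(stalls_needed(1, MEM, 2, MEM))  # 0: R4 forwarded from the LD to the SD's MEM stage
# LD R1,0(R2) immediately followed by DSUB R4,R1,R5
print(stalls_needed(0, MEM, 1, EX))   # 1: forwarding alone is not enough
```

The last line anticipates the next case: the same arithmetic shows that an ALU instruction immediately after a load that produces its operand needs one stall even with forwarding.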

Sometimes stalls are necessary
Consider:
LD      R1, 0(R2)
DSUB    R4, R1, R5
AND     R6, R1, R7
OR      R8, R1, R9
Figure C.9 shows the required forwarding paths.
The DSUB needs the result of the LD before it is available anywhere, so a stall is required.
The EX cycle of the DSUB needs the value that the LD produces in its MEM cycle, and both happen in the same clock cycle. The loaded value is not available until the end of that cycle, which is too late to forward to the DSUB.
This is fixed by inserting a stall before the EX cycle of the DSUB.
All subsequent instructions are also stalled.
                Clock number
Instruction     1      2      3      4      5      6      7      8      9
LD   R1,0(R2)   IF     ID     EX     MEM    WB
DSUB R4,R1,R5          IF     ID     stall  EX     MEM    WB
AND  R6,R1,R7                 IF     stall  ID     EX     MEM    WB
OR   R8,R1,R9                        stall  IF     ID     EX     MEM    WB
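A diagram like this one can be generated mechanically. The Python sketch below follows the convention used in the table. While an instruction is held in ID, IF and ID are frozen, so the instruction in IF and the one that would have been fetched in that cycle also show a stall, while older instructions continue. The names `pipeline_diagram` and `render`, and the `stalls_before_ex` parameter, are hypothetical.

```python
def pipeline_diagram(n_instr: int, stalls_before_ex: dict[int, int]) -> list[dict[int, str]]:
    """Build a 5-stage pipeline diagram with interlock stalls.

    stalls_before_ex maps an instruction index (0-based) to the number of stall
    cycles it needs before EX (e.g. 1 for a load-use hazard with forwarding).
    While that instruction is held in ID, the front end (IF and ID) is frozen.
    Returns, for each instruction, a mapping clock cycle -> stage label.
    """
    frozen = set()  # cycles in which IF/ID cannot advance
    rows = []
    next_fetch = 1
    for i in range(n_instr):
        row = {}
        t = next_fetch
        while t in frozen:  # cannot be fetched yet
            row[t] = "stall"
            t += 1
        row[t] = "IF"
        next_fetch = t + 1  # the next instruction is fetched one cycle later
        t += 1
        while t in frozen:  # held in IF behind a stalled instruction
            row[t] = "stall"
            t += 1
        row[t] = "ID"
        for _ in range(stalls_before_ex.get(i, 0)):  # held in ID by its own hazard
            t += 1
            frozen.add(t)
            row[t] = "stall"
        for stage in ("EX", "MEM", "WB"):
            t += 1
            row[t] = stage
        rows.append(row)
    return rows


def render(names: list[str], rows: list[dict[int, str]]) -> str:
    """Format the diagram as fixed-width text, one column per clock cycle."""
    last = max(max(r) for r in rows)
    lines = ["Instruction     " + "".join(f"{c:<7}" for c in range(1, last + 1))]
    for name, row in zip(names, rows):
        cells = "".join(f"{row.get(c, ''):<7}" for c in range(1, last + 1))
        lines.append(f"{name:<16}{cells}".rstrip())
    return "\n".join(lines)


names = ["LD   R1,0(R2)", "DSUB R4,R1,R5", "AND  R6,R1,R7", "OR   R8,R1,R9"]
rows = pipeline_diagram(4, {1: 1})  # the DSUB needs one load-use stall
print(render(names, rows))
```

The printed rows place IF, ID, stall, EX, MEM, and WB in the same clock cycles as the table above.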
Question:
ClassQue: stalls after memory access
  1. Suppose the sequence of instructions is:
    LD      R1, 0(R2)
    DSUB    R4, R1, R5
    AND     R6, R7, R7
    OR      R8, R7, R7
    
    1. Would we still have to delay the AND and OR instructions, even though they use different registers? Why?
    2. How could you prevent the stalls in this code sequence?

Branch Hazards

Control hazards can cause a significant performance loss. There are four static methods of dealing with branch stalls:
Method 1: freeze or flush the pipeline
This is the method that was shown above.
The branch penalty is fixed (one cycle in this pipeline) and cannot be reduced by software.

Method 2: treat every branch as not taken


Today's News: September 19
Pick up your Assignment 1 if you handed it in on Tuesday.

Method 3: treat every branch as taken

Method 4: The delayed branch
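The static schemes can be compared by their penalty on taken and untaken branches. The Python sketch assumes the five-stage pipeline resolves both the branch condition and the target address in ID. That gives the following penalties. Flush/freeze costs 1 cycle either way. Predict not taken costs 1 cycle only when the branch is taken. Predict taken is no better than flushing here, because the target is not known before the outcome. A delayed branch costs 0 cycles if the delay slot is filled with useful work, which is an optimistic assumption. The branch mix and the names `SCHEMES` and `branch_cpi` are made up for illustration.

```python
# Branch penalties (taken, not taken) in clock cycles, ASSUMING the 5-stage
# pipeline resolves the branch condition and target address in ID.
SCHEMES = {
    "flush/freeze": (1, 1),
    "predict not taken": (1, 0),             # squash the fetched instruction only if taken
    "predict taken": (1, 1),                 # target not known earlier than the outcome
    "delayed branch (slot filled)": (0, 0),  # optimistic: slot always does useful work
}


def branch_cpi(branch_freq: float, taken_frac: float,
               penalty_taken: int, penalty_not_taken: int) -> float:
    """Pipelined CPI when branches are the only source of stalls (ideal CPI = 1)."""
    stall_per_branch = (taken_frac * penalty_taken
                        + (1 - taken_frac) * penalty_not_taken)
    return 1.0 + branch_freq * stall_per_branch


# Hypothetical mix: 20% branches, 60% of them taken (illustrative numbers only).
for name, (pt, pn) in SCHEMES.items():
    print(f"{name:<30} CPI = {branch_cpi(0.20, 0.60, pt, pn):.2f}")
```

With this made-up mix, the loop prints CPIs of 1.20, 1.12, 1.20, and 1.00, in the order the schemes are listed.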

Reducing the branch cost through prediction

There are 2 classes of branch prediction:

Static Branch Prediction

Dynamic Branch Prediction

The simplest technique uses a branch prediction buffer or branch history table.
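A branch prediction buffer is a small memory indexed by the low-order bits of the branch address. It holds a prediction that is updated with each outcome. The textbook's simplest version keeps one bit per entry. The Python sketch below instead assumes the common 2-bit saturating-counter refinement, so a single unusual outcome (such as a loop exit) does not flip the prediction. The class name and the `index_bits` parameter are hypothetical, and the table has no tags, so branches whose addresses share low-order bits share an entry.

```python
class BranchPredictionBuffer:
    """Untagged table of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)  # start at "strongly not taken"

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the byte offset of 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bpb = BranchPredictionBuffer()
outcomes = ([True] * 9 + [False]) * 3  # loop branch: taken 9 times, then exits; 3 runs
mispredicts = 0
for taken in outcomes:
    if bpb.predict(0x40) != taken:
        mispredicts += 1
    bpb.update(0x40, taken)
print(mispredicts, "of", len(outcomes))  # 5 of 30
```

Only 5 of the 30 predictions are wrong: two while the counter warms up from 0, then one at each of the three loop exits.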