CS 3853 Computer Architecture Notes on Appendix C Section 3

Read Appendix C.3

C3: Pipeline Implementation

We start with a simple unpipelined implementation of a subset of the MIPS instructions.

Unpipelined Implementation

We consider the following 5 types of instructions: The following information is from Figure A.22.
All instructions are 32 bits and these instructions have one of 2 formats:
I-type:
Figure A.22-I
Used for:
R-type:
Figure A.22-R
Used for:
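
Decoding sketch: the two formats can be illustrated by pulling the fields out of a 32-bit instruction word. This is only a sketch. It assumes the standard MIPS field layout (6-bit opcode, 5-bit rs, rt, rd and shamt, 6-bit funct, 16-bit immediate), and the function names `decode_r_type` and `decode_i_type` are made up for this example.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # top 6 bits
        "rs": (word >> 21) & 0x1F,      # first source register
        "rt": (word >> 16) & 0x1F,      # second source register
        "rd": (word >> 11) & 0x1F,      # destination register
        "shamt": (word >> 6) & 0x1F,    # shift amount
        "funct": word & 0x3F,           # ALU function selector
    }


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its fields."""
    imm = word & 0xFFFF
    if imm & 0x8000:                    # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": imm,
    }


# Illustrative words built by hand; the opcode and funct values are arbitrary.
r_word = (0 << 26) | (6 << 21) | (7 << 16) | (5 << 11) | (0 << 6) | 0x2C
i_word = (0x37 << 26) | (2 << 21) | (1 << 16) | 45
print(decode_r_type(r_word))
print(decode_i_type(i_word))
```

Note that rs and rt sit in the same bit positions in both formats, which is what lets the hardware read the registers before it knows the instruction type.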

Today's News: September 24
I will have shortened office hours on Thursday from 1pm until 1:40pm
I will also be available before 11:15am by appointment.

Examples:
Figure C.21 shows the hardware needed to implement these instructions in 5 or fewer cycles.
ClassQue: Figure C.21 logic types

Here is what happens at each cycle:
ClassQue: Figure C.21 Muxes A and B

Question:
ClassQue: Figure C.21 Instruction Encoding
The RR instruction is described as:
RR ALU: Regs[rd] ← Regs[rs] funct Regs[rt]
What would have to change if instead it were:
RR ALU: Regs[rs] ← Regs[rt] funct Regs[rd]
Answer:
?

Pipelined Implementation

Figure C.22 shows a corresponding pipeline implementation.
The registers NPC, IR, A, B, Imm, Cond, ALUOutput and LMD are now contained in the pipeline registers.
Examples:
  1. NPC is contained in which pipeline register?
    Answer:
    NPC is created in IF so it is stored in the IF/ID register.
    It is needed in EX, so it must also be in ID/EX.
  2. IR is stored in which pipeline registers?
    Answer:
    Parts of the IR register are needed in each cycle, so for simplicity, the entire IR is propagated to each pipeline register. This is somewhat inefficient.

Today's News: September 26
Exam 1 will be one week from today.

Examples: Figure C.23 shows the details of the pipelined execution for each type of instruction.

Below is a comparison for the RR ALU instruction. See Figures C.21 and C.22
Operations that are performed, but not needed for this instruction are shown this way: operation.
Stage  Unpipelined                          Pipelined
IF     IR ← Mem[PC]                         IF/ID.IR ← Mem[PC]
       NPC ← PC + 4                         PC ← PC + 4
                                            IF/ID.NPC ← PC + 4
ID     A ← Regs[IR.rs]                      ID/EX.A ← Regs[IF/ID.IR.rs]
       B ← Regs[IR.rt]                      ID/EX.B ← Regs[IF/ID.IR.rt]
       Imm ← sign-extended(IR.Immediate)    ID/EX.NPC ← IF/ID.NPC
                                            ID/EX.IR ← IF/ID.IR
                                            ID/EX.Imm ← sign-extended(IF/ID.IR.Immediate)
EX     ALUOutput ← A funct B                EX/MEM.IR ← ID/EX.IR
                                            EX/MEM.ALUOutput ← ID/EX.A funct ID/EX.B
MEM    PC ← PC + 4                          MEM/WB.IR ← EX/MEM.IR
                                            MEM/WB.ALUOutput ← EX/MEM.ALUOutput
WB     Regs[IR.rd] ← ALUOutput              Regs[MEM/WB.IR.rd] ← MEM/WB.ALUOutput
Note: My notation is slightly different from that of the book.
For the unpipelined case I use IR.rs instead of just rs, etc.
For the pipelined case I use XX/XX.IR.rs instead of XX/XX.IR[rs]
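
The pipelined operations for the RR ALU instruction can be traced in a few lines of Python. This is a rough sketch, not a model of the real hardware: each pipeline register is a plain dict, the instruction is passed in already decoded instead of being read from memory, and the name `run_rr_alu` is invented for the example. The Imm and PC updates are left out because this instruction does not use them.

```python
import operator


def run_rr_alu(regs, rs, rt, rd, funct, pc=0):
    """Carry one RR ALU instruction through the pipeline registers."""
    ir = {"rs": rs, "rt": rt, "rd": rd, "funct": funct}  # stands in for Mem[PC]

    # IF: fetch the instruction and compute the next PC.
    if_id = {"IR": ir, "NPC": pc + 4}

    # ID: read both source registers and pass IR and NPC along.
    id_ex = {
        "IR": if_id["IR"],
        "NPC": if_id["NPC"],
        "A": regs[if_id["IR"]["rs"]],
        "B": regs[if_id["IR"]["rt"]],
    }

    # EX: apply funct to the two operands.
    ex_mem = {
        "IR": id_ex["IR"],
        "ALUOutput": id_ex["IR"]["funct"](id_ex["A"], id_ex["B"]),
    }

    # MEM: nothing to do for an ALU instruction except pass values along.
    mem_wb = {"IR": ex_mem["IR"], "ALUOutput": ex_mem["ALUOutput"]}

    # WB: write the result to the destination register.
    regs[mem_wb["IR"]["rd"]] = mem_wb["ALUOutput"]
    return regs


regs = [0] * 32
regs[6], regs[7] = 10, 32
run_rr_alu(regs, rs=6, rt=7, rd=5, funct=operator.add)
print(regs[5])
```

Each stage reads only from the pipeline register to its left, which is why IR has to be copied forward all the way to MEM/WB: the rd field is not used until WB.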

ClassQue: Exam 1 Sample Problems 1
ClassQue: Exam 1 Sample Problems 2
problems    solutions

How Branches Work

Branches are hard.
In the unpipelined architecture shown in Figure C.21:
The pipelined architecture shown in Figure C.22 has a 3-cycle stall when a branch is taken:
Suppose the instruction stream looks like:
instruction (not branch)
instruction (not branch)
instruction (not branch)
instruction A: taken branch
instruction B
instruction C
instruction D
...
instruction X: branch target
The PC is set at the end of IF: normally to PC+4, but if the Zero? field of EX/MEM is not 0, it is set to the ALU result instead.
The Zero? field of EX/MEM stays 0 until the branch instruction is executed.
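
The PC selection just described amounts to a two-input mux. A minimal sketch, with the function and parameter names chosen only for illustration:

```python
def next_pc(pc, ex_mem_zero, ex_mem_alu_output):
    """Value loaded into PC at the end of IF."""
    if ex_mem_zero != 0:          # a taken branch has reached EX/MEM
        return ex_mem_alu_output  # branch target computed by the ALU
    return pc + 4                 # normal sequential fetch


print(next_pc(100, 0, 400))
print(next_pc(100, 1, 400))
```

Because the mux is driven by EX/MEM, the target cannot be selected until the branch has finished EX, and the instructions fetched in the meantime are the ones that get wasted.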
If the branch instruction is fetched in cycle i:
Today's News: October 1
Exam 1 will be on Thursday.
You can find the figure sheet for the exam here.
You will not need to use the bottom figure which
shows the hardware needed to eliminate branch stalls.
Recitation this week has many sample problems,
most of which will not be covered in the recitation.
Solutions for these problems will be available this afternoon.

Examples: The timing diagram looks like this:
instruction / cycle                 i    i+1  i+2  i+3  i+4  i+5  i+6  i+7  i+8
instruction A (taken branch)        IF   ID   EX   MEM  WB
instruction B                            IF   ID   EX   MEM  WB
instruction C                                 IF   ID   EX   MEM  WB
instruction D                                      IF   ID   EX   MEM  WB
instruction X (branch destination)                      IF   ID   EX   MEM  WB
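
A diagram like this one can be generated from the branch penalty alone. The sketch below is an illustration under simple assumptions: the taken branch A is fetched at offset 0 (cycle i), one wasted instruction is fetched per penalty cycle, and the target X is fetched right after them. The names `branch_timing` and `render` are invented here, and the sketch does not mark which stages of the wasted instructions are cancelled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_timing(penalty):
    """Map each instruction name to the cycle offset of its IF stage."""
    # A is the taken branch, then `penalty` wasted instructions, then target X.
    names = ["A"] + [chr(ord("B") + k) for k in range(penalty)] + ["X"]
    return {name: start for start, name in enumerate(names)}


def render(starts):
    """Return one text line per instruction, one 5-character column per cycle."""
    width = max(starts.values()) + len(STAGES)
    lines = []
    for name, start in starts.items():
        cells = [""] * width
        for k, stage in enumerate(STAGES):
            cells[start + k] = stage
        lines.append(f"{name:<3}" + "".join(f"{c:<5}" for c in cells).rstrip())
    return lines


for line in render(branch_timing(3)):
    print(line)
```

With a penalty of 3 the target X starts at offset 4, that is, in cycle i+4, which matches the diagram. The penalty is a parameter so the same sketch can be reused for other stall counts.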
ClassQue: Pipeline Branch

Reducing the branch penalty

Figure C.28 shows how to reduce the branch taken penalty from 3 to 1.
Figures C.22 and C.28 compared
The timing diagram now looks like this:
instruction / cycle                 i    i+1  i+2  i+3  i+4  i+5  i+6
instruction A (taken branch)        IF   ID   EX   MEM  WB
instruction B                            IF   ID   EX   MEM  WB
instruction X (branch destination)            IF   ID   EX   MEM  WB

Questions:
  1. Why do we not strike out the ID and EX of instruction B?
    Answer:
    We do not have to since they do not change the external state.
  2. Why don't we strike out the MEM and WB for instruction A?
    Answer:
    A branch instruction does not do anything in these stages.

Dealing with data hazards

Recall that there are 3 types of hazards: structural, data, and control.
Structural hazards will not occur because we included enough hardware.
The above discussion showed how to handle control hazards.
When a data hazard occurs, we need to either stall the pipeline or eliminate the hazard by using forwarding.
ClassQue: forwarding hardware

Examples:
  1. The following requires a stall of the DADD instruction:
         LD    R1, 45(R2)
         DADD  R5, R1, R7
    
    • This can be detected in the ID stage of the DADD instruction by comparing rt of the LD instruction to rs and rt of the DADD instruction.
    • During the ID stage of DADD, rs is in IF/ID.IR.rs and rt is in IF/ID.IR.rt
    • During the ID stage of DADD, rt of LD is in ID/EX.IR.rt
  2. The following data hazard in the DSUB instruction can be removed by forwarding:
         LD    R1, 45(R2)
         DADD  R5, R6, R7
         DSUB  R8, R1, R7
    
    • This can be detected in the EX stage of the DSUB by comparing the rt of the LD to the rs or rt of DSUB
    • In this case in the EX stage of DSUB, the ALU must be fed not from the ID/EX register but from the load result in MEM/WB.
    • Figure C.27 shows the new data paths needed and the new muxes for the ALU.
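
The two checks in these examples can be written out as comparisons on the pipeline registers. This is a simplified sketch, not the logic of Figure C.27: the IR is a dict with a hypothetical "op" key in place of a real opcode, only loads are handled as producers, the consumer is assumed to be an RR ALU instruction that reads both rs and rt, and the names `load_use_stall` and `alu_inputs` are made up for the example.

```python
def load_use_stall(id_ex_ir, if_id_ir):
    """Example 1: True if the instruction in ID must stall behind a load in EX."""
    return (id_ex_ir["op"] == "LD"
            and id_ex_ir["rt"] in (if_id_ir["rs"], if_id_ir["rt"]))


def alu_inputs(id_ex, mem_wb):
    """Example 2: pick the ALU operands in EX, forwarding a load from MEM/WB."""
    a, b = id_ex["A"], id_ex["B"]          # default: values read in ID
    if mem_wb["IR"]["op"] == "LD":
        dest = mem_wb["IR"]["rt"]          # a load writes its rt register
        if dest == id_ex["IR"]["rs"]:
            a = mem_wb["LMD"]              # forward the loaded value to input A
        if dest == id_ex["IR"]["rt"]:
            b = mem_wb["LMD"]              # forward the loaded value to input B
    return a, b


ld = {"op": "LD", "rs": 2, "rt": 1}                 # LD   R1, 45(R2)
dadd = {"op": "DADD", "rs": 1, "rt": 7, "rd": 5}    # DADD R5, R1, R7
dsub = {"op": "DSUB", "rs": 1, "rt": 7, "rd": 8}    # DSUB R8, R1, R7

print(load_use_stall(ld, dadd))
# DSUB is in EX while the LD is in WB; 99 stands for the value just loaded.
print(alu_inputs({"IR": dsub, "A": 0, "B": 7}, {"IR": ld, "LMD": 99}))
```

The stall check runs in ID because that is the last point at which the dependent instruction can still be held back. The forwarding check runs in EX because that is where the operand is actually consumed.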
