CS 3853 Computer Architecture Notes on Chapter 3 Section 4



Read Section 3.4

3.4: Overcoming Data Hazards with Dynamic Scheduling

Disadvantages of Static Scheduling

Dynamic Scheduling: The Idea


Today's News: October 23, 2015
Assignment 2 due now


Example Program:
1) DIV.D   F0,F2,F4
2) ADD.D   F6,F0,F8
3) S.D     F6,0(R1)
4) SUB.D   F8,F10,F14
5) MUL.D   F6,F10,F8
  • Data dependence between 1) and 2) (F0)
  • Data dependence between 2) and 3) (F6)
  • Antidependence between 2) and 4) (F8)
  • Output dependence between 2) and 5) (F6)
  • Antidependence between 3) and 5) (F6)
  • Data dependence between 4) and 5) (F8)
Three of these (the two antidependences and the output dependence) are name dependences and can be removed by register renaming.
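
The dependence list above can be reproduced mechanically. The following Python sketch is an illustration only, not part of the course materials: it compares every pair of instructions and reports data (RAW), anti (WAR), and output (WAW) dependences. The tuple encoding of the program and the name find_dependences are assumptions made for this example. The pairwise check ignores intervening writes to the same register, which is adequate for this five-instruction program but not in general.

```python
# Each instruction: (mnemonic, destination register or None, source registers).
# S.D writes memory, so it has no destination register.
PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]


def find_dependences(program):
    """Return (kind, earlier, later, register) for every dependent pair."""
    found = []
    for i, (_, dest_i, srcs_i) in enumerate(program, start=1):
        # program[i:] holds the instructions after instruction i (1-based).
        for j, (_, dest_j, srcs_j) in enumerate(program[i:], start=i + 1):
            if dest_i is not None and dest_i in srcs_j:
                found.append(("data", i, j, dest_i))      # read after write
            if dest_j is not None and dest_j in srcs_i:
                found.append(("anti", i, j, dest_j))      # write after read
            if dest_i is not None and dest_i == dest_j:
                found.append(("output", i, j, dest_i))    # write after write
    return found


for kind, i, j, reg in find_dependences(PROGRAM):
    print(f"{kind} dependence between {i}) and {j}) ({reg})")
```

Only the "anti" and "output" entries are name dependences; the "data" entries are true dependences that renaming cannot remove.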

Today's News: October 26, 2015
Assignment 2 returned
Assignment 3 available



Tomasulo's Approach
Renaming is handled by reservation stations.
  • One or more for each type of operation: load, store, FP add, FP mult, etc.
  • Contains:
    • the instruction
    • buffered operand values if available
    • otherwise, (a reference to the) reservation station producing the result
  • Fetches and buffers operands as soon as they are available
  • When several in-flight instructions write the same register, only the last one in program order updates the register file
  • Load and store reservation stations are just buffers containing data and addresses
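
To see how reservation stations remove name dependences, the sketch below renames the five-instruction example program from earlier in these notes. It is a simplified illustration under stated assumptions: every instruction gets its own station tag (RS1 to RS5, hypothetical names), no instruction has completed yet, and a plain register name in the output stands for a value read from the register file at issue. The function name rename and the tuple encoding are also assumptions of this example.

```python
# Each instruction: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]


def rename(program):
    """Replace each source register by the tag of its pending producer."""
    producer = {}   # register -> tag of the station that will write it last
    renamed = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        tag = f"RS{n}"
        # A source with a pending producer becomes a reference to that station;
        # otherwise the value is read from the register file right away.
        operands = tuple(producer.get(reg, reg) for reg in srcs)
        renamed.append((tag, op, operands))
        if dest is not None:
            producer[dest] = tag   # later readers of dest wait on this station
    return renamed, producer


renamed, producer = rename(PROGRAM)
for tag, op, operands in renamed:
    print(tag, op, operands)
print("register status:", producer)
```

After renaming, SUB.D no longer conflicts with ADD.D over F8, because ADD.D read F8 at issue. The register status maps F6 to RS5 only, so the MUL.D result, not the ADD.D result, is the one written to the register file.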

Tomasulo Algorithm Steps
  • Issue
    • Get the next instruction
    • If there is an available reservation station (RS), issue the instruction to that RS
    • Enter the operand values into the RS, if available.
      For each operand that is not available, enter the RS that will produce it.
    • If the destination is a register, set that register's status to show that its value will come from this reservation station.
  • Execute
    • When an operand becomes available, store it in every RS waiting for it.
    • If all operands are ready and an execution unit is available, start execution.
    • However, no instruction can start execution until all branches that precede it in program order have completed.
      (This is addressed in a later section.)
    • Loads and stores are handled in program order.
  • Write Result
    • Write result on CDB (common data bus)
    • This is broadcast into all reservation stations (and possibly one register) that need it
    • Only one value may be put on the CDB in a cycle
Figure 3.6 shows the basic structure of a MIPS floating point unit using Tomasulo's algorithm.
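
The Issue and Write Result steps can be sketched in a few lines of Python. This is a minimal functional model under stated assumptions, not the hardware described in the textbook: it ignores timing, functional units, branches, and memory. The field names vj, vk, qj, qk follow the textbook's usual naming for operand values and producer tags; the station tags (Mult1, Mult2, Add1), the register values, and the function names issue and write_result are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    tag: str
    op: str
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # tags of stations still producing operands
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.qj is None and self.qk is None


def issue(tag, op, dest, src_j, src_k, regs, reg_status, stations):
    """Issue: capture operand values or producer tags, then claim dest."""
    rs = Station(tag, op)
    if src_j in reg_status:
        rs.qj = reg_status[src_j]
    else:
        rs.vj = regs[src_j]
    if src_k in reg_status:
        rs.qk = reg_status[src_k]
    else:
        rs.vk = regs[src_k]
    reg_status[dest] = tag       # dest will now come from this station
    stations.append(rs)
    return rs


def write_result(tag, value, regs, reg_status, stations):
    """Write Result: broadcast (tag, value) on the CDB and free the station."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value        # only the newest producer of reg matches
        del reg_status[reg]
    stations[:] = [rs for rs in stations if rs.tag != tag]


regs: Dict[str, float] = {"F0": 0.0, "F2": 3.0, "F4": 2.0, "F6": 1.0, "F8": 5.0}
reg_status: Dict[str, str] = {}
stations: List[Station] = []

issue("Mult1", "MUL.D", "F0", "F2", "F4", regs, reg_status, stations)
div = issue("Mult2", "DIV.D", "F10", "F0", "F6", regs, reg_status, stations)
issue("Add1", "ADD.D", "F6", "F8", "F2", regs, reg_status, stations)

write_result("Add1", 8.0, regs, reg_status, stations)    # F6 changes early
write_result("Mult1", 6.0, regs, reg_status, stations)   # wakes up the DIV.D
print(div)
```

In this run the DIV.D station keeps the F6 value it buffered at issue even though the later ADD.D writes F6 first, and it becomes ready only when Mult1 broadcasts its result.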

Today's News: October 28, 2015
Assignment 3 available
Exam 2 on Friday

Activity:
Tomasulo's Algorithm

Today's News: November 2, 2015
Assignment 3 available
Exam 2 returned today


Tomasulo's Approach
Example: Overview of Tomasulo's algorithm execution
Assume the following sequence of instructions:
1.  L.D    F6,32(R2)
2.  L.D    F2,44(R3)
3.  MUL.D  F0,F2,F4
4.  SUB.D  F8,F2,F6
5.  DIV.D  F10,F0,F6
6.  ADD.D  F6,F8,F2
Assume that there are sufficient reservation stations, one fmul unit, and one fadd unit,
and that the execution times in cycles are: load: 1, fadd: 2, fmul: 10, fdiv: 40.
Assume that integer operations complete in 1 cycle.
Assume that a dedicated ALU is used for address calculations and that it can do one operation per cycle.
Assume that the floating point functional units become available on the cycle after the result is put on the CDB.
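
One way to experiment with these assumptions is a small timing calculator. The Python sketch below is a rough model, not the course's solution method, and the cycle numbers it prints depend on conventions that are assumptions of this example: one instruction issues per cycle starting at cycle 1; execution starts no earlier than the cycle after issue and the cycle after the last operand appears on the CDB; a load takes one cycle of address calculation plus one cycle of memory access; DIV.D shares the single fmul unit; and older instructions claim functional units and the CDB first. The base registers R2 and R3 are assumed to be ready. The names LATENCY, UNIT, and schedule are hypothetical.

```python
# Execution latencies from the notes; a load is modeled as one cycle of
# address calculation plus one cycle of memory access (an assumption).
LATENCY = {"load": 1 + 1, "fadd": 2, "fmul": 10, "fdiv": 40}
# Assumption: DIV.D shares the multiply unit; loads use the address ALU.
UNIT = {"load": "addr", "fadd": "fadd", "fmul": "fmul", "fdiv": "fmul"}

# (text, kind, destination register, floating point source registers)
PROGRAM = [
    ("L.D F6,32(R2)", "load", "F6", ()),
    ("L.D F2,44(R3)", "load", "F2", ()),
    ("MUL.D F0,F2,F4", "fmul", "F0", ("F2", "F4")),
    ("SUB.D F8,F2,F6", "fadd", "F8", ("F2", "F6")),
    ("DIV.D F10,F0,F6", "fdiv", "F10", ("F0", "F6")),
    ("ADD.D F6,F8,F2", "fadd", "F6", ("F8", "F2")),
]


def schedule(program):
    written = {}      # register -> cycle its newest producer writes the CDB
    unit_free = {}    # unit -> first cycle it can start another operation
    cdb_busy = set()  # cycles in which the CDB already carries a value
    rows = []
    for issue, (text, kind, dest, srcs) in enumerate(program, start=1):
        # Looking up sources before recording dest models renaming: each
        # source refers to the newest earlier writer in program order.
        operands = max((written.get(reg, 0) for reg in srcs), default=0)
        start = max(issue + 1, operands + 1, unit_free.get(UNIT[kind], 0))
        end = start + LATENCY[kind] - 1
        write = end + 1
        while write in cdb_busy:      # only one value on the CDB per cycle
            write += 1
        cdb_busy.add(write)
        # The address ALU accepts one operation per cycle; an FP unit is
        # free again on the cycle after its result is put on the CDB.
        unit_free[UNIT[kind]] = start + 1 if kind == "load" else write + 1
        written[dest] = write
        rows.append({"instr": text, "issue": issue, "start": start,
                     "end": end, "write": write})
    return rows


for row in schedule(PROGRAM):
    print(row)
```

Changing an entry in LATENCY or UNIT gives a quick way to explore variations on the assumptions, though the results should be checked by hand against the conventions used in class.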
Fill in the Instruction Summary Table.
The solution, after all instructions have completed, assumes that all memory accesses are cache hits and take one cycle.
Note that in order to issue: Note that in order to start execution:

Today's News: November 4, 2015
Assignment 3 available
Change in quiz grade calculation


Tomasulo's Approach
Answer the following questions, each time starting with the original assumptions:
  1. How would the results change if there were only one add reservation station?
  2. How would the results change if there were 2 multiply functional units?
  3. The last instruction modifies F6 before the previous instruction, which uses F6, starts to execute. Is this a problem?
  4. How would this change if the DIV.D instruction used F2 instead of F0?
  5. How would this change if the DIV.D instruction used F2 instead of F0 and there were 2 multiply functional units?
  6. The instruction
        ADD.D F0,F4,F4
    is executed at the end of this code. On what cycle does it complete?

Today's News: November 6, 2015
Assignment 3 available
Change in quiz grade calculation

  7. How would this change if the second memory access were a cache miss with a miss penalty of 6 cycles?
  8. How would this change if the second memory access were a cache miss with a miss penalty of 6 cycles and the DIV.D instruction used F4 instead of F0?
  9. How would this change if an fmul took 5 cycles instead of 10?


