Today's News: November 9, 2015
Assignment 3 available
3.5: Dynamic Scheduling Algorithm
Tomasulo Algorithm Details
The 7 fields of a reservation station:
- Op: The operation to be performed
- Qj and Qk: The reservation stations that will produce the source arguments.
0 means the argument is already available in the corresponding V field.
- Vj and Vk: The value of a source argument.
For loads, the Vk or Qk fields are used for the offset (base register).
- A: Holds information for the memory address calculation. Initially set to the immediate field, then replaced by the effective address once it has been calculated.
- Busy: Set while the reservation station is in use.
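The seven fields above can be sketched as a small record. This is only an illustrative sketch, not the form used in the figures: the class name ReservationStation and the methods operands_ready and capture are hypothetical, and tags are assumed to be positive station numbers so that 0 can mean "value already in V".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation station with the seven fields listed above."""
    op: Optional[str] = None      # Op: operation to perform
    qj: int = 0                   # Qj: tag of the station producing operand j (0 = in Vj)
    qk: int = 0                   # Qk: tag of the station producing operand k (0 = in Vk)
    vj: Optional[float] = None    # Vj: value of operand j
    vk: Optional[float] = None    # Vk: value of operand k
    a: Optional[int] = None       # A: immediate, later the effective address
    busy: bool = False            # Busy: station is in use

    def operands_ready(self) -> bool:
        # Both operands are available once neither Q field names a producer.
        return self.busy and self.qj == 0 and self.qk == 0

    def capture(self, tag: int, value: float) -> None:
        # Called when a result tagged `tag` appears on the CDB.
        if not self.busy:
            return
        if self.qj == tag:
            self.vj, self.qj = value, 0
        if self.qk == tag:
            self.vk, self.qk = value, 0
```

For example, a station waiting on tag 2 for its first operand becomes ready only after capture(2, value) is called.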
Figure 3.7 (empty) shows a blank form that can be filled in.
Figure 3.7 (filled) shows the form filled in at the point where the second load instruction is waiting for memory.
Figure 3.7 (completed) shows the form filled in after cycle 13 (with additional values in parentheses).
Figure 3.7 interactive is a beta version of an interactive coding form.
How Loads and Stores work
Loads:
- The instruction is issued if a load reservation station (load buffer) is available.
- Memory is accessed as soon as the address is available and the load buffer is next in line.
- On a cache hit this takes one cycle.
- On a cache miss, the memory access may take many cycles.
- The value is put on the CDB (if available) on the next cycle.
Stores:
- The instruction is issued if a store reservation station (store buffer) is available.
- The address is calculated when the base register is available.
- The value cannot be written to memory until the address and source are available.
- Values are written to memory in issue order.
- The CDB is not used.
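The store rules above (write only when both address and value are known, and always in issue order) can be sketched as a queue drained from the front. This is a minimal sketch under assumptions: StoreEntry and drain_one_store are hypothetical names, memory is modeled as a plain dictionary, and at most one store is written per call.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    name: str
    address: Optional[int] = None   # None until the base register is available
    value: Optional[float] = None   # None until the source value is available


def drain_one_store(buffer, memory):
    """Write at most one store to memory, strictly in issue order.

    Returns the name of the store written, or None if the oldest
    store is still waiting for its address or its value.
    """
    if not buffer:
        return None
    head = buffer[0]                # oldest issued store
    if head.address is None or head.value is None:
        return None                 # younger stores must wait behind it
    buffer.popleft()
    memory[head.address] = head.value
    return head.name
```

Note that a younger store whose address and value are both ready still waits if an older store is not ready, which is what keeps memory writes in issue order.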
Interactive Tomasulo Through Cycle 6
Interactive Tomasulo Through Cycle 13
Today's News: November 11, 2015
Assignment 3 due Friday
Today's News: November 13, 2015
Assignment 3 due today
Exam next week.
Today's News: November 16, 2015
A Loop Example
Consider the unrolled (but unscheduled) loop from Section 3.2.
This has 4 iterations of the loop and takes 27 cycles.
How would this do under Tomasulo's algorithm?
Tomasulo Loop Form (empty) shows a blank form that can be filled in.
Here is an HTML version.
Assumptions:
- Enough reservation stations
- Load and store execution consists of the address calculation, which takes one cycle
- All cache hits: Memory access takes one cycle
- Floating point add takes 2 cycles of execution
- Floating point units are ready on the cycle after execution ends
- Integer add takes one cycle of execution
- Priority for the CDB is based on issue time.
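The last assumption (CDB priority based on issue time) can be sketched as a small arbitration loop. This is an illustrative sketch only: schedule_cdb is a hypothetical helper, each result is described by an assumed (issue cycle, first cycle it could write) pair, and the instruction names and cycle numbers in the usage note are made up rather than taken from the completed solution.

```python
def schedule_cdb(results):
    """Assign a CDB write cycle to each result, one write per cycle.

    `results` maps an instruction name to (issue_cycle, ready_cycle),
    where ready_cycle is the first cycle the result could be written.
    When several results are ready in the same cycle, the one that
    issued earliest wins and the others retry on the next cycle.
    """
    pending = dict(results)
    write_cycle = {}
    if not pending:
        return write_cycle
    cycle = min(ready for _, ready in pending.values())
    while pending:
        ready_now = [n for n, (_, ready) in pending.items() if ready <= cycle]
        if ready_now:
            winner = min(ready_now, key=lambda n: pending[n][0])
            write_cycle[winner] = cycle
            del pending[winner]
        cycle += 1
    return write_cycle
```

With results = {"L.D": (3, 7), "ADD.D": (2, 7), "DADDIU": (5, 8)}, the ADD.D writes in cycle 7 because it issued first, the L.D is pushed to cycle 8, and the DADDIU to cycle 9.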
Here is a completed solution.
The unscheduled unrolled loop took 27 cycles.
The scheduled unrolled loop took 14 cycles.
What advantage does Tomasulo's algorithm have over the scheduled unrolled loop?
Today's News: November 18, 2015
Tomasulo Algorithm Summary
- Instruction issue requires only an available reservation station
- Starting execution of an instruction requires:
  - the instruction has issued but not started execution
  - a functional unit is available
  - the operands are available
  - all previous branches have completed
- Memory access on loads may take multiple cycles (depending on whether the access hits in the cache)
- Results are written to the CDB when it is available (at most one per cycle)
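The conditions for starting execution in the summary above can be written as a single check. This is a sketch under assumptions: can_start_execution and its parameter names are hypothetical, operand availability is modeled as both Q tags being 0, and pending branches are modeled as a simple count.

```python
def can_start_execution(issued, started, unit_free, qj, qk, unresolved_branches):
    """Return True when every start-of-execution condition in the summary holds."""
    return (
        issued and not started          # issued but not yet started execution
        and unit_free                   # functional unit available
        and qj == 0 and qk == 0         # both operands available (no pending tags)
        and unresolved_branches == 0    # all previous branches completed
    )
```

An instruction that fails the check in one cycle simply stays in its reservation station and is checked again in the next cycle.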