We assume that everything in Figure 4.23 is combinational logic except for:
The PC
The Register File
The CC register
The data memory
All of the above are controlled by a single clock and store values
only when that clock goes high.
Each time the clock goes high one instruction is executed.
Basic idea:
When the clock goes high, a new value is stored in the PC.
After some propagation delay, the outputs of the instruction memory
are available.
These go through additional logic to provide the inputs to the ALU
and the read inputs of the register file.
After an additional delay, the read inputs to the Data Memory are correct.
We assume that reading from the data memory is combinational, so after
an additional propagation delay, the outputs of the data memory are valid.
At this point the write-value and write-address inputs to the register file
and the data memory are valid.
On the next rising edge of the clock, values are stored in the PC,
and if necessary in the Register File, Data Memory, and condition code
register.
The clock speed is determined by the longest data path through the combinational
logic for the most complicated instruction.
This is the motivation behind RISC design:
the speed of the simplest instructions is limited by the speed of the most
complicated instruction.
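A small sketch of this point. The delay numbers are invented for illustration only (they are not measurements of any Y86 implementation), and min_clock_period is a made-up helper name.

```python
# Hypothetical propagation delays (arbitrary units) through the
# combinational logic for a few instructions. Invented for illustration.
path_delay = {
    "nop": 3,
    "OPl": 7,
    "jXX": 5,
    "mrmovl": 10,
}

def min_clock_period(delays):
    """Every instruction gets exactly one cycle, so the cycle must fit the slowest."""
    return max(delays.values())

period = min_clock_period(path_delay)

# Time each instruction spends waiting because the clock cannot be shortened.
slack = {name: period - d for name, d in path_delay.items()}
print(period, slack)
```

With these assumed numbers, the simplest instruction spends most of every cycle idle.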
An example: Tracing OPl rA, rB
Fetch
Look at Figure 4.27.
6 bytes are read from the instruction memory (even though only 2 will be used for this instruction).
The first byte is split into the iCode and iFun values. iCode is 6 in this case and iFun is one of 0, 1, 2, or 3.
The second byte of the instruction is divided into rA and rB.
Note that for every instruction that uses register operands, rA and rB are always in the second byte.
iCode is used to generate the control lines Need ValC and Need regids.
These in turn generate the number of bytes needed for this instruction: 1, 2, 5, or 6.
This gets fed into the incrementer
(which is really an adder, which adds 1, 2, 5, or 6 to the PC).
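The byte splitting in the fetch stage can be sketched as follows. This is only an illustration: split_instruction is a made-up name, and the example bytes assume the textbook encoding in which register number 0 is %eax and 3 is %ebx.

```python
def split_instruction(six_bytes):
    """Split the first two of the six fetched bytes into their 4-bit fields."""
    b0, b1 = six_bytes[0], six_bytes[1]
    icode, ifun = b0 >> 4, b0 & 0xF   # high and low nibble of byte 0
    rA, rB = b1 >> 4, b1 & 0xF        # high and low nibble of byte 1
    return icode, ifun, rA, rB

# Assumed example: bytes 0x60 0x03, followed by four bytes that belong to
# whatever comes next in memory (zeros here). They are fetched but ignored.
fetched = bytes([0x60, 0x03, 0, 0, 0, 0])
icode, ifun, rA, rB = split_instruction(fetched)
print(icode, ifun, rA, rB)   # 6 0 0 3
```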
Decode
Look at Figure 4.28.
valA and valB are available from the register file.
Execute
Look at Figure 4.29.
iFun determines which ALU function is used.
In this case iCode selects valA and valB as the ALU inputs.
Condition codes are also set if necessary.
Memory: nothing for this instruction
Write Back
The result from the ALU is stored in rB.
Look at Figure 4.28. rB is fed into dstE and the output of the ALU is valE.
PC update
See Figure 4.23.
Because this is an OPl instruction, iCode selects the valP input
to become newPC.
Note: Each instruction is executed in a single cycle.
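The whole trace can be mimicked in a few lines of Python. This is a sketch under assumptions, not the hardware: the mapping of iFun 0 to 3 onto add, subtract, and, xor follows the textbook encoding, only the ZF and SF condition codes are modeled (OF is left out to keep it short), and step_opl is a made-up name.

```python
MASK = 0xFFFFFFFF  # Y86 words are 32 bits

# Assumed iFun encoding: 0 add, 1 subtract (rB - rA), 2 and, 3 xor.
ALU = {
    0: lambda a, b: b + a,
    1: lambda a, b: b - a,
    2: lambda a, b: b & a,
    3: lambda a, b: b ^ a,
}

def step_opl(state, ifun, rA, rB):
    """Return the next state after one OPl; the old state is not modified."""
    regs, pc = state["regs"], state["pc"]
    valP = pc + 2                          # fetch: this instruction is 2 bytes
    valA, valB = regs[rA], regs[rB]        # decode
    valE = ALU[ifun](valA, valB) & MASK    # execute
    zf, sf = valE == 0, bool(valE >> 31)   # condition codes (OF omitted)
    new_regs = list(regs)
    new_regs[rB] = valE                    # write back: dstE is rB
    return {"regs": new_regs, "pc": valP, "zf": zf, "sf": sf}  # PC update

state = {"regs": [5, 0, 0, 7, 0, 0, 0, 0], "pc": 0x100}
nxt = step_opl(state, ifun=1, rA=0, rB=3)   # register 3 becomes 7 - 5
print(nxt["regs"][3], hex(nxt["pc"]))
```

Returning a fresh state instead of updating in place mirrors the idea that nothing is stored until the next rising edge.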
Today's News: April 18
Don't forget the course evaluations
Question:
Design the circuit that calculates the value to be added to the PC
based on the values of Need ValC and Need regids
Answer:
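One possible answer, sketched in Python rather than drawn as a circuit (the function names are made up for this sketch). The increment is 1 for the opcode byte, plus 1 if the register byte is needed, plus 4 if valC is needed. Written out in binary, the four possible results are 001, 010, 101, and 110, so the three output bits need only one inverter.

```python
def pc_increment(need_valC, need_regids):
    """Arithmetic view: 1 opcode byte + optional register byte + optional 4-byte valC."""
    return 1 + (1 if need_regids else 0) + (4 if need_valC else 0)

def pc_increment_bits(need_valC, need_regids):
    """Gate-level view of the same 3-bit value."""
    bit2 = bool(need_valC)        # the 4s place
    bit1 = bool(need_regids)      # the 2s place
    bit0 = not need_regids        # the 1s place: a single inverter
    return (int(bit2) << 2) | (int(bit1) << 1) | int(bit0)

for v in (False, True):
    for r in (False, True):
        print(v, r, pc_increment(v, r), pc_increment_bits(v, r))
```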
Question:
Design the circuit that calculates Need ValC and Need regids from the
value of iCode.
Answer:
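One possible answer, as a Python sketch rather than a circuit. Only the iCode values 4, 6, and 7 appear in these notes; the other values below are assumed from the textbook's Y86 encoding. In hardware each function would be an OR over the decoded iCode values in its set.

```python
# iCode values. 4, 6, and 7 appear in these notes; the rest are assumed.
IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL = 2, 3, 4, 5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 6, 7, 8, 9, 0xA, 0xB

def need_regids(icode):
    """True when the instruction has a register byte (rA and rB)."""
    return icode in {IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL, IOPL, IPUSHL, IPOPL}

def need_valC(icode):
    """True when the instruction carries a 4-byte constant."""
    return icode in {IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL}

for name, code in [("rmmovl", IRMMOVL), ("OPl", IOPL), ("jXX", IJXX)]:
    print(name, need_valC(code), need_regids(code))
```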
Interrupts
This material is not in the book.
External hardware devices need a way to interrupt the normal execution of
the instruction cycles.
Some examples of devices that need to do this:
hardware timers: used for keeping track of the time
keyboard: notify the CPU that a key has been pushed
mouse
disk drive: requested data is available
MMU: invalid memory reference or page fault
How this works:
The CPU chip has one or more pins for interrupts.
At the end of each instruction cycle, the pins are checked.
If interrupts are enabled and one of these pins is active
an interrupt occurs.
The registers are saved and an interrupt handler is called.
After the handler completes, the registers are restored and the
program resumes from where it left off.
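A toy model of this sequence. Everything here is hypothetical (a one-register machine, instructions as Python functions, an interrupt "pin" modeled as a set of instruction counts); it only shows the order of events: finish the instruction, check the pin, save, handle, restore, resume.

```python
def run(program, interrupt_after, handler, interrupts_enabled=True):
    """Run `program`, checking for an interrupt at the end of each instruction."""
    regs = {"acc": 0}
    pc = 0
    while pc < len(program):
        program[pc](regs)                 # execute one instruction
        pc += 1
        if interrupts_enabled and pc in interrupt_after:
            saved_regs, saved_pc = dict(regs), pc   # save the registers
            handler(regs)                           # call the handler
            regs, pc = saved_regs, saved_pc         # restore and resume
    return regs

def inc(regs):
    regs["acc"] += 1

handled = []

def timer_handler(regs):
    regs["acc"] = 999          # the handler may clobber registers freely
    handled.append("tick")

final = run([inc, inc, inc], interrupt_after={2}, handler=timer_handler)
print(final, handled)
```

The interrupted program never sees the handler's change to acc, because the registers are restored before it resumes.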
The Instruction Cycle
The traditional instruction cycle:
Fetch instruction
Increment PC
Decode instruction
Execute instruction
Handle Interrupts
Note that the fetch, increment PC, and decode may be implemented in several steps:
Fetch the opcode
Increment the PC
Decode the opcode and determine the number of arguments
Fetch the arguments, incrementing the PC as we go
Decode the arguments
Today's News: April 21
Today is the last day for the course evaluations
How the traditional instruction cycle relates to the Y86 instruction cycle:
Fetch:
We always fetch 6 bytes, even if the instruction is 1, 2, or 5 bytes
Instead of incrementing the PC, we set valP to the new value of the PC based on the number of bytes of the instruction.
Decode: traditional
Execute: The traditional execute has 3 phases in the Y86. First,
new values are calculated.
Memory: Values are read from or written to memory
Write back: The register file is updated
PC update: The PC is updated using valP
Why is it done this way in the Y86?
The Y86 executes one instruction per cycle.
Each cycle can have only one change to sequential logic, since all use a positive edge trigger.
Instruction memory is combinational: a change to the PC changes the
instruction coming out,
so we must fetch the whole instruction before
we know how many bytes it is.
Everything is combinational logic except for:
Condition codes
Data Memory
Register File
PC register
Fetch, Decode, Execute, and Memory read are all done together
with combinational logic
Memory write, Write Back, and PC update are all done at the same time
at the beginning of the next clock cycle.
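The split between "compute with combinational logic" and "store on the clock edge" can be sketched like this. The instruction used (a push of %eax, which is not traced in these notes) is assumed from the textbook semantics, and the function names and state layout are made up.

```python
def combinational(state):
    """Compute, from the current state only, every write the next edge will do."""
    regs, pc = state["regs"], state["pc"]
    valA, valB = regs["eax"], regs["esp"]    # decode
    valE = valB - 4                          # execute
    return {
        "reg_writes": {"esp": valE},         # write back
        "mem_writes": {valE: valA},          # memory write
        "new_pc": pc + 2,                    # PC update
    }

def rising_edge(state, pending):
    """All stored values change together; the old state is left untouched."""
    return {
        "regs": {**state["regs"], **pending["reg_writes"]},
        "mem": {**state["mem"], **pending["mem_writes"]},
        "pc": pending["new_pc"],
    }

before = {"regs": {"eax": 7, "esp": 0x200}, "mem": {}, "pc": 0x10}
after = rising_edge(before, combinational(before))
print(after)
```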
Example: Tracing jxx Dest
Fetch
Look at Figure 4.27.
6 bytes are read from the instruction memory
(even though only 5 will be used for this instruction).
The first byte is split into the iCode and iFun values. iCode is 7 in this case and iFun determines the type of jump.
The next 4 bytes are the value of valC which is the destination address
if the jump is taken.
In this case
Need ValC is true and Need regids is false,
which will feed 5 into the PC incrementer.
Decode
Nothing needs to be done here.
Execute
Look at Figure 4.29.
Cnd is generated from the values of CC and iFun to determine
whether the jump will take place.
Memory: nothing for this instruction
Write Back: nothing to do for this instruction
PC update
If Cnd is true, PC is set to valC, otherwise to valP.
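A sketch of the Cnd logic and the PC selection. The mapping from iFun to jump type and the condition for each jump are assumed from the textbook encoding (they are not listed in these notes), and the function names are made up.

```python
def cond(ifun, zf, sf, of):
    """Cnd as a function of the condition codes and iFun (assumed encoding)."""
    table = {
        0: True,                      # jmp: always
        1: (sf != of) or zf,          # jle
        2: sf != of,                  # jl
        3: zf,                        # je
        4: not zf,                    # jne
        5: sf == of,                  # jge
        6: (sf == of) and not zf,     # jg
    }
    return table[ifun]

def new_pc_jxx(pc, valC, ifun, zf, sf, of):
    """PC update for a jump: valC if taken, otherwise valP = PC + 5."""
    valP = pc + 5
    return valC if cond(ifun, zf, sf, of) else valP

# je with ZF set is taken; je with ZF clear falls through.
print(hex(new_pc_jxx(0x20, 0x80, 3, True, False, False)))
print(hex(new_pc_jxx(0x20, 0x80, 3, False, False, False)))
```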
Today's News: April 23
Thank you for filling out the evaluation forms
Example: Tracing rmmovl rA, D(rB)
Fetch
Look at Figure 4.27.
6 bytes are read from the instruction memory (in this case all are used).
iCode is 4 and iFun is not used.
valC is set, as well as valP = PC+6.
Decode is like OPl:
Look at Figure 4.28.
valA and valB are set.
Execute
Look at Figure 4.29.
iCode determines which ALU function is used (add).
valE = valB + valC
Memory:
Look at Figure 4.30.
valE is the memory address and valA is the data.
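The stages traced so far can be put together in a short sketch. Assumptions: valC is stored little-endian in bytes 2 to 5, memory is modeled as a dictionary from address to 32-bit word, register number 0 is %eax and 3 is %ebx, and step_rmmovl is a made-up name.

```python
def step_rmmovl(regs, mem, pc, instr):
    """Trace one rmmovl rA, D(rB); returns the new memory and the new PC."""
    icode = instr[0] >> 4
    assert icode == 4, "not an rmmovl instruction"
    rA, rB = instr[1] >> 4, instr[1] & 0xF         # fetch: register byte
    valC = int.from_bytes(instr[2:6], "little")    # fetch: displacement D
    valP = pc + 6                                  # fetch: all 6 bytes used
    valA, valB = regs[rA], regs[rB]                # decode
    valE = (valB + valC) & 0xFFFFFFFF              # execute: ALU adds
    new_mem = dict(mem)
    new_mem[valE] = valA                           # memory: address valE, data valA
    return new_mem, valP

# Assumed example: store register 0 at displacement 8 from register 3.
instr = bytes([0x40, 0x03, 0x08, 0x00, 0x00, 0x00])
regs = [0x1234, 0, 0, 0x100, 0, 0, 0, 0]
mem, new_pc = step_rmmovl(regs, {}, 0x20, instr)
print(mem, hex(new_pc))
```

Masking valE to 32 bits lets a negative displacement (stored in two's complement) wrap around to the intended address.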