Section 4.1: The Y86 Instruction Set Architecture
We will look at an assembly-language instruction set that is simpler than IA32 but similar to it,
and try to understand how it is encoded and how you would build hardware to implement it.
Section 4.1.1: Visible State
The Y86 has
- eight 32-bit registers with the same names as the IA32 32-bit registers
- 3 condition codes: ZF, SF, OF (no carry flag - interpret integers as signed)
- a program counter (PC)
- a program status byte: AOK, HLT, ADR, INS
- memory: up to 4 GB to hold program and data
The Y86 does not have:
- a carry flag
- floating point registers or instructions
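As a rough illustration of the visible state listed above, here is a minimal C sketch of how a simulator might hold it. The names y86_state, y86_new, and y86_free are hypothetical, not from the text, and the memory size is a parameter because a simulator would rarely allocate the full 4 GB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the programmer-visible state of a Y86 machine. */
typedef struct {
    uint32_t reg[8];     /* eight 32-bit registers */
    uint8_t  zf, sf, of; /* condition codes; there is no carry flag */
    uint32_t pc;         /* program counter */
    uint8_t  stat;       /* program status byte: AOK, HLT, ADR, or INS */
    uint8_t *mem;        /* byte-addressable memory for program and data */
    uint32_t mem_size;   /* bytes actually allocated (assumption: well under 4 GB) */
} y86_state;

/* Allocate a state with mem_size zeroed bytes of memory; returns NULL on failure. */
y86_state *y86_new(uint32_t mem_size) {
    assert(mem_size > 0); /* a machine with no memory cannot hold a program */
    y86_state *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->mem = calloc(mem_size, 1);
    if (s->mem == NULL) {
        free(s);
        return NULL;
    }
    s->mem_size = mem_size;
    return s;
}

/* Release a state created by y86_new; passing NULL is allowed. */
void y86_free(y86_state *s) {
    if (s != NULL) {
        free(s->mem);
        free(s);
    }
}
```

Everything starts at zero here, which is a choice made for the sketch rather than something the notes specify.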
Section 4.1.2 and 4.1.3: Y86 Instructions and encoding
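The following is Figure 4.2 of the text. The figure's columns are bytes 0 to 5. In the rows below,
each single-digit, fn, or register cell is one 4-bit nibble (two per byte), and V, D, and Dest each occupy 4 bytes.
Instruction | encoding, starting at byte 0 |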
halt | 0 | 0 |
nop | 1 | 0 |
rrmovl rA, rB | 2 | 0 | rA | rB |
irmovl V, rB | 3 | 0 | F | rB | V |
rmmovl rA, D(rB) | 4 | 0 | rA | rB | D |
mrmovl D(rB), rA | 5 | 0 | rA | rB | D |
OP1 rA, rB | 6 | fn | rA | rB |
jXX Dest | 7 | fn | Dest |
cmovXX rA, rB | 2 | fn | rA | rB |
call Dest | 8 | 0 | Dest |
ret | 9 | 0 |
pushl rA | A | 0 | rA | F |
popl rA | B | 0 | rA | F |
Note the following:
- rA or rB represent one of the registers, encoded as follows:
number | register |
0 | %eax |
1 | %ecx |
2 | %edx |
3 | %ebx |
4 | %esp |
5 | %ebp |
6 | %esi |
7 | %edi |
F | no register |
- different opcodes for 4 types of moves:
- register to register
- immediate to register
- register to memory
- memory to register
- The only memory addressing mode is base register + displacement
- Memory operations always move 4 bytes (no byte or word memory operations)
- The source or destination of a memory move must be a register.
- The operations supported (OP1) are:
fn | operation |
0 | addl |
1 | subl |
2 | andl |
3 | xorl |
- There is no or operation and no not operation.
- These take only registers as operands and work only on 32-bit values.
- 7 jump instructions:
fn | jump |
0 | jmp |
1 | jle |
2 | jl |
3 | je |
4 | jne |
5 | jge |
6 | jg |
- 6 conditional move instructions (cmovXX), with fn encodings that match the conditional jump instructions.
These are similar to the IA32 conditional moves.
Note that rrmovl is the special case with fn = 0: an unconditional move.
- Note that you can tell the type of instruction and how many bytes it has just by looking
at the first byte of the instruction.
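To make the last point concrete, here is a small C sketch that returns an instruction's length from its first byte, using the byte counts implied by the table above. The function names y86_insn_len and y86_next_pc are made up for this example, and the sketch checks only the high nibble (the instruction code), not whether fn is valid.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of the instruction whose first byte is b0,
   or 0 if the high nibble is not one of the codes 0 through B. */
int y86_insn_len(uint8_t b0) {
    switch (b0 >> 4) {
    case 0x0: case 0x1: case 0x9:
        return 1; /* halt, nop, ret: one byte only */
    case 0x2: case 0x6: case 0xA: case 0xB:
        return 2; /* rrmovl/cmovXX, OP1, pushl, popl: plus a register byte */
    case 0x7: case 0x8:
        return 5; /* jXX, call: plus a 4-byte Dest */
    case 0x3: case 0x4: case 0x5:
        return 6; /* irmovl, rmmovl, mrmovl: register byte plus 4-byte V or D */
    default:
        return 0; /* not a Y86 instruction code */
    }
}

/* Address of the following instruction; the caller must pass a valid first byte. */
uint32_t y86_next_pc(uint32_t pc, uint8_t b0) {
    int len = y86_insn_len(b0);
    assert(len > 0);
    return pc + (uint32_t)len;
}
```

For example, a first byte of 0x40 (rmmovl) gives a length of 6, so an instruction at address 0x100 is followed by one at 0x106.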
Instruction encoding examples:
- rrmovl %eax, %ecx
The 4 nibbles are 2 0 0 1, so this would be stored in 2 bytes of memory, the first containing 0x20 and the
second byte containing 0x01
Understand the ordering of the bytes and what it means.
- rmmovl %ecx, 24(%ebp)
The first 4 nibbles are 4 0 1 5 and the displacement is 24.
The first 2 bytes of memory would contain 0x40 and 0x15.
The displacement 24 is decimal, which is 0x18 in hex. Because the displacement is stored little endian,
the next byte would be 0x18, followed by 3 bytes of 0.
Note: encodings of Y86 are simpler than the IA32, but not as compact.
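The two encoding examples can be checked with a short C sketch. The helper names (pack, put_le32, encode_rrmovl, encode_rmmovl) are hypothetical and exist only for this illustration. The displacement is written least significant byte first, so decimal 24 is stored as the byte 0x18 followed by three zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 4-bit fields into one byte, high nibble first. */
static uint8_t pack(unsigned hi, unsigned lo) {
    assert(hi <= 0xF && lo <= 0xF);
    return (uint8_t)((hi << 4) | lo);
}

/* Store a 32-bit value least significant byte first (little endian). */
static void put_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* rrmovl rA, rB: writes 2 bytes to out and returns the length. */
size_t encode_rrmovl(uint8_t *out, unsigned ra, unsigned rb) {
    out[0] = pack(0x2, 0x0);
    out[1] = pack(ra, rb);
    return 2;
}

/* rmmovl rA, D(rB): writes 6 bytes to out and returns the length. */
size_t encode_rmmovl(uint8_t *out, unsigned ra, int32_t d, unsigned rb) {
    out[0] = pack(0x4, 0x0);
    out[1] = pack(ra, rb);
    put_le32(out + 2, (uint32_t)d); /* displacement follows the register byte */
    return 6;
}
```

Calling encode_rrmovl(buf, 0, 1) fills buf with 0x20 0x01, and encode_rmmovl(buf, 1, 24, 5) fills it with 0x40 0x15 0x18 0x00 0x00 0x00, where 0, 1, and 5 are the register numbers for %eax, %ecx, and %ebp.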
Today's News: March 26
Second exam on Wednesday of next week.
RISC and CISC
RISC = reduced instruction set computer
CISC = complex instruction set computer
Basic ideas of RISC design:
- A small number of instructions
- Most instructions have the same length
- Simple addressing formats
- Arithmetic and logical operations only work on registers
- Memory operations only move between register and memory.
- No condition codes: test instructions store result in registers
Which is IA32?
Which is Y86?
Which is better: RISC or CISC?
Answer: a combination
Section 4.1.4: Y86 Exceptions
What happens when an invalid assembly instruction is found?
This generates an exception.
In Y86, an exception halts the machine: it stops executing.
On a real system, this would be handled by the OS and only the
current process would be terminated.
What are some possible causes of exceptions?
- Invalid operation
- Divide by 0
- sqrt of negative number
- memory access error (address too large)
- hardware error
Y86 handles 3 types of exceptions:
- HLT instruction executed
- Invalid address encountered
- Invalid instruction encountered.
In each case the status is set to the corresponding code (HLT, ADR, or INS).
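Here is a simplified C sketch of how a simulator might pick the status when it fetches an instruction. The status names come from the notes above; the numeric values 1 to 4 and the function names are assumptions made for this example. The sketch looks only at the first byte, so it does not check the fn nibble or whether the whole instruction fits in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status names from the notes; the numeric values are an assumption. */
typedef enum { STAT_AOK = 1, STAT_HLT, STAT_ADR, STAT_INS } y86_stat;

/* Status that results from fetching the instruction at pc
   from a memory of mem_size bytes. */
y86_stat y86_fetch_status(const uint8_t *mem, uint32_t mem_size, uint32_t pc) {
    assert(mem != NULL);
    if (pc >= mem_size)
        return STAT_ADR;           /* invalid address encountered */
    unsigned icode = mem[pc] >> 4; /* high nibble of the first byte */
    if (icode == 0x0)
        return STAT_HLT;           /* halt instruction executed */
    if (icode > 0xB)
        return STAT_INS;           /* invalid instruction encountered */
    return STAT_AOK;               /* normal operation */
}

/* Printable name for a status value. */
const char *y86_stat_name(y86_stat s) {
    switch (s) {
    case STAT_AOK: return "AOK";
    case STAT_HLT: return "HLT";
    case STAT_ADR: return "ADR";
    case STAT_INS: return "INS";
    }
    return "?";
}
```

A simulator loop would keep running while the status is AOK and stop on any of the other three.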
Section 4.1.5: Y86 Examples
The Sum function sums an integer array:
int Sum(int *Start, int Count) {
int sum = 0;
while (Count) {
sum += *Start;
Start++;
Count--;
}
return sum;
}
Here is the function in Y86 from the book:
Sum:
pushl %ebp // standard setup
rrmovl %esp, %ebp // standard setup
mrmovl 8(%ebp), %ecx // ecx = Start
mrmovl 12(%ebp),%edx // edx = Count
xorl %eax, %eax // eax = 0
andl %edx, %edx // same as IA32 testl %edx, %edx
je End
Loop:
mrmovl (%ecx), %esi // 2 instructions to add *Start to sum
addl %esi, %eax // IA32: addl (%ecx), %eax
irmovl $4, %ebx // 2 instructions to add 4 to Start
addl %ebx, %ecx // IA32: addl $4, %ecx
irmovl $-1, %ebx // 2 instructions to decrement Count
addl %ebx, %edx // IA32: subl $1, %edx
jne Loop // continue unless count is 0
End:
rrmovl %ebp, %esp // clean up for return
popl %ebp
ret
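The Y86 code above relies on andl and addl setting the condition codes that je and jne then test. The following C sketch models that behavior. The type and function names are hypothetical, and the jump conditions are written in terms of ZF, SF, and OF in the same way as the corresponding IA32 signed jumps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, sf, of; } y86_cc;

/* andl: ZF and SF come from the result; a logical operation clears OF. */
uint32_t y86_andl(uint32_t a, uint32_t b, y86_cc *cc) {
    uint32_t r = a & b;
    cc->zf = (r == 0);
    cc->sf = (r >> 31) != 0;
    cc->of = false;
    return r;
}

/* addl: OF is set when both operands share a sign that the result lacks. */
uint32_t y86_addl(uint32_t a, uint32_t b, y86_cc *cc) {
    uint32_t r = a + b; /* wraps modulo 2^32 */
    cc->zf = (r == 0);
    cc->sf = (r >> 31) != 0;
    cc->of = ((~(a ^ b) & (a ^ r)) >> 31) != 0;
    return r;
}

/* Is the jump with function code fn taken? fn follows the jump table: 0 jmp ... 6 jg. */
bool y86_cond(unsigned fn, y86_cc cc) {
    bool lt = (cc.sf != cc.of); /* signed less-than */
    switch (fn) {
    case 0: return true;            /* jmp */
    case 1: return lt || cc.zf;     /* jle */
    case 2: return lt;              /* jl  */
    case 3: return cc.zf;           /* je  */
    case 4: return !cc.zf;          /* jne */
    case 5: return !lt;             /* jge */
    case 6: return !lt && !cc.zf;   /* jg  */
    default:
        assert(!"invalid jump function code");
        return false;
    }
}
```

With Count equal to 0, y86_andl(0, 0, &cc) sets ZF, so je End is taken before the loop starts. Inside the loop, adding -1 to a Count of 1 gives 0 and sets ZF, so jne Loop falls through.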
Question:
Can you optimize this code by reducing the number of instructions in the loop?
Answer:
IA32 add in Y86
Here is the IA32 code for Sum generated by our compiler with the names of the labels changed:
Sum:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx // ecx = Start
movl 12(%ebp), %edx // edx = Count
movl $0, %eax // eax = 0
testl %edx, %edx // see if Count == 0
je End
Loop:
addl (%ecx), %eax // add *Start to sum
addl $4, %ecx // increment Start by 4
subl $1, %edx // decrement Count
jne Loop // continue unless count is 0
End:
popl %ebp
ret
Note: This is not much different from the Y86 code.
Question:
Can you do the same type of optimization on the above IA32 code?
Answer:
However, consider the following implementation of Sum:
int Sum(int Start[], int Count) {
int sum = 0;
int i;
for (i=0; i<Count; i++)
sum += Start[i];
return sum;
}
Here is the IA32 code generated by this array implementation:
Sum:
pushl %ebp
movl %esp, %ebp
pushl %ebx
movl 8(%ebp), %ebx // array start into %ebx (does not change)
movl 12(%ebp), %ecx // Count into %ecx (does not change)
movl $0, %eax // eax is sum to return (could use xorl)
movl $0, %edx // edx is i
testl %ecx, %ecx
jle .L3
.L6:
addl (%ebx,%edx,4), %eax // How many Y86 instructions to do this?
addl $1, %edx // increment i (could use incl)
cmpl %edx, %ecx // compare Count to i
jg .L6
.L3:
popl %ebx
popl %ebp
ret
Note that this would be harder to translate into Y86 since it uses
scaled addressing. Y86 does not have shift or multiply.
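One way to get the effect of scaled addressing without shift or multiply is to double the index twice by adding it to itself. The C sketch below mirrors that idea one step per Y86 instruction. The function names are hypothetical, it assumes 4-byte ints, and it is only one possible translation: under this sketch the single IA32 addl (%ebx,%edx,4), %eax becomes six Y86 instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load element i of an int array using only a copy, additions, and one load,
   which are all operations Y86 has. Assumes 4-byte ints. */
int load_scaled(const int *start, uint32_t i) {
    assert(sizeof(int) == 4);
    uint32_t off = i; /* rrmovl: copy i into a scratch register */
    off += off;       /* addl: off = 2 * i */
    off += off;       /* addl: off = 4 * i */
    const unsigned char *addr = (const unsigned char *)start + off; /* addl: base + offset */
    int value;
    memcpy(&value, addr, sizeof value); /* mrmovl: load 4 bytes from the address */
    return value;
}

/* Array version of Sum built on load_scaled. */
int sum_scaled(const int *start, int count) {
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += load_scaled(start, (uint32_t)i); /* addl: add the loaded value to sum */
    return sum;
}
```

For an array holding 1, 2, 3, 4, load_scaled with i = 2 returns 3 and sum_scaled with a count of 4 returns 10.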
Question:
Can you do the same type of optimization on the above IA32 code?
Answer:
Section 4.1.6: Y86 Details
We will skip this section.