CS 3843 Computer Organization Notes on Chapter 4, Section 4.1

Section 4.1: The Y86 Instruction Set Architecture

We will look at an assembly language instruction set simpler than but similar to IA32 and try to understand how it is encoded and how you would build hardware to implement it.

Section 4.1.1: Visible State

The Y86 has

eight 32-bit registers with the same names as the IA32 32-bit registers
3 condition codes: ZF, SF, OF (no carry flag - interpret integers as signed)
a program counter (PC)
a program status byte: AOK, HLT, ADR, INS
memory: up to 4 GB to hold program and data

The Y86 does not have:

a carry flag
floating point registers or instructions
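
The visible state listed above can be sketched as a C struct. This is only an illustration: the type and field names (y86_state, y86_reset, and so on) are hypothetical, the numeric values of the status codes are arbitrary here, and the choice to start with all condition codes cleared is an assumption rather than something stated in these notes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Register numbers in the order used by the Y86 register encoding. */
enum y86_reg { R_EAX, R_ECX, R_EDX, R_EBX, R_ESP, R_EBP, R_ESI, R_EDI, R_COUNT };

/* Program status values; the numeric values are arbitrary in this sketch. */
enum y86_status { STAT_AOK, STAT_HLT, STAT_ADR, STAT_INS };

struct y86_state {
    uint32_t reg[R_COUNT];   /* eight 32-bit registers */
    bool zf, sf, of;         /* condition codes (no carry flag) */
    uint32_t pc;             /* program counter */
    enum y86_status stat;    /* program status */
    uint8_t *mem;            /* memory holding program and data */
    size_t mem_size;         /* bytes actually backed by mem */
};

/* Put the machine in a known starting state (assumed, not from the text). */
void y86_reset(struct y86_state *s, uint8_t *mem, size_t mem_size)
{
    memset(s, 0, sizeof *s);
    s->stat = STAT_AOK;
    s->mem = mem;
    s->mem_size = mem_size;
}
```

Everything a Y86 program can observe or change lives in this one struct, which is what "visible state" means.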

Section 4.1.2 and 4.1.3: Y86 Instructions and encoding

The following is Figure 4.2 of the text.

Byte	0		1		2	3	4	5
`halt`	0	0
`nop`	1	0
`rrmovl` rA, rB	2	0	rA	rB
`irmovl` V, rB	3	0	F	rB	V
`rmmovl` rA, D(rB)	4	0	rA	rB	D
`mrmovl` D(rB), rA	5	0	rA	rB	D
`OP1` rA, rB	6	fn	rA	rB
`jXX` Dest	7	fn	Dest
`cmovXX` rA, rB	2	fn	rA	rB
`call` Dest	8	0	Dest
`ret`	9	0
`pushl` rA	A	0	rA	F
`popl` rA	B	0	rA	F

Note the following:

rA or rB represent one of the registers, encoded as follows:

number	register
0	`%eax`
1	`%ecx`
2	`%edx`
3	`%ebx`
4	`%esp`
5	`%ebp`
6	`%esi`
7	`%edi`
F	no register

different opcodes for 4 types of moves:
- register to register
- immediate to register
- register to memory
- memory to register
The only memory addressing mode is base register + displacement
Memory operations always move 4 bytes (no byte or word memory operations)
source or destination of memory move must be a register.
The operations supported (OP1) are:

fn	operation
0	`addl`
1	`subl`
2	`andl`
3	`xorl`

These take only registers as operands and work only on 32 bits; there is no or and no not instruction.
7 jump instructions:

fn	jump
0	`jmp`
1	`jle`
2	`jl`
3	`je`
4	`jne`
5	`jge`
6	`jg`

6 conditional move instructions (cmovle, cmovl, cmove, cmovne, cmovge, cmovg) with fn encodings matching the corresponding conditional jumps.
These are similar to the IA32 conditional moves.
Note that rrmovl is a special case: it shares opcode 2 with cmovXX and uses fn = 0, the unconditional move.
Note that you can tell the type of instruction and how many bytes it has just by looking at the first byte of the instruction.
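
The last note can be made concrete with a small C sketch that maps the first byte of an instruction to its length, using the layouts in Figure 4.2. The function name y86_instr_length is hypothetical, and as a simplification it looks only at the high nibble (the opcode) and does not check that the fn nibble is valid.

```c
#include <stdint.h>

/* Length in bytes of the Y86 instruction whose first byte is first_byte.
   Returns 0 if the high nibble is not an opcode in Figure 4.2.
   Simplification: the fn nibble (low nibble) is not validated. */
int y86_instr_length(uint8_t first_byte)
{
    switch (first_byte >> 4) {
    case 0x0: case 0x1: case 0x9:            /* halt, nop, ret */
        return 1;
    case 0x2: case 0x6: case 0xA: case 0xB:  /* rrmovl/cmovXX, OP1, pushl, popl */
        return 2;
    case 0x7: case 0x8:                      /* jXX, call: opcode byte + 4-byte Dest */
        return 5;
    case 0x3: case 0x4: case 0x5:            /* irmovl, rmmovl, mrmovl: + register byte + 4-byte V or D */
        return 6;
    default:
        return 0;
    }
}
```

For example, a first byte of 0x20 (rrmovl) gives 2 and a first byte of 0x40 (rmmovl) gives 6.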

Instruction encoding examples:

rrmovl %eax, %ecx
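The 4 nibbles are 2 0 0 1, so this would be stored in 2 bytes of memory, the first containing 0x20 and the second containing 0x01.
Understand the ordering of the bytes and what it means: the opcode byte comes first (at the lower address), followed by the register byte with rA in the high nibble and rB in the low nibble.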
rmmovl %ecx, 24(%ebp)
The first 4 nibbles are 4 0 1 5 and the displacement is 24 (decimal), which is 0x18.
The first 2 bytes of memory would contain 0x40 and 0x15.
On a little endian machine the next byte would be 0x18, followed by 3 bytes of 0.
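
The rmmovl example can be checked with a short C sketch that builds the 6 bytes. The function name y86_encode_rmmovl is hypothetical. Note that the displacement 24 is a decimal value, so its low-order byte is 0x18.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "rmmovl rA, D(rB)" into out[0..5]; returns the number of bytes.
   rA and rB are register numbers 0..7 from the register table. */
size_t y86_encode_rmmovl(uint8_t rA, uint8_t rB, uint32_t disp, uint8_t out[6])
{
    out[0] = 0x40;                                      /* opcode 4, fn 0 */
    out[1] = (uint8_t)(((rA & 0xF) << 4) | (rB & 0xF)); /* rA high nibble, rB low nibble */
    for (int i = 0; i < 4; i++)                         /* displacement, least significant byte first */
        out[2 + i] = (uint8_t)(disp >> (8 * i));
    return 6;
}
```

Calling y86_encode_rmmovl(1, 5, 24, buf) for rmmovl %ecx, 24(%ebp) fills buf with 0x40 0x15 0x18 0x00 0x00 0x00.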

Note: encodings of Y86 are simpler than the IA32, but not as compact.

Today's News: March 26

Second exam on Wednesday of next week.

RISC and CISC
RISC = reduced instruction set computer
CISC = complex instruction set computer

Basic ideas of RISC design:

A small number of instructions
Most instructions have the same length
Simple addressing formats
Arithmetic and logical operations only work on registers
Memory operations only move between register and memory.
No condition codes: test instructions store result in registers

Which is IA32?
Which is Y86?

Which is better: RISC or CISC?
Answer: a combination

Section 4.1.4: Y86 Exceptions

What happens when an invalid assembly instruction is found?
This generates an exception.
In Y86 an exception halts the machine: it stops executing.
On a real system, this would be handled by the OS and only the current process would be terminated.
What are some possible causes of exceptions?

Invalid operation
Divide by 0
sqrt of negative number
memory access error (address too large)
hardware error

Y86 handles 3 types of exceptions:

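halt instruction executed (status HLT)
Invalid address encountered (status ADR)
Invalid instruction encountered (status INS)

In each case the program status is set to the corresponding code; AOK means normal operation.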
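
As a rough illustration of how the status might be determined when an instruction is fetched, here is a C sketch. The names (y86_fetch_status and the STAT_ constants) are hypothetical, the numeric values of the constants are arbitrary, and the checks are simplified: only the address of the first byte and the opcode nibble are examined.

```c
#include <stddef.h>
#include <stdint.h>

enum y86_status { STAT_AOK, STAT_HLT, STAT_ADR, STAT_INS };

/* Status after trying to fetch the instruction at address pc.
   Simplified: checks only the first byte's address and its opcode nibble. */
enum y86_status y86_fetch_status(const uint8_t *mem, size_t mem_size, uint32_t pc)
{
    if (pc >= mem_size)
        return STAT_ADR;            /* invalid address */
    uint8_t opcode = mem[pc] >> 4;
    if (opcode == 0x0)
        return STAT_HLT;            /* halt instruction */
    if (opcode > 0xB)
        return STAT_INS;            /* no such instruction in Figure 4.2 */
    return STAT_AOK;                /* normal operation */
}
```

A simulator loop would keep executing while the status is STAT_AOK and stop on any other value.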

Section 4.1.5: Y86 Examples

The Sum function sums an integer array:

int Sum(int *Start, int Count) {  
   int sum = 0;
   while (Count) {
      sum += *Start;
      Start++;
      Count--;
   }
   return sum;
}

Here is the function in Y86 from the book:

Sum:
   pushl  %ebp              // standard setup
   rrmovl %esp, %ebp        // standard setup

   mrmovl 8(%ebp), %ecx     // ecx = Start
   mrmovl 12(%ebp),%edx     // edx = Count
   xorl   %eax, %eax        // eax = 0
   andl   %edx, %edx        // same as IA32 testl %edx, %edx
   je     End
Loop:
   mrmovl (%ecx), %esi      // 2 instructions to add *Start to sum
   addl   %esi, %eax        //         IA32: addl (%ecx), %eax
   irmovl $4, %ebx          // 2 instructions to add 4 to Start
   addl   %ebx, %ecx        //         IA32: addl $4, %ecx
   irmovl $-1, %ebx         // 2 instructions to decrement Count
   addl   %ebx, %edx        //         IA32: subl $1, %edx
   jne    Loop              // continue unless count is 0
End:

   rrmovl %ebp, %esp        // clean up for return
   popl   %ebp
   ret
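
The pair andl %edx, %edx / je End in the listing above works through the condition codes. The C sketch below shows how andl would set ZF, SF and OF and how each jXX condition is evaluated from them. The struct and function names are hypothetical; the jump conditions follow the usual signed-comparison rules used by IA32.

```c
#include <stdbool.h>
#include <stdint.h>

struct y86_cc { bool zf, sf, of; };

/* Condition codes after "andl": a logical operation never overflows. */
struct y86_cc y86_cc_after_andl(uint32_t a, uint32_t b)
{
    uint32_t r = a & b;
    struct y86_cc cc = { r == 0, (r >> 31) != 0, false };
    return cc;
}

/* Is the jump with function code fn (0 = jmp ... 6 = jg) taken? */
bool y86_jump_taken(uint8_t fn, struct y86_cc cc)
{
    switch (fn) {
    case 0: return true;                         /* jmp */
    case 1: return (cc.sf != cc.of) || cc.zf;    /* jle */
    case 2: return cc.sf != cc.of;               /* jl  */
    case 3: return cc.zf;                        /* je  */
    case 4: return !cc.zf;                       /* jne */
    case 5: return cc.sf == cc.of;               /* jge */
    case 6: return (cc.sf == cc.of) && !cc.zf;   /* jg  */
    default: return false;                       /* not a valid fn */
    }
}
```

With Count equal to 0, andl sets ZF, so je End is taken and the loop is skipped.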

Question:

Can you optimize this code by reducing the number of instructions in the loop?

Answer:

IA32 add in Y86

Here is the IA32 code for Sum generated by our compiler with the names of the labels changed:

Sum:
        pushl   %ebp
        movl    %esp, %ebp

        movl    8(%ebp), %ecx  // ecx = Start
        movl    12(%ebp), %edx // edx = Count
        movl    $0, %eax       // eax = 0
        testl   %edx, %edx     // see if Count == 0
        je      End            
Loop:
        addl    (%ecx), %eax  // add *Start to sum

        addl    $4, %ecx      // increment Start by 4

        subl    $1, %edx      // decrement Count

        jne     Loop          // continue unless count is 0
End:

        popl    %ebp
        ret

Note: This is not much different from the Y86 code.

Question:

Can you do the same type of optimization on the above IA32 code?

Answer:

However, consider the following implementation of Sum:

int Sum(int Start[], int Count) {  
   int sum = 0;
   int i;
   for (i=0; i<Count; i++)
      sum += Start[i];
   return sum;
}

Here is the IA32 code generated by this array implementation:

Sum:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    8(%ebp), %ebx     // array start into %ebx (does not change)
        movl    12(%ebp), %ecx    // Count into %ecx (does not change)
        movl    $0, %eax          // eax is sum to return (could use xorl)
        movl    $0, %edx          // edx is i
        testl   %ecx, %ecx
        jle     .L3
.L6:
        addl    (%ebx,%edx,4), %eax  // How many Y86 instructions to do this?
        addl    $1, %edx             // increment i (could use incl)
        cmpl    %edx, %ecx           // compare Count to i
        jg      .L6
.L3:
        popl    %ebx
        popl    %ebp
        ret

Note that this would be harder to translate into Y86 since it uses scaled addressing. Y86 does not have shift or multiply.
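
One possible way around the missing scaled addressing is to build 4*i with repeated addition. The C sketch below mirrors such a sequence, with each statement standing for one Y86 instruction. The function name and the choice of a scratch register t are hypothetical; this is one workable sequence, not necessarily the shortest.

```c
#include <stdint.h>

/* Compute base + 4*index using only register moves and adds,
   as a Y86 program would have to. */
uint32_t y86_scaled_addr(uint32_t base, uint32_t index)
{
    uint32_t t = index;  /* rrmovl index, t */
    t += t;              /* addl t, t      : t = 2*index */
    t += t;              /* addl t, t      : t = 4*index */
    t += base;           /* addl base, t   : t = base + 4*index */
    return t;
}
```

After these four instructions, a mrmovl (t), tmp and an addl tmp, %eax would still be needed, so under this approach the single IA32 addl (%ebx,%edx,4), %eax becomes about six Y86 instructions.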

Question:

Can you do the same type of optimization on the above IA32 code?

Answer:

Section 4.1.6: Y86 Details

We will skip this section.
