CS 3843 Computer Organization
Notes on Chapter 4: Section 4.1

Section 4.1: The Y86 Instruction Set Architecture

We will look at an assembly language instruction set that is similar to IA32 but simpler, and try to understand how it is encoded and how you would build hardware to implement it.

Section 4.1.1: Visible State

The Y86 has:

The Y86 does not have:

Sections 4.1.2 and 4.1.3: Y86 Instructions and Encoding

The following is Figure 4.2 of the text.
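Instruction          Byte 0    Byte 1    Bytes 2-5
halt                 0 0
nop                  1 0
rrmovl rA, rB        2 0       rA rB
irmovl V, rB         3 0       F  rB     V
rmmovl rA, D(rB)     4 0       rA rB     D
mrmovl D(rB), rA     5 0       rA rB     D
OPl rA, rB           6 fn      rA rB
jXX Dest             7 fn      Dest (bytes 1-4)
cmovXX rA, rB        2 fn      rA rB
call Dest            8 0       Dest (bytes 1-4)
ret                  9 0
pushl rA             A 0       rA F
popl rA              B 0       rA F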


Note the following:

Instruction encoding examples:
  1. rrmovl %eax, %ecx
    The 4 nibbles are 2 0 0 1 (%eax is register 0 and %ecx is register 1), so this would be stored in 2 bytes of memory, the first containing 0x20 and the second containing 0x01.
    Understand the ordering of the bytes and what it means.
  2. rmmovl %ecx, 24(%ebp)
    The first 4 nibbles are 4 0 1 5 and the displacement is 24 (decimal), which is 0x18.
    The first 2 bytes of memory would contain 0x40 and 0x15.
    On a little endian machine the next byte would be 0x18, followed by 3 bytes of 0.
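
The two encoding examples above can be checked with a small C sketch. This is only an illustration, not code from the text: the function names encode_rrmovl and encode_rmmovl and the enum y86_reg are hypothetical, and the register numbering (%eax = 0, %ecx = 1, ..., %edi = 7) is the one the examples rely on. Note that the displacement 24 is decimal, so its low byte is 0x18.

```c
#include <stddef.h>
#include <stdint.h>

/* Register ids assumed by the examples: %eax=0, %ecx=1, %edx=2, %ebx=3,
   %esp=4, %ebp=5, %esi=6, %edi=7. 0xF means "no register". */
enum y86_reg {
    R_EAX, R_ECX, R_EDX, R_EBX, R_ESP, R_EBP, R_ESI, R_EDI, R_NONE = 0xF
};

/* Encode "rrmovl rA, rB" into out[0..1]; returns the length in bytes. */
size_t encode_rrmovl(uint8_t *out, enum y86_reg rA, enum y86_reg rB)
{
    out[0] = 0x20;                        /* icode 2, ifun 0 */
    out[1] = (uint8_t)((rA << 4) | rB);   /* rA in high nibble, rB in low */
    return 2;
}

/* Encode "rmmovl rA, D(rB)" into out[0..5]; returns the length in bytes. */
size_t encode_rmmovl(uint8_t *out, enum y86_reg rA, int32_t D, enum y86_reg rB)
{
    uint32_t d = (uint32_t)D;
    out[0] = 0x40;                        /* icode 4, ifun 0 */
    out[1] = (uint8_t)((rA << 4) | rB);
    for (int i = 0; i < 4; i++)           /* little endian: low byte first */
        out[2 + i] = (uint8_t)(d >> (8 * i));
    return 6;
}
```

Calling encode_rmmovl(buf, R_ECX, 24, R_EBP) should fill buf with the bytes 40 15 18 00 00 00, which matches example 2 once the displacement is written in hex.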


Note: the encodings of Y86 are simpler than those of IA32, but not as compact.

Today's News: March 26
Second exam on Wednesday of next week.



RISC and CISC
RISC = reduced instruction set computer
CISC = complex instruction set computer


Basic ideas of RISC design:

Which is IA32?
Which is Y86?

Which is better: RISC or CISC?
Answer: a combination

Section 4.1.4: Y86 Exceptions

What happens when an invalid assembly instruction is found?
This generates an exception.
In Y86, an exception halts the machine: it stops executing.
On a real system, this would be handled by the OS and only the current process would be terminated.
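What are some possible causes of exceptions?
Y86 handles 3 types of exceptions: a halt instruction is executed (HLT), an invalid address is used (ADR), or an invalid instruction is encountered (INS). In each case the status is set.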
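
As a rough sketch of how the status could be set, the C fragment below classifies a single instruction fetch. It is a simplification under stated assumptions: the names y86_stat and fetch_status are hypothetical, the status names and values (AOK = 1, HLT = 2, ADR = 3, INS = 4) follow the textbook's status codes, memory is modeled as a plain byte array of a given size, and only the high nibble of the first byte is checked (a fuller check would also validate the function nibble and the register bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Status codes, assuming the textbook's numbering. */
enum y86_stat { STAT_AOK = 1, STAT_HLT = 2, STAT_ADR = 3, STAT_INS = 4 };

/* Decide the status produced by fetching the instruction at pc.
   mem and mem_size model the (hypothetical) simulated memory. */
enum y86_stat fetch_status(const uint8_t *mem, size_t mem_size, size_t pc)
{
    if (pc >= mem_size)
        return STAT_ADR;            /* fetch from an invalid address */

    uint8_t icode = mem[pc] >> 4;   /* high nibble selects the instruction */
    if (icode == 0x0)
        return STAT_HLT;            /* halt instruction */
    if (icode > 0xB)
        return STAT_INS;            /* no instruction has icode C..F */
    return STAT_AOK;                /* normal operation */
}
```

Any status other than STAT_AOK would stop the simulated machine, which mirrors the statement that a Y86 exception halts execution.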

Section 4.1.5: Y86 Examples

The Sum function sums an integer array:
int Sum(int *Start, int Count) {  
   int sum = 0;
   while (Count) {
      sum += *Start;
      Start++;
      Count--;
   }
   return sum;
}


Here is the function in Y86 from the book:
Sum:
   pushl  %ebp              // standard setup
   rrmovl %esp, %ebp        // standard setup

   mrmovl 8(%ebp), %ecx     // ecx = Start
   mrmovl 12(%ebp),%edx     // edx = Count
   xorl   %eax, %eax        // eax = 0
   andl   %edx, %edx        // same as IA32 testl %edx, %edx
   je     End
Loop:
   mrmovl (%ecx), %esi      // 2 instructions to add *Start to sum
   addl   %esi, %eax        //         IA32: addl (%ecx), %eax
   irmovl $4, %ebx          // 2 instructions to add 4 to Start
   addl   %ebx, %ecx        //         IA32: addl $4, %ecx
   irmovl $-1, %ebx         // 2 instructions to decrement Count
   addl   %ebx, %edx        //         IA32: subl $1, %edx
   jne    Loop              // continue unless count is 0
End:

   rrmovl %ebp, %esp        // clean up for return
   popl   %ebp
   ret


Question:
Can you optimize this code by reducing the number of instructions in the loop?
Answer:


IA32 add in Y86

Here is the IA32 code for Sum generated by our compiler with the names of the labels changed:
Sum:
        pushl   %ebp
        movl    %esp, %ebp

        movl    8(%ebp), %ecx  // ecx = Start
        movl    12(%ebp), %edx // edx = Count
        movl    $0, %eax       // eax = 0
        testl   %edx, %edx     // see if Count == 0
        je      End            
Loop:
        addl    (%ecx), %eax  // add *Start to sum

        addl    $4, %ecx      // increment Start by 4

        subl    $1, %edx      // decrement Count

        jne     Loop          // continue unless count is 0
End:

        popl    %ebp
        ret

Note: This is not much different from the Y86 code.

Question:
Can you do the same type of optimization on the above IA32 code?
Answer:


However, consider the following implementation of Sum:
int Sum(int Start[], int Count) {  
   int sum = 0;
   int i;
   for (i=0; i<Count; i++)
      sum += Start[i];
   return sum;
}


Here is the IA32 code generated by this array implementation:
Sum:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    8(%ebp), %ebx     // array start into %ebx (does not change)
        movl    12(%ebp), %ecx    // Count into %ecx (does not change)
        movl    $0, %eax          // eax is sum to return (could use xorl)
        movl    $0, %edx          // edx is i
        testl   %ecx, %ecx
        jle     .L3
.L6:
        addl    (%ebx,%edx,4), %eax  // How many Y86 instructions to do this?
        addl    $1, %edx             // increment i (could use incl)
        cmpl    %edx, %ecx           // compare Count to i
        jg      .L6
.L3:
        popl    %ebx
        popl    %ebp
        ret

Note that this would be harder to translate into Y86 since it uses scaled addressing. Y86 does not have shift or multiply.
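
One way to see what the translation involves: without shift or multiply, the offset 4*i has to be built from additions. The C sketch below mirrors that idea and is only an illustration. The names scaled_offset and sum_add_only are hypothetical, and it assumes sizeof(int) == 4, as on IA32. Each "off += off" corresponds to one Y86 addl of a register to itself.

```c
#include <stddef.h>

/* Byte offset of element i in an array of 4-byte ints, using only
   addition (two doublings instead of a shift or multiply). */
size_t scaled_offset(size_t i)
{
    size_t off = i;
    off += off;   /* off = 2*i */
    off += off;   /* off = 4*i */
    return off;
}

/* Sum the array by forming Start + 4*i explicitly on each iteration.
   Assumes sizeof(int) == 4. */
int sum_add_only(const int *start, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        const char *addr = (const char *)start + scaled_offset((size_t)i);
        sum += *(const int *)addr;   /* load Start[i] from the computed address */
    }
    return sum;
}
```

In Y86 terms, the single IA32 instruction addl (%ebx,%edx,4), %eax becomes a copy of i, two doubling adds, an add of the base, a load with mrmovl, and the final add to the sum.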

Question:
Can you do the same type of optimization on the above IA32 code?
Answer:


Section 4.1.6: Y86 Details

We will skip this section.