# Digital Design & Computer Arch. Lecture 10a: Instruction Set Architectures II

Prof. Onur Mutlu

ETH Zürich Spring 2022 25 March 2022

# Assignment: Lecture Video (April 1)

- Why study computer architecture? Why is it important?
- Future Computing Platforms: Challenges & Opportunities

### Required Assignment

- Watch one of Prof. Mutlu's lectures and analyze either (or both)
- <u>https://www.youtube.com/watch?v=kgiZISOcGFM</u> (May 2017)
- <u>https://www.youtube.com/watch?v=mskTeNnf-i0</u> (Feb 2021)

### Optional Assignment – for 1% extra credit

- Write a 1-page summary of one of the lectures and email us
  - What are your key takeaways?
  - What did you learn?
  - What did you like or dislike?
  - Submit your summary to <u>Moodle</u> by April 1

## Extra Assignment: Moore's Law (I)

### Paper review

 G.E. Moore. "Cramming more components onto integrated circuits," Electronics magazine, 1965

### Optional Assignment – for 1% extra credit

- Write a 1-page review
- Upload PDF file to Moodle Deadline: April 7

 I strongly recommend that you follow my guidelines for (paper) review (see next slide)

## Extra Assignment 2: Moore's Law (II)

### Guidelines on how to review papers critically

- Guideline slides: pdf ppt
- Video: <u>https://www.youtube.com/watch?v=tOL6FANAJ8c</u>
- Example reviews on "Main Memory Scaling: Challenges and Solution Directions" (link to the paper)
  - Review 1
  - Review 2
- Example review on "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems" (link to the paper)
  - Review 1

### Agenda for Today & Next Few Lectures

- The von Neumann model
- LC-3: An example of von Neumann machine
- LC-3 and MIPS Instruction Set Architectures
- LC-3 and MIPS assembly and programming
- Introduction to microarchitecture and single-cycle microarchitecture
- Multi-cycle microarchitecture

| Problem            |
|--------------------|
| Algorithm          |
| Program/Language   |
| System Software    |
| SW/HW Interface    |
| Micro-architecture |
| Logic              |
| Devices            |
| Electrons          |

### What Will We Learn Today?

- Basic elements of a computer & the von Neumann model
   LC-3: An example von Neumann machine
- Instruction Set Architectures: LC-3 and MIPS
  - Operate instructions
  - Data movement instructions
  - Control instructions
- Instruction formats
- Addressing modes

| Problem            |  |  |  |  |  |  |
|--------------------|--|--|--|--|--|--|
| Algorithm          |  |  |  |  |  |  |
| Program/Language   |  |  |  |  |  |  |
| System Software    |  |  |  |  |  |  |
| SW/HW Interface    |  |  |  |  |  |  |
| Micro-architecture |  |  |  |  |  |  |
| Logic              |  |  |  |  |  |  |
| Devices            |  |  |  |  |  |  |
| Electrons          |  |  |  |  |  |  |
|                    |  |  |  |  |  |  |

## Readings

### This week

- Von Neumann Model, ISA, LC-3, and MIPS
  - P&P, Chapters 4, 5 (we will follow these today & tomorrow)
  - H&H, Chapter 6 (until 6.5)
  - P&P, Appendices A and C (ISA and microarchitecture of LC-3)
  - H&H, Appendix B (MIPS instructions)
- Programming
  - P&P, Chapter 6 (we will follow this tomorrow)
- Recommended: H&H Chapter 5, especially 5.1, 5.2, 5.4, 5.5

### Next week

- Introduction to microarchitecture and single-cycle microarchitecture
  - H&H, Chapter 7.1-7.3
  - P&P, Appendices A and C
- Multi-cycle microarchitecture
  - H&H, Chapter 7.4
  - P&P, Appendices A and C

Quick Review of the von Neumann Model

### Recall: The von Neumann Model



### Recall: von Neumann Model: Two Key Properties

Von Neumann model is also called stored program computer (instructions in memory). It has two key properties:

### Stored program

- Instructions stored in a linear memory array
- Memory is unified between instructions and data
  - The interpretation of a stored value depends on the control signals

### Sequential instruction processing

- One instruction processed (fetched, executed, completed) at a time
- Program counter (instruction pointer) identifies the current instruction
- Program counter is advanced sequentially except for control transfer instructions

### Recall: Programmer Visible (Architectural) State



#### Memory

array of storage locations indexed by an address



#### Registers

- given special names in the ISA (as opposed to addresses)
- general vs. special purpose

#### **Program Counter**

memory address of the current (or next) instruction

Instructions (and programs) specify how to transform the values of programmer visible state

### Recall: LC-3: A von Neumann Machine



Figure 4.3 The LC-3 as an example of the von Neumann model

### Recall: The Instruction (Processing) Cycle



### LC-3: A von Neumann Machine





Apple M1, 2021

SAFARI

Source: https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested



10nm ESF=Intel 7 Alder Lake die shot (~209mm<sup>2</sup>) from Intel: https://www.intel.com/content/www/us/en/newsroom/news/12th-gen-core-processors.html Die shot interpretation by Locuza, October 2021

Intel Alder Lake, 2021

#### SAFARI

Source: https://twitter.com/Locuza\_/status/1454152714930331652



Core Count: 8 cores/16 threads

L1 Caches: 32 KB per core

L2 Caches: 512 KB per core

L3 Cache: 32 MB shared

#### AMD Ryzen 5000, 2020



IBM POWER10, 2020

Cores: 15-16 cores, 8 threads/core

L2 Caches: 2 MB per core

L3 Cache: 120 MB shared

# LC-3 and MIPS Instruction Set Architectures

### The Instruction Set

- It defines opcodes, data types, and addressing modes
  ADD and LDD have been our first examples
- ADD and LDR have been our first examples



### The Instruction Set Architecture

- The ISA is the interface between what the software commands and what the hardware carries out
- The ISA specifies
  - The memory organization
    - Address space (LC-3: 2<sup>16</sup>, MIPS: 2<sup>32</sup>)
    - Addressability (LC-3: 16 bits, MIPS: 8 bits)
    - Word- or Byte-addressable
  - The register set
    - R0 to R7 in LC-3
    - 32 registers in MIPS
  - The instruction set
    - Opcodes
    - Data types
    - Addressing modes
    - Length and format of instructions

| Problem           |
|-------------------|
| Algorithm         |
| Program           |
| ISA               |
| Microarchitecture |
| Circuits          |
| Electrons         |

## Opcodes

- A large or small set of opcodes could be defined
  - □ E.g, HP Precision Architecture: an instruction for A\*B+C
  - □ E.g, x86 ISA: multimedia extensions (MMX), later SSE and AVX
  - E.g, VAX ISA: opcode to save all information of one program prior to switching to another program
- Tradeoffs are involved
  - Hardware complexity vs. software complexity
- In LC-3 and in MIPS there are three types of opcodes
  - Operate
  - Data movement
  - Control

## Opcodes in LC-3

|                  | 15 14 13 12  | 2 11 10 9 | 8 7 6 5 4 3 2 1 | 0 |
|------------------|--------------|-----------|-----------------|---|
| ADD+             | 0001         | DR        | SR1 0 00 SR     | 2 |
| ADD+             | 0001         |           | SR1 1 imm5      | 1 |
| AND+             | 0101         |           | SR1 0 00 SR     | 2 |
| AND+             | 0101         |           | SR1 1 imm5      | 1 |
| BR               | 0000         | n z p     | PCoffset9       | 1 |
| JMP              | 1100         | 000       | BaseR 000000    | 1 |
| JSR              | 0100         | 1         | PCoffset11      | 1 |
| JSRR             | 0100         | 0 00      | BaseR 000000    | 1 |
| LD+              | 0010         | DR        | PCoffset9       | 1 |
| LDI+             | 1010         | DR        | PCoffset9       | 1 |
| LDR+             | 0110         | DR        | BaseR offset6   | 1 |
| LEA <sup>+</sup> | 1110         | DR        | PCoffset9       | 1 |
| NOT+             | 1001         | DR        | SR 111111       | 1 |
| RET              | 1100         | 000       | 111 000000      | 1 |
| RTI              | 1000         |           | 00000000000     | 1 |
| ST               | 0011         | SR        | PCoffset9       | 1 |
| STI              | 1011<br>1011 | SR        | PCoffset9       | 1 |
| STR              | 0111         | SR        | BaseR offset6   | 1 |
| TRAP             | 1111         | 0000      | trapvect8       | 1 |
| reserved         | 1101         |           |                 | 1 |

Figure 5.3 Formats of the entire LC-3 instruction set. NOTE: + indicates instructions that modify condition codes

## Opcodes in LC-3b

| ADD⁺             | 15 14 13 12<br>0001 | 11 10 9<br>DR | 8 7 6 5 4 3 2 1 0<br>SR1 A op.spec |
|------------------|---------------------|---------------|------------------------------------|
| AND <sup>+</sup> | 0101                | DR            | SR1 A op.spec                      |
| BR               | 0000                | n z p         | PCoffset9                          |
| JMP              | 1100                | 000           | BaseR 000000                       |
| JSR(R)           | 0100                | A             | operand.specifier                  |
| LDB <sup>+</sup> | 0010                | DR            | BaseR boffset6                     |
| LDW <sup>+</sup> | 0110                | DR            | BaseR offset6                      |
| LEA⁺             | 1110                | DR            | PCoffset9                          |
| RTI              | 1000                |               | 00000000000                        |
| SHF              | 1101                | DR            | SR A D amount4                     |
| STB              | 0011                | SR            | BaseR boffset6                     |
| STW              | 0111                | SR            | BaseR offset6                      |
| TRAP             | 1111                | 0000          | trapvect8                          |
| XOR <sup>*</sup> | 1001                | DR            | SR1 A op.spec                      |
| not used         | 1010                |               |                                    |
| not used         | 1011                |               |                                    |

| 0      | rs     | rt    | rd     | shamt | funct | R-type |
|--------|--------|-------|--------|-------|-------|--------|
| 6-bit  | 5-bit  | 5-bit | 5-bit  | 5-bit | 6-bit |        |
|        |        |       |        |       |       |        |
| opcode | rs     | rt    | immec  | liate |       | l-type |
| 6-bit  | 5-bit  | 5-bit | 16-bit |       |       |        |
|        |        |       |        |       |       |        |
|        | 1      | • .   |        |       |       |        |
| opcode | immed  | late  |        |       |       | J-type |
| 6-bit  | 26-bit |       |        |       |       |        |

## Funct in MIPS R-Type Instructions (I)

Table B.2 R-type instructions, sorted by funct field

Opcode is 0 in MIPS R-Type instructions. Funct defines the operation

| Funct       | Name              | Description                     | Operation                           |
|-------------|-------------------|---------------------------------|-------------------------------------|
| 000000 (0)  | sll rd, rt, shamt | shift left logical              | [rd] = [rt] << shamt                |
| 000010 (2)  | srlrd,rt,shamt    | shift right logical             | [rd]=[rt]>> shamt                   |
| 000011 (3)  | sra rd, rt, shamt | shift right arithmetic          | [rd] = [rt] >>> shamt               |
| 000100 (4)  | sllv rd, rt, rs   | shift left logical variable     | [rd] = [rt] << [rs] <sub>4:0</sub>  |
| 000110 (6)  | srlvrd,rt,rs      | shift right logical variable    | [rd] = [rt] >> [rs] <sub>4:0</sub>  |
| 000111 (7)  | srav rd, rt, rs   | shift right arithmetic variable | [rd] = [rt] >>> [rs] <sub>4:0</sub> |
| 001000 (8)  | jr rs             | jump register                   | PC = [rs]                           |
| 001001 (9)  | jalr rs           | jump and link register          | \$ra = PC + 4, PC = [rs]            |
| 001100 (12) | syscall           | system call                     | system call exception               |
| 001101 (13) | break             | break                           | break exception                     |
| 010000 (16) | mfhi rd           | move from hi                    | [rd]=[hi]                           |
| 010001 (17) | mthi rs           | move to hi                      | [hi]=[rs]                           |
| 010010 (18) | mflo rd           | move from lo                    | [rd] = [lo]                         |
| 010011 (19) | mtlo rs           | move to lo                      | [lo] = [rs]                         |
| 011000 (24) | mult rs, rt       | multiply                        | {[hi],[]o]} = [rs] × [rt]           |
| 011001 (25) | multurs,rt        | multiply unsigned               | {[hi],[]o]} = [rs] × [rt]           |
| 011010 (26) | div rs, rt        | divide                          | []o]=[rs]/[rt],<br>[hi]=[rs]%[rt]   |
| 011011 (27) | divu rs, rt       | divide unsigned                 | [lo]=[rs]/[rt],<br>[hi]=[rs]%[rt]   |
|             |                   |                                 | (continued)                         |

(continued)

#### Harris and Harris, Appendix B: MIPS Instructions

| Funct       | Name            | Description            | Operation                         |  |
|-------------|-----------------|------------------------|-----------------------------------|--|
| 100000 (32) | add rd, rs, rt  | add                    | [rd] = [rs] + [rt]                |  |
| 100001 (33) | addu rd, rs, rt | add unsigned           | [rd] = [rs] + [rt]                |  |
| 100010 (34) | sub rd, rs, rt  | subtract               | [rd] = [rs] - [rt]                |  |
| 100011 (35) | subu rd, rs, rt | subtract unsigned      | [rd] = [rs] - [rt]                |  |
| 100100 (36) | and rd, rs, rt  | and                    | [rd] = [rs] & [rt]                |  |
| 100101 (37) | or rd, rs, rt   | or                     | [rd] = [rs]   [rt]                |  |
| 100110 (38) | xor rd, rs, rt  | xor                    | [rd] = [rs] ^ [rt]                |  |
| 100111 (39) | nor rd, rs, rt  | nor                    | [rd] = ~([rs]   [rt])             |  |
| 101010 (42) | sltrd,rs,rt     | set less than          | [rs] < [rt] ? [rd] = 1 : [rd] = 0 |  |
| 101011 (43) | slturd,rs,rt    | set less than unsigned | [rs] < [rt] ? [rd] = 1 : [rd] = 0 |  |

#### Table B.2 R-type instructions, sorted by funct field—Cont'd

### Find the complete list of instructions in the H&H Appendix B

## Data Types

- An ISA supports one or several data types
- LC-3 only supports 2's complement integers
   Negative of a 2's complement binary value X = NOT(X) + 1
- MIPS supports
  - 2's complement integers
  - Unsigned integers
  - Floating point
- Again, tradeoffs are involved
  - What data types should be supported and what should not be?

## Data Type Tradeoffs

- What is the benefit of having more or high-level data types in the ISA?
- What is the disadvantage?
- Think compiler/programmer vs. microarchitect
- Concept of semantic gap
  - □ Data types coupled tightly to the semantic level, or complexity of instructions  $\rightarrow$  how close are instrs. to high-level languages
- Example: Early RISC architectures vs. Intel 432
  - Early RISC machines: Only integer data type
  - Intel 432: Object data type, capability based machine
  - VAX: Complex types, e.g., doubly-linked list

## Aside: An Example: BinaryCodedDecimal

Each decimal digit is encoded with a fixed number of bits



http://commons.wikimedia.org/wiki/File:Binary\_clock.svg#mediaviewer/File:Binary\_clock.svg

"Digital-BCD-clock" by Julo - Own work. Licensed under Public Domain via Wikimedia Commons http://commons.wikimedia.org/wiki/File:Digital-BCD-clock.jpg#mediaviewer/File:Digital-BCD-clock.jpg

## Addressing Modes

- An addressing mode is a mechanism for specifying where an operand is located
- There are five addressing modes in LC-3
  - Immediate or literal (constant)
    - The operand is in some bits of the instruction
  - Register
    - The operand is in one of R0 to R7 registers
  - Three memory addressing modes
    - PC-relative
    - Indirect
    - Base+offset
- MIPS has pseudo-direct addressing (for j and jal), additionally, but does **not** have indirect addressing

### Why Have Different Addressing Modes?

- Another example of programmer vs. microarchitect tradeoff
- Advantage of more addressing modes:
  - Enables better mapping of high-level programming constructs to hardware
    - some accesses are better expressed with a different mode → reduced number of instructions and code size
      - Array indexing
      - Pointer-based accesses (indirection)
      - Sparse matrix accesses
- Disadvantages:
  - More work for the compiler
  - More work for the microarchitect

## Many Tradeoffs in ISA Design...

- Execution model sequencing model and processing style
- Instruction length
- Instruction format
- Instruction types
- Instruction complexity vs. simplicity
- Data types
- Number of registers
- Addressing mode types
- Memory organization (address space, addressability, endianness, ...)
- Memory access restrictions and permissions
- Support for multiple instructions to execute in parallel?

# Operate Instructions

### **Operate Instructions**

- In LC-3, there are three operate instructions
  - NOT is a unary operation (one source operand)
    - It executes bitwise NOT
  - ADD and AND are binary operations (two source operands)
    - ADD is 2's complement addition
    - AND is bitwise SR1 & SR2
- In MIPS, there are many more
  - Most of R-type instructions (they are binary operations)
    - E.g., add, and, nor, xor...
  - I-type versions (i.e., with one immediate operand) of the Rtype operate instructions
  - □ **F-type** operations, i.e., floating-point operations

### NOT in LC-3

### NOT assembly and machine code

LC-3 assembly

NOT R3, R5

### Field Values

| OP | DR | SR |        |
|----|----|----|--------|
| 9  | 3  | 5  | 111111 |

#### Machine Code

| 0  | OP  |     | DR    |   | R  |      |   |
|----|-----|-----|-------|---|----|------|---|
| 10 | 0 1 | 0 1 | 1 001 |   | 11 | 1111 |   |
| 15 | 12  | 11  | 9     | 8 | 6  | 5    | 0 |



There is no NOT in MIPS. How is it implemented?

### **Operate Instructions**

- We are already familiar with LC-3's ADD and AND with register mode (R-type in MIPS)
- Now let us see the versions with one literal (i.e., immediate) operand
- Subtraction is another necessary operation
   How is it implemented in LC-3 and MIPS?

### Operate Instr. with one Literal in LC-3

#### ADD and AND

| OP     | DR     | SR1        | 1 | imm5   |
|--------|--------|------------|---|--------|
| 4 bits | 3 bits | 3 bits 5 b |   | 5 bits |

- $\Box$  OP = operation
  - E.g., ADD = 0001 (same OP as the register-mode ADD)
     DR ← SR1 + sign-extend(imm5)
  - E.g., AND = 0101 (same OP as the register-mode AND)
     DR ← SR1 AND sign-extend(imm5)
- $\Box$  SR1 = source register
- DR = destination register
- imm5 = Literal or immediate (sign-extend to 16 bits)

## ADD with one Literal in LC-3

### ADD assembly and machine code

LC-3 assembly

ADD R1, R4, #-2

**Field Values** 

| OP | DR | SR |   | imm5 |
|----|----|----|---|------|
| 1  | 1  | 4  | 1 | -2   |

#### Machine Code

| OP  |    | D   | S   | SR |     | ir | nm5 |     |
|-----|----|-----|-----|----|-----|----|-----|-----|
| 000 | 1  | 0 0 | 001 |    | 100 |    | 11  | 110 |
| 15  | 12 | 11  | 9   | 8  | 6   | 5  | 4   | 0   |



### ADD with one Literal in LC-3 Data Path



### Instructions with one Literal in MIPS

#### I-type MIPS Instructions

- 2 register operands and immediate
- Some operate and data movement instructions

| opcode | rs     | rt     | imm     |
|--------|--------|--------|---------|
| 6 bits | 5 bits | 5 bits | 16 bits |

- $\Box$  opcode = operation
- □ rs = source register
- □ rt =
  - destination register in some instructions (e.g., addi, lw)
  - source register in others (e.g., sw)
- □ imm = Literal or immediate

### Add with one Literal in MIPS

### Add immediate

MIPS assembly

addi \$s0, \$s1, 5

#### Field Values

| ор | rs | rt | imm |
|----|----|----|-----|
| 0  | 17 | 16 | 5   |

#### $rt \leftarrow rs + sign-extend(imm)$

#### Machine Code

| ор     | rs    | rt    | imm                 |
|--------|-------|-------|---------------------|
| 001000 | 10001 | 10010 | 0000 0000 0000 0101 |

0x22300005

### Subtract in LC-3

### MIPS assembly

High-level code

a = b + c - d;

#### MIPS assembly

| add | \$t0, | \$s0, | \$s1 |
|-----|-------|-------|------|
| sub | \$s3, | \$t0, | \$s2 |

### LC-3 assembly

High-level code

a = b + c - d;

- Tradeoff in LC-3
  - More instructions
  - But, simpler control logic

LC-3 assembly



### Subtract Immediate



# Data Movement Instructions and Addressing Modes

### Data Movement Instructions

- In LC-3, there are seven data movement instructions
   LD, LDR, LDI, LEA, ST, STR, STI
- Format of load and store instructions
  - Opcode (bits [15:12])
  - DR or SR (bits [11:9])
  - Address generation bits (bits [8:0])
  - Four ways to interpret bits, called addressing modes
    - PC-Relative Mode
    - Indirect Mode
    - Base+Offset Mode
    - Immediate Mode
- In MIPS, there are only Base+offset and immediate modes for load and store instructions

## PC-Relative Addressing Mode

LD (Load) and ST (Store)

| 15 14 13 12 | 11 10 9 | 8 | 7 | 6 | 5   | 4    | 3   | 2 | 1 | 0 |   |
|-------------|---------|---|---|---|-----|------|-----|---|---|---|---|
| OP          | DR/SR   |   |   | F | PCo | offs | set | 9 |   |   |   |
| 4 bits      | 3 bits  |   |   |   | 9   | bit  | S   |   |   |   | _ |

- $\Box$  OP = opcode
  - E.g., LD = 0010
  - E.g., ST = 0011
- DR = destination register in LD
- $\Box SR = source register in ST$
- □ LD: DR  $\leftarrow$  Memory[PC<sup>+</sup> + sign-extend(PCoffset9)]
- □ ST: Memory[PC<sup>+</sup> + sign-extend(PCoffset9)]  $\leftarrow$  SR

## LD in LC-3

#### LD assembly and machine code



## Indirect Addressing Mode

LDI (Load Indirect) and STI (Store Indirect)

| 15 14 13 12 | 11 10 9 | 8 | 7 | 6 | 5   | 4    | 3   | 2 | 1 | 0 |  |
|-------------|---------|---|---|---|-----|------|-----|---|---|---|--|
| OP          | DR/SR   |   |   | F | PCo | offs | set | 9 |   |   |  |
| 4 bits      | 3 bits  |   |   |   | 9   | bit  | S   |   |   |   |  |

- $\Box$  OP = opcode
  - E.g., LDI = 1010
  - E.g., STI = 1011
- DR = destination register in LDI
- $\Box SR = source register in STI$
- □ LDI: DR ← Memory[Memory[PC<sup>+</sup> + sign-extend(PCoffset9)]]
- □ STI: Memory[PC<sup>+</sup> + sign-extend(PCoffset9)]]  $\leftarrow$  SR

## LDI in LC-3

### LDI assembly and machine code



Now the address of the operand can be anywhere in the memory

### Base+Offset Addressing Mode

LDR (Load Register) and STR (Store Register)

| 15 14 13 12 | 11 10 9 | 8   | 76   | 5 | 4 | 3 | 2    | 1 | 0 |
|-------------|---------|-----|------|---|---|---|------|---|---|
| OP          | DR/SR   | Bas |      | 6 |   |   |      |   |   |
| 4 bits      | 3 bits  | 3 b | oits |   |   | 6 | bits | ; |   |

- $\Box$  OP = opcode
  - E.g., LDR = 0110
  - E.g., STR = 0111
- □ DR = destination register in LDR
- $\Box SR = source register in STR$
- □ LDR: DR ← Memory[BaseR + sign-extend(offset6)]
- □ STR: Memory[BaseR + sign-extend(offset6)]  $\leftarrow$  SR

## LDR in LC-3

### LDR assembly and machine code



Again, the address of the operand can be anywhere in the memory

### Address Calculation in LC-3 Data Path



53

### Base+Offset Addressing Mode in MIPS

 In MIPS, lw and sw use base+offset mode (or base addressing mode)

| High-level code | MIPS assembly           |  |  |  |  |
|-----------------|-------------------------|--|--|--|--|
| A[2] = a;       | sw \$s3, 8(\$s0)        |  |  |  |  |
|                 | Memory[\$s0 + 8] ← \$s3 |  |  |  |  |

| ор | rs | rt | imm |
|----|----|----|-----|
| 43 | 16 | 19 | 8   |

imm is the 16-bit offset, which is sign-extended to 32 bits

### An Example Program in MIPS and LC-3

| High-level code | MIPS registers | LC-3 registers |
|-----------------|----------------|----------------|
| a = A[0];       | A = \$s0       | A = R0         |
| c = a + b - 5;  | b = \$s2       | b = R2         |
| B[0] = c;       | B = \$s1       | B = R1         |

#### MIPS assembly

| lw   | \$t0, | 0(\$s0)    |
|------|-------|------------|
| add  | \$t1, | \$t0, \$s2 |
| addi | \$t2, | \$t1, -5   |
| SW   | \$t2, | 0(\$s1)    |

#### LC-3 assembly

| LDR | R5, | R0, | #0  |
|-----|-----|-----|-----|
| ADD | R6, | R5, | R2  |
| ADD | R7, | R6, | #-5 |
| STR | R7, | R1, | #0  |

## Immediate Addressing Mode

### LEA (Load Effective Address)

| 15 14 13 12 | 11 10 9 | 8 | 7 | 6 | 5  | 4    | 3   | 2 | 1 | 0 |
|-------------|---------|---|---|---|----|------|-----|---|---|---|
| OP          | DR      |   |   | P | Co | offs | set | 9 |   |   |
| 4 bits      | 3 bits  |   |   |   | 9  | bit  | s   |   |   |   |

• OP = 1110

- DR = destination register
- □ LEA: DR  $\leftarrow$  PC<sup>+</sup> + sign-extend(PCoffset9)

What is the difference from PC-Relative addressing mode?

Answer: Instructions with PC-Relative mode load from memory, but LEA does not  $\rightarrow$  Hence the name *Load Effective Address* 

### LEA in LC-3

### LEA assembly and machine code



| i idei iii |         |           |
|------------|---------|-----------|
| OP         | DR      | PCoffset9 |
| 111(       | 0 101   | 11111101  |
| 15 1       | 12 11 9 | 8 0       |

### Address Calculation in LC-3 Data Path



58

## Immediate Addressing Mode in MIPS

- In MIPS, lui (load upper immediate) loads a 16-bit immediate into the upper half of a register and sets the lower half to 0
- It is used to assign 32-bit constants to a register

High-level code

a = 0x6d5e4f3c;

MIPS assembly

# \$s0 = a

lui \$s0, 0x6d5e

ori \$s0, 0x4f3c

### Addressing Example in LC-3

• What is the final value of R3?

P&P, Chapter 5.3.5

| Address | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |                   |
|---------|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------------------|
| x30F6   | 1  | 1  | 1  | 0  | 0  | 0  | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | R1<- PC-3         |
| x30F7   | 0  | 0  | 0  | 1  | 0  | 1  | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | R2<- R1+14        |
| x30F8   | 0  | 0  | 1  | 1  | 0  | 1  | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | M[x30F4]<- R2     |
| x30F9   | 0  | 1  | 0  | 1  | 0  | 1  | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | R2<- 0            |
| x30FA   | 0  | 0  | 0  | 1  | 0  | 1  | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | R2<- R2+5         |
| x30FB   | 0  | 1  | 1  |    |    |    |   |   |   |   |   |   |   |   |   |   | M[R1+14]<- R2     |
| x30FC   | 1  | 0  | 1  | 0  | 0  | 1  | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | R3<- M[M[x30F4 ]] |

### Addressing Example in LC-3

#### What is the final value of R3?

P&P, Chapter 5.3.5

Address 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

| x30F6 | LÊA | 1 | 1 | 0 | 0 | 0 | 1 | <u>_</u> } | 1 | 1 | 1          | 1 1 1 R1 $=$ PC $- 3 = 0x30F7 - 3 = 0x30F4$                  |
|-------|-----|---|---|---|---|---|---|------------|---|---|------------|--------------------------------------------------------------|
| x30F7 | ADD | 0 | 0 | 1 | 0 | 1 | 0 | 0          | 0 | 1 | 1          | $Q_4 1$ R2 = R1 + 14 = 0x30F4 + 14 = 0x3102                  |
| x30F8 | SŪ  | 0 | 1 | 1 | 0 | 1 | 0 | -15        | 1 | 1 | 1          | $1 \ 1 \ 0 \ 1 \ M[PC - 5] = M[0x030F4] = 0x3102$            |
| x30F9 | AND | 1 | 0 | 1 | 0 | 1 | 0 | 0          | 1 | 0 | 1          | $0 \ 0 \ 0 \ 0 \ 0$ R2 = 0                                   |
| x30FA | ADD | 0 | 0 | 1 | 0 | 1 | 0 | 0          | 1 | 0 | 1          | $\Theta$ 0 1 0 1 R2 = R2 + 5 = 5                             |
| x30FB | STR | 1 | 1 | 1 | 0 | 1 | 0 | 0          | 0 | 1 | <b>M</b> [ | $[R1 + 14] = M[0 \times 30F4 + 14] = M[0 \times 3102] = 5$   |
| x30FC | LDI | 0 | 1 | 0 | 0 | 1 | 1 | -19        | 1 | 1 | 1          | $\frac{1}{1} R_3 = M[M[PC - 9]] = M[M[0 \times 30FD - 9]] =$ |
|       |     |   |   |   | _ |   |   |            |   |   |            | M[M[0x30F4]] = M[0x3102] = 5                                 |

#### The final value of R3 is 5

# Control Flow Instructions

### Control Flow Instructions

- Allow a program to execute out of sequence
- Conditional branches and unconditional jumps
  - Conditional branches are used to make decisions
    - E.g., if-else statement
  - □ In LC-3, three condition codes are used
  - Jumps are used to implement
    - Loops
    - Function calls
  - □ JMP in LC-3 and j in MIPS

### Condition Codes in LC-3

- Each time one GPR (R0-R7) is written, three single-bit registers are updated
- Each of these condition codes are either set (set to 1) or cleared (set to 0)
  - □ If the written value is negative
    - N is set, Z and P are cleared
  - If the written value is zero
    - Z is set, N and P are cleared
  - □ If the written value is positive
    - P is set, N and Z are cleared
- x86 and SPARC are examples of ISAs that use condition codes

### Conditional Branches in LC-3

### BRz (Branch if Zero)



- n, z, p = which condition code is tested (N, Z, and/or P)
  - n, z, p: instruction bits to identify the condition codes to be tested
  - N, Z, P: values of the corresponding condition codes
- PCoffset9 = immediate or constant value
- if ((n AND N) OR (p AND P) OR (z AND Z))
   then PC ← PC<sup>+</sup> + sign-extend(PCoffset9)
- □ Variations: BRn, BRz, BRp, BRzp, BRnp, BRnz, BRnzp

### Conditional Branches in LC-3



\*n, z, p are the instruction bits to identify the condition codes to be tested

### Conditional Branches in MIPS

### beq (Branch if Equal)

beq \$s0, \$s1, offset

| 4      | rs     | rt     | offset  |
|--------|--------|--------|---------|
| 6 bits | 5 bits | 5 bits | 16 bits |

• 4 = opcode

- □ rs, rt = source registers
- offset = immediate or constant value
- if rs == rt
  - then  $PC \leftarrow PC^+$  + sign-extend(offset) \* 4
- Variations: beq, bne, blez, bgtz

### Branch If Equal in MIPS and LC-3



- This is an example of tradeoff in the instruction set
  - □ The same functionality requires more instructions in LC-3
  - But, the control logic requires more complexity in MIPS

### What We Learned

- Basic elements of a computer & the von Neumann model
   LC-3: An example von Neumann machine
- Instruction Set Architectures: LC-3 and MIPS
  - Operate instructions
  - Data movement instructions
  - Control instructions
- Instruction formats
- Addressing modes

| Problem            |
|--------------------|
| Algorithm          |
| Program/Language   |
| System Software    |
| SW/HW Interface    |
| Micro architecture |
| Logic              |
| Devices            |
| Electrons          |
|                    |

### There Is A Lot More to Cover on ISAs



## Many Different ISAs Over Decades

- **x86**
- PDP-x: Programmed Data Processor (PDP-11)
- VAX
- IBM 360
- CDC 6600
- SIMD ISAs: CRAY-1, Connection Machine
- VLIW ISAs: Multiflow, Cydrome, IA-64 (EPIC)
- PowerPC, POWER
- RISC ISAs: Alpha, MIPS, SPARC, ARM, RISC-V, ...
- What are the fundamental differences?
  - E.g., how instructions are specified and what they do
  - E.g., how complex are the instructions

## Complex vs. Simple Instructions

- Complex instruction: An instruction does a lot of work, e.g. many operations
  - Insert in a doubly linked list
  - Compute FFT
  - String copy
  - ...
- Simple instruction: An instruction does little work -- it is a primitive using which complex operations can be built
  - Add
  - XOR
  - Multiply
  - ...

## Complex vs. Simple Instructions

### Advantages of Complex instructions

- + Denser encoding → smaller code size → better memory utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions)
- + Simpler compiler: no need to optimize small instructions as much

### Disadvantages of Complex Instructions

- Larger chunks of work → compiler has less opportunity to optimize (limited in fine-grained optimizations it can do)
- More complex hardware → translation from a high level to control signals and optimization needs to be done by hardware

## ISA-level Tradeoffs: Number of Registers

- Affects:
  - Number of bits used for encoding register address
  - Number of values kept in fast storage (register file)
  - uarch) Size, access time, power consumption of register file
- Large number of registers:
  - + Enables better register allocation (and optimizations) by compiler → fewer saves/restores
  - -- Larger instruction size
  - -- Larger register file size

### There Is A Lot More to Cover on ISAs



## There Is A Lot More to Cover on ISAs



## Detailed Lectures on ISAs & ISA Tradeoffs

- Computer Architecture, Spring 2015, Lecture 3
  - □ ISA Tradeoffs (CMU, Spring 2015)
  - https://www.youtube.com/watch?v=QKdiZSfwgg&list=PL5PHm2jkkXmi5CxxI7b3JCL1TWybTDtKq&index=3
- Computer Architecture, Spring 2015, Lecture 4
  - ISA Tradeoffs & MIPS ISA (CMU, Spring 2015)
  - https://www.youtube.com/watch?v=RBgeCCW5Hjs&list=PL5PHm2jkkXmi5CxxI7b3J CL1TWybTDtKq&index=4
- Computer Architecture, Spring 2015, Lecture 2
  - Fundamental Concepts and ISA (CMU, Spring 2015)
  - https://www.youtube.com/watch?v=NpC39uS4K4o&list=PL5PHm2jkkXmi5CxxI7b3J CL1TWybTDtKq&index=2

# Digital Design & Computer Arch. Lecture 10a: Instruction Set Architectures II

Prof. Onur Mutlu

ETH Zürich Spring 2022 25 March 2022