# 计算机组成原理 第二章"指令系统"

中科大11系李曦

## 概要

- 指令系统: 机器指令的集合
  - "程序控制"
    - 程序=顺序执行的指令流
  - 机器语言,汇编语言(Assemble Language)
  - Instruction Set Architecture (ISA)
    - 分类: CISC、RISC、VLIW
    - 影响:处理器、C编译器、OS。。。
- 本章的内容
  - RV指令系统
    - 操作数:寄存器,存储器(大小尾端,对齐),常数/立即数;
    - 指令功能, 指令格式与编码, 寻址方式(操作数, 下一条指令)
    - 分支指令\$2.7, 大立即数处理\$2.10
    - 过程调用\$2.8,\$2.13
    - 汇编程序设计: COD4附录B
  - 指令系统特征
  - 编译过程: \$2.12
    - 可执行程序生成:编译,汇编,链接,加载 llxx@ustc.edu.cn

David Patterson, Andrew Waterman, The RISC-V Reader: An Open Architecture Atlas, 2017



### Architecture (ISA) & Interfaces



Hardware

API – application programming interface

■ ABI – application binary interface——HLL<->LLL

■ **ISA** – instruction set architecture (p14)

 Architecture: formal specification of a system's interface and the logical behavior of its visible resources.

## Instruction-Set Processor Design

- Architecture (ISA) programmer/compiler view
  - "functional appearance to its immediate user/system programmer"
  - Opcodes, addressing modes, architected registers,
     IEEE floating point
  - 机器语言
- Implementation (µArch) processor designer view
  - "logical structure or organization that performs the architecture"
  - functional units, pipelining, caches, physical registers
- Realization (chip) chip/system designer view
  - "physical structure that embodies the implementation"
  - Gates, cells, transistors, wires



## 体系结构的8种属性

- 数据表示
  - 硬件能直接辨识和操作的数据类型和格式
- 寻址方式
  - 最小可寻址单位、寻址方式的种类、地址运算
- 寄存器组织
  - 操作寄存器、变址寄存器、控制寄存器及专用寄存器的定义、数量和使用规则
- 指令系统
  - 机器指令的操作类型、格式、指令间排序和控制机构
- 存储系统
  - 最小编址单位、编址方式、主存容量、最大可编址空间
- 输入输出
  - 输入输出的连接方式、处理机/存储器与输入输出设备间的数据交换方式、 数据交换过程的控制
- 中断机构
  - 中断类型、中断级别,以及中断响应方式等
- 信息保护
  - 信息保护方式、硬件信息保护机制

## 操作数 (opr)

- 操作数类型: 进制,编码,立即数(补码)
  - 地址:无符号整数。寄存器、内存、I/O端口ID
  - 数值:常数、定点数(有符号/无符号)、浮点数、逻辑值
  - 字符: ASCII、汉字内码
- 字长: "RV32I" ——32位, "RV64" ——64位("大立即数")
  - 字节
  - 半字: 2B
  - 字: 4B
  - 双字: 8B (大立即数)
- 物理操作数: 存放位置
  - 寄存器
  - 主存
  - I/O端口
  - 外存?



## 指令字中的操作数

- 寄存器
- 存储器
  - 内存地址
  - 字长: sw
- 立即数
  - 进制表示
    - 十进制
    - 十六进制
      - 0x12345
    - 二进制
      - 0b1101
- 标号/行号
  - main, end



### RV architected registers: RV64/RV32, 图2-14

| Name    | Register<br>number | Usage                          | Preserved on call? |
|---------|--------------------|--------------------------------|--------------------|
| x0      | 0                  | The constant value 0           | n.a.               |
| x1 (ra) | 1                  | Return address (link register) | yes                |
| x2 (sp) | 2                  | Stack pointer                  | yes                |
| x3 (gp) | 3                  | Global pointer                 | yes                |
| x4 (tp) | 4                  | Thread pointer                 | yes                |
| x5-x7   | 5–7                | Temporaries                    | no                 |
| x8-x9   | 8–9                | Saved                          | yes                |
| x10-x17 | 10–17              | Arguments/results              | no                 |
| x18-x27 | 18–27              | Saved                          | yes                |
| x28-x31 | 28–31              | Temporaries                    | no                 |

## RV典型内存地址空间分配

段式

 $SP \rightarrow 0000 003f ffff fff0_{hex}$ 

- DATA
- Stack/Heap
  - 栈: 自高向低
  - 堆: 自低向高
- CODE

0000 0000 1000 0000<sub>hex</sub>

I/O port?

- · 硬盘
- 网络





图**2-13** 用户地址空间划分 空间大小?

### 字存储顺序(Byte Ordering)\$2.3.1

- 字存储的顺序中,字节的次序有两种
  - 大尾端 (big endness)
    - 低地址,高字节
  - 小尾端(little endness)
    - 低地址,低字节
- X86和RV都为小端,ARM可以自主设置
- 0000000 0000000 0000000 00000001
  - 0000000 00000111 00000011 00000001?



```
大尾端: 00000000 00000000 0000000 00000001
```

addr+0 addr+1 addr+2 addr+3 //先存高有效位(在低地址)

小尾端: 00000001 00000000 00000000 00000000

addr+0 addr+1 addr+2 addr+3 //先存低有效位(在低地址)

## 数据存放位置(Memory Alignment)

- 在数据不对准边界的计算机中,数据(例如一个字)可能在两个存储单元中。
  - 此时需要访问两次存储器,并对高低字节的位置进行调整后,才能取得一字。
- 边界对齐:
  - 字对齐: 左移两位, 按字访问
    - RV/x86不要求,MIPS要求
  - 半字: 左移一位, 按半字访问

#### 存储器

#### 地址 (十进制)

| 3     | 字(地 | 过址2)    | 半字(地址0) | 0 |
|-------|-----|---------|---------|---|
| 字节(地) | 业7) | 字节(地址6) | 字 (地址4) | 4 |
| 半     | 字(均 | 也址10)   | 半字(地址8) | 8 |

llxx@ustc.edu.cn

## 操作分类

- 数据传递(data movement)
  - 访存: load, store, mov
  - I/O: in, out
- 算逻运算(arithmetic & logical)
  - add, sub, and, not, or, xor, dec, inc, cmp
  - monadic & dyadic operations
- 移位操作
  - monadic operations: shl, shr, srl, srr
- 分支控制(tranfer of contral, Branch)
  - comparisons & conditional branches: beq, bnz
  - 无条件转移: jmp
  - procedure call: call, ret, int, iret
- 系统指令: nop, sti, cli, lock, HLT



运算器

存储器

输出设备

## 指令示例

- ALU
  - 源操作数
    - 立即数
  - 目的操作数
- 分支
  - NPC
- 访存
  - 内存地址



# 指令字格式Machine Instruction Layout

- von Neumann: "指令由操作码和地址码构成"
- 操作码: 操作的性质
- 地址码: 指令和操作数 (operand) 的存储位置

| 操作码域(op) | 地址码域(addr) |
|----------|------------|

- 指令字长度固定vs.可变: RISC(RV/MIPS/ARM)一般32位
  - 固定:规则,浪费空间
- 操作码长度固定vs.可变
  - 固定:译码简单,指令条数有限,RISC(RV/MIPS/ARM)
  - 可变: 指令条数和格式按需调整, CISC (x86)
    - "扩展操作码技术":调整op与addr域
      - 如果指令字长固定,则操作码长度增加,地址码长度缩短

## 地址码:操作数,指令

- 源操作数、目的操作数、下一条指令地址
  - -地址:寄存器、主存、I/O端口
- 地址码域格式
  - 4地址指令: op rs1, rs2, rd, ni
  - 3地址指令: op rs1, rs2, rd; ni在PC中
  - 2地址指令: op rs1, rs2; rd=rs1 or ACC
  - 1地址指令: op rs2; rs1=ACC, rd=ACC
  - -0地址指令: op; 堆栈操作

## 寻址方式: 指令的地址码域

- 寻址方式: 指令字和操作数的存储地址计算方式
- 指令寻址:现代CPU利用PC
  - 顺序执行:每执行一条指令,PC自动1
  - 跳转: 更新PC, 转移到目的地址执行
- 操作数寻址
  - 指令中给出"形式地址"
  - 有效地址: 操作数在寄存器/内存中的物理地址
    - EA=寻址方式+形式地址

操作码 形式地址



## 寻址方式:操作数,下一条指令

- 常见约10种
  - 立即寻址 (a)
  - 直接寻址(b)
  - 间接寻址(c)
  - 寄存器寻址(d)
  - 寄存器间接寻址(e)
  - 基址寻址(f)
    - BP+offset
  - PC相对寻址(f)
    - PC+offset
  - 堆栈寻址 (g)
  - 变址寻址(d+f)
    - Index: x86的si/di
  - 隐含寻址(如堆栈)



## 寻址方式示例

- addi
- j
- SW
- beq



### RISC-V ISA的特点

- 模块化: 51+13+133
  - RV32I指令集: 支持完整软件栈, 永远不变
    - 共51条: 图2-1 (37条, 重点) + 图2-37 (14条)
  - 系统指令: 同步, CSR操作, 异常。图5-47, 13条
  - RV32IMFD指令集: RV32I的基本扩展, 133条

图2-38

- 约束
  - 成本: 芯片面积
  - 简洁
  - 性能: 时间,功耗
  - 架构与实现分离
  - 扩展性: 操作码域空间
  - 程序大小
  - 易于编程/编译/链接

| Mnemonic | Description                     | Insn. Count |
|----------|---------------------------------|-------------|
| I        | Base architecture               | 51          |
| M        | Integer multiply/divide         | 13          |
| A        | Atomic operations               | 22          |
| F        | Single-precision floating point | 30          |
| D        | Double-precision floating point | 32          |
| С        | Compressed instructions         | 36          |











RISC-I 1981

RISC-II 1983

RISC-III (SOAR) 1984

RISC-JV (SPUR) 1988

RISC-V 2013

# RISC-V指令格式与寻址方式

图2-19

| Name         |                                                          | Fi     |        |        |               |        | Comm                               | ents                         |           |           |     |
|--------------|----------------------------------------------------------|--------|--------|--------|---------------|--------|------------------------------------|------------------------------|-----------|-----------|-----|
| (Field size) | 7 bits                                                   | 5 bits | 5 bits | 3 bits | 5 bits        | 7 bi   | its                                |                              |           |           |     |
| R-type       | funct7                                                   | rs2    | rs1    | funct3 | rd            | opco   | opcode Arithmetic instruction form |                              |           | ormat     |     |
| I-type       | immediate[11:0]                                          |        | rs1    | funct3 | rd            | opcode |                                    | Loads & immediate arithmetic |           |           |     |
| S-type       | immed[11:5]                                              | rs2    | rs1    | funct3 | immed[4:0]    | opco   | ode                                | Stores                       | 3         |           |     |
| SB-type      | immed[12,10:5]                                           | rs2    | rs1    | funct3 | immed[4:1,11] | opco   | ode                                | Condit                       | ional bra | anch forn | nat |
| UJ-type      | imm                                                      |        | rd     | opco   | ode           | Uncon  | ditional                           | jump forr                    | mat       |           |     |
| U-type       | immediate[31:12]                                         |        |        |        | rd            | орсо   | ode                                | Upper                        | immedia   | ate forma | at  |
|              | 1 Immediate addressing funct6 immed rs1 funct3 rd oncode |        |        |        |               |        |                                    |                              |           |           |     |

指令格式: 6种

- 基本: R/I/S/U

• 7种: 4+2+ "IS"

• IS-type: funct6, 立即数移位

- 规整: Reg和Imm位置固定

• B/J-type的立即数域?

- op码与类型绑定
- 寻址方式: 4种
  - 本质: Imm, Reg, Base
  - 指令寻址方式
    - PC相对寻址: beg, jal
    - ▸ 间接跳转:jalr x0,100(x1)



### RV整数指令操作码

- 常用37条,图2-18
  - 按字长分类: 访存指令
    - b, h, w, d
  - 按数据类型分类
    - i, u
  - 按指令格式分类
- 典型:按功能分类
  - 同功能: op同, funct不同
  - ALU,访存,分支
    - add: R-type
    - addi: I-type
    - lw: load, I-type
    - sw: store, S-type
    - beq: SB-type
    - jal (J-type) , jalr (I-type)
- 51条指令示例: 图3-12

| Format  | Instruction | Opcode  | Funct3 | Funct6/7 |
|---------|-------------|---------|--------|----------|
|         | add         | 0110011 | 000    | 0000000  |
|         | sub         | 0110011 | 000    | 0100000  |
|         | sll         | 0110011 | 001    | 0000000  |
|         | xor         | 0110011 | 100    | 0000000  |
| R-type  | srl         | 0110011 | 101    | 0000000  |
| N-type  | sra         | 0110011 | 101    | 0000000  |
|         | or          | 0110011 | 110    | 0000000  |
|         | and         | 0110011 | 111    | 0000000  |
|         | lr.d        | 0110011 | 011    | 0001000  |
|         | sc.d        | 0110011 | 011    | 0001100  |
|         | 16          | 0000011 | 000    | n.a.     |
|         | 1h          | 0000011 | 001    | n.a.     |
|         | lw          | 0000011 | 010    | n.a.     |
|         | 1d          | 0000011 | 011    | n.a.     |
|         | 1 bu        | 0000011 | 100    | n.a.     |
|         | lhu         | 0000011 | 101    | n.a.     |
| I-type  | lwu         | 0000011 | 110    | n.a.     |
|         | addi        | 0010011 | 000    | n.a.     |
|         | slli        | 0010011 | 001    | 000000   |
|         | xori        | 0010011 | 100    | n.a.     |
|         | srli        | 0010011 | 101    | 000000   |
|         | srai        | 0010011 | 101    | 010000   |
|         | ori         | 0010011 | 110    | n.a.     |
|         | andi        | 0010011 | 111    | n.a.     |
|         | jalr        | 1100111 | 000    | n.a.     |
|         | sb          | 0100011 | 000    | n.a.     |
| 0.000   | sh          | 0100011 | 001    | n.a.     |
| S-type  | SW          | 0100011 | 010    | n.a.     |
|         | sd          | 0100011 | 111    | n.a.     |
|         | beq         | 1100111 | 000    | n.a.     |
|         | bne         | 1100111 | 001    | n.a.     |
|         | blt         | 1100111 | 100    | n.a.     |
| SB-type | bge         | 1100111 | 101    | n.a.     |
|         | bltu        | 1100111 | 110    | n.a.     |
|         | bgeu        | 1100111 | 111    | n.a.     |
| U-type  | lui         | 0110111 | n.a.   | n.a.     |
| UJ-type | jal         | 1101111 | n.a.   | n.a.     |
| gpo     | , , ,       |         | 11101  |          |

#### RV16/RV32/RV64的操作码域: 7位



- Can support variable-length instructions.
- Base instruction set (RV32) always has fixed 32-bit instructions lowest two bits = 11<sub>2</sub>
- All branches and jumps have targets at 16-bit granularity (even in base ISA where all instructions are fixed 32 bits)

#### RV示例:指令格式,寻址方式,图2-6

| R-type Instructions   | funct7         | rs2    | rs1   | funct3 | rd             | opcode  | Example           |
|-----------------------|----------------|--------|-------|--------|----------------|---------|-------------------|
| add (add)             | 0000000        | 00011  | 00010 | 000    | 00001          | 0110011 | add x1, x2, x3    |
| sub (sub)             | 0100000        | 00011  | 00010 | 000    | 00001          | 0110011 | sub x1, x2, x3    |
| I-type Instructions   | immediate      |        | rs1   | funct3 | rd             | opcode  | Example           |
| addi (add immediate)  | 001111         | 101000 | 00010 | 000    | 00001          | 0010011 | addi x1, x2, 1000 |
| ld (load doubleword)  | 001111         | 101000 | 00010 | 011    | 00001          | 0000011 | ld x1, 1000 (x2)  |
| S-type Instructions   | immed<br>-iate | rs2    | rs1   | funct3 | immed<br>-iate | opcode  | Example           |
| sd (store doubleword) | 0011111        | 00001  | 00010 | 011    | 01000          | 0100011 | sd x1, 1000(x2)   |

- 汇编指令寻址方式表示
  - 寄存器寻址【名】,立即数寻址【二/十/16进制】,基址寻址【1000(x2)】
  - **算逻**指令均为寄存器寻址,load/store为基址寻址
- 机器指令与汇编指令中源操作数和目的操作数的位置对应关系
  - 汇编指令: x2/x3为源操作数rs1/rs2, x1为目的操作数rd
  - 注意S-type: x1=rs2(源), x2=rs1(基址), rs2 => mem[rs1+1000]

## \$zero: x0寄存器, \$2.3.2

- x0固定为"0"
- data move: reg-reg

| Name    | Register<br>number | Usage                          | Preserved on call? |
|---------|--------------------|--------------------------------|--------------------|
| x0      | 0                  | The constant value 0           | n.a.               |
| x1 (ra) | 1                  | Return address (link register) | yes                |
| x2 (sp) | 2                  | Stack pointer                  | yes                |
| x3 (gp) | 3                  | Global pointer                 | yes                |
| x4 (tp) | 4                  | Thread pointer                 | yes                |
| x5-x7   | 5–7                | Temporaries                    | no                 |
| x8-x9   | 8–9                | Saved                          | yes                |
| x10-x17 | 10–17              | Arguments/results              | no                 |
| x18-x27 | 18–27              | Saved                          | yes                |
| x28-x31 | 28–31              | Temporaries                    | no                 |

```
add v0,s0,sero \# returns f (v0 = so + 0)
```

• 寄存器赋值

```
addi $v0,$zero,1 # return 1
```

Compare

```
slti $t0,$a0,1 # test for n < 1
beq $t0,$zero,L1 # if n >= 1, go to L1
```

• Goto: beq x0, x0, Exit

### 位扩展:短立即数=>32位立即数,\$2.4

- 位扩展: 高位填充, 从较小的数据类型转换成较大的类型
  - 无符号扩展(zero extension): 高位补0
    - 逻辑运算
  - 符号扩展(sign extension): 高位补1,补码
    - 算术运算, 地址偏移
- 需求: I/S/SB-type, 短立即数12位=>32位
  - addi \$s3,\$s3,4; \$s3 = \$s3 + 4
  - lw \$t1, offset(\$t2); \$t1=M[\$t2+offset]
  - beq \$1, \$3, 7; if(\$1=\$3)then goto nPC+7, else not taken.



| I-type | immediate[11:0] |     | rs1 | funct3 | rd         | opcode |
|--------|-----------------|-----|-----|--------|------------|--------|
| S-type | immed[11:5]     | rs2 | rs1 | funct3 | immed[4:0] | opcode |

#### 32位常数, "双指令序列"法

- 计算: 大立即数, \$2.10.1
  - 取20位立即数: 取左移12位后的20位立即数
    - lui: load upper immediate, U-type
      - 加载20位立即数到[31:12], [11:0]=0
      - 例: lui x5, 0x12345; x5=0x1234 5000,
  - +低12位
    - · addi: I-type
- 长跳转: 寻址32位地址空间, \$2.10.2
  - lui: 取高20位——\$2.18的auipc?
    - 例: lui x5, 0x12345;
  - jalr: jump & link reg, I-type
    - 高20位+低12位,——基于x5间接跳转,不是PC相对寻址!
    - 例: jalr x1, 100(x5); x1=PC+4, goto x5+100

| U-type | immediate[31    | rd  | opcode |    |        |
|--------|-----------------|-----|--------|----|--------|
| I-type | immediate[11:0] | rs1 | funct3 | rd | opcode |

## 转移指令的寻址方式\$2.7, 2.10.2, 2.8

- 两类转移
  - 分支指令: if, for, while, case, goto
    - 条件分支: beq rs1, rs2, L1; PC相对, 12位offset
    - 无条件分支: 可多种方式
      - jal x0, Label; J-type, PC相对, 20位offset
      - jalr x0, 100(x5); l-type, 间接转移, 12位offset
      - beq x0, x0, Loop; B-type
  - 过程调用: x1为返回地址ra
    - Calling: jal x1, ProcedureAddress; NPC=>ra, 转
    - Return: jalr x0, 0(x1); "间接跳转",基于x1而非PC
- 转移范围: near (12位, 20位), far (32位)
  - 远程转移: 32位(lui高20位,jalr低12位)

| imm[12] | imm[10:5] | rs2  | 2       | rs1   | funct3 | imm[4:1] | imm[11] | opcode | B-type |
|---------|-----------|------|---------|-------|--------|----------|---------|--------|--------|
| imm[20] | imm[1     | 0:1] | imm[11] | imm[1 | 9:12]  | r        | d       | opcode | J-type |
|         | imm[1     | 1:0] |         | rs1   | funct3 | r        | d       | opcode | I-type |

### 其他RV指令(了解),\$2.18,\$5.14

图2-3714条

| Instruction                           | Name  | Format | Description                                             |
|---------------------------------------|-------|--------|---------------------------------------------------------|
| Add upper immediate to PC             | auipc | U      | Add 20-bit upper immediate to PC; write sum to register |
| Set if less than                      | slt   | R      | Compare registers; write Boolean result to register     |
| Set if less than, unsigned            | sltu  | R      | Compare registers; write Boolean result to register     |
| Set if less than, immediate           | slti  | I      | Compare registers; write Boolean result to register     |
| Set if less than immediate, unsigned  | sltiu | I      | Compare registers; write Boolean result to register     |
| Add word                              | addw  | R      | Add 32-bit numbers                                      |
| Subtract word                         | subw  | R      | Subtract 32-bit numbers                                 |
| Add word immediate                    | addiw | I      | Add constant to 32-bit number                           |
| Shift left logical word               | sllw  | R      | Shift 32-bit number left by register                    |
| Shift right logical word              | srlw  | R      | Shift 32-bit number right by register                   |
| Shift right arithmetic word           | sraw  | R      | Shift 32-bit number right arithmetically by register    |
| Shift left logical word immedate      | slliw | I      | Shift 32-bit number left by immediate                   |
| Shift right logical word immediate    | srliw | I      | Shift 32-bit number right by immediate                  |
| Shift right arithmetic word immediate | sraiw | I      | Shift 32-bit number right arithmetically by immediate   |

- lui, auipc
- Ir.d, sc.d; \$2.11
- 图5-47, 系统指令, 13条
  - 同步,CSR访问,系统调用
  - CSR: 控制和状态寄存器

| Туре          | Mnemonic   | Name                        |
|---------------|------------|-----------------------------|
|               | FENCE.I    | Instruction Fence           |
| Mem. Ordering | FENCE      | Fence                       |
|               | SFENCE.VMA | Address Translation Fence   |
| CSR Access    | CSRRWI     | CSR Read/Write Immediate    |
|               | CSRRSI     | CSR Read/Set Immediate      |
|               | CSRRCI     | CSR Read/Clear Immediate    |
|               | CSRRW      | CSR Read/Write              |
|               | CSRRS      | CSR Read/Set                |
|               | CSRRC      | CSR Read/Clear              |
| System        | ECALL      | Environment Call            |
|               | EBREAK     | Environment Breakpoint      |
|               | SRET       | Supervisor Exception Return |
|               | WFI        | Wait for Interrupt          |

## 示例: COD4图B.1.5

```
#include <stdio.h>
int
main (int argc, char *argv[])
{
   int i;
   int sum = 0;

   for (i = 0; i <= 100; i = i + 1) sum = sum + i * i;
   printf ("The sum from 0 .. 100 is %d\n", sum);
}</pre>
```

# 示例: COD4图B.1.4

28(\$sp)

```
.text
.align 2
.globl main

subu $sp, $sp, 32
sw $ra, 20($sp)
sd $a0, 32($sp)
sw $0, 24($sp)
```

loop:

str:

SW

main:

| ir \$ra |  | lw \$ra, 20(\$sp)<br>addu \$sp, \$sp, 32<br>jr \$ra | mul<br>lw<br>addu<br>sw<br>ble<br>la<br>lw<br>jal<br>move<br>lw<br>addu | \$sp, \$sp, 32 |
|---------|--|-----------------------------------------------------|-------------------------------------------------------------------------|----------------|
|---------|--|-----------------------------------------------------|-------------------------------------------------------------------------|----------------|

\$0,

.data .align 0

| Name (ABI name) 🏻  | Reg# ₽  | Usage ₽                          |
|--------------------|---------|----------------------------------|
| x0 (zero) 🐶        | 0 0     | The constant value 0 ₽           |
| x1 (ra) 🔑          | 1 ₽     | Return address (link register) - |
| x2 (sp) 🔑          | 2 ₽     | Stack pointer &                  |
| x3 (gp) 🐶          | 3 ₽     | Global pointer &                 |
| x4 (tp) 🐶          | 4 ₽     | Thread pointer ₽                 |
| x5-x7 (t0-t2) -    | 5–7 ₽   | Temporaries ₽                    |
| x8-x9 (fp/s0-s1) - | 8-9 ₽   | Frame pointer, Saved register    |
| x10-x17 (a0-a7) .  | 10–17 ₽ | Arguments(a2-a7)/results(a0, a1) |
| x18-x27 (s2-s11) - | 18–27 ₽ | Saved register                   |
| x28-x31 (t3-t6) -  | 28–31 ₽ | Temporaries ₽                    |
|                    |         |                                  |

Stack

Dynamic data

Static data

Text

Reserved

.asciiz "The sum from 0 .. 100 is %d\n"

#### 过程调用: 断点, 传参, 现场保存(两种), \$2.8



# 过程调用procedure calling

- 步骤:约定被调函数保存状态
  - Caller
    - 参数传递: 将参数放在子过程可访问的位置: 寄存器/栈/内存
    - 控制转移: Call子过程——原子操作
      - 保存断点 (nPC): 返回点
      - 将控制交给子过程: 使PC指向子过程入口 调用程序caller (当前程序)
  - Callee

• 保存现场:将过程内须使用的reg入栈(push)

- 计算,并将结果放在caller可以访问的位置
- 恢复现场: 出栈 (pop)
- 子过程Return: 返回Caller的返回点(断点)
  - 将控制交回调用程序: PC = nPC
- 保留寄存器
- 控制转移指令: call/return
- 类型: 叶子过程, 嵌套过程, 递归过程



#### 单级call/return: 链接寄存器x1/返回地址ra





### RV architected registers: 保留寄存器

| Name    | Register<br>number | Usage                          | Preserved on call? |
|---------|--------------------|--------------------------------|--------------------|
| x0      | 0                  | The constant value 0           | n.a.               |
| x1 (ra) | 1                  | Return address (link register) | yes                |
| x2 (sp) | 2                  | Stack pointer                  | yes                |
| x3 (gp) | 3                  | Global pointer                 | yes                |
| x4 (tp) | 4                  | Thread pointer                 | yes                |
| x5-x7   | 5–7                | Temporaries                    | no                 |
| x8-x9   | 8–9                | Saved                          | yes                |
| x10-x17 | 10–17              | Arguments/results              | no                 |
| x18-x27 | 18–27              | Saved                          | yes                |
| x28-x31 | 28–31              | Temporaries                    | no                 |

• Preserved: 在函数调用中应保持不变

图2-14

## 数组排序: swap(), sort(), \$2.13

• 过程调用:参数传递, call/jal, return/jalr

```
swap:
void swap(int v[], size_t k)
                                                    x6, x11,(2) // reg x6 = k * 4
                                             slli
   int temp;
                                                    x6, x10, x6 // reg x6 = v + (k * 4)
                                             add
   temp = v[k]:
                                                    x5.0(x6)
                                                                 // \text{ reg x5 (temp)} = v[k]
                                             l w
   v[k] = v[k+1];
                                                    x7, 4(x6)
                                                                 // \text{ reg x7} = v[k + 1]
                                             ٦w
   v[k+1] = temp:
                                                    x7.0(x6)
                                                                 // v[k] = reg x7
                                             SW
                                                   x5, 4(x6) // v[k+1] = reg x5 (temp)
                                             SW
                                             jalr x0, 0(x1) // return to calling routine
void sort (int v[], size t int n)
  size_t i, j;
  for (i = 0; i < n; i += 1)
       for (j = i - 1; j \ge 0 \&\& v[j] > v[j + 1]; j -= 1) {
           swap(v.j):
```

```
Pass parameters and call
```

```
addi x10, x21, 0
addi x11, x20, 0
jal x1, swap
```

```
# first swap parameter is v
# second swap parameter is j
# call swap
```



## Use of Stack to Implement Nested **Procedures**



Addresses

4000

4100

4101

4500

4600 4601

4650

4651

Main memory

CALL Proc1

CALL Proc2

CALL Proc2

RETURN

Main

program

Procedure Proc1

(g) After

RETURN

### RV堆栈操作: push/pop, 图2-10

```
addi sp, sp, -24
                        // adjust stack to make room for 3 items
      x5, 16(sp)
                             // save register x5 for use afterwards
入栈
      sd x6, 8(sp)
                               // save register x6 for use afterwards
      x20, 0(sp)
                               // save register x20 for use afterwards
      ld x20, O(sp) // restore register x20 for caller
      ld x6, 8(sp) // restore register x6 for caller
出栈
      ld x5, 16(sp) // restore register x5 for caller
      addi sp, sp, 24 // adjust stack to delete 3 items
    High address
```



### RV的过程帧(栈帧),图2-12



• 参数,断点,保留寄存器,局部变量

## stack frame: 活动记录,帧指针fp



## RV calling conventions

• 传参和返回值: x10~x17

- a0-a1: 函数变量或返回值

- a2-a7: 函数变量

• 断点ra: x1

call

- jal x1, ProcessAddress; PC相对寻址

• jump-and-link: 跳转,并自动保存断点(nPC)至\$ra

图2-14

Preserved

on call?

n.a.

ves

yes

ves

no

yes

ves

Return

- jalr x0, 0(ra); 间接跳转

• jump register: 返回ra

• 状态(现场)保存策略: callee负责保存

图2-11

**Usage** 

Return address (link register)

The constant value 0

Stack pointer

Global pointer

Thread pointer

Arguments/results

**Temporaries** 

Temporaries

Saved

Saved

Register

number

0

8-9

10-17

18-27

28-31

Name

x1 (ra)

x2 (sp)

x3 (gp)

x4 (tp)

x5-x7

x8-x9

x10-x17

x18-x27

x28-x31

x0

| Preserved                       | Not preserved                       |
|---------------------------------|-------------------------------------|
| Saved registers: x8-x9, x18-x27 | Temporary registers: x5-x7, x28-x31 |
| Stack pointer register: x2(sp)  | Argument/result registers: x10-x17  |
| Frame pointer: x8(fp)           |                                     |
| Return address: x1(ra)          |                                     |
| Stack above the stack pointer   | Stack below the stack pointer       |

### 汇编语言程序设计要点:显式与约定

- 机器模型:对程序员显式可见
- ISA
  - 指令集
    - Move, ALU, 分支, I/O
    - 整数, *浮点*, <mark>伪指令</mark>(图2-40)
  - 寻址方式:操作数,目标指令
  - 寄存器使用约定
  - 内存分配:数据、代码、堆栈
- 程序结构
- 过程调用/系统调用约定
  - 堆栈,栈帧
- 可执行程序生成



### Policy of Use Conventions for registers

RV64/RV32, 图2-14(注意: 寄存器名)

| Name (ABI name) 4  | Reg# ₽  | Usage ₽                            | Preserved on call? |
|--------------------|---------|------------------------------------|--------------------|
| x0 (zero) ₽        | 0 43    | The constant value 0 ₽             | n.a ₽              |
| x1 (ra) ₽          | 1 ₽     | Return address (link register) ₽   | yes ₽              |
| x2 (sp) 🐶          | 2 ₽     | Stack pointer -                    | yes ₽              |
| x3 (gp) 🐶          | 3 ₽     | Global pointer ₽                   | yes ₽              |
| x4 (tp) 🐶          | 4 ₽     | Thread pointer ₽                   | yes ₽              |
| x5-x7 (t0-t2) -    | 5–7 ₽   | Temporaries ₽                      | no ₽               |
| x8-x9 (fp/s0-s1) - | 8-9 ₽   | Frame pointer, Saved register      | yes ₽              |
| x10-x17 (a0-a7) 🐶  | 10–17 ₽ | Arguments(a2-a7)/results(a0, a1) ₽ | no ₽               |
| x18-x27 (s2-s11) ₽ | 18–27 ₽ | Saved register ℯ                   | yes ₽              |
| x28-x31 (t3-t6) -  | 28–31 🏻 | Temporaries ₽                      | no 🕫               |

### RV内存分配约定,图2-13



P&W中, RV32I的pc从0x0001 0000开始

## System calls

- OS服务: various names
  - trap/exception, svc, soft interrupt
- Why: Certain operations require
  - specialized knowledge
    - I/O设备, PCIe总线, USB
  - protection: 多任务共享
- What
  - A special machine instruction that causes an softinterrupt/exception
    - 产生状态切换(protection): 需保存PSW
  - RV: 环境调用指令ecall。Ripes提供哪些服务?
  - x86系统调用(system calls): int16, int32
    - BIOS, Windows: 显示、键盘、磁盘、文件、打印机、时间



16M



### System call flow of control

- 1. User program invokes system call.
- Operating system code performs operation.
- 3. Returns control to user program.





2/3/4由软中断指令int完成!

2:保存断点和psw

3: psw赋0, 进入内核态

4: syscall入口赋给pc

5: 进入系统函数

6/7由中断返回指令iret完成

6: 恢复调用前状态

7: 从断点处继续执行

sys\_call\_table sys\_call\_ISR

ia: 指令地址寄存器

psw: 程序状态字寄存器

### 系统调用: x86调用门



### C语言和OS服务: APIs、libc、syscalls?



### RV汇编程序结构:《P&W》例

```
# Directive: enter text section
  .text
  .align 2
                                # Directive: align code to 2^2 bytes
  .globl main
                                # Directive: declare global symbol main
                                # label for start of main
main:
  addi sp,sp,-16
                                # allocate stack frame
                                                                #include <stdio.h>
                                                                int main()
       ra,12(sp)
                                # save return address
  SW
  lui a0,%hi(string1)
                                # compute address of
                                                                  printf("Hello, %s\n", "world");
  addi a0,a0,%lo(string1)
                                     string1
                                                                  return 0;
  lui a1, %hi(string2)
                                # compute address of
                                                                    SP - 0000 003f ffff fff0hex
                                                                                       Stack
  addi a1,a1,%lo(string2)
                                     string2
                                # call function printf
  call printf
     ra,12(sp)
                                # restore return address
  lw
                                                                                     Dynamic data
                                                                                      Static data
  addi sp,sp,16
                                # deallocate stack frame
                                                                    gp - 0000 0000 1000 0000hay
                                                                                       Text
  li
        a0.0
                                # load return value 0
                                                                    PC - 0000 0000 0040 0000<sub>hex</sub>
                                                                                      Reserved
                                # return
  ret
  .section .rodata
                                # Directive: enter read-only data section
  .balign 4
                                # Directive: align data section to 4 bytes
                                # label for first string
string1:
```

# label for second string

# Directive: null-terminated string

# Directive: null-terminated string

.string "Hello, %s!\n"

.string "world"

string2:

### Ripes汇编





### SPIM的系统调用: COD4附录B.9

- SPIM: MIPS-32仿真器
  - 汇编程序调试、执行
  - -标准设备I/O服务

#### COD-RV图5-47 syscall指令

| ECALL  | Environment Call            |
|--------|-----------------------------|
| EBREAK | Environment Breakpoint      |
| SRET   | Supervisor Exception Return |
| WFI    | Wait for Interrupt          |

- SYSCALL Step
  - \$v0=srv#
  - -\$a0~3=arg
  - syscall
  - \$v0=返回值
- Ripes类似

| Service                  | System Call<br>Code | Arguments                                                                                 | Result                     |
|--------------------------|---------------------|-------------------------------------------------------------------------------------------|----------------------------|
| print integer            | 1                   | \$a0 = value                                                                              | (none)                     |
| print float              | 2                   | \$f12 = float value                                                                       | (none)                     |
| print double             | 3                   | \$f12 = double value                                                                      | (none)                     |
| print string             | 4                   | \$a0 = address of string                                                                  | (none)                     |
| read integer             | 5                   | (none)                                                                                    | \$v0 = value read          |
| read float               | 6                   | (none)                                                                                    | \$f0 = value read          |
| read double              | 7                   | (none)                                                                                    | \$f0 = value read          |
| read string              | 8                   | \$a0 = address where string to be<br>stored<br>\$a1 = number of characters to<br>read + 1 | (none)                     |
| memory<br>allocation     | 9                   | \$a0 = number of bytes of storage desired                                                 | \$v0 = address of<br>block |
| exit (end of<br>program) | 10                  | (none)                                                                                    | (none)                     |
| print character          | 11                  | \$a0 = integer                                                                            | (none)                     |
| read character           | 12                  | (none)                                                                                    | char in \$v0               |

## Bubble sort (trace)

| A[0] | A[1] | A[2] | A[3] | A[4] |
|------|------|------|------|------|
| 3    | 4    | 10   | 5    | 3    |

| A[0] | A[1] | A[2] | A[3] | A[4] |
|------|------|------|------|------|
| 3    | 3    | 4    | 5    | 10   |

#### Basic idea:

- (1)  $j \leftarrow n 1$  (index of last element in A)
- ② If A[j] < A[j-1], swap both elements ③
- ③  $j \leftarrow j 1$ , goto ② if j > 0
- ④ Goto ① if a swap occurred

② A[0] A[1] A[2] A[3] A[4] 3 4 10 3  $\leftrightarrow$  5

 A[0]
 A[1]
 A[2]
 A[3]
 A[4]

 3
 4
 10
 3
 5

 A[0]
 A[1]
 A[2]
 A[3]
 A[4]

 3
 4
 3
 10
 5

3 3 4 10 5

- Swap occured? (Yes, goto ①)
- ① A[0] A[1] A[2] A[3] A[4] 3 3 4 10 5

参考Ripes仿真器的"Console printing"代码?

## 可执行程序生成与执行

### High Level to Assembly,

图1-4

- High Level Lang (C, etc.)
- High-level language program (in C)

- Statements
- Variables
- Operators
- func, proc, methods
- Assembly Language
  - Instructions
  - Registers
  - Memory segments/sections
- Data Representation
- Number Systems

Assembly language program (for RISC-V)

```
swap(size_t v[], size_t k)
   size t temp:
   temp = v[k]:
    v[k] = v[k+1]:
    v[k+1] = temp;
   Compiler
swap:
      slli x6. x11. 3
           x6, x10, x6
           x5.0(x6)
           x7.8(x6)
           x7.0(x6)
           x5.8(x6)
      ialr x0. 0(x1)
  Assembler
```

Binary machine language program (for RISC-V) 

### Program Development Process



- Implementation Phase
  - editor
  - Compilers
    - Cross compiler
      - Runs on one processor, but generates code for another
  - Assemblers
  - Linkers

#### Verification Phase

- Debuggers
- Profilers

### A translation hierarchy for C, FIG 2.20



### The Assembly Process: 生成.obj

- Assembler translates source file to object code (common object file format, COFF)
  - Recognizes mnemonics for OP codes
  - Interprets addressing modes for operands
  - Recognizes directives that define constants and allocate space in memory for data
  - Labels and names placed in symbol table
- 关键问题: Consider forward branch to label in program
  - Offset cannot be found without target address
- Let assembler make two passes over program
  - 1<sup>st</sup> pass: generate all machine instructions, and enter labels/addresses into symbol table
    - Some instructions incomplete but sizes known
  - 2<sup>nd</sup> pass: calculate unknown branch offsets using address information in symbol table



# .obj与Symbol Table



符号表: 全局定义和外部引用

directives: 内存地址指针 Labels: 程序地址标号 names: 段名,变量名

```
.text
  .align 2
  .globl main
main:
  addi sp,sp,-16
  sw ra, 12(sp)
  lui a0,%hi(string1)
  addi a0,a0,%lo(string1)
  lui a1,%hi(string2)
  addi a1,a1,%lo(string2)
  call printf
       ra, 12(sp)
  lw
  addi sp, sp, 16
  li
       a0,0
  ret
  .section .rodata
  .balign 4
string1:
  .string "Hello, %s!\n"
string2:
  .string "world"
```

### The Linker: 合并各段

- Combines object files into object program (exe)
  - Constructs map of full program in memory using length information in each object file
    - Map determines addresses of all names
  - Instructions referring to external names are finalized with addresses determined by map
- Libraries: Subroutines
  - includes name information to aid in resolving references from calling program

## Linking and Locating



### ELF 格式目标文件结构



### Loading/Executing Object Programs

- 将映像文件从磁盘加载到内存
  - 读取文件头来确定各段大小
  - 创建虚拟地址空间
  - 将代码和初始化的数据复制到内存中
    - 或设置页表项来处理缺页
  - 在栈上建立参数
  - 初始化寄存器(包括sp、fp、gp)
  - 跳转到启动例程
    - 将参数复制到x10等等并调用main函数
    - 当main函数返回时,进行exit系统调用

## ISA分类与进化

### 影响早期ISA设计的因素

- 内存小而慢,能省则省
  - 某个完整系统只需几K字节
  - 指令长度不等、执行多个操作的指令
- 寄存器贵,少
  - 操作基于存储器
  - 多种寻址方式
- 编译技术尚未出现
  - 程序是以机器语言或汇编语言设计
  - 当时的看法是硬件比编译器更易设计
    - 为了便于编写程序, 计算机架构师造出越来越复杂的指令, 完成高级程序语言直接表达的功能
    - 进化中的痕迹: X86中的串操作指令



### 机器结构与ISA分类

- 指令格式和寻址方式越复杂,则越灵活高效
  - 性能: 操作数的存放位置(访存是瓶颈)
  - 权衡: 硬件设计复杂度、指令系统的兼容性
- 机器结构: processor designer view
  - stack
  - Accumulator
  - register-mem
  - register-register
- ISA分类: programmer/compiler view
  - CISC: 以机器指令实现高级语言功能 (reg-mem)
  - RISC: 采用load/store体系,运算基于寄存器(reg-reg)
  - VLIW:兼容性差,硬件简单,低功耗

### ISA Classes (processor designer view)



| Stack  | Accumulator | Register<br>(register-memory) | Register<br>(load-store) |
|--------|-------------|-------------------------------|--------------------------|
| Push A | Load A      | Load R1,A                     | Load R1,A                |
| Push B | Add B       | Add R3,R1,B                   | Load R2,B                |
| Add    | Store C     | Store R3,C                    | Add R3,R1,R2             |
| Pop C  |             |                               | Store R3,C               |

H&P第6版: 附录A,图A.1,四类ISA中操作数的位置

# ISA分类 (programmer prospective)

- CISC: 硬件换性能! —上千条指令
  - 以机器指令实现高级语言功能
  - 指令译码复杂
    - 指令格式、字长不一(x86从1byte~6bytes)
    - 寻址方式多
  - 访存开销大:寄存器少,任何指令都可以访存
- RISC: 简化硬件, 优化常用操作! —两百条指令
  - 指令字长固定,格式规则,种类少,寻址方式简单
  - 减少访存,设置大量通用寄存器,运算基于寄存器
    - 为了提高性能,需要减少访存次数,因此寄存器寻址性能最高。
    - 采用load/store体系,只有load/store指令访存。
  - 采用Superscalar、Superpipeling等技术,提高IPC
- VLIW: 空间换时间,低功耗,兼容性差

### The CISC's eight principles:

- Instructions are of variable format.
- There are multiple instructions and addressing modes.
- Complex instructions take many different cycles.
- Any instruction can reference memory.
- There is a single set of registers.
- No instructions are pipelined.
- A microprogram is executed for each native instruction.
- Complexity is in the microprogram and hardware.

## X86指令格式,图2-35



### Growth of x86 instruction set over time



### X86 Instruction Distribution

| Rank | 80x86 instruction      | Integer average<br>(% total executed) |
|------|------------------------|---------------------------------------|
| 1    | load                   | 22%                                   |
| 2    | conditional branch     | 20%                                   |
| 3    | compare                | 16%                                   |
| 4    | store                  | 12%                                   |
| 5    | add                    | 8%                                    |
| 6    | and                    | 6%                                    |
| 7    | sub                    | 5%                                    |
| 8    | move register-register | 4%                                    |
| 9    | call                   | 1%                                    |
| 10   | return                 | 1%                                    |
| То   | tal                    | 96%                                   |

### RISC的理论基础

计算机指令代码的80 / 20规律



## The RISC's eight principles:

- Fixed-format instructions.
- Few instructions and addressing modes.
- Simple instructions taking one clock cycle.
- LOAD/STORE architecture to reference memory.
- Large multiple-register sets.
- Highly pipelined design.
- Instructions executed directly by hardware.
- Complexity handled by the compiler and software.

## RV指令分布,图2-41

#### SPEC CPU2006

|                   |                                        |                                              | Frequ   | iency   |
|-------------------|----------------------------------------|----------------------------------------------|---------|---------|
| Instruction class | RISC-V examples                        | HLL correspondence                           | Integer | Fl. Pt. |
| Arithmetic        | add, sub, addi                         | Operations in assignment statements          | 16%     | 48%     |
| Data transfer     | ld, sd, lw, sw, lh,<br>sh, lb, sb, lui | References to data structures in memory      | 35%     | 36%     |
| Logical           | and, or, xor, sll,<br>srl, sra         | Operations in assignment statements          | 12%     | 4%      |
| Branch            | beq, bne, blt, bge,<br>bltu, bgeu      | If statements; loops                         | 34%     | 8%      |
| Jump              | jal, jalr                              | Procedure calls & returns; switch statements | 2%      | 0%      |

#### CISC machine vs RISC machine

- Instructions are of variable format.
- There are multiple instructions and addressing modes.
- Complex instructions take many different cycles.
- Any instruction can reference memory.
- There is a single set of registers.
- No instructions are pipelined.
- A microprogram is executed for each native instruction.
- Complexity is in the microprogram and hardware.

- Fixed-format instructions.
- Few instructions and addressing modes.
- Simple instructions taking one clock cycle.
- LOAD/STORE architecture to reference memory.
- Large multiple-register sets.
- Highly pipelined design.
- Instructions executed directly by hardware.
- Complexity handled by the compiler and software.

#### Intel处理器架构演化(1995—2015 年)



## MIPS is simple, elegant.

- 无互锁流水的微处理器
  - Microprocessor w/o Interlocked Piped Stages
  - interlock单元: 数据依赖问题
    - 互锁: 检测,推迟后续指令执行(互锁状态)
    - 无互锁: 尽量利用软件办法避免
    - 无互锁困难,低效: R4000以后使用interlock
- Patterson提出RISC指令集, 1980
  - 4个Design Principles
    - Simplicity favors regularity
    - Smaller is faster.
    - Make the common case fast
    - Good design demands good compromises
- Hennessy完成MIPS, 1983?
  - 第一个RISC处理器



Most HP LaserJet workgroup printers are driven by MIPS-based™ 64-bit processors.





1983







2013

RISC-IV (SPUR)

#### RV vs. MIPS: 指令格式,图2.29



差别: 1)操作码位数与位置; 2)立即数位数与位置; 2)rs/rd位置。

## ISA: A Minimalist Perspective

- ISA design decisions must take into account:
  - technology
  - machine organization
  - programming languages
  - compiler technology
  - operating systems
- · "最小"计算机?——快速原型◎,ABC
  - -A: 由哪些部件构成?
  - B: 需要哪几条指令?需要哪些寻址方式?执行过程?
  - C:



#### OISC: the one instruction set computer

- OISC: the ultimate reduced instruction set computer
  - 一条SBN指令: substract and branch if negative
    - subleq a, b, c; Mem[b] = Mem[b] Mem[a], if (Mem[b]  $\leq$  0) goto c
- 应用: 嵌入式处理器
  - 硬件极其简单
    - 程序员有充分的控制权
    - 优化由编译器完成
  - 灵活
    - 其他"指令"都可由该指令构造
    - 意味着用户可自定义指令集
    - 意味着可适用于任何领域
  - 低功耗



## 小结

- 作业
  - -2.9, 2.24, 2.35, 2.40
- 思考(选一)
  - CPU的ISA要定义哪些内容? 见Yale Patt附录A
  - main()与swap()状态保存异同?
  - Windows系统中可执行程序的格式?
- 实验报告: 2周
  - 基于RV汇编,设计一个冒泡排序程序,并用 Ripes工具调试执行。
  - 可选:测量冒泡排序程序的执行时间。





#### 8086处理器逻辑结构



## 汇编语言计算机设计思路

- 快速原型
  - 概念、原型、产品
    - 构建、评估、求精
  - 冯机模型
    - 命令式,顺序执行
  - 自顶向下, 自内而外
    - CPU + MEM
- Processor/CPU
  - ISA
    - 指令: 原子操作
  - uArch
    - 数据通路,控制器





## 边界对准问题(Memory

- Alignment)

   为了便于硬件实现,通常要求多字节的数据在存 储器的存放方式能满足"边界对准"的要求。
- 字对齐: 左移两位, 按字访问

存储器

|           | 字 (       | 在对准边界的32 | 0                  |    |
|-----------|-----------|----------|--------------------|----|
|           | 字 (       | 地址4)     | 位字长的计算机            | 4  |
| 字节(地址11)  | 字节 (地址10) | 字节(地址9)  | 中,半字地址是            | 8  |
| 字节 (地址15) | 字节(地址14)  | 字节(地址13  | 2的整数倍,字<br>地址是4的整数 | 12 |
| 半字(比      | 也址18)     | 半        | 倍,双字地址是            | 16 |
| 半字(比      | 也址22)     | 半        |                    | 20 |
|           | 双字(       | 地址24)    | 所存数据不能满            | 24 |
|           | >         | 足此要求时,可  | 28                 |    |
|           | 双字(       | 填充一个至多个  | 32<br>36           |    |
|           |           | 双字       | 空白字节。              | 30 |

地址(十进制

#### RV Arch'ed Regs

| Name    | Register<br>number | Usage                          | Preserved on call? |
|---------|--------------------|--------------------------------|--------------------|
| x0      | 0                  | The constant value 0           | n.a.               |
| x1 (ra) | 1                  | Return address (link register) | yes                |
| x2 (sp) | 2                  | Stack pointer                  | yes                |
| x3 (gp) | 3                  | Global pointer                 | yes                |
| x4 (tp) | 4                  | Thread pointer                 | yes                |
| x5-x7   | 5–7                | Temporaries                    | no                 |
| x8-x9   | 8–9                | Saved                          | yes                |
| x10-x17 | 10–17              | Arguments/results              | no                 |
| x18-x27 | 18–27              | Saved                          | yes                |
| x28-x31 | 28–31              | Temporaries                    | no                 |

| Register | ABI Name | Description                       | Saver  |
|----------|----------|-----------------------------------|--------|
| х0       | zero     | Hard-wired zero                   | _      |
| x1       | ra       | Return address                    | Caller |
| x2       | sp       | Stack pointer                     | Callee |
| x3       | gp       | Global pointer                    | _      |
| x4       | tp       | Thread pointer                    | _      |
| x5       | t0       | Temporary/alternate link register | Caller |
| x6-7     | t1-2     | Temporaries                       | Caller |
| x8       | s0/fp    | Saved register/frame pointer      | Callee |
| x9       | s1       | Saved register                    | Callee |
| x10-11   | a0-1     | Function arguments/return values  | Caller |
| x12-17   | a2-7     | Function arguments                | Caller |
| x18-27   | s2-11    | Saved registers                   | Callee |
| x28-31   | t3-6     | Temporaries                       | Caller |

RV 图2-14

Table 25.1

过程调用时可用a0~a7, s0~s11, t0~t6

The RISC-V Instruction Set Manual Volume I

ABI calling convention: ——减少代码量,提升性能

- a/t-registers are caller-saved
- s-regs are callee-saved and preserve their contents across function calls

## 例: Byte Ordering & Memory Alignment 大众点评 笔试中的一道题(也是全国考研题):

计算机存储器按字节编址,采用小端方式存放数据。假定int型和short型长度分别为32位和16位,并且数据按边界对齐存储。某C语言程序段如下:

```
struct {
    int a;
    char b;
    short c;
} record;
record.a=273;
```

若record首地址为0xC008,则地址0xC008中内容及record.c的地址是

A. 0x00, 0xC00D

B . 0x11, 0xC00E

C . 0x11, 0xC00D

D. 0x00, 0xC00E

| Addressing mode    | Example instruction | Meaning                                                                                                                                                                                                      | When used                                                                                                                             |
|--------------------|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| Register           | Add R4,R3           | Regs[R4] ← Regs[R4]<br>+ Regs[R3]                                                                                                                                                                            | When a value is in a register.                                                                                                        |
| Immediate          | Add R4,#3           | Regs[R4] $\leftarrow$ Regs[R4] + 3                                                                                                                                                                           | For constants.                                                                                                                        |
| Displacement       | Add R4,100(R1)      | Regs[R4] ← Regs[R4]<br>+ Mem[100+Regs[R1]]                                                                                                                                                                   | Accessing local variables<br>(+ simulates register indirect,<br>direct addressing modes).                                             |
| Register indirect  | Add R4,(R1)         | Regs[R4] ← Regs[R4]<br>+ Mem[Regs[R1]]                                                                                                                                                                       | Accessing using a pointer or a computed address.                                                                                      |
| Indexed            | Add R3,(R1+R2)      | Regs[R3] ← Regs[R3]<br>+ Mem[Regs[R1]+Regs[R2]]                                                                                                                                                              | Sometimes useful in array addressing: R1 = base of array; R2 = index amount.                                                          |
| Direct or absolute | Add R1,(1001)       | Regs[R1] ← Regs[R1]<br>+ Mem[1001]                                                                                                                                                                           | Sometimes useful for accessing static data; address constant may need to be large.                                                    |
| Memory indirect    | Add R1,@(R3)        | Regs[R1] ← Regs[R1]<br>+ Mem[Mem[Regs[R3]]]                                                                                                                                                                  | If R3 is the address of a pointer $p$ , then mode yields $*p$ .                                                                       |
| Autoincrement      | Add R1,(R2)+        | $\begin{array}{c} Regs [R1] \; \leftarrow \; Regs [R1] \\  + \; Mem [Regs [R2]] \\ Regs [R2] \; \leftarrow \; Regs [R2] \; + \; d \end{array}$                                                               | Useful for stepping through arrays within a loop. R2 points to start of array; each reference increments R2 by size of an element, d. |
| Autodecrement      | Add R1,-(R2)        | $\begin{array}{l} \operatorname{Regs}[R2] \leftarrow \operatorname{Regs}[R2] - d \\ \operatorname{Regs}[R1] \leftarrow \operatorname{Regs}[R1] \\ + \operatorname{Mem}[\operatorname{Regs}[R2]] \end{array}$ | Same use as autoincrement. Autodecrement/-increment can also act as push/pop to implement a stack.                                    |
| Scaled             | Add R1,100(R2)[R3]  | Regs[R1] ← Regs[R1]<br>+ Mem[100+Regs[R2]<br>+ Regs[R3]*d]                                                                                                                                                   | Used to index arrays. May be applied to any indexed addressing mode in some computers.                                                |

#### **Control Instruction Distribution**



Ex: beq \$s4, \$s5, Lab1

#### 分支比较的实现方式 (Branch Conditions)

| Name                   | Examples                                       | How condition is tested                                                 | Advantages                                    | Disadvantages                                                                                                                           |
|------------------------|------------------------------------------------|-------------------------------------------------------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| Condition<br>code (CC) | 80x86,<br>ARM,<br>PowerPC,<br>SPARC,<br>SuperH | Special bits are set by ALU operations, possibly under program control. | Sometimes condition is set for free.          | CC is extra state. Condition codes constrain the ordering of instructions since they pass information from one instruction to a branch. |
| Condition<br>register  | Alpha,<br>MIPS                                 | Tests arbitrary register with the result of a comparison.               | Simple.                                       | Uses up a register.                                                                                                                     |
| Compare<br>and branch  | PA-RISC,<br>VAX                                | Compare is part of the branch. Often compare is limited to subset.      | One instruction rather than two for a branch. | May be too much work per instruction for pipelined execution.                                                                           |

Condition: (Z, N, C, O)

MIPS的beq?

#### 8086标志寄存器:系统当前状态与控制

| 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6 | 5  | 4 | 3  | 2 | 1  |           |
|----|----|----|----|----|----|----|----|----|----|---|----|---|----|---|----|-----------|
|    |    |    |    | OF | DF | IF | TF | SF | ZF |   | AF |   | PF |   | CF | 8086/8088 |

| 标志位          |
|--------------|
| CF 进位 (有/否)  |
| PF 奇偶(偶/奇)   |
| AF 半进位       |
| ZF 全零(是/否)   |
| SF 符号 (负/正)  |
| IF 中断(允许/禁止) |
| DF 方向(増量/減量) |
| OF 溢出 (是/否)  |

| 标志位名称 | 值为 1 时的标记 | 值为 0 时的标记   |
|-------|-----------|-------------|
| OF    | OV        | NV          |
| SF    | NG        | PL          |
| ZF    | ZR        | NZ          |
| PF    | PE        | PO          |
| CF    | CY        | NC          |
| DF    | DN Zac    | rv Xiao7hen |



#### VAX11/780机器指令格式与编码

- 16位机,有16种寻址方式,共303条指令
  - -16个寄存器,指令字不定长(1~54bytes)
- 例:

```
op(4 bits) 方式(3 bits) Rn(3 bits) 方式(3 bits) Rn(3 bits)
```

- Add R2, R4; R2+R4 -> R4, "06 <u>0</u>2 04" (8进制)
- Add [R2], R1; [R2] + R1 -> R1, 06 <u>1</u>2 01 <sub>寄存器寻址</sub>
- Add #1000, R1; 1000+R1 -> R1

寄存器间接寻址

#### MIPS指令格式

- 100余条指令(COD中33条),共32个通用寄存器
- 指令格式: 定长32位
  - R-type: arithmetic instruction
  - I-type: data transfer
  - J-type: branch instruction(conditional & unconditional)
- MIPS: Simple (<=regularity) = Elegant!!!</li>

| R-type<br>reg-reg       | op(6 bits) | rs(5 bits)    | rt(5 bits)                          | rd(5 bits)         | shamt(5 bits) | funct(6 bits) |  |  |
|-------------------------|------------|---------------|-------------------------------------|--------------------|---------------|---------------|--|--|
|                         | op(6 bits) | rs(5 bits)    | rt(5 bits)                          | immediate(16 bits) |               |               |  |  |
| l-type                  | (2.1.1.)   | <b>/</b> .    |                                     |                    |               |               |  |  |
| I-type<br><i>reg-mm</i> | op(6 bits) | rs(5 bits)    | rt(5 bits)                          | addr(16 bits)      |               |               |  |  |
| J-type                  | op(6 bits) | rs(5 bits)    | rs(5 bits) rt(5 bits) addr(16 bits) |                    |               |               |  |  |
|                         | op(6 bits) | addr(26 bits) |                                     |                    |               |               |  |  |

## MIPS寻址模式(3+2)

- 立即寻址: I-type
- 寄存器寻址: R-type
- 基址寻址: I-type
- PC相对寻址: J-type
- 伪直接寻址: J-type
  - pseudo-direct addressing
  - 26位形式地址左移2位, 与PC的高4位拼接



## MIPS 指令示例

add \$t0,\$s1,\$s2

| ор     | rs     | rt     | rd     | shamt  | funct  |
|--------|--------|--------|--------|--------|--------|
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
| 0      | 17     | 18     | 8      | 0      | 32     |
| 000000 | 10001  | 10010  | 01000  | 00000  | 100000 |

| s11 \$ | t2,\$s0,4 | # reg \$t2 | = reg \$s0 | << 4 bits |       |
|--------|-----------|------------|------------|-----------|-------|
| ор     | rs        | rt         | rd         | shamt     | funct |
| 0      | О         | 16         | 10         | 4         | 0     |

## MIPS ISA的演变



#### Cortex-M0/M1/M3/M4指令集



#### TMS320C64x: VLIW



## TMS320C64x指令字



## ISA Classes (uArch prospective)

| Stack  | Accumulator | Register<br>(register-memory) | Register<br>(load-store) |
|--------|-------------|-------------------------------|--------------------------|
| Push A | Load A      | Load R1,A                     | Load R1,A                |
| Push B | Add B       | Add R3,R1,B                   | Load R2,B                |
| Add    | Store C     | Store R3,C                    | Add R3,R1,R2             |
| Pop C  |             |                               | Store R3,C               |

$$C = A + B$$

# Compact Code & Stack Architecture

- 60's,编译技术发挥寄存器的效率很困难, 因此某些设计者完全放弃寄存器,而采用 堆栈执行模型。
- 基于堆栈的操作可以有效的缩短指令字长,可以减少存储和传输的开销
  - JAVA虚拟机即采用堆栈模型
  - FORTH,仪器控制语言
- Stack在哪儿?

#### Accumulator Architectures

- 早期硬件太昂贵, 所以使用的寄存器很少。
  - -实际上只有一个用于算术指令的寄存器——被称为"Accumulator",所以这种结构被称为
    - "Accumulator Architectures"
    - 如EDSAC(第一台存储程序计算机,1949)。
- 采用 "memory-based operand-addressing mode"。
  - 进化:采用一些专用寄存器,如数组指针、堆栈 指针、乘除运算寄存器等
    - 如x86, 称为 "extended accumulator arch"

# General-Purpose Register Architecture

- 通用: 所有寄存器可以用于任何目的
- 两类
  - Register-memory Arch
    - 允许一个操作数在内存中
    - x86: Register-memory
  - Load-store or register-register architecture
    - 要求所有操作数都在寄存器中
    - MIPS: Load-store

#### Transport triggered architecture

- originally called a "move machine"
  - 程序直接控制处理器内部的传输总线
- uses only the MOVE instruction
  - 计算: 由memory-mapped ALU完成
    - 向功能单元的触发端口写数据将触发功能单元完成计算
      - 故曰transport triggered
  - 跳转: 由memory-mapped PC完成
  - 适于实现ASIP
- 上市产品: MAXQ微控制器
  - 适合数模混合应用场景

## TTA的典型结构——冯机?



#### SPIM的系统调用:附录B.9

- SPIM: MIPS-32仿真器
  - 汇编程序调试、执行
  - -标准设备I/O服务
- SYSCALL Step
  - \$v0=srv#
  - \$a0~3=arg
  - syscall
  - \$v0=返回值

| Service System Call Code Argume |    | Arguments                                                                                 | Result                     |  |
|---------------------------------|----|-------------------------------------------------------------------------------------------|----------------------------|--|
| print integer                   | 1  | \$a0 = value                                                                              | (none)                     |  |
| print float                     | 2  | \$f12 = float value                                                                       | (none)                     |  |
| print double                    | 3  | \$f12 = double value                                                                      | (none)                     |  |
| print string                    | 4  | \$a0 = address of string                                                                  | (none)                     |  |
| read integer                    | 5  | (none)                                                                                    | \$v0 = value read          |  |
| read float                      | 6  | (none)                                                                                    | \$f0 = value read          |  |
| read double                     | 7  | (none)                                                                                    | \$f0 = value read          |  |
| read string                     | 8  | \$a0 = address where string to be<br>stored<br>\$a1 = number of characters to<br>read + 1 | (none)                     |  |
| memory<br>allocation            | 9  | \$a0 = number of bytes of storage desired                                                 | \$v0 = address of<br>block |  |
| exit (end of<br>program)        | 10 | (none)                                                                                    | (none)                     |  |
| print character                 | 11 | \$a0 = integer                                                                            | (none)                     |  |
| read character                  | 12 | (none)                                                                                    | char in \$v0               |  |

#### MARS模拟器



|            | Category              | Instruction                         | Example             | Meaning                                            | Comments                              |
|------------|-----------------------|-------------------------------------|---------------------|----------------------------------------------------|---------------------------------------|
|            |                       | add                                 | add \$s1,\$s2,\$s3  | \$s1 = \$s2 + \$s3                                 | Three register operands               |
|            | Arithmetic            | subtract                            | sub \$s1,\$s2,\$s3  | \$s1 = \$s2 - \$s3                                 | Three register operands               |
|            | ]                     | add immediate                       | addi \$s1,\$s2,20   | \$s1 = \$s2 + 20                                   | Used to add constants                 |
|            |                       | load word                           | lw \$s1,20(\$s2)    | \$s1 = Memory[\$s2 + 20]                           | Word from memory to register          |
|            | ]                     | store word                          | sw \$s1,20(\$s2)    | Memory[ $$s2 + 20$ ] = $$s1$                       | Word from register to memory          |
|            |                       | load half                           | 1h \$s1,20(\$s2)    | s1 = Memory[s2 + 20]                               | Halfword memory to register           |
|            |                       | load half unsigned                  | 1hu \$s1,20(\$s2)   | \$s1 = Memory[\$s2 + 20]                           | Halfword memory to register           |
|            | ] 1                   | store half                          | sh \$s1,20(\$s2)    | Memory[ $$s2 + 20$ ] = $$s1$                       | Halfword register to memory           |
|            | Data<br>transfer      | load byte                           | 1b \$s1,20(\$s2)    | \$s1 = Memory[\$s2 + 20]                           | Byte from memory to register          |
| MP         | uansici               | load byte unsigned                  | 1bu \$s1,20(\$s2)   | \$s1 = Memory[\$s2 + 20]                           | Byte from memory to register          |
|            | ]                     | store byte                          | sb \$s1,20(\$s2)    | Memory[ $$s2 + 20$ ] = $$s1$                       | Byte from register to memory          |
|            | ]                     | load linked word                    | 11 \$s1,20(\$s2)    | \$s1 = Memory[\$s2 + 20]                           | Load word as 1st half of atomic swap  |
|            | ]                     | store condition, word               | sc \$s1,20(\$s2)    | Memory[\$s2+20]=\$s1;\$s1-0 or 1                   | Store word as 2nd half of atomic swap |
| S          | ]                     | load upper immed.                   | lui \$s1,20         | \$s1 = 20 * 2 <sup>16</sup>                        | Loads constant in upper 16 bits       |
|            |                       | and                                 | and \$s1,\$s2,\$s3  | \$s1 = \$s2 & \$s3                                 | Three reg. operands; bit-by-bit AND   |
| <b>\</b> / | Logical               | or                                  | or \$s1,\$s2,\$s3   | \$s1 = \$s2   \$s3                                 | Three reg. operands; bit-by-bit OR    |
| <i>7</i> 1 |                       | nor                                 | nor \$s1,\$s2,\$s3  | \$s1 = ~ (\$s2   \$s3)                             | Three reg. operands; bit-by-bit NOR   |
| 1 +        |                       | and immediate                       | andi \$s1,\$s2,20   | \$s1 = \$s2 & 20                                   | Bit-by-bit AND reg with constant      |
| ム戸         |                       | or immediate                        | ori \$s1,\$s2,20    | \$s1 = \$s2   20                                   | Bit-by-bit OR reg with constant       |
|            |                       | shift left logical                  | sll \$s1,\$s2,10    | \$s1 = \$s2 << 10                                  | Shift left by constant                |
| עיול       |                       | shift right logical                 | srl \$s1,\$s2,10    | \$s1 = \$s2 >> <b>10</b>                           | Shift right by constant               |
| #4         | Conditional<br>branch | branch on equal                     | beq \$s1,\$s2,25    | if (\$s1 == \$s2) go to<br>PC + 4 + 100            | Equal test; PC-relative branch        |
|            |                       | branch on not equal                 | bne \$s1,\$s2,25    | if (\$s1!= \$s2) go to<br>PC + 4 + 100             | Not equal test; PC-relative           |
| <b>*</b>   |                       | set on less than                    | slt \$s1,\$s2,\$s3  | if $(\$s2 < \$s3)$ $\$s1 = 1$ ;<br>else $\$s1 = 0$ | Compare less than; for beq, bne       |
|            |                       | set on less than unsigned           | sltu \$s1,\$s2,\$s3 | if $(\$s2 < \$s3)$ $\$s1 = 1$ ;<br>else $\$s1 = 0$ | Compare less than unsigned            |
|            |                       | set less than immediate             | s1ti \$s1,\$s2,20   | if (\$s2 < 20) \$s1 = 1;<br>else \$s1 = 0          | Compare less than constant            |
|            |                       | set less than<br>immediate unsigned | sltiu \$s1,\$s2,20  | if $(\$s2 < 20) \$s1 = 1$ ;<br>else $\$s1 = 0$     | Compare less than constant unsigned   |
|            | Liumn                 | jump                                | j 2500              | go to 10000                                        | Jump to target address                |
|            |                       | jump register                       | jr \$ra             | go to \$ra                                         | For switch, procedure return          |
|            |                       | jump and link                       | ja1 2500            | \$ra = PC + 4; go to 10000                         | For procedure call                    |

#### Policy of Use Conventions for registers

| REGISTER  | NAME      | USAGE                                            |  |  |
|-----------|-----------|--------------------------------------------------|--|--|
| \$0       | \$zero    | 常量0(constant value 0)                            |  |  |
| \$1       | \$at      | 保留给汇编器(Reserved for assembler)                   |  |  |
| \$2-\$3   | \$v0-\$v1 | 函数调用返回值(values for results and expression evalua |  |  |
|           |           | tion)                                            |  |  |
| \$4-\$7   | \$a0-\$a3 | 函数调用参数(arguments)                                |  |  |
| \$8-\$15  | \$t0-\$t7 | 暂时的(或随便用的)                                       |  |  |
| \$16-\$23 | \$s0-\$s7 | 保存的(或如果用,需要SAVE/RESTORE的)(saved)                 |  |  |
| \$24-\$25 | \$t8-\$t9 | 暂时的(或随便用的)                                       |  |  |
| \$28      | \$gp      | 全局指针(Global Pointer)                             |  |  |
| \$29      | \$sp      | 堆栈指针(Stack Pointer)                              |  |  |
| \$30      | \$fp      | 帧指针(Frame Pointer)                               |  |  |
| \$31      | \$ra      | 返回地址(return address)                             |  |  |

• \$26..\$27: \$k0..\$k1, 为OS存放EPC

#### 程序结构



directives: 内存地址指针 Labels: 内存地址标号

names: 段名,变量名





First, design your system so that the code is measurable! Measure execution time as part of your standard testing. Do not only test the functionality of the code!

Learn both coarse-grain and fine-grain techniques to measure execution time.

Use coarse-grain measurements for analyzing real-time properties

Use fine-grain measurements for optimizing and fine-tuning

## No measurements of execution time!

#### WCET: 程序执行时间

- 执行代码所花费的处理器时间
  - 取决于程序的结构、目标处理器、执行环境
    - 分支路径、循环控制参数、数据存储位置
    - ILP、Cache、中断
  - 不考虑上下文切换
- 见CSAPP第九章"测量程序执行时间"



#### 计算机基本行为: 计算、访存、I/O

Operating Environment (source and destination of data)





**Data Movement** 



Processing from/to storage



**Data Storage** 



Processing

from Storage to I/O