1 iki20210 pengantar organisasi komputer kuliah no. 25: pipeline 10 januari 2003 bobby nazief...
TRANSCRIPT
1
IKI20210Pengantar Organisasi Komputer
Kuliah no. 25: Pipeline
10 Januari 2003
Bobby Nazief ([email protected])Johny Moningka ([email protected])
bahan kuliah: http://www.cs.ui.ac.id/~iki20210/
Sumber:1. Hamacher. Computer Organization, ed-4.2. Materi kuliah CS152, th. 1997, UCB.
3
Pipelining is Natural!
° Laundry Example
° Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
° Washer takes 30 minutes
° Dryer takes 40 minutes
° “Folder” takes 20 minutes
A B C D
4
Sequential Laundry
° Sequential laundry takes 6 hours for 4 loads
° If they learned pipelining, how long would laundry take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
5
Pipelined Laundry: Start work ASAP
° Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
30 40 40 40 40 20
6
Pipelining Lessons
° Pipelining doesn’t help latency of single task, it helps throughput of entire workload
° Pipeline rate limited by slowest pipeline stage
° Multiple tasks operating simultaneously using different resources
° Potential speedup = Number pipe stages
° Unbalanced lengths of pipe stages reduces speedup
° Time to “fill” pipeline and time to “drain” it reduce speedup
° Stall for Dependences
A
B
C
D
6 PM 7 8 9
Task
Order
Time
30 40 40 40 40 20
8
Kilas Balik: Tahapan Eksekusi Instruksi
Instruksi:Add R1,(R3) ; R1 R1 + M[R3]
Langkah-langkah:1. Fetch instruksi
1. PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2. Zout, PCin, WMFC
3. MDRout, IRin
2. Fetch operand #1 (isi lokasi memori yg ditunjuk oleh R3)
4. R3out, MARin, Read
5. R1out, Yin, WMFC
3. Lakukan operasi penjumlahan
6. MDRout, Add, Zin
4. Simpan hasil penjumlahan di R1
7. Zout, R1in, End
9
The Five Stages of (MIPS) Load Instruction
° Ifetch: Instruction Fetch
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Load/Store Architecture:access to/from memory only by Load/Store instructions
10
Pipelined Execution
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WBProgram Flow
Time
° Overlapping instruction execution
° Maximum number instructions executed simultaneously = number of stages
11
Why Pipeline?
° Non-pipeline machine• 10 ns/cycle x 4.6 CPI (due to instr mix) x 100 inst = 4600 ns
° Ideal pipelined machine• 10 ns/cycle x (4 cycle fill + 1 CPI x 100 inst) = 1040 ns
Clk
Cycle 1
Non-pipeline Implementation:
Ifetch Reg Exec Mem Wr
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipeline Implementation:
Ifetch Reg Exec Mem WrStore
Ifetch
R-type
Ifetch Reg Exec Mem WrR-type
12
Why Pipeline? Because the resources are there!
Instr.
Order
Time (clock cycles)
Inst 0
Inst 1
Inst 2
Inst 4
Inst 3
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
AL
UIm Reg Dm Reg
14
Partitioning the Datapath (1/2)
PC
Nex
t P
C
Ope
rand
Fet
ch Exec Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
Inst
ruct
ion
Fet
ch
Res
ult
Sto
re
ALUctr
RegDst
ALUSrc
ExtOp
MemWr
nPC_sel
RegWr
MemWr
MemRd
° Add registers between smallest steps
Store Instruction
Store Source
(Register)Operands
Store Results
Store Read-Data
(from Memory)
15
Partitioning the Datapath (2/2)
ALU R
eg.
File
Mem
Acc
ess
Dat
aM
em
A
B
R
M
Reg
File
Equ
al
PC
Nex
t P
C
IR
Inst
. M
em
Valid
IRex
e
Dcd
Ctr
l
IRm
em
Ex
Ctr
l
IRw
b
Mem
Ctr
l
WB
Ctr
l
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
17
Can pipelining get us into trouble?
° Yes: Pipeline Hazards• structural hazards: attempt to use the same resource two
different ways at the same time
- E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
• data hazards: attempt to use item before it is ready
- E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer
- instruction depends on result of prior instruction still in the pipeline
• control hazards: attempt to make a decision before condition is evaluated
- E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
- branch instructions
° Can always resolve hazards by waiting• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
18
Mem
Single Memory is a Structural Hazard
Instr.
Order
Time (clock cycles)
Load
Instr 1
Instr 2
Instr 3
Instr 4A
LUMem Reg Mem Reg
AL
UMem Reg Mem Reg
AL
UMem Reg Mem RegA
LUReg Mem Reg
AL
UMem Reg Mem Reg
Detection is easy in this case! (right half highlight means read, left half write)
19
° Stall: wait until decision is clear• Its possible to move up decision to 2nd stage by adding
hardware to check registers as being read
° Impact: 2 clock cycles per branch instruction => slow
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Load
AL
UMem Reg Mem Reg
AL
UMem Reg Mem RegA
LUReg Mem RegMem
20
° Predict: guess one direction then back up if wrong• Predict not taken
° Impact: 1 clock cycles per branch instruction if right, 2 if wrong (right 50% of time)
° More dynamic scheme: history of 1 branch ( 90%)
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Load
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
MemA
LUReg Mem Reg
21
° Redefine branch behavior (takes place after next instruction) “delayed branch”
° Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time)
° As launch more instruction per clock cycle, less useful
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Misc
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
MemA
LUReg Mem Reg
Load Mem
AL
UReg Mem Reg
23
• Dependencies backwards in time are hazards
Data Hazard on r1:
Instr.
Order
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
IF ID/RF EX MEM WBAL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
Im
AL
UReg Dm Reg
AL
UIm Reg Dm Reg
24
• “Forward” result from one stage to another
Data Hazard Solution:
Instr.
Order
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
IF ID/RF EX MEM WBAL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
Im
AL
UReg Dm Reg
AL
UIm Reg Dm Reg
25
Forwarding Structure
° Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe• Increase muxes to
add paths from pipeline registers• Data Forwarding =
Data Bypassing
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rtForward
mux
26
• Dependencies backwards in time are hazards
• Can’t solve with forwarding: • Must delay/stall instruction dependent on loads
Forwarding (or Bypassing): What about Loads
Time (clock cycles)
lw r1,0(r2)
sub r4,r1,r3
IF ID/RF EX MEM WBAL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg