Parallel Programming Models & Platforms Application to Multimedia

Pierre Paulin, Director
SoC Platform Automation Technologies
STMicroelectronics
Central R&D, Ottawa, Canada


StepNP(TM)
Parallel programming models
  Proven, established programming models
  User-defined parallelism
FlexMP SoC Platform
  Keep it simple, regular, predictable
  Use industry standards: processors, NoC, I/O
  Simplify use of legacy architectures
MultiFlex SoC Tools
  Lightweight mapping tools, H/W RTOS
  Standard simulation & analysis tools
  User-defined analyses
[Diagram: Application S/W over a platform of Proc, eFPGA, FPGA, eMEM, H/W PE, and I/O blocks]
MPSoC, July 2004

Outline
FlexMP architecture platform
MultiFlex Tools and Methodologies
MP-SoC compilation, H/W O/S
Applications
MPEG4 video codec
10 Gb/s IPv4 packet forwarding
2.5 Gb/s traffic manager
3G basestation
FlexMP SoC Platform
[Diagram: processors 1 to N (P1 to Pn) with eRAM, a coprocessor, an MPU, memories, a packetization block, ASIC and H/W PEs, eMEM, embedded FPGA (eFPGA) and eSoG fabric, H/W schedulers, and application-specific and general-purpose I/O, all connected by a Network-on-Chip]
Multi-threaded, multi-processor platform
Popular processor models with configurable extensions:
H/W multithreading and pipeline depth


MultiFlex MP-SoC Platform Tools
Two parallel programming models
  DSOC: Message passing
  SMP: Shared memory
H/W message passing, H/W MP-O/S scheduler accelerators
IP Plug and Play
[Diagram: Executable Spec, MP programming models, and application to platform mapping, shown with a platform of configurable processors (Conf. Proc.), mem, eFPGA, SoG, FPGA, eMEM, I/O, NoC, and H/W O/S schedulers]
MultiFlex Message Passing
Message passing
  H/W accelerator
  Neutral Data Format, Standard NoC I/F
  IP Plug and Play
[Diagram: Executable Spec and MP programming models, shown with a platform of configurable processors (Conf. Proc.), mem, eFPGA, SoG, FPGA, eMEM, I/O, NoC, and H/W O/S schedulers]

MultiFlex H/W O/S
Manage med-grain concurrency (~100 instr)
Fault tolerance
Future:
  Manage power, QoS
[Diagram: Executable Spec and MP programming models, shown with a platform of configurable processors (Conf. Proc.), mem, eFPGA, SoG, FPGA, eMEM, I/O, NoC, and H/W O/S schedulers]

Message Passing Model: DSOC
(Distr. System Object Component)
[Diagram: Object1 to Object4 communicating through IDL-based interfaces]
Based on leading distributed S/W concepts
E.g. CORBA, DCOM
Objects represent application functionality
Inter-object communication via standard I/F
Use of lightweight Interface Description Language
Platform independent, no mapping assumptions
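A minimal sketch of the proxy/skeleton idea behind IDL-based message passing may help here. It is not the DSOC implementation: the `Scaler` interface, the method id, the byte layout, and the use of a Python queue as the transport are all assumptions made for illustration. The point is only that the client calls an ordinary method while the generated glue turns the call into a message in a neutral format.

```python
import struct
from queue import Queue

# Hypothetical interface, standing in for an IDL declaration such as
#   interface Scaler { long scale(in long value, in long factor); };
METHOD_SCALE = 1  # method id chosen for this sketch only


class ScalerServant:
    """Server-side object: holds the application functionality."""

    def scale(self, value, factor):
        return value * factor


class Skeleton:
    """Server-side glue: unmarshals a request and calls the servant."""

    def __init__(self, servant):
        self.servant = servant

    def dispatch(self, request):
        # Request layout (assumed): 1-byte method id, two 32-bit integers.
        method_id, value, factor = struct.unpack("<Bii", request)
        if method_id != METHOD_SCALE:
            raise ValueError("unknown method id")
        return struct.pack("<i", self.servant.scale(value, factor))


class ScalerProxy:
    """Client-side glue: same method signature, but sends a message."""

    def __init__(self, channel, skeleton):
        self.channel = channel
        self.skeleton = skeleton

    def scale(self, value, factor):
        # Marshal the call into bytes and hand it to the transport.
        self.channel.put(struct.pack("<Bii", METHOD_SCALE, value, factor))
        # In this single-threaded sketch the skeleton is invoked directly.
        reply = self.skeleton.dispatch(self.channel.get())
        return struct.unpack("<i", reply)[0]


channel = Queue()  # stands in for the message transport
proxy = ScalerProxy(channel, Skeleton(ScalerServant()))
result = proxy.scale(6, 7)
print(result)
```

Because the client only sees `scale`, the servant could be moved to another processor or replaced by a hardware block without changing client code, which is the property the "no mapping assumptions" bullet refers to.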

DSOC to Platform Mapping
[Diagram: Object1 to Object4 mapped onto processing elements PE1.1 to PE1.n, PE2, and PE3 (labelled fast MT-RISC, H/W proc. element, and GP RISC with std. O/S) over the NoC; S/W-S/W and S/W-H/W communication goes through the message passing engine; input, output, and a DSOC task scheduler attach to the NoC]
Synthesis from IDL: auto generation of
  Drivers between different O/S
  Drivers between S/W PE's and network-on-chip
  Glue logic between H/W PE's and NoC
[Diagram: S/W DSOC objects on H/W MT RISC processors (1 to N) and H/W DSOC objects, each attached through a message engine and NoC I/F to the Network-on-Chip; a H/W Object Request Broker on its own NoC I/F, with FIFO's per service id for client req, server avail., and server start]
Max processor-processor message passing rate:
  35 MHz (500 MHz clk)
  15 MHz (200 MHz clk)
  <15 instructions
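The message passing rates quoted on this slide can be cross-checked with a little arithmetic. The sketch below assumes that "35 MHz" and "15 MHz" mean 35 and 15 million messages per second, and that a processor retires roughly one instruction per clock cycle; neither assumption is stated on the slide.

```python
# Clock frequency (Hz) -> quoted max message rate (messages per second).
# Reading "MHz" as "million messages per second" is an assumption.
quoted_rates = {500e6: 35e6, 200e6: 15e6}

# Clock cycles available per message at each operating point.
cycles_per_message = {
    clock_hz: clock_hz / rate_hz for clock_hz, rate_hz in quoted_rates.items()
}

for clock_hz, cycles in cycles_per_message.items():
    print(f"{clock_hz / 1e6:.0f} MHz clock: {cycles:.1f} cycles per message")
```

Both operating points come out between 13 and 15 cycles per message, which is consistent with the "<15 instructions" figure under the one-instruction-per-cycle assumption.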


Programming Model 2: SMP
Symmetric multi-processing with shared memory
[Diagram: SMP object with threads T1, T2, T3 and a shared memory]
Complement to DSOC programming model
DSOC object may have SMP internal implementation
SMP is a more natural MP model for multimedia
SMP nano-kernel written in C and C++
Java/C#-style concurrency primitives implemented
with C++ API (or C POSIX API)
Hardware O/S assists in implementation
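To illustrate what a Java/C#-style monitor looks like in practice, here is a small sketch using Python's standard `threading.Condition`. It is not the MultiFlex C++ API, which is not shown in these slides; the `BoundedBuffer` class, the worker function, and all parameters are hypothetical. The monitor combines mutual exclusion with condition waiting, in the same spirit as the monitor, entry list, and condition primitives named in this talk.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style bounded buffer: one lock plus condition waiting."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._monitor = threading.Condition()

    def put(self, item):
        with self._monitor:                      # enter the monitor
            while len(self._items) >= self._capacity:
                self._monitor.wait()             # release lock and sleep
            self._items.append(item)
            self._monitor.notify_all()           # wake waiting threads

    def get(self):
        with self._monitor:
            while not self._items:
                self._monitor.wait()
            item = self._items.popleft()
            self._monitor.notify_all()
            return item


def run_demo(n_items=100, n_workers=3):
    """Square n_items numbers using n_workers threads sharing one buffer."""
    buf = BoundedBuffer(capacity=4)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = buf.get()
            if item is None:                     # sentinel: no more work
                return
            with results_lock:
                results.append(item * item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_items):
        buf.put(i)
    for _ in threads:
        buf.put(None)                            # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)


squares = run_demo()
print(squares[:5])
```

The workers pull items from a single shared queue, so whichever thread is free takes the next item; this is the software analogue of a shared run queue.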
[Diagram: SMP platform of RISC 1 to RISC N, each with threads T1 to Tm, pipelines Pipe1 to PipeP, and a data cache (Data$), attached through a message engine and NoC I/F to the Network-on-Chip; a H/W Concurrency Engine on its own NoC I/F, with run queue, semaphore, and monitor (entry list, condition) blocks]
Fork 1~256 threads:
  10 instructions (50 ns @ 200 MHz)
  + 12 cycles/thread (in conc. engine)
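The fork cost quoted for the H/W Concurrency Engine can be turned into a rough estimate. The sketch below assumes one instruction per cycle (which matches 10 instructions taking 50 ns at 200 MHz) and assumes the per-thread cycles simply add to the fixed cost; the slide does not say whether the concurrency engine overlaps its work with the processor, so treat the result as an upper-bound style estimate.

```python
CLOCK_HZ = 200e6          # clock quoted on the slide
FORK_INSTRUCTIONS = 10    # fixed processor-side cost quoted on the slide
CYCLES_PER_THREAD = 12    # concurrency-engine cost per forked thread


def fork_time_ns(n_threads, clock_hz=CLOCK_HZ):
    """Estimated time to fork n_threads, assuming costs add serially."""
    if not 1 <= n_threads <= 256:
        raise ValueError("the slide quotes a range of 1 to 256 threads")
    cycle_ns = 1e9 / clock_hz
    total_cycles = FORK_INSTRUCTIONS + CYCLES_PER_THREAD * n_threads
    return total_cycles * cycle_ns


fixed_cost_ns = FORK_INSTRUCTIONS * 1e9 / CLOCK_HZ
for n in (1, 16, 256):
    print(f"fork {n:3d} threads: about {fork_time_ns(n):.0f} ns")
```

Under these assumptions the per-thread cost is 60 ns, so forking is cheap relative to the ~100-instruction task granularity that the H/W O/S is said to manage.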
SMP/DSOC to Platform Mapping
[Diagram: Object1 to Object4, with threads T1 to T3 communicating through shared memory, mapped onto PE1.n, PE2, and PE3.n (labelled fast MT-RISC, H/W proc. element, and DSP) over the NoC; shared mem, S/W-S/W, and S/W-H/W communication paths, with a message passing engine; DSOC scheduler, SMP scheduler, SRAM, and in/out ports attached to the NoC]

Outline
FlexMP architecture platform
Multi-threaded processors, Flexible H/W
Network-on-Chip (NoC) interconnect
MultiFlex Tools and Methodologies
Multi-Processor SoC analysis and debug tools
MP-SoC compilation, H/W O/S
Applications
MPEG4 video codec
10 Gb/s IPv4 packet forwarding
2.5 Gb/s traffic manager
3G basestation
MPEG4 Codec Exploration
30 frames/sec, VGA resolution (4.1 GIPS required)
High-level model using SMP and Message Passing
• Off-the-shelf appln. code
• ARM7 RISC @ 200MHz
• Simple memory arch.
Exploration from 0 RISC (H/W end, lower cost) to 21 RISC (S/W end, more flexibility)
Two explored configurations:
S/W: 5 RISC, 4 threads (88% utilization); 95% of lines of code in S/W
  Coproc: Clip, Div, Abs, Sgn
  H/W (80% perf.): DCT, SAD, BDIFF, BADD, BQ, BIQ
S/W: 15 RISC, 16 threads (75% utilization); 96% of lines of code in S/W
  Coproc: Clip, Div, Abs, Sgn
  H/W (65% perf.): DCT, SAD
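The 21-RISC end of the exploration range follows from the figures on this slide. The sketch below assumes that each ARM7 at 200 MHz delivers at most one instruction per cycle and that VGA means 640 x 480 pixels; both are assumptions, not statements from the slide.

```python
import math

REQUIRED_IPS = 4.1e9         # 4.1 GIPS quoted on the slide
RISC_CLOCK_HZ = 200e6        # ARM7 RISC @ 200 MHz
FRAME_RATE = 30              # frames per second
VGA_PIXELS = 640 * 480       # assumed VGA resolution

# Number of RISC processors for an all-software codec,
# assuming one instruction per cycle per processor.
risc_needed = math.ceil(REQUIRED_IPS / RISC_CLOCK_HZ)

# Instruction budget per pixel implied by the 4.1 GIPS requirement.
instr_per_pixel = REQUIRED_IPS / (VGA_PIXELS * FRAME_RATE)

print(risc_needed)
print(round(instr_per_pixel))
```

Under these assumptions an all-software implementation needs 21 processors, and the codec spends on the order of 445 instructions per pixel, which is why moving a few kernels such as DCT and SAD into hardware reduces the processor count so sharply.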
Load balancing
The total load average is about 88%.
The load is well balanced over the 5 ARMs thanks to the
concurrency engine.
[Chart: load average (0 to 1) for each ARM (1 to 5), I-frame and P-frame]
H/W multithreading
[Chart: load average (0 to 1.2) per task (1 to 7), for 2, 4, and 8 threads]
Execution speed up
[Chart: execution speed up (axis 0 to 35) versus number of ARMs (2 to 6); series: 2 threads, 4 threads, 8 threads, and theoretical (latency = 0)]
Cache analysis
[Chart: cache miss ratio (%) and FPS versus cache size (0 to 16 KB); series: cache miss, FPS with 8 threads, FPS with 2 threads]

Local NoC Data Bandwidth (P-frame)
[Chart: local NoC data bandwidth (axis 0.25 to 1.00 GB/s), broken down into hardware data access (quantization, inverse quantization, DCT/IDCT, BADD/BDIFF, BZIGZAG), RISC data access, RISC stack access, CE access, and HORBA access]
Use of H/W load balancing engines (CE and HORBA)
Only 3.8% data bandwidth overhead

Traffic Manager (2.5 Gb/s)
DSOC + SMP
Parameters:
  # processors N = 8-12
  clock = 500 MHz
  # pipe stages = 4
  # threads Tm = 8
  D$ sets = 256
  D$ size = 4 KB
  Latency = 40 ns, +/-25% jitter
  SRAM banks = 4
  SRAM acc = 10 ns
Results:
  85-92% PE utilization
[Diagram: application components ingSPICpt, ingPktCpt, dataMgrCpt, queMgrCpt, schPktCpt, shpPktCpt, egrPktCpt, and egrSPICpt, with memories, mapped onto RISC 1 to RISC N (threads T1 to Tm, pipelines Pipe1 to PipeP, Data$) over a parameterizable Network-on-Chip (latency L i,j); platform blocks queMgr, dataMgr, ingSPI, egrSPI, SRAM, extSRAM, extDRAM, an SMP block, and a DSOC scheduler; annotations "Msg passing code" and "<20%"]
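The traffic manager parameters are easier to compare once the times are converted to clock cycles. The sketch below only does that conversion, using the figures from the parameter list (500 MHz clock, 40 ns latency with +/-25% jitter, 10 ns SRAM access, 8 to 12 processors with 8 threads each); the helper name `ns_to_cycles` is made up for this example.

```python
CLOCK_HZ = 500e6            # clock from the parameter list
LATENCY_NS = 40.0           # quoted latency
JITTER = 0.25               # +/-25% jitter
SRAM_ACCESS_NS = 10.0       # quoted SRAM access time
THREADS_PER_PROC = 8        # Tm
PROC_RANGE = (8, 12)        # N = 8-12


def ns_to_cycles(time_ns, clock_hz=CLOCK_HZ):
    """Convert a duration in nanoseconds to clock cycles."""
    return time_ns * clock_hz / 1e9


latency_cycles = ns_to_cycles(LATENCY_NS)
latency_range_cycles = (
    ns_to_cycles(LATENCY_NS * (1 - JITTER)),
    ns_to_cycles(LATENCY_NS * (1 + JITTER)),
)
sram_cycles = ns_to_cycles(SRAM_ACCESS_NS)
hw_threads = tuple(n * THREADS_PER_PROC for n in PROC_RANGE)

print(latency_cycles, latency_range_cycles, sram_cycles, hw_threads)
```

A 40 ns latency is 20 cycles at 500 MHz (15 to 25 with jitter), so a processor with 8 hardware threads has other threads available to run while one waits; the platform as a whole exposes 64 to 96 hardware threads.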
3G Basestation Platform Exploration
[Diagram: DSOC objects partitioned between S/W and H/W]


MultiFlex MP-SoC Tools: Summary
Value-added:
  Platform independent eS/W
  Platform scalability
  High PE utilization (85-97%)
  Ease of programming
Applications: multi-media, networking, 3G basestation
[Diagram: Executable Spec, MP programming models, and application to platform mapping, shown with application S/W, H/W IP, and dbase blocks and a platform of configurable processors (Conf. Proc.), mem, eFPGA, SoG, FPGA, eMEM, I/O, NoC, and H/W O/S schedulers]