A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing


School of Engineering, Autonomous University of Yucatan, Merida, Mexico.



2011/07/25

Presenter: Dr. Alejandro Castillo Atoche



IGARSS’11


Outline

- Introduction
- Previous Work
- MPSoC via the HW/SW Co-design
- Case Study: RBR Algorithms
- Algorithm Analysis
- Network on Chip (NoC)-based Accelerator
- Integration in a Co-design scheme
- New Perspective: Network of FPGA-VLSI architectures
- Hardware Implementation Results
- Performance Analysis
- Conclusions


Introduction: Radar Imagery, Facts


The initial problem addressed by this proposal for geospatial remote sensing (RS) imagery consists of solving the ill-conditioned inverse spatial spectrum pattern (SSP) estimation problem with model uncertainties via the Bayesian minimum risk (BMR) estimation strategy.



In previous works, alternative MPSoC proposals have been developed, but without systolic array techniques or Network on Chip (NoC) structures.



Introduction: HW Implementation, Facts


Why a Multiprocessor System on Chip (MPSoC)?

Because MPSoCs are single-chip multiprocessors designed for real-time signal processing applications.



Why Network on Chip accelerators?

Networks-on-chip (NoCs) are multiprocessor interconnection networks designed to achieve real-time signal processing (SP); they avoid communication bottlenecks in HW/SW co-designs.


MOTIVATION

To efficiently conceptualize and implement an architecture that aggregates parallel computing and systolic array mapping techniques in a novel network on chip (NoC) accelerator scheme, via the HW/SW co-design paradigm.


CONTRIBUTIONS:

First, a high-speed robust Bayesian regularization hardware accelerator is designed for the real-time enhancement of large-scale geospatial imagery.

Second, an efficient architecture based on a Network on Chip (NoC) that applies High Performance Computing techniques is developed.





Algorithmic Reference Implementation


Method           RSF                       RBR
SNR [dB]         15      20      25        15      20      25
IOSNR [dB]       10.15   15.32   20.25     6.15    10.62   13.04
PIOSNR (%)       81.37   86.62   85.24     95.18   90.29   98.24
MSE              0.16    0.46    0.57      0.03    0.29    0.34


Partitioning Stage


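[Figure: HW/SW partitioning on the FPGA. The embedded processor performs data acquisition of the model parameters and runs the pre-computed SW stage ($F$, $\Omega$). Coprocessor 1 (robust SS vector) computes $V = \mathrm{diag}\{F u u^{+} F^{+}\}$ from the data $u^{(j)}$; coprocessor 2 (RBR estimator) computes $\hat{b}_{\mathrm{RBR}}$ from $\Omega V$ and $b_0$.]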

NoC-oriented structure of the proposed coprocessors


(a) Robust SS vector coprocessor

[Figure: data $u$ and $F$ arrive from the embedded processor through an input FIFO. Under tiled control, fixed-sized PA1 (m = 32) computes $F u$, fixed-sized PA2 (m = 4) computes $F u u^{+}$, and fixed-sized PA3 (m = 32) computes $\mathrm{diag}\{F u u^{+} F^{+}\}$, with FIFO memory buffers between the arrays. The result $V$ leaves through an output FIFO to the embedded processor.]


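(b) RBR estimator coprocessor

[Figure: $\Omega$ and $V$ arrive from the embedded processor through an input FIFO. Under tiled control, fixed-sized PA4 (m = 32) computes $\Omega V$, which is combined with $b_0$ (also supplied by the embedded processor) to form $\hat{b}_{\mathrm{RBR}}$; the result leaves through an output FIFO to the embedded processor.]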
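Both coprocessors decouple the embedded processor from the processor arrays with FIFO buffers, and the detailed diagrams drive them with "full" and "en" signals from global control. The sketch below is a software model of such a bounded FIFO, assuming "en" means a write is enabled only while the buffer is not full; the depth `FIFO_DEPTH` and all function names are hypothetical, not taken from the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIFO depth; the real buffer depth is not given in the slides. */
#define FIFO_DEPTH 8

typedef struct {
    int32_t data[FIFO_DEPTH];   /* 32-bit words, as on the coprocessor buses */
    unsigned head, tail, count; /* ring-buffer read/write indices and fill level */
} fifo_t;

void fifo_init(fifo_t *f) { f->head = f->tail = f->count = 0; }
bool fifo_full(const fifo_t *f)  { return f->count == FIFO_DEPTH; } /* "full" flag */
bool fifo_empty(const fifo_t *f) { return f->count == 0; }

/* Producer side: the write is enabled only while the FIFO is not full. */
bool fifo_push(fifo_t *f, int32_t w)
{
    if (fifo_full(f))
        return false;                       /* back-pressure: producer must stall */
    f->data[f->tail] = w;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* Consumer side: a processor array reads only when data is available. */
bool fifo_pop(fifo_t *f, int32_t *w)
{
    assert(w != NULL);
    if (fifo_empty(f))
        return false;                       /* array idles for this cycle */
    *w = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

Words leave in arrival order, so a stage that stalls on a full buffer never reorders the skewed data stream feeding the next array.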

Aggregation of parallel computing techniques


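Application (localized loop nest, shown as a C99 sketch of the per-tile computation):

    #include <stddef.h>

    /* Localized (single-assignment) form of C = A * B, repeated for L tiles.
       Per tile: A is m x r, B is r x n, C is m x n; tiles are stored
       contiguously in row-major order. Requires m, n, r > 0. */
    void tiled_matmul(size_t L, size_t m, size_t n, size_t r,
                      const float *A, const float *B, float *C)
    {
        float a[m][n][r], b[m][n][r], c[m][n][r];   /* C99 VLAs, one tile */
        for (size_t tile = 0; tile < L; tile++) {
            const float *At = A + tile * m * r;
            const float *Bt = B + tile * r * n;
            float *Ct = C + tile * m * n;
            for (size_t i = 0; i < m; i++) {
                for (size_t j = 0; j < n; j++) {
                    for (size_t k = 0; k < r; k++) {
                        /* a is pipelined along j, b along i; edges inject inputs */
                        a[i][j][k] = (j == 0) ? At[i * r + k] : a[i][j - 1][k];
                        b[i][j][k] = (i == 0) ? Bt[k * n + j] : b[i - 1][j][k];
                        /* c accumulates along k, starting from zero */
                        c[i][j][k] = ((k == 0) ? 0.0f : c[i][j][k - 1])
                                     + a[i][j][k] * b[i][j][k];
                    }
                    Ct[i * n + j] = c[i][j][r - 1];
                }
            }
        }
    }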
[Figure: 3-D dependence graph (DG) of the loop nest and its SFG projection. A linear schedule, i.e. a set of parallel and uniformly spaced hyperplanes, orders the DG nodes, so the inputs enter as skewed wavefronts: A[0,0], B[0,0]; then A[1,0], A[0,1], B[1,0], B[0,1]; then A[2,0], A[1,1], A[0,2], B[2,0], B[1,1], B[0,2]; then A[2,1], A[1,2], B[2,1], B[1,2]; then A[2,2], B[2,2]. Each node is a multiply-add cell (mul, add) that takes a[i,j-1] and b[i-1,j], passes on a[i,j] and b[i,j], and produces c[i,j].]
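The linear schedule can be made concrete with a small cycle-by-cycle model. This sketch assumes the schedule vector (1,1,1), so DG node (i,j,k) fires at cycle t = i + j + k, and a projection along k, so node (i,j,k) runs on PE (i,j) (an output-stationary array). These choices are consistent with the skewed wavefronts in the figure, where A[i,k] enters at step i + k, but they are an assumption rather than the exact mapping used in the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Cycle-by-cycle model of an output-stationary m x n systolic array.
   Assumed linear schedule: DG node (i,j,k) fires at cycle t = i + j + k.
   Assumed projection along k: node (i,j,k) is executed by PE (i,j).
   A is m x r, B is r x n, C is m x n, all row-major.
   Returns the number of cycles (hyperplanes) needed. */
size_t systolic_matmul(size_t m, size_t n, size_t r,
                       const double *A, const double *B, double *C)
{
    assert(m > 0 && n > 0 && r > 0);
    for (size_t idx = 0; idx < m * n; idx++)
        C[idx] = 0.0;                              /* c(i,j,-1) = 0 */

    size_t cycles = m + n + r - 2;                 /* last node: (m-1,n-1,r-1) */
    for (size_t t = 0; t < cycles; t++) {          /* one hyperplane per cycle */
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < n; j++) {       /* all PEs fire in parallel */
                if (t < i + j || t - i - j >= r)
                    continue;                      /* PE idle on this cycle */
                size_t k = t - i - j;
                /* a(i,j,k) came from PE(i,j-1), b(i,j,k) from PE(i-1,j) */
                C[i * n + j] += A[i * r + k] * B[k * n + j];
            }
        }
    }
    return cycles;
}
```

For example, a 2 × 3 by 3 × 2 product finishes in 2 + 2 + 3 − 2 = 5 cycles, and every PE (i,j) accumulates its k terms in increasing order, exactly as c(i,j,k) = c(i,j,k−1) + a·b prescribes.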

Tiling technique


[Figure: a large-scale real-world image is streamed through I/O ports and FIFOs into a fixed-size systolic array of PEs.]


[Figure: the large-scale real-world image is divided into tiles, e.g. tiles (1,2) and (2,1), and each tile is processed in turn by the same fixed-size systolic array of PEs, with FIFOs and I/O ports on the array boundary.]
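The tiling idea can be sketched for a matrix-vector product y = F u, assuming a hypothetical fixed array that can only process a PA_SIZE × PA_SIZE block per pass. The large problem is cut into tiles, edge tiles are allowed to be smaller, and partial results are accumulated across tile columns. `PA_SIZE`, `fixed_array_pass` and `tiled_matvec` are illustrative names, not the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size of the fixed-size processor array (one PA_SIZE x PA_SIZE block). */
#define PA_SIZE 4

/* Stand-in for one pass through the fixed-size array: y_blk += F_blk * u_blk.
   ld is the row stride of the full matrix F. */
static void fixed_array_pass(const double *F_blk, size_t ld,
                             const double *u_blk, double *y_blk,
                             size_t rows, size_t cols)
{
    assert(rows <= PA_SIZE && cols <= PA_SIZE);   /* must fit the array */
    for (size_t i = 0; i < rows; i++)
        for (size_t k = 0; k < cols; k++)
            y_blk[i] += F_blk[i * ld + k] * u_blk[k];
}

/* Tiled y = F u for an n x n problem of any size: every tile (bi, bk)
   is small enough for the fixed-size array. */
void tiled_matvec(size_t n, const double *F, const double *u, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = 0.0;
    for (size_t bi = 0; bi < n; bi += PA_SIZE) {             /* tile row */
        size_t rows = (n - bi < PA_SIZE) ? n - bi : PA_SIZE;  /* edge tile */
        for (size_t bk = 0; bk < n; bk += PA_SIZE) {         /* tile column */
            size_t cols = (n - bk < PA_SIZE) ? n - bk : PA_SIZE;
            fixed_array_pass(&F[bi * n + bk], n, &u[bk], &y[bi], rows, cols);
        }
    }
}
```

The array size stays fixed regardless of the image size; only the number of passes, ceil(n / PA_SIZE)², grows with the problem.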

Fixed-Sized NoC-PAs-based Robust SS vector co-processor


Stage 1: $V^{\mathrm{TEMP\_1}}_{(n\times 1)} = F_{(n\times n)}\, u_{(n\times 1)}$ (PA1 computes $F u$)

[Figure: the degraded large-scale RS image is delivered by the embedded processor as the vectors u(1), u(2), …, u(n). The elements of u and of F (F_{1,1}, …, F_{n,n}) are held in 32-bit FIFO buffers under global control (full/en signals), and F is fed row by row in skewed order (data skewed). In fixed-sized PA1 (m = 32), u advances through one-step delays (D); each product of two [32,23] operands is formed in [64,46] format and truncated (T) back to [32,23] before being accumulated into V^TEMP_1, under tiled and local control. D: one-step delay; T: truncate.]
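Stage 1 can be modeled as a linear array in which PE i owns row i of F. This sketch assumes that u(k) enters PE 0 at cycle k and moves one PE per cycle through the delay registers D, so it meets the skewed element F(i,k) at PE i on cycle i + k. The register-level behavior and the name `stage1_linear_array` are assumptions for illustration, and plain `double` arithmetic stands in for the [32,23] fixed-point datapath.

```c
#include <assert.h>
#include <stddef.h>

/* Stage 1 sketch: V_TEMP1 = F u on a linear array of n PEs (F row-major, n x n).
   u(k) enters PE 0 at cycle k and shifts one PE per cycle (one-step delay D);
   F is skewed so that F(i,k) reaches PE i on cycle i + k.
   Returns the number of cycles used (2n - 1). */
size_t stage1_linear_array(size_t n, const double *F, const double *u,
                           double *v_temp1)
{
    assert(n > 0);
    double u_reg[n];                       /* one delay register per PE (C99 VLA) */
    for (size_t i = 0; i < n; i++) {
        u_reg[i] = 0.0;
        v_temp1[i] = 0.0;
    }
    size_t cycles = 2 * n - 1;
    for (size_t t = 0; t < cycles; t++) {
        /* shift u one PE to the right; afterwards u_reg[i] holds u(t - i) */
        for (size_t i = n - 1; i > 0; i--)
            u_reg[i] = u_reg[i - 1];
        u_reg[0] = (t < n) ? u[t] : 0.0;
        /* every PE fires in parallel on the skewed F element it sees now */
        for (size_t i = 0; i < n; i++) {
            if (t < i || t - i >= n)
                continue;                  /* bubble introduced by the skew */
            size_t k = t - i;
            v_temp1[i] += F[i * n + k] * u_reg[i];
        }
    }
    return cycles;
}
```

The skew costs n − 1 extra cycles of fill and drain, but after that every PE performs one multiply-accumulate per cycle.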


Stage 2: $V^{\mathrm{TEMP\_2}}_{(n\times n)} = V^{\mathrm{TEMP\_1}}_{(n\times 1)}\, u^{+}_{(1\times n)}$ (PA2 computes $F u u^{+}$)

[Figure: fixed-sized PA2 (m = 4) is a 2-D array of PEs connected through one-step delays (D). The entries of V^TEMP_1 enter along the rows and the elements u_{1,1}, u_{1,2}, …, u_{1,m} enter along the columns; each product is formed in [64,46] format and truncated (T) to [32,23], and the resulting entries of V^TEMP_2 (e.g. V^TEMP_2_{1,3}) are passed on to the next stage.]
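Stage 2 is an outer product, so each PE (i, j) of a 2-D array needs only one multiplication. The sketch below sweeps that array sequentially and assumes real-valued data, so that u⁺ reduces to the transpose uᵀ; complex data would need the conjugate of u(j). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stage 2 sketch: V_TEMP2 (n x n) = V_TEMP1 (n x 1) * u^+ (1 x n).
   Assumption: real-valued data, so u^+ is simply u^T.
   PE (i, j) of a 2-D array would compute one entry; here the array
   is swept in row-major order. */
void stage2_outer_product(size_t n, const double *v_temp1, const double *u,
                          double *v_temp2)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)          /* V_TEMP1(i) travels along row i */
        for (size_t j = 0; j < n; j++)      /* u(j) travels along column j */
            v_temp2[i * n + j] = v_temp1[i] * u[j];
}
```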


Stage 3: $V_{(n\times 1)} = \mathrm{diag}\{V^{\mathrm{TEMP\_2}}_{(n\times n)}\, F^{+}\}$ (PA3 computes $\mathrm{diag}\{F u u^{+} F^{+}\}$)

[Figure: fixed-sized PA3 (m = 32) is a linear chain of PEs separated by one-step delays (D). The entries of F and of V^TEMP_2 are fed in skewed order from FIFOs under tiled control; each product is truncated (T) from [64,46] to [32,23], and the elements v_{1,1}, …, v_{n,n} of the robust SS vector V are collected in a 32-bit FIFO buffer under global control and sent to the embedded processor.]
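Stage 3 needs only the diagonal of V^TEMP_2 F⁺, which is why a linear chain of PEs suffices: with real-valued data (an assumption here, so F⁺ = Fᵀ), entry i is the dot product of row i of V^TEMP_2 with row i of F. That takes n² multiply-accumulates instead of the n³ required for the full matrix product. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stage 3 sketch: V = diag{V_TEMP2 F^+}, with V_TEMP2 and F row-major n x n.
   Assumption: real-valued data, so F^+ = F^T and
   [V_TEMP2 F^T](i,i) = sum_k V_TEMP2(i,k) * F(i,k).
   Only the n diagonal entries are formed (n^2 MACs instead of n^3). */
void stage3_diag(size_t n, const double *v_temp2, const double *F, double *V)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;                    /* one PE accumulates one entry */
        for (size_t k = 0; k < n; k++)
            acc += v_temp2[i * n + k] * F[i * n + k];
        V[i] = acc;
    }
}
```

Chaining the three stages for real data gives V(i) = ((F u)(i))², which is exactly diag{F u u⁺ F⁺}. With F = [[1, 2], [3, 4]] and u = [5, 6], for instance, F u = [17, 39] and V = [289, 1521].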

Fixed-Sized NoC-PAs-based RBR estimator co-processor


[Figure: data path of the RBR estimator co-processor. A fixed-sized PA (m = 64) computes $\Omega V$: the elements of $\Omega$ are fed row by row in skewed order, while the elements of V (V_{1,1}, …, V_{n,n}) advance through one-step delays (D). Both arrive from the embedded processor through 32-bit FIFO buffers under global control (full/en signals). Each product of [32,23] operands is formed in [64,46] format and truncated (T) to [32,23] before accumulation, under tiled and local control. The elements of $b_0$ arrive through a separate FIFO buffer, and the resulting estimate $\hat{b}_{\mathrm{RBR}}$ is returned to the embedded processor as the reconstructed RS image $\hat{b}_{\mathrm{RBR}}(1), \ldots, \hat{b}_{\mathrm{RBR}}(n)$. D: one-step delay; T: truncate.]
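The datapath labels [32,23] and T [64,46] suggest a fixed-point format. This sketch assumes that [W,F] denotes a W-bit two's-complement word with F fractional bits. Under that reading, a product of two [32,23] words is exactly [64,46], and dropping its 23 least significant bits (T) returns it to [32,23]. The sketch applies the format to the ΩV product. Saturation and overflow handling are not modeled, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed format: [32,23] = 32-bit two's complement, 23 fractional bits. */
#define FRAC_BITS 23
#define ONE_Q23   ((int64_t)1 << FRAC_BITS)

/* T: [64,46] -> [32,23]. Dropping LSBs rounds toward minus infinity.
   Assumes the result fits in 32 bits (no saturation modeled). */
static int32_t truncate_q46_to_q23(int64_t p)
{
    int64_t q = p / ONE_Q23;              /* C division rounds toward zero */
    if (p % ONE_Q23 != 0 && p < 0)
        q -= 1;                           /* adjust to floor, like bit truncation */
    return (int32_t)q;
}

/* Sketch of the PA computing Omega * V in fixed point (Omega row-major n x n):
   each MAC multiplies two [32,23] words, truncates, then accumulates.
   Assumes the running sum never overflows 32 bits. */
void omega_times_v_q23(size_t n, const int32_t *omega, const int32_t *v,
                       int32_t *out)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int32_t acc = 0;
        for (size_t k = 0; k < n; k++) {
            int64_t prod = (int64_t)omega[i * n + k] * v[k];   /* [64,46] */
            acc += truncate_q46_to_q23(prod);                    /* [32,23] */
        }
        out[i] = acc;
    }
}
```

For example, 1.0 is encoded as 2²³ = 8388608 and 3.0 as 25165824. The row [1.0, 0.5] applied to V = [3.0, −1.5] yields 18874368, which is 2.25 in this format.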

New Perspective: VLSI-FPGA Platforms





A novel VLSI-FPGA platform offers a new perspective for the real-time processing of newer RS applications.



VLSI-FPGA Platform

[Figure: VLSI-FPGA platform. On the FPGA, a control system and an embedded processor manage a buffer memory (FIFO) and a spatial-temporal reorder stage. The image data $u^{(j)}$ of the degraded large-scale RS image and the robustified reconstruction operator $F$ are passed to a VLSI co-processor with a bit-level MPPA architecture, which returns the reconstructed RS image $\hat{b}_{\mathrm{RFS}}^{(j)}$.]

Performance Analysis: FPGA


                     HW co-processors
Synthesis metrics    Robust SS vector    RBR estimator
Slices               8158                3289
DSP48                144                 32
LUTs                 7539                2278
Flip-Flops           6304                2788


Implementation                            RBR processing time (seconds)
Evaluated PC-oriented implementation      19.7
Proposed efficient RBR architecture       1.26
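The speedup quoted in the conclusions follows directly from the two processing times in this table:

```latex
S = \frac{T_{\text{PC}}}{T_{\text{proposed}}} = \frac{19.7\ \text{s}}{1.26\ \text{s}} \approx 15.6 \approx 16
```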


Conclusions


The implementation results show that the proposed NoC-PA-oriented architecture drastically reduces the overall processing time of the RBR algorithm. The architecture is efficiently implemented in MPSoC mode, instead of relying on systems based on traditional DSPs or PC-cluster platforms.



With the proposed architecture, the RBR algorithm reconstructs the large-scale RS image in only 1.26 seconds, compared with the 19.7 seconds required by the C++ implementation. The achieved processing time is therefore approximately 16 times shorter than that of the conventional C++ PC-based implementation.


Recent Selected Journal Papers



- A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, "Towards Real Time Implementation of Reconstructive Signal Processing Algorithms Using Systolic Arrays Coprocessors", Journal of Systems Architecture (JSA), Elsevier, vol. 56, no. 8, pp. 327-339, August 2010. ISSN: 1383-7621, doi:10.1016/j.sysarc.2010.05.004. JCR.

- A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, "Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment", EURASIP Journal on Advances in Signal Processing (JASP), Hindawi, vol. 2010, 31 pages, 2010. ISSN: 1687-6172, e-ISSN: 1687-6180, doi:10.1155/ASP. JCR.

- Yuriy V. Shkvarko, A. Castillo Atoche, D. Torres, "Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques", Pattern Recognition Letters, Elsevier, 2011, in press. JCR.



Thanks for your attention.


Dr. Alejandro Castillo Atoche

Email: acastill@uady.mx