ISTC-EC @ Cornell
Accelerating Belief Propagation in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory
Cornell University
http://www.csl.cornell.edu/

The Cornell Team
• Prof. José Martínez (PI), Prof. Rajit Manohar @ Computer Systems Lab
• Prof. Tsuhan Chen @ Advanced Multimedia Processing Lab
• MS/Ph.D. students
  – Yuan Tian, MS ’13
  – Skand Hurkat
  – Xiaodong Wang
The Cornell Graph

The Cornell Project
• Provide hardware accelerators for belief propagation algorithms on embedded SoCs (retail/car/home/mobile)
  – High speed
  – Very low power
  – Self-optimizing
  – Highly programmable
[Figure: BP Accelerator within SoC; labels: Graph, Inference Algorithm, Result]

What is belief propagation?
Belief propagation is a message passing algorithm for performing inference on graphical models, such as Bayesian networks or Markov Random Fields.

What is belief propagation?
• Labelling problem
• Energy as a measure of convergence
• Minimize energy (MAP label estimation)
• Exact results for trees
  – Converges in exactly two iterations
• Approximate results for graphs with loops
  – Yields “good” results in practice
• Minimum over large neighbourhoods
• Close to optimal solution
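
The bullets above can be illustrated with a minimal sketch of min-sum belief propagation on a chain, the simplest tree. It is not the authors' library: the function names, the Potts smoothness cost and the toy data costs are all assumptions made for illustration. One forward sweep and one backward sweep yield exact min-marginals, and the per-node minimum gives the MAP labelling when that labelling is unique.

```python
def energy(labels, unary, pairwise):
    """Energy of a labelling: per-node data costs plus neighbour smoothness costs."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    smooth = sum(pairwise(a, b) for a, b in zip(labels, labels[1:]))
    return data + smooth


def min_sum_chain(unary, pairwise):
    """Exact MAP inference on a chain using two message-passing sweeps."""
    n, k = len(unary), len(unary[0])
    fwd = [[0] * k for _ in range(n)]  # fwd[i]: message into node i from the left
    bwd = [[0] * k for _ in range(n)]  # bwd[i]: message into node i from the right
    for i in range(1, n):              # sweep 1: left to right
        for l in range(k):
            fwd[i][l] = min(unary[i - 1][m] + fwd[i - 1][m] + pairwise(m, l)
                            for m in range(k))
    for i in range(n - 2, -1, -1):     # sweep 2: right to left
        for l in range(k):
            bwd[i][l] = min(unary[i + 1][m] + bwd[i + 1][m] + pairwise(l, m)
                            for m in range(k))
    # Belief (min-marginal): lowest energy of any labelling with node i fixed to l.
    beliefs = [[unary[i][l] + fwd[i][l] + bwd[i][l] for l in range(k)]
               for i in range(n)]
    # Per-node argmin is the MAP labelling only if the optimum is unique.
    labels = [min(range(k), key=belief.__getitem__) for belief in beliefs]
    return labels, beliefs


def potts(a, b):
    """Hypothetical smoothness cost: 0 for equal labels, 1 otherwise."""
    return 0 if a == b else 1


unary = [[0, 3], [2, 1], [4, 0], [0, 2]]  # toy data costs: 4 nodes, 2 labels
labels, beliefs = min_sum_chain(unary, potts)
print(labels, energy(labels, unary, potts))
```

On graphs with loops the same update is iterated until the energy stops decreasing, and the result is approximate rather than exact.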

Not all “that” alien to embedded
[Figure: graph with nodes s_0, s_11 to s_13, s_21 to s_23, s_31 to s_33, s_41 to s_43 and s_5, and a second graph with nodes s_0 to s_5]
Remember the Viterbi algorithm?
• Used extensively in digital communications
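
As a reminder of how closely Viterbi decoding resembles belief propagation, here is a minimal sketch of the Viterbi algorithm written as a min-sum recursion over negative log-probabilities. The two-state model and its probabilities are hypothetical toy values, and the sketch assumes every probability is strictly positive so that the logarithms are defined.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence of a hidden Markov model for the given observations."""
    n_states = len(start_p)
    # cost[s]: lowest negative log-probability of any path ending in state s.
    cost = [-math.log(start_p[s] * emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = cost
        pointers, cost = [], []
        for s in range(n_states):
            best = min(range(n_states),
                       key=lambda r: prev[r] - math.log(trans_p[r][s]))
            pointers.append(best)
            cost.append(prev[best] - math.log(trans_p[best][s])
                        - math.log(emit_p[s][o]))
        back.append(pointers)
    # Trace the back-pointers from the cheapest final state.
    state = min(range(n_states), key=cost.__getitem__)
    best_cost = cost[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, math.exp(-best_cost)


# Hypothetical toy model: 2 hidden states, 3 observation symbols.
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3], [0.4, 0.6]]
emit_p = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, prob = viterbi([0, 1, 2], start_p, trans_p, emit_p)
print(path, prob)
```

The forward recursion is the same min-sum message update used by belief propagation on a chain; the back-pointers only record which term achieved each minimum.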

What does this mean?
• Every mobile device uses Viterbi decoders
  – Error correction codes (e.g., turbo codes)
  – Mitigating inter-symbol interference (ISI)
• An increasing number of mobile applications involve belief propagation
  – More general belief propagation accelerators can greatly improve user experience with mobile devices

Target markets
Retail/Car/Home/Mobile
• Image processing
  – De-noising
  – Segmentation
  – Object detection
  – Gesture recognition
• Handwriting recognition
  – Improved recognition through context identification
• Speech recognition
  – Hidden Markov models are key to speech recognition
Servers
• Data mining tasks
  – Part-of-speech tagging
  – Information retrieval
  – “Knowledge graph” like applications
• Machine learning based tasks
  – Constructive machine learning
  – Recommendation systems
• Scientific computing
  – Protein structure inference

Hardware accelerator for BP
[Figure: BP Accelerator within SoC; labels: Graph, Inference Algorithm, Result]

Work done so far
Software
• General purpose MRF inference library
  – Support for arbitrary graphs
  – Floating point math
  – Parallel techniques for faster inference
• Library optimized for grid graphs
  – Optimized data structures
  – Template can use any data type
  – Multiple inference techniques optimized for early vision
  – Stereo matching in ∼200 ms
Hardware
• High level synthesis of message update unit
  – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard
  – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference
  – Fixed point math
• GraphGen collaboration
  – On-going work
  – Stereo matching task mapped to multiple platforms
  – ∼10x speedup on GPU w.r.t. CPU-only implementation
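
To make the "fixed point math" item concrete, the following is a rough sketch of what a message update in integer arithmetic could look like. It does not describe the synthesized unit: the 16-bit width, the 4 fractional bits, the saturating adder and the subtract-the-minimum normalization are all assumptions chosen for illustration.

```python
def to_fixed(x, frac_bits=4, total_bits=16):
    """Quantize a non-negative cost to an unsigned fixed-point integer, saturating at the top."""
    max_val = (1 << total_bits) - 1
    return min(max_val, max(0, round(x * (1 << frac_bits))))


def sat_add(a, b, total_bits=16):
    """Add two unsigned values, clamping instead of wrapping on overflow."""
    return min(a + b, (1 << total_bits) - 1)


def fixed_message_update(data_cost, incoming, smooth_cost, total_bits=16):
    """Min-sum message from a node to one neighbour, using integer costs only.

    data_cost[lp]       : quantized data cost of label lp at the sending node
    incoming[j][lp]     : quantized messages from the node's other neighbours
    smooth_cost[lp][lq] : quantized smoothness cost between labels lp and lq
    """
    k = len(data_cost)
    h = []
    for lp in range(k):
        acc = data_cost[lp]
        for msg in incoming:
            acc = sat_add(acc, msg[lp], total_bits)
        h.append(acc)
    out = [min(sat_add(h[lp], smooth_cost[lp][lq], total_bits) for lp in range(k))
           for lq in range(k)]
    # Subtract the minimum so message values stay small across iterations.
    lo = min(out)
    return [v - lo for v in out]


# Hypothetical 3-label example with a Potts smoothness cost of 1.0.
data = [to_fixed(c) for c in (0.0, 1.0, 2.5)]
penalty = to_fixed(1.0)
smooth = [[0 if a == b else penalty for b in range(3)] for a in range(3)]
message = fixed_message_update(data, [[0, 8, 16]], smooth)
print(message)
```

Subtracting the minimum does not change which label wins, because only differences between message entries affect the minimization.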
Hierarchical belief propagation

Results – Stereo Matching
[Figure: Comparing inference algorithms on “Tsukuba” benchmark; axes: Updates, Energy]

GraphGen synthesis of BP-M
• BP-M update (log-space messages) implemented using GraphGen (Intel/CMU/UW)
• GPU implementation ∼10x faster than CPU based implementation
• On-going work on FPGA based implementation and on implementing hierarchical update
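
As a sketch of why log-space messages are convenient, the example below assumes BP-M refers to the max-product update and compares it with its log-space (min-sum) form. This is not GraphGen code; the function names and the toy potentials are hypothetical. Taking negative logarithms turns products into sums and the maximum into a minimum, so the two updates agree entry by entry.

```python
import math


def max_product_message(phi, psi, incoming):
    """Probability-domain update: m[lq] = max over lp of phi[lp] * psi[lp][lq] * incoming products."""
    k = len(phi)
    h = [phi[lp] * math.prod(m[lp] for m in incoming) for lp in range(k)]
    return [max(h[lp] * psi[lp][lq] for lp in range(k)) for lq in range(k)]


def min_sum_message(data_cost, smooth_cost, incoming):
    """Log-space update: products become sums and max becomes min."""
    k = len(data_cost)
    h = [data_cost[lp] + sum(m[lp] for m in incoming) for lp in range(k)]
    return [min(h[lp] + smooth_cost[lp][lq] for lp in range(k)) for lq in range(k)]


def neg_log(values):
    """Convert strictly positive potentials to costs."""
    return [-math.log(v) for v in values]


# Hypothetical potentials for a node with 3 labels and 2 other neighbours.
phi = [0.6, 0.3, 0.1]
psi = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
incoming = [[0.5, 0.2, 0.3], [0.2, 0.6, 0.2]]

prob_msg = max_product_message(phi, psi, incoming)
log_msg = min_sum_message(neg_log(phi),
                          [neg_log(row) for row in psi],
                          [neg_log(m) for m in incoming])
print(neg_log(prob_msg))
print(log_msg)
```

Working with costs avoids multiplications and the underflow that repeated products of small probabilities can cause, which is one reason log-space updates tend to map well to GPUs and FPGAs.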

Cornell Publications (2013 only)
• 3x Comp. Vision & Pattern Recognition (CVPR)
• 3x Asynchronous VLSI (ASYNC)
• 2x Intl. Symp. Computer Architecture (ISCA)
• 1x Intl. Conf. Image Processing (ICIP)
• 1x ASPLOS (w/ GraphGen folks, under review)

Year 3 Plans
• GraphGen extensions for BP applications
  – Multiple inference techniques
• Extraction of “BP ISA”
  – Ops on arbitrary graphs
  – Efficient representation
• Amplification work on UAV ensembles
  – Self-optimizing, collaborative SoCs
• One-day “graph” workshop with GraphGen+UIUC