The Fast Evaluation of Hidden Markov Models on GPU


Presented by Ting-Yu Mu & Wooyoung Lee

Introduction

- Hidden Markov Model (HMM):
  - A statistical method (probability based)
  - Used in a wide range of applications:
    - Speech recognition
    - Computer vision
    - Medical image analysis
- One of the problems that needs to be solved:
  - Evaluating the probability of an observation sequence on a given HMM (the evaluation problem)
  - The solution to the evaluation problem is the key to choosing the best-matched model among a set of HMMs
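Stated formally (standard HMM notation, not given on the slides): for a model lambda = (A, B, pi) with N states and an observation sequence O = o_1 ... o_T, the evaluation problem asks for

\[
P(O \mid \lambda) = \sum_{q_1, \ldots, q_T} \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
\]

Summing over all N^T state paths directly is intractable, which is why the efficient evaluation algorithm introduced later matters.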

Introduction

- Example
  - Application: speech recognition
  - Goal: recognize words one by one
  - Input:
    - The speech signal of a given word
    - Represented as a time sequence of coded spectral vectors
  - Output:
    - The observation sequence
    - Represented as indices into the spectral codebook

Introduction

- Example
  - The tasks:
    - Design an individual HMM for each word in the vocabulary
    - Perform unknown-word recognition (see the sketch below):
      - Use the solution of the evaluation problem to score each HMM against the observation sequence of the word
      - The model that scores the highest is selected as the result
  - The accuracy:
    - Based on the correctness of the chosen result
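A minimal C# sketch of this scoring step (illustrative only; each scorer is a stand-in for the evaluation algorithm, e.g. the forward pass shown later):

    using System;
    using System.Collections.Generic;

    static class Recognizer
    {
        // wordScorers maps each vocabulary word to a function that returns
        // P(O | model) for that word's HMM.
        public static string Recognize(int[] observations,
                                       Dictionary<string, Func<int[], double>> wordScorers)
        {
            string bestWord = null;
            double bestScore = double.NegativeInfinity;
            foreach (var entry in wordScorers)
            {
                double score = entry.Value(observations);
                if (score > bestScore)      // keep the highest-scoring model
                {
                    bestScore = score;
                    bestWord = entry.Key;
                }
            }
            return bestWord;
        }
    }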

Computational Load

- The computational load of the previous example consists of two parts:
  - Estimating the parameters of each HMM to build the models (the load varies per HMM)
    - Executed only once
  - Evaluating the probability of an observation sequence on each HMM
    - Executed many times during the recognition process
- The overall performance therefore depends on the complexity of the evaluation algorithm

Efficient Algorithm

- The algorithm with the lower order of complexity:
  - The forward-backward algorithm
  - Consists of two passes (the forward pass is sketched below):
    - Forward probability
    - Backward probability
  - Used extensively
  - Computationally intensive
- One way to increase the performance:
  - Design a parallel algorithm
  - Utilize present-day multi-core systems
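A sequential C# sketch of the forward pass (the standard textbook formulation; the array names and parameterization are assumptions, not taken from the slides):

    using System;

    static class ForwardAlgorithm
    {
        // Computes P(O | lambda) for an N-state HMM.
        //   a[i, j] : transition probability from state i to state j
        //   b[j, k] : probability of emitting symbol k in state j
        //   pi[i]   : initial probability of state i
        //   obs[t]  : observed symbol index at time t
        public static double Evaluate(double[,] a, double[,] b, double[] pi, int[] obs)
        {
            int N = pi.Length, T = obs.Length;
            var alpha = new double[T, N];

            // Initialization: alpha[0, i] = pi[i] * b[i, obs[0]]
            for (int i = 0; i < N; i++)
                alpha[0, i] = pi[i] * b[i, obs[0]];

            // Induction: alpha[t, j] = (sum_i alpha[t-1, i] * a[i, j]) * b[j, obs[t]]
            for (int t = 1; t < T; t++)
                for (int j = 0; j < N; j++)
                {
                    double sum = 0.0;
                    for (int i = 0; i < N; i++)
                        sum += alpha[t - 1, i] * a[i, j];
                    alpha[t, j] = sum * b[j, obs[t]];
                }

            // Termination: P(O | lambda) = sum_i alpha[T-1, i]
            double p = 0.0;
            for (int i = 0; i < N; i++)
                p += alpha[T - 1, i];
            return p;
        }
    }

This costs O(N^2 * T) per sequence, versus O(N^T) for direct enumeration; the per-state work at each time step is what the GPU version later distributes across threads.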

General Purpose GPU

- Why choose the Graphics Processing Unit:
  - Rapid increases in performance
  - Supports floating-point operations
  - High computational power and memory bandwidth
- The GPU is specialized for compute-intensive, highly parallel computation
- More transistors are devoted to data processing rather than data caching

CUDA Programming Model

- The GPU is seen as a compute device that executes the parts of the application that:
  - Have to be executed many times
  - Can be isolated as a function
  - Work independently on different data
- Such a function can be compiled to run on the device; the resulting program is called a kernel (see the sketch below)
- The batch of threads that executes a kernel is organized as a grid of blocks
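A minimal CUDAfy.NET kernel to make this concrete (a sketch; the element-scaling operation is an invented example, not from the slides):

    using Cudafy;

    public class Kernels
    {
        // A kernel: an isolated function run by many threads, each on different data.
        [Cudafy]
        public static void ScaleVector(GThread thread, float[] data, float factor, int n)
        {
            // Each thread handles one element, identified by its global index.
            int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
            if (i < n)
                data[i] *= factor;
        }
    }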

CUDA Programming Model

- Thread block:
  - Contains a batch of threads that can cooperate with each other (sketched below) through:
    - Fast shared memory
    - Synchronization
    - Thread IDs
  - A block can be a one-, two-, or three-dimensional array of threads
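In CUDAfy.NET these cooperation features map onto GThread calls; a sketch of a block-level sum using shared memory and barriers (the reduction is an assumed illustration, and it presumes the input length is a multiple of the 256-thread block size):

    using Cudafy;

    public class BlockKernels
    {
        [Cudafy]
        public static void BlockSum(GThread thread, float[] input, float[] blockSums)
        {
            // Fast shared memory, visible to every thread in this block.
            float[] cache = thread.AllocateShared<float>("cache", 256);

            int tid = thread.threadIdx.x;   // thread ID within the block
            int i = thread.blockIdx.x * thread.blockDim.x + tid;
            cache[tid] = input[i];
            thread.SyncThreads();           // synchronize the whole block

            // Tree reduction over the block's 256 values.
            for (int stride = 128; stride > 0; stride /= 2)
            {
                if (tid < stride)
                    cache[tid] += cache[tid + stride];
                thread.SyncThreads();
            }
            if (tid == 0)
                blockSums[thread.blockIdx.x] = cache[0];
        }
    }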

CUDA Programming Model

- Grid of thread blocks:
  - A single block contains only a limited number of threads
  - A grid allows a much larger number of threads to execute the same kernel with one invocation (see the launch sketch below)
  - Blocks are identifiable through their block ID
  - Threads in different blocks cannot cooperate, so grids trade some thread cooperation for scale
  - A grid can be a one- or two-dimensional array of blocks
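On the host side, the grid dimension is typically chosen so the blocks cover all elements; a sketch launching the earlier ScaleVector kernel (gpu is a GPGPU instance and devData a device array, both set up as in the full example later):

    int n = 100000;
    int blockSize = 256;                              // threads per block (limited)
    int gridSize = (n + blockSize - 1) / blockSize;   // enough blocks to cover n elements

    // One invocation runs gridSize * blockSize threads over the same kernel.
    gpu.Launch(gridSize, blockSize).ScaleVector(devData, 2.0f, n);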

CUDA Programming Model

(diagram slide)

CUDA Memory Model

(diagram slide)

Parallel Algorithm on GPU

- The task of computing the evaluation probability is split into pieces and delivered to several threads
- A thread block evaluates a Markov model
- Calculating the dimension of the grid:
  - Obtained by dividing the number of states N by the block size
- Each forward probability is computed by a thread within a thread block (see the sketch below)
- The threads need to synchronize because they operate on shared data
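A sketch of how one induction step of the forward pass might map onto threads under this scheme (an illustrative reconstruction, not the authors' code; one thread per state, with the matrices flattened into 1-D device arrays):

    using Cudafy;

    public class HmmKernels
    {
        [Cudafy]
        public static void ForwardStep(GThread thread, float[] a, float[] b,
                                       float[] alphaPrev, float[] alphaCurr,
                                       int N, int M, int symbol)
        {
            // One thread per state j:
            //   alpha_t[j] = (sum_i alpha_{t-1}[i] * a[i, j]) * b[j, symbol]
            int j = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
            if (j < N)
            {
                float sum = 0f;
                for (int i = 0; i < N; i++)
                    sum += alphaPrev[i] * a[i * N + j];   // row-major N x N transitions
                alphaCurr[j] = sum * b[j * M + symbol];   // row-major N x M emissions
            }
        }
    }

The host launches this kernel once per time step with a grid dimension of ceil(N / blockSize), swapping alphaPrev and alphaCurr between steps; the relaunch acts as the synchronization point that the shared alpha data requires.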

CUDAfy.NET

- What is CUDAfy.NET?
  - Made by Hybrid DSP Systems in the Netherlands
  - A set of libraries and tools that permit general-purpose programming of NVIDIA CUDA GPUs from the Microsoft .NET framework
  - Combines flexibility, performance, and ease of use
  - First release: March 17, 2011

Cudafy.NET SDK

- Cudafy .NET Library:
  - Cudafy Translator (converts .NET code to CUDA C)
  - Cudafy Library (CUDA support for .NET)
  - Cudafy Host (host GPU wrapper)
  - Cudafy Math (FFT + BLAS)
- The translator converts .NET code into CUDA code; it is based on ILSpy (an open-source .NET assembly browser and decompiler)

Cudafy Translator (diagram slide)

GENERAL CUDAFY PROCESS

- Two main components to the Cudafy SDK:
  - Translation from .NET to CUDA C and compilation using the NVIDIA compiler (this results in a Cudafy module XML file)
  - Loading Cudafy modules and communicating with the GPU from the host
- It is not necessary for the target machine to perform the first step itself.

1. Add a reference to Cudafy.NET.dll in your .NET project.
2. Add the Cudafy, Cudafy.Host and Cudafy.Translator namespaces to your source files (using directives in C#).
3. Add a parameter of GThread type to GPU functions and use it to access thread, block and grid information as well as the specialist synchronization and local shared memory features.
4. Place a Cudafy attribute on the functions.
5. In your host code, before using the GPU functions, call CudafyTranslator.Cudafy(). This returns a CudafyModule instance.
6. Load the module into a GPGPU instance. The GPGPU type allows you to interact seamlessly with the GPU from your .NET code. (All six steps are combined in the sketch below.)
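Putting the six steps together, a minimal end-to-end sketch (the vector-add kernel and data sizes are invented examples; the host calls are the standard Cudafy ones):

    using System;
    using Cudafy;                 // step 2: namespaces
    using Cudafy.Host;
    using Cudafy.Translator;

    public class Program
    {
        [Cudafy]                                               // step 4: Cudafy attribute
        public static void AddVectors(GThread thread,          // step 3: GThread parameter
                                      float[] a, float[] b, float[] c, int n)
        {
            int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
            if (i < n)
                c[i] = a[i] + b[i];
        }

        public static void Main()
        {
            CudafyModule km = CudafyTranslator.Cudafy();       // step 5: translate and compile
            GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda);   // step 6: get a GPGPU instance
            gpu.LoadModule(km);

            int n = 1024;
            float[] a = new float[n], b = new float[n], c = new float[n];
            for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

            // Copy inputs to the device, launch a grid of blocks, copy the result back.
            float[] devA = gpu.CopyToDevice(a);
            float[] devB = gpu.CopyToDevice(b);
            float[] devC = gpu.Allocate<float>(n);
            gpu.Launch((n + 255) / 256, 256).AddVectors(devA, devB, devC, n);
            gpu.CopyFromDevice(devC, c);
            gpu.FreeAll();
            Console.WriteLine(c[10]);   // prints 30
        }
    }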

Development Requirements

- NVIDIA CUDA Toolkit 4.1
- Visual Studio 2010
- Microsoft VC++ compiler (used by the NVIDIA CUDA compiler)
- Windows (XP SP3, Vista, 7; 32-bit/64-bit)
- NVIDIA GPU
- NVIDIA graphics driver


GPU vs PLINQ vs LINQ

(benchmark chart slides)


References

- ILSpy: http://wiki.sharpdevelop.net/ilspy.ashx
- Cudafy.NET: http://cudafy.codeplex.com/
- Using Cudafy for GPGPU Programming in .NET: http://www.codeproject.com/Articles/202792/Using-Cudafy-for-GPGPU-Programming-in-NET
- Base64 Encoding on a GPU: http://www.codeproject.com/Articles/276993/Base64-Encoding-on-a-GPU
- High Performance Queries: GPU vs LINQ vs PLINQ: http://www.codeproject.com/Articles/289551/High-Performance-Queries-GPU-vs-PLINQ-vs-LINQ