Accelerating image recognition


Oct 19, 2013


MACHINE VISION GROUP

Accelerating image recognition
on mobile devices using GPGPU

Miguel Bordallo¹, Henri Nykänen², Jari Hannuksela¹, Olli Silvén¹ and Markku Vehviläinen³

¹ University of Oulu, Finland
² Visidon Ltd., Oulu, Finland
³ Nokia Research Center, Tampere, Finland

Jari Hannuksela,
Olli Silvén

Machine Vision Group, Infotech Oulu

Department of Electrical and Information Engineering

University of Oulu, Finland


Contents

Introduction
    Mobile image recognition
    Local Binary Pattern
Graphics processor as a computing engine
GPU-accelerated image recognition
    LBP fragment shader implementation
    Image preprocessing
Experiments and results
    Speed
    Power consumption


Motivation


Face detection and recognition are key components of future multimodal user interfaces
Mobile computation power is still not properly harnessed for real-time computer vision
Computationally demanding tasks compromise battery life
Need for energy-efficient and computationally efficient solutions


Face analysis using local binary patterns


Face analysis is one of the major challenges in computer vision
The LBP method has already been adopted by many leading scientists
Excellent results in face recognition and authentication, face detection, facial expression recognition, and gender classification



Local Binary Pattern


GPU as a computing engine



Newer phones include a GPU chipset
OpenGL ES is a highly optimized and attractive accelerator interface
Emerging platforms (OpenCL EP) will facilitate using the GPU as a computing resource
Compatible data formats for the graphics and camera subsystems are desirable
The GPU can be treated as an independent entity


Fixed pipeline (OpenGL ES 1.1) vs. programmable pipeline (OpenGL ES 2.0)


Stream processing (OpenGL) vs. shared memory processing (CUDA)


OpenCL (Embedded Profile)


Emerging platforms will offer the needed flexibility
OpenCL Embedded Profile is a subset of OpenCL
Supports data-parallel and task-parallel programming models
Code executed concurrently on CPU & GPU (& DSP)
Other current and future resources are compatible
Easier programming in a heterogeneous processor environment
High parallelization of image processing computations -> high efficiency


GPU assisted face analysis process


GPU-accelerated image recognition

OpenGL ES 2.0:
    Image features (LBP,...) extraction
    Image preprocessing
    Image scaling
    Displaying
C code:
    Camera control
    Classification


LBP fragment shader implementation

Access the image via texture lookup
Fetch the selected picture pixel
Fetch the neighbours' values
Compute the binary vector
Multiply by the weighting factors
Two versions:
    Version 1: calculates one LBP map in a single grayscale channel
    Version 2: calculates 4 LBP maps in the RGBA channels
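
The per-pixel steps listed above can be sketched on the CPU in plain Python. This is only an illustration of the basic 8-neighbour, radius-1 LBP operator, not the fragment shader used in the talk. The function names, the clockwise neighbour ordering, the ">=" comparison and the skipping of border pixels are all assumptions made for this sketch.

```python
# Minimal sketch of the basic 3x3 LBP operator (assumed variant: 8 neighbours,
# radius 1). Names and neighbour ordering are hypothetical, chosen for clarity.

# Neighbour offsets (dy, dx), clockwise from the top-left corner.
# The position in this list determines the bit weight 2**bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Return the 8-bit LBP code of the pixel at row y, column x."""
    center = img[y][x]                       # fetch the selected pixel
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[y + dy][x + dx]      # fetch one neighbour value
        if neighbour >= center:              # one element of the binary vector
            code += 1 << bit                 # weight it by 2**bit
    return code


def lbp_map(img):
    """Return the LBP map of a grayscale image given as a list of rows.

    Border pixels are skipped (assumption), so the result is two rows and
    two columns smaller than the input.
    """
    height, width = len(img), len(img[0])
    return [[lbp_code(img, y, x) for x in range(1, width - 1)]
            for y in range(1, height - 1)]


if __name__ == "__main__":
    sample = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
    print(lbp_map(sample))  # [[120]]
```

In a fragment shader the same comparison runs once per output pixel, with the image fetched through texture lookups instead of list indexing.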


Preprocessing

Create quad -> Divide texture & convert to grayscale -> Render each piece in one channel
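
As a rough CPU-side illustration of this preprocessing idea, the sketch below converts an RGB image to grayscale, splits it into four pieces and stores each piece in one channel of a smaller four-channel image. The slide does not say how the texture is divided, so the split into quadrants, the Rec. 601 luma weights and the function names are assumptions of this sketch.

```python
# Sketch only: pack the four quadrants of a grayscale image into the four
# channels of an image with half the width and height. Quadrant layout and
# grayscale weights are assumptions, not taken from the slides.

def to_gray(rgb):
    """Convert an (r, g, b) tuple to one gray value (integer Rec. 601 weights)."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def pack_quadrants(img):
    """Pack an RGB image (list of rows of (r, g, b) tuples) into four channels.

    Height and width must be even. Each output pixel holds the gray values of
    the (top-left, top-right, bottom-left, bottom-right) quadrants.
    """
    height, width = len(img), len(img[0])
    half_h, half_w = height // 2, width // 2
    gray = [[to_gray(pixel) for pixel in row] for row in img]
    return [[(gray[y][x],
              gray[y][x + half_w],
              gray[y + half_h][x],
              gray[y + half_h][x + half_w])
             for x in range(half_w)]
            for y in range(half_h)]


if __name__ == "__main__":
    tiny = [[(10, 10, 10), (20, 20, 20)],
            [(30, 30, 30), (40, 40, 40)]]
    print(pack_quadrants(tiny))  # [[(10, 20, 30, 40)]]
```

With the image packed this way, one pass over the smaller four-channel image can process all four pieces at once.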


Experiments setup


OMAP 3 family (OMAP3530)
    ARM Cortex-A8 CPU
    PowerVR SGX535 GPU
3 set-ups:
    Beagleboard revision 3
    Zoom AM3517EVM (TI Sitara)
    Nokia N900


Processing times: LBP extraction


Computing LBP in four channels (version 2) is faster than computing it in one
The CPU is faster than the GPU
Concurrent execution of the algorithms on GPU + CPU increases performance

Size        GPUv1    GPUv2    CPU      CPU & GPUv1   CPU & GPUv2
1024x1024   232 ms   180 ms   100 ms   116 ms        90 ms
512x512     76 ms    46 ms    25 ms    37 ms         23 ms
64x64       2 ms     1.5 ms   0.4 ms   1 ms          0.2 ms
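
To make the comparison concrete, the short sketch below computes the speedup of each configuration relative to the CPU alone, using the LBP extraction times reported on this slide. The dictionary layout and the function name are hypothetical. A value above 1 means the configuration is faster than the CPU alone.

```python
# LBP extraction times in milliseconds, copied from the slide's table.
LBP_MS = {
    "1024x1024": {"GPUv1": 232, "GPUv2": 180, "CPU": 100,
                  "CPU&GPUv1": 116, "CPU&GPUv2": 90},
    "512x512": {"GPUv1": 76, "GPUv2": 46, "CPU": 25,
                "CPU&GPUv1": 37, "CPU&GPUv2": 23},
    "64x64": {"GPUv1": 2, "GPUv2": 1.5, "CPU": 0.4,
              "CPU&GPUv1": 1, "CPU&GPUv2": 0.2},
}


def speedup_vs_cpu(size, config):
    """Return CPU time divided by the time of the given configuration."""
    row = LBP_MS[size]
    return row["CPU"] / row[config]


if __name__ == "__main__":
    for size in LBP_MS:
        ratio = speedup_vs_cpu(size, "CPU&GPUv2")
        print(f"{size}: CPU&GPUv2 is {ratio:.2f}x the speed of CPU alone")
```

For the 1024x1024 image, for example, the ratio for CPU&GPUv2 is 100 / 90, or about 1.11.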


Processing times: Preprocessing


The GPU outperforms the CPU in simple pixelwise operations (scaling + interpolation)
Concurrent execution of the algorithms on GPU + CPU is slower than the GPU alone due to data transfers

Size        GPU      CPU      CPU & GPU
1024x1024   35 ms    100 ms   54 ms
512x512     10 ms    25 ms    15 ms
64x64       0.2 ms   0.4 ms   0.4 ms


Speed (II): Preprocessing and LBP extraction

Size        GPU      CPU      GPU preprocessing & CPU LBP extraction
1024x1024   215 ms   205 ms   142 ms
512x512     56 ms    50 ms    40 ms
64x64       1.8 ms   1 ms     0.8 ms


Power and energy consumption

Power consumption of the GPU and the CPU is independent
    CPU: 190 mW
    GPU: 110-130 mW (increases with image size)
Energy consumption depends on processing time
The GPU has smaller energy per operation

Operation            GPU       CPU
Preprocessing        27 mJ     19 mJ
LBP                  5.3 mJ    10 mJ
Combined algorithm   32.3 mJ   28 mJ
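
The statement that energy depends on processing time can be illustrated with the relation energy = power x time. The sketch below assumes constant power draw during the operation, which is a simplification, and the function name is hypothetical. The example uses the CPU power of 190 mW and the 100 ms CPU preprocessing time reported for the 1024x1024 image; the result agrees with the 19 mJ CPU preprocessing entry in the table.

```python
def energy_mj(power_mw, time_ms):
    """Return energy in millijoules for a constant power draw.

    mW * ms gives microjoules, so divide by 1000 to get millijoules.
    """
    return power_mw * time_ms / 1000.0


if __name__ == "__main__":
    # CPU at 190 mW for 100 ms (values taken from the slides).
    print(energy_mj(190, 100))  # 19.0
```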


Summary





GPUs can be used as general-purpose processors
New platforms will offer more efficiency and flexibility
Non-optimized interfaces introduce excessive overheads


Future directions


Implementation of the classifier
Implementations in OpenCL
Multi-scale LBP
Implementation of other feature extraction methods


Thank you!


Any questions?



Thanks to Texas Instruments for the donation of the hardware.