Neuro-IT Roadmap: Successful in the Physical World

Dr. Werner Hemmert, CPR ST, 2003-12-02

Robust perception
Image processing
Speech recognition
Multimodal human-machine interaction
System integration
Scene analysis and representation


Automotive: Overtake-Checker and Door-Opener Assistant

[Figure: image-processing block diagram with panels a to f. Block labels: image, contour extraction, motion estimation along contours, temporally stabilized motion segmentation, temporal feedback, vehicle detection, lane-based transformation, lane, vehicles.]

Dr. Axel Techmer, Infineon Technologies


Security: Face Detection & Recognition

[Figure: example images, panels a to d.]

Leading-edge approach to face detection (University of Bochum)
Detection of face regions (a)
Preselection of frontal faces (b)
Face recognition (c, d)
Elastic graph matching
Gabor wavelet transform
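As a rough illustration of the last two items, the sketch below builds a complex Gabor kernel, collects the response magnitudes at one pixel into a "jet", and compares jets with a normalized dot product, which is the kind of node similarity used in elastic graph matching. It is a minimal sketch, not the Bochum implementation: the function names, kernel size, wavelength, sigma, and the number of orientations are all illustrative assumptions.

```python
import numpy as np


def gabor_kernel(size=31, wavelength=8.0, theta=0.0, sigma=4.0):
    """Complex 2-D Gabor kernel: a plane wave under a Gaussian envelope.

    All parameter values are illustrative assumptions, not taken from the slides.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinate measured along the wave direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    kernel = envelope * np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    # Subtract an envelope-shaped offset so the kernel sums to zero
    # and uniform image regions produce no response.
    return kernel - envelope * kernel.sum() / envelope.sum()


def gabor_jet(image, row, col, n_orient=8, **kw):
    """Response magnitudes at one pixel, one per orientation (a 'jet').

    The pixel must lie at least size // 2 pixels away from the image border.
    """
    half = kw.get("size", 31) // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    thetas = np.pi * np.arange(n_orient) / n_orient
    return np.array(
        [abs(np.sum(patch * np.conj(gabor_kernel(theta=t, **kw)))) for t in thetas]
    )


def jet_similarity(a, b):
    """Normalized dot product of two jets; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Vertical stripes with an 8-pixel period: the theta = 0 kernel should respond most.
cols = np.arange(64)
stripes = np.tile(np.cos(2.0 * np.pi * cols / 8.0), (64, 1))
jet = gabor_jet(stripes, 32, 32)
```

In a graph-matching setting, one jet would be computed per graph node and the node positions adjusted to maximize the summed similarity; that search step is not shown here.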

Ruhr University Bochum


Vision Instruction Processor (VIP)

Infineon Technologies, Corporate Research, Systems Technology


16 parallel processing elements

Prototype available since May 2001:
SIMD architecture
204 instructions
10 million logic transistors
On-chip memory: 37 KB
Technology: 0.35 µm
Clock: 100 MHz
Power consumption: 100 µW/MOPS
Die size: 22 mm x 23 mm
Peak performance: 53 GOPS

In 0.13 µm CMOS technology:
Clock: 200 MHz
Peak performance: 106 GOPS
Die size: 70 mm²
Power consumption: 700 mW

PCI board with VIP and camera submodules

Software tools for VIP:
Compiler, debugger, profiler

Software tools on host:
MS Visual C++ with VPL++ library

Application demonstrators:
Car vision, face recognition, MPEG2, graphics
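The VIP figures quoted above can be cross-checked with a little arithmetic. The sketch below only recombines numbers from the slide; the variable names are illustrative, and applying the 100 µW/MOPS figure at peak throughput is an assumption, since the slide does not state the operating point.

```python
# Figures quoted on the slide.
N_PE = 16                      # parallel processing elements
PROTO_CLOCK_HZ = 100e6         # 0.35 µm prototype
PROTO_PEAK_OPS = 53e9
PROTO_UW_PER_MOPS = 100.0
PROTO_DIE_MM = (22.0, 23.0)
SHRINK_CLOCK_HZ = 200e6        # 0.13 µm CMOS version
SHRINK_PEAK_OPS = 106e9
SHRINK_POWER_W = 0.7

# Operations per clock cycle implied by peak performance and clock rate.
proto_ops_per_cycle = PROTO_PEAK_OPS / PROTO_CLOCK_HZ       # 530 for the prototype
shrink_ops_per_cycle = SHRINK_PEAK_OPS / SHRINK_CLOCK_HZ    # same value after the shrink
ops_per_cycle_per_pe = proto_ops_per_cycle / N_PE

# Prototype power if 100 µW/MOPS held at peak throughput (assumption).
proto_power_w = PROTO_UW_PER_MOPS * 1e-6 * (PROTO_PEAK_OPS / 1e6)

# Energy efficiency of the 0.13 µm version in the same unit.
shrink_uw_per_mops = SHRINK_POWER_W / (SHRINK_PEAK_OPS / 1e6) * 1e6

# Prototype die area from the quoted edge lengths.
proto_die_mm2 = PROTO_DIE_MM[0] * PROTO_DIE_MM[1]

print(proto_ops_per_cycle, ops_per_cycle_per_pe)
print(proto_power_w, round(shrink_uw_per_mops, 2), proto_die_mm2)
```

Under these assumptions the doubling of peak performance tracks the doubling of the clock, so the operations per cycle are unchanged between the two versions.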


Car Vision Components - Hardware

[Figure: hardware block diagram. Recoverable labels: CPU, vehicle control, other sensors.]

Dr. Axel Techmer, Infineon Technologies


Classical Sound Processing for Speech Recognition

[Figure: block diagram of the classical front end. Recoverable stages and labels: microphone; filter; A/D (8 kHz); |FFT| (25 ms window every 10 ms, 100 frequencies, 40 Hz); Mel transformation (24 channels); LOG & threshold; smoothed cepstrum & loudness normalization (12 components); first and second derivatives (d/dt); 36 features; Hidden Markov Model. Frequency axis labels: 80 Hz, 160 Hz, 2 kHz, 4 kHz.]
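The front end in this diagram can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the recognizer's actual code: the Hamming window, the Mel formula, the triangular filter shape, the log floor, and the use of a truncated cosine transform as the "smoothed cepstrum" are assumptions, and the loudness normalization and the Hidden Markov Model are omitted. Only the sampling rate, window length, frame step, and the counts 100, 24, 12, and 36 come from the slide.

```python
import numpy as np

FS = 8000      # A/D sampling rate (slide)
WIN = 200      # 25 ms window at 8 kHz (slide)
HOP = 80       # one frame every 10 ms (slide)
N_MEL = 24     # Mel channels (slide)
N_CEP = 12     # cepstral components (slide)


def hz_to_mel(f):
    # A common Mel-scale formula (assumption; the slide does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(freqs, n_mel=N_MEL, f_max=FS / 2.0):
    """Triangular filters spaced evenly on the Mel scale (assumed shape)."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_mel + 2))
    bank = np.zeros((n_mel, len(freqs)))
    for i in range(n_mel):
        lo, mid, hi = edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def features(signal):
    """Return one 36-dimensional feature vector per 10 ms frame."""
    n_frames = 1 + (len(signal) - WIN) // HOP
    idx = np.arange(WIN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(WIN)
    # Magnitude spectrum; dropping the DC bin leaves 100 bins spaced 40 Hz apart.
    mag = np.abs(np.fft.rfft(frames, axis=1))[:, 1:]
    freqs = np.fft.rfftfreq(WIN, 1.0 / FS)[1:]
    mel = mag @ mel_filterbank(freqs).T
    log_mel = np.log(np.maximum(mel, 1e-6))          # LOG & threshold (floor is assumed)
    # Cosine transform of the log spectrum, truncated to 12 low-order components.
    k = np.arange(1, N_CEP + 1)[:, None]
    n = np.arange(N_MEL)[None, :]
    cep = log_mel @ np.cos(np.pi * k * (n + 0.5) / N_MEL).T
    d1 = np.gradient(cep, axis=0)                    # first derivatives
    d2 = np.gradient(d1, axis=0)                     # second derivatives
    return np.hstack([cep, d1, d2])                  # 12 + 12 + 12 = 36 features


t = np.arange(FS) / FS
feats = features(np.sin(2.0 * np.pi * 440.0 * t))   # one second of a 440 Hz tone
```

Each row of `feats` corresponds to one 10 ms frame and would be the observation vector handed to the recognizer.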

Speech production: time waveform


|FFT| resolves neither frequency nor temporal structure

20 ms window, |FFT|
frequency resolution: 50 Hz
temporal resolution: 20 ms
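The two numbers follow from the usual rule of thumb that a spectrum computed over a window of length T_w has a frequency spacing of 1/T_w, while events inside the window cannot be located in time. The symbol T_w is introduced here for illustration and does not appear on the slide.

```latex
\Delta f = \frac{1}{T_w} = \frac{1}{20\,\mathrm{ms}} = 50\,\mathrm{Hz},
\qquad
\Delta t \approx T_w = 20\,\mathrm{ms}
```

The same relation gives 1/(25 ms) = 40 Hz for the 25 ms window of the classical front end, which is consistent with its 100 frequencies up to 4 kHz.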


Classical Sound Processing for Speech Recognition

[Figure: the front-end block diagram repeated: microphone, filter, A/D (8 kHz), |FFT| (25 ms window every 10 ms), Mel transformation (24 channels), LOG & threshold, smoothed cepstrum (12 components), first and second derivatives, 36 features, Hidden Markov Model.]

The time structure of the speech signal (<20 ms) is lost in the magnitude spectrum (|FFT|).

Humans extract both temporal and spectral information for robust speech recognition.


Auditory Sound Processing

[Figure: processing chain. Stages shown: sound signal, ear canal, middle ear.]


Auditory Sound Processing

[Figure: processing chain. Stages shown: sound signal, ear canal, middle ear, inner ear hydrodynamics. Image label: 100 µm.]


Dynamic Compression in the Inner Ear

Inner ear model responses to 1 kHz tones

[Figure: axes: cochlear location (mm), 0 to 35, basal to apical; basilar membrane (BM) displacement (m), 10^-10 to 10^-6; level (dB SPL), 0 to 120 in steps of 20. Annotations: speech range, BW.]


Auditory Sound Processing

[Figure: processing chain. Stages shown: sound signal, ear canal, middle ear, inner ear hydrodynamics, sensory cell, synaptic mechanisms.]


Coding of Sound into Action Potentials

[Figure: axes: time (ms), 0 to 100; cochlear location (mm), 5 to 30, with the frequency axis marked low and high. Labels: F0, F1, F2, F3.]

Regular firing pattern (Δt = 10 ms, f0 = 100 Hz)
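The relation between the firing interval and the fundamental frequency is simply f0 = 1/Δt. The short sketch below recovers f0 from a list of spike times; the function name and the synthetic spike train are illustrative assumptions, not data from the slide.

```python
import numpy as np


def f0_from_spike_times(spike_times_ms):
    """Estimate the fundamental frequency (Hz) from spike times given in ms.

    Uses the median inter-spike interval, so a few missing or extra
    spikes do not dominate the estimate.
    """
    times = np.sort(np.asarray(spike_times_ms, dtype=float))
    intervals_ms = np.diff(times)
    return 1000.0 / np.median(intervals_ms)


# Synthetic regular firing pattern: one spike every 10 ms over 100 ms.
spikes_ms = np.arange(0.0, 100.0, 10.0)
f0_hz = f0_from_spike_times(spikes_ms)
print(f0_hz)  # 100.0
```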


Spectral and Temporal Sound Processing in the Auditory Pathway


Audio-Visual Speech Recognition


Tracking of lip motion with sub-pixel precision


“two - one - seven - three - five - nine - eight - zero - four - six”

Hidden-Markov speech recognizer

[Figure: variation of mouth width, mouth height, and nose-to-chin distance over time (s), 0 to 12. Scale bar: 10 pixels.]
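One simple way to hand such lip-tracking measurements to a recognizer alongside the acoustic features is to resample them to the audio frame rate and concatenate the two vectors per frame. The slides do not say how the two streams are combined, so the sketch below is only an assumed feature-level fusion; the function name, the 100 frames per second audio rate, and the 25 frames per second video rate are illustrative.

```python
import numpy as np


def fuse_features(audio_feats, visual_feats, audio_rate=100.0, video_rate=25.0):
    """Concatenate audio features with linearly interpolated visual features.

    audio_feats:  array of shape (n_audio_frames, n_audio_dims)
    visual_feats: array of shape (n_video_frames, n_visual_dims), for example
                  mouth width, mouth height, and nose-to-chin distance
    Frame rates are assumptions, not values from the slides.
    """
    audio_feats = np.asarray(audio_feats, dtype=float)
    visual_feats = np.asarray(visual_feats, dtype=float)
    t_audio = np.arange(audio_feats.shape[0]) / audio_rate
    t_video = np.arange(visual_feats.shape[0]) / video_rate
    # Interpolate each visual dimension onto the audio frame times.
    upsampled = np.column_stack(
        [np.interp(t_audio, t_video, visual_feats[:, j])
         for j in range(visual_feats.shape[1])]
    )
    return np.hstack([audio_feats, upsampled])


# Four seconds of placeholder data: 36 audio features and 3 visual measurements.
audio = np.zeros((400, 36))
visual = np.random.default_rng(0).normal(size=(100, 3))
fused = fuse_features(audio, visual)   # shape (400, 39)
```

Audio frames that fall after the last video frame reuse the final visual measurement, because `np.interp` holds the end values outside the sampled range.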

Multimodal: pointing, gaze, gestures, facial expressions, …

Dr. Axel Steinhage, Infineon Technologies AG


Neuro-IT Roadmap: Successful in the Physical World

Robust perception
Image processing
Speech recognition
Audio-visual speech recognition
Multimodal human-machine interaction
System integration
Scene analysis and representation


Man-Machine Interaction based on natural communication channels

Virtual Personal Assistant (VPA)
Natural channels: speech, lip motion, gestures ...
Cheap sensors (webcam, microphone)
Items presented by VPA
Interactive communication between user and VPA

Dr. Axel Steinhage, Infineon Technologies


Man-Machine Interaction based on natural communication channels

Virtual Personal Assistant (VPA)
Human expert via Advanced Videophone (HHI)
Natural channels: speech, lip motion, gestures ...
Cheap sensors (webcam, microphone)
Items presented by VPA
Interactive communication between user and VPA

Dr. Axel Steinhage, Infineon Technologies


What do we gain from Neuro-IT?

Sensitive sensors
World knowledge
“Constructed brain”
Robust processing
“Tools for Neuroscience”
“Successful in the Physical World”
“Conscious Machines”
Robust perception
Image processing
Speech recognition
Scene analysis and representation
Intelligent human-machine interaction
Natural feedback
Intelligent virtual person
Self-learning software
Massively parallel processing hardware
Digital and/or analog neuronal networks
“Factor 10”


Neuro-IT Roadmap: Successful in the Physical World

Prof. Dr. Dr. h.c. H.-P. Zenner
Prof. Dr. A.W. Gummer
Werner Hemmert, Infineon Technologies AG, CPR-ST
Prof. Dr. D.M. Freeman
Dr. M. Mermelstein, B. Tsai
U. Dürig, M. Despont, G. Genolet, U. Drechsler, P. Vettiger, G. Binnig
MIT Micromechanics Group
Prof. Dr. U. Ramacher
J.-P. de la Cruz-Guiterrez, M. Holmberg
Dr. A. Steinhage, Dr. A. Techmer

Explore the Future - Corporate Research