Dr. Werner Hemmert, CPR ST, 2003-12-02
Neuro-IT Roadmap: Successful in the Physical World
• Robust perception
• Image processing
• Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Automotive: Overtake-Checker and Door-Opener Assistant
[Figure: processing chain with panels a to f. Labels: image, contour extraction, motion estimation along contours, temporally stabilized motion segmentation, vehicle detection, lane-based transformation, lane, vehicles, temporal feedback.]
Dr. Axel Techmer, Infineon Technologies
Security: Face Detection & Recognition
Leading-edge approach to face detection (University of Bochum)
Detection of face regions (a)
Pre-selection of frontal faces (b)
Face recognition (c, d)
Elastic graph matching
Gabor wavelet transform
Ruhr University Bochum
Vision Instruction Processor (VIP)
Infineon Technologies, Corporate Research, Systems Technology
16 parallel processing elements
Prototype available since May 2001:
SIMD architecture
204 instructions
10 million logic transistors
On-chip memory: 37 KB
Technology: 0.35 µm
Clock: 100 MHz
Power consumption: 100 µW/MOPS
Die size: 22 mm x 23 mm
Peak performance: 53 GOPS
In 0.13 µm CMOS technology:
Clock: 200 MHz
Peak performance: 106 GOPS
Die size: 70 mm²
Power consumption: 700 mW
PCI board with VIP and camera submodules
Software tools for VIP: compiler, debugger, profiler
Software tools on host: MS Visual C++ with VPL++ library
Application demonstrators: Car Vision, face recognition, MPEG2, graphics
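The two power figures on this slide use different units (µW/MOPS for the prototype, mW for the 0.13 µm version). The short sketch below puts them on a common footing. It assumes that both power figures refer to operation at the quoted peak performance, which the slide does not state; the function names are illustrative only.

```python
def uw_per_mops(power_mw, peak_gops):
    """Convert total power (mW) at a given peak rate (GOPS) to µW per MOPS."""
    power_uw = power_mw * 1000.0      # mW -> µW
    peak_mops = peak_gops * 1000.0    # GOPS -> MOPS
    return power_uw / peak_mops


def watts_at_peak(efficiency_uw_per_mops, peak_gops):
    """Total power (W) implied by an efficiency figure at a given peak rate."""
    peak_mops = peak_gops * 1000.0
    return efficiency_uw_per_mops * peak_mops * 1e-6   # µW -> W


# 0.35 µm prototype: 100 µW/MOPS at 53 GOPS (assumed to hold at peak)
prototype_watts = watts_at_peak(100.0, 53.0)    # about 5.3 W

# 0.13 µm version: 700 mW at 106 GOPS (assumed to hold at peak)
shrink_efficiency = uw_per_mops(700.0, 106.0)   # about 6.6 µW/MOPS
```

Under that assumption the move to 0.13 µm CMOS would correspond to roughly a factor of 15 less energy per operation.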
Car Vision Components - Hardware
[Figure: hardware block diagram. Labels: other sensors, CPU, vehicle control.]
Dr. Axel Techmer, Infineon Technologies
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
• Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Classical Sound Processing for Speech Recognition
[Figure: block diagram of the classical front end. Stages as labeled: microphone, filter, A/D (8 kHz), |FFT| (25 ms window every 10 ms, 100 frequencies), Mel transformation (24 channels), LOG & threshold, smoothed cepstrum & loudness (12 components), normalized features with first and second derivatives d/dt (36 features), Hidden Markov Model. Frequency labels: 40 Hz, 80 Hz, 160 Hz, ..., 2 kHz, 4 kHz.]
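The front end on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind the slide. The numbers (8 kHz, 25 ms window, 10 ms step, 100 frequencies, 24 Mel channels, 12 cepstral components, 36 features) come from the slide; the Hamming window, the triangular filter shape, the Mel formula, the DCT used for cepstral smoothing, and the log floor are assumptions. The loudness term and the feature normalization are omitted.

```python
import numpy as np

FS = 8000      # A/D sampling rate from the slide (Hz)
WIN = 200      # 25 ms window at 8 kHz
HOP = 80       # one frame every 10 ms
N_MEL = 24     # Mel channels
N_CEP = 12     # cepstral components


def hz_to_mel(f):
    # Common Mel formula (assumption; the slide does not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(freqs, n_mel=N_MEL, f_max=FS / 2):
    """Triangular filters with Mel-spaced edges (assumed filter shape)."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_mel + 2))
    bank = np.zeros((n_mel, len(freqs)))
    for i in range(n_mel):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def classical_features(signal, floor=1e-6):
    """Return an (n_frames, 36) feature matrix; needs at least two frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - WIN) // HOP
    idx = np.arange(WIN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(WIN)

    # |FFT|: drop the DC bin to keep 100 frequencies, 40 Hz to 4 kHz.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))[:, 1:]
    freqs = np.fft.rfftfreq(WIN, d=1.0 / FS)[1:]

    # Mel transformation to 24 channels, then LOG with a threshold (floor).
    mel = spectrum @ mel_filterbank(freqs).T
    log_mel = np.log(np.maximum(mel, floor))

    # Smoothed cepstrum: first 12 DCT-II coefficients (without c0).
    k = np.arange(1, N_CEP + 1)[:, None]
    n = np.arange(N_MEL)[None, :] + 0.5
    dct = np.cos(np.pi * k * n / N_MEL)
    cepstrum = log_mel @ dct.T

    # First and second time derivatives: 12 + 12 + 12 = 36 features.
    d1 = np.gradient(cepstrum, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.hstack([cepstrum, d1, d2])


if __name__ == "__main__":
    t = np.arange(FS) / FS                     # one second of signal
    feats = classical_features(np.sin(2 * np.pi * 440 * t))
    print(feats.shape)                         # (98, 36)
```

Each row of the result is one 10 ms frame; these rows are what a Hidden Markov Model recognizer would consume.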
Speech production: time waveform
|FFT| resolves neither frequency nor temporal structure
20 ms window, |FFT|:
• frequency resolution: 50 Hz
• temporal resolution: 20 ms
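The two resolution figures are tied together by the window length: a window of duration T gives FFT bins spaced 1/T apart, so 20 ms yields 50 Hz. The sketch below checks this numerically. The 8 kHz sampling rate is borrowed from the classical front-end slide and is an assumption here; the function name is illustrative.

```python
import numpy as np


def window_resolution(window_s, fs_hz):
    """Return (frequency bin spacing in Hz, window duration in s)."""
    n = int(round(window_s * fs_hz))            # samples per window
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)   # FFT bin frequencies
    return float(freqs[1] - freqs[0]), n / fs_hz


df_20ms, dt_20ms = window_resolution(0.020, 8000)   # about 50 Hz, 0.020 s
df_25ms, dt_25ms = window_resolution(0.025, 8000)   # about 40 Hz, 0.025 s
```

Shortening the window improves the temporal resolution but widens the bins by the same factor, which is why a single |FFT| frame cannot resolve both.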
Classical Sound Processing for Speech Recognition
The time structure of the speech signal (< 20 ms) is lost in the magnitude spectrum (|FFT|).
Humans extract both temporal and spectral information for robust speech recognition.
Auditory Sound Processing
[Figure: stages shown: sound signal, ear canal, middle ear.]
Auditory Sound Processing
[Figure: stages shown: sound signal, ear canal, middle ear, inner ear hydrodynamics. Scale bar: 100 µm.]
Dynamic Compression in the Inner Ear
[Figure: inner ear model responses to 1 kHz tones. Horizontal axis: cochlear location (mm), 0 to 35, basal to apical. Vertical axis: BM displacement (m), 10^-10 to 10^-6. Curves for levels from 0 to 120 dB SPL in 20 dB steps. Annotations: speech range, BW.]
Auditory Sound Processing
[Figure: stages shown: sound signal, ear canal, middle ear, inner ear hydrodynamics, sensory cell, synaptic mechanisms.]
Coding of Sound into Action Potentials
[Figure: action potentials plotted over time (ms), 0 to 100, against cochlear location (mm), 5 to 30, from low to high frequency. Marked components: F0, F1, F2, F3.]
Regular firing pattern (Δt = 10 ms, f0 = 100 Hz)
Spectral and Temporal Sound Processing in the Auditory Pathway
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Audio-Visual Speech Recognition
Tracking of lip motion with sub-pixel precision
“two-one-seven-three-five-nine-eight-zero-four-six”
Hidden-Markov speech recognizer
[Figure: variation of mouth width, mouth height, and nose-to-chin distance over time (s), 0 to 12. Scale bar: 10 pixels.]
Multi-modal: pointing, gaze, gestures, facial expressions, …
Dr. Axel Steinhage, Infineon Technologies AG
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
Speech recognition
Audio-visual speech recognition
Multimodal human machine interaction
• System integration
• Scene analysis and representation
Man-Machine-Interaction based on natural communication channels
Virtual Personal Assistant (VPA)
Natural channels: speech, lip motion, gestures ...
Cheap sensors (webcam, microphone)
Items presented by VPA
Interactive communication between user and VPA
Dr. Axel Steinhage, Infineon Technologies
Human expert via Advanced Videophone (HHI)
What do we gain from Neuro-IT?
• Sensitive sensors
World knowledge
“Constructed brain”
Robust processing
“Tools for Neuroscience”
“Successful in the Physical World”
“Conscious Machines”
• Robust perception
• Image processing
• Speech recognition
• Scene analysis and representation
• Intelligent human-machine interaction
• Natural feedback
• Intelligent virtual person
• Self-learning software
• Massively parallel processing hardware
Digital and/or analog neuronal networks
“Factor 10”
Neuro-IT Roadmap: Successful in the Physical World
Prof. Dr. Dr. h.c. H.-P. Zenner
Prof. Dr. A.W. Gummer
Werner Hemmert, Infineon Technologies AG, CPR-ST
Prof. Dr. D.M. Freeman
Dr. M. Mermelstein, B. Tsai
U. Dürig, M. Despont, G. Genolet, U. Drechsler, P. Vettiger, G. Binnig
MIT Micromechanics Group
Prof. Dr. U. Ramacher
J.-P. de la Cruz-Guiterrez, M. Holmberg
Dr. A. Steinhage, Dr. A. Techmer
Explore the Future - Corporate Research