
Pro-Active decision support for data-intensive environments (ASTUTE)

Contact information

Project coordinator: Silvia Castellví, silvia.castellvi@atosresearch.eu
Administrative manager: Silvia Castellví, silvia.castellvi@atosresearch.eu

Project partners

1. Atos Spain SA (ATOS)
2. SIRRIS (SIRRIS)
3. STMicroelectronics S.r.l. (ST-Italy)
4. FluidHouse (FH)
5. Luciad (Luciad)
6. Fundación Tekniker (Tekniker)
7. Acondicionamiento Tarrasense (Leitat)
8. Equalid Solutions, S.L. (EQUALID)
9. IRIDA Labs (IRIDA)
10. Integrasys S.A. (INT)
11. Politecnico di Milano (PoliMi)
12. Tampere University of Technology Foundation (TUT)
13. IOS International (IOS)
14. Pininfarina S.P.A. (PF)
15. Akhela srl (Akhela)
16. Telecom Design S.r.l. (TD)
17. El Jardín de Junio (Jarjun)
18. Namahn (Namahn)
19. Prodatec Oy (PDT)
20. Smart Solutions Technologies S.L. (NUUBO)
21. Thales (Thales)
22. Ecole Nationale de Sciences Cognitives (ENSC)
23. THT Control (THT)

Collaborative Project

ARTEMIS JOINT UNDERTAKING
ASP8, Human-centric design of embedded systems
- Reference designs and architectures
- Seamless connectivity and middleware
- Design methods and tools

D2.2
Survey of state of the art enabling technologies and associated gap analysis w.r.t. ASTUTE objectives
M6

Due date of deliverable: 31-08-2011
Actual submission date: 14-09-2011

Start date of project: 01.03.2011
Duration: 36 months

Project co-funded by the European Commission within the Seventh Framework Programme (2002-2006)

Dissemination Level
PU  Public  X
PP  Restricted to other programme participants (including the Commission Services)
RE  Restricted to a group specified by the consortium (including the Commission Services)
CO  Confidential, only for members of the consortium (including the Commission Services)

Organisation name of lead contractor for this deliverable: TUT








Revision History

Revision  Date        Description                                                               Issued by
1.0       31/05/2011  Creation of a template with initial structure; initial structure in Wiki  TUT, PoliMi
1.01      20/08/2011  Working draft                                                             TUT, ENSC, Luciad, Sirris
1.02      22/08/2011  Working draft                                                             Sirris, JarJun, IOS
1.03      29/08/2011  Working draft                                                             PoliMi, IRIDA
1.04      29/08/2011  Working draft                                                             ST, PoliMi, IRIDA
1.05      06/09/2011  Pre-release 1                                                             PoliMi, TUT, ATOS, Tekniker
1.06      07/09/2011  Pre-release 2                                                             TUT
1.07      08/09/2011  Release, 1st milestone                                                    PoliMi







Table of Contents

EXECUTIVE SUMMARY
1 INTRODUCTION
  1.1 This Document
  1.2 Objective
  1.3 Intended/Main Audience
  1.4 Outline
2 INFORMATION RETRIEVAL AND FUSION
  2.1 Intelligent Context-aware Information Retrieval
    2.1.1 Situation Awareness
    2.1.2 Context retrieval from visual features
    2.1.3 Structure from Motion
  2.2 Intelligent User State Information Retrieval
    2.2.1 Affective User State
    2.2.2 User State Information Retrieval
    2.2.3 Affect Detection
  2.3 Multimodal Information Fusion
    2.3.1 Overview
    2.3.2 Multisensor Data Fusion
      2.3.2.1 Introduction
      2.3.2.2 Possible architectures
      2.3.2.3 Data Fusion process definition
      2.3.2.4 State of the art
      2.3.2.5 Dirty Secrets in Data Fusion
    2.3.3 Data retrieval and exchange
3 CONTEXT MODELLING AND PRO-ACTIVE DECISION SUPPORT
  3.1 Context Modelling
    3.1.1 Ontology-based models
      3.1.1.1 CONON
      3.1.1.2 SOUPA
      3.1.1.3 CoOL
      3.1.1.4 CoDaMoS
    3.1.2 Tracking Progress in Collaborative Environments
      3.1.2.1 Workflow Model Perspective
      3.1.2.2 Modelling Emergency Response Situations
    3.1.3 Probabilistic Models
    3.1.4 Probabilistic User State
  3.2 Reasoning Techniques
    3.2.1 Ontological Reasoners
      3.2.1.1 The DIG Interface
      3.2.1.2 Characteristics of a reasoner
      3.2.1.3 Reasoners
      3.2.1.4 DL logic reasoners
      3.2.1.5 Rule-based reasoners
    3.2.2 Probabilistic Reasoning
  3.3 Naturalistic Decision Making
4 INTERFACE DESIGN
  4.1 Multimodal Interface Design
    4.1.1 Development of multimodal interfaces
  4.2 Proactive Interfaces
  4.3 Adaptive User Interfaces
    4.3.1 Map Centric Data Fusion and Visualization (Luciad)
5 CONCLUSIONS
6 BIBLIOGRAPHY AND REFERENCES
  6.1 Normative references
  6.2 Documents and Books
7 ANNEXES










Figures

Figure 1. Different levels of information fusion coming from different audio-visual sources.
Figure 2. A moving object observed by both a pulsed radar and an infrared imaging sensor.
Figure 3. Direct fusion of sensor data.
Figure 4. Representation of sensor data via feature vectors, with subsequent fusion of the feature vectors.
Figure 5. Processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined.
Figure 6. Joint Directors of Laboratories process model for data fusion.
Figure 7. CONON hierarchy.
Figure 8. SOUPA ontology.
Figure 9. Tangible Business Process Models: Toolkit.
Figure 10. Example of part of a disaster plan expressed in rule-based language.
Figure 11. Example Workflow of an Emergency Plan.
Figure 12. WorkItem State Diagram.
Figure 13. ERMA process model.

Tables

Table 1. Set of technologies to be applied in ASTUTE.









List of Abbreviations

Term      Meaning
AAMI      Auto-Adaptive Multimedia Interfaces
AMEBICA   Auto Adaptive Multimedia Environment Based on Intelligent Agents
AMR       Adaptive Multi-Rate
ANEW      Affective Norm for English Words
API       Application Programming Interface
BA        Bundle Adjustment
BCI       Brain-Computer Interface
BPMN      Business Process Model and Notation
BPMS      Body Pressure Measurement System
CBIR      Context-Based Image Retrieval
CG        Conceptual Graph
DAML      DARPA Agent Markup Language
DFD       Data Flow Diagrams
DIG       Description Logic Implementation Group
DL        Description Logic
DoG       Difference of Gaussians
DRM       Digital Rights Management
DSA       Digital Signature Algorithm
DXF       Drawing Interchange Format
ECC       Elliptical Curve Cryptography
EDA       Electrodermal Activity
EEG       Electroencephalography
EKG       Electrocardiogram
EMG       Electromyogram
EOG       Electrooculography
EPSG      European Petroleum Survey Group
ESPRIT    European Strategic Program on Research in Information Technology
FACS      Facial Action Coding System
FAST      Features from Accelerated Segment Test
FLIR      Forward-Looking Infrared
GIF       Geographical Information Retrieval
GIS       Geographical Information System
GPS       Global Positioning System
HMI       Human Machine Interaction
HOG       Histogram of Oriented Gradients
HTML      HyperText Markup Language
IFF       Identification Friend or Foe
IR        Information Retrieval
KLT       Kanade-Lucas-Tomasi
LAI       Location Area Identity
LM        Levenberg-Marquardt
MHT       Multiple-Hypothesis Tracking
MVC       Model-View-Controller
NFC       Near Field Communication
NLP       Natural Language Processing
nRQL      new Racer Query Language
OGC       Open Geospatial Consortium
OIL       Ontology Inference Layer
OS        Operating System
OWL       Web Ontology Language
P2P       Peer to Peer
PKI       Public Key Infrastructure
pKLT      pyramidal Kanade-Lucas-Tomasi
RANSAC    RANdom SAmple Consensus
RAT       Radio Access Technology
RDF       Resource Description Framework
RFID      Radio-Frequency Identification
RGBA      Red Green Blue Alpha
RSA       Rivest, Shamir, Adleman (encryption algorithm)
RSSI      Received Signal Strength Indicator
SA        Situation Awareness
SfM       Structure from Motion
SIFT      Scale-invariant feature transform
SLAM      Simultaneous localization and mapping
SOA       Service-Oriented Architecture
SOUPA     Standard Ontology for Ubiquitous and Pervasive Applications
SURF      Speeded Up Robust Feature
SWRL      Semantic Web Rule Language
TBPM      Tangible Business Process Modelling
TDOA      Time Difference Of Arrival
TOA       Time Of Arrival
UI        User Interface
UML       Unified Modeling Language
UMTS      Universal Mobile Telecommunications System
U-OTDOA   Uplink Observed Time Difference Of Arrival
VHF       Very High Frequency
VOR       VHF Omnidirectional Range
VQ        Vector Quantization
WFS       Web Feature Service
WMS       Web Map Service
WS-BPEL   Web Services Business Process Execution Language
XML       Extensible Markup Language







Executive Summary

This document contains the first version of D2.2. It presents a collection of related work from the various domains of the partners' expertise. In subsequent versions, the content will be further elaborated along with other ongoing project activities, and discussed and refined in collaboration with all involved partners in order to produce a coherent survey of enabling technologies.








1 Introduction

1.1 This Document

This document provides a survey of state-of-the-art enabling technologies and an associated gap analysis with respect to the ASTUTE objectives. The survey scrutinizes algorithms, models, hardware and software technologies, services, content, and sensing technologies relevant to the ASTUTE objectives, in order to facilitate development in all R&D work packages of ASTUTE.

The current version of the document provides a collection of related work from the various domains of the partners' expertise. The content is organised in three main blocks: information retrieval and fusion; context modelling and pro-active decision support; and interface design. The next revisions of the document aim to add more value to the topic and to reorganise the heterogeneous knowledge of the consortium members into a smooth and coherent paper.


1.2 Objective

The survey addresses two main objectives:

1) To reveal the current state of the art in the fields related to the ASTUTE interests, and
2) To identify opportunities for further progress and innovative development aligned with the ASTUTE objectives.

In order to achieve the ASTUTE-specific objectives, the document targets:

- To present the key parameters of both user state and situational context which are essential for decision making
- To provide the ground for context modelling with a review of information retrieval and filtering, and of context modelling technologies, methods and tools
- To review the technologies enabling decision support.


1.3 Intended/Main Audience

The main audience of this document is all ASTUTE contributors, whether part of the consortium or not.

The document, in its final release, has the aspiration

1.4 Outline

The outline for the rest of the document is as follows:

- Chapter 2 is dedicated to information retrieval and fusion, with a focus on intelligent context-aware information retrieval, intelligent user state information retrieval, and multimodal information fusion.
- Chapter 3 surveys context modelling, reasoning techniques and naturalistic decision making.
- Chapter 4 is dedicated to interface design and focuses on three topics: multimodal interface design, proactive interfaces, and adaptive user interfaces.
- Conclusions are summarized in Chapter 5.
- The list of references is provided at the end of the document.








2 Information Retrieval and Fusion

2.1 Intelligent Context-aware Information Retrieval

The increasing flow of information from various sources challenges traditional Information Retrieval (IR) techniques and requires further investigation in IR. Recent publications on IR systems show the interest of researchers in context-aware IR, which is sensitive to the user's location, preferences and interests, the time, and the current state of the environment. According to [1], an IR system "is context-aware if it exploits context data in order to deliver relevant information to the user". Applications of context-aware IR systems are numerous: geographical information retrieval (GIF), which aims to deliver information to the user based on his/her current location [2]; mobile IR, which aims to provide content adaptation for mobile devices in addition to the GIF goals [3][4][5][6][7]; context-aware search engines [8][9][10][11]; etc.

Context-aware IR systems combine traditional IR techniques, such as HTML-aware tools, NLP-based tools and ontology-based tools [12], with the current context. For example, models based on ontologies can be augmented with the user's location and context for the refinement of queries in the information retrieval process [2][5][9].

According to the general framework proposed in [3] for context-aware IR systems, their main functions are context modelling and context retrieval.

The main objective of context modelling is to extend the original user query according to the current context. Context modelling focuses on reflecting the current user context, or on inferring it from external or internal sources, user profiles, and the current and past behaviour of the user. Different context models can be used to access the context information: tags or key-value models [4][8], keyword vectors or vector classes [6][13], graphs [11], ontologies [2][5][9][14], etc.
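The key-value flavour of context modelling can be sketched in a few lines. This is an illustrative toy, not an implementation from the cited systems; the function name, the context keys and the choice of which keys are query-relevant are all hypothetical.

```python
# Toy key-value context model used to expand a user query.
# The keys considered "relevant" are a hypothetical choice.

def expand_query(query_terms, context, relevant_keys=("location", "activity")):
    """Append selected context values to the original query terms."""
    expanded = list(query_terms)
    for key in relevant_keys:
        value = context.get(key)
        if value is not None and value not in expanded:
            expanded.append(value)
    return expanded

context = {"location": "warehouse", "activity": "inspection", "time": "14:05"}
print(expand_query(["fire", "exit"], context))
# → ['fire', 'exit', 'warehouse', 'inspection']
```

Richer models (keyword vectors, graphs, ontologies) replace the flat dictionary with a structure that supports inference, but the role in the pipeline is the same: turning the raw query into a context-enriched one.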
.

Context retrieval aims to deliver the right information to the user by exploiting different query refinement techniques involving query reformulation, data ranking and semantic queries [5][6][7][9][11][13][14].
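The data-ranking step can likewise be illustrated with a toy scorer that boosts documents matching the current context. All names and the weighting scheme are hypothetical, chosen only to show the idea of context-aware ranking.

```python
# Toy context-aware ranking: documents are scored by term overlap with the
# query, with an extra (hypothetical) boost for terms in the user's context.

def score(doc_terms, query_terms, context_terms, context_weight=0.5):
    doc = set(doc_terms)
    base = sum(1.0 for t in query_terms if t in doc)
    boost = sum(context_weight for t in context_terms if t in doc)
    return base + boost

def rank(docs, query_terms, context_terms):
    """Return document ids ordered by descending context-aware score."""
    return sorted(docs, key=lambda d: score(docs[d], query_terms, context_terms),
                  reverse=True)

docs = {
    "d1": ["exit", "map", "office"],
    "d2": ["exit", "map", "warehouse"],   # also matches the context
}
print(rank(docs, ["exit", "map"], ["warehouse"]))  # → ['d2', 'd1']
```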

In conclusion, context-aware IR is a novel field of research with various approaches under consideration, and the approaches for different applications need further adjustment [15].

Situation Awareness (SA) plays a predominant role in the decision making of a user while operating a technical system [16]. In order to support the SA and goal-oriented behaviour of a user, an Information Retrieval (IR) system should provide the relevant information in a context-aware manner. The next subsection provides the definition of SA and its role in user behaviour.







2.1.1 Situation Awareness

One of the aspects of intelligent context-aware information retrieval concerns SA and its role in dynamic human decision making. As technology progresses, users must deal with dynamic systems that increase the complexity of effective and timely decision making. The operator's situation awareness is presented as a predominant concern in this decision-making and performance process when operating such systems. To construct a theoretical model of SA, Endsley (1995) explores the relationship between situation awareness and numerous individual and environmental variables, such as attention and working memory, as critical factors limiting operators from acquiring and interpreting information from the environment to form situation awareness. Mental models and goal-directed behaviour are hypothesized as important mechanisms for overcoming these limits [16]. For example, in the aircraft domain, pilots are highly dependent on an up-to-date assessment of the changing situation (operational parameters, external conditions, navigational information, etc.). Endsley's SA model describes and synthesizes the different cognitive resources and mechanisms that might be involved in constructing and maintaining SA [16], such as dynamic goal selection, attention to appropriate critical cues, expectancies regarding future states of the situation, and ties between SA and typical actions [17].

A second model, proposed by Baumann & Krems (2009) and based on the Construction-Integration theory [18], explores the situation-model processes. For safe driving, it is necessary for drivers to perceive, identify and correctly interpret the current traffic situation, to be able to anticipate its future development, and to adapt their driving behaviour to the situation [19].

As a conclusion, we use a paper by Wickens (2008) that summarizes two articles by Endsley on situation awareness and presents the influence of the concept on subsequent practice and theory of human factors [20]. Situation awareness is a viable and important construct that still carries some controversy over measurement issues. SA can be applied to the areas of training (information seeking or teaching predictive skills [21][22]), error analysis (attentional training [22][20]), design (display features to support SA [23]), prediction [22], teamwork (team dynamics and interworker communications [24]), and automation (harmony and workload [16]).


2.1.2 Context retrieval from visual features

An important source of context information is the visual appearance of the surroundings of a place. For example, in an emergency situation, a worker with a camera-enabled device can orient herself in an unknown place, or carry out complex procedures with the aid of Augmented Reality (AR). A camera mounted on a car can warn the driver about potentially dangerous situations (e.g. a pedestrian on the road) or, again using AR, show the current itinerary and points of interest.

Methods from computer vision help to retrieve two important types of useful contextual information: the camera location in a known environment, and the type and position of objects of interest in the surroundings.

The camera location can be recovered from a set of overlapping images of the surroundings using methods from Structure from Motion (SfM) [25] (see also Section 2.1.3), and a sufficiently large number of images also allows reconstructing the 3D appearance of the surroundings [26]. If at least some of the images have GPS coordinates, all the images can be geographically localized with accuracy; for a recent example see [27]. Such a method, applied to mobile or handheld devices, can supplant or complement GPS and provide useful information not otherwise available, e.g. what is actually visible from the current location. One early attempt in this direction is the work of [28], while a more ambitious approach is presented by [29], where the localization is extended to the whole earth using images from photo-sharing services.

The camera localization part is usually done via camera calibration, which in most cases requires as a first step the detection of feature descriptors invariant with respect to the point of view of the camera. Many such descriptors currently exist; for an extensive overview see [30]. The feature descriptors can also be used to implement Augmented Reality (AR) without markers, again using SfM [31]. For storage efficiency, the descriptors can be used in place of whole images when matching an acquired image against a known set of images, and a standard for the descriptors is under development in the MPEG Group [l].

Object recognition is a much harder problem than camera calibration, especially in the realistic scenario of thousands of potential categories of objects [32]. To model the scene, the object recognition process can use the same type of descriptors used for camera calibration, for example in the so-called bag-of-words models [33], or use other approaches [34]. In particular cases, specific techniques are used, e.g. Histograms of Oriented Gradients (HOG) for pedestrian detection [35] or boosted classifiers for face detection [36].
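The bag-of-visual-words idea mentioned above can be sketched minimally: each local descriptor is quantized to its nearest codeword in a learned "visual vocabulary", and the image is represented by the histogram of codeword counts. The 2-D toy vectors below are illustrative stand-ins; real systems quantize e.g. 128-D SIFT descriptors against vocabularies of thousands of codewords.

```python
# Minimal bag-of-visual-words sketch with toy 2-D "descriptors".

def nearest_codeword(descriptor, vocabulary):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sqdist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Histogram of codeword occurrences, one bin per vocabulary entry."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_codeword(d, vocabulary)] += 1
    return hist

vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]            # 3 codewords
descriptors = [(0.1, 0.1), (0.9, 0.1), (0.2, 0.0), (0.1, 0.8)]
print(bow_histogram(descriptors, vocabulary))  # → [2, 1, 1]
```

The resulting histograms can then be compared with standard text-retrieval machinery, which is what makes the "words" analogy of [33] work.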

2.1.3

Structure from Motion

Structure from Motion (SfM) is a computer v
ision technique directed to the estimation of camera
ego
-
motion and surrounding 3
D

shape environment, by anal
ysing a calibrated video.

The input video can be acquired with a mono or a stereo image camera, but in each case the
camera must be calibrated. The

calibration task is a standard off
-
line process, which achieves to
estimate intrinsic camera parameters and the reciprocal position of the cameras in the case of
stereo configuration. For intrinsic parameters here we mean focal length, the camera's centre

and
a few distortion coefficients introduced by the camera's lens
[25]
. A few open libraries are
available for camera calibration, like
OpenCV

[37]
, the
Bouguet toolbox

[38]

and
tclcalib

[39]
.

A reference SfM pipeline is divided in an image
analyser

stage, a cam
era reconstruction stage, a
triangulation stage and finally a refinement stage.

The image
analyser

stage generates a sparse optical motion flow combing
the
detection
and the

tracking
of interest points over time. Well
-
known corner detector are Harris
[40
]
, Shi
-
Tomasi
[41]







and FAST [42][43], whereas to track the points the usual solutions are the KLT [44] and the PKLT [45] algorithms.
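The tracking idea can be illustrated with a toy patch tracker (pure Python, synthetic data). Real KLT uses a gradient-based update; this sketch instead does an exhaustive sum-of-squared-differences search for the displacement that best preserves a small patch between frames:

```python
def ssd(a, b):
    """Sum of squared differences between two equal-sized patches."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def patch(img, r, c, size):
    """Extract a size x size patch whose top-left corner is (r, c)."""
    return [row[c:c + size] for row in img[r:r + size]]

def track(img0, img1, r, c, size=3, radius=2):
    """Find the displacement (dr, dc) minimizing the SSD around (r, c)."""
    template = patch(img0, r, c, size)
    best = None
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr <= len(img1) - size and 0 <= cc <= len(img1[0]) - size:
                cost = ssd(template, patch(img1, rr, cc, size))
                if best is None or cost < best[0]:
                    best = (cost, dr, dc)
    return best[1], best[2]

# synthetic frames: a bright blob that moves one pixel to the right
img0 = [[0] * 8 for _ in range(8)]; img0[3][3] = 9
img1 = [[0] * 8 for _ in range(8)]; img1[3][4] = 9
```

Here `track(img0, img1, 2, 2)` recovers the displacement (0, 1); gradient-based KLT reaches the same answer without the exhaustive search.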

The camera estimation process is based on epipolar geometry, which represents the mutual camera positions through an essential or fundamental matrix. The family of algorithms used to compute this type of matrix are called N-points, where N is the number of input points; the most well-known configurations are 7 and 8 [25]. In order to obtain a robust camera estimation, an N-points algorithm is not enough and must be integrated with a RANSAC [46] algorithm, which randomly selects sets of input corners, outputs a large set of hypotheses for the camera position and scores them using the re-projection error in order to select the best one.
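The hypothesize-and-score loop of RANSAC does not depend on the model being fitted. As a hedged stand-in for fundamental-matrix estimation, the sketch below applies the same loop to 2D line fitting: draw a minimal sample, build a hypothesis, score it by counting points within an error threshold, and keep the best hypothesis:

```python
import random

def ransac_line(points, iters=200, thresh=0.5, seed=0):
    """Fit y = m*x + b to points containing outliers via RANSAC (illustrative)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        # 1. hypothesize from a minimal sample (two points define a line)
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. score the hypothesis by counting inliers
        #    (the analogue of the re-projection error check)
        inliers = sum(1 for x, y in points if abs(y - (m * x + b)) < thresh)
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -15)]  # line + outliers
model, n_in = ransac_line(pts)
```

In the SfM case the minimal sample is 7 or 8 correspondences, the hypothesis is a fundamental matrix, and the score is the re-projection error.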

By knowing the camera ego-motion, it is possible to compute a sparse 3D map by estimating the spatial position of the tracked features. This process is named triangulation and the main implementation is in [47].

Finally, a refinement stage is needed to improve quality using an algorithm like LM bundle adjustment (BA) [25], which is a technique for simultaneously refining the 3D structure and the camera parameters (i.e. camera pose and possibly intrinsic calibration parameters), to obtain a reconstruction which is optimal under certain assumptions about the noise affecting the interest point detection.

BA amounts to minimizing the re-projection error between the observed and predicted image points. Since the prediction of the image points involves an image projection, one must in general use non-linear least squares algorithms, of which the Levenberg-Marquardt (LM) has proven to be the most successful, due to its damping strategy that allows it to converge from a wide range of initial guesses. By iteratively linearizing the function to be minimized in the neighbourhood of the current estimate, the LM algorithm computes the solution of linear systems known as normal equations.
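The LM damping strategy can be sketched on a toy one-parameter least-squares problem (illustrative, far from a full bundle adjustment): each iteration solves a damped normal equation, accepts the step and relaxes the damping if the cost drops, and otherwise increases the damping:

```python
def levenberg_marquardt(residual, jacobian, theta, iters=50, lam=1e-3):
    """Minimize sum(residual(theta)^2) for a scalar parameter theta.
    Illustrates the damped normal equation (J^T J + lam) * step = -J^T r."""
    def cost(t):
        return sum(r * r for r in residual(t))
    c = cost(theta)
    for _ in range(iters):
        r = residual(theta)
        J = jacobian(theta)
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        step = -Jtr / (JtJ + lam)       # solve the damped normal equation
        c_new = cost(theta + step)
        if c_new < c:                    # successful step: accept, relax damping
            theta, c, lam = theta + step, c_new, lam / 10
        else:                            # failed step: reject, increase damping
            lam *= 10
    return theta

# toy problem: fit y = theta * x to points lying on y = 3x
xs, ys = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]
res = lambda t: [t * x - y for x, y in zip(xs, ys)]
jac = lambda t: xs                       # d(residual_i)/d(theta) = x_i
theta_hat = levenberg_marquardt(res, jac, theta=0.0)
```

A real BA does the same with thousands of parameters (camera poses and 3D points), exploiting the sparsity of the normal equations.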

In the automotive context many publications propose different pipeline combinations to estimate car motion using SfM: some of them use a monocular system, like [48][49], while others use a stereo system [50].

A further improvement is the introduction of invariant features instead of using the combination of simpler corner detectors and trackers. This family of algorithms describes the region around a key-point in order to preserve some invariance to rigid transforms and light changes, and is able to identify the position of the same key-points in a new image simply by comparing two descriptors. The matching advantage is a drastic relaxation of the PKLT assumptions regarding brightness constancy, temporal persistence and spatial coherence between the images involved.
Moreover, descriptors can be used to recognize the objects of a scene in a visual database and, for example, retrieve useful information about the environment. The main state-of-the-art algorithms are SIFT [51] and SURF [52], which approximate the interest point detection, based on Difference






of Gaussians, using Fast Hessian and Integral Image in order to reduce the computational cost at the expense of quality.
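With descriptors in hand, matching reduces to comparing vectors. Below is a minimal sketch (pure Python, toy 4-D descriptors, not real SIFT output) of nearest-neighbour matching with the distance-ratio check commonly used with SIFT-style descriptors:

```python
import math

def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(desc1, desc2, ratio=0.8):
    """For each descriptor in desc1, find its nearest neighbour in desc2.
    Keep the match only if it is clearly better than the second best."""
    matches = []
    for i, d1 in enumerate(desc1):
        ranked = sorted(range(len(desc2)), key=lambda j: dist(d1, desc2[j]))
        best, second = ranked[0], ranked[1]
        if dist(d1, desc2[best]) < ratio * dist(d1, desc2[second]):
            matches.append((i, best))
    return matches

# toy descriptors: the first two have clear counterparts, the third is ambiguous
d1 = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.5, 0.55, 0.0, 0.0)]
d2 = [(0.0, 1.0, 0.1, 0.0), (1.0, 0.1, 0.0, 0.0)]
```

Here `match(d1, d2)` returns `[(0, 1), (1, 0)]`: the ambiguous third descriptor is rejected by the ratio test, which is what suppresses false matches in practice.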

An example of invariant features used to estimate camera motion is [53], where SIFT is mixed with SLAM.

For storage efficiency, the descriptors can be further compressed and used in place of whole images and, as said above, a standard for those descriptors is under development in the MPEG Group [l][m]. A novel algorithm that seems promising in that respect is CHoG [54] which, starting from DoG or Fast Hessian, discretizes the oriented patch in two steps: the first one, named DAISY, divides the region into circular overlapped areas, and the second one performs a gradient quantization via vector quantization (VQ). Finally the descriptor is compressed again using Type coding or Huffman coding, obtaining a descriptor that ranges from 44 to 100 bits instead of the 64-128 bytes of SIFT [51].

2.2 Intelligent User State Information Retrieval

2.2.1 Affective User State

An intelligent user assistance system (e.g. a human-machine interface, HMI) is one that aims to improve the user's performance and minimize human errors during the main task. Moreover, and more importantly, this system works to mitigate the impact of negative user states, for instance fatigue, stress, confusion, boredom, or anxiety. An HMI is, therefore, emotion-aware if it is both able to recognize emotions and to intelligently or appropriately express them and act on them accordingly by providing user assistance. In designing this sort of system, affective computing [55] should be a key conceptual framework, since it allows computing the relationships between human emotions and task performance. This technology integrates different modalities of input information from the user to detect and recognize the current affective state: for instance, vocal emotion communication [56], affective facial expression and gestures [57], and affective psychophysiology [58][59].

2.2.2 User State Information Retrieval

Affective computing focuses on studying and developing systems and devices capable of recognizing, interpreting, processing and simulating human affects (see Section 2.2.1). It is an interdisciplinary field with contributions from computer science, psychology, linguistics, cognitive and affective sciences, neuroscience, and related disciplines. The paper "Affective Computing" [60] is considered to be the modern origin of the field.

A recent review of the state of the art in affect detection is that by Calvo and D'Mello [61]. The number of sensor modalities or channels investigated for the detection of affective aspects has increased since the field's inception, as well as the number of techniques and methods employed,







mostly from the fields of machine learning and psychology. However, it seems reasonable that multimodal systems can provide advantages, since emotional events typically activate multiple user responses. Nonetheless, multimodal affect detection is more challenging and very few systems have explored it.

Support for cognitive-affective systems development is typically found via bibliographic research or experimental and field work (see, for example, [62]), with the objective of gathering annotated data and knowledge on useful features and relations among user states, context, etc.

2.2.3 Affect Detection

The affective states in people are inherently multimodal. In this section the different channels through which an emotion can be detected are discussed, while in Section 2.3 we will discuss more thoroughly how different modalities can be integrated.

Researchers have concentrated mainly on detection through facial expressions, voice, posture, physiology and textual content [61].

In the study of facial expressions, the goal is typically to identify basic expressions linked to human emotions. A frequently used dictionary for expressions and their link with emotions is the Facial Action Coding System (FACS), developed by Ekman and Friesen [63]. Most methods in the field require a pre-segmented sequence of expressions, few have real-time performance and almost none uses contextual cues to help the recognition phase [64]; further progress is thus needed to enable real-world applications.

Emotion recognition through voice typically uses prosody (rhythm and tone) to recognize the user's emotional state. These methods often suffer from lower accuracy w.r.t. methods based on facial expressions, but on the other hand most of them can work in real time and in realistic settings [64].

Posture and physiology, differently from voice and facial expressions, measure variables affected by unconscious reactions, so they can overcome social editing, i.e., the intentional adjustment of one's expressed emotion. An interesting example of posture detection is the work of Mota et al. [65] on the Body Pressure Measurement System (BPMS), a pressure pad that detects the posture on a chair. Physiological sensors offer an array of signals that ranges from Electromyograms (EMG) on the muscles, to detection of Electrodermal Activity (EDA), to Electrocardiograms (EKG) and Electrooculograms (EOG) [61].

Lastly, emotion can be inferred from the tone and choice of words of a written text or a transcript. The emotional characterization can be at the level of single words, as in the pioneering work of Osgood et al. [66], or based on lexical analysis of corpora of texts. Several projects are attempting to create emotional ratings for common words [61], among which we cite the Affective Norms for English Words (ANEW) [67].
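An ANEW-style rating lexicon supports a very simple word-level affect estimate. The sketch below is pure Python with illustrative valence numbers, not the published ANEW ratings:

```python
# Hypothetical ANEW-style lexicon: valence ratings on a 1-9 scale
# (these numbers are illustrative, not the published ANEW values).
VALENCE = {"happy": 8.2, "love": 8.7, "calm": 6.9,
           "angry": 2.5, "afraid": 2.0, "bored": 3.2}

def text_valence(text, default=5.0):
    """Average the valence of the rated words in a text; unrated words
    are skipped. Returns `default` (neutral) if no rated word appears."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    rated = [VALENCE[w] for w in words if w in VALENCE]
    return sum(rated) / len(rated) if rated else default

pos = text_valence("I am happy and calm")      # above the neutral midpoint
neg = text_valence("I am angry and afraid!")   # below the neutral midpoint
```

Real systems go well beyond this bag-of-words averaging, but the lexicon lookup is the common core.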







2.3 Multimodal Information Fusion

It is obvious that a successful HMI system should rely on multiple modalities, for instance audio, visual, even haptic modalities, and not only on sensory information but also on emotional (e.g. stress, frustration) information and physiological states (e.g. interest, engagement). Traditionally, multimodal information can be processed at different levels. The lowest level is at the input, i.e. raw data from sensors, for instance pattern recognition (audio-visual speech processing) or multi-biometric information (EEG, facial expression, EOG) from the user state. The highest level takes place at the output of the system, for instance decision-level processing [68]. Different approaches have been developed to build multimodal decision making: majority voting [69], weighted majority voting [70], Bayesian decision fusion [71], and behaviour knowledge space [72]. Between the lowest and the highest levels a number of intermediate levels take place, for instance focusing on feature fusion (algorithms for feature extraction and classification into an appropriate feature) [73]. A relatively recent approach to multimodal fusion is the so-called adaptive fusion [74]. The main idea of this approach is to measure the signal quality of each input modality and then use this information at the fusion level. Li & Ji developed a probabilistic method to make decisions on assistance depending on the utility of such assistance [75].
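The decision-level combiners cited above can be sketched in a few lines (pure Python; labels and weights are illustrative). Majority voting counts the classifier outputs, while weighted voting scales each vote; deriving the weights from a per-modality signal-quality estimate corresponds to the adaptive-fusion idea:

```python
from collections import Counter

def majority_vote(decisions):
    """Plain majority voting over per-modality classifier outputs."""
    return Counter(decisions).most_common(1)[0][0]

def weighted_vote(decisions, weights):
    """Weighted majority voting: each modality's vote counts as its weight.
    With weights derived from signal quality, this mirrors adaptive fusion."""
    scores = {}
    for label, w in zip(decisions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# three modalities disagree on the user's state
votes = ["stress", "calm", "stress"]
assert majority_vote(votes) == "stress"

# adaptive idea: the audio channel is noisy, so its quality weight is low
quality = [0.2, 0.9, 0.3]   # audio, video, physiology (illustrative)
fused = weighted_vote(votes, quality)   # stress = 0.5, calm = 0.9
```

With the quality weights, the single reliable modality outvotes the two noisy ones, which is exactly the behaviour adaptive fusion is after.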


Fusion techniques are often employed for merging information from sensors [61]. Data fusion works at the lowest level, merging raw data streams that, therefore, are synchronous and have the same temporal resolution. Feature fusion works at the level of characteristics extracted from signals. Finally, decision fusion works at the highest level and consists in applying techniques for merging the outputs of expert classifiers. Model-based fusion, based on existing knowledge and methods, although largely unexplored, has also been advocated [64].


2.3.1 Overview

Fusion of distinct modalities is one of the features that distinguish multimodal interfaces from unimodal ones. The challenge is to increase the robustness of an analysis system by combining meaningful information from different modalities. Fusion can be executed at three levels: (a) data-based fusion, which can be implemented when dealing with multiple signals coming from similar modality sources; (b) feature-based fusion, which can be used on the basis of combining commonly extracted features; and (c) decision-based fusion, which tries to integrate multiple decisions from different sources into a single one. Three different types of architecture can in turn manage decision-level fusion: frame-based architectures, unification-based architectures and hybrid symbolic/statistical fusion architectures.

In the case of context-aware data, the multimodal nature creates an essential need for information fusion for its analysis, indexing and retrieval. Fusion also greatly impacts other tasks like object recognition, since all objects exist in multimodal spaces. Besides the more classical data






fusion approaches in robotics, image processing and pattern recognition [76], the information retrieval community discovered some years ago its power in combining multiple information sources [77][78]. To enhance human-computer communication, multimodal interaction has developed vastly in the last few years.


Figure 1. Different levels of information fusion coming from different audio-visual sources.



ASTUTE intends to work on the development of novel methods of multi
-
modal, multi
-
level fusion
that integrates contextual information obtained from spoken input and visua
l scene analysis. By
taking into account the user’s context with the emotional and psychological states, the system will
interact naturally with humans in order to interpret human behaviour.

Nowadays, it is possible to design a specialized system with some of the functionality needed for effective human-machine interaction (HMI). This became possible by extracting information arriving simultaneously from different communication modalities and combining them into one or






more unified and coherent representati
ons of the user’s intention. The context
-
aware, user
-
centred application should accept spontaneous multi
-
modal input speech, gestures (pointing,
iconic, possibly metaphoric) and physical actions; it should react to events, identify the user’s
preferences,
recognize intentions and emotions, and possibly predict the user’s behaviour and
generate the system’s own response. Next
-
generation HMI designs need to include the essence of
emotional intelligence: specifically, the ability to recognize a user's affectiv
e states


in order to
become more efficient, more effective, and more human
-
like. Affective arousal modulates all
nonverbal communicative cues (facial expressions, body movements, and vocal and physiological
reactions)
[79]
.

In the practice of system design, the following points are considered: sensors or sources of information; selection of the most relevant features of the signals; fusion level, fusion strategy and fusion architecture; and whether further background or domain knowledge can be embedded. In order to capture the information, one uses different types of sensors, i.e., microphones to capture the audio signal, cameras to capture live video images, and 3D sensors to directly capture the surface information in real time. Apart from audio and visual information, as mentioned before, humans also rely on the haptic modality, smell and taste. From this basic sensory information, higher cues such as 3D and temporal information, as well as emotional (e.g., stress, frustration) and psychological states (e.g., interest), can also be derived [80]. The fusion of information from heterogeneous sensors is crucial to the effectiveness of a multimodal system. Exploiting the dependencies among features and modalities will yield maximal performance [81]. Fusing the multimodal data results in a large increase in recognition rates in comparison with unimodal systems [80].


2.3.2 Multisensor Data Fusion

2.3.2.1 Introduction

The concept of multisensor data fusion is hardly new. As humans and animals evolved, they developed the ability to use multiple senses to help them survive. For example, assessing the quality of an edible substance may not be possible using only the sense of vision; the combination of sight, touch, smell, and taste is far more effective. Similarly, when vision is limited by structures and vegetation, the sense of hearing can provide advanced warning of impending dangers. Thus, multisensory data fusion is naturally performed by animals and humans to assess more accurately the surrounding environment and to identify threats, thereby improving their chances of survival. Interestingly, recent applications of data fusion [82] have combined data from an artificial nose and an artificial tongue using neural networks and fuzzy logic.

Although the concept of data fusion is not new, the emergence of new sensors, advanced processing techniques, improved processing hardware, and wideband communications has made






real-time fusion of data increasingly viable. Just as the advent of symbolic processing computers (e.g., the Symbolics computer and the Lambda machine) in the early 1970s provided an impetus to artificial intelligence, the recent advances in computing and sensing have provided the capability to emulate, in hardware and software, the natural data fusion capabilities of humans and animals. Currently, data fusion systems are used extensively for target tracking, automated identification of targets, and limited automated reasoning applications. Data fusion technology has rapidly advanced from a loose collection of related techniques to an emerging true engineering discipline with a standardized terminology, a collection of robust mathematical techniques, and an established system of design principles.

Fused data from multiple sensors provide several advantages over data from a single sensor. First, if several identical sensors are used (e.g., identical radars tracking a moving object), combining the observations results in an improved estimate of the target position and velocity. A statistical advantage is gained by adding the N independent observations (e.g., the estimate of the target location or velocity is improved by a factor proportional to N^(1/2)), assuming the data are combined in an optimal manner. The same result could also be obtained by combining N observations from an individual sensor.
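The N^(1/2) advantage can be checked numerically: averaging N independent noisy observations shrinks the standard deviation of the fused estimate by roughly the square root of N (pure-Python simulation with a fixed seed; the sensor model is illustrative):

```python
import random
import statistics

def fused_error(n_sensors, true_value=10.0, noise=1.0, trials=2000, seed=1):
    """Standard deviation of the fused (averaged) estimate over many trials,
    for n_sensors identical sensors with Gaussian measurement noise."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        obs = [true_value + rng.gauss(0, noise) for _ in range(n_sensors)]
        estimates.append(sum(obs) / n_sensors)   # optimal combination here
    return statistics.stdev(estimates)

s1 = fused_error(1)    # roughly the raw sensor noise, about 1.0
s4 = fused_error(4)    # roughly 1.0 / sqrt(4) = 0.5
```

The ratio s1/s4 comes out close to 2, matching the N^(1/2) factor for N = 4.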

The second advantage is that the observation process can be improved by using the relative placement or motion of multiple sensors. For example, two sensors that measure angular directions to an object can be coordinated to determine the position of the object by triangulation. This technique is used in surveying and for commercial navigation (e.g., VHF omnidirectional range [VOR]). Similarly, two sensors, one moving in a known way with respect to the other, can be used to measure an object's position and velocity instantaneously with respect to the observing sensors.
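Triangulation from two angle-only sensors, as in the VOR example, is a small geometry exercise. A sketch with assumed sensor positions and a target at (5, 5):

```python
import math

def triangulate(p1, bearing1, p2, bearing2):
    """Locate a target from two sensors that each measure only the angle
    (bearing, in radians from the +x axis) to the target.
    Computes the intersection of the two bearing rays."""
    x1, y1 = p1
    x2, y2 = p2
    c1, s1 = math.cos(bearing1), math.sin(bearing1)
    c2, s2 = math.cos(bearing2), math.sin(bearing2)
    # ray i: (x, y) = (xi + t * ci, yi + t * si); solve for t on ray 1
    denom = c1 * s2 - s1 * c2            # zero if the bearings are parallel
    t = ((x2 - x1) * s2 - (y2 - y1) * c2) / denom
    return x1 + t * c1, y1 + t * s1

# two sensors on the x axis both sight a target at (5, 5)
b1 = math.atan2(5 - 0, 5 - 0)            # bearing from sensor at (0, 0)
b2 = math.atan2(5 - 0, 5 - 10)           # bearing from sensor at (10, 0)
x, y = triangulate((0, 0), b1, (10, 0), b2)
```

Each sensor alone fixes only a direction; the pair pins down the position, which is the point of the second advantage.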

The third advantage gained using multiple sensors is improved observability. Broadening the baseline of physical observables can result in significant improvements.








Figure 2. A moving object observed by both a pulsed radar and an infrared imaging sensor.

Figure 2 provides a simple example of a moving object, such as an aircraft, that is observed by both a pulsed radar and a forward-looking infrared (FLIR) imaging sensor. The radar can accurately determine the aircraft's range but has a limited ability to determine the angular direction of the aircraft. By contrast, the infrared imaging sensor can accurately determine the aircraft's angular direction but cannot measure the range. If these two observations are correctly associated (as shown in Figure 2), the combination of the two sensors provides a better determination of location than could be obtained by either of the two independent sensors. This results in a reduced error region, as shown in the fused or combined location estimate. A similar effect may be obtained by determining the identity of an object on the basis of the observations of an object's attributes.

2.3.2.2 Possible architectures

Three basic alternatives can be used for multisensor data:

1. direct fusion of sensor data (Figure 3);

2. representation of sensor data via feature vectors, with subsequent fusion of the feature vectors (Figure 4);

3. processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined (Figure 5).








Figure 3. Direct fusion of sensor data.

Figure 4. Representation of sensor data via feature vectors, with subsequent fusion of the feature vectors.








Figure 5. Processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined.

If the multisensor data are commensurate (i.e. the sensors are measuring the same physical phenomena, such as two visual image sensors or two acoustic sensors), then the raw sensor data can be directly combined. Techniques for raw data fusion typically involve classic estimation methods such as Kalman filtering [82]. Conversely, if the sensor data are non-commensurate, then the data must be fused at the feature/state vector level or decision level.
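Kalman filtering of commensurate raw data can be sketched in one dimension (pure Python; the noise values are illustrative). Each update fuses the prediction with a new measurement, weighted by their variances:

```python
def kalman_1d(measurements, meas_var, process_var=1e-4, x0=0.0, p0=1e6):
    """Minimal 1D Kalman filter estimating a (nearly) constant quantity
    from a stream of noisy commensurate measurements."""
    x, p = x0, p0                      # state estimate and its variance
    for z in measurements:
        p += process_var               # predict: uncertainty grows slightly
        k = p / (p + meas_var)         # Kalman gain: trust data vs. prediction
        x += k * (z - x)               # correct the estimate with measurement z
        p *= (1 - k)                   # the fused estimate is more certain
    return x, p

# interleaved readings from two identical sensors tracking a value near 10
readings = [10.2, 9.8, 10.1, 9.9, 10.05, 9.95]
estimate, variance = kalman_1d(readings, meas_var=0.04)
```

The full multisensor case adds vector states (position and velocity) and per-sensor measurement models, but the predict/correct cycle is the same.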

Feature-level fusion involves the extraction of representative features from sensor data. An example of feature extraction is the cartoonist's use of key facial characteristics to represent the human face. This technique, which is popular among political satirists, uses key features to evoke the recognition of famous figures. Evidence confirms that humans utilize a feature-based cognitive function to recognize objects [83]. In the case of multisensor feature-level fusion, features are extracted from multiple sensor observations and combined into a single concatenated feature vector that serves as an input to pattern recognition techniques such as neural networks, clustering algorithms, or template methods.

Decision-level fusion combines the sensor information after each sensor has made a preliminary determination of an entity's location, attributes, and identity. Examples of decision-level fusion methods include weighted decision methods (voting techniques), classical inference, Bayesian inference, and the Dempster-Shafer method.

2.3.2.3 Data Fusion process definition

One of the historical barriers to technology transfer in data fusion has been the lack of a unifying terminology that crosses application-specific boundaries. Even within military applications, related but distinct applications (such as IFF, battlefield surveillance, and automatic target recognition) used different definitions for fundamental terms such as correlation and data fusion. To improve communications among military researchers and system developers, the Joint Directors of






Laboratories (JDL) Data Fusion Working Group (established in 1986) began an effort to codify the terminology related to data fusion. The result of that effort was the creation of a process model for data fusion and a data fusion lexicon, represented in Figure 6.


Figure 6. Joint Directors of Laboratories process model for data fusion.

The JDL process model, which is intended to be very general and useful across multiple application areas, identifies the processes, functions, categories of techniques, and specific techniques applicable to data fusion. The model is a two-layer hierarchy. At the top level, shown in Figure 6, the data fusion process is conceptualized by sensor inputs, human-computer interaction, database management, source preprocessing, and six key subprocesses:

Level 0 processing (sub-object data association and estimation) is aimed at combining pixel- or signal-level data to obtain initial information about the characteristics of an observed target.

Level 1 processing (object refinement) is aimed at combining sensor data to obtain the most reliable and accurate estimate of an entity's position, velocity, attributes, and identity (to support prediction estimates of future position, velocity, and attributes).

Level 2 processing (situation refinement) dynamically attempts to develop a description of the current relationships among entities and events in the context of their environment. This entails object clustering and relational analysis such as force structure and cross-force relations, communications, physical context, etc.







Level 3 processing (significance estimation) projects the current situation into the future to draw inferences about enemy threats, friend and foe vulnerabilities, and opportunities for operations (and also the prediction of consequences, susceptibility, and vulnerability assessments).

Level 4 processing (process refinement) is a meta-process that monitors the overall data fusion process to assess and improve real-time system performance. This is an element of resource management.

Level 5 processing (cognitive refinement) seeks to improve the interaction between a fusion system and one or more users/analysts. Functions performed include aids for visualization, cognitive assistance, bias remediation, collaboration, team-based decision making, course of action analysis, etc.

The data fusion process model is augmented by a hierarchical taxonomy that identifies categories of techniques and algorithms for performing the identified functions. An associated lexicon has been developed to provide a consistent definition of data fusion terminology. See [84] for further details.

2.3.2.4 State of the art

The technology of multisensor data fusion is rapidly evolving. Much simultaneous research is ongoing to develop new algorithms, to improve existing algorithms, and to assemble these techniques into an overall architecture capable of addressing diverse data fusion applications.

The most mature area of the data fusion process is level 1 processing: using multisensor data to determine the position, velocity, attributes, and identity of individual objects or entities. Determining the position and velocity of an object on the basis of multiple sensor observations is a relatively old problem; Gauss and Legendre developed the method of least squares for determining the orbits of asteroids [85]. Numerous mathematical techniques exist for performing coordinate transformations in space, associating observations to other observations or to tracks, and estimating the position and velocity of a target. Multisensor target tracking is dominated by sequential estimation techniques such as the Kalman filter. Challenges in this area involve circumstances in which there is a dense target environment, rapidly manoeuvring targets, or complex signal propagation environments (e.g., involving multipath propagation, co-channel interference, or clutter). However, single-target tracking in excellent signal-to-noise environments for dynamically well-behaved (i.e., dynamically predictable) targets is a straightforward, easily resolved problem.

Current research focuses on solving the assignment and manoeuvring-target problems. Techniques such as multiple-hypothesis tracking (MHT) and its extensions, probabilistic data association methods, random set theory, and multiple criteria optimization theory are being used to resolve these issues. Recent studies have also focused on relaxing the assumptions of the Kalman filter






using techniques such as particle filters and other methods. Some researchers are utilizing multiple techniques simultaneously, guided by a knowledge-based system capable of selecting the appropriate solution on the basis of algorithm performance.

A special problem in level 1 processing involves the automatic identification of targets on the basis of observed characteristics or attributes. To date, object recognition has been dominated by feature-based methods, in which a feature vector (i.e., a representation of the sensor data) is mapped into feature space with the hope of identifying the target on the basis of the location of the feature vector relative to decision boundaries determined a priori.

Popular pattern recognition techniques include neural networks, statistical classifiers, and vector machine approaches. Although numerous techniques are available, the ultimate success of these methods relies on the selection of good features. (Good features provide excellent class separability in feature space, whereas bad features result in greatly overlapping feature-space areas for several classes of target.) More research is needed in this area to guide the selection of features and to incorporate explicit knowledge about target classes. For example, syntactic methods provide additional information about the makeup of a target. In addition, some limited research is proceeding to incorporate contextual information, such as target mobility with respect to terrain, to assist in target identification.

Level 2 and level 3 fusion (situation refinement and threat refinement) are currently dominated by knowledge-based methods such as rule-based blackboard systems, intelligent agents, Bayesian belief network formulations, etc. These areas are relatively immature and have numerous prototypes, but few robust, operational systems. Many of the ASTUTE use-cases will focus on improving these levels. The main challenge in this area is to establish a viable knowledge base of rules, frames, scripts, or other methods to represent knowledge about situation assessment or threat assessment. Unfortunately, only primitive cognitive models exist to replicate the human performance of these functions. Much research is needed before reliable and large-scale knowledge-based systems can be developed for automated situation assessment and threat assessment. New approaches that offer promise are the use of fuzzy logic and hybrid architectures, which extend the concept of blackboard systems to hierarchical and multi-time-scale orientations.

Another significant approach is the one proposed by [86] on team-based intelligent agents. These agents emulate the way human teams collaborate, proactively exchanging information and anticipating information needs.

Level 4 processing, which assesses and improves the performance and operation of an ongoing data fusion process, has a mixed maturity. For single-sensor operations, techniques from operations research and control theory have been applied to develop effective systems, even for complex single sensors such as phased array radars. By contrast, situations that involve multiple sensors, external mission constraints, dynamic observing environments, and multiple targets are




more challenging. To date, considerable difficulty has been encountered in attempting to model and incorporate mission objectives and constraints so as to balance optimized performance against limited resources, such as computing power and communication bandwidth (e.g., between sensors and processors), and other variables. Methods from utility theory are being applied to develop adequate measures of system performance and effectiveness. Knowledge-based systems are being developed for context-based approximate reasoning. Significant improvements would result from the advent of smart, self-calibrating sensors, which can accurately and dynamically assess their own performance.
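The utility-theoretic idea above can be sketched very simply: score each candidate sensor schedule by expected task value minus a resource penalty, and pick the best. The schedule names, task values, and probabilities below are hypothetical, introduced only for illustration.

```python
# Utility-theoretic comparison of candidate sensor schedules
# (level 4 process refinement). Expected utility = sum over tasks of
# (task value * detection probability) minus a bandwidth cost.
def expected_utility(schedule, task_value, resource_cost):
    gain = sum(task_value[task] * p_detect
               for task, p_detect in schedule["assignments"].items())
    return gain - resource_cost * schedule["bandwidth"]

task_value = {"track_A": 10.0, "track_B": 4.0}  # mission priorities

schedules = [
    {"name": "radar_on_A",
     "assignments": {"track_A": 0.9, "track_B": 0.2}, "bandwidth": 2.0},
    {"name": "split_beams",
     "assignments": {"track_A": 0.6, "track_B": 0.6}, "bandwidth": 3.0},
]

best = max(schedules, key=lambda s: expected_utility(s, task_value, resource_cost=0.5))
print(best["name"])
```

Real systems replace the hand-set numbers with estimated detection models and measured resource usage, but the comparison step has this shape.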

The advent of distributed network-centric environments, in which sensing resources, communications capabilities, and information requests are very dynamic, creates serious challenges for level 4 fusion. It is difficult (or perhaps impossible) to optimize resource utilization in such an environment. In [87] the authors applied concepts from market-based auctions to allocate resources dynamically, treating sensors and communication systems as suppliers of services, and users and algorithms as consumers, in order to rapidly assess how to allocate system resources to satisfy the consumers of information.
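A toy single-round sealed-bid allocation in the spirit of the market-based scheme cited above: consumers bid for sensor services, and each sensor goes to its highest bidder. This is a hedged sketch, not the algorithm of [87]; the consumer and sensor names are invented.

```python
# Market-based allocation sketch: sensors are suppliers, information
# consumers submit bids reflecting the utility of each sensor's service,
# and each sensor is awarded to its highest bidder.
def auction_allocate(bids):
    """bids: {consumer: {sensor: bid_value}} -> {sensor: winning consumer}"""
    winners = {}  # sensor -> (consumer, best bid so far)
    for consumer, offers in bids.items():
        for sensor, value in offers.items():
            if sensor not in winners or value > winners[sensor][1]:
                winners[sensor] = (consumer, value)
    return {sensor: consumer for sensor, (consumer, _) in winners.items()}

bids = {
    "tracker":     {"radar_1": 0.9, "eo_camera": 0.4},
    "classifier":  {"eo_camera": 0.8},
    "comms_relay": {"radar_1": 0.3},
}
print(auction_allocate(bids))  # radar_1 -> tracker, eo_camera -> classifier
```

Iterating such rounds as bids change gives the dynamic reallocation behaviour the text describes.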

Data fusion has suffered from a lack of rigor with regard to the testing and evaluation of algorithms and the translation of research findings from theory to applications. The data fusion community must insist on high standards for algorithm development, test, and evaluation; on the creation of standard test cases; and on systematic evolution of the technology to meet realistic uses. It is particularly important for ASTUTE partners to follow these guidelines as closely as possible during use-case development, in order to produce high-quality and reusable applications.

2.3.2.5 Dirty Secrets in Data Fusion

In the following we include a significant and shrewd set of issues from the original article by Hall [88]:

Seven challenges in data fusion:

1. There is no substitute for a good sensor.
2. Downstream processing cannot absolve the sins of upstream processing.
3. The fused answer may be worse than the best sensor.
4. There are no magic algorithms.
5. There will never be enough training data.
6. It is difficult to quantify the value of data fusion.
7. Fusion is not a static process.

In the following we provide a detailed explanation:



• There is still no substitute for a good sensor (and a good human to interpret the results): this means that if something cannot actually be observed or inferred from its effects, then no




amount of data fusion from multiple sensors will overcome the problem. This problem becomes even more challenging as threats change. The transition from the search for well-known physical targets (e.g., weapon systems, emitters, etc.) to targets based on human networks raises obvious issues in determining what can and should be observed. In particular, trying to determine intent is tantamount to mind reading, and remains an elusive problem.



• Downstream processing still cannot absolve upstream sins (or lack of attention to the data): it is clear that we must do the best processing possible at every step of the fusion/inference process. For example, it is necessary to perform appropriate image and signal processing at the data stage, followed by appropriate transformations to extract feature vectors, etc., for feature-based identity processing. Failure to perform the appropriate data processing, or failure to select and refine effective feature vectors, cannot be overcome by choosing complex pattern recognition techniques. We simply must pay attention at every stage of the information chain, from energy detection to knowledge creation.
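A toy illustration of the upstream/downstream point: if a noise spike is not suppressed at the data stage, it leaks directly into the extracted feature, and no downstream classifier can tell it apart from a genuine peak. The signal values and stage names below are invented for illustration.

```python
def moving_average(signal, k=3):
    """Data-stage processing: simple denoising by a length-k moving average."""
    return [sum(signal[i:i + k]) / k for i in range(len(signal) - k + 1)]

def extract_feature(signal):
    """Feature-extraction stage: peak amplitude of the (processed) signal."""
    return max(signal)

raw = [0.1, 0.2, 5.0, 0.1, 0.3]  # one spurious spike; true signal stays below 0.3

spiky_feature = extract_feature(raw)                   # spike leaks into the feature
cleaned_feature = extract_feature(moving_average(raw)) # spike strongly attenuated
print(spiky_feature, cleaned_feature)
```

The point is structural: the feature extractor cannot recover information about which peak was noise once the data stage has been skipped.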



• Not only may the fused result be worse than the best sensor, but failure to address pedigree, information overload, and uncertainty may really foul things up: the rapid introduction of new sensors and the use of humans as “soft sensors” (reporters) in network operations place special challenges on determining how to weight the incoming data. Failure to accurately assess the accuracy of the sensor/input data leads to biases and errors in the fused results. The advent of networked operations and service-oriented architectures (SOA) can exacerbate this problem by rapidly disseminating data and information without understanding the sources or pedigree (who did what to the data).



• There are still no magic algorithms: Hall's original text provides an overview of numerous algorithms and techniques for all levels of fusion. Although there are increasingly sophisticated algorithms, it is always a challenge to match the algorithm with the actual state of knowledge of the data, the system, and the inferences to be made. No single algorithm is ideal under all circumstances.



• There will never be enough training data; however, hybrid methods that combine implicit and explicit information can help: it is well known that pattern recognition methods, such as neural networks, require training data to establish the key weights. When seeking to map an n-dimensional feature vector to one of m classes or categories, we need in general n × m × (10 to 30) training examples under a variety of observing conditions. This can be very challenging to obtain, especially with dynamically changing threats. Hence, in general, there will never be enough training data available to satisfy the mathematical conditions for pattern recognition techniques. However, new hybrid methods that use a combination of sample data, model-based data, and human-subject explicit information can assist in this area.
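The rule of thumb quoted above is easy to evaluate; the feature and class counts below are example values, not ASTUTE figures.

```python
# Rough rule of thumb from the text: mapping an n-dimensional feature
# vector to one of m classes needs about n * m * (10 to 30) training
# examples, per observing condition.
def training_examples_needed(n_features, m_classes, factor=10):
    return n_features * m_classes * factor

# e.g. 20 features, 5 target classes:
low = training_examples_needed(20, 5, factor=10)   # lower bound
high = training_examples_needed(20, 5, factor=30)  # upper bound
print(low, high)
```

Even this modest problem calls for thousands of labelled examples per observing condition, which is the source of the difficulty the text describes.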




