Executive Summary - Image and Sound Processing Group

Pro-Active decision support for data-intensive environments (ASTUTE)




Contact information

Project coordinator
Silvia Castellví
silvia.castellvi@atosresearch.eu

Administrative manager
Silvia Castellví
silvia.castellvi@atosresearch.eu







Project partners

1. Atos Origin SAE (ATOS)
2. SIRRIS (SIRRIS)
3. STMicroelectronics S.r.l. (ST-Italy)
4. FluidHouse (FH)
5. Luciad (Luciad)
6. Fundación Tekniker (Tekniker)
7. Acondicionamiento Tarrasense (Leitat)
8. Equalid Solutions, S.L. (EQUALID)
9. IRIDA Labs (IRIDA)
10. Brno University of Technology (BUT)
11. Integrasys S.A. (INT)
12. Politecnico di Milano (PoliMi)
13. Tampere University of Technology Foundation (TUT)
14. IOS International (IOS)
15. Pininfarina S.P.A. (PF)
16. Camea, spol. s r.o. (Ltd.) (Camea)
17. Akhela srl (Akhela)
18. Telecom Design S.r.l. (TD)
19. El Jardín de Junio (Jarjun)
20. Namahn (Namahn)
21. Prodatec Oy (PDT)
22. Smart Solutions Technologies S.L. (NUUBO)
23. Thales (Thales)
24. Ecole Nationale de Sciences Cognitives (ENSC)
25. THT Control (THT)



Collaborative Project

ARTEMIS JOINT UNDERTAKING

ASP8, Human-centric design of embedded systems

Reference designs and architectures
Seamless connectivity and middleware
Design methods and tools


D2.2

Survey of state of the art enabling technologies and associated gap analysis w.r.t ASTUTE objectives


Due date of deliverable: [31-08-2011]
Actual submission date: [Date-MM String YYYY]

Start date of project: 01.03.2011
Duration: 36 months

Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)

Dissemination Level

PU  Public                                                                           X
PP  Restricted to other programme participants (including the Commission Services)
RE  Restricted to a group specified by the consortium (including the Commission Services)
CO  Confidential, only for members of the consortium (including the Commission Services)

Organisation name of lead contractor for this deliverable: TUT






Pro-Active decision support for data-intensive environments (ASTUTE)

D2.2 Survey of state of the art technologies and associated gap analysis w.r.t ASTUTE objectives

Page ii/79


Revision History

Revision  Date        Description                                                       Issued by
1.0       31/05/2011  Creation of a template with initial structure;                    TUT; PoliMi
                      initial structure in Wiki
1.01      20/08/2011  Working draft                                                     TUT, ENSC, Luciad, Sirris
1.02      22/08/2011  Working draft                                                     Sirris, JarJun
1.03      29/08/2011  Working draft                                                     PoLiMi, IRIDA
1.04      29/08/2011  Working draft                                                     ST, PoliMi, IRIDA
1.05      06/09/2011  Pre-release 1                                                     PoliMi, TUT, ATOS, Tekniker
1.06      07/09/2011  Pre-release 2                                                     TUT
1.07      08/09/2011  Release 1st milestone                                             PoLiMi







Table of Contents

EXECUTIVE SUMMARY ... 10
1 INTRODUCTION ... 11
1.1 This Document ... 11
1.2 Objective ... 11
1.3 Intended/Main Audience ... 11
1.4 Outline ... 11
2 INFORMATION RETRIEVAL AND FUSION ... 13
2.1 Intelligent Context-aware Information Retrieval ... 13
2.1.1 Situation Awareness (ENSC) ... 14
2.1.2 Context retrieval from visual features (PoLiMi) ... 14
2.1.3 Structure from Motion (ST) ... 15
2.2 Intelligent User State Information Retrieval ... 17
2.2.1 Affective User State (JarJún) ... 17
2.2.2 User State Information Retrieval (Tekniker) ... 17
2.2.3 Affect Detection (PoLiMi) ... 18
2.3 Multimodal Information Fusion ... 19
2.3.1 Overview (IRIDA) ... 19
2.3.2 Multisensor Data Fusion (PoLiMi) ... 21
2.3.3 Data retrieval and exchange (ATOS) ... 31
3 CONTEXT MODELLING AND PRO-ACTIVE DECISION SUPPORT ... 38
3.1 Context Modelling ... 38
3.1.1 Ontology-based models (Sirris) ... 39
3.1.2 Tracking Progress in Collaborative Environments (Sirris) ... 42
3.1.3 Probabilistic Models (Tekniker) ... 50
3.1.4 Probabilistic User State (Jarjún) ... 50
3.2 Reasoning Techniques ... 50
3.2.1 Ontological Reasoners (TUT) ... 50
3.2.2 Probabilistic Reasoning (JarJún and Tekniker) ... 53
3.3 Naturalistic Decision Making (JarJún) ... 53
4 INTERFACE DESIGN ... 54
4.1 Multimodal Interface Design ... 54
4.1.1 Development of multimodal interfaces (ENSC) ... 54
4.2 Proactive Interfaces (IRIDA) ... 55
4.3 Adaptive User Interfaces ... 57
4.3.1 Map Centric Data Fusion and Visualization (Luciad) ... 58
5 CONCLUSIONS ... 62
6 BIBLIOGRAPHY AND REFERENCES ... 63
6.1 Normative references ... 63
6.2 Documents and Books ... 64
7 ANNEXES ... 79










Figures

Figure 1. Different levels of information fusion coming from different audio-visual sources. ... 20
Figure 2. A moving object observed by both a pulsed radar and an infrared imaging sensor. ... 23
Figure 3. Direct fusion of sensor data. ... 24
Figure 4. Representation of sensor data via feature vectors, with subsequent fusion of the feature vectors. ... 24
Figure 5. Processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined. ... 25
Figure 6. Joint Directors of Laboratories process model for data fusion. ... 26
Figure 7. CONON hierarchy. ... 40
Figure 8. SOUPA ontology. ... 41
Figure 9. Tangible Business Process Models: Toolkit. ... 44
Figure 10. Example of part of a disaster plan expressed in rule-based language. ... 45
Figure 11. Example Workflow of an Emergency Plan. ... 47
Figure 12. WorkItem State Diagram. ... 47
Figure 13. ERMA process model. ... 49


Tables

Table 1. Set of technologies to be applied in ASTUTE: ... 31









List of Abbreviations

Term      Meaning
AAMI      Auto-Adaptive Multimedia Interfaces
AMEBICA   Auto Adaptive Multimedia Environment Based on Intelligent Agents
AMR       Adaptive Multi-Rate
ANEW      Affective Norm for English Words
API       Application Programming Interface
BA        Bundle Adjustment
BCI       Brain-Computer Interface
BPMN      Business Process Model and Notation
BPMS      Body Pressure Measurement System
CBIR      Context-Based Image Retrieval
CG        Conceptual Graph
DAML      DARPA Agent Markup Language
DFD       Data Flow Diagrams
DIG       Description Logic Implementation Group
DL        Description Logic
DoG       Difference of Gaussians
DRM       Digital Rights Management
DSA       Digital Signature Algorithm
DXF       Drawing Interchange Format
ECC       Elliptical Curve Cryptography
EDA       Electrodermal Activity
EEG       Electroencephalography
EKG       Electrocardiograms
EMG       Electromyogram
EOG       Electrooculography
ESPG      European Petroleum Survey Group
ESPRIT    European Strategic Program on Research in Information Technology
FACS      Facial Action Coding System
FAST      Features from Accelerated Segment Test
FLIR      Forward-Looking Infrared
GIF       Geographical Information Retrieval
GIS       Geographical Information System
GPS       Global Positioning System
HMI       Human Machine Interaction
HOG       Histogram of Oriented Gradients
HTML      HyperText Markup Language
IFF       Identification Friend or Foe
IR        Information Retrieval
KLT       Kanade-Lucas-Tomasi
LAI       Location Area Identity
LM        Levenberg-Marquardt
MHT       Multiple-Hypothesis Tracking
MVC       Model-View-Controller
NFC       Near Field Communication
NLP       Natural Language Processing
nRQL      new Racer Query Language
OGC       Open Geospatial Consortium
OIL       Ontology Inference Layer
OS        Operating System
OWL       Web Ontology Language
P2P       Peer to Peer
PKI       Public Key Infrastructure
pKLT      pyramidal Kanade-Lucas-Tomasi
RANSAC    RANdom SAmple Consensus
RAT       Radio Access Technology
RDF       Resource Description Framework
RFID      Radio-Frequency IDentification
RGBA      Red Green Blue Alpha
RSA       Rivest, Shamir, Adleman (encryption algorithm)
RSSI      Received Signal Strength Indicator
SA        Situation Awareness
SfM       Structure from Motion
SIFT      Scale-invariant feature transform
SLAM      Simultaneous localization and mapping
SOA       Service-Oriented Architecture
SOUPA     Standard Ontology for Ubiquitous and Pervasive Applications
SURF      Speeded Up Robust Feature
SWRL      Semantic Web Rule Language
TBPM      Tangible Business Process Modelling
TDOA      Time Difference Of Arrival
TOA       Time Of Arrival
UI        User Interface
UML       Unified Modeling Language
UMTS      Universal Mobile Telecommunications System
U-OTDOA   Uplink Observed Time Difference Of Arrival
VHF       Very High Frequency
VOR       VHF Omnidirectional Range
VQ        Vector Quantization
WFS       Web Feature Service
WMS       Web Map Service
WS-BPEL   Web Services Business Process Execution Language
XML       Extensible Markup Language







Executive Summary

This document contains the first version of D2.2. It presents a collection of related works from the various domains of the partners’ expertise. In further versions, the content will be elaborated further along with other ongoing project activities, and discussed and refined in collaboration with all involved partners, in order to produce a coherent survey on enabling technologies.








1 Introduction

1.1 This Document

This document provides a survey of state of the art enabling technologies and an associated gap analysis w.r.t. ASTUTE objectives. The survey scrutinizes algorithms, models, hardware and software technologies, services, content and sensing technologies relevant to ASTUTE objectives, in order to facilitate developments in all R&D work packages of ASTUTE.

The current version of the document provides a collection of related works from the various domains of the partners’ expertise. The content is organised in three main blocks: information retrieval and fusion, context modelling and pro-active decision support, and interface design. The next revisions of the document aim to add more value to the topic and to reorganise the heterogeneous knowledge of the consortium members into a smooth and coherent paper.


1.2 Objective

The survey addresses two main objectives:

1) To reveal the current state of the art in the fields related to ASTUTE interests, and
2) To identify opportunities for further progress and innovative development aligned with ASTUTE objectives.

In order to achieve the ASTUTE-specific objectives, the document targets:

- To present the key parameters of both user state and situational context which are essential for decision making
- To provide the ground for context modelling with a review of information retrieval and filtering, and of context modelling technologies, methods and tools
- To review the technologies enabling decision support.


1.3 Intended/Main Audience

The main audience of this document consists of all ASTUTE contributors, whether part of the consortium or not.

<<Add more about audience>>


1.4 Outline

The outline for the rest of the document is as follows:









- Chapter 2 is dedicated to information retrieval and fusion, with a focus on intelligent context-aware information retrieval, intelligent user state information retrieval, and multimodal information fusion.
- Chapter 3 surveys context modelling, reasoning techniques and naturalistic decision making.
- Chapter 4 is dedicated to interface design and focuses on three topics: multimodal interface design, proactive interfaces, and adaptive user interfaces.
- Conclusions are summarized in chapter 5.
- The list of references is provided at the end of the document.








2 Information Retrieval and Fusion

In the current version of the deliverable, the content of chapter 2 is organised around two topics: the technologies of context-aware information retrieval and the technologies specific to user-state information retrieval.


2.1 Intelligent Context-aware Information Retrieval

Introduction by TUT

The increasing amount of information flowing from various information sources challenges traditional Information Retrieval (IR) techniques and requires further investigation in IR. Recent publications on IR systems show the interest of researchers in context-aware IR, which is sensitive to the user location, preferences and interests, time, and the current state of the environment. According to [1], an IR system “is context-aware if it exploits context data in order to deliver relevant information to the user”. Applications of context-aware IR systems are numerous: geographical information retrieval (GIF), which aims to deliver information to the user based on his/her current location [2]; mobile IR, which aims to provide content adaptation for mobile devices in addition to GIF goals [3][4][5][6][7]; context-aware search engines [8][9][10][11]; etc.

Context-aware IR systems combine traditional IR techniques like HTML-aware tools, NLP-based tools, and ontology-based tools [12] with the current context. For example, models based on ontologies could be augmented with user location and context for the refinement of queries in the information retrieval process [2][5][9].

According to the general framework proposed in [3] for context-aware IR systems, their main functions are context modelling and context retrieval.

The main objective of context modelling is to extend the original user query according to the current context. Context modelling is focused on reflecting the current user context, or on inferring it from external or internal sources, user profiles, and current and past behaviours of the user. Different context models could be used to access the context information: tags or key-value models [4][8], keyword vectors or vector classes [6][13], graphs [11], ontologies [2][5][9][14], etc.

Context retrieval aims to deliver the right information to the user by exploiting different query refinement techniques with query reformulation, data ranking and semantic queries [5][6][7][9][11][13][14].
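As an illustration of the simplest of these context models, the sketch below extends a user query with terms drawn from a key-value context, in the spirit of the tag/key-value approaches cited above. It is a hedged, minimal example, not taken from any of the cited systems; the names `refine_query` and `CONTEXT` are hypothetical.

```python
# Hypothetical sketch: query reformulation with a key-value context model.
# A real context-aware IR system would feed the expanded query to a ranking
# engine; here we only show the query-extension step.

def refine_query(query_terms, context, boost_keys=("location", "activity")):
    """Extend the original user query with terms taken from the current
    key-value context, so context-relevant documents rank higher."""
    expansion = [str(context[k]) for k in boost_keys if k in context]
    # The reformulated query keeps the user's terms first, then the
    # context-derived expansion terms.
    return list(query_terms) + expansion

CONTEXT = {"location": "airport", "activity": "boarding", "time": "08:15"}
print(refine_query(["gate", "status"], CONTEXT))
# -> ['gate', 'status', 'airport', 'boarding']
```

Richer models (keyword vectors, graphs, ontologies) replace the flat dictionary with structures that also support inference over the context.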

Concluding, context-aware IR is a novel field of research with various approaches under consideration, and the approaches for different applications need further adjustments [15].


Situation Awareness (SA) plays a predominant role in the decision making of a user while operating a technical system [16]. In order to support SA and the goal-oriented behaviour of a user, an Information Retrieval (IR) system should provide the relevant information in a context-aware manner. The next subsection provides the definition of SA and its role in user behaviour.

2.1.1 Situation Awareness (ENSC)

One of the aspects of intelligent context-aware information retrieval concerns SA and its role in dynamic human decision making. As technology progresses, users must deal with dynamic systems that increase the complexity of effective and timely decision making. The operator’s situation awareness is presented as a predominant concern in this decision making and performance process when operating such systems. To construct a theoretical model of SA, Endsley (1995) explores the relationship between situation awareness and numerous individual and environmental variables, like attention and working memory, as critical factors limiting operators from acquiring and interpreting information from the environment to form situation awareness. Mental models and goal-directed behaviour are hypothesized as important mechanisms for overcoming these limits [16]. For example, in the aircraft domain, pilots are highly dependent on an up-to-date assessment of the changing situation (operational parameters, external conditions, navigational information, …). Endsley’s SA model describes and synthesizes different cognitive resources and mechanisms that might be involved in constructing and maintaining SA [16], such as dynamic goal selection, attention to appropriate critical cues, expectancies regarding future states of the situation, and ties between SA and typical actions [17].

A second model, proposed by Baumann & Krems (2009) and based on the Construction-Integration theory [18], explores the situation model processes. For safe driving, it is necessary for drivers to perceive, identify and correctly interpret the current traffic situation, to be able to anticipate its future development, and to adapt their driving behaviour to the situation [19].

As a conclusion, we use a paper from Wickens (2008) that summarizes two articles by Endsley on situation awareness and presents the influence of the concept on subsequent practice and theory of human factors [20]. Situation awareness is a viable and important construct that still carries some controversy over measurement issues. SA can be applied to the areas of training (information seeking or teaching predictive skills [21][22]), error analysis (attentional training [22][20]), design (display features to support SA [23]), prediction [22], teamwork (team dynamics and interworker communications [24]), and automation (harmony and workload [16]).


2.1.2 Context retrieval from visual features (PoLiMi)

An important source of context information is the visual appearance of the surroundings of a place. For example, in an emergency situation, a worker with a camera-enabled device can orient herself in an unknown place, or carry out complex procedures with the aid of Augmented Reality (AR). A camera mounted on a car can warn the driver about potentially dangerous situations (e.g. a pedestrian on the road) or, again using AR, show the current itinerary and points of interest.







The methods from computer vision help to retrieve two important types of useful contextual information: the camera location in a known environment, and the type and position of objects of interest in the surroundings.

The camera location can be recovered from a set of overlapping images of the surroundings, using methods from Structure from Motion (SfM) [25] (see also section 2.1.3), and a sufficiently large number of images also allows reconstructing the tridimensional appearance of the surroundings [26]. If at least some of the images have GPS coordinates, all the images can be geographically localized with accuracy; for a recent example see [27]. Such a method applied to mobile or handheld devices can supplant or complement GPS and provide useful information not otherwise available, e.g. what is actually visible from the current location. One early attempt in this direction is the work of [28], while a more ambitious approach is presented by [29], where the localization is extended to the whole earth using images from photo-sharing services.

The camera localization part is usually done via camera calibration, which in most cases requires as a first step the detection of feature descriptors invariant with respect to the point of view of the camera. Currently many such descriptors exist; for an extensive overview see [30]. The feature descriptors can also be used to implement Augmented Reality (AR) without markers, again using SfM [31]. For storage efficiency, the descriptors can be used in place of whole images when matching an acquired image against a known set of images, and a standard for the descriptors is under development in the MPEG Group [l].

Object recognition is a much harder problem than camera calibration, especially in the realistic scenario of thousands of potential categories of objects [32]. The object recognition process, to model the scene, can use the same type of descriptors used for camera calibration, for example in the so-called bag-of-words models [33], or use other approaches [34]. In particular cases specific techniques are used, e.g. Histograms of Oriented Gradients (HOG) for pedestrian detection [35] or boosted classifiers for face detection [36].
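To make the HOG idea concrete, the sketch below computes the core building block of a HOG descriptor (a magnitude-weighted histogram of gradient orientations) for a single cell of a grayscale patch. This is only an illustrative fragment under stated assumptions, not the full detector of [35], which adds overlapping blocks, normalisation and a trained linear classifier; the function name `cell_histogram` is hypothetical.

```python
# Minimal sketch of one HOG cell: gradient orientations (0..180 degrees),
# weighted by gradient magnitude, binned into a small histogram.
import math

def cell_histogram(patch, n_bins=9):
    """Orientation histogram over the interior pixels of a 2D intensity patch."""
    h = [0.0] * n_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            h[int(ang / 180.0 * n_bins) % n_bins] += mag
    return h

# A vertical intensity edge: all gradient energy falls into the first bin
# (horizontal gradients, orientation 0 degrees).
patch = [[0, 0, 9, 9]] * 4
hist = cell_histogram(patch)
print(hist.index(max(hist)))  # -> 0
```

Concatenating such histograms over a dense grid of cells yields the feature vector that the pedestrian classifier operates on.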

2.1.3 Structure from Motion (ST)

Structure from Motion (SfM) is a computer vision technique directed to the estimation of camera ego-motion and of the 3D shape of the surrounding environment, by analysing a calibrated video.

The input video can be acquired with a mono or a stereo camera, but in either case the camera must be calibrated. The calibration task is a standard off-line process, which estimates the intrinsic camera parameters and, in the case of a stereo configuration, the reciprocal position of the cameras. By intrinsic parameters here we mean the focal length, the camera's centre and a few distortion coefficients introduced by the camera's lens [25]. A few open libraries are available for camera calibration, like OpenCV [37], the Bouguet toolbox [38] and tclcalib [39].

A reference SfM pipeline is divided in an image analyser stage, a camera reconstruction stage, a triangulation stage and finally a refinement stage.







The image analyser stage generates a sparse optical motion flow, combining the detection and the tracking of interest points over time. Well-known corner detectors are Harris [40], Shi-Tomasi [41] and FAST [42][43], whereas to track the points the usual solutions are the KLT [44] and the PKLT [45] algorithms.

The camera estimation process is based on epipolar geometry, which represents the mutual camera positions through an essential or fundamental matrix. The family of algorithms used to compute this type of matrix are called N-points, where N is the number of input points; the most well-known configurations are 7 and 8 [25]. In order to obtain a robust camera estimation, an N-points algorithm is not enough and must be integrated with a RANSAC [46] algorithm, which randomly selects sets of input corners, outputs a large set of hypotheses for the camera position, and scores them using the re-projection error in order to select the best one.
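The hypothesise-and-score loop of RANSAC can be sketched on a problem simpler than fundamental-matrix estimation, namely fitting a 2D line: a minimal random sample defines a model, and the model supported by most inliers wins. This is only an illustrative sketch of the loop structure described above; the function name `ransac_line` and the tolerance values are hypothetical.

```python
# Pure-Python RANSAC sketch: robust 2D line fit in the presence of outliers.
import random

def ransac_line(points, iters=200, inlier_tol=0.5, seed=0):
    """Return (slope, intercept) of the line supported by most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        # 1) hypothesis from a minimal random sample (2 points define a line)
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate sample: vertical line
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2) score: count points whose residual is below the tolerance
        inliers = sum(1 for (x, y) in points if abs(y - (a * x + b)) < inlier_tol)
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model

# 8 points on y = 2x + 1 plus two gross outliers that would drag a plain
# least-squares fit away from the true line.
pts = [(x, 2 * x + 1) for x in range(8)] + [(2, 40), (5, -30)]
a, b = ransac_line(pts)
print(round(a), round(b))  # -> 2 1
```

In the SfM case the minimal sample is 7 or 8 point correspondences, the model is the fundamental matrix, and the residual is the re-projection (or epipolar) error, but the loop is identical.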

By knowing the camera ego-motion, it is possible to compute a sparse 3D map by estimating the spatial position of the tracked features. This process is named triangulation, and the main implementation is in [47].
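The geometric idea behind triangulation can be shown in a toy 2D setting: each calibrated camera contributes a ray from its centre towards the tracked feature, and the feature position is the intersection of the rays. This is a hedged sketch only; real SfM triangulates 3D points from noisy image projections, where the rays do not intersect exactly and a least-squares solution such as [47] is used instead. The function name `intersect_rays` is hypothetical.

```python
# Toy 2D triangulation: intersect two rays o + t*d (a 2x2 linear solve).

def intersect_rays(o1, d1, o2, d2):
    """Intersect rays o1 + t*d1 and o2 + s*d2 in the plane."""
    # Solve t*d1 - s*d2 = o2 - o1 with Cramer's rule.
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(det) < 1e-12:
        raise ValueError("parallel rays: point at infinity")
    rx, ry = o2[0] - o1[0], o2[1] - o1[1]
    t = (rx * (-d2[1]) - (-d2[0]) * ry) / det
    return (o1[0] + t * d1[0], o1[1] + t * d1[1])

# Two camera centres one unit apart, both viewing a feature at (2, 2).
p = intersect_rays((0, 0), (1, 1), (1, 0), (1, 2))
print(p)  # -> (2.0, 2.0)
```

The baseline between the camera centres is what makes the system well conditioned: as the rays become parallel (small baseline or distant points), the determinant approaches zero and the depth estimate degrades.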

Finally, a refinement stage is needed to improve quality, using an algorithm such as LM bundle adjustment (BA) [25], which is a technique for simultaneously refining the 3D structure and the camera parameters (i.e. camera pose and possibly intrinsic calibration parameters) in order to obtain a reconstruction that is optimal under certain assumptions about the noise affecting the interest point detection.

BA amounts to minimizing the re-projection error between the observed and predicted image points. Since the prediction of the image points involves an image projection, one must in general use non-linear least squares algorithms, of which the Levenberg-Marquardt (LM) algorithm has proven to be the most successful, due to its damping strategy that allows it to converge from a wide range of initial guesses. By iteratively linearizing the function to be minimized in the neighbourhood of the current estimate, the LM algorithm computes the solution of linear systems known as normal equations.
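The LM iteration described above, damped normal equations plus an adaptive damping factor, can be sketched in miniature on a toy residual (the real BA Jacobian is large and sparse; this is only the core loop):

```python
import numpy as np

# Minimal Levenberg-Marquardt loop: linearize the residual r(p) around the
# current estimate, solve the damped normal equations
#   (J^T J + lam * I) dp = -J^T r,
# and adapt the damping lam depending on whether the step reduced the cost.
def lm_fit(residual, jacobian, p0, n_iters=50, lam=1e-3):
    p = np.asarray(p0, dtype=float)
    cost = np.sum(residual(p) ** 2)
    for _ in range(n_iters):
        r, J = residual(p), jacobian(p)
        dp = np.linalg.solve(J.T @ J + lam * np.eye(len(p)), -J.T @ r)
        cost_new = np.sum(residual(p + dp) ** 2)
        if cost_new < cost:          # accept: trust the linearization more
            p, cost, lam = p + dp, cost_new, lam * 0.5
        else:                        # reject: damp harder (gradient-descent-like)
            lam *= 10.0
    return p

# Toy problem: fit y = exp(a*x) + b to noise-free data (true a=0.5, b=2.0).
x = np.linspace(0.0, 2.0, 20)
y = np.exp(0.5 * x) + 2.0
residual = lambda p: np.exp(p[0] * x) + p[1] - y
jacobian = lambda p: np.column_stack([x * np.exp(p[0] * x), np.ones_like(x)])
p_est = lm_fit(residual, jacobian, p0=[0.0, 0.0])
```

In BA the parameter vector collects all camera poses and 3D points, and the residual is the stacked re-projection error, but the damping strategy is exactly this one.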

In the automotive context, many publications propose different pipeline combinations to estimate car motion using SfM: some of them use a monocular system [48][49], while others use a stereo system [50].

A further improvement is the introduction of invariant features instead of the combination of simpler corner detectors and trackers. This family of algorithms describes the region around a key-point in order to preserve some invariance to rigid transforms and light changes, and is able to identify the position of the same key-points in a new image simply by comparing two descriptors. The advantage of matching is a drastic relaxation of the PKLT assumptions regarding brightness constancy, temporal persistence and spatial coherence between the images involved.
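The "comparing two descriptors" step can be sketched as nearest-neighbour matching with the widely used ratio test (a hedged toy with random vectors standing in for real descriptors):

```python
import numpy as np

# Descriptor matching: for each descriptor in image A, find its nearest
# neighbour in image B and keep the match only if it is clearly better than
# the second-best candidate (the ratio test used with SIFT-style descriptors).
def match_descriptors(desc_a, desc_b, ratio=0.8):
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]       # best and second-best neighbours
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(0)
desc_b = rng.normal(size=(50, 32))                             # "image B" descriptors
desc_a = desc_b[[3, 7, 11]] + 0.01 * rng.normal(size=(3, 32))  # noisy re-detections
matches = match_descriptors(desc_a, desc_b)
```

Note that, unlike PKLT, nothing here assumes the two images are temporally adjacent or photometrically consistent; only the descriptors must be comparable.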
Pro-Active decision support for data-intensive environments (ASTUTE)
D2.2 Survey of state of the art technologies and associated gap analysis w.r.t ASTUTE objectives
Page 17/79

Moreover, descriptors can be used to recognize the objects of a scene in a visual database and, for example, to retrieve useful information about the environment. The main state-of-the-art algorithms are SIFT [51] and SURF [52], which approximate the interest point detection, based on Difference of Gaussians, using the Fast Hessian and the Integral Image in order to reduce the computational cost at the expense of quality.
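The Integral Image mentioned above is the key to SURF's speed: once the summed-area table is built, any box-filter response costs only four lookups, regardless of the box size. A minimal sketch:

```python
import numpy as np

# Integral image (summed-area table): entry (r, c) holds the sum of all
# pixels above and to the left, so any rectangle sum needs four lookups.
def integral_image(img):
    # One extra row/column of zeros makes the corner lookups branch-free.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] from four corner values of the table.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(36.0).reshape(6, 6)
ii = integral_image(img)
s = box_sum(ii, 1, 2, 4, 5)   # constant-time box filter response
```

SURF's Fast Hessian evaluates box-filter approximations of Gaussian second derivatives this way, which is what trades quality for speed relative to SIFT's true Difference of Gaussians.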

An example of invariant features used to estimate camera motion is [53], where SIFT is combined with SLAM.

For storage efficiency, the descriptors can be further compressed and used in place of whole images and, as said above, a standard for those descriptors is under development in the MPEG Group [l][m]. A novel algorithm that seems promising in that respect is CHoG [54] which, starting from DoG or Fast Hessian, discretizes the oriented patch in two steps: the first one, named DAISY, divides the region into circular overlapped areas, and the second one performs a gradient quantization via vector quantization (VQ). Finally, the resulting descriptor is compressed again using Type coding or Huffman coding, obtaining a descriptor that ranges from 44 to 100 bits instead of the 64-128 bytes of SIFT [51].

2.2 Intelligent User State Information Retrieval

2.2.1 Affective User State (JarJún)

An intelligent user assistance system (e.g. a human-machine interface, HMI) is one that aims to improve the user's performance and minimize human errors during the main task. Moreover, and more importantly, this system works to mitigate the impact of negative states in the user, for instance fatigue, stress, confusion, boredom, or anxiety. An HMI is therefore emotion-aware if it is both able to recognize emotions and to intelligently or appropriately express them and act on them accordingly by providing user assistance. In designing this sort of system, affective computing [55] should be a key conceptual framework, since it allows computing the relationships between human emotions and task performance. This technology integrates different modalities of input information from the user to detect and recognize the current affective state: for instance, vocal emotion communication [56], affective facial expression and gestures [57], and affective psychophysiology [58][59].

2.2.2 User State Information Retrieval (Tekniker)

Affective computing focuses on studying and developing systems and devices capable of recognizing, interpreting, processing and simulating human affects (see Section 2.2.1). It is an interdisciplinary field with contributions from computer science, psychology, linguistics, cognitive and affective sciences, neuroscience, and related disciplines. The paper "Affective Computing" [60] is considered to be the modern origin of the field.







A recent review of the state-of-the-art in affect detection is that by Calvo and D'Mello [61]. The number of sensor modalities or channels investigated for the detection of affective aspects has increased since the field's inception, as has the number of techniques and methods employed, mostly from the fields of machine learning and psychology. However, it seems reasonable that multimodal systems can provide advantages, since emotional events typically activate multiple user responses. Nonetheless, multimodal affect detection is more challenging and very few systems have explored it.

Support for the development of cognitive-affective systems is typically found via bibliographic research or experimental and field work (see, for example, [62]), with the objective of gathering annotated data and knowledge on useful features and relations among user states, context, etc.

2.2.3 Affect Detection (PoLiMi)

The affective states of people are inherently multimodal. In this section the different channels through which an emotion can be detected are discussed, while in Section 2.3 we discuss more thoroughly how different modalities can be integrated.

Researchers have concentrated mainly on detection through facial expressions, voice, posture, physiology and textual content [61].

In the study of facial expressions, the goal is typically to identify basic expressions linked to human emotions. A frequently used dictionary for expressions and their link with emotions is the Facial Action Coding System (FACS), developed by Ekman and Friesen [63]. Most methods in the field require a pre-segmented sequence of expressions, few have real-time performance, and almost none uses contextual cues to help the recognition phase [64]; further progress is thus needed to enable real-world applications.

Emotion recognition through voice typically uses prosody (rhythm and tone) to recognize the user's emotional state. These methods often suffer from lower accuracy w.r.t. methods based on facial expressions, but on the other hand most of them can work in real time and in realistic settings [64].

Posture and physiology, differently from voice and facial expressions, measure variables affected by unconscious reactions, so they can overcome social editing, i.e., the intentional adjustment of one's expressed emotion. An interesting example of posture detection is the work of Mota et al. [65] on the Body Pressure Measurement System (BPMS), a pressure pad that detects the posture on a chair. Physiological sensors offer an array of signals that ranges from Electromyograms (EMG) on the muscles, to detection of Electrodermal Activity (EDA), to Electrocardiograms (EKG) and Electrooculograms (EOG) [61].

Lastly, emotion can be inferred from the tone and choice of words of a written text or a transcript. The emotional characterization can be at the level of single words, as in the pioneering work of Osgood et al. [66], or based on lexical analysis of corpora of texts. Several projects are attempting to create emotional ratings for common words [61], among which we cite the Affective Norms for English Words (ANEW) [67].

2.3 Multimodal Information Fusion

Introduction by JarJún and Tekniker

It is obvious that a successful HMI system should rely on multiple modalities, for instance audio, visual, and even haptic modalities, and not only on sensory information, but also on emotional (e.g. stress, frustration) information and physiological states (e.g. interest, engagement). Traditionally, multimodal information can be processed at different levels. The lowest level is that of the input or raw data from sensors, for instance pattern recognition (audio-visual speech processing) or multi-biometrics information (EEG, facial expression, EOG) from the user state. The highest level takes place at the output of the system, for instance decision-level processing [68]. Different approaches have been developed to build multimodal decision making: majority voting [69], weighted majority voting [70], Bayesian decision fusion [71], and behaviour knowledge space [72]. Between the lowest and the highest levels, a number of intermediate levels take place, for instance focusing on feature fusion (algorithms for feature extraction and classification into an appropriate feature) [73]. A relatively recent approach to multimodal fusion is the so-called adaptive fusion [74]. The main idea of this approach is to measure the signal quality of each input modality and then use this information at the fusion level. Li & Ji developed a probabilistic method to make decisions on assistance depending on the utility of such assistance [75].
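Of the decision-level approaches listed above, weighted majority voting is the simplest to sketch (the modality names and weights below are illustrative, not from [70]):

```python
import numpy as np

# Decision-level fusion by weighted majority voting: each modality casts a
# vote for a class, weighted e.g. by its estimated reliability; the class
# with the largest total weight wins.
def weighted_majority_vote(votes, weights, n_classes):
    scores = np.zeros(n_classes)
    for cls, w in zip(votes, weights):
        scores[cls] += w
    return int(np.argmax(scores))

# Three modality classifiers (say face, voice, physiology) voting over three
# user states; the more reliable channel outvotes two weaker dissenters.
votes = [2, 0, 1]               # class predicted by each modality
weights = [0.6, 0.25, 0.15]     # per-modality reliability (illustrative)
fused = weighted_majority_vote(votes, weights, n_classes=3)
```

Setting all weights equal reduces this to plain majority voting; adaptive fusion in the sense of [74] would amount to re-estimating the weights online from each modality's signal quality.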


Fusion techniques are often employed for merging information from sensors [61]. Data fusion works at the lowest level, merging raw data streams that are therefore synchronous and have the same temporal resolution. Feature fusion works at the level of characteristics extracted from the signals. Finally, decision fusion works at the highest level and consists in applying techniques for merging the outputs of expert classifiers. Model-based fusion, based on existing knowledge and methods, although largely unexplored, has also been advocated [64].


2.3.1 Overview (IRIDA)

Fusion of distinct modalities is one of the features that distinguishes multimodal interfaces from unimodal ones. The challenge is to increase the robustness of an analysis system by combining meaningful information from different modalities. Fusion can be executed at three levels: (a) data-based fusion, which can be implemented when dealing with multiple signals coming from similar modality sources; (b) feature-based fusion, which can be used on the basis of combining commonly extracted features; and (c) decision-based fusion, which tries to integrate multiple decisions from different sources into a single one. Three different types of architecture can in turn manage decision-level fusion: frame-based architectures, unification-based architectures, and hybrid symbolic/statistical fusion architectures.

In the case of context-aware data, the multimodal nature creates an essential need for information fusion for its analysis, indexing and retrieval. Fusion also greatly impacts other tasks like object recognition, since all objects exist in multimodal spaces. Besides the more classical data fusion approaches in robotics, image processing and pattern recognition [76], the information retrieval community discovered some years ago its power in combining multiple information sources [77][78]. To enhance human-computer communication, multimodal interaction has developed vastly in the last few years.




Figure 1. Different levels of information fusion coming from different audio-visual sources.









ASTUTE intends to work on the development of novel methods of multi-modal, multi-level fusion that integrate contextual information obtained from spoken input and visual scene analysis. By taking into account the user's context together with the emotional and psychological states, the system will interact naturally with humans in order to interpret human behaviour.

Nowadays, it is possible to design a specialized system with some of the functionality needed for effective human-machine interaction (HMI). This has become possible by extracting information arriving simultaneously from different communication modalities and combining it into one or more unified and coherent representations of the user's intention. The context-aware, user-centred application should accept spontaneous multi-modal input: speech, gestures (pointing, iconic, possibly metaphoric) and physical actions; it should react to events, identify the user's preferences, recognize intentions and emotions, and possibly predict the user's behaviour and generate the system's own response. Next-generation HMI designs need to include the essence of emotional intelligence, specifically the ability to recognize a user's affective states, in order to become more efficient, more effective, and more human-like. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions) [79].

In the practice of system design, the following points are considered: sensors or sources of information; selection of the most relevant features of the signals; fusion level, fusion strategy and fusion architecture; and whether further background or domain knowledge can be embedded. In order to capture the information, one uses different types of sensors, i.e., microphones to capture the audio signal, cameras to capture live video images, and 3D sensors to directly capture the surface information in real time. Apart from audio and visual information, as mentioned before, humans also rely on the haptic modality, smell and taste. From this basic sensory information, higher cues such as 3D and temporal information, as well as emotional (e.g., stress, frustration) and psychological states (e.g., interest), can also be derived [80]. The fusion of information from heterogeneous sensors is crucial to the effectiveness of a multimodal system. Exploiting the dependencies between features and modalities will yield maximal performance [81]. Fusing the multimodal data results in a large increase in recognition rates in comparison with unimodal systems [80].


2.3.2 Multisensor Data Fusion (PoLiMi)

2.3.2.1 Introduction

The concept of multisensor data fusion is hardly new. As humans and animals evolved, they developed the ability to use multiple senses to help them survive. For example, assessing the quality of an edible substance may not be possible using only the sense of vision; the combination of sight, touch, smell, and taste is far more effective. Similarly, when vision is limited by structures and vegetation, the sense of hearing can provide advance warning of impending dangers. Thus, multisensor data fusion is naturally performed by animals and humans to assess the surrounding environment more accurately and to identify threats, thereby improving their chances of survival. Interestingly, recent applications of data fusion [82] have combined data from an artificial nose and an artificial tongue using neural networks and fuzzy logic.

Although the concept of data fusion is not new, the emergence of new sensors, advanced processing techniques, improved processing hardware, and wideband communications has made real-time fusion of data increasingly viable. Just as the advent of symbolic processing computers (e.g., the Symbolics computer and the Lambda machine) in the early 1970s provided an impetus to artificial intelligence, the recent advances in computing and sensing have provided the capability to emulate, in hardware and software, the natural data fusion capabilities of humans and animals. Currently, data fusion systems are used extensively for target tracking, automated identification of targets, and limited automated reasoning applications. Data fusion technology has rapidly advanced from a loose collection of related techniques to an emerging true engineering discipline with a standardized terminology, a collection of robust mathematical techniques, and an established system of design principles.

Fused data from multiple sensors provide several advantages over data from a single sensor. First, if several identical sensors are used (e.g., identical radars tracking a moving object), combining the observations results in an improved estimate of the target position and velocity. A statistical advantage is gained by adding the N independent observations (e.g., the estimate of the target location or velocity is improved by a factor proportional to N^(1/2)), assuming the data are combined in an optimal manner. The same result could also be obtained by combining N observations from an individual sensor.
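The N^(1/2) advantage is easy to verify numerically with a small Monte Carlo sketch (synthetic data, illustrative only):

```python
import numpy as np

# The N^(1/2) statistical advantage: averaging N independent, equally noisy
# observations shrinks the standard error by a factor of sqrt(N).
rng = np.random.default_rng(0)
sigma, N, trials = 1.0, 16, 20000
obs = rng.normal(loc=5.0, scale=sigma, size=(trials, N))
fused = obs.mean(axis=1)        # optimal combination for equal, independent noise
single_err = obs[:, 0].std()    # ~ sigma = 1.0
fused_err = fused.std()         # ~ sigma / sqrt(N) = 0.25
```

With N = 16, the fused estimate's scatter is about a quarter of a single observation's, exactly the sqrt(16) = 4 factor claimed above.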

The second advantage is that the relative placement or motion of multiple sensors can be used to improve the observation process. For example, two sensors that measure angular directions to an object can be coordinated to determine the position of the object by triangulation. This technique is used in surveying and in commercial navigation (e.g., VHF omnidirectional range [VOR]). Similarly, two sensors, one moving in a known way with respect to the other, can be used to measure an object's position and velocity instantaneously with respect to the observing sensors.

The third advantage gained by using multiple sensors is improved observability. Broadening the baseline of physical observables can result in significant improvements.








Figure 2. A moving object observed by both a pulsed radar and an infrared imaging sensor.

Figure 2 provides a simple example of a moving object, such as an aircraft, that is observed by both a pulsed radar and a forward-looking infrared (FLIR) imaging sensor. The radar can accurately determine the aircraft's range but has a limited ability to determine its angular direction. By contrast, the infrared imaging sensor can accurately determine the aircraft's angular direction but cannot measure the range. If these two observations are correctly associated (as shown in Figure 2), the combination of the two sensors provides a better determination of location than could be obtained by either of the two independent sensors. This results in a reduced error region, as shown in the fused or combined location estimate. A similar effect may be obtained by determining the identity of an object on the basis of the observations of the object's attributes.
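The radar/FLIR complementarity can be sketched in 2D: take the accurate range from the radar and the accurate bearing from the infrared sensor (all positions and error magnitudes below are invented for illustration):

```python
import numpy as np

# Fuse the radar's range with the IR sensor's bearing to locate a target
# in 2D; both sensors are assumed co-located at the origin.
def fuse_range_bearing(radar_range, bearing_rad):
    return np.array([radar_range * np.cos(bearing_rad),
                     radar_range * np.sin(bearing_rad)])

target = np.array([3000.0, 4000.0])              # true position, metres
true_range = np.linalg.norm(target)              # 5000 m
true_bearing = np.arctan2(target[1], target[0])

radar_range = true_range + 2.0                   # radar: small range error
ir_bearing = true_bearing + 1e-4                 # FLIR: small bearing error
est = fuse_range_bearing(radar_range, ir_bearing)
err = np.linalg.norm(est - target)

# For comparison: the radar's own, much coarser bearing (0.02 rad error)
# would miss the target by roughly range * 0.02 ≈ 100 m.
radar_only_err = np.linalg.norm(
    fuse_range_bearing(radar_range, true_bearing + 0.02) - target)
```

The fused position error is a few metres, versus roughly a hundred metres using the radar alone: the reduced error region of Figure 2 in miniature.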

2.3.2.2 Possible architectures

Three basic alternatives can be used for multisensor data:

1. direct fusion of sensor data (Figure 3);

2. representation of sensor data via feature vectors, with subsequent fusion of the feature vectors (Figure 4);

3. processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined (Figure 5).








Figure 3. Direct fusion of sensor data.

Figure 4. Representation of sensor data via feature vectors, with subsequent fusion of the feature vectors.








Figure 5. Processing of each sensor to achieve high-level inferences or decisions, which are subsequently combined.

If the multisensor data are commensurate (i.e. the sensors are measuring the same physical phenomena, such as two visual image sensors or two acoustic sensors), then the raw sensor data can be directly combined. Techniques for raw data fusion typically involve classic estimation methods such as Kalman filtering [82]. Conversely, if the sensor data are non-commensurate, then the data must be fused at the feature/state vector level or decision level.
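Raw-data fusion of commensurate streams via Kalman filtering can be sketched in its simplest form, a scalar filter estimating a constant quantity from two sensors of different accuracy (synthetic data; a real tracker adds a dynamic prediction step):

```python
import numpy as np

# Minimal scalar Kalman filter estimating a constant from two commensurate
# sensor streams: each update is an inverse-variance weighted blend of the
# current estimate and the new observation.
def kalman_constant(measurements, variances, x0=0.0, p0=1e6):
    x, p = x0, p0                    # state estimate and its variance
    for z, r in zip(measurements, variances):
        k = p / (p + r)              # Kalman gain
        x = x + k * (z - x)          # state update
        p = (1.0 - k) * p            # variance update
    return x, p

rng = np.random.default_rng(0)
truth = 10.0
z1 = truth + rng.normal(0.0, 0.5, 100)   # sensor 1: sigma = 0.5
z2 = truth + rng.normal(0.0, 1.0, 100)   # sensor 2: sigma = 1.0
zs = np.concatenate([z1, z2])
rs = np.concatenate([np.full(100, 0.25), np.full(100, 1.0)])  # noise variances
x_est, p_est = kalman_constant(zs, rs)
```

Because the filter weights each stream by its noise variance, the more accurate sensor dominates the fused estimate automatically.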

Feature-level fusion involves the extraction of representative features from sensor data. An example of feature extraction is the cartoonist's use of key facial characteristics to represent the human face. This technique, popular among political satirists, uses key features to evoke the recognition of famous figures. Evidence confirms that humans utilize a feature-based cognitive function to recognize objects [83]. In the case of multisensor feature-level fusion, features are extracted from multiple sensor observations and combined into a single concatenated feature vector that serves as input to pattern recognition techniques such as neural networks, clustering algorithms, or template methods.
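The concatenation step can be sketched with a trivial classifier downstream (the sensor names and vectors are invented; a nearest-centroid rule stands in for a neural network or template method):

```python
import numpy as np

# Feature-level fusion: features extracted from each sensor are concatenated
# into one vector, which then feeds a pattern recognizer.
def fuse_features(*feature_vectors):
    return np.concatenate(feature_vectors)

def nearest_centroid(x, centroids):
    d = np.linalg.norm(centroids - x, axis=1)
    return int(np.argmin(d))

# Two hypothetical sensors, two classes: each 2-D feature alone may be
# ambiguous, but the concatenated 4-D vector separates the classes cleanly.
acoustic = np.array([1.0, 0.0])
infrared = np.array([0.0, 1.0])
x = fuse_features(acoustic, infrared)          # 4-D fused feature vector
centroids = np.array([[1.0, 0.0, 0.0, 1.0],    # class 0 centroid
                      [0.0, 1.0, 1.0, 0.0]])   # class 1 centroid
label = nearest_centroid(x, centroids)
```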

Decision-level fusion combines the sensor information after each sensor has made a preliminary determination of an entity's location, attributes, and identity. Examples of decision-level fusion methods include weighted decision methods (voting techniques), classical inference, Bayesian inference, and the Dempster-Shafer method.
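Of these, Bayesian decision fusion has the simplest closed form: under a conditional-independence assumption, per-sensor class likelihoods are multiplied with the prior and renormalized (the class names and numbers below are illustrative):

```python
import numpy as np

# Bayesian decision-level fusion: assuming conditionally independent sensors,
# multiply the prior by each sensor's class-likelihood vector elementwise,
# then renormalize to obtain the fused posterior.
def bayes_fuse(prior, *sensor_likelihoods):
    fused = np.array(prior, dtype=float)
    for p in sensor_likelihoods:
        fused *= np.asarray(p)
    return fused / fused.sum()

# Two sensors classifying a target into {friend, foe, neutral}.
prior = [1 / 3, 1 / 3, 1 / 3]
s1 = [0.7, 0.2, 0.1]        # sensor 1 leans strongly "friend"
s2 = [0.6, 0.3, 0.1]        # sensor 2 agrees, less confidently
fused = bayes_fuse(prior, s1, s2)
```

Agreement between sensors sharpens the fused posterior beyond either individual one, which is the statistical content of the "reduced error region" argument above.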

2.3.2.3 Data Fusion process definition

One of the historical barriers to technology transfer in data fusion has been the lack of a unifying terminology that crosses application-specific boundaries. Even within military applications, related but distinct applications, such as IFF, battlefield surveillance, and automatic target recognition, used different definitions for fundamental terms such as correlation and data fusion. To improve communications among military researchers and system developers, the Joint Directors of Laboratories (JDL) Data Fusion Working Group (established in 1986) began an effort to codify the terminology related to data fusion. The result of that effort was the creation of a process model for data fusion and a data fusion lexicon, represented in Figure 6.
.


Figure 6. Joint Directors of Laboratories process model for data fusion.

The JDL process model, which is intended to be very general and useful across multiple application areas, identifies the processes, functions, categories of techniques, and specific techniques applicable to data fusion. The model is a two-layer hierarchy. At the top level, shown in Figure 6, the data fusion process is conceptualized by sensor inputs, human-computer interaction, database management, source preprocessing, and six key subprocesses:

Level 0 processing (sub-object data association and estimation) is aimed at combining pixel or signal level data to obtain initial information about the characteristics of an observed target.

Level 1 processing (object refinement) is aimed at combining sensor data to obtain the most reliable and accurate estimate of an entity's position, velocity, attributes, and identity (to support prediction estimates of future position, velocity, and attributes).

Level 2 processing (situation refinement) dynamically attempts to develop a description of the current relationships among entities and events in the context of their environment. This entails object clustering and relational analysis such as force structure and cross-force relations, communications, physical context, etc.







Level 3 processing (significance estimation) projects the current situation into the future to draw inferences about enemy threats, friend and foe vulnerabilities, and opportunities for operations (and also prediction of consequences, susceptibility, and vulnerability assessments).

Level 4 processing (process refinement) is a meta-process that monitors the overall data fusion process to assess and improve real-time system performance. This is an element of resource management.

Level 5 processing (cognitive refinement) seeks to improve the interaction between a fusion system and one or more users/analysts. Functions performed include aids for visualization, cognitive assistance, bias remediation, collaboration, team-based decision making, course-of-action analysis, etc.

The data fusion process model is augmented by a hierarchical taxonomy that identifies categories of techniques and algorithms for performing the identified functions. An associated lexicon has been developed to provide a consistent definition of data fusion terminology. See [84] for further details.

2.3.2.4 State of the art

The technology of multisensor data fusion is rapidly evolving. Much simultaneous research is ongoing to develop new algorithms, to improve existing algorithms, and to assemble these techniques into an overall architecture capable of addressing diverse data fusion applications.

The most mature area of the data fusion process is level 1 processing: using multisensor data to determine the position, velocity, attributes, and identity of individual objects or entities. Determining the position and velocity of an object on the basis of multiple sensor observations is a relatively old problem; Gauss and Legendre developed the method of least squares for determining the orbits of asteroids [85]. Numerous mathematical techniques exist for performing coordinate transformations in space, associating observations to other observations or to tracks, and estimating the position and velocity of a target. Multisensor target tracking is dominated by sequential estimation techniques such as the Kalman filter. Challenges in this area involve circumstances in which there is a dense target environment, rapidly manoeuvring targets, or complex signal propagation environments (e.g., involving multipath propagation, co-channel interference, or clutter). However, single-target tracking in excellent signal-to-noise environments for dynamically well-behaved (i.e., dynamically predictable) targets is a straightforward, easily resolved problem.

Current research focuses on solving the assignment and manoeuvring target problem. Techniques such as multiple-hypothesis tracking (MHT) and its extensions, probabilistic data association methods, random set theory, and multiple criteria optimization theory are being used to resolve these issues. Recent studies have also focused on relaxing the assumptions of the Kalman filter using techniques such as particle filters and other methods. Some researchers are utilizing multiple techniques simultaneously, guided by a knowledge-based system capable of selecting the appropriate solution on the basis of algorithm performance.

A special problem in level 1 processing involves the automatic identification of targets on the basis of observed characteristics or attributes. To date, object recognition has been dominated by feature-based methods in which a feature vector (i.e., a representation of the sensor data) is mapped into feature space with the hope of identifying the target on the basis of the location of the feature vector relative to decision boundaries determined a priori.

Popular pattern recognition techniques include neural networks, statistical classifiers, and support vector machine approaches. Although numerous techniques are available, the ultimate success of these methods relies on the selection of good features. (Good features provide excellent class separability in feature space, whereas bad features result in greatly overlapping feature-space areas for several classes of target.) More research is needed in this area to guide the selection of features and to incorporate explicit knowledge about target classes. For example, syntactic methods provide additional information about the makeup of a target. In addition, some limited research is proceeding to incorporate contextual information, such as target mobility with respect to terrain, to assist in target identification.

Level 2 and level 3 fusion (situation refinement and threat refinement) are currently dominated by knowledge-based methods such as rule-based blackboard systems, intelligent agents, Bayesian belief network formulations, etc. These areas are relatively immature: there are numerous prototypes, but few robust, operational systems. Many efforts of the ASTUTE use-cases will focus on improving these levels. The main challenge in this area is to establish a viable knowledge base of rules, frames, scripts, or other methods to represent knowledge about situation assessment or threat assessment. Unfortunately, only primitive cognitive models exist to replicate the human performance of these functions. Much research is needed before reliable and large-scale knowledge-based systems can be developed for automated situation assessment and threat assessment. New approaches that offer promise are the use of fuzzy logic and hybrid architectures, which extend the concept of blackboard systems to hierarchical and multi-time-scale orientations.

Another significant approach is the one proposed by [86] on team-based intelligent agents. These agents emulate the way human teams collaborate, proactively exchanging information and anticipating information needs.

Level 4 processing, which assesses and improves the performance and operation of an ongoing data fusion process, has a mixed maturity. For single-sensor operations, techniques from operations research and control theory have been applied to develop effective systems, even for complex single sensors such as phased array radars. By contrast, situations that involve multiple sensors, external mission constraints, dynamic observing environments, and multiple targets are more challenging. To date, considerable difficulty has been encountered in attempting to model and incorporate mission objectives and constraints to balance optimized performance with limited resources, such as computing power and communication bandwidth (e.g., between sensors and processors), and other variables. Methods from utility theory are being applied to develop adequate measures of system performance and effectiveness. Knowledge-based systems are being developed for context-based approximate reasoning. Significant improvements would result from the advent of smart, self-calibrating sensors, which can accurately and dynamically assess their own performance.

The advent of distributed network-centric environments, in which sensing resources, communications capabilities, and information requests are very dynamic, creates serious challenges for level 4 fusion. It is difficult (or perhaps impossible) to optimize resource utilization in such an environment. In [87] the authors applied concepts from market-based auctions to dynamically allocate resources, treating sensors and communication systems as suppliers of services, and users and algorithms as consumers, in order to rapidly assess how to allocate system resources to satisfy the consumers of information.

Data fusion has suffered from a lack of rigor with regard to the testing and evaluation of algorithms and the translation of research findings from theory to applications. The data fusion community must insist on high standards for algorithm development, test, and evaluation; the creation of standard test cases; and the systematic evolution of the technology to meet realistic uses. It is particularly important for ASTUTE partners to follow these guidelines as much as possible during use-case development in order to produce high-quality and reusable applications.

2.3.2.5 Dirty Secrets in Data Fusion

In the following we include a significant and shrewd set of issues from the original article by Hall [88].

Seven challenges in data fusion:

1. There is no substitute for a good sensor.
2. Downstream processing cannot absolve the sins of upstream processing.
3. The fused answer may be worse than the best sensor.
4. There are no magic algorithms.
5. There will never be enough training data.
6. It is difficult to quantify the value of data fusion.
7. Fusion is not a static process.

In the following we provide a detailed explanation:



There is still no substitute for a good sensor (and a good human to interpret the results): this means that if something cannot actually be observed, or inferred from its effects, then no amount of data fusion from multiple sensors will overcome this problem. The problem becomes even more challenging as threats change. The transition from the search for well-known physical targets (e.g., weapon systems, emitters, etc.) to targets based on human networks causes obvious issues with determining what can and should be observed. In particular, trying to determine intent is tantamount to mind reading, and is an elusive problem.



Downstream processing still cannot absolve upstream sins (or lack of attention to the data): it is clear that we must do the best processing possible at every step of the fusion/inference process. For example, it is necessary to perform appropriate image and signal processing at the data stage, followed by appropriate transformations to extract feature vectors, etc., for feature-based identity processing. Failure to perform the appropriate data processing, or failure to select and refine effective feature vectors, cannot be overcome by choosing complex pattern recognition techniques. We simply must pay attention at every stage of the information chain, from energy detection to knowledge creation.



Not only may the fused result be worse than the best sensor, but failure to address pedigree, information overload, and uncertainty may really foul things up: the rapid introduction of new sensors and the use of humans as “soft sensors” (reporters) in network operations place special challenges on determining how to weight the incoming data. Failure to accurately assess the accuracy of the sensor/input data will lead to biases and errors in the fused results. The advent of networked operations and service-oriented architectures (SOA) can exacerbate this problem by rapidly disseminating data and information without understanding the sources or pedigree (who did what to the data).
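The point that a fused answer can be worse than the best sensor is easy to show numerically. The sketch below is our own toy example (not from Hall [88]), using standard inverse-variance weighting: when the bad sensor's error is modelled honestly, fusion stays near the truth, but an overconfident variance estimate for the bad sensor drags the fused result well below the accuracy of the good sensor alone.

```python
def fuse(measurements, variances):
    """Inverse-variance weighted average of scalar measurements."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, measurements)) / total

truth = 10.0
good, bad = 10.1, 14.0  # good sensor error 0.1, bad sensor error 4.0

# Honest variances: the bad sensor is heavily down-weighted.
honest = fuse([good, bad], [0.1, 16.0])

# Overconfident model of the bad sensor: both inputs weighted equally.
overconfident = fuse([good, bad], [0.1, 0.1])

print(abs(honest - truth), abs(overconfident - truth))
# honest error ≈ 0.12, overconfident error ≈ 2.05 (worse than the good sensor)
```

This is exactly the pedigree problem in miniature: the fusion rule is sound, but without an accurate assessment of each source's quality the fused output is biased toward the worst input.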



There are still no magic algorithms: the original book provides an overview of numerous algorithms and techniques for all levels of fusion. Although there are increasingly sophisticated algorithms, it is always a challenge to match the algorithm with the actual state of knowledge of the data, system, and inferences to be made. No single algorithm is ideal under all circumstances.



There will never be enough training data; however, hybrid methods that combine implicit and explicit information can help: it is well known that pattern recognition methods, such as neural networks, require training data to establish the key weights. When seeking to map an n-dimensional feature vector to one of m classes or categories, we need in general n × m × (10-30) training examples under a variety of observing conditions. These can be very challenging to obtain, especially with dynamically changing threats. Hence, in general, there will never be enough training data available to satisfy the mathematical conditions for pattern recognition techniques. However, new hybrid methods that use a combination of sample data, model-based data, and explicit information from human subjects can assist in this area.
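The n × m × (10-30) rule of thumb above is easy to make concrete. The helper below is a hypothetical sketch (the function name and the example numbers are ours, not from the source); it shows that even a modest classifier with 20 features and 10 classes needs thousands of labelled examples.

```python
def required_examples(n_features, n_classes, factor=10):
    """Training-set size per the n x m x (10..30) rule of thumb.

    factor is the per-cell multiplier: 10 at the optimistic end of the
    range, 30 at the conservative end.
    """
    return n_features * n_classes * factor

# 20-dimensional feature vector, 10 target classes:
low = required_examples(20, 10, factor=10)   # optimistic end
high = required_examples(20, 10, factor=30)  # conservative end
print(low, high)  # → 2000 6000
```

And each of those examples should ideally be collected under a variety of observing conditions, which multiplies the burden further; this is why the text argues the requirement can rarely be met in practice.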





We have started at “the wrong end” (viz., at the sensor side vs. at the human side of fusion): finally, we note that extensive research has been conducted to develop methods for level 0 and level 1 fusion. In essence, we have “started at the data side or sensor inputs” and progressed toward the human side. More research needs to be conducted in which we begin at the human side (viz., at the formation of hypotheses or semantic interpretation of events) and proceed toward the sensing side of fusion. Indeed, the introduction of the level 5 process was a recognition of this need.

Even though the original article is more than 11 years old, we believe that all of these issues still hold true.

Overall, this is an exciting time for the field of data fusion, and ASTUTE is a big opportunity to invest in it. The rapid advances and proliferation of sensors, the global spread of wireless communications, and the rapid improvements in computer processing and data storage enable new applications and methods to be developed.

2.3.3 Data retrieval and exchange (ATOS)

The potential sources of data for a multimodal system are diverse, and many technologies exist to retrieve various types of useful data. In Table 1 we have grouped the set of basic technologies that are going to be applied in ASTUTE, together with their uses and their requirements.


Table 1. Set of technologies to be applied in ASTUTE:

| Basic technology | Technological use | Requirements |
| --- | --- | --- |
| 2G/3G | Access to low latency conversational services (i.e. voice) | Latency under 0.1 s; Push-To-Talk availability; voice quality standards fulfilment |
| 2G/3G | Best effort interactive services access (Internet); non-interactive services access (high speed data connection) | Always-on; push and pull connections; ADSL-like connection speed (2 Mb/s) |

Location