MediaHub: An Intelligent Multimedia Distributed Hub


- for decision-making (fusion and synchronisation) on Language and Vision data

Glenn Campbell

Supervisors: Prof. Paul Mc Kevitt, Dr. Tom Lunney.

Research Plan, Faculty of Engineering, University of Ulster, Magee, Derry.


Abstract

The objective of the work outlined in this research plan is the development of an intelligent multimedia distributed hub for the fusion and synchronisation of language and vision data, namely MediaHub. MediaHub will integrate and synchronise language and vision data in such a way that the two modalities are complementary to each other. Methods of semantic representation and decision-making (fusion and synchronisation) in existing multimedia platforms are reviewed here. A potential unique contribution is identified on how MediaHub, with a new approach to decision-making, could improve on current systems. A research proposal for MediaHub, which will exist and be tested within an existing multimodal platform, and a 3-year research plan are given.

Keywords: intelligent multimedia, distributed systems, multimodal synchronisation, multimodal fusion, multimodal semantic representation, knowledge representation, intelligent multimedia interfaces, decision-making, Bayesian networks.


1. Introduction

The area of intelligent multimedia has in recent years seen considerable work on creating user interfaces that can accept multimodal input. This has led to the development of intelligent interfaces that can learn to meet the needs of the user, in contrast to traditional systems where the onus was on the user to learn to use the interface. A more natural form of human-computer interaction has resulted from the development of systems that allow multimodal input such as natural language, eye and head tracking and 3D gestures (Maybury 1993). Considerable work has also been completed in the area of semantic or knowledge representation for language and vision, with the development of several semantic markup languages, such as XHTML + Voice (XHTML + Voice 2004, IBM X + V 2004) and the Synchronised Multimedia Integration Language (SMIL) (Rutledge 2001, Rutledge & Schmitz 2001, SMIL 2004a, 2004b). Frame-based methods of semantic representation have also been used extensively (Mc Kevitt 2003). Efforts have also been made to integrate natural language and vision processing, and some of the approaches in this field are described in Mc Kevitt (1995a,b, 1996a,b) and Mc Kevitt et al. (2002).



1.1 Objectives of this Research

The principal aim of this research is to develop MediaHub, a distributed hub for decision-making over multimodal information, specifically language and vision data. The primary objectives of this research are to:

- Interpret/generate semantic representations of multimodal input/output.
- Perform decision-making (fusion and synchronisation) on multimodal data.
- Implement MediaHub, a multimodal platform hub.

In pursuing the three objectives outlined above, several research questions will need to be answered. For example:




- Will MediaHub use frames for semantic representation, or will it use XML or one of its derivatives?
- How will MediaHub communicate with various elements of a multimodal platform?
- Will MediaHub constitute a blackboard or non-blackboard model?
- What mechanism will be implemented for decision-making within MediaHub?




These questions will be answered in the design and implementation of MediaHub, a multimodal platform hub. MediaHub will be tested within an existing multimodal platform such as CONFUCIUS (Ma & Mc Kevitt 2003) using multimodal input/output data.


2. Literature Review

This section provides a review of literature relevant to the design and implementation of MediaHub. Section 2.1 provides a review of the area of distributed processing, whilst section 2.2 looks at existing multimodal distributed platforms.


2.1 Distributed Processing

The area of distributed computing has been exploited to create platforms that are human-centred and directly address the needs of the user, i.e. systems that allow input that suits the needs and preferences of each individual user. Recent advances in the area of distributed systems have seen the development of several software tools for distributed processing. These tools can be, and are, used in the creation of many different distributed platforms. PVM (Parallel Virtual Machine) (Sunderam 1990, Fink et al. 1995) is a programming environment that provides a unified framework where large parallel processing systems can be developed. It caters for the development of large concurrent or parallel applications that consist of interacting, but relatively independent, components. ICE (Amtrup 1995) is a communication mechanism for AI projects developed at the University of Hamburg. ICE is based on PVM, with an additional layer added to interface with several programming languages, including C, C++ and Lisp. Support for visualisation is provided by the use of the Tcl/Tk scripting language. DACS (Fink et al. 1995, 1996) is a powerful tool for system integration that provides a multitude of useful features for developing and maintaining distributed systems. Communication within DACS is based on simple asynchronous message passing. All messages that are passed within DACS are encoded in a Network Data Representation, which makes it possible to inspect data at any point in the system and to develop generic tools capable of processing all kinds of data.
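The asynchronous message-passing style that DACS is built on can be sketched in a few lines. The module names, message fields and JSON encoding below are invented for illustration (DACS itself uses its own Network Data Representation and C-level API); the point is that senders never block waiting for a reply, and every message travels in a common encoding that any module can inspect:

```python
import json
import queue
import threading

# Each module owns an inbox; a shared encoding (JSON here, standing in for a
# Network Data Representation) lets any module decode any message.
inboxes = {"speech": queue.Queue(), "vision": queue.Queue()}

def send(dest, sender, content):
    """Encode the message in the shared representation and enqueue it; never blocks."""
    inboxes[dest].put(json.dumps({"from": sender, "content": content}))

def receive(name):
    """Block until a message arrives in this module's inbox, then decode it."""
    return json.loads(inboxes[name].get())

def vision_module():
    # Consume one event and asynchronously acknowledge it to the speech module.
    msg = receive("vision")
    send("speech", "vision", {"ack": msg["content"]})

t = threading.Thread(target=vision_module)
t.start()
send("vision", "speech", "user pointed at office 123")
t.join()
reply = receive("speech")
print(reply["content"]["ack"])  # the acknowledged event text
```

Because messages are self-describing, a generic monitoring tool could sit on any inbox and log traffic without knowing the modules involved, which is the property the DACS designers highlight.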

The Open Agent Architecture (OAA) (Cheyer et al. 1998, OAA 2004) is a general-purpose infrastructure for creating systems that contain multiple software agents. OAA allows such agents to be written in different programming languages and run on different platforms. All agents interact using the InterAgent Communication Language (ICL). ICL is a logic-based declarative language used to express high-level, complex tasks and natural language expressions. JATLite (Kristensen 2001, Jeon et al. 2000) provides a set of Java packages that enable multi-agent systems to be constructed using Java. JATLite provides a Java agent platform that uses the KQML (Knowledge Query and Manipulation Language) Agent Communication Language (ACL) (Finin et al. 1994) for inter-agent communication. KQML is a message format and message-handling protocol used to support knowledge sharing among agents.

JavaSpaces (Freeman 2004), developed by Sun Microsystems, is a simple but powerful distributed programming tool that allows developers to quickly create collaborative and distributed applications. JavaSpaces represents a new distributed computing model where, in contrast to conventional network tools, processes do not communicate directly. Instead, processes exchange objects through a space, or shared memory. CORBA (Vinoski 1993) is a specification released by the Object Management Group (OMG) in 1991. A major component of CORBA is the Object Request Broker (ORB), which delivers requests to objects and returns results back to the client. The operation of the ORB is completely transparent to the client. That is, the client doesn't need to know where the objects are, how they communicate, or how they are implemented, stored or executed. CORBA uses the Interface Description Language (IDL), with a syntax similar to C++, to describe object interfaces.
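The space-based model behind JavaSpaces can be illustrated with a toy tuple space. This is only a hedged sketch (the real JavaSpaces API is Java-based and much richer), but it shows the key idea: processes never address each other directly, they write entries into a shared space and take matching entries out of it:

```python
import threading

class Space:
    """A toy tuple space: write() publishes an entry, take() removes a match."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        # Publish an entry; any waiting taker is woken to re-check for a match.
        with self._cond:
            self._entries.append(entry)
            self._cond.notify_all()

    def take(self, match):
        # Block until an entry satisfying the predicate exists, then remove it.
        with self._cond:
            while True:
                for e in self._entries:
                    if match(e):
                        self._entries.remove(e)
                        return e
                self._cond.wait()

space = Space()
space.write({"task": "parse", "data": "a man walks in"})
job = space.take(lambda e: e.get("task") == "parse")
print(job["data"])  # a man walks in
```

Decoupling producers from consumers in this way means a worker process can be added or removed without any other process knowing, which is the attraction of the space model for distributed applications.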


2.2 Multimodal Platforms

Numerous intelligent multimedia distributed platforms currently exist. With respect to these platforms, of particular interest are their methods of semantic representation, storage and decision-making (fusion and synchronisation). With respect to semantic representation, EMBASSI (Kirste 2001, EMBASSI 2004), Psyclone (Psyclone 2004), SmartKom (Wahlster 2003, Wahlster et al. 2001, SmartKom 2004) and MIAMM (Reithinger et al. 2002, MIAMM 2004) all use an XML-based method of semantic representation. XML (eXtensible Markup Language) (W3C 2004) was originally designed for use in large-scale electronic publishing, but is now used extensively in the exchange of data via the web. Any programming language can be used to manipulate data in XML, and a large amount of middleware exists for managing data in XML format. It is common for a derivative of XML to be used for semantic representation. For example, SmartKom uses an XML-based mark-up language, M3L (MultiModal Markup Language), to semantically represent information passed between the various components of the platform. Similarly, the exchange of information within MIAMM is facilitated through MMIL (Multi-Modal Interface Language), which is also based on XML.
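The attraction of XML-based semantic representation is that it can be produced and consumed with ordinary, widely available middleware. The fragment below is hypothetical (its element names are invented for illustration and are not taken from the actual M3L or MMIL schemas), parsed here with Python's standard library XML parser:

```python
import xml.etree.ElementTree as ET

# A hypothetical semantic representation of a gesture event, in the general
# spirit of M3L/MMIL-style markup. Element and attribute names are invented.
fragment = """
<event modality="gesture" time="12.40">
  <intent confidence="0.85">point</intent>
  <referent>office-123</referent>
</event>
"""

event = ET.fromstring(fragment)
intent = event.find("intent")
print(event.get("modality"))                  # gesture
print(intent.text, intent.get("confidence"))  # point 0.85
print(event.find("referent").text)            # office-123
```

Because the representation is plain XML, any component of a platform, whatever its implementation language, can inspect or transform such an event with off-the-shelf tooling.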
AESOPWORLD (Okada 1996), CHAMELEON (Brøndsted et al. 1998, 2001), COLLAGEN (Rich & Sidner 1997), DARBS (Choy et al. 2004, Nolle et al. 2001), the DARPA Galaxy Communicator (Bayer et al. 2001), INTERACT (Waibel et al. 1996), Oxygen (2004), Spoken Image/SONAS (Ó Nualláin et al. 1994, Ó Nualláin & Smith 1994, Kelleher et al. 2000), WAXHOLM (Carlson & Granström 1996) and Ymir (Thórisson 1999) utilise frames. Frames, first introduced by Minsky (1975), are based on human memory and the idea that when humans meet a new problem they select an existing frame (a remembered framework) that can be adapted to fit the new situation. COLLAGEN introduces the concept of a SharedPlan to represent the common goal of a user and a collaborative agent, and uses Sidner's (1994) artificial discourse language as the internal representation for user and agent communication acts.
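A frame for a multimodal event might be sketched as follows. The slot names and the simple time-window fusion rule are invented for illustration, loosely in the spirit of the frames exchanged in CHAMELEON, and are not any system's actual frame syntax:

```python
# A frame is simply a named structure of slots that modules can fill, match
# and adapt. Here a gesture frame and a speech frame describing (roughly)
# the same moment are fused by filling the gesture's empty REFERENT slot.
gesture_frame = {
    "FRAME": "pointing-gesture",
    "SOURCE": "gesture-tracker",
    "TIME": 12.40,
    "LOCATION": (92, 104),   # screen coordinates of the pointing event
    "REFERENT": None,        # empty slot, to be filled during fusion
}

speech_frame = {
    "FRAME": "utterance",
    "SOURCE": "speech-recogniser",
    "TIME": 12.35,
    "WORDS": "whose office is this?",
}

# Fusion rule (illustrative): if the two events fall within half a second of
# each other, treat the gesture as resolving the spoken deictic reference.
if abs(gesture_frame["TIME"] - speech_frame["TIME"]) < 0.5:
    gesture_frame["REFERENT"] = "office pointed at during utterance"

print(gesture_frame["REFERENT"])
```

The appeal of frames for multimodal work is visible even in this sketch: each modality contributes partial slots, and fusion amounts to matching frames and filling in what the other modality left empty.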

With respect to semantic storage, a blackboard was implemented in DARBS, the DARPA Galaxy Communicator, CHAMELEON, Psyclone, SmartKom, Spoken Image/SONAS and Ymir. The DARPA Galaxy Communicator consists of a distributed hub-and-spoke architecture, with communication facilitated via message-passing. The blackboard implemented in CHAMELEON is used to keep track of interactions over time, through representation of semantics using frames. The system consists of ten modules, mostly programmed in C and C++, which are glued together by the DACS communications system. Communication between modules is achieved by exchanging semantic representations, in the form of frames, between themselves or the blackboard. An initial prototype application for CHAMELEON is the IntelliMedia WorkBench (Brøndsted et al. 2001), where the user can ask the system for directions (using speech and pointing gestures) to various offices within a building. Ymir is a computational model for creating autonomous creatures capable of human-like communication with real users. Ymir represents a distributed, modular approach that bridges between multimodal perception, decision and action in a coherent framework. There are three main blackboards implemented in Ymir, and communication is achieved via message passing. Psyclone introduces the concept of a 'Whiteboard', which is essentially a blackboard that is capable of handling media streams. Psyclone allows software to be easily distributed across multiple machines and enables communication management using rich messages formatted in XML. Non-blackboard models are implemented in AESOPWORLD, COLLAGEN, EMBASSI, INTERACT, WAXHOLM, MIAMM and Oxygen. For example, EMBASSI has a highly distributed architecture consisting of many independent components.
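The blackboard idea common to several of these systems can be sketched minimally. This is an assumed, illustrative structure rather than the design of any particular platform: knowledge sources post partial results to a shared store and react to what other sources have posted, with no direct module-to-module calls:

```python
class Blackboard:
    """A shared store of (source, item) entries visible to all knowledge sources."""

    def __init__(self):
        self.entries = []

    def post(self, source, item):
        self.entries.append((source, item))

    def query(self, source):
        # Return everything a given knowledge source has posted so far.
        return [item for src, item in self.entries if src == source]

bb = Blackboard()

def speech_source(board):
    board.post("speech", "whose office is this?")

def gesture_source(board):
    # Reacts to state on the board, not to a direct call from the speech module.
    if board.query("speech"):
        board.post("gesture", "pointing at office 123")

speech_source(bb)
gesture_source(bb)
print(bb.query("gesture"))  # ['pointing at office 123']
```

The decoupling shown here is what makes blackboards attractive for multimodal platforms: a new knowledge source can be added simply by giving it access to the board, without modifying any existing module.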

With respect to decision-making, the rule-based method has been the most popular form of reasoning. However, there is significant interest in using other Artificial Intelligence techniques to assist decision-making in multimodal platforms. For example, the DARBS distributed blackboard system consists of rule-based, neural network and genetic algorithm knowledge sources operating in parallel to solve a problem, such as controlling plasma deposition processes. Although COLLAGEN provides a framework for communicating and recording decisions between the user and an agent, it does not provide a method of decision-making; this is left to the discretion of the developer.


3. Project Proposal

The proposed project is the design and implementation of MediaHub, an intelligent multimedia distributed hub for the fusion and synchronisation of language and vision data. A schematic for MediaHub is shown in Figure 3.1.












[Figure: components shown are the Multimodal Input Modules, Semantic Representation Database, Dialogue Manager, Decision-Making Module and Multimodal Output Modules.]

Figure 3.1: Basic architecture of MediaHub



The key components of MediaHub are the Dialogue Manager (DM), the Semantic Representation Database (SRDB) and the Decision-Making Module (DMM). The role of the DM is to facilitate the interactions between all components of the platform. It will act as a blackboard module, with all communication between components achieved via the DM. The DM will also be responsible for the synchronisation of the multimodal inputs and outputs. During the testing of MediaHub an existing multimodal platform, such as CONFUCIUS (Ma & Mc Kevitt 2003), will be used to perform the processing of the multimodal input and output. The SRDB in MediaHub will use an XML-based method of semantic representation. XML has been chosen due to its widespread use in the area of knowledge and semantic representation in intelligent multimedia. The Decision-Making Module (DMM) will employ an Artificial Intelligence (AI) technique to provide decision-making on language and vision data. Bayesian Networks and CPNs (Causal Probabilistic Networks) will be investigated, to determine if they will be suitable for decision-making. It may also be possible to use other techniques such as Fuzzy Logic, Neural Networks, Genetic Algorithms or a combination of techniques to provide this functionality. The potential for these methods of decision-making under uncertainty will be investigated further before a definitive decision is made on the design of the DMM.
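As a sketch of how Bayesian decision-making over two modalities might look, the toy computation below applies Bayes' rule to fuse speech and gesture evidence about an intended referent. All probabilities are invented for illustration, and the eventual DMM design (e.g. with HUGIN-style CPNs) remains to be determined:

```python
# Two candidate referents with equal prior probability.
priors = {"office-123": 0.5, "office-124": 0.5}

# P(evidence | referent) for each modality, assumed conditionally independent:
speech_likelihood = {"office-123": 0.7, "office-124": 0.3}   # e.g. "the corner office"
gesture_likelihood = {"office-123": 0.8, "office-124": 0.2}  # e.g. pointing direction

# Bayes' rule: posterior ∝ prior × speech likelihood × gesture likelihood.
unnormalised = {
    r: priors[r] * speech_likelihood[r] * gesture_likelihood[r] for r in priors
}
total = sum(unnormalised.values())
posterior = {r: p / total for r, p in unnormalised.items()}

decision = max(posterior, key=posterior.get)
print(decision, round(posterior[decision], 3))  # office-123 0.903
```

Even this two-node example shows the property that makes Bayesian approaches attractive for fusion: weak, uncertain evidence from separate modalities combines into a much more confident joint decision.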


3.1 Software Analysis and Prospective Tools

Several implementations of XML could be used by the SRDB. Initially, XHTML + Voice (XHTML + Voice 2004, IBM X + V 2004) may be a suitable choice, since it combines the vision capabilities of XHTML and the speech capabilities of VoiceXML. Other XML-based languages such as the Synchronised Multimedia Integration Language (SMIL) (Rutledge 2001, Rutledge & Schmitz 2001, SMIL 2004a, 2004b) and MMIL, as used in MIAMM (Reithinger et al. 2002, MIAMM 2004), will also be considered. The HUGIN software tool (HUGIN 2004), a tool implementing Bayesian Networks as CPNs, will be investigated. Other software tools for implementing Fuzzy Logic, Neural Networks and Genetic Algorithms may also be utilised.



4. Comparison to Other Work

Table A.1 in Appendix A compares MediaHub to the hub of other existing platforms. In the table, platform characteristics are listed, with a tick indicating if the characteristics are present for each of the platforms. As shown in the table, INTERACT (Waibel et al. 1996) uses neural networks, while DARBS (Choy et al. 2004, Nolle et al. 2001) implements a combination of rule-based, neural network and genetic algorithm techniques for decision-making. As illustrated, MediaHub, like Psyclone and SmartKom, will implement a blackboard model and will use an XML-based method of semantic representation. MediaHub will improve on the capabilities of Psyclone and SmartKom by implementing a new technique for decision-making, possibly Bayesian networks, CPNs or other techniques such as fuzzy logic, genetic algorithms or neural networks. It may also be possible to use a combination of these techniques.


5. Project Schedule

Table B.1 in Appendix B outlines the plan of work for the completion of this project, together with an indication of the expected completion date for each of the tasks.


6. Conclusion

At this initial stage of the project, the focus has been on investigating the area of distributed computing for intelligent multimedia. The objectives of MediaHub, in providing a distributed hub for the fusion and synchronisation of language and vision data, have been defined. A review of various existing distributed systems and multimodal platforms has given an insight into the recent advancements and achievements in intelligent multimedia distributed computing. Due consideration was also given to the various existing methods of multimodal semantic representation, storage and decision-making, which will be of critical importance in the development of MediaHub. A potential unique contribution of MediaHub has been identified, in providing a new method of decision-making. The effectiveness of MediaHub will be tested within an existing multimodal platform, such as CONFUCIUS. Although only a snapshot of the current research could be presented in this report, it is hoped that it provides a concise summary of the motivation for, and future direction of, the development of MediaHub.


References

Amtrup, Jan W. (1995) ICE - INTARC Communication Environment Users Guide and Reference Manual, Version 1.4, Computer Science Department, University of Hamburg, October.

Bayer, S., C. Doran & B. George (2001) Dialogue Interaction with the DARPA Communicator Infrastructure: The Development of Useful Software. In Proceedings of HLT 2001, First International Conference on Human Language Technology Research, San Diego, CA, USA, 114-116.

Brøndsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund & K.G. Olesen (1998) A platform for developing Intelligent MultiMedia applications. Technical Report R-98-1004, Center for PersonKommunikation (CPK), Institute for Electronic Systems (IES), Aalborg University, Denmark, May.

Brøndsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund & K.G. Olesen (2001) The IntelliMedia WorkBench - An Environment for Building Multimodal Systems. In Advances in Cooperative Multimodal Communication: Second International Conference, CMC '98, Tilburg, The Netherlands, January 1998, Selected Papers, Harry Bunt and Robbert-Jan Beun (Eds.), 217-233. Lecture Notes in Artificial Intelligence (LNAI) series, LNAI 2155, Berlin, Germany: Springer-Verlag.

Carlson, R. & B. Granström (1996) The Waxholm spoken dialogue system. In Palková, Z. (Ed.), 39-52, Phonetica Pragensia IX. Charisteria viro doctissimo Premysl Janota oblata. Acta Universitatis Carolinae Philologica 1.

Cheyer, A., L. Julia & J.C. Martin (1998) A Unified Framework for Constructing Multimodal Experiments and Applications. In Proceedings of CMC '98, Tilburg, The Netherlands, 63-69.

Choy, K.W., A.A. Hopgood, L. Nolle & B.C. O'Neill (2004) Implementing a blackboard system in a distributed processing network. In Expert Update, Vol. 7, No. 1, Spring, 16-24.

EMBASSI (2004) Homepage. http://www.embassi.de/ewas/ewas_frame.html Site visited 09/12/04.

Finin, T., R. Fritzson, D. McKay & R. McEntire (1994) KQML as an Agent Communication Language. In Proceedings of the 3rd International Conference on Information and Knowledge Management (CIKM '94), Gaithersburg, MD, USA, 456-463.

Fink, G.A., N. Jungclaus, F. Kummert, H. Ritter & G. Sagerer (1995) A Communication Framework for Heterogeneous Distributed Pattern Analysis. In International Conference on Algorithms and Architectures for Parallel Processing, Brisbane, Australia, 881-890.

Fink, G.A., N. Jungclaus, F. Kummert, H. Ritter & G. Sagerer (1996) A Distributed System for Integrated Speech and Image Understanding. In International Symposium on Artificial Intelligence, Cancun, Mexico, 117-126.

Freeman (2004) Make Room For JavaSpaces, Part 1. http://www.javaworld.com/javaworld/jw-11-1999/jw-11-jiniology.html Site visited 16/11/04.

HUGIN (2004) Hugin Expert Developers Site. http://developer.hugin.com/ Site visited 30/10/04.

IBM X + V (2004) IBM's proposal to the W3C. http://www-106.ibm.com/developerworks/wireless/library/wi-xvlanguage/ Site visited 03/11/04.

Jeon, H., C. Petrie & M.R. Cutkosky (2000) JATLite: A Java Agent Infrastructure with Message Routing. IEEE Internet Computing, Vol. 4, No. 2, Mar/Apr, 87-96.

Kelleher, J., T. Doris, Q. Hussain & S. Ó Nualláin (2000) SONAS: Multimodal, Multi-User Interaction with a Modelled Environment. In S. Ó Nualláin (Ed.), 171-184, Spatial Cognition. Amsterdam, The Netherlands: John Benjamins Publishing Co.

Kirste, T., T. Herfet & M. Schnaider (2001) EMBASSI: Multimodal Assistance for Infotainment and Service Infrastructures. In Proceedings of the 2001 EC/NSF Workshop on Universal Accessibility of Ubiquitous Computing: Providing for the Elderly, Alcácer do Sal, Portugal, 41-50.

Kristensen, T. (2001) Software Agents in a Collaborative Learning Environment. In International Conference on Engineering Education, Oslo, Norway, August, Session 8B1, 20-25.

Ma, M. & P. Mc Kevitt (2003) Semantic representation of events in 3D animation. In Proc. of the Fifth International Workshop on Computational Semantics (IWCS-5), Harry Bunt, Ielka van der Sluis and Roser Morante (Eds.), 253-281. Tilburg University, Tilburg, The Netherlands, January.

Maybury, M.T. (Ed.) (1993) Intelligent Multimedia Interfaces. Menlo Park, CA: AAAI/MIT Press.

Mc Kevitt, P. (Ed.) (1995a) Integration of Natural Language and Vision Processing (Volume I): Computational Models and Systems. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Mc Kevitt, P. (Ed.) (1995b) Integration of Natural Language and Vision Processing (Volume II): Intelligent Multimedia. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Mc Kevitt, P. (Ed.) (1996a) Integration of Natural Language and Vision Processing (Volume III): Theory and Grounding Representations. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Mc Kevitt, P. (Ed.) (1996b) Integration of Natural Language and Vision Processing (Volume IV): Recent Advances. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Mc Kevitt, P. (2003) MultiModal semantic representation. In Proc. of the SIGSEM Working Group on the Representation of MultiModal Semantic Information, First Working Meeting, Fifth International Workshop on Computational Semantics (IWCS-5), Harry Bunt, Kiyong Lee, Laurent Romary and Emiel Krahmer (Eds.), 1-16. Tilburg University, Tilburg, The Netherlands, January.

Mc Kevitt, P., S. Ó Nualláin & C. Mulvihill (Eds.) (2002) Language, Vision and Music - Selected Papers from the 8th International Workshop on the Cognitive Science of Natural Language Processing, Galway, Ireland. Amsterdam, Philadelphia: John Benjamins Publishing Company.

MIAMM (2004) Multidimensional Information Access using Multiple Modalities. http://miamm.loria.fr/ Site visited 09/11/04.

Minsky, M. (1975) A Framework for representing knowledge. In Readings in Knowledge Representation, R. Brachman and H. Levesque (Eds.), 245-262. Los Altos, CA: Morgan Kaufmann.

Nolle, L., K. Wong & A.A. Hopgood (2001) DARBS: a distributed blackboard system. In Proc. ES2001, Research and Development in Intelligent Systems XVIII, M. Bramer, F. Coenen and A. Preece (Eds.), 161-170. Berlin, Germany: Springer-Verlag.

OAA (2004) http://www.ai.sri.com/~oaa/whitepaper.html Site visited 11/11/04.

Okada, N. (1996) Integrating Vision, Motion and Language through Mind. In Artificial Intelligence Review, Vol. 10, Issues 3-4, August, 209-234.

Ó Nualláin, S., B. Farley & A. Smith (1994) The Spoken Image System: On the visual interpretation of verbal scene descriptions. In P. Mc Kevitt (Ed.), 36-39, Proceedings of the Workshop on Integration of Natural Language and Vision Processing, Twelfth American National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August.

Ó Nualláin, S. & A. Smith (1994) An Investigation into the Common Semantics of Language and Vision. In P. Mc Kevitt (Ed.), 21-30, Integration of Natural Language and Vision Processing (Volume I): Computational Models and Systems. London, U.K.: Kluwer Academic Publishers.

Oxygen (2004) MIT. http://oxygen.lcs.mit.edu/Overview.html Site visited 22/10/04.

Psyclone (2004) Mindmakers. http://www.cmlabs.com/psyclone/ Site visited 21/10/04.

Rich, C. & C. Sidner (1997) COLLAGEN: When Agents Collaborate with People. In First International Conference on Autonomous Agents, Marina del Rey, CA, February, 284-291.

Reithinger, N., C. Lauer & L. Romary (2002) MIAMM: Multimodal Information Access using Multiple Modalities. In Proc. of the International CLASS Workshop on Natural, Intelligent and Effective Interaction in MultiModal Dialogue Systems, Copenhagen, Denmark, 28-29 June.

Rutledge, L. (2001) SMIL 2.0: XML for Web Multimedia. In IEEE Internet Computing, Sept-Oct, 78-84.

Rutledge, L. & P. Schmitz (2001) Improving Media Fragment Integration in Emerging Web Formats. In Proceedings of the International Conference on Multimedia Modelling (MMM01), CWI, Amsterdam, The Netherlands, November 5-7, 147-166.

Sidner, C.L. (1994) An Artificial Discourse Language for Collaborative Negotiation. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Vol. 1, MIT Press, Cambridge, MA, 814-819.

SmartKom (2004) SmartKom. http://www.smartkom.org Site visited 12/10/04.

SMIL (2004a) SMIL 1.0 WWW Consortium Recommendation, June 1998. http://www.w3.org/TR/REC-smil/ Site visited 02/11/04.

SMIL (2004b) http://www.w3.org/AudioVideo/ Site visited 02/11/04.

Sunderam, V.S. (1990) PVM: a framework for parallel distributed computing. In Concurrency: Practice and Experience, 2(4), 315-340.

Thórisson, K.R. (1999) A Mind Model for Multimodal Communicative Creatures & Humanoids. In International Journal of Applied Artificial Intelligence, Vol. 13 (4-5), 449-486.

Vinoski, S. (1993) Distributed object computing with CORBA. C++ Report, Vol. 5, No. 6, July/August, 32-38.

Waibel, A., M.T. Vo, P. Duchnowski & S. Manke (1996) Multimodal Interfaces. In Artificial Intelligence Review, Vol. 10, Issue 3-4, August, 299-319.

Wahlster, W. (2003) SmartKom: Symmetric Multimodality in an Adaptive and Reusable Dialogue Shell. In Krahl, R. & Günther, D. (Eds.), 47-62, Proceedings of the Human Computer Interaction Status Conference, June. Berlin, Germany: DLR.

Wahlster, W., N. Reithinger & A. Blocher (2001) SmartKom: Towards Multimodal Dialogues with Anthropomorphic Interface Agents. In Wolf, G. & G. Klein (Eds.), 23-34, Proceedings of International Status Conference, Human-Computer Interaction, October. Berlin, Germany: DLR.

W3C (2004) W3C homepage. http://www.w3.org Site visited 02/11/04.

XHTML + Voice (2004) XHTML + Voice Profile 1.2. http://www.voicexml.org/specs/multimodal/x+v/12/ Site visited 03/11/04.



Appendix A: Comparison of Intelligent Multimodal Platforms

Table A.1: Comparison of Intelligent Multimodal Platforms

[Table: platforms are compared under four categories: Semantic Representation (Frames, XML); Semantic Storage (BB, Non-BB); Decision-making (Fusion and Synchronisation) (Bayesian Networks/CPNs, Fuzzy Logic, Genetic Algorithms, Neural Networks, Rule Based); and Multimodal Interaction (Input Media: Text, Speech, Gesture, Vision; Output Media: Text, Speech, Gesture, Graphics). The platforms compared are WAXHOLM (1992), Spoken Image/SONAS (1994), AESOPWORLD (1996), COLLAGEN (1996), INTERACT (1996), Ymir (1997), CHAMELEON (1998), Oxygen (1999), SmartKom (2000), DARBS (2001), DARPA Galaxy Communicator (2001), EMBASSI (2001), MIAMM (2001), Psyclone (2003) and, from this project, MediaHub, whose decision-making technique is marked '?' as it is still to be determined.]





Appendix B: Project schedule

[Table: the schedule divides each of the three years (Year 1: Oct '04-Jan '05, Feb '05-May '05, June '05-Sept '05; Year 2: Oct '05-Jan '06, Feb '06-May '06, June '06-Sept '06; Year 3: Oct '06-Jan '07, Feb '07-May '07, June '07-Sept '07) into four-month periods. The research activities scheduled are: Literature survey; Writing Chapter 2 'Literature Review'; Analysis and selection of tools; Analysis of Semantic Representation Languages (e.g. XML, SMIL, X + V); Analysis of suitable programming languages (e.g. Java, C, C++); Review of AI techniques & other reusable system components (e.g. FL, NNs, GAs, Bayesian Networks); Design of MediaHub; Architecture implementation; Develop Semantic Representation Database; Implement AI technique for decision-making (e.g. FL, NNs, GAs, Bayesian Networks, e.g. CPNs); Develop a prototype application for MediaHub; Integration and testing; Improving system; Write up PhD thesis.]

Table B.1: Project Schedule