MediaHub: An Intelligent Multimedia Distributed Hub for decision-making (fusion and synchronisation) on Language and Vision data
Glenn Campbell
Supervisors: Prof. Paul Mc Kevitt, Dr. Tom Lunney.
Research Plan, Faculty of Engineering, University of Ulster, Magee, Derry.
Abstract
The objective of the work outlined in this research plan is the development of an intelligent multimedia distributed hub for the fusion and synchronisation of language and vision data, namely MediaHub. MediaHub will integrate and synchronise language and vision data in such a way that the two modalities are complementary to each other. Methods of semantic representation and decision-making (fusion and synchronisation) in existing multimedia platforms are reviewed here. A potential unique contribution is identified on how MediaHub, with a new approach to decision-making, could improve on current systems. A research proposal for MediaHub, which will exist and be tested within an existing multimodal platform, and a 3-year research plan are given.
Keywords: intelligent multimedia, distributed systems, multimodal synchronisation, multimodal fusion, multimodal semantic representation, knowledge representation, intelligent multimedia interfaces, decision-making, Bayesian networks.
1. Introduction
The area of intelligent multimedia has in recent years seen considerable work on creating user interfaces that can accept multimodal input. This has led to the development of intelligent interfaces that can learn to meet the needs of the user, in contrast to traditional systems where the onus was on the user to learn to use the interface. A more natural form of human-computer interaction has resulted from the development of systems that allow multimodal input such as natural language, eye and head tracking and 3D gestures (Maybury 1993). Considerable work has also been completed in the area of semantic or knowledge representation for language and vision, with the development of several semantic markup languages, such as XHTML + Voice (XHTML + Voice 2004, IBM X + V 2004) and the Synchronised Multimedia Integration Language (SMIL) (Rutledge 2001, Rutledge & Schmitz 2001, SMIL 2004a, 2004b). Frame-based methods of semantic representation have also been used extensively (Mc Kevitt 2003). Efforts have also been made to integrate natural language and vision processing, and some of the approaches in this field are described in Mc Kevitt (1995a,b, 1996a,b) and Mc Kevitt et al. (2002).
1.1 Objectives of this Research
The principal aim of this research is to develop MediaHub, a distributed hub for decision-making over multimodal information, specifically language and vision data. The primary objectives of this research are to:
Interpret/generate semantic representations of multimodal input/output.
Perform decision-making (fusion and synchronisation) on multimodal data.
Implement MediaHub, a multimodal platform hub.
In pursuing the three objectives outlined above, several research questions will need to be answered.
For example:
Will MediaHub use frames for semantic representation, or will it use XML or one of its derivatives?
How will MediaHub communicate with various elements of a multimodal platform?
Will MediaHub constitute a blackboard or non-blackboard model?
What mechanism will be implemented for decision-making within MediaHub?
These questions will be answered in the design and implementation of MediaHub, a multimodal platform hub. MediaHub will be tested within an existing multimodal platform such as CONFUCIUS (Ma & Mc Kevitt 2003) using multimodal input/output data.
2. Literature Review
This section provides a review of literature relevant to the design and implementation of MediaHub. Section 2.1 provides a review of the area of distributed processing, whilst Section 2.2 looks at existing multimodal distributed platforms.
2.1 Distributed Processing
The area of distributed computing has been exploited to create platforms that are human-centred and directly address the needs of the user: systems that allow input that suits the needs and preferences of each individual user. Recent advances in the area of distributed systems have seen the development of several software tools for distributed processing. These tools can be, and are, used in the creation of many different distributed platforms. PVM (Parallel Virtual Machine) (Sunderam 1990, Fink et al. 1995) is a programming environment that provides a unified framework where large parallel processing systems can be developed. It caters for the development of large concurrent or parallel applications that consist of interacting, but relatively independent, components. ICE (Amtrup 1995) is a communication mechanism for AI projects developed at the University of Hamburg. ICE is based on PVM, with an additional layer added to interface with several programming languages, including C, C++ and Lisp. Support for visualisation is provided by the use of the Tcl/Tk scripting language. DACS (Fink et al. 1995, 1996) is a powerful tool for system integration that provides a multitude of useful features for developing and maintaining distributed systems. Communication within DACS is based on simple asynchronous message passing. All messages that are passed within DACS are encoded in a Network Data Representation, which makes it possible to inspect data at any point in the system and to develop generic tools capable of processing all kinds of data.
The Open Agent Architecture (OAA) (Cheyer et al. 1998, OAA 2004) is a general-purpose infrastructure for creating systems that contain multiple software agents. OAA allows such agents to be written in different programming languages and to run on different platforms. All agents interact using the InterAgent Communication Language (ICL). ICL is a logic-based declarative language used to express high-level, complex tasks and natural language expressions. JATLite (Kristensen 2001, Jeon et al. 2000) provides a set of Java packages that enable multi-agent systems to be constructed using Java. JATLite provides a Java agent platform that uses the KQML (Knowledge Query and Manipulation Language) Agent Communication Language (ACL) (Finin et al. 1994) for inter-agent communication. KQML is a message format and message-handling protocol used to support knowledge sharing among agents.
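To make the idea of a KQML message more concrete, the following minimal sketch renders a performative with keyword parameters in the parenthesised style described by Finin et al. (1994). The helper `kqml_message`, the agent names and the content expression are hypothetical and chosen only for illustration; this is not the JATLite API.

```python
def kqml_message(performative, **params):
    """Render a KQML-style performative as an s-expression string."""
    parts = [performative]
    for key, value in params.items():
        # Python keyword names cannot contain '-', so '_' stands in for it.
        parts.append(f":{key.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# Hypothetical agents: a speech module asks a dialogue manager a question.
msg = kqml_message(
    "ask-one",
    sender="speech-module",
    receiver="dialogue-manager",
    reply_with="q1",
    language="KIF",
    content='"(location office-a2 ?where)"',
)
print(msg)
```

The performative (here `ask-one`) names the kind of communicative act, while the `:content` parameter carries the actual query in whatever content language the agents have agreed on.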
JavaSpaces (Freeman 2004), developed by Sun Microsystems, is a simple but powerful distributed programming tool that allows developers to quickly create collaborative and distributed applications. JavaSpaces represents a new distributed computing model where, in contrast to conventional network tools, processes do not communicate directly. Instead, processes exchange objects through a space, or shared memory. CORBA (Vinoski 1993) is a specification released by the Object Management Group (OMG) in 1991. A major component of CORBA is the Object Request Broker (ORB), which delivers requests to objects and returns results back to the client. The operation of the ORB is completely transparent to the client. That is, the client does not need to know where the objects are, how they communicate, or how they are implemented, stored or executed. CORBA uses the Interface Definition Language (IDL), with a syntax similar to C++, to describe object interfaces.
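The space-based model can be illustrated with a small in-process sketch. It mimics the write, read and take operations of a shared space using template matching, where a template field set to `None` acts as a wildcard. The class `Space` and the entry fields are assumptions made for this example; a real JavaSpaces service is a networked Java API whose read and take operations can block with a timeout, whereas this sketch simply returns `None` when nothing matches.

```python
import threading


class Space:
    """A tiny in-process stand-in for a shared object space."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def write(self, entry):
        """Place a copy of the entry into the space."""
        with self._lock:
            self._entries.append(dict(entry))

    @staticmethod
    def _matches(entry, template):
        # A template value of None matches anything in that field.
        return all(v is None or entry.get(k) == v for k, v in template.items())

    def read(self, template):
        """Return a copy of the first matching entry, leaving it in the space."""
        with self._lock:
            for entry in self._entries:
                if self._matches(entry, template):
                    return dict(entry)
        return None

    def take(self, template):
        """Remove and return the first matching entry, or None."""
        with self._lock:
            for i, entry in enumerate(self._entries):
                if self._matches(entry, template):
                    return self._entries.pop(i)
        return None


# Two hypothetical input modules write entries; a consumer picks them up.
space = Space()
space.write({"modality": "speech", "text": "show me the office"})
space.write({"modality": "gesture", "target": "office-a2"})
seen = space.read({"modality": "gesture", "target": None})
taken = space.take({"modality": "speech"})
```

The producers and the consumer never address each other directly; they only agree on the shape of the entries, which is the decoupling the space model is meant to provide.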
2.2 Multimodal Platforms
Numerous intelligent multimedia distributed platforms currently exist. With respect to these platforms, of particular interest are their methods of semantic representation, storage and decision-making (fusion and synchronisation). With respect to semantic representation, EMBASSI (Kirste et al. 2001, EMBASSI 2004), Psyclone (Psyclone 2004), SmartKom (Wahlster 2003, Wahlster et al. 2001, SmartKom 2004) and MIAMM (Reithinger et al. 2002, MIAMM 2004) all use an XML-based method of semantic representation. XML (eXtensible Markup Language) (W3C 2004) was originally designed for use in large-scale electronic publishing, but is now used extensively in the exchange of data via the web. Any programming language can be used to manipulate data in XML, and a large amount of middleware exists for managing data in XML format. It is common that a derivative of XML is used for semantic representation. For example, SmartKom uses an XML-based mark-up language, M3L (MultiModal Markup Language), to semantically represent information passed between the various components of the platform. Similarly, the exchange of information within MIAMM is facilitated through MMIL (Multi-Modal Interface Language), which is also based on XML.
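As a rough illustration of what an XML-based semantic representation might look like, the sketch below builds and re-parses a single input event using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`event`, `slot`, `modality`, `start`, `end`) are invented for this example and are not taken from M3L or MMIL.

```python
import xml.etree.ElementTree as ET


def build_event(modality, start_ms, end_ms, slots):
    """Build a hypothetical <event> element with one <slot> per key."""
    event = ET.Element(
        "event",
        {"modality": modality, "start": str(start_ms), "end": str(end_ms)},
    )
    for name, value in slots.items():
        slot = ET.SubElement(event, "slot", {"name": name})
        slot.text = value
    return event


# A spoken query, time-stamped in milliseconds (illustrative values).
speech = build_event(
    "speech", 1200, 2100, {"intent": "query-location", "object": "office"}
)
xml_text = ET.tostring(speech, encoding="unicode")

# Any component receiving the text can recover the same structure.
parsed = ET.fromstring(xml_text)
slots = {s.get("name"): s.text for s in parsed.findall("slot")}
duration_ms = int(parsed.get("end")) - int(parsed.get("start"))
```

Because the representation is plain XML text, modules written in different languages could exchange it without sharing any code, which is one reason such mark-up languages are popular in distributed platforms.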
AESOPWORLD (Okada 1996), CHAMELEON (Brøndsted et al. 1998, 2001), COLLAGEN (Rich & Sidner 1997), DARBS (Choy et al. 2004, Nolle et al. 2001), the DARPA Galaxy Communicator (Bayer et al. 2001), INTERACT (Waibel et al. 1996), Oxygen (2004), Spoken Image/SONAS (Ó Nualláin et al. 1994, Ó Nualláin & Smith 1994, Kelleher et al. 2000), WAXHOLM (Carlson & Granström 1996) and Ymir (Thórisson 1999) utilise frames. Frames, first introduced by Minsky (1975), are based on human memory and the idea that when humans meet a new problem they select an existing frame (a remembered framework) that can be adapted to fit the new situation. COLLAGEN introduces the concept of a SharedPlan to represent the common goal of a user and a collaborative agent, and uses Sidner’s (1994) artificial discourse language as the internal representation for user and agent communication acts.
With respect to semantic storage, a blackboard was implemented in DARBS, the DARPA Galaxy Communicator, CHAMELEON, Psyclone, SmartKom, Spoken Image/SONAS and Ymir. The DARPA Galaxy Communicator consists of a distributed hub-and-spoke architecture, with communication facilitated via message-passing. The blackboard implemented in CHAMELEON is used to keep track of interactions over time, through representation of semantics using frames. The system consists of ten modules, mostly programmed in C and C++, which are glued together by the DACS communications system. Communication between modules is achieved by exchanging semantic representations, in the form of frames, either directly with one another or via the blackboard. An initial prototype application for CHAMELEON is the IntelliMedia WorkBench (Brøndsted et al. 2001), where the user can ask the system for directions (using speech and pointing gestures) to various offices within a building. Ymir is a computational model for creating autonomous creatures capable of human-like communication with real users. Ymir represents a distributed, modular approach that bridges between multimodal perception, decision and action in a coherent framework. There are three main blackboards implemented in Ymir, and communication is achieved via message passing. Psyclone introduces the concept of a ‘Whiteboard’, which is essentially a blackboard that is capable of handling media streams. Psyclone allows software to be easily distributed across multiple machines and enables communication management using rich messages formatted in XML. Non-blackboard models are implemented in AESOPWORLD, COLLAGEN, EMBASSI, INTERACT, WAXHOLM, MIAMM and Oxygen. For example, EMBASSI has a highly distributed architecture consisting of many independent components.
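The blackboard idea described above, with modules exchanging frames through a shared store that also keeps a history of the interaction, can be sketched as follows. The `Frame` fields, the `Blackboard` class and the example modules are assumptions made purely for illustration; they do not reproduce the frame syntax or module set of CHAMELEON or any other platform reviewed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """A hypothetical semantic frame posted by one module."""
    module: str
    intention: str
    time: int
    slots: Dict[str, str] = field(default_factory=dict)


class Blackboard:
    """Keeps every posted frame and notifies subscribed modules."""

    def __init__(self):
        self.history: List[Frame] = []
        self._subscribers: List[Callable[[Frame], None]] = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def post(self, frame):
        self.history.append(frame)
        for callback in list(self._subscribers):
            callback(frame)

    def query(self, **conditions):
        """Return all frames whose attributes equal the given values."""
        return [f for f in self.history
                if all(getattr(f, k, None) == v for k, v in conditions.items())]


board = Blackboard()


def domain_model(frame):
    # Reacts only to spoken queries, so its own answers do not re-trigger it.
    if frame.module == "speech" and frame.intention == "query":
        board.post(Frame("domain", "answer", frame.time + 1,
                         {"office": frame.slots["office"]}))


board.subscribe(domain_model)
board.post(Frame("speech", "query", 10, {"office": "a2"}))
answers = board.query(module="domain")
```

Because every frame stays in `history`, a later module can look back over the interaction, which mirrors the use of a blackboard to keep track of interactions over time.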
With respect to decision-making, the rule-based method is the most popular form of reasoning among the platforms reviewed. However, there is significant interest in using other Artificial Intelligence techniques to assist decision-making in multimodal platforms. For example, the DARBS distributed blackboard system consists of rule-based, neural network and genetic algorithm knowledge sources operating in parallel to solve a problem, such as controlling plasma deposition processes. Although COLLAGEN provides a framework for communicating and recording decisions between the user and an agent, it does not provide a method of decision-making; this is left to the discretion of the developer.
3. Project Proposal
The proposed project is the design and implementation of MediaHub, an intelligent multimedia distributed hub for the fusion and synchronisation of language and vision data. A schematic for MediaHub is shown in Figure 3.1.
[Figure: boxes labelled Multimodal Input Modules, Semantic Representation Database, Dialogue Manager, Decision Making Module and Multimodal Output Modules]
Figure 3.1: Basic architecture of MediaHub
The key components of MediaHub are the Dialogue Manager (DM), the Semantic Representation Database (SRDB) and the Decision-Making Module (DMM). The role of the DM is to facilitate the interactions between all components of the platform. It will act as a blackboard module, with all communication between components achieved via the DM. The DM will also be responsible for the synchronisation of the multimodal inputs and outputs. During the testing of MediaHub an existing multimodal platform, such as CONFUCIUS (Ma & Mc Kevitt 2003), will be used to perform the processing of the multimodal input and output. The SRDB in MediaHub will use an XML-based method of semantic representation. XML has been chosen due to its widespread use in the area of knowledge and semantic representation in intelligent multimedia. The Decision-Making Module (DMM) will employ an Artificial Intelligence (AI) technique to provide decision-making on language and vision data. Bayesian Networks and CPNs (Causal Probabilistic Networks) will be investigated, to determine if they will be suitable for decision-making. It may also be possible to use other techniques such as Fuzzy Logic, Neural Networks, Genetic Algorithms or a combination of techniques to provide this functionality. The potential for these methods of decision-making under uncertainty will be investigated further before a definitive decision is made on the design of the DMM.
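To indicate how a Bayesian approach to fusion might work in principle, the sketch below combines speech and gesture evidence about which object the user is referring to. It assumes the simplest possible network, in which a single hypothesis variable is the parent of both observations, so that the posterior is proportional to the prior multiplied by each likelihood. The function `fuse`, the hypotheses and all probability values are invented for illustration; this is not the design of the DMM, nor the API of any Bayesian network tool.

```python
def fuse(prior, likelihoods, evidence):
    """Posterior over hypotheses, assuming observations are
    conditionally independent given the hypothesis."""
    unnormalised = {}
    for hypothesis, p in prior.items():
        for variable, observed in evidence.items():
            p *= likelihoods[variable][hypothesis][observed]
        unnormalised[hypothesis] = p
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


# Which office does the user mean? (illustrative numbers only)
prior = {"office_a": 0.5, "office_b": 0.5}
likelihoods = {
    # The spoken phrase "the office" fits both hypotheses equally well.
    "speech": {
        "office_a": {"the office": 0.5},
        "office_b": {"the office": 0.5},
    },
    # A pointing gesture to the left is far more likely for office_a.
    "gesture": {
        "office_a": {"points_left": 0.8},
        "office_b": {"points_left": 0.2},
    },
}
posterior = fuse(prior, likelihoods,
                 {"speech": "the office", "gesture": "points_left"})
best = max(posterior, key=posterior.get)
```

Here the speech input alone is ambiguous, and it is the gesture evidence that tips the decision, which is the kind of complementary use of modalities that fusion is intended to exploit.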
3.1 Software Analysis and Prospective Tools
Several XML-based languages could be used by the SRDB. Initially, XHTML + Voice (XHTML + Voice 2004, IBM X + V 2004) may be a suitable choice, since it combines the vision capabilities of XHTML and the speech capabilities of VoiceXML. Other XML-based languages such as the Synchronised Multimedia Integration Language (SMIL) (Rutledge 2001, Rutledge & Schmitz 2001, SMIL 2004a, 2004b) and MMIL, as used in MIAMM (Reithinger et al. 2002, MIAMM 2004), will also be considered. The HUGIN software tool (HUGIN 2004), a tool implementing Bayesian Networks as CPNs, will be investigated. Other software tools for implementing Fuzzy Logic, Neural Networks and Genetic Algorithms may also be utilised.
4. Comparison to Other Work
Table A.1 in Appendix A compares MediaHub to the hubs of other existing platforms. In the table, platform characteristics are listed, with a tick (√) indicating whether each characteristic is present for each of the platforms. As shown in the table, INTERACT (Waibel et al. 1996) uses neural networks, while DARBS (Choy et al. 2004, Nolle et al. 2001) implements a combination of rule-based, neural network and genetic algorithm techniques for decision-making. As illustrated, MediaHub, like Psyclone and SmartKom, will implement a blackboard model and will use an XML-based method of semantic representation. MediaHub will improve on the capabilities of Psyclone and SmartKom by implementing a new technique for decision-making, possibly Bayesian networks, CPNs or other techniques such as fuzzy logic, genetic algorithms or neural networks. It may also be possible to use a combination of these techniques.
5. Project Schedule
Table B.1 in Appendix B outlines the plan of work for the completion of this project, together with an indication of the expected completion date for each of the tasks.
6. Conclusion
At this initial stage of the project, the focus has been on investigating the area of distributed computing for intelligent multimedia. The objectives of MediaHub, in providing a distributed hub for the fusion and synchronisation of language and vision data, have been defined. A review of various existing distributed systems and multimodal platforms has given an insight into the recent advancements and achievements in intelligent multimedia distributed computing. Due consideration was also given to the various existing methods of multimodal semantic representation, storage and decision-making, which will be of critical importance in the development of MediaHub. A potential unique contribution of MediaHub has been identified, in providing a new method of decision-making. The effectiveness of MediaHub will be tested within an existing multimodal platform, such as CONFUCIUS. Although only a snapshot of the current research could be presented in this report, it is hoped that it provides a concise summary of the motivation for, and future direction of, the development of MediaHub.
References
Amtrup, J.W. (1995) ICE - INTARC Communication Environment Users Guide and Reference Manual Version 1.4, Computer Science Department, University of Hamburg, October.
Bayer, S., C. Doran & B. George (2001) Dialogue Interaction with the DARPA Communicator Infrastructure: The development of Useful Software. In Proceedings of HLT 2001, First International Conference on Human Language Technology Research, San Diego, CA, USA, 114-116.
Brøndsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund & K.G. Olesen (1998) A platform for developing Intelligent MultiMedia applications. Technical Report R-98-1004, Center for PersonKommunikation (CPK), Institute for Electronic Systems (IES), Aalborg University, Denmark, May.
Brøndsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund & K.G. Olesen (2001) The IntelliMedia WorkBench - An Environment for Building Multimodal Systems. In Advances in Cooperative Multimodal Communication: Second International Conference, CMC '98, Tilburg, The Netherlands, January 1998, Selected Papers, Harry Bunt and Robbert-Jan Beun (Eds.), 217-233. Lecture Notes in Artificial Intelligence (LNAI) series, LNAI 2155, Berlin, Germany: Springer-Verlag.
Carlson, R. & B. Granström (1996) The Waxholm spoken dialogue system. In: Palková, Z. (Ed.), 39-52, Phonetica Pragensia IX. Charisteria viro doctissimo Premysl Janota oblata. Acta Universitatis Carolinae Philologica 1.
Cheyer, A., L. Julia & J.C. Martin (1998) A Unified Framework for Constructing Multimodal Experiments and Applications. In Proceedings of CMC ’98, Tilburg, The Netherlands, 63-69.
Choy, K.W., A.A. Hopgood, L. Nolle & B.C. O'Neill (2004) Implementing a blackboard system in a distributed processing network. In Expert Update, Vol. 7, No. 1, Spring, 16-24.
EMBASSI (2004) Homepage. http://www.embassi.de/ewas/ewas_frame.html Site visited 09/12/04.
Finin, T., R. Fritzson, D. McKay & R. McEntire (1994) KQML as an Agent Communication Language. In Proceedings of the 3rd International Conference on Information and Knowledge Management (CIKM '94), Gaithersburg, MD, USA, 456-463.
Fink, G.A., N. Jungclaus, F. Kummert, H. Ritter & G. Sagerer (1995) A Communication Framework for Heterogeneous Distributed Pattern Analysis. In International Conference on Algorithms And Architectures for Parallel Processing, Brisbane, Australia, 881-890.
Fink, G.A., N. Jungclaus, F. Kummert, H. Ritter & G. Sagerer (1996) A Distributed System for Integrated Speech and Image Understanding. In International Symposium on Artificial Intelligence, Cancun, Mexico, 117-126.
Freeman (2004) Make Room For JavaSpaces Part 1. http://www.javaworld.com/javaworld/jw-11-1999/jw-11-jiniology.html Site visited 16/11/04.
Hugin (2004) Hugin Expert Developers Site. http://developer.hugin.com/ Site visited 30/10/04.
IBM X + V (2004) IBM's proposal to the W3C. http://www-106.ibm.com/developerworks/wireless/library/wi-xvlanguage/ Site visited 03/11/04.
Jeon, H., C. Petrie & M.R. Cutkosky (2000) JATLite: A Java Agent Infrastructure with Message Routing. IEEE Internet Computing, Vol. 4, No. 2, Mar/Apr, 87-96.
Kelleher, J., T. Doris, Q. Hussain & S. Ó Nualláin (2000) SONAS: Multimodal, Multi-User Interaction with a Modelled Environment. In S. Ó Nualláin (Ed.), 171-184, Spatial Cognition. Amsterdam, The Netherlands: John Benjamins Publishing Co.
Kirste, T., T. Herfet & M. Schnaider (2001) EMBASSI: Multimodal Assistance for Infotainment and Service Infrastructures. In Proceedings of the 2001 EC/NSF Workshop on Universal Accessibility of Ubiquitous Computing: Providing for the Elderly, Alcácer do Sal, Portugal, 41-50.
Kristensen, T. (2001) Software Agents In A Collaborative Learning Environment. In International Conference on Engineering Education, Oslo, Norway, August, Session 8B1, 20-25.
Ma, M. & P. Mc Kevitt (2003) Semantic representation of events in 3D animation. In Proc. of the Fifth International Workshop on Computational Semantics (IWCS-5), Harry Bunt, Ielka van der Sluis and Roser Morante (Eds.), 253-281. Tilburg University, Tilburg, The Netherlands, January.
Maybury, M.T. (Ed.) (1993) Intelligent Multimedia Interfaces. Menlo Park, CA: AAAI/MIT Press.
Mc Kevitt, P. (Ed.) (1995a) Integration of Natural Language and Vision Processing (Volume I): Computational Models and Systems. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mc Kevitt, P. (Ed.) (1995b) Integration of Natural Language and Vision Processing (Volume II): Intelligent Multimedia. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mc Kevitt, P. (Ed.) (1996a) Integration of Natural Language and Vision Processing (Volume III): Theory and grounding representations. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mc Kevitt, P. (Ed.) (1996b) Integration of Natural Language and Vision Processing (Volume IV): Recent Advances. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mc Kevitt, P. (2003) MultiModal semantic representation. In Proc. of the SIGSEM Working Group on the Representation of MultiModal Semantic Information, First Working Meeting, Fifth International Workshop on Computational Semantics (IWCS-5), Harry Bunt, Kiyong Lee, Laurent Romary, and Emiel Krahmer (Eds.), 1-16, Tilburg University, Tilburg, The Netherlands, January.
Mc Kevitt, P., S. Ó Nualláin & C. Mulvihill (Eds.) (2002) Language, Vision and Music - Selected Papers from the 8th International Workshop on the Cognitive Science of Natural Language Processing, Galway, Ireland. Amsterdam, Philadelphia: John Benjamins Publishing Company.
MIAMM (2004) Multidimensional Information Access using Multiple Modalities. http://miamm.loria.fr/ Site visited 09/11/04.
Minsky, M. (1975) A Framework for representing knowledge. In Readings in knowledge representation, R. Brachman and H. Levesque (Eds.), 245-262, Los Altos, CA: Morgan Kaufmann.
Nolle, L., K. Wong & A.A. Hopgood (2001) DARBS: a distributed blackboard system. In Proc. ES2001, Research and Development in Intelligent Systems XVIII, M. Bramer, F. Coenen and A. Preece (Eds.), 161-170, Berlin, Germany: Springer-Verlag.
OAA (2004) http://www.ai.sri.com/~oaa/whitepaper.html Site visited 11/11/04.
Okada, N. (1996) Integrating Vision, Motion and Language through Mind. In Artificial Intelligence Review, Vol. 10, Issues 3-4, August, 209-234.
Ó Nualláin, S., B. Farley & A. Smith (1994) The Spoken Image System: On the visual interpretation of verbal scene descriptions. In P. McKevitt (Ed.), 36-39, Proceedings of the Workshop on integration of natural language and vision processing, Twelfth American National Conference on Artificial Intelligence (AAAI-94). Seattle, Washington, USA, August.
Ó Nualláin, S. & A. Smith (1994) An Investigation into the Common Semantics of Language and Vision. In P. McKevitt (Ed.), 21-30, Integration of Natural Language and Vision Processing (Volume I): Computational Models and Systems. London, U.K.: Kluwer Academic Publishers.
Oxygen (2004) MIT. http://oxygen.lcs.mit.edu/Overview.html Site visited 22/10/04.
Psyclone (2004) Mindmakers. http://www.cmlabs.com/psyclone/ Site visited 21/10/04.
Reithinger, N., C. Lauer & L. Romary (2002) MIAMM: Multimodal Information Access using Multiple Modalities. In Proc. of the International CLASS workshop on Natural, Intelligent and Effective interaction in MultiModal Dialogue Systems, Copenhagen, Denmark, 28-29 June.
Rich, C. & C. Sidner (1997) COLLAGEN: When Agents Collaborate with People. In First International Conference on Autonomous Agents, Marina del Rey, CA, February, 284-291.
Rutledge, L. (2001) SMIL 2.0: XML for Web Multimedia. In IEEE Internet Computing, Sept-Oct, 78-84.
Rutledge, L. & P. Schmitz (2001) Improving Media Fragment Integration in Emerging Web Formats. In Proceedings of the International Conference on Multimedia Modelling (MMM01), CWI, Amsterdam, The Netherlands, November 5-7, 147-166.
Sidner, C.L. (1994) An Artificial Discourse Language for Collaborative Negotiation. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Vol. 1, MIT Press, Cambridge, MA, 814-819.
SmartKom (2004) SmartKom. http://www.smartkom.org Site visited 12/10/04.
SMIL (2004a) SMIL 1.0 WWW Consortium Recommendation, June 1998. http://www.w3.org/TR/REC-smil/ Site visited 02/11/04.
SMIL (2004b) http://www.w3.org/AudioVideo/ Site visited 02/11/04.
Sunderam, V.S. (1990) PVM: a framework for parallel distributed computing. In Concurrency Practice and Experience, 2(4), 315-340.
Thórisson, K.R. (1999) A Mind Model for Multimodal Communicative Creatures & Humanoids. In International Journal of Applied Artificial Intelligence, Vol. 13 (4-5), 449-486.
Vinoski, S. (1993) Distributed object computing with CORBA. C++ Report, Vol. 5, No. 6, July/August, 32-38.
Wahlster, W. (2003) SmartKom: Symmetric Multimodality in an Adaptive and Reusable Dialogue Shell. In: Krahl, R. & D. Günther (Eds.), 47-62, Proceedings of the Human Computer Interaction Status Conference, June. Berlin, Germany: DLR.
Wahlster, W., N. Reithinger & A. Blocher (2001) SmartKom: Towards Multimodal Dialogues with Anthropomorphic Interface Agents. In: Wolf, G. & G. Klein (Eds.), 23-34, Proceedings of International Status Conference, Human-Computer Interaction. October, Berlin, Germany: DLR.
Waibel, A., M.T. Vo, P. Duchnowski & S. Manke (1996) Multimodal Interfaces. In Artificial Intelligence Review, Vol. 10, Issue 3-4, August, 299-319.
W3C (2004) W3C homepage. http://www.w3.org Site visited 02/11/04.
XHTML + Voice (2004) XHTML + Voice Profile 1.2. http://www.voicexml.org/specs/multimodal/x+v/12/ Site visited 03/11/04.
Appendix A: Comparison of Intelligent Multimodal Platforms
Table A.1: Comparison of Intelligent Multimodal Platforms
Columns: Category; System; Year; Semantic Representation (Frames, XML); Semantic Storage (BB, Non-BB); Decision-making (Fusion and Synchronisation) (Bayesian Networks / CPNs, Fuzzy Logic, Genetic Algorithms, Neural Networks, Rule Based); Multimodal Interaction: Input Media (Text, Speech, Gesture, Vision), Output Media (Text, Speech, Gesture, Graphics)
Category: Intelligent Multimodal Platforms
WAXHOLM, 1992: √ √ √ √ √ √ √ √
Spoken Image / SONAS, 1994: √ √ √ √ √ √ √
AESOPWORLD, 1996: √ √ √ √ √ √ √
COLLAGEN, 1996: √ √ N/A √ √ √ √ √
INTERACT, 1996: √ √ √ √ √ √ √
Ymir, 1997: √ √ √ √ √ √ √ √ √
CHAMELEON, 1998: √ √ √ √ √ √ √ √
Oxygen, 1999: √ √ √ √ √ √ √ √ √ √
SmartKom, 2000: √ √ √ √ √ √ √ √ √ √
DARBS, 2001: √ √ √ √ √ √ √ √ √
DARPA Galaxy Communicator, 2001: √ √ √ √ √
EMBASSI, 2001: √ √ √ √ √ √ √ √
MIAMM, 2001: √ √ √ √ √ √ √ √ √
Psyclone, 2003: √ √ √ √ √ √ √ √ √
Category: This Project
MediaHub: √ √ √ ? √ √ √ √ √ √ √ √
Appendix B: Project Schedule
Time periods:
Year 1: Oct ’04 - Jan ’05, Feb ’05 - May ’05, June ’05 - Sept ’05
Year 2: Oct ’05 - Jan ’06, Feb ’06 - May ’06, June ’06 - Sept ’06
Year 3: Oct ’06 - Jan ’07, Feb ’07 - May ’07, June ’07 - Sept ’07
Research Activities:
Literature survey
Writing Chapter 2 ‘Literature Review’
Analysis and selection of tools
Analysis of Semantic Representation Languages (e.g. XML, SMIL, X + V)
Analysis of suitable programming languages (e.g. Java, C, C++)
Review of AI techniques & other reusable system components (e.g. FL, NNs, GAs, Bayesian Networks)
Design of MediaHub
Architecture implementation
Develop Semantic Representation Database
Implement AI technique for decision-making (e.g. FL, NNs, GAs, Bayesian Networks e.g. CPNs)
Develop a prototype application for MediaHub
Integration and testing
Improving system
Write up PhD thesis
Table B.1: Project Schedule