First technical validation


NeP4B

Networked Peers For Business


WP Number 3

Task T3.3

Deliverable D3.3.1





First technical validation

(final, 30.09.2008)


Abstract


This report, realised by UniMo with the contribution of SATA, CEFRIEL and ISTI, describes the results of the first validation phase applied to the NEP4B prototypes. More precisely, the ICT components for the construction of the semantic peer, developed in Task T3.2, were tested with respect to the application scenarios of new or re-engineered ICT identified in Task T3.1 and discussed with target user groups. The outcome of these validation activities is twofold: (i) to create practical examples of NEP4B technology use and (ii) to obtain better focused use cases for the second, and final, validation phase, regarding the whole NEP4B infrastructure. Task T4.3 will go deeper from the organizational and socio-economical viewpoint in characterizing these application scenarios.



Document information

Document ID code: D3.3.1

Keywords: Technical validation, ICT services

Classification: DRAFT

Date of reference: 30.09.2008

Distribution level: NeP4B Consortium


Editor: Paola Monari (Unimo)

Authors:

Andrea Turati (CEFRIEL)
Dario Cerizza (CEFRIEL)
Irene Celino (CEFRIEL)
Gianluigi Viscusi (UniMiB)
Fausto Rabitti (ISTI-CNR)
Maurizio Vincini (Unimo)
Claudio Gennaro (ISTI-CNR)
Valentina Morandi (CEFRIEL)
Marco Barbi (Unimo)
Serena Sorrentino (Unimo)
Mirko Orsini (Unimo)
Antonio Sala (Unimo)

Version history

Date         Version     Description
30/09/2008   FINAL       Final version
31/08/2008   DRAFT 2.0   Advanced draft version
31/07/2008   DRAFT 1.0   Draft version
07/07/2008   DRAFT 0.1   Draft version













Contents

1 Introduction
  1.1 NEP4B prototypes
  1.2 Initial application scenarios
  1.3 Performed activities
  1.4 Results
2 LOG-A scenario: “Truckload exchange”
  2.1 Short description
  2.2 Tested prototypes
  2.3 Test process
  2.4 Current results and next steps
3 LOG-B scenario: “Logistic service brokering”
  3.1 Short description
  3.2 Tested prototypes
  3.3 Test process
  3.4 Current results and next steps
4 ICT-A scenario: “Search for partners in ICT clusters”
  4.1 Short description
  4.2 Tested prototypes
  4.3 Test process
5 B&C scenario: “Marble identification”
  5.1 Short description
  5.2 Tested prototypes
  5.3 Test process
  5.4 Current results and next steps
6 Appendix A - Prototype technical tests
7 Appendix B - Demo videos
8 Appendix C - ICT company profiles

1

Introduction

This document is aimed at presenting the results of the validation activities applied to the ICT prototypes realised in Task T3.2, with reference to the application scenarios identified in Task T3.1 and described in deliverable D3.1.1 User Requirement Specification. The final outcomes are:

- a sound technical validation of the NEP4B components with respect to realistic operational contexts;

- a more precise identification of the experimental test bed useful to highlight the features of the final NEP4B integrated prototype;

- hints for the presentation of the NEP4B solution to real users involved in the envisaged use cases.

1.1 NEP4B prototypes

The first software development phase of the project, i.e. Task T3.2, produced seven prototypes presented in deliverable D3.2.1 Prototypes for building the semantic peer - First release. They are briefly recalled in the following:

- P3.2.1.1 A system for extracting metadata and annotating (semi-structured) data sources. This component takes as input structured or semi-structured data sources, possibly partially annotated, and produces a complete annotation with respect to WordNet.

- P3.2.1.2 A system for the incremental building of the PVV, which takes as input sets of data sources, multimedia data sources and semantic web services, and produces the PVV expressed in the OWL language.

- P3.2.1.3 A system for extracting metadata and annotating textual and multimedia data sources, starting from a classification schema, training data and data to classify, and producing annotated input data by using the proposed classification schema.

- P3.2.1.4 A system for query reformulation and conflict resolution for (semi-)structured data sources, starting from an OQL query, accessible through a Java API, and giving a Java ResultSet as outcome (see the sketch after this list).

- P3.2.1.5 WSMO Discovery Engine QoS-aware. This component uses ontologies, Web service descriptions, mediators and goals, and produces an ordered list of references to Web services.

- P3.2.1.6 Data user interface offering to the end user the search criteria to retrieve data described in the PVV and exposed by the semantic peer.

- P3.2.1.7 A system for aggregate querying of data sources and multimedia sources.
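As an illustration of the usage pattern described in the P3.2.1.4 item above, the following minimal Java sketch defines a hypothetical client-side facade (the interface and method names are illustrative assumptions, not the prototype's actual API) that accepts a query string and hands back a standard java.sql.ResultSet:

import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical client-side view of the P3.2.1.4 Java API; the interface and
// method names are illustrative, since the report does not spell them out.
interface QueryManagerFacade {
    // Executes a query over the integrated data sources and returns a Java ResultSet.
    ResultSet executeQuery(String query) throws SQLException;
}

class QueryManagerClient {
    static void printCompanyNames(QueryManagerFacade qm) throws SQLException {
        try (ResultSet rs = qm.executeQuery(
                "SELECT c.companyName FROM company AS c")) {
            while (rs.next()) {
                System.out.println(rs.getString("companyName"));
            }
        }
    }
}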

All these prototypes were tested in realistic conditions, as described in the next sections, so as to prepare clear and understandable user-oriented examples of use of the NEP4B results.

1.2 Initial application scenarios

In Task T3.1 four application scenarios were identified as suitable to highlight the innovative characteristics of the NEP4B technology. The scenarios selected at that stage, and described in deliverable D3.1.1 User Requirement Specification, are briefly summarised in the following:

- LOG-A Truckload exchange. The objective is optimising the use of the internal truck fleet of manufacturing and trading companies. A company can offer load capacity on its own trucks and/or search for trucks to transport its goods. The collaboration follows a P2P approach, while other ongoing experiences are based on a marketplace infrastructure. In principle, load offers and requests are expressed in the form of data (not of services) by each company.

- LOG-B Logistic service brokering. The objective is decreasing logistic costs, while keeping the same quality of service or improving it, by aggregating the demand of different manufacturers. Transportation orders are extracted from heterogeneous databases (of the manufacturers), a semantic service discovery helps in identifying logistic operators in a position to satisfy the selected orders, and orders are aggregated, sequenced in proper routes and assigned to the best logistic service provider. In addition, services offered by the Public Administrations for different regions and sectors can be searched and accessed to obtain authorisations and administrative information.

- ICT-A Partner search. The objective is supporting ICT SMEs in searching for a partner (reseller, co-maker), taking into account features of the company, the market, and the product and service types. Such information is hardly available in current web sites, so each company willing to interact with other cluster companies creates a private web site area, accessible only to the cluster members, specifically oriented to collaboration. A query interface supports the search for the desired characteristics, including the type and amount of resources to rent (similarly to LOG-A). The search can start in one semantic peer and can be extended to other peers.

- ICT-B Multimedia data sharing. The objective is to allow ICT SMEs to understand the characteristics of a product component from multimedia descriptions (such as a demo or a presentation), with the aim of a possible integration within a more complex collaboration in which several SMEs can participate. Another related objective is to share common knowledge on specific topics, related to administrative or production-related processes, provided by the local CNA association in the form of multimedia training objects. The multimedia content can be videos, images and textual documents realized with tools provided by CNA, opportunely annotated and automatically processed. A query interface will support the search for multimedia objects by means of a textual description along with a possible non-textual sample (such as an image). The search can be issued in a local semantic peer and can be extended to other semantic peers, where a semantic peer represents the local SME association.

1.3 Performed activities

A first point worth recalling is that in the second year of the project further investigations were made on the envisaged scenarios, as presented in deliverables issued by WP1, in order to understand to which extent they are useful to build effective NEP4B showcases.

As a result of this verification process, the ICT-B scenario presented some drawbacks that suggested dropping it and substituting it with a different use case. In fact, the number of multimedia objects available now, and generated during the project, was quite small (in the order of a few hundred), and this would negatively affect the quality of the classification and training mechanisms. In addition, the query the CNA user typically wants to express is much more focused on textual data than on graphical/audio features of the multimedia object, thus reducing the effective validation of some prototypes, especially P3.2.1.7.

For this reason the ICT-B scenario was abandoned and replaced by the B&C scenario, Marble identification, developed in collaboration with the marble district of Massa Carrara. The scenario is based on offering the user support to correctly classify pictures of marbles taken at a fair, downloaded from the Internet or captured in a real-life environment. Such pictures, which constitute the sample to characterize, are compared with a huge repository of pictures taken during the cutting and finishing process by an industrial scanner. Such pictures are classified as soon as they are acquired by the system, once the automatic classification tool (P3.2.1.3) has been tuned with a proper training set.

The scenarios used for the prototype technical tests in this first validation phase are therefore LOG-A, LOG-B, ICT-A and B&C.

The following table shows the relations between prototypes and scenarios, where links represent the technical test of a given prototype with respect to a given scenario.

[Table: prototype/scenario test matrix. Application scenarios: LOG-A Truckload exchange; LOG-B Logistic service brokering; ICT-A Partner search; B&C Marble identification. NEP4B prototypes: P3.2.1.1 Metadata extract & annotate data sources; P3.2.1.2 PVV incremental building; P3.2.1.3 Annotate text & multimedia data sources; P3.2.1.4 Query reform. & conflict resolution; P3.2.1.5 WSMO discovery engine QoS-aware; P3.2.1.6 Data user interface & demonstration; P3.2.1.7 Aggregate querying data & multimedia sources.]

1.4 Results

The next chapters present in some detail the technical validation process from the point of view of the application scenarios, with the purpose of building effective case studies for the future NEP4B users. In addition, Appendix A reports a summary of the technical tests applied to the prototypes, to give a better understanding of the activities performed and of the results already achieved and planned for the future.

In general, the main outcomes of Task T3.3 are the following:

- Identification of improvements and adaptations for the final version of the NEP4B prototypes. Effective feedback was drawn when performing the tests, both with respect to their intrinsic features and with respect to the possibility of integrating the NEP4B components with existing applications.

- Decision on the definite application scenarios. The experience gained during this first validation phase confirmed the interest of scenarios LOG-B, ICT-A and B&C as significant demonstration cases to show the potential of the NEP4B technology. Instead, although the technical validation of the related prototype (P3.2.1.6) was fully satisfactory, the LOG-A scenario was finally dropped, as the P3.2.1.6 data user interface tool can be conveniently demonstrated within scenario ICT-A.

- Production of a first set of demonstration material. For each application scenario a video was realised and made available on the project web site, so as to give a first impression of the effective use of the NEP4B technology in realistic contexts.


2 LOG-A scenario: “Truckload exchange”

This chapter is aimed at showing how a logistic scenario, focused on matching demand and offer of logistic services between single companies provided with an internal truck fleet, can take advantage of the NEP4B technology.

This scenario is data-oriented and is aimed at highlighting the peer-to-peer nature of the system. Companies are aggregated in semantic peers and manage the offer and demand of loads and spaces in transportation missions in the form of queries addressed to the local semantic peer or to the network of semantic peers, by exploiting the mechanism of semantic routing.

This scenario is heavily based on the P3.2.1.6 prototype: “Data user interface offering to the end user the search criteria to retrieve data described in the PVV exposed by the semantic peer”. The demonstrator built exploits the capabilities of the P3.2.1.6 prototype to generate an end user interface based on a semantic navigation approach that allows end users to effectively navigate and browse the resources aggregated by NeP4B semantic peers.

In the rest of the chapter we briefly summarize the main characteristics of the scenario (section 2.1); we describe in detail the use of the P3.2.1.6 user interface prototype (section 2.2); and we report on the test process (section 2.3) and on the test results and next steps planned (section 2.4).

2.1 Short description

The “Truckload Exchange” scenario summarized here is described in detail in deliverable D3.1.1 and represents one of the four application scenarios (the LOG-A scenario) identified by WP3 for the demonstration and validation of the NeP4B platform. This scenario focuses on matching the demand and offer of logistic services between single companies provided with an internal truck fleet, e.g. the companies of a small industrial district. The goal is to optimize the use of in-house transportation services, decreasing their costs through an effective collaboration between companies, but without incurring penalties in terms of the resulting quality of service.

In this scenario, companies which have residual transportation capacity for a given transport mission may offer the remaining empty space to other companies to transport their goods to the same or a nearby destination for a fee. We term this kind of offer a space offer.

Conversely, companies which would have to organize an inefficient transport mission for a limited quantity of goods may instead prefer to offer their goods to other companies to be transported using their vehicles, thus incurring a reduced logistic cost. We term this kind of offer a freight offer.

The NeP4B platform can support these companies in both offering and searching for freight and space offers. The target user profiles are the logistic operators of the companies exploiting the system and performing four main activities, depending on the situation:

- Search for freights to complete a truck load
- Offer a freight to other truck owners
- Search for space from other truck owners
- Offer space to other manufacturers

The offering phase is carried out automatically through the direct integration of the system with the companies' ERP systems, whose data can be used to characterize the planned and requested transport missions. This integration is out of scope for the demonstrator, so we assume all the data to be already available. The search phase is instead supported by the end-user interface provided by prototype P3.2.1.6, and it is the focus of this chapter.

The demonstrator is physically deployed on the semantic peer, as shown in Figure 1. In this scenario a semantic peer serves one or more companies (e.g. the companies of an industrial district) that want to share their logistic capacity in order to optimize the process and reduce the costs. Space and freight offers are treated as data and are taken from the ERPs of the companies joined in the peer. The search for space and freight offers is performed by queries over data stored locally on a semantic peer; in addition it is also possible to retrieve the offers published by the other semantic peers of the network, in order to improve the coverage and the usefulness of the system. Figure 1 also shows how the system accesses additional, publicly available data (such as geographic data, goods and vehicle taxonomies and so on) to complement the data describing the offers and to support improved search and navigation functionalities.

Figure 1. The truckload exchange demonstrator in the NeP4B architecture (the Truckload Exchange demonstrator and the PVV sit on one semantic peer, connected to other semantic peers and to publicly available data).


2.2 Tested prototypes

The prototype P3.2.1.6 "Data user interface offering to the end user the search criteria to retrieve data described in the PVV exposed by the semantic peer" was demonstrated on the LOG-A scenario, although the intention is to apply it definitively to the ICT-A scenario.

At the core of the P3.2.1.6 prototype there is STAR:chart, a Web framework that adopts a semantic navigation approach, leveraging the ontological description of data to generate an effective Web user interface (UI).

STAR:chart leverages two elements: the “semantics” of data, expressed by a domain-specific ontology, and the use of formal specifications of navigation and presentation models. In NeP4B, the domain ontology is represented by the PVV, which formalizes the properties and the relations characterizing the resources of interest in the domain of a semantic peer. The presentation and navigation models, instead, have to be defined by the developer at design time; they drive the framework in generating the user interface at runtime, providing it with high-level guidelines on which resources of the domain ontology should be presented and on how they should be displayed.

The user interface generated by STAR:chart takes advantage of the most promising search and visualization approaches for data fruition, like the methods of Interaction Design or the techniques of the so-called Web 2.0. In particular, STAR:chart is able to provide multiple views on the same data, each one targeted at a particular user need (such as browsing, searching, comparing and so on) or intended to highlight a particular aspect of the data to be shown (e.g. temporal or geographical). Moreover, STAR:chart supports the faceted browsing paradigm for data fruition, which is especially suitable for navigating and searching a set of homogeneous resources, allowing the user to restrict (or widen) the scope of the search by adding (or removing) filtering conditions over the facets (i.e. the properties such as name, cost, location and so on) of the shown resources.

Within the LOG-A scenario, the P3.2.1.6 prototype based on STAR:chart has been used to provide a user interface over the space and freight offer data which are the focus of the scenario. Because a PVV was not available (component integration is not the focus of this demonstrator), a substitute domain ontology has been realized, taking into consideration all the requirements of the scenario expressed in D3.1.1. This ontology has then been used as a basis to define the presentation and navigation models needed by the prototype to build the user interface. This interface supports the search for free spaces and freight offers and has been especially designed to highlight three of the main features of STAR:chart, namely the central role of the domain ontology, the use of multiple views to present the results of a query, and the faceted-browsing approach to search and browse data.
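To make the faceted-browsing paradigm concrete, the following minimal Java sketch shows the filtering logic behind it (a simplified illustration of the paradigm, not STAR:chart code): each active facet condition is a predicate over an offer's properties, and adding or removing predicates restricts or widens the result set.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Simplified illustration of faceted browsing; not actual STAR:chart code.
class Offer {
    final String source, destination, vehicleType;
    final double price;
    Offer(String source, String destination, String vehicleType, double price) {
        this.source = source; this.destination = destination;
        this.vehicleType = vehicleType; this.price = price;
    }
}

class FacetedBrowser {
    private final List<Predicate<Offer>> activeFacets = new ArrayList<>();

    void addFacet(Predicate<Offer> condition)    { activeFacets.add(condition); }    // restrict the search
    void removeFacet(Predicate<Offer> condition) { activeFacets.remove(condition); } // widen the search

    // An offer is shown only if it satisfies every active facet condition.
    List<Offer> apply(List<Offer> offers) {
        return offers.stream()
                .filter(o -> activeFacets.stream().allMatch(f -> f.test(o)))
                .collect(Collectors.toList());
    }
}

For example, adding the facets o -> o.destination.equals("Bologna") and o -> o.price <= 50 narrows the view to the cheap offers towards Bologna, exactly as selecting two facet values would in the generated interface.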

2.3 Test process

For this demonstrator a single semantic peer scenario has been addressed. The demonstrator is physically represented by a Java Web application built around the STAR:chart framework developed in the P3.2.1.6 prototype. This application is meant to be deployed on a Semantic Peer and offers access to data to all the companies aggregated into the peer.

Real data sources were not available for the demonstrator at the time it was built. This required both modelling the data schema, by means of a custom ontology, and generating the data instances to be shown using the interface.

The ontology built considers all the requirements detailed in D3.1.1 for the LOG-A scenario. It consists of several components modeling the different aspects of data which are relevant for the scenario:

- vehicle types, taking into consideration legal regulations such as ATP and ADR classifications;

- types of transported goods, together with compatibility constraints between goods (i.e. which types of goods can be transported together) and between goods and vehicles (i.e. which vehicles can transport a particular type of goods);

- organization types (cooperatives, associations, public bodies, etc.);

- contact information (addresses, email, phone numbers, etc.);

- geographic information required to identify the sources and destinations of transport missions.

Regarding data instances, an automated data generator has been built to create a meaningful dataset to be displayed by the demonstrator. The generator complies with the domain ontology and is configured by means of a list of organizations and a set of generation rules which specify how to generate space and freight offers for each organization, as sketched below.
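A minimal Java sketch of the rule-driven generation idea follows; the rule fields (number of offers and a price range per organization) are assumptions made for illustration, as the demonstrator's actual rule format is not detailed in this report.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: one generation rule per organization states how many
// offers to create and within which price range.
class GenerationRule {
    final String organization;
    final int offerCount;
    final double minPrice, maxPrice;
    GenerationRule(String organization, int offerCount,
                   double minPrice, double maxPrice) {
        this.organization = organization; this.offerCount = offerCount;
        this.minPrice = minPrice; this.maxPrice = maxPrice;
    }
}

class OfferGenerator {
    private final Random random = new Random(42); // fixed seed for a reproducible dataset

    List<String> generate(List<GenerationRule> rules) {
        List<String> offers = new ArrayList<>();
        for (GenerationRule rule : rules) {
            for (int i = 0; i < rule.offerCount; i++) {
                double price = rule.minPrice
                        + random.nextDouble() * (rule.maxPrice - rule.minPrice);
                offers.add(String.format("%s: offer #%d at %.2f EUR",
                        rule.organization, i + 1, price));
            }
        }
        return offers;
    }
}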

On the basis of the domain ontology, the presentation and navigation models have been defined in order to specify the user interface for the end user. These models provide for a faceted browsing interface (Figure 2) which considers the following facets for space and freight offers: source; destination; offerer; vehicle type; freight weight and volume; requested price.

Figure 2. Table view of results, highlighting the adopted faceted browsing approach (the view selection, the keyword-based search box and the facets are visible).



Figure 3. Timeline view of results.

The user interface generated by the P3.2.1.6 prototype provides three types of view over freight and space offers:

- a table view displays all the search results using a compact table, which supports quick sorting, filtering and comparing of offers (Figure 2);

- a timeline view displays the offers on a timeline according to their validity period; this view supports the user in quickly identifying the offers available at a certain point in time (Figure 3);

- a map view displays the offers on a geographical map according to their source or destination location; this view allows the end user to quickly visualize the source or destination of concurrent offers in order to choose the one that best fits his transport needs (Figure 4).

The LOG-A demonstrator is available as a video (to be downloaded from the NEP4B web site) and as a live demo at http://seip.cefriel.it/truckload-exchange.

Figure 4. Map view of results.


2.4 Current results and next steps

The activities performed in Task 3.3 fully demonstrate the usability and effectiveness of the P3.2.1.6 prototype in providing end users with an effective user interface to browse and search the data exposed by semantic peers.

Despite the positive results, the LOG-A scenario will not be further investigated, because the P3.2.1.6 tool can be conveniently demonstrated within scenario ICT-A, which is characterized by more complex data and thus represents a more challenging testbed for the whole NeP4B platform.

Differently from the demonstrator shown above, which is based on generated data, the ICT-A demonstrator will exploit real-world data accessed through the semantic peer query processor provided by the P3.2.1.4 prototype. This data will be made available to end users by the P3.2.1.6 user interface prototype.

The next envisaged steps related to the ICT use case are the following:

- Construction and adoption of a real PVV as the domain ontology exploited by the P3.2.1.6 prototype to generate the user interface. To this end, real data sources will be integrated and a PVV will be built thanks to the P3.2.1.2 PVV builder. This PVV will be expressed as an OWL ontology that will be used to configure the P3.2.1.6 user interface framework.

- Meetings with users (ICT companies) to understand which representation and organization of data will be more feasible for the considered application domain. The collected information will be used, together with the PVV, as the input to configure the P3.2.1.6 framework in order to produce the user interface.

- Joint design of the application program interface (API) between P3.2.1.6 and P3.2.1.4, required by the first prototype to query at runtime the data aggregated by the semantic peer and obtain the results to show in the user interface.



3 LOG-B scenario: “Logistic service brokering”

3.1 Short description

This chapter is aimed at showing how a logistic scenario, where the main role is played by the Broker, offering mediation services to manufacturers and logistic operators, can take advantage of the NEP4B technology.

This scenario is mainly service-oriented, and thus particularly focused on semantic Web service discovery. This scenario also provides an example of how an existing web application can be extended and empowered by using ICT tools made available by the NEP4B infrastructure.

The critical point when studying how to reduce logistic costs within an industrial district is to promote and support the aggregation of the logistic demand of the companies settled in the district. Demand aggregation can be performed by a neutral actor, hereinafter called Broker, entrusted by the user companies. The Broker is in charge of collecting transportation orders from the users and studying the best transportation solutions, taking into account route optimisation and the technical and business conditions offered by logistic service providers. In other words, the Broker acts as a mediator supporting the matching of demand and offer of logistic services.

In the frame of the NeP4B architecture, this scenario is particularly suited to an INTRA-PEER application. In fact, it is likely that the companies settled in the same industrial district can refer to the same semantic peer, together with the broker and the logistic service providers linked to that district. Demands, offers and their aggregation are then included in the same semantic peer. An INTER-PEER application can also be considered of interest, when considering a network of brokers (and other logistic service providers) distributed over a large territory (several regions in different countries), where complex and long-range transportation orders can be satisfied by properly combining partial solutions offered by single brokers.

To make the desired cost reduction effective, it could be useful to integrate Public Administrations (PAs) in the network, directly or through a virtual view on the available services enabled by repository technologies. NEP4B users can access them to learn the rules of interest and to obtain possible authorisations. In turn, PAs can ask NEP4B for information about the transaction flows, to derive indications and trends.

3.2 Tested prototypes

The prototype P3.2.1.5 “WSMO Discovery Engine QoS-aware” was demonstrated on the LOG-B scenario.

The core of this prototype is Glue2, which can be invoked through its programming interface, including both API and Web service methods. Starting from a goal description, Glue2 performs the discovery process: by means of its internal mediators, it is able to check whether a Web service satisfies the goal. At the end, Glue2 returns a list containing the Web services that match the goal; for each of them it states the level of match (for ranking purposes) and other useful information (such as the URL, a textual description and so on).

Glue2 provides two main functionalities: publishing resources and discovering Web services. The former accepts an entity description (e.g. a Web service instance) as input and stores it in the internal repository. The latter accepts a Goal instance description as input and, after performing the discovery process, retrieves (from the repository) the list of Web service instances that match the Goal instance.

One of the characteristics of Glue2 is its ability to deal with the different polarization that often occurs between requesters and providers. As a matter of fact, in open environments the points of view of providers and requesters about the shared knowledge can differ (due to different philosophical positions, sense of belonging, etc.), and this is reflected inside Glue2 in the different ontologies used for annotating Web services and for formulating requests, respectively.

The internal knowledge base consists of Ontologies, Web Service classes and instances, Mediators and Goal classes. Given a Goal instance description as input, the output is an ordered list of references to Web service instances.
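The following Java sketch illustrates the two functionalities as a client might exercise them; the interface and method names are hypothetical stand-ins, since the report does not detail Glue2's actual programming interface.

import java.util.List;

// Hypothetical view of Glue2's two main operations (names are illustrative).
interface DiscoveryEngine {
    void publish(String entityDescription);    // store e.g. a WS instance in the repository
    List<Match> discover(String goalInstance); // ranked matches for a Goal instance
}

// A discovery result: a Web service reference plus its level of match.
class Match {
    final String serviceUrl;
    final String description;
    final double matchLevel; // used for ranking
    Match(String serviceUrl, String description, double matchLevel) {
        this.serviceUrl = serviceUrl;
        this.description = description;
        this.matchLevel = matchLevel;
    }
}

class DiscoveryClient {
    static void run(DiscoveryEngine glue2) {
        glue2.publish("WS instance: transport of frozen goods across Emilia-Romagna");
        List<Match> matches =
                glue2.discover("Goal instance: one quintal from Milan to Bologna");
        for (Match m : matches) {
            System.out.printf("%s (match level %.2f)%n", m.serviceUrl, m.matchLevel);
        }
    }
}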

3.3 Test process

3.3.1 The prototype technical test

Glue2 is used in two different contexts, at design time and at run time:

- At design time the purpose is feeding the Glue2 repository, by inserting the domain ontologies, the Goal classes, and the WS classes and instances, as well as the mediators. Even if the action of populating the internal repository of Glue2 is always done in the same way, conceptually the entities inserted into the repository are different and are usually inserted in two different moments. Firstly, the domain in which Glue2 is going to run is modelled by creating a set of ontologies. Then, several Web service classes representing all the possible types of service offers are described with respect to the domain ontologies; the same is done for the Goal classes, in order to describe all the possible requests. Finally, some mediators are created to describe the rules that verify the conditions under which a service satisfies a goal. This first set of entities is inserted into the repository in order to set up the system and make it usable; this is done only once, at the beginning. At a later time, providers can describe their own real services referring to the available Web service classes and insert them into the Glue2 repository as Web service instances; this is done every time a provider makes a new service available.

- At run time the Service Discovery takes place, starting from a Goal instance and taking into account the functional (mandatory) and non-functional (optional) properties of the desired service.

The test data to demonstrate the LOG-B scenario include the following entities.



- A set of domain ontologies provides the terminology to be used. Many ontologies have been created, each one focusing on a particular aspect.

  o The “temporal ontology” models both temporal concepts, like date (as an instant having a day, a month and a year), time (i.e. an instant with hours, minutes and seconds) and interval (having a starting instant and an ending instant), and axioms that calculate values related to time, like the distance between two dates or the intersection between intervals.

  o The “location ontology” describes locations like provinces, cities and so on.

  o The “goods ontology” models goods typologies, like deep frozen goods, fresh meat and so on.

  o The “logistic ontology” models concepts related to the logistic domain, like warehouse, truck, payment modalities, insurance, package, units of measure and many other logistic aspects.



- A Web service class describes a generic logistic service, which offers the transport of goods from one location to another by means of a set of trucks and warehouses. This Web service class allows providers to specify the covered geographical areas, a standard price list, the minimum number of hours between pickup and delivery, the minimal accepted quantity of goods to transport, a possible deduction for repetitive travels, and other additional characteristics such as the accepted payment modality, the provided insurance, etc.

- A Goal class describes a generic request for the transport of a generic good, from one location to another in a specified time interval.

- A mediator connects the Web service class to the Goal class described above and includes all the rules that a Web service instance has to satisfy in order to match a Goal instance. An example of such a rule is the one that checks whether the locations stated in the Goal instance for picking up and delivering the good are covered by the Web service instance.

- 25 Web service instances model 25 different logistic services with different features. Each one represents the offer of a specific logistic operator, including information on the type of fleet, area of work, types of material, warehouses and commercial conditions. They differ in the covered geographical areas, the types of goods that they are able to transport, and the fleet and warehouses they own.

- One Goal instance, expressing the request for the transportation, in a unique route, of one quintal of diving gear from a shop in Milan to a school located in Bologna. Other details are specified: goods can be picked up from 13/03/2008 and must be delivered within 15/03/2008, transportation is at the sender's expense and the freight cannot be aggregated with other freights. In addition, the Goal instance includes some preferences on the payment method (carriage paid), the payment deadline (at least 15 days), the insurance (refund for loss), the base price (less than or equal to 50 Euros) and the maximum number of hours to delivery (less than or equal to 72 hours).

The discovery process identified 4 WSs which match the functional properties, ordered by the degree of matching with the non-functional properties.

3.3.2 The LSM web application

The LSM web application, realised by SATA, is currently used at the Broker service activated in the Modena province by ITL (a Foundation of the Emilia-Romagna region on Transportation and Logistics), with the support of CAP (Consorzio Attività Produttive - Productive Areas Consortium) and the participation of 4 manufacturing companies (www.mo.brokerlogistica.net).

The service is operated by a young engineer with the support of the LSM application, whose main functions are:

- Transportation order acquisition

- Finite capacity planning (to optimise the use of the internal fleets by the manufacturing companies), aggregating orders into missions for internal fleets

- Infinite capacity planning (to identify possible transportation solutions offered by logistic operators and compare them), aggregating orders into missions for logistic operators

- Mission assignment and confirmation (information exchange with the logistic operators)

All these functions use a knowledge base where types of materials, types of trucks and types of packages (as well as their mutual relations) are coded, internal fleet agendas are updated, and the geographical sites to visit (customers, suppliers and sub-contractors of the manufacturing companies using the service) are characterised in terms of geo-reference, area (cluster) and mutual distances.

3.3.3 Integration scenarios

A first integration scenario between the LSM application and the WSMO Discovery Engine was proposed in deliverable D3.1.1, based on the assumption that transportation orders are the goals, while web services express the logistic operator offers.

This hypothesis is still valid, although it corresponds to a specific use case of the LSM application, i.e. when a transportation order is urgent and the best transportation solution should be found without waiting for other orders which could potentially be aggregated into the same route.

After a deeper analysis, it was agreed that it is worth investigating also the most typical use case, assuming that the goal is a mission, i.e. a complex route satisfying several orders in the same geographical cluster. A mission created by the LSM application is characterised by a number of relevant properties, like time duration, maximum weight, maximum volume, maximum surface, geographical cluster, and type of materials to carry. These features can be compared with WS properties characterising the transportation means and conditions offered by every logistic operator.

Note that this comparison is not trivial, as a number of empirical rules are applied in reality to make a given mission fit only the transportation offers meeting constraints on load type and truck type. Examples of such constraints are:

- If the material is iron and the weight is higher than 2 tons (mission data), then the truck must be able to carry more than 2.5 tons and must be equipped with a hydraulic side.

- Iron materials and plastic shapes cannot be carried on the same vehicle.

Such constraints are being collected and will be implemented in the Discovery Engine knowledge repository to realise the desired match between goal and services at run time; a sketch of one possible encoding is given below.
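As a concrete illustration, the two empirical rules above can be encoded as simple predicate checks in Java. This is a hedged sketch: the actual rule representation inside the Discovery Engine knowledge repository may well differ.

import java.util.Set;

// Illustrative encoding of the two empirical rules quoted above.
class Mission {
    final String material;   // e.g. "iron" or "plastic shapes"
    final double weightTons;
    Mission(String material, double weightTons) {
        this.material = material; this.weightTons = weightTons;
    }
}

class TruckOffer {
    final double capacityTons;
    final boolean hasHydraulicSide;
    final Set<String> loadedMaterials; // materials already on board
    TruckOffer(double capacityTons, boolean hasHydraulicSide,
               Set<String> loadedMaterials) {
        this.capacityTons = capacityTons;
        this.hasHydraulicSide = hasHydraulicSide;
        this.loadedMaterials = loadedMaterials;
    }
}

class CompatibilityRules {
    static boolean matches(Mission m, TruckOffer t) {
        // Rule 1: iron above 2 tons needs capacity over 2.5 tons and a hydraulic side.
        if (m.material.equals("iron") && m.weightTons > 2.0
                && !(t.capacityTons > 2.5 && t.hasHydraulicSide)) {
            return false;
        }
        // Rule 2: iron materials and plastic shapes cannot share a vehicle.
        if ((m.material.equals("iron") && t.loadedMaterials.contains("plastic shapes"))
                || (m.material.equals("plastic shapes") && t.loadedMaterials.contains("iron"))) {
            return false;
        }
        return true;
    }
}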

3.4 Current results and next steps

Such integration, here presented in general terms, will be realised by the end of the project.





4 ICT-A scenario: “Search for partners in ICT clusters”

4.1 Short description

This scenario represents the context in which a company belonging to an ICT cluster decides to search for partners (of the same sector) to start negotiating and collaborating with. The company uses the search functionality whenever it needs to identify new potential partners, buy products and services from other companies (possibly competitors), or rent resources of different kinds (human and technological).

With respect to the NeP4B architecture, this scenario is particularly suited to an INTRA-PEER application. In fact it is likely that the ICT company cluster is fully represented in the same semantic peer, grouping together actors with common interests. This does not necessarily exclude the INTER-PEER case, because collaboration across different clusters would of course be very useful.

In this scenario the search for partners, as well as the search for products and resources, is implemented in the form of queries applied to the web sites of the involved companies.

4.2 Tested prototypes

Three prototypes were demonstrated on the ICT-A scenario in task T3.3:

- P3.2.1.1 "A system for extracting metadata and annotating (semi-)structured data sources"

- P3.2.1.2 “A system for the incremental building of the PVV”

- P3.2.1.4 “A system for query reformulation and conflict resolution for (semi-)structured data sources”

4.3 Test process

The CNA-ICT cluster of the Modena province participated actively in the prototype technical test, by making available information about their profile and offer. The XML files and their reference schema are attached in Appendix C.

Fourteen companies were involved, belonging to four sectors: software (SW), automation (AT), telecom and networking (CM), and graphics (GR) (although the distinction between sectors is sometimes very fuzzy). The current web sites of such companies are built for advertising purposes, and do not contain relevant and complete information on company profile and offer. Therefore, SATA interviewed the CNA ICT actors and coded the collected data into 4 different XML schemas, each characterized by its own .dtd file. Each schema can be intended as a separate source, each including 3-4 instances. For the purpose of generality, the same source groups companies of different sectors. In more detail:

- Schema 1: Leonardo Multimedia (SW), Elefondati (CM), Progel Engineering (AT), CopiaModena (GR)

- Schema 2: Hars-CastGroup (SW), Delin elettronica (AT), Miliaris (SW), Tel&Co (CM)

- Schema 3: Archivist (SW), General Teleinformatica (CM), Grafiche Alice (GR)

- Schema 4: SATA (SW), FG software (SW), Tipolito Salvioli (GR)

Each schema was modeled in a different style, emphasising each time the different aspects shown in the table below as Main feature. The table also shows the maximum depth of the XML hierarchical structure as well as the number of nodes.


The ICT-A scenario includes activities to be performed at design time (publication), i.e. whenever a new data source is available for data integration, and activities to be performed at run time, i.e. the search for partners (query). Publication is of interest for software integrators building data warehouses from different databases, realizing vertical portals involving several independent enterprises, and making the different information systems of merging companies interoperate. Query, instead, is the function of interest for the final user, typically a non-ICT expert.

In task T3.3 much effort was spent in demonstrating the publication phase, while the query phase was tested in its core functions; the final interface (P3.2.1.6) will be integrated in the final prototype development. Thanks to the data collected in the field and organized into four different data sources, the publication environment (P3.2.1.1 + P3.2.1.2) was tested with respect to many schemas with few instances each, while the typical test beds available so far consisted of a couple of schemas with a lot of instances each.

A preliminary automation step to facilitate the work of the system integrator is translating the XML representation of the source data into a relational format compliant with a specific RDBMS platform. This was basically done by taking advantage of the features of the Altova suite, in particular the XMLSpy and MapForce tools, and adding further Java code, thus realising a seamless automated conversion process. Let's now examine the process details.

The conversion between the XML and the relational representation formats is twofold, since it actually works on both the schema and the instance data. The schema conversion process is in charge of defining a valid relational schema that maps the XSD schema adopted for the source data representation. Taking into account the natural hierarchical structure of an XSD schema, a first Java process was implemented in order to adapt the source XSD by adding some extra fields representing the concepts of relational keys and foreign keys, and by applying other patterns in order to correctly manage multiple cardinalities.

The adapted XSD can be turned into a relational schema by leveraging the functions provided by Altova XMLSpy. The final result is a relational schema derived from the XSD schema. The work on the two schemas must be completed by writing the conversion rules, thus enabling an automatic translation of each instance compliant with the source XSD into a set of SQL statements able to populate the relational schema previously defined. This last step is undertaken by exploiting the Altova MapForce functions that produce the Java code implementing all the conversion rules.



        Max depth   # of nodes   Product characterisation                          Main feature
Sch 1   5           ~ 200        List of parameters                                Schema + general info
Sch 2   5           ~ 120        List of descriptions                              Collaborations
Sch 3   6           ~ 150        Family parameters + product/service parameters    Company profile, families of products/resources
Sch 4   6           ~ 130        List of descriptions                              Resources

The work on the schema is now complete. In order to complete the whole process, any valid XML instance representing source data must be modified in order to respect the adapted XSD. An additional Java process is dedicated to that. The adapted XML instance can then be processed by the Java code previously generated by Altova MapForce, thus completing the whole computation. Finally, it must be mentioned that the execution of the whole process is coordinated by proper Apache Ant tasks in order to ease the execution in an automated environment.
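To give a flavour of the instance-conversion step, the Java sketch below walks one flat XML element and emits a single SQL INSERT with a generated surrogate key (the xrd_pk column name is borrowed from the queries shown later in this chapter). It is an illustration of the idea only, not the Altova-generated code, and for simplicity it ignores nesting and foreign keys.

import java.io.File;
import java.util.concurrent.atomic.AtomicLong;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative sketch of the XML-instance-to-SQL step; not the generated code.
class XmlToSql {
    private static final AtomicLong keySequence = new AtomicLong(1);

    // Turns one flat XML element into an INSERT, adding a surrogate key column.
    static String toInsert(String table, Element e) {
        StringBuilder cols = new StringBuilder("xrd_pk");
        StringBuilder vals = new StringBuilder(String.valueOf(keySequence.getAndIncrement()));
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                cols.append(", ").append(n.getNodeName());
                vals.append(", '").append(n.getTextContent().replace("'", "''")).append("'");
            }
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ");";
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        Element root = doc.getDocumentElement();
        System.out.println(toInsert(root.getNodeName(), root));
    }
}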

P3.2.1.1 "
A system for extracting metadata and annotating (semi
-
)structured data
sources
".

The system, named ALA (Automatic Lexical Annotation), was implemented as an integrated
component within the MOMIS system. The process of disambiguation, is rea
lized through the
combination of 4 different algorithms: the structural algorithm (SD), the WordNet Domains
(WND) algorithm, the Gloss Similarity algorithm and the Iterative Gloss Similarity algorithm. The
algorithms can be combined in two different ways:
consecutively or simultaneously. In the first
way, the annotations obtained with an algorithm, can be used as input for the following one.
With the last way, the output of each method contributes to the final annotation; each method is
executed autonomous
ly. The user can choose the methods to be used for the annotation. In the
case of sequential execution, the user chooses the order of the methods. All this functionalities
will be available to the user by a GUI. The system, receives as input a set of terms
, and all
structural relationships automatically extracted from the source data. As output, the software
returns a set of the terms, each annotated with one or more meanings. During the process, the
tool interacts with the WordNet lexical database in order

to extract the synset associated to
each term. The image below, shows the GUI that displays the result of the automatic
disambiguation process. The left panel, shows the list of sources to be integrated, while the
right panel, clicking on a data source te
rm, shows the list of meanings associated with the term.




Figure
5

The GUI of the automatic annotation output


We experimented with the automatic annotation tool ALA over the ICT scenario data sources. The annotation process starts without any partial annotation of the data source (this means that the entire work is delegated to the automatic annotation). Every term is annotated with more than one synset. Annotation was executed with the default configuration:

- the algorithms were run simultaneously, so the output of each method contributes to the final annotation (see the sketch after this list);

- the WND algorithm considers only the first 4 prevalent domains;

- every algorithm considers all the terms that belong to the same source as context;

- every compound term (identified by camel-case, underscore ("_") or similar notations) has an annotation for every term that belongs to the compound term.
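The two combination modes referenced in the first bullet can be pictured with the following Java sketch. It abstracts the control flow only: the four ALA algorithms themselves are not reproduced here, and the merge policy of the simultaneous mode (union, in this sketch) is an assumption.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// An annotation maps each term to a set of candidate WordNet synset ids;
// an annotator (one of the four algorithms) refines or extends such a mapping.
interface Annotator extends Function<Map<String, Set<String>>, Map<String, Set<String>>> {}

class AlgorithmCombiner {
    // Consecutive mode: each algorithm's output feeds the following one.
    static Map<String, Set<String>> consecutive(Map<String, Set<String>> input,
                                                List<Annotator> algorithms) {
        for (Annotator a : algorithms) {
            input = a.apply(input);
        }
        return input;
    }

    // Simultaneous mode: every algorithm runs autonomously on the same input
    // and each output contributes to the final annotation (merged here by union).
    static Map<String, Set<String>> simultaneous(Map<String, Set<String>> input,
                                                 List<Annotator> algorithms) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Annotator a : algorithms) {
            a.apply(input).forEach((term, synsets) ->
                    merged.computeIfAbsent(term, t -> new HashSet<>()).addAll(synsets));
        }
        return merged;
    }
}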

The table below shows the results obtained with the ALA tool. The precision and recall values increase with the combined-algorithms approach, compared with the values obtained by applying the algorithms individually.

[Table: precision and recall of the ALA tool, for the individual algorithms and for the combined approach.]

In addition, the tool is able to detect multiple possible meanings for a term. For example, the term "product", in the context of these sources, has more than one correct meaning. As shown in the picture below, the annotation tool is able to identify two meanings as correct for the term "product".

[Figure: the result of the automatic annotation tool.]

[Figure: the meanings associated by the automatic annotation tool to the term “product”.]


P3.2.1.2 “A system for the incremental building of the PVV”

Starting from the annotated local sources, the tool provides a methodology for the incremental building of the Peer Virtual View (PVV).

The first step relies on manually refining the obtained annotation, by deleting the incorrect annotations defined by the MOMIS ALA. For example, the following figure shows the prototype interface to delete a synset (by unchecking a “sense”) for a term.

[Figure: the prototype interface for unchecking a sense associated to a term.]

The second step computes the similarity functions among terms by means of the WordNet relationships, generating a set of relations called the Common Thesaurus.

Then, a parametric cluster algorithm defines a rough PVV, grouping the most similar classes (see the figure):

[Figure: the rough PVV produced by the cluster algorithm.]

The prototype permits refining the PVV through a drag-and-drop function, to correct mistakes and imprecision or to add complex mappings between the global and local representations.

The obtained Peer Virtual View is made of:

- 33 global classes

- 104 local classes

- 237 global terms



P3.2.1.4 “A system for query reformulation and conflict resolution for (semi-)structured data sources”

We tested the Query Manager prototype on the CNA-ICT cluster scenario considering three different search test cases:

- Product search (query 1): find all the partners that are selling the “KYOCERA FS” model of printer

- Service search (query 2): find all the partners that are offering training services

- Generic partner search (query 3): find all the Italian partners belonging to the software sector

Queries can be composed using a GUI provided in the prototype. The GUI, shown in Figure 6, presents the ontology in a tree representation, showing ISA relationships among the classes. The user can select the global classes to be queried, and with a simple click their attributes are shown in the “Global Class Attributes” panel. The attributes of interest can then be selected, specifying, if necessary, a condition in the “Condition” panel with the usual relational predicates and logic operators. More than one global class can be joined just by choosing one of the “Referenced Classes” of the currently selected class, with no need to specify any join condition between the classes, as it is inserted automatically. The graphical query, including the selections and conditions specified by the user, is then automatically translated into an SQL query and sent to the Query Manager to be executed.


Figure 6. The Graphical User Interface for composing queries on the PVV

The figure above shows an example of the formulation of query 2: “find all the partners that are offering training services”. The user selects the class company from the tree on the left side representing the PVV. All the attributes of the class company are then shown in the tree in the middle panel. Then, the user adds to the selection the “Referenced Class” service. All the attributes of this class are then automatically added to the “Global Class Attributes” panel, and the user may select attributes from this global class as well. To find all the partners that offer training services, the user selects the attributes description, price and xrd_pk of the class service, and the attributes companyName, email and companyId of the class company. To restrict the query to “training services” only, it is sufficient to add in the “Condition” panel the condition “description like formazione” when the description attribute is selected.

Then, clicking the button “Execute Query”, the following query is composed, shown in the right side panel, and sent to the Query Manager:

SELECT s.description, s.price, c.companyName, c.email, c.companyId, s.xrd_pk
FROM company as c, service as s
WHERE s.company_xrd_fk = c.xrd_pk
AND s.description LIKE '%formazione%'

The query expressed on the PVV (global query) is rewritten as an equivalent set of queries expressed on the local schemata (local queries); this query translation is performed by considering the mapping between the PVV and the local schemata. The Query Manager follows a GAV approach, thus this mapping is expressed by defining, for each global class G, a mapping query qG over the schemas of the set of local classes L(G) belonging to G. The query translation is then performed by means of query unfolding, i.e., by expanding a global query on a global class G of the PVV according to the definition of this mapping query qG.

When the global query is expressed as a join query among a set of global classes, it is expanded into a set of global subqueries, each one referring to a single global class. Every “single global class” query is rewritten and executed concurrently, and the results coming from the expanded queries are then fused together and presented to the user.
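A minimal Java sketch of this concurrent execute-then-fuse step follows; it is illustrative only, since the Query Manager's internals are not reported here, and the real fusion also applies mapping queries, data conversion functions and conflict resolution (omitted below).

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: run the "single global class" queries concurrently,
// then collect the partial results for the fusion step.
class ConcurrentUnfolding {
    static List<String> execute(List<Callable<List<String>>> singleClassQueries)
            throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, singleClassQueries.size()));
        try {
            List<Future<List<String>>> futures = pool.invokeAll(singleClassQueries);
            List<String> fused = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                fused.addAll(f.get()); // gather each query's partial result
            }
            return fused;
        } finally {
            pool.shutdown();
        }
    }
}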

Query 1: the query ‘Find all the partners that are selling the “KYOCERA FS” model of printer’ has been written in our system as the following global query:

SELECT c.companyId, c.companyName, p.productDescription, p.price
FROM company as c, product as p
WHERE p.company_xrd_fk = c.xrd_pk
AND p.productDescription LIKE '%stampant%KYOCERA FS%'

Query 2: the query ‘Find all the partners that are offering training services’ has been written in our system as the following global query:

SELECT s.description, s.price, c.companyName, c.email, c.companyId, s.xrd_pk
FROM company as c, service as s
WHERE s.company_xrd_fk = c.xrd_pk
AND s.description LIKE '%formazione%'

Query 3: the query ‘Find all the Italian partners belonging to the software sector’ has been written in our system as the following global query:

SELECT c.companyId, c.companyName, s.companySector, l.city, l.country
FROM sector as s, company as c, mainLocation as l
WHERE s.company_xrd_fk = c.xrd_pk
AND l.company_xrd_fk_1 = c.xrd_pk
AND s.companySector LIKE '%software%'
AND l.country = 'Italia'


The query execution process is described here for query 3. The global query is automatically expanded into three “single global class” queries over the involved global classes sector, company, and mainLocation:

- Single class query on sector:

SELECT s.companySector, s.company_xrd_fk FROM sector as s
WHERE (s.companySector like '%software%')

- Single class query on company:

SELECT c.companyId, c.companyName, c.xrd_pk FROM company as c

- Single class query on mainLocation:

SELECT l.city, l.country, l.company_xrd_fk_1 FROM mainLocation as l
WHERE (l.country = 'Italia')


By means of the mappings between global and local classes, each single global class query is rewritten into a set of local class queries, taking into account the translation functions expressed in the mapping table. For example, the mapping between the global attribute mainLocation.company_xrd_fk_1 and the local class mainLocation is expressed as follows:

MT[company_xrd_fk_1][CastGroup.mainLocation] = (SELECT l.company_xrd_fk FROM locations AS l WHERE locations_xrd_fk = l.xrd_pk)

This mapping retrieves the company_xrd_fk data, which are not in the mainLocation local class, by querying the locations local class. Then, the single class query on mainLocation for the CastGroup data source is rewritten as follows:

SELECT (SELECT l.company_xrd_fk FROM locations AS l WHERE locations_xrd_fk = l.xrd_pk) AS locations_xrd_fk, mainLocation.city, mainLocation.country
FROM mainLocation
WHERE (country) = ('Italia')


By means of the mappings between the global class mainLocation and the local data sources,
twelve local queries are generated and concurrently executed over the sources. The partial
results of the local classes are fused together according to the mapping queries, data conversion
functions are applied, and possible conflicts are solved by the resolution functions. Finally,
the results on the global classes are joined and presented to the user.
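
A sketch of this concurrent execution and fusion step; run_local_query and the row
dictionaries are hypothetical placeholders, not the prototype's actual implementation:

# Illustrative sketch: run local queries concurrently, fuse the partial
# results per global class, and join the global classes as in Query 3.
from concurrent.futures import ThreadPoolExecutor

def run_local_query(source, sql):
    """Placeholder: execute sql on the given source; returns rows as dicts."""
    return []  # a real implementation would query the wrapper for `source`

def execute_global_class(local_queries):
    """local_queries: list of (source, sql) pairs for one global class."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda sq: run_local_query(*sq), local_queries)
    return [row for part in partials for row in part]   # fuse partial results

def join_global_results(companies, sectors, locations):
    """Join the per-class results on the mapped foreign keys (cf. Query 3)."""
    by_id = {c["xrd_pk"]: c for c in companies}
    out = []
    for s in sectors:
        c = by_id.get(s["company_xrd_fk"])
        if not c:
            continue
        for l in locations:
            if l["company_xrd_fk_1"] == c["xrd_pk"]:
                out.append({**c, **s, **l})
    return out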






5 B&C scenario: “Marble identification”

This new scenario substitutes the ICT-B scenario identified in task T3.1 and documented in
deliverable D3.1. The main reasons were the limited number of multimedia objects available
(training and demonstration videos), with the consequent risk of low accuracy of the
classification process, and the “passive” role of the multimedia knowledge with respect to the
textual knowledge (the user search is focused on words and sentences instead of colours,
frames, and images).

5.1 Short description

This scenario is based on the automatic analysis and sharing of multimedia contents as well as
simple text documents, as described in full detail below. The main actors are the marble
customers, aiming at finding the products that best fit their needs, and the marble enterprises.
Searches are performed using multimedia and text documents, making use of advanced
distributed search and analysis systems.

Usually, requests to marble industries are made by means of images of the desired materials
and short textual descriptions sent by the clients. Each enterprise has to manually analyze the
material sent by the customer, and clients have to send such requests to one producer at a
time. In this scenario we propose a Semantic Peer Multimedia Repository system intended to
efficiently support the distributed resolution of such queries. This implies the use of the
multimedia and textual documents owned by the marble industries. The importance of this
approach lies in the expected economic advantages, since the alternative approach of manually
analyzing orders (for the enterprises) and finding the best producer (for the clients) is time-
and cost-consuming. In this scenario both distributed multimedia similarity search systems
and classification algorithms will be used.

In this scenario we consider SMEs trading marbles and granites. Each enterprise buys raw
marble and granite blocks and cuts them into slabs that are smoothed and waxed. The
obtained slabs are sold to their final customers.

All these SMEs maintain a vast multimedia database of high-quality images of each slab they
have. Each slab is categorized on the basis of its type of material. One of the main reasons for
the existence of this database is that images are used as the basis for starting commercial
transactions. In fact, the interaction between this type of enterprise and their clients usually
starts with a generic request made by the customers, where they only send an image of the
material they want to buy. It is a common case that clients see the marble of their interest on
a building site or on a web site. Then, they take a picture of this material and send it to a
marble industry, asking if it is able to satisfy such a request. Usually, they are not able to
exactly specify the type and quality of the marble. Enterprises then have to devote part of the
time of their marble experts to trying to identify the materials involved in the request. This can
be an iterative process, where clients are requested to send more details and better images. In
order to ease the work of the experts, marble industries maintain images of all the slabs they
have worked on, since these constitute a good basis of comparison for the user requests.
Moreover, images are categorized in order to speed up the work of comparison.

In this scenario, the NeP4B architecture is used to solve the marble trading problem. Marble
SMEs could constitute a specific peer in the network, where users can submit queries in the
form of a mix of images and textual data. Each SME can use a multimedia and metadata
management system in order to store its archive and, later, analyze queries. Images are
compared using standard MPEG-7 features. Results are obtained by combining multimedia
features and textual data about the requested items.

This scenario is developed in cooperation with Metro SPA Marmi e Graniti¹, a company that
has gained long-lasting experience in the trade of high-quality classical and precious marbles
and granites from all over the world. Metro SPA offers its customers in Italy and abroad a vast
range of slabs and cut-to-size products of any type at competitive prices. It also produces
special works on request to satisfy the needs and requirements of the building and
architectural market. The contact person of Metro Marmi SPA is Gianluca Fabrizi.

The company is representative of the marble district in Massa-Carrara (90% of the marble
treated and sold is imported), where companies work on different phases along the marble
processing life cycle (cutting, finishing, packaging).

Images play a fundamental role, as marble is a natural product. For this reason two marble
slabs are never equal to each other, so similarity search is of utmost importance at least in the
following phases:

- Catalogue and sales (85% of sales are based on images),
- Shipment documents,
- Production monitoring.

Concerning the use of images in the marble life-cycle phases, they are very critical when:

- Archiving marketing images,
- Archiving production images,
- Archiving sales images,
- Searching in marketing, production and sales archives for images similar to a sample
image provided by the customer. For this search an automatic classification mechanism
could be very useful.

5.2 Tested prototypes

Two prototypes were demonstrated on the B&C scenario in task T3.3:

- P3.2.1.3 "A system for extracting metadata and annotating textual and multimedia data
sources".
- P3.2.1.7 "A system for aggregate querying data sources and multimedia sources".

P3.2.1.7 (i.e., MILOS) is a Multimedia Content Management System based on a powerful
multimedia database, able to guarantee advanced features for the persistence, search, and
retrieval of multimedia content described by metadata encoded in XML documents.

P3.2.1.7 supports the persistence and the management of the output of P3.2.1.3 for multimedia
document classification. The classification technique will be based on a set of training examples
provided by the user of the repository. This means that the method can be seen as a
semi-supervised learning method, since only a subset of the training data must be manually
classified. The training data can be either textual data or non-textual data (such as images).

Moreover, once the text and multimedia independent classifiers have been generated, we will
further attempt to improve them by means of a semi-supervised learning technique known as
co-training.
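
For illustration, a minimal co-training loop might look like the following Python sketch; the
two-view split (text vs. image features), the classifier choice, and all thresholds are
assumptions made for the example, not the prototype's actual algorithm:

# Hedged sketch of co-training with two independent views (e.g. text
# features and MPEG-7 visual features). Classifier and thresholds are
# illustrative choices only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_training(X_text, X_img, y, labeled_idx, rounds=5, per_round=10):
    """X_text/X_img: two feature views of the same items; y: label array
    with valid entries only at labeled_idx. Returns the two classifiers."""
    y = np.array(y)
    labeled = list(labeled_idx)
    for _ in range(rounds):
        clf_t = LogisticRegression(max_iter=1000).fit(X_text[labeled], y[labeled])
        clf_i = LogisticRegression(max_iter=1000).fit(X_img[labeled], y[labeled])
        for clf, X in ((clf_t, X_text), (clf_i, X_img)):
            pool = [i for i in range(len(y)) if i not in set(labeled)]
            if not pool:
                return clf_t, clf_i
            proba = clf.predict_proba(X[pool])
            # each view pseudo-labels the examples it is most confident about,
            # which then extend the training set seen by the other view
            confident = np.argsort(proba.max(axis=1))[-per_round:]
            for j in confident:
                y[pool[j]] = clf.classes_[proba[j].argmax()]
                labeled.append(pool[j])
    return clf_t, clf_i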





¹ Metro SPA Marmi e Graniti, Via Dorsale 9, 54100 Massa (MS), http://www.metromarmi.it, tel: 0585 792363


5.3 Test process

5.3.1 The prototype technical test

Two user profiles have been identified in the B&C scenario:

- Internal users: the sales persons willing to satisfy customer requests at best (also
proposing alternatives if needed);
- Final users: the customers capturing a marble image somewhere (in a building, on the
web, at an exhibition) and searching for the best similar blocks.

5.3.2 Classification Phase

In order to store the information in MILOS, we had to integrate the MPEG-7 metadata
descriptors with the classification information. We decided to put everything under the XML
root tag <NeP4B>. Under this tag we put the metadata obtained from the .msi files together
with the information encoded according to the MPEG-7 standard. The metadata contained in
the .msd files were neglected for now.

The information corresponding to the classification was reported in the MPEG-7 description as
previously decided within the NeP4B project, as in the following:

<Mpeg7>
  <Description type="ContentEntityType">
    <MultimediaContent type="ImageType">
      <Image>
        <CreationInformation>
          <Classification>
            <Subject confidence="0.7486195894231509">
              <KeywordAnnotation xml:lang="it">
                <Keyword>ROSA_PORRINO</Keyword>
              </KeywordAnnotation>
            </Subject>
          </Classification>
        </CreationInformation>
      </Image>
    </MultimediaContent>
  </Description>
</Mpeg7>


The value of the confidence attribute of the <Subject> element reports the score attributed to
the category assigned by the classifier. If this value is "1.0", the document is part of the
training set. Please note that the information on the manually annotated material is always
available in the <slab-info> element, and in particular in the material attribute.

Here is an example of a complete description:


<NeP4B>
  <slab-info id="9335" code="6" block-id="284" block-code="01"
             material="ROSA_PORRINO" thickness="20">
    <finishings></finishings>
    <dispositions></dispositions>
    <measures dpi="107.21346892416">
      <measure code="ext" x="0" y="0" width="15439" height="10240">
        External slab size
      </measure>
      <measure code="mar" x="986" y="1256" width="13343" height="7829">
        Full slab size
      </measure>
      <measure code="com" x="1488" y="1456" width="12337" height="7428">
        Commercial slab size
      </measure>
      <measure code="int" x="1162" y="1360" width="12990" height="7622">
        Internal slab size
      </measure>
    </measures>
    <notes></notes>
  </slab-info>
  <Mpeg7>
    <Description type="ContentEntityType">
      <MultimediaContent type="ImageType">
        <Image>
          <CreationInformation>
            <Classification>
              <Subject confidence="0.7486195894231509">
                <KeywordAnnotation xml:lang="it">
                  <Keyword>ROSA_PORRINO</Keyword>
                </KeywordAnnotation>
              </Subject>
            </Classification>
          </CreationInformation>
          <MediaLocator><MediaUri>2007-12-19-01-6.jpg</MediaUri></MediaLocator>
          <VisualDescriptor type="ScalableColorType" numOfBitplanesDiscarded="0"
                            numOfCoeff="64">
            <Coeff>
              -118 -18 -72 54 -24 10 22 29 24 10 11 22 -41 13 19 22 7 -3 0 2
              -2 5 0 0 -15 1 2 0 -6 5 1 -4 3 3 3 1 0 0 1 2 3 2 1 3 1 2 4 5 -15
              -3 1 2 2 3 3 0 -3 0 0 -2 1 0 -3 -3
            </Coeff>
          </VisualDescriptor>
          <VisualDescriptor type="ColorStructureType" colorQuant="2">
            <Values>
              0 0 0 0 0 0 0 0 187 235 0 0 0 0 0 0 0 0 0 0 0 0 0 0 205 255 255
              87 31 46 6 0 0 0 0 0 0 0 0 0 32 112 58 2 12 30 4 0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0 0
            </Values>
          </VisualDescriptor>
          <VisualDescriptor type="ColorLayoutType">
            <YDCCoeff>32</YDCCoeff>
            <CbDCCoeff>17</CbDCCoeff>
            <CrDCCoeff>40</CrDCCoeff>
            <YACCoeff5>16 16 16 16 17</YACCoeff5>
            <CbACCoeff2>16 16</CbACCoeff2>
            <CrACCoeff2>16 16</CrACCoeff2>
          </VisualDescriptor>
          <VisualDescriptor type="EdgeHistogramType">
            <BinCounts>
              3 2 5 4 7 3 3 6 5 6 1 2 6 5 7 3 1 5 4 7 1 4 4 6 7 4 1 5
              2 7 3 1 3 5 7 2 2 5 5 7 2 2 6 4 7 4 2 4 5 7 2 3 6 4 7 1
              1 5 6 7 4 2 4 5 7 3 2 4 5 7 1 2 5 3 7 3 2 5 5 7
            </BinCounts>
          </VisualDescriptor>
          <VisualDescriptor type="HomogeneousTextureType">
            <Average>132</Average>
            <StandardDeviation>53</StandardDeviation>
            <Energy>
              143 184 184 153 183 187 159 172 173 154 173 171 153 183
              158 158 153 169 133 160 156 147 161 149 112 121 113 103 112 113
            </Energy>
            <EnergyDeviation>
              139 186 185 151 187 189 148 166 169 155 166 167 140 180 144
              142 137 164 122 135 154 128 157 147 96 99 90 89 83 98
            </EnergyDeviation>
          </VisualDescriptor>
        </Image>
      </MultimediaContent>
    </Description>
  </Mpeg7>
</NeP4B>
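
For reference only, such a description can be parsed with a few lines of standard Python; the
file name below is a hypothetical placeholder:

# Minimal sketch: extract the manual annotation and the classifier's output
# from a <NeP4B> slab description (file name is a placeholder).
import xml.etree.ElementTree as ET

root = ET.parse("slab_9335.xml").getroot()          # <NeP4B> root element
manual = root.find("slab-info").get("material")     # manually annotated class
subject = root.find(".//Subject")                   # classifier's Subject element
predicted = subject.findtext(".//Keyword")
confidence = float(subject.get("confidence"))
in_training_set = confidence == 1.0                 # convention described above

print(manual, predicted, confidence, in_training_set)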


A preliminary dataset for creating and testing the scenario is composed of a set of slab images
generated during the process of slab smoothing. The dataset has the following general
characteristics:

- the total number of slabs is 2597;
- the highest image resolution is 2560 vertical pixels;
- usually there is a lack of chromatic fidelity;
- frequently there is a pink contour around the slab image;
- few images are incomplete or corrupted;
- each slab is associated with metadata (containing information such as the thickness of
the slab, the type of material, the measures of the slab, etc.).



Figure 5.1 - Example of an image of a slab at maximum resolution

The list of material types and the corresponding number of slabs, taken from the XML
metadata, is the following:

Material              Count
ANDROMEDA             56
ANTIQUE_BROWN         214
ARANDIS_YELLOW        105
ARTIC_CREAM           34
ASTERIX               28
BLACK_COSMIC          24
BLU_EYES              18
BLU_PEARL             106
COL_GOLD              20
COLONIAL_DREAM        36
COPPER_CANYON         65
COSTA_SMERALDA        209
DESERT_BROWN          12
DIORITE               55
FANTASTICO            28
GALAXY_BLACK          31
GIALLO_ARABESCATO     24
GIALLO_IRIS           42
GIALLO_ORNAMENTALE    439
GIALLO_VENEZIANO      86
GOLDEN_BEACH          59
GOLDEN_FLAKES         20
JUP_APRICOT           32
JUP_PERSA             187
LABRADORITE           27
LEMURIAN              30
LOTUS                 46
MAGMA                 70
MASCARELLO            12
MOON_YELLOW           125
NERO_AFRICA           71
NETTUNO               20
ROSA_PORRINO          43
STAR_BEACH            53
TARN                  39
TROPIC_BROWN          6
VOLGA_BLU             125
Total                 2597



The result of the classification is, at the moment, a text file for each test document, with the
following format:

Roses     -0.17   20.0  -20.0  -0.811440077543585
buses     -0.194  20.0  -20.0  -0.6050249432150333
104000    -0.19   20.0  -20.0  -0.4664549321865972
mountain  -0.088  20.0  -20.0  -0.05791991333189391
colleges  -0.11   20.0  -20.0  -0.3632860527390948


The meaning of the various fields is:

1. The name of the category.
2. The value which defines the margin of the category classifier. A document is classified
positively for this category if it has obtained a score value (5th field) greater than or
equal to the margin value.
3. The maximum value that the score can assume.
4. The minimum value that the score can assume.
5. The score value obtained by this document with respect to the considered category. If
the score value is less than the margin value, the classifier “thinks” that the document
does not belong to the considered category.

For the sake of consistency, however, it was necessary to assign a category (i.e., a material) to
each slab of the test set. At the same time, it was not possible to accept two materials for the
same slab. We therefore decided to take as category the one for which score - margin was
maximum. This has been done even in those cases where this difference was negative and the
category would otherwise not have been assigned. The assessments of the behaviour (reported
below) refer to the process that generated the files mentioned above, and therefore do not
take this additional step into account. For this reason the total number of FP does not coincide
with the total number of FN.
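
The single-label assignment just described can be sketched as follows; the parsing assumes
the five-column file format shown above (category, margin, max, min, score):

# Sketch of the forced single-label assignment: pick the category whose
# (score - margin) is maximum, even when that difference is negative.
def assign_category(result_file):
    best_cat, best_delta = None, float("-inf")
    with open(result_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 5:
                continue
            category, margin, score = fields[0], float(fields[1]), float(fields[4])
            delta = score - margin
            if delta > best_delta:
                best_cat, best_delta = category, delta
    return best_cat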


The partition of the dataset into training set and test set was conducted at 50%, using
parameters that were later no longer the defaults in the latest version of the classifier. Below
we show these values:

KNN_MARGIN_START=0.01
KNN_MARGIN_END=40.0
KNN_MARGIN_STEP=0.05
KNN_K_START=2
KNN_K_END=40
KNN_K_STEP=2
COMMITTEE_MARGIN_START=0.01
COMMITTEE_MARGIN_END=40.0
COMMITTEE_MARGIN_STEP=0.05


In the next section we report the results obtained using other parameters and other
training-set sizes.

Global results

                 Precision  Recall  F1     Accuracy
micro-averaged   0.994      0.908   0.949  0.997
macro-averaged   0.995      0.913   0.946  0.997


Material            FN   FP   TN     TP    Training  Test  Total
ANDROMEDA           0    0    1263   28    28        28    56
ANTIQUE_BROWN       7    0    1184   100   107       107   214
ARANDIS_YELLOW      8    0    1239   44    53        52    105
ARTIC_CREAM         3    0    1274   14    17        17    34
ASTERIX             6    0    1277   8     14        14    28
BLACK_COSMIC        3    0    1279   9     12        12    24
BLU_EYES            0    0    1282   9     9         9     18
BLU_PEARL           0    0    1238   53    53        53    106
COL_GOLD            2    0    1281   8     10        10    20
COLONIAL_DREAM      0    0    1273   18    18        18    36
COPPER_CANYON       0    0    1259   32    33        32    65
COSTA_SMERALDA      15   0    1187   89    105       104   209
DESERT_BROWN        0    0    1285   6     6         6     12
DIORITE             0    0    1264   27    28        27    55
FANTASTICO          0    0    1277   14    14        14    28
GALAXY_BLACK        0    1    1275   15    16        15    31
GIALLO_ARABESCATO   0    0    1279   12    12        12    24
GIALLO_IRIS         2    0    1270   19    21        21    42
GIALLO_ORNAMENTALE  16   0    1072   203   220       219   439
GIALLO_VENEZIANO    5    1    1247   38    43        43    86
GOLDEN_BEACH        18   0    1262   11    30        29    59
GOLDEN_FLAKES       0    0    1281   10    10        10    20
JUP_APRICOT         0    0    1275   16    16        16    32
JUP_PERSA           10   3    1195   83    94        93    187
LABRADORITE         0    0    1278   13    14        13    27
LEMURIAN            0    0    1276   15    15        15    30
LOTUS               0    0    1268   23    23        23    46
MAGMA               0    0    1256   35    35        35    70
MASCARELLO          0    0    1285   6     6         6     12
MOON_YELLOW         12   1    1228   50    63        62    125
NERO_AFRICA         0    0    1256   35    36        35    71
NETTUNO             2    0    1281   8     10        10    20
ROSA_PORRINO        0    0    1270   21    22        21    43
STAR_BEACH          10   1    1264   16    27        26    53
TARN                0    0    1272   19    20        19    39
TROPIC_BROWN        0    0    1288   3     3         3     6
VOLGA_BLU           0    0    1229   62    63        62    125
Total               119  7    46469  1172  1306      1291  2597
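
As a cross-check of the table above, micro- and macro-averaged precision, recall and F1 can
be recomputed directly from the per-material FN/FP/TN/TP counts; a minimal sketch (the
counts dictionary shows just a few of the rows):

# Sketch: recompute micro/macro precision, recall and F1 from FN/FP/TN/TP.
# Only a subset of the rows from the table above is shown here.
counts = {
    "ANDROMEDA":     dict(fn=0, fp=0, tn=1263, tp=28),
    "ANTIQUE_BROWN": dict(fn=7, fp=0, tn=1184, tp=100),
    "GALAXY_BLACK":  dict(fn=0, fp=1, tn=1275, tp=15),
}

def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Micro-averaging pools the counts over all categories before computing P/R/F1.
TP = sum(c["tp"] for c in counts.values())
FP = sum(c["fp"] for c in counts.values())
FN = sum(c["fn"] for c in counts.values())
micro = prf(TP, FP, FN)

# Macro-averaging computes P/R/F1 per category and then averages them.
per_cat = [prf(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
macro = tuple(sum(m[i] for m in per_cat) / len(per_cat) for i in range(3))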


Below we report both micro- and macro-averaged results for training-set sizes of 10% and
50%. For the latter we report both the results obtained with the default parameters and those
obtained with the parameters we labelled “old”, reported afterwards.



Type                          Settings  Precision  Recall  F1     Accuracy
10% training, micro-averaged  def       0.674      0.957   0.791  0.986
50% training, micro-averaged  def       0.994      0.908   0.949  0.997
50% training, micro-averaged  old       0.911      0.985   0.946  0.997
10% training, macro-averaged  def       0.692      0.975   0.787  0.986
50% training, macro-averaged  def       0.995      0.913   0.946  0.997
50% training, macro-averaged  old       0.885      0.982   0.923  0.997

The parameters used are:

Parameter               default  old
KNN_MARGIN_START        -1.0     0.01
KNN_MARGIN_END          1.0      40.0
KNN_MARGIN_STEP         0.005    0.05
KNN_K_START             3        2
KNN_K_END               30       40
KNN_K_STEP              3        2
COMMITTEE_MARGIN_START  -20.0    0.01
COMMITTEE_MARGIN_END    20.0     40.0
COMMITTEE_MARGIN_STEP   0.01     0.05
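
These START/END/STEP triples suggest a grid search over the k and margin values; purely as
an illustration (the actual tuning procedure of the classifier is not documented here), such a
search could look like:

# Illustrative grid search over k and margin, mirroring the START/END/STEP
# parameters above. score_fn is a hypothetical placeholder that would train
# a k-NN classifier with the given k, apply the margin, and return e.g. F1.
import numpy as np

def grid_search(score_fn, k_start=2, k_end=40, k_step=2,
                m_start=0.01, m_end=40.0, m_step=0.05):
    best = (None, None, float("-inf"))
    for k in range(k_start, k_end + 1, k_step):
        for margin in np.arange(m_start, m_end + m_step, m_step):
            s = score_fn(k, margin)
            if s > best[2]:
                best = (k, margin, s)
    return best   # (best k, best margin, best score)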


5.3.3 MILOS Web Interface

The dataset and the results of the classification (considering 50% training and the “old”
parameters) are available through MILOS at: http://pc-duetto.isti.cnr.it:8090/nep4b-demo/

The page of the Search Interface application is shown in Figure 5.2. From this page users can
search for marble slabs by using a similarity search or an advanced search. The same
interface allows users to classify an image on-the-fly. In the case of similarity search, the user
can search by choosing a slab picture among those randomly proposed by the system. The
random slabs can be renewed on demand by the user.

In the case of advanced search, it is possible to search for pictures by providing one's own
image (using similarity), by expressing a free text (fulltext search), or by expressing
restrictions on the basis of the metadata fields of the slabs (fielded search). For instance, the
user can specify the class of material of the slab, the id of the block, the thickness of the slab,
etc.

Results are shown in the search page (Figure 5.3). From there, users can refine their queries
by choosing a picture in the results to submit a new similarity search, or by submitting an
advanced search query, which combines similarity, full text, and fielded search. For instance, a
user can search for images similar to the chosen one whose thickness is 30.
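
Conceptually, such a combined query filters on the metadata fields and then ranks the
survivors by visual similarity; a minimal sketch, assuming slabs have already been parsed into
dictionaries with a ScalableColor coefficient vector (the L1 distance is a common choice for
this descriptor, though MILOS's actual ranking function is not specified here):

# Sketch of a combined fielded + similarity query: filter by metadata,
# then rank by L1 distance between ScalableColor coefficient vectors.
# The `slabs` list and its fields are illustrative placeholders.
def combined_search(query_coeffs, slabs, thickness=None, material=None, top=10):
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    candidates = [
        s for s in slabs
        if (thickness is None or s["thickness"] == thickness)
        and (material is None or s["material"] == material)
    ]
    return sorted(candidates, key=lambda s: l1(query_coeffs, s["coeffs"]))[:top]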

In the case of on-the-fly classification, the user can submit his/her image and ask the system
to classify it on the basis of the current marble classification. Results are shown in the search
page (Figure 5.4). The system shows the best material classes matching the image example,
together with the degree of confidence (shown by placing the mouse on the presented slab
image).

Another way to submit a similarity query to the MILOS repository is to select an image from a
web site. For this purpose, we have developed a plug-in for the Firefox browser. The plug-in
allows the user to select an image from any web site during navigation and to send it to the
MILOS system (see Figure 5.5). After the submission, a web page is shown with a tool allowing
the user to select the area of interest from the image itself (see Figure 5.6). The result of the
crop is then submitted to the system and the results are presented to the user. This search
mode is useful to help the user select a piece of material from a web image showing an
example application of a product. The user can then refine his/her search exploiting the same
sub-image by adding restrictions to the query based on the metadata fields, fulltext, etc.


Figure 5.2 - Example main search interface of MILOS for the B&C scenario.

Figure 5.3 - Page showing the result list of a query.

Figure 5.4 - Page showing the result of the on-the-fly classification.

Figure 5.5 - Example of application of the selection tool on the Marble.com web site.

Figure 5.6 - Selection of the area of interest from the image (left), and result of the similarity
query (right).

5.4 Current results and next steps

Currently, as specified in the manual of the classifier, the result of the classification of a
document is a file that contains the score assigned to the document for each class and the
threshold (called margin) above which the document is decided to belong to the category. To
obtain the assignment of each slab to exactly one category, we had to use the workaround
explained above. The results of the evaluation of the classifier are relative to the classifier
itself and therefore do not take this workaround into account.

The cropping of the images could be performed on the basis of the information contained in
the metadata files provided by the company, rather than on the basis of a fixed percentage
(80% at the moment).

A doubt about the reliability of the test arises from the fact that the slabs of a same block are
practically identical. By splitting the dataset into training set and test set, it is very likely that,
given a slab of the test set, a slab of the same block is present in the training set. In this case,
the classification task is obviously easier and the results are therefore less reliable. A solution
could be to build the training set by taking all the slabs of a given block, so that no slab of the
test set comes from the same block as a slab of the training set. In practice, we must classify
blocks rather than single slabs. However, this problem will probably be overcome when more
slab images are made available by Metro Marmi.
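
A block-aware split of the kind proposed above can be obtained with a grouped split, for
example using scikit-learn's GroupShuffleSplit (the variable names are illustrative):

# Sketch of a block-aware train/test split: all slabs of a block end up on
# the same side of the split, so near-identical slabs cannot leak across sets.
from sklearn.model_selection import GroupShuffleSplit

def split_by_block(features, labels, block_ids, test_size=0.5, seed=0):
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(features, labels, groups=block_ids))
    return train_idx, test_idx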


In order to realize a scenario suitable for an advanced experimentation that also involves the
integration with MOMIS, we must provide various multimedia sources containing similar
information (i.e., marble, granite, etc.) described using different metadata. For this purpose
we are creating artificial multimedia datasets by downloading entire collections of materials
from web sites such as Marble.com or Graniteland.com.




6 Appendix A - Prototype technical tests

The attached file includes an xls table proposed by CEFRIEL/Polimi and used to define and
plan the technical tests carried out in this first validation phase. The file contains
work-in-progress material and is therefore in Italian.

7 Appendix B - Demo videos

Six videos were realized to show the results of the first validation phase, namely:

- Video-demo Semantic Navigation (HTML and Flash) - Scenario LOG-A
- MOMIS - PVV Construction and Querying (SWF and HTML) - Scenario ICT
- Milos Marble Search Interface Demo and Milos Select and Crop Tool Demo - Scenario B&C

8 Appendix C - ICT company profiles

The profiles of the CAN-ICT companies, coded according to different XML schemas, are
available in the attached file D331 Appendix C. The folder includes 4 sub-folders (Schema-1,
Schema-2, Schema-3, Schema-4), each including the XML profiles of the companies sharing
the same XML schema.