vicinanza_AMGA - EELA Documents


IST-2006-026409

www.eu-eela.org

E-infrastructure shared between Europe and Latin America

The AMGA metadata catalog with use cases


Domenico Vicinanza, CERN

EELA Tutorial, Santiago, September 2006


Santiago, Chile, EELA Tutorial, 06-07.09.2006


Contents

- Background and Motivation for AMGA
- Interface, Architecture and Implementation
- Metadata Replication on AMGA
- Deployment Examples
- GILDA Use cases


Introduction

- Data on the GRID are represented by millions of files spread over several sites
- To locate those of interest, users and applications need an efficient mechanism to query and discover information about their contents
- This is provided by
  - associating descriptive attributes (metadata) with the data
  - exposing this information in catalogues which can be queried to locate files by their attributes
- Metadata are typically modeled as (key, value) pairs
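
As a minimal sketch of the idea above (not AMGA code), metadata can be pictured as (key, value) pairs attached to file names, with a lookup that selects files by their attributes. The file names, keys and values below are hypothetical.

```python
# Hypothetical catalogue: logical file name -> metadata as (key, value) pairs.
catalogue = {
    "/grid/demo/run001.dat": {"Experiment": "demo", "Year": 2006, "Site": "A"},
    "/grid/demo/run002.dat": {"Experiment": "demo", "Year": 2005, "Site": "B"},
    "/grid/other/image.png": {"Experiment": "other", "Year": 2006, "Site": "A"},
}


def find_files(catalogue, **wanted):
    """Return the sorted names of files whose metadata match every wanted pair."""
    return sorted(
        name
        for name, attributes in catalogue.items()
        if all(attributes.get(key) == value for key, value in wanted.items())
    )


matches = find_files(catalogue, Year=2006, Site="A")
print(matches)
```

The caller never needs to know where the files are stored; it only states which attribute values it is interested in.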



Accessing metadata

- Accessing metadata is conceptually similar to accessing databases, but having clients go directly to the database is not the most convenient, secure or effective solution
- A better solution is to have a simple interface for metadata access on the GRID
- Such a metadata interface
  - should be defined in terms of metadata concepts (keys, values)
  - grants access to the metadata while hiding the DB structure and implementation (transparency)
  - should be fully compatible with distributed sources of information (distributed data storage)
  - would work as a simplified relational DB interface
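
The interface described above can be sketched as a small class that speaks only in keys and values and keeps the database out of sight. This is an illustration under assumptions: the class name, method names and table layout are hypothetical, and an in-memory SQLite database stands in for whatever backend a real service would use.

```python
import sqlite3


class MetadataCatalogue:
    """Hypothetical key/value interface; callers never see tables or SQL."""

    def __init__(self):
        # The storage engine is an implementation detail hidden behind the interface.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE metadata (entry TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (entry, key))"
        )

    def set_attribute(self, entry, key, value):
        """Attach or overwrite one (key, value) pair on an entry."""
        self._db.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
            (entry, key, str(value)),
        )

    def get_attributes(self, entry):
        """Return all (key, value) pairs of an entry as a dict of strings."""
        rows = self._db.execute(
            "SELECT key, value FROM metadata WHERE entry = ?", (entry,)
        )
        return dict(rows.fetchall())

    def find(self, key, value):
        """Return the entries whose attribute `key` equals `value`."""
        rows = self._db.execute(
            "SELECT entry FROM metadata WHERE key = ? AND value = ? ORDER BY entry",
            (key, str(value)),
        )
        return [entry for (entry,) in rows]


catalogue = MetadataCatalogue()
catalogue.set_attribute("/grid/demo/a.dat", "Owner", "alice")
catalogue.set_attribute("/grid/demo/a.dat", "Year", 2006)
catalogue.set_attribute("/grid/demo/b.dat", "Owner", "bob")
print(catalogue.get_attributes("/grid/demo/a.dat"))
print(catalogue.find("Owner", "bob"))
```

Because clients depend only on the three methods, the SQLite backend could be swapped for another store without changing client code, which is the transparency the slide asks for.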



Metadata services

A metadata service to be used in a GRID environment should satisfy some specific requirements:

- It must expose a complete but simple interface, so that non-technical users can use it easily
- It should be flexible and support dynamic schemas (since no single schema can support the whole application domain)
- It must support a hierarchical structure (related metadata can be grouped together and isolated from other metadata)
- Security is required to provide different access levels to different users


Basic concepts

The basic concepts of the metadata interface are

- Entries (the names of the data items or resources being described)
- Attributes ((key, value) pairs)
- Schemas (logical groups of attributes)

Entries are associated with one or more schemas and inherit the attributes defined in those schemas

- This is the only way of associating an attribute with an entry (it is not possible to have attributes associated directly with entries)
- Schemas are defined dynamically by the user
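
The relationship between entries, attributes and schemas can be sketched as follows. The `Schema` and `Entry` classes are hypothetical names used only for illustration; they model the rule that an entry may carry only attributes defined by the schemas it is associated with, and say nothing about how AMGA implements this internally.

```python
class Schema:
    """A named, logical group of attribute keys, defined at run time by the user."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = set(keys)


class Entry:
    """A data item described only through the schemas it is associated with."""

    def __init__(self, name, schemas):
        self.name = name
        self.schemas = list(schemas)
        self._values = {}

    def allowed_keys(self):
        # Attributes are inherited from every associated schema.
        keys = set()
        for schema in self.schemas:
            keys |= schema.keys
        return keys

    def set(self, key, value):
        # An attribute that no associated schema defines cannot be attached.
        if key not in self.allowed_keys():
            raise KeyError(f"{key!r} is not defined by any schema of {self.name!r}")
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)


audio = Schema("Audio", ["Singer", "Runtime"])
common = Schema("Common", ["FileName", "Submitter"])
song = Entry("song-0001", [audio, common])
song.set("Singer", "Example Singer")

try:
    song.set("Director", "nobody")  # "Director" belongs to no schema of this entry
    rejected = False
except KeyError:
    rejected = True
print(rejected)
```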



ARDA/gLite Metadata Interface

- 2004: ARDA evaluated existing metadata services from HEP experiments: AMI (ATLAS), RefDB (CMS), AliEn Metadata Catalogue (ALICE)
- ...and proposed an interface for metadata access on the GRID
  - Based on requirements of LHC experiments
  - Generic: not bound to a particular application domain
  - Designed jointly with the gLite/EGEE team
  - Incorporates feedback from GridPP
- Adopted as the official EGEE Metadata Interface
  - Endorsed by PTF (Project Technical Forum of EGEE)



AMGA Implementation

- ARDA formed a Project Task Force in order to develop AMGA (ARDA Metadata Grid Application)
  - It began as a prototype to evaluate the Metadata Interface
  - Evaluated by the community since the beginning (LHCb and Ganga)
  - Matured quickly thanks to user feedback
- AMGA is currently part of the gLite middleware
  - Official Metadata Service for EGEE
  - First release with gLite 1.5
  - Also available as a standalone component
- It is expanding to other user communities:
  - HEP, Biomed, UNOSAT…



AMGA Features

- Metadata organised as a hierarchy
  - Collections can contain sub-collections
  - Analogy to a file system: a Collection corresponds to a Directory, an Entry to a File
- Flexible queries
  - SQL-like query language
  - Joins between schemas

Query example:

> getattr /gilda/santiago/musica/*.mov Name Duration
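
To make the file-system analogy concrete, the sketch below emulates the shape of the `getattr` query on an in-memory dictionary: the pattern names a collection plus a wildcard over entry names, and the remaining arguments name the attributes to return. It does not use the AMGA client; the function `getattr_like`, the entries and their values are hypothetical.

```python
from fnmatch import fnmatchcase
import posixpath

# Hypothetical entries: full path -> attributes. The values are made up.
entries = {
    "/gilda/santiago/musica/intro.mov": {"Name": "Intro", "Duration": 95},
    "/gilda/santiago/musica/live.mov": {"Name": "Live", "Duration": 240},
    "/gilda/santiago/musica/notes.txt": {"Name": "Notes", "Duration": None},
    "/gilda/santiago/video/clip.mov": {"Name": "Clip", "Duration": 30},
}


def getattr_like(entries, pattern, *keys):
    """Return {path: [values...]} for entries of one collection matching a wildcard."""
    collection, name_pattern = posixpath.split(pattern)
    result = {}
    for path in sorted(entries):
        parent, name = posixpath.split(path)
        # Match only entries directly inside the requested collection.
        if parent == collection and fnmatchcase(name, name_pattern):
            result[path] = [entries[path].get(key) for key in keys]
    return result


rows = getattr_like(entries, "/gilda/santiago/musica/*.mov", "Name", "Duration")
for path, values in rows.items():
    print(path, values)
```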


AMGA Security

- Unix-style permissions
- ACLs, per-collection or per-entry
- Secure connections
  - SSL
- Client authentication based on
  - Username/password
  - General X509 certificates
  - Grid-proxy certificates
- Access control via a Virtual Organization Membership Service (VOMS)


[Diagram labels: Authenticate with X509 Cert; VOMS-Cert with Group & Role information; Resource management; components: VOMS, AMGA, Oracle]
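
A minimal sketch of per-collection and per-entry ACL checks, assuming the caller's groups have already been extracted from a certificate. The ACL layout, the group strings and the rule that an entry without its own ACL falls back to the nearest enclosing collection are assumptions made for this illustration, not a description of AMGA's actual rules.

```python
# Hypothetical ACLs: path -> {group name: permission string made of 'r' and 'w'}.
acls = {
    "/biomed": {"biomed:doctor": "rw", "biomed:member": "r"},
    "/biomed/images/scan42": {"biomed:doctor": "rw"},  # stricter per-entry ACL
}


def effective_acl(acls, path):
    """Return the ACL of the path itself, else of the nearest enclosing collection."""
    while path:
        if path in acls:
            return acls[path]
        path = path.rsplit("/", 1)[0]  # move up one level in the hierarchy
    return {}


def is_allowed(acls, path, groups, permission):
    """True if any of the caller's groups holds the permission ('r' or 'w') on path."""
    acl = effective_acl(acls, path)
    return any(permission in acl.get(group, "") for group in groups)


print(is_allowed(acls, "/biomed/images/scan7", ["biomed:member"], "r"))
print(is_allowed(acls, "/biomed/images/scan42", ["biomed:member"], "r"))
```

The first call falls back to the collection ACL on `/biomed`; the second is decided by the entry's own, stricter ACL.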


AMGA Implementation

- C++ multiprocess server
  - Runs on any Linux flavour
- Backends
  - Oracle, MySQL, PostgreSQL, SQLite
- Two frontends
  - TCP Streaming
    - High performance
    - Client API for C++, Java, Python, Perl, Ruby
  - SOAP
    - Interoperability
- Also implemented as standalone Python library
  - Data stored on filesystem

[Diagram labels: Clients; Metadata (MD) Server with SOAP and TCP Streaming frontends; backends PostgreSQL, Oracle, SQLite, MySQL; Client using the Metadata Python API in a Python interpreter, with the filesystem as storage]



Biomed

- Medical Data Manager (MDM)
  - Store and access medical images and associated metadata on the Grid
  - Built on top of the gLite 1.5 data management system
  - Demonstrated at the last EGEE conference (October 2005, Pisa)
- Strong security requirements
  - Patient data is sensitive
  - Data must be encrypted
  - Metadata access must be restricted to authorized users
- AMGA used as metadata server
  - Demonstrates authentication and encrypted access
  - Used as a simplified DB
- More details at:
  - https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage

[Diagram: metadata schema relating Images, Doctor and Patient; labels include GUID, Date, Patient ID, Name, Hospital]



Accessing AMGA

- TCP Streaming Frontend
  - mdcli & mdclient and C++ API (md_cli.h, MD_Client.h)
  - Java Client API and command-line tools mdjavaclient.sh & mdjavacli.sh (also under Windows)
  - Python Client API
- SOAP Frontend (WSDL)
  - C++ gSOAP
  - AXIS (Java)
  - ZSI (Python)



GILDA Use cases

- gLibrary
- AMGA for geospatial metadata: GIS (Geographical Information System)
- gMOD



gLibrary goals

- gLibrary is a higher-level application built on top of many gLite grid services: a Metadata Catalogue + File Catalogues + Storage Elements
  - …with these requirements: easy to use, fast, secure, extensible
- gLibrary attempts to create a Multimedia Management System on the Grid
- Examples of multimedia content handled by gLibrary:
  - Images, Movies, Audio Files
  - Office Documents (PowerPoint, Word, Excel, OpenOffice)
  - E-Mails, PDFs, HTMLs
  - Customized versions of well-known document types (e.g. EGEE PPTs)
- Keep track of and organize in a uniform way all the additional details (metadata) of files saved in Storage Elements and registered in File Catalogues
- Provide users with an easy way to locate and retrieve files based on their contents


Usage Scenarios

Examples (Office/Entertainment):

- Locate all theoretical (PPTType) PowerPoint (Type) presentations about FireMan (Keywords) given in 2005 (Date) by John.S (Speaker)
- Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast), produced in the USA (Country) in 2004 (ReleaseDate)
- Find all the acoustic (Genre) mp3 (Format) audio files (Type) of Alanis Morissette (Singer) that last more than 3 minutes (Runtime)
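
The movie scenario above amounts to a conjunction of conditions on attributes. A minimal sketch, assuming the entries are available as plain dictionaries: the titles and values are placeholders rather than real catalogue data, and `find_movies` is a hypothetical helper, not a gLibrary function.

```python
# Hypothetical movie entries; titles and values are placeholders.
movies = [
    {"Title": "Sample A", "Type": "Movie", "Cast": ["Julia Roberts", "Hugh Grant"],
     "Country": "USA", "ReleaseDate": 2004},
    {"Title": "Sample B", "Type": "Movie", "Cast": ["Julia Roberts"],
     "Country": "USA", "ReleaseDate": 2004},
    {"Title": "Sample C", "Type": "Movie", "Cast": ["Julia Roberts", "Hugh Grant"],
     "Country": "UK", "ReleaseDate": 1999},
]


def find_movies(entries, cast, country, year):
    """Select movies featuring every listed cast member, from one country and year."""
    return [
        entry["Title"]
        for entry in entries
        if entry["Type"] == "Movie"
        and set(cast) <= set(entry["Cast"])  # all requested actors appear in the cast
        and entry["Country"] == country
        and entry["ReleaseDate"] == year
    ]


titles = find_movies(movies, ["Julia Roberts", "Hugh Grant"], "USA", 2004)
print(titles)
```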


Example of Entries

Collection: /gLibrary

Entry name: 4ffaffc8-26e7-4826-b460-3d5bf08081a4
  FileName: DedicatoAte.mp3
  PathName: /grid/gilda/calanducci
  Type: Audio
  Submitter: Tony Calanducci
  SubmissionDate: 2006-01-05 00:00:00
  Encryption: false
  Description: Song of the Italian Band “Le Vibrazioni”
  Keywords: Vibrazioni
  CreationDate: 2004-02-05 00:00:00
  ...

Entry name: 00454dca-a269-4b93-8a45-c4012af05600
  FileName: ardizzonelarocca_is_231005.ppt.gpg
  PathName: /grid/gilda/calanducci/EGEE
  Type: EGEEDOC
  Submitter: Tony Calanducci
  SubmissionDate: 2005-01-05 16:44:22
  Encryption: true
  Description: gLite Information System
  Keywords: R-GMA, RGMA, BDII, IS
  CreationDate: 2005-10-05 23:40
  ...



gLibrary Security

- User requirements:
  - A valid proxy with VOMS extensions
  - A VOMS Role and Group are needed to be recognized by gLibrary as a contents manager
- 3 kinds of users:
  - gLibraryManager: (s)he can create new content types and allow a generic VO user to become a gLibrarySubmitter
  - gLibrarySubmitters: they can add new entries and define access rights on the entries they create
    - Fine-grained permission settings (reading, writing, listing, decrypting) on each entry, for all VO members, groups, or a list of DNs
  - Generic VO users: browse and make queries (on entries they have access to)
- Basic level of cryptography:
  - New files saved on SEs can be encrypted beforehand with a symmetric passphrase that will be saved in /gLKeys. Only selected users (those with a specific DN in the subject of their VOMS proxy) can access the passphrase and decrypt the file.
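
The fine-grained permissions described above can be pictured as a per-entry policy with three kinds of grants: to all VO members, to groups, and to individual DNs. The sketch below is an assumption-laden illustration; `EntryPolicy`, `User`, `can`, the group name and the DNs are all hypothetical and do not reflect gLibrary's actual data model.

```python
from dataclasses import dataclass, field

PERMISSIONS = ("read", "write", "list", "decrypt")


@dataclass
class EntryPolicy:
    """Hypothetical per-entry policy: who holds each permission."""
    whole_vo: set = field(default_factory=set)  # granted to every VO member
    groups: dict = field(default_factory=dict)  # group name -> set of permissions
    dns: dict = field(default_factory=dict)     # certificate DN -> set of permissions


@dataclass
class User:
    dn: str
    groups: tuple = ()


def can(user, policy, permission):
    """Check one permission against the VO-wide, group and DN grants of an entry."""
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    if permission in policy.whole_vo:
        return True
    if any(permission in policy.groups.get(group, set()) for group in user.groups):
        return True
    return permission in policy.dns.get(user.dn, set())


policy = EntryPolicy(
    whole_vo={"list"},
    groups={"/gilda/submitters": {"read", "write"}},
    dns={"/C=XX/O=Example/CN=Alice": {"decrypt"}},
)
alice = User("/C=XX/O=Example/CN=Alice", ("/gilda/submitters",))
guest = User("/C=XX/O=Example/CN=Guest")

print(can(guest, policy, "list"), can(guest, policy, "decrypt"))
print(can(alice, policy, "write"), can(alice, policy, "decrypt"))
```

Keeping "decrypt" as a separate permission mirrors the slide: a user may be able to read an entry's metadata without being able to obtain the passphrase for the file.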



gMOD: grid Movie On Demand

- gMOD provides a Video-On-Demand service
- The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user’s workstation
- For each movie many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes
- Two kinds of users can interact with gMOD:
  - TrailersManagers, who can administer the database of movies (uploading new ones and attaching metadata to them)
  - GILDA VO users (guest), who can browse, search and choose a movie to be streamed



gMOD under the hood

Built on top of gLite services + GENIUS web portal:

- Storage Elements, sited in different places, physically contain the movie files
- FireMan, the File Catalogue, keeps track of which Storage Element holds a particular movie
- AMGA is the repository of the detailed information for each movie, and makes queries on it possible
- The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users
- The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network down to the user’s desktop or laptop
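
The division of labour among the services can be sketched as a chain of calls: role lookup, metadata query, replica lookup, then streaming. Every function and value below is a hypothetical stand-in (plain dictionaries replace VOMS, AMGA and the file catalogue, and a string replaces the WMS job); the order of steps is inferred from the list above rather than taken from gMOD's code.

```python
# Hypothetical stand-ins for the grid services.
ROLES = {"alice": "TrailersManager", "guest": "user"}            # VOMS: user -> role
METADATA = {"movie-1": {"Title": "Sample A", "Genre": "Drama"}}  # AMGA: entry -> attributes
REPLICAS = {"movie-1": "se01.example.org"}                       # file catalogue: entry -> SE


def get_role(user):
    return ROLES.get(user)


def search(genre):
    return [entry for entry, attrs in sorted(METADATA.items()) if attrs["Genre"] == genre]


def locate(entry):
    return REPLICAS[entry]


def stream(entry, storage_element):
    # A real deployment would go through the WMS; here we only describe the action.
    return f"streaming {entry} from {storage_element}"


def watch(user, genre):
    """Chain the steps: role check, metadata query, replica lookup, streaming."""
    if get_role(user) is None:
        raise PermissionError(f"{user} has no role in the VO")
    found = search(genre)
    if not found:
        return None
    entry = found[0]
    return stream(entry, locate(entry))


print(watch("guest", "Drama"))
```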


gMOD interactions

[Diagram labels: User; Genius Portal; VOMS (get Role); AMGA Metadata Catalogue; FireMan Catalogue; Workload Management System; CE; Storage Elements]


gMOD screenshot

gMOD is accessible through the Genius Portal (https://glite-tutor.ct.infn.it)







Thanks to Riccardo Bruno who developed the
first version of these slides