FP6-2004-Infrastructures-6-SSA-026634

The AMGA metadata catalog with use cases

Danfeng Zhu
Beihang University

3rd EUChinaGrid Tutorial
Beijing, 25-26 November 2006



Danfeng Zhu, BUAA
3rd Grid tutorial for users
Beijing, 25-26 November 2006

Contents



Background and Motivation for AMGA



Interface, Architecture and Implementation



Metadata Replication on AMGA



Deployment Examples



GILDA Use cases



Metadata on the GRID


Metadata is data about data

On the Grid: information about files

Describe files

Locate files based on their contents

But also simplified DB access on the Grid

Many Grid applications need structured data

Many applications require only simple schemas

Can be modelled as metadata

Main advantage: better integration with the Grid environment


Metadata Service is a Grid component


Grid security


Hide DB heterogeneity



ARDA/gLite Metadata Interface



2004: ARDA evaluated existing Metadata Services from HEP experiments

AMI (ATLAS), RefDB (CMS), Alien Metadata Catalogue (ALICE)

Similar goals, similar concepts

Each designed for a particular application domain

Reuse outside intended domain difficult

Several technical limitations: large answers, scalability, speed, lack of flexibility

ARDA proposed an interface for Metadata access on the GRID

Based on requirements of LHC experiments

But generic: not bound to a particular application domain

Designed jointly with the gLite/EGEE team

Incorporates feedback from GridPP

Adopted as the official EGEE Metadata Interface

Endorsed by PTF (Project Technical Forum of EGEE)


AMGA Implementation


ARDA developed an implementation of the PTF interface

AMGA: ARDA Metadata Grid Application

Began as a prototype to evaluate the Metadata Interface

Evaluated by the community since the beginning:

LHCb and Ganga were early testers (more on this later)

Matured quickly thanks to user feedback

Now part of gLite middleware

Official Metadata Service for EGEE

First release with gLite 1.5

Planned for inclusion in gLite 3.1 (not present in gLite 3.0)

Also available as a standalone component

Expanding user community

HEP, Biomed, UNOSAT…



Metadata Concepts


Some Concepts

Metadata: List of attributes associated with entries

Attribute: key/value pair with type information

Type: The type (int, float, string, …)

Name/Key: The name of the attribute

Value: Value of an entry's attribute

Schema: A set of attributes

Collection: A set of entries associated with a schema

Think of schemas as tables, attributes as columns, entries as rows
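
The table analogy can be made concrete with a small in-memory model. This is only a sketch of the concepts, not the AMGA client API: the `Collection` class, its method names and the `PY_TYPES` mapping are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from attribute type names to Python types.
PY_TYPES = {"int": int, "float": float, "string": str}


@dataclass
class Collection:
    """A set of entries (rows) that share one schema (columns)."""
    schema: dict = field(default_factory=dict)    # attribute name -> type name
    entries: dict = field(default_factory=dict)   # entry name -> attribute values

    def add_attribute(self, name, type_name):
        # Schemas are dynamic: attributes can be added at any time.
        if type_name not in PY_TYPES:
            raise ValueError(f"unknown type: {type_name}")
        self.schema[name] = type_name

    def add_entry(self, entry_name, **values):
        row = {}
        for key, value in values.items():
            if key not in self.schema:
                raise KeyError(f"attribute not in schema: {key}")
            # Coerce each value to the type declared in the schema.
            row[key] = PY_TYPES[self.schema[key]](value)
        self.entries[entry_name] = row


songs = Collection()
songs.add_attribute("Title", "string")
songs.add_attribute("Year", "int")
songs.add_entry("song1", Title="Dedicato A Te", Year="2003")
print(songs.entries["song1"])
```

Here the schema plays the role of the table definition, each attribute is a typed column, and `song1` is one row.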


AMGA Features


Dynamic Schemas

Schemas can be modified at runtime by the client

Create, delete schemas

Add, remove attributes

Metadata organised as a hierarchy

Collections can contain sub-collections

Analogy to file system:

Collection ↔ Directory; Entry ↔ File

Flexible Queries

SQL-like query language

Joins between schemas

Example

selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, "%.mp3")'
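
For readers more used to relational databases, the join in the example query can be approximated with plain SQL. The sketch below uses Python's built-in `sqlite3` module rather than AMGA itself; the table layout and the sample rows are assumptions made for illustration, with one table per collection and the `FILE` attribute as the join key.

```python
import sqlite3

# In-memory database standing in for two collections (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gLibrary (FILE TEXT PRIMARY KEY, FileName TEXT);
    CREATE TABLE gLAudio  (FILE TEXT PRIMARY KEY, Author TEXT, Album TEXT);
""")

# Hypothetical sample entries.
conn.executemany("INSERT INTO gLibrary VALUES (?, ?)",
                 [("e1", "song.mp3"), ("e2", "talk.ppt")])
conn.executemany("INSERT INTO gLAudio VALUES (?, ?, ?)",
                 [("e1", "Some Author", "Some Album")])

# Same shape as the selectattr example: join on FILE, filter on the file name.
rows = conn.execute("""
    SELECT l.FileName, a.Author, a.Album
    FROM gLibrary AS l
    JOIN gLAudio AS a ON l.FILE = a.FILE
    WHERE l.FileName LIKE '%.mp3'
""").fetchall()
print(rows)
```

Only the entry present in both collections and ending in `.mp3` is returned.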


Security


Unix style permissions

ACLs

Per-collection or per-entry.

Secure connections

SSL

Client Authentication based on

Username/password

General X509 certificates

Grid-proxy certificates

Access control via a Virtual Organization Membership Service (VOMS):

[Figure: VOMS access control. Labels: Authenticate with X509 Cert; VOMS-Cert with Group & Role information; Resource management; VOMS; AMGA; Oracle]

AMGA Implementation


C++ multiprocess server

Runs on any Linux flavour

Backends

Oracle, MySQL, PostgreSQL, SQLite

Two frontends

TCP Streaming

High performance

Client API for C++, Java, Python, Perl, Ruby

SOAP

Interoperability

Also implemented as standalone Python library

Data stored on filesystem

[Figure: architecture diagram. Clients reach the Metadata (MD) Server through the SOAP and TCP Streaming front-ends; the server uses PostgreSQL, Oracle, SQLite or MySQL as backend. A separate client uses the Metadata Python API inside the Python interpreter, with data on the filesystem.]

Accessing AMGA


TCP Streaming Front-end

mdcli & mdclient and C++ API (md_cli.h, MD_Client.h)

Java Client API and command line mdjavaclient.sh & mdjavacli.sh (also under Windows)

Python Client API

AMGA Web Interface (NEW)

Developed entirely by the GILDA team

INFN CT

Based on the standard AMGA Java APIs

Web application using standards such as JSP custom tags and servlets

SOAP Frontend (WSDL)

C++ gSOAP

AXIS (Java)

ZSI (Python)


AMGA Web Interface








AMGA Web Interface


Metadata Schema Management


Entry Management


ACL Management


QBE-like Query Engine


Query Result


AMGA WI Deployment Scenario

[Figure: deployment diagram. Labels: Clients, Internet, AmgaWi Application Server, GRID, AMGA Server (Metadata Service)]

AMGA WI can be deployed on a dedicated server, located either inside or outside the GRID network.

Currently the GILDA AMGA Server machine also hosts the web interface.

Users access the catalog through the functionality provided by the web interface.

Users only need a standard web browser.


Metadata Replication


Motivation

Scalability

Support hundreds/thousands of concurrent users

Geographical distribution

Hide network latency

Reliability

No single point of failure

DB Independent replication

Heterogeneous DB systems

Disconnected computing

Off-line access (laptops)

Architecture

Asynchronous replication

Master-slave

Writes only allowed on the master

Replication at the application level

Replicate Metadata commands, not SQL → DB independence

Partial replication

Supports replication of only sub-trees of the metadata hierarchy
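
The architecture points above can be sketched as a toy model: the master keeps an ordered log of metadata commands, and a slave asynchronously replays the part of the log that falls under its sub-tree. The classes, the command tuples and the command names used here are hypothetical; the real AMGA replication protocol is not described in these slides.

```python
class Node:
    """Holds entries keyed by path; stands in for any DB backend."""

    def __init__(self):
        self.entries = {}  # entry path -> attribute dict

    def apply(self, command):
        op, path, attrs = command
        if op == "addentry":
            self.entries[path] = dict(attrs)
        elif op == "setattr":
            self.entries[path].update(attrs)
        elif op == "rm":
            self.entries.pop(path, None)


class Master(Node):
    """Only the master accepts writes; every write is logged as a command."""

    def __init__(self):
        super().__init__()
        self.log = []

    def execute(self, command):
        self.apply(command)
        self.log.append(command)


class Slave(Node):
    """Replays logged commands for one sub-tree (partial replication)."""

    def __init__(self, subtree):
        super().__init__()
        self.subtree = subtree.rstrip("/") + "/"
        self.position = 0  # index of the next log record to read

    def pull(self, master):
        # Asynchronous: the slave catches up whenever it chooses to pull.
        for command in master.log[self.position:]:
            if command[1].startswith(self.subtree):
                self.apply(command)
        self.position = len(master.log)


master = Master()
master.execute(("addentry", "/jobs/j1", {"status": "queued"}))
master.execute(("addentry", "/files/f1", {"size": 10}))
master.execute(("setattr", "/jobs/j1", {"status": "done"}))

slave = Slave("/jobs")
slave.pull(master)
print(slave.entries)
```

Because the log contains metadata commands rather than SQL statements, the slave can store its copy in a different backend from the master.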



Metadata Replication

Some use cases

Full replication

Partial replication

Federation

Proxy

[Figure: replication diagrams. Labels: Metadata Commands, Redirected Commands]


Early adopters of AMGA


LHCb bookkeeping


Migrated bookkeeping metadata to ARDA prototype


20M entries, 15 GB


Large amount of static metadata


Feedback valuable in improving interface and fixing bugs


AMGA showing good scalability


Ganga


Job management system


Developed jointly by ATLAS and LHCb


Uses AMGA for storing information about job status


Small amount of highly dynamic metadata


Conclusion


AMGA


Metadata Service of gLite


Part of gLite (not yet certified in gLite 3.0; certification is planned for the 3.1 release)


Useful for simplified DB access


Integrated on the Grid environment (Security)


Replication/Federation features


Tests show good performance/scalability


Already deployed by several Grid Applications


LHCb, ATLAS, Biomed, …


AMGA WI, gMOD, gLibrary (described in the following slides)


AMGA Web Site

http://cern.ch/amga/



GILDA Use cases


Biomed



gLibrary



gMOD



Biomed


Medical Data Manager (MDM)

Store and access medical images and associated metadata on the Grid

Built on top of gLite 1.5 data management system

Demonstrated at the last EGEE conference (October 2005, Pisa)

Strong security requirements

Patient data is sensitive

Data must be encrypted

Metadata access must be restricted to authorized users

AMGA used as metadata server

Demonstrates authentication and encrypted access

Used as a simplified DB

More details at https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage

[Figure: metadata schema diagram. Labels: Images, GUID, Date, Patient, ID, Doctor, Name, Hospital]

gLibrary Motivations


Huge amounts of data can be saved on SEs (did we forget about the existence of Data Grids?)

But how can we easily find a file later, when we need it?

(if you have a good memory, its GUID could be a solution)

File Catalogues only let us arrange files in folders and subfolders; there is no way to query on their contents

Metadata Catalogues are a possible solution, but not always “affordable”, especially for non-expert users (powerful but complex to use)

Our solution: gLibrary, a higher level application built on top of many gLite grid services: a Metadata Catalogue + File Catalogues + Storage Elements

Requirements: easy to use, fast, secure, extensible



gLibrary goals


Attempt to create a Multimedia Management System on the Grid

Examples of Multimedia Contents handled by gLibrary:

Images

Movies

Audio Files

Office Documents (PowerPoint, Word, Excel, OpenOffice)

E-Mails, PDFs, HTMLs

Customized versions of well-known document types (e.g. EGEE PPTs)

….

Keep track of and organize in a uniform way all the additional details (metadata) of files saved in Storage Elements and registered in File Catalogues

Provide users with an easy way to locate and retrieve files based on their contents


Usage scenarios


Example 1:

Locate all theoretical (PPTType) PowerPoint (Type) presentations about FireMan (Keywords) given in 2005 (Date) by Uncle Sam (Speaker);

Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast), produced in the USA (Country) in 2004 (ReleaseDate);

Find all the acoustic (Genre) mp3 (Format) audio files (Type) by Alanis Morissette (Singer) that last more than 3 minutes (Runtime).

Example 2:

A doctor is looking for brain (Keywords) DICOM (Type) images of male (Gender) patients older than 65 (Age).
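
Each scenario above is a conjunction of conditions on attributes. As a rough illustration, Example 2 can be written as a filter over a list of records. The records and the function name below are invented for the sketch; only the attribute names (Type, Keywords, Gender, Age) come from the example.

```python
# Hypothetical sample records; real entries would live in a metadata catalogue.
images = [
    {"id": "img1", "Type": "DICOM", "Keywords": ["brain"], "Gender": "male", "Age": 71},
    {"id": "img2", "Type": "DICOM", "Keywords": ["brain"], "Gender": "female", "Age": 80},
    {"id": "img3", "Type": "DICOM", "Keywords": ["knee"], "Gender": "male", "Age": 68},
    {"id": "img4", "Type": "DICOM", "Keywords": ["brain"], "Gender": "male", "Age": 65},
]


def brain_scans_of_older_men(records):
    """Keep DICOM brain images of male patients strictly older than 65."""
    return [
        r for r in records
        if r["Type"] == "DICOM"
        and "brain" in r["Keywords"]
        and r["Gender"] == "male"
        and r["Age"] > 65
    ]


matches = brain_scans_of_older_men(images)
print([r["id"] for r in matches])
```

In a metadata catalogue the same conditions would be evaluated by the server, so the client never has to download every entry.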


gLibrary prototype implementation


Files are saved on SEs and registered into file catalogues (LFC and/or FiReMan)

The AMGA Metadata Catalogue is used to archive and organize metadata and to answer users’ queries.

gLibrary is built using the following AMGA collections:

/gLibrary contains generic metadata for each entry

/gLAudio, /gLImage, /gLVideo, /gLPPT, /EGEEPPT, /gLDoc, … are examples of collections of “additional features” (shown later)

/gLTypes

keeps the associations between document types and the names of the collection that contains the “additional features”

is used by gLibrary to find out where it has to look when new document types are added into the system (extensibility)

/gLKeys is used to store Decryption Keys
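
The role of /gLTypes as a lookup table can be sketched with plain dictionaries. This is an assumed in-memory stand-in for the collections, not gLibrary code: `GL_TYPES`, `CATALOG` and `describe` are hypothetical names, and the sample values are taken from the example entries in this deck.

```python
ENTRY = "4ffaffc8-26e7-4826-b460-3d5bf08081a4"

# /gLTypes: document type -> collection holding its "additional features".
GL_TYPES = {"Audio": "/gLAudio", "EGEEDOC": "/EGEEPPT"}

# Stand-in for the other collections: collection -> entry name -> attributes.
CATALOG = {
    "/gLibrary": {
        ENTRY: {"FileName": "DedicatoAte.mp3", "Type": "Audio"},
    },
    "/gLAudio": {
        ENTRY: {"SongTitle": "Dedicato A Te", "Singer": "Le Vibrazioni"},
    },
}


def describe(entry_name):
    """Merge generic metadata with the type-specific attributes."""
    generic = CATALOG["/gLibrary"][entry_name]
    # /gLTypes tells us which collection to consult for this document type.
    extra_collection = GL_TYPES[generic["Type"]]
    extra = CATALOG.get(extra_collection, {}).get(entry_name, {})
    return {**generic, **extra}


print(describe(ENTRY))
```

Adding a new document type then only requires a new collection and one more row in the type table, which is the extensibility argument made above.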


Example of entries

Collection /gLibrary

Attributes: FileName, PathName, Type, Submitter, SubmissionDate, Encryption, Description, Keywords, CreationDate

Entry 4ffaffc8-26e7-4826-b460-3d5bf08081a4
FileName: DedicatoAte.mp3
PathName: /grid/gilda/calanducci
Type: Audio
Submitter: Tony Calanducci
SubmissionDate: 2006-01-05 00:00:00
Encryption: false
Description: Canzone delle vibrazioni che ha ricevuto un enorme successo tra i teenagers nel 2003 (English: song by Le Vibrazioni that was a huge success among teenagers in 2003)
Keywords: Vibrazioni
CreationDate: 2004-02-05 00:00:00

Entry 00454dca-a269-4b93-8a45-c4012af05600
FileName: ardizzonelarocca_is_231005.ppt.gpg
PathName: /grid/gilda/calanducci/EGEE
Type: EGEEDOC
Submitter: Tony Calanducci
SubmissionDate: 2005-01-05 16:44:22
Encryption: true
Description: gLite Information System
Keywords: R-GMA, RGMA, BDII, IS
CreationDate: 2005-10-05 23:40


Example of gLibrary collections

Collection /gLTypes

Attributes: Path (refers to a collection)

Entry Audio: Path = /gLAudio
Entry Image: Path = /gLImage
Entry Video: Path = /gLVideo
Entry Documents: Path = /gLDOC
Entry PowerPoint: Path = /gLPPT
Entry EGEEDOC: Path = /EGEEPPT

Collection /EGEEPPT (“additional features”)

Attributes: Title, Runtime, Author, Type, Date, Event, Speaker, Topic

Entry 00454dca-a269-4b93-8a45-c4012af05600
Title: Information Systems
Runtime: 00:30:00
Author: Valeria Ardizzone, Giuseppe La Rocca
Type: Theoretical
Date: 2005-10-23
Event: 4th EGEE Conference
Speaker: Giuseppe La Rocca, Valeria Ardizzone
Topic: R-GMA, BDII

Collection /gLAudio (“additional features”)

Attributes: SongTitle, Duration, Album, Genre, Singer, Format

Entry 4ffaffc8-26e7-4826-b460-3d5bf08081a4
SongTitle: Dedicato A Te
Duration: 00:03:27
Album: Dedicato A Te
Genre: Pop
Singer: Le Vibrazioni
Format: MP3

Collection /gLKeys

Attributes: Passphrase

Entry 00454dca-a269-4b93-8a45-c4012af05600
Passphrase: ardizzo


gLibrary Security


User Requirements:

a valid proxy with VOMS extensions

a VOMS Role and Group are needed to be recognized by gLibrary as a contents manager.

3 kinds of users:

gLibraryManager: can create new content types and allow a generic VO user to become a gLibrarySubmitter

gLibrarySubmitters: can add new entries and define access rights on the entries they create.

Fine-grained permission settings (reading, writing, listing, decrypting) on each entry: whole VO members, VO groups, list of DNs

generic VO users: can browse and make queries (on entries they have access to)

Basic level of cryptography:

New files saved on SEs can be encrypted beforehand with a symmetric passphrase that will be saved in /gLKeys. Only selected users (that have a specific DN in the subject of their VOMS proxy) can access the passphrase and decrypt the file.




gLibrary Authorization

Query> acl_show /gLibrary/

>> gLibraryManager rwx

>> gLibraryManager:glibsubmitters rwx

>> gilda:users rx

Query> acl_show /gLAudio

>> gLibraryManager rwx

>> gLibraryManager:glibsubmitters rwx

>> gilda:users rx

Query> acl_show /gLTypes

>> gLibraryManager rwx

>> gLibraryManager:glibsubmitters rx

>> gilda:users rx

Query> whoami

>> gLibrarySubmitter

Query> acl_show /gLKeys/gildateam

>> gLibrarySubmitter rwx

>> gLibrarySubmitter:gildateam rx

Query> grp_show gildateam

>> tony

>> valeria

>> giuseppe

>> emidio

Query> user_listcred tony

>>

>> 'C = IT, O = GILDA, OU = Personal Certificate, L = INFN Catania, CN = Tony Calanducci, emailAddress = tony.calanducci@ct.infn.it'
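
The ACL listings above can be read as a table of principals and permission strings. The sketch below copies three of the listed ACLs and checks a right under one assumed rule: a request is granted if the user, or any group the user belongs to, holds that right. This rule and the `allowed` function are assumptions for illustration; how AMGA actually evaluates ACLs is not stated in these slides.

```python
# ACLs as printed by acl_show in the session above.
ACLS = {
    "/gLibrary": {
        "gLibraryManager": "rwx",
        "gLibraryManager:glibsubmitters": "rwx",
        "gilda:users": "rx",
    },
    "/gLTypes": {
        "gLibraryManager": "rwx",
        "gLibraryManager:glibsubmitters": "rx",
        "gilda:users": "rx",
    },
    "/gLKeys/gildateam": {
        "gLibrarySubmitter": "rwx",
        "gLibrarySubmitter:gildateam": "rx",
    },
}


def allowed(user, groups, path, right):
    """Assumed rule: grant if the user or any of their groups has the right."""
    acl = ACLS[path]
    principals = [user, *groups]
    return any(right in acl.get(p, "") for p in principals)


# A hypothetical submitter may write generic metadata but not the type table.
submitter_groups = ["gLibraryManager:glibsubmitters"]
print(allowed("alice", submitter_groups, "/gLibrary", "w"))
print(allowed("alice", submitter_groups, "/gLTypes", "w"))
```

Under this reading, a plain VO user in `gilda:users` has no ACL on `/gLKeys/gildateam` and therefore cannot read the passphrases stored there.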



Implementation


Heavy use of AMGA features

support for VOMS proxy authentication

fine-grained authorization: ACLs are set on a per-entry basis to restrict access to the decryption keys.

Allows gLibrarySubmitters to control which users (based on DNs, VOMS Roles and Groups) can list and get the attribute values of the submitted entries

GUI Front-ends (to achieve the “ease of use” promise):

Java SWING GUI to be run on a Grid User Interface (JVM required): prototype is under way

Portlet-based front-end will be deployed in GENIUSPHERE and made available for any other JSR168-compliant portlet container

Both use AMGA Java APIs


gLibrary Deployment scenario

[Figure: deployment diagram. The user authenticates with an X509 certificate and obtains a VOMS proxy with Group & Role information (gLibraryManager, gLibrarySubmitter, VO user). Components shown: UI, LFC (or Fireman) Catalog, three SEs.]


gLibrary Java GUI screenshot


gLibrary Java GUI screenshot (II)


Future planned improvements


Splitting of big files among several SEs (different chunks stored in different SEs):

Increases the security of data: even if a chunk is intercepted, it has no meaning alone.

Increases upload/download bandwidth

Possible implementation:

one more attribute, NumberOfChunks, in the /gLibrary collection.

a /gLChunks collection keeps track of FirstChunkGUID, Chunk# and ChunkGUID

Automatic extraction and population of metadata for well-known document types

use of GNU libextractor to extract metadata from HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, MP3 (ID3v1 and ID3v2), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, REAL, RIFF (AVI), MPEG, QT and ASF

use of Lucene for indexing document types containing text

Evaluation of gLite Hydra Key Store to save decryption keys




Splitting Implementation

[Figure: on the UI, EGEE_Movie.mpg is split into four encrypted chunks, EGEE_Movie.mpg_gpg_1 to EGEE_Movie.mpg_gpg_4, and each chunk is stored on a different SE.]
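
A minimal sketch of the splitting idea, assuming fixed-size chunks assigned to storage elements in round-robin order. The function names and the in-memory `stored` dictionary are hypothetical, the record keys mirror the proposed /gLChunks attributes (FirstChunkGUID, Chunk#, ChunkGUID), and the encryption step is left out to keep the example short.

```python
import uuid


def split_into_chunks(data, chunk_size, storage_elements):
    """Split data into chunks and return (chunk records, simulated storage)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    guids = [str(uuid.uuid4()) for _ in pieces]

    records = []  # rows of a /gLChunks-like collection
    stored = {}   # ChunkGUID -> (storage element, bytes); stands in for the SEs
    for number, (guid, piece) in enumerate(zip(guids, pieces), start=1):
        # Round-robin placement so that consecutive chunks land on different SEs.
        se = storage_elements[(number - 1) % len(storage_elements)]
        stored[guid] = (se, piece)
        records.append(
            {"FirstChunkGUID": guids[0], "Chunk#": number, "ChunkGUID": guid}
        )
    return records, stored


def reassemble(records, stored):
    """Rebuild the original bytes by reading chunks in Chunk# order."""
    ordered = sorted(records, key=lambda r: r["Chunk#"])
    return b"".join(stored[r["ChunkGUID"]][1] for r in ordered)


data = b"0123456789"
records, stored = split_into_chunks(data, 3, ["se1", "se2"])
print(len(records), reassemble(records, stored) == data)
```

The FirstChunkGUID value ties all chunks of one file together, so a client that knows the first GUID can find and order the rest.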


gMOD: grid Movie On Demand


gMOD provides a Video-On-Demand service

The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user’s workstation

For each movie many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes

Two kinds of users can interact with gMOD:

TrailersManagers can administer the db of movies (uploading new ones and attaching metadata to them);

GILDA VO users (guest) can browse, search and choose a movie to be streamed.


gMOD under the hood


Built on top of gLite services:

Storage Elements, located at different sites, physically contain the movie files

FireMan, the File Catalogue, keeps track of the Storage Element in which a particular movie is located

AMGA is the repository of the detailed information for each movie, and makes queries on it possible

The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users

The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network to the user’s desktop or laptop


gMOD interactions

[Figure: interaction diagram. Components: User, Genius Portal, VOMS (get Role), AMGA Metadata Catalogue, FireMan Catalogue, Workload Management System, CE, Storage Elements]


gMOD screenshot

gMOD is accessible through the Genius Portal (https://glite-tutor.ct.infn.it)


Any questions?





Thanks for your attention