WP2 - GridPP

Uploaded: 31 Jan 2013

WP2: Data Management

Gavin McCance

University of Glasgow




Key areas covered by WP2

Current Status (GDMP)

Services to be Delivered (GridPP)

CPU and Bandwidth Investigation

Summary



WP2: Data Management


Goal: develop middleware infrastructure to manage petabyte-scale data


[Diagram: WP2 software structure. Components: Replica Manager; Data Mover; Data Accessor; Storage Manager; Castor; HPSS; Data Locator; Meta Data Manager; Local Filesystem; Query Optimisation & Access Pattern Manag. Layers and regions: Secure Region; High Level Services; Medium Level Services; Core Services.]

Service levels reasonably well defined

GridPP: Identify Key Areas Within Software Structure





Key Areas and Services


Concentrate mostly on M9 deliverables and
where GridPP fits in



Replication


GDMP integration with Globus Replica Catalogue


Query / Replica Optimisation (not for M9!)


Investigate Genetic Algorithms for efficient
optimisation of cost functions


SQL Database Service


Complements the LDAP Directory Service approach


Service Index


Efficient and scalable discovery mechanism





GDMP Replication


CERN’s GDMP: Asad Samar / Heinz Stockinger




Allows world-wide replication of large OO databases


Modules soon available for Objectivity, Root and FZ files (M9)


WP2: Numerous replication strategies possible


e.g. (fully) consistent synchronous replication or more lazy
asynchronous replication


Reviews...


Much current discussion in WP2 and beyond… workshops?


[Distributed Database Management Systems and the Data Grid, Heinz Stockinger]




GDMP Replica Catalogue

[Diagram: Site1 (Publisher), Site2 (Subscriber), Site3 (Subscriber) and Site4, with one Export Catalogue, three Import Catalogues and a Replica Catalogue. Labelled steps: Publish files; Notify subscribers of new files; Get import file list.]

M9… GDMP now interfaced to the Globus Replica Catalogue

[Diagram: Replica Catalogue contents. A Logical Collection, three Logical Files and three Physical Files.]

File Registration, Searching and Deletion implemented

[GDMP Integration with Globus’ Replica Catalogue, Asad Samar]
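
The publish / notify / import flow and the logical-to-physical mapping shown on this slide can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the class names, method names and the physical path layout are hypothetical and are not the GDMP or Globus APIs.

```python
class ReplicaCatalogue:
    """Maps a logical file name to the physical copies known for it."""

    def __init__(self):
        self._replicas = {}  # logical name -> set of physical locations

    def register(self, logical, physical):
        self._replicas.setdefault(logical, set()).add(physical)

    def search(self, logical):
        return sorted(self._replicas.get(logical, ()))

    def delete(self, logical, physical):
        locations = self._replicas.get(logical, set())
        locations.discard(physical)
        if not locations:
            self._replicas.pop(logical, None)


class Site:
    """A site with an export catalogue (published files) and an import catalogue (pending files)."""

    def __init__(self, name, catalogue):
        self.name = name
        self.catalogue = catalogue
        self.export_catalogue = []
        self.import_catalogue = []
        self.subscribers = []

    def _physical(self, logical):
        # Hypothetical path layout, for illustration only.
        return f"{self.name}:/data/{logical}"

    def subscribe_to(self, publisher):
        publisher.subscribers.append(self)

    def publish(self, logical_names):
        for logical in logical_names:
            self.export_catalogue.append(logical)
            self.catalogue.register(logical, self._physical(logical))
        for subscriber in self.subscribers:
            # "Notify subscribers of new files"
            subscriber.import_catalogue.extend(logical_names)

    def replicate_pending(self):
        # "Get import file list", then record a local copy of each file.
        fetched = []
        while self.import_catalogue:
            logical = self.import_catalogue.pop(0)
            self.catalogue.register(logical, self._physical(logical))
            fetched.append(logical)
        return fetched
```

A subscriber that has called `subscribe_to(publisher)` sees new logical names appear in its import catalogue after each `publish`, and `replicate_pending()` registers its own physical copy in the shared catalogue. The actual file transfer is left out of the sketch.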




Query / Replica Optimisation


Should the replica manager make a new replica?

Can a query/job be split into sub-queries?

Which replica to use?

Higher level service! Uses a cost model to make the decision...

Minimise over all subsets of data accessed in sub-queries and all physical file replicas

Preliminary work done in development of cost models… more to be studied...

GridPP can contribute to WP2!

[Equation: cost term F_net, given by a function f_net involving BW, Datasize, G, Policy and Time, with index i]



[Towards a Cost Model for Distributed and Replicated Data Stores, Heinz & Kurt Stockinger, CERN]
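
To make the "which replica to use?" question concrete, here is a minimal sketch that picks the replica with the lowest estimated transfer time. The cost used here (latency plus size divided by bandwidth) is a deliberately simplified assumption, not the cost model from the cited paper, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    size_bytes: int
    bandwidth_bytes_per_s: float  # assumed measured bandwidth to the client
    latency_s: float = 0.0        # assumed fixed start-up cost


def transfer_cost(replica):
    """Estimated seconds to fetch the replica: latency + size / bandwidth."""
    return replica.latency_s + replica.size_bytes / replica.bandwidth_bytes_per_s


def choose_replica(replicas):
    """Return the replica with the smallest estimated transfer cost."""
    return min(replicas, key=transfer_cost)


candidates = [
    Replica("site-a", 2_000_000_000, 10e6, latency_s=0.5),
    Replica("site-b", 2_000_000_000, 40e6, latency_s=2.0),
]
best = choose_replica(candidates)
```

With these made-up numbers the second site wins despite its higher latency, because the bandwidth term dominates for a large file. A fuller model would add the policy and time-dependent terms mentioned on the slide.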




GA Approach


GridPP work will investigate uses of Genetic
Algorithms for optimising complex multi-dimensional
cost functions


Solutions are ‘bred’ in parallel, ranked according
to the cost function, and re-bred from the best
candidates using crossover and mutation
operators


Multiple points evolved simultaneously;
more robust against local minima


Optimisations generally faster for
complex functions, particularly for more
unpredictable situations e.g. networks!
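
The breed / rank / re-breed cycle described above can be sketched as a small genetic algorithm. This is an illustrative toy under stated assumptions: the cost function (a standard multi-modal test function), the population size, the selection fraction and the mutation settings are all arbitrary choices, not parameters from the GridPP work.

```python
import math
import random


def cost(x):
    """Hypothetical multi-modal cost function (Rastrigin); minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


def evolve(cost_fn, dims=3, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [[rng.uniform(-5, 5) for _ in range(dims)] for _ in range(pop_size)]
    initial_best = min(cost_fn(p) for p in population)
    for _ in range(generations):
        ranked = sorted(population, key=cost_fn)   # rank candidates by cost
        parents = ranked[: pop_size // 4]          # keep the best quarter unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each coordinate comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally perturb a coordinate with Gaussian noise.
            child = [v + rng.gauss(0, 0.3) if rng.random() < 0.2 else v for v in child]
            children.append(child)
        population = parents + children
    best = min(population, key=cost_fn)
    return best, cost_fn(best), initial_best


best, best_cost, initial_best = evolve(cost)
```

Because the best quarter of each generation is carried over unchanged, the best cost can never get worse from one generation to the next. Evolving many candidates at once is what gives the robustness against local minima mentioned on the slide.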





M9: SQL Database Service

LDAP? Hierarchical model assumes you know the query before designing the database!

Arbitrary / computed queries can be expensive / impossible!

RDBMS model is better for these queries

Investigating SQL databases…

Issues with transactions to be investigated

M9 should see basic SQL insert, delete, update and select operations.

Standard protocols should be used!

e.g. Generic SQL wrapped in XML over HTTPS...

PostgreSQL
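
As a rough illustration of "generic SQL wrapped in XML", the sketch below builds an XML request carrying a parameterised SQL statement, runs it on the server side and returns the rows as XML. Several things are assumptions made to keep the example self-contained: the element names (`request`, `sql`, `param`, `result`, `row`) are invented, the in-memory `sqlite3` database stands in for PostgreSQL, and the HTTPS transport is left out entirely.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_request(sql, params=()):
    """Client side: wrap an SQL statement and its parameters in XML."""
    root = ET.Element("request")
    ET.SubElement(root, "sql").text = sql
    for value in params:
        ET.SubElement(root, "param").text = str(value)
    return ET.tostring(root, encoding="unicode")


def handle_request(conn, request_xml):
    """Server side: unwrap the request, run it, and return any rows as XML."""
    root = ET.fromstring(request_xml)
    sql = root.findtext("sql")
    params = [p.text for p in root.findall("param")]
    cursor = conn.execute(sql, params)  # parameters are bound, not pasted into the SQL
    result = ET.Element("result")
    if cursor.description is not None:  # None for statements that return no rows
        names = [column[0] for column in cursor.description]
        for row in cursor.fetchall():
            row_el = ET.SubElement(result, "row")
            for name, value in zip(names, row):
                # Assumes column names are valid XML element names.
                ET.SubElement(row_el, name).text = str(value)
    conn.commit()
    return ET.tostring(result, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (lfn TEXT, size INTEGER)")
handle_request(conn, build_request("INSERT INTO files VALUES (?, ?)", ("lfn://demo/run1", 2048)))
reply = handle_request(conn, build_request("SELECT lfn, size FROM files WHERE lfn = ?", ("lfn://demo/run1",)))
```

Here `reply` holds one `<row>` element with `<lfn>` and `<size>` children. A real service would also need authentication and some restriction on which statements a client may send.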




M9: SQL Database Service


Producer / Consumer Model


A Producer adds meta-data and registers the table format.


(Dynamic registration of new tables is outside M9..?)


A Consumer uses a known or registered schema
(tbd!) to construct a query.


The query is translated by the server to SQL, executed,
and the result returned to the client as XML / HTML


APIs to be implemented:


Java, Web, Command line




M9: Service Index


Grid services must be able to discover each other!


Neither the ‘everyone knows...’ approach nor the
hierarchical approach is scalable.

[Diagram: service indices sds.cern.ch, sds.anl.gov, sds.infn.it, sds.ral.uk, sds.padova-infn.it, sds.trieste-infn.it and sds.bologna-infn.it. Labels: "Allowed"; "Hierarchical Model".]

Construct a ‘web’ of Service Indices




M9: Service Index


Services publish XML based description…


e.g. name, contact protocols / details, type, who can
know about me.


JINI style ‘leases’: services must report
periodically or be dropped from list


Clients query service-indices using an XML based
query with standard schema (tbd!)…


M9 will see basic propagation of queries.


Security: Services must be able to limit who can
access their description!


Coarse grained..


Other than this, the service index will not provide any
access policy control..!
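
The lease idea and the coarse-grained visibility limit can be sketched as a small in-memory registry. This is an assumption-laden illustration: descriptions are plain dictionaries rather than XML, and the class name, the `visible_to` field and the lease length are hypothetical.

```python
import time


class ServiceIndex:
    """Keeps service descriptions only while their lease is being renewed."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock      # injectable so the lease logic can be tested
        self._entries = {}       # service name -> (description, expiry time)

    def publish(self, name, description):
        """Register a service, or renew its lease if it is already listed."""
        self._entries[name] = (description, self._clock() + self._lease)

    def _drop_expired(self):
        now = self._clock()
        self._entries = {
            name: entry for name, entry in self._entries.items() if entry[1] > now
        }

    @staticmethod
    def _visible(description, client):
        # Coarse-grained control: a service may list who is allowed to see it.
        allowed = description.get("visible_to")
        return allowed is None or client in allowed

    def query(self, service_type, client):
        """Return names of live services of the given type that the client may see."""
        self._drop_expired()
        return sorted(
            name
            for name, (description, _) in self._entries.items()
            if description.get("type") == service_type
            and self._visible(description, client)
        )
```

A service that stops calling `publish` simply disappears from query results once its lease runs out, so the index needs no explicit de-registration step.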







M9: Service Index


Service descriptions should be small! (<1k)


User-defined (e.g. experiment-specific) schemas should
be largely discouraged.



After M9.. more intelligent web-traversing tools
can be developed!


Agent technology?



How to find a service index??


Hard wired ‘root’ service indices??


Limited scope multicast advertising??





CPU and Bandwidth Monitoring


Scalable CPU monitoring system for the ScotGRID cluster, with a JAS GUI, being developed

[Screenshots: general cluster overview; more detailed individual node information]




CPU and Bandwidth Monitoring


Network measurement tools being evaluated
and developed

Bandwidth measurement from UDP packet dispersion:

$$BW = \frac{B}{\Delta t}$$

[Diagram: two packets, each labelled b, separated by a time gap Δt]


MonitorX

Pipechar

IPERF
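
A minimal sketch of the packet-dispersion estimate: given the size of back-to-back packets and their arrival times, divide the packet size by the typical gap between arrivals. The function name and the choice of the median gap are assumptions for illustration; this is not how MonitorX, Pipechar or IPERF are implemented.

```python
import statistics


def dispersion_bandwidth(packet_size_bytes, arrival_times_s):
    """Estimate bandwidth in bits per second as packet size / arrival gap."""
    gaps = [
        later - earlier
        for earlier, later in zip(arrival_times_s, arrival_times_s[1:])
    ]
    gaps = [gap for gap in gaps if gap > 0]
    if not gaps:
        raise ValueError("need at least two packets with distinct arrival times")
    # The median is less sensitive than the mean to a few delayed packets.
    return packet_size_bytes * 8 / statistics.median(gaps)


# Four 1500-byte packets arriving 0.12 ms apart: roughly 100 Mbit/s.
estimate = dispersion_bandwidth(1500, [0.0, 0.00012, 0.00024, 0.00036])
```

Real measurements are noisier than this, since cross traffic can compress or stretch the gaps, which is one reason to look at many packet trains rather than a single pair.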




CPU and Bandwidth Monitoring


Other methods / tools being investigated and
developed

Bandwidth measurement from round-trip time (RTT) using UDP, TCP/IP and ICMP

mptraceu

pathchar

$$\frac{d(RTT_{\min})}{dB} = \frac{1}{BW}$$

Uses RTT through routers as a
function of packet size to obtain
bandwidth
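
The slope relation can be sketched as a least-squares fit of minimum RTT against packet size, with the bandwidth taken as the inverse of the slope. This is a simplified single-link illustration under stated assumptions: the function name and the sample data are hypothetical, and tools such as pathchar additionally difference the results hop by hop.

```python
def estimate_bandwidth(samples):
    """Estimate bandwidth in bytes per second from RTT samples.

    samples maps packet size in bytes to a list of measured RTTs in seconds.
    The minimum RTT per size filters out queueing delay; the fitted slope of
    minimum RTT versus size is taken as 1 / bandwidth.
    """
    sizes = sorted(samples)
    if len(sizes) < 2:
        raise ValueError("need RTT samples for at least two packet sizes")
    min_rtts = [min(samples[size]) for size in sizes]
    n = len(sizes)
    mean_size = sum(sizes) / n
    mean_rtt = sum(min_rtts) / n
    sxx = sum((s - mean_size) ** 2 for s in sizes)
    sxy = sum((s - mean_size) * (r - mean_rtt) for s, r in zip(sizes, min_rtts))
    slope = sxy / sxx  # seconds per byte
    return 1.0 / slope


# Synthetic data: 10 ms base latency, 1.25e6 bytes/s link, some queueing noise.
true_bw = 1.25e6
samples = {
    size: [0.010 + size / true_bw + extra for extra in (0.004, 0.0, 0.002)]
    for size in (64, 256, 512, 1024, 1400)
}
estimated_bw = estimate_bandwidth(samples)
```

On the synthetic data the fit recovers the assumed link speed, because at least one sample per size is free of queueing delay. Whether the size-dependent term is counted once or twice in a real RTT depends on the size of the reply packet.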




Summary


GDMP Replication Manager completed


Active discussion in WP2 and beyond about
replication strategies


Cost models… GA approach?


SQL Database Service being investigated for M9


Service Index being investigated for M9



CPU and Network Monitoring work is underway
in ScotGRID...