iRODS - integrated Rule Oriented Data System


Jan 31, 2013 (4 years and 6 months ago)


1

iRODS - integrated Rule Oriented Data System

Reagan Moore

rwmoore@renci.org


2

Development Team


DICE team
  Arcot Rajasekar - iRODS Development Lead
  Mike Wan - iRODS Chief Architect
  Wayne Schroeder - iRODS Product Mgr., Developer
  Bing Zhu - Fedora, Windows
  Mike Conway - Java (Jargon)
  Paul Tooby - Documentation, Foundation
  Sheau-Yen Chen - Data Grid Administration
  Reagan Moore - PI

Preservation
  Richard Marciano - Preservation Development Lead
  Chien-Yi Hou - Preservation Micro-services
  Antoine de Torcy - Preservation Micro-services

3

Overview of iRODS Architecture

User
  Can search, access, add, and manage data and metadata
  * Access data with a Web-based browser, the iRODS GUI, or command-line clients.

iRODS Data System
  iRODS Data Server - Disk, tape, etc.
  iRODS Metadata Catalog - Track information
  iRODS Rule Engine - Track policies

4

Scale of iRODS Data Grid


Number of files
  Tens to millions to hundreds of millions of files
Size of data
  Gigabytes to hundreds of terabytes to petabytes of data
Number of policy enforcement points
  64 actions define when policies are checked
System state information
  112 metadata attributes for system information per file
Number of functions
  185 composable micro-services
Number of storage systems that are linked
  One to tens to a hundred storage resources
Number of data grids
  One to federation of tens of data grids

5

Data are Inherently Distributed


Distributed sources


Projects span multiple institutions


Distributed analysis platforms


Grid computing


Distributed data storage


Minimize risk of data loss, optimize access


Distributed users


Caching of data near user


Multiple stages of data life cycle


Data repurposing for use in broader context

Demo-1

6

Organize Distributed Data into a Sharable Collection

Project repository
  MotifNet - manage collection of analysis products
Institutional repository
  Carolina Digital Repository for UNC collections
Regional collaboration
  RENCI Data Grid linking resources across North Carolina
National collaboration
  NSF Temporal Dynamics of Learning Center
  Australian Research Collaboration Service
National Library
  French National Library
National Archive
  NARA Transcontinental Persistent Archive Prototype, Taiwan
International collaboration
  BaBar High Energy Physics (SLAC-IN2P3)
  National Optical Astronomy Observatory (Chile-US)

7

Logical Name Spaces

Storage Repository                     Data Grid
  Storage location                       Logical resource name space
  User name                              Logical user name space
  File name                              Logical file name space
  File context (creation date,…)         Logical context (metadata)
  Access constraints                     Access controls

Institution Repository
Data Access Methods (C library, Unix shell, Java, Fedora)
Community controls the name spaces

Demo-2

Social Challenges

Every community prefers its own user interface
  Unix shell commands - icommands
  Java I/O library - JARGON / JUX
  C I/O library
  Portals - EnginFrame
  Digital Libraries - Fedora / DSpace
  Workflows - Kepler / Taverna
  Transport - GridFTP / Parrot
  Web browsers / Windows browser
  Load libraries - Python (Pyrods)
  User level file systems - FUSE / WebDAV / PetaFS
  Grid APIs - JSAGA
  Web services - URSpace / VOSpace
  Future ports - Islandora / iDROP

Heterogeneity Challenges

Many types of operating systems
  Unix variants, 32-bit/64-bit
  Mac OSX/Intel PC, Mac OSX/PowerPC
  Linux
  Windows XP, Vista
Many types of storage systems
  File systems
  Tape archives
  Cloud storage
Different administrative domains
  Challenge-response authentication
  Kerberos
  GSI - Grid Security Infrastructure (PKI certificates)
  Shibboleth

10

Data Virtualization

[Figure: layered stack, from Access Interface through Standard Micro-services, Standard Operations, and Storage Protocol to Storage System]

Data Grid
  Map from actions requested by the access method to a standard set of Micro-services.
  Map the standard Micro-services to standard operations.
  Map the operations to the protocol supported by the operating system.


11

iRODS - Policy-based Management

Turn policies into computer actionable rules
  Compose rules by chaining micro-services
Manage state information as attributes on namespaces:
  Files / collections / users / resources / rules
Validate assessment criteria
  Queries on state information, parsing of audit trails
Automate administrative functions

12

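iput With Replication

[Figure: a client runs iput to send data to Resource 1 (/<filesystem>); metadata is recorded in the icat; the data is also stored on Resource 2. Each resource has its own Rule Base, and the figure notes "Rule added to Rule database". Credit: NASA Center for Computational Sciences]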

13

Under the hood - a glimpse

1. User asks for data (using logical properties)
2. Data request goes to the 1st server
3. 1st server looks up information in the catalog
4. Catalog reports that the 2nd (federated) server has the data
5. 1st server asks the 2nd server for the data
6. 2nd server applies its rules and serves the data

[Figure: iRODS servers, each with its own Rule Engine, at Austin, San Diego, and Chapel Hill, with a Metadata Catalog database (DB)]
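The request sequence on this slide can be pictured with a small toy model. This is a minimal Python sketch for illustration only, not iRODS code: the `Server` class, the dictionary standing in for the catalog, and the logical path are all hypothetical.

```python
# Toy model of the lookup sequence; every name and value here is hypothetical.
class Server:
    def __init__(self, name, catalog, storage=None):
        self.name = name
        self.catalog = catalog          # shared catalog: logical path -> server name
        self.storage = storage or {}    # data physically held at this site
        self.peers = {}                 # other servers in the grid, by name

    def apply_rules(self, logical_path):
        # Stand-in for the local Rule Engine; this sketch allows every request.
        return True

    def serve(self, logical_path):
        # The server that holds the data applies its own rules before serving.
        if not self.apply_rules(logical_path):
            raise PermissionError(logical_path)
        return self.storage[logical_path]

    def get(self, logical_path):
        # Ask the catalog which server holds the data, then fetch it
        # locally or from that peer.
        owner = self.catalog[logical_path]
        if owner == self.name:
            return self.serve(logical_path)
        return self.peers[owner].serve(logical_path)


catalog = {"/zone/home/alice/report.txt": "chapel_hill"}
first = Server("san_diego", catalog)
second = Server("chapel_hill", catalog, {"/zone/home/alice/report.txt": b"hello"})
first.peers["chapel_hill"] = second

data = first.get("/zone/home/alice/report.txt")
print(data)
```

The user only supplies the logical path; which site answers is decided by the catalog lookup.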

14

iRODS Distributed Data Management

iRODS Wiki
  Presentations, papers, tutorials
  http://irods.diceresearch.org
Open source software - BSD license
  Contributed clients, software
  Performance assessments
Download source code
  Windows - binary release
  Unix / Mac / Linux build from source
iRODS Primer
  Morgan & Claypool
  Synthesis Lectures on Information Concepts, Retrieval, and Services


16




iRODS Shows Unified "Virtual Collection"

The iRODS Data System can install in a "layer" over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.

[Figure: a user with client views and manages data, and sees a single "Virtual Collection" spanning My Data (disk, tape, database, filesystem, etc.), Project Data (disk, tape, database, filesystem, etc.), and Reference Data (remote disk, tape, filesystem, etc.)]

17

Infrastructure Independence


Manage properties of the collection
independently of the choice of technology


Access, authentication, authorization, description,
location, distribution, replication, integrity, retention


Enforce policies across all storage locations


Rule Engine resident at each storage site


Apply procedures at each remote storage site


Chain encapsulated operations into workflows


Use infrastructure independence to enable use
of new technology without interruption


Integrate new access methods, new storage systems,
new network protocols, new authentication systems

18

Data Grid Security


Manage name spaces for:


{users, files, storage}


Assign access controls as constraints imposed
between two logical name spaces


Access controls remain invariant as files are moved
within the data grid


Controls on: Files / Storage systems / Metadata


Authenticate each user access


PKI, Kerberos, challenge-response, Shibboleth


Use internal or external identity management system


Authorize all operations


ACLs (Access Control Lists) on users and groups


Separate condition for execution of each rule


Internal approval flags (IRB) within a rule
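The point that access controls stay invariant when files move can be shown with a small model in which permissions are keyed on logical names only. This is a hedged Python sketch, not the iRODS implementation: the `DataGrid` class, its methods, and the resource and path names are hypothetical.

```python
# Toy model: ACLs bind logical user names to logical file names, so moving
# the physical copy does not touch them. All names are hypothetical.
class DataGrid:
    def __init__(self):
        self.location = {}   # logical file name -> (resource, physical path)
        self.acl = {}        # (logical user, logical file name) -> permission

    def put(self, logical_name, resource, physical_path):
        self.location[logical_name] = (resource, physical_path)

    def grant(self, user, logical_name, permission):
        self.acl[(user, logical_name)] = permission

    def move(self, logical_name, resource, physical_path):
        # Only the physical location changes; the ACL table is untouched.
        self.location[logical_name] = (resource, physical_path)

    def can_read(self, user, logical_name):
        return self.acl.get((user, logical_name)) in ("read", "write", "own")


grid = DataGrid()
grid.put("/zone/home/alice/a.dat", "disk1", "/vault1/a.dat")
grid.grant("bob", "/zone/home/alice/a.dat", "read")

before = grid.can_read("bob", "/zone/home/alice/a.dat")
grid.move("/zone/home/alice/a.dat", "tape1", "/archive/0001")
after = grid.can_read("bob", "/zone/home/alice/a.dat")
print(before, after)
```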

Demo-3

19

iRODS Rules and Micro-services

Reagan W. Moore

Rule Base


Rules stored in core.irb file


Separate copy of core.irb installed at each
storage location


Can have storage or site specific rules


Each rule is associated (through its name) with a specific event in the iRODS framework (64 hooks)
  acPreProcForPut
  acPostProcForPut
  acDeleteUser

Can also execute user-defined rules through the irule command

Variables


Session variables


Define parameters associated with the client session, such
as:


$userNameClient


$rodsZoneClient


Workflow variables


Define parameters used within the workflow


*A, *CollName


stdout


Persistent state information


Maintained across sessions, stored in iCAT


DATA_NAME, DATA_SIZE, COLL_NAME, DATA_CHECKSUM


META_DATA_ATTR_NAME, META_DATA_ATTR_UNITS

22

iRODS Rules

Each rule defines
  An action for an event
  Condition
  Action chains (micro-services and rules)
  Recovery chains

Invoked by servers to enforce policies
Invoked by clients to run workflows on servers

Rule types
  Atomic -- applied immediately
  Deferred -- run at a later time in the background
  Periodic -- run at a fixed time interval

23

Format of a Rule


Action | Condition | MS1, …, MSn | RMS1, …, RMSn

Action
  Name of the action to be performed
  Name known to the server and invoked by the server

Condition
  Condition under which the rule applies

Micro-services
  If the condition applies, the micro-services are executed

Recovery micro-services
  If any micro-service fails, the recovery micro-service(s) are executed to maintain transactional consistency

Example of MS / RMS pairs
  createFile(*F)           removeFile(*F)
  ingestMetadata(*F,*M)    rollback
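The four-field layout and the role of the recovery chain can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the iRODS rule engine: `parse_rule` and `run_chain` are hypothetical helpers, the parser does not handle a `|` inside a condition, and the order in which recovery steps run (failed step first, then completed steps in reverse) is an assumption.

```python
def parse_rule(line):
    """Split 'Action|Condition|MS1##MS2|RMS1##RMS2' into its four fields."""
    action, condition, ms_chain, rms_chain = line.split("|")
    return {
        "action": action.strip(),
        "condition": condition.strip(),
        "micro_services": [m.strip() for m in ms_chain.split("##")],
        "recovery": [r.strip() for r in rms_chain.split("##")],
    }


def run_chain(steps, recoveries):
    """Run steps in order; on failure run the paired recovery steps in reverse."""
    done = 0
    try:
        for step in steps:
            step()
            done += 1
    except Exception:
        # Assumed order: recovery for the failed step, then for completed steps.
        for recover in reversed(recoveries[: done + 1]):
            recover()
        return False
    return True


rule = parse_rule("myTestRule||createFile(*F)##ingestMetadata(*F,*M)|removeFile(*F)##rollback")

log = []

def create_file():
    log.append("createFile")

def ingest_metadata():
    raise RuntimeError("metadata ingest failed")

ok = run_chain(
    [create_file, ingest_metadata],
    [lambda: log.append("removeFile"), lambda: log.append("rollback")],
)
print(rule)
print(ok, log)
```

Pairing each micro-service with its own undo step is what lets a failed chain leave the system in a consistent state.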

24

Condition


Condition under which this Rule
applies


Examples


$rescName == demoResc8


$objPath like /x/y/z/*



Many operators


==, !=, >, <, >=, <=


%%, !! (and, or)


expr like reg-expr, expr not like reg-expr, expr ::= string
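The two example conditions can be mimicked in Python to show how a condition selects when a rule applies. This is a rough sketch only: it assumes `like` behaves as a shell-style wildcard match, which the slide does not confirm, and the `session` dictionary and `like` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def like(value, pattern):
    """Wildcard match; treating 'like' as shell-style globbing is an assumption."""
    return fnmatchcase(value, pattern)


# Hypothetical session values standing in for $rescName and $objPath.
session = {"rescName": "demoResc8", "objPath": "/x/y/z/file.dat"}

matches_resc = session["rescName"] == "demoResc8"    # $rescName == demoResc8
matches_path = like(session["objPath"], "/x/y/z/*")  # $objPath like /x/y/z/*
outside_path = like("/a/b/file.dat", "/x/y/z/*")
print(matches_resc, matches_path, outside_path)
```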

25

Micro-services (MSs)

Well-defined server-side procedures and functions
  C functions on servers

MSs can be chained to form a workflow using '##'
  msiDataObjOpen(*A,*S_FD)##
  msiDataObjRead(*S_FD,10000,*R_BUF)##
  msiDataObjClose(*S_FD,*stat)

Flow control
  whileExec - while loop
  forExec - for loop
  forEachExec - for each in the table or list
  break
  ifExec - if-else

26

Micro-services flow control examples

whileExec
  assign(*A,0)##whileExec(*A < 20, writeLine(stdout,*A)##assign(*A,*A + 4), nop##nop)

forExec
  forExec(assign(*A,0), *A < 20, assign(*A,*A + 4), writeLine(stdout,*A), nop)

ifExec
  ifExec(*A > *D, assign(*A,*D), nop, assign(*D,*A), nop)
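For readers more used to a general-purpose language, the three examples on this slide correspond roughly to the Python below. It is an analogy, not a translation tool: the function names are hypothetical, the nop recovery arguments have no counterpart here, and reading the ifExec arguments as (condition, then, then-recovery, else, else-recovery) is an assumption.

```python
def while_exec():
    out = []
    a = 0                  # assign(*A,0)
    while a < 20:          # whileExec(*A < 20, ...)
        out.append(a)      # writeLine(stdout,*A)
        a = a + 4          # assign(*A,*A + 4)
    return out


def for_exec():
    out = []
    for a in range(0, 20, 4):   # forExec(init, condition, step, body, recovery)
        out.append(a)           # writeLine(stdout,*A)
    return out


def if_exec(a, d):
    if a > d:              # ifExec(*A > *D, ...
        a = d              # assign(*A,*D)
    else:
        d = a              # assign(*D,*A)
    return a, d


print(while_exec())
print(for_exec())
print(if_exec(7, 3))
```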

27

Other Micro-services

delayExec - execute MSs at a later time
  Executed by the iRODS batch server (irodsReServer) in the background
  Example
    delayExec(<PLUSET>1m</PLUSET>,msiReplColl(*desc_coll,*desc_resc,backupMode,*outbuf),nop)
  Time keywords
    PLUSET - execute after the specified time has passed
    ET - execute at the specified time (<ET>23:00</ET>)
    EF - repeat execution at the specified frequency
    Can be combined
      <PLUSET>1m</PLUSET><EF>5m</EF>

remoteExec - execute MSs on remote servers
  remoteExec(andal.sdsc.edu,null,msiSleep(10,0)##writeLine(stdout,open remote write in andal),nop)

assign - assign a value to a parameter

writeString - write a string to stdout buffer

writeLine - write a line (with end of line) to stdout buffer
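The tagged time keywords in the delayExec examples are easy to pick apart, which helps when checking what a combined specification asks for. The Python below is a hedged sketch, not the parser used by irodsReServer: `parse_delay` and `to_seconds` are hypothetical helpers, and the set of unit suffixes (s, m, h, d) is an assumption beyond the "m" shown on the slide.

```python
import re

# Assumed unit suffixes; only "m" appears in the slide's examples.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_delay(spec):
    """Return {tag: text} for each <TAG>value</TAG> pair in the string."""
    return dict(re.findall(r"<(\w+)>(.*?)</\1>", spec))


def to_seconds(value):
    """Convert a duration such as '5m' to seconds."""
    return int(value[:-1]) * UNITS[value[-1]]


tags = parse_delay("<PLUSET>1m</PLUSET><EF>5m</EF>")
first_run = to_seconds(tags["PLUSET"])   # start after this many seconds
repeat_every = to_seconds(tags["EF"])    # then repeat at this interval
print(tags, first_run, repeat_every)
```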

28

Micro-services parameters

Micro-services communicate through:
  Arguments/Parameters
    Input from the initiator (client/server)
      Literals
      Variables - start with *
    Output of an MS can be used as input to another MS in an MS chain
  System Session Parameters
    Start with "$"
    Valid across rule invocations
  Persistent data
    iCAT
    Query the iCAT
    Valid across sessions
  XMessages
    Out-of-band communications
    Sender obtains send/receive tickets
    Sender passes the receive ticket to receivers
    Receivers use the ticket to read the message
    Message exchange
      Between parallel sessions
      Between the batch manager and the task manager on the task status


29

Example of passing parameters between Micro-services

trimColl.ir file:

myTestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiDataObjTrim(*B,tgReplResc,null,1,null,*C),nop)|nop##nop
*Action=trim%*Condition= COLL_NAME = '/tempZone/home/rods/loopTest'
*Action%*Condition

irule -F trimColl.ir

30

Using the rulegen parser


See:
https://www.irods.org/index.php/HELP.rulegen


Uses a nicer rule language and converts it into the core.irb version

rulegen -s rX.r
  Converts from the rulegen syntax to the core.irb syntax and displays the result on your screen

rulegen -s rX.r > rX.ir
  Converts from the rulegen syntax to the core.irb syntax and stores the result in the file rX.ir

irule -F rX.ir
  Executes the policy

31

Adding metadata values

mytestrule{
  msiString2KeyValPair("FILETYPE_STATUS2=FTPASS",*kvp);
  msiAssociateKeyValuePairsToObj(*kvp,*path,"-d");
}
INPUT *Att=$FILETYPE,*Val=$text,*path=/renci/home/rods/listMS.ir
OUTPUT ruleExecOut

Note that there cannot be any spaces around the "=" sign within the msiString2KeyValPair micro-service. Spaces are interpreted as part of the attribute name and attribute value.
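The note about spaces is easier to see with a concrete split. The Python below is a minimal sketch of the behaviour described on this slide, assuming a plain split on the first "=" with no trimming; `string_to_kvp` is a hypothetical stand-in, not the micro-service's actual code.

```python
def string_to_kvp(text):
    """Split on the first '=' and keep both sides exactly as written."""
    key, value = text.split("=", 1)
    return {key: value}   # no stripping, so stray spaces survive


good = string_to_kvp("FILETYPE_STATUS2=FTPASS")
bad = string_to_kvp("FILETYPE_STATUS2 = FTPASS")
print(good)
print(bad)
```

With the spaces present, a later query for the attribute name FILETYPE_STATUS2 would not match, because the stored name carries a trailing space.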

32

Adding Metadata

mytestrule{
  msiString2KeyValPair("*attrname=*attrvalue",*kvp);
  assign(*A,*path/*obj);
  writeLine(stdout,*A);
  msiAssociateKeyValuePairsToObj(*kvp,*path/*obj,"-d");
}
INPUT *path=/renci/home/rods,*obj=$listMS.ir,*attrname="FILETYPE",*attrvalue="25"
OUTPUT ruleExecOut

33

Reading user-defined metadata

acGetDataObjAVU{
  msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName'", *Query);
  msiExecStrCondQuery(*Query, *GenQOut);
  forEachExec(*GenQOut){
    msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
    msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
    msiGetValByKey(*GenQOut, DATA_NAME, *name);
    writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
  }
}
INPUT *CollName="$/renci/home/rods"
OUTPUT ruleExecOut

This lists all of the user-defined metadata values for all of the files in the named collection

34

Example of multiple conditions

acGetDataObjAVU{
  msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName' and META_DATA_ATTR_NAME = '*AttrName'", *Query);
  msiExecStrCondQuery(*Query, *GenQOut);
  forEachExec(*GenQOut){
    msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
    msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
    msiGetValByKey(*GenQOut, DATA_NAME, *name);
    writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
  }
}
INPUT *CollName="$/renci/home/rods", *AttrName="FILETYPE"
OUTPUT ruleExecOut

This only lists files that have the specified attribute name
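The effect of adding a second condition to the query can be illustrated over an in-memory table. This is plain Python over made-up rows, not the iCAT query interface: the row values and the `select` helper are hypothetical, and only the column names come from the slides.

```python
# Hypothetical rows shaped like the query results; values are made up.
rows = [
    {"COLL_NAME": "/renci/home/rods", "DATA_NAME": "listMS.ir",
     "META_DATA_ATTR_NAME": "FILETYPE", "META_DATA_ATTR_VALUE": "25"},
    {"COLL_NAME": "/renci/home/rods", "DATA_NAME": "notes.txt",
     "META_DATA_ATTR_NAME": "OWNER", "META_DATA_ATTR_VALUE": "rods"},
    {"COLL_NAME": "/renci/home/other", "DATA_NAME": "a.dat",
     "META_DATA_ATTR_NAME": "FILETYPE", "META_DATA_ATTR_VALUE": "7"},
]


def select(rows, **conditions):
    """Keep rows where every named column equals the requested value."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


one_condition = select(rows, COLL_NAME="/renci/home/rods")
two_conditions = select(rows, COLL_NAME="/renci/home/rods",
                        META_DATA_ATTR_NAME="FILETYPE")

for r in two_conditions:
    print(f'{r["DATA_NAME"]} has attribute {r["META_DATA_ATTR_NAME"]} '
          f'and value {r["META_DATA_ATTR_VALUE"]}')
```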

35

Simple rule to list files

testlist.ir

mytestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiGetValByKey(*B,DATA_NAME,*D)##msiGetValByKey(*B,COLL_NAME,*E)##writeLine(stdout,*E/*D),nop)|nop##nop
*C=/renci/home/rods%*Action=list%*Condition=COLL_NAME = '*C'
ruleExecOut

Try
  irule -F testlist.ir prompt
  irule -F testlist.ir 'yourpathname'
  irule -F testlist.ir *C='yourpathname'

36

Converting String to AVU triplet

testrule||msiDataObjChksum(*objPath,null,*ChksumStr)##msiGetSystemTime(*Date,human)##msiString2KeyValPair(Checksum.*Date=*ChksumStr,*KVPair)##msiAssociateKeyValuePairsToObj(*KVPair,*objPath,-d)|nop
*objPath=/tempZone/home/antoine/tmp.txt
ruleExecOut


37

Installation of iRODS

Chien-Yi Hou

38

iRODS Wiki


http://irods.diceresearch.org


Descriptions of the technology


Publications / presentations


Download


Performance tests


Tinderbox system (tracks upgrades)


irods-chat page

39

iRODS installation


Download the appropriate installation manual from the iRODS Wiki
http://irods.diceresearch.org



Installation procedure will take


Up to 30 minutes for server/catalog/clients


Up to 10 minutes for server/clients


About 3 minutes for clients



We will do a client install

40

Windows Installation


From the URL https://www.irods.org/index.php/windows, go to the section labeled "Windows i-Commands" and click on the file

  10-29-09: Windows i-commands 2.2

This will download the file

  win_icmds_2_2.zip

Uncompress the file

41

Detailed Windows Install


Extract the exe files. This will be a long list of separate executable commands, one for each type of operation that you may need to perform. The list will include:

  iadmin - used by the data grid administrator to set up resources and accounts
  icd - change to a different directory in the data grid
  ils - list files in a data grid directory

To use these icommands, you will need to set up an environment variable file which has default settings for the data grid that the class will use.

Note the directory name where you have put the executables

42

Detailed Windows Install


On the URL https://www.irods.org/index.php/windows there are instructions in the section labeled "Setting up the iRODS User Environment file in Windows (for i-commands only)"

To create the .irodsEnv file:

* Launch a "Command Prompt" by navigating to the menu "Start" -> "Accessories" -> "Command Prompt".
* Change directory to the user home directory.
    > cd %HOMEDRIVE%%HOMEPATH%
* Type the following Windows commands to create a folder, ".irods", and move into this directory.
    > md .irods
    > cd .irods
    > Notepad .irodsEnv

This will launch Notepad and create a text file named ".irodsEnv".

43

Detailed Windows Install


Enter the following information into Notepad and click save.

irodsHost 'iren.renci.org'
irodsPort 1247
irodsDefResource 'renci-vault1'
irodsHome '/RENCI/home/usertutor1'
irodsCwd '/RENCI/home/usertutor1'
irodsUserName 'usertutor1'
irodsZone 'renci'

These are the environment variables for a user account on the data grid 'RENCI'.

You will need to replace the three occurrences of 'usertutor1' with your iRODS account name on lines 4, 5, and 6.

44

Detailed Windows Install


To run i-commands in any directory on a Windows machine, the path to where the i-commands reside should be set in the Windows PATH environment variable.

To do this, launch the System dialogue via:

* Start -> Settings -> Control Panel.
* Click the "System" icon.
* In the "Advanced" tab, click the "Environment variables" button.

Add the path name for the i-commands directory to the "PATH" in either the user category or the system category. The path name can be found from the window that shows the icommand executables. Add a semicolon and this path name to the end of the PATH text.

Then close the window and start a new command prompt window. You will be able to execute the icommands from any directory on your system.

45

Detailed Windows Install


To connect to the data grid, type



iinit




To change your password, type



ipasswd




You will be prompted for your current password





You will then be asked for the new password

46

iRODS - Unix/Linux/Mac Installation


https://www.irods.org/download.html


Fill out form for:


BSD license


Registration / agreement


Tar file


Installation script (Linux, Solaris, Mac OSX)


Automated download of PostgreSQL, ODBC


Installation of PostgreSQL, ODBC, iRODS


Initiation of iRODS collection

47

iRODS Installation - Unix

Unpack the release tar file
  gzip -d irods.tgz
  tar xf irods.tar

cd into the top directory and execute
  ./irodssetup


It will prompt for a few parameters

48

irodssetup


Set up iRODS


------------------------------------------------------------------------


iRODS is a flexible data archive management system that supports many
different site configurations. This script will ask you a few questions, then
automatically build and configure iRODS.



There are four main components to iRODS:



1. An iRODS server that manages stored data.



2. An iCAT catalog that manages metadata about the data.



3. A database used by the catalog.



4. A set of 'i-commands' for command-line access to your data.



You can build some, or all of these, in a few standard configurations. For
new users, we recommend that you build everything.

49

iRODS Client Installation


iRODS configuration setup


----------------------------------------------------------------


This script prompts you for key iRODS configuration options.


Default values (if any) are shown in square brackets [ ] at each


prompt. Press return to use the default, or enter a new value.



For flexibility, iRODS has a lot of configuration options. Often


the standard settings are sufficient, but if you need more control


enter yes and additional questions will be asked.




Include additional prompts for advanced settings [no]?


50

iRODS Client Installation


iRODS configuration (advanced)


------------------------------


iRODS consists of clients (e.g. i-commands) with at least one iRODS
server. One server must include the iRODS metadata catalog (iCAT).

For the initial installation, you would normally build the server with
the iCAT (an iCAT-Enabled Server, IES), along with the i-commands.

After that, you might want to build another Server to support another
storage resource on another computer (where you are running this now).
You would then build the iRODS server non-ICAT, and configure it with
the IES host name (the servers connect to the IES for ICAT operations).

If you already have iRODS installed (an IES), you may skip building
the iRODS server and iCAT, and just build the command-line tools.

Build an iRODS server [yes]? no

51

iRODS Client Installation


iRODS can make use of the Grid Security Infrastructure (GSI)


authentication system in addition to the iRODS secure


password system (challenge/response, no plain-text).


In most cases, the iRODS password system is sufficient but


if you are using GSI for other applications, you might want


to include GSI in iRODS. Both the clients and servers need


to be built with GSI and then users can select it by setting


irodsAuthScheme=GSI in their .irodsEnv files (or still use


the iRODS password system if they want).




Include GSI [no]?

no

52

iRODS Client Installation


Confirmation


------------


Please confirm your choices.



--------------------------------------------------------



GSI not selected




Build iRODS command-line tools



--------------------------------------------------------



Save configuration (irods.config) [yes]?


Saved.




Start iRODS build [yes]?


53

iRODS Client Installation


Build and configure


-------------------


Preparing...


Configuring iRODS...



Step 1 of 4: Enabling modules...



properties



Step 2 of 4: Verifying configuration...



No database configured.



Step 3 of 4: Checking host system...



Host OS is Mac OS X.



Perl: /usr/bin/perl



C compiler: /usr/bin/gcc (gcc)



Flags: none



Loader: /usr/bin/gcc



Flags: none



Archiver: /usr/bin/ar



Ranlib: /usr/bin/ranlib



64-bit addressing not supported and automatically disabled.

54

iRODS Client Installation



Step 4 of 4: Updating configuration files...



Updating config.mk...



Created /iRODS/config/config.mk



Updating platform.mk...



Created /iRODS/config/platform.mk



Updating irods.config...



Updating irodsctl...



Compiling iRODS...




Step 1 of 2: Compiling library and i-commands...




Step 2 of 2: Compiling tests...



Done!

55

iRODS Client Installation


-----



To use the iRODS command-line tools, update your PATH:



For csh users:



set path=(/iRODS/clients/icommands/bin $path)



For sh or bash users:



PATH=/iRODS/clients/icommands/bin:$PATH



Please see the iRODS documentation for additional notes on how


to manage the servers and adjust the configuration.



Change the path name to your installation path

56

Environment Variables


In home directory


cd ~/.irods


vi .irodsEnv



Default values to describe settings for
interacting with your data grid


57

Environment File

# iRODS personal configuration file.
#
# This file was automatically created during iRODS installation.
# Created Fri Jan 18 10:01:48 2008
#
# iRODS server host name:
irodsHost 'iren.renci.org'
# iRODS server port number:
irodsPort 1247
# Home directory in iRODS:
irodsHome '/RENCI/home/usertutor1'
# Current directory in iRODS:
irodsCwd '/RENCI/home/usertutor1'
# Account name:
irodsUserName 'usertutor1'
# Zone:
irodsZone 'renci'
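The environment file is a simple list of name/value lines with "#" comments, so it is easy to read back and check before running iinit. The Python below is a minimal sketch under that assumption; `parse_irods_env` is a hypothetical helper, not part of the i-commands, and the sample text is a shortened copy of the file shown on this slide.

```python
def parse_irods_env(text):
    """Parse 'name value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        settings[key] = value.strip().strip("'")   # drop surrounding quotes
    return settings


sample = """\
# iRODS server host name:
irodsHost 'iren.renci.org'
# iRODS server port number:
irodsPort 1247
# Account name:
irodsUserName 'usertutor1'
# Zone:
irodsZone 'renci'
"""

env = parse_irods_env(sample)
print(env)
```

A quick check like this catches the most common editing mistake in the file, a mismatched or curly quote around a value.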

58

User Configuration


To use the iRODS 'i-commands', update your PATH:

For csh users:
  set path=(/storage-site/iRODS/clients/icommands/bin $path)

For sh or bash users:
  PATH=/storage-site/iRODS/clients/icommands/bin:$PATH

59

irodsctl - script to control iRODS

Usage is:
  ./irods/irodsctl [options] [commands]

Help options:
  --help      Show this help information

Verbosity options:
  --quiet     Suppress all messages
  --verbose   Output all messages (default)

iRODS server Commands:
  istart      Start the iRODS servers
  istop       Stop the iRODS servers
  irestart    Restart the iRODS servers

60

irodsctl options


Database commands:
  dbstart     Start the database servers
  dbstop      Stop the database servers
  dbrestart   Restart the database servers
  dbdrop      Delete the iRODS tables in the database
  dboptimize  Optimize the iRODS tables in the database
  dbvacuum    Same as 'optimize'

General Commands:
  start       Start the iRODS and database servers
  stop        Stop the iRODS and database servers
  restart     Restart the iRODS and database servers
  status      Show the status of iRODS and database servers
  test        Test the iRODS installation