ANDS Software System Document

for

Genomics Data Capture

ANDS Project Code: DC08

Document Version: 0.5

Prepared by: Jianfeng Li

University of Adelaide

<30/07/2012>


ANDS Project Code: DC08
Project Title: Genomics Data Capture
ANDS Program: Data Capture
Organisation responsible for the project (Subcontractor): University of Adelaide
Name of Contact Person: Professor Dave Adelson
Address and contact details of Contact Person: Head of School, School of Molecular and Biomedical Science, The University of Adelaide, SA 5005
Names and affiliations of collaborators if any: Nil





Table of Contents

1 Introduction
2 Installation
3 Access control
4 SRA submission
5 EBI to RDA
6 Glossary of terms





1 Introduction

Genomics Data Capture is a system that manages genomics data generated by Next Generation sequencers. It also manages information generated during subsequent research studies that use these data. It helps researchers prepare to deposit data into the Short Read Archive at the National Center for Biotechnology Information, USA, or at the European Bioinformatics Institute, UK, with as little human intervention as possible. The collection description, along with the relevant associated party and activity records, is made available for harvest by Research Data Australia.



It is a web application developed mainly in Perl. Some sub-systems are developed in PHP or Go. The database management system is PostgreSQL.


Features:

1. Sequence read files are stored in a secured area.
2. Files with the same content have only one physical copy.
3. Metadata is organised using the NCBI SRA model.
4. Access to data can be controlled by roles.
5. Metadata can also be exported in RIF-CS format.
6. Sequence reads can be submitted to SRA for sharing with minimal human intervention.

2 Installation

The whole package includes the Perl module ANDS (/lib) and resources (/root). Copy the source to a location and define the path in startup.pl so that Apache can find it. Copy the resources to the website document root, or link to them from there. After copying all files, modify config.ini.

The structure of config.ini is:

[database]
dbname = name
host = host
user = user name
passwd = password

[folders]
session = /var/lib/gdacap/sessions
template = /var/lib/gdacap/templates

[repository]
source = temporary incoming folder
target = permanent storage folder

[log]
log4perl.logger = DEBUG, LogFile
log4perl.appender.LogFile=Log::Log4perl::Appender::File
log4perl.appender.LogFile.filename=gdacap_test.log
log4perl.appender.LogFile.mode=append
log4perl.appender.LogFile.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.LogFile.layout.ConversionPattern=[%d] %l - %m%n
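After editing config.ini, it can be useful to read the settings back and check that none of them was left empty. The following is a minimal Python sketch and not part of the package: the function name missing_settings and the choice of which keys count as required are assumptions based on the structure shown above, and the [log] section is not checked.

```python
import configparser

# Sections and keys taken from the config.ini structure shown above.
# Treating all of them as required is an assumption of this sketch.
REQUIRED = {
    "database": ["dbname", "host", "user", "passwd"],
    "folders": ["session", "template"],
    "repository": ["source", "target"],
}


def missing_settings(text):
    """Return the required sections or keys that are absent or empty."""
    # Interpolation is disabled because the Log4perl pattern contains '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    missing = []
    for section, keys in REQUIRED.items():
        if not parser.has_section(section):
            missing.append(section)
            continue
        for key in keys:
            if not parser.get(section, key, fallback="").strip():
                missing.append(f"{section}.{key}")
    return missing
```

Calling missing_settings(open("config.ini").read()) returns an empty list when every listed setting has a value.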


The Perl script ebi-submitter.pl and the bash script submit2ebi are used to submit run files to EBI. Change ASPERA_SCP_PASS and ASP according to your local system and your EBI submission account.


Transferring files into the system requires a server program and a client program. The client program is transmeta and the server program is transmetaserver. They can be downloaded from http://code.google.com/p/gdacap/downloads/list or compiled locally. See http://code.google.com/p/gdacap/source/checkout?repo=transmeta for details.


To set up the RIF-CS feed, download http://code.google.com/p/oai-pmh-2/ and make local changes accordingly.


3 Access control

The system has freely accessible resources and controlled system and project resources. The user interface is prepared according to roles, which determine the resources a user can access. Access to a resource is checked according to what is being accessed and who is accessing it.


Roles of registered users fall into two categories: project and system. Project roles are Manager, Administrator, Bioinformatician, Operator and Observer. System roles are administrator and super administrator. Project roles also have associated rights. Rights are defined when a resource has operations that can affect it, e.g. editing or creation.



• No login is required for accessing read-only data services: taxonomy search, ANZSRC-FOR.
• Role is checked when restricted system resources are accessed.
• System menus (all menus at the time of writing) are accessible only to logged-in users.
• Registered users can access Tools and Create Project.
• Project resources are accessible only to users who have roles in the project.
• Controlled resources check either role only, or role and right.
• Two types of right are defined: read and write.
• Access to project resources is checked by verifying that a person's role has the correct right to a resource in a project.
  o Roles that have the write right can modify resources.
  o Roles that have the read right can access resources.
  o Roles that have no right defined cannot access resources.
  o e.g. the project management interface (roles, a sub-section of project information) is shown only to project administrators. Access to this resource is checked by role only. No right check is needed.
• System management interfaces (sub-sections for the management of Tools, Organisations and Users) are accessible only to system administrators (a type of user).
• System administrators manage user accounts and system resources, create or appoint the Manager role of projects, and manage organisations.
• Super administrators manage system administrators.
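The role and right checks described above can be sketched as follows. This is an illustration only, not the system's Perl implementation: the role names come from the text, while the resource names, the rights assigned to each role, and the rule that a write right also allows reading are assumptions.

```python
# Hypothetical rights table: (resource, role) -> right. The role names come
# from the text; the resources and the rights assigned are illustrative only.
RIGHTS = {
    ("experiment", "Manager"): "write",
    ("experiment", "Bioinformatician"): "write",
    ("experiment", "Observer"): "read",
}

# Resources that are checked by role only, with no right check.
ROLE_ONLY = {"project_management": {"Administrator"}}


def can_access(resource, role, operation):
    """Return True if `role` may perform `operation` ('read' or 'write')."""
    if role is None:
        # The user has no role in the project.
        return False
    if resource in ROLE_ONLY:
        return role in ROLE_ONLY[resource]
    right = RIGHTS.get((resource, role))
    if right is None:
        # No right defined for this role: no access.
        return False
    if operation == "read":
        # Assumption: a write right also allows reading.
        return True
    return right == "write"
```

For example, can_access("experiment", "Observer", "write") is False in this table, because Observer holds only the read right.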

4 SRA submission

Researchers can choose to submit the run files of one or all experiments in a study. The system validates a submission by checking that the user has sufficient privileges and that files are attached to each included experiment. If all checks are satisfied, a submission section appears on the experiment metadata page when it is shown. SRA objects submitted by this system are usually held private by default. These records can be released to the public when the associated papers have been published.


Clicking the Submit button shown in Figure 1 compresses (gzip) the run files in the experiment, uploads them to EBI and generates their md5 checksums. The system also submits the associated SRA objects to EBI by creating the required XML files, namely Study, Sample, Experiment, Run and Submission. After the submission has finished, the Accession field on the same page is filled with an EBI accession number such as ERX124972. Details can be found in the scripts ebi-submitter.pl and submit2ebi.
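The compression and checksum step can be illustrated with a short Python sketch. It is not taken from ebi-submitter.pl or submit2ebi: the function name is hypothetical, and it assumes the md5 checksum is computed over the gzipped file, which is the file that is uploaded.

```python
import gzip
import hashlib
import shutil


def compress_and_checksum(path):
    """Gzip `path` to `path + '.gz'` and return (gz_path, md5 hex digest)."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Assumption: the checksum covers the compressed file.
    digest = hashlib.md5()
    with open(gz_path, "rb") as handle:
        # Read in 1 MiB chunks so large run files are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return gz_path, digest.hexdigest()
```

The original file is left in place, so a failed upload can be retried without recompressing.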




Figure 1. Submission of a run to EBI

Every SRA object is uniquely identified within the submission account by its alias attribute. Once an object has been submitted, no other object of the same type can ever use the same alias within the submission account. Aliases are used in submissions to make references between different SRA objects. One object references another object's alias using the refname attribute. For example, if a sample has the alias "sample1", an experiment can reference this sample by using refname="sample1" in the current submission or in other submissions. Objects can also be referenced by accession.


The fundamental information used to construct an alias is the item id. Each object has a prefix to identify its type.
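A minimal sketch of this alias scheme is given below. The prefix strings are hypothetical, since only the existence of a type prefix is stated here; the function name is also an assumption.

```python
# Hypothetical prefixes; the text only says that each object type has one.
PREFIXES = {
    "study": "study",
    "sample": "sample",
    "experiment": "experiment",
    "run": "run",
}


def make_alias(object_type, item_id):
    """Build an alias from the type prefix and the item id."""
    return f"{PREFIXES[object_type]}{item_id}"
```

With these prefixes, the sample with item id 1 gets the alias "sample1", which an experiment can then use as its refname.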


SRA defines the following <ACTION> elements for a submission:

• ADD: Add a study, experiment, sample or run object to the archive.
• HOLD: The object will be kept private and made public only when the hold date expires.
  Accession or refname: the target of the action. If a target is not specified, the hold is taken to include all objects referenced by this submission. If a modifier is not specified, the hold is taken to mean hold until released by the submitter.
  HoldUntilDate: Directs the SRA to release the record on or after the specified date.
• RELEASE: The object will be released to the public immediately.
• VALIDATE: Validates the object without submitting it.
• MODIFY: Modify the SRA metadata of a study, experiment, sample or run object that was submitted to the archive under this submission session. Note that the run data itself cannot be amended. To do that, withdraw the existing Run for the specified Experiment and add a new Run in its place. It has a <target> tag with either a refname or an accession attribute.
• CANCEL: Cancel an object which has not been made public. Cancelled objects will not be made public. Tag and attributes are the same as above.
• SUPPRESS: Suppress an object which has been made public. Suppressed data will remain accessible by accession number only.


If any objects have been submitted before, they should not be included.
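As an illustration of how ADD and HOLD actions are combined, the following Python sketch builds a submission document. It is not the code in ebi-submitter.pl. The function name and argument layout are hypothetical, and the element and attribute names (SUBMISSION, ACTIONS, ACTION, source, schema, HoldUntilDate) follow the public SRA submission schema as commonly documented, so they should be checked against the current schema before use.

```python
import xml.etree.ElementTree as ET


def build_submission(alias, sources, hold_until=None):
    """Return submission XML that ADDs each (schema, file name) pair.

    A HOLD action is always included, so the objects stay private; when
    `hold_until` is None no date is set on it.
    """
    submission = ET.Element("SUBMISSION", alias=alias)
    actions = ET.SubElement(submission, "ACTIONS")
    for schema, source in sources:
        action = ET.SubElement(actions, "ACTION")
        ET.SubElement(action, "ADD", source=source, schema=schema)
    hold = ET.SubElement(ET.SubElement(actions, "ACTION"), "HOLD")
    if hold_until is not None:
        hold.set("HoldUntilDate", hold_until)
    return ET.tostring(submission, encoding="unicode")
```

Objects that were submitted before are simply left out of the sources list and referenced by refname or accession from the new objects.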


A submission only concerns SRA Runs. To fully describe a Run, the metadata of its Study, Sample and Experiment is needed. This metadata has to be submitted with the Run, or have been submitted previously and be referenced in the current submission. Runs can be submitted in either a Study or an Experiment container. In either case, all Runs in that container will be submitted. In effect, an Experiment submission is equivalent to an individual Run submission.


The md5 checksums and gz file names are used to generate run.xml, which is submitted to EBI through REST.
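The generation of run.xml from gz file names and md5 checksums might look like the Python sketch below. It is an assumption-laden illustration rather than the system's own code: the element names follow the public SRA run schema as commonly documented, the filetype value "fastq" is assumed, and the REST call itself is not shown.

```python
import xml.etree.ElementTree as ET


def build_run_xml(run_alias, experiment_alias, files):
    """Return run XML; `files` is a list of (gz file name, md5 hex) pairs."""
    run_set = ET.Element("RUN_SET")
    run = ET.SubElement(run_set, "RUN", alias=run_alias)
    # The run points at its experiment by alias, using refname.
    ET.SubElement(run, "EXPERIMENT_REF", refname=experiment_alias)
    file_list = ET.SubElement(ET.SubElement(run, "DATA_BLOCK"), "FILES")
    for name, md5_hex in files:
        ET.SubElement(
            file_list,
            "FILE",
            filename=name,
            filetype="fastq",  # assumed file type
            checksum_method="MD5",
            checksum=md5_hex,
        )
    return ET.tostring(run_set, encoding="unicode")
```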


Each contact with EBI is a Submission, no matter whether the action is ADD, RELEASE or MODIFY. These contacts are tracked by the tables submission and submission_state. The submission table records what is concerned: the submission type (study/experiment) defines which container the run files belong to. The submission_state table tracks the actions and states of an EBI contact concerning the run files of an experiment or a study. It records the action (add/release), the accession, the date it happened, whether it was successful, and the message.


During preparation and initial submission there may be a series of actions, but they are not tracked: 0: create, 1: copy, 2: xml generation, 3: submit.

• If the submission was not successful, the message can be retrieved for analysis;
• If successful, accessions are assigned to the SRA objects in the submission container that do not have them yet.

These have to be done by system administrators.


A previously held submission (run/experiment) can be released before the holding time expires by simply creating a submission with a release date of the current date and the existing accessions of the SRA objects. The submission table needs to be updated when there is a new release_date.


Usually a user can only submit run files once a study has reached the Publishable stage, but they can be manually released to EBI when the study has the required information.

5 EBI to RDA

Metadata coming from this system mainly describes datasets and associated activities. RDA Party is not the primary data object, and the system has limited resources for managing such records. RDA Party records should preferably be created by other systems, even though they can be created by this system from Person records. If a Party record has been created or is maintained outside, a record in the table rda_person is used to map the RDA Party record to a Person with the Manager role in this system. When a project Manager does not have a record in the rda_person table, an RDA Party is created or updated when ANDS Services harvests data from this feeder. If these records are later maintained outside, a mapping record in rda_person is required to stop them being updated from this system.



The publication to RDA is a mapping from the metadata describing an EBI submission. It happens only when:

1. An experiment has been submitted to EBI, i.e. its accession field has been set;
2. The release date has passed the hold date.

This happens automatically and does not need any human intervention.
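The two conditions can be expressed as a small predicate. The feeder itself is PHP code; the Python sketch below only illustrates the logic. The function and parameter names are hypothetical, and it assumes that "passed" means the hold date is on or before the current date and that a missing hold date means the record is still held.

```python
from datetime import date


def is_publishable(accession, hold_date, today=None):
    """True when the experiment has an accession and its hold date has passed."""
    today = today or date.today()
    # Assumption: no hold date means held until released by the submitter.
    return bool(accession) and hold_date is not None and hold_date <= today
```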


When the RDA harvester harvests from the feeder, if an experiment satisfies the above conditions, the feeder (PHP code) maps a local Person record with the Manager role in a project to an RDA Person, an Experiment (run) to a Collection, and a Study to an Activity. An EBI Study has links to publications and other resources.



<activity type="project">
  <name type="primary">
    <namePart>Deep sequencing analysis of the developing mouse brain reveals a novel microRNA</namePart>
  </name>
  <relatedObject>
    <key>personrole.Manager</key>
    <relation type="isManagedBy"/>
  </relatedObject>
  <relatedObject>
    <key>58d99a4a-bc3e-4513-8b9c-d1e40d8de1fa</key>
    <relation type="hasOutput"/>
  </relatedObject>
  <relatedObject>
    <key>39a97633-ee80-4320-a861-95a8ba07e1c7</key>
    <relation type="hasOutput"/>
  </relatedObject>
  <subject type="anzsrc-for">111203</subject> <!-- from project -->
  <description type="brief">MicroRNAs (miRNAs) are small non-coding RNAs that can exert multilevel inhibition/repression at a post-transcriptional or protein synthesis level during disease or development. Characterisation of miRNAs in adult mammalian brains by deep sequencing has been reported previously. However, to date, no small RNA profiling of the developing brain has been undertaken using this method. In this study, deep sequencing and small RNA analysis of a developing (E15.5) mouse brain was performed.

This work was supported by National Health and Medical Research Council fellowships (171601 and 461204); National Health and Medical Research Council Grants 219176, 257501 and 257529.</description> <!-- abstract -->
</activity>


The data source account has Reverse link option 1 turned on to allow linkage from collections and activities to party records. If any party record was created outside of this source, both options have to be turned on.


For a person:

1. Find the Manager in the personrole table by study_id:

SELECT pr.person_id FROM study st, role_type rt, personrole pr WHERE st.id = ? AND pr.project_id = st.project_id AND pr.role_type_id = rt.id AND rt.iname = 'Manager';

2. Is the person in rda_person? If yes, return the key; if no, query oai_headers:

SELECT key FROM rda_person WHERE person_id = ?;

SELECT oai_identifier FROM oai_headers WHERE ori_id = ? AND ori_table_name = 'person';
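The lookup order of these queries can be sketched in Python. This is an illustration only: the production database is PostgreSQL and the feeder is PHP, whereas the sketch uses an in-memory SQLite database. The function names, the demo rows and the reduced table definitions (only the columns that the queries touch) are assumptions.

```python
import sqlite3


def party_key(conn, study_id):
    """Return the RDA Party key for the Manager of the study's project."""
    row = conn.execute(
        "SELECT pr.person_id FROM study st, role_type rt, personrole pr "
        "WHERE st.id = ? AND pr.project_id = st.project_id "
        "AND pr.role_type_id = rt.id AND rt.iname = 'Manager'",
        (study_id,),
    ).fetchone()
    if row is None:
        return None
    person_id = row[0]
    # A mapping in rda_person means the Party record is maintained elsewhere.
    row = conn.execute(
        'SELECT "key" FROM rda_person WHERE person_id = ?', (person_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Otherwise fall back to the identifier held in oai_headers.
    row = conn.execute(
        "SELECT oai_identifier FROM oai_headers "
        "WHERE ori_id = ? AND ori_table_name = 'person'",
        (person_id,),
    ).fetchone()
    return row[0] if row else None


def demo_connection():
    """In-memory database with hypothetical rows for trying party_key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE study (id INTEGER, project_id INTEGER);
        CREATE TABLE role_type (id INTEGER, iname TEXT);
        CREATE TABLE personrole (person_id INTEGER, project_id INTEGER,
                                 role_type_id INTEGER);
        CREATE TABLE rda_person (person_id INTEGER, "key" TEXT);
        CREATE TABLE oai_headers (ori_id INTEGER, ori_table_name TEXT,
                                  oai_identifier TEXT);
        INSERT INTO study VALUES (1, 10);
        INSERT INTO study VALUES (2, 20);
        INSERT INTO role_type VALUES (1, 'Manager');
        INSERT INTO personrole VALUES (100, 10, 1);
        INSERT INTO personrole VALUES (200, 20, 1);
        INSERT INTO rda_person VALUES (100, 'external-party-key');
        INSERT INTO oai_headers VALUES (200, 'person', 'local-oai-id');
        """
    )
    return conn
```

In the demo data, the Manager of study 1 has a mapping in rda_person, while the Manager of study 2 is resolved through oai_headers.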

6 Glossary of terms

Term    Definition
EBI     European Bioinformatics Institute
SRA     Short Read Archive
RDA     Research Data Australia