ArrayExpress

A public database for microarray-based gene expression data

http://www.ebi.ac.uk/microarray/

European Bioinformatics Institute
EMBL-EBI

Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team

MGED IV, Boston, February 2002



ArrayExpress

Standards: MIAME-compliant
Data model: MAGE-OM
Data input: MAGE-ML, web
Data output: HTML, MAGE-ML, TAB-delimited, link to Expression Profiler
Data curation: Team of curators
Data sets: Yeast, human

Opened to public: Tuesday, February 12th, 2002


General overview

[Diagram; components: ArrayExpress, MIAMExpress, Expression Profiler; labels: MAGE-ML, Internet, www]



ArrayExpress component architecture

Main database: SQL derived from MAGE-OM
Data warehouse: gene-centred queries
Application server: Java servlets, MAGE-OM
Images: file server

[Other diagram labels: ArrayExpress, MAGE-ML, Submission/curation, Internet, www]



ArrayExpress: features

MIAME-compliant, MAGE-ML, MAGE-OM

Can deal with:
raw quantitation data
processed data
data transformations

Independent of:
experimental platforms
image analysis methods
data normalization methods



ArrayExpress: details

Database schema derived from MAGE-OM
Standard SQL, we use Oracle
Data loader for MAGE-ML (generated)
Web interface (first release 12.2.2002)
Queries by experiment, array, sample
Browsing
Object model-based query mechanism, automatic mapping to SQL
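
To make the idea of an object model-based query with automatic mapping to SQL concrete, here is a minimal sketch. It is not the ArrayExpress implementation (the slides mention Oracle and Java servlets); Python with the standard `sqlite3` module stands in, and the `Experiment` class, its attributes, the table name, and the helper functions are all hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Experiment:
    # Hypothetical, heavily simplified object; the real model is MAGE-OM.
    accession: str
    name: str
    species: str


def select_sql(cls, **criteria):
    """Build a parameterised SELECT for a dataclass from attribute=value criteria."""
    columns = [f.name for f in fields(cls)]
    unknown = set(criteria) - set(columns)
    if unknown:
        raise AttributeError(f"{cls.__name__} has no attribute(s) {sorted(unknown)}")
    # Assumption: one table per class, named after the class in lower case.
    sql = f"SELECT {', '.join(columns)} FROM {cls.__name__.lower()}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in criteria)
    return sql, tuple(criteria.values())


def query(conn, cls, **criteria):
    """Run the generated SQL and turn each row back into an object."""
    sql, params = select_sql(cls, **criteria)
    return [cls(*row) for row in conn.execute(sql, params)]
```

A caller would write `query(conn, Experiment, species="yeast")` and never see the SQL. Values are passed as bound parameters rather than pasted into the statement.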



Simplified ArrayExpress model



MIAMExpress

Data annotation and submission tool
MIAME-based web interface
Experiment, Array, Protocol submissions
Uses CV/ontology wherever possible
Creates MAGE-ML files for loading into ArrayExpress
Based on MySQL, Perl, CGI, Apache






MIAMExpress submission procedure

[Flowchart; labels in approximate reading order:]
Create account / Login
Pending/New Experiment
Samples 1…n (Sample protocol)
Extracts 1…n per sample: E1, E2, … En (Extraction protocol)
Hybridisations (Hyb protocol)
Arrays 1…n (Scanning protocol)
Data 1…n (Image analysis protocol)
Combined Experiment Data (Transformation protocol)
Final free text comment
Submit
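
The submission procedure can be read as a nested data structure: an experiment has samples, each sample has extracts, and extracts are hybridised to arrays that yield data files, with a protocol attached to each step. The sketch below models that reading. All class names, field names, and the rule that every protocol step must be present are assumptions for illustration, not the MIAMExpress schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Protocol steps named in the flowchart; treating all of them as
# mandatory is an assumption of this sketch.
PROTOCOL_STEPS = ["sample", "extraction", "hybridisation",
                  "scanning", "image analysis", "transformation"]


@dataclass
class Extract:
    name: str


@dataclass
class Sample:
    name: str
    extracts: List[Extract] = field(default_factory=list)


@dataclass
class Hybridisation:
    extract: Extract
    array: str       # array name or accession
    data_file: str   # quantitation data produced after scanning


@dataclass
class Submission:
    experiment: str
    samples: List[Sample] = field(default_factory=list)
    hybridisations: List[Hybridisation] = field(default_factory=list)
    protocols: Dict[str, str] = field(default_factory=dict)
    comment: str = ""  # final free text comment


def missing_parts(sub: Submission) -> List[str]:
    """List what is still missing before the submission could be sent."""
    problems = []
    if not sub.samples:
        problems.append("no samples")
    for sample in sub.samples:
        if not sample.extracts:
            problems.append(f"sample {sample.name} has no extracts")
    if not sub.hybridisations:
        problems.append("no hybridisations")
    for step in PROTOCOL_STEPS:
        if step not in sub.protocols:
            problems.append(f"missing {step} protocol")
    return problems
```

A check like `missing_parts` fits the "Pending/New Experiment" step: a submission can stay pending until the list comes back empty.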



MIAMExpress design and future

Species and domain specific pages and ontologies, ontology development
Life-span of data submissions is long
Curation control, submissions tracking
Interaction with ArrayExpress
Full MAGE-OM, data updating
Usability, flexibility, scalability, platform independence
User needs, free in-house installation



ArrayExpress curation effort

User support and help documentation
Submission support for MIAMExpress
Support on ontologies and CVs
Minimize free text, removal of synonyms
MIAME encouragement
Help on MAGE-ML

Goal: to provide high-quality, well-annotated data to allow automated data analysis




Accession numbers

E-MEXP-234: Experiment 234 via MIAMExpress
E-SANG-25: Experiment 25 from Sanger Institute
A-AFFY-1034: Array description 1034 from Affymetrix
P-LABL-5: Protocol 5 for labeling
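
The accession examples follow a visible pattern: a letter for the kind of object (E, A, P), a four-letter source code, and a number. A minimal parser under that assumption could look like the sketch below. The regular expression only generalises from the four examples shown and is not an official grammar; `Accession` and `parse_accession` are illustrative names.

```python
import re
from typing import NamedTuple

# Kind letters as used in the examples on the slide.
KINDS = {"E": "experiment", "A": "array description", "P": "protocol"}

# Assumed layout: <kind letter>-<four-letter source code>-<number>
ACCESSION_RE = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")


class Accession(NamedTuple):
    kind: str
    source: str
    number: int


def parse_accession(text: str) -> Accession:
    """Split an accession such as 'E-MEXP-234' into kind, source and number."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    letter, source, number = match.groups()
    return Accession(KINDS[letter], source, int(number))
```

For example, `parse_accession("A-AFFY-1034")` gives kind `"array description"`, source `"AFFY"`, number `1034`.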



Data in ArrayExpress

Now:
Human data (ironchip) from EMBL
Yeast data from EMBL
S. pombe data, Sanger Institute
TIGR array descriptions
Affymetrix chip designs

Work underway:
Direct pipeline from Sanger (Rob Andrews)
HGMP mouse
EMBL mosquito
(Add your name here!)



Data browsing and queries





Experiment info



Sample info




Expression Profiler: EPCLUST

[Diagram labels: DATA, SELECT, FOLDER, ANALYZE, A "CLUSTER", URLMAP, GeneOntology, Pathways, Databases, SPEXS, Other tools]



>YAL036C chromo=1 coord=(76154-75048(C)) start=-600 end=+2 seq=(76152-76754)

TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTG
CTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTT
CTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTT
CACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTT
TTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTG
TTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_

>YAL025C chromo=1 coord=(101147-100230(C)) start=-600 end=+2 seq=(101145-101747)

CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACC
ACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTT
GTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTAT
AATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACC
TTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTG
ACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_

...

>YBR084W chromo=2 coord=(411012-413936) start=-600 end=+2 seq=(410412-411014)

CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCAT
TACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACG
TATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTT
CTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGG
ACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTAC
TGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_

101 Sequences relative to ORF start
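
Each sequence header above carries an ORF identifier followed by `key=value` fields (`chromo`, `coord`, `start`, `end`, `seq`). A small sketch for pulling those fields into a dictionary is given below. It assumes the header sits on one line and that values contain no whitespace. `parse_header` is an illustrative name, not part of Expression Profiler.

```python
import re
from typing import Dict

# '>' + identifier, then any number of whitespace-separated key=value fields.
HEADER_RE = re.compile(r"^>(\S+)((?:\s+\w+=\S+)*)\s*$")


def parse_header(line: str) -> Dict[str, str]:
    """Return the ORF id and the key=value fields of one FASTA header line."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"unexpected header format: {line!r}")
    record = {"id": match.group(1)}
    for item in match.group(2).split():
        key, _, value = item.partition("=")
        record[key] = value  # values are kept as strings, e.g. '-600'
    return record
```

Values are left as strings so that nothing is assumed about how coordinates such as `(76154-75048(C))` should be interpreted.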

GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33
G.GATGAG.T 1:39/49 2:193/222 R:13.244 BP:2.49026e-33
AAAATTTT 1:63/77 2:833/911 R:4.95687 BP:5.02807e-32
TGAAAA.TTT 1:45/53 2:333/350 R:8.85687 BP:1.69905e-31
TG.AAA.TTT 1:53/61 2:538/570 R:6.45662 BP:3.24836e-31
TG.AAA.TTTT 1:40/43 2:254/260 R:10.3214 BP:3.84624e-30
TGAAA..TTT 1:54/65 2:608/645 R:5.82106 BP:1.0887e-29

...

[Figure: positions of the patterns GATGAG.T and TGAAA..TTT in the upstream sequences (600bp) of YGR128C + 100; legend: GATGAG.T W/30, TGAAA..TTT 1 mismatch]





Components of Expression Profiler

http://ep.ebi.ac.uk/

EPCLUST - Expression data
GENOMES - sequence, function, annotation
SPEXS - discover patterns
URLMAP - provide links
PATMATCH - visualise patterns
EP:GO - GeneOntology
EP:PPI - Prot-Prot ia.
SEQLOGO

[Other diagram labels: Expression data; External data, tools (pathways, function, etc.)]



Acknowledgments: the team (3)

Alvis Brazma

Alan Robinson

Jaak Vilo

1999 November

MGED 1 in Hinxton, EBI




Acknowledgments: the team (5)

Alvis Brazma, Alan Robinson

Database

Ugis Sarkans

Expression Profiler

Jaak Vilo

Research, students

Thomas Schlitt

2000 August



Acknowledgments: the team (9)

Alvis Brazma

Database

Curation

MIAMExpress

Ugis Sarkans

Helen Parkinson

Mohammadreza Shojatalab

Expression Profiler

Jaak Vilo

Research, students

Thomas Schlitt

Katja Kivinen

Johan Rung

Patrick Kemmeren

2001 June



Acknowledgments: the team (19)

Alvis Brazma

Database

Curation

MIAMExpress

Ugis Sarkans

Gonzalo Garcia

Helen Parkinson

Mohammadreza Shojatalab

Expression Profiler

Jaak Vilo

Research, students

Thomas Schlitt

Katja Kivinen

Johan Rung

Patrick Kemmeren

Misha Kapushesky

Lev Soinov

Koichi Tazaki

Anastasia Samsonova

Susanna Sansone

Philippe Rocca-Serra

Ele Holloway

Niran Abeygunawardena

Ahmet Oezcimen

2002 February