QUALITY OF LIFE AND MANAGEMENT OF LIVING RESOURCES PROGRAMME (1998-2002)





MID TERM REPORT




Contract number: QLRI-CT-2001-00015

Project acronym: TEMBLOR (DESPRAD subproject)

QoL action line: Research infrastructures area 14

Reporting period: 01/01/02 - 31/07/2003




SECTION I: PROJECT IDENTIFICATION


Contract number: QLRT-2001-00015

Title of the project:

The European Molecular Biology Linked Original Resources

Microarray section

Acronym of the project: TEMBLOR (DESPRAD subproject)

Type of contract: RTD project

QoL action line: Research Infrastructures Area 14

Commencement date: 01/01/2002

Duration: 36 months

Total project costs: 21,787,303 (in euro)

EU contribution: 19,400,912 (in euro)

Project co-ordinator:

Name (including title): Mr Graham Cameron

Organisation: The European Bioinformatics Institute

Postal address: EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK

Telephone: +44 (0) 1223 494467

Telefax: +44 (0) 1223 494470

e-mail: Cameron@ebi.ac.uk

Keywords: microarray, database, standards, analysis, ontology

World wide web address: http://www.ebi.ac.uk/microarray/

List of participants:


Partner 3: Dr. Gos Micklem

University of Cambridge, Department of Genetics

Downing Street, CB2 3EH, Cambridge, UK

Phone: +44 (0) 1223 765 281

Fax: +44 (0) 1223 333 992

Email: gos.micklem@gen.cam.ac.uk


Partner 4: Dr. Alfonso Valencia

Protein Design Group

CNB-CSIC

Cantoblanco Madrid 28049 Spain

phone: (+34-91) 5854570 fax: (+34-91) 5854506

email: valencia@cnb.uam.es


Partner 9: Dr. Bernd Drescher

(Dr. Steffen Schulze-Kremer until February 2003)

RZPD Deutsches Ressourcenzentrum für Genomforschung GmbH

Heubnerweg 6

D-14059 Berlin

Germany

phone: (+49-30) 32639200 fax: (+49-30) 32639111

Email: steffen@rzpd.de







Partner 17: Dr. Pascal Hingamp

INSERM U136, CNRS UMR6102

Techniques Avancées pour le Génome et la Clinique

Centre d'Immunologie de Marseille Luminy

Parc Scientifique de Luminy, Case 906

13288 Marseille Cedex 09, FRANCE

email: hingamp@ciml.univ-mrs.fr

Phone: (+33)0491269496 Fax: (+33)0491269430


Partner 18: Dr. Frank C.P. Holstege

Genomics Lab, UMC Utrecht

Department for Physiological Chemistry

HP STR 3.223

Postbus 80042

3508 TA Utrecht

The Netherlands

Phone: (+31-30) 2538186

Fax: (+31-30) 2539035

Email: f.c.p.holstege@med.uu.nl


Partner 19: Dr Wilhelm Ansorge

EMBL

Meyerhofstrasse 1

Postfach 10.2209

69012 Heidelberg

Germany

Phone +49 (0) 6221 387355

Email: ansorge@embl-heidelberg.de


Partner 20: Dr. Inge Jonassen

Dept. of Informatics, University of Bergen,

HIB,

N5020 BERGEN, NORWAY

email: inge.jonassen@ii.uib.no

Phone: (+47) 55584199 Fax: (+47) 55584199










SECTION II: PROJECT PROGRESS REPORT



Table of Contents

1. OVERVIEW OF PROGRESS DURING THE REPORTING PERIOD

2. STATUS OF THE INDIVIDUAL WORK PACKAGES

3. CONTRIBUTION OF THE PARTICIPANTS

4. PROJECT MANAGEMENT AND CO-ORDINATION

5. EXPLOITATION AND DISSEMINATION ACTIVITIES

6. ETHICAL ASPECTS AND SAFETY PROVISIONS

7. MID-TERM REVIEW

8. PLANS FOR THE NEXT REPORTING PERIOD

9. REQUESTS TO THE COMMISSION










1. OVERVIEW OF PROGRESS DURING THE REPORTING PERIOD


The DESPRAD section of TEMBLOR is about microarray data. The main parts of this
subproject are 1) definitions, standards and ontologies for describing and exchanging
such data, 2) a public database, ArrayExpress, for microarray data, and tools for
querying, curating and submitting data, and 3) algorithms and software tools for analysing
microarray data. The main objectives for this reporting period have been to start the work
on the database and on the definitions, standards and ontologies, while most work on
analysis tools and algorithms will take place in the later part of the project.

During the first 18 months, microarray database standards have been defined, with only
minor follow-ups still ongoing, and a prototype repository accepting data in the standard
format has been developed and implemented. It is now online and populated with the initial
datasets.

In the first year of the project, much work has been devoted to developing the
definitions, standards and ontologies for microarray data, since the establishment of a
database and analysis tools requires that data can be managed in a clear, well-defined
and unambiguous way. Much of this work has been done through the establishment of
the Microarray Gene Expression Data society (MGED), founded by the EBI, in which
the DESPRAD partners play an active and important role. Achievements include work
on the definition of the Minimum Information About a Microarray Experiment
(MIAME), a data object model that describes microarray data (MAGE-OM) and an
XML-based data exchange language for microarrays (MAGE-ML, called MAML at the time
of the proposal). This work has been widely supported by the microarray community, and
some scientific journals already require that microarray data in submitted
manuscripts comply with these standards.

The database ArrayExpress is online and accepting submissions, and the web-based
submission tool MIAMExpress and a suite of web-based data analysis tools, Expression
Profiler, are linked to the database. As of the end of August 2003, the database holds data from
50 experiments (studies), totalling over 1000 hybridisations. The data come from the
DESPRAD/TEMBLOR participants as well as from other laboratories in Europe and
North America.

The dissemination work includes numerous publications in scientific journals,
attendance and presentations at many important international and national conferences
and workshops, and the organization of courses and conferences.

The progress so far has been as planned and no significant difficulties or delays have
been encountered. The work on MIAME, the MAGE standards and the ArrayExpress
database is slightly ahead of schedule.


Table 1. Workpackage list

WP | Description | Partners | PM | Start | End | Deliv. | Status
WP8.1 | Formulate the minimum information about a microarray experiment (MIAME) | 1, 4, 17, 18 | 21 | 0 | 24 | Dds6 | Ahead of time
WP8.2 | Definition of the data exchange format for microarray data (MAML) | 1, 3, 17, 18, 19, 20 | 18 | 2 | 24 | Dds7, Dds8, Dds9, Dds10 | Ahead of time
WP8.3 | Participation in MGED and development of ontologies | 1, 3, 4, 9, 17, 18, 19 | 51 | 0 | 36 | Dds11, Dds12 | On time
WP8.4 | Development of the data model and schema for database ArrayExpress | 1, 3, 9, 17, 18, 19 | 34 | 4 | 24 | Dds13 | On time
WP8.5 | Database implementation | 1, 3, 17 | 50 | 6 | 24 | Dds14, Dds15, Dds16, Dds17 | Ahead of time
WP8.6 | Development of data annotation and curation tools | 1, 3, 4, 17, 18, 19 | 53 | 6 | 30 | Dds18, Dds19 | On time
WP8.7 | Population of the ArrayExpress with data from the partners and collaborators. | 1, 3, 4, 9, 17-20 | 102 | 12 | 36 | Dds20 | On time (starting yr 2)
WP8.8 | Integration of ArrayExpress with other EBI databases and other Internet resources | 1 | 30 | 12 | 36 | Dds21 | On time (starting yr 2)
WP8.9 | Development and implementation of tools for accessing ArrayExpress | 1, 3, 17, 18, 20 | 42 | 2 | 24 | Dds22, Dds23, Dds24 | On time
WP8.10 | Development of methods for exploiting gene expression data. | 1, 3, 4, 9, 17-20 | 75 | 0 | 30 | Dds25, Dds26 | On time
WP8.11 | Integration of data analysis tools with ArrayExpress database and interfaces | 1, 19, 20 | 30 | 12 | 36 | Dds27, Dds28 | On time (starting yr 2)
WP8.12 | Demonstration of analysis pipelines integrating database and analysis tools | 1, 3, 4, 9, 17-20 | 49 | 12 | 36 | Dds34, Dds35, Dds36, Dds31 | On time (starting yr 2)
WP8.13 | Disseminate the results and stimulate the uptake of Desprad work | 1, 4, 17, 19, 20 | 15 | 2 | 36 | Dds31, Dds32, Dds29, Dds34, Dds37, Dds38 | On time

Table 2. Milestone status

Milestone | Description | Status
Mds4 | MAML first release of DTD, primer and parser | Achieved
Mds5 | MAML second release of DTD, primer and parser | Achieved
Mds6 | MAML final release and final submission to OMG | Submission to OMG done, finishing work in progress
Mds7 | microarray experiment ontology description and editor version 0 | Achieved (editor abandoned)
Mds8 | microarray experiment ontology description and editor version 1 | Year 2
Mds9 | microarray experiment ontology description and editor version 2 | Year 2
Mds10 | microarray experiment ontology description and editor version 3 | Year 3
Mds11 | First version and documentation of MIAME compliant model | Achieved
Mds12 | Revised version and documentation of MIAME compliant model | Achieved
Mds13 | Final version and documentation of MIAME compliant model | Year 2
Mds14 | ArrayExpress architecture schema and first version of the E/R model | Achieved
Mds15 | ArrayExpress revised version of E/R model, database generation scripts and data loader | Achieved
Mds16 | ArrayExpress final version of E/R model, database generation scripts and data loader | Year 2
Mds17 | First version of MAML compliant annotation tool | Achieved
Mds18 | Second version of MAML compliant annotation tool integrated to LIMS, curation tool | Year 2
Mds19 | Final version of the MAML compliant annotation tools | Year 3
Mds20 | The first ArrayExpress release populated with a few experiments (up to 100 hybridisations) | Achieved
Mds21 | Second ArrayExpress release | Year 2
Mds22 | Third ArrayExpress release | Year 3
Mds23 | Mapping of ArrayExpress to relevant internet resources within and outside the EBI through URL map | Year 2
Mds24 | ArrayExpress integration with EMBL-bank and SWISS-PROT through indexing and SRS | Year 3
Mds25 | ArrayExpress integration with other relevant databases at the EBI using Oracle | Year 3
Mds27 | ArrayExpress queries list | Achieved
Mds28 | ArrayExpress query module and documentation 1 and updated queries list | Achieved
Mds29 | ArrayExpress query module and documentation 2 | Year 2-3
Mds30 | New clustering algorithms and software for microarray data | Year 2-3
Mds31 | New algorithms for reconstructing Bayesian regulatory networks for microarray data | Year 2-3
Mds32 | New algorithms for the analysis of DNA signals relevant to the transcription machinery | Year 2-3
Mds33 | Definitions of MAML compliant interfaces | Year 2
Mds34 | Implementations of MAML compliant interfaces necessary for the analysis tools | Year 2
Mds35 | Integration of data exchange interfaces to the query and retrieval components of the ArrayExpress database | Year 3
Mds36 | Publications on microarray informatics | Achieved and ongoing
Mds38 | Microarray informatics research reports produced every six months. | Achieved and ongoing
Mds39 | Microarray informatics international conference. | Achieved


Table 3. Deliverables

Researchers involved in genomics, proteomics, and molecular biology in general are
dissemination targets for all deliverables. For all deliverables completed or in
progress, full information is in section II.2 with references to the appendices
containing documentation on status.


Deli. | Description | Month due | Nature | Dissem. level | Dissem. target | Status
Dds6 | Definition of the 'Minimum Information about Microarray Experiment' (MIAME) | [illegible] | R | [illegible] | | Completed
Dds7 | A DTD specification for "microarray mark-up language" (MAML) | [illegible] | R | [illegible] | | In progress
Dds8 | MAML user guide and example documents | 24 | R | PU | | In progress
Dds9 | A parser for MAML documents | 24 | P | PU | | In progress
Dds10 | Further MAML submission to OMG | 24 | R | RE | | In progress
Dds11 | Domain ontologies on sources of samples used in microarray experiments | 36 | O | PU | | In progress
Dds12 | Ontology editor | 18 | P | PU | | Abandoned
Dds13 | Object model in UML, Rational Rose model and documentation | 18 | O | PU | | Completed
Dds14 | Database architecture schema | 24 | O | PU | Primarily for the database developers | In progress
Dds15 | Database E/R model | 24 | O | PU | Primarily for the database developers | In progress
Dds16 | Database schema generation scripts | 24 | P | PU | Primarily for the database developers | In progress
Dds17 | Data loader from MAML documents | 24 | P | PU | Primarily for the data curators | In progress
Dds18 | Data annotation tools | 30 | P | PU | Primarily for the data curators | In progress
Dds19 | Data curation tools | 30 | P | PU | Primarily for the data curators | In progress
Dds20 | Populated database ArrayExpress | 36 | P | PU | | In progress
Dds21 | ArrayExpress linked to relevant databases | 36 | P | PU | | In progress
Dds22 | The database query list | 6 | R | PU | | Completed
Dds23 | Implementation of query module | 24 | P | PU | | In progress
Dds24 | Query language documentation | 18 | R | PU | | Completed
Dds25 | New methods and algorithms | 24 | R | PU | | In progress
Dds26 | Data analysis software | 30 | P | RE | | In progress
Dds27 | Web-interfaces for data analysis tools | 36 | P | RE | | In progress
Dds28 | MAML compliant interfaces for tools | 36 | P | PU | | In progress
Dds29 | Project Web-site | 3 | P | PU | | Completed
Dds31 | Research reports every 6 months | 6, 12, 18, 24 | R | PU | | Completed yr 1, yr 2 in progress
Dds32 | Organisation of meetings and workshops | 12, 24 | O | PU | | Completed yr 1, yr 2 in progress
Dds34 | Publications in scientific journals, conferences and on web about data model. | regular | R | PU | | Completed yr 1, yr 2 in progress
Dds35 | Presentation at scientific conferences | regular | O | PU | | Completed yr 1, yr 2 in progress
Dds36 | Lectures, courses | 12, 24, 36 | O | PU | | Completed yr 1, yr 2 in progress
Dds37 | Visits to the sites of microarray facilities (both industrial and academic) by project personnel | Upon request | O | RE | | Completed yr 1, yr 2 in progress
Dds38 | Publication of press releases describing progress and availability of new tools | 6, 12, 18, 24, 30, 36 | R | PU | | In progress





2. STATUS OF THE INDIVIDUAL WORK PACKAGES

Documentation for all deliverables and work accomplished can be found in the
appendices and is referenced in the text below.

WP8.1

Objectives:

To formulate the minimum information about a microarray experiment
(MIAME) that needs to be revealed in order to ensure the interpretability of the
experimental results, as well as their potential verification by third parties.


Deliverables:

Dds6

MIAME document


Documentation:

A1-1: MIAME checklist

A1-2: MIAME definitions, mappings to MAGE-OM and relationship with MGED ontology

A1-3: “Microarray standards at last”: editorial from Nature (419), 2002, p323

A1-4: “A guide to microarray experiments - an open letter to the scientific journals”, The Lancet (360), 2002, p1019.

A1-5: “Standards for microarray data”: Science (298), 2002, p539

A1-6: “Minimum information about a microarray experiment (MIAME) - toward standards for microarray data”: Nature Genetics (29), 2001, p365-371

A1-7: “Nature, Cell, Lancet will require researchers to comply with microarray standard”: Genome Biology, Oct 10 2002

A1-8: “MIAME begets MAGE”: The Scientist, Sept 17 2002.

A1-9: MIAME 1.1 document

A1-10: draft MIAME extension for Chip-on-chip experiments

A1-11: draft MIAME extension for toxicology

A1-12: draft MIAME extension for random arrays

A1-10 to A1-12 are also available from www.mged.org/miame


Status, work accomplished and resources used:

P1: (9 personmonths)

We initiated and led the development of the microarray experiment annotation
standard known as the Minimum Information About a Microarray Experiment
(MIAME) in 2001, before the TEMBLOR funding was secured. The MIAME proof of
concept was published in December 2001 (Brazma et al, Nature Genetics (2001), 29,
365-371, [A1-6]). The TEMBLOR funding allowed us to continue the further
development of MIAME in collaboration with the Microarray Gene Expression Data
(MGED) society. Based on the MIAME concept, jointly with other MGED members
we developed what is now known as the MIAME checklist [A1-1] for journal authors,
reviewers and editors. The MIAME checklist was described in an open letter to scientific
journals signed by many scientists, including several from the TEMBLOR group. The
letter was published in several leading journals, including Science, The Lancet [A1-4],
Genome Biology and Bioinformatics. In October 2002 several major scientific
journals, including the Nature group, The Lancet, Cell and EMBO Journal, adopted the
MIAME recommendations as a requirement for publishing papers based on
microarray research (see Microarray standards at last, Nature (editorial) 419:323
[A1-3]; DeFrancesco, L., Journal trio embraces MIAME, Genome Biology
3:gb-spotlight-20021010-01 [A1-7]; The Lancet 360:1019; Science 298(5593):539 [A1-5];
Bioinformatics 18(11):1409). EBI has also produced a draft MIAME extension
for protein/DNA binding location by array-based chromatin immunoprecipitation
experiments (so-called Chip-on-chip experiments). EBI is maintaining the MGED
pages, and the TEMBLOR contribution to this is acknowledged on www.mged.org.


P4: (1 personmonth)

The Protein Design Group at the CNB in Madrid has a long-standing research record
in various fields of bioinformatics, including database structure, genome analysis and
information extraction (IE) from scientific text. We have been involved in MIAME
discussion meetings at the EBI as well as in MGED meetings where data structure was
discussed in order to arrive at the MIAME document.

P17: (3 personmonths)

Partner 17 has taken part in MIAME development through the activities of the MGED
MIAME working group, notably by participating in the Boston MGED4 and Tokyo
MGED5 meetings. Partner 17 has also been validating the applicability of MIAME to its
specific biochip technical platform (nylon microarrays with radioactive labelling)
applied to clinical diagnosis of cancer.

Partner 17 has continued its participation in finalising the MIAME definitions, both
by implementing MIAME in its own LIMS system (see WP8.6) and by verifying
MIAME suitability for nylon/radioactivity microarrays by submitting laboratory
generated MIAME data to ArrayExpress (see WP8.7).


P18: (2 personmonths)

UMCU contributed to the MIAME versions. They also tested the feasibility of MIAME
annotation, and successfully submitted MIAME compliant datasets to ArrayExpress.



WP8.2


Objectives:

Definition of the data exchange format for microarray data - microarray
mark-up language (MAML)


Deliverables:

Dds7

MAML DTD

Dds8

MAML user guide and sample experiment description in MAML format

Dds9

MAML parser

Dds10

Submission to OMG


Documentation:

A2-1: MAGE-ML DTD

A2-2: MAGE-ML user guide

A2-3: MGED MAGE web page (http://www.mged.org/Workgroups/MAGE/mage.html)

A2-4: Extract from MGED-MAGE mailing list archive (linked from the web page mentioned in A2-3)


Comment:

This workpackage has changed. EBI began the development of MAML before the
funding for TEMBLOR was secured. The response from the international community
in developing the data exchange standard was overwhelming, and the development
proceeded much faster than expected. Therefore a joint proposal for standards
between the EBI and several companies was submitted to, and adopted by, OMG in
February 2002. Careful finalisation work is in progress.


Status, work accomplished and resources used, per partner:

P1: (9 personmonths)

EBI began the development of the microarray data exchange language (MAML) before
the TEMBLOR funding was secured. In November 2000 we submitted a proposal for
a microarray data exchange format to the Object Management Group (OMG). OMG
accepted our initial MAML submission, but recommended joining it with an
alternative submission from Rosetta Biosoftware and working towards a joint
proposal. In March 2001, after the proposal for TEMBLOR (initially
DESPRAD) funding had been submitted, the development of the joint standard began in
cooperation with many leading companies including Rosetta, Affymetrix and
Agilent. We agreed to call the new unified standard Microarray Gene Expression
(MAGE). MAGE (http://www.mged.org/mage [A2-3]) consists of three parts: an
object model (MAGE-OM); a document exchange format (MAGE-ML) [A2-1, A2-2],
which is derived directly from the object model; and software toolkits (MAGE-stk)
for creating MAGE-ML files. In February 2002 the MAGE standard became an
Adopted Specification of the OMG. The work on developing MAGE still continues:
OMG has set up a so-called finalisation task force, in which we are actively
participating. We also participate in the development of MAGE-stk, in particular by
developing a microarray data annotation tool that exports MAGE-ML.

The MAGE-OM model is now close to finalisation and only minor changes are expected.
The MAGE-stk toolkit is well under development and is already in use by MAGE-ML
exporting tools. EBI has taken an active part in these developments. We expect to
finish the implementation of MAGE-ML based data exchange pipelines in a
workshop the EBI is organising on December 1-6 on the Wellcome Trust Genome
Campus, with expected participation from all major MAGE developers.
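
The relationship between an object model and an exchange document derived from it can be illustrated with a minimal Python sketch. It is not MAGE-OM or MAGE-ML: the classes Experiment and Hybridisation, the element and attribute names, and the functions to_xml and from_xml are hypothetical and chosen only to show an object structure being written to XML and read back unchanged.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Hybridisation:
    # Hypothetical, heavily simplified stand-in for a model class.
    identifier: str
    array_ref: str


@dataclass
class Experiment:
    identifier: str
    name: str
    hybridisations: list = field(default_factory=list)


def to_xml(exp):
    """Serialise an Experiment; element names mirror the class names."""
    root = ET.Element("Experiment", identifier=exp.identifier, name=exp.name)
    container = ET.SubElement(root, "Hybridisations")
    for hyb in exp.hybridisations:
        ET.SubElement(container, "Hybridisation",
                      identifier=hyb.identifier, arrayRef=hyb.array_ref)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the object structure from the exchange document."""
    root = ET.fromstring(text)
    hybs = [Hybridisation(h.get("identifier"), h.get("arrayRef"))
            for h in root.iter("Hybridisation")]
    return Experiment(root.get("identifier"), root.get("name"), hybs)
```

Because the document structure follows the classes one to one, a round trip through to_xml and from_xml returns an equal object, which is the property a data exchange pipeline relies on.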

P3: (work not started yet)


P17: (3 personmonths)

Partner 17 has tested the MAGEstk (MAGE-ML Perl standard toolkit) and is
implementing it inside its local microarray database system in order to allow data
generated in its laboratory to be exported in MAGE format (ultimately to transfer local
data to the public ArrayExpress archive).

Partner 17 has further implemented MAGE-ML in its own LIMS (see WP8.6) and
contributed to MAGE-ML development through participation in the 5th Programming
Jamboree, 6-8 September, in Aix-en-Provence.


P18: (1 personmonth)

UMCU tested and reported on the use of the first versions of MAGE-ML, and has also performed
successful submissions of data and protocols in MAGE-ML format to ArrayExpress.

P19: (1 personmonth)

Several microarray datasets serving as test cases for MAGE-ML definitions were used to
set up the MAGE-ML user guide and the sample experiment description in MAGE-ML format.

P20: (1 personmonth)

Partner 20 has participated in the discussions on the MAGE mailing list of MGED, and
has been involved in testing the parser on example files distributed on the list and
through direct communication with EBI. Dr. Petersen was hired on the project in
February 2002, and attended the MGED IV meeting in Boston the same month.

We have contributed maintenance and fixes for the Java MAGEstk, and input on
some minor issues of the MAGE object model in the HigherLevelAnalysis package.









WP8.3


Objectives:

Participation in MGED and development of ontologies


Deliverables:

Dds11

An ontology for microarray experiment description

Dds12

An ontology editor (abandoned, OilEd editor being used instead)


Documentation:

A3-1: MGED ontology (Dds11) (http://mged.sourceforge.net/ontologies/MGEDontology.php)

A3-2: Sample MGED ontology hierarchical diagrams


Comment:

Deliverable Dds12 has been abandoned for reasons described by partner 9 below.


Status, work accomplished and resources used, per partner:

P1: (10 personmonths)

TEMBLOR funding has allowed us to participate in the development of the MGED
ontology for describing microarray experiments. The EBI is a key member of the
MGED ontology working group, which develops the structure and content of the
MGED ontology. The purpose of the ontology is to provide a structure for the terms
required in MAGE, to identify areas where domain expert knowledge is required and
ultimately to provide terms and definitions for structured database queries and
annotation.

As planned, the core ontology usable by all MGED member databases was
finished in September 2003 by a consortium of representatives of member laboratory
databases (TIGR, SMD, RAD and ArrayExpress), which meets monthly to produce the
ontology and add terms and definitions. The MGED ontology description is available
from http://mged.sourceforge.net/ontologies/MGEDontology.php. As the MGED
ontology was finalised only during the MGED 6 meeting on September 5, 2003,
user-friendly documentation is not yet available. Partner 1 is taking an active
part in this work. The OilEd (http://oiled.man.ac.uk/) tool has been used for
ontology development.


The EBI annotation and submission tool, MIAMExpress, already uses terms from the
MGED ontology and is one of two annotation systems worldwide to do this. By
presenting terms in the context of forms, the annotation process is made simpler and
the underlying ontology, which is large and complex, is hidden from the user.
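
The idea of hiding a hierarchical ontology behind simple form fields can be sketched as follows. This is an illustration only, not the MIAMExpress implementation: the ONTOLOGY dictionary, its class and term names, and the functions form_options and validate are hypothetical placeholders and are not taken from the MGED ontology files.

```python
# Hypothetical miniature ontology: each class lists its subclasses and
# the concrete terms that an annotator may pick.
ONTOLOGY = {
    "BioMaterialCharacteristics": {"subclasses": ["Sex", "DevelopmentalStage"], "terms": []},
    "Sex": {"subclasses": [], "terms": ["female", "male", "unknown_sex"]},
    "DevelopmentalStage": {"subclasses": [], "terms": ["embryo", "adult"]},
}


def form_options(cls):
    """Flatten every term below an ontology class into a sorted option list."""
    node = ONTOLOGY[cls]
    options = list(node["terms"])
    for sub in node["subclasses"]:
        options.extend(form_options(sub))
    return sorted(options)


def validate(field_class, value):
    """Accept a submitted value only if it is a term below the field's class."""
    return value in form_options(field_class)
```

A form field bound to the class "Sex" would then offer three options in a drop-down list, and the submitter never needs to see the class hierarchy itself.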


P3: (work not started yet)


P4: (1.5 personmonth)

One of the “hot topics” of recent years, the establishment and use of biological ontologies,
also attracted our attention. We focused on one of the challenges in this field and
addressed the problem of automatic or aided establishment of ontologies by using
information directly from the literature (Blaschke and Valencia, 2002b). This effort will
help MGED establish recommendations for gene expression normalisation methods and
quality control.

Blaschke C and Valencia A (2002b) Automatic ontology construction from the
literature. Genome Informatics Series 13: 201-213.


P9: (0 personmonths)

At the time of writing the project plan there was no robust, graphical editor for
ontologies available. Now, in 2002, a number of such tools, all using the same
standard format (DAML+OIL), have been developed, are freely available to academics
and continue to be maintained by different groups, for example:

- DAG-Edit (http://www.godatabase.org/dev/editor.html)
- Protege-2000 (http://protege.stanford.edu/)
- OilEd (http://oiled.man.ac.uk/).

These tools are in the public domain and cover most of the needs of the MGED
community. In particular, DAG-Edit is being used at the EBI in the GeneOntology
project. Rather than duplicating these efforts, it was decided to take advantage of these
tools. This enabled RZPD to spend more time on, and to speed up, the implementation of
MAGEML standards and MIAME compliance at RZPD (see WP8.7 below).

The permission to shift the resources away from WP8.3 has been granted by the
EC. The OilEd (http://oiled.man.ac.uk/) tool has been used for ontology development
by the participating partners.


RZPD participated in the MGED workshop at the EBI in February 2002.


Work on the ontology for microarray expression experiments is under way, but
priority was given to the implementation of MAGEML standards and MIAME
compliance at RZPD.

Work package WP8.7 has already required more effort than was anticipated and will
continue to do so during 2003. In 2002, we therefore reassigned 3 person months from
TEMBLOR/DESPRAD work package WP8.3 to WP8.7 and anticipate another 5
person months to be reassigned from WP8.3 to WP8.7 in 2003. This will leave the
remaining 8 person months of WP8.3 for its originally intended purpose of
developing an ontology for microarray experiment description.

P17: (5 personmonths)

Partner 17 has contributed to the regular activities of the MGED Society, in particular as
a member of its board of directors (monthly meetings and several email discussion
groups). The laboratory is also hosting the next international MGED6 meeting in
Aix-en-Provence, 3-5 September 2003, and as such is actively involved in both practical
meeting organisation and building the scientific program. Partner 17 is implementing the
MGED generated sample ontology in its own local microarray database.

P18: (3 personmonths)

UMCU participated in MGED meetings, and tested and reported on the ontology. They also
submitted data in the correct format.






P19: (1 personmonth)

Several microarray datasets which served as test cases for MAML definitions were used
to verify, refine and improve the ontology for microarray experiment description.




WP8.4


Objectives:

Development of the data model and schema for a microarray gene expression
database ArrayExpress


Deliverables:

Dds13

An object model described in the UML language, Rational Rose model (MDL
files), and the documentation of the model.


Documentation:

A4-1: Rational Rose diagrams (MAGE-OM.mdl)

A4-2: Rational Rose model for ArrayExpress data warehouse

A4-3: User guide to MAGE-ML


Status, work accomplished and resources used, per partner:

P1: (9 personmonths)

We decided to base ArrayExpress on the full Microarray Gene Expression Object
Model (MAGE-OM) [A4-1], in the development of which we actively participated
(see WP 8.2 above). The three major components (arrays, experiments and protocols)
are further described according to the MAGE-OM data model. ArrayExpress uses the
MIAME definition of an experiment: a set of related hybridisations, data,
protocols and array descriptions. There are several types of protocols, including extraction,
labelling, hybridisation, scanning, and data transformation protocols. The array design
describes what occupies each feature (spot) on the array and provides the respective
references to external databases.

In order to provide flexible query capabilities, the data warehouse schema is kept very simple.
It is essentially a so-called star schema, with a central "fact" table capturing
absolute and relative expression values and three dimensions: the gene/array design
element dimension, the biological sample dimension and the "technical" bioassay dimension,
with links to experiment and chip type descriptions.
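
A star schema of this general shape can be sketched with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration, not the ArrayExpress warehouse (which the report states is implemented in Oracle): all table names, column names and the demo rows are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table, three dimension tables.
SCHEMA = """
CREATE TABLE design_element (id INTEGER PRIMARY KEY, gene_name TEXT);
CREATE TABLE bio_sample     (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE bioassay       (id INTEGER PRIMARY KEY, experiment TEXT, chip_type TEXT);
CREATE TABLE expression_fact (
    design_element_id INTEGER REFERENCES design_element(id),
    bio_sample_id     INTEGER REFERENCES bio_sample(id),
    bioassay_id       INTEGER REFERENCES bioassay(id),
    absolute_value    REAL,
    relative_value    REAL
);
"""


def build_demo():
    """Create an in-memory warehouse holding two made-up measurements."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO design_element VALUES (1, 'geneA')")
    con.execute("INSERT INTO design_element VALUES (2, 'geneB')")
    con.execute("INSERT INTO bio_sample VALUES (1, 'treated')")
    con.execute("INSERT INTO bioassay VALUES (1, 'E-DEMO-1', 'chipX')")
    con.executemany(
        "INSERT INTO expression_fact VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 1200.0, 2.0), (2, 1, 1, 300.0, 0.5)],
    )
    return con


def upregulated(con, threshold):
    """Join the fact table to two dimensions and filter on relative value."""
    rows = con.execute(
        """SELECT d.gene_name, s.description, f.relative_value
           FROM expression_fact f
           JOIN design_element d ON d.id = f.design_element_id
           JOIN bio_sample s ON s.id = f.bio_sample_id
           WHERE f.relative_value > ?
           ORDER BY d.gene_name""",
        (threshold,),
    )
    return rows.fetchall()
```

The appeal of the design is that every query has the same form: filter or group on one or more dimensions, then aggregate or list values from the single fact table.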

Introductory-level documentation for the MAGE-OM model has been added and is
available from http://www.mged.org/Workgroups/MAGE/introduction.html. A paper
describing the MAGE model and its usage is in preparation.


The ArrayExpress model is freely available from the ArrayExpress web site,
www.ebi.ac.uk/arrayexpress


P3: (work not started yet)


P9: (1 personmonth)

The first version of a MIAME compliant UML data model in MDL form (Milestone
Mds11) was published via Sourceforge.net on September 9, 2002. [A4-1]

RZPD is using this UML model as the basis for its MAGEML implementation within
the RZPD database.



The deliverable Dds13 "An object model described in the UML language, Rational
Rose model (MDL files)" together with its documentation has been produced and
published at http://sourceforge.net/projects/mged/.


P17: (2 personmonths)

Partner 17 is participating in checking the validity of the ArrayExpress data model by
preparing data specific to its technical platform (nylon supported biochips) to submit in
MAGE-ML format to the archive. In this way local laboratory database fields relevant to
MIAME can be mapped to ArrayExpress. The ArrayExpress model is also used to
update the laboratory’s own local microarray database relational model.




P18: (1 personmonth)

UMCU implemented and tested local versions of ArrayExpress and MIAMExpress,
and aided in fixing bugs in them.

P19: (1 personmonth)

Several microarray datasets serving as test cases for MAGE-ML definitions were used to
verify, refine and improve the data model and schema for the microarray gene
expression database ArrayExpress.





WP8.5


Objectives:

Database architecture design. Development of a database schema for ArrayExpress.
Database implementation. Development of a data loader for MAML-compliant XML
documents into ArrayExpress.


Deliverables:

Dds14: Database architecture schema

Dds15: Database E/R model (Oracle Designer)

Dds16: Database schema generation scripts for Oracle 8 (DDL files)

Dds17: Data loader - Java application


Documentation:

A5-1: ArrayExpress architecture

A5-2: Database generation script (extract, MAGE-RS.tab)

A5-3: Information for submitters of MAGE-ML (http://www.ebi.ac.uk/~ele/ext/submitter.html)

A5-4: User guide submitters

A5-5: Data loader modifications


Status, work accomplished and resources used, per partner:

P1: (30 personmonths)

ArrayExpress includes a relational implementation of the MAGE-OM, i.e., there is a
uniform mapping of MAGE-OM to ArrayExpress relational tables. This mapping has
been encoded as a program that takes an object model as input and produces a
database schema definition and other software components.
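
The idea of deriving a schema from an object model can be illustrated with a minimal
Python sketch. It is not the ArrayExpress generator: the model description format, the
type mapping and the function name generate_ddl are assumptions made for illustration,
and the two example classes carry invented attributes rather than the real MAGE-OM ones.

```python
# Hypothetical sketch: derive CREATE TABLE statements from a tiny object model.
# The model format and the type mapping are illustrative assumptions, not MAGE-OM.

TYPE_MAP = {"string": "VARCHAR2(255)", "int": "NUMBER(10)", "float": "NUMBER"}


def generate_ddl(model):
    """Return one CREATE TABLE statement per class in the model."""
    statements = []
    for class_name, spec in model.items():
        columns = ["  ID NUMBER(10) PRIMARY KEY"]
        # Each attribute becomes a column of the mapped SQL type.
        for attr, attr_type in spec.get("attributes", {}).items():
            columns.append(f"  {attr.upper()} {TYPE_MAP[attr_type]}")
        # Each association becomes a foreign key column to the target table.
        for role, target in spec.get("associations", {}).items():
            columns.append(
                f"  {role.upper()}_ID NUMBER(10) REFERENCES {target.upper()}(ID)"
            )
        body = ",\n".join(columns)
        statements.append(f"CREATE TABLE {class_name.upper()} (\n{body}\n);")
    return statements


# Invented two-class model used only to exercise the generator.
example_model = {
    "Experiment": {"attributes": {"name": "string"}},
    "BioAssay": {
        "attributes": {"name": "string"},
        "associations": {"experiment": "Experiment"},
    },
}

print("\n\n".join(generate_ddl(example_model)))
```

Because the mapping rule is applied uniformly to every class, a change in the object
model only requires the generator to be run again.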

The database model [A5-1], database scripts [A5-2] and application software are
available from the ArrayExpress web site. ArrayExpress is implemented in Oracle; the
query interface is implemented via Java servlets and uses Tomcat and Velocity.

A MAGE-ML loader for ArrayExpress has been implemented and successfully tested on
a large number of MAGE-ML submissions (about 50 different MAGE-ML files) from
different laboratories. Performance optimisation is still in progress.


P3: (work not yet started)


P17: (3 personmonths)

Partner 17 is participating in checking the ArrayExpress data loader by preparing data
specific to its technical platform (nylon supported biochips) for submission to the
archive in MAGE-ML format.

Partner 17 has demonstrated the adequacy of the ArrayExpress implementation by
successfully having its own microarray data loaded (see WP8.7).





WP8.6


Objectives:

Development of data annotation tools working in conjunction with the laboratory
information systems used by the project partners for creating MAML format data
representations. Development of data curation tools.


Deliverables:

Dds18  Data annotation tool
Dds19  Data curation tool


Documentation:

A6-1: MIAMExpress source code (extract)
A6-2: Information for curators (http://www.ebi.ac.uk/~ele/int/curator_info.html)
A6-3: MIAMExpress help page
      (http://www.ebi.ac.uk/miamexpress/help/help_page.html)
A6-4: MIAMExpress user guide
A6-5: The list of MIAMExpress local installations
A6-6: The list of experiments submitted to ArrayExpress via MIAMExpress



Status, work accomplished and resources used, per partner:


P1: (15 personmonths)

To facilitate MIAME compliant data submission and annotation, we have developed a
software tool called MIAMExpress [A6-1]. This is a web-based tool that allows users
to annotate a submission either during or after completion of the experiment. The
current MIAMExpress Version 1.0 is a generic annotation tool, suitable for annotating
any microarray gene expression experiment, irrespective of organism or type. To use
MIAMExpress, users need only an internet browser. The user creates an account and
is presented with a series of web forms, which combine drop-down fields (with
appropriate controlled vocabularies) and free format text fields, to annotate the
experiment. Tab-delimited data files are uploaded from the user’s local computer and
linked to the experiment submission. Arrays and protocols can also be submitted via
MIAMExpress and can be linked to multiple experiments. Help is available from the
curation team throughout the submission, and contextual help is provided within the
interface.
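
The combination of controlled vocabulary fields and free format text fields described
above can be sketched as a small validation routine. This is an illustration in Python,
not MIAMExpress code (which is Perl-CGI); the field names, the vocabulary terms and
the function validate_annotation are all hypothetical.

```python
# Hypothetical sketch of checking a submitted form against controlled vocabularies.
# Field names and vocabulary terms are invented for illustration.

CONTROLLED_VOCABULARIES = {
    "experiment_type": {"time series", "dose response", "genetic modification"},
    "organism": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
}
FREE_TEXT_FIELDS = ["description"]


def validate_annotation(annotation):
    """Return a list of problems; an empty list means the annotation is acceptable."""
    problems = []
    # Drop-down fields: the value must be one of the permitted terms.
    for field, allowed in CONTROLLED_VOCABULARIES.items():
        value = annotation.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not in the controlled vocabulary")
    # Free format fields: any non-blank text is accepted.
    for field in FREE_TEXT_FIELDS:
        if not annotation.get(field, "").strip():
            problems.append(f"missing free-text field: {field}")
    return problems


good = {
    "experiment_type": "time series",
    "organism": "Mus musculus",
    "description": "Heat shock response",
}
bad = {"experiment_type": "unknown", "description": " "}

print(validate_annotation(good))
print(validate_annotation(bad))
```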

MIAMExpress is an open source project and consists of a Perl-CGI interface, a MySQL
database, and a MAGE-ML export component implemented using MAGEstk. The
system can be installed locally and used as an 'electronic notebook' for microarray
experiments, potentially allowing 'one-click' submissions to ArrayExpress or to any
other database or tool that accepts MAGE-ML formatted data.

MIAMExpress has been developed further during the last 6 months and new
functionality has been added. The first draft of the user documentation has been
prepared. MIAMExpress has been actively used: 13 experiments have been submitted
to the ArrayExpress database using the tool, and it has been installed in at least 15
different laboratories.





P3: (work not yet started)


P4: (1 personmonth)

The experience of the Protein Design Group in developing annotation and curation
tools based on text mining contributes to the development of new data curation tools
and of a final MAGE-ML data representation.


P17: (4 personmonths)

Partner 17 is participating in the specification of the MIAMExpress data annotation tool
by verifying that it is compatible with data specific to its own technical platform (nylon
supported biochips). In addition, human and mouse alternative splice forms of gene
transcripts are being systematically annotated on the basis of EST data to allow more
accurate microarray probe descriptions. A web-based service for quality control of
cDNA-based microarray probes is being developed.

Partner 17 has both further developed ELOGE, its own LIMS system for capturing
laboratory generated data, and tested the MIAMExpress curation tool with a full
experimental dataset submission. The LIMS system is now operational in the
laboratory, and only two MAGE-ML packages (BioAssay and BioData) still need to
be implemented to gain full direct export capacity to ArrayExpress. Further work is,
however, still needed to make full use of the MGED ontologies, the first stable
version of which has just been released.


P18: (2 personmonths)

UMCU implemented and tested local versions of ArrayExpress and MIAMExpress and
aided in fixing bugs in them, and is currently working on the implementation of a
curation tool for its present microarray database that will allow MIAME and MAGE-ML
compliant upload to ArrayExpress. UMCU has also successfully submitted data to
ArrayExpress in these formats.

During the last 6 months the partner continued work on setting up an in-house, MAML
compliant annotation tool.


P19: (1.5 personmonths)

To facilitate the MIAME compliant annotation of microarray experiments, we
investigated using and expanding the MIAMExpress data submission tool as a
laboratory information system, so that the MAML format data representations could be
created automatically. Implementation of a modified MIAMExpress tool will be
done during the rest of the project.

MIAMExpress has been set up at EMBL. This will work as the EMBL front end of a
microarray data submission pipeline to ArrayExpress at EBI. We are in the process of
adding additional features to MIAMExpress (e.g. simple query tool, local TIFF image
storage, user access control, batch loader). This additional functionality will make
MIAMExpress more attractive to researchers, and thus help to increase the acceptance
of data submissions, even if publications are not planned or do not yet require
ArrayExpress data submissions.



As a part of the local installation of MIAMExpress, we are developing custom tools
which allow the re-annotation of feature descriptions of the spotted cDNA arrays
produced at EMBL. Undocumented sequences are compared by BLAST against well
curated sequence databases (EnsEMBL, RefSeq, Swiss-Prot, ...), or sequence
annotations are extended by linking sequence identifiers to databases (e.g. EnsEMBL,
LocusLink, GO terms).








WP8.7


Objectives:

Population of ArrayExpress with gene expression data sets from the project
partner laboratories and other collaborators.


Deliverables:

Dds20  A database populated with high quality gene expression data sets


Documentation:

A7-1: ArrayExpress entry page, updated in July 2003
      (http://www.ebi.ac.uk/arrayexpress)
A7-2: Database documentation from partner 9 (desprad-database-doc.tar.gz) (not
      printed)
A7-3: Software documentation from partner 9 (desprad-software-doc.tar.gz) (not
      printed)
A7-4: ArrayExpress content (submissions by organisation, country and organism)


Status, work accomplished and resources used, per partner:

P1: (10 personmonths)

As of July 2003, the ArrayExpress database contains 50 experiments (a total of over
1000 hybridisations) for the yeasts Saccharomyces cerevisiae and
Schizosaccharomyces pombe, human, mouse, rat, C. elegans, Arabidopsis and
Drosophila. This constitutes more than ten-fold growth during the last 6 months (see
Annex A7-4).

P3: (5 personmonths)

The bulk of the effort in the last 6 months has been towards WP8.7. Dr. Debashis
Rana started in December 2002 and has been working on organising internal data:
tools have been developed that allow us to deposit array descriptions in ArrayExpress
and to reformat other internal data for use in MIAMExpress. We have proposed
changes to the ADF file format to allow the recording of PCR production status for
microarray elements and to record different types of 'empty' wells correctly. The
array design specification has been deposited in ArrayExpress, and much work has
been done to curate, in consultation with the actual experimenters, the details of
experiments carried out in our facility into the version of MIAMExpress that we have
installed locally. The nature of these experiments requires MIAMExpress to allow
multiple files to be uploaded as one set, and this functionality has just been made
available in MIAMExpress. We therefore expect to have brought the set of spotted
microarray experiments due for submission to ArrayExpress up to date within the
next couple of months.


We have found this workpackage to be more complex than we anticipated, partly due
to the need to deal with Affymetrix data (not yet started) as well as in-house spotted
microarray experiments. As we can now be sure that MIAMExpress will
accommodate our needs, we request permission to transfer to WP8.7 the planned
effort for WP8.6 (5 months), which was to have been directed towards direct
deposition of data from our LIMS system.





P9: (20 personmonths)

In order to prepare RZPD for MIAME compliant archival and retrieval of gene
expression data, the following tasks were performed at RZPD.


1. As a first step some conceptual decisions had to be made. In order to store new
gene expression data as well as to integrate those already in the RZPD primary
database, we decided that a new table space should be set up at RZPD. The data
schema should fulfil the requirements of MIAME, and MAGE-ML should be
used as the data exchange format. For input of data the MIAMExpress tool
developed at the EBI can be used. The data schema should be implemented
using ORACLE 8i, while the software should be developed in Perl in
order to be consistent with other projects at RZPD.


2. The new database for storing gene expression data was set up. The data model is
based on the MAGE object model. Each class resulted in one table in the schema,
with the exception of the three abstract classes Extendable, Describable and
Identifiable. Attributes and relationships of these have been added directly to the
underlying tables in order to increase performance. The hierarchical structure
between the tables has been implemented using identifying relationships.
Associations have been implemented via foreign keys; the descriptions of the
classes and attributes were stored as comments on tables and columns.


3. A schema parser was written that reads metadata from the ORACLE data
dictionary and creates two kinds of application programming interfaces (APIs) to
support work with the database. The first layer does an object-relational mapping,
i.e. one class per database table is created. Each class offers 'get' and 'set' methods
for reading and modifying the value of an attribute. The second layer provides a
database interface for each table and offers 'insert', 'update', 'delete' and 'select'
methods for retrieving and modifying the data in the database. Furthermore, two
types of documentation are created by the parser. The first describes tables and their
columns by reading the corresponding metadata from the data dictionary; the
second gives information about the structure of the APIs, their methods and usage.
The documentation is available in HTML format, the latter additionally in
Perl's documentation language POD. With the help of this schema parser, APIs
were created for the RZPD primary database as well as for the newly created gene
expression database.


4. A program was implemented that exports the gene expression data from the
original RZPD primary database and converts it into the MAGE-ML format. The
database is accessed through the APIs generated in the previous task. The
objects retrieved from the primary database are then transformed into MAGE
objects and exported to MAGE-ML.


5. A parser was written that converts files generated by the Affymetrix software
(sequence files in FASTA format, chip description file (CDF), cell intensity file
(CEL)) into the MAGE-ML format.




6. A program was implemented that imports the gene expression data from MAGE-ML
format into the gene expression database. The XML parsing was achieved
using an implementation of SAX; the database access was realized using the
APIs generated in task 3. About 20 gene expression experiment series
have already been imported into the new RZPD expression database tables. Tests
are currently underway to determine how to implement an automatic exchange
mechanism between RZPD and ArrayExpress.
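
The SAX-based import in task 6 can be illustrated with a minimal Python sketch. It is
not the RZPD importer, which is written in Perl against ORACLE: here the standard
xml.sax module stands in for the SAX implementation, an in-memory SQLite database
stands in for the expression database, and the sample document, the table layout and
the names ImportHandler and import_document are simplified assumptions. Real
MAGE-ML documents are far richer than the two element types handled here.

```python
import sqlite3
import xml.sax

# Simplified stand-in document; element and attribute names are illustrative only.
SAMPLE = b"""<MAGE-ML>
  <Experiment identifier="E-1" name="Heat shock">
    <BioAssay identifier="BA-1" name="Hybridisation 1"/>
    <BioAssay identifier="BA-2" name="Hybridisation 2"/>
  </Experiment>
</MAGE-ML>"""


class ImportHandler(xml.sax.ContentHandler):
    """Insert one row per recognised element as SAX start events arrive."""

    def __init__(self, connection):
        super().__init__()
        self.connection = connection
        self.current_experiment = None

    def startElement(self, name, attrs):
        if name == "Experiment":
            self.current_experiment = attrs["identifier"]
            self.connection.execute(
                "INSERT INTO experiment VALUES (?, ?)",
                (attrs["identifier"], attrs["name"]),
            )
        elif name == "BioAssay":
            # The enclosing experiment supplies the foreign key value.
            self.connection.execute(
                "INSERT INTO bioassay VALUES (?, ?, ?)",
                (attrs["identifier"], attrs["name"], self.current_experiment),
            )

    def endElement(self, name):
        if name == "Experiment":
            self.current_experiment = None


def import_document(xml_bytes):
    """Parse the document event by event and return the populated database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE experiment (identifier TEXT PRIMARY KEY, name TEXT)"
    )
    connection.execute(
        "CREATE TABLE bioassay (identifier TEXT PRIMARY KEY, name TEXT, "
        "experiment_id TEXT REFERENCES experiment(identifier))"
    )
    xml.sax.parseString(xml_bytes, ImportHandler(connection))
    connection.commit()
    return connection


database = import_document(SAMPLE)
print(database.execute("SELECT * FROM bioassay ORDER BY identifier").fetchall())
```

An event-driven parser of this kind never holds the whole document in memory, which
matters for large expression data files.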


This work is documented in the annex as follows.


File desprad-database-doc.tar.gz (size 21 KB, A7-2) contains HTML documentation
pages for all database tables created. Starting with the 'index.html' page, the user can
browse the structure and definition of the implemented schema. These documentation
files were automatically created by the schema parser based on the MAGE-OM data
model.
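
How a schema parser can turn database metadata into both accessor classes and HTML
documentation may be sketched as follows. This Python example is an assumption-laden
illustration, not the Perl schema parser itself: SQLite's table_info pragma stands in
for the ORACLE data dictionary, and the helper names read_columns, make_record_class
and make_html_doc are hypothetical.

```python
import html
import sqlite3


def read_columns(connection, table):
    """Read (name, declared type) pairs from SQLite's metadata for one table."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


def make_record_class(table, columns):
    """Build a class offering get_<column> and set_<column> methods for one table."""

    def make_getter(column):
        return lambda self: self._values.get(column)

    def make_setter(column):
        def setter(self, value):
            self._values[column] = value
        return setter

    namespace = {"__init__": lambda self: setattr(self, "_values", {})}
    for name, _declared_type in columns:
        namespace[f"get_{name}"] = make_getter(name)
        namespace[f"set_{name}"] = make_setter(name)
    return type(table.capitalize(), (object,), namespace)


def make_html_doc(table, columns):
    """Render a minimal HTML description of a table and its columns."""
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(kind)}</td></tr>"
        for name, kind in columns
    )
    return f"<h1>{html.escape(table)}</h1><table>{rows}</table>"


# Invented example table used only to exercise the generators.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE experiment (identifier TEXT, name TEXT)")
columns = read_columns(connection, "experiment")
Experiment = make_record_class("experiment", columns)

record = Experiment()
record.set_name("Heat shock")
print(record.get_name())
print(make_html_doc("experiment", columns))
```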


File desprad-software-doc.tar.gz (size 326 KB, A7-3) contains the following
directory structure:


RZPD/
  Geneexp/*.pm.pod         -> Doc of classes of object-relational mappings
    DBI/*.pm.pod           -> Doc of classes of MGED database interface
    Import/
      Mageml.pm.pod        -> Doc of importers for MAGE-ML data
      Mageml/
        Handler.pm.pod     -> Doc of SAX-Event-Handler for importer
  Rzpd/*.pm.pod            -> Doc of classes of object-relational mappings
    DBI/*.pm.pod           -> Doc of classes of RZPD database interface
    Export/
      Mageml.pm.pod        -> Doc of exporter for MAGE-ML data
      Mageml/*.pm.pod      -> Doc of classes of exporters for each MAGE-ML Package
  Schema/*.pm.pod          -> Doc of classes of schema parsers
    Generator/*.pm.pod     -> Doc of classes of Perl/HTML-generators


This work package has already required more effort than was anticipated and will
continue to do so during 2003. In 2002, we therefore reassigned 3 person months from
TEMBLOR/DESPRAD work package WP8.3 to WP8.7 and anticipate that another 5
person months will be reassigned from WP8.3 to WP8.7 in 2003. This will leave the
remaining 8 person months of WP8.3 for its originally intended purpose.


Period February 1 to July 31, 2003


Like ArrayExpress, the newly set up RZPD gene expression database uses MAGE-ML
as the main input/output language. Currently RZPD receives MAGE-ML input files
from three different sources:


1. Data import from RZPD's primary database

The data is exported in MAGE-ML as described in last year's project report and
imported into the gene expression database.

2. MIAMExpress

RZPD locally installed MIAMExpress, the data annotation and submission tool
implemented by partner 1, and adapted it to work with an ORACLE database
management system like the one used at RZPD. MIAMExpress is able to export
MAGE-ML, which is then also submitted to ArrayExpress.

3. Data resulting from experiments with Affymetrix arrays

Affymetrix data files (.EXP, .CEL and .CHP) are converted into MAGE-ML using
the GDAC Exporter Tool available for download from the Affymetrix homepage.
However, the MAGE-ML produced contains errors, so that changes 'by hand'
are still necessary. These problems have been reported to Affymetrix and will
hopefully be corrected in the near future.


Using these pipelines, RZPD has so far submitted 7 experiments to ArrayExpress.



P17: (10 personmonths)

Partner 17 is developing a local MIAME compatible database to store its experimental
data. Stored data will be exported to ArrayExpress in MAGE-ML format using the
MAGEstk Perl modules. Deployment of the local database in the wet-lab side of the
laboratory is underway and partial implementation of the MAGE-ML exporter has been
achieved (BioSamples and BioAssays objects are still to be added). Once the local
database is fully installed (including a functional MAGE-ML exporter) and
laboratory generated experimental data can be loaded into it, priming of the data
pipeline to ArrayExpress will start. Tools to aid experimental data acquisition in MIAME
compliant format are being developed, in particular highly automated image feature
extraction software with stringent quality controls. This software is freely available and
the novel methodological aspects are being submitted for publication.

Partner 17 has successfully submitted a full experimental dataset to ArrayExpress
using the MIAMExpress tool as part of the preparation of a manuscript for scientific
publication. Although the laboratory LIMS is essentially operational, experimental
biologists have not yet got into the habit of recording MIAME compliant data in real
time. The main incentive appears to be the pressure from scientific journal editors to
supply MIAME quality annotation. To help establish electronic lab book keeping,
partner 17 is hiring a biologist data curator to prime the laboratory's LIMS system.


P18: (6 personmonths)

First submission of MIAME compliant data in MAGE-ML format to ArrayExpress
[B2-4].

Assisted collaborating groups in preparing their microarray data for MIAME/MAML
compliant submission to ArrayExpress in parallel with their paper submissions.


Continued work on increased automation of the pipeline for generating
MIAME/MAML compliant data from the combination of two in-house microarray
databases with a LIMS system.

P19: (5 personmonths)



Two datasets have been submitted to EBI using the EBI installation of
MIAMExpress. Two other
submissions are in preparation. These two data sets are
submitted to the local MIAMExpress installation and will be pipelined to
ArrayExpress at EBI.


P20: (1 personmonth)

To facilitate submissions from the Norwegian Microarray Consortium, partner 20 has
installed and maintains a local installation of BASE (BioArray Software Environment),
the platform selected by the consortium. Collaborative work with the
developers responsible for the MAGE functionality of BASE has been initiated, with a
focus on utilizing the MAGE software toolkit from WP8.2 as a basis.

To offer better storage and analysis capabilities to research collaborators, we have
installed the version 1.2.x series of BASE. We are in contact with the developers of
BASE, discussing MAGE-ML functionality, to ensure a standard compliant
integration between J-Express, BASE and ArrayExpress. The current beta version of
MAGE-ML export in BASE is being developed by one of our partners in the Norwegian
Microarray Consortium, based on the Java MAGEstk.

A data analysis course on utilising the available tools and storage facilities is planned
for this autumn, and is expected to increase the number of submissions to our local
data storage.




WP8.8

Objectives:

Integration of ArrayExpress with other databases of the EBI and other Internet
resources


Deliverables:

Dds21 ArrayExpress fully integrated in the common database system at the EBI and
linked to other relevant internet genomics resources



Status, work accomplished and resources used, per partner:

P1: (3 personmonths)

The database is integrated with other databases at the EBI via the URL map component
in Expression Profiler (see below, WP 10). New work has started on integration via
the ArrayExpress Data Warehouse (WP 8.4, 8.5) using Ensmart technology recently
developed at the EBI. We are still exploring the advantages and limitations of this
technology. The integration prototype based on Ensmart technology is due January
31, 2004.




WP8.9


Objectives:

Development and implementation of languages and tools for querying and accessing
microarray data in ArrayExpress


Deliverables:

Dds22  List of typical queries
Dds23  Implementation of the query module
Dds24  Query language documentation


Documentation:

A9-1: List of typical queries (Dds22)
A9-2: Logical structure of queries
A9-3: ArrayExpress query forms
A9-4: Data warehouse conceptual schema
A9-5: Query language documentation


Status, work accomplished and resources used, per partner:

P1: (15 personmonths)

Arrays, experiments and protocols can be queried by their accession numbers and also
by a variety of parameters. For example, experiments can be queried on various
parameters including author, laboratory, organism, experiment type (e.g., time series),
experimental factors (e.g., compound), and details of the array used in the
experiments (e.g., array manufacturer). On querying, a brief description (provided by
the submitter) of all entries matching the query is returned. Complete information can
be retrieved by selecting particular items from this list. A browsable list of the
database content is also present in the query interface.
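
The two-step pattern described above (a parameter query returning brief descriptions,
followed by retrieval of a complete entry by accession number) can be sketched in
Python. The records, accession numbers, field names and function names below are
invented for illustration and do not reflect the actual ArrayExpress interface.

```python
# Invented records; accession numbers and field names are illustrative only.
EXPERIMENTS = [
    {"accession": "EXAMPLE-1", "organism": "Homo sapiens",
     "experiment_type": "time series", "description": "Serum response"},
    {"accession": "EXAMPLE-2", "organism": "Mus musculus",
     "experiment_type": "time series", "description": "Liver development"},
    {"accession": "EXAMPLE-3", "organism": "Mus musculus",
     "experiment_type": "dose response", "description": "Compound treatment"},
]


def query_experiments(experiments, **criteria):
    """Return (accession, description) for every record matching all criteria."""
    return [
        (e["accession"], e["description"])
        for e in experiments
        if all(e.get(field) == value for field, value in criteria.items())
    ]


def get_experiment(experiments, accession):
    """Retrieve the complete record for one accession number, or None."""
    for e in experiments:
        if e["accession"] == accession:
            return e
    return None


hits = query_experiments(
    EXPERIMENTS, organism="Mus musculus", experiment_type="time series"
)
print(hits)
print(get_experiment(EXPERIMENTS, hits[0][0]))
```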

Microarrays can contain tens or even hundreds of thousands of features. To make
array designs easier to explore, a generic tab-delimited display format has been
designed, in which each row corresponds to a feature (spot) on the array.
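
A tab-delimited layout with one row per feature is straightforward to process with
standard tools. The following Python sketch reads such a table; the column names and
the example rows are assumptions made for illustration, not the actual display format.

```python
import csv
import io

# Invented example content; the column names are assumptions, not the real format.
ARRAY_DESIGN = (
    "block\tcolumn\trow\treporter\n"
    "1\t1\t1\tYAL001C\n"
    "1\t2\t1\tYAL002W\n"
    "1\t3\t1\tEMPTY\n"
)


def read_features(text):
    """Parse tab-delimited text into one dictionary per feature (spot)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


features = read_features(ARRAY_DESIGN)
# Select the reporters of all spots that are not marked as empty.
non_empty = [f["reporter"] for f in features if f["reporter"] != "EMPTY"]
print(len(features), non_empty)
```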

Documentation of the database access and query interface has been written (annex
A9-5).

Programmatic access to ArrayExpress is under development jointly with partner 20.
The documentation of the query language has been written (annex A9-5).


P3: (work not yet started)


P17: (7 personmonths)

Partner 17 is participating in investigating data query methods by testing and
implementing SQL data extraction from its own local microarray database.

Having submitted a dataset to ArrayExpress, Partner 17 has downloaded its data back
in MAGE-ML in order to verify data integrity. Queries are also being tested against
its own database to collect data in gene or sample centric views rather than solely by
experiment.
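
The difference between experiment centric and gene or sample centric queries can be
shown with a small SQL example, run here from Python against an in-memory SQLite
database. The single measurement table, its columns and its values are invented for
illustration and are not the partner's actual schema.

```python
import sqlite3

# Invented single-table layout and values, used only to illustrate the two views.
connection = sqlite3.connect(":memory:")
connection.executescript("""
CREATE TABLE measurement (experiment TEXT, sample TEXT, gene TEXT, log_ratio REAL);
INSERT INTO measurement VALUES
  ('E1', 'liver',  'geneA',  1.0),
  ('E1', 'kidney', 'geneA', -0.5),
  ('E2', 'liver',  'geneA',  2.0),
  ('E2', 'liver',  'geneB',  0.5);
""")


def gene_view(gene):
    """All measurements of one gene, across experiments and samples."""
    return connection.execute(
        "SELECT experiment, sample, log_ratio FROM measurement "
        "WHERE gene = ? ORDER BY experiment, sample",
        (gene,),
    ).fetchall()


def sample_view(sample):
    """Mean log ratio per gene for one sample type, across experiments."""
    return connection.execute(
        "SELECT gene, AVG(log_ratio) FROM measurement "
        "WHERE sample = ? GROUP BY gene ORDER BY gene",
        (sample,),
    ).fetchall()


print(gene_view("geneA"))
print(sample_view("liver"))
```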




P18: (4 personmonths)

Partner 18 started by implementing a local version of Expression Profiler
(http://gen-master.med.uu.nl/tools/EP/), then developed and implemented novel tools
to determine mRNA coexpression for the purpose of verifying putative protein-protein
interactions. This tool was integrated in Expression Profiler. Moreover, microarray
normalisation tools were developed. Partner 18 has also implemented a local version
of this tool. See WP8.10 for references/documentation.

Contributed to development of the Expression Profiler tool for seamless querying of
ArrayExpress.


P20: (2 personmonths)

The earlier code for importing MAGE-ML into J-Express has been rewritten as a
middle layer framework for handling several MAGE operations in addition to
MAGE-ML import. This framework will have the functionality to query a repository
for parts or the whole of other MAGE-OM data structures available from the repository.
Discussions with EBI on the query protocol (international standardisation process) and
its implementation are ongoing.




WP8.10


Objectives:

Development of methods, algorithms, and software for the analysis, exploration,
visualisation, and mining (knowledge extraction) of gene expression data.


Deliverables:

Dds25  New methods and algorithms (publications)
Dds26  Computer software implementing these methods and algorithms


Documentation:

A10-1: EPCLUST documentation
A10-2: Expression Profiler description
A10-3: Analysis of Gene Expression Data using J-Express
A10-4: Updated Expression Profiler entry web page (http://ep.ebi.ac.uk/)


Status, work accomplished and resources used, per partner:

P1: (16 personmonths)

Expression Profiler (EP, http://ep.ebi.ac.uk/ [A10-1,2,4]) is a set of tools for the
analysis and interpretation of gene expression and other functional genomics data.
These tools perform expression data clustering, visualization, and analysis, integration
of expression data with protein interaction data and functional annotations, and the
analysis of promoter sequences for predicting transcription factor binding sites.
Several clustering analysis method implementations and tools for sequence pattern
discovery provide a rich data mining environment for various types of biological data.
All the tools are web-based with minimal browser requirements. Analysis results are
cross-linked to other databases and tools available on the Internet.

One of the most significant developments in the microarray data analysis tool
Expression Profiler (EP) is the first public release of a new web interface, together
with the completely re-designed underlying architecture. The architecture is based on
a three layer model (with web, database and application servers) centering around the
idea of web services. The extensible, pluggable XML-based framework not only
presents a uniform, easy-to-understand interface to the existing EP components, but
also enables other developers to create new ones easily. For instance, a component
has been developed that takes advantage of the EP capability to integrate with the
statistical package R and implements the ordination methods COA and PCA
(correspondence and principal component analysis, respectively) as well as the
supervised method BGA (between group analysis). An important aim of the new design
was to enable closer integration with the microarray data repository ArrayExpress,
allowing the user to seamlessly export data, obtain a histogram of the distribution,
create a series of subselections and cluster the data. Further work on helping the user
interpret clustering results is ongoing, and a clustering comparison algorithm is being
implemented as another component available in Expression Profiler.


These will be implemented in the Expression Profiler interface and integrated with
ArrayExpress (WP 11), due July 31, 2004.





P3: (work not yet started)

P4: (1 personmonth)

The vast amount of data derived from expression array experiments requires new
computational methods for its analysis. In particular, issues related to the extraction of
biological information are important for the end users. We have implemented a strategy
which is able to detect biological terms significantly associated with different gene
expression clusters by mining collections of Medline abstracts (Blaschke C, Oliveros
JC, Valencia A. 2001).

Information extraction systems tend to be rather big and complex applications because of
the enormous amount of data that has to be handled (a database with the entire Medline
can easily reach a size of 300 gigabytes). Most scientific applications therefore deal with
subsets of this data and often address problems of low generality. To overcome these
limitations the CSIC (Spanish research council) and the CNB (National Center of
Biotechnology in Madrid) agreed on a technology transfer with Alma Bioinformatica,
S.L. (a bioinformatics software company near Madrid). Under this agreement the Protein
Design Group from the CNB transfers basic information extraction and text analysis
technology to Alma Bioinformatica, which in return provides a professional software
framework to be used by academic partners of the Protein Design Group. Various
members of the group were (and still are) involved in transferring technology and
supervising implementations. This consumed a considerable part of our time but is
expected to pay off already this year.

Blaschke C, Oliveros JC, Valencia A. Mining functional information associated with
expression arrays. Funct Integr Genomics 2001 Mar;1(4):256-68.

P9: (2 personmonths)

The statistical programming language 'R' as well as several packages for accessing a
database from it and doing microarray data analysis have been installed, namely the
'ROracle' package, the 'marray*' packages and the 'affy*' packages. First analysis
steps have been tested, producing histograms and scatterplots and performing
normalizations (mean, median, loess methods) and scoring (log-ratios, t-test).
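
Two of the analysis steps named above, median normalization and scoring by log-ratio
and t statistic, can be written out in a few lines. The sketch below uses Python rather
than R and is not the code that was tested at RZPD; the intensity values are invented,
loess normalization is not shown, and the choice of a one-sample t statistic on
replicate log-ratios is an assumption, since the report does not state which form of
t-test was used.

```python
import math
import statistics


def log_ratios(red, green):
    """Log2 ratio of two channel intensities, spot by spot."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_normalise(values):
    """Shift values so that their median becomes zero."""
    centre = statistics.median(values)
    return [v - centre for v in values]


def one_sample_t(values):
    """t statistic for the hypothesis that the mean of the values is zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


# Invented intensities for three spots on a two-channel array.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]

ratios = log_ratios(red, green)
normalised = median_normalise(ratios)
print(ratios, normalised)

# Invented replicate log-ratios of a single gene.
print(one_sample_t([1.0, 2.0, 3.0]))
```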

P17: (4 personmonths)

Partner 17 is investigating expression profile analysis applying novel graph theoretical
approaches to biochip experiments. One such approach has been implemented as a freely
available software tool; the rationale behind the method has been submitted for scientific
publication.

Partner 17 has implemented two novel methods in freely available software tools. The
first, called Trixy, offers a novel approach to clustering gene expression data based on
graph theory; a manuscript describing the method has been published in BMC
Bioinformatics (http://www.biomedcentral.com/1471-2105/4/15). The second is
BzScan, a fully automated feature extraction software specifically tailored to nylon
microarrays coupled with radioactive detection. The final manuscript describing
BzScan will be submitted for publication in October.

P18: (5 personmonths)

Partner 18 developed and implemented a novel tool to determine mRNA coexpression
for the purpose of verifying putative protein-protein interactions [B2-5]. They
implemented this tool in Expression Profiler (http://ep.ebi.ac.uk/EP/PPI/) and
developed microarray normalisation tools [B2-4]. This tool has been implemented
locally (http://www.bioinformatics.med.uu.nl/).
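
The general idea of using mRNA coexpression to support putative protein-protein
interactions can be sketched as follows. This is not Partner 18's method: the use of
the Pearson correlation coefficient, the threshold of 0.8, the expression profiles and
the function names are all assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def supported_interactions(profiles, candidate_pairs, threshold=0.8):
    """Keep candidate pairs whose profiles correlate at or above the threshold."""
    return [
        (a, b)
        for a, b in candidate_pairs
        if pearson(profiles[a], profiles[b]) >= threshold
    ]


# Invented expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
candidates = [("g1", "g2"), ("g1", "g3")]

print(supported_interactions(profiles, candidates))
```

A pair of genes whose transcripts rise and fall together passes the filter, while an
anticorrelated pair is rejected.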

Continued development of novel visualisation tools for integrated analysis of mRNA
expression data with other types of functional genomic data.

P19: (2 personmonths)

As a part of the local installation of MIAMExpress, we are developing a hybridisation
centric query tool. The tool will allow MIAMExpress users to search hybridisation
data sets based on sample description, protocol details, etc. This will allow users to
review their submitted data easily, as well as to extract data from the MIAMExpress
database for their own data analysis tasks.

P20: (8 personmonths)

A prototype for a new visualisation module has been implemented in J-Express [A10-3].
The algorithm aims at expressing more of the variance present in the multidimensional
dataset in the resulting low dimensional visualisation space than standard
methods such as principal component analysis (PCA). This approach is based on the
Gradual Projection technique (A. Aszodi and W. R. Taylor) from the field of protein
structure modelling.

An extensive exploration of a technique for combined clustering and visualisation of
multidimensional data was completed, unfortunately with the conclusion that the
technique was not suitable for microarray data. The work included comparisons of
clusterings from different methods, and the knowledge acquired on this subject can be
implemented as analysis tools in the J-Express package. Current work on new
analysis methods in J-Express includes a data reliability measure estimated from an
error model in combination with replicates, in addition to class prediction and cross
validation functionality.




WP8.11


Objectives:

Integration of data analysis tools with the ArrayExpress database and query and
retrieval interfaces.


Deliverables:

Dds27  Web-interfaces for data analysis tools
Dds28  MAML compliant interfaces for the developed tools


Documentation:

A11-1: Expression Profiler entry web page (http://ep.ebi.ac.uk/)


Status, work accomplished and resources used, per partner:

P1: (8 personmonths)

Data and array descriptions can be exported as tab-delimited files or uploaded into
Expression Profiler for online analysis and visualisation. Further development of this
workpackage is closely related to the development of the programmatic query interface
for ArrayExpress (WP 8.8). The query language has now been developed, allowing
for closer integration of the tools.

P19: (work not yet started)

P20: (4 personmonths)

The natural first approach to integration is through MAGE-ML; thus the capability of
MAGE-ML import was developed and implemented in J-Express version 3.0, released on
the 1st of September 2002. The implementation relies heavily on the Java version of the
Open Source MAGE software toolkit, and considerable effort was put into further
development of critical parts in the data model (second transformation of the
BioAssayData objects, ref. OMG Specification). The developed code changes have
been submitted back to the MAGEstk project for incorporation into the publicly
available code.

As a complement to the import functionality of J-Express implemented in the MAGE
framework (see WP8.9), the framework has been extended with an interface for
export of data from J-Express, thus facilitating MAGE-ML export of analysis data
from J-Express. A prototype is implemented and is currently being tested with
J-Express. A submodule that particularly needs further development and discussion
with partners is the handling and generation of object identifiers.




WP8.12


Objectives:

Practical demonstration of the analysis pipelines that integrate the database access
and on-line analysis tools.


Deliverables:

Dds34 Publications in scientific journals

Dds35 Presentation in conferences

Dds36 Lectures in courses

Dds31 Research reports


Documentation:

Appendices B1, B2 and B3 contain documentation on activities within this
workpackage. Please note that these appendices contain examples and the most
important presentations, publications and organised meetings. Any further
documentation is available upon request.


Status, work accomplished and resources used, per partner:

P1: (8 personmonths)

We have demonstrated the successful use of gene expression data and Expression
Profiler for analysing gene regulation in the S. pombe genome, in collaboration
with Bahler's lab at the Sanger Institute (Chen et al., Molecular Biology of the Cell
(2003), 14, 214-229) [B2-2]. Two new publications that exploit datasets from R.
Young’s lab at the Whitehead Institute have been submitted (Appendices B4-B5).


The first research project in our team that will exploit data directly in ArrayExpress
has started. We have begun a three-way collaborative project with Holstege's lab in
Utrecht and Bahler's lab at the Sanger Institute to generate and compare data on the
copper response in baker's and fission yeasts. We are establishing pipelines from these
laboratories to ArrayExpress [B2-3], and the data will be analysed by our team.
Similar collaborations have started with EMBL Heidelberg. One new dataset
(stress response) from Bahler’s lab was submitted in July and an additional one
(cell cycle) is currently being submitted.


EBI has produced numerous publications and presentations; the most important ones
are listed in [B1-3].

P3: (0 personmonths)

Attended TEMBLOR/DESPRAD meetings on 25 Feb 2002 and 21-22 November
2002 in Hinxton, Cambridge. [B3-4]


Attended Standards and Ontologies for Functional Genomics (SOFG), Hinxton,
Cambridge, 17-20 November 2002 (no presentation). [B3-5]

P9: (1 personmonth)

Apart from generally following the progress made by other partners in this
workpackage, no significant effort was spent on WP8.12 by partner 9 / RZPD
during this period.




P17: (work not yet started)

P18: (2 personmonths)

Work on combinatorial genomics has resulted in one publication [B2-5] and several
presentations.

Contributed to several workshops and seminars aimed at training scientists in the use
of microarray database and analysis tools.



Conferences attended:

MGED IV, 13-16 February 2002, Hynes Convention Center, Boston, USA, Kemmeren

Array NL, Utrecht, 21st January 2002, Holstege

TEMBLOR, EBI Hinxton, 24-25th Feb 2002, Holstege

Transcriptome 2002, Seattle, 11-13 March 2002, Holstege

EMBO YIP meeting, Heidelberg, 10-12 April 2002, Holstege

EBI, 27th June 2002, Holstege

ISMB, Edmonton, Canada, Aug 3-7 2002, Lijnzaad

ArrayNL, Utrecht, 18th September 2002, Holstege

Functional Genomics meeting EMBO/EMBL, Heidelberg, 13-16 October 2002, Holstege

TEMBLOR, EBI Hinxton, 20-22 November 2002, Holstege, Kemmeren

ICSB, Stockholm, 12-15 December 2002, Kemmeren

EBI workshop datamining, 9-11 December, Kemmeren


P19: (work not yet started)


P20: (work not yet started)

Future work depends on deliverables of other workpackages that are not yet completed.





WP8.13


Objectives:

To disseminate the results of the research and development to microarray
producers and users in the European pharmaceutical and biotechnology industry, and
in academia, in order to stimulate the uptake of the results of the project. Also, to
raise awareness of the applications of the microarray database and bioinformatics
analysis for the exploitation of microarray data.


Deliverables:

Dds31 A research report every 6 months of the project.

Dds32 Organisation of meetings and workshops.

Dds29 Public project web pages.

Dds34 Attendance and presentation at scientific conferences, plus publications in
journals.

Dds37 Visits to the sites of microarray facilities (both industrial and academic) by
project personnel and visits to host institutes of the project by personnel from
microarray facilities.

Dds38 Publication of press releases describing progress and availability of new tools.


Documentation:

Appendices B1, B2 and B3 contain documentation on dissemination activities.
Please note that these appendices contain examples and the most important
presentations, publications and organised meetings.

Appendix B6 contains the 6-month report for the period February 1 to July 31, 2003.


Status, work accomplished and resources used, per partner:

P1: (3 personmonths)

The EBI has organised two meetings for the DESPRAD partners, both held in
Hinxton, UK. The first was held 21/2 2002, the second 21-22/11 2002 [B3-4].
The first SOFG conference (Standards and Ontologies for Functional Genomics) was
also organised in Hinxton 17-20/11 2002 [B3-5,6]. The first BREW (Bioinformatics
Research and Education Workshop) was organised in Hinxton 17-19/4 2002 [B3-1,2].
An EMBO practical course in microarray data analysis is being organised in Hinxton
16-22/3 2003 [B3-8]. An industry workshop, “Microarrays and data mining”, was
organised in Hinxton 10-11/12 2002 [B3-7].

The EMBO course ‘Analysis and Informatics of Microarray Data’ has been organised by
Partner 1. The course was attended by 30 participants, predominantly from EU
countries (annex B7).

A large number of conferences and workshops have been attended. [B1]

A visit to the UMC Utrecht lab was organised 21-22/10. [B3-3]

Jointly with Partner 17, we organised the MGED 6 conference (see below).


P4: (0 personmonths)

P17: (2 personmonths)

Partner 17 has submitted one scientific publication and will submit a second manuscript
by the end of January. The TAGC laboratory has taken part in both annual MGED
international meetings in Boston and Tokyo and will host MGED6, the next such
meeting, in September 2003.

Partner 17 has published one paper (EU support acknowledged) and is about to
submit another (EU support acknowledged) by October 2003. Partner 17 has also
organised MGED's 6th International Congress, 3-5 Sept 2003 in Aix-en-Provence
(http://tagc.univ-mrs.fr/mged6/), which attracted 340 participants from over 20
countries (the EU's participation in supporting members of the Organising Committee
was acknowledged: http://tagc.univ-mrs.fr/mged6/committee.html).



P19: (1.5 personmonths)

An international practical course, “EMBO Practical Course - Microarray Technology
from Genome to Proteome”, was organised in Heidelberg, 1-8/6 2002.

An international EU-funded course, ‘Microarray Technology: from Genome to
Proteome’, was organised and held at EMBL from June 1 until June 8, 2003. The
course covered all fields of microarray production, evaluation and data mining. The
concepts of MIAME, MIAMExpress and ArrayExpress were presented in talks.
Working with MIAMExpress and ArrayExpress was explained through practical
workshop sessions.

Group members attended the MGED V and MGED VI conferences as well as project
meetings. The meetings were used to exchange information and ideas with project
collaborators as well as with other specialists in the field.


P20: (1 personmonth)

The roles of MAGE-ML, ArrayExpress and MIAMExpress have been advocated in
discussions within the Norwegian Microarray Consortium.

The MAGE standard and the concepts of MAGE-ML, MAGE-OM and MAGEstk have
been disseminated at the Norwegian Microarray Consortium's national meeting,
March 17-18, and the Bioinformatics Forum for Young Scientists, March 22-24, 2003
(organised by UiB), through presentations, posters and discussions. An additional
topic in the discussions was data submission to ArrayExpress and the related
requirements of selected journals.






3. CONTRIBUTION OF THE PARTICIPANTS




P1: EMBL/EBI

In total, 145 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


No significant problems encountered.


P3: University of Cambridge

In total, 5 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


The researcher funded under the project, Dr. Debashis Rana, has spent their first 2
months familiarising themselves with MIAME and MAML requirements,
successfully installing the MIAMExpress tool locally, learning how to use it,
understanding our local data production process and assembling preliminary data sets
for use with MIAMExpress. During this process they have made contact with
counterparts at the EBI, and early discussions have taken place about how to
customise MIAMExpress as a tool to improve the curation of Drosophila gene
expression microarray experiments.

Progress has been slow since the start of the project, due to the difficulty of
identifying a suitable candidate and delays in the hiring process due to the need to
obtain a work permit. As a result the successful candidate, Dr. Debashis Rana, joined
us on 4th December 2002 and therefore has contributed only 2 months' work to date. As
a consequence of this late start we have been unable to participate as fully as we had
hoped in some of the earlier-action workplans (WP8.2, WP8.4), but before Debashis
started we succeeded in running an earlier release of the ArrayExpress database
schema locally, under MySQL rather than Oracle (WP8.5, WP8.6), which will
facilitate future development and testing work. Where other missed work packages
are still timely we will try to catch up (WP8.3, WP8.6, WP8.9) and, for instance, it
will be a priority to address WP8.3 and WP8.6 by working to make MIAMExpress a
better tool for annotation of Drosophila microarray experiments. Likewise, although
due to the hiring delay we have missed Dds22 of WP8.9, input on queries is still
timely and we will shortly start to address this.

Due to progress that has already been made by other partners, the delay in hiring, and
re-assessment of the effort required in generating MIAME-compliant data from our
production workflow, we estimate that the bulk of our efforts will now be directed
towards WP8.7 (10 months), backed up by efforts on WP8.3 (4 months), WP8.6 (5
months), WP8.9 (2 months), and WP8.12 (4 months).


Due to a delay in hiring, P3 was unable to contribute to deliverable Dds22 of WP8.9
but understands that input to the list of test queries would still be beneficial and will
address this shortly.

There is no change of scientific team, other than that the funded post has been filled
by Dr. Debashis Rana.




The bulk of the effort in the last 6 months has been towards WP8.7. Dr. Debashis
Rana started in December 2002 and has been working on organising internal data:
tools have been developed that allow us to deposit array descriptions in ArrayExpress
and to reformat other internal data for use in MIAMExpress. We have proposed
changes to the ADF file format to allow the recording of PCR production status for
microarray elements and to deal with correctly recording different types of 'empty'
wells. The array design specification has been deposited in ArrayExpress and much
work has been done to curate, in consultation with the actual experimenters, the
details of experiments carried out in our facility into the version of MIAMExpress
that we have installed locally. The nature of these experiments requires
MIAMExpress to allow multiple files to be uploaded as one set, and this functionality
has just been made available in MIAMExpress. Therefore we expect in the next
couple of months to have brought up to date the set of spotted microarray experiments
due for submission to ArrayExpress.


We have found this workpackage to be more complex than we anticipated, partly due
to the need to deal with Affymetrix data (not yet started) as well as in-house spotted
microarray experiments. As we can now be sure that MIAMExpress will
accommodate our needs, we request permission for the planned effort towards
WP8.6 (5 months), which was to have been directed towards direct deposition of
data from our LIMS system, to be transferred instead to WP8.7.


P4: CNB

In total, 4.5 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


No significant problems encountered.


P9: RZPD

In total, 26 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


No significant problems encountered.


P17: CIML

In total, 44 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


Partner 17 has been contributing to DESPRAD through investment in four main
practical projects. First and foremost, the team has been building a local LIMS database
system called ELOGE to collect, store and distribute microarray data produced by the
laboratory’s technical wet lab platform (WP8.1-6, WP8.7 & WP8.9). The ELOGE
project is a collaborative effort with IPSOGEN SAS, a start-up firm involved in
industrial technology transfer of TAGC’s microarray platform. This software system is
designed specifically to address the needs of a particular subtype of biochip used in the
laboratory, namely nylon-supported microarrays with radioactive detection. ELOGE
consists of a relational database implemented under the PostgreSQL RDBMS, the
relational schema being largely based on the conceptual organisation of the
ArrayExpress database. The ELOGE database schema is mostly stable, with
modifications now limited to additions of new or missing data fields required by the
MIAME standard. Development during the past year has focused on implementing a
web interface to ELOGE allowing laboratory personnel to populate the local database
with their experimental data. A beta version of this interface is available and about to be
deployed in the TAGC laboratory. The second aspect of the ELOGE project has been to
start developing a specific module for exporting a whole experiment’s data in the newly
established MAGE-ML format. Once experimental data can be collected by the web
interface, stored in the local database and exported in MAGE-ML, a TAGC data pipeline
to ArrayExpress can be established for published data.

Partner 17’s second project concerns the annotation and conceptual design of the
cDNA-based microarrays used in the laboratory (WP8.1, WP8.3 & WP8.7). The current
procedure for cDNA probe selection is unsatisfactory in that it leaves many ambiguities
as to the exact nature of the probes deposited on the chips, especially concerning the
probe position inside the gene structure (UTR or CDS) and which alternative
transcripts are being probed by each specific cDNA clone. The C2 project therefore aims
at precise annotation of the microarray cDNA probes by projecting EST data onto
genomic sequences, allowing our experimental data to be annotated to the level proposed
by MIAME and captured by the MAGE-ML format (cDNA clone identification number
linked to Ensembl transcript identification, itself linked to Gene Ontology terms). A first
web interface to this C2 annotation tool is expected to be available in the course of the
second half of 2003.

TAGC’s third project is the development of automatic image analysis software with
stringent quality control for the quantification of experimental microarray
hybridisation results (WP8.1, WP8.3 & WP8.7). Discussions by MGED participants
concerning MIAME and data normalisation methods have led to the conclusion that
robust quality metrics need to be associated with image-derived features if valid
expression profile analyses are to ensue. Our team has produced a specific software tool
called BZscan, which applies a novel rationale for spot quantification of hybridisation
images. This method is particularly interesting in its ability to assign a robust quality
metric to each probe intensity, as well as to compensate for technical artefacts such as
signal saturation and spot overshining, which have so far severely hampered microarrays
based on radioactive detection. BZscan is freely available and our industrial partner
IPSOGEN is already showing much interest in incorporating this software in its
biochip-based clinical diagnosis products. A scientific publication describing BZscan
was also submitted in January 2003 (manuscript attached), including acknowledgment
that the project was supported by the EU TEMBLOR grant.

The fourth project is the conception of a novel gene expression profile analysis method
(WP8.10). The technique is based on graph-theoretical concepts applied to gene
coexpression graphs and is able to cluster genes or samples with similar profiles with
high stringency and low false positive rates. The concept has been implemented in the
Trixy software tool, which is freely available, and the method was submitted for
scientific publication in December 2002 (manuscript attached), including
acknowledgment that the project was supported by the EU TEMBLOR grant.
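
To make the idea of clustering on a coexpression graph concrete, the sketch below
thresholds pairwise similarities into a graph and reports its connected components.
This is not the Trixy method, whose details are not given in this report; connected
components are used here only as the simplest graph-based grouping. The function
name, the threshold value and the example similarities are all hypothetical.

```python
def coexpression_clusters(genes, similarity, threshold=0.9):
    """Group genes into connected components of a thresholded graph.

    similarity maps (gene_a, gene_b) pairs to a coexpression score; an edge
    is drawn only when the score reaches the threshold (assumed value).
    """
    neighbours = {gene: set() for gene in genes}
    for (a, b), score in similarity.items():
        if score >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters = []
    seen = set()
    for gene in genes:
        if gene in seen:
            continue
        # Depth-first traversal collects everything reachable from this gene.
        stack = [gene]
        seen.add(gene)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


# Hypothetical similarities between four genes.
genes = ["g1", "g2", "g3", "g4"]
similarity = {
    ("g1", "g2"): 0.95,
    ("g2", "g3"): 0.92,
    ("g3", "g4"): 0.40,
    ("g1", "g4"): 0.10,
}
clusters = coexpression_clusters(genes, similarity)
```

Raising the threshold makes the graph sparser and the clusters smaller, which is one
simple way of trading sensitivity for stringency.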

Finally, partner 17 is hosting the MGED 6th international meeting, which will be held
3-7 September 2003 in Aix-en-Provence (WP8.13). Topics presented to the scientific
community will cover MIAME (in particular in relation to scientific journal editors),
MAGE-ML, ontologies, and data normalisation and analysis strategies.




Bottlenecks experienced by partner 17 involve two unforeseen difficulties: on the one
hand populating their own local LIMS database with data, and on the other hand
implementing the MAGE-ML format exporter in the LIMS. Deploying such a LIMS
can be seen as replacing traditional paper lab books with a digital lab book, and as such
is more a revolution in routine laboratory practices than a straightforward evolution. The
motivation for experimentalists producing microarray data to use a digital LIMS will be
the requirement, for publication of results, to have submitted MIAME-class data to
public archives (an issue to be debated at the forthcoming MGED6 conference). Entering
data in such a normalised system also requires revisiting laboratory protocols and, where
appropriate, developing the necessary software tools to allow seamless LIMS data
collection (viz. cDNA clone annotation and image feature extraction). The second issue,
MAGE-ML data export, is linked both to having to retrofit missing or incomplete data
fields required by MIAME in the local LIMS and to the continuing maturation of the
MAGEstk software toolkit. The latter should reach stability over the next period,
allowing completion of the LIMS-to-ArrayExpress pipeline.



P18: UMCU

In total, 27 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


Activities in summary:

MIAME and MAGE-ML:

The scientific team at the UMC Utrecht has contributed to the development and
dissemination of the MIAME proposals and the MAGE-ML format in the following
ways:

Contributing to the proposal and final version

Writing to journals and speaking to journal representatives

Testing MIAME compliance and MAGE-ML formatting of various databases (see
below)

Promoting both MIAME and MAGE-ML at meetings


Database:

Work on the microarray database has consisted of implementing four different
microarray databases (all based on the EBI standard) and testing these together in
order to determine their suitability for data upload and curation, analysis, and
automated uploading to ArrayExpress, taking into account the MIAME standards and
MAGE-ML format. This comparison has resulted in adopting GeNet as the preferred
local database. Work is now in progress to make the database fully MIAME-compliant
and capable of exporting data in MAGE-ML.


Data:

Expression profiling datasets (yeast and human) have been successfully submitted to
ArrayExpress (MIAME-compliant and in MAGE-ML). The submission also included
array descriptions and comprehensive protocols and was part of a scientific
publication. More data will be submitted to ArrayExpress in 2003. Work is now in
progress to streamline and automate the MIAME-compliant, MAGE-ML formatted
upload of data from our local database to ArrayExpress.


Analysis tools:



The scientific team has worked on an analysis method that uses mRNA coexpression
determinations to validate putative protein-protein interactions. This method and the
first results have been published. The software has been implemented locally and in
Expression Profiler.
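
The general idea of using coexpression as supporting evidence for a putative
interaction can be sketched as follows. This is an illustration only, not the published
method: the use of the Pearson correlation, the threshold of 0.8 and all names and
values in the example are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return sxy / math.sqrt(sxx * syy)


def supported_interactions(pairs, profiles, threshold=0.8):
    """Keep putative pairs whose mRNA profiles correlate above the threshold."""
    return [
        (a, b)
        for a, b in pairs
        if a in profiles and b in profiles
        and pearson(profiles[a], profiles[b]) >= threshold
    ]


# Hypothetical expression profiles over four conditions.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
}
putative = [("A", "B"), ("A", "C")]
supported = supported_interactions(putative, profiles)
```

In this toy example only the pair (A, B) is retained, because the profiles of A and C
are anticorrelated.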

We have also worked on normalisation methods. This has resulted in a publication.
The software is available locally as a web-based tool and as a package in R
(http://www.genomics.med.uu.nl/pub/jvp/ext_controls/).

We are presently working on further improving normalisation and increasing the
scope of the combinatorial genomics approach.


No significant problems encountered.



P19: EMBL-HD

In total, 13 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


All tasks as described in the project workplan have been completed as scheduled.

Several microarray gene expression studies were used to model, verify and improve the
basic data structures and ontologies in close interaction with the partners of the project,
mainly partner 1. Based upon these results, the MAML and ArrayExpress database
structures were finalised.

The created datasets were loaded into ArrayExpress and served as test cases for
validation of the database structure and upload tools.

The concepts and principal ideas of the MGED initiative were presented in an
international practical course, “EMBO Practical Course - Microarray Technology
from Genome to Proteome”, which was organised at partner 19.


No significant problems encountered.


P20: University of Bergen

In total, 19 personmonths have been devoted to the project, documented in the
previous section for each workpackage separately.


At this early stage of the project, most of the time has been spent on education and
development, in particular on the subject of MAGE concepts related to WP8.2 and
WP8.11, and new algorithms in the developed prototype of WP8.10.

The MGED IV meeting in Boston and the MGED Sourceforge website and mailing
lists have been very valuable sources of information in this phase. In the process of
implementing the MAGE-ML import functionality of J-Express (WP8.11), Dr.
Petersen got involved in development of the MAGE software toolkit (Java version).
Having a hands-on example that needs further development proved to be an excellent
way of learning the MAGE object model at a more detailed level.

A new approach to the clustering and visualisation of gene expression data has been
developed and implemented as a prototype module for J-Express. Methods and
experience from earlier work on protein structure modelling are adapted and applied to
microarray data (WP8.10). The approach uses the internal distances between the
objects under analysis (genes or arrays representing, for example, tissues) and allows
one to assign different weights to each distance. The experiments performed are
showing promising results. The algorithm is being modified to tailor the method to
gene expression data.
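
One common way to score how well a low-dimensional layout preserves a set of weighted
internal distances is a weighted stress function. The sketch below computes such a
score; it is offered only as an illustration of the kind of objective that
distance-based projection methods work with, and is not the Gradual Projection
algorithm or the J-Express implementation. The function name and the example data
are hypothetical.

```python
import math


def weighted_stress(coords, target, weights):
    """Sum over object pairs of w_ij * (d_ij - D_ij)^2.

    coords  : list of points in the low-dimensional space
    target  : matrix of original (high-dimensional) distances D_ij
    weights : matrix of per-distance weights w_ij
    Only the upper triangle (i < j) is read.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            embedded = math.dist(coords[i], coords[j])
            total += weights[i][j] * (embedded - target[i][j]) ** 2
    return total


# Three objects placed on a 3-4-5 right triangle in two dimensions.
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
exact = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
stress_exact = weighted_stress(coords, exact, uniform)

# Same layout scored against a target where one distance is off by 1
# and that distance is given double weight.
perturbed = [[0, 3, 4], [3, 0, 6], [4, 6, 0]]
emphasised = [[1, 1, 1], [1, 1, 2], [1, 2, 1]]
stress_perturbed = weighted_stress(coords, perturbed, emphasised)
```

A layout that reproduces every target distance exactly has zero stress; increasing a
weight makes a mismatch in that particular distance count for more.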

Both of the above tasks involve implementation of Java code that, in addition, has to
be tailored to integrate seamlessly with already developed Java code (J-Express and
MAGEstk). This work has involved both a transition in programming language and
intensive study of code written by others.

Dr. Petersen has worked with data providers in the Norwegian Microarray
Consortium to help them store their data in the laboratory information management
system (LIMS) BASE. This work benefits from Dr. Petersen’s knowledge of MAGE
obtained when implementing MAGE-ML import functionality for J-Express. Dr.
Petersen has been responsible for the installation and maintenance of a BASE server
at the University of Bergen.

The project benefits from the close collaborations the bioinformatics group has with
experimental groups in Bergen, including the groups of Prof. Kalland and Prof.
Vasstrand. These groups are performing cancer studies using microarray technology
and the group assists in experimental design, handling of data, and data analysis. Dr.
Jonassen is supervising two PhD students interacting closely with these groups (Bø
and Dysvik) and advocates adoption of MIAME standards in data representation as
well as submission to ArrayExpress.


No significant problems encountered.




4. PROJECT MANAGEMENT AND CO-ORDINATION



Three meetings have been held for the DESPRAD partners. The first two were held in
Hinxton, on 21/2 2002 and 21-22/11 2002; the third meeting was held on 2/9 2003.



Use of resources so far per workpackage and partner (used first year/total planned for
duration of project, all figures in personmonths):


WP     P1        P3      P4       P9      P17     P18     P19     P20     Sum
8.1    9/12      -       1/3      -       3/4     2/2     -       -       15/19
8.2    9/10      0/1     -        -       3/4     1/1     1/1     1/1     12/18
8.3    10/12     0/4     1.5/3    -       5/10    3/2     1/4     -       21.5/51
8.4    9/12      0/4     -        3/4     5/9     1/1     1/4     -       19/34
8.5    30/40     0/6     -        -       3/4     -       -       -       33/50
8.6    15/34     0/5     1/2      -       4/6     3/4     1.5/2   -       25.5/53
8.7    10/40     5/8     0/4      20/24   10/16   6/10    5/10    -       24/96
8.8    30/30     -       -        -       -       -       -       -       30/30
8.9    15/20     0/2     -        -       7/10    4/6     -       2/4     28/42
8.10   16/28     0/3     1/2      2/5     4/8     5/6     2/6     8/16    38/74
8.11   8/24      -       -        -       -       -       0/2     6/4     14/30
8.12   8/20      0/4     0/3      1/4     0/6     2/4     0/4     0/4     11/49
8.13   3/6       -       0/1      -       2/4     -       1.5/3   1/1     7.5/15
Sum    145/288   5/37    4.5/18   26/37   44/81   27/36   13/36   19/30   285.5/563
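
The "used/planned" figures in the table above can be tallied with a few lines of code.
The sketch below sums one row; the function name and the treatment of empty cells
(skipped) are assumptions for this example, and the row shown is the WP8.10 row as
reported.

```python
def tally(cells):
    """Sum 'used/planned' cells of one table row, skipping empty cells."""
    used = 0.0
    planned = 0.0
    for cell in cells:
        if not cell or cell == "-":
            continue  # partner not involved in this workpackage
        used_text, planned_text = cell.split("/")
        used += float(used_text)
        planned += float(planned_text)
    return used, planned


# WP8.10 row, partners P1, P3, P4, P9, P17, P18, P19, P20.
wp8_10 = ["16/28", "0/3", "1/2", "2/5", "4/8", "5/6", "2/6", "8/16"]
used, planned = tally(wp8_10)
```

For this row the tally gives 38 used out of 74 planned personmonths, in agreement
with the Sum column.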


5. EXPLOITATION AND DISSEMINATION ACTIVITIES


No exploitation activities.

Dissemination activities are described in detail in WP8.13, with full documentation in
appendices B1, B2 and B3.


6. ETHICAL ASPECTS AND SAFETY PROVISIONS

No ethical or safety issues arose during the period.


7. MID-TERM REVIEW

The mid-term review for TEMBLOR has been set for October 20-21, 2003 at the EBI.
DESPRAD subprojects met on September 2, 2003, see annex.


8. PLANS FOR THE NEXT REPORTING PERIOD

The plans and objectives for the coming period remain as described in the technical
annex.



9. REQUESTS TO THE COMMISSION

Request from partner 3


Dr. Gos Micklem, Department of Genetics, University of Cambridge


The bulk of the effort in the last 6 months has been towards WP8.7. Dr. Debashis
Rana started in December 2002 and has been working on organising internal data:
tools have been developed that allow us to deposit array descriptions in ArrayExpress
and to reformat other internal data for use in MIAMExpress. We have proposed
changes to the ADF file format to allow the recording of PCR production status for
microarray elements and to deal with correctly recording different types of 'empty'
wells. The array design specification has been deposited in ArrayExpress and much
work has been done to curate, in consultation with the actual experimenters, the
details of experiments carried out in our facility into the version of MIAMExpress
that we have installed locally. The nature of these experiments requires
MIAMExpress to allow multiple files to be uploaded as one set, and this functionality
has just been made available in MIAMExpress. Therefore we expect in the next
couple of months to have brought up to date the set of spotted microarray experiments
due for submission to ArrayExpress.


We have found this workpackage to be more complex than we anticipated, partly due
to the need to deal with Affymetrix data (not yet started) as well as in-house spotted
microarray experiments. As we can now be sure that MIAMExpress will
accommodate our needs, we request permission for the planned effort towards
WP8.6 (5 months), which was to have been directed towards direct deposition of
data from our LIMS system, to be transferred instead to WP8.7.



SECTION III: SCHEMATIC DESCRIPTION OF THE PROJECT



Overall objectives of the project:

TEMBLOR will be a new-generation bioinformatics project, centred on an integrated
layer for the exploitation of genomic and proteomic data (Integr8) by drawing on
databases maintained at major bioinformatics centres in Europe, and by creating
important new resources for protein-protein interaction (IntAct), structural (EMSD)
and microarray (DESPRAD) data. Integr8 will enable text-, structure- and
sequence-based searches against a gene-centric view of all completed genomes. This
section is aimed at developing ArrayExpress, a public repository for microarray data,
and the standards and ontologies needed to describe, exchange and store microarray
data: experiments, protocols and array designs. Also, software tools for querying the
database, and for curation and submission of data, will be developed. Analysis tools,
stand-alone or integrated with the database, are also goals for this project.


Experimental approach and working method:

ArrayExpress is an Oracle-based database, implementing the MIAME and MAGE
standards developed by the MGED society. It accepts submissions of arrays,
experiments or protocols by MAGE-ML or through the web-based submission tool
MIAMExpress, built as an open source project using Perl-CGI and MySQL. The query
interface is implemented via Java servlets using Tomcat and Velocity.

The work on standards and ontologies for microarray data is coordinated through the
MGED society.

The analysis tools are developed to be easily integrated with ArrayExpress and to
conform to the standards developed within the project.


Achievements and results to date:




MIAME has been accepted by most scientific journals as a requirement

MAGE standards have been adopted by OMG, first tools implemented

ArrayExpress is online and populated with over 1000 hybridisations

Several data submission pipelines from partners established

MIAMExpress online submission and annotation tool is functional

The core ontology for microarray data has been finalised

Online data analysis tool for ArrayExpress functional


The two most relevant publications emanating from the project:

A Brazma, P Hingamp, J Quackenbush, G Sherlock, P Spellman, C Stoeckert, J Aach,
W Ansorge, C A Ball, H C Causton, T Gaasterland, P Glenisson, F C P Holstege, I F
Kim, V Markowitz, J C Matese, H Parkinson, A Robinson, U Sarkans, S
Schulze-Kremer, J Stewart, R Taylor, J Vilo & M Vingron. Minimum information about
a microarray experiment (MIAME): toward standards for microarray data.
Nature Genetics, vol 29 (December 2001), pp 365-371.

A. Brazma, H. Parkinson, U. Sarkans, M. Shojatalab, J. Vilo, N. Abeygunawardena,
E. Holloway, M. Kapushesky, P. Kemmeren, G.G. Lara, A. Oezcimen, P. Rocca-Serra
and S. Sansone. ArrayExpress: a public repository for microarray gene
expression data at the EBI. Nucleic Acids Research, 2003, 31: 68-71.

