The Functional Genomics Experiment Model (FuGE)


Dec 13, 2013




Andy Jones


School of Computer Science and Faculty of
Life Sciences, University of Manchester

History


Data sharing for ’omics data has been tackled by various groups:


MAGE format for microarrays (MGED 2002)


PEDRo for proteomics (U. Man 2003)


Problems for functional genomics:


Common parts modelled differently


Labs performing both techniques must create two complex
applications to describe similar concepts


Difficult to integrate data


Two efforts to merge MAGE and PEDRo (2004)


Merged models even more complex


Did not cover other techniques, e.g. metabolomics


But there are significant advantages if upstream details need to be
described only once!

Introduction to FuGE


Functional Genomics Experiment model
(FuGE)


Models common components across functional
genomics experiments


Sample description, experimental variables,
protocols, multidimensional data

Three uses of FuGE:

1. A data format for representing laboratory workflows

2. A way to supplement existing data formats with additional metadata
describing their context within a workflow

3. A framework for building new data formats

[Figure: FuGE package structure. Common: Audit, Measurement, Ontology, Protocol, Reference. Bio: Investigation, Data, Material, ConceptualMolecule.]

Common:


General data format management


Auditing


Referencing external resources


Protocols

Bio:


Investigation structure


Data


Materials (organisms, solutions,
compounds)


Theoretical molecules, e.g.
sequences or metabolites stored in
a database

FuGE structure

Description

FuGE exists as:

1. Object model (UML)

2. XML Schema

...and a Java software toolkit (STK), a Hibernate relational database binding, etc.
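Because the XML Schema is derived from the same object model as the UML, each class maps naturally onto an element. The sketch below illustrates that idea only: the classes `Investigation` and `Material`, the `MaterialCollection` element and the `to_xml` helper are hypothetical simplifications, not the real FuGE schema or the Java toolkit's API.

```python
# Hypothetical sketch: serialising a tiny FuGE-like object model to XML.
# Class and element names are illustrative assumptions, not the FuGE schema.
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Material:
    identifier: str
    name: str


@dataclass
class Investigation:
    identifier: str
    name: str
    materials: list[Material] = field(default_factory=list)


def to_xml(inv: Investigation) -> ET.Element:
    """Map each object to an element and each simple field to an attribute."""
    root = ET.Element("Investigation", identifier=inv.identifier, name=inv.name)
    collection = ET.SubElement(root, "MaterialCollection")
    for m in inv.materials:
        ET.SubElement(collection, "Material", identifier=m.identifier, name=m.name)
    return root


inv = Investigation("INV_1", "Stress response study",
                    [Material("MAT_1", "Yeast culture")])
print(ET.tostring(to_xml(inv), encoding="unicode"))
```

Generating both the schema and bindings such as the relational mapping from one model is what keeps the different representations consistent.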

Use 1: Experiment Workflow

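[Figure: an experiment workflow. Materials are processed by successive Treatments into new Materials; a Data Acquisition step then produces Data from a Material, and a Data Transformation step turns that Data into further Data. Legend: arrows denote inputs and outputs; each Treatment, Data Acquisition or Data Transformation step is a ProtocolApplication.]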
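A workflow of this shape is a graph in which each ProtocolApplication links input items to output items, so the provenance of any Data item can be recovered by walking backwards. This is a minimal sketch assuming simplified classes; the field names and the `upstream` helper are hypothetical, not part of the FuGE toolkit.

```python
# Hypothetical sketch: the Use 1 workflow as ProtocolApplications that link
# inputs to outputs. Class names echo the slide; fields and the upstream()
# helper are illustrative assumptions, not the FuGE toolkit's API.
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Material:
    name: str


@dataclass(frozen=True)
class Data:
    name: str


Item = Union[Material, Data]


@dataclass
class ProtocolApplication:
    protocol: str                          # e.g. "Treatment", "Data Acquisition"
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def upstream(item: Item, applications: list) -> set:
    """Walk backwards through every ProtocolApplication that produced `item`,
    collecting all Materials and Data it was derived from."""
    found: set = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for app in applications:
            if current in app.outputs:
                for source in app.inputs:
                    if source not in found:   # the set also guards against cycles
                        found.add(source)
                        stack.append(source)
    return found


# Two samples are treated, pooled, measured, and the raw data transformed.
a, b = Material("sample A"), Material("sample B")
ta, tb = Material("treated A"), Material("treated B")
pooled = Material("pooled sample")
raw, processed = Data("raw data"), Data("processed data")
apps = [
    ProtocolApplication("Treatment", [a], [ta]),
    ProtocolApplication("Treatment", [b], [tb]),
    ProtocolApplication("Treatment", [ta, tb], [pooled]),
    ProtocolApplication("Data Acquisition", [pooled], [raw]),
    ProtocolApplication("Data Transformation", [raw], [processed]),
]
print(sorted(x.name for x in upstream(processed, apps)))
# ['pooled sample', 'raw data', 'sample A', 'sample B', 'treated A', 'treated B']
```

Modelling every step with the same ProtocolApplication structure is what lets one traversal cover wet-lab treatments and computational transformations alike.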

Use 2: Tie Together External Formats

[Figure: a ProtocolApplication takes a Material as its inputMaterial and produces ExternalData as its outputData; the ExternalData points to an mzData file and to its file format definition.]

A parser will exist to extract data and parameters from the mzData file.

The Material can be used to describe the sample. This connects the MS data with a separation workflow.
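The linkage in this diagram can be sketched as a record that pairs a described sample with a pointer to the external file. The fields, the file name `run_012.mzData`, the `urn:example:` format URI and the `sample_for_file` helper are all hypothetical; real FuGE documents carry richer structure.

```python
# Hypothetical sketch of Use 2: a ProtocolApplication ties a sample (Material)
# to an external mzData file (ExternalData). Field names, the file name and the
# format URI are illustrative assumptions, not FuGE's exact schema.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Material:
    name: str
    description: str


@dataclass(frozen=True)
class ExternalData:
    location: str      # where the external file lives
    file_format: str   # pointer to the file format definition


@dataclass(frozen=True)
class ProtocolApplication:
    protocol: str
    input_material: Material    # plays the role of inputMaterial
    output_data: ExternalData   # plays the role of outputData


def sample_for_file(location: str, applications: list) -> Optional[Material]:
    """Find the sample described for a given external data file, if any."""
    for app in applications:
        if app.output_data.location == location:
            return app.input_material
    return None


spot = Material("gel spot 12", "band excised after 2D gel separation")
ms_run = ProtocolApplication(
    "MS acquisition", spot,
    ExternalData("run_012.mzData", "urn:example:mzData-format"))
print(sample_for_file("run_012.mzData", [ms_run]).name)  # gel spot 12
```

The mzData file itself stays untouched; only the wrapper records where it came from, which is how FuGE can add context to an existing format.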

Use 3: Build extension data formats
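Extension formats reuse FuGE's generic classes and add technology-specific detail on top. The sketch below approximates that with class inheritance; the base `Material` is a simplified stand-in, and `Gel` with its fields is invented for illustration rather than taken from GelML or any other real extension.

```python
# Hypothetical sketch of Use 3: a new format extends a generic FuGE-style class.
# The base class is simplified; Gel and its fields are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Material:
    """Generic material shared by every extension format."""
    identifier: str
    name: str
    characteristics: dict = field(default_factory=dict)


@dataclass
class Gel(Material):
    """Technology-specific extension adding gel-only attributes."""
    percent_acrylamide: float = 12.0
    dimensions: int = 2


def describe(material: Material) -> str:
    """Generic code written against the base class also accepts extensions."""
    return f"{material.identifier}: {material.name}"


gel = Gel("GEL_1", "2D gel", percent_acrylamide=10.0)
print(describe(gel))  # GEL_1: 2D gel
```

Tools that understand only the shared classes can still read the shared part of any extension, which is the basis of the integration benefit claimed for FuGE.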

FuGE Status


Milestone 1 (Sept 2005)


Milestone 2 (Dec 2005)


Milestone 3 (May 2006)


Beta Java software toolkit


M2 (March 2006); M3 (Sept 2006)


FuGE v1 (candidate)


Currently in PSI standards process


Expected to emerge stable from the process by March/April 2007

Formats extending from FuGE


MAGE version 2 (MGED)


GelML and GelInfoML (PSI)


analysisXML (PSI)


spML (PSI / MSI)


NMR
(FuGE being evaluated by MSI)


Planned migration for mzData and other PSI formats


Upstream workflow description for all groups


investigation structure and variables, sample description etc.


Allows studies that cross technology boundaries to be assembled in
one data format

Conclusions


FuGE accepted by MGED, PSI and MSI


for developing future data formats


for describing parts of experiments common across technologies


Moving toward convergence of data formats


Simplifies the process of developing new data standards


Will facilitate data integration and submission of data to
public repositories


Improves the uniformity of data sets in public repositories,
thus facilitating querying

Web:
http://fuge.sourceforge.net/

Acknowledgements


FuGE development


Angel Pizarro (UPenn), Michael Miller (Rosetta), Paul Spellman
(Lawrence Berkeley)


MGED, PSI, Fred Hutchinson CRC, Genologics


PSI


Chris Taylor, Henning Hermjakob, Randy Julian


MSI


Nigel Hardy and Helen Jenkins (Aber)



Work on FuGE in Manchester is funded by the BBSRC

Email:
ajones@cs.man.ac.uk
