Hunting strategy of the bigcat.

Oct 2, 2013 (3 years and 8 months ago)

BiGCaT Bioinformatics

Hunting strategy of the bigcat

BiGCaT,

bridge between two universities

Universiteit Maastricht

Patients, Experiments, Arrays and Loads of Data

TU/e

Ideas & Experience in Data Handling


BiGCaT

Major Research Fields

Cardiovascular Research

Nutritional & Environmental Research

BiGCaT

What are we looking for?







Different conditions
show different levels
of gene expression
for specific genes

Differences in gene expression?

Between e.g.:



healthy and sick



different stages of disease progression



different stages of healing



failed and successful treatment



more and less vulnerable individuals


Shows:



important pathways and receptors



which then can be influenced





The transfer of information

from DNA to protein.

From: Alberts et al. Molecular Biology of the Cell, 3rd edn.

Eukaryotic genes

in somewhat more detail



Gene expression measurement

Functional genomics/transcriptomics:


Changes in mRNA


Gene expression microarrays


Suppression subtraction libraries





Proteomics:


Changes in protein levels


2D gel electrophoresis


Antibody arrays





DNA


mRNA



protein

Gene expression arrays

Microarrays: relative
fluorescence signals.
Identification.

Macroarrays: absolute
radioactive signal.
Validation.

Layout of a microarray experiment

1) Get the cells

2) Isolate RNA

3) Make fluorescent cDNA

4) Hybridize

5) Laser read out

6) Analyze image


The cat and its prey: the data

Comprises:


Known cDNA sequences (not known genes!)
on the array = reporters


Data sets typically contain 20,000 image spot
intensity values in 2 colors


One experiment often contains multiple data
points for every reporter (e.g. times or
treatments)


Each datapoint can (should) consist of multiple
arrays
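
The data layout described above can be sketched in a few lines. This is a minimal illustration, not the group's actual pipeline: it assumes each array is held as a mapping from reporter id to a (Cy5, Cy3) intensity pair, that the relative signal is expressed as a log2 ratio, and that replicate arrays are combined by a plain mean. The reporter ids and intensities are made-up placeholders.

```python
import math
from statistics import mean


def log2_ratio(cy5, cy3):
    """Log2 of the Cy5/Cy3 intensity ratio for one spot."""
    return math.log2(cy5 / cy3)


def average_replicates(arrays):
    """Average per-reporter log2 ratios over replicate arrays.

    `arrays` is a list of dicts mapping reporter id -> (cy5, cy3).
    Only reporters present on every array are returned.
    """
    shared = set(arrays[0])
    for array in arrays[1:]:
        shared &= set(array)
    return {
        reporter: mean(log2_ratio(*array[reporter]) for array in arrays)
        for reporter in sorted(shared)
    }


# Two replicate arrays for one data point (placeholder values).
replicates = [
    {"rep_A": (800.0, 200.0), "rep_B": (150.0, 300.0)},
    {"rep_A": (400.0, 100.0), "rep_B": (100.0, 400.0)},
]
averaged = average_replicates(replicates)
```

Working on log ratios keeps up- and down-regulation symmetric around zero, which is why the mean is taken after the log transform in this sketch.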


Bioinformatics should translate this into useful
biological information

Hunting

Comprises:


Analyze reporters


Data pretreatment


Finding patterns in expression


Evaluate biological significance of
those patterns

Reporter analysis


Reporter sequence must be known
(can be sequenced using digest electrophoresis).


Look up sequence in genome databases
(e.g. GenBank/EMBL or Swissprot)


Will often find other RNA experiments
(ESTs) or just chromosome location.

Blast reporters against what?


Nucleotide databases (EMBL/GenBank)

Disadvantages: many hits, best hit on
clone, we actually want function (i.e.
protein)


Nucleotide clusters (Unigene)

Disadvantage: still no function


Protein databases (Swissprot+trEMBL)

Disadvantages: non-coding sequence
not found, frameshifts in clones

Two implemented solutions


Start with Unigene (from Blastn or
platform provider), mine using
SRS (direct, through PDB, through
PIR) -> Swissprot/trEMBL


Use dedicated EMBL-Swissprot X-linked DB
(Blast against EMBL subset, get Swissprot/trEMBL)
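
The first solution amounts to following cross-references from a Unigene cluster to a protein entry, trying several routes in turn. The sketch below illustrates only that idea with in-memory dictionaries. It does not call SRS, and every identifier and table in it is a hypothetical placeholder.

```python
def via(*tables):
    """Build a lookup that follows an id through a chain of link tables."""
    def lookup(identifier):
        for table in tables:
            identifier = table.get(identifier)
            if identifier is None:
                return None
        return identifier
    return lookup


def find_protein(unigene_id, routes):
    """Try each route in order; return (route name, protein id) or None."""
    for name, lookup in routes:
        protein = lookup(unigene_id)
        if protein is not None:
            return name, protein
    return None


# Hypothetical cross-reference tables (placeholder ids, not real entries).
unigene_to_swissprot = {"UG_1": "SP_1"}
unigene_to_pdb = {"UG_2": "PDB_2"}
pdb_to_swissprot = {"PDB_2": "SP_2"}
unigene_to_pir = {"UG_3": "PIR_3"}
pir_to_swissprot = {"PIR_3": "SP_3"}

routes = [
    ("direct", via(unigene_to_swissprot)),
    ("PDB", via(unigene_to_pdb, pdb_to_swissprot)),
    ("PIR", via(unigene_to_pir, pir_to_swissprot)),
]
```

Ordering the routes from most to least direct means a reporter is annotated through the shortest available chain of links.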



Scotland-Holland: 1-0?

Check Affymetrix reporter sequences.

- Each reporter has 16 25-mer probes.
- Blast against ENSEMBL genes (takes 1 month on UK grid).
- Use for cross-species analysis
- Adapt RMA statistical analysis in Bioconductor

Next slide shows data of one
single actual microarray



Normalized expression shown for both
channels.


Each reporter is shown with a single dot.


Red dots are controls


Note the GEM barcode (QC)


Note the slight error in linear
normalization (low expressed genes are
higher in Cy5 channel)
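
The slide does not say which linear normalization was applied. As an illustration only, the sketch below assumes the simplest variant: one global factor that rescales the Cy5 channel so that its median matches the Cy3 median. The function name and the intensities are made up for the example.

```python
from statistics import median


def linear_normalize(cy5, cy3):
    """Rescale Cy5 intensities with a single global factor.

    The factor is chosen so that the Cy5 median equals the Cy3 median.
    Returns the rescaled Cy5 values and the factor that was applied.
    """
    factor = median(cy3) / median(cy5)
    return [value * factor for value in cy5], factor


cy5_raw = [200.0, 400.0, 800.0]
cy3_raw = [100.0, 200.0, 400.0]
cy5_scaled, factor = linear_normalize(cy5_raw, cy3_raw)
```

A single factor shifts all spots by the same amount on a log scale, so it cannot correct a bias that depends on intensity. That is consistent with the residual error noted above for low expressed genes.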

Next slide shows same data
after processing



Controls removed


Bad spots (<40% average area)
removed


Low signals (<2.5 Signal/Background)
removed


All reporters with <1.7 fold change
removed (only changing spots shown)
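
The processing steps above can be sketched as a single filter. The thresholds (40% of average area, 2.5 signal/background, 1.7 fold change) come from the slide. The `Spot` fields, the choice to compute the average area over non-control spots, and the symmetric treatment of up- and down-regulation are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    reporter: str
    is_control: bool
    area: float
    signal: float
    background: float
    ratio: float  # normalized Cy5/Cy3 ratio


def filter_spots(spots, min_area_fraction=0.4, min_signal_to_background=2.5,
                 min_fold_change=1.7):
    """Keep spots that pass the control, area, signal and fold-change filters."""
    candidates = [spot for spot in spots if not spot.is_control]
    if not candidates:
        return []
    average_area = sum(spot.area for spot in candidates) / len(candidates)
    kept = []
    for spot in candidates:
        if spot.area < min_area_fraction * average_area:
            continue  # bad spot
        if spot.background <= 0:
            continue  # signal/background undefined
        if spot.signal / spot.background < min_signal_to_background:
            continue  # low signal
        if spot.ratio <= 0:
            continue  # fold change undefined
        fold_change = max(spot.ratio, 1.0 / spot.ratio)
        if fold_change < min_fold_change:
            continue  # not changing
        kept.append(spot)
    return kept


spots = [
    Spot("ctrl", True, 100.0, 900.0, 100.0, 3.0),
    Spot("small", False, 10.0, 900.0, 100.0, 3.0),
    Spot("weak", False, 100.0, 200.0, 100.0, 3.0),
    Spot("flat", False, 100.0, 900.0, 100.0, 1.2),
    Spot("up", False, 100.0, 900.0, 100.0, 2.0),
    Spot("down", False, 100.0, 900.0, 100.0, 0.5),
]
changing = filter_spots(spots)
```

In this toy data only the `up` and `down` spots survive: the others are dropped as a control, a bad spot, a low signal and an unchanged reporter respectively.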

Final slide shows information
for one single reporter



This signifies one single spot


It is a known gene: a UDP glucuronyltransferase


Raw data and fold change are
shown

Secondary analyses


Gene clustering
(find genes that behave equally)


Cluster evaluation
(what do we see in clusters …)


Physiological evaluation
(for arrays, proteomics, clusters)


Understand the regulation



Clustering: find genes with same pattern

[Figure: left panel plots expression level against time; right panel plots T1 signal against T2 signal.]

Left hand picture shows expression patterns for 2 genes (these
should probably end up in the same cluster).

Right hand picture shows the expression vector for one gene
for the first 2 dimensions. Can be normalized by amplitude
(circle) or relatively (square).
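
The two normalizations can be written out for an expression vector of any length. This sketch reads "by amplitude (circle)" as scaling to unit Euclidean length and "relatively (square)" as scaling by the largest absolute component. Both readings are an interpretation of the picture, not something the slide states.

```python
import math


def normalize_by_amplitude(vector):
    """Scale to unit Euclidean length (a point on the unit circle in 2D)."""
    length = math.sqrt(sum(value * value for value in vector))
    if length == 0:
        raise ValueError("cannot normalize an all-zero expression vector")
    return [value / length for value in vector]


def normalize_relative(vector):
    """Scale so the largest absolute component is 1 (the unit square in 2D)."""
    largest = max(abs(value) for value in vector)
    if largest == 0:
        raise ValueError("cannot normalize an all-zero expression vector")
    return [value / largest for value in vector]


# Two genes with the same pattern but different amplitude (T1, T2 signals).
gene_a = [3.0, 4.0]
gene_b = [6.0, 8.0]
on_circle = normalize_by_amplitude(gene_a)
on_square = normalize_relative(gene_a)
```

Either way, `gene_a` and `gene_b` map to the same point after normalization, so a distance-based clustering would treat them as having the same pattern.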

Cluster evaluation


Group genes (function, pathway,
regulations etc.)


Find groups in patterns using
visualization tools and automatic
detection.


Should lead to results like:

“This experiment shows that a large number
of apoptosis genes are up-regulated during
the early stage after treatment. Probably
meaning that cells are dying”
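
One common way to back up a statement like the one above is an over-representation test: given how many genes changed overall, is the number of changed genes in a group (for example apoptosis) larger than chance would give? The sketch below uses a one-sided hypergeometric test. The slides do not name a specific test, so this choice and the function name are assumptions.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_group, changed_genes,
                       changed_in_group):
    """Probability of seeing at least `changed_in_group` changed genes
    in the group if the changed genes were drawn at random."""
    upper = min(genes_in_group, changed_genes)
    favourable = sum(
        comb(genes_in_group, k)
        * comb(total_genes - genes_in_group, changed_genes - k)
        for k in range(changed_in_group, upper + 1)
    )
    return favourable / comb(total_genes, changed_genes)


# Toy numbers: 10 genes on the array, 5 in the group, 5 changed,
# and all 5 changed genes fall inside the group.
p_value = enrichment_p_value(10, 5, 5, 5)
```

A small p-value suggests the group is over-represented among the changing genes. With many groups tested at once, a multiple-testing correction would also be needed.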



Example of GenMAPP results:

Manual lookup on a MAPP

Understanding regulation

The main idea: co-regulated genes could
have common regulatory pathways.

The basic approach: annotate transcription
factor binding sites using Transfac and
use for supervised clustering.

The problem: each gene has hundreds of
tfb’s.

Solution? Use syntenic regions using rVista
(work in progress with Rick Dixon)


Understanding QTLs

Get blood pressure QTLs: from ENSEMBL/CFG Wellcome group.

Look up functional pathways and GO
annotations using GenMAPP: virtual
experiment, assume all genes in QTL are
changing.

Create a new blood pressure MAPP: confront
this with real blood pressure/heart failure
microarray data.


Work in progress, TU/e MDP3 group.
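
The virtual experiment can be sketched as two steps: select the genes that lie inside a QTL interval, then count how often each pathway is hit if all of those genes are assumed to change. The sketch does not use GenMAPP or ENSEMBL. The gene names, positions and pathway labels are placeholders, and both function names are hypothetical.

```python
from collections import Counter


def genes_in_qtl(gene_positions, chromosome, start, end):
    """Return the genes whose position falls inside the QTL interval."""
    return sorted(
        gene
        for gene, (gene_chromosome, position) in gene_positions.items()
        if gene_chromosome == chromosome and start <= position <= end
    )


def pathway_counts(genes, gene_pathways):
    """Count pathway memberships, treating every listed gene as changing."""
    counts = Counter()
    for gene in genes:
        counts.update(gene_pathways.get(gene, ()))
    return counts


# Placeholder annotation: gene -> (chromosome, position) and gene -> pathways.
gene_positions = {
    "gene_1": ("chr1", 1_500_000),
    "gene_2": ("chr1", 2_500_000),
    "gene_3": ("chr1", 9_000_000),
    "gene_4": ("chr2", 2_000_000),
}
gene_pathways = {
    "gene_1": ["pathway_X"],
    "gene_2": ["pathway_X", "pathway_Y"],
    "gene_3": ["pathway_Y"],
}
qtl_genes = genes_in_qtl(gene_positions, "chr1", 1_000_000, 3_000_000)
virtual_hits = pathway_counts(qtl_genes, gene_pathways)
```

The resulting counts can then be compared with the pathways that change in real microarray data.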

People involved

BiGCaT Maastricht:
Rachel van Haaften (IOP), Edwin ter Voert (BMT), Joris Korbeeck (BMT/UM),
Willem Ligtenberg (IOP), Stan Gaj (tUL), Chris Evelo

TU/e:
Peter Hilbers, Huub ten Eijkelder, Patrick van Brakel, lots of students

CARIM:
Yigal Pinto, Umesh Sharma, Blanche Schroen, Matthijs Blankesteijn,
Jos Smits, Jo de Mey, Danielle Curfs, Kitty Cleutjens, Natasja Kisters,
Esther Lutgens, Birgit Faber, Petra Eurlings, Ann-Pascalle Bijnens,
Mat Daemen, Frank Stassen, Marc van Bilssen, Marten Hoffker.

NUTRIM:
Wim Saris, Freddy Troost, Johan Renes, Simone van Breda.

GROW:
Daisy vd Schaft, Chamindie Puyandeera

IOP Nutrigenomics:
Milka Sokolovic, Theo Hackvoort, Meike Bunger, Guido Hooiveld,
Michael Müller, Lisa Gilhuis-Pedersen, Antoine van Kampen,
Edwin Mariman, Wout Lamers, Nicole Franssen, Jaap Keijer

CFG Wellcome group:
Neil Hanlon (Glasgow), Gontran Zepeda (Edinburgh),
Rick Dixon (Leicester), Sheetal Patel (London).

Paris leptin group:
Soraya Taleb, Rafaelle Cancello, Nathalie Courtin, Carine Clement

Organon:
Jan Klomp, Rene van Schaik.

BioAsp:
Marc Laarhoven.