Bioinformatics Coursework: - Imperial College London

Course 341: Introduction to Bioinformatics

2004/2005, 2005/2006, 2006/2007

Moustafa Ghanem


Imperial College London

Course 341: Introduction to Bioinformatics

Answers to Microarray Bioinformatics Tutorial 1

(Review questions on gene expression) [1]


1. What is meant by gene expression?

Lectures slides #7-11


From Lecture, Oxford Dictionary of Biology (4th Edition):

gene expression: The manifestation of the effects of a gene by the production of the particular protein, polypeptide, or type of RNA whose synthesis it controls. Individual genes can be ‘switched on’ (exert their effects) or ‘switched off’ according to the needs and circumstances of the cell at a particular time. A number of mechanisms are thought to be responsible for the control of gene expression … [some biology deleted] In multicellular organisms, expression of the right genes in the right order at the right times is particularly crucial during embryonic development and cell differentiation. This involves subtle and complex interplay of chemical signals with the embryo’s genes, in patterns that vary between different types of organisms. Abnormalities of gene expression may result in the death of cells, or their uncontrolled growth, as in cancer.






















From Page 3 of Microarray Gene Expression Data Analysis: A Beginner’s Guide (Helen C. Causton, John Quackenbush & Alvis Brazma, Blackwell Publishing 2003):

DNA is a subtle molecule and the same DNA is present, with a few exceptions, in all the cells of an organism. Despite this, not all the cells are the same. Many of the differences between them are due to the different subsets of genes that are expressed in each of the different cell types. We also find different subsets of genes expressed in response to stimuli, so that the patterns of gene expression levels reflect both the cell type and its condition. Microarrays permit the detection of the abundance of various mRNA molecules or transcripts in a cell at a given moment. The amount of each mRNA detected in the cell can provide information on the corresponding protein …





[1] For this tutorial, I provide more than just answers to the questions and provide an explanation of some of the background collected from the different sources. This is to help you appreciate the motivation and background for this part of the course. For the review questions in the remainder of the tutorials, I will only refer to the relevant lecture slides and provide a typical short answer.

2. Why are microarrays a hot topic in bioinformatics?

Lectures slides 12-14

From Microarray Gene Expression Data Analysis: A Beginner’s Guide (Causton, Quackenbush and Brazma) and Bioinformatics Computing (Bergeron):


Microarrays are effectively transforming a living cell from a black box into a transparent box. They offer an efficient method of gathering data that can be used to determine the patterns of gene expression of all the genes in an organism in a single experiment. They also allow researchers to examine the mRNA from different tissues in normal and diseased states to determine which genes and environmental conditions can lead to disease. Similarly, microarrays can be used to determine which genes are expressed in which tissues and at which times during embryonic development.







































From http://www.ncbi.nlm.nih.gov/About/primer/microarrays.html


Microarrays are a significant advance both because they may contain a very large number of genes and because of their small size. Microarrays are therefore useful when one wants to survey a large number of genes quickly or when the sample to be studied is small. Microarrays may be used to assay gene expression within a single sample or to compare gene expression in two different cell types or tissue samples, such as in healthy and diseased tissue. Because a microarray can be used to examine the expression of hundreds or thousands of genes at once, it promises to revolutionize the way scientists examine gene expression. This technology is still considered to be in its infancy; therefore, many initial studies using microarrays have represented simple surveys of gene expression profiles in a variety of cell types. Nevertheless, these studies represent an important and necessary first step in our understanding and cataloguing of the human genome.


As more information accumulates, scientists will be able to use microarrays to ask increasingly complex questions and perform more intricate experiments. With new advances, researchers will be able to infer probable functions of new genes based on similarities in expression patterns with those of known genes. Ultimately, these studies promise to expand the size of existing gene families, reveal new patterns of coordinated gene expression across gene families, and uncover entirely new categories of genes. Furthermore, because the product of any one gene usually interacts with those of many others, our understanding of how these genes coordinate will become clearer through such analyses, and precise knowledge of these inter-relationships will emerge. The use of microarrays may also speed the identification of genes involved in the development of various diseases by enabling scientists to examine a much larger number of genes. This technology will also aid the examination of the integration of gene expression and function at the cellular level, revealing how multiple gene products work together to produce physical and chemical responses to both static and changing cellular needs.

... picture this: a hand-held instrument that a physician could use to quickly diagnose cancer or other diseases during a routine office visit. What if that same instrument could also facilitate a personalized treatment regimen, exactly right for you? Personalized drugs. Molecular diagnostics. Integration of diagnosis and therapeutics. These are the long-term promises of microarray technology. Maybe not today or even tomorrow, but someday. For the first time, arrays offer hope for obtaining global views of biological processes (simultaneous readouts of all the body's components) by providing a systematic way to survey DNA and RNA variation.



3. What do we mean by the term “Gene co-expression”, and how can microarrays be used to study co-expressed genes?

Lectures slides #14. Also see Lectures on Data clustering.



Gene co-expression refers to the situation when two or more genes have similar gene expression profiles under different conditions, i.e. when their gene expression levels follow one another: when the expression level of gene A is high, the expression level of gene B is also high; when the expression level of A is low, so is that of B.

Finding co-expressed genes helps biologists to discover and understand the subtle interactions between genes and helps them identify which genes are involved in the same biological processes.



Microarrays allow us to understand co-expression since they allow us to measure the expression levels of many genes at any given time. By designing experiments that take such snapshots from different cell types, and under different environmental conditions, one could analyse the data using data clustering algorithms to find such co-expressed genes and study their relationships in detail.


Typically the output from the microarray experiments is stored as a gene expression matrix where the rows represent genes, and the columns represent values measured from different samples (representing different conditions). One could then apply a data clustering algorithm (such as hierarchical clustering) over the gene expression matrix (to find row clusters) and thus identify which genes have a similar expression profile. If the output is organised as a dendrogram one could also study the degree of similarity between these genes.
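
The row clustering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and expression values are made up, the distance is taken to be 1 minus the Pearson correlation, and average linkage is used; none of these choices comes from the lectures. The function name hierarchical_clustering is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hierarchical_clustering(matrix, labels):
    """Average-linkage agglomerative clustering of the rows of `matrix`.

    Returns the merges as (cluster_a, cluster_b, distance), with
    distance = 1 - Pearson correlation. This merge list is the
    information a dendrogram would display.
    """
    clusters = {i: [i] for i in range(len(matrix))}
    names = {i: labels[i] for i in range(len(matrix))}
    merges = []
    next_id = len(matrix)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for p in range(len(ids)):
            for q in range(p + 1, len(ids)):
                a, b = ids[p], ids[q]
                # Average distance over all pairs of member rows.
                d = sum(1 - pearson(matrix[i], matrix[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((names[a], names[b], round(d, 3)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        names[next_id] = "(" + names[a] + "+" + names[b] + ")"
        next_id += 1
    return merges

# Made-up gene expression matrix: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC", "geneD"]
expression = [
    [1.0, 2.0, 3.0, 4.0],   # geneA
    [1.1, 2.1, 2.9, 4.2],   # geneB rises with geneA
    [4.0, 3.0, 2.0, 1.0],   # geneC
    [3.9, 3.2, 1.8, 1.1],   # geneD falls with geneC
]
merges = hierarchical_clustering(expression, genes)
for merge in merges:
    print(merge)
```

Genes merged early (at a small distance) are the co-expressed ones; the distance at which two groups finally merge indicates how dissimilar they are.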























4. How can microarrays be used to study the relationships between different disease types?

Lectures slide #4

If we run experiments that record gene expression levels as measured in cells taken from different disease types, we end up with a gene expression matrix where the columns represent disease types. One could then transpose the matrix to obtain a profile for each disease in terms of gene expression values (i.e. for each disease you know the expression value for each gene in the organism), and then apply data clustering algorithms to find which diseases have a similar profile. We typically call this clustering the columns of the matrix, rather than clustering the rows of the matrix. The diagram above shows data that has been clustered both by rows and columns.
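
As a small illustration of the transposition step, the sketch below turns a genes-by-diseases matrix into one profile per disease and ranks the disease pairs by similarity. The data, the disease names and the use of Pearson correlation as the similarity measure are assumptions made for the example; a full analysis would feed the profiles into a clustering algorithm instead of a simple ranking.

```python
from itertools import combinations
from math import sqrt

def transpose(matrix):
    """Turn a genes x diseases matrix into a diseases x genes matrix."""
    return [list(column) for column in zip(*matrix)]

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_pairs(profiles, names):
    """List all pairs of profiles, most similar (highest correlation) first."""
    pairs = [(names[i], names[j], pearson(profiles[i], profiles[j]))
             for i, j in combinations(range(len(names)), 2)]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Made-up matrix: rows are genes, columns are disease types.
diseases = ["disease1", "disease2", "disease3"]
expression = [
    [5.0, 4.8, 1.0],
    [1.2, 1.0, 4.9],
    [3.0, 3.3, 3.1],
    [0.5, 0.7, 4.0],
    [4.1, 4.4, 0.9],
]
profiles = transpose(expression)   # one row per disease
ranked = rank_pairs(profiles, diseases)
for name_a, name_b, r in ranked:
    print(name_a, name_b, round(r, 3))
```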


This type of analysis may be useful in establishing links between the diseases, and can be a starting point in designing drugs for them, e.g. if you know that drug-X is effective for the treatment of a certain disease, it may also be effective in treating another disease that has a similar profile.



5. Explain briefly how microarray experiments can be used in different phases of the drug discovery process. [2]

Lectures slides 4-6


Drug discovery is a lengthy and expensive process that takes years and requires the use of a variety of bioinformatics, chemoinformatics and clinical informatics tools. The diagram below shows the different stages that are typically required when designing a new drug.

[Diagram: Target Identification -> Target Validation -> Lead Identification -> Lead Optimization -> Pre-clinical Trials -> Clinical Trials]

Roughly speaking:

In the first phase, “Target Identification”, you are interested in understanding the disease states and in finding and understanding the properties of potential targets for the drug to act on, e.g. which genes and proteins are involved in the disease and how they vary in the population. You can use microarray experiments here to identify how gene expression varies between diseased and healthy cells. Effectively you are trying to find biomarkers for the disease. You may also be trying to design diagnostic tools that can be used to characterise the individuals who have the disease based on their gene expression profiles.

[2] If you are really interested, check the Affymetrix brochures on the applications of GeneChip in drug discovery on the web: http://www.affymetrix.com/support/technical/other/drug_discovery_brochure.pdf



In the Target Validation stage you are interested in collecting information about what happens when you manipulate the target (e.g. knock out a gene, suppress a protein, etc), and how this affects the other genes and proteins in the organism. Again, you can design microarray experiments that measure such effects, including finding which other genes are co-expressed or co-regulated with those you identified as a target.


In Lead Identification you are trying to design (from scratch) or find (from an existing compound library) a chemical compound (or lead compound) that achieves the desired effect on the targets that you have identified. At this stage you typically need to screen and test a large number of compounds. You can use data collected from microarray experiments to prioritize testing the compounds and also to characterize their potential efficacy and toxicity.


Once you have identified a number of potent drug molecules that achieve the desired effect, you move to optimising the use of these molecules in the Lead Optimization phase, where you are trying to study the mechanisms of their toxicity. You can also use microarray experiments at this stage to find new applications for your leads, to understand potential problematic side effects, and to discover biomarkers for monitoring toxicity and efficacy during preclinical and clinical trials.


You can then move into Pre-clinical and Clinical Trials, where you test the effects of the drug on model organisms (e.g. worms etc), on animals (at the risk of running into ethical and political problems), and on humans. At this stage, you may be interested in choosing the right candidates (e.g. which model organisms or which people) for your experiments. Also at this stage you are looking to understand both the efficacy of the drug and its side effects, and thus trying to understand which groups of people would benefit most from the drug (e.g. prognostic studies) based on their measured gene expression profiles.
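
To make the target identification idea concrete (comparing gene expression between diseased and healthy cells to look for biomarkers), here is a minimal sketch. The gene names, the intensity values and the cut-off of a two-fold change are all assumptions for illustration; a real study would need replicates and a proper statistical test, not a bare threshold.

```python
from math import log2
from statistics import mean

def log2_fold_changes(diseased, healthy):
    """log2(mean diseased / mean healthy) for every gene.

    Both arguments map a gene name to a list of expression levels
    measured on replicate arrays.
    """
    return {gene: log2(mean(diseased[gene]) / mean(healthy[gene]))
            for gene in diseased}

def candidate_biomarkers(fold_changes, threshold=1.0):
    """Genes whose expression changes at least 2**threshold-fold either way."""
    return sorted(gene for gene, change in fold_changes.items()
                  if abs(change) >= threshold)

# Made-up replicate measurements.
healthy = {"geneA": [100, 110, 90], "geneB": [50, 55, 45], "geneC": [800, 820, 780]}
diseased = {"geneA": [400, 420, 380], "geneB": [52, 48, 50], "geneC": [200, 190, 210]}

changes = log2_fold_changes(diseased, healthy)
candidates = candidate_biomarkers(changes)
print(changes)
print(candidates)
```

Positive values mean the gene is more highly expressed in the diseased cells, negative values mean it is less expressed.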


(Review questions on microarray technology)


6. Explain the basic idea behind microarrays.

Lectures slides #15-21


A Microarray is a device that detects the presence and abundance of labelled nucleic acids (mainly mRNA) in a biological sample. It works by exploiting the ability of a given mRNA molecule to bind specifically to, or hybridize to, the DNA template from which it originated.


A Microarray consists of a solid surface onto which known DNA molecules have been chemically bonded at special locations:

- Each array location is typically known as a probe and contains many replicates of the same molecule.
- The molecules in each array location are carefully chosen so as to hybridise only with mRNA molecules corresponding to a single gene.


To use a microarray in a given experiment, you label the molecules of the sample with a fluorescent dye, and then apply it to the array:

- If a gene is expressed in the sample, the corresponding mRNA hybridises with the molecules on a given probe (array location).
- If a gene is not expressed, no hybridisation occurs on the corresponding probe.


With the aid of a computer, the amount of mRNA bound to the spots on the Microarray is precisely measured by detecting the intensity of light emitted by the fluorescent dye from each spot. The information is then digitised and made ready for analysis.
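
The hybridisation principle can be mimicked with plain strings. In this toy sketch a labelled target binds a probe only if it contains a stretch that is exactly complementary to the probe, and the "signal" of a probe is simply the number of targets bound. The sequences are made up, targets are written in the DNA alphabet (as for labelled cDNA), and real hybridisation is far less clear-cut than exact string matching.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(sequence):
    """Sequence of the strand that pairs with `sequence`."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))

def hybridises(probe, target):
    """True if `target` contains a stretch perfectly complementary to `probe`."""
    return reverse_complement(probe) in target

def read_array(probes, sample):
    """Count, for every probe, how many labelled targets in the sample bind it.

    `probes` maps a gene name to its probe sequence; `sample` is a list of
    labelled target sequences. The count stands in for spot intensity.
    """
    return {gene: sum(hybridises(probe, target) for target in sample)
            for gene, probe in probes.items()}

# Hypothetical probes and sample.
probes = {"gene1": "ATGGCA", "gene2": "TTTCCC"}
sample = ["GGTGCCATAA", "CCTGCCATGG", "ACACACAC"]

signal = read_array(probes, sample)
print(signal)
```

Here gene1 would be read as expressed (two targets bind its probe) and gene2 as not expressed (no target binds).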



7. When manufacturing microarrays:

a) What are the main two types of probes that can be used and what are the advantages of each?

Lectures slides #22-30


The two main types of probes are: cDNA (PCR [3] product) probes and oligonucleotide (oligo) probes.


cDNA probes are derived through the application of highly parallel PCR on a clone library. The resulting probes are thus typically long sequences (complete genes or ESTs). The method is simple (standard in most biology labs) and flexible in that it allows using cDNAs from any source without a priori information about the corresponding genes. The longer sequences generated also increase hybridization specificity of the probes. However, one disadvantage of using such longer sequences is that it reduces the density of packing the probes on a chip. Another disadvantage of cDNA probes is that, since they are derived through cloning, they may end up including errors in their sequences (as a result of the experimental procedure used).


Oligos are short fragments of single-stranded DNA that are typically 5 to 50 nucleotides long. Since they are short they can be synthesized in the lab from base nucleotides. Also, since they are short and synthetic, they typically contain fewer errors, which improves specificity and ensures reproducibility of results. Since they are short they can achieve a higher packing density on a chip. In order to generate oligo probes, you first have to design them (choose their bases), which requires you to understand more about the genes you will be measuring with the array. Also, since they are short, they can be synthesized in-situ, which leads to many other advantages (e.g. better quality spots on the array).


b) What is the difference between spotting and in-situ synthesis and what are the advantages of each?

Lecture 12 slides #14-18


Spotting works for both cDNA probes and oligo probes. It is a mechanical process that is based on using robots, and is typically less high tech than in-situ synthesis, so it is cheaper. However, it requires preparing the probes and storing them before they are added to the chips. This may introduce experimental errors and contamination. It may also result in variable spot quality (e.g. due to different amounts of probe molecules being deposited in each spot location, or not being placed in exactly the right location, etc).


In-situ synthesis is based on adding the bases of the sequence one at a time to each array location. It starts by fixing one molecule to the surface and then adding the bases. It typically is very precise (similar to manufacturing silicon chips) and thus results in higher quality probes (both amount and shape). However, it only works for short probes of known sequences (oligos) and not arbitrary cDNA probes. It is typically a more expensive process than that involving the use of spotting robots.



8. Describe the difference between single-label experiments and dual-label experiments. What are the data analysis implications of each?

Lecture 12 slides #31-36


In double label experiments, two target samples can be applied to the array simultaneously, where each sample is labelled using a different dye (Cy3 dye excited by a green laser and Cy5 dye excited by a red laser). The output of double label experiments can be interpreted by referring to the colours of the spots: a green spot means more hybridisation of one sample, a red spot means more hybridisation of the other, a yellow spot means equal hybridisation from both samples.





[3] PCR (polymerase chain reaction) is a technique used to replicate a fragment of DNA to produce many copies of a particular DNA sequence.


In single label experiments, the target sample is labelled using only one dye, typically Cy3, which is excited by a green laser. The main implication is that to measure two samples, we need to use two chips rather than one.
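
For the dual-label case, the colour of a spot can be thought of as the log ratio of the two channel intensities. The sketch below is an assumption-laden illustration, not the lecture's method: the tolerance for calling a spot "yellow", the minimum signal level and the pseudo-count of 1 added to avoid division by zero are all arbitrary choices. In a single-label experiment the same ratio would have to be formed from intensities read off two separate chips, which is why normalisation between chips matters more there.

```python
from math import log2

def spot_colour(cy3, cy5, tolerance=0.5, min_signal=50.0):
    """Classify a spot from its Cy3 (green) and Cy5 (red) intensities.

    tolerance:  largest absolute log2 ratio still treated as "equal".
    min_signal: below this level in both channels nothing is called.
    """
    if cy3 < min_signal and cy5 < min_signal:
        return "no signal"
    # Pseudo-count of 1 keeps the ratio defined when a channel reads zero.
    log_ratio = log2((cy5 + 1.0) / (cy3 + 1.0))
    if log_ratio > tolerance:
        return "red"       # more hybridisation from the Cy5-labelled sample
    if log_ratio < -tolerance:
        return "green"     # more hybridisation from the Cy3-labelled sample
    return "yellow"        # roughly equal hybridisation from both samples

# Made-up (Cy3, Cy5) readings for four spots.
readings = [(1000, 100), (100, 1000), (500, 520), (5, 3)]
colours = [spot_colour(cy3, cy5) for cy3, cy5 in readings]
print(colours)
```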



9. Describe the operation of cDNA microarrays. Make sure your solution explains the interpretation of the colours “Red”, “Yellow” and “Green”.

Lecture 12 slides #31-33


You should be able to do this one yourself based on the answers provided above. Make sure to explain first what a microarray is, then what cDNA probes are, then how the two samples are used, and then what the colours mean. BTW, what does a black spot mean?



10. Describe the Affymetrix microarray technology.

Lectures slides #34-37

Make sure your answer defines the term “oligonucleotides” and describes how “photodeprotection using masks” is used to fix the oligonucleotide probes to the array surface.


Affymetrix GeneChips are microarray chips that are based on the use of a single colour label. The chips use oligos (which are short fragments of single-stranded DNA that are typically 5 to 50 nucleotides long) as probes to achieve higher packing density and fewer manufacturing errors.


Affymetrix array technology is based on synthesizing the oligos in-situ on the glass surface of the array one base at a time. In order to control the addition of bases to specific array locations, in-situ synthesis requires the addition of masks (a protective layer that prevents the addition of other bases by mistake) after each base is added. To add the next base to a probe, the mask has to be removed either using an acid or using light. Affymetrix uses an approach based on photo-deprotection whereby light is directed to the appropriate probe masks to remove them.

























Make sure your answer describes what is meant by a “Probe”, “Probe Pair” and “Probe Set”.





- Probe: oligonucleotide sequence (e.g. 25 bp, shorter than cDNA) “fabricated” to surface in high density by chip-making technology
- Probe pair: one normal oligonucleotide sequence (“perfect match”, PM), another similar oligo with one base changed (“mismatch”, MM)
- Probe set: a collection of probe pairs for the purpose of detecting one mRNA sequence.




Make sure that your answer describes the terms “Perfect Match Probe” and “Mismatch Probe” and how they are used.

Affymetrix GeneChips use a Perfect Match/Mismatch probe strategy. For each probe designed to be perfectly complementary to a target sequence, a partner probe is generated that is identical except for a single base mismatch in its centre. These probe pairs, called the Perfect Match probe (PM) and the Mismatch probe (MM), allow the quantitation and subtraction of signals caused by non-specific cross-hybridization. The difference in hybridization signals between the partners, as well as their intensity ratios, serve as indicators of specific target abundance.
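
A minimal sketch of the PM/MM idea follows. Two things here are assumptions made for illustration: the mismatch probe is built by swapping the centre base for its complement, and the probe set is summarised by the plain average of the PM minus MM differences. The intensities are made up, and real analysis software uses more robust summaries than this.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(perfect_match):
    """Copy of the PM probe with its centre base replaced by the complement."""
    centre = len(perfect_match) // 2
    return (perfect_match[:centre]
            + COMPLEMENT[perfect_match[centre]]
            + perfect_match[centre + 1:])

def average_difference(probe_pairs):
    """Mean of (PM - MM) over all probe pairs of one probe set.

    The MM intensity estimates non-specific cross-hybridisation, so
    subtracting it leaves an estimate of the specific signal.
    """
    return sum(pm - mm for pm, mm in probe_pairs) / len(probe_pairs)

# A made-up 25-base perfect match probe and its mismatch partner.
pm_probe = "ACGTACGTACGTACGTACGTACGTA"
mm_probe = mismatch_probe(pm_probe)

# Made-up (PM, MM) intensities for one probe set of three probe pairs.
probe_set = [(1200, 300), (900, 250), (1100, 400)]
expression_estimate = average_difference(probe_set)
print(mm_probe, expression_estimate)
```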





















11. What is the general workflow for conducting a microarray experiment? Make sure your answer describes both manufacturing the array and using it.

Lectures slides #21


1. Prepare DNA chip(s) by choosing probes and attaching them to glass substrate. Note location and properties of each probe.
2. Generate a hybridization solution containing a mixture of fluorescently labelled targets.
3. Incubate hybridization mixture.
4. Detect probe hybridization using laser technology:
   a. Scan the arrays and store output as images
   b. Quantitate each spot
   c. Subtract background
   d. Normalize
   e. Export a table of fluorescent intensities for each gene in the array
5. Analyze data using computational methods.
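
Steps 4c to 4e of the workflow above can be sketched as follows, assuming each spot has already been quantitated into a foreground and a background intensity. The numbers are made up, and scaling every array to a common median is only one simple normalisation choice among several; it is used here purely as an assumption for the example.

```python
from statistics import median

def subtract_background(spots):
    """Foreground minus local background for each gene, floored at zero.

    `spots` maps a gene name to a (foreground, background) pair.
    """
    return {gene: max(foreground - background, 0.0)
            for gene, (foreground, background) in spots.items()}

def normalise_to_median(intensities, target=1000.0):
    """Rescale one array so that its median intensity equals `target`.

    Assumes the median is non-zero, i.e. most spots show some signal.
    """
    scale = target / median(intensities.values())
    return {gene: value * scale for gene, value in intensities.items()}

# Made-up quantitated spots from one array: gene -> (foreground, background).
array = {
    "gene1": (1500, 100),
    "gene2": (600, 100),
    "gene3": (90, 100),
    "gene4": (2100, 100),
    "gene5": (1100, 100),
}
corrected = subtract_background(array)
normalised = normalise_to_median(corrected, target=500.0)

# Step 4e: export a table of intensities, one line per gene.
for gene in sorted(normalised):
    print(gene, normalised[gene])
```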

12. What are the main steps in the analysis of microarray data and what are the main data structures used?

Normalization Lecture slides

























Main data structures are raw array scans (TIFF images), intermediate spot matrices and gene
expression matrices.





















13. Describe what type of image processing may be required for reading a microarray output, giving examples of typical problems that need to be addressed.

Lectures slide #37


The image processing required is to find the location of each spot, find its extent (edges) and measure its average intensity. The typical problems that arise include having spots at uneven grid positions (different spacing or those that curve out), having variable spot shape and size across spots, and also having variable intensity within each spot.
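
As a rough illustration of the measurement step, the sketch below takes a scanned image as a grid of pixel intensities and, for a spot whose centre and radius are assumed to be already known, returns the mean intensity inside the spot and the median intensity of the surrounding pixels as a local background. Finding the spot positions in the first place, and coping with irregular shapes, are the harder problems and are not attempted here. The synthetic image is made up.

```python
from statistics import mean, median

def measure_spot(image, centre_row, centre_col, radius):
    """Return (mean foreground, median local background) for one spot.

    Pixels within `radius` of the centre count as foreground; the other
    pixels in a square window of half-width 2 * radius count as background.
    """
    foreground, background = [], []
    first_row = max(0, centre_row - 2 * radius)
    last_row = min(len(image), centre_row + 2 * radius + 1)
    first_col = max(0, centre_col - 2 * radius)
    last_col = min(len(image[0]), centre_col + 2 * radius + 1)
    for row in range(first_row, last_row):
        for col in range(first_col, last_col):
            inside = (row - centre_row) ** 2 + (col - centre_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(image[row][col])
    return mean(foreground), median(background)

# Synthetic 9 x 9 image: background level 10, one round spot of level 200.
image = [[200 if (row - 4) ** 2 + (col - 4) ** 2 <= 4 else 10
          for col in range(9)]
         for row in range(9)]

spot_intensity, local_background = measure_spot(image, 4, 4, 2)
print(spot_intensity, local_background)
```

Using the median for the background makes the estimate less sensitive to a few bright pixels from dust or a neighbouring spot.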


[Diagram labels (analysis workflow for question 12): Biological question; Experimental design / Platform choice; Microarray experiment; Image analysis; Normalization; Data mining; Biological verification and interpretation]

14. What is meant by a gene expression matrix and how can it be generated from raw microarray data?

Lectures slide #38


The diagram from 13 above shows how a gene expression matrix is generated. In the gene expression matrix, rows represent genes (as opposed to features/spots on the array) and columns represent measurements from different experimental conditions measured on individual arrays.
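
The step from per-array spot tables to a gene expression matrix might look like the sketch below. The spot identifiers, the spot-to-gene annotation and the choice of averaging replicate spots of the same gene are assumptions for the example, not something specified in the lectures.

```python
from statistics import mean

def build_expression_matrix(arrays, spot_to_gene):
    """Combine per-array spot tables into a gene expression matrix.

    arrays:       list of dicts, one per array (condition), spot id -> intensity
    spot_to_gene: dict, spot id -> gene name (several spots may share a gene)
    Returns (genes, matrix) with one row per gene and one column per array.
    """
    genes = sorted(set(spot_to_gene.values()))
    matrix = []
    for gene in genes:
        row = []
        for array in arrays:
            # Average all spots on this array that measure the same gene.
            values = [intensity for spot, intensity in array.items()
                      if spot_to_gene[spot] == gene]
            row.append(mean(values))
        matrix.append(row)
    return genes, matrix

# Hypothetical annotation and two arrays (two conditions).
spot_to_gene = {"spot1": "geneA", "spot2": "geneA", "spot3": "geneB"}
arrays = [
    {"spot1": 100, "spot2": 120, "spot3": 40},   # condition 1
    {"spot1": 300, "spot2": 320, "spot3": 45},   # condition 2
]
genes, matrix = build_expression_matrix(arrays, spot_to_gene)
for gene, row in zip(genes, matrix):
    print(gene, row)
```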

Problems


15. Using a series of diagrams, show how the following probes “GTGT” and “ATAT” can be in-situ synthesized on an array surface using the Affymetrix approach.



In the sequence below I use S to represent the glass surface, X to represent the first molecule that attaches to the surface, M to represent a mask, and A, C, G and T to represent the bases.


Step 1 (add X and M)
Probe 1: S-X-M
Probe 2: S-X-M

Step 2 (De-protect Probe 1 and then add G and M)
Probe 1: S-X-G-M
Probe 2: S-X-M

Step 3 (De-protect Probe 2 and then add A and M)
Probe 1: S-X-G-M
Probe 2: S-X-A-M

Step 4 (De-protect Probe 1 and Probe 2 and then add T and M)
Probe 1: S-X-G-T-M
Probe 2: S-X-A-T-M

Step 5 (De-protect Probe 1 and then add G and M)
Probe 1: S-X-G-T-G-M
Probe 2: S-X-A-T-M

Step 6 (De-protect Probe 2 and then add A and M)
Probe 1: S-X-G-T-G-M
Probe 2: S-X-A-T-A-M

Step 7 (De-protect Probe 1 and Probe 2 and then add T and M)
Probe 1: S-X-G-T-G-T-M
Probe 2: S-X-A-T-A-T-M
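
The step sequence above can also be generated by a short simulation, which makes it easy to try other probe sequences. This is a sketch that follows the notation of the answer (S, X, M and the bases). The order in which bases are offered at each position (order of first appearance across the probes) is an assumption chosen to reproduce the steps shown, and the function name synthesise is hypothetical.

```python
def synthesise(probes):
    """Simulate in-situ synthesis; return a list of (description, states).

    Each state is written as in the worked answer, e.g. "S-X-G-M".
    At every position, each base that some probe needs is added in turn:
    the probes needing it are de-protected (mask M removed), the base is
    attached, and a new mask is put on top.
    """
    states = [["S", "X", "M"] for _ in probes]
    steps = [("add X and M", ["-".join(state) for state in states])]
    for position in range(max(len(probe) for probe in probes)):
        # Bases needed at this position, in order of first appearance.
        bases = []
        for probe in probes:
            if position < len(probe) and probe[position] not in bases:
                bases.append(probe[position])
        for base in bases:
            selected = [i for i, probe in enumerate(probes)
                        if position < len(probe) and probe[position] == base]
            for i in selected:
                states[i].pop()             # light removes the mask
                states[i] += [base, "M"]    # add the base, then a new mask
            who = " and ".join("Probe %d" % (i + 1) for i in selected)
            description = "De-protect %s and then add %s and M" % (who, base)
            steps.append((description, ["-".join(state) for state in states]))
    return steps

steps = synthesise(["GTGT", "ATAT"])
for number, (description, states) in enumerate(steps, start=1):
    print("Step %d (%s)" % (number, description))
    for i, state in enumerate(states, start=1):
        print("Probe %d: %s" % (i, state))
```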