miRNA Folding Prediction Algorithms - Stanford AI Lab


Oct 2, 2013


+

miRNA Discovery and Prediction Algorithms

George Michopoulos

+

microRNAs


What are they?


Why do we care about them?


How do we discover them?


Biological Methods


Computational Methods


What limitations do these methods have?


+

What is microRNA?

+

miRNA structure

Small non-coding RNAs

~22-25 bases long

Characterized by their hairpin precursors, which are composed of the mature miRNA, the loop, and the star miRNA

+

miRNA biogenesis

Transcribed in the nucleus

The pri-miRNA hairpin is cut by the Drosha enzyme, producing the pre-miRNA

The pre-miRNA is then cleaved by the Dicer enzyme; a small number of miRNAs are instead processed by Dicer-independent pathways

The miRNA is then bound by an Argonaute protein, forming an RNA-induced silencing complex (RISC)

The complex then binds target mRNA and either cleaves it or represses its translation

+

Why do we care?


miRNAs regulate protein expression, including the expression of proteins involved in:


Cancer


inhibit proteins responsible
for controlling proliferation


Neural development


linked to schizophrenia


Cardiac development


linked to
cardiomyopathies


DNA methylation and histone
modification


can alter the expression
of target genes



+

Why do we care?


Antagomirs, chemically engineered oligonucleotides that silence endogenous microRNAs, could be used as a therapy for such diseases

Non-coding RNAs account for a significant portion of the genome, so their homology can be used as a tool to assess phylogeny

+

Detection and Discovery


Biological Methods:

RT-PCR and qPCR can be used for individual miRNAs

Microarrays can be used to detect multiple miRNAs

Computational Methods:

Mining deep-sequencing data and using predictive algorithms to detect miRNA characteristics and compare potential sequences to homologs

Bentwich et al. (2005)

miRAlign: Wang et al. (2005)

miRDeep: Friedländer et al. (2008)

miRDeep2: Friedländer et al. (2011)







+

RT-PCR

Reverse transcription polymerase chain reaction, not real-time PCR (qPCR)

The desired RNA is reverse-transcribed into cDNA, which is then amplified using qPCR

Useful for detecting very low copy numbers of RNA molecules; a well-established method, but not specific to miRNA
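A simplified model of exponential amplification shows why RT-PCR can detect very low copy numbers. The sketch below is a minimal illustration that assumes a fixed efficiency per cycle. The function names and the detection threshold of 1e10 copies are hypothetical. Real assays estimate efficiency and threshold empirically.

```python
import math


def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds if every cycle multiplies the template
    by (1 + efficiency); efficiency = 1.0 means perfect doubling."""
    return n0 * (1.0 + efficiency) ** cycles


def cycles_to_threshold(n0: float, threshold: float, efficiency: float = 1.0) -> float:
    """Fractional cycle at which the copy number first reaches `threshold`
    (a simplified threshold cycle, Ct)."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    # Hypothetical detection threshold of 1e10 copies.
    for n0 in (10, 1_000, 100_000):
        print(f"{n0:>7} starting copies -> Ct ~ {cycles_to_threshold(n0, 1e10):.1f}")
```

With ideal doubling, each 100-fold drop in starting material delays the threshold by only log2(100), about 6.6 cycles. Even a handful of template molecules therefore reaches a detectable level within a normal run.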


+

Northern Blotting

Measures levels of RNA expression using probes with partial homology

The pictured northern blot detected 4 of the 5 microRNAs shown

Lower sensitivity, but higher specificity than RT-PCR

Fewer false positives


+

Microarray Detection

Microarrays were first used to detect miRNAs in 2004 by several groups

Probes can be designed and the chips then ordered from companies (Barad et al.)

Alternatively, everything can be built in-house using amine-binding slides and an array printer (Miska et al.)

Much more efficient for large-scale screening, but limited by the need for prior sequence data to design probes

+

Took known miRNA sequences

Created DNA chips with probes complementary to those sequences

Hybridized miRNA samples onto the chips

Performed clustering analysis

Used mirMASA to confirm the findings

Found that the microarray method has higher sensitivity and specificity than previous miRNA identification methods

Barad et al. (2004)
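The slide does not describe the clustering set-up Barad et al. used. As a rough illustration of what clustering miRNA expression profiles involves, the sketch below runs agglomerative clustering with average linkage on the distance 1 - Pearson r, which is a common choice but an assumption here. The miRNA names and signal values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[list[str]]:
    """Agglomerative clustering with average linkage and distance 1 - r.

    Starts with one cluster per miRNA and repeatedly merges the two clusters
    whose members are, on average, most correlated."""
    clusters = [[name] for name in profiles]

    def dist(a: str, b: str) -> float:
        return 1.0 - pearson(profiles[a], profiles[b])

    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical log-scale signals for four miRNAs across four tissues.
profiles = {
    "miR-A": [9.0, 1.0, 1.2, 0.8],
    "miR-B": [8.5, 1.1, 0.9, 1.0],
    "miR-C": [1.0, 7.9, 8.2, 1.1],
    "miR-D": [0.9, 8.1, 7.7, 1.2],
}
print(average_linkage(profiles, n_clusters=2))
```

Correlation-based distance groups miRNAs by the shape of their tissue profile rather than by absolute intensity. That distinction matters when probes hybridize with different efficiencies.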

+

Useful Programs:

RNAfold

RNAfold is a program in the ViennaRNA Package

Takes in RNA sequences and calculates their minimum free energy (MFE) structure, outputting the structure in dot-bracket notation together with its free energy
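RNAfold finds the minimum free energy structure by dynamic programming over a nearest-neighbor energy model. The sketch below is a deliberately simplified stand-in that uses Nussinov base-pair maximization instead. That approach shares the dynamic-programming recursion and the dot-bracket output, but it scores structures by the number of pairs rather than by free energy, so its structures will often differ from RNAfold's. `nussinov` and `min_loop` are names chosen for this illustration only.

```python
# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of nested base pairs in `seq`.

    Returns (pair_count, dot_bracket). `min_loop` is the minimum number of
    unpaired bases a hairpin loop must enclose."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs in seq[i..j]; spans shorter than min_loop + 1 stay 0.
    dp = [[0] * n for _ in range(n)]

    def paired_score(i: int, k: int, j: int) -> int:
        """Score of seq[i..j] when base j pairs with base k (i <= k)."""
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, paired_score(i, k, j))
            dp[i][j] = best

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and paired_score(i, k, j) == dp[i][j]:
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))
```

For real precursor analysis one would run RNAfold itself. The sketch only shows how a fold reduces to a table of best sub-solutions plus a traceback into dot-bracket form.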

+

Useful Programs:

ClustalW

ClustalW is a multiple sequence alignment tool that is frequently used to compare homologous sequences across species or to compare families of genes.

Performs pairwise alignments of the input sequences, builds a guide tree from the resulting similarity scores, and then uses that tree to progressively build the multiple alignment
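The first two ClustalW stages (pairwise alignment, then a guide tree) can be sketched as follows. This is a simplified stand-in, not ClustalW's implementation. ClustalW uses its own substitution scores, gap penalties and a neighbor-joining tree, whereas this sketch assumes plain match/mismatch/gap scores and substitutes UPGMA. It also stops before the progressive alignment step. The sequences are toy examples.

```python
from itertools import combinations


def nw_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Global (Needleman-Wunsch) alignment of a and b with linear gap cost.
    Returns the fraction of alignment columns that are identical."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub, s[i - 1][j] + gap, s[i][j - 1] + gap)
    # Trace back one optimal alignment, counting columns and identities.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns


def upgma(names: list[str], dist: dict[frozenset, float]):
    """Build a rooted guide tree by UPGMA; returns nested tuples of names."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        (tx, nx), (ty, ny) = clusters.pop(x), clusters.pop(y)
        merged = f"({x},{y})"
        for z in clusters:
            # Size-weighted average of the distances from the two merged clusters.
            d[frozenset((merged, z))] = (nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]) / (nx + ny)
        clusters[merged] = ((tx, ty), nx + ny)
    return next(iter(clusters.values()))[0]


# Toy RNA sequences (illustrative only).
seqs = {
    "seqA": "UGAGGUAGUAGGUUGUAUAGUU",
    "seqB": "UGAGGUAGUAGGUUGUGUGGUU",
    "seqC": "UGGAAUGUAAAGAAGUAUGUAU",
}
distances = {
    frozenset((x, y)): 1.0 - nw_identity(seqs[x], seqs[y])
    for x, y in combinations(seqs, 2)
}
print(upgma(list(seqs), distances))
```

The tree fixes the order of the progressive step. The most similar sequences are aligned first, and later sequences are added to those existing alignments.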

+

Bentwich et al. (2005)



Scanning the entire human genome identified 11 million hairpins, including 86% of known microRNA precursors.

After microarray screening, the 359 expressed microRNAs were subjected to confirmation by sequencing

Successfully cloned and sequenced 89 human microRNA genes that did not appear in the microRNA registry

Using UCSC BlastZ alignments and ClustalW, found that fifty-three of these are located in two large non-conserved clusters, including one on chromosome 19 that is expressed only in the placenta and was the largest microRNA cluster reported at the time.

This cluster comprises 43 new predicted microRNAs, all of which show similarity to a neighboring miRNA family specifically expressed in human embryonic stem cells

The other cluster is on the X chromosome, and its miRNAs are expressed only in the testis

Homology analysis showed that both clusters are conserved only in chimpanzees and possibly rhesus monkeys
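The slide does not say how the 11 million hairpins were found. As a rough illustration of a genome-wide hairpin scan, the sketch below slides a fixed window along a sequence and keeps windows whose two ends could pair into an ungapped stem. It is a crude stand-in for real folding (no bulges, internal loops or free energies) and is not Bentwich et al.'s method. The window size, step, arm length and `min_pairs` cut-off are all hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Pairs allowed in the stem: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def stem_pairs(window: str, arm: int) -> int:
    """Count positions where the 5' arm can pair with the 3' arm read
    backwards (an ungapped stem with no bulges)."""
    five_prime = window[:arm]
    three_prime = window[-arm:][::-1]
    return sum((a, b) in PAIRS for a, b in zip(five_prime, three_prime))


def scan_for_hairpins(genome: str, window: int = 60, step: int = 10,
                      arm: int = 22, min_pairs: int = 18):
    """Slide a fixed window along the genome and yield (start, pairs) for
    windows whose outer arms could close a hairpin stem."""
    rna = genome.upper().replace("T", "U")
    for start in range(0, len(rna) - window + 1, step):
        pairs = stem_pairs(rna[start:start + window], arm)
        if pairs >= min_pairs:
            yield start, pairs


# Synthetic example: a perfect 22-bp stem with a 16-nt loop, embedded in poly-A.
arm = "UGAGGUAGUAGGUUGUAUAGUU"
hairpin = arm + "A" * 16 + revcomp(arm)
genome = "A" * 40 + hairpin + "A" * 40
print(list(scan_for_hairpins(genome)))
```

A scan this permissive flags many windows on a real genome. That is why pipelines of this kind follow it with folding, scoring and expression evidence.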

+

miRAlign: Wang et al. (2005)

A novel genome-wide computational approach to detect miRNAs in animals based on both sequence and structure alignment

Uses RNAfold to test secondary structures, then CLUSTAL to perform pairwise sequence alignment, custom algorithms to confirm the miRNA's position on the stem-loop, and finally RNAforester to perform pairwise structure alignment
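The slide says miRAlign confirms the miRNA's position on the stem-loop but does not give the rule. The sketch below shows one plausible check of that kind on a dot-bracket structure. It requires the candidate mature sequence to sit on a single arm, clear of the terminal loop, and to be mostly base-paired. The criterion, the single-stem assumption and the 0.7 threshold are hypothetical, not miRAlign's actual rules.

```python
def terminal_loop(structure: str) -> tuple[int, int]:
    """(first, last) index of the terminal loop of a single-stem hairpin in
    dot-bracket notation: the unpaired bases between the last '(' and the
    first ')' that follows it."""
    last_open = structure.rfind("(")
    first_close = structure.find(")", last_open + 1)
    if last_open == -1 or first_close == -1:
        raise ValueError("structure contains no hairpin")
    return last_open + 1, first_close - 1


def mature_on_stem(structure: str, start: int, length: int,
                   min_paired_fraction: float = 0.7) -> bool:
    """True if the candidate mature miRNA [start, start + length) lies on one
    arm without touching the terminal loop and is mostly base-paired."""
    end = start + length - 1
    if start < 0 or end >= len(structure):
        raise ValueError("candidate lies outside the precursor")
    loop_first, loop_last = terminal_loop(structure)
    if not (end < loop_first or start > loop_last):
        return False  # overlaps the terminal loop
    paired = sum(c != "." for c in structure[start:end + 1])
    return paired / length >= min_paired_fraction


hairpin = "((((((((....))))))))"
print(mature_on_stem(hairpin, 0, 6), mature_on_stem(hairpin, 6, 5))
```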

+

miRAlign: Wang et al. (2005)

miRAlign outperforms BLAST search in both sensitivity and selectivity; furthermore, nearly all the known miRNAs found by BLAST can also be detected by miRAlign.

The average number of false positives is 7.1 for BLAST and 0.9 for miRAlign

The algorithm depends on pre-existing data to search against, so it is only useful for finding miRNAs that are closely related to previously annotated ones.

+

miRDeep: Friedländer et al. (2008)

Suite of Perl scripts

Uses a probabilistic model of miRNA biogenesis to score the compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor



+

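Algorithm for P(sequence is a precursor)

$$\mathrm{score} = \log \frac{P(\mathrm{pre} \mid \mathrm{data})}{P(\mathrm{bgr} \mid \mathrm{data})}$$

The probability of the sequence being a precursor is given by Bayes' theorem:

$$P(\mathrm{pre} \mid \mathrm{data}) = \frac{P(\mathrm{data} \mid \mathrm{pre})\,P(\mathrm{pre})}{P(\mathrm{data})}$$

$$P(\mathrm{pre} \mid \mathrm{data}) = \frac{P(\mathrm{abs} \mid \mathrm{pre})\,P(\mathrm{rel} \mid \mathrm{pre})\,P(\mathrm{sig} \mid \mathrm{pre})\,P(\mathrm{star} \mid \mathrm{pre})\,P(\mathrm{nuc} \mid \mathrm{pre})\,P(\mathrm{pre})}{P(\mathrm{data})}$$

The same holds for the probability of the sequence being a background hairpin:

$$P(\mathrm{bgr} \mid \mathrm{data}) = \frac{P(\mathrm{data} \mid \mathrm{bgr})\,P(\mathrm{bgr})}{P(\mathrm{data})}$$

$$P(\mathrm{bgr} \mid \mathrm{data}) = \frac{P(\mathrm{abs} \mid \mathrm{bgr})\,P(\mathrm{rel} \mid \mathrm{bgr})\,P(\mathrm{sig} \mid \mathrm{bgr})\,P(\mathrm{star} \mid \mathrm{bgr})\,P(\mathrm{nuc} \mid \mathrm{bgr})\,P(\mathrm{bgr})}{P(\mathrm{data})}$$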
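$P(\mathrm{data})$ divides both posteriors, so it cancels in the ratio. The score therefore reduces to a log prior ratio plus one log-likelihood ratio per feature:

$$\mathrm{score} = \log\frac{P(\mathrm{pre})}{P(\mathrm{bgr})} + \sum_{f \in \{\mathrm{abs},\,\mathrm{rel},\,\mathrm{sig},\,\mathrm{star},\,\mathrm{nuc}\}} \log\frac{P(f \mid \mathrm{pre})}{P(f \mid \mathrm{bgr})}$$

The Python sketch below evaluates that sum. It is a minimal illustration, not miRDeep's code. The natural logarithm is assumed because the slide does not give a base, `log_odds_score` is a hypothetical name, and every probability is invented for the example. miRDeep estimates its feature distributions from data, which is not attempted here.

```python
import math


def log_odds_score(prior_pre: float, prior_bgr: float,
                   likelihoods: dict[str, tuple[float, float]]) -> float:
    """log(P(pre | data) / P(bgr | data)) for conditionally independent features.

    likelihoods maps each feature name to (P(feature | pre), P(feature | bgr)).
    P(data) appears in both posteriors and cancels, so it is never computed."""
    score = math.log(prior_pre / prior_bgr)
    for p_given_pre, p_given_bgr in likelihoods.values():
        score += math.log(p_given_pre / p_given_bgr)
    return score


# Hypothetical probabilities for one candidate hairpin.
features = {
    "abs": (0.6, 0.2),
    "rel": (0.7, 0.3),
    "sig": (0.8, 0.1),
    "star": (0.5, 0.25),
    "nuc": (0.4, 0.4),  # equally likely under both models: contributes 0
}
print(round(log_odds_score(0.1, 0.9, features), 3))
```

Working in log space turns the product of likelihoods into a sum. Each feature's contribution can then be read off separately, and underflow is avoided when many small probabilities are multiplied.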

+

miRDeep: Friedländer et al. (2008)

Of the 555 known human mature miRNA sequences, 213 were present in the data set. Of these, 154 (72%) were successfully recovered by miRDeep. The total estimated number of false positives was 6 ± 2

This pipeline is much more efficient at identifying expressed microRNAs from deep-sequencing data than previous methods


+

miRDeep2: Friedländer et al. (2011)

Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs

The new package includes many more options and graphical outputs that make the software more accessible


+

miRDeep2: Friedländer et al. (2011)



Relative to miRDeep1:

Performs excision by scanning the genome for stacks of reads, where a stack is one or more reads that map to exactly the same 5′ and 3′ positions in the genome

When identifying miRNAs in data from sea squirts, which are known to harbor large numbers of non-canonical miRNAs, the first version of miRDeep reports only 46 known and 31 novel miRNAs; in contrast, miRDeep2 reports 313 known and 127 novel ones

Can detect antisense miRNAs (+/-)

Supports single or multiple mismatches

Performs substantially better on the human data, reporting 186 known and 36 novel miRNAs (compared to 154 known and 10 novel in the initial publication)

More accurate detection of low-abundance miRNAs

Faster: analyzed 30 million RNAs in less than 5 h using 3 GB of memory

More intuitive interface for biologists
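The read-stack definition above can be sketched by grouping aligned reads on their exact coordinates. Reads on the same strand with identical start and end also share their 5′ and 3′ ends, whichever strand they lie on. This is a minimal illustration only. `ReadHit`, `build_stacks`, `tall_stacks` and the `min_reads` cut-off are hypothetical names, and miRDeep2's subsequent excision of precursor candidates around stacks is not shown.

```python
from collections import Counter
from typing import NamedTuple


class ReadHit(NamedTuple):
    """One read aligned to the genome (coordinates assumed 0-based, inclusive)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int


def build_stacks(hits: list[ReadHit]) -> Counter:
    """Group reads that share chromosome, strand and exact start and end.
    Each key is one stack; its value is the number of reads in the stack."""
    return Counter((h.chrom, h.strand, h.start, h.end) for h in hits)


def tall_stacks(stacks: Counter, min_reads: int) -> list[tuple[tuple, int]]:
    """Stacks with at least `min_reads` reads, tallest first."""
    return sorted(((key, n) for key, n in stacks.items() if n >= min_reads),
                  key=lambda item: item[1], reverse=True)


hits = [ReadHit("chr1", "+", 100, 121)] * 3 + [
    ReadHit("chr1", "+", 100, 122),  # different 3' end: separate stack
    ReadHit("chr1", "-", 100, 121),  # other strand: separate stack
]
stacks = build_stacks(hits)
print(tall_stacks(stacks, min_reads=2))
```

Requiring an exact match on both ends keeps reads with slightly shifted ends in separate stacks.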

+

Beyond miRDeep2

Remaining challenges in identifying miRNAs and detecting their expression levels:

miRBase, the primary database of miRNA annotations used today, is far from pristine

It is hard to tell whether newly detected miRNAs actually have a biological function; establishing this will require extensive biological experimentation

Algorithms still have room for improvement in terms of accessibility and efficiency

+

Questions?

+

References


Barad, O., Meiri, E., Avniel, A., Aharonov, R., Barzilai, A., Bentwich, I., Einav, U., et al. (2004). MicroRNA expression detected by oligonucleotide microarrays: System establishment and expression profiling in human tissues. Genome Research, 2486-2494. doi:10.1101/gr.2845604.4

Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nature Genetics, 37(7), 766-770. doi:10.1038/ng1590

Friedländer, M. R., Chen, W., Adamidi, C., Maaskola, J., Einspanier, R., Knespel, S., & Rajewsky, N. (2008). Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnology, 26(4), 407-415. doi:10.1038/nbt1394

Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W., & Rajewsky, N. (2011). miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Research, 1-16. doi:10.1093/nar/gkr688

Krüger, J., & Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Research, 34(Web Server issue), W451-W454. doi:10.1093/nar/gkl243

Miska, E. A., Alvarez-Saavedra, E., Townsend, M., Yoshii, A., Sestan, N., Rakic, P., Constantine-Paton, M., et al. (2004). Microarray analysis of microRNA expression in the developing mammalian brain. Genome Biology, 5(9), R68. doi:10.1186/gb-2004-5-9-r68

Wang, X., Zhang, J., Li, F., Gu, J., He, T., Zhang, X., & Li, Y. (2005). MicroRNA identification based on sequence and structure alignment. Bioinformatics, 21(18), 3610-3614. doi:10.1093/bioinformatics/bti562