RNA bioinformatics - bio.biomedicine.gu.se


RNA bioinformatics
Marcela Davila-Lopez
Department of Medical Biochemistry and Cell Biology
Institute of Biomedicine
Medical genomics and bioinformatics, 2009
RNA bioinformatics2
[Figure: DNA → transcription → RNA (5′ cap, poly(A) tail; modification/export) → alternative splicing → mRNAs → translation → protein A, protein B]
Overview
• RNA
– ncRNA
– Importance (disease related)
– Structure type
• RNA regulatory elements
– Riboswitches
– SECIS
– IRE
– miRNA
• How to predict ncRNA secondary structure
– Mfold
– Mutual information
• How to identify ncRNA genes
– Pattern matching (PatScan)
– SCFG (CMsearch)
– Phylogenetic analysis
General concepts
Types and Roles of ncRNAs
• mRNA codes for proteins
• A non-coding RNA (ncRNA) is any RNA molecule that is not translated into a protein
• Genomic stability: telomerase RNA
• RNA processing and modification: spliceosomal snRNAs, U7 snRNA, RNase P, RNase MRP
• Transcription: 7SK RNA, 6S RNA
• Translation: tRNA, tmRNA, rRNA
• Protein trafficking: SRP RNA
Storz, G., S. Altuvia, and K.M. Wassarman (2005)
Matera, A.G., R.M. Terns, and M.P. Terns, Nat Rev Mol Cell Biol, 2007.
ncRNA content
Are ncRNAs responsible for the complexity in different organisms?
Huttenhofer, A., P. Schattner, and N. Polacek,
Trends Genet, 2005
Disease
Prasanth, K.V. and D.L. Spector, Genes Dev, 2007.
Costa, F.F. Drug Discov Today, 2009.
Pandey, A.K., P. Agarwal, K. Kaur, and M. Datta. Cell Physiol Biochem, 2009.
Examples: miRNAs (diabetes); MRP RNA (cartilage-hair hypoplasia)
Disease
Thiel, C.T., G. Mortier, I. Kaitila, A. Reis, and A. Rauch. Am J Hum Genet, 2007.
Cartilage-hair hypoplasia: MRP RNA (processing of pre-rRNA)
Protein – primary sequence
ClustalW
Sequence similarity → biological relation → same function
ncRNA – primary sequence
No sequence conservation, but structural conservation
Covariation: consistent and compensatory mutations that (often) conserve the structure
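The difference between consistent and compensatory mutations can be made concrete for a single base pair. The sketch below is illustrative only: it assumes a pair is allowed if it is Watson-Crick or a GU wobble, and the names `can_pair` and `classify_change` are hypothetical.

```python
# Allowed pairs in this sketch: Watson-Crick plus GU wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """True if bases a and b can form an allowed pair."""
    return (a.upper(), b.upper()) in PAIRS


def classify_change(ref: tuple[str, str], var: tuple[str, str]) -> str:
    """Compare paired positions (i, j) in a reference sequence with the
    same two alignment positions in a homologous sequence."""
    ref = (ref[0].upper(), ref[1].upper())
    var = (var[0].upper(), var[1].upper())
    if not can_pair(*ref):
        raise ValueError("reference positions do not form a base pair")
    if ref == var:
        return "unchanged"
    if not can_pair(*var):
        return "disruptive"      # the pair, and so the helix, is broken
    if ref[0] != var[0] and ref[1] != var[1]:
        return "compensatory"    # both sides changed, pairing kept (e.g. GC -> AU)
    return "consistent"          # one side changed, pairing kept (e.g. GC -> GU)


print(classify_change(("G", "C"), ("A", "U")))  # compensatory
```

Counting such changes over all paired columns of an alignment is one simple way to see the structural signal that the raw sequence identity hides.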
A single mutation can radically change the structure
Canonical pairs
Non-canonical pairs: GU wobble
http://prion.bchs.uh.edu/bp_type/bp_structure.html
Secondary structure
RNA functionality depends on structure
Elements: stem, loop, hairpin, bulge, internal loop, multibranched loop, external base, pseudoknot
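Secondary structures such as these are often written in dot-bracket notation (used, for example, by the Vienna RNA package; the notation itself is not introduced in these slides). A minimal stack-based parser that recovers the base pairs is sketched below; `parse_dot_bracket` is a hypothetical helper, and plain brackets cannot express pseudoknots.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), 0-based with i < j, encoded by a
    dot-bracket string: '(' opens a pair, ')' closes the most recently
    opened one, '.' is an unpaired base."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A hairpin whose stem is interrupted by a one-nucleotide bulge (position 2).
print(parse_dot_bracket("((.((...))))"))  # [(0, 11), (1, 10), (3, 9), (4, 8)]
```

Because every ')' closes the most recent '(', the parser can only describe nested pairs, which is exactly the class of structures that the dynamic-programming methods later in the lecture handle.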
Tertiary structure
RNA tertiary structure comprises interactions between secondary structure elements:
• two helices
• two unpaired regions
• one unpaired region and a double-stranded helix
Prediction of RNA 3D structure is very difficult, so RNA bioinformatics is dominated by the prediction and analysis of secondary structure.
Family structure
tRNA, telomerase RNA, P RNA
Each family typically adopts a characteristic secondary structure
However...
[Figure: U1 snRNA and MRP RNA structures from Dictyostelium discoideum, Candida albicans and Trypanosoma brucei]
Examples:
RNA regulatory elements
Riboswitches
SECIS
IRE
miRNA
RNA regulatory elements
A cis-regulatory element (cis-element) is a region of an RNA that regulates the expression of genes located on that same strand.
Trans-regulatory elements are RNAs that may modify the expression of genes distant from the gene that was originally transcribed to create them.
[Figure: mRNA with 5′ m7G cap, coding sequence (CDS), a miRNA bound in the 3′ UTR, the polyadenylation signal (AAUAAA) and the poly(A) tail]
Cis and trans regulatory elements
Dominski, Z. and W.F. Marzluff. Gene, 2007.
[Figure: histone genes (DNA) and histone pre-mRNA. The stem-loop motif of the histone pre-mRNA (cis element) is bound by SLBP; U7 snRNA (trans element) is shown with D3, B, G, E, F, Lsm10, Lsm11, ZFP-100, symplekin, CPSF-73 and CPSF-100]
Riboswitch
Discovered in 2002: part of an mRNA molecule that can directly bind a small target molecule, affecting the gene's activity (auto-regulation)
• Typically found in the 5′UTR
• Regulate genes for the biosynthesis, catabolism and transport of various cellular metabolites (amino acids [K, G], cofactors, nucleotides and metal ions)
• Most known riboswitches occur in bacteria
Tucker, B.J. and R.R. Breaker, Curr Opin Struct Biol, 2005
Riboswitch examples
Serganov A, Patel DJ. Biochim Biophys Acta, 2009
[Figure: riboswitch control at the level of transcription and of translation (Shine-Dalgarno sequence)]
Riboswitch identification
Henkin TM. Genes Dev, 2008
Mandal M, et al. Cell, 2003
Comparative analysis of upstream regions of several genes:
• BLAST to find UTRs homologous to all UTRs in, e.g., Bacillus subtilis
• Inspection for conserved, structured RNA-like motifs
• Experimental confirmation
[Figure: guanine riboswitch]
Selenoproteins
• At least 25 selenoproteins
• Present in all lineages of life (bacteria, archaea and eukarya)
• The function of most selenoproteins is currently unknown
• Prevention of some forms of cancer (?); therapeutic targets (?)
Selenium → antioxidant activity; chemopreventive, anti-inflammatory and antiviral properties
Moderate selenium deficiency has been linked to increased cancer and infection risk, male infertility, decreased immune and thyroid function, and several neurologic conditions, including Alzheimer's and Parkinson's disease
Selenium is not a cofactor: it is incorporated into the polypeptide chain as selenocysteine (Sec), the 21st amino acid
Papp, L.V., et al. Antioxidants & Redox Signaling, 2007
SECIS
Kryukov, G.V., et al., Science, 2003
• Overall low sequence similarity
• Secondary structures are highly conserved and contain consensus sequences that are indispensable for Sec incorporation
Eukaryotic SECIS:
• non-canonical A-G base pairs
• K-turn motif
IRE: Iron responsive element
Iron:
• Essential for oxygen transport, cellular respiration and DNA synthesis
• [↓] Deficiency: cellular growth arrest and death → anemia, retardation in children
• [↑] Excess: generates hydroxyl or lipid radicals that damage lipid membranes, proteins and nucleic acids → hemochromatosis, liver/heart failure
• Balance: iron-responsive element/iron regulatory protein (IRE/IRP) regulatory system
Muckenthaler MU, Galy B, Hentze MW. Annu Rev Nutr, 2008
Piccinelli P, Samuelsson T. RNA, 2007
IRE: a 26–30 nt hairpin with the apical loop sequence CAGUGN, found in the 5′UTR or the 3′UTR
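The IRE features listed above (a CAGUGN apical loop on a short hairpin) can be combined into a toy scanner. This is a deliberately simplified sketch, not a validated IRE predictor: it assumes a fixed stem of five Watson-Crick or GU pairs directly flanking the 6-nt loop and ignores the conserved C bulge and other details of real IREs; `find_ire_like` is a hypothetical name.

```python
import re

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_ire_like(seq: str, stem: int = 5) -> list[int]:
    """Return 0-based start positions of hairpins whose apical loop is
    CAGUGN and whose loop is closed by `stem` consecutive base pairs.
    Simplified: no bulges or mismatches are allowed in the stem."""
    seq = seq.upper().replace("T", "U")
    hits = []
    # The lookahead lets overlapping loop candidates be found.
    for m in re.finditer(r"(?=CAGUG[ACGU])", seq):
        loop_start = m.start()
        loop_end = loop_start + 6                 # exclusive end of the loop
        if loop_start - stem < 0 or loop_end + stem > len(seq):
            continue
        # Base loop_start-1-k must pair with base loop_end+k.
        if all((seq[loop_start - 1 - k], seq[loop_end + k]) in PAIRS
               for k in range(stem)):
            hits.append(loop_start - stem)
    return hits


print(find_ire_like("AAAGGCUACAGUGAUAGCCAAA"))  # [3]
```

A scanner like this returns every hairpin that fits the rule, with no notion of how good the fit is; the pattern-matching and covariance-model methods later in the lecture address exactly that limitation.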
IRE regulation
Muckenthaler MU, Galy B, Hentze MW. Annu Rev Nutr, 2008
Gene identification and SS prediction
Protein vs RNA identification
Protein:
• Conserved primary sequence
• Promoters (Pol II)
• Sequence-similarity based
RNA:
• Primary sequence not conserved
• Promoters (Pol II, Pol III)
• Sequence-similarity based
• Secondary structure based
• Comparative genomics
Methods
• Nussinov algorithm
• Mfold (prediction of secondary structure)
• Analysis of mutual information
• Pattern matching
• SCFG (stochastic context-free grammar models)
• Phylogenetic analysis
Nussinov algorithm: find the structure with the most base pairs (dynamic programming)
Drawbacks:
• The optimal structure is not unique
• Testing all possible structures is computationally infeasible
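A compact version of the Nussinov dynamic programme is sketched below. Assumptions not stated on the slide: Watson-Crick and GU pairs are allowed, and every hairpin loop must enclose at least `min_loop` = 3 unpaired bases. The traceback returns only one of the possibly many optimal structures, which illustrates the first drawback listed above; `nussinov` is a hypothetical function name.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, list[tuple[int, int]]]:
    """Return (maximum number of nested base pairs, one optimal pair list)."""
    seq = seq.upper()
    n = len(seq)
    # best[i][j] = max pairs in seq[i..j]; stays 0 when j - i <= min_loop
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: j is unpaired
            for k in range(i, j - min_loop):       # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score

    # Traceback: recover one structure that achieves the optimum.
    pairs: list[tuple[int, int]] = []
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + best[k + 1][j - 1] + 1:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    score = best[0][n - 1] if n else 0
    return score, sorted(pairs)


print(nussinov("GGGAAAUCC"))  # (3, [(0, 8), (1, 7), (2, 6)])
```

The table has O(n²) cells and each cell scans O(n) split points, so the whole computation is O(n³) rather than the exponential cost of enumerating every structure.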
Zuker folding algorithm (1981): the correct structure is the one with the lowest equilibrium free energy (ΔG), which is the sum of individual contributions from loops, base pairs and other secondary structure elements.
Every system seeks to achieve a minimum of free energy (MFE).
However... the structure with the lowest free energy is not always the biologically relevant one.
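In symbols, the additive energy model described above can be written as follows. This is a notational sketch: $\mathcal{E}(S)$ (the loops, stacked pairs and other elements of a structure $S$) and $\Omega(x)$ (the set of allowed secondary structures of a sequence $x$) are labels introduced here, and the actual energy parameters (for example, nearest-neighbour stacking tables) are not given in the slides.

$$\Delta G(S) = \sum_{e \in \mathcal{E}(S)} \Delta G(e), \qquad S_{\mathrm{MFE}} = \arg\min_{S \in \Omega(x)} \Delta G(S)$$

Zuker's algorithm finds $S_{\mathrm{MFE}}$ by dynamic programming over this decomposition instead of enumerating $\Omega(x)$.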
Mutual information: a quantity that measures the mutual dependence of two variables (here, two alignment positions). The unit of measurement is the bit.
Covarying positions: consistent and compensatory mutations that conserve the structure
Mutual information – example
$f_{x_i}$ = frequency of one of the 4 bases in column $i$
$f_{x_i x_j}$ = frequency of one of the 16 base pairs in columns $i$ and $j$
$$M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$
$M_{ij} = 2$: maximum value (informative); $M_{ij} = 0$: conserved positions (not informative)
Alignment (columns 1–4):
1 2 3 4
G G C C
G C C G
G A C U
G U C A
Columns 2–4: pairs GC, CG, AU, UA; every base and every pair has frequency 1/4 (e.g. $f_G = f_C = f_{GC} = 1/4$), so each of the four terms is
$$\tfrac{1}{4}\log_2\frac{1/4}{(1/4)(1/4)} = \tfrac{1}{4}\log_2 4 = 0.5, \qquad M_{24} = 4 \times 0.5 = 2$$
Columns 1–3: the pair GC in every sequence ($f_G = f_C = f_{GC} = 1$):
$$M_{13} = 1 \cdot \log_2\frac{1}{1 \cdot 1} = 0$$
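The calculation above can be reproduced in a few lines of Python. This sketch assumes equal-length sequences, numbers columns from 1 as on the slide, does not treat gaps specially, and sums only over base pairs that actually occur (the usual $0 \log 0 = 0$ convention); `mutual_information` is a hypothetical name.

```python
from collections import Counter
from math import log2


def mutual_information(alignment: list[str], i: int, j: int) -> float:
    """M_ij = sum over observed pairs of f(xi,xj) * log2(f(xi,xj) / (f(xi) f(xj))).
    Columns i and j are 1-based, as on the slide."""
    col_i = [s[i - 1] for s in alignment]
    col_j = [s[j - 1] for s in alignment]
    n = len(alignment)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


ALIGNMENT = ["GGCC", "GCCG", "GACU", "GUCA"]
print(mutual_information(ALIGNMENT, 2, 4))  # 2.0
print(mutual_information(ALIGNMENT, 1, 3))  # 0.0
```

Computing this value for every pair of columns gives the kind of mutual information plot discussed next, where covarying positions stand out against conserved or unrelated ones.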
Mutual information – exercise
Mutual information plot
Diagonals of covarying positions correspond to the four stems of the tRNA. Dashed lines indicate some of the additional tertiary contacts observed in the yeast tRNA-Phe crystal structure.
PatScan: a pattern matcher (deterministic motifs as well as secondary structure constraints) that searches protein or nucleotide sequence archives
Example: p1=5...7 GGAA ~p1
(a stem p1 of 5–7 nt, the loop GGAA, then ~p1, the sequence complementary to p1)
Drawback: yes/no answer
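To illustrate what the pattern `p1=5...7 GGAA ~p1` asks for, here is a small brute-force matcher. It is not PatScan and does not reproduce its syntax or options; it assumes that `~p1` means the Watson-Crick reverse complement of whatever `p1` matched, and `scan_hairpin` is a hypothetical helper. Like PatScan, it gives only a yes/no answer per candidate, with no score.

```python
from collections.abc import Iterator

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(s: str) -> str:
    """Watson-Crick reverse complement; unknown bases become '?' (never match)."""
    return "".join(COMPLEMENT.get(b, "?") for b in reversed(s))


def scan_hairpin(seq: str, loop: str = "GGAA", min_stem: int = 5,
                 max_stem: int = 7) -> Iterator[tuple[int, int, str]]:
    """Yield (start, end_exclusive, stem) for every match of
    p1=min_stem...max_stem <loop> ~p1."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    for start in range(n):
        for k in range(min_stem, max_stem + 1):
            loop_end = start + k + len(loop)
            end = loop_end + k
            if end > n:
                break
            stem = seq[start:start + k]
            if seq[start + k:loop_end] == loop and seq[loop_end:end] == revcomp(stem):
                yield start, end, stem


print(list(scan_hairpin("CCGACUCGGAAGAGUCCC")))  # [(2, 16, 'GACUC')]
```

A single mismatch in the 3′ arm makes the match disappear entirely, which is the "yes/no answer" drawback in practice; PatScan's `[mismatches, deletions, insertions]` annotations in the next example relax this.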
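PatScan – Example
Pattern elements:
• r1={au,ua,gc,cg,gu,ug}: a custom pairing rule (Watson-Crick pairs plus GU/UG)
• p1=6...7 and ~p1: a 6–7 nt stem and its complement
• p2=8...9 and r1~p2[1,0,1]: an 8–9 nt stem and its complement under rule r1
• GGG[1,0,0]: the motif GGG
• 4...4 and 3...4: spacers of 4 and 3–4 nt
[1,0,0] = number of allowed [mismatches, deletions, insertions]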
Regular grammar → primary sequence models
S → aT | bS
T → aS | bT | ε
Derivation: S ⇒ aT ⇒ aaS ⇒ aabS ⇒ aabaT ⇒ aaba
Model repeat regions (e.g., the FMR-1 triplet repeat region):
S → gW1
W1 → cW2
W2 → gW3
W3 → cW4
W4 → gW5
W5 → gW6
W6 → cW7 | aW4 | cW4
W7 → tW8
W8 → g
Example sequences:
gcg cgg ctg
gcg cgg agg cgg ctg
gag agg ctg
gcg agg cgg ctg
gcg agg cgg cgg
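Because every rule of a regular grammar emits one symbol and moves to at most one new nonterminal, membership can be tested by tracking the set of nonterminals reachable after each symbol, as in a finite automaton. The sketch below encodes the two grammars from this slide; the data layout (`(terminal, next_nonterminal)` tuples, `None` for a rule that ends the derivation, a separate set of nonterminals with an ε rule) and the name `generates` are choices made for illustration.

```python
# Right-linear grammars as {nonterminal: [(terminal, next_nonterminal), ...]}.
# next_nonterminal is None for rules such as W8 -> g that end a derivation.
ODD_A = {"S": [("a", "T"), ("b", "S")], "T": [("a", "S"), ("b", "T")]}
ODD_A_NULLABLE = frozenset({"T"})            # T -> epsilon

FMR1 = {
    "S": [("g", "W1")], "W1": [("c", "W2")], "W2": [("g", "W3")],
    "W3": [("c", "W4")], "W4": [("g", "W5")], "W5": [("g", "W6")],
    "W6": [("c", "W7"), ("a", "W4"), ("c", "W4")],
    "W7": [("t", "W8")], "W8": [("g", None)],
}


def generates(grammar, seq, start="S", nullable=frozenset()):
    """True if the grammar can derive exactly `seq` from `start`."""
    states = {start}
    for symbol in seq:
        states = {
            target
            for state in states
            if state is not None
            for terminal, target in grammar.get(state, [])
            if terminal == symbol
        }
        if not states:
            return False
    return None in states or bool(states & nullable)


print(generates(ODD_A, "aaba", nullable=ODD_A_NULLABLE))  # True
for s in ["gcgcggctg", "gcgcggaggcggctg", "gagaggctg"]:
    print(s, generates(FMR1, s))
```

The set of live nonterminals never grows beyond the number of rules, so the check runs in time linear in the sequence length.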
Context-free grammar → primary sequence models
Palindromes:
S → aSa | bSb | aa | bb
Derivation: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa
RNA secondary structure:
S → aW1u | cW1g | gW1c | uW1a
W1 → aW2u | cW2g | gW2c | uW2a
W2 → aW3u | cW3g | gW3c | uW3a
W3 → gaaa | gcaa
CAGGAAACUG: stem C·G, A·U, G·C closing the loop GAAA
GCUGCAAAGC: stem G·C, C·G, U·A closing the loop GCAA
GCUGCAACUG: G×G, C×U and U×C mismatches, so no stem can form
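For the hairpin grammar above, each rule either wraps a nonterminal in one pairing nucleotide on each side or emits a fixed tetraloop, so a short recursive check is enough to decide whether a sequence can be derived. This is a sketch specific to this toy grammar (the names `HAIRPIN` and `derives` are hypothetical); a general context-free grammar would need a CYK-style parser.

```python
PAIR_RULES = [("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")]

# Each rule is either (left, inner_nonterminal, right) or a terminal string.
HAIRPIN = {
    "S": [(a, "W1", b) for a, b in PAIR_RULES],
    "W1": [(a, "W2", b) for a, b in PAIR_RULES],
    "W2": [(a, "W3", b) for a, b in PAIR_RULES],
    "W3": ["gaaa", "gcaa"],
}


def derives(symbol: str, s: str) -> bool:
    """True if nonterminal `symbol` can derive exactly the string s."""
    for rule in HAIRPIN[symbol]:
        if isinstance(rule, str):
            if s == rule:
                return True
        else:
            left, inner, right = rule
            if (len(s) >= 2 and s[0] == left and s[-1] == right
                    and derives(inner, s[1:-1])):
                return True
    return False


for seq in ["CAGGAAACUG", "GCUGCAAAGC", "GCUGCAACUG"]:
    print(seq, derives("S", seq.lower()))
```

The nested rules are what a regular grammar cannot express: the first and last nucleotides are generated together, so their pairing is enforced no matter how far apart they are.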
Stochastic regular grammar → weighted primary sequence models (probabilistic)
S → rW1 (0.45)
S → kW1 (0.45)
S → nW1 (0.10)
Hidden Markov models
[Figure: hidden Markov model emitting A, C, G, T]
Stochastic context-free grammar
Covariance models: probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family
Infernal Package
• Search for additional, family-related sequences in sequence databases
CM example
Build a model (automatically) from an existing sequence alignment
Database containing information about ncRNA families and other structured RNA elements.
Structural alignments
Phylogenetic distribution
EvoFold:
• Alignment of conserved elements
• SCFG secondary structure
• Fold
• Phylogenetic evaluation
miRNA
miRNA
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
• Single-stranded RNA
• ~22 nucleotides
• Inhibit the translation of mRNAs into their protein products by binding to specific regions in the 3′UTR
• Account for ~1% of all transcripts in humans and potentially regulate 10%–30% of all genes
• Expressed ubiquitously and highly conserved in metazoans (animal kingdom)
miRNA
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
Biological processes: apoptosis, cell proliferation, cell differentiation, development, organism defense against infections, tissue morphogenesis, regulation of metabolism
Diseases: cancer, viral infections, neurodegenerative disorders, cardiac pathologies, muscle disorders, diabetes
miRNA
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
He, L. and G.J. Hannon, Nat Rev Genet, 2004
Multiple binding sites: lin-4 is partially complementary to 7 sites in the lin-14 3′UTR
miRNA genes
Kim VN. Nat Rev Mol Cell Biol, 2005
Winter J, et al. Nat Cell Biol, 2009
• Exonic miRNAs in non-coding transcripts
• Intronic miRNAs in non-coding transcripts
• Intronic miRNAs in protein-coding transcripts
Single or clustered
miRNA Biogenesis
Winter, J., S. Jung, S. Keller, R.I. Gregory, and S. Diederichs. Nat Cell Biol, 2009.
Paul S. Meltzer, Nature, 2005
[Figure: canonical and non-canonical biogenesis]
miRNA structure
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
Hairpin structure: miRNA, intervening loop, miRNA*
• High conservation: mature miRNA
• Lower conservation: loop
Human genome: ~11 million hairpins
miRNA computational identification
• Homology search based
– BLAST
– miRAlign, ProMiR, microHARVESTER
• Gene finding
– Identification of conserved genomic regions
– Folding of the identified regions (Mfold, RNAfold)
– Evaluation of hairpins
– miRseeker, MiRscan
• Neighbour stem loop
– ~42% of human miRNA genes are clustered together
– Check the surroundings of a known miRNA for candidate secondary structures
• Comparative genomics
– BLAST intergenic sequences of two genomes against each other
– Filter based on rules inferred from known miRNAs
– miRFinder
• Intragenomic matching (a functional miRNA should have at least one target)
– miRNAs show perfect complementarity to their targets (?)
– Simultaneously predicts miRNAs and their targets
– miMatcher
miRNA experimental validation through sequencing
Experimental approach:
– Purify small RNAs (15–35 nt)
– Deep sequencing of the RNA library
– Map sequence traces to the genome
Ruby JG, et al. Genome Res, 2007
miRNA Target prediction
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
• Predicting miRNA targets in plants is easier, due to the perfect complementarity to the miRNAs
• In animals, perfect complementarity is not common
– miRNA seed complementarity (6 to 9 nt)
– High false-positive rate
• Common approach
– Experimental evidence: validated miRNA/target pairs
– TarBase, miRecords
• Computational methods:
– Base-pairing rules and binding-site sequence features
– Conservation
– Thermodynamics
Base-pairing rules
Bartel, D.P. Cell, 2009.
Seed:
• 6–9 nt, usually starting at miRNA position 2 (P2)
• Position 1 (P1) is typically unpaired, and the miRNA often starts with U
• Often flanked by A
• Usually no G:U wobbles (they tend to reduce regulation)
Site types (figure): canonical sites, atypical sites, 5′-dominant sites and 3′-compensatory sites; example: lsy-6 / cog-1 3′UTR
3′-compensatory pairing may compensate for insufficient base pairing in the seed
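The seed rules above can be turned into a simple site scanner. This sketch uses the commonly cited canonical site categories (8mer, 7mer-m8, 7mer-A1, 6mer, as described in Bartel 2009), allows only Watson-Crick pairs in the seed, and ignores conservation, thermodynamics and accessibility; `seed_sites` is a hypothetical name, and the example uses the human let-7a-5p sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(s: str) -> str:
    """Watson-Crick reverse complement; unknown bases become '?'."""
    return "".join(COMPLEMENT.get(b, "?") for b in reversed(s))


def seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (start of the 6-nt core match in utr, site type) for each
    canonical seed site. Both inputs are 5'->3'; T is read as U."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp(mirna[1:7])               # pairs with miRNA positions 2-7
    m8 = COMPLEMENT.get(mirna[7], "?")       # UTR base pairing with position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8   # 5' neighbour of the core
        has_a1 = end < len(utr) and utr[end] == "A"   # base opposite position 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"   # human let-7a-5p
print(seed_sites(LET7A, "AAACUACCUCAGGG"))  # [(4, '8mer')]
```

Because the mRNA pairs antiparallel to the miRNA, the base opposite miRNA position 8 lies on the 5′ side of the core match and the base opposite position 1 on its 3′ side. A scanner like this produces the high false-positive rate mentioned above, which is why the conservation, ΔG and accessibility filters on the next slide are applied on top of it.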
More methods ...
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
• Conservation: search for conserved seeds in the UTRs across different species
• Thermodynamics: evaluation of the ΔG of predicted duplexes, usually < −20 kcal/mol; this discards false positives, but favorable interactions do not always correspond to an actual duplex
• Structural accessibility: the target site on the mRNA should not be involved in any intramolecular base pairing; any existing secondary structure must first be removed
miRNA
Bartel, D.P. Cell, 2009
miRNA gene expression in cancer
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol, 2009.
miRNA in Cancer
Lu, J., et al., Nature, 2005
Carlo Croce 2009
[Figure, panels A–D: (A) K562 cells injected SC; miR-29b or scrambled oligos injection (5 µg); (B) tumor volume (mm³) at days 0, +3, +7, +10 and +14 for mock, scrambled and miR-29b (* P<0.003); (C) tumor weight (grams), scrambled vs miR-29b (P<0.001); (D) photographs]
(A) Diagram illustrating the experimental design of the mouse xenograft experiment.
(B) Tumor volume determinations at the indicated days during the experiment for the three groups: mock (n = 6), scrambled (n = 12) and synthetic miR-29b (n = 12).
(C) Average tumor weight of the scrambled and synthetic miR-29b treated mouse groups at the end of the experiment (day +14). P-values were obtained using a t-test. Bars represent ±S.D.
(D) Photographs of two mice injected with miR-29b (left flank) or scrambled (right flank).
MiR-29b inhibits leukemic growth in vivo.
miRNAs as tumor suppressors
miR DBs
• Published miRNAs
• Experimentally supported targets
• Prediction of miRNA targets
• miRNA–disease relationships reported in the literature