20110601_WRM_PiInfectionPromMotifx - College of Wooster


Oct 4, 2013




An Exceptional Motif in Infection-Induced Promoters of P. infestans

Regulatory Genomics


During an organism’s lifetime, sets of genes are specifically turned “on” and “off”.

A major goal of bioinformatics is to decipher the circuitry that underlies this regulation.


Case Study: Phytophthora infestans

Major plant pathogen

Causes late blight of potato and tomato

1840s: Irish potato famine; today, resurging

Eukaryote

(Kamoun and Smart, 2005)

Case Study: Phytophthora infestans

During infection, intimate interaction with host

Transcriptional changes (Haas et al., 2009)

>450 genes are induced >2x

>100 genes are repressed

Transcriptional Regulation

Gene control mostly at transcriptional level

Mediated by transcription factors (TFs)

TFs bind specific DNA sites (TFBSs)

(Alberts et al., Molecular Biology of the Cell)

Discovering TFBSs


TFBSs can be identified experimentally, but this is laborious

Computational prediction promises to speed up TFBS discovery

[Figure: mRNA 1 levels, scored ++, +, -]

Computational Discovery of TFBSs

Computational challenge:

Short sequences (6-12 bp), so high noise

Variant binding sites: motifs

Sequence & location often unknown

General strategies:

Genomic comparisons for conserved elements

Over-represented motifs in co-regulated genes


ATG *CTGAAT GTA
    *CTATAT AGTAAT
CTGT *CAATAT GT
AAC *CTAATT GTT
    *CAGATT TCCCAC
CTCGA *CAAATT T
ACT *CAGATT CTC
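The highlighted runs in the sequences above can be summarized as a motif. A minimal sketch, assuming each highlighted run is one 6-bp occurrence of the same binding site; the function names are illustrative and not from the slides:

```python
from collections import Counter

# The seven highlighted 6-bp runs, read off the example sequences.
SITES = ["CTGAAT", "CTATAT", "CAATAT", "CTAATT", "CAGATT", "CAAATT", "CAGATT"]

def position_counts(sites):
    """Count the bases seen at each motif position (a position frequency matrix)."""
    return [Counter(column) for column in zip(*sites)]

def consensus(sites):
    """Most frequent base at each position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(sites))

if __name__ == "__main__":
    for i, counts in enumerate(position_counts(SITES), start=1):
        print(i, dict(counts))
    print("consensus:", consensus(SITES))
```

The per-position counts show why a single fixed word is a poor description of a binding site: only the first and last positions are invariant in this example.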

Steps to TFBS Discovery

1. Identify co-regulated genes

400+ genes induced >2x on day 2 of infection

2. Collect presumed gene regulatory regions

Putative promoters: 1 kb before start codon

3. Select motif model

Length? 8-mers

Gapped/ungapped? No gaps (ungapped)
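Step 2 can be sketched as follows. This is only an illustration: the coordinate convention (0-based index of the first base of the start codon, given on the forward strand) and the function names are assumptions, not taken from the slides.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(contig, start_pos, strand, length=1000):
    """Return up to `length` bp immediately upstream of a start codon.

    start_pos is the 0-based forward-strand index of the first base of the
    start codon (the A of ATG). For '-' strand genes the upstream region lies
    to the right on the forward strand and is reverse-complemented.
    """
    if strand == "+":
        return contig[max(0, start_pos - length):start_pos]
    if strand == "-":
        return reverse_complement(contig[start_pos + 1:start_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")

if __name__ == "__main__":
    print(upstream_region("AAACCCATGGGG", 6, "+", length=3))
    print(upstream_region("CCCCATTTTGGG", 5, "-", length=3))
```

Regions are truncated at contig ends, so some putative promoters may be shorter than 1 kb.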



Steps to TFBS Discovery

4. Select search method

Alignment-based (MEME, AlignACE)

Enumerative (WordSeeker)

List and count all words of chosen length

5. Measure exceptionality of each word

Compute probability of each observation given expectations
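The enumerative step ("list and count all words of chosen length") can be sketched in a few lines. This is not WordSeeker's implementation; it is a minimal illustration that counts overlapping words on the given strand only and skips words containing ambiguous bases, both of which are assumptions.

```python
from collections import Counter

def count_words(sequences, k=8):
    """Count every overlapping k-letter word across a set of sequences."""
    counts = Counter()
    valid = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= valid:  # skip words with N or other symbols
                counts[word] += 1
    return counts

if __name__ == "__main__":
    promoters = ["ACGTACGT", "acgtn"]  # toy stand-ins for 1 kb promoters
    for word, n in count_words(promoters, k=4).most_common():
        print(word, n)
```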

Expectations & Models

What you expect depends upon your model

For example, how often would you expect the word CGCGCG to occur in 10 kb?

Expectations & Models

Simplest (Bernoulli) model

Each base occurs independently of the previous ones

If there are 4 occurrences, is this exceptional?
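The CGCGCG question can be worked through under the Bernoulli model. A minimal sketch, assuming all four bases are equally likely (p = 0.25 each, which the slides do not state) and using a plain Poisson tail, which ignores the fact that CGCGCG can overlap itself:

```python
import math

def expected_count_bernoulli(word, seq_len, base_prob=0.25):
    """Expected occurrences of `word` in a random sequence of independent bases."""
    positions = max(0, seq_len - len(word) + 1)
    return positions * base_prob ** len(word)

def poisson_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i)
                for i in range(observed))
    return 1.0 - below

if __name__ == "__main__":
    mean = expected_count_bernoulli("CGCGCG", 10_000)
    print("expected:", mean)
    print("P(X >= 4):", poisson_tail(4, mean))
```

Under these assumptions the expectation is 9995 / 4096, a little under 2.5 occurrences, so 4 occurrences would not be exceptional. A different base composition or a different model would change that conclusion.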

Models & Expectations

Markov chain model

Next bp depends on previous bp(s)

But which Markov order is most appropriate?

And calculations are complex if the word can self-overlap
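A first-order Markov chain expectation can be sketched as below. The choice of order 1 and the use of a single background string are simplifying assumptions made for illustration; with many separate background sequences, pairs spanning sequence boundaries should not be counted.

```python
from collections import Counter

def first_order_model(background):
    """Estimate base frequencies and transition probabilities P(next | previous)."""
    total = len(background)
    freq = {b: c / total for b, c in Counter(background).items()}
    pair_counts = Counter(background[i:i + 2] for i in range(total - 1))
    from_counts = Counter(background[:-1])  # bases that have a successor
    trans = {pair: c / from_counts[pair[0]] for pair, c in pair_counts.items()}
    return freq, trans

def word_probability(word, freq, trans):
    """P(word at a given position) = P(w1) * product of P(w_i | w_{i-1})."""
    p = freq.get(word[0], 0.0)
    for prev, nxt in zip(word, word[1:]):
        p *= trans.get(prev + nxt, 0.0)
    return p

def expected_count(word, seq_len, freq, trans):
    """Expected number of occurrences in a sequence of length seq_len."""
    positions = max(0, seq_len - len(word) + 1)
    return positions * word_probability(word, freq, trans)

if __name__ == "__main__":
    freq, trans = first_order_model("ACGT" * 50)
    print(expected_count("ACGT", 200, freq, trans))
```

Higher orders capture more sequence context but need more background data to estimate reliably, which is one side of the "which order?" question.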

Models & Expectations

Compound Poisson Model

Clump occurrence probability (frequency)

Clump size distribution (self-overlapping probability)

Parameter estimation

Markov chain model

Direct estimation based on “background”

22,657 P. infestans 1 kb upstream sequences
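The two ingredients above (clump frequency and clump size distribution) can be combined in a geometric-Poisson (Pólya-Aeppli) approximation. The sketch below is not the MoSDi calculation and does not use a Markov background: it assumes independent, equally likely bases for simplicity, and every function name is illustrative.

```python
import math

def periods(word):
    """Shifts p at which the word overlaps itself (word[p:] == word[:-p])."""
    k = len(word)
    return [p for p in range(1, k) if word[p:] == word[:k - p]]

def principal_periods(word):
    """Periods that are not strict multiples of the smallest period."""
    ps = periods(word)
    if not ps:
        return []
    p0 = ps[0]
    return [p for p in ps if p == p0 or p % p0 != 0]

def compound_poisson_tail(word, n_positions, observed, base_prob=0.25):
    """Approximate P(count >= observed) with geometrically sized clumps."""
    if observed <= 0:
        return 1.0
    mu = base_prob ** len(word)                 # word probability per position
    a = sum(base_prob ** p for p in principal_periods(word))  # overlap prob.
    lam = n_positions * mu * (1.0 - a)          # expected number of clumps
    probs = [math.exp(-lam)]                    # P(count = 0)
    for n in range(1, observed):
        # Recursion for a compound Poisson sum with clump sizes
        # P(size = j) = (1 - a) * a**(j - 1).
        total = sum(j * (1.0 - a) * a ** (j - 1) * probs[n - j]
                    for j in range(1, n + 1))
        probs.append(lam / n * total)
    return max(0.0, 1.0 - sum(probs))

if __name__ == "__main__":
    print(principal_periods("CGCGCG"))
    print(compound_poisson_tail("CGCGCG", 9995, 4))
```

For a word with no self-overlap the overlap probability is zero and the result reduces to an ordinary Poisson tail; for a self-overlapping word such as CGCGCG the tail is somewhat heavier.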

Infection-specific Promoter Motifs

Of the 50K+ 8-mers, how many were exceptionally frequent?

Because of multiple testing, used Bonferroni correction for a family-wise error (FWE) rate of 0.01

p-value cutoff of < 2 x 10^-7 (about 0.01 / 50,000)

Computed p-values with custom R scripts and MoSDi

Warning! Calculation errors with WordSeeker

Result: 16 exceptional words
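The Bonferroni cutoff is simply the family-wise error rate divided by the number of tests. A minimal sketch, with an illustrative test count of 50,000 standing in for the "50K+" words (the exact number is not given in the slides):

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def exceptional_words(p_values, alpha=0.01):
    """Keep the words whose p-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return {word: p for word, p in p_values.items() if p < cutoff}

if __name__ == "__main__":
    print(bonferroni_cutoff(0.01, 50_000))
```

With roughly 50,000 tests, 0.01 / 50,000 gives a cutoff of 2e-7.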


Infection-specific Promoter Motifs

Word/Motif*      Observed#   Expected#   p-value
TGTACATG         60          25.5        6.51E-09
AGTACATG         50          20.9        4.69E-08
GTACATGT         101         42          2.06E-14
GTACCGGT         57          20.1        1.57E-11
TACCGGTA         112         43.6        4.38E-18
ACCGGTAC         67          19.6        6.84E-17
ACATGTAC         96          45.1        4.93E-11
CATGTACA         61          27.6        4.05E-08
ATGTACAT         39          14.2        6.26E-08
-GTACMKGTACA-    70          19.1        1.78E-18

Infection-specific Promoter Motifs

Word/Motif*      Observed#   Expected#   p-value
GTACAGTA         76          28.7        2.94E-13
GTACTGTA         66          30.2        1.40E-08
TACTGTAC         66          28.6        2.00E-09
TACAGTAC         83          30.4        5.40E-15
ACAGTACA         44          16.9        4.49E-08
-GTACWGTACA-     80          24.5        5.51E-18
TATTAATA         80          40.8        3.21E-08

Infection-specific Promoter Motifs

Are variant motifs exceptional?

Computed p-values of both motifs allowing 1 mismatch (MoSDi)

Altogether, 229/436 of infection-induced promoters contain 1+ degenerate motif

Motif (H=1)*     Observed#   Expected#   p-value
GTACMKGTAC       298         129.9       1.04E-23
GTACWGTACA       320         165.3       2.96E-16
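Matching a degenerate motif with one allowed mismatch (H=1, read here as Hamming distance 1) can be sketched as follows. This is not how MoSDi works internally; it is a brute-force scan of the given strand only, with illustrative function names.

```python
# IUPAC nucleotide codes: M = A/C, K = G/T, W = A/T, and so on.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "W": "AT", "S": "CG", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(motif, window):
    """Number of positions where the base is not allowed by the motif symbol."""
    return sum(base not in IUPAC[sym] for sym, base in zip(motif, window))

def has_motif(seq, motif, max_mismatches=1):
    """True if any window of seq matches the motif within max_mismatches."""
    seq = seq.upper()
    k = len(motif)
    return any(mismatches(motif, seq[i:i + k]) <= max_mismatches
               for i in range(len(seq) - k + 1))

def fraction_with_motif(promoters, motifs, max_mismatches=1):
    """Fraction of promoters containing at least one of the motifs."""
    hits = sum(any(has_motif(p, m, max_mismatches) for m in motifs)
               for p in promoters)
    return hits / len(promoters)

if __name__ == "__main__":
    toy = ["TTGTACATGTACTT", "TTGTACAAGTACTT", "AAAAAAAAAAAA"]
    print(fraction_with_motif(toy, ["GTACMKGTAC", "GTACWGTACA"]))
```

Applied to the real promoter set, a count of this kind is what a figure such as 229/436 summarizes, although the slides do not say whether both strands were scanned.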

Infection-specific Promoter Motifs


Next: Experimentally confirm motif’s function


Mutate


Reintroduce


Measure transcription levels

Take Home Lessons


Explore in depth


Double-check results when possible

Bioinformatics yields refined hypotheses that require experimental verification


Acknowledgments


Thanks to the Ohio University Bioinformatics
Group for helpful discussions and support


Lonnie Welch


Xiaoyu (Veronica) Liang


Further Exploration


DNA, Words and Models by Robin et al., 2005

Understanding Bioinformatics by Zvelebil & Baum, 2008

WordSeeker <http://word-seeker.org/>

MoSDi: Motif Statistics and Discovery <http://code.google.com/p/mosdi/>