download - Cal State LA - Instructional Web Server


Dec 13, 2013 (3 years and 9 months ago)


Nathaniel Gustafson

Dr. Garry Larson (City of Hope)


Cancer Studies


Can we tie genetic variation (e.g. SNPs) to cancer risk?


Myriad Genetics found BrCa1 and BrCa2


Mutations in BrCa1/2 tied to 800% increase in breast cancer risk


Most research is on exonic regions


Changes protein composition




http://members.cox.net/amgough/Fanconi-genetics-genetics-primer.htm

Our approach


What about regulatory regions?


Motif:


Recurring sequence, usu. 6-20 bp.


Generally functional


Hypothesis: Regulatory motifs upstream of the transcriptional start site (TSS) may play some role in breast cancer


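[Figure: gene schematic, 5' to 3', marking the 5' upstream region, the ATG start codon, and the stop codon. Example upstream motifs: YGCGYRCGC, ATCMNTCCGY, TGAYRTCA, GCTNWTTGK, ...]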
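The motifs above are written with IUPAC degenerate base codes (for example Y = C or T, N = any base). A minimal sketch of how such a motif could be located in a sequence follows. The function names and the toy sequence are illustrative assumptions, not the project's actual code (the slides name Perl as the tool used), and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex; the lookahead allows overlapping hits."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + pattern + "))")

def find_motif(sequence, motif):
    """Return 0-based start positions of every motif hit on the forward strand."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]

# Toy sequence (hypothetical) containing one hit for TGAYRTCA.
print(find_motif("AATGACGTCATT", "TGAYRTCA"))
```

Scanning the reverse complement as well would be a natural extension, since regulatory sites can sit on either strand.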

Disease Mutations in Phylogenetically Conserved Motifs

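[Figure: alignment of an 11-bp site across sequences at increasing divergence times, plus a breast cancer mutation. Orthologous bases: identical by descent. Nonorthologous bases shown in red.]

G A C C T A C T A C A
G A G C T A C T A C T   ~5 myr
G A C T T A A T T C A   ~70 myr
G A G C T A C - A G A   ~300 myr
G A G T T A A T G G T   ~475 myr
G A C C T T C T A C A   BrCa Pt. Mutation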


Background


Meta-analysis pools several BrCa ER+/ER-* studies

Statistics used to find genes that have consistent differences in expression levels in ER+ vs. ER- cell lines






*ER = Estrogen Receptor, a common way of classifying breast cancer cells


[Figure: GCCATnTT x50 vs. GCCATnTT x9]


Aims


Investigate regulatory motifs for these genes


Compare occurrences of each motif across gene sets



Hypothesis: genes overexpressed in the same tumor type share motifs


Weak Results



Are we missing the signal?


Old counting method

[Figure: motif occurrences along the upstream region (axis: -2000, -1500, -1000, -500, TSS) for the ER+ < ER- gene set and the ER+ > ER- gene set. Counts shown: 15 and 10. P-val: .30, NOT significant]


Are we missing the signal?


New counting method: use position bias

[Figure: motif occurrences along the upstream region (axis: -2000, -1500, -1000, -500, TSS) for the ER+ < ER- gene set and the ER+ > ER- gene set. Counts shown: 12 and 3. P-val: .03, significant]
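The difference between the two counting methods can be sketched as follows: the old method counts every motif hit in the upstream region, while the new one keeps only hits near the motif's position bias. This is an illustration under stated assumptions, not the original implementation: positions are taken relative to the TSS (negative = upstream), the hit positions are made up, and `window` is interpreted as a maximum distance from the bias, which the slides do not specify.

```python
def count_all(hit_positions):
    """Old method: every hit in the scanned upstream region counts."""
    return len(hit_positions)

def count_near_bias(hit_positions, position_bias, window=100):
    """New method: count only hits within `window` bp of the motif's position bias."""
    return sum(1 for pos in hit_positions if abs(pos - position_bias) <= window)

# Hypothetical hit positions relative to the TSS (not real data).
hits = [-1800, -120, -30, -24, 60]
print(count_all(hits))                    # all hits in the region
print(count_near_bias(hits, -24, 100))    # only hits close to the bias
```

Restricting the count this way discards hits far from where the motif tends to occur, which is one plausible reason the signal sharpens between the two slides.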

Tools


Perl


Handy scripting language


Great for parsing textual data


MySQL


Storage and retrieval of structured data

www.yusoft.net/yu-graph/main/logo-mysql.jpg


Problems


Lack of data specificity


What do Xie's pos. biases mean?


Insufficient data


Needed position of motif relative to TSS


Improperly annotated data


Position shown to be inconsistent


Collaboration


Norway is about 10 time zones away

Results

Reading 100 bp from positional bias

Motif | 1down count | 1up cnt | 5down count | 5up cnt | pos. bias | Pval top1 | Pval top5
SCGGAAGY | 5 | 8 | 36 | 71 | -24 | 0.40116 | 0.00019
... | ... | ... | ... | ... | ... | ... | ...

No window (Previous results)

Motif | 1down count | 1up cnt | 5down count | 5up cnt | pos. bias | Pval top1 | Pval top5
SCGGAAGY | 31 | 41 | 168 | 206 | -24 | 0.10299 | 0.00719
... | ... | ... | ... | ... | ... | ... | ...
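The slides do not say which statistical test produced the P-values. As one possible approach, and only as an assumption, a two-sided Fisher's exact test on a 2x2 table (genes with and without a motif hit, in the up- versus down-regulated set) could be sketched like this. The gene-set sizes in the usage line are invented for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: genes with / without the motif in set 1
    c, d: genes with / without the motif in set 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x motif-bearing genes in set 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical: 71 of 400 up-regulated vs. 36 of 400 down-regulated genes carry the motif.
print(fisher_exact_two_sided(71, 400 - 71, 36, 400 - 36))
```

Because the true set sizes and test are unknown here, the printed value is not expected to reproduce the P-values in the tables.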


Any SNPs in this motif?


One SNP was found from HapMap in this motif

But it was at a degenerate position (e.g. Y = C or T), so both alleles still satisfied the motif


Might still affect expression
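Whether a SNP allele still satisfies a degenerate motif position can be checked directly against the IUPAC code at that position. The sketch below is illustrative only: the function name is hypothetical, and the SNP offset and alleles in the usage lines are made up, since the slides do not identify the SNP.

```python
# Standard IUPAC nucleotide codes as sets of allowed bases.
IUPAC_SETS = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"}, "B": {"C", "G", "T"},
    "D": {"A", "G", "T"}, "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

def alleles_matching(motif, offset, alleles):
    """Map each allele to True if it is allowed at motif[offset] (0-based)."""
    allowed = IUPAC_SETS[motif[offset].upper()]
    return {allele: allele.upper() in allowed for allele in alleles}

# Hypothetical C/T SNP at the final (Y) position of SCGGAAGY: both alleles fit.
print(alleles_matching("SCGGAAGY", 7, "CT"))
# Hypothetical A/G SNP at a fixed A position: the G allele breaks the motif.
print(alleles_matching("SCGGAAGY", 4, "AG"))
```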

Biological Significance


SCGGAAGY found more in ER+ overexpressed genes

Known as a binding site for ELK-1

Might provide some insight into ER+/ER- cell differentiation

Verification in vivo remains to be done

[Flowchart: SNP triage pipeline. Boxes, in the order extracted:]

3'UTR Motif List: 6/7mer miRNA seeds; phylogenetic conser. motifs
HapMap
BrCa GWAS Datasets: Hunter, et al. (CGEMS); Gold, et al. (MSKCC); Easton, et al. (UK) (unavailable); Stacey (deCode) (unavailable)
SNP_list
SNPs Rank & Biological Testing
BrCa Somatic Mutations (Sjöblom)
Linkage Studies in BrCa (Smith, et al.)
LOH (aCGH) in BrCa
Thermodynamic Profiling (STarMir, PITA)
In-House Independent Association Studies
3'UTR-luc Fusion Assay
Reciprocal Allelic Testing - Effect
Evolutionary Conservation (miRNA seeds)
LD Mapping Proxy SNPs
Allele frequency in HapMap Population(s)
Reciprocal Allelic Testing - no effect
Additional Biological Testing

GWAS


“Genome Wide Association Study”


Genotypes cases and controls at thousands of loci


Intended to be an unbiased approach


Potentially identifies pertinent mutations

http://www2.bioinformatics.tll.org.sg/img/species/karyotype_Homo_sapiens.png

BrCa GWAS Datasets

Study | Assay Platform | Cases/Controls | Comment_1 | Comment_2 | Public Dataset
Hunter, et al. (Nat Genet 39, 2007) | Illumina Hap 550 (keep 528K) | 1,145 / 1,142 | Prospective, post-menopausal women | Logistic Regression | YES (CGEMS)
Easton, et al. (Nature 447, 2007) | Affy, 266k SNPs (keep 227k) | Stage 1 - 380/364; Stage 2 - 3,990/3,916 ctrls; Stage 3 - 21,860/22,578 ctrls | Stage 1 cases: 2 first-degree relatives with FamHx. 3 stage association: Stage 2 - top 5% of Stage 1; Stage 3 - top 30 SNPs from Stage 2 | TNRC9 high score | NO
Stacey, et al (Nat Genet 39, 2007), deCode Dataset | Illumina Hap300 (keep 311k) | 1,600 Icelandic cases / 11,563 ctrls | Top 10 SNPs GTP'd in 2nd Icelandic sample and 2-3 ind. European cohorts | 1 SNP strong LD with 999 BRCA2 - removed from study. Found SNP near TNRC9 | NO
Gold, et al. (PNAS 105, March 2008) | Affy GTP 435K SNPs (keep 150k) | 249 AJ Fam Hx (3 cases, BRCA1 & 2 neg) vs. 299 Ca-free AJ ctrls | 3 stage design | Reproduced FGFR2 region | MAYBE?


YES

Bring this...

SM70 SG74 LF52 SM17 SH14 L5721 SM56 SF63 L5957 L5349 L5420 L5713 SH5 LF48 SJG4

L6029 SG21 L5352 L6121 SG69 L5952 SM78 SM113 SF23 L5573 SN6 SF1 SM91 L5895 L5518

L5501 L5328 L5772 SG08 SG28 SM52 SM106 SM67 L5463 L5494 SA17 L5796 L6014 SN15

rs2180341 chr6 127642323 + ncbi_b35 MSKCCOffit AffyEAv3 PhaseIGold_et_al TT TT TT CT CT CC TT CT TT TT TT TT CT TT CT TT CT TT TT CT TT TT TT CT CT CT CT CT CT CT TT CT CT TT TT CT CC TT TT CT TT TT CT TT CT CT CT CT TT TT TT CT CT CT TT TT CT TT TT TT TT TT TT CT TT TT TT TT CT TT TT CT TT TT TT CT CT TT CT CT TT CT TT ...

rs6569480 chr6 127663441 + ncbi_b35 MSKCCOffit AffyEAv3 PhaseIGold_et_al GG GG GG GG AG AA GG AG GG GG GG GG AG GG GG GG AG GG GG AG GG GG GG AG AG AG AG AG AG AG GG AG AG GG GG AG AA GG GG AG GG GG AG GG AG AG AG AG GG GG GG AG AG AG GG GG AG GG GG NN GG GG GG AG GG GG GG GG AG GG GG AG GG GG GG AG GG GG AG AG GG AG GG ...

rs_num | chr | pos | analysis_name | p_value | OR_het | OR_hom | build
rs10510126 | chr10 | 124992475 | chi square-genotype | 4e-06 | 0.5918 | 0.6387 | ncbi_b36
rs10510126 | chr10 | 124992475 | chi square-allele | 2e-06 | 0.5918 | 0.6387 | ncbi_b36

To this...
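Going from per-sample genotype calls to a per-SNP association statistic can be sketched as below. This is a simplified illustration, not the pipeline actually used: it computes an allelic chi-square test (1 degree of freedom) and an allelic odds ratio, whereas the table above reports genotype-based OR_het and OR_hom. Which samples are cases or controls is not given in the slides, so the genotype lists in the usage example are hypothetical.

```python
import math
from collections import Counter

def allele_counts(genotypes, ref, alt):
    """Count ref and alt alleles, skipping calls that are not ref/alt (e.g. 'NN')."""
    counts = Counter()
    for call in genotypes:
        if len(call) == 2 and set(call) <= {ref, alt}:
            counts.update(call)
    return counts[ref], counts[alt]

def allele_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) on the 2x2 allele table; returns (chi2, p, odds_ratio)."""
    a, b = case_counts       # ref, alt alleles in cases
    c, d = control_counts    # ref, alt alleles in controls
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0 or b * c == 0:
        raise ValueError("table has an empty margin or undefined odds ratio")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p_value, (a * d) / (b * c)

# Hypothetical case/control genotype calls for one SNP (not the Gold et al. data).
cases = ["TT", "CT", "CT", "CC", "CT", "NN"]
controls = ["TT", "TT", "CT", "TT", "TT", "CT"]
print(allele_chi_square(allele_counts(cases, "T", "C"),
                        allele_counts(controls, "T", "C")))
```

With counts this small an exact test would be more appropriate; the chi-square form is shown because it matches the analysis names in the table.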

Future Work


We’ve digested the Gold data set


Employ this in the triage for producing a gene list


Combine with other triage methods to find the most interesting genes

Test these in vivo

Special Thanks


Dr. Garry Larson


SoCalBSI program

SoCalBSI mentors


City of Hope

Funding


Komen for the Cure

National Science Foundation

National Institutes of Health

Employment and Workforce Development

References



Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A. 2007 Apr 24;104(17):7145-50.

Smith D, Sætrom P, Snøve O Jr, Lundberg C, Rivas G, Glackin C, Larson G. Meta-analysis of breast cancer microarray studies in conjunction with conserved cis-elements suggest patterns for coordinate regulation. BMC Bioinformatics 2008, 9:63.