Lecture 17 - The Ohio State University


Department of Biomedical Informatics

Bioinformatics and Genetics

Kun Huang

Department of Biomedical Informatics

OSUCCC Biomedical Informatics Shared Resource

The Ohio State University


2011



Outline


Introduction


Genetic variations


Technologies


Array-based technology


Massive sequencing


Genome wide association study (GWAS)


SNP array


Exome sequencing


Genome re-sequencing


Expression quantitative trait loci (eQTL)


Allelic specific expression




Genetic Variations


SNP


In-Del


Transposon


Copy number variation


LOH


Gene fusion





Single Nucleotide Polymorphism (SNP)

At least 1% of a population has a different nucleotide


There are many other classes of variants (e.g., deletions and
duplications), and these are no less important; SNPs are simply the most
abundant.


First SNPs: RFLPs

D. Botstein, 1980

The single nucleotide polymorphism (SNP) [pronounced "snip"] is the
most common form of genetic variation. As the name suggests, each SNP
is a difference in a single nucleotide (A, T, C, or G) of an individual's DNA
sequence, such as having AAGG instead of ATGG. There may be from 1 to
10 million SNPs in the entire human genome, but perhaps only a few
thousand relate to disease outcomes. The numbers seem to change with
every news report.


Critical SNP concepts

Marker SNP vs. Functional SNP

SNPs highlight the spots to search (features, regions of interest).


SNP patterns from a target population can be compared with SNP patterns from
unaffected populations to find genetic variations shared only by the affected group.


The most useful SNPs are known as "functional SNPs." A single functional
SNP or certain combinations of functional SNPs may help explain variability
in individual responses to a given drug or pinpoint the subtle genetic
differences that predispose some to diseases such as arthritis, Alzheimer's,
cancer, diabetes, and depression.


Critical SNP concepts


Understand evolution



DNA fingerprinting: forensic applications



Markers for polygenic traits



Genotype-specific medicine (personalized medicine)


Critical SNP concepts

1. Humans are diploid and exhibit significant heterogeneity
and heterozygosity

2. DNA is essentially identical in every cell

3. The closer two SNPs are, the less likely they are to have been
separated by recombination in a population (linkage disequilibrium)

4. Multiple variants/alleles can be combined into haplotypes
(polygenic markers: quantitative trait loci or QTL)
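
Point 3 can be made concrete with a small calculation. The sketch below computes two common linkage disequilibrium measures, D and r², from haplotype counts for a pair of biallelic SNPs. The function name and the counts are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic SNPs.

    Arguments are counts of the four possible haplotypes, where A/a are
    the alleles at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A
    p_B = (n_AB + n_aB) / total         # frequency of allele B
    d = p_AB - p_A * p_B                # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Hypothetical counts: two SNPs that are always inherited together ...
d_linked, r2_linked = linkage_disequilibrium(50, 0, 0, 50)
# ... and two SNPs whose alleles combine independently.
d_indep, r2_indep = linkage_disequilibrium(25, 25, 25, 25)

print(d_linked, r2_linked)
print(d_indep, r2_indep)
```

An r² near 1 means one SNP can stand in for the other, which is the idea behind tag SNPs; an r² near 0 means the two SNPs carry independent information.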



HapMap


The International HapMap Project is a multi-country effort to
identify and catalog genetic similarities and differences in
human beings. Six participating countries: Japan, the United
Kingdom, Canada, China, Nigeria, and the United States.


The goal is to compare the genetic sequences of different
individuals to identify chromosomal regions where genetic
variants are shared.


Data generated by the Project can be downloaded with minimal
constraints.


http://www.hapmap.org/index.html.en


NCBI SNP



In-Del


Transposon


Aneuploidy




Keiko et al, Genome Research 2008


SNP Array

Affymetrix SNP 6.0 array


More than 906,600 SNPs:


Unbiased selection of 482,000 SNPs; historical SNPs from the SNP Array 5.0


Selection of additional 424,000 SNPs


Tag SNPs


SNPs from chromosomes X and Y


Mitochondrial SNPs


New SNPs added to the dbSNP database


SNPs in recombination hotspots


More than 946,000 copy number probes




SNP Array

Affymetrix SNP 5.0 array



Cytogenetics


CGH


Comparative Genomic Hybridization


$1000 genome project


Solexa

SOLiD

454

Re-sequencing using massively parallel sequencers


GWAS


Focus is on SNPs


Control vs. case


Chi-square based test


Distribution of haplotypes in different conditions


Contingency table


Other statistics or metrics can also be used
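
The chi-square test on a contingency table can be sketched in a few lines. The example below is a minimal illustration for a single SNP, assuming a 2x2 table of allele counts in cases and controls; the counts and the function name are hypothetical. For one degree of freedom the p-value has a closed form through the complementary error function, so only the standard library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for a 2x2 contingency table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts at one SNP.
#            allele A  allele G
counts = [[60, 40],   # cases
          [40, 60]]   # controls

chi2, p_value = chi_square_2x2(counts)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4g}")
```

In a real study this test would be repeated for every SNP on the array, which is what creates the multiple-testing problem.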



GWAS


Statistical challenges


Millions of SNPs: millions of tests


Compensate for multiple tests


P-value cutoff is very stringent


Needs a lot of samples (thousands or more) to achieve
the necessary power


Rare event detection is statistically challenging
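
To see why the cutoff becomes so stringent, here is a small sketch using the Bonferroni correction, one common way to compensate for multiple tests (the slides do not name a specific method, so this choice is an assumption). The significance level of 0.05 and the count of one million tests are illustrative values.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate
    at or below alpha when n_tests hypotheses are tested."""
    return alpha / n_tests


# Illustrative values: 5% family-wise error rate, one million SNPs tested.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"per-SNP cutoff: {threshold:.1e}")

# A p-value that looks convincing for a single test no longer passes.
single_test_p = 0.001
print(single_test_p < threshold)
```

With a cutoff this small, only very strong effects or very large sample sizes produce significant results, which is the power problem noted above.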


GWAS


Interpretation challenges


Association is NOT causation


Many SNPs are in intergenic regions (not in genes)


For SNPs in genes, most do NOT affect protein coding: what are they doing?


Because of the stringent cutoff, many potentially associated
genes are not selected, which makes it hard to infer
higher-level information such as pathways


GWAS


Integration of bioinformatics information


Pathway information: not necessarily the same genes are targeted, but it
could be the same pathways


Other annotations: networks, GO terms


Frequent patterns: data mining using frequent itemsets on SNPs


Frequent set mining on pathways (not just genes)


The only phenotypes are disease vs. control: how about other phenotypes?



Quantitative Trait Locus (QTL)


Quantitative phenotype: phenotype attributed to
multiple genes (polygenic effects)


Examples: height, longevity


Multiple genes + environment


QTLs: stretches of DNA containing or linked to the genes
that underlie a quantitative trait


Detection: copy number variation, SNPs


Statistical analysis: t statistics (compare the quantitative phenotypes
between the two groups with different genotypes)


Multiple genotype groups: ANOVA (F statistics)


Mutual information
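
For the case of multiple genotype groups, the one-way ANOVA F statistic can be sketched as follows. The trait values for the three genotype groups (AA, AB, BB) are made up for illustration, and the function name is hypothetical. The sketch stops at the F statistic; turning it into a p-value requires the F distribution, for example from a statistics library.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)                         # number of genotype groups
    n = sum(len(g) for g in groups)         # total number of individuals
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individuals around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical quantitative trait values grouped by genotype at one SNP.
trait_by_genotype = {
    "AA": [1.0, 2.0, 3.0],
    "AB": [2.0, 3.0, 4.0],
    "BB": [3.0, 4.0, 5.0],
}

f_stat = anova_f(list(trait_by_genotype.values()))
print(f"F = {f_stat:.2f}")
```

With only two genotype groups the F statistic equals the square of the pooled-variance t statistic, so the two tests listed above agree in that case.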


Expression Quantitative Trait Locus (eQTL)


Gene expression is a quantitative phenotype: phenotype attributed to
multiple genes (what are the possible ones?)


Besides other genes: regulatory elements


eQTLs: most focus on SNP vs. gene expression


3 million SNPs × 20,000 genes: 6 × 10^10 ANOVA tests


Expression Quantitative Trait Locus (eQTL)


Restrict to a small set of SNPs


E.g., for a gene, only focus on the SNPs on the gene


Cis-eQTL (local)


Trans-eQTL (distal)


Direct and indirect effects


Second and third order effects


eQTL networks
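
Restricting each gene to nearby SNPs can be sketched as a simple coordinate filter. Everything in the example below is hypothetical: the `Gene` and `Snp` records, the coordinates, and the `window` parameter. A window of 0 matches the rule on this slide (only SNPs on the gene); a larger window is a common way to also include flanking regulatory regions.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


@dataclass
class Snp:
    snp_id: str
    chrom: str
    pos: int


def cis_snps(gene, snps, window=0):
    """Return the SNPs on the same chromosome as the gene and within
    `window` bases of the gene body (window=0 means on the gene itself)."""
    return [s for s in snps
            if s.chrom == gene.chrom
            and gene.start - window <= s.pos <= gene.end + window]


# Hypothetical gene and SNP coordinates.
gene = Gene("GENE1", "chr1", 10_000, 20_000)
snps = [
    Snp("snp_a", "chr1", 9_500),     # just upstream of the gene
    Snp("snp_b", "chr1", 15_000),    # inside the gene
    Snp("snp_c", "chr1", 900_000),   # same chromosome, far away
    Snp("snp_d", "chr2", 15_000),    # different chromosome
]

on_gene = [s.snp_id for s in cis_snps(gene, snps)]
near_gene = [s.snp_id for s in cis_snps(gene, snps, window=1_000)]
print(on_gene, near_gene)
```

Testing only the SNPs returned by such a filter replaces millions of tests per gene with a handful, at the cost of missing trans (distal) effects.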


Lodish et al, Molecular Cell Biology


RNA-seq


Paradigm changes by NGS


RNA-seq: not only gene expression, but also sequences




TopHat

Trapnell et al. Bioinformatics 2009


After TopHat


You got this:







But you want this:

Cufflinks



Assigns each read to its potential isoform by
maximizing a function that assigns a
likelihood to all possible sets of relative
abundances of the different isoforms.



Open source software

Trapnell et al. Nat. Biot 2010
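
The idea of choosing relative abundances that maximize a likelihood can be illustrated with a toy expectation-maximization (EM) loop. This is not the Cufflinks implementation: it is a simplified sketch that assumes every isoform has the same length and that each read is simply known to be compatible with one or more isoforms. The read data and the function name are hypothetical.

```python
def em_isoform_abundance(read_compat, n_isoforms, n_iter=200):
    """Estimate relative isoform abundances by EM.

    read_compat: list with one tuple per read, holding the indices of
    the isoforms that read could have come from.
    """
    theta = [1.0 / n_isoforms] * n_isoforms   # start from equal abundances
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        for compat in read_compat:
            # E-step: split each read among its compatible isoforms
            # in proportion to the current abundance estimates.
            total = sum(theta[i] for i in compat)
            for i in compat:
                expected[i] += theta[i] / total
        # M-step: new abundances are the normalized expected read counts.
        theta = [c / len(read_compat) for c in expected]
    return theta


# Hypothetical data for a gene with two isoforms:
# 60 reads fall in a shared exon, 30 are unique to isoform 0,
# and 10 are unique to isoform 1.
reads = [(0, 1)] * 60 + [(0,)] * 30 + [(1,)] * 10

abundances = em_isoform_abundance(reads, n_isoforms=2)
print([round(a, 3) for a in abundances])
```

The uniquely assignable reads decide how the ambiguous reads from the shared exon are divided. A model for real data would also need to account for factors this sketch ignores, such as transcript length.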

Cufflinks


From sequence reads to isoforms

Primary aligner: Eland, BFAST, BOWTIE, …

Junction finding strategy: TopHat, SOLiD Bioscope

Isoform identification: Xing et al. NAR 2006; Jiang et al. Bioinformatics 2009; Cufflinks (Nat Biot 2010); Scripture (Nat Biot 2010)




Allelic Specific Expression





Specific X-chromosome suppression



Much broader presence in the genome


Screen for functional SNPs




Allelic Specific Expression





Screen for functional SNPs



A=48 G=89

A=99 G=105
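
One way to screen counts like these for allelic imbalance is an exact binomial test against a 50:50 expectation. The slide does not say what the numbers represent; the sketch below assumes they are read counts for the A and G alleles at a heterozygous SNP in two samples. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against
    a success probability of 0.5 (equal expression of both alleles)."""
    k_low = min(k, n - k)
    # Probability of a result at least as extreme in one tail.
    tail = sum(comb(n, i) for i in range(k_low + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail.
    return min(1.0, 2 * tail)


# Allele counts from the slide, interpreted as (A reads, G reads).
p_first = binomial_two_sided_p(48, 48 + 89)
p_second = binomial_two_sided_p(99, 99 + 105)

print(f"A=48 G=89:  p = {p_first:.2g}")
print(f"A=99 G=105: p = {p_second:.2g}")
```

Under this assumption the first pair of counts departs clearly from equal expression of the two alleles, while the second is consistent with it.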


Allelic Specific Binding


Protein binding requires recognition of specific
sequences (motifs)


Mutations in the binding sites may lead to disruption
of regulation and hence expression




Kasowski et al, Science, 2010.


Allelic Specific Methylation


One of the earliest known mechanisms for allelic specific
expression






Other Allelic Specific Events


Allelic specific splicing




BMC Genomics. 2008 Jun 2;9:265.

Genome-wide survey of allele-specific splicing in humans.

Nembaware V, Lupindo B, Schouest K, Spillane C, Scheffler K, Seoighe C.