Genome Biology and Biotechnology 2005 - CCMM

Genome Biology and Biotechnology

4. The variable human genome

Prof. M. Zabeau

Department of Plant Systems Biology

Flanders Interuniversity Institute for Biotechnology (VIB)

University of Gent



International course 2005

Summary

¤
Sequence Variations in the Human Genome

¤
Haplotype structure of the sequence variations in the
human genome


linkage disequilibrium in the human genome


haplotype blocks in the human genome

¤
The haplotype map of the human genome


Map of all the genetic variations in the human population

Sequence Variations in the Human Genome

¤
Most human sequence variation (>90%) results from


SNPs (single nucleotide polymorphisms)


SNPs are the result of very rare replication errors in which a
wrong base remains incorporated in the newly synthesized
strand

¤
Human sequence variation is responsible for


Phenotypic variation between individuals


Differences in the risk of common human diseases

The International HapMap Consortium, Nature 437, 1299 (2005)

Root causes of common human diseases

¤
Causes of human diseases are largely unknown


preventative measures are generally inadequate


available treatments are seldom curative

¤
Family history is one of the strongest risk factors
for nearly all diseases


cardiovascular disease, cancer, diabetes, autoimmunity,
psychiatric illnesses and many others


inherited genetic variation has an important role in the
pathogenesis of disease

¤
Identifying the causal genes and variants represents
an important step towards


improved prevention, diagnosis and treatment of disease


The International HapMap Consortium, Nature 437, 1299 (2005)

Heritable human diseases

¤
Rare highly heritable 'mendelian' disorders


> 1000 genes have been identified


variation in a single gene causes disease

¤
Common human diseases


are thought to be due to the combined effect of


many different susceptibility DNA variants


interacting with environmental factors


have proven much more challenging to study


The International HapMap Consortium, Nature 437, 1299 (2005)

Common human diseases

¤
Studies of common diseases: 2 broad classes


family-based linkage studies across the entire genome


linkage analysis has low power except when


a single locus explains a substantial fraction of disease


population-based association studies of candidate genes



association studies examine only a small fraction of the
'universe' of sequence variation in each patient

¤
Comprehensive search for genetic influences


examining all genetic differences in a large number of
affected individuals and controls


complete genome resequencing



systematically test common genetic variants
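
A minimal sketch of what testing a single common variant in affected individuals and controls could look like: a Pearson chi-square test on allele counts. The function name and the counts are hypothetical and are not taken from the cited paper; a real study would also correct for multiple testing and population structure.

```python
def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table of allele counts."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were equal in cases and controls
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical data: 1000 case chromosomes and 1000 control chromosomes
chi2 = allelic_chi_square(300, 700, 200, 800)
print(round(chi2, 2))  # 26.67
```

The statistic would then be compared against the chi-square distribution with one degree of freedom.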


The International HapMap Consortium, Nature 437, 1299 (2005)

Common genetic variants

¤
Common genetic variants


explain much of the genetic diversity in our species


a consequence of the historically small size and shared ancestry
of the human population


¤
Common variants with an important role in disease


HLA: autoimmunity and infection


APOE4: Alzheimer's disease, lipids


Factor V Leiden: deep vein thrombosis


PPARG (encoding PPARγ): type 2 diabetes


KCNJ11: type 2 diabetes


PTPN22: rheumatoid arthritis and type 1 diabetes


CTLA4: autoimmune thyroid disease, type 1 diabetes


NOD2: inflammatory bowel disease


complement factor H: age-related macular degeneration


RET: Hirschsprung disease


Sequence Variations in the Human Genome

¤
Most human sequence variation (>90%) results from


SNPs (single nucleotide polymorphisms)


SNPs occur on average every 1,000 bases when the sequences
of two human individuals are compared


Remainder of the human sequence variation is attributable to


insertions or deletions of one or more bases


repeat length polymorphisms


Rearrangements

¤
SNPs are well suited to automated, high-throughput
and low-cost genotyping


SNPs are binary and can thus easily be typed


SNPs have a low rate of recurrent mutation


SNPs are present at sufficient density for comprehensive
genetic analysis

High throughput SNP Genotyping Methods

¤
Primer extension


Primer designed adjacent to the SNP, extended and the
extension product analyzed


Fluorescence


Mass spectrometry



¤
Oligonucleotide ligation


Ligation requires perfect base pairing of the terminal
nucleotides

¤
Array-based hybridization


high-density Affymetrix microarrays


25-mer oligonucleotides are perfectly suited to discriminate SNP
alleles


Latest product: 500,000-SNP array

Figure labels: A/C, T/G

Haplotype structure of the sequence variations

¤
Human genetic diversity appears to be


Limited: a small number of common polymorphisms explain the
bulk of the observed variation,


i.e. are found in most individuals in the population


Structured: specific combinations of alleles (haplotypes) are
observed at closely linked sites

Figure labels: SNP, recombination, Haplotype 1, Haplotype 2, Haplotype 3

Haplotype Structure of the Sequence Variations

¤
At a macroscopic scale (chromosome),


recurrent recombination results in complete
linkage equilibrium


random combinations of SNPs

Figure labels: 1st generation, recombination, N generations, N recombination events, random assortment of SNPs
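
How quickly recombination erodes an association between two sites can be sketched with the standard population-genetics recurrence D_t = D_0 (1 - r)^t. It assumes random mating and a constant recombination fraction r per generation. The function names and the example values of r are illustrative assumptions, not figures from the slides.

```python
import math


def ld_after_generations(d0, r, generations):
    """Expected LD coefficient D after the given number of generations of random mating."""
    return d0 * (1.0 - r) ** generations


def generations_to_halve(r):
    """Number of generations needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - r)


# Sites far apart on a chromosome (r = 0.5): the association vanishes quickly.
far_apart = ld_after_generations(0.25, 0.5, 10)

# Tightly linked sites (r = 0.0001, an assumed value): the association persists.
close_together = ld_after_generations(0.25, 0.0001, 10)

print(far_apart, close_together, generations_to_halve(0.0001))
```

With r = 0.5 the association halves every generation, whereas with the assumed r = 0.0001 it takes several thousand generations to halve. This is the contrast the slides draw between the chromosome scale and the gene scale.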

Haplotype Structure of the Sequence Variations

¤
At a microscopic scale (gene)


Non-random recombination results in
linkage disequilibrium


Non-random combinations of SNPs: haplotype blocks

Figure labels: 1st generation, N generations, haplotype blocks

Linkage disequilibrium in the human genome

¤
Landmark paper presenting


a systematic analysis of the extent of linkage disequilibrium in
the human genome


a large-scale experiment to measure linkage disequilibrium (LD)
in 19 randomly selected genomic regions in


a United States population of north-European descent


a Nigerian population

Reich et al., Nature 411, 199 (2001)

Reprinted from: Reich et al., Nature 411, 199 (2001)

Experimental Approach

¤
Selected 19 high-frequency or common SNPs in genes
as core SNPs


High-frequency SNPs tend to be common in all populations,
facilitating cross-population comparisons


Linkage disequilibrium around common alleles can be measured
with a modest sample size of 80 to 100 chromosomes


Linkage disequilibrium around common alleles represents a
'worst case' scenario


Such alleles are generally old and there has been ample historical
opportunity for recombination to break down ancestral haplotypes

Reprinted from: Reich et al., Nature 411, 199 (2001)

Experimental Approach

¤
High frequency SNPs were identified at various
distances from the core SNPs


Re-sequenced regions of ~2 kb at 0, 5, 10, 20, 40, 80 and
160 kb from the core SNP in 44 unrelated individuals from Utah


Identified a total of 272 'high frequency' polymorphisms


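Measured linkage disequilibrium between two SNPs using the
classical statistic D'


D' = observed linkage disequilibrium / maximal possible linkage disequilibrium: $D' = |D| / D_{\max}$, with $D = P_{ab} - P_a P_b$, where $P_{ab}$ is the frequency of the haplotype carrying alleles a and b, and $P_a$, $P_b$ are the allele frequencies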
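
As a hedged illustration of the D' statistic, the sketch below uses Lewontin's standard normalisation, in which the maximum attainable D depends on the sign of D. The function name and the example frequencies are hypothetical; the absolute value of D' is returned.

```python
def d_prime(p_ab, p_a, p_b):
    """Return (D, |D'|) for allele a at one biallelic SNP and allele b at another.

    p_ab: frequency of the haplotype carrying both a and b
    p_a, p_b: frequencies of allele a and allele b
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        # At least one site is monomorphic, so LD is undefined; report 0
        return d, 0.0
    return d, abs(d) / d_max


# Alleles a and b always occur together: complete LD
print(d_prime(0.5, 0.5, 0.5))   # (0.25, 1.0)

# Haplotype frequency equals the product of allele frequencies: no LD
print(d_prime(0.25, 0.5, 0.5))  # (0.0, 0.0)
```

A value of 1 means that at most three of the four possible two-SNP haplotypes are present, i.e. no recombination between the two sites is needed to explain the sample.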

Figure labels: core SNP; distances of 10, 20, 40, 80 and 160 kb

Reprinted from: Reich et al., Nature 411, 199 (2001)

Observed Linkage Disequilibrium

¤
Linkage disequilibrium has a half-length of ~60 kb (the distance at which the average D' drops below 0.5)


linkage disequilibrium extends much (10-fold) further than
previously predicted

Reprinted from: Reich et al., Nature 411, 199 (2001)

Why does linkage disequilibrium extend so far?

¤
Long-range linkage disequilibrium can be explained by


an extreme founder effect or population bottleneck


A period when the population was so small that a few
ancestral haplotypes gave rise to the present-day haplotypes

¤
Linkage disequilibrium in different populations


short-range linkage disequilibrium is general in sub-Saharan
African populations


long-range linkage disequilibrium is typical for northern
Europeans


a severe bottleneck in the European population could have generated
the linkage disequilibrium

Origin of linkage disequilibrium?

¤
The bottleneck could be specific to northern Europe


Europe was substantially depopulated during the Last Glacial
Maximum (30,000 to 15,000 years ago), and subsequently
recolonized by a small number of founders


Long-range linkage disequilibrium would be absent in other non-African populations

¤
The bottleneck is more global


Result of the dispersal of modern humans from Africa
50,000 years ago


Long-range linkage disequilibrium would then be present in a variety
of non-African populations

Reprinted from: Reich et al., Nature 411, 199 (2001)

High-resolution Haplotype Structure in the
Human Genome

¤
Landmark paper presenting


High-resolution analysis of the haplotype structure across a
500 kb region on chromosome 5q31


Genotyped 103 common SNPs in 129 trios from a European-derived
population


Low marker density of 1 SNP roughly every 5 kb


First high-resolution picture of the patterns of genetic variation
across a large genomic region

Daly et al., Nature Genet. 29, 229 (2001)

Block-like Haplotype Diversity at 5q31

¤
The common SNPs are arranged in haplotype blocks


span up to 100 kb


contain multiple (five or more) common SNPs


have only a few (2 to 4) haplotypes, which


account for the majority of chromosomes (>90%) in the sample


show no evidence of being derived from one another by
recombination
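
A minimal sketch of how the "few haplotypes account for most chromosomes" observation can be checked once phased chromosomes are available for a block. The function name and the sample data are invented for illustration and do not come from Daly et al.

```python
from collections import Counter


def common_haplotype_coverage(chromosomes, top_n):
    """Return the top_n most frequent haplotypes and the fraction of chromosomes they cover."""
    counts = Counter(chromosomes)
    top = counts.most_common(top_n)
    covered = sum(n for _, n in top)
    return top, covered / len(chromosomes)


# Hypothetical phased chromosomes typed at five SNPs within one block
sample = ["ACGTA"] * 11 + ["GTGCA"] * 6 + ["ACTCG"] * 2 + ["GCGTA"]

top, fraction = common_haplotype_coverage(sample, 3)
print(top)       # [('ACGTA', 11), ('GTGCA', 6), ('ACTCG', 2)]
print(fraction)  # 0.95
```

With five biallelic SNPs, 32 haplotypes are possible in principle; seeing only a handful at appreciable frequency is what defines a block of limited diversity.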

Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)

Block-like Haplotype Diversity at 5q31

¤
The haplotype blocks are separated by intervals


in which several independent historical recombination events
seem to have occurred

¤
The historical recombination events are clustered


multiple exchanges between most blocks


little or no recombination within blocks.


The clustering of recombination events is suggestive of
local hotspots of recombination

Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)

Figure label: Historical recombination events

Implications of Haplotype blocks

¤
Once the haplotype blocks are identified


they can be treated as alleles in genome-wide association studies
to find medically relevant variation


Holy grail of pharmacogenetics


a subset of SNPs (haplotype tag SNPs, htSNPs) can be used to
uniquely distinguish the common haplotypes in each block


A subset of all the SNPs is sufficient for whole-genome association
analysis
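
The idea of haplotype tag SNPs can be sketched as a small search problem: find the smallest set of SNP positions whose alleles still tell all common haplotypes in a block apart. The exhaustive search below is an illustrative assumption that is only practical for small blocks; it is not the selection procedure used in the cited papers, and the haplotypes are made up.

```python
from itertools import combinations


def select_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes.

    haplotypes: list of equal-length strings, one character per SNP.
    """
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Project every haplotype onto the candidate positions
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == n_distinct:
                return sites
    return tuple(range(n_sites))


# Four hypothetical common haplotypes across four SNPs
block = ["AAGC", "AATT", "GCGC", "GCTT"]
print(select_tag_snps(block))  # (0, 2)
```

In this example SNPs 0 and 1 carry the same information, as do SNPs 2 and 3, so genotyping one SNP from each pair is enough to identify the haplotype.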

Reprinted from: Daly et al., Nature Genet. 29, 229 (2001)

Blocks of Limited Haplotype Diversity Revealed
by High-Resolution Scanning of Human
Chromosome 21

¤
Landmark paper presenting


the haplotype structure of chromosome 21


Used high-density oligonucleotide arrays, in combination with somatic cell
genetics


To identify the common SNPs on human chromosome 21


To directly observe the haplotype structure defined by these SNPs


This structure reveals blocks of limited haplotype diversity in which
more than 80% of a global human sample can typically be
characterized by only three common haplotypes


Patil et al., Science 294, 1719 (2001)

Experimental Approach

¤
Discovered chromosome 21 SNPs and determined the haplotype
structure using


ultra-high-density oligonucleotide arrays


in combination with somatic cell genetics

¤
SNP discovery


Using a public panel of 24 ethnically diverse individuals


African, Asian, and Caucasian


Physically separated the two chromosome 21 copies from each individual


using a rodent-human somatic cell hybrid technique


Analyzed 20 independent copies of chromosome 21

¤
Since SNPs are characterized on haploid copies


they directly reveal haplotypes


The SNPs of chromosome 21 reveal numerous haplotype blocks

Reprinted from: Patil et al., Science 294, 1719 (2001)

Haplotype Block Defined by 14 Common SNPs

Figure labels: block of consecutive common SNPs; nucleotide position on chromosome 21; 15/20 individual chromosomes; major allele; minor allele; haplotype blocks 1 to 6

Reprinted from: Patil et al., Science 294, 1719 (2001)

Haplotype Block: selection of tag SNPs

Figure labels: haplotype patterns 1 to 4; SNPs for genotyping; 4 common haplotypes

Reprinted from: Patil et al., Science 294, 1719 (2001)

Inventories of human genome sequence variation

¤
The first inventory of SNPs was made by


The public Human Genome Project (HGP)


971,077 candidate SNPs were identified as sequence differences in
regions of sequence overlap between large-insert clones


The SNP Consortium (TSC)


a public/private consortium


Discovered using a publicly available panel of 24 ethnically diverse
individuals


1,023,950 candidate SNPs identified by shotgun sequencing of
genomic fragments and aligning to the genome sequence

¤
First inventory (2001) comprised 1.4 million SNPs


Average density of one SNP every 1.91 kb


SNPs primarily in regions surrounding genes


an estimated 60,000 exonic SNPs in the collection


The International SNP Map Working Group, Nature 409, 928 (2001)

Human genome sequence variation

¤
It is estimated that in the world's human population


about 10 million "common" SNPs


With a minor allele frequency of 1% or more


one variant per 300 bases on average


these 10 million common SNPs constitute 90% of the variation in
the world population


The remaining 10% of the variation is due to


A large number of SNPs that are rare in the population


These may represent another 30 million SNPs

¤
Next frontier in the human genome


Complete inventory of the common SNPs


Complete map of the common SNPs: The HapMap project
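
The density figures quoted in this lecture can be cross-checked with simple arithmetic. The sketch assumes a haploid genome length of roughly 3 billion bases, a commonly quoted round figure that does not appear on the slides; the function name is illustrative.

```python
# Assumed haploid genome length in bases (round figure, not stated on the slides)
GENOME_LENGTH_BP = 3_000_000_000


def expected_snp_count(genome_length_bp, bases_per_snp):
    """Number of variant sites implied by an average spacing of one SNP per bases_per_snp."""
    return genome_length_bp // bases_per_snp


# One common variant per 300 bases across the population
common_snps = expected_snp_count(GENOME_LENGTH_BP, 300)

# One difference per 1,000 bases between two individuals
pairwise_differences = expected_snp_count(GENOME_LENGTH_BP, 1000)

print(common_snps)           # 10000000
print(pairwise_differences)  # 3000000
```

Under that assumption, a spacing of one variant per 300 bases reproduces the estimate of about 10 million common SNPs, while any two individuals differ at only about 3 million sites.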


The International SNP Map Working Group, Nature 409, 928 (2001)

The International HapMap Project


¤
The goal of the International HapMap Project


determine the common patterns of DNA sequence variation in
the human genome and


make this information freely available in the public domain.

¤
The HapMap will


allow the discovery of sequence variants that affect common
disease


facilitate development of diagnostic tools


enhance our ability to choose targets for therapeutic
intervention

The International HapMap Consortium, Nature 426, 789-796 (2003)

The International HapMap Project

¤
Determine haplotype patterns across the genome


5 million common sequence variants


genotyped in 270 DNA samples from populations of Africa, Asia and Europe


Common SNPs are found in all populations


Project includes several populations from different geographic
locations


Yoruba, Japanese, Chinese individuals and individuals with
ancestry from Northern and Western Europe


¤
Genotyping strategy


Phase I


initial round of genotyping of 1,000,000 SNPs in the 270 DNA samples


completed December 2004


Phase II


genotyped 5 million SNPs at ~1-kilobase intervals in 270 individuals


Completed November 2005

The International HapMap Project

¤
The extent of association between nearby markers


varies dramatically across the genome


the patterns of association must be empirically determined for
efficient selection of tag SNPs.

¤
On the basis of empirical studies it is estimated that


most of the information about genetic variation represented by
the 10 million common SNPs in the population could be provided


by genotyping 200,000 to 1,000,000 tag SNPs across the genome



Thus, a substantial reduction in the amount of genotyping can be
obtained with little loss of information, by


using knowledge of the LD present in the genome.
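
The size of the saving follows directly from the numbers quoted on this slide. A short worked calculation, with an illustrative function name:

```python
def genotyping_reduction(total_snps, tag_snps):
    """Fold reduction in genotyping effort when only tag SNPs are typed."""
    return total_snps / tag_snps


COMMON_SNPS = 10_000_000  # common SNPs in the population, as quoted on the slide

smallest_saving = genotyping_reduction(COMMON_SNPS, 1_000_000)
largest_saving = genotyping_reduction(COMMON_SNPS, 200_000)

print(smallest_saving, largest_saving)  # 10.0 50.0
```

Depending on how many tag SNPs are needed, the genotyping workload therefore drops 10-fold to 50-fold.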

Perspectives

¤
For the full potential of the HapMap to be realized



The genotyping technology must


become more cost efficient, and the analysis methods must be improved



Pilot studies with other populations must be completed


to confirm that the HapMap is generally applicable

¤
Genome-wide association projects must establish


carefully phenotyped sets of affected and unaffected individuals for
many common diseases in a way that


preserves confidentiality


retains detailed clinical and environmental exposure data

¤
Careful attention must also be paid to the ethical issues that


will be raised by the HapMap and the studies that will use it



challenge to avoid misinterpretations or misuses of results from studies
that use the HapMap


Whole-Genome Patterns of Common DNA
Variation in Three Human Populations

¤
Paper presents


Whole-genome patterns of common human DNA variation
by genotyping 1,586,383 SNPs in 71 Americans of European,
African, and Asian ancestry


Different approach to represent the structure of genetic
variation


LD bins: clusters of tightly linked SNPs
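
One way to picture LD bins is as a greedy clustering of SNPs by pairwise r². The sketch below is a generic illustration, not necessarily the exact procedure of Hinds et al.; the r² threshold of 0.8, the function names and the toy data are all assumptions.

```python
def r_squared(x, y):
    """Squared correlation between two biallelic SNPs coded 0/1 on phased chromosomes."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    d = p_xy - p_x * p_y
    return d * d / denom


def greedy_ld_bins(snps, threshold=0.8):
    """Group SNPs into bins; each bin is (tag SNP, sorted member SNPs).

    snps: dict mapping SNP name to a list of 0/1 alleles over the same chromosomes.
    """
    remaining = set(snps)
    bins = []
    while remaining:
        best_tag, best_members = None, set()
        for tag in sorted(remaining):
            members = {s for s in remaining
                       if s == tag or r_squared(snps[tag], snps[s]) >= threshold}
            if len(members) > len(best_members):
                best_tag, best_members = tag, members
        bins.append((best_tag, sorted(best_members)))
        remaining -= best_members
    return bins


# Toy data: s1, s2 and s3 are perfectly correlated; s4 is only weakly correlated
toy = {
    "s1": [0, 0, 1, 1, 0, 1, 0, 1],
    "s2": [0, 0, 1, 1, 0, 1, 0, 1],
    "s3": [1, 1, 0, 0, 1, 0, 1, 0],
    "s4": [0, 1, 0, 1, 0, 1, 0, 1],
}
bins = greedy_ld_bins(toy)
print(bins)  # [('s1', ['s1', 's2', 's3']), ('s4', ['s4'])]
```

Unlike a haplotype block, a bin built this way does not need to be a contiguous stretch of the chromosome, because membership depends only on pairwise correlation.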

Hinds et al., Science 307, 1072-1079 (2005)

Reprinted from: Hinds et al., Science 307, 1072-1079 (2005)

Extended LD bin and haplotype block structure
around the CFTR gene

Conclusion

¤
The 1.5 million SNPs capture


most common human genetic variation as a result of linkage
disequilibrium


strong correlation among common SNP alleles that define haplotypes

¤
Strong correlation between


extended regions of linkage disequilibrium and


functional genomic elements

¤
First generation haplotype map provides a tool for


exploring the causal role of common human DNA variation in
complex human traits



investigating the nature of genetic variation within and between
human populations.

Reprinted from: Hinds et al., Science 307, 1072-1079 (2005)

A haplotype map of the human genome


¤
Paper presents


A map of >1 million SNPs for which accurate and complete
genotypes have been obtained in 269 DNA samples from four
populations


The data document the generality of


recombination hotspots


a block-like structure of linkage disequilibrium


low haplotype diversity


substantial correlations of SNPs with many of their neighbours

The International HapMap Consortium, Nature 437, 1299 (2005)

Number of SNPs in dbSNP over time

¤
Public database dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)


October 2005: 10.4 million RefSNP clusters


4.8 million validated SNPs

The International HapMap Consortium, Nature 437, 1299 (2005)

Genealogical relationships among haplotypes

The International HapMap Consortium, Nature 437, 1299 (2005)

Length of LD spans

The International HapMap Consortium, Nature 437, 1299 (2005)

Conclusions

¤
The phase I haplotype map documents the generality
of


block-like structure of linkage disequilibrium


low haplotype diversity


recombination hotspots


substantial correlations of SNPs with many of their neighbours

¤
An important application of the HapMap data is to


make possible comprehensive, genome-wide association studies


identify the root causes of common diseases

Recommended reading

¤
Human Haplotype Map


The Structure of Haplotype Blocks in the Human Genome


Daly et al., Nature Genet. 29, 229 (2001)


The human HapMap project


The International HapMap Consortium, Nature 426, 789-796 (2003)


Haplotype map of the human genome


The International HapMap Consortium, Nature 437, 1299 (2005)



Further reading

¤
Sequence variations in the human genome


A map of human genome sequence variation


The International SNP Map Working Group, Nature 409, 928 (2001)

¤
Haplotype structure of the sequence variations in the
human genome


Linkage disequilibrium in the human genome


Reich et al., Nature 411, 199 (2001)


The Structure of Haplotype Blocks in the Human Genome


Patil et al., Science 294, 1719 (2001)


First generation human haplotype map


Hinds et al., Science 307, 1072-1079 (2005)