Bioinformatics - Boston College

2 Oct 2013


Evolutionary Genome Biology

Gabor T. Marth, D.Sc.

Department of Biology, Boston College

marth@bc.edu

Lecture overview

1. Inter-species evolution and comparative genomics

2. Intra-species evolution, population genomics, and human origins

1. Inter-species evolution and comparative genomics

Initial sequencing and comparative analysis of the mouse genome

Mouse Genome Sequencing Consortium

Nature 420, 520-562 (2002)

Questions of Evolutionary Biology

What are the taxonomic relationships between living organisms (which organisms are more or less closely related to each other)?

How do genes evolve?

How do genomes evolve?

How do comparisons with other organisms help us understand our own genome?

Mechanisms of molecular evolution

DNA sequence evolution: mutations

Phylogenetic relationships (1)

Higgs and Attwood, Bioinformatics and Molecular Evolution, Blackwell Publishing

Multiple alignment of mammalian mitochondrial small subunit rRNA
sequences

Phylogenetic relationships (2)

Higgs and Attwood, Bioinformatics and Molecular Evolution, Blackwell Publishing

Jukes-Cantor distance matrix for mammalian mitochondrial small subunit rRNA sequences
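The Jukes-Cantor model corrects the observed proportion p of differing sites for multiple substitutions at the same site, giving the distance d = -(3/4) ln(1 - (4/3)p). A minimal Python sketch for one pair of aligned sequences follows; skipping gapped columns is our simplifying assumption, since the slide does not say how gaps were treated.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: alignment columns with a gap ('-') in either sequence are skipped.
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)

def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p), in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined at or beyond p = 3/4
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For example, two 10-site sequences differing at one site (p = 0.1) give d ≈ 0.107, slightly more than p, because the correction accounts for substitutions hidden by later changes at the same site.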

Phylogenetic relationships (3)

Higgs and Attwood, Bioinformatics and Molecular Evolution, Blackwell Publishing

Phylogenetic tree constructed from mammalian mitochondrial small
subunit rRNA sequences
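The slide does not say which tree-building method was used. As one illustration of how a tree can be built from a distance matrix such as the Jukes-Cantor one, here is a sketch of UPGMA, a simple clustering method that assumes a constant molecular clock. The input format (a dict keyed by frozensets of taxon names) is our own choice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

def upgma(names: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Build a rooted UPGMA tree and return it in Newick format.

    `dist` maps frozenset({name_a, name_b}) to the pairwise distance.
    """
    # Active clusters: id -> (Newick text, number of leaves, height of the cluster's root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters; their common ancestor sits at half their distance.
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        label_i, size_i, h_i = clusters.pop(i)
        label_j, size_j, h_j = clusters.pop(j)
        height = d.pop(pair) / 2.0
        merged = f"({label_i}:{height - h_i:.4f},{label_j}:{height - h_j:.4f})"
        # Distance from the merged cluster to each remaining cluster is the
        # leaf-count-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"
```

UPGMA assumes all lineages evolve at the same rate; methods such as neighbor joining relax that assumption.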

Gene structure evolution: duplications

Gene duplication → paralogs

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Evolution of chromosome organization

Synteny

Initial sequencing and comparative analysis of the mouse genome

Mouse Genome Sequencing Consortium

Nature 420, 520-562 (2002)

Gene classes across organisms

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Gene conservation across organisms

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Comparative genomics helps gene annotation

2. Intra-species evolution, population genomics, and human origins

Questions about human evolution

How do we discover / assess genetic variations?

What is the level of diversity across humans?

How can we model the ancestral and mutation processes?

What do phylogenetic analyses of human mitochondrial sequences tell us about human origins and dispersal?

Does mitochondrial DNA give us the full picture?

What do we learn from model-fitting analysis of nuclear DNA?

A single wave of out-of-Africa migration or multiple waves?

Human genetic diversity

Polymorphism density along chromosomes varies widely.

Average polymorphism rate between a pair of human chromosomes: 1 SNP per 1,300 bp of sequence
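If SNPs were scattered independently at the average rate of 1 per 1,300 bp, the number of SNPs in a window would follow a Poisson distribution. The sketch below computes that uniform-rate baseline; it is an assumption used only for comparison, not a model the slide endorses, and the wide variation in observed density is precisely a departure from it.

```python
import math

# Average rate from the slide: 1 SNP per 1,300 bp between two human chromosomes.
SNP_RATE_PER_BP = 1 / 1300

def expected_snps(window_bp: int, rate: float = SNP_RATE_PER_BP) -> float:
    """Mean number of SNPs expected in a window if SNPs occurred at a uniform rate."""
    return window_bp * rate

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k SNPs in the window under the uniform-rate (Poisson) model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

if __name__ == "__main__":
    mean = expected_snps(10_000)  # a 10 kb window
    print(f"expected SNPs: {mean:.2f}")
    print(f"P(0 SNPs): {poisson_pmf(0, mean):.2e}")
```

Under this baseline a 10 kb window carries about 7.7 SNPs on average, and the chance of seeing none is below 0.1%. A Poisson count also has variance equal to its mean, so a much larger spread across windows signals that a single uniform rate does not describe the data.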

What explains heterogeneity?

G+C nucleotide content

CpG dinucleotide content

Recombination rate

Functional constraints (nucleotide diversity by region):

3’ UTR: 5.00 x 10^-4
5’ UTR: 4.95 x 10^-4
Exon, overall: 4.20 x 10^-4
Exon, coding: 3.77 x 10^-4
   synonymous: 366 / 653 coding SNPs
   non-synonymous: 287 / 653 coding SNPs

The variance is so high that these quantities are poor predictors of nucleotide diversity in local regions; hence random processes are likely to govern the basic shape of the genome variation landscape.


(random) genetic drift
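Genetic drift is the random change in allele frequency caused by sampling a finite number of gene copies each generation. A minimal Wright-Fisher simulation sketch, assuming a constant population size and no mutation, selection, or migration:

```python
import random
from typing import List

def wright_fisher(n_diploid: int, p0: float, generations: int, seed: int = 0) -> List[float]:
    """Allele-frequency trajectory under neutral Wright-Fisher drift.

    Assumes a constant population of n_diploid individuals (2N gene copies) and
    no mutation, selection, or migration. Stops early once the allele is lost or fixed.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    count = round(p0 * copies)
    trajectory = [count / copies]
    for _ in range(generations):
        if count in (0, copies):
            break  # allele lost or fixed: drift can no longer change it
        p = count / copies
        # Each gene copy in the next generation draws its parent copy at random,
        # so the new allele count is Binomial(2N, p).
        count = sum(rng.random() < p for _ in range(copies))
        trajectory.append(count / copies)
    return trajectory
```

Averaged over many runs, the allele fixes in a fraction of runs close to its starting frequency, which is the neutral expectation; smaller populations reach loss or fixation faster.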

The origin of genetic variations

Sequence variations are the result of mutation events.

[Figure: genealogy rooted at the MRCA; a mutation (TAAAAAT → TAACAAT) on one lineage is inherited by all of its descendants]



Mutations are propagated down through generations and determine present-day variation patterns.
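The figure's logic can be mimicked in a few lines: a mutation placed on one branch of a genealogy is inherited by every sequence descending from that branch. The genealogy, the node names (`anc1`, `s1`, ...), and the mutation position below are hypothetical, chosen only to reproduce the TAAAAAT / TAACAAT pattern.

```python
from typing import Dict, Optional, Tuple

def sequence_of(node: str,
                parent: Dict[str, Optional[str]],
                mutations: Dict[str, Tuple[int, str]],
                root_seq: str) -> str:
    """Sequence carried by `node`: the root sequence plus every mutation on its path from the root."""
    if parent[node] is None:
        return root_seq
    seq = list(sequence_of(parent[node], parent, mutations, root_seq))
    if node in mutations:  # a mutation on the branch leading to this node
        pos, base = mutations[node]
        seq[pos] = base
    return "".join(seq)

# Hypothetical genealogy: each node maps to its parent; the MRCA has none.
PARENT = {"MRCA": None, "anc1": "MRCA", "anc2": "MRCA",
          "s1": "anc1", "s2": "anc1", "s3": "anc1", "s4": "anc2", "s5": "anc2"}
# One hypothetical A -> C mutation at position 3, on the branch leading to anc1.
MUTATIONS = {"anc1": (3, "C")}

present_day = {s: sequence_of(s, PARENT, MUTATIONS, "TAAAAAT")
               for s in ["s1", "s2", "s3", "s4", "s5"]}
```

Here s1 to s3 carry TAACAAT and s4, s5 keep the ancestral TAAAAAT, so the present-day pattern directly reflects where on the tree the mutation occurred.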

Recombination messes up phylogenies

[Figure: two variants arising on different lineages (the highlighted g in acggttatgtaga and t in accgttatgtaga) are combined by recombination into descendant haplotypes that carry both]



Because of recombination, different segments of a DNA sequence may trace back to different ancestors, so the sequences may not have a unique common ancestor; hence phylogenetic analysis may not apply.
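A standard way to detect this in data is the four-gamete test: under the infinite-sites assumption (no site mutates twice), two biallelic sites that show all four possible two-site haplotypes cannot both fit on a single tree, so at least one recombination must have occurred between them. A sketch, with haplotypes coded as strings of 0/1 alleles (a hypothetical encoding):

```python
from itertools import combinations
from typing import List, Tuple

def four_gamete_violations(haplotypes: List[str]) -> List[Tuple[int, int]]:
    """Return pairs of biallelic sites at which all four two-site haplotypes are observed.

    Under the infinite-sites assumption, such a pair cannot be explained by a
    single tree and implies at least one recombination between the two sites.
    """
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        if len({h[i] for h in haplotypes}) != 2 or len({h[j] for h in haplotypes}) != 2:
            continue  # only biallelic sites are informative here
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            violations.append((i, j))
    return violations
```

For example, the haplotypes 00, 01, 10 and 11 violate the test at sites (0, 1), whereas 00, 01 and 10 alone are compatible with one tree.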

What does mtDNA say about human origins?

However, mtDNA is only a single locus (~16 kb, short on the scale of the 3 Gb human genome)

Campbell and Heyer. Genomics, Proteomics, Bioinformatics. Cummings.

What does nuclear DNA say?

Because of recombination, phylogenetic analysis is not feasible: there is no unique tree that can explain the ancestry of the DNA sequences.

Instead, one uses statistical “genetic analysis”, i.e. one examines the statistical properties of the possible ancestries that could have produced the nucleotide sequences observed in individuals.

Polymorphism data

1. Marker density (MD): distribution of the number of SNPs in pairs of sequences

2. Allele frequency spectrum (AFS): distribution of SNPs according to allele frequency in a set of samples, from “rare” to “common”

Example MD data:

Clone 1    Clone 2    # SNPs
AL00675    AL00982    8
AS81034    AK43001    0
CB00341    AL43234    2

Example AFS data:

SNP    Minor allele    Allele count
A/G    A               1
C/T    T               9
A/G    G               3
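Both kinds of polymorphism data can be tabulated directly from a set of aligned sequences. A sketch, assuming equal-length sequences with one character per site; it folds the AFS onto the minor allele, as in the allele-count table, and counts only biallelic sites.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

def marker_density(sequences: List[str]) -> List[int]:
    """MD data: number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(x, y)) for x, y in combinations(sequences, 2)]

def allele_frequency_spectrum(sequences: List[str]) -> Dict[int, int]:
    """AFS data: number of biallelic SNPs at each minor-allele count."""
    spectrum: Counter = Counter()
    for column in zip(*sequences):  # one alignment column per site
        counts = Counter(column)
        if len(counts) == 2:  # skip monomorphic and multi-allelic sites
            spectrum[min(counts.values())] += 1
    return dict(sorted(spectrum.items()))
```

With four sequences AAC, AGC, GGT and AGT, the pairwise SNP counts are 1, 3, 2, 2, 1, 1, and the AFS has two sites with minor-allele count 1 and one site with count 2.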

Population genetic models

[Figure: population size histories from past to present (stationary, expansion, collapse, bottleneck), each shown with its predicted MD (by simulation) and AFS (in direct form)]
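For the stationary (constant-size) history, neutral theory gives the expected AFS in closed form: among n sampled chromosomes, the expected number of SNPs whose derived allele is carried i times is θ/i. Folding this onto the minor allele yields a directly computable prediction. The sketch covers only this stationary case; the other histories require more involved formulas or simulation, which are not reproduced here, and we do not know the exact formulas used in the original analysis.

```python
from typing import List

def expected_folded_afs(n: int) -> List[float]:
    """Expected proportion of SNPs with minor-allele count i = 1, ..., n // 2 among n
    sampled chromosomes, under a neutral model with constant (stationary) population size."""
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    # Unfolded spectrum: derived allele carried by i copies, proportional to theta / i.
    unfolded = {i: 1.0 / i for i in range(1, n)}
    folded = []
    for i in range(1, n // 2 + 1):
        # Minor count i pools derived count i with derived count n - i (once if they coincide).
        value = unfolded[i] if i == n - i else unfolded[i] + unfolded[n - i]
        folded.append(value)
    total = sum(folded)
    return [v / total for v in folded]
```

Rare variants dominate the stationary expectation; a bottleneck or expansion shifts this balance, which is what makes the AFS informative about population history.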

Data fitting: polymorphism density

Best model: a bottleneck-shaped population size history

Fitted parameters: N1 = 6,000; T1 = 1,200 gen.; N2 = 5,000; T2 = 400 gen.; N3 = 11,000

Marth et al. PNAS 2003
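MD predictions for non-stationary histories can be obtained by simulation. Below is a minimal coalescent sketch for pairs of sequences under a piecewise-constant population size, assuming the infinite-sites model. The mutation rate, the segment length, and the reading of the fitted parameters as epochs (N3 from the present back to T2, N2 from T2 back to T1, N1 before T1) are our assumptions; the slide does not spell out the parameterization.

```python
import math
import random
from typing import List, Tuple

# Population size history as (start_generation, diploid_size) epochs going back
# in time from the present; the first epoch must start at generation 0.
Epochs = List[Tuple[float, float]]

def pairwise_coalescence_time(epochs: Epochs, rng: random.Random) -> float:
    """Generations back until two sampled lineages find their common ancestor."""
    for k, (start, size) in enumerate(epochs):
        end = epochs[k + 1][0] if k + 1 < len(epochs) else math.inf
        # Two lineages coalesce at rate 1/(2N) per generation; the exponential
        # waiting time is memoryless, so it can be redrawn at each epoch boundary.
        candidate = start + rng.expovariate(1.0 / (2.0 * size))
        if candidate < end:
            return candidate
    raise RuntimeError("unreachable: the oldest epoch extends back indefinitely")

def simulate_md(epochs: Epochs, length_bp: int, mu: float, n_pairs: int, seed: int = 0) -> List[int]:
    """Simulated SNP counts for n_pairs independent pairs of sequences (infinite-sites model)."""
    rng = random.Random(seed)
    rate = mu * length_bp  # mutations per generation per lineage across the whole segment
    counts = []
    for _ in range(n_pairs):
        branch_length = 2.0 * pairwise_coalescence_time(epochs, rng)  # both lineages
        # Count arrivals of a Poisson process along the two branches.
        snps, elapsed = 0, rng.expovariate(rate)
        while elapsed < branch_length:
            snps += 1
            elapsed += rng.expovariate(rate)
        counts.append(snps)
    return counts

if __name__ == "__main__":
    # ASSUMED epoch reading of the fitted parameters (generations before present).
    bottleneck = [(0, 11_000), (400, 5_000), (1_200, 6_000)]
    # mu and length_bp are illustrative placeholders, not values from the slide.
    md = simulate_md(bottleneck, length_bp=10_000, mu=1e-8, n_pairs=1_000)
    print("mean SNPs per pair:", sum(md) / len(md))
```

Fitting then amounts to comparing the simulated distribution of SNP counts per pair with the observed one across candidate histories.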



Our conclusions from the marker density data are confounded by the unknown ethnicity of the public genome sequence we looked at; hence the move to allele frequency data from ethnically defined samples.

Data fitting: allele frequency

Fitted parameters: N1 = 20,000; T1 = 3,000 gen.; N2 = 2,000; T2 = 400 gen.; N3 = 10,000

Model consensus: bottleneck

Bottleneck ~3,000 generations (or 100,000 years) ago
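The slide reports fitted parameters but not the fitting criterion. One common way to compare candidate histories with an observed AFS is a multinomial likelihood, treating SNPs as independent (only approximately true, since linked sites share ancestry). A sketch, where each model's expected spectrum is a hypothetical input computed elsewhere:

```python
import math
from typing import Dict, List

def afs_log_likelihood(observed: List[int], expected: List[float]) -> float:
    """Multinomial log-likelihood (up to a constant) of observed AFS counts
    given a model's expected proportion of SNPs in each allele-count class."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected spectra need the same classes")
    return sum(k * math.log(p) for k, p in zip(observed, expected) if k > 0)

def best_model(observed: List[int], models: Dict[str, List[float]]) -> str:
    """Name of the candidate history whose expected AFS best explains the data."""
    return max(models, key=lambda name: afs_log_likelihood(observed, models[name]))
```

In practice each candidate's expected spectrum would come from theory or simulation for that history, and its parameters would be varied to maximize the likelihood.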

Data from other human populations

European data: bottleneck

African data: modest but uninterrupted expansion

Marth et al. Genetics 2004

What nuclear DNA tells us

[Figure: the Recent African Origin and Multiregional models of human origins, compared with our results]