DTC bioinformatics module


Oct 1, 2013 (3 years and 8 months ago)


Genome bioinformatics practical

Gil McVean


Divide into five groups. These will use, and (where possible) develop, bioinformatics methods
for the analysis of different aspects of genetic information. We will focus on human
chromosome 20: at 63 Mb it is one of the smallest in our genome, but one that has been
extensively studied [1-3].

We will look at five areas


i) Gene structure

Describe the distribution of gene structures (exons, introns, coding sequences, regulatory
signals) within the chromosome. Devise a simple parametric model for exon structure and
estimate the parameters from the data. Are genes and gene structures randomly distributed
along the chromosome? Do established and predicted genes have similar properties? Use the
complete list of genes and gene structures available from
http://www.ncbi.nlm.nih.gov/genome/guide/human/

See also http://www.sanger.ac.uk/HGP/Chr20/ for more information and additional downloads.
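As a starting point for the parametric model, the sketch below fits two simple distributions
by maximum likelihood: a log-normal for exon lengths and a geometric for the number of exons
per gene. Both distributional choices are assumptions made for illustration, not part of the
practical, and the toy numbers are hypothetical; real values would come from the gene
annotation.

```python
import math


def fit_lognormal(lengths):
    """Maximum-likelihood (mu, sigma) for a log-normal model of exon lengths in bp."""
    logs = [math.log(x) for x in lengths]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def fit_geometric(exon_counts):
    """Maximum-likelihood p for a geometric model (support 1, 2, ...) of exons per gene."""
    return len(exon_counts) / sum(exon_counts)


# Hypothetical toy data, standing in for values parsed from the annotation.
exon_lengths = [120, 95, 210, 150, 88, 1300, 132]
exons_per_gene = [1, 4, 9, 3, 12, 5]

mu, sigma = fit_lognormal(exon_lengths)
p = fit_geometric(exons_per_gene)
print(f"log-normal: mu={mu:.3f}, sigma={sigma:.3f}; geometric: p={p:.3f}")
```

Comparing the fitted distributions with the observed histograms is a quick way to see where
such a simple model breaks down, for example for unusually long terminal exons.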


ii) Base composition

Describe how the base composition varies along the chromosome in terms of nucleotide
frequencies and simple words. Develop a simple HMM to look for CpG islands. What is the
relationship between these and genes? Use the complete DNA sequence available from
http://genome.ucsc.edu/goldenPath/hg16/chromosomes/chr20.fa.zip
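One possible shape for the HMM is sketched below: two hidden states ("background" and
"island") emitting single nucleotides, decoded with the Viterbi algorithm in log space. This
is a deliberate simplification. It responds to C+G richness rather than to the CpG
dinucleotide itself, which would need a model with dinucleotide dependence. All transition
and emission probabilities are assumed values for illustration; in the practical they would
be estimated from the data.

```python
import math

STATES = ("background", "island")

# Assumed parameters, chosen for illustration only.
START = {"background": 0.5, "island": 0.5}
TRANS = {
    "background": {"background": 0.999, "island": 0.001},
    "island": {"background": 0.01, "island": 0.99},
}
EMIT = {
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty sequence over A, C, G, T."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] is the best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy sequence: AT-rich flanks around a CG-rich core.
toy = "AT" * 50 + "CG" * 50 + "AT" * 50
labels = viterbi(toy)
print(labels.count("island"), "positions labelled as island")
```

For the real chromosome the sequence would need upper-casing and handling of N characters
before decoding, since the emission table above only covers A, C, G and T.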



iii) Molecular evolution

Describe the extent and nature of the divergence between humans and chimps. Do all types
of mutation occur at the same rate? To what extent is divergence influenced by local base
composition? Do genes and non-coding sequences evolve at similar rates? Use the set of
human-chimp alignments available from
http://genome.ucsc.edu/goldenPath/panTro1/vsHg16/axtNet/chr20.axt.gz

You could also use the set of 7645 aligned human-chimp-mouse genes sequenced by Celera
(before they got bought out). Data available at
http://www.sciencemag.org/cgi/content/full/302/5652/1960/DC1 (see [4]).
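A first look at whether all types of mutation occur at the same rate is to split the observed
differences into transitions (purine to purine, or pyrimidine to pyrimidine) and
transversions. The sketch below does this for one pair of aligned sequences; the function
name and the toy alignment are hypothetical, and it assumes the two aligned rows have already
been extracted from the alignment file.

```python
PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def classify_differences(human, chimp):
    """Count matches, transitions and transversions between two aligned sequences."""
    if len(human) != len(chimp):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "transition": 0, "transversion": 0}
    for h, c in zip(human.upper(), chimp.upper()):
        if h not in BASES or c not in BASES:
            continue  # skip gaps and ambiguous bases
        if h == c:
            counts["match"] += 1
        elif (h in PURINES) == (c in PURINES):
            counts["transition"] += 1  # both purines or both pyrimidines
        else:
            counts["transversion"] += 1
    return counts


def divergence(counts):
    """Proportion of compared sites that differ."""
    sites = sum(counts.values())
    return (counts["transition"] + counts["transversion"]) / sites


# Toy alignment, for illustration only.
toy_counts = classify_differences("ACGT-A", "GCTTAA")
print(toy_counts, divergence(toy_counts))
```

Keeping separate counts for sites that fall in CpG context, or for coding and non-coding
regions, extends the same loop to the other questions in this section.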


iv) Population structure

Describe the extent of population differentiation among humans using the SNP genotype
information collected in a 10 Mb region of chromosome 20. If you didn't know where the
genotyped individuals came from, would you be able to classify people into different groups?
How do these groups compare with the geographical labels? Use the genotype information
available from www.stats.ox.ac.uk/~mcvean/DTC/SNP and the program STRUCTURE 2.1 available
from http://pritch.bsd.uchicago.edu/software.html

See [1,5] for a description of the data, [6] for a discussion of human population structure
and [7,8] for a discussion of the STRUCTURE algorithm.
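Before running STRUCTURE, a simple summary of differentiation at a single SNP is Wright's
F_ST, computed from the allele frequencies in each labelled population. The sketch below is
not the STRUCTURE algorithm. It assumes biallelic SNPs, genotypes coded as 0, 1 or 2 copies
of one allele, and equal weighting of populations; the function names and toy genotypes are
hypothetical.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele from genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def fst(allele_freqs):
    """Wright's F_ST for one biallelic SNP, weighting populations equally."""
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic across all populations
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / k
    return (h_total - h_within) / h_total


# Toy genotypes for one SNP in two hypothetical populations.
pop_a = [0, 0, 1, 0, 1, 0]
pop_b = [2, 1, 2, 2, 1, 2]
freqs = [allele_frequency(pop_a), allele_frequency(pop_b)]
print(freqs, fst(freqs))
```

Averaging over SNPs in the 10 Mb region gives a rough scale against which the clusters found
without labels can be compared.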


v) Recombination and linkage disequilibrium

How do recombination rates vary along the chromosome? Does recombination rate correlate
with underlying genomic features such as gene location and GC content? Use the
pedigree-based estimates of the recombination rate available from [9]
http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v31/n3/full/ng917.html

the program RecMin written by Simon Myers [10]
http://www.stats.ox.ac.uk/~myers/RecMin.html

and the LDhat package to estimate recombination-rate variation from population genetic
data [5]. Use the genotype data from unrelated UK Caucasians available from
www.stats.ox.ac.uk/~mcvean/DTC/SNP

(software will be made available on Friday)
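While waiting for the software, two pairwise summaries can be computed by hand: the r^2
measure of linkage disequilibrium and the four-gamete test, which flags pairs of sites whose
history must include at least one recombination event if each site mutated only once. The
sketch below is neither RecMin nor LDhat. It assumes phased haplotypes with alleles coded
0/1, whereas the genotype data would first need phasing; the function names and toy
haplotypes are hypothetical.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, given 0/1 alleles on the same haplotypes."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


def four_gametes(site_a, site_b):
    """True if all four two-site haplotypes (00, 01, 10, 11) are observed."""
    return len(set(zip(site_a, site_b))) == 4


# Toy haplotypes: each list holds the allele at one site across four chromosomes.
site1 = [0, 0, 1, 1]
site2 = [0, 0, 1, 1]
site3 = [0, 1, 0, 1]
print(r_squared(site1, site2), four_gametes(site1, site2))
print(r_squared(site1, site3), four_gametes(site1, site3))
```

Plotting r^2 against the physical distance between sites gives a first picture of how
quickly linkage disequilibrium decays along the region.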




Reference List

1 Ke,X. et al. (2004) The impact of SNP density on fine-scale patterns of linkage
disequilibrium. Hum. Mol. Genet. 13, 577-588

2 Bentley,D.R. et al. (2001) The physical maps for sequencing human chromosomes 1, 6, 9,
10, 13, 20 and X. Nature 409, 942-943

3 Lander,E.S. et al. (2001) Initial sequencing and analysis of the human genome.
Nature 409, 860-921

4 Clark,A.G. et al. (2003) Inferring nonneutral evolution from human-chimp-mouse
orthologous gene trios. Science 302, 1960-1963

5 McVean,G.A. et al. (2004) The fine-scale structure of recombination rate variation in the
human genome. Science 304, 581-584

6 Rosenberg,N.A. et al. (2002) Genetic structure of human populations.
Science 298, 2381-2385

7 Falush,D. et al. (2003) Inference of population structure using multilocus genotype data:
linked loci and correlated allele frequencies. Genetics 164, 1567-1587

8 Pritchard,J.K. et al. (2000) Inference of population structure using multilocus genotype
data. Genetics 155, 945-959

9 Kong,A. et al. (2002) A high-resolution recombination map of the human genome.
Nat. Genet. 31, 241-247

10 Myers,S.R. and Griffiths,R.C. (2003) Bounds on the minimum number of recombination
events in a sample history. Genetics 163, 375-394