DTC bioinformatics module
Genome bioinformatics practical
Divide into five groups. These will use, and (where possible) develop, bioinformatics methods
for the analysis of different aspects of genetic information. We will focus on human
one of the smallest in our genome
but one that has been
We will look at five areas:
i) Gene structure
Describe the distribution of gene structures (exons, introns, coding sequences, regulatory
signals) within the chromosome. Devise a simple parametric model for exon structure and
estimate the parameters from the data.
Are genes and gene structures randomly distributed
along the chromosome?
Do established and predicted genes have similar properties? Use the
complete list of genes and gene structures available from
for more information and additional downloads.
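One possible starting point for the parametric exon model is sketched below. It assumes exon lengths have already been extracted as positive integers (the list used here is made up for illustration) and fits two candidate distributions by maximum likelihood: a geometric model and a log-normal model. The function names are hypothetical, and other model choices may suit the data better.

```python
import math

def fit_geometric(lengths):
    """MLE of p for a geometric distribution on {1, 2, ...}: p = 1 / mean."""
    if not lengths or min(lengths) < 1:
        raise ValueError("lengths must be a non-empty list of values >= 1")
    return len(lengths) / sum(lengths)

def geometric_log_likelihood(lengths, p):
    """Log-likelihood of the lengths under Geometric(p) on {1, 2, ...}."""
    if p >= 1.0:
        return 0.0 if all(x == 1 for x in lengths) else float("-inf")
    return sum((x - 1) * math.log(1 - p) + math.log(p) for x in lengths)

def fit_lognormal(lengths):
    """MLE of (mu, sigma) for a log-normal: mean and sd of log-lengths."""
    logs = [math.log(x) for x in lengths]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

# Made-up exon lengths (bp), for illustration only.
exon_lengths = [96, 120, 145, 178, 210, 88, 132]
p_hat = fit_geometric(exon_lengths)
mu_hat, sigma_hat = fit_lognormal(exon_lengths)
print("geometric p:", p_hat)
print("geometric log-likelihood:", geometric_log_likelihood(exon_lengths, p_hat))
print("log-normal mu, sigma:", mu_hat, sigma_hat)
```

Comparing the fitted distributions with a histogram of the observed lengths is a simple way to judge which model, if either, is adequate.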
ii) Base composition
Describe how the base composition varies along the chromosome in terms of nucleotide
frequencies and simple words.
Develop a simple HMM to look for CpG islands. What is the
relationship between these and genes?
Use the complete DNA sequence available from
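As a rough illustration of the HMM idea, the sketch below decodes a sequence with a two-state model ("background" and "island") using the Viterbi algorithm. This is a deliberate simplification: it models only single-nucleotide composition, not CpG dinucleotides, and all transition and emission probabilities are hypothetical values chosen for illustration. In practice they would be estimated from the data.

```python
import math

STATES = ("background", "island")

# Hypothetical parameters, for illustration only.
START = {"background": 0.5, "island": 0.5}
TRANS = {
    "background": {"background": 0.99, "island": 0.01},
    "island": {"background": 0.02, "island": 0.98},
}
EMIT = {
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

def viterbi(seq):
    """Return the most probable state path for a sequence over A, C, G, T."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state at position 0.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s.
            prev = max(STATES, key=lambda r: scores[r] + math.log(TRANS[r][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][base]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

toy = "AT" * 20 + "CG" * 20 + "AT" * 20
path = viterbi(toy)
print(path.count("island"), "of", len(path), "positions labelled island")
```

A fuller model would use states that depend on the preceding base, so that the depletion of CpG outside islands is captured directly.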
iii) Molecular evolution
Describe the extent and nature of the divergence between humans and chimps.
Do all types of mutation occur at the same rate?
To what extent is divergence influenced by local base
composition? Do genes and non-coding sequences evolve at similar rates? Use the set of
chimp alignments available from
You could also use the set of 7645 aligned human
genes sequenced by Celera
(before they got bought out). Data available at
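To make the question about mutation types concrete, here is a minimal sketch that counts transitions and transversions in a pairwise alignment. It assumes the two sequences are already aligned to equal length, with "-" marking gaps. The function name and the toy sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def divergence_summary(human, chimp):
    """Count compared sites, transitions and transversions in an alignment."""
    if len(human) != len(chimp):
        raise ValueError("aligned sequences must have equal length")
    counts = {"sites": 0, "transitions": 0, "transversions": 0}
    for h, c in zip(human.upper(), chimp.upper()):
        if h not in "ACGT" or c not in "ACGT":
            continue  # skip gaps and ambiguous bases
        counts["sites"] += 1
        if h == c:
            continue
        pair = {h, c}
        if pair <= PURINES or pair <= PYRIMIDINES:
            counts["transitions"] += 1    # A<->G or C<->T
        else:
            counts["transversions"] += 1  # purine <-> pyrimidine
    differences = counts["transitions"] + counts["transversions"]
    counts["divergence"] = (differences / counts["sites"]
                            if counts["sites"] else float("nan"))
    return counts

# Toy alignment, for illustration only.
print(divergence_summary("ACGTACGT-A", "ACATACTTGA"))
```

The same counting can be repeated separately for coding and non-coding regions, or stratified by local GC content, to address the other questions in this section.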
iv) Population structure
Describe the extent of population differentiation among humans using the SNP genotype
information collected in a 10Mb region of chromosome 20. If you didn’t know where
individuals came from, would you be able to classify people into different groups?
How do these groups compare with the geographical labels?
Use the genotype information
And the program
for a description of the data,
for a discussion of human structure and
discussion of the STRUCTURE algorithm.
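Before running a clustering program, a simple per-SNP summary of differentiation can be useful. The sketch below computes a basic F_ST estimate, (H_T - H_S) / H_T, for one biallelic SNP. It assumes genotypes are coded as the number of copies (0, 1 or 2) of one allele and weights populations equally. The population labels and genotypes are made up, and this is not the method used by STRUCTURE.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele, from genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))

def fst(populations):
    """Simple F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    `populations` maps a population label to a list of genotypes.
    Populations are weighted equally (an assumption of this sketch).
    """
    freqs = [allele_frequency(g) for g in populations.values()]
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return float("nan")  # SNP is monomorphic overall
    return (h_t - h_s) / h_t

# Made-up genotypes for two hypothetical population labels.
snp = {"pop_A": [0, 0, 1, 0, 1], "pop_B": [2, 1, 2, 2, 1]}
print("F_ST:", fst(snp))
```

Averaging this quantity over many SNPs gives a first impression of how strongly the geographical labels are reflected in allele frequencies.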
v) Recombination and linkage disequilibrium
How does the recombination rate vary along the chromosome? Does recombination rate
correlate with underlying genomic features such as gene location and GC content? Use
based estimates of the recombination rate available from
The program Recmin written by Simon Myers
and the LDhat package to estimate recombination
from population genetic data
genotype data from
unrelated UK Caucasians
(software will be made available on Friday)
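Once recombination rate estimates and a genomic feature such as GC content have been summarised in the same windows, their association can be examined with a rank correlation. The sketch below computes Spearman's coefficient in plain Python. The window values are invented for illustration and the function names are hypothetical.

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented per-window values, for illustration only.
recombination_rate = [0.1, 0.5, 1.2, 3.0, 0.8]   # e.g. cM/Mb
gc_content = [0.38, 0.40, 0.45, 0.60, 0.41]
print("Spearman rho:", spearman(recombination_rate, gc_content))
```

A rank-based measure is used here because recombination rate estimates tend to be highly skewed. The result will also depend on the window size chosen.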
1 (2004) The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum. Mol. Genet.
2 (2001) The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X.
3 (2001) Initial sequencing and analysis of the human genome.
4 (2003) Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios.
5 (2004) The fine-scale structure of recombination rate variation in the human genome.
6 (2002) Genetic structure of human populations.
7 (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.
8 (2000) Inference of population structure using multilocus genotype data.
9 (2002) A high-resolution recombination map of the human genome.
10 Myers,S.R. and Griffiths,R.C. (2003) Bounds on the minimum number of recombination events in a sample history.