bioinformatics

hordeprobable, Biotechnology

4 Oct 2013








MG 500

Web site: www.biosci.ohio-state.edu/~dverma/

Genomics, Bioinformatics and Proteomics

Chapter 19

From genes to genomes


Functions of many genes are not known


Genes for many phenotypes are not known


Mutants are known for which the genotype has not been identified


Polygenic traits


The goal for the next 10 years is to define the function of most genes in plants and animals


Genome size of model organisms

Organism      Cellular complexity      Genomic complexity
__________________________________________________________
Bacterium     one-cell prokaryote      4 Mb
Yeast         one-cell eukaryote       15 Mb
Nematode      1000 cells               100 Mb
Drosophila    50,000 cells             180 Mb
Mouse         10^11 cells              3000 Mb
Human         10^14 cells              3000 Mb



A Genomic Project


Genetic map


Physical map


cDNA sequencing


Genomic sequencing


- clone-by-clone


- shotgun method

Annotating the sequence


Identification of Open Reading Frames


Six possible open reading frames


Initiation codon (ATG)


Termination codon (TAA, TGA, TAG)


Splicing


False open reading frames


Wrong termination
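
The ORF rules listed above (six reading frames, an ATG initiation codon, and a TAA, TGA or TAG termination codon) can be sketched in Python. This is only a minimal illustration, not the method of any particular ORF search program. The names `reverse_complement` and `find_orfs`, the `min_codons` cutoff and the example sequence are assumptions made for this sketch. Splicing is ignored, so on eukaryotic genomic DNA a scan like this would report false or wrongly terminated ORFs.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}
START_CODON = "ATG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns (strand, frame, orf_sequence) tuples. The ORF sequence includes
    the start and stop codons; frame is the 0-based offset on the strand
    being read. min_codons is a hypothetical length cutoff.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # look for the next ATG in this frame
    return orfs


# Made-up example sequence: one short ORF in forward frame 2.
print(find_orfs("CCATGAAATAGGG"))
```

An ATG with no in-frame stop codon before the end of the sequence is not reported here; a real tool would have to decide how to treat such partial ORFs.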



ORF Search Programs


Codon bias


High in exons


Nil or low in introns


Intron exon junction


Upstream regulatory sequences


Poly A addition signal
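
Codon bias, the first criterion above, can be illustrated by counting codons in a chosen reading frame and scoring them against a table of codon frequencies taken from known coding sequences. The sketch below is a simplified log-odds score against uniform codon usage, not the algorithm of a specific program. The function names, the `reference_freqs` table and the `floor` value are hypothetical and would need real data from the organism being studied.

```python
import math
from collections import Counter


def codon_counts(seq, frame=0):
    """Count the codons read in one frame of a DNA string."""
    s = seq.upper()
    return Counter(s[i:i + 3] for i in range(frame, len(s) - 2, 3))


def coding_score(seq, reference_freqs, frame=0, floor=1e-6):
    """Average log-odds per codon: reference coding usage vs. uniform usage.

    reference_freqs maps codon -> frequency in known exons (an assumed
    input). Codons missing from the table get the small 'floor' frequency.
    Scores above zero suggest exon-like codon usage in this frame.
    """
    counts = codon_counts(seq, frame)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    uniform = 1 / 64  # expected frequency if all 64 codons were used equally
    score = sum(
        n * math.log(max(reference_freqs.get(codon, 0.0), floor) / uniform)
        for codon, n in counts.items()
    )
    return score / total
```

In use, the score would be computed in each frame along a sliding window; a region where one frame scores well above the others is a candidate exon, while introns are expected to score near or below zero in every frame.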

EST: Expressed Sequence Tags


Not full-length sequences


Provide general information about a gene transcript, tissue specificity and abundance of expression


Full-length sequence can be obtained by PCR

Gene Arrays


Synthetic DNA (50-60 bases) immobilized on a chip


Hybridize with fluorescently labeled cDNA


Commercially available gene arrays


Total Genome array


Tissue specific gene array


Disease-specific gene array


Signal transduction pathway arrays

Computer Programs for Genome Analysis


Databases (over 800 genomes sequenced)


Gene databases: ESTs, SNPs, protein databases


EMBL / GenBank (National Center for Biotechnology Information)


FASTA, ORF


BlastN


BlastP


Consensus sequence database, secondary structure analysis

Prokaryotic Genomes


Most genomes are Circular


Gene density is very high


One gene /kb


Intergenic region very short


Few or no introns


Presence of operons (polycistronic transcription units)


1500-5000 genes
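
As a rough check on these figures, gene number can be estimated from genome size and gene density. The short sketch below uses the 4 Mb bacterial genome from the genome size table and the density of one gene per kb; the function name `estimated_gene_count` is an assumption for illustration, and 1 Mb is taken as 1000 kb.

```python
def estimated_gene_count(genome_size_mb, genes_per_kb):
    """Estimate gene number from genome size (Mb) and gene density (genes/kb)."""
    return genome_size_mb * 1000 * genes_per_kb


# 4 Mb bacterial genome at about one gene per kb
print(estimated_gene_count(4, 1.0))  # 4000.0
```

The estimate of about 4000 genes falls inside the 1500-5000 range given for prokaryotes.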


Eukaryotic Genomes


Low gene density


Increase in genome size is not proportional to gene number (yeast 1 gene/2.5 kb; human 1 gene/8.5 kb)


Number and size of introns increase as genome size increases (yeast has few introns; a single human gene may have 100 introns)


Presence of repetitive DNA and large intergenic regions


Eukaryotic Genes


Monocistronic, but C. elegans has 25% of its genes arranged as polycistronic units


Genes are found within introns of other genes


Gene duplications and the presence of pseudogenes


Maize has 10 times larger genome than
Arabidopsis but contains about the same number
of genes (genes are arranged in clusters)


Gene-empty regions may be involved in chromosomal rearrangement

Insights from genomics


Organisms resembling single cell algae existed 1.4
billion years ago


Genomes are highly dynamic and evolving rapidly


Smallest genome (Mycoplasma, only 470 genes)


Disease-causing bacteria have reduced their genome size (Mycobacterium leprae has lost 50% of its genome), causing slow growth

Gene Duplication


Important for evolution


Over 50% of the genome is duplicated in yeast


Provides insight into the evolution of a species (over 10,000 genes in humans are duplicated)


Gene duplication increases genetic diversity

Gene Duplication


Caused by unequal crossing over


Replication errors


Molecular phylogenetics allows determination of duplication and divergence events


Duplicated genes may remain linked or become scattered on different chromosomes, e.g. globin genes


Multigene families can provide diverse functions during development


Multiple proteins that arise from gene duplications are known as paralogs


Immunoglobulin Genes


Two light chains and two heavy chains with constant and variable regions


A unique somatic recombination occurs during B cell maturation to generate over 100,000 possible configurations, each specific to an unknown antigen


Recombination occurs between the variable region, J region and C region


J and C regions have no promoters

Immunoglobulins

Proteomics


The "one gene, one enzyme" concept is not valid


Many proteins are post-translationally modified or combine with other proteins to make a functional complex


Proteome changes during development and in response to the environment

2D Gel analysis


Denature proteins


Isoelectric focusing (resolve on the basis of the isoelectric point of a protein)


SDS-polyacrylamide gel electrophoresis (resolve on the basis of size)


Resolve 200-1000 major spots


[Figure: 2D gel; axes: isoelectric point (pH) and MW]

Protein fingerprinting


Each spot from a gel can be cut out and sequenced using a mass spectrometer


Treat with trypsin protease and analyze the masses of the fragments; compare the data with the peptide fragment masses predicted by computer analysis of a protein database
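
The comparison step above can be sketched as an in-silico tryptic digest followed by mass matching. This is a simplified illustration under several assumptions: trypsin is modeled as cutting after K or R unless the next residue is P, no missed cleavages or modifications are considered, masses are neutral monoisotopic masses rather than the m/z values an instrument reports, and the function names, the 0.5 Da `tolerance` and the two-entry `database` are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N and C termini


def tryptic_digest(protein):
    """Cut after K or R, except before P (simplified trypsin rule)."""
    seq = protein.upper()
    peptides, current = [], ""
    for i, aa in enumerate(seq):
        current += aa
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_score(observed_masses, protein, tolerance=0.5):
    """Number of observed masses matching a predicted tryptic peptide."""
    predicted = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(m - q) <= tolerance for q in predicted)
        for m in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.5):
    """Name of the database protein explaining the most observed masses."""
    return max(
        database,
        key=lambda name: match_score(observed_masses, database[name], tolerance),
    )


# Hypothetical two-protein database and "observed" masses from one spot.
database = {"protA": "AKGGR", "protB": "WWKFFR"}
observed = [peptide_mass("AK"), peptide_mass("GGR")]
print(best_match(observed, database))
```

Counting matched masses is the simplest possible score; practical search tools weight matches by probability and allow for missed cleavages and modified residues.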

Genomics/proteomics


A fertile field with enormous potential for new discoveries/products



bioinformatics



data mining


Diagnostics, Novel Agricultural crops