Genomics II—Proteomics and Bioinformatics

Oct 2, 2013

Chapter 22

Genomics II

Functional Genomics: studying genes in groups, with respect to the cell, tissue, signaling pathway, or organism

Proteomics: understanding the interplay among many different proteins (cellular processes and organismal-level traits)

Bioinformatics: using computers, math, and statistics to understand genome and proteome information (record, store, analyze, predict)

Figure 22.1

Starting material: a mixture of 3 different types of mRNA.

1. Add reverse transcriptase, poly-dT primers that anneal to the mRNAs, and fluorescent nucleotides. Note: only 1 complementary cDNA strand is made. The product is fluorescently labeled cDNA that is complementary to the mRNA.
2. Hybridize cDNAs to the microarray.
3. View with a laser scanner.

Figure labels: a portion of a DNA microarray, with spots labeled A to F.


Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Microarrays for studying gene expression or re-sequencing

Modern-day “Southerns” and “Northerns”: microarray analysis
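The labeling and hybridization steps of Figure 22.1 can be mimicked on sequences. The sketch below is a minimal illustration, assuming short made-up mRNA and probe sequences (real array probes are much longer); the function names, spot names, and sequences are all hypothetical.

```python
# Toy model of Figure 22.1: mRNA -> single-stranded cDNA -> hybridization to spots.
# All sequences and spot names below are invented for illustration.

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the single cDNA strand complementary to an mRNA, written 5'->3'."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(mrna))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def lit_spots(mrnas, spots):
    """Return the names of the spots whose probe can base-pair with some cDNA."""
    cdnas = [reverse_transcribe(m) for m in mrnas]
    hits = []
    for name, probe in spots.items():
        # A probe hybridizes if its reverse complement occurs within a cDNA.
        target = reverse_complement(probe)
        if any(target in cdna for cdna in cdnas):
            hits.append(name)
    return sorted(hits)


# Hypothetical array: each spot holds a short sense-strand probe for one gene.
spots = {
    "A": "ATGGCCATTG",
    "B": "ATGCGTACGT",
    "C": "ATGTTTCCGA",
}
# Hypothetical mRNA sample containing transcripts of genes A and C only.
mrnas = ["AUGGCCAUUGUAAAAAA", "AUGUUUCCGAUAAAAAA"]

print(lit_spots(mrnas, spots))
```

Only the spots whose genes were transcribed in the sample light up, which is why the pattern of fluorescent spots reads out which genes are expressed.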

Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling

Two distinct forms of large B-cell lymphoma are shown by the expression pattern: GC B-like DLBCL (orange) and Activated B-like DLBCL (blue). Patients with GC B-like DLBCL had significantly better overall survival.

Ash Alizadeh et al. 2000. Nature 403, 503-511 (3 February 2000)

Observation/problem

Diffuse large B-cell lymphoma (DLBCL), the most common subtype of non-Hodgkin's lymphoma, is clinically heterogeneous: 40% of patients respond well to current therapy and have prolonged survival, whereas the remainder succumb to the disease.

Hypothesis

Variability in natural history reflects unrecognized molecular heterogeneity in the tumours.

Experiment

DNA microarrays were used for a systematic characterization of gene expression in B-cell malignancies.

Results

- Diversity in gene expression among the tumours of DLBCL patients (reflecting the variation in tumour proliferation rate, host response, and differentiation state of the tumour).
- Identified two molecularly distinct forms of DLBCL, which had gene expression patterns indicative of different stages of B-cell differentiation.
  - One type expressed genes characteristic of germinal centre B cells ('germinal centre B-like DLBCL');
  - the second type expressed genes normally induced during in vitro activation of peripheral blood B cells ('activated B-like DLBCL').
- Patients with germinal centre B-like DLBCL had a significantly better overall survival than those with activated B-like DLBCL.

Conclusion

Molecular classification of tumours on the basis of gene expression can thus identify previously undetected and clinically significant subtypes of cancer.


Ash Alizadeh et al. 2000. Nature 403, 503-511 (3 February 2000)
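To make "classification on the basis of gene expression" concrete, here is a minimal sketch that groups samples by how well their expression profiles correlate. It is a simple stand-in, not the method or data of the cited study: the tumour names, the expression values, the 0.5 threshold, and both function names are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def group_by_correlation(profiles, threshold=0.5):
    """Group samples linked by a correlation above threshold (single linkage)."""
    groups = []
    for name in profiles:
        # Find every existing group that contains a sample similar to this one.
        linked = [g for g in groups
                  if any(pearson(profiles[name], profiles[other]) > threshold
                         for other in g)]
        merged = {name}
        for g in linked:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return sorted(sorted(g) for g in groups)


# Hypothetical expression values (one number per gene) for four tumours.
profiles = {
    "tumour_1": [2.0, 1.5, -1.0, -1.8],
    "tumour_2": [1.7, 1.9, -0.8, -1.2],
    "tumour_3": [-1.5, -1.1, 1.6, 2.1],
    "tumour_4": [-2.0, -0.9, 1.2, 1.7],
}

print(group_by_correlation(profiles))
```

With these toy numbers the four tumours fall into two groups with opposite expression patterns, which is the kind of split the microarray data revealed among DLBCL patients.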

Chromatin Immunoprecipitation Assay (ChIP)

Which DNA sequences bind to my protein of interest?

Figure 22.2

1. Add formaldehyde to crosslink protein to DNA. Lyse the cells.
2. Sonicate DNA into small pieces.
3. Add antibodies that recognize the protein of interest. The antibodies are bound to heavy beads. After the antibodies bind to the protein of interest, the sample is subjected to centrifugation.
4. Collect complexes in pellet.
5. Add chemical that breaks the crosslinks to remove the protein.

Known candidates: Conduct PCR using primers to a known DNA region. If PCR amplifies the DNA, the protein was bound to the DNA region recognized by the primers.

or

Unknown candidates: Ligate DNA linkers to the ends of the DNA. Conduct PCR using primers that are complementary to the linkers. Incorporate fluorescently labeled nucleotides during PCR. Denature DNA and hybridize to a microarray (see Figure 22.1).

Figure labels: antibody against protein of interest, protein of interest, bead, linker, pellet.
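The "known candidates" branch of ChIP comes down to asking whether a primer pair can amplify anything from the DNA recovered in the pellet. The sketch below models that test on strings. It assumes exact primer matches and ignores real PCR constraints such as melting temperature and product length; the fragments, primers, and function names are hypothetical.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def amplifies(fragment: str, forward_primer: str, reverse_primer: str) -> bool:
    """True if both primers anneal and point toward each other on the fragment."""
    start = fragment.find(forward_primer)
    if start == -1:
        return False
    # The reverse primer anneals to the top strand, so the top strand must
    # contain its reverse complement downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    return fragment.find(site, start + len(forward_primer)) != -1


# Hypothetical DNA fragments recovered after the crosslinks are broken.
fragments = [
    "GGATCCATGCATTTGACCGTAAGCTT",  # contains the candidate region
    "CCCGGGTTTAAACCCGGGTTT",       # unrelated DNA
]
# Hypothetical primers flanking the candidate region.
forward_primer = "ATGCAT"
reverse_primer = "GCTTAC"

bound = [f for f in fragments if amplifies(f, forward_primer, reverse_primer)]
print(bound)
```

A fragment that passes the test was in the pellet, so the protein of interest was bound to that region.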

Why is the proteome so large? Alternative splicing

(a) Alternative splicing

pre-mRNA: Exon 1, Exon 2, Exon 3, Exon 4, Exon 5, Exon 6

Alternative splicing produces different mRNAs, each of which is then translated:

- Exon 1, Exon 2, Exon 4, Exon 5, Exon 6
- or Exon 1, Exon 3, Exon 4, Exon 5, Exon 6
- or Exon 1, Exon 2, Exon 4, Exon 6
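A short sketch shows how one pre-mRNA yields several mature mRNAs. It uses three exon combinations of the kind shown in panel (a); the exon sequences and the `splice` helper are hypothetical and chosen only to keep the example short.

```python
def splice(exons, keep):
    """Join the exons whose 1-based numbers are listed in keep, in order."""
    return "".join(exons[number - 1] for number in keep)


# Hypothetical exon sequences for a six-exon gene.
exons = ["AUGGCU", "GAACCU", "UUCGGA", "CAGAAA", "GGUCUC", "UGGUAA"]

# Three alternative exon combinations.
isoforms = [(1, 2, 4, 5, 6), (1, 3, 4, 5, 6), (1, 2, 4, 6)]

mrnas = {combo: splice(exons, combo) for combo in isoforms}
for combo, mrna in mrnas.items():
    print(combo, mrna)
```

Each combination gives a different mRNA, so a single gene can contribute several distinct proteins to the proteome.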

Why is the proteome so large? Posttranslational modification

(b) Posttranslational covalent modification

Irreversible modifications:

- Proteolytic processing
- Disulfide bond formation (two SH groups joined as S-S)
- Attachment of prosthetic groups, sugars, or lipids (figure examples: heme group, sugar, phospholipid)

Reversible modifications:

- Phosphorylation (phosphate group, PO4 2-)
- Acetylation (acetyl group, COCH3)
- Methylation (methyl group, CH3)
Techniques to study the proteome: 2D gel analysis

Figure 22.4

1. Lyse a sample of cells and load the resulting mixture of proteins onto an isoelectric focusing gel (pH gradient from 4.0 to 10.0).
2. Proteins migrate until they reach the pH where their net charge is 0. At this point, a single band could contain 2 or more different proteins.
3. Lay the tube gel onto an SDS-polyacrylamide gel and separate proteins according to their molecular mass (scale from 200 kDa to 10 kDa).

Techniques to study the proteome: Mass spec

Figure 22.5

Start with a purified protein (N-terminus and C-terminus labeled).

Digest protein into small fragments using a protease.

Determine the mass of these fragments with a first spectrometer.

First spectrum: abundance versus mass/charge (0 to 4000), with one peptide peak labeled 1652 daltons.

Analyze this fragment (1652 daltons) with a second spectrometer. The peptide is fragmented from one end.

Second spectrum: abundance versus mass/charge (900 to 1800), with peaks at 1008, 1114, 1201, 1315, 1428, 1565, and 1652, and residue labels Asn, Ser, Asn, Leu, His, Ser between neighboring peaks.
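The second spectrum is read by subtraction: each gap between neighboring peaks is the mass of the one amino acid residue that was lost. The sketch below applies that idea to the peak values from the figure. It assumes a small table of rounded monoisotopic residue masses and a 0.5 dalton tolerance; the table subset, the tolerance, and the `read_ladder` name are choices made for this illustration.

```python
# Rounded monoisotopic residue masses in daltons (a subset of the 20 amino acids).
# Leu and Ile have the same mass, so "Leu" here stands for either.
RESIDUE_MASS = {
    "Gly": 57.02, "Ala": 71.04, "Ser": 87.03, "Pro": 97.05, "Val": 99.07,
    "Thr": 101.05, "Leu": 113.08, "Asn": 114.04, "Asp": 115.03,
    "Gln": 128.06, "Lys": 128.09, "Glu": 129.04, "His": 137.06,
}


def read_ladder(peaks, tolerance=0.5):
    """Name the residue matching each gap between neighboring peaks, or '?'."""
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        # Pick the residue whose mass is closest to the observed gap.
        name, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - gap))
        residues.append(name if abs(mass - gap) <= tolerance else "?")
    return residues


# Peak positions (mass/charge) as printed on the second spectrum.
peaks = [1008, 1114, 1201, 1315, 1428, 1565, 1652]

print(read_ladder(peaks))
```

The gaps of 87, 114, 113, 137, and 87 match Ser, Asn, Leu, His, and Ser in this table. The gap of 106 between 1008 and 1114 matches no entry within the tolerance, so the sketch reports "?" for it.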

Example of DNA sequence as stored in a genetic database

Numbers represent the base number in the sequence file.

A bioinformatics program may ask:

- Does the sequence contain a gene?
- Which nucleotides are the functional sites (e.g., promoters, exons, introns, termination sequences)?
- Does the sequence encode a protein, i.e., have an open reading frame (ORF)?
- What is the secondary structure of its RNA or associated amino acid sequence?
- Is the sequence homologous to any other known sequences?
- What is the evolutionary relationship between two or more sequences?
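The ORF question is one of the easiest to turn into a program. Below is a minimal sketch, assuming an ORF means an ATG followed in the same frame by a stop codon, searched on one strand only. Real gene finders also scan the complementary strand and weigh other signals. The test sequence, the `min_codons` cutoff, and the function name are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) slices of ORFs in the three frames of one strand.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive,
    and must span at least min_codons codons (stop codon included).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence with one short ORF in the third reading frame.
dna = "CCATGGCTTGGAAATAGGC"

for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

The returned pairs are ordinary Python slice bounds, so `dna[start:end]` gives the ORF including its stop codon.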

Sequence matches between E. coli and K. pneumoniae

- DNA sequences of the lacY gene
- ~78% of the bases are a perfect match
- In this case, the two sequences are similar because the genes are homologous to each other
- They have been derived from the same ancestral gene (refer to Figure 22.6)
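A figure such as "78% of the bases are a perfect match" is a percent identity over aligned positions. The sketch below shows the arithmetic, assuming the two sequences are already aligned to equal length with "-" marking gaps. The two short sequences are made up and are not the real lacY sequences; the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions with identical bases (gaps never match)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned sequences from two species (not real lacY data).
species_1 = "ATGTACTATTTA"
species_2 = "ATGAACCATCTA"

print(percent_identity(species_1, species_2))
```

Here 9 of the 12 aligned positions match, giving 75%.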

Interesting cancer mutation pattern in mitochondrial ND6 protein

Example output from a computer alignment program (and comparison to real-world data)

Alignment annotations: Human Pa Ca; Mouse Lu Ca; Human LHON, Human Thy Ca; Mouse Lu Ca

Federal Genetic Databases

National Center for Biotechnology Information
www.ncbi.nlm.nih.gov/

- U.S. government-funded national resource for molecular biology information.
- BLAST programs identify sequences with homology or similarity (Table 22.5).

Origin of orthologous genes

Orthologs, paralogs, homologs

Figure 22.6

1. An ancestral organism carries the ancestral lacY gene.
2. Evolutionary separation of 2 (or more) distinct species.
3. Accumulation of random mutations in the 2 genes.
4. Result: the lacY gene of E. coli and the lacY gene of K. pneumoniae, each carrying its own mutations.

From Thompson and Thompson, Genetics in Medicine, 6th ed. Like Brooker Fig. 8-7.

- All the globin genes have homology to each other
- α-like genes are paralogs of each other
- β-like genes are paralogs of each other
- α-1 in mice and α-1 in humans are orthologs

Figure 8.7

Globin gene family tree (time axis: millions of years ago, from 1,000 to 0)

Ancestral globin, followed by duplication:

- Myoglobin (Mb): better at binding and storing oxygen in muscle cells
- Hemoglobins: better at binding and transporting oxygen via red blood cells
  - α chains: ζ, ψζ, ψα2, ψα1, α2, α1, φ
  - β chains: ε, γG, γA, ψβ, δ, β

Table 22.5

Figure 22.7

A secondary structural model for E. coli 16S rRNA (5′ end and 3′ end labeled)