Graphs and Networks in Bioinformatics


Oct 1, 2013


Susan Holmes
Statistics, Stanford
Trees and Evolutionary Relationships
Metabolic Pathways
Transcriptional Pathways
Protein-Protein Interactions
Databases
Different graphs
Trees
Gene families, evolution of proteins (PFAM)
Sequence mutations, relation of diseases (HIV)
Homology as an indication of function
Time to species jump (malaria, HIV)
Anthropology (Y chromosome)
SplitsTree Graphs
Median Networks
Galactose Metabolism
Metabolic Pathways
Glycolysis
Krebs Cycle

Transcriptional Network
A network of regulator-gene interactions: nodes are genes, edges are interactions.
A biochemical network responsible for regulating the expression of genes in cells.
Special configurations
Figure: negative feedback loop (Gal4, Gal80) and positive feedback loop (Gal4, Gal80, Gal3)
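Whether a feedback loop is positive or negative follows from the signs of its edges: the loop is positive when the product of its activation (+1) and repression (-1) signs is positive, and negative otherwise. The Python sketch below checks this for the two GAL loops. The edge signs (Gal4 inducing GAL80 and GAL3, Gal80 inhibiting Gal4, Gal3 inhibiting Gal80) are an assumption made for illustration; the slide only names the genes.

```python
# Assumed edge signs for the GAL circuit (not stated on the slide):
# +1 = activation, -1 = repression/inhibition.
GAL_EDGES = {
    ("Gal4", "Gal80"): +1,  # Gal4 induces GAL80
    ("Gal80", "Gal4"): -1,  # Gal80 inhibits Gal4
    ("Gal4", "Gal3"): +1,   # Gal4 induces GAL3
    ("Gal3", "Gal80"): -1,  # Gal3 inhibits Gal80
}

def loop_sign(cycle, edges):
    """Product of edge signs around a directed cycle.

    `cycle` lists nodes in order; the edge from the last node back to the
    first closes the loop. +1 means positive feedback, -1 negative feedback.
    """
    sign = 1
    for i, src in enumerate(cycle):
        dst = cycle[(i + 1) % len(cycle)]
        sign *= edges[(src, dst)]  # KeyError if the loop uses a missing edge
    return sign

def describe(cycle, edges):
    return "positive" if loop_sign(cycle, edges) > 0 else "negative"

print(describe(["Gal4", "Gal80"], GAL_EDGES))          # negative
print(describe(["Gal4", "Gal3", "Gal80"], GAL_EDGES))  # positive
```

Two repressions in a row cancel out, which is why adding Gal3 turns the loop positive.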
Building a cell-cycle network
Blue box – a set of genes bound by a common set of regulators and co-expressed through the cycle
Text inside boxes – the common regulators
Ovals – regulators, connected by solid lines to the genes they regulate
Arcs – define the period of activity of the regulator
Dashed line – a gene encoding a regulator
Text: Uri Alon's Motifs

Algorithm
Start from a directed graph (interactions are directed edges, possibly of 2 types).
Scan the network for all possible 3-node and 4-node subgraphs (5-node, etc.) and record the number of occurrences of each subgraph.
Compare these counts to the number of appearances in "suitable" randomized graphs; in Alon's work the suitable graphs are Erdős-Rényi, though they should probably be tailored.
"Network motifs" are defined as those with p-value < 0.01.
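The scan-and-compare procedure can be sketched in Python for one 3-node pattern, the feed-forward loop (X->Y, Y->Z, X->Z). This is a minimal illustration under stated assumptions: it counts non-induced occurrences of a single pattern instead of enumerating every 3- and 4-node subgraph, it uses a directed Erdős-Rényi G(n, m) null model as described above (a degree-preserving randomization is a common alternative), and the toy network and function names are hypothetical.

```python
import itertools
import random

def count_ffl(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z with X, Y, Z distinct.

    Non-induced count: extra edges among X, Y, Z are not checked.
    """
    succ = {}
    for u, v in edges:
        if u != v:  # ignore self-loops (autoregulation)
            succ.setdefault(u, set()).add(v)
    count = 0
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                if z != x and z in targets:
                    count += 1
    return count

def erdos_renyi_directed(nodes, m, rng):
    """Directed G(n, m): m distinct edges without self-loops, chosen uniformly."""
    pairs = list(itertools.permutations(nodes, 2))
    return rng.sample(pairs, m)

def motif_p_value(edges, n_random=1000, seed=0):
    """Empirical p-value of the FFL count against random graphs of the same size."""
    edge_set = {(u, v) for u, v in edges if u != v}
    nodes = sorted({n for e in edge_set for n in e})
    observed = count_ffl(edge_set)
    rng = random.Random(seed)
    hits = sum(
        count_ffl(erdos_renyi_directed(nodes, len(edge_set), rng)) >= observed
        for _ in range(n_random)
    )
    # Add-one correction so the estimate is never exactly 0.
    return observed, (hits + 1) / (n_random + 1)

# Hypothetical toy network: three planted FFLs plus two extra edges.
toy = [("A1", "B1"), ("B1", "C1"), ("A1", "C1"),
       ("A2", "B2"), ("B2", "C2"), ("A2", "C2"),
       ("A3", "B3"), ("B3", "C3"), ("A3", "C3"),
       ("A1", "A2"), ("C3", "D")]
observed, p = motif_p_value(toy)
print(f"FFLs observed: {observed}, empirical p = {p:.4f}, motif: {p < 0.01}")
```

Keeping the same node set and edge count in every random graph is what makes the comparison meaningful; swapping in a different randomization only changes `erdos_renyi_directed`.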
Motifs Detected
2 significant motifs:
Feedforward loop
Bi-Fan
Both appeared numerous times in non-homologous gene systems that perform diverse biological functions.
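A bi-fan is two regulators a and b that both regulate the same two targets c and d. A minimal Python sketch of counting them, assuming a non-induced count (other edges among the four nodes are not checked) and hypothetical node names: every unordered regulator pair that shares t targets contributes C(t, 2) bi-fans.

```python
from itertools import combinations
from math import comb

def count_bifans(edges):
    """Count bi-fans a->c, a->d, b->c, b->d over distinct a, b, c, d.

    Each unordered regulator pair sharing t targets contributes C(t, 2).
    """
    succ = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            succ.setdefault(u, set()).add(v)
    total = 0
    for a, b in combinations(sorted(succ), 2):
        shared = (succ[a] & succ[b]) - {a, b}
        total += comb(len(shared), 2)
    return total

print(count_bifans([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]))      # 1
print(count_bifans([("a", t) for t in "cde"] + [("b", t) for t in "cde"]))  # 3
```

The same random-graph comparison used for feed-forward loops applies unchanged; only the counting function differs.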
Application to E. coli (Alon)

Database: RegulonDB contains interactions between TFs and the operons they regulate
Contains 577 interactions, 424 operons and 116 TFs
35 more TFs were added from the literature

The previously described algorithm was run on this data (1000 random networks, same p-value cutoff)
Significant motifs
Feedforward loop: found in 22 different systems, 10 TFs and 40 operons
85% of the loops are coherent
p-value = 0.001
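A feed-forward loop is usually called coherent when the sign of the direct edge X->Z matches the sign of the indirect path X->Y->Z (the product of its two edge signs). The Python sketch below splits loops on that rule, assuming edges are labeled +1 (activation) or -1 (repression); the network and names are hypothetical.

```python
def classify_ffls(signed_edges):
    """Split feed-forward loops X->Y->Z (+ X->Z) into coherent and incoherent.

    signed_edges maps (regulator, target) to +1 (activation) or -1 (repression).
    Coherent: sign(X->Z) equals sign(X->Y) * sign(Y->Z).
    """
    succ = {}
    for u, v in signed_edges:
        if u != v:  # ignore autoregulation
            succ.setdefault(u, set()).add(v)
    coherent, incoherent = [], []
    for x, targets in succ.items():
        for y in sorted(targets):
            for z in sorted(succ.get(y, ())):
                if z != x and z in targets:
                    direct = signed_edges[(x, z)]
                    indirect = signed_edges[(x, y)] * signed_edges[(y, z)]
                    bucket = coherent if direct == indirect else incoherent
                    bucket.append((x, y, z))
    return coherent, incoherent

# Hypothetical signed network with one loop of each kind.
edges = {("X", "Y"): +1, ("Y", "Z"): +1, ("X", "Z"): +1,   # paths agree
         ("P", "Q"): +1, ("Q", "R"): -1, ("P", "R"): +1}   # paths disagree
coherent, incoherent = classify_ffls(edges)
share = len(coherent) / (len(coherent) + len(incoherent))
print(f"coherent: {coherent}, incoherent: {incoherent}, share coherent = {share:.0%}")
```

Applied to a signed regulatory network, `share` is the kind of fraction reported above for E. coli.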
Significant Motif
Single Input Motif (SIM)
All operons in a SIM are regulated with the same sign.
Appeared in 24 different systems.
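One way to pick out SIM candidates is sketched below in Python, under these assumptions: a target belongs to regulator R only if R is its sole regulator, R needs at least `min_targets` such targets (a threshold chosen here for illustration), and, matching the observation above, all of those edges must carry one sign. Edge signs (+1/-1) and all names are hypothetical.

```python
def find_sims(signed_edges, min_targets=2):
    """Return {regulator: sorted targets} for single-input-module candidates.

    A target belongs to regulator R if R is its only regulator (self-loops
    ignored). R forms a SIM if it has at least `min_targets` such targets
    and all of those edges carry the same sign.
    """
    regulators_of = {}
    for u, v in signed_edges:
        if u != v:
            regulators_of.setdefault(v, set()).add(u)
    exclusive = {}
    for target, regs in regulators_of.items():
        if len(regs) == 1:
            (reg,) = regs
            exclusive.setdefault(reg, []).append(target)
    sims = {}
    for reg, targets in exclusive.items():
        signs = {signed_edges[(reg, t)] for t in targets}
        if len(targets) >= min_targets and len(signs) == 1:
            sims[reg] = sorted(targets)
    return sims

# Hypothetical network: g3 has two regulators; R2's exclusive targets have mixed signs.
edges = {("R1", "g1"): -1, ("R1", "g2"): -1, ("R1", "g3"): -1,
         ("R2", "g3"): +1, ("R2", "g4"): +1, ("R2", "g5"): -1}
print(find_sims(edges))  # {'R1': ['g1', 'g2']}
```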
Dense Overlapping Regulon (DOR): a layer of overlapping interactions between operons and a group of TFs, much denser than such a structure would be in an Erdős-Rényi random graph
Significant Motif
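The "much denser than random" claim for a DOR can be made concrete with a simple test. A minimal Python sketch, assuming an Erdős-Rényi-style bipartite null model in which every TF-operon pair is an edge independently with probability equal to the overall network density; it scores a given TF x operon block with a binomial upper tail. This is not the clustering procedure used to find DORs, and the network is hypothetical.

```python
from math import comb

def block_density_test(edges, tfs, operons):
    """Edge density of a TF x operon block versus the whole TF -> operon network.

    Null model (assumed): each TF-operon pair is an independent edge with
    probability p equal to the overall density. Returns
    (block density, p, upper-tail binomial p-value).
    """
    edge_set = set(edges)
    all_tfs = {u for u, _ in edge_set}
    all_operons = {v for _, v in edge_set}
    p = len(edge_set) / (len(all_tfs) * len(all_operons))
    n = len(tfs) * len(operons)
    k = sum((t, o) in edge_set for t in tfs for o in operons)
    # P(X >= k) for X ~ Binomial(n, p)
    tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return k / n, p, tail

# Hypothetical network: TFs T1-T3 jointly regulate O1-O4; the rest is sparse.
dense = [(t, o) for t in ("T1", "T2", "T3") for o in ("O1", "O2", "O3", "O4")]
sparse = [("T4", "O5"), ("T5", "O6"), ("T6", "O7"),
          ("T1", "O8"), ("T4", "O9"), ("T5", "O10")]
density, p, tail = block_density_test(dense + sparse,
                                      ["T1", "T2", "T3"], ["O1", "O2", "O3", "O4"])
print(f"block density {density:.2f} vs overall {p:.2f}, tail p = {tail:.2e}")
```

Because the block is chosen after looking at the data, the tail probability overstates significance; it is a descriptive check, not a calibrated p-value.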

E. coli Network
Single layer of DORs
Feedforward loops and SIMs often occur at the output of these DORs
DORs are interconnected by global TFs, which control many genes in one DOR and few in others
Few cascades
Over 70% of operons are connected to DORs; the rest are in small, disjoint systems
KEGG pathways
Uses gene expression to score subgraphs
Finding active subnetworks
Discovering Regulatory and Signaling Circuits in Molecular Interaction Networks
by T. Ideker, O. Ozier, B. Schwikowski and A.F. Siegel,
Whitehead Institute for Biomedical Research, Cambridge and University of Washington, Seattle. (Bioinformatics 2002 Jul;18 Suppl 1:S233-40)
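As we understand Ideker et al.'s scoring, each gene's expression p-value p_i becomes z_i = Φ^-1(1 - p_i), a subnetwork A of k genes gets z_A = (sum of z_i) / sqrt(k), and z_A is then standardized against random gene sets of the same size, s_A = (z_A - mu_k) / sigma_k. The Python sketch below follows that description; the p-values, gene names, Monte Carlo sample size and seed are all made up for illustration.

```python
import math
import random
import statistics

_STD_NORMAL = statistics.NormalDist()  # mean 0, standard deviation 1

def gene_z(p_value):
    """z_i = inverse standard-normal CDF of (1 - p_i); needs 0 < p_value < 1."""
    return _STD_NORMAL.inv_cdf(1 - p_value)

def aggregate_z(genes, z):
    """z_A = (sum of z_i over the k genes in A) / sqrt(k)."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))

def corrected_score(genes, z, n_samples=1000, seed=0):
    """s_A = (z_A - mu_k) / sigma_k, with mu_k and sigma_k estimated
    from random gene sets of the same size k (Monte Carlo)."""
    rng = random.Random(seed)
    pool = sorted(z)
    k = len(genes)
    background = [aggregate_z(rng.sample(pool, k), z) for _ in range(n_samples)]
    mu_k = statistics.fmean(background)
    sigma_k = statistics.stdev(background)
    return (aggregate_z(genes, z) - mu_k) / sigma_k

# Hypothetical per-gene p-values from a two-condition expression comparison.
p_values = {"G1": 0.001, "G2": 0.004, "G3": 0.01, "G4": 0.5, "G5": 0.6,
            "G6": 0.7, "G7": 0.8, "G8": 0.3, "G9": 0.9, "G10": 0.4}
z = {g: gene_z(p) for g, p in p_values.items()}
candidate = ["G1", "G2", "G3"]
print(f"z_A = {aggregate_z(candidate, z):.2f}, s_A = {corrected_score(candidate, z):.2f}")
```

Dividing by sqrt(k) keeps z_A standard normal under the null for any subnetwork size; the size correction then adjusts for the fact that the search favours large, high-scoring sets.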
Experiment #1
Small network of 362 interactions.
Two conditions of the expression data: gal80 deletion vs. WT.
5 significant subnetworks were found, including 41 out of 77 significant genes.
Figure: subnetworks labeled Galactose utilization; Gluconeogenesis; Mating, cell cycle; rProtein synthesis; Glycolytic enzymes; Stress
Experiment #2
The network consists of all known interactions:
7145 protein-protein interactions from BIND
317 regulation interactions from TRANSFAC
Expression data includes 20 perturbations to genes in the galactose pathway.
7 active subnetworks were found; the biggest consists of 340 genes.
Repeating the annealing on that network generated 5 significant sub-subnetworks.
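The active-subnetwork search uses simulated annealing. A minimal Python sketch, assuming a simplified version of that search: each node is toggled on or off, a state is scored by its best connected component of active nodes (the uncorrected z-score sum over sqrt(size)), worse states are accepted with Metropolis probability, and the temperature cools geometrically. The cooling schedule, iteration count, toy graph and z-scores are assumptions, and the published method's size correction is omitted.

```python
import math
import random

def components(active, adj):
    """Connected components of the subgraph induced by the active nodes."""
    seen, comps = set(), []
    for start in active:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb in active and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def best_component(active, adj, z):
    """(score, nodes) of the best component; score = sum(z) / sqrt(size)."""
    best = (-math.inf, [])
    for comp in components(active, adj):
        score = sum(z[n] for n in comp) / math.sqrt(len(comp))
        if score > best[0]:
            best = (score, sorted(comp))
    return best

def anneal(adj, z, n_iter=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over on/off node states with geometric cooling."""
    rng = random.Random(seed)
    nodes = list(adj)
    active = {n for n in nodes if rng.random() < 0.5}
    current = best_component(active, adj, z)
    best = current
    for i in range(n_iter):
        temp = t_start * (t_end / t_start) ** (i / n_iter)
        node = rng.choice(nodes)
        active ^= {node}  # toggle one node
        cand = best_component(active, adj, z)
        delta = cand[0] - current[0]
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = cand
            if cand[0] > best[0]:
                best = cand
        else:
            active ^= {node}  # reject: undo the toggle
    return best

# Hypothetical undirected interaction graph with per-gene z-scores.
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "e"},
       "d": {"a", "f"}, "e": {"c"}, "f": {"d"}}
z = {"a": 2.5, "b": 2.0, "c": 3.0, "d": -1.0, "e": -0.5, "f": -1.5}
score, module = anneal(adj, z)
print(module, round(score, 2))
```

Re-running `anneal` on the adjacency restricted to a found subnetwork mirrors the "repeat annealing" step described above.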
References
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii and U. Alon, "Network Motifs: Simple Building Blocks of Complex Networks", Science, 298:824-827 (2002).
Y. Setty, A. E. Mayo, M. G. Surette and U. Alon, "Detailed map of a cis-regulatory input function", PNAS, 100:7702-7707 (2003).
S. Shen-Orr, R. Milo, S. Mangan and U. Alon, "Network motifs in the transcriptional regulation network of Escherichia coli", Nature Genetics, 31:64-68 (2002).
P. Uetz, B. Schwikowski and T. Ideker, "Visualization and Integration of Protein-Protein Interactions", in Protein-Protein Interactions: A Molecular Cloning Manual, E. Golemis ed., CSHL Press, Cold Spring Harbor, N.Y. (2002).
T. Ideker, O. Ozier, B. Schwikowski and A. Siegel, "Discovering regulatory and signaling circuits in molecular interaction networks", Bioinformatics, 18:S233 (2002).

http://www.nature.com/nrc/journal/v2/n5/weinberg_poster/
Subway map of cancer pathways. Nature Reviews Cancer
Courses: http://www.dna.caltech.edu/courses/cs191/
Lectures: http://icg.harvard.edu/~mcb195/Lectures/summary_04_02.html
Talks: http://www.cmth.bnl.gov/~maslov/rockefeller_2002_networks.ppt
Other resources
KEGG - the Kyoto Encyclopedia of Genes and Genomes. The goal of the project is to computerize current knowledge of the information pathways of genes and gene products, which may be regarded as wiring diagrams of biological systems. KEGG consists of three types of data: pathways, genes, and molecules, linked to each other and to existing databases through the DBGET integrated database system. The data in KEGG are represented either by graphical diagrams (pathway maps and genome maps) or by hierarchical texts (gene catalogs and molecular catalogs). The programs for handling the data in KEGG are written either as CGI (Common Gateway Interface) scripts or in Java. http://www.genome.ad.jp/kegg/kegg.html
RegulonDB - Escherichia coli signal transduction pathways and transcriptional regulation
WIT - Integrated system for functional curation and development of metabolic models
UM-BBD - The University of Minnesota Biocatalysis/Biodegradation Database. Microbial biocatalytic reactions and biodegradation pathways, primarily for xenobiotic chemical compounds
EcoCyc - Encyclopedia of E. coli Genes and Metabolism. A bioinformatics database that describes the genome and the biochemical machinery of E. coli. The long-term goal of the project is to describe the molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli.
MetaCyc - A meta-metabolic database, so called because it contains pathways from a variety of different organisms. MetaCyc describes metabolic pathways, reactions, enzymes, and substrate compounds. The MetaCyc data were gathered from a variety of literature and online sources and contain citations to the source of each pathway.

Databases