Pathway construction using integrative Bayesian Networks



Group meeting 29th Sep 2009

Ståle Nygård

Network of differentially expressed genes is constructed based on

(1) co-occurrences in PubMed abstracts,
(2) protein-protein interactions,
(3) co-regulations


[Figure: example networks with nodes A to H]

Seeded BN procedure

1) Construct undirected graph based on number of PubMed abstract co-occurrences and protein-protein interactions (PPI)
2) Make directed acyclic graph (DAG) using modified depth first search
3) Fit network to gene expression data using greedy hill climbing
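Steps 1 and 2 can be sketched roughly as follows. This is only an illustration: the slides do not say how the depth-first search is modified, so the sketch uses a plain DFS and orients each edge from the earlier-visited node to the later-visited one, which is enough to guarantee acyclicity. The function names, the `min_count` threshold and the example edges are all assumptions made up for the example.

```python
from collections import defaultdict

def build_undirected(cooccurrence, ppi, min_count=1):
    """Step 1: undirected graph from co-occurrence counts and PPI pairs."""
    adj = defaultdict(set)
    for (a, b), count in cooccurrence.items():
        if count >= min_count:          # hypothetical cut-off on abstract counts
            adj[a].add(b)
            adj[b].add(a)
    for a, b in ppi:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def orient_by_dfs(adj, root):
    """Step 2 (plain DFS, not the modified one): orient edges along visit order."""
    order = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # sorted() only makes the traversal deterministic
        stack.extend(sorted(adj[node], reverse=True))
    # An edge always points from the lower to the higher visit index,
    # so no directed cycle can arise.
    return {(a, b) for a in order for b in adj[a]
            if b in order and order[a] < order[b]}

# Made-up example data (not taken from the figures)
cooccurrence = {("A", "F"): 5, ("A", "C"): 12, ("C", "D"): 8, ("D", "G"): 2}
ppi = [("F", "H")]
dag = orient_by_dfs(build_undirected(cooccurrence, ppi), "A")
```

Only nodes reachable from `root` are oriented; a graph with several connected components would need one traversal per component.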

[Figure: start network and fitted network (nodes A to H); start-network edges are labelled with a number or PPI]

Extensions of seeded BN:
Integrating ligand-receptor bindings

The Database of Ligand-Receptor Partners (DLRP) includes 175 protein ligands, 131 protein receptors and 451 experimentally determined ligand-receptor pairings.


[Figure: start network with an added node I; ligand and receptor nodes are marked]




Integrating transcription factor binding data

TRANSFAC® 7.0 Public 2005 and other databases contain data on transcription factors, their experimentally proven binding sites, and regulated genes.

[Figure: start network with an added node I; a transcription factor (TF) and its TF target genes are marked]



Using sequence homologies


[Figure: network with nodes A to H and an added node I; I and A are sequence homologues]

Include homologues of genes already in the network.

Fitted network can be used to infer function of previously uncharacterized genes.

HOGENOM contains sequence data on homologous genes from fully sequenced organisms

Methodological problem (1):
Going from PDAG to DAG

- Adding ligand/receptors or TF binding data gives a start network which is a partially directed acyclic graph (PDAG).
- BN needs a directed acyclic graph (DAG).
- Dor & Tarsi (1992) provide a method for going from PDAG to DAG
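A minimal sketch of the Dor & Tarsi idea, written from the usual textbook description of the algorithm rather than from any project code: repeatedly pick a node that has no outgoing directed edge and whose undirected neighbours are adjacent to all of its other neighbours, point its undirected edges towards it, and remove it. The function name, the edge representation and the three-node example are assumptions for illustration. Note that this criterion refuses to create new v-structures; whether that restriction is wanted for the start network is not stated here.

```python
def pdag_to_dag(nodes, directed, undirected):
    """Return directed edges extending the PDAG, or None if no extension exists."""
    directed = set(directed)                          # (parent, child) pairs
    undirected = {frozenset(e) for e in undirected}
    remaining = set(nodes)
    result = set(directed)

    def adjacent(x, y):
        return ((x, y) in directed or (y, x) in directed
                or frozenset((x, y)) in undirected)

    while remaining:
        for x in sorted(remaining):
            # (a) x must be a sink: no directed edge leaves x
            if any(p == x for p, _ in directed):
                continue
            nbrs_undirected = {y for e in undirected if x in e for y in e if y != x}
            nbrs_all = {y for y in remaining if y != x and adjacent(x, y)}
            # (b) each undirected neighbour is adjacent to all other neighbours of x
            if all(adjacent(y, z) for y in nbrs_undirected
                   for z in nbrs_all if z != y):
                break
        else:
            return None                               # no admissible node left
        # Orient undirected edges towards x, then drop x from the working graph
        result |= {(y, x) for y in nbrs_undirected}
        directed = {(p, c) for p, c in directed if x not in (p, c)}
        undirected = {e for e in undirected if x not in e}
        remaining.discard(x)
    return result

# Hypothetical PDAG: one known direction A -> B, two undirected edges
dag = pdag_to_dag("ABC", [("A", "B")], [("B", "C"), ("A", "C")])
```

In this example the undirected edges B-C and A-C are both oriented away from C. A chordless undirected four-cycle, by contrast, has no admissible node and the function returns `None`.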


[Figure: start network with ligand and receptor nodes marked, including the added node I]

Methodological problem (2):
Quantification of local dependencies

How to quantify P(child | parents), e.g. P(E | B,C)?

Suggestions by Friedman (2000):
- Discretization of gene expressions and multinomial model
- Continuous gene expressions and linear Gaussian model

Newer suggestions:
- Non-parametric models (e.g. Ko et al., 2007)
- Mixture models using latent variables (e.g. Grzegorczyk et al., 2008)
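To make the first suggestion concrete, here is a small sketch of the discretize-then-count approach for P(E | B,C). Everything specific in it is an assumption chosen for illustration: the three expression levels, the cut-offs of -0.5 and 0.5, the add-one pseudocount and the four made-up samples.

```python
from collections import Counter, defaultdict

def discretize(value, low=-0.5, high=0.5):
    """Map an expression value to -1 (down), 0 (unchanged) or 1 (up)."""
    if value < low:
        return -1
    if value > high:
        return 1
    return 0

def fit_cpt(child, parents, samples, levels=(-1, 0, 1), pseudocount=1.0):
    """Estimate P(child | parents) from discretized samples, with smoothing."""
    counts = defaultdict(Counter)
    for s in samples:
        key = tuple(discretize(s[p]) for p in parents)
        counts[key][discretize(s[child])] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + pseudocount * len(levels)
        cpt[key] = {lv: (c[lv] + pseudocount) / total for lv in levels}
    return cpt

# Made-up expression values for genes B, C and E
samples = [
    {"B": 1.2, "C": 0.9, "E": 1.5},
    {"B": 1.0, "C": 1.1, "E": 0.8},
    {"B": 1.4, "C": 0.7, "E": 0.1},
    {"B": -1.0, "C": -0.8, "E": -1.3},
]
cpt = fit_cpt("E", ["B", "C"], samples)
```

`cpt[(1, 1)]` is then the estimated distribution of E when both B and C are up. Parent configurations that never occur in the data get no entry, which is one practical drawback of the multinomial model when a node has many parents.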





[Figure: example networks with nodes A to H]

Methodological problem (3):
Fitting DAG to expression data

- In Seeded BN, greedy hill climbing is used to optimize the network. Problem: a global optimum is not guaranteed.
- Another possibility: Markov Chain Monte Carlo (MCMC) (e.g. Grzegorczyk and Husmeier, 2008)
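The greedy search can be sketched as below. This is a generic illustration, not the Seeded BN implementation: the move set (add, delete or reverse one edge), the function names and especially `toy_score` are assumptions. A real fit would score each candidate DAG against the expression data instead.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Repeatedly strip nodes without incoming edges; failure means a cycle."""
    edges, left = set(edges), set(nodes)
    while left:
        roots = {n for n in left if not any(c == n for _, c in edges)}
        if not roots:
            return False
        left -= roots
        edges = {(p, c) for p, c in edges if p not in roots}
    return True

def neighbours(nodes, edges):
    """All edge sets one addition, deletion or reversal away."""
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                   # delete
            yield (edges - {(a, b)}) | {(b, a)}      # reverse
        elif (b, a) not in edges:
            yield edges | {(a, b)}                   # add

def hill_climb(nodes, start, score):
    current = frozenset(start)
    best = score(current)
    while True:
        candidates = [frozenset(e) for e in neighbours(nodes, current)
                      if is_acyclic(nodes, e)]
        top = max(candidates, key=score, default=current)
        if score(top) <= best:
            return current, best    # no neighbour improves: a local optimum
        current, best = top, score(top)

# Toy stand-in for a data-based score: reward hypothetical "true" edges
true_edges = {("A", "C"), ("C", "D"), ("D", "G")}

def toy_score(edges):
    return len(edges & true_edges) - len(edges - true_edges)

fitted, value = hill_climb("ACDG", {("A", "C"), ("G", "D")}, toy_score)
```

With this toy score every step towards `true_edges` is an improvement, so the search cannot get stuck. With a score computed from data that is no longer the case, and the loop stops at whatever local optimum it reaches first, which is the problem noted above.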




[Figure: example networks with nodes A to H]

Project plan

Java source code (from Quackenbush)

Modification of source code

New method generates network of interactions based on:
- PubMed articles
- PPI databases
- DLRP
- TRANSFAC
- HOGENOM

Network is trained to fit expression data

Does the new method improve the identification of known pathways?

Apply improved method to KO data (with unknown molecular response)

Project participants

Eivind


Vegard


Trevor


Geir Christensen, Institute for Experimental Medical Research, OUS - Ullevål