# Learning Bayesian Networks with microarray data


Nov 7, 2013


Goal: use well-known Bayesian network learning algorithms to analyze microarray data

Challenge in microarray data analysis techniques

Prior techniques (clustering, PCA, SVM):

Group together genes with similar expression patterns

Do not reveal structural relations between genes

The challenge:

Extract meaningful information from the expression data

Discover interactions between genes based on the measurements

Use of Bayesian networks in microarray data analysis

Expression profiles → constructed Bayesian network, which supports:

Gene-gene relation analysis: activation or inhibition

Sample classification: disease diagnosis

Gene regulatory network analysis: global view on the relations among genes

Bayesian networks: a short example

Evidence: my car does not start.

Reasoning: no fuel and dirty spark plugs both become more likely, so the probability that the fuel meter reads empty also increases.

[Figure: Bayesian network with the nodes Fuel, Clean spark plug, Start, and Fuel meter]

The Bayesian directed acyclic graph describes the joint probability $P(X_1, X_2, \ldots, X_n)$:

$$P(X) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$

where $Pa(X_i)$ are the parents of node $X_i$.
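Conditional probability table P(FMS | F) for the edge Fuel → Fuel meter standing (FMS):

| | Fuel=yes | Fuel=no |
|---|---|---|
| FMS=full | 0.39 | 0.001 |
| FMS=medium | 0.60 | 0.001 |
| FMS=empty | 0.01 | 0.99 |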
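The reasoning in the car example can be reproduced with a small inference-by-enumeration sketch. Only the P(FMS | F) values come from the slide; the priors for Fuel and the spark plug, the table for Start, and the edge layout (Fuel → Start, Clean spark plug → Start, Fuel → Fuel meter standing) are hypothetical values chosen for illustration. The Fuel=no column as printed sums to 0.992, so the sketch renormalises each column before use.

```python
from itertools import product

# P(FMS | Fuel) as printed on the slide.
P_FMS_GIVEN_FUEL = {
    "yes": {"full": 0.39, "medium": 0.60, "empty": 0.01},
    "no": {"full": 0.001, "medium": 0.001, "empty": 0.99},
}

# Hypothetical numbers, NOT from the slides.
P_FUEL = {"yes": 0.9, "no": 0.1}
P_PLUG = {"clean": 0.8, "dirty": 0.2}
P_START_YES = {  # P(Start=yes | Fuel, Plug)
    ("yes", "clean"): 0.99,
    ("yes", "dirty"): 0.30,
    ("no", "clean"): 0.0,
    ("no", "dirty"): 0.0,
}


def normalise(dist):
    """Rescale a distribution so that its values sum to one."""
    total = sum(dist.values())
    return {key: value / total for key, value in dist.items()}


FMS_CPT = {fuel: normalise(col) for fuel, col in P_FMS_GIVEN_FUEL.items()}


def joint(fuel, plug, start, fms):
    """Joint probability as the product of each node given its parents."""
    p_start = P_START_YES[(fuel, plug)]
    if start == "no":
        p_start = 1.0 - p_start
    return P_FUEL[fuel] * P_PLUG[plug] * p_start * FMS_CPT[fuel][fms]


def prob_fms_empty(start=None):
    """P(FMS=empty), optionally conditioned on the observed Start value."""
    numerator = denominator = 0.0
    states = product(P_FUEL, P_PLUG, ("yes", "no"), ("full", "medium", "empty"))
    for fuel, plug, s, fms in states:
        if start is not None and s != start:
            continue  # inconsistent with the evidence
        p = joint(fuel, plug, s, fms)
        denominator += p
        if fms == "empty":
            numerator += p
    return numerator / denominator


print(prob_fms_empty())      # belief before any evidence
print(prob_fms_empty("no"))  # belief after observing that the car does not start
```

With these illustrative numbers, observing Start=no raises the probability of FMS=empty, which is the direction of the reasoning on the slide. The size of the change depends entirely on the assumed tables.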

Learning the gene network with Bayesian methods

Deals with noisy data

Has a good statistical foundation

Compact and intuitive representation

The number of possible DAGs with 10 nodes is about 4.2 × 10^18

Number of samples << number of features in microarray experiments

Acyclic
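The DAG count quoted above can be checked with Robinson's recurrence for the number of DAGs on n labelled nodes. The slides do not say how the figure was obtained, so the sketch below is only one way to reproduce it; the function name `count_dags` is illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of DAGs on n labelled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (3, 5, 10):
    print(n, count_dags(n))
```

The count grows super-exponentially with the number of nodes, which is why exhaustive search over structures is out of reach for networks with many genes.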

Friedman used a specialized learning method, the Sparse Candidate algorithm (SCA), permuted the dataset to learn 200 networks, and selected some special features from these networks to create a final network.
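The idea of learning many networks and keeping only recurring features can be sketched as an edge-frequency count. This is a minimal illustration under stated assumptions: it draws bootstrap resamples (rows sampled with replacement), which may or may not match the "permuted" datasets mentioned above, and `toy_learner` is a hypothetical stand-in for a real structure learner such as SCA.

```python
import random
from collections import Counter
from itertools import combinations


def toy_learner(rows):
    """Hypothetical stand-in for a structure learner: links two binary
    variables when they agree in at least 90% of the rows."""
    n_vars = len(rows[0])
    edges = set()
    for a, b in combinations(range(n_vars), 2):
        agreement = sum(row[a] == row[b] for row in rows) / len(rows)
        if agreement >= 0.9:
            edges.add((a, b))
    return edges


def edge_confidence(rows, learn=toy_learner, n_networks=200, seed=0):
    """Fraction of resampled datasets in which each edge is learned."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_networks):
        resample = [rng.choice(rows) for _ in rows]  # rows drawn with replacement
        counts.update(learn(resample))
    return {edge: count / n_networks for edge, count in counts.items()}


# Toy data: variables 0 and 1 always agree, variable 2 is unrelated.
rows = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)] * 5
print(edge_confidence(rows))
```

Edges that appear in nearly every resampled network are candidates for the final network; edges that appear rarely are more likely to be artefacts of the small sample.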

Dominant genes

Functionally related pairs

Clusters of dominated genes

Methods: reproducing Page

Data set: 74 myeloma samples and 31 healthy samples (Affymetrix)

Genes selected and discretized on the basis of entropy (information gain)
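Entropy-based selection and discretization can be sketched as a search for the expression threshold with the highest information gain. The slides do not specify the exact procedure, so the binary split below is an assumption, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, information gain) of the best binary split."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


# Made-up expression values for one gene, three samples per class.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["healthy"] * 3 + ["myeloma"] * 3
print(best_split(values, labels))
```

Ranking genes by their best gain gives a selection criterion, and the winning threshold turns each selected gene into a two-state variable.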

The learned "Markov blanket" used to classify examples is a naïve Bayesian classifier

100% score

Only 15 out of 30 genes needed
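A naïve Bayesian classifier over discretized genes can be sketched in a few lines. This is a generic version with Laplace smoothing (the `alpha` parameter is an assumption), not the software used for the result above, and the training rows are made up.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels, alpha=1.0):
    """rows: tuples of discrete gene states; labels: one class per row."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (gene index, class) -> state counts
    states = defaultdict(set)            # gene index -> states seen in training
    for row, label in zip(rows, labels):
        for gene, state in enumerate(row):
            value_counts[(gene, label)][state] += 1
            states[gene].add(state)
    return class_counts, value_counts, states, alpha


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    class_counts, value_counts, states, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = log(n / total)
        for gene, state in enumerate(row):
            count = value_counts[(gene, label)][state]
            score += log((count + alpha) / (n + alpha * len(states[gene])))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data with two discretized genes.
rows = [("high", "low"), ("high", "low"), ("low", "high"), ("low", "high")]
labels = ["myeloma", "myeloma", "healthy", "healthy"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "low")))
```

Each gene contributes independently given the class, which corresponds to a network in which the class node is the only parent of every gene.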

Problem: we compare ill vs. healthy samples, which differ greatly

My results: van 't Veer experiment

The 70 metastasis-predicting genes that van 't Veer found in breast cancer samples are used to learn a network

Two networks are learned:

Markov blanket to classify: only 16 of the 70 genes are needed, and it scores 95% correct (van 't Veer scores 84%!)

PDAG: "interesting" global network, but its significance is not clear.
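For reference, the Markov blanket of a node in a DAG consists of its parents, its children, and the other parents of its children. The sketch below extracts it from a parent map; the node names are hypothetical and do not come from the van 't Veer data.

```python
def markov_blanket(parents, target):
    """parents maps every node of a DAG to the set of its parents."""
    children = {node for node, pa in parents.items() if target in pa}
    spouses = set()
    for child in children:
        spouses |= parents[child]  # other parents of the target's children
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)
    return blanket


# Hypothetical network: the class node plus four genes.
parents = {
    "class": set(),
    "g1": {"class"},
    "g2": {"class", "g3"},
    "g3": set(),
    "g4": {"g1"},
}
print(markov_blanket(parents, "class"))
```

Given the states of the nodes in its blanket, the class node is independent of every other node, which is why a classifier only needs the genes inside the blanket.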

[Figure: Markov blanket, van 't Veer data]

[Figure: PDAG, van 't Veer data]

Further plans

Use other Bayesian network learners and try to discover the significance and robustness of the resulting networks

Discretization methods have a large influence on the resulting network: try different methods

Gene selection method: use prior knowledge to select a group of genes (pathways)
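The point about discretization can be illustrated by comparing two common schemes, equal-width and equal-frequency binning, on the same made-up expression values. Neither scheme is named in the slides; they are chosen here only as examples of methods that could be tried.

```python
def equal_width_bins(values, n_bins=3):
    """Split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins=3):
    """Assign bins by rank so that each bin holds about the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


# Made-up expression values with one outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
print(equal_width_bins(values))
print(equal_frequency_bins(values))
```

A single outlier pushes almost every value into the lowest equal-width bin, while equal-frequency binning spreads the same values over all three states, so a network learner would see quite different data in the two cases.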

Conclusion

Experiment for a few more months!