Learning Bayesian Networks with Microarray Data


Nov 7, 2013



Goal: use well-known Bayesian network learning algorithms to analyze microarray data

The challenge for microarray data analysis techniques


Prior techniques (clustering, PCA, SVM):
Group together genes with similar expression patterns
Do not reveal structural relations between genes

The challenge:
Extract meaningful information from the expression data
Discover interactions between genes based on the measurements

Use of Bayesian networks in microarray data analysis

[Figure: expression profiles are used to construct a Bayesian network, which supports gene-gene relation analysis (activation or inhibition), sample classification (disease diagnosis), and gene regulatory network analysis (a global view of the relations among genes)]

Bayesian networks: a short example

Evidence: my car does not start.

Reasoning: an empty tank and dirty spark plugs now become more probable, so the probability that the fuel meter stands at empty also increases.

[Figure: network with the nodes Fuel, Clean spark plug, Start and Fuel meter]

Bayesian networks: a short example

The directed acyclic graph of a Bayesian network describes the joint probability distribution $P(X_1, X_2, \dots, X_n)$:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$

where $\mathrm{Pa}(X_i)$ are the parents of node $X_i$.

Conditional probability table P(FMS | F) for the edge Fuel → Fuel meter standing (FMS):

              Fuel=yes   Fuel=no
FMS=full        0.39      0.001
FMS=medium      0.60      0.001
FMS=empty       0.01      0.99
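The reasoning in the car example can be reproduced by enumerating the factorized joint distribution. This is a minimal Python sketch: only P(FMS | F) comes from the table; the priors for Fuel and Clean spark plug, the table for Start, and the edges Fuel → Start and Clean spark plug → Start are hypothetical assumptions chosen for illustration. The code normalizes over the evidence, so the Fuel=no column (which sums to 0.992 on the slide) is used as given.

```python
from itertools import product

# Hypothetical priors (assumptions, not on the slides).
P_FUEL = {"yes": 0.98, "no": 0.02}
P_CLEAN = {"yes": 0.96, "no": 0.04}

# P(FMS | Fuel), copied from the slide's table.
P_FMS = {
    "yes": {"full": 0.39, "medium": 0.60, "empty": 0.01},
    "no": {"full": 0.001, "medium": 0.001, "empty": 0.99},
}

# Hypothetical P(Start = yes | Fuel, Clean spark plug).
P_START_YES = {
    ("yes", "yes"): 0.99,
    ("yes", "no"): 0.01,
    ("no", "yes"): 0.0,
    ("no", "no"): 0.0,
}


def joint(fuel, clean, fms, start):
    """P(Fuel, Clean, FMS, Start) as the product of P(X_i | Pa(X_i))."""
    p_yes = P_START_YES[(fuel, clean)]
    p_start = p_yes if start == "yes" else 1.0 - p_yes
    return P_FUEL[fuel] * P_CLEAN[clean] * P_FMS[fuel][fms] * p_start


def prob_fms(value, start=None):
    """P(FMS = value | Start = start), by enumerating every joint assignment."""
    num = den = 0.0
    for fuel, clean, fms, s in product(P_FUEL, P_CLEAN, P_FMS["yes"], ("yes", "no")):
        if start is not None and s != start:
            continue  # assignment contradicts the evidence
        p = joint(fuel, clean, fms, s)
        den += p
        if fms == value:
            num += p
    return num / den  # normalize over the assignments consistent with the evidence


if __name__ == "__main__":
    print(f"P(FMS=empty)            = {prob_fms('empty'):.3f}")
    print(f"P(FMS=empty | Start=no) = {prob_fms('empty', start='no'):.3f}")
```

With these assumed numbers, observing Start=no raises P(FMS=empty) from about 0.03 to about 0.30, which is the effect described in the reasoning above.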

Learning the gene network with Bayesian methods

Advantages:
Deals with noisy data
Has a good statistical foundation
Compact and intuitive representation

Challenges:
The number of possible DAGs with 10 nodes is about 4.2 × 10^18
Far fewer samples than features (genes) in microarray experiments
The graph must be acyclic, so feedback loops cannot be represented directly
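The 4.2 × 10^18 figure can be checked with Robinson's recurrence for the number of labeled DAGs on n nodes, $a_n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a_{n-k}$ with $a_0 = 1$. A short Python sketch (the function name is ours):

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, count_dags(n))
```

count_dags(10) returns 4175098976430598143, about 4.2 × 10^18. The count grows super-exponentially with n, which is why structure learning relies on heuristic search rather than enumerating every DAG.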


Already achieved using advanced algorithms

Friedman used a specialized learning method, the Sparse Candidate Algorithm (SCA), learned 200 networks from bootstrap resamples of the dataset, and combined the features that recur across these networks into a final network. These features revealed:
Dominant genes
Functionally related pairs
Clusters of dominated genes
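The resampling idea can be sketched generically: learn a network on each of many bootstrap resamples of the samples and report, for each feature, the fraction of networks that contain it. In this Python sketch the learner `learn_edges` is a hypothetical stand-in (an undirected edge wherever the absolute Pearson correlation exceeds 0.5), not the Sparse Candidate Algorithm, and the toy data are made up; only the bootstrap-and-count structure is the point.

```python
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def learn_edges(samples, genes, threshold=0.5):
    """Hypothetical stand-in learner: edge (a, b) if |corr(a, b)| > threshold."""
    cols = {g: [s[g] for s in samples] for g in genes}
    return {(a, b) for a, b in combinations(genes, 2)
            if abs(pearson(cols[a], cols[b])) > threshold}


def bootstrap_confidence(samples, genes, n_boot=200, seed=0):
    """Fraction of bootstrap networks that contain each edge."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        resample = [rng.choice(samples) for _ in samples]  # draw with replacement
        counts.update(learn_edges(resample, genes))
    return {edge: c / n_boot for edge, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    data = []
    for _ in range(40):
        a = rng.gauss(0, 1)
        # Made-up toy data: gene B tracks gene A, gene C is independent noise.
        data.append({"A": a, "B": a + rng.gauss(0, 0.3), "C": rng.gauss(0, 1)})
    for edge, c in sorted(bootstrap_confidence(data, ["A", "B", "C"]).items()):
        print(edge, round(c, 2))
```

Features that appear in nearly all resampled networks (here the A-B edge) are treated as reliable; features that appear only occasionally are likely artifacts of a particular sample.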

My results using less advanced methods: reproducing Page

Data set: 74 myeloma samples and 31 healthy samples (Affymetrix)
Genes selected and discretized on the basis of entropy (information gain)
The Markov blanket learned to classify the examples is a naïve Bayes network
Scores 100% correct
Only 15 of the 30 genes are needed
The problem is that we compare ill vs. healthy samples, which differ greatly
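The slides do not say exactly which entropy-based procedure was used. As one possible sketch, each gene can be given the single binary cut point that maximizes information gain about the class label (myeloma vs. healthy), and genes can then be ranked by that gain. The function names and toy values are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_cut(values: Sequence[float], labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Binary cut point of one gene's values with the highest information gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best: Tuple[Optional[float], float] = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - remainder
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)  # midpoint threshold
    return best


def select_genes(expression: Dict[str, List[float]], labels: Sequence[str], k: int) -> List[str]:
    """Keep the k genes whose best cut gives the highest information gain."""
    gains = {gene: best_cut(vals, labels)[1] for gene, vals in expression.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]
```

For example, best_cut([1, 2, 3, 8, 9, 10], ["healthy"] * 3 + ["myeloma"] * 3) returns (5.5, 1.0): the cut separates the classes perfectly, so it removes the full 1 bit of class entropy.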

My results: van ’t Veer experiment

The 70 metastasis-predicting genes that van ’t Veer found in breast cancer samples are used to learn a network.
Two networks are learned:
Markov blanket for classification: contains only 16 of the 70 genes and scores 95% correct (van ’t Veer scores 84%!)
PDAG: an ‘interesting’ global network, but its significance is not clear.
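A node's Markov blanket consists of its parents, its children, and its children's other parents; given the blanket, the node is independent of every other variable, which is why the blanket of the class node is enough for classification. A minimal Python sketch over a DAG stored as a hypothetical {node: set of parents} dictionary:

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Parents, children, and children's other parents of `node`."""
    children = {child for child, ps in parents.items() if node in ps}
    blanket = set(parents.get(node, set())) | children
    for child in children:
        blanket |= parents[child]  # co-parents of each child
    blanket.discard(node)  # the node itself is not part of its blanket
    return blanket


if __name__ == "__main__":
    # Hypothetical DAG: class -> g1, class -> g2 <- g3, g4 -> g3.
    dag = {
        "class": set(),
        "g1": {"class"},
        "g2": {"class", "g3"},
        "g3": {"g4"},
        "g4": set(),
    }
    print(sorted(markov_blanket(dag, "class")))  # ['g1', 'g2', 'g3']
```

If every gene in the blanket is a child of the class node and has no other parents, the blanket is exactly a naive Bayes structure.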

[Figure: Markov blanket, van ’t Veer data]

[Figure: PDAG, van ’t Veer data]
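A learner that outputs a PDAG usually represents a Markov equivalence class: several DAGs can encode exactly the same conditional independencies, and edges whose direction differs among them are left undirected. By a theorem of Verma and Pearl, two DAGs are Markov equivalent if and only if they have the same skeleton and the same v-structures. The Python sketch below checks this criterion, using the same hypothetical {node: set of parents} representation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Dag = Dict[str, Set[str]]  # node -> set of parents


def skeleton(dag: Dag) -> Set[FrozenSet[str]]:
    """Undirected version of every edge parent -> child."""
    return {frozenset((p, child)) for child, ps in dag.items() for p in ps}


def v_structures(dag: Dag) -> Set[Tuple[FrozenSet[str], str]]:
    """Colliders a -> c <- b whose parents a and b are not adjacent."""
    skel = skeleton(dag)
    return {
        (frozenset((a, b)), child)
        for child, ps in dag.items()
        for a, b in combinations(sorted(ps), 2)
        if frozenset((a, b)) not in skel
    }


def markov_equivalent(g1: Dag, g2: Dag) -> bool:
    """True iff the DAGs share skeleton and v-structures (Verma and Pearl)."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


if __name__ == "__main__":
    chain = {"A": set(), "B": {"A"}, "C": {"B"}}           # A -> B -> C
    reversed_chain = {"A": {"B"}, "B": {"C"}, "C": set()}  # A <- B <- C
    collider = {"A": set(), "B": {"A", "C"}, "C": set()}   # A -> B <- C
    print(markov_equivalent(chain, reversed_chain))  # True
    print(markov_equivalent(chain, collider))        # False
```

Because the chain and its reverse cannot be told apart from data, a PDAG would show A - B - C undirected, while the collider keeps both arrows directed.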

Further plans



Use other Bayesian network learners and try to assess the significance and robustness of the resulting networks
Discretization methods have a large influence on the resulting network: try different methods
Gene selection method: use prior knowledge to select a group of genes (pathways)
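To illustrate why the discretization choice matters, here are two common unsupervised methods (not named on the slides), equal-width and equal-frequency binning, in a short Python sketch; applied to the same skewed expression values they assign different states.

```python
from typing import List


def equal_width(values: List[float], bins: int) -> List[int]:
    """Split the observed range into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0] * len(values)  # constant gene: a single state
    # min() keeps the maximum value in the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def equal_frequency(values: List[float], bins: int) -> List[int]:
    """Give each bin (roughly) the same number of samples, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    states = [0] * len(values)
    for rank, i in enumerate(order):
        states[i] = rank * bins // len(values)
    return states


if __name__ == "__main__":
    expr = [0.1, 0.2, 0.3, 0.4, 5.0, 10.0]
    print(equal_width(expr, 3))      # [0, 0, 0, 0, 1, 2]
    print(equal_frequency(expr, 3))  # [0, 0, 1, 1, 2, 2]
```

Equal-width binning puts four of the six samples in the lowest state, while equal-frequency binning spreads them evenly, so a structure learner could find different dependencies depending on which method is used.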

Conclusion

Experiment for a few more months!