Classification for High
Dimensional Problems Using
Bayesian Neural Networks
and Dirichlet Diffusion Trees

Radford M. Neal and Jianguo Zhang

Winners of the NIPS 2003 feature selection challenge

University of Toronto

The results


Combination of Bayesian
neural networks and
classification based on Bayesian
clustering with a Dirichlet
diffusion tree model.


A Dirichlet diffusion tree
method is used for Arcene.


Bayesian neural networks (as in BayesNN-large) are used for Gisette, Dexter, and Dorothea.


For Madelon, the class
probabilities from a Bayesian
neural network and from a
Dirichlet diffusion tree method
are averaged, then thresholded
to produce predictions.
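The Madelon combination above is a simple probability average followed by a threshold. A minimal sketch, with made-up probabilities and assuming a 0.5 threshold (the slides only say "thresholded"):

```python
import numpy as np

# Hypothetical class probabilities from the two models (illustrative numbers)
p_nn  = np.array([0.9, 0.4, 0.2])   # Bayesian neural network
p_dft = np.array([0.7, 0.8, 0.1])   # Dirichlet diffusion tree method

# Average the two models' probabilities, then threshold to get 0/1 predictions
pred = ((p_nn + p_dft) / 2 > 0.5).astype(int)
```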

Their General Approach


Use simple techniques to reduce the
computational difficulty of the problem,
then apply more sophisticated
Bayesian methods.


The simple techniques: PCA and feature
selection by significance tests.


Bayesian neural networks.


Automatic Relevance Determination.

(I) First level feature
reduction

Feature selection using
significance tests (first level)


An initial feature subset was found by simple univariate significance tests (correlation coefficient, symmetrical uncertainty).


Assumption: Relevant variables will be at
least somewhat relevant on their own.


For all tests, a p-value was found by comparing the observed statistic to its distribution under random permutations of the class labels.
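The permutation-test screening described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code, using the absolute correlation coefficient as the test statistic (they also used symmetrical uncertainty):

```python
import numpy as np

def permutation_p_values(X, y, n_perm=1000, seed=0):
    """For each feature, compare |corr(x_j, y)| to its null distribution
    obtained by randomly permuting the class labels."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    observed = np.abs(Xc.T @ yc) / denom          # |correlation| per feature
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        yp = rng.permutation(yc)                  # break any real association
        exceed += (np.abs(Xc.T @ yp) / denom) >= observed
    return (exceed + 1) / (n_perm + 1)            # permutation p-values

# Toy data: only feature 0 is actually relevant to the labels
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(float)
p = permutation_p_values(X, y)
selected = np.where(p < 0.05)[0]                  # the initial feature subset
```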

Dimensionality reduction with
PCA (an alternative to FS)


There are probably better dimensionality reduction methods than PCA, but that's what we used. One reason is that it's feasible even when p is huge, provided n is not too large: the time required is of order min(pn², np²).



PCA was done using all the data (training,
validation, and test).
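The min(pn², np²) cost comes from working with whichever Gram matrix is smaller. A sketch of the p >> n case, where one eigendecomposes the n×n matrix XXᵀ instead of the p×p covariance (an illustration of the trick, not the authors' code):

```python
import numpy as np

def pca_scores(X, k):
    """PCA scores when p >> n: eigendecompose the small n x n Gram matrix
    X Xᵀ (cost ~ n²p) rather than the huge p x p covariance (cost ~ p²n)."""
    Xc = X - X.mean(axis=0)
    G = Xc @ Xc.T                          # n x n, cheap even for huge p
    w, V = np.linalg.eigh(G)               # eigenvalues ascending
    idx = np.argsort(w)[::-1][:k]          # top-k components
    # For centered X = U S Vᵀ, G = U S² Uᵀ, so scores X V = U S:
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# n = 30 cases, p = 5000 features (the regime where this trick pays off)
X = np.random.default_rng(0).normal(size=(30, 5000))
Z = pca_scores(X, 5)                       # 30 cases projected to 5 PCs
```

Note the slides' point that PCA was run on all the data (training, validation, and test): the unlabeled test inputs can be used here because no class labels are involved.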

(II) Building the learning model &
second-level feature selection

Bayesian Neural Networks

Conventional neural network
learning

Bayesian Neural Network
Learning


Based on the statistical interpretation of conventional neural network learning

Bayesian Neural Network
Learning


Bayesian predictions are found by integration rather than maximization. For a test case x, y is predicted from the posterior predictive distribution:

p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ


Conventional neural network learning considers only the parameters with maximum posterior probability.


Bayesian neural networks consider all possible parameters in the parameter space.


Can be implemented by Gaussian approximation or by MCMC.
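With MCMC, the predictive integral is approximated by averaging over posterior samples of the parameters. A minimal sketch for a logistic output unit, assuming the posterior draws are already available (not the authors' network code):

```python
import numpy as np

def predictive_prob(x, weight_samples):
    """Monte Carlo estimate of p(y=1 | x, D): average p(y=1 | x, θ_s)
    over posterior draws θ_s, rather than using one 'optimal' θ."""
    logits = weight_samples @ x                    # one logit per draw
    return float(np.mean(1.0 / (1.0 + np.exp(-logits))))

# Pretend MCMC produced 500 posterior draws of a 3-weight model,
# scattered around a posterior mode (illustrative numbers)
rng = np.random.default_rng(0)
samples = rng.normal(loc=[2.0, -1.0, 0.5], scale=0.3, size=(500, 3))
prob = predictive_prob(np.array([1.0, 0.0, 0.0]), samples)
```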

ARD Prior


Still remember weight decay?






How? (by optimizing the decay parameters)


Associate the weights from each input with their own decay parameter.


There are theories for optimizing the decays.



Result.


If an input feature x is irrelevant, its relevance hyperparameter β = 1/a will tend to be small, forcing the weights from that input to be near zero.
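One common way to realize this shrinkage is a conjugate update of each input's precision hyperparameter. The following is an ARD-style sketch under assumed Gamma hyperprior parameters, not the authors' exact sampling scheme:

```python
import numpy as np

def ard_precision_mean(W, a0=0.01, b0=0.01):
    """ARD sketch: the k outgoing weights of input i (row W[i]) share a
    Gaussian prior with precision alpha_i.  With a Gamma(a0, b0) hyperprior,
    the conditional posterior of alpha_i given the weights is
    Gamma(a0 + k/2, b0 + ||W[i]||^2 / 2); return its mean.  Small weights
    give a large precision, i.e. a small variance 1/alpha, which shrinks
    an irrelevant input's weights still further."""
    k = W.shape[1]
    return (a0 + k / 2) / (b0 + 0.5 * np.sum(W**2, axis=1))

# Input 0 has large weights (relevant); input 1 has tiny weights (irrelevant)
W = np.array([[1.20, -0.80, 0.90],
              [0.01,  0.02, -0.01]])
alpha = ard_precision_mean(W)   # the irrelevant input gets the larger precision
```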

Some Strong Points of This
Algorithm


Bayesian learning integrates over the posterior distribution of the network parameters, rather than picking a single "optimal" set of parameters. This further helps to avoid overfitting.


ARD can be used to adjust the relevance of
input features


We can use priors to incorporate external knowledge.

Dirichlet Diffusion Trees


A Bayesian hierarchical clustering method


The methods


BayesNN-small: features selected using significance tests.


BayesNN-large: principal components as inputs.


BayesNN-DFT-combo:


the class probabilities from a Bayesian neural
network and from a Dirichlet diffusion tree method
are averaged, then thresholded to produce
predictions.


About the datasets

The results


http://www.nipsfsc.ecs.soton.ac.uk/

Thanks.

Any Questions?