
# Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees

Radford M. Neal and Jianguo Zhang

Winners of the NIPS 2003 feature selection challenge

University of Toronto

The results

A combination of Bayesian neural networks and classification based on Bayesian clustering with a Dirichlet diffusion tree model.

A Dirichlet diffusion tree
method is used for Arcene.

Bayesian neural networks (as in BayesNN-large) are used for Gisette, Dexter, and Dorothea.

For Madelon, probabilities from a Bayesian neural network and from a Dirichlet diffusion tree method are averaged, then thresholded to produce predictions.

Their General Approach

Use simple techniques to reduce the
computational difficulty of the problem,
then apply more sophisticated
Bayesian methods.

The simple techniques: PCA, and feature selection by significance tests.

The sophisticated methods: Bayesian neural networks with Automatic Relevance Determination (ARD).

(I) First-level feature reduction

Feature selection using
significance tests (first level)

An initial feature subset was found by simple univariate significance tests (correlation coefficient, symmetrical uncertainty).

Assumption: Relevant variables will be at
least somewhat relevant on their own.

For all tests, a p-value was found by comparing the test statistic to the distribution obtained when permuting the class labels.
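The permutation idea can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the synthetic data, the number of permutations, and the use of the absolute correlation coefficient as the test statistic are all assumptions for the example.

```python
import numpy as np

def permutation_p_value(x, y, n_perms=1000, rng=None):
    """p-value for the relevance of feature x to binary labels y.

    Test statistic: absolute Pearson correlation. The null distribution
    comes from recomputing the statistic under permuted class labels.
    """
    rng = np.random.default_rng(rng)
    observed = abs(np.corrcoef(x, y)[0, 1])
    null_stats = np.empty(n_perms)
    for i in range(n_perms):
        null_stats[i] = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_perms)

# Tiny synthetic example: one relevant and one irrelevant feature.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
relevant = y + 0.5 * rng.standard_normal(100)
irrelevant = rng.standard_normal(100)
p_rel = permutation_p_value(relevant, y, rng=1)
p_irr = permutation_p_value(irrelevant, y, rng=2)
```

The same permutation machinery works for any univariate statistic (e.g. symmetrical uncertainty): only the `observed`/`null_stats` computation changes.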

Dimensionality reduction with PCA (an alternative to feature selection)

There are probably better dimensionality reduction methods than PCA, but that's what we used. One reason is that it's feasible even when p is huge, provided n is not too large: the time required is of order min(pn², np²).

PCA was done using all the data (training,
validation, and test).
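The min(pn², np²) cost comes from working with whichever of the two moment matrices is smaller: the n×n Gram matrix when p ≫ n, or the p×p covariance otherwise. A sketch of that trick (the function name and test data are illustrative):

```python
import numpy as np

def pca_scores(X, k):
    """Project the rows of X onto its top-k principal components.

    When p >> n, eigendecompose the n x n Gram matrix X X^T instead of
    the p x p covariance matrix; this is what keeps PCA feasible for
    huge p, giving cost of order min(p*n^2, n*p^2).
    """
    Xc = X - X.mean(axis=0)             # center each feature
    n, p = Xc.shape
    if p > n:
        G = Xc @ Xc.T                   # n x n Gram matrix
        vals, vecs = np.linalg.eigh(G)  # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:k]
        # If Xc = U S V^T, then G = U S^2 U^T, and the PCA scores
        # Xc V = U S are the eigenvectors of G scaled by sqrt(eigenvalue).
        return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
    cov = Xc.T @ Xc                     # p x p route (only when p <= n)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    return Xc @ vecs[:, order]

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 500))      # n=30 cases, p=500 features
Z = pca_scores(X, k=5)
```

Note that, as the slide says, such a projection can be computed from all available cases (training, validation, and test), since it never looks at the labels.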

(II) Building the learning model & second-level feature selection

Bayesian Neural Networks

Conventional neural network
learning

Bayesian Neural Network Learning

Based on the statistical interpretation of conventional neural network learning.

Bayesian predictions are found by integration rather than maximization. For a test case x, y is predicted from the posterior predictive distribution:

p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ

A conventional neural network considers only the parameters with maximum posterior probability; a Bayesian neural network considers all possible parameters in the parameter space, weighted by their posterior probability.

This can be implemented by a Gaussian approximation to the posterior, or by Markov chain Monte Carlo (MCMC).
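A minimal sketch of the integration-by-MCMC idea, using Bayesian logistic regression (a zero-hidden-layer "network") and random-walk Metropolis rather than the hybrid Monte Carlo the authors actually used; every name and setting here is illustrative, not theirs:

```python
import numpy as np

def log_posterior(w, X, y, prior_var=10.0):
    """Log posterior of weights: Gaussian prior + Bernoulli likelihood."""
    logits = X @ w
    ll = np.sum(y * logits - np.logaddexp(0.0, logits))  # log p(y|X,w)
    lp = -0.5 * np.sum(w ** 2) / prior_var               # log prior
    return ll + lp

def metropolis(X, y, n_samples=2000, step=0.2, rng=None):
    """Random-walk Metropolis sampling from the weight posterior."""
    rng = np.random.default_rng(rng)
    w = np.zeros(X.shape[1])
    cur = log_posterior(w, X, y)
    samples = []
    for _ in range(n_samples):
        prop = w + step * rng.standard_normal(w.shape)
        cand = log_posterior(prop, X, y)
        if np.log(rng.random()) < cand - cur:   # accept/reject
            w, cur = prop, cand
        samples.append(w.copy())
    return np.array(samples[n_samples // 2:])   # drop burn-in half

def predict_proba(samples, x):
    """Bayesian prediction: average p(y=1|x,w) over posterior samples,
    i.e. a Monte Carlo estimate of the predictive integral."""
    return float(np.mean(1.0 / (1.0 + np.exp(-samples @ x))))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([2.0, -1.0, 0.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
samples = metropolis(X, y, rng=1)
p = predict_proba(samples, np.array([1.0, 0.0, 0.0]))
```

The key point is the last function: the prediction is an average over many plausible parameter vectors, not the output of one "optimal" vector.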

ARD Prior

Still remember weight decay?

How? (by optimizing the decay parameters)

Associate the weights from each input with their own decay parameter.

There are theories for optimizing the decays.

Result:

If an input feature x is irrelevant, its relevance hyperparameter β = 1/α will tend to be small, forcing the weights from that input to be near zero.

Some Strong Points of This
Algorithm

Bayesian learning integrates over the posterior distribution of the network parameters, rather than picking a single "optimal" set of parameters. This further helps to avoid overfitting.

ARD can be used to adjust the relevance of input features.

We can use priors to incorporate external knowledge.

Dirichlet Diffusion Trees

A Bayesian hierarchical clustering method.
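To give a feel for the generative process, here is a sketch of how the first two 1-D data points arise under a Dirichlet diffusion tree prior: the first point follows a Brownian motion over t ∈ [0, 1]; the second follows the first's path and diverges at a random time T governed by the divergence function a(t) = c/(1−t) (the standard choice in Neal's formulation), after which it diffuses on its own. The discretization and parameter values are illustrative assumptions.

```python
import numpy as np

def ddt_two_points(c=1.0, n_steps=1000, sigma=1.0, rng=None):
    """Sample the first two 1-D points from a Dirichlet diffusion tree.

    Divergence function a(t) = c / (1 - t); paths are Brownian motions
    discretized into n_steps equal time steps on [0, 1].
    """
    rng = np.random.default_rng(rng)
    dt = 1.0 / n_steps
    # First point: plain Brownian motion started at 0.
    steps1 = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    path1 = np.concatenate([[0.0], np.cumsum(steps1)])
    # With a(t) = c/(1-t), P(no divergence by t) = (1-t)^c,
    # so T = 1 - U^(1/c) for uniform U.
    T = 1.0 - rng.random() ** (1.0 / c)
    k = int(T / dt)                  # step index where divergence occurs
    # Second point: copy the first path up to T, then diffuse independently.
    steps2 = sigma * np.sqrt(dt) * rng.standard_normal(n_steps - k)
    path2 = np.concatenate([path1[:k + 1], path1[k] + np.cumsum(steps2)])
    return path1, path2, T

path1, path2, T = ddt_two_points(rng=0)
```

Later points follow the existing tree, choosing branches in proportion to how many earlier points took them, which is what produces the hierarchical clustering of the leaves.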

The methods

BayesNN-small: features selected using significance tests.

BayesNN-large: principal components.

BayesNN-DFT-combo: the class probabilities from a Bayesian neural network and from a Dirichlet diffusion tree method are averaged, then thresholded to produce predictions.
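The combo rule itself fits in a few lines. A sketch; the 0.5 threshold is the natural choice for a symmetric error criterion, not a value stated on the slide:

```python
import numpy as np

def combo_predict(p_nn, p_ddt, threshold=0.5):
    """Average class probabilities from two models, then threshold."""
    p_avg = (np.asarray(p_nn) + np.asarray(p_ddt)) / 2.0
    return (p_avg >= threshold).astype(int)

preds = combo_predict([0.9, 0.2, 0.6], [0.7, 0.4, 0.3])
# averaged probabilities are [0.8, 0.3, 0.45] -> predictions [1, 0, 0]
```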