Boosting Neural Networks


By Holger Schwenk and Yoshua Bengio

Neural Computation, 12(8):1869-1887, 2000.


Presented by Yong Li


Outline

1. Introduction
2. AdaBoost
3. Three versions of AdaBoost for neural networks
4. Results
5. Conclusions
6. Discussions



Introduction


Boosting: a general method to improve the performance of a learning algorithm.

AdaBoost is a relatively recent boosting algorithm.

There are many empirical studies of AdaBoost using decision trees as base classifiers (Breiman, 1996; Drucker and Cortes, 1996; and others).

There is also some theoretical understanding (Schapire et al., 1997; Breiman, 1998; Schapire, 1999).


Introduction


But applications had all been to decision trees; at that time there were no applications to multi-layer artificial neural networks.

The questions this paper tries to answer:

Does AdaBoost work as well for neural networks as for decision trees?

Does it behave in a similar way?

And more?


AdaBoost (Adaptive Boosting)


It is often possible to increase the accuracy of a classifier by averaging the decisions of an ensemble of classifiers.

Two popular ensemble methods are Bagging and Boosting.

Bagging improves generalization performance due to a reduction in variance while maintaining or only slightly increasing bias.

AdaBoost constructs a composite classifier by sequentially training classifiers while putting more and more emphasis on certain patterns.
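As a rough illustration of the difference, here is a minimal sketch of how the two methods combine member decisions (array shapes and function names are assumptions for illustration, not from the paper): Bagging averages the members' votes with equal weight, while boosting weights each member by a confidence coefficient.

```python
import numpy as np

def bagging_predict(class_scores):
    # class_scores: (T, m, k) scores from T classifiers for m examples and k classes.
    # Bagging combines members with equal weight: a plain average (or majority vote).
    return class_scores.mean(axis=0).argmax(axis=1)

def boosting_predict(class_scores, alphas):
    # Boosting combines members with weights alpha_t (e.g. log(1/beta_t) in AdaBoost),
    # so more accurate classifiers get a larger say in the final decision.
    weighted = np.tensordot(alphas, class_scores, axes=1)   # (m, k)
    return weighted.argmax(axis=1)
```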

AdaBoost


AdaBoost.M2 is used in the experiments.
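The algorithm itself is not reproduced on the slide; as a hedged sketch of one AdaBoost.M2 round from Freund and Schapire, the code below computes the pseudo-loss and updates the distribution over example-label pairs. The array layout is an assumption of this sketch, not the paper's code.

```python
import numpy as np

def adaboost_m2_round(D, h, y):
    """One AdaBoost.M2 round (sketch).

    D : (m, k) distribution over mislabel pairs (i, y); D[i, y_i] == 0
    h : (m, k) weak hypothesis outputs h_t(x_i, y) in [0, 1]
    y : (m,)   correct class indices
    Returns (D_next, alpha) where alpha = log(1 / beta_t) is the vote weight.
    """
    m, k = D.shape
    idx = np.arange(m)
    h_true = h[idx, y][:, None]                      # h_t(x_i, y_i)
    # Pseudo-loss: eps_t = 1/2 * sum_{i, y != y_i} D(i, y) * (1 - h_t(x_i, y_i) + h_t(x_i, y))
    eps = 0.5 * np.sum(D * (1.0 - h_true + h))
    beta = eps / (1.0 - eps)
    # Down-weight the pairs the hypothesis already separates well
    D_next = D * beta ** (0.5 * (1.0 + h_true - h))
    D_next[idx, y] = 0.0                             # keep the correct label excluded
    D_next /= D_next.sum()
    return D_next, np.log(1.0 / beta)
```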



Applying AdaBoost to neural networks


Three versions of AdaBoost are compared in this paper (see the sketch below):

(R) Training the t-th classifier with a fixed training set, resampled once according to the boosting weights.

(E) Training the t-th classifier using a different resampled training set at each epoch.

(W) Training the t-th classifier by directly weighting the cost function of the t-th neural network.
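A minimal sketch of how the three variants could pass the boosting distribution D_t to a network trainer follows; the train_network callable and its keyword arguments (warm_start, epochs, sample_weight) are placeholders assumed for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def variant_R(X, y, D, train_network, n_samples):
    # (R): resample one fixed training set according to D_t, then train normally.
    idx = rng.choice(len(X), size=n_samples, p=D)
    return train_network(X[idx], y[idx])

def variant_E(X, y, D, train_network, n_samples, n_epochs):
    # (E): draw a fresh weighted sample before every training epoch.
    net = None
    for _ in range(n_epochs):
        idx = rng.choice(len(X), size=n_samples, p=D)
        net = train_network(X[idx], y[idx], warm_start=net, epochs=1)
    return net

def variant_W(X, y, D, train_network):
    # (W): no resampling; multiply each example's term in the cost function by its weight.
    return train_network(X, y, sample_weight=D * len(X))
```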

Results


Experiments are performed on three data sets.

The online data set collected at Paris 6 University: 22 attributes (values in [-1, 1]^22), 10 classes; 1200 examples for learning and 830 for testing.

UCI Letter: 16 attributes, 26 classes; 16000 examples for training and 4000 for testing.

Satimage: 36 attributes, 6 classes; 4435 examples for training and 2000 for testing.



Results of online data


Some conclusions


Boosting is better than Bagging.

AdaBoost is less useful for very big networks.

The (E) and (W) versions are better than (R).

Results of online data


The generalization error continues to decrease after the training error reaches zero.


Results of online data

The number of examples with a high margin increases as more classifiers are combined by boosting.

Note: there are opposing results about the margin cumulative distribution.
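For reference, the margin of an example is the weighted vote for the correct class minus the largest weighted vote for any other class, normalized to [-1, 1]. A minimal sketch of computing margins and their cumulative distribution, with array shapes assumed for illustration:

```python
import numpy as np

def margins(alphas, class_scores, y):
    # alphas       : (T,)      vote weights of the T boosted classifiers
    # class_scores : (T, m, k) each classifier's votes (e.g. one-hot predictions)
    # y            : (m,)      correct class indices
    w = alphas / alphas.sum()
    votes = np.tensordot(w, class_scores, axes=1)     # (m, k) normalized weighted votes
    idx = np.arange(len(y))
    true_vote = votes[idx, y].copy()
    votes[idx, y] = -np.inf                           # exclude the correct class from the max
    return true_vote - votes.max(axis=1)              # margin in [-1, 1]

# Cumulative distribution at thresholds theta: fraction of examples with margin <= theta
# thetas = np.linspace(-1.0, 1.0, 201)
# cdf = (margins(alphas, scores, y)[None, :] <= thetas[:, None]).mean(axis=1)
```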

Results of online data

Bagging has no significant influence on the margin distribution.


The results for the UCI Letter and Satimage data sets


Only the (E) and (W) versions are applied; they obtain essentially the same results.

The same conclusions are drawn as for the online data. (Some results are omitted.)

Conclusion


AdaBoost can significantly improve neural classifiers.

Does AdaBoost work as well for neural networks as for decision trees?
Answer: Yes.

Does it behave in a similar way?
Answer: Yes.

Overfitting: still there.

Other questions: short answers.




Discussions


The paper empirically shows that AdaBoost works well for neural networks.

The algorithm description is misleading: the roles of the per-example weights D_t(i) and the weights over example-label pairs D_t(i, y) are easy to confuse.