Implementation of a Multitask Learner Using Stacking in the WEKA Environment



Chonly Wang
ECE492Q Machine Learning Class Project
Journal Paper
http://www.gogoshen.org/ml
cwang6@ncsu.edu


1. Introduction

Multitask learning uses information from related tasks to help predict a particular target task. Caruana (1997) showed that sharing knowledge through the parallel training of related tasks improves the performance of a learner on a specific target task. The question, then, is how to train a learner to discover the relatedness of tasks and, at the same time, share that knowledge through the training of the related tasks. This is possible through the concept of inductive bias. An inductive bias is anything that causes a learning algorithm to prefer one set of parameters, or hypothesis, over another (Hamers 2003). When the influence of other tasks is used to train a learner, bias is introduced.

The proposed multitask learner uses the stacking meta-learner algorithm. Through the use of different classifiers and parameters at the base level, the learner attempts to find the relatedness of tasks and introduce bias into the meta-level learner, which uses this information to improve performance on the target task. In addition, I will illustrate how this implementation can be integrated into the robust machine learning software workbench, WEKA.


2. Stacking and Multitask Learning


The stacked generalization method (Stacking) of combining multiple models allows for bias through the use of two levels of learning, the base level and the meta level. The base level is comprised of different learning models trained on the same dataset. The predictions from these models are then fed into the meta-level learner. The purpose of the meta-level learner is to discover how to best combine the outputs of the base-level learners to achieve higher performance on predictions.


This implementation of a multitask learning algorithm uses stacking as a means of relating tasks, sharing knowledge, and introducing bias. Each base-level learner is trained on a single related task, and the outputs from these learners are then used as inputs to the meta-level learner.

Similar to the classic stacking method, the entire dataset is not used to train the classifiers directly. Instead, a holdout method is used: some instances are reserved for training the base classifiers, which, once trained on this portion, are tested on the remaining portion of the dataset. The predictions from this test form the training data for the level-1 learner. Once the meta-instances have been created, the base classifiers are rebuilt using the entire dataset.
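
To make the procedure concrete, the following sketch outlines how it could be written against the WEKA Java API (version 3.7 or later). It is illustrative only and is not the actual MultiTask source: the 70/30 split, the class and variable names, and the simplified meta-instance format (one numeric prediction per base classifier) are assumptions made for the example.

import java.util.ArrayList;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class StackingSketch {

    // Build the meta-level training data with a hold-out split, then rebuild the
    // base classifiers on the full dataset (illustrative sketch only).
    public static Instances buildStackedModel(Classifier[] base, Classifier meta,
                                              Instances data) throws Exception {
        Instances copy = new Instances(data);
        copy.randomize(new Random(1));
        int reserved = (int) (copy.numInstances() * 0.7);      // assumed split ratio
        Instances basePart = new Instances(copy, 0, reserved);
        Instances holdout  = new Instances(copy, reserved, copy.numInstances() - reserved);

        // 1. Train each base classifier on the reserved portion.
        for (Classifier c : base) {
            c.buildClassifier(basePart);
        }

        // 2. Create meta-instances: one numeric attribute per base prediction,
        //    plus a copy of the original class attribute as the meta-level class.
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        for (int j = 0; j < base.length; j++) {
            atts.add(new Attribute("base" + j));
        }
        atts.add((Attribute) copy.classAttribute().copy());
        Instances metaData = new Instances("meta", atts, holdout.numInstances());
        metaData.setClassIndex(metaData.numAttributes() - 1);

        for (int i = 0; i < holdout.numInstances(); i++) {
            double[] vals = new double[base.length + 1];
            for (int j = 0; j < base.length; j++) {
                vals[j] = base[j].classifyInstance(holdout.instance(i));
            }
            vals[base.length] = holdout.instance(i).classValue();
            metaData.add(new DenseInstance(1.0, vals));
        }

        // 3. Train the meta-level learner on the meta-instances.
        meta.buildClassifier(metaData);

        // 4. Rebuild the base classifiers using the entire dataset.
        for (Classifier c : base) {
            c.buildClassifier(copy);
        }
        return metaData;
    }
}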


2.1 Base-level Learners

The purpose of the base-level learners in this multitask stacking algorithm is to introduce bias to the meta-learner. This is accomplished by using different parameters on different classifiers and training each on a specific single, related task. A specific example for a tree classifier would be using different levels of pruning as the parameters. For consistency, every classifier and parameter setting specified is used to create a model for each task. The purpose of using different parameters on the classifiers is to create biased learners; in essence, the more bias at the base level, the better the performance of the meta-learner.

Figure 1 shows a graphical view of the base-level learners.

Figure 1 - Base-level Learner


Figure 1 points out an important feature of this algorithm: for every classifier and parameter combination used at the base level, an identical copy of the classifier is created for each task. This allows for equal sharing of information and bias from each task given the classifiers specified.
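
To illustrate how such identical per-task copies could be configured, the sketch below uses WEKA's J48 with several confidence factors. The task names, the parameter values, and the class name BaseLevelSetup are assumptions for the example, not the actual MultiTask code; AbstractClassifier.makeCopies (in recent WEKA versions) is used simply to produce identical copies of a configured classifier.

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;

public class BaseLevelSetup {
    public static void main(String[] args) throws Exception {
        // Hypothetical setup: one identically configured J48 per (parameter, task) pair.
        String[] tasks = { "type", "blood" };                 // assumed task names
        float[] confidence = { 0.1f, 0.01f, 0.001f };         // different pruning levels

        Classifier[][] baseLevel = new Classifier[confidence.length][tasks.length];
        for (int p = 0; p < confidence.length; p++) {
            J48 template = new J48();
            template.setConfidenceFactor(confidence[p]);      // lower value => heavier pruning
            // An identical copy of the configured classifier is made for each task,
            // so every task receives the same bias from this parameter setting.
            Classifier[] copies = AbstractClassifier.makeCopies(template, tasks.length);
            for (int t = 0; t < tasks.length; t++) {
                baseLevel[p][t] = copies[t];
            }
        }
        System.out.println("Created " + (confidence.length * tasks.length) + " base-level learners.");
    }
}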









2.2 Meta-level Learners

Using the outcomes of the base-level learners, the meta-level learner attempts to relate these tasks to the target task. Because the outputs from the base level are highly diversified, owing to high levels of pruning and the use of different classifier models, it is likely that the meta-learner will discover commonalities and the relatedness of the other tasks to the target task.

Figure 2 shows a graphical view of the meta-level learner.



Figure 2 - Meta-level Learner
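
At prediction time the same structure is used in the other direction: the base-level predictions for an instance are assembled into a meta-instance, which the meta-level learner then classifies. A minimal sketch of that step, again with assumed names and a simplified meta-instance format rather than the actual MultiTask code, is:

import weka.classifiers.Classifier;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class MetaPrediction {
    // Classify one instance by feeding the base-level predictions to the meta learner.
    // 'metaHeader' is assumed to be the Instances header built when the meta learner
    // was trained (one attribute per base prediction plus the class attribute).
    public static double classify(Classifier[] base, Classifier meta,
                                  Instances metaHeader, Instance inst) throws Exception {
        double[] vals = new double[base.length + 1];
        for (int j = 0; j < base.length; j++) {
            vals[j] = base[j].classifyInstance(inst);    // base-level prediction
        }
        Instance metaInst = new DenseInstance(1.0, vals);
        metaInst.setDataset(metaHeader);                 // attach header so the class is known
        metaInst.setClassMissing();                      // class value is what we want to predict
        return meta.classifyInstance(metaInst);          // meta-level prediction
    }
}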


3. Implementation in WEKA

WEKA provides an enormous number of algorithms and resources that were useful and necessary for this multitask implementation. Not having to rewrite many of the core objects such as “Attribute”, “Instances”, and “Classifier” was one of the many things that prompted this study to use WEKA. Also, since WEKA provides such a nice workbench GUI, it was beneficial to integrate this multitask learner into the GUI environment.

In addition, a large amount of testing was necessary for this multitask implementation. Because of its numerous options for classifiers and parameters, the best way to efficiently test and make use of this multitask learner was to create a workbench design.



3.1 Design

The name of this classifier is “MultiTask”. I created a package called “weka.classifiers.multiTask”, which holds all classes related to this classifier.

Three options are available, two required and one optional. The required options are “targetTask” and “tasks”; here the user defines the names of the tasks in the dataset that are to be used. The optional one is “useInputAttributes”, a flag that tells the classifier to also use the input attributes used by the base-level learners.

Adding the MultiTask classifier to the “weka.gui.GenericObjectEditor.props” file was necessary to get the WEKA GUI to recognize it.
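
For reference, such an entry would look roughly like the excerpt below. The neighbouring entries are placeholders, and newer WEKA releases discover classifiers dynamically, so this static props file may not be used at all in current versions.

weka.classifiers.Classifier=\
 weka.classifiers.bayes.NaiveBayes,\
 weka.classifiers.trees.J48,\
 weka.classifiers.multiTask.MultiTask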


3.3 MultiTask in WEKA GUI Screen Shots


Figure 3 - WEKA Explorer

Figure 4 - MultiTask Options in WEKA GUI

3.4 GUI Hacks and Bugs

Although tampering with the WEKA source code was not intended, a minor hack to the ClassifierPanel class was necessary for MultiTask to function properly in the WEKA environment. This hack was simply a one-line code change that prevents the class attribute from being removed from each instance; it was necessary because the createMetaInstance method in MultiTask requires the values of the class attribute.

MultiTask can only be run once per WEKA startup, and it is only supported in the Explorer GUI. Once one run of MultiTask has finished, the GUI must be restarted completely. Also, testing of the classifier can only be done using either the training set or a supplied test set; cross-validation and percentage split are not supported.



4. Analysis

Although it may seem that using information from other tasks to predict the target task would result in better performance, this particular implementation has proven otherwise. The testing of this multitask learner was done at the C4.5 classifier level. Using only C4.5 classifiers at the meta and base levels, and adjusting different levels of pruning to create bias, the hope was that the base learners would find new similarities and/or discrepancies among the tasks that would enable the meta-learner, trained on the outputs of the base learners, to make better predictions. Perhaps there are other classifiers and parameters that, when used with this multitask learner, result in more promising performance.


The multitask learner's classification of a target task using related tasks was far less accurate than that of a single-task C4.5 classifier. With correctly classified instances averaging around 50%, the multitask learner would not be a good classifier to use. Using the same target task, without the aid of related-task information, the single-task C4.5 classifier averaged around 98% correctly classified instances.
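
A single-task baseline of this kind can be reproduced with WEKA's Evaluation API. The sketch below assumes the dataset file name zooMultiTask.arff and evaluates a default J48 on its own training set, mirroring the testing setup used in this study.

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SingleTaskBaseline {
    public static void main(String[] args) throws Exception {
        // "zooMultiTask.arff" is an assumed file name for the dataset used in this study.
        Instances data = DataSource.read("zooMultiTask.arff");
        data.setClassIndex(data.attribute("type").index());   // target task

        J48 j48 = new J48();                                   // default C4.5 settings
        j48.buildClassifier(data);

        // Evaluate on the training set, as done for the experiments reported here.
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(j48, data);
        System.out.println(eval.toSummaryString());
    }
}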


In the largest use of variations in pruning, the multitask performance was only 57.4257% correctly classified instances. This test case consisted of eight base-level classifiers: five with confidence factors ranging from 0.1 down to 0.00001, one with reduced-error pruning set to true, one with no pruning, and one using the default J48 option values. The meta classifier used was also a J48 with default values.

Using the zooMultiTask dataset and testing on the training set, the performance was still low. The target task was the attribute “type” and the related task was “blood”. Figure 5 shows the run summary for these parameters.


=== Run information ===

Scheme: weka.classifiers.multiTask.MultiTask -T type -A false -I blood -X 10 -M "weka.classifiers.trees.J48 -C 0.25 -M 2" -S 1 -B "weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2" -B "weka.classifiers.trees.J48 -C 1.0E-5 -M 2" -B "weka.classifiers.trees.J48 -C 1.0E-4 -M 2" -B "weka.classifiers.trees.J48 -C 0.0010 -M 2" -B "weka.classifiers.trees.J48 -C 0.01 -M 2" -B "weka.classifiers.trees.J48 -C 0.1 -M 2" -B "weka.classifiers.trees.J48 -U -M 2" -B

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances          58               57.4257 %
Incorrectly Classified Instances        43               42.5743 %
Kappa statistic                          0.4005
Mean absolute error                      0.1484
Root mean squared error                  0.2724
Relative absolute error                 67.8204 %
Root relative squared error             82.6738 %
Total Number of Instances              101

Figure 5 - WEKA Run Information



The initial idea of using pruning was to create smaller trees that would introduce bias to the meta classifier. The classification of the related task by these smaller, pruned trees would in turn create different meta-instances with which to train the meta-learner. The bias from the base classifiers was, in theory, supposed to enable the meta classifier to make better judgments when classifying the target task.

The level of pruning is changed by adjusting the confidenceFactor option of the J48 classifier: the lower the confidence factor, the heavier the pruning. WEKA only allows a confidence factor no smaller than 1x10^-7.
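
As a small illustration (not taken from the MultiTask source), the pruning level of a J48 base classifier can be set programmatically:

import weka.classifiers.trees.J48;

public class PruningExample {
    public static void main(String[] args) {
        J48 heavyPruning = new J48();
        heavyPruning.setConfidenceFactor(0.0001f);   // lower confidence factor => heavier pruning

        J48 lightPruning = new J48();
        lightPruning.setConfidenceFactor(0.25f);     // WEKA's default confidence factor

        System.out.println(heavyPruning.getConfidenceFactor());
        System.out.println(lightPruning.getConfidenceFactor());
    }
}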


An interesting discovery was made while observing the base classifier trees in the WEKA run information. Even though the confidence factors of the base classifiers were set to values ranging from 0.1 down to 0.00001, the tree created remained exactly the same within each range of settings: the classifiers with confidence factors from 0.1 to 0.001 all produced one identical tree, and those with confidence factors of 0.0001 and below all produced another identical tree. This discovery presents a big hurdle in trying to find any means of creating smaller, biased trees using the C4.5 classifier.

Although the size of the dataset may affect the sensitivity of pruning to the confidence factor, this was not tested. Perhaps the multitask classifier works best with larger datasets. The tree structure for pruning with confidence factors ranging from 0.1 to 0.001 is shown in Figure 6, and the structure for confidence factors ranging from 0.0001 to 0.0000001 is shown in Figure 7.


J48 pruned tree

------------------


breathes = false: cold (21.0)

breathes = true

| backbone = false: cold (11.0)

| backbone = true

| | type = mammal: warm (41.0)

| | type = bird: warm (20.0)

| | type = reptile: cold (4.0)

| | type = fish: warm (0.0)

| | type = amphibian: cold (4.0)

| | type = insect: warm (0.0)

| | type = invertebrate: warm (0.0)


Number of Leaves : 9


Size of the tree : 12

Figure 6 - Confidence Factor (0.1 - 0.001)






J48 pruned tree

------------------


breathes = false: cold (21.0)

breathes = true

| backbone = false: cold (11.0)

| backbone = true: warm (69.0/8.0)


Number of Leaves : 3


Size of the tree : 5

Figure 7 - Confidence Factor (0.0001 - 0.0000001)


5. Conclusion

Since it was shown that pruning the base classifiers on the zooMultiTask dataset did not bias the multitask meta-learner enough to perform better than, or even remotely close to, the single-task learner, continued study and testing of this algorithm is encouraged. Although the use of the base-level input attributes as inputs to the meta classifier was not implemented, it may be helpful in improving the performance of this multitask learner. Also, other classifiers should be taken into consideration when choosing the base classifiers. This study implemented a multitask learner at the C4.5 level; other combinations of classifiers and parameters may yield better performance.



6. References

Bart Hamers, Johan A. K. Suykens, and Bart De Moor. Coupled Transductive Ensemble Learning of Kernel Models.

Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Eibe Frank, and Mark Hall. Multiclass Alternating Decision Trees.

Yoav Freund and Robert E. Schapire. A Short Introduction to Boosting.

Sebastian Thrun. Learning to Learn.

R. Caruana. Multitask Learning. Machine Learning.