Research Progress Report


15 Oct 2013


Active Learning for Class
Imbalance Problem

Problem to be addressed


Class imbalance problem

Refers to the situation in which at least one class has significantly fewer training examples than the others

Or, equivalently, examples in the training data belonging to one class heavily outnumber the examples in the other class

Currently, most machine learning algorithms assume the training data to be balanced, e.g. support vector machines, logistic regression, the naïve Bayes classifier, etc.

During the last few decades, some effective methods have been proposed to attack this problem, such as up-sampling, down-sampling, and asymmetric bagging.

Problem to be addressed

Detailed problem

Traditional machine learning algorithms are often biased toward the majority class,

since the goal of the classifiers is to reduce the training error, without taking the data distribution into consideration.

Consequently, examples from the majority class tend to be classified correctly, while examples from the minority class tend to be misclassified.

Several Common Approaches

From the data perspective

Up-sampling

Down-sampling

Asymmetric Bagging

From the learning algorithm perspective

Adjusting the cost function

Tuning the related parameters
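As a concrete illustration of the data-perspective fixes, here is a minimal sketch of random down-sampling, assuming examples and 0/1 class labels stored in plain lists; the helper `undersample` and the toy data are illustrative, not from the paper:

```python
import random
from collections import Counter

def undersample(X, y, seed=0):
    """Randomly down-sample every class to the minority class size.
    A minimal sketch of the 'data perspective' approach."""
    rng = random.Random(seed)
    minority_size = min(Counter(y).values())
    kept, per_class = [], Counter()
    for i in rng.sample(range(len(y)), len(y)):  # visit indices in random order
        if per_class[y[i]] < minority_size:      # keep up to minority_size per class
            kept.append(i)
            per_class[y[i]] += 1
    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]

X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2                 # 8 majority vs. 2 minority examples
Xb, yb = undersample(X, y)
print(sorted(Counter(yb).items()))    # [(0, 2), (1, 2)] -- balanced
```

Up-sampling is the mirror image (duplicate or synthesize minority examples); asymmetric bagging instead trains an ensemble, pairing the full minority set with a different majority down-sample in each bag.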

Background Knowledge

Active Learning

Similar to semi-supervised learning methods, the key idea is to use both the labeled and the unlabeled data for classifier training.

Active learning is composed of four components

A small set of labeled training data, a large pool of unlabeled data, a base learning algorithm, and an active learner (a selection strategy)

Active learning is not a machine learning algorithm itself; it can be seen as an enhancing wrapper method

The difference between semi-supervised learning and active learning is that active learning queries an oracle (e.g. a human annotator) for the labels of selected samples, whereas semi-supervised learning exploits the unlabeled data without requesting additional labels

Background Knowledge

Active Learning

Goals of active learning

Maximizing the learning performance while
minimizing the required labeled training examples

Achieving better performance using the same
amount of labeled training data

Needing fewer training samples to obtain the same learning performance


Background Knowledge

An Example: SVM-based Active Learning

A small set of labeled training examples

A large pool of unlabeled data

Base learning algorithm: SVM

Active Learner (selection strategy)

Instances closest to the current separating hyperplane are selected and sent for human labeling
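The selection strategy above can be sketched in a few lines, assuming the current SVM decision boundary is a linear hyperplane given by a weight vector `w` and bias `b`; the helper `query_closest` and the toy pool are illustrative assumptions, not the paper's code:

```python
import math

def query_closest(w, b, unlabeled):
    """Return the index of the unlabeled point closest to the
    hyperplane w.x + b = 0 (the candidate to send for labeling)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    dist = lambda x: abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    return min(range(len(unlabeled)), key=lambda i: dist(unlabeled[i]))

w, b = [1.0, -1.0], 0.0                      # current hyperplane: x1 = x2
pool = [[3.0, 0.0], [1.0, 0.9], [0.0, 4.0]]  # toy unlabeled pool
print(query_closest(w, b, pool))             # 1 -- the point nearest the boundary
```

In each active-learning iteration the queried point is labeled, added to the training set, and the SVM is retrained before the next query.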


SVM-based Active Learning

In classical active learning methods, the most informative samples are selected from the entire unlabeled pool

In other words, each iteration of active learning involves computing the distance of every sample to the decision boundary

For large-scale data sets, this is time-consuming and computationally expensive

Paper Contribution

Proposed method

Instead of querying the whole unlabeled pool, a random subset of L instances is first selected

From this subset, the sample closest to the hyperplane is selected, using the criterion that it is among the top p fraction of closest instances in the pool with probability P

Paper Contribution

Proposed Method

The probability that at least one of the L sampled instances is among the top p fraction of closest instances is

P = 1 - (1 - p)^L

Solving for L, we have

L = log(1 - P) / log(1 - p), rounded up to the nearest integer

Paper Contribution

Proposed Method

For example, with p = 0.05 and P = 0.95:

The active learner will pick one instance that, with 95% probability, is among the top 5% of instances closest to the separating hyperplane, by randomly sampling only L = 59 instances, regardless of the training set size
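The subset size implied by this example can be checked directly: solving 1 - (1 - p)^L >= P for the smallest integer L with p = 0.05 and P = 0.95 gives L = 59 (`subset_size` is a hypothetical helper name):

```python
import math

def subset_size(p, P):
    """Smallest L such that 1 - (1 - p)**L >= P, i.e. with probability
    at least P one of L random samples falls within the top fraction p
    of the pool closest to the hyperplane."""
    return math.ceil(math.log(1 - P) / math.log(1 - p))

L = subset_size(p=0.05, P=0.95)
print(L)                      # 59
print(1 - (1 - 0.05) ** L)    # ~0.9515, i.e. >= 0.95
```

Note that L depends only on p and P, not on the pool size, which is what makes the query cost constant for large-scale data sets.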



Evaluation Metric

g-means = sqrt(sensitivity × specificity)

where sensitivity and specificity are the accuracies on the positive and negative instances, respectively
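A minimal sketch of the g-means metric, computed from 0/1 label lists; the helper `g_means` and the toy predictions are illustrative:

```python
import math

def g_means(y_true, y_pred):
    """g-means = sqrt(sensitivity * specificity); classes coded as 1 / 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos    # accuracy on the positive instances
    specificity = tn / neg    # accuracy on the negative instances
    return math.sqrt(sensitivity * specificity)

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]          # sensitivity 0.5, specificity 1.0
print(round(g_means(y_true, y_pred), 4))   # 0.7071
```

Unlike plain accuracy, g-means collapses to 0 if either class is ignored entirely, which is why it is preferred for imbalanced data.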






Conclusion

This paper proposes a method to address the class imbalance problem using active learning techniques

Experimental results show that this approach can achieve a significant decrease in training time, while maintaining the same or even a higher g-means value using a smaller number of training examples

Active selection of informative examples from a randomly selected subset avoids searching the whole unlabeled pool