
VARIANCE REDUCTION FOR STABLE FEATURE SELECTION
Presenter: Yue Han

Advisor: Lei Yu




Department of Computer Science

10/27/10




OUTLINE


Introduction and Motivation


Background and Related Work


Preliminaries


Publications


Theoretical Framework


Empirical Framework: Margin Based Instance Weighting


Empirical Study


Planned Tasks


INTRODUCTION AND MOTIVATION
FEATURE SELECTION APPLICATIONS

[Figure: example application domains for feature selection. Text data: a document-term matrix (documents D1..DM over terms T1..TN, with class labels such as Sports, Travel, Jobs). Microarray data: samples over features (genes or proteins). Image data: pixels as features.]

INTRODUCTION AND MOTIVATION
FEATURE SELECTION FROM HIGH-DIMENSIONAL DATA

p: # of features; n: # of samples
High-dimensional data: p >> n

Feature Selection:


Alleviating the effect of the curse of
dimensionality.


Enhancing generalization capability.


Speeding up the learning process.


Improving model interpretability.

Curse of Dimensionality:


Effects on distance functions


In optimization and learning


In Bayesian statistics




Knowledge discovery on high-dimensional data:
High-dimensional data -> Feature selection algorithm (MRMR, SVM-RFE, Relief-F, F-statistics, etc.) -> Low-dimensional data -> Learning models (classification, clustering, etc.)

INTRODUCTION AND MOTIVATION
STABILITY OF FEATURE SELECTION

[Diagram: the same feature selection method applied to different training data sets yields different feature subsets. Consistent or not?]


Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.

[Diagram: analogously, a learning algorithm trained on different training data sets yields different learning models.]
The stability of learning algorithms was first examined by Turney in 1995. The stability of feature selection was relatively neglected until recently, when it began to attract interest from researchers in data mining.

INTRODUCTION AND MOTIVATION
MOTIVATION FOR STABLE FEATURE SELECTION

[Figure: two training sets D1 and D2 (features x samples) drawn from the same underlying data D.]

Given unlimited sample size of D: feature selection results from D1 and D2 are the same.
When the size of D is limited (n << p for high-dimensional data): feature selection results from D1 and D2 are different.
Challenge: increasing the number of samples can be very costly or impractical.



Experts in biology and biomedicine are interested in:
not only the prediction accuracy but also the consistency of feature subsets;
validating stable genes or proteins that are less sensitive to variations in the training data;
biomarkers that explain the observed phenomena.



BACKGROUND AND RELATED WORK
FEATURE SELECTION METHODS

[Diagram: general feature selection process. From the original feature set: subset generation -> subset evaluation (goodness of subset) -> stopping criterion (if not met, generate another subset) -> result validation.]

Evaluation criteria: Filter model; Wrapper model; Embedded model.
Search strategies: Complete search; Sequential search; Random search.
Representative algorithms: Relief, SFS, MDLM, etc.; FSBC, ELSA, LVW, etc.; BBHFS, Dash-Liu's, etc.

BACKGROUND AND RELATED WORK
STABLE FEATURE SELECTION

Comparison of feature selection algorithms w.r.t. stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007):
Quantify stability in terms of the consistency of subsets or weights;
Algorithms vary in stability yet perform equally well for classification;
Choose the algorithm that is best in both stability and accuracy.


Bagging-based ensemble feature selection (Saeys et al., ECML 2007):
Draw different bootstrap samples from the same training set;
Apply a conventional feature selection algorithm to each sample;
Aggregate the feature selection results.
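To make this scheme concrete, here is a minimal sketch (not the authors' implementation): it draws bootstrap samples, scores features on each with a conventional selector (an F-statistic filter is assumed here purely for simplicity), and aggregates the per-bootstrap rankings by mean rank.

    import numpy as np
    from sklearn.feature_selection import f_classif

    def ensemble_feature_ranking(X, y, n_bootstraps=20, seed=0):
        """Bagging-based ensemble feature selection (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        rank_sum = np.zeros(d)
        for _ in range(n_bootstraps):
            idx = rng.integers(0, n, size=n)          # bootstrap sample of the training set
            scores, _ = f_classif(X[idx], y[idx])     # conventional feature selection algorithm
            ranks = np.argsort(np.argsort(-scores))   # rank 0 = most relevant feature
            rank_sum += ranks
        return np.argsort(rank_sum)                   # features ordered by aggregated rank

The top k features of the aggregated ranking (e.g. ensemble_feature_ranking(X, y)[:10]) form the final subset.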


Group-based stable feature selection (Yu et al., KDD 2008; Loscalzo et al., KDD 2009):
Explore the intrinsic feature correlations;
Identify groups of correlated features;
Select relevant feature groups.

BACKGROUND AND RELATED WORK
MARGIN-BASED FEATURE SELECTION

Sample margin: how far an instance can travel before it hits the decision boundary.
Hypothesis margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance).
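For reference, the 1-NN hypothesis margin commonly used in this line of work (e.g. by Simba and G-flip) can be written as

    \theta(x) = \frac{1}{2}\big(\|x - \mathrm{nearmiss}(x)\| - \|x - \mathrm{nearhit}(x)\|\big)

where nearhit(x) is the nearest neighbor of x with the same label and nearmiss(x) is the nearest neighbor with a different label; a larger margin means x sits more safely inside its own class.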


Representative algorithms: Relief, Relief-F, G-flip, Simba, etc.
In these algorithms the margin is used for feature weighting or feature selection (a completely different use from the one in our study).


PUBLICATIONS


Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report, Data Mining Research Laboratory, Binghamton University, 2009.

Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM 2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010.

Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, December 14-17, 2010. To appear.

Lei Yu, Yue Han, and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010. Major revision under review.




THEORETICAL FRAMEWORK
BIAS-VARIANCE DECOMPOSITION OF FEATURE SELECTION ERROR


Notation: training data D; feature selection result r(D); true feature selection result r*.
Expected loss (error): the expected deviation of r(D) from r*, taken over training sets D drawn from the data space.
Bias: the deviation of the average result E_D[r(D)] from the true result r*.
Variance: the expected deviation of r(D) from its average E_D[r(D)].
Bias-variance decomposition of feature selection error: the error decomposes into a bias term and a variance term.


Reveals the relationship between accuracy (the opposite of loss) and stability (the opposite of variance);
Suggests a better trade-off between the bias and variance of feature selection.
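A minimal worked form of the decomposition, assuming the squared loss between feature selection results that the labels above suggest (the slide's own equations were not preserved): with \bar{r} = E_D[r(D)],

    E_D\big[\|r(D) - r^*\|^2\big] = \underbrace{\|\bar{r} - r^*\|^2}_{\text{bias}} + \underbrace{E_D\big[\|r(D) - \bar{r}\|^2\big]}_{\text{variance}}

Under this reading, stability corresponds to a small variance term, so reducing variance without inflating bias lowers the overall feature selection error.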




THEORETICAL FRAMEWORK
VARIANCE REDUCTION VIA IMPORTANCE SAMPLING


Feature selection (weighting) viewed as a Monte Carlo estimator:
Relevance score: the true relevance of a feature is an expectation over the underlying data distribution.
Monte Carlo estimator: that relevance score estimated as an average over the n training instances.
Variance of the Monte Carlo estimator: its impact factors are the feature selection algorithm and the sample size, and increasing the sample size is impractical and costly.

Importance sampling: a good importance sampling function h(x) corresponds to instance weighting.
Intuition behind h(x): draw more instances from important regions and fewer instances from other regions.
Intuition behind the instance weights: increase the weights of instances from important regions and decrease the weights of instances from other regions.
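In generic notation (the slide's own symbols were not preserved), the estimators referred to above can be sketched as follows: if a feature's true relevance is w = E_{x \sim f}[s(x)] for some per-instance scoring contribution s(x), then the plain Monte Carlo estimate and its variance are

    \hat{w}_{\mathrm{MC}} = \frac{1}{n}\sum_{i=1}^{n} s(x_i), \qquad \mathrm{Var}(\hat{w}_{\mathrm{MC}}) = \frac{\mathrm{Var}_f(s(x))}{n}

while sampling from an importance density h gives

    \hat{w}_{\mathrm{IS}} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(x_i)}{h(x_i)}\, s(x_i), \qquad x_i \sim h

which is still unbiased and has lower variance when h concentrates samples where s(x) matters most; the ratio f(x_i)/h(x_i) plays the role of an instance weight, which is the connection the empirical framework exploits.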


EMPIRICAL FRAMEWORK
OVERALL FRAMEWORK

Challenges:
How to produce weights for instances from the point of view of feature selection stability;
How to present weighted instances to conventional feature selection algorithms.
[Figure: overall framework of margin-based instance weighting for stable feature selection.]

EMPIRICAL FRAMEWORK
MARGIN VECTOR FEATURE SPACE

Transformation from the original feature space to the margin vector feature space: for each instance, the hypothesis margin is computed from its nearest hit (nearest neighbor of the same class) and nearest miss (nearest neighbor of the opposite class).
The resulting margin vector captures the local profile of feature relevance over all features at that instance:
Instances exhibit different profiles of feature relevance;
Instances influence feature selection results differently.
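A minimal sketch of this transformation, assuming the per-feature margin is the coordinate-wise distance to the nearest miss minus the distance to the nearest hit (one natural reading of the slide; the authors' exact definition may differ):

    import numpy as np

    def margin_vectors(X, y):
        """Map each instance to its hypothesis-margin vector (illustrative sketch).

        For instance x and feature j:
            m_j = |x_j - nearmiss(x)_j| - |x_j - nearhit(x)_j|
        where nearhit / nearmiss are the nearest neighbors of the same / opposite class.
        """
        n, d = X.shape
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all pairwise distances
        np.fill_diagonal(dist, np.inf)                                # an instance is not its own neighbor
        M = np.zeros((n, d))
        for i in range(n):
            same = (y == y[i])
            hit = np.argmin(np.where(same, dist[i], np.inf))          # nearest hit
            miss = np.argmin(np.where(~same, dist[i], np.inf))        # nearest miss
            M[i] = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
        return M

Summing a margin vector over its features gives an L1-style hypothesis margin for that instance.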

EMPIRICAL FRAMEWORK
AN ILLUSTRATIVE EXAMPLE
[Figure: hypothesis-margin-based feature space transformation. (a) Original feature space. (b) Margin vector feature space.]

EMPIRICAL FRAMEWORK
MARGIN-BASED INSTANCE WEIGHTING ALGORITHM

Instances exhibit different profiles of feature relevance and influence feature selection results differently, which motivates instance weighting:
higher outlying degree -> lower weight;
lower outlying degree -> higher weight.

Review: variance reduction via importance sampling draws more instances from important regions and fewer instances from other regions.
Outlying degree: how far an instance's margin vector lies from the margin vectors of the other instances.
Weighting: each instance receives a weight that decreases with its outlying degree.
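One plausible instantiation of these two quantities (the slide's exact formulas were not preserved): take the outlying degree as the average distance from an instance's margin vector to all others, and map it to a weight through a monotonically decreasing function.

    import numpy as np

    def instance_weights(M):
        """Margin-based instance weighting (illustrative sketch).

        M: (n, d) matrix of margin vectors from margin_vectors().
        Instances whose margin vectors are typical get high weights;
        outlying instances get low weights.
        """
        n = M.shape[0]
        dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
        outlying_degree = dist.sum(axis=1) / (n - 1)   # average distance to the other margin vectors
        weights = np.exp(-outlying_degree)             # assumed decreasing map from degree to weight
        return n * weights / weights.sum()             # normalized so the weights average to 1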

EMPIRICAL FRAMEWORK
ALGORITHM ILLUSTRATION


Time complexity analysis:
Dominated by the instance weighting step (computing nearest hits and misses requires pairwise distances over all instances);
Efficient for high-dimensional data with small sample size (n << d).


EMPIRICAL STUDY
SUBSET STABILITY MEASURES

Overall stability: the average pairwise similarity between the feature selection results obtained on different training sets.
Feature subsets: Kuncheva index; Jaccard index; nPOGR; SIMv.
Feature ranking: Spearman rank correlation coefficient.
Feature weighting: Pearson correlation coefficient.
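Among the subset measures above, the Kuncheva index has a simple closed form that corrects the overlap of two equal-size subsets for chance agreement; a small sketch, together with the average pairwise stability over all pairs of results:

    from itertools import combinations

    def kuncheva_index(a, b, n_features):
        """Kuncheva consistency index between two feature subsets of equal size k."""
        a, b = set(a), set(b)
        k, r = len(a), len(a & b)
        if k == 0 or k == n_features:
            return 0.0                                   # index is undefined at the extremes
        return (r * n_features - k * k) / (k * (n_features - k))

    def average_pairwise_stability(subsets, n_features):
        """Average Kuncheva index over all pairs of feature selection results."""
        pairs = list(combinations(subsets, 2))
        return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)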



EMPIRICAL STUDY
EXPERIMENTS ON SYNTHETIC DATA

Synthetic data generation:
Feature values: drawn from two multivariate normal distributions. The covariance matrix is block-structured: each 10x10 block has 1 along the diagonal and 0.8 off the diagonal, giving 100 groups of 10 correlated features each.
Class label: a weighted sum of all feature values, using an optimal feature weight vector.
Training data: 500 training sets, each with 100 instances (50 drawn from each distribution).
Test data: 5,000 instances.
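A hedged reconstruction of this generator: the group structure and covariance come from the slide, but the two distribution means and the "optimal" feature weight vector were not preserved, so the values marked as placeholders below are illustrative only.

    import numpy as np

    def make_synthetic(n_per_dist=50, n_groups=100, group_size=10, seed=0):
        """Synthetic data sketch: 100 groups of 10 correlated features (1,000 in total)."""
        rng = np.random.default_rng(seed)
        d = n_groups * group_size
        block = np.full((group_size, group_size), 0.8)   # 0.8 off the diagonal
        np.fill_diagonal(block, 1.0)                     # 1 along the diagonal

        def draw(mean_value, size):
            groups = [rng.multivariate_normal(np.full(group_size, mean_value), block, size)
                      for _ in range(n_groups)]
            return np.hstack(groups)

        X = np.vstack([draw(+0.5, n_per_dist), draw(-0.5, n_per_dist)])  # placeholder means
        w = rng.normal(size=d)                           # placeholder "optimal" weight vector
        score = X @ w                                    # class label from a weighted sum of features
        y = np.where(score >= np.median(score), 1, -1)
        return X, y, w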

Method in comparison:
SVM-RFE: recursively eliminate 10% of the remaining features at each iteration until 10 features remain.
Measures:
Variance, bias, error;
Subset stability (Kuncheva index);
Classification accuracy (SVM).
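To show how weighted instances can be handed to this conventional selector (the second challenge in the empirical framework), here is a minimal instance-weighted SVM-RFE loop; passing the margin-based weights through sample_weight is an assumed realization of IW SVM-RFE, not necessarily the authors' exact implementation.

    import numpy as np
    from sklearn.svm import SVC

    def weighted_svm_rfe(X, y, weights, n_select=10, drop_frac=0.10):
        """Instance-weighted SVM-RFE sketch: drop 10% of remaining features per iteration."""
        remaining = np.arange(X.shape[1])
        while len(remaining) > n_select:
            clf = SVC(kernel="linear")
            clf.fit(X[:, remaining], y, sample_weight=weights)   # weighted instances
            importance = np.abs(clf.coef_).ravel() ** 2          # RFE criterion: squared coefficients
            n_drop = min(max(1, int(drop_frac * len(remaining))),
                         len(remaining) - n_select)
            remaining = remaining[np.argsort(importance)[n_drop:]]  # discard least important
        return remaining

With uniform weights this reduces to ordinary SVM-RFE, so the same routine covers both variants compared here.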

EMPIRICAL STUDY
EXPERIMENTS ON SYNTHETIC DATA

Observations:
Error equals the sum of bias and variance for both versions of SVM-RFE;
Error is dominated by bias during early iterations and by variance during later iterations;
IW SVM-RFE exhibits significantly lower bias, variance, and error than SVM-RFE when the number of remaining features approaches 50.

EMPIRICAL STUDY
EXPERIMENTS ON SYNTHETIC DATA

Conclusion: variance reduction via margin-based instance weighting yields a better bias-variance trade-off, increased subset stability, and improved classification accuracy.

EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA

Data: microarray data sets.
Experiment setup: 10-fold cross-validation (each fold split into training data and test data).
Methods in comparison:
SVM-RFE;
20-Ensemble SVM-RFE: 20 bootstrapped training sets, each yielding a feature subset, aggregated into a single feature subset;
Instance Weighting (IW) SVM-RFE.
Measures:
Variance;
Subset stability;
Classification accuracies (KNN, SVM).

EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA

Note: 40 iterations, starting from about 1,000 features until 10 features remain.
Observations:
The methods are non-discriminative during early iterations;
SVM-RFE increases sharply as the number of features approaches 10;
IW SVM-RFE shows a significantly slower rate of increase.

EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA

Observations:
Both the ensemble and instance weighting approaches improve stability consistently;
The improvement from the ensemble approach is not as significant as that from instance weighting;
As the number of selected features increases, the stability score decreases because of the larger correction factor.

EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA

Conclusions:
Instance weighting improves the stability of feature selection without sacrificing prediction accuracy;
It performs much better than the ensemble approach and is more efficient;
It delivers a significant increase in stability at only a slight extra time cost.


PLANNED TASKS
OVERALL FRAMEWORK

Theoretical framework of feature selection stability;
Empirical instance weighting framework: margin-based instance weighting;
Representative FS algorithms: SVM-RFE, Relief-F, F-statistics, HHSVM;
Various real-world data sets: gene data, text data;
Iterative approach; state-of-the-art weighting schemes;
Relationship between feature selection stability and classification accuracy.

PLANNED TASKS
LISTED TASKS

A. Extensive study of the instance weighting framework
   A1. Extension to various feature selection algorithms
   A2. Study on data sets from different domains
B. Development of algorithms under the instance weighting framework
   B1. Development of instance weighting schemes
   B2. Iterative approach for margin-based instance weighting
C. Investigation of the relationship between stable feature selection and classification accuracy
   C1. How bias-variance properties of feature selection affect classification accuracy
   C2. Study of various factors affecting the stability of feature selection



Timeline: Oct-Dec 2010; Jan-Mar 2011; Apr-Jun 2011; Jul-Aug 2011 (tasks A1, A2, B1, B2, C1, C2 scheduled across these four periods).

Thank you. Questions?