VARIANCE REDUCTION FOR STABLE FEATURE SELECTION
Presenter: Yue Han
Advisor: Lei Yu
Department of Computer Science
10/27/10
OUTLINE
Introduction and Motivation
Background and Related Work
Preliminaries
Publications
Theoretical Framework
Empirical Framework: Margin Based Instance Weighting
Empirical Study
Planned Tasks
INTRODUCTION AND MOTIVATION
FEATURE SELECTION APPLICATIONS
[Figure: examples of feature selection applications. A term-document matrix with terms as features and documents D1, D2, ..., DM as instances, labeled with categories such as Sports, Travel, Jobs; microarray data with genes or proteins as features and samples as instances; image data with pixels as features.]
INTRODUCTION AND MOTIVATION
FEATURE SELECTION FROM HIGH-DIMENSIONAL DATA
p: # of features; n: # of samples
High-dimensional data: p >> n
Feature Selection:
Alleviating the effect of the curse of dimensionality.
Enhancing generalization capability.
Speeding up the learning process.
Improving model interpretability.
Curse of Dimensionality:
• Effects on distance functions
• Effects in optimization and learning
• Effects in Bayesian statistics
Knowledge Discovery on High-dimensional Data:
High-dimensional Data → Feature Selection Algorithm (mRMR, SVM-RFE, Relief-F, F-statistics, etc.) → Low-dimensional Data → Learning Models (Classification, Clustering, etc.)
INTRODUCTION AND MOTIVATION
STABILITY OF FEATURE SELECTION
[Figure: the same feature selection method applied to different training data produces feature subsets: consistent or not?]
Stability of Feature Selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.
The stability of learning algorithms was first examined by Turney in 1995; the stability of feature selection was relatively neglected until recently, when it attracted interest from researchers in data mining.
INTRODUCTION AND MOTIVATION
MOTIVATION FOR STABLE FEATURE SELECTION
[Figure: two samples D1 and D2 (features × samples) drawn from the same domain D.]
Given an unlimited sample size of D: feature selection results from D1 and D2 are the same.
When the size of D is limited (n << p for high-dimensional data): feature selection results from D1 and D2 are different.
Challenge: increasing the number of samples can be very costly or impractical.
Experts in biology and biomedicine are interested in:
not only the prediction accuracy but also the consistency of feature subsets;
validating stable genes or proteins that are less sensitive to variations in the training data;
biomarkers that explain the observed phenomena.
BACKGROUND AND RELATED WORK
FEATURE SELECTION METHODS
[Figure: feature selection process. Subset generation from the original set → subset evaluation (goodness of subset) → stopping criterion (no: generate again; yes: result validation).]
Evaluation Criteria: Filter Model, Wrapper Model, Embedded Model
Search Strategies: Complete Search, Sequential Search, Random Search
Representative Algorithms: Relief, SFS, MDLM, etc.; FSBC, ELSA, LVW, etc.; BBHFS, Dash-Liu's, etc.
BACKGROUND AND RELATED WORK
STABLE FEATURE SELECTION
Comparison of Feature Selection Algorithms w.r.t. Stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007)
Quantify stability in terms of consistency of subsets or weights;
Algorithms vary in stability while performing equally well for classification;
Choose the algorithm that is best in both stability and accuracy.
Bagging-based Ensemble Feature Selection (Saeys et al., ECML07)
Draw different bootstrapped samples of the same training set;
Apply a conventional feature selection algorithm to each;
Aggregate the feature selection results.
Group-based Stable Feature Selection (Yu et al., KDD08; Loscalzo et al., KDD09)
Explore the intrinsic feature correlations;
Identify groups of correlated features;
Select relevant feature groups.
BACKGROUND AND RELATED WORK
MARGIN BASED FEATURE SELECTION
Sample Margin: how far an instance can travel before it hits the decision boundary.
Hypothesis Margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance).
Representative Algorithms: Relief, Relief-F, G-flip, Simba, etc.
In these algorithms the margin is used for feature weighting or feature selection (a totally different use than in our study).
PUBLICATIONS
Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report, Data Mining Research Laboratory, Binghamton University, 2009.
Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010.
Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM2010), Sydney, Australia, December 14-17, 2010. To appear.
Lei Yu, Yue Han and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010. Major revision under review.
THEORETICAL FRAMEWORK
BIAS-VARIANCE DECOMPOSITION OF FEATURE SELECTION ERROR
Training data D drawn from the data space; FS result: r(D); true FS result: r*.
Expected Loss (Error): Err(r) = E_D[ ||r(D) - r*||² ]
Bias: B(r) = ||E_D[r(D)] - r*||²
Variance: V(r) = E_D[ ||r(D) - E_D[r(D)]||² ]
Bias-Variance Decomposition of Feature Selection Error: Err(r) = B(r) + V(r)
o Reveals the relationship between accuracy (the opposite of loss) and stability (the opposite of variance);
o Suggests a better trade-off between the bias and variance of feature selection.
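Under squared loss the decomposition holds exactly, which can be checked numerically; the sketch below (the "true" weight vector and noise model are illustrative assumptions, not from the slides) simulates feature selection results r(D) over many resampled training sets:

```python
import random

random.seed(0)
r_star = [1.0, 0.5, 0.0]  # hypothetical "true" feature weight vector
# Simulate feature selection results r(D) over 2000 resampled training sets.
results = [[w + random.gauss(0, 0.3) for w in r_star] for _ in range(2000)]

mean_r = [sum(r[f] for r in results) / len(results) for f in range(3)]
error = sum(sum((r[f] - r_star[f]) ** 2 for f in range(3))
            for r in results) / len(results)
bias = sum((mean_r[f] - r_star[f]) ** 2 for f in range(3))
variance = sum(sum((r[f] - mean_r[f]) ** 2 for f in range(3))
               for r in results) / len(results)

print(abs(error - (bias + variance)))  # ~0: error decomposes exactly
```

The identity is algebraic, not asymptotic: with the empirical mean in place of E_D, error equals bias plus variance up to floating-point rounding.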
THEORETICAL FRAMEWORK
VARIANCE REDUCTION VIA IMPORTANCE SAMPLING
Feature selection (weighting) viewed as Monte Carlo estimation:
Relevance Score: W = E_{x~p}[f(x)], where f(x) is the local relevance measured at instance x.
Monte Carlo Estimator: Ŵ = (1/n) Σ_{j=1..n} f(x_j).
Variance of the Monte Carlo Estimator: Var(Ŵ) = Var_p(f(x)) / n.
Impact factors: the feature selection algorithm (through f) and the sample size n; increasing the sample size is impractical and costly.
Importance Sampling: with a good sampling function h(x), draw instances from h and reweight, Ŵ_h = (1/n) Σ_j [p(x_j)/h(x_j)] f(x_j); this reweighting is exactly an instance weighting.
Intuition behind h(x): draw more instances from important regions and fewer from other regions.
Intuition behind the instance weights: increase the weights of instances from important regions and decrease the weights of instances from other regions.
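The variance reduction from a well-chosen h(x) can be seen in a small, self-contained sketch (the target quantity P(X > 3) and the proposal N(4, 1) are illustrative assumptions, not from the slides): an importance sampler that draws from the important region has far lower variance than plain Monte Carlo at the same sample size.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def plain_mc(n, rng):
    """Plain Monte Carlo estimate of P(X > 3), X ~ N(0, 1)."""
    return sum(1 for _ in range(n) if rng.gauss(0, 1) > 3) / n

def importance_mc(n, rng):
    """Draw from h = N(4, 1), the important region, and reweight by p/h."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(4, 1)
        if x > 3:
            total += normal_pdf(x, 0, 1) / normal_pdf(x, 4, 1)
    return total / n

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)

rng = random.Random(1)
plain = [plain_mc(1000, rng) for _ in range(100)]
weighted = [importance_mc(1000, rng) for _ in range(100)]
print(variance(weighted) < variance(plain))  # importance sampling wins
```

Both estimators are unbiased for the same quantity; only the spread of the estimates changes, which mirrors the bias-variance argument above.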
EMPIRICAL FRAMEWORK
OVERALL FRAMEWORK
Margin Based Instance Weighting for Stable Feature Selection
Challenges:
How to produce weights for instances from the point of view of feature selection stability;
How to present weighted instances to conventional feature selection algorithms.
EMPIRICAL FRAMEWORK
MARGIN VECTOR FEATURE SPACE
Transformation from the original space to the margin vector feature space.
For each instance x, the hypothesis margin is computed from its nearest hit NH(x) (nearest neighbor of the same class) and nearest miss NM(x) (nearest neighbor of the opposite class): theta(x) = (||x - NM(x)|| - ||x - NH(x)||) / 2.
The margin vector captures the local profile of feature relevance for all features at x:
Instances exhibit different profiles of feature relevance;
Instances influence feature selection results differently.
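A minimal sketch of the margin vector computation (Euclidean distance; function names and the toy dataset are illustrative, not from the slides): each instance is mapped to a vector of per-feature margin contributions |x_f - NM(x)_f| - |x_f - NH(x)_f|.

```python
def euclidean(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def margin_vectors(X, y):
    """Map each instance into the margin vector feature space using
    its nearest hit (same class) and nearest miss (opposite class)."""
    vectors = []
    for i, x in enumerate(X):
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(len(X)) if y[j] != y[i]]
        nh = min(hits, key=lambda z: euclidean(x, z))
        nm = min(misses, key=lambda z: euclidean(x, z))
        # Per-feature margin: large when the miss is far and the hit is near.
        vectors.append([abs(x[f] - nm[f]) - abs(x[f] - nh[f])
                        for f in range(len(x))])
    return vectors

# Feature 0 separates the classes; feature 1 is near-constant noise.
X = [[0.0, 0.0], [0.0, 0.1], [1.0, 0.0], [1.0, 0.1]]
y = [0, 0, 1, 1]
for v in margin_vectors(X, y):
    print(v)  # feature 0 gets a large positive margin component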
EMPIRICAL FRAMEWORK
AN ILLUSTRATIVE EXAMPLE
Hypothesis-margin based feature space transformation:
[Figure: (a) original feature space; (b) margin vector feature space.]
EMPIRICAL FRAMEWORK
MARGIN BASED INSTANCE WEIGHTING ALGORITHM
Review: variance reduction via importance sampling draws more instances from important regions and fewer from other regions.
Instances exhibit different profiles of feature relevance and influence feature selection results differently, which motivates instance weighting:
Higher outlying degree → lower weight;
Lower outlying degree → higher weight.
Outlying Degree: how far an instance's margin vector lies from those of the other instances in the margin vector feature space.
Weighting: assign each instance a weight that decreases with its outlying degree.
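The weighting step can be sketched as follows (the average-distance outlying degree and the 1/od weighting scheme are illustrative assumptions; the slides do not fix an exact formula here):

```python
def dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def outlying_degrees(margin_vecs):
    """Average distance from each margin vector to all the others."""
    n = len(margin_vecs)
    return [sum(dist(margin_vecs[i], margin_vecs[j])
                for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def instance_weights(margin_vecs):
    """Weights decrease with outlying degree (simple 1/od scheme,
    normalized to sum to 1); the exact scheme is an assumption here."""
    od = outlying_degrees(margin_vecs)
    raw = [1.0 / (d + 1e-12) for d in od]
    s = sum(raw)
    return [r / s for r in raw]

vecs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]  # last one: outlier
print(instance_weights(vecs))  # the outlying instance gets the smallest weight
```

The normalized weights can then be passed to any feature selection algorithm that accepts weighted instances, as the framework requires.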
EMPIRICAL FRAMEWORK
ALGORITHM ILLUSTRATION
Time Complexity Analysis:
o Dominated by instance weighting, which computes pairwise distances among the n margin vectors of dimensionality d: O(n²·d);
o Efficient for high-dimensional data with small sample size (n << d).
EMPIRICAL STUDY
SUBSET STABILITY MEASURES
Stability is measured as the average pair-wise similarity over the m feature selection results obtained from different training sets: S = 2 / (m(m-1)) · Σ_{i<j} sim(R_i, R_j).
Feature Subset similarity: Kuncheva Index, I_K(A, B) = (|A ∩ B|·d - k²) / (k·(d - k)) for two subsets of size k out of d features; also Jaccard Index, nPOGR, SIMv.
Feature Ranking: Spearman rank correlation coefficient.
Feature Weighting: Pearson correlation coefficient.
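The Kuncheva index is straightforward to implement; the k²/d correction term accounts for overlap expected by chance, so identical subsets score 1 and chance-level overlap scores about 0 (a sketch with set inputs; function names are illustrative):

```python
def kuncheva_index(a, b, d):
    """Kuncheva's consistency index for two feature subsets of equal
    size k out of d features; 1 = identical, ~0 = chance overlap."""
    a, b = set(a), set(b)
    k = len(a)
    assert len(b) == k and 0 < k < d
    r = len(a & b)
    return (r * d - k * k) / (k * (d - k))

def average_pairwise_stability(subsets, d):
    """Average Kuncheva index over all pairs of selected subsets."""
    m = len(subsets)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return sum(kuncheva_index(subsets[i], subsets[j], d)
               for i, j in pairs) / len(pairs)

print(kuncheva_index({0, 1, 2}, {0, 1, 2}, 10))  # 1.0
print(kuncheva_index({0, 1, 2}, {3, 4, 5}, 10))  # negative: below chance
```

Disjoint subsets score below zero because they overlap less than two random subsets of the same size would.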
EMPIRICAL STUDY
EXPERIMENTS ON SYNTHETIC DATA
Synthetic Data Generation:
Feature values: drawn from two multivariate normal distributions (one per class); the covariance matrix is a 10×10 matrix with 1 along the diagonal and 0.8 off the diagonal; 100 groups of 10 features each.
Class label: a weighted sum of all feature values with an optimal feature weight vector.
500 training sets: 100 instances each, 50 drawn from each distribution.
Leave-one-out test data: 5000 instances.
Method in Comparison: SVM-RFE, recursively eliminating 10% of the remaining features at each iteration until 10 features remain.
Measures: variance, bias, error; subset stability (Kuncheva Index); accuracy (SVM).
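The within-group covariance structure (1 on the diagonal, 0.8 off it) can be sampled without a full multivariate-normal routine by mixing a shared group factor with independent noise. This sketches the generation step only, with illustrative names; the class-mean shift and label model are omitted:

```python
import random

def correlated_group(rng, size=10, rho=0.8):
    """One group of features with unit variance and pairwise
    correlation rho: sqrt(rho)*shared + sqrt(1-rho)*noise."""
    shared = rng.gauss(0, 1)
    return [(rho ** 0.5) * shared + ((1 - rho) ** 0.5) * rng.gauss(0, 1)
            for _ in range(size)]

def instance(rng, n_groups=100):
    """100 groups of 10 features each -> 1000 feature values."""
    return [v for _ in range(n_groups) for v in correlated_group(rng)]

rng = random.Random(0)
sample = [instance(rng) for _ in range(200)]  # 200 instances x 1000 features
```

Each feature is a sum of two independent Gaussians with variances rho and 1 - rho, so its variance is 1, and two features in the same group share only the common factor, giving covariance (and correlation) rho; features in different groups are independent.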
EMPIRICAL STUDY
EXPERIMENTS ON SYNTHETIC DATA
Observations:
Error is equal to the sum of bias and variance for both versions of SVM-RFE;
Error is dominated by bias during early iterations and by variance during later iterations;
IW SVM-RFE exhibits significantly lower bias, variance, and error than SVM-RFE when the number of remaining features approaches 50.
EMPIRICAL STUDY
EXPERIMENTS ON SYNTHETIC DATA
Conclusion: variance reduction via margin based instance weighting yields
a better bias-variance tradeoff,
increased subset stability,
improved classification accuracy.
EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA
Microarray Data: [table of datasets]
Experiment Setup: 10-fold cross-validation; for Ensemble SVM-RFE, 20 bootstrapped training sets are drawn from each training fold, a feature subset is selected from each, and the 20 subsets are aggregated into a final subset.
Methods in Comparison: SVM-RFE; Ensemble SVM-RFE; Instance Weighting SVM-RFE.
Measures: variance; subset stability; accuracies (KNN, SVM).
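The bootstrap-and-aggregate scheme used by the ensemble baseline can be sketched as follows. The base selector here is a simple class-mean-difference ranker standing in for SVM-RFE, and all names and data are illustrative assumptions, not the experimental setup itself:

```python
import random
from collections import Counter

def select_top_k(X, y, k):
    """Stand-in base selector (not SVM-RFE): rank features by the
    absolute difference of class-conditional means."""
    def score(f):
        pos = [x[f] for x, label in zip(X, y) if label == 1]
        neg = [x[f] for x, label in zip(X, y) if label == 0]
        if not pos or not neg:
            return 0.0
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def ensemble_select(X, y, k, n_boot=20, seed=0):
    """Run the base selector on n_boot bootstrapped samples, then keep
    the k features selected most often (frequency aggregation)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        counts.update(select_top_k([X[i] for i in idx],
                                   [y[i] for i in idx], k))
    return [f for f, _ in counts.most_common(k)]

# Feature 0 carries the class signal; features 1-4 are noise.
rng = random.Random(42)
y = [0] * 10 + [1] * 10
X = [[label + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(4)]
     for label in y]
print(0 in ensemble_select(X, y, k=2))  # the signal feature survives
```

Aggregation by selection frequency is one common choice; averaging feature ranks or weights across the bootstraps is another.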
EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA
Note: 40 iterations, starting from about 1000 features until 10 features remain.
Observations:
Variance is non-discriminative during early iterations;
For SVM-RFE, variance increases sharply as the number of features approaches 10;
IW SVM-RFE shows a significantly slower rate of increase.
EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA
Observations:
Both the ensemble and instance weighting approaches improve stability consistently;
The ensemble improvement is not as significant as that of instance weighting;
As the number of features increases, the stability score decreases because of the larger correction factor.
EMPIRICAL STUDY
EXPERIMENTS ON REAL-WORLD DATA
Conclusions: instance weighting
improves the stability of feature selection without sacrificing prediction accuracy;
performs much better than the ensemble approach and is more efficient;
leads to significantly increased stability at a slight extra cost in time.
PLANNED TASKS
OVERALL FRAMEWORK
Theoretical Framework of Feature Selection Stability
Empirical Instance Weighting Framework: Margin-based Instance Weighting
Representative FS Algorithms: SVM-RFE, Relief-F, F-statistics, HHSVM
Various Real-world Data Sets: Gene Data, Text Data
Iterative Approach; State-of-the-art Weighting Schemes
Relationship Between Feature Selection Stability and Classification Accuracy
PLANNED TASKS
LISTED TASKS
A. Extensive Study on the Instance Weighting Framework
A1. Extension to Various Feature Selection Algorithms
A2. Study on Datasets from Different Domains
B. Development of Algorithms under the Instance Weighting Framework
B1. Development of Instance Weighting Schemes
B2. Iterative Approach for Margin Based Instance Weighting
C. Investigation of the Relationship between Stable Feature Selection and Classification Accuracy
C1. How Bias-Variance Properties of Feature Selection Affect Classification Accuracy
C2. Study on Various Factors for the Stability of Feature Selection
Timeline: tasks A1, A2, B1, B2, C1, C2 are scheduled across Oct-Dec 2010, Jan-Mar 2011, April-June 2011, and July-Aug 2011.
Thank you and Questions?