Kernelized Discriminant Analysis and Adaptive
Methods for Discriminant Analysis

Haesun Park


Georgia Institute of Technology,

Atlanta, GA, USA

(joint work with C. Park)

KAIST, Korea, June 2007

Clustering and Classification

- Clustering: grouping of data based on similarity measures
- Classification: assigning a class label to new, unseen data

Data Mining

Mining or discovery of new information - patterns or rules - from large databases.

- Data preparation and preprocessing
- Classification
- Clustering
- Association analysis
- Regression
- Probabilistic modeling, ...
- Dimension reduction: feature selection, data reduction

Feature Extraction

Optimal feature extraction:
- Reduce the dimensionality of the data space
- Minimize the effects of redundant features and noise

Then apply a classifier to predict the class label of new data.

[Figure: classifier accuracy vs. number of features - the curse of dimensionality]

Linear Dimension Reduction

Maximize class separability in the reduced dimensional space.

What if the data is not linearly separable?
-> Nonlinear dimension reduction

Contents

- Linear Discriminant Analysis
- Nonlinear Dimension Reduction based on Kernel Methods
  - Nonlinear Discriminant Analysis
- Application to Fingerprint Classification











Linear Discriminant Analysis (LDA)

For a given data set {a_1, ..., a_n} with r classes:

Centroids:

  c_i = (1/n_i) * sum_{a in class i} a        (class centroid)
  c   = (1/n)   * sum_{i=1..n} a_i            (global centroid)

Within-class scatter matrix:

  S_w = sum_{i=1..r} sum_{a in class i} (a - c_i)(a - c_i)^T

  trace(S_w) = sum_{i=1..r} sum_{a in class i} ||a - c_i||^2

Between-class scatter matrix:

  S_b = sum_{i=1..r} n_i (c_i - c)(c_i - c)^T

  trace(S_b) = sum_{i=1..r} n_i ||c_i - c||^2
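The two scatter matrices can be formed directly from the definitions above; a minimal numpy sketch (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within-class (S_w) and between-class (S_b) scatter matrices.

    A: (m, n) array, one data point a_i per column.
    labels: length-n array of class labels.
    """
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)          # global centroid
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for cls in np.unique(labels):
        A_i = A[:, labels == cls]              # columns belonging to class i
        n_i = A_i.shape[1]
        c_i = A_i.mean(axis=1, keepdims=True)  # class centroid
        D = A_i - c_i
        S_w += D @ D.T                         # sum of (a - c_i)(a - c_i)^T
        S_b += n_i * (c_i - c) @ (c_i - c).T   # n_i (c_i - c)(c_i - c)^T
    return S_w, S_b
```

A useful sanity check is that S_w + S_b equals the total scatter matrix computed around the global centroid.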





Find the linear transformation G^T that

  maximizes trace(G^T S_b G)  and  minimizes trace(G^T S_w G),

mapping the data a_1, ..., a_n to G^T a_1, ..., G^T a_n. A criterion combining both is

  J_1(G) = max_G trace( (G^T S_w G)^{-1} (G^T S_b G) ).

This leads to the generalized eigenvalue problem

  S_b x = lambda * S_w x,   i.e.   S_w^{-1} S_b X = X * Lambda,

so G consists of the eigenvectors of S_w^{-1} S_b corresponding to the largest eigenvalues.

  rank(S_b) <= number of classes - 1
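When S_w is nonsingular, the eigenvalue problem above can be solved directly; a minimal numpy sketch (names are illustrative):

```python
import numpy as np

def lda_transform(S_w, S_b, k):
    """Columns of G are the k leading eigenvectors of S_w^{-1} S_b.

    Since rank(S_b) <= number of classes - 1, at most that many
    eigenvalues are nonzero, so k is usually set to classes - 1.
    """
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1]   # largest eigenvalues first
    return evecs[:, order[:k]].real        # (m, k) transformation G
```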


Face Recognition

Each 92 x 112 face image is a vector in a 10304-dimensional space. Apply the dimension-reducing transformation G^T to maximize the distances among classes.

Text Classification

A bag of words: each document is represented by the frequencies of the words it contains.

  Education: faculty, student, syllabus, grade, tuition, ...
  Recreation: movie, music, sport, Hollywood, theater, ...

Compute S_b and S_w from the term-frequency vectors and apply G^T.

Generalized LDA Algorithms

  S_b x = lambda * S_w x,   i.e.   S_w^{-1} S_b x = lambda * x

Undersampled problems: high dimensionality and a small number of data points. Then S_w is singular, so S_w^{-1} S_b cannot be computed.
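The undersampled failure can be seen numerically: with n points in m dimensions, rank(S_w) <= n - (number of classes), so S_w is singular whenever m >= n. A small illustrative check (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 10                       # dimension far larger than sample count
A = rng.standard_normal((m, n))
labels = np.repeat([0, 1], n // 2)   # two classes of 5 points each

S_w = np.zeros((m, m))
for cls in (0, 1):
    A_i = A[:, labels == cls]
    D = A_i - A_i.mean(axis=1, keepdims=True)
    S_w += D @ D.T

# rank(S_w) <= n - (number of classes) = 8, far below m = 100,
# so S_w is singular and S_w^{-1} S_b does not exist
assert np.linalg.matrix_rank(S_w) <= n - 2
```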

Nonlinear Dimension Reduction based on Kernel Methods

Apply a nonlinear mapping Phi, then a linear dimension reduction G^T in the mapped space. For example,

  Phi(x_1, x_2) = (x_1^2, sqrt(2) x_1 x_2, x_2^2)
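For this particular Phi, the feature-space inner product equals the squared input-space inner product, <Phi(x), Phi(y)> = (<x, y>)^2, so the degree-2 polynomial kernel computes it without ever forming Phi. A quick check:

```python
import numpy as np

def phi(x):
    """Explicit feature map (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = phi(x) @ phi(y)        # inner product in the mapped space
rhs = (x @ y) ** 2           # degree-2 polynomial kernel in input space
assert np.isclose(lhs, rhs)  # both equal 1.0
```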


Kernel Method

If a kernel function k(x, y) satisfies Mercer's condition, then there exists a mapping Phi for which

  <Phi(x), Phi(y)> = k(x, y)

holds: inner products < x, y > in the original space A become <Phi(x), Phi(y)> = k(x, y) in the mapped space Phi(A).

For a finite data set A = [a_1, ..., a_n], Mercer's condition can be rephrased as: the kernel matrix

  K = [ k(a_i, a_j) ]_{i,j = 1..n}

is positive semi-definite.

Nonlinear Dimension Reduction by Kernel Methods

Given a kernel function k(x, y) with <Phi(x), Phi(y)> = k(x, y), linear dimension reduction G^T can be carried out in the feature space using only kernel evaluations.

Positive Definite Kernel Functions

- Gaussian kernel:    k(x, y) = exp( -||x - y||^2 / sigma )
- Polynomial kernel:  k(x, y) = ( gamma_1 <x, y> + gamma_2 )^d,  d in N
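Both kernels, and the positive semi-definiteness of the resulting kernel matrix, are easy to sketch in numpy (parameter names follow the formulas above; the data is arbitrary):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / sigma)
    return np.exp(-np.sum((x - y) ** 2) / sigma)

def polynomial_kernel(x, y, gamma1=1.0, gamma2=1.0, d=2):
    # k(x, y) = (gamma1 <x, y> + gamma2)^d
    return (gamma1 * (x @ y) + gamma2) ** d

# Mercer check on a finite sample: the kernel matrix must be PSD
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))                    # 5 points in R^3
K = np.array([[gaussian_kernel(a, b) for b in A] for a in A])
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)     # all eigenvalues >= 0
```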










Nonlinear Discriminant Analysis using Kernel Methods

Map the data {a_1, a_2, ..., a_n} to {Phi(a_1), ..., Phi(a_n)} and apply LDA there, i.e. solve S_b x = lambda * S_w x in the feature space. Since only inner products <Phi(x), Phi(y)> = k(x, y) are needed, it suffices to form the kernel matrix

  K = [ k(a_1, a_1)  ...  k(a_1, a_n)
        ...               ...
        k(a_n, a_1)  ...  k(a_n, a_n) ]

which turns the problem into S_b u = lambda * S_w u in the coefficient space; then apply generalized LDA algorithms.
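One concrete way to carry out this step, for two classes, is the kernel Fisher discriminant in the style of Mika et al. (cited in the references); the sketch below is an illustrative reconstruction using the kernelized scatter matrices, not the exact algorithm of the slides:

```python
import numpy as np

def kfd_direction(K, labels, reg=1e-3):
    """Binary kernel Fisher discriminant coefficients.

    K: (n, n) kernel matrix K[i, j] = k(a_i, a_j).
    labels: length-n array with two classes.
    The discriminant value of training point j is (K @ alpha)[j].
    """
    n = K.shape[0]
    m = []                                    # class means in kernel coords
    N = np.zeros((n, n))                      # kernelized within-class scatter
    for cls in np.unique(labels):
        idx = labels == cls
        n_c = idx.sum()
        K_c = K[:, idx]                       # kernel columns of this class
        m.append(K_c.mean(axis=1))
        N += K_c @ (np.eye(n_c) - np.ones((n_c, n_c)) / n_c) @ K_c.T
    diff = m[0] - m[1]
    # N is singular, so regularize before solving (cf. RLDA below)
    return np.linalg.solve(N + reg * np.eye(n), diff)
```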




Generalized LDA Algorithms

  S_b x = lambda * S_w x

- Minimize trace(x^T S_w x):  x^T S_w x = 0  iff  x in null(S_w)
- Maximize trace(x^T S_b x):  x^T S_b x != 0  iff  x in range(S_b)



Generalized LDA Algorithms

- RLDA: add a positive diagonal matrix eps*I to S_w so that S_w + eps*I is nonsingular.
- LDA/GSVD: apply the generalized singular value decomposition (GSVD) to {H_w, H_b}, where S_b = H_b H_b^T and S_w = H_w H_w^T.
- To-N(S_w): project onto the null space of S_w; maximize the between-class scatter in the projected space.
- To-R(S_b): transform to the range space of S_b; diagonalize the within-class scatter matrix in the transformed space.
- To-NR(S_w): reduce the data dimension by PCA; maximize the between-class scatter in range(S_w) and null(S_w).
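The RLDA fix is easy to verify numerically: adding eps*I restores full rank, so the inverse exists again. A small illustrative check (sizes and eps are arbitrary):

```python
import numpy as np

# Undersampled setting: 50-dimensional space, S_w built from 8 vectors
rng = np.random.default_rng(2)
H_w = rng.standard_normal((50, 8))      # S_w = H_w H_w^T has rank <= 8
S_w = H_w @ H_w.T
assert np.linalg.matrix_rank(S_w) < 50  # singular: plain LDA breaks down

eps = 1e-3
S_w_reg = S_w + eps * np.eye(50)        # RLDA: now nonsingular
assert np.linalg.matrix_rank(S_w_reg) == 50
```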


Data Sets

  Data      dim   no. of data   no. of classes
  Musk      166   6599          2
  Isolet    617   7797          26
  Car       6     1728          4
  Mfeature  649   2000          10
  Bcancer   9     699           2
  Bscale    4     625           3

From the Machine Learning Repository database.

Experimental Settings

Split the original data into training data and test data. From the training data, compute a kernel function k and a dimension-reducing linear transformation G^T. Then predict the class labels of the test data using the training data.

[Figure: prediction accuracies of linear and nonlinear discriminant analysis methods on the data sets above and on face recognition; each color represents a different data set.]

Application of Nonlinear Discriminant Analysis to Fingerprint Classification

Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch.

Images from NIST Fingerprint Database 4.

Previous Works in Fingerprint Classification

Feature representation:
- Minutiae
- Gabor filtering
- Directional partitioning

Applied classifiers:
- Neural networks
- Support vector machines
- Probabilistic neural networks

Our Approach

- Construct core directional images by the discrete Fourier transform (DFT)
- Reduce the dimension by nonlinear discriminant analysis

Construction of Core Directional Images

- Computation of local dominant directions by DFT and directional filtering
- Core point detection
- Reconstruction of core directional images

Fast computation of the DFT via the FFT; reliable for low-quality images. A 512 x 512 fingerprint image yields a 105 x 105 core directional image.

Nonlinear Discriminant Analysis

The 105 x 105 core directional images live in an 11025-dimensional space. Apply G^T to maximize the separability of the five classes (left loop, whorl, right loop, tented arch, arch) in the reduced 4-dimensional space.

Comparison of Experimental Results

Prediction accuracies (%) on NIST Database 4 at several rejection rates:

  Rejection rate (%)          0      1.8    8.5    20.0
  Nonlinear LDA/GSVD          90.7   91.3   92.8   95.3
  PCASYS+                     89.7   90.5   92.8   95.6
  Jain et al. [1999, TPAMI]   -      90.0   91.2   93.5
  Yao et al. [2003, PR]       -      90.0   92.2   95.6

Summary

- Nonlinear feature extraction based on kernel methods
  - Nonlinear discriminant analysis
  - Kernel Orthogonal Centroid method (KOC)
- A comparison of generalized linear and nonlinear discriminant analysis algorithms
- Application to fingerprint classification


Dimension reduction
-

feature transformation :


linear combination of original features


Feature selection :


select a part of original features


gene expression microarray data anaysis


--

gene selection


Visualization of high dimensional data


Visual data mining



Core Point Detection

Let θ_{i,j} be the dominant direction on the neighborhood centered at (i, j). Measure the consistency of the local dominant directions over a 3 x 3 block:

  | sum_{i,j = -1,0,1} [ cos(2θ_{i,j}), sin(2θ_{i,j}) ] |

i.e. the distance from the starting point to the finishing point when the direction vectors are chained. The point with the lowest value is taken as the core point.
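The consistency measure is a few lines of numpy; a minimal sketch (the doubling of the angles makes directions θ and θ + π equivalent, as required for ridge orientations):

```python
import numpy as np

def direction_consistency(theta):
    """Consistency of local dominant directions on a 3x3 neighborhood.

    theta: 3x3 array of dominant directions theta[i, j] in radians.
    Returns | sum_{i,j} [cos(2*theta_ij), sin(2*theta_ij)] |.
    """
    v = np.stack([np.cos(2 * theta), np.sin(2 * theta)], axis=-1)
    return np.linalg.norm(v.sum(axis=(0, 1)))

# Identical directions -> maximal consistency (9 unit vectors aligned)
assert np.isclose(direction_consistency(np.zeros((3, 3))), 9.0)
```

Near a core point the directions vary rapidly, so the chained vectors nearly cancel and the measure approaches zero.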

References

- L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33:1713-1726, 2000
- P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1):165-179, 2003
- H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, 34:2067-2070, 2001
- J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36:563-566, 2003
- H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003
- S. Mika et al., Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, J. Larsen and S. Douglas (eds.), pp. 41-48, IEEE, 1999
- B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998
- G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12:2385-2404, 2000
- V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12:568-574, 2000
- S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2):263-270, 2002
- C.H. Park and H. Park, Nonlinear discriminant analysis based on generalized singular value decomposition, SIMAX, 27(1):98-102, 2005
- A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999
- Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2):397-406, 2003
- C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4):801-810
- C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear
- C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006