Proximal Support Vector Machine Using Local Information


Xubing Yang 1,2, Songcan Chen 1, Bin Chen 1, Zhisong Pan 3

1 Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China
2 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
3 Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007, China


Abstract: Unlike standard support vector machines (SVMs), which classify points into one of two disjoint half-spaces by solving a quadratic program, the plane classifier GEPSVM (Proximal SVM Classification via Generalized Eigenvalues) classifies points by assigning them to the closer of two nonparallel planes, each generated by its corresponding generalized eigenvalue problem. A simple geometric interpretation of GEPSVM is that each plane is closest to the points of its own class and furthest from the points of the other class. Analysis and experiments have demonstrated its capability in both computation time and test correctness. In this paper, following the geometric intuition of GEPSVM, a new supervised learning method called Proximal Support Vector Machine Using Local Information (LIPSVM) is proposed. By introducing proximity information between the training points (underlying information such as correlation or similarity between points), LIPSVM not only keeps the aforementioned characteristics of GEPSVM but also offers additional advantages: (1) robustness to outliers; (2) each plane is generated from its corresponding standard rather than generalized eigenvalue problem, which avoids matrix singularity; (3) classification ability comparable to the eigenvalue-based classifiers GEPSVM and LDA. Furthermore, the idea of LIPSVM can easily be extended to other classifiers, such as LDA. Finally, experiments on artificial and benchmark datasets show the effectiveness of LIPSVM.

Keywords: proximal classification; eigenvalue; manifold learning; outlier

1. Introduction

Standard support vector machines (SVMs) are based on the structural risk minimization (SRM) principle and aim at maximizing the margin between the points of a two-class data set. For pattern recognition, they have shown outstanding classification performance [1] in many applications, such as handwritten digit recognition [2,3], object recognition [4], speaker identification [5], face detection in images and text categorization [6,7]. Although the SVM is a powerful classification tool, it requires the solution of a quadratic programming (QP) problem.

(* Corresponding author. Tel: +86-25-84896481-12106; Fax: +86-25-84498069; email: s.chen@nuaa.edu.cn (Songcan Chen), xbyang@nuaa.edu.cn (Xubing Yang), b.chen@nuaa.edu.cn (Bin Chen) and pzsong@nuaa.edu.cn (Zhisong Pan).)

Recently, Fung and Mangasarian introduced a linear classifier, the proximal support vector machine (PSVM) [8], at KDD 2001 as a variation of the standard SVM. Different from SVM, PSVM replaces the inequality constraints with equality constraints in the defining constraint structure of the SVM framework. Besides, it also replaces the absolute error measure with the squared error measure in defining the minimization problem. The authors claimed that, in doing so, the computational complexity can be greatly reduced without a discernible loss of classification accuracy. Furthermore, PSVM classifies two-class points to the closest of two parallel planes that are pushed apart as far as possible. GEPSVM [9] is an alternative version of PSVM which relaxes the parallelism condition of PSVM; each of the nonparallel proximal planes is generated by a generalized eigenvalue problem. Its performance, in both computational time and test accuracy, has been shown in [9]. Similar to PSVM, the geometric interpretation of GEPSVM is that each of the two nonparallel planes is as close as possible to one of the two-class data sets and as far as possible from the other class data set.


In this paper, we propose a new nonparallel plane classifier, LIPSVM, for binary classification. Different from GEPSVM, LIPSVM introduces proximity information between the data points into the construction of the classifier. This so-called proximity information is often measured by the nearest-neighbor relations lurking in the points. Cover and Hart [10] first concluded that almost half of the classification information is contained in the nearest neighbors. For the purpose of classification, the basic assumption here is that points sampled from the same class have higher correlation/similarity (for example, they are sampled from an identical unknown distribution) than those from different ones [11].
Furthermore, especially in recent years, many researchers have reported that most of the points of a dataset are highly correlated, at least locally, or that the data set has an inherent geometrical property (for example, a manifold structure) [12,13]. This observation explains the successes of the increasingly popular manifold learning methods, such as Locally Linear Embedding (LLE) [14], ISOMAP [15], Laplacian Eigenmap [16] and their extensions [17,18,19]. Although those algorithms are efficient at discovering the intrinsic features of a lower-dimensional manifold embedded in the original high-dimensional observation space, many open problems have not yet been efficiently solved for supervised learning, for instance data classification. One of the most important reasons is that it is not necessarily reasonable to suppose that the manifolds of different classes will be well separated in the same lower-dimensional embedded space. Furthermore, the intrinsic dimensionality of a manifold is usually unknown a priori and cannot be reliably estimated from the dataset.

In this paper, quite differently from the aforementioned manifold learning methods, LIPSVM need not consider how to estimate the intrinsic dimension of the embedded space; it only requires the proximity information between points, which can be derived from their nearest neighbors.

We highlight the contributions of this paper.


1) Instead of the generalized eigenvalue problems in the GEPSVM algorithm, LIPSVM only needs to solve standard eigenvalue problems.

2) By introducing this proximity information into the construction of LIPSVM, we expect the resulting classifier to be robust to outliers.

3) In essence, GEPSVM is derived from a generalized eigenvalue problem through minimizing a kind of Rayleigh quotient. For the two real symmetric matrices appearing in the GEPSVM criterion, if both are positive semi-definite or singular, an ill-defined operation arises due to floating-point imprecision. So GEPSVM adds a perturbation to one of the singular (or positive semi-definite) matrices. Although the authors claimed that this perturbation acts as some kind of regularization, the real influence of this regularization setting is not yet well understood. In contrast, LIPSVM need not care about matrix singularity, owing to a formulation similar to the Maximum Margin Criterion (MMC) [20]; it is worth noting, however, that MMC is a dimensionality reduction method rather than a classification method.

4) The idea of LIPSVM is applicable to a wide range of binary classifiers, such as LDA.

The rest of this paper is organized as follows. In Section 2, we review the basics of GEPSVM. A mathematical description of LIPSVM appears in Section 3. In Section 4, we extend the design idea of LIPSVM to LDA. In Section 5, we provide experimental results on some artificial and public datasets. Finally, we conclude the whole paper in Section 6.

2. A brief review of the GEPSVM algorithm

The GEPSVM algorithm [9] has been validated as effective for binary classification. In this section, we briefly illustrate its main idea.

Consider a training set of two pattern classes, i = 1, 2, with N_i n-dimensional patterns in the i-th class. Throughout the paper, the superscript T denotes transposition, and e is a column vector, of case-dependent dimension, whose entries are all ones. Denote the training set by an N_1 × n matrix A (A_i, the i-th row of A, corresponds to the i-th pattern of Class 1) and an N_2 × n matrix B (B_i has the same meaning as A_i), respectively. GEPSVM attempts to seek two non-parallel planes (eq. (1)) in the n-dimensional input space,

    x^T w_1 - r_1 = 0,    x^T w_2 - r_2 = 0,    (1)

where w_i and r_i denote the weight (normal) vector and the threshold of the i-th plane. The geometric interpretation, namely that each plane should be closest to the points of its own class and furthest from the points of the other class, leads to the following optimization problem (stated here for the first plane),




    min_{(w_1, r_1) ≠ 0}  ( ||A w_1 - e r_1||^2 + δ ||[w_1^T  r_1]^T||^2 ) / ||B w_1 - e r_1||^2,    (2)

where δ is a nonnegative regularization factor and ||·|| denotes the 2-norm.

Let

    G := [A  -e]^T [A  -e] + δ I,    H := [B  -e]^T [B  -e],    z := [w_1^T  r_1]^T;    (3)

then, with respect to the first plane of (1), formula (2) becomes

    min_{z ≠ 0}  (z^T G z) / (z^T H z),    (4)

where both matrices G and H are positive semi-definite when δ = 0. Formula (4) can be solved by the following generalized eigenvalue problem:

    G z = λ H z    (z ≠ 0).    (5)

When either of G and H in eq. (5) is a positive definite matrix, the global minimum of (4) is achieved at an eigenvector of the generalized eigenvalue problem (5) corresponding to the smallest eigenvalue. So in many real-world cases, the regularization factor δ must be set to a positive constant, especially in some Small Sample Size (SSS) problems. The second plane can be obtained by a similar process.
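For concreteness, the generalized eigenvalue step of eqs. (3)-(5) can be sketched as follows. This is only a minimal illustration in Python/NumPy of the standard GEPSVM recipe (the authors' own code is in Matlab); the function name and the default value of δ are ours.

import numpy as np
from scipy.linalg import eig

def gepsvm_plane(A, B, delta=1e-3):
    # Plane for Class 1: solve G z = lambda H z (eq. (5)) and take the
    # eigenvector associated with the smallest eigenvalue.
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    Ae = np.hstack([A, -e1])                      # [A  -e]
    Be = np.hstack([B, -e2])                      # [B  -e]
    G = Ae.T @ Ae + delta * np.eye(Ae.shape[1])   # eq. (3), with regularization
    H = Be.T @ Be
    vals, vecs = eig(G, H)                        # generalized eigenvalue problem
    vals = np.real(vals)
    vals[~np.isfinite(vals)] = np.inf             # guard against singular H
    z = np.real(vecs[:, np.argmin(vals)])
    w, r = z[:-1], z[-1]                          # z = [w^T  r]^T
    return w, r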

Under the foresaid optimization criterion, GEPSVM attempts to estimate planes in the input space for the given data; that is, each plane is generated or approximated by the data points of its corresponding class. In essence, the points of the same class are fitted with a linear function. However, from the viewpoint of regression, it is not quite reasonable to treat an outlier (a point far from most samples) as a normal sample when fitting data in the presence of outliers. Furthermore, outliers usually carry erroneous information and may even misguide the fit, so in most cases they can heavily affect the fitting result, as shown in Fig. 1, where two outliers are added. Obviously, the plane generated by GEPSVM is heavily biased due to the presence of the two outliers. Therefore, in this paper, we attempt to define a new robust criterion to seek planes that not only substantially reflect the original data distribution but are also resistant to outliers (see the red dashed lines in Fig. 1, which are generated by LIPSVM).



Fig. 1 The planes learned by GEPSVM and LIPSVM, respectively. The red (dashed) lines come from LIPSVM, and the black (solid) ones from GEPSVM. Data points in Class 1 are marked with "o", and those in Class 2 with "□". The symbols with an additional "+" stand for marginal points (k2-nearest neighbors to Class 1) in Class 2. The two points far away from most of the data points in Class 1 can be regarded as outliers. The figure also illustrates the intra-class graph and the connection relationships of the data points in Class 1.


In what follows, we detail our LIPSVM.

3. Proximal Support Vector Machine Using Local Information (LIPSVM)

In this section, we introduce our novel classifier LIPSVM, which consists of the following two steps.

In the first step, two graphs are constructed to characterize the intra-class denseness and the inter-class separability, respectively. Each vertex in the graphs corresponds to a sample of the given data, as in many graph-based machine learning methods [13]. Due to the one-to-one correspondence between vertices and samples, we will not strictly distinguish them hereafter. In the intra-class graph, an edge between a vertex pair is added when the corresponding sample pair are each other's k1-nearest neighbors (k1-NN) in the same class. In the inter-class graph, a vertex pair whose corresponding samples come from different classes is connected when one of the pair is a k2-NN of the other. For the intra-class case, points in high-density regions (hereafter called interior points) have a greater chance of becoming nonzero-degree vertices, while points in low-density regions, for example outliers, are more likely to become isolated (zero-degree) vertices. For the inter-class case, points in the marginal regions (marginal points) are more likely to become nonzero-degree vertices.

Intuitively, if a fitting plane of one class is far away from the marginal points of the other class, then, at least in the linearly separable case, this plane is likely to be far away from the rest of that class as well. In the second step, only those nonzero-degree points are used to train the classifier. Thus LIPSVM can restrain outliers to a great extent (see Fig. 1).

The training time cost of LIPSVM comes from two aspects: one is the selection of interior points and marginal points, and the other is the optimization of LIPSVM itself. The following analysis indicates that LIPSVM runs faster than GEPSVM: 1) LIPSVM just requires solving a standard eigenvalue problem, while GEPSVM needs to solve a generalized eigenvalue problem; 2) after sample selection, the number of selected samples used to train LIPSVM is smaller than that used by GEPSVM. For example, in Fig. 1, the first plane of LIPSVM (the top dashed line in Fig. 1) is closer to the interior points of Class 1 and far away from the marginal points of Class 2. Accordingly, the training set size of LIPSVM is smaller than that of GEPSVM.

In the next subsection, we first derive the linear LIPSVM. Then, we develop its corresponding nonlinear version with kernel tricks.


3.1 Linear LIPSVM

Corresponding to Fig. 1, the two adjacency matrices associated with the two planes are denoted by S and R, respectively, and are defined as follows:

    (6)

    (7)

where the first neighborhood set consists of the k1-nearest neighbors of a sample within its own class, and the second consists of the k2-nearest neighbors (k2-NN) of the sample in the other class. When a sample belongs to the corresponding neighborhood set of another sample, or vice versa, an undirected edge between the two is added to the corresponding graph. As a result, a linear plane of LIPSVM can be produced from those nonzero-degree vertices.
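To make the selection concrete, the sketch below shows one way the nonzero-degree (interior and marginal) points for the Class-1 plane could be gathered in Python. The exact edge rules of eqs. (6)-(7) are not fully recoverable from this copy, so the mutual-k1-NN convention used for the intra-class graph and the one-sided k2-NN rule for the inter-class graph are assumptions made only for illustration; the function name select_points is ours.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_points(X1, X2, k1=3, k2=3):
    # Intra-class graph on Class 1: an undirected edge only for mutual k1-NN
    # pairs, so isolated (zero-degree) points such as outliers are dropped.
    nbrs1 = NearestNeighbors(n_neighbors=k1 + 1).fit(X1)
    knn1 = nbrs1.kneighbors(X1, return_distance=False)[:, 1:]   # drop self
    degree = np.zeros(len(X1), dtype=int)
    for l, neigh in enumerate(knn1):
        for j in neigh:
            if l in knn1[j]:            # mutual neighbours -> undirected edge
                degree[l] += 1
    interior = X1[degree > 0]           # nonzero-degree vertices of Class 1

    # Inter-class graph: a Class-2 point is "marginal" for the Class-1 plane
    # if it is among the k2 nearest neighbours of some Class-1 point.
    nbrs2 = NearestNeighbors(n_neighbors=k2).fit(X2)
    knn2 = nbrs2.kneighbors(X1, return_distance=False)
    marginal = X2[np.unique(knn2)]      # nonzero-degree vertices of Class 2
    return interior, marginal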


3.1.1 Optimization criterion of LIPSVM

Analogously to GEPSVM, LIPSVM also seeks two nonparallel planes as described in eq. (1). With a geometric intuition similar to that of GEPSVM, we define the following optimization criterion to determine the plane of Class 1:

    (8)

    (9)

By simplifying (8), we obtain the following expression:

    (10)

where d_l and f_m are nonnegative weights. For geometrical interpretability (see 3.1.3), we define the weights d_l and f_m as:

    (11)

    (12)

Next we discuss how to solve this optimization problem.


3.1.2 Analytic Solution and Theoretical Justification

Define a Lagrangian function based on the objective function (10) and the equality constraint (9) as follows:

    (13)

Setting the gradients of L with respect to w_1 and r_1 equal to zero gives the following optimality conditions:

    (14)

    (15)

Simplifying (14) and (15), we obtain the following compact matrix-form expressions:

    (16)

    (17)

Depending on whether the variable r_1 remains in expression (17), we discuss the solutions of the proposed optimization criterion in two cases.

1) In the first case, the optimal solution of the above optimization problem is obtained by solving the following eigenvalue problem after substituting eq. (17) into (16):

    (18)

    (19)

2) In the second case, r_1 disappears from eq. (17), and eq. (17) becomes:

    (20)

Left-multiplying eq. (16) by a suitable vector and substituting eq. (20) into it, we obtain a new expression:

    (21)

When the Lagrange multiplier is an eigenvalue of the real symmetric matrix involved, the solutions of eq. (21) can be obtained from a standard eigen-equation (the details are described in Theorem 1). In this situation, r_1 cannot be solved through (16) and (17); instead, we directly define r_1 from the intra-class vertices as follows:

    (22)

An intuition for the above definition is that a fitting plane/line passing through the center of the given points has a smaller regression loss in the MSE sense [21].

Next, and importantly, we prove that the optimal normal vector w_1 of the first plane is exactly an eigenvector of the aforementioned eigenvalue problem corresponding to the smallest eigenvalue (Theorem 2).

Theorem 1. Let M be a real symmetric matrix and z be any unit vector. If z is a solution of the eigen-equation M z = λ z, then it must satisfy z^T M z = λ, where λ is the associated eigenvalue. Conversely, if z satisfies z^T M z = λ and is an eigenvector of the matrix M, then z and λ must satisfy M z = λ z.

Theorem 2. The optimal normal vector w_1 of the first plane is exactly the eigenvector corresponding to the smallest eigenvalue of the eigen-equation derived from objective (8) subject to constraint (9).

Proof: We rewrite eq. (10) (equivalent to objective (8)) with suitable auxiliary notation; simplifying the resulting expression and representing it in matrix form, we obtain

    (23)

1) In the first case, substituting eqs. (18), (19) and (9) into (23), we get the following expression:

    (24)

Thus, the optimal value of the optimization problem is the smallest eigenvalue of the eigen-equation (18); namely, w_1 is an eigenvector corresponding to the smallest eigenvalue.

2) In the second case, eq. (17) becomes

    (25)

Eq. (25) is equivalent to the following expression:

    (26)

Substituting (26) and (21) into (23), we again obtain a quantity that is an eigenvalue of eq. (18).

This ends our proof of Theorem 2. #
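Because the explicit matrices of eqs. (13)-(22) are lost in this copy of the paper, the sketch below only illustrates the two computational ingredients that the section does fix: the optimal normal vector w1 is the eigenvector of a real symmetric matrix associated with its smallest eigenvalue (Theorem 2), and, in the second case, r1 is chosen so that the plane passes through the center of the selected intra-class points (the intuition behind eq. (22)). The MMC-style difference-of-second-moments construction of the matrix M below is an assumption made for illustration, not the paper's exact formula.

import numpy as np

def lipsvm_plane(interior, marginal):
    # interior: selected Class-1 points; marginal: selected Class-2 points.
    mu = interior.mean(axis=0)
    Ci = (interior - mu).T @ (interior - mu) / len(interior)   # own-class spread
    Cm = (marginal - mu).T @ (marginal - mu) / len(marginal)   # other-class spread
    M = Ci - Cm                        # assumed MMC-style symmetric matrix
    vals, vecs = np.linalg.eigh(M)     # standard (not generalized) eigenproblem
    w1 = vecs[:, np.argmin(vals)]      # Theorem 2: smallest eigenvalue
    r1 = float(w1 @ mu)                # eq. (22) intuition: plane through the
                                       # center of the selected interior points
    return w1, r1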


3.1.3 Geometrical Interpretation

According to the defined notation, eq. (10) implies that LIPSVM only concerns those samples whose weights (d_l or f_m) are greater than zero. For example, a sample will be present in eq. (10) if and only if its weight is positive. As a result, the points in the training set of LIPSVM are selectively generated by the two NN matrices S and R, which makes LIPSVM robust by eliminating or restraining the effect of outliers. Fig. 1 gives an illustration in which the two outliers in Class 1 are eliminated during the sample selection process. Similarly, a marginal point will be kept in (10) when its corresponding weight is positive. In most cases, the number of marginal points is far less than the number of given points; thus, the number of samples used to train LIPSVM is reduced. Fig. 1 also illustrates those marginal points, marked with "+".

Since the distance of a point x to a plane x^T w - r = 0 is measured as |x^T w - r| / ||w|| [22], the corresponding quadratic expression stands for the squared distance of a point to the plane. So the goal of training LIPSVM is to seek a plane that is as close to the interior points of Class 1 as possible and as far away from the marginal points of Class 2 as possible. This is quite consistent with the foresaid optimization objective. Similarly, the second plane, for the other class, can be obtained.
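The decision rule itself follows GEPSVM's proximal principle: a test point is assigned to the class whose plane is closer. A minimal sketch (the function and variable names are ours):

import numpy as np

def lipsvm_predict(X, planes):
    # planes = [(w1, r1), (w2, r2)] for Class 1 and Class 2.
    dists = np.stack([np.abs(X @ w - r) / np.linalg.norm(w) for w, r in planes],
                     axis=1)                # |x^T w - r| / ||w|| for each plane
    return np.argmin(dists, axis=1) + 1     # label 1 or 2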
In the following, we describe its nonlinear version.


3.2 Nonlinear LIPSVM (Kernelized LIPSVM)

In the real world, the problems encountered cannot always be handled effectively by linear methods. In order to make the proposed method able to accommodate nonlinear cases, we extend it to a nonlinear counterpart via the well-known kernel trick [22,23]. Nonlinear kernel-based algorithms, such as KFD [24,25,26], KCCA [27] and KPCA [28,29], usually use the kernel trick to achieve their nonlinearity. This conceptually corresponds to first mapping the input into a higher-dimensional feature space (an RKHS: Reproducing Kernel Hilbert Space) with some nonlinear transformation. The "trick" is that this mapping is never given explicitly; it is implicitly induced by a kernel. Linear methods can then be applied in this newly mapped data space, the RKHS [30,31]. Nonlinear LIPSVM also follows this process.

Rewrite the training set in sample form. For convenience of description, we define the Empirical Kernel Map (EMP; for more details please see [23,32]) as follows:

The kernel function stands for an arbitrary Mercer kernel which, for any n-dimensional vectors x and y, maps them into a real number in R. A frequently used kernel in nonlinear classification is the Gaussian kernel, exp(-||x - y||^2 / σ^2), where σ is a positive constant called the bandwidth.
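As an illustration, a Gaussian kernel and an empirical kernel map over a set of basis (training) points might be coded as below. The exact normalization used in the empirical kernel map of [23,32] may differ, so this is only a sketch under that assumption.

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # K[i, j] = exp(-||x_i - y_j||^2 / sigma^2); the bandwidth convention here
    # is assumed, as the paper's exact expression is not legible in this copy.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def empirical_kernel_map(X, basis, sigma=1.0):
    # Each sample x is represented by (K(x, z_1), ..., K(x, z_m)) over a basis
    # of training points; the linear LIPSVM machinery is then applied to these
    # mapped samples.
    return gaussian_kernel(X, basis, sigma)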


Similarly, instead of the aforementioned linear planes in the input space, we consider the following kernel-generated nonlinear planes:

    (27)

Based on the k1-NN and k2-NN relationship graphs and their associated matrices (defined analogously to S and R), we consider the following optimization criterion, instead of the original one in the input space, (8) and (9), with an entirely similar argument:

    (28)

    (29)

With manipulations analogous to the linear case, we also obtain the corresponding eigen-system. The specification of the parameters is completely analogous to that of linear LIPSVM.


3.3 Links with previous approaches

Due to its simplicity, effectiveness and efficiency, LDA is still a popular dimensionality reduction method in many applications such as handwritten digit recognition [33], face detection [34,35], text categorization [36] and target tracking [37]. However, it also has several essential difficulties, such as singularity of the scatter matrix in the SSS case and the rank-limitation problem. To attack these limitations, in [42] we previously designed an alternative LDA (AFLDA) by introducing a new discriminant criterion; AFLDA overcomes the rank limitation and, at the same time, mitigates the singularity. Li and Jiang [20] exploited the average maximal margin principle to define the so-called Maximum Margin Criterion (MMC) and derived an alternative discriminant analysis approach. Its main difference from the LDA criterion is that it adopts the trace difference, instead of the trace ratio, between the between-class scatter and the within-class scatter, thereby bypassing both the singularity and the rank limitation. In [13], Marginal Fisher Analysis (MFA) establishes a trace-ratio formulation between two scatters similar to LDA, but further incorporates the manifold structure of the given data to find the projection directions in a PCA-transformed subspace; doing so avoids the singularity. Though there are many methods to overcome these problems, their basic philosophies are similar, so we mention just a few here.
Besides, these methods are largely dimensionality reduction techniques, despite their different definitions of the scatters in the objectives; to perform a classification task after dimensionality reduction, they all use the simple and popular nearest neighbor rule, and thus are generally viewed as indirect classification methods. In contrast, SVM is a directly designed classifier based on the SRM principle that maximizes the margin between the two-class data points, and it has shown superior classification performance in most real cases. However, SVM requires solving a QP problem. Unlike SVM, as described in Section 1, PSVM [8] utilizes the 2-norm and equality constraints and only needs to solve a set of linear equations to seek two parallel planes, while GEPSVM [9] relaxes this parallelism and aims to obtain two nonparallel planes from two corresponding generalized eigenvalue problems, respectively. However, GEPSVM also encounters the singularity problem, for which the authors used a regularization technique. Recently, Guarracino and Cifarelli gave a more flexible setting of the regularization parameter to overcome the same singularity problem, and named the resulting plane classifier ReGEC [38,39]. ReGEC seeks the two planes simultaneously from a single generalized eigenvalue equation (the two planes correspond respectively to the maximal and minimal eigenvalues), instead of the two equations in GEPSVM. In 2007, an incremental version of ReGEC, termed I-ReGEC [40], was proposed; it first performs a subset selection and then uses the subset to train the ReGEC classifier for a performance gain.
.

A common point of these plane-type classifiers is that they all adopt the regularization technique to overcome the singularity. However, for one thing, the selection of the regularization factor is key to the performance of the solution and is still an open problem; for another, the introduction of the regularization term in GEPSVM unavoidably departs, in part, from its original geometric meaning. A major difference of our LIPSVM from them is that no regularization is needed, because the solution of LIPSVM is just an ordinary eigen-system. Interestingly, the relation between LIPSVM and MMC is quite similar to that between GEPSVM and regularized LDA [41]. LIPSVM is developed by fusing proximity information so as to not only keep the characteristics of GEPSVM, such as its geometric meaning, but also possess its own advantages as described in Section 1. In the extreme, when the number of nearest neighbors k is taken large enough, for instance k = N, all the given data points are used for training LIPSVM. As far as the geometric principle is concerned, i.e. each approximating plane is as close to the data points of its own class as possible and as far from the points of the other class as possible, the geometric meaning of LIPSVM is in complete accordance with that of GEPSVM. So, with the inspiration of MMC, LIPSVM can be seen as a generalized version of GEPSVM.


4. A byproduct inspired by LIPSVM

LDA [42] has been widely used for pattern classification and can be solved analytically by its corresponding eigen-system. Fig. 2 illustrates a binary classification problem in which the data points of each class are generated from a non-Gaussian distribution. The solid (black) line stands for the LDA optimal decision hyperplane, generated directly from the two-class points, while the dashed (green) line is obtained by LDA using only the so-called marginal points.


Fig. 2 Two-class non-Gaussian data points and the discriminant planes generated by LDA. The symbol "o" stands for points in Class 1, while "□" is for Class 2. The marginal points, marked "x", come from the aforementioned inter-class relationship graphs. The solid (black) line is the discriminant plane obtained by LDA with all training samples, while the dashed (green) one is obtained by LDA using only those marginal points.


From Fig. 2 and the design idea of LIPSVM, a two-step LDA can be developed by using the interior points or the marginal points of the two classes. Firstly, select those nonzero-degree points of the k-NN graphs from the training samples; secondly, train LDA with the selected points. Further analysis and experiments are discussed in Section 5.
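A sketch of this two-step variant is given below. It reuses the illustrative select_points() helper from Section 3.1 and scikit-learn's LDA, so it is only an approximation of the procedure described above, not the authors' Matlab implementation.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def two_step_lda(X1, X2, k=3, use="marginal"):
    # Step 1: keep only nonzero-degree points of the k-NN graphs.
    interior1, marginal2 = select_points(X1, X2, k1=k, k2=k)
    interior2, marginal1 = select_points(X2, X1, k1=k, k2=k)
    if use == "marginal":            # Marginal_LDA in the experiments
        A, B = marginal1, marginal2
    else:                            # Interior_LDA
        A, B = interior1, interior2
    # Step 2: train ordinary LDA on the selected points.
    X = np.vstack([A, B])
    y = np.hstack([np.ones(len(A)), 2 * np.ones(len(B))])
    return LinearDiscriminantAnalysis().fit(X, y)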

In what follows, we turn to our experimental tests and some comparisons.


5. Experimental Validations and Comparisons

To demonstrate the performance of our proposed algorithm, we report results on one synthetic toy problem and on UCI [43] real-world datasets, in two parts: 1) comparisons among LIPSVM, ReGEC, GEPSVM and LDA; 2) comparisons between LDA and its variants. The synthetic data set, named "CrossPlanes", consists of two-class samples generated respectively from two intersecting planes (lines) plus Gaussian noise. In this section, all computational times were obtained on a machine running Matlab 6.5 on Windows XP with a Pentium IV 1.0 GHz processor and 512 megabytes of memory.

5.1 Comparisons among LIPSVM, ReGEC, GEPSVM and LDA

In this subsection, we test the foresaid classifiers with linear and Gaussian kernels, respectively.

Table 1 shows a comparison of LIPSVM versus ReGEC, GEPSVM and LDA. When a linear kernel is used, ReGEC has two regularization parameters δ1 and δ2, while each of GEPSVM and LIPSVM has a single parameter: δ for GEPSVM and k for LIPSVM (1). Parameters δ1 and δ2 were selected from {10^i | i = -4, -3, ..., 3, 4}, δ and C were selected from {10^i | i = -7, -6, ..., 6, 7}, and the number of nearest neighbors k in LIPSVM was selected from {2, 3, 5, 8}, by using 10 percent of each training set as a tuning set. Following the suggestion in [9], the tuning set of GEPSVM was not returned to the training fold to learn the final classifier once the parameter was determined. When facing singularity of both augmented sample matrices, a small disturbance, such as ηI, is added to the matrix G in ReGEC.

In addition to reporting the average accuracies across the 10 folds, we also performed paired t-tests [44] comparing LIPSVM to ReGEC, GEPSVM and LDA. The p-value for each test is the probability of the observed or a greater difference under the null hypothesis that there is no difference between the test-set correctness distributions. Thus, the smaller the p-value, the less likely it is that the observed difference resulted from identical test-set correctness distributions. A typical threshold for the p-value is 0.05. For example, the p-value of the test comparing LIPSVM and GEPSVM on the Glass data set is 0.000 (< 0.05), meaning that LIPSVM and GEPSVM have different accuracies on this data set.
Table 1 shows that GEPSVM, ReGEC and LIPSVM significantly outperform LDA on the CrossPlanes data set.
.






(1) The parameter k is the number of nearest neighbors. In the experiments, we use the Euclidean distance to evaluate the nearest neighbors of the samples x_i, with the assumption k1 = k2 = k.


Table 1. Linear kernel LIPSVM, ReGEC, GEPSVM and LDA: 10-fold average testing correctness (Corr, %), standard deviation (STD), p-values, and average 10-fold training time (Time, sec.).

Glass (214x9):        LIPSVM 87.87±1.37, time 0.0012;   ReGEC 80.46±6.01*, p=0.012, time 0.0010;   GEPSVM 63.28±4.81*, p=0.000, time 0.0072;   LDA 91.029±2.08, p=0.131, time 0.0068
Iris23 (100x4):       LIPSVM 95.00±1.66, time 0.0011;   ReGEC 90.00±4.00, p=0.142, time 0.0008;    GEPSVM 93.00±3.00, p=0.509, time 0.0055;    LDA 97.00±1.53, p=0.343, time 0.0006
Sonar (208x60):       LIPSVM 80.43±2.73, time 0.0150;   ReGEC 67.14±5.14*, p=0.001, time 0.0043;   GEPSVM 76.00±2.33, p=0.200, time 0.0775;    LDA 71.57±2.07*, p=0.016, time 0.0037
Liver (345x6):        LIPSVM 72.48±2.48, time 0.0013;   ReGEC 66.74±3.67, p=0.116, time 0.0019;    GEPSVM 59.13±2.03*, p=0.002, time 0.043;    LDA 61.96±2.59*, p=0.021, time 0.0009
Cmc (1473x8):         LIPSVM 92.64±0.51, time 0.0020;   ReGEC 75.52±3.88*, p=0.000, time 0.0028;   GEPSVM 66.52±1.02*, p=0.000, time 0.0199;   LDA 67.45±0.65*, p=0.000, time 0.0020
Check (1000x2):       LIPSVM 51.60±1.30, time 0.0009;   ReGEC 51.08±2.34, p=0.186, time 0.0007;    GEPSVM 50.35±1.25, p=0.362, time 0.0098;    LDA 48.87±1.43, p=0.229, time 0.0013
Pima (746x8):         LIPSVM 76.04±1.11, time 0.0070;   ReGEC 74.88±1.70, p=0.547, time 0.0034;    GEPSVM 75.95±1.12, p=0.912, time 0.0537;    LDA 76.15±1.30, p=0.936, time 0.0021
Mushroom (8124x22):   LIPSVM 80.12±3.21, time 6.691;    ReGEC 80.82±1.87, p=0.160, time 8.0201;    GEPSVM 81.10±1.38, p=0.352, time 9.360;     LDA 75.43±2.35, p=0.138, time 6.281
CrossPlanes (200x2):  LIPSVM 96.50±1.58, time 0.0008;   ReGEC 95.00±1.00, p=0.555, time 0.0201;    GEPSVM 96.50±1.58, p=1.000, time 0.0484;    LDA 53.50±17.33*, p=0.000, time 0.0160

The p-values are from a t-test comparing each algorithm to LIPSVM. The best test accuracies are in bold. An asterisk (*) denotes a significant difference from LIPSVM based on a p-value less than 0.05, and an underlined number means the minimum training time. The data set Iris23 is a fraction of the UCI Iris dataset with versicolor vs. virginica.




Table 2 reports a comparison among the four eigenvalue-based classifiers using a Gaussian kernel. The kernel width parameter σ was chosen from {10^i | i = -4, -3, ..., 3, 4} for all the algorithms. The tradeoff parameter C for SVM was selected from the set {10^i | i = -4, -3, ..., 2, 3}, while the regularization factor δ in GEPSVM and δ1, δ2 in ReGEC were all selected from the set {10^i | i = -4, -3, ..., 3, 4}. For KFD [24], when the symmetric matrix N was singular, the regularization trick was adopted by setting N = N + ηI, where η (> 0) was selected from {10^i | i = -4, -3, ..., 3, 4} and I is an identity matrix of the same size as N. The k in the nonlinear LIPSVM is the same as in the linear case. Parameter selection was done by comparing the accuracy of each parameter combination on a tuning set consisting of a random 10 percent of each training set.

On the synthetic data set CrossPlanes, Table 2 shows that LIPSVM, ReGEC and GEPSVM also significantly outperform LDA.


Table 2. Gaussian kernel LIPSVM, ReGEC (2), GEPSVM, SVM and LDA: 10-fold average testing correctness (Corr, %), standard deviation (STD), p-values, and average 10-fold training time (Time, sec.).

WPBC (194x32):        LIPSVM 77.51±2.48, time 0.0908;   ReGEC 76.87±3.63, p=0.381, time 0.1939;    GEPSVM 63.52±3.51*, p=0.000, time 3.7370;    LDA 65.17±2.86*, p=0.001, time 0.1538
Check (1000x2):       LIPSVM 92.22±3.50, time 31.25;    ReGEC 88.10±3.92*, p=0.028, time 28.41;    GEPSVM 87.43±1.31*, p=0.001, time 40.30;     LDA 93.38±2.93, p=0.514, time 24.51
Ionosphere (351x34):  LIPSVM 98.72±4.04, time 0.4068;   ReGEC 91.46±8.26*, p=0.012, time 0.8048;   GEPSVM 46.99±14.57*, p=0.000, time 1.5765;   LDA 87.87±8.95*, p=0.010, time 0.6049
Glass (214x9):        LIPSVM 97.64±7.44, time 0.0500;   ReGEC 93.86±5.31, p=0.258, time 0.2725;    GEPSVM 71.04±17.15*, p=0.002, time 0.5218;   LDA 89.47±10.29, p=0.090, time 0.1775
Cmc (1473x8):         LIPSVM 92.60±0.08, time 34.1523;  ReGEC 93.05±1.00, p=0.362, time 40.4720;   GEPSVM 58.50±12.88*, p=0.011, time 57.2627;  LDA 82.74±4.60*, p=0.037, time 64.4223
WDBC (569x30):        LIPSVM 90.17±3.52, time 1.9713;   ReGEC 91.37±2.80, p=0.225, time 5.4084;    GEPSVM 37.23±0.86*, p=0.000, time 5.3504;    LDA 92.65±2.36, p=0.103, time 3.3480
Water (116x38):       LIPSVM 79.93±12.56, time 0.1543;  ReGEC 57.11±3.91*, p=0.003, time 0.1073;   GEPSVM 45.29±2.69*, p=0.000, time 1.3229;    LDA 66.53±18.06*, p=0.036, time 0.0532
CrossPlanes (200x2):  LIPSVM 98.75±1.64, time 0.2039;   ReGEC 98.00±0.00, p=1.00, time 1.9849;     GEPSVM 98.13±2.01, p=0.591, time 2.2409;     LDA 58.58±10.01*, p=0.000, time 1.8862

The p-values are from a t-test comparing each algorithm to LIPSVM. The best test accuracies are in bold. An asterisk (*) denotes a significant difference from LIPSVM based on a p-value less than 0.05, and an underlined number means the minimum training time.

(2) Gaussian ReGEC Matlab code available at: http://www.na.icar.cnr.it/~mariog/



5.2 Comparisons between LDA and its extensions

In this subsection, we compare LDA and its extended versions in terms of computational time and test accuracy. In order to avoid an unbalanced classification problem, the extended classifiers, respectively named Interior_LDA and Marginal_LDA, were trained on the two-class interior points and on the marginal points. Tables 3 and 4 report the comparisons between LDA and its variants, respectively.


Table 3. Linear kernel LDA and its variants Interior_LDA and Marginal_LDA: 10-fold average testing correctness ± STD, p-value, and average 10-fold training time (Time, sec.).

Glass (214x9):        LDA 91.09±6.58, time 0.200;   Interior_LDA 90.69±6.37, p=0.340, time 0.090;   Marginal_LDA 92.13±4.78, p=0.546, time 0.005
Sonar (208x60):       LDA 71.57±6.55, time 0.190;   Interior_LDA 71.27±8.33, p=0.830, time 0.070;   Marginal_LDA 72.29±8.01, p=0.867, time 0.023
Liver (345x6):        LDA 61.96±8.18, time 0.720;   Interior_LDA 67.29±6.45*, p=0.007, time 0.350;  Marginal_LDA 64.86±7.80*, p=0.024, time 0.121
Cmc (1473x8):         LDA 77.47±2.07, time 60.280;  Interior_LDA 77.92±2.34, p=0.178, time 25.227;  Marginal_LDA 75.86±3.08, p=0.56, time 0.253
Ionosphere (351x34):  LDA 84.98±6.74, time 0.860;   Interior_LDA 88.23±4.15, p=0.066, time 0.053;   Marginal_LDA 85.80±6.21, p=0.520, time 0.023
Check (1000x2):       LDA 48.84±4.43, time 0.0013;  Interior_LDA 51.65±5.98, p=0.250, time 0.00;    Marginal_LDA 52.01±4.32, p=0.130, time 0.0011
Pima (746x8):         LDA 76.17±4.12, time 0.0021;  Interior_LDA 76.58±3.33, p=0.558, time 0.0010;  Marginal_LDA 75.39±5.00, p=0.525, time 0.0005

The p-values are from a t-test comparing the LDA variants to LDA. The best test accuracies are in bold. An asterisk (*) denotes a significant difference from LDA based on a p-value less than 0.05, and an underlined number means the minimum training time.


Table 4. Gaussian kernel LDA, Interior_LDA and Marginal_LDA: 10-fold average testing correctness ± STD, p-value, and average 10-fold training time (Time, sec.).

Pima (746x8):         LDA 66.83±4.32, time 6.354;   Interior_LDA 63.46±6.45, p=0.056, time 2.671;   Marginal_LDA 65.10±6.60, p=0.060, time 0.478
Ionosphere (351x34):  LDA 91.55±6.89, time 0.658;   Interior_LDA 89.78±3.56, p=0.246, time 0.052;   Marginal_LDA 73.74±7.24*, p=0.000, time 0.019
Check (1000x2):       LDA 93.49±4.08, time 15.107;  Interior_LDA 92.54±3.30, p=0.205, time 10.655;  Marginal_LDA 86.64±3.37*, p=0.001, time 0.486
Liver (345x6):        LDA 66.04±9.70, time 0.641;   Interior_LDA 64.43±8.52, p=0.463, time 0.325;   Marginal_LDA 57.46±9.29*, p=0.012, time 0.092
Monk1 (432x6):        LDA 66.36±7.57, time 1.921;   Interior_LDA 68.89±8.86, p=0.566, time 0.381;   Marginal_LDA 65.00±9.64, p=0.732, time 0.318
Monk2 (432x6):        LDA 57.85±8.89, time 1.256;   Interior_LDA 57.65±8.76, p=0.590, time 0.0;     Marginal_LDA 67.89±8.42, p=0.390, time 0.186
Monk3 (432x6):        LDA 99.72±0.88, time 1.276;   Interior_LDA 99.72±0.88, p=1.000, time 0.243;   Marginal_LDA 98.06±1.34*, p=0.005, time 0.174

The p-values are from a t-test comparing the LDA variants to LDA. The best test accuracies are in bold. An asterisk (*) denotes a significant difference from LDA based on a p-value less than 0.05, and an underlined number means the minimum training time.


Table 3 shows that, in the linear case, LDA and its variants exhibit no significant performance difference on most data sets. But Table 4 indicates that a significant difference exists between nonlinear LDA and Marginal_LDA. Furthermore, compared with its linear counterpart, as described in [24], nonlinear LDA is more prone to singularity due to the higher (even infinite) dimensionality of the kernel space.

5.3 Comparison between LIPSVM and I-ReGEC

As mentioned before, the recently proposed I-ReGEC [40] also involves a subset selection and is closely related to our work. However, it is worth pointing out several differences between LIPSVM and I-ReGEC: 1) I-ReGEC adopts an incremental fashion to find the training subset, while LIPSVM does not; 2) I-ReGEC is sensitive to its initial selection, as its authors declared, while ours does not involve such a selection and thus does not suffer from this problem; 3) I-ReGEC seeks its two nonparallel hyperplanes from one single generalized eigenvalue problem with respect to all the classes, while LIPSVM obtains each plane from a corresponding ordinary eigenvalue problem with respect to each class; 4) I-ReGEC does not take the underlying proximity information between data points into account when constructing the classifier, while LIPSVM does.


In what follows, we compare their classification accuracies using a Gaussian kernel and tabulate the results in Table 5 (where the I-ReGEC results are copied directly from [40]).


Table 5. Test accuracies (%) of I-ReGEC and LIPSVM with a Gaussian kernel.

Dataset        I-ReGEC   LIPSVM
Banana         85.49     86.23
German         73.50     74.28
Diabetis       74.13     75.86
Bupa-liver     63.94     70.56
WPBC*          60.27     64.35
Thyroid        94.01     94.38
Flare-solar    65.11     64.89

Table 5 shows that on 6 out of the 7 datasets, the test accuracies of LIPSVM are better than or comparable to those of I-ReGEC.

6. Conclusion and Future Work

In this paper, following the geometric interpretation of GEPSVM and fusing local information into the design of the classifier, we propose a new robust plane classifier termed LIPSVM, together with its nonlinear version derived via the so-called kernel technique. Inspired by the MMC criterion for dimensionality reduction, we define a similar criterion for designing the LIPSVM classifier and then seek two nonparallel planes that respectively fit the two given classes by solving two corresponding standard, rather than generalized, eigenvalue problems as in GEPSVM. Our experimental results on most of the public datasets used here demonstrate that LIPSVM obtains testing accuracy statistically comparable to that of the foresaid classifiers. However, we also notice that, owing to the limitation of current algorithms in solving large-scale eigenvalue problems, LIPSVM inherits the same limitation. Our future work includes how to solve real large-scale classification problems that do not fit in memory, for both linear and nonlinear LIPSVM. We also plan to explore some heuristic rules to guide the parameter selection of KNN.



Acknowledgment

The corresponding author would like to thank the Natural Science Foundation of China, Grant No. 60773061, for partial support.

References

[1]. C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 1-47, 1998.
[2]. B. Scholkopf, A. Smola, K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10: 1299-1319.
[3]. B. Scholkopf, P. Simard, A. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10, Cambridge, MA: MIT Press, 1998: 640-646.
[4]. V. Blanz, B. Scholkopf, H. Bulthoff, C. Burges, V. Vapnik and T. Vetter. Comparison of view-based object recognition algorithms using realistic 3D models. In Proceedings of the International Conference on Artificial Neural Networks, (Eds.) C. von der Malsburg, W. von Seelen, J.-C. Vorbrüggen and B. Sendhoff, Springer Lecture Notes in Computer Science 1112, Bochum, 1996: 251-256.
[5]. M. Schmidt. Identifying speakers with support vector networks. In Interface '96 Proceedings, Sydney, 1996.
[6]. E. Osuna, R. Freund and F. Girosi. Training support vector machines: an application to face detection. In IEEE Conference on Computer Vision and Pattern Recognition, 1997: 130-136.
[7]. T. Joachims. Text categorization with support vector machines: learning with many relevant features. In: Proc. of the 10th European Conf. on Machine Learning, 1999: 137-142.
[8]. G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. Proc. Knowledge Discovery and Data Mining (KDD), F. Provost and R. Srikant, eds., pp. 77-86, 2001.
[9]. O. L. Mangasarian and E. W. Wild. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 28(1): 69-74, 2006.
[10]. T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13: 21-27.
[11]. Y. Wu, K. Ianakiev and V. Govindaraju. Improved k-nearest neighbor classification. Pattern Recognition 35: 2311-2318, 2002.
[12]. S. Lafon, Y. Keller and R. R. Coifman. Data fusion and multicue data matching by diffusion maps. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2006, 28(11): 1784-1797.
[13]. S. Yan, D. Xu, B. Zhang and H.-J. Zhang. Graph embedding: a general framework for dimensionality reduction. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005).
[14]. S. Roweis, L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science 290 (2000): 2323-2326.
[15]. J. B. Tenenbaum, V. de Silva and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science 290 (2000): 2319-2323.
[16]. M. Belkin, P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems 14 (NIPS 2001), pp. 585-591, MIT Press, Cambridge, 2002.
[17]. H.-T. Chen, H.-W. Chang, T.-L. Liu. Local discriminant embedding and its variants. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2005.
[18]. P. Mordohai and G. Medioni. Unsupervised dimensionality estimation and manifold learning in high-dimensional spaces by tensor voting. 19th International Joint Conference on Artificial Intelligence (IJCAI) 2005: 798-803.
[19]. K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proc. Int. Conf. on Computer Vision and Pattern Recognition, pages II: 988-995, 2004.
[20]. H. Li, T. Jiang and K. Zhang. Efficient and robust feature extraction by maximum margin criterion. In: Proc. Conf. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2004: 97-104.
[21]. M. Verbeek and T. Nijman. Minimum MSE estimation of a regression model with fixed effects from a series of cross-sections. Journal of Econometrics, 1993, 59(1-2): 125-136.
[22]. H. Zhang, W. Huang, Z. Huang and B. Zhang. A kernel autoassociator approach to pattern classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(3), June 2005.
[23]. B. Scholkopf and A. J. Smola. Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[24]. S. Mika. Kernel Fisher Discriminants. PhD thesis, Technische Universität Berlin, Germany, 2002.
[25]. S. Mika, G. Rätsch, J. Weston, B. Schölkopf and K.-R. Müller. Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson and S. Douglas, editors, Neural Networks for Signal Processing IX, pp. 41-48. IEEE, 1999.
[26]. S. Mika. A mathematical approach to kernel Fisher algorithm. In T. K. Leen, T. G. Dietterich and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 591-597, Cambridge, MA, 2001. MIT Press.
[27]. M. Kuss, T. Graepel. The geometry of kernel canonical correlation analysis. Technical Report No. 108, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, May 2003.
[28]. B. Scholkopf, A. Smola and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1998.
[29]. B. Scholkopf, A. Smola and K.-R. Müller. Kernel principal component analysis. In Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, C. J. C. Burges and A. Smola, Eds. Cambridge, MA: MIT Press, 1999.
[30]. T. P. Centeno and N. D. Lawrence. Optimizing kernel parameters and regularization coefficients for non-linear discriminant analysis. Journal of Machine Learning Research 7 (2006): 455-491.
[31]. K.-R. Müller, S. Mika, G. Rätsch, et al. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 2001, 12(2): 181-202.
[32]. M. Wang and S. Chen. Enhanced FMAM based on empirical kernel map. IEEE Transactions on Neural Networks, 16(3): 557-564.
[33]. P. Berkes. Handwritten digit recognition with nonlinear Fisher discriminant analysis. Proc. of ICANN, Vol. 2, Springer, LNCS 3696: 285-287.
[34]. C. Liu and H. Wechsler. A shape- and texture-based enhanced Fisher classifier for face recognition. IEEE Transactions on Image Processing, 2001, 10(4): 598-608.
[35]. S. Yan, D. Xu, L. Zhang, Q. Yang, X. Tang and H. Zhang. Multilinear discriminant analysis for face recognition. IEEE Transactions on Image Processing (TIP), 2007, 16(1): 212-220.
[36]. T. Li, S. Zhu, and M. Ogihara. Efficient multi-way text categorization via generalized discriminant analysis. In Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM'03): 317-324, 2003.
[37]. R. Lin, M. Yang, S. E. Levinson. Object tracking using incremental Fisher discriminant analysis. 17th International Conference on Pattern Recognition (ICPR'04) (2): 757-760, 2004.
[38]. M. R. Guarracino, C. Cifarelli, O. Seref, P. M. Pardalos. A parallel classification method for genomic and proteomic problems. 20th International Conference on Advanced Information Networking and Applications - Volume 2 (AINA'06), 2006: 588-592.
[39]. M. R. Guarracino, C. Cifarelli, O. Seref, P. M. Pardalos. A classification method based on generalized eigenvalue problems. Optimization Methods and Software, 22(1): 73-81, 2007.
[40]. C. Cifarelli, M. R. Guarracino, O. Seref, S. Cuciniello and P. M. Pardalos. Incremental classification with generalized eigenvalues. Journal of Classification, 2007, 24: 205-219.
[41]. J. Liu, S. Chen, and X. Tan. A study on three linear discriminant analysis based methods in small sample size problems. Pattern Recognition, 2008, 41(1): 102-116.
[42]. S. Chen and X. Yang. Alternative linear discriminant classifier. Pattern Recognition, 2004, 37(7): 1545-1547.
[43]. P. M. Murphy and D. W. Aha. UCI Machine Learning Repository, 1992, www.ics.uci.edu/~mlearn/MLRepository.html.
[44]. T. M. Mitchell. Machine Learning. Boston: McGraw-Hill, 1997.