Proximal Support Vector Machine Using Local Information

Xubing Yang 1,2, Songcan Chen 1, Bin Chen 1, Zhisong Pan 3

1 Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China
2 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
3 Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007, China

Abstract:
Instead of standard support vector machines (SVMs), which classify points by assigning them to one of two disjoint half-spaces obtained by solving a quadratic program, the plane classifier GEPSVM (Proximal SVM Classification via Generalized Eigenvalues) classifies points by assigning them to the closer of two nonparallel planes, each generated by its corresponding generalized eigenvalue problem. A simple geometric interpretation of GEPSVM is that each plane is closest to the points of its own class and furthest from the points of the other class. Analysis and experiments have demonstrated its effectiveness in both computation time and test correctness. In this paper, following the geometric intuition of GEPSVM, a new supervised learning method called Proximal Support Vector Machine Using Local Information (LIPSVM) is proposed. By introducing proximity information (underlying information such as correlation or similarity between points) between the training points, LIPSVM not only keeps the aforementioned characteristics of GEPSVM, but also has additional advantages: (1) robustness to outliers; (2) each plane is generated from a standard rather than a generalized eigenvalue problem, which avoids matrix singularity; (3) classification ability comparable to the eigenvalue-based classifiers GEPSVM and LDA. Furthermore, the idea of LIPSVM can be easily extended to other classifiers, such as LDA. Finally, experiments on artificial and benchmark datasets show the effectiveness of LIPSVM.

Keywords: proximal classification; eigenvalue; manifold learning; outlier
1. Introduction
Standard support vector machines (SVMs) are based on the structural risk minimization (SRM) principle and aim at maximizing the margin between the points of a two-class data set. In pattern recognition, they have shown outstanding classification performance [1] in many applications, such as handwritten digit recognition [2, 3], object recognition [4], speaker identification [5], face detection in images, and text categorization [6, 7].

(Corresponding author: Tel: +86-25-84896481-12106; Fax: +86-25-84498069; email: s.chen@nuaa.edu.cn (Songcan Chen), xbyang@nuaa.edu.cn (Xubing Yang), b.chen@nuaa.edu.cn (Bin Chen) and pzsong@nuaa.edu.cn (Zhisong Pan).)
Although SVM is a powerful classification tool, it requires the solution of a quadratic programming (QP) problem. Recently, Fung and Mangasarian introduced a linear classifier, the proximal support vector machine (PSVM) [8], at KDD 2001 as a variation of the standard SVM.
Different from SVM, PSVM replaces the inequality constraints in the SVM framework with equality constraints, and replaces the absolute error measure with a squared error measure in defining the minimization problem. The authors claimed that in doing so the computational complexity can be greatly reduced without discernible loss of classification accuracy. Furthermore, PSVM classifies two-class points by assigning them to the closer of two parallel planes that are pushed apart as far as possible.
GEPSVM [9] is an alternative to PSVM that relaxes the parallelism condition. Each of the two nonparallel proximal planes is generated by a generalized eigenvalue problem. Its performance, in both computational time and test accuracy, has been shown in [9]. Similar to PSVM, the geometric interpretation of GEPSVM is that each of the two nonparallel planes is as close as possible to one of the two class data sets and as far as possible from the other class data set.
In this paper, we propose a new nonparallel-plane classifier, LIPSVM, for binary classification. Different from GEPSVM, LIPSVM introduces proximity information between the data points into the construction of the classifier. This so-called proximity information is often measured by the nearest-neighbor relations lurking in the points. Cover and Hart [10] first concluded that almost half of the classification information is contained in the nearest neighbors. For the purpose of classification, the basic assumption here is that points sampled from the same class have higher correlation/similarity (for example, they are sampled from an identical unknown distribution) than those from different classes [11].
Furthermore, especially in recent years, many researchers have reported that most of the points of a dataset are highly correlated, at least locally, or that the data set has an inherent geometrical property (for example, a manifold structure) [12, 13]. This explains the successes of the increasingly popular manifold learning methods, such as Locally Linear Embedding (LLE) [14], ISOMAP [15], Laplacian Eigenmap [16] and their extensions [17, 18, 19]. Although those algorithms are efficient for discovering the intrinsic features of a lower-dimensional manifold embedded in the original high-dimensional observation space, many open problems in supervised learning, for instance data classification, have still not been efficiently solved. One of the most important reasons is that it is not necessarily reasonable to suppose that the manifolds of different classes will be well classified in the same lower-dimensional embedded space. Furthermore, the intrinsic dimensionality of a manifold is usually unknown a priori and cannot be reliably estimated from the dataset.
In this paper, quite different from the aforementioned manifold learning methods, LIPSVM need not consider how to estimate the intrinsic dimension of the embedded space; it only requires the proximity information between points, which can be derived from their nearest neighbors.
We highlight the contributions of this paper:

1) Instead of the generalized eigenvalue problems of the GEPSVM algorithm, LIPSVM only needs to solve standard eigenvalue problems.

2) By introducing proximity information into the construction of LIPSVM, we expect the resulting classifier to be robust to outliers.

3) In essence, GEPSVM is derived from a generalized eigenvalue problem through minimizing a kind of Rayleigh quotient. For the two real symmetric matrices appearing in the GEPSVM criterion, if both are positive semi-definite or singular, an ill-defined operation results due to floating-point imprecision. So GEPSVM adds a perturbation to one of the singular (or positive semi-definite) matrices. Although the authors claimed that this perturbation acts as some kind of regularization, the real influence of this form of regularization is not yet well understood. In contrast, LIPSVM need not care about matrix singularity, due to its adoption of a formulation similar to the Maximum Margin Criterion (MMC) [20]; it is worth noting, though, that MMC is a dimensionality reduction method rather than a classification method.

4) The idea of LIPSVM is applicable to a wide range of binary classifiers, such as LDA.
The rest of this paper is organized as follows. In Section 2, we review the basics of GEPSVM. The mathematical description of LIPSVM appears in Section 3. In Section 4, we extend the design idea of LIPSVM to LDA. In Section 5, we provide experimental results on some artificial and public datasets. Finally, we conclude the paper in Section 6.
2. A brief review of the GEPSVM algorithm
The GEPSVM algorithm [9] has been validated as effective for binary classification. In this section, we briefly illustrate its main idea. Consider a training set of two pattern classes, with N_i n-dimensional patterns in the i-th class (i = 1, 2). Throughout the paper, the superscript "T" denotes transposition, and "e" is a column vector of ones whose dimension depends on the context.
Denote the training set by an N_1 × n matrix A (A_i, the i-th row of A, corresponds to the i-th pattern of Class 1) and an N_2 × n matrix B (B_i has the same meaning as A_i), respectively. GEPSVM attempts to seek two non-parallel planes (eq. (1)) in the n-dimensional input space,

    x^T w_1 - r_1 = 0,    x^T w_2 - r_2 = 0,    (1)

where w_i and r_i denote the weight vector and threshold of the i-th plane.
The geometrical interpretation, that each plane should be closest to the points of its own class and furthest from the points of the other class, leads to the following optimization problem for the first plane:

    \min_{(w, r) \neq 0} \frac{\|Aw - e r\|^2 + \delta \|[w^T\ r]^T\|^2}{\|Bw - e r\|^2},    (2)

where \delta is a nonnegative regularization factor and \|\cdot\| denotes the 2-norm.
Let

    G := [A\ {-e}]^T [A\ {-e}] + \delta I,\quad H := [B\ {-e}]^T [B\ {-e}],\quad z := [w^T\ r]^T;    (3)
then, w.r.t. the first plane of (1), formula (2) becomes

    \min_{z \neq 0} \frac{z^T G z}{z^T H z},    (4)

where both matrices G and H are positive semi-definite when \delta = 0. Formula (4) can be solved by the following generalized eigenvalue problem:

    G z = \lambda H z \quad (z \neq 0).    (5)

When either of G and H in eq. (5) is positive definite, the global minimum of (4) is achieved at an eigenvector of the generalized eigenvalue problem (5) corresponding to the smallest eigenvalue.
So in many real-world cases the regularization factor \delta must be set to a positive constant, especially in Small Sample Size (SSS) problems. The second plane can be obtained by a similar process.
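To make the computation concrete, the plane of Class 1 can be obtained with a few lines of Python (a minimal sketch, assuming NumPy and SciPy; the function name `gepsvm_plane`, the small ridge on H, and the toy data are our own illustration, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

def gepsvm_plane(A, B, delta=1e-4):
    """Return (w, r) for the plane x'w = r that is close to the rows of A
    and far from the rows of B, via the generalized eigenvalue problem
    G z = lambda H z of eq. (5)."""
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    Ae = np.hstack([A, -e1])                      # [A  -e]
    Be = np.hstack([B, -e2])                      # [B  -e]
    G = Ae.T @ Ae + delta * np.eye(Ae.shape[1])   # regularized, as in eq. (3)
    H = Be.T @ Be
    # smallest generalized eigenvalue of G z = lambda H z
    # (a tiny ridge keeps H positive definite for the solver)
    vals, vecs = eigh(G, H + 1e-8 * np.eye(H.shape[0]))
    z = vecs[:, np.argmin(vals)]
    return z[:-1], z[-1]                          # w, r

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) + [2.0, 0.0]    # hypothetical Class 1 points
B = rng.normal(size=(30, 2)) + [-2.0, 0.0]   # hypothetical Class 2 points
w, r = gepsvm_plane(A, B)
```

By construction, the recovered plane lies near the Class 1 cluster and away from Class 2.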
Under the foregoing optimization criterion, GEPSVM estimates planes in the input space for the given data; that is, each plane is generated or approximated by the data points of its corresponding class. In essence, the points in the same class are fitted with a linear function. However, from the viewpoint of regression it is not reasonable to treat an outlier (a point far from most samples) as a normal sample when fitting data. Furthermore, outliers usually carry erroneous information and can even misguide the fit. So in most cases they can heavily affect the fitting result, as shown in Fig. 1, where two outliers have been added. Obviously, the plane generated by GEPSVM is heavily biased by the presence of the two outliers. Therefore, in this paper we attempt to define a new robust criterion to seek planes which not only substantially respect the original data distribution, but are also resistant to outliers (see the red dashed lines in Fig. 1, which are generated by LIPSVM).
Fig. 1 The planes learned by GEPSVM and LIPSVM, respectively. The red (dashed) lines come from LIPSVM, and the black (solid) ones from GEPSVM. Data points in Class 1 are marked with "o", and those in Class 2 with "□". Symbols with an additional "+" stand for marginal points (k2-nearest neighbors to Class 1) in Class 2. The two points far away from most of the data points in Class 1 can be thought of as outliers. The figure also illustrates the intra-class graph and the connectivity of the data points in Class 1.

In what follows, we detail our LIPSVM.
3. Proximal Support Vector Machine Using Local Information (LIPSVM)
In this section, we introduce our novel classifier, LIPSVM, which consists of the following two steps. In the first step, we construct two graphs that characterize the intra-class denseness and the inter-class separability, respectively. Each vertex in the graphs corresponds to a sample of the given data, as in many graph-based machine learning methods [13]. Due to the one-to-one correspondence between "vertex" and "sample", we will not strictly distinguish them hereafter.
In the intra-class graph, an edge between a vertex pair is added when the corresponding samples are each other's k1-nearest neighbors (k1-NN) in the same class. In the inter-class graph, a vertex pair whose corresponding samples come from different classes is connected when one of the pair is a k2-NN of the other.
For the intra-class case, points in high-density regions (hereafter called interior points) have more chance to become nonzero-degree vertexes, while points in low-density regions, for example outliers, are more likely to become isolated (zero-degree) vertexes. For the inter-class case, points in the marginal regions (marginal points) are more likely to become nonzero-degree vertexes. Intuitively, if a fitting plane of one class is far away from the marginal points of the other class, then, at least in the linearly separable case, this plane is likely to be far away from the rest of that class as well.
In the second step, only the nonzero-degree points are used to train the classifier. Thus, LIPSVM can restrain outliers to a great extent (see Fig. 1).
The training time cost of LIPSVM comes from two aspects: one is the selection of interior points and marginal points, and the other is the optimization of LIPSVM itself. The following analysis indicates that LIPSVM runs faster than GEPSVM: 1) LIPSVM just requires solving a standard eigenvalue problem, while GEPSVM needs to solve a generalized eigenvalue problem; 2) after sample selection, the set of selected samples used to train LIPSVM is smaller than that used by GEPSVM.
For example, in Fig. 1, the first plane of LIPSVM (the top dashed line of Fig. 1) is closer to the interior points of Class 1 and far away from the marginal points of Class 2. Accordingly, the training set size of LIPSVM is smaller than that of GEPSVM.
In the next subsection, we first derive the linear LIPSVM. Then we develop its corresponding nonlinear version with kernel tricks.
3.1 Linear
LIPSVM
Corresponding to Fig. 1, the two adjacency matrices for each plane, denoted by S and R, are respectively defined as follows:

    S_{ij} = 1 if x_j \in N_{k_1}(x_i) or x_i \in N_{k_1}(x_j), and S_{ij} = 0 otherwise;    (6)

    R_{ij} = 1 if x_j \in N_{k_2}(x_i) or x_i \in N_{k_2}(x_j), and R_{ij} = 0 otherwise;    (7)

where N_{k_1}(x_i) denotes the set of k1-nearest neighbors of the sample x_i in its own class, and N_{k_2}(x_i) the set of its k2-nearest neighbors (k2-NN) in the other class. When x_j \in N_{k_1}(x_i) or x_i \in N_{k_1}(x_j), an undirected edge between x_i and x_j is added to the corresponding graph. As a result, a linear plane of LIPSVM can be produced from the nonzero-degree vertexes.
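As an illustration, the graph construction step can be sketched in Python (a minimal NumPy sketch; we use the mutual-k-NN reading of the intra-class rule, under which isolated outliers receive zero degree, and the function names are our own):

```python
import numpy as np

def knn_indicator(X, Y, k):
    """M[i, j] = 1 iff Y[j] is one of the k nearest rows of Y to X[i]."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    M = np.zeros((len(X), len(Y)))
    np.put_along_axis(M, idx, 1.0, axis=1)
    return M

def lipsvm_graphs(A, B, k1, k2):
    """Intra-class matrix S on Class 1 (eq. (6)) and inter-class
    matrix R between Class 1 and Class 2 (eq. (7))."""
    S = knn_indicator(A, A, k1 + 1)   # +1 because each point is its own 0-NN
    np.fill_diagonal(S, 0.0)
    S = np.minimum(S, S.T)            # keep mutual k1-NN pairs only
    R = np.maximum(knn_indicator(A, B, k2), knn_indicator(B, A, k2).T)
    return S, R
```

The row sums of S and the column sums of R then give the degree weights; zero-degree rows (such as outliers) drop out of the criterion.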
3.1.1 Optimization criterion of LIPSVM
Analogously to GEPSVM, LIPSVM also tries to seek two nonparallel planes as described in eq. (1). With a geometric intuition similar to GEPSVM's, we define an optimization criterion to determine the plane of Class 1 as follows:

    \min_{w_1, r_1} J(w_1, r_1) = \sum_{i,j} S_{ij} (A_i w_1 - r_1)^2 - \sum_{i,j} R_{ij} (B_j w_1 - r_1)^2    (8)

    s.t.\quad w_1^T w_1 = 1.    (9)

By simplifying (8), we obtain the following expression:

    J(w_1, r_1) = \sum_l d_l (A_l w_1 - r_1)^2 - \sum_m f_m (B_m w_1 - r_1)^2,    (10)

where d_l and f_m are the weights. For geometrical interpretability (see 3.1.3), we define the weights d_l and f_m as

    d_l = \sum_j S_{lj},    (11)

    f_m = \sum_i R_{im},    (12)

where l = 1, ..., N_1 and m = 1, ..., N_2. Next we discuss how to solve this optimization problem.
3.1.2 Analytic Solution and Theoretical Justification
Define a Lagrangian from the objective function (10) and the equality constraint (9) as follows:

    L(w_1, r_1, \lambda) = \sum_l d_l (A_l w_1 - r_1)^2 - \sum_m f_m (B_m w_1 - r_1)^2 - \lambda (w_1^T w_1 - 1).    (13)

Setting the gradients of L with respect to w_1 and r_1 equal to zero gives the following optimality conditions:

    \sum_l d_l A_l^T (A_l w_1 - r_1) - \sum_m f_m B_m^T (B_m w_1 - r_1) - \lambda w_1 = 0,    (14)

    \sum_l d_l (A_l w_1 - r_1) - \sum_m f_m (B_m w_1 - r_1) = 0.    (15)

Simplifying (14) and (15), we obtain the following compact matrix form:

    (A^T D A - B^T F B) w_1 - (A^T d - B^T f) r_1 = \lambda w_1,    (16)

    (d^T A - f^T B) w_1 = (\sigma_d - \sigma_f) r_1,    (17)

where D = diag(d_1, ..., d_{N_1}), F = diag(f_1, ..., f_{N_2}), d = (d_1, ..., d_{N_1})^T, f = (f_1, ..., f_{N_2})^T, \sigma_d = \sum_l d_l, and \sigma_f = \sum_m f_m. When \sigma_d = \sigma_f, the variable r_1 disappears from expression (17). So we discuss the solutions of the proposed optimization criterion in two cases:

1) When \sigma_d \neq \sigma_f, the optimal solution of the above optimization problem is obtained from the following standard eigenvalue problem after substituting eq. (17) into (16):

    M w_1 = \lambda w_1, \quad M := A^T D A - B^T F B - \frac{(A^T d - B^T f)(d^T A - f^T B)}{\sigma_d - \sigma_f},    (18)

    r_1 = \frac{(d^T A - f^T B) w_1}{\sigma_d - \sigma_f}.    (19)
2) When \sigma_d = \sigma_f, r_1 disappears from eq. (17), and eq. (17) becomes

    (d^T A - f^T B) w_1 = 0.    (20)

Left-multiplying eq. (16) by w_1^T and substituting eq. (20) into it, we obtain

    w_1^T (A^T D A - B^T F B) w_1 = \lambda.    (21)

When \lambda is an eigenvalue of the real symmetric matrix A^T D A - B^T F B, the solutions of eq. (21) can be obtained from the corresponding standard eigen-equation (details are described in Theorem 1). In this situation, r_1 cannot be solved through (16) and (17). So instead, we directly define r_1 with the intra-class vertexes as follows:

    r_1 = \frac{1}{\sigma_d} \sum_l d_l A_l w_1.    (22)

An intuition for the above definition is that a fitting plane/line passing through the center of the given points has smaller regression loss in the MSE sense [21].
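The first case reduces to an ordinary symmetric eigendecomposition, which can be sketched in a few lines of Python (a minimal NumPy sketch; `lipsvm_plane` is our illustrative name, and `d`, `f` are the degree weight vectors of eqs. (11) and (12)):

```python
import numpy as np

def lipsvm_plane(A, B, d, f):
    """Case sum(d) != sum(f): eliminate r_1 via eq. (17), then solve the
    standard symmetric eigenproblem (18) for the smallest eigenvalue."""
    D, F = np.diag(d), np.diag(f)
    s = d.sum() - f.sum()
    a = d @ A - f @ B                           # the row vector d'A - f'B
    M = A.T @ D @ A - B.T @ F @ B - np.outer(a, a) / s
    vals, vecs = np.linalg.eigh((M + M.T) / 2)  # enforce exact symmetry
    w = vecs[:, 0]                              # eigenvector of smallest eigenvalue
    return w, (a @ w) / s                       # back-substitute eq. (19) for r_1
```

Note that no regularization or matrix inversion is needed, in contrast to the generalized eigenproblem of GEPSVM.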
Next, and importantly, we prove that the optimal normal vector w_1 of the first plane is exactly an eigenvector of the aforementioned eigenvalue problem corresponding to the smallest eigenvalue (Theorem 2).
Theorem 1. Let M be a real symmetric matrix. For any unit vector w, if the pair (w, \lambda) is a solution of the eigen-equation M w = \lambda w, then it must satisfy \lambda = w^T M w. Conversely, if (w, \lambda) satisfies \lambda = w^T M w and w is an eigenvector of the matrix M, then w and \lambda must satisfy M w = \lambda w.
Theorem 2. The optimal normal vector w_1 of the first plane is exactly the eigenvector corresponding to the smallest eigenvalue of the eigen-equation derived from objective (8) subject to constraint (9).
Proof: We rewrite eq. (10) (equivalent to objective (8)) in matrix form. Simplifying, we obtain

    J(w_1, r_1) = w_1^T (A^T D A - B^T F B) w_1 - 2 r_1 (d^T A - f^T B) w_1 + (\sigma_d - \sigma_f) r_1^2.    (23)

1) When \sigma_d \neq \sigma_f, substituting eqs. (18), (19) and (9) into (23), we get

    J(w_1, r_1) = w_1^T M w_1 = \lambda.    (24)

Thus the optimal value of the optimization problem is the smallest eigenvalue of the eigen-equation (18); namely, w_1 is an eigenvector corresponding to the smallest eigenvalue.

2) When \sigma_d = \sigma_f, eq. (17) becomes

    (d^T A - f^T B) w_1 = 0,    (25)

which is equivalent to the following expression:

    r_1 (d^T A - f^T B) w_1 = 0.    (26)

Substituting (26) and (21) into (23), we obtain J(w_1, r_1) = w_1^T (A^T D A - B^T F B) w_1 = \lambda, where \lambda is still an eigenvalue of the corresponding eigen-equation. This ends the proof of Theorem 2. #
3.1.3 Geometrical Interpretation
According to the defined notation, eq. (10) implies that LIPSVM only concerns those samples whose weights (d_l or f_m) are greater than zero. For example, A_l will be present in eq. (10) if and only if its weight d_l > 0. As a result, the points of a LIPSVM training set are selectively generated by the two NN matrices S and R, which leads LIPSVM to robustness by eliminating or restraining the effect of outliers. Fig. 1 gives an illustration in which the two outliers in Class 1 are eliminated during the sample selection process. Similarly, a marginal point B_m will be kept in (10) when f_m > 0. In most cases the number of marginal points is far less than that of the given points. Thus, the number of samples used to train LIPSVM is reduced. Fig. 1 also illustrates the marginal points, marked with "+".
Since the distance of a point x to a plane x^T w - r = 0 is measured as |x^T w - r| / \|w\| [22], and w_1 is constrained to be a unit vector by (9), the expression (A_l w_1 - r_1)^2 stands for the squared distance of the point A_l to the plane. So the goal of training LIPSVM is to seek a plane as close to the interior points in Class 1 as possible and as far away from the marginal points in Class 2 as possible. This is quite consistent with the aforesaid optimization objective. The second plane, for the other class, can be obtained similarly.
In the following, we are in a position to describe its nonlinear version.

3.2 Nonlinear LIPSVM (Kernelized LIPSVM)
In the real world, the problems encountered cannot always be handled effectively by linear methods. In order to make the proposed method able to accommodate nonlinear cases, we extend it to a nonlinear counterpart using the well-known kernel trick [22, 23]. Nonlinear kernel-based algorithms, such as KFD [24, 25, 26], KCCA [27] and KPCA [28, 29], usually use the "kernel trick" to achieve their nonlinearity. This conceptually corresponds to first mapping the input into a higher-dimensional feature space (RKHS: Reproducing Kernel Hilbert Space) with some nonlinear transformation. The "trick" is that this mapping is never given explicitly, but is implicitly induced by a kernel. Linear methods can then be applied in the newly mapped data space, the RKHS [30, 31]. Nonlinear LIPSVM also follows this process.
Rewrite the training set as C = [A; B], an m × n matrix with m = N_1 + N_2. For convenience of description, we define the Empirical Kernel Map (EMP; for more detail please see [23, 32]) as follows:

    \Phi(x) = (K(x, x_1), K(x, x_2), ..., K(x, x_m))^T,

where x_1, ..., x_m are the rows of C. The function K(x, y) stands for an arbitrary Mercer kernel which, for any n-dimensional vectors x and y, maps the pair into a real number in R. A frequently used kernel in nonlinear classification is the Gaussian kernel K(x, y) = \exp(-\|x - y\|^2 / \sigma^2), where \sigma is a positive constant called the bandwidth.
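Concretely, the empirical kernel map with a Gaussian kernel can be sketched as follows (a minimal NumPy sketch; the function names are our own, and the kernel is written with bandwidth sigma as exp(-||x - y||^2 / sigma^2)):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """K[i, j] = exp(-||x_i - y_j||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def empirical_kernel_map(X, C, sigma=1.0):
    """Map each row x of X to (K(x, x_1), ..., K(x, x_m)) against the
    m training points stacked in C; the linear LIPSVM procedure is then
    run on the mapped data."""
    return gaussian_kernel(X, C, sigma)
```

Running the linear procedure of Section 3.1 on the mapped rows is exactly what the kernelized formulation below amounts to.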
Similarly, we consider the following kernel-generated nonlinear planes, instead of the aforementioned linear ones in the input space:

    K(x^T, C^T) u_1 - r_1 = 0, \quad K(x^T, C^T) u_2 - r_2 = 0.    (27)

Due to the k1-NN and k2-NN relationship graphs and their matrices (again denoted by S and R), we consider the following optimization criterion instead of the original one in the input space, i.e. (8) and (9), with an entirely similar argument:

    \min_{u_1, r_1} \sum_{i,j} S_{ij} (K(A_i, C^T) u_1 - r_1)^2 - \sum_{i,j} R_{ij} (K(B_j, C^T) u_1 - r_1)^2    (28)

    s.t.\quad u_1^T u_1 = 1.    (29)

With a manipulation analogous to the linear case, we also obtain the corresponding eigen-system (when \sigma_d \neq \sigma_f), in which the kernelized rows K(A_i, C^T) and K(B_j, C^T) play the roles of A_i and B_j. The specification of the parameters is completely analogous to the linear LIPSVM.
3.3 Links with previous approaches
Due to its simplicity, effectiveness and efficiency, LDA is still a popular dimensionality reduction method in many applications such as handwritten digit recognition [33], face detection [34, 35], text categorization [36] and target tracking [37]. However, it also has several essential embarrassments, such as singularity of the scatter matrix in the SSS case and the rank limitation problem. To attack these limitations, in [42] we previously designed an alternative LDA (AFLDA) by introducing a new discriminant criterion. AFLDA overcomes the rank limitation and, at the same time, mitigates the singularity.
Li and Jiang [20] exploited the average maximal margin principle to define the so-called Maximum Margin Criterion (MMC) and derived an alternative discriminant analysis approach. Its main difference from the LDA criterion is the adoption of the trace difference, instead of the trace ratio, between the between-class scatter and the within-class scatter; as a result, it bypasses both the singularity and the rank limitation. In [13], Marginal Fisher Analysis (MFA) establishes a trace-ratio formulation between two scatters similar to LDA's, but further incorporates the manifold structure of the given data to find the projection directions in the PCA-transformed subspace. Doing so avoids the singularity. Though there are many methods for overcoming these problems, their basic philosophies are similar; thus we just mention a few here.
Besides, these methods are largely dimensionality reduction methods, despite their different definitions of the scatters in the objectives. To perform a classification task after dimensionality reduction, they all use the simple and popular nearest neighbor rule, and thus are generally viewed as indirect classification methods. In contrast, SVM is a directly designed classifier based on the SRM principle, maximizing the margin between the two classes of given data points, and has shown superior classification performance in most real cases. However, SVM requires solving a QP problem.
Unlike that solution process, as described in Section 1, PSVM [8] utilizes the 2-norm and equality constraints, and only needs to solve a set of linear equations to seek two parallel planes, while GEPSVM [9] relaxes this parallelism and aims to obtain two nonparallel planes from two corresponding generalized eigenvalue problems, respectively. However, GEPSVM also encounters the singularity problem, for which the authors used the regularization technique. Recently, Guarracino and Cifarelli gave a more flexible setting technique for the regularization parameter to overcome the same singularity problem, and named the so-proposed plane classifier ReGEC [38, 39]. ReGEC seeks the two planes simultaneously from a single generalized eigenvalue equation (the two planes correspond respectively to the maximal and minimal eigenvalues), instead of the two equations of GEPSVM. In 2007, an incremental version of ReGEC, termed I-ReGEC [40], was proposed; it first performs a subset selection and then uses the subset to train the ReGEC classifier for performance gain.
A common point of these plane-type classifiers is that, to overcome the singularity, they all adopt the regularization technique. However, for one thing, the selection of the regularization factor is key to the quality of the solution and is still open up to now. For another, the introduction of the regularization term in GEPSVM unavoidably departs, in part, from its original geometric motivation. A major difference of our LIPSVM from these methods is that it needs no regularization, because the solution of LIPSVM is just an ordinary eigen-system.
Interestingly, the relation between LIPSVM and MMC is quite similar to that between GEPSVM and regularized LDA [41]. LIPSVM is developed by fusing in proximity information, so as not only to keep the characteristics of GEPSVM, such as its geometric motivation, but also to possess its own advantages, as described in Section 1. In the extreme, when the number of nearest neighbors k is taken large enough, for instance k = N, all the given data points are used for training LIPSVM. As far as the geometric principle is concerned, i.e. each approximating plane is as close to the data points of its own class as possible and as far from the points of the other class as possible, the geometric motivation of LIPSVM is in complete accordance with that of GEPSVM. So, with the inspiration of MMC, LIPSVM can be seen as a generalized version of GEPSVM.
4. A byproduct inspired by LIPSVM
LDA [42] has been widely used for pattern classification and can be solved analytically via its corresponding eigen-system. Fig. 2 illustrates a binary classification problem in which the data points of each class are generated from a non-Gaussian distribution. The solid (black) line stands for the LDA optimal decision hyperplane generated directly from the two classes of points, while the dashed (green) line is obtained by LDA using only the so-called marginal points.
Fig. 2 Two-class non-Gaussian data points and their discriminant planes generated by LDA. The symbol "o" stands for points in Class 1, while "□" stands for Class 2. The marginal points, marked "x", come from the aforementioned inter-class relationship graphs. The solid (black) line is the discriminant plane obtained by LDA with all training samples, while the dashed (green) one is obtained by LDA with only the marginal points.
From Fig. 2 and the design idea of LIPSVM, a two-step LDA can be developed by using the interior points or marginal points of the two classes: first, select the nonzero-degree points of the k-NN graphs from the training samples; second, train LDA with the selected points. Further analysis and experiments are discussed in Section 5.
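The two-step procedure can be sketched as follows (a minimal NumPy sketch of the two-class case, with the Fisher direction written out directly; the function names and the neighbourhood size `k` are our own illustration):

```python
import numpy as np

def marginal_mask(X, Y, k):
    """Mask of Y-points that are among the k nearest Y-neighbours
    of at least one point of X (nonzero degree in the inter-class graph)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros(len(Y), bool)
    mask[np.unique(nn)] = True
    return mask

def two_step_lda(A, B, k=3):
    """Step 1: keep only marginal points; step 2: fit two-class Fisher LDA."""
    A2, B2 = A[marginal_mask(B, A, k)], B[marginal_mask(A, B, k)]
    mA, mB = A2.mean(0), B2.mean(0)
    # pooled within-class scatter of the selected points
    Sw = np.cov(A2.T, bias=True) * len(A2) + np.cov(B2.T, bias=True) * len(B2)
    w = np.linalg.solve(Sw + 1e-8 * np.eye(len(mA)), mA - mB)  # Fisher direction
    b = -0.5 * w @ (mA + mB)     # threshold midway between projected means
    return w, b                  # classify by sign(x @ w + b): positive -> Class 1
```

The small ridge on Sw guards against singular scatter when the selected marginal subset is tiny.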
In what follows, we turn to our experimental tests and comparisons.
5. Experimental Validations and Comparisons
To demonstrate the performance of our proposed algorithm, we report results on one synthetic toy problem and on UCI [43] real-world datasets, in two parts: 1) comparisons among LIPSVM, ReGEC, GEPSVM and LDA; 2) comparisons between LDA and its variants. The synthetic data set, named "CrossPlanes", consists of two classes of samples generated respectively from two intersecting planes (lines) plus Gaussian noise. In this section, all computational times were obtained on a machine running Matlab 6.5 on Windows XP with a Pentium IV 1.0 GHz processor and 512 megabytes of memory.
5.1 Comparisons among LIPSVM, ReGEC, GEPSVM and LDA
In this subsection, we test the foresaid classifiers with linear and Gaussian kernels, respectively. Table 1 shows a comparison of LIPSVM versus ReGEC, GEPSVM and LDA. When a linear kernel is used, ReGEC has two regularization parameters, δ1 and δ2, while GEPSVM and LIPSVM each have a single parameter: δ for GEPSVM and k for LIPSVM (see footnote 1).
Parameters δ1 and δ2 were selected from {10^i | i = -4, -3, ..., 3, 4}; δ and C were selected from {10^i | i = -7, -6, ..., 6, 7}; and the k of NN in LIPSVM was selected from {2, 3, 5, 8}, using 10 percent of each training set as a tuning set. Following the suggestion of [9], the tuning set of GEPSVM was not returned to the training fold to learn the final classifier once the parameter was determined.
When facing singularity of both augmented sample matrices, a small disturbance, such as ηI, is added to the matrix G in ReGEC.
In addition to reporting the average accuracies across the 10 folds, we also performed paired t-tests [44] comparing LIPSVM to ReGEC, GEPSVM and LDA. The p-value for each test is the probability of the observed or a greater difference under the null hypothesis that there is no difference between the test set correctness distributions. Thus, the smaller the p-value, the less likely it is that the observed difference resulted from identical test set correctness distributions. A typical threshold for the p-value is 0.05. For example, the p-value of the test comparing LIPSVM and GEPSVM on the Glass data set is 0.000 (< 0.05), meaning that LIPSVM and GEPSVM have different accuracies on this data set. Table 1 shows that GEPSVM, ReGEC and LIPSVM significantly outperform LDA on CrossPlanes.
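The per-dataset significance test above can be sketched as follows (a minimal NumPy sketch; rather than computing an exact p-value, it compares the paired t statistic against 2.262, the two-sided 5% critical value for 9 degrees of freedom, i.e. 10 folds; the per-fold accuracy lists below are hypothetical):

```python
import numpy as np

def paired_t_significant(acc_a, acc_b, t_crit=2.262):
    """Paired t-test on per-fold accuracies of two classifiers;
    returns (t statistic, significant at the 5% level?)."""
    d = np.asarray(acc_a, float) - np.asarray(acc_b, float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t, abs(t) > t_crit

# hypothetical 10-fold accuracies of two classifiers
folds_1 = [90.1, 91.0, 89.2, 90.5, 92.0, 88.3, 90.0, 91.4, 89.1, 90.2]
folds_2 = [85.0, 85.5, 84.0, 85.2, 86.1, 83.5, 85.0, 85.9, 84.1, 85.3]
t, sig = paired_t_significant(folds_1, folds_2)
```

A clear per-fold gap gives a large t statistic, marked with an asterisk in the tables below.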
(1) The parameter k is the number of nearest neighbors. In the experiments, we use Euclidean distance to evaluate the nearest neighbors of the samples x_i, with the assumption k1 = k2 = k.
Table 1. Linear kernel LIPSVM, ReGEC, GEPSVM and LDA: 10-fold average testing correctness (Corr, %) with standard deviation (STD), p-values, and average 10-fold training time (Time, sec.).

Dataset (m × n): LIPSVM Corr±STD (Time); ReGEC Corr±STD (p-value, Time); GEPSVM Corr±STD (p-value, Time); LDA Corr±STD (p-value, Time)
Glass (214 × 9): 87.87±1.37 (0.0012 s); 80.46±6.01* (0.012, 0.0010 s); 63.28±4.81* (0.000, 0.0072 s); 91.029±2.08 (0.131, 0.0068 s)
Iris23 (100 × 4): 95.00±1.66 (0.0011 s); 90.00±4.00 (0.142, 0.0008 s); 93.00±3.00 (0.509, 0.0055 s); 97.00±1.53 (0.343, 0.0006 s)
Sonar (208 × 60): 80.43±2.73 (0.0150 s); 67.14±5.14* (0.001, 0.0043 s); 76.00±2.33 (0.200, 0.0775 s); 71.57±2.07* (0.016, 0.0037 s)
Liver (345 × 6): 72.48±2.48 (0.0013 s); 66.74±3.67 (0.116, 0.0019 s); 59.13±2.03* (0.002, 0.043 s); 61.96±2.59* (0.021, 0.0009 s)
Cmc (1473 × 8): 92.64±0.51 (0.0020 s); 75.52±3.88* (0.000, 0.0028 s); 66.52±1.02* (0.000, 0.0199 s); 67.45±0.65* (0.000, 0.0020 s)
Check (1000 × 2): 51.60±1.30 (0.0009 s); 51.08±2.34 (0.186, 0.0007 s); 50.35±1.25 (0.362, 0.0098 s); 48.87±1.43 (0.229, 0.0013 s)
Pima (746 × 8): 76.04±1.11 (0.0070 s); 74.88±1.70 (0.547, 0.0034 s); 75.95±1.12 (0.912, 0.0537 s); 76.15±1.30 (0.936, 0.0021 s)
Mushroom (8124 × 22): 80.12±3.21 (6.691 s); 80.82±1.87 (0.160, 8.0201 s); 81.10±1.38 (0.352, 9.360 s); 75.43±2.35 (0.138, 6.281 s)
CrossPlanes (200 × 2): 96.50±1.58 (0.0008 s); 95.00±1.00 (0.555, 0.0201 s); 96.50±1.58 (1.000, 0.0484 s); 53.50±17.33* (0.000, 0.0160 s)

The p-values are from a t-test comparing each algorithm to LIPSVM. Best test accuracies are shown in bold in the original. An asterisk (*) denotes a significant difference from LIPSVM based on a p-value less than 0.05, and an underlined number marks the minimum training time. The data set Iris23 is the fraction of the UCI Iris dataset with versicolor vs. virginica.
Table 2
report
s
a
comparison among
the four eigenvalue

based classifiers
us
ing
a
Gaussian kernel.
T
he
kernel
width

parameter
σ
was
chosen from the value {10
i

i
=

4,

3
… 3, 4
} for
all
the
algorithms.
T
he
tradeoff
parameter
C
for SVM was selected from the set {10
i

i
=

4,

3
… 3,
2}, while the
regularization
factor
s
δ
in
GEPSVM
and
δ
1
,
δ
2
in
ReGEC
w
ere
all
selected from the set {10
i

i
=

4,

3
… 3,
4}.
F
or
KFD
[
24
], when the
symmetrical
matrix
N
was
singular
,
the
regularization
trick
was
adopted
by
setting
N
=N+
η
I
,
where
η
(
>0)
was selected from {10
i

i
=

4,

3
… 3,
4}.
I
is an identity matrix with the same s
ize of
N
.
The k in the nonlinear LIPSVM is the same as that in its linear case. Parameter selection was done by comparing the accuracy of each parameter combination on a tuning set consisting of a random 10 percent of each training set.
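The tuning protocol amounts to a grid search scored on a held-out 10 percent split. A sketch under stated assumptions: `evaluate` is a hypothetical callback (train on one part, return accuracy on the other), not a function from the paper.

```python
import random

def select_sigma(train, evaluate, grid=None, tune_frac=0.1, seed=0):
    """Pick the kernel width from a log grid by accuracy on a tuning set.

    evaluate(train_part, tune_part, sigma) is a hypothetical callback that
    trains on train_part and returns accuracy on tune_part.
    """
    if grid is None:
        grid = [10.0 ** i for i in range(-4, 5)]   # {10^i | i = -4, ..., 4}
    rng = random.Random(seed)
    data = train[:]
    rng.shuffle(data)
    n_tune = max(1, int(tune_frac * len(data)))    # random 10% as tuning set
    tune_part, train_part = data[:n_tune], data[n_tune:]
    return max(grid, key=lambda s: evaluate(train_part, tune_part, s))

# Toy check: an evaluator whose score peaks at sigma = 10 selects 10 from the grid.
best = select_sigma(list(range(50)), lambda tr, tu, s: -abs(s - 10.0))
```

The same loop extends to joint grids over (σ, δ) or (σ, C) by scoring each combination on the tuning split.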
On the synthetic data set CrossPlanes, Table 2 shows that LIPSVM, ReGEC and GEPSVM again significantly outperform LDA.
Table 2. Gaussian kernel LIPSVM, ReGEC², GEPSVM and LDA: 10-fold average testing correctness (Corr, %) ± standard deviation (STD), p-values, and average 10-fold training time (Time, sec.)

Dataset | m×n | LIPSVM Corr±STD | Time (s) | ReGEC Corr±STD | p | Time (s) | GEPSVM Corr±STD | p | Time (s) | LDA Corr±STD | p | Time (s)
WPBC | 194×32 | 77.51±2.48 | 0.0908 | 76.87±3.63 | 0.381 | 0.1939 | 63.52±3.51* | 0.000 | 3.7370 | 65.17±2.86* | 0.001 | 0.1538
Check | 1000×2 | 92.22±3.50 | 31.25 | 88.10±3.92* | 0.028 | 28.41 | 87.43±1.31* | 0.001 | 40.30 | 93.38±2.93 | 0.514 | 24.51
Ionosphere | 351×34 | 98.72±4.04 | 0.4068 | 91.46±8.26* | 0.012 | 0.8048 | 46.99±14.57* | 0.000 | 1.5765 | 87.87±8.95* | 0.010 | 0.6049
Glass | 214×9 | 97.64±7.44 | 0.0500 | 93.86±5.31 | 0.258 | 0.2725 | 71.04±17.15* | 0.002 | 0.5218 | 89.47±10.29 | 0.090 | 0.1775
Cmc | 1473×8 | 92.60±0.08 | 34.1523 | 93.05±1.00 | 0.362 | 40.4720 | 58.50±12.88* | 0.011 | 57.2627 | 82.74±4.60* | 0.037 | 64.4223
WDBC | 569×30 | 90.17±3.52 | 1.9713 | 91.37±2.80 | 0.225 | 5.4084 | 37.23±0.86* | 0.000 | 5.3504 | 92.65±2.36 | 0.103 | 3.3480
Water | 116×38 | 79.93±12.56 | 0.1543 | 57.11±3.91* | 0.003 | 0.1073 | 45.29±2.69* | 0.000 | 1.3229 | 66.53±18.06* | 0.036 | 0.0532
CrossPlanes | 200×2 | 98.75±1.64 | 0.2039 | 98.00±0.00 | 1.00 | 1.9849 | 98.13±2.01 | 0.591 | 2.2409 | 58.58±10.01* | 0.000 | 1.8862
The p-values are from a t-test comparing each algorithm to LIPSVM. Best test accuracies are in bold. An asterisk (*) denotes a significant difference from LIPSVM at a p-value below 0.05, and an underlined number marks the minimum training time.

² Gaussian ReGEC Matlab code available at: http://www.na.icar.cnr.it/~mariog/
5.2 Comparisons between LDA and its extensions
In this subsection, we compare computational time and test accuracies of LDA and its extended versions. To avoid the unbalanced classification problem, the extended classifiers, named Interior_LDA and Marginal_LDA, were trained on the two classes' interior points and marginal points, respectively. Tables 3 and 4 report comparisons between LDA and its variants.
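One plausible way to split a training set into interior and marginal points is a KNN test: a point is marginal if any of its k nearest neighbors carries the opposite label, and interior otherwise. This rule is an assumption for illustration; the paper's exact definition of the two subsets is not reproduced in this excerpt.

```python
def split_interior_marginal(points, labels, k=2):
    """Split point indices into (interior, marginal) sets.

    A point is 'marginal' if any of its k nearest neighbors (squared
    Euclidean distance) has the opposite label, 'interior' otherwise.
    This is one plausible KNN-based rule, not necessarily the paper's.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    interior, marginal = [], []
    for i, p in enumerate(points):
        neighbors = sorted((j for j in range(len(points)) if j != i),
                           key=lambda j: sqdist(p, points[j]))
        if any(labels[j] != labels[i] for j in neighbors[:k]):
            marginal.append(i)
        else:
            interior.append(i)
    return interior, marginal

# Two 1-D clusters: only the facing points of each cluster are marginal.
pts = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
lbl = [0, 0, 0, 1, 1, 1]
interior, marginal = split_interior_marginal(pts, lbl, k=2)
```

Training Interior_LDA on `interior` and Marginal_LDA on `marginal` then mirrors the setup described above.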
Table 3. Linear kernel LDA and its variants Interior_LDA and Marginal_LDA: 10-fold average testing correctness ± STD, p-value, and average 10-fold training time (Time, sec.)
Dataset | m×n | LDA Corr±STD | Time (s) | Interior_LDA Corr±STD | p | Time (s) | Marginal_LDA Corr±STD | p | Time (s)
Glass | 214×9 | 91.09±6.58 | 0.200 | 90.69±6.37 | 0.340 | 0.090 | 92.13±4.78 | 0.546 | 0.005
Sonar | 208×60 | 71.57±6.55 | 0.190 | 71.27±8.33 | 0.830 | 0.070 | 72.29±8.01 | 0.867 | 0.023
Liver | 345×6 | 61.96±8.18 | 0.720 | 67.29±6.45* | 0.007 | 0.350 | 64.86±7.80* | 0.024 | 0.121
Cmc | 1473×8 | 77.47±2.07 | 60.280 | 77.92±2.34 | 0.178 | 25.227 | 75.86±3.08 | 0.56 | 0.253
Ionosphere | 351×34 | 84.98±6.74 | 0.860 | 88.23±4.15 | 0.066 | 0.053 | 85.80±6.21 | 0.520 | 0.023
Check | 1000×2 | 48.84±4.43 | 0.0013 | 51.65±5.98 | 0.250 | 0.0030 | 52.01±4.32 | 0.130 | 0.0011
Pima | 746×8 | 76.17±4.12 | 0.0021 | 76.58±3.33 | 0.558 | 0.0010 | 75.39±5.00 | 0.525 | 0.0005
The p-values are from a t-test comparing the LDA variants to LDA. Best test accuracies are in bold. An asterisk (*) denotes a significant difference from LDA at a p-value below 0.05, and an underlined number marks the minimum training time.
Table 4. Gaussian kernel LDA, Interior_LDA and Marginal_LDA: 10-fold average testing correctness ± STD, p-value, and average 10-fold training time (Time, sec.)
Dataset | m×n | LDA Corr±STD | Time (s) | Interior_LDA Corr±STD | p | Time (s) | Marginal_LDA Corr±STD | p | Time (s)
Pima | 746×8 | 66.83±4.32 | 6.354 | 63.46±6.45 | 0.056 | 2.671 | 65.10±6.60 | 0.060 | 0.478
Ionosphere | 351×34 | 91.55±6.89 | 0.658 | 89.78±3.56 | 0.246 | 0.052 | 73.74±7.24* | 0.000 | 0.019
Check | 1000×2 | 93.49±4.08 | 15.107 | 92.54±3.30 | 0.205 | 10.655 | 86.64±3.37* | 0.001 | 0.486
Liver | 345×6 | 66.04±9.70 | 0.641 | 64.43±8.52 | 0.463 | 0.325 | 57.46±9.29* | 0.012 | 0.092
Monk1 | 432×6 | 66.36±7.57 | 1.921 | 68.89±8.86 | 0.566 | 0.381 | 65.00±9.64 | 0.732 | 0.318
Monk2 | 432×6 | 57.85±8.89 | 1.256 | 57.65±8.76 | 0.590 | 0.020 | 67.89±8.42 | 0.390 | 0.186
Monk3 | 432×6 | 99.72±0.88 | 1.276 | 99.72±0.88 | 1.000 | 0.243 | 98.06±1.34* | 0.005 | 0.174
The p-values are from a t-test comparing the LDA variants to LDA. Best test accuracies are in bold. An asterisk (*) denotes a significant difference from LDA at a p-value below 0.05, and an underlined number marks the minimum training time.
Table 3 shows that, in the linear case, LDA and its variants have no significant performance difference on most data sets. Table 4, however, indicates that a significant difference exists between nonlinear LDA and Marginal_LDA. Furthermore, compared to its linear counterpart, as described in [24], nonlinear LDA is more prone to singularity due to the higher (even infinite) dimensionality of the kernel space.
5.3 Comparison between LIPSVM and I-ReGEC
As mentioned before, the recently proposed I-ReGEC [40] also involves a subset selection and is closely related to our work. However, it is worth pointing out several differences between LIPSVM and I-ReGEC: 1) I-ReGEC adopts an incremental fashion to find the training subset, while LIPSVM does not; 2) I-ReGEC is sensitive to its initial selection, as its authors note, while LIPSVM involves no such selection and thus does not suffer from this problem; 3) I-ReGEC seeks its two nonparallel hyperplanes from one single generalized eigenvalue problem with respect to all the classes, while LIPSVM obtains each plane from an ordinary eigenvalue problem with respect to its own class; 4) I-ReGEC does not take the underlying proximity information between data points into account in constructing the classifier, while LIPSVM does.
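Difference 3 matters numerically. A generalized problem A w = λ B w is typically reduced via the inverse of B and becomes ill-posed when B is rank-deficient, while a standard symmetric problem M w = λ w always has a full set of real eigenpairs. A tiny illustration with arbitrary matrices (not quantities from the paper):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])     # symmetric: standard problem is well posed
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])     # rank 1: singular, so inv(B) does not exist

# The right-hand matrix of the generalized problem is rank-deficient here.
rank_deficient = np.linalg.matrix_rank(B) < B.shape[0]

# The standard symmetric problem needs no inverse and always succeeds.
vals, vecs = np.linalg.eigh(A)
```

This is the mechanism behind LIPSVM's claim of avoiding matrix singularity: each plane comes from an eigenproblem of the standard form only.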
In what follows, we compare their classification accuracies using a Gaussian kernel and tabulate the results in Table 5 (the I-ReGEC results are copied directly from [40]).
Table 5. Test accuracies (%) of I-ReGEC and LIPSVM with a Gaussian kernel

Dataset | I-ReGEC test accuracy | LIPSVM test accuracy
Banana | 85.49 | 86.23
German | 73.50 | 74.28
Diabetis | 74.13 | 75.86
Bupa-liver | 63.94 | 70.56
WPBC* | 60.27 | 64.35
Thyroid | 94.01 | 94.38
Flare-solar | 65.11 | 64.89
Table 5 shows that on 6 out of the 7 datasets, the test accuracies of LIPSVM are better than or comparable to those of I-ReGEC.
6. Conclusion and Future Work
In this paper, following the geometrical interpretation of GEPSVM and fusing local information into the design of the classifier, we propose a new robust plane classifier termed LIPSVM, together with its nonlinear version derived via the so-called kernel technique. Inspired by the MMC criterion for dimensionality reduction, we define a similar criterion for designing the LIPSVM classifier and then seek two nonparallel planes that respectively fit the two given classes by solving the two corresponding standard eigenvalue problems, rather than the generalized eigenvalue problems in GEPSVM.
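The shape of that computation can be sketched as follows. In an MMC-style difference criterion, one plane's direction maximizes scatter to the other class minus scatter to its own class, which is a standard symmetric eigenproblem. This sketch omits LIPSVM's local proximity weighting and any bias term, so it is an illustration of the eigenproblem form, not the full method.

```python
import numpy as np

def plane_direction(own, other):
    """Direction w maximizing w'(B'B - A'A)w for unit w, found from a
    *standard* symmetric eigenproblem (no generalized problem, no inverse).
    own, other: (n_points, n_features) arrays for the two classes.
    """
    M = other.T @ other - own.T @ own   # symmetric difference matrix
    _, vecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    return vecs[:, -1]                  # eigenvector of the largest eigenvalue

# Toy data: own class spread along the x-axis, other class along the y-axis,
# so the recovered direction should align with the y-axis.
own = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
other = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, -1.0]])
w = plane_direction(own, other)
```

Repeating the call with the class roles swapped yields the second, nonparallel plane, one standard eigenproblem per class.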
Our experimental results on the public datasets used here demonstrate that LIPSVM obtains testing accuracy statistically comparable to the aforesaid classifiers. However, we also note that current algorithms are limited in solving large-scale eigenvalue problems, and LIPSVM inherits this limitation. Our future work includes solving really large classification problems that do not fit in memory, for both linear and nonlinear LIPSVM. We also plan to explore heuristic rules to guide the parameter selection of KNN.
Acknowledgment
The corresponding author would like to thank the Natural Science Foundation of China, Grant No. 60773061, for partial support.
References
[1]. C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2): 1-47, 1998.
[2]. B. Scholkopf, A. Smola, K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10: 1299-1319.
[3]. B. Scholkopf, P. Simard, A. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10, Cambridge, MA: MIT Press, 1998: 640-646.
[4]. V. Blanz, B. Scholkopf, H. Bulthoff, C. Burges, V. Vapnik and T. Vetter. Comparison of view-based object recognition algorithms using realistic 3D models. In Proceedings of the International Conference on Artificial Neural Networks, (Eds.) C. von der Malsburg, W. von Seelen, J.-C. Vorbrüggen and B. Sendhoff, Springer Lecture Notes in Computer Science 1112, Bochum, 1996: 251-256.
[5]. M. Schmidt. Identifying speakers with support vector networks. In Interface '96 Proceedings, Sydney, 1996.
[6]. E. Osuna, R. Freund and F. Girosi. Training support vector machines: an application to face detection. In IEEE Conference on Computer Vision and Pattern Recognition, 1997: 130-136.
[7]. T. Joachims. Text categorization with support vector machines: learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning, 1999: 137-142.
[8]. G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In Proceedings of Knowledge Discovery and Data Mining (KDD), F. Provost and R. Srikant, eds., 2001: 77-86.
[9]. O. L. Mangasarian and E. W. Wild. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 28(1): 69-74, 2006.
[10]. T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13: 21-27.
[11]. Y. Wu, K. Ianakiev and V. Govindaraju. Improved k-nearest neighbor classification. Pattern Recognition, 35: 2311-2318, 2002.
[12]. S. Lafon, Y. Keller and R. R. Coifman. Data fusion and multicue data matching by diffusion maps. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2006, 28(11): 1784-1797.
[13]. S. Yan, D. Xu, B. Zhang and H.-J. Zhang. Graph embedding: a general framework for dimensionality reduction. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005).
[14]. S. Roweis, L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290 (2000): 2323-2326.
[15]. J. B. Tenenbaum, V. de Silva and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290 (2000): 2319-2323.
[16]. M. Belkin, P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14 (NIPS 2001), MIT Press, Cambridge, 2002: 585-591.
[17]. H.-T. Chen, H.-W. Chang, T.-L. Liu. Local discriminant embedding and its variants. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2005.
[18]. P. Mordohai and G. Medioni. Unsupervised dimensionality estimation and manifold learning in high-dimensional spaces by tensor voting. In 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005: 798-803.
[19]. K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2004, II: 988-995.
[20]. H. Li, T. Jiang and K. Zhang. Efficient and robust feature extraction by maximum margin criterion. In Advances in Neural Information Processing Systems, Cambridge, MA: MIT Press, 2004: 97-104.
[21]. V. Marno and N. Theo. Minimum MSE estimation of a regression model with fixed effects from a series of cross-sections. Journal of Econometrics, 1993, 59(1-2): 125-136.
[22]. H. Zhang, W. Huang, Z. Huang and B. Zhang. A kernel autoassociator approach to pattern classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(3), June 2005.
[23]. B. Scholkopf and A. J. Smola. Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[24]. S. Mika. Kernel Fisher Discriminants. PhD thesis, Technische Universität Berlin, Germany, 2002.
[25]. S. Mika, G. Rätsch, J. Weston, B. Schölkopf and K.-R. Müller. Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson and S. Douglas, editors, Neural Networks for Signal Processing IX, IEEE, 1999: 41-48.
[26]. S. Mika. A mathematical approach to the kernel Fisher algorithm. In T. K. Leen, T. G. Dietterich and V. Tresp, editors, Advances in Neural Information Processing Systems 13, Cambridge, MA: MIT Press, 2001: 591-597.
[27]. M. Kuss, T. Graepel. The geometry of kernel canonical correlation analysis. Technical Report No. 108, Max Planck Institute for Biological Cybernetics, Tubingen, Germany, May 2003.
[28]. B. Scholkopf, A. Smola and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1998.
[29]. B. Scholkopf, A. Smola, and K.-R. Muller. Kernel principal component analysis. In Advances in Kernel Methods: Support Vector Learning, B. Scholkopf, C. J. C. Burges, and A. Smola, eds. Cambridge, MA: MIT Press, 1999.
[30]. T. P. Centeno and N. D. Lawrence. Optimizing kernel parameters and regularization coefficients for non-linear discriminant analysis. Journal of Machine Learning Research, 7 (2006): 455-491.
[31]. K.-R. Müller, S. Mika, G. Rätsch, et al. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 2001, 12(2): 181-202.
[32]. M. Wang and S. Chen. Enhanced FMAM based on empirical kernel map. IEEE Transactions on Neural Networks, 16(3): 557-564.
[33]. P. Berkes. Handwritten digit recognition with nonlinear Fisher discriminant analysis. In Proceedings of ICANN, Vol. 2, Springer, LNCS 3696: 285-287.
[34]. C. Liu and H. Wechsler. A shape- and texture-based enhanced Fisher classifier for face recognition. IEEE Transactions on Image Processing, 2001, 10(4): 598-608.
[35]. S. Yan, D. Xu, L. Zhang, Q. Yang, X. Tang and H. Zhang. Multilinear discriminant analysis for face recognition. IEEE Transactions on Image Processing (TIP), 2007, 16(1): 212-220.
[36]. T. Li, S. Zhu, and M. Ogihara. Efficient multi-way text categorization via generalized discriminant analysis. In Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM '03): 317-324, 2003.
[37]. R. Lin, M. Yang, S. E. Levinson. Object tracking using incremental Fisher discriminant analysis. In 17th International Conference on Pattern Recognition (ICPR '04), (2): 757-760, 2004.
[38]. M. R. Guarracino, C. Cifarelli, O. Seref, P. M. Pardalos. A parallel classification method for genomic and proteomic problems. In 20th International Conference on Advanced Information Networking and Applications, Volume 2 (AINA '06), 2006: 588-592.
[39]. M. R. Guarracino, C. Cifarelli, O. Seref, P. M. Pardalos. A classification method based on generalized eigenvalue problems. Optimization Methods and Software, 22(1): 73-81, 2007.
[40]. C. Cifarelli, M. R. Guarracino, O. Seref, S. Cuciniello and P. M. Pardalos. Incremental classification with generalized eigenvalues. Journal of Classification, 2007, 24: 205-219.
[41]. J. Liu, S. Chen, and X. Tan. A study on three linear discriminant analysis based methods in small sample size problem. Pattern Recognition, 2008, 41(1): 102-116.
[42]. S. Chen and X. Yang. Alternative linear discriminant classifier. Pattern Recognition, 2004, 37(7): 1545-1547.
[43]. P. M. Murphy and D. W. Aha. UCI Machine Learning Repository, 1992, www.ics.uci.edu/~mlearn/MLRepository.html.
[44]. T. M. Mitchell. Machine Learning. Boston: McGraw-Hill, 1997.