UNSUPERVISED FEATURE LEARNING CLASSIFICATION USING AN EXTREME LEARNING MACHINE



Dao Lam

Department of Electrical and Computer Engineering

Missouri University of Science and Technology, Rolla, MO 65409



Abstract: This paper presents a new approach, which we call UFL-ELM, to classification using both unsupervised and supervised learning. Unlike traditional approaches, in which features are extracted, hand-crafted, and then trained using time-consuming, iterated optimization, the proposed method leverages unsupervised feature learning to learn features from the data themselves and then trains the classifier using an extreme learning machine to reach the analytic solution. The result can therefore be widely and quickly applied to universal data. Experiments on a large dataset of images confirm the ease of use and speed of training of this unsupervised feature learning approach. Furthermore, the paper discusses how to speed up training using massively parallel programming.




I. INTRODUCTION


Classification is one of the most important applications of machine learning. Traditional classifiers require a human operator to design features that represent high-level properties of the objects, such as the Scale-Invariant Feature Transform (SIFT) and the Histogram of Oriented Gradients (HOG) [15], [3]. A good classifier requires progressively more data for training, which also increases the time required for training. These requirements make classification a daunting task for machine vision systems.

To tackle the disadvantage of hand-crafted features, several methods of unsupervised feature learning (UFL) have been researched, such as Sparse Coding, Deep Belief Nets, Auto Encoders, and Independent Subspace Analysis [14], [7]. These approaches also leverage the fact that unlabeled data are plentiful and do not require much effort to collect. The data that can be learned by UFL are universal, including audio, image, and video data.

Extracting features from the dataset is not the only difficult task in classification. Training the classifier remains an open problem. A popular classifier is the Support Vector Machine [8] and its variant [13] for multiclass classification. Other methods discussed in the literature include boosting [5], but those methods are slow and require some parameters to be tuned.

In [10], Huang introduced the Extreme Learning Machine (ELM) to ease the task of classification. ELM is a fast-learning feed-forward single-hidden-layer neural network, which can approximate any nonlinear function, outperform many other classifiers, and provide very accurate regression [11].

The method presented in this paper represents a combination of unsupervised feature learning and the extreme learning machine (UFL-ELM) for classification. This combination produces two-fold results: first, it eliminates the requirement of hand-crafting features; second, it provides a faster method of training the classifier than does unsupervised learning.

The remainder of the paper is organized as follows: Section II discusses UFL, followed by ELM in Section III. Then, in Section IV we introduce the framework of UFL-ELM and discuss its performance.


II. UFL

In computer vision and machine learning, features are the highest representations of the investigated object. Previous work has shown that hand-designed local features, such as SIFT and HOG [15], [3], perform well but are difficult to generalize over different kinds of datasets, such as video, text, and sound. A growing interest in learning features directly from data is satisfied by UFL [2] and is implemented through encoders, such as the Sparse Encoder [16] and k-means [17].

Unsupervised feature learning consists of two steps: 1) building the feature encoder, and 2) encoding the feature. When building the feature encoder, unlabeled data are processed. Those unlabeled data can consist of the dataset itself, computer-generated data, or data collected from the internet or other related sources. This is an advantage of UFL over traditional methods. Processing the unlabeled data involves learning the feature encoder through the following steps:

1. Extract random patches from the unlabeled training data.
2. Apply pre-processing to enhance the contrast of the patches.
3. Learn the feature encoder by using an unsupervised learning algorithm, such as k-means or an auto encoder.

Once the feature encoder is learned, given a piece of data, we can extract the UFL feature by performing the following steps:

1. Extract the patches from the data (patches should adequately cover all of the data).
2. Encode each patch using the feature encoder learned in the previous step.
3. Pool learned features together to reduce the number of features.

Figure 1 explains how to encode the feature when the input data consists of images.
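The two step lists above can be sketched end-to-end with a plain k-means feature encoder. Everything below is a toy stand-in (tiny random grayscale images, patch width w = 4, K = 8 centroids, per-patch normalization instead of full whitening), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_encoder(images, w=4, n_patches=500, K=8, iters=10):
    """Learn a k-means feature encoder from random patches (steps 1-3 above)."""
    H, W = images.shape[1:3]
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y, x = rng.integers(H - w + 1), rng.integers(W - w + 1)
        patches.append(img[y:y+w, x:x+w].ravel())
    P = np.array(patches, dtype=float)
    # Pre-processing: normalize each patch (a simple contrast-enhancement stand-in)
    P = (P - P.mean(axis=1, keepdims=True)) / (P.std(axis=1, keepdims=True) + 1e-8)
    # Plain k-means on the patch matrix
    C = P[rng.choice(len(P), K, replace=False)]
    for _ in range(iters):
        d = ((P[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        lbl = d.argmin(1)
        for k in range(K):
            if (lbl == k).any():
                C[k] = P[lbl == k].mean(0)
    return C

def encode(img, C, w=4):
    """Encode every patch against the centroids, then pool per quadrant."""
    H, W = img.shape[:2]
    feats = np.zeros((H - w + 1, W - w + 1, len(C)))
    for y in range(H - w + 1):
        for x in range(W - w + 1):
            p = img[y:y+w, x:x+w].ravel().astype(float)
            p = (p - p.mean()) / (p.std() + 1e-8)
            d = ((C - p) ** 2).sum(1) ** 0.5
            feats[y, x] = np.maximum(d - d.mean(), 0)  # keep only above-mean responses
    h2, w2 = feats.shape[0] // 2, feats.shape[1] // 2
    quads = [feats[:h2, :w2], feats[:h2, w2:], feats[h2:, :w2], feats[h2:, w2:]]
    return np.concatenate([q.sum((0, 1)) for q in quads])  # 4 quadrants x K features

images = rng.random((10, 12, 12))
C = build_encoder(images)
v = encode(images[0], C)
print(v.shape)  # (32,) = 4 quadrants x K=8 centroids
```

Quadrant pooling is what keeps the final representation a small multiple of K, matching the "four times more UFL features than elements K" relation described for Figure 1.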


III. ELM

The architecture of an ELM for classification is depicted in Figure 2. The most advantageous feature of ELM is the way it is trained. Unlike many other neural networks that take hours or even days to train because of their slow convergence in the optimization process [4], ELM input weights can be initialized randomly, and the output weights can be determined analytically by a pseudo-inverse matrix operation.

An ELM classifier has three layers: 1) the input layer, whose number of neurons equals the number of features in the dataset; 2) the hidden layer, which has a non-linear activation function; and 3) the output layer, whose number of output neurons equals the number of classes.

Let X ∈ R^(n×N) = [x1, x2, ..., xN], where n is the number of features and N is the number of data pieces, be the data used to train the ELM. To include the bias value of the neuron, we transform X into X̄ by adding a row vector of all 1s, i.e., X̄ = [X; 1]. Let C ∈ R^(k×N) = [c1, c2, ..., cN], where k is the number of classes, and ci = [0, 0, ..., 1, ..., 0]^T is the vector of all zeros except at the correct class; C is the expected output.

Denote Wi ∈ R^(NH×(n+1)) and Wo ∈ R^(k×NH) as the input weight matrix and output weight matrix of the ELM, where NH is the number of neurons in the hidden layer. Doing so yields

    H = g(Wi * X̄)    (1)

where H ∈ R^(NH×N) is the hidden-layer output matrix of the ELM, and g is the activation function of the neurons.

Once we obtain H, we can calculate the output O of the output layer:

    O = Wo * H    (2)

Eq. 2 holds because the output-node activation function is linear. For training purposes, O should be as close to C as possible, i.e., ||O - C|| = 0. ELM theory states that to achieve ||O - C|| = 0 [11], we can initialize Wi with random values and compute Wo as

    Wo = C * pinv(H)    (3)

where pinv(H) represents the generalized inverse of a matrix.

Figure 1. Learning the UFL features to represent an image. Images are divided into patches of w × w pixels. Each patch is encoded into a vector based on the feature encoder learned in the unsupervised step. Features of the patches that lie in the same quadrant of the image are pooled together to reduce the number of features used to represent an image. In this figure, there are four times more UFL features than elements K in the feature encoder. (Figure adapted from [2].)

Figure 2. ELM architecture [11]. ELM for classification is a feed-forward single-hidden-layer neural network in which the number of input neurons equals the number of features in the dataset and the number of output neurons equals the number of classes to classify. The number of hidden neurons is usually equivalent to the number of features.

Though ELM can be modified to some extent to improve its performance [9] or reduce its complexity [12], a simple implementation is sufficient for several applications. Once training is complete, we can use ELM to classify the testing set.
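Equations (1) through (3) amount to a single pseudo-inverse solve. Below is a minimal NumPy sketch of that training rule on toy random data; the sizes (3 features, 50 hidden neurons, 2 classes) and the column-wise data layout are illustrative assumptions, not the paper's implementation. With Wo ∈ R^(k×NH), the solve is written as C · pinv(H):

```python
import numpy as np

rng = np.random.default_rng(1)

def g(z):  # sigmoid activation of the hidden neurons
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, C, NH=50):
    """X: n x N data (columns are samples), C: k x N one-hot targets."""
    n, N = X.shape
    Xb = np.vstack([X, np.ones((1, N))])    # add the bias row of 1s: X-bar
    Wi = rng.standard_normal((NH, n + 1))   # random input weights (never trained)
    H = g(Wi @ Xb)                          # hidden-layer output, eq. (1)
    Wo = C @ np.linalg.pinv(H)              # analytic output weights, eq. (3)
    return Wi, Wo

# Toy 2-class problem: class is the sign of the first feature
X = rng.standard_normal((3, 200))
labels = (X[0] > 0).astype(int)
C = np.eye(2)[:, labels]                    # k x N one-hot target matrix
Wi, Wo = train_elm(X, C)

# Output of the output layer, eq. (2): O = Wo * H
Xb = np.vstack([X, np.ones((1, X.shape[1]))])
O = Wo @ g(Wi @ Xb)
acc = (O.argmax(0) == labels).mean()
print(round(acc, 2))
```

The only expensive step is the pseudo-inverse, which is exactly the operation the paper later parallelizes on the GPU.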


IV. TECHNICAL APPROACH FRAMEWORK

In this section, we describe the new approach to classification using UFL-ELM. Given a dataset to classify, this framework requires two steps: first, the UFL phase collects more data and labels some of the data for learning and training; second, ELM trains the classifier using the labeled data.

Specifically, for the UFL learning phase, the additional data that were collected can be obtained from any source of the same domain. For example, when working with image classification, good sources of data include Flickr, Google Images, and Bing Images. The additional data can even be generated by computer graphics programs to help enrich the feature learning.

The UFL phase uses unlabeled data to build the feature encoder. It begins by densely sampling each of the data into patches. Each patch then is vectorized and is considered an object in unsupervised learning to learn the encoder's structure.
An unsupervised learning algorithm, such as k-means or an auto encoder, is applied to those patches to learn the structure of the encoder. This encoder then is used to learn features in the labeled dataset.

At this point, for each labeled object, patches are sequentially extracted to represent the object at a low level. All of the patches then are compared inside the feature encoder to form the features of the object at a higher level, such as edges or corners in the case of images. An object may need several patches to represent it at a low level, and even more to represent it at a high level, so we need to pool those features to reduce the size of the object. Usually, the final number of features needed to represent the object is a few times more than the number of elements in the feature encoder.

Figure 3. The UFL-ELM framework. The unlabeled data are used to build the feature encoder, which later learns the features in the labeled data through a feature-mapping method. The labeled data then are divided into training and testing sets to train the ELM and test overall performance.

After the labeled data are learned, they are split again into two sets: the training set and the testing set. The training set is used to train the ELM. This training is straightforward and essentially involves initializing the input-layer weights randomly and then computing the output weights using a pseudo-inverse matrix calculation. This results in very fast training. Once this step is complete, the testing set is fed into the ELM, and the output neuron that has the highest activation is chosen as the class of the input.
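The train/test flow just described can be sketched compactly; the toy data, sizes, and the sigmoid activation below are assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
g = lambda z: 1.0 / (1.0 + np.exp(-z))  # hidden-layer activation

# Labeled feature vectors (toy stand-ins), split into training and testing sets
X = rng.standard_normal((3, 300))
y = (X[0] + X[1] > 0).astype(int)
Xtr, ytr, Xte, yte = X[:, :200], y[:200], X[:, 200:], y[200:]

def with_bias(X):  # append the all-ones bias row
    return np.vstack([X, np.ones((1, X.shape[1]))])

Wi = rng.standard_normal((40, 4))   # random input layer, never trained
# One pseudo-inverse step computes the output weights from the training set
Wo = np.eye(2)[:, ytr] @ np.linalg.pinv(g(Wi @ with_bias(Xtr)))

# Testing: the output neuron with the highest activation gives the class
pred = (Wo @ g(Wi @ with_bias(Xte))).argmax(0)
print((pred == yte).mean())
```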


V. EXPERIMENT

In this section, we describe the experiment used to test UFL-ELM on an actual, large dataset. The dataset we used to test our approach was CIFAR-10 [1], which consists of 60,000 32×32 color images in 10 mutually exclusive classes, with 6,000 images per class. 50,000 of these images were used for training, and the remaining 10,000 were used for testing. The dataset was organized into a 50,000 × 3072 (3072 = 32×32×3) matrix for training and a 10,000 × 3072 matrix for testing.

For learning the UFL feature, we follow the available implementation in [2]. We used a window with a size of w = 6×6 pixels to randomly sample the training dataset to collect 400,000 patches. Each patch then was vectorized into a column. Then, those patches were normalized and whitened to increase the contrast. We then applied k-means to the enhanced patch matrix to learn K = 800 centroids. Technically, labeled data are not required for learning the centroids or the features in later stages. This is the advantage of UFL.

Figure 4, which is best viewed in color, depicts the feature encoder. We plotted a random 100 of the 800 centroids after rearranging them into an image format. Each of the small squares in the figure represents a centroid used to encode the feature in the later feature-mapping step. Each square appears as a horizontal, vertical, or diagonal edge in the dataset. This is proof of successful unsupervised feature learning. Other types of datasets, such as audio and video, have their own corresponding criteria to prove the success of UFL [14], [6].

Figure 4. 400 centroids of the feature encoder learned using k-means clustering. The features appear as vertical, horizontal, and diagonal edges, and other features are represented by the colored stripes in the images.
For each pixel in each image in either the training or the testing set, we extracted a 6×6×3 window and stacked it into a vector. We computed the distances from this vector to the 800 centroids learned in the k-means step. Then, we formed a new vector from those 800 distances with the following rule: if the distance was larger than the mean distance, then we kept it; otherwise, we set that distance to 0. The volume of the distance vectors was 27×27×800. We then pooled the features to reduce the size by summing up the features in the same quadrant. Finally, we concatenated the volume into a feature vector of 3,200 elements. This vector is the UFL representation of one image.
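A minimal sketch of this encoding and pooling rule, with a toy K = 16 instead of the paper's 800 (random centroids and a random 32×32×3 image stand in for the learned encoder and the CIFAR-10 data):

```python
import numpy as np

rng = np.random.default_rng(3)

K, w = 16, 6                      # toy K; the paper uses K = 800
centroids = rng.random((K, w * w * 3))
img = rng.random((32, 32, 3))

# One distance vector per pixel position: (32-6+1) x (32-6+1) x K = 27 x 27 x K
n = 32 - w + 1
vol = np.zeros((n, n, K))
for yy in range(n):
    for xx in range(n):
        v = img[yy:yy+w, xx:xx+w].ravel()
        d = np.linalg.norm(centroids - v, axis=1)
        vol[yy, xx] = np.where(d > d.mean(), d, 0.0)  # keep above-mean distances only

# Pool: sum the features in each quadrant, then concatenate -> 4*K elements
h = n // 2
quads = [vol[:h, :h], vol[:h, h:], vol[h:, :h], vol[h:, h:]]
feat = np.concatenate([q.sum(axis=(0, 1)) for q in quads])
print(feat.shape)  # (64,); with K = 800 this would be the paper's 3,200 elements
```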

For classification, we used an ELM with 3,200 inputs. The activation function of the hidden nodes was chosen as the sigmoid. The number of output neurons was 10, corresponding to the 10 classes of objects in the dataset. The number of hidden neurons must be defined when using ELM. We ran the ELM with different numbers of hidden neurons, from 1,000 to 6,000. We ran the experiment using Matlab on a machine with an Intel Xeon E5645 CPU at 2.4 GHz and 12 GB of RAM. We attempted a run with 7,000 hidden neurons but encountered an Out of Memory error.

Table I reports the performance of the classifier. There are two aspects of this table to consider: precision and time.

Table I. UFL-ELM classification using CPU MATLAB implementation

Hidden neurons    1000   2000   3000   4000   5000   6000
Train Accuracy     .62    .68    .72    .75    .77    .80
Test Accuracy      .59    .62    .63    .64    .64    .64
Train Time (s)     108    327    675   1205   1845   2766
Test Time (s)        8     15     24     33     42     49

Table II. UFL-ELM performance with CUDA speed-up

Hidden neurons    1000   2000   3000   4000   5000   6000
Train Accuracy     .62    .68    .72    .75    .77    .80
Test Accuracy      .59    .62    .63    .64    .64    .64
Train Time (s)     6.1   16.4     31     52     77    111
Test Time (s)       .7    1.4    2.0    2.8    3.4    4.1

Figure 5. Speeding up the ELM classifier using CUDA. As the number of hidden neurons increased, the time required by the pseudo-inverse MATLAB implementation followed a power law in the number of hidden neurons, while the time required by the CUDA implementation remained linear.

Regarding the first aspect, the precision of the ELM increased with the number of neurons. However, beyond 4,000 hidden neurons, the increase in precision was trivial. In fact, when we reduced the dataset size by half to overcome the memory problem, we found that an increased number of hidden neurons decreased the precision of the ELM classifier.

The second aspect of the ELM classifier is time. As the number of hidden neurons increased, so did the complexity, which then increased the time needed for training. With 6,000 hidden neurons, it took MATLAB, with its proprietary matrix manipulation optimization, more than 45 minutes to train the ELM.

To reduce the training time, we leveraged the parallel characteristics of the matrix addition and multiplication and implemented them using the CUDA library. We ran the ELM on a machine with an NVIDIA Tesla C2050 and measured the time required for training. The result is plotted in Fig. 5. With the GPU implementation, the timing improved dramatically. In fact, with 6,000 hidden neurons, the GPU implementation was 20 times faster than Matlab in training and 10 times faster in testing.

The multiclass SVM classifier in [2] would yield 80% training accuracy and 75% testing accuracy, and it requires 452 s of running time. Our approach achieves 80% and 64%, respectively. Though the SVM's performance seems higher compared to our ELM approach, the method requires a great deal of work to tune and optimize the SVM. The time needed to perform cross-validation for parameter setting in the SVM approach is substantial.

VI. CONCLUSION


In this paper, we introduced a combination of unsupervised and supervised learning for the problem of classification. For unsupervised learning, we used unlabeled data to build the feature encoder and then used that encoder to extract the features in the labeled data. Doing so leveraged the availability of data from many other sources and eliminated the daunting process of designing features suitable for each specific type of data. Furthermore, for supervised learning, we exploited the ELM for faster classifier training. The ELM was sped up further using massively parallel programming, owing to its reliance on matrix manipulation. The results on a large image dataset confirmed the advantage of the approach.



Acknowledgements

We would like to thank Adam Coates and Guang-Bin Huang for their public code. Partial support from the National Science Foundation, the Missouri S&T Intelligent Systems Center, and the Mary K. Finley Missouri Endowment is gratefully acknowledged.



REFERENCES

[1] CIFAR-10. http://www.cs.toronto.edu/~kriz/cifar.html.
[2] Adam Coates, Honglak Lee, and Andrew Y. Ng. An analysis of single-layer networks in unsupervised feature learning. Ann Arbor, 1001:48109, 2010.
[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 886-893, 2005.
[4] Scott E. Fahlman. An empirical study of learning speed in back-propagation networks. CMU Technical Report, 1988.
[5] Yoav Freund and Robert E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: International Workshop then Conference, pages 148-156. Morgan Kaufmann Publishers, Inc., 1996.
[6] I. Goodfellow, Q. Le, A. Saxe, H. Lee, and A. Ng. Measuring invariances in deep networks. In NIPS, 2010.
[7] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.
[8] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2):415-425, March 2002.
[9] Guang-Bin Huang, Xiaojian Ding, and Hongming Zhou. Optimization method based extreme learning machine for classification. Neurocomputing, 74(1):155-163, 2010.
[10] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. Extreme learning machine: a new learning scheme of feedforward neural networks. In Proc. 2004 IEEE International Joint Conference on Neural Networks, volume 2, pages 985-990. IEEE, 2004.
[11] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. Extreme learning machine: theory and applications. Neurocomputing, 2006.
[12] Hieu Trung Huynh and Yonggwan Won. Evolutionary algorithm for training compact single hidden layer feedforward neural networks. In Proc. IEEE International Joint Conference on Neural Networks (IJCNN 2008), pages 3028-3033. IEEE, 2008.
[13] Takuya Inoue and Shigeo Abe. Fuzzy support vector machines for pattern classification. In Proc. International Joint Conference on Neural Networks (IJCNN'01), volume 2, pages 1449-1454. IEEE, 2001.
[14] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 3361-3368, 2011.
[15] D. G. Lowe. Object recognition from local scale-invariant features. In Proc. Seventh IEEE Int. Conf. Computer Vision, volume 2, pages 1150-1157, 1999.
[16] Rajat Raina, Anand Madhavan, and Andrew Y. Ng. Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th Annual International Conference on Machine Learning, volume 382, pages 873-880. ACM, 2009.
[17] R. Xu and D. Wunsch. Clustering. Wiley-IEEE Press, 2009.