Oct 16, 2013
Modified Support Vector Machine for Detecting Bikini Images



李宏毅 (R97942033), 李倫銓 (D97921013)

tlkagkb93901106@yahoo.com.tw, d97921013@ntu.edu.tw


Abstract


This paper uses two novel modified support vector machines (SVMs), cluster-based SVM and multi-kernel SVM tree, for more robust training. Both methods can outperform traditional SVM training.


1. Introduction


Recently, pornographic or bikini image detection has become an important and useful problem in web mining and image retrieval, especially now that many people download pictures from web albums and blogs.

In our term project, we try to detect pornographic or bikini images in a collection of normal images and overcome some difficulties of the task.

(In this project we wrote a spider to scrape images from Wretch albums. We use these pictures for this project's research only and will delete them afterwards.)


2. Difficulty


The support vector machine is the method we use in our task. Although the support vector machine is a powerful machine learning method for classification, some difficulties appear when using SVM as our training method.

Consider the image examples in Fig. 1. If we want to retrieve images of bikinis, all the images in Fig. 1 should be retrieved although they look different. However, in feature space, the images in Fig. 1 may not gather together (Fig. 2). If we select hyper-plane 1 to separate bikini images from other images, we miss many bikini images. However, if we select hyper-plane 2 as the separating hyper-plane, we get many images irrelevant to bikinis. Kernel functions may be a useful tool to solve this problem, but kernel functions sometimes result in over-fitting. In our experiment, for kernel functions with high dimensions (e.g. the radial basis function kernel), SVM obtains a model with zero missing rate and zero false alarm rate on the training data but performs terribly on the testing data. Two methods are proposed to make SVM more robust.

We implement the two methods on our task, pornographic image detection.




Fig. 1

Fig. 2


2. Cluster-based SVM

Because pornographic images may be very different from one another, we cluster the pornographic images of the training data by k-means (Fig. 3). Then each cluster is used to train an SVM model (Fig. 4). Eight clusters are used in our experiment.
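The clustering step can be illustrated with a small dependency-free sketch. Several parts are assumptions and not taken from the paper: each per-cluster model is trained on one cluster of positive images against all normal images, an image is flagged if any per-cluster model fires, k-means uses a simple evenly spaced initialisation, and `fit_nearest_mean` is only a stand-in for a linear SVM (its boundary is the hyper-plane bisecting the two class means). All function names are hypothetical.

```python
import math


def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with an evenly spaced, deterministic start."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return groups


def fit_nearest_mean(positives, negatives):
    """Stand-in for a linear SVM (assumption): nearest class mean wins."""
    pos_mean, neg_mean = mean(positives), mean(negatives)
    return lambda x: math.dist(x, pos_mean) < math.dist(x, neg_mean)


def train_cluster_based(positives, negatives, k, fit=fit_nearest_mean):
    """Train one model per cluster of positives, each against all negatives."""
    clusters = [g for g in kmeans(positives, k) if g]
    return [fit(cluster, negatives) for cluster in clusters]


def is_positive(models, x):
    """Assumed combination rule: flag x if any per-cluster model fires."""
    return any(model(x) for model in models)


# Toy data: two distant groups of positives with the negatives in between.
positives = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
negatives = [(5, 5), (5, 6), (6, 5), (4, 5)]
models = train_cluster_based(positives, negatives, k=2)
```

On this toy data a single model trained on all positives at once misses the point `(0, 0)`, while the two per-cluster models separate both groups of positives from the negatives. The paper uses eight clusters; `k=2` is chosen here only to keep the example small.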



Fig. 3


Fig. 4


3. Multi-kernel SVM tree


The following graph illustrates the training procedure of an SVM tree using the data of Fig. 2:



The procedure above is a simple SVM tree which may outperform any single hyper-plane in Fig. 2. An SVM tree is a kind of decision tree, but the questions used in an SVM tree are hyper-planes. An SVM tree can be viewed as a decision tree that considers several questions at the same time.

However, in our experiment, the SVM tree cannot outperform conventional SVM significantly. SVM is so powerful that, after separating the features into two nodes, it cannot separate the nodes any further, even though many wrong classifications still exist in the nodes. We therefore modify the SVM tree into a multi-kernel SVM tree. To split a node, we choose from a predefined kernel function set a kernel function that decreases the number of wrong classifications on the development set.

MULTI-KERNEL SVM TREE ALGORITHM

    SPLIT-NODE(all training data, all development data)

END of ALGORITHM

SPLIT-NODE(training set, development set)

repeat {
    take a kernel function f from the kernel function set
    use function f to train an SVM model m on the training set
    use model m to classify the development set
    if (classifying the development set with model m decreases the number of misclassifications)
        use model m to split the training set into training set A and training set B
        use model m to split the development set into development set A and development set B
        SPLIT-NODE(training set A, development set A)
        SPLIT-NODE(training set B, development set B)
        break
    end
} until (every kernel function in the kernel function set has been used)
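The recursion above can be sketched in Python as follows. This is an illustration under stated assumptions and not the authors' implementation: "decreases the number of misclassifications" is read as comparing the errors of model m on the node's development set with the errors made by simply assigning the node's own label; the root label is the majority class of the training set; a node whose training set contains a single class is not split. The two trainers are dependency-free stand-ins for SVMs with different kernels, and all names are hypothetical.

```python
from collections import Counter


def fit_threshold(train):
    """Stand-in for a linear-kernel SVM on 1-D data (assumption)."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    mid = (pos_mean + neg_mean) / 2
    if pos_mean < neg_mean:
        return lambda x: 1 if x < mid else 0
    return lambda x: 1 if x > mid else 0


def fit_nearest(train):
    """Stand-in for a highly non-linear kernel: 1-nearest neighbour."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def split_node(train, dev, trainers, label):
    """train/dev are lists of (x, y) pairs with y in {0, 1}."""
    leaf = {"label": label}
    if len({y for _, y in train}) < 2:
        return leaf  # nothing left to separate in this node
    errors_before = sum(1 for _, y in dev if y != label)
    for fit in trainers:  # try each entry of the "kernel function set"
        predict = fit(train)
        errors_after = sum(1 for x, y in dev if predict(x) != y)
        if errors_after < errors_before:
            children = {}
            for side in (0, 1):
                sub_train = [(x, y) for x, y in train if predict(x) == side]
                sub_dev = [(x, y) for x, y in dev if predict(x) == side]
                children[side] = split_node(sub_train, sub_dev, trainers, side)
            return {"predict": predict, "children": children}  # the "break"
    return leaf


def build_tree(train, dev, trainers):
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return split_node(train, dev, trainers, majority)


def classify(node, x):
    while "children" in node:
        node = node["children"][node["predict"](x)]
    return node["label"]


# Toy 1-D data: positives sit on both sides of the negatives.
train = [(0, 1), (1, 1), (2, 1), (10, 1), (11, 1), (5, 0), (6, 0), (7, 0), (8, 0)]
dev = [(0.5, 1), (1.5, 1), (10.5, 1), (5.5, 0), (6.5, 0), (7.5, 0)]
tree = build_tree(train, dev, [fit_threshold, fit_nearest])
```

On this toy data a single threshold misclassifies the development point 10.5, while the tree classifies every development point correctly after one further split in each child node. With scikit-learn available, a trainer could instead wrap `sklearn.svm.SVC(kernel=...)`, one entry per kernel.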


4. Experiment

4.1 Data


6879 images are gathered by the spider. 712 images are manually labeled pornographic, and the other 6167 images are considered normal. 4-fold cross validation is conducted in all the experiments below.
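As a minimal sketch of the 4-fold protocol, the helper below splits image indices into four folds and yields each train/test split in turn. How the folds were actually chosen (shuffled, stratified by label, and so on) is not stated in the paper, so the strided assignment and the name `k_fold_indices` are assumptions made for illustration.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train_indices, test_indices) for each of the k folds.

    Sample i goes to fold i % k (an assumed, unshuffled assignment).
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 6879 images in total, as in Section 4.1.
splits = list(k_fold_indices(6879, k=4))
```

Each image appears in exactly one test fold, and the reported figures would then be averages over the four folds.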

4.2 Image feature


We use color histograms and Gabor texture in HSV color space as our features. There are 186 features for each image: 162 are color histogram features and 24 are Gabor texture features.
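One way to obtain a 162-bin HSV color histogram is sketched below. The paper does not say how the 162 bins are laid out; the 18 x 3 x 3 quantisation of hue, saturation and value used here is an assumption chosen only because it yields 162 bins, and the function name is hypothetical. The Gabor texture features are not covered.

```python
import colorsys

# Assumed quantisation (not stated in the paper): 18 * 3 * 3 = 162 bins.
H_BINS, S_BINS, V_BINS = 18, 3, 3


def hsv_histogram(pixels):
    """Return a normalised 162-bin histogram for (r, g, b) pixels in 0-255."""
    hist = [0.0] * (H_BINS * S_BINS * V_BINS)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Clamp so that a component equal to 1.0 falls into the last bin.
        hi = min(int(h * H_BINS), H_BINS - 1)
        si = min(int(s * S_BINS), S_BINS - 1)
        vi = min(int(v * V_BINS), V_BINS - 1)
        hist[(hi * S_BINS + si) * V_BINS + vi] += 1
        count += 1
    return [value / count for value in hist]


# A tiny "image" of three pixels: red, black and white.
features = hsv_histogram([(255, 0, 0), (0, 0, 0), (255, 255, 255)])
```

Normalising by the pixel count makes histograms of images with different sizes comparable.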

4.3 Evaluation Merit


The evaluation merit differs from that of retrieval because our goal is to detect pornographic images and then reject them instead of retrieving them. Some terms must be defined first:







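We use the loss to define the performance of our system. Under our setting, the loss is the sum of the missing rate and the false alarm rate, which is how the Loss row of Table 1 is obtained:

loss = missing rate + false alarm rate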
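The loss can be computed from predicted labels as in the sketch below. The definitions of the two rates did not survive in the text, so the usual ones are assumed here: the missing rate is the fraction of pornographic images classified as normal, and the false alarm rate is the fraction of normal images classified as pornographic. The unweighted sum matches the values reported in Table 1; the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return (missing_rate, false_alarm_rate, loss) in percent.

    Labels: 1 = pornographic, 0 = normal. The rate definitions are
    assumptions (see the text above), not taken from the paper.
    """
    positives = sum(1 for t in y_true if t == 1)
    negatives = sum(1 for t in y_true if t == 0)
    missed = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    missing_rate = 100.0 * missed / positives
    false_alarm_rate = 100.0 * false_alarms / negatives
    return missing_rate, false_alarm_rate, missing_rate + false_alarm_rate


# Four pornographic and five normal images: one miss and two false alarms.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1]
missing_rate, false_alarm_rate, loss = evaluate(y_true, y_pred)
```

Because each rate is normalised by its own class size, the loss is not dominated by the much larger number of normal images.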

4.4 Experiment Result


                    Single SVM                      Cluster-based SVM    Multi-kernel SVM tree
                    Linear kernel    Best kernel*
Missing rate        15.87            32.86          40.17                20.37
False alarm rate    40.44            17.64          9.50                 23.17
Loss                56.31            50.5           49.67                43.54

* Average performance of the kernel function that performs best in each fold

Table 1

4.6 Discussion


From Table 1, we can observe that the multi-kernel SVM tree even outperforms selecting the kernel which performs best on each fold of the testing set. This observation verifies that the multi-kernel SVM tree is useful.


5. Conclusion and Future Work


The multi-kernel SVM tree shares some of the spirit of AdaBoost. AdaBoost classifies a feature vector by combining several classifiers with different weights. The multi-kernel SVM tree, however, separates feature vectors into several areas and then determines the class of a feature vector with a model trained on local data. The multi-kernel SVM tree can be developed to combine several classifiers, and, unlike AdaBoost, it is able to focus on local information.

Combining several machine learning algorithms instead of only SVM may be interesting.