Question 1 - Waikato Mailing Lists


17 Oct 2013




Question 1:

***** The German credit dataset has 700 positive samples and 300 negative samples. I split it into a training set with 900 samples and a test set with 100 samples. The training set has 632 positive samples (genuine) and 268 negative samples (fraud); the test set has 68 positive samples and 32 negative samples. The German credit dataset has nominal attributes. I am working with this dataset using a support vector machine, but I do not know whether nominal attributes are suitable for an SVM, or whether the attributes have to be numeric.

I also do not know whether only cross-validation can be used with an SVM, or whether a separate training/test split can also be used, for both binary classification and one-class classification.
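
For context, here is a minimal Weka (Java) sketch of the two evaluation set-ups I am asking about; the file names german_train.arff and german_test.arff are only placeholders for my 900/100 split:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SmoEvaluationSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder file names for the 900/100 split of the German credit data.
        Instances train = DataSource.read("german_train.arff");
        Instances test  = DataSource.read("german_test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // SMO replaces missing values and turns nominal attributes into binary
        // numeric ones internally, so the nominal attributes can be used directly.
        SMO smo = new SMO();

        // Set-up 1: 10-fold cross-validation on the training data only.
        Evaluation cv = new Evaluation(train);
        cv.crossValidateModel(smo, train, 10, new Random(1));
        System.out.println("10-fold CV accuracy: " + cv.pctCorrect() + " %");

        // Set-up 2: train on the 900-instance split, evaluate on the held-out 100.
        smo.buildClassifier(train);
        Evaluation holdout = new Evaluation(train);
        holdout.evaluateModel(smo, test);
        System.out.println("Hold-out test accuracy: " + holdout.pctCorrect() + " %");
    }
}
```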


A comparison of imbalanced datasets with SMO (RBF kernel with default settings)


| Comment | Dataset | #SV | #samples | #Genuine | #Fraud | Train acc. (%) | Test acc. (%) | Train time (s) | Test time (s) |
|---|---|---|---|---|---|---|---|---|---|
| Original dataset (imbalanced) | Training sample | 629 | 900 | 632 | 268 | 70.22 | 68 | 0.41 | 0.51 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 20% reduction of fraud from imbalanced original dataset | Training sample | 463 | 846 | 632 | 214 | 74.70 | 68 | 0.36 | 0.33 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 40% reduction of fraud from imbalanced original dataset | Training sample | 377 | 793 | 632 | 161 | 79.69 | 68 | 0.24 | 0.24 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 60% reduction of fraud from imbalanced original dataset | Training sample | 310 | 739 | 632 | 107 | 85.52 | 68 | 0.18 | 0.18 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 80% reduction of fraud from imbalanced original dataset | Training sample | 181 | 686 | 632 | 54 | 92.12 | 68 | 0.12 | 0.11 |
| | Test sample | | 100 | 68 | 32 | | | | |



***** The reduction of fraud (negative) samples was done without using any filter; the fraud samples were removed directly from the dataset file.
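
In code terms, that manual reduction corresponds to something like the following Weka (Java) sketch; the class label "bad" for the fraud/negative class and the file name are assumptions on my side:

```java
import java.util.Random;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FraudReductionSketch {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("german_train.arff"); // placeholder name
        train.setClassIndex(train.numAttributes() - 1);

        // Assumed label of the fraud/negative class in the class attribute.
        int fraudIndex = train.classAttribute().indexOfValue("bad");
        double dropFraction = 0.20; // e.g. the "20% reduction of fraud" row
        Random rng = new Random(1);

        // Walk backwards so deleting instances does not shift the indices
        // that are still to be visited; drops roughly 20% of the fraud rows.
        Instances reduced = new Instances(train);
        for (int i = reduced.numInstances() - 1; i >= 0; i--) {
            if ((int) reduced.instance(i).classValue() == fraudIndex
                    && rng.nextDouble() < dropFraction) {
                reduced.delete(i);
            }
        }
        System.out.println("Remaining instances: " + reduced.numInstances());
    }
}
```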



Continue.......






A comparison of imbalanced datasets with C-SVC in LIBSVM (RBF kernel with default settings)

| Comment | Dataset | #samples | #Genuine | #Fraud | Train acc. (%) | Test acc. (%) | Train time (s) | Test time (s) |
|---|---|---|---|---|---|---|---|---|
| Original dataset (imbalanced) | Training sample | 900 | 632 | 268 | 80.44 | 74 | 0.28 | 0.37 |
| | Test sample | 100 | 68 | 32 | | | | |
| 20% reduction of fraud from imbalanced original dataset | Training sample | 846 | 632 | 214 | 82.86 | 74 | 0.16 | 0.17 |
| | Test sample | 100 | 68 | 32 | | | | |
| 40% reduction of fraud from imbalanced original dataset | Training sample | 793 | 632 | 161 | 83.60 | 68 | 0.14 | 0.13 |
| | Test sample | 100 | 68 | 32 | | | | |
| 60% reduction of fraud from imbalanced original dataset | Training sample | 739 | 632 | 107 | 85.92 | 68 | 0.13 | 0.1 |
| | Test sample | 100 | 68 | 32 | | | | |
| 80% reduction of fraud from imbalanced original dataset | Training sample | 686 | 632 | 54 | 92.12 | 68 | 0.08 | 0.09 |
| | Test sample | 100 | 68 | 32 | | | | |





A comparison of imbalanced datasets with ν-SVC in LIBSVM (RBF kernel with default settings)

| Comment | Dataset | #samples | #Genuine | #Fraud | Train acc. (%) | Test acc. (%) | Train time (s) | Test time (s) |
|---|---|---|---|---|---|---|---|---|
| Original dataset (imbalanced) | Training sample | 900 | 632 | 268 | 83.44 | 77 | 0.33 | 0.3 |
| | Test sample | 100 | 68 | 32 | | | | |
| 20% reduction of fraud from imbalanced original dataset | Training sample | 846 | 632 | 214 | 80.61 | 71 | 0.22 | 0.19 |
| | Test sample | 100 | 68 | 32 | | | | |
| 40% reduction of fraud from imbalanced original dataset | Training sample | 793 | 632 | 161 | -- | -- | -- | -- |
| | Test sample | 100 | 68 | 32 | | | | |
| 60% reduction of fraud from imbalanced original dataset | Training sample | 739 | 632 | 107 | -- | -- | -- | -- |
| | Test sample | 100 | 68 | 32 | | | | |
| 80% reduction of fraud from imbalanced original dataset | Training sample | 686 | 632 | 54 | -- | -- | -- | -- |
| | Test sample | 100 | 68 | 32 | | | | |


****** The rows marked "--" (shown in red in my original table) contain no data because the ν-parameter was infeasible.
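
If I understand the LIBSVM documentation correctly, ν-SVC is only feasible when

    nu <= 2 * min(#Genuine, #Fraud) / #samples

With the default nu = 0.5 this bound is violated from the 40% reduction onwards (for example 2 * 161 / 793 ≈ 0.41 < 0.5), which seems to match the rows marked "--".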

Question 2:

The accuracy in the three tables above increases, but I thought accuracy should decrease when the negative samples are reduced. I do not know what mistake I am making. Please help me.


***** For balancing the dataset, I followed your comment from a previous email about filtering in Weka: weka.filters.supervised.instance.SpreadSubsample can be used to balance the class distribution.
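
For reference, a minimal Weka (Java) sketch of that balancing step (the file name is only a placeholder):

```java
import java.util.Arrays;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SpreadSubsample;

public class BalanceSketch {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("german_train.arff"); // placeholder name
        train.setClassIndex(train.numAttributes() - 1);

        // distributionSpread = 1 forces a uniform class distribution, i.e. the
        // majority (genuine) class is subsampled down to the minority class size.
        SpreadSubsample balance = new SpreadSubsample();
        balance.setDistributionSpread(1.0);
        balance.setInputFormat(train);
        Instances balanced = Filter.useFilter(train, balance);

        // Print the per-class instance counts after balancing.
        System.out.println(Arrays.toString(
                balanced.attributeStats(balanced.classIndex()).nominalCounts));
    }
}
```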



A comparison of balanced training datasets with SMO (RBF kernel with default settings)

| Comment | Dataset | #SV | #samples | #Genuine | #Fraud | Train acc. (%) | Test acc. (%) | Train time (s) | Test time (s) |
|---|---|---|---|---|---|---|---|---|---|
| Original dataset (imbalanced: 900 training and 100 testing) | Training sample | 440 | 536 | 268 | 268 | 71.08 | 61 | 0.15 | 0.16 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 20% reduction of fraud from imbalanced original dataset | Training sample | 409 | 482 | 268 | 214 | 72.61 | 66 | 0.31 | 0.16 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 40% reduction of fraud from imbalanced original dataset | Training sample | 333 | 429 | 161 | 161 | 69.93 | 72 | 0.08 | 0.2 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 60% reduction of fraud from imbalanced original dataset | Training sample | 230 | 375 | 107 | 107 | 71.46 | 68 | 0.06 | 0.06 |
| | Test sample | | 100 | 68 | 32 | | | | |
| 80% reduction of fraud from imbalanced original dataset | Training sample | 152 | 322 | 54 | 54 | 83.22 | 68 | 0.05 | 0.05 |
| | Test sample | | 100 | 68 | 32 | | | | |






Continue.....















A comparison of balanced training datasets with C-SVC in LIBSVM (RBF kernel with default settings)

| Comment | Dataset | #samples | #Genuine | #Fraud | Train acc. (%) | Test acc. (%) | Train time (s) | Test time (s) |
|---|---|---|---|---|---|---|---|---|
| Original dataset (imbalanced) | Training sample | 536 | 268 | 268 | 79.29 | 68 | 0.9 | 0.9 |
| | Test sample | 100 | 68 | 32 | | | | |
| 20% reduction of fraud from imbalanced original dataset | Training sample | 482 | 268 | 214 | 79.66 | 72 | 0.25 | 0.08 |
| | Test sample | 100 | 68 | 32 | | | | |
| 40% reduction of fraud from imbalanced original dataset | Training sample | 429 | 268 | 161 | 80.41 | 73 | 0.11 | 0.09 |
| | Test sample | 100 | 68 | 32 | | | | |
| 60% reduction of fraud from imbalanced original dataset | Training sample | 375 | 268 | 107 | 83.46 | 74 | 0.05 | 0.05 |
| | Test sample | 100 | 68 | 32 | | | | |
| 80% reduction of fraud from imbalanced original dataset | Training sample | 322 | 268 | 54 | 84.78 | 68 | 0.05 | 0.05 |
| | Test sample | 100 | 68 | 32 | | | | |


A comparison of balanced training datasets with ν-SVC in LIBSVM (RBF kernel with default settings)

| Comment | Dataset | #samples | #Genuine | #Fraud | Train acc. (%) | Test acc. (%) | Train time (s) | Test time (s) |
|---|---|---|---|---|---|---|---|---|
| Original dataset (imbalanced) | Training sample | 536 | 268 | 268 | 88.99 | 65 | 0.23 | 0.11 |
| | Test sample | 100 | 68 | 32 | | | | |
| 20% reduction of fraud from imbalanced original dataset | Training sample | 482 | 268 | 214 | 89.21 | 68 | 0.19 | 0.11 |
| | Test sample | 100 | 68 | 32 | | | | |
| 40% reduction of fraud from imbalanced original dataset | Training sample | 429 | 268 | 161 | 89.51 | 67 | 0.08 | 0.06 |
| | Test sample | 100 | 68 | 32 | | | | |
| 60% reduction of fraud from imbalanced original dataset | Training sample | 375 | 268 | 107 | 85.6 | 73 | 0.08 | 0.06 |
| | Test sample | 100 | 68 | 32 | | | | |
| 80% reduction of fraud from imbalanced original dataset | Training sample | 322 | 268 | 54 | -- | -- | -- | -- |
| | Test sample | 100 | 68 | 32 | | | | |


****** The rows marked "--" (shown in red in my original table) contain no data because the ν-parameter was infeasible.

Continue...


Question 3:

Sorry, in a previous mail I asked you to explain how to do one-class classification in Weka, and you answered: "weka.filters.unsupervised.instance.SubsetByExpression can be used to filter out instances with a particular value for an attribute. After you create a new data set with only one class value present, you will need to edit the ARFF file to remove the no longer present class values from the class attribute declaration in the header of the file." But I do not understand why an unsupervised filter is used in Weka for one-class SVM classification, when one-class classification is known as a supervised task.