Aggregating Support Vector Models for Cross-Selling Prediction
Si Jie Phua
School of Computer Engineering, Nanyang Technological University, Singapore
This paper demonstrates a methodology for finding better solutions to a cross-selling business problem. The dataset is obtained from the PAKDD Competition 2007 [1]. We use an ensemble of Support Vector Machine (SVM) [2, 3, 4] classifiers to obtain faster training speed and better generalization.
The paper is organized as follows: Section 2 lists the modifications made to the dataset before training and prediction. Section 3 briefly explains the support vector machine (SVM) and the ensemble classifier. Section 4 gives the parameter settings used to train the model. Section 5 concludes the results with a discussion of the business insight.
Modifications are made to the dataset before training and prediction in order to fit the data mining technique used.
The range-valued attribute categories “< 30K”, “30K – < 90K”, “90K – < 240K”, “240K – < 360K”, and “360K+” are changed to 1, 2, 3, 4, 5, 6 correspondingly. This modification is intended to represent the ordinal property of the attribute.
The values of other categorical attributes are likewise recoded to ordinal codes to represent the ordinal property of each attribute.
For all bureau-related attributes except “B_DEF_PAID_IND” and “B_DEF_UNPD_IND”, the code 98 is changed to 1 and 99 is changed to 2 so that the ordering of customers' contact frequencies with the bureau is preserved.
The training dataset is large and the class distribution is unbalanced. Thus, we propose to train a few Support Vector Machine (SVM) models, where each model is trained on a portion of the training dataset, and then to classify new instances by combining the outputs of the trained models.
SVM is developed on the concept of decision planes that define decision boundaries. A decision plane separates a set of objects with different class memberships. SVM selects as its decision plane the hyperplane that separates the positive and negative samples with the maximal margin.
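As a sketch of the standard soft-margin formulation (following the cited tutorials), the maximum-margin hyperplane $(w, b)$ solves:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\qquad \text{subject to} \qquad
y_{i}\,(w\cdot x_{i} + b)\ \ge\ 1-\xi_{i},\quad \xi_{i}\ge 0,
```

where $y_i \in \{+1,-1\}$ are the class labels, $\xi_i$ are slack variables for non-separable samples, and $C$ is the cost parameter discussed in the parameter settings.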
Among the training techniques for support vector machines, the sequential minimal optimization (SMO) [5] algorithm in YALE [6] is used, as it can handle both numerical and nominal attributes.
SVM usually has good generalization ability due to its margin maximization strategy. However, the training time of SVM usually increases tremendously with the size of the dataset. Thus, we partition the training dataset to train a few SVM models so that training is faster.
We have constructed 9 training datasets from the original training dataset to train 9 SVM models. Each training dataset contains all the positive instances and an equal number of negative instances. In this way, we reduce the effect of the unbalanced class distribution.
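The construction of the balanced training sets can be sketched as follows; the helper name and the toy data are illustrative, not from the paper.

```python
# Sketch of the balanced resampling described above: each of the 9
# training sets contains every positive instance plus an equally
# sized random sample of negatives.
import random

def make_balanced_sets(instances, labels, n_sets=9, seed=0):
    """Build n_sets training sets, each with all positives and an
    equal number of randomly sampled negatives."""
    rng = random.Random(seed)
    positives = [x for x, y in zip(instances, labels) if y == 1]
    negatives = [x for x, y in zip(instances, labels) if y == 0]
    sets = []
    for _ in range(n_sets):
        sampled_neg = rng.sample(negatives, len(positives))
        data = [(x, 1) for x in positives] + [(x, 0) for x in sampled_neg]
        rng.shuffle(data)
        sets.append(data)
    return sets

# Toy data: 4 positives, 100 negatives.
X = list(range(104))
y = [1] * 4 + [0] * 100
train_sets = make_balanced_sets(X, y)
print(len(train_sets))     # 9
print(len(train_sets[0]))  # 8 (4 positives + 4 negatives)
```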
For the purpose of prediction, we combine the outputs of the trained classifiers using a combination scheme. This type of classifier is known as an ensemble classifier [7]. An ensemble classifier generally has low variance and thus better generalization ability than a single classifier.
With the strategies of sampling, ensemble classification, and SVM, we aim to provide a good prediction of cross-selling.
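The whole pipeline can be sketched as below. scikit-learn's `SVC` stands in for the YALE/SMO implementation used in the paper, the data is synthetic, and averaging decision values is one plausible combination scheme, not necessarily the paper's exact one.

```python
# Illustrative ensemble of SVMs: each model is trained on one
# balanced subset; a new instance's score is the mean of the
# individual decision values. All data here is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs, heavily unbalanced.
X_pos = rng.normal(loc=2.0, scale=1.0, size=(20, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(200, 2))

models = []
for _ in range(9):
    # Balanced subset: all positives + equally many random negatives.
    idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
    X_train = np.vstack([X_pos, X_neg[idx]])
    y_train = np.array([1] * len(X_pos) + [0] * len(X_pos))
    clf = SVC(C=1.0, kernel="linear")  # cost parameter set to 1
    models.append(clf.fit(X_train, y_train))

def ensemble_score(x):
    """Average the decision values of all trained SVMs."""
    x = np.asarray(x).reshape(1, -1)
    return float(np.mean([m.decision_function(x) for m in models]))

print(ensemble_score([2.5, 2.5]) > 0)    # positive side of the margin
print(ensemble_score([-2.5, -2.5]) < 0)  # negative side of the margin
```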
Scoring Model Results
The only important parameter is the cost parameter of SVM. We have set it to 1 for
faster training speed.
By applying the trained models to the dataset to be predicted, we obtain the results listed in Table 1.
Table 1: Prediction of the test dataset (number of instances).
The higher the score, the more likely the customer is to sign up for a new home loan with the company within 12 months of opening the credit card account. As a result, the company should focus more on the customers with high scores in order to maximize its cross-selling returns.
References
[1] PAKDD Competition 2007.
[2] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, Vol. 2, No. 2, 1998, pp. 121–167.
[3] H. Byun and S.-W. Lee, “A survey on pattern recognition applications of support vector machines,” International Journal of Pattern Recognition and Artificial Intelligence, Vol. 17, No. 3, 2003, pp. 459–486.
[4] M. A. Hearst, “Support vector machines,” IEEE Intelligent Systems, Vol. 13, No. 4, July/August 1998, pp. 18–28.
[5] J. C. Platt, “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines,” Microsoft Research Technical Report MSR-TR-98-14, 1998.
[6] YALE: Yet Another Learning Environment.
[7] Classifier Ensembles.