SUPPORT VECTOR MACHINE LEARNING FOR DETECTION OF
MICROCALCIFICATIONS IN MAMMOGRAMS
Issam El-Naqa, Yongyi Yang, Miles N. Wernick, Nikolas P. Galatsanos, and Robert Nishikawa*
Dept. of Electrical and Computer Engineering, Illinois Institute of Technology
3301 S. Dearborn Street, Chicago, IL 60616
*Department of Radiology, University of Chicago
5841 South Maryland Avenue, Chicago, IL 60637
This work was supported in part by NIH/NCI grant CA89668.
Microcalcification (MC) clusters in mammograms can be
an indicator of breast cancer. In this work we propose for
the first time the use of support vector machine (SVM)
learning for automated detection of MCs in digitized
mammograms. In the proposed framework, MC detection
is formulated as a supervised-learning problem and the
method of SVM is employed to develop the detection
algorithm. The proposed method is developed and
evaluated using a database of 76 mammograms
containing 1120 MCs. To evaluate detection
performance, free-response receiver operating
characteristic (FROC) curves are used. Experimental
results demonstrate that, when compared to several other
existing methods, the proposed SVM framework offers the
best performance.
Microcalcification (MC) clusters are an indicator of breast
cancer, which is a leading cause of death in women. MCs
are tiny calcium deposits that appear as small bright spots
in a mammogram (as illustrated in Fig. 1). Individual MCs
are sometimes difficult to detect because of the
surrounding breast tissue and their variation in shape,
orientation, brightness, and size (typically 0.05-1 mm) [1].
In the literature, a great many image-processing
methods have been proposed to detect MCs automatically.
Here, we briefly cite a few. A statistical Bayesian image
analysis model was developed in [2]. A difference image
technique was investigated in [3]. Wavelet-based
approaches were studied in [4]. A detection scheme using
multi-scale analysis was proposed in [5]. Methods based
on weighted difference of Gaussian filtering were used in
[6]. A method based on higher-order statistics was
developed in [7]. A fuzzy logic approach was proposed in
[8]. A 2D adaptive lattice algorithm was used to predict
correlated clutter in the mammogram in [9]. Fractal
modeling was proposed in [10]. A method based on
region growing and active contours was studied in [11].
More recently, a two-stage neural network approach was
proposed in [12].
In this work we investigate for the first time the use of a
support vector machine (SVM) learning framework for
MC detection, and show that it provides the best
performance among the methods we have tested so far.
SVM learning is based on the principle of structural risk
minimization [13]. Instead of directly minimizing the learning
error, it aims to minimize a bound on the generalization
error. As a result, an SVM is able to perform well when
applied to data outside the training set. In recent years
SVM learning has been applied to a wide range of
real-world applications, where it has been found to offer
superior performance to that of competing methods [14].
In the proposed work, MC detection is considered as a
two-class pattern classification task performed at each
location in the mammogram. The two classes are MC
present and MC absent. With an SVM formulation, a
nonlinear classifier is trained using supervised learning to
automatically detect the presence of MCs in a mammogram.
Figure 1. A section of a mammogram containing multiple
MCs (labeled with circles).
2.1. SVM classifier
The basic idea of an SVM classifier is illustrated in Fig. 2.
This figure shows the simplest case in which the data
vectors (marked by Xs and Os) can be separated by a
hyperplane. In such a case there may exist many
separating hyperplanes. Among them, the SVM classifier
seeks the separating hyperplane that produces the largest
separation margin [13,14]. Such a scheme is known to be
associated with structural risk minimization [13].
In the more general case in which the data points are
not linearly separable in the input space, a nonlinear
transformation is used to map the data vector x into a
high-dimensional space (called feature space) prior to
applying the linear maximum-margin classifier. To avoid
the potential pitfall of over-fitting in this higher-
dimensional space, an SVM uses a kernel function in
which the nonlinear mapping is implicitly embedded.
According to Cover's theorem, a function qualifies as
a kernel provided that it satisfies Mercer's conditions.
With the use of a kernel, the discriminant function in an
SVM classifier has the following form:
$$g(\mathbf{x}) = \sum_{i=1}^{N_s} \alpha_i d_i K(\mathbf{x}_i, \mathbf{x}) + \alpha_0, \tag{1}$$
where $K(\cdot,\cdot)$ is the kernel function, $\mathbf{x}_i$ are the
support vectors determined from training data, $N_s$ is the
number of support vectors, $d_i$ is the class indicator (e.g.,
+1 for class 1 and -1 for class 2) associated with each
support vector, and $\alpha_i$ are constants, also determined from training.
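As a minimal sketch of how a discriminant of this kind could be evaluated once training is done, the Python snippet below assumes the standard kernel-expansion form g(x) = sum_i alpha_i d_i K(x_i, x) + alpha_0. The function names and the zero decision threshold are illustrative assumptions, not details taken from this work.

```python
import numpy as np

def discriminant(x, support_vectors, alphas, labels, alpha0, kernel):
    """Evaluate g(x) = sum_i alpha_i * d_i * K(x_i, x) + alpha_0."""
    total = float(alpha0)
    for x_i, a_i, d_i in zip(support_vectors, alphas, labels):
        total += a_i * d_i * kernel(x_i, x)
    return total

def classify(x, support_vectors, alphas, labels, alpha0, kernel):
    """Return 1 (MC present) if g(x) > 0, else 2 (MC absent).

    Thresholding at zero is an assumption; an operating point could be
    chosen by shifting this threshold.
    """
    g = discriminant(x, support_vectors, alphas, labels, alpha0, kernel)
    return 1 if g > 0 else 2
```

Only the support vectors enter the sum, so the cost of evaluating g(x) grows with the number of support vectors rather than with the size of the training set.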
By definition, support vectors (Fig. 2) are elements of
the training set that lie either exactly on or inside the
decision boundaries of the classifier. In essence, they
consist of those training examples that are most difficult
to classify. The SVM classifier uses these borderline
examples to define its decision boundary between the two
classes. This philosophy is quite different from that of a
classifier that is based on minimizing learning error alone.
Note that in a typical SVM learning problem only a small
portion of the training examples will qualify as support vectors.
2.2. Design of SVM Classifier for MC Detection
A. Input feature vector
MCs appear as tiny bright spots in a mammogram. To test
the presence of an MC at a given location, we use as the input
pattern to the SVM the pixel values in a small M × M
window centered at that location. We chose M = 9 to
accommodate the MCs, the average size of which was
around 6-7 pixels in diameter in our data set. Such a
window size can effectively avoid any potential
interference from neighboring MCs.
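The window-to-vector step can be sketched in Python as follows. The function name and the reflective handling of image borders are assumptions made for illustration; the text does not say how locations near the image edge were treated.

```python
import numpy as np

def extract_window(image, row, col, m=9):
    """Return the m x m window centered at (row, col) as a flat feature vector.

    Border handling is an assumption: the image is reflected at its edges
    so that a full window exists at every pixel location.
    """
    half = m // 2
    padded = np.pad(np.asarray(image, dtype=float), half, mode="reflect")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so the window centered on it starts at (row, col) in padded coordinates.
    window = padded[row:row + m, col:col + m]
    return window.ravel()
```

With M = 9 each location yields an 81-dimensional input vector, and the center pixel of the window is element 40 of that vector.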
Figure 2. Support vector machine classification with
a linear hyperplane that maximizes the separating
margin between the two classes.
Alternatively, other image features (e.g., local edges)
might prove more salient than pixel values for
the input pattern. However, it is not clear what defines the
complete set of salient features deemed relevant for MC
detection. Thus, we used image pixels.
The image must be preprocessed before the pixel values
are used. A high-pass filter with a very narrow stop-band
was applied to mitigate the effect of spatial inhomogeneity
of the background. This filter was designed as a length-41,
linear-phase FIR filter with cutoff frequency w
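One possible way to realize such a background-suppressing filter is sketched below in Python. Several choices here are assumptions rather than the authors' design: the cutoff value (a placeholder of 0.05 cycles/pixel), the Hamming-windowed sinc prototype, the separable 2-D application, and the reflective border handling. The sketch builds a length-41 symmetric (hence linear-phase) low-pass prototype and subtracts the low-pass background estimate from the image, which is equivalent to a high-pass filter with a narrow stop-band around DC.

```python
import numpy as np

def lowpass_taps(num_taps=41, cutoff=0.05):
    """Hamming-windowed sinc low-pass taps; cutoff is in cycles/sample (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    taps = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    return taps / taps.sum()  # normalize to unit gain at DC

def highpass_image(image, num_taps=41, cutoff=0.05):
    """Remove the slowly varying background by subtracting a low-pass estimate."""
    image = np.asarray(image, dtype=float)
    taps = lowpass_taps(num_taps, cutoff)
    half = num_taps // 2
    padded = np.pad(image, half, mode="reflect")
    # Filter along rows, then along columns; 'valid' removes the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, taps, mode="valid"), 1, padded)
    background = np.apply_along_axis(lambda c: np.convolve(c, taps, mode="valid"), 0, rows)
    return image - background
```

Because the low-pass taps sum to one, a constant background maps to zero while small bright spots pass through almost unchanged.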
B. SVM kernel functions
The kernel function plays the central role of implicitly
mapping the input vector into a high-dimensional feature
space, in which better separability is achieved. In this
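study the following two types of kernel functions are considered:
1. Polynomial kernel
$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T \mathbf{y} + 1)^p, \quad \text{where } p > 0 \text{ is a constant.}$$
2. Gaussian radial basis function (RBF) kernel
$$K(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right),$$
where $\sigma > 0$ is a constant that defines the kernel width.
Both of these kernels satisfy Mercer's conditions,
and are among the most commonly used in SVMs.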
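The two kernels can be written in a few lines of Python. This sketch assumes the common forms K(x, y) = (x^T y + 1)^p and K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); the function names and default parameter values are illustrative. The helper `gram_matrix` is a hypothetical convenience that builds the matrix of pairwise kernel values used during training.

```python
import numpy as np

def polynomial_kernel(x, y, p=3):
    """Polynomial kernel (x^T y + 1)^p with order p > 0."""
    return float((np.dot(x, y) + 1.0) ** p)

def gaussian_rbf_kernel(x, y, sigma=5.0):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2)) with width sigma > 0."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def gram_matrix(samples, kernel, **params):
    """Matrix of kernel values K(x_i, x_j) for all pairs of samples."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            gram[i, j] = kernel(samples[i], samples[j], **params)
    return gram
```

For a kernel satisfying Mercer's conditions the resulting matrix is symmetric and positive semidefinite, which is what makes the training problem convex.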
C. Training examples
Training examples are gathered from the mammograms as
follows: for the MC present class (designated Class 1),
image windows of size M × M are collected at the
centers of mass of the MCs identified in the database; for
the MC absent class (designated Class 2), image
windows are collected from those regions of the image
containing no MCs. Because there are typically far more
background regions than regions containing MCs, a
random sampling scheme is adopted for Class-2 examples
so that the training examples are representative of all the
mammograms.
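A minimal Python sketch of this collection step is given below. The function name, the uniform random sampling of background locations, and the rule that a background window may not overlap the window of any marked MC are assumptions for illustration; the exact sampling scheme is not specified in the text. MC centers too close to the image border are simply skipped here.

```python
import numpy as np

def collect_examples(image, mc_centers, n_background, m=9, seed=0):
    """Return (X, d): flattened m x m windows and labels (+1 Class 1, -1 Class 2)."""
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    half = m // 2
    centers = np.asarray(mc_centers, dtype=int).reshape(-1, 2)
    n_rows, n_cols = image.shape

    def window(r, c):
        return image[r - half:r + half + 1, c - half:c + half + 1].ravel()

    def inside(r, c):
        return half <= r < n_rows - half and half <= c < n_cols - half

    # Class 1: windows centered on the marked MC locations.
    positives = [window(r, c) for r, c in centers if inside(r, c)]

    # Class 2: random background locations; loops until enough are found.
    negatives = []
    while len(negatives) < n_background:
        r = int(rng.integers(half, n_rows - half))
        c = int(rng.integers(half, n_cols - half))
        # Two m x m windows overlap when their centers are closer than m
        # pixels in both directions (Chebyshev distance below m).
        if centers.size and np.min(np.max(np.abs(centers - [r, c]), axis=1)) < m:
            continue
        negatives.append(window(r, c))

    X = np.array(positives + negatives)
    d = np.array([1] * len(positives) + [-1] * len(negatives))
    return X, d
```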
D. SVM training
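Let $\mathbf{x}_i$, $i = 1, 2, \ldots, N$, denote the input feature vector for each of the
training set elements. Then the desired response of the
SVM classifier is $d_i = +1$ if $\mathbf{x}_i$ belongs to Class 1 and
$d_i = -1$ if $\mathbf{x}_i$ belongs to Class 2.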
The support vectors and other parameters in the
decision function g(x) in (1) are determined through
numerical optimization during the training phase.
Specifically, the dual form of the optimization problem
for maximal margin separation is given as:
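$$\min_{\boldsymbol{\alpha}} J(\boldsymbol{\alpha}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j K(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{N} \alpha_i \tag{2}$$
subject to the following constraints:
(1) $\sum_{i=1}^{N} \alpha_i d_i = 0$; and
(2) $0 \le \alpha_i \le C$ for $i = 1, 2, \ldots, N$,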
where N is the total number of training samples and C is a
positive regularization parameter that controls the
trade-off between complexity of the machine and the allowed
classification error.
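To make the dual problem concrete, the Python sketch below evaluates the objective and checks the two constraints for a candidate vector of multipliers. It assumes the dual is written as a minimization, J(alpha) = 0.5 * sum_ij alpha_i alpha_j d_i d_j K(x_i, x_j) - sum_i alpha_i; the function names are illustrative.

```python
import numpy as np

def dual_objective(alpha, d, gram):
    """J(alpha) = 0.5 * sum_ij alpha_i alpha_j d_i d_j K_ij - sum_i alpha_i."""
    alpha = np.asarray(alpha, dtype=float)
    v = alpha * np.asarray(d, dtype=float)
    return float(0.5 * v @ np.asarray(gram, dtype=float) @ v - alpha.sum())

def is_feasible(alpha, d, C, tol=1e-8):
    """Check sum_i alpha_i d_i = 0 and 0 <= alpha_i <= C (within tol)."""
    alpha = np.asarray(alpha, dtype=float)
    balanced = abs(float(alpha @ np.asarray(d, dtype=float))) <= tol
    boxed = bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol))
    return balanced and boxed
```

As a small worked case, take two one-dimensional samples x_1 = 1 (Class 1) and x_2 = -1 (Class 2) with a linear kernel. Feasibility forces alpha_1 = alpha_2 = a, which gives J = 2a^2 - 2a, minimized at a = 0.5 with J = -0.5.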
It is noted that the number of training samples used in
this study is rather large (on the order of several
thousand). Traditional optimization algorithms can no
longer be applied efficiently in this case. Fortunately,
more efficient algorithms have been developed in recent
years for the SVM optimization problem. These
algorithms typically take advantage of the fact that the
Lagrange multipliers in (2) are mostly zero. In this study,
a technique called sequential minimal optimization (SMO) [15] is
used.
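The pairwise-update idea behind sequential minimal optimization can be sketched in Python as follows. This is a simplified variant written for illustration, not Platt's full algorithm and not the implementation used in this study: the partner-selection rule (largest difference in prediction error), the tolerances, and the function names are all assumptions. Each step changes two multipliers at once so that the equality constraint and the box constraints stay satisfied.

```python
import numpy as np

def smo_train(X, d, C, kernel, tol=1e-3, max_sweeps=1000):
    """Return (alpha, bias) for the soft-margin SVM dual via pairwise updates."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(d)
    K = np.array([[kernel(a, b) for b in X] for a in X])  # Gram matrix
    alpha = np.zeros(n)
    bias = 0.0
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            errors = (alpha * d) @ K + bias - d  # E_k = g(x_k) - d_k
            e_i = errors[i]
            # Skip multipliers that already satisfy the KKT conditions.
            if not ((d[i] * e_i < -tol and alpha[i] < C) or
                    (d[i] * e_i > tol and alpha[i] > 0)):
                continue
            # Try partners in order of decreasing |E_i - E_j|.
            for j in np.argsort(-np.abs(errors - e_i)):
                if j == i:
                    continue
                a_i, a_j = alpha[i], alpha[j]
                # Limits keeping sum_k alpha_k d_k = 0 and 0 <= alpha <= C.
                if d[i] != d[j]:
                    low, high = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    low, high = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
                if high - low < 1e-12 or eta >= 0:
                    continue
                new_j = float(np.clip(a_j - d[j] * (e_i - errors[j]) / eta, low, high))
                if abs(new_j - a_j) < 1e-7:
                    continue  # no useful progress with this partner
                new_i = a_i + d[i] * d[j] * (a_j - new_j)
                b1 = (bias - e_i - d[i] * (new_i - a_i) * K[i, i]
                      - d[j] * (new_j - a_j) * K[i, j])
                b2 = (bias - errors[j] - d[i] * (new_i - a_i) * K[i, j]
                      - d[j] * (new_j - a_j) * K[j, j])
                alpha[i], alpha[j] = new_i, new_j
                if 0 < new_i < C:
                    bias = b1
                elif 0 < new_j < C:
                    bias = b2
                else:
                    bias = 0.5 * (b1 + b2)
                changed = True
                break
        if not changed:
            break  # a full sweep made no update
    return alpha, bias

def decision_values(X_train, d, alpha, bias, kernel, X_new):
    """g(x) = sum_i alpha_i d_i K(x_i, x) + bias for each row of X_new."""
    return np.array([sum(a * t * kernel(s, x) for a, t, s in zip(alpha, d, X_train)) + bias
                     for x in X_new])
```

Samples whose multiplier ends at zero play no role in the decision function, which is why only a fraction of the training set is retained as support vectors.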
E. SVM model selection
During the training phase, the following variables need to
be determined for the SVM classifier: the kernel function
to use, and the regularization parameter C. For this
purpose, we adopt a widely used statistical method called
m-fold cross-validation, which consists of the following
steps: 1) divide randomly all the available training
examples into m equal-sized subsets; 2) use all but one
subset to train the SVM; 3) use the held-out subset to
measure classification error; 4) repeat Steps 2 and 3 for
each subset; 5) average the results to get an estimate of the
generalization error of the SVM classifier. The SVM was
tested using this procedure for various parameter settings.
In the end, the model with the smallest generalization
error was adopted.
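The cross-validation steps above can be sketched in Python as follows. The training and prediction routines are passed in as callables (`train_fn` and `predict_fn` are hypothetical names), so the sketch does not depend on any particular SVM implementation.

```python
import numpy as np

def cross_validation_error(X, d, train_fn, predict_fn, n_folds=10, seed=0):
    """Estimate the generalization error by n_folds-fold cross-validation."""
    X = np.asarray(X)
    d = np.asarray(d)
    rng = np.random.default_rng(seed)
    # Step 1: randomly divide the examples into (nearly) equal-sized subsets.
    folds = np.array_split(rng.permutation(len(d)), n_folds)
    fold_errors = []
    for k, held_out in enumerate(folds):
        # Step 2: train on all subsets except the k-th.
        train_idx = np.concatenate([f for m, f in enumerate(folds) if m != k])
        model = train_fn(X[train_idx], d[train_idx])
        # Step 3: measure the classification error on the held-out subset.
        predictions = predict_fn(model, X[held_out])
        fold_errors.append(np.mean(predictions != d[held_out]))
    # Steps 4 and 5: the loop repeats for every subset; average the results.
    return float(np.mean(fold_errors))
```

Model selection then amounts to calling this function once per candidate setting (kernel type, kernel parameter, and C) and keeping the setting with the smallest estimated error.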
3. EXPERIMENTAL RESULTS
3.1. Data set
The proposed algorithm was developed and evaluated
using a data set provided by the Department of Radiology
at the University of Chicago. This data set consists of 76
mammograms, digitized with a spatial resolution of 0.1
mm/pixel and 10-bit grayscale. In the data set, 1120
individual MCs were identified by experienced mammographers.
In this work, the mammograms in this data set were
divided equally into two subsets in a random fashion, one
used exclusively for training (designated training
mammogram set), and the other exclusively for testing
(designated test mammogram set).
3.2. Training and model selection results
The examples used for SVM training include a total of
547 for Class 1, and twice as many for Class 2. Such a
choice is a result of compromise between the vast number
of available Class-2 examples and the complexity of the
The SVM classifier was then trained using a 10-fold
cross-validation procedure for various model and
parametric settings. In Fig. 3 we show a plot of the
estimated generalization error rate for the trained SVM
classifier with the Gaussian RBF kernel. A generalization
error level as low as 6% was achieved under various parametric
settings. These results demonstrate that the performance
of the SVM classifier is rather robust over the choice of
model parameters.
Interestingly, a similar error level was also achieved when
the polynomial kernel with p=3 was used. Due to space
limitations, these results are not shown.
In the evaluation study below, the SVM classifier using
the Gaussian RBF kernel with σ = 5 and C = 1000 was
used. The number of resulting support vectors for this
case was about 12% of the total number of training
samples; the training time was about 7 seconds
(implemented in MATLAB on a 933 MHz Pentium III).
Figure 3. Plot of generalization errors achieved by the SVM
classifier using the Gaussian RBF kernel.
3.3. Other methods for comparison
The proposed algorithm was compared with four other
existing methods for MC detection: (1) the image
difference technique (IDT) in [3]; (2) the difference of
Gaussians (DoG) method in [6]; (3) the wavelet-based
method in [4]; and (4) the two-stage multi-layer neural
network method in [12].
In our implementation, these methods were typically
run for numerous parameter settings and the one yielding
the best result was chosen for the final evaluation.
3.4. Evaluation Results
The detection performance was evaluated quantitatively
using the free-response receiver operating characteristic
(FROC) curves [16]. An FROC curve plots the correct
detection rate (i.e., true-positive fraction) versus the
average number of false-alarms (i.e., false-positives) per
image for the continuum of the decision threshold. The
FROC curve provides a comprehensive summary of the
trade-off between missed detections and false alarms.
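A minimal Python sketch of how FROC operating points could be computed from scored detections is given below. The data layout is an assumption: each detection is a pair (score, matched_id), where matched_id is the index of the true cluster it hits, or None for a false alarm. The rule used to decide whether a detection matches a true cluster is not covered here and would have to be supplied separately.

```python
def froc_points(detections, n_true, n_images, thresholds):
    """Return a list of (false alarms per image, true-positive fraction) pairs.

    detections: iterable of (score, matched_id); matched_id is None for a
    false alarm, otherwise the index of the true cluster that was hit.
    """
    points = []
    for t in thresholds:
        kept = [(s, m) for s, m in detections if s >= t]
        # A true cluster counts once, however many detections hit it.
        hits = {m for _, m in kept if m is not None}
        false_alarms = sum(1 for _, m in kept if m is None)
        points.append((false_alarms / n_images, len(hits) / n_true))
    return points
```

Sweeping the threshold from high to low traces the curve from the conservative end (few false alarms, few detections) to the permissive end.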
All the detection algorithms were evaluated using the
same set of 38 test mammograms. The results are
summarized using FROC curves in Fig. 4. As can be seen,
the SVM classifier offers the best result in the operating
range with less than 3 false clusters per image.
The small section shown in Fig. 1 was from a test
mammogram containing several MCs; these MCs (though
some of them are hardly visible) were all successfully
detected by the SVM classifier and labeled with circles.
Figure 4. FROC curves obtained for the different
methods evaluated. A higher FROC curve indicates better
performance. The most significant portion of the curves
is at the low end of the number of false positive clusters,
where one would prefer to operate.
In this work we demonstrated an SVM-based classifier to
detect microcalcifications in mammogram images.
Experimental results show that the proposed framework is
quite robust over the choice of several model parameters.
In these initial results the SVM classifier outperformed all
the other methods considered.
[1] M. Lanyi, Diagnosis and Differential Diagnosis of Breast
Calcifications, Springer-Verlag, Berlin, 1988.
[2] N. Karssemeijer, "A stochastic model for automated
detection of calcifications in digital mammograms," in Proc. 12th
Int. Conf. Info. Med. Imag., Wye, UK, July 1991.
[3] R. M. Nishikawa, et al., "Computer aided detection of
clustered microcalcifications in digital mammograms," Med.
Biol. Eng. Comp., vol. 33, 1995.
[4] R. N. Strickland and H. L. Hahn, "Wavelet transform
methods for object detection and recovery," IEEE Trans. Image
Processing, vol. 6, pp. 724-735, May 1997.
[5] T. Netsch, "A scale-space approach for the detection of
clustered microcalcifications in digital mammograms," 3rd Int.
Workshop on Digital Mammography, 1996.
[6] J. Dengler, S. Behrens, and J. F. Desaga, "Segmentation of
microcalcifications in mammograms," IEEE Trans. Med. Imag.,
vol. 12, no. 4, 1993.
[7] M. N. Gurcan, et al., "Detection of microcalcifications in
mammograms using higher order statistics," IEEE Signal Proc.
Lett., vol. 4, no. 8, 1997.
[8] H. Cheng, Y. M. Lui, and R. I. Freimanis, "A novel
approach to microcalcifications detection using fuzzy logic
techniques," IEEE Trans. Med. Imag., vol. 17, no. 3, June 1998.
[9] P. A. Ffrench, J. R. Zeidler, and W. H. Ku, "Enhanced
detectability of small objects in correlated clutter using an
improved 2-D adaptive lattice algorithm," IEEE Trans. Imag.
Proc., vol. 6, no. 3, 1997.
[10] H. Li, K. J. Liu, and S. B. Lo, "Fractal modeling and
segmentation for the enhancement of microcalcifications in
digital mammograms," IEEE Trans. Med. Imag., vol. 16, no. 6,
[11] I. N. Bankman, et al., "Segmentation algorithms for
detecting microcalcifications in mammograms," IEEE Trans.
Info. Tech. in Biomed., vol. 1, no. 2, June 1997.
[12] S. Yu and L. Guan, "A CAD system for the automatic
detection of clustered microcalcifications in digitized
mammogram films," IEEE Trans. Med. Imag., vol. 19,
pp. 115-126, Feb. 2000.
[13] V. Vapnik, Statistical Learning Theory, Wiley, 1998.
[14] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B.
Scholkopf, "An introduction to kernel-based learning
algorithms," IEEE Trans. Neural Networks, vol. 12, no. 2,
[15] J. Platt, "Fast training of support vector machines using
sequential minimal optimization," in Advances in Kernel Methods:
Support Vector Learning, B. Scholkopf et al., eds., MIT Press, 1998.
[16] P. C. Bunch, et al., "A free-response approach to the
measurement and characterization of radiographic-observer
performance," J. Appl. Photogr. Eng., vol. 4, 1978.