SUPPORT VECTOR MACHINE LEARNING FOR DETECTION OF MICROCALCIFICATIONS IN MAMMOGRAMS

Issam El-Naqa, Yongyi Yang, Miles N. Wernick, Nikolas P. Galatsanos, and Robert Nishikawa*

Dept. of Electrical and Computer Engineering, Illinois Institute of Technology
3301 S. Dearborn Street, Chicago, IL 60616

*Department of Radiology, University of Chicago
5841 South Maryland Avenue, Chicago, IL 60637

This work was supported in part by NIH/NCI grant CA89668.

ABSTRACT

Microcalcification (MC) clusters in mammograms can be an indicator of breast cancer. In this work we propose for the first time the use of support vector machine (SVM) learning for automated detection of MCs in digitized mammograms. In the proposed framework, MC detection is formulated as a supervised-learning problem and the method of SVM is employed to develop the detection algorithm. The proposed method is developed and evaluated using a database of 76 mammograms containing 1120 MCs. To evaluate detection performance, free-response receiver operating characteristic (FROC) curves are used. Experimental results demonstrate that, when compared to several other existing methods, the proposed SVM framework offers the best performance.

1. INTRODUCTION

Microcalcification (MC) clusters are an indicator of breast cancer, which is a leading cause of death in women. MCs are tiny calcium deposits that appear as small bright spots in a mammogram (as illustrated in Fig. 1). Individual MCs are sometimes difficult to detect because of the surrounding breast tissue and their variation in shape, orientation, brightness, and size (typically 0.05-1 mm) [1].

In the literature, a great many image-processing methods have been proposed to detect MCs automatically. Here, we briefly cite a few. A statistical Bayesian image analysis model was developed in [2]. A difference-image technique was investigated in [3]. Wavelet-based approaches were studied in [4]. A detection scheme using multi-scale analysis was proposed in [5]. Methods based on weighted difference of Gaussian filtering were used in [6]. A method based on higher-order statistics was developed in [7]. A fuzzy logic approach was proposed in [8]. A 2-D adaptive lattice algorithm was used to predict correlated clutter in the mammogram in [9]. Fractal modeling was proposed in [10]. A method based on region growing and active contours was studied in [11]. More recently, a two-stage neural network approach was proposed in [12].

In this work we investigate for the first time the use of a support vector machine (SVM) learning framework for MC detection, and show that it provides the best performance among the methods we have tested so far. SVM learning is based on the principle of structural risk minimization [13]. Instead of directly minimizing the learning error, it aims to minimize a bound on the generalization error. As a result, an SVM is able to perform well when applied to data outside the training set. In recent years SVM learning has been applied to a wide range of real-world applications, where it has been found to offer performance superior to that of competing methods [14].

In the proposed work, MC detection is considered as a two-class pattern classification task performed at each location in the mammogram. The two classes are "MC present" and "MC absent." With an SVM formulation, a nonlinear classifier is trained using supervised learning to automatically detect the presence of MCs in a mammogram.

Figure 1. A section of a mammogram containing multiple MCs (labeled with circles).

2. METHODOLOGY

2.1. SVM classifier

The basic idea of an SVM classifier is illustrated in Fig. 2. The figure shows the simplest case, in which the data vectors (marked by Xs and Os) can be separated by a hyperplane. In such a case there may exist many separating hyperplanes; among them, the SVM classifier seeks the one that produces the largest separation margin [13,14]. Such a scheme is known to be associated with structural risk minimization [13].

In the more general case, in which the data points are not linearly separable in the input space, a nonlinear transformation is used to map the data vector x into a high-dimensional space (called the feature space) prior to applying the linear maximum-margin classifier. To avoid the potential pitfall of over-fitting in this higher-dimensional space, an SVM uses a kernel function in which the nonlinear mapping is implicitly embedded. According to Cover's theorem [13], a function qualifies as a kernel provided that it satisfies Mercer's conditions. With the use of a kernel, the discriminant function in an SVM classifier has the following form:

    g(x) = \sum_{i=1}^{L_S} \alpha_i d_i K(x, x_i) + \alpha_o ,   (1)

where K(\cdot,\cdot) is the kernel function, x_i are the so-called support vectors determined from the training data, L_S is the number of support vectors, d_i is the class indicator (e.g., +1 for class 1 and -1 for class 2) associated with each x_i, and \alpha_i are constants, also determined from training.

By definition, support vectors (Fig. 2) are elements of the training set that lie either exactly on or inside the decision boundaries of the classifier. In essence, they consist of those training examples that are most difficult to classify. The SVM classifier uses these borderline examples to define its decision boundary between the two classes; this philosophy is quite different from that of a classifier based on minimizing the learning error alone. Note that in a typical SVM learning problem only a small portion of the training examples will qualify as support vectors.
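For illustration, the discriminant function in (1) can be evaluated directly from a set of trained parameters. The following sketch (Python/NumPy rather than the paper's MATLAB implementation; all variable names are hypothetical) shows the computation with a toy linear kernel:

```python
import numpy as np

# Illustrative evaluation of the SVM decision function in Eq. (1):
# g(x) = sum_i alpha_i * d_i * K(x, x_i) + alpha_o
def decision_function(x, support_vectors, d, alphas, alpha_o, kernel):
    return sum(a * di * kernel(x, xi)
               for a, di, xi in zip(alphas, d, support_vectors)) + alpha_o

# Toy example with a linear kernel K(x, y) = x^T y and two support vectors:
linear = lambda x, y: float(np.dot(x, y))
sv = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
g = decision_function(np.array([2.0, 0.0]), sv, d=[+1, -1],
                      alphas=[0.5, 0.5], alpha_o=0.0, kernel=linear)
print(g)  # 0.5*(+1)*2 + 0.5*(-1)*(-2) = 2.0
```

The sign of g(x) then gives the predicted class ("MC present" for positive, "MC absent" for negative).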

2.2. Design of SVM Classifier for MC Detection

A. Input feature vector

MCs appear as tiny bright spots in a mammogram. To test for the presence of an MC at a given location, we use as the input pattern to the SVM the pixel values in a small M x M window centered at that location. We chose M = 9 to accommodate the MCs, the average size of which was around 6-7 pixels in diameter in our data set. Such a window size can effectively avoid any potential interference from neighboring MCs.
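As a concrete sketch (not the authors' code; the function name is hypothetical), the input vector for a candidate location is simply the flattened M x M neighborhood:

```python
import numpy as np

# Sketch: extract the M x M window of pixel values centered at (row, col),
# flattened into the input vector for the SVM (M = 9 in the paper).
def window_vector(image, row, col, M=9):
    h = M // 2
    patch = image[row - h:row + h + 1, col - h:col + h + 1]
    return patch.ravel()  # length M*M = 81 feature vector

img = np.arange(100.0).reshape(10, 10)
v = window_vector(img, 5, 5)
print(v.shape)  # (81,)
```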

Figure 2. Support vector machine classification with a linear hyperplane that maximizes the separating margin between the two classes.

Alternatively, other image features (e.g., local edges) might prove more salient than raw pixel values for the input pattern. However, it is not clear what constitutes the complete set of salient features relevant to MC detection. Thus, we used image pixels directly.

The image must be preprocessed before the pixel values are used. A high-pass filter with a very narrow stop-band was applied to mitigate the effect of spatial inhomogeneity of the background. This filter was designed as a length-41, linear-phase FIR filter with cutoff frequency w_c = 0.125.
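A filter of this type can be designed with standard windowed FIR tools; the sketch below uses SciPy's firwin and assumes the cutoff is normalized so that 1.0 corresponds to the Nyquist frequency (the paper does not state its normalization convention, so this is an assumption):

```python
import numpy as np
from scipy import signal

# Length-41, linear-phase FIR high-pass filter. Odd length gives a Type I
# filter, which is valid for a high-pass design. The cutoff 0.125 is assumed
# to be normalized to the Nyquist frequency.
taps = signal.firwin(41, cutoff=0.125, pass_zero=False)

# Check the frequency response: near-zero gain at DC suppresses the smooth
# background; near-unity gain at high frequencies preserves the tiny MCs.
w, h = signal.freqz(taps, worN=512)
print(abs(h[0]))    # gain at DC (small)
print(abs(h[-1]))   # gain near Nyquist (close to 1)
```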

B. SVM kernel functions

The kernel function plays the central role of implicitly mapping the input vector into a high-dimensional feature space, in which better separability is achieved. In this study the following two types of kernel functions are considered:

1. Polynomial kernel: K(x, y) = (x^T y + 1)^p, where p > 0 is a constant.

2. Gaussian radial basis function (RBF) kernel: K(x, y) = \exp(-||x - y||^2 / (2\sigma^2)), where \sigma > 0 is a constant that defines the kernel width.

Both of these kernels satisfy Mercer's conditions [13] and are among the most commonly used in SVMs.
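The two kernels above translate directly into code; the following NumPy sketch is illustrative only (the default p = 3 and sigma = 5 are the values found to work well later in Sec. 3.2):

```python
import numpy as np

# Illustrative NumPy versions of the two kernels considered in the paper.
def polynomial_kernel(x, y, p=3):
    """Polynomial kernel: K(x, y) = (x^T y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, sigma=5.0):
    """Gaussian RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
```

Note that K(x, x) = 1 for the RBF kernel, and both kernels grow with the similarity of their arguments.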

C. Training examples

Training examples are gathered from the mammograms as follows: for the "MC present" class (designated Class 1), image windows of size M x M are collected at the centers of mass of the MCs identified in the database; for the "MC absent" class (designated Class 2), image windows are collected from those regions of the image containing no MCs. Because there are typically far more background regions than regions containing MCs, a random sampling scheme is adopted for Class-2 examples so that the training examples are representative of all the mammograms.

D. SVM training

Let x_i denote the input feature vector for each of the training-set elements. The desired response of the classifier is then d_i = +1 if x_i belongs to Class 1 and d_i = -1 if x_i belongs to Class 2.

The support vectors and the other parameters in the decision function g(x) in (1) are determined through numerical optimization during the training phase. Specifically, the dual form of the optimization problem for maximal-margin separation is given as:

    \max_{\alpha} J(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} d_i d_j \alpha_i \alpha_j K(x_i, x_j) ,   (2)

subject to the following constraints:

    (1) \sum_{i=1}^{N} \alpha_i d_i = 0; and
    (2) 0 \le \alpha_i \le C for i = 1, 2, ..., N,

where N is the total number of training samples and C is a positive regularization parameter that controls the trade-off between the complexity of the machine and the allowed classification error.

It is noted that the number of training samples used in this study is rather large (on the order of several thousand). Traditional optimization algorithms can no longer be applied efficiently in this case. Fortunately, more efficient algorithms have been developed in recent years for the SVM optimization problem [14]. These algorithms typically take advantage of the fact that the Lagrange multipliers in (2) are mostly zero. In this study, a technique called sequential minimal optimization [15] is adopted.
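Modern SVM libraries implement SMO-style solvers for exactly this dual problem. The sketch below (scikit-learn on synthetic 2-D data, not the paper's MATLAB code or mammogram data) illustrates both the training step and the sparsity of the resulting Lagrange multipliers:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class data standing in for the window vectors of Sec. 2.2.C.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 2)),   # stand-in "MC absent"
               rng.normal(+2.0, 1.0, (200, 2))])  # stand-in "MC present"
d = np.repeat([-1, 1], 200)

# libsvm (underlying SVC) solves the dual in (2) with an SMO-type algorithm.
clf = SVC(kernel="rbf").fit(X, d)

print(clf.score(X, d))                  # training accuracy
print(len(clf.support_), "of", len(X))  # only borderline examples become SVs
```

Because most alpha_i come out exactly zero, only the support vectors need to be stored and evaluated at detection time.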

E. SVM model selection

During the training phase, the following variables need to be determined for the SVM classifier: the kernel function to use and the regularization parameter C. For this purpose, we adopt a widely used statistical method called m-fold cross-validation, which consists of the following steps: 1) randomly divide all the available training examples into m equal-sized subsets; 2) use all but one subset to train the SVM; 3) use the held-out subset to measure the classification error; 4) repeat Steps 2 and 3 for each subset; 5) average the results to obtain an estimate of the generalization error of the SVM classifier. The SVM was tested using this procedure for various parameter settings, and in the end the model with the smallest generalization error was adopted.
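The five steps above can be sketched with scikit-learn's grid search over C and the kernel width (the sigma grid {2.5, 5, 10} follows Fig. 3; the data here are synthetic placeholders, not the mammogram windows):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training examples of Sec. 2.2.C.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
               rng.normal(+1.0, 1.0, (100, 2))])
d = np.repeat([-1, 1], 100)

# m-fold cross-validation (m = 10, as in Sec. 3.2) over a grid of C and sigma;
# sklearn's gamma corresponds to 1/(2*sigma^2).
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100, 1000],
                                "gamma": [1 / (2 * s ** 2) for s in (2.5, 5, 10)]},
                    cv=10)
grid.fit(X, d)
print(grid.best_params_)  # the setting with the smallest estimated error
```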

3. EXPERIMENTAL RESULTS

3.1. Data set

The proposed algorithm was developed and evaluated using a data set provided by the Department of Radiology at the University of Chicago. This data set consists of 76 mammograms, digitized with a spatial resolution of 0.1 mm/pixel and 10-bit grayscale. In the data set, 1120 individual MCs were identified by experienced mammographers.

In this work, the mammograms in this data set were divided equally and at random into two subsets: one used exclusively for training (designated the training mammogram set), and the other exclusively for testing (designated the test mammogram set).

3.2. Training and model selection results

The examples used for SVM training include a total of 547 for Class 1, and twice as many for Class 2. This choice is a compromise between the vast number of available Class-2 examples and the complexity of the training.

The SVM classifier was then trained using a 10-fold cross-validation procedure for various model and parameter settings. In Fig. 3 we show a plot of the estimated generalization error rate for the trained SVM classifier with the Gaussian RBF kernel. A generalization error as low as 6% was achieved under various parameter settings. These results demonstrate that the performance of the SVM classifier is rather robust to the choice of model parameters. Interestingly, a similar error level was also achieved when the polynomial kernel with p = 3 was used; due to space limitations these results are not shown.

In the evaluation study below, the SVM classifier using the Gaussian RBF kernel with \sigma = 5 and C = 1000 was used. The number of resulting support vectors for this case was about 12% of the total number of training samples; the training time was about 7 seconds (implemented in MATLAB on a Pentium III 933 MHz dual-processor PC).

Figure 3. Plot of generalization errors versus the regularization parameter C, achieved by the SVM classifier using the Gaussian RBF kernel with \sigma = 2.5, 5, and 10.

3.3. Other methods for comparison

The proposed algorithm was compared with four other existing methods for MC detection: (1) the image difference technique (IDT) in [3]; (2) the difference-of-Gaussians (DoG) method in [6]; (3) the wavelet-based method in [4]; and (4) the two-stage multi-layer neural network method in [12]. In our implementation, each of these methods was run for numerous parameter settings, and the setting yielding the best result was chosen for the final evaluation.

3.4. Evaluation results

The detection performance was evaluated quantitatively using free-response receiver operating characteristic (FROC) curves [16]. An FROC curve plots the correct detection rate (i.e., the true-positive fraction) versus the average number of false alarms (i.e., false positives) per image over the continuum of decision thresholds. The FROC curve thus provides a comprehensive summary of the trade-off between missed detections and false alarms.
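An FROC curve is straightforward to compute from the detector outputs; the sketch below (illustrative only, with a hypothetical function name and toy scores, and per-detection rather than per-cluster counting) shows the threshold sweep:

```python
import numpy as np

# Illustrative FROC computation: sweep the decision threshold and record the
# true-positive fraction vs. the average number of false positives per image.
def froc(true_scores, false_scores, n_images, thresholds):
    """true_scores: detector outputs at true MC locations;
       false_scores: outputs at non-MC locations."""
    tpf, avg_fp = [], []
    for t in thresholds:
        tpf.append(np.mean(true_scores >= t))           # detection rate
        avg_fp.append(np.sum(false_scores >= t) / n_images)  # false alarms/image
    return np.array(avg_fp), np.array(tpf)

avg_fp, tpf = froc(np.array([0.9, 0.8, 0.4]),
                   np.array([0.5, 0.2, 0.1, 0.1]),
                   n_images=2, thresholds=[0.0, 0.3, 0.6, 1.0])
```

Lowering the threshold moves along the curve toward higher detection rates at the cost of more false alarms per image.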

All the detection algorithms were evaluated using the same set of 38 test mammograms. The results are summarized using FROC curves in Fig. 4. As can be seen, the SVM classifier offers the best result in the operating range of fewer than 3 false clusters per image.

The small section shown in Fig. 1 is from a test mammogram containing several MCs; these MCs (though some of them are hardly visible) were all successfully detected by the SVM classifier and are labeled with circles.

Figure 4. FROC curves obtained for the different methods evaluated (SVM classifier, wavelet, DoG, IDT, and neural network). A higher FROC curve indicates better performance. The most significant portion of the curves is at the low end of the number of false-positive clusters, where one would prefer to operate.

4. CONCLUSIONS

In this work we demonstrated an SVM-based classifier for detecting microcalcifications in mammogram images. Experimental results show that the proposed framework is quite robust to the choice of several model parameters. In these initial results the SVM classifier outperformed all the other methods considered.

5. REFERENCES

[1] M. Lanyi, Diagnosis and Differential Diagnosis of Breast Calcifications, Springer-Verlag, Berlin, 1988.

[2] N. Karssemeijer, "A stochastic model for automated detection of calcifications in digital mammograms," in Proc. 12th Int. Conf. Info. Med. Imag., Wye, UK, July 1991.

[3] R. M. Nishikawa, et al., "Computer-aided detection of clustered microcalcifications in digital mammograms," Med. Bio. Eng. Comp., vol. 33, 1995.

[4] R. N. Strickland and H. L. Hahn, "Wavelet transforms methods for object detection and recovery," IEEE Trans. Image Processing, vol. 6, pp. 724-735, May 1997.

[5] T. Netsch, "A scale-space approach for the detection of clustered microcalcifications in digital mammograms," in 3rd Int. Workshop on Digital Mammography, 1996.

[6] J. Dengler, S. Behrens, and J. F. Desaga, "Segmentation of microcalcifications in mammograms," IEEE Trans. Med. Imag., vol. 12, no. 4, 1993.

[7] M. N. Gurcan, et al., "Detection of microcalcifications in mammograms using higher order statistics," IEEE Signal Proc. Lett., vol. 4, no. 8, 1997.

[8] H. Cheng, Y. M. Lui, and R. I. Freimanis, "A novel approach to microcalcification detection using fuzzy logic techniques," IEEE Trans. Med. Imag., vol. 17, no. 3, June 1998.

[9] P. A. Pfrench, J. R. Zeidler, and W. H. Ku, "Enhanced detectability of small objects in correlated clutter using an improved 2-D adaptive lattice algorithm," IEEE Trans. Imag. Proc., vol. 6, no. 3, 1997.

[10] H. Li, K. J. Liu, and S. B. Lo, "Fractal modeling and segmentation for the enhancement of microcalcifications in digital mammograms," IEEE Trans. Med. Imag., vol. 16, no. 6, Dec. 1997.

[11] I. N. Bankman, et al., "Segmentation algorithms for detecting microcalcifications in mammograms," IEEE Trans. Info. Tech. Biomed., vol. 1, no. 2, June 1997.

[12] S. Yu and L. Guan, "A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films," IEEE Trans. Med. Imag., vol. 19, pp. 115-126, Feb. 2000.

[13] V. Vapnik, Statistical Learning Theory, Wiley, 1998.

[14] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, "An introduction to kernel-based learning algorithms," IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.

[15] J. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods: Support Vector Learning, B. Scholkopf et al., eds., MIT Press, 1998.

[16] P. C. Bunch, et al., "A free-response approach to the measurement and characterization of radiographic-observer performance," J. Appl. Eng., vol. 4, 1978.
