Predicting of the Semiconductor Book-to-Bill Ratio by Using a Novel Genetic Algorithm based Support Vector Machine


Oct 16, 2013 (4 years and 8 months ago)

















Yu Chang

Yo Huang

Hsien Yang


Hshiung Tzeng


Hua Hu




Department of Computer Science
National Tsing Hua University
No. 101, Kuang Fu Rd, Sec. 2, Hsinchu, Taiwan

Department of Industrial Education
National Taiwan Normal University
No. 162, Ho-Ping East Road I, Taipei, Taiwan

Department of Project Management
Kainan University
No. 1, Kainan Road, Luchu, Taiwan


Department of Finance
Ta Hwa Institute of Technology
No. 1, Ta-Hwa Rd., Chiung-Lin, Hsin-Chu, Taiwan


0; accepted




Abstract

The book-to-bill (BB) ratio is a demand-to-supply ratio comparing the number of orders booked with the number of orders filled. BB ratio predictions are therefore important to production, sales and marketing, and finance managers, as well as to investors, since accurate prediction results can serve as the foundation for equipment ordering, integrated circuit (IC) product pricing, inventory control, debt planning, and investments. Despite this importance, few studies have tried to predict the BB ratio. To develop a forecast mechanism for BB ratio prediction, this study introduces a novel genetic algorithm (GA) based support vector machine (SVM), in which the kernel function and parameters are optimized by the GA. The GA based SVM is more general because it depends less on experience. The prediction of the semiconductor BB ratio is based on historical data from 1996 to 2010. The association between future realizations of BB ratio performance and current information on semiconductor fluctuations is examined to assess the relative usefulness of these BB ratios. Based on the GA based SVM, it can be demonstrated how forecast volatility and forecast inflation lead to delayed tool deliveries. Moreover, with the GA based SVM, prediction accuracy is higher than 80% for most time slots. In the future, the GA based SVM can be applied to the prediction of other industry cycles or trends.

Keywords: Genetic algorithm (GA), support vector machine (SVM), semiconductor, book-to-bill ratio (BB ratio)



Introduction

The book-to-bill (BB) ratio is a demand-to-supply ratio comparing the number of orders booked with the number of orders filled. It is an important indicator of sales in many high-tech industries, such as semiconductor manufacturing and printed circuit board (PCB) manufacturing. In the semiconductor industry, organizations such as Semiconductor Equipment and Materials International (SEMI), the Semiconductor Equipment Association of Japan (SEAJ), and Very Large-Scale Integrated Research (VLSI Research) provide reports on the BB ratio [1].

The Semiconductor Industry Association (SIA) employs Price Waterhouse LLP, a major international independent accounting firm, to collect data from firms, to investigate questionable data submitted by companies, and to calculate the semiconductor BB ratio. A BB ratio below 1 implies oversupply, while a value above 1 implies a shortage. For example, a book-to-bill ratio of 0.9 implies that $90 worth of new orders was received for every $100 of products billed during the given time slot.

The BB ratio is essential for both industry managers and investors. For managers in functional departments, including production, sales and marketing, and finance, the BB ratio can serve as the foundation for equipment ordering, integrated circuit (IC) product pricing, inventory control, debt planning, and investments. For investors, the BB ratio can serve as the basis for making appropriate investment decisions.

Although the BB ratio is important for managers and investors, timeliness is usually sacrificed for relevance or reliability [2]. With most accounting information systems, for example, the earliest time a transaction can be recorded is the moment when an order is placed. Revenue is subsequently recognized at the point of sale, and the associated earnings typically are not revealed until the end of the quarter. As a result, the earliest point at which users can access some sets of potentially value-relevant financial statement information is the quarterly report date [2]. Thus, an accurate prediction of the BB ratio is important to the semiconductor industry, since the prediction can serve as a general indicator of the future supply and demand situation.

However, few studies have tried to predict the BB ratio. The SVM, a machine learning algorithm based on the statistical learning theory presented by Vapnik [6, 7], is a possible approach to predicting the highly fluctuating BB ratios precisely: it uses a linear model to implement nonlinear class boundaries by nonlinearly mapping the input BB ratio vector into a high-dimensional feature space. Nevertheless, the question of how to select the parameters so as to optimize the prediction results can still prevent researchers from predicting the nonlinear BB ratio time series precisely.

To resolve the above-mentioned BB ratio prediction problems and to help industry managers and investors make correct decisions, a novel genetic algorithm (GA) based support vector machine (SVM) is introduced to regress the BB ratio time series. Since parameter selection in the SVM is important and affects the prediction accuracy significantly, the GA is used to optimize the parameters. Rolling prediction of the historical nonlinear BB ratio time series from 1996 to 2010 is used to verify the feasibility of the novel GA based SVM algorithm. The results demonstrate that the GA based SVM method can predict the fluctuation of BB ratios.

The remainder of this paper is organized as follows. In Section 2, the industrial background of the BB ratio is provided. Section 3 introduces the novel GA based SVM, including the GA and the SVM algorithms. In Section 4, predictions of the nonlinear historical BB ratio time series are demonstrated. The empirical results and findings are discussed in Section 5. Finally, Section 6 concludes the article with observations, conclusions, and recommendations for further study.

Literature Review

As in many customized capital goods industries, the semiconductor equipment supply chain faces an order fulfillment dilemma. On the one hand, buyers of equipment expect their suppliers to be responsive and to be able to fulfill orders within a relatively short order lead time; on the other hand, the high value and the customized nature of the products make it risky for the supplier to keep finished products or sub-systems in inventory, leading to long and variable manufacturing lead times. To resolve this dilemma, the buyers (producers of microchips) provide their equipment suppliers with order forecasts for the next 24 months and longer.

Demand for semiconductor production equipment is triggered by the demand for chips, including microprocessors and memory chips. Given that the demand for chips is in turn generated by the demand for electronic devices (e.g., servers, personal computers, and cell phones), semiconductor equipment makers find themselves at the wrong end of the “bullwhip” [3]. They face business cycles that flood them with orders in one year and starve them for work in the next.

The large chip producers create market forecasts on a monthly or quarterly basis. These forecasts are used to project production capacity needs for the next 2 to 5 years. Forecasts and capacity plans are updated on the basis of a rolling horizon principle. Chip manufacturers use these product-level demand forecasts, combined with equipment output models, to allocate forecasted capacity requirements to both existing and potentially new semiconductor fabs. If the forecasted capacity requirement is not supported by the size and productivity of the installed equipment base, additional equipment must be ordered. This projected need for additional equipment is shared with equipment suppliers in the form of soft orders, consistent with the principle of forecast sharing and collaborative planning.

Typically, the chip manufacturers are unlikely to actually commit to purchasing equipment at the time of the first forecast. Over the next two years, the chip manufacturer will obtain new information about developments in the market for chips as well as about the effective capacity of the currently installed equipment base (based on production yields, throughput time, and machine uptime). As a result, the chip manufacturer may update the order and will usually delay making a firm order (i.e., issuing a purchase order) until about 6 months prior to the projected delivery date [4].

The book-to-bill (BB) ratio is a demand-to-supply ratio comparing the number of orders booked with the number of orders filled. The BB ratio is compiled monthly by Price Waterhouse LLP on behalf of the Semiconductor Industry Association (SIA) based on surveys of firms that manufacture semiconductors. The numerator of the BB ratio represents a seasonally adjusted three-month moving average of new orders received, while the denominator represents a seasonally adjusted three-month moving average of chips shipped. A BB ratio of 1.10 therefore indicates that $1.10 in new orders has been received for every $1 of chips shipped, which ordinarily would be interpreted as a positive signal regarding future industry sales levels.
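The ratio of two three-month moving averages described above can be illustrated with a short sketch. The function names and the monthly figures below are hypothetical, and the seasonal adjustment applied by the compiler of the index is omitted, so this is only an approximation of the published calculation.

```python
def moving_average(values, window=3):
    """Trailing moving average; starts once a full window is available."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def book_to_bill(bookings, billings, window=3):
    """BB ratio per month: averaged bookings divided by averaged billings."""
    booked = moving_average(bookings, window)
    billed = moving_average(billings, window)
    return [b / s for b, s in zip(booked, billed)]


# Hypothetical monthly figures in millions of dollars (not real SIA data).
bookings = [110.0, 120.0, 100.0, 90.0]
billings = [100.0, 100.0, 100.0, 100.0]
ratios = book_to_bill(bookings, billings)
# ratios[0] is about 1.10: $1.10 of new orders per $1 shipped in that window.
```

A value above 1 in `ratios` corresponds to the shortage case and a value below 1 to the oversupply case discussed in the Introduction.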

The accounting firm collects data from a voluntary sample of companies on the fifth business day of the month. The SIA issues a press release containing the preliminary estimate of the index for the previous month between the ninth and the twelfth of each month. The news release is made from California and is picked up on the newswire. The release of the BB ratio was reported in the Wall Street Journal on the following day for all releases during 1995 and 1996, and for ten of the 12 releases in 1994. The Wall Street Journal typically reports the value of the index and the change from the previous month’s index. In addition, comments are sometimes solicited from firms in response to the release of the index [2]. For example, a spokesman for Advanced Micro Devices responded to a decline in the index by stating that “the stock market murders all chip stock but the industry is fundamentally sound” (Wall Street Journal 1996a). The SIA does not typically include the name of the accounting firm in the BB press release.

The first release of the BB ratio is technically a preliminary figure that is frequently revised by a small amount in the following month. However, it is the preliminary figure that attracts the primary news coverage and that would be expected to convey the newest information to the market. The adjustments to the preliminary BB announcements were not found to be associated with stock returns and are not considered further in this study.

Chandra et al. [5] find a significant correlation between changes in the BB ratio and subsequent changes in quarterly earnings. They also find significant stock price movements on BB release dates. Our study is similar to that of Chandra et al. [5], and the results of both studies are generally consistent in finding that the BB announcements do provide information to investors. While Chandra et al. [5] focus on the impact of the BB release on the stock prices of a small sample of semiconductor manufacturers, our study examines the broader industry-wide information effects for firms in the semiconductor, semiconductor components, and technology areas.

SVM with GA


In this section, the semiconductor BB ratio prediction mechanism, a GA based SVM algorithm, is introduced. The GA is used to optimize the parameters of the SVM based forecast mechanism. Then, the historical BB ratio data sets are predicted with this mechanism.



The SVM is a machine learning algorithm based on the statistical learning theory presented by Vapnik [6, 7]. The SVM uses a linear model to implement nonlinear class boundaries by nonlinearly mapping the input vector into a high-dimensional feature space. A linear model constructed in the new space can represent a nonlinear decision boundary in the original space. In the new space, an optimal separating hyperplane is constructed. Thus, the SVM is known as the algorithm that finds a special kind of linear model, the maximum margin hyperplane, which gives the maximum separation between the decision classes. The training examples that are closest to the maximum margin hyperplane are called support vectors. All other training examples are irrelevant for defining the binary class boundaries.

For the linearly separable case, a hyperplane separating the binary decision classes in the three-attribute case can be represented as the following equation:

$$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 \tag{1}$$

where $y$ is the outcome, $x_1, x_2, x_3$ are the attribute values, and there are four weights $w_0, \dots, w_3$ to be learned by the learning algorithm. In Eq. (1), the weights are parameters that determine the hyperplane. The maximum margin hyperplane can be represented as the following equation in terms of the support vectors:

$$y = b + \sum_{i} \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} \tag{2}$$

where $y_i$ is the class value of the training example $\mathbf{x}_i$ and $\cdot$ represents the dot product. The vector $\mathbf{x}$ represents a test example, and the vectors $\mathbf{x}_i$ are the support vectors. In this equation, $b$ and $\alpha_i$ are parameters that determine the hyperplane. From the implementation point of view, finding the support vectors and determining the parameters $b$ and $\alpha_i$ are equivalent to solving a linearly constrained quadratic programming problem.

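As mentioned above, the SVM constructs a linear model to implement nonlinear class boundaries by transforming the inputs into a high-dimensional feature space. For the nonlinearly separable case, a high-dimensional version of Eq. (2) is simply represented as follows:

$$y = b + \sum_{i} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) \tag{3}$$

The function $K(\mathbf{x}_i, \mathbf{x})$, evaluated between a support vector $\mathbf{x}_i$ and a test example $\mathbf{x}$, is defined as the kernel function. Any function that meets Mercer’s condition can be used as the kernel function, such as the polynomial, sigmoid, and Gaussian radial basis function (RBF) kernels used in SVMs. In this work, the RBF kernel given by Eq. (4) is used:

$$K(\mathbf{x}_i, \mathbf{x}) = \exp\left( -\frac{\lVert \mathbf{x}_i - \mathbf{x} \rVert^2}{2\sigma^2} \right) \tag{4}$$

where $\sigma^2$ denotes the variance of the Gaussian kernel. In addition, for the separable case, there is a lower bound 0 on the coefficients $\alpha_i$ in Eq. (3); for the non-separable case, the SVM can be generalized by placing an upper bound $C$ on the coefficients $\alpha_i$. Therefore, the selection of the parameters $C$ and $\sigma^2$ of an SVM model is important to the accuracy of prediction.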
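A minimal sketch of how the kernel form of the decision function in Eq. (3) is evaluated may help here. It assumes the common Gaussian form $\exp(-\lVert \mathbf{x}_i - \mathbf{x} \rVert^2 / (2\sigma^2))$ for the RBF kernel. The support vectors, coefficients, and bias below are hypothetical values chosen by hand, not the output of any training run.

```python
import math


def rbf_kernel(x, z, sigma2):
    """Gaussian RBF kernel with variance sigma2 (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma2))


def decision_value(x, support_vectors, labels, alphas, bias, sigma2):
    """Evaluate y = b + sum_i alpha_i * y_i * K(x_i, x) for a test point x."""
    return bias + sum(alpha * y * rbf_kernel(sv, x, sigma2)
                      for sv, y, alpha in zip(support_vectors, labels, alphas))


# Hypothetical two-support-vector model (illustrative values only).
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
bias = 0.0

near_positive = decision_value((2.0, 2.0), support_vectors, labels, alphas, bias, sigma2=1.0)
near_negative = decision_value((0.0, 0.0), support_vectors, labels, alphas, bias, sigma2=1.0)
```

A small `sigma2` makes each support vector influence only its close neighbourhood, while a large one smooths the decision function. This is one reason the kernel parameter has to be tuned together with $C$.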

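The learning algorithm for a nonlinear classifier SVM follows the design of an optimal separating hyperplane in a feature space. The procedure is the same as the one associated with hard and soft margin classifier SVMs in x-space. Accordingly, the dual Lagrangian in z-space is [8]

$$L_d(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \mathbf{z}_i^{T} \mathbf{z}_j$$

where $\mathbf{z}_i$ is the image of the training example $\mathbf{x}_i$ in the feature space and $l$ is the number of training examples. Using the chosen kernels, the Lagrangian is maximized as follows:

$$\max_{\alpha} \; L_d(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)$$

$$\text{subject to} \quad \alpha_i \ge 0, \; i = 1, \dots, l, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0$$

Note that the constraints must be revised for use in a nonlinear soft margin classifier SVM. The only difference between these constraints and those of the separable nonlinear classifier is the upper bound $C$ on the Lagrange multipliers $\alpha_i$. Consequently, the constraints of the optimization problem become

$$0 \le \alpha_i \le C, \; i = 1, \dots, l, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0$$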




Parameter Optimization

To design an effective SVM model, the values of the SVM parameters have to be chosen carefully in advance [9, 10]. These parameters include: (1) the regularization parameter $C$, which determines the trade-off cost between minimizing the training error and minimizing the complexity of the model; (2) the parameter sigma ($\sigma$) of the kernel function, which defines the nonlinear mapping from the input space to some high-dimensional feature space (only the Gaussian kernel is considered in this research, and the variance of the kernel function is $\sigma^2$); and (3) the kernel function used in the SVM, which is used to construct a nonlinear decision hypersurface in the input space [8].

To solve this SVM design problem, Lin [10] provided a systematic method for selecting SVM parameters. Lin’s approach for selecting the parameters of support vector regression was based on applying the concept of sampling theory to the Gaussian filter. Min and Lee [11] proposed a grid-search technique using 5-fold cross-validation to find the optimal parameter values of the SVM kernel function.

In contrast to the above-mentioned methods of SVM parameter optimization, this research develops a novel GA based method, the GA based SVM, for optimizing the two SVM parameters ($C$, $\sigma^2$) simultaneously. The first parameter, $C$, determines the trade-off between fitting error minimization and model complexity. The second parameter, $\sigma^2$, is the bandwidth of the radial basis function (RBF) kernel.



A GA is based on the evolutionary process in nature, in which desired traits propagate by natural selection. In each generation, the best traits are combined to produce offspring that are better than their parents. In this sense, it is a greedy algorithm.

Since structured methods for efficiently confirming the selection of parameters are lacking, a GA is used in the proposed SVM model to optimize parameter selection. To establish a GA based feature selection and parameter optimization system, the following main steps (as shown in Fig. 1) must be carried out. The procedures of the GA are explained in detail below.


Chromosome representation

The two parameters of the SVM, $C$ and $\sigma^2$, were directly coded to form the chromosome in the proposed method. The chromosome $X$ is represented as $X = \{p_1, p_2\}$, where $p_1$ and $p_2$ denote the regularization parameter $C$ and $\sigma^2$ (the parameter of the kernel function), respectively.


Evaluating fitness function

A fitness function, which assesses the performance of each chromosome, must be designed before the search for the optimal values of the SVM parameters starts. In this study, the mean absolute percentage error (MAPE) is used as the fitness function. The MAPE is as follows:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{a_t - f_t}{a_t} \right| \times 100\%$$

where $a_t$ and $f_t$ represent the actual and forecast values and $n$ is the number of forecasting periods.
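The MAPE definition can be checked with a small sketch. The function name and the sample BB ratio values are hypothetical, and the formula used is the standard one: the mean of the absolute relative errors, expressed in percent.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent, over n forecasting periods."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical BB ratios for two months (not taken from the data set).
actual_bb = [1.0, 0.8]
forecast_bb = [0.9, 1.0]
error_percent = mape(actual_bb, forecast_bb)  # (10% + 25%) / 2
```

Because the actual value appears in the denominator, the measure is undefined when an actual value is zero. This is not a concern for BB ratios, which are strictly positive.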


Selection and reproduction

Based on the fitness function, chromosomes with better fitness values (here, a lower MAPE) are more likely to yield offspring in the next generation. The tournament selection method is applied to choose chromosomes for reproduction.
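Tournament selection can be sketched as follows. The tournament size `k`, the function name, and the sample population are assumptions made for illustration. Since the fitness here is an error measure, the sketch treats the lowest MAPE as the winner.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Draw k distinct chromosomes at random and return the one with the lowest MAPE."""
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: fitness[i])
    return population[winner]


# Hypothetical chromosomes (C, sigma^2) with hypothetical MAPE values in percent.
population = [(1.0, 0.5), (10.0, 1.0), (100.0, 2.0)]
fitness = [30.0, 12.0, 25.0]
parent = tournament_select(population, fitness, k=2, rng=random.Random(0))
```

A larger `k` increases the selection pressure: with `k` equal to the population size, the best chromosome is always chosen.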



Crossover

Once a pair of chromosomes has been selected for crossover, one or more randomly selected positions are assigned to the to-be-crossed chromosomes. The newly crossed chromosomes are then combined with the rest of the chromosomes to generate a new population. This study uses the method proposed by Adewuya [12] to prevent overload after crossover when a genetic algorithm with real-valued chromosomes is applied.



Move closer:










where $x_1^{old}$ and $x_2^{old}$ represent the pair of chromosomes before the crossover operation, and $x_1^{new}$ and $x_2^{new}$ represent the pair of new chromosomes after the crossover operation.
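One way to picture a "move closer" crossover on real-valued chromosomes is sketched below. This is an assumption made for illustration, not the exact update rule of Adewuya [12]: each child is taken to be its parent shifted toward the other parent by a fraction `step`, which is a hypothetical parameter.

```python
import random


def move_closer(parent_a, parent_b, step=None, rng=random):
    """Return two children, each moved from its parent toward the other parent."""
    if step is None:
        # Assumed range: below 0.5 so the children do not pass each other.
        step = rng.uniform(0.0, 0.5)
    child_a = [a + step * (b - a) for a, b in zip(parent_a, parent_b)]
    child_b = [b + step * (a - b) for a, b in zip(parent_a, parent_b)]
    return child_a, child_b


# Hypothetical chromosomes of the form [C, sigma^2].
child_a, child_b = move_closer([1.0, 2.0], [3.0, 6.0], step=0.25)
```

Because each child is a convex combination of its parents, it stays inside the parameter bounds whenever both parents do, so no repair step is needed after this operator.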



Mutation

The mutation operation follows the crossover operation and determines whether a chromosome should be mutated in the next generation. In this study, the uniform mutation method is applied in the presented model; researchers can, however, select the mutation method in the GA-SVM best suited to their problems of interest. Uniform mutation can be represented as follows:

$$x^{old} = \{x_1, x_2, \dots, x_n\}$$

$$x_k^{new} = LB_k + r \times (UB_k - LB_k)$$

$$x^{new} = \{x_1, x_2, \dots, x_k^{new}, \dots, x_n\}$$

where $n$ denotes the number of parameters; $r$ represents a random number in the range $(0, 1)$; and $k$ is the position of the mutation. LB and UB are the lower and upper bounds on the parameters, respectively; $LB_k$ and $UB_k$ denote the lower and upper bounds at location $k$. $x^{old}$ represents the population before the mutation operation, and $x^{new}$ represents the new population following the mutation operation.
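Uniform mutation as described by these definitions can be sketched in a few lines: one position $k$ is redrawn uniformly between its lower and upper bound. The bounds and the chromosome below are hypothetical values, not the search ranges used in the study.

```python
import random


def uniform_mutation(chromosome, lower, upper, rng=random):
    """Replace one randomly chosen gene k with LB_k + r * (UB_k - LB_k)."""
    k = rng.randrange(len(chromosome))   # position of the mutation
    r = rng.random()                     # random number in [0, 1)
    mutated = list(chromosome)
    mutated[k] = lower[k] + r * (upper[k] - lower[k])
    return mutated


# Hypothetical bounds for [C, sigma^2].
LOWER = [1.0, 0.1]
UPPER = [1000.0, 10.0]
mutated = uniform_mutation([10.0, 1.0], LOWER, UPPER, rng=random.Random(0))
```

Only one gene changes per call, and the new value does not depend on the old one. This lets the search jump out of a region where crossover alone would keep the population.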

Figure 1. The GA function

Predicting of the BB ratio by the GA based SVM


Research Data

The empirical analysis is based on a proprietary data set consisting of the BB ratio from November 1996 to May 2010. The data set served as the input data in the rolling prediction simulation.

Concepts of the GA Based SVM

In this study, the Gaussian radial basis function (RBF) is used as the kernel function of the SVM. Tay and Cao [13] showed that the upper bound $C$ and the kernel parameter $\sigma^2$ play an important role in the performance of SVMs. An improper selection of these two parameters can cause overfitting or underfitting. Since there is little general guidance on determining the parameters of an SVM, this study employs the GA to select the optimal parameters $C$ and $\sigma^2$ simultaneously for the best prediction performance.

In Figure 2, the optimal parameters are determined by using the GA based SVM model. First, some data sets are given for training. Further training runs are then carried out using the GA to select the optimal combination of input data sets and to improve regression performance. The GA function checks whether the stop condition is satisfied; if not, the procedure continues to alternate between the GA function and the SVM function until the optimal values of the parameters $C$ and $\sigma^2$ are derived. Next, the raw data sets are imported with the optimal parameters, and the remaining steps of the support vector machine are run. That is, the data are used to train a model, which then predicts new input data to obtain the prediction results.

Figure 2. The GA based SVM model
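The alternation between the GA function and the SVM function can be sketched as a loop. To keep the sketch self-contained, the SVM training step is replaced by a hypothetical stand-in, `surrogate_mape`. The search ranges, population size, mutation rate, fixed generation budget, and the simple arithmetic crossover are all assumptions, not settings reported in this study.

```python
import random

# Hypothetical search ranges for C and sigma^2 (not taken from the paper).
BOUNDS = [(1.0, 1000.0), (0.1, 10.0)]


def surrogate_mape(chromosome):
    """Stand-in for 'train an SVM with (C, sigma^2) and return its MAPE'."""
    c, sigma2 = chromosome
    return 5.0 + 10.0 * abs(c - 200.0) / 200.0 + 5.0 * abs(sigma2 - 2.0)


def tournament(pop, scores, rng):
    """Binary tournament: the contender with the lower MAPE wins."""
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if scores[i] <= scores[j] else pop[j]


def run_ga(fitness_fn, bounds, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=fitness_fn)
    for _ in range(generations):  # stop condition: fixed generation budget
        scores = [fitness_fn(ch) for ch in pop]
        children = []
        while len(children) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            step = rng.uniform(0.0, 0.5)
            child = [x + step * (y - x) for x, y in zip(a, b)]  # arithmetic crossover
            if rng.random() < p_mut:  # uniform mutation of one gene
                k = rng.randrange(len(bounds))
                lo, hi = bounds[k]
                child[k] = lo + rng.random() * (hi - lo)
            children.append(child)
        pop = children
        best = min(pop + [best], key=fitness_fn)  # keep the best seen so far
    return best, fitness_fn(best)


best_params, best_score = run_ga(surrogate_mape, BOUNDS)
```

In an actual run, `surrogate_mape` would be replaced by a function that fits the SVM on the training window with the candidate $C$ and $\sigma^2$ and returns the MAPE on held-out months.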

SVM Results

Figure 3 shows the semiconductor BB ratio values observed from November 1996 to May 2010. The historical BB ratio values were selected as the input data sets for the proposed GA based SVM model. Predictions based on the data set were executed using both the SVM and the GA based SVM algorithms in order to benchmark the performance of the novel GA based SVM framework. The MAPE serves as the indicator of prediction performance.

Figure 4 shows the prediction results of the GA based SVM and of the fixed SVM with $C = 1000$ and $\sigma^2 = 1$ as the basis for comparison. The results of the GA based SVM fit the raw data of the historical BB ratio closely, with a comparatively lower prediction error rate.

Figure 3. Trend chart based on the historical BB ratio values from 1996 to 2010

Figure 4. The BB ratio: raw data, GA based SVM predictions, and fixed SVM predictions

When the GA based SVM framework is employed for rolling predictions, the accuracy of the predictions is higher than 80% for most time slots (Figure 5) of the BB ratio in the semiconductor industry.

Figure 5. The MAPE by the GA based SVM


In this research, a novel GA based SVM framework was introduced for predicting semiconductor BB ratios. The GA based SVM performed better than the fixed SVM: prediction results from the traditional SVM with fixed parameter values demonstrated higher error rates. These results imply that the prediction errors of the traditional SVM algorithm can be reduced dramatically by using parameters optimized by the GA.

With the GA based SVM algorithm, the prediction accuracy of the semiconductor BB ratio could be higher. Let the months up to the end of the previous year be the training data and the months in the current year be the raw data for prediction. For example, in Figure 5, the model was trained with the months in 1996 and 1997 and then used to predict the BB ratios in 1998, from which the MAPE can be calculated.
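The rolling scheme can be sketched as a generator of train and test splits by calendar year. The sketch assumes an expanding training window (all months before the test year) and a minimum of two training years, matching the 1996 to 1997 example. The function name and the placeholder series are hypothetical.

```python
def rolling_year_splits(records, min_train_years=2):
    """Build (test_year, train_values, test_values) splits from (year, value) pairs."""
    years = sorted({year for year, _ in records})
    splits = []
    for test_year in years[min_train_years:]:
        train = [value for year, value in records if year < test_year]
        test = [value for year, value in records if year == test_year]
        splits.append((test_year, train, test))
    return splits


# Hypothetical series: 12 placeholder BB ratio values per year, 1996 to 1999.
records = [(year, 1.0) for year in range(1996, 2000) for _ in range(12)]
splits = rolling_year_splits(records)
first_year, first_train, first_test = splits[0]
```

For each split, a model would be fitted on the training values and scored with the MAPE on the test values, giving one error figure per year as in Figure 5.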

Based on the MAPE results in Figure 5, the prediction accuracy is usually higher than 80% while the global semiconductor market keeps growing steadily. However, some extreme situations exist. The extreme situations that caused significant prediction errors are discussed below.

In 1998, the MAPE reached 40% because the training data were insufficient (only two years, 1996 and 1997) and the BB ratio decreased rapidly relative to the 1997 level (please refer to Figure 3) due to a recession (please refer to Figure 6). In 1999 and 2000, the semiconductor industry kept growing, and the prediction results were satisfactory, with prediction accuracy higher than 80%. Next, in 2001 the MAPE was as high as 70% due to the severe industry downturn (Figure 6) caused by the internet bubble and the resulting recession of the world’s economy. In 2009, the Financial Tsunami drove the BB ratio down again, which influenced the prediction accuracy significantly.


McClean et al. [14,15]



Worldwide semiconductor industry growth rate (1978



Forecasting the future BB ratio is important to the semiconductor manufacturing industry. In this paper, a forecasting method consisting of an SVM and a GA was proposed. It is found that the novel GA based SVM algorithm can predict the BB ratios accurately for time slots in which there is no significant recession in the semiconductor industry. In the future, more data samples can be collected to verify the accuracy of this forecast mechanism.



References

[1] T. Chen and Y.C. Wang, “A Hybrid Fuzzy and Neural Approach for Forecasting the Book-to-Bill Ratio in the Semiconductor Manufacturing Industry,” The International Journal of Advanced Manufacturing Technology, pp. 1

[2] N.L. Fargher, L.R. Gorman and M.S. Wilkins, “Timely Industry Information as an Assurance Service: Evidence on the Information Content of the Book-to-Bill Ratio,” Auditing: A Journal of Practice & Theory, vol. 17, pp. 109-124, 1998.

[3] H.L. Lee, V. Padmanabhan and S. Whang, “Information Distortion in a Supply Chain: The Bullwhip Effect,” Management Science, vol. 43, no. 4, pp. 546-558, 1997.

[4] C. Terwiesch, Z.J. Ren, T.H. Ho and M.A. Cohen, “An Empirical Analysis of Forecast Sharing in the Semiconductor Equipment Supply Chain,” Management Science, vol. 51, no. 2, pp. 208-220, 2005.

[5] U. Chandra, A. Procassini and G. Waymire, “The Information Content of Non-financial Disclosures,” Working Paper, Emory University, 1997.

[6] V. Vapnik, “Statistical Learning Theory,” J. Wiley, New York, 1998.

[7] K. Kim, “Financial Time Series Forecasting using Support Vector Machines,” Neurocomputing, vol. 55, no. 1-2, pp. 307-319, 2003.

[8] C.H. Wu, G.H. Tzeng, Y.J. Goo and W.C. Fang, “A Real-valued Genetic Algorithm to Optimize the Parameters of Support Vector Machine for Predicting Bankruptcy,” Expert Systems with Applications, vol. 32, no. 2, pp. 397-408, 2007.

[9] K. Duan, S.S. Keerthi and A.N. Poo, “Evaluation of Simple Performance Measures for Tuning SVM Hyperparameters,” Neurocomputing, vol. 51, pp. 41-59, 2003.

[10] P.T. Lin, “Support Vector Regression: Systematic Design and Performance Analysis,” Unpublished Doctoral Dissertation, Department of Electronic Engineering, National Taiwan University.

[11] J.H. Min and Y.C. Lee, “Bankruptcy Prediction using Support Vector Machine with Optimal Choice of Kernel Function Parameters,” Expert Systems with Applications, vol. 28, no. 4, pp. 603-614, 2005.

[12] A.A. Adewuya, “New Methods in Genetic Search with Real-valued Chromosomes,” Unpublished Master’s Thesis, Massachusetts Institute of Technology.

[13] F.E.H. Tay and L. Cao, “Application of Support Vector Machines in Financial Time Series Forecasting,” Omega, vol. 29, no. 4, pp. 309-317, 2001.

[14] B. McClean, B. Matas and T. Yancey, The McClean Report, 2001 Edition. Scottsdale, Arizona: IC Insights, 2001.

[15] B. McClean, B. Matas and T. Yancey, The McClean Report, 2005 Edition. Scottsdale, Arizona: IC Insights, 2005.