Multistep Virtual Metrology Approaches for
Semiconductor Manufacturing Processes
Presenter: Simone Pampuri (University of Pavia, Italy)
Authors:
Simone Pampuri, University of Pavia, Italy
Andrea Schirru, University of Pavia, Italy
Gian Antonio Susto, University of Padova, Italy
Cristina De Luca, Infineon Technologies AT, Austria
Alessandro Beghi, University of Padova, Italy
Giuseppe De Nicolao, University of Pavia, Italy
Introduction
Collaboration between University of Pavia (Italy), University of Padova (Italy) and Infineon Technologies AT (Austria)
Activity funded by the European project EU-IMPROVE: Implementing Manufacturing science solutions to increase equiPment pROductiVity and fab pErformance
Duration: 42 months (since Jan 2009)
Global funding: 37.7 M€
32 partners, including
• Semiconductor fabs
• Academic institutions
• Research centers
• Software houses
Thematic Work Packages
Contents
1 Motivations
2 Machine Learning
3 Multilevel framework
4 Multistep VM
5 Results and Conclusions
What is Virtual Metrology?
In semiconductor manufacturing, measurement operations are costly and time-consuming
Only a small part of the production is actually measured
Virtual metrology exploits sensors and logistic information to predict process outcome
[Diagram: Sensor Data, Recipe Data, Logistic Data → VM → Predictive Information → Controllers, Sampling tools, Decision tasks]
Machine learning (in a nutshell)
Machine learning algorithms create models from observed data (training dataset), using little or no prior information about the physical system
The model is then able to predict patterns similar to the observed ones
[Diagram: Training dataset (Input X, Output Y) → Learning Algorithm → Model f(X); new Input (X_new) → Model → Prediction (Y_new)]
Most famous algorithm: Ordinary Least Squares (OLS), which consists of solving the optimization problem defined by the squared-error loss function (the sum of squared residuals between Y and f(X))
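The OLS idea can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming a linear model f(X) = Xβ, so that the loss is ‖Y − Xβ‖². The symbol β, the function name `fit_ols` and the data are assumptions made for the example and do not come from the slides.

```python
import numpy as np

def fit_ols(X, y):
    """Return the coefficients minimizing ||y - X @ beta||^2."""
    # lstsq solves the least-squares problem without forming X^T X explicitly.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)

# Synthetic training dataset (hypothetical): 200 observations, 3 predictors.
X = rng.normal(size=(200, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=200)

# Learning step: estimate the model from the training dataset.
beta_hat = fit_ols(X, y)

# Prediction step: apply the model to inputs that were not observed before.
X_new = rng.normal(size=(5, 3))
y_new = X_new @ beta_hat
```

With many observations and few predictors the estimate is stable; the next slides deal with the opposite situation.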
The curse of dimensionality
Problem: the so-called “curse of dimensionality”
Consequence: the predictive power of machine learning models reduces as the number of candidate predictors increases
In semiconductor manufacturing, it is common to have hundreds of candidate predictors: how to tackle the problem?
The number of selected predictors grows almost linearly with the number of candidate predictors
Regularization (or Penalization) methods
The curse of dimensionality
1943, Ridge (or Tikhonov) regression: in order to improve the least squares method, stable (“easier”) solutions are encouraged by penalizing coefficients through the parameter a
• Best value for hyperparameter is chosen via validation
• Computationally easy (closed form solution)
• No sparse solution
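The three properties listed above can be illustrated with a short sketch. It assumes the usual ridge loss ‖y − Xβ‖² + a‖β‖², whose closed-form minimizer is β = (XᵀX + aI)⁻¹Xᵀy. The notation, the candidate grid and the synthetic data are assumptions for the example, and a simple hold-out split stands in for whatever validation scheme is actually used.

```python
import numpy as np

def fit_ridge(X, y, a):
    """Closed-form ridge solution: (X^T X + a I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + a * np.eye(p), X.T @ y)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(1)

# Hypothetical data: few observations, many candidate predictors.
n, p = 60, 40
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                      # only 5 predictors actually matter
y = X @ beta_true + rng.normal(size=n)

# Hold-out split used to choose the hyperparameter.
X_tr, y_tr = X[:40], y[:40]
X_va, y_va = X[40:], y[40:]

grid = [0.01, 0.1, 1.0, 10.0, 100.0]     # assumed candidate values for a
scores = {a: rmse(y_va, X_va @ fit_ridge(X_tr, y_tr, a)) for a in grid}
best_a = min(scores, key=scores.get)
beta_hat = fit_ridge(X_tr, y_tr, best_a)

# Ridge shrinks coefficients but does not set them exactly to zero.
n_nonzero = int(np.count_nonzero(beta_hat))
```

Here `n_nonzero` equals the number of candidate predictors, which is the "no sparse solution" point: every predictor keeps a (possibly small) coefficient.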
The curse of dimensionality
1996 – today, L1-penalized methods: by constraining the solution to belong to a hyper-octahedron, sparse models can be obtained (variable selection). Most famous example: LASSO
• Best value for hyperparameter is chosen via validation
• Sparse solution (variable selection)
• Solved by iterative algorithms (e.g. SMO)
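To make the sparsity property concrete, here is a minimal sketch of an L1-penalized fit. It assumes the loss ½‖y − Xβ‖² + a‖β‖₁ and uses cyclic coordinate descent with soft-thresholding, which is a different iterative solver from the SMO mentioned on the slide and was chosen only because it is short. Function names, the value of `a` and the data are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def fit_lasso(X, y, a, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + a*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Remove predictor j's contribution, then refit it on its own.
            residual += X[:, j] * beta[j]
            rho = X[:, j] @ residual
            beta[j] = soft_threshold(rho, a) / col_sq[j]
            residual -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(2)

# Hypothetical data: 30 candidate predictors, only the first 3 are relevant.
n, p = 100, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = fit_lasso(X, y, a=20.0)
selected = np.flatnonzero(beta_hat)      # indices of the predictors kept by the model
```

Unlike ridge, many entries of `beta_hat` are exactly zero, so `selected` acts as a variable selection result.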
The hierarchical variability
We deal every day with multiple levels of variability:
Every piece of equipment has several chambers
In some cases, these chambers are split into sub-chambers
Different process groups and recipes run on the same equipment
Simple (“naive”) solution: create one model for every possible combination of factors
We’ll never have enough data to do that, especially for low-volume recipes
Better solution: handle those different levels of variability inside the model
Multilevel Techniques: Multilevel Ridge Regression (RR) & Multilevel Lasso
The Multilevel Transform
The first step is to create an extended input matrix to reflect the relationships between the j clusters. For instance, in the case of j mutually exclusive nodes,
[Equation: extended input matrix for j mutually exclusive nodes]
The input matrix reflects the dependency on logistic paths
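One plausible reading of the extended input matrix, for mutually exclusive nodes such as chambers, is a shared block followed by one block per node that is non-zero only for the wafers processed on that node. The sketch below builds that matrix; the layout, the function name `multilevel_transform` and the toy data are assumptions, since the slide's own equation is not available here.

```python
import numpy as np

def multilevel_transform(X, node_ids, n_nodes):
    """Build [X | X*1(node=0) | ... | X*1(node=n_nodes-1)].

    The first block is shared by all observations; each further block is
    non-zero only for the rows processed on the corresponding node.
    """
    blocks = [X]
    for k in range(n_nodes):
        mask = (node_ids == k).astype(X.dtype)[:, None]
        blocks.append(X * mask)
    return np.hstack(blocks)

# Toy example: 3 wafers, 2 process variables, 2 chambers (hypothetical).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
node_ids = np.array([0, 1, 0])           # chamber index for each wafer
X_ext = multilevel_transform(X, node_ids, n_nodes=2)
```

A penalized regression fitted on `X_ext` then estimates a common coefficient vector plus per-node deviations, so nodes with few observations can borrow strength from the shared block instead of needing a model of their own.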
Standard scenario
Production flow: sequence of steps; each step represents an operation that must be performed on a wafer in order to obtain a specific result
Each step is performed by different equipment (composed of multiple chambers):
The knowledge of which wafer is processed by a specific piece of equipment is available (logistic information)
The information about the processed wafer (e.g. sensor readings and recipe setup) might be available
On some equipment a “single step” VM system is already in place (estimated measures for each processed wafer are available)
Cascade Multistep VM
This approach allows building a pipeline in which the predictive information is propagated forward to contribute to the estimation of further models.
The multilevel input matrix is generated by replacing the j-th cluster’s process variables with the estimate of the j-th VM (VM-j)
Pros:
o Small overhead added to the input space
o Computational effort very similar to the “single step” VM case
Cons:
o Steps without “single step” VM must be excluded
o There might be some information loss between two or more steps
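The cascade idea can be sketched with two chained ridge models. This is an illustration on synthetic data, not the authors' implementation: the step names, dimensions, coefficients and the fixed penalty `a=1.0` are all assumptions, and errors are computed in-sample for brevity.

```python
import numpy as np

def fit_ridge(X, y, a):
    """Closed-form ridge solution: (X^T X + a I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + a * np.eye(p), X.T @ y)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(3)
n = 300
X_cvd = rng.normal(size=(n, 4))          # hypothetical upstream (CVD) process variables
X_litho = rng.normal(size=(n, 3))        # hypothetical downstream (litho) process variables

# Synthetic ground truth: the final target depends on the upstream outcome.
y_cvd = X_cvd @ np.array([1.0, -1.0, 0.5, 2.0]) + 0.1 * rng.normal(size=n)
y_final = (0.8 * y_cvd
           + X_litho @ np.array([0.5, 1.0, -0.5])
           + 0.1 * rng.normal(size=n))

# Step 1: the upstream "single step" VM estimates its own metrology value.
beta_cvd = fit_ridge(X_cvd, y_cvd, a=1.0)
vm_cvd = X_cvd @ beta_cvd

# Step 2: the upstream process variables are replaced by that single estimate.
X_cascade = np.column_stack([X_litho, vm_cvd])
beta_final = fit_ridge(X_cascade, y_final, a=1.0)

rmse_cascade = rmse(y_final, X_cascade @ beta_final)
rmse_litho_only = rmse(y_final, X_litho @ fit_ridge(X_litho, y_final, a=1.0))
```

The downstream input grows by one column only, which matches the "small overhead" point, while everything the upstream sensors know beyond `vm_cvd` is discarded, which is the possible information loss.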
Process and Logistic Multistep VM
With this approach, all the relevant logistic, process and recipe information from all the considered steps is included in the input set
In this case, the generation of the input matrix fully follows the previous Multilevel Transform
Pros:
o Steps with no (or meaningless) measurements can be included
o All the available information is provided to the learning algorithm
Cons:
o Input space dimension is significantly increased by this approach
o More observations are needed to train the learning algorithm
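The growth of the input space is easy to see in a small sketch. It assumes that each step's process variables are expanded with a shared block plus one block per chamber and that the per-step matrices are then placed side by side; the step names, chamber counts and dimensions are hypothetical.

```python
import numpy as np

def multilevel_transform(X, node_ids, n_nodes):
    """Shared block plus one block per node (non-zero only for that node's rows)."""
    blocks = [X] + [X * (node_ids == k)[:, None] for k in range(n_nodes)]
    return np.hstack(blocks)

rng = np.random.default_rng(4)
n = 50                                    # number of wafers

# Hypothetical per-step data: process variables and the chamber used by each wafer.
steps = {
    "cvd": {"X": rng.normal(size=(n, 4)),
            "chamber": rng.integers(0, 3, size=n), "n_chambers": 3},
    "litho": {"X": rng.normal(size=(n, 3)),
              "chamber": rng.integers(0, 2, size=n), "n_chambers": 2},
}

# Full multistep input: every step contributes its extended block.
X_full = np.hstack([
    multilevel_transform(s["X"], s["chamber"], s["n_chambers"])
    for s in steps.values()
])

n_raw = sum(s["X"].shape[1] for s in steps.values())   # 4 + 3 raw process variables
n_extended = X_full.shape[1]                            # 4*(1+3) + 3*(1+2) columns
```

Seven raw variables become 25 columns for only two steps, which is why this setup needs more observations and a regularized learner.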
Scenario
Production flow for methodologies validation:
1. Chemical Vapor Deposition (CVD)
2. Thermal Oxidation
3. Coating
4. Lithography
Target: post-litho CDs
Dataset: 583 wafers (anonymized)
Hyper-parameter tuning: 10-fold cross-validation
Multistep VM setups:
CVD-Litho Cascade
CVD-Litho Process and Full Logistic
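Hyper-parameter tuning by 10-fold cross-validation can be sketched as follows for a ridge model. Only the sample size of 583 is taken from the slide; the data are synthetic, and the candidate grid, the function names and the absence of an intercept are assumptions made to keep the example short.

```python
import numpy as np

def fit_ridge(X, y, a):
    """Closed-form ridge solution: (X^T X + a I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + a * np.eye(p), X.T @ y)

def cv_rmse(X, y, a, n_folds=10, seed=0):
    """Validation RMSE of ridge with penalty a, pooled over the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        beta = fit_ridge(X[train], y[train], a)
        sq_err += np.sum((y[val] - X[val] @ beta) ** 2)
    return float(np.sqrt(sq_err / len(y)))

rng = np.random.default_rng(5)
n, p = 583, 20                            # 583 wafers as on the slide; data are synthetic
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

grid = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]   # assumed candidate penalties
scores = {a: cv_rmse(X, y, a) for a in grid}
best_a = min(scores, key=scores.get)
```

Every wafer is used for validation exactly once, so the comparison between candidate penalties does not depend on a single lucky or unlucky split.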
Cascade
With RR, the cascade VM further improves VM performance. This result might be related to the additional hidden knowledge provided by the intermediate CVD metrology prediction.
With the LASSO, the cascade approach performs worse. It should be noted that this is the only case in which the extended input space does not improve predictive performance.
Process and Full Logistic
Validation RMSE results for Ridge Regression: the full-step choice clearly improves predictive performance.
LASSO is consistently outperformed by Ridge Regression on the dataset used for the experiment; nevertheless, the extended input space also proves fruitful in this case, compared with the Lithography-based approach.
Best Lasso and Best RR
The best overall results for Ridge
Regression are obtained with the
cascade approach and by considering all
the process steps.
For the LASSO, the best overall results are
obtained by considering the extended
process values for all the involved steps.
Conclusions
Research and design of Multistep VM strategies targeted to specific semiconductor manufacturing needs
Main features:
Enhancing precision and accuracy of regular VM systems
Taking into account processes without measurements
Tests showed promising results; however, the strategy to be implemented must be carefully designed:
Sample size and relevance of the steps are fundamental criteria to obtain the best performance
Thanks for your attention!