Multistep Virtual Metrology Approaches for Semiconductor Manufacturing Processes


Presenter: Simone Pampuri (University of Pavia, Italy)

Authors:

- Simone Pampuri, University of Pavia, Italy
- Andrea Schirru, University of Pavia, Italy
- Gian Antonio Susto, University of Padova, Italy
- Cristina De Luca, Infineon Technologies AT, Austria
- Alessandro Beghi, University of Padova, Italy
- Giuseppe De Nicolao, University of Pavia, Italy

Introduction

- Collaboration between the University of Pavia (Italy), the University of Padova (Italy) and Infineon Technologies AT (Austria)
- Activity funded by the European project EU-IMPROVE: Implementing Manufacturing science solutions to increase equiPment pROductiVity and fab pErformance
- Duration: 42 months (since Jan 2009)
- Global funding: 37.7 M€
- 32 partners, including:
  - Semiconductor fabs
  - Academic institutions
  - Research centers
  - Software houses
- Thematic Work Packages



Contents

1. Motivations
2. Machine Learning
3. Multilevel framework
4. Multistep VM
5. Results and Conclusions

What is Virtual Metrology?

- In semiconductor manufacturing, measurement operations are costly and time-consuming
- Only a small part of the production is actually measured
- Virtual Metrology exploits sensor and logistic information to predict the process outcome

[Diagram: Sensor Data, Recipe Data and Logistic Data feed the VM module; its Predictive Information drives controllers, sampling tools and decision tasks.]


Machine learning (in a nutshell)

- Machine learning algorithms create models from observed data (the training dataset), using little or no prior information about the physical system

[Diagram: a training dataset of inputs X and outputs Y feeds a learning algorithm, which produces a model f(X).]

- The model is then able to predict patterns similar to the observed ones

[Diagram: the trained model maps a new input X_new to a prediction Y_new.]

- Most famous algorithm: Ordinary Least Squares (OLS), which consists in solving the optimization problem defined by the squared-error loss J(β) = ||Y − Xβ||²
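To make this concrete, here is a minimal Python sketch of OLS on synthetic data (an illustration, not material from the talk; data and names are hypothetical):

```python
import numpy as np

def ols_fit(X, Y):
    """OLS coefficients: argmin over beta of ||Y - X beta||^2."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # numerically safer than inverting X^T X
    return beta

# Hypothetical toy data: 100 "wafers", 5 candidate predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_beta = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
Y = X @ true_beta + 0.1 * rng.normal(size=100)
print(ols_fit(X, Y))  # recovers coefficients close to true_beta
```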

The curse of dimensionality

- Problem: the so-called "curse of dimensionality": the number of selected predictors grows almost linearly with the number of candidate predictors
- Consequence: the predictive power of machine learning models decreases as the number of candidate predictors increases
- In semiconductor manufacturing, it is common to have hundreds of candidate predictors: how can the problem be tackled?
- Answer: Regularization (or Penalization) methods

1943: Ridge (or Tikhonov) regression. In order to improve the least squares method, stable ("easier") solutions are encouraged by penalizing the coefficients through the parameter α, i.e. by minimizing J(β) = ||Y − Xβ||² + α||β||².

- The best value for the hyperparameter α is chosen via validation
- Computationally easy (closed-form solution)
- No sparse solution
1996 to today: L1-penalized methods. By constraining the solution to belong to a hyper-octahedron (the L1 ball), sparse models can be obtained (variable selection). The most famous example is the LASSO, which minimizes J(β) = ||Y − Xβ||² + λ||β||₁.

- The best value for the hyperparameter is chosen via validation
- Sparse solution (variable selection)
- Solved by iterative algorithms (e.g. SMO)
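The talk cites iterative solvers such as SMO; as a simple, commonly used alternative, here is a hedged numpy sketch of ISTA (proximal gradient) for the L1-penalized objective above:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=1000):
    """Minimize ||Y - X beta||^2 / (2n) + lam * ||beta||_1 by ISTA."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part's gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - Y) / n
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta   # many coefficients end up exactly zero (variable selection)
```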


The hierarchical variability

- We deal every day with multiple levels of variability:
  - Every piece of equipment has several chambers
  - In some cases, these chambers are split into sub-chambers
  - Different process groups and recipes run on the same equipment
- Simple ("naive") solution: create one model for every possible combination of factors
  - We will never have enough data to do that, especially for low-volume recipes
- Better solution: handle those different levels of variability inside the model

Multilevel Techniques: Multilevel Ridge Regression (RR) & Multilevel LASSO

The Multilevel Transform

- The first step is to create an extended input matrix that reflects the relationships between the j clusters, for instance in the case of j mutually exclusive nodes (one common construction is sketched below)
- The extended input matrix reflects the dependency on logistic paths
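The slide's matrix expression did not survive extraction, so the following Python sketch shows one standard construction under stated assumptions: each observation's features are copied into the column block of its own cluster, next to a shared global block. This layout (and the helper name multilevel_transform) is an illustration, not necessarily the exact transform used in the talk.

```python
import numpy as np

def multilevel_transform(X, cluster_ids, n_clusters, include_global=True):
    """Illustrative extended input matrix for mutually exclusive clusters:
    row i keeps its features in the block of its own cluster (zeros in the
    other blocks), optionally preceded by a shared global copy of X."""
    n, p = X.shape
    blocks = [X] if include_global else []
    for j in range(n_clusters):
        block = np.zeros((n, p))
        mask = cluster_ids == j
        block[mask] = X[mask]        # only cluster-j rows are non-zero here
        blocks.append(block)
    return np.hstack(blocks)         # shape: (n, p * (n_clusters + 1))

# e.g. 6 wafers with 2 process variables, run on 3 mutually exclusive chambers
X = np.arange(12.0).reshape(6, 2)
chambers = np.array([0, 0, 1, 2, 1, 2])
print(multilevel_transform(X, chambers, 3).shape)   # (6, 8)
```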


Standard scenario

- Production flow: a sequence of steps; each step represents an operation that must be performed on a wafer in order to obtain a specific result
- Each step is performed by a different piece of equipment (composed of multiple chambers):
  - The knowledge of which wafer is processed by a specific piece of equipment is available (logistic information)
  - The information about the processed wafer (e.g. sensor readings and recipe setup) might be available
  - On some equipment a "single step" VM system is already in place (estimated measures for each processed wafer are available)

Cascade Multistep VM

- This approach allows building a pipeline in which the predictive information is propagated forward and contributes to further model estimation (see the sketch after the pros/cons below)
- The multilevel input matrix is generated by replacing the j-th cluster's process variables with the j-th "single step" VM estimation (VM-j)

Pros:
- Small overhead appended to the input space
- Computational effort very similar to the "single step" VM case

Cons:
- Steps without a "single step" VM must be excluded
- There might be some information loss between two or more steps
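A hedged Python sketch of the cascade idea, using hypothetical step names that match the validation scenario later (a CVD step feeding lithography); ridge_fit is the closed-form helper sketched earlier, and the single-column replacement is an illustrative simplification:

```python
import numpy as np

def cascade_vm(X_cvd, y_cvd, X_litho, y_litho, alpha=1.0):
    """Cascade sketch: the upstream single-step VM prediction replaces the
    upstream process variables in the downstream model's input."""
    beta_cvd = ridge_fit(X_cvd, y_cvd, alpha)            # single-step VM for the CVD step
    yhat_cvd = X_cvd @ beta_cvd                          # intermediate VM estimate
    X_cascade = np.hstack([yhat_cvd[:, None], X_litho])  # propagate it forward
    beta_final = ridge_fit(X_cascade, y_litho, alpha)    # model for the final target
    return beta_cvd, beta_final
```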

Process and Logistic Multistep VM

- With this approach, all the relevant logistic, process and recipe information from all the considered steps is included in the input set
- In this case, the generation of the input matrix fully follows the previous Multilevel Transform (see the sketch after the pros/cons below)

Pros:
- Steps with no (or meaningless) measurements can be included
- All the available information is provided to the learning algorithm

Cons:
- The input space dimension is significantly increased by this approach
- More observations are needed to train the learning algorithm
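Reusing the illustrative multilevel_transform helper from the earlier sketch, the full-flow input matrix could be assembled along these lines (an assumption-laden sketch, not the talk's exact construction):

```python
import numpy as np

def full_flow_inputs(steps):
    """steps: list of (X_step, cluster_ids, n_clusters) tuples, one per
    considered process step; each block goes through the multilevel
    transform and the results are stacked into one wide input matrix."""
    return np.hstack([multilevel_transform(X, c, k) for X, c, k in steps])
```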



Scenario

- Production flow for methodology validation:
  1. Chemical Vapor Deposition (CVD)
  2. Thermal Oxidation
  3. Coating
  4. Lithography
- Target: post-litho CDs (critical dimensions)
- Dataset: 583 anonymized wafers
- Hyper-parameter tuning: 10-fold cross-validation
- Multistep VM setups:
  - CVD-Litho Cascade
  - CVD-Litho Process and Full Logistic
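As an illustration of this tuning protocol, a minimal numpy sketch of 10-fold cross-validated RMSE for one ridge hyperparameter value (ridge_fit is the closed-form helper sketched earlier; the fold logic is generic, not the talk's code):

```python
import numpy as np

def ten_fold_rmse(X, Y, alpha, seed=0):
    """10-fold cross-validated RMSE for a given ridge hyperparameter."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), 10)
    mse = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        beta = ridge_fit(X[train], Y[train], alpha)
        mse.append(np.mean((Y[test] - X[test] @ beta) ** 2))
    return float(np.sqrt(np.mean(mse)))

# tune by scanning a grid and keeping the alpha with the lowest RMSE:
# best_alpha = min([0.01, 0.1, 1.0, 10.0], key=lambda a: ten_fold_rmse(X, Y, a))
```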

Results

Cascade

The cascade VM allows further improvement of the VM performance when using RR. This result might be related to the additional hidden knowledge provided by the intermediate CVD metrology prediction.

The cascade approach performs worse with the LASSO. It should be noted that this is the only case in which the extended input space does not improve the predictive performance.

Process and Full Logistic

Validation RMSE results for Ridge Regression show that the full step choice improves the predictive performance.

LASSO is consistently outperformed by Ridge Regression on the dataset used for the experiment; nevertheless, the extended input space also proves fruitful in this case, with respect to the Lithography-based approach.

Best LASSO and Best RR

The best overall results for Ridge Regression are obtained with the cascade approach and by considering all the process steps.

For the LASSO, the best overall results are obtained by considering the extended process values for all the involved steps.


Conclusions

- Research and design of Multistep VM strategies targeted at specific semiconductor manufacturing needs
- Main features:
  - Enhancing the precision and accuracy of a regular VM system
  - Taking into account processes without measurements
- Tests showed promising results; however, the strategy to be implemented must be carefully designed:
  - Sample size and relevance of the steps are fundamental criteria for obtaining the best performance


Thanks for your attention!
