18th European Symposium on Computer Aided Process Engineering – ESCAPE 18
Bertrand Braunschweig and Xavier Joulia (Editors)
© 2008 Elsevier B.V./Ltd. All rights reserved.
Diagnosis of chemical processes by fuzzy clustering methods: New optimization method of partitions
Claudia Isaza a,c, Marie-Veronique Le Lann a,b, Joseph Aguilar-Martin a
a CNRS; LAAS, 7, avenue du Colonel Roche, F-31077 Toulouse, France
b Université de Toulouse; INSA; LAAS; 135, avenue de Rangueil; F-31077 Toulouse, France
c Universidad de Antioquia, Calle 67 Nº 53-108, bloque 21, Medellín, Colombia
Abstract
The diagnosis of processes can be defined as the identification of their functional states. When a mathematical model is difficult or impossible to obtain (as is generally the case for complex chemical processes), knowledge of the process behaviour can be extracted from historical measurements or from complex simulators. This knowledge is then organized as a partition of the data set into classes representing the functional states of the process (normal or faulty operations). Among data mining techniques, those including fuzzy logic have the advantage of expressing the membership degree of the data to several classes. The main objective of this work is to propose a new method to optimize the fuzzy partition in terms of cluster compactness and separation. The method is applied to the propylene glycol production process simulated with Hysys.
Keywords: Diagnosis, fuzzy clustering, partition quality index, complex chemical process.
1. Introduction
When no precise mathematical model is available, the diagnosis of a complex process can be developed from measurements recorded during previous normal and abnormal situations. These measurements can be used to define the operational states through training mechanisms and expertise. Among data mining techniques, classification techniques make it possible to establish a model of the system states (behavioural model) by extracting knowledge from various attributes related to a particular behaviour, without this behaviour being represented by a set of analytical relations. These attributes may be raw or statistical characteristics of a signal, such as its average and standard deviation, or even qualitative information. Changes in these characteristics enable the detection of abnormal operations [1]. Among the large number of classification techniques, those including fuzzy logic have the advantage of expressing the membership degree of an observation (data) to several classes, and they are known to be able to model knowledge uncertainty and imprecision. In general, these methods exhibit similar and high performance if their respective initial parameters are adequately selected. To select these parameters correctly, different approaches for the validation and the adaptation of the data space partition have been proposed [2][3]. To evaluate the partition quality, they generally use the geometrical cluster characteristics obtained by distance-based clustering algorithms [4].
Consequently, these approaches can be applied only to some specific classification techniques. The method proposed here automatically validates and adapts the data space partition obtained by any fuzzy classification technique. It is based on a new partition quality index (CV) that depends only on an initial non-optimal fuzzy partition, through its membership degree matrix, and not on the data values. The proposed methodology has been applied to the propylene glycol production process [5] simulated in HYSYS.
Section 2 introduces system diagnosis by classification methods. Section 3 describes the proposed approach, and Section 4 presents the chemical process application. Finally, the discussion and conclusions are presented in Section 5.
2. Diagnosis using classification methods
Process monitoring using a classification method consists in determining, at each sample time, the current class, which was previously associated with a functional state of the process. There are two principal phases: training and recognition. In the first phase (training), the objective is to find the process behaviour characteristics that allow the process states to be differentiated, each state being associated with a class. The initial algorithm parameters are selected by the process expert, who validates the obtained behavioural model. In the subsequent phase, data recognition identifies the current process state on-line. At each sampling time, a vector collects the accessible information (raw data or pre-treated data, e.g. filtered data or an FFT), which is provided for monitoring, and the class recognition procedure tells the operator the current functional state of the process.
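A minimal sketch of the recognition step follows. It assumes the classifier returns one membership vector per sample and that the sample is assigned to the class with the highest membership degree. This is a common decision rule, and LAMDA's exact decision procedure may differ. The function `recognize` and the class-to-state mapping are hypothetical. The state labels are borrowed from Tab. 1 for illustration only.

```python
from typing import Dict, Sequence


def recognize(memberships: Sequence[float], class_to_state: Dict[int, str]) -> str:
    """Return the functional state associated with one sample (hypothetical helper).

    memberships[k] is the membership degree of the current sample to class k,
    as produced by some fuzzy classifier (assumed input format).
    The sample is assigned to the class with the highest membership degree.
    """
    if not memberships:
        raise ValueError("empty membership vector")
    best_class = max(range(len(memberships)), key=lambda k: memberships[k])
    return class_to_state[best_class]


# Illustrative mapping from class index to functional state (hypothetical).
states = {0: "NORMAL", 1: "H-TCOOL", 2: "B-TCOOL"}
print(recognize([0.10, 0.75, 0.15], states))  # H-TCOOL
```

In this sketch, the mapping from classes to states is the part that the expert validates during training. Recognition itself reduces to a lookup on each new sample.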
In order to optimize the obtained partition, we propose to include in the training phase a step that automatically validates and adjusts the clusters. The proposed approach automatically improves a non-optimal initial partition in terms of compactness and class separation. It therefore helps to discriminate between classes, i.e. between operation modes.
3. Fuzzy partition optimisation method
To improve a partition, it is first necessary to evaluate its quality. We propose a new partition quality index (CV), which is the basis of the fuzzy partition improvement algorithm. It includes a measure of the inter-class distance and a measure of the partition dispersion. In the diagnosis area, it is desirable to have a partition with compact and well-separated classes [3][6][7]. However, to avoid the trivial solution with one class per element, the total number of classes must also be taken into account. By seeking compact classes, the objective is to obtain clusters with high membership degrees for the elements similar to the class prototypes and small membership degrees otherwise. A good separation of classes facilitates the detection of abnormal process behaviour. A high number of classes generates an unnecessarily complex behavioural model. On the other hand, a behavioural model with a very low number of classes would be incomplete or imprecise. Consequently, a partition quality index must take three characteristics into account: compactness, separation and number of classes.
Most of the cited indexes require complementary information associated with distance-based clustering algorithms [4]. The objective here is to propose a partition quality index applicable to any fuzzy classification method.
In our approach, each class is interpreted as a fuzzy set defined by the membership degree of each datum to that class. This representation makes it possible to use fuzzy set theory to establish the partition characteristics without working directly with the data values or with the geometrical structure of the classes.
To estimate the partition dispersion, different measures based on the membership degrees and the data averages have been proposed [6][7]. The Interclass Contrast Index, Icc [7], generalises the concept of Fisher's dispersion matrix to fuzzy partitions. In this section we present a new partition quality index that is close to the Icc index [7]. Eq. (1) gives the expression of the Icc index:
$$Icc = \frac{sbe}{N}\cdot D_{min}\sqrt{K} \qquad (1)$$
The dispersion measure $sbe_k$ associated with the dispersion of class $k$ is given by:
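$$sbe_k = \sum_{n=1}^{N}\mu_{kn}\,(m_{ke}-m)^{T}(m_{ke}-m), \qquad m_{ke} = \frac{\sum_{n=1}^{N}\mu_{kn}\cdot x_n}{\sum_{n=1}^{N}\mu_{kn}} \quad\text{and}\quad m = \frac{1}{N}\cdot\sum_{n=1}^{N}x_n \qquad (2)$$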
where $N$ is the total number of training data, $m_{ke}$ is the prototype of class $k$ and $m$ is the data centre. $\mu_{kn}$ is the membership degree of datum $n$ (vector $x_n$) to class $k$.
This approach explicitly uses the data values. This entails a high computational cost when there are many descriptors, i.e. process variables, which is common for complex processes.
The minimal distance ($D_{min}$) is the smallest Euclidean distance between class centres ($m_{ke}$, Eq. (2)). The number of classes, $K$, is then taken into account in order to avoid a high Icc value (which corresponds to the best partition) when there is a large number of classes.
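The quantities of Eq. (2) can be sketched as follows. This makes explicit the dependence on the data values that the text points out: every prototype requires a pass over all $N$ vectors and all descriptors. This is a minimal illustration, not the authors' code. It assumes that the data are a list of descriptor vectors and that the membership degrees form a K x N nested list. It also assumes that $sbe_k$ is the scalar form $(m_{ke}-m)^T(m_{ke}-m)$ weighted by the memberships. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def prototypes(X: Matrix, U: Matrix) -> Matrix:
    """Class prototypes m_ke: membership-weighted means of the data (Eq. 2).

    X[n] is the descriptor vector x_n; U[k][n] is the membership degree mu_kn.
    Assumes every class has a non-zero total membership.
    """
    dim = len(X[0])
    protos = []
    for row in U:
        weight = sum(row)
        protos.append([sum(u * x[d] for u, x in zip(row, X)) / weight
                       for d in range(dim)])
    return protos


def data_centre(X: Matrix) -> List[float]:
    """Data centre m: plain average of the N data vectors (Eq. 2)."""
    n = len(X)
    return [sum(x[d] for x in X) / n for d in range(len(X[0]))]


def sbe(X: Matrix, U: Matrix) -> List[float]:
    """sbe_k = sum_n mu_kn * (m_ke - m)^T (m_ke - m), one scalar per class k."""
    m = data_centre(X)
    values = []
    for row, proto in zip(U, prototypes(X, U)):
        sq_dist = sum((p - c) ** 2 for p, c in zip(proto, m))
        values.append(sum(row) * sq_dist)
    return values


def d_min(protos: Matrix) -> float:
    """D_min: smallest Euclidean distance between two distinct class centres."""
    return min(math.dist(protos[p], protos[q])
               for p in range(len(protos)) for q in range(p + 1, len(protos)))


# Toy data: four 2-D samples split into two classes (illustrative only).
X = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
U = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(sbe(X, U))                # [8.0, 8.0]
print(d_min(prototypes(X, U)))  # 4.0
```

`math.dist` requires Python 3.8 or later. The nested-list representation keeps the sketch dependency-free. With many descriptors, an array library would be the natural choice.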
The proposed partition quality index is the Clusters Validity index (CV, Eq. (3)). It has the same structure as Icc. The difference is that the partition dispersion ($Dis$) and the minimal distance ($D^*_{min} = \min_{p \neq q;\ p,q \in [1,K]} d^*(p,q)$) are estimated with expressions that do not depend explicitly on the data values:
$$CV = \frac{Dis}{N}\cdot D^{*}_{min}\sqrt{K} \qquad (3)$$
This index measures the partition quality in terms of class compactness and separation. The highest value of CV corresponds to the best partition.
The computation of the partition dispersion involves the information index $I_D$ [8], which measures the information degree of fuzzy sets. $I_D$ has a high value for fuzzy sets with strong membership degrees for the data similar to the prototype and small membership degrees otherwise. The complement of $I_D$ is an entropy measure, $H_D(A)$. $H_D(A)$ establishes the similarity between a fuzzy set $A$ and the singleton (the most compact fuzzy set), and it is used to evaluate the dispersion $Dis$ (Eq. (4)):

$$Dis = \sum_{k=1}^{K} H_D(k) = \sum_{k=1}^{K}\bigl(1 - I_D(k)\bigr) \qquad (4)$$

where $I_D(k)$, the information index of class $k$, is computed through the exponential expression defined in [8]. It uses the membership degrees $\mu_{kn}$ and the maximum membership degree of the class, $\mu_{Mk} = \max_{n \in [1,N]} \mu_{kn}$. $K$ is the number of classes.

Because $Dis$ is built from $I_D(A)$, the quality of each fuzzy set is indirectly included in the analysis. Thus the classes are better defined (more similar to the singleton) when the value of the dispersion is low. To estimate the minimal distance $D^*_{min}$, we propose a new measure that computes the distance between fuzzy sets without using the data values. Eq. (5) defines a distance index between two fuzzy subsets $A$ and $B$ [8]:
$$d^*(A,B) = 1 - \frac{M(A \cap B)}{M(A \cup B)} = 1 - \frac{\sum_{n=1}^{N}\min(\mu_{An},\mu_{Bn})}{\sum_{n=1}^{N}\max(\mu_{An},\mu_{Bn})} \qquad (5)$$
$d^*(A,B)$ is an ultrametric measure on the set $P(X)$ of fuzzy subsets of the data set $X$. Therefore, taking the complement of $d^*(A,B)$ gives a measure of similarity between fuzzy sets, $G(A,B)$:

$$G(A,B) = 1 - d^*(A,B) \qquad (6)$$
This similarity index is used in the partition optimization algorithm to find the two most similar classes and to update the fuzzy partition matrix.
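Eqs. (5) and (6), together with $D^*_{min}$, can be written directly from the membership matrix, with no reference to the data values. The sketch below is a minimal illustration under two assumptions. First, each class is represented by its row of membership degrees over the $N$ training data. Second, the min T-norm and max S-norm are used, which are the operators the Conclusions section reports for this first approach. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def fuzzy_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """d*(A,B) of Eq. (5); a[n], b[n] are the memberships of datum n to A and B.

    Assumes at least one membership is non-zero, so the denominator is positive.
    """
    inter = sum(min(x, y) for x, y in zip(a, b))  # M(A ∩ B) with the min T-norm
    union = sum(max(x, y) for x, y in zip(a, b))  # M(A ∪ B) with the max S-norm
    return 1.0 - inter / union


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """G(A,B) = 1 - d*(A,B), Eq. (6)."""
    return 1.0 - fuzzy_distance(a, b)


def d_star_min(U: List[Sequence[float]]) -> float:
    """D*_min: smallest d* over all pairs of distinct classes (rows of U)."""
    return min(fuzzy_distance(U[p], U[q])
               for p, q in combinations(range(len(U)), 2))


# Three illustrative classes over N = 3 data (rows of a membership matrix).
A, B, C = [1.0, 0.5, 0.0], [0.5, 0.5, 0.5], [0.0, 0.0, 1.0]
print(fuzzy_distance(A, B), similarity(A, B))  # 0.5 0.5
print(d_star_min([A, B, C]))                   # 0.5
```

Classes with disjoint supports (here A and C) get the maximum distance of 1, while identical rows get a distance of 0.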
In the literature there are several methods to optimize the data space partition. They are based on a geometric representation of the classes (for distance-based clustering techniques [4]), on membership degree threshold overshoot, on entropy, or on restarting the training algorithm.
The proposed approach is more general. The algorithm includes two principal steps: partition quality analysis and cluster update. At each iteration, the partition quality is measured by the proposed validation index (CV, Eq. (3)). For the cluster update, the fuzzy similarity of the classes (G(A,B), Eq. (6)) is calculated, and the two most similar classes are merged using the maximum S-norm. The algorithm then updates the fuzzy partition matrix. These steps repeat until a partition with only two classes is reached. The best partition is the one with the maximum CV value. Unlike the optimization algorithm proposed in [9], the new methodology performs an exhaustive and ordered search for the global maximum, which avoids falling into local maxima.
The proposed methodology depends only on the properties of an initial non-optimal fuzzy partition (its matrix of membership degrees), not on the data values. It is therefore applicable to all fuzzy classification methods.
4. Application to the propylene glycol production
The proposed methodology has been applied to the complete propylene glycol production process (Figure 1). This process includes several unit operations: mixing, chemical reaction (H₂O + C₃H₆O → C₃H₈O₂) and separation. It has been simulated with the Hysys package and was reported in a previous study on a sensor selection methodology for diagnosis purposes [5]. That study relies on a first classification of faults using all the possible descriptors, before reducing them to a minimum set of pertinent ones. Here, the proposed methodology is used to optimize the initial partition. The aim is to give the process expert a simpler partition (with few states, without loss of precision), which leads to a more understandable behavioural model.
Figure 1. Propylene glycol process scheme [5].
The unsupervised fuzzy classification technique adopted in this study is LAMDA (Learning Algorithm for Multivariate Data Analysis) [10], but it could have been replaced by any other fuzzy classification method.
The classification partition optimization method has been applied to the same data set as used in [5]. Faults and dysfunctions at different points of the process have been simulated by applying increasing and decreasing changes around the nominal values of four variables:
- the propylene oxide main feed flow rate (1. Oxyde, 2. Oxyde);
- the inlet cooling fluid temperature at the reactor (3. TinletCool, 4. TinletCool);
- the inlet cooling fluid at the distillation condenser (5. TinletCond, 6. TinletCond);
- the reaction rate, through the frequency factor of the kinetic law (7. Freq., 8. Freq.).

This dynamic simulation run yielded 6337 measurements, which constitute the set of individuals to be classified.
The partition optimisation method has been applied to this set with all the potential sensors, i.e. before the sensor selection [5]. Note that no concentration measurement was considered among the potential sensors. Figure 2 gives the time evolution of these sensors and the initial classification obtained with LAMDA, which exhibits 36 classes.
Figure 2. Initial classification, 21 descriptors (panels: process variables; initial classification).
Figure 3 gives the evolution of the CV quality index over the iterations.
Figure 3. Partition quality index CV, 21 descriptors (axes: iteration; CV).
The best partition is obtained at iteration 18, so the optimal partition is composed of 19 classes, as presented in Figure 4.
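As a quick consistency check of these figures, assume that iteration 1 corresponds to the initial 36-class LAMDA partition and that each iteration merges exactly one pair of classes. The number of classes $K_i$ at iteration $i$ is then:

$$K_i = 36 - (i - 1), \qquad K_{18} = 36 - 17 = 19, \qquad K_{35} = 36 - 34 = 2$$

Under this assumption, the search evaluates 35 partitions, from the initial 36 classes down to the final two-class partition.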
The partition optimisation method merges classes to obtain a simpler partition, but it also preserves small classes. These may correspond to transition states such as «pre-fault», alarm or drift states. Keeping these specific «pre-fault» states in the model is very important, because very early fault detection is crucial to trigger a preventive action or maintenance. To assess the results, the reference partition proposed by the expert was used for comparison; this partition has 25 classes. After renaming the classes, Tab. 1 presents the relation between the states (faulty and normal), the automatically obtained partition and the reference partition.
Figure 4. Optimal classification, 21 descriptors (axes: individual; class).
REFERENCE CLASS   AUTOMATIC CLASS   STATE
1                 1                 B-TCOND
2                 2                 H-OXYDE
3                 3                 REC_H-OXY_N
4                 4                 B-OXYDE
5                 7                 REC_B-OXY_N
6                 8                 B-TCOOL
7                 9                 H-TCOOL
8                 10                REC-HAUSSE
9                 11                H-TCOND
10                12                NORMAL
11                13                H-FREQV
12                14                B-FREQV
13                NON ELEMENTS
14                15                AL_H-OXYDE
15                16                AL_B-OXYDE
16                17                SI1
17                18                SI2
18                1                 SI3
19                13                SI4
20                20                SI5
21                21                SI6
22                23                SI7
23                24                SI8

Tab. 1. Classes/States association, 21 descriptors.
The similarity between the optimal partition and the reference partition is calculated using the normalized similitude index [10]. The value obtained, 0.0131, indicates high compatibility between the two partitions.
This method yields a simple partition for identifying the fault states and the normal state, taking into account the class separation and dispersion criteria. It gives the process expert valuable help in establishing the functional states when implementing a monitoring technique for a complex process. The optimal partition can be associated more directly with states than the initial partition.
5. Conclusions
We have proposed a methodology for fuzzy partition optimization that is independent of the classification method. The method is useful when no geometrical representation of the clusters is available. The approach complements classification methods and is useful for identifying faults in complex systems. In this first approach, only one type of S-norm and T-norm (min-max) has been used. A study of the influence of the type of S-norm and T-norm is necessary, because these operations affect the calculations of the method.
References
[1] S. Gentil et al., Supervision des Procédés Complexes, Lavoisier, 2007.
[2] X. Wang, V. Syrmos, Optimal Cluster Selection Based on Fisher Class Measure, ACC, 2005.
[3] U. Kaymak, M. Setnes, IEEE Transactions on Fuzzy Systems, Vol. 10, No. 6, 2002.
[4] R. Krishnapuram, J. Kim, IEEE Transactions on Fuzzy Systems, Vol. 8, pp. 228-236, Apr. 2000.
[5] A. Orantes et al., A new support methodology for the placement of sensors used for fault detection and diagnosis, Chemical Engineering and Processing, Elsevier, in press, available online, doi:10.1016/j.cep.2007.01.024.
[6] X. L. Xie, G. Beni, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 8, 1991.
[7] C. Franco et al., A Validity Measure for Hard and Fuzzy Clustering Derived from Fisher's Linear Discriminant, International Conference on Fuzzy Systems, 2002.
[8] C. Isaza et al., Decision Method for Functional States Validation in a Drinking Water Plant, 10th Computer Applications in Biotechnology (CAB), IFAC, 2007.
[9] C. Isaza et al., Artificial Intelligence Research and Development, 2006.
[10] R. López, Auto apprentissage d'une partition: Application au classement itératif de données multidimensionnelles, PhD thesis, UPS de Toulouse, 1977.