Diagnosis of chemical processes by fuzzy clustering methods: New optimization method of partitions


18th European Symposium on Computer Aided Process Engineering - ESCAPE 18
Bertrand Braunschweig and Xavier Joulia (Editors)
© 2008 Elsevier B.V./Ltd. All rights reserved.



Claudia Isaza (a, c), Marie-Veronique Le Lann (a, b), Joseph Aguilar-Martin (a)

(a) CNRS ; LAAS, 7, avenue du Colonel Roche, F-31077 Toulouse, France
(b) Université de Toulouse ; INSA ; LAAS ; 135, avenue de Rangueil ; F-31 077 Toulouse, France
(c) Universidad de Antioquia, Calle 67 Nº 53-108 bloque 21, Medellín, Colombia

Abstract

The diagnosis of processes can be defined as the identification of their functional states. When a mathematical model is difficult or impossible to obtain (which is generally the case for complex chemical processes), knowledge of the process behaviour can be extracted from historical measurements or complex simulators. This knowledge is then organized as a partition of the data set into classes representing the functional states of the process (normal or faulty operation). Among data mining techniques, those including fuzzy logic have the advantage of expressing the membership degree of the data to several classes. The main objective of this work is to propose a new method to optimize the fuzzy partition in terms of cluster compactness and separation. This method is applied to the propylene glycol production process simulated with Hysys.


Keywords: Diagnosis, fuzzy clustering, partition quality index, complex chemical process.

1. Introduction

The diagnosis of a complex process, in the absence of a precise mathematical model, can be developed from measurements recorded during previous normal and abnormal situations, which can be used to define the operational states through training mechanisms and expertise. Among data mining techniques, classification techniques make it possible to establish a model of the system states (behavioural model) by extracting knowledge from various attributes (raw data, statistical characteristics of a signal such as average and standard deviation, or even qualitative information) relating to a particular behaviour, without this behaviour being represented by a set of analytical relations. Changes in these characteristics enable the detection of abnormal operations [1].
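As an illustration of the statistical attributes mentioned above, the following minimal sketch computes the average and standard deviation of a signal over a sliding window. The function name `window_descriptors`, the window width and the sample readings are hypothetical choices made for this example; the paper does not specify how its descriptors were computed.

```python
from statistics import mean, pstdev


def window_descriptors(signal, width):
    """Return (mean, standard deviation) for each sliding window of `width` samples."""
    if width < 1 or width > len(signal):
        raise ValueError("width must be between 1 and len(signal)")
    descriptors = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        # pstdev is the population standard deviation of the window
        descriptors.append((mean(window), pstdev(window)))
    return descriptors


# Hypothetical readings with a step change near the end
readings = [20.0, 20.0, 20.0, 20.0, 26.0, 26.0]
for avg, std in window_descriptors(readings, width=3):
    print(f"mean={avg:.2f} std={std:.2f}")
```

In this toy signal the standard deviation stays at zero while the signal is constant and rises as soon as a window straddles the step, which is the kind of change in characteristics that a classifier can pick up.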


Among the large number of classification techniques, those including fuzzy logic have the advantage of expressing the membership degree of an observation (data) to several classes and are known to be able to model knowledge uncertainty and imprecision. In general, these methods exhibit similar and high performance if their respective initial parameters are adequately selected. To select these parameters correctly, different approaches for the validation and adaptation of the data space partition have been proposed [2][3]. To evaluate the partition quality, they generally use the geometrical cluster characteristics obtained by distance-based clustering algorithms [4].

Consequently, these approaches can be applied only to some specific classification techniques. The method proposed here automatically validates and adapts the data space partition obtained by any fuzzy classification technique. It is based on a new partition quality index (CV) depending only on an initial non-optimal fuzzy partition, through the membership degree matrix, and not on the data values. The proposed methodology has been applied to the propylene glycol production process [5] simulated on HYSYS.


In Section 2, systems diagnosis by classification methods is introduced. Section 3 describes the proposed approach, followed by the chemical process application in Section 4. Finally, the discussion and conclusions are presented.

2. Diagnosis using classification methods

Process monitoring using a classification method consists in determining, at each sample time, the current class, which has been associated beforehand with a functional state of the process. There are two principal phases: training and recognition. In the first step (training), the objective is to find the process behaviour characteristics which allow the process states to be differentiated (each one being associated with a class). The initial algorithm parameters are selected by the process expert, who validates the obtained behavioural model. In a subsequent step, data recognition makes it possible to identify the current process state on line. At each sampling time, a vector collects the accessible information (raw data or pre-treated data such as filtered data or FFT) which is provided for monitoring, and the class recognition procedure tells the operator the current functional state of the process.
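The recognition step can be sketched as follows, assuming that the fuzzy classifier has already produced one membership degree per class for the current sample and that the common maximum-membership rule is used to pick the state. The function name `recognise_state` and the membership values are hypothetical; the state names are borrowed from Table 1 for illustration only.

```python
def recognise_state(memberships, state_names):
    """Return (state, degree) for the class with the largest membership degree.

    memberships[k] is the membership degree of the current sample to class k,
    as computed by whichever fuzzy classifier was trained beforehand.
    """
    if len(memberships) != len(state_names):
        raise ValueError("one state name is needed per class")
    best = max(range(len(memberships)), key=lambda k: memberships[k])
    return state_names[best], memberships[best]


# Hypothetical membership degrees of one sample to three classes
state, degree = recognise_state([0.1, 0.7, 0.2], ["NORMAL", "H-OXYDE", "B-TCOOL"])
print(state, degree)
```

Returning the degree together with the state lets the operator see how strongly the sample belongs to the reported class.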

In order to optimize the obtained partition, we propose to include in the training phase a step to automatically validate and adjust the clusters. The proposed approach automatically improves a non-optimal initial partition in terms of compactness and class separation, thereby helping the discrimination between classes, i.e. between operation modes.

3. Fuzzy partition optimisation method

To improve a partition, it is first necessary to evaluate its quality. A new partition quality index (CV), which is the basis of the fuzzy partition improvement algorithm, is proposed. It includes a measure of the inter-class distance and of the partition dispersion. Within the diagnosis area, it is desirable to have a partition with compact and separate classes [3][6][7], but to avoid the trivial solution with one class per element, it is necessary to take into account the total number of classes. By seeking compact classes, the objective is to have clusters with high membership degrees for the elements similar to the class prototypes and small membership degrees in the opposite case. A good separation of classes facilitates the detection of abnormal process behaviour. A high number of classes generates an unnecessarily complex behavioural model. On the other hand, a behavioural model with a very low number of classes would be incomplete or imprecise. Consequently, a partition quality index must take into account three characteristics: compactness, separation and number of classes.

Most of the cited indexes need complementary information associated with distance-based clustering algorithms [4]. The objective here is to propose a partition quality index applicable to any fuzzy classification method.

In our approach, each class is interpreted as a fuzzy set defined by the membership degree of each data item to that class. This representation makes it possible to use fuzzy set theory to establish the partition characteristics without working directly with the data values or the geometrical structure of the classes.

To estimate the partition dispersion, different measures based on the membership degrees and the data averages have been proposed [6][7]. The "Interclass Contrast Index", Icc [7], generalises Fisher's dispersion matrix concept to fuzzy partitions. In this section a new partition quality index, closely related to the Icc index [7], is presented. Eq. (1) gives the expression of the Icc index:


$$ Icc = \frac{sbe}{N} \cdot D_{min} \cdot \sqrt{K} \tag{1} $$

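The dispersion measure $sbe_k$ associated with class $k$ is given by:

$$ sbe_k = \sum_{n=1}^{N} \mu_{kn} (m_{ke} - m)(m_{ke} - m)^T, \qquad m_{ke} = \frac{\sum_{n=1}^{N} \mu_{kn} \cdot x_n}{\sum_{n=1}^{N} \mu_{kn}} \quad \text{and} \quad m = \frac{1}{N} \cdot \sum_{n=1}^{N} x_n \tag{2} $$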

where $N$ is the total number of training data, $m_{ke}$ is the prototype of class $k$, $m$ is the data centre and $\mu_{kn}$ is the membership degree of data item $n$ (vector $x_n$) to class $k$. This approach explicitly uses the data values, which represents a high computational cost when there are many descriptors, i.e. process variables (common for complex processes). The minimal distance ($D_{min}$) is the smallest Euclidean distance between class centres ($m_{ke}$, Eq. (2)). The number of classes $K$ is then taken into account in order to avoid a high Icc value (corresponding to the best partition) when there is a large number of classes.
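The data-dependent quantities behind the Icc index can be sketched in a few lines, which also makes its cost visible: every prototype needs a pass over all N data vectors and all descriptors. The sketch below computes the class prototypes and the data centre as defined with Eq. (2), and the minimal Euclidean distance between prototypes. The function names and the toy data are hypothetical, and the scalar form of sbe used in Eq. (1) is deliberately left out because the text does not spell it out.

```python
from math import dist  # Python 3.8+


def fuzzy_prototypes(memberships, data):
    """Membership-weighted mean of the data for each class.

    memberships[k][n] is the degree of sample n in class k; data[n] is x_n.
    """
    dims = range(len(data[0]))
    prototypes = []
    for row in memberships:
        total = sum(row)  # assumed non-zero: every class has some membership
        prototypes.append(
            [sum(mu * x[d] for mu, x in zip(row, data)) / total for d in dims]
        )
    return prototypes


def data_centre(data):
    """Plain average of all data vectors."""
    return [sum(x[d] for x in data) / len(data) for d in range(len(data[0]))]


def min_prototype_distance(prototypes):
    """Smallest Euclidean distance between two distinct class prototypes."""
    return min(
        dist(p, q)
        for i, p in enumerate(prototypes)
        for q in prototypes[i + 1:]
    )


# Toy data: three samples with two descriptors, two classes
data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
memberships = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
prototypes = fuzzy_prototypes(memberships, data)
print(prototypes, data_centre(data), min_prototype_distance(prototypes))
```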

The proposed new partition quality index, the Clusters Validity Index (CV, Eq. (3)), has the same structure as the Icc, but the estimations of the partition dispersion ($Dis$) and of the minimal distance ($D^*_{min} = \min_{p,q \in [1,K],\, p \neq q} d^*(p,q)$) use expressions which do not depend explicitly on the data values.


$$ CV = \frac{Dis}{N} \cdot D^*_{min} \cdot \sqrt{K} \tag{3} $$

This index measures the partition quality in terms of class compactness and separation; the highest value of CV corresponds to the best partition.

The computation of the partition dispersion involves the definition of the information index $I_D$ [8], which measures the information degree of fuzzy sets; $I_D$ has a high value for fuzzy sets with strong membership degrees for data similar to the prototype and small ones in the opposite case. The complement of $I_D$ is an entropy measure $H_D(A)$. $H_D(A)$ establishes the similarity between a fuzzy set $A$ and the singleton (the most compact fuzzy set) and is used to evaluate the dispersion $Dis$ (Eq. (4)).

By using $I_D(A)$, the quality analysis of each fuzzy set is indirectly included. Thus the classes are better defined (more similar to the singleton) if the value of the dispersion is low. In order to estimate the minimal distance $D^*_{min}$, a new distance measure has been proposed to compute the distance between fuzzy sets without including the data values. Eq. (5) defines a distance index between two fuzzy subsets ($A$ and $B$) [8].

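$$ Dis = \sum_{k=1}^{K} H_D(k) = \sum_{k=1}^{K} \bigl(1 - I_D(k)\bigr) = \sum_{k=1}^{K} \left( 1 - \frac{\sum_{n=1}^{N} \delta_{kn} \exp(\delta_{kn})}{N \, \mu_{Mk} \exp(\mu_{Mk})} \right) \tag{4} $$

where $\delta_{kn} = \mu_{Mk} - \mu_{kn}$, $\mu_{Mk} = \max_{n \in [1,N]} \mu_{kn}$ and $K$ is the number of classes.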
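The dispersion can be computed from the membership matrix alone, as the sketch below illustrates. It assumes one particular reading of Eq. (4), namely that the information index of class k is the sum over n of delta_kn * exp(delta_kn), divided by N * mu_Mk * exp(mu_Mk), with delta_kn = mu_Mk - mu_kn and mu_Mk the largest membership degree in class k, and that Dis is the sum of the complements 1 - I_D(k). The function names are hypothetical, and the formula should be checked against [8] before being relied on.

```python
from math import exp


def information_index(mu):
    """I_D of one fuzzy set, under the reading of Eq. (4) stated above.

    mu[n] is the membership degree of sample n; the largest degree must be > 0.
    """
    mu_max = max(mu)
    deltas = [mu_max - m for m in mu]
    return sum(d * exp(d) for d in deltas) / (len(mu) * mu_max * exp(mu_max))


def dispersion(memberships):
    """Dis: sum over classes of the entropy H_D(k) = 1 - I_D(k)."""
    return sum(1.0 - information_index(row) for row in memberships)


# A near-singleton class is informative; a flat class carries no information.
print(information_index([1.0, 0.0, 0.0, 0.0]))  # close to 0.75
print(information_index([0.5, 0.5, 0.5, 0.5]))  # 0.0
```

Under this reading a flat fuzzy set contributes an entropy of 1, and the contribution shrinks towards 1/N as the set approaches a singleton.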

















$$ d^*(A,B) = 1 - \frac{M[A \cap B]}{M[A \cup B]} = 1 - \frac{\sum_{n=1}^{N} \min(\mu_{An}, \mu_{Bn})}{\sum_{n=1}^{N} \max(\mu_{An}, \mu_{Bn})} \tag{5} $$

$d^*(A,B)$ is an ultrametric measure on the set $P(X)$ of the fuzzy sets of the data $X$. Therefore, using the complement of $d^*(A,B)$, it is possible to obtain a measure of similarity $G(A,B)$ between fuzzy sets.

$$ G(A,B) = 1 - d^*(A,B) \tag{6} $$
This similarity index is used in the partition optimization algorithm to find the two most similar classes and to update the fuzzy partition matrix.

In the literature there are several methods to optimize the data space partition. They are based on a geometric representation of the classes (for distance-based clustering techniques [4]), on membership degree threshold overshoot, on entropy, or on restarting the training algorithm.

The proposed approach is more general. The algorithm includes two principal steps: the partition quality analysis and the clusters update. At each iteration, the partition quality is measured by the proposed validation index (CV, Eq. (3)). For the clusters update, the fuzzy similarity of classes ($G(A,B)$, Eq. (6)) is calculated and the two most similar classes are merged (at each iteration) using the maximum as S-norm.

At each iteration, the algorithm merges the two most similar classes, updates the fuzzy partition matrix and continues until a partition with only two classes is reached. The best partition is the one with the maximum CV value. Unlike the optimization algorithm proposed in [9], the new methodology performs an exhaustive and ordered search for the global maximum, which avoids falling into local maxima.
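The merging loop described above can be sketched as follows. The loop itself follows the text: find the two most similar classes, merge them with the maximum S-norm, score the new partition, and stop at two classes. The quality function is passed as a parameter because the `cluster_validity` shown here rests on assumptions: CV taken as (Dis / N) * D*min * sqrt(K), with Dis as the sum of 1 - I_D(k). All function names and the toy membership matrix are hypothetical.

```python
from math import exp, sqrt


def similarity(a, b):
    """G(A, B): sum of minima over sum of maxima of the membership degrees."""
    return sum(map(min, a, b)) / sum(map(max, a, b))


def cluster_validity(partition):
    """CV = (Dis / N) * D*min * sqrt(K), an assumed reading of Eqs. (3)-(5)."""
    k, n = len(partition), len(partition[0])
    dis = 0.0
    for row in partition:
        top = max(row)  # must be > 0
        info = sum((top - mu) * exp(top - mu) for mu in row) / (n * top * exp(top))
        dis += 1.0 - info
    d_min = min(
        1.0 - similarity(partition[p], partition[q])
        for p in range(k)
        for q in range(p + 1, k)
    )
    return dis / n * d_min * sqrt(k)


def optimise_partition(partition, quality=cluster_validity):
    """Merge the two most similar classes until only two remain.

    Returns every visited partition, its quality score, and the index of the
    best-scoring one (index 0 is the initial partition).
    """
    current = [list(row) for row in partition]
    partitions, scores = [current], [quality(current)]
    while len(current) > 2:
        k = len(current)
        pairs = [(p, q) for p in range(k) for q in range(p + 1, k)]
        p, q = max(pairs, key=lambda pq: similarity(current[pq[0]], current[pq[1]]))
        # Maximum S-norm: element-wise max of the two membership rows
        merged = [max(x, y) for x, y in zip(current[p], current[q])]
        current = [row for i, row in enumerate(current) if i not in (p, q)] + [merged]
        partitions.append(current)
        scores.append(quality(current))
    best = max(range(len(scores)), key=scores.__getitem__)
    return partitions, scores, best


# Toy membership matrix: 3 classes x 4 samples, classes 0 and 1 nearly identical
initial = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.1, 0.1],
    [0.0, 0.1, 0.9, 0.9],
]
partitions, scores, best = optimise_partition(initial)
print(len(partitions[best]), "classes selected; scores:", scores)
```

Because every partition between the initial one and the two-class one is scored, the best index is a global maximum over the visited sequence rather than the first local improvement.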

The proposed methodology depends only on the properties of an initial non-optimal fuzzy partition (matrix of the membership degrees) and not on the data values, and is applicable to all fuzzy classification methods.

4. Application to the propylene glycol production

The proposed methodology has been applied to the complete propylene glycol production process (Figure 1), including several unit operations: mixing, chemical reaction ($C_3H_6O + H_2O \rightarrow C_3H_8O_2$) and separation. This process has been simulated with the Hysys package and was reported in a previous study dealing with the development of a methodology for sensor selection for diagnosis purposes [5]. That study relies on a first classification of faults with all the possible descriptors before reducing them to a minimum set of pertinent ones. The objective of applying the proposed methodology is to optimize the initial partition so as to give the process expert a simpler partition (with few states, without loss of precision), which leads to a more understandable behavioural model.




Figure 1. Propylene glycol process scheme [5].

The unsupervised fuzzy classification technique adopted in this study is LAMDA (Learning Algorithm for Multivariate Data Analysis) [10], but it could have been replaced by any other fuzzy classification method.

The classification partition optimization method has been applied to the same data set used by [5]. Faults/dysfunctions at different points of the process have been simulated. Increasing and decreasing changes around their nominal values have been applied to the Propylene Oxide main feed flow rate (1. Oxyde, 2. Oxyde), the inlet cooling fluid temperature at the reactor (3. TinletCool, 4. TinletCool), the inlet cooling fluid at the distillation condenser (5. TinletCond, 6. TinletCond) and the reaction rate through the frequency factor of the kinetic law (7. Freq., 8. Freq.). This dynamic simulation run yielded 6337 measurements, which constitute the set of individuals to be classified.

The partition optimisation method has been applied to this set, which contains the potential sensors (before the sensor selection) [5] (it should be noted that no concentration measurement was considered among the potential sensors). Figure 2 gives the time evolution of these sensors and the initial classification obtained with LAMDA (exhibiting 36 classes).



Figure 2. Initial classification - 21 descriptors (panels: process variables; initial classification).

Figure 3 gives the evolution of the CV quality index with iterations.

[Plot: CV on the vertical axis (ticks from 0.00008 to 0.00033) against Iteration on the horizontal axis (ticks from 1 to 34).]

Figure 3. Partitions quality index CV - 21 descriptors.


The best partition is obtained at iteration 18, so the optimal partition is composed of 19 classes, as presented in Figure 4.
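The link between the iteration number and the number of classes can be checked with a short calculation. The sketch below assumes that iteration 1 denotes the initial 36-class partition and that each later iteration merges exactly one pair of classes, which is the reading consistent with 19 classes at iteration 18. The function name is hypothetical.

```python
def classes_at_iteration(initial_classes, iteration):
    """Number of classes at a given iteration.

    Assumes iteration 1 is the initial partition and that every later
    iteration removes one class by merging a pair.
    """
    last_iteration = initial_classes - 1  # the iteration that reaches two classes
    if not 1 <= iteration <= last_iteration:
        raise ValueError("iteration outside the range covered by the algorithm")
    return initial_classes - (iteration - 1)


print(classes_at_iteration(36, 18))  # 19
```

Under the same assumption the search ends at iteration 35, when only two classes are left.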

The partition optimisation method merges classes to obtain a simpler partition, but it also preserves the small ones, which may correspond to transition states such as "pre-fault", alarm or drift states. It is very important to keep a model with those specific "pre-fault" states, since early detection of a fault is crucial to trigger a preventive action or maintenance. To compare the results obtained, the reference partition proposed by the expert was used. This partition has 25 classes. After renaming the classes, the relation between the states (faulty and normal) and the automatically obtained partition is presented, together with the relationship to the reference partition.

Figure 4. Optimal classification - 21 descriptors (axes: Individual, Class).



| Reference class | Automatic class | State | Reference class | Automatic class | State |
|---|---|---|---|---|---|
| 1 | 1 | B-TCOND | 13 | NON ELEMENTS | |
| 2 | 2 | H-OXYDE | 14 | 15 | AL_H-OXYDE |
| 3 | 3 | REC_H-OXY_N | 15 | 16 | AL_B-OXYDE |
| 4 | 4 | B-OXYDE | 16 | 17 | SI1 |
| 5 | 7 | REC_B-OXY_N | 17 | 18 | SI2 |
| 6 | 8 | B-TCOOL | 18 | 1 | SI3 |
| 7 | 9 | H-TCOOL | 19 | 13 | SI4 |
| 8 | 10 | REC-HAUSSE | 20 | 20 | SI5 |
| 9 | 11 | H-TCOND | 21 | 21 | SI6 |
| 10 | 12 | NORMAL | 22 | 23 | SI7 |
| 11 | 13 | H-FREQV | 23 | 24 | SI8 |
| 12 | 14 | B-FREQV | | | |

Tab. 1. Classes/States Association - 21 descriptors.

The similarity between the optimal partition and the reference partition is calculated using the normalized similitude index [10]; the value is 0.0131, indicating the high compatibility between the two partitions. This method yields a simple partition for identifying the faulty states and the normal state, taking into account the class separation and dispersion criteria, and gives the process expert important help in establishing the functional states when implementing a monitoring technique for a complex process. The optimal partition can be associated more directly with states than the initial partition.

5. Conclusions

A methodology for fuzzy partition optimization which is independent of the classification method has been proposed. The method is useful when there is no geometrical representation of the clusters. The approach is considered as a complement to classification methods and is useful for the identification of faults in complex systems.

In this first approach only one type of S-norm and T-norm (min-max) has been used; a study of the influence of the type of S-norm and T-norm is necessary, because these operations influence the computations of the method.

References

[1] S. Gentil et al., Supervision des Procédés Complexes, Lavoisier, 2007.
[2] X. Wang, V. Syrmos, Optimal Cluster Selection Based on Fisher Class Measure. ACC, 2005.
[3] U. Kaymak, M. Setnes, IEEE Transactions on Fuzzy Systems, Vol. 10 No. 6, 2002.
[4] R. Krishnapuram, J. Kim, IEEE Trans. Fuzzy Syst., vol. 8, pp. 228-236, Apr. 2000.
[5] A. Orantes et al., A new support methodology for the placement of sensors used for fault detection and diagnosis, Chemical Engineering Process, Elsevier, In Press, available on line doi:10.1016/j.cep.2007.01.024
[6] L. Xie, B. Xuanli, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 13 No 8, 1991.
[7] C. Franco et al., A Validity Measure for Hard and Fuzzy clustering derived from Fisher's Linear Discriminant. International Conference on Fuzzy Systems, 2002.
[8] C. Isaza et al., Decision Method for Functional States Validation in a Drinking Water Plant, 10th Computer Application in Biotechnology (CAB), IFAC, 2007.
[9] C. Isaza et al., Artificial Intelligence Research and Development, 2006.
[10] R. López, Auto apprentissage d'une partition : Application au classement itératif de données multidimensionnels. PhD thesis, UPS de Toulouse, 1977.