
8 Nov 2013


Fuzzy Bi-cluster Regression

張炳騰

Professor

Department of Industrial Engineering and Enterprise Information, Tunghai University


Outline

Fuzzy Regression and its Methods

Fuzzy Clustering

Concept and Solution Approach of Fuzzy Bi-cluster Regression

Application to Heat Tolerance Estimation of Plants

Fuzzy c-regression Models

Fuzzy Sets


Fuzzy sets were first introduced by Zadeh [50]. A membership function can be defined for all the elements $a$ in a referential set $U$.

A fuzzy set $A$ can be defined by its membership function $\mu_A(a)$, which takes values in [0, 1].

A fuzzy set $A$ is said to be convex if all the ordinary (level) subsets of $A$ are convex. A fuzzy set is said to be normalized if there exists $a \in U$ such that $\mu_A(a) = 1$.

Fuzzy Numbers


Moreover, a fuzzy number (FN) can be defined as a convex normalized fuzzy set on the real line $\mathbb{R}$ with an upper semi-continuous membership function and bounded support.

Definition A.1 (Bound form representation of FNs). A symmetrical FN may be written as $A = (\alpha - c, \alpha, \alpha + c)_{LR}$, where $\alpha - c$ and $\alpha + c$ represent the lower and upper bounds, $\alpha$ the mode, mean, or center value, and $L$, $R$ the left and right reference (or shape) functions, respectively, of $A$. The interval $(\alpha - c, \alpha + c)$ is the support of $A$. In this case, $A$ has the membership function

$$
\mu_A(a) =
\begin{cases}
0, & a \le \alpha - c, \\
L((\alpha - a)/c), & \alpha - c \le a \le \alpha, \\
R((a - \alpha)/c), & \alpha \le a \le \alpha + c, \\
0, & a \ge \alpha + c.
\end{cases}
$$

Likewise, non-symmetrical FNs can also be defined. In particular, a symmetrical triangular FN may be defined with

$$
\mu_A(a) =
\begin{cases}
0, & a \le \alpha - c, \\
1 - (\alpha - a)/c, & \alpha - c \le a \le \alpha, \\
1 - (a - \alpha)/c, & \alpha \le a \le \alpha + c, \\
0, & a \ge \alpha + c,
\end{cases}
$$

and denoted as $A = (\alpha - c, \alpha, \alpha + c)_T$.

[Figure: triangular membership function $\mu_A(a)$, rising from 0 at $a = \alpha - c$ to $\mu_A(\alpha) = 1$ and falling back to 0 at $a = \alpha + c$.]
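As a small illustration of the triangular case, the sketch below evaluates $\mu_A(a) = \max(0, 1 - |a - \alpha|/c)$, which is the triangular membership function written in one line. The function name `triangular_membership` and the sample values (mode 5, spread 2) are assumptions made for illustration and do not come from the slides.

```python
def triangular_membership(a: float, alpha: float, c: float) -> float:
    """Membership grade of a in the symmetric triangular FN (alpha - c, alpha, alpha + c)_T."""
    if c <= 0:
        raise ValueError("spread c must be positive")
    # Same as the reference function L(x) = max(0, 1 - |x|) at x = (a - alpha) / c.
    return max(0.0, 1.0 - abs(a - alpha) / c)


# Hypothetical example: mode 5 and spread 2, so the support is (3, 7).
for a in (3.0, 4.0, 5.0, 6.5, 8.0):
    print(a, triangular_membership(a, alpha=5.0, c=2.0))
```

Values outside the support return 0, and the grade is 1 only at the mode.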


For an interval or level of confidence, also termed a $\lambda$-cut or $\lambda$-level set, at level $\lambda \in (0, 1]$, an ordinary subset of $A$ can be defined and denoted as $[A]_\lambda$:

$$
[A]_\lambda = \{\, a \in U \mid \mu_A(a) \ge \lambda \,\}, \quad \lambda \in (0, 1] \tag{A1}
$$

Another representation of FNs also exists and may be utilized.
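For a symmetric triangular FN with mode $\alpha$ and spread $c$, the level set in Eq. (A1) has a closed form: writing the level as $\lambda$, the condition $1 - |a - \alpha|/c \ge \lambda$ is equivalent to $|a - \alpha| \le (1 - \lambda)c$. The sketch below computes that interval; the function name `level_set` and the sample numbers are illustrative assumptions.

```python
def level_set(alpha: float, c: float, lam: float) -> tuple:
    """Level set [A]_lam of a symmetric triangular FN with mode alpha and spread c."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    half_width = (1.0 - lam) * c  # shrinks linearly to 0 as lam approaches 1
    return (alpha - half_width, alpha + half_width)


# Hypothetical example: mode 5, spread 2.
print(level_set(5.0, 2.0, 0.5))
print(level_set(5.0, 2.0, 1.0))
```

At level 1 the interval collapses to the mode, and lower levels widen it towards the support.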


Definition A.2 (Spread form representation of FNs). A symmetric FN can also be written as $A = (\alpha, c)_L$ with the membership function defined as $\mu_A(a) = L((a - \alpha)/c)$, $\forall a \in \mathbb{R}$, where $\alpha$ is the centre or mode value of $A$ and $c$ is the spread or radius (around the mode) of $A$. The reference function $L(x)$ has the properties: (i) $L(x) = L(-x)$, (ii) $L(0) = 1$, $L(1) = 0$, (iii) $L$ is decreasing on $[0, \infty)$, and (iv) $L$ is invertible on [0, 1].

[Figure: symmetric membership function $\mu_A(a)$ with peak $\mu_A(\alpha) = 1$ and spread $c$ on either side of the mode $\alpha$.]

Fuzzy Linear Regression (FLR)

Fuzzy regression (FR) analysis is a methodology that gives rise to a possibility distribution for an imprecise or vague phenomenon, expressed through fuzzy parameters (coefficients).

FR can represent the accrued data without losing their original meaning and can analyze the trends of both the variability and the mean in the data.

Kim et al. [31] also noted that classical regression analysis makes rigid assumptions about statistical properties, such as normality of the error terms and of the predictions, and random measurement errors in the recorded observations.

FR may relax these rigid assumptions. FR analysis was first introduced by Tanaka et al. [45].


The fuzzy linear regression analysis reviewed here was first introduced by Tanaka et al. [45] (also see [47], [44]); the approach reviewed below is the version modified by Chang and Lee [5] (also see [8]).

Consider the general linear function $Y^* = f(\mathbf{x}, \mathbf{A})$, where $\mathbf{x} = (1, x_1, \ldots, x_n)^{\mathrm{T}}$ represents the vector of non-fuzzy (crisp) inputs, $\mathbf{A} = (A_0, A_1, \ldots, A_n)^{\mathrm{T}}$ represents the vector of fuzzy parameters, and T stands for transpose.

The fuzzy parameters $A_k$ can be defined as symmetrical triangular FNs, $A_k = (\alpha_k, c_k)_L$, $k = 0, 1, \ldots, n$, with the membership function

$$
\mu_{A_k}(a_k) =
\begin{cases}
1 - \dfrac{|\alpha_k - a_k|}{c_k}, & \text{if } \alpha_k - c_k \le a_k \le \alpha_k + c_k, \\
0, & \text{otherwise.}
\end{cases}
$$

[Figure: triangular membership function of $A_k$, with $\mu_{A_k}(a_k) = 1$ at $a_k = \alpha_k$ and support from $\alpha_k - c_k$ to $\alpha_k + c_k$.]

In vector notation, the parameters $\mathbf{A} = (A_0, A_1, \ldots, A_n)^{\mathrm{T}}$ can also be written as $\mathbf{A} = (\boldsymbol{\alpha}, \mathbf{c})$, where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_n)^{\mathrm{T}}$ and $\mathbf{c} = (c_0, c_1, \ldots, c_n)^{\mathrm{T}}$.


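Therefore, the estimated $Y^*$ of the function can be obtained by using the Zadeh extension principle or fuzzy arithmetic (e.g., see [14] and [29]), which results in the membership function

$$
\mu_{Y^*}(y) =
\begin{cases}
1 - \dfrac{|y - \boldsymbol{\alpha}^{\mathrm{T}} \mathbf{x}|}{\mathbf{c}^{\mathrm{T}} |\mathbf{x}|}, & \mathbf{x} \ne \mathbf{0}, \\
1, & \mathbf{x} = \mathbf{0},\ y = 0, \\
0, & \mathbf{x} = \mathbf{0},\ y \ne 0,
\end{cases}
$$

where $|\mathbf{x}| = (1, |x_1|, \ldots, |x_n|)^{\mathrm{T}}$. The mode of $Y^*$ is $\boldsymbol{\alpha}^{\mathrm{T}} \mathbf{x}$ and the spread of $Y^*$ is $\mathbf{c}^{\mathrm{T}} |\mathbf{x}|$.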
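The mode $\boldsymbol{\alpha}^{\mathrm{T}} \mathbf{x}$ and spread $\mathbf{c}^{\mathrm{T}} |\mathbf{x}|$ of the estimated output can be computed directly. The following is a minimal sketch under stated assumptions: the function names, the clipping of negative grades to 0, the handling of a zero spread, and the sample parameters are all illustrative choices that are not given in the slides.

```python
def flr_output(alpha, c, x):
    """Return (mode, spread) of Y* for crisp inputs x = (x_1, ..., x_n).

    alpha and c hold n + 1 entries; the leading 1 for the intercept is added here.
    """
    xs = [1.0] + list(x)
    if not (len(alpha) == len(c) == len(xs)):
        raise ValueError("alpha and c need one entry per input plus the intercept")
    mode = sum(a_k * x_k for a_k, x_k in zip(alpha, xs))
    spread = sum(c_k * abs(x_k) for c_k, x_k in zip(c, xs))
    return mode, spread


def output_membership(y, alpha, c, x):
    """Triangular membership grade of y in the estimated output Y*."""
    mode, spread = flr_output(alpha, c, x)
    if spread == 0:
        # Assumption: a zero spread is treated as a crisp output.
        return 1.0 if y == mode else 0.0
    return max(0.0, 1.0 - abs(y - mode) / spread)


# Hypothetical parameters: A_0 = (1, 0.5)_L, A_1 = (2, 0.25)_L, input x_1 = -2.
print(flr_output([1.0, 2.0], [0.5, 0.25], [-2.0]))
print(output_membership(-3.5, [1.0, 2.0], [0.5, 0.25], [-2.0]))
```

The absolute value in the spread keeps the output fuzziness non-negative for negative inputs when the parameter spreads themselves are non-negative.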


Suppose a set of crisp-input, fuzzy-output data $(Y_i, \mathbf{x}_i)$, $i = 1, \ldots, N$, is available, with $Y_i = (y_i, e_i)_L$.

To obtain the fuzzy parameters $\mathbf{A}$, the condition that $[Y_i]_H \subseteq [Y_i^*]_H$ holds for all $i$ may be imposed upon the FLR, where $Y_i^*$ stands for the fuzzy estimate of $Y_i$ and $H \in [0, 1)$ is a threshold level chosen by the decision-maker.

In fuzzy regression analysis, $H$ can be regarded as a measure of goodness of fit for the model. For instance, if $H = 0$, the estimated data totally enclose the observed data.

[Figure: membership functions of an observed output $Y_i$ and its estimate $Y_i^*$ along the $y$ axis.]

The objective function of the fuzzy regression can be defined as the minimization of the total fuzziness (spreads) of the estimated outputs.

This problem can be formulated as a linear programming (LP) problem ([5], [45]) as:

$$
\begin{aligned}
\text{Minimize } \quad & J = N c_0 + \sum_{i=1}^{N} \left( c_1 |x_{1i}| + \cdots + c_n |x_{ni}| \right) \\
\text{subject to } \quad & \text{all } c_k \in \mathbb{R} \text{ and, for } i = 1, \ldots, N, \\
& \boldsymbol{\alpha}^{\mathrm{T}} \mathbf{x}_i + (1 - H)\, \mathbf{c}^{\mathrm{T}} |\mathbf{x}_i| \ge y_i + (1 - H)\, e_i, \\
& \boldsymbol{\alpha}^{\mathrm{T}} \mathbf{x}_i - (1 - H)\, \mathbf{c}^{\mathrm{T}} |\mathbf{x}_i| \le y_i - (1 - H)\, e_i.
\end{aligned}
$$

The solution gives the FLR model:

$$
Y^* = (\alpha_0, c_0)_L + (\alpha_1, c_1)_L\, x_1 + \cdots + (\alpha_n, c_n)_L\, x_n = (\boldsymbol{\alpha}^{\mathrm{T}} \mathbf{x},\ \mathbf{c}^{\mathrm{T}} |\mathbf{x}|)_L .
$$
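To make the LP constraints concrete, the sketch below checks whether a candidate parameter set $(\boldsymbol{\alpha}, \mathbf{c})$ satisfies the two $H$-level inequalities for every observation and evaluates the total spread $J$. It does not solve the LP; it only verifies a given candidate. The function names, the tolerance, and the three data rows are hypothetical.

```python
def total_spread(c, rows):
    """J = sum over observations of c^T |x_i|, with x_i = (1, x_1i, ..., x_ni)."""
    return sum(
        sum(c_k * abs(x_k) for c_k, x_k in zip(c, [1.0] + list(x)))
        for x, _, _ in rows
    )


def satisfies_inclusion(alpha, c, rows, H, tol=1e-9):
    """True if both H-level inequalities hold for every row (x, y, e)."""
    for x, y, e in rows:
        xs = [1.0] + list(x)
        mode = sum(a_k * x_k for a_k, x_k in zip(alpha, xs))
        spread = sum(c_k * abs(x_k) for c_k, x_k in zip(c, xs))
        upper_ok = mode + (1.0 - H) * spread >= y + (1.0 - H) * e - tol
        lower_ok = mode - (1.0 - H) * spread <= y - (1.0 - H) * e + tol
        if not (upper_ok and lower_ok):
            return False
    return True


# Hypothetical data: one input, fuzzy outputs (y_i, e_i)_L.
rows = [([1.0], 2.0, 0.2), ([2.0], 4.1, 0.2), ([3.0], 5.9, 0.2)]
print(satisfies_inclusion([0.0, 2.0], [0.5, 0.0], rows, H=0.5))
print(satisfies_inclusion([0.0, 2.0], [0.3, 0.0], rows, H=0.5))
print(total_spread([0.5, 0.0], rows))
```

A candidate with a smaller spread lowers $J$ but may violate the inclusion condition, which is the trade-off the LP resolves.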

Other FR Methods Tested

1) Savic-Pedrycz Fuzzy Least-squares Regression (FLSR)

Savic and Pedrycz [41] modified Tanaka et al.'s FLR method [45], [47], [44] by using the classic least-squares concept. The approach is performed in two steps.

First, a conventional least-squares regression is fitted to the modes of the fuzzy data, which gives the vector of modes $\boldsymbol{\alpha}^*$ for the fuzzy regression parameters.

Then, the second step is performed exactly as in Tanaka et al.'s FLR method ([45], [47], [44]), with the fuzzy parameter modes $\boldsymbol{\alpha}^*$ now known.
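The two steps can be sketched for a single crisp input. With the modes fixed, the two $H$-level inequalities collapse to $c_0 + c_1 |x_i| \ge e_i + |y_i - (\alpha_0^* + \alpha_1^* x_i)| / (1 - H)$, leaving a two-variable LP in the spreads. This sketch assumes non-negative spreads (so it is not the spread-relaxed variant) and solves the small LP by enumerating vertices rather than calling an LP library; the function names and data are hypothetical.

```python
from itertools import combinations


def ols_line(xs, ys):
    """Step 1: ordinary least squares on the modes, returning (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def min_spreads(xs, ys, es, a0, a1, H):
    """Step 2: minimise N*c0 + c1*sum|x_i| with the modes (a0, a1) held fixed."""
    need = [e + abs(y - (a0 + a1 * x)) / (1.0 - H) for x, y, e in zip(xs, ys, es)]
    # Each constraint is stored as (p, q, r), meaning p*c0 + q*c1 >= r.
    cons = [(1.0, abs(x), r) for x, r in zip(xs, need)]
    cons += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # assumed: c0 >= 0 and c1 >= 0
    cost0, cost1 = float(len(xs)), sum(abs(x) for x in xs)
    best = None
    # This bounded two-variable LP attains its optimum at a vertex, so try every
    # pair of constraint boundaries and keep the cheapest feasible intersection.
    for (p1, q1, r1), (p2, q2, r2) in combinations(cons, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:
            continue  # parallel boundaries give no vertex
        c0 = (r1 * q2 - r2 * q1) / det
        c1 = (p1 * r2 - p2 * r1) / det
        if all(p * c0 + q * c1 >= r - 1e-9 for p, q, r in cons):
            J = cost0 * c0 + cost1 * c1
            if best is None or J < best[0]:
                best = (J, c0, c1)
    return best


# Hypothetical fuzzy data (y_i, e_i)_L at four crisp inputs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
es = [0.2, 0.3, 0.2, 0.4]
a0, a1 = ols_line(xs, ys)
J, c0, c1 = min_spreads(xs, ys, es, a0, a1, H=0.5)
print(f"modes: {a0:.3f} + {a1:.3f} x; spreads: c0={c0:.3f}, c1={c1:.3f}; J={J:.3f}")
```

For more than one input, or for spreads unconstrained in sign, a general LP solver would be the natural replacement for the vertex enumeration.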


In addition, a system may have inconsistent or conflicting trends of mode and spread in an FR analysis; if this is not properly treated, an FR method may predict incorrect trends.

To overcome this problem, Chang and Lee [5], [8] proposed allowing the FR parameter spreads to be unconstrained in sign (i.e., all $c_k \in \mathbb{R}$, as in Tanaka et al.'s FLR method as modified by Chang and Lee, (B5)).

Likewise, the Savic-Pedrycz FLSR may be extended by this idea; the result, called the Spread-relaxed Savic-Pedrycz FLSR, is also tested in this paper.


2) Chang-Lee FLSR Approach

Chang and Lee [6] also proposed a least-squares regression for the FR analysis.

Likewise, this fuzzy least-squares regression method can be extended by Chang and Lee's concept ([5], [8]) of allowing the FR parameter spreads ($c_k$) to be unconstrained in sign.

The extension can be called the Spread-relaxed Chang-Lee FLSR and is also tested in this paper.

Data clustering

Data clustering concerns the structure of a data set and partitions the data set into a finite number of subsets according to the correlation among the data.

Clustering analysis has been used widely in many fields such as taxonomy, geology, business, engineering systems, medicine and image processing (e.g., see [2], [49], [24], [34], [46]).

Among the various clustering/partitioning approaches, the hard c-means (k-means), the fuzzy c-means (FCM) and its variants, artificial neural networks, etc. have become popular (e.g., see [2], [27], [26], [49], [52]).

Fuzzy clustering concept

In particular, by the FCM and fuzzy clustering concept, the membership functions of multiple clusters may be defined based on a distance function, and the degrees of membership may express the proximity of a datum to the multiple cluster centers.

$$
\sum_{j=1}^{c} \mu_j(i) = 1 \quad \forall i .
$$

For example, suppose $c = 3$; then for datum $i = 1$, $\mu_1(i) = 0.2$, $\mu_2(i) = 0.7$, $\mu_3(i) = 0.1$.
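One common way to obtain distance-based memberships that sum to 1 is the standard FCM update $\mu_j = 1 / \sum_k (d_j / d_k)^{2/(m-1)}$, where $d_j$ is the distance from a datum to cluster centre $j$ and $m > 1$ is the fuzzifier. The slides do not state which formula is used, so this choice, the function name, and the sample distances are assumptions made for illustration.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM membership grades of one datum from its distances to the centres."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must exceed 1")
    if any(d == 0 for d in distances):
        # The datum coincides with at least one centre: share full membership there.
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in distances) for d_j in distances]


# Hypothetical distances from one datum to c = 3 cluster centres.
grades = fcm_memberships([1.0, 2.0, 4.0])
print(grades, sum(grades))
```

The nearest centre receives the largest grade, and the grades always add up to 1.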

Concept of Fuzzy Bi-cluster Regression (FBCR)

Simultaneously group the data into two clusters and fit fuzzy regression models to both clusters.

Thereby find the fuzzy threshold (set or interval) for the critical change in the pattern of the data course.


The FBCR simultaneously and optimally determines the two FLRs, jointly with a fuzzy clustering of the observed data to the two FR lines.

[Figure: FLR line 1 and FLR line 2 in the $(x, y)$ plane, with the fuzzy intersection set where the two lines meet.]


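Fuzzy clustering developed for the FR analyses in the FBCR

Proposition 1. Let $\mu_G^j(i) \in [0, 1]$ denote the clustering membership grades of the observed data $(Y_i, x_i)$ to the fuzzy clusters $j$, $j = 1, 2$ and $i = 1, \ldots, N$, with $\sum_j \mu_G^j(i) = 1$ $\forall i$. Eq. (1) may classify the data for the clusters of the two FLRs based on $\mu_G^j(i)$:

$$
\begin{aligned}
(Y_i, x_i) &\in \text{clusters 1 and 2}, && \text{if } \mu_G^1(i) \in T, && \text{(soft classification)} && \text{(1a)} \\
(Y_i, x_i) &\in \text{cluster 1}, && \text{if } \mu_G^1(i) > T_U, && \text{(hard classification)} && \text{(1b)} \\
(Y_i, x_i) &\in \text{cluster 2}, && \text{if } \mu_G^1(i) < T_L, && \text{(hard classification)} && \text{(1c)}
\end{aligned}
$$

with $T_L$ and $T_U$ the lower and upper bounds of the threshold interval $T$.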
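The soft and hard classification rule can be sketched as a small function of the membership grade in cluster 1. The relations used here (above the upper threshold means cluster 1 only, below the lower threshold means cluster 2 only, otherwise both) are an assumed reading of Eq. (1), and the threshold values 0.3 and 0.7 are purely hypothetical.

```python
def classify(mu1: float, t_lower: float, t_upper: float) -> set:
    """Clusters a datum contributes to, given its grade mu1 in cluster 1."""
    if not 0.0 <= mu1 <= 1.0:
        raise ValueError("membership grade must lie in [0, 1]")
    if mu1 > t_upper:
        return {1}       # hard classification into cluster 1
    if mu1 < t_lower:
        return {2}       # hard classification into cluster 2
    return {1, 2}        # soft classification: the datum is shared


# Hypothetical grades for four observations and thresholds T = [0.3, 0.7].
for mu1 in (0.9, 0.6, 0.4, 0.1):
    print(mu1, classify(mu1, t_lower=0.3, t_upper=0.7))
```

Shared data are what allow the two fitted lines to overlap near the change point.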

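FLR analyses of the FBCR

Weighted FLRs

1) FLR for fuzzy cluster 1 (or FLR line 1):

$$
\begin{aligned}
\text{Min } \quad & \sum_{i \in \text{cluster } 1} \mu_G^1(i)\, c_{10} + \sum_{i \in \text{cluster } 1} \mu_G^1(i)\, c_{11} |x_i| \\
\text{subject to } \quad & c_{10}, c_{11} \in \mathbb{R}, \ \forall i \in \text{cluster } 1, \\
& \alpha_{10} + \alpha_{11} x_i + (1 - H)\,(c_{10} + c_{11} |x_i|) \ge y_i + (1 - H)\, e_i, \\
& \alpha_{10} + \alpha_{11} x_i - (1 - H)\,(c_{10} + c_{11} |x_i|) \le y_i - (1 - H)\, e_i.
\end{aligned}
\tag{2}
$$

2) FLR for fuzzy cluster 2 (or FLR line 2):

$$
\begin{aligned}
\text{Min } \quad & \sum_{i \in \text{cluster } 2} \mu_G^2(i)\, c_{20} + \sum_{i \in \text{cluster } 2} \mu_G^2(i)\, c_{21} |x_i| \\
\text{subject to } \quad & c_{20}, c_{21} \in \mathbb{R}, \ \forall i \in \text{cluster } 2, \\
& \alpha_{20} + \alpha_{21} x_i + (1 - H)\,(c_{20} + c_{21} |x_i|) \ge y_i + (1 - H)\, e_i, \\
& \alpha_{20} + \alpha_{21} x_i - (1 - H)\,(c_{20} + c_{21} |x_i|) \le y_i - (1 - H)\, e_i.
\end{aligned}
\tag{3}
$$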

The overall objective function of the FBCR

As aforementioned, the FBCR imposes the important characteristic that both the fuzzy clustering and the FLRs are concurrently optimized.

$$
\begin{aligned}
\text{Min } \quad & F = \sum_{j=1}^{2} \left[ \sum_{i \in \text{cluster } j} \mu_G^j(i)\, c_{j0} + \sum_{i \in \text{cluster } j} \mu_G^j(i)\, c_{j1} |x_i| \right] \\
\text{s.t. } \quad & \mu_G^1(i) + \mu_G^2(i) = 1, \quad \mu_G^j(i) \in [0, 1], \quad \forall i = 1, \ldots, N \text{ and } j = 1, 2.
\end{aligned}
\tag{4a}
$$

This objective is responsible for optimizing the fuzzy-clustering memberships and determines the optimal segregation (Eqs. (1b)-(1c)) and overlapping (Eq. (1a)) of the data for the FRs.
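The overall objective is a membership-weighted sum of the output spreads over both clusters. The sketch below evaluates it for given memberships and spread parameters, assuming a single input, the threshold-based cluster assignment described for Eq. (1), and $\mu_G^2(i) = 1 - \mu_G^1(i)$. All names, thresholds, and numbers are hypothetical.

```python
def overall_objective(xs, mu1, spreads, t_lower, t_upper):
    """F = sum_j sum_{i in cluster j} mu_j(i) * (c_j0 + c_j1 * |x_i|).

    spreads maps the cluster index to its spread pair, e.g. {1: (c10, c11), 2: (c20, c21)}.
    """
    total = 0.0
    for x, m1 in zip(xs, mu1):
        grades = {1: m1, 2: 1.0 - m1}  # memberships sum to 1 for every datum
        if m1 > t_upper:
            members = (1,)      # hard: cluster 1 only
        elif m1 < t_lower:
            members = (2,)      # hard: cluster 2 only
        else:
            members = (1, 2)    # soft: counted in both clusters
        for j in members:
            c_j0, c_j1 = spreads[j]
            total += grades[j] * (c_j0 + c_j1 * abs(x))
    return total


# Hypothetical inputs, memberships in cluster 1, and spread parameters.
xs = [1.0, 2.0, 3.0, 4.0]
mu1 = [0.9, 0.6, 0.4, 0.1]
F = overall_objective(xs, mu1, {1: (0.5, 0.1), 2: (0.2, 0.3)}, t_lower=0.3, t_upper=0.7)
print(round(F, 6))
```

In the full method the spreads would come from solving the two weighted LPs for each candidate set of memberships, which this sketch takes as given.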

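The FLRs obtained (FLR lines 1 and 2)

$$
\text{FLR line 1:} \quad Y^* = A_{10} + A_{11} x = (\alpha_{10}, c_{10})_L + (\alpha_{11}, c_{11})_L\, x, \tag{5}
$$

$$
\text{FLR line 2:} \quad Y^* = A_{20} + A_{21} x = (\alpha_{20}, c_{20})_L + (\alpha_{21}, c_{21})_L\, x. \tag{6}
$$

Pyramidal fuzzy intersection set

[Figure: pyramidal fuzzy intersection set $S(y, x)$ in the $(x, y)$ plane. The figure shows the mode lines $y = \alpha_{11} x + \alpha_{10}$ and $y = \alpha_{21} x + \alpha_{20}$, the upper-bound lines $y = (c_{11} + \alpha_{11}) x + c_{10} + \alpha_{10}$ and $y = (c_{21} + \alpha_{21}) x + c_{20} + \alpha_{20}$, and the lower-bound lines $y = (\alpha_{11} - c_{11}) x + \alpha_{10} - c_{10}$ and $y = (\alpha_{21} - c_{21}) x + \alpha_{20} - c_{20}$. The marked points of $S$ are $(y_{S1}, x_{S1})$, $(y_{S2}, x_{S2})$, $(y_{S3}, x_{S3})$, $(y_{S4}, x_{S4})$ and $(y_{SM}, x_{SM})$.]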

A general solution approach

1) Find the clustering memberships. Determine the data for the fuzzy clusters by Eq. (1).

2) Solve the FLRs, Eqs. (5) and (6), by the linear programs, Eqs. (2) and (3). Find the fuzzy regression parameters for the clusters.

3) Evaluate the overall objective function, Eq. (4a). If it is not optimal, return to 1) and re-perform 1)-3).

4) Otherwise, if the optimal solution is found, find the pyramidal fuzzy intersection set $S$, Eqs. (7)-(10), for all level cuts ($\lambda$-cuts).

A Solution Approach for FBCR

[Flowchart: generate initial population (real codes); perform FLRs (linear programming); evaluate overall objective (fitness) function; test whether the termination condition is reached. If no, apply tournament selection, elitism, the crossover operation and the mutation operation, then repeat. If yes, determine the fuzzy intersection set.]

Application to Heat Tolerance Estimation of Plants

The chlorophyll fluorescence measurement is an indicator of functional change in photosynthesis and is sensitive to temperature. Using the fluorescence-temperature curves, the experimenter may determine the heat tolerance ($T_c$) of plants from the intersection of two linear regression lines.

When there are nonlinear inflections in the data curves, the traditional regression analysis, which is bound to linear regression models, may be unable to sufficiently model the uncertainties exhibited.

By FBCR, $T_c$ may be determined from $S$ as a fuzzy set or fuzzy number, denoted as $FT_c$. Using the bounds and mode of the fuzzy intersection set $S$, Eq. (10), where $x$ is replaced by $t$ to stand for the temperature, $FT_c$ may be determined in the bound form as










 
$$
FT_c = (t_{c1}, t_{cM}, t_{c2})_T = \left( \min\{t_{S1}, t_{S2}, t_{S3}, t_{S4}\},\ t_{SM},\ \max\{t_{S1}, t_{S2}, t_{S3}, t_{S4}\} \right)_T .
$$

[Figure: pyramidal fuzzy intersection set $S$ of FLR lines 1 and 2, with the points $(y_{S1}, x_{S1})$, $(y_{S2}, x_{S2})$, $(y_{S3}, x_{S3})$, $(y_{S4}, x_{S4})$ and $(y_{SM}, x_{SM})$.]
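The bound form of $FT_c$ can be sketched by intersecting the boundary lines of the two fitted FLRs. The sketch assumes that the four points $S_1, \ldots, S_4$ are the pairwise intersections of the upper and lower bound lines of line 1 with those of line 2, and that $S_M$ is the intersection of the two mode lines; the slides' Eqs. (7)-(10) are not reproduced here, so this reading, the function names, and the parameter values are assumptions.

```python
def intersect(m1, b1, m2, b2):
    """t-coordinate where y = m1*t + b1 meets y = m2*t + b2."""
    if m1 == m2:
        raise ValueError("parallel lines do not intersect")
    return (b2 - b1) / (m1 - m2)


def bound_lines(alpha0, alpha1, c0, c1):
    """(slope, intercept) of the upper and lower bound lines of one FLR."""
    return [(alpha1 + c1, alpha0 + c0), (alpha1 - c1, alpha0 - c0)]


def fuzzy_tc(line1, line2):
    """Return (lower bound, mode, upper bound) of FT_c.

    Each line is given as (alpha0, alpha1, c0, c1).
    """
    t_mode = intersect(line1[1], line1[0], line2[1], line2[0])
    ts = [
        intersect(m1, b1, m2, b2)
        for (m1, b1) in bound_lines(*line1)
        for (m2, b2) in bound_lines(*line2)
    ]
    return min(ts), t_mode, max(ts)


# Hypothetical fits: a flat line before the inflection and a rising line after it.
line1 = (10.0, 0.0, 1.0, 0.0)    # Y* = (10, 1)_L + (0, 0)_L t
line2 = (-90.0, 2.0, 2.0, 0.0)   # Y* = (-90, 2)_L + (2, 0)_L t
print(fuzzy_tc(line1, line2))
```

Wider spreads on either line push the outer intersections apart and so widen the support of $FT_c$.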

Traditional Regression Statistical Analysis Results

[Figure: traditional regression statistical analysis results, panels (a) Pachira macrocarpa, (b) Ipomoea batatas, (c) Euphoria longana, (d) Mangifera indica, (e) Ficus retusa.]

Results of Traditional Regression Statistical Analysis

| Scientific name | Month of experiment | T_c, measurement 1 | T_c, measurement 2 | T_c, measurement 3 | Avg. T_c / Interval of T_c [Avg. T_c ± 1 std. dev.] |
|---|---|---|---|---|---|
| Pachira macrocarpa | July | 49.8094 | 49.2886 | 49.3407 | 49.4796 / [49.1928, 49.7664] |
| Ipomoea batatas | August | 47.5588 | 47.9347 | 47.7798 | 47.7578 / [47.5689, 47.9467] |
| Euphoria longana | July | 48.6356 | 48.8014 | 48.8933 | 48.7768 / [48.6462, 48.9074] |
| Mangifera indica | August | 49.3275 | 48.9607 | 48.5549 | 48.9477 / [48.5612, 49.3342] |
| Ficus retusa | July | 34.2346 | 33.2188 | 34.3862 | 33.9465 / [33.3117, 34.5813] |
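The traditional interval for $T_c$ is the average of the three measurements plus or minus a multiple of their standard deviation. The sketch below recomputes the Pachira macrocarpa row from the table; it assumes the sample standard deviation (divisor $n - 1$), which appears to reproduce the tabulated interval. The function name `tc_interval` is illustrative.

```python
from statistics import mean, stdev


def tc_interval(measurements, k=1.0):
    """Return (average, sample std. dev., (lower, upper)) for avg +/- k std. dev."""
    avg = mean(measurements)
    s = stdev(measurements)  # sample standard deviation, divisor n - 1
    return avg, s, (avg - k * s, avg + k * s)


# T_c measurements 1-3 for Pachira macrocarpa, taken from the table.
avg, s, (lower, upper) = tc_interval([49.8094, 49.2886, 49.3407])
print(f"avg={avg:.4f}, std={s:.4f}, interval=[{lower:.4f}, {upper:.4f}]")
```

Passing a larger `k` gives the wider intervals (for example 3, 3.5 or 4 standard deviations) used in the later comparison.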

Fuzzy Bi-cluster Regression Analytic Results

In the FBCR, the data inputs are the fuzzy observations $Y_i$, i.e., the intensities of the $F_o$ response, expressed as $(y_i, e_i)_L$ and taken as follows.

First, under each temperature, the maximum and minimum $F_o$ are found.

Then, $y_i$ is taken as the average of the maximum and minimum $F_o$, and $e_i$ equals the maximum $F_o$ minus $y_i$.

These fuzzy observations bring all the observed data into the simultaneous analysis.
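The construction of a fuzzy observation from repeated readings at one temperature can be sketched as follows. The function name and the sample readings are hypothetical; the rule itself (mode at the midpoint of the maximum and minimum, spread equal to the maximum minus the mode) follows the description above.

```python
def fuzzy_observation(readings):
    """Return (y_i, e_i) for the F_o readings taken at one temperature."""
    if not readings:
        raise ValueError("at least one reading is required")
    highest, lowest = max(readings), min(readings)
    y = (highest + lowest) / 2.0  # mode: midpoint of the extreme readings
    e = highest - y               # spread: distance from the mode to the maximum
    return y, e


# Hypothetical F_o readings at a single temperature.
print(fuzzy_observation([2.0, 3.0, 2.5]))
```

Because the support runs from the minimum to the maximum reading, every individual reading lies inside the fuzzy observation.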

Results with fuzzy bi-cluster regression

[Figure: fuzzy bi-cluster regression results, panels for Pachira macrocarpa, Ipomoea batatas, Euphoria longana, Mangifera indica and Ficus retusa.]

Results of Fuzzy Bi-cluster Regression

| Scientific name | H-level tested | Mode of FT_c | Support of FT_c | Best overall objective value found | CPU time (min.) |
|---|---|---|---|---|---|
| Pachira macrocarpa | 0.5 | 48.9376 | [47.4773, 50.9726] | 892.021 | 5.4404 |
| Ipomoea batatas | 0.5 | 47.8473 | [47.1332, 50.1458] | 928.356 | 5.1385 |
| Euphoria longana | 0.5 | 48.2207 | [46.9840, 50.2813] | 1151.090 | 5.5104 |
| Mangifera indica | 0.5 | 48.7175 | [46.8776, 51.7823] | 1240.911 | 4.9253 |
| Ficus retusa | 0.5 | 34.2808 | [33.0605, 36.7088] | 625.725 | 3.6255 |

Comparison of FBCR Results and Traditional Regression Analysis (TRA) (up to ±4 Std. Dev.)

| Scientific name | FBCR: mode of FT_c | FBCR: support length | TRA: avg. T_c | TRA: std. dev. | TRA interval length of T_c, ±1 std. dev. | ±3 std. dev. | ±3.5 std. dev. | ±4 std. dev. |
|---|---|---|---|---|---|---|---|---|
| Pachira macrocarpa | 48.937 | 3.495 | 49.480 | 0.287 | 0.574 | 1.721 | 2.008 | 2.294 |
| Ipomoea batatas | 47.847 | 3.013 | 47.758 | 0.189 | 0.378 | 1.133 | 1.322 | 1.511 |
| Euphoria longana | 48.221 | 3.297 | 48.777 | 0.131 | 0.261 | 0.784 | 0.914 | 1.045 |
| Mangifera indica | 48.718 | 4.905 | 48.948 | 0.387 | 0.773 | 2.319 | 2.706 | 3.092 |
| Ficus retusa | 34.281 | 3.648 | 33.947 | 0.635 | 1.270 | 3.809 | 4.444 | 5.078 |


By TRA, even when an interval of ±3.5 or ±4 std. dev. is used, the interval lengths of $T_c$ for Pachira macrocarpa, Ipomoea batatas, Euphoria longana and Mangifera indica are still underestimated relative to the FBCR support lengths, whereas the interval of $T_c$ for Ficus retusa can become fairly large.


A useful and straightforward FR approach, fuzzy bi-cluster regression, has been proposed.

More applications can be found in other areas such as medicine, e.g., finding the threshold at which changes in a critical attribute value lead to a disease.