ISSN 1843-6188
Scientific Bulletin of the Electrical Engineering Faculty – Year 10 No. 3 (14)

CLUSTERING METHODS FOR ELECTRICAL LOAD PATTERN CLASSIFICATION

Gianfranco CHICCO
Politecnico di Torino, Dipartimento di Ingegneria Elettrica,
corso Duca degli Abruzzi 24, 10129 Torino, Italy
E-mail: gianfranco.chicco@polito.it

Abstract: In the current structure of the electricity business, distribution and supply services have been unbundled in many jurisdictions. As a consequence of unbundling, electricity supply to customers is now provided on a competitive basis. In this context, the electricity suppliers need accurate information on the actual behaviour of the electricity customers in order to set up effective commercial offers. Grouping the electrical load patterns on the basis of information on their activity or commercial codes has proven to be ineffective, since very different load patterns would end up in the same group. Customer classification on the basis of consumption pattern similarity is likely to provide more effective results. In order to establish customer grouping based on similarity aspects, various clustering techniques have been tested on electrical load pattern data. This paper provides an overview of these techniques, included in a more general scheme for analyzing electrical demand data. The various stages of the customer classification procedure include the definition of the information to be gathered in the field, the selection of the features to be used to run the clustering methods, the use of clustering methods with assessment of their effectiveness through the calculation of appropriate clustering validity indicators, and the formation of the final load profiles representing a relatively limited number of final customer classes. The characteristics of these stages are illustrated and discussed, providing links to relevant literature references.

Keywords: Classification, Clustering, Electrical consumer, Electrical demand, Customer categorization, Load pattern, Load profile, Validity indicator.

1. INTRODUCTION

In most restructured electricity markets, distribution and supply services have been unbundled. Electricity suppliers are now operating within a competitive environment, with some degrees of freedom in formulating the tariff offers [1][2], provided that their offers meet the requirements set by regulatory authorities in the form of price or revenue caps.

Electrical load pattern classification is carried out with the main objective of identifying a suitable set of customer classes on the basis of the shape of the electrical load patterns. For the customer classes formed, the supplier can then formulate specific tariff offers.

Load pattern classification for tariff purposes is typically performed on aggregate residential load data, or on individual non-residential load data. Residential consumers are generally not classified as individual entities, for the following reasons:
• the consumption pattern of individual residential loads varies as a function of the number of persons composing the family, as well as of their activity, age and lifestyle [3][4]; characterizing residential customers through the expected load pattern of each single customer would require a detailed statistical analysis based on the several factors affecting the energy use in a family [5]-[7]; however, the variation of the individual residential customer load at each hour of the day, mainly driven by the occasional use of a few appliances with relatively large power consumption (e.g., washing machine, electrical oven, and so forth) [8], is so large as to make the use of statistical data impractical;
• the electrical distribution system lines starting from the MV/LV substation do not feed the residential loads directly, but each distribution system feeder supplies an aggregated load; in the presence of a significant number (e.g., a few dozen or more) of residential customers, the diversity of energy use among the customers makes the aggregated load pattern smoother than that of an individual residential load; the time evolution of the aggregated load pattern can be predicted to a relatively good extent [7].

Conceptually, electricity customer classification should follow the rules of segmentation referred to the commercial types of activity, as established for instance by the national institutes of statistics. However, there is a great diversity among the load patterns of the customers belonging to the same type of activity or associated with the same commercial code [9][10]. As such, customer classifications based on the type of activity and on commercial codes are not effective in representing the specific aspects of the electricity consumption. The distinction should be limited to some macro-categories (e.g., residential, industrial, commercial, or other specific categories such as lighting and traction). Identification of some “external” features can be useful to make a preliminary customer partitioning into macro-categories.
Possible external features are the rated values of electrical quantities, the type of activity and other information such as supply voltage level, annual active and reactive energy (maximum, minimum, average value and standard deviation), utilization level (defined as the energy consumption to rated power ratio), and power factor. Moreover, it is possible to build separate models for weather-dependent loads. Using macro-categories, the number of load patterns to be handled together for each macro-category by the classification methods would be reduced.

Dedicated research has been developed in the last decade to study classification mechanisms based on the shape of the load patterns, useful for the formulation of tariff options dedicated to each load class. Furthermore, once identified, each class can be represented for tariff purposes through its synthetic load profile, on the basis of which the supplier can make its evaluations and the authorities can set up appropriate regulation.

Starting from an initial pre-defined number of macro-classes (for instance, residential, industrial, commercial and others), and from the identification of time periods with different consumption characteristics (e.g., weekdays and weekend days, defined for different periods of the year in order to take into account seasonality issues), research on load pattern classification has been carried out to formulate suitable algorithms able to make sound grouping of the customers belonging to the same macro-class in a given time period, using load pattern shape information gathered from field measurements.

More specifically, the electrical load pattern classification approach starts from the assumption that it is possible to measure the load pattern of any customer belonging to the same macro-category, for a given duration of observation. In practice, on-site measurements have to be performed for a time period long enough to get a sufficient amount of data for customer classification. With the present diffusion of metering facilities, and although suitable technologies are available, it is generally still not possible to perform measurements on every customer in every jurisdiction, even though the installation of smart meters is in progress in many countries. As such, load pattern classification can be performed by creating the customer classes from the monitoring of a limited number of customers. Then, a suitable mechanism for associating the other customers to the classes formed has to be identified. The minimum number of customers to be subject to load pattern measurement for the various macro-classes can be determined by using statistical techniques such as the stratified sampling approach [11].

The remainder of this paper addresses the various steps of the electrical load pattern classification process, according to the general scheme outlined in Figure 1 [12][13].

2. DATA GATHERING AND PROCESSING

Let us consider a set of M customers belonging to the same macro-category, to be classified into a meaningful number of customer classes. Relevant data are referred to comparable periods in terms of type of day (weekday/weekend) and season [12][14]. Let us refer to the context characterizing each of these periods as a loading condition.

Typically, the monitored data are organised to represent the customer’s consumption by means of a daily load pattern. The duration of the monitoring period has to be long enough to guarantee availability of a sufficient amount of data. Hence, the corresponding monitoring period should be at least two to three weeks in the same loading condition.

The sampling rate depends on the characteristics of the monitoring equipment used to collect data (for a given monitoring period the rate limit can depend on data storage capability). The time intervals of interest for data representation are typically 1 minute, 15 minutes or one hour. The corresponding number of samples characterizing each daily load pattern is H = 1440, H = 96, and H = 24, respectively. One-minute sampling can be used to gather a sufficient number of points to compute 15-minute data by smoothing the effect of the discretization step of the measurement system [15]. Generally, even though faster measurements are done, the stored data refer to 15-minute time intervals [10][16]-[18].

Figure 1. The load pattern classification procedure. Phases and steps shown in the flowchart:
DATA GATHERING AND PROCESSING
• load pattern data measurement of the M customers in a specified loading condition
• bad data detection and elimination
PRE-CLUSTERING PHASE
• selection of the H representative features
• load pattern data processing to build the M x H input data set for the specified loading condition
CLUSTERING PHASE
• formation of the customer classes by using the selected algorithm
• centroid formation by using the customer class composition and the time-domain data
• computation of the clustering validity indicators to assess clustering effectiveness
POST-CLUSTERING PHASE
• determination of the final load profiles for each customer class in the specified loading condition
• calculation of the global power and energy information for the customer classes to define the tariff options

Data accuracy depends on the characteristics of the monitoring equipment. In order to improve the accuracy of the information gathered, it is sometimes better to monitor data with a sampling rate higher than the one corresponding to the time interval of interest (e.g., each minute), and then to calculate the data related to the time interval of interest (e.g., 15 minutes) by averaging the single data monitored inside each time interval.
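
The averaging of fast samples into the time interval of interest can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `block_average` and the example data are assumptions introduced here, not part of the original procedure.

```python
def block_average(samples, block_size=15):
    """Average consecutive groups of `block_size` samples.

    `samples` holds the power values of one day at the fast sampling rate
    (e.g. 1440 one-minute values); the result has len(samples) // block_size
    values (e.g. 96 fifteen-minute averages).
    """
    if len(samples) % block_size != 0:
        raise ValueError("number of samples must be a multiple of block_size")
    return [
        sum(samples[start:start + block_size]) / block_size
        for start in range(0, len(samples), block_size)
    ]


# Hypothetical one-minute load data [kW]: a flat 2 kW load with a 30-minute
# step to 5 kW starting at minute 600 (10:00).
one_minute = [2.0] * 1440
for minute in range(600, 630):
    one_minute[minute] = 5.0

quarter_hour = block_average(one_minute)   # H = 96 values
```

With H = 1440 one-minute samples the function returns the H = 96 values of the 15-minute daily load pattern; calling it with `block_size=60` would give the hourly pattern with H = 24.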

Bad data detection and elimination is performed in such a way as to ensure that the load patterns used for customer classification correspond to normal operating conditions. For this purpose, load data leading to uncommon situations are detected and eliminated. In practice, uncommon situations may occur depending on anomalous days (e.g., bank holidays occurring on weekdays), expected events (e.g., maintenance) or unexpected events (e.g., failures, strikes, ...). The effects of failures or abnormal conditions may be detected by identifying the time intervals at which the average RMS voltage is outside the acceptable range (90%-110% of the rated voltage). Moreover, a dedicated procedure for de-noising by wavelet multiresolution analysis is presented in [19]. The bad data detected are eliminated from the analysis and the number of useful data for a given time interval is correspondingly reduced.

In order to represent the customer information, the average load pattern is determined by computing the average value of the useful load patterns gathered at each time interval. For instance, if measurements have been performed for two weeks in the loading condition corresponding to the spring season, measuring 10 weekdays, the average load pattern for the representative weekday is calculated by averaging instant-by-instant the 10 weekday load patterns (assuming there is no anomalous day). If a bad data point appears, for instance at 9 am for one of the days, this point is excluded from the averaging, and the remaining 9 points are used to determine the average value at 9 am to be included in the average daily load pattern. The averaging process smooths the average load pattern curve, thus reducing the relevance of possible power values obtained in “normal” operating conditions but largely outside the average load.
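
The instant-by-instant averaging with exclusion of bad data can be illustrated with a short sketch. It assumes, purely for illustration, that a value flagged as bad data is stored as `None`; the function name and the example values are hypothetical.

```python
def average_load_pattern(daily_patterns):
    """Average several daily load patterns instant by instant.

    Each pattern is a list of H power values; a value flagged as bad data
    is stored as None and is skipped, so the average at that instant uses
    only the remaining useful points.
    """
    n_instants = len(daily_patterns[0])
    average = []
    for h in range(n_instants):
        useful = [day[h] for day in daily_patterns if day[h] is not None]
        if not useful:
            raise ValueError(f"no useful data at time interval {h}")
        average.append(sum(useful) / len(useful))
    return average


# Hypothetical example: 10 weekdays with hourly data (H = 24), each at a
# constant 3 kW, except for the two modifications below.
days = [[3.0] * 24 for _ in range(10)]
days[4][9] = None     # bad data flagged at 9 am for the fifth day
days[0][9] = 12.0     # a legitimate but unusual value at the same hour

avg = average_load_pattern(days)
```

At 9 am the average is taken over the 9 useful points only, and the isolated 12 kW value is smoothed down by the other 8 days, which is the effect described in the text.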

For a given loading condition, the information concerning each customer, to be used for classification purposes, is thus given in such a way as to get load patterns comparable in terms of their shape. The information stored contains:
• the reference power [kW], defined as the peak value of the average load pattern;
• the normalised representative load pattern (RLP), computed by dividing the average load pattern by its reference power.
The effect of this definition is that the reference power does not correspond to the true peak power reached by the load pattern in the period of observation, because of averaging multiple points at corresponding time instants of the different measured days. However, this fact can be seen in a positive way, since non-regular peaks that could occur during the measurements turn out to have a limited impact on the RLP. The normalization aspect is extended in [20] to include also the minimum value of the load pattern, in such a way that all RLPs formed have a null minimum value and a unity maximum value.
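
A minimal sketch of the two normalisations follows. The peak-based form mirrors the definitions of reference power and RLP given above; the min-max form is an assumption about how the extension attributed to [20] could be written, and all names and data are illustrative.

```python
def normalise_by_peak(average_pattern):
    """Return (reference_power, rlp): the peak of the average pattern and
    the pattern divided by that peak, so that max(rlp) == 1."""
    reference_power = max(average_pattern)
    if reference_power <= 0:
        raise ValueError("the average load pattern must have a positive peak")
    return reference_power, [p / reference_power for p in average_pattern]


def normalise_min_max(average_pattern):
    """Assumed min-max variant: the minimum maps to 0 and the maximum to 1."""
    low, high = min(average_pattern), max(average_pattern)
    if high == low:
        raise ValueError("a flat pattern cannot be min-max normalised")
    return [(p - low) / (high - low) for p in average_pattern]


average_pattern_kw = [2.0, 4.0, 8.0, 6.0]   # hypothetical 4-point pattern
reference_power, rlp = normalise_by_peak(average_pattern_kw)
rlp_min_max = normalise_min_max(average_pattern_kw)
```

Here the reference power is 8 kW and the RLP is [0.25, 0.5, 1.0, 0.75]; the min-max variant stretches the same shape over the full interval from 0 to 1.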

3. PRE-CLUSTERING PHASE

3.1. Definition of the features to be used for classification purposes

Feature selection concerns the identification of the type of data to be used for performing the evaluations referred to load classification. The initial data are the RLPs built from the measured time-domain data. The time-domain RLPs can be used directly, or can be processed to obtain other features representing the customers.

Using time-domain data, the load patterns are defined with an arbitrary number of average power values, depending on the meter resolution. For a given loading condition, a simple way to define the features of the m-th representative load pattern, for m = 1,…,M, is to consider all or a part of the normalised power values obtained from the measurements in the time domain. In this way, a set of H directly determined shape features is readily available, without performing any load pattern post-processing. In the time domain, an analysis could be made for grouping together a number of successive time intervals. For instance, the RLP data corresponding to each 15 minutes could be grouped together in such a way as to identify a reduced number of time intervals, composed of night hours (from midnight to 6 am), sunrise hours (from 6 am to 8 am), morning hours (from 8 am to 12 noon), lunchtime hours (from 12 noon to 2 pm), and so forth [12].
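
One possible way of grouping successive 15-minute values into a reduced number of time bands is sketched below. The first four bands follow the text; the last two bands, the use of the average as the grouping rule, and all names are assumptions made only to complete the illustration.

```python
# Band limits in hours; the first four follow the text, the remaining two
# are hypothetical choices that complete the day for this illustration.
BANDS = [
    ("night", 0, 6),
    ("sunrise", 6, 8),
    ("morning", 8, 12),
    ("lunchtime", 12, 14),
    ("afternoon", 14, 19),   # assumption
    ("evening", 19, 24),     # assumption
]


def band_features(rlp, samples_per_hour=4):
    """Average the RLP values falling inside each band (assumed rule)."""
    features = {}
    for name, start_hour, end_hour in BANDS:
        values = rlp[start_hour * samples_per_hour:end_hour * samples_per_hour]
        features[name] = sum(values) / len(values)
    return features


# Hypothetical RLP with H = 96: 0.2 at night, 1.0 in the morning, 0.5 elsewhere.
rlp = [0.5] * 96
rlp[0:24] = [0.2] * 24
rlp[32:48] = [1.0] * 16

features = band_features(rlp)
```

The 96 time-domain values are reduced to 6 features, one per band, which can then be passed to the clustering method in place of the full RLP.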

Let us denote the set of RLPs as $\mathbf{X} = \{\mathbf{x}^{(m)}, m = 1, \dots, M\}$, whose m-th component is represented by the vector $\mathbf{x}^{(m)} = \{x_h^{(m)}, h = 1, \dots, H\}$.

An alternative to the use of time-domain data is the definition of suitable indirectly determined shape features, requiring post-processing of the time-domain data for their definition. While determining the shape features, an interesting possibility refers to reducing the number of data to be stored for each customer and to be sent to the classification tools. Among the shape features, it is possible to define a set of shape factors, modelling specific aspects of the customer consumption “signature”. The shape factors are defined for each customer on the basis of the representative load diagram in a given loading condition. Examples are the dimensionless ratios used in [2], [10], [12] and [21], related to average to maximum power ratios, or ratios between average power at different portions of the day (daylight period, night period, or the entire day).
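
As an illustration of dimensionless shape factors of this kind, the sketch below computes three ratios from one RLP. The specific factors, the daylight limits (8 am to 8 pm) and the names are hypothetical; the exact definitions used in the cited references are not reproduced here.

```python
def shape_factors(rlp, samples_per_hour=4, day_start=8, day_end=20):
    """Compute three illustrative dimensionless ratios from one RLP."""
    first, last = day_start * samples_per_hour, day_end * samples_per_hour
    daylight = rlp[first:last]
    night = rlp[:first] + rlp[last:]
    mean_all = sum(rlp) / len(rlp)
    mean_day = sum(daylight) / len(daylight)
    mean_night = sum(night) / len(night)
    return {
        "avg_to_max": mean_all / max(rlp),      # average to maximum power
        "night_to_day": mean_night / mean_day,  # night average to daylight average
        "day_to_avg": mean_day / mean_all,      # daylight average to daily average
    }


# Hypothetical RLP with H = 96: 0.25 outside the daylight period, 1.0 inside.
rlp = [0.25] * 32 + [1.0] * 48 + [0.25] * 16
factors = shape_factors(rlp)
```

Each customer is then described by three numbers instead of 96, which reduces the data to be stored and sent to the classification tools.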

Other types of indirectly determined shape features are those identified in the frequency domain, such as the harmonics-based coefficients presented in [12] and [22], the Fourier series coefficients used in [23], and the coefficients derived from the wavelet transform exploited in [24]. Furthermore, data size reduction can be performed by using projection methods, such as the Principal Component Analysis (PCA), Curvilinear Component Analysis (CCA) and Sammon Map exploited in [25], or the Canonical Variate Analysis (CVA) used in [26].

3.2. Load pattern data processing to build the input data set

The input data for a given loading condition can be conveniently set up in the form of a matrix, for instance with M rows (for the M customers) and H columns (for the H features), with an additional column vector of M components containing the reference powers for every customer. The clustering techniques can use the RLPs to form the groups. The reference powers can be either ignored during the group formation (thus giving the same conceptual importance to all load patterns regardless of the corresponding actual power), or can be more conveniently exploited as weighting factors in the calculation of the centroids as weighted sums of the load patterns contained in the group. In this way, the centroid will assume a more meaningful role. After creating the centroid, it is important to associate to the centroid the appropriate reference power, given by the sum of the reference powers of the load patterns belonging to the group represented by the centroid. Once created, the centroid will generally have a maximum value lower than unity.
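
The use of the reference powers as weighting factors can be sketched as follows. The sketch assumes that the weighted sum is divided by the total reference power of the group (that is, a weighted average); the function name and the data are illustrative.

```python
def weighted_centroid(rlps, reference_powers):
    """Return (centroid, group_reference_power) for one group.

    The centroid is the average of the RLPs weighted by their reference
    powers (assumed normalisation); the group reference power is the sum
    of the reference powers of the members.
    """
    total_power = sum(reference_powers)
    n_features = len(rlps[0])
    centroid = [
        sum(p * rlp[h] for rlp, p in zip(rlps, reference_powers)) / total_power
        for h in range(n_features)
    ]
    return centroid, total_power


# Hypothetical group of two customers whose peaks occur at different times.
rlps = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
reference_powers = [30.0, 10.0]   # kW

centroid, group_power = weighted_centroid(rlps, reference_powers)
```

In this example the centroid is [0.75, 0.5, 0.25] with a group reference power of 40 kW: since the two members peak at different instants, the maximum of the centroid is lower than unity, as noted in the text.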

4. CLUSTERING PHASE

4.1. Clustering techniques

On the basis of the features defined, the core of the classification procedure is the adoption of an effective classification technique. Clustering techniques [27]-[29] are generally used to perform this task. In particular, it is possible to identify [23] unsupervised learning-based techniques, such as Kohonen’s self organizing map (SOM), supervised learning-based techniques, such as the ones adopting multilayer perceptron or Elman neural networks, or vector quantization, fuzzy logic-based techniques, statistical techniques such as k-means (KM) and multivariate analysis, and hybrid techniques such as probabilistic neural networks (PNN) and fuzzy k-means (FKM). Further techniques have been recently defined by following the concept of entropy borrowed from information theory, or by adapting classification techniques used in other domains, such as follow the leader (FDL) and support vector clustering (SVC). A summary of the techniques used in various literature papers, with indication of relevant references, is shown in Table 1.

On the application side, the clustering techniques differ according to the principle used in their definition, but can be discussed on the basis of the requirements for their usage. A first aspect is the possibility of setting the final number of clusters the user intends to obtain. This possibility can be of interest for the supplier or for the regulating authority, since in their perspective the number of final consumer classes cannot be too high, in order to set up a relatively small number of tariff options, whose contents and differences have to be readily understandable by the consumers. In this respect, the different methods behave as follows:
a) Agglomerative techniques such as the hierarchical clustering [20][21][27][30] can be easily adopted to produce a given number of clusters. In fact, the hierarchical clustering procedure starts with a number of classes equal to the number of RLPs, and proceeds by merging the two “closest” classes at each step (according to specific linkage criteria) until the desired number of clusters is reached. However, the hierarchical clustering includes no mechanism for improving the cluster formation by reassigning the load patterns to the clusters already formed, and the performance obtained from its variants (with different linkage criteria [31]), evaluated by using appropriate clustering validity indicators (see section 4.2), varies from one variant to another. The agglomeration principle is also used in the approach illustrated in [32], based on information theory principles [33] and adopting an effective non-linear metric exploiting Renyi entropy concepts [34][35] in the development of the clustering algorithm.

Table 1. Load pattern classification methods

method                                              references
Adaptive Vector Quantization (AVQ)                  [20] (2007)
C5.0                                                [10] (2005)
Entropy-based (Renyi)                               [32] (2010)
Follow-the-leader (FDL)                             [2] (2003), [37] (2004), [12] (2005), [13] (2005), [18] (2005), [25] (2006), [22] (2006)
Fuzzy logic (FL)                                    [19] (2004)
Fuzzy and ARIMA                                     [40] (2005)
Fuzzy k-means (FKM)                                 [36] (2004), [12] (2005), [13] (2005), [25] (2006), [20] (2007)
Hierarchical clustering (HC)                        [12] (2005), [13] (2005), [25] (2006), [20] (2007), [21] (2007)
Iterative Refinement Clustering (IRC)               [41] (2005)
k-means (KM)                                        [36] (2004), [10] (2005), [12] (2005), [13] (2005), [25] (2006), [20] (2007)
Min-Max neuro-fuzzy (MMNF)                          [39] (2000)
Multivariate statistics (MANOVA)                    [23] (2006)
Probabilistic neural network (PNN)                  [19] (2004), [43] (2005)
Self Organizing Map (SOM)                           [37] (2004), [36] (2004), [10] (2005), [23] (2005), [12] (2005), [13] (2005), [25] (2006), [38] (2007)
Support vector clustering (SVC)                     [42] (2009)
Weighted Evidence Accumulation Clustering (WEACS)   [21] (2007)

b) Other techniques such as k-means and fuzzy k-means [27][44], which accept the final number of clusters as input, in a few cases resulted in forming a lower number of clusters than required, due to the presence of empty clusters in the final grouping. However, since the procedure of the method is not deterministic, with internal steps depending on random number extractions, it is possible to find a suitable solution with the desired number of clusters by running the method again. Basic k-means and fuzzy concepts have been exploited to set up clustering algorithms applied to load pattern classification [19][20][36][45]. The Adaptive Vector Quantization (AVQ) method used in [20] is an unsupervised one-layer neural network that uses a competitive layer with a constant number of neurons. Customized versions using fuzzy principles [46] have been proposed, as in [39] and [47] by exploiting Min-Max neuro-fuzzy network [48] concepts. Moreover, a fuzzy inference model using fuzzy rules to identify the input data, as well as to create local regression models, is illustrated in [45], and a study showing the possibility of combining fuzzy clustering and Auto-Regressive Integrated Moving Average (ARIMA) statistical models is reported in [40].
c) Conversely, the follow the leader algorithm [2][49] does not require the definition of the number of clusters as input, but uses an internal distance threshold among the cluster centroids, whose variation produces different numbers of clusters in a deterministic way (that is, the number of clusters obtained with a given distance threshold is always the same for the same set of initial data). Since the follow the leader algorithm is relatively fast, it is possible to run the algorithm several times in succession, with different distance thresholds, until the specified number of clusters is reached.
d) The IRC method [41] has been defined to merge the most interesting properties of the hierarchical clustering (working with a specified number of clusters) with the ones of the follow-the-leader (the presence of an iterative mechanism for reassigning the load patterns to the clusters already formed), and includes an explicit mechanism to avoid the formation of empty clusters.
e) The SOM [50] modifies the search space to represent the results on a bi-dimensional map [14][37][38], but does not directly generate the final clusters. Hence, a post-processing stage is needed to form the clusters, with arbitrary assumptions, so that different numbers of clusters can be formed starting from the same SOM outcomes, by using a specific technique to identify the final clusters (for instance, post-processing based on k-means is used in [37] and [10]). Likewise, SVC [51] requires a first stage in which the support vectors are formed, followed by a second stage in which the groups are formed for the desired number of clusters [42]. Furthermore, the statistical multivariate technique MANOVA is presented in [23], by highlighting its graphical representation capability, similar to the one of the SOM, that allows for simple and effective visualization of the clustering results.
f) Further applications have been performed in [43] by using Probabilistic Neural Networks (PNN), based on finding for each load pattern the class with maximum probability of being the right one, and in [21] by using a Weighted Evidence Accumulation Clustering (WEACS) approach.
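
The threshold-driven behaviour described in item c) can be illustrated with a deliberately simplified, single-pass variant of the follow-the-leader idea. This is only a sketch under stated assumptions: the algorithm used in the cited references may include further cycles and details that are not reproduced here, and the distance normalisation, function names and data are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two patterns, averaged over the H features."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def follow_the_leader(patterns, threshold):
    """Single-pass, simplified follow-the-leader grouping.

    Each pattern joins the closest existing cluster if the distance from its
    centroid does not exceed `threshold`; otherwise it starts a new cluster.
    Centroids are running means. Returns (labels, centroids).
    """
    centroids, counts, labels = [], [], []
    for pattern in patterns:
        best, best_dist = None, None
        for k, centroid in enumerate(centroids):
            dist = distance(pattern, centroid)
            if best_dist is None or dist < best_dist:
                best, best_dist = k, dist
        if best is None or best_dist > threshold:
            centroids.append(list(pattern))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            centroids[best] = [
                c + (x - c) / counts[best]
                for c, x in zip(centroids[best], pattern)
            ]
            labels.append(best)
    return labels, centroids


def threshold_for(patterns, n_clusters, thresholds):
    """Try candidate thresholds in order; return the first giving n_clusters."""
    for threshold in thresholds:
        labels, centroids = follow_the_leader(patterns, threshold)
        if len(centroids) == n_clusters:
            return threshold, labels
    return None, None


patterns = [[1.0, 0.2], [0.9, 0.3], [0.1, 1.0], [0.2, 0.9], [0.5, 0.5]]
threshold, labels = threshold_for(patterns, 2, [0.05, 0.15, 0.3, 0.6])
```

No random numbers are involved, so repeating the run with the same threshold and the same data always gives the same grouping; the number of clusters is controlled only indirectly, by scanning the threshold until the specified number is reached.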

An important aspect is that the clustering algorithm can be executed on the basis of different features. However, for the sake of comparison among the final results from different clustering techniques, the final RLP grouping has to be made on the basis of time-domain data, whatever feature has been used to run the clustering procedure. More specifically, regardless of the specific details of the clustering method, the only output needed from the clustering algorithm is the allocation of the RLPs to the clusters formed. This can be done by constructing a two-dimensional list, in which the first dimension contains the cluster numbers 1,…,K, while the second dimension contains for each cluster the list of RLPs belonging to that cluster. Alternatively, it is possible to create and handle a single vector containing M components, progressively updated during the clustering process, in which the m-th component contains the number of the cluster to which the m-th RLP is assigned.
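
The two equivalent representations of the clustering output can be sketched as follows; the function names and the example assignment are illustrative only.

```python
def labels_to_groups(labels, n_clusters):
    """Convert the M-component assignment vector into a list of K lists.

    labels[m - 1] is the cluster number (1..K) of the m-th RLP;
    groups[k - 1] lists the (1-based) RLP numbers assigned to cluster k.
    """
    groups = [[] for _ in range(n_clusters)]
    for m, k in enumerate(labels, start=1):
        groups[k - 1].append(m)
    return groups


def groups_to_labels(groups, n_patterns):
    """Inverse conversion: rebuild the assignment vector from the groups."""
    labels = [0] * n_patterns          # 0 means "not assigned yet"
    for k, members in enumerate(groups, start=1):
        for m in members:
            labels[m - 1] = k
    return labels


labels = [2, 1, 2, 3, 1]               # hypothetical result with M = 5, K = 3
groups = labels_to_groups(labels, 3)
```

The assignment vector is compact and easy to update during the clustering process, while the two-dimensional list gives direct access to the members of each cluster when centroids and validity indicators are computed.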

4.2. Clustering validity indicators

Different clustering validity indicators have been defined in order to assess the effectiveness of the clustering methods. Most of these indicators are based on Euclidean distance metrics. For this purpose, different types of distance are needed. A comprehensive vector-based formulation of the distances is presented in [12] and [13]. Assuming that the clustering results yield the set of centroids $\mathbf{C} = \{\mathbf{c}^{(k)}, k = 1, \dots, K\}$ and the corresponding groups of the RLPs, denoted as $\mathbf{L}^{(k)}$, each of which contains $n^{(k)}$ RLPs, for $k = 1, \dots, K$, it is possible to consider various distances. Applying the Euclidean distance rationale, the set of distances used includes the pattern-to-pattern distance, for instance $d(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})$ between the i-th and j-th RLPs, the pattern-to-set distance, for instance $d(\mathbf{x}^{(i)}, \mathbf{L}^{(k)})$ between the i-th RLP and the k-th clustered group, the average set-to-set distance, for instance $d(\mathbf{L}^{(i)}, \mathbf{L}^{(j)})$ between the i-th and j-th clustered groups, and the infra-set distance, for instance $\hat{d}(\mathbf{L}^{(k)})$ referred to the k-th clustered group.
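
A possible implementation of three of these distances is sketched below. The formal definitions are those given in [12] and [13]; the forms coded here (normalisation by the number of features H, root-mean-square combination over the members of a group, factor 2 in the infra-set distance) are assumptions adopted for illustration and may differ in detail from the referenced formulation. The average set-to-set distance is not covered.

```python
import math


def d_pattern(x_i, x_j):
    """Pattern-to-pattern distance (Euclidean, averaged over the H features)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)) / len(x_i))


def d_pattern_set(x_i, group):
    """Pattern-to-set distance: RMS of the distances from x_i to the members."""
    return math.sqrt(sum(d_pattern(x_i, x) ** 2 for x in group) / len(group))


def d_infra_set(group):
    """Infra-set distance: a measure of the spread of the members of a group."""
    total = sum(d_pattern_set(x, group) ** 2 for x in group)
    return math.sqrt(total / (2 * len(group)))


group = [[0.0, 0.0], [1.0, 1.0]]       # hypothetical group with two RLPs
spread = d_infra_set(group)
```

With these assumed forms, the squared infra-set distance coincides with the mean squared distance of the members from their simple average, so a compact group gives a small value.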

Starting from these definitions, some clustering validity indicators have been defined under the common rationale according to which, for each indicator, lower values represent better clustering validity. For this purpose, the original definitions of some indicators have been modified. The set of clustering validity indicators used in various publications, referred to the formation of K clusters, is the following:
• Mean Index Adequacy (MIA) [2]:
$$MIA(K) = \sqrt{\frac{1}{K} \sum_{k=1}^{K} d^2\left(\mathbf{c}^{(k)}, \mathbf{L}^{(k)}\right)} \tag{1}$$
• Clustering Dispersion Indicator (CDI) [2]:
$$CDI(K) = \frac{1}{\hat{d}(\mathbf{C})} \sqrt{\frac{1}{K} \sum_{k=1}^{K} \hat{d}^2\left(\mathbf{L}^{(k)}\right)} \tag{2}$$
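• Scatter Index (SI) [52]:
$$SI(K) = \left[\sum_{m=1}^{M} d^2\left(\mathbf{x}^{(m)}, \mathbf{p}\right)\right] \left[\sum_{k=1}^{K} d^2\left(\mathbf{c}^{(k)}, \mathbf{p}\right)\right]^{-1} \tag{3}$$
with pooled scatter $\mathbf{p} = \frac{1}{M} \sum_{m=1}^{M} \mathbf{x}^{(m)}$.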
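• Variance Ratio Criterion (VRC) [53]:
VRC(K), expressed as a function of M, K and W (4)
where W is computed from M, from the numbers $n^{(k)}$ of RLPs in the clusters and from the infra-set distances $\hat{d}(\mathbf{L}^{(k)})$, for k = 1,…,K.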
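• Davies-Bouldin Index (DBI), considering the version of the index introduced in [54] constructed by using the Euclidean distances, for i, j = 1, …, K:
$$DBI(K) = \frac{1}{K} \sum_{k=1}^{K} \max_{i \neq j} \left\{ \frac{\hat{d}\left(\mathbf{L}^{(i)}\right) + \hat{d}\left(\mathbf{L}^{(j)}\right)}{d\left(\mathbf{c}^{(i)}, \mathbf{c}^{(j)}\right)} \right\} \tag{5}$$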
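• Similarity Matrix Indicator (SMI) [12], for i, j = 1, …, K:
$$SMI(K) = \max_{i > j} \left\{ \left(1 - \frac{1}{\ln\left[d\left(\mathbf{c}^{(i)}, \mathbf{c}^{(j)}\right)\right]}\right)^{-1} \right\} \tag{6}$$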
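• Modified Dunn Index (MDI), adapted in [25] from the original version [55] by using the Euclidean distances; for i, j = 1, …, K:
$$MDI(K) = \max_{1 \le q \le K} \left\{\hat{d}\left(\mathbf{L}^{(q)}\right)\right\} \left[\min_{i \neq j} \left\{d\left(\mathbf{c}^{(i)}, \mathbf{c}^{(j)}\right)\right\}\right]^{-1} \tag{7}$$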
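• Ratio of within cluster sum of squares to between cluster variation (WCBCR) [20]; for i, j = 1, …, K:
$$WCBCR(K) = \left[\sum_{k=1}^{K} \sum_{\mathbf{x}^{(m)} \in \mathbf{L}^{(k)}} d^2\left(\mathbf{c}^{(k)}, \mathbf{x}^{(m)}\right)\right] \left[\sum_{1 \le j < i \le K} d^2\left(\mathbf{c}^{(i)}, \mathbf{c}^{(j)}\right)\right]^{-1} \tag{8}$$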
• Intra-cluster index (IAI) [36], related to the basic distances:
$$IAI(K) = \sum_{k=1}^{K} \sum_{\mathbf{x}^{(m)} \in \mathbf{L}^{(k)}} d^2\left(\mathbf{c}^{(k)}, \mathbf{x}^{(m)}\right) \tag{9}$$
• Inter-cluster index (IEI) [36], related to the distances to the pooled scatter $\mathbf{p}$:
$$IEI(K) = \sum_{k=1}^{K} n^{(k)} d^2\left(\mathbf{c}^{(k)}, \mathbf{p}\right) \tag{10}$$
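
As an illustration of how indicators of this kind can be evaluated, the sketch below computes MIA, CDI and SI for a given grouping. It assumes the following readings: MIA as the root mean square of the centroid-to-group distances, CDI as the root mean square of the infra-set distances of the groups divided by the infra-set distance of the set of centroids, and SI as the ratio between the scatter of the RLPs and the scatter of the centroids around the mean of all RLPs. The distance forms are also assumptions, since the formal definitions are given in [12] and [13]; names and data are hypothetical.

```python
import math


def d(x, y):
    """Assumed pattern-to-pattern distance (Euclidean, averaged over H)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def d_set(x, group):
    """Assumed pattern-to-set distance (RMS over the members of the group)."""
    return math.sqrt(sum(d(x, y) ** 2 for y in group) / len(group))


def d_hat(group):
    """Assumed infra-set distance of a group."""
    return math.sqrt(sum(d_set(x, group) ** 2 for x in group) / (2 * len(group)))


def mia(centroids, groups):
    """Root mean square of the distances between centroids and their groups."""
    terms = [d_set(c, g) ** 2 for c, g in zip(centroids, groups)]
    return math.sqrt(sum(terms) / len(groups))


def cdi(centroids, groups):
    """RMS infra-set distance of the groups over that of the centroid set."""
    mean_square = sum(d_hat(g) ** 2 for g in groups) / len(groups)
    return math.sqrt(mean_square) / d_hat(centroids)


def si(patterns, centroids):
    """Scatter of the patterns over scatter of the centroids around the mean."""
    m, h = len(patterns), len(patterns[0])
    p = [sum(x[i] for x in patterns) / m for i in range(h)]
    return sum(d(x, p) ** 2 for x in patterns) / sum(d(c, p) ** 2 for c in centroids)


# Hypothetical grouping of M = 4 RLPs into K = 2 clusters.
groups = [[[0.0, 0.0], [0.2, 0.2]], [[1.0, 1.0], [0.8, 0.8]]]
centroids = [[0.1, 0.1], [0.9, 0.9]]
patterns = [x for g in groups for x in g]

values = {"MIA": mia(centroids, groups),
          "CDI": cdi(centroids, groups),
          "SI": si(patterns, centroids)}
```

For compact and well separated clusters such as those of the example, all three values are small; comparing the values obtained by different clustering methods for the same K gives a ranking of the methods according to each indicator.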

Each clustering indicator can be applied to a data set formed by using either time domain data or data defined in other vector spaces. As discussed in section 4.1, representing the clustered groups with their time-domain patterns enables direct comparison among the clustering outcomes. Table 2 summarizes the use of the clustering validity indicators in various literature references.

Table 2. Application of the clustering validity indicators to load pattern classification

indicator   references
MIA         [2][10][12][13][20][21][22][25][32][37][41][42][56]
CDI         [2][12][13][20][21][22][25][32][37][41][42][56]
DBI         [12][13][19][20][22][25][32][39][43]
IAI         [19][36][43]
IEI         [36]
MDI         [25]
SI          [12][13][22][25][32][41][42]
SMI         [12][13][20]
VRC         [12][13][41][42]
WCBCR       [20]

5. POST-CLUSTERING PHASE

The clustering results are used to set up the customer class representative load patterns, called load profiles. Generally the load profiles are expressed in absolute terms, that is, the vertical axis is expressed in power units. This is done to make the interpretation of the load profile simpler for the reader. Alternatively, the load profiles could remain in relative terms, but the reference powers associated with each of them should be clearly defined.

The load profiles are not necessarily given by the centroids resulting from the application of the clustering algorithm. In particular, if the clustering is done by using features different from the time domain data, the relevant outcome of the clustering process is the group formation, as indicated in section 3.2, but the final load profiles are in any case determined by using the load patterns expressed in the time domain. The same recalculation occurs if the centroids are calculated by simple average of the RLPs (without considering their reference power).
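
The recalculation of a class load profile in absolute terms, starting only from the group composition, can be sketched as follows. The rule used here (sum of the members' time-domain patterns, each RLP being multiplied by its own reference power) is an assumption made for illustration; names and data are hypothetical.

```python
def load_profile_kw(rlps, reference_powers):
    """Class load profile in kW: sum of the members' patterns in absolute
    terms, i.e. each RLP multiplied by its own reference power."""
    n_instants = len(rlps[0])
    return [
        sum(p * rlp[h] for rlp, p in zip(rlps, reference_powers))
        for h in range(n_instants)
    ]


def simple_average(rlps):
    """Unweighted centroid of the RLPs, shown only for comparison."""
    return [sum(column) / len(rlps) for column in zip(*rlps)]


rlps = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]   # hypothetical class members
reference_powers = [30.0, 10.0]             # kW

profile = load_profile_kw(rlps, reference_powers)
unweighted = simple_average(rlps)
```

In this example the simple average of the RLPs is flat ([0.5, 0.5, 0.5]), while the profile in power units ([30, 20, 10] kW) reflects the dominance of the larger customer, which is why the profile has to be recalculated when the centroid ignores the reference powers.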
Another important aspect for load profile formation is that the class representative load patterns built on the basis of the clustering results may refer only to the customers subject to on-site measurement, which may be a limited number with respect to the entire customer set. In this sense, further refinements would be necessary to build the load profiles representing the whole population of customers. The final load profiles can be obtained by properly rescaling the class representative load patterns, taking into account not only the reference power, but also other scalar factors introduced for the purpose of reproducing with the load profiles the overall energy consumption of the entire customer set. This determination requires
the availability of further measurements at the substation level, knowing the exact location of all the consumers served by that substation, in order to match the actual consumption pattern by using the load profiles in the best way possible. The evaluations can be carried out by exploiting data fitting techniques.
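One simple data fitting scheme is sketched below as an illustration only. It assumes that the number of customers of each class served by the substation is known, that the class load profiles and the substation measurement share the same time steps, and that a single scalar factor is fitted by least squares; the text does not prescribe this particular formulation, and all names and numbers are hypothetical.

```python
def fit_scale_factor(class_profiles, customers_per_class, substation_load):
    """Least-squares scalar k such that k * aggregate approximates the measurement.

    class_profiles      : one load profile (list of powers) per customer class
    customers_per_class : number of customers of each class under the substation
    substation_load     : measured substation load at the same time steps
    """
    # Aggregate load predicted by the unscaled profiles.
    aggregate = [
        sum(n * profile[t] for profile, n in zip(class_profiles, customers_per_class))
        for t in range(len(substation_load))
    ]
    # Closed-form minimiser of sum_t (measured_t - k * aggregate_t)^2.
    num = sum(m * a for m, a in zip(substation_load, aggregate))
    den = sum(a * a for a in aggregate)
    return num / den, aggregate

# Hypothetical data: two classes, two time steps.
class_profiles = [[1.0, 2.0], [3.0, 1.0]]
customers_per_class = [10, 5]
substation_load = [30.0, 30.0]
k, aggregate = fit_scale_factor(class_profiles, customers_per_class, substation_load)
```

A richer fit would estimate one factor per class, which requires solving a small linear system rather than a single ratio.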
The final load profiles are used by suppliers and authorities to formulate and check the effects of dedicated tariff offers for the actual consumers. In addition, the load profiles provide the basis for making further assessments, for instance to test the revenues that could come to the supplier by modifying the tariff offer, or to estimate the energy not served after an interruption affecting a known portion of the network.
For tariff purposes, the load profiling system has to be readily adaptable to incorporate the effects of changing the number of consumers and their characteristics (for instance, for a contract power variation). In particular, clear solutions have to be set up to incorporate in the load profiling system the presence of new consumers [2].
When a new consumer is added, its attribution to one of the existing consumer classes can be done on the basis of its estimated load pattern, obtained from an initial estimation of its type of application (e.g., based on external features). Then, the attribution of the new consumer to the existing classes can be refined according to measurements to be carried out in the first period of the consumer connection to the supply system, by determining with respect to which centroid the new load pattern has the lowest distance. It can also be noted that, after including new consumers, the load profiles should be periodically updated (e.g., once a year) in such a way that the time integral of the overall load pattern curve referred to a customer class in the various loading conditions matches the total energy consumed by all customers belonging to that customer class. The attribution of new consumers to the existing customer classes can be assisted by the use of classification algorithms such as the one used in [10], based on C5.0 [57]. Classification of the business activities into their most probable clusters has been carried out in [19] by using Probabilistic Neural Networks (PNN) [58].
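The refinement step based on the lowest distance to a centroid can be sketched as below. The Euclidean distance is an assumption made here for illustration (the text does not fix the metric), and the class names and pattern values are hypothetical.

```python
import math

def nearest_class(pattern, centroids):
    """Return the class whose centroid is closest to the measured pattern.

    pattern   : normalised load pattern measured for the new consumer
    centroids : {class name: centroid pattern} from the existing classification
    """
    best_name, best_distance = None, math.inf
    for name, centroid in centroids.items():
        # Euclidean distance is assumed; another metric could be swapped in here.
        distance = math.dist(pattern, centroid)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance

# Hypothetical centroids of two existing classes and one new consumer.
centroids = {"class_A": [0.2, 0.9], "class_B": [0.8, 0.3]}
assigned, distance = nearest_class([0.7, 0.4], centroids)
```

In this example the new pattern is assigned to `class_B`, whose centroid differs from it by 0.1 at each time step.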
6. SUMMARY OF THE CLUSTERING RESULTS
The clustering methods tested by various authors provide useful information on load pattern clustering. In some papers the methods were compared on the basis of the clustering validity indicators. The uniform definition of the clustering validity indicators, according to which lower values correspond to higher validity, makes it possible to observe and rank the methods by testing the results obtained for various numbers of final clusters. The dependence of the indicators on the final number of clusters implies that the methods can be compared only by considering the same number of clusters formed. A ranking of the methods can however be carried out by checking the robustness of the results, namely, the fact that the same method provides the lowest values of the clustering validity indicators for different indicators across different final numbers of clusters.
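A robustness check of this kind can be sketched by counting how often each method gives the lowest indicator value over all (indicator, number of clusters) pairs. The counting rule is an assumption chosen for illustration, and the indicator values below are invented placeholders, not results from the cited papers.

```python
from collections import Counter

def count_wins(results):
    """Rank methods by how often they give the lowest indicator value.

    results maps (indicator name, number of clusters) -> {method: value},
    with lower values meaning higher clustering validity.
    """
    wins = Counter()
    for scores in results.values():
        # Methods are compared only at the same indicator and cluster number.
        wins[min(scores, key=scores.get)] += 1
    return wins.most_common()

# Invented values for two indicators and two cluster numbers.
results = {
    ("MIA", 10): {"k-means": 0.31, "HC average": 0.22, "FDL": 0.24},
    ("MIA", 20): {"k-means": 0.25, "HC average": 0.17, "FDL": 0.18},
    ("CDI", 10): {"k-means": 0.80, "HC average": 0.55, "FDL": 0.52},
    ("CDI", 20): {"k-means": 0.66, "HC average": 0.40, "FDL": 0.43},
}
ranking = count_wins(results)
```

With these placeholder numbers, "HC average" has the lowest value in three of the four comparisons and "FDL" in one, so the first method would be considered the more robust.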
In some cases, the best number of clusters formed is determined by tracking the evolution of specific indices (such as the entropy content [32]), and identifying the presence of maximum conditions for these indices. However, for large data sets the best number of clusters obtained in this way could be excessively high with respect to the needs of the electricity suppliers, according to which the final number of customer classes has to be relatively low.
The convenience of adopting one clustering method rather than another also depends on the nature of the data set. From extended comparisons made on electrical load pattern data sets containing some hundreds of patterns, it emerged that some methods regularly provided more effective results in terms of clustering validity. In particular, among the most classical methods, the hierarchical clustering (HC) method with average linkage criterion has been the one showing remarkably good performance [12][20][21][25]. Comparable performance with respect to HC has been shown by the version of the FDL method introduced in [2], by the IRC method [41] merging the complementary characteristics of HC and FDL, as well as by SVC [42] (especially for low numbers of clusters) and by the Renyi entropy-based method specifically developed in [32] to deal with the load pattern classification problem.
Generally, the most interesting results have been shown by methods exhibiting significant ability to isolate the outliers appearing in the data set. Other methods like k-means exhibit some tendency to create more uniform groups, with lower aptitude to single out uncommon load patterns.
Finally, interesting perspectives have been opened by the use of non-Euclidean metrics, such as the ones adopted in the IRC method and to create the variants of the Renyi entropy-based method. A challenging aspect is the assessment of the potential of non-Euclidean metrics to be used in the clustering procedures to deal with the specific problem of load pattern classification.
7. CONCLUSIONS
Research on electricity load pattern classification has shown that classical clustering methods such as k-means and some variants of the hierarchical clustering do not exhibit the best performance in forming the customer groups by singling out the outliers existing in the data set. Among the clustering algorithms tested, the FDL, SVC and Renyi entropy-based methods emerged as promising options, with results comparable with the hierarchical clustering executed with the average linkage criterion.
The adoption of a clustering method able to isolate the outliers opens the question of how to handle the outliers, especially when their number is non-negligible with respect to the total number of clusters formed. In this respect, the supplier (or the authority) is the decision maker establishing the treatment of the outliers, for
instance forming specific load profiles of individual customer classes, or including them in other customer groups.
One of the directions to be explored for enhancing electrical load pattern classification refers to the formulation and testing of suitable techniques for handling very large amounts of data gathered from many consumers, such as the data made available by the extended adoption of smart metering technologies. Further directions include the incorporation of the effect of possible demand response actions [38] or the application of real-time pricing concepts [59] in the customer class formation, as well as further exploitation of non-linear metrics within the clustering algorithms to make the clustering process more efficient.
8. REFERENCES
[1] Chen CS, Kang MS, Huang CW. Application of Load Survey Systems to Proper Tariff Design, IEEE Trans. on Power Systems, 1997, 12, (4), pp. 1746-1751.
[2] Chicco G, Napoli R, Postolache P, Scutariu M, Toader C. Customer Characterisation Options for Improving the Tariff Offer, IEEE Trans. on Power Systems, 2003, 18, (1), pp. 381-387.
[3] Walker CF, Pokoski JL. Residential load shape modeling based on customer behavior, IEEE Trans. on Power Apparatus and Systems, 1985, PAS-104, (7), pp. 1703-1711.
[4] Capasso A, Grattieri W, Lamedica R, Prudenzi A. A bottom-up approach to residential load modeling, IEEE Trans. Power Systems, 1994, 9, (2), pp. 957-964.
[5] Herman R, Kritzinger JJ. The statistical description of grouped domestic electrical load currents, Electric Power Systems Research, 1993, 27, pp. 43-48.
[6] Heunis SW, Herman R. A probabilistic model for residential consumer loads, IEEE Trans. on Power Systems, 2002, 17, (3), pp. 621-625.
[7] Carpaneto E, Chicco G. Probabilistic characterisation of the aggregated residential load patterns, IET Generation, Transmission and Distribution, 2008, 2, (3), pp. 373-382.
[8] Herman R, Heunis SW. Load models for mixed domestic and fixed, constant power loads for use in probabilistic LV feeder analysis, Electric Power Systems Research, 2003, 66, pp. 149-153.
[9] Chicco G, Napoli R, Scutariu M, Postolache P, Toader C. Electric energy customer characterisation for developing dedicated market strategies, Proc. IEEE Porto Power Tech 2001, Porto, Portugal, 10-13 September 2001, paper POM5-378.
[10] Figueiredo V, Rodrigues F, Vale Z, Gouveia JB. An Electric Energy Consumer Characterization Framework Based on Data Mining Techniques, IEEE Trans. on Power Systems, 2005, 20, (2), pp. 596-602.
[11] Neyman J. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, Journal of the Royal Statistical Society, Part IV, 1934, pp. 558-606.
[12] Chicco G, Napoli R, Piglione F, Scutariu M, Postolache P, Toader C. Emergent Electricity Customer Classification, IEE Proc. Gener. Transm. Distrib., 2005, 152, (2), pp. 164-172.
[13] Chicco G, Napoli R, Piglione F, Scutariu M, Postolache P, Toader C. Application of clustering techniques to load pattern-based electricity customer classification, Proc. 18th CIRED, Torino, Italy, 6-9 June 2005, Session 5, paper No. 467.
[14] Nazarko J, Styczynski ZA. Application of Statistical and Neural Approaches to the Daily Load Profiles Modelling in Power Distribution Systems, Proc. IEEE Transm. and Distrib. Conference, New Orleans, LA, 11-16 April 1999, 1, pp. 320-325.
[15] Chicco G. Challenges for Smart Distribution Systems: Data Representation and Optimization Objectives, Proc. 12th International Conference on optimization of electrical and electronic equipment (OPTIM 2010), Braşov, Romania, 20-22 May 2010.
[16] Chen CS, Hwang JC, Tzeng YM, Huang CW, Cho MY. Determination of customer load characteristic by load survey system at Taipower, IEEE Trans. Power Delivery, 1996, 11, (3), pp. 1430-1436.
[17] Chen CS, Kang MS, Hwang JC, Huang CW. Synthesis of Power System Load Profiles by Class Load Study, Electrical Power and Energy Systems, 2000, 22, (5), pp. 325-330.
[18] Yu IH, Lee JK, Ko JM, Kim SI. A method for classification of electricity demands using load profile data, Proc. Fourth Annual ACIS Intern. Conference on Computer and Information Science, 2005, pp. 164-168.
[19] Gerbec D, Gašperič S, Šmon I, Gubina F. Determining the Load Profiles of Consumers Based on Fuzzy Logic and Probability Neural Networks, IEE Proc. Gener. Transm. Distrib., 2004, 151, (3), pp. 395-400.
[20] Tsekouras GJ, Hatziargyriou ND, Dialynas EN. Two-Stage Pattern Recognition of Load Curves for Classification of Electricity Customers, IEEE Trans. on Power Systems, 2007, 22, (3), pp. 1120-1128.
[21] Ramos S, Vale Z, Santana J, Duarte J. Data Mining Contributions to Characterize MV Consumers and to Improve the Suppliers-Consumers Settlements, Proc. IEEE/PES General Meeting 2007, 24-28 June 2007.
[22] Carpaneto E, Chicco G, Napoli R, Scutariu M. Electricity customer Classification using Frequency-Domain Load Pattern Data, Electrical Power & Energy Systems, 2006, 28, (1), pp. 13-20.
[23] Verdú SV, García MO, Senabre C, Gabaldón Marín A, García Franco FJ. Classification, Filtering, and Identification of Electrical Customer Load Patterns Through the Use of Self-Organizing Maps, IEEE Trans. on Power Systems, 2006, 21, (4), pp. 1672-1682.
[24] Petrescu M, Scutariu M. Load diagram characterisation by means of wavelet packet transform, Proc. 2nd Balkan power conference, Belgrade, Yugoslavia, 19-21 June 2002, pp. 15-19.
[25] Chicco G, Napoli R, Piglione F. Comparisons among Clustering Techniques for Electricity Customer Classification, IEEE Trans. on Power Systems, 2006, 21, (2), pp. 933-940.
[26] Li X, Bowers C, Schnier T. Classification of Energy Consumption in Buildings with Outlier Detection, IEEE Trans. on Industrial Electronics, in press.
[27] Anderberg MR. Cluster Analysis for Applications, 1973, Academic Press, New York.
[28] Everitt BS. Cluster Analysis, 3rd ed., 1993, Edward Arnold and Halsted Press, London, UK.
[29] Jain AK, Murty MN, Flynn PJ. Data Clustering: a Review, ACM Computing Surveys, 1999, 31, (3), pp. 264-323.
[30] Cârţină G, Grigoraş G, Bobric EC. Clustering techniques in fuzzy modeling - Power systems applications (in Romanian), Editură Venus, Iaşi, Romania, 2005.
[31] Ward JH. Hierarchical grouping to optimise an objective function, J. Am. Stat. Assoc., 1963, 58, pp. 236-244.
[32] Chicco G, Sumaili Akilimali J. Renyi entropy-based classification of daily electrical load patterns, IET Generation Transmission and Distribution, 2010, 4, (6), pp. 736-745.
[33] Gockay E, Principe JC. Information Theoretic Clustering, IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002, 24, (2), pp. 158-170.
[34] Renyi A. On Measures of Entropy and Information, Proc. Fourth Berkeley Symp. Math., Statistics and Probability, 1960, pp. 547-561.
[35] Jenssen R, Hild II KE, Erdogmus D, Principe JC, Eltoft T. Clustering using Renyi's Entropy, IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002, 24, (2), pp. 158-171.
[36] Marques DZ, de Almeida KA, de Deus AM, da Silva Paulo ARG, da Silva Lima W. A comparative analysis of neural and fuzzy cluster techniques applied to the characterization of electric load in substations, Proc. IEEE/PES Transmission and Distribution Conference and Exposition: Latin America, 8-11 Nov. 2004, pp. 908-913.
[37] Chicco G, Napoli R, Piglione F, Scutariu M, Postolache P, Toader C. Load Pattern-Based Classification of Electricity Customers, IEEE Trans. on Power Systems, 2004, 19, (2), pp. 1232-1239.
[38] Valero S, Ortiz M, Senabre C, Alvarez C, Franco FJG, Gabaldón A. Methods for customer and demand response policies selection in new electricity markets, IET Generation, Transmission & Distribution, 2007, 1, (1), pp. 104-110.
[39] Lamedica R, Fracassi G, Martinelli G, Prudenzi A, Santolamazza L. A Novel Methodology Based on Clustering Techniques for Automatic Processing of MV Feeder Daily Load Patterns, Proc. IEEE PES Summer Meeting 2000, Seattle, WA, 16-20 July 2000, 1, pp. 96-101.
[40] Nazarko J, Jurczuk A, Zalewski W. ARIMA models in load modelling with clustering approach, Proc. IEEE Power Tech 2005, St. Petersburg, Russia, 27-30 June 2005.
[41] Batrinu F, Chicco G, Napoli R, Piglione F, Scutariu M, Postolache P, Toader C. Efficient Iterative Refinement clustering for electricity customer classification, Proc. IEEE Power Tech 2005, St. Petersburg, Russia, 27-30 June 2005, paper no. 139.
[42] Chicco G, Ilie IS. Support Vector Clustering of Electrical Load Pattern Data, IEEE Trans. on Power Systems, 2009, 24, (3), pp. 1619-1628.
[43] Gerbec D, Gašperič S, Šmon I, Gubina F. Allocation of the load profiles to consumers using probabilistic neural networks, IEEE Trans. on Power Systems, 2005, 20, (2), pp. 548-555.
[44] Bezdek JC, Harris JD. Fuzzy partitions and relations; an axiomatic basis for clustering, Fuzzy Sets and Systems, 1978, 1, pp. 111-127.
[45] Zakaria Z, Lo KL, Sohod MH. Application of Fuzzy Clustering to Determine Electricity Consumers' Load Profiles, Proc. IEEE Power and Energy Conference PECon '06, November 2006, pp. 99-103.
[46] Zadeh L. Similarity relations and fuzzy orderings, Information Sciences, 1971, 3, pp. 177-200.
[47] Özveren CS, Vechakanjana C, Birch AP. Fuzzy classification of electrical load demand profiles - A case study, Proc. IEE Power System Management and Control, 17-19 April 2002, pp. 353-358.
[48] Simpson PK. Fuzzy Min-Max neural networks - Part 2: Clustering, IEEE Trans. on Fuzzy Systems, 1993, 1, (1), pp. 324.
[49] Pao YH, Sobajic DJ. Combined use of unsupervised and supervised learning for dynamic security assessment, IEEE Trans. on Power Systems, 1992, 7, pp. 878-884.
[50] Kohonen T. Self-Organizing Maps, 1995, Springer Series in Information Science, 30, Springer-Verlag, Berlin, Germany.
[51] Ben Hur A, Horn D, Siegelmann HT, Vapnik V. Support vector clustering, J. Mach. Learn. Res., 2001, 2, pp. 125-137.
[52] Pitt BD, Kirschen DS. Application of Data Mining Techniques to Load Profiling, Proc. IEEE PICA'99, Santa Clara, CA, 16-21 May 1999, pp. 131-136.
[53] Calinski RB, Harabasz J. A dendrite method for cluster analysis, Commun. Stat., 1974, 3, pp. 1-27.
[54] Davies DL, Bouldin DW. A Cluster Separation Measure, IEEE Trans. on Pattern Analysis and Machine Intelligence, 1979, PAMI-1, (2), pp. 224-227.
[55] Dunn JC. Well separated clusters and optimal fuzzy partitions, Journal of Cybernetics, 1974, 4, pp. 95-204.
[56] Zalewski W. Application of fuzzy inference to electric load clustering, Proc. IEEE Power India Conference, 10-12 April 2006.
[57] Quinlan R. The Book C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[58] Specht DF. Probabilistic neural networks, Neural Networks, 1990, 3, (1), pp. 109-118.
[59] Schweppe FC, Caramanis MC, Tabors RD, Bohn RE. Spot pricing of electricity, Kluwer Academic Publishers, Boston 1988.