Quality Threshold Clustering Algorithms for Event Detection in an
Industrial Environment

Kevin Nelson

Abstract
Utilizing multi-point observations from multiple sensors to predict future events in a
complex system in real time is a common, yet difficult, challenge that presents itself in both
military and industrial settings. The first step to making headway towards a solution to such a
problem is to analyze and better understand the data that is available. Next, an appropriate
predictive model must be selected and developed. Depending on the model selected, the nature
and extent of the relationships among variables must be understood using correlation analysis
tools to determine which variables most strongly correspond to events of interest. In the
military, finding a relevant dataset containing ground truth that is unclassified proves to be
difficult, so, for the purposes of this research, data from an industrial environment was utilized.
The data came from an environment that operated 24 hours a day, so times when the system was
down were used as anomalous events to be detected/predicted. Two different thresholding
models were developed, implemented, and analyzed: the Quality Threshold K-Means Model
(QTKM) and the Quality Threshold DBSCAN Model (QTDM). Thresholding models allow an
operator to adjust the sensitivity of the model based on an allowance for false alarms and the
necessity for early detection of specific events.
1 Introduction
The research discussed in this paper is both an extension of and an improvement to the research
outlined by Dr. Misty Blowers in her Ph.D. thesis entitled “Analysis of Machine Learning
Models and Prediction Tools for Paper Machine Systems.” Much of her work is described here
as it is prerequisite to understanding the ideas introduced in this paper. Her work, like the
work described in this paper, focuses on predicting faults or down times in a paper recycling mill
through analysis of the data sets described in Section 2. In her thesis, Dr. Blowers introduces the
Quality Threshold K-Means Model (QTKM), a unique thresholding model built around the
K-Means clustering algorithm. We discuss this algorithm and the improvements that were made to
Chad Salisbury’s initial implementation of it as part of this research. A new extension of the
algorithm, called the Quality Threshold DBSCAN Model (QTDM), which is built around the
DBSCAN clustering algorithm, was also developed as part of this project. These algorithms are
discussed in detail in Section 3, along with the contributions made by this work. Section 4
concludes by summarizing the results of this research.
2 Experimental Environment
2.1 Data Collection
The data used for this research came from a paper recycling mill. This particular mill was
equipped with a plant information system which gave a “snapshot” of the readings of all the
sensors at any given time. For this research, data was collected in five-minute intervals, and one
full year’s worth of data was used. At each time stamp, the data forms a vector of sensor
readings. A ground truth value was then added to each vector corresponding to the state of the
system at the time the data was collected. The vector was labeled with a 1 if it represented a
state where the system was operating properly. The state that occurred 5 time steps (25 minutes)
prior to a break in the system was labeled with a 0. All vectors within 5 time steps of the break
were eliminated from the data set, both because sensors often became unreliable during this time
and because the goal of this research is to predict, not detect, failures. Also, based on
information provided by experts from the mill, certain sensors were determined to contain too
much uncertainty, and their readings were eliminated from the data set. The processed data set
contained 1,441 records belonging to class 0 and 71,362 records belonging to class 1. Also
included with the data set was an operator log. Nearly every time the system was down, an
operator at the mill logged the down time, the event that led to the system being down, and the
cause of the event. The aforementioned data sets were acquired and processed by Dr. Misty
Blowers during her Ph.D. research through collaboration with experts at the paper mill.
2.2 Importance of Research
Preventing down times is of obvious importance in a mill that operates constantly, 24 hours a
day. More down time means less production, and less production means diminished income.
Predicting a system failure before it actually occurs, and providing the operator with potential
causes of the failure, gives the operator enough time to take the necessary steps to
resolve the issue and avoid a down time. This is a complicated challenge, however, as there are
a variety of factors that may lead to a break in a paper recycling mill. For this reason, many
features need to be considered simultaneously. Existing prediction algorithms are limited
because they focus on linear relationships among the variables. However, in a paper mill which
runs several different grades of paper, a single linear correlation may not exist. Rather, subsets
of correlations must be considered.
2.3 Data Preprocessing
The first step of our predictive model is to determine, through correlation analysis, which
features of the data set (sensors) most strongly correspond to system failures. Only those
variables selected in this step will be considered in subsequent steps. The Tukey test (referred
to here as the t-test, and not to be confused with Student's t-test) was the primary statistical
analysis technique used in this research. The Tukey test is a simple statistical test which
essentially measures the amount of overlap in the data between the two classes (system down and
system running) for a given sensor. The t-test operates as follows:
1. Locate the largest and smallest values of class 0.
2. Locate the largest and smallest values of class 1.
3. Consider the class containing the smallest overall value. Count the number of values
   belonging to that class which are less than the smallest value of the other class.
4. Consider the class containing the largest overall value. Count the number of values
   belonging to that class which are greater than the largest value of the other class.
5. The sum of the values found in steps 3 and 4 is the resulting t-score.
Sensors with a high t-score, meaning little overlap between the two classes, are more likely to be
considered. Also taken into consideration is the mean difference between the two classes. The
mean difference is obtained by finding the average of the values belonging to class 0 and the
average of the values belonging to class 1 and calculating the difference between these two
values. An important contribution of this research is the automation of this data preprocessing
step, allowing the operator to easily analyze and select those fields of most importance without
editing the data set itself.
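
The scoring steps and the mean difference described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this research: the function names, the sample readings, the handling of ties at the extremes (they contribute 0), and the sign convention of the mean difference (class 0 minus class 1) are all assumptions.

```python
def tukey_score(class0, class1):
    """Count the non-overlapping values at both ends of the two samples."""
    lo0, hi0 = min(class0), max(class0)  # step 1
    lo1, hi1 = min(class1), max(class1)  # step 2

    # Step 3: in the class holding the smallest overall value, count the
    # values below the other class's minimum (a tie contributes 0: assumption).
    if lo0 < lo1:
        low_count = sum(1 for v in class0 if v < lo1)
    elif lo1 < lo0:
        low_count = sum(1 for v in class1 if v < lo0)
    else:
        low_count = 0

    # Step 4: in the class holding the largest overall value, count the
    # values above the other class's maximum.
    if hi0 > hi1:
        high_count = sum(1 for v in class0 if v > hi1)
    elif hi1 > hi0:
        high_count = sum(1 for v in class1 if v > hi0)
    else:
        high_count = 0

    return low_count + high_count  # step 5


def mean_difference(class0, class1):
    """Mean of class 0 minus mean of class 1 (sign convention is an assumption)."""
    return sum(class0) / len(class0) - sum(class1) / len(class1)


# Hypothetical readings of one sensor, split by ground truth label.
down = [1.0, 2.0, 3.0, 6.0]      # class 0
running = [4.0, 5.0, 7.0, 8.0]   # class 1
score = tukey_score(down, running)
diff = mean_difference(down, running)
```

In this toy example three class 0 values lie below every class 1 value and two class 1 values lie above every class 0 value, so the sensor receives a score of 5.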
3 Algorithms Studied
3.1 K-Means Clustering
The first model studied and developed was the Quality Threshold K-Means Model, which builds
upon the K-Means clustering algorithm. K-Means clustering is an algorithm which attempts to
partition vectors into a set number, k, of similar groups. The algorithm operates as follows:
1. Place k points randomly into the space represented by the vectors to be clustered. These
   points represent cluster centroids.
2. Assign each vector to the centroid which it is closest to.
3. Reposition each centroid so that it is in the center of the vectors assigned to it.
4. Repeat steps 2 and 3 until there is no change in the centroids during step 3 or until a set
   number of iterations is reached.
Generally, distances between vectors are measured using the Euclidean distance measurement.
Given two vectors, $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, the Euclidean
distance between them is calculated as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
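
The four steps and the distance measure above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `euclidean` and `k_means`, the `max_iter` and `seed` parameters, and the sample vectors are assumptions. Note also that step 1 is simplified here by drawing the initial centroids from the data vectors themselves rather than from arbitrary random points in the space.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def assign(vectors, centroids):
    """Step 2: index of the closest centroid for every vector."""
    return [min(range(len(centroids)), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors]


def k_means(vectors, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1 (assumption): use k distinct data vectors as starting centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(max_iter):
        assignment = assign(vectors, centroids)
        # Step 3: move each centroid to the mean of its assigned vectors.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        # Step 4: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(vectors, centroids)


# Hypothetical two-dimensional vectors forming two obvious groups.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(vectors, k=2)
```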
3.2 K-Evaluator
One of the biggest criticisms against the K-Means clustering algorithm is that the number of
groupings, k, must be known and predefined. Using a value of k that is too small may lead to
groups that contain unrelated or non-similar data, and using a value that is too large leads to
groups competing over data that is highly related. We attempt to solve this problem by using a
cluster evaluator, or k-evaluator. The k-evaluator works by iteratively clustering the data using
different values of k and calculating the inter-cluster to intra-cluster ratio. It then selects the
value of k which maximizes this ratio. The intra-cluster distance is calculated by finding the
distance of every point to its respective cluster centroid, and then averaging these values. The
inter-cluster distance is calculated by finding the distance each cluster centroid is from each of
the other cluster centroids, and then averaging these values. During this research, a K-Evaluator
was incorporated into the implementation of the QTKM in an attempt to quickly improve results
with little effort by the operator.
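
As a rough sketch of the ratio described above, the following assumes the data has already been clustered once for each candidate value of k, and that each clustering is available as a pair of centroids and per-vector cluster indices. The function names and the toy clusterings are hypothetical, and at least two clusters are required so that an inter-cluster distance exists.

```python
import math
from itertools import combinations


def inter_intra_ratio(vectors, centroids, assignment):
    """Average centroid-to-centroid distance divided by average point-to-centroid distance."""
    intra = sum(math.dist(v, centroids[a]) for v, a in zip(vectors, assignment)) / len(vectors)
    pairs = list(combinations(centroids, 2))  # needs k >= 2
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return inter / intra if intra > 0 else math.inf


def select_k(vectors, clusterings):
    """clusterings maps each candidate k to its (centroids, assignment) pair."""
    return max(clusterings, key=lambda k: inter_intra_ratio(vectors, *clusterings[k]))


# Hypothetical data: two vertical strips of three points each.
vectors = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1), (10, 2)]
clusterings = {
    2: ([(0, 1), (10, 1)], [0, 0, 0, 1, 1, 1]),
    3: ([(0, 0.5), (0, 2), (10, 1)], [0, 0, 1, 2, 2, 2]),
}
ratio_k2 = inter_intra_ratio(vectors, *clusterings[2])
best_k = select_k(vectors, clusterings)
```

Splitting the left strip into two clusters lowers the intra-cluster distance, but it lowers the average inter-cluster distance by more, so k = 2 yields the larger ratio for this data.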
3.3 Quality Threshold K-Means Model
The paper mill data used for this research contained a significant amount of overlap between the
two classes (running system and down system). For this reason, simply implementing the
K-Means clustering algorithm would be ineffective, as membership in a cluster alone would not be
enough information to classify a vector as class 0 or class 1. During her Ph.D. research with this
data, Dr. Misty Blowers developed the Quality Threshold K-Means Model (QTKM), and it was
initially implemented by Chad Salisbury. With this model, all vectors representing times when
the system was down (class 0) are first clustered using the K-Means clustering algorithm. Then,
each cluster is assigned a threshold value. Each threshold value represents a radial distance from
the cluster’s centroid. An unlabeled vector of sensor readings is classified by first assigning it to
the nearest cluster centroid, and then determining whether or not it falls within that cluster’s
threshold distance. If it falls within the threshold, it is considered to represent a system that is
about to go down (class 0), and if it falls outside the threshold, it is considered to represent a
healthy system (class 1).
Generally, since the clusters were created based on class 0 data only, they are densest with
class 0 data towards the centroid. Therefore, vectors that fall closer to the centroid are more
likely to belong to class 0. With this in mind, the operator may adjust the thresholds based on
their desired confidence that an unlabeled vector classified by the algorithm as class 0 does in
fact mean that the system is about to go down. In other words, adjusting the thresholds allows the
operator to set a tolerance for false alarms and missed detections. Each threshold is set by
choosing a percentage between 0 and 100; the threshold distance is then the minimum distance from
the centroid such that the selected percentage of the cluster’s class 0 training vectors falls
within it.
This approach is a combination of supervised and unsupervised learning known as semi-supervised
learning. It is unsupervised because clusters of similar patterns are permitted to emerge within
the class 0 vectors, while also being supervised because the vectors are labeled as representing a
running or down system.
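
A minimal sketch of the thresholding and classification rule described above is given below. It assumes the class 0 vectors have already been clustered, and it interprets each percentage as applying to the training vectors of its own cluster. The function names `threshold_radius` and `qtkm_classify` and the sample values are hypothetical.

```python
import math


def threshold_radius(members, centroid, percent):
    """Smallest radius around `centroid` containing `percent` % of the cluster's vectors."""
    distances = sorted(math.dist(m, centroid) for m in members)
    needed = math.ceil(percent * len(distances) / 100)
    return distances[needed - 1] if needed > 0 else 0.0


def qtkm_classify(vector, centroids, radii):
    """Return 0 (about to go down) or 1 (healthy) for an unlabeled vector."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(vector, centroids[c]))
    return 0 if math.dist(vector, centroids[nearest]) <= radii[nearest] else 1


# Hypothetical cluster of class 0 training vectors around the origin.
centroid = (0.0, 0.0)
members = [(1, 0), (2, 0), (3, 0), (4, 0)]
radius_50 = threshold_radius(members, centroid, 50)    # half of the members lie within 2.0
radius_100 = threshold_radius(members, centroid, 100)  # all members lie within 4.0

centroids = [centroid, (100.0, 100.0)]
radii = [radius_50, 1.0]
inside = qtkm_classify((1.5, 0.0), centroids, radii)
outside = qtkm_classify((2.5, 0.0), centroids, radii)
```

Raising a cluster's percentage widens its radius, which catches more true down states at the cost of more false alarms.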
3.4 QTKM Threshold Optimization
Oftentimes, the operator may not want to spend time manually adjusting the thresholds of each
cluster. It would be beneficial if the operator could instead select a minimum detection rate and
let the thresholds adjust themselves accordingly. One of the contributions of this research is a
greedy algorithm which attempts to do just that. In order to understand this algorithm, it is
important to first understand the detection rate and false alarm rate.
After the clusters have been trained, a validation period is selected, generally beginning where
the training set ended and spanning the same amount of time. For example, if the clusters were
created based on mill data from January, then they would be validated using data from February.
During validation, each record from the selected validation set is assigned to its nearest cluster
centroid and compared against that cluster’s threshold. If it falls within the threshold, then it
is classified as class 0, and if it falls outside the threshold, it is classified as class 1. The
detection rate is the percentage of vectors from the validation set labeled as class 0 that were
correctly classified as class 0 during validation. The false alarm rate is the percentage of
vectors from the validation set labeled as class 1 that were incorrectly classified as class 0
during validation.
The threshold optimization algorithm attempts to ensure a selected minimum detection rate is
met while simultaneously minimizing the false alarm rate. This is done by first testing threshold
values of 0 through 100 for each cluster. For each cluster, the threshold value that results in the
largest detection rate to false alarm rate ratio for that cluster is selected. Then, if the overall
minimum detection rate is not yet met, it is determined for which cluster a threshold increase of 5
would result in the largest value of change in detection rate minus change in false alarm rate.
The corresponding threshold is increased by 5, and this process continues until the minimum
detection rate is met, or until all thresholds are at 100. An alternative to this approach is to
skip the first step, starting all thresholds at 0, and then iteratively increase the proper
threshold by 5 until the desired detection rate is met.
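
The two rates and the alternative form of the greedy search (all thresholds starting at 0) can be sketched as follows. This is an illustration under stated assumptions, not the implementation from this research: `evaluate` stands for any callable that runs validation for a given list of thresholds and returns the detection rate and false alarm rate as percentages, and `toy_evaluate` is an invented stand-in for it.

```python
def rates(labels, predictions):
    """Detection rate and false alarm rate (in percent) from true labels and predictions."""
    n0 = sum(1 for y in labels if y == 0)
    n1 = len(labels) - n0
    detected = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    false_alarms = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return (100 * detected / n0 if n0 else 0.0,
            100 * false_alarms / n1 if n1 else 0.0)


def greedy_thresholds(evaluate, n_clusters, min_detection, step=5):
    """Raise one threshold at a time, always picking the best gain, until the target is met."""
    thresholds = [0] * n_clusters
    det, fa = evaluate(thresholds)
    while det < min_detection and any(t < 100 for t in thresholds):
        best = None
        for c in range(n_clusters):
            if thresholds[c] >= 100:
                continue
            trial = thresholds[:]
            trial[c] = min(100, trial[c] + step)
            d, f = evaluate(trial)
            gain = (d - det) - (f - fa)  # change in detection minus change in false alarms
            if best is None or gain > best[0]:
                best = (gain, c, d, f)
        _, c, det, fa = best
        thresholds[c] = min(100, thresholds[c] + step)
    return thresholds, det, fa


def toy_evaluate(thresholds):
    """Invented stand-in for validation: cluster 0 is 'cheap', cluster 1 is 'expensive'."""
    t0, t1 = thresholds
    return (60 * t0 + 40 * t1) / 100, (10 * t0 + 30 * t1) / 100


example_rates = rates([0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 1, 1])
result = greedy_thresholds(toy_evaluate, n_clusters=2, min_detection=70)
```

With the toy evaluator, the search first exhausts the cluster whose threshold buys detections cheaply, then raises the other cluster only as far as needed to reach the requested detection rate.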
3.5 DBSCAN Clustering
K-Means clustering is most effective when the data naturally forms circular shaped clusters.
However, examination of the paper mill data revealed that this is often not the case. The vectors
representing down times in the system tended to form thin strips. For this reason, it was deemed
necessary to use an algorithm which allowed for arbitrary shapes and relied more on density, and
the DBSCAN algorithm does just that. DBSCAN stands for density-based spatial clustering of
applications with noise. It focuses on the notion of density reachability. A point q is directly
density reachable from a point p if q is within a set distance, ε (epsilon), from p, and there are
also at least a set number of points, called “minPts”, within a distance of ε from p. Two points,
p and q, are density reachable if there is a series of points from p to q that are all directly
density reachable from the previous point in the series. All points within a cluster formed by
DBSCAN are density reachable, and any point that is density reachable from any point in the
cluster belongs to the cluster. The basic DBSCAN algorithm operates as follows:
1. Choose an unvisited point P from the data set and mark it as visited. Finish if there are
   none.
2. Find all the neighbors of point P (points within a distance of ε).
3. If the number of neighbors is less than minPts, mark P as noise and return to step 1.
4. Create a new cluster and add P to the cluster.
5. Repeat steps 6-9 for each point P’ of the neighbors of P.
6. If P’ is not visited, mark it as visited. Otherwise skip to step 9.
7. Find the neighbors of P’.
8. If the number of neighbors of P’ is not less than minPts, join the neighbors of P’ with the
   neighbors of P.
9. If P’ does not belong to a cluster, add it to the cluster of P.
10. Return to step 1.
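
The steps above can be sketched compactly in Python. This sketch makes a few assumptions that are not stated in the text: a point counts as one of its own neighbors, the growing neighbor list of steps 5-8 is kept as a work queue, and noise is marked with the label -1. The names `region_query` and `dbscan` and the sample points are illustrative only.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be absorbed by a cluster as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # visited earlier, but not yet in any cluster
            if labels[j] is not None:
                continue  # already visited: do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # join the two neighbor lists
    return labels


# Hypothetical points: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```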
3.6 Epsilon Evaluator
Much like the K-Means clustering algorithm, DBSCAN has the issue of choosing the proper
inputs to the algorithm. DBSCAN takes two inputs, epsilon and minPts. Examining the mill
data reveals that groupings of similar class 0 vectors generally contain at least three points, so
minPts is usually set to 3. In an attempt to find the optimal value of epsilon, an epsilon evaluator
was implemented as part of this research. The evaluator relies on the use of a k-distance graph.
To generate this graph, the distance to the kth nearest point is determined for each point that is to
be clustered. Generally, minPts is used as the value for k. This list of distances is then sorted
from smallest to largest. Plotting this list generally results in a gradually upward sloping line
which suddenly increases greatly in slope towards the end. Choosing epsilon to be the value of
the distance just before the large increase in slope tends to give the best clustering. Noise points
whose k-nearest distance is large will not be clustered, and core points with a small k-nearest
distance are guaranteed to be clustered with nearby points.
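
A k-distance list of the kind described above can be computed as in the sketch below. The text does not specify how the point "just before the large increase in slope" is located, so the rule used here (take the value just before the single largest jump between consecutive sorted distances) is an assumed heuristic, as are the function names and the choice to exclude a point from its own neighbor list.

```python
import math


def k_distances(points, k):
    """Sorted list of each point's distance to its kth nearest other point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def pick_epsilon(sorted_kdist):
    """Assumed heuristic: the value just before the largest jump in the sorted list."""
    jumps = [b - a for a, b in zip(sorted_kdist, sorted_kdist[1:])]
    knee = max(range(len(jumps)), key=jumps.__getitem__)
    return sorted_kdist[knee]


# Hypothetical points: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
kdist = k_distances(points, k=2)
eps = pick_epsilon(kdist)
```

For these points the isolated point produces one very large k-distance at the end of the sorted list, and the selected epsilon is the largest distance found among the densely grouped points.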
3.7 Quality Threshold DBSCAN Model
The Quality Threshold DBSCAN Model (QTDM), which was developed and implemented
during this research, incorporates the concept of thresholding into the DBSCAN clustering
algorithm. First, the DBSCAN clustering algorithm is run on only the vectors representing a
down system. Then, similar to the QTKM, each cluster is given a threshold value. Since the
clusters are arbitrarily shaped, they do not contain a centroid, so the threshold value applies to
each point in the cluster. An unlabeled vector of sensor readings is classified by testing it against
each point of each cluster, and for each point determining if it is within the point’s respective
cluster’s threshold distance from the point. If the vector falls within any of the thresholds, it is
considered to represent a system that is about to go down, and if it falls outside of all the
thresholds, then it is considered to represent a healthy system. Similar to the QTKM, adjusting
the thresholds allows the operator to set a tolerance for false alarms and missed detections. Each
threshold is selected by choosing a value between 0 and 100 which corresponds to the selected
percentage of 4 times epsilon (4ε). Also, much like the QTKM, this model is a semi-supervised
system.
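
The classification rule described above can be sketched as follows, assuming the class 0 vectors have already been grouped by DBSCAN. The function name `qtdm_classify`, the representation of clusters as lists of points, and the sample values are hypothetical.

```python
import math


def qtdm_classify(vector, clusters, thresholds, eps):
    """Return 0 if `vector` lies within the threshold distance of any clustered point, else 1.

    clusters   -- one list of class 0 points per DBSCAN cluster
    thresholds -- one percentage (0-100) per cluster, applied to 4 * eps
    """
    for members, percent in zip(clusters, thresholds):
        radius = percent / 100 * 4 * eps
        if any(math.dist(vector, m) <= radius for m in members):
            return 0  # about to go down
    return 1  # healthy


# Hypothetical clusters: a thin horizontal strip and a small vertical pair.
clusters = [[(0, 0), (1, 0), (2, 0)], [(10, 10), (10, 11)]]
eps = 1.0
thresholds = [25, 50]  # radii of 1.0 and 2.0 around every point of each cluster

near_strip = qtdm_classify((2.5, 0), clusters, thresholds, eps)
far_away = qtdm_classify((5, 5), clusters, thresholds, eps)
near_pair = qtdm_classify((10, 12.5), clusters, thresholds, eps)
```

Because the radius is applied around every point rather than around a centroid, the region that triggers an alarm follows the shape of the cluster.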
3.8 QTDM Threshold Optimization
Again, an algorithm is needed that can adjust the thresholds to meet a selected detection rate
while also minimizing the false alarm rate. However, unlike with QTKM threshold optimization,
a greedy algorithm cannot be used because the adjustment of one threshold may affect the
detection and false alarm rates of another cluster. This is because, during validation, each
validation point is assigned to the cluster that contains the largest number of points whose
threshold the validation point falls within. If it does not fall within any threshold, then it is
assigned to the cluster containing the point nearest to the validation point. As we can see,
adjusting threshold values may change the assignment of points to clusters. Therefore, an
algorithm which takes all the threshold values into account simultaneously is necessary. A
genetic algorithm was developed during this research for this purpose.
A genetic algorithm is an optimization technique which attempts to mimic natural evolution. In
such an algorithm, first, an initial population of potential solutions is developed. Then, those
solutions with the highest fitness value are selected to “breed” to pass down their favorable
characteristics to the next generation. Random mutations are also introduced with each
generation. After a set number of generations is produced, the potential solution with the highest
fitness value is selected. For our purposes, each potential solution is a set of threshold values,
one value for each cluster. The initial population is generated randomly. The fitness value for
each set of thresholds is the resulting detection rate to false alarm rate ratio, weighted by the
distance from the desired detection rate. Parents are selected for breeding probabilistically based
on their fitness value. Breeding is done through two-point crossover (swapping a section of
thresholds between two parents), and mutation is introduced at a set probability.
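
The overall shape of such a genetic algorithm is sketched below. The exact fitness weighting used in this research is not given in the text, so the fitness is passed in as a function and `toy_fitness` is purely an invented placeholder; the population size, number of generations, mutation probability, and all function names are likewise assumptions. Fitness values must be positive for the fitness-proportional selection used here.

```python
import random


def two_point_crossover(a, b, rng):
    """Swap the section between two random cut points of parents a and b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def evolve(fitness, n_clusters, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Each individual is one threshold (0-100) per cluster; the start is random.
    population = [[rng.randint(0, 100) for _ in range(n_clusters)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        next_gen = []
        while len(next_gen) < pop_size:
            # Parents are drawn with probability proportional to their fitness.
            mom, dad = rng.choices(population, weights=weights, k=2)
            for child in two_point_crossover(mom, dad, rng):
                # Each threshold mutates to a random value with a set probability.
                child = [rng.randint(0, 100) if rng.random() < mutation_rate else t
                         for t in child]
                next_gen.append(child)
        population = next_gen[:pop_size]
    return max(population, key=fitness)


def toy_fitness(thresholds):
    """Invented placeholder: rewards thresholds close to 50 and is always positive."""
    return 1 + sum(100 - abs(t - 50) for t in thresholds)


best = evolve(toy_fitness, n_clusters=4)
```

In a real setting, the placeholder would be replaced by a function that runs validation with the candidate thresholds and scores the resulting detection and false alarm rates.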
3.9 Benefits of Thresholding Algorithms
As already stated, these models were developed for a data set which contained a large amount of
overlap between the two classes. Traditional clustering algorithms would prove to be ineffective
on such data. Thresholding allows an operator to adjust the sensitivity of the model based on the
tolerance for false alarms and the desired detection rate.
An added benefit of these algorithms comes through use of the operator log, which tracks the
cause of each system down time. This not only allows the system to output potential causes of a
predicted failure based on what has been seen in the past, but it also allows the operator to set a
sensitivity to specific events. This is because during training, each cluster stores the operator log
entry that corresponds to each point assigned to the cluster. Therefore, when an unlabeled vector
is assigned to a cluster, the system can report what sort of failures the patterns in that cluster
have been associated with in the past and with what frequency. Also, when an operator is
adjusting thresholds, they may increase or decrease a threshold based on the events associated with
the corresponding cluster and the importance of detecting such an event. During our research,
the ability to more easily display per-cluster operator log information was added to the original
implementation of the thresholding models, giving the operator more ability to make informed
decisions.
4 Conclusion
During the research presented in this paper, the Quality Threshold K-Means Model, as
introduced in Misty Blowers’ thesis entitled “Analysis of Machine Learning Models and
Prediction Tools for Paper Machine Systems,” was studied. Chad Salisbury’s initial
implementation was improved upon so that an operator can predict down times in a paper
recycling mill more quickly and more accurately. A simple data preprocessing step was
added to allow the operator to select the most useful fields, a K-Evaluator was incorporated to
improve clustering results, an algorithm was developed to automatically optimize threshold
values, and simpler interfaces were created to do all of the above. A second thresholding model,
the Quality Threshold DBSCAN Model, was developed and implemented after examination of
the data set. This model makes use of the DBSCAN clustering algorithm and also includes an
epsilon evaluator for selecting optimal inputs for the algorithm, and a genetic algorithm for
determining optimal threshold values.