Self-Detection of Abnormal Event Sequences

February 26-27, 2009

Farokh B. Bastani, UT-Dallas, bastani@utdallas.edu

I-Ling Yen, UT-Dallas, ilyen@utdallas.edu

Latifur Khan, UT-Dallas, lkhan@utdallas.edu

Net-Centric Software & Systems Consortium
Kick-off Meeting

Problem Description


- There are numerous types of event-based workflows in net-centric systems
  - E.g., call control signal processing, network accesses, access to resources, access to data, etc.
- Need for abnormal behavior detection
  - Event-based workflows may incur software & system faults, operational errors, attacks, fraud, illegitimate manipulations, resulting in abnormal behaviors
  - If the abnormal behavior can be detected, proactive techniques can be used to mitigate the problem

Existing Solutions


- Many data mining and machine learning algorithms can be used to classify normal and abnormal events
  - Bayesian networks, neural networks, decision trees, k-means, support vector machines (SVM), hidden Markov models, etc.
- Problem: Which method to use?
  - Data set dependent
    - Must explore the best approach for each dataset
  - Feature extraction from raw data can have a significant impact on the prediction quality
    - Must explore various feature extraction models
- Problem: How to mine event sequences?
  - Automata-based approach: Take known event sequences, cluster them, and determine the abnormal ones (no well-established clustering techniques)
  - Episode-based approaches: Need to mine the event sequences first, and then cluster them and determine the abnormal ones (well-established episode mining techniques exist, but there is not much research on clustering)
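
The episode-based approach above starts by mining frequent event patterns from a raw event stream. A minimal sketch of that first step follows, assuming a simple sliding-window count of ordered event pairs (a simplified form of window-based serial episode mining). The function name, the window size, the support threshold, and the sample events are illustrative assumptions, not part of the original slides.

```python
from collections import Counter
from itertools import combinations


def frequent_serial_episodes(events, window=3, min_support=2):
    """Return ordered event pairs (a, b) with a before b in at least
    `min_support` sliding windows of `window` consecutive events."""
    counts = Counter()
    for start in range(len(events) - window + 1):
        win = events[start:start + window]
        # A set counts each pair at most once per window.
        pairs = {(win[i], win[j]) for i, j in combinations(range(len(win)), 2)}
        counts.update(pairs)
    return {episode: n for episode, n in counts.items() if n >= min_support}


# Hypothetical event stream: two identical sessions back to back.
stream = ["login", "read", "write", "logout"] * 2
print(frequent_serial_episodes(stream, window=3, min_support=3))
```

With this stream, only the pairs that recur in at least three windows survive; rare orderings such as ("write", "login") are dropped and could later be examined as candidates for abnormal behavior.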

Our Solution


- Multivariate automata and episode mining
  - Unknown event sequence: Use episode mining
  - Automata merging for known or mined event sequences
    - Multiple variables result in a huge state space
    - Use dominance parameters and weights to merge states
    - Develop techniques to merge automata efficiently (hashing, clustering)
- Identify abnormal event sequences
  - Use clustering techniques to identify outliers
    - Need effective clustering techniques
    - Need to handle event sequences with different lengths
    - Need to integrate inter-event parameters in the clustering process
  - Manual help to identify actual faulty event sequences offline
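
One way to compare event sequences of different lengths, as the list above requires, is a length-normalized edit distance. The sketch below scores each sequence by its mean distance to its k nearest neighbors, so isolated sequences get high scores. This is a distance-based stand-in for the clustering step, not the authors' technique; the function names, the choice of k, and the sample sequences are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] so sequences of any length compare."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def outlier_scores(sequences, k=2):
    """Mean normalized distance from each sequence to its k nearest others."""
    scores = []
    for i, seq in enumerate(sequences):
        dists = sorted(normalized_distance(seq, other)
                       for j, other in enumerate(sequences) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Hypothetical sequences: four similar ones and one that looks different.
sequences = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "close"],
    ["close", "close", "delete", "open", "delete"],
]
print(outlier_scores(sequences))
```

Sequences whose score exceeds a chosen cutoff would then be handed to the manual offline review mentioned above.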

Our Solution (Cont.)


- Develop a feedback-based self-improving mechanism
  - When the prediction error exceeds a threshold, adjust the algorithm
  - Use multiple algorithms to provide fine tuning
    - E.g., use weighted decision from multiple algorithms
  - Fine tune feature set extractions and use dimension reduction mechanisms to obtain faster and better results
  - Off-line analysis to achieve improvements and feed the improvements to the online model
    - Adjusted algorithm, revision of features, addition of inter-ES features
- Develop fault-injection techniques to induce self-learning
  - Establish the faulty pattern library from data that have been learned
  - Inject faulty patterns to train the mining process and to measure the effects (use faulty pattern library and develop fault generation algorithms)
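
The weighted decision and the error-threshold trigger described above can be sketched as follows. This is a minimal illustration that assumes a weighted-majority scheme: each detector votes, and when the error rate on a labeled batch exceeds the threshold, detectors that voted wrongly lose weight. The class, the three toy detectors, the penalty factor, and the threshold value are all hypothetical.

```python
class WeightedEnsemble:
    """Weighted vote over several abnormality detectors."""

    def __init__(self, detectors, penalty=0.5):
        self.detectors = detectors
        self.weights = [1.0] * len(detectors)
        self.penalty = penalty

    def predict(self, seq):
        """Return True (abnormal) if more than half the total weight says so."""
        abnormal = sum(w for w, d in zip(self.weights, self.detectors) if d(seq))
        return abnormal > sum(self.weights) / 2

    def feedback(self, seq, truth):
        """Shrink the weight of every detector that disagreed with the label."""
        for i, detector in enumerate(self.detectors):
            if detector(seq) != truth:
                self.weights[i] *= self.penalty

    def adapt(self, labeled, threshold=0.2):
        """Adjust weights only when the batch error rate exceeds the threshold."""
        errors = sum(self.predict(seq) != truth for seq, truth in labeled)
        error_rate = errors / len(labeled)
        if error_rate > threshold:
            for seq, truth in labeled:
                self.feedback(seq, truth)
        return error_rate


# Hypothetical detectors, for illustration only.
def too_long(seq):
    return len(seq) > 5


def has_delete(seq):
    return "delete" in seq


def always_normal(seq):
    return False


ensemble = WeightedEnsemble([too_long, has_delete, always_normal])
batch = [(["open", "delete", "close"], True), (["open", "read", "close"], False)]
for _ in range(3):
    print(ensemble.adapt(batch), ensemble.weights)
```

Repeated calls to `adapt` lower the weights of the two detectors that miss the abnormal sequence until the remaining detector dominates the vote.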

[Figure: block diagram of the feedback loop. Labels: Classifier, Current data sets, Prediction, All data sets, Classifier, Feedback, Analysis, Faulty data injector]
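
The faulty data injector can be illustrated with a short sketch, assuming faults are short event patterns spliced into otherwise normal sequences at random positions. The fault library contents, the function names, and the fault ratio are hypothetical; a real library would be built from faulty patterns that have already been learned, as the slides describe.

```python
import random

# Hypothetical fault patterns, for illustration only.
FAULT_LIBRARY = [
    ["delete", "delete"],
    ["login", "login", "login"],
]


def inject_fault(sequence, pattern, rng):
    """Return a copy of `sequence` with `pattern` spliced in at a random position."""
    pos = rng.randrange(len(sequence) + 1)
    return sequence[:pos] + pattern + sequence[pos:]


def make_labeled_set(normal_sequences, fault_library, fault_ratio=0.5, seed=0):
    """Build (sequence, is_faulty) pairs, injecting a fault with probability `fault_ratio`."""
    rng = random.Random(seed)
    labeled = []
    for seq in normal_sequences:
        if rng.random() < fault_ratio:
            pattern = rng.choice(fault_library)
            labeled.append((inject_fault(seq, pattern, rng), True))
        else:
            labeled.append((list(seq), False))
    return labeled


def detection_rate(detector, labeled):
    """Fraction of injected faults that `detector` flags as abnormal."""
    faulty = [seq for seq, is_faulty in labeled if is_faulty]
    if not faulty:
        return 0.0
    return sum(bool(detector(seq)) for seq in faulty) / len(faulty)


normal = [["open", "read", "close"]] * 10
labeled = make_labeled_set(normal, FAULT_LIBRARY, fault_ratio=0.5, seed=0)
print(detection_rate(lambda seq: "delete" in seq, labeled))
```

Because the injected faults are known, the detection rate gives a direct measure of the effect of each adjustment to the mining process.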

Experimental Plan


- Develop techniques for abnormal event sequence detection
  - Develop automata generation and merging techniques
  - Study the effects of various clustering algorithms on various event sequence datasets
    - Consider signal flow data from Cisco
    - Consider network-based intrusion detection datasets
    - Consider human interoperations (if possible)
- Develop the models and methods for dynamic adaptation
  - Algorithmic adaptation and feature set extraction adaptation
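
As a rough illustration of automata generation and merging, the sketch below builds a transition-count automaton from event sequences and merges two automata by summing their counts. States are collapsed to the last `history` events to keep the state space small. That merge rule is an assumption made for this example and is not the dominance-parameter weighting proposed in the slides. All names are hypothetical, and `history` is assumed to be at least 1.

```python
from collections import Counter


def build_automaton(sequences, history=1):
    """Count transitions (state, event); a state is the last `history` events."""
    transitions = Counter()
    for seq in sequences:
        state = ()  # start state: empty history
        for event in seq:
            transitions[(state, event)] += 1
            state = (state + (event,))[-history:]
    return transitions


def merge_automata(first, second):
    """Merge two automata by summing the counts of matching transitions."""
    return first + second


def count_unseen(automaton, seq, history=1):
    """Number of transitions in `seq` that the automaton has never observed."""
    state, unseen = (), 0
    for event in seq:
        if (state, event) not in automaton:
            unseen += 1
        state = (state + (event,))[-history:]
    return unseen


readers = build_automaton([["open", "read", "close"]])
writers = build_automaton([["open", "write", "close"]])
merged = merge_automata(readers, writers)
print(count_unseen(merged, ["open", "close", "read"]))
```

A sequence with many unseen transitions relative to its length is a candidate abnormal event sequence for the clustering study.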


Industry Member Benefits


- The abnormal behavior prediction approach can be applied to many net-centric applications that are event-based and workflow-oriented
  - Call control signal processing
  - Resource and database access control
  - System health monitoring for real-time embedded systems, including avionic systems, space-based systems, etc.
  - Application-dependent workflows, e.g. monitoring the behavior of drivers on roads
- Need real data and related knowledge from industry for analysis, model construction, effectiveness analysis

Deliverables and Budget


- First year, $30K: Develop the basic multivariate automata mining and abnormal sequence detection techniques
  - First quarter: Work with industrial partner to understand the data and develop pre-processor to extract event patterns
  - Second and third quarter: Develop the automata merging and automata clustering techniques
  - Fourth quarter: Apply the techniques to the dataset and validate the approaches
- Second year, $30K: Develop dynamic learning techniques
  - Develop the feedback learning approach
  - Develop tools to efficiently achieve self learning

With dynamic learning: More accurate abnormal event prediction results and fewer false alarms.

Automated feature extraction: Obtain features that can optimize the prediction effectiveness.

Can be used for abnormal event sequence detection in many event-based applications.

MAIN ACHIEVEMENT:

Applied data clustering algorithms to various data sets to study their effectiveness. The experiments show that Support Vector Machine yields the best results for 90% of the data sets.

Developed improved SVM algorithm to further improve data clustering outcomes.

Developed methods for clustering sparse data sets.


HOW IT WORKS:


Dynamic learning: Develop a feedback-based self-improving mechanism to improve the clustering algorithm on the fly based on a small set of data, and verify the improvement off-line on a large volume of historical data.

Automated feature extraction: Build a workflow and event model to allow automatic extraction of data features, including event characteristics, inter-event effects, etc. Try to improve the precision of abnormality prediction by improving the extracted features.
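
A small sketch of what automated feature extraction from an event sequence might produce follows. It assumes per-event counts as the event characteristics and time gaps between consecutive events as a simple inter-event effect. The feature names, the vocabulary, and the sample call-signaling trace are hypothetical and not taken from the project data.

```python
from collections import Counter


def extract_features(events, vocabulary):
    """Turn a time-ordered list of (timestamp_seconds, event_name) into features."""
    counts = Counter(name for _, name in events)
    # Inter-event effect: time elapsed between consecutive events.
    gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
    features = {f"count_{name}": counts[name] for name in vocabulary}
    features["length"] = len(events)
    features["mean_gap"] = sum(gaps) / len(gaps) if gaps else 0.0
    features["max_gap"] = max(gaps, default=0.0)
    return features


# Hypothetical call-signaling trace.
trace = [(0.0, "invite"), (1.0, "ringing"), (4.0, "ok"), (5.0, "bye")]
vocabulary = ["invite", "ringing", "ok", "bye", "cancel"]
print(extract_features(trace, vocabulary))
```

Fixed-length feature dictionaries like this can be fed to any of the classifiers or clustering algorithms under study, and the feature set itself can be revised when off-line analysis shows that prediction precision is poor.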


ASSUMPTIONS AND LIMITATIONS:


Availability of data

Key objectives:
- Dynamic learning
- Adaptive feature extraction

Apply the technique to Cisco signal flow data and to network intrusion detection.

Many data clustering algorithms can be used for abnormal event detection, but they do not self-adapt, and data features have to be identified in advance.

Develop an abnormal event detection algorithm that can dynamically adapt through learning and can automatically extract the best features for optimal prediction.

QUANTITATIVE IMPACT

END-OF-PHASE GOAL

STATUS QUO

NEW INSIGHTS

Topic/project/effort description

Comparison of Prediction Accuracy

Methods \ Datasets    dataset A    dataset B
Item-Based            61.326       60.132
User-Based            61.271       60.321
LPWSI                 68.3         67.0
LPKA                  71.5         70.0
KAWOK                 72.5         70.4
BIC-aiNet             82 (80% training, 20% test)
BMKL                  84.03        83.53
BMKL with NSM         84.13        83.71