Siggi Olafsson
Associate Professor
Department of Industrial Engineering
Iowa State University
20th European Conference on Operational Research
Rhodes, Greece, July 5–8
Operations Research & Data Mining
20th European Conference on Operational Research, July 4–7, 2004
Purpose of Talk
Give a definition and an overview of data mining as it relates to operations research
Present some examples to give a flavor of the type of work that is possible
My views on the future of OR and data mining
Aim for the talk to be accessible without prior knowledge of data mining ("Should I be here?")
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
  Data visualization
  Attribute selection
  Classification
  Unsupervised learning
Data mining used in OR applications
  Production scheduling
Optimization methods applied to output of standard data mining algorithms
  Selecting and improving decision trees
Open research areas
Background
Rapidly growing interest in data mining among operations research academics and practitioners
Evidenced, for example, by an increased data mining presence in professional organizations:
New INFORMS Section on Data Mining
Large number of data mining sessions at INFORMS and IIE research conferences
Special issues of Computers & Operations Research, IIE Transactions, Discrete Applied Mathematics, etc.
Numerous presentations/sessions at this conference
What is Data Mining?
What is Data Mining, Really?
Extracting meaningful, previously unknown patterns or knowledge from large databases
The knowledge discovery process:
Define Objective: business/scientific objective; data mining objective
Prepare Data: data cleaning; data selection; attribute selection; visualization
Mine Knowledge: classification; association rule discovery; clustering
Interpret Results: predictive models; structural insights
Interdisciplinary Field
Data mining lies at the intersection of statistics, databases, optimization, and machine learning.
Input Engineering
Preparing the data may take as much as 70% of the entire effort
Numerous steps, including:
Combining data sources
Transforming attributes
Data cleaning
Data selection
Attribute selection
Data visualization
Many of these steps have connections with operations research, and with optimization in particular
Data Visualization
Visualizing the data is important in any data mining project
Generally difficult because the data is high-dimensional, i.e., hundreds or thousands of attributes (variables)
How can we best visualize such data in 2 or 3 dimensions?
Traditional techniques include multidimensional scaling, which uses nonlinear optimization
Optimization Formulation
Recent combinatorial optimization formulation by Abbiw-Jackson, Golden, Raghavan, and Wasil (2004)
Map a set M of m points from R^r to R^q, q = 2, 3
Approximate the q-dimensional space by a lattice N, and let x_ik = 1 if point i is assigned to lattice point k:

$$\min \sum_{i \in M} \sum_{j \in M} \sum_{k \in N} \sum_{l \in N} x_{ik}\, x_{jl}\, F\!\left(d_{\mathrm{original}}(i,j),\; d_{\mathrm{new}}(k,l)\right)$$
$$\text{s.t.}\quad \sum_{k \in N} x_{ik} = 1 \quad \forall\, i \in M, \qquad x_{ik} \in \{0,1\}$$

where d_original(i,j) is a distance measure in R^r, d_new(k,l) is a distance measure in R^q, and F is a function such as least squares, the Sammon map, etc.
Solution Methods
Quadratic Assignment Problem (QAP)
Not possible to solve exactly for large-scale problems
Local search procedure proposed
Key to the formulation is the selection of the objective function, e.g., the Sammon map:

$$\min \frac{1}{\displaystyle\sum_{i \in M} \sum_{j \in M,\, j \neq i} d_{\mathrm{original}}(i,j)} \sum_{i \in M} \sum_{j \in M,\, j \neq i} \frac{\left(d_{\mathrm{original}}(i,j) - \sum_{k \in N} \sum_{l \in N} x_{ik}\, x_{jl}\, d_{\mathrm{new}}(k,l)\right)^{2}}{d_{\mathrm{original}}(i,j)}$$
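The Sammon objective above is easy to evaluate for a candidate mapping. Here is a minimal sketch (my own toy code, not the local search from the paper; `sammon_stress` is a hypothetical name) that scores a set of mapped positions against the original points:

```python
import numpy as np

def sammon_stress(X, P):
    # X: m x r original points; P: m x q mapped positions (e.g., the lattice
    # points selected by the assignment variables x_ik).
    # Computes the Sammon objective: a normalized sum of squared
    # pairwise-distance distortions, each weighted by 1/d_original.
    m = len(X)
    d_orig = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    d_new = np.linalg.norm(P[:, None] - P[None, :], axis=2)
    iu = np.triu_indices(m, k=1)            # each pair (i, j) counted once
    do, dn = d_orig[iu], d_new[iu]
    return float(((do - dn) ** 2 / do).sum() / do.sum())
```

A local search over lattice assignments can then compare neighboring solutions by this score; a distance-preserving mapping yields a stress of zero.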
Attribute Selection
Usually a large number of attributes
Some attributes are redundant or irrelevant and should be removed
Benefits:
Faster subsequent induction
Simpler models (important in data mining)
Better (predictive) performance of models
Discovering which attributes are important (descriptive or structural knowledge)
Optimization Formulation
Define the decision variable

$$x_j = \begin{cases} 1 & \text{if attribute } j \text{ is selected,} \\ 0 & \text{otherwise.} \end{cases}$$

This gives a combinatorial optimization problem

$$\max f(x_1, x_2, \ldots, x_n) \quad \text{s.t.}\; x_j \in \{0,1\}$$

The number of solutions is $2^n - 1$
How should the objective function f be defined?
Solution Methods
Nonlinear objective function (defining a good objective is a major issue)
Mathematical programming approach (Bradley, Mangasarian, and Street, 1998)
Metaheuristics have been applied extensively:
Genetic algorithms, simulated annealing
Nested partitions method (Olafsson and Yang, 2004)
Intelligent partitioning: take advantage of what is known in data mining about evaluating attributes
Random instance sampling: in each step the algorithm uses a sample of instances, which improves scalability
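As an illustration of the metaheuristic approach, here is a minimal genetic algorithm sketch over attribute-subset bitstrings (a generic toy, not the nested partitions method or any published implementation; the objective f stands in for whatever subset-quality measure one chooses):

```python
import random

def ga_select(n, f, pop_size=30, gens=40, p_mut=0.1, seed=0):
    # Simple genetic algorithm over bitstrings x in {0,1}^n, maximizing f(x).
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=f, reverse=True)          # best subsets first
        survivors = pop[: pop_size // 2]       # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=f)
```

In practice f would be, e.g., the cross-validated accuracy of a model trained on the selected attributes; here any function of the bitstring works.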
Learning from Data
Each data point (instance) represents an example from which we can learn
The instances are either:
Labeled (supervised learning): one attribute is of special interest (called the class or target) and each instance is labeled by its class value
Unlabeled (unsupervised learning)
Instances are assumed to be independent (however, spatial and temporal data mining are active areas of research)
Learning Tasks in Data Mining
Classification (supervised learning): learn how to classify data into one of a given number of categories or classes
Clustering (unsupervised learning): learn natural groupings (clusters) of data
Association rule discovery: learn correlations (associations) among the data instances; also called market basket analysis
Classification
Classification is the most common learning task in data mining
Many methods have been proposed: decision trees, neural networks, support vector machines, Bayesian networks, etc.
The algorithm is trained on part of the data and its accuracy tested on independent data (or using cross-validation)
Optimization is relevant to many classification methods
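The train/test protocol can be sketched in a few lines. For concreteness this toy (my own code and names) uses a trivial nearest-neighbor classifier as a stand-in; the methods named above, such as decision trees or SVMs, would plug into the same protocol:

```python
import random

def one_nn_predict(train, x):
    # 1-nearest-neighbor: return the label of the closest training instance.
    return min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[1]

def holdout_accuracy(data, test_frac=0.3, seed=1):
    # Train on one part of the data, measure accuracy on the held-out part.
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    cut = int(len(data) * (1 - test_frac))
    train, test = data[:cut], data[cut:]
    correct = sum(one_nn_predict(train, x) == y for x, y in test)
    return correct / len(test)
```

Cross-validation repeats this with k different held-out folds and averages the resulting accuracies.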
Optimization Formulation
Suppose we have n attributes and each instance has been labeled as belonging to one of two classes
Represent the two classes by two matrices A and B
Need to learn what separates the points in the two sets (if they can be separated)
In a 1965 Operations Research article, Olvi Mangasarian studied the case where the two sets can be separated by a hyperplane:

$$Aw \ge e, \qquad Bw \le -e, \qquad \text{separating hyperplane } \{x : wx = 0\}$$
Separating Hyperplane
[Figure: points of Class A and Class B in the (x1, x2) plane, the separating hyperplane between them, and the closest points c and d in the two convex hulls]
Finding the Closest Points
Formulate as a QP:

$$\min_{c,d}\ \frac{1}{2}\,\|c - d\|^2$$
$$\text{s.t.}\quad c = \sum_{i:\, i \in \text{Class A}} \alpha_i x_i, \qquad d = \sum_{i:\, i \in \text{Class B}} \alpha_i x_i,$$
$$\sum_{i:\, i \in \text{Class A}} \alpha_i = 1, \qquad \sum_{i:\, i \in \text{Class B}} \alpha_i = 1, \qquad \alpha \ge 0$$
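This QP can be handed to a general-purpose solver; as a self-contained illustration, here is a simple Frank-Wolfe sketch (my own toy code, not from the literature discussed) that keeps the simplex weights feasible by construction:

```python
import numpy as np

def closest_points(A, B, iters=2000):
    # Frank-Wolfe for: min 0.5*||c - d||^2, c in conv(A), d in conv(B).
    # alpha, beta are the simplex weights over the rows of A and B.
    m, n = len(A), len(B)
    alpha, beta = np.ones(m) / m, np.ones(n) / n
    for t in range(iters):
        c, d = alpha @ A, beta @ B
        g = c - d                                     # gradient w.r.t. c
        s = np.zeros(m); s[np.argmin(A @ g)] = 1.0    # best vertex of conv(A)
        r = np.zeros(n); r[np.argmax(B @ g)] = 1.0    # best vertex of conv(B)
        gamma = 2.0 / (t + 2.0)                       # standard diminishing step
        alpha = (1 - gamma) * alpha + gamma * s
        beta = (1 - gamma) * beta + gamma * r
    return alpha @ A, beta @ B
```

When the two hulls are disjoint, the vector c - d gives the normal of a separating hyperplane.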
Support Vector Machines
[Figure: separating hyperplane between Class A and Class B in the (x1, x2) plane, with the support vectors highlighted]
Limitations
The points (instances) may not be separable by a hyperplane: add error terms to minimize
A linear separation is quite limited
[Figure: Class A and Class B points in the (x1, x2) plane that no line can separate]
The solution is to map the data to a higher-dimensional space
Wolfe Dual Problem
First formulate the Wolfe dual:

$$\max_{\alpha}\ \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j\, y_i y_j\, x_i \cdot x_j$$
$$\text{subject to}\quad 0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i y_i = 0$$

(the weight vector is recovered as $w = \sum_i \alpha_i y_i x_i$)
Now the data appears only in the dot product $x_i \cdot x_j$ in the objective function
Kernel Functions
Use kernel functions to map the data, $\Phi : R^n \to H$, and replace the dot product with
$$K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})$$
For example,
$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$$
$$K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2}$$
$$K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$$
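The three example kernels are straightforward to implement; a small sketch in plain Python (my own helper names):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, p=2):
    # Polynomial kernel: (x . y + 1)^p
    return (dot(x, y) + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    # tanh(kappa * x . y - delta); note this one is not positive
    # semidefinite for all parameter choices.
    return math.tanh(kappa * dot(x, y) - delta)
```

In the dual above, every occurrence of $x_i \cdot x_j$ is simply replaced by $K(x_i, x_j)$.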
Other Classification Work
Extensive publications on SVM and mathematical programming for classification
Several other approaches are also relevant, e.g.:
Logical Analysis of Data (LAD) learns logical expressions to classify the target attribute (series of papers by Hammer, Boros, et al.)
A related approach is the Logic Data Miner Lsquare (e.g., talk by Felici, Truemper, and Paola last Monday)
Bayesian networks are often used, and finding the best structure of such networks is a combinatorial optimization problem (further discussed in the next talk)
Data Clustering
Now we do not have labeled data to train on (unsupervised learning)
Want to identify natural clusters or groupings of data instances
There are many possible sets of clusters; what makes a set of clusters good?
Optimization Formulation
Given a set A of m points, find the centers C_j of k clusters that minimize the 1-norm distances:

$$\min_{C, D}\ \sum_{i=1}^{m} \min_{j = 1, \ldots, k} e^T D_{ij}$$
$$\text{s.t.}\quad -D_{ij} \le A_i - C_j \le D_{ij}, \qquad i = 1, \ldots, m;\; j = 1, \ldots, k$$

This formulation is due to Bradley, Mangasarian, and Street (1997)
Much more work is needed in this area
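A common way to attack a formulation of this type is alternating minimization, analogous to k-means but in the 1-norm: assign each point to its nearest center in the 1-norm, then update each center as the coordinate-wise median. A minimal sketch (my own toy code with a naive initialization, not the Bradley et al. algorithm itself):

```python
import numpy as np

def k_median(A, k, iters=20):
    # Alternating minimization for 1-norm clustering: the coordinate-wise
    # median minimizes the summed 1-norm distance to a cluster's points.
    C = A[:k].astype(float).copy()         # naive init: first k points
    labels = np.zeros(len(A), dtype=int)
    for _ in range(iters):
        # 1-norm distance from every point to every center
        D = np.abs(A[:, None, :] - C[None, :, :]).sum(axis=2)
        labels = D.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = np.median(A[labels == j], axis=0)
    return C, labels
```

Like k-means, this only finds a local optimum and is sensitive to initialization, which is one reason exact optimization formulations remain of interest.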
Association Rule Discovery
Find strong associations among instances (e.g., high support and confidence)
Originally used in market basket analysis, e.g., what products are candidates for cross-sell, up-sell, etc.
Define an item as an attribute-value pair
Algorithmic approach (Agrawal et al., 1992, Apriori and related methods):
Generate frequent item sets with high support
Generate rules from these sets with high confidence
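The level-wise idea behind these methods can be sketched compactly. This toy version (my own code) does the join step but skips Apriori's subset-pruning of candidates, which changes only the work done, not the result:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # Level-wise frequent itemset generation: candidates for level k+1
    # are built by joining frequent k-sets, exploiting the fact that
    # every subset of a frequent set must itself be frequent.
    n = len(transactions)
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    freq, k = {}, 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        freq.update(level)
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return freq
```

The second phase then forms rules X -> Y from each frequent set X | Y and keeps those with confidence support(X | Y) / support(X) above the user threshold.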
Objectives for Association Rules
Want high support and high confidence
Maximizing support would lead to discovering only a few trivial rules (those that occur very frequently)
Maximizing confidence leads to obvious rules (those that are 100% accurate)
Support and confidence are usually treated as constraints (user-specified minimums)
Still need measures for good rules (i.e., rules that add insight and are hence interesting)
Significant opportunities for optimizing the rules that are obtained (not much work, yet)
Data Mining for OR Applications
Data mining can be used to complement traditional OR methods in many areas
Example application areas:
E-commerce
Supply chain management (e.g., to enable customer-value management in the chain)
Production scheduling
Data Mining for Scheduling
Production scheduling is often ad hoc in practice, relying on the experience and intuition of human schedulers
Li and Olafsson (2004) propose a method to learn directly from production data
Benefits:
Makes scheduling practices explicit
Can be incorporated in an automatic scheduling system
Provides insights into operations
Improves schedules
Background
Scheduling task: given a finite set of jobs, sequence the jobs in order of priority
Many simple dispatching rules are available
Machine learning in scheduling:
Considerable work over two decades
Expert systems
Inductive learning: select dispatching rules from simulated data
Has not been applied directly to scheduling data (which would be data mining)
Simple Example: Dispatching List

Job ID | Release Time | Start Time | Processing Time | Completion Time
J5     | 0            | 0          | 17              | 17
J1     | 10           | 17         | 15              | 32
J3     | 18           | 32         | 20              | 52
J4     | 0            | 52         | 7               | 59
J2     | 30           | 59         | 5               | 64

How were these five jobs scheduled?
Longest processing time first (LPT)
Data Mining Formulation
Determine the target concept: dispatching rules are a pairwise comparison
Learning task: given two jobs, which job should be dispatched first?
Data preparation: construct a flat file in which each line (instance/data object) is an example of the target concept
Prepared Data File

Job1 | ProcessingTime1 | Release1 | Job2 | ProcessingTime2 | Release2 | Job1ScheduledFirst
J1   | 15              | 10       | J2   | 5               | 30       | Yes
J1   | 15              | 10       | J3   | 20              | 18       | Yes
J1   | 15              | 10       | J4   | 7               | 0        | Yes
J1   | 15              | 10       | J5   | 17              | 0        | No
J2   | 5               | 30       | J1   | 15              | 10       | No
J2   | 5               | 30       | J3   | 20              | 18       | No
J2   | 5               | 30       | J4   | 7               | 0        | No
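Constructing such a flat file from a dispatching list is mechanical; a minimal sketch (my own function name; the attribute layout follows the table above):

```python
def pairwise_instances(schedule):
    # schedule: list of (job, processing_time, release_time) tuples,
    # given in the order the jobs were actually dispatched.
    # Each ordered pair of distinct jobs becomes one training instance;
    # the label records whether the first job was dispatched first.
    rows = []
    for i, (j1, p1, r1) in enumerate(schedule):
        for k, (j2, p2, r2) in enumerate(schedule):
            if i != k:
                rows.append((j1, p1, r1, j2, p2, r2, "Yes" if i < k else "No"))
    return rows
```

With the five-job example above, this yields 20 instances, including the seven rows shown in the table.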
Input Engineering
Attribute creation (i.e., composite attributes) and attribute selection are an important part of data mining
Add attributes: ProcessingTimeDifference, ReleaseDifference, Job1Longer, Job1ReleasedFirst
Select the best subset of attributes
Apply the C4.5 decision tree algorithm
Decision Tree
[Figure: induced decision tree. The root splits on "Job 1 Longer?"; subsequent splits are on "Job 1 Released First?" and on "Processing Time Difference" (thresholds 8 and 5). Annotated interpretations: LPT for released jobs; do not wait for Job 1 if it is not much longer than Job 2; wait for Job 1 to be released if it is much longer than Job 2.]
Structural Knowledge
The dispatching rule is LPT: we mine data generated using this rule, together with the processing time and release time data
The induced model takes into account:
The possible range of processing times
The largest delay caused by a job that has not been released
New structural patterns, not explicitly known by the dispatcher, are discovered
The next step is to improve schedules:
Instance selection: learn from best practices
Optimize the decision tree
Optimizing Decision Trees
Decision tree induction is often unstable
Genetic algorithms have been used to select the best tree from a set of trees:
Kennedy et al. (1997) encode decision trees and define crossover and mutation operators; the accuracy of the tree is the fitness function
A series of papers by Fu, Golden, et al. (2003; 2004a; 2004b) builds further on this approach
Other optimization methods could also apply, and other outputs can be optimized
Conclusions
Although data mining related optimization work dates back to the 1960s, most problems are still open or need more research
Need to be aware of the key concerns of data mining: extracting meaningful, previously unknown patterns or knowledge from large databases
Algorithms should handle massive data sets, that is, be scalable with respect to both time and memory use
Results often focus on simple-to-interpret, meaningful patterns that provide structural insights
"Previously unknown" means few modeling assumptions that restrict what can be discovered
Open Problems
Many data mining problems can be formulated as optimization problems
We have seen numerous examples, e.g., classification and attribute selection (most existing work is on these problems)
Many areas have not been addressed or need more work (in particular, clustering and association rule mining)
Optimizing model outputs is very promising
The use of data mining in OR applications has received very little investigation:
Supply chain management
Logistics and transportation
Planning and scheduling
Questions?
For more information after today:
Email me at
olafsson@iastate.edu
Visit my homepage at
http://www.public.iastate.edu/~olafsson
Consult Dilbert
Select References
The following surveys on optimization and data mining are available:
1. Padmanabhan, B. and A. Tuzhilin (2003). "On the Use of Optimization for Data Mining: Theoretical Interactions and eCRM Opportunities," Management Science 49: 1327–1343.
2. Bradley, P.S., U.M. Fayyad, and O.L. Mangasarian (1999). "Mathematical Programming for Data Mining: Formulations and Challenges," INFORMS Journal on Computing 11: 217–238.
Work mentioned in the presentation:
3. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil (2004). "A Divide-and-Conquer Local Search Heuristic for Data Visualization," Working Paper, University of Maryland.
4. Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan (1997). "Logical Analysis of Numerical Data," Mathematical Programming 79: 163–190.
5. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1997). "Clustering via Concave Minimization," in M.C. Mozer, M.I. Jordan, and T. Petsche (eds.), Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
6. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1998). "Feature Selection via Mathematical Programming," INFORMS Journal on Computing 10: 209–217.
7. Fu, Z., B. Golden, S. Lele, S. Raghavan, and E. Wasil (2003). "A Genetic Algorithm-Based Approach for Building Accurate Decision Trees," INFORMS Journal on Computing 15: 3–22.
8. Kennedy, H., C. Chinniah, P. Bradbeer, and L. Morss (1997). "The Construction and Evaluation of Decision Trees: A Comparison of Evolutionary and Concept Learning Methods," in D. Corne and J.L. Shapiro (eds.), Evolutionary Computing, Lecture Notes in Computer Science, Springer-Verlag, 147–161.
9. Li, X. and S. Olafsson (2004). "Discovering Dispatching Rules using Data Mining," Journal of Scheduling, to appear.
10. Mangasarian, O.L. (1965). "Linear and Nonlinear Separation of Patterns by Linear Programming," Operations Research 13: 455–461.
11. Olafsson, S. and J. Yang (2004). "Intelligent Partitioning for Feature Selection," INFORMS Journal on Computing, to appear.