Operations Research & Data Mining - Iowa State University


Siggi Olafsson

Associate Professor

Department of Industrial Engineering

Iowa State University


20th European Conference on Operational Research

Rhodes, Greece, July 4-7, 2004
Operations Research & Data Mining

20th European Conference on Operational Research, July 4-7, 2004

Purpose of Talk

- Give a definition and an overview of data mining as it relates to operations research
- Present some examples to give the flavor for the type of work that is possible
- My views on the future of OR and data mining
- Aim for it to be accessible without prior knowledge of data mining

Should I be here?


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Background

- Rapidly growing interest in data mining among operations research academics and practitioners
- For example, evidenced by increased data mining presence in professional organizations:
  - New INFORMS Section on Data Mining
  - Large number of data mining sessions at INFORMS and IIE research conferences
  - Special issues in Computers & Operations Research, IIE Transactions, Discrete Applied Mathematics, etc.
  - Numerous presentations/sessions at this conference


What is Data Mining?


What is Data Mining, Really?

Extracting meaningful, previously unknown patterns or knowledge from large databases.

The knowledge discovery process:

1. Define objective (business/scientific objective, data mining objective)
2. Prepare data (data cleaning, data selection, attribute selection, visualization)
3. Mine knowledge (classification, association rule discovery, clustering)
4. Interpret results (predictive models, structural insights)


Interdisciplinary Field

Data mining lies at the intersection of statistics, databases, optimization, and machine learning.


Input Engineering

- Preparing the data may take as much as 70% of the entire effort
- Numerous steps, including:
  - Combining data sources
  - Transforming attributes
  - Data cleaning
  - Data selection
  - Attribute selection
  - Data visualization
- Many of these steps have connections with operations research, and optimization in particular


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Data Visualization

- Visualizing the data is important in any data mining project
- Generally difficult because the data is typically high-dimensional, i.e., hundreds or thousands of attributes (variables)
- How can we best visualize such data in 2 or 3 dimensions?
- Traditional techniques include multidimensional scaling, which uses nonlinear optimization


Optimization Formulation

Recent combinatorial optimization formulation by Abbiw-Jackson, Golden, Raghavan, and Wasil (2004):

- Map a set M of m points from R^r to R^q, q = 2, 3
- Approximate the q-dimensional space by a lattice N

$$
\min \sum_{i \in M} \sum_{\substack{j \in M \\ j \neq i}} \sum_{k \in N} \sum_{l \in N} F\big(d_{\mathrm{original}}(i,j),\, d_{\mathrm{new}}(k,l)\big)\, x_{ik} x_{jl}
\qquad \text{s.t.} \quad \sum_{k \in N} x_{ik} = 1 \;\; \forall i \in M, \quad x_{ik} \in \{0,1\},
$$

where d_original(i, j) is a distance measure in R^r, d_new(k, l) is a distance measure in R^q, and F is a function such as least squares, the Sammon map, etc.

Solution Methods

- Quadratic Assignment Problem (QAP)
  - Not possible to solve exactly for large-scale problems
  - Local search procedure proposed
- Key to the formulation is the selection of the objective function, e.g., the Sammon map:

$$
\min \; \frac{1}{\sum_{i \in M} \sum_{j \in M,\, j \neq i} d_{\mathrm{original}}(i,j)}
\sum_{i \in M} \sum_{\substack{j \in M \\ j \neq i}} \sum_{k \in N} \sum_{l \in N}
\frac{\big(d_{\mathrm{original}}(i,j) - d_{\mathrm{new}}(k,l)\big)^2}{d_{\mathrm{original}}(i,j)}\, x_{ik} x_{jl}
$$
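As an illustration (my own sketch, not from the talk), the Sammon stress of a candidate low-dimensional embedding can be evaluated directly; a local-search heuristic for the lattice formulation would call such an evaluation in its inner loop. Euclidean distances are assumed:

```python
import math

def euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def sammon_stress(original, embedded):
    """Sammon stress between points in R^r and their images in R^q.

    original, embedded: lists of points (tuples) of equal length.
    Lower stress means the embedding preserves pairwise distances better.
    """
    m = len(original)
    norm = 0.0    # sum of original pairwise distances (normalization constant)
    stress = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            d_orig = euclid(original[i], original[j])
            d_new = euclid(embedded[i], embedded[j])
            norm += d_orig
            stress += (d_orig - d_new) ** 2 / d_orig
    return stress / norm

# Three points in R^3 embedded in R^2; an exact isometry gives zero stress.
pts3d = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
pts2d = [(0, 0), (1, 0), (0, 1)]
print(sammon_stress(pts3d, pts2d))  # 0.0 for this distance-preserving embedding
```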

Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Attribute Selection

- Usually a large number of attributes
- Some attributes are redundant or irrelevant and should be removed
- Benefits:
  - Faster subsequent induction
  - Simpler models (important in data mining)
  - Better (predictive) performance of models
  - Discover which attributes are important (descriptive or structural knowledge)


Optimization Formulation

Define the decision variable

$$
x_j = \begin{cases} 1, & \text{if attribute } j \text{ is selected,} \\ 0, & \text{otherwise.} \end{cases}
$$

Combinatorial optimization problem:

$$
\max \; f(x_1, x_2, \ldots, x_n) \qquad \text{s.t.} \quad x_j \in \{0,1\}
$$

- Number of solutions is 2^n - 1
- How should the objective function be defined?
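A minimal sketch of this formulation (mine, not from the talk): enumerate all 2^n - 1 nonempty subsets and keep the best one under whatever objective f has been chosen. The toy objective below is a made-up stand-in, since the slide leaves f open:

```python
from itertools import combinations

def best_subset(attributes, f):
    """Exhaustively search the 2^n - 1 nonempty attribute subsets.

    Feasible only for small n; metaheuristics or the nested partitions
    method are needed at realistic scale.
    """
    best, best_val = None, float("-inf")
    n = len(attributes)
    for r in range(1, n + 1):
        for subset in combinations(attributes, r):
            val = f(subset)
            if val > best_val:
                best, best_val = subset, val
    return best, best_val

# Toy stand-in objective (an assumption, not from the talk): reward one
# specific informative pair of attributes and penalize subset size.
def toy_objective(subset):
    score = 2.0 if {"a1", "a3"} <= set(subset) else 0.0
    return score - 0.1 * len(subset)

attrs = ["a1", "a2", "a3", "a4"]
print(best_subset(attrs, toy_objective))  # (('a1', 'a3'), 1.8)
```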

Solution Methods

- Non-linear objective function (defining a good objective is a major issue)
- Mathematical programming approach (Bradley, Mangasarian, and Street, 1998)
- Metaheuristics have been applied extensively
  - Genetic algorithms, simulated annealing
- Nested partitions method (Olafsson and Yang, 2004)
  - Intelligent partitioning: take advantage of what is known in data mining about evaluating attributes
  - Random instance sampling: in each step the algorithm uses a sample of instances, which improves scalability


Learning from Data

- Each data point (instance) represents an example from which we can learn
- The instances are either:
  - Labeled (supervised learning): one attribute is of special interest (called the class or target) and each instance is labeled by its class value
  - Unlabeled (unsupervised learning)
- Instances are assumed to be independent (however, spatial and temporal data mining are active areas of research)


Learning Tasks in Data Mining

- Classification (supervised learning): learn how to classify data into one of a given number of categories or classes
- Clustering (unsupervised learning): learn natural groupings (clusters) of data
- Association rule discovery: learn correlations (associations) among the data instances; also called market basket analysis


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Classification

- Classification is the most common learning task in data mining
- Many methods have been proposed: decision trees, neural networks, support vector machines, Bayesian networks, etc.
- The algorithm is trained on part of the data and the accuracy tested on independent data (or use cross-validation)
- Optimization is relevant to many classification methods


Optimization Formulation

- Suppose we have n attributes and each instance has been labeled as belonging to one of two classes
- Represent the two classes by two matrices A and B
- Need to learn what separates the points in the two sets (if they can be separated)
- In a 1965 Operations Research article, Olvi Mangasarian studied the case where the two sets can be separated with a hyperplane:

$$
Aw \geq e, \qquad Bw \leq -e,
$$

with separating hyperplane $\{x : x^{T} w = 0\}$, where e denotes the vector of ones.
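Whether a given w satisfies this strict-separation condition is a simple componentwise check; a minimal pure-Python sketch with toy data and a hand-picked w (finding w in general is a linear program, as in Mangasarian's formulation):

```python
def separates(A, B, w):
    """Check strict separation: Aw >= e and Bw <= -e componentwise,
    where e is the all-ones vector (i.e., each row's dot product with w
    is at least 1 for class A and at most -1 for class B)."""
    dot = lambda x: sum(xi * wi for xi, wi in zip(x, w))
    return all(dot(a) >= 1 for a in A) and all(dot(b) <= -1 for b in B)

# Toy 2-D data: class A sits above the line x1 + x2 = 0, class B below.
A = [(1, 2), (2, 1), (0.5, 3)]
B = [(-1, -2), (-2, -0.5), (-3, -1)]
w = (1, 1)  # hyperplane x1 + x2 = 0 separates the classes
print(separates(A, B, w))        # True
print(separates(A, B, (1, -1)))  # False: this hyperplane mixes the classes
```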

Separating Hyperplane

[Figure: points of Class A and Class B in the (x1, x2) plane, a separating hyperplane between them, and the closest points c and d in the two convex hulls.]


Finding the Closest Points

Formulate as a QP:

$$
\min_{\alpha} \; \frac{1}{2}\, \| c - d \|^2
\qquad \text{s.t.} \quad
c = \sum_{i \in \text{Class A}} \alpha_i x_i, \quad
d = \sum_{i \in \text{Class B}} \alpha_i x_i, \quad
\sum_{i \in \text{Class A}} \alpha_i = 1, \quad
\sum_{i \in \text{Class B}} \alpha_i = 1, \quad
\alpha_i \geq 0.
$$
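One simple way to solve this QP is the Frank-Wolfe method, since the feasible set is a product of two simplices and linear minimization over it just picks a vertex of each hull. A small sketch (the toy data and the choice of method are mine, not from the talk):

```python
def closest_points(A, B, iters=20000):
    """Frank-Wolfe sketch for the convex-hull QP: minimize
    (1/2)||c - d||^2 with c in conv(A), d in conv(B).
    A and B are lists of points (tuples); returns the points c and d."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))

    def combo(alpha, pts):  # convex combination sum_i alpha_i * pts[i]
        return tuple(sum(a * p[k] for a, p in zip(alpha, pts))
                     for k in range(len(pts[0])))

    alphaA = [1.0 / len(A)] * len(A)
    alphaB = [1.0 / len(B)] * len(B)
    for t in range(iters):
        c, d = combo(alphaA, A), combo(alphaB, B)
        g = tuple(ci - di for ci, di in zip(c, d))  # grad w.r.t. c; -g w.r.t. d
        iA = min(range(len(A)), key=lambda i: dot(A[i], g))  # best vertex of conv(A)
        iB = max(range(len(B)), key=lambda i: dot(B[i], g))  # best vertex of conv(B)
        step = 2.0 / (t + 2)
        alphaA = [(1 - step) * a for a in alphaA]; alphaA[iA] += step
        alphaB = [(1 - step) * b for b in alphaB]; alphaB[iB] += step
    return combo(alphaA, A), combo(alphaB, B)

# Two toy triangles; the closest points are (2, 2) and (3, 3).
A = [(1, 1), (2, 2), (1, 2)]
B = [(3, 3), (4, 3), (3, 4)]
c, d = closest_points(A, B)
print(c, d)
```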


Support Vector Machines

[Figure: Class A and Class B points in the (x1, x2) plane, the separating hyperplane, and the support vectors lying on the margin.]


Limitations

- The points (instances) may not be separable by a hyperplane
  - Add error terms to minimize
- A linear separation is quite limited

[Figure: Class A and Class B points in the (x1, x2) plane that no single hyperplane can separate.]

Solution is to map the data to a higher-dimensional space.


Wolfe Dual Problem

First formulate the Wolfe dual:

$$
\max_{\alpha} \; \sum_{i} \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j
\qquad \text{subject to} \quad \sum_{i} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C,
$$

with the weight vector recovered as $w = \sum_i \alpha_i y_i \mathbf{x}_i$.

Now the data only appears in the dot product in the objective function.

Kernel Functions

Use kernel functions to map the data and replace the dot product with

$$
K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y}), \qquad \Phi : R^n \to H.
$$

For example,

$$
K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p, \qquad
K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2}, \qquad
K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta).
$$
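A small sketch of the kernel trick (mine, not from the talk): for the degree-2 polynomial kernel in two dimensions, the implicit feature map is explicit and small enough to write out, so the kernel value can be checked against an honest dot product in the feature space:

```python
import math

def poly_kernel(x, y, p=2):
    """Polynomial kernel (x . y + 1)^p."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

# For p = 2 in two dimensions the implicit feature map is
# phi(x) = (1, sqrt(2) x1, sqrt(2) x2, x1^2, sqrt(2) x1 x2, x2^2),
# so the kernel equals a dot product in that 6-dimensional space.
def phi(x):
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(poly_kernel(x, y), explicit)  # both 4.0
```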

Other Classification Work

- Extensive publications on SVM and mathematical programming for classification
- Several other approaches are also relevant, e.g.:
  - Logical Analysis of Data (LAD) learns logical expressions to classify the target attribute (series of papers by Hammer, Boros, et al.)
  - A related approach is the Logic Data Miner Lsquare (e.g., talk by Felici, Truemper, and Paola last Monday)
  - Bayesian networks are often used, and finding the best structure of such networks is a combinatorial optimization problem (further discussed in the next talk)


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Data Clustering

- Now we do not have labeled data to train on (unsupervised learning)
- Want to identify natural clusters or groupings of data instances
- Many possible sets of clusters
- What makes a set of clusters good?


Optimization Formulation

Given a set A of m points, find the centers C_j of k clusters that minimize the 1-norm:

$$
\min_{C, D} \; \sum_{i=1}^{m} \min_{j=1,\ldots,k} \; e^{T} D_{ij}
\qquad \text{s.t.} \quad -D_{ij} \leq A_i^{T} - C_j \leq D_{ij}, \quad i = 1,\ldots,m, \;\; j = 1,\ldots,k.
$$

- This formulation is due to Bradley, Mangasarian, and Street (1997)
- Much more work is needed in this area
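The 1-norm objective above is typically attacked by alternating two steps: assign each point to its nearest center in the 1-norm, then recompute each center as the coordinate-wise median of its cluster (the median is what minimizes a 1-norm sum). A minimal sketch; the toy data and initial centers are my own:

```python
import statistics

def one_norm(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def k_median(points, centers, iters=10):
    """Alternate (1) assigning each point to its nearest center in the
    1-norm and (2) recomputing each center as the coordinate-wise median
    of its cluster, which decreases the 1-norm objective."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda j: one_norm(p, centers[j]))
            clusters[j].append(p)
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:
                new_centers.append(tuple(statistics.median(p[k] for p in cl)
                                         for k in range(len(cl[0]))))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        centers = new_centers
    return centers

# Two well-separated toy groups; centers converge to the group medians.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(k_median(pts, centers=[(0, 0), (10, 10)]))  # [(0, 0), (10, 10)]
```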

Association Rule Discovery

- Find strong associations among instances (e.g., high support and confidence)
- Originally used in market basket analysis, e.g., what products are candidates for cross-sell, up-sell, etc.
- Define an item as an attribute-value pair
- Algorithmic approach (Agrawal et al., 1992; Apriori and related methods):
  - Generate frequent item sets with high support
  - Generate rules from these sets with high confidence
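Support and confidence are simple counting computations; a minimal sketch over toy baskets (the item names are illustrative only):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Conditional frequency of the rule lhs -> rhs:
    support(lhs union rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# Toy market baskets (in general, items are attribute-value pairs).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
lhs, rhs = frozenset({"bread"}), frozenset({"butter"})
print(support(lhs | rhs, baskets))     # 0.5
print(confidence(lhs, rhs, baskets))   # 0.666...
```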


Objectives for Association Rules

- Want high support and high confidence
  - Maximizing support would lead to discovering only a few trivial rules (those that occur very frequently)
  - Maximizing confidence leads to obvious rules (those that are 100% accurate)
- Support and confidence are usually treated as constraints (user-specified minimums)
- Still need measures for good rules (i.e., rules that add insights and are hence interesting)
- Significant opportunities for optimizing the rules that are obtained (not much work, yet)


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Data Mining for OR Applications

- Data mining can be used to complement traditional OR methods in many areas
- Example application areas:
  - E-commerce
  - Supply chain management (e.g., to enable customer-value management in the chain)
  - Production scheduling


Data Mining for Scheduling

- Production scheduling is often ad hoc in practice, relying on the experience and intuition of human schedulers
- Li and Olafsson (2004) propose a method to learn directly from production data
- Benefits:
  - Make scheduling practices explicit
  - Incorporate them in an automatic scheduling system
  - Gain insights into operations
  - Improve schedules


Background

- Scheduling task: given a finite set of jobs, sequence the jobs in order of priority
  - Many simple dispatching rules are available
- Machine learning in scheduling:
  - Considerable work over two decades: expert systems, inductive learning, selecting dispatching rules from simulated data
  - Has not been applied directly to scheduling data (which would be data mining)


Simple Example: Dispatching List

| Job ID | Release Time | Start Time | Processing Time | Completion Time |
|--------|--------------|------------|-----------------|-----------------|
| J5     | 0            | 0          | 17              | 17              |
| J1     | 10           | 17         | 15              | 32              |
| J3     | 18           | 32         | 20              | 52              |
| J4     | 0            | 52         | 7               | 59              |
| J2     | 30           | 59         | 5               | 64              |

How were these five jobs scheduled? Longest processing time first (LPT).
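The dispatching list above can be reproduced by simulating LPT over the released jobs; a small sketch (job data taken from the table):

```python
def lpt_schedule(jobs):
    """Simulate LPT dispatching: at each decision point, start the
    already-released job with the longest processing time.
    jobs: dict of job id -> (release_time, processing_time)."""
    t, remaining, schedule = 0, dict(jobs), []
    while remaining:
        released = {j: rp for j, rp in remaining.items() if rp[0] <= t}
        if not released:            # machine idles until the next release
            t = min(rp[0] for rp in remaining.values())
            continue
        j = max(released, key=lambda j: released[j][1])  # longest PT first
        start, p = t, remaining.pop(j)[1]
        t = start + p
        schedule.append((j, start, t))  # (job, start time, completion time)
    return schedule

jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}
print(lpt_schedule(jobs))
# [('J5', 0, 17), ('J1', 17, 32), ('J3', 32, 52), ('J4', 52, 59), ('J2', 59, 64)]
```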


Data Mining Formulation

- Determine the target concept
  - Dispatching rules are a pairwise comparison
  - Learning task: given two jobs, which job should be dispatched first?
- Data preparation
  - Construct a flat file
  - Each line (instance/data object) is an example of the target concept


Prepared Data File

| Job1 | ProcessingTime1 | Release1 | Job2 | ProcessingTime2 | Release2 | Job1ScheduledFirst |
|------|-----------------|----------|------|-----------------|----------|--------------------|
| J1   | 15              | 10       | J2   | 5               | 30       | Yes                |
| J1   | 15              | 10       | J3   | 20              | 18       | Yes                |
| J1   | 15              | 10       | J4   | 7               | 0        | Yes                |
| J1   | 15              | 10       | J5   | 17              | 0        | No                 |
| J2   | 5               | 30       | J1   | 15              | 10       | No                 |
| J2   | 5               | 30       | J3   | 20              | 18       | No                 |
| J2   | 5               | 30       | J4   | 7               | 0        | No                 |
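One simple way to construct such a flat file (a sketch of my own; the slide shows only a subset of the pairs) is to emit one row per ordered pair of jobs, labeled by whether the first job of the pair was actually dispatched before the second:

```python
def pairwise_instances(jobs, schedule_order):
    """Turn a dispatching list into pairwise training instances.
    jobs: dict of id -> (release, processing); schedule_order: job ids
    in the order they were dispatched. Each row is
    (Job1, ProcessingTime1, Release1, Job2, ProcessingTime2, Release2, label)."""
    pos = {j: i for i, j in enumerate(schedule_order)}
    rows = []
    for j1 in jobs:
        for j2 in jobs:
            if j1 == j2:
                continue
            r1, p1 = jobs[j1]
            r2, p2 = jobs[j2]
            label = "Yes" if pos[j1] < pos[j2] else "No"
            rows.append((j1, p1, r1, j2, p2, r2, label))
    return rows

jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}
order = ["J5", "J1", "J3", "J4", "J2"]
rows = pairwise_instances(jobs, order)
print(len(rows))  # 20 ordered pairs of 5 jobs
```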


Input Engineering

- Attribute creation (i.e., composite attributes) and attribute selection are an important part of data mining
- Add attributes:
  - ProcessingTimeDifference
  - ReleaseDifference
  - Job1Longer
  - Job1ReleasedFirst
- Select the best subset of attributes
- Apply the C4.5 decision tree algorithm


Decision Tree

[Figure: the induced decision tree. Recoverable structure:]

- Root split: Job 1 Longer? (Yes/No)
- Second-level splits: Job 1 Released First? (Yes/No)
- Leaf-level splits: Processing Time Difference, with thresholds -8 (<= -8 vs > -8) and 5 (<= 5 vs > 5)

Interpretation of the leaves:
- LPT for released jobs
- Do not wait for Job 1 if it is not much longer than Job 2
- Wait for Job 1 to be released if it is much longer than Job 2
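Reading the recoverable splits as executable logic gives a rule of roughly the following form. The exact branch layout is partly inferred (only the split attributes, thresholds, and leaf interpretations survive from the figure), but this hypothetical reading reproduces every row of the prepared data file:

```python
def job1_first(p1, r1, p2, r2):
    """Hypothetical reading of the induced tree (thresholds -8 and 5
    from the figure; the branch assignment is inferred, not from the slide).
    Returns True if Job 1 should be dispatched before Job 2."""
    job1_longer = p1 > p2
    job1_released_first = r1 < r2
    diff = p1 - p2  # ProcessingTimeDifference
    if job1_longer:
        if job1_released_first:
            return True       # LPT for released jobs
        return diff > 5       # wait for Job 1 only if it is much longer
    if job1_released_first:
        return diff > -8      # do not wait for Job 2 unless it is much longer
    return False

# Consistent with the dispatching example, e.g.:
print(job1_first(15, 10, 5, 30))   # True  (J1 before J2)
print(job1_first(5, 30, 15, 10))   # False (J2 not before J1)
```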


Structural Knowledge

- The dispatching rule is LPT
  - Mine data that use this rule, along with the processing time and release time data
- The induced model takes into account:
  - Possible range of processing times
  - Largest delay caused by a not-yet-released job
- New structural patterns, not explicitly known by the dispatcher, are discovered
- Next step is to improve schedules:
  - Instance selection: learn from best practices
  - Optimize the decision tree


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Optimizing Decision Trees

- Decision tree induction is often unstable
- Genetic algorithms have been used to select the best tree from a set of trees
  - Kennedy et al. (1997) encode decision trees and define crossover and mutation operators
  - The accuracy of the tree is the fitness function
  - A series of papers by Fu, Golden, et al. (2003; 2004a; 2004b) builds further on this approach
- Other optimization methods could also apply, and other outputs can be optimized


Overview

- Background
- Intersection of OR and Data Mining
  - Optimization algorithms used for data mining
    - Data visualization
    - Attribute selection
    - Classification
    - Unsupervised learning
  - Data mining used in OR applications
    - Production scheduling
  - Optimization methods applied to output of standard data mining algorithms
    - Selecting and improving decision trees
- Open research areas


Conclusions

- Although data mining related optimization work dates back to the 1960s, most problems are still open or need more research
- Need to be aware of the key concerns of data mining: extracting meaningful, previously unknown patterns or knowledge from large databases
  - Algorithms should handle massive data sets, that is, be scalable with respect to both time and memory use
  - Results often focus on simple-to-interpret, meaningful patterns that provide structural insights
  - "Previously unknown" means few modeling assumptions that restrict what can be discovered


Open Problems

- Many data mining problems can be formulated as optimization problems
  - Seen numerous examples, e.g., classification and attribute selection (most existing work is for these problems)
  - Many areas have not been addressed or need more work (in particular, clustering and association rule mining)
  - Optimizing model outputs is very promising
- Use of data mining in OR applications has been investigated very little:
  - Supply chain management
  - Logistics and transportation
  - Planning and scheduling


Questions?

For more information after today:

- Email me at olafsson@iastate.edu
- Visit my homepage at http://www.public.iastate.edu/~olafsson
- Consult Dilbert


Select References

The following surveys on optimization and data mining are available:

1. Padmanabhan, B. and A. Tuzhilin (2003). "On the Use of Optimization for Data Mining: Theoretical Interactions and eCRM Opportunities," Management Science 49: 1327-1343.
2. Bradley, P.S., U.M. Fayyad, and O.L. Mangasarian (1999). "Mathematical Programming for Data Mining: Formulations and Challenges," INFORMS Journal on Computing 11: 217-238.

Work mentioned in presentation:

3. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil (2004). "A Divide-and-Conquer Local Search Heuristic for Data Visualization," Working Paper, University of Maryland.
4. Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan (1997). "Logical Analysis of Numerical Data," Mathematical Programming 79: 163-190.
5. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1997). "Clustering via Concave Minimization," in M.C. Mozer, M.I. Jordan, and T. Petsche (eds.) Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
6. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1998). "Feature Selection via Mathematical Programming," INFORMS Journal on Computing 10: 209-217.
7. Fu, Z., B. Golden, S. Lele, S. Raghavan, and E. Wasil (2003). "A Genetic Algorithm-Based Approach for Building Accurate Decision Trees," INFORMS Journal on Computing 15: 3-22.
8. Kennedy, H., C. Chinniah, P. Bradbeer, and L. Morss (1997). "The Construction and Evaluation of Decision Trees: A Comparison of Evolutionary and Concept Learning Methods," in D. Corne and J.L. Shapiro (eds.) Evolutionary Computing, Lecture Notes in Computer Science, Springer-Verlag, 147-161.
9. Li, X. and S. Olafsson (2004). "Discovering Dispatching Rules using Data Mining," Journal of Scheduling, to appear.
10. Mangasarian, O.L. (1965). "Linear and Nonlinear Separation of Patterns by Linear Programming," Operations Research 13: 455-461.
11. Olafsson, S. and J. Yang (2004). "Intelligent Partitioning for Feature Selection," INFORMS Journal on Computing, to appear.