Learning to Speed Up Search
Bart Selman and Wei Wei
Introduction
In this talk, we’ll survey some promising
recent developments in using learning
methods to speed up search.
General methodology: (1) Use machine
learning techniques to uncover hidden
structure of the search space. (2) Use this
information to speed up search.
General Observations
Approaches fall into two classes:
A) Work in the machine learning community. We
will discuss three examples. Promising, but in
general not compared against the best alternative
solution methods.
B) Approaches coming out of the search / SAT
community. Powerful, but do not explicitly use
state-of-the-art learning methods.
We will compare and contrast A & B.
Work From the Machine
Learning Community
Three examples:
Learn good starting states for local search.
STAGE – Boyan & Moore, 1998
Learn the structure of the search space directly.
MIMIC – De Bonet et al., 1997
Learn a new objective function that is easier
for local search.
Zhang & Dietterich, 1995
I) STAGE algorithm
Boyan and Moore 1998
Idea: additional features of the current state may
help local search.
Task: incorporate these features into improved
evaluation functions, and help guide the search.
Method
The algorithm learns the expected outcome
of a local search algorithm π given an initial
state: Vπ(s).
Can this function be learned successfully?
Features
State feature vector: problem specific
Example: for SAT, the following features are useful:
1. % of clauses currently unsat (= obj. function)
2. % of clauses satisfied by exactly 1 variable
3. % of clauses satisfied by exactly 2 variables
4. % of variables set to their naïve setting
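A minimal sketch of how these four features could be computed for a CNF formula, with clauses as lists of signed integers (DIMACS style) and the assignment as a variable→bool map. The representation and helper names here are our own illustration, not from the STAGE paper:

```python
# Sketch of STAGE-style state features for SAT (hypothetical helper names).
# `formula` is a list of clauses; a clause is a list of signed ints.
# `assign` maps variable -> bool; `naive` is some fixed default assignment.

def num_true_literals(clause, assign):
    """Count literals in `clause` made true by `assign`."""
    return sum(assign[abs(lit)] == (lit > 0) for lit in clause)

def sat_features(formula, assign, naive):
    n_clauses = len(formula)
    counts = [num_true_literals(c, assign) for c in formula]
    return [
        sum(k == 0 for k in counts) / n_clauses,  # % clauses unsat (= obj.)
        sum(k == 1 for k in counts) / n_clauses,  # % satisfied by exactly 1
        sum(k == 2 for k in counts) / n_clauses,  # % satisfied by exactly 2
        sum(assign[v] == naive[v] for v in assign) / len(assign),  # % naive
    ]
```

The resulting vector feeds the regression described on the next slide.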
Learner
Fitter: can be any function approximator;
polynomial regression is used in practice.
Training data: generated on the fly; every
LS trajectory produces a series of new
training examples.
Restrictions on π: it must terminate, and it
must be Markovian.
Diagram of STAGE
Run π to optimize Obj: produces new training data.
Hillclimb to optimize Vπ: produces good start states.
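The two boxes in the diagram alternate as sketched below. This is schematic only: `run_pi`, `features`, `fit`, and `hillclimb_on` are hypothetical stand-ins for the base local search, the feature map, the function approximator, and the hillclimber on the learned value function, not the paper's code:

```python
# Minimal sketch of the STAGE loop: alternate between running the base
# local search pi and hill-climbing on the learned value function.

def stage(start, run_pi, features, fit, hillclimb_on, n_iters=10):
    """run_pi(s)   -> (trajectory, outcome): run local search pi from s
       features(s) -> feature vector of state s
       fit(X, y)   -> model predicting the outcome from the features
       hillclimb_on(model, s) -> state optimizing the learned V-hat"""
    X, y = [], []
    state = start
    for _ in range(n_iters):
        trajectory, outcome = run_pi(state)           # run pi to optimize Obj
        for s in trajectory:                          # every visited state is
            X.append(features(s)); y.append(outcome)  # a new training example
        model = fit(X, y)                             # learn V-hat_pi
        state = hillclimb_on(model, trajectory[-1])   # good start state for pi
    return state
```

With toy components (e.g. integer states, `run_pi` halving the state) the loop converges to the toy optimum; in STAGE proper the `fit` step is polynomial regression over the SAT features above.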
Results
Works on many domains, such as bin packing,
channel routing, and SAT.
On SAT, adding the STAGE learner to WalkSAT
reduces the number of unsat clauses on the
par32 benchmarks (from 9 to 1).
Discussion
Is the learned function a good approximation to
Vπ(s)?
– Somewhat unclear.
(“Worrisome”: linear regression performs better than
quadratic regression, even though the latter should give a
better approximation. Learning does help, however.)
Why not learn a better objective function and search
on that function directly (clause weighting)?
(Zhang and Dietterich, 3rd example.)
II) MIMIC
De Bonet et al., 1997
MIMIC learns a probability density
distribution over the search space by
repeated and “clever” sampling.
The purpose of retaining this density
distribution is to communicate information
about the search space from one iteration of
the search to the next.
The idea in more detail
If we know nothing about a search space, we look
for its minimum by generating points from a
uniform distribution over all inputs.
Less work is necessary if we know the distribution
p_θ(x), which is uniform over those inputs whose
objective satisfies O(x) ≤ θ, and has probability 0
elsewhere.
In particular, the task is trivial once we know the
distribution p_θ*(x), where θ* = min_x O(x).
MIMIC algorithm
Start by generating samples from the uniform
distribution, and find the median fitness θ_0 of
these samples. Then:
1. Estimate the density p_θi(x) from the samples.
2. Generate more samples from p_θi(x).
3. Let θ_{i+1} be the Nth percentile of the samples;
retain only the points with fitness below θ_{i+1}.
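The loop above can be sketched as follows. For brevity, this stand-in estimates the density as a product of independent bit marginals; MIMIC proper uses the pairwise chain approximation described on the next slides:

```python
import random

# Sketch of the MIMIC sampling loop, minimizing O over bit strings.
# The density estimator here is a product of independent bit marginals
# (a simplification of MIMIC's pairwise chain approximation).

def mimic(O, n_bits, pop=200, keep=0.5, iters=20, rng=random.Random(0)):
    samples = [[rng.random() < 0.5 for _ in range(n_bits)]  # uniform start
               for _ in range(pop)]
    for _ in range(iters):
        samples.sort(key=O)
        elite = samples[: int(pop * keep)]     # points below theta_{i+1}
        # density estimate: marginal probability of each bit among the elite
        marg = [sum(x[i] for x in elite) / len(elite) for i in range(n_bits)]
        samples = [[rng.random() < marg[i] for i in range(n_bits)]
                   for _ in range(pop)]
    return min(samples, key=O)
```

For example, minimizing the number of ones drives the marginals, and hence the samples, toward the all-zeros string.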
Distribution estimator
The effectiveness of the algorithm depends on
whether p_θ(x) can be successfully approximated,
and whether the difference between p_θi(x) and
p_θ{i+1}(x) is small enough.
De Bonet et al. introduced a quadratic-time
algorithm that approximates the distribution
using pairwise conditional probabilities and
unconditional probabilities.
Approximation
The true joint probability distribution is
p(X) = p(X_1 | X_2…X_n) p(X_2 | X_3…X_n) … p(X_{n−1} | X_n) p(X_n)
Given a permutation π = i_1 i_2 … i_n of 1…n, let
p_π(X) = p(X_{i_1} | X_{i_2}) p(X_{i_2} | X_{i_3}) … p(X_{i_{n−1}} | X_{i_n}) p(X_{i_n})
Ideally, we want to search over all permutations π
to find the one whose p_π is closest to the true
distribution, but there are too many of them.
A greedy algorithm
i_n = argmin_j h’(X_j)
For k = n−1, n−2, …, 2, 1:
i_k = argmin_j h’(X_j | X_{i_{k+1}}), over the j not yet chosen
where h’(·) is the empirical entropy.
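A sketch of this greedy construction over binary samples, with the empirical (conditional) entropies computed from counts; the helper names are ours:

```python
import math
from collections import Counter

# Greedy permutation search in the spirit of De Bonet et al.: pick the
# lowest-entropy variable as i_n, then repeatedly pick the variable with
# the lowest conditional entropy given the previous pick.

def h(samples, j):
    """Empirical entropy of variable j (in bits)."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(x[j] for x in samples).values())

def h_cond(samples, j, k):
    """Empirical conditional entropy h'(X_j | X_k) = h(X_j, X_k) - h(X_k)."""
    n = len(samples)
    joint = -sum(c / n * math.log2(c / n)
                 for c in Counter((x[j], x[k]) for x in samples).values())
    return joint - h(samples, k)

def greedy_chain(samples, n_vars):
    order = [min(range(n_vars), key=lambda j: h(samples, j))]  # i_n
    while len(order) < n_vars:                                 # i_{n-1} ... i_1
        rest = [j for j in range(n_vars) if j not in order]
        order.append(min(rest, key=lambda j: h_cond(samples, j, order[-1])))
    return order[::-1]  # permutation i_1 ... i_n
```

A constant variable (entropy 0) is placed last in the chain, and perfectly correlated variables end up adjacent, as intended.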
Results
Beats several standard optimization
algorithms (e.g. PBIL, RHC, GA) on the four
peaks, six peaks, and max k-coloring domains.
PBIL – population-based incremental learning
RHC – randomized hill climbing
GA – genetic algorithm
III) Reinforcement learning for scheduling
Zhang and Dietterich, 1995
Domain: NASA space shuttle payload processing.
Schedule several jobs; each job has a set of
partially ordered tasks; each task has a duration
and a list of resource requirements.
There are 35 different resources, each with many
units available. However, the units are divided
into pools, and a task has to draw its need for a
resource from a single pool.
NASA domain continued
Each job has a fixed launch date, but no
starting or ending date. Most of its tasks
must be performed before the launch date;
others take place after it.
Goal: find a feasible schedule of the jobs
with minimum duration.
The algorithm must also be able to repair a
schedule when unforeseen events happen.
Approach
Critical path: the tightest schedule, ignoring
the resource constraints (the only consideration
is the partial ordering of the tasks).
Resource dilation factor (RDF): can be regarded
as a scale-independent measure of the length of
the schedule.
Actions: Reassign-Pool and Move.
Approach, continued
Start from the critical path.
The reinforcement function R(s, a, s’) equals
−0.001 if s’ is not a feasible state, and
R(s, a, s’) = −RDF(s’, s_0) otherwise.
Reinforcement Learning
We learn a policy π, which tells us what action
(“local search move”) to take in every state.
We can define a value function f_π, where f_π(s) is
the cumulative reward we can get from s onward if
we follow π.
We hope to learn the optimal policy π*, but we
can learn f_π* (denoted f*) instead, because we
can look one step ahead.
TD(λ)
The value function is represented by a feed-forward
neural net f(s, W).
At each step, choose the best action according
to the current value function, and update the
weight vector:
δ_j = [f(s_{j+1}, W) + R(s_{j+1})] − f(s_j, W)
e_j = ∇_W f(s_j, W) + λ e_{j−1}
ΔW = α δ_j e_j
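These updates can be sketched over one trajectory, substituting a linear value function f(s, W) = W·φ(s) for the neural net so that ∇_W f(s, W) is simply φ(s):

```python
import numpy as np

# Sketch of the TD(lambda) update from the slide, with a linear value
# function f(s, W) = W . phi(s) in place of the feed-forward net.

def td_lambda_episode(phi, R, states, W, lam=0.7, alpha=0.01):
    """phi(s): feature vector of state s; R(s): reward on entering s."""
    e = np.zeros_like(W)                  # eligibility trace, e_{-1} = 0
    for s, s_next in zip(states, states[1:]):
        delta = (W @ phi(s_next) + R(s_next)) - W @ phi(s)  # delta_j
        e = phi(s) + lam * e              # e_j = grad_W f(s_j, W) + lam e_{j-1}
        W = W + alpha * delta * e         # Delta W = alpha delta_j e_j
    return W
```

The eligibility trace `e` lets reward information propagate back over several preceding states, which is the point of λ > 0.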
Results
Compared with the iterative repair (IR) method
previously used in the domain, temporal
difference (TD) scheduling finds schedules
3.9% shorter, which translates to 14 days if
the schedule lasts one year.
Approaches from the Search/SAT
Community
Two Strategies
Clause learning.
Both for backtrack search and for local search.
Clause weighting.
For local search.
Both strategies can be viewed as “changing the
objective function” (while maintaining the global
optima).
Clause learning
DPLL – branching and backtracking.
Learning as a pruning method: generate
implied clauses during search, and add them
to the clause database.
Clauses are generated by conflict analysis.
The technique is employed by state-of-the-art
SAT solvers, e.g. Chaff, relsat, GRASP.
DPLL with learning
while (1) {
    if (decide_next_branch()) {            // branching
        while (deduce() == conflict) {     // deducing
            blevel = analyze_conflict();   // learning
            if (blevel == 0)
                return UNSATISFIABLE;
            else
                back_track(blevel);        // backtracking
        }
    }
    else                                   // all variables got assigned
        return SATISFIABLE;
}
Conflict analysis
Learning is based on analysis of conflicts in the
implication graph.
[Implication graph example: a conflict on V18 at decision
level 5, with antecedents ¬V6(1), V11(5), ¬V17(1), V8(2),
¬V10(5), and V19(3); each literal is annotated with its
decision level.]
Learned clause: V17 + V8’ + V10 + V19’
Clause learning
Many schemes available for generating
clauses
Restarting is helpful in DPLL solvers
(Gomes et al., 1995). When the solver is restarted,
all learned clauses from previous runs are kept.
Clause Learning – Local Search
Similar to clause learning in DPLL solvers: add
new clauses during local search.
(Cha and Iwama, 1996)
The clauses added are one-step resolvents that
are unsat at the local minima.
This has a similar effect to increasing the weights
of unsat clauses.
New approach: add clauses that capture long-range
structure to speed up local search.
(Wei Wei and Selman, CP 2002)
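A one-step resolvent comes from a single resolution step on two clauses with complementary literals. A minimal sketch with clauses as sets of signed integers (our representation, not Cha and Iwama's):

```python
# One-step resolution, as used when adding clauses at a local minimum:
# two clauses with complementary literals on variable v resolve into a
# new implied clause. Clauses are sets of signed ints.

def resolve(c1, c2, v):
    """Resolvent of c1 (which contains v) and c2 (which contains -v)."""
    assert v in c1 and -v in c2
    return (c1 - {v}) | (c2 - {-v})

# e.g. (a or b) and (not b or c) resolve on b to give (a or c):
# resolve({1, 2}, {-2, 3}, 2) == {1, 3}
```

Any assignment falsifying the resolvent falsifies one of its parents, so adding it preserves the set of models while penalizing the current local minimum.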
Clause weighting
Used by local search solvers as a way to
“memorize” the traps they have encountered.
(Morris 1993; Kautz & Selman 1993)
When the search gets stuck, increase the weight
of each unsat clause.
Effectively changes the landscape of the search
space during search (learns a better objective
function).
Used by a range of efficient stochastic LS
algorithms, e.g. DLM (Wu and Wah, 2000) and
ESG (Schuurmans et al., 2001)
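A toy sketch of clause weighting in the spirit of Morris's breakout method (not DLM or ESG themselves): the objective is the total weight of unsatisfied clauses, and at a local minimum the weights of the unsat clauses are incremented, reshaping the landscape:

```python
import random

# Weighted local search sketch: greedy flips on a weighted objective;
# at a local minimum, increment the weights of the unsat clauses.

def weighted_local_search(clauses, n_vars, max_steps=10000,
                          rng=random.Random(0)):
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    weight = {i: 1 for i in range(len(clauses))}
    sat = lambda c: any(assign[abs(l)] == (l > 0) for l in c)
    cost = lambda: sum(weight[i] for i, c in enumerate(clauses) if not sat(c))

    for _ in range(max_steps):
        if cost() == 0:
            return assign                       # model found
        best_v, best_c = None, cost()
        for v in assign:                        # best single strictly
            assign[v] = not assign[v]           # improving flip
            if cost() < best_c:
                best_v, best_c = v, cost()
            assign[v] = not assign[v]
        if best_v is None:                      # local minimum: reweight,
            for i, c in enumerate(clauses):     # changing the landscape
                if not sat(c):
                    weight[i] += 1
        else:
            assign[best_v] = not assign[best_v]
    return None
```

On a tiny satisfiable formula, the reweighting lifts the search out of the local minimum and a model is found; efficient solvers replace the brute-force cost recomputation with incremental bookkeeping.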
Summary
Recent developments in the Machine Learning
community on using learning to speed up
search are encouraging.
However, so far, comparisons have been done
only against relatively naïve search methods.
There has been little (or no) follow-up in the
search/SAT community.
The success of relatively ad hoc strategies such
as clause learning and clause weighting suggests
that more advanced machine learning ideas
may have a significant payoff.
Key idea: Discover (“learn”) hidden
structure in the underlying search space.
It appears time to re-evaluate the machine
learning approaches by incorporating their
ideas into state-of-the-art solvers.