Learning to Speed Up Search


Bart Selman and Wei Wei


Introduction

In this talk, we'll survey some promising recent developments in using
learning methods to speed up search.

General methodology: (1) use machine learning techniques to uncover hidden
structure of the search space; (2) use this information to speed up search.

General Observations

Approaches fall into two classes:

A) Work in the machine learning community. We will discuss three examples.
Promising, but in general not compared against the best alternative
solution methods.

B) Approaches coming out of the search / SAT community. Powerful, but they
do not explicitly use state-of-the-art learning methods.

We will compare and contrast A & B.

Work From the Machine Learning Community

Three examples:

1. Learn good starting states for local search.
   STAGE (Boyan & Moore 1998)

2. Learn the structure of the search space directly.
   MIMIC (De Bonet et al. 1997)

3. Learn a new objective function that is easier for local search.
   (Zhang & Dietterich 1995)

I) STAGE algorithm

Boyan and Moore 1998

Idea: additional features of the current state may help local search.

Task: incorporate these features into improved evaluation functions that
help guide the search.

Method

The algorithm learns V^π(s), the expected outcome of running a local search
algorithm π from an initial state s.

Can this function be learned successfully?

Features

State feature vector: problem specific


Example: for SAT, the following features are useful:

1. % of clauses currently unsatisfied (= the objective function)
2. % of clauses satisfied by exactly 1 variable
3. % of clauses satisfied by exactly 2 variables
4. % of variables set to their naïve setting
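
As a concrete illustration, here is a small Python sketch of this feature
vector, under an assumed encoding (clauses as lists of DIMACS-style
literals, the assignment as a dict from variable to bool); reading "naïve
setting" as a variable's majority polarity in the formula is also an
assumption, not something the slides pin down.

from collections import Counter

def sat_features(clauses, assignment):
    # clauses: list of clauses, each a list of DIMACS-style literals
    #          (positive int = variable, negative int = its negation).
    # assignment: dict mapping variable -> bool.
    def is_true(lit):
        return assignment[abs(lit)] == (lit > 0)

    n = len(clauses)
    sat_counts = [sum(is_true(l) for l in c) for c in clauses]

    f1 = sum(1 for k in sat_counts if k == 0) / n  # % unsat (= obj function)
    f2 = sum(1 for k in sat_counts if k == 1) / n  # % satisfied by exactly 1
    f3 = sum(1 for k in sat_counts if k == 2) / n  # % satisfied by exactly 2

    # "Naïve setting": here, the polarity a variable takes in the majority
    # of its occurrences in the formula (an assumption).
    polarity = Counter()
    for c in clauses:
        for l in c:
            polarity[abs(l)] += 1 if l > 0 else -1
    f4 = sum(1 for v in assignment
             if assignment[v] == (polarity[v] >= 0)) / len(assignment)

    return [f1, f2, f3, f4]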

Learner

Fitter: can be any function approximator; polynomial regression is used in
practice.

Training data: generated on the fly; every local-search trajectory produces
a series of new training examples.

Restrictions on π: it must terminate, and it must be Markovian.


Diagram of STAGE

STAGE alternates between two phases: run π to optimize the original
objective (each trajectory produces new training data for V^π), then
hillclimb on the learned V^π (which produces good start states for π).
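
For concreteness, a minimal Python sketch of this loop, under stated
assumptions: run_pi, obj, features, and hillclimb_on are hypothetical
problem-specific helpers (the base local search π, the objective being
minimized, the state feature map, and a hillclimber over an arbitrary
evaluation function), and scikit-learn's LinearRegression stands in for the
fitter.

import numpy as np
from sklearn.linear_model import LinearRegression

def stage(start, run_pi, obj, features, hillclimb_on, n_rounds=10):
    X, y = [], []                        # accumulated training data for V_pi
    fitter, s, best = LinearRegression(), start, start
    for _ in range(n_rounds):
        # (1) Run pi from s to optimize the objective; every state on the
        #     trajectory yields a training pair (features, eventual outcome).
        traj = run_pi(s)
        outcome = obj(traj[-1])
        X.extend(features(t) for t in traj)
        y.extend(outcome for _ in traj)
        fitter.fit(np.array(X), np.array(y))
        if obj(traj[-1]) < obj(best):
            best = traj[-1]
        # (2) Hillclimb on the learned V_pi (lower predicted outcome is
        #     better) to produce a good start state for the next round.
        s = hillclimb_on(lambda t: fitter.predict(np.array([features(t)]))[0],
                         traj[-1])
    return best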

Results

Works in many domains, such as bin-packing, channel routing, and SAT.

On SAT, introducing the STAGE learner into WalkSAT reduces the number of
unsatisfied clauses on the par32 benchmarks from 9 to 1.

Discussion

Is the learned function a good approximation to V^π(s)?

Somewhat unclear. ("Worrisome": linear regression performs better than
quadratic regression, even though the latter should give a better
approximation. Learning does help, however.)

Why not learn a better objective function and search on that function
directly (clause weighting)? (Zhang and Dietterich, 3rd example.)


II) MIMIC

De Bonet et al., 1997

MIMIC learns a probability density
distribution over the search space by
repeated and “clever” sampling.

The purpose of retaining this density
distribution is to communicate information
about the search space from one iteration of
the search to the next.

The idea in more detail

If we know nothing about a search space, we look for its minimum by
generating points from a uniform distribution over all inputs.

Less work is necessary if we know the distribution p_θ(x), which is uniform
over those inputs x with objective O(x) ≤ θ, and has probability 0
elsewhere.

In particular, the task is trivial when we know the distribution p_θ*(x),
where θ* = min_x O(x).

MIMIC algorithm

Start by generating samples from the uniform distribution, and find the
median fitness θ_0 of these samples. Then iterate:

1. Calculate the density estimator of p_{θ_i}(x).
2. Generate more samples from the estimate of p_{θ_i}(x).
3. Let θ_{i+1} be the Nth percentile of the samples; retain only the points
   with objective below θ_{i+1}.
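
A sketch of this loop in Python (for minimization), assuming a hypothetical
fit_density helper that fits a density model to the retained points and
exposes a .sample(n) method returning a list (De Bonet et al. use the
pairwise chain model of the next slides); sample_uniform and obj are
likewise problem-specific assumptions.

import numpy as np

def mimic(sample_uniform, fit_density, obj, n_samples=1000,
          percentile=50, n_iters=50):
    # Start from the uniform distribution over inputs.
    points = sample_uniform(n_samples)
    best = min(points, key=obj)
    for _ in range(n_iters):
        scores = np.array([obj(x) for x in points])
        theta = np.percentile(scores, percentile)      # theta_{i+1}
        # Retain only the points with objective at or below theta ...
        points = [x for x, f in zip(points, scores) if f <= theta]
        # ... fit the density estimator of p_theta(x), and resample from it.
        points = fit_density(points).sample(n_samples)
        best = min(points + [best], key=obj)
    return best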

Distribution estimator

The effectiveness of the algorithm depends on whether p_θ(x) can be
successfully approximated, and whether the difference between p_{θ_i}(x)
and p_{θ_{i+1}}(x) is small enough.

De Bonet et al. introduced a quadratic-time algorithm that approximates the
distribution using only pairwise conditional probabilities and
unconditional probabilities.

Approximation

The true joint probability distribution is

  p(X) = p(X_1 | X_2 ... X_n) p(X_2 | X_3 ... X_n) ... p(X_{n-1} | X_n) p(X_n)

Given a permutation π = i_1 i_2 ... i_n of 1 ... n, let

  p_π(X) = p(X_{i_1} | X_{i_2}) p(X_{i_2} | X_{i_3}) ... p(X_{i_{n-1}} | X_{i_n}) p(X_{i_n})

Ideally, we would search over all π's to find the one closest to the true
distribution, but there are too many of them.

A greedy algorithm

  i_n = argmin_j h'(X_j)

  For k = n-1, n-2, ..., 2, 1:
      i_k = argmin_j h'(X_j | X_{i_{k+1}})

where h'( ) is the empirical entropy, and j ranges over the variables not
yet chosen.
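
This greedy construction is easy to state in code. Below is a sketch for
binary variables, computing empirical (conditional) entropies directly from
the retained samples; the (m, n) 0/1 numpy array encoding is an assumption.

import numpy as np

def entropy(col):
    # Empirical entropy of a binary 0/1 column.
    p = np.mean(col)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def cond_entropy(col, given):
    # Empirical conditional entropy h(X | Y) for binary columns.
    h = 0.0
    for v in (0, 1):
        mask = (given == v)
        if mask.any():
            h += mask.mean() * entropy(col[mask])
    return h

def greedy_chain(samples):
    # samples: (m, n) 0/1 array of the retained points.
    n = samples.shape[1]
    remaining = set(range(n))
    # i_n: the variable with the smallest unconditional empirical entropy.
    order = [min(remaining, key=lambda j: entropy(samples[:, j]))]
    remaining.remove(order[0])
    # i_k = argmin_j h(X_j | X_{i_{k+1}}) over variables not yet placed.
    while remaining:
        prev = order[-1]
        nxt = min(remaining,
                  key=lambda j: cond_entropy(samples[:, j], samples[:, prev]))
        order.append(nxt)
        remaining.remove(nxt)
    return list(reversed(order))       # the permutation i_1, ..., i_n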

Results

Beats several standard optimization algorithms (PBIL, RHC, GA) in the four
peaks, six peaks, and max k-coloring domains.

PBIL: standard population-based incremental learning
RHC: randomized hill climbing
GA: genetic algorithm

III) Reinforcement learning for scheduling

Zhang and Dietterich, 1995

Domain: space shuttle payload processing at NASA.

The task is to schedule several jobs; each job has a set of partially
ordered tasks; each task has a duration and a list of resource
requirements.

There are 35 different resources, each of which has many units available.
However, the units are divided into pools, and a task has to draw its
requirement for a resource from a single pool.

NASA domain continued

Each job has a fixed launch date, but no starting and ending dates. Most of
its tasks are to be performed before the launch date; others take place
after it.

Goal: find a feasible schedule of the jobs with minimum duration.

The algorithm must also be able to repair a schedule when an unforeseen
event happens.

Approach

Critical path: the tightest schedule obtained by ignoring the resource
constraints (the only consideration is the partial ordering of the tasks).

Resource dilation factor (RDF): can be regarded as a scale-independent
measure of the length of the schedule.

Actions: Reassign-Pool and Move.

Approach, continued

Start from the critical path.

The reinforcement function R(s, a, s') equals -0.001 if s' is not a
feasible state, and -RDF(s', s_0) otherwise.

Reinforcement Learning

We learn a policy π, which tells us what action ("local search move") to
take in every state.

We can define a value function f^π, where f^π(s) is the cumulative reward
we can collect from s onward if we follow π.

We hope to learn the optimal policy π*, but we can learn f^{π*} (denoted
f*) instead, because we can look one step ahead.

TD(λ)

The value function is represented by a feed-forward neural net f(s, W).

At each step, choose the best action according to the current value
function, and update the weight vector:

  δ_j = [f(s_{j+1}, W) + R(s_{j+1})] - f(s_j, W)

  e_j = ∇_W f(s_j, W) + λ e_{j-1}

  ΔW = α δ_j e_j
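
A sketch of these updates, with one simplification to keep it short: a
linear value function f(s, W) = W · features(s) stands in for the paper's
feed-forward net, so ∇_W f(s, W) is just features(s). The helpers actions,
step, and reward are hypothetical, and features is assumed to return a
numpy vector.

import numpy as np

def td_lambda_episode(start, features, actions, step, reward, W,
                      lam=0.7, alpha=0.01, n_steps=100):
    # One episode of the TD(lambda) updates above, with a linear value
    # function f(s, W) = W . features(s), so grad_W f(s, W) = features(s).
    s, e = start, np.zeros_like(W)         # e is the eligibility trace
    for _ in range(n_steps):
        # Greedy one-step lookahead under the current value function.
        s_next = max((step(s, a) for a in actions(s)),
                     key=lambda t: reward(t) + W @ features(t))
        # delta_j = [f(s_{j+1}, W) + R(s_{j+1})] - f(s_j, W)
        delta = (W @ features(s_next) + reward(s_next)) - W @ features(s)
        # e_j = grad_W f(s_j, W) + lambda * e_{j-1}
        e = features(s) + lam * e
        # Delta W = alpha * delta_j * e_j
        W = W + alpha * delta * e
        s = s_next
    return W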

Results

Compared with the iterative repair (IR) method previously used in this
domain, temporal difference (TD) scheduling finds schedules 3.9% shorter,
which translates to 14 days on a schedule lasting one year.

Approaches from the Search/SAT Community

Two Strategies

Clause learning.
  Both for backtrack search and for local search.

Clause weighting.
  For local search.

Both strategies can be viewed as "changing the objective function" (while
maintaining the global optima).



Clause learning

DPLL: branching and backtracking.

Learning serves as a pruning method: generate implied clauses during
search, and add them to the clause database.

Clauses are generated by conflict analysis.

The technique is employed by state-of-the-art SAT solvers, e.g. Chaff,
rel-sat, and GRASP.

DPLL with learning

while (1) {
    if (decide_next_branch()) {              // branching
        while (deduce() == conflict) {       // deducing
            blevel = analyze_conflict();     // learning
            if (blevel == 0)
                return UNSATISFIABLE;
            else
                back_track(blevel);          // backtracking
        }
    }
    else                                     // all variables got assigned
        return SATISFIABLE;
}

Conflict analysis

Learning is based on analysis of conflicts.

[Figure: implication graph with nodes -V6(1), V11(5), V18(5), -V18(5),
-V17(1), V8(2), -V10(5), V19(3); decision levels in parentheses. The
conflict is the complementary pair V18 / -V18.]

Learned clause: V17 + V8' + V10 + V19
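
Conflict analysis admits many cut schemes; the sketch below implements only
the simplest one (the "decision" cut): follow antecedent edges backwards
from the conflicting literals to the decision assignments that caused them,
and return the clause forbidding that combination. The antecedents encoding
is an assumption, and real solvers such as Chaff use more refined cuts
(e.g., first UIP).

def learn_clause(antecedents, conflict_lits):
    # antecedents: dict mapping each assigned literal to the list of
    # literals that implied it (empty list for decision literals).
    seen, stack, decisions = set(), list(conflict_lits), set()
    while stack:
        lit = stack.pop()
        if lit in seen:
            continue
        seen.add(lit)
        if antecedents[lit]:
            stack.extend(antecedents[lit])   # implied: follow its reasons
        else:
            decisions.add(lit)               # a decision: part of the cause
    return [-l for l in decisions]           # clause = negation of the cause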


Clause learning

Many schemes are available for generating clauses.

Restarting is helpful in DPLL solvers (Gomes et al., 1995). When the solver
restarts, all learned clauses from previous runs are kept.

Clause Learning: Local Search

Similar to clause learning in DPLL solvers: add new clauses during local
search. (Cha and Iwama, 1996)

The clauses added are one-step resolvents that are unsatisfied at the local
minima.

This has a similar effect to increasing the weights of unsatisfied clauses.

New approach: add clauses that capture long-range structure to speed up
local search. (Wei Wei and Selman, CP 2002)
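
A sketch of the resolvent-addition step under an assumed encoding (clauses
as a set of frozensets of DIMACS-style literals, assignment as a dict from
variable to bool): at a local minimum, resolve each falsified clause
against clauses containing a complementary literal, keeping only resolvents
that are themselves unsatisfied under the current assignment.

def add_resolvents(clauses, assignment):
    def is_true(lit):
        return assignment[abs(lit)] == (lit > 0)

    def unsat(clause):
        return not any(is_true(l) for l in clause)

    new = set()
    for c1 in [c for c in clauses if unsat(c)]:   # falsified clauses
        for c2 in clauses:
            for lit in c1:
                if -lit in c2:                    # resolvable on lit
                    resolvent = (c1 - {lit}) | (c2 - {-lit})
                    # Keep only one-step resolvents that are unsat here.
                    if unsat(resolvent) and resolvent not in clauses:
                        new.add(resolvent)
    return clauses | new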

Clause weighting

Used by local search solvers as a way to "memorize" the traps the search
has encountered. (Morris 1993; Kautz & Selman 1993)

When the search gets stuck, the clause weights are updated.

This effectively changes the landscape of the search space during the
search (i.e., learns a better objective function).

Used by a range of efficient stochastic local search algorithms, e.g. DLM
(Wu and Wah, 2000) and ESG (Schuurmans et al., 2001).
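
A minimal sketch of the idea, in the spirit of Morris's breakout method:
local search minimizes the total weight of unsatisfied clauses, and at a
local minimum the weights of the currently unsatisfied clauses are
incremented. The greedy single-flip move and the encoding follow the
earlier sketches and are assumptions, not DLM/ESG specifics.

def weighted_local_search_step(clauses, weights, assignment):
    # clauses: list of clauses (lists of DIMACS-style literals);
    # weights: parallel list of ints; assignment: dict var -> bool.
    def is_true(lit, a):
        return a[abs(lit)] == (lit > 0)

    def weighted_unsat(a):
        return sum(w for c, w in zip(clauses, weights)
                   if not any(is_true(l, a) for l in c))

    current = weighted_unsat(assignment)
    # Best single flip under the *weighted* objective.
    flips = {v: weighted_unsat({**assignment, v: not assignment[v]})
             for v in assignment}
    v_best = min(flips, key=flips.get)
    if flips[v_best] < current:
        assignment[v_best] = not assignment[v_best]   # downhill move
    else:
        # Local minimum: bump the weight of every unsatisfied clause,
        # reshaping the landscape ("memorizing" the trap).
        for i, c in enumerate(clauses):
            if not any(is_true(l, assignment) for l in c):
                weights[i] += 1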

Summary

Recent developments in the machine learning community on using learning to
speed up search are encouraging.

However, so far, comparisons have been made only against relatively naïve
search methods.

There has been little (or no) follow-up in the search/SAT community.

The success of relatively ad hoc strategies such as clause learning and
clause weighting suggests that more advanced machine learning ideas may
have a significant payoff.

Key idea: discover ("learn") hidden structure in the underlying search
space.

It appears time to re-evaluate the machine learning approaches by
incorporating their ideas into state-of-the-art solvers.