Learning to Support Constraint Programmers


Susan L. Epstein [1], Gene Freuder [2], and Rick Wallace [2]

[1] Department of Computer Science, Hunter College and The Graduate Center of The City University of New York
[2] Cork Constraint Computation Centre

Facts about ACE

- Learns to solve constraint satisfaction problems
- Learns search heuristics
- Can transfer what it learns on simple problems to solve more difficult ones
- Can export knowledge to ordinary constraint solvers
- Both a learner and a test bed
- Heuristic but complete: will find a solution, eventually, if one exists
- Guarantees high-quality, not optimal, solutions
- Begins with substantial domain knowledge

Outline

- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representations

- Constraint satisfaction problem <X, D, C>: variables X, domains D, constraints C
- Solution: assign a value to every variable consistent with the constraints
- Many real-world problems can be represented and solved this way (design and configuration, planning and scheduling, diagnosis and testing)

The Problem Space

Variables: A, B, C, D
Domains: A ∈ {1,2,3}; B ∈ {1,2,4,5,6}; C ∈ {1,2}; D ∈ {1,3}
Constraints: A = B; A > D; C ≠ D

[Constraint graph: edge A-B admits the value pairs (1 1) (2 2); edge A-D admits (2 1) (3 1) (3 2); edge C-D admits (1 3) (2 1) (2 3).]
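
To make the example concrete, here is a minimal Python sketch of the problem as plain data (illustrative only, not ACE's internal representation; the names mirror the slide):

```python
# A minimal sketch of the example CSP <X, D, C>.
variables = ["A", "B", "C", "D"]
domains = {"A": {1, 2, 3}, "B": {1, 2, 4, 5, 6}, "C": {1, 2}, "D": {1, 3}}
constraints = {
    ("A", "B"): lambda a, b: a == b,   # A = B
    ("A", "D"): lambda a, d: a > d,    # A > D
    ("C", "D"): lambda c, d: c != d,   # C != D
}

# Recover the allowed value pairs shown on each edge of the graph:
for (x, y), ok in constraints.items():
    print(x, y, [(u, v) for u in sorted(domains[x])
                 for v in sorted(domains[y]) if ok(u, v)])
# A B [(1, 1), (2, 2)]
# A D [(2, 1), (3, 1), (3, 2)]
# C D [(1, 3), (2, 1), (2, 3)]
```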

A Challenging Domain

- Constraint solving is NP-hard
- Problem class parameters: <n, k, d, t>
  n = number of variables
  k = maximum domain size
  d = edge density (% of possible constraints)
  t = tightness (% of value pairs excluded)
- Complexity peak: values of d and t that make problems hardest
- Heavy-tailed difficulty distribution [Gomes et al., 2002]
- A problem may have multiple solutions or none
- Unexplored choices may be good
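
As a sketch (assuming a standard random binary-CSP model; the deck does not specify its exact generator), a problem from class <n, k, d, t> can be sampled like this:

```python
import itertools
import random

def random_csp(n, k, d, t, seed=0):
    """Sample one problem from class <n, k, d, t>: n variables with
    domains of size k, a fraction d of all possible binary constraints,
    each excluding a fraction t of its value pairs."""
    rng = random.Random(seed)
    domains = {x: set(range(k)) for x in range(n)}
    all_edges = list(itertools.combinations(range(n), 2))
    edges = rng.sample(all_edges, round(d * len(all_edges)))
    constraints = {}
    for edge in edges:
        pairs = list(itertools.product(range(k), repeat=2))
        rng.shuffle(pairs)
        # exclude the first t*k*k pairs; the rest remain allowed
        constraints[edge] = set(pairs[round(t * len(pairs)):])
    return domains, constraints
```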

Finding a Path to a Solution

- A path is a sequence of decision pairs (select variable, assign value)
- Optimal length: 2n decisions for n variables
- For n variables with domain size d, there are (d+1)^n possible states

[Search-tree figure: alternating variable selections (e.g. B) and value assignments (e.g. A=2, C=2, D=3), with dead ends marked "No", ending at a solution.]

Solution Method

Search from the initial state to a goal state.

[Figure: the example search tree. Domains: A ∈ {1,2,3}; B ∈ {1,2,4,5,6}; C ∈ {1,2}; D ∈ {1,3}. Constraints: A = B; A > D; C ≠ D. After B=1, A=1, C=1, both remaining choices for D (D=1, D=3) fail ("No").]
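
The search the figure depicts can be written as a short chronological-backtracking routine. This is a sketch reusing the domains and constraints from the earlier snippet, not ACE's actual solver:

```python
def consistent(assignment, constraints):
    # Every constraint whose endpoints are both bound must hold.
    return all(ok(assignment[x], assignment[y])
               for (x, y), ok in constraints.items()
               if x in assignment and y in assignment)

def solve(assignment, domains, constraints):
    if len(assignment) == len(domains):
        return dict(assignment)              # all variables bound: a solution
    var = next(x for x in domains if x not in assignment)  # select a variable
    for value in sorted(domains[var]):       # assign a value
        assignment[var] = value
        if consistent(assignment, constraints):
            result = solve(assignment, domains, constraints)
            if result is not None:
                return result
        del assignment[var]                  # retract and try the next value
    return None                              # dead end: force retraction above

print(solve({}, domains, constraints))
# {'A': 2, 'B': 2, 'C': 2, 'D': 1}, the solution reached on the slides
```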

Consistency Maintenance

- Some values may initially be inconsistent
- Value assignment can restrict domains

[Figure: on the example, initially inconsistent values are struck out (A's 3; B's 4, 5, 6). Assigning B=1 then A=1 leaves no value for D ("No"); assigning B=2 restricts A's domain, until no other possibilities remain.]
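
Forward checking, one consistency method ACE knows (listed later under its domain knowledge), can be sketched as pruning the neighbors of the variable just bound, again over the example data:

```python
def forward_check(var, value, domains, constraints):
    """After binding var = value, drop unsupported values from the
    domains of var's neighbors.  Returns pruned copies of the
    domains, or None on a wipeout (some domain became empty)."""
    pruned = {x: set(d) for x, d in domains.items()}
    pruned[var] = {value}
    for (x, y), ok in constraints.items():
        if x == var:
            pruned[y] = {w for w in pruned[y] if ok(value, w)}
        elif y == var:
            pruned[x] = {w for w in pruned[x] if ok(w, value)}
        if not (pruned[x] and pruned[y]):
            return None                      # inconsistency: trigger retraction
    return pruned

print(forward_check("B", 2, domains, constraints))
# B=2 restricts A to {2} via A = B; C and D are untouched here
```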

When an inconsistency arises, a retraction method removes a value and returns to an earlier state.

Retraction

[Figure: the same dead end. After B=1, A=1, the domain of D empties ("No!"), and search must return to an earlier state, marked "Here!". Where's the error? Not the latest assignment: the culprit is the earlier choice B=1. With B=2 and A=2, the remaining domains (B ∈ {2}, C ∈ {1,2}, D ∈ {1}) still admit a solution.]

Variable Ordering

A good variable ordering can speed search.

[Figure: selecting A first and trying A=1 immediately empties D's domain (no value satisfies A > D), so the entire subtree is pruned after a single assignment.]
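
For instance, the widely used MinDomain rule (mentioned later as the "off-the-shelf" baseline) picks the unbound variable with the smallest current domain. A one-line sketch over the example data:

```python
def min_domain(unbound, domains):
    """MinDomain: select the unbound variable with the fewest
    remaining values.  A sketch of the classic heuristic; ACE
    weighs 21 variable-ordering heuristics, not just this one."""
    return min(unbound, key=lambda x: len(domains[x]))

print(min_domain(["A", "B", "C", "D"], domains))
# 'C' (2 values), ahead of A (3) and B (5); the tie with D breaks by order
```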

Value Ordering

A good value ordering can speed search too.

[Figure: trying A=2 first prunes B to {2} and D to {1}; then D=1 prunes C to {2}; B=2 and C=2 follow without any retraction.]

Solution: A=2, B=2, C=2, D=1
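
A value-ordering rule can be sketched the same way, e.g. preferring the value that leaves neighbors the most support. This is one illustrative rule, not ACE's (the deck counts 19 value-ordering heuristics):

```python
def order_values(var, domains, constraints):
    """Order var's values least-constraining first: prefer values
    that leave the most supported values in neighbors' domains."""
    def support(value):
        total = 0
        for (x, y), ok in constraints.items():
            if x == var:
                total += sum(1 for w in domains[y] if ok(value, w))
            elif y == var:
                total += sum(1 for w in domains[x] if ok(w, value))
        return total
    return sorted(sorted(domains[var]), key=support, reverse=True)

print(order_values("A", domains, constraints))
# [2, 1, 3]: A=2 keeps both A = B and A > D satisfiable, so try it first
```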

Constraint Solvers Know…

- Several consistency methods
- Several retraction methods
- Many variable ordering heuristics
- Many value ordering heuristics

…but the interactions among them are not well understood, nor is any one combination best for all problem classes.

Goals of the ACE Project

- Characterize problem classes
- Learn to solve classes of problems well
- Evaluate mixtures of known heuristics
- Develop new heuristics
- Explore the role of planning in solution

Outline

- The task: constraint satisfaction
- Performance results: ACE
- Reasoning mechanism
- Learning
- Representation

Experimental Design

- Specify problem class, consistency method, and retraction method
- Average performance across 10 runs
- Learn on L problems (halt at 10,000 steps)
- To-completion testing on T new problems
- During testing, use only the heuristics judged accurate during learning
- Evaluate performance on:
  - Steps to solution
  - Constraint checks
  - Retractions
  - Elapsed time

ACE Learns to Solve Hard Problems

- <30, 8, .24, .66>: near the complexity peak
- Learn on 80 problems
- 10 runs, binned in sets of 10 learning problems
- Discards 26 of 38 heuristics
- Outperforms MinDomain, an "off-the-shelf" heuristic

[Plot: steps to solution (500-2500) against bin # (1-8); means in blue, medians in red.]

ACE Rediscovers Brélaz Heuristic

- Graph coloring: assign different colors to adjacent nodes
- Graph coloring is a kind of constraint satisfaction problem
- Brélaz: minimize dynamic domain size, break ties with maximum forward degree
- ACE learned this consistently on different classes of graph coloring problems [Epstein & Freuder, 2001]

[Figure caption: color each vertex red, blue, or green so that every pair of adjacent vertices gets different colors.]
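
A sketch of the rediscovered rule, taking "forward degree" in its standard reading as the number of unbound neighbors (this is illustrative, not ACE's code; `neighbors` maps each variable to its neighbors in the constraint graph):

```python
def brelaz(unbound, domains, neighbors, assignment):
    """Brelaz: minimize dynamic domain size, break ties with
    maximum forward degree (number of unbound neighbors)."""
    def key(x):
        forward_degree = sum(1 for y in neighbors[x] if y not in assignment)
        return (len(domains[x]), -forward_degree)  # min domain, then max degree
    return min(unbound, key=key)
```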

ACE Discovers a New Heuristic

- "Maximize the product of degree and forward degree at the top of the search tree"
- Exported to several traditional approaches:
  - Min Domain
  - Min Domain/Degree
  - Min Domain + degree preorder
- Learned on small problems but tested in 10 runs on n = 150, domain size 5, density .05, tightness .24
- Reduced search tree size by 25%-96%

[Epstein, Freuder, Wallace, Morozov, & Samuels 2002]
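
The discovered heuristic is simple to state in code. In this sketch, "degree" is a variable's static degree in the constraint graph and "forward degree" its count of unbound neighbors; `neighbors` and `assignment` are the same hypothetical structures as in the Brélaz sketch:

```python
def product_heuristic(unbound, neighbors, assignment):
    """At the top of the search tree, select the variable that
    maximizes degree x forward degree."""
    def score(x):
        degree = len(neighbors[x])
        forward = sum(1 for y in neighbors[x] if y not in assignment)
        return degree * forward
    return max(unbound, key=score)
```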

Outline

- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representation

Constraint-Solving Heuristic

- Uses domain knowledge
- What problem classes does it work well on?
- Is it valid throughout a single solution?
- Can its dual also be valid?
- How can heuristics be combined?
- …and where do new heuristics come from?

FORR (For the Right Reasons)

- General architecture for learning and problem solving
- Multiple learning methods, multiple representations, multiple decision rationales
- Specialized by domain knowledge
- Learns useful knowledge to support reasoning
- Specifies whether a rationale is correct or heuristic
- Learns to combine rationales to improve problem solving

[Epstein 1992]

An Advisor Implements a Rationale

- Class-independent action-selection rationale
- Supports or opposes actions by comments
- Expresses opinion direction by strengths
- A limitedly-rational procedure

[Diagram: an Advisor maps the current problem state and the possible actions to comments of the form < strength, action, Advisor >.]
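
In code, a comment is just such a triple. This is a hypothetical rendering; the field names are illustrative, not ACE's:

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Comment:
    """One Advisor opinion < strength, action, Advisor >."""
    strength: float   # degree of support for (or opposition to) the action
    action: Any       # e.g. ("select", "A") or ("assign", "A", 2)
    advisor: str      # the rationale that produced this comment
```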

Advisor Categories

- Tier 1: rationales that correctly select a single action
- Tier 2: rationales that produce a set of actions directed to a subgoal
- Tier 3: heuristic rationales that select a single action

Choosing an Action

[Flow diagram: from the current state and the possible actions, tier 1 (reaction from perfect knowledge: Victory, T-11 … T-1n) decides alone when it can ("yes" → take action). Otherwise tier 2 (planning triggered by situation recognition: P-1, P-2 … P-k) may begin a plan. Otherwise tier 3 (heuristic reactions: T-31, T-32 … T-3m) votes, and the winning action is taken.]
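
A minimal sketch of this cycle, with tier-2 planning elided. The advisor functions are hypothetical and return lists of the Comment triples above; `weights` holds the learned tier-3 weights discussed in the next section:

```python
def choose_action(state, actions, tier1, tier3, weights):
    # Tier 1: rationales from perfect knowledge decide alone.
    for advisor in tier1:
        for c in advisor(state, actions):
            return c.action                  # a correct rationale has spoken

    # (Tier 2, situation-triggered planning, is omitted in this sketch.)

    # Tier 3: heuristic rationales vote; each strength is scaled by
    # its Advisor's learned weight and summed per action.
    totals = {a: 0.0 for a in actions}
    for advisor in tier3:
        for c in advisor(state, actions):
            totals[c.action] += weights[c.advisor] * c.strength
    return max(totals, key=totals.get)
```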

ACE's Domain Knowledge

- Consistency maintenance methods: forward checking, arc consistency
- Backtracking methods: chronological
- 21 variable ordering heuristics
- 19 value ordering heuristics
- 3 languages whose expressions have interpretations as heuristics
- Graph theory knowledge, e.g., connected, acyclic
- Constraint solving knowledge, e.g., "only one arc consistency pass is required on a tree"

An Overview of ACE

- The task: constraint satisfaction
- Performance results: ACE
- Reasoning mechanism
- Learning
- Representation

What ACE Learns

- A weighted linear combination of comment strengths
  - For voting in tier 3 only
  - Includes only valuable heuristics
  - Indicates the relative accuracy of valuable heuristics
- New, learned heuristics
- How to restructure tier 3
- When random choice is the right thing to do
- Knowledge that supports heuristics (e.g., typical solution path length)
- ACE learns from the trace of each solved problem:
  - Reward decisions on the perfect solution path
  - Shorter paths reward variable ordering
  - Longer paths reward value ordering
  - Blame digression-producing decisions in proportion to error
- A valuable Advisor's weight exceeds a baseline's

Digression-based Weight Learning

[Figure: a path of alternating (select a variable, assign a value) decisions leading to the solution, with a digression branching off and returning; decisions on the digression are blamed.]
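
A sketch of the update, under assumptions: the signature is hypothetical, each trace entry is taken to be (comments, on-solution-path flag, digression length), and the real scheme also compares each Advisor's weight against a baseline:

```python
def update_weights(trace, weights, rate=0.1):
    """Digression-based weight learning, sketched: reward Advisors
    whose comments supported decisions on the solution path; blame
    those that endorsed a digression, in proportion to its length."""
    for comments, on_path, digression in trace:
        for c in comments:
            w = weights.get(c.advisor, 0.0)
            if on_path:
                weights[c.advisor] = w + rate * c.strength
            else:
                weights[c.advisor] = w - rate * c.strength * digression
```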

Learning New Advisors

- An Advisor grammar over pairs of concerns:
  - Maximize or minimize
  - Product or quotient
  - Stage
- Monitor all expressions
- Use good ones collectively
- Use the best ones individually
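
The grammar can be enumerated mechanically. A sketch over hypothetical concern names, with a `stages` parameter that is an assumption (the deck only says expressions may apply at a stage, e.g. the top of the search tree):

```python
import itertools

def candidate_advisors(concerns, stages=("top-of-tree", "anywhere")):
    """Enumerate grammar expressions over pairs of concerns, e.g.
    ('maximize', 'product', 'degree', 'forward degree', 'top-of-tree'),
    the shape of the discovered heuristic described earlier."""
    for c1, c2 in itertools.combinations(concerns, 2):
        for direction in ("maximize", "minimize"):
            for op in ("product", "quotient"):
                for stage in stages:
                    yield (direction, op, c1, c2, stage)
```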

Outline

- The task: constraint satisfaction
- Performance results: ACE
- Reasoning mechanism
- Learning
- Representation


Representation of Experience

- State describes variables and value assignments, impossible future values, the prior state, connected components, constraint checks incurred, dynamic edges, and trees
- History of successful decisions…
- …plus other significant decisions become training examples

[Example state after B = 2:

Variable   Is   Can be   Cannot be
A               2        1
B          2
C               1, 2
D               1, 3

Checks incurred: 4. One acyclic component: A, C, D. Dynamic edges: AD, CD.]
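
A hypothetical container for those state features (field names are illustrative, not ACE's):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SearchState:
    assignments: dict                    # variable -> bound value
    domains: dict                        # variable -> values it can still take
    impossible: dict                     # variable -> values ruled out so far
    prior: Optional["SearchState"]       # the state this one extends
    components: list                     # connected components, incl. trees
    dynamic_edges: set = field(default_factory=set)  # edges among unbound vars
    checks: int = 0                      # constraint checks incurred
```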

Representation of Learned Knowledge

- Weights for Advisors
- Solution size distribution
- Latest error: the greatest number of variables bound at a retraction

ACE's Status Report

- 41 Advisors in tiers 1 and 3
- 3 languages in which to express additional Advisors
- 5 experimental planners
- Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
- Learns to solve hard problems
- Learns new heuristics
- Transfers to harder problems
- Divides and conquers problems
- Learns when not to reason

Current ACE Research

- Further weight-learning refinements
- Learn appropriate restart parameters
- More problem classes, consistency methods, retraction methods, planners, and Advisor languages
- Learn appropriate consistency checking methods
- Learn appropriate backtracking methods
- Learn to bias initial weights
- Metaheuristics to reformulate the architecture
- Modeling strategies

…and, coming soon, ACE on the Web

Acknowledgements

Continued thanks for their ideas and efforts go to:
Diarmuid Grimes
Mark Hennessey
Tiziana Ligorio
Anton Morozov
Smiljana Petrovic
Bruce Samuels
Students of the FORR study group
The Cork Constraint Computation Centre

and, for their support, to:
The National Science Foundation
Science Foundation Ireland

Is ACE Reinforcement Learning?

- Similarities:
  - Unsupervised learning through trial and error
  - Delayed rewards
  - Learns a policy
- Primary differences:
  - Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly… but ACE is unlikely to revisit a state; instead it learns how to act in any state
  - Q-learning learns state-action preferences… but ACE learns a policy that combines action preferences

How is ACE like STAGGER?

- Learns — STAGGER: a Boolean classifier; ACE: a search-control preference function for a sequence of decisions in a class of problems
- Represents — STAGGER: weighted booleans; ACE: a weighted linear function
- Supervised — STAGGER: yes; ACE: no
- New elements — STAGGER: failure-driven; ACE: success-driven
- Initial bias — STAGGER: yes; ACE: under construction
- Real attributes — STAGGER: yes; ACE: no

[Schlimmer 1987]

How is ACE like SAGE.2?

Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but…

- Learns on — SAGE.2: the same task; ACE: different problems in a class
- Represents — SAGE.2: symbolic rules; ACE: a weighted linear function
- Reinforces — SAGE.2: repeating rules; ACE: correct comments
- Failure response — SAGE.2: revise; ACE: reduce weight
- Proportional to error — SAGE.2: no; ACE: yes
- Compares states — SAGE.2: yes; ACE: no
- Random benchmarks — SAGE.2: no; ACE: yes
- Subgoals — SAGE.2: no; ACE: yes
- Learns during search — SAGE.2: yes; ACE: no

[Langley 1985]