Learning to Support Constraint Programmers
Susan L. Epstein (1), Gene Freuder (2), and Rick Wallace (2)
(1) Department of Computer Science, Hunter College and The Graduate Center of The City University of New York
(2) Cork Constraint Computation Centre
Facts about ACE
- Learns to solve constraint satisfaction problems
- Learns search heuristics
- Can transfer what it learns on simple problems to solve more difficult ones
- Can export knowledge to ordinary constraint solvers
- Both a learner and a test bed
- Heuristic but complete: will find a solution, eventually, if one exists
- Guarantees high-quality, not optimal, solutions
- Begins with substantial domain knowledge
Outline
- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representations
Constraint satisfaction problem <X, D, C>
- Solution: assign a value to every variable, consistent with the constraints
- Many real-world problems can be represented and solved this way (design and configuration, planning and scheduling, diagnosis and testing)
The Problem Space
- Variables: A, B, C, D
- Domains: A {1,2,3}; B {1,2,4,5,6}; C {1,2}; D {1,3}
- Constraints: A = B; A > D; C ≠ D
[Figure: constraint graph on A, B, C, D with allowed value pairs A = B: (1 1) (2 2); A > D: (2 1) (3 1) (3 2); C ≠ D: (1 3) (2 1) (2 3)]
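Because this example is tiny, its entire problem space can be checked by brute force. The sketch below (illustrative only, not part of ACE) enumerates every combination of values and keeps the consistent ones:

```python
# Brute-force enumeration of the running example: variables A, B, C, D
# with the slide's domains and constraints A = B, A > D, C != D.
from itertools import product

domains = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}

def consistent(a, b, c, d):
    return a == b and a > d and c != d

solutions = [
    dict(zip("ABCD", combo))
    for combo in product(*(domains[v] for v in "ABCD"))
    if consistent(*combo)
]
print(solutions)  # [{'A': 2, 'B': 2, 'C': 2, 'D': 1}]
```

Only one of the 3 × 5 × 2 × 2 = 60 candidate assignments survives all three constraints, which is why ordering heuristics that find it quickly matter on larger problems.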
A Challenging Domain
- Constraint solving is NP-hard
- Problem class parameters: <n, k, d, t>
  - n = number of variables
  - k = maximum domain size
  - d = edge density (% of possible constraints)
  - t = tightness (% of value pairs excluded)
- Complexity peak: values for d and t that make problems hardest
- Heavy-tailed distribution of difficulty [Gomes et al., 2002]
- A problem may have multiple solutions or none
- Unexplored choices may be good
Finding a Path to a Solution
- A sequence of decision pairs: (select a variable, assign it a value)
- Optimal length: 2n for n variables
- For n variables with domain size d, there are (d+1)^n possible states
[Figure: a search tree alternating variable selections and value assignments; some assignments (e.g., D=3) fail, while others (e.g., C=2, A=2) lead toward a solution]
Solution Method
- Search from the initial state to a goal state
[Figure: search on the running example (domains A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}; constraints A = B, A > D, C ≠ D). After B=1, A=1, and C=1, both D=1 and D=3 fail.]
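The search method in the figure can be sketched as chronological backtracking. This is a minimal illustration under assumed data shapes (constraints as predicates that hold vacuously until both endpoints are bound), not ACE's implementation:

```python
# Minimal chronological-backtracking solver for the running example.

def backtrack(assignment, domains, constraints, order):
    """Depth-first search: select the next variable, try each of its values."""
    if len(assignment) == len(order):
        return dict(assignment)               # every variable is bound: goal
    var = order[len(assignment)]              # select a variable
    for value in domains[var]:                # assign a value
        assignment[var] = value
        if all(check(assignment) for check in constraints):
            result = backtrack(assignment, domains, constraints, order)
            if result is not None:
                return result
        del assignment[var]                   # retract, try the next value
    return None                               # no value works: backtrack

def holds(a, v1, v2, rel):
    """A binary constraint is vacuously satisfied until both ends are bound."""
    return rel(a[v1], a[v2]) if v1 in a and v2 in a else True

constraints = [
    lambda a: holds(a, "A", "B", lambda x, y: x == y),   # A = B
    lambda a: holds(a, "A", "D", lambda x, y: x > y),    # A > D
    lambda a: holds(a, "C", "D", lambda x, y: x != y),   # C != D
]
domains = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
print(backtrack({}, domains, constraints, ["A", "B", "C", "D"]))
```

With the fixed order A, B, C, D the solver retracts several dead-end assignments (such as A=1, where no value of D satisfies A > D) before reaching the solution.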
Consistency Maintenance
- Some values may initially be inconsistent
- Value assignment can restrict domains
[Figure: on the running example, the assignment B=2 restricts A to {2} via A = B; after B=1 and A=1, the constraint A > D leaves no possibilities for D.]
- When an inconsistency arises, a retraction method removes a value and returns the search to an earlier state
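Forward checking, one of the consistency methods named later, can be sketched as follows. This is an illustrative version under assumed data shapes (constraints as sets of allowed pairs, copied from the slide), not ACE's code:

```python
# Forward checking: after an assignment, prune each constrained domain
# to the values still compatible with the new binding.

def forward_check(var, value, domains, allowed):
    """Return pruned copies of the domains, or None on a domain wipe-out.
    allowed[(x, y)] is the set of permitted value pairs for variables x, y."""
    pruned = {v: list(d) for v, d in domains.items()}
    pruned[var] = [value]
    for (x, y), pairs in allowed.items():
        if x == var:
            pruned[y] = [b for b in pruned[y] if (value, b) in pairs]
        elif y == var:
            pruned[x] = [a for a in pruned[x] if (a, value) in pairs]
        if not pruned[x] or not pruned[y]:
            return None  # an emptied domain signals an inconsistency
    return pruned

# Allowed value pairs as listed on the slide.
allowed = {
    ("A", "B"): {(1, 1), (2, 2)},          # A = B
    ("A", "D"): {(2, 1), (3, 1), (3, 2)},  # A > D
    ("C", "D"): {(1, 3), (2, 1), (2, 3)},  # C != D
}
domains = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
after_b2 = forward_check("B", 2, domains, allowed)
print(after_b2)  # B=2 restricts A to [2] via A = B
```

As in the figure, assigning A=1 instead would wipe out D's domain immediately, so `forward_check("A", 1, ...)` reports the inconsistency before any deeper search.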
Retraction
[Figure: on the running example, the wipe-out after B=1 and A=1 raises the question "Where's the error?" A retraction method must decide how far back to return: here, chronological backtracking undoes the most recent assignment, though the real culprit may lie earlier in the search.]
Variable Ordering
- A good variable ordering can speed search
[Figure: on the running example, selecting A first and assigning A=1 immediately wipes out D's domain via A > D, exposing the dead end at once.]
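A classic variable ordering heuristic, MinDomain (used later in this deck as a baseline), can be sketched in a few lines. This is an illustrative version, not ACE's implementation:

```python
# MinDomain: select the unassigned variable with the fewest remaining
# values, on the "fail-first" principle.

def min_domain(assignment, domains):
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned, key=lambda v: len(domains[v]))

domains = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
print(min_domain({}, domains))  # C and D tie at size 2; min() returns C
```

On the running example MinDomain prefers C or D (domain size 2) over A (size 3) and B (size 5), so a failure, if any, surfaces with less wasted work.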
Value Ordering
- A good value ordering can speed search too
[Figure: on the running example, choosing A=2 prunes B to {2} and D to {1}; then D=1, B=2, and C=2 complete the search without backtracking.]
Solution: A=2, B=2, C=2, D=1
Constraint Solvers Know…
- Several consistency methods
- Several retraction methods
- Many variable ordering heuristics
- Many value ordering heuristics
… but the interactions among them are not well understood, nor is any one combination best for all problem classes.
Goals of the ACE Project
- Characterize problem classes
- Learn to solve classes of problems well
- Evaluate mixtures of known heuristics
- Develop new heuristics
- Explore the role of planning in solution
Outline
- The task: constraint satisfaction
- Performance results of ACE
- Reasoning mechanism
- Learning
- Representation
Experimental Design
- Specify the problem class, consistency method, and retraction method
- Average performance across 10 runs
- Learn on L problems (halt at 10,000 steps)
- To-completion testing on T new problems
- During testing, use only heuristics judged accurate during learning
- Evaluate performance on:
  - Steps to solution
  - Constraint checks
  - Retractions
  - Elapsed time
ACE Learns to Solve Hard Problems
- <30, 8, .24, .66>, near the complexity peak
- Learn on 80 problems
- 10 runs, binned in sets of 10 learning problems
- Discards 26 of 38 heuristics
- Outperforms MinDomain, an "off-the-shelf" heuristic
[Chart: steps to solution (roughly 500 to 2500) by bin number 1–8; means in blue, medians in red]
ACE Rediscovers the Brélaz Heuristic
- Graph coloring: color each vertex (e.g., red, blue, or green) so that every pair of adjacent vertices gets different colors.
- Graph coloring is a kind of constraint satisfaction problem.
- Brélaz: minimize the dynamic domain, break ties with maximum forward degree.
- ACE learned this consistently on different classes of graph coloring problems. [Epstein & Freuder, 2001]
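The Brélaz selection rule can be sketched directly from its statement above. This is an illustrative version with assumed helper names, not ACE's learned form:

```python
# Brelaz rule for graph coloring: pick the uncolored vertex with the
# smallest dynamic domain (fewest colors still usable), breaking ties
# by maximum forward degree (most uncolored neighbors).

def brelaz_select(graph, coloring, colors):
    """graph: vertex -> set of neighbors; coloring: vertex -> color."""
    def remaining(v):                      # size of the dynamic domain
        used = {coloring[n] for n in graph[v] if n in coloring}
        return len(colors - used)
    def forward_degree(v):                 # uncolored neighbors
        return sum(1 for n in graph[v] if n not in coloring)
    uncolored = [v for v in graph if v not in coloring]
    return min(uncolored, key=lambda v: (remaining(v), -forward_degree(v)))

# A triangle a-b-c with a pendant vertex d; after coloring "a", vertices
# b and c tie on dynamic domain, and c wins on forward degree.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(brelaz_select(graph, {"a": "red"}, {"red", "blue", "green"}))  # c
```

The tuple key makes the tie-break explicit: smaller remaining domain first, then larger forward degree.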
ACE Discovers a New Heuristic
- "Maximize the product of degree and forward degree at the top of the search tree"
- Exported to several traditional approaches:
  - MinDomain
  - MinDomain/Degree
  - MinDomain + degree preorder
- Learned on small problems but tested in 10 runs on n = 150, domain size 5, density .05, tightness .24
- Reduced search tree size by 25%–96% [Epstein, Freuder, Wallace, Morozov, & Samuels 2002]
Outline
- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representation
A Constraint-Solving Heuristic
- Uses domain knowledge
- What problem classes does it work well on?
- Is it valid throughout a single solution?
- Can its dual also be valid?
- How can heuristics be combined?
… and where do new heuristics come from?
FORR (FOr the Right Reasons)
- A general architecture for learning and problem solving
- Multiple learning methods, multiple representations, multiple decision rationales
- Specialized by domain knowledge
- Learns useful knowledge to support reasoning
- Specifies whether each rationale is correct or heuristic
- Learns to combine rationales to improve problem solving
[Epstein 1992]
An Advisor Implements a Rationale
- A class-independent action-selection rationale
- Supports or opposes actions via comments
- Expresses the direction of an opinion by strengths
- A limitedly-rational procedure
[Figure: an Advisor maps the current problem state and the possible actions to comments of the form <strength, action, Advisor>]
Advisor Categories
- Tier 1: rationales that correctly select a single action
- Tier 2: rationales that produce a set of actions directed to a subgoal
- Tier 3: heuristic rationales that select a single action
Choosing an Action
[Diagram: the current state and possible actions flow through three tiers. Tier 1 (reaction from perfect knowledge: Victory, T1-1 … T1-n) takes an action immediately if it can decide. Otherwise tier 2 (planning triggered by situation recognition: P-1, P-2, … P-k) may begin a plan. Otherwise tier 3 (heuristic reactions: T3-1, T3-2, … T3-m) votes, and the winning action is taken.]
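Tier-3 voting can be sketched as a weighted tally of comment strengths. The data shapes and Advisor names below are assumptions for illustration, not ACE's API:

```python
# Tier-3 voting: each Advisor emits (strength, action) comments; the
# chosen action maximizes the weighted sum of strengths across Advisors.

def vote(comments, weights):
    """comments: advisor -> list of (strength, action); weights: advisor -> float."""
    totals = {}
    for advisor, pairs in comments.items():
        for strength, action in pairs:
            totals[action] = totals.get(action, 0.0) + weights[advisor] * strength
    return max(totals, key=totals.get)

# Hypothetical comments from two Advisors on a variable-selection decision.
comments = {
    "MinDomain": [(8, "select D"), (5, "select C")],
    "MaxDegree": [(9, "select C")],
}
weights = {"MinDomain": 0.6, "MaxDegree": 0.4}
print(vote(comments, weights))  # "select C": 0.6*5 + 0.4*9 = 6.6 beats 4.8
```

Here a strongly weighted Advisor's top choice loses to an action that two Advisors jointly support, which is the point of combining rationales rather than trusting any one heuristic.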
ACE's Domain Knowledge
- Consistency maintenance methods: forward checking, arc consistency
- Backtracking methods: chronological
- 21 variable ordering heuristics
- 19 value ordering heuristics
- 3 languages whose expressions have interpretations as heuristics
- Graph theory knowledge, e.g., connected, acyclic
- Constraint solving knowledge, e.g., "only one arc consistency pass is required on a tree"
An Overview of ACE
- The task: constraint satisfaction
- Performance results of ACE
- Reasoning mechanism
- Learning
- Representation
What ACE Learns
- A weighted linear combination of comment strengths
  - For voting in tier 3 only
  - Includes only valuable heuristics
  - Indicates the relative accuracy of valuable heuristics
- New, learned heuristics
- How to restructure tier 3
- When random choice is the right thing to do
- Knowledge that supports heuristics (e.g., typical solution path length)
- Learns from the trace of each solved problem
  - Rewards decisions on the perfect solution path
    - Shorter paths reward variable ordering
    - Longer paths reward value ordering
  - Blames digression-producing decisions in proportion to error
- A valuable Advisor's weight exceeds the baseline's
Digression-based Weight Learning
[Figure: a search tree alternating variable selection and value assignment; decisions on the path to the solution are rewarded, while decisions that begin a digression are blamed.]
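An illustrative form of the digression-based update (the exact update rule and learning rate here are assumptions, not ACE's published formula) might look like this:

```python
# Digression-based weight learning, sketched: a comment that supported a
# decision on the solution path is rewarded; a comment that supported a
# digression-producing decision is blamed in proportion to the
# digression's size (number of wasted steps).

def update_weight(weight, supported_on_path, digression_size, rate=0.1):
    if supported_on_path:
        return weight + rate                  # reward a correct comment
    return weight - rate * digression_size    # blame scales with the error

w = 1.0
w = update_weight(w, supported_on_path=True, digression_size=0)
w = update_weight(w, supported_on_path=False, digression_size=3)
print(round(w, 2))  # 1.0 + 0.1 - 0.3 = 0.8
```

The key property, proportionality of blame to error, means an Advisor that occasionally causes long digressions loses weight faster than one that causes many short ones of the same total cost spread thinly.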
Learning New Advisors
- An Advisor grammar over pairs of concerns
  - Maximize or minimize
  - Product or quotient
- Staged use:
  - Monitor all expressions
  - Use good ones collectively
  - Use the best ones individually
Outline
- The task: constraint satisfaction
- Performance results of ACE
- Reasoning mechanism
- Learning
- Representation
Representation of Experience
- State: describes variables and value assignments, impossible future values, the prior state, connected components, constraint checks incurred, dynamic edges, and trees
- History of successful decisions
- … plus other significant decisions become training examples

  Variable   Is   Can be   Cannot be
  A          —    1        2
  B          2    —        —
  C          —    1,2      —
  D          —    1,3      —

  Checks incurred: 4
  1 acyclic component: A, C, D
  Dynamic edges: AD, CD
Representation of Learned Knowledge
- Weights for Advisors
- Solution size distribution
- Latest error: the greatest number of variables bound at a retraction
ACE's Status Report
- 41 Advisors in tiers 1 and 3
- 3 languages in which to express additional Advisors
- 5 experimental planners
- Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
- Learns to solve hard problems
- Learns new heuristics
- Transfers to harder problems
- Divides and conquers problems
- Learns when not to reason
Current ACE Research
- Further weight-learning refinements
- Learn appropriate restart parameters
- More problem classes, consistency methods, retraction methods, planners, and Advisor languages
- Learn appropriate consistency checking methods
- Learn appropriate backtracking methods
- Learn to bias initial weights
- Metaheuristics to reformulate the architecture
- Modeling strategies
- … and, coming soon, ACE on the Web
Acknowledgements
Continued thanks for their ideas and efforts go to:
- Diarmuid Grimes
- Mark Hennessey
- Tiziana Ligorio
- Anton Morozov
- Smiljana Petrovic
- Bruce Samuels
- Students of the FORR study group
- The Cork Constraint Computation Centre
and, for their support, to:
- The National Science Foundation
- Science Foundation Ireland
Is ACE Reinforcement Learning?
- Similarities:
  - Unsupervised learning through trial and error
  - Delayed rewards
  - Learns a policy
- Primary differences:
  - Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly … but ACE is unlikely to revisit a state; instead it learns how to act in any state
  - Q-learning learns state-action preferences … but ACE learns a policy that combines action preferences
How is ACE like STAGGER?

                    STAGGER              ACE
  Learns            Boolean classifier   Search control preference function for a sequence of decisions in a class of problems
  Represents        Weighted booleans    Weighted linear function
  Supervised        Yes                  No
  New elements      Failure-driven       Success-driven
  Initial bias      Yes                  Under construction
  Real attributes   Yes                  No

[Schlimmer 1987]
How is ACE like SAGE.2?
Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but…

                          SAGE.2           ACE
  Learns on               Same task        Different problems in a class
  Represents              Symbolic rules   Weighted linear function
  Reinforces              Repeating rules  Correct comments
  Failure response        Revise           Reduce weight
  Proportional to error   No               Yes
  Compares states         Yes              No
  Random benchmarks       No               Yes
  Subgoals                No               Yes
  Learns during search    Yes              No

[Langley 1985]