# Learning to Support Constraint Programmers

AI and Robotics · Oct 15, 2013

Susan L. Epstein¹, Gene Freuder², and Rick Wallace²

¹ Department of Computer Science, Hunter College and The Graduate Center of The City University of New York
² Cork Constraint Computation Centre

ACE:

- Learns to solve constraint satisfaction problems
- Learns search heuristics
- Can *transfer* what it learns on simple problems to solve more difficult ones
- Can *export* knowledge to ordinary constraint solvers
- Is both a learner and a *test bed*
- Is heuristic but *complete*: will eventually find a solution, if one exists
- Guarantees high-quality, though not optimal, solutions
- Begins with substantial *domain knowledge*

## Outline

- Performance results
- Reasoning mechanism
- Learning
- Representations

A constraint satisfaction problem is a triple ⟨X, D, C⟩: variables X, domains D, and constraints C.

**Solution**: assign a value to every variable, consistent with the constraints.

Many real-world problems can be represented and solved this way (design and configuration, planning and scheduling, diagnosis and testing).

## The Problem Space

Variables: A, B, C, D

Domains: A ∈ {1,2,3}, B ∈ {1,2,4,5,6}, C ∈ {1,2}, D ∈ {1,3}

Constraints: A = B, A > D, C ≠ D

Allowed value pairs per constraint: A = B: (1 1) (2 2); A > D: (2 1) (3 1) (3 2); C ≠ D: (1 3) (2 1) (2 3)
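The example problem is small enough to solve by brute force. Here is a minimal sketch in Python (a hypothetical encoding for illustration, not ACE's own representation); enumerating every full assignment shows the problem has exactly one solution:

```python
from itertools import product

# The deck's example problem; binary constraints encoded as predicates.
domains = {
    "A": [1, 2, 3],
    "B": [1, 2, 4, 5, 6],
    "C": [1, 2],
    "D": [1, 3],
}
constraints = [
    (("A", "B"), lambda a, b: a == b),   # A = B
    (("A", "D"), lambda a, d: a > d),    # A > D
    (("C", "D"), lambda c, d: c != d),   # C ≠ D
]

def solutions():
    """Enumerate every full assignment and keep the consistent ones."""
    names = list(domains)
    for values in product(*domains.values()):
        assignment = dict(zip(names, values))
        if all(pred(assignment[x], assignment[y])
               for (x, y), pred in constraints):
            yield assignment

print(list(solutions()))  # → [{'A': 2, 'B': 2, 'C': 2, 'D': 1}]
```

Brute force checks all 3 × 5 × 2 × 2 = 60 assignments; the point of the rest of the deck is how search, consistency maintenance, and ordering heuristics avoid that.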

## A Challenging Domain

- Constraint solving is NP-hard
- Problem class parameters ⟨n, k, d, t⟩:
  - n = number of variables
  - k = maximum domain size
  - d = edge *density* (% of possible constraints)
  - t = *tightness* (% of value pairs excluded)
- **Complexity peak**: values of d and t that make problems hardest
- Heavy-tailed difficulty distribution [Gomes et al., 2002]
- A problem may have multiple solutions, or none
- Unexplored choices may be good
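A problem in class ⟨n, k, d, t⟩ can be generated roughly as follows (an illustrative model, not necessarily the generator used in the experiments): choose a fraction d of the possible edges, then exclude a fraction t of each chosen edge's value pairs:

```python
import random

def random_csp(n, k, d, t, seed=0):
    """Generate one problem from class <n, k, d, t>: n variables,
    domain size k, edge density d, tightness t."""
    rng = random.Random(seed)
    variables = range(n)
    domains = {v: list(range(k)) for v in variables}
    # Keep a fraction d of the n*(n-1)/2 possible constraint edges.
    possible = [(i, j) for i in variables for j in variables if i < j]
    edges = rng.sample(possible, round(d * len(possible)))
    constraints = {}
    for i, j in edges:
        # Exclude a fraction t of the k*k value pairs on each edge.
        pairs = [(a, b) for a in domains[i] for b in domains[j]]
        excluded = set(rng.sample(pairs, round(t * len(pairs))))
        constraints[(i, j)] = [p for p in pairs if p not in excluded]
    return domains, constraints

# A problem from the hard class used later in the deck.
domains, constraints = random_csp(n=30, k=8, d=0.24, t=0.66)
```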

## Finding a Path to a Solution

- A solution path is a sequence of decision pairs: select a variable, then assign it a value
- Optimal length: 2n decisions for n variables
- For n variables with domain size d, there are (d+1)ⁿ possible states (each variable is either unassigned or holds one of its d values)

[Figure: a search tree alternating variable-selection and value-assignment decisions; some branches (e.g., D=3) fail, while others (e.g., C=2, A=2) lead to a solution]

## Solution Method

Search from the initial state to a goal state.

[Figure: search over the example problem. Assigning B=1 forces A=1 (A = B); after C=1, neither D=1 nor D=3 satisfies both A > D and C ≠ D, so the branch fails]

## Consistency Maintenance

- Some values may initially be *inconsistent*: here A = B already rules out 3 for A and 4, 5, 6 for B
- Value assignment can further *restrict* domains: after B=2, only A=2 remains, and no other possibilities need be considered

[Figure: propagation of assignments to B through the example problem's domains]
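Forward checking, one of the consistency methods named later in the deck, can be sketched as follows (an illustrative encoding of the example problem, not ACE's implementation): after an assignment, remove unsupported values from the domains of constrained neighbors.

```python
# The example problem, with binary constraints as predicates.
domains = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
constraints = {
    ("A", "B"): lambda a, b: a == b,   # A = B
    ("A", "D"): lambda a, d: a > d,    # A > D
    ("C", "D"): lambda c, d: c != d,   # C ≠ D
}

def forward_check(domains, var, val):
    """Assign var := val, then prune each constrained neighbor's domain
    to the values still consistent with val. An emptied domain signals
    an inconsistency (a "No" in the slides)."""
    pruned = {v: list(vals) for v, vals in domains.items()}
    pruned[var] = [val]
    for (x, y), pred in constraints.items():
        if x == var:
            pruned[y] = [b for b in pruned[y] if pred(val, b)]
        elif y == var:
            pruned[x] = [a for a in pruned[x] if pred(a, val)]
    return pruned

print(forward_check(domains, "B", 2)["A"])  # B=2 leaves A with only [2]
```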

## Retraction

When an inconsistency arises, a **retraction method** removes a value and returns the search to an earlier state.

[Figure: B=1 forces A=1, and then no value of D survives. "Where's the error?" The culprit ("Here!") is the earlier choice of B's value, not the most recent assignment: with A=2 instead, the remaining domains (B {1,2}, C {1,2}, D {1,3}) still admit a solution]
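The retraction method ACE starts with, chronological backtracking, can be sketched as a recursive search (an illustrative version, not ACE's code): on a dead end, undo the most recent assignment and try that variable's next value, whether or not that assignment caused the error.

```python
# The example problem, with binary constraints as predicates.
domains = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
constraints = {
    ("A", "B"): lambda a, b: a == b,   # A = B
    ("A", "D"): lambda a, d: a > d,    # A > D
    ("C", "D"): lambda c, d: c != d,   # C ≠ D
}

def consistent(assignment):
    """Check every constraint whose variables are both assigned."""
    return all(pred(assignment[x], assignment[y])
               for (x, y), pred in constraints.items()
               if x in assignment and y in assignment)

def solve(assignment=None):
    """Depth-first search with chronological backtracking: returning
    None retracts the most recent assignment."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for val in domains[var]:
        trial = {**assignment, var: val}
        if consistent(trial) and (result := solve(trial)):
            return result
    return None

print(solve())  # → {'A': 2, 'B': 2, 'C': 2, 'D': 1}
```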

## Variable Ordering

A good variable ordering can speed search.

[Figure: choosing A first and assigning A=1 exposes failure immediately — no value of D can satisfy A > D]

## Value Ordering

A good value ordering can speed search too.

[Figure: choosing A=2 instead prunes B to {2} and keeps D viable; D=1, B=2, C=2 then complete the search without backtracking]

Solution: A=2, B=2, C=2, D=1

## Constraint Solvers Know…

- Several consistency methods
- Several retraction methods
- Many variable ordering heuristics
- Many value ordering heuristics

…but the *interactions* among them are not well understood, nor is any one combination best for all problem classes.

## Goals of the ACE Project

- **Characterize** problem classes
- **Learn** to solve classes of problems well
- **Evaluate** mixtures of known heuristics
- **Develop** new heuristics
- **Explore** the role of planning in solution

## Outline

- **Performance results** (ACE)
- Reasoning mechanism
- Learning
- Representation

## Experimental Design

- Specify problem class, consistency and retraction methods
- Average performance across 10 runs
- Learn on L problems (halt at 10,000 steps)
- To-completion testing on T new problems
- During testing, use only heuristics judged accurate during learning
- Evaluate performance on: steps to solution, constraint checks, retractions, elapsed time

## ACE Learns to Solve Hard Problems

- ⟨30, 8, .24, .66⟩, near the complexity peak
- Learn on 80 problems
- 10 runs, binned in sets of 10 learning problems
- Outperforms MinDomain, an "off-the-shelf" heuristic

[Chart: steps to solution (500–2500) by learning bin 1–8; means in blue, medians in red]

## ACE Rediscovers the Brélaz Heuristic

- **Graph coloring**: assign different colors to adjacent nodes; a kind of constraint satisfaction problem. (Example: color each vertex red, blue, or green so that every pair of adjacent vertices gets different colors.)
- **Brélaz**: minimize the dynamic domain; break ties with maximum forward degree.
- ACE learned this consistently on different classes of graph coloring problems. [Epstein & Freuder, 2001]
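The Brélaz rule can be stated directly in code (a hypothetical implementation for a graph given as an adjacency dict, not ACE's learned form):

```python
def brelaz_next(adjacency, palette, coloring):
    """Brélaz variable ordering for graph coloring: minimize the
    dynamic domain (colors not yet taken by a colored neighbor),
    breaking ties by maximum forward degree (uncolored neighbors)."""
    best_key, best_vertex = None, None
    for v in adjacency:
        if v in coloring:
            continue
        dynamic = [c for c in palette
                   if all(coloring.get(u) != c for u in adjacency[v])]
        forward = sum(1 for u in adjacency[v] if u not in coloring)
        key = (len(dynamic), -forward)   # smaller is better on both
        if best_key is None or key < best_key:
            best_key, best_vertex = key, v
    return best_vertex

# A 4-cycle with one chord: edges 0-1, 1-2, 2-3, 3-0, 0-2.
adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
coloring = {0: "red"}
# All of 1, 2, 3 have two colors left; 2 has the most uncolored neighbors.
print(brelaz_next(adjacency, ["red", "green", "blue"], coloring))  # → 2
```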

## ACE Discovers a New Heuristic

- "Maximize the product of degree and forward degree at the top of the search tree"
- **Exported** to ordinary solvers using Min Domain, Min Domain/Degree, and Min Domain + degree preorder
- Learned on small problems but **tested** in 10 runs on n = 150, domain size 5, density .05, tightness .24
- Reduced search tree size by 25% – 96%

[Epstein, Freuder, Wallace, Morozov, & Samuels 2002]
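The discovered heuristic is easy to state as code (a sketch over an adjacency-dict constraint graph; tie-breaking and the "top of the search tree" cutoff are details the slide leaves open):

```python
def degree_product_next(adjacency, assigned):
    """Pick the unassigned variable maximizing
    (static degree) * (forward degree), where forward degree counts
    neighbors that are still unassigned."""
    def score(v):
        degree = len(adjacency[v])
        forward = sum(1 for u in adjacency[v] if u not in assigned)
        return degree * forward
    return max((v for v in adjacency if v not in assigned), key=score)

# The example problem's constraint graph: edges A-B, A-D, C-D.
adjacency = {"A": {"B", "D"}, "B": {"A"}, "C": {"D"}, "D": {"A", "C"}}
print(degree_product_next(adjacency, set()))  # A and D tie at 2*2; A wins
```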

## Outline

- Performance results
- **Reasoning mechanism**
- Learning
- Representation

## Constraint-Solving Heuristics

A heuristic uses domain knowledge, but:

- What problem *classes* does it work well on?
- Is it valid *throughout* a single solution?
- Can its *dual* also be valid?
- How can heuristics be *combined*?

…and where do new heuristics come from?

## FORR (For the Right Reasons)

- General architecture for learning and problem solving
- Multiple learning methods, multiple representations, multiple decision rationales
- Specialized by *domain knowledge*
- Learns *useful knowledge* to support reasoning
- Specifies whether a rationale is correct or heuristic
- Learns to combine rationales to improve problem solving

[Epstein 1992]

## An Advisor Implements a Rationale

- A class-independent action-selection rationale
- A limitedly rational procedure: given the current problem state and the possible actions, it supports or opposes actions, expressing the direction of its opinions with *strengths*
- **Tier 1**: rationales that correctly select a single action
- **Tier 2**: rationales that produce a set of actions directed to a subgoal
- **Tier 3**: heuristic rationales that select a single action

## Choosing an Action

[Flowchart: the current state and possible actions pass first to Tier 1, reaction from perfect knowledge (Advisors T-11 … T-1n); if a decision is reached, take that action. Otherwise Tier 2, planning triggered by situation recognition (planners P-1 … P-k), may begin a plan. Otherwise Tier 3, heuristic reactions (T-31, T-32, … T-3m), votes, and the winning action is taken]

## ACE's Domain Knowledge

- **Consistency maintenance methods**: forward checking, arc consistency
- **Backtracking methods**: chronological
- 21 variable ordering heuristics
- 19 value ordering heuristics
- 3 languages whose expressions have interpretations as heuristics
- Graph theory knowledge, e.g., connected, acyclic
- Constraint solving knowledge, e.g., "only one arc consistency pass is required on a tree"

## An Overview of ACE

- Performance results
- Reasoning mechanism
- **Learning**
- Representation

## What ACE Learns

- A weighted linear combination of comment strengths:
  - for voting in tier 3 only
  - includes only valuable heuristics
  - indicates the relative accuracy of valuable heuristics
- New, learned heuristics
- How to restructure tier 3
- When random choice is the right thing to do
- Knowledge that supports heuristics (e.g., typical solution path length)

ACE learns from the trace of each solved problem:

- **Reward** decisions on the perfect solution path: shorter paths reward variable ordering; longer paths reward value ordering
- **Blame** digression-producing decisions in proportion to error
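The reward-and-blame scheme can be sketched as a weight update (a simplified illustration with hypothetical Advisor names and step sizes; ACE's actual digression-based update is more elaborate):

```python
def update_weights(weights, outcomes, step=0.1, max_digression=10):
    """After a problem is solved, reward each Advisor that supported an
    on-path decision; blame supporters of digression-producing decisions
    in proportion to the digression's size."""
    for advisor, on_path, digression in outcomes:
        if on_path:
            weights[advisor] += step
        else:
            weights[advisor] -= step * min(digression, max_digression) / max_digression
    return weights

# One Advisor backed an on-path decision; another caused a 5-step digression.
weights = update_weights({"min-domain": 1.0, "max-degree": 1.0},
                         [("min-domain", True, 0), ("max-degree", False, 5)])
# min-domain's weight rises; max-degree's falls by half a step.
```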

## Digression-Based Weight Learning

[Figure: a search path alternating "select a variable" and "assign a value" steps toward the solution; a *digression* is an off-path excursion that is eventually retracted]

- Advisor grammar over pairs of concerns: maximize or minimize; product or quotient; stage
- Monitor all expressions
- Use good ones collectively
- Use the best ones individually
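The grammar's space of candidate heuristics is small enough to enumerate directly (a sketch with assumed concern names; the deck does not list the actual concerns):

```python
from itertools import combinations

# Hypothetical concerns: metrics computed on a candidate variable.
concerns = ["static_degree", "forward_degree", "dynamic_domain_size"]

# Every expression the grammar generates: pair two concerns,
# combine them by product or quotient, then maximize or minimize.
expressions = [f"{direction} {a} {op} {b}"
               for a, b in combinations(concerns, 2)
               for op in ("*", "/")
               for direction in ("maximize", "minimize")]
print(len(expressions))  # 3 concerns → 3 pairs → 12 candidate heuristics
```

Each expression is then monitored as a provisional Advisor, and only those that prove accurate are promoted.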

## Outline

- Performance results
- Reasoning mechanism
- Learning
- **Representation**


## Representation of Experience

- A **state** describes variables and value assignments, impossible future values, the prior state, connected components, constraint checks incurred, dynamic edges, and trees
- A **history** of successful decisions, plus other significant decisions, becomes training examples

[Table: for each variable, the values it *is*, *can be*, and *cannot be* (e.g., B is 2; C can be 1 or 2; D can be 1 or 3); checks incurred: 4; one acyclic component: A, C, D]

## Representation of Learned Knowledge

- Solution size distribution
- **Latest error**: the greatest number of variables bound at a retraction

## ACE's Status Report

- 41 Advisors in tiers 1 and 3
- 5 experimental planners
- Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
- Learns to solve hard problems
- Learns new heuristics
- Transfers to harder problems
- Divides and conquers problems
- Learns when *not* to reason

## Current ACE Research

- Further *weight-learning* refinements
- Learn appropriate *restart* parameters
- More problem classes, consistency methods, retraction methods, planners, and Advisor languages
- Learn appropriate *consistency checking* methods
- Learn appropriate *backtracking* methods
- Learn to *bias* initial weights
- *Metaheuristics* to reformulate the architecture
- *Modeling* strategies

… and, coming soon, ACE on the Web

## Acknowledgements

Continued thanks for their ideas and efforts go to Diarmuid Grimes, Mark Hennessey, Tiziana Ligorio, Anton Morozov, Smiljana Petrovic, Bruce Samuels, the students of the FORR study group, and the Cork Constraint Computation Centre; and, for their support, to the National Science Foundation and Science Foundation Ireland.

## Is ACE Reinforcement Learning?

Similarities:

- Unsupervised learning through trial and error
- Delayed rewards
- Learns a policy

Primary differences:

- Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly, but ACE is unlikely to revisit a state; it learns how to *act* in any state
- Q-learning learns state-action preferences, but ACE learns a policy that *combines* action preferences

## How is ACE like STAGGER?

| | STAGGER | ACE |
| --- | --- | --- |
| Learns | Boolean classifier | Search-control preference function for a sequence of decisions in a class of problems |
| Represents | Weighted booleans | Weighted linear function |
| Supervised | Yes | No |
| New elements | Failure-driven | Success-driven |
| Initial bias | Yes | Under construction |
| Real attributes | Yes | No |

[Schlimmer 1987]

## How is ACE like SAGE.2?

Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but…

| | SAGE.2 | ACE |
| --- | --- | --- |
| Learns on | | Different problems in a class |
| Represents | Symbolic rules | Weighted linear function |
| Reinforces | Repeating rules | |
| Failure response | Revise | Reduce weight |
| Proportional to error | No | Yes |
| Compares states | Yes | No |
| Random benchmarks | No | Yes |
| Subgoals | No | Yes |
| Learns during search | Yes | No |

[Langley 1985]