Dimensions in Program Synthesis
Sumit Gulwani
(sumitg@microsoft.com)
Microsoft Research, Redmond
Invited Tutorial
FMCAD 2010
Lugano, Switzerland
•
What is Program Synthesis?
–
Synthesize an executable program from user intent expressed in the form of some constraints.
Automated Program Synthesis
Compilers | Synthesizers
Structured language input | Can accept a variety/mixed form of constraints (e.g., logic, examples, traces, partial programs)
Syntax-directed translation | Uses some kind of search
No new algorithmic insights | Discovers new algorithmic insights
•
Why today?
–
Natural goal given that computing has become accessible, but:
•
Fundamental “how” programming models have not changed.
•
Most people are not expert programmers.
–
Enabling technology is now available
•
Better search/logical reasoning techniques (SAT/SMT solvers)
•
Faster machines (good application for multi-cores)
Our techniques can synthesize a wide variety of algorithms/programs from logic and examples.
•
Undergraduate book algorithms (e.g., sorting, dynamic programming)
–
[Srivastava/Gulwani/Foster, POPL 2010]
•
Program Inverses (e.g., deserializers from serializers)
–
[Srivastava/Gulwani/Chaudhuri/Foster, MSR-TR-2010-34]
•
Graph Algorithms (e.g., bipartiteness check)
–
[Itzhaky/Gulwani/Immerman/Sagiv, OOPSLA 2010]
•
Bit-vector algorithms (e.g., turn off rightmost one bit)
–
[Jha/Gulwani/Seshia/Tiwari, ICSE 2010]
•
String Manipulating macros (e.g., “Helmut Veith” -> “Veith, H.”)
–
[Gulwani, POPL 2011]
•
Geometry Constructions (e.g., construct a regular hexagon given a side)
–
[Gulwani/Korthikanti/Tiwari, recent work]
Program Synthesis: Recent Success
•
Dimension 1: User Intent
•
Dimension 2: Search space
•
Dimension 3: Search Technique
•
Applications
–
Bit-vector Algorithms
–
String Manipulation Macros
–
Geometry Constructions
Outline
•
Logical specifications
–
Logical relations between inputs and outputs
•
Natural language
•
Input-output examples
•
Traces
•
Programs
Dimension 1: User Intent
Logical Specification: Example 1
Problem: Sorting
Logical relation between input array A and output array B of size n.
$\forall k : (0 \le k < n-1) \Rightarrow (B[k] \le B[k+1]) \;\wedge\; \forall k\, \exists j : (0 \le k < n-1) \Rightarrow (0 \le j < n \wedge B[j] = A[k])$
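The logical relation for sorting can be read as an executable check on a candidate output. The following is a minimal Python sketch, not part of the talk: the function name `satisfies_sort_spec` is illustrative, and both conjuncts are transcribed literally with the index bounds as they appear on the slide.

```python
def satisfies_sort_spec(A, B):
    """Check the two conjuncts of the slide's relation between input A and output B."""
    n = len(A)
    if len(B) != n:
        return False
    # Conjunct 1: for all k with 0 <= k < n-1, B[k] <= B[k+1].
    ordered = all(B[k] <= B[k + 1] for k in range(n - 1))
    # Conjunct 2: for all k with 0 <= k < n-1, some j with 0 <= j < n has B[j] == A[k].
    covered = all(any(B[j] == A[k] for j in range(n)) for k in range(n - 1))
    return ordered and covered
```

A synthesizer given a logical specification has to find a program whose outputs pass such a check on every input, rather than on a handful of tests.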
•
Advances in NLP allow mapping natural language to logic.
–
NL interfaces have been designed for database queries.
•
Natural language can be ambiguous.
–
This issue can be resolved by paraphrasing.
Natural Language
•
Advantages of Input-Output examples
–
Easy to provide; no need to remember syntax
–
Fewer chances of mistakes
•
What prevents a trivial table-lookup solution on input-output pairs (x_i, y_i)?
Switch x
Case x_1: return y_1
Case x_2: return y_2
:
Case x_n: return y_n
–
Restriction on search space!
•
How to select examples?
–
Interactive manner!
Input-Output Examples
1. User provides a few input-output examples I.
2. Synthesizer constructs a program P consistent with I.
3. Process may be repeated after adding new examples.
–
User-driven Interaction
•
User tests the program on other inputs I’. If a discrepancy is found, user provides a new input-output example.
–
Synthesizer-driven Interaction
•
If synthesizer finds another program P’ consistent with I, it asks user to provide output for a distinguishing input.
Typically few iterations are required
–
small teaching dimension [Goldman, Kearns, JCSS ‘95]
–
low Kolmogorov (descriptive) complexity
Interaction Model
•
A detailed step-by-step description of how the program should behave on a given input.
•
Easier to deal with than input-output examples.
–
Some synthesizers that accept input-output examples first generate traces.
•
A natural model in certain applications.
–
Programming by demonstration systems for end-users.
•
Intermediate states resulting from the user’s successive actions on a user interface constitute a valid trace.
–
Reverse engineering.
•
Convenient in certain scenarios.
–
E.g., consider describing Factorial(7).
•
Trace: 7*6*5*4*3*2 or Recursive Trace: 7*Factorial(6)
•
Final simplified output: 5040
Traces
•
Programs
–
Operators
•
Comparison operators
•
Combination of arithmetic and bitwise operators
•
APIs exported from a given library
–
SortList = Array2List ◦ SortArray ◦ List2Array
–
Control-flow
•
Given looping template
•
Bounded number of statements
•
Partial program with holes
•
Straight-line or loop-free
Dimension 2: Search Space
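The API-composition example SortList = Array2List ◦ SortArray ◦ List2Array can be illustrated with a small sketch. Everything below is an assumption made for illustration: the three library functions are hypothetical stand-ins, a "list" is modeled as a chain of (head, tail) cells ending in None, and an "array" as a Python list.

```python
from functools import reduce

def compose(*fs):
    """Right-to-left composition: compose(f, g, h)(x) == f(g(h(x)))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

def list2array(lst):
    # Walk the (head, tail) chain and collect the elements into an array.
    out = []
    while lst is not None:
        head, lst = lst
        out.append(head)
    return out

def sort_array(arr):
    return sorted(arr)

def array2list(arr):
    # Rebuild the chain back to front so the first element ends up at the head.
    lst = None
    for x in reversed(arr):
        lst = (x, lst)
    return lst

# The synthesized component is just a composition of existing library calls.
sort_list = compose(array2list, sort_array, list2array)
```

For example, `sort_list((3, (1, (2, None))))` yields `(1, (2, (3, None)))`. The search space here is the set of well-typed compositions of library functions.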
Parameterized by set of components/operators used.
•
Bitvector Algorithms
–
Components = Arithmetic + Bitwise operators
•
String Manipulation (or Text Editing) Macros
–
Components = editing commands (insert, locate, select, delete)
•
Geometrical Constructions
–
Components = ruler + compass
•
Unbounded data type manipulation
–
Components = linear arithmetic/set operators
–
[PLDI ’10, Viktor Kuncak et al., Complete Functional Synthesis]
•
API call sequences [PLDI ’05, Bodik et al., Jungloid Mining]
–
Components = API calls
Can be likened to putting together Jigsaw puzzle pieces.
Loop-free Programs
•
Programs
–
Operators
–
Control-flow
•
Grammars
–
Examples: Regular Expressions, DFAs, NFAs, Context Free Grammars, Regular Transducers
–
Applications: robotics/control systems, pattern recognition, computational linguistics/biology, data compression/mining, etc.
•
Logics
–
First-order logic + Fixed point
•
= PTIME algorithms over ordered structures such as strings, graphs
•
E.g., Graph Classifiers: Bipartite, Acyclic, Connected
Graph Computations: Shortest Path, Cycle, 2-coloring
Dimension 2: Search Space
Program Synthesis Techniques
•
Exhaustive search
–
Usually requires non-trivial optimizations.
–
Geometry algorithms, Mutual Exclusion algorithms
•
Reduction to SAT/SMT constraints
–
Can leverage engineering advances in recent off-the-shelf solvers.
–
Bit-vector algorithms, Graph algorithms, Program inverses
•
Version space algebra
–
Data-structures to efficiently represent and manipulate huge sets of programs consistent with given observations.
–
String Manipulation macros
•
Machine Learning
–
Bayesian Learning, Belief Propagation
–
QBF Solving
•
Search Space Dimension: Straight-line programs that use
–
Arithmetic Operators: +, −, *, /
–
Logical Operators: Bitwise and/or/not, Shift left/right
•
User Intent Dimension
–
Logical specifications
–
Input-output examples
•
Search Algorithm Dimension: SAT/SMT based techniques
Application 1: Bitvector Algorithms
•
Significance Dimension: Algorithm Designers
[Figure: Consumers of Program Synthesis Technology; highlighted: Algorithm Designers]
Examples of Bitvector Algorithms
Turn off rightmost 1-bit: Z & (Z − 1)
Z           = 1 0 1 0 1 1 0 0
Z − 1       = 1 0 1 0 1 0 1 1
Z & (Z − 1) = 1 0 1 0 1 0 0 0
Turn off rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z − 1)))
Z      = 1 0 1 0 1 1 0 0
Result = 1 0 1 0 0 0 0 0
Ceil of average of two integers without overflowing: (Y | Z) − ((Y ⊕ Z) >> 1)
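The three bitvector identities from these slides can be checked directly. This is a small Python sketch under stated assumptions: the function names are illustrative, and an 8-bit width (`WIDTH`) is chosen only to match the 8-bit pictures on the slides.

```python
WIDTH = 8                    # assumed word size, matching the 8-bit examples
MASK = (1 << WIDTH) - 1

def turn_off_rightmost_one(z):
    # Z & (Z - 1): subtracting 1 flips the rightmost 1 and the zeros below it.
    return z & (z - 1) & MASK

def turn_off_rightmost_run(z):
    # Z & (1 + (Z | (Z - 1))): fill the trailing zeros, then carry past the run of 1s.
    return z & (1 + (z | (z - 1))) & MASK

def ceil_average(y, z):
    # (Y | Z) - ((Y xor Z) >> 1): never forms the possibly overflowing sum Y + Z.
    return (y | z) - ((y ^ z) >> 1)
```

For instance, `turn_off_rightmost_one(0b10101100)` gives `0b10101000` and `turn_off_rightmost_run(0b10101100)` gives `0b10100000`, as in the worked example.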
P25: Higher order half of product of x and y
o1 := and(x,0xFFFF);
o2 := shr(x,16);
o3 := and(y,0xFFFF);
o4 := shr(y,16);
o5 := mul(o1,o3);
o6 := mul(o2,o3);
o7 := mul(o1,o4);
o8 := mul(o2,o4);
o9 := shr(o5,16);
o10 := add(o6,o9);
o11 := and(o10,0xFFFF);
o12 := shr(o10,16);
o13 := add(o7,o11);
o14 := shr(o13,16);
o15 := add(o14,o12);
res := add(o15,o8);
P24: Round up to next highest power of 2
o1 := sub(x,1);
o2 := shr(o1,1);
o3 := or(o1,o2);
o4 := shr(o3,2);
o5 := or(o3,o4);
o6 := shr(o5,4);
o7 := or(o5,o6);
o8 := shr(o7,8);
o9 := or(o7,o8);
o10 := shr(o9,16);
o11 := or(o9,o10);
res := add(o11,1);
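The two straight-line programs P24 and P25 can be transliterated into Python to check them against a direct computation. This is a sketch that assumes 32-bit unsigned words (suggested by the 16-bit halves in P25); the function names are illustrative. Note that for P24 the final step must add 1 to the fully smeared value (the last `or` result) for the program to compute a power of two.

```python
M32 = 0xFFFFFFFF  # assumed 32-bit word size

def p24_round_up_pow2(x):
    o = (x - 1) & M32
    # Smear the highest set bit into every lower position (shift-or steps of P24).
    for shift in (1, 2, 4, 8, 16):
        o |= o >> shift
    return (o + 1) & M32

def p25_mul_high(x, y):
    o1, o2 = x & 0xFFFF, x >> 16          # low and high halves of x
    o3, o4 = y & 0xFFFF, y >> 16          # low and high halves of y
    o5, o6, o7, o8 = o1 * o3, o2 * o3, o1 * o4, o2 * o4
    o9 = o5 >> 16
    o10 = (o6 + o9) & M32
    o11, o12 = o10 & 0xFFFF, o10 >> 16
    o13 = (o7 + o11) & M32
    o14 = o13 >> 16
    o15 = (o14 + o12) & M32
    return (o15 + o8) & M32
```

Comparing `p25_mul_high(x, y)` with `(x * y) >> 32` on sample inputs is a quick sanity check that the sixteen steps compute the upper half of the 64-bit product.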
[ICSE 2010] Joint work with Susmit Jha, Sanjit Seshia (UC-Berkeley), Ashish Tiwari (SRI) and Venkie (MSR Redmond)
Experiments: Comparison with Exhaustive Search
Program | Lines | Brahma iters | Brahma time | AHA time
P1  | 2  | 2  | 3    | 0.1
P2  | 2  | 3  | 3    | 0.1
P3  | 2  | 3  | 1    | 0.1
P4  | 2  | 2  | 3    | 0.1
P5  | 2  | 3  | 2    | 0.1
P6  | 2  | 2  | 2    | 0.1
P7  | 3  | 2  | 1    | 2
P8  | 3  | 2  | 1    | 1
P9  | 3  | 2  | 6    | 7
P10 | 3  | 14 | 76   | 10
P11 | 3  | 7  | 57   | 9
P12 | 3  | 9  | 67   | 10
P13 | 4  | 4  | 6    | X
P14 | 4  | 4  | 60   | X
P15 | 4  | 8  | 119  | X
P16 | 4  | 5  | 62   | X
P17 | 4  | 6  | 78   | 109
P18 | 6  | 5  | 46   | X
P19 | 6  | 5  | 35   | X
P20 | 7  | 6  | 108  | X
P21 | 8  | 5  | 28   | X
P22 | 8  | 8  | 279  | X
P23 | 10 | 8  | 1668 | X
P24 | 12 | 9  | 224  | X
P25 | 16 | 11 | 2779 | X
•
Choice 1: Logical relation between inputs and outputs
•
Choice 2: Input-Output Examples
Functional Specification
Functional Specification: Logical Relations
Problem: Turn off rightmost 1-bit
Functional Specification of desired behavior:
$\bigwedge_{p=1}^{b} \Big[ \big( I[p]=1 \wedge \bigwedge_{j=p+1}^{b} I[j]=0 \big) \Rightarrow \big( J[p]=0 \wedge \bigwedge_{j \neq p} J[j]=I[j] \big) \Big]$
Tool Output: J = I & (I − 1)
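The logical relation for "turn off rightmost 1-bit" says: if position p holds the rightmost 1 of I, then J has a 0 at p and agrees with I everywhere else. The sketch below checks the tool output J = I & (I − 1) against that relation exhaustively for small widths. It assumes positions are numbered 1 (leftmost) to b (rightmost); the helper names are illustrative.

```python
def bit(v, p, b):
    """Bit at position p of a b-bit value; position 1 is leftmost, b is rightmost."""
    return (v >> (b - p)) & 1

def satisfies_spec(I, J, b):
    for p in range(1, b + 1):
        # Premise: I[p] = 1 and every position to the right of p is 0.
        premise = bit(I, p, b) == 1 and all(bit(I, j, b) == 0 for j in range(p + 1, b + 1))
        if premise:
            # Conclusion: J[p] = 0 and J agrees with I at every other position.
            same_elsewhere = all(bit(J, j, b) == bit(I, j, b) for j in range(1, b + 1) if j != p)
            if bit(J, p, b) != 0 or not same_elsewhere:
                return False
    return True

def tool_output(I, b):
    return I & (I - 1) & ((1 << b) - 1)
```

Exhaustive checking like this is only feasible for tiny widths; the point of the SAT/SMT-based approach is to establish the same fact symbolically.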
Problem: Turn off rightmost contiguous string of 1-bits
•
Logical Relations
–
A bit complicated
•
Input-Output Examples
–
Key challenge is to resolve under-specification
–
Our solution: Interaction with user
Functional Specification
Problem: Turn-off rightmost contiguous string of 1’s
User: I want a design that maps 01011 -> 01000
Oracle: I can think of two designs
Design 1: (x+1) & (x−1)
Design 2: (x+1) & x
which differ on 00000 (Distinguishing Input)
What should 00000 be mapped to?
User: 00000 -> 00000
Dialog: Interactive Synthesis
Problem: Turn-off rightmost contiguous string of 1’s
User: 01011 -> 01000
Oracle: 00000 ?
User: 00000
Oracle: 01111 ?
User: 00000
Oracle: 00110 ?
User: 00000
Oracle: 01100 ?
User: 00000
Oracle: 01010 ?
User: 01000
Oracle: Your design is x & (1 + ((x − 1) | x))
Dialog: Interactive Synthesis
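The dialog can be mimicked with a toy loop: keep the candidate designs consistent with the examples so far, and whenever two of them disagree on some input, ask the user about that input. The sketch below is an illustration only. It enumerates a small hand-written candidate list over 5-bit values instead of using the SAT/SMT-based search from the talk, and `intended` is a hypothetical stand-in for the user.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1

# Hypothetical candidate space; the first, second and last appear in the dialogs.
CANDIDATES = [
    ("(x+1) & (x-1)", lambda x: (x + 1) & (x - 1) & MASK),
    ("(x+1) & x", lambda x: (x + 1) & x & MASK),
    ("x & (x-1)", lambda x: x & (x - 1) & MASK),
    ("x & (1 + ((x-1) | x))", lambda x: x & (1 + ((x - 1) | x)) & MASK),
]

def intended(x):
    """Stand-in for the user: clear the rightmost contiguous run of 1s."""
    i = 0
    while i < WIDTH and not (x >> i) & 1:
        i += 1
    while i < WIDTH and (x >> i) & 1:
        x &= ~(1 << i)
        i += 1
    return x

def distinguishing_input(f, g):
    # Smallest input on which the two designs differ, or None if they agree everywhere.
    return next((x for x in range(MASK + 1) if f(x) != g(x)), None)

def synthesize(first_examples, oracle):
    examples = dict(first_examples)
    while True:
        live = [(n, f) for n, f in CANDIDATES
                if all(f(i) == o for i, o in examples.items())]
        if not live:
            return None, examples
        name, f = live[0]
        queries = (distinguishing_input(f, g) for _, g in live[1:])
        query = next((q for q in queries if q is not None), None)
        if query is None:          # no remaining candidate disagrees with f
            return name, examples
        examples[query] = oracle(query)   # ask the user about the distinguishing input
```

Starting from the single example 01011 -> 01000, `synthesize({0b01011: 0b01000}, intended)` asks about 00000 and 00010 and then returns the last design. Each query eliminates at least one live candidate, so the loop terminates.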
•
Search Space Dimension: Programs with conditionals/loops
–
String operations: Concatenate, Substring
–
Logical operations: comparison involving # of occurrences of a regular expression
•
User Intent Dimension: Input-output examples
•
Search Algorithm Dimension: Combination of
–
Version Space Algebras
–
Machine Learning
Application 2: String Manipulation Macros
•
Significance Dimension: End-users
[Figure: Consumers of Program Synthesis Technology (End-Users, Algorithm Designers, Software Developers); label: Most Useful Target]
1. Identify tasks that end-users struggle with and identify how they can effectively communicate their intent.
–
Read help-forums and blogs.
–
Interview real users.
2. Design a language that satisfies the following trade-off.
–
Expressive enough to express a lot of tasks.
–
Small enough to allow efficient learning.
3. Design a learning algorithm with the following features.
–
Interactive with fast convergence (with success or failure).
–
Provide feedback.
–
Noise tolerant.
Methodology: Automating end-user programming
Joint work with: Bill Harris (UW, Madison), Rishabh Singh (MIT)
Synthesis Algorithm for String Programs
•
Language L of programs contains regular expressions, conditionals and loops.
•
Goal: Given input-output pairs (i1, o1), (i2, o2), (i3, o3), (i4, o4), compute the set of all programs P such that
–
P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4
–
Then, choose the simplest such P.
1. Compute the set D1 of all straight-line programs s.t. for any Q in D1, Q(i1) = o1. Similarly compute D2, D3, D4.
2. Let D = D1 ∩ D2 ∩ D3 ∩ D4. If D ≠ ∅ then done, else:
(a) Find a smallest partition, say {D1,D3}, {D2,D4}, s.t. D1 ∩ D3 ≠ ∅ and D2 ∩ D4 ≠ ∅.
(b) Learn a boolean classifier b that maps i1 and i3 to true and i2 and i4 to false.
3. Desired set of programs is: if (b) then D1 ∩ D3 else D2 ∩ D4
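The three steps can be sketched on a toy scale. The code below is a simplified illustration, not the algorithm from the talk: the sets D_i are computed by brute force over a tiny hypothetical pool of string functions (`PROGRAMS`) rather than represented with version space algebras, only two-block partitions are tried, and the classifier is picked from a fixed hypothetical list of predicates (`PREDICATES`).

```python
from itertools import combinations

# Hypothetical pool of straight-line string programs.
PROGRAMS = {
    "identity": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "first_word": lambda s: s.split()[0] if s.split() else "",
}
# Hypothetical pool of boolean classifiers over inputs.
PREDICATES = {
    "has_space": lambda s: " " in s,
    "starts_upper": lambda s: s[:1].isupper(),
}

def consistent_set(i, o):
    # Step 1: D_k = all programs Q with Q(i_k) = o_k.
    return {name for name, q in PROGRAMS.items() if q(i) == o}

def synthesize(examples):
    D = [consistent_set(i, o) for i, o in examples]
    common = set.intersection(*D)
    if common:                                   # Step 2: non-empty intersection
        return ("straight-line", sorted(common))
    idx = range(len(examples))
    for size in range(1, len(examples)):         # Step 2(a): two-block partitions
        for block in combinations(idx, size):
            rest = [k for k in idx if k not in block]
            d_true = set.intersection(*(D[k] for k in block))
            d_false = set.intersection(*(D[k] for k in rest))
            if not (d_true and d_false):
                continue
            for bname, b in PREDICATES.items():  # Step 2(b): find a classifier
                if all(b(examples[k][0]) for k in block) and \
                   not any(b(examples[k][0]) for k in rest):
                    # Step 3: if b then (program from d_true) else (program from d_false)
                    return ("conditional", bname, sorted(d_true), sorted(d_false))
    return None
```

With the examples `("hello world", "hello")`, `("abc", "ABC")`, `("foo bar", "foo")`, `("xyz", "XYZ")`, no single program fits all four, and the sketch returns a conditional on `has_space` that selects `first_word` or `upper`.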
•
Search Space Dimension: Straight-line programs
–
Operations: Ruler, Compass
•
User Intent Dimension: Logical specifications
–
Can be obtained from natural language
–
Is further translated into a random input-output example
•
Search Algorithm Dimension: Exhaustive Search
–
Property Testing
–
Goal-directed search
–
Commonly used library of constructions
Application 3: Geometry Constructions
•
Significance Dimension: Students and Teachers
[Figure: Consumers of Program Synthesis Technology (Students and Teachers, End-Users, Algorithm Designers, Software Developers); labels: Most Useful Target, Most Transformational Target]
Automating Education
Make education interactive and fun
•
Automated problem solving (for students)
–
Provide hints
–
Point out mistakes and suggest fixes
•
Creation of teaching material (for teachers)
–
Authoring tools
–
Problem construction
•
Group interaction (for teachers/students)
–
Ask questions
–
Share annotations
Domains: Geometry, Algebra, Probability, Mechanics, Electrical Circuits, etc.
What is the role of PL + Logic + Synthesis?
•
Programming Language for Geometry
–
Objects: Point, Line, Circle
–
Constructors
•
Ruler(Point, Point) -> Line
•
Compass(Point, Point) -> Circle
•
Intersect(Circle, Circle) -> Pair of Points
•
Intersect(Line, Circle) -> Pair of Points
•
Intersect(Line, Line) -> Point
•
Logic for Geometry
–
Inequality predicates over arithmetic expressions
•
Distance(Point, Point), Angle(Line, Line), …
•
Automated Problem Solving
–
Given pre/postcondition, synthesize a straight-line program
Geometry Constructions Domain
Automated Problem Solving
•
Given pre/postcondition, synthesize a straight-line program
Example: Draw a line L’ perpendicular to a given line L.
Precondition: true
Postcondition: Angle(L’,L) = 90
Program
Step 1: P1, P2 = ChoosePoint(L);
Step 2: C1 = Circle(P1,P2);
Step 3: C2 = Circle(P2,P1);
Step 4: <P3, P4> = Intersect(C1,C2);
Step 5: L’ = Line(P3,P4);
Geometry Domain: Automated Problem Solving
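The five-step perpendicular construction can be run numerically on concrete points, which is also how a candidate construction can be tested against its postcondition. The sketch below is an illustration under assumptions: points are coordinate pairs, a circle is a (center, radius) pair, a line is a pair of points, and all function names are hypothetical.

```python
import math

def circle(center, through):
    # Compass: circle centered at `center` passing through `through`.
    return (center, math.dist(center, through))

def intersect_circles(c1, c2):
    """Both intersection points of two circles that are assumed to intersect."""
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    a = (r1**2 - r2**2 + d**2) / (2 * d)       # distance from c1's center to the chord
    h = math.sqrt(r1**2 - a**2)                # half the chord length
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = -h * (y2 - y1) / d, h * (x2 - x1) / d
    return (mx + ox, my + oy), (mx - ox, my - oy)

def perpendicular(p1, p2):
    c1 = circle(p1, p2)                        # Step 2
    c2 = circle(p2, p1)                        # Step 3
    p3, p4 = intersect_circles(c1, c2)         # Step 4
    return p3, p4                              # Step 5: the line through P3 and P4

def angle_is_90(l1, l2, tol=1e-9):
    (a, b), (c, d) = l1, l2
    u = (b[0] - a[0], b[1] - a[1])
    v = (d[0] - c[0], d[1] - c[1])
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(dot) <= tol * math.hypot(*u) * math.hypot(*v)
```

For two points P1, P2 chosen on L, `angle_is_90((P1, P2), perpendicular(P1, P2))` evaluates the postcondition on that concrete instance, up to floating-point tolerance.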
Constructing line L’ perpendicular to given line L
[Figure: the construction, showing points P1, P2, P3, P4, circles C1, C2, and lines L, L’]
•
Bisect a given line.
•
Bisect an angle.
•
Copy an angle.
•
Draw a line parallel to a given line.
•
Draw an equilateral triangle given two points.
•
Draw a regular hexagon given a side.
•
Given 4 points, draw a square with each of the sides passing through a different point.
Other Applications:
•
New approximate geometric constructions
•
2D/3D planning problems
Examples of Geometry Constructions
•
Synthesis, in general, is harder than verification.
–
Synthesis Problem: Given pre/postcondition, synthesize a straight-line program.
–
Verification Problem: Given pre/postcondition and a straight-line program, determine whether the Hoare triple holds.
•
Decision procedures for verification of geometry constructions are known, but are complex.
–
Because of symbolic reasoning.
Synthesis Algorithm for Geometry Constructions
Precondition: True
Postcondition: Angle(L,L’) = 90
Step 1: P1, P2 = ChoosePoint(L);
Step 2: C1 = Circle(P1,P2);
Step 3: C2 = Circle(P2,P1);
Step 4: <P3, P4> = Intersect(C1,C2);
Step 5: L’ = Line(P3,P4);
•
Symbolic reasoning based decision procedures are complex.
•
How about property testing?
Theorem: A construction that works (i.e., satisfies the postcondition) for a randomly chosen model of the precondition also works for all models (w.h.p.).
Proof:
•
Objects constructed using ruler/compass can be described using polynomial ops (+, −, *), square-root & division operator.
•
The randomized polynomial identity testing algorithm lifts to square-root and division operators as well!
A simpler strategy for verification of Constructions
•
Problem: Given two polynomials P1 and P2, determine whether they are equivalent.
•
The naïve deterministic algorithm of expanding polynomials to compare them term-wise is exponential.
•
A simple randomized test is probabilistically sufficient:
–
Choose random values r for polynomial variables x
–
If P1(r) ≠ P2(r), then P1 is not equivalent to P2.
–
Otherwise P1 is equivalent to P2 with high probability.
Randomized Polynomial Identity Testing
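The randomized test takes only a few lines. The sketch below assumes the polynomials are given as Python callables with integer coefficients and evaluates them modulo a large prime; the function name, the choice of prime, and the number of trials are illustrative assumptions.

```python
import random

PRIME = 2**61 - 1   # assumed modulus: a large (Mersenne) prime

def probably_equivalent(p1, p2, num_vars, trials=20, rng=None):
    """False if a witness of inequivalence is found, True otherwise (w.h.p. correct)."""
    rng = rng or random.Random()
    for _ in range(trials):
        r = [rng.randrange(PRIME) for _ in range(num_vars)]   # random point
        if p1(*r) % PRIME != p2(*r) % PRIME:
            return False      # P1(r) != P2(r): definitely not equivalent
    return True               # agreed on every sampled point
```

For example, `probably_equivalent(lambda x, y: (x + y)**2, lambda x, y: x*x + 2*x*y + y*y, 2)` always returns True because the two sides are identical polynomials, while comparing `(x + y)**2` against `x*x + y*y` returns False unless every sampled point happens to be a root of their difference.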
Problem: Symbolic reasoning is hard.
Idea #1: Leverage Property Testing to reduce symbolic reasoning to concrete reasoning.
•
Construct a random input-output example (I,O) for the problem and find a construction that can generate O from I.
•
Example: Construct incenter of a triangle.
–
If I chose my input triangle to be an equilateral one, then the circumcenter construction also appears to work!
•
Since incenter = circumcenter for an equilateral triangle.
–
But what are the chances of choosing a random triangle to be an equilateral one?
Synthesis Algorithm for Geometry Constructions
Exhaustive Search Strategy: Given input objects I and desired objects O, keep constructing new objects from I using ruler and compass until objects O are obtained.
Problem: Search blows up, i.e., too many (useless) objects get constructed.
–
Example: n points lead to O(n^2) lines, which lead to O(n^4) points, and so on…
Synthesis Algorithm for Geometry Constructions
Problem: Search space is huge.
•
Idea #2: Perform goal-directed reasoning.
–
Example: If an operation leads to construction of a line L that passes through a desired output point, it is worthwhile constructing line L.
–
Mimics human intelligence.
–
For this to be effective, we need solutions with small depth.
•
Idea #3: Work with a richer library of primitives.
–
Common constructions picked up from chapters of text-books.
–
A search space of (small width, large depth) is converted into one of (large width, small depth).
–
Mimics human experience/knowledge.
Synthesis Algorithm for Geometry Constructions
Search space Exploration: With/without goal-directness
Problem Solving Engine with Natural Interfaces
[Figure: pipeline. Problem Description in English -> Natural Language Processing -> Problem Description as Logical Relation -> Synthesis Engine -> Solution as Functional Program -> Paraphrasing -> Solution in English]
Joint work with: Kalika Bali, Monojit Chaudhuri (MSR Bangalore), Vijay Korthikanti (UIUC), Ashish Tiwari (SRI)
Useful modules powered by problem solving engine
The next step is to architect several useful modules on top of the problem-solving architecture such as:
•
Interactive feedback to students
–
Provide hints
–
Point out mistakes and suggest fixes
•
Creation of teaching material (for teachers)
–
Problem construction
–
Authoring tools
What domains should we prioritize for automation?
•
Mathematics
–
Algebra
–
Probability
•
Physics
–
Mechanics
–
Electrical Circuits
–
Optics
•
Chemistry
–
Quantitative Chemistry
–
Organic Chemistry
Other Domains
•
Consider the problem of computing effective resistance between two nodes in a graph of resistances.
•
MATLAB implements Kirchhoff’s law based decision procedure
–
Algebraic sum of the currents at any circuit junction = 0
–
Sum of changes in potential in any complete loop = 0
Electrical Circuits: Concept-specific solutions
Joint work with: Swarat Chaudhuri (Penn State University)
•
Kirchhoff’s law based decision procedure is not useful for students who are expected to know only simpler concepts.
•
Solutions need to be parameterized by specific concepts such as:
–
Series/Parallel composition of resistances
–
Symmetry Reduction
–
Wheatstone Bridge
Resistance Reduction Concepts
•
Series Combination
•
Parallel Combination
•
Wheatstone Bridge: If R_3/R_1 = R_4/R_2, then V_D = V_B
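These reduction concepts correspond to simple rewrite rules that a concept-specific solver could apply step by step. A minimal sketch with exact rational arithmetic follows; the function names are illustrative, and the bridge check encodes only the balance condition stated on the slide (R3/R1 = R4/R2), not a full circuit solver.

```python
from fractions import Fraction

def series(*rs):
    # Resistances in series add.
    return sum((Fraction(r) for r in rs), Fraction(0))

def parallel(*rs):
    # Reciprocals add for resistances in parallel.
    return 1 / sum(1 / Fraction(r) for r in rs)

def wheatstone_balanced(r1, r2, r3, r4):
    # R3/R1 == R4/R2, cross-multiplied to avoid division.
    return Fraction(r3) * Fraction(r2) == Fraction(r4) * Fraction(r1)
```

For example, `parallel(6, 3)` is 2 and `series(parallel(6, 3), 4)` is 6. When `wheatstone_balanced` holds, V_D = V_B, so no current flows through the bridging resistor and it can be dropped before applying the series and parallel rules.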
Automating Education: Long-term Goals
•
Ultra-intelligent computer
•
Model of human mind
•
Inter-stellar travel
•
User Intent
–
Human Computer Interaction
–
Natural Language Processing
•
Search Space (requires corresponding domain expertise)
–
Graphics (for image manipulation)
–
Mathematics/Physics (for classroom problem solving)
•
Search Techniques
–
Logical Reasoning
–
Machine Learning
(Inter-disciplinary) Dimensions in Program Synthesis
The Significance Dimension
[Figure: Consumers of Program Synthesis Technology (Students and Teachers, End-Users, Algorithm Designers, Software Developers); labels: Most Useful Target, Most Transformational Target]
•
How to combine various forms of user intent in a unified
programming interface?
–
logic, natural language, input/output example, partial program
•
How to ensure a modular architecture that allows reuse of
domain knowledge and search techniques across different
synthesis tools/applications?
•
How to combine power of different search techniques?
–
Version space algebras
–
SAT/SMT based logical reasoning techniques
–
Machine learning techniques
Research Questions
•
Dimensions in Program Synthesis
–
Invited paper at ACM PPDP 2010
•
Bitvector Algorithms
–
“Oracle guided component based program synthesis”, ICSE 2010, Jha/Gulwani/Seshia/Tiwari
•
String Manipulation Macros
–
“Automating String Processing in Spreadsheets using Input-Output Examples”, POPL 2011, Gulwani
•
Geometry Constructions
–
“Synthesizing Geometry Constructions”, Techreport 2011, Gulwani/Korthikanti/Tiwari
References