Dimensions in Program Synthesis


Sumit Gulwani

(sumitg@microsoft.com)

Microsoft Research, Redmond

Invited Tutorial

FMCAD 2010, Lugano, Switzerland


What is Program Synthesis?


Synthesize an executable program from user intent expressed in the form of constraints.


Automated Program Synthesis

Compilers
  Structured language input
  Syntax-directed translation
  No new algorithmic insights

Synthesizers
  Can accept a variety/mixed form of constraints (e.g., logic, examples, traces, partial programs)
  Uses some kind of search
  Discovers new algorithmic insights


Why today?

Natural goal given that computing has become accessible, but:
  Fundamental "how" programming models have not changed.
  Most people are not expert programmers.

Enabling technology is now available:
  Better search/logical reasoning techniques (SAT/SMT solvers)
  Faster machines (good application for multi-cores)

Program Synthesis: Recent Success

Our techniques can synthesize a wide variety of algorithms/programs from logic and examples.

Undergraduate book algorithms (e.g., sorting, dynamic programming)
  [Srivastava/Gulwani/Foster, POPL 2010]
Program inverses (e.g., deserializers from serializers)
  [Srivastava/Gulwani/Chaudhuri/Foster, MSR-TR-2010-34]
Graph algorithms (e.g., bipartiteness check)
  [Itzhaky/Gulwani/Immerman/Sagiv, OOPSLA 2010]
Bit-vector algorithms (e.g., turn off rightmost one bit)
  [Jha/Gulwani/Seshia/Tiwari, ICSE 2010]
String manipulating macros (e.g., "Helmut Veith" -> "Veith, H.")
  [Gulwani, POPL 2011]
Geometry constructions (e.g., construct a regular hexagon given a side)
  [Gulwani/Korthikanti/Tiwari, recent work]



Outline

Dimension 1: User Intent
Dimension 2: Search Space
Dimension 3: Search Technique
Applications
  Bit-vector Algorithms
  String Manipulation Macros
  Geometry Constructions





Dimension 1: User Intent

Logical specifications
  Logical relations between inputs and outputs
Natural language
Input-output examples
Traces
Programs

Logical Specification: Example 1

Problem: Sorting

Logical relation between input array A and output array B of size n:

$$\forall k:\ (0 \le k < n-1) \Rightarrow (B[k] \le B[k+1]) \ \wedge\ \forall k\, \exists j:\ (0 \le k < n) \Rightarrow (0 \le j < n \wedge B[j] = A[k])$$
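The sorting relation can be checked mechanically on concrete arrays. The following is a minimal sketch, assuming the second quantifier ranges over every index of A; the function name `satisfies_sorting_spec` is hypothetical and not part of the original slides.

```python
def satisfies_sorting_spec(a, b):
    """Check the logical relation between input array a and output array b."""
    n = len(a)
    if len(b) != n:
        return False
    # forall k: 0 <= k < n-1  =>  B[k] <= B[k+1]
    ordered = all(b[k] <= b[k + 1] for k in range(n - 1))
    # forall k exists j: 0 <= k < n  =>  0 <= j < n and B[j] = A[k]
    covers = all(any(b[j] == a[k] for j in range(n)) for k in range(n))
    return ordered and covers


print(satisfies_sorting_spec([3, 1, 2], [1, 2, 3]))
print(satisfies_sorting_spec([3, 1, 2], [1, 3, 2]))
```

Note that the relation as written only requires every element of A to occur somewhere in B; it does not count multiplicities, so it is slightly weaker than "B is a sorted permutation of A".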


Natural Language

Advances in NLP allow mapping natural language to logic.
  NL interfaces have been designed for database queries.
Natural language can be ambiguous.
  This issue can be resolved by paraphrasing.


Input-Output Examples

Advantages of input-output examples
  Easy to provide; no need to remember syntax
  Less chance of mistakes

What prevents a trivial table-lookup solution on input-output pairs (x_i, y_i)?

  Switch x
    Case x_1: return y_1
    Case x_2: return y_2
    ...
    Case x_n: return y_n

  Restriction on the search space!

How to select examples?
  Interactive manner!
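To illustrate why restricting the search space rules out the table-lookup answer, here is a minimal sketch of enumerative search over a tiny expression space. The component sets `UNARY` and `BINARY` and the function names are hypothetical choices made for this illustration, not the search space of any tool mentioned in the talk.

```python
from itertools import product

# Hypothetical component set, chosen only for illustration.
UNARY = {
    "x": lambda x: x,
    "x+1": lambda x: x + 1,
    "x-1": lambda x: x - 1,
    "2*x": lambda x: 2 * x,
    "x*x": lambda x: x * x,
}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def candidates():
    """Enumerate all programs in the (small, restricted) search space."""
    for name, f in UNARY.items():
        yield name, f
    for (n1, f1), (n2, f2) in product(UNARY.items(), repeat=2):
        for op, g in BINARY.items():
            yield f"({n1}) {op} ({n2})", (lambda x, f1=f1, f2=f2, g=g: g(f1(x), f2(x)))


def synthesize(examples):
    """Return the first enumerated program consistent with all examples."""
    for name, f in candidates():
        if all(f(x) == y for x, y in examples):
            return name, f
    return None


name, program = synthesize([(1, 2), (2, 6), (3, 12)])
print(name, program(4))
```

Because a lookup table is not expressible in this space, any program that is found must generalize in some way to inputs outside the examples.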

Interaction Model

1. User provides a few input-output examples I.
2. Synthesizer constructs a program P consistent with I.
3. Process may be repeated after adding new examples.

User-driven interaction
  User tests the program on other inputs I’. If a discrepancy is found, the user provides a new input-output example.

Synthesizer-driven interaction
  If the synthesizer finds another program P’ consistent with I, it asks the user to provide the output for a distinguishing input.

Typically few iterations are required
  small teaching dimension [Goldman, Kearns, JCSS ‘95]
  low Kolmogorov (descriptive) complexity


Traces

A detailed step-by-step description of how the program should behave on a given input.

Easier to deal with than input-output examples.
  Some synthesizers that accept input-output examples first generate traces.

A natural model in certain applications.
  Programming by demonstration systems for end-users.
    Intermediate states resulting from the user’s successive actions on a user interface constitute a valid trace.
  Reverse engineering.

Convenient in certain scenarios.
  E.g., consider describing Factorial(7).
    Trace: 7*6*5*4*3*2 or Recursive Trace: 7*Factorial(6)
    Final simplified output: 5040





Dimension 2: Search Space

Programs
  Operators
    Comparison operators
    Combination of arithmetic and bitwise operators
    APIs exported from a given library
      SortList = Array2List ∘ SortArray ∘ List2Array
  Control-flow
    Given looping template
    Bounded number of statements
    Partial program with holes
    Straight-line or loop-free
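The library-API example above, where SortList is obtained by composing three exported functions, might look as follows. This is a sketch only: the cons-cell list representation and the Python names `list2array`, `sort_array`, `array2list`, and `compose` are assumptions made for illustration.

```python
from functools import reduce


def list2array(node):
    """Convert a cons list (head, tail) / None into a Python list."""
    out = []
    while node is not None:
        head, node = node
        out.append(head)
    return out


def sort_array(arr):
    return sorted(arr)


def array2list(arr):
    """Convert a Python list into a cons list (head, tail) / None."""
    node = None
    for value in reversed(arr):
        node = (value, node)
    return node


def compose(*fs):
    """Right-to-left composition: compose(f, g, h)(x) == f(g(h(x)))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fs), x)


# SortList = Array2List o SortArray o List2Array
sort_list = compose(array2list, sort_array, list2array)
print(sort_list(array2list([3, 1, 2])))
```

A synthesizer searching over API components would have to discover this composition order itself; here it is written by hand.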

Loop-free Programs

Parameterized by set of components/operators used.

Bitvector algorithms
  Components = arithmetic + bitwise operators
String manipulation (or text editing) macros
  Components = editing commands (insert, locate, select, delete)
Geometrical constructions
  Components = ruler + compass
Unbounded data type manipulation
  Components = linear arithmetic/set operators
  [PLDI ‘10, Viktor Kuncak et al., Complete Functional Synthesis]
API call sequences [PLDI ’05, Bodik et al., Jungloid Mining]
  Components = API calls

Can be likened to putting together jigsaw puzzle pieces.


Dimension 2: Search Space

Programs
  Operators
  Control-flow
Grammars
  Examples: Regular Expressions, DFAs, NFAs, Context Free Grammars, Regular Transducers
  Applications: robotics/control systems, pattern recognition, computational linguistics/biology, data compression/mining, etc.
Logics
  First-order logic + Fixed point
    = PTIME algorithms over ordered structures such as strings, graphs
  E.g., Graph Classifiers: Bipartite, Acyclic, Connected
  Graph Computations: Shortest Path, Cycle, 2-coloring




Program Synthesis Techniques

Exhaustive search
  Usually requires non-trivial optimizations.
  Geometry algorithms, Mutual Exclusion algorithms
Reduction to SAT/SMT constraints
  Can leverage engineering advances in recent off-the-shelf solvers.
  Bit-vector algorithms, Graph algorithms, Program inverses
Version space algebra
  Data-structures to efficiently represent and manipulate huge sets of programs consistent with given observations.
  String Manipulation macros
Machine Learning
  Bayesian Learning, Belief Propagation
QBF Solving






Application 1: Bitvector Algorithms

Search Space Dimension: Straight-line programs that use
  Arithmetic Operators: +, -, *, /
  Logical Operators: Bitwise and/or/not, Shift left/right
User Intent Dimension
  Logical specifications
  Input-output examples
Search Algorithm Dimension: SAT/SMT based techniques



Application 1: Bitvector Algorithms

Significance Dimension: Algorithm Designers

Consumers of Program Synthesis Technology: Algorithm Designers

Examples of Bitvector Algorithms

Turn-off rightmost 1-bit: Z -> Z & (Z-1)

  1 0 1 0 1 1 0 0  ->  1 0 1 0 1 0 0 0

  Z            1 0 1 0 1 1 0 0
  Z-1          1 0 1 0 1 0 1 1
  Z & (Z-1)    1 0 1 0 1 0 0 0
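The worked example above can be reproduced directly. This is a small sketch assuming 8-bit values; `reference` is a hypothetical bit-scanning implementation, included only to cross-check the one-line formula.

```python
def turn_off_rightmost_one(z):
    """Z & (Z-1): clears the lowest set bit of z."""
    return z & (z - 1)


def reference(z, width=8):
    """Naive version: scan from the right and clear the first 1-bit found."""
    for i in range(width):
        if (z >> i) & 1:
            return z & ~(1 << i)
    return z


z = 0b10101100
print(format(z, "08b"), format(z - 1, "08b"), format(turn_off_rightmost_one(z), "08b"))
```

Subtracting 1 flips the rightmost 1-bit and every 0 to its right, so the bitwise and keeps everything to the left of that bit and clears the rest.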

Examples of Bitvector Algorithms

Turn-off rightmost contiguous sequence of 1-bits: Z -> Z & (1 + (Z | (Z-1)))

  1 0 1 0 1 1 0 0  ->  1 0 1 0 0 0 0 0

Ceil of average of two integers without overflowing: (Y|Z) - ((Y ⊕ Z) >> 1)
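Both formulas can be checked exhaustively on a small word size. The sketch below assumes 8-bit unsigned values (`WIDTH` and `MASK` are illustrative choices) and reads the average formula as (Y|Z) - ((Y xor Z) >> 1).

```python
WIDTH = 8                   # assumed word size for this illustration
MASK = (1 << WIDTH) - 1


def turn_off_rightmost_ones(z):
    """Z & (1 + (Z | (Z-1))), truncated to WIDTH bits."""
    return z & (1 + (z | (z - 1))) & MASK


def ceil_avg(y, z):
    """(Y|Z) - ((Y^Z) >> 1): no intermediate value exceeds max(y, z) bits."""
    return (y | z) - ((y ^ z) >> 1)


print(format(turn_off_rightmost_ones(0b10101100), "08b"))
print(ceil_avg(3, 4), ceil_avg(255, 254))
```

The average formula follows from y + z = 2*(y|z) - (y^z); the naive (y + z + 1) >> 1 would need one extra bit for the intermediate sum.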


Examples of Bitvector Algorithms

P25: Higher order half of product of x and y

  o1 := and(x,0xFFFF);
  o2 := shr(x,16);
  o3 := and(y,0xFFFF);
  o4 := shr(y,16);
  o5 := mul(o1,o3);
  o6 := mul(o2,o3);
  o7 := mul(o1,o4);
  o8 := mul(o2,o4);
  o9 := shr(o5,16);
  o10 := add(o6,o9);
  o11 := and(o10,0xFFFF);
  o12 := shr(o10,16);
  o13 := add(o7,o11);
  o14 := shr(o13,16);
  o15 := add(o14,o12);
  res := add(o15,o8);



P24: Round up to next highest power of 2

  o1 := sub(x,1);
  o2 := shr(o1,1);
  o3 := or(o1,o2);
  o4 := shr(o3,2);
  o5 := or(o3,o4);
  o6 := shr(o5,4);
  o7 := or(o5,o6);
  o8 := shr(o7,8);
  o9 := or(o7,o8);
  o10 := shr(o9,16);
  o11 := or(o9,o10);
  res := add(o11,1);



[ICSE 2010] Joint work with Susmit Jha, Sanjit Seshia (UC-Berkeley), Ashish Tiwari (SRI) and Venkie (MSR Redmond)
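For readers who want to execute the two straight-line programs, here is a Python transliteration, assuming 32-bit unsigned words. The function names are hypothetical. The loop in `p24_round_up_pow2` is just a compact form of the unrolled shift/or steps, and the final step adds 1 to the last or-result.

```python
M32 = 0xFFFFFFFF  # assumed 32-bit word size


def p24_round_up_pow2(x):
    """Round x up to the next power of 2 (x itself if already a power of 2)."""
    o = (x - 1) & M32
    for shift in (1, 2, 4, 8, 16):   # smear the highest set bit to the right
        o |= o >> shift
    return (o + 1) & M32


def p25_mul_high(x, y):
    """Upper 32 bits of the 64-bit product of two 32-bit words."""
    o1, o2 = x & 0xFFFF, x >> 16
    o3, o4 = y & 0xFFFF, y >> 16
    o5, o6, o7, o8 = o1 * o3, o2 * o3, o1 * o4, o2 * o4
    o10 = o6 + (o5 >> 16)
    o13 = o7 + (o10 & 0xFFFF)
    return ((o13 >> 16) + (o10 >> 16) + o8) & M32


print(p24_round_up_pow2(5), hex(p25_mul_high(0xFFFFFFFF, 0xFFFFFFFF)))
```

Splitting each operand into 16-bit halves keeps every intermediate product and sum within 32 bits, which is why P25 needs no wider arithmetic.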





Experiments: Comparison with Exhaustive Search

Program          Brahma           AHA
Name    lines    iters    time    time
P1      2        2        3       0.1
P2      2        3        3       0.1
P3      2        3        1       0.1
P4      2        2        3       0.1
P5      2        3        2       0.1
P6      2        2        2       0.1
P7      3        2        1       2
P8      3        2        1       1
P9      3        2        6       7
P10     3        14       76      10
P11     3        7        57      9
P12     3        9        67      10
P13     4        4        6       X
P14     4        4        60      X
P15     4        8        119     X
P16     4        5        62      X
P17     4        6        78      109
P18     6        5        46      X
P19     6        5        35      X
P20     7        6        108     X
P21     8        5        28      X
P22     8        8        279     X
P23     10       8        1668    X
P24     12       9        224     X
P25     16       11       2779    X




Functional Specification

Choice 1: Logical relation between inputs and outputs
Choice 2: Input-Output Examples

Functional Specification: Logical Relations

Problem: Turn off rightmost 1-bit

Functional Specification of desired behavior:

$$\bigwedge_{p=1}^{b} \Big[ \Big( I[p]=1 \wedge \bigwedge_{j=p+1}^{b} (I[j]=0) \Big) \Rightarrow \Big( J[p]=0 \wedge \bigwedge_{j \neq p} (J[j]=I[j]) \Big) \Big]$$

Tool Output: J = I & (I-1)
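The tool output can be checked against the functional specification by brute force. This sketch assumes the specification reads: for every position p, if I[p] = 1 and all bits to its right are 0, then J[p] = 0 and J agrees with I everywhere else. It also assumes bits are indexed 1..b from the left, so that index b is the rightmost bit. The helper names are hypothetical.

```python
def bits(v, b):
    """Return [None, v_1, ..., v_b]; index 1 is the leftmost bit (assumed convention)."""
    return [None] + [(v >> (b - p)) & 1 for p in range(1, b + 1)]


def satisfies_spec(i_val, j_val, b=8):
    """Check the logical relation between input I and output J on b-bit words."""
    I, J = bits(i_val, b), bits(j_val, b)
    for p in range(1, b + 1):
        antecedent = I[p] == 1 and all(I[j] == 0 for j in range(p + 1, b + 1))
        consequent = J[p] == 0 and all(J[j] == I[j] for j in range(1, b + 1) if j != p)
        if antecedent and not consequent:
            return False
    return True


print(all(satisfies_spec(i, i & (i - 1)) for i in range(256)))
```

For I = 0 no position satisfies the antecedent, so the relation leaves the output unconstrained there; this is one reason input-output examples alone can under-specify the intended program.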


Functional Specification

Problem: Turn off rightmost contiguous string of 1-bits

Logical Relations
  A bit complicated
Input-Output Examples
  Key challenge is to resolve under-specification
  Our solution: Interaction with user

Dialog: Interactive Synthesis

Problem: Turn-off rightmost contiguous string of 1’s

User: I want a design that maps 01011 -> 01000

Oracle: I can think of two designs
  Design 1: (x+1) & (x-1)
  Design 2: (x+1) & x
  which differ on 00000 (Distinguishing Input)
  What should 00000 be mapped to?

User: 00000 -> 00000

Dialog: Interactive Synthesis

Problem: Turn-off rightmost contiguous string of 1’s

User:   01011 -> 01000
Oracle: 00000 ?
User:   00000
Oracle: 01111 ?
User:   00000
Oracle: 00110 ?
User:   00000
Oracle: 01100 ?
User:   00000
Oracle: 01010 ?
User:   01000
Oracle: Your design is x & (1 + ((x-1)|x))
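The dialog can be imitated with a toy loop that repeatedly finds a distinguishing input between two surviving candidates and asks for its output. This is only a sketch: the real tool searches a component-based program space with SAT/SMT solvers, whereas here the candidate set is a hypothetical hand-written list of three 5-bit designs and the "user" is simulated by a reference function.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1

# Hypothetical candidate pool; a real synthesizer would generate these by search.
CANDIDATES = {
    "(x+1) & (x-1)": lambda x: (x + 1) & (x - 1) & MASK,
    "(x+1) & x": lambda x: (x + 1) & x & MASK,
    "x & (1 + ((x-1) | x))": lambda x: x & (1 + ((x - 1) | x)) & MASK,
}


def user(x):
    """Simulated user: clears the rightmost contiguous run of 1-bits."""
    i = 0
    while i < WIDTH and not (x >> i) & 1:
        i += 1
    while i < WIDTH and (x >> i) & 1:
        x &= ~(1 << i)
        i += 1
    return x


def distinguishing_input(f, g):
    """Smallest input on which f and g disagree, or None if they never do."""
    return next((x for x in range(1 << WIDTH) if f(x) != g(x)), None)


def interact(candidates, examples, oracle):
    examples = list(examples)
    live = {n: f for n, f in candidates.items() if all(f(x) == y for x, y in examples)}
    while len(live) > 1:
        (n1, f1), (n2, f2) = list(live.items())[:2]
        x = distinguishing_input(f1, f2)
        if x is None:              # semantically identical; keep only one
            del live[n2]
            continue
        y = oracle(x)              # "What should x be mapped to?"
        examples.append((x, y))
        live = {n: f for n, f in live.items() if f(x) == y}
    return list(live), examples


designs, asked = interact(CANDIDATES, [(0b01011, 0b01000)], user)
print(designs, [(format(x, "05b"), format(y, "05b")) for x, y in asked])
```

With this particular pool the first query is 00000, as in the dialog; the later queries differ from the slide because the candidate set is different.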






Application 2: String Manipulation Macros

Search Space Dimension: Programs with conditionals/loops
  String operations: Concatenate, Substring
  Logical operations: comparison involving # of occurrences of a regular expression
User Intent Dimension: Input-output examples
Search Algorithm Dimension: Combination of
  Version Space Algebras
  Machine Learning



Application 2: String Manipulation Macros

Significance Dimension: End-users

Consumers of Program Synthesis Technology
  End-Users (Most Useful Target)
  Algorithm Designers
  Software Developers

Methodology: Automating end-user programming

1. Identify tasks that end-users struggle with and identify how they can effectively communicate their intent.
  Read help-forums and blogs.
  Interview real users.
2. Design a language that satisfies the following trade-off.
  Expressive enough to express a lot of tasks.
  Small enough to allow efficient learning.
3. Design a learning algorithm with the following features.
  Interactive with fast convergence (with success or failure).
  Provide feedback.
  Noise tolerant.

Synthesis Algorithm for String Programs

Joint work with: Bill Harris (UW, Madison), Rishabh Singh (MIT)

Language L of programs contains regular expressions, conditionals and loops.

Goal: Given input-output pairs (i1, o1), (i2, o2), (i3, o3), (i4, o4), compute the set of all programs P such that
  P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4
Then, choose the simplest such P.


Synthesis Algorithm for String Programs

1. Compute the set D1 of all straight-line programs s.t. for any Q in D1, Q(i1) = o1. Similarly compute D2, D3, D4.
2. Let D = D1 ∩ D2 ∩ D3 ∩ D4. If D ≠ ∅ then done, else:
  (a) Find a smallest partition, say {D1,D3}, {D2,D4}, s.t. D1 ∩ D3 ≠ ∅ and D2 ∩ D4 ≠ ∅.
  (b) Learn a boolean classifier b that maps i1 and i3 to true and i2 and i4 to false.
3. Desired set of programs is
  if (b) then D1 ∩ D3 else D2 ∩ D4
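The three steps can be sketched on a toy scale. Everything concrete below is an assumption made for illustration: the pool of "straight-line programs" (`POOL`) and the classifier vocabulary (`PREDICATES`) are small hand-written dictionaries, the partition is found greedily (so it is not guaranteed to be the smallest), and only the two-block case from the slide is handled. The actual algorithm represents these sets with version space algebras rather than explicit enumeration.

```python
# Hypothetical program pool and predicate vocabulary (illustration only).
POOL = {
    "upper": lambda s: s.upper(),
    "lower": lambda s: s.lower(),
    "first_word": lambda s: (s.split() or [""])[0],
    "last_word": lambda s: (s.split() or [""])[-1],
    "identity": lambda s: s,
}
PREDICATES = {
    "has_digit": lambda s: any(c.isdigit() for c in s),
    "has_space": lambda s: " " in s,
}


def consistent(example):
    """Step 1: the set D_k of pool programs that map i_k to o_k."""
    i, o = example
    return {name for name, prog in POOL.items() if prog(i) == o}


def partition(examples):
    """Step 2(a), greedily: group examples whose program sets still intersect."""
    blocks = []  # each block is ([example indices], set of surviving programs)
    for idx, example in enumerate(examples):
        d = consistent(example)
        for indices, programs in blocks:
            if programs & d:
                indices.append(idx)
                programs.intersection_update(d)
                break
        else:
            blocks.append(([idx], d))
    return blocks


def synthesize(examples):
    blocks = partition(examples)
    if len(blocks) == 1:                      # D is non-empty: no conditional needed
        return None, sorted(blocks[0][1]), None
    if len(blocks) != 2:
        raise ValueError("this sketch handles at most two blocks")
    (idx_a, progs_a), (idx_b, progs_b) = blocks
    for name, b in PREDICATES.items():        # step 2(b): pick a separating classifier
        if all(b(examples[k][0]) for k in idx_a) and not any(b(examples[k][0]) for k in idx_b):
            return name, sorted(progs_a), sorted(progs_b)   # step 3
    raise ValueError("no classifier in the vocabulary separates the blocks")


def run(result, s):
    cond, then_progs, else_progs = result
    chosen = then_progs if cond is None or PREDICATES[cond](s) else else_progs
    return POOL[chosen[0]](s)


result = synthesize([("ab cd", "ab"), ("x9", "X9"), ("ef gh", "ef"), ("y7", "Y7")])
print(result, run(result, "ij kl"), run(result, "z3"))
```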






Application 3: Geometry Constructions

Search Space Dimension: Straight-line programs
  Operations: Ruler, Compass
User Intent Dimension: Logical specifications
  Can be obtained from natural language
  Is further translated into a random input-output example
Search Algorithm Dimension: Exhaustive Search
  Property Testing
  Goal-directed search
  Commonly used library of constructions



Application 3: Geometry Constructions

Significance Dimension: Students and Teachers

Consumers of Program Synthesis Technology
  Students and Teachers (Most Transformational Target)
  End-Users (Most Useful Target)
  Algorithm Designers
  Software Developers

Automating Education

Make education interactive and fun

Automated problem solving (for students)
  Provide hints
  Point out mistakes and suggest fixes
Creation of teaching material (for teachers)
  Authoring tools
  Problem construction
Group interaction (for teachers/students)
  Ask questions
  Share annotations

Domains: Geometry, Algebra, Probability, Mechanics, Electrical Circuits, etc.

What is the role of PL + Logic + Synthesis?


Geometry Constructions Domain

Programming Language for Geometry
  Objects: Point, Line, Circle
  Constructors
    Ruler(Point, Point) -> Line
    Compass(Point, Point) -> Circle
    Intersect(Circle, Circle) -> Pair of Points
    Intersect(Line, Circle) -> Pair of Points
    Intersect(Line, Line) -> Point
Logic for Geometry
  Inequality predicates over arithmetic expressions
    Distance(Point, Point), Angle(Line, Line), …
Automated Problem Solving
  Given pre/postcondition, synthesize a straight-line program

Geometry Domain: Automated Problem Solving

Automated Problem Solving
  Given pre/postcondition, synthesize a straight-line program

Example: Draw a line L’ perpendicular to a given line L.
  Precondition: true
  Postcondition: Angle(L’,L) = 90
  Program
    Step 1: P1, P2 = ChoosePoint(L);
    Step 2: C1 = Circle(P1,P2);
    Step 3: C2 = Circle(P2,P1);
    Step 4: <P3, P4> = Intersect(C1,C2);
    Step 5: L’ = Line(P3,P4);
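The five-step program can be executed numerically on one concrete model of the precondition. This is a sketch under stated assumptions: points are coordinate pairs, `circle(center, through)` plays the role of the compass, the chosen points P1 and P2 are arbitrary, and the helper names are hypothetical.

```python
import math


def circle(center, through):
    """Compass: circle with the given center passing through `through`."""
    return center, math.dist(center, through)


def intersect_circles(c1, c2):
    """Both intersection points of two circles (assumes they do intersect)."""
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # center 1 to the common chord
    h = math.sqrt(r1 * r1 - a * a)              # half the chord length
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return (mx + ox, my - oy), (mx - ox, my + oy)


def angle_between(u, v):
    """Angle in degrees between two direction vectors."""
    cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


p1, p2 = (1.0, 0.5), (3.0, 1.5)       # Step 1: two points chosen on L
c1 = circle(p1, p2)                   # Step 2
c2 = circle(p2, p1)                   # Step 3
p3, p4 = intersect_circles(c1, c2)    # Step 4
l_dir = (p2[0] - p1[0], p2[1] - p1[1])
lp_dir = (p4[0] - p3[0], p4[1] - p3[1])   # Step 5: direction of L'
angle = angle_between(l_dir, lp_dir)
print(round(angle, 6))
```

Checking the postcondition on concrete coordinates like this is the kind of concrete reasoning that property testing substitutes for symbolic proof.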

Constructing line L’ perpendicular to given line L

[Figure: line L through points P1 and P2; circles C1 and C2 intersect at P3 and P4; line L’ passes through P3 and P4.]


Examples of Geometry Constructions

Bisect a given line.
Bisect an angle.
Copy an angle.
Draw a line parallel to a given line.
Draw an equilateral triangle given two points.
Draw a regular hexagon given a side.
Given 4 points, draw a square with each of the sides passing through a different point.

Other Applications:
  New approximate geometric constructions
  2D/3D planning problems


Synthesis Algorithm for Geometry Constructions

Synthesis, in general, is harder than verification.
  Synthesis Problem: Given pre/postcondition, synthesize a straight-line program.
  Verification Problem: Given pre/postcondition and a straight-line program, determine whether the Hoare triple holds.

Decision procedures for verification of geometry constructions are known, but are complex.
  Because of symbolic reasoning.



A simpler strategy for verification of Constructions

Symbolic reasoning based decision procedures are complex.

How about property testing?

Theorem: A construction that works (i.e., satisfies the postcondition) for a randomly chosen model of the precondition also works for all models (w.h.p.).

Proof:
  Objects constructed using ruler/compass can be described using polynomial ops (+, -, *), square-root & division operator.
  The randomized polynomial identity testing algorithm lifts to square-root and division operators as well!


Randomized Polynomial Identity Testing

Problem: Given two polynomials P1 and P2, determine whether they are equivalent.

The naïve deterministic algorithm of expanding polynomials to compare them term-wise is exponential.

A simple randomized test is probabilistically sufficient:
  Choose random values r for polynomial variables x.
  If P1(r) ≠ P2(r), then P1 is not equivalent to P2.
  Otherwise P1 is equivalent to P2 with high probability.
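The randomized test is short enough to write out. In this sketch, polynomials are given as Python callables, and evaluation is done modulo a fixed prime; working over a finite field, the choice of `PRIME`, and the number of trials are assumptions of this illustration rather than details from the slides.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1; an assumed modulus for this sketch


def probably_equal(p1, p2, num_vars, trials=20, rng=None):
    """Return False if a witness of inequality is found, else True (w.h.p.)."""
    rng = rng or random.Random()
    for _ in range(trials):
        r = [rng.randrange(PRIME) for _ in range(num_vars)]
        if p1(*r) % PRIME != p2(*r) % PRIME:
            return False       # P1(r) != P2(r): definitely not equivalent
    return True                # agreed on every random point


square_of_sum = lambda x, y: (x + y) ** 2
expanded = lambda x, y: x * x + 2 * x * y + y * y
missing_cross_term = lambda x, y: x * x + y * y

rng = random.Random(0)
print(probably_equal(square_of_sum, expanded, 2, rng=rng))
print(probably_equal(square_of_sum, missing_cross_term, 2, rng=rng))
```

A "not equivalent" answer is always correct; an "equivalent" answer can be wrong only if every random point happened to be a root of P1 - P2, which is unlikely for a low-degree polynomial over a large field.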

Synthesis Algorithm for Geometry Constructions

Problem: Symbolic reasoning is hard.

Idea #1: Leverage Property Testing to reduce symbolic reasoning to concrete reasoning.
  Construct a random input-output example (I,O) for the problem and find a construction that can generate O from I.

Example: Construct the incenter of a triangle.
  If I chose my input triangle to be an equilateral one, then the circumcenter construction also appears to work!
    Since incenter = circumcenter for an equilateral triangle.
  But what are the chances of a randomly chosen triangle being equilateral?

Synthesis Algorithm for Geometry Constructions

Exhaustive Search Strategy: Given input objects I and desired objects O, keep constructing new objects from I using ruler and compass until objects O are obtained.

Problem: Search blows up, i.e., too many (useless) objects get constructed.
  Example: n points lead to O(n^2) lines, which lead to O(n^4) points, and so on…
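A quick count makes the blow-up concrete. The sketch below uses only the ruler: each pair of points gives at most one line and each pair of lines at most one intersection point. These are upper bounds that ignore coincident or parallel objects and ignore compass constructions entirely; the function name is hypothetical.

```python
from math import comb


def points_after_rounds(n_points, rounds):
    """Upper bound on the number of points after each ruler-only round."""
    counts = [n_points]
    for _ in range(rounds):
        lines = comb(counts[-1], 2)       # one line per pair of points: O(n^2)
        counts.append(comb(lines, 2))     # one point per pair of lines: O(n^4)
    return counts


print(points_after_rounds(5, 2))
```

Starting from 5 points, the bound is already in the hundreds of thousands after two rounds, which is why unguided exhaustive search is impractical at any real depth.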

Synthesis Algorithm for Geometry Constructions

Problem: Search space is huge.

Idea #2: Perform goal-directed reasoning.
  Example: If an operation leads to construction of a line L that passes through a desired output point, it is worthwhile constructing line L.
  Mimics human intelligence.
  For this to be effective, we need solutions with small depth.

Idea #3: Work with a richer library of primitives.
  Common constructions picked up from chapters of textbooks.
  A search space of (small width, large depth) is converted into one of (large width, small depth).
  Mimics human experience/knowledge.

Search space exploration: with/without goal-directedness

Problem Solving Engine with Natural Interfaces

[Diagram components: Natural Language Processing, Paraphrasing, Synthesis Engine. Data: Problem Description in English, Problem Description as Logical Relation, Solution as Functional Program, Solution in English.]

Joint work with: Kalika Bali, Monojit Chaudhuri (MSR Bangalore), Vijay Korthikanti (UIUC), Ashish Tiwari (SRI)

Useful modules powered by problem solving engine

The next step is to architect several useful modules on top of the problem-solving architecture such as:

Interactive feedback to students
  Provide hints
  Point out mistakes and suggest fixes
Creation of teaching material (for teachers)
  Problem construction
  Authoring tools

Other Domains

What domains should we prioritize for automation?

Mathematics
  Algebra
  Probability
Physics
  Mechanics
  Electrical Circuits
  Optics
Chemistry
  Quantitative Chemistry
  Organic Chemistry


Electrical Circuits: Concept-specific solutions

Consider the problem of computing effective resistance between two nodes in a graph of resistances.

MATLAB implements Kirchhoff’s law based decision procedure
  Algebraic sum of the currents at any circuit junction = 0
  Sum of changes in potential in any complete loop = 0

Joint work with: Swarat Chaudhuri (Penn State University)

Electrical Circuits: Concept-specific solutions

Kirchhoff’s law based decision procedure is not useful for students who are expected to know only simpler concepts.

Solutions need to be parameterized by specific concepts such as:
  Series/Parallel composition of resistances
  Symmetry Reduction
  Wheatstone Bridge

Resistance Reduction Concepts

Series Combination
Parallel Combination
Wheatstone Bridge: if R3/R1 = R4/R2, then V_D = V_B
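The three reduction concepts correspond to three small formulas. The sketch below assumes a conventional bridge layout that the slide's figure does not spell out here: R1 and R2 in series along one arm (meeting at node B), R3 and R4 in series along the other arm (meeting at node D), with the bridge resistor between B and D. All function names are hypothetical.

```python
def series(*rs):
    """Series combination: resistances add."""
    return sum(rs)


def parallel(*rs):
    """Parallel combination: reciprocals add."""
    return 1.0 / sum(1.0 / r for r in rs)


def bridge_is_balanced(r1, r2, r3, r4, tol=1e-9):
    """Balance condition from the slide: R3/R1 = R4/R2, hence V_D = V_B."""
    return abs(r3 / r1 - r4 / r2) <= tol


def balanced_bridge_resistance(r1, r2, r3, r4):
    """With V_D = V_B no current flows through the bridge resistor, so it can be
    removed and the two arms treated as parallel series chains (assumed layout)."""
    if not bridge_is_balanced(r1, r2, r3, r4):
        raise ValueError("bridge is not balanced; this shortcut does not apply")
    return parallel(series(r1, r2), series(r3, r4))


print(series(1, 2, 3), parallel(6, 3), balanced_bridge_resistance(1, 2, 2, 4))
```

A concept-specific solver would apply rewrites like these step by step, producing a solution a student can follow, instead of solving the full system of Kirchhoff equations.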

Automating Education: Long-term Goals

Ultra-intelligent computer
Model of human mind
Inter-stellar travel




(Inter-disciplinary) Dimensions in Program Synthesis

User Intent
  Human Computer Interaction
  Natural Language Processing
Search Space (requires corresponding domain expertise)
  Graphics (for image manipulation)
  Mathematics/Physics (for classroom problem solving)
Search Techniques
  Logical Reasoning
  Machine Learning



The Significance Dimension

Consumers of Program Synthesis Technology
  Students and Teachers (Most Transformational Target)
  End-Users (Most Useful Target)
  Algorithm Designers
  Software Developers


Research Questions

How to combine various forms of user intent in a unified programming interface?
  logic, natural language, input/output example, partial program
How to ensure a modular architecture that allows reuse of domain knowledge and search techniques across different synthesis tools/applications?
How to combine power of different search techniques?
  Version space algebras
  SAT/SMT based logical reasoning techniques
  Machine learning techniques


References

Dimensions in Program Synthesis
  Invited paper at ACM PPDP 2010
Bitvector Algorithms
  “Oracle guided component based program synthesis”, ICSE 2010, Jha/Gulwani/Seshia/Tiwari
String Manipulation Macros
  “Automating String Processing in Spreadsheets using Input-Output Examples”, POPL 2011, Gulwani
Geometry Constructions
  “Synthesizing Geometry Constructions”, Techreport 2011, Gulwani/Korthikanti/Tiwari