DSLs for Cellular Interactions - ITTC


Domain-specific Languages for Cellular Interactions

Bill Harrison

Department of Computer Science

University of Missouri at Columbia

This work partially supported by:

NIH 1 R01 GM62920-04A1,

NIH 1 P20 GM065762-01A1,

the Georgia Research Alliance and

the Georgia Cancer Coalition.



Ph.D. 2001, UIUC

Thesis: Modular Compilers and Their Correctness Proofs

Thesis Advisor: Sam Kamin

Post-doc, Oregon Graduate Inst. (OGI)

Three years on Programatica Project

using Haskell programming language as basis for formal methods

Assistant Professor, University of Missouri-Columbia since Fall 2003

Systems Biology asks…

Can static biological structure be related to dynamic biological behavior with mathematical clarity, precision, & rigor?

Can biological systems be viewed as the “sum of their parts”?

Can component-level models be integrated into precise system-level models of biological behavior?

What techniques from Mathematics and Computer Science apply to this composition problem?


Rhodobacter Sphaeroides

Photosynthetic bacterium

seeks out regions of greater light

Roughly the size of a wavelength of light

cannot sense local light differences directly, so it applies a random walk


Simulations of Biological Systems

Simulations provide qualitative feedback, but are not models per se

how accurate/faithful is a simulation?

what does the feedback mean?

can one reason about the biological phenomenon based on the simulation?

can you identify the biology by inspecting the text of the simulation program?

R. Sphaeroides in C++

contains 1000 LOC

to understand requires

expertise in C++

…and biological model

…and critical system details

e.g., how is concurrency implemented?

    bool global_state::register_state(void *apointer)
    {
        if (number_of_states == mother_of_all_states.size())
            mother_of_all_states.resize(number_of_states + 1000);
        mother_of_all_states[number_of_states++] = apointer;
        return true;
    }

R. Sphaeroides in C++

Program structure does not reflect biological model

can you look at the source code and recognize the underlying biology?

difficult to comprehend

…and write correctly

…and modify

…and maintain

…and re-use


Systems Biology as Programming Language Design

The Problem: General-purpose programming languages do not have the “right vocabulary”

Biological model: Concurrent Markov chains

C++: classes, pointers, etc.

…nor are they mathematics

Our Solution: Design small, special-purpose languages with exactly the right vocabulary

called a Domain-specific Language (DSL) [Sheard99,Thiemann01,Leijen01]

Mathematical semantics of DSLs gives formal model of biology

Language Model of R. Sphaeroides

Executing: cell_1 || … || cell_n

Produces animation: (animation not reproduced here)

Outline

Language Design and Domain-specific Languages

design, definition, and implementation

Systems Biology as Language Design

Case Study for Rhodobacter Sphaeroides

Design: what are the appropriate abstractions for R. Sphaeroides?

Definition: how do we specify exactly what R. Sphaeroides programs mean?

Implementation: how do we run R. Sphaeroides programs?

Conclusions

Cardinal Rule of Language Design

Application Programmers should choose languages with abstractions most suited to their task;

Language designers must provide languages with those abstractions…

Domain                   Central Activities       Reasonable Language
System Programming       “bit-fiddling”           C
Artificial Intelligence  List processing          LISP
System Admin.            Text processing, etc.    PERL

Ex: “Parsec” Parser DSL

DSLs are small languages w/ “domain abstractions”

BNF for language:

    <Stmt> ::= <ident> := <Expr>

translates directly into Parsec code:

    assignStmt :: Parser Stmt
    assignStmt = do{ id <- identifier
                   ; symbol ":="
                   ; s <- expr
                   ; return (Assign id s)}

“Why a language and not a library?”

The Slogan: “What is excluded from a DSL is as important as what is included in it”

libraries in a general-purpose language still require considerable expertise & self-discipline on the part of the programmer

Lack of generality in DSL means fewer things to “go wrong”

DSL may have desirable properties that a general-purpose language will not

e.g., implementation techniques specialized to DSL that do not apply to general-purpose languages

small size makes rigorous specification tractable

DSL Design

DSL design for R. Sphaeroides

what are our domain abstractions?

How does this organism behave?

What modeling techniques are used by biologists to describe this behavior?

Bacterial Commands

adjust speed, grow, divide, tumble, die, laze

*Probability of growth varies with light concentration

Chapman-Kolmogorov Equation*

P_{i,j}: probability of transition from i to j

probability of being in state m

*Commonly used framework for modeling biological systems [Bremaud99, Dailey02, Mao02, Shah00]

Chapman-Kolmogorov Equation

A row in the above matrix encodes the transition function from state i of a Markov chain
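For reference, a standard way to write these relations for a discrete-time, time-homogeneous chain over states 0..m is sketched below. Only P_{i,j} appears on the slides; the symbols \pi^{(n)} (the distribution over states after n steps) and P^{(r)} (r-step transition probabilities) are notation assumed for this sketch.

```latex
\[
  \pi^{(n+1)}_j \;=\; \sum_{i=0}^{m} \pi^{(n)}_i \, P_{i,j},
  \qquad
  P^{(r+s)}_{i,j} \;=\; \sum_{k=0}^{m} P^{(r)}_{i,k} \, P^{(s)}_{k,j}
\]
```

The first identity gives the probability of being in state j after one more step; the second is the Chapman-Kolmogorov relation between multi-step transition probabilities. Row i of the matrix (P_{i,j}) sums to 1 and acts as the transition function from state i.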


Bacteria as Markov Chains

State i → State 0 with probability P_{i,0}, …, State i → State m with probability P_{i,m}

non-deterministic state machines with probabilistic transitions induced by the Chapman-Kolmogorov equation

P_{i,j} in terms of environmental factors, organism state, etc.

executing concurrently

Domain Abstractions for R. Sphaeroides

Individual cells: Markov-chain abstraction

    choose
        P_1 → Action_1
        …
        P_n → Action_n

Actions: Tumble, Divide, AdjSpeed, Laze, Grow, etc.

Concurrency: cell_1 || cell_2

Environmental Factors: light, size

Abstract syntax for CellSys

choose is our principal domain abstraction

behaves like the Markov chain transition function

Cell-level environment variables: light, size
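The grammar itself is not shown in the text, so the following is only a minimal sketch of what an abstract syntax built around choose might look like in Haskell. Every type, constructor, and function name here (Var, Prob, Cell, System, Env, evalProb, step) is an assumption made for illustration, not the CellSys definition.

```haskell
-- Hypothetical abstract syntax for a CellSys-like DSL; all names are
-- illustrative assumptions, not the definition from the talk.

-- Cell-level environment variables mentioned on the slide.
data Var = Light | Size
  deriving (Eq, Show)

-- Probability expressions may depend on the cell's environment.
data Prob
  = Const Double          -- fixed probability
  | Scale Double Var      -- coefficient times an environment variable
  deriving (Eq, Show)

-- Primitive bacterial actions.
data Action = Tumble | Divide | AdjSpeed | Laze | Grow | Die
  deriving (Eq, Show)

-- A cell program: one probabilistic choice among guarded actions.
newtype Cell = Choose [(Prob, Action)]
  deriving (Eq, Show)

-- A system is a parallel composition: cell_1 || ... || cell_n.
newtype System = Par [Cell]
  deriving (Eq, Show)

-- Environment seen by one cell.
data Env = Env { light :: Double, size :: Double }

-- Evaluate a probability expression in an environment.
evalProb :: Env -> Prob -> Double
evalProb _   (Const p)       = p
evalProb env (Scale k Light) = k * light env
evalProb env (Scale k Size)  = k * size env

-- Select an action for a sample r in [0,1) by walking the cumulative
-- distribution, in the manner of a Markov-chain transition function.
-- Returns Nothing when r exceeds the total listed probability.
step :: Env -> Double -> Cell -> Maybe Action
step env r (Choose alts) = go r alts
  where
    go _ [] = Nothing
    go x ((p, a) : rest)
      | x < w     = Just a
      | otherwise = go (x - w) rest
      where w = evalProb env p
```

In this sketch the probability of Grow can be written as `Scale 0.5 Light`, so it rises and falls with the light level, while the sample r is supplied by the caller to keep the function pure.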

DSL Definition

Background: Programming languages are “collections of effects”

Java = OO + Threads + State + …

LISP = Higher-order Functions + …

Prolog = Backtracking + …

Corresponding to each such effect is an algebraic construction called a monad

used for the development of modular semantic theories of programming languages [Moggi89]

monads may be constructed using “monad transformers”

Periodic Table of Effects

Transformer  Effect          Operations
StateT       imperative      :=
EnvT         binding         @ v
ErrorT       exceptions      raise/catch
ContT        continuations   callcc
NondetT      non-determ.     choose
ResT         threads         step pause
DebugT       debugging       rollback
BackT        backtracking    cut
ProbT        probability     random
ReactT       reactivity      send,recv,…



Prog. languages are collections of effects captured as monads [Moggi]

Monads assembled from constructors (monad transformers)

Our view: Systems are collections of effects captured as monads

“Systems” broadly construed: Compilers [Harrison00,98,01,02], Secure system software [Harrison05,03], and Biology [Harrison04]

Periodic Table of Effects



Mathematical definitions for any language created by combining monad transformers (MTs)

CellSys = StateT + ResT + ProbT + ReactT

Such definitions are flexible: modular, extensible, and easily refactored
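To make the idea of combining transformers concrete, here is a minimal sketch that layers only two effects, state over probability. The locally defined StateT and Dist are stand-ins for the talk's StateT and ProbT, and coin, growOrLaze, and the Cell synonym are hypothetical names. ResT and ReactT are omitted, so this is not the CellSys monad itself.

```haskell
-- A minimal sketch of building a monad by layering effects.
-- StateT and Dist are defined locally as assumptions for illustration.

-- Base effect: finite probability distributions.
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad Dist where
  Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

-- State transformer: adds a store of type s on top of any monad m.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

instance Monad m => Functor (StateT s m) where
  fmap f (StateT g) = StateT $ \s -> do
    (a, s') <- g s
    return (f a, s')

instance Monad m => Applicative (StateT s m) where
  pure a = StateT $ \s -> return (a, s)
  StateT f <*> StateT g = StateT $ \s -> do
    (h, s')  <- f s
    (a, s'') <- g s'
    return (h a, s'')

instance Monad m => Monad (StateT s m) where
  StateT g >>= k = StateT $ \s -> do
    (a, s') <- g s
    runStateT (k a) s'

-- Operations contributed by each layer.
lift :: Monad m => m a -> StateT s m a
lift m = StateT $ \s -> m >>= \a -> return (a, s)

modify :: Monad m => (s -> s) -> StateT s m ()
modify f = StateT $ \s -> return ((), f s)

coin :: Double -> Dist Bool      -- True with probability p
coin p = Dist [(True, p), (False, 1 - p)]

-- The combined monad supports both state and probability.
type Cell = StateT Double Dist   -- state: cell size

-- With probability p the cell grows by 10%, otherwise it lazes.
growOrLaze :: Double -> Cell ()
growOrLaze p = do
  grows <- lift (coin p)
  if grows then modify (* 1.1) else return ()
```

Evaluating `runDist (runStateT (growOrLaze 0.25) 1.0)` enumerates both outcomes with their probabilities. Adding a further effect would mean wrapping another transformer around the stack without rewriting growOrLaze, which is the kind of modularity the slide claims.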



High-level view of definition

DSL definition similar to traditional RTS

In a traditional RTS

threads request services like “send a message”, “output on device”, “consume resource”

RTS mediates: ensuring that the threads do not interfere; global system state remains consistent; schedules threads

Diagram: threads interacting with the Run-time System


In CellSys

Cells are threads with physical components as well: size, velocity, …

cells request services like “consume nutrients”, “move me here”, “want to divide”

GE mediates like RTS, also: preserves physical integrity; updates global world view; performs scheduling

Diagram: cells interacting with the Global Environment (GE)
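The mediation described above can be sketched as a small scheduler in which each cell is a stream of requests and the global environment decides what to grant. This is a simplified illustration under assumed names (Request, Cell, World, serve, run); it models only nutrient consumption and round-robin scheduling, not the CellSys semantics.

```haskell
-- Sketch of a global environment (GE) serving cells round-robin.
-- Request names and the World record are illustrative assumptions.

-- Services a cell may request from the GE.
data Request = Consume Double | MoveTo (Double, Double) | WantDivide
  deriving (Eq, Show)

-- A cell is a resumption: either finished, or a request paired with
-- the rest of its behaviour.
data Cell = Done | Ask Request Cell

-- Global world view maintained by the GE.
data World = World { nutrients :: Double, served :: [Request] }

-- The GE grants or refuses a request and updates the world.
serve :: Request -> World -> World
serve (Consume n) w
  | n <= nutrients w = w { nutrients = nutrients w - n
                         , served = Consume n : served w }
  | otherwise        = w            -- refuse: not enough nutrients
serve r w = w { served = r : served w }

-- Round-robin scheduler: serve one request of the first cell, then
-- move the remainder of that cell to the back of the queue.
run :: [Cell] -> World -> World
run []                w = w
run (Done : cs)       w = run cs w
run (Ask r rest : cs) w = run (cs ++ [rest]) (serve r w)
```

Because every request passes through serve, the world state is updated in one place and a request that would overdraw the nutrient supply is simply refused, which mirrors the role the slide assigns to the GE.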

DSL Implementation


Because CellSys is defined in terms of monad transformers, it may be implemented directly as a Haskell program

I.e., monadic language definition may be transcribed “symbol for symbol” into Haskell

Haskell implementation easily instrumented to output system “snapshots”:

snapshots are printed in POV (Persistence of Vision) format and converted into MPEG
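As a rough illustration of such instrumentation, the sketch below renders a snapshot as POV-Ray sphere objects. The CellSnap record and the function names are hypothetical, and a complete scene would also need a camera and a light source, which are omitted here.

```haskell
import Data.List (intercalate)

-- Hypothetical snapshot record; field names are assumptions.
data CellSnap = CellSnap { pos :: (Double, Double), radius :: Double }

-- Render one cell as a POV-Ray sphere lying in the z = 0 plane.
povCell :: CellSnap -> String
povCell (CellSnap (x, y) r) =
  "sphere { <" ++ intercalate ", " (map show [x, y, 0]) ++ ">, "
    ++ show r ++ " pigment { color rgb <0, 1, 0> } }"

-- One snapshot becomes one scene fragment: a sphere per cell.
povSnapshot :: [CellSnap] -> String
povSnapshot = unlines . map povCell
```

Writing one such fragment per simulation step gives a sequence of scenes that a renderer can turn into frames for a movie.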

Q: What are appropriate languages for modeling?

Integrate techniques from programming languages

models of concurrency

language semantics, i.e., precise, mathematical language definitions

efficient language implementation

…into special purpose language called a “Domain-Specific Language”

abstractions taken directly from biology

comprehensible by biologists

DSLs and DSL programs

hide technical details irrelevant/uninteresting to biologists

are “tunable” by computer scientist to reflect discovery/refinement

execute to provide “reality check” by biologists

Bioinformatics = Computer Science + Biology

Computer Science:

models of concurrency

efficient implementation

mathematical models of programs

reasoning about programs

Biology:

organism structure & behavior

modeling techniques

cellular automata

systems of PDE’s

numerical techniques

Hard Problem: How do you effect a technology transfer from CS to Biology?

Interdisciplinary Process

CellSys (version 1.0)

Biologist evaluates DSL model for accuracy, expressiveness, etc.

feedback/discussion

Language expert refactors language as needed

CellSys (version 2.0)

Summary

modular monadic semantics: Precise description of biological phenomena through DSL semantics

domain specific languages: Comprehensibility, Reusability, & Ease of Use

systems biology: Large body of work providing domain abstractions & models

* Harrison & Harrison, “Domain Specific Languages for Cellular Interactions”, in Proceedings of the International Conference IEEE Engineering in Medicine and Biology, 2004.