YACC no more

skatechildrenSoftware and s/w Development

Nov 3, 2013 (3 years and 9 months ago)

50 views

Session # 2221

YACC no more

Sriram Srinivasan (“Ram”)

Integrating parsers, interpreters and
compilers into your application

Session # 2221

2

This is he


Sriram Srinivasan


One of the core engineers of the WebLogic
app server


Wrote the first commercially available EJB
implementation


Wrote the TP engine in the WLS


Author: “Advanced Perl Programming”
(O’reilly)

Beginning

Session # 2221

3

Why this talk?


Quest for higher level programming patterns


More productive / faster / maintainable etc…


Integrating compilers, parsers, interpreters
into your application



Beginning

Session # 2221

4

Embeddable Parsers


JDK parsers for configuration data


java.util.Properties, XML, regex library


java.util.Properties


Limited to “
property = value
” format


Takes care of comments, multi
-
line values, quotes


Case Study: Configuration Data

#app server properties

connectionPoolName = testPool

numThreads = 10




p = new Properties().load(inputStream)

Middle

Session # 2221

5

XML parsers


Good for structured, hierarchical data


DOM (Document Object Model) parser


Converts an entire XML document into a
corresponding tree of Nodes.


SAX (Simple API for XML)


Callback class extends
DefaultHandler


Supplies methods for
startDocument(…)
,
startElement(…)
,
endElement(…)

etc.

Middle

Session # 2221

6

Adding code to data


Problem: We want to add add macros and
expressions to our properties.

numThreads =
numProcessors

# Ensure that connection pool is smaller than

# thread pool.

connectionPoolSize =
min(numThreads


2, 1)



This requires an expression evaluator


Middle

Session # 2221

7

Embeddable interpreters


Plethora of free, high quality interpreters
available


BeanShell (Java
-
like syntax)


Rhino (JavaScript)


Jython (Python in Java)


Kawa (Scheme in Java)


When embedded, flow of control easily goes
from java to interpreter to back.


Command
-
line shell always included

Middle

Session # 2221

8

BeanShell


Expressions identical to java


Types are inferred dynamically

Middle

add( a, b ) {


return a + b;

}


sum = add(1, 2); // 3

str = add("Web", "Logic"); // "WebLogic"

Session # 2221

9

Embedding BeanShell

Middle

import bsh.Interpreter;


Interpreter i = new Interpreter();

i.set
("foo", 5);

i.eval
("bar = foo*10");

System.out.println("bar = "+
i.get
("bar"));

i.eval(new FileReader("config.properties"));

Integer n = i.get("connectionPoolSize");


Instead of writing code to parse the properties file,
just
eval

it!


Comments should be “// … ”, not “# …


Each property definition line should end in “;”

Session # 2221

10


Strict java expression syntax


no class declarations


Loose convenience syntax


BeanShell features

Middle

b = new java.awt.Button();

b.label = "Yo" // eqvt. to b.setLabel("Yo")

h = new Hashtable();

h{"spud"} = "potato";

// Swing stuff

b = new JButton("My Button");

f = new JFrame("My Frame");

f.getContentPane().add(b, "Center");

f.pack();

f.show();

Session # 2221

11

Rhino


Free ECMAScript interpreter from Mozilla


Slightly more cumbersome to embed than
BeanShell


Contains bytecode compiler that can be
called from within java


Closures


Regex support built
-
in. Good for text
manipulation


Middle

Session # 2221

12

Case study: Command pattern

Middle

function insertCommand(text) {


this.pos

= buf.pos


buf.insert(text)


this.len

= text.length


this.undo = function () {


buf.moveTo(this.pos)


buf.erase(this.len);


}



undoStack.push(this);

}


new insertCommand("foo")

undoStack.pop().undo()


Undo/Redo in an editor

Session # 2221

13

Python

Middle


Python (Java implementation is "Jython")


powerful high
-
level language


Compiles to bytecode.


True scripting language


Can extend java classes


Static compilation and standalone execution

Session # 2221

14

More case studies


Embedded expressions


Spreadsheet formulae


Customizable GUIs


Macro facility, keyboard mapping


Remote agents


Monitoring


Performance through partial evaluation


Middle

Session # 2221

15

Case Study: Remote Agents


Example: Test Agents


Can upload script to each agent to launch
processes, control them locally.


Jython is well
-
suited for this kind of task


Example: Scriptable IMAP mail server


"All messages that contain this regex, make
a copy in this folder"

Middle

Session # 2221

16

Case Study: Monitoring


SNMP model: Obtain attributes from each
node over the network, do calculation


Alternatively, upload script to each node,
and let it return the result


Conserves network bandwidth


Can insert any kind of probe


Study application data structures


Application
-
specific profiling


Middle

Session # 2221

17

Case Study: Performance


Partial evaluation can yield substantial
performance benefits


Object
-

RDBMS adaptors


Code generator studies class and db
schema


Omits unnecessary conversions, null checks


Vector dot product


Middle

dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];


// But if 'a' is fixed {16,0,4} …

dp = b[0] << 4 + b[2] << 2

Session # 2221

18

Generating java


Moving from embedded interpreters to
generating java source


Example: JSP.


Convert template to java, compile and
dynamically load


BEA/WebLogic's
weblogic.dtdc


Converts XML DTD to a high performance
SAX parser tuned to that DTD

Middle

Session # 2221

19

Generating code with Doclets


javadoc

is a general purpose parser

javadoc

doclet ListClass foo.java



ListClass.start() called with a hierarchy of *Doc
nodes

import com.sun.javadoc.*;

public class ListClass {


public static boolean
start
(
RootDoc

root) {


ClassDoc
[] classes = root.classes();


for (int i = 0; i < classes.length; ++i) {


System.out.println(classes[i]);


}


return true;

}


Arbitrary tags can be introduced at any level

Middle

Session # 2221

20

Case study: iContract


Pattern: doclet expressions converted to
annotated java code

/**

* Ensure that argument is always > 0

*
@pre f >= 0.0

*

* Ensure that the function produces the sqrt

* within a

*
@post Math.abs((return * return)
-

f) < 0.001


*/

public float sqrt(float f) { ... }


Middle

Session # 2221

21

Case Study: EJBGen

/**


*
@ejbgen:entity


*
ejb
-
name = AccountEJB
-
OneToMany


*
data
-
source
-
name = demoPool


*
table
-
name = Accounts


*/

abstract public class AccountBean


implements EntityBean {


/**


*
@ejbgen:cmp
-
field column = acct_id


*
@ejbgen:primkey
-
field


*
@ejbgen:remote
-
method transaction
-
attribute


= Required


*/


abstract public String getAccountId();


Middle

Session # 2221

22

Generating bytecode


Example: WebLogic RMI adaptors


Sometimes, some facilities are available
only in bytecode (goto's!)


Example: fast string matching


Given a search string, encode the state
machine into bytecode


Worth it if the same pattern is going to be
used many times


Virus scanners


Searching genome sequences

Middle

Session # 2221

23

Example: String matching


Problem: match "10100"


Convert to a state machine


Each state encodes a succesful prefix match



Middle

S
5

S
0

S
1

S
3

S
4

1

0

1

0

0

0

1

S
2

1

0

1

Session # 2221

24

String matching
(contd.)


If only goto were allowed in java …















But, goto's are allowed in bytecode!




Middle

try { //buf

is the buffer to be searched


int i =
-
1;


s0: i++; if (buf[i] != '1') goto s0;


s1: i++; if (buf[i] != '0') goto s1;


s2: i++; if (buf[i] != '1') goto s0;


s3: i++; if (buf[i] != '0') goto s1;


s4: i++; if (buf[i] != '0') goto s3;


s5: i++; return i
-
5;

} catch (ArrayIndexOutOfBoundsException e) {


return
-
1;

}

Session # 2221

25

String matching
(contd.)


Using an assembler like jasmin



Middle


iconst_m1


istore_1

S0
:
;; i++; if a[i] != '1' goto S0;



iinc 1 1
; i++


aload_0
; load a[i]


iload_1


caload


bipush 49
; load '1'


if_icmpne S0
; if .. goto S0

S1
:
;; i++; if a[i] != '0' goto S1



iinc 1 1


aload_0


iload_1


caload


bipush 48


if_icmpne S1

Session # 2221

26

Custom languages


Craft a language that fits the context you are
working in


Avoid XML ugliness: SRML (Simple Rule Markup)


Instead of "
if s.purchaseAmount > 100 …

"

Middle

<simpleCondition className="ShoppingCart"


objectVariable="s">


<binaryExp operator="gt">


<field name="purchaseAmount"/>


<constant type="float" value="100"/>


</binaryExp>

</simpleCondition>

Session # 2221

27

Antlr Introduction


Antlr: A recursive descent parser with configurable
lookahead (LL(k) parser)


Much,
much

simpler than lex/yacc


Yacc error messages are cryptic, tough for non
-
CS
types to understand


Even generated code easy to understand


Includes tree building and recognition


No such facility in yacc


Lexer, parser and tree recognizer phase have
similar syntax


Middle

Session # 2221

28

Antlr


Example: hierarchical property list


A list consists of name value pairs


Names are identifiers, values are numbers or lists


Middle

( a 200


b (c 10


d 20)

)

Session # 2221

29

Antlr
(contd.)

Middle

class
LispLexer

extends Lexer;

ID : ('a' .. 'z')+;

NUM: ('0' .. '9')+;

LP : '(';

RP : ')';


class
LispParser

extends Parser;

list : LP (nameValuePair)+ RP;

nameValuePair : ID value ;

value : NUM | list;

Session # 2221

30

Antlr
(contd.)

Middle

nameValuePair
returns [NVP ret=null]

{Object v;}


:
t:
ID
v=
value


{ret = new NVP(t.getText(),v);}


;

value
returns [Object ret=null]


:
t:
NUM
{ret=t.getText();}



|
ret=
list


;


Adding code, arguments, return values

Session # 2221

31

Way out there …

Middle


Configurable hardware


New circuits on the fly


Intentional programming


Code not represented as a stream of characters

Session # 2221

32

Summary


Run
-
time evaluation gives you a lot of power


Other languages add features (e.g. closures) to
java


Lots of simple, free, quality parsers, interpreters


Produce custom java source or byte code for
performance


Roll your own domain
-
specific language with
ANTLR or javacc.


Yacc No More.



End

Session # 2221

33

References


Doclets


Doclet tools:
www.doclet.com


EJBGen:
www.beust.com
, Cedric Beust


Icontract:
www.reliable
-
systems.com,
Reto Kramer


Languages, interpreters


Beanshell:
www.beanshell.org


Rhino:

www.mozilla.org/rhino


Python:
www.python.org, www.jython.org


ANTLR:
www.antlr.org


More …
flp.cs.tu
-
berlin.de/~tolk/vmlanguages.html


SRML:
xml.coverpages.org/srml.html




End

Session # 2221

34

References
(contd.)


Bytecode manipulation:


Jasmin:
mrl.nyu.edu/~meyer/jasmin/


Jikes Bytecode toolkit:
www.alphaworks.ibm.com/tech/jikesbt



BCEL:
bcel.sourceforge.net


"Rapid"
-

Reconfigurable hardware


www.cs.washington.edu/research


"The death of computer languages, the birth of
intentional programming", Charles Simonyi


research.microsoft.com/scripts/pubs/trpub.asp


Microsoft tech report MSR
-
TR
-
95
-
52


Thinking in Patterns with Java
, Bruce Eckel


www.mindview.net/Books/TIPatterns


End