FerryLeaks - A Web-based Editor for the Table Algebra

unalaskaweepingInternet and Web Development

Jul 19, 2012 (5 years and 1 month ago)

623 views

Eberhard Karls Universitat Tubingen
Mathematisch-Naturwissenschaftliche Fakultat
Wilhelm-Schickard-Institut fur Informatik
Bachelor Thesis
FerryLeaks - A Web-based Editor
for the Table Algebra
Patrick Brosi
August 2011
Reviewer
Prof.Dr.Torsten Grust
Wilhelm-Schickard-Institut fur Informatik
Universitat Tubingen
Advisors
Jeroen Weijers,M.Sc.
Wilhelm-Schickard-Institut fur Informatik
Universitat Tubingen
Dipl.-Inf.Manuel Mayr
Wilhelm-Schickard-Institut fur Informatik
Universitat Tubingen
Brosi,Patrick:
FerryLeaks - A Web-based Editor for the Table Algebra
Bachelor Thesis
Eberhard Karls Universitat Tubingen
Period:May 1,2011 - August 31,2011
Abstract
The Ferry language can be mapped to a growing number of programming lan-
guages,providing themwith the ability to process large amounts of data on an
external DBMS like PostgreSQL or IBMDB2.For SQL compilation,Ferry uti-
lizes Pathnder,using an XML representation of the query plan bundle as an
intermediary.As Ferry is being further developed and the number of supported
languages grows,debugging these intermediate plans becomes essential.In this
thesis,we present a web-based editor for manipulating,evaluating and vali-
dating query plans - FerryLeaks.It was created with the Google Web Toolkit
which allows the cross-compilation of Java code to JavaScript.The editor's
usage,its implementation,installation and possible extension are discussed in
this work.
i
Acknowledgements
I would like to thank my advisors Jeroen Weijers and Manuel Mayr for always
keeping an open door and for giving elaborate answers to my frequent emails.
I want to especially thank Jeroen Weijers for rereading this thesis again and
again,parsing for otherwise untraceable errors and giving honest criticism.My
thanks go to everyone who beta-tested FerryLeaks and provided me with bug
reports.
Special thanks go to Jan Rittinger,who supervised our student project The
Construction of a SASL Compiler in summer 2010.He always provided us with
constructive criticism.I would not have been able to implement FerryLeaks'
parsing functionality without the experience I made back then.
I would also like to thank George Giorgidze,who introduced me to the
Google Web Toolkit in January 2011 and who has been a very friendly super-
visor throughout my time in the database research group.
Very special thanks go to Torsten Grust,who not only allowed me to take a
job in his group,but who also supported me in many other ways over the last
year.
ii
Contents
1.Introduction 1
2.Ferry and the Table Algebra 3
2.1.An Everyday Example.......................3
2.2.Doing it the Ferry Way.......................5
2.3.Compilation Stages and the Table Algebra............6
2.4.XML Representation of Plans...................7
3.Tools and Frameworks 9
3.1.The Google Web Toolkit......................9
3.1.1.Principles of Web Applications..............10
3.1.2.A Note on the MVC Pattern................11
3.1.3.Installing GWT.......................12
3.1.4.Building the User Interface.................13
3.1.5.Communicating With the Server..............14
3.1.6.Including Raw JavaScript and Libraries..........16
3.1.7.Limits of GWT and Alternatives.............17
3.2.The Raphael Library........................18
3.2.1.Features...........................19
3.2.2.Using Raphael with GWT.................19
4.All Aboard the bugFerry!21
4.1.Using FerryLeaks..........................22
4.1.1.The Workbench.......................22
4.1.2.Loading Plan Bundles...................23
4.1.3.Editing Plans........................24
4.1.4.Creating New Plans from Scratch.............26
4.1.5.Evaluation and SQL Listeners...............27
4.1.6.Output Formats.......................29
4.2.Implementation...........................30
4.2.1.The GraphCanvas......................31
4.2.2.Filling and Sorting.....................33
4.2.3.Parsing Node Plans Using Schema Files.........34
iii
Contents
4.2.4.Manipulation........................37
4.2.5.Validation..........................38
4.2.6.Session Handling......................40
5.Deploying to a Server 43
5.1.Creating a Web Archive File....................43
5.1.1.Compiling..........................43
5.1.2.Structure of a Web Archive File..............45
5.2.Hosting FerryLeaks with Apache Tomcat.............45
5.2.1.Installing Tomcat and Uploading FerryLeaks.......46
5.2.2.The Conguration File...................47
6.Conclusions and Future Work 49
6.1.Contributions............................49
6.2.Ideas for Additional Features...................50
A.XML Representation of the Table Algebra 53
List of Figures and Listings 59
Bibliography 63
iv
1.Introduction
Relational database management systems are among to most ecient query
processors for large amounts of data.Decades of research have led to a
thorough understanding of their capabilities and to well-designed implemen-
tations [15].However,accessing databases from programming and scripting
languages is in most cases done by including static textual SQL queries into
the source code and processing the results within the host language.By doing
so,various advantages oered by DBMS are simply not called upon.
The Ferry project tries to establish a connection between the world of
databases and the world of programming languages.It strives to provide a
seamless integration of DBMS abilities into various host languages.For cre-
ating SQL queries,Ferry makes use of the Pathnder project.A query plan
(bundle) is outputted by Ferry and given to Pathnder for further compilation.
Up to now,debugging these plans is a challenging task.Trying to load an
erroneous plan into Pathnder usually results in cryptic error messages being
thrown (Listing 1.1).As neither line numbers are presented nor further error
descriptions given,ecient debugging is virtually impossible.In Listing 1.1,
1!ERROR:Segmentation fault.
2 The Pathfinder compiler experienced an internal problem.
3 You may want to report this problem to the Pathfinder
4 development team (pathfinder@pathfinder -xquery.org).
5
6 When reporting problems,please attach your XQuery input,
7 as well as the following information:
8 Compilation stage:0
9
10 We apologize for the inconvenience...
Listing 1.1:Error messages presented by Pathnder
the cause of the problemwas a simple reference to a child node introduced after
the parent node.Additionally,creating new query plans by hand or editing
existing ones can be a tedious job.Usually one has to dig into large XML les
and is frequently entangled in a mesh of node id references and column names.
1
1.Introduction
In this thesis,we present a live editor and evaluator for the table algebra [3]
on a pure JavaScript basis:FerryLeaks (a.k.a.the bugFerry).This
work also serves as an introduction to the extension of the editor by future
developers.
FerryLeaks was created using an AJAX framework called GWT,the Google
Web Toolkit.GWT enables developers to cross-compile Java code to high-
performance JavaScript.Graphical representation of the plan bundles was
realized using the lightweight Raphael library.
After a brief introduction to Ferry and the table algebra,we have a look
at the basics of GWT and Raphael.The usage of FerryLeaks and an insight
into its actual implementation are presented in Chapter 4.A quick manual on
how to deploy FerryLeaks onto a web server is given using a standard Tomcat
installation as an example.Lastly,we discuss some ideas for future extensions.
2
2.Ferry and the Table Algebra
For developers accustomed to standard programming languages like C++ or
Java,creating database queries often feels like entering an alien world.Well-
known constructs like loops,for-each and if-then-else statements suddenly are
no longer available;one is constrained to the use of what seem to be very
\ at"SELECT-FROM-WHERE statements.More than often those keywords form
the main knowledge of SQL a developer has.
Data processing is commonly done within the known environment of the
programming language itself.This often leads to unnecessary code and un-
necessary data transfers.Consider for example looping through the result of
a SELECT-FROM-WHERE statement to sum up certain values.For big results,
this can take quite some time,many rows will be unused and it adds at least
a dozen of extra lines to the code,whereas a simple AGGREGATE/GROUP BY
statement in the SQL query would have been much faster,easier to process
and would have transferred only the rows we would have actually needed.
Another problem usually arises when trying to ll nested data structures
with SQL-results.In terms of Java,think of lling a HashMap of ArrayLists
of strings.SQL does not support nesting,so lling the data structures would
have to be done by hand.Sending various SQL queries to the server would be
unavoidable.We shed more light on that in Section 2.1.
The Ferry programming language aims to overcome these problems.It serves
as an intermediary between common programming languages and database
management systems.A vast variety of scripting or programming languages
can be mapped onto Ferry.The Ferry compiler ferryc then generates algebraic
plans and gives them to Pathnder,which compiles SQL:1999 statements out
of it.
2.1.An Everyday Example
In order to clarify the statement above we present a modied version of the
introductive example shown in [1].Given a table of products like in Figure 2.1,
where every product has a name,an id,a price and a shop,the two cheapest
3
2.Ferry and the Table Algebra
Articles
id
name
shop
price
1
Franny and Zooey
books
7.99
2
Complete Beethoven
cds
69.99
3
1923 Ch^ateau Mouton-Rothschild
wine
60,000
4
Festen
dvds
6.99
5
The World as Will and Representation
books
9.95
6
Adidas sambas
shoes
56.69
7
Shiny red ball
outdoor
3.99
8
The Godfather Part III
dvds
0.69
9
The Corrections
books
7.99
Figure 2.1:Table Articles
articles per shop and their prices should be found.The result should be given
as a populated HashMap,containing the shop's name as a key and a tuple
of string arrays as a value,the latter representing the id,the name and the
price of the article.Listing 2.1 shows how this would be done in Java,using a
traditional JDBC approach.For readability,we introduced the standard class
Tuple,which can hold two objects of any type and the method getArray()
to return a string array containing a row's columns.
1/** suppressed connection handling **/
2 HashMap <String,Tuple <String[]>> cheapestTwo =    HashMap <
String,Tuple <String[]>>();
3 ResultSet shops = db.executeQuery("SELECT DISTINCT shop FROM
Articles;");
4      (depts.next()) {
5 String curShop = shops.getString("shop");
6 String artQry ="SELECT name,salary FROM Articles WHERE
shop='"+ curShop +"'ORDER BY price DESC FETCH
FIRST 2 ROWS ONLY;"
7 ResultSet arts = db.executeQuery(artQry);
8 Tuple t =    Tuple();
9      (arts.next()) t.put(arts.getArray());
10 cheapestTwo.put(curShop,t);
11 }
12 }
Listing 2.1:JDBC code excerpt
It is obvious that writing queries in this way is not optimal.Imagine a table
with thousands of shops.We would be attacking the database server with a
whole bulk of nearly identical queries just to locally do things the server could
do much more eciently and faster.
4
2.2.Doing it the Ferry Way
2.2.Doing it the Ferry Way
Ferry has been developed to be mapped onto other programming languages.
As a language on its own,Ferry's relevance is constricted to being a sandbox
environment for testing and demonstration purposes.Ferry's methods,oper-
ations and type system are designed to provide support for computations and
iteration over arbitrarily nested lists and tuples.All of this data is formed
according to the recursive denition
t = aj[t]j(t;::::;t)
where a is an atomic type like string or integer.Using this equation,most of the
data structures found in modern programming languages can be modelled by
Ferry.In the Java-world for example,we could represent a simple array of in-
tegers as (1,2,3,4,5),whereas HashMap<Integer,Integer> could be mod-
elled as [(int,int)].It is obvious that this denition corresponds with the
shape of a DBMS-table,where [(int,int,int)] would be a table of three in-
teger columns.Figure 2.1 could be modelled as [(int,string,string,int)].
With Ferry,instead of Listing 2.1,we could use Ruby code like in Listing 2.2 to
1 s_as = Articles.group_by {|a| a.shop}
2 s_ns = s_as.map {|s,as| [s,as.map {|a| [a.name,a.price]}]}
3 top2 = s_ns.map {|s,ns| [s,ns.sort_by {|n,p| -p}.first(2)]}
Listing 2.2:Ruby script
obtain the nested data [1].An equivalent snippet of Ferry code would look like
P
1
in Listing 2.3.This code is compiled,transformed into an SQL:1999 state-
ment and sent to the DBMS.The result will be returned to the host language
for further usage.
1    s = table Articles (  int,name string,shop string,
2 price int)
3 with keys ((  ))
4   for x   e
5      by x.shop
6       (the (x.shop),
7     (2,for y      (x.name,x.price)
8 order by y.2 descending
9       y))
Listing 2.3:Ferry Program P
1
5
2.Ferry and the Table Algebra
Host programming language
ferryc
SQL:1999
Ferry program
pathfinder
Algebraic Plan
Figure 2.2:Ferry compilation stages
2.3.Compilation Stages and the Table Algebra
After parts of the host language have been translated into the Ferry language,
the ferryc compiler takes over.Figure 2.2 visualizes the data ow through
the various compilation stages.Internally,the Ferry programis rst translated
into a Ferry core program out of which the Typed Ferry core and the (Un)boxed
Ferry Core are generated [1].
Ferry outputs bundles of algebraic query plans.Pathnder,originally a
purely relational XQuery compiler,takes the plan and outputs optimized
SQL:1999 queries that can be directly fed to a DBMS.The optimization meth-
ods applied by Pathnder are beyond the scope of this thesis.See [3] for a
in-depth discussion of compilation techniques such as loop lifting and further
query optimizations.
An algebraic plan is a directed acyclic graph (DAG) with a root node and
certain leaf nodes,the latter all representing either literal or referenced tables
in an actual back-end database.Data ow is frombottomto top,meaning that
going from the leaves to the root node,every node does some manipulation to
the input relation.The result of the algebraic plan can be found at the root
node.
Figure 2.3 shows an example of an algebraic plan (actually outputted by
FerryLeaks).At the bottom of the plan,a table node is created,containing
only the column iter with exactly one row of data lled with the integer 1.
Two branches emerge,adding various columns to the original table and doing
some projections.The results are joined again with an equijoin on columns
iter and iter999901.Finally,two columns are added and the results written
into a new column.
We explicitly point out that this plan is not complete:Anode called serialize
relation,usually found as the root node of every plan,has been suppressed.
6
2.4.XML Representation of Plans
add
(item99999001): <item1 item99991>
eqjoin
(iter = iter99991)
project
(iter:iter99991 item1:item99991 )
attach
(pos), val: <1>
attach
(item1), val: <3>
attach
(pos), val: <1>
attach
(item1), val: <2>
table
(iter) (<1>)
Figure 2.3:An algebraic plan,outputted by FerryLeaks
It is needed to tell Pathnder how the actual resulting table will look,for
example which columns are not needed and thus can be safely discarded.We
omitted it here because FerryLeaks automatically creates it,relieving the user
from this task.
Listing 2.4 shows the SQL:1999 statement created by Pathnder from the
example plan.Pathnder calculated the result of the addition at compile time,
thus relieving the DBMS of a trivial task (2 +3).
1       a0000.item3_int,a0000.iter1_nat
2     (      (1,1,5))   a0000(iter1_nat,
3 pos2_nat,
4 item3_int)
5        a0000.iter1_nat   ,a0000.pos2_nat   ;
Listing 2.4:SQL code generated by Pathnder from the example plan seen in
Figure 2.3.
The column names do not match the ones given in the example plan.This
is due to pathnders internal renaming techniques.For completeness,the
(trivial) resulting table can be seen in Figure 2.4.
2.4.XML Representation of Plans
Bundles of algebraic plans are presented as at XML les,in which all nodes
are listed bottom-up.Each node is represented by a <node> tag with attributes
7
2.Ferry and the Table Algebra
Result
item3
int
iter1
nat
5
1
Figure 2.4:The resulting table
id and kind.The former serves as a unique identier of the node,the latter
denes the node's kind.Children are dened as simple id references within
a special <edge> tag.Further attributes,like the join columns of an equijoin
operator,are specied within the <content> tag.See [4] for detailed informa-
tion on the XML format of the table algebra.A complete list of all algebraic
operators and their XML representations can be found in the Appendix A.
Listing 2.5 shows a very basic example of an XML plan.Some lines of code
have been suppressed for readability purposes.
1 <logical_query_plan>
2 <node id="0"kind="table">
3 <content>
4 <column new="true"name="iter"><value type="nat">1</value>
5 </column>
6 <column new="true"name="pos"><value type="nat">1</value>
7 </column>
8 <column new="true"name="item"><value type="nat">5</value>
9 </column>
10 </content>
11 </node>
12 <node id="1"kind="nil"/>
13 <node id="2"kind="serialize relation">
14 <content>
15 <column new="false"name="iter"function="iter"/>
16 <column new="false"name="pos"function="pos"/>
17 <column new="false"name="item1"function="item"/>
18 </content>
19 <edge to="1"/>
20 <edge to="0"/>
21 </node>
22 </logical_query_plan>
Listing 2.5:Abbr.XML representation of algebraic plans
8
3.Tools and Frameworks
A large number of frameworks and/or tool kits could have been used to create
an editor like FerryLeaks.Besides taking JavaScript as a basis,it would have
been possible to use a Flash approach or to even create a Java Applet.In
this chapter,we present the frameworks and libraries actually employed for
creating FerryLeaks:the Google Web Toolkit and Raphael.Alternatives to
these approaches are brie y discussed.
3.1.The Google Web Toolkit
1   (res < 1000) {
2//[...]
3 alert("It worked!(with res="+ res +")");
4//[...]
5 }
Listing 3.1:Typical JavaScript debugging
GWT carries the understanding of JavaScript as the web's assembler lan-
guage to a new level.Its Java-to-JavaScript compiler allows developers to
create complex web-applications with an IDE like Eclipse without bothering
about low-level technicalities.Thus,the scalability problems usually found
when working directly with JavaScript dissolve [13].It is possible to develop
an entire web application without having any knowledge of HTML,JavaScript
or CSS [13].The Java approach allows developers to use debugging and test-
ing possibilities not available with JavaScript (Listing 3.1).Some well known
applications like Google Mail and Google Maps have been developed using
GWT.The greatest strength of GWT is its easy-to-use implementation of the
AJAX-trac between client and web server.In a traditional JavaScript/DOM
approach using an XMLHttpRequest-Object,the developer has to take care of
requests on a very low level,specifying data formats,controlling the ow and
making HTTP-requests.GWT takes care of all of that.It all comes down to
a single interface the developer has to declare.On the server side,the servlet
9
3.Tools and Frameworks
will implement this interface,thus allowing the developer to directly call the
servlets methods from the client-side of the application.
In this chapter,a basic introduction to GWT is given.Since teaching GWT
can ll a whole book,we stick to the methods and approaches used for Ferry-
Leaks.Further techniques will be introduced on-the- y in Chapter 4,where
we have a precise look at FerryLeaks'implementation.
3.1.1.Principles of Web Applications
Browser
View
Web server
Server
View
Apache,Tomcat...
Back-end
Database,System...
Requests once
HTML + CSS
Figure 3.1:Architecture of a classic web application
Traditional web pages are requested once from a HTTP-server and rendered
in the client's browser.After that,they are static documents that cannot be
changed anymore (see Figure 3.1).Following this anatomy,a traditional web
application has to provide information to ll the browser window with before
the client's request is answered and the page is rendered.Therefore,the web
application is mostly based on the server.The browser window is nothing more
than a simple view which can be updated by means of clicking on links and
sending HTML-forms.Perhaps the most common approach to implement this
is to use PHP on the server.Other possibilities include Perl,Python,Ruby
on Rails or CGI.In all cases,requesting or sending new information leads to
a reload of the whole web page,resulting in heavy connection loads.
The use of AJAX techniques leads to an architecture similar to the one
depicted in Figure 3.2.Along with the HTML and CSS contents the JavaScript
is transferred to the client view.After that,the XMLHttpRequest object can
do as many calls to the server as it needs to,allowing content to be loaded
dynamically without refreshing the entire page.Note that the AJAX approach
naturally leads to more logic being deployed on the client side.This can reduce
server load,but it can also be a source of serious security problems.
10
3.1.The Google Web Toolkit
Browser
View
Web server
Server
Apache,
Back-end
Database,
Requests once
HTML + CSS + JS
XMLHttpRequest
(JavaScript)
System...
Tomcat...
Multiple calls
Figure 3.2:Architecture of an AJAX web application
3.1.2.A Note on the MVC Pattern
The Model-View-Controller pattern (MVC) is the de facto standard for sketch-
ing software systems and it has proved its worth.However,when thinking of
web applications,a subtle problem arises when trying to implement it.Figure
3.3 shows the traditional MVC pattern as it is taught at most computer science
departments.The problem is marked as
1
.
Changes to the model have to be propagated to the view.This is usually
done by either directly invoking the view's methods or by attaching listener
objects to the model.In both ways,there has to be some data ow from the
model to the view.As said before,the model is usually on the server side.
This makes sense because database back-ends etc.are also situated on the
server.But as discussed in Section 3.1.1,the HTTP connection only allows
data ow in a one-way fashion,meaning that the server cannot send anything
to the client on its own.This does not t with the MVC pattern.Instead,
when sketching web applications,another pattern is increasingly used:the
Model-View-Presenter pattern MVP.In MVP,model and view comply with
Model
View
Controller
1
Figure 3.3:The traditional Model-View-Controller pattern
their MVC counterparts,but do not communicate with each other anymore.
They are connected by an additional presenter.The presenter controls the
data ow between model and view and also holds the background logic for
lling the view.It responds to user events caught by the view,communicates
with the model and updates the view accordingly.Figure 3.4 shows the basic
concept of MVP.When implementing MVP,interfaces are in most cases used
11
3.Tools and Frameworks
Model
View
Presenter
Changes
Modifies
Updates
User events
Figure 3.4:The Model-View-Presenter pattern
to dene the methods oered by the model and the view.In terms of testing,
the view can thus be replaced by a test class implementing the view interface
and providing methods that expect a certain behaviour.This can be done for
example with JUnit [16].
3.1.3.Installing GWT
FerryLeaks was developed using Eclipse and the associated GWT plug-in,but
any other Java IDE can be used.Developing whole applications by hand and
compiling them with GWT's command line tools is also thinkable.However,
the common way to create GWT applications is with Eclipse,which is available
for most operating systems.Eclipse can be downloaded for free from the
Eclipse Foundation [5].
The GWT Eclipse plug-in can be downloaded from the GWT download
page [6].The plug-in already ships with the GWT SDKs,so nothing further
has to be installed.For users preferring development with only command line
tools,a download containing only the GWT SDK is also available.For GWT's
development mode,a browser plug-in is needed.
After starting Eclipse,select New!Web Application Project to create a new
GWT project.Use Google App Engine should be unchecked.Checking generate
GWT project sample code will create a ready to go\Hello World!"application
to experiment with.
12
3.1.The Google Web Toolkit
3.1.4.Building the User Interface
Building a user interface with GWT is similar to creating a GUI with Java's
AWT or Swing libraries.GWT oers a library of Widgets matching user
interface objects known from common desktop environments.A complete list
of all widgets can be found in the Widget Gallery [25].
Very recently,Google responded to web designers complains that building
the interface\the Java way"was not as intuitive as the old HTML way and
often resulted in many lines of code for a widget initialization.The UIBinder
let's designers create GUI template using a simple XML syntax with sev-
eral placeholders as known from various content management systems (Typo3,
Drupal,Joomla etc.).However,since FerryLeaks main view,the GraphCanvas
(Section 4.2.1) is designed to also work with standard Java libraries like Swing
and AWT,we stick to the classic Java approach.For further information on
UIBinder,see the GWT documentation [24].
Widgets can be attached to the application very much like with Java's AWT
or Swing (Figure 3.2).
1        TextArea conversationArea =    TextArea();
2        TextBox inputBox =    TextBox();
3        Button sendButton =    Button("send");
Listing 3.2:GUI initialization
These GUI elements are placed inside a template le that can be found in
war/WEB-INF.Listing 3.3 shows an example le.To add GUI elements to it,
RootPanel.get().add(Widget w) has to be called.An example user interface
built with GWT can be seen Figure 3.5.
1 <!           PUBLIC"-//W3C//DTD HTML 4.01 Transitional//EN"
2"http://www.w3.org/TR/html4/loose.dtd">
3 <    >
4 <    >
5 <          ="text/javascript"
6    ="gwtalk/gwtalk.nocache.js"></      >
7 </    >
8 <    >
9 <  >GWTalk</  >
10 </    >
11 </    >
Listing 3.3:A GWT template le
13
3.Tools and Frameworks
Figure 3.5:A simple user interface
Additionally,widgets can be combined to formnewstandalone GUI elements
by means of the Composite class [12].
GWT provides all widgets with standard CSS class names.Styling the view
can be done by adding a Cascading Style Sheet to the application.Custom
CSS classes can be added to a widget by calling widget.addStyleName().As
a design rule,there should be as few styling directives in the Java code as
possible.Instead,CSS classes should be used where appropriate.
3.1.5.Communicating With the Server
Accessing data on remote servers brings up a fundamental problem:results
are not available instantly.In web applications,the client (hence JavaScript)
has to wait for the server's response.This can either be done by means of
blocking or non-blocking calls.
Blocking methods are intuitively expected from any iterative code.State-
ments (method calls,variable assignments etc.) are executed from top to bot-
tom,forming a hierarchy of calls where every call waits until its predecessor
has nished execution (Figure 3.6).
Using this approach for calling remote servers leads to very unresponsive
applications,though.Very often,the whole thread freezes while waiting for a
server response [13].
Non-blocking methods do not wait until the execution has nished.Instead,
they make heavy use of callbacks which are executed after the method has
terminated (the basic functionality is depicted in Figure 3.6).This leads to
14
3.1.The Google Web Toolkit
Client
Server
Call
Return
Client
Server
Call
Callback
(a) Blocking (b) Non-blocking
Figure 3.6:Blocking and Non-blocking Calls
code dierent from the one used with blocking methods.A bit of rethinking
is required by the developer,who now cannot just iterate code but has to
consider the execution logic by himself.
Providing callbacks with Java is more dicult than with JavaScript,though.
Since Java does not support passing around function pointers,callbacks have
to be implemented by means of interfaces.In fact,this is the GWT approach,
called Remote Procedure Calls (RPC).
A key to understanding the power behind GWT's RPCs is to think of them
as cross-connection interfaces providing methods on the client side that can
be called nearly as if they were implemented in the client itself.Connection
handling is transparent,GWT takes care of it.Hence,implementing an RPC
usually comes down to four steps:
 The creation of a service interface extending RemoteService and pro-
viding the headers of the methods that should be oered.
 Preparing a corresponding interface suxed Async which automatically
provides non-blocking versions of the methods introduced within the class
RemoteService.Methods dened by this interface can only be called
with a valid callback specied.
 Adding a servlet class that extends RemoteServiceServlet and imple-
ments the RemoteService interface.
 The instantiation of a RemoteServiceAsync object using GWT.create().
That is all.Methods invoked on the RemoteServiceAsync object will now be
executed within the servlet and their results written to the callback methods,
which are subsequently executed.
15
3.Tools and Frameworks
A RemoteService class for a basic chat application would look like the
example given in Listing 3.4.These methods can be called from the client
1                ChatCommunicationService        RemoteService
{
2          registerUser();
3       List <ChatMessage > getMessages(   uid);
4           leave(   uid);
5           sendMessage(ChatMessage msg);
6 }
Listing 3.4:A RemoteService
along with a callback method declared within an instance of AsyncCallback.
Any exception thrown in the server-side service implementation will be given
to a special onFailure() method.
Chapter 5 will give some insight into the actual deployment of a GWT
application to a web server.For now,note that GWT is not restricted to being
used with Java based web-servers like Tomcat or Jetty.Using PHP or Ruby-
on-Rails is also possible,for example to re-use existing servlets with GWT.
Since GWT's server calls are JSON-based,virtually any server technology will
do.
3.1.6.Including Raw JavaScript and Libraries
As FerryLeaks makes heavy use of an already existing JavaScript library called
Raphael,we give a short introduction to what is called the JavaScript Native
Interface (JSNI).Quite simply,including raw JavaScript within a GWT ap-
plication works like shown in Listing 3.5.The keyword native tells the GWT
1                        sayHello()/* -{
2 alert("Hello!");
3 }-*/;
Listing 3.5:Including JavaScript using JSNI
compiler that raw JavaScript is following.The body itself has to be com-
mented out.It is especially possible to pass arguments to JavaScript methods
and receive return values that are mapped to\real"Java types wherever suit-
able.
16
3.1.The Google Web Toolkit
With JSNI,using external JavaScript libraries is a matter of including them
into the template HTML le.See the GWT documentation [24] to get more
information on how Java objects can be mapped to JavaScript and vice versa.
3.1.7.Limits of GWT and Alternatives
Besides GWT,other approaches to developing large web applications exist.
This section discusses several alternatives to using AJAX based techniques.
Flash Perhaps the most well-known browser plug-in is Adobe's Flash Player.
According to estimates,9599%of all web users have it installed [7],and
it is available for almost every operating system or browser.Originally
developed to yield animations better than the ones created with GIFs,it
is nowadays also used to create large applications using the SDK Adobe
Flex [8].
Flash's biggest disadvantage is the lack of company-independent stan-
dards.Flash has to compete on a market with similar plug-ins like Mi-
crosoft's Silverlight,and nobody can tell if it will be still there in a few
years.Besides,the plug-in-based distribution still leaves some browser
without Flash,and Flash sites will not run in a fresh browser installation.
Additionally,development tools for Flash are not freely available.
Silverlight The Microsoft version of Flash oers.NET technology to build web
applications.Like Flash,it is controlled by a single company instead of
a standards organization,which can lead to insecurities when thinking
of using it in long-term applications.Browser support is not that good,
with Silverlight not running fully in Opera [21].However,Silverlight has
reached a distribution of over 60% [7],making it a vital (and the main)
concurrency to Adobe Flash.
Java Applets Java applications can be embedded as an applet into a sandbox
environment running in the browser since the nineties.However,Ap-
plets soon gained a reputation of not looking that good,of having a bad
performance and of not integrating fully with the rest of the web site.
Additionally,a full Java Run-time Environment has a size of roughly
13MB [9],making it much bigger than the standard Flash or Silverlight
plug-ins.
A common disadvantage of the approaches mentioned above is the necessity to
install a browser extension.Hence,since the FerryLeaks project should oer
platform independent editing,they do not apply.We further feel that with
the rise of HTML5,distribution of Flash or Silverlight will drop signicantly.
17
3.Tools and Frameworks
We would then have developed an application for a framework nobody would
be using anymore.Hence the decision to use the AJAX approach GWT oers.
However,despite having chosen GWT to develop FerryLeaks with,we close
this section with a list of GWT's disadvantages.This list is basically taken
from [13].
Indexing Websites created by GWT are not indexable by search engines be-
cause of the dynamic way in which they are generated.A solution called
cloaking exists,meaning that search robots will only see a static version
of the page.
Compatibility There is no graceful degradation with GWT.Either the ap-
plication will run or it won't (in most cases because JavaScript is not
supported/enabled).
Behaviour The back button will behave dierently than known from classic
web pages.GWT oers an easy way to overcome this problem.However,
usability is still a bit dierent.
Weight Compilation of the code requires much more time then just editing a
few lines of JavaScript and refreshing the site would take.Above that,
changes to the server side code require a recompilation of the whole
application.This can take up to a few minutes even in development
mode.
3.2.The Raphael Library
One of the main features FerryLeaks has to oer is a dynamic graphical rep-
resentation of the query plan.Implementing this by means of static images
is not an option,since usability would not be very good.Instead,FerryLeaks
uses Raphael,a lightweight graphical library for JavaScript [10].
1 var circle = paper.circle(200,200,120);
2 circle.attr("fill","grey");
3 circle.attr("stroke","black");
4 circle.attr("stroke -width","5");
5 paper.path("M200,300 Q300,20 400,100 T520,100").attr("stroke -
width","10");
Listing 3.6:A Raphael snippet
18
3.2.The Raphael Library
3.2.1.Features
Despite being projected to be a web standard with HTML 5,SVG graphics
are not yet implemented in every browser in the same way.Some widely-
used browsers like Internet Explorer 8 even lack SVG support completely [22].
Raphael provides basic JavaScript methods for drawing vector graphics browser
independently,meaning that it automatically detects the browsers version and
chooses the optimal rendering technique.
With Raphael,a compatibility even with Internet Explorer 6 can be guar-
anteed by FerryLeaks.Listing 3.6 gives an example of how Raphael works.
The resulting vector graphic (Figure 3.7) can by changed dynamically using
the DOM model.
Figure 3.7:Vector graphic created with Raphael
3.2.2.Using Raphael with GWT
For embedding Raphael into GWT applications,FerryLeaks uses a library
called raphael-gwt.Raphael-gwt comes as either an overlay which just maps
Raphael's methods to GWT using the JSNI introduced in Section 3.1.6 or
as a sophisticated Raphael class extending GWT's widget class.FerryLeaks
primarily uses the Raphael class,but since raphael-gwt is still under devel-
opment,a few changes have to be made to the code and direct JavaScript
invocations cannot be avoided.See Section 4.2.1 for more details on this.
19
4.All Aboard the bugFerry!
The motivation for developing an editor like FerryLeaks is to overcome the
debugging problems described in Chapter 1.The editor has to oer a good-
looking graphical interface for adding,modifying,deleting and merging exist-
ing plans in a drag-and-drop way.In addition,it should have the ability to
serve as a demonstration tool to teach the basics of the table algebra.To be
more specic,the editor has to be able to handle
 XML les containing algebraic plans
 any bundle of algebraic plans
 direct user input by means of Create new node/Create new edge buttons
This input has to be checked for validness.Erroneous nodes have to be high-
lighted in some way.Additionally,some description of the error should be
given.After loading an existing plan or while creating new plans from scratch,
the user should be able to
 add and delete logical nodes along with their incident edges
 add and delete edges
 merge two plans,for example by means of copying plan P
1
onto plan P
2
's
canvas and creating an edge between
Some output has to be given,too.The user should have the possibility to
download XML les.Since Pathnder can compile any valid input to SQL,
it should also be possible to get ready-to-go queries or even to evaluate plans
against a back-end database.
In this chapter,FerryLeaks'features are presented in the same order as the
implementation will later be discussed.First,we present FerryLeaks'work-
bench,the GraphCanvas.Then,after explaining how plans are loaded onto
the canvas,we discuss the various possibilities to manipulate query plans,how
to merge them and how to create new plans from scratch.Finally,we have a
look at a special feature called SQL listeners oering live evaluation and close
with a presentation of FerryLeaks'output formats.
21
4.All Aboard the bugFerry!
4.1.Using FerryLeaks
FerryLeaks was designed to resemble editors known from elds like diagram
creation or image manipulation,for example Inkscape [17],Dia [18] or the (also
web-based) Aviary Tools [19].It can be started from any modern browser,in-
cluding Firefox,Safari,Internet Explorer,Chrome and Opera.Usage with
Internet Explorer 6 is possible,but not recommended,as several features are
only provided in a fall-back manner.Secondary,starting FerryLeaks from a
mobile browser will not work on every system,mainly because many mobile
browsers cannot handle vector graphics yet.FerryLeaks falls back to a simpler
graphical representation of the plan by need and if possible,using for example
a completely dierent approach for node drawing when started from old Inter-
net Explorers.However,we recommend using it with an up-to-date browser.
FerryLeaks was primarily tested on Firefox 3,4 and 5,Chrome,Safari and
Opera.
4.1.1.The Workbench
Typing in FerryLeaks'URL will bring up the main window,the workbench
(Figure 4.1).An empty plan is loaded by default onto the GraphCanvas
1
.
On the left,the edit panel
2
provides basic tools such as adding a new node,
drawing edges,deleting elements,editing nodes,getting the plan's SQL or
XML representation as well as a play button for evaluating the whole plan.At
the top of the workbench sits the tab bar
5
which can be used to edit multiple
plans at once.Clicking on the + button adds a new empty plan to the bundle.
The tabs also oer a context menu and can be closed by either using the menu
or by middle-clicking on them.Loaded plans will be kept in the server session,
so closing a browser window accidentally doesn't mean that the whole work is
lost.Reloading FerryLeaks will prompt a notice that previous plans have been
found on the server.Session handling will be further discussed in Section 4.2.6.
Hidden in Figure 4.1 are the sorting menu
3
and the I/O menu
4
.They
can be opened by clicking on the related menu headers.The sort menu holds
methods to sort the graph.As we will see in the next section,additional sort-
ing methods can be added easily by providing classes extending RemoteSorter.
Standard sorting methods include a tree-like sorter using an external dot in-
stallation,a circle sorter and a simple in-line sorter as a fallback when dot is
not available on the server.
22
4.1.Using FerryLeaks
2
5
3
4
1
Figure 4.1:FerryLeaks'workbench
4.1.2.Loading Plan Bundles
After opening the I/Omenu,plans or plan bundles can be uploaded by clicking
on the Upload XML plan button
6
.An upload dialog appears where the
XML le to be edited can be selected.The plan is uploaded automatically
and subsequently sorted.If the XML le contained a bundle of query plans,
each plan will be loaded into an own tab as seen in Figure 4.2.A graphical
representation of the plan appears on the GraphCanvas.It is possible to
drag around nodes and sort them again via the Sort menu.Hovering over a
node makes a pop-up appear containing information on the nodes columns,its
children and its contents.Node labels resemble the ones used in the standard
dot output of Pathnder.Take for example the eqjoin node
7
from Figure
4.2.The line iter = iter99991 simply means that this equijoin is on columns
iter and iter99991.Column renaming (e.g.in projection nodes) is usually
indicated by colons.
Uploaded plans are validated instantly.If a plan contains errors,a red
number
8
denoting the error count is displayed in the plan tab.
23
4.All Aboard the bugFerry!
6
8
7
Figure 4.2:FerryLeaks with a loaded plan bundle
4.1.3.Editing Plans
Query plans can be edited either by means of
Figure 4.3:Context menu
the node context menu or by using the buttons
on the left of the workbench.Keyboard inter-
action is not yet supported (see Section 6).
Every change to a node results in an imme-
diate re-validation of the whole query plan.
Errors are displayed instantly.A red excla-
mation mark tells which nodes exactly are er-
roneous and hovering them gives a detailed
description of what is wrong.Among others,
validation checks for the existence of refer-
enced columns and for grammar correctness.
A detailed introduction to all validators and
a description how to add custom ones is given in Section 4.2.5.Since every
context menu entry has its own related toolbar button,we only introduce the
24
4.1.Using FerryLeaks
menu items here.We omit the menu entries for evaluation,since they are
further discussed in Section 4.1.5.
Delete removes either the associated node or edge from the plan.Removing
a node results in all incident edges being automatically deleted.
Select has the some functionality as clicking on a node and is available only
for fallback reasons
Select subtree selects the associated node and all of its children.Double-
clicking on a node is equivalent.
Copy oers a basic copy & paste feature.All selected nodes are saved on the
server-side and can later be pasted into any other plan.
Edit brings up a new window,the node edit panel (Figure 4.4),where all con-
tents specied in the corresponding node schema les can be congured
(see Section 4.2.3).Fields that have to be lled with an existing column
are displayed as a drop-down box
9
containing all columns accessible
from this node.In this case (Figure 4.4),only the columns in the right
child are shown.
It is possible to directly edit the XML source code of a node.To do so,click
on the Source tab in the node edit panel.A text area similar to Figure 4.6
will be shown,allowing live changes to the XML source.Changes to the node
kind or to the id will be ignored.Saving the edit panel sends the new source
code to the server where it is parsed and returned to the client as described in
Section 4.15.
To demonstrate FerryLeaks'validation
Figure 4.5:An erroneous node
techniques we could for example change
the name of a column introduced by an
attach node.All nodes referring to his
column are subsequently marked as erro-
neous (Figure 4.5).An exact description
of the error is given,namely that a non
existing column is referred.If a node has
multiple children,the message tells in which node the column was expected,
thus providing enough information to x the bug.We close this chapter with a
note that FerryLeaks is able to parse almost any algebraic plan,no matter how
erroneous it may be.No grammar checking is done during parsing.If a node
does not hold the information it is expected to,FerryLeaks will add an error
complaining that it misses some content.If a node has information it should
not have,FerryLeaks ignores the additional content.The only way to actually
break the parsing is to upload an erroneous XML le.Since FerryLeaks uses
25
4.All Aboard the bugFerry!
9
Figure 4.4:The node edit panel
standard Java methods to parse XML,an error message pops up telling the
user exactly what went wrong while parsing the uploaded le.
4.1.4.Creating New Plans from Scratch
There are two last things missing to complete the list of editing methods oered
by FerryLeaks:the adding of nodes and edges.
After clicking on the Add node button described in Section 4.1.1,a dialog
window pops up asking for the kind of the node that should be added.After
selecting the node type,clicking anywhere on the canvas will make a new node
appear at this position.
Most of the nodes require subsequent content lling.An attach node is of
no use with no columns provided.Filling a raw node with data is done exactly
like editing one.
Adding edges,however,is a bit more complex.Since most nodes of the table
algebra either expect one or two children,the editor has to somehow make sure
26
4.1.Using FerryLeaks
Figure 4.6:Editing a node's source
that outgoing edges can only be added to nodes with vacant child positions.
In FerryLeaks,this is done by using edge connectors to which new edges can
be attached.Note that edge arrows in FerryLeaks are drawn according to the
data ow,not the node relation.This means that when drawing new edges,the
arrow points to the edges origin instead of to the target node.See Figure 4.7
for an example of this.
Figure 4.7:Drawing an edge
4.1.5.Evaluation and SQL Listeners
Query plans can be conveniently evaluated against a back-end-database.Cur-
rently,PostgreSQL is used as the default,but any DBMS supporting the ANSI-
SQL 92 standard will do.FerryLeaks allows the evaluation of either a whole
27
4.All Aboard the bugFerry!
query plan or of single nodes and their sub-graphs.The former can be done
by either clicking on the Play button in the left toolbar or by using the tab's
context menu.For the latter,the node menu depicted on Figure 4.3 can be
used.In both cases a dialog window resembling Figure 4.8 will pop up ask-
ing to give some information on how the evaluation should take place.This
dialog resembles the one for SQL generation except that an additional tab for
database conguration is accessible.As described earlier,FerryLeaks hides the
Figure 4.8:The evaluation dialog
serialize relation node be default.Hence,specifying which columns should be
used as item,sort and iteration columns has to take place in the evaluation
dialog.An evaluation context will be created which can be saved on the server
by checking Save for current node.Item,sort and iteration can be selected just
like they would be in the serialize relation node by means of selecting the ap-
propriate columns in the corresponding drop down menu.As an extra feature,
FerryLeaks has the ability to generate these columns on its own.Iteration can
be specied by simply stating an integer.FerryLeaks will add an attach node
containing this value to the plan and tell Pathnder to use it as the iteration
value.The position values can also be generated automatically by selecting a
sort order and the column on which the sorting should take place.FerryLeaks
will hang a rownum node to the top of the query plan,thus telling Pathnder
how to sort the evaluation result.
28
4.1.Using FerryLeaks
Finally,the Database panel has to be lled out to specify the back-end
database.These settings can either be saved per plan,per node or session-
wide for later use.
As a special feature,SQL listeners can be hung onto query plan nodes by
selecting Add SQL listener from the node context menu.The dialog window
resembles the evaluation dialog.After lling it out,a widget is added to
the node containing the live database result after this node.SQL listeners
are updated automatically each time the query plan changes and can also be
refreshed manually.Listeners can be dragged along with their parent nodes
and will be deleted with them.Figure 4.9 shows a query plan with multiple
listeners attached.
Figure 4.9:SQL listeners in action
4.1.6.Output Formats
The I/O menu contains the main output method:downloading the whole plan
bundle as a single XML le.This is done by clicking on the Download plan
29
4.All Aboard the bugFerry!
bundle button.It is also possible to download single plans by right-clicking on
the corresponding tab item and selecting Download.
Other output formats include the generation of an XML representation of
a single node,either by selecting Get XML from the node menu or by opening
the edit window and navigating to the source edit panel.
SQL generated by a back-end Pathnder installation can be downloaded
by selecting Get SQL from the node context menu.A pop-up will appear
containing the compiled SQL:1999 code.
1       1   iter7_nat,(a0000.item3_int - a0000.item4_int)  
item6_int
2     (      (2,5))   a0000(item3_int,
3 item4_int);
Listing 4.1:SQL compiled by Pathnder and outputted by FerryLeaks
It is also possible to get a graphical SVG representation by clicking on Get
SVG of current plan in the I/O menu.This SVG code can be embedded into
an HTML page or can be used as a standalone image le.
4.2.Implementation
FerryLeaks'implementation follows the MVP pattern introduced in Chap-
ter 3.1.2.But in contrast to MVP,several communication interfaces are not
part of the main presenter class as FerryLeak's main view,the GraphCan-
vas,was designed to work as a standalone widget.Additionally,there is no
monolithic model class on the server-side.Instead,FerryLeaks uses multiple
services for uploading les,manipulating the query plan,handling sessions
and outputting.All servlets are connected by means of session and context
mechanisms.There is no meta-class holding everything together.
Figure 4.10 shows a simplied class diagram of the editor.This chapter
gives an introduction to each class depicted.First,the GraphCanvas and its
FerryLeaks extension,the LogicalCanvas are discussed.Chapter 4.2.2 gives an
overviewof howRemote Procedure Calls are used to ll the canvas with content
from a remote server.After that,the parsing techniques of FerryLeaks are
presented.This includes the editor's schema description language (somewhat
similar to the XML Schema language (XSD) [28]) which is used to specify the
grammar of the table algebra.An introduction on how plans are manipulated
on the server and written back to the client is given in Chapter 4.2.4.We
30
4.2.Implementation
View
LogicalCanvas
<<interface>>
AlgebraEditorView
GraphCanvas
(provides basic
node drawing
methods)
ControlPanel
AlgebraEditor
Presenter
Raphael
PlanModelManipulator
RemoteFiller
RemoteSorter
Model
invokes
PlanCommunicationServlet
QueryPlan
Session
calls
updates
callbacks
Figure 4.10:Simplied class diagram of FerryLeaks.The dashed line depicts
the separation of client and server.
have a quick look on how FerryLeaks validates query plans and how custom
validators can be added to the editor by future developers in Section 4.2.5.At
the end of this chapter,the session object holding everything together will be
brie y discussed.
4.2.1.The GraphCanvas
At the heart of FerryLeaks'GUI lies the LogicalCanvas class,an extension of
the GraphCanvas (see Fig.4.10).The LogicalCanvas introduces some table
algebra specic methods such as attaching SQL listeners to nodes or entering
node/edge adding mode.Furthermore,it works as a wrapper conguring the
underlying GraphCanvas the way FerryLeaks requires it to be.In this section,
we discuss the abilities of the GraphCanvas both as the core of FerryLeaks
GUI and as a standalone library for drawing graphs.
The GraphCanvas provides a fully congurable GWT widget for drawing,
editing and sorting graphs of all kinds.It was developed primarily for Ferry-
Leaks.Other GWT libraries such as gwt-graph only oer basic drawing
functionality [11] but lack GraphCanvas'built-in methods for node dragging,
adding selection handlers or providing interfaces for sorting classes (see Sec-
tion 4.2.2).The GraphCanvas was built before principal development of the
FerryLeaks editor started and was improved along the way,making it a solid
library for FerryLeaks to depend on.It has been successfully tested on a large
number of browsers,including Firefox 1,Internet Explorer 6 and Konqueror.
31
4.All Aboard the bugFerry!
In particular,GraphCanvas'features include
 displaying directed and undirected graphs
 special display modes such as"tree",which causes new edges to appear
in a tree-like way
 methods to add new nodes and edges to the canvas
 sorting the graph by means of implementing the GraphSorter class
 optional animations
 drag & drop functionality
 attaching either SVG elements or GWT widgets to a node.These wid-
gets are moved,animated and removed together with their parent node.
(Fig.4.9)
 own methods for checking onMouseOver,onMouseClick and onMouseOut
on nodes
 optional ability to work with an instance of FullScreenDragPanel
 optional ability to use an external dot installation for graph sorting with
an instance of the DotSorter class
 node styling via Cascading Style Sheets
The GraphCanvas class extends RaphaelJS'standard Raphael class for browser-
independent vector drawing (Chapter 3.2).Note that the GraphCanvas only
uses basic vector drawing techniques like drawing boxes or paths.It can be
mapped to virtually any other vector library.Raphael was used because of the
advantages discussed in Chapter 3.2.Furthermore,GraphCanvas is not re-
stricted to be used with web applications.It could also be used with standard
Java libraries such as AWT or Swing.
GraphCanvas holds GraphNode and GraphEdge objects as an internal rep-
resentation of the graph (Figure 4.11) but can be used without having any
1 GraphCanvas c =    GraphCanvas(300,200);
2 c.addNode(1,0xFFFFFF,30,50,100,40,"hello");
3 c.addNode(2,0xFAFAFA,180,50,100,40,"world!");
4 c.createEdge(1,2);
Listing 4.2:Usage of GraphCanvas
knowledge about them.An example of GraphCanvas'usage is given in List-
ing 4.2.
As a GWT library for drawing graphs GraphCanvas is unprecedented in its
functional range.We plan on making it into an own open source project and
committing it to the GWT community.
32
4.2.Implementation
Raphael
Provides drawing
of vector
graphics
GraphCanvas
+width: int
+height: int
...
addNode()
createEdge()
+layout()
+animateTo()
+moveTo()...
<<interface>>
Fillable
+addNode()
+createEdge()+clear()
Widget
GraphNode
+id
+caption+color
+...
GraphEdge
+from+to
...
GraphEdgeModifier
GraphNodeModifier
Figure 4.11:(Incomplete) class diagram of the GraphCanvas
4.2.2.Filling and Sorting
For loading and sorting graphs,FerryLeaks uses RemoteFiller and Remote-
Sorter classes implementing GraphCanvas'GraphFiller and GraphSorter
interfaces.RemoteFiller acts as a remote service which provides basic meth-
ods to request bulks of semantic-free RawNode and RawEdge objects.The prin-
cipal data- ow is depicted in Figure 4.12.After uploading an XML le on the
GraphCanvas
UploadDialog
Parser
RemoteSorter
example.xml
HTTP
upload
RemoteFiller
Server
Client
RawNode / RawEdge
Figure 4.12:Data ow for lling and sorting the canvas
basis of traditional HTML methods,the input is parsed (see Chapter 4.2.3)
and saved into the server's session object.Since the view does not need to
know any internals of the model it is editing,the plan is rst translated to
a basic RawEdge/RawNode representation and subsequently sent to the client,
where the nodes are drawn onto the canvas.A RawNode object only contains
information on how to display the respective node (e.g.labels,colors,border
thickness) as well as the node's id.RawEdge only holds data regarding target
and source nodes.
33
4.All Aboard the bugFerry!
Data ow is minimized this way.As a general design rule,the FerryLeaks
client never holds more than exactly one node actually containing semantic
content of the table algebra.Semantics are only transferred for editing pur-
poses after opening the Edit Node dialog (Fig.4.4).Chapter 4.2.4 sheds more
light on how the editor keeps synchronicity between the client view and the
server-side model.
For layouting the graph,FerryLeaks has built-in sorters for circle and in-line
sorting,both implementing the GraphSorter interface.The default DotSorter
class,however,implements a more sophisticated layouting mechanism by us-
ing an external dot [14] installation (Figure 4.13).The sorting service ac-
cepts RawEdge and RawNode objects as an input and transforms them into
the standard GraphViz [14] format accepted by dot.The sorted graph is re-
dot
RemoteSorter
DotSorter
GraphCanvas
1
2
3
4
5
6
FerryLeaks
System
Figure 4.13:Stages of dot sorting
transformed into RawEdge and RawNode objects and sent back to the client,
which updates the node positions accordingly.Since dot is a carefully engi-
neered tool and its sorted (XML) output parsed with standard Java libraries,
this approach can be considered very stable.
Again,note that all sorting is done completely without any knowledge of
the actual node semantics.
4.2.3.Parsing Node Plans Using Schema Files
One of the main features FerryLeaks oers is a dynamic and easy to use way to
add new node kinds.This becomes essential as Ferry,Pathnder or the table
algebra are further developed.Hard-coding the dierent types of nodes by
means of providing a single class (extending some meta-PlanNode class) would
have required re-compiling and re-deploying the whole application every time
a new node kind was added.Additionally,since every node has to be parsed in
a similar way,even minor changes to a single node's XML syntax would have
meant rewriting large parts of the parser.
Instead,FerryLeaks uses its own description language to map the table al-
gebra's grammar.The language itself is based on XML.Its elements hold
34
4.2.Implementation
information on how to parse a certain XML le along with some meta at-
tributes telling FerryLeaks how the node should be rendered and which editing
tools should be displayed.FerryLeaks'node schema language resembles the
standard XML schema language XSD [28].XSD is not used for reasons of
proportionality.Furthermore,XSD was designed as a descriptor for XML val-
idation rather than as a language describing how to parse it.Listing 4.3 shows
1 <nodeschema kind="not">
2 <properties>
3 <property name="color">0xc0c0c0</property>
4 <property name="label_schema">[_kind]</property>
5 </properties>
6 <schema>
7 <gointo xmlob="content"howoften="1">
8 <val xmlob="column"name="rescol"howoften="1">
9 <fields>
10 <field type="__COLUMN_NEW">name</field>
11 <field type="boolean"must_be="true">new</field>
12 </fields>
13 </val>
14 <val xmlob="column"name="column"howoften="1">
15 <fields>
16 <field type="__COLUMN">name</field>
17 <field type="boolean"must_be="false">new</field>
18 </fields>
19 </val>
20 </gointo>
21 <val xmlob="edge"name="edgeto"howoften="1">
22 <fields>
23 <field type="__NID">to</field>
24 </fields>
25 </val>
26 </schema>
27 </nodeschema>
Listing 4.3:XML schema of a not node
an example of the schema language.The code has been abridged for read-
ability purposes.Some attributes containing information on how the node's
labels should be lled have been suppressed.The properties tag can contain
general information such as how the node should be displayed (its color,its
label schema) or how it behaves in terms of column referencing.The actual
node schema can be found within the <schema> tag.FerryLeaks allows four
descriptive elements within <scheme>.
GoInto the <gointo> tag will parse an XML element to an internal wrapper
object accumulating other content objects.It can hold neither values
35
4.All Aboard the bugFerry!
nor attributes.If attributes are specied,they will be ignored by the
parser.
Val <val> tags describe XML elements containing either values or attributes
(or both).The elements will be parsed to an internal object holding every
eld specied within the <fields> tag.If an XML object holds an actual
value,the parser will only save it into the internal representation if the
has
val attribute is set to true.
Fields aggregates <field> elements and can only be used within <val> ob-
jects.
Field <field> tags specify exactly which attributes are expected to be found
in the surrounding XML object.Special attributes like must
be or
can
be exist.
The gointo and val elements have to hold an attribute howoften.This value
tells the parser how often the described element is expected to be found in
the XML node currently processed.Possible values resemble the quantiers
known from regular expressions.For example,howoften="?"expects either
one or none occurrence,howoften="f3,6g"expects a minimum of three and a
maximum of six occurrences,howoften="1"only allows one of these elements
to be found and so on.
Listing 4.4 shows the content of a standard not node in the XML represen-
tation.This node could be parsed and validated using the schema shown in
Listing 4.3.We especially point out that both of the columns introduced by
the not node are described by the schema le as being of type
COLUMN
NEW
and
COLUMN respectively.These special types tell FerryLeaks that either
a new column is being introduced here or that an old one is being referred
to.In the former case,the validator will check whether a column with that
name already exists.In the latter case,FerryLeaks will only display referable
columns as possible values when entering the edit mode.The validator also
checks whether referenced columns exist.For FerryLeaks,the table algebra
1 <content>
2 <column name="COLNAME"new="true"/>
3 <column name="COLNAME"new="false"/>
4 </content>
Listing 4.4:XML representation of a not node
grammar [4] has been completely translated to schema les.Each le can hold
several node schemes.The parser takes these les and processes the uploaded
le accordingly.As depicted in Figure 4.14,query plans are then saved as
36
4.2.Implementation
<query_plan_bundle>
<query_plan id="0">
<logical_query_plan unique_names="true">
<node id="1" kind="attach">
<content>
<column name="item1" new="true">
<value type="int">1</value>
</column>
</content>
<edge to="0"/>
</node>
<node id="0" kind="table">
<content>
<column name="iter" new="true">
<value type="nat">1</value>
</column>
node schemes
PlanNode
PlanNode
PlanNode
PlanNode
PlanNode
PlanNode
Figure 4.14:Parsing XML input
linked PlanNode objects,all lled with the information specied in the scheme
le.
Node labelling is also done with the help of schema les.<Gointo> and
<val> tags can hold an attribute name
to
print which can be lled with place-
holders referring to child elements.For example,[col
name] will be replaced
with the value of the eld named col
name,fright
colg will be replaced by
the (evaluated) name
to
print attribute of the element parsed to the internal
name right
col.The same placeholders can be used within the label
schema
property of a scheme le.Special placeholders contain [
kind],which will be
evaluated to the node's kind string and [
val],which will be replaced by the
value of the underlying XML object.
The internal representation of the schema les is also used to generate Fer-
ryLeak's output.The XMLBuilder class adds XML elements in the same order
as they are specied in the node scheme le.For writing XML documents,
FerryLeaks uses the JDOM library [26].
We close this section with a remark on FerryLeaks're-usability.Since the
GraphCanvas can be congured to display all kinds of graphs as described
in Section 4.2.1,FerryLeaks could indeed be used to edit virtually any input
graph as long as it is provided in an XML format.FerryLeaks is not restricted
to being used with the table algebra.The schema language introduced in this
section forms a tool to describe and parse XML based graphs.For example,
one could easily think of schema les describing the elementary algebra.
4.2.4.Manipulation
As discussed in Chapter 4.2.2,the FerryLeaks client does not hold any seman-
tics regarding the query plan.The model resides on the server as shown in
Figure 4.10.To maximize synchronicity between the client view and the server
model,FerryLeaks never anticipates changes to the underlying plan when sub-
37
4.All Aboard the bugFerry!
mitting edit requests.This means that sending requests for plan manipulation
and the actual redrawing of the view are completely separated.After submit-
LogicalCanvas
PlanManipulator
ManipulationService
invokes
edit
updates
ManipulationMessage
+nodesAffected = (1,2)
+returnCode = 1 (OK)
+action = "delete"
PlanNode
PlanNode
PlanNode
PlanNode
PlanNode
PlanNode
requests
manipulation
Server
Client
Figure 4.15:Data ow during plan manipulation
ting an edit request,the ManipulationService tries to execute it on the query
plan.If everything went well,the servlet returns a ManipulationMessage con-
taining the aected nodes,a return code and an action string.The action string
can be one of delete,update or add.The PlanModelManipulator updates
the editor's view accordingly.If an error occurred,the return code will be set
to a number indicating what went wrong.
All editing methods oered by the PlanManipulator (including adding and
deletion of nodes or edges,updating node contents etc.) are thus being trans-
lated into three basic keywords for updating the view.It is therefore easily
possible to add custom editing methods to the manipulator.Other Ferry-
Leaks features that use the RemoteManipulationMessage class include the
node source code editor or the copy/paste functionality.As described in the
next section,the PlanModelManipulator will automatically validate the query
plan after each manipulation.
4.2.5.Validation
Among others,FerryLeaks oers validation of grammar correctness,node ref-
erencing,column referencing or cycle detection.In this section,we discuss how
these validators are invoked and how custom validators can be added to the
editor.We will especially look at the validator for column referencing,since it
can be congured sophisticatedly through the node scheme les introduced in
Section 4.2.3.
All validators are managed by a central ValidationMachine instance.The
validation machine is called by the manipulation servlet each time the query
38
4.2.Implementation
plan model is changed.The dierent validators are called iteratively.Each
of them adds ValidationError objects to an instance of ValidationResult
hold by the validation machine.The ValidationError class holds the id of
the erroneous node along with a description string.The ValidationResult
class holds the id of the query plan as well as a List of validation errors and is
sent to the client view.The RemoteManipulationMessage class introduced in
the previous section can also hold an optional instance of ValidationResult,
thus providing instant validation results without invoking another servlet call.
Instances of ValidationResult are displayed by the client view as seen in
Figure 4.5.
Validators have to implement the Validator interface which oers a single
validate() method expecting a list of PlanNodes to be validated,a con-
text plan and an instance of ValidationResult to write validation errors to.
Custom validators can be added by means of implementing the Validator in-
terface.An instance of the validator has to be given to the validation machine
via the addValidator() method.We point out that the validators are called
in the order as they are attached to the validation machine.Additionally,we
note that validators can also be used to translate errors already thrown by
other validators.The ValidationResult class denes methods to get and
change the ValidationErrors it holds.
The standard ReferencedColumnsIterator uses information specied in
the node schema les to tell which columns are referable from a specic node.
In particular,the validator processes a node's subtree recursively,searching for
elds with either type
COLUMN
NEW,
COLUMN,
COLUMNfig or
COLUMN
REM.
Fields typed as
COLUMN
NEW add a new referable column.Type
COLUMN
tells the validator that the eld's value has to be a referable column.An
extension to this type is the
COLUMNfig type,which expects the eld value to
be a referable column within the child at position i.The
COLUMN
REM types
remove their eld value from the list of accessible columns.A special node
property reset
columns exists.If set to true,the corresponding node will
only propagate column names labelled as
COLUMN
NEW to its parent nodes.All
other columns will be discarded.This is for example needed for projection
nodes.
A list of referable columns is assembled this way for every node by the
ReferencedColumnsValidator.If the node refers to any columns not appear-
ing in this list,an error is added to the ValidationResult.
We explicitly point out that the eld types introduced above are not part
of the schema description language.The\
"-types are handled by the parser
in exactly the same way as any other data type (string,boolean etc.)
39
4.All Aboard the bugFerry!
For completeness,we note that FerryLeaks uses another special data type
for validation purposes:
NID- typed elds have to provide an integer which
refers to an existing node id.This is for example used to validate a node's
children.
4.2.6.Session Handling
We close this chapter with a description of how the dierent servlets for manip-
ulating,validating,parsing,uploading and downloading plans are connected.
As described in the beginning of this chapter,FerryLeaks'server-side imple-
mentation has no single meta-class that manages data- ow or sessions.Stan-
dard Java-servlet containers like Apache Tomcat or Jetty set up a servlet
environment consisting of a single context object and multiple session ob-
jects [23,20].By storing run-time-data into them,servlet instances do not
have to remain in the main memory.The server can discard them at will,
instantiating them again if needed.In addition,context and sessions are saved
to the persistent memory on a server stop and can be reloaded on a restart.
Usually,there is one context per web application.The context holds infor-
mation needed in general for the execution of the application.Among other
things,FerryLeaks stores the already parsed schema les and the main con-
guration in its servlet context (Figure 4.16).Sessions are created per user
ManipulationServlet
RemoteSorter
RegistrationServlet
RemoteFiller
Session
Session
Session
Session
Context
Server Environment
...
Configuration
Schemas
...
Figure 4.16:Session and context objects
and resemble the session objects known from PHP or Ruby-on-Rails.Their
assignment to specic users is handled automatically by the server (usually
by means of cookies).FerryLeaks mainly stores the internal representation of
the query plan bundle currently edited to the session object.Additionally,the
copy/paste functionality is also implemented via sessions.
40
4.2.Implementation
As the query plans are stored on the server in their current state,Ferry-
Leaks can reload them after a user accidentally closes the browser window (as
described in Section 4.1.1).
The next chapter will give an introduction on how to install a compiled
version of FerryLeaks to an Apache Tomcat environment.However,we note
that FerryLeaks is not restricted to being used with Tomcat.For accessing
the session and context objects described in this chapter,FerryLeaks uses the
standard javax.servlet package [23].Its servlets could thus be installed on
any servlet container implementing javax.servlet,for example Jetty [29] or
Resin [30].As a matter of fact,GWT's Eclipse plug-in uses Jetty as a default
in environment mode.
41
5.Deploying to a Server
After having seen how FerryLeaks is used and how it is implemented,we now
describe how to make the editor available over a network.This chapter serves
as a manual on how to compile FerryLeaks and on how to host it on a live web-
server.It also gives an account of the experiences made with web application
hosting on Tomcat.We describe the steps necessary for creating distributable
Web Archive Files (WARs),give some basic remarks on how to install Tomcat
and close with instructions on how to upload WARs to it.After reading this
chapter,developers and administrators should be able to install a running copy
of FerryLeaks onto their web-server.
5.1.Creating a Web Archive File
Because of the separated nature of web applications,compiling a GWT project
usually requires two actual compilations.The servlet code has to be trans-
formed into Java byte-code,whereas the client code has to be given as raw
JavaScript.The latter is done using a traditional Java compiler (javac),the
former is generated by the GWT cross-compiler [24].
5.1.1.Compiling
With GWT,there are several ways to cross-compile a Java class to equivalent
JavaScript.The following list gives a brief overview of the dierent manners
the JavaScript can be created in.This list is taken from[24].We point out that
the GWT SDK has to be installed for a successful compilation (Section 3.1.3).
 As the Java-to-JavaScript compiler is written completely in Java,it can
be started by running its main class com.google.gwt.dev.Compiler
using java.
 Projects created with Eclipse or with the webAppCreator script can be
compiled by running the build.xml.It is generated automatically.How-
ever,we point out that this does not create a distributable WAR le as
described in the Section 5.1.2.
43
5.Deploying to a Server
 If the Eclipse Plug-in is installed,the project can be compiled by clicking
the Compile button.This approach also does not create a standalone
WAR le.
For testing purposes,the compilation should be started either from Eclipse
or by using ant.Output les will be written to the/war folder by default.
This can be changed by adjusting the -war ag.Additional options include
the -style ag which can be used to dene the compilers script output style.
This is especially needed if the compiled JavaScript is planned to be distributed
to other developers.By default,gwtc outputs obfuscated JavaScript to both
protect the intellectual property of the developer,but also to reduce the down-
load size of the script (Listing 5.2).To obtain readable code,the -style ag
has to be set to either PRETTY (Listing 5.1) or DETAILED.
1 function CheckBox_0(){
2 var e;
3 $clinit_1();
4 CheckBox_1.call(   ,(e = $doc.createElement('INPUT'),
5 e.type ='checkbox',e.value='on',e));
6    .element['className'] ='gwt -CheckBox';
7 }
Listing 5.1:PRETTY output of the GWT compiler
1 function zfb(){var b;ag();Afb.call(   ,(b=$doc.createElement(
xLb),b.type='checkbox',b.value=bMb,b));   .bb[nHb]='gwt -
CheckBox'}
Listing 5.2:OBFUSCATED output of the GWT compiler
For matters of convenience,FerryLeaks ships with a modied version of the
build.xml which supports the creation of a ready-to-go Web Archive File,the
structure of which is discussed in the next section.The WAR le can be created
by running
ant war
from within the project directory.If the project should only be compiled,this
can be done by typing
ant build
In both cases,the build.properties.example le has to be renamed to
build.properties.Moreover,the property gwt.sdk has to be set to the
path of the GWT SDK.Flags can also be set within the property le by
setting gwt.args.
44
5.2.Hosting FerryLeaks with Apache Tomcat
5.1.2.Structure of a Web Archive File
Web Archive les are special JAR les used for the distribution of web appli-
cations [27].In addition to the standard les and directories usually found in
JAR les,WARs include a folder called/WEB-INF which contains the web.xml
le.This le is used to dene the structure of the web application.In partic-
ular,the web.xml contains information on how servlets are mapped to URL
requests.Neither the standard GWT compiler nor the build.xml contained
in the FerryLeaks package create this le automatically.After a new servlet
has been added to the application,it has to be specied as shown in List-
ing 5.3.The <servlet> tag is used to give a name to a servlet class and to
tell the server that a particular class should be accessible as a servlet.The
<servlet-mapping> tag binds a URL pattern to the servlet.In Listing 5.3,
all relative requests to/algebraeditor/register are redirected to the servlet
class.
1 <servlet>
2 <servlet -name>RegistrationServiceServlet</servlet -name>
3 <servlet -class>com.algebraweb.editor.server.
RegistrationServiceServlet</servlet -class>
4 </servlet>
5 <servlet -mapping>
6 <servlet -name>RegistrationServiceServlet</servlet -name>
7 <url -pattern>/algebraeditor/register</url -pattern>
8 </servlet -mapping>
Listing 5.3:Excerpt of FerryLeaks'web.xml
Standard Java classes that should be accessed from a servlet are stored
within the/WEB-INF/classes directory.Prepackaged libraries have to be
placed into/WEB-INF/lib.
We point out that FerryLeaks was only compiled and tested with GWT
SDK 2.3.Earlier versions do not support some graphical features such as
dialog animations.
5.2.Hosting FerryLeaks with Apache Tomcat
Web Archive les created with FerryLeaks'build.xml can be deployed to
any Java servlet container.In this section,we give an example on how to
deploy FerryLeaks to a standard Tomcat installation.We also provide a basic
45
5.Deploying to a Server
introduction to Apache's mod
proxy extension which allows running Apache
and Tomcat on the same port of a single server.
5.2.1.Installing Tomcat and Uploading FerryLeaks
Apache Tomcat is an open source servlet container for Java applications and
can be downloaded for free from the Apache Software Foundation [20].Pre-
compiled packages exist for most operating systems.We demonstrate Tomcat's
conguration with a precompiled Ubuntu 10.04 package of Tomcat 6.
By default,Tomcat listens on port 8080 for incoming requests.In order to
make it use the standard web-server port 80,the server.xml in/etc/tomcat6
has to be edited accordingly.The port attribute of the <Connector> tag has
to be changed to 80.This only works when running Tomcat as root,though.
Another approach is to locally forward incoming port 8080 packages to port
80 by using for example iptables [32,33].
Binding Tomcat to port 80 can cause serious problems when running a con-
current Apache installation,which is usually the case.Several approaches for
integrating Tomcat with Apache exist.A sophisticated way to directly connect
them is the mod
jk extension,which allows multiple instances of Tomcat to
be accessed as workers from the Apache environment [31].However,we found
it both easier to set up and to scale the degree of a Tomcat integration with
Apache's mod
proxy extension.It can be downloaded from [34].
The mod
proxy serves as a simple proxy for HTML connections.Requests
to specic URLs can be forwarded to a local/remote address,thus providing
transparent access to any underlying services.Tomcat and Apache stay sep-
arated.Hence,further changes to the server architecture are easily possible;
scalability is provided.Listing 5.4 shows a basic mod
proxy environment for the
location/editor/.Mod
proxy directives can appear in virtually any Apache
conguration le,but the proper way is to include them within a <Location>
tag.
1 <Location/editor/>
2 ProxyPass http://localhost:8080/FerryLeaks/
3 ProxyPassReverse http://localhost:8080/FerryLeaks/
4 ProxyPassReverseCookiePath/FerryLeaks/editor
5 ProxyPassReverseCookieDomain localhost www.uni -tuebingen.de
6 </Location>
Listing 5.4:A basic mod
proxy setup for connecting Tomcat to Apache
46
5.2.Hosting FerryLeaks with Apache Tomcat
In Listing 5.4,the ProxyPass directive makes any request to/editor/go
directly to http://localhost:8080/FerryLeaks/,which is the local address
of FerryLeaks running in a Tomcat server (Port 8080).ProxyPassReverse
modies the HTTP header to t the original request URL rather than the
local redirection URL.To rewrite cookies to t the browser request URLs,the
ProxyPassReverseCookiePath and ProxyPassReverseCookieDomain have to
be set.Without these directives,browsers would not correctly assign cookies
and session handling would be corrupted.
In this setup,Tomcat stays accessible through port 8080 from the outside.
The port has to be explicitly blocked to disallow remote connections to it.