jW@CK: A Java Web Accessible Computing Kluster

Nov 13, 2013


Brett Stone-Gross, Dominic Metzger, Alon Levi, Anders Smestad

Department of Computer Science

University of California, Santa Barbara, CA, USA

{bstone, metzger, alevi, asm}@cs.ucsb.edu


Introduction

The impetus for the creation of jW@CK was the realization that our previous “compute farm” implementation required broad knowledge of the system's internals. Our intention was to build an improved compute farm (jW@CK) that would provide all the necessary functionality of the original design through an intuitive web interface, while remaining accessible to existing external applications. jW@CK is a Java-based application interface running on a Tomcat server that is accessed through a PHP-generated website. The backend is written entirely in Java and makes heavy use of Java RMI. Users can create and store new tasks through a web interface and execute them at their leisure. Additionally, the system is persistent and fault-tolerant, and it can be fully administered through the website.


User Features

The following end-user features are implemented in jW@CK:

1. Extended access: no download is required. A user needs only a web browser and can execute tasks from virtually anywhere.

2. Execute: an automated pipeline through Apache and Tomcat connects to the RMI server to execute tasks. Users can monitor the execution and server statistics in real time.

3. Create new tasks: tasks can be created dynamically and loaded into the running system.

4. Receive results: when a task is executed, a user can either check the result and invoice manually once they become available, or have them sent to an email address.


Administrative Features

The following features aid in the administration of a jW@CK system:

1. System Startup and Shutdown: the entire system can be brought up or down through the admin tools on the jW@CK website.

2. Computer management: the admin web interface can connect a variable number of new computers to the system, or gracefully shut them down.

3. jW@CK Status: jW@CK provides the system administrator with real-time system information that is easily accessible through the website. This information includes memory usage statistics of the task server and the HSP, the number of currently active user sessions and connected computers, and the number of tasks and results waiting in the task server's queues. The queue lengths give the administrator an idea of the current state of the system.



System Architecture

In order to build and execute tasks efficiently, the system is modularized into six primary components (see Figure 1):



Client: uses a web browser to interact with the system. The client communicates with the web server over the Secure Sockets Layer (SSL) to protect private information, including the credit card numbers entered during registration; this matters because users are meant to be billed for the CPU time they use on the compute farm.



Apache Web Server: serves dynamically generated PHP web pages for user interaction and interfaces with a MySQL database that stores user registrations, task creation variables, and account settings. An additional design decision was to use PHP sessions, which keep state on the web server so that users' data does not have to be stored in cookies on their local machines, reducing potential privacy concerns. To make the website more dynamic and interactive, we use Asynchronous JavaScript and XML (AJAX) for some activities, such as status updates, system graphing, and the interaction with the HSP during the creation of user-defined tasks. The Apache HTTP server forwards certain URLs to the Apache Tomcat server.



Tomcat Backend: HTTP requests from the user are mapped by Tomcat to various Java servlets, which then execute and parse the received POST/GET data. Whenever a class on the Tomcat server needs to connect to the HSP service, it needs the HSP's remote interface to interact with. To avoid an HSP lookup for every call, the HSP server is looked up once and the resulting object is cached. If the connection is lost, however, the Tomcat server tries to reconnect to the HSP server, so the HSP server can be restarted without restarting the Tomcat server. Conversely, the HSP does not store any state for the Tomcat client, which interacts with it like any other client, so the Tomcat server can likewise be restarted without affecting the HSP service.
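The lookup-once, reconnect-on-failure behaviour described above can be sketched as a small wrapper around a cached stub. This is a minimal sketch under stated assumptions: the names CachedRemote, Lookup, and Call are hypothetical, and in a real deployment the Lookup would wrap something like java.rmi.Naming.lookup with the HSP's URL; the actual servlet code is not reproduced in this report.

```java
import java.rmi.RemoteException;

/**
 * Hypothetical helper (not from the jW@CK sources): caches a remote stub and,
 * if a call fails with RemoteException, looks the stub up again and retries once.
 */
public class CachedRemote<T> {
    /** Performs the lookup, e.g. a Naming.lookup of the HSP in a real deployment. */
    public interface Lookup<T> { T lookup() throws Exception; }

    /** One remote invocation on the cached stub. */
    public interface Call<T, R> { R invoke(T stub) throws RemoteException; }

    private final Lookup<T> lookup;
    private T stub;      // cached stub; null until first use
    private int lookups; // number of lookups performed so far

    public CachedRemote(Lookup<T> lookup) { this.lookup = lookup; }

    public synchronized <R> R call(Call<T, R> call) throws Exception {
        if (stub == null) {
            stub = lookup.lookup(); // first use: look the service up once
            lookups++;
        }
        try {
            return call.invoke(stub);
        } catch (RemoteException e) {
            // Connection lost (e.g. the HSP was restarted): discard the stale
            // stub, look the service up again, and retry the call once.
            stub = null;
            stub = lookup.lookup();
            lookups++;
            return call.invoke(stub);
        }
    }

    public synchronized int lookupCount() { return lookups; }
}
```

Making call synchronized keeps the sketch simple but serializes all remote calls through one stub; a servlet under load would more likely guard only the lookup and share the stub between request threads.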



The Java servlets call remote methods on the HSP according to the requested URL and pass in the parsed user data. The servlets are thus the connection between the user or administrator and the Java RMI server. The following paragraphs list the main user and administrative tools and describe the interaction with the Java RMI server that each one requires.

Task submission and retrieval:

1. The Java servlets receive a job from the user, submit it to the HSP, and return a job reference number to the user.

2. The open connection to the user is closed immediately afterwards, so that the user does not have to wait while the task is being processed.

3. If the user has chosen to receive the results by email, the Java servlets wait for the results to return from the HSP and then email them to the user. Alternatively, the user can use the website tools to check the status of the submitted tasks, and is shown either a message that the jobs are still in progress or the results in PDF or HTML format.

Note that outstanding results are stored in Tomcat, shopping-cart style, using Java HTTP sessions.
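The submit-and-return flow of steps 1 to 3, together with the "shopping cart" of outstanding results, can be sketched with a plain ExecutorService. This is a hypothetical, self-contained sketch: JobTracker, the Callable standing in for the HSP submission, and the Consumer standing in for the email sender are illustrative assumptions, and a real servlet would keep one such cart per HTTP session.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch of the servlet-side job flow; not the jW@CK servlet code. */
public class JobTracker {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicLong nextId = new AtomicLong(1);
    // Outstanding jobs ("shopping cart"), keyed by job reference number.
    private final Map<Long, Future<String>> cart = new ConcurrentHashMap<>();

    /**
     * Step 1: hand the job off and return its reference number at once, so the
     * HTTP response can be sent without waiting (step 2). If an emailer is
     * given, it is called with the result once the job finishes (step 3).
     */
    public long submit(Callable<String> job, Consumer<String> emailer) {
        long id = nextId.getAndIncrement();
        cart.put(id, pool.submit(() -> {
            String result = job.call(); // stands in for waiting on the HSP
            if (emailer != null) {
                emailer.accept(result);
            }
            return result;
        }));
        return id;
    }

    /** Polling path: empty while the job is in progress or if the id is unknown. */
    public Optional<String> status(long id) throws InterruptedException, ExecutionException {
        Future<String> f = cart.get(id);
        if (f == null || !f.isDone()) {
            return Optional.empty();
        }
        return Optional.of(f.get());
    }

    public void shutdown() { pool.shutdown(); }
}
```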

Starting up the HSP:

To start the HSP, the administrator uses the admin tools on the jW@CK website. When the administrator starts an HSP, the PHP code executes an Ant target that deploys the HSP on an available machine.


Adding computers:

To add computers to the compute farm, the administrator uses a tool on the website that notifies the Java servlets to contact the HSP. The HSP then securely copies (SCP) computer.jar and a start script to an available remote host. For this to happen automatically, a public/private key pair has to be set up on the participating machines. Once the files have been uploaded, the HSP logs into the chosen machine through the Secure Shell protocol (SSH) and executes the start script to initialize a new computer.
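The copy-then-execute sequence can be sketched with ProcessBuilder driving the standard scp and ssh command-line tools. The class name ComputerDeployer, the remote directory, and the policy file name computer.policy are hypothetical; the start script name quickStartComputer is taken from the installation notes in this report. The sketch assumes key-based authentication is already in place (the ssh option BatchMode=yes makes scp and ssh fail rather than prompt for a password) and that the start script puts the computer process in the background so the ssh call returns.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of the HSP's "add computer" step using scp and ssh. */
public class ComputerDeployer {
    private final String user;      // account with passwordless key authentication
    private final String remoteDir; // assumed working directory on the remote host (must exist)

    public ComputerDeployer(String user, String remoteDir) {
        this.user = user;
        this.remoteDir = remoteDir;
    }

    /** The commands in execution order: copy the files, then run the start script. */
    public List<List<String>> commands(String host) {
        String target = user + "@" + host;
        return List.of(
            // BatchMode=yes: fail instead of prompting when key authentication is missing.
            List.of("scp", "-o", "BatchMode=yes",
                    "computer.jar", "quickStartComputer", "computer.policy",
                    target + ":" + remoteDir + "/"),
            // Assumes quickStartComputer starts the computer in the background and exits.
            List.of("ssh", "-o", "BatchMode=yes", target,
                    "cd " + remoteDir + " && sh quickStartComputer"));
    }

    /** Runs the commands in order; returns false at the first non-zero exit status. */
    public boolean deploy(String host) throws IOException, InterruptedException {
        for (List<String> cmd : commands(host)) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Separating commands() from deploy() lets the command lines be checked without touching the network.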

Shutting down Computers / Server:

To kill a connected computer or to take down the HSP, the Java servlet makes simple RMI calls to the HSP to initiate the shutdown procedure.

jW@CK Status:

The Java servlets make a remote call to the HSP, which then gathers information from the task server and itself. This information is sent back to the Java servlet, which formats it as HTML and passes it on to the user.
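The shutdown and status interactions can be sketched as an RMI remote interface plus a serializable status snapshot that the servlet renders as HTML. The interface name HspAdmin, its method names, and the SystemStatus fields are hypothetical; the fields simply mirror the statistics listed under Administrative Features.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Hypothetical remote admin interface of the HSP (method names are illustrative). */
interface HspAdmin extends Remote {
    SystemStatus status() throws RemoteException;                    // jW@CK Status
    void shutdownComputer(String computerId) throws RemoteException; // kill one computer
    void shutdown() throws RemoteException;                          // take down the HSP
}

/** Status snapshot gathered by the HSP; Serializable so RMI returns it by value. */
public final class SystemStatus implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final long MB = 1024L * 1024L;

    private final long hspMemoryBytes;
    private final long taskServerMemoryBytes;
    private final int activeSessions;
    private final int connectedComputers;
    private final int queuedTasks;
    private final int queuedResults;

    public SystemStatus(long hspMemoryBytes, long taskServerMemoryBytes, int activeSessions,
                        int connectedComputers, int queuedTasks, int queuedResults) {
        this.hspMemoryBytes = hspMemoryBytes;
        this.taskServerMemoryBytes = taskServerMemoryBytes;
        this.activeSessions = activeSessions;
        this.connectedComputers = connectedComputers;
        this.queuedTasks = queuedTasks;
        this.queuedResults = queuedResults;
    }

    /** The HTML fragment the servlet would pass back to the browser. */
    public String toHtml() {
        StringBuilder html = new StringBuilder("<table>");
        row(html, "HSP memory (MB)", hspMemoryBytes / MB);
        row(html, "Task server memory (MB)", taskServerMemoryBytes / MB);
        row(html, "Active user sessions", activeSessions);
        row(html, "Connected computers", connectedComputers);
        row(html, "Tasks waiting", queuedTasks);
        row(html, "Results waiting", queuedResults);
        return html.append("</table>").toString();
    }

    // Labels are fixed strings and values are numbers, so no HTML escaping is needed.
    private static void row(StringBuilder html, String label, long value) {
        html.append("<tr><td>").append(label).append("</td><td>")
            .append(value).append("</td></tr>");
    }
}
```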



Hosting Service Provider (HSP): in addition to assigning appropriate services to connecting computers, the HSP acts as the gateway to the internals of the jW@CK system. It also contains the code for user code generation and directs computers to the task server.



Computers: perform the operations defined in the submitted task and return the results of the computation to the task server.
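The task server and computers form a producer/consumer arrangement: the task server holds queues of waiting tasks and results, and each computer repeatedly takes a task, computes it, and returns the result. A minimal single-JVM sketch with BlockingQueue follows; in jW@CK the computers are separate JVMs reached over RMI, and the names TaskServer and Computer, as well as IntSupplier as the task type, are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntSupplier;

/** Hypothetical in-process model of the task server's two queues. */
public class TaskServer {
    final BlockingQueue<IntSupplier> tasks = new LinkedBlockingQueue<>();
    final BlockingQueue<Integer> results = new LinkedBlockingQueue<>();

    /** A computer: takes tasks until interrupted, computes them, returns results. */
    static final class Computer implements Runnable {
        private final TaskServer server;

        Computer(TaskServer server) { this.server = server; }

        @Override
        public void run() {
            try {
                while (true) {
                    IntSupplier task = server.tasks.take(); // block until work arrives
                    server.results.put(task.getAsInt());    // hand the result back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();         // graceful shutdown
            }
        }
    }

    /** Starts n computer threads; interrupting a thread shuts that computer down. */
    public Thread[] startComputers(int n) {
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(new Computer(this), "computer-" + i);
            threads[i].start();
        }
        return threads;
    }
}
```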


[Figure 1 diagram labels: Client, HTML / AJAX, Apache Webserver, PHP, Database (MySql), Tomcat, Java Servlets, Java RMI Server, HSP, TaskServer, Computers, jW@CK]

Figure 1: The jW@CK system architecture.


User-Defined Task Generation

User-defined task generation was implemented to give potential users a more accessible way to take advantage of the system. Previously, a user had to be aware of internal system conventions and architecture and write a variety of classes in order to create a task. With our task generation web forms, users can instantiate their own programs without any knowledge of the system components. User-defined task generation is based on the model of a recursive computation: the user is asked to enter Java code for the various components of a simple recursive computation, and the system then generates and compiles Java source that performs a distributed divide-and-conquer computation over numerous computers. Task generation is implemented in five steps on our website:


1. Definition of the Input, Global, and Result components (3 files)

2. Has the computation reached a base case? (1 file)

3. If so, how to compute a Result from an Input (1 file)

4. Otherwise, given an Input, which recursive calls should be made? (1 file)

5. Definition of how to compose results (1 file)
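The five steps map naturally onto a generic divide-and-conquer template. The sketch below is a hypothetical local model: the interface names echo the generated files (ToCompute, Compute, Branch, Compose), but their signatures are assumptions, the Input and Global components of step 1 become type parameters, and where jW@CK distributes the recursive calls over the computers, this version simply recurses in one JVM.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical local model of a user-defined task; I = Input, G = Global, R = Result. */
public class DivideAndConquerTask<I, G, R> {
    public interface ToCompute<I, G> { boolean isBaseCase(I input, G global); } // step 2
    public interface Compute<I, G, R> { R compute(I input, G global); }         // step 3
    public interface Branch<I, G> { List<I> branch(I input, G global); }        // step 4
    public interface Compose<R> { R compose(List<R> results); }                 // step 5

    private final ToCompute<I, G> toCompute;
    private final Compute<I, G, R> compute;
    private final Branch<I, G> branch;
    private final Compose<R> compose;

    public DivideAndConquerTask(ToCompute<I, G> toCompute, Compute<I, G, R> compute,
                                Branch<I, G> branch, Compose<R> compose) {
        this.toCompute = toCompute;
        this.compute = compute;
        this.branch = branch;
        this.compose = compose;
    }

    /** Sequential evaluation; jW@CK would instead farm each branch out to a computer. */
    public R run(I input, G global) {
        if (toCompute.isBaseCase(input, global)) {
            return compute.compute(input, global);
        }
        List<R> partial = new ArrayList<>();
        for (I child : branch.branch(input, global)) {
            partial.add(run(child, global));
        }
        return compose.compose(partial);
    }
}
```

For example, a Fibonacci task would use the base case n < 2, compute n itself in the base case, branch into n - 1 and n - 2, and compose by summing the two partial results.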


The system then takes these five pieces of information and generates seven files from them, in addition to three other files that always contain the same code. The generated files are Input.java, Global.java, Result.java, ToCompute.java, Compute.java, Branch.java, and Compose.java. The classes are compiled incrementally so that the user is informed as soon as they have entered invalid code.
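Compiling each generated class as soon as the user submits the corresponding form is what lets the system report invalid code early. A minimal sketch using the standard javax.tools API follows; the class name SourceChecker and the choice to compile from in-memory strings are assumptions, not a description of the actual Generator. It requires a JDK, since ToolProvider.getSystemJavaCompiler() returns null on a bare JRE.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical helper: compiles one generated source and returns its error messages. */
public class SourceChecker {
    /** A Java source file held in memory rather than on disk. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /**
     * Compiles the source into outDir, which must exist and is also used as the
     * classpath so that later files can refer to classes compiled earlier.
     * An empty list means the code compiled.
     */
    public static List<String> compile(String className, String code, File outDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE?)");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<String> options = List.of("-d", outDir.getPath(), "-classpath", outDir.getPath());
        boolean ok = compiler.getTask(null, null, diagnostics, options, null,
                List.of(new StringSource(className, code))).call();
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(Locale.ENGLISH));
            }
        }
        if (!ok && errors.isEmpty()) {
            errors.add("compilation failed");
        }
        return errors;
    }
}
```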

When the user wishes to run a task, they are again prompted to enter code, this time for creating the initial input and global objects. After two more files are compiled, the task is dynamically loaded into the running system. The system supports saving generated tasks and allows different users to generate and execute tasks simultaneously. If a user wishes to run the same task multiple times during the same execution of the jW@CK system, the classes have to be reloaded using a custom class loader. This is done in the class gen.UserTaskLoader: since a loaded class is tied to its defining class loader, using a new instance of java.net.URLClassLoader each time the user needs to generate new input objects for a user task causes the classes to be reloaded.
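The reload works because the JVM identifies a class by its name together with its defining class loader, so a class defined by a fresh URLClassLoader is a new class even when the name is unchanged. The sketch below is a hypothetical stand-in for gen.UserTaskLoader, whose source is not reproduced here. It assumes the user classes live in a directory that is not on the application's own classpath, because the parent-first delegation of URLClassLoader would otherwise return the copy already loaded by the parent.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical stand-in for gen.UserTaskLoader: one fresh loader per load. */
public class ReloadingLoader {
    private final URL[] classpath;

    public ReloadingLoader(File userClassDir) throws MalformedURLException {
        // File.toURI() appends the trailing '/' that URLClassLoader needs for an existing directory.
        this.classpath = new URL[] { userClassDir.toURI().toURL() };
    }

    /**
     * Loads className through a brand-new URLClassLoader, so every call yields a
     * freshly defined class. The loader is deliberately left open because the
     * returned class may still need it to load further classes or resources.
     */
    public Class<?> load(String className) throws ClassNotFoundException {
        URLClassLoader loader =
                new URLClassLoader(classpath, ReloadingLoader.class.getClassLoader());
        return Class.forName(className, true, loader);
    }
}
```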


Maven

One of the secondary goals of the project was to gain experience with the Maven build and project management tool. We were not able to finish this by the time of the demonstration, but we were still eager to get it working, so the attached archive contains a pom.xml file for building the project with Maven. Running “mvn deploy” compiles all code, runs all available test classes, and creates and deploys a project jar file. Running “mvn site-deploy” creates a project site. For testing purposes, we added plugin configurations to the pom.xml for various report generations, some of which do not seem to work, most likely due to configuration mistakes. By default, the Maven configuration builds the system into the directory “target”, which holds the compiled class files, project.jar, and the files for the generated site. The file “target/site/index.html” is the front page of the project site and gives an idea of what a project site may look like. The reports link should lead to code analysis reports, test coverage reports, and others; currently, the only two we were able to run are the Javadoc generation and, also of interest, the Javadoc-style source code pages. The JavaWorld article at http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-maven_p.html gives an introduction to Maven 2 usage.

admin/jw@ckadmin


System analysis

(This section is also included in the documents folder of the turned-in jar, since it contains information about installing and configuring the system.)

The system comprises several components that need to be running in order to use it. We packaged all these components in an archive file that can be downloaded from the project website; the file is too large to send by email. The link to the file is
http://csil.cs.ucsb.edu:8889/codemonkey/project.tar.gz


The system as demonstrated on Monday, June 12th is still accessible at
https://csil.cs.ucsb.edu:9090/



At the bottom layer, we rely on a database for storing persistent information. We use MySQL version 5.0.22, which can be found in the project archive. Furthermore, we need an Apache httpd server; for this we use httpd-2.0.58 with a PHP module compiled with MySQL support, plus a mod_jk module for connecting to the Tomcat server. This is also found in the project archive. The Tomcat server we use is version 5.5.17, which is also in the project archive. All of these components need minimal configuration and are ready to use once installed.


The Apache httpd.conf file needs two changes. The first is the location of the workers.properties file, which httpd needs in order to forward requests to the Tomcat server; by default it is found in the config/ directory of the Tomcat installation, and it may be necessary to modify the path to this file. The second is the web folder alias "codemonkey", which must point to the directory where the user-generated class files are stored; this is required for the computers to be able to load the user-generated classes. By default, this folder is in the HSP directory and is called "userClasses".


To start Apache, use the command "bin/apachectl start"; similarly, the MySQL server is started by issuing the command "bin/mysqld" in the MySQL directory. To start the Apache Tomcat server, use "bin/catalina.sh start", or use "bin/catalina.sh run" to see the server's output on the console as it runs. The latter turned out to be more convenient than tailing three different log files whenever web application exceptions occurred.


Once these are started, the project website is up and can be accessed at https://hostname:9090/. As shown in the project demonstration, the HSP server can be started through the web interface; this requires, however, that the path to the HSP installation directory is known to the PHP scripts that execute the startup command. This path can be modified in the file "startHSP.php".


The HSP server can also be started manually by running the Ant target "ant runServer". In the current configuration, this also starts a task server on the same machine.


Computers are added to the system by first running the Ant command "ant deploycomputer", which generates a "quickStartComputer" script along with computer.jar. These two files and a policy file are then copied to the host where the computer is to be started, and the script is executed on the remote host. This was done to simplify the startup of the computer, since it needs quite a few configuration settings, such as the policy file, a codebase, and computer configuration parameters. It is assumed that the user who runs the HSP server has passwordless private/public key authentication set up in order to log in to the remote computers.


The list of computers to connect to is contained in the file staticfiles/service.hosts in the HSP home directory.


To redeploy the Tomcat web application, the Ant target "ant warit" packs all the necessary files into a war file and drops it into Tomcat's webapps folder. The Tomcat server automatically detects the new file, extracts its contents, and restarts the web application. For the warit target to copy the war file to the right location, it may be necessary to modify the parameter "tomcat.webapps" in the build.properties file in the HSP home directory.


This may seem like a lot of installation and configuration to get the system up, and we must admit that we spent quite some time getting everything to work together; the Apache/PHP/Tomcat combination in particular turned out to be harder than expected. But the harder it got, the more rewarding it was to finally get the system up and running, and it turned out to be quite stable. Our first approach was to build a custom PHP module into the Apache Tomcat server, but as shown in the in-class project presentation, that system was very unstable, so we rejected the idea and spent some time getting the current system to run.


However, when the system was set up and configured, we spent virtually no more time
on maintenance issues and could focus on the development of the system.


We would also like to mention that we used SVN for revision control throughout the whole quarter, and the extra effort of using it definitely proved worthwhile, especially with four developers on the same project: not having to worry about one developer overwriting another developer's changes was very helpful. SVN also let us easily keep multiple up-to-date installations of the system: we had one production system, while the individual developers could track the latest changes on their laptops and run the entire system locally. We could also set up automated updates of the production system so that it always ran the latest revision.


Package structure

Most Java packages should be self-explanatory, but a few additional ones deserve mention. The invoice package contains all classes related to invoicing. The web package contains the Java code for the web application and is deployed on the Tomcat server. An additional package, plotter, contains code for a graph applet, but we were not able to integrate this applet successfully so that it would show live statistics as we had intended. The window package contains a GUI component used to monitor and control various components of the system; for example, it was used to show a window on a computer so that a user who no longer wanted to donate CPU power could shut the computer down with the click of a button. Similar functionality exists for the task server and the HSP but is not used in the deployed system; it can be turned on by setting the gui parameters in the build properties to true. The most important new package is the gen package, which contains all the files used to generate user classes, as well as the dynamic class loader. Its main component is the Generator component. Interestingly, the initial development of this package was done entirely apart from the rest of the project. By providing only the public function names with empty implementations, we could test the rest of the system without this package being fully implemented, while the generator was developed and tested separately. When both were finished, the generator was plugged in and the deployment was painless.
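Developing the generator in isolation relies on a fixed public interface whose empty implementation the rest of the system can call in the meantime. A hypothetical sketch of that pattern follows; the interface name TaskGenerator, its method signature, StubGenerator, and TaskService are illustrative only, since the report does not list the Generator's actual public functions.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical contract between the web/HSP code and the gen package. */
interface TaskGenerator {
    /** Generates and compiles a task from the user's code fragments; returns error messages. */
    List<String> generate(String taskName, Map<String, String> userCode);
}

/** Placeholder with an empty implementation, used while the real generator is developed. */
class StubGenerator implements TaskGenerator {
    @Override
    public List<String> generate(String taskName, Map<String, String> userCode) {
        return List.of(); // pretend everything compiled
    }
}

/** Caller written against the interface, so plugging in the real generator needs no changes. */
public class TaskService {
    private final TaskGenerator generator;

    public TaskService(TaskGenerator generator) { this.generator = generator; }

    public String create(String taskName, Map<String, String> userCode) {
        List<String> errors = generator.generate(taskName, userCode);
        return errors.isEmpty() ? "Task " + taskName + " created" : "Errors: " + errors;
    }
}
```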


Turnin structure

The attached file jwack.jar contains all Java source code, under the Maven-compatible source directories src/main/java and src/test/java for main classes and test classes, respectively. The folder staticfiles contains a few “static” files used by the application. The directory php contains compressed PHP code used for the PHP part of the application, including the phpMyAdmin application. The target directory contains the Maven site and the other products of the Maven build.


Conclusions and Future Work

The jW@CK project is an effective system for enabling any user to create a task that can be executed on a compute farm, provided that the task is not too complex. Future work could include a simpler, functional-programming-based language for user task definition. In addition, we could add features that let users create tasks in segments rather than forcing them to define a task in one session, so that a user could create a task, log out, and come back later to complete the task implementation.

