Scalable and Highly Available Infrastructure for J2EE Applications


Oct 31, 2013



A Case Study:
Education and Training Administration
Embry-Riddle Aeronautical University

written by
John Vaughan, DataRoad, Inc.
Marty Smith, Embry-Riddle Aeronautical University


In this white paper we discuss the development of a highly available, scalable and secure
infrastructure designed to support the operation of a web-based J2EE application. The project
involved implementation of a flight training management application for Embry-Riddle
Aeronautical University. This application provides management for flight training operations to a
variety of organizations and is the main focus of efforts at Embry-Riddle to standardize, automate
and secure control over the activities of flight training globally.

Applications that provide these services must be able to combine existing information with new
business functions that deliver services to a broad range of users. These services need to be:

Highly available, to meet the demands of an extended business environment

Secure, to protect privacy and integrity of data

Reliable and scalable, to guarantee that business transactions are accurately and
promptly processed

In reality, Java technology is only as scalable, available, and manageable as the infrastructure on
which it runs. When the platform can’t keep up with growth in the number of users, transactions
per user, or transaction bandwidth, applications perform poorly and websites slow to a crawl.

Like any other enterprise application, a server-side Java application can be brought down by a
hardware fault, a software fault, a network fault, or an environment fault. Whatever the reason,
there is no room for downtime in an e-business environment. Properly designed data centers
address the network and environment fault issues by providing redundant power and internet
connectivity.

This paper presents a case study of implementing a highly available and scalable solution that
combines Oracle9iAS, Oracle9i RAC, SSL accelerators, and hardware load balancers. This
solution was designed and implemented for Embry-Riddle Aeronautical University, the largest
training school in the world. ERAU's J2EE application supports every aspect of the school's
flight training program, so scalability and 24/7 availability are critical.

Embry-Riddle Approaches the Future


Embry-Riddle Aeronautical University is the largest independent aeronautical university in the
world. The not-for-profit institution educates more than 24,000 students annually through thirty
degree programs. Its ROTC detachments train more Air Force pilots and commissioned officers
than any other institution except the Air Force Academy.

The flagship program at Embry-Riddle, however, is the education of commercial pilots. This
training is a demanding process that requires a comprehensive and continuously reviewed
program of advanced learning. Consequently, the university is launching a new curriculum to
revolutionize pilot training, bringing information technology to bear on the issues involved.

Embry-Riddle oversees all the usual elements of an academic program such as student records,
human resources, financials and facilities. But when it comes to sending a student airborne, the
school must also track a host of factors that can change by the minute, such as weather, air
traffic, condition of the plane or simulator, and the health and certifications of both the student and
instructor/pilot. For every sortie, the instructor, student and craft must meet specific qualifications.
Relying on manual methods, Embry-Riddle staff scramble to match craft, student and instructor.

Compounding the challenges is the scale of Embry-Riddle’s operation, which includes campuses
in Daytona and Prescott, Arizona, education centers and a distance-learning network. At the
Daytona campus alone, 80 instructor/pilots supervise 550 student flights daily using a fleet of 139
instructional aircraft.

Until recently the tracking of training programs worldwide has largely been a paper-and-pencil
procedure. This practice is inexpensive and requires little training to maintain, but it is fraught with
opportunities for error that can lead to inaccurate information being provided to the
training institution, trainers and students alike.

Embry-Riddle’s approach to dealing with the increasingly dynamic requirements of this state of
affairs was to develop an automated, Internet-capable information management system for tracking
flight training data for its students. This allowed Embry-Riddle to re-engineer the training of
commercial pilots to provide them with a better education at less cost. This new curriculum blends
all of the required skills into one seamless course.

This innovative approach to curriculum management is Embry-Riddle’s pioneering Education and
Training Administration (ETA) system, which applies “just-in-time” methods to orchestrate the
costly human and capital assets required for flight training. Embry-Riddle expects the ETA system
to enhance instructional quality while reducing student expenses and institutional overhead.

As mentioned earlier, until ETA, flight training management was largely organized through
pencil-and-paper operations. These operations are antiquated, time consuming and inaccurate.
Difficulty maintaining currency of data and human error compound the problem. This method
was also very fragmented and lacked comprehensive communication across disciplines.
The priority then was to develop a system that would keep the current operations running
smoothly while improving on the integrity, availability and accuracy of the data being managed.
This was absolutely essential to the overall perception of the solution as being beyond criticism
and doubt. To accomplish this, the ETA system would need to exceed expectations at all
levels: academic/classroom support, student services and daily campus support.

It would need to be available 24 x 7, anytime, anywhere, and the network, infrastructure, and
applications would need to operate flawlessly.

The answer to these demands was to develop a real-time information management system
for tracking flight training data that is highly available, highly scalable and immediate, with fast
web access any time, anywhere. The system also needed to address usability issues with
user-friendly interfaces and portal-based individualization. The data must be continually updated
and current. The system must also be secure, with authentication and intrusion-proof data.

Education and Training Administration

Aviation Learning Management System

To meet the challenge of establishing such a system, Embry-Riddle Aeronautical University,
DataRoad, Inc. and Talon Systems collaborated to produce an Aviation Learning Management
System called ETA: Education and Training Administration.

This system is the most comprehensive Flight Training Management Program ever and serves as
an enterprise model for those using it. ETA is a Flight Training Management Tool: a 100%
web-based J2EE application accessed through a standard web browser. It is completely
electronic and supports concurrent operations at Daytona, Prescott, Affiliate Operations and now
the US Air Force Academy. The number of locations it supports is continuing to grow.

Integrating data from Embry-Riddle’s maintenance, HR, payroll, accounting and student record
systems, the ETA system provides students, instructor/pilots and managers with secure,
Web-based access to all of the information and tools they require to participate in the new
curriculum.

The ETA system translates Embry-Riddle’s new course for commercial pilots into a continuum of
stages, lessons and units, each structured by line-item objectives. Students can extract
customized training plans that guide and track their progress through the curriculum. An
electronic grade sheet automatically posts any incomplete line items until their satisfactory
completion.
Whether a lesson takes place in a cockpit or a classroom, the ETA system identifies and
schedules all human and capital resources required to fulfill the session’s line-item objectives and
confirms the readiness of these resources.

Flagging any issues, the system automatically checks all relevant details, including the student’s
prerequisite courses, flight hours and registration, financial and health records; the instructor’s
pilot ratings and certifications for the prescribed craft and sortie; and the maintenance status of
the vehicle.

While translating documents into real-time data, the ETA system also streamlines execution of
paper-laden processes, from FAA safety and security documentation to Embry-Riddle’s own
internally generated paperwork.

Far more than scheduling software, the ETA system is a repository of real-time information and
tools. The ETA system’s tools reinforce best practices in managing human and capital assets,
enabling not-for-profit Embry-Riddle to more efficiently deploy its educational resources.

Embry-Riddle worked with Talon Systems LLC to develop the entirely Web-based (J2EE) system.
Oracle partner DataRoad, Inc. designed and implemented the infrastructure for ETA at one of its
secure data centers in Jacksonville, Florida. This infrastructure uses Oracle 9i Application
Server (9iAS) and Oracle 9i RDBMS software, HP servers, and Alteon load balancers and SSL
accelerators.
Preparing for Growth

Due to the growth potential of the user base for the ETA system, scalability and high availability
were essential. As more users come on to the system it must scale up appropriately and be
available immediately, 24x7. Why? Embry-Riddle provides multi-national flight training both day
and night in multiple time zones.

For the ETA project, the system runs on a real-time, 24x7 platform that utilizes the Oracle 9i Real
Application Clusters (RAC) database, HP TruCluster Server software, and DataRoad’s technical
experience to provide a highly available, highly scalable solution to meet Embry-Riddle’s needs.
Unique to the solution is the single-system manageability of the software, which makes operating
multiple servers as simple and economical as managing one.
DataRoad’s end-to-end solution exploits all of these advantages to efficiently meet Embry-Riddle’s
requirements for high availability, security and data integrity.

DataRoad provides a dedicated platform for the ETA system that comprises servers, software, and
networking. DataRoad hosts and administers both the system and the application, which users
access through a secure VPN.


Prior to discussing specific configurations it is important to discuss general architecture terms and
definitions appropriate for the deployment of highly available and scalable infrastructure.


Firewalls

Firewalls are devices that restrict access between different LAN segments for security purposes.
Firewalls perform this function by analyzing traffic and can make restrictions based on IP
address, port, protocol used, protocol transitions and message content. For example, Check
Point FireWall-1 products provide a software solution that includes a feature called "stateful
inspection" that can restrict access based on illegal Internet protocol transitions. Cisco's PIX is an
example of an integrated hardware-software firewall solution.

Some devices that are called firewalls are software-only products that are loaded into client or
server machines. These may be useful but are inadequate as corporate firewalls, which should
always be deployed on machines separate from those running application or infrastructure
software.

Firewalls are a main defense for sites providing Internet access. Different firewall
products vary considerably in features and performance. Appropriate use of firewalls can protect
against many common vulnerabilities by prohibiting Internet access to services such as FTP or
rsh (especially if such services were inadvertently left running on Internet servers).
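
The filtering described above can be pictured as a first-match rule table. The following is a minimal sketch only: the Rule fields, the default-deny policy, the addresses and the RULES list are assumptions made for illustration, not the configuration used for ETA or the behavior of any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    source: str                     # CIDR block the source address must fall in
    port: Optional[int] = None      # None matches any destination port
    protocol: Optional[str] = None  # None matches any protocol


def decide(rules, src_ip, port, protocol):
    """Return the action of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if ip_address(src_ip) not in ip_network(rule.source):
            continue
        if rule.port is not None and rule.port != port:
            continue
        if rule.protocol is not None and rule.protocol != protocol:
            continue
        return rule.action
    return "deny"  # assumed default-deny policy


# Hypothetical policy: block FTP and rsh from the Internet, allow HTTPS.
RULES = [
    Rule("deny", "0.0.0.0/0", port=21, protocol="tcp"),    # FTP control
    Rule("deny", "0.0.0.0/0", port=514, protocol="tcp"),   # rsh
    Rule("allow", "0.0.0.0/0", port=443, protocol="tcp"),  # HTTPS
]
```

Because unmatched traffic is denied, a service left running by accident on an unlisted port stays unreachable from outside, which is the protection the paragraph above describes.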

Load Balancers

Load balancers have two essential functions. The first is to load balance traffic across multiple
servers, thus resulting in better scalability. In high traffic situations this can be very important. The
second essential function is to provide fault tolerance for servers. In this case the load balancer
ensures that a single failing server does not result in loss of a critical resource. The load balancer
accomplishes this by routing new requests to alternate servers if one server fails. So, load
balancing hardware is used both to provide scalability by spreading load across multiple
processors and to provide fault tolerance in case of processor failures.
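
Both functions can be illustrated with a small round-robin sketch. This is an assumption-laden model, not the algorithm of the hardware load balancers named in this paper: the class name, the server labels and the way failures are reported (mark_down) are all hypothetical.

```python
class RoundRobinBalancer:
    """Spread requests across servers and skip any server marked as failed."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # In a real device a health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

With three servers, requests rotate across all of them (scalability); after mark_down is called for one server, new requests go only to the remaining two (fault tolerance).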

Load balancers typically are able to route traffic both in situations where the infrastructure keeps
application state and in situations where it does not. In the case of stateless
communication the load balancer can route to any of its managed servers, since there is no state
in any particular server that is needed to correctly process the message. This is generally more
efficient, since requests can always go to the least busy server, but stateless operation often puts
an unacceptable burden on application writers. Many Oracle products require that the
infrastructure maintain application state.

For transactions where the infrastructure keeps state, load balancers switch incoming messages
to the server containing the state. Switching criteria are determined by analyzing cookies,
headers or other attributes. Sometimes only a single server contains the state. In that case
processor failures result in the failure of all transactions that have state in the failed processor,
and such transactions must be restarted. In some situations there are preferred processors but all
processors can obtain the state. When failures occur in these situations, a redirect due to
failure will result in successful processing, although there may be added overhead for transactions
that had state in failed processors.
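
Stateful routing of this kind can be sketched as a table that maps a session identifier (for example, a cookie value) to the server holding its state. The sketch below models the first case described above, where only one server holds the state, so a failure forces a restart. The class, the method names and the least-loaded placement rule are assumptions for illustration.

```python
class StickyBalancer:
    """Route each session to the server that holds its state."""

    def __init__(self, servers):
        self.healthy = list(servers)
        self.sessions = {}  # session id -> server holding that session's state

    def mark_down(self, server):
        if server in self.healthy:
            self.healthy.remove(server)

    def _load(self, server):
        # Number of sessions currently pinned to this server.
        return sum(1 for owner in self.sessions.values() if owner == server)

    def route(self, session_id):
        """Return (server, must_restart) for a request in this session."""
        owner = self.sessions.get(session_id)
        if owner in self.healthy:
            return owner, False  # state is where we expect it
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # A known session whose server failed has lost its state.
        must_restart = owner is not None
        target = min(self.healthy, key=self._load)
        self.sessions[session_id] = target
        return target, must_restart
```

In the second case described above, where every processor can obtain the state, the same redirect would succeed without a restart, at the cost of fetching the state from wherever it is shared.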

SSL Accelerators

In many sites, SSL key exchange operations can dominate CPU usage. For such sites HTTPS
appliances can result in significant cost reductions and improved performance.
Expanding HTTPS use improves security. Where HTTPS use is limited by performance
considerations, HTTPS accelerators should be considered. The term "sticky" or "persistent"
transaction is often used to denote transactions that should be routed to particular
load-balancer-managed hardware containing intermediate application transaction state.

There are different types of SSL accelerators. One type is basically a math coprocessor
that offloads expensive cryptographic operations from general-purpose CPUs. A second type is
a stand-alone device that converts HTTPS to HTTP: it terminates incoming HTTPS connections
and forwards the requests as plain HTTP. Since the SSL processing of the HTTPS protocol
can consume a large percentage or even most of a CPU's time, offloading SSL processing may
result in a significant reduction in the number of CPUs required to support a workload. Such
reduction can result in both cost savings and improved scalability.
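
The scale of that reduction can be estimated with simple arithmetic. The function below is a rough sketch under a strong assumption: that a known fraction of each server's CPU time goes to SSL processing and that all of it moves to the accelerator. The function name and the figures in the usage note are illustrative, not measurements from the ETA deployment.

```python
import math


def servers_after_offload(current_servers, ssl_cpu_fraction):
    """Estimate servers needed once SSL work is moved off the CPUs.

    ssl_cpu_fraction is the assumed share of CPU time spent on SSL (0 <= f < 1).
    """
    if not 0 <= ssl_cpu_fraction < 1:
        raise ValueError("ssl_cpu_fraction must be in [0, 1)")
    remaining_work = current_servers * (1 - ssl_cpu_fraction)
    # Round away floating-point noise before taking the ceiling.
    return max(1, math.ceil(round(remaining_work, 9)))
```

For example, if SSL consumed half of the CPU time on eight servers, servers_after_offload(8, 0.5) estimates that four servers could carry the remaining work. Real savings depend on the workload mix and on the headroom kept for peaks.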

A current problem with HTTPS to HTTP appliances occurs when client side X.509 certificates are
used. This is because these appliances terminate the SSL session and there is no standard way
to provide the client side X.509 certificate information with the forwarded message. If client side
certificates are only used to allow/deny access to a site or virtual host this may be acceptable.
However, if the application or other infrastructure items need certificate information, custom
solutions are currently required. Since client side certificates are infrequently used at this time,
this consideration is not important for most sites. Customers interested in use of X.509 client side
certificates with such devices should contact Oracle or appliance providers, as progress toward
standard, supported solutions is being made.


Clustering

Clustering, while complex in practice, is fairly simple in definition. Clustering is the grouping
together of hardware and software into nodes that work together as a single system to ensure
that an application remains online for users during excessive loads, or if one of the nodes fails.

Clustering enables you to construct a multi-node system that makes several independent servers
appear like one. Multiple servers are connected together to form a single integrated system. If
any part of the system goes down, either intentionally or unintentionally, failover masks the
failure from the end users, thereby making the system more available. The down member of the
cluster is then reactivated, if possible, through a restart. This reduces the need for administrator
intervention. The system can also be scaled more effectively to support more users through load
balancing. Advanced tools for managing the cluster also assist in monitoring the activities of the
system and alerting administrators to potential issues.


High Availability

Delivering high availability requires a variety of approaches that work hand in hand to provide
a highly available service to the end user.

As mentioned earlier, in a clustered environment multiple servers act in concert with each other to
present a single source. For a member within the clustered environment to take the place of
another that is experiencing trouble, the state of requests must be shared across all members of
the cluster. When a new cluster member takes over for a failing member, the process is executed
more smoothly due to the shared requests.

In the event of a failure, transparent failover enables a member of a clustered system to take the
place of another member without the end user being aware that a change is taking place. This
gives users a sense of continuity in the system. The downtime of an individual member of the
cluster does not affect operation on the user side at all.

Once failover is executed and the system is stable again, which happens rapidly, quick
automated restarts take place. The down member is identified and restarted automatically. If
it cannot be restarted, an error is generated and administrators are notified of the situation for
attention. This process reduces the need for direct intervention on the administrative level,
thereby minimizing downtime and increasing availability.
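
The failover and restart sequence described above can be sketched in a few lines. This is a simplified model and not the TruCluster or Oracle implementation: the Cluster class, the restart hook and the alert list are hypothetical stand-ins for the real cluster manager and its notification mechanism.

```python
class Cluster:
    """Toy cluster: survivors keep serving while a failed member is restarted."""

    def __init__(self, members, restart):
        self.active = list(members)
        self.restart = restart  # hypothetical hook: callable(member) -> bool
        self.alerts = []        # messages that would go to administrators

    def handle_failure(self, member):
        # Failover: remove the member so requests go to the survivors.
        self.active.remove(member)
        # Automated restart: rejoin on success, alert on failure.
        if self.restart(member):
            self.active.append(member)
        else:
            self.alerts.append(f"{member} could not be restarted")

    def serve(self, request_id):
        """Pick an active member for a request; users never see who failed."""
        if not self.active:
            raise RuntimeError("no active cluster members")
        return self.active[request_id % len(self.active)]
```

An administrator is involved only when an entry appears in alerts, which mirrors the reduced need for direct intervention described above.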

In the event that the system has a serious failure that requires significant downtime, the cluster
can gracefully degrade the service provided to the end user. This provides a limited level of
service, rather than presenting a total failure. Single points of failure are also reduced or
eliminated, thereby limiting the risks of significant failure and unnecessary downtime.

High availability is also improved through the use of load balancers. Load balancing is necessary
because servers servicing one application can quickly be overwhelmed and crash if the
workload is not split among them. Load balancing divides work between two or more computers,
so the work gets done in the same amount of time without any one computer getting overloaded.
Cluster resources are dynamically re-balanced for optimal cluster utilization.


Scalability

Scalability is also essential to maintaining acceptable levels of service while keeping costs under
control. System growth must be progressive and easily expandable to meet increasing demand
from the user base. A clustered environment provides the most appropriate solution. Nodes
added to the cluster are automatically utilized; no manual re-allocation of resources is needed.


Clustering enables low-cost incremental scaling, allowing DataRoad to reduce the hosting expenses
to Embry-Riddle by using only the server power it needs at any time. Due to the flexible nature of
the environment, more equipment can be brought online at a moment's notice to address any
scaling requirements the system may demand. This provides ERAU with a “Scale as you grow”
option that minimizes the initial capital outlay for equipment, thereby significantly reducing hosting
fees. This approach bases hosting fees upon real and not just anticipated growth.

In a rapidly changing environment, opportunities for growth appear and disappear rapidly. It is
difficult to accurately predict the demand for a database or application server two years out, yet
having too little computing horsepower at any given time is unacceptable. Even if growth is
initially underestimated, the scalability of the system will allow for cost-effective sizing.
Real Application Clusters give scalability on demand because it is no longer
necessary to predict scalability needs.

Oracle9i Application Server (9iAS)

This paper focuses on the ‘core’ components of Oracle9iAS Release 2. Hence, a reference to
Oracle9iAS in this paper in general is a reference to Oracle9iAS Release 2 J2EE and Web Cache.

The components that fall in the core category are:

Web Cache: This is typically the first component of Oracle9iAS to receive the request.
For both static and dynamic requests, it can cache the result and then replay the results,
thus reducing the workload of the machines behind it. In addition, these Web Cache
instances can themselves be clustered.

Oracle HTTP Server: This is the next in line after Web Cache to receive a request. This
subsystem comprises a web server (based on Apache), a Perl execution
environment, and a PL/SQL and OC4J routing system.

Oracle Containers for J2EE (OC4J): This is the J2EE-compliant container in
Oracle9iAS. It provides clustering capabilities for the J2EE components: Servlets, JSP,
and EJB. It also contains other mechanisms, such as Java Object Cache, which provides
distributed caching capabilities.
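
The cache-and-replay behavior attributed to Web Cache above can be illustrated with a small time-based cache placed in front of an origin server. This is a generic sketch, not Oracle's implementation or API: the WebCache class here, the fixed time-to-live policy and the origin callable are assumptions.

```python
import time


class WebCache:
    """Serve repeated requests from memory until the cached copy expires."""

    def __init__(self, origin, ttl_seconds=60.0, clock=time.monotonic):
        self.origin = origin  # callable(url) -> response body
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be tested
        self._store = {}      # url -> (expiry time, body)
        self.hits = 0
        self.misses = 0

    def get(self, url):
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]      # replay the cached result
        self.misses += 1
        body = self.origin(url)  # only now do the servers behind do work
        self._store[url] = (now + self.ttl, body)
        return body
```

Every hit is a request that never reaches the HTTP server or the J2EE container, which is how a cache of this kind reduces the workload of the machines behind it.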

Oracle9i Real Application Clusters (9i RAC)

Real Application Clusters is an option for the Oracle9i database. Real Application
Clusters provides both scalability and availability as a single, easy to manage database product.
With Oracle9i Real Application Clusters, your enterprise database delivers scale-out economics
with the ease of use and power of a scale-up approach. For any database application, a Real
Application Clusters database looks just like an Oracle9i database on a single server. Real
Application Clusters supports all types of applications, from update-intensive online transaction
processing to read-intensive data warehousing.

A Real Application Clusters database not only appears like a standard Oracle9i database
to users, but the same maintenance tools and practices used for a single Oracle9i database can
be used on the entire cluster. All of the standard backup and recovery operations, including the
use of Recovery Manager, work transparently with Real Application Clusters. All SQL operations,
including data definition language and integrity constraints, are also identical in both
configurations.

Real Application Clusters provides rapid, automatic failover for users if their servers go down.
This automatic failover capability can prevent having to go through a complex series of
operations to restore access to a database, actions which, if not performed promptly or correctly,
can increase the duration of downtime or even jeopardize the integrity of your data.

The Solution