Scalable and Highly Available Infrastructure for J2EE Applications




A Case Study: ETA - Education and Training Administration System

Embry-Riddle Aeronautical University




Written by

John Vaughan, DataRoad, Inc.
Marty Smith, Embry-Riddle Aeronautical University




















Introduction

In this white paper we discuss the development of a highly available, scalable and secure
infrastructure designed to support the operation of a web-based J2EE application. The project
involved the implementation of a flight training management application for Embry-Riddle
Aeronautical University. This application provides management of flight training operations for a
variety of organizations and is the main focus of efforts at Embry-Riddle to standardize, automate
and secure control over the activities of flight training globally.

Applications that provide these services must be able to combine existing information with new
business functions that deliver services to a broad range of users. These services need to be:

- Highly available, to meet the demands of an extended business environment
- Secure, to protect the privacy and integrity of data
- Reliable and scalable, to guarantee that business transactions are accurately and promptly
  processed

In reality, Java technology is only as scalable, available, and manageable as the infrastructure on
which it runs. When the platform can’t keep up with growth in the number of users, transactions
per user, or transaction bandwidth, applications perform poorly and websites slow to a crawl.

Like any other enterprise application, a server-side Java application can be brought down by a
hardware fault, a software fault, a network fault, or an environment fault. Whatever the reason,
there is no room for downtime in an e-business environment. Properly designed data centers
address the network and environment fault issues by providing redundant power and Internet
connectivity.

This paper presents a case study of implementing a highly available and scalable solution that
combines Oracle9iAS, Oracle9i RAC, SSL accelerators, and hardware load balancers. This
solution was designed and implemented for Embry-Riddle Aeronautical University, the largest
flight-training school in the world. ERAU's J2EE application supports every aspect of the school's
flight training program, so scalability and 24/7 availability are critical.


Embry-Riddle Approaches the Future


Embry-Riddle Aeronautical University is the largest independent aeronautical university in the
world. The not-for-profit institution educates more than 24,000 students annually through thirty
degree programs. Its ROTC detachments train more Air Force pilots and commissioned officers
than any other institution except the Air Force Academy.


The flagship program at Embry-Riddle, however, is the education of commercial pilots. This
training is a demanding process that requires a comprehensive and continuously reviewed
program of advanced learning. Consequently, the university is launching a new curriculum to
revolutionize pilot training by bringing information technology to bear on the issues involved.


Embry-Riddle oversees all the usual elements of an academic program, such as student records,
human resources, financials and facilities. But when it comes to sending a student airborne, the
school must also track a host of factors that can change by the minute, such as weather, air
traffic, the condition of the plane or simulator, and the health and certifications of both the
student and instructor/pilot. For every sortie, the instructor, student and craft must meet specific
qualifications. Relying on manual methods, Embry-Riddle staff scramble to match craft, student
and instructor.


Compounding the challenges is the scale of Embry-Riddle’s operation, which includes campuses
in Daytona and Prescott, Arizona, 130 education centers and a distance-learning network. At the
Daytona campus alone, 80 instructor/pilots supervise 550 student flights daily using a fleet of 139
instructional aircraft.


Until recently, the tracking of training programs worldwide has largely been a paper-and-pencil
procedure. This practice is inexpensive and requires little training to maintain, but it is fraught
with opportunity for error that inevitably leads to inaccurate information being provided to the
training institution, trainers and students alike.


Embry-Riddle’s approach to dealing with these increasingly dynamic requirements was to
develop an automated, Internet-capable information management system for tracking
flight-training data for its students. This allowed Embry-Riddle to re-engineer the training of
commercial pilots to provide them with a better education at less cost. The new curriculum blends
all of the required skills into one seamless course.


This innovative approach to curriculum management is Embry-Riddle’s pioneering Education and
Training Administration (ETA) system, which applies “just-in-time” methods to orchestrate the
costly human and capital assets required for flight training. Embry-Riddle expects the ETA system
to enhance instructional quality while reducing student expenses and institutional overhead.


As mentioned earlier, until ETA, flight training management was largely organized through
pencil-and-paper operations. These operations are antiquated, time-consuming and inaccurate.
Difficulty in keeping data current and human error compound the problem. This method
was also very fragmented and lacked comprehensive communication across disciplines and
organizations.


The priority, then, was to develop a system that would keep current operations running
smoothly while improving the integrity, availability and accuracy of the data being managed.
This was essential if the solution was to be perceived as beyond criticism and doubt. To
accomplish this, the ETA system would need to exceed expectations for all service levels,
academic/classroom support, student services and daily campus support. It would need to be
available 24x7, anytime, anywhere, and the network, infrastructure, and applications would need
to operate flawlessly.


The answer to these demands was to develop a real-time information management system
for tracking flight-training data that is highly available, highly scalable and immediate, with fast
web access any time, anywhere. The system also needed to address usability through
user-friendly interfaces and portal-based individualization. The data must be continually updated
and current. The system must also be secure, with authentication and intrusion-proof data.



Education and Training Administration


Aviation Learning Management System


To meet the challenge of establishing such a system, Embry-Riddle Aeronautical University,
DataRoad, Inc. and Talon Systems collaborated to produce an Aviation Learning Management
System called ETA: Education and Training Administration.


This system is the most comprehensive flight training management program to date and serves as
an enterprise model for those using it. ETA is a flight training management tool: a 100%
Internet-based J2EE application accessed through a standard web browser. It is completely
electronic and supports concurrent operations at Daytona, Prescott, affiliate operations and now
the US Air Force Academy. The number of locations it supports continues to grow.



Integrating data from Embry-Riddle’s maintenance, HR, payroll, accounting and student record
systems, the ETA system provides students, instructor/pilots and managers with secure, Web-based
access to all of the information and tools they require to participate in the new curriculum.


The ETA system translates Embry-Riddle’s new course for commercial pilots into a continuum of
stages, lessons and units, each structured by line-item objectives. Students can extract
customized training plans that guide and track their progress through the curriculum. An
electronic grade sheet automatically posts any incomplete line items until their satisfactory
completion.


Whether a lesson takes place in a cockpit or a classroom, the ETA system identifies and
schedules all human and capital resources required to fulfill the session’s line-item objectives and
confirms the readiness of these resources.


Flagging any issues, the system automatically checks all relevant details, including the student’s
prerequisite courses, flight hours and registration, financial and health records; the instructor’s
pilot ratings and certifications for the prescribed craft and sortie; and the maintenance status of
the vehicle.


While translating documents into real-time data, the ETA system also streamlines the execution of
paper-laden processes, from FAA-mandated safety and security documentation to
Embry-Riddle’s own internally generated paperwork.


Far more than scheduling software, the ETA system is a repository of real-time information and
tools. These tools reinforce best practices in managing human and capital assets, enabling
not-for-profit Embry-Riddle to deploy its educational resources more efficiently.


Embry-Riddle worked with Talon Systems LLC to develop the entirely Web-based (J2EE) system.
Oracle partner DataRoad, Inc. designed and implemented the infrastructure for ETA at one of its
secure data centers in Jacksonville, Florida. This infrastructure uses Oracle 9i Application
Server (9iAS) and Oracle 9i RDBMS software, HP servers, and Alteon load balancers and SSL
accelerators.


Preparing for Growth


Because of the growth potential of the ETA system's user base, scalability and high availability
were essential. As more users come onto the system, it must scale up appropriately and be
available immediately, 24x7. Why? Embry-Riddle provides multinational flight training, day
and night, in multiple time zones.


For the ETA project, the system runs on a real-time, 24x7 platform that utilizes the Oracle 9i Real
Application Clusters (RAC) database, HP TruCluster Server software, and DataRoad’s technical
experience to provide a highly available, highly scalable solution that meets Embry-Riddle’s needs.
Unique to the solution is the single-system manageability of the software, which makes operating
multiple servers as simple and economical as managing one.


DataRoad’s end-to-end solution exploits all of these advantages to efficiently meet
Embry-Riddle’s requirements for high availability, security and data integrity. DataRoad provides a
dedicated platform for the ETA system that comprises servers, software, and networking.
DataRoad hosts and administers both the system and the application, which users access
through a secure VPN.




Definitions


Before discussing specific configurations, it is important to define the general architecture
terms that apply to the deployment of a highly available and scalable infrastructure.


Firewalls

Firewalls are devices that restrict access between different LAN segments for security purposes.
Firewalls perform this function by analyzing traffic and can make restrictions based on IP
address, port, protocol used, protocol transitions and message content. For example, Check
Point Firewall-1 products provide a software solution that includes a feature called "stateful
inspection" that can restrict access based on illegal Internet protocol transitions. Cisco's PIX is an
example of an integrated hardware-software firewall solution.


Some devices that are called firewalls are software-only products that are loaded onto client or
server machines. These may be useful but are inadequate as corporate firewalls, which should
always be deployed on machines separate from those running application or infrastructure
software. Firewalls are a main defense for sites providing Internet access. Different firewall
products vary considerably in features and performance. Appropriate use of firewalls can protect
against many common vulnerabilities by prohibiting Internet access to services such as FTP or
rsh (especially if such services were inadvertently left running on Internet servers).
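
The rule-based restriction described above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not the configuration used for ETA: the Rule type, the
decide function and the rule list are hypothetical names, the first matching rule wins, and traffic
that matches no rule is denied. It models simple filtering on source address, protocol and port;
it does not model stateful inspection.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical filter rule; not the syntax of any real firewall product."""
    action: str           # "allow" or "deny"
    source: str           # CIDR block the rule applies to
    protocol: str         # "tcp", "udp" or "any"
    port: Optional[int]   # destination port, or None for any port


def decide(rules, src_ip, protocol, port):
    """Return the action of the first matching rule, or "deny" if none match."""
    for rule in rules:
        if ip_address(src_ip) not in ip_network(rule.source):
            continue
        if rule.protocol not in ("any", protocol):
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.action
    return "deny"  # assumed default: anything not explicitly allowed is blocked


# Example policy: block FTP and rsh from the Internet, allow HTTPS.
RULES = [
    Rule("deny", "0.0.0.0/0", "tcp", 21),    # FTP control port
    Rule("deny", "0.0.0.0/0", "tcp", 514),   # rsh
    Rule("allow", "0.0.0.0/0", "tcp", 443),  # HTTPS
]
```

With the default-deny assumption, a service that was inadvertently left running on an Internet
server stays unreachable unless a rule explicitly allows it.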


Load Balancers

Load balancers have two essential functions. The first is to balance traffic across multiple
servers, resulting in better scalability. In high-traffic situations this can be very important. The
second is to provide fault tolerance for servers: the load balancer ensures that a single failing
server does not result in the loss of a critical resource. It accomplishes this by routing new
requests to alternate servers if one server fails. Load-balancing hardware is thus used both to
provide scalability, by spreading load across multiple processors, and to provide fault tolerance
in case of processor failures.


Load balancers are typically able to route traffic both in situations where the infrastructure keeps
application state and in situations where it does not. In the case of stateless communication, the
load balancer can route to any of its managed servers, since no particular server holds state that
is needed to process the message correctly. This is generally more efficient, since requests can
always go to the least busy server, but stateless operation often puts an unacceptable burden on
application writers. Many Oracle products require that the infrastructure maintain application
state.
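
As a minimal sketch of the stateless case, assuming a hypothetical StatelessBalancer class that
tracks in-flight requests per server, the Python below sends each new request to the least busy
healthy server and skips servers that have been marked as failed. It illustrates the idea only and
is not how any particular hardware load balancer is implemented.

```python
class StatelessBalancer:
    """Illustrative least-busy router for stateless requests (hypothetical API)."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # server -> in-flight requests
        self.healthy = set(servers)

    def mark_down(self, name):
        """Stop routing new requests to a failed server."""
        self.healthy.discard(name)

    def mark_up(self, name):
        """Resume routing to a server that has recovered."""
        if name in self.active:
            self.healthy.add(name)

    def route(self):
        """Pick the healthy server with the fewest in-flight requests."""
        candidates = [s for s in self.active if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        chosen = min(candidates, key=lambda s: self.active[s])
        self.active[chosen] += 1
        return chosen

    def finish(self, name):
        """Record that a request on the given server has completed."""
        self.active[name] -= 1
```

Because no server holds per-client state here, any healthy server is a valid target, which is what
makes the least-busy choice possible.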


For transactions where the infrastructure keeps state, load balancers switch incoming messages
to the server containing the state. Switching criteria are determined by analyzing cookies,
headers or other attributes. Sometimes only a single server contains the state. In that case, a
processor failure results in the failure of all transactions that have state in the failed processor,
and such transactions must be restarted. In other situations there are preferred processors, but all
processors can obtain the state. When failures occur in these situations, a redirect due to
failure results in successful processing, although there may be added overhead for transactions
that had state in failed processors.
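
The two stateful cases above can be contrasted in a short Python sketch. The StickyBalancer class,
its state_replicated flag and the use of a session identifier (as might be read from a cookie) are
assumptions made for illustration. When state lives on a single server, a failure forces the
transaction to restart; when any server can obtain the state, the session is simply redirected.

```python
class StickyBalancer:
    """Illustrative session-affinity router (hypothetical API)."""

    def __init__(self, servers, state_replicated):
        self.servers = list(servers)
        self.healthy = set(servers)
        self.state_replicated = state_replicated  # can other servers obtain the state?
        self.pinned = {}                          # session id -> preferred server

    def mark_down(self, server):
        self.healthy.discard(server)

    def route(self, session_id):
        """Return the server that should handle a request for this session."""
        server = self.pinned.get(session_id)
        if server in self.healthy:
            return server  # normal case: stay on the server holding the state
        if server is not None and not self.state_replicated:
            # State existed only on the failed server: the transaction is lost.
            del self.pinned[session_id]
            raise LookupError("session state lost; transaction must be restarted")
        # New session, or a failed server whose state another server can obtain.
        server = self._least_pinned()
        self.pinned[session_id] = server
        return server

    def _least_pinned(self):
        live = [s for s in self.servers if s in self.healthy]
        if not live:
            raise RuntimeError("no healthy servers available")
        return min(live, key=lambda s: sum(1 for v in self.pinned.values() if v == s))
```

The redirect in the replicated case succeeds but is not free: the new server still has to fetch
the session state, which corresponds to the added overhead mentioned above.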


SSL Accelerators

In many sites, SSL key exchange operations can dominate CPU usage. For such sites, HTTPS
accelerator appliances can result in significant cost reductions and improved performance.
Expanding HTTPS use improves security. Where HTTPS use is limited by performance
considerations, HTTPS accelerators should be considered. The term "sticky" or "persistent"
transaction is often used to denote transactions that should be routed to particular
load-balancer-managed hardware containing intermediate application transaction state.




There are different types of SSL accelerators. One type is essentially a math coprocessor
that offloads expensive cryptographic operations from general-purpose CPUs. A second type is
a stand-alone device that takes incoming HTTPS traffic and converts it to HTTP. Since the SSL
processing of the HTTPS protocol can consume a large percentage, or even most, of a CPU's
time, offloading SSL processing may result in a significant reduction in the number of CPUs
required to support a workload. Such a reduction can result in both cost savings and improved
scalability.


A current problem with HTTPS-to-HTTP appliances occurs when client-side X.509 certificates are
used. These appliances terminate the SSL session, and there is no standard way to provide the
client-side X.509 certificate information with the forwarded message. If client-side certificates
are only used to allow or deny access to a site or virtual host, this may be acceptable. However,
if the application or other infrastructure items need certificate information, custom solutions are
currently required. Since client-side certificates are infrequently used at this time, this
consideration is not important for most sites. Customers interested in using X.509 client-side
certificates with such devices should contact Oracle or appliance providers, as progress toward
standard, supported solutions is being made.


Clustering

Clustering, while complex in practice, is fairly simple in definition. Clustering is the grouping
of hardware and software into nodes that work together as a single system to ensure
that an application remains online for users during excessive loads or if one of the nodes fails.



Clustering enables you to construct a multi-node system that makes several independent servers
appear as one. Multiple servers are connected to form a single integrated system. If any part of
the system goes down, either intentionally or unintentionally, failover masks the failure from the
end users, thereby making the system more available. The down member of the cluster is then
reactivated, if possible, through a restart. This reduces the need for administrator intervention.
The system can also be scaled more effectively to support more users through load balancing.
Advanced tools for managing the cluster also assist in monitoring the activities of the system and
alerting administrators to potential issues.


Availability

Delivering high availability requires a variety of approaches that work hand in hand to provide
a highly available service to the end user.


As mentioned earlier, in a clustered environment multiple servers act in concert to present a
single system. For a member of the cluster to take the place of another that is experiencing
trouble, the state of requests must be shared across all members of the cluster. When a new
cluster member takes over for a failing member, the shared request state makes the transition
smoother.


In the event of a failure, transparent failover enables a member of a clustered system to take the
place of another member without the end user being aware that a change is taking place. This
gives users a sense of continuity. The downtime of an individual cluster member does not affect
operation on the user side at all.


Once failover has been executed and the system is stable again, which happens rapidly,
automated restarts take place. The down member is identified and restarted automatically. If
it cannot be restarted, an error is generated and administrators are notified of the situation for
further attention. This process reduces the need for direct administrator intervention,
thereby minimizing downtime and increasing availability.
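
The failover and restart sequence described above can be sketched as follows. This is a simplified
illustration under stated assumptions: Member, Cluster and the notify callback are hypothetical
names, request state is assumed to be shared already, and a restart either succeeds or fails
immediately. It is not the mechanism used by any specific clustering product.

```python
class Member:
    """Illustrative cluster member; 'restartable' simulates whether a restart works."""

    def __init__(self, name, restartable=True):
        self.name = name
        self.up = True
        self.restartable = restartable

    def restart(self):
        """Attempt a restart and report whether the member is back up."""
        self.up = self.restartable
        return self.up


class Cluster:
    def __init__(self, members, notify):
        self.members = list(members)
        self.notify = notify  # callback used to alert administrators

    def handle(self, request):
        """Transparent failover: the first member that is up serves the request."""
        for member in self.members:
            if member.up:
                return f"{member.name} served {request}"
        raise RuntimeError("all cluster members are down")

    def recover(self):
        """Restart down members automatically; escalate only if a restart fails."""
        for member in self.members:
            if not member.up and not member.restart():
                self.notify(f"{member.name} could not be restarted")
```

For example, with alerts collected in a list via `Cluster([...], alerts.append)`, the list stays
empty as long as every down member comes back on its own, so administrators are involved only
when automation has run out of options.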


In the event that the system has a serious failure that requires significant downtime, the cluster
can gracefully degrade the service provided to the end user. This provides a limited level of
service rather than a total failure. Single points of failure are also reduced or eliminated, thereby
limiting the risk of significant failure and unnecessary downtime.


High availability is also improved through the use of load balancers. Load balancing is necessary
because servers servicing one application can quickly be overwhelmed and crash if the workload
is not divided among them. Load balancing divides work between two or more computers, so that
the work gets done in the same amount of time without any one computer being overloaded.
Cluster resources are dynamically rebalanced for optimal cluster utilization.


Scalability

Scalability is also essential to maintaining acceptable levels of service while keeping costs under
control. System growth must be progressive and easily expandable to meet increasing demand
from the user base. A clustered environment provides the most appropriate solution. Nodes
added to the cluster are automatically utilized; no manual reallocation of resources is needed.


This enables low-cost incremental scaling, allowing DataRoad to reduce Embry-Riddle's hosting
expenses by using only the server power needed at any time. Due to the flexible nature of
the environment, more equipment can be brought online at a moment’s notice to address any
scaling requirements the system may demand. This provides ERAU with a “scale as you grow”
option that minimizes the initial capital outlay for equipment, thereby significantly reducing
hosting fees. Hosting fees can then be based on real rather than merely anticipated growth.


In a rapidly changing environment, opportunities for growth appear and disappear rapidly. It is
difficult to accurately predict the demand for a database or application server two years out, yet
having too little computing horsepower at any given time is unacceptable. Even if growth is
initially underestimated, the scalability of the system allows for cost-effective sizing of the
infrastructure. Real Application Clusters provide scalability on demand, so it is no longer
necessary to predict scalability needs in advance.


Application Server (9iAS)

This paper focuses on the ‘core’ components of Oracle9iAS Release 2. Hence, a reference to
Oracle9iAS in this paper is, in general, a reference to the Oracle9iAS Release 2 J2EE and Web
Cache install.


The components that fall in the core category are:




- Web Cache: This is typically the first component of Oracle9iAS to receive a request. For both
  static and dynamic requests, it can cache the result and then replay it, thus reducing the
  workload of the machines behind it. In addition, Web Cache instances can themselves be
  clustered.

- Oracle HTTP Server (OHS): This is the next in line after Web Cache to receive a request. This
  subsystem comprises a web server (based on Apache), a Perl execution environment, and a
  PL/SQL and OC4J routing system.

- Oracle Containers for J2EE (OC4J): This is the J2EE-compliant container in Oracle9iAS. It
  provides clustering capabilities for the J2EE components: Servlets, JSP, and EJB. It also
  contains other mechanisms, such as Java Object Cache, which provides distributed caching
  capabilities.
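
The "cache the result and then replay it" behavior attributed to Web Cache above can be
illustrated with a generic response cache. The sketch below is not Oracle Web Cache's
implementation or API: ResponseCache, the origin callable and the time-to-live expiry policy are
all assumptions chosen to show how a front-end cache reduces the number of requests that reach
the servers behind it.

```python
import time


class ResponseCache:
    """Generic front-end cache sketch (hypothetical; not the Oracle Web Cache API)."""

    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self.origin = origin      # callable that produces a response for a URL
        self.ttl = ttl_seconds    # assumed expiry policy: fixed time-to-live
        self.clock = clock
        self.entries = {}         # url -> (expiry time, response)
        self.origin_calls = 0     # how many requests reached the back end

    def get(self, url):
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]       # replay the cached result
        self.origin_calls += 1
        response = self.origin(url)
        self.entries[url] = (now + self.ttl, response)
        return response
```

Repeated requests for the same URL within the time-to-live are answered from the cache, so only
the first request and requests after expiry add load to the HTTP server and J2EE container.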


Real Application Clusters (9iRAC)

Real Application Clusters is an option for an Oracle9i database. Oracle9i Real Application
Clusters provides both scalability and availability as a single, easy-to-manage database product.
With Oracle9i Real Application Clusters, your enterprise database delivers scale-out economics
with the ease of use and power of a scale-up approach. To any database application, a Real
Application Clusters database looks just like an Oracle9i database on a single server. Real
Application Clusters supports all types of applications, from update-intensive online transaction
processing to read-intensive data warehousing.

An Oracle9i Real Application Clusters database not only appears like a standard Oracle9i
database to users, but the same maintenance tools and practices used for a single Oracle9i
database can be used on the entire cluster. All of the standard backup and recovery operations,
including the use of Recovery Manager, work transparently with Real Application Clusters. All
SQL operations, including data definition language and integrity constraints, are also identical
for both configurations.


Real Application Clusters provides rapid, automatic failover for users if their servers go down.
This automatic failover capability can avoid the need for a complex series of operations to
restore access to a database, operations which, if not performed promptly or correctly, can
increase the duration of downtime or even jeopardize the integrity of your data.



The Solution