Washington Mutual WebSphere Health Check on z/OS



1. z/OS Infrastructure


WLM considerations:

Setting the WLM goals properly can have a very significant effect on application throughput. The WebSphere for z/OS system address spaces should be given a fairly high priority. As work comes into the system, the work classification of the enclaves should be based on your business goals.


Use STC classification rules to classify work for daemons, controller and servant regions.


Classify location service daemons as SYSSTC or high velocity.


Classify controller regions as high velocity. These regions do some processing to receive work into the system, manage the HTTP transport handler, classify the work, and do other housekeeping tasks.

Classify servant regions with reasonably high velocity goals.


Java garbage collection (GC) runs under this classification. Java GC is a CPU- and storage-intensive process, so if you set the velocity goal too high, GC could consume more of the system resources than desired. On the other hand, if your Java heap is correctly tuned, GC for each server region should run no more than 5% of the time. Also, providing proper priority to GC processing is necessary since other work in the server region is stopped during much of the time GC is running.

JSP compiles also run under this classification. If your system is configured to do these compiles at runtime, setting the velocity goal too low could result in longer delays waiting for JSP compiles to complete.
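As a rough illustration only (the jobname masks, service class names, and layout below are assumptions, not values taken from this environment), STC classification rules along these lines would place the daemon in SYSSTC and the controller and servant regions in velocity-goal service classes:

Subsystem Type: STC
  Qualifier type  Qualifier name   Service class   Report class
  TN              WSDMN*           SYSSTC          RWSDMN     (location service daemon)
  TN              WSCTL*           WSCNTL          RWSCTL     (controller regions)
  TN              WSSRV*           WSSRVT          RWSSRV     (servant regions)

Here WSCNTL and WSSRVT stand for service classes with high and reasonably high single-period velocity goals; substitute the jobname masks actually used by your started task procedures.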


Application Environment for work running under servants. This is the actual execution of the application. This work is classified under the CB classification as described below:




Application environment work is classified under the CB subsystem type.




Classification is based upon server name, server instance name, user ID, or transaction class.



A percentile response time goal is recommended. These goals should be achievable. Keep in mind that even administration console work is classified under CB. A goal that 80% of the work will complete in 0.25 seconds is typical.


Velocity goals for application work are not meaningful and should be avoided.




A reasonable default service class should be defined for CB; the default is SYSOTHER, which is a discretionary goal and on a busy system will provide very poor service.
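Purely as a sketch (the collection name, transaction class, and service class names below are placeholders, not values from this configuration), CB classification rules with a percentile response time goal might look like this:

Subsystem Type: CB
  Qualifier type  Qualifier name   Service class   Report class
  CN              WSPROD*          WASTRAN         RWASTRAN
  TC              BATCHTC          WASBATCH        RWASBAT

Here WASTRAN would carry a single-period goal such as "80% complete within 0.25 seconds", while a separate class (WASBATCH in this sketch) filters distinctly long-running work. A default service class other than SYSOTHER should also be assigned for CB work that matches no rule.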


Your goals can be multi-period. This might be useful if you have distinctly short- and long-running transactions in the same service class. On the other hand, it is usually better to filter this work into a different service class if you can. Being in a different service class will place the work in a different servant, which allows WLM much more latitude in managing the goals. The goals that are defined should be attainable, not optimistic.



Current Finding:



The first two steps of the controller region start-up process invoke the BPXBATCH shell script and do not inherit the service classification of the started task (according to STC rules). This can greatly elongate the startup of a controller region, because those steps will be classified under the OMVS rules and, if not handled appropriately, on a busy system this will cause a delay of multiple minutes before BBOCTL gets control.

The purpose of the additional step is to check the service level delivered, applied, and pending for WebSphere, and log the results.


Classify server controller jobnames with WLM OMVS classification rules. Here is the URL for the OMVS classification techdoc:

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD102730
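As a hedged sketch only (the jobname mask and service class name are placeholders), an OMVS classification rule that keeps the BPXBATCH start-up steps of the controllers out of the default OMVS service class could look like:

Subsystem Type: OMVS
  Qualifier type  Qualifier name   Service class   Report class
  TN              WSCTL*           WSOMVS          RWSOMVS

Here WSOMVS would be a service class with a reasonably high velocity goal, so controller start-up work is not left to the OMVS default on a busy system. The techdoc referenced above describes the supported qualifiers in detail.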



WebSphere HFS considerations:

Non-shared HFS datasets are implemented, which is preferred over shared HFS for WebSphere V5 environments.

You may want to consider using zFS. z/OS has introduced a new file system called zFS which should provide improved file system access. You may benefit from using zFS for your UNIX file system.

For further information, you may review the section “UNIX System Services (USS) tuning tips for z/OS” of the manual WebSphere Application Server for z/OS V5.1 Performance Monitoring and Tuning (SA22-7963).

SMF considerations:

Ensure that unnecessary SMF records are NOT being recorded in the production regions. Enable activity records for diagnostic purposes only.



To enable SMF type 120 records, click New, and specify one or more of the following properties:

server_SMF_server_activity_enabled = 1 (or true)

server_SMF_server_interval_enabled = 1 (or true)

server_SMF_container_activity_enabled = 1 (or true)

server_SMF_container_interval_enabled = 1 (or true)

server_SMF_interval_length, value=n, where n is the interval, in seconds, that the system will use to write records for a server instance. Set this value to 0 to use the default SMF recording interval.


Current Finding:



LE considerations:

Ensure that you are NOT using the following options in production:

RPTSTG(ON)
RPTOPTS(ON)
HEAPCHK(ON)




Consider using HEAPP(ON) to get the default LE heap pools. LE will be providing additional pools (more than 6) and cell sizes larger than the current 2048-byte maximum in future releases of z/OS. You may be able to take advantage of these increased pools and cell sizes if you have that service on your system.
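One way to make these settings explicit is to pass LE runtime options to the server regions through a CEEOPTS DD statement in the started task procedures. The statement below is only a sketch, assuming your procedures permit a CEEOPTS DD; the option values themselves should come from a storage-report analysis of your own workload:

//CEEOPTS  DD *
  RPTSTG(OFF),RPTOPTS(OFF),HEAPCHK(OFF),HEAPP(ON)
/*

RPTSTG(ON) can be enabled temporarily in a test region to collect the storage report used to size HEAP and HEAPPOOLS, and then turned back off for production.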

Chapter 3 of the WebSphere Application Server for z/OS V5.1 Performance Monitoring and Tuning manual provides steps to fine-tune the LE heap.


Current Finding:



For best performance, ensure that LE and C++ modules are loaded into LPA. The SCEELPA dataset can be added to LPA, as well as the following sample PROGxx members in SCEESAMP:

CEEWLPA (LE Runtime)
EDCWLPA (C++ Runtime)
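A hedged sketch of how this is typically done (the CEE high-level qualifier and the module name shown are assumptions for illustration): place the LPA-eligible LE dataset in the LPA list and include copies of the SCEESAMP members in PROGxx for the remaining modules.

LPALSTxx:
  CEE.SCEELPA

PROGxx (modeled on the CEEWLPA and EDCWLPA samples in SCEESAMP):
  LPA ADD MODNAME(CEEEV003) DSNAME(CEE.SCEERUN)

The single LPA ADD statement above is a placeholder; the actual module lists to use are the ones shipped in the CEEWLPA and EDCWLPA sample members.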


Current Finding:



2. WebSphere Infr
astructure


JVM considerations:

The Java heap parameters also influence the behavior of garbage collection. Increasing the heap size supports more object creation. Because a large heap takes longer to fill, the application runs longer before a garbage collection occurs. However, with a larger heap, garbage collection takes longer. Initial and maximum Java heap sizes can be specified using the -Xms and -Xmx options. The general recommendation is that these values should be the same.
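For illustration only (the 512 MB figure is an assumption, not a sizing recommendation for this workload), the options are set together in the servant's generic JVM arguments, with verbose GC added while tuning:

-Xms512m -Xmx512m -verbose:gc

The -verbose:gc option can be removed, or verbose GC disabled in the console, once the heap size has been settled.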


Note that Java heap information is contained in SMF records and can be viewed dynamically using the console command:

MODIFY <controller region>,DISPLAY,JVMHEAP


Starting with the default value for initial and maximum heap size, capture verbose:gc output (or equivalent performance monitor data) running your standard workload. Examine the verbose:gc output for garbage collections which occurred after the workload reaches a steady state. Compare the following statistics:




Numb
er of garbage collection calls



Average duration of a si
ngle garbage collection call.



Ratio between the length of a single garbage collection call and the avera
ge time
between calls.


This can be viewed as the percentage of time spent doing GC processing.



System paging activity (from RMF or another system mon
itor).
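As a simple worked example of the ratio statistic: if collections average 200 ms and the average time between collections is 5 seconds, the time spent in GC is roughly 0.2 / (0.2 + 5.0), or about 4%, which is within the 5% guideline mentioned earlier. A ratio of 0.5 / (0.5 + 2.0), or 20%, would indicate the heap size or allocation rate needs attention.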


Current Finding:



If the heap free space settles at 85% or more and the percentage of time in GC processing is low, consider decreasing the initial and maximum heap size values, because the application server and the application are under-utilizing the memory allocated for the heap. If system real storage is constrained, overall performance may be improved by reducing the Java heap size, even if the percentage of time in GC is higher than desired.


It is good for the initial heap size to equal the maximum heap size because it allows the allocated storage to be completely filled before GC kicks in. Otherwise, GC will run more frequently than necessary, potentially impacting performance. Make sure the region is large enough to hold the specified JVM heap.

Beware of making the initial heap size too large. While it initially improves performance by delaying garbage collection, it ultimately affects response time when garbage collection eventually kicks in (because it runs for a longer time).

Paging activity on your system must also be considered when you set your JVM heap size. If your system is already paging heavily, increasing the JVM heap size might make performance worse rather than better.

To determine if you are being affected by garbage collection, you can enable Verbose Garbage Collection on the JVM Configuration tab. The default is not enabled. This will write a report to the output stream each time the garbage collector runs. This report should give you an idea of what is going on with Java GC.


Here is the URL for the GC techdoc:

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101216


Ensure that the JITC is enabled, especially in the production regions. For example:

J2RE 1.4.2 IBM z/OS Persistent Reusable VM build cm142-20050929 (SR3) (JIT enabled: jitc)
Java(TM) 2 Runtime Environment, Standard Edition


Keep up with the most current version of the JVM; almost every release improves performance. The informational APAR II13519 can be used to identify the available levels of the JVM.


Current Finding:



Currency of WebSphere is also important. If you let your environment age significantly, the effort to upgrade WebSphere to fix a problem you're encountering can become sizable. For example, if you need to apply a PTF that became available last month and your current level is a year old, you will need to apply 11 months' worth of maintenance.


It might be desirable to use more controller regions, thus reducing the number of servants under a controller. This would make restarts a faster process and also lessen the impact if a controller region is lost.


There are numerous timeouts and settings. It is assumed that the client has various timers/timeouts that should be reviewed to ensure that they are in sync with the WebSphere timeouts, such as those listed below:

Security cache (usually the defaults for size and timeout values work OK)

LTPA

HTTP

KeepAlive


The WebSphere Application Server class loading function uses asynchronous scope alarms instead of Java timers. If you put the alarm manager into quiet mode, class reload only runs when the server completes the processing of a request. If the server is not processing any requests, the class reload function does not run. To enable this, set the following variable in the servant JVM Custom Properties:

com.ibm.ejs.am.mode.workbased=true (currently not coded)



Other considerations:


Application Connections




DB2


JDBC is using the DB2 Universal Driver configured as Type 2 for connections to the local DB2 data sharing members. This is the preferred configuration for JDBC connections, as the Universal Type 2 driver provides two-phase commit processing via RRS as well as the best performance.


It might be good to review whether to use JDBC or SQLJ. The Redbook DB2 for z/OS and OS/390: Ready for Java (SG24-6435) may be helpful.


Ensure DB2 tracing is turned off. To do so, check whether the db2.jcc.propertiesFile JVM property has been defined to specify a DB2 JCC properties file to WebSphere Application Server for z/OS; if so, ensure that the following trace statements in that file are commented out if they are specified:

db2.jcc.override.traceFile=<file name>
db2.jcc.t2zosTraceFile=<file name>


Current Finding:



You will have to define more connections, called threads, in DB2. WebSphere for z/OS uses a lot of threads. Sometimes this is the source of throughput bottlenecks, since the server will wait at the create thread until one is available.

The connection pool statistics should be monitored as the workload increases.
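The DB2 side of this is controlled by the thread-related DSNZPARM values on the DSN6SYSP macro. Purely as a hedged sketch (the keywords are standard, but the values shown are placeholders rather than recommendations for this workload):

CTHREAD=400   (local allied threads, which include JDBC Type 2 / RRSAF connections)
MAXDBAT=200   (active remote DBATs)
CONDBAT=500   (maximum remote connections)

CTHREAD in particular needs to be large enough to cover the sum of the WebSphere connection pool maximums plus other local attach work, or connection requests will queue.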


Current Finding:



Make sure you are current with JDBC maintenance. Many performance improvements have been made to JDBC. To determine the JDBC maintenance level, enter the following from OMVS:

java com.ibm.db2.jcc.DB2Jcc -version


It is recommended that you enable dynamic statement caching in DB2. To do this, modify your ZPARMs to specify CACHEDYN(YES) MAXKEEPD(16K). Depending on the application, this can make a very significant improvement in DB2 performance. Specifically, it can help JDBC and LDAP queries.


Typical tuning of the DB2 environment should also be performed.




MQ Series


MQ connections are being made using BINDINGS mode and shared queues to CICS systems within the Parallel Sysplex.


Ensure MQ tracing is turned off. To do so, modify the following MQ ZPARMS parameter as follows:

TRACSTR=NO



Similarly, it is recommended to turn off tracing in the channel initiator. Unlike base MQ tracing, this parameter cannot be changed dynamically; to turn tracing back on for debug purposes, you will need to reassemble your MQ XPARMS. Enabling channel initiator tracing can degrade your system by 5-10%. The following parameter controls this tracing:

TRAXSTR=NO (START TRACE AUTOMATICALLY YES|NO)



Ensure that your logger configuration is optimal by using SMF 88 records. See the tuning section of z/OS MVS Setting Up a Sysplex or the chapter on System Logger accounting in z/OS MVS System Management Facilities (SMF) for details. In any case, you should monitor the logger to ensure that there is sufficient size in the CF and that offloading is not impacting the overall throughput. The transaction logs are one of the only shared I/O-intensive resources in the mainline and can affect throughput dramatically if they are mistuned.


Typical tuning of the MQ environment should also be performed.


Performance data should be collected and sent per the email I sent you for analysis.



Resource Recovery Services (RRS)

If you have no need for the archive log, you may consider not defining it, as it will continue to grow unless it is redefined.
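If the archive log stream is to be removed, that is normally done with the IXCMIAPU utility against the LOGR policy. The job below is only a sketch: the log stream name follows the usual ATR.<logging-group>.ARCHIVE pattern, but PLEX1 is a placeholder, and the actual name (and whether removal is appropriate at all) must be confirmed for this sysplex first.

//DELLOGR  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(YES)
  DELETE LOGSTREAM NAME(ATR.PLEX1.ARCHIVE)
/*

RRS must not be connected to the log stream when the delete is issued, or it will fail.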




Application Reload Interval

Disable class reloading (under Applications -> Enterprise Applications -> <application name>, clear the "Reload Enabled" check box or set the "Reload Interval" to zero).

Or, if reload is needed, set the "Reload Interval" to a high value.






Distributing HTTP Requests over multiple Servants


WebSphere uses a "hot server" strategy to route HTTP requests. By this, we mean that WebSphere routes to servant regions which have recently dispatched work and have threads available (i.e., "hot servers"). These "hot servers" have pages in memory and application methods and caches full of data. HTTP requests with session affinity are routed to the servant region where the session object(s) reside.

However, this can cause imbalances in some situations: "hot" servant regions can get over-loaded with work, and GC or the loss of a servant region can impact many sessions.


You can enable WebSphere to distribute HTTP requests evenly across servant regions. To do so, specify WLMStatefulSession=1 for the desired server(s):

Click Servers > Application servers

Under Server Infrastructure, click Administration > Administration Services

Under Additional Properties, click Custom Properties


Optimize the minimum and maximum number of servant regions.



You should review whether to eliminate any transaction class mappings being performed. Minimize the number of different service classes for these servers.



Tracing


Ensure that component tracing is disabled for WebSphere. To determine if tracing is disabled, the following MVS command can be entered:

D TRACE,COMP=ALL

To change a component to its minimum tracing level, issue:

TRACE CT,OFF,COMP=<component identifier>


Current Finding:




Ensure JRAS tracing is disabled by setting, in the admin console:

Troubleshooting → Logs and trace → server_name → Diagnostic trace

Set the following trace spec: *=all=disabled


Current Finding:



Monitoring


RMF reports should be run regularly to track the growth of the application during the rollout. This should provide the necessary information to properly create realistic projections for the capacity requirements of the application for future rollouts to the indicated maximum of 27500 terminals.


The RMF report requests should include CF Activity Reports to monitor the CF utilization and the sizing of the structures, especially those for RRS, DB2, and MQ.


It has been recommended that Luis Tosta De Sa should be engaged to assist with this effort.


Application Issues



Application Logging


It appears that the application is generating large amounts of log information directed to SYSPRINT. The application should limit the amount of logging in the production environment to the absolute minimum. Logging to an HFS is also not recommended; in this situation an unshared zFS is recommended.



Pass By Reference


Pass by reference specifies how the ORB passes parameters. If enabled, the ORB passes parameters by reference instead of by value, which avoids making an object copy. If you do not enable pass by reference, the parameters are copied to the stack before every remote method call is made, which can be expensive.


You can use this option only when the Enterprise JavaBeans (EJB) client and the EJB are on the same classloader. This requirement means that the EJB client and the EJB must be deployed in the same EAR file.


If the EJB client and the EJB server are installed in the same WebSphere Application Server instance, and the client and server use remote interfaces, enabling pass by reference can improve performance by up to 50%. Pass by reference helps performance only where non-primitive object types are passed as parameters; ints and floats are always copied, regardless of the call model.


Enable this property with caution, because unexpected behavior can occur: if an object reference is modified by the remote method, the change may be visible to the caller. If this variable isn't coded, pass by value is used. To enable it, code the following variable under:


Servers > Application Servers > serverName > ORB Services

com.ibm.CORBA.iiop.noLocalCopies=true


Current Finding:




System Authorization Facility


Disable SAF calls for successful HFS accesses via the BPX.SAFFASTPATH FACILITY class profile if audits of successful HFS accesses are not needed and you are not using the SAF callable services router installation exit IRRSXT00.
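A hedged RACF sketch of how this is commonly enabled (the profile name is the documented one; the commands assume RACF and that no IRRSXT00 exit is in use):

RDEFINE FACILITY BPX.SAFFASTPATH UACC(NONE)
SETROPTS RACLIST(FACILITY) REFRESH

The presence of the profile is what enables the fast path; no user needs to be permitted to it. z/OS UNIX checks for it at initialization, so an OMVS restart or IPL may be required before it takes effect.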





Security Domain Identifier


If multiple WebSphere cells are going to be configured utilizing a common security database, you may want to consider utilizing a security domain identifier. A security domain identifier will provide unique EJBROLE checks that are performed by the administration console to control what administration functions the user can perform. If no identifier has been defined, a security administrator in one WebSphere cell has the same access to all cells that are not using an identifier.


For example, if a security domain identifier has been defined as PROD, then all EJBROLE accesses will be checked against PROD.<ejbrole>. If the request would normally access the administrator role as defined by the administration console application's deployment descriptor, WebSphere will change the EJBROLE from administrator to PROD.administrator, thus providing a unique EJBROLE name for the cell without any application changes.


Access to applications that utilize EJBROLEs is also controlled in the same way as the administration console.


The same security domain identifier is also used as the APPL name if the APPL class is active on the system. In this example, a check would be made against PROD for all accesses to this cell.
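Continuing the PROD example above as a hedged RACF sketch (the group name WSADMGRP is a placeholder), the prefixed role and APPL profiles would be defined along these lines:

RDEFINE EJBROLE PROD.administrator UACC(NONE)
PERMIT PROD.administrator CLASS(EJBROLE) ID(WSADMGRP) ACCESS(READ)
RDEFINE APPL PROD UACC(NONE)
PERMIT PROD CLASS(APPL) ID(WSADMGRP) ACCESS(READ)
SETROPTS RACLIST(EJBROLE APPL) REFRESH

Every EJBROLE referenced by the applications in the cell would be defined with the same PROD. prefix.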





3. Network Infrastructure


TCP/IP considerations:

First, ensure that you have defined enough sockets to your system and that the default socket time-out of 180 seconds is not too high. To allow enough sockets, update the BPXPRMxx parmlib member:

Set MAXSOCKETS for the AF_INET filesystem high enough:

NETWORK DOMAINNAME(AF_INET) DOMAINNUMBER(2) MAXSOCKETS(30000)


Current Finding:



Set MAXFILEPROC high enough:

MAXFILEPROC(nnnnn)


Current Finding:



The general rule of thumb is to set MAXSOCKETS and MAXFILEPROC to at least 5000 for low-throughput, 10000 for medium-throughput, and 35000 for high-throughput WebSphere transaction environments. MAXSOCKETS should at least be equal to MAXFILEPROC, since a socket open in UNIX System Services is considered an open file. Setting high values for these parameters should not cause excessive use of resources unless the sockets or files are actually allocated.



The maximum value for MAXSOCKETS is 16777215 and for MAXFILEPROC is 131071.


Check the TCPIP profile dataset to ensure that NODELAYACKS is specified. Changing this could improve throughput by as much as 50% (this is particularly useful when dealing with trivial workloads). This setting is important for good performance when running SSL. NODELAYACKS can be specified on the TCPCONFIG, PORT, PORTRANGE, BEGINROUTES, and GATEWAY statements. DELAYACKS is the default.
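As a sketch of the two most common placements (the port number and jobname below are placeholders, not taken from this profile), the option can be set globally or per port in the TCPIP profile:

TCPCONFIG NODELAYACKS
PORT
  9443 TCP WSCTL01 NODELAYACKS   ; e.g. an HTTPS transport port

Setting it on the specific SSL ports is often preferred, since that is where delayed acknowledgements hurt most.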



Current Finding:



You should ensure that your DNS configuration is optimized so that lookups for frequently-used servers and clients are being cached.


Caching is sometimes related to the name server's Time To Live (TTL) value. On the one hand, setting the TTL high will ensure good cache hits. However, setting it high also means that, if the Daemon goes down, it will take a while for everyone in the network to be aware of it.


A good way to verify that your DNS configuration is optimized is to issue the oping and nslookup USS commands. Make sure they respond in a reasonable amount of time. Often a DNS or DNS server name problem will cause delays of 10 seconds or more.


Increase the size of the TCPIP send and receive buffers from the default of 16K to at least 64K. The size of the buffers includes control information beyond what is present in the data that you are sending in your application. To do this, specify the following:

TCPCONFIG TCPSENDBFRSIZE 65535
          TCPRCVBUFRSIZE 65535


Note: It would not be unreasonable, in some cases, to specify 256K buffers.

Current Finding:




Increase the default listen backlog. This is used to buffer spikes in new connections which come with a protocol like HTTP. The default listen backlog is 10 requests. We recommend that you increase this value to something larger. For example:

protocol_http_backlog=100
protocol_https_backlog=100
protocol_iiop_backlog=100
protocol_ssl_backlog=100


Current Finding:





In the most demanding benchmarks you may find that even defining 65K sockets and file descriptors does not give you enough 'free' sockets to run at 100%. When a socket is closed abnormally (for example, it is no longer needed), it is not made available immediately. Instead it is placed into a state called finwait2 (this is what shows up in the netstat -s command). It waits there for a period of time before it is made available in the free pool. The default for this is 600 seconds.

Note: Unless you have trouble using up sockets, we recommend that you leave this set to the default value.

If you are using z/OS V1.2 or above, you can control the amount of time the socket stays in the finwait2 state by specifying the following on TCPCONFIG. When this timer expires, it is reset to 75 seconds, and when it expires a second time, the connection is dropped. The value can be from 60-3600.

FINWAIT2TIME 60


Current Finding:




Additional information can be found in the V6.0 Information Center and the Performance and Tuning manual available at the following URL:

http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp