IBM Tivoli JVM Monitoring – Best Practices

30 Jul 2012

Steve Klopfer

Technical Specialist, IBM

scklopf@us.ibm.com


Definitions

Monitoring: Observing performance data in real time to find and correct resource, throughput, or response time problems.

Trending: The analysis of data with the intention of identifying discernible patterns.

Forecasting: The projection of those identified patterns on business growth patterns to understand the impact on business processes.

Capacity planning: The response to forecasts that ensures the integrity of business processes.
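As a rough illustration of how trending, forecasting, and capacity planning connect, the sketch below fits a straight line to weekly peak throughput and projects when that line would reach a capacity ceiling. The function names, the sample numbers, and the assumption of linear growth are all illustrative; none of them come from the slides.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Assumes at least two samples.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(limit, slope, intercept):
    """Forecast: period index at which the trend line reaches `limit`.

    Returns None when the trend is flat or falling.
    """
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical weekly peak throughput (requests/second).
weekly_peak = [100, 110, 120, 130]
slope, intercept = fit_trend(weekly_peak)
# Capacity planning input: the week in which a 200 req/s ceiling would be hit.
week_at_limit = periods_until(200, slope, intercept)
```

Real workloads rarely grow in a straight line, so a projection like this is best treated as a starting point for the capacity discussion rather than a prediction.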

IBM Software Group | Tivoli software

Capacity/Load model

Typical WAS/J2EE Application Components

[Figure: component interactions. Labels: Customer Transactions; HTTP Server plugin; CPU (AIX, Solaris, Windows); Production JVM (AIX, AS400, HP-UX, Linux, Solaris, Unix, Windows, OS/390, z/OS); Application Server; J2EE Application; J2EE components (Servlet, EJB); J2EE Services; Thread Pool; EJB Pools; JDBC Pools; Memory Management; File and Network I/O; Back-end connectors (CICS Transaction Gateway, MQSeries Connector, JDBC Driver); Back-end systems (Mainframe, Database)]

© 2007 IBM Corporation

ITCAM for WebSphere and ITCAM for J2EE Version 6.1

What kinds of Problems does JVM Monitoring Help Solve?

Request / Transaction problems
  Slow or hung requests
  Intermittent performance problems
  Correlation to remote EJB containers, CICS, IMS, MQ
Real-time diagnostics
  In-flight request search and diagnose capability with Java stack trace and thread dumps in real time
Memory leaks
  Monitor JVM heap size, memory usage and garbage collection patterns
  Heap snapshots
Resource monitoring
  Connection pools, JDBC, thread pool, etc.
Non-intrusive diagnostic data collection for key application components
  JMS, SCA, Portlets (ITCAM for WS only), Web Services, etc.
Problem Situation Automation
  Alerts and Traps for hard to re-create problems and problem context for later diagnosis
Problem recreation
  Provides production data for hard to re-create problems via integration with Rational Performance Tester (RPT) and IBM Performance Optimization Toolkit (IPOT)
How is it doing today and how will it do tomorrow?
  Historical and Trending reports

Questions to Ask when troubleshooting

Is the problem re-creatable?
Did it ever work?
  If it did, what changed: configuration, additional installation, product upgrade, etc.?
Does the environment matter, e.g. works in test/development but not in production?
  What is the topology of the environment?
  What external systems are involved?
  Any connectivity (firewall) or security (authentication, expired passwords) issues?
Are there any workload considerations?
  Is the problem happening under heavy workloads?
  Network or bandwidth issues?
Is there a pattern to the problem, e.g. every Monday morning at 10 AM?
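One way to answer the "is there a pattern" question is to bucket past incidents by weekday and hour and look for slots that repeat. The following is a minimal sketch under that assumption; the function name, the `min_count` threshold, and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def recurring_slots(incident_times, min_count=2):
    """Group incidents by (weekday, hour) and keep slots seen repeatedly."""
    slots = Counter((WEEKDAYS[t.weekday()], t.hour) for t in incident_times)
    return [(slot, n) for slot, n in slots.most_common() if n >= min_count]


# Hypothetical incident timestamps taken from alert history.
incidents = [
    datetime(2024, 1, 1, 10, 5),    # Monday
    datetime(2024, 1, 3, 14, 0),    # Wednesday
    datetime(2024, 1, 8, 10, 40),   # Monday
    datetime(2024, 1, 15, 10, 15),  # Monday
]
patterns = recurring_slots(incidents)
```

With this sample data the only repeating slot is Monday at hour 10, which would point the investigation toward scheduled jobs or weekly usage peaks.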



What must a good monitoring product do?

"A clever person solves a problem. A wise person avoids it." -- Einstein

It must monitor the environment 24x7.
  Real-time visualization tools are not adequate unless you plan on having highly paid analysts monitoring these tools 24x7.
It must support intelligent alerting.
  Alerting tools must acquire and correlate metrics from multiple sources.
It must exhibit a depth of monitoring across the breadth of technologies that spans, at minimum, end-user experience (both real and synthetic), application servers, and database servers.
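The idea of correlating metrics from multiple sources before alerting can be sketched as a rule that fires only when several independent metrics breach their limits in the same sample. This is an illustrative sketch only; the metric names, limits, and the "at least two sources" rule are assumptions, not how any particular product implements alerting.

```python
def correlated_breaches(sample, thresholds, min_sources=2):
    """Return the breached metric names if enough sources agree, else [].

    `sample` and `thresholds` map metric name -> value / upper limit.
    A metric missing from the sample is treated as not breached.
    """
    breached = sorted(
        name for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )
    return breached if len(breached) >= min_sources else []


# Hypothetical limits spanning end-user, app server and database metrics.
thresholds = {
    "response_time_ms": 2000,
    "thread_pool_used_pct": 90,
    "db_pool_wait_ms": 500,
}

lone_spike = {"response_time_ms": 2500, "thread_pool_used_pct": 40,
              "db_pool_wait_ms": 20}
real_problem = {"response_time_ms": 2500, "thread_pool_used_pct": 97,
                "db_pool_wait_ms": 20}

quiet = correlated_breaches(lone_spike, thresholds)
alert = correlated_breaches(real_problem, thresholds)
```

Requiring agreement between sources trades a little detection speed for far fewer false alarms, which matters when nobody is watching the console around the clock.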



Monitoring Levels

Vertical levels, not horizontal levels
Monitoring On Demand
  Change the monitoring level as needed without restarting either the applications or the application servers
  No need to pinpoint specific classes or methods in advance (i.e., no need to designate what needs to be monitored)
“Level 1”: Request Level - Production
  100% of system resource information
  100% of incoming requests/transactions
“Level 2”: Component Level - Problem Determination
  View major application events (EJBs, servlets, JDBC, JNDI, etc.)
“Level 3”: Method Level - Tracing
  Adds method trace information for problem determination and performance analysis.



Using the Tool Efficiently

Everyone assumes they need method-level data for every transaction in Production
  What would you do with that much data?
Gain application/transaction understanding in Test/QA, workload understanding in Production
Use Traps and Alerts to find anomalies and collect detailed data
Test/QA
  Use L2/L3 for Transaction/Application Analysis
    Top Methods Used (L3)
    Most CPU Intensive Methods (L3)
    Top Slowest Methods (L3)
    Transaction Component (L2) Trace
    Transaction Method (L3) Trace
    SQL Profile (L2)
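To make the three L3 reports concrete, the sketch below derives "top used", "most CPU intensive", and "slowest" rankings from a list of method-trace records. The record layout `(method, elapsed_ms, cpu_ms)` and the method names are assumptions made for illustration; the monitoring tool's own data format is not described in the slides.

```python
from collections import defaultdict


def method_reports(trace, top_n=3):
    """Build three rankings from (method, elapsed_ms, cpu_ms) records."""
    stats = defaultdict(lambda: {"count": 0, "elapsed": 0.0, "cpu": 0.0})
    for method, elapsed_ms, cpu_ms in trace:
        entry = stats[method]
        entry["count"] += 1
        entry["elapsed"] += elapsed_ms
        entry["cpu"] += cpu_ms

    def ranked(key):
        return sorted(stats, key=key, reverse=True)[:top_n]

    return {
        # Ranked by invocation count.
        "top_used": ranked(lambda m: stats[m]["count"]),
        # Ranked by total CPU time.
        "most_cpu": ranked(lambda m: stats[m]["cpu"]),
        # Ranked by average elapsed time per call.
        "slowest": ranked(lambda m: stats[m]["elapsed"] / stats[m]["count"]),
    }


# Hypothetical method-trace records: (method, elapsed ms, CPU ms).
trace = [
    ("Cart.add", 5, 2), ("Cart.add", 7, 3), ("Cart.add", 6, 2),
    ("Order.price", 40, 35), ("Order.price", 44, 38),
    ("Inventory.lookup", 300, 4),
]
reports = method_reports(trace, top_n=1)
```

The sample shows why all three views are useful: the most frequently called method, the biggest CPU consumer, and the slowest method are three different methods here.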

Application Performance Analysis

Work with Defined Objectives
  Throughput / response time goals from SLAs
Identify and Fix any Performance Problems Early
  Slow transactions, memory leaks, WebSphere performance tuning
Best Practices for Performance Tuning and Analysis
  Collect the information about the applications and the environment.
  Identify Key Transactions
  Conduct Transaction Profiling
  Conduct Workload Profiling
  Measure the baseline matrix for various performance parameters before tuning
  Leverage your tools in conjunction with load testing tools to analyze and tune application performance

Focus on Best Practices

Identify all key transactions in the workload mix
  Most frequently used
  Most important to the application
  Set a workable limit, e.g. 10-20
Conduct Transaction Profiling to obtain a basic understanding of what these key transactions do
  Code flow (component and method level)
  Component profile
  Method profile
  Event timings for each component and method

Transaction Profiling

Transaction Profiling refers to tracing the entire execution of a selected request (HTTP or EJB invocation)
Normally the best practice is to prepare a single-user automated test script that fires off such transactions with a think time in between invocations
At the L2 monitoring level, the data is shown at the J2EE component level with contextual data
  JSP, EJB, JMS, MQI, JDBC, JNDI
At L3, a full application class/method trace will be collected by default

Workload Analysis

Workload Analysis refers to running the applications via a traffic simulator with a number of clients
The monitoring tool is normally running at L1 for this type of analysis, with a sampling rate under 10%
Normally the best practice is to prepare a multi-user automated test script that fires off transactions in the right mix that represents the ‘production’ workload
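Getting the "right mix" into a test script usually starts with turning observed production proportions into per-transaction request counts. The sketch below does that with largest-remainder rounding so the counts add up exactly; the transaction names, weights, and function name are hypothetical.

```python
def plan_mix(mix, total_requests):
    """Split `total_requests` across transactions in proportion to `mix`.

    `mix` maps transaction name -> weight (any positive numbers).
    Uses largest-remainder rounding so the counts add up exactly.
    """
    weight_sum = sum(mix.values())
    exact = {name: total_requests * w / weight_sum for name, w in mix.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total_requests - sum(counts.values())
    # Hand the remaining requests to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical production mix: browse 70%, search 20%, checkout 10%.
mix = {"browse": 70, "search": 20, "checkout": 10}
plan = plan_mix(mix, 1000)
```

The resulting counts can then be fed to whichever load generator drives the multi-user script.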


Workload Analysis (continued)

Each run should be at least 30-60 minutes long to observe the system at steady state
During steady state, analysis can be conducted on a large number of metrics:
  Heap, CPU, paging, throughput, response time, WebSphere resource pools, GC activities, etc.
At the end of the run, a graph of CPU% vs. throughput rate should be plotted. Any non-linearity in the behavior of the workload should be explained, the bottlenecks eliminated, and the run repeated until a relatively linear line is obtained
More reports can be drawn from Performance Analysis & Reporting (PAR)
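The "relatively linear" check on the CPU% vs. throughput graph can also be expressed numerically, for example as the R² of a least-squares line through the samples. This is a minimal sketch under that assumption; the sample values are made up, and what counts as "linear enough" is a judgment call that the slides do not quantify.

```python
def linearity(throughput, cpu_pct):
    """R^2 of a least-squares line through (throughput, CPU%) points."""
    n = len(throughput)
    mean_x = sum(throughput) / n
    mean_y = sum(cpu_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in throughput)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(throughput, cpu_pct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(throughput, cpu_pct))
    ss_tot = sum((y - mean_y) ** 2 for y in cpu_pct)
    return 1.0 - ss_res / ss_tot


# Hypothetical steady-state samples: requests/second vs. JVM CPU%.
rate = [10, 20, 30, 40, 50]
r2_linear = linearity(rate, [12, 22, 32, 42, 52])  # scales cleanly
r2_bent = linearity(rate, [10, 20, 30, 60, 95])    # CPU climbs faster than load
```

A value close to 1 means CPU cost per request stays roughly constant as load rises; a noticeably lower value is the cue to look for the bottleneck before re-running.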

Additional Performance Tuning Tips - 1

Here are a few other things that we can try to help improve performance. Please note that these suggestions are given without detailed knowledge of the environment / architecture / open issues.
  Increase web container max keep-alives.
  Increase web container thread pool.
  Increase database connection pool.
  Adjust maximum and minimum heap sizes.
  Disable explicit garbage collection.
  Enable concurrent I/O at the O/S level.
  Pre-compile JSPs.
  Increase the priority of the app server process at the O/S level.



Additional Performance Tuning Tips - 2

If there are many short-lived objects, tuning the NewSize and MaxNewSize JVM parameters would help.
Changing ulimit for the operating system (AIX, Solaris) may help improve performance.
Enable dynamic caching, if possible.
Creating new indexes or re-organizing indexes will help improve the performance of database-intensive transactions.
Adjusting the prepared statement cache size may also help.
Adjust O/S parameters: tcp_time_wait_interval and tcp_fin_wait_2_flush_interval.
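For the heap-related tips, the sketch below assembles a JVM argument list from chosen sizes. The slides name only the NewSize and MaxNewSize parameters; the exact flag spellings used here (`-Xms`, `-Xmx`, `-XX:NewSize`, `-XX:MaxNewSize`, `-XX:+DisableExplicitGC`) are HotSpot-style and are an assumption about the target JVM. IBM JVMs use different options for some of these (for example `-Xdisableexplicitgc`), so check the documentation for the JVM actually in use. The function name and sizes are hypothetical.

```python
def jvm_tuning_args(min_heap_mb, max_heap_mb, new_size_mb=None,
                    max_new_size_mb=None, disable_explicit_gc=True):
    """Assemble HotSpot-style JVM arguments for the heap-related tips.

    Flag spellings are an assumption about the target JVM; verify them
    against the documentation for the JVM actually in use.
    """
    if min_heap_mb > max_heap_mb:
        raise ValueError("minimum heap must not exceed maximum heap")
    args = [f"-Xms{min_heap_mb}m", f"-Xmx{max_heap_mb}m"]
    if new_size_mb is not None:
        args.append(f"-XX:NewSize={new_size_mb}m")
    if max_new_size_mb is not None:
        args.append(f"-XX:MaxNewSize={max_new_size_mb}m")
    if disable_explicit_gc:
        # Makes System.gc() calls from application code a no-op.
        args.append("-XX:+DisableExplicitGC")
    return args


args = jvm_tuning_args(512, 1024, new_size_mb=128, max_new_size_mb=256)
```

As the slides caution, any such values should be validated under a representative workload rather than applied blindly.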


Example: Workload Analysis


Check Environmental Consistency

Ensure Platform Can Support Application

[Figure callout: Verify System, Java and App Server Runtime Environment]

Check Server Statistics

Compare key performance metrics side-by-side

[Figure callouts: Shows paging and load balancing in clustered deployments; Ensures overall throughput matches expected results from load generator; Quick overview of application impact on monitored servers]

Validate Throughput vs Response Time

Quantify Application Scalability

[Figure callouts: Correlated plot of response time during stress test relative to request rate; Graphical report showing number of requests over time]
Calculate Throughput vs. JVM CPU%

Verify target transaction per second rate achievable

[Figure callouts: Request rate during stress run (same as prior slide); Correlated plot reveals low JVM CPU consumption even as throughput increases]

Throughput vs. Garbage Collection (GC)

Tune JVM to minimize GC frequency

[Figure callouts: Request rate during stress run; GC frequency not in steady state as throughput rises; Increased heap size impacting GC rate, although <= 6 per minute appears to be affordable as response time remains < 34 ms]
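GC frequency of the kind plotted on this slide can be derived from GC start times, for instance by counting cycles per one-minute bucket. The sketch below assumes timestamps in seconds since the start of the run; the default ceiling of 6 per minute simply echoes the figure observed on this slide and is not a general rule.

```python
from collections import Counter


def gc_per_minute(gc_times_s):
    """Count GC cycles in each one-minute bucket (minute index -> count)."""
    return dict(Counter(int(t // 60) for t in gc_times_s))


def busy_minutes(gc_times_s, max_per_minute=6):
    """Minute indexes whose GC count exceeds the chosen ceiling."""
    counts = gc_per_minute(gc_times_s)
    return sorted(m for m, n in counts.items() if n > max_per_minute)


# Hypothetical GC start times in seconds since the start of the run:
# 3 cycles in minute 0, 8 cycles in minute 1.
gc_times = [5, 25, 50, 61, 68, 75, 82, 89, 96, 103, 110]
counts = gc_per_minute(gc_times)
flagged = busy_minutes(gc_times)
```

Whether a given rate is affordable still depends on response time, so the flagged minutes are best read alongside the response-time plot.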

Throughput vs. Total GC time

Avoid paging (has large effect on end user response time)

[Figure callouts: Request rate ramps and tops out; Excessive and persistently high total GC time; Total time for GC to complete per cycle correlated with request rate]

Throughput vs. Heap size after GC

Good indicator of potential memory leaks

[Figure callouts: Request rate during stress run (same as prior slide); Shows well-tuned heap size as little if any growth during high throughput; No growth in heap under increased load proves no detectable leaks]

WebSphere Resources Utilization Analysis

Verify application does not over-tax app server resources

[Figure callouts: Saturated thread pool: good candidate for tuning!; Overall we see low J2EE resource consumption]

Check Average CPU time per Transaction

Based on threads running application classes in workload mix

[Figure callouts: Spikes showing high consumption at random intervals; Otherwise normal consumption rates]

Check Average CPU time per Transaction (continued)

Based on threads running application classes in workload mix

[Figure callout: Transaction with very high CPU in spike interval]
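The two steps shown on these slides, spotting spike intervals and then finding the transaction responsible, can be sketched as follows. The median-based threshold, the factor of 3, and the transaction names are assumptions chosen for illustration; the tool's own spike detection is not described in the slides.

```python
from statistics import median


def cpu_spikes(avg_cpu_ms, factor=3.0):
    """Indexes of intervals whose average CPU time per transaction
    exceeds `factor` times the median of the whole run."""
    baseline = median(avg_cpu_ms)
    return [i for i, value in enumerate(avg_cpu_ms) if value > factor * baseline]


def worst_transaction(cpu_by_transaction):
    """Name of the transaction with the highest CPU time in one interval."""
    return max(cpu_by_transaction, key=cpu_by_transaction.get)


# Hypothetical per-interval averages (ms of CPU per transaction).
series = [12, 11, 13, 95, 12, 14, 88, 12]
spike_intervals = cpu_spikes(series)

# Hypothetical drill-down into one spike interval.
culprit = worst_transaction({"/login": 9, "/report/export": 410, "/cart": 15})
```

Using the median rather than the mean keeps the baseline from being dragged upward by the spikes themselves.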

Example: Transaction Analysis Methodology


Analyze Transaction Instances of Interest

Show “Level 2” J2EE component-level events

[Figure callouts: Sequential view of event execution / flow; High-precision timing measurements for each event call; Highlighted JCA calls exhibit high delta CPU timing difference]

Further Analyze Transactions

Show discrete “Level 3” method-level and nested method events

[Figure callouts: Each row shows method flow and depth; Good candidate for tuning due to high delta CPU consumption]

Analyze SQL Profile

Check the response time for various queries.
Use the data in conjunction with the Top Used Queries report. Tune queries.

Check for Top Methods Used

Identify hot methods by count

[Figure callouts: Names of hot methods; Total Invocation Count]

Check for Most CPU-Intensive Methods

Correlate hot methods by CPU cost with highest count methods

[Figure callouts: Names of hot methods; CPU consumption for each method]

Check for Slowest Methods

Correlate with hot methods to evaluate total contribution to response time

[Figure callouts: Names of slow methods; High average response time per method]

Example: Memory Leak Analysis


Memory Analysis Reporting

Quick check to detect presence of a leak

[Figure callouts: Upward slope indicates possibility of a “slow” memory leak; Constant request rate correlated with JVM Heap Size]
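The "upward slope at a constant request rate" check can be approximated numerically by comparing the start and end of a series of heap-after-GC samples. The sketch below is one simple way to do that; the quarter-based comparison, the function name, and the sample values are assumptions for illustration.

```python
def post_gc_growth_pct(heap_after_gc_mb):
    """Percent change between the average of the first and last quarter
    of heap-after-GC samples. Needs at least four samples."""
    quarter = len(heap_after_gc_mb) // 4
    if quarter == 0:
        raise ValueError("need at least four samples")
    first = sum(heap_after_gc_mb[:quarter]) / quarter
    last = sum(heap_after_gc_mb[-quarter:]) / quarter
    return 100.0 * (last - first) / first


# Hypothetical heap-after-GC samples (MB) taken at a constant request rate.
steady = [200, 204, 198, 202, 201, 199, 203, 201]
creeping = [200, 210, 220, 230, 240, 250, 260, 270]

steady_growth = post_gc_growth_pct(steady)
creeping_growth = post_gc_growth_pct(creeping)
```

A persistent positive value under constant load is only a hint of a slow leak; heap snapshot comparison is still needed to confirm it and find the classes involved.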

Memory Leak: Avg. Heap Size after GC vs. Requests

Average Heap Size after GC vs. Number of Requests:
  Verify that a leak exists with the Avg. Heap Size After GC graph.
  Check to see if it is due to an increasing number of requests.

To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Change Metrics.


Memory Leak: Average Heap Size after GC vs. Live Sessions

Average Heap Size after Garbage Collection (GC) vs. Live Sessions:
  Verify that a leak exists with the Avg. Heap Size After GC graph.
  Check to see if it is due to an increasing number of users.

To access this feature: Select PROBLEM DETERMINATION -> Memory Diagnosis -> Memory Analysis -> Select Metrics.
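Checking whether heap growth is "due to" rising requests or users amounts to asking how closely the heap-after-GC series tracks each load metric. A minimal sketch of that comparison, using a Pearson correlation coefficient on made-up samples, might look like this; the series names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical samples from one run.
heap_after_gc_mb = [200, 220, 240, 260, 280]
live_sessions = [100, 110, 120, 130, 140]   # rises with the heap
request_count = [500, 510, 495, 505, 500]   # roughly flat

r_sessions = pearson(heap_after_gc_mb, live_sessions)
r_requests = pearson(heap_after_gc_mb, request_count)
```

In this sample the heap tracks live sessions almost perfectly and is nearly unrelated to request count, which would suggest session-held memory rather than a leak that is independent of load.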

Find Leaking Candidates

Production-friendly heap-based analysis

[Figure callouts: Comparison of heap snapshots shows suspected leak candidates; Class name filters; Application class that appears to have some growth]
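The snapshot comparison behind this slide can be sketched as a diff of two class histograms with a class-name filter. The dictionary layout (class name to live instance count), the class names, and the thresholds below are assumptions for illustration and do not reflect the tool's actual snapshot format.

```python
def growth_candidates(before, after, name_filter="", min_growth=1):
    """Classes whose live instance count grew between two heap snapshots.

    `before` / `after` map class name -> instance count.
    `name_filter` keeps only classes whose name starts with that prefix.
    Returns (class name, growth) pairs, largest growth first.
    """
    grown = []
    for cls, count in after.items():
        if not cls.startswith(name_filter):
            continue
        delta = count - before.get(cls, 0)
        if delta >= min_growth:
            grown.append((cls, delta))
    return sorted(grown, key=lambda item: item[1], reverse=True)


# Hypothetical class histograms from two snapshots taken some time apart.
snapshot_1 = {"com.acme.CartItem": 1000, "com.acme.Session": 50,
              "java.lang.String": 90000}
snapshot_2 = {"com.acme.CartItem": 4500, "com.acme.Session": 52,
              "java.lang.String": 91000}

candidates = growth_candidates(snapshot_1, snapshot_2,
                               name_filter="com.acme.", min_growth=100)
```

Filtering on the application's own package prefix keeps JDK classes such as strings, which almost always grow, from drowning out the real suspects.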

Zero in on leaking code

View suspected classes and allocating methods

[Figure callouts: Each ‘allocation pattern’ uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code; Indicates the specific point in the application code where this object set was allocated from]

Zero in on leaking code (scroll from previous page)

View suspected classes and allocating methods

[Figure callouts: Each ‘allocation pattern’ uniquely identifies a set of heap objects of the same class, allocated by the same request type, and from the same point in the application code; Additional code and GC performance details help developers isolate leak and optimize JVM; Large number of surviving objects since last GC]

View References to Live Objects

Confirm Allocating Class

[Figure callouts: Helps pinpoint why objects in question are not getting garbage collected; Also shows other objects on the heap which contain references to the set of objects being analyzed; Allocating method and line number in the code]

Questions

Thank You