uPortal Performance Optimization


Faizan Ahmed

Architect and Engineering Group

Enterprise Systems & Services

RUTGERS

faizan@rutgers.edu

11/17/2013


What to Measure for Performance


Performance Planning


What to measure (Set targets)


Application Throughput


Response time


Memory Sizes


Transaction rate


How to measure performance


Load testing


User Perception


Profiling.
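A minimal sketch of how response time and throughput targets could be measured in code. The class and method names (ResponseTimer, measure, percentile, throughputPerSecond) are hypothetical and not part of uPortal; a real load test would use a dedicated tool and realistic traffic.

```java
import java.util.Arrays;

// Hypothetical measurement helper; names are illustrative only.
class ResponseTimer {
    /** Runs the task the given number of times and returns each duration in nanoseconds. */
    static long[] measure(Runnable task, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Returns the p-th percentile (0 < p <= 100) using the nearest-rank method. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Throughput in operations per second for count operations over totalNanos. */
    static double throughputPerSecond(long count, long totalNanos) {
        return count / (totalNanos / 1_000_000_000.0);
    }
}
```

Percentiles (for example the 95th) are usually a better response-time target than the average, because a few slow requests dominate user perception.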



When to Optimize?


Optimize at the analysis and design
stage


Determine general characteristics of
objects, data, and users


Identify probable performance limitations
from the determined specifications.


Eliminate any performance conflicts by
extending, altering or restating the
specifications.


When Not to Optimize?


Do not optimize at the code-writing
(implementation) stage.


GC History


Garbage Collection (GC) has been
around for a while (1960s LISP,
Smalltalk, Eiffel, Haskell, ML, Scheme
and Modula-3)


Went mainstream in the 1990s with
Java (then C#)


JVM implementations have improved on
GC algorithms/speed over the past
decade



GC Benefits/Cost


Benefits


Increased reliability


Decoupling of memory mgmt from program
design


Less time spent debugging memory errors


Dangling pointers and classic memory leaks do not occur
(Note: Java programs do NOT leak memory in the
C sense; "unintentional object retention" is the more
accurate term)


Costs


Length of GC pause


CPU/memory utilization
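A minimal sketch of what "unintentional object retention" can look like. The class RetentionDemo and its methods are hypothetical: a static collection keeps objects reachable long after the code that created them has finished with them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; not taken from uPortal.
class RetentionDemo {
    // A static field is a GC root: everything this list holds stays reachable.
    static final List<byte[]> HISTORY = new ArrayList<>();

    /** Retains every buffer by accident, so the heap grows with each call. */
    static void handleRequest() {
        byte[] buffer = new byte[1024];   // intended as a temporary
        HISTORY.add(buffer);              // bug: the "temporary" is never released
    }

    /** The buffer becomes unreachable, and therefore collectable, once the method returns. */
    static void handleRequestFixed() {
        byte[] buffer = new byte[1024];
        buffer[0] = 1;                    // stand-in for real work on the buffer
    }
}
```

The collector is working correctly in both cases; the first method simply never lets go of its references.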


GC Options


Sun’s 1.3 JDK included 3 GC
strategies


1.4 JDK includes 6 and over a dozen
command line options


Application type will demand
strategy:


Real-time: short and bounded pauses


Enterprise: may tolerate longer or less
predictable pauses in favor of higher
throughput


GC Phases


GC has two main phases:


Detection


Reclamation



Steps are either distinct:


Mark-Sweep, Mark-Compact


… or interleaved:


Copying


Fundamental GC Property







“When an object becomes garbage, it stays garbage.”


Reachability


Roots: references to objects held in a static
variable or in a local variable on an active
stack frame


Root Objects: directly reachable
from roots


Live Objects: all objects transitively
reachable from roots


Garbage Objects: all other objects
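The definitions above can be sketched on a toy object graph. This is an illustration only: Node and liveObjects are hypothetical names, and a real JVM works on raw heap memory rather than on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of reachability; names are illustrative only.
class Reachability {
    /** A toy heap object: a name plus its outgoing references. */
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Returns every object transitively reachable from the given root objects. */
    static Set<Node> liveObjects(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {            // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return live;
    }
}
```

Any Node that is not in the returned set is garbage by the definition above, even if other garbage objects still point at it.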


Reachability Example

[Figure: a runtime stack holding two references (ref) and a heap holding three objects (x).]


GC Algorithms


Two basic approaches:


Reference Counting: keep a count of
references to each object; when the count reaches
zero, the object is garbage


Tracing: trace out the graph of objects
starting from the roots and mark each object visited; when
complete, unmarked objects are garbage


Reference Counting


An early GC strategy


Advantages:


can be run in small chunks of time


Disadvantages:


does not detect cycles


overhead in incrementing/decrementing
counters


Reference Counting is currently out of
favor
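A toy sketch of why reference counting does not detect cycles. RefCounted is a hypothetical class that tracks counts by hand; it is not how any JVM manages memory.

```java
// Toy reference-counted object; names are illustrative only.
class RefCounted {
    final String name;
    int count;          // number of incoming references
    RefCounted ref;     // at most one outgoing reference, for brevity

    RefCounted(String name) { this.name = name; }

    /** Adds one incoming reference (for example from a root). */
    void retain() { count++; }

    /** Points this object at target, adjusting both counts. */
    void setRef(RefCounted target) {
        if (target != null) target.retain();
        if (ref != null) ref.release();
        ref = target;
    }

    /** Drops one incoming reference; at zero the object also drops its own outgoing reference. */
    void release() {
        count--;
        if (count == 0 && ref != null) {
            RefCounted old = ref;
            ref = null;
            old.release();
        }
    }

    /** True once no references remain (only meaningful after the object has been retained). */
    boolean isGarbage() { return count == 0; }
}
```

If a and b point at each other and the only root reference to a is released, both counts stay at 1, so neither object is ever reclaimed even though nothing can reach them.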


Tracing


Basic tracing algorithm is known as
mark and sweep


mark phase: GC traverses the reference
graph, marking reachable objects


sweep phase: unmarked objects are
finalized/freed
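The two phases can be sketched on a toy heap. MarkSweepHeap and its members are hypothetical names; the recursive mark is kept for brevity and would overflow the stack on very deep graphs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep collector; names are illustrative only.
class MarkSweepHeap {
    static class Obj {
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
    }

    final List<Obj> objects = new ArrayList<>();  // every allocated object
    final List<Obj> roots = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        objects.add(o);
        return o;
    }

    /** Mark phase: flag everything reachable from o. */
    private void mark(Obj o) {
        if (o.marked) return;
        o.marked = true;
        for (Obj child : o.refs) mark(child);
    }

    /** Runs one collection and returns the number of objects freed. */
    int collect() {
        for (Obj root : roots) mark(root);
        int freed = 0;
        for (Iterator<Obj> it = objects.iterator(); it.hasNext();) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;   // reset for the next cycle
            } else {
                it.remove();        // sweep: unmarked means unreachable
                freed++;
            }
        }
        return freed;
    }
}
```

Note that the sweep loop visits every allocated object, live or dead, which is why the cost of this family of collectors tracks heap size.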



Mark-Sweep Example

[Figure: heap snapshots at three stages: 1. mark-sweep start, 2. end of marking, 3. end of sweeping.]


Mark-Sweep


Problem is fragmentation of memory


Two strategies address this:


Mark-Compact Collector: after mark,
moves live objects to a contiguous area
in the heap


Copy Collector: moves live objects to a
new area



Mark-Compact Collector Example

[Figure: heap snapshots at two stages: 1. end of marking, 2. end of compacting.]


Copy Collector Example

[Figure: the From and To spaces shown at two stages: 1. copying start, 2. end of copying. One space is labeled Unused at each stage.]


Performance Characteristics


Mark-Compact collection time is roughly
proportional to the size of the heap


Copy collection time is roughly
proportional to the number of live
objects


Copying


Advantages:


Very fast…if live object count is low


No fragmentation


Fast allocation


Disadvantages:


Doubles space requirements: not
practical for large heaps
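A toy sketch of a copying collector with two semispaces, loosely following Cheney's scanning approach. SemiSpaceHeap is a hypothetical model: "copying" is simulated by creating a new Obj and recording a forwarding entry, whereas a real collector moves raw memory.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy semispace copying collector; names are illustrative only.
class SemiSpaceHeap {
    static class Obj {
        final List<Obj> refs = new ArrayList<>();
    }

    List<Obj> fromSpace = new ArrayList<>();   // where allocation happens
    List<Obj> toSpace = new ArrayList<>();     // empty between collections
    final List<Obj> roots = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        fromSpace.add(o);
        return o;
    }

    /** Copies live objects into to-space, then swaps the spaces. Returns the number copied. */
    int collect() {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding));
        }
        // To-space doubles as the work queue: scan each copied object once.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) {
                o.refs.set(i, copy(o.refs.get(i), forwarding));
            }
        }
        int copied = toSpace.size();
        fromSpace = toSpace;            // survivors now live in the new from-space
        toSpace = new ArrayList<>();    // the old from-space is discarded wholesale
        return copied;
    }

    /** Returns the to-space copy of old, creating it on first use. */
    private Obj copy(Obj old, Map<Obj, Obj> forwarding) {
        Obj moved = forwarding.get(old);
        if (moved == null) {
            moved = new Obj();
            moved.refs.addAll(old.refs);   // still old addresses; fixed up during the scan
            forwarding.put(old, moved);
            toSpace.add(moved);
        }
        return moved;
    }
}
```

Garbage objects are never visited, which is why the work is proportional to the number of live objects, and the second space is the source of the doubled space requirement.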


Observations


Two very interesting observations:


Most allocated objects will die young


Few references from older to younger
objects will exist


These are known as the weak
generational hypothesis


Generations


Heap is split into generations, one
young and one old


Young generation: all new objects are
created here. The majority of GC activity
takes place here and is usually fast
(Minor GC).


Old generation: long-lived objects are
promoted (or tenured) to the old
generation. Fewer GCs occur here, but
when they do, they can be lengthy (Major
GC).
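One way to see the generations of a running JVM is the standard java.lang.management API (available since J2SE 5.0). This is a sketch: HeapPools is a hypothetical helper, and the pool names it prints depend on the JVM vendor, version, and selected collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the JVM's memory pools.
class HeapPools {
    /** Returns one line per memory pool: name, type, and current usage in KB. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // may be null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            lines.add(pool.getName() + " [" + pool.getType() + "] used=" + usedKb + "K");
        }
        return lines;
    }
}
```

Printing each line of HeapPools.describePools() typically shows separate pools for the young-generation spaces and the old generation, although the exact names vary.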


Sun HotSpot Heap Layout

[Figure: heap divided into Young (Eden plus the From and To Survivor Spaces), Old, and Permanent generations.]


Before Minor GC

[Figure: Young (Eden, From and To Survivor Spaces), Old, and Permanent generations before a minor GC; objects (x) occupy the heap and one space is labeled Unused.]


After Minor GC

[Figure: Young (Eden, To and From Survivor Spaces), Old, and Permanent generations after a minor GC; Eden is labeled Empty and one space is labeled Unused.]


GC Notes


Algorithm used will vary based on VM
type (e.g., client/server)


Algorithm used varies by generation


Algorithm defaults change between
VM releases (e.g., 1.4 to 1.5)
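Because the algorithm depends on VM type, generation, and release, it can help to ask the running JVM which collectors it actually selected. A minimal sketch using the standard java.lang.management API; CollectorReport is a hypothetical helper and the collector names returned are JVM-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports the collectors in use.
class CollectorReport {
    /** Returns one line per collector: name, collection count, and accumulated time in ms. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }
}
```

There is usually one entry for the young-generation collector and one for the old-generation collector.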


Young Generation Collectors


Serial Copying Collector


All J2SEs (1.4.x default)


Stop-the-world


Single threaded


Parallel Copying Collector


-XX:+UseParNewGC


J2SE 1.4.1+ (1.5.x default)


Stop-the-world


Multi-threaded


Parallel Scavenge Collector


-XX:+UseParallelGC


J2SE 1.4.1+


Like Parallel Copying Collector


Tuned for very large heaps (over 10GB) w/ multiple CPUs


Must use Mark-Compact Collector in Old Generation


Serial vs. Parallel Collector

[Figure: stop-the-world pause compared for the Serial Collector and the Parallel Collector.]


Old Generation Collectors


Mark-Compact Collector


All J2SEs (1.4.x default)


Stop-the-world


Single threaded


Train (or Incremental) Collector


-Xincgc


About 10% performance overhead


All J2SEs


To be replaced by CMS Collector


Concurrent Mark-Sweep (CMS) Collector


-XX:+UseConcMarkSweepGC


J2SE 1.4.1+ (1.5.x default for -Xincgc)


Mostly concurrent


Single threaded


Mark-Compact vs. CMS Collector

[Figure: stop-the-world pause timelines for the Mark-Compact Collector and the Concurrent Mark-Sweep Collector; the CMS phases shown are initial mark, concurrent marking, remark, and concurrent sweeping.]


Intergenerational References


What if an object in the older generation
references an object in the younger
generation?


Add old-to-young references to root set

[Figure: objects (x) in the Young Generation, the Old Generation, and the root set of references.]


JDK 1.4.x Heap Option Summary

Young generation

Low Pause Collectors, 1 CPU: Serial Copying Collector (default)

Low Pause Collectors, 2+ CPUs: Parallel Copying Collector, -XX:+UseParNewGC

Throughput Collectors, 1 CPU: Copying Collector (default)

Throughput Collectors, 2+ CPUs: Parallel Scavenge Collector, -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy -XX:+AggressiveHeap

Heap Sizes: -XX:NewSize -XX:MaxNewSize -XX:SurvivorRatio


Old generation

Low Pause Collectors, 1 CPU: Mark-Compact Collector (default)

Low Pause Collectors, 2+ CPUs: Concurrent Collector, -XX:+UseConcMarkSweepGC

Throughput Collectors, 1 CPU: Mark-Compact Collector (default)

Throughput Collectors, 2+ CPUs: Mark-Compact Collector (default)

Heap Sizes: -Xms -Xmx


Permanent generation

Can be turned off with -Xnoclassgc (use with care)

Heap Sizes: -XX:PermSize -XX:MaxPermSize


Heap Tuning


Analyze application under realistic
load


Run with -verbose:gc and other
options to identify GC information


Select GC engine based on need


Size areas of heap based on
application profile: e.g., an app that
creates many temporary objects may
need a large eden area
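When sizing the heap it is worth confirming that the JVM actually picked up the intended settings. A minimal sketch using standard runtime APIs; HeapSettings is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for checking heap settings from inside the application.
class HeapSettings {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MB (reflects -Xmx when it is set). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently in use, in MB: committed memory minus free memory. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** JVM options passed at startup, such as -Xms, -Xmx, or -XX:NewSize. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }
}
```

Logging these values at startup makes it easier to match a GC log to the settings that produced it.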


Verbose Minor GC Example

62134.872: [GC {Heap before GC invocations=1045:
Heap
 def new generation total 898816K, used 778761K
  eden space 749056K, 100% used
  from space 149760K, 19% used
  to space 149760K, 0% used
 tenured generation total 1048576K
  the space 1048576K, 24% used
 compacting perm gen total 32768K, used 26906K
  the space 32768K, 82% used
62134.880: [DefNew
Desired survivor size 138018816 bytes, new threshold 16 (max 16)
- age 1: 3718312 bytes, 3718312 total
- age 2: 5935096 bytes, 9653408 total
- age 3: 2614616 bytes, 12268024 total
- age 4: 1426056 bytes, 13694080 total
- age 5: 2095808 bytes, 15789888 total
- age 6: 3996288 bytes, 19786176 total
- age 7: 550448 bytes, 20336624 total
- age 8: 4058384 bytes, 24395008 total
: 778761K->23823K(898816K), 0.1397140 secs] 1035112K->280173K(1947392K) Heap after GC invocations=1046:
Heap
 def new generation total 898816K, used 23823K
  eden space 749056K, 0% used
  from space 149760K, 15% used
  to space 149760K, 0% used
 tenured generation total 1048576K, used 256350K
  the space 1048576K, 24% used
 compacting perm gen total 32768K, used 26906K
  the space 32768K, 82% used
} , 0.1471464 secs]
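The summary part of such a log line has the form before->after(capacity), pause. A minimal sketch of extracting those numbers; GcLineParser is a hypothetical helper, and the pattern only covers the "NNNK->NNNK(NNNK), N.N secs" shape seen in this example, not every GC log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the summary portion of a verbose GC line.
class GcLineParser {
    // Matches e.g. "778761K->23823K(898816K), 0.1397140 secs"
    private static final Pattern SUMMARY =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs");

    /** Returns {beforeK, afterK, capacityK} for the first summary found, or null if none. */
    static long[] parseSizes(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) return null;
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Returns the pause in seconds for the first summary found, or NaN if none. */
    static double parsePauseSeconds(String line) {
        Matcher m = SUMMARY.matcher(line);
        return m.find() ? Double.parseDouble(m.group(4)) : Double.NaN;
    }
}
```

For the DefNew entry in the example, the young generation went from 778761K to 23823K, so about 754938K was reclaimed in roughly 0.14 seconds.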


Verbose Major GC Example

199831.706: [GC {Heap before GC invocations=9786:
Heap
 def new generation total 898816K, used 749040K
  eden space 749056K, 99% used
  from space 149760K, 0% used
  to space 149760K, 0% used
 tenured generation total 1048576K, used 962550K
  the space 1048576K, 91% used
 compacting perm gen total 32768K, used 27040K
  the space 32768K, 82% used
199831.706: [DefNew: 749040K->749040K(898816K), 0.0000366 secs]199831.707: [Tenured: 962550K->941300K(1048576K), 3.6172827 secs] 1711590K->1653534K(1947392K) Heap after GC invocations=9787:
Heap
 def new generation total 898816K, used 712233K
  eden space 749056K, 95% used
  from space 149760K, 0% used
  to space 149760K, 0% used
 tenured generation total 1048576K, used 941300K
  the space 1048576K, 89% used
 compacting perm gen total 32768K, used 27007K
  the space 32768K, 82% used
} , 3.6185589 secs]



Programming Tips


No need to call System.gc()


Finalizers must be used with care!


The use of Object pools can be debated


Learn to use ThreadLocal for large Objects
(but use with care)


Be cognizant of the number/size of Objects
being created


Use caches judiciously (e.g.,
WeakHashMaps make bad caches under a
heavily loaded system)
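One alternative to a WeakHashMap cache is a cache with an explicit size bound. The sketch below is not from the talk: LruCache is a hypothetical class built on the standard LinkedHashMap access-order mode, shown only to illustrate a cache whose memory use is predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache; evicts the least recently used entry when full.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);   // true = access order, so reads refresh an entry
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

This class is not thread-safe; in a server it would need external synchronization, for example by wrapping it with Collections.synchronizedMap.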


Conclusions


GC can be your friend (or enemy)


Put your app on an Object diet


Should always monitor an app with
-verbose:gc -XX:+PrintGCDetails


Tune your heap only if there is a
problem


A large heap may slow an app (more
memory is not always good)




Demonstrate Heap tuning

JVM heap tuning was demonstrated for
uPortal running on a local machine
using JDK 1.5.06 (one hour).


Demonstrate Memory leak

A live demonstration showed
techniques for finding memory-related
issues in a Java application.
uPortal 2.5.2 code was used for
examples (1 hour and 30 min).


Techniques for production env.


lsof command


File descriptors


Verbose gc


kill -3 (thread dump)


Tune gc
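Sending kill -3 to a HotSpot JVM makes it print a thread dump. Where signals are inconvenient, a rough equivalent can be produced from inside the application. This is a sketch with a hypothetical ThreadDump helper; its output is less detailed than the JVM's own dump (no lock information, for instance).

```java
import java.util.Map;

// Hypothetical helper that builds a simple in-process thread dump.
class ThreadDump {
    /** Returns the name, state, and stack of every live thread as text. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Taking several dumps a few seconds apart and comparing them is a common way to spot threads that are stuck in the same place.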



Deployment Architecture/scalability


Capacity planning


Clustering/Load Balancing



Capacity Planning


Today: ~3,000 users


Fall 2003: ~10,000 users


Spring 2004: ~50,000


Summer 2004: ~75,000


Fall 2004: 300,00+


For capacity planning, we recommend
using 8% to estimate the peak
number of concurrent users (~800
users for ~10,000 total users)
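The 8% rule of thumb is simple arithmetic: 8% of 10,000 users gives the ~800 concurrent users quoted above. A minimal sketch, with CapacityPlan as a hypothetical helper name:

```java
// Hypothetical helper applying the 8% peak-concurrency estimate from the slide.
class CapacityPlan {
    static final double PEAK_CONCURRENCY = 0.08;   // 8% of the total user population

    /** Estimated peak number of concurrent users for a given total population. */
    static long peakConcurrentUsers(long totalUsers) {
        return Math.round(totalUsers * PEAK_CONCURRENCY);
    }
}
```

Applied to the other milestones, the same estimate gives about 4,000 concurrent users at 50,000 total and about 6,000 at 75,000.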



Requirements


SSL: all sessions will be through
HTTPS


Maximum response time for a single
page request: 5s


Availability: 24x7


One powerful machine with everything


Apache on one machine load balancing uPortal/Tomcat on
smaller boxes

[Figure: Apache, Tomcat/uPortal, Database, and Other Backend Systems.]


External cookie-based load balancer with Apache/Tomcat on
smaller boxes

[Figure: Load Balancer/SSL Encryption Engine, Apache/Tomcat/uPortal, Database, and Other Backend Systems.]


Hardware Chart

One Powerful Machine

Pros: Easy to manage

Cons: Cost; cannot scale


Apache load balancing

Pros: Can add more boxes easily; portal computers are relatively inexpensive

Cons: Apache's load balancing is limited; performing SSL with Apache will ramp up the CPU cycles


Dedicated load balancer

Pros: Same as with Apache but scaling is better; adding in SSL won't necessarily degrade performance

Cons: Can be pricey ($10k to $50k)


Choices


SSL Acceleration


1 Big Box or Many Little Boxes


Apache Load Balance or “Layer 4
Switch”


Linux/Intel or Solaris/Sparc



Questions?


References


Garbage Collection in Java - http://www.cs.usm.maine.edu/talks/05/printezis.pdf


A brief history of garbage collection - http://www-128.ibm.com/developerworks/java/library/j-jtp10283/


Garbage collection in the HotSpot JVM - http://www-128.ibm.com/developerworks/java/library/j-jtp11253/


Diagnosing a GC problem - http://java.sun.com/docs/hotspot/gc1.4.2/example.html