Java thread performance
The behaviour of Java threads under Linux NPTL





















Author: Jon Barnett
Document version: 1.5


Table of contents

1 The behaviour of Java threads under Linux NPTL
  1.1 Parallel threading
  1.2 Native POSIX Thread Library
  1.3 Test Environment
    1.3.1 Java Virtual Machine environments
    1.3.2 Test methodology
  1.4 Results
    1.4.1 Short load results
    1.4.2 Medium load results
    1.4.3 Long load results
    1.4.4 Discussion
A Thread load generator
  A.1 Test code
B Measurement data
  B.1 Short duration tests
    B.1.1 Sun JDK 1.4.1 03
    B.1.2 Sun JDK 1.4.2
    B.1.3 IBM SDK 1.4.1 (NPTL)
    B.1.4 IBM SDK 1.4.1
  B.2 Medium duration tests
    B.2.1 Sun JDK 1.4.1 03
    B.2.2 Sun JDK 1.4.2
    B.2.3 IBM SDK 1.4.1 (NPTL)
    B.2.4 IBM SDK 1.4.1
  B.3 Long duration tests
    B.3.1 Sun JDK 1.4.1 03
    B.3.2 Sun JDK 1.4.2
    B.3.3 IBM SDK 1.4.1 (NPTL)
    B.3.4 IBM SDK 1.4.1
C Supplemental information
  C.1 Object creation
    C.1.1 Code fragment for load
    C.1.2 Sun JDK 1.4.1 03
    C.1.3 Sun JDK 1.4.2
    C.1.4 IBM SDK 1.4.1 (NPTL)
    C.1.5 IBM SDK 1.4.1
    C.1.6 Comments
  C.2 HashMap object insertion
    C.2.1 Code fragment for load
    C.2.2 Sun JDK 1.4.1 03
    C.2.3 Sun JDK 1.4.2
    C.2.4 IBM SDK 1.4.1 (NPTL)
    C.2.5 IBM SDK 1.4.1
    C.2.6 Comments
D Parallel GC in Sun Java
    D.1.1 Sun JDK 1.4.2 with UseParNewGC
    D.1.2 Sun JDK 1.4.2 with UseParNewGC and UseConcMarkSweepGC
    D.1.3 Comments

Amity Solutions Pty Ltd – Version 1.5
1 The behaviour of Java threads under Linux NPTL

This study examines the performance of Java threads under the Linux implementation of the Native
POSIX Thread Library (NPTL). It was prompted by variability we discovered in ECperf 1.1 results
obtained under the same conditions. Response times could be as much as twice those previously
measured under identical conditions, and the variation in performance measurements was large enough
to reverse the relative ranking of results across multiple runs. Given these observations, we began to
question our underlying assumptions about threading performance.
1.1 Parallel threading

The notion that we are exploring is that of actual parallel execution, rather than concurrency. With
concurrency, we expect multiple tasks to appear to execute at once because the CPU is time-shared
between them. With parallel execution, however, we expect each of several tasks to complete as
quickly as it would if it were the only task running. This expectation holds only if there are enough
CPUs to service all the tasks running at once. In the simplest case, a thread can be considered an
independent task provided you do not create any interdependencies between threads, such as through
resource sharing.
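As a rough illustration of the distinction drawn above, the following sketch computes the completion time we would expect in the ideal case. It assumes a simplified model that is not part of the original study: the tasks are identical and independent, there is no management overhead, and when tasks outnumber CPUs the processors are shared evenly between them. The class and method names are hypothetical.

```java
/** Sketch of the ideal completion-time model for identical, independent tasks. */
final class IdealCompletion {

    /**
     * Expected wall-clock time (ms) for every task to finish when each task
     * needs singleTaskMs of CPU time and the tasks share the CPUs evenly.
     * Assumption: no scheduling or management overhead at all.
     */
    static double completionMs(int tasks, int cpus, double singleTaskMs) {
        if (tasks <= 0 || cpus <= 0) {
            throw new IllegalArgumentException("tasks and cpus must be positive");
        }
        // Enough CPUs: each task runs as if it were alone (parallel execution).
        // Too few CPUs: the work is stretched by the factor tasks / cpus
        // (concurrency through time-sharing).
        double stretch = Math.max(1.0, (double) tasks / cpus);
        return singleTaskMs * stretch;
    }

    public static void main(String[] args) {
        double single = 1200.0; // hypothetical single-task time in ms
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " tasks: 8 CPUs -> " + completionMs(n, 8, single)
                    + " ms, 1 CPU -> " + completionMs(n, 1, single) + " ms");
        }
    }
}
```

Under this model the 8-CPU column stays flat as tasks are added, which is the behaviour the measurements in this study are compared against.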

In practice, several factors move operation away from this ideal. Threads are usually created by a
parent routine, and some control mechanisms are needed to allow the parent to view, manage and, if
necessary, terminate the child thread. These mechanisms impose a co-ordination overhead, so that
operations within a thread run slightly slower than the same operations in a single-threaded, dedicated
environment. It follows that the greater the number of threads, the greater the management overhead.
The aim of good parallel processing and multi-threaded environment design is to minimise this
overhead. In particular, the management algorithm should not cause the overhead to grow in
proportion to n! as the number of threads n increases. Such factorial growth in overhead would
severely limit scalability.
1.2 Native POSIX Thread Library

There are two primary reasons for implementing NPTL:
− Better thread performance and scalability
− Better POSIX standards compliance

The NPTL concept arose from kernel developers at Red Hat. A rival implementation called
New Generation POSIX Threads (NGPT) has been developed by an IBM- and Intel-led group. It is not
our intention to argue the merits of one or the other. We focus on the NPTL implementation because it
affected our measurements on a Red Hat 9 system. Further information on the two implementations can
be found on the O’Reilly web site page, http://www.onlamp.com/lpt/a/2839.

Java is affected by Linux threads because the Java Virtual Machine (JVM) must use the thread
implementation of the operating system it runs on. The latest Sun JDKs for Linux have been
built with NPTL compatibility. The latest IBM SDK for Linux does not officially support NPTL but is
released as an experimental implementation for Red Hat Linux NPTL systems.
1.3 Test Environment

The test environment consists of an 8-way Intel multiprocessor system running Red Hat 9 with an
NPTL implementation. Nothing else is running on the system except the standard Linux support
processes. The CPU clock speed on the system is reported as 2.0 GHz. The bus clock speed is reported
as 100.0 MHz.

The test code listed in Appendix A is used to generate the thread loads for the system. Measurements of
run times are taken from the thread information produced.

1.3.1 Java Virtual Machine environments
The following JVM environments are tested:
− Sun Java 2 JDK 1.4.1 03 b02 in server mode
− Sun Java 2 JDK 1.4.2 b28 in server mode
− IBM Java 2 SDK 1.4.1 build cxia32141-20030522 without LD_ASSUME_KERNEL=2.2.5
− IBM Java 2 SDK 1.4.1 build cxia32141-20030522 with LD_ASSUME_KERNEL=2.2.5

The LD_ASSUME_KERNEL=2.2.5 setting instructs the IBM SDK to employ the standard Linux
threading model rather than use NPTL.
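The two IBM configurations differ only in one environment variable, so a test driver can switch between them when it launches the JVM. The sketch below shows one way to do that with `ProcessBuilder`, an API that postdates the 1.4 JVMs tested here. It assumes that `java` on the PATH is the IBM SDK, that the `com.amity.tester.Parallel` class from Appendix A is on the class path, and that the system's dynamic loader still honours `LD_ASSUME_KERNEL` as it did on the Red Hat 9 machine. The class name `LegacyThreadLauncher` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: prepare a child JVM with or without LD_ASSUME_KERNEL=2.2.5. */
final class LegacyThreadLauncher {

    /**
     * Builds (but does not start) the command for one test run.
     * legacyThreads = true requests the older Linux threading model.
     */
    static ProcessBuilder build(boolean legacyThreads, int threads, int length) {
        List<String> command = Arrays.asList(
                "java",                        // assumption: IBM SDK on the PATH
                "com.amity.tester.Parallel",   // test program from Appendix A
                String.valueOf(threads),
                String.valueOf(length));
        ProcessBuilder builder = new ProcessBuilder(command);
        Map<String, String> env = builder.environment();
        if (legacyThreads) {
            env.put("LD_ASSUME_KERNEL", "2.2.5");
        } else {
            env.remove("LD_ASSUME_KERNEL");
        }
        builder.inheritIO(); // let the child print its timings to our console
        return builder;
    }

    public static void main(String[] args) {
        ProcessBuilder builder = build(true, 8, 10000);
        System.out.println(builder.command() + " with LD_ASSUME_KERNEL="
                + builder.environment().get("LD_ASSUME_KERNEL"));
        // Calling builder.start() would launch the run; it is omitted here so
        // the sketch has no side effects.
    }
}
```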
1.3.2 Test methodology

The tests are conducted for different loads to observe the performance over short, medium and long
parallel runs. For each load group, we compare the effect of increasing the number of Java threads
while staying within the number of processors available, so that the system never has to resort to
concurrent (time-shared) operation. This is done because we are interested in the parallel tasking
capability of the system. We understand that there will be some processor overhead in the operating
system itself but assume that this is negligible compared to the actual loads we impose on each thread.

A short duration load consists of 10000 HashMap put operations. A medium duration load consists of
100000 HashMap put operations. A long duration load consists of 10000000 HashMap put operations.
Although the performance of the HashMap operations varies across the JVM implementations, they are
sufficiently CPU-intensive, with a constant execution speed, to give a reasonable indication of thread
operation characteristics.
1.4 Results

Graphs of the results are presented here. Appendix B contains the collated information from which
these graphs are constructed.
1.4.1 Short load results

[Figure: Average completion times for a short duration load: 10000 ops. Completion (ms) against
threads (1 to 8), one series per JVM configuration.]

Average completion time (ms) by number of threads:

JVM                    1       2       3       4       5       6       7       8
Sun JDK 1.4.1 03       17      34      52      74      96      120     158     259
Sun JDK 1.4.2          18      36      57      85      93      125     127     87
IBM SDK 1.4.1 (NPTL)   41      40      39      39      43      40      43      47
IBM SDK 1.4.1          41      42      39      43      42      43      42      39

1.4.2 Medium load results

[Figure: Average completion times for a medium duration load: 100000 ops. Completion (ms) against
threads (1 to 8), one series per JVM configuration.]

Average completion time (ms) by number of threads:

JVM                    1       2       3       4       5       6       7       8
Sun JDK 1.4.1 03       61      212     370     808     962     1599    1738    2767
Sun JDK 1.4.2          67      187     369     749     957     1329    1685    2792
IBM SDK 1.4.1 (NPTL)   50      57      57      60      65      72      79      81
IBM SDK 1.4.1          50      56      57      61      67      66      74      79

1.4.3 Long load results

[Figure: Average completion times for a long duration load: 10000000 ops. Completion (ms) against
threads (1 to 8), one series per JVM configuration.]

Average completion time (ms) by number of threads:

JVM                    1       2       3       4       5       6       7       8
Sun JDK 1.4.1 03       1188    17770   34895   59992   60816   125142  185450  275471
Sun JDK 1.4.2          1253    16803   24694   71551   60941   132759  195018  283774
IBM SDK 1.4.1 (NPTL)   1567    2209    2831    3385    3860    4173    4751    5322
IBM SDK 1.4.1          1291    1583    1782    1987    2328    2667    2970    3386




1.4.4 Discussion

These observations show that the Sun implementations using the standard Garbage Collector (GC)
routines suffer in throughput performance as the number of threads increases. This occurs despite the
availability of processors to service the load. Monitoring the CPU loads using Linux top shows that the
load is distributed to the other processors and that they achieve 100 percent utilisation. The
non-linearity of the results for the Sun implementations indicates that, under certain conditions, a
multi-threaded Java application can perform poorly. We also note a large variation in the sample
results for the Sun implementations, which can lead to inconsistent performance.

We note that wide variation in responses remains a persistent characteristic. At this time, we have
insufficient information to identify a possible source for the variations. Some informal testing
suggests that the -server option used here with the Sun JDKs stabilises throughput fluctuations, but
variability still exists. A few tests also reported large variations that we have been forced
to remove from our samples. As we do not yet understand the source of these deviations and we are
more interested in the standard response behaviour, we have omitted them. We do provide the omitted
figures in Appendix D for those interested in the size of the variation.

The IBM implementation provides a more linear characteristic. From the data tables, it also appears the
removal of the reliance on NPTL improves the performance for long runs. However, the performance
of any of the JVM configurations will depend on many factors beyond the scope of this study.
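To make the contrast between the two characteristics concrete, the following sketch divides each long-load average from section 1.4.3 by the corresponding single-thread average. The data values are copied from that table; the class name, field names and the choice of a simple ratio as the measure are ours and are only one possible way of summarising the results.

```java
/** Sketch: slowdown of the per-thread completion time relative to one thread. */
final class SlowdownFactor {

    // Long-load averages in ms for 1 to 8 threads, from section 1.4.3.
    static final long[] SUN_JDK_1_4_1 =
            {1188, 17770, 34895, 59992, 60816, 125142, 185450, 275471};
    static final long[] IBM_SDK_1_4_1 =
            {1291, 1583, 1782, 1987, 2328, 2667, 2970, 3386};

    /** Returns avgMs[i] / avgMs[0] for every thread count. */
    static double[] slowdown(long[] avgMs) {
        double[] factors = new double[avgMs.length];
        for (int i = 0; i < avgMs.length; i++) {
            factors[i] = (double) avgMs[i] / avgMs[0];
        }
        return factors;
    }

    public static void main(String[] args) {
        double[] sun = slowdown(SUN_JDK_1_4_1);
        double[] ibm = slowdown(IBM_SDK_1_4_1);
        for (int i = 0; i < sun.length; i++) {
            System.out.println((i + 1) + " threads: Sun x" + sun[i] + ", IBM x" + ibm[i]);
        }
    }
}
```

In the ideal parallel case every factor would be 1. A factor equal to the thread count corresponds to running the loads one after another on a single processor.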

We supplement this information with the additional tests described in Appendix C. These indicate
the source of the non-linear behaviour, which appears to stem from Java object creation. Removing
object creation from the load improves the Sun JDK performance with standard garbage collection
to a near-linear characteristic. However, the proportionality is still too high to obtain any benefit
from additional CPUs as the load increases. Under the same conditions, the IBM SDK suffers no
apparent performance penalty when CPUs are added to service the load. The primary reason is the
standard garbage collection routine implemented in the Sun JDK. The IBM SDK automatically applies a
parallel threaded GC implementation. The requirement to dispose of the created objects places a heavy
load on the standard Sun GC routine. It is expected that the Sun parallel GC would operate as
efficiently as the IBM implementation.

The Sun JDK 1.4.2 implementation employing the new enhancements for parallel processor GC
support provides a performance curve that indicates greater parallel thread management efficiency as
the number of threads increases. These results can be viewed in Appendix D. This can be inferred
from the decreasing rate of change as the number of threads increases, which is the expected
behaviour for a well-designed multi-threading system. We also direct your attention to the similarity
between the linear sections of the response and those of the non-object creation test results of
Appendix C. The similarity indicates that the parallel GC operation does not significantly distort the
response and that there is otherwise no major change to the operating environment for the test program.
This demonstrates continuity between execution in the environment of Appendix C and Appendix D,
and highlights the source of the differences between the standard environment and Appendix D.
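When comparing runs with different GC settings, it helps to confirm which collectors the JVM actually selected. The sketch below lists them through the `java.lang.management` API. That API was introduced in Java 5, so it was not available on the 1.4 JVMs tested here; it is offered only as a way to repeat the comparison on a later JVM. The class name `ActiveCollectors` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Sketch: report the garbage collectors active in the running JVM. */
final class ActiveCollectors {

    /** Returns the names of the collectors the JVM has registered. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Printing these figures at the end of a load run gives a rough indication of how much of the elapsed time was spent in collection under each setting.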

As with all studies at such low levels, a certain amount of interpretation must be accounted for, when
determining the impact on large and complex systems. Thread interruption, other synchronization
operations and concurrency may minimise the impact of any effects observed in these tests. These
other interactions are not investigated in these tests. Run-time optimizations may also improve
performance for certain code segments.

We do note, empirically, from our study of ECperf 1.1 results that greater variability in throughput
and worse response characteristics are possible under the Sun Java 2 SDK 1.4.2 with
standard GC. The extent of the throughput variability is less than 3 percent.

No test exists at this time that measures the contribution of the effects observed here to the
variability of the ECperf 1.1 results. The complexity of application server systems will result in
different characteristics manifesting themselves. It is possible that certain environmental combinations
may result in an application suffering severe performance degradation. The likelihood of particular
environmental conditions occurring is too difficult to compute for a sufficiently complex system.
We therefore provide this paper for information only.
Finally, we provide, as a separate extension, the results of parallel thread behaviour on a JBoss
client-based load. This may be found in the technical note, Parallel thread performance: A JBoss
client example. Those results corroborate the response characteristics of the original findings in this
report for the Sun JDK with standard settings, rather than the modified results tabled in Appendix C.
However, the magnitude of the degradation is much smaller. There is a separate paper that provides a
visual aid to place the results in context. This may be found in the technical note, Parallel thread
performance: A visual aid.

Of interest in the report on the JBoss client response characteristic is the minimal effect on response of
the new GC settings for the latest Sun JDK release. However, the new JDK has many other tuning
parameters that were not utilised in the testing. Currently, it appears such tuning requires in-depth
knowledge of load and program operation. It indicates that with increasingly complex server
configurations, there may be a need to invest heavily in testing and tuning the JVM for the expected
server load for optimal production operation. More than that cannot be determined at this time.


A Thread load generator

The program developed to generate loading for threads is based on a simple view of parallel operations
in a multi-processor environment. The load consists of performing a set number of operations in each
thread and measuring the time each thread requires to complete them. The threads have no explicit
interdependencies. The thread count and the load are tuneable, so the effect of increasing the number
of threads and increasing the load can be observed.
A.1 Test code

/*
 * Parallel.java
 *
 * Created on 2 July 2003, 11:51
 */

package com.amity.tester;

import java.util.Date;
import java.util.HashMap;

/**
 * Generates a fixed load on a configurable number of threads and reports
 * the elapsed time of each thread.
 *
 * @author jbarnett
 * @version 1.0
 */
public class Parallel
{
    /**
     * Worker thread used to test the throughput response. It prints its
     * elapsed time in milliseconds once the load completes.
     */
    class ThroughputTest extends Thread
    {
        private int id = 0;
        private int length = 0;

        public ThroughputTest(int id, int length)
        {
            this.id = id;
            this.length = length;
        }

        public void run()
        {
            HashMap map = new HashMap();
            String key = "Key";
            Date start = new Date();
            // The load: repeatedly replace the value stored under one key
            // with a newly created object.
            for (int i = length; i -- > 0;)
                map.put(key, new Object());
            Date end = new Date();
            long counter = end.getTime() - start.getTime();
            System.out.println("Load " + id + ": , run length = " + length
                + ", time = " + counter + " ms");
        }
    }

    /**
     * Starts the requested number of worker threads, each performing
     * <code>length</code> operations.
     */
    public void generateLoad(int threads, int length)
    {
        for (int i = 0; i < threads; i++)
            new ThroughputTest(i, length).start();
    }

    /**
     * @param args the command line arguments: thread count and run length
     */
    public static void main(String[] args)
    {
        Parallel test = new Parallel();
        if (args.length < 2)
        {
            System.err.println("Expected parallel thread count and test run length.");
            System.exit(1);
        }
        try
        {
            int i = Integer.parseInt(args[0]);
            int j = Integer.parseInt(args[1]);
            test.generateLoad(i, j);
        }
        catch (NumberFormatException e)
        {
            System.err.println("Arguments expected to be integers.");
            System.exit(2);
        }
    }
}
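For readers repeating the test on a later JVM, the sketch below shows a variant of the same load that releases all threads together and times them with `System.nanoTime()`. It is not the code used for the measurements in this report. It assumes Java 8 or later, and the class name `ParallelLatch` and the use of `CountDownLatch` as a start gate are our own choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Sketch: the HashMap put load with a common start signal for all threads. */
final class ParallelLatch {

    /** Runs the load on the given number of threads; returns elapsed ms per thread. */
    static long[] run(int threads, final int length) {
        final long[] elapsedMs = new long[threads];
        final CountDownLatch ready = new CountDownLatch(threads);
        final CountDownLatch go = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                Map<String, Object> map = new HashMap<>();
                ready.countDown();
                try {
                    go.await(); // wait until every worker has been started
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long start = System.nanoTime();
                for (int i = length; i-- > 0;) {
                    map.put("Key", new Object());
                }
                elapsedMs[id] = (System.nanoTime() - start) / 1_000_000L;
            });
            workers[t].start();
        }
        try {
            ready.await();
            go.countDown(); // release all workers at once
            for (Thread worker : workers) {
                worker.join(); // join makes each worker's result visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        long[] times = run(4, 100000);
        for (int i = 0; i < times.length; i++) {
            System.out.println("Load " + i + ": time = " + times[i] + " ms");
        }
    }
}
```

The start gate keeps thread creation time out of the measurement, and returning the timings instead of printing them from each thread keeps console output from interfering with the load.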


B Measurement data

The following tables and charts summarise the data accumulated from the tests. The calculation of the
standard deviation is based on the assumption that the sample represents a normally distributed
population.
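The summary columns used in the tables of this appendix can be reproduced from raw thread timings along the following lines. This is a sketch only: the report does not state whether the standard deviation divides by n or by n - 1, so the sample form (n - 1) used here is an assumption, and the class name `SampleStats` and the timings in `main` are hypothetical.

```java
/** Sketch: minimum, maximum, mean and sample standard deviation of timings. */
final class SampleStats {
    final long min;
    final long max;
    final double mean;
    final double stdDev;

    private SampleStats(long min, long max, double mean, double stdDev) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.stdDev = stdDev;
    }

    static SampleStats of(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        long min = samples[0];
        long max = samples[0];
        double sum = 0.0;
        for (long s : samples) {
            min = Math.min(min, s);
            max = Math.max(max, s);
            sum += s;
        }
        double mean = sum / samples.length;
        double squares = 0.0;
        for (long s : samples) {
            double d = s - mean;
            squares += d * d;
        }
        // Assumption: sample standard deviation (divide by n - 1).
        double stdDev = samples.length > 1
                ? Math.sqrt(squares / (samples.length - 1))
                : 0.0;
        return new SampleStats(min, max, mean, stdDev);
    }

    public static void main(String[] args) {
        long[] timingsMs = {16, 17, 17, 16, 17, 17}; // hypothetical timings
        SampleStats stats = of(timingsMs);
        System.out.println("Min. " + stats.min + ", Max. " + stats.max
                + ", Avg. " + stats.mean + ", Std. Dev. " + stats.stdDev);
    }
}
```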
B.1 Short duration tests

These are the results obtained from tests in which 10000 HashMap put operations were performed per
thread.
B.1.1 Sun JDK 1.4.1 03

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       16       17       17       1
2        16       26       45       34       6
3        15       30       69       52       13
4        16       36       94       74       19
5        15       34       130      96       28
6        18       31       163      120      39
7        21       57       226      158      42
8        16       216      301      259      27

[Figure: Completion time for a short duration load: Sun JDK 1.4.1 03. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

B.1.2 Sun JDK 1.4.2

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       16       28       18       4
2        16       27       50       36       7
3        15       36       77       57       12
4        16       42       115      85       18
5        15       4        128      93       38
6        18       64       163      125      29
7        21       4        230      127      73
8        16       4        148      87       51

[Figure: Completion time for a short duration load: Sun JDK 1.4.2. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

B.1.3 IBM SDK 1.4.1 (NPTL)

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       41       43       41       1
2        16       33       46       40       4
3        15       33       45       39       4
4        16       33       45       39       4
5        15       40       49       43       3
6        18       35       47       40       4
7        21       40       49       43       3
8        16       44       52       47       3

[Figure: Completion time for a short duration load: IBM SDK 1.4.1 (NPTL). Duration (ms) against
threads (1 to 8); series: Min., Max., Avg., Trend.]







B.1.4 IBM SDK 1.4.1

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       41       42       41       0
2        16       34       46       42       4
3        15       34       44       39       3
4        16       39       47       43       3
5        15       38       47       42       3
6        18       41       48       43       2
7        21       38       50       42       3
8        16       30       48       39       6

[Figure: Completion time for a short duration load: IBM SDK 1.4.1. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

B.2 Medium duration tests

These are the results obtained from tests in which 100000 HashMap put operations were performed
per thread.
B.2.1 Sun JDK 1.4.1 03

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       60       63       61       1
2        16       134      1250     212      277
3        15       165      602      370      149
4        16       144      1092     808      294
5        15       162      1220     962      389
6        18       468      1843     1599     428
7        21       1249     2087     1738     398
8        16       2679     2833     2767     44

[Figure: Completion time for a medium duration load: Sun JDK 1.4.1 03. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

B.2.2 Sun JDK 1.4.2

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       64       70       67       1
2        16       173      194      187      6
3        15       173      587      369      133
4        16       186      1045     749      282
5        15       159      1214     957      387
6        18       498      1753     1329     464
7        21       1133     2053     1685     412
8        16       2635     2862     2792     56

[Figure: Completion time for a medium duration load: Sun JDK 1.4.2. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]







B.2.3 IBM SDK 1.4.1 (NPTL)

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       50       52       50       1
2        16       51       60       57       3
3        15       51       62       57       3
4        16       54       68       60       4
5        15       60       75       65       5
6        18       60       90       72       10
7        21       64       102      79       12
8        16       58       97       81       11

[Figure: Completion time for a medium duration load: IBM SDK 1.4.1 (NPTL). Duration (ms) against
threads (1 to 8); series: Min., Max., Avg., Trend.]

B.2.4 IBM SDK 1.4.1

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       50       51       50       0
2        16       49       60       56       4
3        15       52       62       57       4
4        16       54       72       61       5
5        15       58       76       67       5
6        18       59       79       66       5
7        21       64       88       74       8
8        16       67       97       79       8

[Figure: Completion time for a medium duration load: IBM SDK 1.4.1. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

B.3 Long duration tests

These are the results obtained from tests in which 10000000 HashMap put operations were performed
per thread.
B.3.1 Sun JDK 1.4.1 03

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       1179     1205     1188     7
2        16       10180    26655    17770    5642
3        15       10264    62205    34895    19057
4        16       10352    78432    59992    20949
5        15       11278    84078    60816    25527
6        18       75930    154238   125142   23125
7        21       158374   208184   185450   16642
8        16       272652   276413   275471   998

[Figure: Completion time for a long duration load: Sun JDK 1.4.1 03. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]


B.3.2 Sun JDK 1.4.2

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       1244     1264     1253     5
2        16       11045    21741    16803    4306
3        15       10179    36945    24694    11278
4        16       18295    95320    71551    24132
5        15       10382    94081    60941    32140
6        18       87574    156594   132759   23243
7        21       177766   210608   195018   10277
8        16       283335   284151   283774   261

[Figure: Completion time for a long duration load: Sun JDK 1.4.2. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

B.3.3 IBM SDK 1.4.1 (NPTL)

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       1408     1718     1567     107
2        16       1910     2605     2209     189
3        15       2386     3368     2831     296
4        16       3184     3559     3385     119
5        15       3513     4133     3860     180
6        18       3824     4567     4173     272
7        21       4228     5121     4751     257
8        16       5147     5446     5322     89

[Figure: Completion time for a long duration load: IBM SDK 1.4.1 (NPTL). Duration (ms) against
threads (1 to 8); series: Min., Max., Avg., Trend.]

B.3.4 IBM SDK 1.4.1

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       1279     1306     1291     9
2        16       1450     1719     1583     78
3        15       1628     1890     1782     87
4        16       1865     2127     1987     81
5        15       2167     2465     2328     98
6        18       2453     3086     2667     136
7        21       2704     3169     2970     130
8        16       2962     3778     3386     240

[Figure: Completion time for a long duration load: IBM SDK 1.4.1. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]



C Supplemental information

An important question was raised about the load we imposed for the testing. The implementation of
the HashMap is known to differ between distributions. We also realise that the Java Virtual Machines
have different execution speeds. However, our intent is not to compare the results directly in terms of
head-to-head performance. We are more interested in the characteristics of the performance in relation
to our notion of expected parallel process behaviour. We believe the HashMap implementation is
complex enough that it is not a trivial case for the run-time optimizer of the JVMs tested, and
complex enough to resemble a real code segment.

As for our expectations on parallel execution, the most important is that the execution time of each
thread load should not increase in a 1 to 1 proportion as threads increase. That is, in a linear model, the
doubling of processors should not result in the execution duration of each thread also doubling. Such a
characteristic would render the addition of parallel computing power ineffective. Similarly, any
polynomial function that models the performance should not attain a result that approaches or exceeds
this proportionality for the salient number of CPUs. We leave it to the reader to determine the
acceptable limit for proportionality.
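One way to put a number on this proportionality is to fit a straight line to the average completion time against the thread count and divide its slope by the single-thread time. A result near 0 means added threads cost almost nothing, while a result of 1 or more means each added thread costs at least a full single-thread run, so additional CPUs bring no benefit. The sketch below applies this to the averages tabled in C.2.2 and C.2.5. The least-squares fit and the names used are our own choice of measure, not a method prescribed by this report.

```java
/** Sketch: proportionality of completion time growth to thread count. */
final class Proportionality {

    // Average completion times in ms for 1 to 8 threads.
    static final double[] SUN_JDK_1_4_1 =   // from C.2.2
            {391, 1183, 1894, 3188, 3193, 5375, 6144, 7712};
    static final double[] IBM_SDK_1_4_1 =   // from C.2.5
            {285, 286, 284, 291, 297, 286, 290, 285};

    /** Least-squares slope of avgMs against thread count 1..n, in ms per thread. */
    static double slopePerThread(double[] avgMs) {
        int n = avgMs.length;
        if (n < 2) {
            throw new IllegalArgumentException("at least two points are required");
        }
        double meanX = (n + 1) / 2.0;
        double meanY = 0.0;
        for (double y : avgMs) {
            meanY += y;
        }
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = (i + 1) - meanX;
            sxy += dx * (avgMs[i] - meanY);
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    /** Slope expressed as a multiple of the single-thread completion time. */
    static double proportionality(double[] avgMs) {
        return slopePerThread(avgMs) / avgMs[0];
    }

    public static void main(String[] args) {
        System.out.println("Sun JDK 1.4.1 03: " + proportionality(SUN_JDK_1_4_1));
        System.out.println("IBM SDK 1.4.1:    " + proportionality(IBM_SDK_1_4_1));
    }
}
```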

We have, as a result of this question, also explored the components of the load and present this
supplemental information. The test results are based on 10000000 operations. The environment and
Java configurations have not been changed.
C.1 Object creation

We reduced the load to the creation of objects. This was expected to be a poor test as it is a simple and
common operation.
C.1.1 Code fragment for load

The load was changed to this.

    Object mapObject;
    for (int i = length; i -- > 0;)
        mapObject = new Object();

C.1.2 Sun JDK 1.4.1 03

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       782      805      789      5
2        16       9752     23535    15113    5383
3        15       8732     31975    21189    9515
4        16       7896     69874    45631    23605
5        15       7580     108776   76972    32259
6        18       61418    161535   134323   29412
7        21       167081   206084   191344   13328
8        16       253547   263096   260380   3073

[Figure: Completion time for a long duration load: Sun JDK 1.4.1 03. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]


C.1.3 Sun JDK 1.4.2

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       783      792      786      3
2        16       9843     24175    18526    5876
3        15       8685     43000    27715    11354
4        16       7870     69684    37920    22265
5        15       7546     108000   64099    35415
6        18       113865   164908   139478   18593
7        21       187882   200822   195528   3950
8        16       241567   259132   253036   5219

[Figure: Completion time for a long duration load: Sun JDK 1.4.2. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]






C.1.4 IBM SDK 1.4.1 (NPTL)

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       24       26       25       1
2        16       7        26       17       9
3        15       7        25       14       8
4        16       6        27       12       9
5        15       6        28       15       9
6        18       7        27       12       8
7        21       6        31       13       8
8        16       7        28       12       8

[Figure: Completion time for a long duration load: IBM SDK 1.4.1 (NPTL). Duration (ms) against
threads (1 to 8); series: Min., Max., Avg., Trend.]

C.1.5 IBM SDK 1.4.1

Threads  Samples  Min.     Max.     Avg.     Std. Dev.
1        16       24       24       24       0
2        16       15       25       22       3
3        15       7        26       18       7
4        16       7        25       15       8
5        15       7        25       13       8
6        18       7        25       12       7
7        21       7        31       14       9
8        16       7        26       13       8

[Figure: Completion time for a long duration load: IBM SDK 1.4.1. Duration (ms) against threads
(1 to 8); series: Min., Max., Avg., Trend.]

C.1.6 Comments

The major source of poor parallel execution for the Sun JDKs appears to be related to the creation of
objects in parallel threads. The IBM SDKs improve in performance as more CPUs are used, but we
suspect that the run-time execution has optimized away some of the work because this is such a
simple operation.
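If the run-time optimizer is suspected of removing the allocation altogether, the load can be changed so that each new object is stored somewhere the optimizer must treat as visible. The sketch below writes each object into a small array that is returned to the caller. This is our suggestion for a follow-up test, not something measured in this study, and it only makes elimination harder rather than guaranteeing that every allocation takes place. The class name and the window size of 8 are arbitrary.

```java
/** Sketch: an object creation load whose results remain observable. */
final class ObservableAllocation {

    /**
     * Creates length objects, keeping the most recent ones in a small
     * rotating window that is returned to the caller.
     */
    static Object[] allocate(int length) {
        Object[] window = new Object[8];
        for (int i = length; i-- > 0;) {
            // Storing into an array that escapes the method gives the
            // run-time compiler less freedom to drop the allocation.
            window[i & 7] = new Object();
        }
        return window;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Object[] window = allocate(10000000);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("time = " + elapsedMs + " ms, last object = " + window[0]);
    }
}
```

The extra array store adds a small, constant cost per iteration, so timings from this variant are comparable with each other but not directly with the figures tabled above.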
C.2 HashMap object insertion

We reduce the load by restricting the operation to inserting a pre-existing object into the HashMap.
This removes the creation penalty discovered in the previous test while retaining the HashMap load
complexity.
C.2.1 Code fragment for load

The load was changed to the following:

    // A single object is created once and reused for every insertion.
    Object mapObject = new Object();
    for (int i = length; i-- > 0;)
        map.put(key, mapObject);
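
For readers who want to try this load themselves, the following is a minimal sketch of how the fragment above could be driven from several threads at once. It is not the test code from Appendix A. The class name, the per-thread HashMap, the fixed Integer key and the use of System.nanoTime are all assumptions made for illustration, and the syntax is that of a current Java release rather than the 1.4 JVMs tested here.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the C.2 load: each thread inserts one pre-existing object into a HashMap. */
final class HashMapInsertLoad {

    /** Runs the insertion loop once and returns the elapsed time in milliseconds. */
    static long runLoad(int length) {
        Map<Integer, Object> map = new HashMap<>();  // assumption: one map per thread
        Integer key = Integer.valueOf(1);            // assumption: a single fixed key
        Object mapObject = new Object();             // created once, reused for every put
        long start = System.nanoTime();
        for (int i = length; i-- > 0;) {
            map.put(key, mapObject);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Starts the given number of threads and returns each thread's completion time. */
    static long[] runParallel(int threads, int length) {
        long[] durations = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> durations[slot] = runLoad(length));
        }
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();  // join makes the worker's write to durations visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for load threads", e);
            }
        }
        return durations;
    }

    public static void main(String[] args) {
        long[] d = runParallel(4, 1_000_000);
        for (int t = 0; t < d.length; t++) {
            System.out.println("thread " + t + ": " + d[t] + " ms");
        }
    }
}
```

Each entry in the returned array corresponds to one sample in the tables that follow. Absolute timings on a modern JVM will not be comparable with the figures reported here.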
C.2.2 Sun JDK 1.4.1_03

Durations in ms.

Threads  Samples     Min.     Max.     Avg.  Std. Dev.
      1       16      389      396      391          2
      2       16      953     1369     1183        170
      3       15     1196     2081     1894        279
      4       16     1109     4021     3188        981
      5       15     1080     4422     3193       1057
      6       18     3053     6653     5375       1108
      7       21     4998     6999     6144        487
      8       16     6589     8727     7712        718

[Chart: Completion time for a long duration load: Sun JDK 1.4.1_03. Axes: Threads (1 to 8), Duration (ms, 0 to 10000). Series: Min., Max., Avg., Trend.]

C.2.3 Sun JDK 1.4.2

Durations in ms.

Threads  Samples     Min.     Max.     Avg.  Std. Dev.
      1       16      523      536      527          3
      2       16      968     1268     1119        148
      3       15     1110     2000     1905        228
      4       16     1175     3598     2297        762
      5       15     1019     4387     3486        928
      6       18     2662     6651     4805       1277
      7       21     5594     6422     6052        263
      8       16     5890     7557     6601        551

[Chart: Completion time for a long duration load: Sun JDK 1.4.2. Axes: Threads (1 to 8), Duration (ms, 0 to 8000). Series: Min., Max., Avg., Trend.]

C.2.4 IBM SDK 1.4.1 (NPTL)

Durations in ms.

Threads  Samples     Min.     Max.     Avg.  Std. Dev.
      1       16      294      302      299          3
      2       16      285      300      294          4
      3       15      279      306      294          7
      4       16      279      307      293          7
      5       15      293      312      302          6
      6       18      283      311      298          8
      7       21      248      312      292         18
      8       16      276      310      298         11

[Chart: Completion time for a long duration load: IBM SDK 1.4.1 (NPTL). Axes: Threads (1 to 8), Duration (ms, 0 to 350). Series: Min., Max., Avg., Trend.]

C.2.5 IBM SDK 1.4.1

Durations in ms.

Threads  Samples     Min.     Max.     Avg.  Std. Dev.
      1       16      284      287      285          1
      2       16      279      298      286          5
      3       15      276      290      284          4
      4       16      283      310      291          7
      5       15      288      310      297          7
      6       18      280      296      286          5
      7       21      283      299      290          5
      8       16      200      311      285         34

[Chart: Completion time for a long duration load: IBM SDK 1.4.1. Axes: Threads (1 to 8), Duration (ms, 0 to 350). Series: Min., Max., Avg., Trend.]

C.2.6 Comments

With object creation removed from the load, the Sun JDK results improved. However, performance
remained poor: the proportionality limits cited earlier were still either approached or surpassed. The
IBM SDK results indicate little parallel load penalty for this type of operation.
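
As a reading aid for the tables in C.2.2 and C.2.4, the sketch below divides each N-thread average by the single-thread average. It does not restate the proportionality limits, which are defined earlier in this document. The only assumption is that every thread performs the same load as the single-thread run, so a ratio above N means the N threads took longer on average than running the same loads one after another. The class and method names are hypothetical.

```java
/** Sketch: compare multi-thread average completion times with the single-thread baseline. */
final class SlowdownRatio {

    /** Ratio of the N-thread average to the 1-thread average. */
    static double slowdown(double singleThreadAvgMs, double nThreadAvgMs) {
        return nThreadAvgMs / singleThreadAvgMs;
    }

    public static void main(String[] args) {
        // Avg. column for 1 to 8 threads, copied from the tables in C.2.2 (Sun) and C.2.4 (IBM).
        int[] sun = {391, 1183, 1894, 3188, 3193, 5375, 6144, 7712};
        int[] ibm = {299, 294, 294, 293, 302, 298, 292, 298};
        for (int n = 1; n <= sun.length; n++) {
            System.out.printf("%d threads: Sun x%.2f, IBM x%.2f%n", n,
                    slowdown(sun[0], sun[n - 1]), slowdown(ibm[0], ibm[n - 1]));
        }
    }
}
```

For eight threads this gives a ratio of roughly 19.7 for the Sun JDK and roughly 1.0 for the IBM SDK, which is the contrast the comments above describe.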


D Parallel GC in Sun Java

As was recently pointed out, the Sun Java tests were unfair because we did not use the new parallel
garbage collection system. This is quite true, and we have sought to address it. These additional
tests show that the standard garbage collector is responsible for many of the performance issues
we first encountered. We should again stress that our intention was not to single out any JVM but
rather to understand the effects we experienced when running some fairly simple tests on a
multiprocessor machine.

We have re-run the long duration tests and present the results in the same format. We have used the
same configurations but with the following options:
-XX:+UseParNewGC

and:

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

These settings are comparable to those in the IBM Java 1.4.1 implementation.
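
When repeating tests of this kind it is easy to launch a run without the intended options. The following sketch shows one way to confirm from inside a program that the options above were passed on the command line. It assumes Java 5 or later, since it relies on the java.lang.management package, so it would not run on the 1.4 JVMs tested here. It only checks the start-up arguments, not which collector is actually in use. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: report whether the collector options discussed above were passed to this JVM. */
final class GcFlagCheck {

    /** True if the exact option string appears among the JVM start-up arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments given to the JVM itself, not those given to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UseParNewGC passed: " + hasFlag(jvmArgs, "-XX:+UseParNewGC"));
        System.out.println("UseConcMarkSweepGC passed: "
                + hasFlag(jvmArgs, "-XX:+UseConcMarkSweepGC"));
    }
}
```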
D.1.1 Sun JDK 1.4.2 with UseParNewGC

Durations in ms.

Threads  Samples     Min.     Max.     Avg.  Std. Dev.
      1       16     1179     1235     1206         18
      2       16     1943     4182     2767        705
      3      *15     2311     4395     3613        705
      4       16     2991     5905     4704        971
      5       15     3127     6896     5720        877
      6       18     4362     9002     6769       1161
      7       21     6257     7655     6883        405
      8       16     6898     8307     7327        429

* One set of results for the 3-thread tests returned an anomalous combination of 4117, 24334 and
24314. We discarded this set, although we are at a loss to explain the deviation, as the test machine had
no other loads running. It is possible that a Linux process blocked progress in the test.
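
To make the table columns concrete, the sketch below shows how Min., Max., Avg. and Std. Dev. could be computed from a set of per-thread completion times. It uses the discarded 3-thread set quoted above as sample input. This document does not say which form of standard deviation was used, so the sample form (divisor n - 1) is an assumption, and the class name is hypothetical.

```java
/** Sketch: summary statistics for a set of thread completion times in milliseconds. */
final class DurationStats {
    final long min;
    final long max;
    final double avg;
    final double stdDev;

    DurationStats(long[] samplesMs) {
        if (samplesMs.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        long sum = 0;
        for (long s : samplesMs) {
            lo = Math.min(lo, s);
            hi = Math.max(hi, s);
            sum += s;
        }
        double mean = (double) sum / samplesMs.length;
        double squares = 0.0;
        for (long s : samplesMs) {
            squares += (s - mean) * (s - mean);
        }
        min = lo;
        max = hi;
        avg = mean;
        // Assumption: sample standard deviation (n - 1 divisor).
        stdDev = Math.sqrt(squares / (samplesMs.length - 1));
    }

    public static void main(String[] args) {
        // The discarded 3-thread result set quoted above.
        DurationStats s = new DurationStats(new long[] {4117, 24334, 24314});
        System.out.printf("min=%d max=%d avg=%.0f stddev=%.0f%n", s.min, s.max, s.avg, s.stdDev);
    }
}
```

The 24334 ms maximum in the discarded set is more than five times the 4395 ms maximum of the 3-thread results that were kept.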

[Chart: Completion time for a long duration load: Sun JDK 1.4.2 with UseParNewGC. Axes: Threads (1 to 8), Duration (ms, 0 to 10000). Series: Min., Max., Avg., Trend.]

D.1.2 Sun JDK 1.4.2 with UseParNewGC and UseConcMarkSweepGC

Durations in ms.

Threads  Samples     Min.     Max.     Avg.  Std. Dev.
      1       16     1206     1235     1219         10
      2       16     1941     3602     2702        620
      3       15     3159     4390     3521        509
      4       16     3861     6129     5367        888
      5       15     3085     7593     5463       1280
      6       18     5613    11122     7796       1803
      7      *21     5797     8991     7541       1046
      8       16     6718     8266     7102        382

* One set of results for the 7-thread tests returned an anomalous combination of 2074, 3356, 3377,
19443, 19524, 20325 and 20331. We discarded this set, although we are at a loss to explain the deviation,
as the test machine had no other loads running. It is possible that a Linux process blocked progress in the test.

[Chart: Completion time for a long duration load: Sun JDK 1.4.2 with both GC options. Axes: Threads (1 to 8), Duration (ms, 0 to 12000). Series: Min., Max., Avg., Trend.]

D.1.3 Comments

The results presented in this final addition show a marked turnaround in response characteristics. They
are in line with expectations for parallel thread operation and suggest that adding processors may
reduce the management overhead. However, completion times admittedly still increase as threads are
added, which may remain a concern for some readers. As always, more complex scenarios may render
these simple tests invalid in a real situation. Therefore, use these results only as a guide to the effects
that may prevail in your JVM environment.