JMS Performance with WebSphere MQ for Windows V6.0

04 August 2005


Marc Carter


WebSphere MQ Performance

WMQPG@uk.ibm.com


IBM UK Laboratories

Hursley Park

Winchester

Hampshire

SO21 2JN



Property of IBM


JMS Performance with WebSphere MQ for
Windows V6.0

Version 1.1
Please take Note!

Before using this report, please be sure to read the paragraphs on “disclaimers”, “warranty and liability
exclusion”, “errors and omissions”, and the other general information paragraphs in the "Notices" section
below.
Second Edition, August 2005.

This edition applies to WebSphere MQ V6 for Windows (and to all subsequent releases and modifications
until otherwise indicated in new editions).
© Copyright International Business Machines Corporation 2005. All rights reserved.

Note to U.S. Government Users

Documentation related to restricted rights.
Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule contract with IBM
Corp.
Notices

DISCLAIMERS
The performance data contained in this report were measured in a controlled environment. Results obtained
in other environments may vary significantly.
You should not assume that the information contained in this report has been submitted to any formal testing
by IBM.
Any use of this information and implementation of any of the techniques are the responsibility of the licensed
user. Much depends on the ability of the licensed user to evaluate the data and to project the results into
their own operational environment.
WARRANTY AND LIABILITY EXCLUSION
The following paragraph does not apply to the United Kingdom or any other country where such provisions
are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS”
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A
PARTICULAR PURPOSE.
Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore this
statement may not apply to you.
In Germany and Austria, notwithstanding the above exclusions, IBM's warranty and liability are governed
only by the respective terms applicable for Germany and Austria in the corresponding IBM program license
agreement(s).
ERRORS AND OMISSIONS
The information set forth in this report could include technical inaccuracies or typographical errors. Changes
are periodically made to the information herein; any such change will be incorporated in new editions of the
information. IBM may make improvements and/or changes in the product(s) and/or the program(s) described
in this information at any time and without notice.

INTENDED AUDIENCE
This report is intended for architects, systems programmers, analysts and programmers wanting to
understand the performance characteristics of WebSphere MQ V6 for Windows. The information is not
intended as the specification of any programming interface that is provided by WebSphere MQ V6. It is
assumed that the reader is familiar with the concepts and operation of WebSphere MQ V6.

LOCAL AVAILABILITY
References in this report to IBM products or programs do not imply that IBM intends to make these available
in all countries in which IBM operates. Consult your local IBM representative for information on the products
and services currently available in your area.
ALTERNATIVE PRODUCTS AND SERVICES
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that does
not infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.

USE OF INFORMATION PROVIDED BY YOU
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Trademarks and service marks
The following terms used in this publication are trademarks of International Business Machines Corporation
in the United States, other countries or both:
IBM
WebSphere
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or
both.
Other company, product, and service names may be trademarks or service marks of others.

EXPORT REGULATIONS
You agree to comply with all applicable export and import laws and regulations.
Preface
This report presents the results of performance evaluations of the Java and JMS clients supplied with
WebSphere MQ for Windows V6.0, and is intended to assist with programming and capacity planning. An
IBM eSeries x360 machine (4-way 2.8GHz CPU, 8 GB RAM) running Windows 2003 was used as the server
under test for all the measurements in this report. For full details of the measurement environment see page
26.
Target Audience
This SupportPac is designed for people who:
• Will be designing and implementing environments using Java/JMS clients with WebSphere MQ for
Windows V6.0
• Want to understand the performance characteristics of JMS.
Readers should have a general awareness of the Java programming language, the Java Message Service
API, the Windows 2003 operating system and of WebSphere MQ in order to make best use of this
SupportPac.
The contents of this SupportPac
This SupportPac includes:
• Charts and tables describing the performance headlines of JMS using WebSphere MQ V6.0
• Performance characteristics of options within JMS
• WebSphere MQ messaging comparisons between Java, JMS and C (MQI) programming interfaces
• Advice on programming with WebSphere MQ JMS for performance

Feedback on this SupportPac
The reports and tools are produced to help you understand the performance characteristics of JMS in
WebSphere MQ V6.0 and to assist you with capacity planning. To ensure that the reports and tools are
effective in what they do, it is useful to have feedback on the content and style of the information that we
produce. All comments, both positive and negative, are therefore welcome.
We are particularly interested in your answers to the following questions:
• What are your most common performance questions?
• Do the reports provide what you need?
• Is there any other performance information that is required to help you do your job?
• Would you like to see any other aspects of JMS performance discussed?

Please send feedback to WMQPG@uk.ibm.com
CONTENTS
1 Headline performance (page 1)
1.1 WebSphere MQ V5.3 performance comparison (page 1)
1.2 WebSphere MQ V6.0 JMS point-to-point (page 4)
1.3 WebSphere MQ V6.0 JMS publish-subscribe (page 7)
2 Additional comparisons (page 9)
2.1 JMS reliable messaging (page 9)
2.2 Local bindings connection (page 10)
2.3 JMS message types (page 12)
2.4 JMS message sizes (page 13)
2.5 JMS selectors (page 16)
2.6 Java, JMS and C programming interfaces (page 18)
2.7 Windows and Linux JMS client efficiency (page 20)
3 Tuning/programming guidelines (page 21)
3.1 Tuning the queue manager (page 21)
3.2 Tuning minimum heap size for Java (page 21)
3.3 Use of correlation identifiers (page 22)
3.4 Asynchronous receivers (page 22)
3.5 How WebSphere MQ JMS is connected to the C interface (MQI) (page 22)
4 Scenarios used in this document (page 24)
4.1 Onboard request-response scenario (page 24)
4.2 Offboard request-response scenario (page 24)
4.3 Client send-receive scenario (page 25)
4.4 Local send-receive scenario (page 25)
4.5 Publish-subscribe scenario 1-n (page 25)
5 Measurement environment (page 26)
5.1 Hardware (page 26)
5.2 Software (page 26)


1 Headline performance
1.1 WebSphere MQ V5.3 performance comparison
The results summarised here demonstrate the performance of WebSphere MQ V5.3 and WebSphere MQ
V6.0. Each WebSphere MQ product level was tested with the JMS level packaged with that release. To
differentiate the changes in the JMS client, please see Section 1.1.3. To ensure a fair comparison, the JMS
1.0.2 API is used for all tests in this section.
1.1.1 Point-to-point peak throughput
The Onboard request-response scenario is used for these tests (see page 24). More results from this scenario
are in section 1.2.
[Bar chart: peak roundtrips / sec for nonpersistent and persistent messaging; series 6.0 GA, 5.3 CSD 10 and 5.3 CSD 1]

Figure 1 – WebSphere MQ point-to-point JMS comparison

             Nonpersistent             Persistent
Product      Peak        Server CPU    Peak        Server CPU
             throughput  (%)           throughput  (%)
6.0 GA       6343        97.50         2588        20.67
5.3 CSD 10   6452        97.12         2581        88.79
5.3 CSD 1    5836        97.96         2465        88.83

Table 1 – WebSphere MQ point-to-point JMS comparison

The throughput of this test is the peak number of requests processed per second. When compared to
WebSphere MQ V5.3 CSD 1, throughput increases by 8.6% for nonpersistent messaging and 5.0% for
persistent messaging.

1.1.2 Publish-subscribe peak throughput
Publish-subscribe scenario 1-n is used for these tests (see page 25). More results from this scenario are in
section 1.3.
SupportPac MA0C has provided WMQ publish-subscribe functionality for many years. As of WebSphere MQ
V5.3 CSD 8, this is now part of the standard WMQ installation.

[Bar chart: peak messages / sec for nonpersistent and persistent messaging; series 6.0 GA, 5.3 CSD 10 and 5.3 CSD 1]

Figure 2 – WebSphere MQ publish-subscribe JMS comparison

             Nonpersistent             Persistent
Product      Peak        Server CPU    Peak        Server CPU
             throughput  (%)           throughput  (%)
6.0 GA       29876       86.08         4616        3.58
5.3 CSD 10   30425       90.56         5305        32.94
5.3 CSD 1    26215       71.92         5162        30.86

Table 2 - WebSphere MQ publish-subscribe JMS comparison

The throughput of this test is the total number of messages going in and out of the broker. When compared
to WebSphere MQ V5.3 CSD 1, throughput has increased by 14.0% for nonpersistent messaging and
decreased by 10.6% for persistent messaging.
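
The percentages quoted above can be reproduced from the peak figures in Table 2. The following is a minimal sketch, not part of the measurement tooling; the class and method names are illustrative assumptions.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name): relative change between two peak rates. */
final class ThroughputChange {

    /** Returns the change from baseline to current in percent; negative means slower. */
    static double percentChange(double baseline, double current) {
        return (current - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Peak messages/sec from Table 2: 5.3 CSD 1 (baseline) versus 6.0 GA.
        double nonpersistent = percentChange(26215, 29876);
        double persistent = percentChange(5162, 4616);
        System.out.printf(Locale.ROOT, "nonpersistent: %+.1f%%%n", nonpersistent);
        System.out.printf(Locale.ROOT, "persistent:    %+.1f%%%n", persistent);
    }
}
```

The same calculation applies to any pair of peak rates in this report, provided both were measured with the same scenario.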

1.1.3 JMS client efficiency
The following figure demonstrates the CPU that is used by the client Java processes in the point-to-point
tests. The Client send-receive scenario is used (see page 25). This is a measure of efficiency not
throughput, therefore lower values are better.

[Bar chart: CPU per 1000 rt for nonpersistent and persistent messaging; series 6.0 GA, 5.3 CSD 10 and 5.3 CSD 1]

Figure 3 - JMS client efficiency

             Nonpersistent                         Persistent
Product      Peak        Mean client  CPU per     Peak        Mean client  CPU per
             throughput  CPU (%)      1000 rt     throughput  CPU (%)      1000 rt
6.0 GA       9681        26.08        2.7         3768        14.62        3.9
5.3 CSD 10   10448       30.19        2.9         3940        16.94        4.3
5.3 CSD 1    11265       32.97        2.9         4116        18.09        4.4

Table 3 - JMS client efficiency

The validity of a simple metric like “CPU per 1000 roundtrips” relies on a linear relationship between
throughput and CPU. Although it is not shown here, this has been confirmed on all corresponding results.
Figure 3 demonstrates that messaging efficiency for nonpersistent messages in WebSphere MQ V6.0 has
improved by 8.0%. This can be stated as “the same throughput now takes 8% less CPU in the clients”.
In the persistent case, the efficiency improvement is 11.7% over WebSphere MQ V5.3 CSD 1.
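
As an illustration of how the “CPU per 1000 roundtrips” metric and the efficiency improvements can be derived from the columns of Table 3, consider the following sketch. It assumes the metric is simply mean client CPU divided by peak throughput, scaled by 1000, which is consistent with the tabulated values; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name) for the client efficiency metric. */
final class ClientEfficiency {

    /** Mean client CPU (%) consumed per 1000 roundtrips per second. */
    static double cpuPer1000(double meanCpuPercent, double roundtripsPerSec) {
        return meanCpuPercent / roundtripsPerSec * 1000.0;
    }

    /** Reduction in CPU cost from oldCost to newCost, in percent. */
    static double improvement(double oldCost, double newCost) {
        return (oldCost - newCost) / oldCost * 100.0;
    }

    public static void main(String[] args) {
        // Nonpersistent rows of Table 3: 5.3 CSD 1 and 6.0 GA.
        double oldNp = cpuPer1000(32.97, 11265);
        double newNp = cpuPer1000(26.08, 9681);
        // Persistent rows of Table 3: 5.3 CSD 1 and 6.0 GA.
        double oldP = cpuPer1000(18.09, 4116);
        double newP = cpuPer1000(14.62, 3768);
        System.out.printf(Locale.ROOT, "nonpersistent improvement: %.1f%%%n", improvement(oldNp, newNp));
        System.out.printf(Locale.ROOT, "persistent improvement:    %.1f%%%n", improvement(oldP, newP));
    }
}
```

Because the metric divides CPU by throughput, it is only meaningful while CPU scales linearly with throughput, as noted above.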

1.2 WebSphere MQ V6.0 JMS point-to-point
There are several measures of point-to-point messaging that can be used to demonstrate different
scenarios. These figures represent the performance of the server under test and care should be taken when
extrapolating data to other environments.
All tests were performed using the JMS 1.1 API and 2KB messages.
1.2.1 Maximum queue throughput
To demonstrate the maximum throughput of a single queue, the Client send-receive scenario is used (see
page 25). In these tests, each client sends a message to a queue then gets the same message back.
Throughput is measured as the number of completed send-receive operations, also known as roundtrips.
1.2.1.1 Nonpersistent

[Line chart: roundtrips / sec against number of requestors (0 to 50); series 1 queue and 4 queues]

Figure 4 - Maximum nonpersistent queue throughput

             1 queue                               4 queues
requestors   Throughput  Server CPU  Mean client   Throughput  Server CPU  Mean client
                         (%)         CPU (%)                   (%)         CPU (%)
2            2268        13.21       5.91          2298        12.21       5.98
8            7499        46.29       19.78         7580        45.75       19.99
14           10040       76.50       26.06         10903       75.67       28.48
20           9840        78.17       26.01         12852       93.42       34.99
26           9616        78.17       25.99         13661       96.75       40.15
32           9514        78.08       26.23         13889       97.96       44.38
38           9435        77.75       26.35         13893       98.25       47.35
44           9339        78.29       26.68         13680       98.46       49.00

Table 4 - Maximum nonpersistent queue throughput

The results demonstrate that higher total throughput is achieved by using more than a single queue. As with
any other shared resource, access to a queue must be serialised. On any given system, there is a maximum
rate at which a single shared resource can be accessed. It is therefore advisable on a high-volume system
to use multiple queues and hence reduce the contention on each.
Table 4 shows that too much contention on a single queue has limited the system’s ability to process
messages. With multiple queues, 98% CPU utilisation is reported on server and the peak rate is 38.3%
higher.
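
One simple way to spread load over several queues is to assign each requestor a queue by round-robin. The sketch below is only an illustration of that idea: the queue naming scheme (REQUEST.Q0, REQUEST.Q1, ...) and the class name are assumptions and are not taken from the test harness used for these measurements.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper (hypothetical name): maps requestors onto a fixed set of queues. */
final class QueueSpreader {

    /** Picks a queue name for a requestor by round-robin over queueCount queues. */
    static String queueFor(int requestorIndex, int queueCount) {
        if (queueCount <= 0) {
            throw new IllegalArgumentException("queueCount must be positive");
        }
        // Hypothetical naming scheme; real queue names are site-specific.
        return "REQUEST.Q" + Math.floorMod(requestorIndex, queueCount);
    }

    public static void main(String[] args) {
        // Count how 44 requestors are distributed over 4 queues.
        Map<String, Integer> load = new TreeMap<>();
        for (int requestor = 0; requestor < 44; requestor++) {
            load.merge(queueFor(requestor, 4), 1, Integer::sum);
        }
        System.out.println(load);
    }
}
```

An even distribution keeps the access rate on each individual queue well below the single-queue limit seen in Table 4.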

1.2.1.2 Persistent

[Line chart: roundtrips / sec against number of requestors (0 to 50); series 1 queue and 4 queues]

Figure 5 - Maximum persistent queue throughput

             1 queue                               4 queues
requestors   Throughput  Server CPU  Mean client   Throughput  Server CPU  Mean client
                         (%)         CPU (%)                   (%)         CPU (%)
2            684         10.83       2.59          683         10.67       2.60
8            1357        15.46       4.67          1514        19.04       6.14
14           2348        34.67       8.63          2619        39.42       9.56
20           3218        53.71       11.62         3263        52.92       12.34
26           3574        62.08       13.10         3639        59.67       13.78
32           3694        66.79       14.08         3778        64.92       14.63
38           3720        68.46       14.33         3851        68.50       15.01
44           3754        70.46       14.40         3924        70.79       15.51
50           3767        71.25       14.91         3954        72.92       16.04

Table 5 - Maximum persistent queue throughput

Table 5 shows that for persistent, transacted messaging there is very little difference between the two tests.
The rates are much lower (around 3000 roundtrips per second) and contention on the queue is not an issue.
Instead, the rate is limited by contention on the disk resource. Nevertheless, a 5% improvement in peak rate
is observed.

1.2.2 Request-response variations
A common question is whether or not to place a JMS responder process on the same hardware as the
queue manager. The following sections compare the two configurations: Onboard request-response
scenario and Offboard request-response scenario (both described on page 24). In the offboard scenario,
the machine that hosts the responder process is of an identical specification to the queue manager machine
and is connected on the same gigabit Ethernet network. In the onboard scenario, the responder uses
bindings connections.
In both cases, the results represent messaging with no further workload. The balance of messaging speed
and CPU usage on the machines involved is highly specific to the scenario and therefore does not lend itself
to generalised conclusions.
1.2.2.1 Nonpersistent
[Line chart: roundtrips / sec against number of requestors (0 to 25); series Onboard responder and Offboard responder]

Figure 6 – Nonpersistent request-response throughput
In this nonpersistent test, Figure 6 shows the onboard and offboard responders are converging to very
similar throughput as load increases.
1.2.2.2 Persistent
[Line chart: roundtrips / sec against number of requestors (0 to 50); series Onboard responder and Offboard responder]

Figure 7 – Persistent request-response throughput
Figure 7 shows the persistent case. At the peak values (2592 and 2096 roundtrips per second), the onboard
responder is 24% faster than the offboard responder, which makes the onboard configuration distinctly
better for this scenario.

1.3 WebSphere MQ V6.0 JMS publish-subscribe
These tests are of the WebSphere MQ V6.0 broker and not of any other IBM publish-subscribe solutions
available. For information on these products, visit:
http://www.ibm.com/software/sw-bycategory/subcategory/SW910.html
In all publish-subscribe tests, “throughput” is the total number of messages going in and out of the broker.
The Publish-subscribe scenario 1-n is used for the following tests.
1.3.1 Nonpersistent
[Line chart: total messages / sec against number of subscribers (0 to 50); series WebSphere MQ V6.0]

Figure 8 - WMQ broker nonpersistent throughput

             Throughput                       CPU
subscribers  Rate    rate/sub/s  response     Server CPU  Mean client
                                 time (ms)    (%)         CPU (%)
2            7248    2416        0.41         20.50       3.60
8            20645   2294        0.44         49.17       14.34
14           27667   1844        0.54         73.44       17.43
20           29876   1423        0.70         86.08       18.67
26           29484   1092        0.92         84.08       18.98
32           29500   894         1.12         83.61       18.05
38           29857   766         1.31         87.89       19.56
44           29729   661         1.51         88.61       20.35
50           29614   581         1.72         88.53       19.87

Table 6 - WMQ broker nonpersistent throughput


1.3.2 Persistent
[Line chart: total messages / sec against number of subscribers (0 to 50); series WebSphere MQ V6.0]

Figure 9 - WMQ broker persistent throughput

             Throughput                       CPU
subscribers  Rate    rate/app/s  response     Server CPU  Mean client
                                 time (ms)    (%)         CPU (%)
2            1511    504         1.99         13.22       1.05
8            2422    269         3.72         14.00       2.38
14           3203    214         5            18.58       2.50
20           3699    176         6            21.22       2.88
26           4020    149         7            23.94       4.21
32           4229    128         8            27.25       4.38
38           4495    115         9            27.19       4.19
44           4506    100         10           26.36       4.05
50           4500    88          11           26.03       3.66

Table 7 - WMQ broker persistent throughput


2 Additional comparisons
2.1 JMS reliable messaging
WebSphere MQ 5.3 CSD6 introduced a new class of persistence to the queue manager and the JMS
applications that connect to it. This new persistence model offers higher performance but is not guaranteed
to retain all messages across an unplanned shutdown.
By default, selection of persistent messaging in a JMS client uses WebSphere MQ’s long-standing and
proven delivery mechanisms to provide that persistence. For all customer uses that require true assured
delivery, this remains the only advised option.
The following figure shows the JMS throughput for persistent and reliable messaging, using the Client
send-receive scenario (see page 25).

[Line chart: roundtrips / sec against number of driving applications (0 to 50); series Persistent and Reliable]

Figure 10 - Comparing JMS persistent and reliable messaging

Figure 10 shows that, at the measured peak values (3145 and 5554 respectively), the throughput of the
Reliable model is 44.6% faster than the corresponding persistent test.


2.2 Local bindings connection
JMS clients on the same physical hardware as the WebSphere MQ queue manager can connect in a more
direct manner by using local bindings. This avoids the use of the TCP/IP stack.
The following test is a Local send-receive scenario (see page 25) run with client connections (locally) and
with bindings. The results can be compared to the Client send-receive scenario by looking at the “single
queue” results from Section 1.2.1.
Section 1.2.2 provides a different comparison, demonstrating the contrast between local bindings and
remote client connections on request-response scenarios.
2.2.1 Nonpersistent bindings
All messages here are nonpersistent and nontransacted.

[Line chart: roundtrips / sec against number of requestors (0 to 8); series Client and Bindings]

Figure 11 - Comparing nonpersistent local connections

             Client connection                     Bindings connection
requestors   Throughput  Server CPU  CPU per       Throughput  Server CPU  CPU per
                         (%)         1000 rt                   (%)         1000 rt
1            2695        36.04       13.4          6285        25.04       4.0
2            4004        63.67       15.9          10599       49.33       4.7
3            4652        85.00       18.3          11328       74.42       6.6
4            4808        95.08       19.8          14291       93.83       6.6
5            4944        98.12       19.8          12459       97.00       7.8
6            4998        98.96       19.8          11865       97.25       8.2
7            4926        99.00       20.1          11421       97.21       8.5
8            4817        99.04       20.6          11037       97.46       8.8

Table 8 - Comparing nonpersistent local connections

Figure 11 shows the increase in performance through the use of bindings connections. Table 8 includes the
associated CPU utilisation and demonstrates that the bindings operations require less than half as much
CPU as their client-connected counterparts.

2.2.2 Persistent bindings
Here we see a Local send-receive scenario (see page 25) run with client connections and with bindings. All
messages are persistent and transacted.
[Line chart: roundtrips / sec against number of requestors (0 to 50); series Client and Bindings]

Figure 12 - Comparing persistent local connections

             Client connection                     Bindings connection
requestors   Throughput  Server CPU  CPU per       Throughput  Server CPU  CPU per
                         (%)         1000 rt                   (%)         1000 rt
4            1201        38.25       31.9          1228        17.88       14.6
12           2103        79.96       38.0          2654        48.33       18.2
20           2198        90.42       41.1          3167        63.50       20.1
28           2077        92.00       44.3          3479        74.92       21.5
36           1981        92.38       46.6          3536        79.92       22.6
44           1915        93.12       48.6          3446        81.33       23.6

Table 9 - Comparing persistent local connections

Even though persistent messages have a physical bottleneck in the requirement to write their data to disk
before completion, the bindings connections achieve higher throughput. Again, JMS applications using
bindings connections use about half as much CPU per operation. In the client-connected case, the
increased time taken per message limits the efficient use of the disk.

2.3 JMS message types
JMS defines several types of message body which cover the majority of messaging styles currently in use.
See the JMS documentation for a description of what differentiates each of these message-types and when
you might want to use them.
In this test we use the same source data (a 2048 byte String object) and perform a benchmark using each
message-type. It should be noted that the empty message type does not have a body, and is effectively a
different test. It is included for completeness only. Shown here is the JMS client efficiency for nonpersistent
messaging, Client send-receive scenario (see page 25).

[Bar chart: CPU per 1000 rt for each message type: empty, text, object, bytes, stream and map]

Figure 13 – Comparing JMS message types

Figure 13 demonstrates the efficiency of the messaging types with lower values being better.
1. The same total server throughput is obtained with any of the message types. From the point of view
of the server, the data that is being transported in each case is near identical.
2. The efficiency of text, object and bytes messages is identical. In JMS, these message-types are
very simple, single objects.
3. There is a higher cost associated with the use of stream and (particularly) map messages. In
contrast to the previous message-types, stream and map contain their data in a structure and
provide extra function based upon that structure. In both cases, the decreased efficiency is
attributed to the increased CPU-cost of encoding and decoding a complex object into a transmittable
format.
Tests with larger message sizes (not shown here) show that the relative degradation between simple and
complex message-types increases with message size.

2.4 JMS message sizes
The effect of increasing message size was investigated by using the Client send-receive scenario (page 25)
with a range of nonpersistent data volumes. The maximum message size in WebSphere MQ V6.0 is 100MB
without the use of segmentation, so this value represents the upper bound of the testing.

          Throughput               CPU at peak
Message   Peak        Total data   Mean client  Server CPU  Mem per rt
size      throughput  (kB/s)       CPU (%)      (%)         (kB)
64B       12292       1537         24.24        79.08       17
256B      12073       6036         24.23        47.67       18
1kB       11013       22026        24.85        79.08       21
4kB       8270        66162        26.06        81.00       28
16kB      4298        137218       28.62        76.00       76
64kB      1238        158499       28.27        50.21       251
256kB     322         165023       28.69        55.67       1198
1MB       58          117760       15.40        22.33       4555
4MB       8           61768        10.60        14.12       17996
16MB      2           52756        10.58        13.75       90117
64MB      0.4         56361        11.60        14.33       429245

Table 10 - JMS message sizes

Table 10 gives details of some of the message sizes tested. For each size, the peak throughput is reported
and the corresponding statistics are those measured at time of the peak. The subsequent sections provide
analysis of this data. Pay careful attention to the logarithmic axes when interpreting charts in this section.
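
The “Total data” column of Table 10 can be approximated from the message size and the peak throughput. The sketch below assumes that each roundtrip moves the payload twice (once on send, once on receive) and that 1kB is 1024 bytes. Both assumptions are inferred from the tabulated values and are not stated in this report; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name) for the user-data transfer rate. */
final class DataRate {

    /**
     * User data moved per second in kB, assuming the payload crosses the
     * network twice per roundtrip and 1kB = 1024 bytes (inferred, see lead-in).
     */
    static double totalDataKBps(long messageSizeBytes, double roundtripsPerSec) {
        return 2.0 * messageSizeBytes * roundtripsPerSec / 1024.0;
    }

    public static void main(String[] args) {
        // Message sizes and peak throughputs taken from Table 10.
        System.out.printf(Locale.ROOT, "64B: %.1f kB/s%n", totalDataKBps(64, 12292));
        System.out.printf(Locale.ROOT, "1kB: %.1f kB/s%n", totalDataKBps(1024, 11013));
        System.out.printf(Locale.ROOT, "4kB: %.1f kB/s%n", totalDataKBps(4096, 8270));
    }
}
```

Small differences from the tabulated figures are expected because the published throughput values are rounded.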
2.4.1 Message size throughput
[Chart: peak throughput (rt/s, logarithmic scale) against message size, from 64B to 64MB]

Figure 14 - Total throughput per message size

The throughput, in roundtrips per second, has a nonlinear relationship to the size of user data being
transmitted. This is primarily defined by the network infrastructure. Messages less than 1kB fit into a single
network packet and consequently have very similar throughput. Beyond this point, the throughput tends to
be approximately linear.
CPU efficiency in both server and clients follows an identical pattern to the throughput with respect to the
message size.

2.4.2 Client memory use
Transient memory is the working memory used and released for each operation. The following chart shows
the transient memory required for each roundtrip; the information is not broken down into the individual send
and receive components. This was measured by enabling verbose garbage collection on all JMS clients and
aggregating the data. Section 3.2 gives more information on how to measure and set heap sizes for the
current and expected workloads.
[Chart: memory per roundtrip (B, logarithmic scale) against message size, from 64B to 64MB]

Figure 15 - JMS transient memory use

• These values do not take into account any further operations done per-message in the user
application. In practice, you should be aware of such memory requirements.
• Measurement shows there is little change in the memory consumption for persistent or transacted
messages.

2.4.3 Cumulative data transfer rate
The raw throughput of these tests rapidly becomes constrained by the bandwidth and latency limits of the
network infrastructure. Nevertheless, it may be of use to note the characteristics of the message size on the
volume of user data transferred by WebSphere MQ.
[Chart: total data transfer (kB/s) against message size, from 64B to 64MB]

Figure 16 - User-data transfer efficiency

Figure 16 shows the total data rate is limited at its peak by the network interface card (NIC) in the server
machine. Disregarding this, the optimum range of message sizes for data throughput is between 32kB and
512kB.

2.5 JMS selectors
A JMS message selector allows a client to filter the messages that it is interested in by using the message
header. Messages with headers that match the selector are delivered to the application. In the model for
point-to-point JMS in WebSphere MQ V6.0 this matching is done by each client, except for the JMS
messageId and correlationId fields. These two fields are tied directly to the corresponding fields in the
underlying WebSphere MQ message. In this way, we are using the optimised matching support that is used
by the base product and selection is done at the server (in the queue manager).
It is important to note that the selection models for publish-subscribe messaging provided by the
WebSphere Business Integration Message Brokers (WBIMB) product family allow for more
sophisticated server-side processing of all JMS selectors and hence have very different performance
characteristics. These figures relate only to point-to-point messaging in WebSphere MQ V6.0.
2.5.1 CorrelationId as an arbitrary selector
Selecting on correlationId is not normally logical for publish-subscribe messaging where no prior link exists
between the publisher and subscriber of a topic.
In this comparison we use a correlationId selector in the same way as described in the Client send-receive
scenario (see Section 4.3), but without using the optimised matching (see Section 3.3, Use of correlation
identifiers), effectively turning it into an arbitrary JMS selector that will only ever match one message on
the queue at any time. This also implies that there is no overlap between selectors (they are disjoint). This
represents one kind of worst case scenario for selectors.
[Chart: JMS selectors vs CorrelationId. Peak roundtrips/sec (axis 0 to 12000) for persistent and nonpersistent messaging; series: Optimised CorrelationId, Arbitrary CorrelationId.]

Figure 17 - Effect of JMS selectors, client-side selection

                Optimised CorrelationId                          Arbitrary CorrelationId
                Throughput  Server CPU (%)  Mean client CPU (%)  Throughput  Server CPU (%)  Mean client CPU (%)
Persistent      3768        69.88           14.62                970         33.38           17.73
Nonpersistent   10040       76.5            26.06                1887        30.96           18.10

Table 11 - Effect of JMS selectors, client-side selection

In this scenario, the peak nonpersistent throughput with an arbitrary selector is 18.8% of that of the
optimised case. For persistent messaging, this figure is 25.7%. In both cases, the arbitrary selector also
causes a large increase in the CPU used per message at both client and server. The open nature of JMS
selectors means that, dependent on the scenario, performance can be anywhere from equal to the non-selector
case to far below it. These results show how important it is to use the optimised matching for scenarios
involving correlationId.

2.5.2 Arbitrary selectors
Selectors provide the programmer with a flexible design framework but cannot be expected to outperform a
well-architected system. Use of JMS selectors is discouraged for performance reasons but, if they must be
used, care should be taken with the selection criteria to minimise their detrimental effect on performance.
Recommendations
• Disjoint selectors, as used here to demonstrate one type of worst case scenario, can also be
implemented by using multiple queues, topics or perhaps correlationId.
• When selectors are used, every effort should be made to minimise their complexity and also to make
them fail as quickly as possible when a message does not match.
Example:
"height = 183 AND gender = 'male'" will fail faster than "gender = 'male' AND height = 183", since we can
assume that in the selection domain there are fewer people with that exact height than there are
males.
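The effect of condition ordering can be illustrated without a queue manager. The following is a minimal sketch, assuming that conditions joined by AND are evaluated left to right and that evaluation stops at the first condition that fails, as the example above implies. The class, the Person type and the sample population are hypothetical and exist only for this illustration; no JMS API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Counts condition evaluations for two orderings of the same AND selector. */
class SelectorOrdering {

    /** Hypothetical message properties used only for this illustration. */
    static final class Person {
        final int height;
        final String gender;

        Person(int height, String gender) {
            this.height = height;
            this.gender = gender;
        }
    }

    /** Evaluates "first AND second" with short-circuiting and returns the number of checks made. */
    static long countEvaluations(List<Person> people,
                                 Predicate<Person> first,
                                 Predicate<Person> second) {
        long checks = 0;
        for (Person p : people) {
            checks++;                  // the first condition is always evaluated
            if (first.test(p)) {
                checks++;              // the second is evaluated only if the first matched
                second.test(p);
            }
        }
        return checks;
    }

    /** Builds 1000 people: alternating gender, heights cycling through 150..199. */
    static List<Person> samplePopulation() {
        List<Person> people = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            people.add(new Person(150 + (i % 50), (i % 2 == 0) ? "male" : "female"));
        }
        return people;
    }

    public static void main(String[] args) {
        List<Person> people = samplePopulation();
        Predicate<Person> isHeight183 = p -> p.height == 183;
        Predicate<Person> isMale = p -> "male".equals(p.gender);

        System.out.println("height first: " + countEvaluations(people, isHeight183, isMale));
        System.out.println("gender first: " + countEvaluations(people, isMale, isHeight183));
    }
}
```

With this sample population, placing the rarer condition first needs 1020 checks against 1500 for the reverse order. The real saving depends on the distribution of property values in the messages on the queue.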

2.6 Java, JMS and C programming interfaces
The majority of C programs accessing WebSphere MQ do so using the MQI programming interface. The
"WebSphere MQ classes for Java" are used to send messages from a Java program (without using JMS).
This API maps onto the underlying MQI calls (see Section 3.5) and gives detailed control over the aspects
of WebSphere MQ messaging.
When reviewing a direct comparison between MQI and Java based messaging, it should be remembered
that the added services and flexibility of JMS and Java mean that raw performance is merely one aspect of
this comparison.
The Java language provides many built-in services such as memory management, security, sophisticated
exception handling and portable, platform-independent code that do not exist in a compiled language such
as C. JMS complements this with a common standards-based, vendor-independent approach to messaging
technologies. In addition to basic messaging, it provides simple access to message selectors, asynchronous
delivery, publish-subscribe messaging, and a means of communication with a J2EE application server.
In the benchmarks used in this report, the source code for the JMS application is several times smaller
than that of the MQI application (measured either by number of lines or by total file size) for the
comparable functions they serve. This translates to faster development, testing and maintenance of the
application in JMS.
2.6.1 Throughput comparison
The Client send-receive scenario (see Section 4.3) is shown here for nonpersistent messaging. Persistent
messaging achieves very similar throughput across the different APIs and is differentiated primarily by
the client efficiency (see next section).
[Chart: API comparison for nonpersistent messaging. Roundtrips/sec (axis 0 to 12000) against number of requestors (0 to 25); series: JMS, WMQ Java, MQI.]

Figure 18 - Java, JMS and MQI throughput

The peak throughputs are in Table 12. The two Java-based APIs are very similar, with WMQ Java peaking
3.0% above JMS. The MQI interface provided better overall performance, peaking 13.7% higher than JMS.

2.6.2 Client CPU efficiency
[Chart: API comparison client efficiency. CPU per 1000 rt (axis 0.0 to 4.5) for nonpersistent and persistent messaging; series: JMS, WMQ Java, MQI.]

Figure 19 - Java, JMS and MQI efficiency

          Nonpersistent                                     Persistent
API       Peak throughput  Client CPU (%)  CPU per 1000 rt  Peak throughput  Client CPU (%)  CPU per 1000 rt
JMS       10002            25.88           2.6              3768             14.62           3.9
WMQ Java  10301            27.28           2.6              3808             14.89           3.9
MQI       11377            16.85           1.5              3969             9.92            2.5

Table 12 - Java, JMS and MQI efficiency

Figure 19 graphs the client CPU cost of the three APIs for the persistent and nonpersistent tests. The
results were similar in both cases, with the Java-based APIs requiring far more client CPU than the MQI
applications: CPU per 1000 round trips was about 75% higher than MQI for nonpersistent messaging and about
55% higher for persistent messaging. Section 3.5 gives details on how the JMS API layers over the MQI interface.
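The "CPU per 1000 rt" column and the percentages quoted above can be recomputed from the throughput and client CPU columns of Table 12. The following is a small sketch of that arithmetic, assuming the column is simply client CPU (%) divided by thousands of round trips per second; the class and method names are illustrative only.

```java
/** Recomputes the "CPU per 1000 rt" column of Table 12 and the JMS-to-MQI cost difference. */
class ClientEfficiency {

    /** Client CPU (%) divided by thousands of round trips per second. */
    static double cpuPer1000Roundtrips(double clientCpuPercent, double roundtripsPerSec) {
        return clientCpuPercent / (roundtripsPerSec / 1000.0);
    }

    /** Percentage by which cost a exceeds cost b. */
    static double percentMore(double a, double b) {
        return (a / b - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Values taken from Table 12 (JMS and MQI rows).
        double jmsNonpersistent = cpuPer1000Roundtrips(25.88, 10002);
        double mqiNonpersistent = cpuPer1000Roundtrips(16.85, 11377);
        double jmsPersistent = cpuPer1000Roundtrips(14.62, 3768);
        double mqiPersistent = cpuPer1000Roundtrips(9.92, 3969);

        System.out.printf("Nonpersistent: JMS %.1f, MQI %.1f, JMS costs %.0f%% more%n",
                jmsNonpersistent, mqiNonpersistent,
                percentMore(jmsNonpersistent, mqiNonpersistent));
        System.out.printf("Persistent:    JMS %.1f, MQI %.1f, JMS costs %.0f%% more%n",
                jmsPersistent, mqiPersistent,
                percentMore(jmsPersistent, mqiPersistent));
    }
}
```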


2.7 Windows and Linux JMS client efficiency
In order to provide a short comparison of the client efficiency of the two operating systems, the client
machines were rebooted into Red Hat Enterprise Server (RHES) 3 Update 5 and installed with IBM Java
1.4.2. The server (queue manager) was unchanged. The Client send-receive scenario was used (see
Section 4.3).
[Chart: OS comparison client efficiency. CPU per 1000 rt (axis 0.0 to 4.5) for nonpersistent and persistent messaging; series: Linux, Windows.]

Figure 20 – JMS efficiency in Operating Systems

           Nonpersistent                                     Persistent
Client OS  Peak throughput  Client CPU (%)  CPU per 1000 rt  Peak throughput  Client CPU (%)  CPU per 1000 rt
Linux      10078            18.13           1.8              3827             12.50           3.3
Windows    10031            26.08           2.6              3768             14.62           3.9

Table 13 - JMS efficiency in Operating Systems

The above results demonstrate that the (Windows) queue manager perceives little difference between
Windows and Linux clients residing on the same class of hardware. There is a difference, however, in
the efficiency of the operation. At their peak values, the Linux client used 30% less CPU for nonpersistent
operation and 15% less for persistent operation. This efficiency increase has not been broken down into
component causes, but is probably a combination of Linux, the JVM and the WebSphere MQ client code.

3 Tuning/programming guidelines
3.1 Tuning the queue manager
Performance reports with tuning information for WebSphere MQ V6.0 on each platform can be found on the
IBM SupportPac webpage at the following URL:
http://www.ibm.com/software/integration/support/supportpacs/perfreppacs.html
The main tuning actions taken for the tests in this report were:
• Use of multiple physical disk arrays (with IBM SSA battery-backed cache)
• Log / LogBufferPages = 4096
• Log / LogFilePages = 16348
• Log / LogPrimaryFiles = 16
• Channels / MQIBindType = FASTPATH
• TuningParameters / DefaultQBufferSize = 1MB
• TuningParameters / DefaultPQBufferSize = 1MB
3.2 Tuning minimum heap size for Java
During operation, current garbage collectors (GC) will normally interrupt the execution of all other threads in
a JVM to some extent. The level of interruption depends on the amount and the type of work the GC is
doing. This is largely dependent on how the memory is being used by the application and the GC settings
currently in operation.
Messaging in Java has characteristics such that fixed memory requirements are low and transient memory
requirements are high. Without tuning, or with incorrect tuning, the automatic garbage collection policies of
Java do not favour messaging.
The most common GC settings are:
• -Xms Minimum heap size.
• -Xmx Maximum heap size.
• -verbose:gc Display garbage collection events.
As an example, the following line sets the heap limits to 64MB and 256MB and enables verbose garbage
collection.
java -Xms64M -Xmx256M -verbose:gc
Recommendations
• Use -verbose:gc to monitor the frequency of your application's garbage collection under different
loads and adjust the minimum and maximum heap sizes accordingly.
• A garbage collection interval of less than one second is detrimental to performance. A sensible
minimum GC interval is 1-2 seconds.
• Do not leave the minimum heap size unset. If left unset, the heap will not be expanded. With a
small (the default) minimum heap size, the GC will operate very frequently, reclaiming transient
memory but not extending the heap (since we have already stated that most of the memory is
released immediately after it is requested when messaging).
• It is a common mistake to fix both minimum and maximum heap settings to a single large value.
When the minimum heap size is set too high, the GC can operate very infrequently, have more work
to do and can create a noticeable pause in response times. If you cannot avoid this situation,
consider having multiple independent JMS applications serving the same destinations. When one
application is stalled on garbage collection another might still be serving.
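As a complement to reading -verbose:gc output, the mean interval between collections can be estimated from inside the application. The following is a rough sketch, assuming a JVM that provides the java.lang.management API (Java 5 or later, so not the Java 1.4.2 level used for the measurements in this report). The class name and the one-second threshold check are illustrative; the figure is an average over the whole JVM uptime and all collectors, so it is only a coarse indicator.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Estimates the mean interval between garbage collections in the running JVM. */
class GcIntervalCheck {

    /** Mean milliseconds between collections; returns -1 if no collection has run yet. */
    static long meanIntervalMillis(long uptimeMillis, long collectionCount) {
        return collectionCount <= 0 ? -1 : uptimeMillis / collectionCount;
    }

    /** Sums the collection counts of every collector the JVM reports. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long interval = meanIntervalMillis(uptime, totalCollections());

        if (interval < 0) {
            System.out.println("No garbage collections recorded yet");
        } else if (interval < 1000) {
            System.out.println("Mean GC interval " + interval + " ms: consider a larger -Xms");
        } else {
            System.out.println("Mean GC interval " + interval + " ms");
        }
    }
}
```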

3.3 Use of correlation identifiers
• Selecting against correlationId or messageId follows an optimised path through WebSphere MQ 5.3
and the selection occurs on the server-side (in the queue manager). This gives better performance
than when using arbitrary JMS selectors.
• Use of the provider-specific “ID:” tag is applicable to these two fields only and is of practical use only
with correlation identifiers.
• To use the optimised path, the correlationId must be prefixed with “ID:” and must be formatted
correctly as 24 bytes represented as a hex-string (of 48 characters). Failure to adhere to this means
the selection will revert to expensive client-side methods.
Example:
session.createReceiver(
    inqueue,
    "JMSCorrelationID='ID:574d51363053616d706c65436f7272656c6174696f6e4944'");
In this case, the hexadecimal represents a 24-byte ASCII string "WMQ60SampleCorrelationID".
• The safest way of generating a correct identifier is to use
JMSMessage.setJMSCorrelationIDAsBytes. This allows the formatted version to be returned by
getJMSCorrelationID. The number of bytes input should not be more than 24 or the identifier will be
truncated.
Example:
// Set the identifier from raw bytes, then read back the formatted "ID:..." string for the selector.
message.setJMSCorrelationIDAsBytes("WMQ60SampleCorrelationID".getBytes("UTF8"));
session.createReceiver(
    inqueue,
    "JMSCorrelationID='" + message.getJMSCorrelationID() + "'");
• A change to the correlationId (or indeed any selector) that you are matching against requires
opening a new QueueReceiver and discarding the old one. This is an expensive operation if it is
done for every message that is processed since it involves closing and re-opening the underlying
queue (see Section 3.5). For this reason, you should consider, if possible, generating your own
correlationId for each client rather than the common design pattern of using the messageId of a sent
message as the correlationId of its reply. Another alternative is to use a temporary queue per client.
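The format required for the optimised path ("ID:" followed by 48 hexadecimal characters) can be checked or generated independently of any JMS objects. The following is a minimal sketch with a hypothetical helper class; it assumes that identifiers shorter than 24 bytes are padded with zero bytes, and it rejects longer input rather than truncating it. It is not part of the WebSphere MQ API.

```java
import java.nio.charset.StandardCharsets;

/** Formats raw bytes as the "ID:" + 48 hex character form of a correlation identifier. */
class CorrelationIdFormat {

    static final int ID_LENGTH = 24;

    /** Hypothetical helper: zero-pads the input to 24 bytes and rejects longer input. */
    static String toIdString(byte[] raw) {
        if (raw.length > ID_LENGTH) {
            throw new IllegalArgumentException("correlation identifier exceeds 24 bytes");
        }
        StringBuilder sb = new StringBuilder("ID:");
        for (int i = 0; i < ID_LENGTH; i++) {
            int b = (i < raw.length) ? (raw[i] & 0xff) : 0;   // pad with zero bytes (assumption)
            sb.append(Character.forDigit(b >> 4, 16));
            sb.append(Character.forDigit(b & 0x0f, 16));
        }
        return sb.toString();
    }

    /** Builds the selector string in the form used by the examples above. */
    static String selectorFor(byte[] raw) {
        return "JMSCorrelationID='" + toIdString(raw) + "'";
    }

    public static void main(String[] args) {
        byte[] raw = "WMQ60SampleCorrelationID".getBytes(StandardCharsets.US_ASCII);
        System.out.println(selectorFor(raw));
    }
}
```

Generating one such identifier per client at start-up, and creating the QueueReceiver once with the resulting selector, avoids reopening the queue for every message.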
3.4 Asynchronous receivers
An alternative to manually making calls to receive a message is to register a method that is called
automatically when a suitable message is available. This is achieved by JMS creating an internal thread to
receive messages from the QueueSession on behalf of the application.
• A single thread services asynchronous receivers for each QueueSession that is opened. If you
create multiple onMessage() listeners on a single session you are serialising the access, even if
each listener is targeted toward a different queue.
• A single thread is created for every asynchronous QueueSession. You are not reducing the number
of threads by registering listeners.
3.5 How WebSphere MQ JMS is connected to the C interface (MQI)
The JMS API that is provided for WebSphere MQ is layered over the “WebSphere MQ classes for Java”,
which are in turn layered over the MQI interface. If you are familiar with the MQI interface you might find it
useful to know how the MQI verbs and handles map to JMS objects.
MQCONN / MQDISC
MQHCONN is held at the Session level and not released until the object is closed.
MQOPEN / MQCLOSE
MQHOBJs are held in the objects MessageProducer and MessageConsumer (the parents of
QueueReceiver, QueueSender, TopicSubscriber and TopicPublisher). The queue resources are accessed
as soon as the JMS object is created
MQPUT / MQGET
These are called when the producer or consumer sends or receives a message.
MQCMIT / MQBACK
These are called when the application commits or rolls back operations on a Session.
Recommendations
• Do not simply discard connection and session objects that you no longer need. Doing so prevents
their resources from being released in a timely fashion, which affects the queue manager when it is
running close to capacity.
• Always call the close() method on JMS connection and session objects when they are no longer
needed. This releases the underlying resource handle. This is especially important for publish-
subscribe, where clients need to deregister from their subscriptions. Closing the objects allows the
queue manager to release the corresponding resources in a timely fashion; failure to do so can
affect the capacity of the queue manager for large numbers of applications or threads.
• Do not lose references to connection and session objects (e.g. after registering an asynchronous
listener) as this precludes being able to call their close() methods.
• To ensure an application or internal object will always tidy up correctly, including if it should fail,
these close() calls should be made in the final part of a try-catch-finally control structure.
• Do not create sender or receiver objects regularly if you can reuse them instead. This avoids
releasing then re-acquiring the same queue manager resource.
• Always call delete() on temporary queues and topics when they are no longer needed. Otherwise,
they will not be deleted until the connection is closed. For long running applications this will cause
performance and administration problems.
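The try-catch-finally recommendation above can be sketched as follows. To keep the example runnable without a JMS provider, it uses a hypothetical Resource class as a stand-in for JMS connection and session objects; in a real application the close() calls would be made on those objects and could themselves throw JMSException, which would also need handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows release ordering with try/finally, using stand-ins for JMS objects. */
class CloseInFinally {

    /** Hypothetical stand-in for a JMS connection or session; records when it is closed. */
    static final class Resource {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        void close() {
            log.add("closed " + name);
        }
    }

    /** Runs work that may fail; both resources are closed whether or not it does. */
    static List<String> run(boolean failDuringWork) {
        List<String> log = new ArrayList<>();
        Resource connection = new Resource("connection", log);
        try {
            Resource session = new Resource("session", log);
            try {
                if (failDuringWork) {
                    throw new IllegalStateException("simulated messaging failure");
                }
                log.add("work done");
            } finally {
                session.close();       // release the session first
            }
        } catch (IllegalStateException e) {
            log.add("caught " + e.getMessage());
        } finally {
            connection.close();        // always release the connection last
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Keeping the references in local variables until the finally blocks run also satisfies the recommendation not to lose references to connection and session objects.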

4 Scenarios used in this document
The subsequent sections give detailed descriptions of the logical layout of the scenarios used in the
preparation of this document. Unless stated otherwise for a specific test, the following conditions hold
for all tests:
• The message format used is a 2048 byte JMSTextMessage.
• Persistent messaging is done within a transaction.
• All applications run in a loop with no waits, think-times or fixed rates.
• “IBM Performance Harness for Java Message Service” is used in all cases.
• Each sample point reported is the average of four minutes of reporting.
• At no point were messages building up on WebSphere MQ queues.
4.1 Onboard request-response scenario
[Diagram: requestors, network, server / responders; message flow steps 1 to 4]

In this scenario, the responders always use local bindings connections to the queue manager.
1 A requestor places a message onto the single request queue, setting its CorrelationId as it does so.
2, 3 A responder receives it and sends it, unaltered, onto the single reply queue. For persistent
messages, both operations are completed within a single transaction.
4 The requestor then receives the reply message that matches their CorrelationId.
4.2 Offboard request-response scenario
[Diagram: requestors, network, server, network, responders; message flow steps 1 to 4]

This is similar to the Onboard request-response scenario above except that the responders are run on a
separate machine. This machine has an identical specification to the queue manager machine.
1 A requestor places a message onto the single request queue, setting its CorrelationId as it does so.
2, 3 A responder receives it and sends it, unaltered, onto the single reply queue. For persistent
messages, both operations are completed within a single transaction.
4 The requestor then receives the reply message that matches their CorrelationId.

4.3 Client send-receive scenario
[Diagram: requestors, network, server; message flow steps 1 to 2]

1 A requestor places a message onto the single queue, setting its CorrelationId as it does so.
2 The requestor then receives the message that matches their CorrelationId from the single queue.
4.4 Local send-receive scenario
[Diagram: server / requestors; message flow steps 1 to 2]

This is similar to the Client send-receive scenario above except that all clients are run on the queue manager
machine.
1 A requestor places a message onto the single queue, setting its CorrelationId as it does so.
2 The requestor then receives the message that matches their CorrelationId from the single queue.
4.5 Publish-subscribe scenario 1-n
[Diagram: publisher, network, server, network, subscribers; message flow steps 1 to 2]

All subscribers used unique subscriber queues. Persistent subscribers received five messages in each
transaction.
1 A publisher publishes a message to the single topic.
2 Each subscriber then receives the message.

5 Measurement environment
5.1 Hardware
All machines are physically on the same switched Gigabit Ethernet and are part of the same subnet.
5.1.1 Server
IBM eServer x360
• 4 * 2.8GHz P4 Xeon
• 8GB RAM
• 3 * IBM SSA 15,000 RPM drives
• 1Gb Ethernet card
5.1.2 Drivers
Several machines were used to drive the tests:
2 * IBM Netfinity 6000R
• 4 * 700MHz P3 Xeon
• 2.5GB RAM
• 3 * SCSI 10,000 RPM drives
• 1Gb Ethernet card
4 * IBM Netfinity 8500R
• 4 * 700MHz P3 Xeon
• 4GB RAM
• 3 * SCSI 10,000 RPM drives
• 1Gb Ethernet card
1 * IBM eServer x360
• 4 * 2.8GHz P4 Xeon
• 4GB RAM
• 3 * IBM SSA 15,000 RPM drives
• 1Gb Ethernet card


5.2 Software
Windows 2003 Server, Enterprise Edition
IBM Java 1.4.2, Service Release 1a
IBM WebSphere MQ V6.0
IBM Performance Harness for Java Message Service (http://www.alphaworks.ibm.com/tech/perfharness)