TECHNICAL WHITEPAPER
Performance Evaluation | Java Collections Framework

Author: Kapil Viren Ahuja
Date: October 17, 2008

Prepared by Kapil Viren Ahuja for public distribution
October 21, 2008

Table of Contents
1 Introduction
1.1 Scope of this document
1.2 Intended audience
2 Evaluation Approach
2.1 Comparison parameters
2.2 Comparison scenarios
2.3 Environment
2.4 Execution and sampling
3 Measurements
3.1 Insertion of unique elements (Long)
3.2 Comparison of unique elements (Element)
3.3 Comparison of non-unique elements (Element)
3.4 Iteration over elements (Long)
3.5 Iteration over elements (Element)

Appendices
A Data Structure for custom class
B List of Tables
C Change Log

1 INTRODUCTION
Managing a collection of objects is a very common programming task, and managing that collection efficiently, with optimum performance, is an equally common need. The Java programming language offers many built-in data types for representing and modeling collections of objects. Some of the commonly used data types are:
java.util.ArrayList
java.util.HashSet
java.util.TreeMap

Each of these data types behaves differently under different scenarios, and writing algorithms that deliver the highest levels of performance requires making the right choice. For many developers and architects it is not an easy choice.
This document provides details of a comparison done across various data types supported by the Java Collections Framework, and studies their performance under different circumstances.
1.1 Scope of this document
This document provides performance data for various data types in the Java Collections Framework. It does not explain the Collections Framework itself, nor the reasons behind the numbers: performance is a function of how each collection data type is implemented, and is therefore subject to change from one implementation of the Java Virtual Machine specification to another. Developers interested in the reasons behind the performance are encouraged to read the Java documentation on the Sun Microsystems website.
This document does not contain any recommendations. It only reports performance results in a specific environment; how you interpret and use those results is entirely up to you.
1.2 Intended audience
All Java developers who are using, or intend to use, the Java Collections Framework and want to decide which collection data type to use in a given scenario.
2 EVALUATION APPROACH
To benchmark the performance, we had to establish some common rules that could be applied consistently across scenarios. These are listed below:

1. Comparison parameters
2. Comparison scenarios
3. Environment
4. Execution and sampling

2.1 Comparison parameters
For any benchmark to succeed, it is critical that the comparison parameters are identified upfront; this ensures a consistent comparison. We selected four parameters, explained below:

Collection size
The first parameter used in the benchmarking process is the size of the collection itself, i.e. the number of elements it contains.
Performance was benchmarked for sizes varying from 10,000 to 1,000,000 elements, in multiples of 10.
We did not consider sizes below 10,000, as the results for the different data types were similar. For the user-defined element type, the largest size was limited to 500,000 because we ran into Java heap size issues beyond that.

Collection type
The second parameter used in the benchmarking process is the collection data type from the Java Collections Framework. The types compared are listed below:
1. ArrayList
2. LinkedList
3. HashSet
4. TreeSet
5. Vector
6. HashMap
7. TreeMap
8. LinkedHashMap
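These eight types fall under the List, Set, and Map interfaces of the framework. A minimal sketch (class name is ours) of declaring each behind its interface, using the explicit type arguments of the Java 5 runtime listed in section 2.3:

```java
import java.util.*;

public class CollectionTypesDemo {
    public static void main(String[] args) {
        // List implementations: indexed, allow duplicates.
        List<Long> arrayList = new ArrayList<Long>();
        List<Long> linkedList = new LinkedList<Long>();
        List<Long> vector = new Vector<Long>();          // synchronized

        // Set implementations: reject duplicate elements.
        Set<Long> hashSet = new HashSet<Long>();
        Set<Long> treeSet = new TreeSet<Long>();         // kept sorted

        // Map implementations: key-value pairs with unique keys.
        Map<Long, String> hashMap = new HashMap<Long, String>();
        Map<Long, String> treeMap = new TreeMap<Long, String>();             // sorted by key
        Map<Long, String> linkedHashMap = new LinkedHashMap<Long, String>(); // insertion order

        // Sets reject duplicates; lists keep them.
        hashSet.add(1L);
        hashSet.add(1L);
        arrayList.add(1L);
        arrayList.add(1L);
        System.out.println(hashSet.size() + " " + arrayList.size()); // prints "1 2"
    }
}
```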

Developers who want to understand the Java Collections Framework in more depth are encouraged to read the following resources:

Wikipedia

IBM

Data type of the elements stored
Another parameter used in the benchmarking process is the data type of the elements stored in the collection. Both an in-built and a user-defined data type were used, to provide coverage across different kinds of elements. These are listed below:
1. In-built data type: java.lang.Long
2. User-defined data type: a custom class called "Element". Instances of this class were created with random data during the exercise. Its structure is defined in Appendix A, Data Structure for custom class.

Sample size
The fourth and last parameter used in the benchmarking process is the sample size. It is common practice to repeat a process several times and collect data points; this verifies the consistency of the behavior and allows a correlation to be drawn across the data set.

We performed 10 iterations for every scenario.
2.2 Comparison scenarios
To benchmark the performance, we identified a small set of very commonly used scenarios. These are explained below:
Insertion
One of the most basic requirements of a collection is inserting one or more elements. This scenario covers the common use case of inserting elements into a collection. We evaluated two variants:
1. In the first scenario, we inserted unique elements into a collection. We used the value returned by hashCode to identify the uniqueness of an element. This works for elements of type Long, because two Long objects with different values return different hash codes.
2. In the second scenario, we inserted non-unique elements into a collection. To create non-unique elements, the hashCode method of the element class was overridden to always return the same value. This case was tried only on the Set and Map types, because only these two kinds of types filter out non-unique elements.
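The unique-insertion scenario can be sketched as follows; this is a minimal illustration with names of our own choosing, not the paper's actual harness:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class InsertionScenario {
    // Inserts 'count' unique Long values into the given collection and
    // returns the elapsed wall-clock time in nanoseconds.
    static long timeUniqueInserts(Collection<Long> target, int count) {
        long start = System.nanoTime();
        for (long i = 0; i < count; i++) {
            target.add(Long.valueOf(i)); // distinct values, distinct hash codes
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long listTime = timeUniqueInserts(new ArrayList<Long>(), 10000);
        long setTime = timeUniqueInserts(new HashSet<Long>(), 10000);
        System.out.println("ArrayList: " + listTime + " ns, HashSet: " + setTime + " ns");
    }
}
```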

Iteration
Another very common use case is iterating over a collection. In most cases we observed, iteration over a collection is an even more frequently used scenario than insertion and deletion of elements.
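The iteration scenario amounts to a single walk over the collection via its Iterator. A minimal sketch (names are ours); the returned checksum prevents the loop from being optimized away:

```java
import java.util.HashSet;
import java.util.Set;

public class IterationScenario {
    // Walks the whole collection once (the for-each loop compiles down to
    // Iterator calls) and returns a checksum of the elements visited.
    static long iterate(Iterable<Long> source) {
        long sum = 0;
        for (Long value : source) {
            sum += value.longValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<Long> set = new HashSet<Long>();
        for (long i = 1; i <= 100; i++) {
            set.add(Long.valueOf(i));
        }
        System.out.println(iterate(set)); // prints 5050
    }
}
```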
2.3 Environment
The results of any performance benchmark depend on the environment in which the data is collected. The system specifications used for this benchmark are listed below:
Hardware specifications

Parameter        Value
Processor        Intel® Core™2 Duo CPU T8100
Number of CPUs   2
CPU speed        Both cores @ 2.10 GHz
RAM              3070 MB

Table 1: Hardware specifications

Software specifications

Parameter         Value
Operating System  Windows Vista Home Premium
Java runtime      Java™ 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03)
IDE               Eclipse 3.3.2 (Build id: M20080221-1800)

Table 2: Software specifications

2.4 Execution and sampling
During the benchmarking exercise, all the scenarios were run with the parameters agreed upon. Two approaches were available to us for recording samples; these are described below:

Iterations for scenarios
In this approach, we considered one complete run of a scenario as one sample. We then repeated the same scenario 10 times and collected the samples. For example, we inserted 10 records into an ArrayList and recorded one sample: the time taken to insert all 10 records into the collection. We repeated the process 10 times.
Elements in a collection
In this approach, we considered each element as one sample. For example, we inserted 10 records into an ArrayList and, for each record, recorded a sample: the time taken to insert that element into the collection. At the end of the use case, we had 10 samples.
For the purposes of this evaluation, we opted for the former approach, because in most common cases a user is interested in the performance of one complete operation. The latter approach is less useful because it does not provide diversified samples, which makes measuring the predictability of the collection infeasible. In addition, collecting samples over whole iterations ensures that any variation while adding elements to the collection is captured.
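The chosen approach, one sample per complete scenario run, can be sketched as a small harness (names are ours, not from the paper):

```java
import java.util.ArrayList;
import java.util.List;

public class SamplingHarness {
    // One sample = the wall-clock time, in nanoseconds, of one complete run
    // of a scenario. Repeating the scenario 'iterations' times yields the
    // sample set that the benchmark statistics are computed from.
    static List<Long> collectSamples(Runnable scenario, int iterations) {
        List<Long> samples = new ArrayList<Long>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            scenario.run();
            samples.add(Long.valueOf(System.nanoTime() - start));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Example scenario: insert 10,000 Longs into a fresh ArrayList.
        Runnable scenario = new Runnable() {
            public void run() {
                List<Long> list = new ArrayList<Long>();
                for (long i = 0; i < 10000; i++) {
                    list.add(Long.valueOf(i));
                }
            }
        };
        List<Long> samples = collectSamples(scenario, 10);
        System.out.println(samples.size() + " samples collected");
    }
}
```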

Interpretation of results
The benchmark was prepared using the following mathematical parameters:

S. No  Parameter            Description                                               Symbol  Unit
1      Iterations           Total number of times a specific scenario was performed   n       N.A.
2      Minimum time         Minimum time taken to complete an iteration               min     ms
3      Maximum time         Maximum time taken to complete an iteration               max     ms
4      Total time           Total time taken to complete all iterations               T       ms
5      Mean time            Average time taken to complete an iteration               M       ms
6      Standard deviation   Standard deviation of the samples from the mean           σ       ms
7      Samples outside σ    Number of samples outside 1 sigma                         m ± σ   N.A.
8      Samples outside 2σ   Number of samples outside 2 sigma                         m ± 2σ  N.A.
9      Samples outside 3σ   Number of samples outside 3 sigma                         m ± 3σ  N.A.

Table 3: Mathematical parameters for benchmark

To compare collections for a given scenario, the following two factors should be considered together:
1. Mean time: the average time consumed by one iteration of a scenario, calculated as the average of the times taken across all iterations.
2. Samples outside sigma: the standard deviation determines the stability of a distribution. For a roughly normal distribution, the fewer the samples falling outside the control limits formed by the mean and the standard deviation, the more stable the distribution.

When comparing two or more data types, we should look for the data type that is fastest in a given scenario. However, if the faster data type is less stable, the same performance cannot be predicted every time; in a real scenario there is a higher probability of it running slower or faster than expected. A more stable data type that is slightly slower in execution is therefore often the better option.
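The statistics above can be computed directly from the recorded samples. A minimal sketch (class and method names are ours) of the mean, the population standard deviation, and the count of samples outside the k-sigma band:

```java
public class SampleStats {
    // Arithmetic mean of the samples.
    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) sum += s;
        return sum / samples.length;
    }

    // Population standard deviation of the samples.
    static double sigma(long[] samples) {
        double m = mean(samples);
        double sq = 0;
        for (long s : samples) sq += (s - m) * (s - m);
        return Math.sqrt(sq / samples.length);
    }

    // Number of samples falling outside the band [m - k*sigma, m + k*sigma].
    static int outside(long[] samples, int k) {
        double m = mean(samples);
        double band = k * sigma(samples);
        int count = 0;
        for (long s : samples) {
            if (s < m - band || s > m + band) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] samples = {8, 9, 10, 10, 10, 10, 10, 10, 11, 30}; // one outlier
        // mean = 11.8; only the outlier (30) falls outside 1 sigma
        System.out.println(mean(samples) + " " + outside(samples, 1));
    }
}
```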
3 MEASUREMENTS

3.1 Insertion of unique elements (Long)
Comparison for data size of 10000 elements (Long)

Type \ Parameter   Mean    Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          0.8     2                   1                    1
LinkedList         0.8     12                  1                    0
HashSet            1.28    1                   1                    1
TreeSet            6.8     2                   1                    1
Vector             1.2     1                   1                    1
HashMap            1.48    5                   1                    1
TreeMap            6.36    5                   1                    1
LinkedHashMap      1.88    1                   1                    1

Table 4: Results for insertion of 10000 unique elements (Long)

Comparison for data size of 100000 elements (Long)

Type \ Parameter   Mean     Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          25.6     4                   1                    0
LinkedList         54.96    1                   1                    7
HashSet            63.12    2                   1                    1
TreeSet            126.08   2                   2                    2
Vector             20.04    5                   3                    0
HashMap            42.68    4                   3                    0
TreeMap            99.12    7                   1                    0
LinkedHashMap      59.12    8                   2                    0

Table 5: Results for insertion of 100000 unique elements (Long)

Comparison for data size of 1000000 elements (Long)

Type \ Parameter   Mean      Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          165.44    1                   1                    1
LinkedList         378.52    1                   1                    1
HashSet            675.24    3                   1                    1
TreeSet            1253.16   6                   2                    0
Vector             231       1                   1                    1
HashMap            711.52    1                   1                    1
TreeMap            1215.25   10                  1                    0
LinkedHashMap      863.48    2                   1                    1

Table 6: Results for insertion of 1000000 unique elements (Long)

3.2 Comparison of unique elements (Element)
Comparison for data size of 10000 elements

Type \ Parameter   Mean    Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          3.12    4                   1                    1
LinkedList         1.28    1                   1                    1
HashSet            1.28    2                   2                    2
Vector             1.28    2                   2                    2
HashMap            1.24    1                   1                    1
LinkedHashMap      1.92    2                   2                    1

Table 7: Results for insertion of 10000 unique elements (Element)

Comparison for data size of 100000 elements

Type \ Parameter   Mean     Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          51.76    1                   1                    1
LinkedList         45.52    7                   0                    0
HashSet            52.44    3                   2                    1
Vector             20.6     4                   2                    1
HashMap            1.28     2                   2                    2
LinkedHashMap      78.04    9                   1                    0

Table 8: Results for insertion of 100000 unique elements (Element)

Comparison for data size of 500000 elements

Type \ Parameter   Mean     Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          188.48   1                   1                    1
LinkedList         298.44   2                   1                    1
HashSet            444.2    1                   1                    1
Vector             169.08   11                  0                    0
HashMap            437.4    1                   1                    1
LinkedHashMap      530.44   1                   1                    1

Table 9: Results for insertion of 500000 unique elements (Element)

3.3 Comparison of non-unique elements (Element)

Comparison for data size of 10000 elements

Type \ Parameter   Mean      Samples outside σ   Samples outside 2σ   Samples outside 3σ
HashSet            1218.08   12                  0                    0
HashMap            1233.08   4                   2                    0
LinkedHashMap      1234.96   9                   0                    0

Table 10: Results for insertion of 10000 non-unique elements (Element)

Notice that the time taken to insert 10000 non-unique elements is significantly greater than for unique elements. This clearly shows that such cases should be avoided unless necessary.
Because of this observation, we did not benchmark this scenario any further.
3.4 Iteration over elements (Long)
Comparison for data size of 10000 elements (Long)

Type \ Parameter   Mean    Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          0.64    1                   1                    1
LinkedList         0       0                   0                    0
HashSet            0.64    1                   1                    1
TreeSet            0       0                   0                    0
Vector             0       0                   0                    0
HashMap            1.84    3                   3                    0
TreeMap            5.64    9                   0                    0
LinkedHashMap      2.56    4                   4                    0

Table 11: Results for iteration of 10000 elements (Long)

Comparison for data size of 100000 elements (Long)

Type \ Parameter   Mean     Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          7.52     1                   1                    0
LinkedList         8        1                   1                    1
HashSet            17       1                   1                    1
TreeSet            9.96     11                  1                    1
Vector             4.4      7                   0                    0
HashMap            28.76    11                  1                    0
TreeMap            97.96    6                   1                    0
LinkedHashMap      71.12    10                  0                    0

Table 12: Results for iteration of 100000 elements (Long)


Comparison for data size of 1000000 elements (Long)

Type \ Parameter   Mean      Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          33.8      2                   2                    1
LinkedList         33.08     8                   1                    1
HashSet            53.64     2                   2                    0
TreeSet            35.12     5                   3                    0
Vector             29.4      7                   2                    0
HashMap            683.16    1                   1                    1
TreeMap            1219.92   4                   1                    1
LinkedHashMap      844.96    1                   1                    1

Table 13: Results for iteration of 1000000 elements (Long)

3.5 Iteration over elements (Element)
Comparison for data size of 10000 elements (Element)

Type \ Parameter   Mean    Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          2.48    4                   4                    0
LinkedList         1.28    1                   1                    1
HashSet            1.28    1                   1                    1
Vector             1.28    2                   2                    2
HashMap            1.28    1                   1                    1
LinkedHashMap      1.88    3                   3                    0

Table 14: Results for iteration of 10000 elements (Element)

Comparison for data size of 100000 elements (Element)

Type \ Parameter   Mean     Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          51.16    1                   1                    1
LinkedList         54.92    19                  0                    0
HashSet            54.32    3                   3                    1
Vector             19.26    4                   1                    1
HashMap            51.84    8                   3                    0
LinkedHashMap      80.96    7                   0                    0

Table 15: Results for iteration of 100000 elements (Element)

Comparison for data size of 1000000 elements (Element)

Type \ Parameter   Mean     Samples outside σ   Samples outside 2σ   Samples outside 3σ
ArrayList          187.28   1                   1                    1
LinkedList         325.12   1                   1                    1
HashSet            470.44   5                   1                    1
Vector             177.24   15                  0                    0
HashMap            448.4    5                   1                    1
LinkedHashMap      548.44   1                   1                    1

Table 16: Results for iteration of 1000000 elements (Element)

A DATA STRUCTURE FOR CUSTOM CLASS
package com.kapil.spikes.collections;

public class Element
{
    private Long identifier;

    public Element(Long identifier)
    {
        this.identifier = identifier;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((identifier == null) ? 0 : identifier.hashCode());
        return result;

        // Returning a constant value of 1 will make all objects hash to the same value
        // return 1;
    }

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj)
        {
            return true;
        }

        if (obj == null)
        {
            return false;
        }

        if (getClass() != obj.getClass())
        {
            return false;
        }

        final Element other = (Element) obj;
        if (identifier == null)
        {
            if (other.identifier != null)
            {
                return false;
            }
        }
        else if (!identifier.equals(other.identifier))
        {
            return false;
        }

        return true;
    }
}
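To reproduce the non-unique scenario of section 2.2, the hashCode above is made to return a constant. A minimal standalone sketch (the class is ours, modeled on Element) showing the effect on a HashSet:

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashDemo {
    // Variant of Element whose hashCode always returns the same value:
    // every instance lands in the same HashSet bucket, so each insertion
    // must compare against the elements already stored in that bucket,
    // and the cost grows with the collection size.
    static class ConstantHashElement {
        private final Long identifier;

        ConstantHashElement(Long identifier) {
            this.identifier = identifier;
        }

        @Override
        public int hashCode() {
            return 1; // constant: all instances collide
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof ConstantHashElement
                && identifier.equals(((ConstantHashElement) obj).identifier);
        }
    }

    public static void main(String[] args) {
        Set<ConstantHashElement> set = new HashSet<ConstantHashElement>();
        for (long i = 0; i < 1000; i++) {
            set.add(new ConstantHashElement(Long.valueOf(i)));
        }
        // All 1000 elements are stored, since equals still distinguishes
        // them, but every insert and lookup now scans one shared bucket.
        System.out.println(set.size()); // prints 1000
    }
}
```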
B LIST OF TABLES
Table 1: Hardware specifications
Table 2: Software specifications
Table 3: Mathematical parameters for benchmark
Table 4: Results for insertion of 10000 unique elements (Long)
Table 5: Results for insertion of 100000 unique elements (Long)
Table 6: Results for insertion of 1000000 unique elements (Long)
Table 7: Results for insertion of 10000 unique elements (Element)
Table 8: Results for insertion of 100000 unique elements (Element)
Table 9: Results for insertion of 500000 unique elements (Element)
Table 10: Results for insertion of 10000 non-unique elements (Element)
Table 11: Results for iteration of 10000 elements (Long)
Table 12: Results for iteration of 100000 elements (Long)
Table 13: Results for iteration of 1000000 elements (Long)
Table 14: Results for iteration of 10000 elements (Element)
Table 15: Results for iteration of 100000 elements (Element)
Table 16: Results for iteration of 1000000 elements (Element)

C CHANGE LOG

ID   Description                    User                Date
1    First draft of the benchmark   Kapil Viren Ahuja   2008-10-08
2    Published                      Kapil Viren Ahuja   2008-10-21