Dealing with JVM limitations in Apache Cassandra

Arya Mir

Feb 10, 2012

Jonathan Ellis / @spyced
Pain points for Java databases

GC

GC

GC

Platform-specific code
GC

Concurrent and compacting: choose one

G1

Azul C4 / Zing?
Fragmentation

Bloom filter arrays

Compression offsets
Automatic mitigation?

http://www.research.ibm.com/people/d/dfb/papers/Bacon03Controlling.pdf

http://researcher.ibm.com/files/us-hirzel/pldi10-arraylets.pdf
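Until collectors do this automatically, the arraylet idea can be applied by hand. You back a large bit set, such as a Bloom filter's, with many small pages instead of one huge long[]. That way no single allocation needs a large contiguous free region in a fragmented, non-compacting old generation. The following is a minimal sketch under that assumption. PagedBitSet and its 4096-long page size are illustrative choices, not Cassandra's actual class or constants.

```java
/**
 * Hypothetical bit set backed by many small long[] pages instead of one
 * large contiguous long[], so no single allocation needs a big contiguous
 * region of (possibly fragmented) heap.
 */
final class PagedBitSet {
    // Assumption: 4096 longs (32 KiB) per page, chosen for illustration.
    static final int PAGE_SIZE = 4096;

    private final long[][] pages;
    private final long numBits;

    PagedBitSet(long numBits) {
        if (numBits <= 0) throw new IllegalArgumentException("numBits must be positive");
        this.numBits = numBits;
        long numWords = (numBits + 63) >>> 6;                  // 64 bits per long
        int numPages = (int) ((numWords + PAGE_SIZE - 1) / PAGE_SIZE);
        pages = new long[numPages][];
        for (int i = 0; i < numPages; i++) {
            // The last page may be shorter than PAGE_SIZE.
            long remaining = numWords - (long) i * PAGE_SIZE;
            pages[i] = new long[(int) Math.min(PAGE_SIZE, remaining)];
        }
    }

    void set(long index) {
        checkIndex(index);
        long word = index >>> 6;
        pages[(int) (word / PAGE_SIZE)][(int) (word % PAGE_SIZE)] |= 1L << (index & 63);
    }

    boolean get(long index) {
        checkIndex(index);
        long word = index >>> 6;
        long bits = pages[(int) (word / PAGE_SIZE)][(int) (word % PAGE_SIZE)];
        return (bits & (1L << (index & 63))) != 0;
    }

    int pageCount() { return pages.length; }

    private void checkIndex(long index) {
        if (index < 0 || index >= numBits) throw new IndexOutOfBoundsException(Long.toString(index));
    }
}
```

A Bloom filter would map each of its hash values to an index in [0, numBits) and call set or get. A million-bit filter ends up as four small arrays rather than one 125 KB block.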
Fragmentation, 2

Arena allocation for memtables
(Memtables?)
[Diagram, built up over several slides: each write, starting with write(k1, c1:v1), is appended to the commit log on the hard drive and applied to the memtable in memory. Later writes (k1 with c2; k2 with c1 and c2; k1 with c1 and c3) accumulate in the memtable, merged per row key. On flush, the memtable is written to the hard drive as an SSTable with an index, and the commit log is cleaned up.]
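Arena allocation attacks memtable fragmentation by copying each column's bytes into large, fixed-size regions. Without it, many small arrays of different lifetimes end up scattered through the old generation. Once a flushed memtable is dropped, its regions become garbage together and leave large holes rather than small ones. Below is a minimal single-threaded bump-pointer sketch. ArenaAllocator, the 1 MiB region size, and the large-value cutoff are assumptions for illustration. A version shared by concurrent writers would need an atomic bump pointer.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical single-threaded arena allocator: small values are copied
 * into large shared regions, so a memtable's data is freed in big chunks.
 */
final class ArenaAllocator {
    // Assumption: 1 MiB regions, chosen for illustration.
    static final int REGION_SIZE = 1 << 20;
    // Assumption: values this large get their own buffer so that one big
    // value does not waste most of a region.
    static final int MAX_ARENA_VALUE = REGION_SIZE / 8;

    private ByteBuffer region = ByteBuffer.allocate(REGION_SIZE);
    private int regionsAllocated = 1;

    /** Copies src's remaining bytes (without consuming src) and returns the copy. */
    ByteBuffer allocateCopy(ByteBuffer src) {
        int size = src.remaining();
        if (size >= MAX_ARENA_VALUE) {
            ByteBuffer own = ByteBuffer.allocate(size);
            own.put(src.duplicate());
            own.flip();
            return own;
        }
        if (region.remaining() < size) {
            // Start a new region; the old one stays alive through the
            // slices already handed out and dies together with them.
            region = ByteBuffer.allocate(REGION_SIZE);
            regionsAllocated++;
        }
        int start = region.position();
        region.put(src.duplicate());               // bump the pointer by size bytes
        ByteBuffer view = region.duplicate();
        view.position(start);
        view.limit(start + size);
        return view.slice();                       // a size-byte window into the region
    }

    int regionsAllocated() { return regionsAllocated; }
}
```

The memtable would store the returned slices instead of the caller's buffers. The region is then the unit the GC sees, not each individual column.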
“Java is a memory hog”

Large overhead for typical objects and collections

How large?

java.lang.instrument.Instrumentation

JAMM: Java Agent for Memory Measurements

https://github.com/jbellis/jamm
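Instrumentation.getObjectSize returns an implementation-specific estimate of one object's shallow size. An Instrumentation instance is only handed to a Java agent. A minimal agent that captures it might look like the sketch below. ObjectSizeAgent is a hypothetical name. Using it requires a jar whose manifest has a Premain-Class entry, plus the -javaagent JVM flag. Deep sizes, which are what matter for caches, also require walking the object graph. That is the part a library such as JAMM handles.

```java
import java.lang.instrument.Instrumentation;

/**
 * Hypothetical minimal Java agent exposing shallow object sizes.
 * To use it, declare the class public in its own file, package it in a jar
 * with the manifest entry "Premain-Class: ObjectSizeAgent", and launch the
 * JVM with -javaagent:path/to/agent.jar.
 */
final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    private ObjectSizeAgent() {}

    /** Called by the JVM before main() when loaded via -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    /**
     * Shallow size of o in bytes as estimated by the JVM, or -1 if the agent
     * was not loaded. Objects referenced by o are not included.
     */
    static long shallowSizeOf(Object o) {
        Instrumentation inst = instrumentation;
        return inst == null ? -1 : inst.getObjectSize(o);
    }
}
```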
org.apache.cassandra.cache.SerializingCache

Live objects are about 85% JVM bookkeeping


org.apache.cassandra.cache.FreeableMemory using reference counting
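Reference counting is needed because the GC does not reclaim off-heap memory. A cache entry has to be freed explicitly, but not while a reader is still copying data out of it. A minimal sketch of that protocol follows. RefCountedMemory and its method names are hypothetical, and the actual release is delegated to a caller-supplied Runnable rather than a real native free.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical reference-counted handle for manually managed memory.
 * The owner (e.g. the cache) holds one reference; readers must succeed in
 * tryReference() before use and call unreference() afterwards.
 */
final class RefCountedMemory {
    private final AtomicInteger references = new AtomicInteger(1); // owner's reference
    private final Runnable free;                                   // releases the memory

    RefCountedMemory(Runnable free) {
        this.free = free;
    }

    /** Acquires a reference; returns false if the memory was already freed. */
    boolean tryReference() {
        while (true) {
            int n = references.get();
            if (n <= 0) return false;                      // too late: already freed
            if (references.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Releases a reference; the last release frees the memory exactly once. */
    void unreference() {
        int n = references.decrementAndGet();
        if (n == 0) free.run();
        else if (n < 0) throw new IllegalStateException("unreferenced too many times");
    }
}
```

On a cache get, the caller would call tryReference() and treat false as a miss. Once the value has been deserialized, it would call unreference() in a finally block. Eviction drops the owner's reference, and the memory is freed when the last reader finishes.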

Considering reference-counted, off-heap memtables as well
Don’t forget about young gen

Always stop-the-world for ~100ms
Platform-specific code

OS

JVM
m[un]map

Log-structured storage wants to remove old files post-compaction; some platforms (notably Windows) disallow deleting open files, and a mapped file counts as open until it is unmapped

Old workaround (pre-1.0):

use a PhantomReference to tell when the mmap’d buffer has been GC’d (and hence unmapped)

Poor user experience and messy corner cases
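The pre-1.0 approach can be sketched like this: track each mapped buffer with a PhantomReference and delete the file only after the collector has enqueued that reference. DeferredFileDeleter is a hypothetical name, and any object can stand in for the buffer. The structure also shows why the experience was poor. Deletion waits on an unpredictable GC, so obsolete files can pile up on disk.

```java
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: deletes a file only after the buffer mapping it has
 * been garbage collected (and therefore, eventually, unmapped).
 */
final class DeferredFileDeleter {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The PhantomReference objects must stay strongly reachable themselves,
    // otherwise they may be collected before they are ever enqueued.
    private final Map<Reference<?>, Path> pending = new HashMap<>();

    synchronized void deleteWhenCollected(Object mappedBuffer, Path file) {
        pending.put(new PhantomReference<>(mappedBuffer, queue), file);
    }

    /** Deletes files whose buffers have been collected; returns how many. */
    synchronized int processQueue() throws IOException {
        int deleted = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Path file = pending.remove(ref);
            if (file != null && Files.deleteIfExists(file)) deleted++;
        }
        return deleted;
    }

    synchronized int pendingCount() { return pending.size(); }
}
```

Something still has to call processQueue() periodically. Until a GC actually collects the buffer, it finds nothing to do.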

New workaround:

Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
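Fleshed out, the new workaround above means calling the buffer's cleaner reflectively. The sketch below (Unmapper is a hypothetical name) does that on Java 8 and earlier. On Java 9 and later, sun.nio.ch is encapsulated by the module system. There the sketch instead uses sun.misc.Unsafe.invokeCleaner, which was added as a replacement for this hack. Either way, touching the buffer after unmapping it can crash the JVM, which is exactly the risk the later slides discuss.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

/** Hypothetical helper that unmaps a MappedByteBuffer eagerly. */
final class Unmapper {
    private Unmapper() {}

    /**
     * Releases the mapping behind buffer. Returns true on success, false if
     * neither strategy works on this JVM. The buffer (and any duplicate or
     * slice of it) must never be used again: doing so can crash the JVM.
     */
    static boolean unmap(MappedByteBuffer buffer) {
        try {
            // Java 9+: sun.misc.Unsafe.invokeCleaner(ByteBuffer), in jdk.unsupported.
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            invokeCleaner.invoke(theUnsafe.get(null), buffer);
            return true;
        } catch (NoSuchMethodException e) {
            // Java 8 and earlier: fall through to the DirectBuffer.cleaner() hack.
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
        try {
            Method cleanerMethod = Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner");
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) return false;
            cleaner.getClass().getMethod("clean").invoke(cleaner);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }
}
```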
mmap part 2

2GB limit via ByteBuffer, whose indices are ints:

public abstract byte get(int index)

Workaround: MmappedSegmentedFile

public Iterator<DataInput> iterator(long position)
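The segmented-file idea is that a file larger than 2GB can still be memory-mapped as several buffers. Each buffer stays under the int-indexed limit, and reads at a long position are routed to the right segment. The sketch below uses fixed-size segments and a configurable segment size so the logic can be exercised on a small file. SegmentedMappedFile is a hypothetical name, and the real class also exposes DataInput iterators. This version reads single bytes only and ignores record boundaries, so a multi-byte value could straddle two segments.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical read-only view of a file mapped as several segments, each
 * small enough to be indexed by an int, addressed by a long position.
 */
final class SegmentedMappedFile {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;
    private final long length;

    SegmentedMappedFile(Path path, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segmentSize must be in 1..Integer.MAX_VALUE");
        }
        this.segmentSize = segmentSize;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            this.length = channel.size();
            int count = (int) ((length + segmentSize - 1) / segmentSize);
            this.segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * segmentSize;
                long size = Math.min(segmentSize, length - start);
                // Mappings stay valid after the channel is closed.
                segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        }
    }

    /** Reads the byte at an absolute (long) file position. */
    byte get(long position) {
        if (position < 0 || position >= length) throw new IndexOutOfBoundsException(Long.toString(position));
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }

    int segmentCount() { return segments.length; }

    long length() { return length; }
}
```

With segmentSize set to Integer.MAX_VALUE, a 5 GiB file would be mapped as three segments.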
link

Used for snapshots

Old workaround: JNA

New workaround: supported directly by Java 7 (java.nio.file.Files.createLink)
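Snapshots rely on hard links. Linking each live SSTable into a snapshot directory copies no data, and the snapshot keeps the data alive even after compaction deletes the originals. Since Java 7 this needs no native code. A minimal sketch follows. The Snapshots helper and the directory layout in it are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical snapshot helper: hard-links every file in a data directory. */
final class Snapshots {
    private Snapshots() {}

    /**
     * Creates snapshotDir and hard-links each regular file of dataDir into it.
     * No data is copied; the snapshot survives later deletion of the originals.
     * Returns the number of links created.
     */
    static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;   // skip subdirectories
                // Java 7+: Files.createLink(newLink, existingFile).
                Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                linked++;
            }
        }
        return linked;
    }
}
```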
mlockall

swappiness: pissing off database developers since 2001 (?)

mlockall(MCL_CURRENT)
Low-level i/o

posix_fadvise

mincore/fincore

fcntl

... JNA
A plug for JNA

https://github.com/twall/jna
static {
    try {
        Native.register("c");
        ...
private static native int mlockall(int flags)
        throws LastErrorException;
The fallacy of choosing portability over power

Applets have been dead for years

Python gets it right

import readline
The fallacy of choosing safety over power

Allowing munmap would expose developers to segfaults

But relying on the GC to clean up external resources is a well-known antipattern

File.close

We need munmap badly enough that we resort to unnatural and unportable code to get it

You haven’t kept us from risking segfaults; you’ve just made us miserable
Compatibility through obscurity?

sun.misc.Unsafe

Used by high-profile libraries like high-scale-lib
... even public options
http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
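Libraries like high-scale-lib reach for sun.misc.Unsafe largely for lock-free compare-and-swap at raw memory offsets, for example on individual array elements. The sketch below shows the kind of code involved. UnsafeLongArray is hypothetical, and the reflective grab of theUnsafe is the common (unsupported) idiom. Newer JDKs deprecate these methods and may warn at compile time or runtime.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/**
 * Hypothetical long array with per-element compare-and-swap via
 * sun.misc.Unsafe, an unsupported JDK-internal API.
 */
final class UnsafeLongArray {
    private static final Unsafe UNSAFE;
    private static final long BASE;   // byte offset of element 0
    private static final int SHIFT;   // log2 of the element size in bytes

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        BASE = UNSAFE.arrayBaseOffset(long[].class);
        SHIFT = 31 - Integer.numberOfLeadingZeros(UNSAFE.arrayIndexScale(long[].class));
    }

    private final long[] array;

    UnsafeLongArray(int length) { array = new long[length]; }

    private static long offset(int i) { return BASE + ((long) i << SHIFT); }

    long getVolatile(int i) {
        checkIndex(i);  // Unsafe does no bounds checking: a bad index corrupts memory
        return UNSAFE.getLongVolatile(array, offset(i));
    }

    boolean compareAndSet(int i, long expected, long update) {
        checkIndex(i);
        return UNSAFE.compareAndSwapLong(array, offset(i), expected, update);
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= array.length) throw new IndexOutOfBoundsException(Integer.toString(i));
    }
}
```

The supported equivalent is java.util.concurrent.atomic.AtomicLongArray, where the same operation is compareAndSet(i, expected, update) with no reflection or manual offsets.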
Too negative?
Still true

"Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click