Telegraph Java Experiences


2/14/01

RightOrder : Telegraph & Java


Sam Madden

UC Berkeley

madden@cs.berkeley.edu




Telegraph Overview


100% Java


In memory database


Query engine for alternative sources


Web


Sensors


Testbed for adaptive query processing



Telegraph & WWW : FFF


Federated Facts and Figures


Collect Data on the Election


Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies


Route tuples dynamically based on
source loads and selectivities
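A minimal sketch of per-tuple adaptive routing, in Java. It is not the Avnur and Hellerstein algorithm (Eddies use a ticket-based lottery scheme); it swaps in a simpler greedy rule: each tuple visits the not-yet-applied filter with the lowest observed pass rate first, so selective filters drift to the front as statistics accumulate. The Filter and GreedyEddy classes and the use of int tuples are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** A filter operator that counts how many tuples it has seen and passed. */
class Filter {
    final String name;
    final IntPredicate predicate;
    long seen = 0;
    long passed = 0;

    Filter(String name, IntPredicate predicate) {
        this.name = name;
        this.predicate = predicate;
    }

    boolean apply(int tuple) {
        seen++;
        boolean ok = predicate.test(tuple);
        if (ok) passed++;
        return ok;
    }

    /** Observed pass rate; an unseen filter reports 0 so it is tried early. */
    double passRate() {
        return seen == 0 ? 0.0 : (double) passed / seen;
    }
}

/** Routes each tuple through every filter, lowest observed pass rate first. */
class GreedyEddy {
    private final List<Filter> filters;

    GreedyEddy(List<Filter> filters) {
        this.filters = new ArrayList<>(filters);
    }

    /** Returns true if the tuple survives all filters. */
    boolean route(int tuple) {
        boolean[] done = new boolean[filters.size()];
        for (int step = 0; step < filters.size(); step++) {
            // Choose the unapplied filter that has been most selective so far.
            int best = -1;
            for (int i = 0; i < filters.size(); i++) {
                if (!done[i] && (best < 0
                        || filters.get(i).passRate() < filters.get(best).passRate())) {
                    best = i;
                }
            }
            done[best] = true;
            if (!filters.get(best).apply(tuple)) return false; // drop the tuple early
        }
        return true;
    }
}
```

With filters x % 2 == 0 and x < 10 over tuples 0 to 99, only 0, 2, 4, 6 and 8 survive. The order in which filters are visited shifts as the statistics change, but the result does not, since every surviving tuple must pass all filters.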


fff.cs.berkeley.edu



Architecture Overview


Query Parser


Jlex & CUP


Preoptimizer


Chooses Access Paths


Eddy


Routes Tuples To Modules


Modules


Doubly-Pipelined Hash Joins


Index Joins


For probing into web pages


Aggregates & Group Bys


Scans


Telegraph Screen Scraper: View
web pages as Relations
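A doubly-pipelined (symmetric) hash join keeps a hash table for each input: every arriving tuple is inserted into its own side's table and immediately probes the other side's table, so results stream out as soon as both matching tuples have arrived, whichever source is slower. A minimal in-memory sketch, assuming int join keys and string payloads; the class and method names are hypothetical, not Telegraph's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Symmetric (doubly-pipelined) hash join over (key, payload) tuples. */
class SymmetricHashJoin {
    /** A minimal tuple: join key plus an opaque payload. */
    static final class Tuple {
        final int key;
        final String payload;

        Tuple(int key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    private final Map<Integer, List<Tuple>> leftTable = new HashMap<>();
    private final Map<Integer, List<Tuple>> rightTable = new HashMap<>();

    /** A left tuple arrives: build into the left table, probe the right table. */
    List<String> pushLeft(Tuple t) {
        return insertAndProbe(t, leftTable, rightTable, true);
    }

    /** A right tuple arrives: build into the right table, probe the left table. */
    List<String> pushRight(Tuple t) {
        return insertAndProbe(t, rightTable, leftTable, false);
    }

    private List<String> insertAndProbe(Tuple t, Map<Integer, List<Tuple>> own,
                                        Map<Integer, List<Tuple>> other, boolean isLeft) {
        own.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        List<String> out = new ArrayList<>();
        for (Tuple m : other.getOrDefault(t.key, Collections.emptyList())) {
            // Emit results as "left|right" regardless of which side just arrived.
            out.add(isLeft ? t.payload + "|" + m.payload : m.payload + "|" + t.payload);
        }
        return out;
    }
}
```

For example, pushing left (1, "a"), then right (1, "x"), then left (1, "b") yields nothing, then "a|x", then "b|x". Neither input has to be fully read before output begins, which suits slow, bursty web sources.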


Execution Framework


One Thread Per Query


Iterator Model for Queries


Experimented with Thread Per Module


Linux threads are expensive


Two Memory Management Models


Java Objects


Home Rolled Byte Arrays
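A minimal sketch of the iterator model, assuming int tuples and hypothetical Operator, ArrayScan, Select and QueryRunner classes: each operator pulls from its child with next(), so a single thread drives the entire plan by repeatedly calling next() on the root.

```java
import java.util.function.IntPredicate;

/** Pull-based iterator: open(), next() returns null at end of stream, close(). */
interface Operator {
    void open();
    Integer next();
    void close();
}

/** Leaf operator: scans an in-memory int array. */
class ArrayScan implements Operator {
    private final int[] data;
    private int pos;

    ArrayScan(int[] data) { this.data = data; }

    public void open() { pos = 0; }
    public Integer next() { return pos < data.length ? data[pos++] : null; }
    public void close() { }
}

/** Selection: pulls from its child until a tuple satisfies the predicate. */
class Select implements Operator {
    private final Operator child;
    private final IntPredicate pred;

    Select(Operator child, IntPredicate pred) {
        this.child = child;
        this.pred = pred;
    }

    public void open() { child.open(); }

    public Integer next() {
        for (Integer t = child.next(); t != null; t = child.next()) {
            if (pred.test(t)) return t;
        }
        return null;
    }

    public void close() { child.close(); }
}

class QueryRunner {
    /** Runs the whole plan on the calling thread and sums the output tuples. */
    static long runAndSum(Operator root) {
        long sum = 0;
        root.open();
        for (Integer t = root.next(); t != null; t = root.next()) sum += t;
        root.close();
        return sum;
    }
}
```

Running a Select for even values over a scan of 1 to 6 returns 2, 4 and 6, all on the caller's thread. A thread-per-module design would instead put a queue between each pair of operators, which is where the cost of Linux threads shows up.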



Tuples as Java Objects


Tuple Data stored as a Java Object


Each tuple in a separate byte array


Tuples copied on joins, aggregates


Issues


Memory Management between Modules,
Queries, Garbage collector control


Allocation Overhead


Performance: 30,000 200-byte tuples/sec (~5.9 MB/sec)


Tuples As Byte Array


All tuples for a query stored in the same byte array


Surrogate Java Objects

[Figure: surrogate objects point to (offset, size) entries in a directory, which index into the shared byte array]
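A rough sketch of the byte-array scheme, assuming one fixed-size byte array per query, a directory of (offset, size) entries, and surrogate objects that hold only a directory slot number. TupleArena and TupleRef are hypothetical names. Compaction slides live tuples to the front and patches the directory, so surrogates stay valid and the garbage collector never sees individual tuples.

```java
import java.util.Arrays;

/** One byte array per query; tuples are addressed through a directory of (offset, size). */
class TupleArena {
    private final byte[] data;
    private int used = 0;
    private int[] offsets = new int[16];
    private int[] sizes = new int[16];
    private boolean[] live = new boolean[16];
    private int slots = 0;

    TupleArena(int capacityBytes) { data = new byte[capacityBytes]; }

    /** Lightweight surrogate object: just an index into the directory. */
    static final class TupleRef {
        final int slot;
        TupleRef(int slot) { this.slot = slot; }
    }

    TupleRef add(byte[] tuple) {
        if (used + tuple.length > data.length) {
            compact(); // reclaim space left by freed tuples
            if (used + tuple.length > data.length) {
                throw new IllegalStateException("query memory budget exceeded");
            }
        }
        if (slots == offsets.length) { // grow the directory, not the data array
            offsets = Arrays.copyOf(offsets, slots * 2);
            sizes = Arrays.copyOf(sizes, slots * 2);
            live = Arrays.copyOf(live, slots * 2);
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        offsets[slots] = used;
        sizes[slots] = tuple.length;
        live[slots] = true;
        used += tuple.length;
        return new TupleRef(slots++);
    }

    byte[] get(TupleRef ref) {
        int off = offsets[ref.slot];
        return Arrays.copyOfRange(data, off, off + sizes[ref.slot]);
    }

    void free(TupleRef ref) { live[ref.slot] = false; }

    /** Slide live tuples to the front in slot order and patch their offsets. */
    void compact() {
        int dst = 0;
        for (int s = 0; s < slots; s++) {
            if (!live[s]) continue;
            System.arraycopy(data, offsets[s], data, dst, sizes[s]);
            offsets[s] = dst;
            dst += sizes[s];
        }
        used = dst;
    }

    int bytesUsed() { return used; }
}
```

Surrogates are never recycled here, so the directory only grows, matching the note that there is no surrogate object reuse. The sketch also assumes single-threaded access; sharing one arena between threads would need locking.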


Byte Array (cont)


Allows explicit control over
memory per query (or per module)


Compaction eliminates garbage
collection randomness


Lower throughput: 15,000 tuples/sec


No surrogate object reuse


Synchronization costs


Other System Pieces


XML Based Catalog


Java Introspection Helps


Applet-based Front End


JDBC Interface


Fault Tolerance / Multiple Servers


Via simple UNIX tools


RightOrder Questions


Performance vs. C


JNI Issues


Garbage Collection Issues


Serialization Costs


Lots of Java Objects


JDBC vs ODI


Performance Vs. C


JVM + JIT Performance Encouraging:
IBM JIT reaches 60% of the Intel C compiler's
performance and is faster than MSC on low-level
benchmarks


IBM JIT 2x Faster than HotSpot for
Telegraph Scans


Stability Issues


www.javalobby.org/features/jpr


JIT Performance vs C

[Chart: benchmark performance of IBM JIT vs. optimized Intel C and optimized MS C]

Source: www.javalobby.org/features/jpr


Performance Gotchas


Synchronization


~2x function call overhead for synchronized methods in HotSpot


Used in Libraries: Vector, StringBuffer


String allocation is the single most intensive operation
in Telegraph


Mercatur: 20% initial CPU Cost


Garbage Collection


Java offers no support for object reuse


Mercatur: 15% Cost


OceanStore: 30 ms avg latency, 1 s peak



More Gotchas


Finalization


Declaring methods final allows inlining


Serialization


RMI, JNI use serialization


Philippsen & Haumacher show that
serialization is slow


Performance Tools


Tools to address some issues


JAX, Jopt: make bytecode smaller, faster


www.alphaworks.ibm.com/tech/JAX


www.condensity.com


Bytecode optimizer


www.optimizeit.com


Good profiler, memory allocation and garbage
collection monitor


JNI Issues


Not a part of Telegraph


JNI overhead quite large (JDK
1.1.8, PII 300 MHz)


Source: Matt Welsh. A System Support High Performance Communication and IO In Java. Master's Thesis, UC Berkeley, 1999.


More JNI


But, this is being worked on


IBM JDK: 100,000 B copy in 5 ms vs. 23 ms
for JDK 1.1.8 (500 MHz PIII)


JNI allows synchronization (pin /
unpin), thread management


See
http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html


GCJ + CNI: access Java objects via
C++ classes


http://gcc.gnu.org/java/



Garbage Collection


Performance


Big problem: 1 s or longer to GC lots of objects


Most Java GCs blocking (not concurrent or multi-threaded)


Unexpected Latencies


OceanStore: Network File Server, 30 ms avg.
latencies for network updates, 1000 ms peak due
to GC


In high-concurrency apps, such delays are disastrous


Garbage Collection Cont.


Limited Control


Runtime.gc() only a hint


Runtime.freeMemory() unreliable


No way to disable


No object reuse


Lots of unnecessary memory allocations
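Since the JVM will not recycle objects on the program's behalf, one common workaround is an explicit free list. A minimal, single-threaded sketch; ObjectPool is a hypothetical name, and callers must reset any state in a reused object themselves.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Minimal free-list pool: released objects are handed out again before new ones are made. */
class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    ObjectPool(Supplier<T> factory) { this.factory = factory; }

    /** Reuse a released object if one is available, otherwise allocate a new one. */
    T acquire() {
        T obj = free.poll(); // null when the free list is empty
        if (obj == null) {
            obj = factory.get();
            created++;
        }
        return obj;
    }

    /** Return an object to the pool; the caller must not use it afterwards. */
    void release(T obj) { free.push(obj); }

    int createdCount() { return created; }
}
```

Acquiring and releasing a 200-byte buffer 1,000 times in a loop allocates it only once. The pool is deliberately unsynchronized, which fits a one-thread-per-query design; a pool shared across threads would reintroduce synchronization costs.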



Serialization


Not in Telegraph


Philippsen and Haumacher, “More Efficient Object Serialization.”
International Workshop on Java for Parallel and Distributed
Computing. San Juan, April 1999.


Serialization costs for RMI are 50% of total RMI time


Discarding longevity (long-term compatibility) gives a 7x speedup


Sun Serialization provides versioning


Complete class description stored with each serialized
object


Most standard classes forward compatible (JDK docs
note special cases)


See
http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html


Lots of Objects


GC Issues Serious


Memory Management


GC makes programmers allocate willy-nilly


Hard to partition memory space


Telegraph byte-array ugliness due to
inability to limit memory usage of concurrent
modules and queries


Storage Overheads


Java Object class is big:


Integer requires 23 bytes in JDK 1.3


int requires 4.3 bytes


No way to avoid per-object
header fields


Use primitives or hand-written
serialization whenever possible
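A small illustration of hand-written serialization, assuming an int[] payload: DataOutputStream writes exactly 4 bytes per int plus a 4-byte length, whereas default serialization of the same values boxed as Integer objects also writes class descriptors and per-object markers. HandSerialization is a hypothetical helper; the stream classes are standard java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class HandSerialization {
    /** Hand-written format: a 4-byte count followed by 4 bytes per int. */
    static byte[] writeInts(int[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Inverse of writeInts. */
    static int[] readInts(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readInt();
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Default Java serialization of the same values boxed as Integers, for comparison. */
    static byte[] writeBoxedDefault(Integer[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For 100 ints the hand-written form is exactly 404 bytes; the default form is larger and carries the class and versioning information that the hand-written format gives up.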


JDBC vs ODI


No experience with Oracle


JDBC overheads are high, but
don’t have specific performance
numbers


Bottom Line


Java great for many reasons


GC, standard libraries, type safety, introspection,
etc.


Significant reductions in development and
debugging time.


Java performance isn’t bad


Especially with some tuning


Memory Management an Issue


Lack of control over JVMs bad


When to garbage collect, how to serialize, etc.