JavaMPI on Grid with communication-optimizing load sharing


G-JavaMPI: A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports


Lin Chen, Cho-Li Wang, Francis C. M. Lau and Ricky K. K. Ma

Department of Computer Science and Information Systems

The University of Hong Kong

{lchen2+clwang+fcmlau+kk1ma}@csis.hku.hk

GCC2002 Presentation

Lin Chen, CSIS, HKU (Dec. 26, 2002)


Outline

Motivation

Overall system architecture

Detailed Issues

Related works

Conclusion & Future Work


Motivation

Grid computing: large-scale resource sharing, high performance

Globus Project: basic services required for building and using a Grid (authentication, security, resource allocation, remote data access, information services, etc.)

However

long-running applications

continuous computation

better utilization of resources

scheduling and load balancing

Java process migration

architecture-independent bytecode makes migration easier


Motivation

Let the programmer write a grid application easily

No need to distinguish inter-site from intra-site communication (which is required when using the Globus communication libraries directly)

SPMD: one program can be executed at multiple places or sites

MPI paradigm

A group of distributed processes that can perform peer-to-peer or collective communication

Communication source and destination addresses are independent of the real physical network address (adaptable)
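
The point about addresses can be illustrated with a small sketch. It does not use any real MPI library; `RankTable` and its methods are hypothetical names chosen for illustration only. A sender names a peer by rank, and a table maps that rank to whichever physical endpoint currently hosts the process, so the mapping can change without touching application code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: ranks are logical addresses; the table maps
// each rank to the physical endpoint that currently hosts the process.
final class RankTable {
    private final Map<Integer, String> endpointOfRank = new ConcurrentHashMap<Integer, String>();

    // Registers or moves a rank; returns the previous endpoint, or null.
    String bind(int rank, String endpoint) {
        return endpointOfRank.put(rank, endpoint);
    }

    // Resolves a logical rank to its current physical endpoint.
    String resolve(int rank) {
        String endpoint = endpointOfRank.get(rank);
        if (endpoint == null) {
            throw new IllegalArgumentException("unknown rank " + rank);
        }
        return endpoint;
    }

    // What a send would do: the caller names only the rank.
    String describeSend(int destRank, String payload) {
        return "send '" + payload + "' to rank " + destRank + " at " + resolve(destRank);
    }
}
```

Rebinding a rank with `bind(1, "siteC:6000")` changes where later sends to rank 1 are routed, while the calling code still refers only to rank 1. The endpoint strings are made-up placeholders.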


System Overview

[Diagram: sites connected over a WAN, each with a Gatekeeper and a local scheduler (LS). A migration module (M) resides in each JVM. The labels (1), (2), (3) and (1*), (2*), (3*) mark Java-MPI communication routes; (*) marks legacy messages that are redirected during migration.]

Migrating: restarting a new process through a Globus remote job request with delegated user credentials and Java-MPI job credentials


System overview




[Diagram legend]

Globus Toolkit libraries

Java MPI communication daemons

Local schedulers (LS)

Java-MPI processes

Migration modules (M)

A Java-MPI process (before migration)

Java-MPI process (after migration)

(1), (2), (3): MPI communication route before migration

(1*), (2*), (3*): MPI communication route after migration

(*): Java MPI communication daemons redirect some legacy messages that should go to the migrated process


Layered design

[Diagram: layered architecture. Named layers: Java-MPI API Layer (Java-MPI API & Java API), Load Balancing Module, Restorable MPI Comm Layer, Migration Layer (Execution State Probe & Migration Plug-in). Other labels: Java-MPI Applications, Restorable Communication Services, Authentication, Control Block, DLB Policy, Info. Update, MPICH-G2, Message Queues, Migration Instructions, Globus Services, JVM, JVMDI, OS, Hardware.]


Java-MPI binding

Restorable communication layer

Daemon: a running MPICH-G2 process that provides MPI communication services

Communicates with the JavaMPI process through IPC

Post-migration message redirection

[Diagram labels: Restorable Communication; Process space]
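
A minimal sketch of post-migration redirection, under stated assumptions: `CommDaemon` and all of its methods are hypothetical and stand in for the MPICH-G2 daemon, the IPC path is reduced to a direct method call, and the process is assumed to have consumed its queued messages before it moves. The only point shown is that a message arriving at the old site after migration is forwarded to the daemon at the new site.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of a communication daemon that keeps per-rank message
// queues and forwards late ("legacy") messages for a rank that has migrated.
final class CommDaemon {
    private final String site;
    private final Map<Integer, Queue<String>> localQueues = new HashMap<Integer, Queue<String>>();
    private final Map<Integer, CommDaemon> forwardTo = new HashMap<Integer, CommDaemon>();

    CommDaemon(String site) {
        this.site = site;
    }

    // Starts hosting a rank at this site.
    void host(int rank) {
        localQueues.put(rank, new ArrayDeque<String>());
        forwardTo.remove(rank);
    }

    // Records that a rank has moved to another daemon.
    // Assumes the process already consumed everything in its local queue.
    void migrate(int rank, CommDaemon destination) {
        localQueues.remove(rank);
        destination.host(rank);
        forwardTo.put(rank, destination);
    }

    // Delivers locally, or redirects if the rank has migrated away.
    void deliver(int rank, String message) {
        Queue<String> queue = localQueues.get(rank);
        if (queue != null) {
            queue.add(message);
            return;
        }
        CommDaemon next = forwardTo.get(rank);
        if (next == null) {
            throw new IllegalStateException("rank " + rank + " unknown at " + site);
        }
        next.deliver(rank, message);
    }

    // Called by the local Java process to fetch its next message, or null.
    String receive(int rank) {
        Queue<String> queue = localQueues.get(rank);
        return queue == null ? null : queue.poll();
    }
}
```

After `a.migrate(3, b)`, a call to `a.deliver(3, "m2")` ends up in the queue held by `b`, so the migrated process can read it with `b.receive(3)`.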


Java Process Migration

State capturing:

A probe attached to each JVM saves the process context through JVMDI (JVM Debug Interface)

All runtime data: PC register, stack frames, objects, method area (local variables), etc.

Event notification: method_entry, frame_pop, etc.

Use object serialization to package all reachable objects in the heap

New JDK 1.4.0 & 1.4.1 released in Aug. 2002 support “full-speed debugging”

[Diagram: the probe obtains (1) execution state data and (2) event notification from the JVM through JVMDI]
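
The serialization step can be sketched with the standard `java.io` object streams. This is an illustration of how writing one root object packages everything reachable from it, not the presenters' implementation; `HeapSnapshot` and `Node` are hypothetical names, and checked exceptions are wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class HeapSnapshot {
    // A small object graph: each node may reference another node.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;
        Node next;

        Node(String label) {
            this.label = label;
        }
    }

    // Serializes root and everything reachable from it into a byte array.
    static byte[] capture(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object graph from the bytes produced by capture.
    static Object restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Shared references and cycles survive the round trip: if node `a` points to `b` and `b` points back to `a`, the restored copy of `a` reaches itself again through `next.next`.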


Java process migration

State Restoration:

Exception handler inserted in bytecode (pre-processing before execution) to restore local variables and “jump” to the original execution point

Re-allocate objects when restarting the JVM

Dynamic class loading
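
A source-level analogue of the restoration idea, offered as a sketch under assumptions: the real mechanism rewrites bytecode, where a handler can branch to an arbitrary offset, whereas Java source can only resume at a point the method is structured for. `RestorableSum`, `Frame`, and `RestoreSignal` are hypothetical names. On a restarted run the method raises a signal at entry, and the inserted handler reloads the saved local variables before the loop continues from where it stopped.

```java
// Hypothetical source-level analogue of the rewritten bytecode.
final class RestorableSum {
    // Saved local variables of the method at its single resume point.
    static final class Frame {
        final int i;
        final long sum;

        Frame(int i, long sum) {
            this.i = i;
            this.sum = sum;
        }
    }

    // Signals that the method was entered in restore mode.
    static final class RestoreSignal extends RuntimeException {
        final Frame frame;

        RestoreSignal(Frame frame) {
            super("restore");
            this.frame = frame;
        }
    }

    // Sums 1..n. If saved is non-null, resumes from the saved frame.
    static long sumTo(int n, Frame saved) {
        int i = 1;
        long sum = 0;
        try {
            if (saved != null) {
                throw new RestoreSignal(saved);
            }
        } catch (RestoreSignal signal) {
            // Inserted handler: restore locals, then fall through to the loop,
            // which plays the role of the "jump" to the original point.
            i = signal.frame.i;
            sum = signal.frame.sum;
        }
        for (; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs the same loop until i reaches stopAt and returns the frame a
    // probe would have saved at that moment.
    static Frame runUntil(int n, int stopAt) {
        int i = 1;
        long sum = 0;
        for (; i <= n && i < stopAt; i++) {
            sum += i;
        }
        return new Frame(i, sum);
    }
}
```

Calling `sumTo(100, runUntil(100, 40))` gives the same result as an uninterrupted `sumTo(100, null)`, because the resumed run starts from the saved values of `i` and `sum`.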


Information update

[Sequence diagram with three columns: migration source site, migration destination site, other sites]

Migration begins

Notify other sites (including the destination site)

The process reaches the safe migration point (all legacy messages consumed)

Update the local site's record of the process's new place

Begin process state capture
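
The ordering of these steps can be sketched as follows. This is an assumption-laden illustration: `MigrationUpdate`, `Site`, and `run` are hypothetical names, notification is reduced to a log entry, and the choice to update the record at every notified site is a guess at what the diagram shows. The sketch only makes the sequence concrete: notify first, drain legacy messages to reach the safe point, update location records, then start capturing state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the update sequence; names are illustrative only.
final class MigrationUpdate {
    // A site's local record of where each rank currently runs.
    static final class Site {
        final String name;
        final Map<Integer, String> placeOfRank = new HashMap<Integer, String>();

        Site(String name) {
            this.name = name;
        }
    }

    // Walks through the steps in slide order and returns a log of them.
    static List<String> run(int rank, Site source, Site destination,
                            List<Site> otherSites, Queue<String> inbox) {
        List<String> log = new ArrayList<String>();
        log.add("begin migration at " + source.name);

        // Notify every other site, including the destination.
        List<Site> notified = new ArrayList<Site>(otherSites);
        notified.add(destination);
        for (Site site : notified) {
            log.add("notify " + site.name);
        }

        // Safe migration point: the process consumes all legacy messages.
        int consumed = 0;
        while (inbox.poll() != null) {
            consumed++;
        }
        log.add("safe point after " + consumed + " legacy messages");

        // Each site updates its local record of the process's new place.
        source.placeOfRank.put(rank, destination.name);
        for (Site site : notified) {
            site.placeOfRank.put(rank, destination.name);
        }
        log.add("records updated");

        log.add("begin state capture");
        return log;
    }
}
```

With source A, destination B, one other site C, and two pending messages, the log has six entries: begin, two notifications, the safe point, the record update, and the start of state capture.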


Process Restart

[Sequence diagram with two columns: original process, newly started process]

Creates a new user certificate proxy (proxy_init_cred)

The proxy is delegated to the remote site

Get the resource allocation

The new process can be started (similar to a normal Globus job submission)

JVM initialization; the probe is started at the same time

The process is suspended at the beginning; the probe reads the context from the dump file

Restoring the execution context

The process is resumed and continues from the last point




Experiment Results

Hardware


32-node cluster “ostrich”

Configured as two grid points of 16 nodes

733MHz Pentium III processor

392MB of memory

Connected by a 24-port Fast Ethernet switch

Software


Linux 2.2.14


Globus 2.0

Sun JDK 1.4.0_02 (supporting JVMDI with full-speed debugging mode)

MPICH 1.2.4 (MPICH-G2)



Experiment results

[Figure: Bandwidth. x-axis: Message Size (byte), 8 to 2048; y-axis: Bandwidth (Kbyte/s), 0 to 5000; series: intra-site bandwidth, inter-site bandwidth]

Bandwidth comparison between inter-site and intra-site communication with the installation of the MPI communication layer.


Experiment results

[Figure: Latency. x-axis: Message Size (byte), 1 to 2048; y-axis: Latency (s), 0 to 0.6; series: inter-site latency, intra-site latency]

Latency comparison for small messages between intra-site and inter-site communication with the installation of the MPI communication layer.



Experiment results

[Figure: Time for capturing and restoring objects. x-axis: object size (byte), 1 to 10M; y-axis: time (microsecond), 0 to 3000; series: capturing objects, restoring objects]

Time spent in capturing and restoring objects



Experiment results

[Figure: Time for capturing and restoring Java frames. x-axis: number of frames, 1 to 300; y-axis: time (seconds), 0 to 6; series: capturing frames, restoring frames]

Time spent in capturing and restoring frames



Related Works

Java bindings for MPI: “mpiJava”, “JavaMPI”, “MPIJ”, etc.

Java process or thread migration:

Add backup code to programs [Aglets [IBM96]]

Insert backup statements in the source or bytecode; a backup object is used to store state [Wasp project [Funfrocken98]]

Extend the JVM, make state accessible from Java programs, support type recognition of the Java stack [Sara Bouchenak 2000]

Use JVMDI to capture state, insert bytecode instructions in the program body to help restoring [Torsten2001]

JESSICA (supports thread migration in JVM)



Conclusion

A new middleware for the Grid with Java-MPI communication and transparent process migration features.

Write MPI-style programs in the Java language

The Java process migration mechanism supports the development of any dynamic load balancing policy or fault tolerance mechanism


Future Plan

Develop some scientific and engineering applications on top of this middleware

Support for the transfer of other I/O (including file stage-in/out)

Load balancing algorithm for the grid environment (both CPU and network load)


The End

Thanks!