Java Virtual Machine - CS 434

18 Nov 2013

The Java Virtual Machine

Course Overview

PART I: overview material
1. Introduction
2. Language processors (tombstone diagrams, bootstrapping)
3. Architecture of a compiler

PART II: inside a compiler
4. Syntax analysis
5. Contextual analysis
6. Runtime organization
7. Code generation

PART III: conclusion
8. Interpretation
9. Review

Supplementary material: Java’s runtime organization and the Java Virtual Machine

What This Topic is About

We look at the JVM as an example of a real-world runtime system for a modern object-oriented programming language.

JVM is probably the most common and widely used VM in the world, so you’ll get a better idea what a real VM looks like.

- JVM is an abstract machine.
- What is the JVM architecture?
- What is the structure of .class files?
- How are JVM instructions executed?
- What is the role of the constant pool in dynamic linking?

Also visit this site for more complete information about the JVM:

http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html

Recap: Interpretive Compilers

Why?
A tradeoff between fast(er) compilation and reasonable runtime performance.

How?
Use an “intermediate language” that is
- higher-level than machine code => easier to compile to
- lower-level than the source language => easy to implement as an interpreter

Example: A “Java Development Kit” for machine M

[Tombstone diagrams: a Java -> JVM compiler running on M, and a JVM interpreter running on M]

Abstract Machines

An abstract machine implements an intermediate language in between the high-level language (e.g. Java) and the low-level hardware (e.g. Pentium).

[Figure: two routes from the high level (Java) to the low level (Pentium). Direct route: Java -> Pentium. Route via the abstract machine: Java -> JVM (.class files) using the Java compiler (implemented in Java: machine independent), then JVM -> Pentium using a JVM interpreter or JVM JIT compiler.]

Abstract Machines

An abstract machine is intended specifically as a runtime system for a particular (kind of) programming language.

- JVM is a virtual machine for Java programs.
- It directly supports object-oriented concepts such as classes, objects, methods, method invocation etc.
- Easy to compile Java to JVM
  => 1. easy to implement compiler
     2. fast compilation
- Another advantage: portability


Class Files and Class File Format

The JVM is an abstract machine in the truest sense of the word.

The JVM specification does not give implementation details (these can depend on the target OS/platform, performance requirements, etc.).

The JVM specification defines a machine-independent “class file format” that all JVM implementations must support.

[Figure: .class files (external representation, platform independent) are loaded into the JVM (internal representation, implementation dependent) as objects, classes, methods, arrays, strings, and primitive types.]

Data Types

JVM (and Java) distinguishes between two kinds of types:

Primitive types:
- boolean: boolean
- numeric integral: byte, short, int, long, char
- numeric floating point: float, double
- internal, for exception handling: returnAddress

Reference types:
- class types
- array types
- interface types

Note: Primitive types are represented directly; reference types are represented indirectly (as pointers to array or class instances).

JVM: Runtime Data Areas

Besides OO concepts, JVM also supports multi-threading. Threads are directly supported by the JVM.

=> Two kinds of runtime data areas:
1. shared between all threads
2. private to a single thread

[Figure: the shared areas are the garbage-collected heap and the method area; each thread (Thread 1, Thread 2) has its own pc, Java stack, and native method stack.]

Java Stacks

JVM is a stack-based machine, much like TAM.

JVM instructions
- implicitly take arguments from the stack top
- put their result on the top of the stack

The stack is used to
- pass arguments to methods
- return a result from a method
- store intermediate results while evaluating expressions
- store local variables

This works similarly to (but not exactly the same as) what we previously discussed about stack-based storage allocation and routines.

Stack Frames

The Java stack consists of frames. The JVM specification does not say exactly how the stack and frames should be implemented.

The JVM specification specifies that a stack frame has areas for:
- a pointer to the runtime constant pool
- args + local vars
- the operand stack

A new call frame is created by executing some JVM instruction for invoking a method (e.g. invokevirtual, invokenonvirtual, ...).

The operand stack is initially empty, but grows and shrinks during execution.

Stack Frames

The role/purpose of each of the areas in a stack frame:

- pointer to constant pool: used implicitly when executing JVM instructions that contain indexes into the constant pool (more about this later).
- args + local vars: space where the arguments and local variables of a method are stored. For instance methods, this includes a space for the receiver (this) at position/offset 0.
- operand stack: stack for storing intermediate results during the execution of the method.
  - Initially it is empty.
  - The maximum depth is known at compile time.

Stack Frames

An implementation using registers such as SB, ST, and LB and a dynamic link is one possible implementation.

[Figure: a frame between LB and ST on a stack that starts at SB. The frame holds the dynamic link (to the previous frame on the stack), the pointer to the runtime constant pool, args + local vars, and the operand stack.]

The JVM instructions store and load (for accessing args and locals) use addresses which are numbers from 0 to #args + #locals - 1.

JVM Interpreter

The core of a JVM interpreter is basically this:

    do {
        byte opcode = fetch an opcode;
        switch (opcode) {
            case opCode1:
                fetch operands for opCode1;
                execute action for opCode1;
                break;
            case opCode2:
                fetch operands for opCode2;
                execute action for opCode2;
                break;
            case ...
        }
    } while (more to do)
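
The loop above can be made concrete for a handful of opcodes. The following is only a minimal sketch: the opcode numbers are the real JVM ones, but the class name MiniInterpreter, the int[] code representation, and the fixed-size operand stack are assumptions made for illustration, not how any particular JVM is implemented.

```java
// Minimal sketch of the fetch/decode/execute loop for four real JVM opcodes.
// MiniInterpreter and its layout are illustrative assumptions.
final class MiniInterpreter {
    static final int BIPUSH = 16, IADD = 96, IMUL = 104, IRETURN = 172;

    static int run(int[] code) {
        int[] stack = new int[16]; // operand stack (fixed size for this sketch)
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // index of the next code byte to fetch
        while (true) {
            int opcode = code[pc++];          // fetch an opcode
            switch (opcode) {
                case BIPUSH:                  // operand is the byte following the opcode
                    stack[sp++] = (byte) code[pc++];
                    break;
                case IADD: {                  // operands are taken from the stack top
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IMUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case IRETURN:                 // nothing more to do: return the stack top
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // bipush 2; bipush 3; imul; bipush 4; iadd; ireturn  computes 2*3 + 4
        int[] code = {BIPUSH, 2, BIPUSH, 3, IMUL, BIPUSH, 4, IADD, IRETURN};
        System.out.println(run(code)); // prints 10
    }
}
```

A real interpreter would also keep the local variable area and a frame per method invocation; they are left out here to keep the loop itself visible.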

Instruction set: typed instructions!

JVM instructions are explicitly typed: different opCodes for instructions for integers, floats, arrays, reference types, etc.

This is reflected by a naming convention in the first letter of the opCode mnemonics.

Example: different types of “load” instructions

    iload    integer load
    lload    long load
    fload    float load
    dload    double load
    aload    reference-type load
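
The naming convention can be written down as a small lookup. This is a sketch only: the class TypedOpcodes and its methods are hypothetical helpers, and the input is the one-character field descriptor of the value's type. One detail not shown on the slide is that boolean, byte, short, and char locals are also handled by the int instructions.

```java
// Sketch: derive the typed load mnemonic from a descriptor character.
// TypedOpcodes is a hypothetical helper, not part of any JVM API.
final class TypedOpcodes {
    // Maps a descriptor character to the one-letter type prefix of the mnemonic.
    static char prefix(char descriptor) {
        switch (descriptor) {
            case 'Z': case 'B': case 'S': case 'C': case 'I':
                return 'i'; // boolean, byte, short, char and int locals use the int instructions
            case 'J':
                return 'l'; // long
            case 'F':
                return 'f'; // float
            case 'D':
                return 'd'; // double
            case 'L': case '[':
                return 'a'; // class/interface and array references
            default:
                throw new IllegalArgumentException("unknown descriptor: " + descriptor);
        }
    }

    static String load(char descriptor) {
        return prefix(descriptor) + "load";
    }

    public static void main(String[] args) {
        for (char d : new char[] {'I', 'J', 'F', 'D', 'L'}) {
            System.out.println(d + " -> " + load(d));
        }
    }
}
```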

Instruction set: kinds of operands

JVM instructions have three kinds of operands:
- from the top of the operand stack
- from the bytes following the opCode
- part of the opCode itself

Each instruction may have different “forms” supporting different kinds of operands.

Example: different forms of “iload”

    Assembly code    Binary instruction code layout
    iload_0          26
    iload_1          27
    iload_2          28
    iload_3          29
    iload n          21 n
    wide iload n     196 21 n    (n occupies two bytes)
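
A decoder has to tell these forms apart to find the local variable index. A minimal sketch, assuming the code bytes are held as unsigned values (0..255) in an int[]; the class IloadForms is a hypothetical name:

```java
// Sketch: recover the local variable index from the three forms of iload.
// IloadForms is a hypothetical helper; code bytes are unsigned values 0..255.
final class IloadForms {
    static int localIndex(int[] code, int pc) {
        int opcode = code[pc];
        if (opcode >= 26 && opcode <= 29) {
            return opcode - 26;                        // iload_0 .. iload_3: index is part of the opcode
        }
        if (opcode == 21) {
            return code[pc + 1];                       // iload n: index is the byte after the opcode
        }
        if (opcode == 196 && code[pc + 1] == 21) {
            return (code[pc + 2] << 8) | code[pc + 3]; // wide iload n: 16-bit index
        }
        throw new IllegalArgumentException("not an iload form at pc " + pc);
    }

    public static void main(String[] args) {
        System.out.println(localIndex(new int[] {28}, 0));              // iload_2
        System.out.println(localIndex(new int[] {21, 7}, 0));           // iload 7
        System.out.println(localIndex(new int[] {196, 21, 1, 44}, 0));  // wide iload 300
    }
}
```

The short forms save a byte for the most frequently used slots; the wide form is needed only when a method has more than 256 local slots.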

Instruction set: accessing arguments and locals

Arguments and locals area inside a stack frame (slots 0:, 1:, 2:, 3:, ...):
- args: indexes 0 .. #args - 1
- locals: indexes #args .. #args + #locals - 1

Instruction examples:

    iload_1     iload_3     aload 5     aload_0
    istore_1    astore_1    fstore_3

- A load instruction takes something from the args/locals area and pushes it onto the top of the operand stack.
- A store instruction pops something from the top of the operand stack and places it in the args/locals area.
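
How the indexes are assigned can be sketched as follows. The class LocalSlots is a hypothetical helper. It also reflects one detail that the slide does not mention: long and double values occupy two consecutive slots, so #args is counted in slots, not in parameters.

```java
// Sketch: assign slot indexes to the parameters of a method.
// LocalSlots is a hypothetical helper; params holds one descriptor character per parameter.
final class LocalSlots {
    // long (J) and double (D) take two slots, everything else takes one.
    static int width(char descriptor) {
        return (descriptor == 'J' || descriptor == 'D') ? 2 : 1;
    }

    static int[] parameterSlots(boolean isStatic, char[] params) {
        int[] slots = new int[params.length];
        int next = isStatic ? 0 : 1;      // slot 0 holds `this` in instance methods
        for (int i = 0; i < params.length; i++) {
            slots[i] = next;
            next += width(params[i]);
        }
        return slots;
    }

    // Number of slots taken by the arguments; the first local variable goes in this slot.
    static int argsSize(boolean isStatic, char[] params) {
        int size = isStatic ? 0 : 1;
        for (char p : params) {
            size += width(p);
        }
        return size;
    }

    public static void main(String[] args) {
        // An instance method taking one int: the argument is in slot 1, locals start at slot 2.
        char[] params = {'I'};
        System.out.println(parameterSlots(false, params)[0] + " " + argsSize(false, params));
    }
}
```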

Instruction set: non-local memory access

In the JVM, the contents of different “kinds” of memory can be accessed by different kinds of instructions.

- accessing locals and arguments: load and store instructions
- accessing fields in objects: getfield, putfield
- accessing static fields: getstatic, putstatic

Note: Static fields are a lot like global variables. They are allocated in the “method area”, where the code for methods and the representations of classes (including method tables) are also stored.

Note: getfield and putfield access memory in the heap.

Note: JVM doesn’t have anything similar to registers L1, L2, etc.

Instruction set: operations on numbers

Arithmetic
- add: iadd, ladd, fadd, dadd
- subtract: isub, lsub, fsub, dsub
- multiply: imul, lmul, fmul, dmul
- etc.

Conversion
- i2l, i2f, i2d, l2f, l2d, f2d, f2i, d2i, …

Instruction set …

Operand stack manipulation
- pop, pop2, dup, dup2, swap, …

Control transfer
- Unconditional: goto, jsr, ret, …
- Conditional: ifeq, iflt, ifgt, if_icmpeq, …

Instruction set …

Method invocation:
- invokevirtual: the usual instruction for calling a method on an object.
- invokeinterface: same as invokevirtual, but used when the called method is declared in an interface (requires a different kind of method lookup).
- invokespecial: for calling things such as constructors, which are not dynamically dispatched (this instruction is also known by its older name invokenonvirtual).
- invokestatic: for calling methods that have the “static” modifier (these methods are invoked on a class, not on an object).

Returning from methods:
- return, ireturn, lreturn, areturn, freturn, …
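
To connect the four invoke instructions to source code, here is a small sketch. The types Shape, Square, and InvokeKinds are made up for illustration, and the comments name the instruction a Java compiler typically emits for each call; running javap -c on the compiled classes is the way to confirm it for a given compiler.

```java
// Sketch: Java calls and the invoke instruction typically generated for each.
// Shape, Square and InvokeKinds are illustrative names.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        super();              // invokespecial: superclass constructor, no dynamic dispatch
        this.side = side;
    }

    public double area() {
        return side * side;
    }

    static Square unit() {
        return new Square(1.0); // new, then invokespecial on the constructor <init>
    }
}

final class InvokeKinds {
    public static void main(String[] args) {
        Square sq = Square.unit(); // invokestatic: there is no receiver object
        Shape sh = sq;
        double a = sq.area();      // invokevirtual: receiver's static type is a class
        double b = sh.area();      // invokeinterface: receiver's static type is an interface
        System.out.println(a + " " + b);
    }
}
```

Both area() calls run the same method at runtime; the instruction differs only because of the static type through which the call is made.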

Instruction set: Heap Memory Allocation

Create new class instance (object):
- new

Create new array:
- newarray: for creating arrays of primitive types.
- anewarray: for creating arrays of reference types.
- multianewarray: for creating multidimensional arrays.
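
As an illustration, each allocation expression below is annotated with the instruction a Java compiler typically generates for it. The class Allocations is a hypothetical example; javap -c shows the actual instructions for a given compiler.

```java
// Sketch: Java allocation expressions and the allocation instruction typically used.
// Allocations is an illustrative class name.
final class Allocations {
    static Object[] allocate() {
        Object obj = new Object();       // new (followed by invokespecial on <init>)
        int[] ints = new int[3];         // newarray int
        String[] strs = new String[3];   // anewarray java/lang/String
        int[][] grid = new int[2][3];    // multianewarray [[I with 2 dimensions
        return new Object[] {obj, ints, strs, grid}; // anewarray java/lang/Object
    }

    public static void main(String[] args) {
        Object[] all = allocate();
        System.out.println(all.length);
    }
}
```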

Instructions and the “Constant Pool”

Many JVM instructions have operands which are indexes pointing to an entry in the so-called constant pool.

The constant pool contains all kinds of entries that represent “symbolic” references for “linking”. This is the way that instructions refer to things such as classes, interfaces, fields, methods, and constants such as string literals and numbers.

These are the kinds of constant pool entries that exist:
- Class_info
- Fieldref_info
- Methodref_info
- InterfaceMethodref_info
- String
- Integer
- Float
- Long
- Double
- Name_and_Type_info
- Utf8_info (Unicode characters)

Instructions and the “Constant Pool”

Example: We examine the getfield instruction in detail.

Format:

    180    indexbyte1    indexbyte2

The two index bytes together form the index of a Fieldref entry in the constant pool:

    CONSTANT_Fieldref_info {
        u1 tag;
        u2 class_index;            -> CONSTANT_Class_info
        u2 name_and_type_index;    -> CONSTANT_Name_and_Type_info
    }

    CONSTANT_Class_info {
        u1 tag;
        u2 name_index;             -> Utf8Info: fully qualified class name
    }

    CONSTANT_Name_and_Type_info {
        u1 tag;
        u2 name_index;             -> Utf8Info: name of field
        u2 descriptor_index;       -> Utf8Info: field descriptor
    }
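
Following the chain of indexes can be sketched in a few lines. This is a simplified model, not the real class file parser: the entry classes, the Object[] pool, and the sample contents are assumptions made for illustration (index 0 is left empty because constant pool indexes start at 1).

```java
// Sketch: follow a getfield operand through a simplified constant pool model.
// All class names and the sample pool are illustrative assumptions.
final class ConstantPoolSketch {
    static final class Utf8 {
        final String text;
        Utf8(String text) { this.text = text; }
    }
    static final class ClassInfo {
        final int nameIndex;
        ClassInfo(int nameIndex) { this.nameIndex = nameIndex; }
    }
    static final class NameAndType {
        final int nameIndex, descriptorIndex;
        NameAndType(int nameIndex, int descriptorIndex) {
            this.nameIndex = nameIndex;
            this.descriptorIndex = descriptorIndex;
        }
    }
    static final class Fieldref {
        final int classIndex, nameAndTypeIndex;
        Fieldref(int classIndex, int nameAndTypeIndex) {
            this.classIndex = classIndex;
            this.nameAndTypeIndex = nameAndTypeIndex;
        }
    }

    static String utf8(Object[] pool, int index) {
        return ((Utf8) pool[index]).text;
    }

    // Fieldref -> Class and NameAndType -> Utf8 strings.
    static String describeField(Object[] pool, int indexbyte1, int indexbyte2) {
        int index = (indexbyte1 << 8) | indexbyte2;   // the operand is a 16-bit pool index
        Fieldref ref = (Fieldref) pool[index];
        ClassInfo cls = (ClassInfo) pool[ref.classIndex];
        NameAndType nt = (NameAndType) pool[ref.nameAndTypeIndex];
        return utf8(pool, cls.nameIndex) + "." + utf8(pool, nt.nameIndex)
                + " : " + utf8(pool, nt.descriptorIndex);
    }

    static Object[] samplePool() {
        return new Object[] {
            null,                          // 0: unused
            new Fieldref(2, 4),            // 1
            new ClassInfo(3),              // 2
            new Utf8("mypackage/Queue"),   // 3: fully qualified class name
            new NameAndType(5, 6),         // 4
            new Utf8("i"),                 // 5: name of field
            new Utf8("I")                  // 6: field descriptor (int)
        };
    }

    public static void main(String[] args) {
        System.out.println(describeField(samplePool(), 0, 1));
    }
}
```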

Instructions and the “Constant Pool”

That previous picture is rather complicated; let’s simplify it a little:

Format:

    180    indexbyte1    indexbyte2

    Fieldref
        Class
            Utf8Info: fully qualified class name
        Name_and_Type
            Utf8Info: name of field
            Utf8Info: field descriptor

Instructions and the “Constant Pool”

Luckily, we have a Java assembler that allows us to write a kind of textual assembly code, which is then transformed into a binary .class file.

This assembler takes care of creating the constant pool entries for us. When an instruction operand expects a constant pool entry, the assembler allows you to enter the entry “in place” in an easy syntax.

Example:

    getfield mypackage/Queue i I

The format of the constant pool entries is part of the Java class file format.

Instructions and the “Constant Pool”

Fully qualified class names and descriptors are stored in constant pool UTF8 entries.

1. Fully qualified class name: a package + class name string. Note that this uses “/” instead of “.” to separate each level along the path.

2. Descriptor: a string that defines a type for a method or field.

    Java                   descriptor
    boolean                Z
    int                    I
    Object                 Ljava/lang/Object;
    String[]               [Ljava/lang/String;
    int foo(int,Object)    (ILjava/lang/Object;)I
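
The descriptor rules in the table can be sketched as a small function over java.lang.Class objects. The class Descriptors and its method names are hypothetical; only standard reflection calls (isArray, getComponentType, getName) are used.

```java
// Sketch: build field and method descriptors from Class objects.
// Descriptors is a hypothetical helper written for this example.
final class Descriptors {
    static String of(Class<?> type) {
        if (type == void.class) return "V";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        if (type.isArray()) {
            return "[" + of(type.getComponentType());  // one '[' per array dimension
        }
        // Class or interface: L, then the name with '/' instead of '.', then ';'
        return "L" + type.getName().replace('.', '/') + ";";
    }

    static String ofMethod(Class<?> returnType, Class<?>... parameterTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : parameterTypes) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(of(String[].class));
        System.out.println(ofMethod(int.class, int.class, Object.class));
    }
}
```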

Linking

In general, linking is the process of resolving symbolic references in binary files.

Most programming language implementations have what we call “separate compilation”. Modules or files can be compiled separately and transformed into some binary format. But since these separately compiled files may have connections to other files, they have to be linked.

=> The binary file is not yet executable, because it has some kind of “symbolic links” in it that point to things (classes, methods, functions, variables, etc.) in other files/modules.

Linking is the process of resolving these symbolic links and replacing them by real addresses so that the code can be executed.

Loading and Linking in JVM

In the JVM, loading and linking of class files happens at runtime, while the program is running!

- Classes are loaded as needed.
- The constant pool contains symbolic references that need to be resolved before a JVM instruction that uses them can be executed (this is the equivalent of linking).
- In the JVM, a constant pool entry is resolved the first time it is used by a JVM instruction.

Example: When a getfield is executed for the first time, the constant pool entry index in the instruction can be replaced by the offset of the field.
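
The idea of resolving on first use can be modelled in a few lines. This is only a toy model under stated assumptions: the class LazyResolution, the String[] that stands in for an object's field layout, and the resolutions counter are all invented for illustration and do not reflect how a real JVM stores resolved entries.

```java
// Toy model: a constant pool slot that is resolved the first time it is used.
// LazyResolution and the String[] field layout are illustrative assumptions.
final class LazyResolution {
    private Object entry;               // a String while symbolic, an Integer once resolved
    private final String[] fieldLayout; // hypothetical layout: field name stored at each offset
    int resolutions = 0;                // counts how often the symbolic lookup actually ran

    LazyResolution(String symbolicName, String[] fieldLayout) {
        this.entry = symbolicName;
        this.fieldLayout = fieldLayout;
    }

    int fieldOffset() {
        if (entry instanceof String) {              // still symbolic: resolve now
            resolutions++;
            String name = (String) entry;
            for (int i = 0; i < fieldLayout.length; i++) {
                if (fieldLayout[i].equals(name)) {
                    entry = Integer.valueOf(i);     // replace the symbolic reference by the offset
                    return i;
                }
            }
            throw new NoSuchFieldError(name);
        }
        return (Integer) entry;                     // already resolved: no lookup needed
    }

    public static void main(String[] args) {
        LazyResolution slot = new LazyResolution("i", new String[] {"head", "i"});
        System.out.println(slot.fieldOffset() + " " + slot.fieldOffset() + " " + slot.resolutions);
    }
}
```

Every later use of the slot skips the name lookup, which is why resolution costs are paid only once per entry.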

Closing Example

As a closing example on the JVM, we will take a look at the compiled code of the following simple Java class declaration.

    class Factorial {

        int fac(int n) {
            int result = 1;
            // as written, the test i < n stops the loop at n - 1;
            // the compiled code uses the matching comparison
            for (int i = 2; i < n; i++) {
                result = result * i;
            }
            return result;
        }
    }

Compiling and Disassembling

    % javac Factorial.java
    % javap -c -verbose Factorial
    Compiled from Factorial.java
    class Factorial extends java.lang.Object {
        Factorial();
            /* Stack=1, Locals=1, Args_size=1 */
        int fac(int);
            /* Stack=2, Locals=4, Args_size=2 */
    }

    Method Factorial()
       0 aload_0
       1 invokespecial #1 <Method java.lang.Object()>
       4 return


Compiling and Disassembling ...


                             // address: 0    1 2      3
    Method int fac(int)      // stack:   this n result i
       0 iconst_1            // stack:   this n result i 1
       1 istore_2            // stack:   this n result i
       2 iconst_2            // stack:   this n result i 2
       3 istore_3            // stack:   this n result i
       4 goto 14
       7 iload_2             // stack:   this n result i result
       8 iload_3             // stack:   this n result i result i
       9 imul                // stack:   this n result i result*i
      10 istore_2            // stack:   this n result i
      11 iinc 3 1            // stack:   this n result i
      14 iload_3             // stack:   this n result i i
      15 iload_1             // stack:   this n result i i n
      16 if_icmplt 7         // stack:   this n result i
      19 iload_2             // stack:   this n result i result
      20 ireturn
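
To check the trace by hand, the 21 code bytes of fac can be run by a small interpreter. This is a sketch under assumptions: the byte values are the standard encodings of the listed instructions, branch operands are encoded as signed 16-bit offsets relative to the branch instruction, and the class FacBytecode with its int[] representation is made up for illustration. Slot 0 would hold this; it is never read by the method, so it is left at 0 here.

```java
// Sketch: execute the bytecode of fac(int) from the listing with a tiny interpreter.
// FacBytecode is an illustrative name; only the opcodes used by fac are supported.
final class FacBytecode {
    static final int[] CODE = {
        4,               //  0 iconst_1
        61,              //  1 istore_2
        5,               //  2 iconst_2
        62,              //  3 istore_3
        167, 0, 10,      //  4 goto 14        (4 + 10)
        28,              //  7 iload_2
        29,              //  8 iload_3
        104,             //  9 imul
        61,              // 10 istore_2
        132, 3, 1,       // 11 iinc 3 1
        29,              // 14 iload_3
        27,              // 15 iload_1
        161, 0xFF, 0xF7, // 16 if_icmplt 7    (16 - 9)
        28,              // 19 iload_2
        172              // 20 ireturn
    };

    // Signed 16-bit branch offset stored in the two bytes after the opcode.
    static int branchOffset(int pc) {
        return (short) ((CODE[pc + 1] << 8) | CODE[pc + 2]);
    }

    static int fac(int n) {
        int[] locals = new int[4];  // Locals=4: 0 = this (unused), 1 = n, 2 = result, 3 = i
        int[] stack = new int[2];   // Stack=2: maximum operand stack depth
        int sp = 0;
        int pc = 0;
        locals[1] = n;
        while (true) {
            int op = CODE[pc];
            switch (op) {
                case 4: case 5:                       // iconst_1, iconst_2
                    stack[sp++] = op - 3; pc += 1; break;
                case 27: case 28: case 29:            // iload_1 .. iload_3
                    stack[sp++] = locals[op - 26]; pc += 1; break;
                case 61: case 62:                     // istore_2, istore_3
                    locals[op - 59] = stack[--sp]; pc += 1; break;
                case 104: {                           // imul
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b; pc += 1; break;
                }
                case 132:                             // iinc index const
                    locals[CODE[pc + 1]] += (byte) CODE[pc + 2]; pc += 3; break;
                case 167:                             // goto
                    pc += branchOffset(pc); break;
                case 161: {                           // if_icmplt
                    int b = stack[--sp];
                    int a = stack[--sp];
                    pc += (a < b) ? branchOffset(pc) : 3; break;
                }
                case 172:                             // ireturn
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unsupported opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fac(5)); // prints 24: the loop test is i < n, so 2*3*4
    }
}
```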

Writing Factorial in “jasmin”

Jasmin is a Java Assembler Interface. It takes ASCII descriptions for Java classes, written in a simple assembler-like syntax and using the Java Virtual Machine instruction set. It converts them into binary Java class files suitable for loading into a JVM implementation.

    .class package Factorial
    .super java/lang/Object

    .method package <init>()V
        .limit stack 50
        .limit locals 1
        aload_0
        invokenonvirtual java/lang/Object/<init>()V
        return
    .end method

Writing Factorial in “jasmin” (continued)

    .method package fac(I)I
        .limit stack 50
        .limit locals 4
        iconst_1
        istore 2
        iconst_2
        istore 3
    Label_1:
        iload 3
        iload 1
        if_icmplt Label_4
        iconst_0
        goto Label_5
    Label_4:
        iconst_1
    Label_5:
        ifeq Label_2
        iload 2
        iload 3
        imul
        dup
        istore 2
        pop
    Label_3:
        iload 3
        dup
        iconst_1
        iadd
        istore 3
        pop
        goto Label_1
    Label_2:
        iload 2
        ireturn
        iconst_0
        ireturn
    .end method