Object Serialization in Java

computerharpy | Software and s/w Development

Dec 2, 2013

Or: The Persistence of Memory…

So you want to save your data…

- Common problem:
  - You’ve built a large, complex object
    - Spam/Normal statistics tables
    - Game state
    - Database of student records
    - Etc…
  - Want to store on disk and retrieve later
  - Or: want to send over network to another Java process
- In general: want your objects to be persistent, i.e., to outlive the current Java process

Approach 1: Homebrew file formats

- You’ve got file I/O nailed, so…
- Write a set of methods for saving/loading each class that you care about

public class MyClass {
    public void saveYourself(Writer o) throws IOException { … }

    public static MyClass loadYourself(Reader r) throws IOException { … }
}
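To make the skeleton concrete, here is a minimal sketch of the homebrew approach for a hypothetical Point class (not from the slides), assuming a one-line, tab-delimited text format. The private readLine helper is also an assumption: it reads one character at a time so it never consumes input that belongs to the next record.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical class used only for illustration: a 2-D point stored as
// one tab-delimited line of text, "x<TAB>y".
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Write this point as a single tab-delimited line.
    public void saveYourself(Writer w) throws IOException {
        w.write(x + "\t" + y + "\n");
        w.flush();
    }

    // Parse one line from r back into a Point.
    public static Point loadYourself(Reader r) throws IOException {
        String line = readLine(r);
        String[] parts = line.split("\t");
        if (parts.length != 2) {
            throw new IOException("malformed Point record: " + line);
        }
        try {
            return new Point(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        } catch (NumberFormatException e) {
            throw new IOException("malformed Point record: " + line, e);
        }
    }

    // Read characters up to (not including) '\n'. Reading one char at a time
    // avoids buffering past this record, so other loaders can share r.
    private static String readLine(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        if (c == -1 && sb.length() == 0) {
            throw new IOException("unexpected end of input");
        }
        return sb.toString();
    }
}
```

With this format, new Point(3, -4).saveYourself(w) writes the line "3<TAB>-4", which a spreadsheet could also open.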

Coolnesses of Approach 1:

- Can produce arbitrary file formats
- Know exactly what you want to store and get back; don’t store extraneous stuff
- Can build file formats to interface with other codes/programs
  - XML
  - Tab-delimited/spreadsheet
  - Etc.
- If your classes are nicely hierarchical, this makes saving/loading simple

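Saving/Loading Recursive Data Structs

public interface Saveable {
    public void saveYourself(Writer w) throws IOException;

    // should also have this
    // public static Object loadYourself(Reader r)
    //     throws IOException;
    // but an interface can't require implementing classes to provide a
    // static method (since Java 8 an interface may declare its own static
    // methods, but they aren't inherited or overridden by implementers)
}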

Saving, cont’d

public class MyClassA implements Saveable {
    public MyClassA(int arg) {
        // initialize private data members of A
    }

    public void saveYourself(Writer w) throws IOException {
        // write MyClassA identifier and private data on stream w
    }

    public static MyClassA loadYourself(Reader r) throws IOException {
        // parse MyClassA from the data stream r (data = the value parsed)
        MyClassA tmp = new MyClassA(data);
        return tmp;
    }
}

Saving, cont’d

public class MyClassB implements Saveable {
    public MyClassB(MyClassA stuff) { … }

    private MyClassA _stuff;

    public void saveYourself(Writer w) throws IOException {
        // write ID for MyClassB
        _stuff.saveYourself(w);
        // write other private data for MyClassB
        w.flush();
    }

    public static MyClassB loadYourself(Reader r) throws IOException {
        // parse MyClassB ID from r
        MyClassA tmp = MyClassA.loadYourself(r);
        // parse other private data for MyClassB
        return new MyClassB(tmp);
    }
}
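Putting the MyClassA/MyClassB skeletons together, here's one possible runnable version of the recursive descent idea. The line-oriented format (class ID on its own line, then one field per line), the LineIO helper, and MyClassB's label field are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// The Saveable interface from the slides.
interface Saveable {
    void saveYourself(Writer w) throws IOException;
}

// Helpers for an assumed line-oriented format: a class ID line, then one line per field.
final class LineIO {
    private LineIO() {}

    // Read up to '\n' one char at a time, so nested loaders can share the Reader.
    static String readLine(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        if (c == -1 && sb.length() == 0) {
            throw new IOException("unexpected end of input");
        }
        return sb.toString();
    }

    // Consume a line and check that it is the expected class ID.
    static void expect(Reader r, String id) throws IOException {
        String line = readLine(r);
        if (!line.equals(id)) {
            throw new IOException("expected " + id + " but found " + line);
        }
    }
}

class MyClassA implements Saveable {
    private final int data;

    MyClassA(int data) { this.data = data; }

    int getData() { return data; }

    public void saveYourself(Writer w) throws IOException {
        w.write("MyClassA\n");
        w.write(data + "\n");
    }

    static MyClassA loadYourself(Reader r) throws IOException {
        LineIO.expect(r, "MyClassA");
        return new MyClassA(Integer.parseInt(LineIO.readLine(r)));
    }
}

class MyClassB implements Saveable {
    private final MyClassA stuff;
    private final String label;   // hypothetical extra field; must not contain '\n'

    MyClassB(MyClassA stuff, String label) {
        this.stuff = stuff;
        this.label = label;
    }

    MyClassA getStuff() { return stuff; }
    String getLabel() { return label; }

    public void saveYourself(Writer w) throws IOException {
        w.write("MyClassB\n");
        stuff.saveYourself(w);     // the child writes itself in place
        w.write(label + "\n");
        w.flush();
    }

    static MyClassB loadYourself(Reader r) throws IOException {
        LineIO.expect(r, "MyClassB");
        MyClassA tmp = MyClassA.loadYourself(r);  // descend into the child
        String label = LineIO.readLine(r);
        return new MyClassB(tmp, label);
    }
}
```

Saving new MyClassB(new MyClassA(42), "hello") produces the four lines MyClassB, MyClassA, 42, hello; loading walks them back in the same order.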

Painfulnesses of Approach 1:

- This is called recursive descent parsing (and formatting)
- We’ll use it in project 2, and there are plenty of places in the Real World (TM) where it’s terribly useful.
- But... It’s also a pain in the a**
- If all you want to do is store/retrieve data, do you really need to go to all of that effort?
- Fortunately, no. Java provides a shortcut that takes a lot of the work out.

Approach 2: Enter Serialization...

- Java provides the serialization mechanism for object persistence
- It essentially automates the grunt work for you
- Short form:

public class MyClassA implements Serializable { ... }

// in some other code elsewhere...
MyClassA tmp = new MyClassA(arg);
FileOutputStream fos = new FileOutputStream("some.obj");
ObjectOutputStream out = new ObjectOutputStream(fos);
out.writeObject(tmp);
out.flush();
out.close();
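A complete version of the short form, including the read side, might look like the following. Student and its fields are illustrative names; try-with-resources closes both streams even if writing fails.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical record type; implementing Serializable is all it needs.
class Student implements Serializable {
    final String name;
    final int id;

    Student(String name, int id) {
        this.name = name;
        this.id = id;
    }
}

class SerializationDemo {
    // Write obj to file. Closing an ObjectOutputStream flushes it first,
    // so no explicit flush() is needed here.
    static void save(Object obj, File file) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(obj);
        }
    }

    // Read back whatever object was written; the caller casts it.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (FileInputStream fis = new FileInputStream(file);
             ObjectInputStream in = new ObjectInputStream(fis)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = new File("some.obj");
        save(new Student("Ada", 7), f);
        Student s = (Student) load(f);
        System.out.println(s.name + " " + s.id);  // prints "Ada 7"
    }
}
```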


In a bit more detail...

- To (de)serialize an object, its class must implement Serializable
- All of its data members must also be serializable (static and transient fields are the exceptions)
  - And so on, recursively...
  - Primitive types (int, char, etc.) are all serializable automatically
  - So are Strings, most classes in java.util, etc.
- This saves/retrieves the entire object graph, including ensuring uniqueness of objects: an object reachable along several paths is written only once, and after deserialization those paths again share a single copy


[Figure: The object graph and uniqueness, showing a MondoHashTable, two Entry objects, the strings “tyromancy” and “zygopleural”, and a Vector]
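A quick way to see the uniqueness guarantee in action is to serialize a list that holds the same object twice. This sketch round-trips through an in-memory byte array (SharingDemo and roundTrip are made-up names) and uses StringBuilder because it is mutable, so the sharing is observable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class SharingDemo {
    // Serialize obj into an in-memory byte array, then deserialize it again.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder shared = new StringBuilder("tyromancy");
        List<StringBuilder> list = new ArrayList<>();
        list.add(shared);
        list.add(shared);                 // the same object, reachable twice

        List<StringBuilder> copy = roundTrip(list);
        System.out.println(copy.get(0) == copy.get(1)); // true: still one shared object
        System.out.println(copy.get(0) == shared);       // false: it is a new copy

        copy.get(0).append("!");
        System.out.println(copy.get(1));                 // tyromancy!
    }
}
```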

Now for some subtleties...

- static fields are not automatically serialized
  - Not possible to automatically serialize them b/c they’re owned by an entire class, not an object
- Options:
  - final static fields are automatically initialized (once) the first time a class is loaded
  - static fields initialized in the static {} block will be initialized the first time a class is loaded
- But what about other static fields?
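The following sketch (Counter and StaticDemo are hypothetical names) shows that a static field is neither saved nor restored: a change made to it after serialization sticks, while the instance field comes back from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Counter implements Serializable {
    static int instances = 0;  // static: belongs to the class, never written to the stream
    int value;                 // instance field: written to the stream

    Counter(int value) {
        this.value = value;
        instances++;
    }
}

class StaticDemo {
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter(5);            // instances is now 1
        byte[] saved = toBytes(c);

        Counter.instances = 99;                // change the static after saving
        Counter back = (Counter) fromBytes(saved);

        System.out.println(back.value);        // 5: instance state was restored
        System.out.println(Counter.instances); // 99: the static was not saved or restored,
                                               // and Counter's constructor did not run
    }
}
```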

When default serialization isn’t enough

- Java allows writeObject() and readObject() methods to customize output
- If a class provides these methods (declared private, with exactly the signatures used in the example below), the serialization/deserialization mechanism calls them instead of doing the default thing

writeObject() in action

public class DemoClass implements Serializable {
    private int _dat = 3;
    private static int _sdat = 2;   // static: saved only because we write it ourselves

    private void writeObject(ObjectOutputStream o) throws IOException {
        o.writeInt(_dat);
        o.writeInt(_sdat);
    }

    private void readObject(ObjectInputStream i)
            throws IOException, ClassNotFoundException {
        _dat = i.readInt();
        _sdat = i.readInt();   // note: this overwrites the static for every instance
    }
}

Things that you don’t want to save

- Sometimes, you want to explicitly not store some non-static data
  - Computed vals that are cached simply for convenience/speed
  - Passwords or other “secret” data that shouldn’t be written to disk
- Java provides the “transient” keyword: transient foo means “don’t save foo”

public class MyClass implements Serializable {
    private int _primaryVal = 3;                          // is serialized
    private transient int _cachedVal = _primaryVal * 2;
    // _cachedVal is not serialized; after deserialization it is 0,
    // because field initializers don't rerun, so recompute it if needed
}
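Because a transient field is skipped, it comes back as its default value (0, null, or false). One common pattern, sketched here with a hypothetical CachedDemo class, is to recompute the cache in readObject after calling defaultReadObject.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CachedDemo implements Serializable {
    private int _primaryVal;
    private transient int _cachedVal;   // skipped by default serialization

    CachedDemo(int primaryVal) {
        _primaryVal = primaryVal;
        _cachedVal = primaryVal * 2;
    }

    int primaryVal() { return _primaryVal; }
    int cachedVal() { return _cachedVal; }

    // Restore the non-transient fields, then rebuild the cache from them.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        _cachedVal = _primaryVal * 2;
    }
}

class TransientDemo {
    // Serialize obj into memory and read it straight back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CachedDemo back = roundTrip(new CachedDemo(3));
        System.out.println(back.primaryVal() + " " + back.cachedVal());  // prints "3 6"
    }
}
```

Without the readObject method, the same round trip would leave cachedVal() at 0.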

Gotcha #0: Non-Serializable fields

- What happens if class Foo has a field of type Bar, but Bar isn’t serializable?
- If you just do this:

Foo tmp = new Foo();
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("some.obj"));
out.writeObject(tmp);

- You get a NotSerializableException (bummer)
- Answer: mark the Bar field transient and use readObject/writeObject to explicitly serialize parts that can’t be handled otherwise
  - Need some way to get/set the necessary state
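Here's a sketch of both the failure and the fix, assuming Bar exposes its state through a getter and setter (the setting field, its accessors, NaiveFoo, and FieldDemo are all hypothetical). Marking the field transient keeps default serialization from touching it; writeObject/readObject then save and rebuild Bar's state by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for a class we can't change: NOT Serializable, but its
// state is reachable through a getter and setter.
class Bar {
    private String setting = "";
    String getSetting() { return setting; }
    void setSetting(String s) { setting = s; }
}

// Naive version: serializing this throws NotSerializableException.
class NaiveFoo implements Serializable {
    Bar bar = new Bar();
}

// Fixed version: bar is transient, and Foo saves Bar's state by hand.
class Foo implements Serializable {
    private String name;
    private transient Bar bar = new Bar();

    Foo(String name) { this.name = name; }

    String getName() { return name; }
    Bar getBar() { return bar; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();            // writes name (bar is transient)
        out.writeObject(bar.getSetting());   // then Bar's state
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();              // restores name
        bar = new Bar();                     // field initializers don't run on deserialization
        bar.setSetting((String) in.readObject());
    }
}

class FieldDemo {
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```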

Gotcha #0.5: Non-Serializable superclasses

- Suppose class Foo extends Bar implements Serializable
- But Bar itself isn’t serializable
- What happens?

Non-Serializable superclasses, cont’d

- Bar must provide a no-arg constructor (accessible from Foo)
- Foo must use readObject/writeObject to take care of Bar’s private data, since Java won’t save it
- Java helps a bit with defaultReadObject and defaultWriteObject, which handle Foo’s own non-transient fields
- Order of operations (for deserialization):
  - Java creates a new Foo object (without running Foo’s constructor)
  - Java calls Bar’s no-arg constructor
  - Java calls Foo’s readObject
    - Foo’s readObject explicitly reads Bar’s state data
    - Foo reads its own data
    - Foo reads its children’s data
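A minimal sketch of this pattern, assuming Bar exposes getBarState/setBarState accessors (hypothetical names, as are SuperDemo and roundTrip). It calls defaultWriteObject/defaultReadObject first, which is the conventional order, and handles Bar's state second; what matters is that readObject reads in the same order writeObject wrote.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Not Serializable; it must have a no-arg constructor that Foo can access.
class Bar {
    private int barState;

    Bar() { barState = -1; }           // runs during deserialization of Foo

    int getBarState() { return barState; }
    void setBarState(int v) { barState = v; }
}

class Foo extends Bar implements Serializable {
    private int fooState;

    Foo(int barState, int fooState) {
        setBarState(barState);
        this.fooState = fooState;
    }

    int getFooState() { return fooState; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();      // Foo's own fields (fooState)
        out.writeInt(getBarState());   // Bar's state, which Java skips otherwise
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();        // restores fooState
        setBarState(in.readInt());     // replaces the -1 set by Bar()
    }
}

class SuperDemo {
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

Without the writeObject/readObject pair, a deserialized Foo would report getBarState() == -1, the value set by Bar's no-arg constructor.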

Gotcha #1: Efficiency

- For your MondoHashTable, you can just serialize/deserialize it with the default methods
- But that’s not necessarily efficient, and may even be wrong
  - By default, Java will store the entire internal _table, including all of its null entries!
  - Now you’re wasting space/time to load/save all those empty cells
  - Plus, the hashCode()s of the keys may not be the same after deserialization (e.g., the default Object.hashCode() need not be consistent from one run of a program to the next), so you should explicitly rehash them to be safe.
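One way to address both problems is to mark the bucket array transient and write only the live key/value pairs, rebuilding (and thereby rehashing) the table on read. SimpleHashTable below is a hypothetical, deliberately small stand-in for MondoHashTable (fixed capacity, no resizing), not the project's actual class.

```java
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

class SimpleHashTable<K, V> implements Serializable {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // transient: default serialization skips the bucket array; writeObject saves entries instead
    private transient Entry<K, V>[] table;
    private transient int size;

    SimpleHashTable() {
        table = newTable(16);
    }

    @SuppressWarnings("unchecked")
    private Entry<K, V>[] newTable(int capacity) {
        return (Entry<K, V>[]) new Entry[capacity];
    }

    private int indexFor(Object key) {
        return (Objects.hashCode(key) & 0x7fffffff) % table.length;
    }

    void put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                e.value = value;
                return;
            }
        }
        table[i] = new Entry<>(key, value, table[i]);
        size++;
    }

    V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() { return size; }

    // Write a small header, then only the live pairs; empty buckets cost nothing.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(table.length);
        out.writeInt(size);
        for (Entry<K, V> bucket : table) {
            for (Entry<K, V> e = bucket; e != null; e = e.next) {
                out.writeObject(e.key);
                out.writeObject(e.value);
            }
        }
    }

    // Rebuild the buckets by re-inserting every pair, so each key lands in the
    // bucket chosen by its hashCode() in *this* JVM.
    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int capacity = in.readInt();
        int count = in.readInt();
        if (capacity <= 0 || count < 0) {
            throw new InvalidObjectException("corrupt table header");
        }
        table = newTable(capacity);
        size = 0;
        for (int i = 0; i < count; i++) {
            K key = (K) in.readObject();
            V value = (V) in.readObject();
            put(key, value);
        }
    }
}
```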

Gotcha #2: Backward compatibility

- Suppose that you have two versions of class Foo: Foo v. 1.0 and Foo v. 1.1
- The public and protected members of 1.0 and 1.1 are the same; the semantics of both are the same
  - So Foo 1.0 and 1.1 should behave the same and be interchangeable
- BUT... The private fields and implementation of 1.0 and 1.1 are different
- What happens if you serialize with a 1.0 object and deserialize with a 1.1? Or vice versa?

Backward compat, cont’d.

- Issue is that in code, only changes to the public or protected interfaces matter
- With serialization, all of a sudden, the private data members count too (as do private serialization hooks such as writeObject/readObject)
- Have to be very careful not to muck up internals in a way that’s inconsistent with previous versions
  - E.g., changing the meaning, but not the name, of some data field

Backward compat, cont’d

Example:

// version 1.0
public class MyClass implements Serializable {
    MyClass(int arg) { _dat = arg * 2; }

    private int _dat;
}

// version 1.1
public class MyClass implements Serializable {
    MyClass(int arg) { _dat = arg * 3; } // NO-NO! same field name, new meaning

    private int _dat;
}

Backward compat, cont’d:

- Java helps as much as it can
- Java tracks a “version number” of a class that changes when the class changes “substantially”
  - Fields changed to/from static or transient
  - Field or method names changed
  - Data types change
  - Class moves up or down in the class hierarchy
- Trying to deserialize an object whose stored version number doesn’t match that of the class currently loaded throws InvalidClassException

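Yet more on backward compat

- By default, Java’s version number is a hash of the class’s name, modifiers, and interfaces, plus the names, modifiers, and types of its fields (other than private static and private transient ones) and of its non-private constructors and methods
- If they don’t change, the version number won’t change
- If you want Java to detect that something about your class has changed, change a name
- But, if all you’ve done is change names (or refactor functionality), you want to be able to tell Java that nothing has changed
  - Caveat: default serialization matches fields by name, so a renamed field comes back with its default value unless readObject handles it
- Can lie to Java about the version number:

private static final long serialVersionUID = 3530053329164698194L;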
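To see which version number Java is actually using for a class, you can ask java.io.ObjectStreamClass; the JDK's serialver command-line tool reports the same value for a compiled class. Pinned, Unpinned, and VersionDemo are illustrative names.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Pins the version number: recompiling or renaming members won't change it.
class Pinned implements Serializable {
    private static final long serialVersionUID = 3530053329164698194L;
    private int _dat;
}

// No explicit serialVersionUID: Java computes one from the class's structure.
class Unpinned implements Serializable {
    private int _dat;
}

class VersionDemo {
    public static void main(String[] args) {
        // ObjectStreamClass.lookup returns the serialization descriptor for a class.
        System.out.println(ObjectStreamClass.lookup(Pinned.class).getSerialVersionUID());
        // prints 3530053329164698194
        System.out.println(ObjectStreamClass.lookup(Unpinned.class).getSerialVersionUID());
        // prints whatever hash Java computed for Unpinned's current structure
    }
}
```

Running serialver on the old version of a class and pasting its output into the new version as serialVersionUID is one way to keep already-saved files readable.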