be recoverable is not a straightforward issue. Making an object recoverable has some
drawbacks. First of all, a recoverable object must contain extra code, and this
additional code slightly decreases the execution speed. An object that runs for only
five minutes may therefore not be worth making recoverable. However, an object that
performs a very time-consuming calculation may need recovery: after many hours of
calculation, it may become necessary to shut down the system, and the object must be
recoverable to survive such shutdowns.

If an object is chosen as recoverable, it is written into the RT by the Object Manager
(or by its parent object if it is a subobject). A recoverable object should be signalled
about any upcoming shutdown. The object is signalled by changing its SS
(Shutdown Signal) field in the RT from 0 to 1.

Each recoverable object has a record in the RT with a Shutdown Signal (SS) field that
is initially 0. The objects should check their SS fields periodically. When its SS field
becomes 1, an object knows that the system will be shut down soon.

The shutdown process is hierarchical and decentralized. It is hierarchical because some
objects must wait for other objects to enter the recovery state. It is decentralized
because each recoverable object is responsible for its own recovery. The RA does not
deal with how the recoverable objects recover themselves. It just coordinates the
proper shutdown process.

Periodic checking of the SS field requires additional code inside the recoverable
objects. This additional code deals with the object's execution states. An object
changes its execution state when it receives, produces, or sends data. After
changing its execution state, the object should check its SS field in the RT. The
programmer decides where these execution states lie. States may be fine-grained
(atomic) or coarse, depending on the program behavior. However, one must understand
that after the system startup, the object may continue its execution only from the
last state it visited. Unfortunately, it loses all calculations made after that state.
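
The polling scheme described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the MaROS code: the class SsPollingSketch, the ssFields map standing in for the Recovery Table, and the use of Runnable for the work of each state are all hypothetical. Only the idea (check the SS field after every state change and stop at a state boundary) comes from the text.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Recovery Table: object ID -> SS field (0 or 1).
class SsPollingSketch {
    static final Map<Integer, Integer> ssFields = new ConcurrentHashMap<>();

    // Executes the states in order, starting at startState.
    // After every state change the SS field is checked. If it is 1, the
    // method stops and returns the index of the next state to execute,
    // which is the point to resume from after the next startup.
    static int run(int oid, int startState, Runnable[] states) {
        int state = startState;
        while (state < states.length) {
            states[state].run();   // work belonging to the current state
            state++;               // state change
            if (ssFields.getOrDefault(oid, 0) == 1) {
                return state;      // shutdown requested: stop at the boundary
            }
        }
        return state;              // all states completed
    }
}
```

A signal that arrives in the middle of a state is noticed only at the next state change, which is the behavior the text describes.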

This approach has some disadvantages. First of all, it increases the object
code size and the execution time. However, its advantage is obvious: an object
may continue its execution from the point where it was interrupted. Another big
advantage over a signal-based recovery approach is that recovery is more efficient and
nearly optimal. If a shutdown signal arrives between state changes, the object is not
aware of the shutdown event until it changes its state. This approach therefore
requires slightly more shutdown time, but provides more efficient recovery.

5.4 System Shutdown
When the RA receives a shutdown request, it immediately sets the SS field of the OM
to 1. On its next periodic check of the RT, the OM sees that its SS field is 1 and
immediately sets the SS fields of its subobjects to 1 in a hierarchical order. By doing
so, it signals the user objects and its Handlers, and then waits for all of their SS
fields to become 0 again. When they have all become 0, the OM knows that the user
objects have properly finished their execution.


Figure 5.2: Setting the SS field.

Then it signals the system objects in a hierarchical order. It signals the Notification
Agent and the Migration Agent first. (Since there are no user objects at this stage, the
Object Manager does not receive any request such as create, delete, or relocate, and
therefore does not create new Handlers.) There is a design choice after this
stage. When the NA and the MA are done with the shutdown process (their SS fields
become 0 again), the OM may signal its Handlers at that point, instead of signalling
them simultaneously with the user objects. Handlers are the worker threads of the
Object Manager; they just send notification requests to the Notification Agent and
then wait for the notification results from the Notifiers. This design
alternative may optimize the shutdown process. However, in the current design, the
Handlers are signalled before the NA and the MA. The CA is the final object to be
signalled, since it is the lowest-layer agent and is responsible for the entire
communication backbone of MaROS. Figure 5.4 illustrates the whole recovery
hierarchy. The numbers indicate the shutdown order; objects that share a number are
shut down in parallel.
Figure 5.3: The OM realizes the Shutdown Signal.



Figure 5.4: Flow of the Shutdown Signal

5.4.1 Creating Image Files
As indicated in Section 5.3, the RA uses a decentralized approach for the recovery.
Each recoverable object is responsible for backing up its valuable data before
shutdown. Basically, each object creates a file that stores all the information
necessary to resume the object at the next startup. This file is called an image file.
After creating the file, the recoverable object sets its SS field to 0 again to signal
to its parent that it has finished its recovery work.

Each recoverable object is responsible for the content of its image file. The file name
is the MaROS object identifier of the object. For instance, an object with an Object ID
of 4 creates an image file named 4. Ideally, all image files are kept in the same
directory.
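
The naming rule above can be sketched as a small helper. The class ImageFiles and its method names are hypothetical and are not part of MaROS; the sketch only assumes that the image directory is known and that the file name is the decimal object identifier.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the image file naming rule.
class ImageFiles {
    // The image file of an object is named after its object identifier.
    static Path imageFileFor(Path imageDir, int oid) {
        return imageDir.resolve(Integer.toString(oid));
    }

    // True when an image file exists, i.e. the object should enter its
    // recovery state at startup instead of starting normally.
    static boolean hasImage(Path imageDir, int oid) {
        return Files.isRegularFile(imageFileFor(imageDir, oid));
    }
}
```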
5.5 System Startup
When the system starts up, the first object to run is the Object Manager
(the creator of all other MaROS objects). It checks whether its image file exists in
the image file directory. If it does not, the normal startup process is initiated.
However, if the image file exists, the OM immediately enters its recovery state. Each
recoverable object has a recovery state that restores its tables and data. In this
state, the object reads its image file and restores its environment.

5.5.1 Object Manager Startup Process
Figure 5.5 illustrates the startup process of the Object Manager. The OM restores its
Object Table and creates the Recovery Agent. Then, it waits for the RA to finish its
recovery work. Once created, the RA restores its RT and sets all SS fields
to 1. Then, it signals the OM. The OM continues by creating the other system and
user objects. Thereafter, it waits for the SS fields of all objects to become 0. Each
recoverable object checks whether its image file is present. If it is present,
the recoverable object starts by running its recovery state. After the recovery state
finishes, the object sets its SS field to 0. When all SS fields that the OM is checking
have become 0, the OM sets the SS fields to 1 again, signalling to all of its objects
that the recovery state has completed successfully, and then starts its normal
execution. When a recoverable object sees that its SS field is 1 again, it understands
that everything is OK. It sets its SS field to 0 again and continues its execution.

Figure 5.5: Startup of the Object Manager

5.5.2 Recovery Agent Startup Process
The very first job of the Recovery Agent after its creation is to check its image file. If
its image file does not exist, it continues its normal execution without entering any
recovery state. Otherwise, it enters a recovery state in which it restores its Recovery
Table and its state. Figure 5.6 depicts the startup process of the Recovery Agent. After
restoring the Recovery Table, the RA flushes SS fields of all objects with 1s. This
process is one of the key points of the recovery protocol. The Object Manager detects
recovered objects by checking their SS fields.

Figure 5.6: Startup of the Recovery Agent.


5.5.3 Startup of the System Agents
The startup processes of the other system agents are very similar to each other. The
same approach may be used by all of the recoverable objects. In the current design,
each recoverable object checks whether there is a file whose name equals its object
identifier. If there is, the object first restores its tables and variables by reading
that image file. Then, it continues from the state where it left off. The original code
is replicated for each state. An alternative approach would be to use labels for
jumping to the exact desired state. However, Java does not allow this solution, since
it does not support unconditional jump operations. Figure 5.7 depicts the code
replication process.

Figure 5.7: Code replication solution for the recovery process.
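
As a point of comparison with code replication, the sketch below uses a different technique that the thesis does not use: a switch statement whose cases fall through, so that execution enters at the saved state and runs every later state. The class name ResumeBySwitch and the three step names are hypothetical. The technique only fits states that lie in straight-line code; states inside loops or nested calls would still need restructuring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alternative to code replication: resume via switch fall-through.
class ResumeBySwitch {
    // Executes all steps from savedState onward and returns their names.
    // Every case deliberately falls through to the next one.
    static List<String> resume(int savedState) {
        List<String> executed = new ArrayList<>();
        switch (savedState) {
            case 0:
                executed.add("receive request");    // fall through
            case 1:
                executed.add("reserve port");       // fall through
            case 2:
                executed.add("send notification");  // fall through
            default:
                break;
        }
        return executed;
    }
}
```

With this layout, resume(0) is the normal run and resume(1) skips only the first step, without a second copy of the remaining code.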
5.5.4 Mutation of SS fields and RA Garbage Collector
The execution of a recoverable object may finish at any time. Immediately removing
such an object from the RT is not a viable solution, because its parent
object may be blocked waiting on the suddenly dead child object. The parent objects
should be informed in some way. The mutation process of the shutdown signal field sets
the SS field of the dead object to the special value 2. This value of the SS field
indicates that the owner of that SS field is dead.

The Recovery Agent runs a garbage collector in order to remove those dead objects
from the Recovery Table. The garbage collector periodically checks the SS fields of all
recoverable objects for mutation. If it finds a mutated field, it immediately removes
that object from the record of its parent object and from the Recovery Table.
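
One sweep of such a garbage collector might look like the sketch below. The data layout (one map for SS fields and one for subobject lists) and the names RtGarbageCollector and sweep() are assumptions made for illustration; the actual Recovery Table layout is described in Chapter 6.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one garbage collection pass over the Recovery Table.
class RtGarbageCollector {
    static final int DEAD = 2;  // mutated SS value marking a dead object

    // Assumed layout: OID -> SS field, and parent OID -> subobject OIDs.
    final Map<Integer, Integer> ss = new HashMap<>();
    final Map<Integer, List<Integer>> children = new HashMap<>();

    // Removes every record whose SS field is 2, together with its entry in
    // any parent's subobject list. Returns the removed OIDs.
    synchronized List<Integer> sweep() {
        List<Integer> dead = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : ss.entrySet()) {
            if (e.getValue() == DEAD) {
                dead.add(e.getKey());
            }
        }
        for (Integer oid : dead) {
            ss.remove(oid);
            children.remove(oid);
            for (List<Integer> kids : children.values()) {
                kids.remove(oid);  // remove(Object): drops the OID, not an index
            }
        }
        return dead;
    }
}
```

In a running system this pass would be invoked periodically, for example from a dedicated thread of the Recovery Agent.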



6

Pilot System Implementation


6.1 Introduction
This chapter provides detailed information on the pilot implementation of the MaROS
environment. First, the language chosen to implement MaROS is discussed briefly.
Secondly, the pilot system implementation environment is
described. Finally, the implementation details of the Notification and Recovery
modules of MaROS are given.

6.2 Pilot System Implementation Language
The implementation language of the system has been chosen as Java. It is an object-
oriented programming language very similar to C++. It has many advantages, and
unfortunately some disadvantages. The choice of Java as a programming language in
MaROS was one of the milestones in the design phase. Java is a very powerful
programming language because:
• Java is platform independent. Compiled object code may be run on any
hardware and OS without any modification or even recompilation. This was one of
the main reasons for choosing Java as the implementation language.
• Java does not support pointers. This property provides system security, since
users cannot corrupt crucial memory locations through pointer operations.
• Java does not support the disadvantageous properties of other object-oriented
languages. For example, many OOP languages support multiple inheritance,
which can sometimes lead to confusion or unnecessary complications. Java does
not.
• Java provides many pre-implemented utility classes such as the Hashtable, Stack and
Vector classes. This spares programmers from implementing their own versions of
these classes, which keeps the code simple and also enhances its
readability. Of course, programmers may extend these classes by writing
their own methods.
• Java has a simple Thread package. MaROS is a multithreaded system, and Java is
one of the ideal programming languages for implementing such a system.

Java also has some disadvantages. First of all, it is slower than its counterparts,
since it interprets the compiled byte code at runtime. For the sake of system security,
it also does not give full control to the programmer. For instance, without pointers
and process management tools, system programming in Java can be very
frustrating work.

Briefly, the choice of Java is a trade-off between system performance and system
portability & security.

6.3 Pilot System Implementation Environment
The MaROS environment consists of one SUN UltraSparc1 and eight Intel PCs. The
SUN system runs the Solaris 2.5.1 operating system. Four of the PCs run Windows 95,
and the other four run the Turkuaz 1.0.3 GNU/Linux operating system. The UltraSparc1
is used as the MSP and the Windows 95 machines are used as MHs. The Linux machines
were used as local MSPs by each programmer for testing MaROS modules. When
new versions of the modules were ready, those modules were transferred to the
MSP.

6.4 Pilot System Implementation
There are five main modules that form MaROS when combined together. These
modules are called packages in Java. The five main packages are listed below:
• OM: The MaROS.OM package is the Object Manager of the system. It handles
objects and their operations.
• net: The MaROS.net package is responsible for the communication
infrastructure of MaROS.
• Notify: The MaROS.Notify package handles the object notification process.
• Migration: The relocation of the objects is managed by the MaROS.Migration
package.
• Recovery: The system recovery process after voluntary shutdowns is handled by
the MaROS.Recovery package.

There are also additional utility packages in MaROS. This thesis only covers the
Notify and the Recovery packages.
6.4.1 Notify Package
The MaROS.Notify package contains all the applications that are necessary for the
object notification process. The NotificationAgent is the main class of that package.
In the following subsections, the classes of the Notify package are overviewed.

6.4.1.1 Notify.NotificationAgent Class
This class is the heart of the notification process. It is implemented as a MaROS
thread, and it is the part that listens for notification requests coming from the
Object Manager. This class also contains the Notifier class and three additional table
classes: the Notifier Information Table (NIT), the Notifier Object Transfer Table
(NOTT), and the Partial Object Transfer Table (POTT).

There are two versions (MH and MSP) of this class, since there are two types of
machines in MaROS. For the system recovery, the MH version has additional
methods such as check_ShutdownSignal() and saveImage(). Since the job of the
Notification Agent is to listen for notification requests and to assign Notifiers to
those requests, it contains an infinite loop and Notifier creation code.

6.4.1.2 Notify.NIT Class
The NotificationAgent class maintains the Notifier Information Table (NIT) to keep
track of its Notifiers. The NIT is a global table that is used by all the Notifiers.
Therefore, all of the NIT methods are synchronized so that only one thread can access
the table at a time. The NIT has been implemented as a table structure mapped
onto arrays.

There are two versions of the table: the MH version and the MSP version. In the MSP
version, there is an additional field for keeping the MaROS identifier of the mobile
host that is the source of the notification request. Tables 6.1 and 6.2 display the
formats of the two versions of the NIT.

Notifier Object Identifier | Object Identifier | Notification Type
int                        | int               | char

Table 6.1: The format of the MH version of the NIT.

Notifier Object Identifier | Object Identifier | Notification Type | Mobile Host Identifier
int                        | int               | char              | 12 chars

Table 6.2: The format of the MSP version of the NIT.

The table size is 255 by default. That number determines the maximum number of
Notifiers that may run simultaneously. It may seem very large for an MH, and very
small for the MSP. Currently, the array structure seems adequate; however,
other data structures may be considered for better scalability in the future.
The methods of the NIT are explained below:
• ClearTable(): It sets the length of the table to 0. All elements in the table are left
untouched. However, they may be overwritten with the new table entries.
• insert (int NOID, int OID, char Type): This method inserts a new item into the
NIT. It accepts three parameters: The Object Identifier of the Notifier (NOID),
Object Identifier of the object which is to be notified (OID), and finally the type of
the notification (Create or Delete).
• delete (int NOID): This method deletes an existing entry from the table.
• get_NOID (int OID, char Type): It returns the Notifier Object Identifier (NOID)
of a given notification request.
• get_tmax (): This method returns the current size of the table.
• set_tmax (int mx): With a given parameter, this method sets the table size. It is
used by the recovery process.
• get_Notifier_OID (int idx), get_ObjectID (int idx), get_Type (int idx): Those
three methods are used by the recovery process to restore the table entries.
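
The table structure and the methods listed above can be sketched as follows. This is an illustrative reconstruction of the MH version, not the MaROS source: the method names and parameter lists come from the text, while the class name NitSketch, the return values (false for a full table, -1 for a missing entry), the 0-based indexing, and the way delete() compacts the table are assumptions.

```java
// Hypothetical sketch of an array-backed, synchronized NIT (MH version).
class NitSketch {
    private static final int CAPACITY = 255;  // default size stated in the text
    private final int[] noid = new int[CAPACITY];
    private final int[] oid = new int[CAPACITY];
    private final char[] type = new char[CAPACITY];
    private int tmax = 0;                     // number of entries in use

    // Sets the length to 0; old entries stay in the arrays until overwritten.
    synchronized void ClearTable() { tmax = 0; }

    // Assumption: returns false when the table is full.
    synchronized boolean insert(int NOID, int OID, char Type) {
        if (tmax == CAPACITY) return false;
        noid[tmax] = NOID;
        oid[tmax] = OID;
        type[tmax] = Type;
        tmax++;
        return true;
    }

    // Assumption: the last entry is moved into the freed slot.
    synchronized void delete(int NOID) {
        for (int i = 0; i < tmax; i++) {
            if (noid[i] == NOID) {
                tmax--;
                noid[i] = noid[tmax];
                oid[i] = oid[tmax];
                type[i] = type[tmax];
                return;
            }
        }
    }

    // Assumption: returns -1 when no entry matches the request.
    synchronized int get_NOID(int OID, char Type) {
        for (int i = 0; i < tmax; i++) {
            if (oid[i] == OID && type[i] == Type) return noid[i];
        }
        return -1;
    }

    synchronized int get_tmax() { return tmax; }
}
```

Because every method is synchronized on the table instance, concurrent Notifiers see a consistent table, at the cost of serializing all accesses.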

6.4.1.3 Notify.NotifierClass Class
The NotifierClass is the class that is responsible for the object notification process.
It accesses the NIT to register or remove the current Notifier instance. It handles
the notification request forwarded by the Notification Agent. For the object transfer,
two additional tables are used. These tables are called the NOTT and the POTT.

Like the NotificationAgent class, the NotifierClass has two versions: the MH
and the MSP versions. The MH version contains additional methods and additional code
for the recovery process. This means that the MH versions of the
Notification Agent and the Notifiers run slower than their MSP counterparts for the
sake of reliability.

6.4.1.4 Notify.NOTT and Notify.POTT Classes
The Notifier Object Transfer Table (NOTT) and the Partial Object Transfer Table
(POTT) are used by the NotifierClass, and they are local tables for each of the
Notifiers. Each Notifier that is responsible for an object transfer process manages
its own NOTT. The Partial Object Transfer Table (POTT), on the other hand, is created
and used only when the object transfer operation is a partial transfer.

File Names | File Lengths
String     | long

Table 6.3: The format of the NOTT.

File Indices | Current File Lengths
int          | long

Table 6.4: The format of the POTT.

The structures of the NOTT and the POTT are very similar to that of the NIT. The
formats of the two tables are shown in Table 6.3 and Table 6.4, respectively. The
maximum table size is 255 for both tables by default. The methods of the NOTT are
explained below:
• ClearTable(): It sets the length of the table to 0. All elements in the table are left
untouched. However, they may be overwritten with the new table entries.
• insert (String Filename, long Filelength): This method inserts a new item into
the NOTT. It accepts two parameters: the name and the length of the file. This
information is used for the object transfer operation.
• get_tmax (): It returns the current size of the table.
• get_filename (int idx): It returns the filename field of a given index in the NOTT.
• get_filelength (int idx): This method returns the filelength field of a given index
in the NOTT.

The POTT has almost the same methods. There are some differences in the methods
listed below:
• insert (int Fileindex, long currentFilelength): This method inserts a new item
into the POTT. It accepts two parameters. The first parameter is the index of the
file in the NOTT. The second parameter contains the partial file length of the file.
The partial transfer starts from that point.
• get_fileidx (int idx): It returns the fileindex field of a given index in the POTT.
• get_currfilelength (int idx): This method returns the filecurrlength field of a
given index in the POTT.

6.4.2 Notify.CDT Package
The Class Dependency Table (CDT) package is available in the MSP version of the
Notify package. A mobile host user may want to transfer many objects that use shared
classes. If a copy of a class file is already available at the MSP site, there is no
point in transferring it again. The CDT package prevents retransmission of the same
classes by maintaining a table called the Class Dependency Table (CDT). The CDT class
uses another class called the Class Replica Table (CRT). Chapter 4 contains
detailed information about the logical structures of these tables. This section gives a
detailed explanation of the physical structure of the tables.

6.4.2.1 Notify.CDT.CDT Class
The Class Dependency Table (CDT) has been implemented as an array structure. The
table size is 512 by default. Since the CDT and the CRT are global tables that may be
updated by many Notifiers at a time, they are synchronized. The format of the CDT is
shown in Table 6.5.

Object Identifier | Object Name | # of Dependent Classes | Dependent Classes
int               | String      | int                    | int Vector

Table 6.5: The format of the CDT.

Each MaROS object consists of at least one class file. The CRT contains the names
and the number of occurrences of transferred files at the MSP. The CDT contains the
number of these files and their references to the CRT. The Dependent Classes field is
a Vector structure that contains the indices of these files in the CRT. All of the vector
components are in integer format.

There are a number of methods for the table management:
• insert (int OID, String ObjectName, int DepClassNo): This method inserts a
new element into the table. The dependent classes are inserted by the insertClass()
method.
• insertClass (int index, String ClassName): It inserts the dependent class file
information into the dependent classes vector field. This information contains the
CRT index and the name of the class file.
• delete (int OID): This method removes an entry from the CDT. It also updates
the CRT entries.
• getClassName (int index, int classIndex): It is used for obtaining the file names
from the CDT. This method is mainly used by the Notifiers for the full object
transfer operation.
• printCDT(): It is used for debugging purposes.

6.4.2.2 Notify.CDT.CRT Class
The Class Replica Table (CRT) is used by the CDT. The CRT has been implemented
as an array structure. Its default size is 16384 (4000 in hexadecimal format). In the
future, it is planned to reimplement it as a hashtable. The format of the
table is shown in Table 6.6. This table simply keeps the number of occurrences of
each transferred class file at the MSP.

Class Names | How Many Copies
String      | int

Table 6.6: The format of the CRT.

There are several methods for maintaining the table:
• insert (String ClassName): It inserts a new class name into the table. If that class
name already exists, the How Many Copies field is incremented by 1.
• delete (int index): This method decrements the How Many Copies field by 1. If
its value becomes 0, that record is deleted, and its corresponding class file is
removed from the system.
• getClassName (int index): It returns the name of the class file at a given index.
• printCRT(): It is used for debugging purposes.
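
The reference counting behind insert() and delete() can be sketched as below. The method names insert, delete and getClassName come from the list above; the class name CrtSketch, the list-based storage, the helper copiesOf(), the return values, and the choice to clear a slot instead of shifting entries (so that CRT indices stored in the CDT stay valid) are assumptions. Removing the class file from disk is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the CRT as a reference-counted list of class names.
class CrtSketch {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> copies = new ArrayList<>();

    // Adds a class name, or increments its count if it already exists.
    // Returns the index of the entry.
    synchronized int insert(String ClassName) {
        int i = names.indexOf(ClassName);
        if (i >= 0) {
            copies.set(i, copies.get(i) + 1);
            return i;
        }
        names.add(ClassName);
        copies.add(1);
        return names.size() - 1;
    }

    // Decrements the count. When it reaches 0 the record is cleared and
    // true is returned, meaning the class file can be removed.
    synchronized boolean delete(int index) {
        int left = copies.get(index) - 1;
        copies.set(index, left);
        if (left > 0) return false;
        names.set(index, null);  // keep other indices stable
        return true;
    }

    synchronized String getClassName(int index) { return names.get(index); }

    // Hypothetical accessor for the How Many Copies field.
    synchronized int copiesOf(int index) { return copies.get(index); }
}
```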
6.4.3 Recovery Package
The Recovery package has been designed and implemented to increase the reliability
of the mobile hosts. Therefore, this package has only an MH version. There are two
classes in the Recovery package: the RecoveryAgent and the RecoveryTable classes.
Since there is no signal handling in Java, an alternative approach has been designed.
This approach uses a mutually exclusive shared global table: the Recovery Table.
Chapter 5 contains all of the details of the design of this table. In this section, the
implementation of the table is explained.

6.4.3.1 RecoveryTable Class
The Recovery Table is used by the Recovery Agent and all recoverable MaROS
objects. It is a signal handling backbone for the MaROS. A recoverable object may be
signalled by setting its Shutdown Signal (SS) bit to 1 in the Recovery Table.
Moreover, a recoverable object may wait for another recoverable object, and then
continue its work by using special Recovery Table methods. The Recovery Table is a
global table, and it must be set as mutually exclusive. All the methods of the
Recovery Table are synchronized. The format of the table is shown in Table 6.7. The
table is a hashtable. The keys for the hashtable are OIDs of the objects. All hashtable
elements are vectors. Each element in the vector is in Object format.

Shutdown Signal | Object Identifier (Hash Key) | OID of SubObject #1 | OID of SubObject #2
Object          | Object                       | Object              | Object

Table 6.7: The format of the Recovery Table

The RecoveryTable class methods are explained below:
• Insert (Object key): This method inserts a new recoverable object information
into the RT. In all of the RT methods key is the object identifier, and Pkey is the
parent object's object identifier.
• Insert_SubObjectID (Object Pkey, Object key): It is used by the Recovery
Agent for recovering the table at the startup.
• Insert_SubObject (Object Pkey, Object key): This method inserts a new
element into RT, and updates its parent record adding the key of the subobject.
• Shutdown_Signal (Object key): It returns either 0 or 1. If this method returns a
1, that means the object with the given key should start its recovery procedure.
• Signal_Object (Object key): An object may signal another object by using this
method. It simply sets the Shutdown Signal (SS) field of the given object to 1.
• Wait_Object (Object key): In the recovery hierarchy, an object may have to wait
for another object (e.g. one of its subobjects) before going on with its own recovery
procedure. This method blocks the calling object until the object with the given key
finishes its recovery.
• Signal_All_SubObjects (Object Pkey): This method is the enhanced version of
the Signal_Object() method. The parameter Pkey is the object identifier of the
object with one or many subobjects. It simply calls Signal_Object() method for all
of the subobjects.
• Wait_All_SubObjects (Object Pkey): This method is the enhanced version of
the Wait_Object() method. The object whose object identifier equals Pkey
waits until all of its subobjects finish their recovery.
• clearShutdownSignal (Object key): It clears the Shutdown Signal (SS) field of
the given object in the Recovery Table.
• mutateShutdownSignal (Object key): This method sets the Shutdown Signal
field of the given object to 2. It is used by terminating objects as their last call.
The methods Wait_Object() and Wait_All_SubObjects() check the SS field of the
object(s) for this value. Such objects are removed from the RT by the garbage collector.
• deleteObject (Object Pkey, Object key): This method removes the object whose
object identifier equals key. It also removes its entry from the record of its
parent object.
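
A subset of these methods can be sketched on top of a Hashtable of Vectors, following the layout of Table 6.7. The method names are taken from the list above; everything else is an assumption made for illustration, in particular the class name RecoveryTableSketch, the private helper setSS(), the use of wait() and notifyAll() for blocking, and the decision to treat a mutated field (2) as finished. In this sketch Shutdown_Signal() simply returns the stored value, so it can also return 2.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical sketch of the Recovery Table: OID -> record, where element 0
// of the record is the SS field and the remaining elements are subobject OIDs.
class RecoveryTableSketch {
    private final Hashtable<Object, Vector<Object>> table = new Hashtable<>();

    synchronized void Insert(Object key) {
        Vector<Object> record = new Vector<>();
        record.add(Integer.valueOf(0));        // SS field starts at 0
        table.put(key, record);
    }

    synchronized void Insert_SubObject(Object Pkey, Object key) {
        Insert(key);
        table.get(Pkey).add(key);              // parent remembers its subobject
    }

    synchronized int Shutdown_Signal(Object key) {
        return (Integer) table.get(key).get(0);
    }

    // Callers hold the monitor, so notifyAll() is legal here.
    private void setSS(Object key, int value) {
        table.get(key).set(0, Integer.valueOf(value));
        notifyAll();                           // wake up waiting parents
    }

    synchronized void Signal_Object(Object key) { setSS(key, 1); }
    synchronized void clearShutdownSignal(Object key) { setSS(key, 0); }
    synchronized void mutateShutdownSignal(Object key) { setSS(key, 2); }

    synchronized void Signal_All_SubObjects(Object Pkey) {
        Vector<Object> record = table.get(Pkey);
        for (int i = 1; i < record.size(); i++) {
            Signal_Object(record.get(i));
        }
    }

    // Blocks while any subobject still has SS == 1.
    synchronized void Wait_All_SubObjects(Object Pkey) {
        Vector<Object> record = table.get(Pkey);
        try {
            for (int i = 1; i < record.size(); i++) {
                while (Shutdown_Signal(record.get(i)) == 1) {
                    wait();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt
        }
    }
}
```

A parent would typically call Signal_All_SubObjects() followed by Wait_All_SubObjects(), which is the sequence used at the start of saveImage() in Figure 6.3.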

6.4.3.2 RecoveryAgent Class
The RecoveryAgent class is responsible for the coordination of the recovery
process. It is one of the main MaROS system agents. It manages the Recovery Table
(RT) and enables the system and the user objects to use the available Recovery Table
methods.

Another task of the Recovery Agent is the recovery of the Recovery Table. When the
system recovery is in progress in the system startup, the Recovery Agent restores the
Recovery Table.

The RecoveryAgent class maps the RecoveryTable class methods. Those methods
have already been explained in the previous section. The Recovery Agent has also
additional methods for the recovery process of the Recovery Table. These methods
are explained below:
• Signal (): This method is the controller of the shutdown process. It is triggered by
a shutdown request. Then, it initiates the recovery process.
• saveImage(): It is used for saving the image of the Recovery Table.

6.4.3.3 Recoverable Object Implementation
The current design and implementation of the recovery process in MaROS does not
provide a user transparent interface. In order to make an object recoverable, the
programmer should complete the following steps:
• Shutdown Specific Steps:
• The determination of the code states: MaROS code may be divided into
several pieces. These pieces are called code states. A code state change may
occur when a program sends, receives or updates data. Figure 6.1 displays a
piece of code before and after the addition of the code states.
BEFORE:

    String tmpstr = strarr.substring (5);
    strarr=tmpstr;
    lngth = lngth - 5;
    strarr=strarr.substring(0,lngth)+'\0';

    // Get Virtual Port Number from VPM
    try {
        dummy = new TCpClient ();
    }catch(MaROS.net.VPortException vpe) {
        // exception handling
    }
    PortNumber = dummy.reservePort();
    PortKey = dummy.getKey();

AFTER:

    String tmpstr = strarr.substring (5);
    strarr=tmpstr;
    lngth = lngth - 5;
    strarr=strarr.substring(0,lngth)+'\0';

    // ##################################
    // MESSAGE RECEIVED
    // E N T E R I N G S T A T E 1
    STATE = 1;

    // NECESSARY ROLLBACK DATA:
    // - NIT
    // - lngth
    // - strarr (<- OMMH)
    // - HandlerPort

    // Check Shutdown Signal
    if (check_ShutdownSignal() == 1)
        return;
    // ##################################

    // Get Virtual Port Number from VPM

    try {
        dummy = new TCpClient ();
    }catch(MaROS.net.VPortException vpe){
        // exception handling
    }
    PortNumber = dummy.reservePort();
    PortKey = dummy.getKey();

Figure 6.1: The example piece of code showing the addition of code states.

• The addition of the signal checker and the image file creator: Each
recoverable object has a record in the Recovery Table. Since there is no signal
handling backbone in Java, the objects should periodically check their
Shutdown Signal field in the Recovery Table. This check may be done
at the code state transitions. At each transition, a method may be called
to check the Shutdown Signal field of the recoverable object. By convention, this
method is named check_ShutdownSignal(). Figure 6.1 and Figure 6.2 show the use
and the implementation of this method, respectively.
// This method checks the shutdown signal for this object.
// It returns 0 if everything is as usual.
// Otherwise, it saves the image, clears the signal, and returns 1,
// indicating that a shutdown signal has arrived.

public static int check_ShutdownSignal()
{
    if (RecoveryAgent.Shutdown_Signal(new
            Integer(MaROSobject.currentObject().getOID() ) ) == 1)
    {
        saveImage();
        RecoveryAgent.clearShutdownSignal(new Integer
                (MaROSobject.currentObject().getOID()) );
        return (1);
    }
    return (0);
} // check_ShutdownSignal()


Figure 6.2: The implementation of check_ShutdownSignal() method.

The next job is the creation of the image file creator. By convention, the method
saveImage() is used for this process. This method creates a random access file in
the image directory, named after the object identifier of the recoverable
object. Then, it saves the necessary tables and variables one by one. The format of
the image file is left to the programmer. However, it is also conventional to use the
"^" character as a separator between the fields.



// This method takes image of the current object instance to disk
// including tables, etc.

public static void saveImage()
{
    // Signal all subobjects
    RecoveryAgent.Signal_All_SubObjects (new
            Integer(MaROSobject.currentObject().getOID() ));

    // All subobjects signalled
    // Now wait them to finish
    RecoveryAgent.Wait_All_SubObjects (new Integer
            (MaROSobject.currentObject().getOID() ));

    // All subobjects finished their recovery job
    // Save Image File
    RandomAccessFile imagefile;
    String imagefilename = null;

    // Create file
    imagefilename =
            SysConst.TempRecoveryPATH+MaROSobject.currentObject().getOID();
    try
    {
        imagefile = new RandomAccessFile (imagefilename,"rw");
    }catch (IOException ioe){
        // error handling
        return;
    }

    // STATE
    try {
        imagefile.writeByte (STATE);
        imagefile.writeBytes ("^");
    } catch (IOException ioe)
    {
        // Error handling: Unable to write image file
    }

    // Write data into file
    if (STATE >= 0)
    {
        // NIT
        try {
            int i;
            int tmax = NIT_instance.get_tmax();

            imagefile.writeInt(tmax);
            for (i=1; i<=tmax; i++)
            {
                imagefile.writeInt (NIT_instance.get_Notifier_OID (i));
                imagefile.writeBytes ("^");
                imagefile.writeInt (NIT_instance.get_ObjectID (i));
                imagefile.writeBytes ("^");
                imagefile.writeChar(NIT_instance.get_Type (i));
                imagefile.writeBytes ("^");
            }
        } catch (IOException ioe) {
            // Error handling: Unable to write NIT to image file
        }
        // NIT written
    }

    if (STATE >= 1)
    {
        <CODE CONTINUES>

Figure 6.3: An example code part of the saveImage() method.

• Startup Specific Steps:
• The addition of the image file reader: The code of each recoverable object starts
by checking for its image file. If the object has an image file, the file should be
read and the last state has to be restored. An example image file reader is shown in
Figure 6.4.

RandomAccessFile imagefile;
String imagefilename = null;
byte recovery_state = (byte) 255;

// Image file name
imagefilename = SysConst.TempRecoveryPATH+MaROSobject.currentObject().getOID();
try
{
    int mx=0; // Maximum size of the NIT
    int i;
    int _NOID, _OID;
    char _Type;

    imagefile = new RandomAccessFile (imagefilename,"r");

    // There is an image file
    // Get STATE First
    try {
        recovery_state = imagefile.readByte();
        imagefile.readByte();   // skip the "^" separator

    } catch (IOException ioe){
        // Error Handling: Unable to read image file
    }

    if (recovery_state >= 0)
    {

        // recover NIT
        mx = (imagefile.readInt ());
        NIT_instance.set_tmax (mx);

        for (i=1;i<=mx;i++)
        {
            _NOID = imagefile.readInt ();
            imagefile.readByte();
            _OID = imagefile.readInt ();
            imagefile.readByte();
            _Type = imagefile.readChar();
            imagefile.readByte();
            NotificationAgent.NIT_instance.insert (_NOID, _OID, _Type);
        }
    }

    if (recovery_state >= 1)
    {
        <CODE CONTINUES>
Figure 6.4: An example image file reader code piece.
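
The writer in Figure 6.3 and the reader in Figure 6.4 have to agree on the record
layout: a state byte, a table length, and then the fields in a fixed order, each
followed by a one-byte '^' separator. The following is a minimal, self-contained sketch
of that symmetry. The class names ImageFileSketch, Image and Entry are hypothetical
stand-ins (Entry replaces a NIT row); this is an illustration under those assumptions,
not the MaROS implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical stand-in for one NIT row; the field names are assumptions.
final class Entry {
    final int notifierOid;
    final int objectId;
    final char type;

    Entry(int notifierOid, int objectId, char type) {
        this.notifierOid = notifierOid;
        this.objectId = objectId;
        this.type = type;
    }
}

// Hypothetical holder for what the reader recovers from an image file.
final class Image {
    final byte state;
    final Entry[] entries;

    Image(byte state, Entry[] entries) {
        this.state = state;
        this.entries = entries;
    }
}

final class ImageFileSketch {
    // Writes the state byte, the entry count, and each entry. Every field
    // except the count is followed by a one-byte '^' separator.
    static void write(File file, byte state, Entry[] entries) throws IOException {
        RandomAccessFile out = new RandomAccessFile(file, "rw");
        try {
            out.setLength(0);            // discard any older image
            out.writeByte(state);
            out.writeBytes("^");
            out.writeInt(entries.length);
            for (int i = 0; i < entries.length; i++) {
                out.writeInt(entries[i].notifierOid);
                out.writeBytes("^");
                out.writeInt(entries[i].objectId);
                out.writeBytes("^");
                out.writeChar(entries[i].type);
                out.writeBytes("^");
            }
        } finally {
            out.close();
        }
    }

    // Reads the fields back in exactly the order they were written,
    // skipping one separator byte after each separated field.
    static Image read(File file) throws IOException {
        RandomAccessFile in = new RandomAccessFile(file, "r");
        try {
            byte state = in.readByte();
            in.readByte();
            int count = in.readInt();
            Entry[] entries = new Entry[count];
            for (int i = 0; i < count; i++) {
                int notifierOid = in.readInt();
                in.readByte();
                int objectId = in.readInt();
                in.readByte();
                char type = in.readChar();
                in.readByte();
                entries[i] = new Entry(notifierOid, objectId, type);
            }
            return new Image(state, entries);
        } finally {
            in.close();
        }
    }
}
```

Because the separators are skipped positionally rather than parsed, any change to the
field order or field width in the writer has to be mirrored in the reader.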


• The replication of the original code for each state: The original code should
be replicated for each state in the final step of the recovery. The replicated
code at each state enables the continuation of the execution of MaROS
objects. Another approach would be the use of labels and unconditional jumps;
however, Java has no goto statement, and its labels can only be used with break
and continue. Figure 5.7 shows the code replication process in detail.
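
As a hedged illustration of continuing from a recovered state, the sketch below
restarts a three-step computation (receive, produce, send) from a saved state number.
It uses a switch with fall-through instead of replicated code blocks; this is a
different technique from the replication described above, shown only because it gives
the same control flow in a few lines. The class name, the step names and the log are
hypothetical, and state -1 is assumed to mean that no image file was found.

```java
import java.util.ArrayList;
import java.util.List;

final class ResumeSketch {
    // Runs only the steps that remain after the given recovered state and
    // returns the names of the steps that were executed.
    static List<String> resumeFrom(int recoveredState) {
        List<String> log = new ArrayList<String>();
        switch (recoveredState) {
            case -1:
                log.add("receive data");   // completing this step reaches state 0
                // fall through
            case 0:
                log.add("produce data");   // completing this step reaches state 1
                // fall through
            case 1:
                log.add("send data");      // completing this step reaches state 2
                // fall through
            case 2:
                break;                     // all steps were already done
            default:
                throw new IllegalArgumentException("unknown state " + recoveredState);
        }
        return log;
    }
}
```

For example, resumeFrom(0) skips the receive step and executes only the produce and
send steps, which is the behavior the replicated code for state 0 would have.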





7

Evaluation and Future Work



7.1 Introduction
This chapter presents the results and the evaluation of the performance tests for the
notification module. Moreover, the future research areas for the system are discussed
at the end of the chapter.

7.2 Performance Evaluation
The test platform consists of two computers, one for the MaROS client and one for the
MaROS server. A Pentium 166MMX machine running the Turkuaz GNU/Linux 0.99 operating
system serves as the MSP, and a Pentium 200 machine running Windows '95 serves as the
MaROS client. In the tests, two types of object transfer (full transfer and no transfer)
and the object deletion process have been tested over a 100 Mbit ethernet connection
and over 115200 bit/sec., 24000 bit/sec. and 19200 bit/sec. modem connections. In all
the tests, the machines have minimum CPU load and there is minimum network traffic.
Both machines run Java 1.1.6.
7.2.1 Full Transfer Tests
Notifier_MH reads the files into a buffer, and then it sends them to Notifier_MSP. The
buffer size is 4096 bytes (4K) by default. The Notifier_MSP constructs the files by
collecting the incoming packets together. The default buffer size may be increased or
decreased by changing the SysConst.DEF_BUFF_SIZE system constant. In the full
transfer tests, the effect of different buffer sizes on the transfer speed has been
tested. Since TCP uses an 8K packet size, the tests were run for 2K, 4K, 6K and
8000-byte buffer sizes (a full 8K buffer is not allowed, since the 8K TCP packet
also contains a header). A MaROS object with a size of approximately 500K has been
transferred throughout the test. The results are shown in Table 7.1. The timer is
started before the first packet is sent from the Notifier_MH to the Notifier_MSP, and
stopped right after the Handler_MH receives the notification result.

Buffer Size vs. Transfer Time (ms.)

Buffer size           100 Mbit    115200 bit    19200 bit
2K (2048 bytes)          60040        259910       432083
4K (4096 bytes)          32627        197017       364160
6K (6144 bytes)          26290        192423       346157
8000 bytes               18837        186070       351723

Table 7.1: Transfer results of 505319 bytes object.
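
The transfer speeds that correspond to Table 7.1 follow from dividing the object size
by the measured time. The sketch below is a small worked example of that arithmetic
for the 100 Mbit column; the class and method names are hypothetical, and the result is
truncated to whole bytes per second.

```java
final class ThroughputSketch {
    // Object size taken from the caption of Table 7.1.
    static final long OBJECT_SIZE_BYTES = 505319L;

    // Average transfer speed in whole bytes per second.
    // Integer division truncates the fractional part.
    static long bytesPerSecond(long sizeBytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return sizeBytes * 1000L / elapsedMillis;
    }

    public static void main(String[] args) {
        String[] labels = {"2K", "4K", "6K", "8000"};
        long[] ethernetMillis = {60040L, 32627L, 26290L, 18837L};
        for (int i = 0; i < labels.length; i++) {
            System.out.println(labels[i] + " buffer: "
                + bytesPerSecond(OBJECT_SIZE_BYTES, ethernetMillis[i]) + " bytes/sec");
        }
    }
}
```

With these inputs the 2K buffer gives 8416 bytes/sec and the 8000-byte buffer gives
26825 bytes/sec. These magnitudes fall within the vertical range of the 100 Mbit speed
graph only if its axis is read as bytes per second, so the axis unit is probably
bytes rather than Kbytes.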

Figures 7.1 through 7.3 show the transfer time vs. buffer size graphs for the 100 Mbit
ethernet and the modem tests. These figures show that the performance of the 100 Mbit
ethernet connection increases when the buffer size is increased. On the other hand,
the modem tests show that there is a barrier value for the buffer size (Figure 7.3).
There is no performance increase in the full transfer operation when this barrier is
exceeded. Moreover, the use of larger buffer sizes may even decrease the performance
as a side effect, since large buffers have to be segmented for TCP encapsulation. To
summarize, there is no speed-up when using buffer size values larger than the
communication bandwidth of the mobile host.

[Graph: Transfer Time (ms.) on the vertical axis, 0 to 70000; Buffer Size (2K, 4K, 6K, 8000) on the horizontal axis]

Figure 7.1: Transfer time vs. buffer size graph for 100 Mbit tests

[Graph: Transfer Time (ms.) on the vertical axis, 0 to 300000; Buffer Size (2K, 4K, 6K, 8000) on the horizontal axis]

Figure 7.2: Transfer time vs. buffer size graph for 115200 bit tests

[Graph: Transfer Time (ms.) on the vertical axis, 0 to 500000; Buffer Size (2K, 4K, 6K, 8000) on the horizontal axis]

Figure 7.3: Transfer time vs. buffer size graph for 19200 bit tests

The transfer speed vs. buffer size graphs below depict the effect of different buffer
size values on the full transfer operation. The transfer speed increases when the
buffer size is increased. However, there is a barrier for the buffer size, as seen in
Figure 7.6. This barrier value is directly affected by the network bandwidth. For
example, the 19200 bit/sec. modem connection provides a maximum of about 6 Kbit/sec.
network bandwidth with compression, and this is the barrier for the buffer size value.

[Graph: Transfer Speed (Kbyte/sec.) on the vertical axis, 0 to 30000; Buffer Size (2K, 4K, 6K, 8000) on the horizontal axis]

Figure 7.4: Transfer speed vs. buffer size graph for 100 Mbit tests
[Graph: Transfer Speed (Kbyte/sec.) on the vertical axis, 0 to 3000; Buffer Size (2K, 4K, 6K, 8000) on the horizontal axis]

Figure 7.5: Transfer speed vs. buffer size graph for 115200 bit tests

[Graph: Transfer Speed (Kbyte/sec.) on the vertical axis, 0 to 1600; Buffer Size (2K, 4K, 6K, 8000) on the horizontal axis]

Figure 7.6: Transfer speed vs. buffer size graph for 19200 bit tests
7.2.2 No Need Type Object Transfer and Object Deletion Tests
A No Need Type object transfer is an object transfer operation without a full transfer
or a partial transfer. The tests were performed on the same test platform as the full
transfer tests. The timer is started right after the notification request is
received by the Notification Agent. It is stopped right after the Handler_MH receives
the notification result. In the current code, there is a five-second synchronization
delay, which is included in these test results. Since there is no object transfer, the
size of the buffer has no effect on the test results. Table 7.2 displays the results of
the tests. The modem tests required approximately three more seconds to finish the
operation than the 100 Mbit tests. The table also contains the results of the
object deletion tests. There is not much time difference between these two test results.
These tests show that the bandwidth of the TCP connection does not have much
influence on the No Need Type object transfer and the object deletion operations.


Test                                    Run #1 (ms.)  Run #2 (ms.)  Run #3 (ms.)  Average (ms.)
24 Kbit modem (No Need Transfer)               12300         11810         12410          12173
24 Kbit modem (Object Deletion)                11860         12300         12300          12153
100 Mbit ethernet (No Need Transfer)           10050          9410          9120           9527
100 Mbit ethernet (Object Deletion)             9060         10110          9170           9447

Table 7.2: The results of the No Need type object transfer tests.
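
The averages in Table 7.2 and the gap between the modem and the ethernet runs can be
rechecked with a few lines of arithmetic. The sketch below does this for the No Need
Type transfer rows; the class and method names are hypothetical, and the averages are
rounded to the nearest millisecond.

```java
final class RunAverageSketch {
    // Arithmetic mean of the run times, rounded to the nearest millisecond.
    static long averageMillis(long[] runs) {
        if (runs.length == 0) {
            throw new IllegalArgumentException("at least one run is required");
        }
        long sum = 0;
        for (int i = 0; i < runs.length; i++) {
            sum += runs[i];
        }
        return Math.round((double) sum / runs.length);
    }

    public static void main(String[] args) {
        long modem = averageMillis(new long[] {12300L, 11810L, 12410L});
        long ethernet = averageMillis(new long[] {10050L, 9410L, 9120L});
        System.out.println("modem average:    " + modem + " ms");
        System.out.println("ethernet average: " + ethernet + " ms");
        System.out.println("difference:       " + (modem - ethernet) + " ms");
    }
}
```

The computed averages are 12173 ms and 9527 ms, a difference of 2646 ms, which is the
"approximately three seconds" mentioned in the text.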

7.3 Future Work
This section presents the future research areas related to the Notification and
Recovery modules of MaROS. Furthermore, the future work planned for MaROS as a whole
is discussed at the end of the section.

7.3.1 Future Work on the Notification Module
The Notification Module deals with the transfer and the deletion operations on the
relocatable objects. The performance tests indicate that there is no considerable
speed-up in the object transfer operation when the buffer size exceeds the
communication bandwidth. Since the mobile hosts may connect to the MSP at
different connection speeds, dynamic buffer size values may be used in order to
optimize the transfer operation for each connection.
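
One possible policy for a dynamic buffer size is sketched below, purely as an
illustration. It assumes that the sender can supply a measured transfer speed in bytes
per second (for example from a previous transfer) and clamps the buffer to the range
used in the tests, so the buffer never exceeds what the link delivered in one second.
The class name, the method name and the policy itself are assumptions; MaROS does not
implement this.

```java
final class BufferSizeSketch {
    static final int MIN_BUFFER = 2048;  // smallest buffer size used in the tests
    static final int MAX_BUFFER = 8000;  // largest buffer size used in the tests

    // Picks a buffer size no larger than the measured bytes per second,
    // limited to the range [MIN_BUFFER, MAX_BUFFER].
    static int chooseBufferSize(long measuredBytesPerSecond) {
        if (measuredBytesPerSecond <= MIN_BUFFER) {
            return MIN_BUFFER;
        }
        if (measuredBytesPerSecond >= MAX_BUFFER) {
            return MAX_BUFFER;
        }
        return (int) measuredBytesPerSecond;
    }
}
```

Under this policy a slow modem link stays at the 2048-byte minimum, while a fast
ethernet link is assigned the 8000-byte maximum.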

Object compression is another useful approach for an optimal transfer process.
MaROS objects can be compressed and then transferred to the MSP. When the
MSP receives the compressed object data, it can decompress the data and create the
relocatable MaROS object. This process requires additional time for compression;
however, less data has to be sent, so the transfer is expected to complete
considerably faster, especially on slow connections.
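
The compress-then-transfer idea can be sketched with the standard java.util.zip
classes. The example below is a minimal sketch that compresses a byte array on the
sending side and restores it on the receiving side; the class and method names are
hypothetical, and how the bytes would be handed to the MaROS notifiers is left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ObjectCompressionSketch {
    // Compresses the object data before it is sent.
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original object data on the receiving side.
    static byte[] decompress(byte[] packed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the input is truncated or unsupported.
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

How much is gained depends on the data: highly repetitive content shrinks a lot, while
already compressed content may not shrink at all, so the compression time is not always
repaid.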

The object deletion process may be modified by adding a new server machine next to
the MSP. This machine may be called the MaROS Recycle Bin (MRB), and all the
objects that are to be deleted may be moved to the MRB instead of being deleted
from the MSP immediately. This approach should not increase the object deletion time
much, since there will be a very fast network connection between the MSP and
the MRB.

7.3.2 Future Work on the Recovery Module
Currently, the system is vulnerable to failures such as system lockups and hardware
problems. In order to overcome these problems, the Recovery Module should be
completely redesigned. Since Java does not provide signal handling primitives, the
implementation language may need to be changed. However, this would reduce the
portability of MaROS.

In the future, the Recovery Module may be made at least semi-transparent to the
programmers by providing a programming interface. In the current version of the
Recovery Module, a MaROS programmer has to know almost everything about the Recovery
Module in order to write recoverable applications.

7.3.3 Future Work on MaROS
There are many research areas that have not been designed or implemented in the
current version of MaROS. Some of these areas are system security, heavy-weight
migration, and load balancing on multiple MSPs.

System security is one of the most important issues in a system like MaROS,
since there may be many unauthorized attempts to access the system. There is a
host registration and authentication protocol; however, in the future, the design of a
new agent (Security Agent) should be considered.

In the current design and implementation, the Migration Agent only deals with
light-weight object migration. In the future, the migration of running MaROS
objects may be implemented. This type of migration may become feasible with an
increase in the bandwidth of wireless connections.

Another possible enhancement that may be implemented in the future is the use of
multiple MSPs. The current design may be extended to a distributed system of MSPs
connected via high-speed networks. In this case, MaROS may be optimized by using
techniques such as load balancing and parallel processing.

In order to increase the system performance, the MaROS threads may communicate
using shared memory instead of using MaROS communication primitives. However,
all the possible problems, such as starvation and deadlock of the objects, have to be
dealt with in that case.

Finally, the implementation language may be changed in order to increase the overall
system performance. However, in this case, the system should be redesigned
considering the advantages and disadvantages of the new implementation language.
C++ seems to be the ideal alternative. In order to keep the portability feature of
MaROS, the Java-based MaROS objects may continue to be used.



8

Conclusion


The Mobile and Relocatable Object System (MaROS) is an application development
platform especially designed to minimize the problems that arise from the limitations
of mobile computers. The system supports disconnected operations, object relocation,
and recovery of MaROS clients. In this dissertation, the design and the
implementation of the Notification and the Recovery modules have been presented.

The transfer operation of the relocatable objects is automatically initiated by the
system. A copy of the relocatable object is created on the MSP site while the object is
being created on the mobile host. This process is called notification. The
notification process simplifies and speeds up the object relocation process. The
Notification Agent and other system agents use worker threads to achieve optimal
response times for the requests.

System recovery is one of the most important issues in a system like MaROS. In the
current design, the recovery of all recoverable objects is possible after voluntary
shutdowns. However, the Recovery module is not transparent to the programmers.
Currently, a MaROS programmer has to follow a well-defined procedure to write
recoverable objects.

The recovery process is hierarchical and decentralized. The Recovery Agent only
coordinates the process by signalling the system objects in a hierarchical order.
However, the process is not centralized, since all the system agents and recoverable
objects are responsible for their own recovery.

The current design of the Recovery module does not cover recovery from hardware and
OS failures. In order to deal with that type of recovery, a checkpointing approach
should be designed and implemented. Java does not provide low-level primitives for
accessing the system resources directly. Therefore, another programming language may
be chosen for the implementation in the future.

Mobile computing is the technology of the future. Currently, many research projects
are being carried out on mobile computing platforms. The aim of those projects
is to improve the performance and the functionality of mobile computers in general.
MaROS is one of them, and it aims to provide an application development platform
designed especially for mobile computers.



