An Application Service Provider for a Mobile Computing Environment





An Application Service Provider for a
Mobile Computing Environment




by
Gürhan Küçük






A thesis submitted to the Institute of Graduate Studies in
Science and Engineering in partial fulfillment of the
requirements for the degree of
Master of Science
in
Information and Computer Science



YEDİTEPE UNIVERSITY
1998



Acknowledgements


I would like to express my deepest gratitude to my supervisor, Associate Professor
Şebnem Baydere, for her invaluable guidance, motivation and support during all
stages of this study.

I would especially like to thank Onur Demir, my colleague and partner throughout
the past four years, for his invaluable help and patience. I would also like to thank the
members of the MaROS project, Mehmet Can Yıldız and Giray Devlet, and former
MaROS members Nurhan Çetin, Değer Cenk Erdil and Nilüfer Girgin for their
presence and invaluable help. Moreover, I thank the Yeditepe University Engineering
Faculty staff Alper Özpınar, Ayşegül Ergin, Ertan Toprakbastı, İlker Birbil, Mehmet
Ali Özcan, Rana Belen, Tansel Çağlar and Zeynep Yazıcıoğlu for their moral support
and help.

Last but not least, thanks to my mother, Emek Küçük, and my father, İlhan Küçük,
and the rest of my family for their moral support and patience throughout all stages of
this study and my life.





Abstract


The MaROS (Mobile and Relocatable Object System) application development platform
is designed especially for mobile computers. In this platform, registered mobile
computers may transfer their applications to the MaROS server. This process is called
“relocation”. In this thesis, the design and implementation of the MaROS Notification
and Recovery modules are presented.

The MaROS Notification module deals with the transfer and removal of MaROS objects.
It is the preparation step for the object relocation process. The MaROS Recovery
module deals with the system recovery and startup process after voluntary shutdown
requests. It coordinates the recovery process by keeping a table called the
“Recovery Table”.


Contents

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Introduction
1.2 Mobile and Relocatable Object System (MaROS)
1.3 Motivation and Aims
1.4 Background Information
1.4.1 Mobile Computing
1.4.2 Mobile Host
1.4.3 Disconnected Communication
1.4.4 Objects
1.4.5 Object Relocation (Object Migration)
1.4.6 Object Notification
1.4.7 Object Recovery
1.5 Thesis Summary

2 PREVIOUS WORK
2.1 Introduction
2.2 Rover: A Toolkit for Mobile Information Access
2.3 ARTEMIS: Advanced Reliable disTributed Environment Middleware System
2.4 Eden
2.5 LOCUS
2.6 Discussion

3 MaROS ENVIRONMENT
3.1 Introduction
3.2 The Physical Structure of MaROS
3.3 The Logical Structure of MaROS
3.4 The System Agents
3.4.1 Communication Agent
3.4.2 Object Manager
3.4.3 Notification Agent
3.4.4 Migration Agent
3.4.5 Recovery Agent
3.5 Host Registration and Authentication Protocol
3.6 MaROS Objects
3.6.1 Object Types
3.6.2 Object Modes
3.6.3 Object States
3.7 Communication Structure
3.8 System Recovery

4 NOTIFICATION DESIGN
4.1 Introduction
4.2 Notification in General
4.3 Detailed Design
4.3.1 Peer Entities
4.3.2 Reserving Ports
4.3.3 Notifier Tables
4.3.3.1 Notifier Object Transfer Table (NOTT)
4.3.3.2 Partial Object Transfer Table (POTT)
4.3.3.3 Class Dependency Table (CDT) and Class Replica Table (CRT)
4.3.3.4 Notifier Information Table (NIT)
4.3.4 Object Transfer (Object Creation)
4.3.4.1 Full Transfer
4.3.4.2 Partial Transfer
4.3.4.3 No Need To Transfer
4.3.5 Object Deletion

5 SYSTEM RECOVERY DESIGN
5.1 Introduction
5.2 Recovery Table (RT) and Recovery Tree Structure
5.3 Recoverable Objects vs. Unrecoverable Objects
5.4 System Shutdown
5.4.1 Creating Image Files
5.5 System Startup
5.5.1 Object Manager Startup Process
5.5.2 Recovery Agent Startup Process
5.5.3 Startup of the System Agents
5.5.4 Mutation of SS fields and RA Garbage Collector

6 PILOT SYSTEM IMPLEMENTATION
6.1 Introduction
6.2 Pilot System Implementation Language
6.3 Pilot System Implementation Environment
6.4 Pilot System Implementation
6.4.1 Notify Package
6.4.1.1 Notify.NotificationAgent Class
6.4.1.2 Notify.NIT Class
6.4.1.3 Notify.NotifierClass Class
6.4.1.4 Notify.NOTT and Notify.POTT Classes
6.4.2 Notify.CDT Package
6.4.2.1 Notify.CDT.CDT Class
6.4.2.2 Notify.CDT.CRT Class
6.4.3 Recovery Package
6.4.3.1 RecoveryTable Class
6.4.3.2 RecoveryAgent Class
6.4.3.3 Recoverable Object Implementation

7 EVALUATION AND FUTURE WORK
7.1 Introduction
7.2 Performance Evaluation
7.2.1 Full Transfer Tests
7.2.2 No Need Type Object Transfer and Object Deletion Tests
7.3 Future Work
7.3.1 Future Work on the Notification Module
7.3.2 Future Work on the Recovery Module
7.3.3 Future Work on MaROS

8 CONCLUSION

REFERENCES

BIBLIOGRAPHY


List of Tables

Table 1.1 Characteristics of computer hardware
Table 1.2 Characteristics of network technology

Table 3.1 An instance of the Host Identification Table (HIT)
Table 3.2 Object states

Table 4.1 Sample NOTT and POTT instances
Table 4.2 Sample CDT and CRT instances
Table 4.3 Sample NIT instances

Table 6.1 The format of the MH version of the NIT
Table 6.2 The format of the MSP version of the NIT
Table 6.3 The format of the NOTT
Table 6.4 The format of the POTT
Table 6.5 The format of the CDT
Table 6.6 The format of the CRT
Table 6.7 The format of the Recovery Table

Table 7.1 Transfer results of 505319 bytes object
Table 7.2 The results of the No Need type object transfer tests

List of Figures

Figure 2.1 The Rover toolkit client/server distributed object model
Figure 2.2 Highly reliable distributed environment provided by ARTEMIS
Figure 2.3 Transitions between active and passive representations in Eden

Figure 3.1 The physical structure of MaROS
Figure 3.2 MaROS layers
Figure 3.3 The host authentication process

Figure 4.1 Initial phase of a Notification process
Figure 4.2 A Notification design approach (Initial Phase)
Figure 4.3 Current design of the Notification process (Initial Phase)
Figure 4.4 The comparison of two approaches
Figure 4.5 Scope of the Notifier tables
Figure 4.6 Object transfer (creation) process
Figure 4.7 Message format of a notification request
Figure 4.8 cfinfo field in detail
Figure 4.9 Message format of the message that is sent from H_MSP to Notifier_MSP
Figure 4.10 pcfinfo field in detail
Figure 4.11 Message format of the message sent from Notifiers to Handlers
Figure 4.12 Object deletion process

Figure 5.1 A sample Recovery Table instance and its corresponding tree structure
Figure 5.2 Setting the SS field
Figure 5.3 The OM realizes the Shutdown Signal
Figure 5.4 Flow of the Shutdown Signal
Figure 5.5 Startup of the Object Manager
Figure 5.6 Startup of the Recovery Agent
Figure 5.7 Code replication solution for the recovery process

Figure 6.1 The example piece of code showing the addition of code states
Figure 6.2 The implementation of check_ShutdownSignal() method
Figure 6.3 An example code part of the saveImage() method
Figure 6.4 An example image file reader code piece

Figure 7.1 Transfer time vs. buffer size graph for 100 Mbit tests
Figure 7.2 Transfer time vs. buffer size graph for 115200 bit tests
Figure 7.3 Transfer time vs. buffer size graph for 19200 bit tests
Figure 7.4 Transfer speed vs. buffer size graph for 100 Mbit tests
Figure 7.5 Transfer speed vs. buffer size graph for 115200 bit tests
Figure 7.6 Transfer speed vs. buffer size graph for 19200 bit tests

List of Abbreviations


AO Authentication Object
CA Communication Agent
CPU Central Processing Unit
CDT Class Dependency Table
CGI Common Gateway Interface
CRT Class Replica Table
DBMS Database Management System
DNS Domain Name Service
FTP File Transfer Protocol
H Handler
HIT Host Identification Table
ID Identifier
JVM Java Virtual Machine
LAN Local Area Network
MA Migration Agent
MaROS Mobile and Relocatable Object System
MASL MaROS Application Support Layer
MCSL MaROS Communication Support Layer
MH Mobile Host
MID Mobile Host Identifier
MMX Multimedia Extension
MRB MaROS Recycle Bin
MSP Mobile Host Service Provider
MUI MaROS User Interface
N Notifier
NA Notification Agent
NIT Notifier Information Table
NOTT Notifier Object Transfer Table
O MaROS Object
OID Object Identifier
OOP Object Oriented Programming
OM Object Manager
OS Operating System
OT Object Table
POTT Partial Object Transfer Table
QRPC Queued Remote Procedure Call
RA Recovery Agent
RDO Relocatable Dynamic Object
RT Recovery Table
SS Shutdown Signal
TCp Turkish Coffee Protocol
TCP Transmission Control Protocol
UDP User Datagram Protocol
VPM Virtual Port Mapper
VPR Virtual Port Reservation
WWW World Wide Web



1

Introduction


1.1 Introduction
Computers and computer networks have opened a new era in world history: the
Information Age. Today, we can access a database located anywhere in the world
as if it were locally available, join multimedia conferences, and shop online.
Distributed systems and the client/server architecture are the terms that best
describe this age. A distributed system consists of many computers that are
connected to a computer network. In the client/server model, servers are the
computers that share their resources with the system, and clients are the
computers that use these resources. This scheme is ideal for computer networks with
fixed hosts.

At the beginning of this decade, wireless networks started to become popular
as sales of portable computers increased. Wireless networks have
provided computers with wireless interfaces that allow networked communication
even while a user is travelling. The rapid advances in cellular communication
technology, wireless LAN, and satellite services have enabled mobile users to access
information anywhere and at any time [1].

Wireless networks need everything that classical computer networks need. However,
they also need an improvement over the classical client/server model, because reliable
transport protocols have been tuned for networks composed of wired links and
stationary hosts [2]. Moreover, the classical client/server model assumes that both the
clients and the servers are connected to a network via a fixed and continuous
connection. Portable computers have deficiencies such as low bandwidth
capacity, limited power supply, limited CPU power, and vulnerability to line failures,
hand-offs, etc. Table 1.1 presents the characteristics of computer hardware. From the
table it is clear that portability is traded for performance.

Hardware/Characteristic   Server   Workstation   Laptop             Palmtop
Processing power          High     High          Medium             Limited
Storage capacity          High     High          Medium             Limited
Portability               None     Limited       Slightly limited   Full
User interface            Full     Full          Slightly limited   Limited
Reliability               High     Medium        Limited            Limited

Table 1.1: Characteristics of computer hardware.

Table 1.2 summarizes the major characteristics of network technology. From this
table it is obvious that availability is traded for performance.

Technology/Characteristic   Fixed WAN/LAN            Dial-up wire                    Dial-up cellular
Bandwidth                   High                     Medium                          Low
Reliability                 High                     Medium                          Low
Initial cost                High                     Low                             Low
Latency                     Low                      Medium                          High
Cost to use                 Low                      Medium                          High
Topology                    Fixed                    Fixed, but readily changeable   Dynamic
Available at                Outlet in organization   Any phone outlet                Anywhere (theoretically)

Table 1.2: Characteristics of network technology.

A new type of client/server model, or a new application development platform, has to
be designed to deal with the inadequacies of portable computers. This new model has
to support ideas such as disconnected communication, object relocation,
and system recovery, which are hardly needed in the classical client/server model.

1.2 Mobile and Relocatable Object System (MaROS)

MaROS is a mobile computing environment designed specifically to compensate
for the inadequacies of mobile computers [3]. In a mobile computing
environment, there are portable computers that are connected to a static network via
wireless links. Portable computers in these environments have a limited power
supply and limited communication bandwidth. Because the classical client/server
model assumes that both the clients and the server (or servers) are connected to a
network via a fixed and uninterrupted connection, it is not a good solution for
portable computers. Moreover, classical communication protocols, such as TCP, do
not take wireless networks into consideration.

MaROS proposes a new type of client/server model and a new type of
communication protocol. This model and protocol help mobile hosts to extend their
computing environment. The MaROS server, called the MSP (Mobile Host Service
Provider), is a fixed and powerful host that has a wireless interface to communicate
with mobile hosts (MaROS clients). MaROS clients (MHs, or Mobile Hosts)
may transfer their objects to the MSP and execute them there. This approach enables
Mobile Hosts to extend their computing environment using fixed and powerful server
machines.

1.3 Motivation and Aims
Extending the computing environment of a Mobile Host is one of the major aims of
MaROS. With this service, Mobile Hosts may run CPU-bound and bandwidth-bound
applications even when they are powered off. However, this service requires the
transfer of applications, or parts of applications, to the MSP site. This process is a
kind of object synchronization that creates an exact copy of an object at the
MSP. The Notification service is primarily concerned with the transfer of MaROS
objects. Its additional task is the deletion of transferred objects when
they are no longer needed. The object notification process is also a preparation for
the object migration process: it simplifies migration by transferring
objects when they are created.

The mobile computing environment of MaROS should be efficient and reliable.
Shutting down a Mobile Host is a very common event for a portable machine. Without
a Recovery service, many jobs would have to be restarted from the beginning after
each shutdown or, worst of all, might be lost and never restarted.
The Recovery service enables Mobile Hosts to continue their execution after a
proper system shutdown and start-up process. It provides a much more stable and
efficient computing environment for Mobile Host users. There are many research
efforts related to recovery in distributed systems. Some of them deal with failure
recovery, which covers both software and hardware failures. Many of them use
checkpointing algorithms to roll back to a previous state after a failure. The
current recovery system of MaROS does not deal with hardware failures. The system
recovers itself after a shutdown request is given by the user. The recovery
module is one of the most important parts of the MaROS system. It enables the
MaROS user to exit the system whenever necessary, knowing that at the next
startup the system will continue its execution from the exact point where it
stopped.

1.4 Background Information
This thesis is mainly concerned with an application development environment for
mobile computers. Readers therefore need to know the basic concepts in
this area. This section provides general background information on mobile
computing environments and object-related issues.

1.4.1 Mobile Computing
With the rapid increase in the number of portable computers and the availability of
wireless link interfaces for them, a new computing approach has become feasible:
Mobile Computing. The terms Wireless Computing, Ubiquitous Computing,
Location-independent Computing and Nomadic Computing are used interchangeably.
The portable computers in a mobile computing system are usually called Mobile Hosts.

1.4.2 Mobile Host
A Mobile Host is a portable computer that may connect to a network via its wireless
link interface. It may be carried easily; however, it has some important disadvantages.
It has to be recharged periodically, since it has a limited power supply. When on the
move, it may be connected to a mobile network via a cellular phone line. This type
of connection involves communication hand-offs when changing cells. Therefore, the
communication is not as reliable as it is in fixed networks. Moreover, the communication
bandwidth is far lower than the bandwidth of fixed networks. Furthermore, mobile
communication is still expensive. A Mobile Host is also vulnerable. It may be
dropped and physically broken, or it may be stolen. All of these handicaps have
forced computer scientists to design new types of computing environments that
accommodate Mobile Hosts.

1.4.3 Disconnected Communication
A Mobile Application Environment should provide communication primitives that are
transparent to the applications. Disconnected communication is part of a new type
of communication protocol designed to support mobile hosts. It tries to minimize,
or totally remove, the side effects of connecting to a network via a mobile host. By
using queuing strategies at both sides of the connection, the client and the server may
send data even when there is no connection. The queued data is sent to the target
host when the communication link comes up again.
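
The queuing strategy described above can be illustrated with a minimal sketch. It is not the MaROS protocol: the class name DisconnectedChannel, its methods, and the in-memory list that stands in for the remote host are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: buffers outgoing messages while the link is down. */
class DisconnectedChannel {
    private final Queue<String> pending = new ArrayDeque<>();
    // Stands in for the remote host; a real system would write to a socket.
    private final List<String> delivered = new ArrayList<>();
    private boolean linkUp = false;

    /** Accepts a message at any time; it stays queued while the link is down. */
    void send(String message) {
        pending.add(message);
        if (linkUp) {
            flush();
        }
    }

    /** Called when the link state changes; queued data is sent on reconnection. */
    void setLinkUp(boolean up) {
        linkUp = up;
        if (linkUp) {
            flush();
        }
    }

    /** Delivers queued messages in the order they were sent. */
    private void flush() {
        while (!pending.isEmpty()) {
            delivered.add(pending.remove());
        }
    }

    int pendingCount() {
        return pending.size();
    }

    List<String> delivered() {
        return delivered;
    }
}
```

In this sketch the application calls send() without checking the link state, which is the transparency property mentioned above: the queue, not the application, absorbs the disconnection.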

1.4.4 Objects
The object concept is used in many areas. A car, a TV set, a computer program, and a
door knob are examples of objects. Two main characteristics of an object
are universal: 1) Properties and 2) Methods. Each object has special properties.
For instance, a car object has a color, a type, a width, and a length. It also has
methods such as drive, stop, and change gear. In the MaROS context, the objects are
Java threads. They have many properties such as recoverability, relocatability, and
state. A MaROS object also has many methods. The suspendObject, sleepObject,
activateObject, and relocateObject methods are some examples.

1.4.5 Object Relocation (Object Migration)
Sometimes, an object may need to be moved from one machine to another. There
may be various reasons for this. The host that contains the object may be heavily
loaded, and the object may be moved to a less loaded machine. This process is called
load balancing. Another reason may be long-running objects. They require
uninterrupted execution, and this may not always be possible. For instance, a
portable computer has a limited power supply and an unreliable communication
interface, which makes it unsuitable for such objects. In short,
Object Relocation is the movement of an object from one host to another. In MaROS
terms, it is the movement of a MaROS object from a Mobile Host (MaROS client) to
the Mobile Host Service Provider (MaROS server), since MaROS supports one-way
object relocation.

1.4.6 Object Notification
In order to support object relocation, a copy of each object should be created at the
MSP. These mirror objects should also be deleted when the original object is deleted
from the mobile host. This is a kind of object synchronization process, and since the
MSP is informed about each creation and deletion, the process is called notification.
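
The create/delete synchronization described above can be pictured with a small sketch of the MSP side. It is only an illustration under stated assumptions: the class MirrorRegistry, its method names, and the use of a string identifier with a byte array payload are hypothetical and do not reflect the actual MaROS tables or message formats.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical MSP-side registry of mirror objects announced by Mobile Hosts. */
class MirrorRegistry {
    // Maps an object identifier to the mirrored object data.
    private final Map<String, byte[]> mirrors = new HashMap<>();

    /** Creation notification: store a private copy of the object's data. */
    void notifyCreated(String objectId, byte[] objectData) {
        mirrors.put(objectId, objectData.clone());
    }

    /** Deletion notification: drop the mirror; returns true if one existed. */
    boolean notifyDeleted(String objectId) {
        return mirrors.remove(objectId) != null;
    }

    boolean hasMirror(String objectId) {
        return mirrors.containsKey(objectId);
    }

    int mirrorCount() {
        return mirrors.size();
    }
}
```

Because the mirror already exists at the MSP when relocation is requested, the relocation step in such a design would not need to ship the object's code again.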

1.4.7 Object Recovery
In a mobile computing environment, there are portable computers that are not reliable.
The recovery process is used for handling system interruptions. There are two types
of interruption: involuntary and voluntary. An involuntary failure may occur at any
time, and its source may be a hardware or software problem. This type of failure is
handled by periodically saving a snapshot of the system (checkpointing) and rolling
back to a previous stable state at the next system startup. The second type of
interruption is voluntary. The user may want to shut down the system, and the system
may be signalled before the shutdown occurs. Voluntary system interruptions give
the system a chance to save its crucial data. Of course, dealing with the first type of
failure is much more difficult.
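
The voluntary case can be sketched as follows: a long-running job saves its state when a shutdown is announced and continues from the same point after restart. This is a minimal illustration, not the thesis's implementation. The class CountingJob is hypothetical, standard Java serialization is assumed as the image format, and an in-memory byte array stands in for an image file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical long-running job whose progress survives a voluntary shutdown. */
class CountingJob implements Serializable {
    private static final long serialVersionUID = 1L;
    private int nextStep = 0; // index of the next step to execute
    private long sum = 0;     // result accumulated so far

    /** Executes the given number of steps, continuing from nextStep. */
    void run(int steps) {
        for (int i = 0; i < steps; i++) {
            sum += nextStep;
            nextStep++;
        }
    }

    long sum() {
        return sum;
    }

    int nextStep() {
        return nextStep;
    }

    /** Called on a shutdown signal: serializes the job's state into an image. */
    byte[] saveImage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Called at startup: rebuilds the job from a previously saved image. */
    static CountingJob restore(byte[] image) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return (CountingJob) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, a job that has run three steps (sum 0 + 1 + 2 = 3) and is then saved and restored continues with step 3 rather than restarting at step 0. Handling involuntary failures would additionally require taking such images periodically, without waiting for a signal.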

1.5 Thesis Summary
MaROS is an application development platform designed especially for portable
computers. These mobile hosts are called MaROS clients. MaROS is a client/server
environment and consists of two types of machine: MaROS clients (MH, in short)
and the MaROS server, or Mobile Host Service Provider (MSP). A MaROS client runs
MaROS client code and connects to the MSP via a wireless link. The MSP accepts
connections from MaROS clients and permits them to relocate their objects.

This thesis is mainly concerned with the design and implementation of the Notification
and Recovery modules of MaROS. It also covers the general architecture of MaROS to
give the reader a clearer view of the system. In Chapter 2, previous work in
this area is surveyed. Chapter 3 covers the overall design of MaROS. In Chapters 4
and 5, the design of the Notification and Recovery services of MaROS is explained in
detail. Chapter 6 provides the implementation details of the pilot system. Chapter 7
presents the performance evaluation and additional ideas that may be
implemented in the future. Chapter 8 is the final chapter, and it presents a conclusion
for the whole work.



2

Previous Work


2.1 Introduction
Several operating systems and toolkits target mobile and/or distributed
computing platforms. This chapter discusses some of those projects in detail,
with particular attention to their recovery modules.

2.2 Rover: A Toolkit for Mobile Information Access
This toolkit is one of the projects closest to MaROS. The Rover toolkit
was developed at the Laboratory for Computer Science at MIT. It provides mobile
application developers with a set of tools to isolate mobile applications from the
limitations of mobile communication systems. It supports mobile communication by
providing Relocatable Dynamic Objects (RDOs) and Queued Remote Procedure Call
(QRPC). An RDO is an object with a well-defined interface that can be dynamically
loaded into a client computer from a server computer (or vice versa) to reduce client-
server communication requirements. Queued remote procedure call is a
communication system that permits applications to continue to make non-blocking
remote procedure calls [6] even when a host is disconnected: requests and responses
are exchanged upon network reconnection [5].
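
The queuing behaviour described for QRPC can be illustrated with a minimal Java sketch. It is not Rover's actual interface: the class QueuedRpcClient, its method names, and the use of a plain function in place of the remote server are all assumptions made for illustration. The sketch only shows the idea that a call returns immediately and that queued requests are exchanged once the link comes back.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch of a queued RPC client; not Rover's real interface.
class QueuedRpcClient {
    // A pending call: the request text plus the future its caller is holding.
    private static final class Pending {
        final String request;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Pending(String request) { this.request = request; }
    }

    private final Deque<Pending> log = new ArrayDeque<>();
    private final Function<String, String> server; // stands in for the remote host
    private boolean connected = false;              // the host starts disconnected

    QueuedRpcClient(Function<String, String> server) { this.server = server; }

    // Non-blocking call: the request is always queued first and is sent
    // at once only if the host is currently connected.
    CompletableFuture<String> call(String request) {
        Pending p = new Pending(request);
        log.addLast(p);
        if (connected) flush();
        return p.reply;
    }

    // Called when the network link comes back; queued requests go out in order.
    void onReconnect() { connected = true; flush(); }

    void onDisconnect() { connected = false; }

    int queued() { return log.size(); }

    private void flush() {
        while (!log.isEmpty()) {
            Pending p = log.removeFirst();
            p.reply.complete(server.apply(p.request));
        }
    }
}
```

A caller would keep the returned future and continue working; the reply becomes available after onReconnect() has flushed the queue.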
The Rover toolkit offers applications a uniform distributed object system based on
client/server architecture. Rover applications employ a check-in, check-out model of
data sharing: They import RDOs into their address spaces, invoke methods provided
by the RDOs, and export the RDOs back to servers.

The latest extensions provide the tools for handling a specific class of faults: transient,
recoverable faults. These faults are typically caused by environmental circumstances
(e.g. power glitches, communication link errors or failures, resource exhaustion due to
high system load, etc.) or software errors in rarely used code paths. The extensions do
not address repeatable or non-recoverable failures (e.g. those due to critical design or
implementation errors).

The reliability extensions leverage functionality already provided by the Rover
toolkit: stable logging of each message sent by a client and message retransmission
after communication failures. While the use of stable logging at the client provides
reliable delivery of a message to a server, it does not handle failures at the server [7].

Figure 2.1 shows the Rover toolkit client/server distributed object model. Rover offers
applications client caching and optimistic concurrency control based upon a check-in,
check-out model of data sharing. Client applications use QRPCs to import RDOs
from servers (steps 1 and 2) and to export changed RDOs back to servers (steps 3 and
4).

Figure 2.1: The Rover toolkit client/server distributed object model.

2.3 ARTEMIS: Advanced Reliable disTributed Environment Middleware System
ARTEMIS is middleware that improves the reliability of application programs executed
in a distributed environment, such as three-tier client-server or groupware
application programs, without changing them.

ARTEMIS is implemented as library routines and daemon processes, in a
configuration in which a server computer has a backup computer. ARTEMIS
uses checkpoints as its key method for achieving high reliability. It provides a
checkpointing protocol that takes checkpoints of distributed processes
consistently.

Figure 2.2: Highly reliable distributed environment provided by ARTEMIS.

Figure 2.2 shows the environment provided by ARTEMIS. In this example,
ARTEMIS controls a WWW server, CGI application programs and a DBMS running
on a server computer, as well as WWW browsers running on client computers. Under
the control of ARTEMIS, even if the primary server computer goes down, all the
processes on the primary server computer can be resumed on the backup server
computer using their checkpoints and replicated files. The DBMS can continue to run on
the backup server computer without executing journal recovery processing.

In the ARTEMIS environment, it is not necessary to modify application programs,
because the ARTEMIS libraries are linked to application programs dynamically, and
they have the same interfaces with and operating systems. The ARTEMIS libraries
monitor the behavior of the process to which they are linked and acquire checkpoints
of that process [8].
2.4 Eden
The Eden system was developed at the University of Washington in Seattle. The goal
of the Eden system was to investigate logically integrated but physically distributed
operating systems.

Eden was based on the object model. It is a descendant of Hydra (Wulf et al. 1981). All
‘traditional’ programs and physical and logical resources are represented as objects.
There are no pure data objects – Eden objects are supported by active processes. An
Eden object may be seen as an instance of an abstract data type. Because there are
some differences between Eden’s objects and those of other systems and languages,
the designers refer to them as Ejects (for Eden Objects).

The underlying system of Eden is Berkeley UNIX running on VAXes. Each active
Eject executes within a separate UNIX process with its own address space. This
process is managed by the Eden kernel using UNIX facilities.

Ideally, an Eject should be active. However, it is not always active, either because it
or its computer has crashed, or because it has explicitly deactivated itself in order to
economize on the use of system resources. Thus, an Eject has two manifestations: An
active representation (with its system-level process) and a passive representation. The
passive representation consists primarily of a disk file, and only the passive
representation can survive a crash.

An Eject can perform a Checkpoint operation. This operation creates a passive
representation, that is, a data structure designed to endure system crashes. This means
that the data in a passive representation should be sufficient to enable the Eject to
reconstruct its long-term state. Acquiring and releasing active and passive
representations are illustrated in Figure 2.3.

Figure 2.3: Transitions between active and passive representations in Eden.

The figure shows that when an Eject is created, only an active representation exists. It
does not have its state saved in permanent store. This implies that if this Eject were to
Deactivate, or if the system were to crash, it would vanish and it could not be invoked
again.

Performing a Checkpoint operation results in the following steps: opening a
passive representation of the Eject, writing its state in a series of PutData calls, and
completing the passive representation with a call. The Eject then has its state and
identity on permanent store. If this Eject Deactivates or crashes, its active
representation vanishes, but the passive representation remains. If an Eject that has a
passive representation is invoked by another Eject, the kernel reactivates it, that
is, it constructs a new active representation [9].
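
The relationship between active and passive representations can be made concrete with a small Java model. This is only a sketch under stated assumptions: Eden was not written in Java, and the class EjectStore, its two in-memory maps (one standing in for the disk) and all method names are hypothetical. It shows that an Eject without a checkpoint vanishes on a crash, while a checkpointed one is reactivated from its last saved state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Eden's active/passive representations; names are illustrative.
class EjectStore {
    // Stands in for the disk: passive representations survive crashes.
    private final Map<String, Map<String, String>> passive = new HashMap<>();
    // Active representations exist only while the Eject's process is alive.
    private final Map<String, Map<String, String>> active = new HashMap<>();

    // Creating an Eject yields only an active representation.
    void create(String id) { active.put(id, new HashMap<>()); }

    // Updates the in-memory state of an active Eject.
    void put(String id, String key, String value) { requireActive(id).put(key, value); }

    // Checkpoint: copy the current state to the permanent store.
    void checkpoint(String id) { passive.put(id, new HashMap<>(requireActive(id))); }

    // A crash or a Deactivate destroys the active representation only.
    void crash(String id) { active.remove(id); }

    // Invocation reactivates the Eject from its passive representation if needed.
    // Throws if the Eject has neither an active nor a passive representation.
    String invoke(String id, String key) {
        if (!active.containsKey(id)) {
            Map<String, String> saved = passive.get(id);
            if (saved == null) throw new IllegalStateException("Eject vanished: " + id);
            active.put(id, new HashMap<>(saved));
        }
        return active.get(id).get(key);
    }

    private Map<String, String> requireActive(String id) {
        Map<String, String> state = active.get(id);
        if (state == null) throw new IllegalStateException("Eject is not active: " + id);
        return state;
    }
}
```

Changes made after the last checkpoint are lost on a crash, which is the behaviour the text describes for the long-term state.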

2.5 LOCUS
LOCUS is a UNIX-compatible, distributed operating system developed by Popek,
Walker and their co-workers at the University of California, Los Angeles. The system
has been in use for several years.

LOCUS’s general goals include making the development of distributed applications
as simple as single machine programming, and realizing the potential that distributed
systems with redundancy have for highly reliable, available operation. The LOCUS
architecture addresses the goals of:

(1) Network transparency – giving all users the illusion of operating on a single
computer. The network is not visible; there is no need to refer to a specific node
of a network;
(2) High reliability and availability – introduced for two general reasons. First,
many applications demand a high level of reliability and availability. Second, the
distributed environment presents new sources of failure, and recovery
mechanisms to deal with them are far more difficult to construct than in
centralized computer systems. LOCUS possesses one very important reliability
feature, namely, it supports automatic replication of stored data, with the degree
of replication indicated by associated reliability profiles; and
(3) Good performance – LOCUS achieves two basic performance characteristics
desirable in a distributed system:
(a) Access to local resources in a distributed system should have comparable
performance to access to resources in a centralized system, as if mechanisms
for remote access were not present.
(b) Remote access, of course slower than local access, should be reasonably
comparable to local access [9].

2.6 Discussion
All of the above platforms provide system reliability by utilizing a special system
object (agent) or by adding an extension package to the system. Some of the platforms
use a replication strategy for a more reliable system [8,9]. The rest of the platforms
prefer a checkpointing strategy for system recovery. The checkpointing strategy is
divided into two camps. One camp applies the checkpointing operation periodically from
a system-wide perspective [7], and the other uses a one-time checkpointing
operation over recoverable objects only [8].

The replication strategy requires extensive network traffic, which is not suitable
for mobile platforms. In mobile platforms, the most vulnerable machines are the
mobile hosts. Keeping a replica for each mobile host is not a feasible approach, since
the mobile hosts do not have enough network bandwidth to support this kind of
strategy. However, this strategy is very effective when dealing with hardware failures.

On the other hand, the checkpointing strategy may be used effectively by mobile
platforms. However, it is not as effective as the replication strategy for hardware-based
failures. It is possible to continue execution from the last checkpoint; however,
nothing can be done if the checkpoint information is damaged. The second type of
checkpointing strategy (one-time checkpointing) cannot deal with hardware
failures.





3

MaROS Environment



3.1 Introduction
This thesis covers the design and implementation of two crucial parts of the MaROS
environment, the notification and recovery modules; however, it is
necessary to explain the MaROS environment clearly to give an idea of the whole
system before going into more detail about its modules. The first topic of this
chapter is the physical and logical structure of MaROS. Then, the registration and
authentication process of the mobile hosts is explained in detail. The object-specific
events and the system recovery process are discussed at the end of this chapter.

3.2 The Physical Structure of MaROS
The physical structure of MaROS consists of many portable computers and a fixed
host. The portable computers may connect to and disconnect from the fixed host via their
wireless network interfaces using cellular phones. The fixed host also has a wireless
network interface to communicate with the portable computers. The portable
computers run the MaROS client software and are called MaROS clients. The
fixed host runs the MaROS server software and is called the
MaROS server. In MaROS terms, the MaROS clients are called Mobile Hosts (or
MHs, in short), and the MaROS server is called the Mobile Host Service Provider (or
MSP, in short). Figure 3.1 shows the physical structure of MaROS in detail. In
the current design and implementation of MaROS, only one MSP is available. The
design may be extended with additional MSPs, which would open the way to
new ideas, such as parallel processing and load balancing, that may be implemented
in the near future.


Figure 3.1: The physical structure of MaROS.

MaROS is platform independent. It may run on any OS that supports the Java Virtual
Machine (JVM). This is a great advantage for both system programmers and
application developers. Once Java code is compiled, it may be transferred to any
other platform and run there.

3.3 The Logical Structure of MaROS
MaROS uses a layered approach. In each layer, there may be one or more modules.
Each module is responsible for a specific task, and it may use the services provided
by the lower-layer modules. The interaction between layers is shown in Figure 3.2.

RA: Recovery Agent
MA: Migration Agent
OM: Object Manager
NA: Notification Agent
CA: Communication Agent
A, B: MaROS Application

Figure 3.2: MaROS layers.

The lowest layer is a composite layer. It is the combination of the OS kernel and the
JVM. The upper layers do not directly communicate with the kernel. They use the
lowest layer services via JVM.

The second layer is called the MaROS Communication Support Layer (MCSL). The
Communication Agent (CA) resides in this layer. It provides reliable
communication primitives. This layer also supports disconnected communication,
which is essential for the mobile hosts. The communication protocol provided by this
layer is called the Turkish Coffee Protocol (TCp). The name is probably an
inspiration based on Java.

The third layer is the MaROS Application Support Layer (MASL). This layer is very
rich in agents. There are four agents in this layer: Object Manager, Notification
Agent, Migration Agent, and Recovery Agent. These agents provide many services to
the MaROS applications. Even the Communication Agent uses some of these
services. Object specific services such as object creation, deletion, notification,
relocation and system recovery are supported in this layer.

The MaROS applications reside in the uppermost layer. A MaROS application is a
Java application that may use the MaROS services provided by the lower layers.

3.4 The System Agents
The software agent concept is one of the latest programming techniques in the computing
world. Software agents are software modules with cognitive abilities such as
motivation, goal processing, reasoning and autonomy [10]. They are capable of
learning and of acting independently of the user to achieve a given goal [11]. In the current
implementation of MaROS, the system agents are MaROS applications that are
responsible for performing the given tasks and providing services to the applications
independently of the user. They do not have artificial intelligence.

There are five system agents in MaROS: Communication Agent (CA), Object
Manager (OM), Notification Agent (NA), Migration Agent (MA) and Recovery
Agent (RA). In the current design, the MSP does not support recovery. Therefore, the
Recovery Agent is available only on MHs. The other system agents
have both MH and MSP versions.

3.4.1 Communication Agent
In MaROS, the Communication Agent is the system agent responsible for
the entire communication backbone. It uses a new communication protocol called TCp
(Turkish Coffee Protocol) that supports disconnected operations and virtual
connections. Some crucial tasks of the Communication Agent are handling
disconnected operations, providing non-blocking primitives, and re-establishing
existing connections after voluntary shutdowns.

3.4.2 Object Manager
The Object Manager is the system agent responsible for all object-specific
operations. It maintains the Object Table, which keeps information about all objects in the
local system [12]. It is the creator of the other system agents. It interacts with the other
system agents to support notification, migration and recovery operations. Another task
of the Object Manager is to provide system-wide unique identification for all
objects.

3.4.3 Notification Agent
When a relocatable object¹ is to be created or deleted, the MSP site, which keeps an
exact copy of the object, has to be informed. This process, called notification, is
necessary for object synchronization between the MH and the MSP. In the
object creation phase, the copy of the object is automatically created at the MSP site.
The Notification Agent uses notifier threads for handling notification requests.
The notification process is in fact a preparation for the relocation process. Chapter 4
explains the Notification Agent in detail.

3.4.4 Migration Agent
The agent that handles object relocation requests is the Migration Agent. Since
the Notification Agent automatically creates copies of the relocatable objects at the
MSP site, the Migration Agent deals with transferring the parameters, running the
object at the MSP site, and retrieving the results when the object execution ends. The
logical structure of the Migration Agent is very similar to that of the Notification Agent. It uses
migrator objects to handle the relocation requests [13].

3.4.5 Recovery Agent
The mobile hosts may not run forever. They need to be shut down periodically. The
task of the Recovery Agent is to recover the system after voluntary shutdowns. It
detects the shutdown request and coordinates the shutdown process by managing a
table called the Recovery Table (RT). This agent is explained in Chapter 5.



¹ Types of objects are explained in Section 3.6.1.
3.5 Host Registration and Authentication Protocol
The system security is one of the most important parts of a system. The system should
be protected from unauthorized access. MaROS tries to provide the system security
by registering its users. After the registration process, the system may easily identify
and authenticate its clients.

The Host Identification Table (HIT) keeps the information of registered users. If a
mobile host user wants to use the MaROS environment, s/he supplies some information
to the administrator of the MSP. This information is recorded in the HIT, and the user is
given a password for further authentication [12].

NAME      ID           AT          MT          OI              TI           P
gurhan    00:27:45:10  30/08/1998  30/08/1998  Yeditepe Univ.  Compaq35 NB  xxxx
mandrake  00:26:40:12  01/09/1998  01/09/1998  ABC Company     IBM 350L     xxxx

Table 3.1: An instance of the Host Identification Table (HIT).

The Host Identification Table contains seven fields:
1. NAME: The name of the mobile host.
2. ID: Network interface number (It is a worldwide unique identifier)
3. AT: Time of the host registration (Addition Time).
4. MT: Time of modification.
5. OI: Mobile host owner information.
6. TI: Technical details of the mobile host.
7. P: Encrypted password of the mobile host.

The authentication is done at the startup by the Authentication Object (AO). In order
to join the MaROS environment, the Authentication Object at the mobile host sends a
data packet that contains its NAME, ID and Password to its peer at the MSP site. The
Authentication Object at the MSP site searches the Host Identification Table for the
ID of the mobile host. If the ID is found in the table, the password is checked. If the
ID is not found in the table or the password is incorrect, the authentication request is
discarded. Otherwise, the host is accepted to the MaROS environment by sending a
positive acknowledgement. The host authentication process is depicted in Figure 3.3.
AO: Authentication Object
HIT: Host Identification Table

Figure 3.3: The host authentication process.
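
The lookup performed by the Authentication Object at the MSP site can be sketched in Java as follows. The class and method names are hypothetical, only the NAME and P fields of Table 3.1 are modelled, and comparing the stored and received passwords in their encrypted form is an assumption, since the text does not say how the check is carried out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the MSP-side check; field names follow Table 3.1.
class HostIdentificationTable {
    static final class Entry {
        final String name;              // NAME
        final String encryptedPassword; // P
        Entry(String name, String encryptedPassword) {
            this.name = name;
            this.encryptedPassword = encryptedPassword;
        }
    }

    // Keyed by ID, the network interface number of the mobile host.
    private final Map<String, Entry> entries = new HashMap<>();

    // Registration is done by the MSP administrator.
    void register(String id, String name, String encryptedPassword) {
        entries.put(id, new Entry(name, encryptedPassword));
    }

    // Returns true (positive acknowledgement) only if the ID is registered and the
    // password matches; in every other case the request is discarded.
    // The name travels in the packet, but the described lookup uses only the ID.
    boolean authenticate(String name, String id, String encryptedPassword) {
        Entry e = entries.get(id);
        if (e == null) return false;
        return e.encryptedPassword.equals(encryptedPassword);
    }
}
```

A production check would normally compare password digests in constant time; the plain equals() call is kept here only for brevity.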

3.6 MaROS Objects
An object can be defined as a collection comprising a data structure and a set of
operations on this data structure [9]. In MaROS, an object is a program, or a part of one,
that can be executed in the MaROS environment.
3.6.1 Object Types
The MaROS environment provides a major opportunity for its objects: the relocation
process. However, some of the objects do not need relocation capability.
Therefore, two types of objects are available in MaROS: ordinary and relocatable.

The type of an object must be decided at the time of its creation. After the creation
of an object, it is not possible to change its type. Ordinary objects do not have
relocation capability, and they may only be run at the host on which they were created.
Relocatable objects, on the other hand, are automatically transferred to the MSP
site via the notification process. After a successful notification process, a relocatable
object may be run either at the MH or at the MSP. However, there is a restriction on
relocatable objects: once they start to execute, they may not change their host.
Since each site has a copy of the object, a further object-internal attribute determines
which copy of a relocatable object may start executing: the object mode.

3.6.2 Object Modes
A relocatable object has two copies, one at each site. It is not possible to run both at the
same time in the system. A relocatable object has two modes: active and
passive. Only the copy that is in active mode may be run. Ordinary objects have
only one mode, which is always active.

An object may be activated in one of two possible ways:
• By using the activate() call: Ordinary objects and relocatable objects that do not need
relocation are activated by using this call. They are run in the environment where
they were created.
• By using the relocate() call: Only relocatable objects that have been notified may use
this call. If this call is used, they are run at the MSP site. The mode of the object
at the mobile host is set to passive, and the mode of the copy located at the
MSP site is set to active.
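
The effect of the two calls on the modes of the two copies can be sketched as below. Only activate() and relocate() are names taken from the text; the class RelocatablePair, the markNotified() helper and the choice to model both copies as passive before execution starts are assumptions made for illustration.

```java
// Hypothetical sketch of the two copies of a relocatable object and their modes.
class RelocatablePair {
    enum Mode { ACTIVE, PASSIVE }

    // Assumption: before execution starts, neither copy has been chosen to run,
    // so both are modelled as PASSIVE.
    private Mode mhMode = Mode.PASSIVE;
    private Mode mspMode = Mode.PASSIVE;
    private boolean notified = false;
    private boolean started = false;

    // Called once the notification process has copied the object to the MSP.
    void markNotified() { notified = true; }

    // activate(): run the object where it was created (the MH).
    void activate() {
        ensureNotStarted();
        mhMode = Mode.ACTIVE;
        mspMode = Mode.PASSIVE;
        started = true;
    }

    // relocate(): run the copy at the MSP; allowed only after notification.
    void relocate() {
        ensureNotStarted();
        if (!notified) throw new IllegalStateException("object has not been notified");
        mhMode = Mode.PASSIVE;
        mspMode = Mode.ACTIVE;
        started = true;
    }

    Mode mhMode() { return mhMode; }
    Mode mspMode() { return mspMode; }

    // Once an object has started to execute, it may not change its host.
    private void ensureNotStarted() {
        if (started) throw new IllegalStateException("object already executing");
    }
}
```

The sketch enforces the two restrictions stated in the text: at most one copy is active, and the host cannot change after execution has begun.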

3.6.3 Object States
Every object has a state concept. For instance, a student may be in the studying state.
A worker may be in the working state. The MaROS objects may be found in one of
the nine possible states:
• created: This state is the initial state of ordinary objects.
• created_notnotified: This state is the initial state of relocatable objects. The
objects in this state can be activated on the MH. However, the relocate()
primitive may not be used in this state.
• created_notified: After a successful object notification phase, the state of the
object is set to created_notified. By the use of the relocate() primitive, that object
may be run at the MSP site.
• ready: When an object is activated, its state becomes ready. The objects in the
ready state may be suspended or deleted. If the object is a relocatable object, it
cannot be relocated once it is activated.
• sleeping: If the execution of an object is suspended temporarily, its state becomes
sleeping. In this state, the object can be resumed (activated again) or deleted.
• relocating: The relocation process transfers all the input parameters of a
relocatable object to the MSP site. Until the end of the transfer operation, the
state of the object is relocating. The next state may be either relocated or
deleted_notnotified.
• relocated: After the transfer of all the input parameters of the object, the state of
the object is set to relocated. The next state may only be deleted_notnotified.
• finished: This state is special to the relocatable objects. If a relocatable object on
the MSP finishes its execution (not being deleted by a system call), its output
values must be transferred back to the mobile host. During the output transfer
process, the state of the object is set to finished.
• deleted_notnotified: After the end of the output transfer operation or a delete
request from the parent object, the object changes its state to deleted_notnotified.
This state lasts until the end of the successful notification for the object deletion
process, and ends up with the death of the object.
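
The nine states and the transitions named in the list above can be encoded as a small transition table. This is a sketch rather than the MaROS implementation: the class name is hypothetical, "deleted" is assumed to mean entering deleted_notnotified, and the ready-to-finished transition for the MSP copy is an assumption because the text does not name the predecessor of finished.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding of the object states and the transitions listed in the text.
class ObjectStateMachine {
    enum State {
        CREATED, CREATED_NOTNOTIFIED, CREATED_NOTIFIED, READY, SLEEPING,
        RELOCATING, RELOCATED, FINISHED, DELETED_NOTNOTIFIED
    }

    private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);
    static {
        NEXT.put(State.CREATED, EnumSet.of(State.READY));
        // A relocatable object may be activated locally or become notified.
        NEXT.put(State.CREATED_NOTNOTIFIED, EnumSet.of(State.CREATED_NOTIFIED, State.READY));
        // Only a notified object may start relocating.
        NEXT.put(State.CREATED_NOTIFIED, EnumSet.of(State.READY, State.RELOCATING));
        // READY -> FINISHED is an assumed predecessor for the MSP copy.
        NEXT.put(State.READY, EnumSet.of(State.SLEEPING, State.FINISHED, State.DELETED_NOTNOTIFIED));
        NEXT.put(State.SLEEPING, EnumSet.of(State.READY, State.DELETED_NOTNOTIFIED));
        NEXT.put(State.RELOCATING, EnumSet.of(State.RELOCATED, State.DELETED_NOTNOTIFIED));
        NEXT.put(State.RELOCATED, EnumSet.of(State.DELETED_NOTNOTIFIED));
        NEXT.put(State.FINISHED, EnumSet.of(State.DELETED_NOTNOTIFIED));
        // The object dies after the deletion has been notified; no further state.
        NEXT.put(State.DELETED_NOTNOTIFIED, EnumSet.noneOf(State.class));
    }

    // Returns true if the table allows moving directly from one state to the other.
    static boolean canMove(State from, State to) {
        return NEXT.get(from).contains(to);
    }
}
```

Keeping the transitions in one table makes rules such as "an activated object cannot be relocated" a matter of a missing entry rather than scattered checks.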

It is not possible to create relocatable objects at the MSP site. That means some of the
states are not used at the MSP site, whereas some of them are not used at the MH site.
Table 3.2 shows all the possible states of a MaROS object depending on its type and
location. More detailed information can be found in [3].

Object States          Relocatable        Ordinary
                       MH       MSP       MH       MSP
ready
sleeping
created
created_notnotified
created_notified
relocating
relocated
finished
deleted_notnotified

Table 3.2: Object States.
3.7 Communication Structure
The communication design is one of the most important parts of a system like
MaROS. The performance of the system heavily depends on its communication
infrastructure.

Mobile systems mostly operate in a voluntarily disconnected state. Since
current communication primitives are designed for fixed networks, they easily
block a mobile host if there is no connection. In MaROS, a new communication
protocol, the Turkish Coffee Protocol (TCp), is designed to overcome the problems of
the mobile hosts. It runs on top of UDP and provides virtual ports, non-blocking
primitives and message queues to support reliable disconnected operation.

The TCp protocol uses two system objects to manage queues and virtual ports: the
Communication Agent (CA) and the Virtual Port Mapper (VPM).

The CA is the creator and the manager of the outgoing message queue (send queue).
This queue contains the messages that are to be sent to the remote host. The use of
queues prevents the loss of messages in case the mobile host is disconnected from the
system. The other system agents and user programs may get information about the
connection status via the CA.

The VPM is the controller of the virtual ports. The VPM has two main tasks:
1. Virtual Port Assignment: The VPM maps
virtual ports to physical ports. When an object requests a port, the VPM allocates
a physical port. Then, it maps the port to a virtual port in its table and returns the
virtual port number to the object. The requesting object may
ask for any virtual port or for a specific virtual port. The VPM provides services for
both types of requests.
2. Virtual Port Reservation: An object may need to reserve a port for its
subobjects. System agents use this service frequently to save time.
The object reserves a virtual port and is given a key for accessing that port later.
Then that object, or any object that knows the port number and the key, may use the
reserved port. Section 4.3.2 explains the process in more detail.
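
The two VPM tasks can be sketched in Java as follows. The class is not the MaROS implementation: the method names, the sequential key values and the simulated physical port numbers starting at 5000 are all assumptions, and a real mapper would obtain physical ports from the operating system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Virtual Port Mapper; names and key format are assumptions.
class VirtualPortMapper {
    static final class Reservation {
        final int virtualPort;
        final long key;
        Reservation(int virtualPort, long key) {
            this.virtualPort = virtualPort;
            this.key = key;
        }
    }

    private final Map<Integer, Integer> virtualToPhysical = new HashMap<>();
    private final Map<Integer, Long> reserved = new HashMap<>();
    private int nextVirtual = 1;
    private int nextPhysical = 5000; // simulated physical ports
    private long nextKey = 1;        // sequential keys, for illustration only

    // Assignment, "any port" form: pick a free virtual port and map it.
    int assignAny() { return assign(freeVirtualPort()); }

    // Assignment of a specific virtual port requested by the object.
    int assign(int virtualPort) {
        if (inUse(virtualPort))
            throw new IllegalStateException("virtual port in use: " + virtualPort);
        virtualToPhysical.put(virtualPort, nextPhysical++);
        return virtualPort;
    }

    // Reservation: set a virtual port aside and hand out a key for later access.
    Reservation reserve() {
        int port = freeVirtualPort();
        long key = nextKey++;
        reserved.put(port, key);
        return new Reservation(port, key);
    }

    // Any object that knows both the port number and the key may claim the port.
    int useReserved(int virtualPort, long key) {
        Long expected = reserved.get(virtualPort);
        if (expected == null || expected.longValue() != key)
            throw new IllegalArgumentException("unknown reservation or wrong key");
        reserved.remove(virtualPort);
        return assign(virtualPort);
    }

    // Returns the mapped physical port, or null if the virtual port is not assigned.
    Integer physicalPortOf(int virtualPort) { return virtualToPhysical.get(virtualPort); }

    private boolean inUse(int port) {
        return virtualToPhysical.containsKey(port) || reserved.containsKey(port);
    }

    private int freeVirtualPort() {
        while (inUse(nextVirtual)) nextVirtual++;
        return nextVirtual;
    }
}
```

Because a reservation fixes the port number before any object is bound to it, an agent can announce the port to a peer first and create the object that will use it afterwards.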

3.8 System Recovery
The mobile hosts have a limited power supply. They have to be shut down voluntarily
when their batteries need recharging. Because of this limitation, a mobile host
user is reluctant to run a long-running process. However, when a system shutdown is
necessary, a program may signal all the programs in the system to create their
recovery files. In MaROS, this is exactly the task of the Recovery Agent. It
detects and coordinates a shutdown process. More detailed information may be found
in Chapter 5.
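
The idea of signalling every program to save its recovery data can be sketched as a small coordinator. This is not the Recovery Agent of MaROS, whose design is given in Chapter 5; the class ShutdownCoordinator, the Recoverable interface and the in-memory map that stands in for the recovery files are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: on a voluntary shutdown request, every registered
// program is asked for the data it needs in order to resume later.
class ShutdownCoordinator {
    interface Recoverable {
        String name();     // identifies the program
        String snapshot(); // the data that would be written to its recovery file
    }

    private final List<Recoverable> programs = new ArrayList<>();

    void register(Recoverable program) { programs.add(program); }

    // Collects one snapshot per program, in registration order.
    // The returned map stands in for the set of recovery files.
    Map<String, String> onShutdownRequest() {
        Map<String, String> recoveryData = new LinkedHashMap<>();
        for (Recoverable p : programs) {
            recoveryData.put(p.name(), p.snapshot());
        }
        return recoveryData;
    }
}
```

On a JVM, such a coordinator could be triggered from a thread registered with Runtime.getRuntime().addShutdownHook, although this sketch does not assume that MaROS detects shutdown requests that way.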


4

Notification Design


4.1 Introduction

This chapter briefly explains the design of the Notification process of the MaROS
system. In MaROS, there are two sites that need to communicate with each other: the
Mobile Host and the Mobile Host Service Provider. The user at the MH may need to
transfer his/her MaROS programs (objects) to the MSP site, and may want to delete
them after running those programs and retrieving the results. The transfer operation
creates copies of the chosen objects at the MSP site, and the deletion operation deletes
those copies. In both operations, the MSP site is said to be notified. A special
system agent is responsible for the Object Transfer and Object Deletion
operations. This agent is called the Notification Agent, and the term Notification
covers both Object Transfer and Object Deletion.

4.2 Notification in General

A relocatable object may only be created at the MH site and may only be transferred
to the MSP. The initial state of a relocatable object is created_notnotified. In this state,
a MaROS user has two possible choices: 1) running the object at the MH site, or 2)
running the object at the MSP site. If the first choice is selected, the user may run the
object whenever necessary. However, once the object is running, the user may not
interrupt its execution and relocate it to the MSP site (in the current design and
implementation, the migration of running objects is not supported). If the user
selects the second choice, he/she should wait for the object state to change to
created_notified. When a relocatable object is created, this state change is
automatically initiated by MaROS. The Notification Agent is responsible for the
transfer of the Java class files of the relocatable object to the MSP site. Once the
files are transferred, the object may also be run at the MSP site. The Notification
Agent receives Notification requests² directly from the Handler (the worker thread of
the Object Manager), and informs it whether or not the transfer or deletion job was
successful. If the notification job completes successfully and it is an object transfer
process, the OM changes the state of the relocatable object to created_notified.
In order to run the object at the MSP site, the relocate() primitive is used. The relocate()
primitive activates the Migration Agent, and the MA transfers the command-line
parameters that are necessary to run the object to the MSP site.


² A Notification request is either an object transfer or an object deletion request.

4.3 Detailed Design

The NA uses the reliable communication primitives of the Turkish Coffee Protocol (TCp). It
receives information directly from the Handler thread of the local Object Manager.
This information contains the type of the task (the NA may handle more than
one kind of task). After a successful transfer of the object, the user may want to delete the
object. It is the NA's responsibility to delete the object-specific files after the arrival
of a delete request.

The Notification Agent is a very busy agent, and it should be available whenever it is
needed. Transferring and deleting objects are very time-consuming tasks, and they
require extensive work. If the request is a transfer or delete request, the NA immediately
reserves a virtual port from the Virtual Port Mapping service of the Communication
Agent and creates a Notifier. (If the packet is invalid, the NA does nothing and
ignores it.) The Notifier is a system object that is responsible for
transfer or delete requests. When creating the Notifier, the NA passes all the
necessary information to the Notifier as parameters. This information contains the
reserved virtual port number, the type of the task, the Object ID and name, and the Java class file
names to be transferred (if the job type is object transfer). The Notifier uses the reserved
virtual port for communicating with other system objects. There may be more than
one Notifier running at a time. Each of them is responsible for one specific transfer or
delete request.
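
The dispatching role of the Notification Agent can be sketched as follows. All names are hypothetical, the port counter merely stands in for a reservation obtained from the Virtual Port Mapper, and the validity rule (a transfer must name at least one class file) is an assumption, since the text only says that invalid packets are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the NA creating one Notifier per valid request.
class NotificationAgentSketch {
    enum Task { TRANSFER, DELETE }

    static final class Request {
        final Task task;
        final String objectId;
        final String objectName;
        final List<String> classFiles; // used only for TRANSFER
        Request(Task task, String objectId, String objectName, List<String> classFiles) {
            this.task = task;
            this.objectId = objectId;
            this.objectName = objectName;
            this.classFiles = classFiles;
        }
    }

    // A Notifier carries everything it needs for exactly one request.
    static final class Notifier {
        final int virtualPort;
        final Request request;
        Notifier(int virtualPort, Request request) {
            this.virtualPort = virtualPort;
            this.request = request;
        }
    }

    private int nextPort = 1; // stands in for a port reserved from the VPM
    private final List<Notifier> running = new ArrayList<>();

    // Returns the Notifier created for the request, or null if the packet is ignored.
    Notifier handle(Request r) {
        if (!isValid(r)) return null;
        Notifier n = new Notifier(nextPort++, r);
        running.add(n);
        return n;
    }

    int runningCount() { return running.size(); }

    // Assumed validity rule: task and object ID present; a transfer names a class file.
    private static boolean isValid(Request r) {
        if (r == null || r.task == null || r.objectId == null) return false;
        return r.task != Task.TRANSFER || (r.classFiles != null && !r.classFiles.isEmpty());
    }
}
```

In the thesis the Notifiers run as threads; the sketch leaves execution out and shows only that the agent returns to waiting as soon as a Notifier has been created.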

4.3.1 Peer Entities

Four of the system agents are located on both the MSP and the MH. For instance,
the Notification Agent at the MH has a peer at the MSP site. In the rest of the text,
NA_MH refers to the Notification Agent at the Mobile Host, and NA_MSP refers to the
Notification Agent at the MSP site. When the NA_MH is creating its Notifier, it also
informs its peer of the new incoming request. As depicted in Figure 4.1, the NA_MSP
reserves a virtual port number and creates a peer Notifier at the MSP site. When
creating the Notifier, the NA_MSP gives it all the necessary information about the notification
request and its peer Notifier, which is located at the MH site. This information
contains the virtual port number of the Notifier_MH, the type of the notification request,
the Internet address of the Mobile Host, the reserved virtual port number and the virtual
port key for using that port.
O: MaROS object
OM: Object Manager
H: Handler
NA: Notification Agent
N: Notifier
VPM: Virtual Port Mapper

Figure 4.1: Initial phase of a Notification process

Figure 4.1 shows the initial phase of the Notification process. At this stage, both the
NA_MH and the NA_MSP assign the given task to the Notifier peers and start waiting for
new requests. In the figure, the numbers on the arrows indicate the order of events.

4.3.2 Reserving Ports

The Notification process is a time-critical process, and everything should be
organized efficiently. The first-priority job of the NA_MH is to inform its peer
(NA_MSP) as soon as possible. Any latency in this step delays the Notification
process. There are two design choices in the initial phase of the Notification process. The
first approach is simpler than the second one; however, it is not optimal. Figure
4.2 illustrates the first approach.
Legend: t_nc: time of Notifier creation at MH; t_nc': time of Notifier creation at MSP;
t_i: time for informing peer NA; t_m: time for informing OM_MSP

Figure 4.2: A Notification design approach (Initial Phase).

In this approach, the NA_MH creates its Notifier before informing its peer. The NA_MH
and the NA_MSP run on different machines and could carry on their work in parallel, yet
this approach imposes sequential execution: the NA_MSP sits idle until the Notifier_MH
informs it. In Figure 4.2, t_nc denotes the time for the Notifier creation, t_i denotes
the time for informing the NA_MSP, t_nc' denotes the time for the Notifier_MSP
creation, and t_m denotes the time for informing the OM_MSP.

Figure 4.3 illustrates the second and currently used approach. The NA_MH immediately
informs the NA_MSP. In this case, the two agents work in parallel, which yields shorter
Notification times than the first approach. The two approaches are compared in
Figure 4.4.
Legend: t_nc: time of Notifier creation at MH; t_nc': time of Notifier creation at MSP;
t_i: time for informing peer NA; t_m: time for informing OM_MSP

Figure 4.3: Current design of the Notification process (Initial Phase).

In the second approach, however, the NA_MH has to inform its peer about its
Notifier_MH, which is not created yet. The NA_MH must therefore know the virtual port
number of its Notifier_MH before that Notifier_MH is created. The same holds at the MSP
site for the NA_MSP. In effect, the NA_MH cannot inform its peer before creating its
Notifier, and it cannot create its Notifier before informing its peer. This deadlock is
resolved by a service called Virtual Port Reservation (VPR), provided by the Virtual
Port Mapper (VPM) system object of the communication layer. The VPR is not an
NA-specific service: it is also used by the Handler objects of the Object Manager, and
it may be used by any other user program. The VPR enables the NA to reserve port
numbers. The NA requests a reservation and receives the port number together with a key
for using that reserved port. The key provides security in the reservation process:
only the Notifier holding this key may obtain the reserved port. Consequently, the
Notifier uses a different method when creating a connection; it passes the virtual port
number and the virtual port key to the VPM to obtain the reserved port. The reservation
adds a small amount of time to the Notification process, shown in Figure 4.4 as t_pr
(the time for virtual port reservation).
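
The reserve-then-claim idea behind the VPR can be illustrated with a minimal Java
sketch. The class and method names (VirtualPortReservation, reserve, claim), the key
generation and the starting port number are assumptions made for illustration only;
they are not the actual MaROS VPM interface.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the reservation service of the Virtual Port Mapper. */
class VirtualPortReservation {
    /** A reserved port number together with the key needed to claim it. */
    static final class Reservation {
        final int port;
        final long key;
        Reservation(int port, long key) { this.port = port; this.key = key; }
    }

    private final Map<Integer, Long> reserved = new HashMap<>(); // port -> key
    private final SecureRandom random = new SecureRandom();
    private int nextPort = 1000; // arbitrary starting number for this sketch

    /** Called by an agent (e.g. the NA) before its Notifier exists. */
    synchronized Reservation reserve() {
        int port = nextPort++;
        long key = random.nextLong();
        reserved.put(port, key);
        return new Reservation(port, key);
    }

    /** Called later by the Notifier; succeeds once, and only with the matching key. */
    synchronized boolean claim(int port, long key) {
        Long expected = reserved.get(port);
        if (expected == null || expected.longValue() != key) {
            return false; // unknown port or wrong key: the reservation stays in place
        }
        reserved.remove(port);
        return true;
    }
}
```

In this sketch the agent would call reserve(), place the port number in the message for
its peer, and hand both values to the Notifier it creates afterwards; the Notifier then
calls claim() when it opens its connection.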



Legend: t_nc: time of Notifier creation at MH; t_nc': time of Notifier creation at MSP;
t_i: time for informing peer NA; t_m: time for informing OM_MSP; t_pr: time for port
reservation

Figure 4.4: The comparison of two approaches.
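
The comparison can also be expressed numerically. The sketch below is only one
possible reading of the two timelines: it assumes that the first approach simply sums
the four delays, and that in the second approach each side pays one reservation delay
(t_pr) and the two sides then overlap after the peer is informed. The class name, the
method names and the sample values are hypothetical and are not measurements from
MaROS.

```java
/** Hypothetical timing model for the initial phase of a Notification. */
class NotificationTiming {
    /** First approach: every step waits for the previous one. */
    static long sequential(long tNc, long tI, long tNcMsp, long tM) {
        return tNc + tI + tNcMsp + tM;
    }

    /**
     * Second approach under the stated assumptions: the MH reserves a port and
     * informs its peer, then both sides proceed at the same time.
     */
    static long parallel(long tPr, long tNc, long tI, long tNcMsp, long tM) {
        long mhSide = tNc;                  // MH creates its Notifier
        long mspSide = tPr + tNcMsp + tM;   // MSP reserves, creates, informs OM_MSP
        return tPr + tI + Math.max(mhSide, mspSide);
    }
}
```

With illustrative values t_nc = 40, t_i = 20, t_nc' = 40, t_m = 10 and t_pr = 2 (in
arbitrary time units), sequential(...) gives 110 and parallel(...) gives 74, so the
small reservation cost is outweighed by the overlap.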
4.3.3 Notifier Tables

Notifiers are the worker threads of the Notification Agent. Five different tables are
held and used by Notifiers: the Notifier Object Transfer Table (NOTT), the Partial
Object Transfer Table (POTT), the Class Dependency Table (CDT), the Class Replica Table
(CRT), and the Notifier Information Table (NIT).

4.3.3.1 Notifier Object Transfer Table (NOTT)

This table holds the names, with full paths, of the class files to be transferred,
together with the lengths of these files. The NOTT is always used while an object
transfer is in progress.

4.3.3.2 Partial Object Transfer Table (POTT)

The POTT is used when a partial transfer operation is in progress. An object may be
only partially transferred to the MSP site because of factors such as system shutdowns
or link failures. When an object transfer is interrupted, retransferring all the
already-transferred files at the next startup is not ideal; it wastes time and system
resources. MaROS optimizes the transfer operation by keeping an additional table called
the POTT. The POTT keeps the NOTT indices of the partial files and, for each of them,
the current length of the partial file. This information is used when transferring the
partial files.

NOTT
idx   Filename with full path   File length
1     /MaROS/test.class         17000
2     /MaROS/sample.class       25000
3     /MaROS/rect.class         1007

POTT
idx   NOTT index   Current length
1     1            4096
2     2            8192
3     3            0

Table 4.1: Sample NOTT and POTT instances.
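
How the two tables relate can be sketched in Java as follows. The class and field
names are hypothetical, and the sketch assumes that a partial file is resumed from its
current length rather than resent from the start, which the text does not state
explicitly.

```java
import java.util.List;

/** Hypothetical in-memory form of the NOTT and POTT rows of Table 4.1. */
class TransferTables {
    /** One NOTT row: a class file and its full length. */
    static final class NottEntry {
        final String path;
        final long length;
        NottEntry(String path, long length) { this.path = path; this.length = length; }
    }

    /** One POTT row: a 1-based NOTT index and the length already at the MSP. */
    static final class PottEntry {
        final int nottIndex;
        final long currentLength;
        PottEntry(int nottIndex, long currentLength) {
            this.nottIndex = nottIndex;
            this.currentLength = currentLength;
        }
    }

    /** Bytes still missing at the MSP for the file a POTT row refers to. */
    static long remaining(List<NottEntry> nott, PottEntry partial) {
        NottEntry file = nott.get(partial.nottIndex - 1); // table indices start at 1
        return file.length - partial.currentLength;
    }
}
```

For the sample instances in Table 4.1, the remaining lengths would be 12904, 16808 and
1007 bytes for the three files.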

4.3.3.3 Class Dependency Table (CDT) and Class Replica Table (CRT)

The CDT and its sub-table, the CRT, are used only by the Notifiers on the MSP.
Figure 4.5 depicts the scope of all tables used by the Notification Agent and its
Notifiers. Each NOTT and each POTT is created and used by only one Notifier, and, as
the figure shows, each has a copy at the peer Notifier.

Figure 4.5: Scope of the Notifier tables.

On the other hand, the NIT, the CDT and the CRT are global tables used and updated by
all Notifiers. For this reason, access to these tables must be synchronized to avoid
the classic readers-writers problem.

CDT
OID   # of Classes   File 1   File 2   File 3
28    3              1        2        3
29    2              2        3

CRT
idx   Class Name             # of occurrences
1     /gurhan/Circle.class   1
2     /gurhan/Rect.class     2
3     /gurhan/Test.class     2

Table 4.2: Sample CDT and CRT instances.

The sample tables above illustrate the task of each table. The CDT keeps the object
records: it knows which object has which classes, and it refers to the CRT for the
class names. For example, in the sample CDT the object Circle has three classes, which
are referenced as the 1st, 2nd, and 3rd positions in the CRT. A class file may be used
by more than one MaROS object, so the CRT records how many objects are using each
class. From the sample CRT, it is seen that class Rect and class Test are used by two
objects, whereas class Circle is used by only one object. Moreover, all these classes
were uploaded from the mobile host named gurhan.

If an object has to be deleted, the CDT is searched first and all its dependent classes
are located in the CRT. The occurrence number of each of these classes is decremented
by 1 in the CRT. If the occurrence number of a class file drops below 1, the class file
may be physically deleted, since no other object is using that class. (Another design
preference is to keep the class file, since a new object may need it very soon. In that
case, however, a ttl (time-to-live) number may need to be attached to that record; the
CRT is then checked periodically, and when a ttl reaches zero the corresponding class
file is deleted.)
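
The deletion rule described above amounts to reference counting. The following Java
sketch shows the idea with two synchronized maps standing in for the CDT and the CRT;
the class and method names are hypothetical, the CRT is keyed by class name instead of
by index for brevity, and the ttl variant is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the global CDT and CRT kept at the MSP. */
class ClassTables {
    private final Map<Integer, List<String>> cdt = new HashMap<>(); // OID -> class files
    private final Map<String, Integer> crt = new HashMap<>();       // class file -> count

    /** Records a new object and increments the count of each class it uses. */
    synchronized void addObject(int oid, List<String> classFiles) {
        cdt.put(oid, new ArrayList<>(classFiles));
        for (String file : classFiles) {
            crt.merge(file, 1, Integer::sum);
        }
    }

    /** Removes the object and returns the class files no other object still uses. */
    synchronized List<String> deleteObject(int oid) {
        List<String> deletable = new ArrayList<>();
        List<String> classFiles = cdt.remove(oid);
        if (classFiles == null) {
            return deletable; // unknown object: nothing to do
        }
        for (String file : classFiles) {
            int left = crt.merge(file, -1, Integer::sum);
            if (left < 1) {
                crt.remove(file);
                deletable.add(file);
            }
        }
        return deletable;
    }

    /** Current occurrence number of a class file (0 if it is not in the CRT). */
    synchronized int occurrences(String file) {
        return crt.getOrDefault(file, 0);
    }
}
```

With the sample data of Table 4.2, deleting object 28 would leave Rect and Test with
one occurrence each and report only /gurhan/Circle.class as physically deletable.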

4.3.3.4 Notifier Information Table (NIT)

As its name implies, this table holds information about all Notifiers. There are two
versions of the NIT: the MH version and the MSP version. Each Notifier records itself
in the table at the beginning of its execution and deletes its record at the end of its
execution. Like the CDT and the CRT, the NIT is a synchronized table. The NIT_MH
consists of three fields: the Object ID of the Notifier, the Object ID of the object
that the Notifier deals with, and the notification type, which may be Creation or
Deletion. The NIT_MSP holds the Mobile Host Identifier (MID) as an additional field,
which identifies which objects belong to which Mobile Hosts. This table is mainly used
by the Object Deletion process (Section 4.3.5).

NIT_MSP
Notifier OID   OID   MID           Type of Notification
3090           980   00000000001   C
3095           993   00000000001   C
3110           765   00000000001   D

NIT_MH
Notifier OID   OID   Type of Notification
1050           980   C
1067           993   C
1056           765   D

Table 4.3: Sample NIT instances.

4.3.4 Object Transfer (Object Creation)

The object transfer process copies all the Java class files of the object to the MSP
site. It is a unidirectional process: object transfer is only possible from the MH to
the MSP. The process is initiated automatically by the Object Manager when an object is
created as relocatable. Figure 4.6 shows the entire object transfer scenario.

Figure 4.6: Object transfer (creation) process

The Object Manager creates a Handler (3), and the Handler at the Mobile Host (H_MH)
sends a Notification request to the Notification Agent (4). The message format of the
request is as follows:


pnum | rt | oid | path | ncf | cfinfo | opts

Figure 4.7: Message format of Notification request.

This message is sent from the H_MH to the NA_MH. The first field, pnum, contains the
virtual port number of the H_MH; the Notifier_MH uses it to connect to the H_MH. The
second field, rt, is the request type. As indicated before, a Notification request may
be either a Create (Transfer) or a Delete request. The third field, oid, contains the
object identifier of the object that will be notified. The next three fields are used
only when the request is an Object Creation request. The path field contains the Java
CLASSPATH of the object. The ncf field contains the number of class files to be
transferred. The cfinfo field holds the information about the class files; Figure 4.8
shows this part of the message in detail. The final field, opts, is reserved for future
use: it is included in the message format to hold any options that may be added to the
Notification process in the future.

name | len | cdt | owner | opts # ... #

Figure 4.8: cfinfo field in detail.

The cfinfo field of the Notification request message contains information about the
class files that the object has. Each file record has five fields, and file records are
separated with # signs. The first field contains the name of the class file; this is
combined with the path value to locate the class file. The next field is the length of
the file. The third field contains the creation date and time of the file, which is
planned to be used to detect different versions of the objects. The fourth field holds
the owner of that file in the system. In the current design, there may be only one user
running MaROS on an MH, and this field is set to the user maros. The last field is
again reserved for future use and is currently null.
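
As an illustration of the record layout, the following Java sketch parses a cfinfo
string into file records. Only the # separator between records comes from the text; the
comma used between the fields of a record, the date format in the usage example and all
class and method names are assumptions made for this sketch, since the wire encoding of
the individual fields is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the cfinfo part of a Notification request. */
class CfInfo {
    /** One class file record: name, len, cdt, owner, opts. */
    static final class FileRecord {
        final String name;
        final long len;
        final String cdt;   // creation date and time, kept as text
        final String owner;
        final String opts;  // reserved, currently empty
        FileRecord(String name, long len, String cdt, String owner, String opts) {
            this.name = name;
            this.len = len;
            this.cdt = cdt;
            this.owner = owner;
            this.opts = opts;
        }
    }

    /** Splits records on '#' and fields on ',' (the comma is an assumption). */
    static List<FileRecord> parse(String cfinfo) {
        List<FileRecord> records = new ArrayList<>();
        for (String record : cfinfo.split("#")) {
            if (record.isEmpty()) {
                continue;
            }
            String[] f = record.split(",", -1); // -1 keeps an empty trailing opts field
            if (f.length != 5) {
                throw new IllegalArgumentException("malformed file record: " + record);
            }
            records.add(new FileRecord(f[0], Long.parseLong(f[1]), f[2], f[3], f[4]));
        }
        return records;
    }
}
```

For example, parsing the made-up string
"test.class,17000,1998-05-01T10:00,maros,#rect.class,1007,1998-05-02T09:30,maros,#"
yields two records, the first with length 17000 and both with an empty opts field.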

The NA_MH reserves a virtual port (5) for its (currently nonexistent) Notifier and
writes this port to the first field of the message. Then it adds the MID (Mobile Host
Identifier) and a 48-bit security code (see footnote 3) to the head of the message,
forwards it to its peer (6), and creates a Notifier to deal with that process (7). The
Notifiers use a file transfer protocol similar to FTP (File Transfer Protocol). Since
TCP primitives are used, the Notifiers do not deal with packet sequencing and error
correction.

When the NA_MSP receives the request, it reserves a virtual port number (8) for its
(currently nonexistent) Notifier and writes this port to the third field (pnum) of the
message. This step is very similar to the job of the NA_MH. After replacing this field
in the message, it forwards the message to the OM_MSP (10). The OM_MSP creates a
Handler (H_MSP) for informing the Notifier_MSP (11). The H_MSP checks the file
structure and creates a message for the Notifier_MSP (12).


3. In the current design, this security code is reserved for future work; it does not
have any function yet.

tt | ncpf | pcfinfo | opts

Figure 4.9: Message format of the message that is sent from H_MSP to Notifier_MSP.

This message contains all the necessary information related to the object transfer.
The first field contains the type of transfer. An object transfer may be of three
types: Full, Partial, or No Need, as illustrated in Figure 4.9. When the H_MSP checks
the object files in the file system of the MSP, it may find that none of the object
files are present; it then sets the object transfer type to Full transfer. If some of
the object files are in the system, or some of them are only partially in the system,
the object transfer type is set to Partial transfer. In the third case, all of the
object files are already in the system, so there is no need for an object transfer and
the transfer type is No Need. A brief explanation of these types follows.
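
The decision made by the H_MSP can be sketched in Java as follows. The class, enum and
method names are hypothetical, and the sketch assumes that a file counts as complete
when the length found at the MSP reaches the length announced in the request; the
creation date field, which is only planned for version detection, is ignored.

```java
/** Hypothetical sketch of how the transfer type could be chosen at the MSP. */
class TransferTypeDecision {
    enum TransferType { FULL, PARTIAL, NO_NEED }

    /**
     * expected[i] is the announced length of file i; present[i] is the length
     * found in the MSP file system (0 if the file does not exist).
     */
    static TransferType decide(long[] expected, long[] present) {
        boolean anyPresent = false;
        boolean allComplete = true;
        for (int i = 0; i < expected.length; i++) {
            if (present[i] > 0) {
                anyPresent = true;
            }
            if (present[i] < expected[i]) {
                allComplete = false;
            }
        }
        if (allComplete) {
            return TransferType.NO_NEED;
        }
        return anyPresent ? TransferType.PARTIAL : TransferType.FULL;
    }
}
```

For the three files of Table 4.1, present lengths of (0, 0, 0) would select a Full
transfer, (4096, 8192, 0) a Partial transfer, and (17000, 25000, 1007) No Need.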

4.3.4.1 Full Transfer

In Full Transfer mode, the H_MSP only sends an F (indicating Full transfer) in the
message. When the Notifier_MSP receives this message, it knows that all the files in
the NOTT should be transferred (the NOTT is created and filled at the beginning of the
Notifier execution). The Notifier_MSP forwards the packet to the Notifier_MH (13), and
the Notifier_MH starts transferring the files one by one (14).

4.3.4.2 Partial Transfer

In Partial transfer mode, the H_MSP sends a P (indicating Partial transfer) followed by
the number of partial files (ncpf). The pcfinfo field contains the partial class file
information; Figure 4.10 shows the pcfinfo field in detail. It contains the NOTT
indices and the lengths of those partial files at the MSP site (if a file does not
exist, its length is 0). The file records are again separated with # signs. The
Notifier_MSP forwards this packet to the Notifier_MH (13). Both of them create their
POTT, and the Notifier_MH starts transferring only those partial files (14).

NOTT_index | len | opts # ... #

Figure 4.10: pcfinfo field of the message in Figure 4.9.

4.3.4.3 No Need To Transfer

In this case, the H_MSP sends an N (indicating No transfer is needed) in the message.
When the Notifier_MSP receives this message, it concludes that all the files already
exist at the MSP. It simply forwards this message to the Notifier_MH (13).

When the transfer operation ends, the Notifier_MSP signals the H_MSP, and the
Notifier_MH signals the H_MH, to report success or failure of the object transfer
operation (15). Figure 4.11 displays the format of the message that is sent from the
Notifiers to the Handlers. The first field contains a Success or a Fail, indicating the
result of the Notification process. The second field contains the object identifier of
the object that was just notified or failed to be notified. If the operation finishes
successfully, the H_MSP and the H_MH update their tables and change the state of the
object from created_notnotified to created_notified at the MH site.

result | oid

Figure 4.11: Message format of the message sent from Notifiers to Handlers.

4.3.5 Object Deletion

After running the object and obtaining the results, the user may want to delete the
object. To do so, the deleteObject() primitive of the OM is used. Figure 4.12
illustrates the object deletion process.

Figure 4.12: Object deletion process

When the OM receives the deletion request (1), it immediately checks the type of the
object (2). If the object is an ordinary (non-relocatable) object, the OM may delete it
immediately; ordinary objects may be deleted regardless of their state. If the object
is relocatable, however, the OM checks the state of the object. The OM creates an H_MH
(3), and the H_MH interacts with the other system agents to carry out the object
deletion.

The object deletion process is very similar to the object transfer process. The
Notifiers are created at both sites (7, 9), and the OM_MSP is informed as it is in
object transfer (10). However, each Notifier checks the Notifier Information Table
(NIT) to learn whether that object has already been transferred (8, 10). If there is a
Notifier dealing with the transfer of that object, it is stopped by the Notifier that
is charged with deleting that object. Meanwhile, the OM_MSP creates an H_MSP to deal
with this deletion process (11). The H_MSP checks the Object Table (OT) and, if the
object is present, removes it from the OT (12). Then it informs the Notifier_MSP
whether the deletion was successful (13). If the object is deleted from the OT
successfully, the Notifier_MSP tries to delete all the class files of that object.
First, it deletes the object record from the CDT and the CRT (this deletion is
explained in Section 4.3.3.3). Then, it sends the result of the deletion process to the
Notifier_MH (14). Finally, the Notifier_MH simply forwards the packet to the H_MH (15).
The first field of this packet contains an S or an F, indicating success or failure of
the deletion process. The second and last field contains the object identifier of the
object, which provides security in the Notification process.
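
The NIT lookup performed by a deleting Notifier can be sketched in Java as follows.
This is a simplified stand-in for the NIT_MH layout of Table 4.3; the class and method
names are hypothetical, and how the running transfer Notifier is actually stopped is
left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical synchronized Notifier Information Table (MH version). */
class NotifierInformationTable {
    static final class Entry {
        final int notifierOid;
        final int oid;
        final char type; // 'C' for Creation, 'D' for Deletion
        Entry(int notifierOid, int oid, char type) {
            this.notifierOid = notifierOid;
            this.oid = oid;
            this.type = type;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Called by a Notifier at the beginning of its execution. */
    synchronized void register(int notifierOid, int oid, char type) {
        entries.add(new Entry(notifierOid, oid, type));
    }

    /** Called by a Notifier at the end of its execution. */
    synchronized void unregister(int notifierOid) {
        entries.removeIf(e -> e.notifierOid == notifierOid);
    }

    /** Returns the OID of a Notifier still transferring the object, or -1 if none. */
    synchronized int findTransferNotifier(int oid) {
        for (Entry e : entries) {
            if (e.oid == oid && e.type == 'C') {
                return e.notifierOid;
            }
        }
        return -1;
    }
}
```

A Notifier charged with deleting object 980 would call findTransferNotifier(980); with
the sample NIT_MH of Table 4.3 it would obtain 1050 and stop that Notifier before
continuing.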



5 System Recovery Design



5.1 Introduction
System recovery is one of the most crucial parts of MaROS. There are two possible
types of recovery: heavy-weight recovery and light-weight recovery. The first type
deals with all kinds of unexpected failures, such as system lock-ups and hardware
failures. In the current design and implementation, this type of recovery is not
handled. The second type deals with expected system interruptions, such as a shutdown
request by the user. Since a shutdown request is detected by the system, a negligible
amount of time may be spent backing up some crucial data. This enables the system to
continue its execution at the next startup as if there had been no interruption.
Backing up all the crucial data is not straightforward, and it has to be coordinated
carefully. MaROS uses a special agent to control the recovery process: the Recovery
Agent (RA). The Recovery Agent provides a controlled shutdown, which is called System
Suspension. When MaROS is rebooted, everything continues its execution from the point
where it was suspended. Without the RA, all interrupted processes would have to be
restarted from the beginning, which wastes resources and time.

5.2 Recovery Table (RT) and Recovery Tree Structure
Java does not support signal handling. This deficiency led the MaROS group to
implement their own signal handling backbone. The Recovery Agent and the Recovery Table
are the two main components of the signal handling structure. The Recovery Agent uses
the Recovery Table (RT) to keep track of recoverable objects. The RT holds all
recoverable objects and their subobjects (if there are any). This table actually holds
a recovery tree structure (see footnote 4), as depicted in Figure 5.1. The root of the
tree is the Recovery Agent. Since each system agent controls a crucial part of the
system, all of them are recoverable.

The RA creates and handles the Recovery Tree structure with the help of the RT. If a
MaROS object has subobjects, the programmer should decide whether these subobjects need
recovery. For instance, the Notifiers are subobjects that rely on recovery. They
transfer class files from the MH to the MSP; in case of an interruption, the whole
transfer operation should not have to be started again from the beginning.




4. The Recovery Table does not contain the root (RA) of the Recovery Tree.
Figure 5.1: A sample Recovery Table instance and its corresponding tree structure.

One of the main tasks of the RA is to detect shutdown requests and initiate the
shutdown process. Before a proper shutdown, all recoverable objects should be
signalled and given a chance to write their crucial data to disk. The RA signals all
recoverable objects by traversing the Recovery Tree using the RT. Traversing the
Recovery Tree is a cooperative task. The RA initiates the process, and the OM
continues. All recoverable objects signal their subobjects by using the RT class
methods and wait until the subobjects finish their recovery process. Then, they do
their recovery work and signal their parent object.
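
The order in which objects are signalled corresponds to a post-order traversal of the
Recovery Tree: subobjects finish before their parent does its own recovery work. The
Java sketch below shows this order with a plain recursive call. The class and method
names are hypothetical; in MaROS the objects run concurrently and are signalled through
the RT class methods, which this single-threaded sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node of the Recovery Tree. */
class RecoverableNode {
    final String name;
    final List<RecoverableNode> subobjects = new ArrayList<>();

    RecoverableNode(String name) {
        this.name = name;
    }

    /** Adds a subobject and returns this node so calls can be chained. */
    RecoverableNode add(RecoverableNode child) {
        subobjects.add(child);
        return this;
    }

    /** Signals all subobjects first, then performs this object's recovery work. */
    void suspend(List<String> log) {
        for (RecoverableNode child : subobjects) {
            child.suspend(log); // returns only after the child has finished
        }
        log.add(name); // stand-in for writing crucial data to disk
    }
}
```

For a tree with the RA at the root, a Notification Agent with two Notifiers and an
Object Manager below it, suspend() on the root would record the two Notifiers first,
then the Notification Agent, then the Object Manager, and finally the RA.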

5.3 Recoverable Objects vs. Unrecoverable Objects
Recoverable Objects are MaROS objects that are recorded in the RT. This table contains
the recoverable objects and their subobjects. This means that if a subobject of an
object is recoverable, the object itself should also be recoverable. The best examples
of recoverable objects are system objects such as the Notification Agent and its
Notifiers. They are all recoverable, and they are recorded in the RT during the system
startup process. A MaROS programmer should make a plan and decide which of his/her
objects should be recoverable. Unfortunately, choosing the objects that are eligible to