Introduction and motivation
This project implements a file synchronization mechanism between multiple devices,
independent of their hardware platform or software (operating system) platform. The
application monitors one or more file system paths, and when a modification to a file is
detected it syncs the file to the other devices in the mesh and also creates a version of the
modified file on the server. All of this is done over the XMPP protocol.
The user will be able to control any of his meshed devices from any computer (e.g., in an
internet café) that has an internet connection and a web browser, so he can control them even
when he is not in front of any of his devices.
The accomplishment of this project will help make collaboration and proactive file
synchronization easier, with all your important data centralized in the cloud.
Chances are big that you have a lot of important data on your computer, such as financial
documents, email, digital photos, music and more. Unfortunately, computers are vulnerable to
hard drive crashes, virus attacks, theft and natural disasters, which can erase everything in
an instant. Current statistics show that one in every ten hard drives fails each year. The cost
of recovering a failed hard drive can exceed $7,500, and success is never guaranteed.
Living in an increasingly dynamic IT world, technology users expect proactive behavior from
their devices, and expect all their personal devices to be connected, synced and reachable even
when they are not standing in front of them.
In the above picture we can see a typical mesh of three devices: one laptop, one desktop, and
one mobile phone. The globe in the picture symbolizes that the devices can be accessed from the
web. As you can see, all the devices are connected and in sync. If one device makes a
modification, the modification is sent to the other two devices. This way they will always be
in sync with one another.
With the help of virtualization, the Graviton project will allow you to manipulate virtual hard
disks that do not even exist on your local storage, but rather in the cloud. So you could
create, mount, assign drive letters to, or dismount virtual hard disks from the cloud. This
results in a better, cost effective backup system. Graviton will make scheduled backups of your
important data to your personal virtual hard disks in the cloud. So rather than saving your data
on your device and eventually losing it, you can mount your virtual hard disk with one click and
copy all your data there, in the cloud. Virtual hard disks will also help computers boot from
Graviton is also collaboration software: it will make data sharing possible between registered
users. Users will be able to share any of the data on their personal computers with other
registered users, and set different permissions such as read, write, delete and execute.
Project overview, client side
As discussed in the previous chapter, the Graviton service will be able to serve clients
according to a certain in-house protocol, independent of their hardware platform (laptops,
mobile phones, desktops) and software platform (Microsoft Windows, Linux, Mac OS, etc.).
For this project specifically, the implemented client is for the Microsoft Windows operating
system, and the core of the syncing engine is a file system filter driver, above the NTFS file
system driver layer.
The GUI application is implemented in C#, and the libraries that link the driver with the GUI
are written in ANSI C / C++.
To make anything happen, the clients on each device first need to understand the XMPP protocol.
To communicate with an XMPP server, I chose the Jabber-Net library for easier access, control
and greater manipulation of the protocol.
Extensible Messaging and Presence Protocol (XMPP) is an open, XML-based protocol originally
aimed at near-real-time, extensible instant messaging (IM) and presence information (e.g.,
buddy lists), but now expanded into the broader realm of message oriented middleware.
It remains the core protocol of the Jabber Instant Messaging and Presence technology. Built to
be extensible, the protocol has been extended with features such as Voice over Internet
Protocol and file transfer signaling.
Unlike most instant messaging protocols, XMPP is an open standard. Like e-mail, it is an open
system where anyone who has a domain name and a suitable Internet connection can run his own
Jabber server and talk to users on other servers. The standard server implementations and many
clients are also free and open source software.
The Internet Engineering Task Force (IETF) formed an XMPP Working Group in 2002 to formalize
the core protocols as an IETF instant messaging and presence technology. The XMPP WG produced
four specifications which were approved by the IESG as Proposed Standards in 2004. RFC 3920 and
RFC 3921 are now undergoing revisions in preparation for advancing them to Draft Standard
within the Internet Standards Process. The XMPP Standards Foundation (formerly the Jabber
Software Foundation) is active in developing open XMPP extensions.
technology correctly implements the RFCs in full.
XMPP-based software is deployed on thousands of servers across the Internet and by 2003 was
used by over ten million people worldwide, according to the XMPP Standards Foundation.
Popular client applications include the freeware clients offered by Google, Nimbuzz and the
Gizmo Project, multi-protocol instant messengers such as Pidgin (formerly Gaim), and free
dedicated clients. Google Wave's federation protocol is an open extension to the XMPP protocol.
Jabber-Net is a set of libraries for accessing Jabber functionality from .NET. It is written in
C#, but should be accessible from other .NET languages such as VB.NET. Components exist for
connecting to a Jabber server either as a client or as a component. As you explore, you'll find
there are some other goodies buried inside, like Trees, CommandLine processing, etc.
The library consists of controls for sending and receiving Extensible Messaging and Presence
Protocol (XMPP), also known as Jabber. The library can handle client connections, server
component connections, presence, service discovery, and the like.
Windows File System Drivers
A file system (often also written as filesystem) is a method for storing and organizing
computer files and the data they contain to make it easy to find and access them. File systems
may use a data storage device such as a hard disk or CD-ROM and involve maintaining the
physical location of the files, they might provide access to data on a file server by acting as
clients for a network protocol (e.g., NFS, SMB, or 9P clients), or they may be virtual and
exist only as an access method for virtual data (e.g., procfs). It is distinguished from a
directory service and registry.
More formally, a file system is a special-purpose database for the storage, organization,
manipulation, and retrieval of data.
File System API
A file system API is an application programming interface through which an operating system
interfaces with file system code. The operating system usually provides abstractions for
accessing different file systems transparently to userland programs, and in this sense it is
similar to other APIs that provide abstracted access to hardware.
Some file system APIs may also include interfaces for maintenance operations, such as creating
or initializing a file system, verifying the file system for integrity, and defragmentation,
although these are more often implemented independently from the file system code.
file system API known as a file system
driver for the
and FAT32 file systems.
Kernel Level API
The API is "kernel-level" when the kernel not only provides the interfaces for the filesystem
developers but is also the space in which the filesystem code resides.
It differs from the old scheme in that the kernel itself uses its own facilities to talk with
the filesystem driver and vice versa, as opposed to the kernel being the one that handles the
filesystem layout and the filesystem the one that directly accesses the hardware.
It isn't the cleanest scheme, but it avoids the major rewrite that the old scheme requires.
With modular kernels it allows adding filesystems as any kernel module, even third party ones.
With non-modular kernels, however, it requires the kernel to be recompiled with the new
filesystem code (and in closed-source kernels, this makes third party filesystems impossible).
systems such as
have used this scheme.
There is a variation of this scheme used in MS-DOS (DOS 4.0 onward) and compatibles to support
CD-ROM and network filesystems. Instead of adding code to the kernel, as in the old scheme, or
using kernel facilities as in the kernel-based scheme, it traps all calls to a file and
identifies whether each call should be redirected to the kernel's equivalent function or
handled by the specific filesystem driver, and the filesystem driver "directly" accesses the
disk contents using low-level BIOS functions.
Driver Based API
The API is "driver-based" when the kernel provides facilities but the filesystem code resides
totally external to the kernel (not even as a module of a modular kernel).
It is a cleaner scheme, as the filesystem code is totally independent. It allows filesystems to
be created for closed-source kernels, and allows online filesystem additions to or removals
from the system.
Examples of this scheme are the
File System Filter driver
A file system filter driver intercepts requests targeted at a file system or another file
system filter driver. By intercepting the request before it reaches its intended target, the
filter driver can extend or replace functionality provided by the original target of the
request. Examples of file system filter drivers include anti-virus filters, backup agents, and
encryption products. To develop file systems and file system filter drivers, use the IFS
(Installable File System) Kit, which is provided with the Windows Driver Kit (WDK).
Filter Manager and Minifilter Drivers
The Filter Manager is a file system filter driver provided by Microsoft that simplifies the
development of third-party filter drivers and solves many of the problems of the legacy filter
driver model, for example by making it possible to control load order through an assigned
altitude. A filter driver developed to the Filter Manager model is called a minifilter. Every
minifilter driver has an assigned altitude, which is a unique identifier that determines where
the minifilter is loaded relative to other minifilters in the I/O stack.
Altitudes are allocated and managed by Microsoft.
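The effect of altitudes on ordering can be illustrated with a small sketch. It is not driver
code: it only models the rule that a minifilter with a higher altitude sits higher in the I/O
stack, so its pre-operation callback runs first and its post-operation callback runs last. The
driver names and altitude values are made up for illustration.

```python
from decimal import Decimal


def pre_operation_order(minifilters):
    """Return driver names in the order their pre-operation callbacks run.

    `minifilters` maps a driver name to its altitude, given as a decimal
    string. Higher altitudes are closer to the top of the I/O stack.
    """
    return sorted(minifilters, key=lambda name: Decimal(minifilters[name]), reverse=True)


def post_operation_order(minifilters):
    """Post-operation callbacks run in the reverse order (lowest altitude first)."""
    return list(reversed(pre_operation_order(minifilters)))


# Hypothetical drivers and altitudes, for illustration only.
stack = {
    "antivirus_demo": "325000",
    "graviton_sync_demo": "265000.5",
    "encryption_demo": "145000",
}
print(pre_operation_order(stack))
print(post_operation_order(stack))
```

Altitudes are compared as decimal numbers rather than as plain strings, which is why the sketch
converts them with Decimal before sorting.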
C# and .NET Framework
C# (pronounced "C Sharp") is a multi-paradigm programming language encompassing functional,
imperative, generic, object-oriented (class-based), and component-oriented programming
disciplines. It was developed by Microsoft as part of the .NET initiative and later approved as
a standard by ECMA (ECMA-334) and ISO (ISO/IEC 23270). C# is one of the programming languages
designed for the Common Language Infrastructure.
C# is intended to be a simple, modern, general-purpose, object-oriented programming language.
Its development team is led by Anders Hejlsberg. It has an object-oriented syntax based on C++.
It was initially named Cool, which stood for "C-like Object Oriented Language". However, in
July 2000, when Microsoft made the project public, the name of the programming language was
given as C#. The most recent version of the language is 3.0, which was released in conjunction
with the .NET Framework 3.5 in 2007. The next proposed version, 4.0, is in development.
The Microsoft .NET Framework is a software framework that can be installed on computers running
Microsoft Windows operating systems. It includes a large library of coded solutions to common
programming problems and a virtual machine that manages the execution of programs written
specifically for the framework. The .NET Framework is a key Microsoft offering and is intended
to be used by most new applications created for the Windows platform.
The framework's Base Class Library provides a large range of features including user interface,
data and data access, database connectivity, cryptography, web application development, numeric
algorithms, and network communications. The class library is used by programmers, who combine
it with their own code to produce applications.
Programs written for the .NET Framework execute in a software environment that manages the
program's runtime requirements. Also part of the .NET Framework, this runtime environment is
known as the Common Language Runtime (CLR). The CLR provides the appearance of an application
virtual machine so that programmers need not consider the capabilities of the specific CPU that
will execute the program. The CLR also provides other important services such as security,
memory management, and exception handling. The class library and the CLR together constitute
the .NET Framework.
Version 3.0 of the .NET Framework is included with Windows Server 2008 and Windows Vista. The
current version of the framework can also be installed on Windows XP and the Windows Server
2003 family of operating systems. A reduced version of the .NET Framework is also available on
Windows Mobile platforms. Version 4.0 of the framework was released as a public Beta on 20 May
2009.
Project overview, server side
Having talked about the client side, seen what components it needs and how they work
individually, it is now time to talk about the server side.
The server is actually the most important component of all, as it needs to handle requests from
thousands of clients, connected from all sorts of different devices.
Like the client, the server has to support and implement the XMPP protocol discussed in an
earlier chapter. For this to happen, the server machine will have an XMPP server installed, and
I chose the server from Ignite, called Openfire (discussed shortly).
The Openfire server works with multiple database servers, from Microsoft SQL Server to MySQL,
but our server will make use of PostgreSQL.
As I mentioned earlier, the service will also be provided through the web browser, so the
server also has to provide web interface interaction with the user. This is done using ASP.NET
and an Openfire library for Flex, called XIFF.
Openfire is a real time collaboration (RTC) server licensed under the Open Source GPL. It uses
the only widely adopted open protocol for instant messaging, XMPP (also called Jabber).
Openfire is incredibly easy to set up and administer, but offers rock-solid security and
performance. Administration of the server is done through a web interface, which runs on ports
9090 (HTTP) and 9091 (HTTPS) by default. Administrators can connect from anywhere and edit the
server's settings, add and delete users, conference rooms, and so forth.
Openfire supports the following features:
Web-based administration panel
User-friendly web interface and guided installation
Database connectivity (e.g. an embedded database) for storing messages and user details
Platform independent, pure Java
Full integration with the Spark Jabber client
The proprietary extension to Openfire allows multiple server instances to work together in one
PostgreSQL is an object-relational database management system (ORDBMS). It is released under a
BSD-style license and is thus free software. As with many other open source programs,
PostgreSQL is not controlled by any single company, but has a global community of developers
and companies to develop it.
XIFF is an Open Source Flash library for instant messaging and presence clients using the XMPP
(Jabber) protocol. XIFF includes an extension architecture that makes it easy to add
functionality for additional protocol extensions, or even your own special-purpose extensions.
There are quite a few extensions already included in the library, giving it support for XML-RPC
over XMPP, multi-user conferencing, service browsing and XHTML message support.
ASP.NET is a web application framework developed and marketed by Microsoft to allow programmers
to build dynamic web sites, web applications and web services. It was first released in January
2002 with version 1.0 of the .NET Framework, and is the successor to Microsoft's Active Server
Pages (ASP) technology. ASP.NET is built on the Common Language Runtime (CLR), allowing
programmers to write ASP.NET code using any supported .NET language.
ASP.NET pages, known officially as "web forms", are the main building block for application
development. Web forms are contained in files with an ".aspx" extension; in programming jargon,
these files typically contain static (X)HTML markup, as well as markup defining server-side Web
Controls and User Controls where the developers place all the required static and dynamic
content for the web page. Additionally, dynamic code which runs on the server can be placed in
a page within a <% ... %> block, which is similar to other web development technologies such as
PHP, JSP, and ASP, but this practice is generally discouraged except for the purposes of data
binding, since it requires more calls when rendering the page.
Definition of terms
System: We will call the system the actual product under this specification.
Actor: An actor is an entity that plays a role in its relationship with the system, interacts
with the system, and makes use of the system.
Client: A client can be defined as any device or PC that has a file system and can follow the
protocol. A device connected to the server will be recognized as a client if it can log in to
the server with a valid username and password. After that, it should be able to create XMPP IQ
requests that the server can process. An entity that successfully logs in to the server but is
not able to send any valid XMPP IQ will also be recognized as a client, but the server will not
be able to provide any functionality to it.
Server: The server is a component that handles client requests and syncs data according to a
set of rules. The server entity is actually a set of several components put together to offer
the client the functionality it requests. The server is composed of the XMPP server, the sync
manager (housed in the XMPP server), the database manager and the storage manager.
The server will recognize a client only after the entity connected to it has successfully
logged in. Only after this will the server provide functionality for the client's requests. If
the server does not recognize a request, it will send an appropriate error message back to the
client. This will also happen if a request type is recognized but cannot be fulfilled. A
success message will be sent to the requesting client if the request was processed. Where
applicable, the requested data will be returned to the client.
Sync manager: a server component which, based on a set of user rules, syncs data files across
the user's devices. The sync manager is embedded in the XMPP server as a plug-in. The sync
manager works through a set of in-house XMPP stanza requests (e.g., IQ UpdateFileRequest, with
parameters: update size, update offset, and buffer).
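As a rough illustration of what such a stanza could look like on the wire, the sketch below
builds and parses an IQ carrying an UpdateFileRequest payload. Only the request name and its
three parameters (size, offset, buffer) come from the text; the namespace, the element names,
the extra path field and the base64 encoding of the buffer are assumptions made for this
example, not the project's actual protocol.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical namespace; the real in-house protocol is not specified here.
GRAVITON_NS = "urn:example:graviton:sync"


def build_update_file_request(iq_id, to_jid, path, offset, data):
    """Serialize one partial file update as an XMPP IQ stanza of type 'set'."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id, "to": to_jid})
    request = ET.SubElement(iq, "UpdateFileRequest", {"xmlns": GRAVITON_NS})
    ET.SubElement(request, "path").text = path
    ET.SubElement(request, "offset").text = str(offset)
    ET.SubElement(request, "size").text = str(len(data))
    # Binary data is base64-encoded so that it can travel inside XML.
    ET.SubElement(request, "buffer").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(iq, encoding="unicode")


def parse_update_file_request(xml_text):
    """Extract the update parameters from a stanza built by the function above."""
    iq = ET.fromstring(xml_text)
    request = iq.find("{%s}UpdateFileRequest" % GRAVITON_NS)

    def field(name):
        return request.find("{%s}%s" % (GRAVITON_NS, name)).text or ""

    return {
        "path": field("path"),
        "offset": int(field("offset")),
        "size": int(field("size")),
        "data": base64.b64decode(field("buffer")),
    }


stanza = build_update_file_request("upd-1", "sync.example.org", "docs/report.txt", 4096, b"hello")
print(stanza)
print(parse_update_file_request(stanza))
```

The server side of the sync manager would do the reverse of the build step: parse the payload,
validate it, and route the update to the other devices in the mesh.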
The sync manager is the most complicated part of the project to implement, and it needs to be
the most efficient, requiring a small cache manager on the server so that some kinds of updates
work really fast.
The sync manager will do such tasks as ensuring that a file is synced across all the devices.
A file is considered synced if the newest version of that file is on all the devices at the
same time. Further modification to that file will remove the "synced file" bit from that file.
The user can choose to manually sync a file across his devices if any problems or malfunctions
appear in the sync process.
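The "synced" rule can be written down as a small check. This is only a sketch of the
definition above, assuming each device reports an integer version number for the file; the
device names are made up.

```python
def is_synced(device_versions):
    """A file is synced when every device holds the newest version of it.

    `device_versions` maps a device name to the version number it holds.
    """
    if not device_versions:
        return False
    newest = max(device_versions.values())
    return all(version == newest for version in device_versions.values())


def devices_needing_update(device_versions):
    """List the devices that are behind the newest known version."""
    if not device_versions:
        return []
    newest = max(device_versions.values())
    return sorted(name for name, version in device_versions.items() if version < newest)


# Hypothetical mesh: the phone has not yet received version 7.
mesh = {"laptop": 7, "desktop": 7, "phone": 6}
print(is_synced(mesh), devices_needing_update(mesh))
```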
The database manager is the actor that deals with database and table management. The database
manager knows the whole organization of the project behind the scenes: things like how tables
are related to one another, or what query to make to see the protected folders of a user or the
user's space quota.
The database manager lies on top of the PostgreSQL DBMS. The layer on top will be a DLL-like
library with all sorts of in-library queries (e.g., LONG GetUserFreeSpaceQuota(IN char
*UserName, OUT PLARGE_INTEGER UserQuota) will put in UserQuota the free space quota for the
specified user, and will return the appropriate error code if any error occurs).
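A query of this kind can be sketched as follows. To keep the example self-contained it uses an
in-memory SQLite database in place of PostgreSQL, and the table layout, column names and error
codes are assumptions made for illustration rather than the project's real schema.

```python
import sqlite3

ERROR_SUCCESS = 0
ERROR_NO_SUCH_USER = 1  # hypothetical error code


def get_user_free_space_quota(conn, user_name):
    """Return (error_code, free_bytes) for the given user.

    Mirrors the shape of GetUserFreeSpaceQuota: an error code plus an
    output value that is only meaningful when the call succeeds.
    """
    row = conn.execute(
        "SELECT quota_bytes - used_bytes FROM users WHERE name = ?", (user_name,)
    ).fetchone()
    if row is None:
        return ERROR_NO_SUCH_USER, None
    return ERROR_SUCCESS, row[0]


# Throwaway database with one made-up user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, quota_bytes INTEGER, used_bytes INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1000, 250)")
print(get_user_free_space_quota(conn, "alice"))
```

Keeping the SQL behind one function per question is what lets the other managers stay unaware
of how the tables are related.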
The storage manager will handle all storage related issues on the server, like total space,
free space, user folders, and VHD options and settings.
The user will have the option to choose whether his storage is on a VHD or on the server's
direct storage; this will also be handled by the storage manager.
Of course the storage manager is closely tied to the database manager: all the queries and
updates it makes will be reflected in the databases. The storage manager will also allow the
user to extend his current space quota as well as move to a lower space quota.
Backup and restore manager
As its name suggests, the backup and restore manager will be the server component that
maintains the user's backup metadata, like backup schedules, backed-up folders, or file
versions.
The backup and restore manager will know which server folder paths a certain user uses to make
the backup happen. The restore manager will help the user choose which files he wants to
restore from the server. The user will have to choose files to restore according to date or
version.
The restore manager will allow the user to choose how the files are restored. The user can
choose either to download the files via HTTP or FTP, or to have the files automatically
downloaded by the desktop application and restored where needed.
The restore manager will work to restore the files even if the user is not in front of any of
his devices. The user can log in from a web browser and download any file he needs to the
computer he is in front of.
The restore manager basically offers the user interfaces for accessing his files. It is tightly
integrated with the sharing manager.
Version of a file: any previous content of a file that is stored on the server. The file
versioning that the server supports will vary with the file size or version history.
The user can instruct the server not to keep more than a given number of versions of a file. If
the number of versions exceeds the maximum for a file, the server will automatically delete the
oldest versions of the file and shift the versions, so the latest version is always backed up
and present on the server.
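The version limit described above behaves like a bounded queue: adding a version beyond the
maximum pushes out the oldest one. A minimal sketch, assuming versions are kept in memory as
byte strings (the real server would store them on disk and track them in the database):

```python
from collections import deque


class VersionHistory:
    """Keep at most `max_versions` versions of one file, oldest first."""

    def __init__(self, max_versions):
        if max_versions < 1:
            raise ValueError("at least one version must be kept")
        # A deque with maxlen discards the oldest entry when it is full.
        self._versions = deque(maxlen=max_versions)

    def add(self, content):
        self._versions.append(content)

    def latest(self):
        return self._versions[-1]

    def all(self):
        return list(self._versions)


history = VersionHistory(max_versions=3)
for content in (b"v1", b"v2", b"v3", b"v4"):
    history.add(content)
print(history.all())
```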
Web interface: an actor interface to easily, remotely and securely access backed-up and stored
data with the help of your web browser. The web interface will help the user interact with his
files if he is not in front of one of his computers.
The user will be able to log in online and control any of his devices, performing actions on
them (delete, rename, modify files).
Figure showing the "Welcome" web interface. The connected user can see which of his devices are
online and which are not. If logged in from a new device, he can choose to add the new device
to his sync mesh. The user can see his storage and remotely control his devices.
The user will be able to perform file sharing operations via the web, and also access shared
files. The user will also be able to register a new user, change his password or change
settings.
A very important component of the service is the sharing manager. The user can choose to share
files on his devices with other registered users, or with unregistered users by making them
public. The sharing manager works closely with the restore manager, being able to offer
interfaces to reach the desired files.
The user will be able to select the users with whom he wants to share the files, and set
permissions such as read, write, delete and execute. The files that a user exposes will be
available either through a web link or through an anonymous login to an FTP server. The files
will be available for download for a determined period of time (default 30 days).
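The availability window can be expressed in a few lines. The 30-day default comes from the
text; the function names and the choice to treat the expiry instant itself as already expired
are assumptions of this sketch.

```python
from datetime import datetime, timedelta

DEFAULT_SHARE_DAYS = 30


def share_expiry(created_at, days=DEFAULT_SHARE_DAYS):
    """Return the moment at which a shared file stops being downloadable."""
    return created_at + timedelta(days=days)


def is_share_available(created_at, now, days=DEFAULT_SHARE_DAYS):
    """True while `now` is still inside the availability window."""
    return now < share_expiry(created_at, days)


shared_on = datetime(2009, 6, 1, 12, 0)
print(share_expiry(shared_on))
print(is_share_available(shared_on, datetime(2009, 6, 20, 9, 0)))
```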
A Virtual Hard Disk (VHD) is a file format containing the complete contents and structure
representing a hard disk drive, and is used to store virtual operating systems and their
associated programs in a single file by various virtualization programs or a virtual machine.
The format was created by Connectix, which was later acquired by Microsoft. Since June 2005
Microsoft has made the VHD Image Format Specification available to third parties under the
Microsoft Open Specification Promise.
Virtual Hard Disks allow multiple operating systems to reside on a single host machine. This
enables developers to test software on different operating systems without the cost or hassle
of installing a second hard disk or partitioning a single hard disk into multiple volumes.
The ability to directly modify a virtual machine's hard disk from a host server supports many
applications, including:
Moving files between a VHD and the host file system
Backup and recovery
Antivirus and security
Image management and patching
Disk conversion (physical to virtual, and so on)
Life-cycle management and provisioning
Windows Client GUI
The "Windows Client GUI" is actually the main interface for the user to interact with.
Through it the user is able to indirectly control the "Sharing manager", the "Restore manager",
the "Sync manager" or the settings of the "Windows client driver" (see the corresponding
sections).
The Windows Client GUI will also give the user the ability to administer his contacts. He can
add, delete or customize the details of any of his contacts. He can choose to share files, see
what others have shared with him, and also collaborate.
Figure showing a typical contacts tree with three groups, two offline contacts and one online
contact, and a search bar with only the matched users.
As shown in the figure, the logged-in user can see his contact roster and search through his
contacts. He can also choose to create contact groups to manage them more easily.
The logged-in user can also choose to manage any of his devices from the one he is connected
at.
The Windows Client GUI is more of a layer between the managers and services offered on the
server and the client trying to gain access to them.
The "Windows Client driver" component is one of the hardest to implement, and the one that
makes the difference between a normal file sync and collaboration application and Graviton.
The driver is a file system filter driver, which can be configured to monitor file system paths
and look for any modifications to files. The driver will have a local backup folder where it
will save the files as they are modified. The driver will make backup copies and detect file
deletion, file modification (offset and length), and file creation.
The driver also provides callbacks, so the user mode application can register to be called when
a certain update happens in the driver.
The user application will send each update from the driver to the server, and the server will
then route it to all the devices in the mesh. The driver will ignore any modifications made by
the user mode application. This way the user mode application can patch the files without the
driver interpreting the patches as file modifications.
The trick is that the driver will send only the modified part of the file to the user mode
application, together with its offset and length. So if a huge file is partly modified, the
driver will not send a request to sync the whole file, but only the modified part. This way the
client saves bandwidth and upload time. Each update like this can mean one version of the file,
if the user chooses to have a file version for each update. The user can configure how the
versions will be made, according to the number of versions or the storage occupied.
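On the receiving device, applying such a partial update amounts to writing the received bytes
at the given offset and leaving the rest of the file untouched. The sketch below shows that
step in user mode; the function name is an assumption, and it stands in for the patching done
by the user mode application, not for any driver code.

```python
import os
import tempfile


def apply_update(path, offset, data):
    """Write `data` at `offset` in an existing file, keeping all other bytes."""
    with open(path, "r+b") as target:
        target.seek(offset)
        target.write(data)


# Demo on a throwaway file: only three bytes of a ten-byte file are replaced.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
apply_update(tmp.name, 3, b"abc")
with open(tmp.name, "rb") as patched_file:
    patched = patched_file.read()
os.remove(tmp.name)
print(patched)
```

For a large file, the amount of data transferred is the length of the modified region rather
than the size of the file, which is the saving the paragraph above describes.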
One last thing that needs to be mentioned about the driver is that it cannot work if attached
to a VHD, because it does not meet the concurrency requirements. A version of this driver will
be released that offers functionality only through a callback interface to the user, and an API
will be exposed as a dynamic library for making direct calls to the driver and changing its
settings.
Windows client libraries
The "Windows client libraries" consist of .NET dynamic link libraries or normal Win32/64
libraries that the whole application uses to offer the client the functionality it expects.
Such libraries expose functions to manipulate the application's entities.
A typical library like this will help the programmer easily integrate into the main application
a login box, a register box or even a contacts tree view with all the contacts shown as online
or offline.
Figure 3: the login box is generated by a function call in the Login library. The library also
exposes functions that support login without a login form.
As shown in Figure 3, the programmer can choose to make a login call from the login library,
and the login box will automatically be shown by the API. The login library class holds a
private variable of type JabberClient and exposes public login methods. Each of these functions
will use the private JabberClient variable and try to log it in to the server.
The class also exposes some public variables showing the last error code and the login status.
This way all the programmer has to do is create a new instance of the class and call Login().
There is also an option to log in with no form, if no form is required. This is typically used
when the user wants his credentials remembered on the machine he is on, and the application
will automatically log him in without showing the form.
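The shape of such a login class can be sketched in a language-neutral way. Everything named
here (LoginHelper, login_no_form, the error codes, the fake server) is hypothetical; the only
things taken from the text are that the class wraps a private connection object, offers a
login with a form and a login without one, and exposes the last error code and login status.

```python
class LoginHelper:
    """Hypothetical stand-in for the login library class described above."""

    def __init__(self, authenticate, prompt=None):
        self._authenticate = authenticate  # plays the role of the private JabberClient
        self._prompt = prompt              # shows a form and returns (user, password)
        self.last_error = 0
        self.logged_in = False

    def login(self):
        """Show the login form, then log in with what the user typed."""
        if self._prompt is None:
            self.last_error = 2  # no form available
            return False
        user, password = self._prompt()
        return self.login_no_form(user, password)

    def login_no_form(self, user, password):
        """Log in with remembered credentials, without showing any form."""
        self.logged_in = bool(self._authenticate(user, password))
        self.last_error = 0 if self.logged_in else 1
        return self.logged_in


def fake_server(user, password):
    """Toy credential check used only for this example."""
    return (user, password) == ("alice", "secret")


helper = LoginHelper(fake_server, prompt=lambda: ("alice", "secret"))
print(helper.login(), helper.logged_in, helper.last_error)
```

Passing the form and the connection in from outside keeps the class easy to test, which is one
possible design; the real library may well create both internally.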
The model of the login library is followed for the other libraries as well.
The Windows client libraries will enhance programming speed and offer new and easier ways of
implementing the main application.
Windows mobile client GUI
As mentioned earlier, the Graviton application will be able to keep even mobile smartphones or
Pocket PCs in sync. All that is needed is that your mobile devices have an operating system and
can connect to the internet. Of course not all the functionality is implemented in the mobile
version of the application, due to the limitations of mobile devices, but the necessary and
most important functionality will be there, like file sync, sharing, backup and restore.
Figure 4 shows a device emulator, emulating a Pocket PC running Windows Mobile 5.0. The
currently running application is a proof of concept showing that data is sent from the mobile
phone to a Linux server.
As suggested by Figure 4, the first mobile client I will implement will only support Windows
Mobile. The server protocol is of course platform independent, and the clients can be
multi-platform. Because of the ease with which the Windows Mobile client can be implemented, I
chose to start with it.
Windows mobile FileSystemWatcher
The "Windows mobile FileSystemWatcher" class is probably the biggest and most important piece
of code that this project will need for mobile devices.
Microsoft does not provide a FileSystemWatcher class for the .NET Compact Framework. Even if
the functionality on Windows Mobile devices is kept to a minimum, the FileSystemWatcher class
is a must. Luckily there is an open source community which runs projects for the .NET Compact
Framework to extend its functionality. It is committed to open source projects that help the
mobile and embedded development community in their projects, whether enterprise development or
commercial.
The only difference here between the Windows client and the Windows Mobile client is that the
FileSystemWatcher (on the mobile client) does not come close to the functionality offered by
the driver (in the Windows client). The compromise is small though, because mobile devices do
not have large files and do not have such high I/O activity. The conclusion is that the Windows
Mobile client will only sync files in their entirety, not parts of a file.
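Whole-file syncing only needs to know which files changed, not where inside them. The sketch
below illustrates that with a simple directory snapshot comparison. Note that this polling
approach is a stand-in chosen for the example: the mobile client described above relies on a
FileSystemWatcher class, which reports changes as events instead.

```python
import os
import tempfile


def snapshot(root):
    """Map each file path under `root` to its (size, modification time)."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def files_to_upload(previous, current):
    """Return new or changed paths; each one is uploaded as a whole file."""
    return sorted(path for path, meta in current.items() if previous.get(path) != meta)


# Demo in a throwaway folder: one file exists before the snapshot, one is added after.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "wb") as first:
        first.write(b"one")
    before = snapshot(root)
    with open(os.path.join(root, "b.txt"), "wb") as second:
        second.write(b"two")
    pending = [os.path.basename(path) for path in files_to_upload(before, snapshot(root))]
print(pending)
```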