A New Approach for Peer-to-Peer Distributed Computation






Supervised By: Dr. Khaled Nagaty

Faculty of Informatics and Computer Science

Department of Computer Science

Alex Movsessian


Student ID: 111171


Submitted to the Faculty of Informatics and Computer Science

The British University in Egypt

In partial fulfillment of the requirements for the Degree of

BACHELOR OF SCIENCE

June, 2013


Alex Movsessian

Computer Science

BUE

ii




Abstract

Distributed computing is considered one of the challenging problems in computer science. The entry barriers are significant due to the inherent complexity that such systems have to manage. This system provides a new approach to tackling the problems of distributed computing by creating a multi-tier structure that is accessible yet powerful enough for creating distributed parallel applications with a minimal learning curve and debugging effort, coupled with inherent safety and a modularity that allows for further expansion of features.


Submitted to

Prof. Ahmed Hamad, Dean of the Faculty

Faculty Committee

Dr Khalid Nagaty, Dissertation Advisor



Acknowledgments



I would like to seize this occasion to acknowledge the help, vision and trust that my graduation project advisor, Dr. Khalid Nagaty, offered me. His support facilitated the process and provided a good deal of motivation and the right amount of catalytic weight over the various project stages. I would also like to thank Professor Ahmed Hamad for his perception and knowledge over the past years. Furthermore, I thank my competent colleagues, whose cooperation and intuition over the past few years have been advantageous and inspiring.








Table of Contents

Table of Contents
List of Figures
List of Graphs
List of Tables
Chapter 1 - Introduction
1.1 Motivation
1.2 Objective
1.3 Project Management
1.3.1 Management Methodology
1.3.2 Tabular Representation
1.3.3 Gantt Chart Representation
1.4 Licensing
Chapter 2 - Review of Literature
2.1 Computer Networking
2.1.1 OSI Model
2.1.2 The IP Protocol
2.1.3 The TCP Protocol
2.1.4 Client-Server Web Structure
2.1.5 Peer-to-Peer Networks
2.2 Distributed Systems
2.2.1 DHT
2.2.2 Kademlia
2.2.3 Distributed File Systems
2.3 Security
2.3.1 Symmetric Key Encryption
2.3.2 AES Algorithm
2.3.3 Public Key Encryption
2.3.4 RSA
2.3.5 Key Exchange
2.3.4 Hashing Algorithms
2.3.5 SHA 512
2.4 Web Services
2.4.1 HTTP Protocol
2.4.2 Software as a Service (SaaS)
2.5 Used Tools
2.5.1 Python
2.5.2 The C# Programming Language
2.5.3 Model View View-Model (MVVM)
Chapter 3 - Related Works
3.1 Google Map Reduce
3.2 BitTorrent Sync
3.3 Skype
3.4 BitCoin
Chapter 4 - Contribution
4.1 TCP Connection
4.2 Authentication
4.3 Message Exchange
4.4 Proxification (Tunneling)
4.5 DHT
4.6 Web Server
4.7 Distributed File System
4.7.1 File Identifiers
4.7.2 Network Functionality
4.7.2 Local File Management
4.8 Omega Python
4.9 Attack Vectors
Chapter 5 - Experiments and Test Results
5.1 N-Way Merge
5.2 Load Balancing for Link Crawler
5.3 Experimental Data Analysis
5.3.1 Data Analysis Tools
5.3.2 Experimental Data
5.4 Comparison to Other Peer-to-Peer Systems
5.5 Optimality
Chapter 6 - Conclusion and Future Works
6.1 Conclusion
6.2 Future Works
6.2.1 Distributed SQL-like Database
6.2.2 Live Streaming
6.2.3 Native Algorithmic Libraries
6.2.4 Ad-hoc Application Ports
6.2.5 Dynamic Cloud Storage
6.2.6 Monetisation
Appendix A - Project Schedule Gantt Chart
Appendix B - Project Classes UML Diagram
Appendix C - Performance Analysis for Launching Parallel Application
Appendix D - Handshake Process Detailed Log
Appendix E - Creative Commons License Legal Code
Glossary of Key Terms
References
Index
Acronyms






List of Figures

Fig. 1 Relationship between System Components
Fig. 2 The RAD Model (Leffingwel, 2007)
Fig. 3 Creative Commons License Summary (Creative Commons, 2013)
Fig. 4 The OSI Model (Mintzberg, 2010)
Fig. 5 IP Packet Structure (Cisco, 2012)
Fig. 6 TCP Packet Structure (Cisco, 2012)
Fig. 7 Client Server Model
Fig. 8 Illustration of the Bit Torrent Network (Siganos, 2009)
Fig. 9 The Tor Network Logo (The Tor Project, 2012)
Fig. 10 Illustration of the Tor Network (McCoy, 2008)
Fig. 11 SETI@Home Network Logo (SETI@Home Labs, 2013)
Fig. 12 BitTorrent Logo (BitTorrent Labs, 2013)
Fig. 13 Kademlia Binary Tree Example (Maymounkov, 2002)
Fig. 14 Kademlia Lookup Process (Maymounkov, 2002)
Fig. 15 AES (Paar, 2010)
Fig. 16 RSA Key Pair Generation (Holmes, 2012)
Fig. 17 A hash function used in a phonebook to map names to phone numbers (Rogaway, 2004)
Fig. 18 SHA-512 (Grembowski, 2002)
Fig. 19 Basic operation of the HTTP protocol for a client to retrieve a page from a server (Budgen, 2003)
Fig. 20 HTTP Message Structure (Berners-Lee, 1999)
Fig. 21 Python Logo (Python Software Foundation, 2012)
Fig. 22 C# Logo (Microsoft, 2013)
Fig. 23 Illustration of the MVVM Design Pattern (Microsoft, 2008)
Fig. 24 WPF Architecture (Freeman, 2010)
Fig. 25 Illustration for MapReduce Basic Operations (Yang, 2007)
Fig. 26 BitTorrent Sync Logo (BitTorrent Labs, 2013)
Fig. 27 BitTorrent Sync User Interface for Distributed File Management (BitTorrent Labs, 2013)
Fig. 28 Skype Logo (Microsoft, 2013)
Fig. 29 Illustration for the Skype P2P network structure (Dean, 2012)
Fig. 30 Bitcoin Logo (Nakamoto, 2013)
Fig. 31 Illustration for the BitCoin operation. The cloud represents the P2P network and the DHT (Nakamoto, 2013)
Fig. 32 The value of BitCoin versus US dollar indicating its recently growing potential (Maurer, 2013)
Fig. 33 System Diagram
Fig. 34 RSA Key Exchange Automaton
Fig. 35 Envelope Data Structure Used for Transmitting the Messages
Fig. 36 The Proxification Process
Fig. 37 Proxification Example
Fig. 38 The usage of nodes as multiple proxies
Fig. 39 DHT Buckets Structure
Fig. 40 Operation of the local web server
Fig. 41 Hierarchical Data Structure Format
Fig. 42 Sample Parallel Search Query
Fig. 43 Flat table representation of the file structure
Fig. 44 FileSystemWatcher Operation
Fig. 45 FileSystemWatcher Sample Demo
Fig. 46 N-Way Merge
Fig. 47 Web Crawler Load Balancing Operation
Fig. 48 Wireshark Logo (Wireshark, 2013)
Fig. 49 WPE Pro Main Window Screenshot
Fig. 50 Visual Studio Logo (Microsoft, 2012)
Fig. 51 Illustration of TraceRoute Operation (Cisco, 2012)
Fig. 52 No-IP Logo (No-IP, 2012)
Fig. 53 Screenshot of the Web Crawler Load Balancer Application
Fig. 54 Link Loading Response Time Analysis
Fig. 55 Web Scrapper Queue
Fig. 56 Web Scrapper Finished
Fig. 57 Web Scrapper Result Links
Fig. 58 Packet Dump 1
Fig. 59 Packets after handshake
Fig. 60 BubbleSort Pseudocode
Fig. 61 Merging Algorithm Pseudocode
Fig. 62 Example of In-App Advertisement in uTorrent
Fig. 63 Project Schedule Gantt Chart - Part 1
Fig. 64 Project Schedule Gantt Chart - Part 2
Fig. 65 Project Schedule Gantt Chart - Part 3
Fig. 66 Project Classes UML Diagram
Fig. 67 Handshake Log Part 1
Fig. 68 Handshake Log Part 2
Fig. 69 Handshake Log Part 3
Fig. 70 Handshake Log Part 4
Fig. 71 Handshake Log Part 5




List of Graphs

Graph. 1 Node Uptime Probability (Maymounkov, 2002)
Graph. 2 Average Network Latency
Graph. 3 BitTorrent Speed Analysis
Graph 4 Comparison Between Two Approaches of Sorting






List of Tables

Table 1 Project Schedule
Table 2 Summary of AES Algorithm Features (Paar, 2010)
Table. 3 HTTP Status Codes (Berners-Lee, 1999)
Table 4 Python Standard Library Modules (Python Software Foundation, 2012)
Table. 5 System Components
Table. 6 States in RSA Key Exchange
Table. 7 Omega Python Primitives
Table 8 Attack Vectors Summary
Table 9 Network Latency
Table 10 Communication Monitor PC Specifications
Table 11 Gathered Experimental Elements
Table 12 Peer-to-Peer Networks Comparison
Table 13 Summary Comparison with Related Works



Chapter 1 - Introduction




1. Introduction


Peer-to-peer networks are increasingly regarded as a major breakthrough in computing. Over the last decade, which witnessed a radical increase in the number of interconnected computers, various peer-to-peer platforms were created. However, the majority of these platforms focused on file sharing and anonymity. Few managed to harness the aggregate computing power, and those that did worked on an ad-hoc, application-oriented basis rather than in a generic distributed computing manner. This paper presents a new kind of peer-to-peer system structure that addresses the demand for generic, application-independent peer-to-peer computation.

1.1 Motivation


The increasing number of users connected to the Internet over the last decade has opened up various new frontiers for innovation that were unthinkable only a few years ago. Every day, over 1 billion Internet users create over 2.5 quintillion bytes of data (IBM, 2013).

The demand for computing power and data handling has been increasing exponentially for years, and it is not expected to slow down any time soon. This demand has spawned an increasing need to make use of the power of as many computers as possible, in the most efficient ways, to cope with the required processing tasks.

This has led to the creation of several science-fiction-like parallel computing systems in organisations such as Google, Microsoft, Amazon and various security entities. However, most of the development in parallel and distributed computing has remained almost exclusive to large organisations capable of purchasing and running hundreds of thousands of machines on their premises.



Almost all of the Internet-based public-resource distributed solutions created so far were built on an ad-hoc basis, whereby a very specific instance of a problem was solved using a group of “peers”, typically utilising their idle time, with a master command-and-control server handling the data from all of the peers. The quintessential example of such efforts is the SETI@Home project for analysing data about the potential of extraterrestrial life (Anderson, 2002).

There have been very few attempts to perform large-scale distributed computation on arbitrary public computers connected through the Internet in a general manner similar to that of file sharing, whereby any group of computers can cooperate and share a file in a relatively easy and secure manner, with minimal requirements in terms of background knowledge and of setting up the necessary coordination and control infrastructure.

Having a general-purpose and easy-to-use system for performing distributed computation over the Internet would open up various opportunities for productivity and innovation. Such opportunities could be exploited by a much wider group of users, who could begin with minimal knowledge of distributed computing and yet be able to create useful distributed parallel computing solutions for complex problems. The target problems range from moderately large data processing all the way up to advanced data mining, artificial intelligence and image processing algorithms.

The project discussed in this paper tackles this issue and presents a solution in the form of an integrated distributed system structure whose components work together to facilitate the creation of distributed applications on general-purpose Internet-connected machines.





1.2 Objective


The project’s main objective is to provide the components required for an integrated distributed system, usable together in an easy manner even by novice programmers.

In addition, the project plans to make use of state-of-the-art, industry-standard security measures for all of its components, so as to provide the maximum possible level of security and privacy to the users participating in the system.

That objective is not to be confused with anonymity. The project’s emphasis on security comes from the need to prevent any potential abuse of the system that might expose the users’ machines to security exploitation risks. That is somewhat different from providing anonymity for reasons such as free speech. Anonymity is provided as a side effect of some of the built-in security measures; however, it is not emphasised as a primary goal, unlike in other similar projects whose sole goal is to provide anonymous Internet access, and maintaining it is ultimately the responsibility of the users.

The ultimate goal of the project is lowering the entry barrier for distributed computation, making it possible for novice programmers to experiment with and tackle such algorithms and design patterns in a friendly and intuitive manner.

The proposed system is a distributed computing model with several aspects aimed at facilitating distributed file sharing, distributed communication and distributed parallel processing.

The primary aspects of the system include:

• A base Transmission Control Protocol (TCP) communication layer over the Internet among networked computers reachable through a publicly announced IP address.

• A “Proxification” layer for establishing “tunnels” to communicate with nodes which do not have a publicly accessible IP address or would like to remain anonymous.

• A unique-identifier-based system for identifying and locating various entities on the network (nodes and stored resources).

• A cryptographic protocol for encrypting all communications end-to-end for all peers participating in the network.

• A Distributed Hash Table (DHT) for locating and storing nodes and resources in the system.

• A distributed file system based on the aforementioned DHT layer for locating files on the network using unique paths.

• An authentication layer operating on top of the distributed file system layer for ensuring the authenticity of returned files based on the previously known identity of their owners.

• An internal HTTP web server for serving local content from the network to the local client machine through a standards-compliant web browser or a generic user agent.

• A custom Python-based programming language for facilitating parallel distributed programming, providing a set of powerful and simple primitives for making use of the network infrastructure as an abstract entity.
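As a rough illustration of how a unique-identifier scheme and a DHT can cooperate to locate nodes and resources, the following is a minimal sketch rather than the system's actual implementation. It assumes that identifiers are derived by hashing a name with SHA-512 and that closeness is measured with the XOR metric used by Kademlia; both topics are covered in the literature review, but their exact use here is an assumption. The function names `make_id`, `xor_distance` and `closest_nodes`, as well as the sample resource path, are hypothetical.

```python
import hashlib


def make_id(name: str) -> int:
    """Derive a 512-bit identifier from a name.

    Assumption for illustration: the identifier is the SHA-512 digest
    of the UTF-8 encoded name, read as a big-endian integer.
    """
    digest = hashlib.sha512(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the bitwise XOR of two identifiers."""
    return a ^ b


def closest_nodes(target: int, node_ids: list[int], k: int = 3) -> list[int]:
    """Return the k node identifiers closest to the target under XOR."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


if __name__ == "__main__":
    # Hypothetical node names and resource path, used only for the demo.
    nodes = [make_id(f"node-{i}") for i in range(10)]
    resource = make_id("/alice/documents/report.txt")
    for node in closest_nodes(resource, nodes):
        # Print a shortened hexadecimal prefix of each identifier.
        print(f"{node:0128x}"[:16] + "...")
```

Because nodes and resources share one identifier space in this sketch, the same lookup routine can answer both "which nodes are near this node" and "which nodes should hold this file".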



Fig. 1 shows a diagrammatic representation of the relationship among the various parts of the system.

Fig. 1 Relationship between System Components

[Figure: layered stack, in the order shown: Python + HTTP; Distributed Hash Table (DHT); Cryptography (RSA + AES); Proxification; TCP Connection]



1.3 Project Management


This section discusses the higher-level aspects of managing the software project. It presents the selected software development methodology, coupled with a tabular representation and a visual Gantt chart representation of the schedule.

1.3.1 Management Methodology



Given the research-centric nature of the project, the requirements and features were expected to evolve throughout its lifetime as results emerged from the research phase and from the testing and experimental phase. The Rapid Application Development (RAD) methodology was therefore deemed the most suitable one.


The RAD model emphasises interleaving project planning with programming the software, either through the creation of fully functional modules or through prototypes used as proof-of-concept for potential features (Larsen, 2008).

In addition, RAD encourages writing test cases proactively. This is crucial for the system being developed, due to the somewhat large and complex interaction between its various components: they are added at different stages, and a new component should not change previous behavior or cause unintended changes in the interaction and relationships among previously built components (Roman, 2005).

The RAD model is adopted by various software organizations, and its increasing usage was the primary reason behind several features added to modern Integrated Development Environments (IDEs) such as Visual Studio to facilitate and accelerate the RAD process (Hejlsberg, 2007).

Fig. 2 shows The RAD Model (Leffingwel, 2007).





Fig. 2 The RAD Model (Leffingwel, 2007)

1.3.2 Tabular Representation


Table 1 Project Schedule

Time Period: September 2012 - November 2012
Tasks:
• Implementation of the server side HTTP Server with support for:
  o Displaying WebPages
  o Displaying Regular files
  o Handling Cookies
  o Handling Forms (GET and PUT requests)
• Initial research about network structure and cryptography and authentication mechanisms

Time Period: December 2012 - January 2013
Tasks:
Implementation of the network portion:
  o Data and command transfer mechanisms.
  o Authentication, cryptography and other security measures of the protocol.
  o Testing of the network portion of the project on a sample network of 10-20 computers.

Time Period: February - March 2013
Tasks:
  o Implementation of the dynamic programming language.
  o Thesis Writing.

Time Period: April 2013
Tasks:
  o Testing the program on medium-sized networks
  o Building sample dynamic websites using the language.
  o Building sample multi-peer applications using the language: For example, a chat application.
  o Building sample distributed applications based on the programming language [e.g. Merge sort, Chat application, Controlling of multiple client computers simultaneously]

Time Period: May 2013
Tasks:
  o Continuation of testing
  o Thesis Writing



1.3.3 Gantt Chart Representation


A Gantt chart representation of the major milestones is included in Appendix A.

1.4 Licensing



Considering the nature of the produced work, it was concluded that an open source release would fit best, as it would allow maximum collaboration among developers interested in creating related works based on the system, maintaining it, and adding new features and customisations as they please.


Putting the work in the public domain, lifting any copyright restrictions was
one option. However, to prevent potential inappropriate usage by commercial

entities which might add features to the system and sell them for
-
profit in
unrelated products, gaining significant amounts of money without
benefiting

back
the open source community

and not putting attribution back to the author or to
the community membe
rs, it was decided to choose the Creative Commons

ShareAlike license. The license allows for use on the condition of sharing back
the source code that was added to the system, thus ensuring it remainin
g as free
and open source.


The full detailed legal code of the license is provided in Appendix E.

Fig.
3

shows a summary for the Share Alike license agreement (Creative Commons,
2012).




Fig. 3 Creative Commons License Summary (Creative Commons, 2013)






Chapter 2

Review of Literature




2. Review of Literature

2.1 Computer Networking

2.1.1 OSI Model


The Open Systems Interconnection (OSI) model is a conceptual model for
representing the abstract layers which form the current communication model
over the Internet (Stallings, 1987). Fig. 4 shows the seven layers of the OSI
model (Mintzberg, 2010).


Fig. 4 The OSI Model (Mintzberg, 2010)

The Physical layer is concerned with the actual physical infrastructure in
terms of routers, gateways, hubs and fibre optic cables. The Data Link layer is
concerned with the transmission of data between directly connected network
entities. The Network layer is responsible for transmitting a data packet from a
host on a particular network to another host on another network (this is where
the IP protocol operates). The Transport layer is concerned with the transmission
of data to the applications at the layers above it (this is where the TCP
protocol operates). The layers above the Transport layer are responsible for
operating directly with individual user applications. Some of the upper layers
may not be used directly.

2.1.2 The IP Protocol


The IP protocol is the main protocol used to identify networked nodes, both in
a Local Area Network (LAN) and on the Internet. The main function of IP is
routing messages from one location to another given a certain address
(Forouzan, 2002). The most used version of IP currently is IPv4, which has a
32-bit address space (about 4.3 billion addresses) and is in the process of being
gradually replaced with the more recent IPv6, which has a 128-bit address space,
to satisfy the increasing demand for more connected nodes on the Internet
(Deering, 1998).
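As a rough illustration of the two address spaces, the following sketch uses Python's standard `ipaddress` module to count the addresses available in each IP version. The sample host address and network are hypothetical values from a documentation range and are not taken from the project itself.

```python
import ipaddress

# The whole IPv4 and IPv6 address spaces, expressed as networks.
ipv4_space = ipaddress.ip_network("0.0.0.0/0")
ipv6_space = ipaddress.ip_network("::/0")

ipv4_count = ipv4_space.num_addresses  # 2**32 addresses
ipv6_count = ipv6_space.num_addresses  # 2**128 addresses

# Hypothetical example: check whether a host address belongs to a network.
host = ipaddress.ip_address("192.0.2.10")
lan = ipaddress.ip_network("192.0.2.0/24")
host_in_lan = host in lan

print(ipv4_count, ipv6_count, host_in_lan)
```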

Fig. 5 shows the structure of an IP packet (Cisco, 2012).


Fig. 5 IP Packet Structure (Cisco, 2012)



2.1.3 The TCP Protocol


The TCP protocol serves as one of the backbone protocols of the Internet
together with IP. It transmits messages in ordered sequences, guaranteeing a
reliable and error-checked transmission across various nodes, assuming no
malicious interference (Forouzan, 2002). The primary drawback of TCP is that it
was not designed with adversaries in mind (Bellovin, 1989). Thus, it is possible
for a third party to intercept the traffic between two connected parties and
listen to or alter it without a clear method for detection, assuming the usage
of TCP alone.

Fig. 6 shows the structure of a TCP packet (Cisco, 2012).


Fig. 6 TCP Packet Structure (Cisco, 2012)


2.1.4 Client-Server Web Structure


The traditional web, which was introduced by Tim Berners-Lee in 1990, became
the basis for hosting the majority of websites on the Internet until now. The
fundamental intuition behind the web is that it consists of three main entities:
clients, servers and service providers (Berners-Lee, 2001).

Clients: The users interested in visiting hosted websites.

Servers: The computers which host the content, and which the clients connect to
and obtain their data from.

Service providers: Includes entities such as Internet Service Providers (ISPs),
domain name servers and proxy servers.

Fig. 7 shows a diagram of the traditional web model.





[Diagram: Client 1, Client 2 and Client 3 each connect through their own ISP to a single Server]

Fig. 7 Client Server Model



The major drawbacks of the traditional web are:



DDoS attacks: Distributed Denial of Service attacks exploit a major flaw in the
architecture of the web, which is that the bandwidth left available to the
server is inversely proportional to the amount consumed by the clients
connecting to it. Put simply, the more clients demand content from a server, the
more stressed that server becomes, and eventually, if the attack is strong
enough, it no longer has sufficient bandwidth for delivering any content. Even
without DDoS attacks specifically, paying for the bandwidth for hosting content
is the main cost factor for any major website with a lot of traffic (Hills, 2006).

Vulnerability to censorship by the service providers: For example, governments
blocking websites for political reasons (Harwit, 2001).

Vulnerability to the abuse of legal frameworks of the Internet: For example, a
website’s domain name could be seized by the country operating it (e.g. the
United States has the right to seize any .com domain name) (Cukier, 2005).

Vulnerability to spoofing by the service providers: An ISP or a malicious hacker
might redirect the traffic coming to a website to another one in order to steal
information (Beverly, 2009).

Vulnerability to interception: For example, governments recording chat and email
conversations of citizens (Sinclair, 2002).

Those drawbacks, coupled with other factors, led to the proliferation of
Peer-to-Peer networks, which began gaining significant momentum in the late
1990s.





2.1.5 Peer-to-Peer Networks



Peer-to-Peer networks are an alternative model for establishing connections
between a group of computers interested in communicating and exchanging data or
resources. The model does not make use of a centralized server; rather, it
relies on the peers themselves to act as both clients and servers within the
system (Bellovin, 2001).

The first generation of P2P networks included the infamous Napster and Kazaa.
The main feature of the first generation was a centralized group of servers for
the purpose of locating peers and files on the same network (Bellovin, 2001).
That led to the following problems:

The central servers were able to monitor the entire network, which led to
several potential privacy concerns.

As the networks grew in terms of number of peers and files, scaling up the
coordination and search indexing servers became a critical issue, and if a
critical number of the main command and control servers went down, the entire
network would effectively be shut down.

It was relatively easy to block access to such networks, either by filing
copyright lawsuits based on the claim that they facilitate sharing copyrighted
files, or by merely blocking the domain name entries of the centralized servers
from a particular network (Yang, 2008).

Most implementations were based on downloading each file from one peer, in one
piece, much like the traditional server implementation (Pouwelse, 2005).




Thus, despite the hype surrounding P2P in its infancy, the first generation
networks did not provide a sufficient solution to the problems faced by the
traditional web.


The second generation of P2P networks began gaining momentum in the mid 2000s
in response to the problems of the first generation, in particular the shutting
down of Napster. Examples include BitTorrent and Gnutella. This is the current
generation of P2P networks in operation around the world.


The major traditional web problems which the second generation P2P networks
solved are:

Immunity against DDoS attacks: There were no longer centralized servers to
search for hosted files or coordinate peers, and each peer in the network
contributed its bandwidth to the swarm of peers requesting a file, even before
fully downloading that file. Thus, the more requests are made for a file, the
more bandwidth it gets (Garbacki, 2005).

Splitting files into multiple pieces, also known as chunks: A chunk is a part of
a file, defined by a start offset and a number of bytes. This facilitates
downloading files from multiple peers simultaneously, where each peer
contributes one or more chunks of the file (Epema, 2005).
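The chunking idea can be sketched in a few lines. The following is a minimal illustration, not the BitTorrent wire format: the function names and the 4-byte chunk size are assumptions chosen for readability. Each chunk is stored with its start offset, so the pieces can arrive from different peers in any order and still be reassembled.

```python
import random

def split_into_chunks(data: bytes, chunk_size: int):
    """Return a list of (start_offset, chunk_bytes) pairs covering the data."""
    return [(offset, data[offset:offset + chunk_size])
            for offset in range(0, len(data), chunk_size)]

def reassemble(chunks, total_length: int) -> bytes:
    """Rebuild the file from chunks received in arbitrary order."""
    buffer = bytearray(total_length)
    for offset, piece in chunks:
        buffer[offset:offset + len(piece)] = piece
    return bytes(buffer)

file_data = b"peer-to-peer distributed computation"
chunks = split_into_chunks(file_data, 4)  # hypothetical chunk size
random.shuffle(chunks)                    # simulate out-of-order arrival
rebuilt = reassemble(chunks, len(file_data))
print(rebuilt == file_data)
```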

The major drawbacks of the second generation of P2P networks are:

Lack of encryption in most default implementations: This did not help in solving
the censorship issue, as many ISPs and governments collaborated in installing
traffic analyzers which could determine the content being downloaded and link
it back to the users who downloaded it. In recent years there has been an
increasing number of lawsuits against individuals who downloaded content (e.g.
movies) from P2P networks such as BitTorrent (Hatehet, 2010).

Lack of a centralized way to search the network for files, due to the lack of
centralized servers: This drawback did not exist in the first generation, but
had to be accepted in the second generation to keep the network from being an
easy target to take down (Tian, 2006).

Dependence on “tracker servers”: Each file hosted on BitTorrent or similar
networks is tracked by one or more servers, whose role is to help a new peer
find the peers that have the file so that it can download the file from them. If
all the tracker servers for a file go down, it is no longer possible to locate
and download the file (Wu, 2006).

Fig. 8 shows an illustration of the BitTorrent network (Siganos, 2009).


Fig. 8 Illustration of the BitTorrent Network (Siganos, 2009)



In addition to P2P networks for sharing files, other P2P networks have been
introduced in recent years for the purpose of providing anonymity for servers.
One particularly popular example is the Tor network.

Fig. 9 shows the logo of the Tor network, which has become quite famous in
recent years in various news outlets (The Tor Project, 2012).


Fig. 9 The Tor Network Logo (The Tor Project, 2012)

The Tor network (and other similar networks such as Freenet), which was
introduced in 2004, provides a method for web servers to host content without
being directly observable. Servers are identified and authenticated by their
public key, which acts as a pseudo domain name identifier. The basic intuition
is that one does not connect to the server hosting the content directly, but
instead goes through a series of at least 3 other nodes in the network prior to
reaching the server, with a public key based authentication mechanism in place
to verify that one is connecting to the intended server on the far end of the
network (McCoy, 2008).

However, the Tor network (and similar ones like Freenet) does not provide the
ability to host files within the network itself, as all the hosted websites are
hosted on their individual servers, as is the case in the traditional web. Thus,
while the Tor network might provide anonymity for servers, it still has most of
the drawbacks of traditional web servers:

DDoS: Tor network servers can still be vulnerable to DDoS attacks, just like the
regular web. In fact, DDoS attacks on Tor-hosted servers have the additional
major problem of slowing down the entire Tor network significantly, as they slow
down not just the server hosting the data but all the intermediate nodes for
each connection being made to the server.

Anonymity: Despite the connection being passed through several nodes, the
anonymity of the data hosted on Darknet networks such as Tor can still be
compromised via two primary methods:

Mass-network sniffing: An entity capable of dedicating a sufficient amount of
computing resources could monitor and analyze the communication patterns of a
large number of nodes (connected peers) in the Tor network and, by various means
of statistical correlation analysis, obtain several clues about where the actual
servers are hosted.

Server software exploits: An expert in server software or a hacker can find an
exploit in the server software enabling him/her to learn the server's actual IP
address and take it down.

Fig. 10 shows an illustration of the functional structure of the Tor network
(McCoy, 2008).




Fig. 10 Illustration of the Tor Network (McCoy, 2008)

Thus, the state of P2P networks is currently a dichotomy. At one end of the
spectrum, networks such as BitTorrent offer to host files only; at the other
end, networks such as Tor offer anonymity for existing servers, which allows
hosting full websites on regular servers. Neither provides true security for the
content. In the case of P2P file sharing networks such as BitTorrent, encryption
is typically non-existent or optional at best, and in the case of Darknets like
the Tor network, the true location of the server could still be exposed, and
even with the location of the servers remaining anonymous, they are still
vulnerable to DDoS attacks like the traditional web.

It is also worth mentioning that besides those two major architectures for P2P,
which are currently used by millions around the world, there exist other
fragmented “Ad-Hoc” P2P networks for the specific purpose of utilizing the
computing power of the peers, such as SETI@Home, which aims to gather enough
computing power for analyzing mass data and signals in search of signs of
intelligence in outer space (Anderson, 2002).

Fig. 11 shows the SETI@Home project logo (SETI@Home Labs, 2013).


Ad-hoc networks such as SETI@Home have yet another major drawback. When a new
distributed computing task becomes available, programmers usually create new
P2P applications from scratch to support it. This leaves them with the
challenging task of finding users to join the network and distributing their
application to them, in addition to having to make sure that the networking and
communication code in the new programs is correct and works as intended.

Fig. 11 SETI@Home Network Logo (SETI@Home Labs, 2013)




2.2 Distributed Systems

2.2.1 DHT


Distributed Hash Tables (DHTs) are a form of distributed systems which offer
the basic functionality of a hash table, that is:

Storing key-value pairs (k, v).

Looking up a value given its key.

DHTs are intended to be formed by an arbitrary number of nodes connected
together in a peer-to-peer network, without any centralized servers or any form
of “master” nodes handling the coordination among the participating nodes in the
system (Gaurang, 2013).

Reliability-wise, DHTs are designed to be as resilient as possible, allowing
nodes to enter and leave unconditionally without causing any significant degree
of disruption for the rest of the nodes or the stored values (Naor, 2003).

Prior to the invention of DHTs, lookups were still possible; however, they
relied heavily upon recursive algorithms, which were quite inefficient as they
required traversing the entire network in a recursive fashion. This caused a
general slowdown for any file lookup, and created potential for misuse and for
spreading malware through forging the returned results (Kalafut, 2006).

With the increasing growth of the Internet, in particular of high-bandwidth
always-on computers, DHTs gained a significant amount of momentum and found
several applications in which hundreds of thousands and up to tens of millions
of peers were connected simultaneously. DHTs most of the time act as a “middle
layer” for such applications, facilitating a part of their major functionality
that could not be done otherwise. Applications currently using DHTs include:




BitTorrent: The popular file sharing system, which uses DHTs to avoid having
centralized “tracker” servers which could be taken down by governments or by the
sheer amount of overwhelming traffic for popular content. In the case of
BitTorrent, the DHT maps from the hash code of a particular file (the key) onto
another file that holds more detailed information about the torrent structure,
which includes sub-files, each with its hash and block size (Cohen, 2008).
Fig. 12 shows the BitTorrent logo (BitTorrent Labs, 2013).

Fig. 12 BitTorrent Logo (BitTorrent Labs, 2013)

Bitcoin: The increasingly popular online virtual currency makes use of a DHT to
keep track of all the financial transactions made among all peers, to prevent
issues such as double spending and race conditions which could lead to
unintended behaviour such as spending the same currency token twice (Babaioff,
2012).

Freenet: An anonymous peer-to-peer network which makes use of DHTs for storing
the information about files stored on the network, mapping from their names to
their hash codes and actual content (Clarke, 2012).

Botnets: Such as the Kelihos and Storm botnets, which make use of DHTs to
coordinate attacking targets without having a “master” command and control
server that could be disrupted by law enforcement agencies (Ortloff, 2011).




DHTs have several parameters, some of which include:



Key size (address space): Assuming the keys are numerical (which could be
generalized to any other type), the key size determines the number of different
values a key could have. Typically powers of two are used. A key size of 128
bits means 2^128 different values of the key are possible.

Lookup method: This is how lookups are performed for a particular key. Methods
are either iterative (where one node iterates through the required nodes to find
the required value) or recursive. Most DHTs prefer an iterative lookup method,
as it puts less stress on the network.

Stored object type: Primitive DHTs store numbers or single strings; however,
various ones allow the storage of arbitrary objects. The stored object type
depends less on the DHT structure than on the actual implementation details.

Redundancy: For improving reliability, redundancy is usually a critical
parameter for DHTs, whereby a single key-value pair is stored at multiple nodes
to prevent the loss of data in case a node goes offline, and to accelerate the
lookup process for popular values.
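To make these parameters concrete, here is a minimal single-process sketch of a DHT. It is an illustration under stated assumptions rather than any real network's design: the class name `ToyDHT`, the use of SHA-1 to derive 160-bit IDs, the choice of XOR to decide which nodes are "closest" to a key, and the replication factor of 2 are all hypothetical.

```python
import hashlib

REPLICAS = 2  # redundancy parameter: number of nodes storing each pair

def make_id(name: str) -> int:
    """Derive a 160-bit numerical ID from a name (assumed scheme)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class ToyDHT:
    def __init__(self, node_names):
        # Each node is just a local dictionary keyed by its numerical ID.
        self.nodes = {make_id(name): {} for name in node_names}

    def owners(self, key_id: int):
        """The REPLICAS nodes whose IDs are closest to the key."""
        return sorted(self.nodes, key=lambda node_id: node_id ^ key_id)[:REPLICAS]

    def put(self, key: str, value):
        key_id = make_id(key)
        for node_id in self.owners(key_id):
            self.nodes[node_id][key_id] = value

    def get(self, key: str):
        key_id = make_id(key)
        for node_id in self.owners(key_id):
            if key_id in self.nodes[node_id]:
                return self.nodes[node_id][key_id]
        return None

dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("report.pdf", "stored value")

# Simulate the closest node going offline: the replica still answers.
primary = dht.owners(make_id("report.pdf"))[0]
del dht.nodes[primary]
print(dht.get("report.pdf"))
```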





2.2.2 Kademlia


One of the major DHT structures widely used on the Internet is the Kademlia
structure.

Kademlia was designed by Petar Maymounkov and David Mazières in 2002. It is
currently used in various file sharing networks such as the Kad network, the
BitTorrent network and the Bitcoin network.

In Kademlia, each node and each file is supposed to have a unique numerical
identifier. In their paper, Maymounkov and Mazières did not put restrictions on
which values should be used, as long as they were uniformly random
(Maymounkov, 2002).

Kademlia introduces the notion of a “distance metric”, which is used to measure
how close two nodes are to each other. The idea of a distance metric was used in
other DHTs as well, such as an older DHT system called “Chord”. The primary
motivation behind it is the intuitive fact that if one wants to find a value,
one would go to the closest peer one knows about in the system; if that peer is
not the intended one, it returns the list of the closest peers it knows about,
which one would then ask in turn, in an iterative process, until the value is
either found or the search space is exhausted (Maymounkov, 2002).

The distance metric in Kademlia is the Exclusive Or (XOR) of two IDs,
interpreted as a number. The advantage is that XOR satisfies the generic
distance properties, namely:

Self distance is always equal to zero: XOR(X, X) = 0

Symmetry: XOR(X, Y) = XOR(Y, X)

Triangle inequality: XOR(X, Y) + XOR(Y, Z) >= XOR(X, Z); that is, the sum of any
two sides is never smaller than the third side.
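The three properties above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the 160-bit random IDs are an assumption matching the address size mentioned later in this section.

```python
import random

def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two numerical IDs."""
    return a ^ b

def satisfies_metric_properties(ids) -> bool:
    """Check self distance, symmetry and the triangle inequality."""
    for x in ids:
        if xor_distance(x, x) != 0:
            return False
        for y in ids:
            if xor_distance(x, y) != xor_distance(y, x):
                return False
            for z in ids:
                if xor_distance(x, z) > xor_distance(x, y) + xor_distance(y, z):
                    return False
    return True

sample_ids = [random.getrandbits(160) for _ in range(30)]
print(satisfies_metric_properties(sample_ids))

# Picking the known node closest to a target ID, as a lookup step would.
target = sample_ids[0]
closest = min(sample_ids[1:], key=lambda node: xor_distance(node, target))
```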

In its lookup process, Kademlia makes no distinction between looking up peers
and looking up keys, as both have the same address structure. A lookup function
takes as parameters an ID and a flag indicating whether a value is being looked
up. When a node receives a lookup request, it examines the closest nodes it
knows about relative to the requested ID. The primary difference is that when
looking up values, a node returns the value if it owns it; otherwise it returns
the list of the IDs of the closest nodes it knows about. The process keeps going
as long as the newly retrieved nodes are closer to the requested ID than the
previous ones. In the worst case, that would mean advancing a single bit at a
time.
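The iterative process described above can be sketched on a tiny simulated network. This is a simplified illustration, not the full Kademlia protocol: there are no k-buckets, no parallel queries and no value lookups, and the 16-node network in which every node knows exactly the nodes differing from it in one bit is a hypothetical worst case chosen so that the search advances a single bit per step.

```python
ID_BITS = 4  # assumed toy address size

def build_routing():
    """Each node knows only the nodes whose ID differs from its own in one bit."""
    return {node: {node ^ (1 << bit) for bit in range(ID_BITS)}
            for node in range(2 ** ID_BITS)}

def iterative_find_node(routing, start: int, target: int):
    """Repeatedly ask the closest known node until no closer node is returned."""
    current = start
    hops = [current]
    while True:
        candidates = routing[current] | {current}
        closest = min(candidates, key=lambda node: node ^ target)
        if (closest ^ target) >= (current ^ target):
            return current, hops  # no progress possible: search finished
        current = closest
        hops.append(current)

found, hops = iterative_find_node(build_routing(), start=0b0000, target=0b1111)
print(found, hops)
```

In this worst case the number of hops equals the number of differing bits, which is where the logarithmic bound on lookups comes from.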

The primary advantage of this design is that all lookups are O(log n), where n
is the number of nodes in the network (Maymounkov, 2002). To illustrate the
advantage of such a speedy lookup process, the logarithmic function would mean,
for example, about 24 lookup steps if the network has 10 million simultaneously
connected nodes and no more than 32 if the network has 4 billion simultaneously
connected nodes.
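The figures can be reproduced with a short calculation, assuming the bound is the base-2 logarithm of the network size rounded up (the helper name is hypothetical):

```python
import math

def max_lookup_steps(node_count: int) -> int:
    """Upper bound on lookup steps, assuming one bit is resolved per step."""
    return math.ceil(math.log2(node_count))

steps_10_million = max_lookup_steps(10_000_000)
steps_4_billion = max_lookup_steps(4_000_000_000)
print(steps_10_million, steps_4_billion)
```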

Fig. 13 shows a Kademlia binary tree example (Maymounkov, 2002).




Fig. 13 Kademlia Binary Tree Example (Maymounkov, 2002)





Fig. 14 shows the Kademlia lookup process illustrated by an example
(Maymounkov, 2002).



Fig. 14 Kademlia Lookup Process (Maymounkov, 2002)


For practical purposes, the original Kademlia paper suggested that in networks
with a large address space (for example 160 bits), the lookup be divided into
“chunks” of the address. That means that instead of comparing all 160 bits, in
which case the bits following the first few would have a probability of almost
zero of matching, every i-th bit is compared instead, thus allowing a more
efficient storage facility for the known nodes.



For reliability purposes, the original design paper of Kademlia makes the
observation that a node is more likely to remain online in the future the longer
it has stayed online previously (its uptime). Graph 1 shows the measured node
uptime probability (Maymounkov, 2002).


Graph 1 Node Uptime Probability (Maymounkov, 2002)

That observation is used when picking which nodes to keep as “favourite”
nodes, if the distance is the same, or as a
general heuristic rule to optimize the
lookup process.

Implementation-wise, the original design paper suggests using UDP (User
Datagram Protocol). However, in almost all recent implementations UDP has been
dropped in favour of TCP for increased reliability.





2.2.3 Distributed File Systems


A Distributed File System (also known as a Clustered File System or Parallel
File System) is a system for serving files that enables accessing them from a
variety of computers simultaneously (Baker, 1991). Such systems are either
centrally managed (such as the Google File System or Microsoft’s Distributed
File System) or decentralized (either fully, or partially by means of several
“master” nodes monitoring the file system).

Distributed File Systems are judged by the following criteria (Howard, 1988):

Avoiding a single point of failure: Due to the large number of connected nodes,
the failure of any number of nodes is expected and should be considered in the
design process.

Performance: The overhead of data exchange should be kept minimal to avoid
exponential performance degradation as the number of connected nodes and stored
files increases.

Concurrency: Mechanisms should be in place to avoid collisions between multiple
versions of files issued by various parties.

Security: Measures should be put in place to prevent a single node or a small
number of misbehaving nodes from abusing the system and causing irrecoverable
damage.





2.3 Security

2.3.1 Symmetric Key Encryption


A symmetric key encryption algorithm is one that, given an input (called
“plaintext”) and a key, outputs what is called “cipher text”, which is produced
by deterministic transformations of the plaintext based on the key. The
corresponding decryption algorithm does the reverse process of taking a cipher
text and transforming it back to the original plaintext by making use of the
same key that was used for encryption (hence the “symmetric key” terminology)
(Bellare, 1997).

Two major types of symmetric key encryption methods exist:

Stream ciphers: These encrypt a stream one character (or token) at a time. They
have the advantage of being “online”, as in encrypting the data on the fly, with
minimal overhead and time (Shujun, 2001).

Block ciphers: These encrypt the data one block at a time, padding the plaintext
whenever needed to match the block size specified for the algorithm. They make
use of various “rounds”, whereby in each round a group of permutations is
performed on the data from the previous round, based on a new key derived from
the previous round's permutations on the original key (Bellare, 2000).

Both types of ciphers produce an output that is roughly the same size as the
input (a block cipher adds at most one block of padding), so the space overhead
is minimal. The runtime requirements are minimal as well, as the operations were
designed to be conducted in a speedy fashion, and various processors even have
built-in special purpose instructions particularly for block cipher algorithms.
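The symmetric property (the same key both encrypts and decrypts) and the size-preserving behaviour of a stream cipher can be shown with a toy example. This sketch is for illustration only and must not be used for real security: the keystream construction from SHA-256 and a counter is an assumption made here for simplicity, not a standard cipher and not the algorithm used by the project.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from the key (toy construction)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])

def toy_stream_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

key = b"shared secret key"
plaintext = b"message sent between two peers"
cipher_text = toy_stream_cipher(key, plaintext)
recovered = toy_stream_cipher(key, cipher_text)
print(len(cipher_text) == len(plaintext), recovered == plaintext)
```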



The Data Encryption Standard (DES) used to be the dominant block cipher until
the late 1990s, when advances in computing power rendered it obsolete. Instead,
the Advanced Encryption Standard (AES) emerged in 2000 as the winner of a
contest held by the U.S. National Institute of Standards and Technology (NIST)
to create the next generation cipher for use in protecting top-secret documents
(Lipmaa, 2000).





2.3.2 AES Algorithm


Table 2 shows a summary of the AES algorithm features.

Table 2 Summary of AES Algorithm Features (Paar, 2010)

Block size    128 bits (fixed)
Key sizes     128, 192 or 256 bits
Rounds        10, 12 or 14 (varies according to the key size)


No critical attack methods have been developed against AES, and it has been in
use by the US government since 2001 for all data, including data classified as
“top secret”. The popularity of AES drove various software manufacturers to
support it for encrypting the traffic of their applications. Various software
vendors included AES libraries in their development tools (such as the .Net AES
library and the Java AES library).

Hardware manufacturers such as Intel embedded AES instruction sets in their
high-end processors as well, for speeding up the encryption and decryption
process. A throughput of 700 MB per second per thread has been claimed on the
Intel i7 chipset, which was credited largely to the parallel enhancements which
were possible for AES (Benadjila, 2009).

Fig. 15 shows the basic operations of the AES algorithm at a high level
(Paar, 2010).





Fig. 15 AES (Paar, 2010)






2.3.3 Public Key Encryption


Public key encryption, also known as “asymmetric key encryption”, makes use of
two distinct keys which are created together. The two keys are referred to as
the public key and the private key. A message M encrypted by one of the keys can
be decrypted by the other (Cramer, 2003).

The primary advantage of public key encryption is authenticity: because one of
the keys is not shared, it can be used either for encryption (ensuring that only
the party who knows the secret key could have encrypted a certain piece of
data) or for decryption (ensuring that only a single party is able to read a
certain message). The former is referred to as a “digital signature”.

Disadvantages of public key encryption include:

Relatively slower speed for encryption and decryption.

Not supporting messages larger than the key size, which causes the need for
relatively sophisticated splitting of the messages.

Adding potential overhead to the encrypted messages if they are smaller than
the key size (Barrett, 1987).

To combat the aforementioned disadvantages, public key encryption is usually
used together with symmetric key encryption, whereby public key encryption
provides the authenticity that is lacking in symmetric encryption, while
symmetric encryption provides the speed and low overhead needed for high-speed
communications (Harkins, 1998).





2.3.4 RSA


The RSA algorithm is the most widely known and used design for public key
encryption. It was first described in 1977 by Ron Rivest, Adi Shamir and Leonard
Adleman.

RSA starts with generating a key pair. Fig. 16 shows an algorithm for
performing that process (Holmes, 2012).











The encryption and decryption are illustrated through the following equations
(“m” is the plaintext and “c” is the cipher text).
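In the textbook formulation of RSA, which is assumed here since the original figures are not reproduced, the public key is a pair (e, n) and the private key a pair (d, n); encryption computes c = m^e mod n and decryption computes m = c^d mod n. The following minimal sketch uses deliberately tiny primes and no padding, so it only illustrates the arithmetic and is not a secure implementation. The function names and the chosen primes are hypothetical.

```python
from math import gcd

def generate_keypair(p: int, q: int, e: int = 17):
    """Build a toy RSA key pair from two small primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def encrypt(public_key, m: int) -> int:
    e, n = public_key
    return pow(m, e, n)  # c = m^e mod n

def decrypt(private_key, c: int) -> int:
    d, n = private_key
    return pow(c, d, n)  # m = c^d mod n

public_key, private_key = generate_keypair(61, 53)
message = 65  # must be smaller than n = 3233
cipher = encrypt(public_key, message)
recovered_message = decrypt(private_key, cipher)
print(recovered_message == message)
```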



Fig. 16 RSA Key Pair Generation (Holmes, 2012)

Early implementations of the RSA algorithm suffered from various issues which
were eventually resolved, some of which include:





Timing attacks: Given the time at which a message was generated, and knowing
that the random number generator used for generating the primes had initialized
its seed with that time, it was possible for a third party to reconstruct the
key pair. This issue has been resolved by using parameters other than the time
for initializing the random number generator, hence the invention of
“cryptographic random number generators” (Jun, 1999).

Enumerable RSA keys: Various faulty RSA key generators would generate only an
enumerable number of keys (in the range of a few million), which could all be
enumerated eventually and tested against various keys. This issue has been
resolved by the invention of more secure RSA key generators (Kelsey, 1998).

Small key sizes: Using enough computing power (dozens or hundreds of machines),
which is currently obtainable through services which rent processing time such
as Amazon EC2 or Microsoft Windows Azure, it is possible to crack RSA keys of
smaller sizes such as 256 or even 512 bits. This issue led to the recommendation
of using key sizes of at least 1024 bits, with 2048 bits becoming an industry
standard (Courtois, 2007).
.





2.3.5 Key Exchange


As previously mentioned, symmetric key encryption algorithms such as
AES are quite efficient and fast, yet they lack the authentication needed
to operate on an adversarial open medium such as the Internet. Using AES for
encryption on the Internet is acceptable provided that the two parties can first
exchange the required encryption key in a safe manner, without any potential
adversary being able to intercept it.

The RSA key exchange between parties A and B proceeds as follows (on the
assumption that party A is the one initiating the exchange):



“A” generates a random number to be used once and only once (called a
nonce, as in “number used once”).

“A” encrypts the nonce with “B”’s public key, so that only “B” can decrypt it.

“A” sends the encrypted nonce to “B”.

“B” receives the encrypted nonce, decrypts it, concatenates a nonce of its
own encrypted with “A”’s public key, and sends the new message to “A”.

At this point, “A” has authenticated “B”. “A” then decrypts “B”’s nonce and
sends it back to “B”, together with an AES key to use for encrypting the
messages exchanged during the rest of the session.

On receiving its nonce, “B” has authenticated “A” as well, and the
secure transmission can begin, with all messages encrypted using the
AES cryptographic algorithm.
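The exchange above can be simulated in a single process as a minimal sketch. It uses textbook RSA with tiny primes, represents the "AES key" as a random integer, and assumes that the session key is sent encrypted with B's public key; these choices, and all function names, are assumptions made for illustration and are not taken from the system described in this document.

```python
import secrets


def make_toy_keypair(p: int, q: int, e: int = 17):
    """Return ((e, n), (d, n)) built from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return (e, n), (d, n)


def rsa_apply(key, value: int) -> int:
    """Apply a public or private key: value^exponent mod modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


def run_handshake():
    a_public, a_private = make_toy_keypair(61, 53)
    b_public, b_private = make_toy_keypair(89, 97)
    limit = min(a_public[1], b_public[1])  # every value must fit both moduli

    # Steps 1 to 3: A creates a nonce and sends it encrypted for B.
    nonce_a = secrets.randbelow(limit - 2) + 2
    to_b = rsa_apply(b_public, nonce_a)

    # Step 4: B decrypts A's nonce and appends its own nonce encrypted for A.
    nonce_a_seen_by_b = rsa_apply(b_private, to_b)
    nonce_b = secrets.randbelow(limit - 2) + 2
    to_a = (nonce_a_seen_by_b, rsa_apply(a_public, nonce_b))

    # Step 5: A checks its nonce, decrypts B's nonce and sends a session key.
    a_trusts_b = to_a[0] == nonce_a
    nonce_b_seen_by_a = rsa_apply(a_private, to_a[1])
    session_key = secrets.randbelow(limit - 2) + 2  # stand-in for an AES key
    final = (nonce_b_seen_by_a, rsa_apply(b_public, session_key))

    # Step 6: B checks its nonce and recovers the session key.
    b_trusts_a = final[0] == nonce_b
    session_key_at_b = rsa_apply(b_private, final[1])
    return a_trusts_b, b_trusts_a, session_key, session_key_at_b
```

Each side is convinced of the other's identity only because the returned nonce could have been recovered solely with the matching private key.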






2.3.4 Hashing Algorithms


A hashing algorithm is a function that takes an input from a universal key
space and maps it to a smaller key space, generating the hash value. Hash codes
are used for a variety of purposes such as authentication of data and integrity
checking. Hash functions which are used for critical data are referred to as
“cryptographic hash functions”, which exhibit various properties to resist
potential cryptanalytic attacks (Preneel, 1994).

Hash functions have the following features:



One-way Function: Given a hash code, it is infeasible to recover
the original value from which it was generated.

Collision Resistance: It is infeasible to find two different values that
have the same hash code.


Fig. 17 shows a hash function used in a phonebook to map names to
phone numbers (Rogaway, 2004).




Fig. 17 A hash function used in a phonebook to map names to phone
numbers (Rogaway, 2004)
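The phonebook idea can be sketched with a small, non-cryptographic hash that maps names of any length onto a fixed number of slots. The hash formula, the slot count and the function names below are assumptions made for illustration, and the sample entries in the usage note are invented; none of them are taken from Fig. 17.

```python
def toy_hash(name: str, slots: int) -> int:
    """Map a name of any length onto one of `slots` positions (not cryptographic)."""
    total = 0
    for ch in name:
        total = (total * 31 + ord(ch)) % slots
    return total


def build_phonebook(entries, slots: int = 8):
    """Store (name, number) pairs in the slot chosen by the hash of the name."""
    table = [[] for _ in range(slots)]
    for name, number in entries:
        table[toy_hash(name, slots)].append((name, number))
    return table


def lookup(table, name: str):
    """Search only the slot that the name hashes to."""
    for stored_name, number in table[toy_hash(name, len(table))]:
        if stored_name == name:
            return number
    return None
```

For example, `lookup(build_phonebook([("John Smith", "521-1234")]), "John Smith")` searches a single slot instead of the whole table. Because many names share few slots, two names can land in the same slot, which is why each slot holds a list.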


2.3.5 SHA 512


A particular family of hashing algorithms is the Secure Hash
Algorithm (SHA). In the 512-bit variation, it produces hashes 512 bits in size
for a given byte array of input (Grembowski, 2002). In 2001 it was approved by
the US National Institute of Standards and Technology (NIST) for use on
government documents for data processing purposes, and has since become the
industry standard, with supporting libraries in modern programming frameworks
such as .Net and Java.

The primary security aspect of SHA is its ability to map inputs onto unique
outputs with a negligible probability of collision, given inputs of reasonable
size. That is partly due to its larger key space compared with previous hash
algorithms (such as CRC32), and partly due to the inherent mathematical
properties built into its design specification to ensure a sufficient level of
complexity.
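A short sketch using Python's standard `hashlib` and `zlib` modules makes the difference in output size concrete. The sample messages and variable names are illustrative assumptions.

```python
import hashlib
import zlib

message = b"distributed computation"
tampered = b"distributed computatioN"   # differs in a single character

digest = hashlib.sha512(message).digest()
tampered_digest = hashlib.sha512(tampered).digest()

digest_bits = len(digest) * 8            # size of the SHA-512 output in bits
checksum = zlib.crc32(message)           # CRC32 yields a 32-bit integer
```

A change of one character yields a different digest, which is what makes the hash useful for integrity checking.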



SHA divides the input into blocks, and makes use of a cryptographic
compression function to reduce each block to a fixed size, which is given by
the number of bits in the SHA variant (512 bits for SHA-512).
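The block-oriented processing can be observed through `hashlib`, which accepts input incrementally. In the sketch below the input is fed one block at a time, and the result matches hashing the whole input at once; the helper name is an assumption, and the compression function itself stays inside the library.

```python
import hashlib


def sha512_by_blocks(data: bytes) -> bytes:
    """Feed the input to SHA-512 one internal block at a time."""
    hasher = hashlib.sha512()
    block_size = hasher.block_size        # internal block size in bytes
    for start in range(0, len(data), block_size):
        hasher.update(data[start:start + block_size])
    return hasher.digest()                # fixed-size output regardless of input length
```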

Fig. 18 shows the operation of the SHA-512 algorithm (Grembowski, 2002).


Fig. 18 SHA-512 (Grembowski, 2002)





2.4 Web Services

2.4.1 HTTP Protocol

The Hypertext Transfer Protocol (HTTP) is the most commonly used
protocol on the web for transmitting web pages. It was designed in 1991 by Sir
Tim Berners-Lee to satisfy the need for serving web pages to a variety of
different computers which had not shared a common protocol for message
exchange before.

HTTP is a stateless protocol, which does not maintain memory of any
previous messages exchanged during a communication session. The primary
operations it supports are “Request”, with a parameter indicating the requested
resource (for example a web page path), and “Response”, which carries the
reply to a previously sent request message (Fielding, 1999).

HTTP is a client-server protocol, whereby the client entity connects to a
server and sends its messages, and the server replies to the client. In HTTP, a
client is known as a “user agent”. The most widely adopted HTTP standard is
HTTP/1.1, which was released in 1999.
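The request and response cycle can be sketched with Python's standard library by starting a small server on the local machine and fetching one page from it. The handler class, the page content, the path and the user agent string are illustrative assumptions.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the same small page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path: str = "/index.html"):
    """Start a local server, send one request as a client, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        client.request("GET", path, headers={"User-Agent": "sketch-client"})
        response = client.getresponse()
        result = (response.status, response.read())
        client.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Because the protocol is stateless, a second call to `fetch_once` would be handled with no knowledge of the first.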

Fig. 19 shows the basic operation of the HTTP protocol for a client to
retrieve a page from a server (Budgen, 2003).




Fig. 19 Basic operation of the HTTP protocol for a client to retrieve a page
from a server (Budgen, 2003)

HTTP supports a number of status codes to indicate the outcome of a request
(Berners-Lee, 1996). Table. 3 shows some of the HTTP status codes and their
meanings.

Table. 3 HTTP Status Codes (Berners-Lee, 1999)

Status Code    Semantic Meaning
200            No errors, request satisfied successfully
301            Web page moved permanently
305            A proxy is required
403            Access denied
500            Internal Server Error
404            Page not found
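The codes in the table can be looked up programmatically. The sketch below uses the `HTTPStatus` enumeration from Python's standard library, which carries the standard reason phrase for each code; these phrases are worded slightly differently from the meanings given in the table. The list and helper name are illustrative assumptions.

```python
from http import HTTPStatus

TABLE_CODES = [200, 301, 305, 403, 500, 404]


def describe(code: int) -> str:
    """Return the numeric code followed by its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


descriptions = [describe(code) for code in TABLE_CODES]
```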




In addition, HTTP supports the notion of “cookies”, which are small chunks
of text (typically not more than 4 kilobytes) which, when sent by the server,
are stored locally by the browser and then sent back to the server with every
subsequent request (Bristol, 2001).
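The round trip of a cookie can be sketched with the standard `http.cookies` module: the server emits a `Set-Cookie` header, and the client stores the value and returns it in a `Cookie` header on later requests. The helper names are assumptions, and the cookie name and value in the usage note are invented for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name: str, value: str) -> str:
    """Server side: build the header that asks the client to store a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()


def make_cookie_header(set_cookie_header: str) -> str:
    """Client side: parse the stored cookie and build the header sent back."""
    received = SimpleCookie()
    received.load(set_cookie_header.split(":", 1)[1].strip())
    pairs = [f"{key}={morsel.value}" for key, morsel in received.items()]
    return "Cookie: " + "; ".join(pairs)
```

Calling `make_cookie_header(make_set_cookie_header("session_id", "abc123"))` produces the header a browser would attach to its next request to the same server.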

Fig. 20 shows the structure of HTTP header messages (Berners-Lee, 1999).


Fig. 20 HTTP Message Structure (Berners-Lee, 1999)

Various HTTP clients exist, such as Internet Explorer, Mozilla Firefox and
Google Chrome.

HTTP servers exist as well, either through independent applications such
as Microsoft’s Internet Information Services (IIS) or natively in operating
systems, such as the “HTTPServer” Windows APIs.





2.4.2 Software as a Service (SaaS)


Software as a Service (SaaS) is a growing industry worth 12 billion US
dollars a year. The primary aspect of SaaS revolves around offering
applications which make use of significant and expensive computation or
networking power (Buxmann, 2008).


Such applications are accessible through relatively affordable “thin
clients”, a term that refers to computing equipment with the minimal
capabilities of an interface to a larger system running in the background
(Buxmann, 2008).

SaaS has existed in various forms since the dawn of the
modern computing era in the 1960s, when mainframes had “terminals”
accessible by various researchers and other computer users, who could
make use of the powerful mainframe resources by means of the terminals
providing the input/output mechanism (Turner, 2003).

More recently, the primary focus of SaaS has shifted to “cloud”
platforms. “Cloud” platforms nowadays refer to services offered by major
corporations, such as Amazon’s EC2 and Microsoft’s Windows Azure
(Mell, 2011).


Those corporations make use of their ability to purchase large
amounts of computing machinery and store them in what are called
“server farms”. A typical server farm would have 10,000-100,000
machines with storage in excess of petabytes, and a combined
processing power equivalent to that of a supercomputer (Gandhi, 2010).


By having the computing infrastructure in place, such corporations
could quite effortlessly lease portions of it for small individual requests at a
minimal cost, as the bulk of the cost is distributed evenly among millions of
potential users.
