P2P Architectures

Klaus Marius Hansen

University of Aarhus

2003/09/09

Introduction to software architecture. Reference architectures for P2P systems.
Communication paradigms for P2P computing

P2P Architectures



Material




Software Architecture Introduction




P2P Reference Architectures




P2P Communication Paradigms




Summary


Material

Material



(Taylor & Dashofy 2001)

o

Taylor, R.N. and Dashofy, E.M. (2001) Function Follows Form: Architecture
and 21st Century Software Engineering. In Proceedings of the Vanderbilt
Workshop on New Visions for Software Design & Productivity: Research & Applications


o

A general discussion of architectures for novel application areas, focussing on
publish/subscribe and P2P architectures



(Melville, Walkerdine & Sommerville 2003)

o

Melville, L., Walkerdine, J. & Sommerville, I. (2003), P2P Reference
Architectures, Technical Report IST-2001-32708, Lancaster University.

o

Chapter 1 cursory, chapter 2

o

Gives "reference architectures" for P2P computing. Discusses existing systems
in relation to these reference architectures



(Baehni, Eugster & Guerraoui 2002)

o

Baehni, S., Eugster, P. & Guerraoui, R. (2002), OS support for P2P
programming: A case for TPS. Technical Report 200204, Computer and Communication
Sciences, École Polytechnique Fédérale de Lausanne (EPFL).

o

Mixes the two communication paradigms presented in (Taylor & Dashofy
2001): P2P and publish/subscribe. Discusses a prototype built on top of JXTA

Software Architecture Introduction

What is "Software Architecture"?



The software architecture of a program or computing system is the structure of the
system, which comprises software components, the externally visible properties of those
components, and the relationships among them



Implications

o

Every software system has a software architecture



This architecture may be good or bad



Architectures exist independently of their representation

o

Architecture defines components



Architecture balances between abstracting away detail and still
allowing for meaningful analysis of systems

o

Software systems can comprise more than one structure



No structure is the architecture



Canonical reference on software architecture:

o

(Bass, Clements & Kazman, 2003) Bass, L., Clements, P. & Kazman, R. (2003)
Software Architecture in Practice, 2nd edition, Addison Wesley, 2003

o

Definitions stolen from there

Why is Software Architecture Important?



Enables communication among stakeholders

o

Software architecture represents a common high-level abstraction - learning
what a system is all about

o

Basis for mutual understanding, forming consensus, communication among
customers, developers, users



Allows early design decisions

o

Design and build software systems

o

Amenable to analyses



Creates transferable abstractions of a system

o

Relatively small, intellectually graspable model of a system

o

Can be applied to other systems - effective level for addressing reuse issues

A Bad Architecture Description - Why?


Architectural Structures (1)



Conceptual/logical structure

o

Components: Abstractions of the system's functional requirements

o

Connectors: Shares-data-with relations

o

Useful for: Understanding interactions between entities in problem space



Process/coordination structure

o

Components: Processes or threads

o

Connectors: Synchronizes-with, can't-run-without, preempts, ...

o

Useful for: Dealing with process synchronization and concurrency



Uses structure

o

Components: Procedures or modules

o

Connectors: Assumes-the-correct-presence-of relation

o

Useful for: Engineering easily extensible systems (e.g., using incremental build)

Architectural Structures (1) - Example



...

Architectural Structures (2)



Module structure

o

Components: Work assignments with associated products (interface, code, ...)

o

Connectors: Is-a-submodule relations

o

Useful for: Allocating work during maintenance and development



Calls structure

o

Components: (Sub)procedures

o

Connectors: Calls or invokes relations

o

Useful for: Tracing flow of execution



Data flow

o

Components: Programs or modules

o

Connectors: May-send-data-to relations

o

Useful for: Requirements traceability

Architectural Structures (2) - Example



...

Architectural Structures (3)



Physical structure

o

Components: Software mapped onto hardware

o

Connectors: Communicates-with

o

Useful for: Reasoning about performance, availability, security



Control flow

o

Components: Programs, modules or system states

o

Connectors: Becomes-active-after relations

o

Useful for: Verifying functional behaviour as well as timing properties



Class structure

o

Components: Objects

o

Connectors: Inherits-from or is-an-instance-of

o

Useful for: Reasoning about collections of similar behaviour



...

Architectural Structures (3) - Example



...

Architectural Description Languages (ADLs) (1)



Used to describe architectural structures

o

Precise (and more or less formal)

o

Provide overview

o

Useable for analysis and design



A plethora of variants

o

We just saw the Unified Modeling Language (UML) used as an ADL



Stereotyped class diagrams with objects for logical structure



Class diagrams with packages for module structure



Combined deployment and component diagrams for physical/process
structure

o

Another example: The WRIGHT Language

Architectural Description Languages (ADLs) (2)

Configuration SimpleSimulation
  Component TerrainModel(map : Function)
    Port ProvideMap = [Interaction Protocol]
    Computation = [provide terrain data]
  Component VehicleModel
    Port Environment = [Interaction Protocol]
    Computation = [compute vehicle movement]
  Connector UpdateValues(nsims : 1..)
    Role Model1::nsims = [Interaction Protocol]
    Glue = [Data travels from one Model to another]
  Instances
    Pittsburgh : TerrainModel([map of Pittsburgh])
    PAT Bus : VehicleModel
    C : UpdateValues(2)
  Attachments
    Pittsburgh.ProvideMap, PAT Bus.Environment as C.Model
End SimpleSimulation.

An ... Architecture Description


Quality Attributes



Two broad categories of quality attributes

o

Observable via execution



How well does the system satisfy its behavioural requirements?



Does the system act in a timely enough manner?



Does the system function as expected when connected to other systems?

o

Not observable via execution



How easy is the system to integrate?



How expensive was the system to develop?



What was the time-to-market of the system?



Architecture is critical to realizing many quality attributes in systems



Some qualities are not architecture-sensitive

Qualities Discernible at Runtime



Scalability

o

The ability of a system to support an increasing use



Performance

o

The time it takes for a system to react to a stimulus



Security

o

The degree to which a system can withstand attacks



Availability

o

The part of the deployment period during which a system can deliver the
services it implements

Qualities Not Discernible at Runtime



Modifiability

o

The ability to make changes to the system effectively



Portability

o

The ability of a system to run in different execution environments



Reusability

o

The degree to which a system's components are usable in future systems



Integrability

o

The ability of a system to get separately developed components to work
together



Testability

o

How well a system supports effective and efficient testing

Software Architecture Introduction - Summary



Software architecture is concerned with components and connectors of a system



A number of architectural structures exist - each is useful for certain aspects of
architecture design/work



Many quality attributes of software systems are tightly connected to (software)
architecture

P2P Reference Architectures

Reference Architectures et al. (1)



Architectural style

o

A description of component types and a pattern of their runtime control and/or
data transfer

o

Examples: Pipes-and-Filters (Unix tools), Client-Server, Layered Systems

o

Essentially architectural patterns



Reference model

o

A division of functionality together with data flow, typically in mature
application domains

o

Examples: model for compiler, database management system, microkernel
operating system



Reference architecture

o

A reference model mapped onto software components (that will cooperatively
implement the functionality defined in the reference model) and the data flows between
the components

o

Examples: J2EE, OSGi, ISO OSI

Reference Architectures et al. (2)




Architectural styles, reference models, and reference architectures are not architectures

(Melville, Walkerdine & Sommerville 2003)



Reference architectures seen from a conceptual view (and in a
layered style)



Also induced physical/logical network structure

o

Most P2P systems build a logical overlay network



On top of TCP, UDP, HTTP, Bluetooth, ...



More on routing and location later...

o

Different Types

Network structures


Layers



Looks at layers for server, client, and decentralized peers

o

Cf. different network structures



Common layers

o

Network Interface Layer



Represents a physical connection to a network

o

P2P Network Layer



Connection/communication components representing P2P middleware

o

Message Resolver Layer



Sending and retrieving data

o

Realtime Connection Monitor



Monitor and adjust connections in realtime, providing quality of service

o

Workpackage Manager Layer



Assigning and managing work packages

Specific Layers



Server-specific layers

o

Repository Manager Layer



Interfaces to any external data sources

o

Check In/Out Data Layer



Concurrent and verified access to data sources

o

Authentication Layer



Authenticates peers connected to a P2P network

o

Version Control Layer



Organises data into different versions

o

Document Management System Layer



Additional document management functionality (e.g., concurrency
control)

o

Awareness Monitor Layer



Monitor and support peers', users', and resources' awareness and
availability

o

Data Search/Filtering Layer



Searching the index of the Data Repository



Client-specific layers

o

Awareness Controller Layer



Using Awareness Monitor and caching awareness information



Peer-specific layers ("decentralised-specific layers")

o

P2P Network Layer



Publishing and discovery of data since there is no central server

Shared Workspace



Requirements

o

Allowing peers and users to discover, be aware of, communicate with, and identify
each other

o

Ensuring a high quality of service for communications

o

Providing security within the system

o

Allowing data to be stored and managed on the peers

o

Allowing for the creation of a shared workspace user interface for the peers

o

Allowing for the creation of de-centralised and semi-centralised systems



Example

o

Groove

Shared Workspace - Server


Shared Workspace - Client


Shared Workspace - Peer


Search System



Requirements

o

Allowing peers to discover and be aware of each other

o

Allowing peers to discover and be aware of resources (for example, data) that
may exist on the network

o

Allowing peers to communicate with each other, including allowing the
transference of data

o

Allowing peers to search the network for resources, potentially using a variety
of filtering techniques

o

Allowing peers and resources to be uniquely identified on the network

o

Providing security within the system

o

Allowing data to be stored and managed on the peers

o

Allowing for the
creation of a search system user interface for the peers

o

Allowing for the creation of de-centralised and semi-centralised systems



Example

o

Search in Gnutella

Search System - Server


Search System - Client


Search System - Peer


Other Reference Architectures



Document Management



Instant Messenger



Cooperative Environment

Mapping of Reference Architectures - Napster



Hybrid of search and shared workspace reference architectures


Mapping of Reference Architectures - JXTA



"The P2P Network" comparable to Network Interface layer (representing peers on
network)



"JXTA Core" comparable
to P2P Network Layer



"JXTA Services" map to various layers (and application capabilities)



"JXTA Applications" is not specifically represented in reference architectures


P2P Reference Architectures - Summary



Reference architectures embody a reference model and architectural styles



P2P reference architectures are useful for designing software architectures for P2P
systems



Melville et al.'s list is a starting point, but not exhaustive

P2P Communication Paradigms

"Architectures for the 21st Century"



(Taylor & Dashofy): Two architectural styles essential for current and future software
systems



P2P architectures

o

Independent agents capable of performing useful work

o

No central authority



Event-based notification architectures

o

Loosely coupled, highly dynamic

o

Listen for notifications, take action based on these



Why not combine these?

o

(Baehni, Eugster & Guerraoui 2002)

Publish/subscribe



Publish/subscribe: event bus

o

publish events

o

subscribe to events by giving criteria



Traditional publish/subscribe

o

subject-based or content-based publish/subscribe

o

an event = a list of simple name/value pairs



Type-Based Publish/Subscribe (TPS) is an object-oriented variant

o

events are instances of application-defined types, i.e., objects

o

generalisation of subject- and content-based pub/sub

TPS "Architecture"


TPS Example (1)



TPS Example (2)

class StatusChangedEvent extends PublishSubscribe.Event {
    private String user;
    private String status;
    public String getUser() { return user; }
    ...
}

StatusChangedEvent event = new StatusChangedEvent("damm", "active");
publish event;

Subscription s = subscribe (StatusChangedEvent event) {
    // filter: accept every StatusChangedEvent
    return true;
} {
    // handler: runs for each accepted event
    updateUserStatus(event.getUser(), event.getStatus());
}

s.activate();
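The `publish` and `subscribe` statements above rely on a language extension. As a rough illustration only, the same idea (subscribe by event type, with a filter and a handler) can be approximated in plain Java. `TypedBus` and its methods are hypothetical names invented for this sketch, the bus is in-process and single-threaded, and the event class is a standalone stand-in rather than a subclass of `PublishSubscribe.Event`; none of this is the TPS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Stand-in event type (assumption: a plain immutable class). */
final class StatusChangedEvent {
    private final String user;
    private final String status;

    StatusChangedEvent(String user, String status) {
        this.user = user;
        this.status = status;
    }

    String getUser() { return user; }
    String getStatus() { return status; }
}

/** Hypothetical in-process bus that dispatches on the runtime type of an event. */
final class TypedBus {
    /** One subscription: an event type, a filter, and a handler. */
    private static final class Subscription<T> {
        private final Class<T> type;
        private final Predicate<? super T> filter;
        private final Consumer<? super T> handler;

        Subscription(Class<T> type, Predicate<? super T> filter, Consumer<? super T> handler) {
            this.type = type;
            this.filter = filter;
            this.handler = handler;
        }

        void offer(Object event) {
            if (!type.isInstance(event)) return; // other event types are ignored
            T typed = type.cast(event);          // subtypes of T match as well
            if (filter.test(typed)) handler.accept(typed);
        }
    }

    private final List<Subscription<?>> subscriptions = new ArrayList<>();

    <T> void subscribe(Class<T> type, Predicate<? super T> filter, Consumer<? super T> handler) {
        subscriptions.add(new Subscription<T>(type, filter, handler));
    }

    void publish(Object event) {
        for (Subscription<?> s : subscriptions) s.offer(event);
    }

    public static void main(String[] args) {
        TypedBus bus = new TypedBus();
        bus.subscribe(StatusChangedEvent.class,
                e -> true,
                e -> System.out.println(e.getUser() + " is now " + e.getStatus()));
        bus.publish(new StatusChangedEvent("damm", "active"));
    }
}
```

Because matching uses `Class.isInstance`, a subscription to a supertype also receives events of its subtypes, which is one simple way to model event hierarchies.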

TPS Example (3)


TPS Benefits



Publish/subscribe mechanisms decouple publishers and subscribers in

o

Space: publishers and subscribers need not know the location of each other

o

Time: publishers and subscribers need not be available at the same time

o

Flow: sending and receiving messages is non-blocking

o

Data: subscribers only receive events they are interested in



TPS integrates this into object-oriented software architectures

o

Stays native (compare RMI vs. socket communication)

o

Integrated data and computation in events

o

Modelling with event hierarchies



Suitable for P2P applications

Idea: Combine TPS & JXTA


TPS Over JXTA



Main characteristics

o

Uses the August 2001 JXTA implementation (old)

o

Uses propagation pipes and algorithms



Unreliable, insecure



One-to-many (one output, many input pipes) + propagation

o

One type is represented by one advertisement

o

Uses Generic Java (GJ)



"List<Integer>"



Allows creating templates in Java (similar to virtual classes)



Avoids casts when using collections



Integrated in Java 1.5

o

Part of the JXTA application layer

Architecture (1)




(Which structure(s)?)

Architecture (2)


Programming



Initialization

TPSEngine<SkiRental> tpse = new TPSEngine<SkiRental>();

TPSInterface tpsInt = tpse.newInterface("JXTA", null, new SkiRental(), argv);



Callback interface

public class MyCBInterface implements TPSCallBackInterface<SkiRental> {
    public void handle(SkiRental skiR) {
        ...
    }
}



Exception handler

public class MyExHandler implements TPSExceptionHandler<SkiRental> {
    public void handle(Throwable th) {
        ...
    }
}



Subscription

MyCBInterface mCBInt = new MyCBInterface();

MyExHandler mExH = new MyExHandler();

tpsInt.subscribe(mCBInt, mExH);



Publication

tpsInt.publish(new SkiRental("XTremShop", "Salomon", 14f,
100f));

Results



Better abstractions for (some) P2P applications

o

High-level

o

Easy to use

o

Type safety

o

Decoupling



Performance compared to same application in JXTA

o

Same invocation time

o

Same publisher throughput

o

Same subscriber throughput

Summary

Summary



Working with software architecture is essential for any complex software system -
including P2P systems



Reference architectures may help in constructing software architectures - and thus
systems



Publish/subscribe and P2P communication paradigms may be beneficially combined




Quality Attributes

Klaus Marius Hansen

University of Aarhus

2003/09/16

Quality attributes of P2P systems. Dependability. Performance. Measurements. Scalability.

Quality Attributes



Material




Dependability and P2P




Performance and P2P




Measurements of P2P Systems




Scalability of P2P Systems




Summary


Material

Material



(Walkerdine, Melville & Sommerville 2002)

o

Walkerdine, J., Melville, L. & Sommerville, I. (2002), Dependability
properties of P2P architectures, Technical Report, http://polo.lancs.ac.uk/p2p/,
Lancaster University.

o

General discussion of dependability-related quality attributes in P2P
computing.



(Oram 2001) chapter 14

o

Hong, T. (2001), Performance. In Oram, A. (Ed.), Peer-to-Peer: Harnessing
the Power of Disruptive Technologies, O'Reilly, pp. 203-242.

o

Introduction to the "small worlds" model of P2P systems. Discusses
performance of Gnutella- and Freenet-like P2P systems.



(Saroiu, Gummadi & Gribble 2002)

o

Saroiu, S., Gummadi, P. K. & Gribble, S. D. (2002), A measurement study of
peer-to-peer file sharing systems, in Proceedings of Multimedia Computing and
Networking (MMCN) 2002, pp. 407-418.

o

Measurements of the nature of peers in Napster and Gnutella



(Chawathe, Ratnasamy, Breslau, Lanham & Shenker 2003)

o

Chawathe, Y., Ratnasamy, S., Breslau, L., Lanham, N. & Shenker, S. (2003),
Making Gnutella-like P2P systems scalable, in Proceedings of the 2003 conference on
Applications, technologies, architectures, and protocols for computer communications
(SIGCOMM'03), pp. 407-418.

o

Presents algorithms based on the observations of (Saroiu, Gummadi & Gribble
2002)

Dependability and P2P

Dependability



The trustworthiness of a system - captures a number of desirable characteristics of a
system



Context sensitive

o

Gnutella may be fine for sharing of personal files, but not for sharing of
corporate files...

o

Freenet may be fine for anonymous publishing, but not for backup
applications...



Affected by other architectural properties such as reliability and security



Further complication: A large number of different software architectures exist -
e.g., there is not one single reference architecture

What Influences Dependability?



Reliability



Scalability

o

Today



Security

o

Later lectures



Survivability/Robustness/Fault Tolerance

o

Today



Safety

o

Not in this course, but intriguing...



Maintainability



Responsiveness/Performance

o

Today



Responsibility, Accountability, Reputation

o

Later lectures



Availability

o

Today

Performance and P2P

"Performance"



The responsiveness of a system - the time required to respond to stimuli (events) or the
number of events processed in some interval of time

o

Communication between components often takes longer than computation
within components - particularly for distributed systems

o

Thus performance is related to the communication and interaction patterns
between components, which is architectural in nature

o

Historically a driving quality attribute, but not necessarily any longer in
general



Hong (2001) presents an analysis of this for a class of P2P systems, but really
simulates more than that

Why is performance important for P2P systems?



P2P systems are distributed, often with frequent communication and coordination



Moore's law doesn't apply to Internet connection speed...



Decentralised systems use messages forwarded over many hops (bandwidth and
connection time)

The Small-World Effect



1967: Stanley Milgram gave 160 people in Omaha, Nebraska a letter each

o

Task: Pass this letter to a particular stockbroker in Boston, Massachusetts

o

Use only intermediaries known on a first-name basis

o

(Widely criticized as a scientific experiment)



Results

o

42 letters made it through

o

Median of 5.5 intermediaries (compare: 200 million people in the US in
1967)!

o

Example: an Engineer in Omaha passed it to a transplanted New Englander
living in Bellevue, Nebraska, who passed it to a math teacher in Littleton, Massachusetts,
who gave it to a local shopkeeper, who gave it to the stockbroker...



Consider the situation as a graph

o

Nodes as people, edges as relationships between people

o

Why such a small characteristic pathlength (= average of distances between all
pairs of nodes in a connected graph)?

A Small-World Model (1)



Social networks are highly cliquish/clustered?



However, "bridges" between social clusters exist

o

A quarter of all chains reaching the target person, passed through the same
local shopkeeper

o

Half the chains were mediated by three people



A small number of bridges dramatically reduces the characteristic pathlength...

A Small-World Model (2) - Regular Graphs




Consider a regular graph

o

Connected

o

n nodes, all of degree k


o

Long characteristic pathlength ~ n/(2k) for n >> k

o

High clustering coefficient



(= proportion of possible links between neighbours that are present)

A Small-World Model (3) - Random Graphs




Consider a random graph

o

n nodes, each pair joined by an edge with probability p, giving on average k (= (n-1)p) neighbours


o

Short characteristic pathlength ~ log n/log k for n >> k

o

Low clustering coefficient (p ~ k/n)

A Small-World Model (4) - Hybrid Graphs




"Rewire" edges in regular graph with probability p

o

p = 0: regular graph

o

p = 1: random graph




With higher p, clustering
remains high, but pathlength drops
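The two measures used throughout this model can be computed directly. The sketch below builds a ring lattice (one kind of regular graph), optionally rewires edges with probability p, and reports characteristic pathlength and clustering coefficient. The class and method names are made up for illustration, and the rewiring rule (keep one endpoint, pick the other uniformly at random, never create duplicate edges or self-loops) is an assumption, since the slides do not fix one. Pathlength is averaged over reachable pairs only, because rewiring can disconnect the graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class SmallWorld {
    /** Ring lattice: node i is linked to its k/2 nearest neighbours on each side. */
    static List<Set<Integer>> ringLattice(int n, int k) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int i = 0; i < n; i++) {
            for (int d = 1; d <= k / 2; d++) {
                int j = (i + d) % n;
                adj.get(i).add(j);
                adj.get(j).add(i);
            }
        }
        return adj;
    }

    /** Rewires each lattice edge (i, i+d) with probability p to a random new endpoint. */
    static void rewire(List<Set<Integer>> adj, int k, double p, Random rnd) {
        int n = adj.size();
        for (int d = 1; d <= k / 2; d++) {
            for (int i = 0; i < n; i++) {
                int j = (i + d) % n;
                if (rnd.nextDouble() >= p || !adj.get(i).contains(j)) continue;
                if (adj.get(i).size() >= n - 1) continue; // no free endpoint left
                int t;
                do { t = rnd.nextInt(n); } while (t == i || adj.get(i).contains(t));
                adj.get(i).remove(j);
                adj.get(j).remove(i);
                adj.get(i).add(t);
                adj.get(t).add(i);
            }
        }
    }

    /** Average shortest-path distance over all reachable ordered pairs (BFS from every node). */
    static double pathLength(List<Set<Integer>> adj) {
        int n = adj.size();
        long total = 0;
        long pairs = 0;
        for (int s = 0; s < n; s++) {
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            dist[s] = 0;
            ArrayDeque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj.get(u)) {
                    if (dist[v] < 0) {
                        dist[v] = dist[u] + 1;
                        total += dist[v];
                        pairs++;
                        queue.add(v);
                    }
                }
            }
        }
        return pairs == 0 ? 0.0 : (double) total / pairs;
    }

    /** Mean over nodes of (links present between neighbours) / (links possible). */
    static double clustering(List<Set<Integer>> adj) {
        double sum = 0.0;
        for (Set<Integer> neighbours : adj) {
            List<Integer> nb = new ArrayList<>(neighbours);
            int deg = nb.size();
            if (deg < 2) continue; // counted as 0
            int links = 0;
            for (int a = 0; a < deg; a++)
                for (int b = a + 1; b < deg; b++)
                    if (adj.get(nb.get(a)).contains(nb.get(b))) links++;
            sum += links / (deg * (deg - 1) / 2.0);
        }
        return sum / adj.size();
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.0, 0.1, 1.0}) {
            List<Set<Integer>> g = ringLattice(1000, 4);
            rewire(g, 4, p, new Random(42));
            System.out.printf("p=%.1f pathlength=%.2f clustering=%.3f%n",
                    p, pathLength(g), clustering(g));
        }
    }
}
```

For the unrewired lattice with k = 4, each node has 3 of the 6 possible links among its neighbours, so the clustering coefficient is 0.5.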


Case: Freenet



Freenet forwards queries for a GUID based on "closeness" of known GUIDs

o

GUID and data location are cached on the way back

o

Creates a directed graph of Freenet nodes and references to other nodes
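A minimal sketch of the forwarding decision described above, under loud assumptions: keys are plain `long` values, "closeness" is absolute numeric difference, and each node keeps a table from known keys to the neighbour they were learnt from. `ClosestKeyRouting`, `learn`, and `nextHop` are hypothetical names, not Freenet's actual code or key scheme.

```java
import java.util.Map;
import java.util.TreeMap;

final class ClosestKeyRouting {
    /** Known keys and the neighbour each one was learnt from (hypothetical layout). */
    private final TreeMap<Long, String> routes = new TreeMap<>();

    /** Called when a reply passes back through this node: cache key and source. */
    void learn(long key, String neighbour) {
        routes.put(key, neighbour);
    }

    /**
     * Returns the neighbour whose known key is numerically closest to the target,
     * or null if nothing is known yet. Assumes keys are small enough that the
     * subtractions below cannot overflow.
     */
    String nextHop(long target) {
        Map.Entry<Long, String> below = routes.floorEntry(target);
        Map.Entry<Long, String> above = routes.ceilingEntry(target);
        if (below == null) return above == null ? null : above.getValue();
        if (above == null) return below.getValue();
        long distBelow = target - below.getKey();
        long distAbove = above.getKey() - target;
        return distBelow <= distAbove ? below.getValue() : above.getValue();
    }

    public static void main(String[] args) {
        ClosestKeyRouting table = new ClosestKeyRouting();
        table.learn(10, "A");
        table.learn(50, "B");
        table.learn(90, "C");
        System.out.println(table.nextHop(45)); // closest known key is 50
    }
}
```

Because every successful reply adds an entry via `learn`, a node gradually accumulates references for keys near the ones it has already served, which is the adaptation the simulation slides examine.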




Claim: Freenet exhibits small-world characteristics

o

Requires connectedness - induction and insert/request implementation

o

Requires short characteristic pathlengths + high clustering - anonymity, need
to do simulation...

Freenet Simulation



Step 1: Evolving a regular network topology

o

Create 1000 initial nodes, empty data store, space for 50 data items, and 200
additional references

o

Connect to the 4 closest nodes (using hashes)

o

(Pathlength: 1000/(2*4) = 125, Clustering: 6/12 = 0.5)

o

TTL = 20


o

Simulate network usage. At each time step



Pick random node



Perform a request or insert from that node using random key



Measure network state each 100 time steps

Evolving a Regular Network - Evolution of Pathlength and Clustering Over Time





Pathlength decreases rapidly, clustering coefficient less so - small-world effect



How does this relate to performance...?

Probing Static Network - Median Request Pathlength Over Time





Performance ~ number of hops per request



For every 100 time steps, simulate 300 requests on a static network, TTL = 500



Not optimal, based on local routing rather than global optimisation

Probing Static Network - Random Routing - Median Request Pathlength Over Time





Worse than Freenet's local routing...



Corresponds to random walks (more later)

Simulating Growth - Median Request Pathlength in a Growing Network





Start with 20 nodes, add new nodes incrementally



Inserts, requests, probes as before

Fault Tolerance - Change in Request Pathlength Under Random Failure





Freenet holds up until 30% of nodes removed

Fault Tolerance - Change in Request Pathlength Under Targeted Attack





Freenet now only holds up until 18% of nodes removed

Fault Tolerance - Proportion of Nodes vs. Number of Links





Highly skewed



Well-connected nodes see more requests, and thus tend to get even better connected
over time

Scalability - Median Request Pathlength vs. Network Size





Characteristic pathlength grows logarithmically with number of nodes



Well-connected nodes see more requests, and thus tend to get even better connected
over time



Bandwidth requirements also scale logarithmically (since the number of messages
transmitted per request is proportional to request pathlength)

Case: Gnutella



Queries in Gnutella are forwarded to connected peers, if file is found, an answer is
sent back

o

Query continues within a TTL bound

o

Breadth-first search of network - given sufficiently high TTL, the shortest path
to data will be found



Claim: Gnutella does not exhibit small-world characteristics - little network adaptation



Simulation again

o

Random graph, 3 edges per vertex on average
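The TTL-bounded, breadth-first query forwarding described above can be sketched as follows. This is an illustration rather than the Gnutella wire protocol: the graph is an in-memory adjacency map, `Flooding` and `Result` are invented names, a node forwards only the first copy of a query it sees, and it never echoes a query back to the peer it came from. The TTL is taken to be the maximum number of hops a query may travel.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.Arrays;

final class Flooding {
    /** Outcome of one flooded query. */
    static final class Result {
        final int nodesContacted; // distinct peers that received the query
        final int messagesSent;   // every transmission, duplicates included

        Result(int nodesContacted, int messagesSent) {
            this.nodesContacted = nodesContacted;
            this.messagesSent = messagesSent;
        }
    }

    static Result flood(Map<Integer, List<Integer>> graph, int origin, int ttl) {
        Set<Integer> seen = new HashSet<>();
        seen.add(origin);
        // Each queue entry is {node, remaining hops, peer the query came from}.
        ArrayDeque<int[]> frontier = new ArrayDeque<>();
        frontier.add(new int[] {origin, ttl, -1});
        int messages = 0;
        while (!frontier.isEmpty()) {
            int[] current = frontier.poll();
            int node = current[0];
            int remaining = current[1];
            int sender = current[2];
            if (remaining == 0) continue; // TTL exhausted, stop forwarding
            for (int next : graph.getOrDefault(node, Collections.<Integer>emptyList())) {
                if (next == sender) continue; // do not echo to the sender
                messages++;
                // Duplicates still cost a message but are not forwarded again.
                if (seen.add(next)) frontier.add(new int[] {next, remaining - 1, node});
            }
        }
        return new Result(seen.size() - 1, messages);
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> triangle = new HashMap<>();
        triangle.put(0, Arrays.asList(1, 2));
        triangle.put(1, Arrays.asList(0, 2));
        triangle.put(2, Arrays.asList(0, 1));
        Result r = flood(triangle, 0, 2);
        System.out.println(r.nodesContacted + " peers, " + r.messagesSent + " messages");
    }
}
```

Even in a three-node triangle, duplicate copies of the query are transmitted between the two neighbours, which hints at why flooding saturates the network as it grows.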

Nodes Contacted - Distribution of the Number of Nodes Contacted per Query





Good worst-case performance, high network saturation

Scalability - Median Number of Nodes Contacted per Query vs. Network Size





Characteristic pathlength scales logarithmically with network size



But bandwidth usage scales linearly...



(Current solution: "Super Peers", indexing all files of subordinate peers, cf. also Kazaa)

Summary



P2P performance is most often dominated by communication costs



Graph model of P2P systems as a basis of performance considerations

o

Nodes: peers, edges: connections to
other peers

o

Freenet: Small-world graph, scales well

o

Gnutella: Random graph, good worst-case performance



Fundamental assumption: Peers are homogeneous (except for free riders)

Measurements of P2P Systems

Motivation



How do peers actually behave in P2P (file sharing) systems?



Should be taken into account when designing P2P systems and algorithms



Measurements of actual Napster and Gnutella use (i.e., no simulation)

Gnutella and Napster Revisited




Napster

o

Servers maintain index of files on peers

o

Also maintain metadata such as peers' reported connection bandwidth and
duration of connection to system

o

Metadata returned with query answers



Gnutella

o

Peers form an overlay network through point-to-point connections with a set of
neighbours

o

Queries flooded throughout the network - controlled by TTL field

o

Ping/pong messages used to manage overlay network

Measurement Methodology



Two steps

o

Periodically crawl each system to gather snapshots of peers participating

o

Actively probe discovered peers over a couple of days to measure properties



Measured properties

o

Latency



Round-trip time of TCP packet between measurement machine and peer



TCP discriminates against high round-trip times

o

Lifetime



Using low-level TCP packets to tell whether peers are offline, inactive,
or active

o

Bottleneck Bandwidth



The slowest hop between two hosts



Used to approximate available bandwidth at peer

The Napster Crawler



No access to central Napster servers

o

Need to query for popular files and cache peers referenced in responses



Ask servers for metadata based on discovered peers

o

Reported bandwidth

o

Number of files shared by peer

o

Current number of uploads and downloads in progress by the peer

o

Names and sizes of files shared by the peer

o

IP address of peer



Captured 40%-60% of peers on crawled server, contributing to 80%-95% of shared
files (based on statistics from server)



Biased towards popular files

The Gnutella Crawler



Using well-known peers for bootstrapping

o

And ping/pong iteratively on the set of discovered peers



Capture metadata from pong messages

o

IP address

o

Number and size of files shared



Captured 8000-10000 peers ~ 25%-50% of Gnutella peer population at a given time



No biases

Peer Characteristics



"Server" peers need

o

High bandwidth (particularly upstream)

o

Low latency

o

High availability



"Client" peers are

o

Not sharing files

o

Always downloading



Potential heterogeneity has implications for P2P systems design...

Measured Bottleneck Bandwidths for Gnutella




92% with downstream bottlenecks > 100 Kbps, only 8% with upstream bottleneck > 10 Mbps



22% with upstream bottleneck bandwidths < 100 Kbps, i.e., unsuitable to serve
content and data to a high number of connections

Measured Latencies for Gnutella




Substantial fraction (20%) with latencies > 280ms

Session Durations




Measured durations less than twelve hours (due to length of total measurement period)



Median duration is 60 minutes

Number of Shared Files in Gnutella




25% of peers do not share files



7% of peers share more than 1000 files, and more files than all other peers collectively

Willingness to Cooperate? (Reported Bandwidths for Napster)




30% of peers report Modems + ISDN but have higher bandwidth

Summary



Addresses need for actual data about peer behaviour



Extensive measurements of real use of Napster and Gnutella



Lessons for P2P system and algorithm design

o

Algorithms for P2P systems should take heterogeneity into account -
connection speed, latency, lifetime, shared data vary widely

o

Peers should have incentives to cooperate

Scalability of P2P Systems

Gnutella Scalability?



One study showed (Clip2)...

o

Query rate of 10 queries per second

o

560 bits total per Gnutella query

o

Queries = one quarter of Gnutella traffic

o

On average three remote peers actively connected

     10 queries/second
*   560 bits/query
*     4 (queries are one quarter of the traffic)
*     3 (actively connected peers)
----------------------
 67,200 bits/second



Compare common network connection speed and homogeneous networks...

Then What?



"Gia" protocol (and P2P system) design

o

Trying to take peer capacity constraints into account

o

Here: capacity = number of requests a peer can handle per second



Four key components

o

Dynamic topology adaptation

o

Active flow control

o

One-hop replication of pointers to content

o

Biased random walks



Modelled on top of the
Gnutella protocol

Topology Adaptation



Goal: ensure that high capacity nodes have high degree and that low capacity nodes
are within short reach of higher capacity ones



Each node maintains node cache of other known nodes to be used for topology
adaptation



Compute satisfaction level S for each node (say X)

o

S < 1: Adapt topology for X to have new neighbours and higher satisfaction



Pick potential neighbours for X randomly (prefer ones with high capacity)

o

num_nbrs(X) + 1 > max_nbrs: pick_neighbor_to_drop(X,Y)

pick_neighbor_to_drop(X,Y)


Active Flow Control



Nodes are only allowed to direct queries to neighbours, if actively allowed to do so

o

In "common"
flow control mechanisms for Gnutella, nodes drop messages if
they are overloaded

o

Not acceptable to drop messages indiscriminately when using random walks
(more later)



In Gia, a node sends flow-control tokens to the neighbours from which it is
able to handle requests



Assignment of tokens according to advertised capacity of neighbours - builds
incentive

One-Hop Replication of Pointers to Content



Each node maintains an index of the contents of neighbours



Index incrementally updated - mostly up-to-date...

Biased Random Walks



Observation: high capacity nodes can typically provide most useful response for
requests



A node forwards incoming requests to the neighbour with the highest capacity (among
those for which it holds a flow control token)
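The forwarding rule just stated can be sketched in a few lines. The data layout is an assumption made for illustration: advertised capacities are kept in a map keyed by neighbour name, tokens are modelled as a set of neighbours currently granting one, and ties are broken by name for determinism. `BiasedWalk` and `nextHop` are hypothetical names, not part of Gia.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class BiasedWalk {
    /**
     * Picks the highest-capacity neighbour among those we hold a token for.
     * Returns an empty Optional when no neighbour has granted a token, in which
     * case the query has to wait rather than be dropped.
     */
    static Optional<String> nextHop(Map<String, Integer> neighbourCapacity,
                                    Set<String> tokensFrom) {
        String best = null;
        for (Map.Entry<String, Integer> entry : neighbourCapacity.entrySet()) {
            String name = entry.getKey();
            if (!tokensFrom.contains(name)) continue; // no token, not allowed to send
            if (best == null) {
                best = name;
                continue;
            }
            int capacity = entry.getValue();
            int bestCapacity = neighbourCapacity.get(best);
            if (capacity > bestCapacity
                    || (capacity == bestCapacity && name.compareTo(best) < 0)) {
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Map<String, Integer> capacity = new HashMap<>();
        capacity.put("A", 10);
        capacity.put("B", 1000);
        capacity.put("C", 100);
        Set<String> tokens = new HashSet<>();
        tokens.add("A");
        tokens.add("C");
        // B has the highest capacity but has granted no token, so it is skipped.
        System.out.println(nextHop(capacity, tokens).orElse("(wait)"));
    }
}
```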

Simulation



Comparing

o

FLOOD: Search using TTL-scoped flooding over random topologies



Corresponds to original Gnutella protocol

o

RWRT: Search using random walks over random topologies



Corresponds to other proposed search techniques

o

SUPER: Search using supernode mechanisms in which queries are only
flooded between supernodes and supernodes maintain indices for connected
subnodes



Corresponds to the KAZAA protocol

o

GIA



Results

o

Three to five orders of magnitude (1,000 to 100,000 times) improvement in
total capacity - retaining robustness

o

No single component of Gia is responsible for the performance boost

Comparisons of Collapse Points




Collapse point = point beyond which the success rate for queries drops below 90%

Summary

Summary



Dependability as central - discussed related qualities



Performance

o

Network communication dominates

o

Graph-based model for performance of Freenet and Gnutella



Measurements



Measurements of real use of Napster and Gnutella



Peers are heterogeneous



Peers need incentives in order to cooperate



Scalability

o

Gnutella-like protocol based on peer capabilities

o

Orders of magnitude better scalability than previous protocols





Routing and Location in P2P Networks

Klaus Marius Hansen

University of Aarhus

2003/09/23

Routing and location in previously introduced systems. Routing in structured overlay
networks: Pastry and Chord.

Routing and Location in P2P Networks



Material




Routing and Location in Introduced P2P Systems




Pastry




Chord




Summary


Material

Material



Previous systems introduced in the course



(Rowstron & Druschel 2001)

o

Rowstron, A. & Druschel, P. (2001), Pastry: Scalable, distributed object
location and routing for large-scale peer-to-peer systems, in IFIP/ACM International
Conference on Distributed Systems Platforms (Middleware 2001), pp. 329-350.

o

P2P routing and location in a structured overlay network taking into account
network locality



(Stoica, Morris, Liben-Nowell, Karger, Kaashoek, Dabek & Balakrishnan 2003)

o

Stoica, I., Morris, R., Liben-Nowell, D., Karger, D. R., Kaashoek, M. F.,
Dabek, F. & Balakrishnan, H. (2003), Chord: A scalable peer-to-peer lookup protocol for
internet applications, IEEE/ACM Transactions on Networking 11(1), 17-32.

o

Algorithms for location in P2P networks. Provable correctness and
performance.

Routing and Location in Introduced P2P Systems

Background



Routing:
the process of moving a data packet to a location



Central Questions

o

How do we efficiently locate a peer?

o

When a peer is located, how do we efficiently route messages to and from that
peer?

o

... of course all in the context of no global network knowledge, and frequent
joins and leaves by peers...

A Traceroute Example

Indexed



Using index servers for location, IP for routing

o

Napster

o

SETI@Home

o

ICQ

Walking/flooding



(Unstructured) walks based on neighbor sets

o

Gnutella

o

Kazaa (hybrid)

o

(JXTA)

Key-Proximity



Route based on (unstructured) narrowing down of difference in keys

o

Freenet

o

Windows P2P

Weaknesses



Single point of failure for indexed



Potentially low performance for walking/flooding



Hard to prove correctness/performance/space requirements for routing protocols

Pastry

Pastry


Overview



Effective, distributed object location and routing substrate for P2P networks

o

"Effective": O(log N) routing hops

o

"Distributed": no servers, routing and location distributed to nodes, only
limited knowledge (size O(log N) of routing tables) at nodes

o

"Substrate": not an application itself, rather it provides Application Program
Interface (API) to be used by applications



Runs on all nodes joined in a Pastry network



Each node has a unique identifier (nodeId)



Given a key and a message, Pastry routes the message to the node with nodeId
numerically closest to the key



Takes into account network locality

Pastry API



Pastry exports

o

nodeId = pastryInit(Credentials, Application): makes the local node join/create
a Pastry network. Credentials are used for authorization. An
object used for callbacks is passed through the Application parameter

o

route(msg, key): routes a message to the live node D with nodeId
numerically closest to the key (at the time of delivery)



Application interface to be implemented by applications using Pastry

o

deliver(msg, key): called on the application at the destination node for
the given id

o

forward(msg, key, nextId): invoked on applications when the
underlying node is about to forward the given message to the node with nodeId = nextId.



(Actually using the open source FreePastry 1.3 Java implementation is slightly more involved)

Assumptions and Guarantees



Each node is assigned a 128 bit nodeId

o

nodeIds are assumed to be uniformly distributed in the 128 bit id space =>
numerically close nodeIds belong to diverse nodes

o

can be achieved, e.g., using a cryptographic hash of the IP address of a node



Pastry can route to the numerically closest node in ceiling(log_(2^b) N) steps (b is a
configuration parameter)

o

If fewer than |L|/2 (|L| is a configuration parameter) nodes with adjacent nodeIds fail
concurrently, eventual delivery is guaranteed



Join, leave in O(log N)



Maintains locality based on application-defined scalar proximity metric

Example Applications



SCRIBE: group communication/event notification

o

Groups can be created and joined



Members of a group may multicast messages to all members of a group
(delivered using best-effort)



Each group has a unique id, groupId (from a hash of the group name
and the creator's name)

o

The node with nodeId numerically closest to groupId acts as rendezvous for
the group



Group creation is handled by sending a CREATE message to the node
with id groupId



Nodes wishing to join send a JOIN message to this node

o

To send a message, a node sends a MULTICAST message to the rendezvous



In principle, the rendezvous might just then send messages to all joined
nodes. (SCRIBE actually builds a multicast tree rooted in the rendezvous for
optimization)



PAST: Archival storage

o

Each file inserted gets a 160 bit fileId (from a hash of file name, owner's public
key, and random salt)

o

Pastry routes the file to the k nodes that are numerically closest to the first 128
bits of the fileId

o

Lookup ensures that the file is found as long as 1 of the k nodes is alive



SQUIRREL

o

co-operative web caching



SplitStream

o

high-bandwidth content distribution



...

Routing Table


Routing Example


Pastry Routing Algorithm


Pastry Routing Algorithm - Analysis




Observation: Either 1), 2), or 3) must hold

o

For 3): leaf set must contain nodes numerically closer to the key with same
shared prefix as us (otherwise, we are the closest node)

o

unless |L|/2 nodes in the leaf set have failed simultaneously



Termination

o

1) Directly terminates at chosen node

o

2) Node routed to shares a longer prefix with key

o

3) Node routed to shares a prefix of the same length with the key but is numerically closer
to the key



(Expected) performance

o

1) Destination one hop away

o

2) The set of possible nodes with a longer prefix match is reduced by a factor of 2^b

o

3) Only one extra routing step is needed (with high probability)



Given accurate routing tables, the probability for 3) is the probability
that a node with the given prefix does not exist and that the key is not covered by the
leaf set

o

=> expected performance is O(log N)
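
The three cases analysed above can be sketched as a single routing step. This is a simplified illustration under stated assumptions: nodeIds are 4-digit hexadecimal strings (so b = 4), the routing table is a dictionary keyed by (row, digit), wrap-around of the circular id space is ignored, and the names `next_hop`, `shared_prefix_len`, and `distance` are hypothetical rather than part of the Pastry API.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def distance(a, b):
    """Numerical distance between two hex ids (no wrap-around, a simplification)."""
    return abs(int(a, 16) - int(b, 16))


def next_hop(self_id, key, leaf_set, routing_table):
    """Return the id to forward to; returning self_id means 'deliver here'."""
    # Case 1: the key is covered by the leaf set -> go to the closest node.
    ids = leaf_set + [self_id]
    values = [int(i, 16) for i in ids]
    if min(values) <= int(key, 16) <= max(values):
        return min(ids, key=lambda i: distance(i, key))
    # Case 2: a routing table entry shares one more digit with the key.
    l = shared_prefix_len(self_id, key)
    entry = routing_table.get((l, key[l]))
    if entry is not None:
        return entry
    # Case 3: any known node with an equally long prefix that is numerically closer.
    known = leaf_set + list(routing_table.values())
    closer = [n for n in known
              if shared_prefix_len(n, key) >= l
              and distance(n, key) < distance(self_id, key)]
    return min(closer, key=lambda i: distance(i, key)) if closer else self_id
```

Case 2 is what gives the logarithmic hop count: every such step fixes one more digit of the key.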

Self-Organization - Node Arrival




New node, X, needs to know existing, nearby node, A, (can be achieved using, e.g.,
multicast on local network)



X asks A to route a "join" message with key equal to X

o

Pastry routes this message to node Z with nodeId numerically closest to X

o

All nodes en route to Z return their state to X



X updates its state based on returned state

o

neighborhood set = neighborhood set of A

o

leaf set is based on the leaf set of Z (since Z has the id closest to the id of X)

o

Rows of the routing table are initialized based on rows of routing tables of nodes
visited en route to Z (since these share common prefixes with X)



X calibrates routing table and neighborhood set based on data from the nodes
referenced therein



X sends its state to all the nodes mentioned in its leaf set, routing table, and
neighborhood set



O(log_(2^b) N) messages exchanged

Self-Organization - Node Departure




Assumption: A node that can no longer be communicated with has failed



Repair of leaf set

o

Contact the live node with the
largest index on the side of the failed node and
get leaf set from that node

o

Returned leaf set will contain an appropriate one to insert

o

Contacting works unless |L|/2 nodes with adjacent nodeIds have failed



Repair of routing table

o

Contact another node on the same row to check if this node has a replacement
node (the contacted node may have a replacement node on the same row of its routing
table)

o

If not, contact node on next row of routing table



Repair of neighborhood set

o

Neighborhood set is normally not used in routing => contact periodically to
check for liveness

o

If a neighbor is not responding, check with other neighbors for close nodes

Locality



Routing performance is based on small number of routing hops - and "good" locality
of routing with respect to underlying network



Scalar proximity metric (e.g., number of IP routing hops, geographical distance, or
available bandwidth)

o

Applications are responsible for providing proximity metrics



Join protocol maintains closeness invariant

Handling Malicious Nodes?



Choose randomly between nodes satisfying the criteria of the routing protocol...

Experimental Results - Routing Performance



Experimental Results - Routing Distance



Experimental Results - Routing Distance



Experimental Results - Quality of Routing Tables



Summary



Pastry is a P2P content location and routing substrate

o

Structured overlay network

o

Usable for building various P2P applications



Space and time requirements (expected) in O(log N), N = number of nodes in network



Takes locality into account

Chord

Overview



One operation

o

IP address = lookup(key): Given a key, find node responsible for
that key



Goals: Load balancing, decentralization, scalability, availability, flexible naming



Performance and space usage

o

Lookup in O(log N)

o

Each node needs information about O(log N) other nodes

Example Applications



Cooperative File System (CFS)

o

Building a distributed hash table on top of Chord (DHash)

o

Storing blocks using DHash, lookup using Chord




Distributed Indices

o

Derive keys from desired keywords

o

Let values be servers holding documents matching the desired keywords

Use of Consistent Hashing in Chord (1)



Keys are assigned to nodes with consistent hashing


o

Hash function balances load

o

Rebalancing (when node joins or leaves) requires moving only an O(1/N) fraction of the keys



Nodes and keys are assigned m-bit identifiers

o

Using SHA-1 on nodes' IP addresses and on keys

o

m should be big enough to make collisions improbable



"Ring
-
based" assignment of keys to nodes

o

Identifiers are ordered on an identifier circle modulo 2^m

o

A key k is assigned to the first node n whose identifier is equal to or follows k:
n = successor(k)



Chord improves on consistent hashing by only requiring knowledge about O(log N)
other nodes at each node
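
A minimal sketch of the ring-based assignment above, assuming global knowledge of all node identifiers (which real Chord nodes do not have). The identifier size `M`, the helper names, and the idea of hashing a textual name are illustrative assumptions; only the SHA-1 hashing and the successor rule come from the slides.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; deliberately small for illustration (assumption)


def ident(name, m=M):
    """m-bit identifier for a node address or a key, derived from SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids, k):
    """First node identifier equal to or following k on the identifier circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, k)      # first position with ring[i] >= k
    return ring[i % len(ring)]    # wrap around past the largest identifier
```

With the (made-up) ring 1, 8, 14, 21, 32, 38, 42, 48, 51, 56, key 24 belongs to node 32; when a node with identifier 26 joins, only the keys between 22 and 26 change owner, which is the point of consistent hashing.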

Use of Consistent Hashing in Chord (2)


Use of Consistent Hashing in Chord (3)



Designed to let nodes enter and leave network easily

o

Node n leaves: all of n's assigned keys are assigned to successor(n)

o

Node n joins: keys k <= n assigned to successor(n) become assigned to n

o

Compare "traditional hashing", e.g., h(x) = ax + b (mod p), in which p
changes...



Example: node 26 joins => key 24 becomes assigned to node 26




(Each physical node runs a number of virtual nodes each with its own identifier to
balance load)

Simple Key Location




(a) Simple key location (following successor pointers only) can be implemented in time O(N) and space O(1)



(b) Example: Node 8 performs a lookup for key 54

Scalable Key Location (1)



Uses finger tables


o

n.finger[i] = successor((n + 2^(i-1)) mod 2^m), 1 <= i <= m


Scalable Key Location (2)




If successor not found, search finger table to find n' whose ID most immediately
precedes id

o

Rationale: of all nodes in the finger table, n' will know the most about the part of the identifier circle near id
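
The lookup described above can be simulated in a few lines. This is a sketch, not the protocol: every node's finger table is computed from a global node list, there are no network messages, and the node identifiers, `M = 6`, and all function names are assumptions chosen for illustration.

```python
from bisect import bisect_left

M = 6                                                      # identifier bits (assumption)
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])     # hypothetical node ids


def successor(k):
    """First node whose identifier is equal to or follows k (mod 2^M)."""
    i = bisect_left(NODES, k % 2 ** M)
    return NODES[i % len(NODES)]


def fingers(n):
    """finger[i] = successor(n + 2^(i-1)) for i = 1..M."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b        # interval wraps around zero (or covers the circle)


def closest_preceding_node(n, key):
    """Finger of n that most immediately precedes key."""
    for f in reversed(fingers(n)):
        if in_interval(f, n, key):
            return f
    return n


def find_successor(n, key):
    """Return (node responsible for key, list of nodes visited), starting at n."""
    hops = [n]
    while True:
        succ = fingers(n)[0]     # the first finger is the immediate successor
        if in_interval(key, n, succ, inclusive_right=True):
            return succ, hops
        n = closest_preceding_node(n, key)
        hops.append(n)
```

Under these assumed node identifiers, `find_successor(8, 54)` visits nodes 8, 42, and 51 and returns node 56, since each hop at least halves the remaining distance to the key.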

Scalable Key Location (3)



Performance is O(log N) with high probability

o

Each node can forward a query at least halfway along the remaining distance
=> less than m steps to find node

o

After 2 log N steps, the distance is at most 2^m/2^(2 log N) = 2^m/N^2 - and the
probability for two nodes to be in such an interval is 1/N, i.e., negligible



Space required is O(log N) with high probability

o

As above: for i <= m - 2 log N, the i'th finger of the node will be the node's
immediate successor with high probability

Self-organization - Node failures




Chord maintains successor lists to cope with node failures

o

Node leave could be viewed as a failure

o

If a node leaves voluntarily, it may notify its successor and predecessor

Experimental Results - Path Length


Summary



Decentralized lookup of nodes responsible for storing keys

o

Based on distributed, consistent hashing



Performance and space in O(log N) for stable networks



Simple; provable performance and correctness

Summary

Summary



"First generation" routing and location in P2P networks

o

Largely application-specific

o

Hard to analyse



"Second generation" routing and location in P2P networks

o

Based on structured network overlays

o

Typically expected O(log N) time and space requirements





P2P Security

1.0

30/September/2003

Niels Olof Bouvin

Aarhus Universitet

P2P Security - overall topics for this talk




General security concerns

o

relevant to all distributed systems



Secure P2P systems and techniques

Security



Basic security concerns




(Lack of) Internet security




Cryptography




Closing remarks on security


Basic security concerns - all the things that can go wrong

Dangers of distributed systems



Trust

o

who can you trust?



Identity theft

o

pretending to be you (or someone you trust)



Privacy

o

preventing others from listening in on the conversation



Censorship & attacks

o

denying you the right to know

(Lack of) Internet security

The Internet



The Internet is vast and not safe at all

o

data packets going from machine to machine before they reach you



Many standards and protocols were established back in safer days

o

SMTP, NNTP, telnet, ...



Today there are plenty of sociopaths, who would delight in destroying your data or
machine

o

see iloveyou, Code Red, SQL Slammer, SoBig.F, Swen, etc. etc.

o

not to mention industrial espionage etc.



Spammers, anyone?

o

it has been claimed that recent worms are behind DDoS attacks against anti-spam sites...

Who can you trust?



Surely you can trust well-established Web sites?



Several important open source ftp servers have been 'owned' over the years

o

thus leaving black hats free to insert code of their own in the cvs trees...



This also happened for Microsoft last year (apparently without any ensuing nastiness)...



And of course numerous sites have been hacked for credit card numbers etc.

Email



Email remains (despite spammers) the most successful CSCW and file sharing
application on the planet



The vast majority of email is not encrypted

o

neither in transit nor at source/destination



Standards for email encryption exist

o

PGP (Pretty Good Privacy)

o

S/MIME



These are generally somewhat cumbersome and are (therefore) not widely used

Email



Email headers can be easily spoofed



faking the sender of an email is trivial



One of the reasons behind the "success" of email worms

o

users naturally trust email received from friends, colleagues, and family...



Similar points can be raised for instant messaging

File systems



File systems are generally, by default, not encrypted



Thus not funny at all to lose the company laptop PC...



Without encryption your data is only as safe as the hard disk it resides on

Cryptography

Cryptography



Fact: Messages can be intercepted. But intercepted data is worthless, if the interceptor
cannot read it

o

(the people involved are traditionally known as Alice, Bob, and Carol, the
latter usually trying to intercept and decrypt messages)



Cryptography is very old (at least as old as the Roman Empire), and has been based on
a large number of techniques

o

letter substitution or various permutations (e.g. ROT-13)

o

one time pads

o

complex machines (e.g. Enigma)

o

etc.



Today cryptography is based on advanced, hard-to-solve mathematical problems



Regardless of the method used, a key is used to signify how the plain text is
transformed into cipher text

Symmetric cryptography



The same key is used to encrypt and decrypt the message



Advantages

o

symmetric cryptography is fast



Disadvantages

o

the key must be securely exchanged between Alice and Bob

o

if the key is compromised, the entire communication is instantly readable

Asymmetric cryptography



Keys come in pairs:

o

a public key known to all

o

a private (or secret) key known only by the individual user



A message encrypted with the public key can be decrypted only by the private key

o

So if Alice encrypts a message with Bob's public key, only Bob can decrypt it
with his private key



A message encrypted with the private key can be decrypted only by the public key

o

So if Alice encrypts a message with her private key, all can verify (using
Alice's public key) that Alice is the author

Asymmetric cryptography



Advantages

o

as the private key is never shared, the system is secure

o

the system can also be used to authenticate (or "digitally sign") messages



Disadvantages

o

only as secure as the private key...

o

much slower than symmetric cryptography

Establishing trust



How does Alice know Bob is really Bob, and not Carol claiming to be Bob?



Asymmetric cryptography often relies on CAs - Certification Authorities


o

these, using out-of-band methods, establish the correct identity of Bob, and
assign a (signed) certificate to Bob

o

Alice can then verify that some CA has vouched for Bob, and if she trusts the
CA, she can trust Bob



A problem with these certificates is the cost...



A less centralised approach is taken by PGP (Pretty Good Privacy), where Bob relies
on associates to confirm his identity

o

if Alice knows (and trusts) any of these associates, she can trust Bob's identity

Symmetric/asymmetric cryptography



Not an either/or situation - often used in combination



Asymmetric cryptography is used for the initial communication to establish identity
and (securely) exchange a randomly generated symmetric key used for the rest of the
session



This is the method used by SSL (Secure Socket Layer), which handles e.g. https


o

the Web server provides the Web browser with its CA signed certificate (this
makes man-in-the-middle attacks harder)

o

the Web browser generates a random key, encrypts it with the Web server's
public key, and returns it to the Web server

o

as only the Web server can decrypt the key, the Web server and Web browser
can now initiate a symmetric encrypted session

Secure hashes



Secure (or cryptographic) hashes are used to verify the integrity of a message



Most common are MD5 (128 bit) and SHA-1 (160 bit)



It is computationally infeasible to create two different messages with identical secure
hash codes (it requires brute force and 2^128 or 2^160 are big)



Thus, if the (MD5/SHA-1) hash code of a message is known, we can check whether
the message has been modified by computing the hash code of the message



Given the quality of the secure hash, it is just as good to encrypt the (compact) hash
code with your private key for authentication as encrypting the entire message
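
A minimal sketch of the integrity check described above, using Python's standard `hashlib` module. The function names are illustrative assumptions. SHA-1 is used to match the slides; `hashlib.sha256` can be substituted without changing the structure.

```python
import hashlib


def digest(message: bytes) -> str:
    """Hex-encoded SHA-1 hash code (160 bits, i.e. 40 hex digits) of a message."""
    return hashlib.sha1(message).hexdigest()


def is_unmodified(message: bytes, known_digest: str) -> bool:
    """Recompute the hash code and compare it with the one published earlier."""
    return digest(message) == known_digest
```

The sender publishes `digest(message)` through a trusted channel (or signs it); the receiver calls `is_unmodified` on what actually arrived. Changing a single byte of the message yields a different hash code.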

Closing remarks on security

Security - a purely technical problem?




Security can be addressed through a number of technical means



However, these valiant efforts are all for naught

o

in the face of inexperience and general cluelessness



The most successful hackers have operated, not through absurd Hollywood computer
guru excellence, but through social engineering


o

hacking is considerably easier if you can get people to tell you their password



Script-kiddies usually operate through kits exploiting well-published security holes

o

leaving their victims with a valuable lesson if nothing else



Most worms spread because people apparently just can't help clicking on attachments...

o

where is the problem? With Outlook or the users?

o

"Machine destroying email"
used to be an Internet urban legend... but no more
(this is progress?)

Security in P2P



P2P systems previously seen...




Groove




Various techniques for anonymity


P2P systems previously seen...

Gnutella/Napster/KaZaA/... - the vast majority of file sharing tools




No security to speak of



No user authentication

o

in some cases for quite obvious reasons...

o

but users are not anonymous...



No on-the-wire encryption

o

eavesdropping is trivial on most file sharing networks, as a number of people
have discovered to their cost



No content integrity checking

o

the retrieved file is not what it seems...

Freenet



No authentication (to real world identities) as such,
but can authenticate pseudonyms,
allowing e.g. only the original author to update a document



Each resource in a Freenet node space is encrypted and integrity checked with a SHA-1
hash



Network traffic is not encrypted, but as the resources are encrypted, this is less of a
problem



Routing is performed in a way to foil surveillance

JXTA



Per default:

o

authentication at the client

o

no on-the-wire encryption

o

no strong checking of group membership



However

o

secure pipes can be used

o

Arbitrary (as in "write your own") group membership checking can be done

Groove

Security concerns for the corporate world - or this place for that matter




Most are placed behind a firewall

o

excellent protection for most attacks

o

unless black hats somehow get inside

o

can make it quite difficult to collaborate with people on the other side of the
firewall



As email passes through the firewall, it is a popular choice for collaboration and sharing

o

but as noted before email is usually not safe

o

mail filters check for spam, worms and trojans, but cannot be completely
updated at all times

Groove - P2P for the corporation




Aims to provide a secure shared space between coworkers



Encryption is the default and quite transparent to the user

o

thus avoiding the hassle of PGP & S/MIME in email



Provides

o

chat with threaded persistent discussions

o

shared writing space

o

file sharing

o

calendar

o

(according to their Web site, they now also integrate with Microsoft Office)

Identity in Groove - a system with many, many keys




A user in Groove has an account (possibly on a number of devices) with

o

a pass phrase to log in to the client and to generate the master symmetric key



The user can have a number of identities, each with

o

an asymmetric key pair for signing and verification

o

an asymmetric key pair for encryption

o

a digital fingerprint

o

a Diffie-Hellman key pair per member per shared space

Shared spaces - mutually trusting




A mutually trusting shared space has

o

a symmetric key used by all members to verify/authenticate messages

o

a symmetric key used by all members to encrypt messages



All space data is stored on all clients using a symmetric storage key per member per
shared space (encrypted with the user's master key)



There is no guarantee of identity - users could in principle spoof identities when
posting to the group

Shared spaces - mutually trusting messages




A delta (a chunk of data) is sent from Bob to the group containing:

o

a header

o

a body encrypted using the group's symmetric encryption key

o

a digest (SHA-1) of header and body authenticated using the group's
symmetric authentication key



Upon receiving this delta, Alice can

o

verify the integrity of the delta (thus ensuring that the delta was sent by a
group member)

o

decrypt the delta

o

store the delta locally using her own encryption
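
A sketch of how a group-wide authentication key can protect a delta, using an HMAC over header and body from Python's standard `hmac` module. HMAC-SHA1 is swapped in here as a well-known way to build a keyed digest; the slides do not say which construction Groove actually uses, and the function names and the simple concatenation of header and body are assumptions made for illustration.

```python
import hashlib
import hmac


def make_tag(auth_key: bytes, header: bytes, body: bytes) -> bytes:
    """Keyed SHA-1 digest over header and (already encrypted) body."""
    # Plain concatenation is a simplification; a real format would encode lengths.
    return hmac.new(auth_key, header + body, hashlib.sha1).digest()


def verify_delta(auth_key: bytes, header: bytes, body: bytes, tag: bytes) -> bool:
    """True if the tag was produced by someone holding the group's key."""
    expected = make_tag(auth_key, header, body)
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

Because every member holds the same key, a valid tag shows only that some group member sent the delta, not which one. This matches the lack of an identity guarantee in mutually trusting spaces.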

Shared spaces - mutually suspicious




Mutually suspicious shared spaces are used in situations where the identity of the
individual user must be ensured



Each member has a Diffie-Hellman key pair per other member



D-H allows Bob to compute a shared symmetric key with Alice using Bob's private
D-H key and Alice's public D-H key (and vice versa)



Using this shared symmetric key, communication between Alice and Bob is secured
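
A toy sketch of the Diffie-Hellman agreement described above, using only Python's built-in modular exponentiation. The modulus `P`, the base `G`, and the function names are assumptions chosen for illustration; the parameters are far too small and unvetted for real use, and Groove's actual parameters are not given in the slides.

```python
import hashlib
import secrets

P = 2 ** 127 - 1   # a Mersenne prime; toy-sized modulus (assumption)
G = 3              # public base (assumption)


def keypair():
    """Return (private, public) D-H keys."""
    private = secrets.randbelow(P - 3) + 2      # random value in [2, P - 2]
    public = pow(G, private, P)                 # G^private mod P
    return private, public


def shared_key(my_private, their_public):
    """Symmetric key derived from my private key and the other party's public key."""
    secret = pow(their_public, my_private, P)   # equals G^(a*b) mod P on both sides
    return hashlib.sha1(secret.to_bytes(16, "big")).digest()
```

Bob computes `shared_key(bob_private, alice_public)` and Alice computes `shared_key(alice_private, bob_public)`; both arrive at the same key without it ever being transmitted.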

Shared spaces - mutually suspicious messages




A delta is sent from Bob to the group containing:

o

a header

o

a body encrypted using the group's symmetric encryption key

o

digests (SHA-1) of header and body authenticated using each D-H derived key
pair, including Bob/Bob



Upon retrieval, Alice can then verify that the delta was indeed sent by Bob

Initiating membership of a shared space



Bob is chair and sends Alice an invitation to a shared space

o

cryptographic context (parameters for D-H)

o

Bob's public keys



Alice then replies with

o

a one-time key encrypted with Bob's public key

o

Alice's D-H public key



Bob can then

o

decrypt the one-time key and use it to send an encrypted message only Alice can
respond to, thus verifying that she intends to join the space

o

send the group keys to Alice

o

send Alice's D-H public key to the other members of the space

Leaving a shared space



If Carol leaves the shared space, the group key is recomputed, so that she can no longer
access the shared space's material



Group keys are not recomputed, when people join the space, as they should have
access to the history of the group



Dependencies on keys are stored, so that other members (who perhaps were off-line)
can still access data

Communication in Groove



Usually peers communicate using point-to-point connections



Peers are not always online or immediately available (e.g. they could be behind a
firewall)



Relays are used for tunnelling through firewalls, handling fan-out of messages, and
storing and forwarding

Various techniques for anonymity

Mix networks - defeating traffic analysis




Mix networks are used to ensure that a sender and receiver cannot both be known



A mix network consists of a number of known mixes - routers with asymmetric key
pairs



A sender chooses a path through the mix network (m_1, ..., m_n), and encrypts the
message (with final destination) with m_n's public key, encrypts this message (with
m_(n-1) -> m_n) with m_(n-1)'s public key and so on



The message is then sent to m_1, which decrypts the message using its private key, and
sends it to the next mix, which repeats the process



Only m_1 knows the sender and only m_n knows the receiver, and neither knows the
route of the message (not even their own position on the path)

Crowds - defeating Web browsing tracking




A number of members participate in a crowd, and they are known to each other



If a member, Bob, wishes to retrieve a Web page, Bob sends a request for the URL to
a random member, Carol (using symmetric encryption). Carol can then choose to retrieve
the Web page or forward the request to another crowd member, Alice, and so on.
Eventually a member chooses to retrieve the Web page, and the Web page is returned along
the request's path

Summary and pointers



A number of proven technologies exist, which can enable safe computing



The success of these technologies hinges on ease-of-use, as they are (in themselves)
quite complicated



There are two directions in the area

o

ensuring anonymity and privacy on the Internet

o

ensuring identity and integrity in a working environment



Next time:

o

Accountability

o

Reputation



can get complicated if users are anonymous...

Project description time!



You must formulate a P2P project for your group



You will present the result of this project to the rest of us during the first half of
December



(and deliver a report about it to Klaus and me by the middle of January)

Project topics



Anything goes! As long as

o

it is P2P in nature

o

it is sufficiently ambitious (neither too much nor too little)



You can

o

develop new routing and searching algorithms

o

create a neato P2P application

o

extend JXTA in some direction

o

create a P2P framework The Way It Ought Be Done And Not The Way JXTA
Did It

o

etc., etc.



Only one requirement prior to starting:

o

send a URL to a short project description (no more than a page) to Klaus and
me and get it approved





P2P Security - Reputation and Credibility

1.0

07/October/2003

Niels Olof Bouvin

Aarhus Universitet

Recap from last time:



Cryptography enables (to a high degree) secure communication



With asymmetric cryptography (public/private keys) messages can be free from
eavesdropping and tampering



Thus, we have the basics for building a secure infrastructure



This however still leaves the question: Who can you trust?

Reputation and Credibility



Introduction




Reputation and moderation systems


Introduction

Who can you trust?



Trust is inherent in most of our social interactions



Generally we have a pretty good idea whom to trust and whom not to trust



People (and companies) earn and lose our trust over time



If Alice trusts Bob, and Bob trusts Daniel, Alice is more likely to trust Daniel

o

and if Bob does not trust Carol, Alice is unlikely to trust Carol



Trust is hard to earn - easy to lose

o

a powerful incentive to be nice

Trust in a computer setting



In real life, we have a number of mechanisms for handling trust, and we are usually
dealing with people we know



On the Internet, these mechanisms break down

o

"on the Internet, no one knows you're a dog"



You can establish trust across the Internet

o

and the techniques outlined last time can help you ensure that you are dealing
with the person you expect

Some types of attack on the Internet



Pseudospoofing

o

using a number of aliases to influence a situation (e.g. voting)

o

on-line identities are not tied to real world identities - people cannot be
prevented from using various aliases



Denial of Service

o

request a service so much it becomes unavailable

o

can be handled (to a degree) through caching and micropayment



Flooding

o

filling a system with garbage, drowning out the valid content

o

various defences: micropayment or dropping unpopular content

Reputation and moderation systems

Usenet



People on Usenet earn their reputations by their postings



Some are respected as authorities - others are reviled as kooks, flamers, spammers,
and trolls



Trolls especially will often change their identities, as they are invariably kill-filed by
their unwilling audience

o

uncovering these new identities is a pastime of some newsgroup residents - but
they are of course always one step behind the troll

Usenet - anti-{kook,flamer,troll} weaponry




Most news readers have 'kill file' functionality

o

the user adds an unwanted poster to his/her kill file and the unwanted one is
then automatically filtered by the
news reader

o

disadvantage: binary, requires user intervention. Per user.



Some news readers (Gnus) use adaptive scoring


o

postings and users are awarded scores depending on the reader's actions

o

advantage: much more flexible

o

still not shared - does not work for newsgroups visited only seldom or for the
first time



Apart from adaptive scoring, scoring can be explicitly designated, so that specific
subjects or authors are scored up or down. Scoring can depend on arbitrary computations
on the postings

o

disadvantage: complex
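The difference between a kill file and scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Posting` type and the functions `kill_filter`, `score` and `adapt` are hypothetical names, and the rules are far simpler than what Gnus or any other news reader actually implements.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    author: str
    subject: str


def kill_filter(postings, kill_file):
    """Binary filtering: drop every posting whose author is in the kill file."""
    return [p for p in postings if p.author not in kill_file]


def score(posting, author_scores, subject_scores):
    """Graded filtering: sum the author's score and the scores of all
    keywords that occur in the subject line."""
    total = author_scores.get(posting.author, 0)
    subject = posting.subject.lower()
    for keyword, value in subject_scores.items():
        if keyword in subject:
            total += value
    return total


def adapt(author_scores, posting, was_read, step=1):
    """Adaptive scoring (assumed rule): raise the author's score when the
    user reads the posting, lower it when the user skips it."""
    delta = step if was_read else -step
    author_scores[posting.author] = author_scores.get(posting.author, 0) + delta
```

A reader would then sort or hide postings by score, e.g. showing only postings with `score(...) >= 0`. Both tables live with the individual user, which is exactly the "not shared" limitation noted above.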

Usenet - moderation




Some newsgroups become too cluttered by spam or flame wars



Either the newsgroup is left to deteriorate (with the regulars moving elsewhere) or the
newsgroup becomes moderated



A number of trusted regulars are elected as moderators by the newsgroup



From then on, a posting must be approved by a moderator before it appears on the
newsgroup



Problems:

o

moderation is hard work

o

moderators can misuse their power

o

moderation of a newsgroup can be bypassed

Usenet - NoCeM and GroupLens




Moderators do a great job, but

o

not all newsgroups are moderated

o

you may not agree with all the moderators' decisions



NoCeM

o

opt-in moderation - if you trust a moderator, you can subscribe to his/her/its
decisions



GroupLens

o

moderation done by people you agree with

o

you assign scores to individual postings, and thus build a profile

o

the scores given by people with similar profiles are then used to score the
postings you see
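The GroupLens idea (score a posting for you from the scores of people whose past scores resemble yours) can be sketched as neighbourhood-based collaborative filtering. The code below is a minimal sketch, assuming Pearson correlation as the similarity measure and a dictionary `ratings[user][posting]`; the function names are hypothetical and this is not claimed to be the exact GroupLens algorithm.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two users' scores over the postings both scored."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, posting):
    """Predict the user's score for an unseen posting: the user's own mean
    plus the similarity-weighted deviations of other users' scores."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or posting not in theirs:
            continue
        sim = pearson(mine, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        num += sim * (theirs[posting] - their_mean)
        den += abs(sim)
    return my_mean if den == 0 else my_mean + num / den
```

Note that a user who consistently disagrees with you is also informative: a negative correlation turns that user's low score into a positive contribution to the prediction.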

Slashdot



Purpose

o

moderate postings on the discussion boards by scoring interesting posts up and
scoring uninteresting/inflammatory posts down



Moderation was traditionally handled by a group of trusted moderators

o

problem: way too many posts for the moderators to handle

o

misuse of moderator power (who watches the watchers?)

Slashdot - meta-moderation




All (registered) users in good standing occasionally get (limited) moderation power



Moderators cannot themselves participate in the discussions they moderate



The moderations are meta-moderated by all registered users



Bad moderators will not moderate again (or very rarely)

Amazon



Purpose:

o

recommending books you're likely to like (and buy)



Based on the buying habits of the Amazon customers



You're (presumed) likely to buy books bought by people who buy the books you buy



This works surprisingly well
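The "people who buy the books you buy" rule can be illustrated by counting co-purchases. This is a minimal sketch, assuming purchase histories are available as a dictionary `baskets[customer] -> list of books`; the names `co_purchase_counts` and `recommend` are hypothetical and the code says nothing about how Amazon actually computes its recommendations.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(baskets):
    """For each book, count how often every other book was bought by the same customer."""
    counts = {}
    for books in baskets.values():
        for a, b in permutations(set(books), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(baskets, customer, top_n=3):
    """Rank books the customer does not own by how often they were bought
    together with books the customer does own."""
    counts = co_purchase_counts(baskets)
    owned = set(baskets[customer])
    scores = Counter()
    for book in owned:
        for other, n in counts.get(book, {}).items():
            if other not in owned:
                scores[other] += n
    return [book for book, _ in scores.most_common(top_n)]
```

No explicit ratings are needed: the buying habits themselves act as the reputation signal.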

eBay



How can you trust a stranger with your money?



Sellers and buyers mutually rate each other after each transaction



So, you don't buy stuff from sellers with bad reputations



Problem:

o

you can build up a good reputation over many smaller transactions and then
cheat people on high price deals

Advogato



System to establish the reputation of open source programmers

o

users assign trust to each other - their own reputation determines how much
their trust is worth to others

o

pseudospoofing attacks become difficult as new users have zero trust to give to
others
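Why aliases gain nothing under such a scheme can be shown with a deliberately simplified model. Advogato's actual trust metric is based on network flow; the sketch below is only a stand-in that assumes a set of trusted seed users and lets each user pass on a damped share of his/her own reputation. The function name `reputations` and the parameters `rounds` and `damping` are hypothetical.

```python
def reputations(endorsements, seeds, rounds=20, damping=0.5):
    """Propagate reputation from trusted seeds along endorsement links.

    endorsements maps each user to the list of users he/she vouches for.
    An endorsement is worth a damped, equal share of the endorser's own
    reputation, so users with zero reputation have nothing to give.
    """
    users = set(endorsements) | set(seeds)
    for receivers in endorsements.values():
        users |= set(receivers)
    rep = {u: (1.0 if u in seeds else 0.0) for u in users}
    for _ in range(rounds):
        new = {u: (1.0 if u in seeds else 0.0) for u in users}
        for giver, receivers in endorsements.items():
            if not receivers:
                continue
            share = damping * rep[giver] / len(receivers)
            for receiver in receivers:
                if receiver not in seeds:
                    new[receiver] += share
        rep = new
    return rep
```

In this model a ring of freshly created aliases that only endorse each other (and their owner) stays at reputation zero, since no reputation flows into the ring from the seeds.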

Micropayment - making cheating unprofitable




DoS attacks become infeasible if attackers must pay for the resources they use



Cash payment

o

using a number of sophisticated cryptographic methods

o

very much an ongoing research topic



Proof of Work (POW)

o

performing some non-trivial work to gain access to a resource

o

typically a variation of a cryptographic hash collision calculation
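A proof of work of this kind can be sketched as a partial hash collision: the client must find a nonce so that the hash of challenge plus nonce starts with a given number of zero bits. The sketch below assumes SHA-256 and a byte-string challenge issued by the server; the function names and the encoding of the nonce are illustrative assumptions, not a particular deployed scheme.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, nonce, difficulty):
    """Cheap check for the server: a single hash computation."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Expensive search for the client: about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point: each extra bit of difficulty doubles the client's expected work, while verification stays at one hash. A single legitimate request costs little, but flooding a service with requests becomes expensive.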

Summary



Existing reputation systems provide users with a modicum of quality of service



However, they are not tamper proof and may be misused



Even if the moderators have integrity, you may not agree with them



Systems must have checks and balances in order to prevent abuse

P2P systems



Reputation and security in P2P systems




Damiani et al