2 P2P or Not 2 P2P

Dec 4, 2013
2 P2P or Not 2 P2P
M. Roussopoulos, M. Baker, D.
Rosenthal, T. Giuli, P. Maniatis,
and J. Mogul
Outline
• Introduction
• Problem Characteristics Axes
• Candidate Problems
• Decision Tree for 2 P2P or Not 2 P2P
• Conclusion and Discussion
Introduction
• P2P research focuses on:
– Algorithms for efficiency, scalability,
robustness, security, indexing/search,
dissemination, etc.
• Problem Addressed:
– What questions should a system designer
ask to judge whether a P2P solution is
appropriate for a particular problem?
Problem Characteristics Axes
• P2P Characteristics
– Self-organizing: no global directory of peers or resources
– Symmetric communication: no fixed client/server roles
– Decentralized control: no centralized server
• Problem Axes:
– Budget: P2P is a low-budget solution
– Resource relevance to participants: high relevance favors a P2P
solution
– Trust: the cost of handling trust is high in P2P
– Rate of system change: timeliness and consistency
– Criticality: P2P is not favored
– Physical constraints: not considered
Candidate Applications
App Name                         | Budget | Relevance | Mutual Trust | Rate of change | Criticality
Cell phone forwarding            | Low    | Low       | Low          | N/A            | N/A
Internet backup                  | Low    | Low       | Low          | N/A            | N/A
Corporate backup                 | Low    | Low       | High         | N/A            | N/A
Distributed monitoring (online)  | Low    | High      | High         | High           | High
Distributed monitoring (offline) | Low    | High      | High         | Low            | Low
Internet routing (BGP/RON)       | Low    | High      | High         | High           | High
Ad hoc routing                   | Low    | High      | High         | Low            | High
Candidate Applications
App Name                    | Budget | Relevance | Mutual Trust | Rate of change | Criticality
File sharing                | Low    | High      | High         | High           | Low
Freenet                     | Low    | High      | Low          | N/A            | N/A
Critical flash crowds       | Low    | High      | High         | Low            | High
Non-critical content distr. | Low    | High      | High         | Low            | Low
Usenet                      | Low    | High      | High         | Low            | Low
Tangler                     | Low    | Low       | Low          | N/A            | N/A
Auditing (LOCKSS)           | Low    | High      | Low          | Low            | N/A
Distributed time stamping   | Low    | High      | Low          | Low            | N/A
Decision Tree
Conclusion
• Motivation for P2P solution
– Limited budget, high relevance of resource,
high trust between nodes, low rate of system
change, low criticality of the solution
• Downside of P2P solution
– Design complexity
– Not applicable to all problems
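The motivating conditions above can be encoded as a simple rule set. The Python sketch below is a hypothetical rendering of the paper's axes for illustration, not the authors' actual decision tree:

```python
# Hypothetical encoding of the decision axes; the function name and the
# string-valued levels are illustrative assumptions, not the paper's labels.
def p2p_appropriate(budget, relevance, mutual_trust, rate_of_change, criticality):
    """True when the slide's conditions for favoring P2P all hold:
    limited budget, high relevance, high trust, low rate of change,
    low criticality."""
    if budget != "low":
        return False          # ample budget: a centralized solution is affordable
    if relevance != "high":
        return False          # low relevance: participants lack incentive to join
    if mutual_trust != "high":
        return False          # low trust raises the cost of a P2P design
    if rate_of_change != "low":
        return False          # rapid change strains timeliness and consistency
    return criticality == "low"  # critical systems disfavor P2P

# Rows from the candidate-applications tables:
print(p2p_appropriate("low", "high", "high", "low", "low"))  # Usenet: True
print(p2p_appropriate("low", "low", "low", "n/a", "n/a"))    # Internet backup: False
```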
Discussion
• Decentralized P2P architecture
– Scalable, fault-tolerant, no resource bottleneck,
open
– Less control, more complexity, weaker security
• Is the decision tree applicable to all P2P
applications?
– How about other P2P applications like P2P CVS, P2P
E-mail?
• Intellectual property rights versus complete
openness
• Is P2P stoppable?
• Hybrid P2P structure
Scooped Again
Jonathan Ledlie, Jeff Shneidman, Margo Seltzer, John
Huth
Harvard University
Outline
• Motivation
• What is the Grid & What is P2P
• Three fallacies
• Shared technical problems
• Conclusion
• Discussion
Motivation
• The history of Web
– In 1989, Tim Berners-Lee’s need to communicate his
own work and the work of other physicists at CERN
led him to develop HTML, HTTP, and a simple
browser.
– This simple, inelegant solution remains at the core of
the Web today.
• A parallel situation exists today with the p2p and
Grid communities.
– A large group of users (scientists) are pushing for
immediately useable tools to pool large sets of
resources.
What is the Grid
• a type of parallel and distributed system that enables the
sharing, selection, and aggregation of resources
distributed across multiple administrative domains based
on their (resources) availability, capability, performance,
cost, and users’ quality-of-service requirements. (Buyya)
• Goals of the Grid (autonomic computing)
– Self-configuring
– Self-tuning
– Self-healing
• Manifestation
– Condor (shared computation)
– Globus (computational middleware)
– European Data Grid Project (Data Grids)
What is P2P
• Much like a grid
• Peer-to-peer is a class of applications that take
advantage of resources (storage, cycles, content,
human presence) available at the edges of the Internet.
(Shirky)
• Goal of P2P
– take advantage of the idle cycles and storage of the edge of the
Internet, effectively utilizing its “dark matter”.
• Manifestations
– Gnutella, KaZaA (file sharing)
– Distributed.net (distributed computation)
– Chord, Pastry, Tapestry
Three Fallacies
Difference in technical problems?
• Conventional wisdom
– “Computational problems”, Grid
– “file sharing”, P2P
• P2P is moving in the computational
direction, e.g., desktop collaboration and
network computation.
• Similarity in formation, utilization, security
and maintenance.
Solutions should be fundamentally
different?
• Researchers familiar with both
communities see good ideas in each
community that can solve common
problems.
• This fallacy is application-dependent.
• A general awareness of technical
approaches taken by the other community
may help solve “physically private”
problems.
Flexibility or not?
• P2P research is very flexible: one version can
obsolete the previous and new algorithms can
be developed without conforming to any
standard.
• There is room for flexible research in Grid too.
– Grid researchers recognize the need for test-beds as
staging grounds for new applications and protocols.
– Traditional Grid deployments have been in university
settings where support staff are on hand to test and
deploy new software updates.
– Grid users are willing to adopt different technologies
to get their work done.
Shared Technical Problems
• Formation
• Utilization
• Coping with Failure
• Maintenance
Formation
• Topology formation and peer discovery
deals with the problem of how nodes join a
system and learn about their neighbors,
often in an overlay network.
– Much Grid infrastructure is hardcoded and
could benefit from the active formation found
in p2p research prototypes.
Utilization
• Both communities have examined data
replication and caching algorithms to use
resources more efficiently.
• Scheduling and handling of contention has
been examined in both communities.
• Load balancing/splitting schemes in both
communities have been attempted.
Coping with Failure
• P2P systems tolerate lossy storage to some extent.
Grid data, however, cannot be lossy.
• Traditional distributed system techniques
for dealing with failure may not be
appropriate for traditional p2p systems or
Grid systems.
• They both have to deal with authentication
issues, authorization issues, availability
issues.
Maintenance
• Traditionally, P2P has had no standards or APIs
• Grid papers profess the need for a
standardized programming interface, like
OGSA
• Similar efforts toward P2P standardization are
Berkeley BOINC, Google Compute, and
overlay standardization.
How do we avoid being scooped
again then?
• Familiarize ourselves with the set of
problems the Grid is addressing
• Understand Grid users’ day-to-day needs
• Ask how robust a solution must be in order to
be appropriate for deployment on a Grid
• Understand the standards they
are developing and to which they expect
all systems to comply (OGSA)
Find a user and figure out what that user needs
Discussion
• How can we take into consideration the
huge gap of storage size between P2P
and Grid, whose data cannot be lossy?
• Grid has lower churn rate, higher trust
than P2P, thus, many P2P solutions might
be an overkill for Grid.
• What can p2p researcher do in practice?
A Modular Network Layer for
Sensornets
Cheng Tien Ee, Rodrigo Fonseca,
Sukun Kim, Daekyeong Moon, Arsalan
Tavakoli, David Culler, Scott Shenker,
Ion Stoica
The need for a modular network layer
• Vertically integrated design
– Variety of sensornet applications
– Heavy need for optimization
– Lack of consistency in component modules and interfaces
• Modular network layer
– Solves the inconsistency in component modules and interfaces
– Better organization, easier development
– Less resource consumption, e.g., memory and energy
– The narrow waist of the sensornet architecture lies between the link
and network layers
• Design goals
– Code reuse: rapid application development
– Run-time sharing: sharing code and resources, e.g., the radio
Common Components in Protocols
Network layer service and major
components
• Provides a best-effort,
connectionless, multi-
hop communication
abstraction to higher
layers
• Balances flexibility and
reusability
– Control plane
• Routing engine (RE)
• Routing topology (RT)
– Data plane
• Forwarding engine (FE)
• Output queue (OQ)
Interfaces
• FE | OQ: pass complete packets around
• FE | RE
– Basic interface: all routing protocols must provide this
– Cost-based interface: some routing protocols can extend the
basic interface and provide this
• RE | RT
– A unified interface would have increased code size, added
complexity, and instability
– Instead, protocol-specific: carries only the information
necessary to determine routes

Interface BasicForwarding {
    neighbor_list getNextHops(RoutingHeader*);
}
Interface CostBasedForwarding {
    cost_t getCost(RoutingHeader*, neighbor);
}
Packet Header
• Separate headers for different components
– Protocol identifier: selects the appropriate components
– OQ header: info for scheduling and buffer
management, e.g., packet priority
– FE header: info for forwarding the packet, e.g., hopcount,
unique message id
– RE header: info to determine the next hop
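A minimal sketch of the separated per-component headers described above. The field names and types are assumptions for illustration, not the paper's exact layout:

```python
from dataclasses import dataclass

# Each component gets its own header; the protocol identifier selects
# which RE/FE/OQ components handle the packet. Field names are assumed.
@dataclass
class OQHeader:
    priority: int          # used by the output queue for scheduling/buffering

@dataclass
class FEHeader:
    hopcount: int          # forwarding-engine state
    message_id: int        # unique id, e.g., for duplicate suppression

@dataclass
class REHeader:
    next_hop_info: bytes   # protocol-specific data used to determine the next hop

@dataclass
class Packet:
    protocol_id: int       # selects the appropriate components
    oq: OQHeader
    fe: FEHeader
    re: REHeader
    payload: bytes

pkt = Packet(1, OQHeader(priority=0), FEHeader(hopcount=3, message_id=42),
             REHeader(b""), b"sensor reading")
```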
Output Queue Modules
• Implements packet scheduling
– Basic OQ: high-priority packets are transferred before low-priority
ones
– Flexible power scheduling (FPS):
• TDMA, fixed slots each cycle
• Balances supply and demand at each neighbor and achieves high
utilization
• Requires knowledge of a message's destination to classify
packets (with help from the RE)
– Epoch-based proportional selection (EPS):
• Dynamic number of slots per cycle
• Weighted round-robin serving of children's queues to achieve
fairness
• Requires knowledge of a message's destination to classify
packets (with help from the RE)
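The Basic OQ policy (higher priority first) can be sketched with a standard heap. This is an illustrative model under assumed names, not the paper's implementation:

```python
import heapq

# Minimal sketch of the "Basic OQ" policy: higher-priority packets are
# transmitted before lower-priority ones, FIFO within a priority level.
class BasicOutputQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker preserves arrival order within a priority

    def enqueue(self, priority, packet):
        # heapq is a min-heap, so negate priority to serve high priority first
        heapq.heappush(self._heap, (-priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

q = BasicOutputQueue()
q.enqueue(0, "routine reading")
q.enqueue(5, "alarm")
q.enqueue(0, "another reading")
print(q.dequeue())  # "alarm" leaves first despite arriving second
```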
Forwarding Engine
• Basic forwarding
– Gets per-packet next-hop info from the RE
– Checks for packet-interception requests from
higher layers
• Opportunistic forwarding
– Uses the knowledge of the cost to the
destination (from the RE) to forward packets
• Multicast
– The RE implicitly returns the list of all next hops
Routing Engine Modules
• Broadcast RE:
– Handles all packets that are logically
broadcast to all one-hop neighbors
• Protocol-specific REs
– PathDCS: routes along the network path to which
data is mapped, using beacons as guides
– BVR: routes based on hop distance to beacons
– MintRoute: routes along the topology tree
Routing Topology
• MTree
– M routing trees rooted at random nodes in the
network
– Used for basic route-to-base applications
– Routing info maintained in a routing table
• Gradient topology
– Each node maintains its cost-to-destination
• Geographic RT
– Geographic coordinates, e.g., from GPS
– Provides the closest next-hop nodes toward a given
destination
– Euclidean distance can be used as a cost metric
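The Geographic RT's next-hop choice can be sketched as a greedy selection by Euclidean distance. The coordinates and helper below are illustrative assumptions consistent with the slide, not the paper's code:

```python
import math

# Greedy geographic next-hop: among a node's neighbors, pick the one
# closest (in Euclidean distance) to the destination coordinates.
def next_hop(neighbors, dest):
    """neighbors: dict of name -> (x, y); dest: (x, y). Returns a name."""
    return min(neighbors, key=lambda n: math.dist(neighbors[n], dest))

# Hypothetical neighbor table for one node:
nbrs = {"A": (0.0, 1.0), "B": (2.0, 2.0), "C": (5.0, 0.0)}
print(next_hop(nbrs, (6.0, 0.0)))  # "C": closest neighbor to the destination
```

Greedy forwarding can stall at a local minimum (no neighbor closer than the current node), which is one reason the cost metric is exposed through the RE interface rather than hardcoded.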
Packet forwarding procedure
Example: Collection by MintRoute
• Tree based routing
Routing Tree maintains:
• Cost to root
• Neighbor's costs
• Parent
Tree RE/RT
Classic FE
neighbor_list
getNextHops(RH*)
data packet flow
control signals
Tree maintenance
traffic
Forward {
getNextHops(RH*)
SPSend(first_in_list)
}
Example: Point-to-Point by
Geographic Routing
• Greedy geographic routing
•My coordinates
•Neighbor coordinates
•Cost is eucl. distance
Geo RE/RT
Classic FE
Forward {
getNextHops(RH*)
SPSend(first_in_list)
}
data packet flow
control signals
Coordinates
exchange with
neighbors
neighbor_list
getNextHops(RH*)
Example: data-centric routing by
Opportunistic FE
Routing Tree maintains:
• Cost to root
• Neighbor's costs
• Parent
Tree RE/RT
Opportunistic FE
cost
getCost(RH*, me)
Forward {
if (my_cost < pkt.cost)
{
pkt.cost = my_cost
SPSend(bcast)
}
}
data packet flow
control signals
Tree maintenance
traffic
Constraint on Composition
• Basic FE | OQ works for all protocols
• Opportunistic FE requires a meaningful
cost-to-destination metric
Evaluation: code size and memory
footprint
• Two protocol combinations use 40%-58% less
data memory and 18%-37% less program memory
Evaluation: Performance
• More delay in the
modular network layer
than in a monolithic
architecture
• Raw performance is not
the primary goal in
sensornets; energy
efficiency is.
Conclusion
• The modular network layer design achieves a
58% memory reduction and 37% less
code when running protocols concurrently.
• Increases the portability of sensornet
applications.
• Speeds up the development of sensornet
applications.
• The additional latency is acceptable in the
context of sensornets.
Discussion
• How complex can a sensornet application
become in future?
• How does the layered design affect other
functionalities, e.g., power management, security,
reliability, and time synchronization?
• How to address issues which require cross-
layer cooperation in a layered design?
• How well does the “end-to-end” argument fit into
the sensornet applications?
Evaluating the Running Time
of a Communication Round
over the Internet
Omar Bakr, Idit Keidar
Outline
• Introduction
• Methodology
• Experiments
• Conclusion
• Discussion
Decentralized or leader-based?
• Hard to answer because
– end-to-end Internet performance itself is
extremely hard to analyze, predict, and
simulate
– end-to-end performance observed on the
Internet exhibits great diversity (in terms of
topology and time period)
– different algorithms can prove better under
different performance metrics
Round-based Metric?
• The typical theoretical metric used to
analyze the running time of distributed
algorithms is the number of message
exchange rounds the algorithm performs,
or the number of communication steps
in the case of an asynchronous system
• The paper’s results indicate that the round-
based metric is misleading
Algorithms
Methodology
• Host list
– MIT, at the Massachusetts Institute of Technology,
Cambridge, MA;
– UCSD, at the University of California San Diego;
– CU, at Cornell University, NY;
– NYU, at New York University, NY;
– Emulab, at the University of Utah.
– CA in California
– UT1 and UT2 in Utah.
– KR in Korea,
– TW, at National Taiwan University in Taiwan;
– NL, at Vrije University in the Netherlands
Server Implementation
• Each server has knowledge of the IP
addresses and ports of all the potential
servers in the system.
• Every server keeps an active TCP connection to
every other server that it can communicate with.
• Each server invokes the algorithms in round-robin
order after a random delay
• Probes run constantly to track the latency and
loss rate of the underlying network
Measurements
• Distribution of local
running time
• Distribution of overall
running time
• Clock skew
– Compute clock skew
every 15 mins
– Compute clock
difference against a
fixed host
Running time distribution over
TCP/IP
• Assume the TCP latency is d, the loss probability is p, and the TCP
retransmission timeout grows exponentially (d, 2d, 4d, 8d, …). Then

P[latency = (2^k − 1)d] = p^(k−1) (1 − p)

P[latencyOfStage < D] = ∏ over each link i of P[latency_i < D]
The last formula explains why the expected overall running time of a
stage increases as the number of messages increases.
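The point above can be checked numerically. This sketch assumes the latency distribution takes the values (2^k - 1)*d with probability p^(k-1)*(1 - p) and that per-link losses are independent:

```python
# Numeric sketch of the formulas above; function names are illustrative.
def p_latency_below(D, d, p, kmax=50):
    """P[latency < D] for one transfer: sum the point masses below D.
    Latency is (2**k - 1)*d with probability p**(k-1) * (1 - p)."""
    total = 0.0
    for k in range(1, kmax + 1):
        if (2**k - 1) * d < D:
            total += p**(k - 1) * (1 - p)
    return total

def p_stage_below(D, d, p, n_links):
    """P[latencyOfStage < D]: product over the stage's independent links."""
    return p_latency_below(D, d, p) ** n_links

# With more messages per stage, the chance that *every* one finishes fast
# shrinks, so the expected stage time grows.
d, p, D = 0.05, 0.05, 0.2
print(p_stage_below(D, d, p, 1) > p_stage_below(D, d, p, 10))  # True
```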
Experiment I: Lossy Hosts
Experiment III
• Impact of latency changes over time
– All-to-all is affected for every initiator
– Leader-based is affected only for those initiators
whose latency has changed
• A high loss rate does not necessarily result
in a high overall running time.
• Triangle-inequality violations are observed, and
messages can be re-routed to exploit them.