Routing Indices For Peer-to-Peer

cracklegulleyAI and Robotics

Oct 19, 2013 (3 years and 11 months ago)

89 views

Routing Indices For Peer
-
to
-
Peer
Systems

Svetlana Strunjas

University of Cincinnati

May,2002



Outline


Motivation and background



P2P systems



Routing Indices



Experimental results



Conclusions

Motivation




Create efficient and feasible


distributed


index search mechanism for
P2P systems



Background



Napster


centralized indices (vulnerable to
attack and difficult to scale)



Gnutella


queries flood significant part of
the network(simple but very costly approach)

Background(cont’d 1)



Freenet


each node builds an index with the
location of recently requested documents
(queries are restricted to the specific
documents and it takes time for node to build
an effective index)


Background(cont’d 2)


Oceanstore


distributed


index approach
similar to Routing Indices(special case of
Compound RIs)


Key difference is in static network and
queries on document identifier (Oceanstore)
versus dynamic network and queries on
content of document(Routing Indices)

Background(cont’d 3)


Selecting a neighbor for forwarding query using Routing Indices is
generalization of the Bellman
-
Ford algorithm


Bellman


Ford algorithm


-
transmits packet between two nodes through the


shortest route


-
destination of packet is pre
-
defined(IP routing)


Routing Indices


-
transmit packet from node to node in order to find


best answer to the query


-
destination of packet is not pre
-
defined , but instead depends on


query contained in packet

P2P Systems

(Introduction)


Formed by a large number of nodes that can
join or leave the system at any time and have
equal capabilities


Each node has local documents database
which can be accesed through local index


The local index receives content queries and
returns pointer to the document with
requested content

Query Processing in Distributed
P2P Systems



Users submit queries to any node along with
the stop condition


Node receives query , evaluates the query
against it’s own database ,returns pointer to
any results , and , if the stop condition is not
reached , forwards the query to
neighbor(s)(parallel or sequentially)

Routing Indices

Introduction



RI allows a node to select “best“ neighbors to send a query
to



It is data structure(and associated algorithm) which ,for
given query , returns a list of neighbors , rank according to
their goodness for query



Goodness reflects the number of documents in “nearby”
nodes


Routing Indices

Types


Compound Routing Indices(CRI)




Hop
-
count Routing Indices(HRI)



Exponential Routing Indices(ERI)

Compound Routing Indices

(example)

CRI(cont’d1)


Documents are on zero or more topics,and queries
request documents on particular topic


Each node has a local index for quickly finding
local documents


CRI contain the number of documents along each
path and the number of documents of each topic of
interest

CRI

(“goodness” estimators)


Measure of “goodness” of the neighbor , for given
query , is number of documents that can be found in
that path


Estimated number of results along each path is
given by formula:



CRI

(drawbacks)


CRI do not take into account the difference in cost
due to the number of hops necessary to reach the
document



Are not applicable if we have cycles in network




Using RI









RIs with four topics of interests : database,network,theory and languages

Storage space required


It can be adjusted by increasing or decreasing the
level of summarization of index


For centralized systems it is C*(T+1)*N bytes


For distributed systems it is C*(T+1)*B*N


C


number of categories


T


counter size in bytes


B


branching factor


N


number of nodes



Creating (maintaining)RIs





Disconnection from Network

150 75 0 75 125

1500 110 80 70 165

Algorithm for
Creating/Updating RIs



There are two phases :




1.Newly connected node sends a summary of its local index to its
new neighbors (creation phase)


2.The node waits for update messages or for changes on its local
index ; after updating its RI , the node sends a new aggregate RIs
to all its neighbors

Algorithm for Answering
Queries



It runs every time query is received


Query is given together with stop condition


There are three phases:


1.Node attempts to answer the query using its local db


2.If not enough results are obtained to reach stop condition , the


algorithm ranks all the neighbors using estimating function


3.Node sends the query to each neighbors sequentially,checking if


the stop condition was reached whenever each query returns



Hop
-
count RI


Store aggregated RI for each “hop” up to a
maximum number of hops(horizon of the RI)


Hop
-
count RI(cont’d1)


Goodness of neighbor is defined as ratio between the
number of documents available through that neighbor
and the number of messages required to get those
documents



Simple model that allows us to compute this ratio is
regular
-
tree model


Hop
-
count RI(cont’d2)


Model assumes that all nodes are
connected to exactly F+1 nodes ,
except ones at the leaves which are
connected to only one


Goodness of the neighbor is given by
formula:



goodness()


estimating function for CRI

N
i
[j]

RI entry for j hops through i
-
th neighbor

F
-
fanout

H
-
horizon

Hop
-
count RI

(drawbacks)


HRI performances can be negatively affected
by the lack of information beyond the
horizon



Higher storage and transmission cost than
CRI

Exponentially Aggregated RI


Stores the result of applying the regular
-
tree
cost formula to hop
-
count RI

Exponentially Aggregated
RI(cont’d)


Each entry of the ERI for node N
contains a value computed as
:



th


height of the assumed regular tree

goodness()


estimating function for CRIs

N[j]


summary of the local index of neighbor j of N

T


topic of inerest for entry


With ERIs we can keep information for all nodes


accessible from each neighbor in the RI,while HRI’s do not have

any information beyond horizon





Cycles in P2P Network

There are three approaches for dealing with


cycles:


No
-
op solution



Cycle avoidance solution



Cycle detection and recovery


No
-
op solution


Works only with the HRI and ERI




If there are cycles CRI algorithm can be
trapped in an infinite loop




NO
-
op solution (cont’d1)

10

15

20

G1=21.67

G2=23.58

NO
-
op solution (cont’d2)


In HRI cycles longer than horizon will not affect
the RI


If we use regular
-
tree cost model we can limited
effect of short cycles


In HRI and ERI difference between two values of
goodness introduced by cycle is very
small(algorithm will not propagate updates which
are not much different than old value)


Cycle avoidance solution


Do not allow nodes to create an update
connection to other node if such connection
creates the cycle



Drawback : Absence of global information
causes suboptimal update network

Cycle detection and recovery


Detects cycles sometime after they are
formed and takes recovery actions to
eliminate the effect of the cycles


Cycles are detected using unique message
identifier


Recovery procedure can decrease accuracy
of the RI

Experimental Results

(Introduction)


Results are obtained using different models for


elements of search mechanism


Steps of experiment:


1
.Setting P2P system as network of nodes T, where each node


contains set of documents


2.Users send requests consisting of query Q and stopCondition


to a node of P2P system


3. search mechanism answers requests by obtaining a set of documents of size


stopCondition that matches Q


4.search mechanism allows for update such as addition of nodes and new


documents





Experimental Results

(Techniques and measure of cost)



Search techniques : CRI , HRI , ERI,


No
-
RI(for comparison purposes)



Measure of cost:numbers of messages
generated

Experimental Results

(Network topologies)


Topology of network defines number of nodes and
how they are connected


In this model , three kinds of topologies are
considered:


1.Tree


2.Tree with added cycles


3.Power
-
law graph



Experimental Results

(Distribution of documents)


Distributions for modeling the location of document
results :


1.Uniform distribution( all nodes have the same
probability of having each document result)


2.80/20 biased distribution ( assigns uniformly 80%
of the document results to the 20% of nodes , and
the remaining 20% of the documents to the
remaining 80% of nodes)

Experimental Results

(Simulation parameters)

Experimental Results

(Comparison of CRI,HRI and ERI)

Experimental Results

(Number of results)

Experimental Results

(Effects of overcounts)

Experimental Results

(Effects of cycles)

Experimental Results

(Network topology)

Experimental Results

(Updates and network topology)

Experimental Results

(Updates and cycle policy)

Experimental Results

(Updates per minute)

Conclusions


ERI and HRI offer significant improvements
versus not using an RI,while keeping update
costs low


Routing indices (in particular ERI and HRI
can help improve the search performance of
current and future P2P systems


References:


[1] A.Crespo , H. Garcia
-
Molina.Routing Indices For
Peer
-
to
-
Peer Systems(2001).
The 22
nd

International
Conference on Distributed Computing Systems
,
Vienna , Austria.


[2] J.F.Kurose , K.W.Ross.(1999).”Routing
Principles”.
Computer Networking


A Top Down
Approach Featuring Internet,
2
nd

ed. , Addison
Wesley,4.