Q: What is the problem?
A: The RSSR paper has an error. Algorithm 0 does not seem to work.
Q: But the evaluation results look good.
A: These results were generated by another algorithm (Algorithm 1).
Q: Why does Algorithm 0 not work? Is it really an error?
A: In simulation, the attack edge betweenness is not much larger than the honest edge betweenness.
Q: What is the possible reason?
A: The problem is that in the simulation, the issuer rate is too small. Here, the issuer rate (C) is the number of ARWs disseminated by each pair of nodes. Let he and ae be an honest edge and an attack edge, respectively. Suppose that the network is even. If C is large, the in-betweenness and out-betweenness of he cancel each other, which makes the betweenness of he low. However, if C is low (in the simulation, C=1), the in-betweenness and out-betweenness of honest nodes do not cancel each other, making the betweenness of he high as well.
Figure 1 shows the result of Algorithm 0 when one pair of nodes disseminates 5000 first type ARWs. Therefore, the conclusion is that C should be large, but the number of node pairs that disseminate ARWs does not need to be large.
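The cancellation argument above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the notes: betweenness is taken to be the absolute value of the signed crossing count of an edge, and the function name net_flow_per_walk and the four-node "diamond" graph are hypothetical.

```python
import random
from collections import defaultdict

def net_flow_per_walk(adj, s, t, walks, rng):
    """Run `walks` absorbing random walks from s to t and return, per
    undirected edge, |signed crossings| / walks (assumed betweenness)."""
    signed = defaultdict(int)  # key (a, b) with a < b; +1 for a->b, -1 for b->a
    for _ in range(walks):
        cur = s
        while cur != t:
            nxt = rng.choice(adj[cur])
            a, b = (cur, nxt) if cur < nxt else (nxt, cur)
            signed[(a, b)] += 1 if cur == a else -1
            cur = nxt
    return {edge: abs(total) / walks for edge, total in signed.items()}

# Hypothetical symmetric graph: edge (1, 2) carries no net flow in expectation.
diamond = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
rng = random.Random(7)
few = net_flow_per_walk(diamond, 0, 3, 1, rng)       # C = 1
many = net_flow_per_walk(diamond, 0, 3, 10000, rng)  # large C
print(few.get((1, 2), 0.0), many.get((1, 2), 0.0))
```

With a single walk the value for edge (1, 2) is whatever that one walk happened to do, while with many walks the opposite crossings largely cancel and the normalized value approaches zero.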
Lemma: each first type random walk increases the betweenness of at least one attack edge.
To forge the destination set of each honest node.
To reduce the betweenness of attack edges, Sybil nodes can launch two kinds of attacks.
Attack 1: feedback
Solution: set the length of each ARW to m. For an (s,t)-ARW arw, if arw does not reach t in m steps, s rejects t.
First, m steps are sufficient for t to be reached.
When t is honest, the Sybil nodes should not launch Attack 1. Suppose that t is honest. If Sybil nodes prevent t from being reached, the betweenness of attack edges will be increased. s can issue a large number of (s,t)-ARWs to increase the betweenness of attack edges. Therefore, Sybil nodes should not launch Attack 1.
When t is Sybil, Sybil nodes should not launch Attack 1 either. Suppose that t is Sybil. If Sybil nodes prevent t from being reached, t will be rejected. t is chosen from the destination set of s, which means t needs to communicate with s. Therefore, Sybil nodes should not launch Attack 1.
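The length-bounded rule of the solution above can be sketched as follows. The function name arw_reaches and the droppers parameter (nodes that discard the walk instead of relaying it) are hypothetical and only model Attack 1 in the simplest way.

```python
import random

def arw_reaches(adj, s, t, m, rng, droppers=frozenset()):
    """Return True if an absorbing random walk from s reaches t within m steps.
    Nodes in `droppers` discard the walk (assumed model of Attack 1)."""
    cur = s
    for _ in range(m):
        if cur == t:
            return True
        if cur in droppers:
            return False  # the walk is never relayed further
        cur = rng.choice(adj[cur])
    return cur == t

# s accepts t only if the walk arrives in time; otherwise s rejects t.
path = {0: [1], 1: [0, 2], 2: [1]}
rng = random.Random(0)
print(arw_reaches(path, 0, 2, 50, rng))                # honest relaying
print(arw_reaches(path, 0, 2, 50, rng, droppers={1}))  # node 1 drops the walk
```

In the second call the walk can never reach node 2, so s rejects t, which is the outcome the argument above says a Sybil t would want to avoid.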
Attack 2: forge
Solution: Steered random walk.
Figure 1: algorithm 0, c500, real1222rn500 (x-axis: g; series: har(sohl), har(rssr), sar(sohl), sar(rssr))
Q: Why does Algorithm 1 work?
A: Because mathematically it works.
Q: Algorithm 0 and Algorithm 1, which is better?
A: It seems that Algorithm 1 is more efficient, but it has a security problem. For v to compute the betweenness of e=(v,u), the betweenness of e for v is something like v.potential - u.potential, where v.potential is the expected number of times that an (i,j)-ARW passes v. Therefore, the problem is that u may be Sybil and will not give v the correct potential.
Q: OK, you can just give up Algorithm 1. What are the remaining problems of Algorithm 0?
A: Let arw be an ARW1. The Sybil nodes may never relay arw to its destination.
If you can prove that Algorithm 1 works, you can write in your dissertation that Algorithm 1 makes some improvement over Algorithm 0. After graduation, you can ask the journal to correct your error.
RSC
Q: Let rw be a first type (s,t)-SRW, where v and u are honest and Sybil, respectively. What if a Sybil node simply discards rw once it has received it?
A: This rw will increase the betweenness of an attack edge by one.
Q: Let ae=(fhn,fsn) be an attack edge that connects honest node fhn and Sybil node fsn. Suppose fsn has received rw from ae. fsn can simply send rw back to fhn to reduce the betweenness of ae.
A: rw will finally reach u. Therefore, rw will finally pass some attack edge from the honest region to the Sybil region and thus increase the betweenness of this attack edge by one.
Q: Then, rw has to hop for a long distance, which wastes the resources of honest nodes.
A: Cut rw once rw has hopped N steps.
Q: You mean that rw has a writable counter initialized as n? Then, Sybil nodes can reset the counter to be very large.
A: The counter is a hash chain. Initially, the counter is rw.counter=Hash(rw.s). Let v be the current node of rw, ...
It seems that we need asymmetric encryption: s encrypts the counter. At each hop, where rw is currently at node x, x asks s to decrypt the counter, decrease the counter, and then re-encrypt the counter...
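One way a hash chain can serve as a hop counter is sketched below. This is not the construction from the notes, which is left unfinished there; it is a generic one-way-chain sketch in the style of distance-vector security schemes such as SEAD. It assumes that every node learns an authentic anchor value from s, and all function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_n(data: bytes, n: int) -> bytes:
    """Apply the hash function n times."""
    for _ in range(n):
        data = h(data)
    return data

def make_counter(seed: bytes, max_hops: int):
    """Source side: the hop-0 token and the public anchor H^max_hops(seed)."""
    return seed, hash_n(seed, max_hops)

def next_hop(token: bytes) -> bytes:
    """Each relaying node hashes the token once."""
    return h(token)

def verify(token: bytes, claimed_hops: int, anchor: bytes, max_hops: int) -> bool:
    """A token is valid for `claimed_hops` if hashing it the remaining
    number of times yields the anchor."""
    if not 0 <= claimed_hops <= max_hops:
        return False
    return hash_n(token, max_hops - claimed_hops) == anchor

token, anchor = make_counter(b"secret chosen by s", 5)
for _ in range(3):
    token = next_hop(token)
print(verify(token, 3, anchor, 5), verify(token, 2, anchor, 5))
```

Because the hash is one-way, a node holding the hop-3 token cannot produce a token that verifies for fewer hops. The sketch does not stop a Sybil node from replaying a token it held at an earlier hop, or from forwarding the walk without hashing, so it only bounds how far the counter can be rewound.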
Q: This is a big overhead for t.
A: Secure routing is a big problem for network systems. All the possible security problems have been intensively researched [SPV: Secure Path Vector for securing BGP][SEAD: Secure Efficient Ad-hoc network Distance vector routing protocol][Efficient Security Mechanisms for Routing Protocols]. We do not need to consider these problems by ourselves.
Q: Even without attack, your algorithm has to disseminate many long random walks, which is not efficient.
What is the message cost?
A:
Q: Is the algorithm in your paper wrong?
A: The algorithm in the paper is right. But, to test this algorithm, the network should be large. For example, in pl100rn100g1, the honest edge current is almost equal to the attack edge current. This is because the network is too small. For each ARW1 arw, each honest edge he, and each attack edge ae, arw will pass both ae and he and increase the betweennesses of ae and he w.h.p.
Q: How can the betweenness of attack edges be increased?
A:
A large number of first type ARWs end in the Sybil region.
Q: So, what should you do?
A: For each ARW arw, increase the probability that arw enters the Sybil region and decrease the probability that arw returns to the honest region.
SRNC
Q: What are the advantages and disadvantages of SRNC and RSSR?
A:
SRNC
Advantage: Efficient and effective. Suppose that the number of Sybil nodes is comparable to the number of honest nodes. Then, the message cost of SRNC is O(n log(n)). The memory cost of SRNC is O(n).
Disadvantage:
o It is hard to compute shortest paths in real world systems. Knowing the shortest paths is equivalent to knowing the route information. Intensive research has been done to ensure secure routing. Secure routing is possible, but expensive.
o SRNC is a synchronous algorithm, and synchronization is a complicated problem for distributed systems.
o Count-to-infinity problem: http://bit.ly/vA7zkK. Can be solved: Destination-Sequenced Distance Vector protocol (DSDV), http://bit.ly/sXOsU8
o Churn? Churns of DV: http://bit.ly/sXOsU8
o Is computing the accurate shortest path information of a large network hard?
o Convergence speed? Not suited for large and complex networks. Link State protocols should be used instead.
RSSR
Advantage: Random walk is scalable, strong.
Disadvantage: Less efficient. The message cost of RSSR is up to O(n^2 log2(n)). In RSSR, each pair of nodes has to disseminate absorbing random walks to each other. Let hn and sn be an honest node and a Sybil node, respectively. Let rw be the (hn, sn) absorbing random walk. Once rw enters the Sybil region, Sybil nodes can prevent rw from reaching sn forever. To prevent this problem, hn rejects sn once rw has moved n log2(n) hops, because n log2(n) hops are sufficient for an absorbing random walk to reach any node. The number of Sybil nodes is O(n); therefore, the message cost of RSSR is O(n^2 log2(n)). The memory cost per node is O(n^2).
Non-deterministic (random walk based algorithm)
Partial information can be maintained more easily and is sufficient to finish the same job in many applications.
o Can deal with network change
o Can deal with security problems
o Can finish the job
Random Walk Based Node Sampling in Self-Organizing Networks:
Random walk is particularly attractive to self-organizing networks like Internet overlay networks and wireless ad hoc networks. In these systems, nodes can join and leave dynamically without centralized control, and the network topology itself can also change over time. Random walk requires little index or state maintenance and it can function on almost all connected network topologies. In these aspects, it is superior to systems with sophisticated index states or rigid network structures, e.g., distributed hash tables (DHTs) [25, 26, 28]. Compared with index-free node traversal schemes like network flooding, random walk is inherently scalable in that its network communication overhead does not increase as the network size grows.
Random Walks in Distributed Computing: A Survey:
Random walks are interesting by providing a scalable mechanism to insert information into the distributed computation, for example when node insertion occurs in the distributed system or to update topology modification (edge or node deletion). Ad-hoc networks or pervasive distributed systems, because of their very limited communication bandwidth for network control, can also benefit of this approach. Because of their inherent complexity, deterministic solutions to control large distributed systems are often unsatisfactory. One solution is to design randomized algorithms which can be simpler, especially for their correctness proof.
The system may be large and the network structure may change a lot, so it should be expensive to keep correct deterministic information.
Deterministic scheme (Shortest path)
It is complicated to get deterministic information in distributed systems due to attacks and network change.
o Security problem
o Network change
Q: It looks like SybilDetector works quite well. Why should I use RSC?
A: RSC is more dependable than SybilDetector. Specifically, SybilDetector needs shortest path information of the system, which is hard to obtain in distributed systems due to attacks. The existing shortest path computing algorithms are…. The security problems are …
In contrast, RSC needs only partial information, which is easier to obtain. RSC uses random walk betweenness (RWB). RWB is a statistical metric and is thus easier to compute. Usually, systems needing non-deterministic information are easier to construct than systems needing complete information. For example: structured and unstructured P2P systems.
In conclusion, RSC is more dependable.
Q: How to compute shortest path information in distributed systems?
A: In distributed systems, the most used algorithm for computing shortest paths is the distance vector (DV) protocol [RIPv1, RIPv2, IGRP]. In DV, each node maintains a distance vector which contains the shortest path information to the other nodes. The nodes disseminate their DVs to their neighbors periodically and update their own DVs using the received DVs. Gradually, the DV of each node converges to a state that contains the shortest path information to all the other nodes in the system.
In real world systems, DV faces many attacks. During the update of DV, routing updates can be fabricated, modified, replayed, deleted, and snooped. Therefore, intensive research has been done to resist these attacks [Securing Distance-Vector Routing Protocols][SPV: Secure Path Vector for securing BGP][SEAD: Secure Efficient Ad-hoc network Distance vector routing protocol][Efficient Security Mechanisms for Routing Protocols]. Therefore, it is possible to compute the shortest path information, although it may be expensive.
This research assumes that there are efficient and effective shortest path computing algorithms, as the SSR algorithm Gatekeeper does [Gatekeeper].
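The DV update described above can be sketched as a synchronous, attack-free simulation. This is a minimal illustration and not the protocol of any specific routing standard; the function name distance_vector and the example costs are hypothetical.

```python
import math

def distance_vector(adj):
    """adj: {node: {neighbor: link_cost}} for an undirected network.
    Returns {node: {destination: shortest distance}} after convergence."""
    nodes = list(adj)
    dv = {v: {u: (0 if u == v else math.inf) for u in nodes} for v in nodes}
    for _ in range(len(nodes) - 1):  # enough rounds without failures
        # Every node sends a snapshot of its vector to its neighbors.
        received = {v: {u: dict(dv[u]) for u in adj[v]} for v in nodes}
        changed = False
        for v in nodes:
            for u, vector in received[v].items():
                for dest, dist in vector.items():
                    candidate = adj[v][u] + dist
                    if candidate < dv[v][dest]:
                        dv[v][dest] = candidate
                        changed = True
        if not changed:
            break
    return dv

net = {"a": {"b": 1, "c": 5}, "b": {"a": 1, "c": 1}, "c": {"a": 5, "b": 1}}
print(distance_vector(net)["a"])
```

The sketch shows why the scheme is fragile under attack: each node trusts the vectors it receives, so a single fabricated entry propagates to every node that relaxes against it.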
Q: Why is random walk betweenness good?
A: Random walk based algorithms have many advantages. These algorithms do not need deterministic information and are therefore easier to maintain [Random Walks in Distributed Computing: A Survey][Random Walk Based Node Sampling in Self-Organizing Networks].
Q: So, how to write the introduction of the dissertation?
A: The objective of this research is to design security mechanisms to resist the \emph{false result attack} and the \emph{Sybil attack} in distributed systems. In distributed computing systems, the false result attack, where malicious nodes deliberately send incorrect data to other nodes to disrupt the system, is a key security threat. This research proposes an algorithm named MSC that can effectively resist the false result attack. However, MSC is vulnerable when the malicious users can collude. Therefore, this research turns to study the Sybil attack, the most intractable kind of colluding attack in distributed systems. Accordingly, this research proposes SybilDetector, which can effectively resist the Sybil attack. However, SybilDetector has a scalability problem because it is a synchronous algorithm. Hence, this research tries to design Sybil resisting algorithms that are both effective and scalable. Specifically, this research proposes RSC, a mechanism that can improve the performance of existing Sybil resisting algorithms. Many existing Sybil resisting algorithms are asynchronous and thus scalable. By implementing RSC on these algorithms, scalable and effective Sybil resisting algorithms are obtained.
Q: Can you remove the global parameter of SRNC?
A: Assumption (1): the weights of honest edges are stable.
Idea: v samples a set of edges using random walks of length log(n). Call the set of ending edges S. Then, v uses the average of the weights of the sampled edges as its threshold. Under Assumption (1), the random walks stay within the honest region and end on a random honest edge. In this way, the threshold computed by v should be equal to the average weight of honest edges. Hence, edges of high weights would be detected.
(* The global parameter problem can be solved by adopting the water fall random walk technique to SRNC. However, this problem should not be completely solved. Otherwise, SRNC completely outperforms RSSR. Hence, we should simply give a non-complete solution to this problem.)
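The sampling idea above can be sketched as follows. The notes do not fix the logarithm base, the number of samples, or the data layout, so the natural logarithm, the samples parameter, and the frozenset-keyed weight table are all assumptions, and the function names are hypothetical.

```python
import math
import random

def sample_threshold(adj, weight, v, n, samples, rng):
    """Average weight of the edges on which random walks of length about
    log(n), started at v, take their last step."""
    length = max(1, math.ceil(math.log(n)))
    ending_edges = []
    for _ in range(samples):
        prev, cur = v, v
        for _ in range(length):
            prev, cur = cur, rng.choice(adj[cur])
        ending_edges.append(frozenset((prev, cur)))
    return sum(weight[e] for e in ending_edges) / len(ending_edges)

def heavy_incident_edges(adj, weight, v, threshold):
    """Incident edges of v whose weight exceeds the locally computed threshold."""
    return [u for u in adj[v] if weight[frozenset((v, u))] > threshold]

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
w = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0, frozenset((0, 2)): 10.0}
thr = sample_threshold(triangle, w, 0, n=3, samples=200, rng=random.Random(1))
print(thr, heavy_incident_edges(triangle, w, 0, thr))
```

The threshold is only as good as the sample: if many walks end on heavy edges, the average rises and fewer edges are flagged.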
Q: This may not work. What we need is to enable the front honest nodes (fhns) to detect the attack edges among their incident edges. However, the sampling random walks of fhns will escape the honest region w.h.p. Accordingly, fhns have high thresholds and cannot detect the attack edges.
A: v can use water fall random walks to reduce the escape probability of its sampling random walks.
Q: Water fall random walk does not work either. For a front honest node fhn, the difference between the weights of the incident edges of fhn may be small, especially when the degree of fhn is small.
A: So, maybe we should use IBT to detect the Sybil nodes.
Identification Algorithm (IA1): For node v, v
1. Computes sp(u).
2. Computes the bottleneck bn(sp(u)) of sp(u).
3. Computes bb:
a. v samples a set of honest nodes S, called the bottleneck bound samples.
b. bb=average({bn((v,ui)-SP) for all ui in S})
4. Rejects u if and only if bbFactor * bb < bn(sp(u))
v.sp1=<s1,…, v>
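The decision rule of IA1 can be sketched as follows. The notes do not define the bottleneck of a path; this sketch assumes it is the largest edge weight along the path, which matches the idea that attack edges carry high weights. Paths are given as node lists, and all function names are hypothetical.

```python
def bottleneck(path, weight):
    """Largest edge weight along a path (assumed definition of bn)."""
    return max(weight[frozenset(edge)] for edge in zip(path, path[1:]))

def bottleneck_bound(sample_paths, weight):
    """bb: average bottleneck over the shortest paths to the sampled nodes."""
    values = [bottleneck(p, weight) for p in sample_paths]
    return sum(values) / len(values)

def ia1_rejects(path_to_u, sample_paths, weight, bb_factor):
    """Step 4: reject u if and only if bbFactor * bb < bn(sp(u))."""
    bb = bottleneck_bound(sample_paths, weight)
    return bb_factor * bb < bottleneck(path_to_u, weight)

weight = {
    frozenset(("v", "a")): 1,
    frozenset(("a", "b")): 5,
    frozenset(("v", "x")): 100,
}
samples = [["v", "a"], ["v", "a", "b"]]  # bottlenecks 1 and 5, so bb = 3
print(ia1_rejects(["v", "x"], samples, weight, bb_factor=2))       # bound 6 < 100
print(ia1_rejects(["v", "a", "b"], samples, weight, bb_factor=2))  # bound 6 >= 5
```

The example also shows the sensitivity to bbFactor: with bbFactor=1 the bound drops to 3 and the path to b would be rejected as well.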
Figure 2
Figure 3
Q: Is the bottleneck bound of honest nodes stable?
A: Yes, as Figure 3 shows.
Q: Are the bottlenecks of the routes of Sybil nodes much higher than the bottleneck bound of honest nodes?
A: Yes, as Figure 3 shows.
Q: Good. Specifically, how do you do the evaluation?
A: The following evaluations should be implemented.
1. Does Assumption (1) hold? (“Yes” expected)
2. Will the weights of attack edges be much larger than those of honest edges? (“Yes” expected)
3. Will the sampling random walks end on a random honest edge w.h.p.? (“Yes” expected)
4. Evaluate the performance of SRNC on different networks. (Do not compare SRNC with SybilLimit. It is impossible to win!)
[Figure: pl200rn100ae0.02, bottleneck bound; x-axis: node index; annotations: bottleneck bounds of front nodes, bottleneck bounds of non-front nodes]
Q: Does Assumption (1) hold? (“Yes” expected)
A: Yes. Figure 4 shows that:
1. The edge betweenness of an edge e=(v1, v2) is positively correlated with the degree of v1.
2. Although they fluctuate, the weights (load/degree) are almost constant among the edges.
Figure 4
Q: Will the weights of attack edges be much larger than those of honest edges? (“Yes” expected)
A: Yes, as Figure 5 shows.
[Figure: pl500, relation between edge load and node degree; x-axis: edge index; series: load/degree1, Linear (load/degree1)]
Figure 5
Q: Will the sampling random walks end on a random honest edge w.h.p.? (“Yes” expected)
A: Yes, as indicated in Figure 3.
Q: What is the theoretical performance?
A: A node u is accepted by v => bbFactor * v.bb >= v.bn(v.sp(u)) => need to know the bottleneck bound and the bottleneck => need to know the distribution of weight => need to know the distribution of edge load => the distribution of load is not clear yet => leave the theoretical analysis of SRNC as future work.
Q: bbFactor is a global parameter again. Can you remove it?
A: IA2: For node v, v
1. Clusters the paths into two sets according to the bottlenecks of the paths. For each set S, define the bottleneck of S as the average of the bottlenecks of the paths in S, and let Sh be the set that has the smaller bottleneck.
2. Accepts the heads of the paths in Sh.
[Figure: pl200rn100ae0.1, srnc, load/degree1; series: load/degree1, ae]
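IA2 can be sketched as a one-dimensional two-way clustering. The notes do not say which clustering method is used; this sketch assumes the split of the sorted bottlenecks that minimizes the within-cluster sum of squared errors, and the function name ia2_accept is hypothetical.

```python
def ia2_accept(bottlenecks):
    """bottlenecks: {head node: bottleneck of its path}.
    Returns Sh, the heads in the cluster with the smaller bottlenecks."""
    items = sorted(bottlenecks.items(), key=lambda kv: kv[1])
    if len(items) < 2:
        return {u for u, _ in items}

    def sse(values):
        mean = sum(values) / len(values)
        return sum((x - mean) ** 2 for x in values)

    values = [b for _, b in items]
    # Try every split point of the sorted list and keep the tightest one.
    best = min(range(1, len(values)),
               key=lambda k: sse(values[:k]) + sse(values[k:]))
    return {u for u, _ in items[:best]}

print(ia2_accept({"u0": 1, "u1": 5, "u2": 100}))
```

No global parameter is needed, but in this sketch a two-way split always places at least one path in the rejected set, even when every node is honest.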
Q: Which of these two algorithms (IA1 and IA2) has the lower error rate?
A: I think that IA2 should be better.
Suppose the system has 4 nodes, {v, u(0), u(1), u(2)}. Nodes u(0) and u(1) are honest, and u(2) is Sybil. Let X[i] be the bottleneck of the (v,u(i))-SP. Suppose that X=[1,5,100].
IA1: Suppose that the bottleneck bound samples of v are {u(0),u(1)}, which is perfect. Accordingly, v.bb=(1+5)/2=3. Suppose that bbFactor=1. Then, v cannot accept all the honest nodes: u(1) is rejected because bbFactor * v.bb = 3 < 5.
IA2: IA2 can cluster {u(0), u(1), u(2)} into Sh={u(0), u(1)} and Ss={u(2)}. Then, v accepts the nodes of Sh.
However, as Figure 6 and Figure 7 show, IA1 is better.
Figure 6
Figure 7
Q: So, do you have another method to remove the global parameter?
A: IA3 (local benchmark technique)
Having obtained the bottleneck bound samples, v increases bbf until more than \beta (e.g., 90%) of the bb samples are accepted.
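The local benchmark technique can be sketched as a simple search over bbf. The start value, the step size, and the iteration cap are assumptions that the notes do not specify, and the function name ia3_choose_bbf is hypothetical. A sample is treated as accepted when it is not rejected by the IA1 rule.

```python
def ia3_choose_bbf(sample_bottlenecks, beta=0.9, start=1.0, step=0.1,
                   max_steps=10000):
    """Increase bbf until more than a fraction beta of the bottleneck
    bound samples satisfy bn <= bbf * bb."""
    n = len(sample_bottlenecks)
    bb = sum(sample_bottlenecks) / n
    for i in range(max_steps + 1):
        bbf = start + i * step
        accepted = sum(1 for bn in sample_bottlenecks if bn <= bbf * bb)
        if accepted / n > beta:
            return bbf
    raise ValueError("no bbf within the search range reaches beta")

# Samples with bottlenecks 1 and 5 give bb = 3; both are accepted once
# bbf * 3 >= 5.
print(ia3_choose_bbf([1, 5], beta=0.9, step=0.5))
```

Each node tunes bbf against its own samples, so the parameter becomes local, at the price of inheriting any Sybil nodes that slipped into the sample set.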
Edge load/edge betweenness
[Community structure in social and biological networks]: the definition of edge betweenness.
[Mapping Anatomical Connectivity Patterns of Human Cerebral Cortex Using In Vivo Diffusion Tensor Imaging Tractography]: In the present investigation, we demonstrated that both the node- and edge-betweenness centrality of the human cortical network followed exponentially truncated power-law distribution (it is not Power Law!). Exponentially truncated power-law is more resistant to attack.
[Figure: real1222snn500bbf5; x-axis: g; series: sna(Sl), sna(Srnc)]
[Figure: real1222snn500ia2; x-axis: g; series: sna(Sl), sna(Srnc)]
[The Betweenness Centrality Of Biological Networks]: the distribution of the edge betweenness centrality has a Poisson-like distribution with a very sharp spike.
[On the Distribution of Edge Load in Scale-free Trees]: edge load distribution for trees. I do not know what the conclusion of this paper is.
[Universal Behavior of Load Distribution in Scale-Free Networks]: the node load of scale free networks obeys a power law.
[Application of Cray XMT for Power Grid Contingency Selection]: the edge betweenness of a power grid obeys a power law.
Q: What are the performance metrics of SRNC?
A:
HAR
SAR
SNA
Q: What to evaluate?
A:
1. The distribution of betweenness.
2. The influence of the number of attack edges in the system.
3. The influence of the number of Sybil nodes.
Q: What is the kind of betweenness distribution that you are expecting?
A:
The deviation of the betweenness of the honest edges is small.
Honest node v accepts more honest nodes and fewer Sybil nodes.
For nodes us and uh, bn((v,us)-SPs) > bbf * v.bb > bn((v,uh)-SPs).
bbf has to be reduced in order to reduce SAR.
The deviation of the bottlenecks of honest shortest paths is small.
The deviation of the betweenness of the honest edges is small.
Q: What is the betweenness distribution really like?
A: It seems that it obeys a power law.
[Figure: real1222, edge betweenness; x-axis: edge; series: betweenness]
[Figure: real1222, g=36, HAR; x-axis: snn; series: har(Sl), har(Srnc)]
[Figure: real1222, g=36, SNA; x-axis: snn; series: sna(Sl), sna(Srnc)]
1. HAR of SRNC is slightly less than the expectation $\beta$. The reason is that in SRNC, the accept rates of different nodes are different.
1. As SNN increases, SNA of SL increases and SNA of SRNC decreases.
2. SNA of SRNC is at least three times of SNA of SL.
[Figure: pl1222, g=36, HAR; x-axis: snn; series: har(Sl), har(Srnc)]
[Figure: pl1222, g=36, SNA; x-axis: snn; series: sna(Sl), sna(Srnc)]
[Figure: real1222rn500, HAR; x-axis: g; series: har(Sl), har(Srnc)]
[Figure: real1222rn500, SNA; x-axis: g; series: sna(Sl), sna(Srnc)]
1. HAR of SRNC is slightly less than the expectation $\beta$.
1. SRNC and SybilLimit accept more Sybil nodes as g increases.
2. In a situation where SL accepts all Sybil nodes, SRNC accepts three to ten times fewer Sybil nodes.
Conclusion: SRNC makes a notable performance improvement over SybilLimit.
1. Each honest node in SRNC accepts more than 95% of the honest nodes.
2. Each honest node in SRNC accepts 400~700 percent fewer Sybil nodes than in SybilLimit.
[Figure: pl1222rn500, HAR; x-axis: g; series: har(Sl), har(Srnc)]
[Figure: pl1222rn500, SNA; x-axis: g; series: sna(Sl), sna(Srnc)]