# Lemma: each first-type random walk increases the betweenness of at least one attack edge.

Oct 28, 2013

Q: What is the problem?

A: The RSSR paper has an error: Algorithm 0 does not seem to work.

Q: But the evaluation results look good.

A: These results are generated by another algorithm (Algorithm 1).

Q: Why does Algorithm 0 not work?

Is it really an error?

A: In simulation, the betweenness of honest edges is close to the betweenness of attack edges.

Q: What is the possible reason?

A: The problem is that in the simulation, the issuer rate is too small.

Here, the issuer rate (C) is the number of ARWs disseminated by each pair of nodes.

Let he and ae be an honest edge and an attack edge, respectively. Suppose that the network is even. If C is large, the in-betweenness and out-betweenness of he cancel each other out, which makes the betweenness of he low. However, if C is small (C=1 in the simulation), the in-betweenness and out-betweenness of he do not cancel each other out, which makes the betweenness of he high as well.
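The cancellation argument can be illustrated with a toy model, which is an assumption and not the simulation behind Figure 1: each of C walks crosses the honest edge he once, in a uniformly random direction, and the betweenness of he is taken as the net crossing count |in - out| divided by C. `normalized_net_flow` and `mean_flow` are hypothetical names.

```python
import random
import statistics


def normalized_net_flow(c: int, rng: random.Random) -> float:
    """Toy model: c walks each cross an honest edge once, in a random direction.

    Returns |in - out| / c, the net betweenness per walk under this model.
    """
    net = sum(rng.choice((1, -1)) for _ in range(c))
    return abs(net) / c


def mean_flow(c: int, trials: int = 2000, seed: int = 0) -> float:
    """Average normalized net flow over many independent batches of c walks."""
    rng = random.Random(seed)
    return statistics.mean(normalized_net_flow(c, rng) for _ in range(trials))


if __name__ == "__main__":
    for c in (1, 10, 100, 1000):
        print(c, round(mean_flow(c, trials=500), 3))
```

With C=1 the normalized net flow is always 1, the same value an attack edge gets when every walk crosses it into the Sybil region, while for large C it shrinks roughly like 1/sqrt(C).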

Figure 1 shows the result of Algorithm 0 when one pair of nodes disseminates 5000 first-type ARWs. Therefore, the conclusion is that C should be large, but the number of node pairs that disseminate ARWs does not need to be large.

Lemma: Each first-type random walk increases the betweenness of at least one attack edge.

To forge the destination set of each honest node.

To reduce the betweenness of attack edges, Sybil nodes can launch two kinds of attacks.

Attack 1: feedback

Solution: set the length of each ARW to m. For an (s,t)-ARW arw, if arw does not reach t within m steps, s rejects t.
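A minimal sketch of the length-capped check, assuming the graph is an adjacency dict and the ARW is a plain (unsteered) random walk absorbed at t; `arw_reaches` and `s_accepts_t` are hypothetical names.

```python
import random
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def arw_reaches(graph: Graph, s: Hashable, t: Hashable, m: int, rng: random.Random) -> bool:
    """Run one (s,t)-ARW: a simple random walk from s that is absorbed at t.

    Returns True if the walk hits t within m steps, False if it is cut off.
    """
    node = s
    for _ in range(m):
        node = rng.choice(graph[node])
        if node == t:
            return True
    return False


def s_accepts_t(graph: Graph, s: Hashable, t: Hashable, m: int, seed: int = 0) -> bool:
    """s rejects t when the (s,t)-ARW does not reach t within m steps."""
    return arw_reaches(graph, s, t, m, random.Random(seed))


if __name__ == "__main__":
    line = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(s_accepts_t(line, "a", "c", m=1))  # False: c is two hops away
    print(s_accepts_t(line, "a", "c", m=100))
```

If the Sybil region holds the walk and never delivers it, the loop simply runs out of steps and s rejects t, which is the outcome the argument below relies on.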

First, m steps are sufficient for t to be reached.

When t is honest, the Sybil nodes should not launch Attack 1. Suppose that t is honest. If Sybil nodes prevent t from being reached, the betweenness of attack edges will increase: s can issue a large number of (s,t)-ARWs to increase the betweenness of attack edges. Therefore, Sybil nodes should not launch Attack 1.

When t is Sybil, Sybil nodes should not launch Attack 1 either. Suppose that t is Sybil. If Sybil nodes prevent t from being reached, t will be rejected. Since t is chosen from the destination set of s, t needs to communicate with s. Therefore, Sybil nodes should not launch Attack 1.

Attack 2: forge

Solution: Steered random walk.

Figure 1: algorithm 0, c500, real1222rn500 (x-axis: g; series: har(sohl), sar(sohl))

Q: Why does Algorithm 1 work?

A: Because it works mathematically.

Q: Algorithm 0 and Algorithm 1: which is better?

A: Algorithm 1 seems to be more efficient, but it has a security problem.

For v to compute the betweenness of e=(v,u), v needs u's potential: the betweenness of e for v is something like |v.potential - u.potential|, where v.potential is the expected number of times that an (i,j)-ARW passes v. Therefore, the problem is that u may be Sybil and will not give v the correct potential.
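To make the quantity concrete, the sketch below computes it centrally, which a node could not do in the distributed setting. It assumes Newman's random-walk betweenness, where the potential of a node is its expected number of visits divided by its degree (the note's v.potential omits the division), the target t is absorbing, and the graph is connected and undirected; `walk_potentials` and `edge_current` are hypothetical names. The point is that `edge_current(pot, v, u)` needs u's entry, which is exactly what a Sybil u could misreport.

```python
from fractions import Fraction
from typing import Dict, Hashable, List

Adj = Dict[Hashable, List[Hashable]]


def walk_potentials(adj: Adj, s: Hashable, t: Hashable) -> Dict[Hashable, Fraction]:
    """Potentials of an (s,t) random walk absorbed at t, computed exactly.

    V[j] (expected visits to j before absorption) solves
        V[j] = [j == s] + sum over neighbours i != t of V[i] / deg(i),
    and the potential of j is V[j] / deg(j); t itself gets potential 0.
    """
    nodes = [v for v in adj if v != t]
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix of the linear system (I - Q^T) V = e_s.
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for j in nodes:
        r = idx[j]
        a[r][r] += 1
        a[r][n] = Fraction(int(j == s))
        for i in adj[j]:
            if i != t:
                a[r][idx[i]] -= Fraction(1, len(adj[i]))
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    pot = {v: a[idx[v]][n] / len(adj[v]) for v in nodes}
    pot[t] = Fraction(0)
    return pot


def edge_current(pot: Dict[Hashable, Fraction], v: Hashable, u: Hashable) -> Fraction:
    """Expected net number of crossings of edge (v, u) by the walk."""
    return abs(pot[v] - pot[u])
```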

Q: OK, you can just give up Algorithm 1. What problems are left in Algorithm 0?

A: Let arw be an ARW1. The Sybil nodes may never relay arw to its destination.

If you can prove that Algorithm 1 works, you can write in your dissertation that Algorithm 1 makes some improvement over Algorithm 0.

RSC

Q: Let rw be a first-type (v,u)-SRW, where v and u are honest and Sybil, respectively.

A: This rw will increase the betweenness of an attack edge by one.

Q: Let ae=(fhn,fsn) be an attack edge that connects honest node fhn and Sybil node fsn. Suppose fsn has received rw from ae. fsn can simply send rw back to fhn to reduce the betweenness of ae.

A: rw will eventually reach u. Therefore, rw will eventually pass some attack edge from the honest region to the Sybil region and thus increase the betweenness of this attack edge by one.

Q: Then rw has to hop a long distance, which wastes the resources of honest nodes.

A: Cut rw once rw has hopped N steps.

Q: You mean that rw has a writable counter initialized to N? Then Sybil nodes can reset the counter to be very large.

A: The counter is a hash chain. Initially, the counter is rw.counter=Hash(rw.s). Let v be the current node of rw, …
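The initialization rw.counter=Hash(rw.s) can be recomputed by anyone who knows s, so the sketch below assumes instead that s draws a secret random seed. It follows the hash-chain hop-count idea from SEAD ([Efficient Security Mechanisms for Routing Protocols]); `HopCounter`, `forward` and `verify` are hypothetical names, and the sketch assumes the anchor H^N(seed) reaches every relay authentically (for example, signed once by s), which is where asymmetric cryptography would still be needed.

```python
import hashlib
import os
from typing import Optional, Tuple


def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()


def h_iter(x: bytes, k: int) -> bytes:
    """Apply the hash k times."""
    for _ in range(k):
        x = h(x)
    return x


class HopCounter:
    """Hypothetical hash-chain hop counter created by the source s of a walk."""

    def __init__(self, n_max: int, seed: Optional[bytes] = None):
        self.n_max = n_max
        self._seed = seed if seed is not None else os.urandom(32)  # secret to s
        # The anchor H^N(seed) must reach relays authentically (e.g. signed by s).
        self.anchor = h_iter(self._seed, n_max)

    def start(self) -> Tuple[int, bytes]:
        """(hops taken so far, token) attached to the walk when s sends it."""
        return 0, self._seed


def forward(hops: int, token: bytes) -> Tuple[int, bytes]:
    """What a relay does before passing the walk on: one more hop, one more hash."""
    return hops + 1, h(token)


def verify(anchor: bytes, n_max: int, hops: int, token: bytes) -> bool:
    """Accept only if hops <= n_max and the token hashes forward to the anchor.

    Claiming fewer hops than were really taken would require a hash preimage.
    """
    return 0 <= hops <= n_max and h_iter(token, n_max - hops) == anchor
```

A relay that receives a walk whose hop count already equals n_max cuts the walk instead of calling `forward`.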

It seems that we need asymmetric encryption.

s encrypts the counter. In each hop, where rw is currently at node x, x asks s to decrypt the counter, decrease it, and then re-encrypt it.

Q: This is a big overhead for s.

A: Secure routing is a big problem for network systems. These security problems have been intensively researched [SPV: Secure Path Vector for securing BGP][-hoc network Distance vector routing protocol][Efficient Security Mechanisms for Routing Protocols]. We do not need to solve these problems ourselves.

Q: Even without attacks, your algorithm has to disseminate many long random walks, which is not efficient.

What is the message cost?

A:

Q: Is the algorithm in your paper wrong?

A: The algorithm in the paper is right. But to test this algorithm, the network should be large. For example, in pl100rn100g1, the honest edge current is almost equal to the attack edge current. This is because the network is too small: for each ARW1 arw, each honest edge he, and each attack edge ae, arw will pass both ae and he w.h.p. and increase the betweenness of both.

Q: How can the betweenness of attack edges be increased?

A: A large number of first-type ARWs end in the Sybil region.

Q: So, what should you do?

A: For each ARW arw, increase the probability that arw enters the Sybil region and decrease the probability that arw returns to the honest region.

SRNC

A: SRNC is efficient and effective:

Suppose that the number of Sybil nodes is comparable to the number of honest nodes. Then, the message cost of SRNC is O(n log(n)).

The memory cost of SRNC is O(n).

It is hard to compute shortest paths in a real-world system. Knowing the shortest paths is equivalent to knowing the route information. Intensive research has been done to ensure secure routing. Secure routing is possible, but expensive.

SRNC is a synchronous algorithm, and synchronization is a complicated problem for distributed systems.

Count-to-infinity problem http://bit.ly/vA7zkK

Can be solved by the Destination-Sequenced Distance Vector protocol (DSDV) http://bit.ly/sXOsU8

Churn?

Churn of DV http://bit.ly/sXOsU8

Is computing accurate shortest-path information for a large network hard?

Convergence speed?

Not suited for large and complex networks

Random walk is scalable and strong.

Less efficient.

is up to O(n^2 log^2(n)).

In RSSR, each pair of nodes has to disseminate absorbing random walks to each other. Let hn and sn be an honest node and a Sybil node, respectively. Let rw be the (hn, sn) absorbing random walk. Once rw enters the Sybil region, Sybil nodes can prevent rw from reaching sn forever. To prevent this problem, hn rejects sn once rw has moved n log^2(n) hops, because n log^2(n) hops are sufficient for an absorbing random walk to reach any node. The number of Sybil nodes is O(n); therefore, the message cost of RSSR is O(n^2 log^2(n)).

The memory cost per node is O(n^2).

Non-deterministic (random-walk-based algorithm)

Partial information can be maintained more easily and is sufficient to finish the same job in many applications.

o Can deal with network change

o Can deal with security problems

o Can finish the job

Random Walk Based Node Sampling in Self-Organizing Networks: Random walk is particularly attractive to self-organizing networks like Internet overlay networks and wireless ad hoc networks. In these systems, nodes can join and leave dynamically without centralized control, and the network topology itself can also change over time. Random walk requires little index or state maintenance and it can function on almost all connected network topologies. In these aspects, it is superior to systems with sophisticated index states or rigid network structures, e.g., distributed hash tables (DHTs) [25, 26, 28]. Compared with index-free node traversal schemes like network flooding, random walk is inherently scalable in that its network communication overhead does not increase as the network size grows.

Random Walks in Distributed Computing: A Survey: Random walks are interesting by providing a scalable mechanism to insert information into the distributed computation, for example when node insertion occurs in the distributed system or to update topology modification (edge or node deletion). Ad-hoc networks or pervasive distributed systems, because of their very limited communication bandwidth for network control, can also benefit of this approach. Because of their inherent complexity, deterministic solutions to control large distributed systems are often unsatisfactory.

One solution is to design randomized algorithms which can be simpler, especially for their correctness proof.

The system may be large and the network structure may change a lot, so it would be expensive to keep correct deterministic information.

Deterministic scheme (shortest path)

It is complicated to get deterministic information in distributed systems due to attacks and network change.

o Security problem

o Network change

Q: It looks like SybilDetector works quite well. Why should I use RSC?

A: RSC is more dependable than SybilDetector.

Specifically, SybilDetector needs the shortest-path information of the system, which is hard to obtain in distributed systems due to attacks.

The existing shortest path computing algorithms are…. The security problems are …

In contrast, RSC needs only partial information, which is easier to obtain.

RSC uses random walk betweenness (RWB). RWB is a statistical metric and is easier to compute. Usually, systems needing non-deterministic information are easier to construct than systems needing complete information. For example: structured and unstructured P2P systems.

In conclusion, RSC is more dependable.

Q: How do you compute shortest-path information in distributed systems?

A: In distributed systems, the most widely used algorithm for computing shortest paths is the distance vector protocol [RIPv1, RIPv2, IGRP]. In DV, each node maintains a distance vector that contains the shortest-path information to the other nodes. The nodes periodically send their DVs to their neighbors and update their own DVs using the received DVs.

Gradually, the DV of each node converges to a state that contains the shortest-path information to all the other nodes in the system.
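A centralized simulation of synchronous DV rounds may help: every node recomputes each distance as the minimum, over its neighbors, of the link cost plus the neighbor's last advertised distance (the Bellman-Ford relaxation). It ignores timers, split horizon and all of the security issues discussed below; `distance_vector` and the graph format are hypothetical.

```python
import math
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: link cost}


def distance_vector(graph: Graph, max_rounds: int = 100) -> Dict[Hashable, Dict[Hashable, float]]:
    """Synchronous distance-vector rounds, iterated until no vector changes."""
    nodes = list(graph)
    dv = {v: {d: (0.0 if d == v else math.inf) for d in nodes} for v in nodes}
    for _ in range(max_rounds):
        # Every node reads the vectors its neighbors advertised in the last round.
        advertised = {v: dict(vec) for v, vec in dv.items()}
        changed = False
        for v in nodes:
            for dest in nodes:
                if dest == v:
                    continue
                best = min(
                    (cost + advertised[nbr][dest] for nbr, cost in graph[v].items()),
                    default=math.inf,
                )
                if best != dv[v][dest]:
                    dv[v][dest] = best
                    changed = True
        if not changed:
            break
    return dv


if __name__ == "__main__":
    g = {"a": {"b": 1, "c": 5}, "b": {"a": 1, "c": 2}, "c": {"a": 5, "b": 2}}
    print(distance_vector(g)["a"]["c"])  # 3.0
```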

In real-world systems, DV faces many attacks. During DV updates, routing updates can be fabricated, modified, replayed, deleted, and snooped. Therefore, intensive research has been done to resist these attacks [Securing Distance-Vector Routing Protocols][SPV: Secure Path Vector for securing BGP][-hoc network Distance vector routing protocol][Efficient Security Mechanisms for Routing Protocols]. Hence, it is possible to compute the shortest-path information, although it may be expensive.

This research assumes that there is an efficient and effective shortest-path computing algorithm, as the SSR algorithm Gatekeeper does [Gatekeeper].

Q: Why is random walk betweenness good?

A: Random-walk-based algorithms have many advantages. These algorithms do not need deterministic information and are therefore easier to maintain [Random Walks in Distributed Computing: A Survey][Random Walk Based Node Sampling in Self-Organizing Networks].

Q: So, how should the introduction of the dissertation be written?

A:

The objective of this research is to design security mechanisms to resist the \emph{false result attack} and the \emph{Sybil attack} in distributed systems.

In distributed computing systems, the false result attack, where malicious nodes deliberately send incorrect data to other nodes to disrupt the system, is a key security threat. This research proposes an algorithm named MSC that can effectively resist the false result attack.

However, MSC is vulnerable when malicious users collude. Therefore, this research turns to the Sybil attack, the most intractable kind of colluding attack in distributed systems. Accordingly, this research proposes SybilDetector, which can effectively resist the Sybil attack.

However, SybilDetector has a scalability problem because it is a synchronous algorithm. Hence, this research tries to design Sybil-resisting algorithms that are both effective and scalable. Specifically, this research proposes RSC, a mechanism that can improve the performance of existing Sybil-resisting algorithms. Many existing Sybil-resisting algorithms are asynchronous and thus scalable. By implementing RSC on these algorithms, scalable and effective Sybil-resisting algorithms are obtained.

Q: Can you remove the global parameter of SRNC?

A:

Assumption (1): the weights of honest edges are stable.

Idea: v samples a set of edges using random walks of length log(n). Call the set of ending edges S. Then, v uses the average of the weights of the sampled edges as its threshold. Under Assumption (1), the random walks stay within the honest region and end on a random honest edge. In this way, the threshold computed by v should be equal to the average weight of honest edges. Hence, edges of high weight would be detected.
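A sketch of the sampling idea, assuming the log is base 2, each sampled edge is the last edge a walk traverses, and edge weights are stored per undirected edge; `sample_threshold` is a hypothetical name and the code does not model the Sybil region.

```python
import math
import random
from typing import Dict, FrozenSet, Hashable, List

Adj = Dict[Hashable, List[Hashable]]


def sample_threshold(adj: Adj, weight: Dict[FrozenSet, float], v: Hashable,
                     num_walks: int, seed: int = 0) -> float:
    """Average weight of the last edge of num_walks random walks from v.

    Each walk has length ceil(log2(n)), where n is the number of nodes.
    """
    rng = random.Random(seed)
    length = max(1, math.ceil(math.log2(len(adj))))
    total = 0.0
    for _ in range(num_walks):
        prev, node = v, v
        for _ in range(length):
            prev, node = node, rng.choice(adj[node])
        total += weight[frozenset((prev, node))]  # the ending edge of this walk
    return total / num_walks
```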

(* The global parameter problem can be solved by adopting the water fall random walk technique in SRNC.

However, this problem should not be completely solved; otherwise, SRNC completely outperforms RSSR. Hence, we should simply give an incomplete solution to this problem.)

Q: This may not work. What we need is to enable the front honest nodes (fhns) to detect the attack edges among their incident edges. However, the sampling random walks of fhns will escape the honest region w.h.p. Accordingly, fhns have high thresholds and cannot detect the attack edges.

A: v can use water fall random walks to reduce the escape probability of its sampling random walks.

Q: Water fall random walks do not work either. For a front honest node fhn, the differences between the weights of fhn's incident edges may be small, especially when the degree of fhn is small.

A: So, maybe we should use IBT to detect the Sybil nodes.

Identification Algorithm (IA1): For node v, v

1. computes sp(u);

2. computes the bottleneck of sp(u), bn(sp(u));

3. computes bb:

a. v samples a set of honest nodes S, called the bottleneck bound samples;

b. bb=average({bn((v,ui)-SP) | for all ui in S});

4. rejects u if and only if bbFactor * bb < bn(sp(u)).

v.sp1=<s1,…, v>
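A sketch of IA1, assuming sp(u) is already available as a node list starting at v and that the bottleneck of a path is its largest edge weight (the notes do not define bottleneck precisely); `bottleneck` and `ia1_accepts` are hypothetical names.

```python
from statistics import mean
from typing import Dict, FrozenSet, Hashable, List, Sequence

Path = Sequence[Hashable]


def bottleneck(path: Path, weight: Dict[FrozenSet, float]) -> float:
    """Assumed definition: the largest edge weight along the path."""
    return max(weight[frozenset(edge)] for edge in zip(path, path[1:]))


def ia1_accepts(sp: Dict[Hashable, Path], weight: Dict[FrozenSet, float],
                samples: List[Hashable], bb_factor: float) -> Dict[Hashable, bool]:
    """IA1 at node v: sp[u] is v's shortest path to u, samples are the bb samples."""
    bb = mean(bottleneck(sp[u], weight) for u in samples)  # bottleneck bound
    # Step 4: reject u iff bbFactor * bb < bn(sp(u)).
    return {u: not (bb_factor * bb < bottleneck(path, weight)) for u, path in sp.items()}
```

On a star where v reaches u0, u1 and u2 through edges of weight 1, 5 and 100, with samples u0 and u1, bb is 3: with bbFactor=1 only u0 is accepted, and with bbFactor=2 u0 and u1 are accepted while u2 is still rejected.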

Figure 2

Figure 3

Q: Are the bottleneck bounds of honest nodes stable?

A: Yes, as Figure 3 shows.

Q: Are the bottlenecks of routes of Sybil nodes much higher than the bottleneck bounds of honest nodes?

A: Yes, as Figure 3 shows.

Q: Good. Specifically, how do you do the evaluation?

A: The following evaluations should be implemented.

1. Does Assumption (1) hold? (“Yes” expected)

2. Will the weights of attack edges be much larger than those of honest edges? (“Yes” expected)

[Plot: pl200rn100ae0.02, bottleneck bound; x-axis: node index; series: bottleneck bound, bottleneck of front nodes, bottleneck bounds of non-front nodes]

3. Will the sampling random walks end on random honest edges w.h.p.? (“Yes” expected)

4. Evaluate the performance of SRNC on different networks. (Do not compare SRNC with SybilLimit. It is impossible to win!)

Q: Does Assumption (1) hold? (“Yes” expected)

A: Yes, as Figure 4 shows:

1. The edge betweenness of an edge e=(v1, v2) is positively correlated with the degree of v1.

2. Although they fluctuate, the weights (load/degree) are almost constant across the edges.

Figure 4

Q: Will the weights of attack edges be much larger than those of honest edges? (“Yes” expected)

A: Yes, as Figure 5 shows.

[Plot: pl500, relation between edge …; x-axis: edge index; series: e, Linear]

Figure 5

Q: Will the sampling random walks end on random honest edges w.h.p.? (“Yes” expected)

A: Yes, as indicated in Figure 3.

Q: What is the theoretical performance?

A: A node u is accepted by v
<=> v.bn(v.sp(u)) <= bbFactor * v.bb
=> need to know the bottleneck bound and the bottleneck
=> need to know the distribution of weight
=> need to know the distribution of edge betweenness
=> this distribution is not clear yet
=> leave the theoretical analysis of SRNC as future work.

Q: bbFactor is a global parameter again. Can you remove it?

A:

IA2: For node v, v


1. clusters the paths into two sets according to the bottlenecks of the paths. For each set S, define the bottleneck of S as the average of the bottlenecks of the paths in S, and denote by Sh the set that has the smaller bottleneck.

2. accepts the heads of the paths in Sh.
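The notes do not say how the paths are clustered. The sketch below assumes exact one-dimensional 2-means over the path bottlenecks, which can be found by trying every split point of the sorted values; `ia2_accepts` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Set


def _sse(values: List[float]) -> float:
    """Within-cluster sum of squared deviations from the cluster mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def ia2_accepts(bn: Dict[Hashable, float]) -> Set[Hashable]:
    """IA2 at node v: bn[u] is the bottleneck of v's path to u.

    Splits the paths into two clusters (exact 1-D 2-means over the sorted
    bottlenecks) and accepts the heads of the paths in the low cluster Sh.
    """
    items = sorted(bn.items(), key=lambda kv: kv[1])
    if len(items) < 2:
        return {u for u, _ in items}
    values = [x for _, x in items]
    best_k = min(range(1, len(values)),
                 key=lambda k: _sse(values[:k]) + _sse(values[k:]))
    return {u for u, _ in items[:best_k]}
```

With the bottlenecks 1, 5 and 100 used in the example below, the best split is {1, 5} | {100}. When all bottlenecks are identical every split ties and this sketch accepts only the first path, so a real implementation would need a rule for the single-cluster case.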

Q: Which of these two algorithms (IA1 and IA2) has the lower error rate?

A: I think that IA2 should be better.

Suppose the system has 4 nodes, {v, u(0), u(1), u(2)}. Nodes u(0) and u(1) are honest, and u(2) is Sybil. Let X[i] be the bottleneck of the (v,u(i))-SP. Suppose that X=[1,5,100].

IA1: Suppose that the bottleneck bound samples of v are {u(0),u(1)}, which is perfect. Accordingly, v.bb=(1+5)/2=3. Suppose that bbFactor=1. Then v cannot accept all honest nodes: it rejects the honest node u(1), because bbFactor * v.bb = 3 < 5.

IA2: IA2 can cluster {u(0), u(1), u(2)} into Sh={u(0), u(1)} and Ss={u(2)}. Then, v accepts the nodes of Sh.

However, as Figure 6 and Figure 7 show, IA1 is better.

Figure 6

Figure 7

Q: So, do you have another method to remove the global parameter?

A: IA3 (local benchmark technique)

Having obtained the bottleneck bound samples, v increases bbf until more than \$\beta\$ (e.g., 90%) of the bb samples are accepted.
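A sketch of the local benchmark, assuming bbf is raised on a fixed grid (`start` and `step` are hypothetical parameters) and that a sample with bottleneck x counts as accepted when x <= bbf * bb, mirroring IA1's rejection rule; `local_bb_factor` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def local_bb_factor(sample_bn: Sequence[float], beta: float = 0.9,
                    start: float = 1.0, step: float = 0.1) -> float:
    """IA3 sketch: raise bbf until more than beta of the bb samples are accepted.

    A sample with bottleneck x counts as accepted when x <= bbf * bb,
    where bb is the average of the sample bottlenecks (as in IA1).
    """
    if not sample_bn or not 0 <= beta < 1 or step <= 0:
        raise ValueError("need samples, 0 <= beta < 1 and step > 0")
    bb = mean(sample_bn)
    k = 0
    while True:
        bbf = start + k * step  # grid point k, computed without accumulating error
        accepted = sum(1 for x in sample_bn if x <= bbf * bb)
        if accepted > beta * len(sample_bn):
            return bbf
        k += 1
```

For the samples of the example above (bottlenecks 1 and 5, so bb=3) and \$\beta\$=0.9, this returns roughly 1.7, the first grid value at which u(1) is also accepted.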

betweenness

[Community structure in social and biological networks]: the definition of edge betweenness.
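That paper defines the edge betweenness of an edge as the number of shortest paths between pairs of vertices that run along it, split evenly when a pair has several shortest paths. A Brandes-style accumulation is a standard way to compute it; the sketch below assumes an unweighted, undirected graph, and `edge_betweenness` is a hypothetical name.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List


def edge_betweenness(adj: Dict[Hashable, List[Hashable]]) -> Dict[FrozenSet, float]:
    """Shortest-path edge betweenness of an unweighted, undirected graph.

    Each unordered pair {s, t} contributes a total of 1, split evenly over
    all shortest s-t paths, to every edge on those paths.
    """
    eb = {frozenset((v, u)): 0.0 for v in adj for u in adj[v]}
    for s in adj:
        # BFS from s: distances, shortest-path counts and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2 for e, b in eb.items()}
```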

[Mapping Anatomical Connectivity Patterns of Human Cerebral Cortex Using In Vivo Diffusion Tensor Imaging Tractography]: In the present investigation, we demonstrated that both the node- and edge-betweenness centrality of the human cortical network followed exponentially truncated power-law distribution (it is not a power law!). An exponentially truncated power law is more resistant to attack.

[Plot: real1222snn500bbf5; x-axis: g; series: sna(Sl), sna(Srnc)]

[Plot: real1222snn500ia2; x-axis: g; series: sna(Sl), sna(Srnc)]

[The Betweenness Centrality Of Biological Networks]: the distribution of the edge betweenness centrality has a Poisson-like distribution with a very sharp spike.

[On the Distribution of Edge Load in Scale-free Trees]: edge load distribution for trees.

I do not know what the conclusion of this paper is.

[Universal Behavior of Load Distribution in Scale-Free Networks]: the node load of scale-free networks obeys a power law.

[Application of Cray XMT for Power Grid Contingency Selection]: the edge betweenness of a power grid obeys a power law.

Q: What is the performance metric of SRNC?

A:

HAR

SAR

SNA

Q: What to evaluate?

A:

1. The distribution of betweenness.

2. The influence of the number of attack edges in the system.

3. The influence of the number of Sybil nodes.

Q: What kind of betweenness distribution are you expecting?

A:

The deviation of the betweenness of the honest edges is small

Honest node v accepts more honest nodes and fewer Sybil nodes

For nodes us and uh, bn((v,us)-SPs) > bbf * v.bb > bn((v,uh)-SPs)

Has to reduce bbf in order to reduce SAR

Deviation of the bottlenecks of honest shortest paths is small

The deviation of the betweenness of the honest edges is small

Q: What is the betweenness distribution really like?

A: It seems to obey a power law.
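One way to check the power-law impression numerically is the continuous maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), alpha = 1 + n / sum(ln(x_i / x_min)). The sketch assumes x_min is chosen by hand; it estimates an exponent but does not test a power law against alternatives such as the exponentially truncated power law mentioned above. `power_law_alpha` is a hypothetical name.

```python
import math
from typing import Sequence


def power_law_alpha(values: Sequence[float], x_min: float) -> float:
    """Continuous MLE of a power-law exponent for the tail x >= x_min.

    alpha = 1 + n / sum(ln(x_i / x_min)), following Clauset, Shalizi and
    Newman (2009). It fits the exponent only; it does not check whether a
    power law describes the data better than an alternative distribution.
    """
    tail = [x for x in values if x >= x_min]
    s = sum(math.log(x / x_min) for x in tail)
    if s <= 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / s
```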

[Plot: real1222, edge betweenness; log-log axes; x-axis: edge; y-axis: betweenness]

[Plot: real1222, g=36, HAR; x-axis: snn; series: har(Sl), har(Srnc)]

[Plot: real1222, g=36, SNA; x-axis: snn; series: sna(Sl), sna(Srnc)]

1. HAR of SRNC is slightly less than the expectation \$\beta\$. The reason is that in SRNC, the accept rates of different nodes are different.

1. As SNN increases, SNA of SL increases and SNA of SRNC decreases.

2. SNA of SRNC is at least three times that of SL.

[Plot: pl1222, g=36, HAR; x-axis: snn; series: har(Sl), har(Srnc)]

[Plot: pl1222, g=36, SNA; x-axis: snn; series: sna(Sl), sna(Srnc)]

[Plot: real1222rn500, HAR; x-axis: g; series: har(Sl), har(Srnc)]

[Plot: real1222rn500, SNA; x-axis: g; series: sna(Sl), sna(Srnc)]

1. HAR of SRNC is slightly less than the expectation \$\beta\$.

1. SRNC and SybilLimit accept more Sybil nodes as g increases.

2. In a situation where SL accepts all Sybil nodes, SRNC accepts three to ten times fewer Sybil nodes.

Conclusion: SRNC makes a notable performance improvement over SybilLimit.

1. Each honest node in SRNC accepts more than 95% of the honest nodes.

2. Each honest node in SRNC accepts 400~700 percent fewer Sybil nodes than in SybilLimit.

[Plot: pl1222rn500, HAR; x-axis: g; series: har(Sl), har(Srnc)]

[Plot: pl1222rn500, SNA; x-axis: g; series: sna(Sl), sna(Srnc)]