
Dec 13, 2013


Measurement and Analysis of Online Social Networks


A. Mislove, M. Marcon, K. Gummadi, P. Druschel, B. Bhattacharjee

Presentation by Shahan Khatchadourian

Supervisor: Prof. Mariano P. Consens


Focus


graphs of online social networks


how they were obtained


how they were verified


how measurement and analysis were performed


properties of obtained graphs


why these properties are relevant



Why study the graphs?


important to improve existing systems and develop new applications

information search

trusted users

what is the structure of online social networks?

what are different ways to examine a social network when complete data is not available?

how do they compare with each other and to the Web?



Which graphs?


Flickr, YouTube, LiveJournal, and Orkut


All are directed except for Orkut

Weakly Connected Component (WCC)

Strongly Connected Component (SCC)
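
The difference between the two component types can be shown on a toy directed graph. This is a minimal sketch, not the authors' tooling: the helper names `reachable` and `components_of` and the four-node graph are illustrative assumptions. A WCC ignores link direction, while an SCC requires a directed path in both directions.

```python
from collections import defaultdict, deque

def reachable(start, adjacency):
    """Return the set of nodes reachable from start by BFS over adjacency."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def components_of(node, edges):
    """Return (WCC, SCC) containing node for a list of directed (u, v) edges."""
    forward, backward, undirected = defaultdict(set), defaultdict(set), defaultdict(set)
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
        undirected[u].add(v)
        undirected[v].add(u)
    wcc = reachable(node, undirected)
    # A node is in the SCC if it can be reached from node and can reach node.
    scc = reachable(node, forward) & reachable(node, backward)
    return wcc, scc

# Toy graph (illustrative): a -> b -> c -> a form a cycle, c -> d is one-way.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
wcc, scc = components_of("a", edges)
print(sorted(wcc), sorted(scc))
```

Here node d belongs to the WCC of a but not to its SCC, because no link leads back from d.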


How are the graphs obtained?


API


users


groups


forward/backward links


HTML Screen Scraping


Summary of graph properties


small-world

scale-free

correlation between indegree and outdegree

large strongly connected core of high-degree nodes surrounded by small clusters of low-degree nodes


Crawling Concerns: Algorithms


BFS and DFS


Snowball method: underestimates the number of low-degree nodes. In social networks, snowball samples underestimate the power-law coefficient, but closely match other metrics such as the overall clustering coefficient.
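
A partial breadth-first crawl of the kind discussed above might look like the following sketch. The function `bfs_crawl`, the `get_friends` callable (standing in for an API call or screen-scraper), and the toy network are hypothetical and are not taken from the paper.

```python
from collections import deque

def bfs_crawl(seed, get_friends, max_users):
    """Breadth-first crawl from seed, stopping after max_users users are fetched.

    get_friends is a hypothetical callable standing in for an API call or
    screen-scraper; it returns the users that a given user links to.
    """
    fetched = {}                 # user -> list of outgoing links
    queue = deque([seed])
    queued = {seed}
    while queue and len(fetched) < max_users:
        user = queue.popleft()
        friends = list(get_friends(user))
        fetched[user] = friends
        for friend in friends:
            if friend not in queued:
                queued.add(friend)
                queue.append(friend)
    return fetched

# Toy network (illustrative): one hub linked to five low-degree users.
toy = {"hub": ["u1", "u2", "u3", "u4", "u5"],
       "u1": ["hub"], "u2": ["hub"], "u3": ["hub"], "u4": ["hub"], "u5": ["hub"]}
partial = bfs_crawl("u1", lambda user: toy.get(user, []), max_users=3)
print(list(partial))
```

With a budget of three users the crawl fetches the hub in full but only two of the five low-degree users, which is the kind of bias the snowball method introduces.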



Crawling Concerns


FW links


cannot reach entire WCC


How to Verify Samples

1. Obtain a random user sample

LJ: feature which returns 5,000 random users

Flickr: random 8-digit user id generation

2. Conduct a crawl using these random users as seeds

3. See if these random nodes connect to the original WCC

4. See how the graph structure of the newly crawled graph compares to the original
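
Steps 1 to 3 can be sketched as a small check. Everything here is an assumption made for illustration: `verify_crawl`, the `crawl_from` callable (which should return the set of users reached from one seed), and the ten-user toy network with two components.

```python
import random

def verify_crawl(crawled_users, all_user_ids, crawl_from, sample_size, seed=0):
    """Fraction of randomly chosen users whose own crawl touches the original crawl."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(all_user_ids), sample_size)
    connected = sum(1 for user in sample if crawl_from(user) & crawled_users)
    return connected / sample_size

# Toy network (illustrative): users 0-7 form one component, users 8-9 another.
component = {u: (0 if u < 8 else 1) for u in range(10)}

def toy_crawl(user):
    """Hypothetical crawler: returns every user in the same component."""
    return {v for v in component if component[v] == component[user]}

original = toy_crawl(0)
share = verify_crawl(original, range(10), toy_crawl, sample_size=10)
print(share)
```

A share close to 1 would suggest that the original crawl covered most of the network; the users that do not connect belong to components the crawl missed.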


Crawling Concerns


FW links


no effect on largest WCC


Crawling Concerns


FW links


increasing the size of the WCC by starting at a different seed


Site      Users (millions)   Links (millions)   Symmetry   Access
YT        1.1                4.9                79.1%      API (users only), FW; SS for group info
Flickr    1.8                22                 62.0%      API (users + groups), FW
LJ        5.2                72                 73.5%      API (users + groups), FW + BW
Orkut     3                  223                100.0%     SS for users + groups

Access: FW = forward-only links; SS = HTML screen-scraping


Link Symmetry


even with directed links, there is a high level of symmetry

possibly helped by sites informing users of new incoming links

makes it harder to identify reputable sources due to dilution

possible solution: record who initiated the link
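
One way to quantify link symmetry is the fraction of directed links that are reciprocated. The sketch below assumes that definition; the function name `link_symmetry` and the sample edges are illustrative only.

```python
def link_symmetry(edges):
    """Fraction of directed links (u, v) for which the reverse link (v, u) also exists."""
    links = set(edges)
    if not links:
        return 0.0
    reciprocated = sum(1 for u, v in links if (v, u) in links)
    return reciprocated / len(links)

# Illustrative graph: a <-> b is reciprocated, a -> c and c -> d are not.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(link_symmetry(edges))
```

An undirected network such as Orkut scores 100% under this measure by construction, since every link exists in both directions.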


Power-law node degrees

Orkut deviates:

only 11.3% of network reached (effect of partial BFS crawl: snowball method)

artificial cap on a user's number of outgoing links distorts the distribution of high degrees

differs from Web
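
The slides do not say how the power-law coefficient is fitted. As one possible approach, the sketch below uses the standard continuous maximum-likelihood estimate, alpha = 1 + n / sum(ln(d / d_min)), which is only an approximation for integer degrees. The function name `powerlaw_alpha`, the cutoff `d_min`, and the synthetic data are assumptions.

```python
import math

def powerlaw_alpha(degrees, d_min=1.0):
    """Continuous maximum-likelihood estimate of alpha for P(d) ~ d^-alpha, d >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above d_min")
    return 1.0 + len(tail) / log_sum

# Synthetic degrees taken at evenly spaced quantiles of a power law with alpha = 2.5.
n, true_alpha = 10000, 2.5
synthetic = [(1 - (i + 0.5) / n) ** (-1 / (true_alpha - 1)) for i in range(n)]
estimate = powerlaw_alpha(synthetic)
print(round(estimate, 2))
```

Truncating the sample, for example by capping the number of outgoing links or by undersampling low-degree nodes, shifts this estimate, which is consistent with the distortions listed for Orkut.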


Power-law node degrees


Power-law node degrees

e.g. analysis of top keywords


Spread of Information



Factors affecting the power law


services, accessibility, features

mobile users

[Figure: two log-log plots; axis ticks range from 1 to 10000 and from 10^-8 to 10^0]


Correlation of indegree and outdegree

over 50% of nodes have indegree within 20% of their outdegree
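
The statistic above could be computed as in the following sketch. It assumes that "within 20%" means |indegree - outdegree| <= 0.2 * outdegree; that reading, the function name `fraction_balanced`, and the sample edges are illustrative assumptions.

```python
from collections import Counter

def fraction_balanced(edges, tolerance=0.2):
    """Fraction of nodes whose indegree is within tolerance of their outdegree."""
    outdeg = Counter(u for u, _ in edges)
    indeg = Counter(v for _, v in edges)
    nodes = set(outdeg) | set(indeg)
    balanced = sum(
        1 for n in nodes
        if abs(indeg[n] - outdeg[n]) <= tolerance * outdeg[n]
    )
    return balanced / len(nodes) if nodes else 0.0

# Illustrative graph: b and c have matching degrees, a and d do not.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("d", "a")]
print(fraction_balanced(edges))
```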



Path lengths and diameter


all four networks have short path lengths

Broder et al. noted that if the Web were treated as an undirected graph, path length would drop from 16 to 7, so what?
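
The effect of ignoring link direction on path length can be seen on a small example. This sketch computes the mean shortest-path length over connected ordered pairs by BFS from every node; the function name `average_path_length` and the four-node ring are illustrative, and the approach is only practical for small graphs.

```python
from collections import deque

def average_path_length(edges, directed=True):
    """Mean shortest-path length over all ordered pairs that are connected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
        if not directed:
            adj[v].add(u)
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Directed ring a -> b -> c -> d -> a (illustrative).
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(average_path_length(ring), average_path_length(ring, directed=False))
```

Treating the ring as undirected shortens the average path, the same direction of change as the drop Broder et al. reported for the Web.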


Link degree correlations


JDD: joint degree distribution


mapping between outdegree and average indegree of all nodes connected to nodes of that outdegree

YouTube different due to extremely popular users being connected to by many unpopular users

Orkut shows bump due to undersampling
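
The mapping described above can be sketched as follows. This version averages over links, so a node with outdegree k contributes once per outgoing link; that weighting, the function name `knn`, and the toy "celebrity" graph are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def knn(edges):
    """Map each outdegree k to the mean indegree of nodes linked to by nodes of outdegree k."""
    outdeg = Counter(u for u, _ in edges)
    indeg = Counter(v for _, v in edges)
    sums, counts = defaultdict(int), defaultdict(int)
    for u, v in edges:
        sums[outdeg[u]] += indeg[v]
        counts[outdeg[u]] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Illustrative celebrity pattern: low-outdegree users all link to one popular user.
edges = [("fan1", "star"), ("fan2", "star"), ("fan3", "star"),
         ("star", "fan1"), ("star", "peer"), ("peer", "star")]
print(knn(edges))
```

In this toy graph the nodes with outdegree 1 point at a node with high indegree, while the popular node points at low-indegree nodes, which mirrors the pattern attributed to YouTube.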


Joint degree distribution and scale-free behaviour

undersampling of low-degree nodes

celebrity-driven nature

cap on links


Densely connected core


removing 10% of core nodes results in breaking up graph into millions of very small SCCs

why an SCC? directed links matter for actual communication

graphs below show results as nodes are removed starting with highest-degree nodes (left) and path length as graph is constructed beginning with highest-degree nodes (right)

Sub-logarithmic growth
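
The node-removal experiment can be imitated on a toy graph. The sketch below is an assumption-laden illustration, not the authors' code: it ranks nodes by total degree, removes the top fraction, and measures the largest SCC with Kosaraju's two-pass algorithm. The names `largest_scc_size` and `remove_top_degree` and the hub-and-spoke graph are hypothetical.

```python
from collections import defaultdict

def largest_scc_size(nodes, edges):
    """Size of the largest strongly connected component (Kosaraju's two-pass algorithm)."""
    forward, backward = defaultdict(list), defaultdict(list)
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: flood-fill the reversed graph in reverse completion order.
    assigned, best = set(), 0
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        best = max(best, size)
    return best

def remove_top_degree(nodes, edges, fraction):
    """Drop the given fraction of nodes with the highest total degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(nodes, key=lambda n: degree[n], reverse=True)
    kept = set(ranked[int(len(ranked) * fraction):])
    return kept, [(u, v) for u, v in edges if u in kept and v in kept]

# Toy graph (illustrative): one hub linked in both directions to nine spokes.
nodes = ["hub"] + [f"s{i}" for i in range(1, 10)]
edges = [("hub", s) for s in nodes[1:]] + [(s, "hub") for s in nodes[1:]]
kept, kept_edges = remove_top_degree(nodes, edges, 0.1)
print(largest_scc_size(nodes, edges), largest_scc_size(kept, kept_edges))
```

Removing the single highest-degree node splits this toy graph into isolated nodes, a miniature version of the fragmentation described for the real networks.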


Tightly clustered fringe


based on clustering coefficient


social network graphs show stronger clustering, most likely due to mutual friends

Possibly because personal content is not shared
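
A local clustering coefficient can be sketched as the fraction of a user's friend pairs that are themselves linked. This version treats links as undirected, which is a simplification and may differ from the definition used in the paper; the function name `clustering_coefficient` and the toy adjacency map are illustrative.

```python
from itertools import combinations

def clustering_coefficient(node, neighbors):
    """Fraction of pairs of node's neighbors that are themselves linked (undirected)."""
    friends = neighbors[node]
    if len(friends) < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(friends, 2) if b in neighbors[a])
    possible = len(friends) * (len(friends) - 1) / 2
    return linked / possible

# Illustrative symmetric adjacency: of a's three friends, only b and c know each other.
neighbors = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient("a", neighbors))
```

Mutual friends raise this value: every friend pair that is also linked closes a triangle around the user.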


Groups


group sizes follow power-law distribution


represent tightly clustered communities


Groups


Orkut is a special case, possibly because of the partial crawl


Node Value Determination

1. Directed Graph, current model

nodes with many incoming links (hubs) have value due to their connection to many users

it becomes easy to spread important information to the other nodes, e.g. DNS

unhealthy in case of spam or viruses

in order for a user to send spam, they have to become a more important node and amass friends


Node Value Determination

2. Link Initiator, requires temporal information

if user A requests a link with user B, does that mean that user B is more important?

even though graphs have a high level of link symmetry, this additional information can offset this symmetry

unfortunately, examined graphs do not have temporal information


Trust


lendingclub.com, Facebook application


people are more willing to lend money to friends who are linked through a short path

people are more willing to pay back those who are linked through a short path

no indication of whether this actually works

does trust increase as degree increases?

what credit rating and JDD does a person have to get a good interest rate?


Thank you

shahan@cs