What is the right clustering of this graph?
Clique Percolation
A community is a collection of adjacent k-cliques.
Questions:
What is a good k?
How to find cliques?
Clique Finding
Find the largest clique in a graph? NP-complete
Find a maximal clique containing a node? Polynomial
Percolation Algorithm
• Find all maximal cliques
• Create the clique-clique overlap matrix
• Ignore entries less than k
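The three steps above can be sketched in plain Python. This is a minimal, unoptimized sketch rather than a reference implementation: `maximal_cliques` and `clique_percolation` are hypothetical helper names, the clique enumeration is basic Bron-Kerbosch without pivoting, and the threshold follows one common reading of the last step (keep cliques with at least k nodes, link two cliques when they share at least k − 1 nodes), which is an assumption about what "ignore entries less than k" stands for.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(adj, k):
    """Return overlapping communities as sorted lists of nodes."""
    # Step 1: maximal cliques; smaller than k means no k-clique inside.
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    # Step 2: clique-clique overlap matrix (number of shared nodes).
    overlap = [[len(a & b) for b in cliques] for a in cliques]
    # Step 3 (assumed threshold): link cliques sharing >= k - 1 nodes,
    # then take connected components of the resulting clique graph.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if overlap[i][j] >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())


# Toy graph (made up for illustration): two triangles sharing an edge,
# plus a third triangle attached through node 4 only.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(clique_percolation(adj, k=3))  # [[1, 2, 3, 4], [4, 5, 6]]
```

Node 4 ends up in both communities, which is the overlap the method is designed to allow.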
Running Time
• Maximal clique finding is output-polynomial
– Extensively studied
• “we note that a complete analysis of a co-authorship network with 127000 links takes less than 2 hours on a PC.”
A More Theoretical Approach
Important features of a community
Internally dense
Externally sparse
Clique percolation ignores the externally sparse requirement
Modularity defines it as the edge cut
(α, β) clustering
A cluster C is an (α, β) cluster if
– Internally Dense: Every vertex in the cluster neighbors at least a β fraction of the cluster
– Externally Sparse: Every vertex outside the cluster neighbors at most an α fraction of the cluster
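The two conditions can be checked directly. The sketch below assumes the graph is a dict mapping each vertex to its set of neighbors, and that a vertex counts as its own neighbor for the density test (a self-loop convention chosen here, not stated on the slide). The function name `is_alpha_beta_cluster` and the toy graph are hypothetical.

```python
def is_alpha_beta_cluster(adj, cluster, alpha, beta):
    """Check the (alpha, beta) conditions for a vertex set."""
    cluster = set(cluster)
    size = len(cluster)

    # Internally dense: each member neighbors >= beta * |C| members.
    # Assumption: a vertex counts as its own neighbor.
    dense = all(
        len((adj[v] | {v}) & cluster) >= beta * size for v in cluster
    )
    # Externally sparse: each outsider neighbors <= alpha * |C| members.
    sparse = all(
        len(adj[v] & cluster) <= alpha * size
        for v in adj
        if v not in cluster
    )
    return dense and sparse


# Toy graph (made up): triangles {1,2,3}, {2,3,4}, {4,5,6}.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}

print(is_alpha_beta_cluster(adj, {4, 5, 6}, alpha=0.4, beta=1.0))  # True
print(is_alpha_beta_cluster(adj, {1, 2, 3}, alpha=0.4, beta=1.0))  # False
```

The second call fails only the sparsity test: node 4 sits outside {1, 2, 3} but neighbors two of its three members.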
[Figure: examples of (1/5, 4/5) clusters]
First approach: ρ-Champions
[Figure: example network with nodes Wes Anderson, Ben Stiller, Owen Wilson, Bill Murray, Gwyneth Paltrow, Will Ferrell, Vince Vaughn, Anjelica Huston, Steve Martin, Dan Aykroyd, Scarlett Johansson, Jack Black, and Elijah Wood]
Algorithm with ρ-Champions
• Let c be a ρ-champion
• If v is in C, then v and c share at least (2β − 1)|C| neighbors
• If v is outside C, then v and c share at most (ρ + α)|C| neighbors
[Figure: cluster C with champion c and vertices v inside and outside C; regions labeled β|C|, ρ|C|, α|C|, and (2β − 1)|C|]
Runs in $O(d^{0.7} n^{1.9} + n^{2+o(1)})$ time, where d is the average degree
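The shared-neighbor bounds suggest a simple test: around a champion c, the members of C are the vertices sharing at least (2β − 1)|C| neighbors with c, provided 2β − 1 > ρ + α so the two bounds do not cross. The sketch below is a brute-force illustration of that idea, not the algorithm with the running time quoted above. It assumes a ρ-champion is a vertex of C with at most ρ|C| neighbors outside C, treats each vertex as its own neighbor, tries every vertex as champion and every cluster size as a guess, and uses hypothetical function names.

```python
def is_alpha_beta_cluster(closed, cluster, alpha, beta):
    """(alpha, beta) check on closed neighborhoods (self included)."""
    size = len(cluster)
    dense = all(len(closed[v] & cluster) >= beta * size for v in cluster)
    sparse = all(
        len(closed[v] & cluster) <= alpha * size
        for v in closed
        if v not in cluster
    )
    return dense and sparse


def clusters_from_champions(adj, alpha, beta):
    """Brute-force sketch: try each vertex as champion, each size as |C|."""
    # Assumption: every vertex counts as its own neighbor.
    closed = {v: adj[v] | {v} for v in adj}
    found = set()
    for c in adj:
        for size in range(1, len(adj) + 1):
            threshold = (2 * beta - 1) * size
            # Vertices sharing enough neighbors with the champion.
            candidate = frozenset(
                v for v in adj if len(closed[v] & closed[c]) >= threshold
            )
            if len(candidate) == size and is_alpha_beta_cluster(
                closed, candidate, alpha, beta
            ):
                found.add(candidate)
    return found


# Toy graph (made up): two 5-cliques joined by the single edge (4, 5).
adj = {v: set() for v in range(10)}
for group in (range(0, 5), range(5, 10)):
    for u in group:
        for w in group:
            if u != w:
                adj[u].add(w)
adj[4].add(5)
adj[5].add(4)

for cluster in sorted(clusters_from_champions(adj, 0.25, 1.0), key=min):
    print(sorted(cluster))
```

On this toy graph the two cliques are recovered; vertices 4 and 5 share too few neighbors with the opposite clique to be pulled across.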
Discussion
• Pros
– Very parallel
– Experiments show good results
– The clusters are a good feature in recommendation algorithms
• Cons
– β > ½ doesn’t seem realistic
– The ρ-champion condition is fairly restrictive
– Not based on observed data
Finding Overlapping Communities
Assumptions
1) Community edges are chosen according to the expected affinity (degree) model.
2) Maximality assumption with gap ε.
3) Community membership accounts for a significant portion of each node’s edges.
Another Algorithm Style
• Grow a community from a set of seed nodes.
• Clique finding:
– Pick s starting nodes at random.
– For each starting node v, sample S ⊂ Γ(v).
– For each clique in S, grow it to a maximal clique.
– Output the clique if it satisfies your conditions.
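A minimal sketch of the seed-and-grow steps above, under simplifying assumptions: instead of enumerating every clique inside the sample S, it greedily builds one clique from {v} ∪ S per starting node, and `accept` is a hypothetical callback standing in for "your conditions". The function names and the toy graph are made up for illustration.

```python
import random


def grow_to_maximal_clique(adj, clique):
    """Extend a non-empty clique until no vertex can be added."""
    clique = set(clique)
    # Vertices adjacent to every current member.
    candidates = set.intersection(*(adj[v] for v in clique)) - clique
    while candidates:
        v = min(candidates)  # deterministic choice for the sketch
        clique.add(v)
        candidates &= adj[v]
    return clique


def sample_maximal_cliques(adj, s, sample_size, accept, seed=0):
    """Pick s random seeds, sample neighbors, grow, and filter."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    found = set()
    for v in rng.sample(nodes, min(s, len(nodes))):
        neighbors = sorted(adj[v])
        sample = rng.sample(neighbors, min(sample_size, len(neighbors)))
        # Greedily pick one clique inside {v} plus the sample.
        clique = {v}
        for u in sample:
            if clique <= adj[u]:
                clique.add(u)
        clique = grow_to_maximal_clique(adj, clique)
        if accept(clique):
            found.add(frozenset(clique))
    return found


# Toy graph (made up): triangles {1,2,3}, {2,3,4}, {4,5,6}.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}

cliques = sample_maximal_cliques(
    adj, s=6, sample_size=2, accept=lambda c: len(c) >= 3
)
print(sorted(sorted(c) for c in cliques))
```

Because the seeds and samples are random, which maximal cliques are reported can vary with `seed`; only cliques containing a sampled starting node can be found.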
Ego Networks
You are the ego. Your friends form the ego network.
Sociology on Ego Networks
Functions Served by Ego Networks
• Social support
• Sense-making
• Social control
• Access to resources
• Behavioral models
Dunbar Circles
Dunbar number: “the theoretical cognitive limit to the number of people with whom one can maintain stable social relationships.”
Estimated between 100 and 250.
Community Detection with Egonets
Idea 1 – When you remove the ego, the egonet breaks into disconnected components.
Idea 2 – It becomes weakly connected components.
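Idea 1 can be sketched as a connected-components search over the ego's friends with the ego itself left out. The graph representation (dict of neighbor sets), the function name `egonet_components`, and the toy friend names are assumptions for illustration. Idea 2 would need a community detection step in place of plain connectivity and is not covered here.

```python
def egonet_components(adj, ego):
    """Connected components among the ego's friends, ego removed."""
    unseen = set(adj[ego])  # only friends; the ego is excluded
    components = []
    while unseen:
        start = unseen.pop()
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            # Follow only edges that stay inside the friend set.
            for w in adj[u] & unseen:
                unseen.discard(w)
                component.add(w)
                stack.append(w)
        components.append(component)
    return components


# Toy egonet (made up): two friend pairs and one isolated friend.
adj = {
    "me": {"a", "b", "c", "d", "e"},
    "a": {"me", "b"},
    "b": {"me", "a"},
    "c": {"me", "d"},
    "d": {"me", "c"},
    "e": {"me"},
}

groups = sorted(sorted(g) for g in egonet_components(adj, "me"))
print(groups)  # [['a', 'b'], ['c', 'd'], ['e']]
```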
[Figure: example egonet with friend groups labeled family, eecs, grade school, college, microsoft, uva, radio, TCS]
Egonet-based Systems: DEMON
DEMON
• Apply a community detection algorithm (Label Propagation) to the egonet
• Repeat this for every user in the network.
Community Definition: The set of communities is the set of maximal sets that ‘contain’ the egonet communities.
DEMON
• Merge the results.
Output: Set of overlapping communities
Running time: $O(nK^{3-\alpha})$
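The merge step can be sketched on its own, taking the per-egonet communities as given (the label propagation step is not shown). The rule used here, absorbing a community when at least a fraction `eps` of the smaller one lies inside the larger, is an assumption for illustration. With `eps = 1.0` it reduces to keeping only the maximal sets under containment, in line with the community definition above. The single pass is a simplification; `merge_communities` is a hypothetical name.

```python
def merge_communities(local_communities, eps=1.0):
    """One-pass merge of local communities, largest first."""
    merged = []
    ordered = sorted(map(frozenset, local_communities), key=len, reverse=True)
    for community in ordered:
        for i, existing in enumerate(merged):
            smaller = min(len(community), len(existing))
            # Assumed rule: enough of the smaller set is already covered.
            if smaller and len(community & existing) >= eps * smaller:
                merged[i] = existing | community
                break
        else:
            merged.append(community)
    return merged


# Toy local communities (made up), as if collected from several egonets.
local = [{1, 2, 3}, {2, 3}, {3, 4, 5}, {1, 2, 3, 4}]

strict = merge_communities(local, eps=1.0)
loose = merge_communities(local, eps=0.6)
print(sorted(sorted(c) for c in strict))  # [[1, 2, 3, 4], [3, 4, 5]]
print(sorted(sorted(c) for c in loose))   # [[1, 2, 3, 4, 5]]
```

Lowering `eps` trades more merging for fewer, larger communities; node 3 and node 4 stay in two communities under the strict setting.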
[Figure panels: ‘Real’ Community, Random Walk, Metis, Infomap, Newman-Modularity, Louvain]
Cornell Study
Slides due to Bruno Abrahao
Community Detection
• Community structure is not well defined
– different people have different notions of community structure
• Traditional strategy
– (1) start with an expectation of what a community should look like
  e.g., a set of nodes that interact more within the set than with the outside
– (2) define an optimization problem
– (3) design a heuristic
– (4) the solution gives the desired communities
Key questions
• A multitude of algorithms
– different objective functions
– different heuristics
• How dissimilar are their outputs?
• Communities may differ from the proposed mathematical constructs
– e.g., preponderance of links to the outside
• Which algorithms extract communities that most closely resemble the structure of real communities?
Obstacles to answering the questions
• We don't know what properties communities possess
• We can't characterize communities in the absence of negative examples
– Look at real communities and determine their structure
– do other sets that are not communities have these properties?
– every other connected set could be a negative example: intractable
– sets that are not annotated could also be communities
• We don't know what metrics we should use
– modularity, conductance, clustering coefficient...
Building structural classes
[Diagram: apply an algorithm to the network to extract community examples]
Building structural classes
[Diagram: Algorithm 1, Algorithm 2, ..., Algorithm k each produce a class of examples: Class 1, Class 2, ..., Class k]
Building a feature space
[Diagram: each labeled example is mapped to a feature vector]
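As an illustration of mapping a labeled example to a feature vector, the sketch below computes three structural features for a community: size, internal edge density, and conductance (cut edges divided by the smaller of the two volumes). This particular choice of features is an assumption made here for illustration; the study's own feature set is not reproduced. `community_features` is a hypothetical name.

```python
def community_features(adj, community):
    """Return [size, internal density, conductance] for a vertex set."""
    s = set(community)
    n = len(s)

    # Each internal edge is seen from both endpoints.
    internal = sum(len(adj[v] & s) for v in s) // 2
    # Edges with exactly one endpoint inside the community.
    cut = sum(len(adj[v] - s) for v in s)

    vol_in = sum(len(adj[v]) for v in s)
    vol_out = sum(len(adj[v]) for v in adj) - vol_in

    density = internal / (n * (n - 1) / 2) if n > 1 else 0.0
    denom = min(vol_in, vol_out)
    conductance = cut / denom if denom else 0.0
    return [n, density, conductance]


# Toy graph (made up): triangles {1,2,3}, {2,3,4}, {4,5,6}.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}

print(community_features(adj, {4, 5, 6}))  # [3, 1.0, 0.25]
```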
Building a feature space
Feature Space
Inter-class separability
Feature Space
Separability = Distinct structures
Class Separability Measure
Are the classes separable?
Large-scale network datasets
•
Social
•
Commercial
•
Biological
+ Rice University
Facebook + Rice data used with permission of Mislove et al. Other datasets are publicly available.
Community detection algorithms
• BFS (Random connected subgraphs)
• Random-Walk-based (with and without restart)
• (α,β)-communities
• InfoMap
• Markov Clustering
• Metis
• Louvain
• Newman-Clauset-Moore
• Link Communities
Annotated communities
+ Rice University
Metadata included in the datasets identifies exemplar communities that form in these domains
To what extent are the classes separable?
Probabilistic k-way classifier (SVM, k-NN)
[Diagram: communities from Algorithm 1, Algorithm 2, ... and the annotated communities are used to train the classifier]
Probabilistic multi-class learners
Probabilistic k-way classifier (SVM, k-NN)
Classify (cross-validation)
Pr(Algorithm 1) = 0.05
Pr(Algorithm 2) = 0.08
...
Pr(Annotated) = 0.48
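A probabilistic k-way classifier of the k-NN kind can be sketched in a few lines: the class probabilities for a query are the label frequencies among its k nearest training examples. This is a minimal sketch with made-up two-dimensional feature vectors and class labels, not the classifier or the data used in the study; `knn_class_probabilities` is a hypothetical name.

```python
import math
from collections import Counter


def knn_class_probabilities(train, query, k=3):
    """Estimate Pr(class) as label frequencies among k nearest examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in counts.items()}


# Toy training set (made up): (feature vector, class label) pairs, where
# the label names the source of the community example.
train = [
    ([0.0, 0.0], "Algorithm 1"),
    ([0.0, 1.0], "Algorithm 1"),
    ([1.0, 0.0], "Annotated"),
    ([5.0, 5.0], "Annotated"),
    ([5.0, 6.0], "Annotated"),
]

probs = knn_class_probabilities(train, [0.0, 0.2], k=3)
print(probs)
```

Here two of the three nearest examples come from "Algorithm 1", so that class receives probability 2/3 and "Annotated" receives 1/3.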
Cross-validation performance
Matching annotated communities
• Which algorithms extract communities that most closely resemble the structure of annotated communities?
Probabilistic multi-class learners
Probabilistic k-way classifier
[Diagram: the classifier learns from communities extracted by Algorithm 1, Algorithm 2, ..., Algorithm k]
Probabilistic multi-class learners
Probabilistic k-way classifier
Classify
Pr(Algorithm 1) = 0.02
Pr(Algorithm 2) = 0.19
...
Pr(Algorithm k) = 0.12
Classification of annotated into extracted
Step 1: identifying the most important features
7 features out of 36 retain the discriminative power of the full set
Tendencies of algorithms with respect to the most discriminative features
Summary
• Traditional methods are unsupervised
– they find a particular type of community
– little sensitivity to different purposes, structures of interest and domains of application
• Our approach suggests a supervised approach to community detection
– user specifies what they intended to find through examples (real or synthetic)
– algorithm learns from those examples and retrieves similar structures in the network
Experimental Assignment
• Goal: Do some data mining research, comparing real networks and the models in class
• Due: Email a report by Friday, October 12.