Topic Distributions over Links on Web

Jie Tang¹, Jing Zhang¹, Jeffrey Xu Yu², Zi Yang¹, Keke Cai³, Rui Ma³, Li Zhang³, and Zhong Su³

¹ Tsinghua University
² Chinese University of Hong Kong
³ IBM, China Research Lab

Dec. 7th, 2009

Motivation

Web users create links with significantly different intentions.

Understanding the category and the influence of each link can benefit many applications, e.g.,
  Expert finding
  Collaborator finding
  New-friend recommendation




Examples

[Figure: original citation network vs. semantic citation network]

Topic distribution analysis over citations

Researcher A: an in-depth understanding of the research field?

Problem: Link Semantic Analysis

Topic modeling over links

Citation context words

Link semantics

Outline

Previous Work

Our Approach
  Pairwise Restricted Boltzmann Machines (PRBMs)

Experimental Results

Conclusion & Future Work

Previous Work

Link influence analysis
  Citation influence topic [Dietz, 07]
  Social influence analysis [Crandall, 08; Tang, 09]

Graphical model
  Probabilistic LSI [Hofmann, 99]
  Latent Dirichlet Allocation [Blei, 03]
  Restricted Boltzmann machines [Welling, 01]

Social network analysis
  Social network analysis [Wasserman, 94]
  Web community discovery [Newman, 04]
  ‘Small world’ networks [Watts, 98]

Pairwise Restricted Boltzmann Machines (PRBMs)

Link context words

Topic distribution

Link category

Latent variables defined over the link to bridge the two pages

Example

Formalization of PRBMs

Objective function of the PRBMs: [equations not preserved]
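
As background for the formalization, the energy function and joint distribution of a standard binary RBM can be written as below. This is the textbook RBM that the PRBM name points to, not the PRBM objective from the talk; the symbols v (visible units), h (hidden units), W (weights), a and b (biases), and Z (partition function) are generic notation assumed here.

```latex
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h},
\qquad
P(\mathbf{v}, \mathbf{h}) = \frac{\exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr)}{Z},
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr)
```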

Model Learning

Generative learning

Discriminative learning

Hybrid learning

Objective function:
  Expectation w.r.t. the data distribution
  Expectation w.r.t. the distribution defined by the model

We use Contrastive Divergence to learn the model distribution P_M
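
The two expectations named above are the usual ingredients of an RBM gradient: a statistic averaged under the data distribution minus the same statistic averaged under the model distribution. Contrastive Divergence replaces the model expectation with a short Gibbs chain started at the data. The following is a minimal sketch of one-step Contrastive Divergence (CD-1) for a plain binary RBM, assuming NumPy; it is not the PRBM from the talk, and the names `cd1_step` and `train_demo`, the toy data, and the learning rate are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(W, a, b, v0, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM (hypothetical helper).

    W: (n_visible, n_hidden) weights; a, b: visible/hidden biases;
    v0: batch of binary visible vectors, shape (n_samples, n_visible).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step approximates the model distribution.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)

    # Update: data-driven statistics minus reconstruction-driven statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a = a + lr * (v0 - v1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b


def train_demo(seed=0, epochs=200):
    """Train on a toy data set with two repeated binary patterns."""
    rng = np.random.default_rng(seed)
    data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 10, dtype=float)
    W = 0.01 * rng.standard_normal((4, 2))
    a = np.zeros(4)
    b = np.zeros(2)
    for _ in range(epochs):
        W, a, b = cd1_step(W, a, b, data, rng=rng)
    return W, a, b
```

Calling `train_demo()` returns the learned weights and biases. A generative, discriminative, or hybrid objective would change which statistics enter the update, which this sketch does not attempt to show.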

Link Semantic Analysis

Link category annotation
  First we calculate [equation not preserved]
  Then we estimate the probability p(c|e) by a mean field algorithm

Link influence estimation
  Estimate influence by KL divergence
  An alternative way is to generate the influence score by a Gaussian distribution
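
To make the KL-divergence step concrete, here is a minimal sketch that computes KL(p || q) between two discrete distributions. The slides do not preserve which distributions are compared or how the divergence is turned into an influence score, so the sketch assumes they are the topic distributions of the two linked pages; the function name `kl_divergence` and the example vectors are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against division by zero when q has zero entries
    (a smoothing choice made for this sketch).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


# Assumed topic distributions of the source and target page of one link.
source_topics = [0.7, 0.2, 0.1]
target_topics = [0.6, 0.3, 0.1]
divergence = kl_divergence(source_topics, target_topics)
```

KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; which direction suits a directed link is a modelling decision the slides leave open.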

Experimental Setting

Data sets
  Arnetminer data: 978,504 papers, 14M citations
  Wikipedia: 14K “article” pages and 25K links

Evaluation measures
  Link categorization accuracy
  Topical analysis

Baselines
  SVM+LDA
  SVM+RBM


Accuracy of Link Categorization

gPRBM: our approach with generative learning

dPRBM: our approach with discriminative learning

hPRBM: our approach with hybrid learning


Category-Topic Mixture


Example Analysis

Conclusion & Future Work

Concluding remarks
  Investigated the problem of quantifying link semantics on the Web
  Proposed Pairwise Restricted Boltzmann Machines (PRBMs) to solve this problem

Future work
  Semantic analysis over social relationships
  Correlation between link semantics and information propagation

Thanks!

Q&A

Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/