Measuring Reliability in Wikipedia
Wen-Yuan Zhu
2007.11.13
Outline
• Introduction
• Some terms of Wikipedia
• Basic concept of measuring reliability
• A way to measure reliability
• Conclusion
• Reference
Introduction
• Wikipedia is the most popular online collaborative encyclopedia
• it exhibits rich phenomena that differ from those of the Internet and of ordinary websites
Some terms of Wikipedia
Some terms of Wikipedia(2)
• featured article
– considered to be among the best articles in Wikipedia
– as determined by Wikipedians
– at present, there are 1683 featured articles
Some terms of Wikipedia(3)
• if an article is a featured article, it shows the featured-article icon in its upper right corner
Some terms of Wikipedia(4)
• articles are reviewed at Wikipedia:Featured article candidates
• according to the Wikipedia:Featured article criteria
Some terms of Wikipedia(5)
• before nominating an article, make sure that it meets all of the featured article criteria
• consensus must be reached that it meets the criteria
Some terms of Wikipedia(6)
• articles that no longer meet the criteria can be proposed for improvement or removal at Wikipedia:Featured article review
Some terms of Wikipedia(7)
• clean-up article
– cleanup issues that this project covers may include wikification, spelling, grammar, tone, and sourcing
– anyone can request cleanup of a page at Wikipedia:Cleanup
Some terms of Wikipedia(8)
Basic concept of measuring reliability
• the higher an article's link ratio, the higher its reliability
• this part is based on [2]
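The link-ratio idea can be illustrated with a minimal Python sketch. The slides do not define the measure, so the sketch assumes that the link ratio of a term is the fraction of its occurrences that are written as a wiki link `[[term]]`. The function name `link_ratio` and the sample pages are hypothetical.

```python
import re


def link_ratio(term: str, pages: list[str]) -> float:
    """Fraction of all occurrences of `term` that appear as a wiki link.

    Assumed definition (not stated on the slides):
        link ratio = linked occurrences / all occurrences
    Only plain links of the form [[term]] are counted as linked.
    """
    link_pattern = re.compile(r"\[\[" + re.escape(term) + r"\]\]")
    term_pattern = re.compile(re.escape(term))
    linked = 0
    total = 0
    for text in pages:
        linked += len(link_pattern.findall(text))
        # Every occurrence counts, including the one inside [[...]].
        total += len(term_pattern.findall(text))
    return linked / total if total else 0.0


# Hypothetical wiki text: "Paris" occurs three times, once as a link.
sample_pages = [
    "[[Paris]] is the capital of France. Paris is large.",
    "I visited Paris last year.",
]
ratio = link_ratio("Paris", sample_pages)
```

Under this assumed definition, a term that other pages usually link to gets a ratio close to 1, which is the property the slide associates with higher reliability.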
Basic concept of measuring reliability(2)
• classes of terms
Basic concept of measuring reliability(3)
• relation between full name and short name
Basic concept of measuring reliability(4)
• relation between PageRank and link ratio
Basic concept of measuring reliability(5)
• linking data alone is not enough to measure reliability
• too many factors influence the reliability of an article in Wikipedia
A way to measure reliability
• use Bayesian statistics to model reliability in Wikipedia
• use the revision history to assess the reliability of an article in Wikipedia
• this part is based on [3]
A way to measure reliability(2)
A way to measure reliability(3)
• article trust
– trustworthiness of a version of an article
• fragment trust
– trustworthiness of a fragment in a version of an article
• author trust
– trustworthiness of an author
A way to measure reliability(4)
• $V_i$ is the $i$-th version of an article, $i = 0, 1, \ldots, n$
• $t_{V_i}$ is the trust value of $V_i$
• $A_i$ is the author whose revision produced $V_i$
• $t_{A_i}$ is the trust value of $A_i$
• $I_i$ is the content inserted into $V_{i+1}$ by $A_{i+1}$
• $D_i$ is the content deleted from $V_i$ by $A_{i+1}$
• $|x|$ is the size of $x$
A way to measure reliability(5)
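• a condition on $I_i$ and $V_{i+1}$ (formula lost in extraction)
• a condition on $D_i$ and $V_i$ (formula lost in extraction)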
A way to measure reliability(6)
• Dynamic Bayesian networks
– defined by a pair $(B_s, B_o)$
• $B_s$ is the graph structure of the network
• $B_o$ is the set of the network’s conditional density distributions
A way to measure reliability(7)
• $B_s$: from $V_i$ to $V_{i+1}$, $i = 0, 1, \ldots, n-1$
• the state at the $i$-th revision is represented as a quad $X_i = (t_{V_i}, t_{A_i}, i_i, d_i)$, where $i_i$ and $d_i$ are the inserted and deleted amounts
• $t_{V_i}, t_{A_i} \in [0, 1]$ and $i_i, d_i \in [0, \infty)$
• the states satisfy the Markov property
– since $f(X_{i+1} \mid X_0, \ldots, X_{i-1}, X_i) = f(X_{i+1} \mid X_i)$
A way to measure reliability(8)
A way to measure reliability(9)
• the goal is to determine the posterior density distribution $f(t_{V_{i+1}})$
• $B_o$ is fully characterized by $f(t_{V_0} \mid t_{A_0})$ and $f(t_{V_{i+1}} \mid t_{V_i}, t_{A_{i+1}}, i_i, d_i)$
A way to measure reliability(10)
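• the Beta distribution
  $\mathrm{beta}(\alpha, \beta \mid p) = \frac{1}{B(\alpha, \beta)} \, p^{\alpha - 1} (1 - p)^{\beta - 1}$
• where $B(\alpha, \beta)$ is the beta function with parameters $\alpha$ and $\beta$
  $B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)} = \frac{(\alpha - 1)!\,(\beta - 1)!}{(\alpha + \beta - 1)!}$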
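A minimal Python sketch of the Beta density, using only the standard library, may make the formula easier to check by hand. The function names `beta_function` and `beta_pdf` are illustrative choices, not names from [3]. The beta function is computed through `math.lgamma` rather than factorials so that large parameters do not overflow.

```python
import math


def beta_function(alpha: float, beta: float) -> float:
    """B(alpha, beta) = Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta).

    Computed in log space so that large parameters stay finite.
    """
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(log_b)


def beta_pdf(p: float, alpha: float, beta: float) -> float:
    """Density beta(alpha, beta | p) for 0 < p < 1 and alpha, beta > 0."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return p ** (alpha - 1) * (1 - p) ** (beta - 1) / beta_function(alpha, beta)


# For integer parameters the factorial form applies:
# B(2, 3) = 1! * 2! / 4! = 1/12
b_2_3 = beta_function(2, 3)
# beta(1, 1 | p) is the uniform density on (0, 1)
uniform_density = beta_pdf(0.3, 1, 1)
```

For positive integer parameters the result agrees with the factorial form on the slide; the log-gamma route is only a numerical convenience.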
A way to measure reliability(11)
A way to measure reliability(12)
• assume $f(t_{V_0} \mid t_{A_0}) = \mathrm{beta}(\alpha_0, \beta_0 \mid p)$
• $a_0$ is the mean of the trust distribution of $A_0$
• $\alpha_0$ and $\beta_0$ are then derived from $a_0$ and the constant 10 (formulas lost in extraction)
A way to measure reliability(13)
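• the conditional density of $t_{V_{i+1}}$, given $t_{V_i}$, $t_{A_{i+1}}$, and the inserted and deleted content, is again a beta distribution $\mathrm{beta}(\alpha, \beta \mid p)$
• its parameters are computed from the version trust, the author trust, and the inserted and deleted amounts, using $\min(\cdot)$ and $\max(\cdot, 0)$ terms (formula lost in extraction)
• $|V_{i+1}| = |V_i| + |I_i| - |D_i|$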
A way to measure reliability(14)
• featured articles
– considered highly trustworthy
• clean-up articles
– considered untrustworthy
• normal articles
– remaining articles
A way to measure reliability(15)
• administrators
– $\mathrm{beta}(190, 10 \mid p)$, mean 0.95
• registered authors
– $\mathrm{beta}(23, 10 \mid p)$, mean ≈ 0.7
• anonymous authors
– $\mathrm{beta}(15, 10 \mid p)$, mean 0.6
• blocked users
– $\mathrm{beta}(10, 190 \mid p)$, mean 0.05
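The decimal values on this slide match the means of the listed Beta priors, since a $\mathrm{beta}(\alpha, \beta)$ distribution has mean $\alpha / (\alpha + \beta)$. The short Python sketch below checks this. The pairing of each prior with a user group is an assumption: the extracted slide text does not preserve the order, so the pairing used here is the one under which higher-privilege groups get higher prior trust.

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# (alpha, beta) pairs as they appear on the slide.
# ASSUMPTION: the group-to-prior pairing is inferred, not confirmed.
author_priors = {
    "administrators": (190, 10),
    "registered authors": (23, 10),
    "anonymous authors": (15, 10),
    "blocked users": (10, 190),
}

prior_means = {
    group: beta_mean(alpha, beta)
    for group, (alpha, beta) in author_priors.items()
}

for group, mean in prior_means.items():
    print(f"{group}: prior mean trust = {mean:.3f}")
```

Three of the four means are exact (0.95, 0.6, 0.05); the prior $\mathrm{beta}(23, 10)$ has mean 23/33, which rounds to the 0.7 shown on the slide.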
A way to measure reliability(16)
• a set of English articles from the Geography category in Wikipedia in January 2006
• 50 featured articles
• 50 clean-up articles
• 768 normal articles
• classified manually
A way to measure reliability(17)
• the U.S. National Forest article in Wikipedia
• created by an anonymous author
A way to measure reliability(18)
• the trust value of version $V_n$ is the mean of the posterior density distribution $f(t_{V_n})$
A way to measure reliability(19)
• a classifier was developed based on the aforementioned 50 featured articles and 50 clean-up articles
• the training set contains 100 pairs $(x, y)$, where $x$ is the trust value of an article and $y$ is its class
A way to measure reliability(20)
• the learned rule classifies an article as featured when its trust value $x$ exceeds the threshold 0.842
• a test set of 200 new articles (48805 revisions) was evaluated
• the prediction accuracy is 82%
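The learned rule amounts to a one-threshold classifier on the article trust value. The Python sketch below shows how such a rule and its accuracy could be evaluated. Only the threshold 0.842 comes from the slide; the strict comparison, the function names, and the sample data are assumptions made for illustration, and the sample does not reproduce the 82% result.

```python
FEATURED_THRESHOLD = 0.842  # threshold value reported on the slide


def is_featured(trust: float, threshold: float = FEATURED_THRESHOLD) -> bool:
    """Predict 'featured' when the trust value lies above the threshold.

    ASSUMPTION: the slide lost the comparison operator; a strict '>' is used.
    """
    return trust > threshold


def accuracy(labelled: list[tuple[float, bool]]) -> float:
    """Share of (trust value, is-featured label) pairs predicted correctly."""
    if not labelled:
        raise ValueError("need at least one labelled article")
    correct = sum(is_featured(x) == y for x, y in labelled)
    return correct / len(labelled)


# Hypothetical test pairs (x, y): trust value and true class.
sample_test_set = [
    (0.90, True),   # predicted featured, correct
    (0.50, False),  # predicted not featured, correct
    (0.85, False),  # predicted featured, wrong
    (0.30, False),  # predicted not featured, correct
]
sample_accuracy = accuracy(sample_test_set)
```

With a single feature, a threshold rule is the simplest possible decision boundary, which also makes the classifier sensitive to how well the trust value separates the two classes.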
A way to measure reliability(21)
• use the trust track to predict events
A way to measure reliability(22)
• the method has some problems
– the reliability of an author is not constant
– the test set of the classifier is too small
– it is unclear what standard is used when predicting events
Conclusion
• gave an overview of Wikipedia and of measuring reliability in Wikipedia
• introduced some ways of measuring reliability in Wikipedia
• identified the difficult problems of measuring reliability in Wikipedia
Reference
[1] http://en.wikipedia.org/
[2] D. McGuinness, H. Zeng, P. da Silva, L. Ding, D. Narayanan, and M. Bhaowal. Investigation into trust for collaborative information repositories: A Wikipedia case study. In Proceedings of the Workshop on Models of Trust for the Web, 2006.
[3] H. Zeng, M. Alhoussaini, L. Ding, R. Fikes, and D. McGuinness. Computing trust from revision history. In Intl. Conf. on Privacy, Security and Trust, 2006.