Measuring Reliability in Wikipedia

Wen-Yuan Zhu

2007.11.13

Outline


Introduction


Some terms of Wikipedia


Basic concept of measuring reliability


A way to measure reliability


Conclusion


References

Introduction


Wikipedia is the most popular online
collaborative encyclopedia


it exhibits rich phenomena that differ from those of
the Internet as a network and of ordinary websites

Some terms of Wikipedia


Some terms of Wikipedia(2)


featured article


considered to be the best articles in
Wikipedia


as determined by
Wikipedians


at present, there are 1683 featured articles

Some terms of Wikipedia(3)


if an article is a featured article, it shows a star
icon in the top-right corner of the page


Some terms of Wikipedia(4)


articles are reviewed at
Wikipedia:Featured article candidates


according to
Wikipedia:Featured article criteria


Some terms of Wikipedia(5)


a nominated article must meet all of the featured
article criteria


consensus must be reached that it meets the
criteria

Some terms of Wikipedia(6)


articles that no longer meet the criteria can be
proposed for improvement or removal at
Wikipedia:Featured article review


Some terms of Wikipedia(7)


clean-up article


cleanup issues that this project covers may
include wikification, spelling, grammar,
tone, and sourcing


anyone can request cleanup of a page at
Wikipedia:Cleanup


Some terms of Wikipedia(8)


Basic concept of measuring reliability


the higher an article's link ratio, the
higher its reliability


this part is based on [2]
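
The slides do not define the link ratio. A minimal sketch, assuming it is the fraction of a term's occurrences across a set of pages that appear as wiki links rather than as plain text; the function name `link_ratio` and the sample pages are hypothetical and only illustrate the idea:

```python
import re


def link_ratio(title: str, pages: list[str]) -> float:
    """Fraction of occurrences of `title` in `pages` that are wiki links.

    Assumed definition (not taken from the slides):
    linked occurrences / (linked + plain-text occurrences).
    """
    # Matches [[Title]] and piped links such as [[Title|label]].
    link_pattern = re.compile(r"\[\[" + re.escape(title) + r"(?:\|[^\]]*)?\]\]")
    linked = 0
    total = 0
    for text in pages:
        n_links = len(link_pattern.findall(text))
        # Remove the links first so they are not counted again as plain text.
        n_plain = link_pattern.sub("", text).count(title)
        linked += n_links
        total += n_links + n_plain
    return linked / total if total else 0.0


# Hypothetical wikitext snippets.
pages = [
    "[[Paris]] is the capital of France. Paris is large.",
    "See [[Paris|the city]] and Paris.",
]
ratio = link_ratio("Paris", pages)  # 2 linked out of 4 occurrences
```

The match is case-sensitive and ignores redirects and inflected forms, so it is only a rough approximation of how a real study would count occurrences.
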

Basic concept of measuring reliability(2)


class of terms

Basic concept of measuring reliability(3)


relation between full names and short names

Basic concept of measuring reliability(4)


relation between PageRank and link ratio

Basic concept of measuring reliability(5)


link data alone is not enough to measure
reliability


too many factors influence the
reliability of an article in Wikipedia

A way to measure reliability


use Bayesian statistics to model reliability in
Wikipedia


use the revision history to assess the reliability
of an article in Wikipedia


this part is based on [3]

A way to measure reliability(2)


A way to measure reliability(3)


article trust


trustworthiness of a version of an article


fragment trust


trustworthiness of a fragment in a version
of an article


author trust


trustworthiness of an author

A way to measure reliability(4)


$V_i$ is the $i$-th version of an article, $i = 0, 1, \dots, n$


$t_{V_i}$ is the trust value of $V_i$


$A_i$ is the author whose revision produced $V_i$


$t_{A_i}$ is the trust value of $A_i$


$I_i$ is the inserted content in $V_{i+1}$ by $A_{i+1}$


$D_i$ is the deleted content in $V_i$ by $A_{i+1}$


$|x|$ is the size of $x$

A way to measure reliability(5)


$0 \le |I_i| \le |V_{i+1}|$


$0 \le |D_i| \le |V_i|$


A way to measure reliability(6)


Dynamic Bayesian networks


defined by a pair $(B_S, B_o)$


$B_S$ is the graph structure of the network


$B_o$ is the set of the network's conditional
density distributions

A way to measure reliability(7)


$B_S$: from $V_i$ to $V_{i+1}$, $i = 0, 1, \dots, n-1$


the state at the $i$-th revision is represented
as a quad $X_i = (t_{V_i}, t_{A_i}, i_i, d_i)$, where $i_i$ and $d_i$ are the
amounts of inserted and deleted content


the states satisfy the Markov property:
$f(X_i \mid X_{i-1}, \dots, X_1, X_0) = f(X_i \mid X_{i-1})$


$t_{V_i}, t_{A_i} \in [0, 1]$, $i_i, d_i \ge 0$
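
As a short worked consequence, writing $X_i$ for the state at the $i$-th revision and assuming only the Markov property stated on this slide (the chain-rule step is standard probability, not taken from the slides), the joint density of the whole revision history factorizes into one-step terms:

```latex
f(X_0, X_1, \dots, X_n)
  = f(X_0) \prod_{i=1}^{n} f(X_i \mid X_0, \dots, X_{i-1})
  = f(X_0) \prod_{i=1}^{n} f(X_i \mid X_{i-1})
```

This is why the network only needs an initial density and a single family of one-step conditional densities.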





A way to measure reliability(8)


A way to measure reliability(9)


determine the posterior density
distribution $f(t_{V_{i+1}})$


$B_o$ is fully characterized by $f(t_{V_0} \mid t_{A_0})$ and
$f(t_{V_{i+1}} \mid t_{V_i}, t_{A_{i+1}}, i_i, d_i)$
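
The slides do not show how the posterior is propagated from one version to the next. One standard way, given here as a sketch and assuming the author trust and the inserted and deleted amounts are observed at each step, is to integrate the one-step conditional density against the previous version's density:

```latex
f(t_{V_{i+1}})
  = \int_0^1 f(t_{V_{i+1}} \mid t_{V_i} = t,\; t_{A_{i+1}},\; i_i,\; d_i)\,
             f(t_{V_i} = t)\, dt
```

Starting from $f(t_{V_0} \mid t_{A_0})$ and repeating this step for each revision would yield the density for the latest version.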


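A way to measure reliability(10)


the Beta distribution

$\mathrm{beta}(p \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}$


where $B(\alpha, \beta)$ is the beta function with parameters $\alpha$ and $\beta$

$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)} = \frac{(\alpha - 1)!\,(\beta - 1)!}{(\alpha + \beta - 1)!}$
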
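
A minimal sketch of evaluating the Beta density, using only the Python standard library. The function names are hypothetical; the log-gamma form is a numerical choice made here so that large parameters do not overflow, not something the slides prescribe:

```python
import math


def log_beta_function(alpha: float, beta: float) -> float:
    """log B(alpha, beta) = log Gamma(alpha) + log Gamma(beta) - log Gamma(alpha + beta)."""
    return math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)


def beta_pdf(p: float, alpha: float, beta: float) -> float:
    """Density p^(alpha-1) * (1-p)^(beta-1) / B(alpha, beta) for 0 < p < 1."""
    if p <= 0.0 or p >= 1.0:
        # Simplification: the boundary points are treated as density 0.
        return 0.0
    log_density = (
        (alpha - 1) * math.log(p)
        + (beta - 1) * math.log(1 - p)
        - log_beta_function(alpha, beta)
    )
    return math.exp(log_density)


uniform_density = beta_pdf(0.5, 1, 1)    # beta(p | 1, 1) is the uniform density
symmetric_density = beta_pdf(0.5, 2, 2)  # 0.5 * 0.5 / B(2, 2), with B(2, 2) = 1/6
```
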
A way to measure reliability(11)


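A way to measure reliability(12)


assume $f(t_{V_0} \mid t_{A_0} = a_0) = \mathrm{beta}(p \mid \alpha_0, \beta_0)$


let $\alpha_0 + \beta_0 = 10$


$a_0$ is the mean of $\mathrm{beta}(p \mid \alpha_0, \beta_0)$, that is, $a_0 = \alpha_0 / (\alpha_0 + \beta_0)$


then $\alpha_0 = 10 a_0$ and $\beta_0 = 10 (1 - a_0)$, or
$f(t_{V_0} \mid t_{A_0} = a_0) = \mathrm{beta}(p \mid 10 a_0, 10 (1 - a_0))$
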
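A way to measure reliability(13)


the conditional density of the next version's trust is again a Beta density:
$f(t_{V_{i+1}} \mid t_{V_i}, t_{A_{i+1}}, i_i = |I_i|, d_i = |D_i|) = \mathrm{beta}(p \mid \alpha, \beta)$


[formulas for the parameters $\alpha$ and $\beta$: not recoverable from the extracted slide; the surviving terms are a $\min(\cdot)$ and a $\max(\cdot, 0)$ over $t_{V_i}|V_i|$, $(1 - t_{A_{i+1}})|D_i|$ and $t_{A_{i+1}}|I_i|$]


$|V_{i+1}| = |V_i| + |I_i| - |D_i|$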




A way to measure reliability(14)


featured articles


considered highly trustworthy


clean-up articles


considered untrustworthy


normal articles


the remaining articles

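A way to measure reliability(15)


administrators

$\mathrm{beta}(p \mid \alpha = 190, \beta = 10)$, mean $0.95$


registered authors

$\mathrm{beta}(p \mid \alpha = 23, \beta = 10)$, mean $\approx 0.7$


anonymous authors

$\mathrm{beta}(p \mid \alpha = 15, \beta = 10)$, mean $0.6$


blocked users

$\mathrm{beta}(p \mid \alpha = 10, \beta = 190)$, mean $0.05$
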
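
A small check of how these parameters relate to the stated values, assuming each number is the mean $\alpha / (\alpha + \beta)$ of the Beta density and assuming the pairing of author groups with parameters shown in the dictionary below (the extracted slide lists the four parameter sets without reliable alignment). The names `AUTHOR_PRIORS`, `beta_mean` and `beta_variance` are hypothetical:

```python
# Assumed pairing of author group -> (alpha, beta).
AUTHOR_PRIORS = {
    "administrator": (190, 10),
    "registered": (23, 10),
    "anonymous": (15, 10),
    "blocked": (10, 190),
}


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) density."""
    return alpha / (alpha + beta)


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) density."""
    total = alpha + beta
    return alpha * beta / (total * total * (total + 1))


prior_means = {group: beta_mean(a, b) for group, (a, b) in AUTHOR_PRIORS.items()}
```

Larger parameter sums give a narrower density: under this pairing the administrator and blocked-user densities are far more concentrated around their means than the other two.
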
A way to measure reliability(16)


a set of English articles from the Geography
category in Wikipedia, as of January 2006


50 featured articles


50 clean-up articles


768 normal articles


classified manually

A way to measure reliability(17)


U.S. National Forest in Wikipedia


created by an anonymous author


A way to measure reliability(18)


the trust value reported for $V_n$ is the mean of the posterior density
distribution $f(t_{V_n})$

A way to measure reliability(19)


a classifier was developed from the
aforementioned 50 featured articles and 50
clean-up articles


the training set contains 100 pairs $(x, y)$, where
$x$ is the trust value of an article and $y$ is its
class

A way to measure reliability(20)


the learned rule for a featured article is a trust value $x$ above $0.842$


a test set of 200 new articles (48,805
revisions) was evaluated


the prediction accuracy is 82%

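
The slides do not say how the rule was learned. One simple possibility, sketched here as an assumption, is a one-dimensional threshold (a decision stump) chosen to maximize accuracy on the training pairs. The function names and the four training pairs below are made up for illustration and are not data from the study:

```python
def learn_threshold(samples: list[tuple[float, int]]) -> float:
    """Pick a threshold t for the rule 'x > t means featured'.

    samples: (trust value x, class y) pairs, y = 1 for featured, 0 for clean-up.
    Candidates are midpoints between consecutive distinct trust values; the
    first candidate with the highest training accuracy is returned.
    """
    values = sorted({x for x, _ in samples})
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct trust values")

    def accuracy(threshold: float) -> float:
        correct = sum((x > threshold) == bool(y) for x, y in samples)
        return correct / len(samples)

    return max(candidates, key=accuracy)


def predict_featured(x: float, threshold: float) -> bool:
    """Apply the learned rule to one article's trust value."""
    return x > threshold


# Hypothetical training pairs.
training = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
threshold = learn_threshold(training)
```

On real data the two classes overlap, so the best threshold misclassifies some articles; the 82% accuracy quoted on the slide reflects that.
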
A way to measure reliability(21)


use the trust track to predict events

A way to measure reliability(22)


the method has some problems


the reliability of an author is not constant


the classifier's test set is too small


what are the criteria for predicting
events?

Conclusion


an overview of Wikipedia and of measuring
reliability in Wikipedia


introduced some ways to measure
reliability in Wikipedia


recognized the difficult problems of measuring
reliability in Wikipedia

References

[1] http://en.wikipedia.org/

[2] D. McGuinness, H. Zeng, P. da Silva, L. Ding, D. Narayanan, and M. Bhaowal.
Investigation into trust for collaborative information repositories: A
Wikipedia case study. In Proceedings of the
Workshop on Models of Trust for the Web, 2006.

[3] H. Zeng, M. Alhoussaini, L. Ding, R. Fikes, and D. McGuinness.
Computing trust from revision history.
In Intl. Conf. on Privacy, Security and Trust, 2006.