# Rank Aggregation Methods II Experiments

4 Dec 2013

CS728

Lecture 12

Recall the Rank Aggregation Problem

m candidates (a.k.a. "alternatives")

$M = \{1, \dots, m\}$: set of candidates

n voters (a.k.a. "agents" or "judges")

$N = \{1, \dots, n\}$: set of voters

Each voter i has a ranking $\tau_i$ on M

$\tau_i(a) < \tau_i(b)$ means the i-th voter prefers a to b

Ranking may be a total or partial order

The rank aggregation problem:

Combine $\tau_1, \dots, \tau_n$ into a single ranking $\tau$ on M, which represents the "social choice" of the voters.

Rank aggregation function: $f(\tau_1, \dots, \tau_n) = \tau$

$\tau$ may be a total or partial order

Experiments: Distance Measures

Goal: Quantitatively compare different rank aggregation
methods.

Performance Measures:

(1) Spearman footrule distance is the sum of pointwise distances. It is normalized by dividing this number by the maximum value $(1/2)|S|^2$, giving a value between 0 and 1.

(2) Kendall tau distance counts the number of pairwise disagreements. Dividing by the maximum possible value $(1/2)|S|(|S| - 1)$ we obtain a normalized version, with a value between 0 and 1.

(3) The induced footrule distance is obtained by projecting a full list s onto the elements of each partial list and taking the footrule distance between the projection and that partial list. In a similar manner, the induced Kendall tau distance can be defined.

(4) The scaled footrule distance weights contributions of elements based on the length of the lists they are present in. If s is a full list and t is a partial list, then:

$SF(s, t) = \sum_{i \in t} \left| \frac{s(i)}{|s|} - \frac{t(i)}{|t|} \right|$

Normalize SF by dividing by $|t|/2$.
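A minimal Python sketch of the four measures may make the definitions concrete. It is not taken from the slides: the function names are illustrative, a ranking is assumed to be a list ordered from best to worst with 1-based positions, full lists are assumed to rank the same elements, and every list is assumed to have at least two elements.

```python
from itertools import combinations


def positions(ranking):
    """Map each element to its 1-based position (1 = top of the list)."""
    return {x: i + 1 for i, x in enumerate(ranking)}


def footrule(s, t):
    """Normalized Spearman footrule between two lists over the same elements."""
    ps, pt = positions(s), positions(t)
    raw = sum(abs(ps[x] - pt[x]) for x in ps)
    return raw / (len(s) ** 2 / 2)


def kendall_tau(s, t):
    """Normalized Kendall tau: fraction of pairs the two lists order differently."""
    ps, pt = positions(s), positions(t)
    raw = sum(1 for a, b in combinations(s, 2)
              if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0)
    n = len(s)
    return raw / (n * (n - 1) / 2)


def induced_footrule(s, t):
    """Footrule between partial list t and the projection of full list s onto t."""
    members = set(t)
    projection = [x for x in s if x in members]
    return footrule(projection, t)


def scaled_footrule(s, t):
    """Scaled footrule SF(s, t) for full list s and partial list t, divided by |t|/2."""
    ps, pt = positions(s), positions(t)
    raw = sum(abs(ps[x] / len(s) - pt[x] / len(t)) for x in t)
    return raw / (len(t) / 2)


full = ["a", "b", "c", "d"]
print(footrule(full, full[::-1]))               # full list against its reversal
print(kendall_tau(full, ["b", "a", "c", "d"]))  # one adjacent pair swapped
print(induced_footrule(full, ["c", "a"]))       # partial list of two elements
print(scaled_footrule(full, ["a", "b"]))        # partial list of two elements
```

The induced Kendall tau distance could be obtained the same way, by calling `kendall_tau` on the projection instead of `footrule`.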

Experiments: Distance Measures

So for each aggregation method and each distance measure we get a vector of values, each component representing the distance from the aggregation to one voter list

Simplest is to take the average (or 1-norm)

Other norms are interesting

Mean square distance (2-norm)

Max distance ($\infty$-norm)
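As a small illustration, the three summaries of a distance vector can be computed as follows. The function name and the sample values are made up for this sketch, and the 2-norm is reported here as a root mean square so that it stays on the same scale as the individual distances; that scaling is an assumption, not something the slides specify.

```python
import math


def summarize(distances):
    """Summarize per-voter distances by average, root mean square, and maximum."""
    n = len(distances)
    return {
        "average": sum(distances) / n,                                     # 1-norm, scaled by n
        "root_mean_square": math.sqrt(sum(d * d for d in distances) / n),  # 2-norm, scaled
        "max": max(distances),                                             # infinity-norm
    }


# Hypothetical distances from one aggregate list to three voter lists.
print(summarize([0.0, 0.5, 1.0]))
```

The maximum highlights the single most dissatisfied voter, while the average can hide one large disagreement behind several small ones.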

Experiments: Minimizing Average

Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)

K = Kendall distance, SF = scaled footrule distance

IF = induced footrule distance, LK = Local Kemenization

Experiments in Spam Filtering

Define spam to be web pages that are ranked low by majority opinion (machine and human; a simplifying assumption), although they may be ranked highly by some search engines

Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam: garbage in, garbage out.

Spam pages are the Condorcet losers, and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion

Similarly, good pages will be among the Condorcet winners, and will rank above the losers.

Condorcet Criterion

A candidate of M that beats every other candidate in pairwise simple majority voting should be ranked first.

Extended Condorcet Criterion (XCC):

Version 1: If most voters prefer candidate a to candidate b (i.e., the number of voters i s.t. $\tau_i(a) < \tau_i(b)$ is at least n/2), then $\tau$ should also prefer a to b (i.e., $\tau(a) < \tau(b)$).

Version 2: If there is a partition (W, L) of M such that for any x in W and y in L the majority prefers x to y, then x must be ranked above y. W is called the Condorcet winners and L the Condorcet losers.
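Version 2 can be checked mechanically. The sketch below is illustrative only: the function names and the toy lists are invented, rankings are assumed to be lists ordered from best to worst, and for partial lists a "majority" is assumed to be counted only among the lists that contain both candidates.

```python
def majority_prefers(voter_lists, x, y):
    """True if more lists rank x above y than y above x.

    Lists that do not contain both candidates are ignored (an assumption).
    """
    x_wins = y_wins = 0
    for lst in voter_lists:
        if x in lst and y in lst:
            if lst.index(x) < lst.index(y):
                x_wins += 1
            else:
                y_wins += 1
    return x_wins > y_wins


def is_condorcet_partition(voter_lists, winners, losers):
    """True if a majority prefers every winner to every loser."""
    return all(majority_prefers(voter_lists, x, y)
               for x in winners for y in losers)


def xcc2_violations(aggregate, winners, losers):
    """Pairs (winner, loser) that the aggregate list ranks the wrong way round."""
    pos = {c: i for i, c in enumerate(aggregate)}
    return [(x, y) for x in winners for y in losers if pos[y] < pos[x]]


# Toy data: page "s" is ranked first by one engine but last by the other two.
voter_lists = [["a", "b", "s"], ["b", "a", "s"], ["s", "a", "b"]]
winners, losers = ["a", "b"], ["s"]

print(is_condorcet_partition(voter_lists, winners, losers))
print(xcc2_violations(["s", "a", "b"], winners, losers))
print(xcc2_violations(["a", "b", "s"], winners, losers))
```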

Condorcet Criteria

XCC(2) and SPAM Filtering

Note that XCC(1) => XCC(2), so Version 1 is stronger

But XCC(1) is not always realizable

As we will see, XCC(2) is always realizable via Local Kemenization

Hence using rank aggregation with XCC(2) should assist in SPAM filtering, since Condorcet losers will be ranked lowest

Let us look at where spam pages (human determined) are ranked with good aggregation methods.

Experiments: Filtering SPAM

Table 3: Ranks of "spam" pages for the queries Feng Shui, organic vegetables and gardening.

Columns: url, AV, AW, GG, HB, LY, NL, SFO, MC4. Each row lists the ranks that appear in the source, left to right; empty cells are omitted.

www.lucky-bamboo.com: 4, 43, 41, 144, 63

www.cambriumcrystals.com: 9, 51, 5, 31, 59

www.luckycat.com: 11, 14, 26, 13, 49, 36

www.davesorganics.com: 84, 19, 1, 17, 77, 93

www.frozen.ch: 9, 63, 11, 49, 121

www.eonseed.com: 18, 6, 16, 23, 66

www.augusthome.com: 26, 16, 27, 12, 16, 57, 54

www.taunton.com: 25, 21, 78, 67

www.egroups.com: 34, 29, 108, 101

Experiment: Word association

Different search engines and portals have different (default) semantics of handling a multi-word query.

Some use OR semantics (documents contain one of the given query terms) while Google uses the AND semantics (all the query words must appear). Both are inconvenient in many situations.

Consider searching for the job of a software engineer from an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C++ Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document or SPAM, and the "OR" rule is equally disastrous.

Experiment with rank aggregation using multiple queries based on small subsets of terms.
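One way to picture the "small subsets of terms" idea is to enumerate the sub-queries explicitly. This is a sketch only: the slides do not say how subsets were chosen or how large they were, so the helper name and the subset size of 2 are assumptions.

```python
from itertools import combinations


def subset_queries(terms, size):
    """Build one query string for every subset of the given size."""
    return [" ".join(subset) for subset in combinations(terms, size)]


# A few of the terms from the job-search example; the size is illustrative.
terms = ["C++", "Java", "CORBA"]
for query in subset_queries(terms, 2):
    print(query)
```

Each sub-query would be sent to a search engine, and the resulting ranked lists would then be combined by a rank aggregation method.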

(cities in the state of Tamil Nadu, India)

www.mssrf.org/Fris9809/location-

www.indiaplus.com/Info/schools.html

...Forests.html

www.tn.gov.in/policy/environ.htm

SFO with LK

www.ozemail.com.au/clday/locations.htm

www.utoledo.edu/homepages/speelam/coimbatore.html

MC4 with LK

www.surfindia.com/omsakthi/tourism.htm

www.indiatravels.com/forts/vellore_fort.htm

www.india-

Locally Kemeny optimal aggregation and XCC(2)

Many existing aggregation methods do not satisfy XCC(1) or XCC(2).

It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to realize XCC(2), which filters Condorcet losers.

Locally Kemeny optimal

Recall that computing a Kemeny optimal aggregation is NP-hard

Definition of locally optimal

A permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2, ..., tk, if there is no permutation p' that can be obtained from p by performing a single transposition of an adjacent pair of elements and for which the Kendall distance satisfies

K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk).

In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.

Example of LKO but not KO

Example 1

t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1).

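p = (1,2,3).

p satisfies the definition of LKO: K(p, t1, t2, ..., t5) = 3, and each of the two adjacent transpositions raises the sum to 4. But transposing 1 and 3, which are not adjacent, decreases the sum to 2, so p is not Kemeny optimal.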
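Example 1 can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, and K(p, t) for a partial list t is taken to be the number of pairs in t that p orders the other way round.

```python
def kendall_to_partial(p, t):
    """Count pairs of partial list t that full list p orders differently."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(1
               for i in range(len(t))
               for j in range(i + 1, len(t))
               if pos[t[i]] > pos[t[j]])


def total_distance(p, lists):
    """K(p, t1, ..., tk): sum of the distances from p to every partial list."""
    return sum(kendall_to_partial(p, t) for t in lists)


def is_locally_kemeny_optimal(p, lists):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_distance(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if total_distance(q, lists) < base:
            return False
    return True


lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]  # t1, ..., t5 from Example 1
p = (1, 2, 3)

print(total_distance(p, lists))             # distance of p
print(is_locally_kemeny_optimal(p, lists))  # adjacent swaps do not help
print(total_distance((3, 2, 1), lists))     # after transposing 1 and 3
```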

LKO satisfies XCC(2)

If the result is false then there exist partial lists t1, t2, ..., tk, an LKO aggregation p, and a partition (W, L) that violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c). Let (d, c) be the closest such pair in p.

Consider the immediate successor of d in p, call it e. If e = c then c is adjacent to d in p, and since a majority prefers c to d, transposing this adjacent pair of alternatives produces a p' such that K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk), contradicting the assumption on p.

If e does not equal c, then either e is in W, in which case the pair (d, e) is a closer pair in p than (d, c) and also violates XCC(2), or e is in L, in which case (e, c) is a closer pair than (d, c) that violates XCC(2). Both cases contradict the choice of (d, c).

A local Kemenization of a full list $\mu$ with respect to the preference lists is a procedure that computes a locally Kemeny optimal aggregation that is maximally consistent with $\mu$.

This approach:

(1) preserves the strengths of the initial aggregation

(2) ranks non-spam above spam.

(3) gives a result that disagrees with $\mu$ on any pair (i, j) only if a majority endorses this disagreement.

(4) for every d, $1 \le d \le |\mu|$, the restriction of the output to the top d elements of $\mu$ is a local Kemenization of those elements

Local Kemenization procedure

A simple inductive construction.

Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the elements 1, ..., l-1.

Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that

(a) no majority among the (original) t's prefers x to y and

(b) for all successors z of y in p there is a majority that prefers x to z.

In other words, we try to insert x at the end (bottom) of the list p; we bubble it up toward the top of the list as long as a majority of the t's insists that we do.

Example local kemenization procedure

Preference lists (one per line, best first):

A B F E C D

B C A E F D

A C F D E B

B F D C A E

C A B F E D

Initial list:

B A D C E F

List after each step:

B

B A

A B

A B D

A B D C

A B C D

A B C F E D

Local Kemenization Example!

disagree

A>B: 3

A<B: 2

B>D: 4

B<D: 1
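The bubbling procedure and the example above can be sketched in Python as follows. The function names are illustrative. Reading the example as five preference lists (A B F E C D, B C A E F D, A C F D E B, B F D C A E, C A B F E D) plus the initial list B A D C E F is an assumption about the flattened slide layout, although it agrees with the stated counts A>B: 3 and B>D: 4.

```python
def majority_prefers(lists, x, y):
    """True if more lists rank x above y than y above x."""
    x_wins = sum(1 for t in lists
                 if x in t and y in t and t.index(x) < t.index(y))
    y_wins = sum(1 for t in lists
                 if x in t and y in t and t.index(y) < t.index(x))
    return x_wins > y_wins


def local_kemenization(initial, lists):
    """Insert elements of the initial list one by one, bubbling each one up
    from the bottom while a majority prefers it to the element just above."""
    result = []
    for x in initial:
        pos = len(result)  # start at the bottom of the current list
        while pos > 0 and majority_prefers(lists, x, result[pos - 1]):
            pos -= 1
        result.insert(pos, x)
    return result


# Assumed reading of the example: five voter lists and one initial list.
voters = [list(row) for row in ("ABFECD", "BCAEFD", "ACFDEB", "BFDCAE", "CABFED")]
initial = list("BADCEF")

print(" ".join(local_kemenization(initial, voters)))
```

Under this reading the sketch ends with A B C F E D, the same final list as the last step of the example.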

RA and Searching Workplace Web

Axiom 1: Intranet documents are not spam

Axiom 2: Queries usually have unique answers