Rank Aggregation Methods II
Experiments
CS728
Lecture 12
Recall the Rank Aggregation Problem
• m candidates (a.k.a. "alternatives")
– M = {1,…,m}: set of candidates
• n voters (a.k.a. "agents" or "judges")
– N = {1,…,n}: set of voters
• Each voter i has a ranking $\sigma_i$ on M
– $\sigma_i(a) < \sigma_i(b)$ means the i-th voter prefers a to b
– A ranking may be a total or partial order
• The rank aggregation problem: combine $\sigma_1,\dots,\sigma_n$ into a single ranking $\sigma$ on M, which represents the "social choice" of the voters.
– Rank aggregation function: $f(\sigma_1,\dots,\sigma_n) = \sigma$
– $\sigma$ may be a total or partial order
Experiments: Distance Measures
Goal: Quantitatively compare different rank aggregation methods.
Performance Measures:
(1) Spearman footrule distance is the sum of pointwise distances, $F(s, t) = \sum_i |s(i) - t(i)|$. Dividing by the maximum value $\tfrac{1}{2}|S|^2$, where S is the set of ranked elements, gives a normalized value between 0 and 1.
(2) Kendall tau distance counts the number of pairwise disagreements between two lists. Dividing by the maximum possible value $\tfrac{1}{2}|S|(|S|-1)$ gives a normalized version with a value between 0 and 1.
(3) The induced footrule distance is obtained by projecting a full list s onto the elements of each partial list and taking the footrule distance between each projection and its partial list. The induced Kendall tau distance is defined in the same manner.
(4) The scaled footrule distance weights the contributions of elements by the lengths of the lists they appear in. If s is a full list and t is a partial list, then
$SF(s, t) = \sum_{i \in t} \left| \frac{s(i)}{|s|} - \frac{t(i)}{|t|} \right|$.
Normalize SF by dividing by $|t|/2$.
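The four measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming lists are Python sequences ordered from best to worst and that ranks are 1-based; the function names are hypothetical and not part of the lecture.

```python
from itertools import combinations

def positions(lst):
    """Map each element to its 1-based rank in the list."""
    return {x: r for r, x in enumerate(lst, start=1)}

def footrule(s, t):
    """Spearman footrule between two lists over the same elements."""
    ps, pt = positions(s), positions(t)
    return sum(abs(ps[x] - pt[x]) for x in ps)

def kendall_tau(s, t):
    """Number of element pairs that s and t order differently."""
    ps, pt = positions(s), positions(t)
    return sum(1 for a, b in combinations(s, 2)
               if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0)

def project(s, t):
    """Restrict full list s to the elements of partial list t, keeping s's order."""
    keep = set(t)
    return [x for x in s if x in keep]

def induced_footrule(s, t):
    return footrule(project(s, t), t)

def induced_kendall_tau(s, t):
    return kendall_tau(project(s, t), t)

def scaled_footrule(s, t):
    """Compare relative positions s(i)/|s| and t(i)/|t| over the elements of t."""
    ps, pt = positions(s), positions(t)
    return sum(abs(ps[x] / len(s) - pt[x] / len(t)) for x in t)

s = ["a", "b", "c", "d"]           # full list
t = ["c", "a"]                     # partial list
print(footrule(s, s[::-1]))        # 8, the maximum |S|^2 / 2 for |S| = 4
print(kendall_tau(s, s[::-1]))     # 6, the maximum |S|(|S|-1) / 2
print(induced_footrule(s, t), induced_kendall_tau(s, t))  # 2 1
print(scaled_footrule(s, t))       # 1.0, which equals |t| / 2
```

Dividing each value by the maximum quoted in the corresponding item gives the normalized form.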
Experiments: Distance Measures
• For each aggregation method and each distance measure we get a vector of values, each component being the distance from the aggregation to one voter list
• Simplest is to take the average (the 1-norm)
• Other norms are interesting
– Mean square distance (2-norm)
– Max distance (∞-norm)
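As a small sketch of the three summaries above, the function below collapses a vector of per-voter distances into a single score in each of the three ways. The name `summarize` and the choice to scale the 1-norm and 2-norm by the number of voters are assumptions made for illustration.

```python
import math

def summarize(distances):
    """Collapse per-voter distances into three scalar scores."""
    n = len(distances)
    return {
        # 1-norm scaled by n: the plain average
        "average": sum(distances) / n,
        # 2-norm scaled by sqrt(n): root mean square distance
        "root_mean_square": math.sqrt(sum(d * d for d in distances) / n),
        # infinity-norm: the worst-served voter
        "max": max(distances),
    }

print(summarize([3.0, 4.0]))
```

The average rewards being close to most voters, while the max penalizes an aggregation that is far from even one voter.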
Experiments: Minimizing Average
Search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)
K = Kendall distance; SF = scaled footrule distance; IF = induced footrule distance; LK = Local Kemenization
Experiments in Spam Filtering
• Define spam to be web pages that are ranked low by majority opinion (machine and human; a simplifying assumption), although they may be ranked highly by some search engines
• Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam: garbage in, garbage out.
• Spam pages are the Condorcet losers, and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion
• Similarly, good pages will be among the Condorcet winners, and will rank above the losers.
Condorcet Criteria
• Condorcet Criterion
– A candidate of M that beats every other candidate in pairwise simple majority voting should be ranked first.
• Extended Condorcet Criterion (XCC):
– Version 1: If most voters prefer candidate a to candidate b (i.e., the number of voters i whose ranking $\sigma_i$ satisfies $\sigma_i(a) < \sigma_i(b)$ is at least n/2), then the aggregate ranking $\sigma$ should also prefer a to b (i.e., $\sigma(a) < \sigma(b)$).
– Version 2: If there is a partition (W, L) of M such that for any x in W and y in L the majority prefers x to y, then x must be ranked above y. W is called the Condorcet winners and L the Condorcet losers.
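Version 2 of the criterion above can be checked mechanically. The sketch below is an illustration under stated assumptions: lists are ordered best first, a list that omits a or b casts no vote on that pair, and "majority" means strictly more votes for than against (the slides say "at least n/2"). All function names are hypothetical.

```python
def prefers(lst, a, b):
    """True if lst contains both a and b and ranks a above b."""
    return a in lst and b in lst and lst.index(a) < lst.index(b)

def majority_prefers(lists, a, b):
    """True if more lists rank a above b than rank b above a."""
    votes_a = sum(prefers(t, a, b) for t in lists)
    votes_b = sum(prefers(t, b, a) for t in lists)
    return votes_a > votes_b

def is_condorcet_partition(lists, winners, losers):
    """True if every winner beats every loser by majority."""
    return all(majority_prefers(lists, x, y) for x in winners for y in losers)

def xcc2_violations(ranking, lists, winners, losers):
    """Return (loser, winner) pairs that the ranking orders the wrong way."""
    if not is_condorcet_partition(lists, winners, losers):
        return []  # XCC(2) imposes no constraint for this partition
    pos = {x: r for r, x in enumerate(ranking)}
    return [(y, x) for x in winners for y in losers if pos[y] < pos[x]]

voters = [["a", "b", "s"], ["b", "a", "s"], ["s", "a", "b"]]
print(xcc2_violations(["s", "a", "b"], voters, ["a", "b"], ["s"]))
# [('s', 'a'), ('s', 'b')]
print(xcc2_violations(["a", "b", "s"], voters, ["a", "b"], ["s"]))
# []
```

Here the page "s" is placed first by one voter but loses to both "a" and "b" by majority, so any ranking that puts it above them violates XCC(2).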
XCC(2) and SPAM Filtering
• Note that XCC(1) => XCC(2), so Version 1 is stronger
• But XCC(1) is not always realizable
• As we will see, XCC(2) is always realizable via Local Kemenization
• Hence using rank aggregation with XCC(2) should assist in SPAM filtering, since Condorcet losers will be ranked lowest
• Let us look at where spam pages (human determined) are ranked by good aggregation methods.
Experiments: Filtering SPAM
Table 3: Ranks of "spam" pages for the queries: Feng Shui, organic vegetables and gardening.
Columns: url, AV, AW, GG, HB, LY, NL, SFO, MC4 (empty cells are not shown; each row lists its ranks in column order)
www.lucky-bamboo.com: 4, 43, 41, 144, 63
www.cambriumcrystals.com: 9, 51, 5, 31, 59
www.luckycat.com: 11, 14, 26, 13, 49, 36
www.davesorganics.com: 84, 19, 1, 17, 77, 93
www.frozen.ch: 9, 63, 11, 49, 121
www.eonseed.com: 18, 6, 16, 23, 66
www.augusthome.com: 26, 16, 27, 12, 16, 57, 54
www.taunton.com: 25, 21, 78, 67
www.egroups.com: 34, 29, 108, 101
Experiment: Word association
• Different search engines and portals have different (default) semantics for handling a multi-word query.
• Some use OR semantics (documents contain one of the given query terms) while Google uses AND semantics (all the query words must appear). Both are inconvenient in many situations.
• Consider searching for a software engineering job in an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C++ Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document or SPAM, and the "OR" rule is equally disastrous.
• Experiment with rank aggregation using multiple queries based on small subsets of terms.
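One way to form the sub-queries is to enumerate all term subsets of a fixed small size. The sketch below assumes subsets of size 2, which the slides do not specify, and `sub_queries` is a hypothetical helper name. Each sub-query would be sent to the search engine, and the returned lists would then be aggregated.

```python
from itertools import combinations

def sub_queries(terms, size=2):
    """Build one query string per subset of `size` terms (size is an assumption)."""
    return [" ".join(subset) for subset in combinations(terms, size)]

terms = "madras madurai coimbatore vellore".split()
for q in sub_queries(terms):
    print(q)
```

With four terms and size 2 this yields six sub-queries, each of which can safely use AND semantics.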
• Results for query: madras madurai coimbatore vellore (cities in the state of Tamil Nadu, India)
• Google
www.mssrf.org/Fris9809/location-tamilnadu.html
www.indiaplus.com/Info/schools.html
www.focustamilnadu.com/tamilnadu/Policy%20Note...Forests.html
www.tn.gov.in/policy/environ.htm
www.indiacolleges.com/Tamil_Nadu.htm
• SFO with LK
www.madurai.com
www.ozemail.com.au/clday/locations.htm
www.utoledo.edu/homepages/speelam/coimbatore.html
www.ozemail.com.au/clday/madras.htm
www.madurai.com/around.htm
www.indiatraveltimes.com/tamilnadu/tamil1.html
• MC4 with LK
www.madurai.com
www.surfindia.com/omsakthi/tourism.htm
www.indiatraveltimes.com/tamilnadu/tamil1.html
www.indiatraveltimes.com/tamilnadu/tamil2.html
www.indiatravels.com/forts/vellore_fort.htm
www.india-tourism.de/english/south/tamil_nadu.html
Locally Kemeny optimal aggregation and XCC(2)
• Many existing aggregation methods do not satisfy XCC(1) or XCC(2).
• It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to realize XCC(2), which filters out Condorcet losers.
Locally Kemeny optimal
• Recall that computing a Kemeny optimal aggregation is NP-hard
• Definition of locally optimal
A permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2, ..., tk if there is no permutation p' that can be obtained from p by a single transposition of an adjacent pair of elements and for which the Kendall distance satisfies K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk).
In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.
Example of LKO but not KO
• Example 1
• t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1).
• p = (1,2,3). p satisfies the definition of LKO with K(p, t1, t2, ..., t5) = 3, but transposing 1 and 3 (a non-adjacent pair) decreases the sum to 2, so p is not Kemeny optimal.
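The definition and Example 1 can be checked directly. The sketch below assumes lists are ordered best first and that the distance to a partial list counts only pairs that the partial list contains; the function names are hypothetical.

```python
def kendall_to_lists(p, lists):
    """Total number of pairs that some list t and the permutation p order differently."""
    pos = {x: r for r, x in enumerate(p)}
    total = 0
    for t in lists:
        for i in range(len(t)):
            for j in range(i + 1, len(t)):
                # t ranks t[i] above t[j]; count a disagreement if p does not.
                if pos[t[i]] > pos[t[j]]:
                    total += 1
    return total

def adjacent_swaps(p):
    """Yield every permutation reachable from p by one adjacent transposition."""
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        yield q

def is_locally_kemeny_optimal(p, lists):
    base = kendall_to_lists(p, lists)
    return all(kendall_to_lists(q, lists) >= base for q in adjacent_swaps(p))

lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
p = [1, 2, 3]
print(kendall_to_lists(p, lists))           # 3
print(is_locally_kemeny_optimal(p, lists))  # True: both adjacent swaps give 4
print(kendall_to_lists([3, 2, 1], lists))   # 2, so p is not Kemeny optimal
```

Checking local optimality needs only |p| − 1 distance evaluations, in contrast to the NP-hard search for a Kemeny optimal permutation.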
LKO satisfies XCC(2)
• Proof by contradiction
If the result is false, then there exist partial lists t1, t2, ..., tk, an LKO aggregation p, and a partition (W, L) for which p violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c). Let (d, c) be a closest such pair in p.
• Consider the immediate successor of d in p, call it e. If e = c, then c is adjacent to d in p, and since a majority prefers c to d, transposing this adjacent pair produces a p' such that K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk), contradicting the assumption that p is LKO.
• If e does not equal c, then either e is in W, in which case (d, e) is a closer pair in p than (d, c) and also violates XCC(2), or e is in L, in which case (e, c) is a closer pair than (d, c) that violates XCC(2). Both cases contradict the choice of (d, c).
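The key step of the proof, that swapping an adjacent pair (d, c) lowers the total distance when a majority prefers c to d, can be written out as follows. The symbols $n_{cd}$, $n_{dc}$, and $R$ are notation introduced here for illustration and do not appear in the slides.

```latex
% Assumed notation: n_{cd} = number of lists t_j that contain both c and d and rank c above d,
% n_{dc} = number that rank d above c, R = disagreements on all pairs other than {c, d}.
% p ranks d immediately above c; p' is p with this adjacent pair swapped,
% so every pair other than {c, d} keeps its relative order and R is unchanged.
\begin{align*}
K(p,  t_1, \dots, t_k) &= R + n_{cd}, \\
K(p', t_1, \dots, t_k) &= R + n_{dc}, \\
K(p', t_1, \dots, t_k) - K(p, t_1, \dots, t_k) &= n_{dc} - n_{cd} < 0,
\end{align*}
% where the inequality holds because a majority prefers c (in W) to d (in L).
```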
• A local Kemenization of a full list μ with respect to preference lists t1, t2, ..., tk computes a locally Kemeny optimal aggregation of the t's that is maximally consistent with μ.
This approach:
(1) preserves the strengths of the initial aggregation μ;
(2) ranks non-spam above spam;
(3) gives a result that disagrees with μ on a pair (i, j) only if a majority endorses this disagreement;
(4) for every d, 1 ≤ d ≤ |μ|, the restriction of the output to the top d elements of μ is a local Kemenization of those elements.
Local Kemenization procedure
• A simple inductive construction.
• Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto elements 1, ..., l − 1 of the initial full list.
• Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that
– (a) no majority among the (original) t's prefers x to y, and
– (b) for all successors z of y in p there is a majority that prefers x to z.
• In other words, we try to insert x at the end (bottom) of the list p; we bubble it up toward the top of the list as long as a majority of the t's insists that we do.
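The insertion procedure above can be sketched as an insertion sort driven by pairwise majorities. This is a minimal sketch, assuming lists are ordered best first, that a list omitting either element casts no vote on the pair, and that ties do not count as a majority; the function names and the sample data are illustrative.

```python
def majority_prefers(lists, x, y):
    """True if more lists rank x above y than rank y above x."""
    votes_x = votes_y = 0
    for t in lists:
        if x in t and y in t:
            if t.index(x) < t.index(y):
                votes_x += 1
            else:
                votes_y += 1
    return votes_x > votes_y

def local_kemenization(mu, lists):
    """Rebuild the full list mu element by element, bubbling each one upward."""
    p = []
    for x in mu:
        # Start at the bottom of p and move up while a majority prefers x
        # to the element directly above the current position.
        i = len(p)
        while i > 0 and majority_prefers(lists, x, p[i - 1]):
            i -= 1
        p.insert(i, x)
    return p

# Illustrative data: five voter lists and an initial aggregation mu.
voters = [list("ABFECD"), list("BCAEFD"), list("ACFDEB"),
          list("BFDCAE"), list("CABFED")]
mu = list("BADCEF")
print(local_kemenization(mu, voters))  # ['A', 'B', 'C', 'F', 'E', 'D']
```

Stopping at the first element that a majority does not rank below x is what keeps the output as close as possible to the initial list.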
Example local kemenization procedure
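Voter lists (best first):
t1 = (A, B, F, E, C, D)
t2 = (B, C, A, E, F, D)
t3 = (A, C, F, D, E, B)
t4 = (B, F, D, C, A, E)
t5 = (C, A, B, F, E, D)
Initial aggregation: (B, A, D, C, E, F)
Lists built by the procedure: (B); (B, A); (A, B); (A, B, D); (A, B, D, C); (A, B, C, D)
Final result: (A, B, C, F, E, D)
Majority counts: A>B: 3, A<B: 2; B>D: 4, B<D: 1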
RA and Searching Workplace Web
• Axiom 1: Intranet documents are not spam
• Axiom 2: Queries usually have unique answers (not broad topic based)
• Axiom 3: Intranet docs are not search engine friendly (docs are accessed through portals and database queries)
• Rank aggregation allows us to combine a number of heuristic alternatives: static and dynamic, query dependent and independent