Learning to Rank: New Techniques and Applications



Martin Szummer

Microsoft Research

Cambridge, UK


Why learning to rank?


Current rankers use many features, in complex combinations

Applications
- Web search ranking, enterprise search
- Image search
- Ad selection
- Merging multiple results lists

The good: uses training data to find combinations of features that optimize IR metrics

The bad: requires judged training data. Expensive, subjective, not provided by end-users, out-of-date


This talk


Learning to rank with IR metrics
- A single, simple yet competition-winning recipe.
- Works for NDCG, MAP, Precision with linear or non-linear ranking functions (neural nets, boosted trees, etc.)

Semi-supervised ranking
- A new technique. Reduces the amount of judged training data required.

Learning to merge
- Application: merging results lists from multiple query reformulations


Actually, I apply the same recipe in three different settings!


Ranking Background


Classification: determine the class of an item $i$ (operates on individual items)

Ranking: determine the preference of item $i$ versus $j$ (operates on pairs of items)

Ranking function: $s_i = f(\mathbf{x}_i; \mathbf{w})$, where $s_i$ is the score, $f$ the function, $\mathbf{x}_i$ the query-doc features, and $\mathbf{w}$ the parameters

Example: linear function $s_i = \mathbf{w}^{\mathsf{T}} \mathbf{x}_i$

The ranking function induces a preference: $i \succ j$ when $s_i > s_j$
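A minimal sketch (not from the slides) of a linear ranking function and the preference it induces; the feature values and weights below are invented for illustration:

```python
import numpy as np

def linear_score(X, w):
    """Score each query-doc feature vector: s_i = w^T x_i."""
    return X @ w

# Toy example: three documents, two query-doc features each.
X = np.array([[0.2, 1.0],
              [0.9, 0.1],
              [0.5, 0.5]])
w = np.array([1.0, 0.5])        # parameters (how to learn them comes later)

s = linear_score(X, w)           # scores s_1, s_2, s_3
print(s[0] > s[1])               # induced preference: doc 0 is preferred to doc 1 iff s_0 > s_1
print(np.argsort(-s))            # sorting by descending score gives the ranking
```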


From Ranking Function to the Ranking


Applying the ranking function to define a ranking:

$\{\mathbf{x}_1, \mathbf{x}_2, \ldots\} \;\rightarrow\; f(\mathbf{x}_i; \mathbf{w}) \;\rightarrow\; \{s_1, s_2, \ldots\} \;\rightarrow\; \text{Sort} \;\rightarrow\; \text{ranking}$

Above: we had a deterministic model of preference.

Henceforth: a probabilistic model, which translates score differences into a probability of preference.

Bradley-Terry/Mallows:

$P(i \succ j) = \dfrac{1}{1 + e^{-(s_i - s_j)}}$
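A one-line sketch of the Bradley-Terry preference probability used above (function and variable names are mine):

```python
import numpy as np

def p_prefer(s_i, s_j):
    """Bradley-Terry: probability that the item scored s_i is preferred to the item scored s_j."""
    return 1.0 / (1.0 + np.exp(-(s_i - s_j)))

print(p_prefer(2.0, 1.0))   # ~0.73: the higher-scored item is probably preferred
print(p_prefer(1.0, 1.0))   # 0.5: equal scores, no preference either way
```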



Learning to Rank


Learning to rank: given $\{\mathbf{x}_1, \mathbf{x}_2, \ldots\}$ and given preference pairs, determine $\mathbf{w}$ in

$\{\mathbf{x}_1, \mathbf{x}_2, \ldots\} \;\rightarrow\; f(\mathbf{x}_i; \mathbf{w}) \;\rightarrow\; \{s_1, s_2, \ldots\} \;\rightarrow\; \text{Sort} \;\rightarrow\; \text{ranking}$

Maximize the likelihood of the preference pairs given in the training data:

$\max_{\mathbf{w}} \; \sum_{(i,j)} \mathbb{I}_{i \succ j} \log P(i \succ j) \;\equiv\; L$

where the indicator $\mathbb{I}_{i \succ j} = 1$ when $i \succ j$ in the training data.

e.g. the RankNet model [Burges et al 2005]
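A compact sketch of this objective for a linear scorer; RankNet itself uses a neural net, but the pairwise log-likelihood and its gradient have the same form (all names and data below are mine):

```python
import numpy as np

def log_likelihood(w, X, pairs):
    """L(w) = sum over judged pairs (i, j), with i preferred to j, of log P(i > j)."""
    s = X @ w
    return sum(-np.log1p(np.exp(-(s[i] - s[j]))) for i, j in pairs)

def gradient(w, X, pairs):
    """dL/dw for the linear scorer s = X w."""
    s = X @ w
    g = np.zeros_like(w)
    for i, j in pairs:
        p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))   # P(i > j)
        g += (1.0 - p) * (X[i] - X[j])             # pushes s_i up and s_j down
    return g

# Gradient ascent on the preference likelihood.
X = np.array([[0.2, 1.0], [0.9, 0.1], [0.5, 0.5]])
pairs = [(0, 1), (0, 2)]                 # training judgments: doc 0 preferred to docs 1 and 2
w = np.zeros(2)
for _ in range(200):
    w += 0.1 * gradient(w, X, pairs)
print(w, log_likelihood(w, X, pairs))
```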


Learning to Rank for IR metrics


IR metrics such as NDCG, MAP or Precision depend on:
- the sorted order of items
- the ranks of items: weight the top of the ranking more

Recipe
1) Express the metric as a sum of pairwise swap deltas
2) Smooth it by multiplying by a Bradley-Terry term
3) Optimize parameters by gradient descent over a judged training set

LambdaRank & LambdaMART [Burges et al] are instances of this recipe. The latter won the Yahoo! Learning to Rank Challenge (2010).
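The NDCG application on the next slide is unpublished, but step 1 of the recipe can be illustrated generically: the change in NDCG when two documents swap rank positions, as used throughout the LambdaRank family (the labels, ranks and names below are my own example):

```python
import numpy as np

def dcg_gain(label, rank):
    """One document's contribution to DCG: (2^label - 1) / log2(rank + 1), rank is 1-based."""
    return (2.0 ** label - 1.0) / np.log2(rank + 1.0)

def ndcg_swap_delta(labels, ranks, i, j, ideal_dcg):
    """|change in NDCG| if documents i and j swapped rank positions (step 1 of the recipe)."""
    before = dcg_gain(labels[i], ranks[i]) + dcg_gain(labels[j], ranks[j])
    after  = dcg_gain(labels[i], ranks[j]) + dcg_gain(labels[j], ranks[i])
    return abs(after - before) / ideal_dcg

labels = np.array([0, 2, 1])      # relevance of the docs currently at ranks 1, 2, 3
ranks  = np.array([1, 2, 3])
ideal  = dcg_gain(2, 1) + dcg_gain(1, 2) + dcg_gain(0, 3)   # DCG of the ideal ordering
print(ndcg_swap_delta(labels, ranks, 0, 1, ideal))          # gain from swapping docs 0 and 1
```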


Example: Apply recipe to NDCG metric


Unpublished material. Email me if interested.


Gradients: intuition


Gradients

$\dfrac{\partial L}{\partial s_i} \;=\; \sum_{j : (i,j)} \Delta_{ij}\,\big(\mathbb{I}_{i \succ j} - P(i \succ j)\big)$

act as forces on doc pairs.


[Figure: a ranked list of documents (ranks 1-5) with their labels, and gradients $dC/ds_i$, $dC/ds_j$ drawn as forces acting on document pairs.]
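Combining the swap delta (step 1) with the Bradley-Terry term (step 2) gives these forces. Below is a generic LambdaRank-style sketch with my own names and toy numbers, not the unpublished derivation from the previous slide:

```python
import numpy as np

def lambda_forces(scores, pairs, deltas):
    """Per-document gradients dL/ds_i: each judged pair (i, j), with i preferred to j,
    contributes a force delta_ij * (1 - P(i > j)) pushing s_i up and s_j down."""
    lam = np.zeros_like(scores)
    for (i, j), delta in zip(pairs, deltas):
        p = 1.0 / (1.0 + np.exp(-(scores[i] - scores[j])))   # Bradley-Terry P(i > j)
        force = delta * (1.0 - p)                            # indicator is 1 for judged pairs
        lam[i] += force
        lam[j] -= force
    return lam

scores = np.array([0.1, 0.8, 0.3])
pairs  = [(1, 0), (1, 2)]        # doc 1 judged preferable to docs 0 and 2
deltas = [0.4, 0.2]              # |delta NDCG| of swapping each pair, e.g. from the sketch above
print(lambda_forces(scores, pairs, deltas))
```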





Semi-supervised Ranking
[with Emine Yilmaz]

Train with judged AND unjudged query-document pairs


Semi-supervised Ranking


Applications
- (Pseudo) relevance feedback
- Reduce the number of (expensive) human judgments
- Use when judgments are hard to obtain
- Customers may not want to judge their collections:
  - adaptation to a specific company in enterprise search
  - ranking for small markets, special-interest domains

Approach
- preference learning
- end-to-end optimization of ranking metrics (NDCG, MAP)
- multiple and completely unlabeled rank instances
- scalability



How to benefit from unlabeled data?

Unlabeled data gives information about the data distribution $P(x)$. We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution $P(R \mid x)$.

A common assumption: the cluster assumption
- Unlabeled data defines the extent of clusters
- Labeled data determines the class/function value of each cluster




Semi-supervised

- classification: similar documents ⇒ same class
- regression: similar documents ⇒ similar function value
- ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other

This assumption is a type of regularizer on the function we are learning. Similarity can be defined based on content, so it does not require judgments (a sketch follows below).

Differences from classification & regression:
- Preferences provide weaker constraints than function values or classes
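The exact formulation is unpublished (two slides on); purely to make the "similar documents ⇒ no preference" idea concrete, here is one plausible unlabeled cost that pulls $P(i \succ j)$ toward 1/2 for content-similar pairs. This is my own illustrative guess, not the method from the talk:

```python
import numpy as np

def p_prefer(s_i, s_j):
    return 1.0 / (1.0 + np.exp(-(s_i - s_j)))

def unlabeled_cost(scores, similar_pairs, sim_weights):
    """Regularizer over unjudged pairs: for content-similar documents (i, j),
    penalize confident preferences, i.e. pull P(i > j) toward 0.5."""
    cost = 0.0
    for (i, j), w in zip(similar_pairs, sim_weights):
        p = p_prefer(scores[i], scores[j])
        cost += w * (p - 0.5) ** 2          # zero when the model is indifferent
    return cost

scores = np.array([0.9, 0.2, 0.85])
similar_pairs = [(0, 2)]                    # docs 0 and 2 have similar content
sim_weights = [1.0]                         # similarity, e.g. from a content distance
print(unlabeled_cost(scores, similar_pairs, sim_weights))   # small: 0.9 vs 0.85 is nearly a tie
```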


Quantify Similarity


similar documents ⇒ similar preference, i.e. neither is preferred to the other

Unpublished material. Email me if interested.



Semi-supervised Gradients

[Figure: documents with labeled-cost gradients $dC_L/ds_i$, unlabeled-cost gradients $dC_U/ds_i$, and their sum $dC_L/ds_i + dC_U/ds_i$ acting as forces.]
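Combining the two sketches above, the total force on a document is simply the sum of its labeled and unlabeled contributions; the relative weight of the unlabeled term is my own assumed knob:

```python
import numpy as np

def total_forces(labeled_forces, unlabeled_forces, unlabeled_weight=1.0):
    """Semi-supervised gradient per document: dC/ds_i = dC_L/ds_i + dC_U/ds_i."""
    return np.asarray(labeled_forces) + unlabeled_weight * np.asarray(unlabeled_forces)
```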




Experiments

Relevance Feedback task:
1) the user issues a query and labels a few of the resulting documents from a traditional ranker (BM25)
2) the system trains a query-specific ranker and re-ranks

Data: TREC collection. 528,000 documents, 150 queries;
1000 total documents per query; 2-15 docs are labeled

Features:
- ranking features (q, d): 22 features from LETOR
- content features (d1, d2): TF-IDF distance between top 50 words
- Neighbors in input space using either of the above
- Note: at test time, only ranking features are used; the method allows using features of type (d1, d2) and (q, d1, d2) at training that other algorithms cannot use

Ranking function f(): neural network, 3 hidden units
K=5 neighbors
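A rough sketch of how the content features and K=5 neighbors could be computed; the top-50-word TF-IDF representation and cosine distance are my reading of the slide, not a specification of the actual system:

```python
import numpy as np
from collections import Counter

def top_word_tfidf(docs, idf, top_n=50):
    """Represent each document by TF-IDF weights of its top_n most frequent words."""
    vecs = []
    for doc in docs:
        counts = Counter(doc.split()).most_common(top_n)
        vecs.append({word: count * idf.get(word, 0.0) for word, count in counts})
    return vecs

def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = np.sqrt(sum(v * v for v in a.values()))
    nb = np.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb + 1e-12)

def k_nearest(vecs, i, k=5):
    """Indices of the k documents whose content is most similar to document i."""
    dists = [(cosine_distance(vecs[i], vecs[j]), j) for j in range(len(vecs)) if j != i]
    return [j for _, j in sorted(dists)[:k]]
```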


Relevance Feedback Task


[Results figure comparing: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, RankBoost L]


Novel Queries Task


90,000 training documents

3500 preference pairs

40 million unlabeled pairs


Novel Queries Task


[Results figure comparing: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, and an upper bound]


Learning to Merge

Task: learn a ranker that merges results from other rankers



Example application: users do not know the best way to express their web search query, and a single query may not be enough to reach all relevant documents.

Solution: take the user's query (e.g. "wp7"), reformulate it in parallel (e.g. "wp7 phone", "microsoft wp7"), and merge the results.


Merging Multiple Queries

[with Sheldon, Shokouhi, Craswell]


Traditional approach: alter the query before retrieval

Merging: alter after retrieval
- Prospecting: see results first, then decide
- Flexibility: any rewrite is allowed, arbitrary features
- Upside potential: better than any individual list
- Increased query load on the engine: use a cache to mitigate it


LambdaMerge: learn to merge

A weighted mixture of ranking functions:

$s_d = \sum_k \pi_k \, f_k(\boldsymbol{d})$

Each rewrite's results list $k$ (e.g. "jupiters mass", "mass of jupiter") scores document $d$ with $f_k$ from its scoring features, and the list weights $\pi_k$ are computed from the rewrite features.

Rewrite features:
- Rewrite-difficulty: ListMean, ListStd, Clarity
- Rewrite-drift: IsRewrite, RewriteRank, RewriteScore, Overlap@N

Scoring features:
- Dynamic rank score, BM25, Rank, IsTopN
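A small sketch of such a weighted mixture score. The softmax gating over rewrite features is my assumption to make the weights concrete; the parameter names and numbers are invented:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def merged_score(scoring_feats, rewrite_feats, v, u):
    """s_d = sum_k pi_k * f_k(d): per-list scores f_k (here linear in the scoring
    features, parameters v), weighted by gates pi_k (softmax over the rewrite
    features, parameters u). One row per results list containing document d."""
    f = scoring_feats @ v                # f_k(d) for each rewrite's list
    pi = softmax(rewrite_feats @ u)      # weight of each list
    return float(pi @ f)

# Document d retrieved by two rewrites ("jupiters mass", "mass of jupiter").
scoring_feats = np.array([[0.7, 2.1],    # e.g. BM25, dynamic rank score in list 1
                          [0.4, 1.3]])   # ... in list 2
rewrite_feats = np.array([[0.9, 0.1],    # e.g. Clarity, Overlap@N for list 1
                          [0.5, 0.3]])
v = np.array([1.0, 0.2])
u = np.array([0.8, 0.5])
print(merged_score(scoring_feats, rewrite_feats, v, u))
```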




λ-Merge Results

[Figure: NDCG of the Merged list and of the Reformulation, each plotted against the Original NDCG.]


Summary


Learning to Rank
- An indispensable tool
- Requires judgments, but semi-supervised learning can help
  - crowd-sourcing is also a possibility
  - research frontier: implicit judgments from clicks
- Many applications beyond those shown
  - Merging: multiple local search engines, multiple language engines
  - Rank recommendations in collaborative filtering
  - Many thresholding tasks (filtering) can be posed as ranking
  - Rank ads for relevance
  - Elections

Use it!

