Optimizing search engines using clickthrough data



NATIONAL TECHNICAL UNIVERSITY OF ATHENS

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING

DIVISION OF COMPUTER SCIENCE AND TECHNOLOGY



2

Problem


Optimization of web search results ranking


Ranking algorithms mainly based on similarity


Similarity between query keywords and page text keywords


Similarity between pages (PageRank)


No consideration of users' personal preferences

[Figure: example user preference — "Digital libraries" results ranked high, "Vacation" results ranked low]

3

Problem


E.g.: query “delos”


Digital libraries


Archaeology


Vacation


Room for improvement by incorporating user behaviour data: user feedback


Use of previous implicit search information to enhance result ranking

[Figure: re-ranking for the "delos" query — "Digital libraries" results moved to a high rank, "Vacation" results to a low rank]

4

Types of user feedback


Explicit feedback

User explicitly judges the relevance of results to the query

Costly in time and resources

Costly for the user -> limited effectiveness

Direct evaluation of results

Implicit feedback

Extracted from log files

Large amount of information

Real user behaviour (not expert judgement)

Indirect evaluation of results through click behaviour

5

Implicit feedback (Categories)


Clicked results

Absolute relevance:

Clicked result -> relevant

Risky: user click behaviour can be of poor quality

Percentage of result clicks for a query

Frequently clicked groups of results

Links followed from the result page

Relative relevance:

Clicked result -> more relevant than non-clicked ones

More reliable

Time

Between clicks

E.g., fast clicks -> maybe bad results

Spent on a result page

E.g., a lot of time -> relevant page

Until first click or scroll

E.g., a long delay -> maybe confusing results

6

Implicit feedback (Categories)


Query chains: sequences of reformulated queries to improve the results of the initial search


Detection:


Query similarity


Result set similarity


Time


Connection between results of different queries:


Enhancement of a bad query with another from the query chain


Comparison of result relevance between different query results


Scrolling


Time (quality of results)


Scrolling behaviour (quality of results)


Percentage of page (results viewed)


Other features


Save, print, copy, etc. of a result page -> maybe relevant

Exit type (e.g., closing the window -> poor results)

7

Joachims' approach


Clickthrough data


Relative relevance


Indicated by user behaviour study


Method


Training of an SVM ranking function

Training input: inequalities over query result rankings

Trained function: a weight vector over the features examined

Use of the trained vector to assign weights to the examined features


Experiments


Comparison of method with existing search engines

8

Clickthrough data


Form


Triplets (q,r,c) = (query,ranked results,links clicked)


…in the log file of a proxy server


Relative relevance (with respect to a specific query):

dk more relevant than dl if

dk clicked, and

dl not clicked, and

dk ranked lower than dl in the initial ranking

Example (see the sketch below):

d5 more relevant than d4

d5 more relevant than d2

d3 more relevant than d2
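A minimal sketch of how such preference pairs could be extracted from one logged triplet (q, r, c), assuming a simple list-of-ids log layout; it illustrates the skip-above rule and is not Joachims' original implementation.

```python
# Sketch: derive relative-relevance pairs from one clickthrough record (q, r, c).
# A clicked result is preferred over every non-clicked result ranked above it
# in the presented ranking r.

def preference_pairs(ranked_results, clicked):
    """ranked_results: document ids in presented order (best first).
    clicked: set of document ids the user clicked."""
    pairs = []
    for pos, doc in enumerate(ranked_results):
        if doc not in clicked:
            continue
        # every non-clicked document presented above this clicked one
        for above in ranked_results[:pos]:
            if above not in clicked:
                pairs.append((doc, above))  # doc judged more relevant than `above`
    return pairs

# Example from the slide (clicks on d1, d3 and d5):
# [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]
print(preference_pairs(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"}))
```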

9

Clickthrough data (Studies)


Why not absolute relevance?

User behaviour is influenced by the initial ranking

Study: Rank and viewership

Percentage of queries in which the user viewed the search result presented at a particular rank

Conclusion: Most of the time, users view only the first few results

Study: Rank and clickthrough

Percentage of queries in which the user clicked the result presented at a particular rank, in both the normal and the swapped condition

Conclusion: Users tend to click on higher-ranked results irrespective of content

10

Method (System train input)


Data extracted from the log

Relevance inequalities:

dk <_rq dl  <=>  dk more relevant than dl for query q

Construct relevance inequalities for every query q and every result pair (dk, dl) for q

For each link di (a result of query q):

construct a feature vector Φ(q, di)

Example:

d5 more relevant than d4

d5 more relevant than d2

d3 more relevant than d2

11

Method (System train input)


Feature vector:

Describes the quality of the match between a document di and a query q

Φ(q, di) = [rank_X, top1_X, top10_X, …]
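To make the feature vector concrete, here is a toy Φ(q, d) in the spirit of the listed features (rank of d in an underlying engine X, whether d is in X's top 1 or top 10); the exact feature set and the reciprocal-rank encoding are assumptions for illustration.

```python
# Sketch: a toy feature map Phi(q, d) built from the rankings of the underlying engines.
# rank_X is encoded as a reciprocal rank so that "not returned" maps to 0.

def phi(doc, engine_rankings):
    """engine_rankings: dict engine_name -> list of doc ids (best first)."""
    features = []
    for engine in sorted(engine_rankings):
        ranking = engine_rankings[engine]
        rank = ranking.index(doc) + 1 if doc in ranking else 0
        features.append(1.0 / rank if rank else 0.0)      # rank_X
        features.append(1.0 if rank == 1 else 0.0)        # top1_X
        features.append(1.0 if 0 < rank <= 10 else 0.0)   # top10_X
    return features
```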

12

Method (System train input)


Weight vector:

w = [w1, w2, …, wn]

Assigns a weight to each of the features in Φ(q, di)

S(di) = w * Φ(q, di) assigns a score to document di for query q

w: unknown -> must be trained by solving the system of relative relevance inequalities derived from the clickthrough data
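A small sketch of how a learned w would be used to re-rank results, reusing the phi() sketch above; numpy's dot product stands in for w * Φ(q, di).

```python
import numpy as np

def rerank(w, docs, engine_rankings):
    """Score every candidate document with w * Phi(q, d) and sort by score."""
    scores = {d: float(np.dot(w, phi(d, engine_rankings))) for d in docs}
    return sorted(docs, key=scores.get, reverse=True)
```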

13

Method (System training)


Initial input transformation:

dk <_rm dl  =>  S(dk) > S(dl)  =>

w * Φ(qm, dk) > w * Φ(qm, dl)  =>

[w1, w2, …, wn] * Φ(qm, dk) > [w1, w2, …, wn] * Φ(qm, dl)

System of relevance inequalities for every query q and every result pair (dk, dl) for q

Object of training:

Knowing, for every clicked link di, its feature vector,

find an optimal weight vector w which satisfies as many of the inequalities above as possible (optimal solution; see the training sketch below)

With the vector w known, we can calculate a score for every link, whether clicked or not, and hence a new ranking

Example:

d5 more relevant than d4

d5 more relevant than d2

d3 more relevant than d2
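One way to read the training step: each inequality w * Φ(qm, dk) > w * Φ(qm, dl) becomes a constraint on the difference vector Φ(qm, dk) - Φ(qm, dl), so a soft-margin linear SVM over those differences approximately satisfies as many inequalities as possible. The sketch below uses scikit-learn's LinearSVC as a stand-in for the SVM-light ranking SVM actually used; names and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_weight_vector(preference_pairs):
    """preference_pairs: list of (phi_better, phi_worse) feature-vector pairs."""
    X, y = [], []
    for phi_better, phi_worse in preference_pairs:
        diff = np.asarray(phi_better, dtype=float) - np.asarray(phi_worse, dtype=float)
        X.append(diff)
        y.append(1)
        X.append(-diff)            # mirrored example keeps the two classes balanced
        y.append(-1)
    svm = LinearSVC(C=1.0, fit_intercept=False)   # slack <-> "satisfy most inequalities"
    svm.fit(np.asarray(X), np.asarray(y))
    return svm.coef_.ravel()       # the learned weight vector w
```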

14

Method (System training)


Intuitively

Example: 2-dimensional feature vector

X-axis: similarity between query terms and link text

Y-axis: time spent on the link page

Looking for the optimal weight vector w

If the training input is: r1 < r2 < r3 < r4

Optimal solution w1: text similarity more important for ranking

If the training input is: r2 < r3 < r1 < r4

Optimal solution w2: time more important for ranking

w optimized by maximizing the margin δ

[Figure: Links 1–4 plotted in the two-dimensional feature space, with candidate weight vectors w1, w2 and margin δ]

15

Experiments


Based on a meta-search engine

Combination of results from different search engines

"Fair" presentation of results (see the sketch below):

For a meta-search engine combining 2 engines:

the top z results of the meta-search contain

x results from engine A

y results from engine B

x + y = z

|x - y| <= 1
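A sketch of one way to satisfy the fairness constraint: alternately take the next not-yet-shown link from each engine, so any top-z prefix draws (almost) equally from A and B. Links shared by both engines blur the exact count, so this is an illustration of the constraint above rather than the paper's combination algorithm.

```python
def fair_combine(ranking_a, ranking_b, z):
    """Merge two rankings so that the top z alternates between engines A and B."""
    combined, seen = [], set()
    queue_a, queue_b = list(ranking_a), list(ranking_b)
    take_a = True
    while len(combined) < z and (queue_a or queue_b):
        # take from the preferred engine if it still has links, otherwise the other
        queue = queue_a if (take_a and queue_a) or not queue_b else queue_b
        doc = queue.pop(0)
        if doc not in seen:
            combined.append(doc)
            seen.add(doc)
            take_a = not take_a    # alternate only after actually adding a link
    return combined
```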

16

Experiments


Meta-search engine

Assumptions

Users click more often on more relevant links

Preference for one of the search engines is not influenced by the ranking

17

Experiments (Results)


Conditions


20 users (university students)


260 training queries collected


20 days of training


10 days of evaluation


Comparison with


Google


MSNSearch


Toprank (meta-search over Google, MSNSearch, Altavista, Excite, Hotbot)


Results: [comparison chart not reproduced]

18

Experiments (Results)


Weight vector

19

Open issues


Trade-off between the amount of training data and its homogeneity

Clustering algorithms to find homogeneous groups of users

Adaptation to the properties of a particular document collection


Incremental online learning/feedback algorithm


Protection from spamming

20

Joachims' approach II


Clickthrough data


Addition of new relative relevance criteria


Addition of query chains


Method


Modified feature vector


Rank features


Term/document features


Constraints on the SVM optimization problem


To avoid trivial/incorrect solutions

21

Query chains


Sequences of reformulated queries


Poor results for first query


Too many/few terms


Unrepresentative terms


q1: “special collections”


q2: “rare books”


Incorrect spelling


q1: “Lexis Nexus”


q2: “Lexis Nexis”



Execution of new query


Addition/deletion of query terms to get better results


In every query of the sequence, searching for the same thing


Ways of detection


Term similarity between queries


Percentage of common results


Time between queries

22

Query chains detection method


1st strategy


Data recorded from log for 1285 queries:


query, date, IP address, results returned, number of
clicks on the results, session id uniquely assigned


Manual grouping of queries in query chains


Division of data set into training and testing set


Training


Feature vector for every pair of queries


Training of a weight vector indicating which features
are more important for a query pair to be a query
chain


Testing


Recognition of query chains in testing set


Comparison with manual grouping


94.3% accuracy


2nd strategy


Assumption: Query chain if:


Queries from the same IP


Time between 2 queries < 30 min


91.6% accuracy


Adoption of the (simpler) 2nd strategy
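A sketch of the adopted 2nd strategy under an assumed log-record layout of (timestamp, IP, query) tuples: consecutive queries from the same IP issued less than 30 minutes apart are placed in the same chain.

```python
from datetime import timedelta

def group_query_chains(log, max_gap=timedelta(minutes=30)):
    """log: list of (timestamp, ip, query) tuples sorted by timestamp."""
    chains, last_seen = [], {}          # ip -> (last timestamp, chain index)
    for ts, ip, query in log:
        prev = last_seen.get(ip)
        if prev and ts - prev[0] < max_gap:
            chains[prev[1]].append(query)          # continue the existing chain
            last_seen[ip] = (ts, prev[1])
        else:
            chains.append([query])                 # start a new chain
            last_seen[ip] = (ts, len(chains) - 1)
    return chains
```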

23

Clickthrough data (studies)


Conclusion: Most of the time, users view the first 2 results

Conclusion: Most of the time, users view the next result after the last clicked one

24

Relevance criteria (query)


Click >q Skip Above

dk more relevant than dl if

dk clicked, and

dl not clicked, and

dk ranked lower than dl in the initial ranking

Click First >q No-Click Second

d1 more relevant than d2 if

d1 clicked

d2 not clicked

Example (see the sketch below):

d5 more relevant than d4

d5 more relevant than d2

d3 more relevant than d2

d1 more relevant than d2
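The first criterion is the skip-above rule sketched earlier; a correspondingly small sketch of the second, "Click First > No-Click Second", could look like this (same assumed log layout as before).

```python
def click_first_pairs(ranked_results, clicked):
    """If the top result was clicked and the second was not, prefer d1 over d2."""
    if (len(ranked_results) >= 2
            and ranked_results[0] in clicked
            and ranked_results[1] not in clicked):
        return [(ranked_results[0], ranked_results[1])]
    return []
```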

25

Relevance criteria (query chain)


Click >q1 Skip Above

dk more relevant than dl (with respect to the earlier query q1) if

dk clicked, and

dl not clicked, and

dk ranked lower than dl in the initial ranking

Click First >q1 No-Click Second

d1 more relevant than d2 (with respect to the earlier query q1) if

d1 clicked

d2 not clicked

FOR QUERY 1 (q1):

d8 more relevant than d7

d8 more relevant than d6

FOR QUERY 1 (q1):

d5 more relevant than d6

[Figure: result lists of QUERY 1 and QUERY 2 in the same query chain]

26

Relevance criteria (query chain II)



Click >q1 Skip Earlier Query

A result clicked in the later query is more relevant (with respect to the earlier query q1) than results skipped in the earlier query

Click First >q1 Top 2 Earlier Query

The first clicked result of the later query is more relevant (with respect to q1) than the top 2 results of the earlier query

FOR QUERY 1 (q1):

d6 more relevant than d1

d6 more relevant than d2

d6 more relevant than d4

FOR QUERY 1 (q1):

d6 more relevant than d1

d6 more relevant than d2

[Figure: result lists of QUERY 1 and QUERY 2 in the same query chain]

27

Relevance criteria (Experiment)


Sequences of reformulated queries

Search on 16 subjects

Relevance inequalities produced by the previous criteria

Comparison with explicit relevance judgements on the query chains

28

Method (SVM training)


Feature vector consists of

Rank features φ_fi^rank(d, q)

Term/document features φ_terms(d, q)

Rank features

One φ_fi^rank(d, q) for every retrieval function fi examined

Every φ_fi^rank(d, q) consists of 28 features

One feature for each rank cutoff 1, 2, 3, …, 10, 15, 20, …, 100 in the initial ranking

Set to 1 if the rank of d in the initial ranking is at most the cutoff, else 0

Allows us to learn weights for, and combine, different retrieval functions (see the sketch below)
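A sketch of one rank-feature block, reading the slide as cumulative indicators (feature = 1 when the document's rank under fi is at or better than the cutoff); that reading and the helper names are assumptions.

```python
# 28 rank cutoffs: 1..10, then 15, 20, ..., 100
RANK_CUTOFFS = list(range(1, 11)) + list(range(15, 101, 5))

def phi_rank(rank_of_d):
    """rank_of_d: position of d in retrieval function fi's ranking (1 = best), or None."""
    if rank_of_d is None:
        return [0] * len(RANK_CUTOFFS)
    return [1 if rank_of_d <= cutoff else 0 for cutoff in RANK_CUTOFFS]

assert len(RANK_CUTOFFS) == 28
```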

29

Method (SVM training)


Term/document features

φ_terms(d, q) consists of N * M features

N: number of terms

M: number of documents

Feature (tj, di) is set to 1 if di = d and tj ∈ q

Trains the weight vector to learn associations between terms and documents (see the sketch below)
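A tiny sketch of the same idea, representing only the active (term, document) indicator features sparsely rather than as a dense N*M vector; the sparse-dict representation is an assumption.

```python
def phi_terms(query_terms, doc_id):
    """Active (term, document) indicator features for document doc_id and query q."""
    return {(term, doc_id): 1 for term in set(query_terms)}
```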

30

Method (Constraints)


Problem


Most relevance criteria suggest that a lower-ranked link (in the initial ranking) is better than a higher-ranked link

Trivial solution: reverse the initial ranking by assigning negative values to the weight vector

Solution

Each w * φ_fi^rank(d, q) must be at least a minimum positive value



31

Experiments


Based on a meta-search engine


Initial retrieval function from Nutch


Training


9949 queries


7429 clicks


Pqc: Total of 120134 relative relevance inequalities


Pnc: a corresponding set of inequalities generated without the use of query chains


Results (rel0 = the initial ranking): [evaluation figure not reproduced]

32

Experiments


Results


Trained weights for term/document features

33

Open issues


Tolerance of noise and malicious clicks


Different weights in each query of a query chain


Personalized ranking functions


Improvement of performance

34

Related work (Fox et al. 2005)


Evaluation of implicit measures


Use of explicit ratings of user satisfaction


Method


Bayesian modeling and decision trees


Gene analysis


Results: Importance of combination of


Clickthrough


Time


Exit type


Features used:

35

Related work (Agichtein, Brill, Dumais 2006)


Incorporation of implicit
feedback features directly
into the initial ranking
algorithm


BM25F


RankNet


Features used:

36

Related work


Query/links clustering based on features extracted
from implicit feedback


Score based on interrelations between queries and
web pages

37

Possible extensions


Utilization of time feedback


As relative relevance criterion


As a feature


Other types of feedback


Scrolling


Exit type


Combination with absolute relevance clickthrough feedback


Percentage of result clicks for a query


Links followed from result page


Query chains


Improvement of detection method


Link association rules


For frequently clicked groups of results


Query/links clustering


Continuous training of ranking functions

38

References


Joachims, Radlinski. Search Engines that Learn from Implicit Feedback.

Joachims. Optimizing Search Engines Using Clickthrough Data.

Radlinski, Joachims. Query Chains: Learning to Rank from Implicit Feedback.

Fox et al. Evaluating Implicit Measures to Improve Web Search.

Kelly, Teevan. Implicit Feedback for Inferring User Preference: A Bibliography.

Agichtein, Brill, Dumais. Improving Web Search Ranking by Incorporating User Behavior.

Xue, Zeng, Chen, Yu, Ma, Xi, Fan. Optimizing Web Search Using Web Click-through Data.
