The use of relevance criteria

Dec 13, 2013

The use of relevance criteria during predictive judgment

An eye tracking approach

Panos Balatsoukas & Ian Ruthven

Computer and Information Sciences

University of Strathclyde

outline


overall study


to understand assessments of information relevance in web searching

particularly how information features are used to decide on relevance

mixed-method research design based on the use of talk-aloud protocols, recording of eye movements and post-search interviews

study of predictive relevance assessments

focussing particularly on the Google interface

how people interact with a search interface to decide which information paths to follow


Definition [1]: predictive judgments

Predictive judgment

Evaluative judgment

Definition [2]: Relevance criteria


a relevance criterion can be defined as the parameter or value by which users determine the relevance of a retrieved object at a certain point in time

a number of studies have shown that Topicality is the most frequently used criterion during relevance judgments

at least as the first criterion




Definition [2]: relevance criteria

Currency/Recency

Availability

User background

Tangibility

Scope/depth

Authority/quality

Presentation

Definition [2]: relevance criteria


criteria can be either negative or positive:

e.g. “This source does not discuss the topic of my search” (negative occurrence of the criterion of Topicality)

e.g. “The paper is on topic and presents examples of programming code” (positive occurrences of the criteria of Topicality and Tangibility)

Aim and objectives


to explore the relationship between relevance criteria use and human eye movements (e.g. number of fixations, fixation length and scanpaths) during the process of predictive relevance judgment



to date, studies on user-defined relevance criteria have focused on the use of occurrence frequencies

data subject to post hoc rationalisation and re-construction effects

lack of objective and real-time behavioural data during the relevance judgment process



advantages of eye tracking


real-time data

opportunity to associate participants’ cognitive processes with visual searching behaviour

different types of eye movement data

number and length of fixations, gaze diameter, scanpaths
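
To make the eye movement measures listed above concrete, here is a minimal Python sketch of how fixation counts, fixation lengths and a scanpath might be derived from a sequence of fixations. It assumes each fixation is already mapped to an area of interest; the `Fixation` type, the area labels and the sample durations are hypothetical and are not taken from the study's eye tracker output.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Fixation:
    aoi: str          # hypothetical area-of-interest label, e.g. "title"
    duration: float   # fixation length in seconds


def fixation_count(fixations, aoi):
    """Number of fixations that landed on the given area of interest."""
    return sum(1 for f in fixations if f.aoi == aoi)


def total_fixation_length(fixations, aoi):
    """Summed fixation length (secs) on the given area of interest."""
    return sum(f.duration for f in fixations if f.aoi == aoi)


def scanpath(fixations):
    """Order in which areas were visited, collapsing consecutive repeats."""
    return [aoi for aoi, _ in groupby(f.aoi for f in fixations)]


# Made-up sample data, for illustration only.
sample = [
    Fixation("title", 0.25),
    Fixation("title", 0.25),
    Fixation("summary", 0.5),
    Fixation("url", 0.125),
    Fixation("title", 0.25),
]
```

Collapsing consecutive repeats in `scanpath` is one design choice among several; a scanpath could equally keep every fixation as its own step.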




Questions


what is the cognitive effort spent on specific relevance criteria?

is there an effect of ranking order of search results and surrogate components?

is there an association between criteria and grades of relevance?

participants


24 students

own information problem

searched for about 25 minutes

only focus on interaction with Google

although they could use other search engines

later studies also looked at other groups and pre-assigned information tasks

Research design

Post search interview


why did you decide to click on that link?

did you expect to find Relevant / Partially Relevant / Not Relevant information? (grades of relevance)

what kind of information did you expect to find?

what words or phrases helped you make a decision?

281 fixated surrogates


Eye movements: The participant fixates on the term “SQLite” which appears in the title of the first surrogate.

Talk aloud: In the analysis of the talk aloud protocol the participant justified this fixation as follows: “It looks relevant to my topic as it talks about SQLite”

coding

Participant ID = 08

Word fixated = SQLite

Talk aloud: “It looks relevant to my topic as it talks about SQLite”

Interviews: “I found it Relevant because of the word SQLite on the Title”

Category: Topicality (1*)

Relevance grade: Relevant (3)

Interviews: “I found it Relevant because of the word SQLite on the Title […] I expected to find information about SQLITE documentation”

+ “I haven’t found anything interesting elsewhere, I might be lucky here”

+ Judgment based on ranking: ‘if Google thinks it is good…’

Participant ID = 08

Word fixated = SQLite

Talk aloud: “It looks relevant to my topic as it talks about SQLite”

Interviews: “I found it Relevant because of the word SQLite on the Title”

Category: Topicality (1*)

Relevance grade: Relevant (3)

Location: 1.1. First surrogate / 1.2.2.1. Title

Participant ID = 08

Word fixated = SQLite

Talk aloud: “It looks relevant to my topic as it talks about SQLite”

Interviews: “I found it Relevant because of the word SQLite on the Title”

Category: Topicality (1*)

Relevance grade: Relevant (3)

Location: 1.1. First surrogate / 1.2.2.1. Title

Eye movement data: Fixation number = 2; Total fixation length = 0.46 secs; plus some others
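
The coded observation above can be pictured as a single record. The following Python sketch is only an illustration of that idea: the class name, field names and the derived per-fixation average are assumptions made here, while the example values (participant 08, SQLite, Topicality, Relevant, first surrogate, title, 2 fixations, 0.46 secs) are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class CodedFixation:
    """One coded observation; all field names are hypothetical."""
    participant_id: str
    word_fixated: str
    criterion: str                # "Category" on the slide
    relevance_grade: str          # e.g. "Relevant"
    location: tuple               # (surrogate, surrogate component)
    fixation_number: int
    total_fixation_length: float  # seconds

    def mean_fixation_length(self) -> float:
        """Average length of one fixation (an assumed derived measure)."""
        return self.total_fixation_length / self.fixation_number


# Values taken from the worked example for participant 08.
example = CodedFixation(
    participant_id="08",
    word_fixated="SQLite",
    criterion="Topicality",
    relevance_grade="Relevant",
    location=("First surrogate", "Title"),
    fixation_number=2,
    total_fixation_length=0.46,
)
```

Keeping the talk-aloud and interview quotes alongside such a record would be a natural extension; they are left out here to keep the sketch short.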

Data analysis process

RESULTS

Relevance criteria and fixation data [1]

Top 5 criteria

Criteria          Mean number of fixations (per surrogate)   Mean fixation length (secs)   Occurrence
Topicality        1.37                                       0.52                          0.5
Scope             0.72                                       0.33                          0.3
User background   0.45                                       0.7                           0.2
Quality           0.36                                       0.18                          0.15
Tangibility       0.17                                       0.07                          0.1

The 5 least fixated criteria

Criteria                   Mean number of fixations (per surrogate)   Mean fixation length (secs)   Occurrence
Recency                    0.09                                       0.04                          0.05
Ranking                    0.09                                       0.03                          0.06
Serendipity                0.07                                       0.04                          0.04
Format                     0.04                                       0.01                          0.04
Document characteristics   0.03                                       0.01                          0.04

Top 5 criteria: significantly more cognitive effort (as measured by number of fixations and fixation length) than for other criteria

5 least fixated criteria: significantly less cognitive effort (as measured by number of fixations and fixation length) than for other criteria
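
The slides do not state how the per-criterion means were calculated. As one plausible reading, the sketch below totals fixation counts and fixation lengths per criterion and divides by the number of fixated surrogates. The function name, the record layout and the sample numbers are all hypothetical; they do not reproduce the study's data.

```python
from collections import defaultdict


def mean_per_surrogate(records, n_surrogates):
    """Return {criterion: (mean fixations, mean fixation secs)} per surrogate.

    records is an iterable of (criterion, fixation_count, fixation_secs)
    tuples; dividing totals by n_surrogates is an assumption of this sketch.
    """
    counts = defaultdict(int)
    secs = defaultdict(float)
    for criterion, n_fix, length in records:
        counts[criterion] += n_fix
        secs[criterion] += length
    return {
        c: (counts[c] / n_surrogates, secs[c] / n_surrogates)
        for c in counts
    }


# Made-up records covering two surrogates, for illustration only.
records = [
    ("Topicality", 2, 0.5),
    ("Topicality", 1, 0.25),
    ("Scope", 1, 0.25),
]
summary = mean_per_surrogate(records, n_surrogates=2)
```

In the study itself the denominator would presumably be the 281 fixated surrogates, but that is an inference rather than something the slides confirm.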


Exhaustive evaluation

Economic evaluation

Relevance criteria and ranking order [1]

Lots of activity and decision making on first few surrogates

Less time spent on topicality as they use lower ranked surrogates

Criteria reflecting personal background and quality used almost throughout

Relevance criteria and surrogate components [1]

[Figure: mean number of fixations across surrogate components (Title, URL, Summary)]

Use of different surrogate components for different decisions

Title for decisions of topicality and scope

Summary for decisions on tangibility, affectiveness and recency

URL for quality and resource type

…and across time

[Figure: fixation length for Topicality across the initial search, intermediate searches and final search]

Can assess behaviour when assessing criteria across time

This is fixation length for topicality

Lot of time spent assessing (deciding) what is not relevant in the middle of a search

Less time on partial relevance towards end

Similar to Greisdorf and Spink





conclusions


what is the cognitive effort spent on specific relevance criteria?

more effort on topicality, scope, quality and user subjective criteria

in the sense that they apply to more relevance decisions

and that they can take longer to make

is there an effect of ranking order of search results and surrogate components?

less time spent on some criteria further down the ranking

different surrogate components used to decide on different criteria

but changing use of criteria not reflected in different surrogates

is there an association between criteria and grades of relevance?

more/less time spent on some criteria at early/mid/late points in a search

have also investigated, with some success, predicting relevance decisions based on the criteria being used