Nov 15, 2013

Diversity in search: what, how, and what for?


Bettina Berendt


Dept. Computer Science, KU Leuven

Thanks to


Sebastian Kolbe-Nusser


Anett Kralisch


Siegfried Nijssen


Ilija Subašić


Mathias Verbeke


Hugo Zaragoza


...

Diversity in natural language


diverse (s#2), various: distinctly dissimilar or unlike

..., diversity (s#1), ..., variety: noticeable heterogeneity

(WordNet)

“the fact that members of a set are different from one another”

Why is diversity interesting for search?

“People like to see a range of different, non-redundant things/views/etc.”

“Different people search differently.”

How?

When / under what conditions?

(What) can we do?

What is diverse?


Documents


the relevance of a document must be determined considering the documents appearing before it (Goffman, 1964)


E.g. MMR (Carbonell & Goldstein, 1998)


Many further developments, e.g. for images


Presentation choices, e.g. re-ranking or clustering?
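A minimal sketch of MMR-style re-ranking, to make the "relevance depends on the documents appearing before it" idea concrete. It greedily picks the document that maximises lam * sim(d, query) - (1 - lam) * max sim(d, already selected). The bag-of-words cosine similarity, the function names and the toy documents are assumptions made for illustration, not part of the talk.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_rank(query, docs, lam=0.5):
    """Return document indices in MMR order (greedy selection).

    lam = 1.0 ranks by relevance only; smaller values penalise
    redundancy with already selected documents more strongly.
    """
    query_vec = Counter(query.lower().split())
    vecs = [Counter(d.lower().split()) for d in docs]
    remaining = list(range(len(docs)))
    selected = []

    def score(i):
        # Redundancy = similarity to the most similar document chosen so far.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * cosine(vecs[i], query_vec) - (1 - lam) * redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


docs = ["jaguar car speed", "jaguar car speed", "jaguar cat habitat"]
print(mmr_rank("jaguar car", docs, lam=0.5))  # the duplicate is pushed to the end
print(mmr_rank("jaguar car", docs, lam=1.0))  # pure relevance ordering
```

With lam = 0.5 the exact duplicate drops below the less relevant but novel document; with lam = 1.0 the ranking falls back to relevance alone.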

People


“The term diversity is a form of euphemistic shorthand to describe differences in racial or ethnic classifications, age, gender, religion, philosophy, physical abilities, socioeconomic background, sexual orientation, gender identity, intelligence, mental health, physical health, genetic attributes, behavior, attractiveness, place of origin, cultural values, or political view as well as other identifying features.”


http://en.wikipedia.org/wiki/Diversity_(politics)


Knowledge and its articulations (= documents in a wider sense?!)

“Knowledge and its articulations are strongly influenced by diversity in, e.g., cultural backgrounds, schools of thought, geographical contexts.”

“LivingKnowledge will study the effect of diversity and time on opinions and bias.”

“The goal [is] to improve navigation and search in very large multimodal datasets (e.g., the Web itself).”

How we got here

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

PORPOISE, STORIES tools for graphical news summarization and understanding

Collaborative re-use of literature search results

Diversity of users

Diversity of diversity

Diversity of documents

Why this talk?

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

PORPOISE, STORIES tools for graphical news summarization and understanding

Collaborative re-use of literature search results

Diversity of users

Diversity of diversity

Diversity of documents

e.g. Information Retrieval J. 2009

Proceedings Living Web WS@ISWC 2009

Inf. Processing & Management 2010

e.g. Knowledge and Information Systems J. 2009

Towards an integrated understanding of diversity

The impact of linguistic diversity on Web usage and thereby on the Web

Or: Why are non-English languages under-represented on the Web?

A web-analysis approach asking for underlying

cognitive-linguistic

behavioural

attitude

factors

A simple expectation of how much content exists in which language

But: Dynamics of content creation, link setting, link following, attitudes, and use

People create less content

People link less to content

People use links less

People think the content is bad

... and use it less

Under-representation!

Underlying data and methods


Database of countries and official languages


Distribution comparisons between

worldwide proportions of native speakers of different languages

worldwide distribution of servers registered by country

crawler analysis of links to a multilingual site S

log analysis assigning each session a native language

log analysis of (user native language) and (S-entry-page language)

Questionnaire/TAM analysis of native and non-native users of S: usability, ease of use, competence in English, beliefs about availability of content in native language

Some questions


Does one find such dynamics also in search engines?

What factors stop or reverse such language-marginalisation trends?

Critical mass?

Laws?

Volunteers?

Did / can Web 2.0/3.0 change this?

(When) is it better to work without pre-defined labels for users?




Part 2: An approach that ...


Motivation (1): Diversity of people is ...

Speaking different languages (etc.): localisation / internationalisation

Having different abilities: accessibility

Liking different things: collaborative filtering

Structuring the world in different ways: ?

Motivation (2): Diversity-aware applications ...

Must have a (formal) notion of diversity

Can follow a

“personalization approach”: adapt to the user’s value on the diversity variable(s); transparently? Is this paternalistic?

“customization approach”: show the space of diversity; allow choice / raise awareness / semi-automatic!

Measuring grouping diversity

Diversity = 1 - similarity = 1 - normalized mutual information (NMI)

[Figure: example groupings, e.g. "By colour & ...", with NMI = 0 and NMI = 0.35]
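A small sketch of how such a grouping diversity could be computed. A grouping is given as one group label per item, in a fixed item order. The slide does not say which NMI normalisation is used; dividing by the geometric mean of the two entropies is an assumption here, and all function names are illustrative.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a grouping given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two groupings of the same items."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI, normalised by sqrt(H(a) * H(b)) (one of several common variants)."""
    if len(a) != len(b):
        raise ValueError("both groupings must cover the same items")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # A single-group grouping carries no information; treat two
        # single-group groupings as identical, anything else as unrelated.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


def grouping_diversity(a, b):
    """Diversity = 1 - NMI, as on the slide."""
    return 1.0 - nmi(a, b)


by_colour = ["red", "red", "blue", "blue"]
by_shape = ["square", "circle", "square", "circle"]
print(grouping_diversity(by_colour, by_shape))   # independent groupings
print(grouping_diversity(by_colour, [1, 1, 2, 2]))  # same grouping, renamed
```

Group names do not matter, only which items end up together: renaming the groups leaves the diversity at (numerically) zero.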

Measuring user diversity

“How similarly do two users group documents?”

For each query q, consider their groupings gr: per-query diversity gdiv(A,B,q)

For various queries: aggregate into gdiv(A,B)
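A sketch of the aggregation step. The slides do not state how the per-query values are combined, so taking the mean over the queries both users have grouped is an assumption. The per-query measure is passed in as a function; the stand-in used below (fraction of document pairs on which the two groupings disagree, i.e. 1 - Rand index) only keeps the example self-contained and is not the measure from the talk.

```python
from itertools import combinations
from statistics import mean


def pair_disagreement(g1, g2):
    """Stand-in per-query diversity: share of item pairs that one grouping
    puts together and the other keeps apart."""
    pairs = list(combinations(range(len(g1)), 2))
    disagree = sum((g1[i] == g1[j]) != (g2[i] == g2[j]) for i, j in pairs)
    return disagree / len(pairs)


def aggregate_gdiv(groupings_a, groupings_b, per_query_div):
    """Aggregate per-query diversities of users A and B.

    groupings_a / groupings_b map a query string to that user's grouping
    (one group label per document, same document order for both users).
    Assumption: aggregation = mean over the queries both users grouped.
    """
    shared = sorted(set(groupings_a) & set(groupings_b))
    if not shared:
        raise ValueError("the two users share no query")
    return mean(per_query_div(groupings_a[q], groupings_b[q]) for q in shared)


user_a = {"web mining": [0, 0, 1, 1], "rfid": [0, 1, 0, 1]}
user_b = {"web mining": [1, 1, 0, 0], "rfid": [0, 0, 1, 1]}
print(aggregate_gdiv(user_a, user_b, pair_disagreement))
```

Computing this value for every pair of users gives the kind of pairwise diversity matrix that can then be fed into multidimensional scaling.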



... and now: the application domain

... that’s only the 1st step!

Workflow

1. Query

2. Automatic clustering

3. Manual regrouping

4. Re-use

   1. Learn + present way(s) of grouping

   2. Transfer the constructed concepts

Concepts

Extension: the instances in a group

Intension

Ideally: “squares vs. circles”

Pragmatically: defined via a classifier

Step 1: Retrieve


CiteseerX via OAI


Output: set of


document IDs,


document details


their texts
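For orientation, a sketch of what an OAI-PMH harvesting request looks like. It only builds the request URL; `verb`, `metadataPrefix`, `set` and `resumptionToken` are standard OAI-PMH arguments, while the base URL below is a placeholder and not the actual CiteseerX endpoint. How the tool turns a query into a set of document IDs is not shown on the slide and is not modelled here.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc",
                         set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    Per the protocol, resumptionToken is an exclusive argument: when it is
    given, no other argument besides the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint (assumption), first page and a follow-up page.
BASE = "https://example.org/oai"
first_page = oai_list_records_url(BASE)
next_page = oai_list_records_url(BASE, resumption_token="abc123")
print(first_page)
print(next_page)
```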

Step 2: Cluster


“the classic bibliometric solution”

CiteseerCluster:

Similarity measure: co-citation, bibliographic coupling, word or LSA similarity, combinations

Clustering algorithm: k-means, hierarchical

Damilicious: phrases

Lingo

How to choose the “best”?

Experiments: Lingo better than k-means at reconstruction and extension-over-time
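The two citation-based similarity measures named above can be sketched as raw counts. The citation map and all identifiers are made up for illustration; how CiteseerCluster normalises or combines the measures is not stated on the slide.

```python
def bibliographic_coupling(cites, a, b):
    """Number of references that documents a and b both cite."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites, a, b):
    """Number of documents that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


# Hypothetical citation map: document -> set of documents it cites.
cites = {
    "p1": {"r1", "r2"},
    "p2": {"r2", "r3"},
    "p3": {"p1", "p2"},
    "p4": {"p1", "p2", "r1"},
}
print(bibliographic_coupling(cites, "p1", "p2"))  # shared reference: r2
print(co_citation(cites, "p1", "p2"))             # cited together by p3 and p4
```

Bibliographic coupling is fixed once both documents are published, whereas co-citation counts can grow as new citing documents appear.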

Step 3 (a): Re-organise & work on document groups

Step 3 (b): Visualising document groups

Steps 4+5: Re-use

Basic idea:

1. learn a classifier from the final grouping (Lingo phrases)

2. apply the classifier to a new search result

“re-use semantics”

Whose grouping?

One’s own

Somebody else’s

Which search result?

“the same” (same query, structuring by somebody else)

“More of the same” (same query, later time, more doc.s)

“related” (... Measured how? ...)

arbitrary
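The learn-then-apply idea can be sketched with a deliberately simple stand-in classifier. The slide says the classifier is learned from the final grouping via Lingo phrases; the nearest-centroid bag-of-words model below is an assumption that only illustrates the two steps, and the example groups and texts are invented.

```python
from collections import Counter
from math import sqrt


def bag(text):
    """Lower-cased bag of words as a Counter."""
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def learn_grouping(grouped_docs):
    """Step 1: learn one term-count centroid per group of the final grouping."""
    centroids = {}
    for label, texts in grouped_docs.items():
        total = Counter()
        for text in texts:
            total.update(bag(text))
        centroids[label] = total
    return centroids


def apply_grouping(centroids, new_docs, fallback="Other topics"):
    """Step 2: assign each new document to its most similar group;
    documents matching no group go to the remainder group."""
    result = {}
    for text in new_docs:
        vec = bag(text)
        label, sim = max(
            ((name, cosine(vec, c)) for name, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        result.setdefault(label if sim > 0 else fallback, []).append(text)
    return result


final_grouping = {
    "Web mining": ["web usage mining logs", "mining web link structure"],
    "RFID": ["rfid tag privacy", "rfid sensor data"],
}
model = learn_grouping(final_grouping)
regrouped = apply_grouping(
    model, ["web logs analysis", "rfid privacy protocols", "quantum chemistry"]
)
print(regrouped)
```

The same `model` can be applied to one's own later result for the same query ("more of the same") or handed to another user, which is what makes the grouping re-usable.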

Visualising user diversity (1)

Simulated users with different strategies


U0: did not change anything (“System”)

U1: tried to produce a better fit of the document groups to the cluster intensions; 5 regroupings

U2: attempted to move everything that did not fit well into the remainder group “Other topics”, & better fit; 10 regroupings

U3: attempted to move everything from “Other topics” into matching real groups; 5 regroupings

U4: regrouping by author and institution; 5 regroupings

5*5 matrix of diversities gdiv(A,B,q)

multidimensional scaling
Visualising user diversity (2)

aggregated using gdiv(A,B)

[Figure: panels labelled Web mining, Data mining, RFID]

Evaluating the application


Clustering only: Does it generate meaningful document groups?

yes (tradition in bibliometrics)

but: data?

Small expert evaluation of CiteseerCluster

Clustering & regrouping

End-user experiment with CiteseerCluster

5-person formative user study of Damilicious

The Damilicious tool: Summary and (some) open questions

A tool that helps users in sense-making, exploring diversity, and re-using semantics

diversity measures when queries and result sets are different?

how to best present diversity?

How to integrate into an environment supporting user and community contexts?

Incentives to use the functionalities?

how to find the best balance between similarity and diversity?

which measures of grouping diversity are most meaningful?

Extensional?

Intensional? Structure-based? Hybrid? (cf. ontology matching)

which other sources of user diversity?

Diversity and relevance: can we learn from user-dependent relevance judgements?

Some lessons learned (or questions raised?)


We need to embrace diversity.


We need to take into account


The diversity of documents / knowledge


The diversity of people


The diversity of diversity.


We need to be clear about what we mean.


We need to ask whether / when “striving for diversity” is in itself A Good Thing.

We need to ask whether / when “raising awareness of diversity” is in itself A Good Thing.

Thanks!
