# Crowdclustering


19 Oct 2013


Crowdsourcing

04/11/2013

Neelima Chavali

ECE 6504

Introduction

Crowd Kernel

The ESP Game

CrowdClustering

Experiment

Introduction

“The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community” (Wikipedia)

Combines the efforts of crowds of volunteers or part-time workers to produce a significant result

Applications

Testing & Refining a Product (Netflix)

Market Research

Knowledge Management (Wikipedia)

Customer Service (My Starbucks Ideas)

R&D

Computer Vision/Machine Learning

And many more fields

CROWD KERNEL

Paper 1

ML on a new domain

Describe the dataset as a $d$-dimensional representation of every object in the domain.

Requires expertise

Two representations:

Feature vector representation

Kernel representation
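The two representations can be contrasted with a minimal sketch, assuming a plain dot-product (linear) kernel, which is one common choice rather than anything the slides specify: a feature-vector representation stores a $d$-dimensional row per object, while a kernel representation stores only the $n \times n$ matrix of pairwise similarities $K_{ij}$. The function name `linear_kernel` and the toy data are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def linear_kernel(features: Matrix) -> Matrix:
    """Return the Gram matrix K[i][j] = <x_i, x_j> for n feature vectors."""
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]


# Three toy objects, each described by d = 2 hand-designed features.
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
K = linear_kernel(X)
print(K)  # [[1.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 2.0]]
```

Designing the features is the step that requires expertise; Crowd Kernel instead estimates $K$ directly from crowd answers.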

Slide credit: O. Tamuz

1. INPUT

Database of $n$ objects, say images.


2. CROWD QUERIES


3. OUTPUT

Embedding in $d$ dimensions


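FIT $K$ TO DATA

$$\sum_{\text{observed triples } (a,b,c)} \log p^{a}_{bc} = \log p^{a_1}_{b_1 c_1} + \log p^{a_2}_{b_2 c_2} + \log p^{a_3}_{b_3 c_3} + \cdots$$

$p^{a}_{bc}$ = Probability that a random turker reports “$a$ is more similar to $b$ than to $c$” $= f_\lambda$ evaluated on the entries of $K$ for $a$, $b$, $c$.

Find the maximum-likelihood $K$ with the diagonal entries $K_{ii}$ held fixed.

Equivalently, minimize the log-loss.

Gradient-projection descent.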
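A minimal sketch of the fitting step, under two assumptions that are not taken from the slide: (1) $f_\lambda$ is taken to be the relative form $p^{a}_{bc} = (\lambda + d_{ac}) / (2\lambda + d_{ab} + d_{ac})$, where $d_{xy} = K_{xx} + K_{yy} - 2K_{xy}$ is the squared distance induced by $K$; and (2) instead of projecting $K$ itself, $K = XX^\top$ is parameterized by an embedding $X$ whose rows are renormalized to unit length after every gradient step, which keeps $K$ positive semidefinite with $K_{ii} = 1$. The paper's exact $f_\lambda$, diagonal value and projection may differ; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Triple = Tuple[int, int, int]  # (a, b, c): "a is more similar to b than to c"


def sq_dist(x: Vector, y: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(x, y))


def normalize(x: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def prob(X: List[Vector], t: Triple, lam: float) -> float:
    """Assumed relative model: P(a is judged closer to b than to c)."""
    a, b, c = t
    d_ab, d_ac = sq_dist(X[a], X[b]), sq_dist(X[a], X[c])
    return (lam + d_ac) / (2 * lam + d_ab + d_ac)


def log_loss(X: List[Vector], triples: List[Triple], lam: float) -> float:
    return -sum(math.log(prob(X, t, lam)) for t in triples)


def init_embedding(n: int, d: int = 2, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]


def fit(X: List[Vector], triples: List[Triple], lam: float = 0.1,
        lr: float = 0.01, steps: int = 500) -> List[Vector]:
    X = [row[:] for row in X]
    d = len(X[0])
    for _ in range(steps):
        grad = [[0.0] * d for _ in X]
        for a, b, c in triples:
            d_ab, d_ac = sq_dist(X[a], X[b]), sq_dist(X[a], X[c])
            s = 2 * lam + d_ab + d_ac
            g_ab = 1.0 / s                       # d(-log p) / d(d_ab)
            g_ac = 1.0 / s - 1.0 / (lam + d_ac)  # d(-log p) / d(d_ac)
            for k in range(d):
                diff_ab = X[a][k] - X[b][k]
                diff_ac = X[a][k] - X[c][k]
                grad[a][k] += 2 * (g_ab * diff_ab + g_ac * diff_ac)
                grad[b][k] -= 2 * g_ab * diff_ab
                grad[c][k] -= 2 * g_ac * diff_ac
        # Gradient step, then projection: unit-norm rows give K_ii = 1.
        X = [normalize([x - lr * g for x, g in zip(row, grow)])
             for row, grow in zip(X, grad)]
    return X


def kernel(X: List[Vector]) -> List[List[float]]:
    return [[sum(u * v for u, v in zip(xi, xj)) for xj in X] for xi in X]


# Toy data: objects {0, 1} are alike, {2, 3} are alike.
triples = [(0, 1, 2), (0, 1, 3), (1, 0, 2), (1, 0, 3),
           (2, 3, 0), (2, 3, 1), (3, 2, 0), (3, 2, 1)]
X0 = init_embedding(4)
X1 = fit(X0, triples)
print(log_loss(X0, triples, 0.1), log_loss(X1, triples, 0.1))
```

The embedding parameterization trades the PSD-cone projection for a cheap row normalization; it also fixes the embedding dimension $d$ up front.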


ALGORITHM

Turk random triples

Fit $K$ to all data so far (maximum-likelihood fit to the logistic or relative model)

Turk the “most informative” triples

We use the probabilistic model plus information gain to decide how informative a triple is.
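The selection step can be sketched as follows. This swaps in a simpler criterion than the information gain named on the slide: it asks the unasked triple whose predicted answer is most uncertain, i.e. has the highest binary entropy under the current model. `prob` stands for whatever fitted model is in use (for example the logistic or relative model); all names are hypothetical.

```python
import math
from itertools import permutations
from typing import Callable, List, Set, Tuple

Triple = Tuple[int, int, int]  # (a, b, c): "is a more similar to b or to c?"


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no answer that is 'b' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_uncertain_triples(n: int, prob: Callable[[Triple], float],
                           asked: Set[Triple], k: int) -> List[Triple]:
    """Return the k unasked triples whose predicted answer is closest to a coin flip."""
    # (a, b, c) and (a, c, b) are the same question, so keep only b < c.
    candidates = [t for t in permutations(range(n), 3)
                  if t[1] < t[2] and t not in asked]
    candidates.sort(key=lambda t: binary_entropy(prob(t)), reverse=True)
    return candidates[:k]


def toy_prob(t: Triple) -> float:
    # Toy model: only (0, 1, 2) is a coin flip; every other answer is predictable.
    return 0.5 if t == (0, 1, 2) else 0.9


print(most_uncertain_triples(3, toy_prob, set(), 1))  # [(0, 1, 2)]
```

Entropy of the predicted answer only measures uncertainty about the reply itself; the information-gain criterion on the slide instead scores how much an answer is expected to tell about the objects' positions.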


LURE OF ADAPTIVITY

Toy example: complete binary trees with $n$ leaves, depth $O(\log n)$

Avg. cost is $\Theta(n)$ from random queries.

Avg. cost is $\Theta(\log n)$ from adaptive queries.

(Figure: example hierarchy with the nodes Tie store, Bow ties, Neck ties, Tie clips, Scarves)
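The $\Theta(\log n)$ side of the toy example can be checked with a short sketch: if each adaptive query asks whether the secret leaf is more similar to the left or the right subtree of the current node, and answers are noise-free (an assumption), one query per level suffices. Leaves are numbered so that each subtree covers a contiguous range; `locate_adaptively` is a hypothetical name.

```python
from typing import Tuple


def locate_adaptively(n_leaves: int, target: int) -> Tuple[int, int]:
    """Descend a complete binary tree to the target leaf; return (leaf, queries used)."""
    lo, hi = 0, n_leaves  # the current subtree covers leaves lo .. hi-1
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        # Noise-free answer to: "is the target more similar to the left subtree
        # (leaves lo .. mid-1) than to the right subtree (leaves mid .. hi-1)?"
        if target < mid:
            hi = mid
        else:
            lo = mid
    return lo, queries


print(locate_adaptively(1024, 700))  # (700, 10): log2(1024) = 10 queries
```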


PERFORMANCE EVALUATION

20 Questions metric:

A random object is chosen secretly.

The system asks 20 questions and then ranks the objects by likelihood.

Dataset: 75 ties + 75 tiles + 75 flags
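One way to compute the metric, sketched under assumptions not taken from the slide: objects sit at toy 1-D positions, each answer's likelihood uses the same relative-model form $(\lambda + d_c)/(2\lambda + d_b + d_c)$, and the score is the rank of the secret object after sorting all objects by total log-likelihood (rank 1 is best). Names and numbers are illustrative; a real run would use 20 questions rather than one.

```python
import math
from typing import List, Sequence, Tuple


def answer_prob(x: float, xb: float, xc: float, lam: float = 0.1) -> float:
    """P(an object at x is judged more similar to b than to c)."""
    d_b, d_c = (x - xb) ** 2, (x - xc) ** 2
    return (lam + d_c) / (2 * lam + d_b + d_c)


def rank_of_secret(positions: Sequence[float], questions: List[Tuple[int, int]],
                   answers: List[bool], secret: int) -> int:
    """Rank (1 = most likely) of the secret object after all questions are answered.

    questions[i] = (b, c); answers[i] is True if the reply was "more similar to b".
    """
    scores = []
    for x in positions:
        ll = 0.0
        for (b, c), said_b in zip(questions, answers):
            p = answer_prob(x, positions[b], positions[c])
            ll += math.log(p if said_b else 1.0 - p)
        scores.append(ll)
    order = sorted(range(len(positions)), key=lambda i: scores[i], reverse=True)
    return order.index(secret) + 1


positions = [0.0, 5.0, 10.0]
print(rank_of_secret(positions, [(0, 2)], [True], secret=0))  # 1
```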


LABELING IMAGES WITH A COMPUTER GAME

Paper 2

IMAGE SEARCH ON THE WEB USES FILENAMES AND HTML TEXT

Slide Credit: Luis von Ahn

THE ESP GAME

TWO-PLAYER ONLINE GAME

PARTNERS DON’T KNOW EACH OTHER AND CAN’T COMMUNICATE

OBJECT OF THE GAME: TYPE THE SAME WORD

THE ONLY THING IN COMMON IS AN IMAGE
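The core rule, that a round ends as soon as both partners have typed the same word for the shared image, can be sketched as below. The timestamped guess format, the case-insensitive matching and the toy round are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Guess = Tuple[float, int, str]  # (seconds since round start, player 1 or 2, word)


def first_agreement(guesses: List[Guess]) -> Optional[Tuple[float, str]]:
    """Return (time, word) of the first word typed by both players, or None."""
    seen = {1: set(), 2: set()}
    for t, player, word in sorted(guesses):
        w = word.strip().lower()
        seen[player].add(w)
        other = 2 if player == 1 else 1
        if w in seen[other]:  # the partner already typed this word
            return t, w
    return None


round_guesses = [(1.0, 1, "car"), (1.5, 2, "boy"), (2.2, 1, "hat"), (3.0, 2, "CAR")]
print(first_agreement(round_guesses))  # (3.0, 'car')
```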


(Figure: example round of the ESP game. PLAYER 1 and PLAYER 2 each type guesses for the same image; the guesses shown include CAR, BOY, KID and HAT. Once both have typed CAR, both screens show “SUCCESS! YOU AGREE ON CAR”.)


CHEATING?

IF A PAIR PLAYS TOO FAST, WE DON’T RECORD THE WORDS THEY AGREE ON
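The speed check can be sketched as a filter applied before anything is stored. The slide does not say how “too fast” is measured; the average time per image and the 2-second threshold below are hypothetical placeholders.

```python
from typing import List, Optional, Tuple

Round = Tuple[str, Optional[str], float]  # (image id, agreed word or None, seconds taken)

MIN_AVG_SECONDS = 2.0  # hypothetical threshold, not from the source


def labels_to_record(rounds: List[Round],
                     min_avg_seconds: float = MIN_AVG_SECONDS) -> List[Tuple[str, str]]:
    """Drop all of a pair's agreements if the pair played suspiciously fast."""
    if not rounds:
        return []
    avg = sum(secs for _, _, secs in rounds) / len(rounds)
    if avg < min_avg_seconds:
        return []
    return [(img, word) for img, word, _ in rounds if word is not None]


normal = [("img1", "car", 6.0), ("img2", None, 9.5)]
too_fast = [("img1", "car", 0.4), ("img2", "dog", 0.6)]
print(labels_to_record(normal), labels_to_record(too_fast))  # [('img1', 'car')] []
```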


WE GIVE PLAYERS TEST IMAGES FOR WHICH WE KNOW ALL THE COMMON LABELS:

WE ONLY STORE A PLAYER’S GUESSES IF THEY SUCCESSFULLY LABEL THE TEST IMAGES
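The test-image check amounts to a gate on a player's output, sketched below with hypothetical names: `known_labels` maps each test image to its set of known common labels, and a player's guesses for the other images are kept only if every test image received one of those labels.

```python
from typing import Dict, Set


def passes_test_images(player_labels: Dict[str, str],
                       known_labels: Dict[str, Set[str]]) -> bool:
    """True if the player gave a known common label for every test image."""
    return all(player_labels.get(img) in labels for img, labels in known_labels.items())


def guesses_to_store(player_labels: Dict[str, str],
                     known_labels: Dict[str, Set[str]]) -> Dict[str, str]:
    """Keep the player's labels for non-test images only if the test was passed."""
    if not passes_test_images(player_labels, known_labels):
        return {}
    return {img: w for img, w in player_labels.items() if img not in known_labels}


known = {"test1": {"dog", "puppy"}}
honest = {"test1": "dog", "img7": "beach"}
cheater = {"test1": "asdf", "img7": "beach"}
print(guesses_to_store(honest, known), guesses_to_store(cheater, known))
# {'img7': 'beach'} {}
```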


THE ESP GAME IS FUN

MANY PEOPLE PLAY OVER 20 HOURS A WEEK

3.2 MILLION LABELS WITH 22,000 PLAYERS
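For scale, the two figures on this slide imply an average of roughly 145 labels per player; the quick check below assumes both numbers refer to the same period, which the slide does not state.

```python
labels = 3_200_000
players = 22_000
labels_per_player = labels / players  # average, not a per-player guarantee
print(round(labels_per_player, 1))  # 145.5
```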


LABELING THE ENTIRE WEB

INDIVIDUAL GAMES IN YAHOO! AND MSN AVERAGE OVER 10,000 PLAYERS AT A TIME

5,000 PEOPLE PLAYING SIMULTANEOUSLY CAN LABEL ALL IMAGES IN 30 DAYS!


A FEW MILLION LABELS:

CAN IMPROVE IMAGE SEARCH

CAN BE USED TO IMPROVE COMPUTER VISION

CAN BE USED TO IMPROVE ACCESSIBILITY FOR THE VISUALLY IMPAIRED


CROWDCLUSTERING

Paper 3

What did they do?

Use crowdsourcing to discover categories

How? Approach

Each worker is given $M$ images to cluster.

Images are represented in a $d$-dimensional Euclidean space (hidden variables).

Atomic clusters: Dirichlet process mixture model

Worker: pairwise binary classifier with a bias (hidden variables)

A worker’s tendency to label a pair of images as belonging to the same cluster is modelled as a pairwise logistic regression.
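The worker model can be sketched as follows, assuming the pairwise logistic regression takes a bilinear form: worker $j$ has a matrix $W_j$ and a bias $\tau_j$, and puts images $a$ and $b$ in the same cluster with probability $\sigma(x_a^\top W_j x_b + \tau_j)$, where $x_a, x_b$ are the hidden $d$-dimensional image positions. The exact parameterization in the paper may differ; the names here are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_same_cluster(x_a: Sequence[float], x_b: Sequence[float],
                   W_j: Matrix, tau_j: float) -> float:
    """P(worker j labels images a and b as the same cluster), assumed bilinear form."""
    z = sum(x_a[i] * W_j[i][k] * x_b[k]
            for i in range(len(x_a)) for k in range(len(x_b)))
    return sigmoid(z + tau_j)


identity = [[1.0, 0.0], [0.0, 1.0]]
print(round(p_same_cluster([1.0, 0.0], [1.0, 0.0], identity, 0.0), 3))   # 0.731
print(round(p_same_cluster([1.0, 0.0], [-1.0, 0.0], identity, 0.0), 3))  # 0.269
```

In this sketch a positive bias $\tau_j$ describes a worker who tends to lump images together, and a negative one a worker who tends to split them.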

How? Approach

The number of atomic clusters, together with their means and covariances, needs to be estimated.
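Because the atomic clusters come from a Dirichlet process mixture, their number does not have to be fixed in advance. The sketch below samples cluster assignments from the Chinese restaurant process, the standard sequential view of a Dirichlet process prior; the concentration parameter `alpha` and the function name are illustrative, and this shows only the prior over partitions, not the paper's inference procedure.

```python
import random
from typing import List


def crp_assignments(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels for n items from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts: List[int] = []  # number of items already in each cluster
    labels: List[int] = []
    for i in range(n):
        # Item i joins cluster k with prob counts[k] / (i + alpha),
        # or opens a new cluster with prob alpha / (i + alpha).
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels


labels = crp_assignments(20, alpha=1.0)
print(len(set(labels)), "clusters for 20 images")
```

Larger `alpha` makes new clusters more likely, so the number of atomic clusters is itself something inferred from the data.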

EXPERIMENTS

Color?


Crowdsourcing on Mechanical Turk


Results

Black; Red: bar charts (categories shown: Black, Red, Pink; Red, Coral)

Lavender (male): bar charts (category labels not recoverable)

Purple (female): bar charts (categories shown: Lavender, Purple, Violet, Maroon; Lavender, Purple, Pink, Maroon)

Pink (female): bar charts (categories shown: Violet, Pink, Lavender)

Violet (female): bar charts (categories shown: Purple, Violet, Indigo; Purple, Violet)

Acknowledgements

Dr. Parikh

Pavan Ghatty