Oct 16, 2013

Unsupervised Domain Adaptation: From Practice to Theory

John Blitzer



Unsupervised Domain Adaptation

Running with Scissors
Title: Horrible book, horrible. This book was horrible. I read half, suffering from a headache the entire time, and eventually i lit it on fire. 1 less copy in the world. Don't waste your money. I wish i had the time spent reading this book back. It wasted my life.

Avante Deep Fryer; Black
Title: lid does not work well... I love the way the Tefal deep fryer cooks, however, I am returning my second one due to a defective lid closure. The lid may close initially, but after a few uses it no longer stays closed. I won't be buying this one again.

Source → Target


Target-Specific Features

(Same source and target reviews as above, with the target-specific phrases highlighted.)

Learning Shared Representations

Source-specific: fascinating, boring, read half, couldn't put it down
Shared: fantastic, highly recommended, waste of money, horrible
Target-specific: defective, sturdy, leaking, like a charm

Shared Representations: A Quick Review

Blitzer et al. (2006, 2007). Shared CCA.
  Tasks: part-of-speech tagging, sentiment.
Xue et al. (2008). Probabilistic LSA.
  Task: cross-lingual document classification.
Guo et al. (2009). Latent Dirichlet Allocation.
  Task: named entity recognition.
Huang et al. (2009). Hidden Markov models.
  Task: part-of-speech tagging.

What do you mean, theory?

Statistical learning theory, in two flavors:
Classical learning theory: generalization bounds when training and test data come from one distribution.
Adaptation learning theory: generalization bounds when the training (source) and test (target) distributions differ.
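The contrast between the two flavors can be sketched as follows. The notation is mine (the slide's own equations are not shown here); the classical bound is the standard uniform-convergence form, and the adaptation bound only indicates the shape made precise later in the talk.

```latex
% Classical learning theory: train and test on the SAME distribution D.
% With m samples from a class H of VC dimension d, with prob. 1 - \delta:
\epsilon_D(h) \;\le\; \hat{\epsilon}_D(h)
  + O\!\left(\sqrt{\frac{d \log m + \log(1/\delta)}{m}}\right)

% Adaptation learning theory: train on the source D_S, test on the
% target D_T, so the bound must also pay for the distribution shift:
\epsilon_T(h) \;\le\; \hat{\epsilon}_S(h)
  + (\text{complexity term})
  + (\text{divergence between } D_S \text{ and } D_T)
```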

Goals for Domain Adaptation Theory

1. A computable (source) sample bound on target error
2. A formal description of empirical phenomena
   Why do shared-representation algorithms work?
3. Suggestions for future research

Talk Outline

1. Target Generalization Bounds using Discrepancy Distance
   [BBCKPW 2009], [Mansour et al. 2009]
2. Coupled Subspace Learning
   [BFK 2010]

Formalizing Domain Adaptation

Source distribution: labeled data.
Target distribution: unlabeled data.
(Semi-supervised adaptation also assumes some target labels: not in this talk.)
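In symbols (notation mine), the setup on this slide is:

```latex
% Source: labeled sample S = \{(x_i, y_i)\}_{i=1}^m drawn from D_S.
% Target: unlabeled sample T = \{x_j\}_{j=1}^n drawn from D_T.
\epsilon_S(h) \;=\; \Pr_{(x,y)\sim D_S}\bigl[h(x) \ne y\bigr],
\qquad
\epsilon_T(h) \;=\; \Pr_{(x,y)\sim D_T}\bigl[h(x) \ne y\bigr]
% Unsupervised adaptation: choose h \in H using S and the unlabeled T
% so that the target error \epsilon_T(h) is small.
```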

A Generalization Bound

A new adaptation bound

Bound from [MMR09]

Discrepancy Distance
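The bound referred to here has roughly the following shape. This is a sketch in my notation, written for 0-1 loss; [MMR09] states the discrepancy distance and bound for general loss functions.

```latex
% Discrepancy distance: the largest disagreement between the two
% domains about how far apart any pair of hypotheses is.
\mathrm{disc}(D_S, D_T) \;=\; \max_{h, h' \in H}
  \bigl|\, \epsilon_{D_S}(h, h') - \epsilon_{D_T}(h, h') \,\bigr|

% Target error is controlled by source error, the discrepancy, and
% the error \lambda of the best hypothesis good on BOTH domains:
\epsilon_T(h) \;\le\; \epsilon_S(h) + \mathrm{disc}(D_S, D_T) + \lambda,
\qquad
\lambda \;=\; \min_{h^* \in H}\bigl[\epsilon_S(h^*) + \epsilon_T(h^*)\bigr]
```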

When good source models go bad

Binary Hypothesis Error Regions
[figure: overlapping error regions of two binary hypotheses]

Discrepancy Distance
[figure: the same error regions under different domain densities, labeled low, low, and high discrepancy]

Computing Discrepancy Distance

Learn pairs of hypotheses to discriminate source from target.
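This discriminate-the-domains idea can be approximated in practice by the proxy A-distance of Ben-David et al.: train a domain classifier to separate source from target examples and turn its error into a distance. A minimal numpy sketch; the gradient-descent logistic regression and the toy Gaussian domains are my own illustration, not the talk's experiments.

```python
import numpy as np

def proxy_a_distance(source, target, epochs=500, lr=0.1):
    """Train a logistic-regression domain classifier and convert its
    error into the proxy A-distance d_A = 2 * (1 - 2 * err).
    (A real estimate would measure err on held-out data.)"""
    X = np.vstack([source, target])
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    X = np.hstack([X, np.ones((len(X), 1))])      # bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid
        w -= lr * X.T @ (p - y) / len(y)          # gradient step
    err = np.mean((X @ w > 0) != y)               # domain-classification error
    return 2.0 * (1.0 - 2.0 * err)

# Indistinguishable domains -> distance near 0;
# well-separated domains -> distance near 2.
rng = np.random.default_rng(0)
close = proxy_a_distance(rng.normal(0, 1, (200, 5)), rng.normal(0, 1, (200, 5)))
far = proxy_a_distance(rng.normal(0, 1, (200, 5)), rng.normal(4, 1, (200, 5)))
```

When the domains are easy to tell apart, any source model may be unreliable on the target; when no classifier can tell them apart, the distance (and the penalty in the bound) is small.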

Hypothesis Classes & Representations

Linear hypothesis class; induced classes from projections.
[figure: an example projection matrix, with goals 1) and 2) for the induced class]

A Proxy for the Best Model

[figure: the same linear hypothesis class and induced projections, used as a proxy]

Problems with the Proxy

[figure: the example projection matrix, with problems 1) and 2)]

Goals

1.
A computable bound

2.
Description of shared representations

3.
Suggestions for future research

Talk Outline

1. Target Generalization Bounds using Discrepancy Distance
   [BBCKPW 2009], [Mansour et al. 2009]
2. Coupled Subspace Learning
   [BFK 2010]

Assumption: Single Linear Predictor

The predictor's weights split into shared, source-specific, and target-specific parts.
The target-specific part can't be estimated from source data alone . . . yet.
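In symbols (the decomposition is the slide's; the symbol names are mine):

```latex
% One linear predictor is assumed to work on both domains,
% with its weight vector splitting into three parts:
w \;=\; w_{\mathrm{shared}} \;+\; w_{\mathrm{source}} \;+\; w_{\mathrm{target}}
% w_target puts weight on features that never appear in source text
% (e.g. "defective lid"), so no amount of labeled source data can
% estimate it . . . without a shared representation.
```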

Visualizing Single Linear Predictor

[figure: source and shared weights, illustrated with the feature "fascinating"]

Dimensionality Reduction Assumption

Visualizing Dimensionality Reduction

[figure, again using "fascinating"]

Representation Soundness

[figure, again using "fascinating"]

Perfect Adaptation

[figure, again using "fascinating"]

Algorithm

Generalization

Canonical Correlation Analysis (CCA) [Hotelling 1935]

1) Divide feature space into disjoint views

   "Do not buy the Shark portable steamer. The trigger mechanism is defective."
   Highlighted features: not buy, trigger, defective, mechanism

2) Find maximally correlating projections

Ando and Zhang (ACL 2005)
Kakade and Foster (COLT 2006)
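Step 2, "find maximally correlating projections," can be sketched in plain numpy: whiten each view and take the SVD of the whitened cross-covariance. This is a regularized toy version of textbook CCA, not the talk's implementation; real two-view text features would be sparse and high-dimensional.

```python
import numpy as np

def cca_top_pair(X, Z, reg=1e-3):
    """Return the pair of directions (one per view) whose projections
    are maximally correlated, plus the top canonical correlation."""
    X = X - X.mean(axis=0)
    Z = Z - Z.mean(axis=0)
    n = len(X)
    # Regularized within-view covariances and the cross-covariance.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Czz = Z.T @ Z / n + reg * np.eye(Z.shape[1])
    Cxz = X.T @ Z / n
    # Whitening transforms: Wx = (L^{-1})^T where Cxx = L L^T.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wz = np.linalg.inv(np.linalg.cholesky(Czz)).T
    # SVD of the whitened cross-covariance yields canonical pairs.
    U, s, Vt = np.linalg.svd(Wx.T @ Cxz @ Wz)
    return Wx @ U[:, 0], Wz @ Vt[0], s[0]

# Two 4-dimensional "views" sharing one latent signal in their
# first coordinate; the rest is independent noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
view1 = np.hstack([latent + 0.1 * rng.normal(size=(500, 1)),
                   rng.normal(size=(500, 3))])
view2 = np.hstack([latent + 0.1 * rng.normal(size=(500, 1)),
                   rng.normal(size=(500, 3))])
a, b, corr = cca_top_pair(view1, view2)   # corr is close to 1
```

The recovered directions concentrate on the shared coordinate, which is exactly the behavior the talk exploits: correlating features across views exposes the shared part of the representation.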

Square Loss: Kitchen Appliances

[bar chart: square loss on the Kitchen Appliances target, by source domain (Books, DVDs, Electronics); bars for Naïve, Coupled, and In-Domain; y-axis from 1.1 to 1.7]

Using Target-Specific Features

books → kitchen: mush, bad quality, warranty, evenly, super easy, great product, dishwasher
kitchen → books: trite, the publisher, the author, introduction to, illustrations, good reference, critique

Comparing Discrepancy & Coupled Bounds

Target: DVDs

[plot: square loss vs. number of source instances, showing the true error and the coupled bound]

Idea: Active Learning

Piyush Rai et al. (2010)

Goals

1.
A computable bound

2.
Description of shared representations

3.
Suggestions for future research

Conclusion

1.
Theory can help us understand domain adaptation better

2.
Good theory suggests new directions for future research

3. There's still a lot left to do
   Connecting supervised and unsupervised adaptation
   Unsupervised adaptation for problems with structure

Thanks

Collaborators: Shai Ben-David, Koby Crammer, Dean Foster, Sham Kakade, Alex Kulesza, Fernando Pereira, Jenn Wortman

References

Ben-David et al. A Theory of Learning from Different Domains. Machine Learning, 2009.
Mansour et al. Domain Adaptation: Learning Bounds and Algorithms. COLT 2009.