6.899 Learning and Inference in Vision

MIT 6.899

Learning and Inference in Vision


Prof. Bill Freeman, wtf@mit.edu


MW 2:30-4:00

Room: 34-301


Course web page:
http://www.ai.mit.edu/courses/6.899/

Reading class


We’ll cover about 1 paper each class.


Seminal or topical research papers in the
intersection of machine learning and vision.


One student will present each paper. Then
we’ll discuss the paper as a class.


One student will write a computer example
illustrating the paper’s main idea.

Learning and Inference


“Learning”: learn the parameter values or
structure of a probabilistic model.


Look at many examples of people walking, and
build up a probabilistic model relating video
images to 3-d motions.



“Inference”: infer hidden variables, given
observations.


E.g., given a particular video of someone
walking, infer their motions in 3-d.

Learning and Inference

Figure: diagram of observed variables y1, y2 and unobserved variables x1, x2, linked by statistical dependencies between variables.

“Learning”: learn this model, and the form
of the statistical dependencies.

“Inference”: given this model, and the
observations, y1 & y2, infer x1 & x2, or
their conditional distribution.
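
The distinction between the two steps can be illustrated with a small numerical sketch. It is not taken from the course materials: it assumes a linear-Gaussian model, y = A x + noise with a zero-mean Gaussian prior on x, and the names `learn`, `infer`, `A_true`, and the noise level 0.1 are hypothetical choices made only for illustration. "Learning" estimates A and the covariances from paired training examples; "inference" computes the conditional (posterior) distribution of x1, x2 given y1, y2.

```python
import numpy as np

def learn(X, Y):
    """Learning: fit y = A x + noise from paired training rows (x, y)."""
    # Least-squares solve X @ B ~= Y, where B = A.T.
    A = np.linalg.lstsq(X, Y, rcond=None)[0].T
    resid = Y - X @ A.T
    R = np.cov(resid, rowvar=False)   # observation-noise covariance
    P = np.cov(X, rowvar=False)       # prior covariance of x (zero mean assumed)
    return A, R, P

def infer(y, A, R, P):
    """Inference: Gaussian posterior p(x | y) under the learned model."""
    R_inv = np.linalg.inv(R)
    cov = np.linalg.inv(np.linalg.inv(P) + A.T @ R_inv @ A)
    mean = cov @ A.T @ R_inv @ y
    return mean, cov

# Synthetic training set (assumed ground truth, for illustration only).
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.5], [-0.3, 2.0]])
X_train = rng.normal(size=(5000, 2))
Y_train = X_train @ A_true.T + 0.1 * rng.normal(size=(5000, 2))
A, R, P = learn(X_train, Y_train)

# A new observation (y1, y2) whose hidden (x1, x2) we want to infer.
x_new = np.array([0.7, -1.2])
y_new = A_true @ x_new + 0.1 * rng.normal(size=2)
x_mean, x_cov = infer(y_new, A, R, P)
print("true x:", x_new, "posterior mean:", x_mean)
```

The posterior covariance `x_cov` is the part that a point estimate would discard; it says how uncertain the inferred x1, x2 remain after seeing y1, y2.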

Cartoon history of speech
recognition research


1960’s, 1970’s, 1980’s: lots of different
approaches; “hey, let’s try this”.


1980’s: Hidden Markov Models (HMMs); the
statistical approach took off.


1990’s and beyond: HMM’s now the
dominant approach. “The person with the
best training set wins”.

Same story for document
understanding


The person with the best training set wins.

Computer vision is ready to make
that transition


Machine learning approaches are becoming
dominant.


We get to make and watch the transition to a
principled, statistical approach happen.


It’s not trivial: issues of representation,
robustness, generalization, speed, …

Categories of the papers

1. Learning image representations

2. Learning manifolds

3. Linear and bilinear models

4. Learning low-level vision

5. Graphical models, belief propagation

6. Particle filters and tracking

7. Face and object recognition

8. Learning models of object appearance



1 Learning image representations

Example training image

From http://www.amsci.org/amsci/articles/00articles/olshausencap1.html

1 Learning image representations


From: http://www.cns.nyu.edu/pub/eero/simoncelli01-reprint.pdf

2 Learning manifolds

From: http://www.sciencemag.org/cgi/content/full/290/5500/2319

Joshua B. Tenenbaum, Vin de Silva, John C. Langford


3 Linear and bilinear models


From: http://www-psych.stanford.edu/~jbt/NC120601.pdf

4 Learning low-level vision

From Y. Weiss, http://www.cs.berkeley.edu/~yweiss/iccv01.ps.gz

Figure labels: images under different lighting; reflectance; illumination

5 Graphical models, belief propagation


From: http://www.cs.berkeley.edu/~yweiss/nips96.pdf

6 Particle filters and tracking


From: http://www.robots.ox.ac.uk/~ab/abstracts/eccv96.isard.html

7 Face and object recognition


From Viola and Jones, http://www.ai.mit.edu/people/viola/research/publications/ICCV01-Viola-Jones.ps.gz

7 Face and object recognition

From: Pinar Duygulu, Kobus Barnard, Nando deFreitas, and David Forsyth,


8 Learning models of object appearance



Weber, Welling, and Perona, http://www.gatsby.ucl.ac.uk/~welling/papers/ECCV00_fin.ps.gz

Images containing
the object

Images not containing
the object

8 Learning models of object appearance

Test images: contains the object?

Weber, Welling, and Perona, http://www.gatsby.ucl.ac.uk/~welling/papers/ECCV00_fin.ps.gz

Guest lecturers/discussants


Andrew Blake (Condensation,
Oxford/Microsoft)


Baback Moghaddam (Bayesian face
recognition, MERL)


Paul Viola (Fast face recognition, MERL)

Class requirements

1. Read each paper. Think about it. Discuss in class.

2. Present one paper to the class.

3. Present one computer example to the class.

4. Final project: write a conference paper related to vision and learning.

1. Read the papers, discuss them


Write down 3 insights about the paper that
you might want to share with the class in
discussion.


Turn them in on a sheet of paper.

2. Presentations about a paper


About 15 minutes long. Set the stage for
discussions.


Review the paper. Summarize its
contributions. Give relevant background.
Discuss how it relates to other papers we’ve
read.


Meet with me two days before to go over
your presentation about the paper.

3. Programming example


Present a computer implementation of a toy
example that illustrates the main idea of the
paper.


Show trade-offs in parameter settings, or in
training sets.


Goal: help us build up intuition about these
techniques.


OK to use on-line code. Then focus on
creating informative toy training sets.


Toy problems


Simple summaries of the main idea.


Identify an informative idea from the paper


Make a simple example using it.


Play with it.

Toy problem

“If you can make a
system to solve this,
I’ll give you a PhD”

by Ted Adelson

Particle filter for inferring human
motion in 3-d


From: Hedvig Sidenbladh’s thesis, http://www.nada.kth.se/~hedvig/publications/thesis.pdf

Particle filter toy example


From: Hedvig Sidenbladh’s thesis, http://www.nada.kth.se/~hedvig/publications/thesis.pdf
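
A toy example of this kind could be sketched as follows. This is not the example from the thesis: it assumes a 1-d random-walk motion model with Gaussian observation noise, and the function name `particle_filter` and the parameters `q` (motion noise), `r` (observation noise), and `n_particles` are hypothetical. It shows the basic predict, weight, resample loop of a bootstrap particle filter.

```python
import numpy as np

def particle_filter(ys, n_particles=500, q=0.5, r=1.0, seed=0):
    """Bootstrap particle filter for x_t = x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2)."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 5.0, size=n_particles)  # broad initial guess
    estimates = []
    for y in ys:
        # Predict: push each particle through the motion model.
        particles = particles + rng.normal(0.0, q, size=n_particles)
        # Weight: likelihood of the observation under each particle.
        w = np.exp(-0.5 * ((y - particles) / r) ** 2) + 1e-300
        w /= w.sum()
        # Estimate: posterior mean as the weighted particle average.
        estimates.append(np.sum(w * particles))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(estimates)

# Simulate a hidden 1-d trajectory and noisy observations of it.
rng = np.random.default_rng(1)
T = 100
x_true = np.cumsum(rng.normal(0.0, 0.5, size=T))
ys = x_true + rng.normal(0.0, 1.0, size=T)

est = particle_filter(ys)
rmse_raw = np.sqrt(np.mean((ys - x_true) ** 2))
rmse_filter = np.sqrt(np.mean((est - x_true) ** 2))
print(f"RMSE of raw observations: {rmse_raw:.2f}, particle filter: {rmse_filter:.2f}")
```

Varying `n_particles`, or mismatching `q` and `r` against the values used to simulate the data, is one way to explore the trade-offs in parameter settings.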

What we’ll have at the end of the class

Code examples:

Non-negative matrix factorization example

1-d particle filtering example

Boosting for face recognition

Example of belief propagation for scene
understanding.

Manifold learning comparisons.

4. Final project: write a conference paper


When you submit a paper to a conference, you get
just one shot, so it’s important to learn how to
make good submissions.


We’ll discuss many papers, and what’s good and
bad about them, during the class.


I’ll give a lecture on “how to write a good
conference paper”.


Subject of the paper can be:


A project from your own research.


A project you undertake for the class.


Your idea


One I suggest to you

Feedback options


At the end of the course: “it would have
been better if we had done this…”

Somewhat helpful

During the course: “I find this useful; I
don’t find that useful…”

Very helpful

What background do you need?




Be able to read and understand the papers


Linear algebra


Familiarity with estimation theory


Image filtering


Background in machine learning and
computer vision.

Auditing versus credit


If you’re a student and want to take the
class, sign up for credit.


You’ll stay more engaged.


Makes it more probable that I can offer the
class again.


But if you do audit:


Please don’t come to class if you haven’t read
the paper.


I may ask you to present to the class, anyway.

First paper


Monday, Feb. 11.


Emergence of simple-cell receptive field properties
by learning a sparse code for natural images,
Olshausen BA, Field DJ (1996) Nature, 381: 607-609


Presenter: Bill Freeman


Computational demonstration: need volunteer
(software is available:
http://redwood.ucdavis.edu/bruno/sparsenet.html)


Second paper


Wednesday, Feb. 13.


Learning the parts of objects by non-negative
matrix factorization, D. D. Lee and H. S. Seung,
Nature 401, 788-791 (1999), and commentary
by Mel.


Presenter: need volunteer


Computational demonstration: need volunteer
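
A computational demonstration for this paper might start from a sketch like the one below. It is an assumption-laden toy, not the authors' code: it factors a non-negative matrix V into non-negative factors W and H using multiplicative updates that reduce squared reconstruction error, a commonly used variant (the paper's own objective may differ). The function name `nmf`, the synthetic data, and the iteration count are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate V (non-negative) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy training set: a matrix that truly is a product of non-negative factors.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3))
H_true = rng.random((3, 30))
V = W_true @ H_true

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Changing `rank` away from the true number of factors is a simple way to see how the choice of this parameter affects the learned parts.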