The Discipline and Future of Machine Learning




Tom M. Mitchell

E. Fredkin Professor and Department Head

March 2007



The Discipline of Machine Learning


The defining question:


How can we build computer systems that automatically
improve with experience, and what are the fundamental
laws that govern all learning processes?



A process learns with respect to <T,P,E> if it improves its performance P at task T through experience E.


Machine Learning - Practice

Object recognition
Mining Databases
Speech Recognition
Control learning
Extracting facts from text

Reinforcement learning
Supervised learning
Bayesian networks
Hidden Markov models
Unsupervised clustering
Explanation-based learning
....

Machine Learning - Theory

PAC Learning Theory (for supervised concept learning) relates:
  # examples (m)
  representational complexity (H)
  error rate (ε)
  failure probability (δ)

Other theories for:
  Reinforcement skill learning
  Semi-supervised learning
  Active student querying

… also relating:
  # of mistakes during learning
  convergence rate
  asymptotic performance
  bias, variance
  VC dimension
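The PAC quantities above can be made concrete with the classic sample-complexity bound for a learner that outputs a hypothesis consistent with its training data over a finite hypothesis space H: $m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$. The following is a minimal Python sketch assuming that setting (finite H, noise-free data, consistent learner). The function name `pac_sample_size` is hypothetical, and the formula is the standard textbook bound, not necessarily the exact one drawn on the original slide.

```python
import math


def pac_sample_size(hypothesis_count, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite H.

    Bound: m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    """
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and 0 < epsilon, delta < 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# |H| = 2**10 hypotheses, error rate 0.1, failure probability 0.05
print(pac_sample_size(2 ** 10, epsilon=0.1, delta=0.05))  # → 100
```

Halving ε doubles the bound (here to 199 examples). Enlarging H or shrinking δ raises it only logarithmically.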

The Discipline of Machine Learning

Machine Learning: How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?

Computer Science: How can we build machines that solve problems, and which problems are inherently tractable/intractable?

Statistics: What can be learned from data with a set of modeling assumptions, while taking into account the data-collection process?

[Figure: Machine learning in relation to Animal learning (Cognitive science, Psychology, Neuroscience), Statistics, Computer science, Adaptive Control Theory and Robotics, Evolution, and Economics]

ML and CS


Machine learning is already the preferred approach to:
  Speech recognition, Natural language processing
  Computer vision
  Medical outcomes analysis
  Many robot control problems

The ML niche will grow. Why?

[Figure: ML software as a subset of all software]

ML and Empirical Sciences


Empirical science is a learning process, subject to automation and to study:
  improve performance P (accuracy)
  at task T (predict which gene knockouts will impact the aromatic AA pathway, and how)
  with experience E (active experimentation)

Functional genomic hypothesis generation and experimentation by a robot scientist, King et al., Nature, 427(6971), 247-252

[Figure: which protein ORFs influence which enzymes in the AAA pathway]

Our current state:

The problem of tabula-rasa function approximation is solved (in an 80-20 sense):

Given:
  Class of hypotheses $H = \{h : X \to Y\}$
  Labeled examples $\{\langle x_i, f(x_i) \rangle\}$

Determine:
  The $h$ from $H$ that best approximates $f$

It's time to move on:
  Enrich the function approx problem definition
  Use function approx as building block
  Work on new problems
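The "given H and labeled examples, determine the best h" formulation can be sketched as choosing the hypothesis with the fewest training errors (empirical risk minimization). This is a minimal Python sketch that assumes a small finite H of hypothetical threshold classifiers $h_t(x) = [x \geq t]$. Real learners search much larger hypothesis spaces and usually add regularization.

```python
def training_errors(h, examples):
    """Count labeled examples <x_i, f(x_i)> that hypothesis h gets wrong."""
    return sum(1 for x, y in examples if h(x) != y)


def best_hypothesis(hypotheses, examples):
    """Return the h in H with the fewest training errors (first one on ties)."""
    return min(hypotheses, key=lambda h: training_errors(h, examples))


# Hypothetical H: threshold rules h_t(x) = (x >= t) for t = 0..10
H = [lambda x, t=t: x >= t for t in range(11)]

# Illustrative labeled examples <x_i, f(x_i)>
examples = [(1, False), (3, False), (6, True), (8, True)]

h = best_hypothesis(H, examples)
print([h(x) for x in (2, 5, 7)])  # → [False, True, True]
```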

Some Current Research Questions


When/how can unlabeled data be useful in function approximation?



How can assumed sparsity of relevant features be exploited in high
dimensional nonparametric learning?



How can information learned from one task be transferred to
simplify learning another?



What algorithms can learn control strategies from delayed rewards
and other inputs?



What are the best active learning strategies for different learning problems?



To what degree can one preserve data privacy while obtaining the
benefits of data mining?


The Future of Machine Learning

A Quick Look Back

[Timeline figure, 1960 to 2000. Milestones shown: Samuel's checker learner; Perceptrons; Winston's symbolic concept learner; Rule learning; Decision tree learning; Neural networks; Explanation-based learning; Dimensionality reduction; Bayes nets; PAC learning theory; Architectures for learning and problem solving; Reinforcement learning; Semi-supervised learning; Non-parametric methods; Statistical perspective on learning; HMMs; SVMs; Theories of grammar induction; Large scale datamining; Speech applications; Robot control; Privacy preserving data mining; Transfer learning; Version Spaces; Theories of perceptron capacity and learnability]

Evolutionary and revolutionary changes

What might lead to the next revolution?

1. Use Machine Learning to help understand Human Learning (and vice versa)

Models of Learning Processes

Machine Learning:
  # of examples
  Error rate
  Reinforcement learning
  Explanations
  Learning from examples
  Complexity of learner's representation
  Probability of success
  Prior probabilities
  Loss functions

Human Learning:
  # of examples
  Error rate
  Reinforcement learning
  Explanations
  Human supervision
  Lectures
  Questions, Homeworks
  Attention, motivation
  Skills vs. Principles
  Implicit vs. Explicit learning
  Memory, retention, forgetting
  Hebbian learning, consolidation

Reinforcement Learning

[Sutton and Barto 1981; Samuel 1957]

[Figure: observed immediate reward vs. learned sum of future rewards]

Reinforcement Learning in ML

[Figure: states S0 to S3 in a chain; immediate reward r = 100; learned values V = 100, 90, 81, 72 and 0; discount factor γ = .9]

To learn V, use each transition $\langle s, r, s' \rangle$ to generate a training signal: $V(s) \leftarrow r + \gamma V(s')$
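Here is a minimal Python sketch of learning V from transitions. It assumes a deterministic chain like the one in the figure: four states in a row, a final transition into an absorbing goal that pays r = 100, zero reward elsewhere, and γ = 0.9. Each observed transition (s, r, s') replaces V(s) with the training signal r + γV(s'). This is the TD(0) update with learning rate 1, which is adequate here only because the world is deterministic. The chain layout and the function name are assumptions made for illustration.

```python
GAMMA = 0.9  # discount factor, as in the figure


def learn_chain_values(n_states=4, goal_reward=100.0, gamma=GAMMA, sweeps=10):
    """Learn V on a hypothetical chain S0 -> S1 -> ... -> S(n-1) -> goal.

    Only the transition into the absorbing goal pays goal_reward; others pay 0.
    """
    V = [0.0] * (n_states + 1)  # V[n_states] is the goal; it stays 0
    for _ in range(sweeps):
        for s in range(n_states):
            s_next = s + 1
            r = goal_reward if s_next == n_states else 0.0
            # Training signal from one observed transition (s, r, s_next)
            V[s] = r + gamma * V[s_next]
    return V


print([round(v, 1) for v in learn_chain_values()])  # → [72.9, 81.0, 90.0, 100.0, 0.0]
```

The sketch reproduces the 100, 90, and 81 values, and it gives 72.9 where the figure shows 72.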

Reinforcement Learning in ML


Variants of RL have been used for a variety of practical
control learning problems


Temporal Difference learning


Q learning


Learning MDPs, POMDPs



Theoretical results too


Assured convergence to optimal V(s) under certain conditions


Assured convergence for Q(s,a) under certain conditions
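Q learning applies the same idea to state-action pairs. This is a minimal Python sketch for a deterministic world. It assumes the same hypothetical four-state chain plus a "left" action, a learning rate of 1, and the update $Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$. For simplicity, every action is tried in every state on each sweep. A real agent would instead learn from the transitions it actually experiences while exploring.

```python
ACTIONS = ("left", "right")


def step(s, a, n_states=4):
    """Hypothetical deterministic chain; state n_states is the goal.

    Returns (reward, next_state). Moving "left" from S0 stays in S0.
    """
    s_next = min(s + 1, n_states) if a == "right" else max(s - 1, 0)
    r = 100.0 if s_next == n_states else 0.0
    return r, s_next


def q_learning(n_states=4, gamma=0.9, sweeps=20):
    # One row per state; the goal row is never updated and stays 0
    Q = {s: {a: 0.0 for a in ACTIONS} for s in range(n_states + 1)}
    for _ in range(sweeps):
        for s in range(n_states):
            for a in ACTIONS:
                r, s_next = step(s, a, n_states)
                # Q learning update for a deterministic world (learning rate 1)
                Q[s][a] = r + gamma * max(Q[s_next].values())
    return Q


Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(4)]
print(policy)  # → ['right', 'right', 'right', 'right']
```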

Dopamine As Reward Signal

[Schultz et al., Science, 1997]

[Figure: plot over time t]


RL Models for Human Learning

[Seymour et al., Nature 2004]

Human and Machine Learning

Additional overlaps:

Learning of perceptual representations
  Dimensionality reduction methods, low level percepts
  Lewicki et al.: optimal sparse codes of natural scenes yield Gabor filters found in primate visual cortex. Similar result for auditory cortex.

Learning with redundant sensory input
  CoTraining methods, sensory redundancy hypothesis in development
  De Sa & Ballard; Coen: co-clustering voice/video yields phonemes
  Mitchell & Perfetti: co-training in second language learning

Learning and explanations
  Explanation-based learning, teaching concepts & skills, chunking
  VanLehn et al.: explanation-based learning accounts for some human learning behaviors.
  Chi: students learn best when forced to explain
  Newell; Anderson: chunking/knowledge-compilation models

2. Never-ending learning

Never-Ending Learning

Current machine learning systems:
  Learn one function
  Are shut down after they learn it
  Start from scratch when programmed to learn the next function

Let's study and construct learning processes that:
  Learn many different things
  Formulate their own next learning task
  Use what they have already learned to help learn the next thing




Example: Never-ending learning robot

Imagine a robot with three goals: (1) avoid collisions, (2) recharge when its battery is low, and (3) find and collect trash.

What is stopping us from giving it some trash examples, then letting it learn for a year?

What must it start with to formulate and solve relevant learning subtasks?
  Learn to recognize trash in a scene
  Learn where to search for trash, and when
  Learn how close to get to find out whether trash is there
  Learn to manipulate trash
  Transfer what it learned about paper trash to help with bottle trash
  Discover relevant subcategories of trash (e.g., plastic versus glass bottles), and of other objects in the environment

Core Questions for a Never-Ending Learning Agent

What function or fact to learn next?
  Self-reflection on performance, credit assignment

What representation for this target function or fact?
  Choice of input-output representation for target function
  E.g., "classify whether it's trash"

How to obtain (which type of) training experience?
  Primarily self-supervised, but occasional teacher input
  E.g., "classify whether it's trash"

Guided by what prior knowledge?
  Transfer learning, but transfer between what?
  Does X → PaperTrash help learn X → PlasticTrash?
  Does State(t) × Action(t) → State(t+1) help learn X → PlasticTrash?


Example: Never-ending language learner

Read the Web project: Create a 24x7 web agent that each day:
  Extracts more facts from the web into a structured database
  Learns to extract facts better than yesterday

Starting point:
  Ontology of hundreds of categories and relations, and 6-10 training examples of each
  Never-ending learning architecture
  State-of-the-art language processing primitives
  Learning mechanisms

Top level task:
  Populate a database of these categories and relations by reading the web, and improve continually

[Carlson, Cohen, Fahlman, Hong, Nyberg, Wang, ...]

Q: how can it obtain useful training experience (i.e., self-supervise)?

A: redundancy

Bootstrapping: Learning to extract named entities

I arrived in Pittsburgh on Saturday.
location?

$x_1$: I arrived in _________ on Saturday.
$x_2$: Pittsburgh

Bootstrap learning to extract named entities

[Riloff and Jones, 1999], [Collins and Singer, 1999], ...

Initialization:
Australia, Canada, China, England, France, Germany, Japan, Mexico, Switzerland, United_states

Iterations:
locations in ?x → South Africa, United Kingdom, Warrenton, Far_East, Oregon, Lexington, Europe, U.S._A., Eastern Canada, Blair, Southwestern_states, Texas, States, Singapore …
operations in ?x → Thailand, Maine, production_control, northern_Los, New_Zealand, eastern_Europe, Americas, Michigan, New_Hampshire, Hungary, south_america, district, Latin_America, Florida ...
republic of ?x
...
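The iterations in this example alternate between extraction patterns and the phrases those patterns extract. Below is a minimal Python sketch of that loop, written loosely in the spirit of mutual bootstrapping. It assumes the corpus has already been reduced to hypothetical (pattern, noun phrase) occurrences. On each round it scores every unused pattern by how many known lexicon entries it extracts, adopts the best one, and adds everything that pattern extracts. The scoring rule is a simplification and is not the exact metric of Riloff and Jones (1999). The occurrences are made-up data.

```python
from collections import Counter


def bootstrap(occurrences, seeds, iterations=2):
    """Grow a lexicon and a pattern list from seed examples.

    occurrences: (pattern, noun_phrase) pairs, e.g. ("locations in ?x", "Texas").
    """
    lexicon = set(seeds)
    patterns = []
    for _ in range(iterations):
        # Score each not-yet-adopted pattern by how many known entries it extracts
        scores = Counter(
            pattern for pattern, phrase in occurrences
            if pattern not in patterns and phrase in lexicon
        )
        if not scores:
            break
        # Adopt the best-scoring pattern (alphabetical tie-break for determinism)
        best = min(scores, key=lambda p: (-scores[p], p))
        patterns.append(best)
        # Accept every phrase the adopted pattern extracts
        lexicon.update(phrase for pattern, phrase in occurrences if pattern == best)
    return lexicon, patterns


occurrences = [
    ("locations in ?x", "Canada"),
    ("locations in ?x", "France"),
    ("locations in ?x", "Texas"),
    ("operations in ?x", "Texas"),
    ("operations in ?x", "Thailand"),
    ("I ate ?x", "pizza"),
]
lexicon, patterns = bootstrap(occurrences, seeds={"Canada", "France"})
print(patterns)         # → ['locations in ?x', 'operations in ?x']
print(sorted(lexicon))  # → ['Canada', 'France', 'Texas', 'Thailand']
```

This sketch accepts every extraction of an adopted pattern, so a single overly general pattern can pull in non-locations. Practical systems therefore typically score or filter candidates before adding them.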

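Co-Training

[Figure: the sentence "I flew to New York today." split into two views, the entity "New York" and its context "I flew to ____ today"; Classifier 1 and Classifier 2 each read one view and produce Answer 1 and Answer 2]

Idea: Train Classifier 1 and Classifier 2 to:
1. Correctly classify labeled examples
2. Agree on classification of unlabeled examples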
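Here is a deliberately tiny Python sketch of the co-training idea. It assumes each example has two views (the entity string and its context), and each "classifier" simply memorizes view-to-label rules. Whenever one view can label an unlabeled example and the other view does not contradict it, that label is used to train both views. In this way knowledge flows from one classifier to the other. This is a hypothetical simplification for illustration, not the probabilistic algorithm of Blum & Mitchell (1998).

```python
def cotrain(labeled, unlabeled, rounds=5):
    """labeled: [((entity, context), label)]; unlabeled: [(entity, context)]."""
    by_entity = {}   # Classifier 1: entity string -> label
    by_context = {}  # Classifier 2: context string -> label
    for (entity, context), label in labeled:
        by_entity[entity] = label
        by_context[context] = label

    pool = list(unlabeled)
    for _ in range(rounds):
        newly_labeled, remaining = [], []
        for entity, context in pool:
            y1, y2 = by_entity.get(entity), by_context.get(context)
            if y1 is not None and y2 is not None and y1 != y2:
                remaining.append((entity, context))  # views disagree: leave unlabeled
            elif y1 is None and y2 is None:
                remaining.append((entity, context))  # neither view knows it yet
            else:
                newly_labeled.append((entity, context, y1 if y1 is not None else y2))
        if not newly_labeled:
            break
        # Each new label trains both views (setdefault keeps existing rules)
        for entity, context, label in newly_labeled:
            by_entity.setdefault(entity, label)
            by_context.setdefault(context, label)
        pool = remaining
    return by_entity, by_context


labeled = [(("New York", "I flew to ____ today"), "location")]
unlabeled = [
    ("Boston", "I flew to ____ today"),
    ("Boston", "we drove to ____ yesterday"),
    ("Alice", "____ said hello"),
]
by_entity, by_context = cotrain(labeled, unlabeled)
print(by_context.get("we drove to ____ yesterday"))  # → location
```

The shared context first labels "Boston", and the entity rule for "Boston" then labels a context that never appeared with a labeled example.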

Co-Training Theory

[Blum&Mitchell 98; Dasgupta 04, ...]

[Figure: final accuracy as a function of # labeled examples, # unlabeled examples, number of redundant inputs, and conditional dependence among inputs]

Want inputs less dependent, increased number of redundant inputs, …

Disagreement over unlabeled examples can bound true error
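The theoretical results cited above relate true error to disagreement under assumptions such as conditional independence of the views. The Python sketch below only computes the empirical disagreement rate, the observable quantity those results use. It does not implement any bound. The two toy classifiers and the unlabeled examples are hypothetical.

```python
def disagreement_rate(classify_view1, classify_view2, unlabeled):
    """Fraction of unlabeled two-view examples on which the classifiers disagree."""
    if not unlabeled:
        raise ValueError("need at least one unlabeled example")
    disagreements = sum(
        1 for view1, view2 in unlabeled
        if classify_view1(view1) != classify_view2(view2)
    )
    return disagreements / len(unlabeled)


# Hypothetical classifiers over the two views (entity string, context string)
known_places = {"New York", "Boston"}


def by_entity(entity):
    return entity in known_places


def by_context(context):
    return context.startswith("I flew to")


unlabeled = [
    ("New York", "I flew to ____ today"),
    ("Boston", "I flew to ____ on Monday"),
    ("Alice", "____ said hello"),
    ("Denver", "I flew to ____ last week"),
]
print(disagreement_rate(by_entity, by_context, unlabeled))  # → 0.25
```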

Example Bootstrap learning algorithms:



Classifying web pages
[Blum&Mitchell 98; Slattery 99]


Classifying email

[Kiritchenko&Matwin 01; Chan et al. 04]


Named entity extraction
[Collins&Singer 99; Jones 05]


Wrapper induction

[Muslea et al., 01; Mohapatra et al. 04]


Word sense disambiguation
[Yarowsky 96]


Discovering new word senses
[Pantel&Lin 02]


Synonym discovery

[Lin et al., 03]


Relation extraction
[Brin et al.; Yangarber et al. 00]


Statistical parsing
[Sarkar 01]


What is the relation between "Elvis" and "January 8"?

Q: how can it choose its next learning task?

A: self-reflect on where it is failing, then formulate a learning task to repair the failure

Some strategies for generating new tasks

Collect more data from the web
  To learn about specific entities (e.g., "Rolling Stones")
  To learn the meaning of particular language (e.g., "will attend")
  To locate easy-to-extract facts (e.g., web pages with lists)

Learn regularities from the populated KB
  Most LTI office names are of the form "NSH dddd"

Explore specializations of ontological categories
  What distinguishes events occurring on the CMU campus from those occurring elsewhere? Can this be predicted? What subsets of events warrant becoming categories?

Explore specializations of language structures
  Which "location" entities share surrounding language? e.g., "the city of ?x"
  Do they share other properties?
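One way to test a candidate regularity, such as the "NSH dddd" office-name form, is to measure how often it holds in the knowledge base. The Python sketch below assumes that "d" stands for a digit and that office names are stored as plain strings. The pattern, the function name, and the KB entries are hypothetical illustrations.

```python
import re

# Hypothetical encoding of "NSH dddd": the literal "NSH", a space, four digits
OFFICE_PATTERN = re.compile(r"NSH \d{4}")


def regularity_support(values, pattern=OFFICE_PATTERN):
    """Fraction of KB values that fully match a candidate regularity."""
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


offices = ["NSH 4601", "NSH 3612", "GHC 5001", "NSH 2602"]  # made-up KB entries
print(regularity_support(offices))  # → 0.75
```

A learner could keep such a regularity when its support is high. It could also use the regularity to flag extracted values that break it.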

Some Types of Knowledge to Learn

Linguistic regularities
  {"spoon", "fork", "chopsticks"} occur often in "eat with my ___"
  They're instances of ontology class "eating implement"

HTML layout regularities
  HTML lists often contain items of the same class

Web site regularities
  University departments often have a page listing all faculty

Regularities over extracted facts
  Professors typically have more publications than their advisees
  Professors typically received their BS degree before their advisees

Temporal stability
  Birthdays don't change. Stock prices do.
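The HTML layout regularity can drive a simple extraction heuristic: if a list already contains several known instances of a class, propose its other items as candidates. The Python sketch below uses the standard-library `html.parser.HTMLParser`. It assumes flat `<ul>`/`<ol>` lists and a hypothetical threshold `min_known`. The markup and class members are made-up examples.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of <li> items, one Python list per <ul>/<ol> (flat lists assumed)."""

    def __init__(self):
        super().__init__()
        self.lists = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.lists.append([])
        elif tag == "li" and self.lists:
            self._in_li = True
            self.lists[-1].append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.lists[-1][-1] += data


def propose_instances(html, known, min_known=2):
    """Propose unknown items from lists that contain at least min_known known items."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    proposals = set()
    for items in parser.lists:
        items = [i.strip() for i in items if i.strip()]
        if sum(1 for i in items if i in known) >= min_known:
            proposals.update(i for i in items if i not in known)
    return proposals


html = ("<ul><li>spoon</li><li>fork</li><li>chopsticks</li></ul>"
        "<ul><li>spoon</li><li>Pittsburgh</li></ul>")
print(propose_instances(html, known={"spoon", "fork"}))  # → {'chopsticks'}
```

The second list contains only one known eating implement, so "Pittsburgh" is not proposed.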


Research Issues


What target knowledge representation?


How can initial ontology be extended?


What types of self-reflection are required?


Can one learn language without non-linguistic knowledge?


How can we manage the mapping between text tokens and the non-text entities they describe?


What curriculum for staging the learning?


What active learning methods?

More Revolutionary Research Directions


Can we design new kinds of computer programming languages
with explicit learning primitives?



Can we build robot scientists?



What are the fundamental tradeoffs between computational
efficiency and statistical efficiency?



How can we build systems that learn from instruction, dialogs
and problem sets, in addition to labeled examples?



How can we unify machine learning theories and models with
those from other fields studying adaptation, e.g., adaptive control
theory, economics, evolution?


Summary


Machine Learning research is (and should be more) connected to understanding all learning processes

Field is ripe for new revolutionary directions:
  Computational models for human learning
  Never-ending learners
  <your idea here>

Thank you!