Transcript: Stanford AI Professor Andrew Ng


Oct 15, 2013


Andrew Ng, “The Future of Robotics and Artificial Intelligence”

Science fiction has promised us house cleaning robots for many years. Let me ask: Who here would
want a robot to clean your house? Yeah. Everyone wants a robot.

We think of house cleaning robots as being in the realm of science fiction, but is that really the case?

Let me show you a video of this robot here. There’s actually a cheat in this video and things aren’t
exactly as they appear, but let’s watch the video first.

So, there’s that robot, cleaning up an apartment. It turns out that rather than just tidying up it can
also sweep, dust, vacuum, water flowers, fetch food, use the dishwasher. Seems like it must be an
incredibly smart robot if it can do all these things, and we would all love to have one, but it turns
out that there is a cheat in this video and this robot wasn’t really smart at all.

<Picture of a Sony PlayStation joystick> This was the cheat. Um, the way the video was made was… there
was a grad student sitting a little off camera, using the joystick to very slowly and painfully control
every tiny movement of the robot, and it was actually much slower and more painful to clean your house
this way than to just walk in there and do it yourself.

So, what does this mean? If we want house cleaning robots, this means that what’s really missing is the
software. We are seeing that our robots are physically and mechanically capable of doing almost all the
household chores we’d like them to, but what is missing is the artificial intelligence software to make
them smart enough to do this themselves. So, in robotics we need software to do two main things; we
call them control and perception.

Control means, once you know where you want the robot to move or what motion you want it to make,
how to get it to make that motion, and perception means getting the robot to see the world around it
and understand what’s around it.

So, let’s start with talking about control. When I started working on robotics, about ten years ago, I
asked a lot of people, “What’s the hardest control problem you know of?” and back then the most common
answer I heard was “getting a computer to fly a helicopter,” so I said, “Well, let’s work on that.”

And our helicopter here is instrumented with GPS, accelerometers, and a compass, so it always knows
where it is. It turns out flying helicopters is pretty difficult. If you are ever in a helicopter, and
you’re watching the pilot, the pilot is holding two handles and each foot is on a foot pedal and they
are always moving both hands and both feet to try to keep the helicopter in the air. So for the computer
the problem is: ten times a second it can tell where the helicopter is and you need to figure out how to
move all these control sticks to keep the helicopter in the air.
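
As a rough illustration of that sense-and-act cycle (not from the talk), the sketch below runs a toy
one-dimensional "hover" loop ten times per simulated second. The physics, the gains `kp` and `kd`, and
the function names are all assumptions made for illustration; a real helicopter has far more state and
far messier aerodynamics than this.

```python
# Toy 1-D hover loop: a sketch of "sense, then choose a command" at 10 Hz.
# Everything here (physics, gains, names) is a hypothetical simplification.
DT = 0.1          # seconds per step, i.e. ten decisions per second
GRAVITY = 9.81    # m/s^2


def pd_thrust(altitude, velocity, target, kp=4.0, kd=3.0):
    """Pick a thrust acceleration from the current state (assumed gains)."""
    error = target - altitude
    return GRAVITY + kp * error - kd * velocity


def simulate(steps=100, target=10.0):
    altitude, velocity = 0.0, 0.0
    for _ in range(steps):
        # 1. Sense: read where the helicopter is (here, the simulated state).
        # 2. Act: choose a control command for the next tenth of a second.
        thrust = pd_thrust(altitude, velocity, target)
        # 3. The world responds (toy physics, semi-implicit Euler step).
        velocity += (thrust - GRAVITY) * DT
        altitude += velocity * DT
    return altitude, velocity


final_altitude, final_velocity = simulate()
```

In this toy setting a hand-picked rule is enough to settle at the target altitude; the difficulty
described in the talk is that no such simple rule is available for a real helicopter.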

When I first got my hands on a helicopter, I first thought: “You know what. I know math. Let’s just
write out a mathematical specification for how helicopters behave and we’ll program the math
specification into the computer and then the computer will know how to fly a helicopter that way.”
Turns out I had a friend who had taken that approach. Let me play a video that I got from him. When I
play this video you’ll hear David’s voice say, “Enable Control” and that’s when his program takes over
this helicopter.

“Enable Control! Abort, abort, abort.”

So, that’s him shouting “Abort, abort.”

So when I saw this, what I realized was that helicopters are just too complicated and their
aerodynamics are just too complicated. No one can sit down and just write a mathematical specification
for how to fly a helicopter. Um. Instead what I realized was that a much better way to do this was to
let a computer learn by itself how to fly a helicopter. So what you do is you give a computer access to
a helicopter and let it try out different things on the helicopter and try out different strategies of
flying it and see what happens and learn from its own experiences. This is much like how a beginner
pilot might learn how to fly it. This is also maybe not unlike the way you might have learned to ride a
bicycle, trying out different things until, after a while, you get it. So researchers call this
technique machine learning. And using this method, we’ve been able to not only get the helicopter to
fly around, but also get it to fly many interesting aerobatic stunt maneuvers. Um, let me show you just
one video of one of the aerobatic stunt maneuvers that we flew on this helicopter.
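
To make "try different strategies, see what happens, keep what works" concrete, here is a minimal
sketch using plain random search over two controller gains on a toy one-dimensional hover simulator.
This is an illustrative stand-in, not the learning method used for the helicopter in the talk, which
the transcript does not describe in detail. The simulator, the cost, and all names are assumptions.

```python
import random

# Hypothetical toy setup: learn two controller gains by trial and error.
DT, GRAVITY, TARGET = 0.1, 9.81, 10.0


def flight_cost(kp, kd, steps=100):
    """Fly the toy helicopter with gains (kp, kd); sum the distance from target."""
    altitude, velocity, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        thrust = GRAVITY + kp * (TARGET - altitude) - kd * velocity
        velocity += (thrust - GRAVITY) * DT
        altitude += velocity * DT
        cost += abs(TARGET - altitude)
    return cost


def learn_by_trial(trials=200, seed=0):
    """Random-search hill climbing: perturb the best gains, keep improvements."""
    rng = random.Random(seed)
    best = (0.1, 0.1)                      # a deliberately poor starting strategy
    best_cost = flight_cost(*best)
    for _ in range(trials):
        # Try a small random variation of the best strategy found so far.
        candidate = (
            max(0.0, best[0] + rng.gauss(0, 0.5)),
            max(0.0, best[1] + rng.gauss(0, 0.5)),
        )
        cost = flight_cost(*candidate)
        if cost < best_cost:               # keep it only if the flight went better
            best, best_cost = candidate, cost
    return best, best_cost


initial_cost = flight_cost(0.1, 0.1)
gains, learned_cost = learn_by_trial()
```

The program is never told how the simulator works; it only sees how well each attempt went, which is
the sense in which it learns from its own experience.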

This is a video that we made on the Stanford football field. It’s flying under computer control and
when we zoom out the camera you see this picture is part of the sky. So, being able to fly upside down,
this is the first helicopter in the world able to do so under computer control. Our helicopters can now
fly maneuvers and stunt maneuvers at a skill level comparable to the very best human pilots in the
world.

So, we seem to be doing pretty well on the control. Let’s look at perception next.

Here’s an example. Let’s say I tell my robot to please find my coffee mug. Um, because we know how to
do control, my robot can drive to the kitchen to start looking for my coffee mug. And, let’s say my
robot sees this picture. Um, where does it think the coffee mug is?

If you run a standard computer vision algorithm, this is the sort of result it gets. It completely
misses where the coffee mug is. Um, so, why is this hard?

It turns out that your and my visual systems are so good it’s almost hard for us to understand how a
computer could fail to recognize what that is, but let’s zoom in to a small part of the mug. Where you
and I see a coffee mug, the computer sees this. That’s a grid of numbers, or that’s a matrix of pixel
brightness values. It’s also called pixel intensity values, and the computer vision task is to look at
all those numbers and to decide that those numbers represent the rim of a coffee mug.
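
A small sketch of what "a grid of numbers" means in practice (not from the talk): the 5x5 patch below
is made-up data standing in for a zoomed-in piece of a grayscale image, with 0 as black and 255 as
white. The helper names are hypothetical.

```python
# A hypothetical 5x5 grayscale patch: each entry is one pixel's intensity.
# The bright diagonal band could be part of a curved rim, but the computer
# only ever receives the numbers.
patch = [
    [10, 12, 11, 180, 220],
    [9, 14, 160, 230, 225],
    [11, 150, 228, 226, 90],
    [140, 231, 224, 85, 20],
    [229, 222, 80, 18, 15],
]


def mean_brightness(image):
    """Average intensity over every pixel in the grid."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def brightest_pixel(image):
    """Return (row, column, value) of the largest intensity in the grid."""
    return max(
        ((r, c, value) for r, row in enumerate(image) for c, value in enumerate(row)),
        key=lambda item: item[2],
    )


average = mean_brightness(patch)
peak = brightest_pixel(patch)
```

Simple statistics like these are easy to compute; deciding that the same numbers depict the rim of a
mug is the hard part.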

Um, in computer vision, we don’t just want them to recognize coffee mugs, we also want them to
recognize people and faces, have depth perception to tell how far away things are, and it seems like
you must need very complicated mathematical functions, very complicated computer programs, to look at
all those numbers and figure out what is going on in an image, and in fact this is mostly what computer
vision experts have done, which is write very complicated programs to try to understand different parts
of an image. Um, these are illustrations from six of the leading computer vision programs, the
technical term is features, of six of the leading programs that try to do vision. And you know what?
Sadly, these are really complicated programs, but none of them work that well yet.

Um, perception for robots isn’t just vision. We also want it to understand audio. In other words, we
want the robot to understand when I say words to it. And it turns out, we’re starting to have software
that can do speech recognition, but it’s still hard. It’s hard for a similar reason as vision. The
speech recognition problem is to look at a wave form like that, to look at, you know, a curve like
that, which is what a microphone records, and to decide that that curve, that wave form, corresponds to
me saying “Robot, please find my coffee mug.” So, because it seems like you need a really complicated
function to do this, it turns out that audio researchers have spent many years writing really
complicated programs to try to do these tasks, and it’s starting to get there, but not quite.

So, when I started to work on perception, and I saw the state of the... what I started to do was write
even more complicated programs. You know, like, let’s write even more complicated programs than anyone
else in the world. And, for a long time, I tried to do this, made very little progress, was getting
very little traction, and I got very frustrated. For a long time I actually seriously doubted my
ability to make any contribution because things were so complicated and it was so hard to get it to
work. I was just not making progress.

Then, five years ago I came across one new idea that completely re-energized my thinking on how to get
robots to be smart. This is the idea. It is that, if you look at how the human brain does perception,
rather than needing tons of algorithms for vision, tons of algorithms for audio, it may be that most of
how the brain does it is with a single learning algorithm or single program (this would be a learning
algorithm), and that if this is true, then maybe we don’t need to figure out all these different and
complicated programs, maybe you just need one single program, maybe something that does whatever the
brain is doing, and maybe that will make us progress much faster on perception.

So, why do we think that the brain may need just one algorithm or just one program to do all the
wonderful things we do in perception? Let me show you some of the evidence from neuroscience.


So, that red part of the brain is your auditory cortex. The way that you’re understanding my words now
is that your ears are sending a sound signal to your auditory cortex, which is then processing the
signal. Neuroscientists are doing the following amazing experiment. You can cut the wire from the ear
to the auditory cortex and rewire the brain so that the signal from the eyes, from the optic nerve,
gets routed to the auditory cortex. If you do this, that red part of the brain, that red piece of brain
tissue, will learn to see, will learn to process images, and these animals can do visual discrimination
tasks, they can use that auditory cortex to understand images and, you know, look at things and tell
things about the world from vision.

One more quick example: that red part of the brain is your somatosensory cortex and is responsible for
your sense of touch. If you do a similar rewiring experiment, your somatosensory cortex will learn to
see, that same red piece of the brain tissue will learn to process images.


So, it turns out that there’s tons of evidence like this that suggests that the same piece of brain
tissue can process sights or sounds or touch or even other things and therefore perhaps the same, maybe
just one computer program, maybe the same computer program, the same algorithm can process sight or
sound or touch. And, if we can discover what that algorithm is and get our computers to do that, maybe
that will let us make faster progress in perception.

So, how does the brain work? Well, your brain and mine are jam-packed full of neurons that are tightly
connected to and talk to each other. In a computer we can, therefore, build what’s called an artificial
neural network. The other technical term is a smart learning algorithm. So we can build a neural
network that simulates all these neurons being connected and talking to each other.
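
As a minimal sketch of that idea (not from the talk), each simulated neuron below adds up the signals
it receives, weighted by connection strengths, and passes on a value between 0 and 1. The weights,
biases, and inputs are arbitrary made-up numbers, and the sigmoid squashing function is one common
choice assumed here for illustration.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, squashed into (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A group of neurons that all listen to the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical numbers: two inputs feed two hidden neurons, which feed one output.
signals = [0.5, -1.0]
hidden = layer(signals, [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.1])
output = layer(hidden, [[1.5, -2.0]], [0.2])
```

In a learning system the weights are not typed in by hand as they are here; they are adjusted
automatically from data.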

Finally, what do we want these neural networks to do? Let’s turn to biology one last time. It turns out
that if you look at how the brain processes images, the way you and I see, the first thing that your
brain does when you see an image is your brain will look for short lines in the image, it will look for
edges. For example, there is probably a neuron in your brain right now looking for a short vertical
line like that shown on the left and there is probably a different neuron in your brain looking for a
45 degree line like that shown on the right.
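
One way to picture a neuron "looking for" a particular line is as a small template that responds
strongly when the pixels under it match. The sketch below is not from the talk: the 3x3 templates are
hand-written assumptions chosen for illustration, whereas the point of the talk is that such detectors
can be learned.

```python
# Hypothetical 3x3 templates: positive weights where the line should be,
# negative weights elsewhere, so a uniform patch produces no response.
VERTICAL = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]
DIAGONAL_45 = [
    [-1, -1, 2],
    [-1, 2, -1],
    [2, -1, -1],
]


def response(patch, template):
    """How strongly a patch matches a template (sum of pixel * weight)."""
    return sum(
        pixel * weight
        for patch_row, template_row in zip(patch, template)
        for pixel, weight in zip(patch_row, template_row)
    )


# Two tiny test images: 1 = bright pixel, 0 = dark pixel.
vertical_line = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
diagonal_line = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

vertical_on_vertical = response(vertical_line, VERTICAL)
vertical_on_diagonal = response(diagonal_line, VERTICAL)
diagonal_on_diagonal = response(diagonal_line, DIAGONAL_45)
```

Each detector fires for its own line orientation and stays quiet for the other, which is the behavior
attributed to the two neurons in the example.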

Here are 16 little edges or lines that 16 neurons in your brain may be looking for. It’s shown at lower
resolution. What we did was we ran the learning algorithm and for what we know occurs in the brain,
which is what is shown on the left, we found the closest matches in what the learning algorithm can do
and that’s shown on the right. It’s sort of not a perfect match. But this means a piece of software can
explain early visual processing in your brain and mine remarkably well.

How about audio? Well, it turns out we visualize sound snippets, we visualize audio, using … which are
pictures that look like these, and these correspond to six different sound snippets that neurons in
your auditory processing system may be looking for. We ran exactly the same learning algorithm as from
the previous slide and for each of these found the closest match, and there it is. And what this means
is that one computer program, the exact same computer program, can on the one hand do a surprisingly
good job mimicking how the brain processes vision, and on the other do a surprisingly good job
mimicking how the brain processes audio, and it turns out it can mimic how we process touch too, and so
on.

So what are the implications for computer vision? Does this work? The final thing we did was take these
ideas and apply them to computer vision tasks. And we’ve done this on various benchmarks with varying
degrees of success. But let me share with you just one result. On one particular benchmark, computer
vision recognizes objects correctly 87% of the time, but when you use a neural network the accuracy
jumps up to near perfect.

One last example, and then I will close. So, recently just for fun we actually sent a robot around my
office to look for coffee mugs. Initially these were the results we were getting. That’s a map of my
office building and initially we were getting results like these, where every red dot is a mistake made
by …

We did a lot of things to improve the algorithm. We changed, to change the … system we added … the
algorithm. But now our most recent result looks like this. On our most recent run, out of 28 coffee
mugs it found 28. And there, in fact, are all the students’ coffee mugs.

Let me just close with a personal story. Ever since I was a kid I always wanted to work with AI and
work with robots. And then I got older and I got into college and I started to learn about AI in
college and I learned how hard it was, and it turns out that AI has given us tons of great stuff. AI
has helped us to build web search engines like Google and Bing. It’s given us … It has given us tons of
great stuff. But there was always this big dream of not just building web search engines and … filters,
but of building machines that could see the world and think and understand the world and be intelligent
the way that people are, and for the longest time I gave up on that big dream. And for many years as a
professor I was even advising students to not think about that big AI dream because it was just too
hard. It had failed for many years. It was just too hard to make progress. And that last part, that way
of advising students, is something I am not that proud of now.

And it was only 5 years ago when I learned about these ideas in neuroscience and machine learning that
the brain might be much simpler than we had thought and that it might be possible to replicate some of
how the brain works in the computer and use that to build perception systems. So, that, about 5 years
ago, was the first time in my adult life when I felt like we might have a chance of making progress in
this game. And in terms of getting this robot to clean your house, I think our best shot is if we
figure out how the brain works and program it that way to make it smart enough to clean your house.


I might be wrong. A lot of what I’m saying may turn out to be false. What we’re working on may
completely fail, but when I look at … of our lives, I see all of us spending so much time in acts of
mental drudgery, spending so much time cleaning our houses, filling out silly paperwork, having to go
shopping for trivial items, and I think if we can make a robot smart enough to do some of these things
for us to free up our time for higher endeavors, what could be more exciting than that?

Thank you.
