Introduction to Machine Learning


AI
CS289 Machine Learning

30th October 2006
Dr Bogdan L. Vrusias
b.vrusias@surrey.ac.uk


Contents


Learning


Artificial Neural Networks


Supervised Learning


Unsupervised Learning


What is Learning?


‘The action of receiving instruction or acquiring knowledge’.



‘A process which leads to the modification of behaviour or the
acquisition of new abilities or responses, and which is additional to
natural development by growth or maturation’.








Source: Oxford English Dictionary Online: http://www.oed.com/, accessed October 2003.


Machine Learning


Negnevitsky:


‘In general, machine learning involves adaptive mechanisms that enable
computers to learn from experience, learn by example and learn by
analogy’ (2005:165)


Callan:


‘A machine or software tool would not be viewed as intelligent if it could
not adapt to changes in its environment’ (2003:225)


Luger:


‘Intelligent agents must be able to change through the course of their
interactions with the world’ (2002:351)



Learning capabilities can improve the performance of an intelligent
system over time.



The most popular approaches to machine learning are artificial neural networks and genetic algorithms.


Types of Learning


Inductive learning


Learning from examples



Evolutionary/genetic learning


Shaping a population of individual solutions through survival of the fittest


Emergent learning


Learning through social interaction: the Game of Life


Inductive Learning


Supervised learning


Training examples with a known classification from a teacher



Unsupervised learning


No pre-classification of training examples


Competitive learning: learning through competition on training examples


Key Concepts


Learn from experience


Through examples, analogy or discovery



To adapt


Changes in response to interaction



Generalisation


To use experience to form a response to novel situations


Generalisation


Uses of Machine Learning


Techniques and algorithms that adapt through experience.



Used for:


Interpretation / visualisation: summarise data


Prediction: time series / stock data


Classification: malignant or benign tumours


Regression: curve fitting


Discovery: data mining / pattern discovery


Why Machine Learning?


Complexity of task / amount of data


Other techniques fail or are computationally expensive



Problems that cannot be defined


Discovery of patterns / data mining



Knowledge Engineering Bottleneck


‘Cost and difficulty of building expert systems using traditional […]
techniques’ (Luger 2002:351)


Common Techniques


Least squares


Decision trees


Support vector machines


Boosting


Neural networks


K-means


Genetic algorithms


Decision Trees


A map of the reasoning process, good at solving classification
problems (Negnevitsky, 2005)



A decision tree represents a number of different attributes and values

Nodes represent attributes

Branches represent values of the attributes

A path through the tree represents a decision

A tree can be associated with rules


Example 1


Consider one rule for an ice-cream seller (Callan 2003:241):


IF Outlook = Sunny

AND Temperature = Hot

THEN Sell
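Such a rule reads directly as code. A minimal Python sketch (the function name and string encoding here are hypothetical illustrations, not from Callan):

```python
# Hypothetical encoding of the ice-cream seller rule:
# IF Outlook = Sunny AND Temperature = Hot THEN Sell
def should_sell(outlook: str, temperature: str) -> bool:
    """Return True (Sell) only when both conditions of the rule hold."""
    return outlook == "Sunny" and temperature == "Hot"

print(should_sell("Sunny", "Hot"))      # True: both conditions hold
print(should_sell("Overcast", "Hot"))   # False: the Outlook condition fails
```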


Example 1

[Figure: decision tree for the ice-cream seller. The root node tests Outlook (Sunny / Overcast); internal nodes test Temperature (Hot / Mild / Cold) and Holiday Season (Yes / No); the leaves are Sell or Don't Sell. Annotations mark the root node, a branch, a node, and a leaf.]


Construction


Concept learning:


Inducing concepts from examples



We can intuitively construct a decision tree for a small set of examples



Different algorithms are used to construct a tree based upon the examples

The most popular is ID3 (Quinlan, 1986)
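ID3 chooses, at each node, the attribute whose split yields the highest information gain (reduction in entropy). A minimal Python sketch of that criterion, using a made-up toy data set for illustration (not Quinlan's examples):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, label_key="label"):
    """Reduction in entropy obtained by splitting `examples` on `attribute`."""
    base = entropy([e[label_key] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[label_key] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

# Toy ice-cream data (illustrative only)
data = [
    {"outlook": "Sunny", "temp": "Hot", "label": "Sell"},
    {"outlook": "Sunny", "temp": "Cold", "label": "DontSell"},
    {"outlook": "Overcast", "temp": "Hot", "label": "DontSell"},
    {"outlook": "Overcast", "temp": "Cold", "label": "DontSell"},
]
# Both splits happen to give the same gain (about 0.311 bits) on this tiny set
print(information_gain(data, "outlook"), information_gain(data, "temp"))
```

ID3 would pick the attribute with the larger gain, split, and recurse on each branch until the examples at a node all share one label.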


Which Tree?


Different trees can be constructed from the same set of examples



Which tree is the best?


Based upon choice of attributes at each node in the tree


A split in the tree (branches) should correspond to the predictor with the
maximum separating power



Examples can be contradictory


Real-life data is noisy


Extracting Rules


We can extract rules from decision trees


Create one rule for each root-to-leaf path


Simplify by combining rules



Other techniques are not so transparent:


Neural networks are often described as ‘black boxes’: it is difficult to understand what the network is doing


Extraction of rules from trees can help us to understand the decision
process
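Root-to-leaf rule extraction can be sketched as follows; the nested-dict tree representation here is an assumption for illustration, not a standard structure:

```python
# A tiny decision tree as nested dicts (hypothetical structure):
# internal nodes name an attribute and map branch values to subtrees;
# a plain string is a leaf holding the class label.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Temperature",
                  "branches": {"Hot": "Sell", "Cold": "Don't Sell"}},
        "Overcast": "Don't Sell",
    },
}

def extract_rules(node, conditions=()):
    """Walk the tree; each root-to-leaf path yields one IF ... THEN ... rule."""
    if isinstance(node, str):                      # leaf: a class label
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules += extract_rules(child, conditions + ((node["attribute"], value),))
    return rules

for rule in extract_rules(tree):
    print(rule)
```

A simplification pass could then merge rules that share a conclusion, as the slide suggests.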


Issues


Use prior knowledge where available



Not all the examples may be needed to construct a tree


Test generalisation of tree during training and stop when desired
performance is achieved


Prune the tree once constructed


Examples may be noisy



Examples may contain irrelevant attributes


Artificial Neural Networks


An artificial neural network (or simply a neural network) can be defined as a model of reasoning based on the human brain.

The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.

The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them.

By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.


Artificial Neural Networks


Each neuron has a very simple structure, but an army of such elements
constitutes a tremendous processing power.



A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.


Artificial Neural Networks


Our brain can be considered as a highly complex, non-linear and parallel information-processing system.



Learning is a fundamental and essential characteristic of biological
neural networks. The ease with which they can learn led to attempts to
emulate a biological neural network in a computer.


Artificial Neural Networks


An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.

The neurons are connected by weighted links passing signals from one neuron to another.

The output signal is transmitted through the neuron’s outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.


The Perceptron


The operation of Rosenblatt’s perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.

The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative.
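As a sketch, the linear combiner followed by a hard limiter can be written as below; the particular weights and threshold value are arbitrary illustrations:

```python
def hard_limiter(x: float) -> int:
    """Produce +1 when the input is positive, -1 otherwise."""
    return 1 if x > 0 else -1

def perceptron_output(inputs, weights, threshold):
    """Linear combiner followed by a hard limiter: the weighted sum of
    the inputs, minus the threshold, is passed through the limiter."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return hard_limiter(weighted_sum)

print(perceptron_output([1, 1], [0.6, 0.6], 1.0))   # 1.2 - 1.0 > 0, so +1
print(perceptron_output([1, 0], [0.6, 0.6], 1.0))   # 0.6 - 1.0 < 0, so -1
```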


The Perceptron


How does the perceptron learn its classification tasks?

This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron.

The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain the output consistent with the training examples.
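The procedure above can be sketched as a training loop, assuming bipolar targets in {-1, +1} and a simple error-correction update (a common form of the perceptron learning rule; the learning rate and epoch count are arbitrary choices):

```python
import random

def train_perceptron(examples, n_inputs, eta=0.1, epochs=100, seed=0):
    """Perceptron learning: make small weight adjustments to reduce the
    difference between actual and desired outputs. `examples` holds
    (inputs, desired) pairs with desired in {-1, +1}; weights start at
    random values in [-0.5, 0.5], as in the slides."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    theta = rng.uniform(-0.5, 0.5)                 # threshold
    for _ in range(epochs):
        for x, desired in examples:
            actual = 1 if sum(xi * wi for xi, wi in zip(x, w)) - theta > 0 else -1
            error = desired - actual               # 0, +2 or -2
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
            theta -= eta * error
    return w, theta

# Learn the logical AND function (linearly separable, so learnable)
and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, theta = train_perceptron(and_data, 2)
```

After training, the learned weights and threshold classify all four AND patterns correctly.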


Multilayer neural networks


A multilayer perceptron is a feedforward neural network with one or more hidden layers.

The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.

The input signals are propagated in a forward direction on a layer-by-layer basis.
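Layer-by-layer forward propagation can be sketched as below; the sigmoid activation and the 2-2-1 layout with arbitrarily chosen weights are illustrative assumptions, not values from the slides:

```python
import math

def sigmoid(x):
    """A common smooth activation for computational neurons."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer of computational neurons: weighted sums, then sigmoid."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Propagate the input signal forward, one layer at a time."""
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# Hypothetical 2-2-1 network (weights chosen arbitrarily for illustration)
layers = [([[0.5, -0.4], [0.9, 1.0]], [0.1, -0.2]),   # hidden layer
          ([[-1.2, 1.1]], [0.3])]                      # output layer
print(mlp_forward([1.0, 0.0], layers))
```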


What does the middle layer hide?


A hidden layer “hides” its desired output.

Neurons in the hidden layer cannot be observed through the input/output behaviour of the network.



There is no obvious way to know what the desired output of the hidden
layer should be.



Commercial ANNs incorporate three and sometimes four layers,
including one or two hidden layers. Each layer can contain from 10 to
1000 neurons. Experimental neural networks may have five or even
six layers, including three or four hidden layers, and utilise millions of
neurons.


Supervised Learning


Supervised or active learning is learning with an external “teacher” or
a supervisor who presents a training set to the network.



The most popular supervised neural network is the back-propagation neural network.


Back-propagation neural network


Learning in a multilayer network proceeds the same way as for a perceptron.

A training set of input patterns is presented to the network.

The network computes its output pattern, and if there is an error, or in other words a difference between actual and desired output patterns, the weights are adjusted to reduce this error.
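That cycle (forward pass, error, backward weight adjustment) can be sketched for a tiny 2-2-1 sigmoid network; the learning rate, seed, and epoch count are arbitrary choices, and the Exclusive-OR patterns are used since the slides discuss that problem:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w_h, w_o, eta=0.5):
    """One back-propagation step for a 2-2-1 sigmoid network.
    Each weight row ends with a bias term. Forward pass first; then the
    output error is propagated backwards and all weights are adjusted."""
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    delta_o = (target - y) * y * (1 - y)           # sigmoid derivative y(1-y)
    delta_h = [h[j] * (1 - h[j]) * w_o[j] * delta_o for j in range(2)]
    for j in range(2):
        w_o[j] += eta * delta_o * h[j]
        for i in range(2):
            w_h[j][i] += eta * delta_h[j] * x[i]
        w_h[j][2] += eta * delta_h[j]              # hidden bias
    w_o[2] += eta * delta_o                        # output bias
    return (target - y) ** 2                       # squared error for this pattern

# Train on the Exclusive-OR patterns
random.seed(1)
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
errors = [sum(train_step(x, t, w_h, w_o) for x, t in patterns)
          for _ in range(2000)]
print(errors[0], errors[-1])   # the total error shrinks as weights are adjusted
```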


Back-propagation neural network


Back-propagation neural network

Network represented by the McCulloch-Pitts model for solving the Exclusive-OR operation.


Back-propagation neural network

(a) Decision boundary constructed by hidden neuron 3;
(b) Decision boundary constructed by hidden neuron 4;
(c) Decision boundaries constructed by the complete three-layer network


Unsupervised Learning


Unsupervised or self-organised learning does not require an external teacher.

During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.

Unsupervised learning tends to follow the neuro-biological organisation of the brain.

The most popular unsupervised neural networks are the Hebbian network and the Self-Organising Feature Map.


Hebbian Network


Hebb’s Law can be represented in the form of two rules:

1.
If two neurons on either side of a connection are activated
synchronously, then the weight of that connection is increased.

2.
If two neurons on either side of a connection are activated
asynchronously, then the weight of that connection is decreased.
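The two rules collapse into a single weight update if neuron activities are taken as bipolar values (+1 activated, -1 not activated), an assumption made here for illustration:

```python
def hebb_update(weight, x_in, y_out, alpha=0.1):
    """Hebb's law with bipolar activities (+1 = activated, -1 = not):
    synchronous activation (same sign) increases the weight,
    asynchronous activation (opposite signs) decreases it.
    `alpha` is a hypothetical learning-rate parameter."""
    return weight + alpha * x_in * y_out

w = 0.0
w = hebb_update(w, +1, +1)   # synchronous: weight rises to 0.1
w = hebb_update(w, +1, -1)   # asynchronous: weight falls back to 0.0
print(w)
```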



Competitive Learning


In competitive learning, neurons compete among themselves to be activated.

While in Hebbian learning several output neurons can be activated simultaneously, in competitive learning only a single output neuron is active at any time.



The output neuron that wins the “competition” is called the winner-takes-all neuron.

Self-organising feature maps are based on competitive learning.
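Winner selection can be sketched as:

```python
def winner_takes_all(activations):
    """Only the neuron with the largest activation produces an output
    signal; the activity of all other neurons is suppressed to zero."""
    winner = max(range(len(activations)), key=lambda i: activations[i])
    return [a if i == winner else 0.0 for i, a in enumerate(activations)]

print(winner_takes_all([0.2, 0.9, 0.5]))   # [0.0, 0.9, 0.0]
```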


What is a self-organising feature map?


Our brain is dominated by the
cerebral cortex
, a very complex
structure of billions of neurons and hundreds of billions of synapses.



The cortex includes areas that are responsible for different human activities (motor, visual, auditory, somatosensory, etc.), and associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organising computational map in the human brain.

The self-organising feature map was introduced by Teuvo Kohonen and is therefore also called the Kohonen network.


What is a self-organising feature map?


The Kohonen network


The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer.

Training in the Kohonen network begins with the winner’s neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.


The Kohonen network


The lateral connections are used to create a competition between
neurons.



The neuron with the largest activation level among all neurons in the
output layer becomes the winner.



This neuron is the only neuron that produces an output signal. The
activity of all other neurons is suppressed in the competition.



The lateral feedback connections produce excitatory or inhibitory
effects, depending on the distance from the winning neuron.


Competitive learning in the Kohonen network


To illustrate competitive learning, consider the Kohonen network with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns.

The network is required to classify two-dimensional input vectors; each neuron in the network should respond only to the input vectors occurring in its region.

The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between -1 and +1.
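That training setup can be sketched as follows. The Gaussian neighbourhood function, the linear decay schedules, and the initial weight range are illustrative assumptions, not the exact parameters of the slides:

```python
import math
import random

def train_som(data, rows=10, cols=10, epochs=1000, eta0=0.1, seed=0):
    """Sketch of Kohonen training on 2-D inputs: find the winning neuron
    (the one with the closest weight vector), then pull it and its lattice
    neighbours toward the input. The learning rate and the neighbourhood
    radius both shrink as training proceeds."""
    rng = random.Random(seed)
    w = [[[rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]
          for _ in range(cols)] for _ in range(rows)]
    for t in range(epochs):
        x = data[t % len(data)]
        eta = eta0 * (1 - t / epochs)                       # decaying learning rate
        radius = max(1.0, (rows / 2) * (1 - t / epochs))    # shrinking neighbourhood
        # winner: the neuron whose weight vector is closest to the input
        wi, wj = min(((i, j) for i in range(rows) for j in range(cols)),
                     key=lambda ij: (w[ij[0]][ij[1]][0] - x[0]) ** 2
                                  + (w[ij[0]][ij[1]][1] - x[1]) ** 2)
        for i in range(rows):
            for j in range(cols):
                d2 = (i - wi) ** 2 + (j - wj) ** 2
                if d2 <= radius ** 2:
                    h = math.exp(-d2 / (2 * radius ** 2))   # Gaussian neighbourhood
                    w[i][j][0] += eta * h * (x[0] - w[i][j][0])
                    w[i][j][1] += eta * h * (x[1] - w[i][j][1])
    return w

# 1000 random two-dimensional vectors in the square between -1 and +1
rng = random.Random(42)
inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(1000)]
weights = train_som(inputs)
```

As training proceeds, the weight vectors spread out from their small initial values to cover the input square, so that each neuron responds to inputs in its own region.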


Competitive learning in the Kohonen network