Computer Projects
Projects are to be presented in class together with the simulation results and discussion. Please let me know
at least a week in advance if your presentation date needs to be changed.
Project 1
Wang Feng
In reinforcement learning (RL), the choice of action is based on the epsilon-greedy strategy, where
epsilon is set arbitrarily. Use your program for the n-armed bandit as an example (please send me the Matlab
code for your program). Then use the following procedure to establish the policy for choosing an action
(playing a particular machine):
1. Repeat steps 2-8 for the number of iterations in the experiment.
2. Calculate the preference $P'_a$ for the action taken, based on its average return value.
3. Using the preferences for each action, calculate their nominal probabilities from
(1)
4. Use the count of the number of times this action was selected to determine its 95% confidence
interval.
5. Assume that the nominal probability is a random variable having a normal distribution with the
mean value equal to the nominal probability $p_0$ and the standard deviation that you can derive
from its confidence interval.
6. Use the lookup tables for the normal distribution to find the probability that the particular
action has a higher return than the action with the maximum nominal probability $p_{0,\max}$ (the largest
among all actions). Use this as a new preference $P_{rn}$.
7. Based on $P_{rn}$, calculate the action probabilities $p_a$ in a way similar to (1).
8. Generate an action based on $p_a$.
9. Compare the accumulated reward with other methods.
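Steps 2-9 can be sketched in Python as below. The assignment asks for Matlab, so this is only an illustration. The sketch rests on several assumptions that the text does not fix. The preference $P'_a$ is taken as the average return shifted to be positive. Equation (1) is taken as "probability proportional to preference". The 95% interval is the normal-approximation (Wald) interval for a proportion, so the standard deviation is the half-width divided by 1.96. `math.erf` replaces the printed normal tables. The names `confidence_probabilities` and `run_bandit` are hypothetical.

```python
import math
import random


def phi(z):
    """Standard normal CDF (used instead of printed lookup tables)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normalize(weights):
    """Assumed form of (1): probabilities proportional to non-negative preferences."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def confidence_probabilities(avg_returns, counts):
    """Map average returns and selection counts to action probabilities p_a."""
    low = min(avg_returns)
    prefs = [r - low + 1e-6 for r in avg_returns]   # P'_a: shifted average return (assumption)
    p0 = normalize(prefs)                            # nominal probabilities p_0
    p0_max = max(p0)
    p_rn = []
    for p, n in zip(p0, counts):
        # Wald 95% interval p +/- 1.96*sqrt(p(1-p)/n), hence sigma = half-width / 1.96
        sigma = math.sqrt(p * (1.0 - p) / max(n, 1))
        if sigma == 0.0:
            p_rn.append(1.0 if p >= p0_max else 0.0)
        else:
            p_rn.append(1.0 - phi((p0_max - p) / sigma))   # P(N(p, sigma) > p0_max)
    return normalize(p_rn)                                  # p_a from P_rn


def run_bandit(means, steps, policy="confidence", epsilon=0.1, seed=0):
    """Play a Gaussian n-armed bandit and return the accumulated reward."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    avg = [0.0] * k
    total = 0.0
    for _ in range(steps):
        if policy == "confidence":
            a = rng.choices(range(k), weights=confidence_probabilities(avg, counts))[0]
        elif rng.random() < epsilon:                  # epsilon-greedy: explore
            a = rng.randrange(k)
        else:                                         # epsilon-greedy: exploit
            a = max(range(k), key=lambda i: avg[i])
        r = rng.gauss(means[a], 1.0)
        counts[a] += 1
        avg[a] += (r - avg[a]) / counts[a]            # incremental average return
        total += r
    return total
```

For step 9, call `run_bandit(means, 2000, "confidence")` and `run_bandit(means, 2000, "epsilon")` on the same arm means and compare the totals. Any policy string other than `"confidence"` falls back to epsilon-greedy. Results vary with the seed, so average over several runs before drawing conclusions.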
Please contact me if you are not clear about this assignment.
Project 2
Yinyin Liu
Use the balancing beam experiment in RL, replacing the critic network with the SOLAR software. Since the
existing SOLAR operates in batch mode, you must store all the training data, appending each new group of
training data as soon as it becomes available.
Train a new SOLAR network after each new set of training data is obtained and appended. The
action network then uses SOLAR without modification until a new reward is received. This generates a
new subset of training data; after appending it to the stored training data, repeat the self-organization
and training of SOLAR.
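The append-and-retrain cycle can be sketched as follows. `train_critic` is a hypothetical stand-in for one complete batch SOLAR run (self-organization plus training), and it is not SOLAR's actual interface. Here it is a 1-nearest-neighbour regressor so that the sketch runs on its own. The point the sketch shows is that the action network queries a frozen critic, and only a new reward triggers a full retraining on all stored data.

```python
def train_critic(data):
    """Hypothetical stand-in for one batch SOLAR run on all stored data.

    Here it is a 1-nearest-neighbour regressor so the sketch runs on its own.
    """
    stored = list(data)  # snapshot: the critic does not change until the next retraining

    def critic(state):
        if not stored:
            return 0.0
        _, value = min(stored, key=lambda sv: sum((a - b) ** 2 for a, b in zip(sv[0], state)))
        return value

    return critic


class BatchRetrainedCritic:
    """Keeps every training pair and rebuilds the critic after each new reward."""

    def __init__(self):
        self.training_data = []          # all (state, value) pairs received so far
        self.critic = train_critic(self.training_data)
        self.retrain_count = 0

    def on_reward(self, new_pairs):
        """Append the newest group of training data, then retrain from scratch on everything."""
        self.training_data.extend(new_pairs)
        self.critic = train_critic(self.training_data)
        self.retrain_count += 1

    def evaluate(self, state):
        """Used by the action network between rewards; the critic is not changed here."""
        return self.critic(state)
```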
Please contact me if you are not clear about this assignment.
Project 3
Michelle Sayre
In neural network learning, a lot of attention is paid to function approximation by selected basis
functions. Deciding how many basis functions should be used is one of the challenges. Since each new
basis function used in the approximation reduces the number of degrees of freedom by one, the absolute
limit for the number of basis functions is the number of sample data points. The danger of overfitting is
obvious. To have a clear indication of overfitting, we need to determine when the difference
between the approximated and the measured data has the characteristics of a noise signal.
To approximate a function without overfitting, perform the following operations:
1. Assume that your unknown function u(x) is described by a vector of samples s(x) on the unit
interval.
2. Select a set of B basis functions (for instance, basic polynomials $f_i(x) = x^i$, $i = 0, \dots, B$).
3. Use a least-squares fit to your data with the selected basis to obtain the approximate function a(x).
4. Calculate the approximation error signal $e(x) = s(x) - a(x)$.
5. Determine the signal-to-noise ratio of e(x) using signal correlation. The autocorrelation of e(x)
(at zero shift) determines the signal-plus-noise energy, while the correlation with the shifted signal approximates
the signal energy. The difference of the values so determined gives the noise energy included in e(x),
so that the S/N ratio of e(x) is obtained from
(1)
where $e_i$ is the original error vector and $e_{i-1}$ is the shifted error vector (use the circular shift).
6. Compare $(S/N)_e$ with $(S/N)_{gn}$, where $(S/N)_{gn}$ is the S/N computed in the same way for Gaussian
noise with zero mean and standard deviation equal to 1. This S/N of Gaussian noise is a
random variable, and its statistics can be estimated directly using
where n is the number of samples.
7. If $(S/N)_e > (S/N)_{gn\_mean} + (S/N)_{gn\_std} = 1.4204\,(S/N)_{gn\_mean}$, this means that, most likely, not
all the information was extracted from the sampled data. In such a case, increase B by 1 and repeat steps
2-6; otherwise, stop.
Display your results.
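The loop in steps 1-7 can be sketched in Python under several assumptions that should be checked against the missing equations. First, (1) is taken as $(S/N)_e = \sum_i e_i e_{i-1} / (\sum_i e_i^2 - \sum_i e_i e_{i-1})$, which follows the verbal description in step 5. Second, the closed-form statistics of step 6 are replaced by a Monte Carlo estimate of the mean and standard deviation of the S/N of unit Gaussian noise. Third, the stopping test uses that estimated mean plus one standard deviation directly, rather than the 1.4204 factor. The helper names are hypothetical. Plain normal equations are used, so `max_B` is kept small because monomial bases become ill-conditioned.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def poly_fit(xs, s, B):
    """Least-squares fit of s with the basis x^0, ..., x^B (normal equations)."""
    m = B + 1
    G = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    h = [sum(si * x ** i for x, si in zip(xs, s)) for i in range(m)]
    c = solve(G, h)
    return [sum(ci * x ** i for i, ci in enumerate(c)) for x in xs]


def snr(e):
    """Assumed form of (1): signal = sum e_i*e_(i-1) (circular), noise = sum e_i^2 - signal."""
    c0 = sum(v * v for v in e)
    c1 = sum(e[i] * e[i - 1] for i in range(len(e)))   # e[-1] wraps around: circular shift
    if c0 == 0.0:
        return 0.0
    return math.inf if c0 - c1 <= 0.0 else c1 / (c0 - c1)


def noise_threshold(n, trials=500, seed=1):
    """Monte Carlo mean + std of the S/N of zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)
    vals = [snr([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
    mean = sum(vals) / trials
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / (trials - 1))
    return mean + std


def choose_basis_size(xs, s, max_B=8):
    """Increase B until the error e(x) = s(x) - a(x) looks like Gaussian noise."""
    limit = noise_threshold(len(xs))
    for B in range(max_B + 1):
        e = [si - ai for si, ai in zip(s, poly_fit(xs, s, B))]
        if snr(e) <= limit:
            return B, e
    return max_B, e
```

A smooth, under-fitted error gives a large S/N, because neighbouring samples are strongly correlated. White noise gives an S/N close to zero, which is why the loop stops once the residual stops looking correlated.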
Please contact me if you are not clear about this assignment.
Project 4
Wenlong Ni
In the Monte Carlo method in reinforcement learning (RL), the choice of action in each state is either
random or on-policy. Use your Blackjack program as an example (please send me the Matlab code for
your program). The initial state of each episode is randomly generated, and play starts from this initial state
and continues until the episode is finished, which results in a reward. This reward is averaged with the other
rewards for this state to approximate the state-action value. Modify your program to incorporate changes in
the action choice in each state. Use the following procedure to establish the policy for choosing an action
(hit or stay) for each state:
1. Repeat steps 2-8 for the number of iterations in the experiment.
2. Calculate the preference $R'_a$ for the action taken in a given state, based on its average return value.
3. Using the preferences for each action, calculate their nominal probabilities from
(1)
where $R'_b$ is the reward received for the other action from this state.
4. Use the count of the number of times this action was selected to determine its 95% confidence
interval.
5. Assume that the nominal probability $p_0$ is a random variable having a normal distribution with the
mean value equal to the nominal probability $p_0$ and the standard deviation that you can derive
from its confidence interval.
6. Use the lookup tables for the normal distribution to find the probability that the particular
action has a higher reward than the action with the maximum nominal probability $p_{0,\max}$ (the larger of the two
actions available in a given state). Use this as a new preference $R_{rn}$.
7. Based on $R_{rn}$, calculate the actual action probability $p_a$ in a way similar to (1).
8. Generate an action in a given state based on $p_a$ and $p_b$, and apply it to the game.
9. Determine the policy based on $v_a$ and $v_b$, where $v_a$ and $v_b$ are the average values of each action in a
given state. Stop if this is an optimal policy.
10. Compare the number of iterations needed to learn the optimal policy for this game with the result
you obtained in the last quarter.
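The bookkeeping behind the averaging of episode rewards and the policy-determination step can be sketched in Python (the assignment asks for Matlab). Each state is assumed to be a hashable tuple such as (player sum, dealer's showing card, usable ace). That encoding, the class name, and the `reference` policy are hypothetical. The reference optimal policy used for the stopping test is not given in the text and must be supplied by you.

```python
from collections import defaultdict


class StateActionAverages:
    """Running average of episode rewards for every (state, action) pair."""

    def __init__(self):
        self.count = defaultdict(int)
        self.value = defaultdict(float)

    def record_episode(self, visited, reward):
        """Average the final reward of one episode into each visited (state, action) pair."""
        for key in visited:
            self.count[key] += 1
            self.value[key] += (reward - self.value[key]) / self.count[key]

    def greedy_action(self, state):
        """Choose 'hit' only if its average value v_a beats the average value v_b of 'stay'."""
        v_hit = self.value.get((state, "hit"), 0.0)
        v_stay = self.value.get((state, "stay"), 0.0)
        return "hit" if v_hit > v_stay else "stay"

    def policy(self, states):
        return {s: self.greedy_action(s) for s in states}


def matches_reference(policy, reference):
    """Stopping test: the learned policy agrees with a reference policy in every listed state."""
    return all(policy.get(s) == a for s, a in reference.items())
```

Counting the episodes played before `matches_reference` first returns `True` gives the iteration count that the final step asks you to compare.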
Please contact me if you are not clear about this assignment.
Project 5
Haibo He
In online statistical learning, a neuron adjusts its threshold and determines its function based on the
dynamically received training data. This is effectively done by dynamic threshold adjustment based on the
average sample value. To generalize this approach, consider dynamic adjustment of the separating
function described by a linear combination of selected basis functions. Assume that the input
data received by the neuron were normalized and scaled between 1 and 256, and that each neuron has
two inputs describing the neuron's subspace data points with two coordinates, x and y.
Assume that the neuron dynamically adjusts its separating (thresholding) function to minimize the least-squares
error of the function approximating the correlation between all training data x and y as follows:
(1)
where $F_i$ are n-dimensional vectors and n is the number of data points.
Using the least-squares solution, we can determine a, b, c, and d by the pseudo-inverse. To do this dynamically, we
need to accumulate function values and their combinations for different input samples. Equation (1) can be
solved as follows:
(2)
then
(3)
For online implementation, this requires storage of 14 values of different combinations:
(4)
As new samples arrive, these 14 values are updated, and (3) is solved for the new coefficients. Using the new
coefficients, a data sample is classified as passing the threshold if
(5)
otherwise it is not. Use this scheme to modify online threshold adjustment and learning. Demonstrate it
with a two-class classification problem. Notice that you can use the difference
as a new variable z. Then normalize z to lie in the [0,1] interval by performing the transformation
(6)
and use it to compute the neuron's output by calculating
(7)
where p > 0.5 is the probability of correct classification, computed in a way similar to the thresholding neuron
(check that f(z) is between 0 and 256).
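The online scheme can be sketched in Python, assuming that (1) has the form $y \approx a F_1(x) + b F_2(x) + c F_3(x) + d F_4(x)$. Four basis functions need the 10 distinct Gram sums $\sum F_i F_j$ ($i \le j$) plus the 4 sums $\sum F_i y$, which gives the 14 stored values. The basis below (a cubic in x/256) is a hypothetical choice. The rule "passes the threshold if y lies above the fitted curve" is only an assumed reading of (5).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


class OnlineLeastSquares:
    """Online least squares that keeps only the sums needed for the normal equations."""

    def __init__(self, basis):
        self.basis = basis
        m = len(basis)
        self.gram = {(i, j): 0.0 for i in range(m) for j in range(i, m)}   # sum F_i*F_j, i <= j
        self.rhs = [0.0] * m                                              # sum F_i*y

    def stored_values(self):
        return len(self.gram) + len(self.rhs)

    def update(self, x, y):
        """Add one new sample to the accumulated sums."""
        f = [g(x) for g in self.basis]
        for i, j in self.gram:
            self.gram[(i, j)] += f[i] * f[j]
        for i, fi in enumerate(f):
            self.rhs[i] += fi * y

    def coefficients(self):
        """Solve the normal equations for (a, b, c, d) from the stored sums."""
        m = len(self.rhs)
        G = [[self.gram[(min(i, j), max(i, j))] for j in range(m)] for i in range(m)]
        return solve(G, self.rhs)

    def passes_threshold(self, x, y):
        """Assumed rule standing in for (5): the sample lies above the fitted curve."""
        coeffs = self.coefficients()
        return y > sum(c * g(x) for c, g in zip(coeffs, self.basis))


# Hypothetical basis; inputs in [1, 256] are scaled to (0, 1] to keep the sums well conditioned.
BASIS = [lambda x: 1.0, lambda x: x / 256.0, lambda x: (x / 256.0) ** 2, lambda x: (x / 256.0) ** 3]
```

Only the 14 sums are kept, so memory does not grow with the number of samples. Each update costs a fixed amount of work, plus one 4-by-4 solve whenever new coefficients are needed.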
Project 6
Zhineng Zhu
In online statistical learning, a neuron adjusts its threshold and determines its function based on the
dynamically received training data. This is effectively done by dynamic threshold adjustment based on the
average sample value. A communication channel was used to transfer data between layers of neurons, so that
effective pipelining may take place even if neurons are connected across several layers. Generalize the
communication channel to process two-dimensional input vectors. Neurons should choose their
interconnections from the 2D neighborhood and determine how many connections should be used.
Neurons make their decisions based on the probability of correct classification of the connected neurons.
Perform classification based on the logic neurons' outputs and apply it to a selected two-class classification
problem.
Project 7
Mingwei Ding
The existing program for online threshold-based classification uses data values only at the input nodes.
The remaining nodes process logic information, which is based on the probability of correct classification
at individual logic nodes. This gives the triangular logic structure, as input nodes are merged together in
groups of 5 (or any other number) and logic information is purified as it reaches the final output logic layer.
Modify this program to process numerical data in several layers by using a linear fit to the input data.
Then the classification results of arithmetic nodes in all layers are combined using the triangular logic
structure, as in the existing program. This logic structure takes inputs from all arithmetic neurons, so the
number of logic input nodes is increased. Perform classification in this structure.
To introduce online dynamic threshold adjustment, use the linear separating plane in the input
space. Assume that a neuron has k inputs describing the neuron's subspace data points with coordinates
$x_1, x_2, \dots, x_k$.
Assume that the neuron dynamically adjusts its separating (thresholding) function to minimize the least-squares
error of the function approximating the correlation between all training data $x_1, x_2, \dots, x_k$ as follows:
(1)
where $X_{mi}$, $m = 1, \dots, k$, are n-dimensional vectors, $\mathbf{1}$ is a vector of all ones,
and n is the number of data points.
Using the least-squares solution, we can determine $a_1, \dots, a_k$ by the pseudo-inverse. To do this dynamically, we
need to accumulate function values and their combinations for different input samples. Equation (1) can be
solved as follows:
(2)
then
(3)
For online implementation, this requires storage of k(k+3)/2 values of different combinations:
(4)
As new samples arrive, these k(k+3)/2 values are updated, and (3) is solved for the new coefficients. Using the
new coefficients, a data sample $x_1, x_2, \dots, x_k$ is classified as passing the threshold if
(5)
otherwise it is not. Use this scheme to modify online threshold adjustment and learning. Demonstrate it
with a two-class classification problem. Notice that you can use the difference
as a new variable z. Then normalize z to lie in the [0,1] interval by performing the transformation
(6)
and use it to compute the numerical neuron's output by calculating
(7)
(check that f(z) is between 0 and 256).
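A sketch for general k follows, assuming that (1) has the form $X_k \approx a_1 X_1 + \dots + a_{k-1} X_{k-1} + a_k \mathbf{1}$. That form is consistent with the stated storage count. The k regressors $(x_1, \dots, x_{k-1}, 1)$ need $k(k+1)/2$ distinct products plus k sums with the target $x_k$, which gives $k(k+3)/2$ values. Treating $z > 0$ as "passing the threshold" is only an assumed reading of (5), and the class name is hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


class OnlinePlaneFit:
    """Online fit of x_k ~ a_1 x_1 + ... + a_(k-1) x_(k-1) + a_k (assumed form of (1))."""

    def __init__(self, k):
        self.k = k
        self.gram = {(i, j): 0.0 for i in range(k) for j in range(i, k)}   # k(k+1)/2 sums
        self.rhs = [0.0] * k                                              # k sums

    @staticmethod
    def _regressors(sample):
        return list(sample[:-1]) + [1.0]          # x_1..x_(k-1) and the constant 1

    def update(self, sample):
        """Add one sample (x_1, ..., x_k) to the accumulated sums."""
        r = self._regressors(sample)
        for i, j in self.gram:
            self.gram[(i, j)] += r[i] * r[j]
        for i, ri in enumerate(r):
            self.rhs[i] += ri * sample[-1]

    def stored_values(self):
        return len(self.gram) + len(self.rhs)    # equals k(k+3)/2

    def coefficients(self):
        k = self.k
        G = [[self.gram[(min(i, j), max(i, j))] for j in range(k)] for i in range(k)]
        return solve(G, self.rhs)

    def difference(self, sample):
        """z = x_k minus the plane value; z > 0 is treated as passing the threshold (assumption)."""
        a = self.coefficients()
        return sample[-1] - sum(ai * ri for ai, ri in zip(a, self._regressors(sample)))
```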
Project 8
Zhu Zhen
Support Vector Machine is an efficient kernel-based statistical machine learning method. Use the
description of A Library for Support Vector Machines (LIBSVM) provided by Chih-Chung Chang and
Chih-Jen Lin on the web page at
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
and the OSU SVM Classifier Matlab Toolbox (ver 3.00) by Junshui Ma, Yi Zhao, and Stanley Ahalt,
which can be downloaded from
http://www.ece.osu.edu/~maj/osu_svm/
to illustrate the use of SVM on a selected application. (The descriptions on these web pages may be a good
example for a similar description of our SOLAR software.)
The OSU toolbox implements LIBSVM and is capable of dealing with practical classification
problems with huge training sets. This version contains two data-preprocessing functions,
Normalize.m and Scale.m.
Also use A Practical Guide to Support Vector Classification by Chih-Wei Hsu, Chih-Chung Chang,
and Chih-Jen Lin at
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
for explanations and experiment preparation.
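The practical guide recommends linearly scaling each attribute, for example to [-1, +1] or [0, 1], and applying the same scaling to the training and test data. A minimal standard-library sketch of that idea follows. It is not a description of what Scale.m or Normalize.m do; check the toolbox documentation for their exact behaviour. The function names are hypothetical.

```python
def fit_scaling(train_rows, lower=-1.0, upper=1.0):
    """Record per-feature minimum and maximum of the training data."""
    columns = list(zip(*train_rows))
    return [(min(c), max(c)) for c in columns], lower, upper


def apply_scaling(rows, params):
    """Linearly map every feature to [lower, upper] using the training-set ranges."""
    ranges, lower, upper = params
    scaled = []
    for row in rows:
        out = []
        for v, (lo, hi) in zip(row, ranges):
            if hi == lo:
                out.append(0.0)    # constant feature carries no information; 0.0 is an arbitrary choice
            else:
                out.append(lower + (upper - lower) * (v - lo) / (hi - lo))
        scaled.append(out)
    return scaled
```

The ranges come from the training set only, so test values outside that range map outside [-1, 1]. This is expected; the test data must not be rescaled with their own minimum and maximum.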
Project 9
Yangcui Huang
Consider the following two-class classification problem. Generate two groups with the same number of points
on a two-dimensional plane. For instance, choose two arbitrary mean values as the group centers and
generate a random set of points using a 2D Gaussian distribution with zero mean and unit standard
deviation, multiplied by two selected covariance matrices (a different one for each class). Make sure that the resulting
classes overlap. Plot the points from both classes on a 2D plane to illustrate your generated database.
Points in each group are then characterized by three numbers (x, y, s), where x and y are coordinates on the 2D
plane and s is a symbol (for instance, you can use s = 1 and s = -1 for the two classes).
1. Combine all points together and find their mean value $(x_{mn}, y_{mn})$.
2. Draw the separation boundary line $y_{mn} x - x_{mn} y = 0$.
3. This line should divide your set of points into two subsets:
$S_1 = \{(x,y) \mid y_{mn} x - x_{mn} y > 0\}$ and $S_2 = \{(x,y) \mid y_{mn} x - x_{mn} y < 0\}$.
4. Determine the total number of points in each subset, $n_1$ and $n_2$, and calculate the four probabilities $p_{ij}$
based on the number of points from each class in subsets $S_1$ and $S_2$:
5. Find the largest probability $p_{ij}$, and associate points from class j with subspace i and points of
the opposite class with the opposite subspace. In each subspace, remove all the points of the class
associated with this subspace.
6. Using the remaining points, solve the least-squares fit problem that satisfies the following equation:
where X and Y contain the coordinates of all the remaining points, and a and b are unknown values.
For a hint on how to solve the least-squares fit problem, see the discussion in Project 7, equations (1)-(3).
7. Use the solution obtained in step 6 as the new separation boundary between the two sets of points.
8. Repeat steps 3-7 until there is no change in the separation boundary line equation. Determine the
probability of correct classification based on the average probability of the classes associated with each
subspace.
Try your program on several data sets.
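Steps 1-8 can be sketched in Python under assumptions that fill the two missing equations. In step 4, $p_{ij}$ is taken as the fraction of the points in $S_i$ that belong to class j. In step 6, the equation is taken as $Y = aX + b$, so the new boundary is $y - ax - b = 0$. The class centers and mixing matrices in `make_data` are hypothetical values. Points lying exactly on the line are ignored. An iteration cap is added in case the boundary oscillates. Plotting is left out.

```python
import random


def make_data(n, seed=0):
    """Two overlapping classes: center + M @ (standard normal pair), one hypothetical M per class."""
    rng = random.Random(seed)
    specs = [(1, (1.0, 2.0), ((1.0, 0.5), (0.0, 1.0))),
             (-1, (2.5, 1.0), ((0.7, 0.0), (0.3, 1.2)))]
    data = []
    for s, (mx, my), ((m11, m12), (m21, m22)) in specs:
        for _ in range(n):
            u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            data.append((mx + m11 * u + m12 * v, my + m21 * u + m22 * v, s))
    return data


def fit_line(points):
    """Least-squares fit of Y = a*X + b (assumed form of the equation in step 6)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx
    if n < 2 or det == 0.0:
        return None
    a = (n * sxy - sx * sy) / det
    return a, (sy - a * sx) / n


def evaluate(line, points):
    """Steps 3-5: split by the sign of A*x + B*y + C, compute p_ij, associate classes."""
    A, B, C = line
    subsets = {1: [], 2: []}
    for p in points:
        g = A * p[0] + B * p[1] + C
        if g > 0:
            subsets[1].append(p)
        elif g < 0:
            subsets[2].append(p)
    prob = {}
    for i in (1, 2):
        n_i = len(subsets[i])
        for j in (1, -1):
            # assumed definition: fraction of subset S_i that belongs to class j
            prob[(i, j)] = sum(1 for p in subsets[i] if p[2] == j) / n_i if n_i else 0.0
    i_best, j_best = max(prob, key=prob.get)
    assoc = {i_best: j_best, 3 - i_best: -j_best}
    p_correct = (prob[(1, assoc[1])] + prob[(2, assoc[2])]) / 2.0
    remaining = [p for i in (1, 2) for p in subsets[i] if p[2] != assoc[i]]
    return remaining, p_correct


def iterate_boundary(points, max_iter=50):
    """Steps 1-8; returns the final line (A, B, C) and its probability of correct classification."""
    xm = sum(p[0] for p in points) / len(points)
    ym = sum(p[1] for p in points) / len(points)
    line = (ym, -xm, 0.0)                         # y_mn*x - x_mn*y = 0
    for _ in range(max_iter):
        remaining, p_correct = evaluate(line, points)
        fit = fit_line([(p[0], p[1]) for p in remaining])
        if fit is None:
            return line, p_correct
        a, b = fit
        new_line = (-a, 1.0, -b)                  # y - a*x - b = 0
        if all(abs(u - v) < 1e-12 for u, v in zip(new_line, line)):
            return line, p_correct                # boundary no longer changes
        line = new_line
    return line, evaluate(line, points)[1]
```

To try several data sets, call `iterate_boundary(make_data(200, seed=k))` for different seeds, or change the centers and matrices in `specs`.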