Artificial Intelligence Tutorial 5

Answers
1. Use the table of examples below:

Exam  Use Calculator  Duration (hrs)  Lecturer   Term    Difficulty
1     yes             3               Jones      summer  easy
2     yes             3               Jones      spring  difficult
3     no              3               Smith      spring  difficult
4     no              2               Armstrong  summer  easy
5     yes             2               Jones      summer  easy
1a) Calculate the Entropy of the set of five examples with respect to the binary categorisation into difficult and easy problems. Use the formula:

Entropy(S) = –p+ log2(p+) – p– log2(p–)

where p+ is the proportion of exams which are in the positive category (which we’ll take to be difficult), and p– is the proportion of exams in the negative category.
There are 2 exams in the positive category and 3 in the negative category. Hence, p+ is 2/5 and p– is 3/5. We can simply put these into the formula as follows:

Entropy(S) = –(2/5) log2(2/5) – (3/5) log2(3/5)
To calculate log2(2/5) using our calculators, we need the well known result that log2(y) = ln(y)/ln(2), where ln(y) is the natural logarithm of y, which is found on most calculators. The result of the calculation is:

Entropy(S) = (–2/5)(–1.322) – (3/5)(–0.737) = 0.529 + 0.442 = 0.971
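This calculation can be checked with a short Python sketch (the helper name `entropy` is our own, not from the tutorial):

```python
import math

def entropy(pos, neg):
    """Entropy of a set containing pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:                      # take 0*log2(0) to be 0
            result -= p * math.log2(p)
    return result

# Two difficult (positive) and three easy (negative) exams:
print(round(entropy(2, 3), 3))  # 0.971
```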
1b) Suppose an agent using the ID3 algorithm has chosen to use the lecturer attribute of the problem as the top node in the decision tree it is learning. In English, why would the algorithm have chosen that attribute? What would the partially learned decision tree look like after that choice?
1c) What attribute would be chosen as the next to put on the tree under the Jones arrow?
1b) The ID3 algorithm begins by deciding which attribute scores most for the information gain calculation, which is itself based on the entropy calculation. Information gain for attribute A is calculated as:
Gain(S, A) = Entropy(S) – Σ_v (|S_v|/|S|) Entropy(S_v)

where S_v is the set of examples which take value v for attribute A. Information gain can be seen as the expected reduction in entropy which would be caused by knowing the value of attribute A. Hence, ID3 chooses the attribute which has the most expected reduction. The decision tree will therefore have the lecturer attribute as the top node of the tree, with three branches coming from it as follows:
Jones Smith Armstrong
1c) The lecturer attribute cannot be used, so the answer must be either the use-calculator, the term or the duration attribute. To determine which one would be used, we must calculate the information gain for each one with respect to the exams where Dr. Jones was the lecturer. So, we will only be using three examples in the following calculations, namely exams 1, 2 and 5: the ones which Dr. Jones lectured. Hence |S| will be taken as 3 from now on. Moreover, we need to calculate the entropy of this set of examples. Noting that only one example is difficult (positive) and two are easy (negative), we can calculate the entropy as:
Entropy(S) = –(1/3) log2(1/3) – (2/3) log2(2/3) = 0.528 + 0.390 = 0.918
We will start with the use-calculator attribute. This takes values yes and no, and the example set can be partitioned like this:

S_yes = {exam1, exam2, exam5} and S_no = {}
This means that the entropy of the ‘yes’ examples is the entropy of all the examples, so this is 0.918 as before. Also, the entropy of the ‘no’ examples is 0. Putting all this into the information gain calculation for the use-calculator attribute gives zero: because all the examples (of Jones’ exams) require a calculator, knowing this fact does not reduce the entropy of the system.
We now move on to the duration attribute. The values for this are 2 and 3, and S_2 is {exam5}, whereas S_3 is {exam1, exam2}. We need to calculate the value of Entropy(S_2) and Entropy(S_3). For S_2, as there is only one example, the entropy will be zero, because we take 0*log2(0) to be zero, as discussed in the notes. For S_3, there is one difficult exam and one easy exam. Therefore, the value of Entropy(S_3) will be –(1/2) log2(1/2) – (1/2) log2(1/2), which comes to 1. It had to be 1, because, with both examples having a different categorisation, the entropy must be as high as it possibly can be.
The information gain for the duration attribute will therefore be:

Gain(S, duration) = Entropy(S) – (|S_2|*Entropy(S_2))/|S| – (|S_3|*Entropy(S_3))/|S| = 0.918 – 0/3 – 2/3 = 0.251
If we finally look at the term attribute, we see that there are two possible values, spring and summer, and that S_spring = {exam2}, while S_summer = {exam1, exam5}. Hence Entropy(S_spring) will be 0, because there is only one example. Moreover, as exam1 and exam5 are both in the easy category, Entropy(S_summer) will also be zero. Therefore: Gain(S, term) = 0.918 – 0 – 0 = 0.918.
Hence, as the term attribute has the highest information gain, it is this attribute which will be chosen
as the next node below the Jones arrow in the decision tree.
[Tree: Lecturer at the root, with three branches each leading to a ? node]
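These three gains can be checked mechanically. Below is a small Python sketch over just the three Jones exams (the tuple layout and helper names are our own):

```python
import math

# (use_calculator, duration, term, difficulty) for exams 1, 2 and 5
jones = [
    ("yes", 3, "summer", "easy"),
    ("yes", 3, "spring", "difficult"),
    ("yes", 2, "summer", "easy"),
]

def entropy(examples):
    """Entropy over the difficult/easy categorisation."""
    total = len(examples)
    result = 0.0
    for label in ("difficult", "easy"):
        p = sum(1 for e in examples if e[-1] == label) / total
        if p > 0:                       # take 0*log2(0) to be 0
            result -= p * math.log2(p)
    return result

def gain(examples, i):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    g = entropy(examples)
    for v in {e[i] for e in examples}:
        s_v = [e for e in examples if e[i] == v]
        g -= len(s_v) / len(examples) * entropy(s_v)
    return g

for name, i in [("use_calculator", 0), ("duration", 1), ("term", 2)]:
    print(name, round(gain(jones, i), 3))
```

This reproduces the hand calculation: zero gain for use-calculator, about 0.25 for duration, and about 0.92 for term, so term is chosen.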
2. Suppose we have a function, f, which takes in two inputs and outputs an integer which takes only the value 10 or the value 17, such that f(2,2) = f(3,3) = 10 and f(–1,–1) = f(–2,–2) = 17.
2a) Plot f on a graph.
2b) Can f be represented as a perceptron? If so, explain your answer.
2c) Draw a suitable perceptron for the function.
2a) The plot is as follows:

[Graph: the points (2,2) and (3,3) labelled 10, the points (–1,–1) and (–2,–2) labelled 17, with a dotted line between the two groups]

This is a linearly separable graph, because we can draw a line through the graph which separates the 10s from the 17s, as indicated by the dotted line on the graph. Hence the function is learnable as a perceptron, because weights and a threshold can be found to distinguish the categories.
2b) To determine a suitable perceptron, we must realise that we only want the perceptron to “fire” when the two inputs are in the top right quadrant of the graph (i.e., both positive). Therefore, if we set the threshold to be zero, and the weights to both be 1, then two positive input values will produce a positive sum, and hence the perceptron will fire. On the other hand, two negative inputs will produce a negative sum. This will be less than the threshold (0), so the perceptron will not fire. Hence, the following simple perceptron is an acceptable representation of this function:

[Figure: a perceptron with two inputs, both weights 1, and threshold 0]
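A quick sketch of this perceptron in Python (the function name and the mapping of fire/not-fire onto the outputs 10 and 17 are our own framing):

```python
def perceptron_f(x, y):
    """Weights 1 and 1, threshold 0: fire (output 10) when the
    weighted sum exceeds the threshold, otherwise output 17."""
    weighted_sum = 1 * x + 1 * y
    return 10 if weighted_sum > 0 else 17

# The four points from the question:
for point in [(2, 2), (3, 3), (-1, -1), (-2, -2)]:
    print(point, perceptron_f(*point))
```

All four points come out with the correct value of f.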
3. Suppose we are training
a neural network to learn a function which takes three integers as input,
representing weight, fuel capacity and passenger numbers and outputs either “car” or “bus”. Suppose
the ANN currently looks like this:
3a) Why are there four input nodes? Fill in
the two ?s.
3b) Does the network predict a bus or a car for this triple of inputs: (10,12,13)?
3a) The top node is a special input node which always outputs 1. That’s why there is seemingly one more node than the input requires. The advantage to having this is so that the threshold can be set at zero and doesn’t have to be learned. So that means the two ?s are both zero.
3b) We have to remember that the output from the first input node is 1 and the output from the other three nodes is the same as the value input to them. Hence the weighted sum into the output node will be:

0.1*1 – 0.3*10 + 0.2*12 – 0.2*13 = 0.1 – 3 + 2.4 – 2.6 = –3.1

This is less than zero, so the network would predict that the triple of values came from a car.
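The feed-forward step can be sketched in a few lines of Python (the weight values are transcribed from the network above; the variable names are ours):

```python
def weighted_sum(weights, inputs):
    """Bias node first: its output is always 1, while the other
    input nodes pass their input value straight through."""
    outputs = [1] + list(inputs)
    return sum(w * o for w, o in zip(weights, outputs))

weights = [0.1, -0.3, 0.2, -0.2]
s = weighted_sum(weights, (10, 12, 13))
print(s)   # negative, so the network predicts "car"
```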
3c) Suppose that the triple (10,12,13) has been mis-categorised by the perceptron. Using the perceptron learning rule, calculate the weight changes for the weights in the network in light of this training error, if we use a learning rate of 0.1.
3d) What does the re-trained network look like?
3e) Does the re-trained network correctly categorise the example (10,12,13)?
3c) The actual output from the network for the triple (10,12,13) was –1 (a car), but the target output was +1 (a bus). The weight change for a weight between an input node and the output node is calculated by multiplying the output from the input node by the difference between the target and observed output values. These values are then scaled by multiplying by the learning rate. The difference between the target and observed values for the triple is 1 – (–1) = 2. The weight changes are therefore calculated as:

change for weight 1 = 0.1 * 2 * 1 = 0.2
change for weight 2 = 0.1 * 2 * 10 = 2
change for weight 3 = 0.1 * 2 * 12 = 2.4
change for weight 4 = 0.1 * 2 * 13 = 2.6
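The rule above can be sketched directly (learning rate and values taken from this part; the helper name is ours):

```python
def weight_changes(inputs, target, observed, rate=0.1):
    """Perceptron learning rule: delta_w = rate * (target - observed) * output,
    where the bias node's output is always 1 and the other input nodes
    output their input value unchanged."""
    outputs = [1] + list(inputs)
    return [rate * (target - observed) * o for o in outputs]

deltas = weight_changes((10, 12, 13), target=1, observed=-1)
print([round(d, 1) for d in deltas])   # [0.2, 2.0, 2.4, 2.6]
```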
3d) The weight changes get added on to the existing weights. Hence the new weights are:

weight 1 = 0.1 + 0.2 = 0.3
weight 2 = –0.3 + 2 = 1.7
weight 3 = 0.2 + 2.4 = 2.6
weight 4 = –0.2 + 2.6 = 2.4
and the altered network looks like this:

[Figure: the network with the new weights 0.3, 1.7, 2.6 and 2.4]
3e) If we now feed the input values forward through the network, we get the following value for the weighted sum:

1*0.3 + 10*1.7 + 12*2.6 + 13*2.4 = 79.7

This is clearly greater than zero, so the network now correctly predicts that the values came from a bus, rather than a car as it previously did.
3f) Has the network over-corrected? This triple: (5,7,7) used to be (correctly) categorised as a car. Is it still correctly categorised?
3f) The weighted sum for input triple (5,7,7) is 1*0.3 + 1.7*5 + 2.6*7 + 2.4*7 = 43.8 > 0. Hence this triple is now predicted to come from a bus, so this example is no longer correctly categorised by the network. It’s likely that we should have used a smaller learning rate.
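The over-correction can be demonstrated end to end (a sketch; the class labels and helper names are our own framing):

```python
def predict(weights, inputs):
    """Bias node outputs 1; classify as "bus" when the weighted sum
    is above the zero threshold, otherwise "car"."""
    s = sum(w * o for w, o in zip(weights, [1] + list(inputs)))
    return "bus" if s > 0 else "car"

old_weights = [0.1, -0.3, 0.2, -0.2]   # before the weight update
new_weights = [0.3, 1.7, 2.6, 2.4]     # after the weight update

print(predict(old_weights, (5, 7, 7)))   # "car" -- correct before training
print(predict(new_weights, (5, 7, 7)))   # "bus" -- wrong after training
```

With a smaller learning rate the update on (10,12,13) would have been gentler, and (5,7,7) might have stayed on the car side of the threshold.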