Neural Network To Predict Student Success
Abstract:
Some problems of online courses are related to adaptive strategies for efficient teaching. The suggested approach uses neural network models for this purpose.
It is important to understand the factors that may affect students' success in an online course and to predict failures before they happen. These predictions can help in designing dynamic teaching strategies that yield a more successful learning process for the students. The factors that affect learning can be grouped as external and internal. External factors, such as age, sex, and success in previous courses, are already known before the student applies to the course. Gathering the external-factor data of previous students of the same course and training a back-propagation network with these data lets the instructor see the effective factors and their resulting weights. This net can be used to predict a newly registered student's success and to choose a suitable teaching strategy. Internal factors appear during the learning process, such as the time spent on a particular section of the course and the number of requests to the instructor. These data can be used to train another back-propagation network, which can then determine a suitable dynamic sequence of information to be presented.
A special program could be developed for every applying student. Different strategies can be suggested according to the adaptive selection of subprograms after the initial step. Every strategy can be evaluated during the course and after it ends. This results in improved strategies to be implemented for the course in the future.
Keywords: Artificial neural networks, Back-propagation algorithm, Education, Distance education, Prediction of student success.
Introduction
Since education is an orderly and deliberate effort, some plan is needed to guide this effort. The term curriculum generally refers to this plan. Curriculum is a set of intentions about opportunities to be educated with other persons and with things (all bearers of information, processes, techniques and values) in certain arrangements of time and space (Lewis and Miel, 1989, p. 27). In
traditional classroom teaching, and in web-based distance-education environments, only one plan is used for the whole set of students. Educational aims are not achieved by the whole set because of differences between individual students. These differences can be predicted, and problems can be prevented before they happen. Our method finds the student properties that are effective in achieving educational goals. We use a neural network model for this purpose. A neural network is a system composed of many simple processing elements operating in parallel, whose function is determined by the network structure, the connection strengths, and the processing performed at computing elements or nodes. In principle, NNs can compute any computable function, i.e., they can do everything a normal digital computer can do (Siegelmann and Sontag, 1999, pp. 77-80), or perhaps even more, under some assumptions of doubtful practicality. Practical applications of NNs most often employ supervised learning. For supervised
learning, training data that include both the input and the desired result (the target value) must be provided. After successful training, input data can be presented alone to the NN (that is, without the desired result), and the NN computes an output value that approximates the desired result. For training to be successful, however, a lot of data and a lot of computer time are needed. There are many kinds of NNs; in this work a multilayer feedforward neural network model is used. Multilayer feedforward neural networks have been the preferred neural network architecture for classification and function-approximation problems, due to their interesting learning and generalization abilities. The use of artificial neural networks to approximate functions with a high degree of non-linearity is well established. It is mathematically proven in the literature (Hornik, Stinchcombe and White, 1989, pp. 183-192) that, with enough hidden neurons, a feed-forward network with a single hidden layer can approximate any arbitrarily complex nonlinear function.
Feedforward Network
Neural networks operate as a parallel computer consisting of a number of interconnected processing elements (PEs). In feedforward networks, the PEs are arranged in layers: an input layer, one or more hidden layers, and an output layer. The input from each PE in the previous layer (x_i) is multiplied by a connection weight (w_ji); these connection weights are adjustable and may be likened to the coefficients in statistical models. At each PE, the weighted input signals are summed and a threshold value (θ_j) is added. This combined input (I_j) is then passed through a nonlinear transfer function f(·) to produce the output of the PE (y_j). The output of one PE provides the input to the PEs in the next layer (Maier and Dandy, 2001, p. 671). This process is summarized and illustrated in Figure 1.
I_j = Σ_i (w_ji · x_i) + θ_j   (summation)
y_j = f(I_j)   (transfer)
Figure 1: Processing Element (inputs x_0, x_1, x_2, …, x_n are weighted by w_j0, w_j1, w_j2, …, w_jn, summed, and passed through the transfer function to produce y_j)
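The summation-and-transfer behaviour of a single PE can be sketched in Python (a minimal illustration, not the paper's implementation; the names `processing_element`, `weights`, and `theta` are our own, and the bipolar sigmoid used later in the paper is assumed as the transfer function):

```python
import math

def processing_element(inputs, weights, theta):
    """One PE: weighted sum of inputs plus threshold, passed through f(.)."""
    # I_j = sum_i(w_ji * x_i) + theta_j  (summation)
    i_j = sum(w * x for w, x in zip(weights, inputs)) + theta
    # y_j = f(I_j)  (transfer); here the bipolar sigmoid, saturating at -1 and +1
    return 2.0 / (1.0 + math.exp(-i_j)) - 1.0

# Example: a PE with two inputs
print(processing_element([0.9, 0.45], [0.1, -0.2], 0.05))
```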
We considered the grades of some courses as inputs, and a specific course whose success is to be predicted as the output, in our feedforward neural network model (Figure 2).
Figure 2: Model used in the research
This model is dynamic at the input and hidden layers. For n input data, the input layer contains 1 bias PE and n input PEs. The bias PE always receives 1 as its input; a bias term can be treated as a connection weight from a special unit with a constant activation value. Before the input data are given to the network, the input grades are translated into floating-point values as shown in Table 1.
GRADE   VALUE
A        0.900
A-       0.750
B+       0.600
B        0.450
B-       0.300
C+       0.150
C        0.001
C-      -0.150
D+      -0.300
D       -0.450
D-      -0.600
F       -0.900
Table 1: Grade scaling
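Table 1 can be expressed directly as a lookup table (a sketch; the names `GRADE_VALUE` and `scale_grades` are our own):

```python
# Grade-to-value scaling from Table 1 (bipolar range, roughly -0.9 .. 0.9)
GRADE_VALUE = {
    "A": 0.900, "A-": 0.750, "B+": 0.600, "B": 0.450, "B-": 0.300,
    "C+": 0.150, "C": 0.001, "C-": -0.150, "D+": -0.300, "D": -0.450,
    "D-": -0.600, "F": -0.900,
}

def scale_grades(grades):
    """Translate letter grades into the floating-point inputs of the network."""
    return [GRADE_VALUE[g] for g in grades]

print(scale_grades(["A", "C+", "F"]))  # [0.9, 0.15, -0.9]
```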
In many problems, input vectors and output vectors have components in the same range of values. Because one factor in the weight-correction expression is the activation of the lower unit, units whose activation is zero will not participate in the learning process. This suggests that learning may be improved if the input is represented in bipolar form and the bipolar sigmoid is used for the activation function.
[Figure 2 layout: an input layer (bias unit plus inputs x_1 … x_i) connected to a hidden layer (bias unit plus units z_j) by weights v_ij, with the hidden layer connected to the output y by weights w_j]
Each input unit X_i, i = 1, …, n, receives the input signal x_i and broadcasts this signal to all units in the layer above (the hidden units). Each hidden unit Z_j, j = 1, …, p, sums its weighted input signals
z_in_j = v_0j + Σ_i (x_i · v_ij)
where the weights v_ij are initialized to small random values between -0.5 and 0.5. The number of hidden PEs is determined at runtime. One rule of thumb is that "it should never be more than twice as large as the input layer" (Berry and Linoff, 1997, p. 323).
Each hidden unit then applies its activation function to compute its output signal,
z_j = f(z_in_j)
An activation function for a backpropagation net should have several important characteristics: it should be continuous, differentiable, and monotonically non-decreasing. Furthermore, for computational efficiency, it is desirable that its derivative be easy to compute. For the most commonly used activation functions, the value of the derivative (at a particular value of the independent variable) can be expressed in terms of the value of the function at that point. Usually, the function is also expected to saturate, i.e., approach finite maximum and minimum values asymptotically. The bipolar sigmoid used here is
f(x) = 2 / (1 + e^(-x)) - 1
whose derivative can be written as f′(x) = ½ (1 + f(x)) (1 − f(x)).
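The bipolar sigmoid and its derivative-from-function-value property can be sketched as follows (function names are our own):

```python
import math

def bipolar_sigmoid(x):
    # f(x) = 2/(1 + e^(-x)) - 1, saturating at -1 and +1
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv_from_f(fx):
    # f'(x) = 0.5 * (1 + f(x)) * (1 - f(x)):
    # cheap to compute, needing only the already-computed function value
    return 0.5 * (1.0 + fx) * (1.0 - fx)
```

The second function takes f(x), not x, which is exactly what makes backpropagation efficient: the forward-pass activations can be reused when computing the error terms.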
The output unit sums its weighted input signals,
y_in = w_0 + Σ_j (z_j · w_j)
and applies its activation function to compute its output signal,
y = f(y_in)
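The two-stage forward pass (hidden layer, then the single output unit) can be sketched as below. The weight names v, v0, w, w0 follow the paper's notation; the bipolar sigmoid is assumed as f, and the `forward` function itself is our own framing:

```python
import math

def f(x):
    """Bipolar sigmoid activation."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def forward(x, v, v0, w, w0):
    """x: inputs; v[j][i]: input->hidden weights; v0[j]: hidden biases;
    w[j]: hidden->output weights; w0: output bias."""
    # z_in_j = v_0j + sum_i(x_i * v_ij);  z_j = f(z_in_j)
    z = [f(v0[j] + sum(xi * v[j][i] for i, xi in enumerate(x)))
         for j in range(len(v))]
    # y_in = w_0 + sum_j(z_j * w_j);  y = f(y_in)
    y = f(w0 + sum(zj * wj for zj, wj in zip(z, w)))
    return z, y
```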
In this research, 90% of the student grade data are used to train the network and 10% to test it. For each training example, the grade to be examined is assigned as the target. If the difference between y and the target value is more than 0.05, the network updates its weights until the difference becomes less than 0.05. To do this update, the output PE calculates its error-information term
δ = (t − y) f′(y_in)
calculates its weight-correction term (used to update w_j later),
Δw_j = α δ z_j
and calculates its bias-correction term (used to update w_0 later),
Δw_0 = α δ
where α is the learning rate. It then sends δ to the units in the layer below.
Each hidden unit z_j sums its delta inputs (from the units in the layer above),
δ_in_j = δ · w_j
multiplied by the derivative of its activation function to calculate its error-information term,
δ_j = δ_in_j f′(z_in_j)
It further calculates its weight-correction term (used to update v_ij later),
Δv_ij = α δ_j x_i
and its bias-correction term (used to update v_0j later),
Δv_0j = α δ_j
The output unit y updates its bias and weights:
w_j(new) = w_j(old) + Δw_j
Each hidden unit z_j updates its bias and weights:
v_ij(new) = v_ij(old) + Δv_ij
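The forward pass, error terms, and weight updates above can be combined into one training step. This is a sketch under the paper's single-output architecture, not its actual code; the function name `train_step`, the learning rate `alpha`, and the initial-weight choices are our assumptions:

```python
import math
import random

def f(x):
    """Bipolar sigmoid activation."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def f_prime(fx):
    """Derivative of the bipolar sigmoid, expressed via the function value."""
    return 0.5 * (1.0 + fx) * (1.0 - fx)

def train_step(x, t, v, v0, w, w0, alpha=0.1):
    """One backpropagation step for a net with a single output unit.
    Returns updated (v, v0, w, w0) and the output y computed before the update."""
    # forward pass: z_j = f(z_in_j), y = f(y_in)
    z = [f(v0[j] + sum(xi * v[j][i] for i, xi in enumerate(x)))
         for j in range(len(v))]
    y = f(w0 + sum(zj * wj for zj, wj in zip(z, w)))
    # output error term: delta = (t - y) f'(y_in); f_prime(y) == f'(y_in)
    delta = (t - y) * f_prime(y)
    # hidden error terms: delta_j = (delta * w_j) f'(z_in_j)
    delta_h = [delta * w[j] * f_prime(z[j]) for j in range(len(v))]
    # weight and bias updates: new = old + correction term
    w_new = [w[j] + alpha * delta * z[j] for j in range(len(w))]
    w0_new = w0 + alpha * delta
    v_new = [[v[j][i] + alpha * delta_h[j] * x[i] for i in range(len(x))]
             for j in range(len(v))]
    v0_new = [v0[j] + alpha * delta_h[j] for j in range(len(v))]
    return v_new, v0_new, w_new, w0_new, y
```

Repeatedly calling `train_step` on the same example drives |t − y| down, which is the stopping condition (difference below 0.05) described above.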
Case Study
In this study we use a multilayer backpropagation neural network for predicting student grades. The code is implemented in C++Builder 5. We use an MS Access database of student grades from the Faculty of Engineering of Baskent University. The database contains 173 student records with a maximum of 37 grades each. Not all students had all these grades, so an SQL query is constructed within the application program to find the suitable students. Within this query we chose 90% of the students randomly for training and 10% for testing. Using the grades of the lessons taken in the first semester as inputs, and the grade of the target lesson taken in the second semester as the target output, the program was run 100 times. In these runs the application predicted with 0 error in 24% of cases, with ±1 error (e.g., C instead of C+) in 31%, with ±2 error in 14%, and with ±3 error in 17%. This shows that using previous grades alone can help us predict success in a new course.
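The error buckets reported above (0, ±1, ±2, ±3 grade steps) can be computed by snapping the network's continuous output back to the nearest scaled grade. This is a sketch; the function name and list layout are ours, with the scale taken from Table 1:

```python
# Grades in ascending order with their Table 1 scaled values
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]
VALUES = [-0.9, -0.6, -0.45, -0.3, -0.15, 0.001, 0.15, 0.3, 0.45, 0.6, 0.75, 0.9]

def grade_step_error(predicted_value, actual_grade):
    """Distance, in grade steps, between the snapped prediction and the truth."""
    # snap the network output to the index of the closest scaled value
    nearest = min(range(len(VALUES)), key=lambda i: abs(VALUES[i] - predicted_value))
    return abs(nearest - GRADES.index(actual_grade))

print(grade_step_error(0.12, "C+"))  # 0: 0.12 snaps to C+ (0.15)
print(grade_step_error(0.12, "B-"))  # 1: one grade step away from B-
```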
Conclusion and Future Work
In this work, we tried to estimate a student's grade before the student attends the course. In the future we plan to build a web page that uses the weights calculated by the neural network as a starting point. The page will also use its own students' data for training its own network. We plan to add a test to the web page to measure each new student's level of pre-knowledge of the course; this will also be used in training. Neural networks are an effective tool in many areas. We can predict a student's grade before the course begins and also during the course. This results in improved strategies to be implemented for the course in the future.
References
Books
1. Lewis, A. J., and Miel, A. (1989), Supervision for Improved Instruction: New Challenges, New Responses, Belmont, Calif.: Wadsworth Publishing Company.
2. Berry, M. J. A., and Linoff, G. (1997), Data Mining Techniques, NY: John Wiley & Sons.
Articles
3. Siegelmann, H. T., and Sontag, E. D. (1999), "Turing Computability with Neural Networks," Applied Mathematics Letters, 4.
4. Maier, H. R., and Dandy, G. C. (2001), "Neural network based modelling of environmental variables: A systematic approach," Mathematical and Computer Modelling, 33(6-7), p. 671.
5. Hornik, K., Stinchcombe, M., and White, H. (1989), "Multilayer feedforward networks are universal approximators," Neural Networks, 2, 183-192.
Mustafa Zafer Bolat
Research Assistant,
Department of Computer Engineering,
Başkent University, Ankara