Bayesian Knowledge Tracing and Other Predictive
Models in Educational Data Mining
Zachary A. Pardos
PSLC Summer School 2011
Outline of Talk
• Introduction to Knowledge Tracing
  – History
  – Intuition
  – Model
  – Demo
  – Variations (and other models)
  – Evaluations (Baker work / KDD)
• Random Forests
  – Description
  – Evaluations (KDD)
• Time left?
  – Vote on next topic
Intro to Knowledge Tracing
History
• Introduced in 1995 (Corbett & Anderson, UMUAI)
• Based on the ACT-R theory of skill knowledge (Anderson, 1993)
• Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)
Intuition
• Based on the idea that practice on a skill leads to mastery of that skill
• Has four parameters used to describe student performance
• Relies on a KC model
• Tracks student knowledge over time

Given a student's response sequence 1 to n, predict n+1.
For some skill K, the chronological response sequence for student Y:
0 0 0 1 1 1 ?
[0 = incorrect response, 1 = correct response]
Track knowledge over time (a model of learning):
0 0 0 1 1 1 1
Knowledge Tracing
Knowledge Tracing (KT) can be represented as a simple HMM:

[Diagram: latent knowledge nodes K linked over time by P(T), each emitting an observed question node Q via P(G)/P(S); the first K node has prior P(L0)]

Model Parameters
P(L0) = Probability of initial knowledge
P(T) = Probability of learning
P(G) = Probability of guess
P(S) = Probability of slip

Node representations
K = Knowledge node (latent)
Q = Question node (observed)

Node states
K = two state (0 or 1)
Q = two state (0 or 1)
Knowledge Tracing
Four parameters of the KT model:
P(L0) = Probability of initial knowledge
P(T) = Probability of learning
P(G) = Probability of guess
P(S) = Probability of slip
The probability of forgetting, P(F), is assumed to be zero (fixed).
Formulas for inference and prediction
• Derivation (Reye, JAIED 2004)
• The formulas use Bayes' Theorem to make inferences about the latent variable

$P(L_{n-1} \mid \mathit{Correct}_n) = \dfrac{P(L_{n-1}) \cdot (1 - P(S))}{P(L_{n-1}) \cdot (1 - P(S)) + (1 - P(L_{n-1})) \cdot P(G)}$  (1)

$P(L_{n-1} \mid \mathit{Incorrect}_n) = \dfrac{P(L_{n-1}) \cdot P(S)}{P(L_{n-1}) \cdot P(S) + (1 - P(L_{n-1})) \cdot (1 - P(G))}$  (2)

$P(L_n) = P(L_{n-1} \mid \mathit{evidence}_n) \cdot (1 - P(F)) + (1 - P(L_{n-1} \mid \mathit{evidence}_n)) \cdot P(T)$  (3)
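Equations (1)–(3) can be sketched as a single update step in Python (a minimal illustration; the variable names are mine, not from the slides):

```python
def bkt_update(p_L, correct, p_T, p_G, p_S, p_F=0.0):
    """One Bayesian Knowledge Tracing step: condition the knowledge
    estimate p_L on the observed response, then apply learning."""
    if correct:  # equation (1)
        posterior = (p_L * (1 - p_S)) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
    else:        # equation (2)
        posterior = (p_L * p_S) / (p_L * p_S + (1 - p_L) * (1 - p_G))
    # equation (3); P(F) is fixed at zero in standard KT
    return posterior * (1 - p_F) + (1 - posterior) * p_T
```

The predicted probability of a correct response at the next opportunity is then P(L)·(1−P(S)) + (1−P(L))·P(G).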
Model Training Step
• Values of the parameters P(L0), P(T), P(G) & P(S) are used to predict student responses
• Ad-hoc values could be used but will likely not be the best fitting
• Goal: find a set of values for the parameters that minimizes prediction error

[Figure: chronological 0/1 response sequences for Students A, B and C used as training data]
Model Tracing Step – Skill: Subtraction

[Diagram: the KT model unrolled over the student's last three responses (0, 1, 1) to Subtraction questions in the unit, followed by test-set questions. The latent knowledge estimate P(K) rises 10% → 45% → 75% → 79% → 83%, and the predicted probabilities of correct responses P(Q) on the test questions are 71% and 74%.]
Model Prediction: Influence of parameter values
P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09
Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1
The student reached 95% probability of knowledge after the 4th opportunity.
Influence of parameter values
Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1
With P(L0): 0.50, P(T): 0.20 but P(G): 0.64, P(S): 0.03, the student reached 95% probability of knowledge after the 8th opportunity.
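These two trajectories can be reproduced with a short sketch (my own illustration; I infer from the slide's numbers the convention that the knowledge estimate at opportunity n is P(L) after conditioning on the first n−1 responses):

```python
def mastery_opportunity(responses, p_L0, p_T, p_G, p_S, threshold=0.95):
    """Return the first opportunity at which the knowledge estimate,
    entering that opportunity, reaches the threshold (or None)."""
    p_L = p_L0
    for n, correct in enumerate(responses, start=1):
        if p_L >= threshold:
            return n
        # condition on the observed response (Bayes rule) ...
        if correct:
            post = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
        else:
            post = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
        # ... then apply the learning transition, with P(F) = 0
        p_L = post + (1 - post) * p_T
    return len(responses) + 1 if p_L >= threshold else None
```

With the slide's first parameter set this returns 4, and with the high-guess set it returns 8: a high P(G) discounts correct responses as evidence of knowledge, so mastery takes longer to reach.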
(Demo)
Variations on Knowledge Tracing
(and other models)
Prior Individualization Approach
Do all students enter a lesson with the same background knowledge?
Knowledge Tracing with Individualized P(L0)

[Diagram: the KT HMM with an added observed Student node S that determines the individualized prior P(L0|S)]

Node representations
K = Knowledge node
Q = Question node
S = Student node (observed)

Node states
K = two state (0 or 1)
Q = two state (0 or 1)
S = multi state (1 to N)
Conditional Probability Table of Student node and Individualized Prior node
CPT of Student node:

S value | P(S = value)
1 | 1/N
2 | 1/N
3 | 1/N
… | …
N | 1/N
• The CPT of the observed student node is fixed
• It is possible to have an S value for every student ID
• This raises an initialization issue (where do these prior values come from?)
• The S value can represent a cluster or type of student instead of an ID
Prior Individualization Approach
CPT of Individualized Prior node:

S value | P(L0|S)
1 | 0.05
2 | 0.30
3 | 0.95
… | …
N | 0.92
• Individualized L0 values need to be seeded
• This CPT can be fixed, or the values can be learned
• Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy

This model, which individualizes only L0, is called the Prior Per Student (PPS) model.
Prior Individualization Approach
CPT of Individualized Prior node (bootstrapped from the first response):

S value | P(L0|S)
0 | 0.05
1 | 0.30
• Bootstrapping the prior
• If a student answers incorrectly on the first question, she gets a low prior
• If a student answers correctly on the first question, she gets a higher prior
Prior Individualization Approach
What values to use for the two priors?
Prior Individualization Approach
Option 1: Use ad-hoc values

CPT of Individualized Prior node:

S value | P(L0|S)
0 | 0.10
1 | 0.85
Prior Individualization Approach
Option 2: Learn the values

CPT of Individualized Prior node:

S value | P(L0|S)
0 | learned via EM
1 | learned via EM
Prior Individualization Approach
Option 3: Link with the guess/slip CPT

CPT of Individualized Prior node:

S value | P(L0|S)
0 | Slip
1 | 1 - Guess
Prior Individualization Approach
With ASSISTments data, PPS (ad-hoc) achieved an R² of 0.301, versus 0.176 with standard KT (Pardos & Heffernan, UMAP 2010).
Variations on Knowledge Tracing
(and other models)
1. BKT-BF (Baker et al., 2010)

Learns values for the four parameters P(L0), P(T), P(G), P(S) by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error.
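A brute-force fit in the spirit of BKT-BF can be sketched as follows (a simplified illustration: I use a coarser 0.1 grid rather than the 0.01 granularity from the slide, and toy response sequences):

```python
import itertools

def bkt_predictions(responses, p_L0, p_T, p_G, p_S):
    """Predicted P(correct) before each response, updating P(L) as we go."""
    preds, p_L = [], p_L0
    for r in responses:
        preds.append(p_L * (1 - p_S) + (1 - p_L) * p_G)
        if r:
            post = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
        else:
            post = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
        p_L = post + (1 - post) * p_T
    return preds

def brute_force_fit(sequences, grid=None):
    """Grid-search all parameter combinations, keep the best squared error."""
    grid = grid or [round(0.05 + 0.1 * i, 2) for i in range(10)]  # 0.05 .. 0.95
    best, best_sse = None, float("inf")
    for p_L0, p_T, p_G, p_S in itertools.product(grid, repeat=4):
        sse = sum((p - r) ** 2
                  for seq in sequences
                  for p, r in zip(bkt_predictions(seq, p_L0, p_T, p_G, p_S), seq))
        if sse < best_sse:
            best, best_sse = (p_L0, p_T, p_G, p_S), sse
    return best, best_sse
```

At 0.01 granularity the grid has 100⁴ combinations, which is why BKT-BF is computationally heavy compared to EM.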
2. BKT-EM (Chang et al., 2006)

Learns values for the parameters with Expectation Maximization (EM), which maximizes the log likelihood fit to the data.
3. BKT-CGS (Baker, Corbett, & Aleven, 2008)

Guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor.
4. BKT-CSlip (Baker, Corbett, & Aleven, 2008)

Uses the student's averaged contextual Slip parameter, learned across all incorrect actions.
5. BKT-LessData (Nooraiei et al., 2011)

Limits each student's response sequence length to the most recent 15 responses (max) during EM training.
6. BKT-PPS (Pardos & Heffernan, 2010)

Prior per student (PPS) model, which individualizes the prior parameter. Students are assigned a prior based on their response to the first question.

[Diagram: KT with an observed Student node S determining the individualized prior P(L0|S)]
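The PPS bootstrapping idea can be sketched as follows (a minimal illustration of my own; the two seed priors are the ad-hoc values from the earlier slide, and the update mirrors the standard KT equations):

```python
def pps_prior(first_response, prior_if_wrong=0.10, prior_if_right=0.85):
    """Assign an individualized prior P(L0|S) from the first response."""
    return prior_if_right if first_response else prior_if_wrong

def trace_knowledge(responses, p_T, p_G, p_S):
    """Run KT over a response sequence, seeding the prior per student."""
    p_L = pps_prior(responses[0])
    estimates = [p_L]
    for r in responses[1:]:
        if r:  # condition on a correct response
            post = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
        else:  # condition on an incorrect response
            post = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
        p_L = post + (1 - post) * p_T  # learning transition, P(F) = 0
        estimates.append(p_L)
    return estimates
```

The only change from standard KT is the first line of `trace_knowledge`: the prior depends on the student's own first response instead of a single shared P(L0).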
7. CFAR (Yu et al., 2010)

Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted.

Student responses for Skill X: 0 1 0 1 0 1 _
The predicted next response would be 0.50.
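As a one-line sketch (illustrative, not from the slides):

```python
def cfar(responses):
    """Percent correct on the current skill so far = predicted next response."""
    return sum(responses) / len(responses)
```

For the example above, `cfar([0, 1, 0, 1, 0, 1])` returns 0.5.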
8. Tabling (Wang et al., 2011)

Uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among training-set students with the same sequence.

Training set:
Student A: 0 1 1 → 0
Student B: 0 1 1 → 1
Student C: 0 1 1 → 1

Test set student: 0 1 1 _ → the predicted next response would be 0.66.

With the max table length set to 3, the table size was 2⁰ + 2¹ + 2² + 2³ = 15.
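The table lookup can be sketched like this (my own minimal illustration; it records every prefix up to the max length and averages the response that follows it):

```python
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    """Map each response prefix (up to max_len) to the average next response."""
    sums = defaultdict(lambda: [0, 0])  # prefix -> [sum of next responses, count]
    for seq in training_sequences:
        for i in range(1, len(seq)):
            prefix = tuple(seq[max(0, i - max_len):i])
            sums[prefix][0] += seq[i]
            sums[prefix][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

def predict_next(table, history, max_len=3):
    """Look up the average next response for the student's recent history."""
    return table.get(tuple(history[-max_len:]))
```

On the slide's training set, the prefix (0, 1, 1) was followed by 0, 1 and 1, so the lookup returns 2/3 ≈ 0.66.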
9. PFA (Pavlik et al., 2009)

Performance Factors Analysis (PFA) is a logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of a student's prior failures and successes on the current skill. An overall difficulty parameter β is also fit for each skill or each item; in this study we use the variant of PFA that fits β for each skill. The PFA equation is:

$m(i, j \in \mathit{KCs}, s, f) = \sum_{j \in \mathit{KCs}} (\beta_j + \gamma_j s_{i,j} + \rho_j f_{i,j})$

with the probability of a correct response given by the logistic link $P(m) = \dfrac{1}{1 + e^{-m}}$.
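A minimal PFA predictor might look like this (the parameter values in the usage below are made-up placeholders, not fitted values from the study):

```python
import math

def pfa_predict(skill_params, successes, failures):
    """P(correct) from per-skill (beta, gamma, rho) parameters and the
    student's prior success/failure counts on each skill of the item."""
    m = sum(beta + gamma * successes[j] + rho * failures[j]
            for j, (beta, gamma, rho) in skill_params.items())
    return 1 / (1 + math.exp(-m))  # logistic link
```

For a single skill with β = 0, γ = 0.2, ρ = −0.1 and counts s = 3, f = 2, m = 0.4 and the prediction is about 0.599.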
Study: Dataset

• Cognitive Tutor for Genetics
  – 76 CMU undergraduate students
  – 9 skills (no multi-skill steps)
  – 23,706 problem solving attempts
  – 11,582 problem steps in the tutor
  – 152 average problem steps completed per student (SD = 50)
  – Pre- and post-tests were administered with this assignment
Study: Methodology – in-tutor model prediction

• Predictions were made by the 9 models using a 5-fold cross-validation by student

[Table: per-response predictions vs. actual responses, e.g. Student 1, Skill A, Responses 1…N and Skill B, Responses 1…N, each with the models' predicted probabilities of correct alongside the actual response (0/1)]
•
Accuracy was calculated with A’ for each student. Those values were then
averaged across students to report the model’s A’ (higher is better)
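A' is equivalent to the area under the ROC curve; a small rank-based sketch (my own, not the authors' implementation):

```python
def a_prime(predictions, labels):
    """Probability that a randomly chosen correct response receives a higher
    predicted score than a randomly chosen incorrect one (ties count 0.5)."""
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In the study this statistic is computed per student and then averaged, rather than pooled over all responses at once.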
Study: Results – in-tutor model prediction

Model | A'
BKT-PPS | 0.7029
BKT-BF | 0.6969
BKT-EM | 0.6957
BKT-LessData | 0.6839
PFA | 0.6629
Tabling | 0.6476
BKT-CSlip | 0.6149
CFAR | 0.5705
BKT-CGS | 0.4857

A' results averaged across students.
Study: Results (significance)

There were no significant differences within the top BKT models, but significant differences between these BKT models and PFA.
Study: Methodology – ensemble in-tutor prediction

• 5 ensemble methods were used, trained with the same 5-fold cross-validation folds
• Ensemble methods were trained using the 9 model predictions as the features and the actual response as the label

[Table: the same per-response prediction table as before; the model predictions are the features, the actual response is the label]
Study: Ensemble methods used

1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression
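Ensembling model predictions with bounded linear regression can be sketched like this (a generic stacking illustration with numpy; in the study the features were the 9 models' predictions):

```python
import numpy as np

def fit_linear_ensemble(model_preds, labels):
    """Least-squares weights (plus intercept) over the base models' predictions."""
    X = np.column_stack([np.ones(len(labels)), model_preds])
    w, *_ = np.linalg.lstsq(X, np.asarray(labels, dtype=float), rcond=None)
    return w

def ensemble_predict(w, model_preds):
    """Blend the base predictions, bounding the result between 0 and 1."""
    X = np.column_stack([np.ones(len(model_preds)), model_preds])
    return np.clip(X @ w, 0.0, 1.0)
```

The clipping step corresponds to the slide's "predictions bounded between {0,1}"; stepwise feature selection would additionally drop base models that do not improve the fit.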
Study: Results – in-tutor ensemble prediction

Model | A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip | 0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM | 0.6973
Ensemble: LinReg with feature selection (stepwise) | 0.6954
Ensemble: LinReg without feature selection | 0.6945
Ensemble: Logistic without feature selection | 0.6854

A' results averaged across students. No significant difference between the ensembles.
Study: Results – in-tutor ensemble & model prediction

Model | A'
BKT-PPS | 0.7029
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip | 0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM | 0.6973
BKT-BF | 0.6969
BKT-EM | 0.6957
Ensemble: LinReg with feature selection (stepwise) | 0.6954
Ensemble: LinReg without feature selection | 0.6945
Ensemble: Logistic without feature selection | 0.6854
BKT-LessData | 0.6839
PFA | 0.6629
Tabling | 0.6476
BKT-CSlip | 0.6149
CFAR | 0.5705
BKT-CGS | 0.4857

A' results averaged across students.
Study: Results – in-tutor ensemble & model prediction (A' calculated across all actions)

Model | A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip | 0.7451
Ensemble: LinReg without feature selection | 0.7428
Ensemble: LinReg with feature selection (stepwise) | 0.7423
Ensemble: Logistic regression without feature selection | 0.7359
Ensemble: LinReg with BKT-PPS & BKT-EM | 0.7348
BKT-EM | 0.7348
BKT-BF | 0.7330
BKT-PPS | 0.7310
PFA | 0.7277
BKT-LessData | 0.7220
CFAR | 0.6723
Tabling | 0.6712
Contextual Slip | 0.6396
BKT-CGS | 0.4917

A' results calculated across all actions.
In the KDD Cup

• Motivation for trying a non-KT approach: the Bayesian method only uses KC, opportunity count and student as features. Much information is left unutilized, so another machine learning method is required.
• Strategy: engineer additional features from the dataset and use Random Forests to train a model.
Random Forests

• Strategy: create rich feature datasets that include features created from features not included in the test set

[Diagram: the raw training dataset rows are split into non-validation training rows (nvtrain) and two validation sets (val1, val2); feature-rich versions of the validation sets (frval1, frval2) and a feature-rich test set (frtest) are constructed to mirror the raw test dataset rows]
Random Forests
Bayesian Knowledge Tracing & Other Models
PLSC Summer School 2011
Zach Pardos
• Created by Leo Breiman
• The method trains T separate decision tree classifiers (50-800)
• Each decision tree selects a random 1/P portion of the available features (1/3)
• Each tree is grown until there are at least M observations in the leaf (1-100)
• When classifying unseen data, each tree votes on the class; the popular vote wins, or the votes are averaged (for regression)
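The procedure described above can be sketched in miniature (a toy illustration of my own using depth-1 trees, i.e. stumps, instead of fully grown trees; real work would use a library implementation such as MATLAB's TreeBagger or scikit-learn's RandomForestClassifier):

```python
import random

def fit_stump(X, y, feature):
    """Best threshold/polarity split on one feature (a depth-1 'tree')."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for polarity in (0, 1):
            preds = [polarity if row[feature] <= t else 1 - polarity for row in X]
            err = sum(p != yi for p, yi in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return feature, best[1], best[2]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
        feat = rng.randrange(len(X[0]))            # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feat))
    return forest

def predict(forest, row):
    """Each tree votes on the class; the popular vote wins."""
    votes = sum(pol if row[f] <= t else 1 - pol for f, t, pol in forest)
    return int(votes > len(forest) / 2)
```

The two sources of randomness (bootstrap sampling and random feature subsets) are what decorrelate the trees so that their majority vote outperforms any single tree.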
Feature Importance

Features extracted from the training set:
• Student progress features (avg. importance: 1.67)
  – Number of data points [today, since the start of unit]
  – Number of correct responses out of the last [3, 5, 10]
  – Z-score sum for step duration, hint requests, incorrects
  – Skill-specific versions of all these features
• Percent correct features (avg. importance: 1.60)
  – % correct of unit, section, problem and step, and total for each skill and also for each student (10 features)
• Student Modeling Approach features (avg. importance: 1.32)
  – The predicted probability of correct for the test row
  – The number of data points used in training the parameters
  – The final EM log likelihood fit of the parameters / data points
• Features of the user were more important in Bridge to Algebra than in Algebra
• Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets
Algebra
Rank | Feature set | RMSE | Coverage
1 | All features | 0.2762 | 87%
2 | Percent correct+ | 0.2824 | 96%
3 | All features (fill) | 0.2847 | 97%

Bridge to Algebra
Rank | Feature set | RMSE | Coverage
1 | All features | 0.2712 | 92%
2 | All features (fill) | 0.2791 | 99%
3 | Percent correct+ | 0.2800 | 98%
• The best Bridge to Algebra RMSE on the Leaderboard was 0.2777
• The Random Forest RMSE of 0.2712 here is exceptional
• Skill data for a student was not always available for each test row
• Because of this, many skill-related feature sets had only 92% coverage
Conclusion from KDD

• Combining user features with skill features was very powerful in both the modeling and classification approaches
• Model tracing based predictions performed formidably against pure machine learning techniques
• Random Forests also performed very well on this educational dataset compared to other approaches such as Neural Networks and SVMs. This method could significantly boost accuracy in other EDM datasets.
Hardware/Software

• Software
  – MATLAB used for all analysis
    • Bayes Net Toolbox for Bayesian network models
    • Statistics Toolbox for the Random Forests classifier
  – Perl used for pre-processing
• Hardware
  – Two Rocks clusters used for skill model training
    • 178 CPUs in total; training of KT models took ~48 hours when utilizing all CPUs
  – Two 32 GB RAM systems for Random Forests
    • RF models took ~16 hours to train with 800 trees
Choose the next topic

• KT: 1-35
• Prediction: 36-67
• Evaluation: 47-77
• Sig tests: 69-77
• Regression/sig tests: 80-112
Time left?
Individualize Everything?
Fully Individualized Model
Student-Skill Interaction (SSI) Model

Model Parameters
P(L0) = Probability of initial knowledge
P(L0|Q1) = Individual cold-start P(L0)
P(T) = Probability of learning
P(T|S) = Students' individual P(T)
P(G) = Probability of guess
P(G|S) = Students' individual P(G)
P(S) = Probability of slip
P(S|S) = Students' individual P(S)
The individualized parameters are learned from data while the others are fixed.

[Diagram: the KT chain with an observed Student node S feeding a first-response node Q1 (for P(L0|Q1)) and latent T (learning), G (guessing) and S (slipping) nodes with per-student CPTs P(T|S), P(G|S), P(S|S)]

Node representations
K = Knowledge node
Q = Question node
S = Student node
Q1 = First response node
T = Learning node
G = Guessing node
S = Slipping node

Node states
K, Q, Q1, T, G, S = two state (0 or 1)
S (student) = multi state (1 to N), where N is the number of students in the training data
(Pardos & Heffernan, JMLR 2011)
In the SSI model, S identifies the student.
The T node contains the CPT lookup table of individual student learn rates.
P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner].
SSI model results

[Chart: RMSE of PPS vs. SSI on the Algebra and Bridge to Algebra datasets (y-axis from 0.279 to 0.286)]

Dataset | New RMSE | Prev RMSE | Improvement
Algebra | 0.2813 | 0.2835 | 0.0022
Bridge to Algebra | 0.2824 | 0.2860 | 0.0036

The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place. The differences between PPS and SSI are significant in each dataset at the p < 0.01 level (t-test of squared errors).
(Pardos & Heffernan, JMLR 2011)