# PSLC Summer School 2011


Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining

Zachary A. Pardos


Outline of Talk

- Introduction to Knowledge Tracing
  - History
  - Intuition
  - Model
  - Demo
  - Variations (and other models)
  - Evaluations (Baker work / KDD)
- Random Forests
  - Description
  - Evaluations (KDD)
- Time left?
  - Vote on next topic

Intro to Knowledge Tracing


History

- Introduced in 1995 (Corbett & Anderson, UMUAI)
- Based on the ACT-R theory of skill knowledge (Anderson, 1993)
- Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)

Intuition

- Based on the idea that practice on a skill leads to mastery of that skill
- Has four parameters used to describe student performance
- Relies on a KC model
- Tracks student knowledge over time
- Given a student's response sequence 1 to n, predict n+1

For some skill K, the chronological response sequence for student Y (0 = incorrect response, 1 = correct response):

0 0 0 1 1 1 ?
(responses 1 … n, predict n+1)

Track knowledge over time (a model of learning) from the response sequence 0 0 0 1 1 1 1.

Knowledge Tracing

Knowledge Tracing (KT) can be represented as a simple HMM: a chain of knowledge nodes K (latent), each emitting a question node Q (observed), with transitions P(T) between knowledge nodes and prior P(L0) on the first.

Node representations:
- K = knowledge node (latent)
- Q = question node (observed)

Node states:
- K = two state (0 or 1)
- Q = two state (0 or 1)

UMAP 2011

Four parameters of the KT model:

- P(L0) = probability of initial knowledge
- P(T) = probability of learning
- P(G) = probability of guess
- P(S) = probability of slip

The probability of forgetting is assumed to be zero (fixed).

Formulas for inference and prediction

Derivation (Reye, JAIED 2004). The formulas use Bayes' theorem to update the knowledge estimate from each observed response.

If correct at opportunity n:

$$P(L_{n-1} \mid \text{correct}_n) = \frac{P(L_{n-1})\,(1 - P(S))}{P(L_{n-1})\,(1 - P(S)) + (1 - P(L_{n-1}))\,P(G)} \quad (1)$$

If incorrect at opportunity n:

$$P(L_{n-1} \mid \text{incorrect}_n) = \frac{P(L_{n-1})\,P(S)}{P(L_{n-1})\,P(S) + (1 - P(L_{n-1}))\,(1 - P(G))} \quad (2)$$

Updated knowledge, applying the learning transition (forgetting P(F) is fixed at zero):

$$P(L_n) = P(L_{n-1} \mid \text{evidence}_n)\,(1 - P(F)) + (1 - P(L_{n-1} \mid \text{evidence}_n))\,P(T) \quad (3)$$
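Equations (1)-(3) map directly onto code. A minimal sketch (the function names are mine, not from the talk):

```python
def bkt_posterior(p_l, correct, p_g, p_s):
    """Eqs. (1)-(2): posterior P(L_{n-1} | evidence) after one observed response."""
    if correct:
        num, den = p_l * (1 - p_s), p_l * (1 - p_s) + (1 - p_l) * p_g
    else:
        num, den = p_l * p_s, p_l * p_s + (1 - p_l) * (1 - p_g)
    return num / den

def bkt_update(p_l, correct, p_g, p_s, p_t, p_f=0.0):
    """Eq. (3): apply the learning transition; forgetting P(F) is fixed at 0."""
    post = bkt_posterior(p_l, correct, p_g, p_s)
    return post * (1 - p_f) + (1 - post) * p_t

def bkt_predict(p_l, p_g, p_s):
    """Probability the next response is correct given the knowledge estimate."""
    return p_l * (1 - p_s) + (1 - p_l) * p_g
```

Calling `bkt_update` once per observed response, starting from P(L0), traces the knowledge estimate forward in time.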

Model Training Step

- Values of the parameters P(T), P(G), P(S) & P(L0) are used to predict student responses
- Ad-hoc values could be used but will likely not be the best fitting
- Goal: find a set of values for the parameters that minimizes prediction error

[Figure: example response sequences for Students A, B, and C used to fit the parameters]
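The "minimize prediction error" objective can be sketched as the summed squared error of the one-step predictions over a response sequence (a sketch only; the actual fitting procedures appear later in the talk):

```python
def sequence_sse(params, responses):
    """Squared prediction error of one BKT parameter set on one sequence."""
    p_l0, p_t, p_g, p_s = params
    p_l, sse = p_l0, 0.0
    for r in responses:
        pred = p_l * (1 - p_s) + (1 - p_l) * p_g   # predict before observing
        sse += (pred - r) ** 2
        # Bayesian posterior given the response, then the learning transition
        if r:
            post = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            post = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = post + (1 - post) * p_t
    return sse
```

Summing `sequence_sse` over all students gives one number to minimize when searching for parameter values.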

Model Tracing Step

Skill: Subtraction

The student's last three responses to Subtraction questions (in the unit) are 0, 1, 1; the questions that follow are test set questions.

- Latent: knowledge, P(K)
- Observable: responses, P(Q)

[Figure: model tracing. P(K) rises across opportunities (10% → 45% → 75% → 79% → 83%); the predicted P(Q) values for the test set questions are 71% and 74%]

Model Prediction: Influence of Parameter Values

Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1

- P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09
- The student reached 95% probability of knowledge after the 4th opportunity

With the same response sequence but P(L0): 0.50, P(T): 0.20, P(G): 0.64, P(S): 0.03:

- The student reached 95% probability of knowledge after the 8th opportunity
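Replaying the update equations over this sequence reproduces both mastery points (a quick sketch; the helper names are mine):

```python
def knowledge_trajectory(p_l0, p_t, p_g, p_s, responses):
    """P(L) entering each opportunity: index 0 is P(L0)."""
    traj, p_l = [p_l0], p_l0
    for r in responses:
        if r:
            post = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            post = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = post + (1 - post) * p_t
        traj.append(p_l)
    return traj

def first_mastery_opportunity(traj, threshold=0.95):
    """1-based opportunity whose incoming knowledge estimate first hits threshold."""
    for i, p in enumerate(traj):
        if p >= threshold:
            return i + 1
    return None

seq = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
first_mastery_opportunity(knowledge_trajectory(0.5, 0.2, 0.14, 0.09, seq))  # 4
first_mastery_opportunity(knowledge_trajectory(0.5, 0.2, 0.64, 0.03, seq))  # 8
```

The guess-heavy parameter set discounts correct responses as evidence, so mastery is reached four opportunities later.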


( Demo )


Variations on Knowledge Tracing

(and other models)

Prior Individualization Approach

Do all students enter a lesson with the same background knowledge?

Knowledge Tracing with Individualized P(L0): add an observed student node S with an arc to the first knowledge node, so the prior becomes P(L0|S).

Node representations:
- K = knowledge node
- Q = question node
- S = student node (observed)

Node states:
- K = two state (0 or 1)
- Q = two state (0 or 1)
- S = multi state (1 to N)

Conditional Probability Table of the Student node

| S value | P(S = value) |
|---|---|
| 1 | 1/N |
| 2 | 1/N |
| 3 | 1/N |
| … | … |
| N | 1/N |

- The CPT of the observed student node is fixed
- Possible to have an S value for every student ID
- Raises an initialization issue (where do these prior values come from?)
- An S value can also represent a cluster or type of student

Prior Individualization Approach

CPT of the Individualized Prior node:

| S value | P(L0\|S) |
|---|---|
| 1 | 0.05 |
| 2 | 0.30 |
| 3 | 0.95 |
| … | … |
| N | 0.92 |

- Individualized L0 values need to be seeded
- This CPT can be fixed or the values can be learned
- Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy
- This model, which individualizes only L0, is the Prior Per Student (PPS) model

Prior Individualization Approach

Bootstrapping the prior from the first response:

| S value | P(L0\|S) |
|---|---|
| 0 | 0.05 |
| 1 | 0.30 |

- If the student answers the first question incorrectly, she gets a low prior
- If the student answers the first question correctly, she gets a higher prior
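Seeding the prior from the first response is a one-liner (sketch; the 0.05 / 0.30 values are the example CPT from the slide):

```python
def pps_prior(first_response_correct, low=0.05, high=0.30):
    """Seed an individualized P(L0) from the student's first response.
    The default low/high values are the slide's example CPT entries."""
    return high if first_response_correct else low
```

The returned value is then used as P(L0) when tracing that student's knowledge on the skill.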

Prior Individualization Approach

What values to use for the two priors?

Prior Individualization Approach

What values to use for the two priors?

1. Ad-hoc values

| S value | P(L0\|S) |
|---|---|
| 0 | 0.10 |
| 1 | 0.85 |

Prior Individualization Approach

What values to use for the two priors?

1. Ad-hoc values
2. Learn the values

| S value | P(L0\|S) |
|---|---|
| 0 | EM |
| 1 | EM |

Prior Individualization Approach

What values to use for the two priors?

1. Ad-hoc values
2. Learn the values
3. Guess/slip CPT (tie the priors to the guess and slip parameters)

| S value | P(L0\|S) |
|---|---|
| 0 | Slip |
| 1 | 1 − Guess |

Prior Individualization Approach

This individualized-prior model (seeded ad hoc) achieved an R² of 0.301, versus 0.176 with standard KT (Pardos & Heffernan, UMAP 2010).


Variations on Knowledge Tracing

(and other models)

1. BKT-BF (Baker et al., 2010)

Learns values for the four parameters, P(L0), P(T), P(G), P(S), by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error.
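The brute-force fit can be sketched as an exhaustive grid search, coarser than the slide's 0.01 granularity to keep the demo fast (helper names are mine):

```python
import itertools

def bkt_sse(params, sequences):
    """Summed squared one-step prediction error of one BKT parameter set."""
    p_l0, p_t, p_g, p_s = params
    sse = 0.0
    for seq in sequences:
        p_l = p_l0
        for r in seq:
            pred = p_l * (1 - p_s) + (1 - p_l) * p_g
            sse += (pred - r) ** 2
            post = (p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g) if r
                    else p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g)))
            p_l = post + (1 - post) * p_t
    return sse

def brute_force_fit(sequences, step=0.1):
    """Grid-search P(L0), P(T), P(G), P(S); keep the best squared error."""
    grid = [round(step * i, 4) for i in range(1, round(1 / step))]
    return min(itertools.product(grid, repeat=4),
               key=lambda p: bkt_sse(p, sequences))
```

At the slide's 0.01 granularity the grid has 99⁴ ≈ 96 million parameter combinations per skill, which is why coarser steps or smarter search are common in practice.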


2. BKT-EM (Chang et al., 2006)

Learns values for the parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data.

3. BKT-CGS (Baker, Corbett, & Aleven, 2008)

Guess and slip parameters are assessed contextually, using a regression on features generated from student performance in the tutor.

4. BKT-CSlip (Baker, Corbett, & Aleven, 2008)

Uses the student's averaged contextual Slip parameter, learned across all incorrect actions.

5. BKT-LessData (Nooraiei et al., 2011)

Limits each student's response sequence length to the most recent 15 responses during EM training.

6. BKT-PPS (Pardos & Heffernan, 2010)

Prior per student (PPS) model, which individualizes the prior parameter. Students are assigned a prior based on their response to the first question.

7. CFAR

Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill, up until the question being predicted.

Student responses for Skill X: 0 1 0 1 0 1 _

Predicted next response would be 0.50
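CFAR reduces to a running average (a minimal sketch):

```python
def cfar_predict(responses):
    """Percent correct on the current skill over the responses seen so far."""
    return sum(responses) / len(responses)

cfar_predict([0, 1, 0, 1, 0, 1])  # 0.5, matching the slide's example
```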

(Yu et al., 2010)


8. Tabling

Uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among training-set students with the same sequence.

Training set:
- Student A: 0 1 1 → 0
- Student B: 0 1 1 → 1
- Student C: 0 1 1 → 1

Predicted next response would be 0.66

Test set student: 0 0 1 _

With the max table length set to 3, the table size is 2⁰ + 2¹ + 2² + 2³ = 15.

(Wang et al., 2011)
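The table build and lookup can be sketched as follows (the back-off to shorter contexts for unseen sequences is my assumption, not stated on the slide):

```python
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    """Average next response for every observed prefix of length 0..max_len."""
    table = defaultdict(list)
    for seq in training_sequences:
        for i, r in enumerate(seq):
            for k in range(0, max_len + 1):
                if i - k >= 0:
                    table[tuple(seq[i - k:i])].append(r)
    return {prefix: sum(v) / len(v) for prefix, v in table.items()}

def tabling_predict(table, recent, max_len=3):
    key = tuple(recent[-max_len:])
    while key not in table and key:
        key = key[1:]          # back off to a shorter context
    return table.get(key, 0.5)
```

With the slide's training set, the prefix (0, 1, 1) maps to the average of the three next responses, 2/3 ≈ 0.66.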


9. PFA

Performance Factors Analysis (PFA). A logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill. An overall difficulty parameter β is also fit for each skill or each item; this study uses the variant of PFA that fits β for each skill. The PFA equation is:

$$m(i, j, s, f) = \sum_{j \in \mathrm{KCs}} \left( \beta_j + \gamma_j\, s_{i,j} + \rho_j\, f_{i,j} \right), \qquad P(m) = \frac{1}{1 + e^{-m}}$$

where $s_{i,j}$ and $f_{i,j}$ are student $i$'s prior counts of successes and failures on skill $j$.

(Pavlik et al., 2009)
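For a single skill, the PFA prediction reduces to one logistic term (sketch; the β, γ, ρ values used below are made up for illustration):

```python
import math

def pfa_predict(successes, failures, beta, gamma, rho):
    """P(correct) for one skill: m = beta + gamma*s + rho*f, logistic link."""
    m = beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))
```

With γ > 0 and ρ < 0, prior successes raise the predicted probability of a correct response and prior failures lower it.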


Study: Dataset

Cognitive Tutor for Genetics

- 9 skills (no multi-skill steps)
- 23,706 problem solving attempts
- 11,582 problem steps in the tutor
- 152 average problem steps completed per student (SD = 50)
- Pre- and post-assignment

Study: Methodology (in-tutor model prediction)

Predictions were made by the 9 models using 5-fold cross-validation by student.

Example prediction layout (two of the nine model predictions shown per row, plus the actual response):

| Student | Skill | Response | Model predictions (…) | Actual |
|---|---|---|---|---|
| Student 1 | Skill A | Resp 1 | 0.10, 0.22 | 0 |
| Student 1 | Skill A | Resp 2 | 0.51, 0.26 | 1 |
| Student 1 | Skill A | Resp N | 0.77, 0.40 | 1 |
| Student 1 | Skill B | Resp 1 | 0.55, 0.60 | 1 |
| Student 1 | Skill B | Resp N | 0.41, 0.61 | 0 |

Accuracy was calculated with A' for each student. Those values were then averaged across students to report the model's A' (higher is better).
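The per-student A' and its across-student average can be sketched as follows (assuming A' is computed as the pairwise ranking probability, equivalent to the area under the ROC curve):

```python
def a_prime(predictions, actuals):
    """A': probability a randomly chosen correct response gets a higher
    prediction than a randomly chosen incorrect one; ties count as half."""
    pos = [p for p, a in zip(predictions, actuals) if a == 1]
    neg = [p for p, a in zip(predictions, actuals) if a == 0]
    if not pos or not neg:
        return None  # undefined when a student has only one response class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_a_prime(per_student):
    """Average per-student A', skipping students for whom A' is undefined."""
    scores = [a_prime(p, a) for p, a in per_student]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)
```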


Study: Results (in-tutor model prediction)

A' results averaged across students:

| Model | A' |
|---|---|
| BKT-PPS | 0.7029 |
| BKT-BF | 0.6969 |
| BKT-EM | 0.6957 |
| BKT-LessData | 0.6839 |
| PFA | 0.6629 |
| Tabling | 0.6476 |
| BKT-CSlip | 0.6149 |
| CFAR | 0.5705 |
| BKT-CGS | 0.4857 |

Study: Results (in-tutor model prediction), continued

- No significant differences within the top group of BKT models
- Significant differences between those BKT models and PFA

Study: Methodology (ensemble in-tutor prediction)

Five ensemble methods were used, trained with the same 5-fold cross-validation folds. The ensemble methods were trained using the 9 model predictions as the features and the actual response as the label.

Study: Methodology (ensemble in-tutor prediction)

Ensemble methods used:

1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression
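A minimal stdlib-only sketch of ensemble method 1, a least-squares blend of the model predictions with the output bounded to [0, 1] (fit here by gradient descent rather than the closed form; the function names are mine):

```python
def fit_linear_ensemble(features, labels, lr=0.1, epochs=2000):
    """Least-squares linear blend of per-model predictions."""
    n_feat = len(features[0])
    w, b = [0.0] * n_feat, 0.0
    m = len(labels)
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            err = (b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(n_feat):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * gj / m for wi, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b

def ensemble_predict(w, b, x):
    """Blend prediction, bounded between 0 and 1 as on the slide."""
    return min(1.0, max(0.0, b + sum(wi * xi for wi, xi in zip(w, x))))
```

Each feature vector here would hold the nine model predictions for one response, and the label is the actual response.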


Study: Results (in-tutor ensemble prediction)

A' results averaged across students:

| Model | A' |
|---|---|
| Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip | 0.7028 |
| Ensemble: LinReg with BKT-PPS & BKT-EM | 0.6973 |
| Ensemble: LinReg with feature selection (stepwise) | 0.6954 |
| Ensemble: LinReg without feature selection | 0.6945 |
| Ensemble: Logistic without feature selection | 0.6854 |

No significant difference between ensembles.

Study: Results (in-tutor ensemble & model prediction)

A' results averaged across students:

| Model | A' |
|---|---|
| BKT-PPS | 0.7029 |
| Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip | 0.7028 |
| Ensemble: LinReg with BKT-PPS & BKT-EM | 0.6973 |
| BKT-BF | 0.6969 |
| BKT-EM | 0.6957 |
| Ensemble: LinReg with feature selection (stepwise) | 0.6954 |
| Ensemble: LinReg without feature selection | 0.6945 |
| Ensemble: Logistic without feature selection | 0.6854 |
| BKT-LessData | 0.6839 |
| PFA | 0.6629 |
| Tabling | 0.6476 |
| BKT-CSlip | 0.6149 |
| CFAR | 0.5705 |
| BKT-CGS | 0.4857 |

Study: Results (in-tutor ensemble & model prediction)

A' results calculated across all actions:

| Model | A' |
|---|---|
| Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip | 0.7451 |
| Ensemble: LinReg without feature selection | 0.7428 |
| Ensemble: LinReg with feature selection (stepwise) | 0.7423 |
| Ensemble: Logistic regression without feature selection | 0.7359 |
| Ensemble: LinReg with BKT-PPS & BKT-EM | 0.7348 |
| BKT-EM | 0.7348 |
| BKT-BF | 0.7330 |
| BKT-PPS | 0.7310 |
| PFA | 0.7277 |
| BKT-LessData | 0.7220 |
| CFAR | 0.6723 |
| Tabling | 0.6712 |
| Contextual Slip | 0.6396 |
| BKT-CGS | 0.4917 |

In the KDD Cup

Motivation for trying a non-KT approach: the Bayesian method only uses KC, opportunity count, and student as features, so much information is left unutilized. Another machine learning method is required.

Strategy: engineer additional features from the dataset and use Random Forests to train a model.

Random Forests


Strategy: create rich feature datasets that include features created from features not included in the test set.

[Figure: dataset split. The raw training rows are divided into non-validation training rows (nvtrain) and validation sets 1 and 2 (val1, val2); feature-rich versions of the validation sets (frval1, frval2) and of the raw test rows (frtest) are then constructed.]

Random Forests were created by Leo Breiman. The method:

- Trains T separate decision tree classifiers (50-800)
- Has each decision tree select a random 1/P portion of the available features (1/3)
- Grows each tree until there are at least M observations in the leaf (1-100)
- When classifying unseen data, lets each tree vote on the class; the majority vote wins, or the tree outputs are averaged for a probability
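The procedure can be sketched with depth-1 trees (stumps) standing in for full decision trees. This is an illustrative toy, not the study's implementation (which used MATLAB's Statistics Toolbox):

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0] if labels else 0

def train_stump(X, y, feats):
    """Best single-feature threshold split by misclassification count."""
    best, best_err = None, float("inf")
    for f in feats:
        for t in {row[f] for row in X}:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            ll, rl = majority(left), majority(right)
            err = sum(yi != ll for yi in left) + sum(yi != rl for yi in right)
            if err < best_err:
                best, best_err = (f, t, ll, rl), err
    return best

def train_forest(X, y, n_trees=25, feat_frac=1 / 3, seed=0):
    rng = random.Random(seed)
    n_feat = len(X[0])
    k = max(1, int(n_feat * feat_frac))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
        feats = rng.sample(range(n_feat), k)       # random feature subset
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    votes = [(ll if row[f] <= t else rl) for f, t, ll, rl in forest]
    return majority(votes)
```

Each tree sees a bootstrap resample of the rows and a random subset of the features, and the forest classifies by majority vote, mirroring the bullets above.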


Feature Importance

Features extracted from the training set:

Student progress features (avg. importance: 1.67)
- Number of data points [today, since the start of unit]
- Number of correct responses out of the last [3, 5, 10]
- Z-score sums for step duration, hint requests, incorrects
- Skill-specific versions of all these features

Percent correct features (avg. importance: 1.60)
- Percent correct of unit, section, problem, and step, and total for each skill and also for each student (10 features)

Student Modeling Approach features (avg. importance: 1.32)
- The predicted probability of correct for the test row
- The number of data points used in training the parameters
- The final EM log likelihood fit of the parameters / data points
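The percent-correct features can be sketched as running tallies (a hypothetical minimal version; counts are updated only after the feature is emitted, so a response never leaks into its own feature):

```python
def percent_correct_features(rows):
    """Running percent-correct per student and per skill from earlier rows."""
    counts = {}
    feats = []
    for student, skill, correct in rows:
        row_feats = []
        for key in (("student", student), ("skill", skill)):
            n, c = counts.get(key, (0, 0))
            row_feats.append(c / n if n else 0.5)  # 0.5 default before any data
            counts[key] = (n + 1, c + correct)
        feats.append(row_feats)
    return feats
```

The same pattern extends to units, sections, problems, and steps by adding more keys per row.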


- Features of the user were more important in Bridge to Algebra than in Algebra
- Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets

Algebra:

| Rank | Feature set | RMSE | Coverage |
|---|---|---|---|
| 1 | All features | 0.2762 | 87% |
| 2 | Percent correct+ | 0.2824 | 96% |
| 3 | All features (fill) | 0.2847 | 97% |

Bridge to Algebra:

| Rank | Feature set | RMSE | Coverage |
|---|---|---|---|
| 1 | All features | 0.2712 | 92% |
| 2 | All features (fill) | 0.2791 | 99% |
| 3 | Percent correct+ | 0.2800 | 98% |

The best Bridge to Algebra RMSE on the leaderboard was 0.2777, so the Random Forest RMSE of 0.2712 here is exceptional.

Skill data for a student was not always available for each test row; because of this, many skill-related feature sets only had 92% coverage.

Conclusions from KDD

- Combining user features with skill features was very powerful in both the modeling and classification approaches
- Model tracing based predictions performed formidably against pure machine learning techniques
- Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs; this method could significantly boost accuracy in other EDM datasets

Hardware/Software

Software:
- MATLAB used for all analysis
- Bayes Net Toolbox for the Bayesian network models
- Statistics Toolbox for the Random Forests classifier
- Perl used for pre-processing

Hardware:
- Two Rocks clusters used for skill model training: 178 CPUs in total; training the KT models took ~48 hours when utilizing all CPUs
- Two 32 GB RAM systems for Random Forests; RF models took ~16 hours to train with 800 trees

Choose the next topic

- KT: slides 1-35
- Prediction: slides 36-67
- Evaluation: slides 47-77
- Significance tests: slides 69-77
- Regression/significance tests: slides 80-112
- Time left?


Individualize Everything? The Fully Individualized Model

Student-Skill Interaction (SSI) model. Model parameters:
- P(L0) = probability of initial knowledge
- P(L0|Q1) = individual cold-start P(L0)
- P(T) = probability of learning
- P(T|S) = students' individual P(T)
- P(G) = probability of guess
- P(G|S) = students' individual P(G)
- P(S) = probability of slip
- P(S|S) = students' individual P(S)

Node representations:
- K = knowledge node
- Q = question node
- S = student node
- Q1 = first response node
- T = learning node
- G = guessing node
- S = slipping node

Node states:
- K, Q, Q1, T, G, S = two state (0 or 1)
- S (student node) = multi state (1 to N), where N is the number of students in the training data

The individualized parameters (P(L0|Q1), P(T|S), P(G|S), P(S|S)) are learned from data while the others are fixed.

(Pardos & Heffernan, JMLR 2011)

In the SSI model, the observed S node identifies the student.


The T node contains the CPT lookup table of individual student learn rates.

P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner].

SSI model results

[Figure: RMSE bar chart (range ≈ 0.279-0.286) comparing PPS and SSI on Algebra and Bridge to Algebra]

| Dataset | New RMSE (SSI) | Prev RMSE (PPS) | Improvement |
|---|---|---|---|
| Algebra | 0.2813 | 0.2835 | 0.0022 |
| Bridge to Algebra | 0.2824 | 0.2860 | 0.0036 |

The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place. The difference between PPS and SSI is significant in each dataset at the P < 0.01 level (t-test of squared errors).


(Pardos & Heffernan, JMLR 2011)