PSLC Summer School 2011


Bayesian Knowledge Tracing and Other Predictive
Models in Educational Data Mining

Zachary A. Pardos


Bayesian Knowledge Tracing & Other Models
PSLC Summer School 2011
Zach Pardos

Outline of Talk

- Introduction to Knowledge Tracing
  - History
  - Intuition
  - Model
  - Demo
  - Variations (and other models)
  - Evaluations (Baker work / KDD)
- Random Forests
  - Description
  - Evaluations (KDD)
- Time left?
  - Vote on next topic

Intro to Knowledge Tracing


History

- Introduced in 1995 (Corbett & Anderson, UMUAI)
- Based on the ACT-R theory of skill knowledge (Anderson 1993)
- Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)


Intuition

- Based on the idea that practice on a skill leads to mastery of that skill
- Has four parameters used to describe student performance
- Relies on a KC model
- Tracks student knowledge over time

Given a student's response sequence 1 to n, predict n+1.

For some skill K, the chronological response sequence for student Y
[0 = incorrect response, 1 = correct response]:

0 0 0 1 1 1 ?
1 ... n n+1


0 0 0 1 1 1 1

Track knowledge over time (a model of learning)


Knowledge Tracing

Knowledge Tracing (KT) can be represented as a simple HMM.

[Diagram: a chain of latent knowledge nodes K linked by P(T), each emitting an observed question node Q via P(G) and P(S); P(L0) is the prior on the first K]

Node representations
- K = Knowledge node (latent)
- Q = Question node (observed)

Node states
- K = Two state (0 or 1)
- Q = Two state (0 or 1)


Knowledge Tracing

Four parameters of the KT model:
- P(L0) = Probability of initial knowledge
- P(T) = Probability of learning
- P(G) = Probability of guess
- P(S) = Probability of slip

The probability of forgetting is assumed to be zero (fixed).


Formulas for inference and prediction

Derivation (Reye, JAIED 2004). The formulas use Bayes' Theorem to make inferences about the latent variable.

If correct at opportunity n:

P(L_{n-1} | correct_n) = [ P(L_{n-1}) (1 - P(S)) ] / [ P(L_{n-1}) (1 - P(S)) + (1 - P(L_{n-1})) P(G) ]   (1)

If incorrect at opportunity n:

P(L_{n-1} | incorrect_n) = [ P(L_{n-1}) P(S) ] / [ P(L_{n-1}) P(S) + (1 - P(L_{n-1})) (1 - P(G)) ]   (2)

P(L_n) = P(L_{n-1} | evidence_n) (1 - P(F)) + (1 - P(L_{n-1} | evidence_n)) P(T)   (3)
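The update equations (1)-(3) can be sketched in a few lines of code. This is a minimal Python illustration, not the talk's MATLAB/Bayes Net Toolbox implementation; the parameter values in the example are the ones used on the slides that follow.

```python
# Sketch of KT update equations (1)-(3); a minimal illustration only.

def bkt_update(p_L, correct, p_T, p_G, p_S, p_F=0.0):
    """One step of Bayesian Knowledge Tracing.

    p_L:     P(L_{n-1}), current probability the skill is known.
    correct: 1 if the nth response was correct, else 0.
    p_F:     forget probability, fixed to zero in classic KT.
    """
    if correct:  # Equation (1): condition on a correct response
        evidence = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
    else:        # Equation (2): condition on an incorrect response
        evidence = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
    # Equation (3): account for learning (and, in general, forgetting)
    return evidence * (1 - p_F) + (1 - evidence) * p_T

def predict_correct(p_L, p_G, p_S):
    """P(correct_n) given the current knowledge estimate."""
    return p_L * (1 - p_S) + (1 - p_L) * p_G

# Trace the example parameters from the later slides over a short sequence:
p_L = 0.50
for resp in [0, 1, 1, 1]:
    p_L = bkt_update(p_L, resp, p_T=0.20, p_G=0.14, p_S=0.09)
```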


Model Training Step

- Values of the parameters P(T), P(G), P(S) & P(L0) are used to predict student responses
- Ad-hoc values could be used but will likely not be the best fitting
- Goal: find a set of values for the parameters that minimizes prediction error

[Figure: example chronological response sequences for Students A, B, and C]


Model Training: Model Tracing Step

Skill: Subtraction

[Figure: the KT model unrolled over the student's last three responses to Subtraction questions in the unit (0, 1, 1), followed by the test set questions. Latent knowledge estimates P(K) rise 10% -> 45% -> 75% -> 79% -> 83%; predicted response probabilities P(Q) for the test questions are 71% and 74%]


Model Prediction: Influence of parameter values

Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1

P(L0): 0.50  P(T): 0.20  P(G): 0.14  P(S): 0.09

The student reached 95% probability of knowledge after the 4th opportunity.


Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1

P(L0): 0.50  P(T): 0.20  P(G): 0.14  P(S): 0.09
P(L0): 0.50  P(T): 0.20  P(G): 0.64  P(S): 0.03

With the second set of parameters, the student reached 95% probability of knowledge after the 8th opportunity.




( Demo )


Variations on Knowledge Tracing (and other models)

Prior Individualization Approach

Do all students enter a lesson with the same background knowledge?

Knowledge Tracing with Individualized P(L0)

[Diagram: the KT model with an added observed student node S that conditions the initial knowledge node via P(L0|S)]

Node representations
- K = Knowledge node
- Q = Question node
- S = Student node (observed)

Node states
- K = Two state (0 or 1)
- Q = Two state (0 or 1)
- S = Multi state (1 to N)

Conditional Probability Table of the Student node and the Individualized Prior node

CPT of Student node:

S value | P(S = value)
1       | 1/N
2       | 1/N
3       | 1/N
...     | ...
N       | 1/N

- The CPT of the observed student node is fixed
- It is possible to have an S value for every student ID
- This raises an initialization issue (where do these prior values come from?)
- The S value can represent a cluster or type of student instead of an ID

CPT of the Individualized Prior node:

S value | P(L0|S)
1       | 0.05
2       | 0.30
3       | 0.95
...     | ...
N       | 0.92

- Individualized L0 values need to be seeded
- This CPT can be fixed, or the values can be learned
- Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy

This model, which individualizes only L0, is the Prior Per Student (PPS) model.

Bootstrapping the prior:

S value | P(L0|S)
0       | 0.05
1       | 0.30

- If a student answers incorrectly on the first question, she gets a low prior
- If a student answers correctly on the first question, she gets a higher prior

What values to use for the two priors?

1. Use ad-hoc values

S value | P(L0|S)
0       | 0.10
1       | 0.85

2. Learn the values

S value | P(L0|S)
0       | learned with EM
1       | learned with EM

3. Link with the guess/slip CPT

S value | P(L0|S)
0       | Slip
1       | 1 - Guess

With ASSISTments, PPS (ad-hoc) achieved an R^2 of 0.301 (0.176 with KT) (Pardos & Heffernan, UMAP 2010).
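The prior-seeding options above fit in a couple of lines. This is a sketch of the PPS idea only; the function name and the guess/slip values are illustrative, not fitted parameters from the study.

```python
# Sketch of the Prior Per Student (PPS) idea: the individualized prior
# P(L0|S) is seeded from the student's first response, here using
# option 3 from the slides (link with the guess/slip CPT).

def pps_prior(first_response_correct, p_G, p_S):
    """Seed P(L0|S) from the first response (guess/slip link)."""
    return (1 - p_G) if first_response_correct else p_S

# A student who answers the first question correctly starts with a high
# prior; one who answers incorrectly starts with a low prior.
prior_wrong = pps_prior(False, p_G=0.14, p_S=0.09)  # equals p_S = 0.09
prior_right = pps_prior(True,  p_G=0.14, p_S=0.09)  # equals 1 - p_G = 0.86
```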

Variations on Knowledge Tracing (and other models)

1. BKT-BF (Baker et al., 2010)

Learns values for the four parameters P(L0), P(T), P(G), and P(S) by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error.
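The brute-force fit can be sketched as follows. This is an illustrative implementation, not the authors' code: a coarse 0.05 grid is used for speed where the paper uses 0.01, and forgetting is fixed to zero as in classic KT.

```python
# Sketch of BKT-BF: grid search over (p_L0, p_T, p_G, p_S), keeping the
# parameter set with the lowest squared prediction error.
import itertools

def bkt_predictions(responses, p_L0, p_T, p_G, p_S):
    """Predicted P(correct) for each response in a sequence."""
    preds, p_L = [], p_L0
    for r in responses:
        preds.append(p_L * (1 - p_S) + (1 - p_L) * p_G)
        # posterior given the observed response, then apply learning
        if r:
            p_L = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
        else:
            p_L = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
        p_L = p_L + (1 - p_L) * p_T
    return preds

def fit_bf(sequences, grid=None):
    """Brute-force search; 0.05 granularity here (0.01 in the paper)."""
    grid = grid or [round(0.05 * i, 2) for i in range(1, 20)]
    best, best_sse = None, float("inf")
    for params in itertools.product(grid, repeat=4):
        sse = sum((p - r) ** 2
                  for seq in sequences
                  for p, r in zip(bkt_predictions(seq, *params), seq))
        if sse < best_sse:
            best, best_sse = params, sse
    return best  # (p_L0, p_T, p_G, p_S) with the lowest squared error
```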

2. BKT-EM (Chang et al., 2006)

Learns values for the four parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data.

3. BKT-CGS (Baker, Corbett, & Aleven, 2008)

Guess and slip parameters are assessed contextually, using a regression on features generated from student performance in the tutor.

4. BKT-CSlip (Baker, Corbett, & Aleven, 2008)

Uses the student's averaged contextual slip parameter, learned across all incorrect actions.

5. BKT-LessData (Nooraiei et al., 2011)

Limits each student's response sequence length to the most recent 15 responses during EM training.

6. BKT-PPS (Pardos & Heffernan, 2010)

Prior per student (PPS) model, which individualizes the prior parameter. Students are assigned a prior based on their response to the first question.

7. CFAR (Yu et al., 2010)

Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted.

Student responses for Skill X: 0 1 0 1 0 1 _

The predicted next response would be 0.50.
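CFAR is a one-line baseline; a quick sketch reproducing the slide's example:

```python
# CFAR sketch: predict the next response as the fraction correct so far
# on the current skill (Correct on First Attempt Rate).

def cfar(responses_so_far):
    return sum(responses_so_far) / len(responses_so_far)

# The slide's example: responses 0 1 0 1 0 1 on Skill X -> prediction 0.50
prediction = cfar([0, 1, 0, 1, 0, 1])
```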

8. Tabling (Wang et al., 2011)

Uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among training-set students with the same sequence.

Training set:
Student A: 0 1 1 -> 0
Student B: 0 1 1 -> 1
Student C: 0 1 1 -> 1

Test set student: 0 1 1 _

The predicted next response would be 0.66.

With the max table length set to 3, the table size was 2^0 + 2^1 + 2^2 + 2^3 = 15.
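The table lookup can be sketched with a dictionary keyed on recent response sequences. This is an illustrative sketch, not the authors' implementation; it builds keys of length 1 to 3 rather than enumerating all 15 table cells up front.

```python
# Tabling sketch: average the next response among training students who
# produced the same (up to length-3) preceding sequence.
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    table = defaultdict(list)
    for seq in training_sequences:
        for i in range(1, len(seq)):
            key = tuple(seq[max(0, i - max_len):i])  # preceding responses
            table[key].append(seq[i])                # next response
    return {k: sum(v) / len(v) for k, v in table.items()}

# The slide's training set: three students who all answered 0 1 1 and then
# answered 0, 1, 1 respectively on the next question.
table = build_table([[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1]])
# table[(0, 1, 1)] is 2/3, i.e. the slide's 0.66 prediction
```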

9. PFA (Pavlik et al., 2009)

Performance Factors Analysis (PFA). A logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill. An overall difficulty parameter beta is also fit for each skill or each item; in this study we use the variant of PFA that fits beta for each skill. The PFA equation is:

m(i, j in KCs, s, f) = sum over j in KCs of [ beta_j + gamma_j * s_{i,j} + rho_j * f_{i,j} ]

with the prediction given by the logistic link p(m) = 1 / (1 + e^(-m)).
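The PFA prediction step can be sketched directly from the equation. The coefficient values below are illustrative placeholders, not fitted coefficients from the study.

```python
# PFA sketch: logistic model over counts of prior successes s and
# failures f on each skill; beta, gamma, rho are illustrative values.
import math

def pfa_predict(counts, beta, gamma, rho):
    """counts: {skill: (successes, failures)} for the current step."""
    m = sum(beta[j] + gamma[j] * s + rho[j] * f
            for j, (s, f) in counts.items())
    return 1.0 / (1.0 + math.exp(-m))  # logistic link

# Hypothetical skill "subtraction" with 3 prior successes, 1 failure:
p = pfa_predict({"subtraction": (3, 1)},
                beta={"subtraction": -0.5},
                gamma={"subtraction": 0.3},
                rho={"subtraction": -0.2})
# more prior successes raise the predicted probability of correct
```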

Study: Dataset

- Cognitive Tutor for Genetics
- 76 CMU undergraduate students
- 9 skills (no multi-skill steps)
- 23,706 problem solving attempts
- 11,582 problem steps in the tutor
- 152 average problem steps completed per student (SD = 50)
- Pre- and post-tests were administered with this assignment

Study: Methodology (model in-tutor prediction)

- Predictions were made by the 9 models using a 5-fold cross-validation by student.

Example of per-response predictions against the actual response:

Student    Skill    Resp   Predictions (9 models)   Actual
Student 1  Skill A  1      0.10  0.22  ...          0
Student 1  Skill A  2      0.51  0.26  ...          1
Student 1  Skill A  N      0.77  0.40  ...          1
Student 1  Skill B  1      0.55  0.60  ...          1
Student 1  Skill B  N      0.41  0.61  ...          0

- Accuracy was calculated with A' for each student. Those values were then averaged across students to report the model's A' (higher is better).
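A' is equivalent to the area under the ROC curve, so the per-student calculation can be sketched in pure Python: the probability that a randomly chosen correct response receives a higher prediction than a randomly chosen incorrect one, with ties counting half. Function names are mine.

```python
# Sketch of the evaluation: A' per student from (prediction, actual)
# pairs, then averaged across students.

def a_prime(pairs):
    """A' (ROC AUC) for one student's (prediction, actual) pairs."""
    pos = [p for p, y in pairs if y == 1]
    neg = [p for p, y in pairs if y == 0]
    if not pos or not neg:
        return None  # A' is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_a_prime(by_student):
    """Average A' over students, skipping undefined students."""
    scores = [a_prime(pairs) for pairs in by_student.values()]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)
```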


Study: Results (in-tutor model prediction)

Model         | A'
BKT-PPS       | 0.7029
BKT-BF        | 0.6969
BKT-EM        | 0.6957
BKT-LessData  | 0.6839
PFA           | 0.6629
Tabling       | 0.6476
BKT-CSlip     | 0.6149
CFAR          | 0.5705
BKT-CGS       | 0.4857

A' results averaged across students.

There were no significant differences among the BKT models at the top of the table, but there were significant differences between those BKT models and PFA.

Study: Methodology (ensemble in-tutor prediction)

- 5 ensemble methods were used, trained with the same 5-fold cross-validation folds.
- Ensemble methods were trained using the 9 model predictions as the features and the actual response as the label.

Ensemble methods used:

1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression
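Ensemble method 1 (linear regression with predictions bounded to {0,1}) can be sketched as a stacking step. The toy feature matrix reuses the example prediction values from the methodology table; this is an illustration of the setup, not the study's code.

```python
# Sketch of ensemble method 1: linear regression over the per-response
# model predictions (features) vs. the actual response (label), with the
# ensemble's output clipped to [0, 1].
import numpy as np

def fit_linreg_ensemble(X, y):
    """X: (rows, n_models) model predictions; y: actual responses."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_ensemble(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.clip(Xb @ w, 0.0, 1.0)  # bound predictions between {0,1}

X = np.array([[0.10, 0.22], [0.51, 0.26], [0.77, 0.40],
              [0.55, 0.60], [0.41, 0.61]])  # two models' predictions
y = np.array([0, 1, 1, 1, 0])               # actual responses
w = fit_linreg_ensemble(X, y)
p = predict_ensemble(w, X)
```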

Study: Results (in-tutor ensemble prediction)

Model                                               | A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip   | 0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM              | 0.6973
Ensemble: LinReg with feature selection (stepwise)  | 0.6954
Ensemble: LinReg without feature selection          | 0.6945
Ensemble: Logistic without feature selection        | 0.6854

A' results averaged across students. There was no significant difference between the ensembles.

Study: Results (in-tutor ensemble & model prediction)

Model                                               | A'
BKT-PPS                                             | 0.7029
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip   | 0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM              | 0.6973
BKT-BF                                              | 0.6969
BKT-EM                                              | 0.6957
Ensemble: LinReg with feature selection (stepwise)  | 0.6954
Ensemble: LinReg without feature selection          | 0.6945
Ensemble: Logistic without feature selection        | 0.6854
BKT-LessData                                        | 0.6839
PFA                                                 | 0.6629
Tabling                                             | 0.6476
BKT-CSlip                                           | 0.6149
CFAR                                                | 0.5705
BKT-CGS                                             | 0.4857

A' results averaged across students.

Study: Results (in-tutor ensemble & model prediction, across all actions)

Model                                                    | A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip        | 0.7451
Ensemble: LinReg without feature selection               | 0.7428
Ensemble: LinReg with feature selection (stepwise)       | 0.7423
Ensemble: Logistic regression without feature selection  | 0.7359
Ensemble: LinReg with BKT-PPS & BKT-EM                   | 0.7348
BKT-EM                                                   | 0.7348
BKT-BF                                                   | 0.7330
BKT-PPS                                                  | 0.7310
PFA                                                      | 0.7277
BKT-LessData                                             | 0.7220
CFAR                                                     | 0.6723
Tabling                                                  | 0.6712
Contextual Slip                                          | 0.6396
BKT-CGS                                                  | 0.4917

A' results calculated across all actions.

In the KDD Cup

Motivation for trying a non-KT approach:
- The Bayesian method only uses KC, opportunity count, and student as features. Much information is left unutilized; another machine learning method is required.

Strategy:
- Engineer additional features from the dataset and use Random Forests to train a model.

Random Forests

Strategy:
- Create rich feature datasets that include features created from features not included in the test set.

[Diagram: the raw training dataset rows are split into non-validation training rows (nvtrain), validation set 1 (val1), and validation set 2 (val2); feature-rich versions (frval1, frval2) are built from the validation sets, and the raw test dataset rows become the feature-rich test set (frtest)]


Random Forests

- Created by Leo Breiman
- The method trains T separate decision tree classifiers (50-800)
- Each decision tree selects a random 1/P portion of the available features (1/3)
- Each tree is grown until there are at least M observations in the leaf (1-100)
- When classifying unseen data, each tree votes on the class; the popular vote wins, or the votes are averaged (for regression)
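The voting scheme above maps directly onto scikit-learn's regressor interface (the talk itself used MATLAB's Statistics Toolbox). The data below is synthetic and the parameter values are illustrative picks from the ranges on the slide.

```python
# Sketch of a Random Forest with the slide's parameters: T trees,
# a random 1/3 of the features per split, and a minimum leaf size M.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 9))  # synthetic stand-in for engineered features
y = (X[:, 0] + 0.1 * rng.random(200) > 0.5).astype(float)  # correctness

rf = RandomForestRegressor(
    n_estimators=200,    # T separate decision trees (50-800)
    max_features=1 / 3,  # each split samples 1/P of the features (1/3)
    min_samples_leaf=5,  # grow until >= M observations per leaf (1-100)
    random_state=0,
)
rf.fit(X, y)
preds = rf.predict(X)    # averaged votes (regression)
```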

Feature Importance

Features extracted from the training set:

- Student progress features (avg. importance: 1.67)
  - Number of data points [today, since the start of the unit]
  - Number of correct responses out of the last [3, 5, 10]
  - Z-score sum for step duration, hint requests, incorrects
  - Skill-specific versions of all these features
- Percent correct features (avg. importance: 1.60)
  - % correct of unit, section, problem, and step, and total for each skill and also for each student (10 features)
- Student Modeling Approach features (avg. importance: 1.32)
  - The predicted probability of correct for the test row
  - The number of data points used in training the parameters
  - The final EM log likelihood fit of the parameters / data points


- Features of the user were more important in Bridge to Algebra than in Algebra
- Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets

Algebra

Rank | Feature set         | RMSE   | Coverage
1    | All features        | 0.2762 | 87%
2    | Percent correct+    | 0.2824 | 96%
3    | All features (fill) | 0.2847 | 97%

Bridge to Algebra

Rank | Feature set         | RMSE   | Coverage
1    | All features        | 0.2712 | 92%
2    | All features (fill) | 0.2791 | 99%
3    | Percent correct+    | 0.2800 | 98%

- The best Bridge to Algebra RMSE on the Leaderboard was 0.2777
- The Random Forest RMSE of 0.2712 here is exceptional

- Skill data for a student was not always available for each test row
- Because of this, many skill-related feature sets only had 92% coverage

Conclusion from KDD

- Combining user features with skill features was very powerful in both the modeling and classification approaches
- Model tracing based predictions performed formidably against pure machine learning techniques
- Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs. This method could significantly boost accuracy in other EDM datasets.

Hardware/Software

Software
- MATLAB used for all analysis
- Bayes Net Toolbox for Bayesian network models
- Statistics Toolbox for the Random Forests classifier
- Perl used for pre-processing

Hardware
- Two Rocks clusters used for skill model training
- 178 CPUs in total; training the KT models took ~48 hours when utilizing all CPUs
- Two 32 GB RAM systems for Random Forests
- RF models took ~16 hours to train with 800 trees

Choose the next topic

- KT: 1-35
- Prediction: 36-67
- Evaluation: 47-77
- Sig tests: 69-77
- Regression/sig tests: 80-112

Time left?

Individualize Everything?

Fully Individualized Model: the Student-Skill Interaction (SSI) model (Pardos & Heffernan, JMLR 2011)

Model Parameters
- P(L0) = Probability of initial knowledge
- P(L0|Q1) = Individual cold-start P(L0)
- P(T) = Probability of learning
- P(T|S) = Students' individual P(T)
- P(G) = Probability of guess
- P(G|S) = Students' individual P(G)
- P(S) = Probability of slip
- P(S|S) = Students' individual P(S)

Node representations
- K = Knowledge node
- Q = Question node
- S = Student node
- Q1 = First response node
- T = Learning node
- G = Guessing node
- S = Slipping node

Node states
- K, Q, Q1, T, G, S = Two state (0 or 1)
- S (student node) = Multi state (1 to N), where N is the number of students in the training data

Parameters shown in bold on the slide are learned from data while the others are fixed.

[Diagram: the KT chain augmented with the student node S and parameter nodes T, G, S driving P(L0|Q1), P(T|S), P(G|S), and P(S|S)]

S identifies the student.

T contains the CPT lookup table of individual student learn rates.

P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner].

SSI model results

[Chart: RMSE (0.279-0.286) of PPS vs. SSI on the Algebra and Bridge to Algebra datasets]

Dataset           | New RMSE | Prev RMSE | Improvement
Algebra           | 0.2813   | 0.2835    | 0.0022
Bridge to Algebra | 0.2824   | 0.2860    | 0.0036

The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place.

The differences between PPS and SSI are significant in each dataset at the P < 0.01 level (t-test of squared errors).

(Pardos & Heffernan, JMLR 2011)