Learning from Learning Curves using Learning Factors Analysis

Hao Cen, Kenneth Koedinger, Brian Junker

Human-Computer Interaction Institute
Carnegie Mellon University

Cen, H., Koedinger, K., Junker, B. Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement. The 8th International Conference on Intelligent Tutoring Systems. 2006.
Cen, H., Koedinger, K., Junker, B. Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor. The 13th International Conference on Artificial Intelligence in Education (AIED 2007). 2007.

Student Performance As They Practice with the LISP Tutor

Production Rule Analysis

Evidence for the production rule as an appropriate unit of knowledge acquisition

Using learning curves to evaluate a cognitive model

Lisp Tutor Model

Learning curves used to validate cognitive model

Fit better when organized by knowledge components (productions) rather than surface forms (programming language terms)

But, curves not smooth for some production rules

"Blips" in learning curves indicate the knowledge representation may not be right

Corbett, Anderson, O'Brien (1995)

Let me illustrate …

Curve for "Declare Parameter" production rule

How are steps with blips different from others?

What's the unique feature or factor explaining these blips?

What's happening on the 6th & 10th opportunities?

Can modify cognitive model using unique factor present at "blips"


Blips occur when the to-be-written program has 2 parameters

Split Declare-Parameter by parameter-number factor:

Declare-first-parameter

Declare-second-parameter

Learning curve analysis by hand & eye …

Steps in programming problems where the function ("method") has two parameters (Corbett, Anderson, O'Brien, 1995)


Can learning curve analysis be automated?

Learning curve analysis

Identify blips by hand & eye

Manually create a new model

Qualitative judgment

Need to automatically:

Identify blips by system

Propose alternative cognitive models

Evaluate each model quantitatively


Overview


Learning Factors Analysis algorithm


A Geometry Cognitive Model and Log Data


Experiments and Results


Learning Factors Analysis (LFA):
A Tool for KC Analysis


LFA is a method for discovering & evaluating alternative
cognitive models


Finds knowledge component decomposition that best predicts
student performance & learning transfer


Inputs


Data: Student success on tasks in domain over time


Codes: Hypothesized factors that drive task difficulty


A mapping between these factors & domain tasks


Outputs


A rank ordering of most predictive cognitive models


For each model, a measure of its generalizability & parameter
estimates for knowledge component difficulty, learning rates, &
student proficiency

Learning Factors Analysis (LFA) draws
from multiple disciplines


Machine Learning & AI


Combinatorial search
(Russell & Norvig, 2003)


Exponential-family principal component analysis (Gordon, 2002)


Psychometrics & Statistics


Q Matrix & Rule Space
(Tatsuoka 1983, Barnes 2005)


Item response learning model
(Draney, et al., 1995)


Item response assessment models
(DiBello, et al., 1995;
Embretson, 1997; von Davier, 2005)


Cognitive Psychology


Learning curve analysis
(Corbett, et al 1995)

Steps in Learning Factors Analysis

We've talked about some of these steps 1-4 before …

LFA


1. The Q Matrix


How to represent relationship between knowledge components
and student tasks?


Tasks also called items, questions, problems, or steps (in problems)


Q-Matrix (Tatsuoka, 1983)















2*8 is a single-KC item

2*8 - 3 is a conjunctive-KC item, involves two KCs

Item | KC     Add   Sub   Mul   Div
2*8            0     0     1     0
2*8 - 3        0     1     1     0

What good is a Q matrix? Used to predict student accuracy on items not previously seen, based on KCs involved

LFA

2. The Statistical Model

Problem: How to predict student responses from model?

Solution: Additive Factor Model (Draney, et al., 1995; Cen, Koedinger, Junker, 2006)
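A standard statement of the Additive Factor Model, the form these slides build on, is:

$$\log\frac{p_{ij}}{1-p_{ij}} \;=\; \theta_i \;+\; \sum_{k} Q_{jk}\,\bigl(\beta_k + \gamma_k\,T_{ik}\bigr)$$

where $p_{ij}$ is the probability that student $i$ gets step $j$ right, $\theta_i$ is student proficiency, $\beta_k$ and $\gamma_k$ are the easiness and learning rate of knowledge component $k$, $T_{ik}$ is the number of prior practice opportunities student $i$ has had on KC $k$, and $Q_{jk}$ indicates whether step $j$ requires KC $k$.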













LFA

2. An alternative "conjunctive" model

Conjunctive Factor Model (Cen, Koedinger, Junker, 2008)
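One common way to write a conjunctive factor model, in which a step succeeds only if every required KC succeeds, is:

$$p_{ij} \;=\; \prod_{k}\left(\frac{\exp(\theta_i + \beta_k + \gamma_k\,T_{ik})}{1+\exp(\theta_i + \beta_k + \gamma_k\,T_{ik})}\right)^{Q_{jk}}$$

with the same symbols as the additive model; because the per-KC success probabilities multiply, items requiring several KCs are predicted to be harder than single-KC items.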















LFA

4. Model Evaluation

How to compare cognitive models?

A good model minimizes prediction risk by balancing fit with data & complexity (Wasserman 2005)

Compare BIC for the cognitive models

BIC is "Bayesian Information Criterion"

BIC = -2*log-likelihood + numPar * log(numObs)

Better (lower) BIC means the model should better predict data it hasn't seen

Mimics cross validation, but is faster to compute
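A minimal sketch of these criteria in code (the function names and the example numbers are illustrative, not from the study); AIC, used later as an alternative search heuristic, is included for comparison:

```python
import math

def aic(log_likelihood, num_params):
    """AIC = -2*log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * num_params

def bic(log_likelihood, num_params, num_obs):
    """BIC = -2*log-likelihood + number of parameters * log(number of observations)."""
    return -2.0 * log_likelihood + num_params * math.log(num_obs)

# Illustrative comparison of two candidate KC models fit to the same data
# (log-likelihoods, parameter counts, and observation count are made up):
print(bic(-1900.0, num_params=50, num_obs=5000))  # simpler model
print(bic(-1895.0, num_params=60, num_obs=5000))  # better fit, more parameters
```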

LFA

5. Expert Labeling & P-Matrix

Problem: How to find potential improvements to the existing cognitive model?

Solution: Have experts look for difficulty factors that are candidates for new KCs. Put these in a P matrix.

Q Matrix:

Item | Skill    Add   Sub   Mul
2*8              0     0     1
2*8 - 3          0     1     1
2*8 - 30         0     1     1
3+2*8            1     0     1

P Matrix:

Item | Skill    Deal with negative   Order of Ops
2*8              0                    0
2*8 - 3          0                    0
2*8 - 30         1                    0
3+2*8            0                    1

LFA

5. Expert Labeling and P-Matrix

Operators on Q and P:

Q + P[,1]

Q[,2] * P[,1]

Q-Matrix after adding P[,1]:

Item | Skill    Add   Sub   Mul   Div   neg
2*8              0     0     1     0     0
2*8 - 3          0     1     1     0     0
2*8 - 30         0     1     1     0     1

Q-Matrix after splitting Q[,2] by P[,1]:

Item | Skill    Add   Sub   Mul   Div   Sub-neg
2*8              0     0     1     0     0
2*8 - 3          0     1     1     0     0
2*8 - 30         0     0     1     0     1
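A minimal sketch of the two operators applied to the toy Q and P matrices above, using 0-based column indexing (the helper names add_factor and split_by_factor are mine, not part of LFA's published interface):

```python
import numpy as np

# Toy Q matrix (rows: 2*8, 2*8-3, 2*8-30, 3+2*8; columns: Add, Sub, Mul, Div)
Q = np.array([[0, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0]])

# Toy P matrix (columns: Deal-with-negative, Order-of-Ops)
P = np.array([[0, 0],
              [0, 0],
              [1, 0],
              [0, 1]])

def add_factor(Q, P, p_col):
    """Q + P[:, p_col]: append the difficulty factor as a new KC column."""
    return np.column_stack([Q, P[:, p_col]])

def split_by_factor(Q, q_col, P, p_col):
    """Q[:, q_col] * P[:, p_col]: split an existing KC by the factor.
    Items needing both the KC and the factor move to a new column (e.g. Sub-neg);
    the original KC column keeps only the items without the factor."""
    new_col = Q[:, q_col] * P[:, p_col]
    Q2 = Q.copy()
    Q2[:, q_col] = Q[:, q_col] * (1 - P[:, p_col])
    return np.column_stack([Q2, new_col])

print(add_factor(Q, P, 0))          # Q-matrix after adding the "neg" factor
print(split_by_factor(Q, 1, P, 0))  # Q-matrix after splitting Sub by "neg" -> Sub-neg
```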

LFA

6. Model Search

Problem: How to find best model given P-matrix?

Solution: Combinatorial search

A best-first search algorithm (Russell & Norvig 2002)

Guided by a heuristic, such as BIC

Start from an existing model

Combinatorial Search

Goal: Do model selection within the logistic regression model space

Steps (a code sketch follows this list):

1. Start from an initial "node" in the search graph

2. Iteratively create new child nodes by splitting a model using covariates or "factors"

3. Employ a heuristic (e.g. fit to learning curve) to rank each node

4. Expand from a new node in the heuristic order by going back to step 2
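A skeletal sketch of this loop, assuming a fit_model(model, data) routine that returns the heuristic score (e.g. BIC, lower is better) and a split(model, factor) operator that returns a child model or None; both are placeholders for the real LFA routines:

```python
import heapq
import itertools

def best_first_search(initial_model, factors, data, fit_model, split, max_expansions=100):
    """Best-first search over KC models, guided by a heuristic such as BIC (lower is better)."""
    tie = itertools.count()                       # tie-breaker so the heap never compares models
    best_model, best_score = initial_model, fit_model(initial_model, data)
    frontier = [(best_score, next(tie), initial_model)]
    seen = set()

    for _ in range(max_expansions):
        if not frontier:
            break
        score, _, model = heapq.heappop(frontier)     # step 4: expand the most promising node
        if score < best_score:
            best_model, best_score = model, score
        for factor in factors:                        # step 2: create children by splitting
            child = split(model, factor)
            if child is None or repr(child) in seen:  # repr() stands in for a real model signature
                continue
            seen.add(repr(child))
            heapq.heappush(frontier, (fit_model(child, data), next(tie), child))  # step 3: rank

    return best_model, best_score
```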


LFA

6. Model Search

Automates the process of hypothesizing alternative KC models & testing them against data

Overview


Learning Factors Analysis algorithm


A Geometry Cognitive Model and Log Data


Experiments and Results


Domain of current study

Domain of study: the area unit of the geometry tutor

Cognitive model: 15 skills

1. Circle-area
2. Circle-circumference
3. Circle-diameter
4. Circle-radius
5. Compose-by-addition
6. Compose-by-multiplication
7. Parallelogram-area
8. Parallelogram-side
9. Pentagon-area
10. Pentagon-side
11. Trapezoid-area
12. Trapezoid-base
13. Trapezoid-height
14. Triangle-area
15. Triangle-side

Log Data -- Skills in the Base Model

Student   Step   Skill                 Opportunity
A         p1s1   Circle-area           1
A         p2s1   Circle-area           2
A         p2s2   Rectangle-area        1
A         p2s3   Compose-by-addition   1
A         p3s1   Circle-area           3
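The Opportunity column is just a running count of how many times a student has practiced a skill; a small sketch of deriving it from raw log rows (column names follow the table above):

```python
import pandas as pd

log = pd.DataFrame({
    "Student": ["A", "A", "A", "A", "A"],
    "Step":    ["p1s1", "p2s1", "p2s2", "p2s3", "p3s1"],
    "Skill":   ["Circle-area", "Circle-area", "Rectangle-area",
                "Compose-by-addition", "Circle-area"],
})

# Opportunity = 1st, 2nd, 3rd, ... time this student practices this skill
log["Opportunity"] = log.groupby(["Student", "Skill"]).cumcount() + 1
print(log)
```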

The Split

Binary Split -- splits a skill into a skill with the factor value & a skill without the factor value.

Data with the candidate factor "Embed" coded:

Student   Step   Skill                 Opportunity   Factor-Embed
A         p1s1   Circle-area           1             alone
A         p2s1   Circle-area           2             embed
A         p2s2   Rectangle-area        1
A         p2s3   Compose-by-addition   1
A         p3s1   Circle-area           3             alone

After splitting Circle-area by Embed:

Student   Step   Skill                 Opportunity
A         p1s1   Circle-area-alone     1
A         p2s1   Circle-area-embed     1
A         p2s2   Rectangle-area        1
A         p2s3   Compose-by-addition   1
A         p3s1   Circle-area-alone     2
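A sketch of the binary split applied to the log data itself: rows of the target skill that carry a factor value get the value appended to the skill name, and opportunities are then re-counted under the new labels (the helper binary_split is mine; column names follow the tables above):

```python
import pandas as pd

log = pd.DataFrame({
    "Student":      ["A", "A", "A", "A", "A"],
    "Step":         ["p1s1", "p2s1", "p2s2", "p2s3", "p3s1"],
    "Skill":        ["Circle-area", "Circle-area", "Rectangle-area",
                     "Compose-by-addition", "Circle-area"],
    "Factor-Embed": ["alone", "embed", None, None, "alone"],
})

def binary_split(log, skill, factor_col):
    """Split `skill` into `<skill>-<factor value>` wherever the factor is coded."""
    out = log.copy()
    mask = (out["Skill"] == skill) & out[factor_col].notna()
    out.loc[mask, "Skill"] = skill + "-" + out.loc[mask, factor_col]
    # Re-derive opportunity counts under the finer-grained skill labels
    out["Opportunity"] = out.groupby(["Student", "Skill"]).cumcount() + 1
    return out

print(binary_split(log, "Circle-area", "Factor-Embed"))
```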

The Heuristics

A good model captures sufficient variation in data but is not overly complicated

Balance between model fit & complexity, minimizing prediction risk (Wasserman 2005)

AIC and BIC used as heuristics in the search

Two estimators for prediction risk

Balance between fit & parsimony

Select models that fit well without being too complex

AIC = -2*log-likelihood + 2 * number of parameters

BIC = -2*log-likelihood + number of parameters * log(number of observations)

System: Best-first Search

An informed graph search algorithm guided by a heuristic

Heuristics: AIC, BIC

Start from an existing model




Overview


Learning Factors Analysis algorithm


A Geometry Cognitive Model and Log Data


Experiments and Results


Experiment 1

Q: How can we describe learning behavior in terms of an existing cognitive model?

A: Fit the logistic regression model in the equation above & get coefficients


Experiment 1

Results:

Skill                Intercept   Slope   Avg Opportunities   Initial Probability   Avg Probability   Final Probability
Parallelogram-area    2.14       -0.01   14.9                0.95                  0.94              0.93
Pentagon-area        -2.16        0.45    4.3                0.2                   0.63              0.84

Student     Intercept
student0    1.18
student1    0.82
student2    0.21

Model Statistics
AIC   3,950
BIC   4,285
MAD   0.083

Higher intercept of skill -> easier skill
Higher slope of skill -> faster students learn it
Higher intercept of student -> student initially knew more

The AIC, BIC & MAD statistics provide alternative ways to evaluate models

MAD = Mean Absolute Deviation
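To make the interpretation concrete, a sketch of how a skill's intercept and slope map to a predicted learning curve under the additive (logistic) model; the student intercept is set to 0 here for illustration, so these numbers will not exactly reproduce the table:

```python
import numpy as np

def predicted_success(skill_intercept, skill_slope, opportunity, student_intercept=0.0):
    """P(correct) at a given practice opportunity under the additive (logistic) model."""
    logit = student_intercept + skill_intercept + skill_slope * opportunity
    return 1.0 / (1.0 + np.exp(-logit))

# Pentagon-area: low intercept (-2.16), high slope (0.45) -> hard at first, learned quickly
for t in range(5):
    print(t, round(float(predicted_success(-2.16, 0.45, t)), 2))
```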

Experiment 2

Q: How can we improve a cognitive model?

A: Run LFA on data including factors & search through model space

Experiment 2

Results with BIC

Splitting Compose-by-multiplication into two skills

CMarea and CMsegment, making a distinction of the geometric quantity being multiplied

Model 1
Number of Splits: 3
1. Binary split compose-by-multiplication by figurepart segment
2. Binary split circle-radius by repeat repeat
3. Binary split compose-by-addition by backward backward
Number of Skills: 18
AIC: 3,888.67
BIC: 4,248.86
MAD: 0.071

Model 2
Number of Splits: 3
1. Binary split compose-by-multiplication by figurepart segment
2. Binary split circle-radius by repeat repeat
3. Binary split compose-by-addition by figurepart area-difference
Number of Skills: 18
AIC: 3,888.67
BIC: 4,248.86
MAD: 0.071

Model 3
Number of Splits: 2
1. Binary split compose-by-multiplication by figurepart segment
2. Binary split circle-radius by repeat repeat
Number of Skills: 17
AIC: 3,897.20
BIC: 4,251.07
MAD: 0.075

Experiment 3

Q: Will some skills be better merged than if they are separate skills? Can LFA recover some elements of the original model if we search from a merged model, given difficulty factors?

A: Run LFA on the data starting from a merged model, and search through the model space

Experiment 3

Merged Model

Merge some skills in the original model to remove some distinctions, and add those distinctions as difficulty factors to consider

The merged model has 8 skills:

Circle-area, Circle-radius => Circle

Circle-circumference, Circle-diameter => Circle-CD

Parallelogram-area, Parallelogram-side => Parallelogram

Pentagon-area, Pentagon-side => Pentagon

Trapezoid-area, Trapezoid-base, Trapezoid-height => Trapezoid

Triangle-area, Triangle-side => Triangle

Compose-by-addition

Compose-by-multiplication

Add difficulty factor "direction": forward vs. backward

Experiment 3

Results

Model 1
Number of Splits: 4
Number of skills: 12
Circle*area
Circle*radius*initial
Circle*radius*repeat
Compose-by-addition
Compose-by-addition*area-difference
Compose-by-multiplication*area-combination
Compose-by-multiplication*segment
AIC: 3,884.95
BIC: 4,169.315
MAD: 0.075

Model 2
Number of Splits: 3
Number of skills: 11
All skills are the same as those in Model 1 except that
1. Circle is split into Circle*backward*initial, Circle*backward*repeat, Circle*forward
2. Compose-by-addition is not split
AIC: 3,893.477
BIC: 4,171.523
MAD: 0.079

Model 3
Number of Splits: 4
Number of skills: 12
All skills are the same as those in Model 1 except that
1. Circle is split into Circle*backward*initial, Circle*backward*repeat, Circle*forward
2. Compose-by-addition is split into Compose-by-addition and Compose-by-addition*segment
AIC: 3,887.42
BIC: 4,171.786
MAD: 0.077

Experiment 3

Results

Recovered three skills (Circle, Parallelogram, Triangle)
=> distinctions made in the original model are necessary

Partially recovered two skills (Triangle, Trapezoid)
=> some original distinctions necessary, some are not

Did not recover one skill (Circle-CD)
=> original distinction may not be necessary

Recovered one skill (Pentagon) in a different way
=> original distinction may not be as significant as distinction caused by another factor


Beyond Experiments 1-3

Q: Can we use LFA to improve the tutor curriculum by identifying over-taught or under-taught rules, and thus adjust their contribution to curriculum length without compromising student performance?

A: Combine results from Experiments 1-3

Beyond Experiments 1-3 -- Results

Parallelogram-side is over-taught:
high intercept (2.06), low slope (-.01);
initial success probability .94, average number of practices per student is 15

Trapezoid-height is under-taught:
low intercept (-1.55), positive slope (.27);
final success probability is .69, far from the level of mastery; average number of practices per student is 4

Suggestions for curriculum improvement:

Reducing the amount of practice for Parallelogram-side should save student time without compromising their performance.

More practice on Trapezoid-height is needed for students to reach mastery.





Beyond Experiments 1-3 -- Results

How about Compose-by-multiplication?

Skill   Intercept   Slope   Avg Practice Opportunities   Initial Probability   Avg Probability   Final Probability
CM      -.15        .1      10.2                         .65                   .84               .92

With final probability .92, students seem to have mastered Compose-by-multiplication.

Beyond Experiments 1-3 -- Results

However, after the split:

Skill       Intercept   Slope   Avg Practice Opportunities   Initial Probability   Avg Probability   Final Probability
CM          -.15        .1      10.2                         .65                   .84               .92
CMarea      -.009       .17      9                           .64                   .86               .96
CMsegment   -1.42       .48      1.9                         .32                   .54               .60

CMarea does well, with final probability .96

But CMsegment has final probability only .60 and an average amount of practice less than 2

Suggestion for curriculum improvement: increase the amount of practice for CMsegment

Conclusions and Future Work

Learning Factors Analysis combines statistics, human expertise, & combinatorial search to evaluate & improve a cognitive model

System able to evaluate a model in seconds & search 100s of models in 4-5 hours

Model statistics are meaningful

Improved models are interpretable & suggest tutor improvement

Planning to use LFA for datasets from other tutors to test potential for model & tutor improvement

END