BUILDING A PREDICTIVE MODEL

A Behind-the-Scenes Look

Mike Sharkey

Director of Academic Analytics, The Apollo Group


January 9, 2012

THE 50,000 FT. VIEW

We have lots of data; we need to set a good foundation… so we can extract information that will help our students succeed.

OUR DATA FOUNDATION
INTEGRATED DATA WAREHOUSE

[Diagram: source applications and databases (LMS, SIS, CMS, applicant systems) feed an integrated data repository, which in turn feeds reporting tools, analytics tools, and business intelligence.]

HOW IS IT WORKING?

Advantages:
Continuous flow of integrated data
Can drill down to the transaction level

Disadvantages:
New data flows require in-demand resources
Need skilled staff to understand the data model

BUILDING A PREDICTIVE MODEL

PREDICTING SUCCESS… BUT WHAT IS SUCCESS?

Learning: did the students learn what they were supposed to learn?
Program persistence: does the student stay in the program or drop out?
Course completion: does the student pass the class?

THE PLAN


Use available data to build a model (logistic regression)
Demographics, schedule, course history, assignments

Develop a model to predict course pass/fail on, e.g., a scale of 1-10:
a 10 will most likely pass the course
a 1 will most likely fail the course

Feed the score to academic counselors, who can intervene (phone at-risk students)
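The plan above boils down to fitting a pass/fail classifier and turning its probability into the 1-10 score counselors see. As a minimal sketch only, assuming a pandas DataFrame of historical outcomes with hypothetical column names (the actual Apollo feature set and model details are not given here):

```python
# Minimal sketch of a pass/fail risk score; column names are assumptions,
# and this is not the actual Apollo Group model.
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["credits_earned", "prior_failures", "avg_assignment_score"]  # assumed names

def fit_risk_model(history: pd.DataFrame) -> LogisticRegression:
    """Fit a logistic regression predicting course pass (1) vs fail (0)."""
    model = LogisticRegression(max_iter=1000)
    model.fit(history[FEATURES], history["passed"])
    return model

def risk_score(model: LogisticRegression, students: pd.DataFrame) -> pd.Series:
    """Convert predicted pass probability to a 1-10 scale for counselors."""
    p_pass = model.predict_proba(students[FEATURES])[:, 1]
    # 1 = most likely to fail, 10 = most likely to pass
    return pd.Series((p_pass * 9 + 1).round().astype(int), index=students.index)
```

In practice a separate model would be fit per degree level and per week, as the next slide describes.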

THE MODEL



Built different models:
Associate's, Bachelor's, Master's
Predict at Week 0, Week 1, … through the final week

Strongest predictive coefficients:
Course assignment scores (stronger as the course goes on)
Financial status (mostly at Week 0)
Whether the student failed courses in the past
Credits earned in the program (tenure)

WHERE WE ARE TODAY


Validation


The statistics are sound, but we need to field test the
intervention plan to validate the model scores


What we learned


The strongest parameters are the most obvious (assignments)


Weak parameters: gender, age, weekly attendance


Add future parameters as available


Class activity, participation, faculty alerts, inactive time
between courses, interaction with faculty, orientation
participation, late assignments


THANK YOU!

Mike Sharkey

mike.sharkey@phoenix.edu


602-557-3532


5 CHALLENGES IN BUILDING &
DEPLOYING

LEARNING
ANALYTICS
SOLUTIONS

Christopher Brooks (cab938@mail.usask.ca)

MY BIASES


A domain of higher education


Scalable and broad solutions


The grey areas between research and
production

QUESTION: YOUR BIASES: WHAT DO YOU THINK THE PRINCIPAL GOAL OF LEARNING ANALYTICS SHOULD BE?


Enabling human intervention


Computer assisted instruction (dynamic content
recommendation, tutoring, quizzing)


Conducting educational research


Administrative intelligence, transparency,
competitiveness


Other (write in chat)

CHALLENGE 1: WHAT ARE YOU BUILDING?


Exploring data


Intuition and domain expertise are useful


Multiple perspectives from people familiar with the data


More data types (diversity) are better; smaller datasets (fewer instances) are ok


Imprecision in data is ok


Visualization techniques


Answering a question


Data should be cleaned and rigorous, with error recognized
explicitly


The quantity of data in the datasets (instances) strengthens the
result


Decision makers must guide the process (are the questions
worth answering?)


Statistical techniques

CASE 1: HOW HEALTHY IS YOUR CLASSROOM COMMUNITY? (SNA)
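The detail of this case is graphical (a social network diagram), so as an illustration only, here is the kind of "community health" indicator that can be computed from a who-replied-to-whom network; the edge list and metric choices are assumptions, not necessarily what this case study used.

```python
# Illustrative only: community-health indicators from a who-replied-to-whom
# network; the edge data and metric choices are assumptions.
import networkx as nx

replies = [("alice", "bob"), ("bob", "alice"), ("carol", "bob"), ("dave", "alice")]
G = nx.DiGraph(replies)

density = nx.density(G)                       # how interconnected the class is
in_degree = dict(G.in_degree())               # who receives replies
isolated = [n for n, d in G.degree() if d == 0]
reciprocity = nx.reciprocity(G)               # share of mutual exchanges

print(f"density={density:.2f}, reciprocity={reciprocity:.2f}, isolated={isolated}")
```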

CASE 2: APPLYING UNSUPERVISED LEARNING TECHNIQUES (CLUSTERING)

RESULTS VALIDATED, QUANTIFIED, AND
ENCOURAGED MORE INVESTIGATION


Hypotheses


H1: There will be a group of minimal activity learners...


H2: There will be a group of high activity learners...


H3: There will be a group of disillusioned learners...


H4: There will be a group of deferred learners...
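The case grouped learners by their activity patterns and then interpreted the groups against hypotheses like these. A minimal clustering sketch, assuming invented activity features and k = 4 to mirror the four hypotheses (not the study's actual features or settings):

```python
# Minimal k-means sketch of grouping learners by activity; feature values and
# k=4 (matching the four hypothesised groups) are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# rows: learners; columns: logins per week, videos watched, forum posts
activity = np.array([[1, 0, 0], [12, 30, 8], [10, 25, 1], [2, 15, 0], [11, 28, 7]])

X = StandardScaler().fit_transform(activity)        # put features on one scale
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster id per learner, to be interpreted against H1-H4
```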

CHALLENGE 2: WHAT TO COLLECT


Too much versus too little


Make a choice based on end goals


Think in terms of events instead of the “click stream” (see the sketch after this list)


Collecting “everything” comes with upfront
development costs and analysis costs


The risk is the project never gets off the ground


Make hypotheses explicit in your team so they can decide
how best to collect that data


Follow agile software development techniques
(iterate & get constant feedback)


Build institutional will with small targeted gains
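One lightweight way to "think in terms of events" is to define an explicit event record up front rather than logging raw clicks; the field names below are illustrative assumptions, not a prescribed schema.

```python
# Sketch of an explicit learning event record, rather than raw click-stream
# rows; the field names here are illustrative assumptions.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class LearningEvent:
    actor: str        # who (student id)
    verb: str         # what happened, e.g. "submitted", "viewed"
    obj: str          # the thing acted on, e.g. "assignment:essay1"
    source: str       # originating system, e.g. "lms", "clickers"
    timestamp: str    # ISO 8601, UTC

event = LearningEvent(
    actor="s123", verb="submitted", obj="assignment:essay1",
    source="lms", timestamp=datetime.now(timezone.utc).isoformat(),
)
print(asdict(event))
```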

CHALLENGE 3: UNDERSTAND YOUR USER

Breadth of Context

Administrator

Rates for degree completion, retention rate, re-enrolment rate, number of active students...


(Abbreviated statistics)

Instructional Design/Researcher

Educational research: what works and what doesn't; how tools and processes should change...


(Sophisticated statistics & visualizations)

Instructor

Evaluation of students, of a cohort of students, and

identifying immediate remediation...


(Visualization, Abbreviated statistics)

Student

Evaluation, evaluation, evaluation....


(Visualization)

WITH GREAT POWER COMES GREAT
RESPONSIBILITY....


Some potential abuses of student tracking data


Changing pedagogical technique to the detriment of some
students


Denying help to those who “aren't really trying”


A failure of instructors to acknowledge the challenges that
face students


Is it ethical to give instructors access to student analytics data?


Yes


No


Sometimes

(write your thoughts in the chat)


CHALLENGE 4: ACKNOWLEDGE CAVEATS


Analytics shows you only part of the picture


Dead-tree learning, in-person social constructivism, shoulder surfing/account sharing


Anonymization tools, JavaScript/Flash blockers


False positives (incorrect Amazon recommendations)


Misleading actions (incorrect self-assessment, or gaming the system (Baker))


Solutions


Aggregation & anonymization


Make error values explicit


Use broad categories for actionable analytics
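A tiny sketch of the last two solution points: map a model's probability onto broad, actionable categories while keeping the uncertain middle explicit. The thresholds here are arbitrary placeholders, not recommended values.

```python
# Sketch: report broad categories instead of raw scores, and make the
# uncertain band explicit; the thresholds are illustrative assumptions.
def to_category(p_success: float) -> str:
    if p_success >= 0.8:
        return "on track"
    if p_success <= 0.3:
        return "needs outreach"
    return "uncertain: gather more evidence"

print([to_category(p) for p in (0.92, 0.55, 0.12)])
```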

DOES LEARNER MODELLING OFFER
SOLUTIONS?


The learner modelling community blends with analytics.


Open learner modelling (students can see their completed
model)


Scrutable learner modelling (students can see how the system's model of them is formed)

Question: I believe the student should have the right to view where analytics data about them has come from and to whom it has been made available.


Yes


No


Sometimes

(and what are the implications on doing this? write in chat)

CHALLENGE 5: CROSS APPLICATION
BOUNDARIES


Data from different applications (clickers, LCMS, lecture capture, SIS/CIS, publisher quizzes, etc.) doesn't play well together


Requires cleaning


Requires normalizing on semantics


Requires access


Data warehousing activities


Is there a light on the horizon?

http://www.flickr.com/photos/malikdhadha/5105818154/
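As a sketch of the normalization step, records from two different systems can be mapped onto one shared event shape before warehousing; the source field names below are invented and will differ per vendor.

```python
# Sketch of semantic normalization across applications; source field names
# are invented for illustration and will differ per system.
def from_clicker(row: dict) -> dict:
    return {"actor": row["student_number"], "verb": "answered",
            "obj": f"clicker:{row['question_id']}", "source": "clickers",
            "timestamp": row["ts"]}

def from_lms(row: dict) -> dict:
    return {"actor": row["user_id"], "verb": row["action"],
            "obj": f"lms:{row['item']}", "source": "lms",
            "timestamp": row["event_time"]}

records = [
    from_clicker({"student_number": "s123", "question_id": "q7",
                  "ts": "2012-01-09T10:00:00Z"}),
    from_lms({"user_id": "s123", "action": "viewed", "item": "lecture4",
              "event_time": "2012-01-09T11:30:00Z"}),
]
print(records)
```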

QUICK CONCLUSIONS


Thus far I've learned it's
important to:


Know your goals


Know your user


Capture what you know you
need and don't worry about
the rest


Acknowledge limitations of
your approach


Iterate, iterate, iterate


Christopher Brooks

Department of Computer Science

University of Saskatchewan

cab938@mail.usask.ca


LEARNING ANALYTICS FOR C21
DISPOSITIONS & SKILLS

Simon Buckingham Shum

Knowledge Media Institute, Open U. UK

simon.buckinghamshum.net

@sbskmi


L.A. FRAMEWORK TO THINK WITH…

Discipline knowledge
C21 Learning Capacities

Educator owns and manages a single dataset
Educator owns and manages multiple datasets
Learners add their own datasets
Hybrid closed + open datasets
Hybrid closed + open analytics

Discipline knowledge: focus of most LA effort, beginning to move towards these more complex spaces

C21 Learning Capacities: critical for learner engagement and authentic learning

http://solaresearch.org/OpenLearningAnalytics.pdf

“We are preparing students for jobs that do
not exist yet, that will use technologies that
have not been invented yet, in order to
solve problems that are not even problems
yet.”



Shift Happens


http://shifthappens.wikispaces.com


LEARNING ANALYTICS FOR THIS?


“The test of successful education is
not the amount of knowledge that
pupils take away from school, but
their appetite to know and their
capacity to learn.”


Sir Richard Livingstone, 1941

ANALYTICS FOR…

C21 SKILLS?

LEARNING HOW TO LEARN?

AUTHENTIC ENQUIRY?

social capital, critical questioning, argumentation, citizenship, habits of mind, resilience, collaboration, creativity, metacognition, identity, readiness, sensemaking, engagement, motivation, emotional intelligence


L.A. FRAMEWORK TO THINK WITH…

(Same framework as above.)

Discipline knowledge: focus of most LA effort, beginning to move towards these more complex spaces

C21 Learning Capacities: more LA effort needed, e.g.
1. Disposition Analytics
2. Discourse Analytics

ANALYTICS FOR LEARNING DISPOSITIONS

ELLI: EFFECTIVE LIFELONG LEARNING INVENTORY

Web questionnaire, 72 items (children and adult versions; used in schools, universities, and the workplace)

Buckingham Shum, S. and Deakin Crick, R. (2012). Learning Dispositions and Transferable Competencies: Pedagogy, Modelling, and Learning Analytics. Accepted to 2nd International Conference on Learning Analytics & Knowledge (Vancouver, 29 Apr - 2 May 2012).

VALIDATED AS LOADING ONTO 7 DIMENSIONS OF “LEARNING POWER”

Changing & Learning vs. Being Stuck & Static
Meaning Making vs. Data Accumulation
Critical Curiosity vs. Passivity
Creativity vs. Being Rule Bound
Learning Relationships vs. Isolation & Dependence
Strategic Awareness vs. Being Robotic
Resilience vs. Fragility & Dependence

ELLI GENERATES A 7-DIMENSIONAL SPIDER DIAGRAM OF HOW LEARNERS SEE THEMSELVES

Bristol and the Open University are now embedding ELLI in learning software.

The diagram is the basis for a mentored discussion on how learners see themselves, and on strategies for strengthening the profile.

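For illustration only, a spider (radar) plot of the seven dimensions can be drawn with matplotlib; the scores below are invented and this is not the ELLI platform's own rendering.

```python
# Minimal radar plot of the seven ELLI dimensions; scores are invented
# and this is not the ELLI platform's own visualisation.
import numpy as np
import matplotlib.pyplot as plt

dims = ["Changing & Learning", "Meaning Making", "Critical Curiosity",
        "Creativity", "Learning Relationships", "Strategic Awareness", "Resilience"]
scores = [0.7, 0.5, 0.8, 0.4, 0.6, 0.3, 0.55]           # invented self-report values

angles = np.linspace(0, 2 * np.pi, len(dims), endpoint=False).tolist()
angles += angles[:1]                                     # close the polygon
values = scores + scores[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.2)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(dims, fontsize=8)
ax.set_ylim(0, 1)
plt.savefig("elli_spider.png", bbox_inches="tight")
```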

ADDING IMAGERY TO ELLI DIMENSIONS TO
CONNECT WITH LEARNER IDENTITY

Milhouse

ELLI GENERATES COHORT DATA FOR EACH
DIMENSION

…DRILLING DOWN ON A SPECIFIC DIMENSION

Plugin visualizes
blog categories,
mirroring the ELLI
spider

ENQUIRYBLOGGER: TUNING WORDPRESS AS AN ELLI-BASED LEARNING JOURNAL

Standard WordPress editor

Categories from ELLI

ENQUIRYBLOGGER: COHORT DASHBOARD

LEARNINGEMERGENCE.NET: more on analytics for learning to learn and authentic enquiry

ANALYTICS FOR LEARNING CONVERSATIONS

DISCOURSE LEARNING ANALYTICS

Effective learning conversations display some
typical characteristics which learners can and
should be helped to master

Learners’ written, online conversations can be analysed computationally for patterns signifying weaker and stronger forms of contribution.

SOCIO-CULTURAL DISCOURSE ANALYSIS (MERCER ET AL., OU)


Disputational talk, characterised by disagreement and individualised decision making.

Cumulative talk, in which speakers build positively but uncritically on what the others have said.

Exploratory talk, in which partners engage critically but constructively with each other's ideas:
Statements and suggestions are offered for joint consideration.
These may be challenged and counter-challenged, but challenges are justified and alternative hypotheses are offered.
Partners all actively participate, and opinions are sought and considered before decisions are jointly made.
Compared with the other two types, in exploratory talk knowledge is made more publicly accountable and reasoning is more visible in the talk.

Mercer, N. (2004). Sociocultural discourse analysis: analysing classroom talk as a social mode of thinking. Journal of Applied Linguistics, 1(2), 137-168.

ANALYTICS FOR IDENTIFYING EXPLORATORY TALK

Elluminate sessions can be very long, lasting for hours or even covering days of a conference.

It would be useful if we could identify where quality learning conversations seem to be taking place, so we can recommend those sessions and not have to sit through online chat about virtual biscuits.

Ferguson, R. and Buckingham Shum, S. Learning Analytics to Identify Exploratory Dialogue within Synchronous Text Chat. 1st International Conference on Learning Analytics & Knowledge (Banff, Canada, 27 Mar - 1 Apr 2011).

De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M. and Cannavacciuolo, L. Discourse-Centric Learning Analytics. 1st International Conference on Learning Analytics & Knowledge (Banff, 27 Mar - 1 Apr 2011).
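The cited approach flags exploratory dialogue in chat using indicator phrases. The toy sketch below uses a small assumed cue list, not the papers' actual classifier or phrase set.

```python
# Toy sketch of flagging chat turns that look like exploratory dialogue via
# cue phrases; this cue list is a small assumed sample, not the papers' set.
EXPLORATORY_CUES = ["because", "i think", "what if", "do you agree",
                    "my view is", "on the other hand", "why do you"]

def looks_exploratory(turn: str) -> bool:
    text = turn.lower()
    return any(cue in text for cue in EXPLORATORY_CUES)

chat = ["lol virtual biscuits",
        "I think this works because the cohort is small",
        "what if we compared week 1 and week 5?"]
flagged = [turn for turn in chat if looks_exploratory(turn)]
print(flagged)
```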

KMI’S COHERE: A WEB DELIBERATION PLATFORM ENABLING SEMANTIC SOCIAL NETWORK AND DISCOURSE NETWORK ANALYTICS

Rebecca is playing the role of broker, connecting two peers’ contributions in meaningful ways.

DISCOURSE ANALYSIS

BACKGROUND KNOWLEDGE:

Recent studies indicate …

… the previously proposed …

… is universally accepted ...





NOVELTY:

... new insights provide direct evidence ...

... we suggest a new ... approach ...

... results define a novel role ...







OPEN QUESTION:

… little is known …

… role … has been elusive

Current data is insufficient …

GENERALIZING:

... emerging as a promising approach

Our understanding ... has grown
exponentially ...

... growing recognition of the

importance ...

CONTRASTING IDEAS:

… unorthodox view resolves …
paradoxes …

In contrast with previous hypotheses
...

... inconsistent with past findings ...

SIGNIFICANCE:

studies ... have provided important
advances

Knowledge ... is crucial for ...
understanding

valuable information ... from studies

SURPRISE:

We have recently observed ...
surprisingly

We have identified ... unusual

The recent discovery ... suggests
intriguing roles

SUMMARIZING:

The goal of this study ...

Here, we show ...

Altogether, our results ... indicate

Xerox’s parser can detect the presence of ‘knowledge-level’ moves in text:

Ágnes Sándor & OLnet Project:

http://olnet.org/node/512

De Liddo, A., Sándor, Á. and Buckingham Shum, S. (In Press). Contested Collective Intelligence: Rationale, Technologies, and a Human-Machine Annotation Study. Computer Supported Cooperative Work Journal.
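Xerox's parser performs deep linguistic analysis; as a much simpler stand-in, a pattern-matching sketch can tag sentences with some of the move categories above. The patterns are invented from the example phrases and are not the actual parser's rules.

```python
# Pattern-matching stand-in for knowledge-level move detection; the real
# parser does far richer analysis. Patterns are invented from the example
# phrases above.
import re

MOVES = {
    "NOVELTY": r"\bnew insights?\b|\bnovel\b|\bnew\b.*\bapproach\b",
    "OPEN QUESTION": r"\blittle is known\b|\bhas been elusive\b|\binsufficient\b",
    "CONTRASTING IDEAS": r"\bin contrast\b|\binconsistent with\b|\bunorthodox\b",
    "SURPRISE": r"\bsurprising(ly)?\b|\bunusual\b|\bintriguing\b",
    "SUMMARIZING": r"\bhere,? we show\b|\bgoal of this study\b|\baltogether\b",
}

def tag_moves(sentence: str):
    """Return the move categories whose patterns match the sentence."""
    return [move for move, pattern in MOVES.items()
            if re.search(pattern, sentence, re.IGNORECASE)]

print(tag_moves("In contrast with previous hypotheses, we suggest a new modelling approach."))
```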


NEXT STEPS

SOCIAL LEARNING ANALYTICS: develop this framework to integrate social, discourse, disposition and other process-centric analytics

DISPOSITION ANALYTICS: extend the capabilities of the ELLI ‘learning power’ platform using real-time analytics data from online learner activity

DISCOURSE ANALYTICS: human + machine annotation of written discourse and argument maps

IN MORE DETAIL…

Social Learning Analytics

Buckingham Shum, S. and Ferguson, R. (2011). Social Learning Analytics. Technical Report KMI-11-01, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/publications/techreport/kmi-11-01

Discourse Analytics

De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M. and Cannavacciuolo, L. (2011). Discourse-Centric Learning Analytics. 1st International Conference on Learning Analytics & Knowledge (Banff, 27 Mar - 1 Apr 2011). Eprint: http://oro.open.ac.uk/25829

Ferguson, R. and Buckingham Shum, S. (2011). Learning Analytics to Identify Exploratory Dialogue Within Synchronous Text Chat. 1st International Conference on Learning Analytics & Knowledge (Banff, Canada, 27 Mar - 1 Apr 2011). Eprint: http://oro.open.ac.uk/28955

De Liddo, A., Sándor, Á. and Buckingham Shum, S. (2012, In Press). Contested Collective Intelligence: Rationale, Technologies, and a Human-Machine Annotation Study. Computer Supported Cooperative Work. DOI: 10.1007/s10606-011-9155-x. http://www.springerlink.com/content/23n1408l9g06v062

Disposition Analytics

Ferguson, R., Buckingham Shum, S. and Deakin Crick, R. (2011). EnquiryBlogger: Using Widgets to Support Awareness and Reflection in a PLE Setting. 1st Workshop on Awareness and Reflection in Personal Learning Environments, PLE Conference 2011, 11-13 July 2011, Southampton, UK. Eprint: http://oro.open.ac.uk/30598

Buckingham Shum, S. and Deakin Crick, R. (2012). Learning Dispositions and Transferable Competencies: Pedagogy, Modelling, and Learning Analytics. Accepted to 2nd International Conference on Learning Analytics & Knowledge (Vancouver, 29 Apr - 2 May 2012). Working draft under revision: http://projects.kmi.open.ac.uk/hyperdiscourse/docs/SBS-RDC-review.pdf


SUMMARY

(Same framework as above.)

Discipline knowledge: the focus of most LA effort; mastery of core knowledge and skills in training is vital, but no longer sufficient.

C21 Learning Capacities: more LA effort needed; we need analytics tuned to generic capacities which equip learners for novel challenges.