Learning Analytics: Tool Matrix



David Dornan





Matrix columns: Tool (URL) | Description | Opportunities in Learning Analytics Solutions | Weaknesses/Concerns/Comments

Data

One of the biggest hurdles in developing learning analytics tools is developing the data governance and privacy policy related to accessing student data. The two initiatives in this section offer frameworks for opening access to student attention/learning data. The first initiative provides a start toward developing data collection standards, and the second provides inspiration on how and why it is not only feasible to deliver free open courses, but also makes sense in terms of providing a community-based research environment to explore, develop and test learning theories and learning feedback mechanisms/tools.

PSLC (Pittsburgh Science of Learning Center) DataShop

Description: The PSLC DataShop is a repository containing course data from a variety of math, science, and language courses.

Opportunities: Data Standards. Initiatives like PSLC will help the learning analytics community develop standards for collecting, anonymizing and sharing student-level course data.

Weaknesses/Concerns: Convincing individual institutions to contribute to this type of data repository may be difficult, given that many institutions do not have data governance/sharing policies to share this type of information internally.

Open Learning Initiative

Description: This is an exciting initiative taking place at Carnegie Mellon University. Students' interaction with free online course material/activities provides a virtual learning analytics laboratory for experimenting with algorithms and feedback mechanisms.

Opportunities: From Solo Sport to Community-Based Research Activity. Herbert Simon of Carnegie Mellon University states that "Improvement in Post Secondary Education will require converting teaching from a 'solo sport' to a community based research activity."

Weaknesses/Concerns: There are often two concerns related to conducting experimentation using learning analytics:

1. Privacy concerns related to accessing student-related data.
2. Ethical concerns related to testing different feedback/instructional response mechanisms.

By offering free courses to students with full disclosure of how their interactions will be tracked and analyzed, these two issues are no longer roadblocks to conducting learning analytics research. As learning materials/objects become commodities, the development of learning analytics tools that help guide and direct students will become what is valued, and this requires that institutions build expertise in developing and sustaining the communities required to conduct community-based learning research.




Database Storage

The majority of current learning analytics initiatives are handled adequately using relational databases. However, as learning analytics programs begin to make use of the semantic web and social media tools, there will be a need to start exploring data storage technology that can handle large unstructured data sets. This section provides a brief description of the data storage required for LA programs.

Relational Database

Description: For years we have used relational databases to structure the data required for our analyses. Data is stored in tables consisting of rows and columns. The columns are well-defined attributes pertaining to an object represented by a table. There are good open source relational databases such as Greenplum and MySQL. However, most universities have standard supported RDBMS offerings. At the University of Guelph we support both SQL Server and Oracle's RDBMS.

Comments: Oracle provides a secure repository for structured data. The recent release of 11g also provides integration with the R engine, permitting R to access data stored in the database.
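To make the database-to-R connection concrete, here is a minimal sketch of R pulling course data straight from an institutional RDBMS using the ROracle DBI driver; the connection details and the GRADES table with its columns are illustrative assumptions, not part of any system described above.

    library(DBI)
    library(ROracle)                    # Oracle driver for DBI
    con <- dbConnect(dbDriver("Oracle"),
                     username = "la_user", password = "secret",
                     dbname   = "//dbhost:1521/XE")   # connection details assumed
    # hypothetical GRADES table: one row per student per course
    grades <- dbGetQuery(con, "SELECT student_id, logins, grade FROM grades")
    summary(lm(grade ~ logins, data = grades))  # toy model on the extracted data
    dbDisconnect(con)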


NoSQL Database / Hadoop / MapReduce

Description: Hadoop is an Apache project inspired by Google's MapReduce and the Google File System. It has become a standard for distributing large unstructured data sets. It provides a framework that can distribute a large data set over a number of servers and can provide intermediate results as data flows through the framework's pipeline.

Opportunities: As learning analytics programs begin to make use of the semantic web and social media tools, there will be a need to start exploring data storage technology that can handle large unstructured data.

Weaknesses/Concerns: Universities have good relational database infrastructures, including expertise. As LA programs grow to include analysis of unstructured data, universities will need to develop the skill and capacity to offer Hadoop data storage and retrieval services.
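As a concrete illustration of the map/reduce pipeline, Hadoop's Streaming interface lets plain scripts act as the map and reduce steps. Below is a minimal word-count sketch with both steps written in R; the file names and job invocation are assumptions for illustration, not part of any specific LA deployment.

    ## mapper.R -- emit "word<TAB>1" for each word read from stdin
    con <- file("stdin", open = "r")
    while (length(line <- readLines(con, n = 1)) > 0) {
      words <- unlist(strsplit(tolower(line), "[^a-z]+"))
      for (w in words[nchar(words) > 0]) cat(w, "\t1\n", sep = "")
    }

    ## reducer.R -- Hadoop sorts mapper output by key, so equal words arrive together
    con <- file("stdin", open = "r"); word <- NULL; n <- 0
    while (length(line <- readLines(con, n = 1)) > 0) {
      kv <- strsplit(line, "\t")[[1]]
      if (identical(kv[1], word)) { n <- n + as.integer(kv[2]) } else {
        if (!is.null(word)) cat(word, "\t", n, "\n", sep = "")
        word <- kv[1]; n <- as.integer(kv[2])
      }
    }
    if (!is.null(word)) cat(word, "\t", n, "\n", sep = "")

    ## illustrative job submission (paths assumed):
    ##   hadoop jar hadoop-streaming.jar -input forum_posts -output word_counts \
    ##     -mapper mapper.R -reducer reducer.R -file mapper.R -file reducer.R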




EC2

Description: There are a number of companies that lease access to processing via virtual servers. Amazon's EC2 is a common cloud server option available to host applications.

Opportunities: It is becoming common for organizations to look at moving applications to the cloud. For many of the traditional services, like the RDBMS, there is resistance to cloud-based deployments. This resistance is primarily due to privacy concerns and resistance to change. As LA programs require access to new technologies such as Hadoop and require infrequent, massive analytical cycles, there may be an opportunity to introduce cloud-based offerings such as EC2.

Comments: The first assignment for this course (the development of a LA tool) provided me an opportunity to deploy an application using EC2. EC2 is a great way to explore new technologies: if mistakes are made, one simply redeploys a new EC2 instance. There are many publicly available instances that save time in deploying complete environments. In developing my LA tool, I deployed an Oracle XE instance (which required virtually no effort) and another RedHat instance where I installed RevoDeployR. Since RevoDeployR was a new tool for me, I had to start over several times before completing a successful installation. It is possible to create backup images in EC2; however, it was not as intuitive as creating a new instance.

Data Cleansing/Integration

Prior to conducting data analysis and presenting it through visualizations, data must be acquired (extracted), integrated, cleansed and stored in an appropriate data structure. The tools that perform these tasks are commonly referred to as ETL tools. Given the need for both structured and unstructured data (as described in the above section), the ideal ETL tools will be able to access and load data to and from data sources including RSS feeds, API calls, RDBMS and unstructured data stores such as Hadoop.




Needlebase

Description: Needlebase is a web-based web-scraping tool that provides an easy-to-use interface to acquire, integrate and cleanse web-based data. As a user navigates a website, tagging page elements of interest, Needlebase detects the underlying database structure and web navigation and automates the collection of the underlying data into a table.

Opportunities: Needlebase is a great tool for accessing a website's underlying data when direct access to the data is not easily available. I have used Needlebase to create a lookup table for archived National Occupation Codes and to create a lookup table for our undergraduate course calendar.

Weaknesses/Concerns: There is no API access to the Needlebase scripts that are created. It seems best for one-off extracts or for applications where the entire dataset is acquired using Needlebase tools. It does not seem all that useful for an integrated solution. One other restriction that I ran across using this tool was that it did not support accessing websites requiring authentication.

Pentaho Integration

Description: Pentaho Data Integration (PDI) is a powerful, easy-to-learn open source ETL tool that supports acquiring data from a variety of data sources including flat files, relational databases, Hadoop databases, RSS feeds, and RESTful API calls. It can also be used to cleanse and output data to the same list of data sources.

Opportunities: PDI provides a versatile ETL tool that can grow with the evolution of an institution's learning analytics program. For example, initially a LA program may start with institutional data that is easily accessible via institutional relational databases. As the program grows to include text mining and recommendation systems that require extracting unstructured data from outside the institution, the skills developed with PDI will accommodate the new sources of data collection and cleansing.

Weaknesses/Concerns: There are two concerns that I have with PDI:

1. Pentaho does not have built-in integration with R statistics. Instead, Pentaho's data mining integration focuses on a WEKA module.
2. Pentaho is moving away from the open source model. Originally PDI was an open source ETL tool called Kettle, developed by Matt Casters. Since Pentaho acquired Kettle (and Matt Casters), it has become a central piece of their subscription-based BI Suite, and the support costs are growing at a rapid pace. Twice I have budgeted for support on this product, only to find that the support costs had more than doubled year over year.

Talend

Description: Talend is another open source ETL tool that has many of the same features as PDI. The main differences between PDI and Talend are presented in the following blog post: http://churriwifi.wordpress.com/2010/06/01/comparing-talend-open-studio-and-pentaho-data-integration-kettle/

Opportunities: Talend has the same strengths as described above, with the additional benefit of having built-in integration with R.

Comments: The main difference from my perspective is that Talend is a code generator whereas PDI is not. I have also found PDI a much easier tool to learn and use.


Yahoo Pipes

Description: Yahoo provides this free web-based GUI tool that allows users to extract web-based data and create data streams that will cleanse, filter or enhance the data prior to outputting it via an RSS feed.

Opportunities: Since PDI and Talend seem able to provide the same abilities as Yahoo Pipes, I did not spend a great deal of time exploring it. However, it seems to me that Yahoo Pipes could provide the web-scraping functionality that Needlebase provides, yet offer an RSS feed output that could be picked up by either Talend or Pentaho in order to schedule nightly loads. It might be a more efficient way to pass web-based data streams through various APIs prior to extraction using PDI.

Weaknesses/Concerns: The one concern that I have with Yahoo Pipes is that some of the unstructured data that will require analysis in a LA system will be posts by students. If a free public service like Yahoo Pipes is being used to stream data through various analytic APIs, we will potentially release personal student data.

Statistical Modeling

There are three major statistical software options: SAS, SPSS and R. All three are excellent for developing the analytic/predictive models that are useful in learning analytics. This section focuses on R. The open source project R has numerous packages and commercial add-ons available that position it well to grow with any LA program. Given that many researchers are proficient in R, incorporating the R engine into a LA platform also offers an opportunity to engage faculty in the development of reusable models/algorithms.




R

Description: R is an active open source project that has numerous packages available to perform any type of statistical modeling.

Opportunities: R's strength is the fact that it is widely used by the research community. Code for analysis is widely available, and there are many packages to help with any type of analysis and presentation that might be of interest. Some of these include:

1) Visualization:
   a) ggplot provides good charting functionality.
   b) googleVis provides an interface between R and the Google Visualization API.

2) Text Mining (see the sketch after this list):
   a) tm provides functions for manipulating text, including stripping whitespace and stop words and removing suffixes (stemming).
   b) openNLP identifies words as nouns, verbs, adjectives or adverbs.
   c) wordnet provides access to the WordNet library. This is often used to replace similar words with a common word prior to text analysis.
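The following minimal sketch applies the tm functions named above to a toy pair of forum posts; the text is invented, and stemming additionally assumes the SnowballC package is installed.

    library(tm)
    posts <- c("Students were struggling with the week three assignment.",
               "The week three readings helped the struggling students.")
    corpus <- Corpus(VectorSource(posts))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop stop words
    corpus <- tm_map(corpus, stripWhitespace)
    corpus <- tm_map(corpus, stemDocument)        # e.g. "struggling" -> "struggl"
    dtm <- DocumentTermMatrix(corpus)             # terms-by-post matrix
    inspect(dtm)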

Weaknesses/Concerns: Although I really like R, there are two issues that may be of concern to some universities:

1) Lack of support: only Revolution R provides support for the R product.
2) The high level of expertise required to develop and maintain R: how does a university retain people that have the skills required to develop and maintain R/RevoDeployR? However, since many faculty and students are proficient with R, perhaps building a platform similar to Datameer (see below) would allow R code to be community sourced, allowing the majority of faculty and students to easily access and build their own learning dashboards.




Here are a few articles that show the power of using a few of these text mining packages:

1. Creating a wordle using tm and ggplot: http://www.r-bloggers.com/building-a-better-word-cloud/
2. An overview of conducting text analysis using R: http://www.jstatsoft.org/v25/i05/paper

Oracle has also integrated R into its 11g RDBMS, allowing R models direct access to RDBMS data.




Revolution R

Offerings include:
- RevoDeployR
- RevoConnectR
- Integration with IBM Netezza

Description: Revolution R provides support for the open source R engine and provides add-ons to enhance the integration and use of R within databases and websites. RevoDeployR is a server-based platform that provides access to the R engine via a RESTful API. RevoConnectR allows use of Hadoop-stored data by the R engine. Revolution R also provides integration with IBM Netezza data warehouse appliances, providing a scalable infrastructure for analyzing very large datasets.

Opportunities: Revolution R is the only commercial support offering for R. Revolution R will be useful for institutions that have procurement or risk management policies that restrict the use of open source products.

Comments: Revolution R tools are free for research purposes, and their support contracts or licenses for institutional purposes (i.e. learning analytics and dashboards) are very reasonable; I was quoted $4,000/core for the RevoDeployR product. The support that I received using RevoDeployR was very slow. However, I am not a supported customer.

rApache

Description: This is an open source Apache module named mod_R that embeds the R statistical engine inside the web server.
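A minimal sketch of what an R script served by mod_R might look like; setContentType is an rApache-provided helper, while the risk-score values and the URL mapping (configured separately in httpd.conf) are invented for illustration.

    # rApache evaluates this script on each request to the mapped URL
    setContentType("application/json")            # rApache helper
    risk <- list(student = "A123", risk = 0.27)   # stand-in for a real model call
    cat(sprintf('{"student": "%s", "risk": %.2f}', risk$student, risk$risk))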






Zementis ADAPA

Description: Zementis offers a PMML-based scoring engine which can be deployed on-site, within a Greenplum database, within an Excel spreadsheet, or consumed as a web service using Zementis' Amazon cloud-based service. By using the PMML (Predictive Model Markup Language) standard, ADAPA can easily leverage predictive models developed in the major statistical software packages including R, SAS and SPSS (see the sketch after this entry). It can quickly provide scoring based on any of the following modeling techniques:

- Support Vector Machines
- Naive Bayes Classifiers
- Ruleset Models
- Clustering Models
- Decision Trees
- Regression Models
- Scorecards
- Association Rules
- Neural Networks

Opportunities: ADAPA allows for easy consumption of predictive scores into a student or faculty web-based learning dashboard. The cloud-based service, starting at only $0.99/hr, requires only a $2,000/semester investment. I tried using the API to create a Purdue-like dashboard in the LA tool, but I did not have time to get it working properly.

Comments: Zementis has partnered with Revolution to create their web-based subscription service using RevoDeployR. So if RevoDeployR is part of your LA architecture, it could provide the same functionality using your in-house service.
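For a sense of how the PMML hand-off works, here is a minimal sketch that fits a toy at-risk model in R and writes it out as a PMML document that a scoring engine such as ADAPA could import; the data frame and variable names are invented.

    library(pmml)   # converts fitted R models to PMML (attaches the XML package)
    risk <- data.frame(logins  = c(2, 25, 4, 30, 3, 28),
                       posts   = c(0, 12, 1, 9, 2, 14),
                       at_risk = factor(c(1, 0, 1, 0, 1, 0)))
    model <- glm(at_risk ~ logins + posts, data = risk, family = binomial)
    saveXML(pmml(model), file = "at_risk_model.pmml")  # document for upload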

Network Analysis

Network analysis focuses on the relationships between entities. Whether the entities are students, researchers, learning objects or ideas, network analysis attempts to understand how the entities are connected rather than understand the attributes of the entities. Measures include density, centrality, connectivity, betweenness and degrees. This is an important area to explore as we take up Herbert Simon's (from Carnegie Mellon University) challenge and nudge learning and teaching 'from a solo sport to a community based research activity'. Network analysis can not only help us identify patterns that reveal disconnected students or help predict success based on network metrics; these tools can also help students develop the networking skills that will be required for successful lifelong learning and research.

SNAPP

Description: Social Networks Adapting Pedagogical Practice (SNAPP) is a network visualization tool that is delivered as a 'bookmarklet'. Users can easily create network visualizations from LMS forums in real time.

Opportunities: Self-Assessment Tool for Students. SNAPP provides students with easy access to network visualizations of forum postings. These diagrams can help students understand their contribution to class discussions.

Identify At-Risk Students / Monitor Impact of Learning Activity. Network analysis visualizations can help faculty identify students that may be isolated. They can also be used to see whether specific activities have impacted the class network.




NodeXL

Description: NodeXL is an Excel add-on that creates network visualizations from a worksheet containing lists of edges. The tool provides the ability to calculate common network measures such as density, centrality, connectivity, betweenness and degrees. Data can be exported in a format that can be imported into Gephi for further analysis or refined visualization.

Opportunities: Sophisticated Network Analysis. Both NodeXL and Gephi can be used to explore network patterns. These tools are useful for researchers. It would be interesting to explore the relationship between these network metrics (e.g. centrality and betweenness) and student success.
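The same measures can also be computed in R with the igraph package, which may be handy when LMS data is already being pulled into R; this minimal sketch uses an invented edge list of forum replies.

    library(igraph)
    replies <- data.frame(from = c("amy", "ben", "ben", "cho", "dev"),
                          to   = c("ben", "cho", "amy", "amy", "amy"))
    g <- graph_from_data_frame(replies, directed = TRUE)
    edge_density(g)   # overall connectedness of the class discussion
    degree(g)         # interactions per student
    betweenness(g)    # students bridging otherwise separate groups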


Gephi

Description: Gephi offers a standalone product for analyzing networks. It is the most advanced of the three network analysis tools described in this section.


Cohere

Description: Simon Buckingham Shum and Anna De Liddo have developed an enhanced Diigo-like tagging/bookmarking tool that allows a user to link their contributions to other ideas and websites with descriptive adjectives.

Opportunities: Idea Creation. While this tool provides its creators with data that is useful for conducting their discourse analysis research, it also provides people/researchers with a tool that may help connect them to people that have related interests and ideas, and may help to stimulate new ideas and collaborations.


Other Tools for Analysis

ViralHeat

Description: ViralHeat provides a full-featured tool set and an API that helps monitor web content for specific mentions of people, products and services.

Opportunities: Monitor and Evaluate Course/Program Satisfaction. This relatively cheap analytics offering could help introduce the use of analytics by helping evaluate a recruitment drive/strategy or fundraising campaign.





WordNet

Description: Princeton University provides a lexical database that links English words (or sets of words) by their common meaning. It is essentially a database that helps identify synonyms.

Opportunities: Identify Main Concepts Found in a Learning Object / Forum Post. This lexical database is used in text analysis to replace similar words with one common descriptor.
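A minimal sketch of querying WordNet from R via the wordnet package mentioned earlier; it assumes a local WordNet installation, and the dictionary path shown is illustrative.

    library(wordnet)
    setDict("/usr/local/WordNet-3.0/dict")  # path to the WordNet database (assumed)
    synonyms("instructor", "NOUN")          # candidates for a common descriptor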


Leximancer

Description: Leximancer provides sophisticated text analysis and presentation of the concepts found in a learning object. The API can return interactive concept maps demonstrating how different ideas connect. The tool provides the ability to drill from the concept map down to the text that spawned it.

Opportunities: Identify Main Concepts Found in a Learning Object / Forum Post. Leximancer could be used to help consolidate the main ideas of a lecture or discussion group. It can also provide students with easy access to the detailed discussion and material related to a concept via a link from the concept map to the discussion forum posting.



Wolfram API

Description: The Wolfram Alpha API provides developers with the ability to submit free text/questions from a website to the Wolfram Alpha engine and have the results returned.

Opportunities: Dynamic Content Delivery. The Wolfram API could be used to provide supplemental material to online discussions.
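A minimal sketch of calling the Wolfram|Alpha query endpoint from R with the RCurl and XML packages; the AppID is a placeholder obtained from Wolfram, and the query text is illustrative.

    library(RCurl)
    library(XML)
    resp <- getForm("http://api.wolframalpha.com/v2/query",
                    input = "integrate x^2", appid = "YOUR-APPID")
    doc <- xmlParse(resp)
    xpathSApply(doc, "//plaintext", xmlValue)  # plain-text content of result pods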


Linked Data

If Tim Berners-Lee's vision of linked data (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html) is successful in transforming the internet into a huge database, the value of delivering content via courses and programs will diminish, and universities will need to find new ways of adding value to learning. Developing tools that can facilitate access to relevant content using linked data could be one way that universities remain relevant in the higher learning sector.



Ontologies (e.g. DBpedia)

Description: Ontologies are essentially an agreed-upon concept map for a particular domain of knowledge.

Opportunities: Dynamically Deliver Relevant Content. Using OpenCalais along with well-defined ontologies provides a mechanism for dynamically delivering/suggesting related readings.
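As a concrete example of consuming such an ontology, the R SPARQL package can query DBpedia's public endpoint for resources linked to a course topic; the query below is illustrative and uses full URIs to avoid prefix assumptions.

    library(SPARQL)
    q <- "SELECT ?related WHERE {
            <http://dbpedia.org/resource/Learning_analytics>
              <http://dbpedia.org/ontology/wikiPageWikiLink> ?related .
          } LIMIT 10"
    res <- SPARQL("http://dbpedia.org/sparql", q)
    res$results   # candidate related readings to suggest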


OpenCalais

Description: Reuters offers this free API that takes text input and returns tags that will link the concepts in the text to other linked data on the web.


Visualization

The presentation of the data after it has been extracted, cleansed and analyzed is critical to successfully engaging students in learning and in acting on the information presented.

Google Visualization APIs (http://code.google.com/apis/chart/)

Description: Google Visualization provides an API to Google's chart library, allowing for the creation of charts and other visualizations. They have recently released an API for adding interactive controls to their charts.

Opportunities: Interactive Learning Dashboards. All of these tools are useful for creating visualizations for learning feedback systems such as dashboards. The Motion Chart (purchased from Gapminder) is one of my favourite interactive charts that Google provides access to via the API. All of these tools can present data as heat maps, network analysis diagrams and tree maps. Here's a link to an example dashboard created in D3, presenting university admissions data: http://keminglabs.com/ukuni/

Weaknesses/Concerns: Learning how to use these tools/libraries requires a fair amount of effort. Developer retention is a risk for system maintenance and enhancement.
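From R, the googleVis package wraps this API; here is a minimal sketch that renders the Motion Chart mentioned above from an invented weekly-engagement data frame.

    library(googleVis)
    engagement <- data.frame(student = rep(c("amy", "ben"), each = 3),
                             week    = rep(1:3, times = 2),
                             posts   = c(1, 4, 6, 2, 2, 5),
                             logins  = c(3, 8, 9, 4, 5, 7))
    chart <- gvisMotionChart(engagement, idvar = "student", timevar = "week")
    plot(chart)   # opens the interactive chart in a browser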

Protovis (http://mbostock.github.com/protovis/) and D3 (http://mbostock.github.com/d3/)

Description: Protovis and D3 are JavaScript frameworks for creating web-based visualizations. Protovis is no longer an active open source project; it has been replaced by D3.


FusionCharts (http://www.fusioncharts.com/)

Description: FusionCharts provides a commercial JavaScript framework for creating dynamic visualizations.


Reporting Suites

Description: Many universities have reporting tools available to create visualizations. Tools include Tableau, Cognos, Pentaho and JasperReports.

Comments: All of these vendors provide good tools for creating reports and dashboards. My favourite is Tableau; however, JasperReports or Pentaho are much more affordable.


Full Analytics Offerings

LOCO

Description: Using LOCO (Learning Object Context Ontologies), students' online activities are mapped to specific learning objectives. The tool set provides faculty with feedback related to how well material has been understood; it also provides network visualizations describing student interaction. The tool provides a framework for describing online learning environments.

Opportunities: Faculty Feedback Related to Learning Success.








Datameer (http://www.datameer.com/)

Description: Datameer provides a full set of tools allowing users to conduct advanced analytics on Hadoop-based data.

Opportunities: Engage Faculty in Learning Analytics. I like Datameer's wizard-based approach to user-controlled analytics. It provides some ideas on how one could give faculty the ability to contribute or reuse predictive models, quickly test them on historic data, deploy a learning analytics algorithm and present the results in a learning dashboard.

Weaknesses/Concerns: This approach may be too complicated for delivery to the masses, as I suspect that the majority of faculty will want something that requires less effort.