PPT - Ateneo de Manila University

Dec 13, 2013


First-year computer science students lack programming comprehension


The failure rate in an introductory programming class in Australia is as high as 35%


30% of computer science students in the United
Kingdom and the United States did not
understand programming basics after their first
programming class


Students have a fragile grasp of programming and are unable to read, analyze, and trace through short fragments of code


Research is conducted to:


Understand the characteristics of novice programmers


Identify the causes of their problems


Find possible solutions



lack of a mental model


misconception of programming constructs


lack of programming strategies


lack or absence of debugging strategies


Prior to entering CS1


Gender


secondary school performance


dislike of programming


intrinsic motivation and comfort level


high school mathematics background


prior programming experience


attribution to luck for success/failure, and


perceived understanding of the material


Behaviors that have positive effect on
performance:


perfectionism and self-esteem, and


high states of arousal or delight


Behaviors that have negative effect on
performance:


disliking programming


frustration


Confusion


boredom and


IDE-related on-task conversation



Determine whether analysis of online protocols can successfully identify/predict at-risk novice Java programmers



Online protocols are the sequence of program compilations a student makes while performing laboratory exercises


They are gathered by enhancing the development environments used in programming to store data in a database


1. How do students with different achievement levels differ in terms of:


Error profiles?


Average time between compilation profiles?


EQ profiles?

2. What factors can predict the midterm score?


Participants


143 Introduction to Computing students


Tools for Data Collection


BlueJ


Web server


SQLite database


LAN


Procedure


Laboratory Setup


Orientation


Data Gathering


Data Analysis


Data Cleaning


Data Extraction



Data Analysis


Generate summaries


Errors encountered


Time between compilations


Compute EQ score


Use the statistical tool R


Perform one-way ANOVA to differentiate student groups


Correlate EQ score with midterm exam score


Use data mining tools (RapidMiner and Weka) to create linear regression models
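
The two simplest summaries listed above, errors encountered and time between compilations, can be derived directly from the logged compilation events. The following is a minimal sketch only: the `CompileEvent` record and its field names are hypothetical and are not the schema used in the study.

```python
from collections import Counter
from typing import NamedTuple, Optional


class CompileEvent(NamedTuple):
    """Hypothetical record of one compilation; not the study's actual schema."""
    timestamp: float              # seconds since the lab session started
    error_type: Optional[str]     # None when the compilation succeeded


def error_profile(events):
    """Count how often each error type occurs in one student's session."""
    return Counter(e.error_type for e in events if e.error_type is not None)


def average_time_between_compilations(events):
    """Mean gap in seconds between consecutive compilations (None if < 2 events)."""
    times = sorted(e.timestamp for e in events)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else None
```

Sorting by timestamp before taking gaps keeps the sketch robust if events arrive out of order from the database.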


Error Quotient (EQ): developed by Matthew Jadud


Quantifies students’ compilation behavior


Characterizes how much or little a student
struggles with syntax errors


The EQ score ranges from 0.0 to 1.0, where 1.0 indicates that a student encountered the same error throughout all compilations


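[Figure: flowchart for scoring a pair of compilation events, from Start to End. Decision nodes, each with Y/N branches: "Do both events end in errors?", "Same error location?", "Same error type?", "Same edit location?". Action nodes: "Add 2" (three times) and "Add 3" (once).]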
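
The scoring flowchart can be sketched in code as follows. Several details are assumptions, because the slide text does not preserve them: which check earns the single "Add 3" (given here to "same error type"), that the three "same ..." checks are applied independently once both events are errors, and that the pair score is normalized by 9 (the sum of the weights) and averaged over all consecutive pairs so that the result falls in the stated 0.0 to 1.0 range. The `CompileEvent` record is hypothetical.

```python
from typing import NamedTuple, Optional


class CompileEvent(NamedTuple):
    """Hypothetical compilation record; field names are illustrative."""
    error_type: Optional[str]   # None when the compilation succeeded
    error_line: Optional[int]   # line reported by the compiler, None on success
    edit_line: int              # line the student edited before compiling


# ASSUMPTION: the flowchart text shows three "Add 2" steps and one "Add 3",
# but not which check earns the 3; here it is given to "same error type".
WEIGHT_BOTH_ERRORS = 2
WEIGHT_SAME_TYPE = 3
WEIGHT_SAME_ERROR_LOCATION = 2
WEIGHT_SAME_EDIT_LOCATION = 2
MAX_PAIR_SCORE = 9  # sum of the four weights, used to normalize into [0, 1]


def score_pair(first, second):
    """Score one pair of consecutive compilation events (0 to MAX_PAIR_SCORE)."""
    if first.error_type is None or second.error_type is None:
        return 0  # ASSUMPTION: the other checks apply only when both events failed
    score = WEIGHT_BOTH_ERRORS
    if first.error_type == second.error_type:
        score += WEIGHT_SAME_TYPE
    if first.error_line == second.error_line:
        score += WEIGHT_SAME_ERROR_LOCATION
    if first.edit_line == second.edit_line:
        score += WEIGHT_SAME_EDIT_LOCATION
    return score


def error_quotient(events):
    """Average normalized pair score over a session; 0.0 if fewer than two events."""
    pairs = list(zip(events, events[1:]))
    if not pairs:
        return 0.0
    return sum(score_pair(a, b) for a, b in pairs) / (MAX_PAIR_SCORE * len(pairs))
```

Under these assumptions, a session in which every compilation fails with the same error at the same place scores 1.0, and a session with no two consecutive failing compilations scores 0.0.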


Lowest score=38


Highest score=96


Mean=75, Standard Deviation=13


Student Grouping:


AtRisk: scores 62 and below


HighPerforming: scores 89 and above


Average: scores 63 to 88
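
The grouping rule is simple enough to state as code. The cutoffs coincide with the reported mean plus or minus one standard deviation (75 - 13 = 62 and 75 + 13 = 88), although the slides do not say that this is how they were chosen. The function name is illustrative, and scores are assumed to be whole numbers.

```python
def achievement_group(midterm_score):
    """Map a midterm score to one of the three achievement groups."""
    if midterm_score <= 62:
        return "AtRisk"
    if midterm_score >= 89:
        return "HighPerforming"
    return "Average"  # scores 63 to 88
```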


The HighPerforming group was significantly different from the AtRisk and Average groups at p < .001 and encountered fewer errors than the other two groups


The Average group was not significantly different from the AtRisk group
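
Group comparisons of this kind rest on the one-way ANOVA named in the analysis steps. As a rough illustration of what that test computes, the sketch below derives the F statistic from per-group values such as error counts. It is not the study's R code; it stops short of the p-value, which requires the F distribution, and it does not cover the pairwise follow-up comparisons, whose method the slides do not name.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F means the group means differ by more than the spread within each group would suggest.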


The HighPerforming group was significantly different from the AtRisk and Average groups and had a higher average time between compilations than the other two groups


There was no significant difference between the Average and AtRisk groups



The HighPerforming group was significantly different from the Average and AtRisk groups, except in these time intervals:


21-30, 111-120, and >120 seconds for the Average group


81-90 seconds for the AtRisk group


The HighPerforming group had a lower number of compilations


There was no significant difference between the Average and AtRisk groups in any time interval
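
The interval results above imply that each gap between compilations was assigned to a time interval and counted. A minimal sketch of that bucketing follows; the ten-second bin width, the first bin starting at 0, and the label format are assumptions inferred from the four intervals named on the slide.

```python
import math
from collections import Counter


def interval_label(gap_seconds):
    """Label a gap between compilations with a ten-second interval."""
    if gap_seconds > 120:
        return ">120"
    # Round up to the bin's upper edge; gaps of 0 to 10 share the first bin
    upper = max(10, math.ceil(gap_seconds / 10) * 10)
    lower = 0 if upper == 10 else upper - 9
    return f"{lower}-{upper}"


def interval_profile(gaps):
    """Count how many gaps fall into each interval for one student."""
    return Counter(interval_label(g) for g in gaps)
```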




Linear regression was performed to produce models: a regression line of the form Y = aX + b

Two questions to ask about the model:

1. Does the model fit the observed data well?


Compute the correlation coefficient r, a measure of the relation between X and Y


Look at the scatterplot


Compute R², the square of the correlation coefficient r, which measures the strength of the relationship between X and Y


Compute BIC, the Bayesian Information Criterion

2. Can the model generalize to other samples?


Can the model predict the same outcome from the same set of predictors in a different sample?


Adjusted R² indicates the loss of predictive power of the model
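
The quantities named above can be computed by hand for a single predictor. This sketch fits Y = aX + b by least squares and returns r; the adjusted value uses the common formula 1 - (1 - R²)(n - 1)/(n - p - 1), which is an assumption here since the slides do not say which adjustment their tools report. Function names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of Y = aX + b; returns (a, b, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                    # slope
    b = mean_y - a * mean_x          # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return a, b, r


def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p predictors (assumed formula)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
```

With one predictor, R² is simply `r ** 2`; the adjustment shrinks it more when the sample is small relative to the number of predictors.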








Model 1:

MidtermScore = 83.63049 - 0.0919*TotalErrors

p-value < .001, BIC’ = -7.8, Adjusted R² = 0.161

Model 2:

MidtermScore = 83.50274 - 0.25632*UNKNOWN_VARIABLE - 0.42035*CLASS_INTERFACE_EXP - 0.75506*UNKNOWN_CLASS

p-value < .001

r =

BIC’ = -10.2635

Adjusted R² = 0.1994

Model 3:

MidtermScore = 65.04788 + 0.12107*AverageTBC_seconds

p-value < .01, BIC = -1.97243, Adjusted R² = 0.06512

Model 4:

MidtermScore = 87.4381 - 2.0042*Twenty + 6.4780*Ninety + 7.4892*Hundred

p-value < .01, BIC = -7.01032, Adjusted R² = 0.1263

Model 5:

MidtermScore = 92.918 - 64.396*EQ

p-value < .001, BIC = -17.3303, Adjusted R² = 0.2971

Model 6:

MidtermScore = 90.58643 - 43.33380*EQ

p-value < .001, BIC = -20.8326, Adjusted R² = 0.3073


We found:


Students encounter similar error types


Total errors encountered: HighPerforming < Average <= AtRisk


Three out of the top 10 errors may affect the
midterm scores of the Average and AtRisk students


Average time between compilations is higher among HighPerforming students than among Average and AtRisk students


EQ is lower among HighPerforming students than among Average and AtRisk students





Linear Models


Inform which errors directly affect the midterm score, which implicitly points to the concepts with which AtRisk students need assistance


A high incidence of rapid-fire compiling may be a symptom of AtRisk students


EQ can significantly predict Midterm Scores





Use the models to automatically detect
AtRisk students while using an IDE
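
One way such detection could work is to feed a student's running EQ into one of the reported models and compare the prediction with the AtRisk cutoff. The sketch below uses the coefficients of Model 5 and the cutoff of 62 from the student grouping; the function names and the decision rule itself are illustrative assumptions, not a procedure described in the slides.

```python
AT_RISK_CUTOFF = 62  # AtRisk group: midterm scores of 62 and below


def predicted_midterm(eq):
    """Model 5 from the slides: MidtermScore = 92.918 - 64.396*EQ."""
    return 92.918 - 64.396 * eq


def flag_at_risk(eq):
    """True when Model 5 predicts a midterm score in the AtRisk range."""
    return predicted_midterm(eq) <= AT_RISK_CUTOFF
```

Because the reported adjusted R² for this model is 0.2971, a flag of this kind is better read as a prompt for closer attention than as a verdict on the student.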


Implications for teaching: address the concepts that help students resolve the errors that directly affect performance

Questions?