Welcome to CD140!

Nov 2, 2013

CD140 Introduction/Slide 1

Welcome to CD140!

The purpose of this course is to prepare you for your own quantitative data
analytic projects. Along the consumption/production continuum, we will focus on
the production of trustworthy data analysis. You will become a critical consumer
of data analysis, because you will be able to ask the incisive question: “How would
I have conducted this analysis?”



Wherever you work, there will be spreadsheets full of data. When those data need
analyzing, I hope that you will step forward and take on the challenge.

The content of CD140 comprises 10 units and 1 appendix, each of which has
an associated “post hole” and “memo.” Post holes and memos are the two major
work products of this course.

POST HOLES are key concepts and skills that you will learn by heart for the purposes of this course. The term
“post hole” is not a statistical term, but a pedagogical term. The metaphor comes from ranching. You want to build
a corral for a great number of data-analytic applications. You will dig deeply into eleven of those concepts and
skills and use them as anchoring points. As you make connections between those points, you will fence in a myriad
of otherwise fleeting practical applications.

You will have regular post hole assessments and check ups to make sure you are on track for the final exam. The
final exam is an in-class, closed-book, individual effort. If that’s not scary enough: you must pass the final in
order to pass the course, and only an A grade counts as passing. If that’s too scary: you will work on the eleven
specific questions and tasks again and again, and only those specific questions and tasks will be on the exam;
furthermore, if you do not pass the first administration of the final exam, you can retake the final, and if you
do not pass the second administration, we will sit down elbow-to-elbow and work through the third administration
together. You can do this.

Note that the post holes are not the posts, the posts are not the rails, the rails are not the fence, the fence is
not the corral, and the corral is not the horses. The post holes are necessary, but not sufficient, concepts and
skills.

[Figure: a continuum running from Consumption of Data Analysis to Production of Data Analysis]

MEMOS are short weekly writing assignments (2-3 pages). The first phase of each memo is an individual effort, and
the second phase is a group effort. The memos will not only reinforce the post holes but also go beyond. Good data
analysis must be collaborative, and it must be mindful of the ultimate audience. To those ends, each memo will in
fact be a pair of memos: one memo for a research team and one memo for a school board. The (sometimes
sentence-by-sentence) outline for each memo will be at the beginning of each lecture. Each week, I expect you to
spend 90 to 180 minutes on the individual memo and another 90 to 180 minutes on the group memo. The group memo
must explicitly address my “red” comments from the individual memo.

CD140 Introduction/Slide 2

Ease Your Anxious Mind

You will do great in this course, with a few possible exceptions…


If you are motivated by getting better grades than everybody else, this course may
not be for you. In this course, everybody who does the work earns an A. (I will insist
on rewrites and retakes for any work that is not up to specs.)


If you are motivated by harsh criticism and disapproval, this course may not be for
you. I confidently predict that I will like the work that you do. Do not get me wrong: I
will be critical. I will get you weekly feedback, and to the best of my ability, I will
tear apart your work. My challenge is to provide constructive criticism in proportion
(not inverse proportion) to the excellence of your work.


If you cannot send me an e-mail or speak to me outside of class when you feel as
though you are drowning, this course may not be for you. I designed this course to be
challenging in the best of circumstances. Life, however, does not always present us
with the best of circumstances. I do not need to know your personal business, but I
do need to know enough so that I can make the right accommodations to meet your
learning needs.

There are many moving parts to this course,
but the weekly pattern is repetitive. Treat
the first week as a rehearsal, the second
week as a dress rehearsal, the third week as
a preview and the fourth week as an opening.

CD140 Introduction/Slide 3

Materials

(Very) Optional Materials


We will be using a free online textbook (http://www.onlinestatbook.com) as a supplement.

If you want a hunk of paper and print to hold and to cherish, you can purchase
Hinkle, Wiersma and Jurs' Applied Statistics for the Behavioral Sciences or one of
a thousand other introductory textbooks.


You may want to purchase an SPSS/PASW Grad Pack for about $200.
SPSS/PASW is easier to use than R, prettier than R, and more common in
the social sciences than R. I will be teaching R and SPSS/PASW side by side.
If you can use R, you can use SPSS, but the other way around is not quite as
true.


Required Materials


High Speed Internet Access

I will be screencasting my lectures, and you will be collaborating online through
Google Docs. You do not want to be tethered to a computer lab this semester.


Google E-Mail (Gmail) Account

This need not be your “personal” Gmail account; you can create a Google
account (https://www.mail.google.com) for this course. We will be collaborating
via Google Docs.



Portable USB Data Storage Device

Most everybody owns a “thumb drive” or “pen drive” or “USB drive,” and
that will surely suffice. Bring it to every class because we will be moving data around.


R Statistical Computing Software

Each week, we will be writing and running statistical programs. Most, if not all, of our statistical computing
work will be in class. However, for this semester and the future, you want ready access to statistical software,
and R is the best statistical software package in the world, and it happens to be free. You can download R at
http://cran.stat.ucla.edu/ and get instructions for the R Commander GUI at
http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html, but I will be providing more download
and setup support soon.

CD140 Introduction/Slide 4

If personality conflicts arise, please be professional. If a conflict is interfering with your learning, please let me know.
I will not take sides (no matter how right you are!), but I will do my best to get the learning back on track.

Collaboration

You can collaborate online (through Google Docs) without any face-to-face meetings outside of class, but ideally
you will be able to meet for a couple of hours on Mondays, Tuesdays or Wednesdays to hash out ideas in person. If
the whole group cannot get together, it may still be helpful to meet in subgroups.

For the in-class projects and the group memos, you will be working in rather large teams of four to six.


Distribute the work evenly (but variedly) every week. Do not take turns piling the work on one another.

For the group memos, choose one teammate’s individual memo as a basis. Every
team member should read every other member’s memo (and my comments*) and
build on the basis by adding, subtracting and rewriting. You can do this in real
time together, or send it around (at least twice).
* You can edit my comments before sharing. My comments on your work are primarily for you, and if you do not
want to share them all, then do not. You be the judge.

If you have a major disagreement, GOOD! Note the disagreement, state your concerns, and that is a “conceptual
dialogue.” You must have at least two documented conceptual dialogues per memo.

Some students (but not all) have found it helpful to form strictly undergraduate or graduate groups. Keep this in mind
when forming groups. I have had great experiences with homogeneous and heterogeneous groups.

Teams are large enough to absorb the absence of one or two team members in any given week. Please inform your
teammates when you are out of the loop so that they can carry on without you. Please be patient with your
teammates.

We will form teams expeditiously in the first class. It will help if you complete the “Getting to Know You” Google
Spreadsheet that will arrive via your Gmail account.

CD140 Introduction/Slide 5

© Sean Parker EdStats.Org

Semester Work Flow

Thursday

Attend Class: (2.5 Hours)
Lecture Follow Up - 20 Minutes
In-Class Project - 50 Minutes
Post Hole Assessment - 30 Minutes
Memo Head Start - 50 Minutes

Friday>Saturday>Sunday

Submit: (3 Hours)
Individual Memo

Monday>Tuesday>Wednesday

Submit: (3 Hours)
Group Memo
Due Before Class

Prepare: (3 Hours)
Watch Screencast
Review PowerPoint Slides
Practice Post Holes
Read Supplementary Text

Submit: (1 Hour)
Post Hole Check Up
Two Discussion Posts

Final Exam(s): The first administration will be on the last day of class. If you get a
“Pass,” you’re done. If you get a “Not Yet,” then you can take it again the next day. If
you get a “Pass,” you’re done. If you get a “Not Yet,” then together we will take the
test again the next week. If you cannot meet some time between 12pm and 10pm
on the 29th, please let me know ASAP. There is flex in the timing of the retakes to
meet your learning needs. The one-day turnaround is in response to student
requests; it is not meant to rush anybody.

JAN 16   Unit 1        MAR 13   Unit 8
JAN 23   Unit 2        MAR 20   Break
JAN 30   Unit 3        MAR 27   Unit 9
FEB 6    Unit 4        APR 3    Unit 10
FEB 13   Unit 5        APR 10   Appendix A
FEB 20   Sub Day       APR 17   Practicum
FEB 27   Unit 6        APR 24   Final, Retake
MAR 6    Unit 7        MAY 1    Retake, Retake

CD140 Introduction/Slide 6

Introduction: Road Map (VERBAL)

Nationally Representative Sample of 7,800 8th Graders Surveyed in 1988 (NELS 88).

Outcome Variable (aka Dependent Variable):

READING, a continuous variable, test score, mean = 47 and standard deviation = 9

Predictor Variables (aka Independent Variables):

FREELUNCH, a dichotomous variable, 1 = Eligible for Free/Reduced Lunch and 0 = Not

RACE, a polychotomous variable, 1 = Asian, 2 = Latino, 3 = Black and 4 = White



Unit 1: In our sample, is there a relationship between reading achievement and free lunch?
Unit 2: In our sample, what does reading achievement look like (from an outlier resistant perspective)?
Unit 3: In our sample, what does reading achievement look like (from an outlier sensitive perspective)?
Unit 4: In our sample, how strong is the relationship between reading achievement and free lunch?
Unit 5: In our sample, free lunch predicts what proportion of variation in reading achievement?
Unit 6: In the population, is there a relationship between reading achievement and free lunch?
Unit 7: In the population, what is the magnitude of the relationship between reading and free lunch?
Unit 8: What assumptions underlie our inference from the sample to the population?
Unit 9: In the population, is there a relationship between reading and race?
Unit 10: In the population, is there a relationship between reading and race controlling for free lunch?
Appendix A: In the population, is there a relationship between race and free lunch?


CD140 Introduction/Slide 7

Introduction: Road Map (R Output)

[Figure: R output panels for Unit 1 through Unit 9]

CD140 Introduction/Slide 8

Introduction: Road Map (SPSS Output)

[Figure: SPSS output panels for Unit 1 through Unit 9]

CD140 Introduction/Slide 9

Introduction: Road Map (Schematic)

Single Predictor

Outcome \ Predictor   Continuous            Polychotomous        Dichotomous
Continuous            Regression            Regression, ANOVA    Regression, ANOVA, T-tests
Polychotomous         Logistic Regression   Chi Squares          Chi Squares
Dichotomous           Logistic Regression   Chi Squares          Chi Squares

Multiple Predictors

Outcome \ Predictor   Continuous            Polychotomous        Dichotomous
Continuous            Multiple Regression   Regression, ANOVA    Regression, ANOVA
Polychotomous         Logistic Regression   Chi Squares          Chi Squares
Dichotomous           Logistic Regression   Chi Squares          Chi Squares

Units 6-8: Inferring From a Sample to a Population


CD140 Introduction/Slide 10

The Lecture Slides are Color Coded

Key Concepts are in RED:

Avoid unwarranted causal and developmental language. Correlation does not imply causation. Free
lunch eligibility is correlated with reading scores such that, on average, students who are eligible for
free lunch score lower on the reading test. This correlation does not imply that free lunch hurts
reading performance! If so, we could solve our literacy problems by charging $20 for school lunch.

Key Interpretations are in VIOLET:

“In our sample, students eligible for free lunch tend to score 5 points lower on the reading test than
their ineligible counterparts.”

“In our sample, given two students who differ in free lunch eligibility, we predict that the student
who is eligible for free lunch will, on average, have a reading score that is 5 points lower.”

Key Terminology is in BLUE:

Magnitude: The slope of the regression line. It is the most important number in statistics. It helps us
determine the practical implications of relationships. It answers: What’s the bang for your buck?
Does a little of X “buy” you a lot of Y?
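
As a small illustration of Magnitude, here is a minimal Python sketch (the course itself works in R and
SPSS/PASW). It assumes six made-up students, not NELS 88 data, with scores chosen so that the gap is 5 points;
the helper name ols_slope is hypothetical. With a 0/1 predictor such as FREELUNCH, the least-squares slope
equals the difference between the two group means, which is why the slope can be read as “points lower, on
average.”

```python
from statistics import mean

# Hypothetical toy data (NOT from NELS 88): a 0/1 free-lunch indicator and
# reading scores for six imaginary students.
freelunch = [0, 0, 0, 1, 1, 1]
reading = [50.0, 52.0, 54.0, 45.0, 47.0, 49.0]


def ols_slope(x, y):
    """Least-squares slope of y on x: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    x_bar, y_bar = mean(x), mean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


slope = ols_slope(freelunch, reading)

# Difference in group means: eligible (1) minus ineligible (0).
eligible = [r for f, r in zip(freelunch, reading) if f == 1]
ineligible = [r for f, r in zip(freelunch, reading) if f == 0]
mean_difference = mean(eligible) - mean(ineligible)

print(slope, mean_difference)
```

In this toy sample both numbers are -5: eligible students score 5 points lower on average. Neither number says
anything about why.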

Note: if you are struggling with what to write in your memo, start with the Key Interpretations for
the unit in your memo. Each memo should have as its heart the Key Interpretations from the unit.

Note: I know the color is more aptly described not as RED but as HOT PINK, but I’ll call it RED. I had
to live through the 80’s, so I get bad flashbacks when I think of the color as HOT PINK.

Note: At the end of each lecture, in addition to a series of practice materials, there will be a series of
appendices that compile all the Key Concepts, Interpretations and Terminology for the unit.

CD140 Introduction/Slide 11

Epistemological Minute

The Preface Paradox, or the paradox of the preface, was introduced by David Makinson in 1965
(from Wikipedia).

It is customary to preface a scholarly work with acknowledgements and a caveat that all
errors are strictly the responsibility of the author. In keeping with custom, I acknowledge
that I am deeply indebted to my data analytic mentors John Willett, Judith Singer, Daniel
Koretz, Erin Phelps, Richard Murnane, Terry Tivnan and all my students; all errors of
omission and commission, however, are solely mine.

Here is the rub:

In this course, I only make statements (or inscriptions) if I am highly certain of their
truth. Yet, in this course, I am highly certain that I make false statements (or
inscriptions). How can it be rational for me to be highly certain of each statement that it
is true but at the same time be highly certain that at least one of my statements is false?
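
One way to make the rub concrete is with a little arithmetic. The Python sketch below assumes, purely for
illustration, 500 statements that are each 99% likely to be true and independent of one another. None of these
numbers comes from the course, and prob_all_true is a hypothetical helper. Under those assumptions, high
certainty in each statement and high certainty that at least one statement is false can sit side by side. This
is only one common reading of the paradox.

```python
def prob_all_true(p_each: float, n_statements: int) -> float:
    """Probability that all n statements are true, ASSUMING they are independent."""
    return p_each ** n_statements


p_each = 0.99        # hypothetical certainty in each single statement
n_statements = 500   # hypothetical number of statements made in a course

p_all = prob_all_true(p_each, n_statements)
p_at_least_one_false = 1 - p_all

print(round(p_all, 4), round(p_at_least_one_false, 4))
```

With these made-up inputs, the chance that every statement is true falls below 1%, even though each statement
on its own is very likely true.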

I am also indebted to my epistemological mentor, Kate Elgin. She has guided me in my goal to make
pedagogically useful connections between statistical reasoning and general reasoning. Thank you!

CD140 Introduction/Slide 12


I intend to give every student an A for the semester, but I need every student to keep
up with the weekly work.

Keep a weekly checklist. If you complete the checklist each week, you will earn an A.
Here is the checklist:


Submit Individual Memo: Put forth an honest first effort. Spend at least an hour and a half but absolutely no
more than three hours. At the three-hour mark, stop (in mid-sentence perhaps), and cite the “Three-Hour Rule”
(so I know what happened to the rest of the sentence). To ensure that the most important writing gets done in
the three hours, always start with the most important writing, which will be based on the unit’s Key
Interpretations.

Submit Group Memo: First and foremost, your group memo should explicitly address all my RED comments from
your individual memo. Use the comment feature of the word processor to explicitly note (with your initials and
comment #) your attempt to address each RED comment from your individual memo. In order to pass the group
memo, you must explicitly address each of my RED comments from your individual memo.

Submit Post Hole Check Up: Conduct an honest self-assessment of your post hole progress. How would you do on
the final exam (excluding future post holes)?

Submit Discussion Post #1: Each post should be at least one full paragraph with a clear topic sentence.

Submit Discussion Post #2: Your posts can be a question, a clarification, an insight, or an answer.

If you don’t complete the checklist each week, there will be grade penalties.


5 points from your semester grade if you fail to submit an individual memo by noon on Monday.

3 points from your semester grade for each RED comment that you fail to explicitly address in your group
memo. (Note that you must address each comment; you need not perfectly address each comment.)

1 point from your semester grade for each failure to submit a post hole check up or discussion post by midnight
on Wednesday.

You have a three-week grace period (with a possible one-week extension on request).
During the grace period, all confusions are forgiven as you acclimate to the course structure. You are still responsible
for the five weekly submissions for the first three weeks even if it takes you a while to get in sync. (If you are missing
grace-period submissions by the fourth week, you will be doubly penalized for those missing submissions.)
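
The penalty rules can be tallied with a few lines of arithmetic. The Python sketch below is only an
illustration: the function name semester_penalty and the example student are hypothetical, and it deliberately
leaves out the grace period and the doubled penalty for missing grace-period submissions.

```python
def semester_penalty(missed_individual_memos: int,
                     unaddressed_red_comments: int,
                     missed_checkups_or_posts: int) -> int:
    """Points deducted from the semester grade under the three stated rules."""
    return (5 * missed_individual_memos      # 5 points per missed individual memo
            + 3 * unaddressed_red_comments   # 3 points per unaddressed RED comment
            + 1 * missed_checkups_or_posts)  # 1 point per missed check up or post


# Hypothetical student: one missed individual memo, two unaddressed RED
# comments, and three missed check ups or discussion posts.
penalty = semester_penalty(1, 2, 3)
print(penalty)  # 5 + 6 + 3 = 14
```

A student who completes the checklist every week has a penalty of zero.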

Grading

CD140 Introduction/Slide 13

Expectations

Throw YOUR expectations out the window! This class is different. In how many classes do you listen to lectures at
home and complete group projects in class? In how many classes do you have a pass/fail final exam where you get
all the questions on the first day of class? In how many classes are you guaranteed an A as long as you submit five
assignments each week (individual memo, group memo, post hole check up, and two discussion posts)?

A good research methods course requires from 1.5 to 2 courses
worth of time and effort. I did not make up this rule, but CD140
is no exception to the rule. Treat this course as a ¼-time job that
requires you to come early and stay late. Set aside ten hours in
addition to class. Set aside three hours on Friday, Saturday or
Sunday to draft your individual memo. Set aside three hours
(with your group if possible) on Monday, Tuesday or Wednesday
to draft your group memo. Set aside three hours on Monday,
Tuesday or Wednesday to watch the screencast, practice the post
holes and review the lectures. Set aside an hour on Wednesday
to submit a post hole check up and two discussion posts.
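
As a quick check on the weekly time budget, the hours can be added up in a few lines of Python. The in-class
minutes come from the Semester Work Flow slide and the out-of-class hours from the paragraph above. The
40-hour full-time week is an assumption made here to interpret “¼-time job”; it is not stated in the course
materials.

```python
# In-class minutes, as listed on the Semester Work Flow slide.
in_class_minutes = {
    "Lecture Follow Up": 20,
    "In-Class Project": 50,
    "Post Hole Assessment": 30,
    "Memo Head Start": 50,
}

# Out-of-class hours, as listed in the paragraph above.
out_of_class_hours = {
    "Individual memo": 3,
    "Group memo": 3,
    "Screencast, post hole practice and lecture review": 3,
    "Post hole check up and two discussion posts": 1,
}

# ASSUMPTION (not stated in the course materials): a full-time week is 40 hours.
FULL_TIME_HOURS = 40

class_hours = sum(in_class_minutes.values()) / 60
outside_hours = sum(out_of_class_hours.values())
weekly_hours = class_hours + outside_hours
outside_share = outside_hours / FULL_TIME_HOURS

print(class_hours, outside_hours, weekly_hours, outside_share)
```

The ten out-of-class hours are a quarter of the assumed 40-hour week, and class time brings the weekly total
to 12.5 hours.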

This course is a cognitive apprenticeship. During the lectures I will talk you through one or more data analytic
projects. I will go beyond the post holes. I will make essential points. I will make tangential points. I will take
backward looks, and I will take forward looks. Data analysis is not a 1-2-3 recipe, and I hope to model that. Watch
the screencasts, take notes, and ask questions.

There IS crying in statistics. Statistics is a notoriously frustrating subject. I cannot claim that this statistics course
will be tear free. However, I can promise you that, if something is driving you crazy, I will work with you to make
things better.

Your best is (more than) good enough. Data analysis is so complex that nobody has all necessary talents and
everybody has some valuable talents. It takes a village to analyze a data set, and we need you in this village.

CD140 Introduction/Slide 14

EdStats.Org

Much of the course content will be accessible through EdStats.Org. Please create an account. The benefits: EdStats.Org
will be there for you when you are ready to undertake your own projects (perhaps five years from now). If you want to
foster a culture of data-analytic inquiry in your school or organization, you may want to use EdStats.Org.

CD140 Introduction/Slide 15

Checklist of Things To Do Before the First Class

Step 1 (ASAP)

E-mail me your Gmail address.
Sign Up at EdStats.Org

Step 2 (When You Get The Link Via Gmail)

Tinker With The “Getting to Know You” Google Spreadsheet

Step 3 (Monday > Tuesday > Wednesday)

Prepare For Class (Materials On http://blackboard.tufts.edu/)
Watch The Unit 1 Screencast
Review The Unit 1 PowerPoint Slides
Complete The Post Hole Check Up
Submit Two Discussion Posts


Do not forget that there is a three-week grace period (i.e., one week of
rehearsal, one week of dress rehearsal and one week of previews).
Take a deep breath, and do what you can. What gets done, gets done.
What doesn’t get done, we’ll figure out in class together.