
Nov 7, 2013




Version 1.0

Exploring Computer Science

Unit 6: Participatory Urban Sensing “R” Supplement




Public Agenda Data and Mosaic Plots Activity

** Looking at two variables at once

Yesterday, we looked at bar plots about grades and effort separately.



A good question to ask is “are the two related?”


R code

> table(survey$grade, survey$effort)


Description

This breaks the data down into how many people gave each combination of answers
to the 2 questions. In other words, there are 311 students who have an A AND are
trying their best to do well in school.
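To see how the counts are formed, here is a minimal sketch using a small made-up data frame. The name `toy`, its eight rows, and the answer labels are invented for illustration; they are not the real survey data.

```r
# Made-up stand-in for the survey data (hypothetical values, 8 students)
toy <- data.frame(
  grade  = c("A", "A", "A", "B", "B", "B", "C", "C"),
  effort = c("Trying best", "Trying best", "Could try harder",
             "Trying best", "Could try harder", "Could try harder",
             "Could try harder", "Could try harder")
)

# Count how many students gave each combination of answers
counts <- table(toy$grade, toy$effort)
print(counts)

# Pull out one cell: students with an A who are also trying their best
a_and_trying <- counts["A", "Trying best"]
print(a_and_trying)
```

Each cell of the table is the number of rows in the data that have both answers at once.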






Output



To represent this graphically, we can use a mosaic plot.


R code

> mosaicplot(table(survey$grade, survey$effort))


Description

Graphically compare 2 categorical variables






Output












How to interpret the mosaic plot:



What grade is the most common?

What grade is the least common?






Each column is broken down by how the people within it answered the other question.

Within those students with A’s, are most of them trying their best or could they try harder?

Within those students with B’s, are most of them trying their best or could they try harder?


All the sizes are proportional to the numbers in the table. So if twice as many students respond a certain way,
then the height of that box would be twice as big in the mosaic plot.
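The proportions behind the box sizes can be computed directly. This is a minimal sketch on a made-up data frame (the name `toy` and its values are invented for illustration), assuming the grade is given first in table() so that each grade becomes one column of the plot.

```r
# Made-up stand-in for the survey data (hypothetical values, 8 students)
toy <- data.frame(
  grade  = c("A", "A", "A", "B", "B", "B", "C", "C"),
  effort = c("Trying best", "Trying best", "Could try harder",
             "Trying best", "Could try harder", "Could try harder",
             "Could try harder", "Could try harder")
)
counts <- table(toy$grade, toy$effort)

# Share of all students in each grade: this sets how wide each column is
grade_share <- prop.table(margin.table(counts, 1))
print(grade_share)

# Within each grade, the share giving each effort answer:
# this sets how tall each box is inside its column
effort_within_grade <- prop.table(counts, margin = 1)
print(effort_within_grade)
```

In this toy data, two of the three A students are trying their best and one is not, so within the A column the "Trying best" box would be twice as tall as the other one.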

Looking at the mosaic plot as a whole, is there a trend?

Changing the order of the variables:





R code

> mosaicplot(table(survey$effort, survey$grade))


Description

If you switch the order of the variables, the mosaic plot will be drawn differently. It
may suggest a different story.
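Switching the order does not change the counts themselves, only which variable is split first. A minimal sketch on made-up data (the name `toy` and its values are invented for illustration):

```r
# Made-up stand-in for the survey data (hypothetical values, 8 students)
toy <- data.frame(
  grade  = c("A", "A", "A", "B", "B", "B", "C", "C"),
  effort = c("Trying best", "Trying best", "Could try harder",
             "Trying best", "Could try harder", "Could try harder",
             "Could try harder", "Could try harder")
)

by_grade  <- table(toy$grade, toy$effort)   # grade first
by_effort <- table(toy$effort, toy$grade)   # effort first

# The second table is the first one with rows and columns swapped
same_counts <- all(by_effort == t(by_grade))
print(same_counts)

# What changes is the question being asked:
# within each effort answer, what share of students has each grade?
grade_within_effort <- prop.table(by_effort, margin = 1)
print(grade_within_effort)
```

The first ordering asks "given a grade, how hard are students trying?", while the second asks "given an effort answer, what grades do students have?".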


Output



What do you see in the second plot?






R code

> mosaicplot(table(survey$grade, survey$effort), col=c("cyan","magenta"))


Description

Adding color to your mosaic plot.
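Along with color, a title and axis labels make the plot easier to read. This is a minimal sketch on made-up data (the name `toy`, its values, and the label text are invented for illustration); `main`, `xlab`, and `ylab` are standard mosaicplot() arguments.

```r
# Made-up stand-in for the survey data (hypothetical values, 8 students)
toy <- data.frame(
  grade  = c("A", "A", "A", "B", "B", "B", "C", "C"),
  effort = c("Trying best", "Trying best", "Could try harder",
             "Trying best", "Could try harder", "Could try harder",
             "Could try harder", "Could try harder")
)
counts <- table(toy$grade, toy$effort)

# Draw the mosaic plot with two colors, a title, and axis labels
mosaicplot(counts,
           color = c("cyan", "magenta"),   # colors are recycled if more are needed
           main  = "Effort by grade (made-up data)",
           xlab  = "Grade",
           ylab  = "Effort")
```

Any color names that R knows can be used; running colors() lists them.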


Output


Now explore:

Try making mosaic plots with different combinations of the available questions: year, effort, homework,
grades.
Find another plot and explain what story it tells.
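One way to go through every combination is to let R list the pairs for you. This is only a sketch under assumptions: the data frame `toy`, its values, and the column names `year`, `effort`, `homework`, and `grade` are made up here, so check names(survey) for the real column names before adapting it.

```r
# Made-up stand-in for the survey data (hypothetical columns and values)
toy <- data.frame(
  year     = c("9th", "9th", "10th", "10th", "11th", "11th", "12th", "12th"),
  effort   = c("Trying best", "Could try harder", "Trying best", "Trying best",
               "Could try harder", "Could try harder", "Trying best",
               "Could try harder"),
  homework = c("Under 1 hour", "Under 1 hour", "Over 1 hour", "Over 1 hour",
               "Under 1 hour", "Over 1 hour", "Over 1 hour", "Under 1 hour"),
  grade    = c("A", "B", "A", "A", "C", "B", "A", "C")
)

questions <- c("year", "effort", "homework", "grade")

# Every way of choosing 2 of the 4 questions; each column is one pair
pairs <- combn(questions, 2)

# Draw one mosaic plot per pair
for (i in seq_len(ncol(pairs))) {
  first  <- pairs[1, i]
  second <- pairs[2, i]
  mosaicplot(table(toy[[first]], toy[[second]]),
             main = paste(first, "vs", second),
             xlab = first, ylab = second)
}
```

Remember that swapping the two questions in a pair draws a different plot, so each pair can be tried in both orders.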



Extra Credit: Playing with the whole survey.





** The original data

The data you have been working with comes from a larger survey conducted by Public Agenda. As we
mentioned before, it was created by "sampling" students from a large registry. Again, imagine students'
names placed in a hat, drawn one at a time, and each student called with the questions. The "code book"
(survey questions and answers) is available at:



http://mobilize.stat.ucla.edu/day1/data/students.doc

You can access the original data for the surveys through R directly. The group who created the data
made it available in the format of another statistical programming environment called SPSS. Just like
there are many general programming languages (Perl, Python, Java, C, C++), there are lots of statistical
programming languages as well. To read this "foreign" format into R we need to add to R's functionality
by loading a "library". People contribute libraries to R to allow statisticians to represent new kinds of
data, perform new computations, make new graphics and so on. Here, we want to use a function that
lets R read an SPSS data set.

> library(foreign)

> big_survey <- read.spss("http://mobilize.stat.ucla.edu/day1/data/reality_check.sav", to.data.frame=TRUE)

Here, the library() command introduced a new function, read.spss(). The arguments to this function are
the URL where the data are located and a request that the data be read into a data table (again,
formally, a data frame).

Now, the code book labels its questions K1 through K48 or so. These are translated into the names of
the variables in "big_survey". For example, the question about a student's grades is K36 in the main
survey, which we could access with

> big_survey$qk36

You can now look through the questions and come up with a few that you are interested in. What story
can you tell?
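To compare two questions from the big survey, the same table() and mosaicplot() steps can be wrapped in a small helper. This is a sketch under assumptions: `compare_questions` is a hypothetical helper, `demo_survey` is a made-up stand-in so the code runs without downloading anything, and `qk1` is only a placeholder column name (only `qk36` appears in the text above).

```r
# Hypothetical helper: cross-tabulate two questions by column name
# and draw the matching mosaic plot
compare_questions <- function(data, q1, q2) {
  counts <- table(data[[q1]], data[[q2]])
  mosaicplot(counts, main = paste(q1, "vs", q2), xlab = q1, ylab = q2)
  invisible(counts)   # hand the counts back without printing them
}

# Made-up stand-in data (hypothetical answers, 6 students)
demo_survey <- data.frame(
  qk36 = c("A", "A", "B", "B", "C", "A"),
  qk1  = c("Yes", "No", "Yes", "Yes", "No", "Yes")
)

counts <- compare_questions(demo_survey, "qk36", "qk1")
print(counts)
```

With the real data loaded, the same helper could be called as compare_questions(big_survey, "qk36", ...) with the name of another question from the code book in place of the dots.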