Operationalising social science variables and using the GEODE, GEMDE and GEEDE services (Practical session)



Practical session prepared for the workshop of the ‘Data Management through e-Social Science’ (DAMES) research Node, 30th January 2012, University of Stirling

[Edition 1.0]



Workshop handout: Paul Lambert (University of Stirling)

Additional contributions: Tom Doherty, Susan McCafferty, John Watt (all University of Glasgow); Guy Warner (University of Stirling)





Aims of this practical are to:

- Introduce the GESDE service webpages and portal environments, and illustrate how they can be used (see Part 1)
- Illustrate some important options for working with classifications of popular sociological measures of occupations, educational qualifications and ethnicity, as facilitated by the GESDE services (see Part 2)



In Part 1, we use web browsers and basic editing tools to explore access to the GESDE services.

In Part 2, we use Stata, SPSS and R examples to illustrate the use of sociological classifications with survey data.


To run this practical independently of the workshop sessions:

- The majority of materials referred to below are available for download from the DAMES Node ‘workshops’ website: http://www.dames.org.uk/workshops/
- Some of the survey microdata used in the second session is available from the UK Data Archive (http://www.data-archive.ac.uk/) and the European Social Survey (http://www.europeansocialsurvey.org/)
- Individual login details to the GESDE portals can be obtained by following the instructions on the respective ‘welcome’ pages



The GESDE services are part of the ESRC’s Data Management through e-Social Science research Node (www.dames.org.uk). The DAMES Node is supported by an ESRC grant (ref RES-149251066).








Contents


Part 1: Introducing the GESDE services: GEODE, GEMDE and GEEDE
  1.1 Overview of the GESDE services
  1.2 An illustration of components of GEODE: Grid Enabled Occupational Data Environment
    1) Enter the GEODE portal (v2) with guest-level access
    2) Enter the GEODE portal as a registered user
    3) Try out the search resource at GEODE, and access the resultant data
    4) An exercise in using the DAMES curation tool to upload a resource to GEODE
  1.3 Introduction to GEEDE: Grid Enabled Educational Data Environment
  1.4 Introduction to GEMDE: A service for MUGs and MIRs
    GEMDE: Entering the GEMDE portal
    GEMDE: Uploading MIRs and MUGs
    GEMDE: Editing entries
    GEMDE: Searching for MUGs and MIRs
    GEMDE: Live data analysis functionality
Part 2: Some applications in operationalising social science variables
  2.1 Getting started with operationalising key variables
    Handling occupational data
    Handling educational data
    Handling data on ethnicity
  2.2 Merging data in Stata, R and SPSS in order to operationalise variables
  2.3 An illustrative application in analysing social science variable operationalisations
Further details and references


To cite this paper, please use:

Lambert, P.S., Doherty, T., McCafferty, S., Warner, G., and Watt, J. (2012). Operationalising social science variables and using the GEODE, GEMDE and GEEDE services (Practical session workbook). University of Stirling: Technical Paper 2012-2 of the Data Management through e-Social Science Research Node (www.dames.org.uk)



This practical updates, and draws substantially upon, a document published in June 2010 and also available on the DAMES Node website, entitled: “Sociological classifications: The GESDE services for data on occupations, educational qualifications and ethnicity (Practical session)”.



Part 1: Introducing the GESDE services: GEODE, GEMDE and GEEDE


1.1 Overview of the GESDE services


The three GESDE services cover access to ‘specialist’ data relevant to operationalising variables in social survey research.

- GEODE (‘Grid Enabled Occupational Data Environment’, www.geode.stir.ac.uk) concerns data about occupations, such as are used in generating measures of social class or stratification
- GEMDE (‘Grid Enabled ethnic Minority Data Environment’, www.dames.org.uk/gemde) covers data on ethnicity and its related ‘referents’ (e.g. religion, nationality, language, ethnic identity)
- GEEDE (‘Grid Enabled Educational Data Environment’, www.dames.org.uk/geede) covers data on educational qualifications (such as ways of harmonising measures across countries)



A full text article describing the services and their contribution is also available in the conference
paper by Lambert et al. (2011).


Each of the GESDE services has two elements: an introductory website (linked above) and an underlying ‘Portal’ (a route into a computing system which features file storage and access to the purpose-built services). Links to the portal services are found near the top of each introductory website; in long-hand, the respective addresses are:

GEODE portal: https://dames.cs.stir.ac.uk/liferay/group/geode
GEMDE portal: https://dames.nesc.gla.ac.uk/web/guest/gemde
GEEDE portal: https://dames.cs.stir.ac.uk/liferay/group/geede



Typical appearance of one of the GESDE portals, after logging in as a ‘registered user’:





Noteworthy considerations when accessing and using the three GESDE portals:

- Firefox recommended browser: The portals use software called ‘Liferay’, which in our experience performs best under the Firefox internet browser. Other browsers should be fine with these services, but there may be some unintended display features.
- Accept security certificate: The first time you use the portals, you may see a popup warning about an ‘untrusted certificate’; to proceed, you should agree to accept the site/certificate in your browser.
- Guest or registered user access: The portals feature federated secure access. Some features of the services are available to any visitor who identifies themselves as a ‘guest’ user; other features are only available to registered users who have signed in using credentials supplied by us on an individual basis.
- Risk of site crashes: The portals are relatively complex systems and unfortunately are at some risk of failure under various scenarios. In the recent past it has been common for them to be ‘down’ for short periods whilst bugs were identified and fixed, and for specific elements of the portal services to fail (whilst the bulk of the services run smoothly). Please do contact members of the DAMES Node to notify them if you find the portals to be down, giving as many details as possible, and we will respond as soon as we can. As a back-up, we also try to maintain a low-tech alternative route to important data through the open access websites (‘GESDE-lite’; see the links from the relevant websites).


The objectives of the portal services are that they should become widely used, easily accessible facilities providing access to relevant information in each of the three specialist domains, in a way which is relevant to social scientists across disciplines. Work on each service is now largely complete, though there are still some aspects of the services which we hope to improve upon through further development. The core activities that the services support are to:

- Allow all users (registered or unregistered) to search through deposited data in the subject areas and download results in a variety of ‘easy-to-use’ formats
- Allow registered users (e.g. academic researchers) to deposit specialist data on their subject areas with the portals, where it is systematically catalogued and made available to others


Additional contributions of the services, which vary somewhat between the portals, include:

- Tools to run pre-specified queries on live micro-data resources from the UK (e.g. frequency tables for ethnic groups for different year ranges)
- Tools to merge data obtained from the portals with a user’s own data resource (for example, to match data files by linking data between a MIR and a user’s own resource)
- Documentation and dissemination of data on ‘standard categories’ of measures covering ethnicity, occupational classifications and educational qualifications (e.g. category labels)
- Systems to collect ‘user’ and ‘expert’ ratings on the quality of each resource in the portals
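The merge facility in the second bullet is, in essence, a many-to-one match on a shared category code. The Python sketch below illustrates the idea under stated assumptions: the MIR (a ‘Minority Information Resource’, see section 1.4) has already been downloaded and read into a dictionary keyed by ethnic group code, and the codes, scores and the `merge_mir` helper are hypothetical rather than part of the GESDE tools. In Stata, SPSS or R the equivalent step would be an ordinary many-to-one merge.

```python
# Hypothetical sketch: a many-to-one merge of a MIR onto survey microdata.
# All codes, scores and names are invented for illustration.
from typing import Dict, List, Tuple

# Hypothetical MIR: ethnic group code -> a score for that group
MIR_SCORES: Dict[int, float] = {1: 2.8, 2: 3.0, 3: 3.2}


def merge_mir(respondents: List[dict], mir: Dict[int, float],
              key: str = "ethnic",
              new_var: str = "ethnic_score") -> Tuple[List[dict], list]:
    """Attach the MIR value to each respondent record via its group code.

    Returns the merged records (copies; inputs are left unchanged) and the
    codes present in the microdata but missing from the MIR, in the order
    they were first seen. Unmatched records receive None.
    """
    merged: List[dict] = []
    unmatched: list = []
    for row in respondents:
        code = row.get(key)
        out = dict(row)
        out[new_var] = mir.get(code)
        if code not in mir and code not in unmatched:
            unmatched.append(code)
        merged.append(out)
    return merged, unmatched


survey = [
    {"id": 101, "ethnic": 1},
    {"id": 102, "ethnic": 3},
    {"id": 103, "ethnic": 9},  # code 9 has no entry in the MIR
]
rows, missing = merge_mir(survey, MIR_SCORES)
for r in rows:
    print(r)
print("unmatched codes:", missing)  # unmatched codes: [9]
```

Reporting unmatched codes matters in practice: a MIR built on one set of categories will not link cleanly to microdata coded to a different scheme.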








Feedback to us…

Any input on GESDE and your attempts to use some of its services is most welcome, whether good or bad: please let us know!




1.2 An illustration of components of GEODE: Grid Enabled Occupational Data Environment





Go to the GEODE website and review its materials whilst reading this note:

www.geode.stir.ac.uk



The GEODE portal, linked from the GEODE website, is designed to act as a library-style service for accessing, distributing and linking with occupational information resources (databases of information about occupations).


The GEODE project actually began in 2006, when another portal was developed with a slightly different character to the current version. The earlier portal, known as GEODE version 1, is still accessible (linked from the main GEODE website), but it is now very slow to use, and in any case it is not currently accessed by many people. Some papers summarising its construction and development are available at www.geode.stir.ac.uk (e.g. Lambert et al. 2007).


The current GEODE portal, known as version 2, has some different features and is now used as the main resource. It was launched as a prototype version in 2009 and has largely been complete since summer 2011, though at the time of writing there are still some features of the portal service which we hope to develop further. Below, we suggest some activities which will give you a brief ‘tour’ of the facilities available from the GEODE portal and website.





Send us your occupational information!!


GEODE is designed to be a dynamic web service with a clear social element (as are the other two GESDE services, GEEDE and GEMDE): its content is contributed by its own users, so the more people send in materials to GEODE, the better the service becomes. This arrangement does of course rely upon goodwill, but all of the GESDE services are conceived of as resources to support the process of academic and public service social science research, where sharing background information such as documentation files and derivation matrices is critical good practice in supporting replication. Therefore, to be a good scientist, send in your occupational information to GEODE!


Nevertheless, in order to promote quality standards, resources at GEODE are subject to a little monitoring:

- All users have the ability to give ratings to, or register comments on, GEODE resources, through the ‘comments and ratings’ tab
- As a safeguard, contributed content is checked by members of the GEODE project before being made publicly available to other users
- When possible, members of the GEODE project proactively seek out new relevant resources and upload them, and check entries on existing resources and update them if relevant




1) Enter the GEODE portal (v2) with guest-level access

Click on the link to the portal at http://www.geode.stir.ac.uk/index.html#Portal; the direct link itself is https://dames.cs.stir.ac.uk/liferay/group/geode

When you get to the portal site, click on GEODE and you’ll go to the starting page, which looks something like:

At this stage, the ‘Search’, ‘Ratings’ and ‘Occupational standards’ pages are available to you, but other functionality is not.





2) Enter the GEODE portal as a registered user

Either click on ‘create account’ from the above page and follow the steps to generate a new account, or use the account details we’ve provided for this session:

User: demo@dames.org.uk [CHECK DETAILS WITH SESSION INSTRUCTORS]
Pass: L33dsD4mes [CHECK DETAILS WITH SESSION INSTRUCTORS]

After logging in, you can see additional functionality, which includes links to add your own data to GEODE and to submit ratings on other resources:








3) Try out the search resource at GEODE, and access the resultant data

Guest and registered users have the ability to search through the GEODE portal and identify resources which may be of use. Resources often include downloadable data or syntax files, but sometimes they may only constitute a link to a website with further resources.

A typical search result (a search for the term ‘social interaction’):

Follow the ‘show metadata’ and ‘files’ links to download files for a resource. If you’re not logged in as a registered user, you’ll need to enter the username ‘guest’ and password ‘guest’ before you are allowed to download the resulting file.




4) An exercise in using the DAMES curation tool to upload a resource to GEODE

This requires you to click on the ‘Data: deposit/add details’ page of the portal, and follow the links provided. You’ll need a resource to upload: as an exercise, we suggest you make up a new social class scheme (everyone else does, after all!), list its categories (e.g. class names) in an Excel file or Word document, and upload that.


Here is a depiction of the overall curation process working in a prototype form:

- We start from an original occupational data resource, and look at what it contains
- We begin curating it at the GEODE portal, adding some author details
- We add some data on the variables that are on the file, including the standard classification of occupational unit groups to which the data refers (here, HISCO) and the countries the resource is concerned with
- A browser then lets us find and upload the data file into the filespace on the DAMES server; this includes a DDI format file with metadata about the file
- Lastly, we notify the DAMES tool about the nature of the file(s) we uploaded

When this is all done, both the file and metadata about the file are available on the GEODE server, and can be searched and, if relevant, downloaded by other users.




1.3 Introduction to GEEDE: Grid Enabled Educational Data Environment






Go to the GEEDE website and review its materials whilst reading this note:

www.dames.org.uk/geede




The GEEDE service uses the same underlying portal system as GEODE, so we won’t repeat all of the elements illustrated above. For instance, the basic view as a registered user on GEEDE (the same registered accounts are valid for GEODE and GEEDE; see step 2 of section 1.2 above) resembles the image below, which is largely similar to that of the GEODE portal.



Image of the GEEDE portal as a registered user:




At the time of writing, the number of information resources on GEEDE is smaller than we would like (there are about 50 entries). In time, we hope this resource will expand as users from across countries deposit relevant information resources.




Illustrative task in GEEDE: There is a resource within the GEEDE system called ‘Tools for coding BHPS educational qualifications measures to ISCED and a 4-category scheme’. Find it, download one of the data files for this resource, and examine its contents (tip: search for a word included in the title, such as ‘ISCED’ or ‘BHPS’).
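Resources of this kind typically encode two steps that are easy to get wrong by intuition: mapping detailed qualification measures onto an ordered scheme and, where a survey records qualifications as multiple-response items, selecting the highest one held. A minimal Python sketch of that logic follows. The indicator names, ranks and category labels are hypothetical, not the BHPS variables or the mapping in the GEEDE resource; for real analysis the mapping should be taken from the resource itself.

```python
# Hypothetical sketch: derive a 'highest qualification' on an ordered
# 4-category scheme from multiple-response indicator variables.
# Indicator names, ranks and labels are invented; they are NOT the BHPS
# variables or the mapping held in the GEEDE resource.
from typing import Dict, Optional

# Each indicator is 1 if the qualification is held, 0 otherwise,
# and maps to a rank on the (hypothetical) ordered scheme.
INDICATOR_RANK: Dict[str, int] = {
    "has_degree": 4,
    "has_alevel": 3,
    "has_olevel": 2,
    "has_other": 1,
}
RANK_LABEL: Dict[int, str] = {
    4: "Degree or above",
    3: "Upper secondary",
    2: "Lower secondary",
    1: "Other qualification",
}


def highest_qualification(record: Dict[str, int]) -> Optional[int]:
    """Return the highest rank among indicators equal to 1, else None."""
    held = [rank for var, rank in INDICATOR_RANK.items()
            if record.get(var) == 1]
    return max(held) if held else None


respondent = {"has_degree": 0, "has_alevel": 1, "has_olevel": 1, "has_other": 0}
rank = highest_qualification(respondent)
print(rank, RANK_LABEL[rank])  # 3 Upper secondary
```

Returning None when no indicator is set keeps ‘no qualification recorded’ distinct from the lowest category, a decision that should be made explicitly rather than by default.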





1.4 Introduction to GEMDE: A service for MUGs and MIRs

The organising concepts used in the GEMDE service are things that we call MUGs and MIRs.




- A MUG is a ‘Minority Unit Group’ and constitutes a systematic listing of categories linked to a measure of ethnicity
- A MIR is a ‘Minority Information Resource’; it constitutes a piece of information about a number of ethnic categories (i.e. about a MUG), of potential relevance to social survey research
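To make the distinction concrete, the two concepts can be sketched as simple data structures: a MUG is a code-to-label listing, and a MIR attaches values to the codes of a particular MUG. This is a hypothetical illustration: the class and field names are assumptions rather than GEMDE's metadata schema, the MUG reuses the three-category ‘eth2l’ scheme given as a Stata label definition later in this section, and the MIR values are invented.

```python
# Hypothetical sketch of MUGs and MIRs as data structures.
# Class and field names are assumptions, not the GEMDE schema.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MUG:
    """Minority Unit Group: a systematic listing of ethnic categories."""
    name: str
    categories: Dict[int, str]  # category code -> label


@dataclass
class MIR:
    """Minority Information Resource: values attached to a MUG's categories."""
    title: str
    mug: MUG
    values: Dict[int, float] = field(default_factory=dict)

    def unknown_codes(self) -> List[int]:
        """Codes used by the MIR that its MUG does not define."""
        return sorted(c for c in self.values if c not in self.mug.categories)


mug = MUG("eth2l", {1: "White UK", 2: "Black or Asian", 3: "Other white/other"})
mir = MIR("Illustrative scores (invented)", mug, {1: 2.8, 2: 3.1, 4: 2.9})
print(mir.unknown_codes())  # [4]
```

Keeping the MIR explicitly tied to its MUG is what makes a consistency check like `unknown_codes` possible before any merge with survey data.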





Here’s an example of a MIR (it’s generated during an exercise in section 2 below):

Contains data from c:\gemde\lab\mirs\\bhps_qopfamc.dta
  obs:    11       Ethnic group averages: agreement that it is better
                   for women to work, BHPS w17
 vars:     6       26 Jan 2010 20:59
 size:   396 (99.9% of memory free)
Sorted by:

. list xethh qopf_men qopf_fem nmen nfem

                       xethh   qopf_men   qopf_fem   nmen   nfem
  1.                1. White      2.816      2.872   5985   7305
  2.                2. Mixed          3      2.842     26     38
  3.               3. Indian      3.177      3.028     41     41
  4.            4. Pakistani      2.896      3.002     27     32
  5.          5. Bangladeshi      3.072      2.807      6     12
  6.          6. Other Asian      3.339      2.992     11     12
  7.      7. Black-Caribbean      3.102      3.274      8     13
  8.        8. Black African      2.675      2.885     15      9
  9.          9. Other Black      3.803      3.203      3      8
 10.             10. Chinese      2.928      3.645      6      6
 11.  11. Other ethnic group      2.638       2.82     20     27




And here’s a very simple example of a MUG:

. label define eth2l 1 "White UK" 2 "Black or Asian" 3 "Other white/other"





GEMDE: Entering the GEMDE portal


We’ll next go through the process of entering the GEMDE portal and moving between its tabs.




Go to the GEMDE website (http://www.dames.org.uk/gemde/):

Click on the link to the prototype portal. This link will trigger a request for Shibboleth authentication:











Complete the authentication using the details we’ve given you. You can access the system either as a guest or as a named user (we’ll give you accounts for the latter). When the system is fully developed, access to the portal as a registered user will be given on the basis of your HE institutional account (with access alternatively available at guest level).



                REGISTERED USER                GUEST
Institution:    National e-Science Centre      National e-Science Centre
Username:       [ON REQUEST]                   dames
Password:       [ON REQUEST]                   dames







Explore the site a little

On entering the site you should see something like the above display. Click on the tabs ‘Welcome’, ‘e-Health’ etc. to see information on other aspects of DAMES.


**Browser compatibility note**
In general, all the GESDE portals are compatible with the popular Firefox and Internet Explorer browsers. However, these and other browsers will sometimes give warning messages relating to the site certificates of components. For best results, it is generally necessary to choose 'No' when asked 'Do you want to view only the webpage content that was delivered securely'.




GEMDE: Uploading MIRs and MUGs

We’ll start with the more technical facility (depositing data). In practice, relatively few users are likely to want to upload MIRs and MUGs, and relatively more will want to use GEMDE simply to search for relevant resources (see ‘GEMDE: Searching for MUGs and MIRs’ below).




To upload a MIR, try using a resource you have or that you created during exercise 2. Alternatively, we’ve placed a sample MIR online (‘bhps_ethnic_group_SOR_scores.dat’, a file which gives socioeconomic scores for UK ethnic groups), available along with explanatory documents at: http://www.dames.org.uk/gemde/index.html#workshop_jan2010



[Figure: SOR model dimension scores for BHPS ethnic groups (White, Mixed, Indian, Pakistani, Bangladeshi, Other Asian, Black-Caribbean, Black African, Other Black, Chinese, Other ethnic group), on a scale from -2 to 2. Source: BHPS wave 17, n = 12626, % 'White' = 97.3. Identified principally by age, gender attitudes and household income.]




To upload a MUG, try uploading the new scheme you defined above in exercise 2. We’ve placed a sample MUG, covering ethnic groups for Bolivia from the 2001 census (data from IPUMS, www.ipums.org), at www.dames.org.uk/gemde/index.html#workshop_jan2010









Click on the link ‘Deposit new data resource’




Click on the link to add a MIR, fill out a form for your resource, and upload it




Most entries on the form are not compulsory, but the more that are completed, the better the quality of the resource. For most resources, a very important element of the submission is the upload of a particular data file linked to the resource: once uploaded, a copy of the file or files will be stored on the GEMDE server, and potentially available for others to access.





If all goes well, the new resource is added:

Note that there’s lots more metadata you might usefully supply with your MIR. You can add this by further editing the resource (see the next sections).




Click on the link to deposit data and add a MUG, fill out a form for your resource, and upload it





Similarly to the process of submitting a MIR, after you upload a MUG in the first instance, it is then possible (and encouraged by us) for you to use the ‘edit data’ page to return to that record and add additional metadata about the resource, for instance:





GEMDE: Editing entries


GEMDE is intended to support easy editing of data resources after they have been uploaded or registered with
the service. The edit data tabs allow anyone who has entered data at GEMDE to edit it manually at a later
date.
This includes
options for updating / overwriting files previously uploaded (such as if you found a minor
error in a file after you’d uploaded it).


Click on the link to ‘edit data’, and explore editing the resources you’ve just uploaded

For example:






GEMDE: Searching for MUGs and MIRs


One of the more exciting aspects of building up a collaborative data resource concerns making new resources readily accessible to others. As soon as any registered user has uploaded a resource, others are able to identify it when searching or browsing. Other registered users are also able to rate and comment on a resource in terms of its quality, and these ratings feed into the way in which search and browse results are displayed.






Experiment with ‘search’ and ‘browse’. E.g.
- Try browsing for MUGs which use the referent of ‘citizenship/nationality’.
- Try searching for the term ‘Stata’.












Assign a rating to a resource, and observe how this impacts the display order for a
resource with ‘search’ and ‘browse’.


- Observe that at the bottom of a search screen, there are rating options:
- Ratings can be entered for any resource, and comments given:
- These ratings can later be used to navigate between multiple options:
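The idea that ratings feed into display order can be sketched in a few lines of Python. The example is hypothetical: the resource titles and 1 to 5 ratings are invented, and ordering matches by mean rating (with unrated resources last) is an assumption made for illustration, not a description of GEMDE's actual ranking rule.

```python
# Hypothetical sketch: keyword search over resource titles, with matches
# ordered by mean user rating (highest first). Records and the 1-5 rating
# scale are invented; this is not GEMDE's ranking code.
from statistics import mean
from typing import Dict, List

resources: List[Dict] = [
    {"title": "Stata syntax for ethnic group recodes", "ratings": [3, 4]},
    {"title": "Stata do-file: citizenship MUG", "ratings": [5, 5, 4]},
    {"title": "SPSS labels for census ethnicity", "ratings": [4]},
    {"title": "Stata MIR example (unrated)", "ratings": []},
]


def search(records: List[Dict], term: str) -> List[Dict]:
    """Case-insensitive title match, sorted by mean rating; unrated last."""
    hits = [r for r in records if term.lower() in r["title"].lower()]

    def sort_key(r: Dict) -> float:
        # Unrated resources get -1 so they sort after all rated ones
        return mean(r["ratings"]) if r["ratings"] else -1.0

    return sorted(hits, key=sort_key, reverse=True)


for r in search(resources, "stata"):
    print(r["title"])
```

With these invented ratings, the do-file (mean about 4.67) is listed before the syntax file (mean 3.5), and the unrated resource comes last.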







GEMDE: Live data analysis functionality


In various comments in this workbook, we stress that a good understanding of sociological classifications as variable operationalisations ordinarily requires clear recognition of the correlation between the social classification and other socio-demographic or socio-economic structures. Different ethnic groups, for example, have very different age profiles, and these should be taken into consideration when summarising patterns of ethnic difference.


In GEMDE (and in the educational data service, GEEDE) we have developed a data analysis service within the portal which allows users to run live queries on large-scale microdata resources for the UK, and readily derive robust statistical data in the relevant subject areas. In addition, we have prepared analysis options which allow users to try out various different data permutations (such as the time span of the data, or changes in the recoding of a relevant classification). We’d argue that these ‘data management’ functionalities are a crucial context for understanding the data (they also make the analysis itself more complex, which is why such results are otherwise harder to obtain). The functionality is available to GEMDE users via the ‘Microdata’ page on the portal.





Go to the ‘microdata’ page on the GEMDE portal













From this page, fill out successive requests: ‘Step 1’ (choose the Li and Heath dataset), ‘Step 2’ (choose the ‘mean ages’ script) and ‘Step 3’ (choose a year range of your own preference). This submits the task to run.
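As a rough guide to what such a query computes, the Python sketch below takes mean age by ethnic group within an inclusive year range. It is a hypothetical illustration: the record layout and the handful of values are invented, and it does not reproduce the GEMDE script or the Li and Heath dataset.

```python
# Hypothetical sketch of a 'mean ages' style query: mean age by ethnic
# group, restricted to a chosen year range. Records are invented.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[int, str, int]  # (survey year, ethnic group, age)


def mean_age_by_group(records: Iterable[Record],
                      first_year: int, last_year: int) -> Dict[str, float]:
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for year, group, age in records:
        if first_year <= year <= last_year:  # inclusive year range
            totals[group] += age
            counts[group] += 1
    # Groups with no cases in the range are simply absent from the result
    return {g: totals[g] / counts[g] for g in sorted(counts)}


sample = [
    (1995, "White", 44), (1995, "Indian", 31),
    (2001, "White", 46), (2001, "Indian", 35), (2001, "Chinese", 29),
    (2008, "White", 48), (2008, "Chinese", 33),
]
print(mean_age_by_group(sample, 2000, 2008))
# {'Chinese': 31.0, 'Indian': 35.0, 'White': 47.0}
```

Changing the year range changes both the means and which groups appear at all, which is the kind of sensitivity the portal's ‘data permutations’ are designed to expose.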




The result may take a minute or so to come through (a large-scale dataset is being processed behind the scenes). Eventually, when it has finished, a graph should be visible on your screen (you may need to accept a prompt agreeing to download non-secure items at this stage).




[Reminder of the browser compatibility note above: you may need to choose 'No' when asked 'Do you want to view only the webpage content that was delivered securely', in order to see this graph]



Part 2: Some applications in operationalising social science variables



2.1 Getting started with operationalising key variables


Measures of occupations, of educational qualifications, and of ethnicity constitute three classes of socio-economic/socio-demographic measures which are important across social science disciplines, but which have been subject to particularly long-standing methodological scrutiny in the discipline of sociology (see, for instance, Stacey, 1969 and Burgess, 1986 for discussions oriented to the UK). In empirical terms, these measures are often very important influences upon other measures of social circumstances. Moreover, it can easily be demonstrated that the particular choice of coding (or ‘variable operationalisation’) for any measure can (potentially) have a major impact upon its own properties, and on other related patterns of correlation and association (i.e., how you classify a measure influences what you end up finding!). Accordingly, we would argue that researchers across social science disciplines would often benefit from more access to, and awareness of, relevant sociological classifications of occupations, educational qualifications and ethnicity. Common observations include that:




- There are probably many more feasible alternatives than you realised
- Intuitive coding or re-coding of classifications is generally a bad idea…
- Harmonisation or standardisation proposals for these measures for comparative analysis probably already exist…
- Scaling category codes can often be an effective, parsimonious approach
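The parsimony of scaling is easy to see by comparing two ways of entering a categorical measure into a model: k-1 indicator (dummy) columns, or one column in which each category code is replaced by a score. A minimal Python sketch follows; the five categories and their scores are invented for illustration.

```python
# Hypothetical sketch: k-1 dummy columns versus a single scaled column
# for the same categorical measure. Categories and scores are invented.
from typing import Dict, List


def dummy_columns(codes: List[int], categories: List[int]) -> List[List[int]]:
    """One 0/1 column per category except the first (the reference)."""
    return [[1 if c == cat else 0 for cat in categories[1:]] for c in codes]


def scaled_column(codes: List[int], scores: Dict[int, float]) -> List[float]:
    """A single numeric column: each code replaced by its category score."""
    return [scores[c] for c in codes]


categories = [1, 2, 3, 4, 5]
# Invented scores; in practice they might come from a published scale,
# such as a MIR or an occupational scale documented on GEODE.
scores = {1: 0.1, 2: 0.4, 3: 0.5, 4: 0.7, 5: 0.9}
codes = [1, 3, 5, 2]

dummies = dummy_columns(codes, categories)
scaled = scaled_column(codes, scores)
print(len(dummies[0]), "dummy columns vs 1 scaled column")  # 4 dummy columns vs 1 scaled column
print(dummies[1])  # [0, 1, 0, 0]
print(scaled)      # [0.1, 0.5, 0.9, 0.4]
```

The scaled version spends one parameter instead of four, at the cost of assuming that the scores capture the relevant differences between categories.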




In general, the use of classifications and measures of occupations, educational qualifications and ethnicity is inconsistent between researchers across the social sciences. It is readily possible to identify examples of analyses which might, in retrospect, have benefitted from consideration of alternative variable operationalisations (this applies within the discipline of sociology itself, as well as more widely across other social science research domains). It is widely assumed that problems arise because the techniques required to exploit social classifications tend to be developed as specialist methodological topics, and require particular knowledge of data resources and processing techniques. Typically, few social scientists outside of the relevant expert communities are aware of the existence of relevant resources, or of how best to exploit them (see Lambert et al., 2007, for a discussion relating to the use of occupational data).


Our point of departure in the DAMES Node has been to argue that better documentation and metadata on information resources related to measures of occupations, ethnicity and educational qualifications is desirable. That is, if more people knew how to do more sophisticated things with the measures available to them, social science could take a step forward. We claim that progress can be achieved by developing services which provide access to relevant information resources on classifications, and which make it easier for specialist researchers themselves to disseminate the documentation from their own analyses for the benefit of other research projects.


The first step is making people aware of effective ways to handle data on key social science variables, and of the value that might be added to their own analysis by taking fuller account of measures and supplementary information resources on these topics. Below we introduce the forms of data typically involved, then in section 2.2 give some examples of linking datasets to exploit specialist information resources, and in section 2.3 an illustrative application showing the empirical consequences of variable operationalisations.





Handling occupational data


The starting point of most analyses which involve occupational data is a set of records coded to ‘occupational unit groups’ (relatively detailed taxonomies of occupational positions).




Take a quick look at:

- The ONS listing of occupational unit groups in the Standard Occupational Classification, at: http://www.ons.gov.uk/ons/guide-method/classifications/current-standard-classifications/soc2010/index.html
- The ONS website instructions on collecting occupational data, at: http://www.ons.gov.uk/ons/guide-method/harmonisation/primary-set-of-harmonised-concepts-and-questions/index.html (p4)
- The CASCOT tool for coding textual descriptions of occupational titles (try a few examples), at: http://www2.warwick.ac.uk/fac/soc/ier/software/cascot (under 'use online')



CASCOT output: 5323 is the best recommendation of a SOC-2010 code for a hapless decorator.





Not all research projects collect such detailed occupational data. Indeed, Ganzeboom (2005) argues that the pay-off to more rather than less detail in occupational classifications is often relatively slight. In most circumstances, however, detailed descriptions are seen as relatively easy to record and code, and for some writers the precise circumstances of different occupations are particularly important (e.g. Jonsson et al. 2009).



In most instances, an analyst holds a dataset which features occupational unit group codes. For analytical purposes, the analyst should then follow instructions, such as the derivation matrices available on the GEODE system, to recode those codes into the values of an occupation-based social classification. This step often also uses extra information on employment status. We give some examples of doing this in sections 2.2 and 2.3 below.
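The recoding step just described is essentially a lookup on a pair of keys: the occupational unit group and the employment status. The Python below is a minimal sketch under that reading.

- The matrix entries are invented placeholders, not real NS-SEC (or any other scheme's) allocations. A real matrix would be taken from GEODE or the ONS documentation.
- The variable names `soc` and `emp` are hypothetical.
- Combinations the matrix does not cover are returned as missing rather than guessed.

```python
from typing import Optional

# ASSUMPTION: placeholder entries only. These are NOT real NS-SEC (or any
# other scheme's) allocations; a real matrix would be read from GEODE/ONS.
# Key: (occupational unit group code, employment status) -> class value
DERIVATION_MATRIX = {
    ("5323", "employee"): "class_A",
    ("5323", "self-employed"): "class_B",
    ("5323", "supervisor"): "class_C",
}


def derive_class(unit_group: str, emp_status: str) -> Optional[str]:
    """Look up the class for a (unit group, employment status) pair.

    Returns None when the combination is absent from the matrix, so that
    uncovered cases are treated as missing rather than silently guessed.
    """
    return DERIVATION_MATRIX.get((unit_group, emp_status))


# Hypothetical survey records with a unit group code and employment status
records = [
    {"id": 1, "soc": "5323", "emp": "employee"},
    {"id": 2, "soc": "5323", "emp": "self-employed"},
    {"id": 3, "soc": "9999", "emp": "employee"},  # code not in the matrix
]

for rec in records:
    rec["social_class"] = derive_class(rec["soc"], rec["emp"])

print([rec["social_class"] for rec in records])
# ['class_A', 'class_B', None]
```

Keeping unmatched combinations as `None` makes gaps in the matrix visible when the new variable is tabulated.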




Handling educational data


With data on educational qualifications, we typically have access to one or more variables recording the qualifications held by a respondent. For some surveys, a single variable may be identified which indicates something like the 'highest qualification held'. However, it is probably more common for data to feature 'multiple response' questions, where different variables indicate the presence or absence of different qualifications (an example is illustrated below). In applied research, we would typically code qualifications to some ordered or scaled measure of relative advantage. In the case of multiple response data, we would also devise some algorithm for identifying and recording the highest relevant value. Such decisions are often made more complex by the strong relationship between qualifications held and birth cohort, which results from institutional reforms over time.
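The 'highest relevant value' algorithm for multiple response data can be sketched briefly. The Python below assumes the following:

- Each qualification has its own 0/1 indicator variable.
- The indicator names and the ranking are hypothetical and purely illustrative.
- The ranking is a single ordered scale of relative advantage.

```python
from typing import Dict, Optional

# ASSUMPTION: hypothetical indicator names and an illustrative ranking
# (higher = more advantaged); real surveys and schemes will differ.
QUAL_RANK: Dict[str, int] = {
    "q_degree": 4,
    "q_alevel": 3,
    "q_gcse": 2,
    "q_other": 1,
}


def highest_qualification(record: Dict[str, int]) -> Optional[str]:
    """Return the highest-ranked qualification flagged as held (value 1).

    Returns None if no listed qualification is held (e.g. 'no qualifications').
    """
    held = [name for name in QUAL_RANK if record.get(name) == 1]
    if not held:
        return None
    return max(held, key=lambda name: QUAL_RANK[name])


respondent = {"q_degree": 0, "q_alevel": 1, "q_gcse": 1, "q_other": 0}
print(highest_qualification(respondent))  # q_alevel
```

Because qualifications are strongly related to birth cohort, one possible refinement is a separate ranking for each cohort. That refinement is not shown here.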




We show some examples of using resources from GEEDE to handle educational data in sections 2.2 and 2.3. Some other useful sites with information about educational qualification taxonomies are:

- Harry Ganzeboom's cross-national coding frames for educational qualifications: http://home.fsw.vu.nl/HBG.Ganzeboom/ISMF/ismf.htm
- IPUMS International's coding details covering qualifications data: https://international.ipums.org/international-action/variables/group/educ
- The ISCED classification scheme documentation (see esp. Schneider, 2010): http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/Glossary:ISCED



[Figure: fitted values of personal income, occupational advantage and educational advantage by age at date of interview. BHPS wave 17, unweighted; males in work aged 16-70; N=3590; R2=0.09 for income, 0.03 for occupation and education.]


This graph illustrates the influence of age cohort on educational attainment, and how its trend with age contrasts with the age-income and age-occupation patterns.


[Figure: Example of multiple responses for the UK's 'Family and Working Lives' survey (54 variables are available to record the presence or absence of potentially 54 different types of qualification).]






Handling data on ethnicity



Data on ethnicity is usually recorded on only a small number of measures, but there are various complications in using it in statistical analysis. These include achieving harmonisation over time or between countries, dealing with sparse categories, and potentially dealing with information from several relevant 'referents' (e.g. ethnic identity, religion, immigrant status). The table copied below from Simpson and Akinwale (2006), for instance, illustrates categories of ethnic identity from two recent UK social surveys.





We also have some examples of processing and analysing data on ethnicity in sections 2.2 and 2.3 below.
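Two of the complications named above, harmonisation between sources and sparse categories, can be sketched together. The Python below works as follows:

- It assumes two hypothetical surveys with invented label sets.
- It maps each survey's labels onto a common coarse scheme, with unmapped labels becoming 'Other'.
- It then pools any category with fewer than a chosen number of cases into 'Other'.

The mapping and the threshold are illustrative assumptions only. In real work such choices should follow documented guidance, such as the resources offered through the GESDE services.

```python
from collections import Counter
from typing import Dict, List

# ASSUMPTION: hypothetical survey labels and an illustrative mapping onto a
# common coarse scheme; a real mapping should follow documented guidance.
HARMONISE: Dict[str, Dict[str, str]] = {
    "survey_a": {"White British": "White", "White Irish": "White",
                 "Indian": "Indian", "Pakistani": "Pakistani",
                 "Chinese": "Chinese"},
    "survey_b": {"White": "White", "Indian": "Indian",
                 "Pakistani": "Pakistani", "Chinese": "Chinese"},
}


def harmonise(survey: str, label: str) -> str:
    """Map a survey-specific label to the common scheme; unmapped -> 'Other'."""
    return HARMONISE[survey].get(label, "Other")


def pool_sparse(values: List[str], min_count: int) -> List[str]:
    """Recode categories observed fewer than min_count times to 'Other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]


pooled = [harmonise("survey_a", x)
          for x in ["White British", "White Irish", "Indian", "Chinese"]]
pooled += [harmonise("survey_b", x) for x in ["White", "Indian", "Pakistani"]]
print(pool_sparse(pooled, min_count=2))
# ['White', 'White', 'Indian', 'Other', 'White', 'Indian', 'Other']
```

Pooling sparse groups stabilises estimates but discards distinctions that may matter substantively. This trade-off between a coarse scale and detailed categories is explored in section 2.3.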





Reminder: please help us improve the GESDE services by sending us feedback on them, and by submitting your own relevant data resources to them!
...







2.2) Merging data in Stata, R and SPSS in order to operationalise variables


An example dataset called bhps_anon_sample_gesde.dta features data on occupations, educational qualifications and ethnicity. It is based upon an anonymised extract from the UK's British Household Panel Survey (BHPS); see University of Essex 2010. In this section we show examples of enhancing the data with new variables based upon information resources which are available in the GEODE, GEEDE and GESDE systems. In the next section, we show some analytical results.
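The core operation behind these enhancements is a many-to-one match of a lookup file, with one row per occupational code, onto the survey records. The Stata, R and SPSS files do this in their own syntax. As a language-neutral sketch, the Python below does the same under these assumptions:

- The lookup holds invented placeholder scores, not real CAMSIS values, keyed on an occupational unit group code.
- The variable names are hypothetical.

The function also reports how many records matched. That count serves a similar purpose to the `_merge` indicator that Stata's `merge` command creates.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: the lookup values are invented placeholders, not real CAMSIS
# (or other) scores; the variable names are hypothetical.
SCORE_LOOKUP: Dict[str, float] = {"5323": 40.0, "2314": 70.0}


def merge_lookup(records: List[dict], lookup: Dict[str, float],
                 key: str, new_var: str) -> Tuple[int, int]:
    """Many-to-one merge: attach lookup[record[key]] to each record.

    Unmatched records (including those with a missing key) receive None.
    Returns (n_matched, n_unmatched) so the match rate can be checked.
    """
    matched = 0
    for rec in records:
        value = lookup.get(rec.get(key))
        rec[new_var] = value
        if value is not None:
            matched += 1
    return matched, len(records) - matched


data = [
    {"pid": 1, "soc": "5323"},
    {"pid": 2, "soc": "2314"},
    {"pid": 3, "soc": "5323"},
    {"pid": 4, "soc": None},  # occupation not recorded
]
print(merge_lookup(data, SCORE_LOOKUP, "soc", "occ_score"))  # (3, 1)
```

A low match rate usually signals a coding mismatch, for example a different SOC version or codes stored with different formatting. It is worth resolving before any analysis.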



For Stata users: Download the file gesde_illustrations_stata.do, and work through its 'part 1'.

For R users: Download the file gesde_illustrations_R.R, and work through its 'part 1'.

For SPSS users: Download the file gesde_illustrations_SPSS.sps, and work through its 'part 1'.



Relevant data files and syntax files are available at:


On lab machines at the Stirling workshop of 30th January 2012:


Stata is available at:


SPSS is available at:


R is available at:








2.3) An illustrative application in analysing social science variable operationalisations



Following 'Part 2' of each of the relevant syntax files, and the comments written within the command files, run regression models using the different files and reflect upon the features of the variable operationalisations.



Here are some interesting results from the models run in this section:

Model                                N      Pseudo-R2   Log-lik.   BIC
Demographic vars only                1083   0.013       -700       1435
+ CAMSIS                             1051   0.049       -655       1352
+ NS-SEC full                        1051   0.044       -659       1401
+ NS-SEC simple                      1051   0.034       -665       1372
+ Educ (2 categ)                     1080   0.043       -676       1395
+ Educ (4 categ)                     1080   0.044       -676       1407
+ Ethnicity (scale)                  1083   0.016       -698       1437
+ Ethnicity (9 categ)                1069   0.014       -693       1448
+ CAMSIS + Educ (2) + Eth (scale)    1048   0.060       -646       1348
+ NS-SEC full + Educ (4) + Eth (9)   1038   0.061       -641       1413
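When reading the table, note that the Bayesian Information Criterion is usually defined as BIC = -2 ln L + k ln N, where L is the likelihood, k the number of parameters and N the sample size. Lower values indicate the preferred model. BIC is only strictly comparable between models fitted to the same cases. N varies across the rows above, presumably because added variables have missing values.

The Python below transcribes N and BIC from the table and picks the lowest-BIC model within each N group. Equal N is a necessary but not a sufficient check that two models used the same cases. Within the N=1051 group, for instance, the CAMSIS model has a lower BIC (1352) than either NS-SEC version (1372 and 1401).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (model, N, BIC) transcribed from the results table above
MODELS: List[Tuple[str, int, int]] = [
    ("Demographic vars only", 1083, 1435),
    ("+ CAMSIS", 1051, 1352),
    ("+ NS-SEC full", 1051, 1401),
    ("+ NS-SEC simple", 1051, 1372),
    ("+ Educ (2 categ)", 1080, 1395),
    ("+ Educ (4 categ)", 1080, 1407),
    ("+ Ethnicity (scale)", 1083, 1437),
    ("+ Ethnicity (9 categ)", 1069, 1448),
    ("+ CAMSIS + Educ (2) + Eth (scale)", 1048, 1348),
    ("+ NS-SEC full + Educ (4) + Eth (9)", 1038, 1413),
]


def best_by_sample_size(models: List[Tuple[str, int, int]]) -> Dict[int, str]:
    """Group models by N and return the lowest-BIC model within each group.

    Equal N is a necessary (not sufficient) check that two models were
    fitted to the same cases; BIC should only be compared within a group.
    """
    groups: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for name, n, bic in models:
        groups[n].append((bic, name))
    return {n: min(rows)[1] for n, rows in groups.items()}


best = best_by_sample_size(MODELS)
for n in sorted(best):
    print(n, best[n])
```

Singleton groups (N=1038, 1048, 1069) have nothing to be compared with under this rule. Comparing them fairly would mean refitting the models on a common set of cases.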










Further details and references


Some further examples of dealing with variable operationalisations in the social sciences:


- The DAMES Node webpages feature some generic illustrative examples of handling data files and operationalising variables, at http://www.dames.org.uk/workshops/data_management_help.html


- In our DAMES Node workshops we have prepared many other example files showing file handling and variable operationalisations in action. See especially the workshop 'Documentation and workflows in social science research', with materials available from: www.dames.org.uk/workshops/index.html#workflows_aut10


- We have written a technical paper which includes Stata syntax examples showing data handling and analysis in a study of the impact of variable operationalisations, for an illustrative research example concerning UK Higher Education RAE scores (see Lambert and Gayle 2008)

- There is an extended worked example, in SPSS syntax, of applications using data on occupations, educational qualifications and ethnicity in the syntax file 'gesde_examples.sps'. The example also requires some data files, available from the UK Data Archive, which are prepared using the additional syntax file gesde_workshop_data_setup* (see www.dames.org.uk/workshops/index.html#leeds_jun10).

- In a forthcoming edited book featuring contributions from DAMES Node researchers, several chapters discuss and illustrate the operationalisation of 'key variables' in sociological research examples (see Lambert et al. 2012)



References cited


Burgess, R. G. (Ed.). (1986). Key Variables in Social Investigation. London: Routledge.

Ganzeboom, H. B. G. (2005). On the Cost of Being Crude: A Comparison of Detailed and Coarse Occupational Coding. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 241-257). Mannheim: ZUMA, Nachrichten Spezial.

Jonsson, J. O., Grusky, D. B., Di Carlo, M., Pollak, R., & Brinton, M. C. (2009). Microclass Mobility: Social Reproduction in Four Countries. American Journal of Sociology, 114(4), 977-1036.


Lambert, P. S., Connelly, R., Blackburn, R. M., & Gayle, V. (Eds.). (2012). Social Stratification: Trends and Processes. Aldershot: Ashgate.

Lambert, P. S., & Gayle, V. (2008). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical Paper 2008-3 of the Data Management through e-Social Science Research Node (www.dames.org.uk).

Lambert, P. S., Tan, K. L. L., Turner, K. J., Gayle, V., Prandy, K., & Sinnott, R. O. (2007). Data Curation Standards and Social Science Occupational Information Resources. International Journal of Digital Curation, 2(1), 73-91.

Lambert, P. S., Warner, G. C., Doherty, T., McCafferty, S., Watt, J., Comerford, M., et al. (2011). Collaborative systems for enhancing the analysis of social surveys: the Grid Enabled Specialist Data Environments. Paper presented at the New Techniques and Technologies for Statistics conference, ESTAT, Brussels, 22-24 February 2011 (http://www.ntts2011.eu/; http://www.dames.org.uk/docs/conf_papers/estat/NTTS-2011_lambert_et_al.pdf).


Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666.

Schneider, S. L. (2010). Nominal comparability is not enough: (In-)Equivalence of construct validity of cross-national measures of educational attainment in the European Social Survey. Research in Social Stratification and Mobility, 28(3), 343-357.


Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05.


Stacey, M. (Ed.). (1969). Comparability in Social Research. London: Heinemann (on behalf of the British Sociological Association).