Dealing with variables: Resources and topics in enhancing secondary survey data

Paul Lambert

University of Stirling

DAMES research Node, www.dames.org.uk

Part of session 17, ‘Resources (i): Resources for data management’, 6 July 2010

4th ESRC Research Methods Festival, St Catherine’s College, Oxford, 5-8 July 2010



1) ‘Rigorous and vigorous’ approaches to dealing with variables

2) Three specialist topics: The GESDE services for data on occupations, ethnicity and educational qualifications

…Survey research and variable analysis…


‘Data management’ applied to variables refers to…

the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis


[…DAMES Node..]



Usually performed by social scientists themselves


Pre-analysis tasks (though often revised/updated)


Inputs also from data providers


Usually a substantial component of the work process


But may not be explicitly rewarded (sometimes even penalised..)




a little different from archiving / controlling data itself

Some components in secondary survey research…


Manipulating data


Recoding categories / ‘operationalising’ variables


Linking data


Linking related data (e.g. longitudinal studies)


Combining / enhancing data (e.g. linking micro- and macro-data)



Secure access to data


Linking data with different levels of access permission


Full or restricted access to detailed micro-data


Harmonisation standards


Approaches to linking ‘concepts’ and ‘measures’ (‘indicators’)


Recommendations on particular ‘variable constructions’


Cleaning data


‘missing values’; implausible responses; extreme values

Example: recoding data [use a ‘recode’ or file matching routine]



Count: highest educational qualification (rows) by educ4 (columns)

educ4 categories: -9.00; 1.00 Degree; 2.00 Diploma; 3.00 Higher school or vocational; 4.00 School level or below

Highest educational qualification    -9.00   1.00   2.00   3.00   4.00   Total
-9 Missing or wild                     323      0      0      0      0     323
-7 Proxy respondent                    982      0      0      0      0     982
1 Higher Degree                          0    425      0      0      0     425
2 First Degree                           0   1597      0      0      0    1597
3 Teaching QF                            0      0    340      0      0     340
4 Other Higher QF                        0      0   3434      0      0    3434
5 Nursing QF                             0      0    161      0      0     161
6 GCE A Levels                           0      0      0   1811      0    1811
7 GCE O Levels or Equiv                  0      0      0      0   2518    2518
8 Commercial QF, No O Levels             0      0      0    331      0     331
9 CSE Grade 2-5, Scot Grade 4-5          0      0      0      0    421     421
10 Apprenticeship                        0      0      0    257      0     257
11 Other QF                            102      0      0      0      0     102
12 No QF                                 0      0      0      0   2787    2787
13 Still At School No QF               138      0      0      0      0     138
Total                                 1545   2022   3935   2399   5726   15627
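The recode shown in the crosstab above can be written as an explicit lookup table, which then doubles as documentation of the decision. This is a minimal Python sketch, assuming the source-to-educ4 mapping as read off the crosstab; the names EDUC4_MAP, recode_educ4 and crosstab are illustrative and do not come from the original slides.

```python
from collections import Counter

# Source qualification code -> educ4 code, as read off the crosstab.
EDUC4_MAP = {
    -9: -9, -7: -9, 11: -9, 13: -9,  # missing, proxy, other QF, still at school
    1: 1, 2: 1,                      # 1 Degree
    3: 2, 4: 2, 5: 2,                # 2 Diploma
    6: 3, 8: 3, 10: 3,               # 3 Higher school or vocational
    7: 4, 9: 4, 12: 4,               # 4 School level or below
}

def recode_educ4(code):
    """Return the educ4 category; an unmapped code raises KeyError."""
    return EDUC4_MAP[code]

def crosstab(codes):
    """Count (source code, educ4 code) pairs to check the recode."""
    return Counter((c, recode_educ4(c)) for c in codes)

# Hypothetical input values, for illustration only.
check = crosstab([1, 2, 2, 7, 12, -7])
```

Raising an error on an unmapped code, rather than silently passing it through, is one way to make sure every category is handled deliberately.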
..plus the centrality of keeping clear records of DM activities

Reproducible (for self)

Replicable (for all)

Paper trail for whole lifecycle

Cf. Dale 2006; Freese 2007

In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)

Syntax Examples: www.dames.org.uk/workshops/, www.longitudinal.stir.ac.uk

Some provocative examples for the UK…


Social mobility is increasing, not decreasing!!


Popularity of controversial findings associated with Blanden et al (2004)


Contradicted by wider ranging datasets and/or better measures of stratification position


DM: researchers ought to be able to more easily access wider data and better variables



Degrees, MSc’s and PhD’s are getting easier


{or at least, more people are getting such qualifications}


Correlates with measures of education are changing over time


DM: facility in identifying qualification categories & standardising their relative value within
age/cohort/gender distributions isn’t, but should, and could, be widespread




‘Black-Caribbeans’ are not disappearing



As the 1948-70 immigrant cohort ages, the ‘Black-Caribbean’ group is decreasingly prominent due to return migration and social integration of immigrant descendants


Data collectors under pressure to measure large groups only


DM: It ought to be possible to harmonise measures of ethnicity over time, and to build richer
data resources with more cases (e.g. by merging survey data)



People interpreted the RAE wrongly!



Most responses to the RAE 2008 involved comparing GPA scores between subject areas within and/or across institutions; but standardising relative to subject area distribution, or scaling by subject area, often gives very different results.


DM: see Lambert and Gayle (2008) for a demo of alternative uses of RAE data

What might a rigorous and vigorous
variable analysis look like?

..open to debate but I’d nominate:




Replicability


Features a pro-active review of variables


Review a full set of alternative measures


Review alternative functional forms


Attention to distribution/standardisation


Attention to harmonisation





How should I make my work replicable?


The concept of a ‘workflow’ is a useful device
for documenting a survey research project



Workflows involve organising materials as a series of
interrelated but distinctive components


In survey research, software syntax files make excellent
templates for documenting our work in component elements

[Long, 2009; Treiman, 2009; Altman & Franklin, 2010; Kulas, 2008]


Computer science researchers have developed workflow depositories [e.g. MyExperiment] and workflow capture tools [e.g. Taverna]



Ad hoc organisation of a workflow as a ‘master file’ in Stata
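The idea of a ‘master file’ is that one short script runs every component of the project in a fixed, documented order. The slides show this in Stata; the following is only a Python analogue, sketched under the assumption that each component can be treated as a function, and the step names and functions (load, recode, analyse) are hypothetical.

```python
def run_workflow(steps):
    """Run (name, function) pairs in order.

    Each function receives the previous step's output. Returns the final
    result together with a log of the step names that were run.
    """
    log = []
    data = None
    for name, func in steps:
        data = func(data)
        log.append(name)
    return data, log

# Hypothetical three-step project, for illustration only.
def load(_):
    return [{"id": 1, "qual": 2}, {"id": 2, "qual": 12}]

def recode(rows):
    # Flag degree holders (codes 1 and 2).
    return [{**r, "degree": 1 if r["qual"] in (1, 2) else 0} for r in rows]

def analyse(rows):
    # Proportion with a degree.
    return sum(r["degree"] for r in rows) / len(rows)

result, log = run_workflow([
    ("01_load", load),
    ("02_recode", recode),
    ("03_analyse", analyse),
])
```

Numbering the step names keeps the intended order visible even when the components live in separate files.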

Forthcoming workshop: ‘Documentation and workflows for social survey research’, University of Stirling, 1-2 September 2010, see www.dames.org.uk

A workflow summary in Excel (following Long, 2009)


How should I review variables/functional
forms/distributions/harmonisations?


We tend to rely on personal expertise in
particular subject domains


Expertise of the depositor of the data


Expertise of the analyst

Some textbooks and other capacity building events cover these topics generically [e.g. Treiman 2009], but by and large they are unduly neglected in methodological training


…Something called ‘e-Science’ can help with both variable reviews and replication…


The ‘e-Social Science’ endeavour


see http://www.merc.ac.uk/ for up-to-date links


A number of UK projects seeking to improve social
science research by capitalising on emerging
computer science techniques


Handling distributed data; collaborative technologies;
large and complex data; secure data



The ‘Grid’ embodies these technologies, but more generic terms like ‘e-Social Science’ & ‘Digital Social Research’ are increasingly preferred


GESDE: ‘Grid Enabled Specialist Data Environments’

Example: Understanding New Forms of
Digital Records (DReSS)
http://web.mac.com/andy.crabtree/NCeSS_Digital_Records_Node/DReSS.html



[Figure: DReSS example combining transcribed talk, audio, video, digital records, system logs and location data; panels labelled transcript, code tree, video and system log]

This session part-organised by the ‘Data Management through e-Social Science’ node


DAMES


www.dames.org.uk


ESRC Node funded 2008-2011




Aim: Useful social science provisions by exploiting
tools for data management developed in computer
science. Core components are:


Data curation tool


Data fusion tool


Portals for access to data and data resources


Data curation tool collects metadata and allows data resources of
different formats to be organised in an accessible depository


Data fusion tool
supports merging of data files through shared variables

(e.g. for recodes, aggregations, pooling data, linking related data, probabilistic linkages)

External user (micro-social data)

id   oug   sex   .
1    110   1     .
2    320   1     .
3    320   2     .
4    874   1     .
5    874   2     .

Occ info (index file) (aggregate)

oug   CS-M   CS-F   EGP
110   60     58     I
320   69     71     II
874   39     51     VIIa

User’s output (micro-social data)

id   oug   CS   .
1    110   60   .
2    320   69   .
3    320   71   .
4    874   39   .
5    874   51   .
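The fusion step illustrated above is a lookup of each person's occupational unit group (oug) in an aggregate index file, choosing the male or female scale score (CS-M or CS-F) by sex. Below is a minimal Python sketch using the values from the illustration; the function name fuse, and the assumption that sex code 1 means male, are ours rather than the data fusion tool's.

```python
# User's micro-data: one row per person.
micro = [
    {"id": 1, "oug": 110, "sex": 1},
    {"id": 2, "oug": 320, "sex": 1},
    {"id": 3, "oug": 320, "sex": 2},
    {"id": 4, "oug": 874, "sex": 1},
    {"id": 5, "oug": 874, "sex": 2},
]

# Occupational index file: one row per occupational unit group.
index = {
    110: {"CS-M": 60, "CS-F": 58, "EGP": "I"},
    320: {"CS-M": 69, "CS-F": 71, "EGP": "II"},
    874: {"CS-M": 39, "CS-F": 51, "EGP": "VIIa"},
}

def fuse(micro_rows, index_rows, male_code=1):
    """Attach a sex-specific scale score (CS) to each micro-data row.

    Rows whose oug is not in the index keep CS = None, so that failed
    matches remain visible instead of being dropped.
    """
    out = []
    for row in micro_rows:
        occ = index_rows.get(row["oug"])
        if occ is None:
            cs = None
        elif row["sex"] == male_code:
            cs = occ["CS-M"]
        else:
            cs = occ["CS-F"]
        out.append({"id": row["id"], "oug": row["oug"], "CS": cs})
    return out

fused = fuse(micro, index)
```

Keeping unmatched rows with an empty score is a deliberate choice here: it leaves a record of which occupational codes the index file could not supply.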

GEMDE


Example of a ‘portal’ for distributing and accessing supplementary data related to ethnicity

2) Special Topics: The GESDE services for sociological classifications


‘Key variables’ in social science research are not just for sociology, but are much debated there


Complex categorical measures and ‘variable operationalisation’ recommendations/debates


Individual level measures of social positioning…



‘GESDE’ = 3 related online services which are “Grid Enabled Specialist Data Environments”


GEODE: the ‘o’ is for data on Occupations


GEEDE: the ‘e’ is for data on Educational qualifications


GEMDE: the ‘m’ is for data on ethnic Minorities

Our contribution in GESDE..


Many existing resources on these topics
[See app.]


Academic reviews and projects


[e.g. Rose & Harrison 2010; Ganzeboom, 2008; Schneider, 2008; Guveli, 2006]


Service providers


[e.g. ESDS variable guides; CESSDA-PPP]


National Statistics Institutes’ guidelines


[e.g. www.ons.gov.uk/about-statistics/harmonisation/]



It’d be good if more people were engaging with and
exploiting these resources to enhance their own data..!


At the centre of this are problems of
standardizing categorical data



‘Measurement equivalence’ (e.g. van Deth, 2003) is often not feasible for complex categorical measures


For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence (because of non-linear relations between categories and shifting underlying distributions) (even if measurement equivalence seems possible)



Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context


For categorical data, this can be achieved/approximated by scaling categories in one or more dimensions of difference
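Arithmetic standardisation within a context can be illustrated by converting a score to a z-score relative to the respondent's own group (for example a birth cohort). This is a minimal Python sketch, assuming population standard deviations within each group; the function name, field names and data values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardise_within(records, value_key, group_key):
    """Add a 'z' field: the value's position relative to its own group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])
    # Mean and population standard deviation per group.
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    out = []
    for r in records:
        m, s = stats[r[group_key]]
        z = (r[value_key] - m) / s if s > 0 else 0.0
        out.append({**r, "z": z})
    return out

# Hypothetical scores for two cohorts with different distributions.
people = [
    {"cohort": "A", "score": 10}, {"cohort": "A", "score": 20},
    {"cohort": "B", "score": 30}, {"cohort": "B", "score": 50},
]
scored = standardise_within(people, "score", "cohort")
```

In this toy data a score of 20 in cohort A and a score of 50 in cohort B receive the same standardised value, which is the sense in which the measure has equivalent meaning across contexts.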




[Figure: ‘Effect proportional scaling’ using parents’ occupational advantage. Categories of occupation, educational qualification and ethnic group are plotted by father's Cambridge Scale score. Points at 1-3 show category mean. Points at 0 show individual values (scaled mean=28, sd=6; pop. mean=28, sd=18). Source: British Household Panel Survey 2007, adults aged 18+ and father's Cambridge Scale score.]

What was that then?



We can represent categories through positions on a scale


In turn, we can use position in the dimension as a category
score which then plugs into a further analysis (e.g.
regression main and interaction effects)


..E.g. some options for data on ethnicity..


Stereotyped Ordered Logistic Regression (SOR) models, summarize
dimensions of difference according to regression predictor values



[e.g. Lambert and Penn, 2001]


Geometric data analysis for distances between people, or things


[cf. Prandy, 1979; Bennett et al., 2009]


Assign category scores by hand (a priori or by selected average)
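Scoring categories ‘by selected average’ can be sketched as giving each category the mean of an external criterion variable among its members, in the spirit of the effect proportional scaling figure, where category points show the mean father's Cambridge Scale score. This is a minimal Python illustration; the function name, field names and data values are hypothetical.

```python
from collections import defaultdict

def category_scores(records, category_key, criterion_key):
    """Return {category: mean of the criterion variable in that category}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        sums[r[category_key]] += r[criterion_key]
        counts[r[category_key]] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical respondents: qualification category and father's scale score.
sample = [
    {"qual": "first degree", "father_cs": 40.0},
    {"qual": "first degree", "father_cs": 30.0},
    {"qual": "no qf", "father_cs": 20.0},
    {"qual": "no qf", "father_cs": 24.0},
]
scores = category_scores(sample, "qual", "father_cs")

# The category score then replaces the category label in later analysis.
scored_sample = [{**r, "qual_score": scores[r["qual"]]} for r in sample]
```

The resulting score can enter a regression as a single metric term in place of a set of dummy variables, which is what makes it convenient for main and interaction effects.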


[Figure: Correspondence analysis dimension scores, LFS pooled data for men, 1991-2005. Dimension 1 (58.4%) by Dimension 2 (22.1%). Points show ethnic groups by migration status (a = Born in UK; b = Came to UK before 1970; c = came to UK 1970 or later) and occupational categories (I/II, IIIa, IVabc, V/VI, VII/IIIb, Unemployed, Inactive). N=640295 (Data: Li and Heath, 2008)]
2(a) Data on occupations


Occupational unit groups = standardised lists of occupational titles


E.g. via CASCOT, www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/

..data on occupations..


find ways of attaching summary information about occupations to occupational unit groups

[Figure: Frequency of male and female current jobs across CAMSIS vingtiles (maximum: 5799) and across NS-SEC categories, from routine occupations to higher managerial and professional (maximum: 9764). Source: Labour Force Survey Jan-Mar 2008, current job of employed (18yrs+)]

Comparability problems => value of documenting methods & comparing alternatives


[Figure: Goldthorpe class scheme harmonised over time. Percent of year category in each class (Salariat, Non-manual, Petty-bourg., Skilled manual, Unskilled), 1972-2005. Source: Females from LFS/GHS, using data from Li and Heath (2008)]
GEODE: Our contribution


GEODE acts as a library style service for access
to ‘occupational information resources’


We encourage people to supply data they’ve
produced, and we upload data ourselves


Researchers are encouraged to use the portal to
find and exploit suitable data


Services: search, browse, deposit data,
link data, user ratings

GEODE (v1)

Occupational data

Using occupational data: Example as a measure of marked social disadvantage, Lambert & Gayle (2009)

[Figure: MCAMSIS distributions (0-100) for BHPS adults' most recent job and for BHPS adults' fathers, with thresholds marked at 50% Med, 60% Med and Mean - (1SD - Skew)]

[Example: Occupational not geographical inequality]


[Figure: Geography of occupational advantage, 2001 Census, for Scotland and Central Scotland. Points show percentile mean average CAMSIS score for males in work (bands 0-20%, 21-40%, 41-60%, 61-80%, 81-90%, 91%+). Source: CASWEB, Census 2001 Output areas.]

2(b) Data on educational qualifications


Similar issues arise with the use of
educational data


Specialist resources exist which can enhance
measures of educational data


Many users aren’t aware of alternative coding
schemes or harmonised approaches



GEEDE acts as a service for bringing
together and disseminating relevant data
resources on educational measures

Example: recoding data

[Crosstab repeated from the earlier recoding example: highest educational qualification by educ4, total count 15627]

Family and Working Lives Survey (54 vars per educ record)


2(c) Data on ethnicity


We can conceive of similar information
resources and data analysis requirements
for measures of ethnicity


There are generally fewer published resources /
agreed standards in this domain



GEMDE publishes resources but puts more emphasis
on understanding complex ethnicity data

…working with ethnicity data in surveys is hard…!

- It’s sparse

- It’s collinear (e.g. with age, location)

- It’s dynamic (cf. comparative research)

EFFNATIS sample (1999): Subjective ethnic identity [Heckmann et al., 2001]

Q.129 Which of the following best describes how you would describe yourself?

(Labels are shown as truncated in the source output; the listing covers codes 1 to 90.)

Freq.  Percent     Cum.   Response
  137    16.65    16.65   1. British
   79     9.60    26.25   2. English
   25     3.04    29.28   3. Pakistani
   10     1.22    30.50   4. Indian
   10     1.22    31.71   5. Bangladeshi
   73     8.87    40.58   6. Pakistani-British
   34     4.13    44.71   7. Indian-British
   22     2.67    47.39   8. Bangladeshi-British
    6     0.73    48.12   9. Asian
    7     0.85    48.97   10. White
    6     0.73    49.70   11. European
   60     7.29    56.99   13. White-British
   65     7.90    64.88   15. Asian-British
    7     0.85    65.74   16. Cosmopolitan
   12     1.46    67.19   17. Moslem
    2     0.24    67.44   18. Indian, Asian (4,9)
    6     0.73    68.17   19. British Moslem
    1     0.12    68.29   21. Indian & British (1,4)
    9     1.09    69.38   22. Indian-British, Asian-British (7,15
   54     6.56    75.94   23. English & White (2,10)
   41     4.98    80.92   24. English, White-British (2,13)
    5     0.61    81.53   25. White-British & Cosmopolitan (13,16
    8     0.97    82.50   26. English & White-British (2,13)
    1     0.12    82.62   27. White-Italian
    7     0.85    83.48   28. European and White-British (11,13)
   20     2.43    85.91   29. English, European and White-British
    1     0.12    86.03   30. English, White, Cosmopolitan (2,10,
    1     0.12    86.15   31. English, Indian-British, Asian-Brit
    6     0.73    86.88   32. Pakistani-British, Asian-British (6
    3     0.36    87.24   33. Pakistani, Asian (3,9)
    4     0.49    87.73   34. Kashmiri
    1     0.12    87.85   35. Pakistani-Moslem living in Great Br
    1     0.12    87.97   36. Pakistani-British, Black-British, A
    1     0.12    88.09   37. Pakistani-British, Asian, Black-Bri
    1     0.12    88.21   38. Pakistani, Black (3,12)
    1     0.12    88.34   39. Pakistani-born British
    2     0.24    88.58   40. Pakistani-British, Asian-British (6
    2     0.24    88.82   41. Black, Asian-British (12,15)
    6     0.73    89.55   42. Pakistani-British, Asian (6,9)
    1     0.12    89.67   43. English, Indian-British, Asian-Brit
    2     0.24    89.91   44. English, Pakistani-British, Indian-
    1     0.12    90.04   45. Neapolitan
    1     0.12    90.16   46. Indian, Asian, Asian-British (4,9,1
    5     0.61    90.77   47. Indian-British, Asian (7,9)
    2     0.24    91.01   48. English, Indian-British, Black, Asi
    2     0.24    91.25   49. English, Indian-British (2,7)
    2     0.24    91.49   50. English, Asian-British (2,15)
    1     0.12    91.62   51. Pakistani-British, Indian-British (
    1     0.12    91.74   52. English, Indian, Asian-British (2,4
    9     1.09    92.83   53. British, English (1,2)
    2     0.24    93.07   54. British, White (1,10)
    1     0.12    93.20   55. Scottish
    2     0.24    93.44   56. English, European (2,11)
    2     0.24    93.68   58. Individual
    3     0.36    94.05   59. English, European, White-British, C
    1     0.12    94.17   60. British, White, European, Cosmopoli
    2     0.24    94.41   61. English, Pakistani-British (2,6)
    2     0.24    94.65   62. Pakistani-British, Black, Asian-Bri
    1     0.12    94.78   63. Bangladeshi, Asian, Black (5,9,12)
    2     0.24    95.02   64. British, European (1,11)
    3     0.36    95.38   65. Lancastrian
    1     0.12    95.50   66. Humanoid
    1     0.12    95.63   67. Pakistani, Asian-British (3,15)
    1     0.12    95.75   68. Pakistani, Asian-British, Moslem, F
    1     0.12    95.87   69. English, White, Irish
    3     0.36    96.23   70. English, White, European (2,10,11)
    1     0.12    96.35   71. Irish-English
    2     0.24    96.60   72. English, White-British, Cosmopolita
    1     0.12    96.72   73. British with Baltic-Slav origins
    3     0.36    97.08   77. English, Pakistani (2,3)
    1     0.12    97.21   78. Pakistani-British, Asian, Moslem
    1     0.12    97.33   79. English, Indian-British, White, Asi
    2     0.24    97.57   80. Human being
    1     0.12    97.69   82. English, Pakistani-British, Asian (
    1     0.12    97.81   83. Pakistani-British, Italian
    1     0.12    97.93   85. European, White-British, Irish
    4     0.49    98.42   86. English, Indian (2,4)
    1     0.12    98.54   87. Indian-British, Black-British (7,14
    1     0.12    98.66   88. Indian, Black, Asian-British (4,12,
    1     0.12    98.78   89. Indian, Asian-British (4,15)
    1     0.12    98.91   90. White-British, Ukrainian

A ‘data management’ contribution


Preserve information on what was done with categorical data


Communicate information on what should/could be done

GEMDE seeks to promote replicability /
transparency…


Document your own recodes


Access somebody else’s recodes


Identify commonly used recodes (& use them..!)




..and making complex analysis of ethnicity
data easier..


Organising complex categorical data


Labelling, recoding, etc


Effect proportional scaling


Standardisation


Interaction terms


[Figure: SOR model dimension scores for BHPS ethnic groups (1. White, 2. Mixed, 3. Indian, 4. Pakistani, 5. Bangladeshi, 6. Other Asian, 7. Black-Caribbean, 8. Black African, 9. Other Black, 10. Chinese, 11. Other ethnic group), on a scale from -2 to 2. Identified principally by age, gender attitudes and household income. Source: BHPS wave 17, n = 12626, % 'White' = 97.3]
The GEODE model for GEMDE?


….A service for MUGs and MIRs…

o Define/register ‘Minority Unit Groups’

o Define/register ‘Minority Information Resources’

o Explore data resources and obtain help in approaching analysis of complex, sparse data

What's a MIR?

'Minority Information Resource'.

o This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' that we used on GEODE

E.g. summary statistical data about the categories, and documentation or information

E.g. recodings which have been used in a particular study

o Social scientists are not in general aware of the existence of MIRs (cf. the widespread use of popular Occupational Information Resources). In GEMDE we seek to publicise little-known resources and promote their uptake: we argue that better communication and dissemination of MIRs is in fact an important step towards better scientific practice of replication and standardisation of research.


In our terms, every MIR necessarily links to a MUG (but not
every MUG has a MIR).
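The constraint that every MIR links to a MUG, while a MUG may have no MIR, can be sketched as a small registry that rejects resources pointing at unregistered classifications. This is an illustrative Python sketch only; the class names, fields and example entries are hypothetical and do not describe how the GEMDE portal is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MUG:
    """A minority unit group classification: a named list of categories."""
    name: str
    categories: tuple

@dataclass(frozen=True)
class MIR:
    """An information resource, always attached to one MUG by name."""
    title: str
    mug_name: str

class Registry:
    def __init__(self):
        self.mugs = {}
        self.mirs = []

    def register_mug(self, mug):
        self.mugs[mug.name] = mug

    def register_mir(self, mir):
        # Every MIR must link to a registered MUG.
        if mir.mug_name not in self.mugs:
            raise ValueError(f"unknown MUG: {mir.mug_name}")
        self.mirs.append(mir)

    def mirs_for(self, mug_name):
        # May be empty: not every MUG has a MIR.
        return [m for m in self.mirs if m.mug_name == mug_name]

# Hypothetical entries, for illustration only.
registry = Registry()
registry.register_mug(MUG("example_11_groups", ("White", "Mixed", "Indian")))
registry.register_mug(MUG("example_2_groups", ("White", "Other")))
registry.register_mir(MIR("Recode used in study X", "example_11_groups"))
```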


The GEMDE portal

‘Liferay portal’ with access to MUGs and MIRs, first release Jan 2010, now available for general use (www.dames.org.uk/gemde)



Shibboleth access for
registered users


Guest level access


Deposit MUGs/MIRs


Search/browse
deposited resources




Feedback on resources
(user ratings)


Review live data (e.g.
pooled LFS records)


Expert and user quality
ratings



Summary: Remind me how these
topics enhance survey data..?


Variable operationalisations can ordinarily be improved by more ‘rigour and vigour’


More transparent operationalisation/documentation


Better use of detailed data


Better ability to include measures in suitably complex models/analysis



The GESDE approach has been to seek technological solutions to the organisation and distribution of complex variable-related information

Data used


Department for Education and Employment. (1997). Family and Working Lives Survey, 1994-1995 [computer file]. Colchester, Essex: UK Data Archive [distributor], SN: 3704.

Heckmann, F., Penn, R. D., & Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg.

Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666.

Office for National Statistics. Social and Vital Statistics Division and Northern Ireland Statistics and Research Agency. Central Survey Unit, Quarterly Labour Force Survey, January - March, 2008 [computer file]. 4th Edition. Colchester, Essex: UK Data Archive [distributor], March 2010. SN: 5851.

University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.

References


Altman, M., & Franklin, C. H. (2010). Managing Social Science Research Data. London: Chapman and Hall.

Bennett, T., Savage, M., Silva, E. B., Warde, A., Gayo-Cal, M., Wright, D., et al. (2009). Culture, Class, Distinction. London: Routledge.

Blanden, J., Goodman, A., Gregg, P., & Machin, S. (2004). Changes in generational mobility in Britain. In M. Corak (Ed.), Generational Income Mobility in North America and Europe (pp. 147-189). Cambridge: Cambridge University Press.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.

Ganzeboom, H. B. G. (2008). Tools for deriving status measures from ISCO-88 and ISCO-68. Retrieved 1 March, 2008, from http://home.fsw.vu.nl/~ganzeboom/PISA/

Guveli, A. (2006). New Social Classes within the Service Class in the Netherlands and Britain: Adjusting the EGP class schema for the technocrats and the social and cultural specialists. Nijmegen: Radboud U. Nijmegen.

Harkness, J., van de Vijver, F. J. R., & Mohler, P. P. (Eds.). (2003). Cross-Cultural Survey Methods. NY: Wiley.

Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.

Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring Attitudes Cross-Nationally. London: Sage.

Kulas, J. T. (2008). SPSS Essentials: Managing and Analyzing Social Sciences Data. New York: Jossey-Bass.

Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk).

Lambert, P. S., & Gayle, V. (2009). 'Escape from Poverty' and Occupations. Colchester, Essex: BHPS Research Conference, 9-11 July 2009, and www.iser.essex.ac.uk/events/conferences/bhps-2009-conference/overview

Lambert, P. S., & Penn, R. D. (2001). SOR models and Ethnicity data in LIS and LES: Country by Country Report. Syracuse University, Syracuse, New York 13244-1020: Luxembourg Income Study Paper No. 260.

Levesque, R., & SPSS Inc. (2010). Programming and Data Management for IBM SPSS Statistics 18: A Guide for PASW Statistics and SAS users. Chicago: SPSS Inc.

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.

Penn, R. D., & Lambert, P. S. (2009). Children of International Migrants in Europe: Comparative Perspectives. Basingstoke: Palgrave.

Prandy, K. (1979). Ethnic discrimination in employment and housing. Ethnic and Racial Studies, 2(1), 66-79.

Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.

Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05.

Rose, D., & Harrison, E. (Eds.). (2010). Social Class in Europe: An Introduction to the European Socio-economic Classification. London: Routledge.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass.

van Deth, J. W. (2003). Using Published Survey Data. In J. A. Harkness et al. (2003) (pp. 329-346).

Appendix


Existing resources - sources and types of support for data management in the social sciences:



Existing resources (i): Data providers

a)
Documentation and metadata files

Existing resources (i): Data providers


b)
Resources for variables


CESSDA PPP on key variables
http://www.nsd.uib.no/cessda/project/



UK Question Bank
http://surveynet.ac.uk/sqb/datacollection/resources.asp



ONS Harmonisation
www.ons.gov.uk/about-statistics/harmonisation/


c)
Resources for datasets


UK Census data portal,
http://census.ac.uk/



IPUMS international census data facilities,
www.ipums.org



European Social Survey,
www.europeansocialsurvey.org


d)
Data manipulations prior to data release


Missing data imputation / documentation


Survey design / weighting information


Influential: most analysts use ‘the archive version’

Existing resources (ii) Resource projects / infrastructures

- UK ESDS, www.esds.ac.uk

ESDS International | ESDS Government | ESDS Longitudinal | ESDS Qualidata

- Helpdesks; online instructions; user support..

- UK ESRC NCRM / NCeSS / RDI initiatives

- Longitudinal data: www.longitudinal.stir.ac.uk

- Linking micro/macro: www.mimas.ac.uk/limmd/

- Other resources / projects / initiatives

- EDACwowe: http://recwowe.vitamib.com/datacentre

Existing resources (iii) Analytical and software support


Textbooks featuring data management


[Levesque & SPSS Inc, 2010] [Altman & Franklin, 2010] [Long, 2009] [Kulas, 2008]



Software training covering DM


Stata’s ‘data management’ manual


SPSS user group course on syntax and data management,
www.spssusers.co.uk



But generally, sustained marginalisation of DM as a topic


Advanced methods texts use simplistic data


Advanced software for analysis isn’t usually combined with extended
DM requirements


Existing resources (iv) Data analysts’ contributions


Academic researchers often generate and
publish their own DM resources, e.g.

Harry Ganzeboom on education and occupations,
http://home.fsw.vu.nl/~ganzeboom/pisa/


Provision of whole or partial syntax programming examples


Analysts often drive wider resource provisions
related to DM

CAMSIS project on occupational scales,
www.camsis.stir.ac.uk


CASMIN project on education and social class


Existing resources (v) Literatures on harmonisation and standardisation


National Statistics Institutes’ principles and practices

E.g. ONS, www.ons.gov.uk/about-statistics/harmonisation/


Cross-national organisations

E.g. UNSTATS, http://unstats.un.org/unsd/class/


Academic studies

E.g. [Harkness et al 2003] [Hoffmeyer-Zlotnik & Wolf 2003] [Jowell et al. 2007] [Schneider, 2008] [Rose and Harrison 2010]