Education for Data Professionals: A Study of Current Courses and Programs


6 Nov 2013


Final Search Terms:


Archiving (digital or data)

Authentication (data)

Conservation (digital or data)

Curation (digital or data)

Cyberinfrastructure

Data access

Data collection

Data discovery

Data mining

Data provenance

Data quality

Data retrieval

Data standards (non CS)

Digital library

Digitization

e-Science

Informatics

Information architecture

Information documenting

Modeling (info. or data)

Management (digital, data, info., or knowledge)

Metadata

Ontology

Policy (digital, data, or info.)

Preservation (digital or data)

Representation (data, info., or knowledge)

Retrieval (digital, data, or info.)

Semantic web

Systems analysis

Methods:

Courses and programs were identified by searching online course catalogs. Searches were limited to courses in library or information schools.

Either the course name or the description had to include a keyword or keyword combination associated with data curation, data science, or data management.

Data curation was broadly defined as the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain data quality, add value, and provide for re-use over time. This field also includes authentication, data standards, archiving, collection and management, preservation, retrieval, knowledge representation, and policy as it affects data.

To further clean and validate the dataset, course descriptions were viewed in context and individuals at each institution were contacted (54.7% response rate).

The dataset contained 476 courses in 158 programs at 55 institutions.

Course and program descriptions were coded separately using ATLAS.ti by selecting every descriptive word or phrase and then grouping codes into families associated with data curation or data science as found in the literature.
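The inclusion rule described in the Methods (a keyword or keyword combination in either the course name or the description) could be sketched roughly as follows. This is only an illustration, not the procedure the authors ran: the `matches_keywords` helper, the shortened keyword list, and the three catalog entries are all hypothetical.

```python
import re

# A handful of terms taken from the poster's search-term list (the full list is longer).
KEYWORDS = ["data curation", "data management", "data mining", "metadata", "digital library"]

def matches_keywords(name, description, keywords=KEYWORDS):
    """Return the keywords found in either the course name or its description."""
    text = f"{name} {description}".lower()
    found = []
    for kw in keywords:
        # Word-boundary match so a term is not picked up inside a longer word.
        if re.search(r"\b" + re.escape(kw) + r"\b", text):
            found.append(kw)
    return found

# Hypothetical catalog entries, not drawn from the study's dataset.
courses = [
    {"name": "Foundations of Data Curation", "description": "Lifecycle management of research data."},
    {"name": "Children's Literature", "description": "Survey of books for young readers."},
    {"name": "Organization of Information", "description": "Metadata schemas and digital library systems."},
]

# Keep a course when at least one keyword appears in its name or description.
selected = [c["name"] for c in courses if matches_keywords(c["name"], c["description"])]
print(selected)
```

A purely automatic match like this would over-select (for example, research methods courses that mention "data"), which is presumably why the descriptions were also read in context and checked with each institution.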


ABSTRACT:

In response to the current data-intensive research environment and the necessity of a professional data workforce, iSchools are building new programs and enhancing existing programs to meet workforce demands in data curation, data management, and data science [1-3]. To understand the state of education in the field, we studied current programs and courses offered at iSchools and other schools of Library and Information Science. Here we present an overview of the methods and results. Courses are divided into four categories: data-centric, data-inclusive, digital, and traditional LIS. The analysis reveals trends in LIS education for data professionals and identifies particular areas of expertise and gaps.

CONCLUSIONS:

iSchools are making important progress on curriculum for educating the data workforce, but there is high dependency on existing digital library curriculum and limited new curriculum specific to “research data” expertise.

11 institutions offer 5 programs specifically focused on data and 12 more covering aspects of data; 15 other institutions have programs with a pronounced emphasis on digital content, but not necessarily data.

There is wide variability in the terminology used to describe courses and concepts.

Most data-centric courses appear new, with entry-level course numbers, and most coverage of data issues and expertise appears to be through revision of existing courses.

Existing digital courses are contributing the most to data-oriented content, covering areas such as representation/modeling and archiving/preservation, but traditional courses make up the majority of courses overall.

Data and digital categorizations apply to both program-level and course-level attributes.

Schools lacking well-defined “digital” or “data” curriculum offer access to some courses of value to students wishing to develop expertise as data professionals.

Further investment in data-centric courses and programs will be essential to support contemporary science and research.


Virgil E. Varvel Jr., Ph.D.

vvarvel@illinois.edu

Elin J. Bammerlin, M.S.L.I.S.

bammerl1@illinois.edu

Carole L. Palmer, Ph.D.

clpalmer@illinois.edu

1. Interagency Working Group on Digital Data. (2009, January). Harnessing the power of digital data for science and society. Washington, DC: Office of Science and Technology Policy. Retrieved from http://www.nitrd.gov/About/Harnessing_Power_Web.pdf

2. Lynch, C. (2008). Big data: How do your data grow? Nature, 455, 28-29.

3. National Science Board. (2010). Grant proposal guide: Chapter II C.2.j. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide

4. Lee, C. (2009). Matrix of digital curation knowledge and competencies. Retrieved from http://www.ils.unc.edu/digccurr/digccurr-matrix.html

Project funded by the Data Conservancy (NSF Award Number OCI-0830976).

Course Category Distribution

Data Centric: 7.6%
Data Inclusive: 10.8%
Digital: 27.0%
Traditional: 54.6%

Course Analysis: To better understand the concentration of data topics in the courses, we divided them into four categories along a continuum:

Data Centric: Focused exclusively on data curation, data management, or data science topics.

Data Inclusive: Having segments devoted to data topics.

Digital: Including digital topics highly relevant for education of data professionals.

Traditional: Covering long-standing areas in LIS curriculum, often giving students an overview of topics.

Data Centric programs were named Information Architecture, Informatics, Data Curation, Knowledge & Data Discovery, and eScience specifically. Data Inclusive programs covered the areas of digital curation; informatics; digital libraries; eScience; information architecture; information, records, content, or knowledge management; and archives & preservation.

Data Centric programs emphasized data discovery, collection, indexing, access, retrieval, representation, sharing, mining, analysis, standards, modeling, policy, management, metrics, preservation, and archiving in their descriptions and course representation.

No trends were found regarding whether courses were required or recommended in programs.

26.2% of courses analyzed were available online, and 50.8% of those were exclusively online. Online-only courses tended to be newer digital or data-centric courses.
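Because the second percentage is a share of the first, it may help to express both against the full course set. The sketch below assumes that "courses analyzed" means all 476 courses in the dataset; the poster does not report the underlying counts, so the rounded counts are approximations rather than reported figures.

```python
TOTAL_COURSES = 476      # dataset size stated in the Methods
ONLINE_SHARE = 0.262     # share of courses available online
EXCLUSIVE_SHARE = 0.508  # share of the online courses offered only online

# Chain the two percentages to get online-only courses as a share of all courses.
exclusive_of_all = ONLINE_SHARE * EXCLUSIVE_SHARE

# Approximate counts, assuming both percentages refer to the full 476 courses.
approx_online = round(TOTAL_COURSES * ONLINE_SHARE)
approx_exclusive = round(TOTAL_COURSES * exclusive_of_all)

print(f"{exclusive_of_all:.1%} of all courses were online-only")
print(f"about {approx_online} online courses, about {approx_exclusive} of them online-only")
```

Under that assumption, roughly 13% of all courses, or about one in eight, were offered only online.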


There were over 800 different terms in 13 families of concepts.

The most common terms were Metadata, Preservation, Retrieval, Archives, Management, Organization, Indexing, Human Computer Interaction, and Digital Library. Each had sub-entries, such as Management (Asset, Digital, Data, Electronic, Information, Knowledge, Records, Systems, & Theory).

172 course descriptions specifically mention the word “data” in some context; however, some were research methods courses.

Data Mining was the most common usage of data in data-centric or data-inclusive courses. Representation & Modeling (which included Metadata codes) and Management (which included Data Management) were the next most common occurrences of data in course descriptions.

Management was the most common concept represented in courses and by institution, followed by Representation & Modeling; Information Systems and Administration; and Discovery, Access, & Use, coinciding with search term representation in course descriptions.

No institution represented every concept area. On average, 6.0 concepts were represented per institution, with no institution representing more than 11 of the 13 code families.

Despite search criteria, most courses were still traditional courses that covered some form of data curation topic.

Center for Informatics Research in Science and Scholarship

Graduate School of Library and Information Science

University of Illinois at Urbana-Champaign

[Figure: Concept Representation in Courses (out of 476 courses). One bar per code family (13 families); bar values in descending order: 222, 170, 166, 165, 99, 80, 75, 44, 33, 19, 19, 15, 9.]

[Figure: Concept Areas Per Institution. Axes: Total Content Areas Represented (1 to 13) and Number of Institutions. Average = 6.0.]

[Figure: Concept Representation by Institution (n=53). Axis: Number of Institutions. One bar per code family (13 families); bar values in descending order: 50, 46, 42, 36, 32, 25, 22, 16, 14, 13, 10, 6, 6.]
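The reported average of 6.0 concept areas per institution can be cross-checked against the by-institution chart. The sketch below assumes that the 13 bar values are the number of institutions covering each code family and that n = 53 is the right denominator; which bar belongs to which family does not matter for the sum.

```python
# Bar values from "Concept Representation by Institution", one per code family.
institutions_per_family = [50, 46, 42, 36, 32, 25, 22, 16, 14, 13, 10, 6, 6]
N_INSTITUTIONS = 53  # the n given in the chart title

# Each (institution, family) pair is counted once, so the sum over families
# equals the total number of concept areas covered across all institutions.
total_pairs = sum(institutions_per_family)
average_areas = total_pairs / N_INSTITUTIONS

print(total_pairs, average_areas)
```

The sum is 318 institution-family pairs, and 318 / 53 gives exactly 6.0, which agrees with the stated average.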

Code Family \ Course Type               Traditional   Digital   Data Inclusive   Data Centric
Representation & Modeling                     45.8%     30.1%            16.3%           7.8%
Management                                    57.9%     25.2%             9.4%           7.4%
Discovery, Access & Use                       53.3%     28.3%            11.8%           6.6%
Policy & Social Aspects                       40.8%     42.1%            10.5%           6.6%
Selection & Collection Development            36.2%     46.8%            10.6%           6.4%
Project & Organization Management             55.9%     35.3%             2.9%           5.9%
Data Quality                                  22.2%     55.6%            16.7%           5.6%
Info. Sys. & Systems Administration           61.4%     18.3%            15.7%           4.6%
Archiving                                     53.2%     33.8%            10.4%           2.6%
Preservation & Conservation                   53.5%     39.4%             6.1%           1.0%
Digitization                                  12.9%     83.9%             3.2%           0.0%
Scholarly Communication                       55.6%     33.3%            11.1%           0.0%
Data Mining                                    5.6%      0.0%            33.3%          61.1%

[Figure: Program Descriptions Highlighting Data or Digital Aspects. Categories: Data-Centric, Data-Inclusive, Digital (lines = data total). Series: Masters, CAS, Concentration. Bar values as extracted: 1, 4, 14; 2, 2, 13; 2, 6, 23.]
These course and program data can be searched at http://cirssweb.lis.illinois.edu/DCCourseScan1/index.html

Keywords were derived from the definition, from the current literature on data curation, and by consulting the Matrix of Digital Curation Knowledge and Competencies [4]. They evolved as new and relevant terms were identified.