Faculty Research Data: Informatics and Archiving



ECURE 2005, Phoenix, AZ




Sarah M. Pritchard

University Librarian

University of California, Santa Barbara


March 1, 2005



Informatics: A Definition


The study of the structure and behavior of natural
and artificial systems designed to process data



Development of tools to ingest and interpret large
stores of data in heterogeneous and distributed
systems



Integration of data (numeric, textual, image,
spatial) with tools for modeling, trend analysis,
mapping, image processing, etc.



Business applications not studied in this context



Informatics at UCSB


Emergence of informatics as a specialty in several
academic departments, notably environmental
sciences


Highly interdisciplinary faculty


Development of unique stand-alone systems for
managing collaborative research data


No ongoing mechanisms for communication and
technical coordination


Campus and consortial projects emerging for
digital publications and for instructional support
but not yet for research data



Faculty Research Data



Large numeric data sets from physical
sciences and laboratory research


Imaging: geosciences, neurosciences


Fieldwork: environmental, archaeological


Customized interpretive and manipulation
tools


Drafts, correspondence, notes



UCSB Computing Environment


One of the original nodes of the Internet



No centralized academic computing organization



Offices for networking, and for instructional
support



Individual colleges and departments have
developed own servers and support for research
data and teaching tools



High-level campus policy board for IT issues brings
some coordination





UCSB Library Context


Alexandria Digital Library
(www.alexandria.ucsb.edu)


Extension into new disciplinary applications


Heterogeneous metadata ingest


Extensive backup and archiving architecture


Long record of faculty collaboration


NDIIPP


California Digital Library (www.cdlib.org)


Digital preservation initiatives for published documents and for
(under development) government information web sites


eScholarship program to support publication of online journals,
preprint archives


Online Archive of California: special collections support


Other faculty support


Electronic reserves including streaming audio reserves


Digital document delivery to the desktop



What questions emerge from this?



Why are faculty building informatics systems?


Is valuable research time and funding being spent
on tangential work?


Are there commonalities across informatics
applications and disciplines?


Is there redundancy in tool development?


Can data be openly accessed or shared?


Are digital library concerns (metadata, IP rights,
archiving) incorporated?



Informatics Project Goals


Create stronger linkages among relevant faculty
research projects


Identify components and needs in informatics and
the management of research data


Assess the degree of commonality in informatics
tools and functionality


Determine whether more support is needed for
data archiving, metadata, interfaces, IP


Develop a planning agenda for informatics in a
distributed environment


Inform the design of facilities and services



Project Components


Background research in current informatics work
in academic disciplines


Structured interviews and site visits with selected
faculty


Matrix of system characteristics and issues


Informal roundtables for faculty working in these
areas


Collaboration with related IT units


White paper for campus discussion of futures



UCSB Informatics: Participants


Faculty chosen on the basis of


Innovative science


Data intensive work


Interdisciplinary research


Recommended by the Office of Research, colleagues,
department heads, IT offices and librarians.



Control Group: Non-science faculty



Select group of technologically innovative faculty in other
disciplines were used as a control to determine whether
trends were specific to sciences



About 40 people interviewed




Sample Questions for Faculty



How do you store research information?


Do you do any cataloging, indexing, or metadata?


How are your data maintained on an ongoing basis?


Is there something special about the way that you manage
your data compared to colleagues within the field?


Do you write or borrow scripts/tools? For what purpose?


Are you having difficulty managing your data collection? Are
there services that you wish others would provide?


How is IP and sharing of datasets/information handled in your
field?


When you collaborate with others through the web, what kinds
of tools, if any, do you use?


What are your plans for this research in the next five years?
Are there service requirements that you will need then?



Findings: Growth of Systems


The sophistication of informatics arrangements is determined
by the amount of data collected and how labor-intensive it is to
collect.


Change happens when the following converge:


Data size increases exponentially


Research questions encompass broad range of specialties


Funding agencies require change as a condition of funding



Guiding principles seem to be:


“What is the smallest group of people that I can have do
the work, and still do the [work]”


“What is the least amount of indirect work [e.g.,
informatics] related to the research that I can do, and still
do the [work]”





Findings: Data Preservation

Chart: Perceived Long-term Preservation Need of Faculty and Staff Researchers

Impact Unknown: 31%
Some Need: 50%
Future Need: 3%
Critical Need: 16%


Findings: Data Preservation (continued)



Some science fields have national and international data
centers where data deposit is required for grant funding.


Where data centers do not exist, backup depends on:


Length of a grant


Length of time the primary researcher is on campus


Perception that data has maximum value for 12-18 months after
publication, and negligible value after 5-10 years.


Departments lack personnel and support for long-term
preservation of data.


Faculty store data on the “removable media of the day” and
forget about it, until it becomes difficult or impossible to access


More complex systems, with the same number of people to manage
them, leave less time to devote to “meta-issues”


Critical impact: research collaboration and long-term historical
data analysis suffer






Data Preservation Practices

Stores on multiple machines: 22%
Run & Maintain Portal: 14%
No strategy: 5%
Varies with project: 3%
Relies on back-up (short term storage): 20%
Contribute to a non-governmental portal: 14%
Contribute National Supercomputer Center: 3%
Archive data on CD/DVD: 12%
Contribute to Agency Data Center or Portal: 7%


Findings: Data Organization



Most common organizing mechanisms: directory structure,
spreadsheets, and word processing software


Databases (with or without metadata) are uncommon. Viewed
as a time/labor-intensive, unnecessary drain on research time.


Portals built by tech specialists within a field are well utilized.


Storage space is adequate for now. Over half the people
contacted were in the process of upgrading.


Most departments did not have strictly enforced limits on email,
data storage, and personal storage


Though much on their servers is “garbage,” memory is thrown
at the problem; little support in most departments for data
management


“Not a solved problem.” While actual memory might be cheap,
the tape, labor, and other equipment needed to ensure that data are
maintained are NOT.





Findings: Metadata issues


Metadata is discipline-specific; commonalities
exist, but the key requirements vary by discipline.


Metadata structures and subject taxonomies
reflect the way faculty in a discipline think


While organizational structure is an important
issue in metadata use, other considerations are:


Services available in one’s discipline


Acceptance and standardization in the discipline


Usage in key portals, data centers, and repositories


One worldwide metadata format is not likely at
this time


Interdisciplinary metadata issues and crosswalks
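
One way to picture a crosswalk between two discipline-specific schemas is a simple field-to-field mapping. The sketch below is a hypothetical illustration only: the field names, the `CROSSWALK` table, the sample record, and the `apply_crosswalk` helper are all assumptions made for this example and do not come from the UCSB study or from any particular metadata standard.

```python
# Hypothetical crosswalk from one schema's field names to another's.
# All names here are invented for illustration.
CROSSWALK = {
    "name": "dataset_title",
    "author": "principal_investigator",
    "collected_on": "collection_date",
    "site": "study_location",
}


def apply_crosswalk(record, crosswalk):
    """Rename fields that have a mapping; return the rest separately."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            # Discipline-specific fields with no equivalent are kept
            # aside rather than silently dropped.
            unmapped[field] = value
    return mapped, unmapped


# Made-up sample record in the source schema.
record = {
    "name": "Stream temperature survey",
    "author": "A. Researcher",
    "collected_on": "2004-06-15",
    "instrument": "thermistor array",
}
mapped, unmapped = apply_crosswalk(record, CROSSWALK)
```

Returning the unmapped fields separately reflects the point above that key requirements vary by discipline: a crosswalk covers the commonalities, while fields with no counterpart still need a home.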





Metadata Usage

Used in select projects: 11%
On campus usage only: 19%
Assisted in development of metadata: 5%
Rarely used: 38%
Consistent use at data centers/portals: 27%


Findings: Intellectual Property


Intellectual property protocols that faculty follow
after creating software, portals or databases are
highly correlated with the discipline.


In disciplines where things move quickly, the ideal method
is to open source one’s tool to obtain an audience, then
later align oneself with a company, or start one;


In disciplines where there is a lot of money there is
pressure to ensure patents are filed.


Databases, portals and data centers on campus
typically all have legal waiver forms, allowing
release of the data sets to other researchers as
part of the process to ingest the data.


Disciplines vary in the extent to which they
support an ethic of data sharing.




Digital Rights Management Practices

Prefer to create open source products to avoid intellectual property issues: 22%
Practices and Procedures in industry are well tested and accepted; no major issues: 16%
Occasional minor issues with an individual collaborator or publisher: 24%
Intellectual property issues affect my research significantly: 30%
Have not yet encountered issues: 8%



Findings: Data Support Needs



Some needs and services were mentioned across
disciplines regardless of current arrangements:


Informatics “point person” or clearinghouse for
information on tools, expertise, and research
knowledge on campus and nationally


Long-term archiving of research data, especially
during the gap in coverage between publication and
obsolescence


Tiered support services for database development,
cataloging, conversion, emulation, migration, web
development, metadata, pre-planning for technology
grants



Trends Shaping Future Demand



Growth in complex data objects


Improved data mining


Policies of funding agencies


National repositories


New cyberinfrastructure initiatives


Prevalence of campus repositories for text


Tech-intensive academic programs


Need for rapid and global data exchange


Steady or decreasing staffing



Key System Characteristics



Flexibility to customize control, interfaces and
security


Secure access worldwide


Metadata-agnostic design


Interoperability with scholarly
communication, archiving and rights
management systems


Clearinghouse functions


Advanced services for migration, emulation,
long-term digital archiving




Topics for Campus Discussion



Where are the gaps in current offerings?


How do technology services on campus
interact, and are new organizational models
needed?


What are faculty priorities for various services?


What kinds of research data should be high
priority for preservation, and how much is at
risk?


What are incentives for faculty participation?


What is the impact of tenure and promotion
structures in encouraging “data maintenance
work”?




Possible outcomes


Everything stays as is


More peer-to-peer sharing of resources and
expertise


Policies are established


Intellectual property rights at several levels


Use of metadata and digital object standards


Ensure data sustainability


Organizational approaches are considered


IT offices, the library, consortial systems support, disciplinary
groups, or a combination


New services are offered


Database design


Metadata creation


Consulting


Clearinghouse functions


Full digital archiving and migration



Further Information


UCSB Informatics Project web site:


http://www.library.ucsb.edu/informatics/



ECAR Research Bulletin, vol. 2005, issue 2:
“Informatics and Knowledge Management for
Faculty Research Data,” Jan. 18, 2005


Contact:


Sarah M. Pritchard, University Librarian
pritchard@library.ucsb.edu


Larry Carver, Director of Library Technologies and
Digital Initiatives,
carver@library.ucsb.edu



Special thanks to Smiti Anand, Project Analyst