The NSF Cyberinfrastructure for the 21st Century Program

CIF21


Rob Pennington

Program Director

Office of Cyberinfrastructure

National Science Foundation

The Shift Towards a “Sea of Data”

Implications


All science is becoming data-dominated


Experiment, computation, theory


Fourth paradigm


Classes of data


Collections, observations, experiments, simulations


Software


Publications


Totally new methodologies


Algorithms, mathematics, culture


Data become the medium for


Multidisciplinarity, communication, publication…science




Fundamental questions become focused around data: How to remove boundaries? How to incentivize sharing?

How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world?

Scientific Data Challenges

[Figure: scientific data volumes in bytes per day, 2012 and 2020, on a scale of Giga, Tera, Peta, and Exa Bytes. Labeled sources: Genomics; LHC; TeraGrid, Blue Waters; Square Kilometer Array; Climate, Environment; LSST]

Volume

Distribution

Data Access

Many smaller datasets…

DataNet

Software

Analytic Tools

Compute, Modeling

Communities

Expertise, research

Networks

Sea of Data

CIF21

Science, innovation, discovery, economic competitiveness

Grand Challenges


EarthCube, Understanding the Phenome, Clean Energy, Climate prediction, Social networking, Complex networks, Health records, cybersecurity, Matter-by-design, disaster recovery, etc.

Multi-disciplinary & multi-scale integration

CIF21 and Transforming Research

Discovery

Collaboration

Education

NSF

CIF21 Major Areas

Organizations


Universities, schools


Government labs, agencies


Research and Medical Centers


Libraries, Museums


Virtual Organizations


Communities

Expertise


Research and Scholarship


Education


Learning and Workforce Development


Interoperability and operations


Cyberscience

Networking



Campus, national, international networks


Research and experimental networks


End-to-end throughput


Cybersecurity

Computational Resources


Supercomputers


Clouds, Grids, Clusters


Visualization


Compute services


Data Centers

Data



Databases, Data repositories


Collections and Libraries


Data Access; storage, navigation, management, mining tools, curation, privacy

Scientific Instruments


Large Facilities, MREFCs, telescopes


Colliders, shake tables


Sensor Arrays


Ocean, environment, weather, buildings, climate, etc.

Software



Applications, middleware


Software development and support

Cybersecurity: access, authorization, authentication

Advanced Computational Infrastructure


Data Infrastructure


Program

Broad Principles to Lead CIF21


Builds national infrastructure for S&E


Leverages common methods, approaches, and applications


Focus on interoperability


Catalyzes other CI investments across NSF


Provides focus and is a vehicle for coordinating efforts and programs


Based upon a shared governance model involving all parts of NSF


Managed as a coherent program by OCI


Spiral development methodology

Evolution of CIF21 and NSF Data Programs

[Diagram labels: ACCI Task Force; NSB; DataNet Awards; Community Input; NSF CIF21 Data Programs; On-going input; Science & Engineering Research + Cyberinfrastructure]


Data Related Context


National Science and Technology Council (NSTC)


http://www.whitehouse.gov/blog/2012/01/30/your-comments-access-federally-funded-scientific-research-results


Networking and Information Technology Research and Development (NITRD)


http://www.nitrd.gov/subcommittee/bigdata.aspx


National Science Board Data Policies Task Force


http://www.nsf.gov/nsb/committees/tskforce_dp.jsp


Advisory Committee for Cyberinfrastructure (ACCI)


www.nsf.gov/od/oci/taskforces/

NSTC RFIs for Public Comment - Context


Two Requests for Information (RFIs)


Nov 2011


Public Access to Digital Data Resulting from Federally Funded Scientific Research


Preservation, Discovery and Access


Standards for Interoperability, Re-Use and Re-Purposing


RFI for Scholarly Publications


http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications


Comment period closed on 12 Jan 2012


Digital Data: 118 responses


Scholarly Publications: 377 responses


Individual and institutional responses


NSB Data Policy Task Force - Context


Dec 2011: NSB 11-79 Recommendations


http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf


#1: Provide leadership … in the development and implementation of digital research data policies ...


#2: … require grantees to make both the data and the methods and techniques used in the creation and analysis of the data accessible … Data should be shared using persistent electronic identifiers …


#3: Continue to expand the support of computational and data-enabled science and engineering


#4: Convene a panel … to explore and develop a range of viable long-term business models…


#5: Further the expansion of sustainable data management, including preservation and curation of pre-existing and newly generated long-lived data …


NSF Advisory Committee for Cyberinfrastructure (ACCI) Task Force - Context

Grand Challenges, HPC (High Performance Computing), Data/Viz, Software, Campus Bridging, Cyberlearning



More than 25 workshops and Birds of a Feather sessions and more than 1300 people involved


Final reports: http://www.nsf.gov/od/oci/taskforces/

ACCI Data Task Force Recommendations


Recognize data infrastructure and services as essential research assets fundamental to today’s science and as long-term investments in national prosperity


Create new citation models in which data and software tool providers are credited with their data contributions


Develop and publish realistic cost models to underpin institutional/national business plans for research repositories/data services


Identify and share best practices for the critical areas of data management


CIF21 and Data Enabled Science


Provide critical tools and services for data mining, integration, analysis, modeling and visualization.


Overcome barriers to scaling, synthesis, and interoperability to promote effective use of large-scale, shared data resources.


Strategic investments that concentrate tools, resources and expertise in support of compelling grand challenge science questions.

Data Infrastructure: A Multi-tiered and Multi-Disciplinary Landscape

Observational Communities

Modeling and Simulation Communities

Population, Climate, Environment Communities

Data Content

Data Storage

Data-enabled Science

DataNet supported


CIF21: Data-Enabled Science


Data-intensive Science Program (knowledge)


Intensive disciplinary efforts, multi-disciplinary discovery and innovation


Data Analysis and Tools Program (information)


Data mining, manipulation, modeling, visualization, decision-making systems


Data Services Program (data)


Provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline



Dumped On by Data: Scientists Say a Deluge Is Drowning Research

Data Curation


Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context.


Overcome challenges of culture change, policy development and implementation, sustainable operations, quality and usability control.


Strategic awards that address heterogeneity in formats, complexity, semantics of data collections that are valued by science communities of significant breadth.


Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability.



Data Storage


National storage infrastructure for scientific data


Accommodate scale and heterogeneity through robust, open, and broadly accepted standards


Business model implemented with governmental, academic, nonprofit, and commercial stakeholders


Make strategic investments that:


Leverage existing resources in XSEDE, commercial clouds, federal data centers


Meet growing capacity needs at optimum cost


Provide coordinating and integrative functions for integrity, access control, availability, persistence


Catalyze a national data infrastructure


Cross Cutting Challenges


Balancing research into next-generation infrastructure with operation & maintenance of current capacity


Sustainability through technical design, development of business models, and integration with the research cycle


Integration


Vertical


Linking low-level bit storage infrastructure to data collections, and to applications


Horizontal


Achieving connectivity and interoperability between activities that vary in scale, disciplinarity, and funding source

Summary


CIF21 is focused on effective ways to approach and respond to the challenges


Critical concepts and goals


Realistic and innovative


Spiral process with strong, on-going feedback


Structure for longevity


Scalable, open, inclusive governance


Long term business models


International collaborations and programs

