Measuring TeraGrid Impact: Methods to Document Effects of TeraGrid Resources and Capabilities on Scientific Practice and Outcomes

Impact Requirements Analysis Team

Co-Chairs:
Mark Sheddon (SDSC)
Ann Zimmerman (University of Michigan)

Members:
John Cobb (ORNL)
Dave Hart (SDSC)
Lex Lane (NCSA)
Scott Lathrop (UC/ANL)
Sergiu Sanielevici (PSC)
Kevin Walsh (SDSC)

GDO Supervisor:
Dane Skow (ANL, Deputy Director)








Final report: September 2006



Table of Contents

Executive Summary ..... 2
Background ..... 4
Recommendations ..... 7
Science Gateways ..... 12
Advanced Methods ..... 13
Conclusion ..... 13
Acknowledgments ..... 13
References ..... 14

Appendices
Appendix A: Table correlating guiding questions, TG goals, and data sources ..... 15
Appendix B: Recommendations for modifications to POPS ..... 17
Appendix C: Components of a good nugget ..... 19
Appendix D: Message describing TG impact ..... 21
Appendix E: Sample publication list categorization ..... 23
Appendix F: Globus, GPFS-WAN, visualization, and exemplar projects ..... 28




Executive Summary


It is important to document and explain the impact of the TeraGrid (TG) on science to TG staff and participants, to TG sponsors and review teams, to the greater scientific community, and to the public. This information would be valuable to answer stakeholder questions, to promote interest in and support for the TG, and to assist with internal program management. In light of this need, the TG Grid Infrastructure Group (GIG) initiated the Impact Requirements Analysis Team (Impact RAT) to address these issues. The purpose of the Impact RAT was to investigate and recommend measures to assess the short-, mid-, and long-term effects of the TG's capabilities and resources on the work of the scientific users of the TG. Specifically, the team was charged to formulate the overarching questions to be addressed by the impact measures, to investigate existing measures being used by TeraGrid, and to identify the use or development of new methods to assess TeraGrid impact on scientific practice and outcomes.



This report summarizes the Impact RAT's work and provides recommendations for TG management to consider and act upon. In preparing its recommendations, the team attempted to strike a balance between the usefulness of the approaches it identified, the effort required from TG users and reviewers to provide data, and the number and type of TG personnel that would be necessary to collect, manage, and analyze data. The Impact RAT also considered issues related to privacy and confidentiality of impact-related data and strove to ensure that all aspects of TG, including people and compute and non-compute resources, were represented.



The Impact RAT concluded that a combination of approaches to measure impact is required for three reasons. First, TG's impact on science must be measured in light of its three main goals (Deep, Wide, and Open), each of which presents different challenges in terms of assessment. Second, there are multiple sides to the TG, and the state-of-the-art in the assessment of scientific impact does not adequately address the complexity and uniqueness of the TG. Third, scientific users are still adapting to the notion of grid computing. Many users continue to use TG resources in traditional ways, but this is expected to change over time. For all these reasons, the approaches described in this report are only a beginning, and they must continue to evolve and change as our understanding grows and measurement methods improve.


The recommendations presented here were guided by the principles and context described above, and they include suggestions to:



- enhance and expand existing sources of data, such as the Partnerships Online Proposal System (POPS), the Principal Investigator (PI) log, and the ASTA log, to make them more useful and mine-able;
- collect and analyze usage data on software, particularly "grid" software such as Globus and GPFS-WAN, and on non-allocated resources (e.g., data storage, networking resources);
- create a new source of data in the form of a nugget database to collect important qualitative data;
- improve the collection of data on TG users; and
- continue to employ and improve the design of user surveys.




In each case, responsible individuals must be identified, and in some instances additional personnel may be required if the recommendations are to succeed.



The Impact RAT also discussed more complex, in-depth, and resource-intensive approaches to impact measurement, such as network analysis and data visualization techniques to discern changes to the social organization of research, peer review and evaluation beyond current activities, ongoing interviews and focus groups, and historical case studies. These methods are described briefly in this report, but they are not discussed in detail. However, the Impact RAT believes that they are critical to measure the full breadth of TG's impact on scientific discovery. While this report's recommendations do not include specific suggestions with regard to these methods, the Impact RAT considered the data that might be required for these approaches so that, to the degree possible, these data would be collected and available for analyses by TG personnel or by others.



Finally, the Impact RAT discussed the importance of Science Gateways and their impact on science in terms of TG and non-TG resources and capabilities. Capturing this impact is critical both to TG and to the gateway communities and requires a collaborative approach. One way in which TG can begin to facilitate this process is to work with the Gateways to develop tools and methods to instrument the use of TG resources within the Gateways. The Impact RAT recommends that this work be carried out under the direction of Nancy Wilkins-Diehr, the Area Director for Science Gateways, at the appropriate time.[1]

[1] The Impact RAT would like to thank Nancy Wilkins-Diehr for sharing her knowledge and insights regarding this topic.



Background

It is important to document and explain the impact of the TeraGrid (TG) on science to TG staff and participants, to TG sponsors and review teams, to the greater scientific community, and to the public. This information would be valuable to answer stakeholder questions, to promote interest in and support for the TG, and to assist with internal program management. In light of this need, the TG Grid Infrastructure Group (GIG) initiated the Impact Requirements Analysis Team (Impact RAT) to address these issues. The purpose of the Impact RAT was to investigate and recommend measures to assess the short-, mid-, and long-term effects of the TG's capabilities and resources on the work conducted by the scientific users of the TG.



The Impact RAT, which met for the first time on April 13, 2006, drafted a charter that outlined its objectives. This charter was approved by the GIG and included the following specific tasks:

- Articulate the questions to be addressed in regard to the TG's impact on scientific outcomes and provide a rationale for their relevance and importance in light of TG's goals;
- Investigate existing metrics that are indicative of the outcomes to be measured and assess their relevance to TG and, if needed, develop new metrics;
- Identify metrics currently in use by the TG to measure the impacts of its programs and evaluate them against the questions developed in order to make a recommendation regarding whether to discontinue them, continue them unaltered, or enhance their use; and
- Make final recommendations regarding metrics of scientific impact taking into account the time, effort, and cost required to obtain the necessary data and information.


Although the tasks specified in the Impact RAT's charter used the word metrics, the team soon recognized that for several reasons it would be difficult to define quantitative measures of scientific and technical outcomes. First, the main product of research is an understanding of fundamental phenomena, and this understanding is not compatible with metrics. Only the physical expressions of understanding, such as publications, hardware, and software, are amenable to metrics (Kostoff, 2001). Thus, metrics will always be incomplete in describing the performance and progress of scientific research. Second, the team recognized that qualitative data are a necessary and important component of the assessment of TG impact. For example, usage data provide important information about growth in the use of a particular resource over time, but they provide little insight into the effects of that resource on scientific discovery. This information is often best captured through interviews or focus groups. Finally, the word metrics implies comparison, but the baseline measures that are needed to make comparisons are difficult to find, so it is not possible to rely only on this approach.


In contrast, an impact is an effect on something that can be assessed in many ways, including anecdotally. Therefore, the Impact RAT chose to use the word impact instead of the word metrics. In the case of the TG, the goal was to find ways to correlate scientific achievement by TG users with the TG resources and technologies that enable this achievement. This demands the use of quantitative and qualitative approaches, alone or in combination.


It was important to clarify the difference between metrics and impacts, but this distinction did not simplify the Impact RAT's task. The measurement of science and technology outcomes comprises a disparate array of indicators and measures that scholars, practitioners, and agencies such as the National Science Foundation (NSF) and the Department of Energy (DOE) are striving to improve (e.g., Geisler, 2005; Kostoff, 1994, 1997; Ruegg & Feller, 2003).

Thus, the Impact RAT soon learned that it would be a challenging task to measure the scientific impact of the TG. This difficulty was also encountered recently by DOE centers engaged in an effort similar to the Impact RAT's. In July 2006, representatives from three DOE centers met with Gordon Bell, senior researcher at Microsoft and noted expert in high-performance computing, to discuss replacing the existing metrics currently used by the Office of Management and Budget to judge the success of DOE computational efforts. The Impact RAT reviewed presentation slides that resulted from DOE's work and found that many of the DOE group's discussions and conclusions were similar.

Guiding Questions

In light of the many challenges to the measurement of the scientific impact of the TG, the Impact RAT articulated two questions to guide its discussions and recommendations:

1) What impact has the TeraGrid had on the practice of scientific research?
2) What impact has the TeraGrid had on the production of scientific knowledge?


The Impact RAT recognized that each of these questions should be measured in light of TG's three main goals:

- The goal of TeraGrid DEEP is to fully exploit the integrated capabilities of the TeraGrid facility in order to support scientific discovery that would otherwise not be possible.
- The goal of TeraGrid WIDE is to bring cyberinfrastructure resources to a much broader science community by adapting TeraGrid to the tools and systems that communities are using today.
- The goal of TeraGrid OPEN is to leverage systems from other Grids to promote interoperation.


Each of these goals presents different challenges in terms of impact measures. The state-of-the-art in the assessment of scientific impact does not adequately address the complexity and uniqueness of the TG as expressed in these goals. In addition, scientific users are still adapting to the notion of grid computing, and many continue to use the TG and other high-performance computing (HPC) resources in traditional ways. Over time, this is expected to change. As this occurs, our knowledge about how to capture important measures of grid computing will grow, and the methods discussed in this report will need to change and evolve.


There are many sub-questions that fit under the two broad questions listed above. The Impact RAT identified those listed below to help guide the next steps in its work.



- How influential within their field are users of the TeraGrid? (e.g., number of times publications are cited, quality of journals published in)
- Is high-quality science being produced by the users of TeraGrid? (bibliometrics, peer review)
- What is possible that was not possible without the TeraGrid? (e.g., Can scientists do things faster? Can they run more complex analyses?) Are there products or services along the development and deployment timeline that have had significantly more impact? Do these warrant further investment of time and effort?
- What economic impacts have been produced as a result of the TeraGrid? (e.g., number of patents, expanded range of HPC and grid-related products and services)
- How have the perceptions of the allocations committee members, NSF program officers, and TG users changed over time? What do they see as major factors of TeraGrid's impact on science?
- Has the TeraGrid increased the level of awareness of high-performance computing and of grid computing? Do research and education communities understand what grid computing is and how it can benefit science?
- Has the TeraGrid extended the availability of high-performance computing to more types of institutions, users, and disciplines? What has been the impact on under-represented communities and organizations? What has been the impact on under-served disciplines? Are the gaps widening or narrowing?
- Has the use of the TeraGrid increased over time? What resources have seen the most increased use? How has usage changed across the disciplines and institutions over time? What are the trends that would suggest success or barriers to broadening community use?
- What have been the changes in the use of HPC/grid computing approaches relative to experimental approaches over time? What factors have contributed to significant changes?
- What have been the changes in the use of HPC/grid computing in education over time? What factors have contributed to significant changes?
- What is still not possible even with TeraGrid? What are major barriers and challenges that preclude significantly advancing science impact?



Answering the questions listed above will require varying levels of budget and staff support, and not all of the questions can be addressed within the limitations of TG resources. For example, uncovering changes to scientific practices that are spurred in whole or in part by the TG requires more than the collection of usage data. Interviews and focus groups with TG personnel and TG users, and analysis of long-term data, are time and resource intensive and may be beyond the scope of TG efforts, even though these methods are necessary to understand the influence of TG on scientific discovery. For this reason, the Impact RAT believed it was important to include all the questions in this report despite the fact that they cannot all be answered by TG alone.


Existing Metrics

As the Impact RAT worked toward ways to answer the guiding questions, it began by reviewing existing metrics in use by the TG. These approaches are summed up largely by a quote from the report produced by the panel convened by NSF to review the TG in January 2006.

    The quantitative evaluation of TeraGrid's research impact relies heavily on three metrics: number of users, distribution of TeraGrid usage (measured in NUs) by discipline, and numbers of papers published. The panel felt that both the kinds of data collected and the use made of them could be improved. For example, while user and usage data confirm the demand for TeraGrid services, they do not give a very informative picture of how TeraGrid is serving different research communities. The number of papers suggests impressive productivity but fails to address impacts such as changes to research practices. Also, the lack of baseline data against which to make a comparison makes it hard to determine what impact TeraGrid has made.



The Impact RAT concurred with the review panel's assessment. Further, the team felt that these metrics are still useful and should be added to and enhanced instead of discarded, which was also in the spirit of the review panel's recommendations.
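
One straightforward way to make the existing usage metric more informative is to report the distribution of NUs across disciplines rather than totals alone. The sketch below illustrates the aggregation; the record layout is hypothetical, since the TeraGrid Central Database schema is not described in this report.

```python
from collections import defaultdict

# Hypothetical accounting records: (username, discipline, NUs charged).
usage = [
    ("alice", "MCB", 12000.0),
    ("bob",   "PHY", 45000.0),
    ("carol", "MCB", 8000.0),
    ("dave",  "AST", 30000.0),
]

totals = defaultdict(float)
for user, discipline, nus in usage:
    totals[discipline] += nus

grand_total = sum(totals.values())
for discipline, nus in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{discipline}: {nus:,.0f} NUs ({100 * nus / grand_total:.1f}%)")
```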

Recommendations

The remainder of this report discusses the impact measures that the Impact RAT recommends for the TG. We begin by outlining the principles that guided our recommendations. This is followed by a description of the sources of impact data, and then by the team's specific recommendations.

Guiding Principles

This section describes the overarching ideas that surround the specific report recommendations. Specifically, the Impact RAT's recommendations for the collection of data related to impacts of the TG were guided by the following principles:




- It is important to articulate clearly the reasons why particular data are being collected and to describe the purposes for which they will be used.
- It is necessary to consider the ramifications for budget and staff and to be realistic with regard to what can be implemented and sustained by TG.
- Requests for data should be standardized across sites as much as possible.
- TG users, reviewers, and TG staff should only be asked to provide data that are needed and are useful.
- Concerns regarding privacy and confidentiality of data should be considered.
- TG is more than the sum of its parts. It is comprised of compute resources, non-compute resources, and people; assessments of TG's impact should consider all of these aspects.
- It is not TG's role to evaluate the quality of the science, as there are other mechanisms in place for this. However, TG can use accepted approaches (e.g., quality of journals published in, awards and prizes, etc.) to investigate the perceived quality of the scientific work that is produced through the use of TG resources.
- There are individuals and organizations outside of TG that can contribute to the process. This includes people involved in the peer review process for scientific merit and TG allocations; organizations such as DOE and the Open Science Grid (OSG) that share similar concerns; and agencies that fund scientific research. Input from all these sources is essential to the process and can help influence the evolution and development of measures of scientific impact.

Data Sources

In general, there are three types of data sources relevant to the measurement of TG's impact on science: 1) raw utilization data; 2) peer-reviewed information, such as journal publications and allocation and funding proposals; and 3) anecdotal evidence. All of these are important to assess the impact of the TG. Within these three types, the Impact RAT identified ten specific sources of data that are useful to impact assessment:


1) Allocations Proposals
2) TeraGrid Central Database
3) Surveys
4) System/Network Logs
5) Trouble Ticket Logs
6) Web
7) User Advisory Committees (e.g., Cyberinfrastructure User Advisory Committee)
8) Individual (concentrated) experience (e.g., ASTA)
9) Interview and focus group data
10) Third-party data sources (e.g., science citation data)


The Impact RAT discussed the sources and chose several for recommendations where improvements could be made. Greater detail can be found in the table in Appendix A, which matches guiding questions with one or more of TG's three goals and with specific examples of the data sources listed above. In addition, the table describes whether the data are currently being collected, and if so by whom, and the limitations of each data source.

Specific Recommendations

The team suggests the following recommendations for consideration by the TG management team for improving the capability for describing impact.



#1: Modify the Partnerships Online Proposal System (POPS) to make it more useful and mine-able

Appendix B describes specific recommendations for modifications to POPS to permit quantitative analysis of xRAC reviewer comments and PI agency funding support. The text below describes the rationale for the suggested changes.


The Partnerships Online Proposal System (POPS) is an online submission form for requesting computing allocations on TG resources. Every principal investigator (PI) who requests time on TG resources must complete this form. Requests submitted to POPS are reviewed by the Development Allocations Committee (DAC), Medium Resource Allocations Committee (MRAC), or Large Resource Allocations Committee (LRAC), as appropriate. There are approximately 40 reviewers who represent different scientific fields, and each MRAC and LRAC proposal is reviewed by 2 to 5 reviewers. The Impact RAT felt that improvements in the form can be made both in terms of the information that is requested from PIs and from reviewers, in proposals for initial allocation requests as well as those for renewals.


The role of the TG Resource Allocation Committees is to evaluate the quality of the past and proposed mapping of scientific progress to the TG resources. Applicants are asked to set forth a detailed work plan that shows how specific TG resources are expected to be used to provide particular scientific results. Renewal proposals are supposed to include a narrative of the way in which resources allocated in the previous grant period resulted in progress. Specifically, applicants are requested to provide a list of publications, prizes, honors, etc., that can be attributed in part to the use of TG resources. In theory, mining proposals would provide valuable information on the impact of the TG on scientific outcomes. In practice, this is difficult for the following reasons.




- Tools to mine the POPS data do not exist, and much of the information collected is in free-form text and hence difficult to mine. In addition, in order for these tools to be used effectively, changes are required to the POPS forms and to the database that holds them. Staff time would be needed to develop and operate the tools and to analyze and report the information.




- Proposals are confidential information. Data would need to be made anonymous as part of the reporting process. Or, applicants might check a box if they are willing to have their information made public.



- Users are generally better at reporting their scientific progress than they are at explaining which TG technologies or capabilities helped to make the discovery possible. The justifications for non-compute and non-allocated resources that applicants provide for their requests are often vague, and reviewers usually accept this. This is even truer of specific TG technologies, such as Globus toolkit components and ETF backbone use, than it is of machine-local resources (numbers of CPUs, memory, disk, programming environment, libraries, and application software). These grid technologies are new and are thus less familiar to users, so it is difficult for them to relate these technologies to advances in their science.

POPS should contain questions to help guide users in describing more usefully how TG has impacted their science. Thus, we recommend that when PIs request a renewal, they should address questions similar to the following three in relation to work already accomplished:[2]

- Why is the science important? What new science has been achieved by using TG?
- What aspects of the computational part are hard? What aspects have been of greatest value to your research? What aspects have been improved with time? What aspects need further improvement to help you advance your science?
- How did the TeraGrid help you accomplish the computational/cyberinfrastructure part? What could TeraGrid do to further help you?

[2] The first question in each bullet was provided by Guy Almes, NSF Program Manager, as guidance to TG for the development of nuggets. The Impact Team felt that these questions were equally useful for POPS.

POPS has the potential to provide a wealth of data related to impact. For example, an analysis of "supporting grant" information would provide some insight into the monetary value that funding agencies have put on the potential impact of allocated projects. With hundreds of proposals submitted each year and reviewed by independent experts in the field, POPS is in the position to collect much more quantifiable data. For example, reviewers might categorize the type of activities being conducted, identify the classes of TG resources proposed for use, and provide some assessment of recent or potential impact. Such responses (restricted to a fixed set of answers) would provide a new window into proposals.
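
To illustrate the kind of analysis this would enable, the sketch below tallies "supporting grant" entries by funding agency. The record layout is hypothetical; the point is that the consistent agency picklist recommended in Appendix B makes such aggregation trivial.

```python
from collections import defaultdict

# Hypothetical "supporting grant" entries parsed from POPS proposals.
grants = [
    {"agency": "NSF", "award_usd": 450_000},
    {"agency": "DOE", "award_usd": 1_200_000},
    {"agency": "NSF", "award_usd": 300_000},
    {"agency": "NIH", "award_usd": 750_000},
]

by_agency = defaultdict(lambda: {"count": 0, "total_usd": 0})
for grant in grants:
    by_agency[grant["agency"]]["count"] += 1
    by_agency[grant["agency"]]["total_usd"] += grant["award_usd"]

for agency, stats in sorted(by_agency.items()):
    print(f'{agency}: {stats["count"]} grants, ${stats["total_usd"]:,}')
```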


Staffing and Effort: The modifications to POPS that are described above and in Appendix B will require some effort from TG staff, reviewers, and TG users, but the Impact RAT did not consider these to be excessive. However, we recommend that representatives of these groups be consulted before significant changes are made. In addition, some effort will be required to extract and analyze the data.


#2: Create a nugget database

Anecdotal evidence is an important indicator of the impact of TG resources. The Impact RAT recommends that a nugget template and database be created to capture this critical information and associated visuals. The data in this system will be useful in the short term as well as the long term. NSF has a similar "nugget" database to fill a comparable need by the agency. By NSF definition, a nugget should be short -- no more than one page -- and should answer the following questions:




Why is this science important?



Why is the computational/cyberinfrastructure part hard?




2

The first question in each bullet was provided by Guy Almes, NSF Program Manager
, as guidance to TG for the
development of nuggets. The Impact Team felt that these questions were equally useful for POPS.

Draft
-

9
/1/06

10



What did the TeraGrid do to help accomplish the computational/cyberinfrastructure part?



As NSF Program Manager Guy Almes has pointed out, "Without the first element, there's no point. Without the second element, you don't need the TeraGrid. Without the third element, you don't find out just how the TeraGrid helped. Key issues to consider include why the computational work could not have been done on a workstation or on a loosely coupled grid such as the Open Science Grid."

Appendix D includes an email from Phil Maechling (Southern California Earthquake Center) that describes SCEC's usage of this type of "remarkable" software, how it changed his method of computing, and the role of TG staff in the project's success. This message exemplifies the kind of information to be collected by nuggets.


Appendix C describes the fields in a nugget record and contains a mockup of the components of a nugget entry form. In addition to the fields shown on the template, each record should include one or more images, along with a description of the science represented in each image, the source of the image, and permission to share the image with NSF and others.



Anyone should be able to enter nuggets in the database, but the most likely candidates are TG Advanced Support for TeraGrid Applications (ASTA) staff, External Relations staff, and User Services staff. If the nugget database is adopted by TG management, procedures will need to be put in place to help ensure that the information is captured.


There are two existing internal reporting systems that currently collect information similar to that desired for a nugget: the PI log and the ASTA report log (http://accounts.teragrid.org/user_services/).[3] The PI log web page contains information about the PI, service units allocated, etc. A TG contact person is also listed for each project. The ASTA report log is for TG staff to enter their progress reports. These reports have the potential for deep anecdotal insight into specific accomplishments. For example, they could be used to capture speed-ups in user codes, to identify improvements in user processes as a result of particular user tools, and to highlight the contributions of the ASTA team to the research project. The content of these reports is similar to what is desired from nuggets, so it may make sense to integrate them with the more general nugget database.

[3] This is an internal TG site that is password protected.



Staffing and Effort: The nugget database and front-end application will require a significant amount of staff effort. The precise level will need to be determined after more detailed software requirements are defined.


#3: Instrument non-compute resources for usage data collection

TG has a variety of software and other non-compute resources that should be instrumented to better understand the usage patterns and the impact on the process and production of science. Particular emphasis is needed on "grid" software such as cross-site runs, Globus, GPFS-WAN, workflow tools, GridFTP, and others. These utilization data would serve as important baseline data, in the same way that compute NUs provide baseline metrics for those resources. In addition, TG should track use of non-compute resources such as storage and networking resources. Finally, consideration should be given to instrumenting the various scientific data collections available through the TG. This information would be helpful in gaining an understanding of the use of these data.
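
As one illustration of how such instrumentation might work, the sketch below shows a thin wrapper that a site could place ahead of a grid tool on users' PATH to record each invocation before running the real binary. The paths, log format, and tool name are purely illustrative; this is not an existing TG mechanism.

```python
#!/usr/bin/env python
# Hypothetical wrapper for a grid tool (here, globus-url-copy). It appends
# one usage record to a site-local log, then replaces itself with the real
# binary so users see no difference in behavior.
import os
import sys
import time

REAL_BINARY = "/usr/local/globus/bin/globus-url-copy"  # illustrative path
LOG_FILE = "/var/log/tg-usage/globus-url-copy.log"     # illustrative path

with open(LOG_FILE, "a") as log:
    log.write("%d %s %d\n" % (time.time(),
                              os.environ.get("USER", "unknown"),
                              len(sys.argv) - 1))

# Hand control to the real tool, passing all arguments through unchanged.
os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])
```

Consistent with the privacy principle stated earlier, the wrapper records only a timestamp, username, and argument count rather than full command lines.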






Staffing and Effort: This may require somewhat significant staff effort, which again will be defined by the relevant system administrators when specific metrics are determined to be feasible and usable.


#4: Categorize publications

At present, the POPS submission system collects publication data only as components of proposal documents, that is, as part of Word or PDF files. Asking PIs to submit publication lists via web-based forms suitable for direct database ingestion may place an undue burden on those projects with dozens of publications. TG may want to consider whether requiring publication lists as separate documents, or as separate documents in specific formats (e.g., EndNote files), would be fruitful.[4] In any case, the result of data collection is a lengthy list of publications published, accepted, submitted, or in progress. The NSF reviewers rightly noted the limited utility of such a list.

[4] The ability to extract data from particular fields, such as the author field, would greatly improve the ability to use more complex techniques of impact assessment such as social network analysis.


The Impact RAT recommends that additional analysis of the POPS publication list be conducted to categorize citations according to journal (as applicable), discipline, "ranking," and the POPS proposal associated with the publication. This will provide greater detail on publication impact by showing the quality of the journal, etc. Including the POPS proposal number will provide a means to tie publications to the TG resources and capabilities used. The POPS proposal will allow the linking of publication output to utilization and reviewer input.


See Appendix E for a sample categorization, completed by Dave Hart, using last year's publication list.
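
Once citations carry journal, discipline, and ranking fields, the core of that categorization can be automated. A minimal sketch follows; the 1-4 "impact factor" weights are the coarse rankings used in Appendix E, not official journal impact factors, and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical categorized citations: (journal, discipline, ranking weight).
pubs = [
    ("Phys Rev D", "PHY", 3),
    ("Phys Rev D", "PHY", 3),
    ("Astrophysical J", "AST", 4),
    ("_Submitted", "MCB", 1),
]

counts = Counter(pubs)
print("Journal | Discipline | Articles | Impact Factor | Art.Impact")
for (journal, discipline, factor), articles in sorted(counts.items()):
    print(f"{journal} | {discipline} | {articles} | {factor} | {articles * factor}")
```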


Staffing and Effort: Based on last year's efforts, compiling and categorizing the publication lists will require about 20 hours of staff effort. Providing tools to link publication outcomes to utilization data and POPS reviewer input will require additional staff effort.


#5: Look deeper into the user community

Besides examining the largest users of TeraGrid compute resources (DEEP impact), the usage database should be improved so that it is possible to examine trends among "non-standard" users, such as those from disciplines (e.g., social sciences) and institutions (e.g., Minority Serving Institutions) outside the traditional users of HPC resources. For all users, TG should track institution and type of institution (e.g., 2-year, 4-year, MSI), type of user (e.g., race, gender, and status), and history of allocations received. Over time, these data would be useful to help discern whether education, outreach, and training programs are having an impact. This information would also be useful to assess how usage changes over time and whether users continue to use TG. The latter would be helpful in gaining an understanding of why users "leave."[5]

[5] Of course, additional work would be required, through surveys or interviews, to find out why individuals stop using TG resources.
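
A sketch of the kind of trend report such tracking would enable, assuming hypothetical user records with the fields recommended above:

```python
from collections import defaultdict

# Hypothetical census records; real data would come from POPS/TGCD and the
# add_user form, with institution type recorded for each user.
users = [
    {"user": "u1", "inst_type": "MSI",    "first_alloc_year": 2004},
    {"user": "u2", "inst_type": "4-year", "first_alloc_year": 2005},
    {"user": "u3", "inst_type": "MSI",    "first_alloc_year": 2006},
    {"user": "u4", "inst_type": "2-year", "first_alloc_year": 2006},
]

# Count first-time allocations per year, broken down by institution type.
new_users = defaultdict(lambda: defaultdict(int))
for u in users:
    new_users[u["first_alloc_year"]][u["inst_type"]] += 1

for year in sorted(new_users):
    breakdown = ", ".join(f"{t}: {n}" for t, n in sorted(new_users[year].items()))
    print(f"{year}: {breakdown}")
```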


Staffing and Effort: Once defined, such metrics would not require significant additional effort if the current TeraGrid Central Database (TGCD) data and reporting tools were used.






#6: Continue conducting an annual user survey to gain direct feedback

TeraGrid has conducted two user surveys to date.[6] The first survey focused on user requirements looking forward, and the second was oriented toward user satisfaction. A brief, focused survey minimizes the burden on users, and coordinating random samples among different surveys reduces the chance that the same users will be solicited more than once. TeraGrid should follow these and other guidelines to improve the reliability and validity of the surveys. In 2006, TG is doing this by participating in a survey being conducted by the University of Michigan evaluation team.

[6] Individual Resource Provider sites often have requirements to conduct their own surveys, and this will continue. We do not discuss these here, however.



Smaller surveys directed toward particular audiences or topics should also be considered. For example, pre- and post-surveys of researchers who benefit from ASTA support could be very informative.


Staffing and Effort: Each survey requires a significant amount of effort in terms of design, preparation, administration, and analysis, so the number of surveys should be carefully considered. The addition of sampling, however, does not dramatically increase the effort required and is useful for the reasons mentioned above.
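
Coordinating random samples across surveys, as suggested above, can be as simple as drawing disjoint subsets of the user population. A minimal sketch (user IDs and survey names are illustrative):

```python
import random

def disjoint_samples(population, sizes, seed=None):
    """Return one non-overlapping random sample per requested size."""
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)
    samples, start = [], 0
    for size in sizes:
        samples.append(shuffled[start:start + size])
        start += size
    return samples

users = [f"user{i:04d}" for i in range(2000)]
satisfaction, requirements = disjoint_samples(users, [300, 300], seed=42)
assert not set(satisfaction) & set(requirements)  # no user solicited twice
```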


#7: Learn from others

The issues surrounding impact are of interest and concern to many. TG should share what it has learned and monitor what others are doing in ways that can inform the practices of all interested parties. A first step in this process is to share this report with a broad range of individuals and institutions to gain their feedback. The list of relevant reviewers includes other entities that provide HPC facilities and services, such as DOD and DOE, Science Gateways, representative users, NSF officials, and experts in the measurement of science and technology impacts. A workshop that brings these groups together to share and explore issues and approaches would require a significant amount of effort, but it is another option that could be worth exploring.

Science Gateways

The Science Gateways are an important avenue to bring HPC resources to communities that have not traditionally used them. Gateways present several challenges in terms of impact measurement. First, Science Gateways include both TG and non-TG resources and capabilities, and it is difficult to separate the contributions of each. Second, Gateways are consciously designed to make it easy for researchers to employ TG resources, and thus many users are not even aware that they are using them. Third, the burden of reporting on impacts must be kept to a minimum for Gateways.


In spite of the difficulties, capturing data on impact is critical both to TG and to the Gateways and requires a collaborative approach. At a minimum, basic usage data should be collected. Since gateway users take advantage of community accounts, it is possible to find the number of jobs submitted by them. A further step that TG can take to facilitate the collection of impact data is to work with the Gateways to develop tools and methods to instrument the use of TG resources within the Gateways.[7] The Impact RAT recommends that this effort be carried out under the direction of Nancy Wilkins-Diehr, the Area Director for Science Gateways, at the appropriate time.

[7] We see TG being responsible for developing tools that will assist TG to collect the gateway information that it needs. However, this should be done in consultation with the Gateways, so the tools serve the needs of TG and Gateways.
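
A sketch of the minimal usage collection described above, counting jobs charged to gateway community accounts; the accounting records and account names are hypothetical.

```python
from collections import Counter

# Hypothetical job accounting records. Because each gateway submits through
# a community account, its jobs can be attributed even though the individual
# end users remain anonymous.
jobs = [
    {"account": "scec_community", "cpu_hours": 120.0},
    {"account": "nvo_community",  "cpu_hours": 3.5},
    {"account": "scec_community", "cpu_hours": 88.0},
    {"account": "jdoe",           "cpu_hours": 10.0},  # ordinary account
]

GATEWAY_ACCOUNTS = {"scec_community", "nvo_community"}

job_counts = Counter(j["account"] for j in jobs if j["account"] in GATEWAY_ACCOUNTS)
for account in sorted(job_counts):
    hours = sum(j["cpu_hours"] for j in jobs if j["account"] == account)
    print(f"{account}: {job_counts[account]} jobs, {hours:.1f} CPU-hours")
```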






Advanced Methods

The team's recommendations considered the number and type of personnel that would be required to collect, manage, and analyze the data. Therefore, the recommendations focus on methods that are not overly complex to administer, do not demand a significant increase in staff, and do not require substantial effort from staff, reviewers, users, or other experts.

However, the Impact RAT believes that approaches beyond the specific recommendations presented in this report are critical to measure the full breadth of TG's impact on scientific discovery. Appendix F contains material prepared by Impact RAT member Kevin Walsh that illustrates how SDSC-related projects use Globus and GPFS-WAN on the TG. This story is an excellent example of why in-depth studies are necessary to capture the breadth of TG impacts on science.


The Impact RAT discussed complex, in-depth, and resource-intensive approaches to impact measurement, such as network analysis and data visualization techniques to discern changes to the social organization of research, peer review and evaluation beyond current activities, ongoing interviews and focus groups, and historical case studies (e.g., Borgman, 2002; Börner et al., 2003; Bozeman, 1993; Cole, 1978; Horn et al., 2004; Kostoff, 1994, 1997; Newman, 2001; Ruegg & Feller, 2003).[8] The team attempted to ensure that at least some of the data to assess long-term and/or more complex impacts would be collected as part of the recommendations made in this report. TeraGrid will not have the resources to conduct many of these analyses on a regular basis; however, this activity can be conducted as part of an ongoing evaluative process.

[8] Kevin Walsh, a member of the Impact RAT, is studying the impact of HPC on domain science from 1985-2005 for his dissertation research. This is an example of the type of study that is resource intensive but critical to understanding the impacts of HPC on scientific discovery. Kevin envisions a dataspace that includes all published and unpublished research (and references to that work) associated with computing cycles and resource allocations, so that correlations between the data could be answered over time. Some of this information could be captured as part of the recommendations in this report. Other information, such as PowerPoint slides, lectures, etc., would be much more time-consuming to compile.

Conclusion

The Impact RAT believes that adoption of the recommendations in this report will go a long way toward improving the collection and analysis of data on the impact of the TG on scientific discovery. However, for the many reasons outlined in this document, the team's recommendations are a beginning and not an end. The approaches used by TG must continue to evolve and change as our understanding grows and as measurement methods improve.

Acknowledgments

Mark Sheddon and Ann Zimmerman want to recognize and thank each member of the Impact RAT for their hard work and brilliant thinking on this very difficult task. Special recognition goes to Dave Hart, who figured significantly in every aspect of the team's work. Among his many contributions, Dave outlined specific changes for the POPS database and the PI and ASTA logs, developed the nugget template, and compiled the publication data in Appendix E.







References

Borgman, C. L. and Furner, J. L. (2002). Scholarly communication and bibliometrics. Annual Review of Information Science and Technology 36: 3-72.

Börner, K., Chen, C., and Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology 37: 179-255.

Bozeman, B. (1993). Peer review and evaluation of R&D impacts. Pages 79-98 in B. Bozeman and J. Melkers, eds., Evaluating R&D Impacts: Methods and Practice. Boston: Kluwer.

Cole, (get complete ref.)

Geisler, E. (2005). The measurement of scientific activity: Research directions in linking philosophy of science and metrics of science and technology inputs. Scientometrics 62(2): 269-284.

Horn, D. B., Finholt, T. A., Birnholtz, J. P., Motwani, D., and Jayaraman, S. (2004). Six degrees of Jonathon Grudin: A social network analysis of the evolution and impact of CSCW research. CSCW '04: 582-591.

Kostoff, R. N. (1994). Federal research impact assessment: State-of-the-art. Journal of the American Society for Information Science 45(6): 428-440.

Kostoff, R. N. (1997). Peer review: The appropriate GPRA metric for research. Science 277: 651-652.

Kostoff, R. N. (2001). The metrics of science and technology (book review). Scientometrics 50(2): 353-361.

Newman, M. E. J. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98(2): 404-409.

Ruegg, R. and Feller, I. (2003). A toolkit for evaluating public R&D investment: Models, methods, and findings from ATP's first decade. National Institute of Standards and Technology, Advanced Technology Program.




Appendix A: Table correlating guiding questions, TG goals, and data sources

Impact Question 1 (Q1): What impact has the TeraGrid had on the practice of scientific research?
Impact Question 2 (Q2): What impact has the TeraGrid had on the production of scientific knowledge?

TeraGrid Goals:
- Goal 1 (G1): The goal of TeraGrid DEEP is to fully exploit the integrated capabilities of the TeraGrid facility in order to support scientific discovery that would otherwise not be possible.
- Goal 2 (G2): The goal of TeraGrid WIDE is to bring cyberinfrastructure resources and tools to a much broader science community by adapting TeraGrid to the tools and systems that communities are using today.
- Goal 3 (G3): The goal of TeraGrid OPEN is to leverage systems from other Grids and to promote interoperation.


Data source: Compute resource utilization
Exists? Y
Metrics for goal(s): G2: system use, breadth of user community
Collected by: RPs, accounting
Expert opinion: n/a
Impact questions: Q1: Are users able to take advantage of TG resources? Are the resources desirable, well-used?
Limitations: Impact per NU not possible; not possible to link jobs to specific impact.

Data source: Non-compute resource utilization
Exists? N
Metrics for goal(s): G1, G2: value of 'non-traditional' TG resources, TG value-add
Collected by: No one
Expert opinion: n/a
Impact questions: Q1: Does providing non-compute resources increase TeraGrid's impact? Do users care about non-compute resources?
Limitations: Impact per unit used not measurable.

Data source: Allocation proposals and reviews
Exists? Y
Metrics for goal(s): G1, G2(?): science being done, disciplines impacted
Collected by: POPS
Expert opinion: xRAC reviewers
Impact questions: Q1: Progress made by PIs? What kind of research is being done? Q2: Are the science and publications good?
Limitations: Free-form documents; xRAC not supposed to review the science.

Data source: PI funding sources
Exists? Y
Metrics for goal(s): G1, G2: value of science being done, as measured by funding agencies
Collected by: POPS
Expert opinion: Funding agency reviewers
Impact questions: Q1: Does the science meet agency merit criteria? How much science funding does TG augment?
Limitations: Not very detailed information. Impact per grant $$ supported?

Data source: PI publications
Exists? Y, sort of
Metrics for goal(s): G1, G2: scientific productivity
Collected by: POPS
Expert opinion: Journal reviewers
Impact questions: Q2: Is quality science being produced, as rated by journal quality?
Limitations: Not easily tied to TG resources used, non-compute resources, and/or services. All pubs are not equal. Citation analysis hard. Pubs take years to appear and have impact.

Data source: User survey(s)
Exists? Y
Metrics for goal(s): G1, G2, G3(?): user satisfaction with TG-wide resources and services
Collected by: Survey instrument
Expert opinion: PIs
Impact questions: Q1, Q2: Respondent self-reports of impact. Q1: Success of TG at deploying useful infrastructure.
Limitations: Self-reporting inconsistent. Measures success of resource provision only, not scientific impact.

Data source: Science Gateways utilization
Exists? N
Metrics for goal(s): G1, G2, G3: TG broad impact, effectiveness of TG griddiness
Collected by: ?
Expert opinion: Gateway leads
Impact questions: Q1, Q2: Impact of TG on disciplinary community.
Limitations: "Anonymity" of gateway users.

Data source: CUAC
Exists? N (not yet)
Metrics for goal(s): ?
Collected by: CUAC
Expert opinion: CUAC members
Impact questions: Q1, Q2: Expert opinion of TG impact on particular disciplines.
Limitations: Limited set of viewpoints.

Data source: Grid/software utilization
Exists? N
Metrics for goal(s): G2, G3: use of grid capabilities, third-party software
Collected by: n/a
Expert opinion: n/a
Impact questions: Q1: Utility of TG offerings, TG training and services.
Limitations: Tying impact to utilization.

Data source: ASTA, other TG services
Exists? Sort of
Metrics for goal(s): G1: use, impact of various activities
Collected by: ?
Expert opinion: ASTA PIs, users of these services
Impact questions: Q1, Q2: Is this service having impact?
Limitations: Tying impact to utilization.

Data source: Inca reporters
Exists? Y
Metrics for goal(s): G3(?): RP compliance with TG CTSS
Collected by: Inca
Expert opinion: (none)
Impact questions: ??: Are resources well-integrated into TG?
Limitations: Tying availability of grid capabilities to specific impacts.

Data source: TG Research Briefs
Exists? Y
Metrics for goal(s): G1: science impact
Collected by: TG ER, web site
Expert opinion: Featured PIs
Impact questions: Q1, Q2: What impact has TG had on a specific PI's research?
Limitations: Limited pool of examples.

Data source: TG Nugget Mine
Exists? N
Metrics for goal(s): G1, G2: breadth, depth of TG impact
Collected by: TG internal web app
Expert opinion: TG staff
Impact questions: Q1, Q2: How have TG resources/services impacted specific projects/PIs?
Limitations: Anecdotal only. Requires buy-in by TG staff.

Data source: TG Impact Survey
Exists? N
Metrics for goal(s): G2: depth of TG impact on a random sample of PIs
Collected by: TG Impact Survey
Expert opinion: Survey analysts, PIs
Impact questions: Q1, Q2: Picking a set of TG PIs at random, how has (or hasn't) TG changed the way they do research and/or improved their productivity?
Limitations: Time-consuming.

Data source: TG User Census
Exists? N
Metrics for goal(s): G2, G3: user community
Collected by: POPS, add_user form
Expert opinion: Analysts, demographers
Impact questions: Q1: Social network analyses? Breadth of impact? Etc. Students affected, trained.
Limitations: No tracking of persons over time, especially after 'leaving' TG.



Appendix B: Recommendations for modifications to POPS

For submission of proposals:

Under Supporting Grants:
- Select Funding Agency from a stored list (like field of science) to ensure consistency.
- Awarded amount should be fixed as $$. We could choose to exclude or distinguish grants of non-monetary resources (e.g., DOE compute time).
- PI name as Last Name, First Name fields.
- METRICS: More consistent reporting of agency grants and $$ supported by allocation awards.

Under Resource Request:
- Each listed resource has a small set of short-answer questions designed to elicit a description of expected usage patterns. At the moment these are defined by each RP and are specific to each resource; we believe this information is rarely, if ever, used, but this should be investigated before changes are made.
- Could shorten POPS by omitting these questions, or perhaps define them consistently across all platforms.
- METRICS: PI self-report of potential job size, resource usage scenarios.

In general, most resources ask for the following set of information:
- Large intensive runs (about half the resources ask this)
- Number of processors
- Memory per processor
- Shared disk
- Programming languages

A few resources have some specialized questions. TeraGrid roaming asks whether cross-site apps (y/n) or multi-site apps (y/n) will be run. We are not aware that anyone, including the xRAC, uses this information. We would have to ask if the xRAC finds it useful before making sweeping changes. But at the moment, we collect a lot of unused (possibly useless) information. What information would be useful?

Under Attachments:
- Specify some more structure for the attached documents. In addition to a proposal document, a renewal submission could also require (a) a progress report, (b) a list of reviewed publications, and (c) CVs. A 'new' submission would only require CVs, in addition to the main proposal document.
- Permit publication lists to be uploaded in EndNote format (or some machine-parse-able format).
- METRICS: Cleaner publication list data, progress reports.

Completely new stuff:
- Create a 'Special Request' screen (optional for PIs to fill out) that would include the PI's desires/expectations for using other, non-allocated TeraGrid resources: grid capabilities, large-scale storage, networking, specialized software, data collections, or instruments.

For review of proposals:

- Each review currently consists of the following fields: (a) a detailed review in plain text, (b) an "overall evaluation" (a short, freeform answer), and (c) suggested award amounts for the various resources.
- Add two or three other short-answer questions to each review, possibly for TG metric use only, to categorize proposals. We suggest the following three questions/categorizations (see the sketch at the end of this appendix):

  (1) Resources to be used
      (a) compute only
      (b) compute and non-compute
      (c) compute, non-compute, and grid
  (2) Research type (primary)
      (a) incremental science
      (b) high-risk, high-payoff science
      (c) transformative science
      (d) gateway/community science
      (e) methods/algorithm development
      (f) classroom instruction
  (3) Impact quality (for renewals) or potential (for new)
      (a scale of 1-5?; we are not sure how to assess or capture this, but the xRAC might have ideas)

- METRICS: Types of proposals received, potential impact of science, etc.
- The overall evaluation could become a fixed answer, say a scale from 0-5: 0-Reject, 1-Poor, 2-Fair, 3-Good, 4-Very Good, 5-Excellent.
- METRICS: This rating of proposal quality may not be a metric, but would be valuable for other aspects of the allocation process.
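
To make the fixed-choice review fields concrete, the sketch below encodes them as enumerations and validates a review record. The field names and codes are illustrative, not an existing POPS schema.

```python
# Fixed-choice review fields proposed above (codes are illustrative).
RESOURCES_USED = {"a": "compute only",
                  "b": "compute and non-compute",
                  "c": "compute, non-compute, and grid"}
RESEARCH_TYPE = {"a": "incremental science",
                 "b": "high-risk, high-payoff science",
                 "c": "transformative science",
                 "d": "gateway/community science",
                 "e": "methods/algorithm development",
                 "f": "classroom instruction"}
OVERALL_SCALE = range(0, 6)  # 0-Reject ... 5-Excellent

def validate_review(review):
    """Reject free-form answers so the fields stay quantifiable."""
    assert review["resources_used"] in RESOURCES_USED
    assert review["research_type"] in RESEARCH_TYPE
    assert 1 <= review["impact_quality"] <= 5   # the proposed 1-5 scale
    assert review["overall"] in OVERALL_SCALE
    return review

validate_review({"resources_used": "c", "research_type": "b",
                 "impact_quality": 4, "overall": 5})
```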

Appendix C: Components of a good nugget

Basic identifying information:
- PI grant number(s) for the associated allocation/project. This one detail will tie the nugget to PI name, usage, POPS proposal, funding agencies, discipline, etc.
- Submitter's name and institution
- Collection source (ASTA report, PI log, user survey, etc.)

TeraGrid supporting cast:
- Types of resources used
- Type of research
- Impact/result type
- TeraGrid services used (if any) -- We're thinking about ASTA in particular, but others might apply, such as gateways, cross-site reservations, training, viz, etc.
- Site(s) involved
- Specific (allocated) compute resources involved
- Specific other resources involved -- This list might get very long. Perhaps a free-form text field? On the other hand, this might be redundant to the Types of Resources question.

TeraGrid goals supported:
- Ideally, we'd want to tie the nugget to one of the three TeraGrid goals (deep, wide, open), so that we can use the nuggets to demonstrate that TG is achieving its goals.

Description:
- General text description

Associated publication(s):
- This would help us link to the publications list being assembled.
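
Taken together, these fields suggest a simple record structure for the nugget database. A minimal sketch in Python; the field names are illustrative, not an implemented TG schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Nugget:
    grant_numbers: List[str]        # ties to PI, usage, POPS, discipline
    submitter: str
    submitter_institution: str
    collection_source: str          # ASTA report, PI log, user survey, ...
    resource_types: List[str]
    research_type: str
    impact_type: str
    services_used: List[str]        # ASTA, gateways, training, viz, ...
    sites: List[str]
    compute_resources: List[str]
    other_resources: str            # free-form, per the note above
    goals_supported: List[str]      # subset of {"deep", "wide", "open"}
    description: str
    publications: List[str] = field(default_factory=list)
    image_files: List[str] = field(default_factory=list)  # with permissions
```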


A sample web page for entering nugget information into a database was developed by Dave Hart as a template:

[Full-page mockup of the nugget entry form]

Appendix D: Message from Phil Maechling describing TG impact


-----Original Message-----
From: owner-SCECGRID-L@usc.edu [mailto:owner-SCECGRID-L@usc.edu] On Behalf Of Philip Maechling
Sent: Wednesday, May 03, 2006 12:13 PM
To: scecgrid-l@usc.edu
Cc: 'Bill Bell'; tjordan@usc.edu; 'Bernard Minster'; Reagan Moore
Subject: NCSA Article

TeraShake and CyberShake Groups,

NCSA is interested in an article about how SCEC simulations use TeraGrid (NCSA and SDSC) systems. I am supposed to talk to the Bill Bell tomorrow. Our angle is SCEC science. Their angle is how we used the NCSA and SDSC systems. Here is draft list of ways we've used the TeraGrid systems. Please let me know if there are any other "remarkable" usages that I might mention in the call tomorrow.

I've suggested that an appropriate angle from the TeraGrid perspective is that SCEC simulation science requires a "broad spectrum" of usage patterns, and that the TeraGrid was flexible enough to provide these varying usage patterns.

Combined Site Usages:
. We use NMI software stack including globus, condor, srb-client, vds
. SCEC VO includes grid access to TeraGrid and large academic computing center (USC HPCC) through CA coordination.
. We use of VDS workflows
. We use of RLS/MCS
. We calculate SGT's at one site (SDSC or NCSA) and post-processing at another site
. Testing of /gpfs-wan at SDSC and NCSA
. Integration/testing of GridFTP-based file registration into SRB
. Gridftp transfer of data between NCSA and SDSC to support
. Can we say we used RFT in any of our data transfers between sites
. Weekend response when we ran the allocation tank dry right before our Annual Meeting.

NCSA:

CyberShake:
. Use of Fast-io nodes to support IO intensive post-processing
. 80TB+ of storage on fast-io nodes and /san-scratch
. Use of Up to 121 fast I/O nodes, (242 processors) using glide-ins
. Support for long runtime queues (7 days)
. Running in excess of 100K jobs for a single site (PAS) using workflows
. CPU hours for workflow in excess of 13K for single site
. Used IA-64 system

TeraShake 2:
. Multiple low resolution dynamic rupture runs at NCSA on IA-64 machine
. Multiple high resolution dynamic rupture runs at NCSA including TS2.1, TS2.2
. Scaling tests of up to 1024 processors using Kim's DR code

SDSC
. Storage: 20 TB /gpfs 40 TB /gpfs-wan 40TB tape
. SAC Project
. ASTA Project
. Code optimization support
. I/O improvements
. Initialization improvements
. Benchmarking test
. Integration of AWM and DR codes
. Multiple 240 processor TS1.TS2 simulations
. 100TB SRB-based digital library
. Use of IA-64 for CyberShake SGT and Workflows
. Use of Datastar for TS2.1
. Scenario interfaces to SRB provide access to waveforms
. Long run-time queues (170 hours)
. CyberShake workflows completed xxxxx jobs
. Our nagios pushed Globus job submission causing suspension at SDSC - then we resolved problem.
. Animations and animation processing

Future:
. 3 SCEC allocations this year: TeraShake 3, CyberShake Map, SCEC Earthworks Science Gateway
. SCEC Earthworks science gateway using VDS-based workflows
. Visualization services request interface between SCEC earthworks and SDSC vis group.
. Migration to Globus 4.0.1

I realize this is a rough list, but I'm just trying to organize our material before the call tomorrow. Please let me know if there are other items we would like to discuss about our use of the systems. You don't have to write them up. Just send me a note and I'll call you to discuss.

Thanks,

Phil M.


Appendix
E
:
Sample publication list categorization


Journal | Discipline | Articles | Impact Factor | Art.Impact
PNAS | _ | 8 | 4 | 32
Nature | _ | 1 | 4 | 4
Philosophical Transactions of The Royal Society | _ | 1 | 3 | 3
Sci Am | _ | 1 | 3 | 3
_Proceedings (any) | ASC | 8 | 1 | 8
_Book (any) | ASC | 2 | 1 | 2
_In preparation | ASC | 2 | 1 | 2
_Tech Report (any) | ASC | 1 | 1 | 1
Ann Ops Res | ASC | 1 | 2 | 2
IEEE Comp Graph & Apps | ASC | 1 | 2 | 2
Int J Num Meth Eng | ASC | 1 | 2 | 2
Multiscale Modeling and Simulation | ASC | 1 | 2 | 2
Parallel Computing | ASC | 1 | 2 | 2
Astrophysical J | AST | 28 | 4 | 112
_Preprint | AST | 8 | 2 | 16
_In preparation | AST | 7 | 1 | 7
_Submitted | AST | 6 | 1 | 6
Mon Not Roy Astron Soc | AST | 6 | 3 | 18
_Proceedings (any) | AST | 3 | 2 | 6
Ap J Lett | AST | 2 | 1 | 2
_PhD Thesis | AST | 1 | 1 | 1
_Proceedings (any) | ATM | 9 | 1 | 9
J Geophys Res | ATM | 6 | 3 | 18
_Submitted | ATM | 3 | 1 | 3
J Atm Terr Phys | ATM | 3 | 2 | 6
J Atmos Solar-Terr Phys | ATM | 3 | 2 | 6
_Book (any) | ATM | 2 | 1 | 2
_In preparation | ATM | 2 | 1 | 2
_Tech Report (any) | ATM | 2 | 1 | 2
IEEE Trans Plasma Sci | ATM | 2 | 2 | 4
Mon Wea Rev | ATM | 2 | 2 | 4
Adv Atmos Sci | ATM | 1 | 2 | 2
Atmos Env | ATM | 1 | 2 | 2
J Atmos Ocean Tech | ATM | 1 | 2 | 2
Meteo Atmos Phys | ATM | 1 | 2 | 2
_Proceedings (any) | BCS | 1 | 1 | 1
_Submitted | BCS | 1 | 1 | 1
_Tech Report (any) | BCS | 1 | 1 | 1
ASME J Biomech Eng | BCS | 1 | 2 | 2
JACS | CHE | 11 | 4 | 44
J Chem Phys | CHE | 9 | 3 | 27
_Submitted | CHE | 6 | 1 | 6
Adsorption | CHE | 4 | 2 | 8
J Comp Chem | CHE | 4 | 2 | 8
J Org Chem | CHE | 4 | 2 | 8
J Chem Theo Comp | CHE | 3 | 2 | 6
J Phys Chem A | CHE | 3 | 2 | 6
Adv Synth Catal | CHE | 2 | 1 | 2
J Phys Chem | CHE | 2 | 2 | 4
Langmuir | CHE | 2 | 2 | 4
_In preparation | CHE | 1 | 1 | 1
Acc Chem Res | CHE | 1 | 1 | 1
Ann. Reports Comp. Chem. | CHE | 1 | 1 | 1
Chem Phys | CHE | 1 | 1 | 1
Org Lett | CHE | 1 | 1 | 1
_Proceedings (any) | CTS | 17 | 1 | 17
Phys Fluids | CTS | 6 | 3 | 18
J Fluid Mechanics | CTS | 4 | 3 | 12
_Book (any) | CTS | 2 | 1 | 2
_Submitted | CTS | 2 | 1 | 2
Bull Am Phys Soc | CTS | 2 | 1 | 2
Int J Heat Mass Transfer | CTS | 2 | 1 | 2
J Heat Transfer | CTS | 2 | 1 | 2
Aeroacoustics | CTS | 1 | 1 | 1
Flow, Turbulence and Combustion | CTS | 1 | 1 | 1
J Fluids and Structures | CTS | 1 | 2 | 2
_Proceedings (any) | DMR | 2 | 1 | 2
_Preprint | DMR | 1 | 1 | 1
Appl Surf Sci | DMR | 1 | 2 | 2
Proc Mat Res Soc | DMR | 1 | 2 | 2
J Hyperbolic Diff Eqs | DMS | 1 | 2 | 2
_Book (any) | EAR | 3 | 1 | 3
Geophys Res Lett | EAR | 3 | 3 | 9
_Submitted | EAR | 1 | 1 | 1
Geophys J Int | EAR | 1 | 2 | 2
PAGEOPH | EAR | 1 | 2 | 2
Seismo Res Let | EAR | 1 | 2 | 2
J Comp Electronics | ECS | 1 | 2 | 2
_Book (any) | IBN | 1 | 1 | 1
Biophys J | MCB | 27 | 3 | 81
J Phys Chem B | MCB | 20 | 3 | 60
_Submitted | MCB | 11 | 1 | 11
_In preparation | MCB | 8 | 1 | 8
Structure | MCB | 7 | 3 | 21
Biochemistry | MCB | 5 | 3 | 15
Proteins: Struc, Func, Bioinform | MCB | 4 | 2 | 8
_Book (any) | MCB | 3 | 1 | 3
Mol Phys | MCB | 3 | 2 | 6
Biopolymers | MCB | 2 | 1 | 2
Curr Opin Struct Bio | MCB | 2 | 2 | 4
J Mol Bio | MCB | 2 | 2 | 4
Nucl Acids Res | MCB | 2 | 2 | 4
_Proceedings (any) | MCB | 1 | 1 | 1
Bell Labs Tech J | MCB | 1 | 1 | 1
Bioinformatics | MCB | 1 | 2 | 2
Frontiers in Biosci | MCB | 1 | 1 | 1
Genome Bio | MCB | 1 | 1 | 1
J Med Chem | MCB | 1 | 1 | 1
J Peptide Sci | MCB | 1 | 1 | 1
J Virology | MCB | 1 | 1 | 1
Mol Simulation | MCB | 1 | 1 | 1
Peptides | MCB | 1 | 1 | 1
Polymer | MCB | 1 | 1 | 1
Prot Sci | MCB | 1 | 1 | 1
Protein Eng Design & Selection | MCB | 1 | 1 | 1
Proteins | MCB | 1 | 2 | 2
Tetrahedron | MCB | 1 | 1 | 1
Phys Rev D | PHY | 37 | 3 | 111
_Preprint | PHY | 14 | 1 | 14
Phys Rev Lett | PHY | 12 | 3 | 36
Nucl Phys B (Proc Suppl) | PHY | 9 | 2 | 18
Class. Quantum Grav. | PHY | 6 | 2 | 12
Phys Rev B | PHY | 6 | 3 | 18
J Comp Phys | PHY | 5 | 2 | 10
Optics Express | PHY | 5 | 2 | 10
_Submitted | PHY | 4 | 1 | 4
Optics Letters | PHY | 4 | 1 | 4
Phys Rev E | PHY | 4 | 2 | 8
Appl Phys Let | PHY | 3 | 1 | 3
Phys Lett B | PHY | 2 | 2 | 4
Phys Rev A | PHY | 2 | 2 | 4
_Proceedings (any) | PHY | 1 | 1 | 1
Appl Phys B | PHY | 1 | 1 | 1
Chaos | PHY | 1 | 1 | 1
IEEE J Sel Topics Quant Elec | PHY | 1 | 1 | 1
Prog Theor Phys Suppl | PHY | 1 | 1 | 1
_Proceedings (any) | SES | 1 | 1 | 1
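
For reference, the Art.Impact column above is simply the Articles count multiplied by the coarse (1-4) Impact Factor assigned to the venue; for example, PNAS: 8 x 4 = 32. A minimal sketch of the computation, using a few sample rows from the table (the full list would be loaded from the publication database):

    # Sample rows from the table above: (journal, discipline, articles, impact factor).
    rows = [
        ("PNAS", "_", 8, 4),
        ("Astrophysical J", "AST", 28, 4),
        ("Biophys J", "MCB", 27, 3),
        ("Phys Rev D", "PHY", 37, 3),
    ]

    for journal, discipline, articles, impact in rows:
        art_impact = articles * impact  # the Art.Impact column
        print(f"{journal:<16} {discipline:<4} {articles:>3} {impact:>2} {art_impact:>4}")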















TOP 10 JOURNALS (BY # ARTICLES PUBLISHED)

Journal | Discipline | Articles | Rank
Phys Rev D | PHY | 37 | 1
Astrophysical J | AST | 28 | 2
Biophys J | MCB | 27 | 3
J Phys Chem B | MCB | 20 | 4
Phys Rev Lett | PHY | 12 | 5
JACS | CHE | 11 | 6
J Chem Phys | CHE | 9 | 7
Nucl Phys B (Proc Suppl) | PHY | 9 | 7
PNAS | _ | 8 | 9
Structure | MCB | 7 | 10














Avg. Article Impact by Discipline

Discipline | Avg. Impact
_ | 3.82
AST | 2.75
CHE | 2.33
PHY | 2.21
MCB | 2.20
DMS | 2.00
ECS | 2.00
EAR | 1.90
ATM | 1.68
CTS | 1.53
DMR | 1.40
ASC | 1.28
BCS | 1.25
IBN | 1.00
SES | 1.00
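
Consistent with the numbers above, each discipline's average is the article-weighted mean impact: total Art.Impact divided by total Articles. For AST, for example, 168 / 61 = 2.75. A minimal sketch, reusing (articles, impact factor) pairs from the table in this appendix:

    # AST entries from the categorized publication list: (articles, impact factor).
    ast_rows = [(28, 4), (8, 2), (7, 1), (6, 1), (6, 3), (3, 2), (2, 1), (1, 1)]

    def avg_article_impact(entries):
        """Article-weighted mean impact: sum(articles * impact) / sum(articles)."""
        total_articles = sum(articles for articles, _ in entries)
        total_art_impact = sum(articles * impact for articles, impact in entries)
        return total_art_impact / total_articles

    print(f"AST: {avg_article_impact(ast_rows):.2f}")  # prints "AST: 2.75"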




[Figure: bar chart of average article impact by discipline, y-axis from 0.00 to 4.50; disciplines ordered as in the table above.]

Appendix F: Kevin Walsh's 5/24/06 email to Impact RAT on Globus, GPFS-WAN, visualization, and exemplar projects.


I have been recommending that we identify some exemplar projects and follow their progress over time, focusing on the metrics we have been discussing while emphasizing "time-to-solution", "capabilities", and "scientific insights". Four SDSC-centric projects stand out; all use Globus and make vital use of GPFS-WAN: BIRN, SCEC, ENZO, and the NVO. These projects have their analogs at the other TeraGrid sites.

For ENZO, GPFS-WAN is key: it has reduced the transit time for host processing to nil. ENZO job runs employ 512 to 2048 processors and generate 10-100 TB. The crucial factor for GPFS is data management that allows compute processing on NCSA's Mercury and post-processing on Datastar, where the whole, or the majority, of the data set can be placed into memory. The impact includes a lower time-to-solution (TTS).
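
To make the data-management point concrete: GPFS-WAN mounts the same file system at both sites, so post-processing on Datastar can read Mercury's output in place rather than staging it. A minimal sketch, with a hypothetical mount point and run directory:

    from pathlib import Path

    # Hypothetical GPFS-WAN path, visible both from NCSA's Mercury (where
    # the ENZO run writes its output) and from SDSC's Datastar (where it
    # is read back for post-processing).
    RUN_DIR = Path("/gpfs-wan/enzo/run017")

    def total_output_bytes(run_dir=RUN_DIR):
        """Sum the sizes of all output files, read in place with no staging step."""
        return sum(f.stat().st_size for f in run_dir.rglob("*") if f.is_file())

    if __name__ == "__main__":
        print(f"{total_output_bytes() / 1e12:.1f} TB of output, no transfer required")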


SCEC's earthquake modeling application is less tightly coupled, and this community can use several sites for long, highly specialized simulations. SCEC jobs can therefore run on several machines, with visualization rendered separately. The TeraShake visualizations that everyone has seen are rendered after the job runs; current development is aimed at rendering the modeling runs while the output is still in memory. Without the visualization, the output is not useful. The visualizations done for TeraShake have resulted in requests from the UCSD Cancer Center, GEON, and further requests from SCEC and the USGS for viz of problems that can quickly benefit from such velocity and volume visualization. The USGS has asked the viz group to develop a library that can be used after an earthquake to model its behavior within two hours of occurrence. This cannot be done without GPFS, and SCEC has developed its own workflow that is part of the most recent release of the Globus toolkit. This work has to be put into historical perspective, and the building blocks that made it possible have to be recognized. NPACI VISTA was the tool that made the visualization possible. Globus, GPFS, and newly validated seismic models over ten years in the making add up to scientific impact, which translates into public policy decisions on preparing for earthquake hazards. So we are looking at a ten-year period in which all the tools and technologies have come together. TeraShake has been a two-year effort, and its example has jumped disciplines from geosciences to biology.


You see similar project trajectories in BIRN, with the node count for cross-site job runs doubling to 512 between SDSC and NCSA, and viz taking place at Johns Hopkins. Again, a lower time-to-solution made possible by Globus plus GPFS-WAN; it could not be done otherwise.


Visualization is the common component of these examples and one of the most important ways scientists assess outputs. We need to stress visualization as a mechanism to convey impact.


All of these examples involve scientists who have a past body of work that can be connected to the TeraGrid to assess the gestation of impact (Mike Norman, Mark Ellisman, Kim Olsen, J.B. Minster, Alex Szalay, etc.). However, you have to talk to the people who do the work at SDSC and the other TeraGrid sites to learn the process that was going on: the give and take over time, the decisions made based upon partial results that do not fit the model, and the decisions to proceed with a full job run when the incremental results justify moving forward. This dynamic is not described in the scientific articles, and the contribution of the TeraGrid resources is sometimes almost a footnote. Would there be value in annotating each publication that cites use of TeraGrid resources with a description of the precise contribution and the staff resources that made it possible? I have in mind writing a companion supplement, covering all or designated articles, public presentations, and publications, that provides much richer detail of who and what was involved from a TeraGrid support perspective and that speaks in terms of the metrics we are identifying.