
FutureGrid Report
June 21, 2010
Geoffrey Fox


Introduction

This report is the nineteenth for the project and continues with the status of each committee and the collaborating sites. More information can be found at http://www.FutureGrid.org/committees, which has links for each committee.

Progress continues as we move from worrying about hardware to worrying about support of users!

Operations and Change Management Committee

Operations Committee Chair: Craig Stewart, Executive Director

Change Control Board Chair: Gary Miksik, Project Manager



- Proposal for collaboration with Cummins, Inc. to use FutureGrid endorsed by the Operations Committee and forwarded to the FutureGrid PI for approval.



- Proposal from UCSD to tap one of their IBM iDataPlex nodes with a power harness to measure power consumption (a “green IT” initiative) endorsed by the Operations Committee and forwarded to the FutureGrid PI for approval.



- First FutureGrid quarterly report still in progress. We are waiting for invoicing from Chicago, Texas (TACC), and Florida so that the “effort report” section of the report can be completed. We have received invoices and requisite accounting details for UCSD, USC, and Virginia.



- For the UCSD subaward, the budget includes $84K for a 10G network connection to NLR Los Angeles. UCSD has the quote for this requirement and would like to proceed with submitting the order. The quote covers the full expense for this service, but the subaward contract’s budget spreads this expense across the 4 years as follows: Yr 1 $12K; Yr 2 $24K; Yr 3 $24K; Yr 4 $24K.
  o UCSD has asked if it is okay to proceed with this network connection purchase and include the full expense with their invoicing in PY1. This will require a re-budgeting effort through the formal change control process with NSF.




- The project manager will be reviewing the PY1 budget for both IU and partners to ensure that funding levels are still current. This effort is being done in anticipation of the rest of PY1 funding being awarded.


Performance Committee

Chair: Shava Smallen


During the past two weeks, the Performance Analysis Committee has continued HPCC work discussed in the Inca section of the software committee report and Netlogger work discussed in the ISI/Pegasus section of the software committee report. The committee held one conference call on June 16th.


Software Committee

Chair: Gregor von Laszewski

Inca: A bug fix was made to an Inca script used to submit batch jobs so that the HPCC performance tests can be executed on Sierra. The HPCC performance tests will also be configured for the IU machine. A new report page to display HPCC output history via graphs was added to the Inca web pages. We pushed back the HPCC performance test milestones to the end of July so we can include Tango from TACC.



Pegasus: We are working on updating the Netlogger installation on inca.futuregrid.org to the newest release. The Netlogger broker is integrated into inca.futuregrid.org and comes with init.d scripts, and the API libraries for Python, Perl, and Java were re-installed. MongoDB was installed. We started the activity to document and discuss a “FutureGrid Pegasus Software Stack” and added instructions on how to install Pegasus using a yum repository. To fulfill the requirements of FutureGrid, new features will be added with the upcoming Pegasus 3.0 release. We have identified that special attention must be placed on dependencies with Globus and Condor.


Nimbus: The Nimbus cloud client version 15 was released. This includes the FutureGrid root CA certs, which simplifies setup of the Nimbus cloud client for FutureGrid. Work on the release and bug fixes for Nimbus 2.5 continues. The work of installing Nimbus at UC as a service has been started. Preparation for a demo at OGF in Chicago is underway. UC was tasked with developing a FutureGrid-specific Nimbus installation tutorial and scripts, as well as a tutorial for FutureGrid users wanting to use Nimbus on FutureGrid.


Nimbus @ University of Florida: We are working on configuring a grid-appliance Xen image to work on the stratus@UF cloud (which has a similar setup to the Nimbus deployment on foxtrot). The problem seems to be related to the use of Ubuntu-based images with upstart initialization. We are preparing VM image(s) for the OGF demo; the image is based on the CCGrid demo, with Nimbus contextualization (currently work in progress).


OGF demo summary: Sky Computing on FutureGrid and Grid'5000

"Sky computing" is an emerging computing model where resources from multiple cloud providers are leveraged to create large-scale distributed infrastructures. This demonstration will show how sky computing resources can be used as a platform for the execution of a bioinformatics application (BLAST). The application will be dynamically scaled out with new resources as the need arises. This demonstration will also show how resources from two experimental projects, the FutureGrid experimental testbed in the United States and Grid'5000 (an infrastructure for large-scale parallel and distributed computing research in France), can be combined and used to support large-scale, distributed experiments. The demo will showcase not only the capabilities of the experimental platforms, but also their emerging collaboration. Finally, the demo will showcase several open source technologies. Specifically, our demo will use Nimbus for cloud management, offering virtual machine provisioning and contextualization services, ViNe to enable all-to-all communication among multiple clouds, and Hadoop for parallel fault-tolerant execution of BLAST.



Networking: A meeting with several people from the GNOC took place that demonstrated tools that we as non-network people do not have access to. Documentation for these tools was requested from the GNOC group so we can share this information with the FutureGrid software committee.

This may be done in two steps: a) a document that we cannot distribute to people outside of IU due to intellectual property protection; b) a document that we can distribute. A follow-up meeting with the network engineers will be held to implement a process for communicating information about the networks and the network impairment device to the users. The software group provided monitoring tool lists. We asked that each site report which tools they use. Several members have found tools such as Nagios and Ganglia, which may provide useful information for those interested in performance studies, although they may place some additional overhead on the system. Many people in the software committee were not concerned about this overhead, as monitoring is one of the core requirements of FutureGrid and is motivated by performance studies.


Security Architecture: We restarted the discussion about role-based authorization and single sign-on for all FutureGrid services that was originally envisioned to be part of the Phase I software architecture. Everyone in the Software Committee has been asked to contribute requirements to the security architecture.


Azure: We spent some time on possible authentication to Azure and identified that the PAM LDAP module is a strong contender to enable a separation between security and image generation. For example, generated images could simply point back to the LDAP server and thereby have security implicitly integrated. Azure services can use this external identity provider to allow access to services for FutureGrid users.
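
As a rough illustration of this approach, the sketch below authenticates a user against a central LDAP directory with the python-ldap library; the hostname and directory layout are assumptions for illustration only and are not taken from this report.

    # Minimal sketch: authenticate a user against a central LDAP server instead
    # of a per-image password file. The hostname and DN layout are assumptions.
    import ldap

    def authenticate(username, password):
        conn = ldap.initialize("ldap://ldap.futuregrid.org")       # assumed host
        user_dn = "uid=%s,ou=People,dc=futuregrid,dc=org" % username
        try:
            conn.simple_bind_s(user_dn, password)   # raises on bad credentials
            return True
        except ldap.INVALID_CREDENTIALS:
            return False
        finally:
            conn.unbind_s()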


Hadoop: We have looked at Hadoop and how to generate a software stack for it for FutureGrid. This task is done in collaboration with Dr. Lizhe Wang, who uses Hadoop as part of an application-related project that is not funded by FutureGrid. This will build on existing work on MapReduce (Hadoop, Dryad) on FutureGrid. FutureGrid provides an ideal testbed for his activities, and we would like to see him as one of the early users.


Moab: The software team has identified a bug in the existing Moab distribution (v5.3.7). A newer build was returned to the software group. The recommendation is to either upgrade to 5.4 or to use a newer build of v5.3.7. Furthermore, we identified that a change in licensing took place between 5.3.x and 5.4. Version 5.4 needs explicit licensing for dynamic provisioning and on-demand power management. According to Archit Kulshresthe, these features were included in the hybrid builds previously distributed as part of the normal licensing with 5.3.x.


Image Repository: Testing deployment of the preliminary version on the mini-cluster we have and on xray@fg.


Image Creation: A FutureGrid image creation and validation system is under design and development. This system will let users pick their desired operating system, a set of preinstalled software specific to their development environment, and the amount of system resources they would like each virtual machine to acquire. A command-line interface is targeted; however, the ultimate goal is to incorporate such functionality within a portal.
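
To make the intended interface concrete, here is a hypothetical command-line sketch of the kind of options such a tool might expose; the option names and defaults are illustrative assumptions, not the actual FutureGrid design.

    # Hypothetical front end for requesting a validated image; illustrative only.
    import argparse

    def main():
        parser = argparse.ArgumentParser(
            description="Request generation of a validated FutureGrid VM image.")
        parser.add_argument("--os", choices=["centos", "ubuntu", "fedora"],
                            default="centos", help="base operating system")
        parser.add_argument("--software", nargs="*", default=[],
                            help="packages to preinstall, e.g. openmpi hadoop")
        parser.add_argument("--cpus", type=int, default=1,
                            help="virtual CPUs per virtual machine")
        parser.add_argument("--memory", type=int, default=2048,
                            help="memory per virtual machine in MB")
        args = parser.parse_args()
        print("Would build a %s image with %s, %d CPUs, %d MB RAM"
              % (args.os, ", ".join(args.software) or "no extra software",
                 args.cpus, args.memory))

    if __name__ == "__main__":
        main()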


vSMP: After some discussions at ISC10, it became apparent that virtual SMPs (symmetric multiprocessors) may be something that FutureGrid can support and provide to users. While a virtual SMP does not replace the absolute performance of a true SMP, it can provide a large-memory platform to users who need such support on the current FutureGrid cluster infrastructure. Options with ScaleMP's vSMP solution are being investigated along with other possible avenues. Due to priorities we have not continued to investigate this further for now; however, we believe enabling this would provide a significant service to the planned FutureGrid service pool and software stack. Preliminary experiments by Robert Henschel's group have shown that a given application deployed on vSMP provides performance "on par" with various Itanium2-based SGI Altix systems. Further experimentation is needed.



Experiment Harness: Most of our time over the last two weeks has been spent experimenting with the Qpid message broker to learn more about some advanced features, verify that it meets our needs, and plan how to use it in the Experiment Harness. In particular, we are investigating the dynamic creation of message queues, dynamic modification of who has access to message queues, message filtering, and X.509 authentication. The purpose of this is to be able to authenticate users to the harness and segregate experiment monitoring and management messages so that different user groups do not interfere with each other. We have also spent some time enhancing our experiment harness client and daemon implementations.
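
A minimal sketch of dynamic queue creation with the Apache Qpid Python messaging API is shown below, assuming a broker reachable at broker.example.org:5672; the queue naming scheme and broker address are illustrative assumptions, and the access-control and X.509 configuration mentioned above are not shown.

    # Sketch: per-experiment queues created on demand via the Qpid messaging API.
    from qpid.messaging import Connection, Message

    connection = Connection("broker.example.org:5672")   # assumed broker address
    connection.open()
    try:
        session = connection.session()
        # Create the experiment's queue on first use and publish a status message.
        sender = session.sender("experiment.demo42; {create: always}")
        sender.send(Message("sierra-001: job 17 started"))

        # A monitoring client for the same experiment reads only its own queue,
        # so different user groups do not see each other's messages.
        receiver = session.receiver("experiment.demo42; {create: always}")
        print(receiver.fetch(timeout=5).content)
        session.acknowledge()
    finally:
        connection.close()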


Portal: Improvement of the FutureGrid portal was addressed by researching some templates. Furthermore, drafts of pages for two core user groups (current and potential users) were prepared. Information about FutureGrid has been integrated into an XML document that can now be rendered as gadgets. Planning for the deployment of a more capable Drupal forums module to increase support for FutureGrid has been initiated.




Tutorials: We asked for the development of FutureGrid-specific tutorials for users of Nimbus and Eucalyptus on FutureGrid resources. We tasked the support group with executing this activity.


Administration:

University of Virginia: It was identified that, besides Jessica Otey, a more technical person needs to be involved with the software committee weekly status updates. This person may be John Karpivich; however, due to Andrew Grimshaw's unavailability this week, this could not yet be confirmed.

Wiki upgrade: A plan for upgrading the wiki was proposed and approved by the committee.



Meeting Notes: We reaffirmed the operation as a committee and introduced the following plan to be executed: all weekly meeting notes are to be available by Wednesday 5pm. Every two weeks a summary is created based on the notes and forwarded for approval on Thursday. The software committee approves all meeting notes, which are then submitted to Geoffrey Fox by Thursday 5pm. The notes will be copied from the wiki into a Word document that is then forwarded.



Automatic PDF document creation from the Wiki: We have updated the template for writing software architecture documents to allow for easy printing of the document in PDF format. A title page has been added to the template and all reports. We found a way to download the reports via command-line scripts so that they can be automatically published to an svn server and integrated into the publicly available FutureGrid portal. We developed scripts and a method for auto-publishing documents and manual pages.
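
As a rough sketch of this kind of automation (not the actual scripts), the following fetches a rendered wiki page and commits the snapshot to an svn working copy; the page name, target path, and commit message are assumptions for illustration, and the PDF conversion step is omitted.

    # Sketch: export a wiki page and publish it through svn.
    import subprocess
    import urllib2  # use urllib.request on Python 3

    # Assumed page title; the actual report titles are not listed here.
    WIKI_PAGE = "https://wiki.futuregrid.org/index.php?title=SoftwareArchitecture&action=render"
    TARGET = "docs/software-architecture.html"

    html = urllib2.urlopen(WIKI_PAGE).read()
    with open(TARGET, "w") as out:
        out.write(html)

    # Add (if new) and commit the snapshot in the existing svn working copy.
    subprocess.check_call(["svn", "add", "--force", TARGET])
    subprocess.check_call(["svn", "commit", "-m", "Automated wiki export", TARGET])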


System Administration & Network Committee

Chair: David Hancock


Networking

- The cross connect from CENIC to NLR has been completed and NLR is testing the circuit before handing it over to SDSC for final turn-up.

- Once CENIC takes ownership of the circuit, a brief outage of the SDSC system will be scheduled.

- After the SDSC circuit is deemed in production, significant networking activities will be complete and networking will transition to a fully operational state.


Compute & Storage Systems

- IU iDataPlex (india)
  o FG environment being configured & installed
  o Eucalyptus and a standard HPC environment available to early users
  o 33 Eucalyptus nodes have been configured for use by an FG early user

- IU Cray (xray)
  o Request for dynamic library support; Cray is working with IU to determine if the current storage I/O nodes can support a shared-root DVS file system to enable dynamic or shared libraries

- SDSC iDataPlex (sierra)
  o Oracle is working with UCSD to diagnose IB instability problems on the storage nodes
  o 33 nodes of MS HPC Server 2008 have been provisioned to support an FG early user
  o IU administrators are configuring dynamic provisioning on this system
  o Additional network address space has been provisioned (/23 instead of /24)
  o Network issues seen by Eucalyptus VMs have also appeared in Nimbus; SDSC network engineers will join a concall with the admin group to discuss this on 6/21

- UC iDataPlex (hotel)
  o Acceptance report sent for approval; all benchmarking targets met

- UF iDataPlex (foxtrot)
  o System passed a 6-day stability test with 100% job completion and 100% uptime
  o System released to admins for early user configuration; all 32 nodes will be configured for Nimbus

- Dell system at TACC
  o System power has been fully installed
  o Installation of compute nodes has begun


Training, Education and Outreach Services Committee

Chair: Renato Figueiredo

- Development activities remain focused on development, testing, and improvement of the baseline educational virtual appliance image (with particular emphasis on integration with Nimbus), in preparation for deployment on FutureGrid cloud resources.

- Dissemination activities focused on preparations for the OGF-29 demo, to be given June 21st. The plan for the demonstration is to show how computing resources distributed across multiple cloud sites and two experimental projects, the FutureGrid experimental testbed in the United States and Grid'5000 (an infrastructure for large-scale parallel and distributed computing research in France), can be combined and used as a platform to support large-scale, distributed experiments with a bioinformatics application (BLAST).



User Requirements Committee and FutureGrid User Advisory Board (FUAB)

Chair: Andrew Grimshaw

No items reported; Andrew Grimshaw is on vacation.


User Support Committee

Chair: Jonathan Bolte

- RT tickets
  o 21 tickets created in the last 2 weeks
  o 9 open tickets and 3 stalled

- FutureGrid Knowledgebase docs
  o 6 new docs
  o 1 revised doc



- Sidd released two Google gadgets. The FutureGrid Hardware/Software List gadget was created to display content about software and hardware availability in FutureGrid; this content is being maintained in an XML file until a proper database can be created (a sketch of reading such a file appears after this list). This information will be displayed in appropriate places in the revision of the FutureGrid website that is currently slated for next month. The other gadget, FutureGrid Core Services, displays the status of internal services such as JIRA, LDAP, WIKI, and INCA.



- We organized a meeting with the GNOC to learn about the monitoring tools they use to manage the FutureGrid network. This was valuable and we left with some documentation tasks. We have a follow-up meeting scheduled to learn what data the NOC broadcasts via web services. The plan is to provide gadgets that deliver dynamic network information to the FutureGrid website and later the portal.



- A meeting is planned with support staff for the Cray. We will reach an understanding about how support solutions will be captured and how to make available non-FutureGrid-specific help for the Cray.



- Work continues on the revision of the futuregrid.org site, which will be an intermediate step on the way to the FutureGrid Portal. The planning document, proposed wireframe, and sitemap are posted in the FutureGrid Wiki at https://wiki.futuregrid.org/index.php/Website_revision. Subpages are being built for the two key audiences, current and prospective researchers, and the account and project request forms will be revised to include additional communication material to manage expectations and describe the process.



- Additional upcoming tasks include the creation of a feedback form, implementation of the OG Forum module in Drupal, preparation of RSS feeds for news and announcements, and a system notices facility.
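
The sketch below, referenced in the Hardware/Software List item above, reads a hypothetical XML file of the kind that backs the gadget; the file name, element names, and attributes are assumptions, since the actual schema is not given in this report.

    # Illustrative reader for an assumed resource-availability XML file.
    import xml.etree.ElementTree as ET

    tree = ET.parse("futuregrid-resources.xml")        # assumed file name
    for machine in tree.getroot().findall("machine"):  # assumed element names
        name = machine.get("name")
        packages = [pkg.text for pkg in machine.findall("software/package")]
        print("%s: %s" % (name, ", ".join(packages)))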


Site Reports

University of Virginia

Lead: Andrew Grimshaw


No items reported.


University of Southern California Information Sciences Institute

Lead: Ewa Deelman




- USC continued to participate in the following conference calls: all hands, FutureGrid Performance, FutureGrid Software, and FutureGrid Operations Committee.



- USC continued to discuss requirements with the Netlogger group, culminating in a new release of the Netlogger software.




- USC works with the FutureGrid Performance Group, providing an updated installation of the Netlogger software, including daemons and APIs, to enable testing in the FutureGrid environment.



- USC provided preliminary documentation on the installation of Pegasus-WMS on FutureGrid VM images.





University of Texas at Austin/Texas Advanced Computing Center

Lead: Warren Smith


Dell cluster:



- TACC received an IP address space from the University of Texas.

- Electrical circuits have been modified so that all of the compute nodes can be powered on. Due to the over-commitment of the UT electricians, an outside contractor was brought in to make these changes.

- All of the hardware has been powered on.

- The Linux operating system that will be used for acceptance testing has been loaded on the login node, administrative node, and all of the compute nodes.

- One compute node may have hardware problems, but all of the other hardware seems fine at this point.

- The software to be used in the acceptance tests is being built.


Experiment harness:

We continued to experiment with the Qpid message broker to learn more about it, verify that it meets our needs, and plan how to use it in the Experiment Harness. In particular, we investigated the dynamic creation of message queues, dynamic modification of who has access to message queues, message filtering, and X.509 authentication. The purpose of this is to be able to authenticate users to the harness and segregate experiment monitoring and management messages so that different user groups do not interfere with each other. We also made some improvements to our experiment harness client and daemon implementations.


University of Chicago/Argonne National Labs

Lead: Kate Keahey


1) Status of the machine hotel at the University of Chicago:

The UC cluster finally cleared IBM acceptance testing (June 14th). The UC core management node was rebuilt and cleaned post-IBM (by Wednesday, June 16th). The compute profile was stabilized and the whole cluster was loaded with Nimbus.



2) On the software side:

a) We completed integration and testing for Nimbus 2.5 (release candidate available in the next 2 weeks).

b) We released Nimbus cloud client version 15. This includes the FutureGrid root CA certificates, which simplifies setup of the Nimbus cloud client for FutureGrid.

c) We continued release work for Nimbus 2.5 (bug/feature list).

3) Together with UFL, we participated in preparation for the FutureGrid demo to take place at OGF on June 21st.


University of Florida

Lead: Jose Fortes




- Activities at UF included: a) improvements in the virtual machine image that had been used in the CCGrid demo, in preparation for the OGF demo next week; the emphasis is on improving contextualization, and UF is hosting a visitor from INRIA this summer who has assisted in the aggregation of Grid’5000 resources for the demo (refer to the summary below); b) further testing and improvements in the grid appliance Xen/Nimbus image to support both 64- and 32-bit x86; both images have been uploaded to the science clouds marketplace.




- Sky Computing on FutureGrid and Grid'5000: "Sky computing" is an emerging computing model where resources from multiple cloud providers are leveraged to create large-scale distributed infrastructures. This demonstration will show how sky computing resources can be used as a platform for the execution of a bioinformatics application (BLAST). The application will be dynamically scaled out with new resources as the need arises. This demonstration will also show how resources from two experimental projects, the FutureGrid experimental testbed in the United States and Grid'5000 (an infrastructure for large-scale parallel and distributed computing research in France), can be combined and used to support large-scale, distributed experiments. The demo will showcase not only the capabilities of the experimental platforms, but also their emerging collaboration. Finally, the demo will showcase several open source technologies. Specifically, our demo will use Nimbus for cloud management, offering virtual machine provisioning and contextualization services, ViNe to enable all-to-all communication among multiple clouds, and Hadoop for parallel fault-tolerant execution of BLAST.


San Diego Supercomputer Center at University of California San Diego

Lead: Shava Smallen




- During the past two weeks, UCSD continued to work with Sun/Oracle on two tickets regarding the Infiniband setup on the storage machines. We resolved one ticket, which was due to a bug with applying patches during the Rocks automated build process. We are still working with Sun/Oracle on the remaining ticket so we can run the Infiniband performance tools on these machines.



- The SDSC networking group is also now working to resolve a router problem with recognizing new machines on the network, which is inhibiting the Nimbus and Eucalyptus installs on the Sierra cluster. There was some confusion and delay in getting this issue addressed, but it is now being worked on.



- UCSD, with help from IU, also put forward an internal proposal for SDSC user Catherine Olschanowsky to attach a power monitoring harness to one node of the UCSD Sierra cluster. This experiment will incur a cost for IBM to recertify the node; the proposal was approved this week, first by the Operations Committee and then by PI Fox.



- UCSD continues to lead the performance group activities. Our work over the past two weeks includes additional HPCC performance testing deployment and adding a new report page to view the HPCC results. More details can be found in the software section.


Other Sites not reported

Center for Information Services and GWT-TUD from Technische Universität Dresden (funding starts year 2)

Purdue University (unfunded)

University of Tennessee Knoxville (funding starts year 2)