SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond

SIENA is a Specific Support Action funded by the GÉANT & e-Infrastructure Unit, DG Information Society & Media, European Commission
www.sienainitiative.eu | info@sienainitiative.eu
SIENA European Roadmap
on Grid and Cloud Standards
for e-Science and Beyond
Use Cases & Position Papers
Index
Executive Summary
Introduction: Forces Driving Change
Future European e-Infrastructure
e-Infrastructure Requirements
e-Infrastructure Technology
Enabling Standards
International Co-ordination
Clouds Standards Coordination
Conclusions/Recommendations/Future Directions
Target Audience
Timeline
Scope
Roadmap Editorial Board (REB) Member List
SIENA Project Description – www.sienainitiative.eu
Cloudscape III Use Cases & Position Papers
Disclaimer
The views expressed in this roadmap are those of the authors and do not necessarily reflect
the official European Commission’s view on the subject.
SIENA European Roadmap on Grid and
Cloud Standards for e-Science and Beyond
Executive Summary
The future European electronic infrastructure for research (e-infrastructure) needs to
integrate federated and virtualised technologies based on geographically distributed
information and communications technology (ICT) resources in a secure and interoperable
way. Such ICT resources will be provided by both the public sector and commercial vendors
and be dynamically and flexibly accessed on demand to provide a set of common services
for the communities they serve.
A driving force for e-infrastructures in Europe is data-intensive science, exemplified by
existing research projects at national and European levels [1] and by future projects
such as those described in the Roadmap of the European Strategy Forum on Research
Infrastructures, commonly referred to as the ESFRI projects [2]. Our focus is to identify
the core common requirements relating to the provision of e-infrastructure that the
communities have rather than the specific functionality used by particular communities. A
high-level description of these requirements, and especially those that are common to all
or most projects, is contained in the report of the European E-Infrastructure Forum [3]. Other
relevant documents describing e-infrastructure requirements have been produced by the
e-Infrastructure Reflection Group (e-IRG) [4] and the High Level Expert Group on Scientific
Data [5].
An overarching and fundamentally important characteristic of an e-infrastructure is the
interoperability of its component technologies. Failure to achieve interoperability can have
powerful negative consequences for cost and efficiency of operation, and for the research
productivity of user communities of an e-infrastructure. Interoperability is best achieved
through adherence to a set of open standards and agreed principles. Work to establish such
a set of standards is ongoing for the e-infrastructure components, the services, and the
metadata, and will continue for the foreseeable future. Agreed principles are important to
achieve interoperability as a temporary measure while an agreed set of open standards is
being developed.
[1] See, for example, the book edited by Hey and Gray: research.microsoft.com/en-us/collaboration/fourthparadigm/contents.aspx
[2] ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri-roadmap
[3] https://documents.egi.eu/public/ShowDocument?docid=12
[4] www.e-irg.eu/
[5] cordis.europa.eu/fp7/ict/e-infrastructure/high-level-group_en.html
Due to the highly diverse, domain-specific requirements of different user communities,
there is a risk of fragmentation in the development of e-infrastructure. The fact that
funding for public infrastructure comes primarily from the independent Member States
of the European Community also represents a risk of fragmentation, because national
objectives (e.g. budgetary) may be misaligned with European-level needs. These risks
apply equally to research e-infrastructure and to e-government infrastructure, that is, the use of
ICTs in public sector activities. The most important recommendation of this roadmap is to
undertake determined and targeted efforts to discourage fragmentation, and to encourage
and participate in the development of an adequate set of structures, both organisational
(e.g. governance, single sign-on) and technical (e.g. open standards, security, software),
to ensure the interoperability of future European e-infrastructures for research and
e-government.
Introduction: Forces Driving Change
Powerful economic and environmental forces are driving a major evolution in the way
information and communications technology (ICT) is provisioned for user communities
in industry and the public sector. Economies of scale are driving consolidation of IT resources
into a smaller number of ever larger data centers. Data centers with hundreds of thousands
of computational and storage units are no longer uncommon. Considerations of the cost
of powering and cooling such large concentrations of electronic equipment, together with
environmental concerns, drive the placing of such data centers in geographic locations
where power is plentiful and inexpensive. As communities become more dependent on ICT
resources, the desire to assert their ownership of their data, legal concerns on the locality
of the data, and the need for geographical redundancy may lead to a diffusion of data
centres. The forces now driving change within ICT are many and potentially contradictory,
leading to different solutions that optimise the needs of different communities and their
use cases.
These forces and their consequences simultaneously enable and drive the move towards a
utility model of ICT. The current manifestation of this model is cloud computing through
the commoditisation of the underlying virtualisation technology and the globalisation of
service provision. The dynamic flexibility and reduced cost of accessing ICT resources in the
cloud are beginning to overwhelm most other considerations on provisioning ICT resources.
Such a fundamental shift poses numerous challenges to user communities. For example,
the Integrated Sustainable Pan-European Infrastructure for Researchers in Europe (EGI-InSPIRE)
project, partially funded by the EC, is responding to the demands from its user
communities by exploring aspects of cloud computing, notably flexible and elastic
provisioning, within its grid of federated resource providers. This document addresses a
number of these challenges, with a primary focus on standardization and interoperability
of the infrastructures built around the utility model.
Finally, market forces may be working against standardization in cloud computing [6].
The differing requirements of diverse customer communities lead naturally to market
segmentation. These differing requirements also enable vendor differentiation through
the development of different cloud architectures to address different market segments.
Competition among vendors can then lead to locking customers into distinct cloud
offerings.
[6] See the article “Cloud Computing Standards – Not This Year” by John Considine, January 2011, at cloudcomputing.sys-con.com/node/1691805
Future European e-Infrastructure
Electronic infrastructures at a European level are becoming fundamental resources for
supporting activities across the public sector - primarily e-research, e-government and
e-health - as society attempts to exploit the data deluge it is facing from the numerous
existing and future digital data sources. Obtaining knowledge from this data to benefit
many areas of society requires convergence at three main levels:
» The provision of a cost-effective, flexible, adaptable and reliable e-infrastructure that is
able to support different user groups and use cases;
» Access to persistently identifiable data sources - open access for public data and
restricted access for confidential data;
» The development of appropriate applications, algorithms and environments that use the
e-Infrastructure to extract knowledge from the data sources.
Tackling these issues cuts across many of the areas identified within the Digital Agenda for
Europe [7] as being critical for Europe’s continued growth towards a smart society: reducing
the fragmentation of services, improving their interoperability, providing secure access to
valuable data and resources, driving innovation and development in these services, and
educating a generation of users and developers in the benefit of such technologies.
Europe has already built up significant knowledge and momentum in one public sector
area - e-research - after over a decade of investment through the European Commission’s
Framework Programmes and national funding sources. A succession of projects has resulted
in capacity building across Europe and its regional partners in both grids of high throughput
computing (e.g. EGEE [8], EGI-InSPIRE [9]) and high performance computing (e.g. DEISA [10],
PRACE [11]) that are linked by the pan-European networking infrastructure GÉANT [12]. Alongside the
establishment of this e-infrastructure, innovative scalable middleware [13] has been developed
and deployed into operation to meet the needs of researchers across many disciplines
investigating such scientific and societal challenges as particle physics, the human genome,
or climate modeling.
The e-research community comprises researchers in such domains as high-energy physics,
astronomy and astrophysics, energy research, and the earth, material, biological and life
sciences. For this e-research community, the next decade will see European e-infrastructure
being used as a foundation for establishing multi-national multi-disciplinary research
infrastructures such as those described in the ESFRI roadmap. Although the maturity of
these individual projects varies, together they have common needs that if provided
consistently across the sector will promote many aspects of the Digital Agenda for Europe
and provide cost-effective return on investment.
[7] ec.europa.eu/information_society/digital-agenda/index_en.htm
[8] www.eu-egee.org
[9] www.egi.eu/projects/egi-inspire/
[10] www.deisa.eu/
[11] www.prace-project.eu/
[12] www.geant.net
[13] en.wikipedia.org/wiki/Middleware
Central to meeting these different use cases across the public sector is the provision of a
best-of-breed e-infrastructure that brings together public and commercial providers to deliver
a series of increasingly sophisticated platforms that are tuned to the particular needs of
these communities. At the heart of this vision is the provision of a federated, virtualised
e-infrastructure:
» Federated: Bringing together commercial and public sector providers from different countries
that are able to interoperate with each other, ultimately through the adoption of open
standards;
» Virtualised: Using new and emerging software to flexibly partition these resources on
demand to meet the needs of various user communities dynamically;
» e-infrastructure: Having a set of common services (e.g. identity management, accounting,
provisioning, data access, etc.) that provides a platform for adoption, portability and
re-use across different communities.
The vision presented in this document is by no means guaranteed. The investment that has
been committed by national governments and the European Commission in GÉANT, EGI and
PRACE provides vital structural building blocks in the e-infrastructure community, but in
moving from core e-infrastructure to higher-level components the priorities for investment
begin to diverge across Europe and between communities. The need for software to
manage, deploy and run in the federated virtualized environments remains. To avoid a single
monolithic software deployment across Europe the development and implementation of
standards remains essential if individual sectors are not to fragment into using their own
bespoke and non-interoperable software solutions.
While the Infrastructure as a Service (IaaS) model is at the heart of this vision for Europe as
a whole, it will be used as a basis for deploying platforms (Platforms as a Service - PaaS) and
software, notably application software (Software as a Service - SaaS) that are developed to
meet the needs of particular communities.
e-Infrastructure Requirements
Different communities will have different needs from the future European
e-infrastructure. Our focus is to identify the core common requirements relating
to the provision of e-infrastructure that the communities have rather than the specific
functionality used by particular communities.
» Single Sign-On: Inter-domain access to services from different communities demands
secure, portable, electronic identity that can be used across different service providers.
The federated identity providers that are being established in Europe present one
possible solution to this requirement.
» Security: Supporting secure and dynamic resource (including data, knowledge, and services)
sharing and collaborations across institutional and national boundaries is an essential part
of achieving the vision of an e-infrastructure. Robust electronic authentication capable of
reliably identifying remote users (human beings or software components) with a certain
level of assurance in authentication strength is an important pre-requisite to facilitate
effective user authorisation and fine-grained access control to distributed services [14].
» Group Management: Managing individual access to resources across Europe is not feasible
considering the number of users and resources. Using group based access control, such
as the virtual organisation models used in grids, the project model used in HPC and the
attributes model used in federated identities, provides a more scalable access control
model.
» Persistent Data Identifiers: The ability to uniquely identify a data set, and from that data
set identify its ownership, access rights, privacy attributes, provenance, life-time, stored
locations, etc. is vital for systematic reuse of data across communities.
» User Support: Support is needed for all types of users (end-users, system administrators,
developers, etc.) across the complete life-cycle of e-infrastructure adoption. This
includes training on the deployed technologies, consultancy on their use and problem
solving when something goes wrong. This is needed both for the core infrastructure and
any domain specific software that is deployed on top of it.
» Virtualisation: Communities need to deploy their own services, potentially co-located
with particular data sets, on sites across Europe on demand. Such activity can then be
decoupled from the deployment activities of other communities.
» High Throughput Data Analysis: Communities need to be able to move large data sets
to where the computing resources are available, and to move the results of such
analysis to where long-term storage capacity is available. In addition to the previous
requirements, this requires a high-performance pan-European networking infrastructure
closely coupled to data centres with large computing and storage capabilities, as
supported through the EGI-InSPIRE project.
» High Performance Computing: Peta-scale computing resources are essential for the small
proportion of researchers solving science’s most demanding problems through projects
such as PRACE. Efficient access to the small number of peta-scale machines in Europe is
facilitated through high-performance networking links.
[14] See E-infrastructure Security: Levels of Assurance Final Report: www.jisc.ac.uk/media/documents/programmes/einfrastructure/finalreport.pdf
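As a rough illustration of the Group Management and Persistent Data Identifiers requirements listed in this section, the following Python sketch resolves a persistent identifier to a metadata record and grants access by group membership rather than per individual user. All names used here (`DatasetRecord`, `REGISTRY`, `USER_GROUPS`, `can_read`, the `pid:example/...` identifiers and the group names) are hypothetical and are not taken from any existing e-infrastructure service; this is a minimal sketch under those assumptions, not a reference design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata reachable from a persistent identifier (hypothetical schema)."""
    pid: str          # persistent identifier of the data set
    owner: str        # owning organisation or project
    locations: tuple  # sites where copies are stored
    # Groups allowed to read the data; an empty set means public data.
    reader_groups: frozenset = field(default_factory=frozenset)


# Hypothetical registry mapping persistent identifiers to records.
REGISTRY = {
    "pid:example/climate-2010": DatasetRecord(
        pid="pid:example/climate-2010",
        owner="example-climate-project",
        locations=("site-a", "site-b"),
        reader_groups=frozenset({"climate-vo"}),
    ),
    "pid:example/open-survey": DatasetRecord(
        pid="pid:example/open-survey",
        owner="example-survey-project",
        locations=("site-c",),
    ),
}

# Hypothetical group membership, e.g. as asserted by a federated identity provider.
USER_GROUPS = {
    "alice": {"climate-vo", "hpc-project-42"},
    "bob": {"genomics-vo"},
}


def resolve(pid: str) -> DatasetRecord:
    """Return the record for a persistent identifier, or raise KeyError."""
    return REGISTRY[pid]


def can_read(user: str, pid: str) -> bool:
    """Grant access if the data set is public or the user shares a reader group."""
    record = resolve(pid)
    if not record.reader_groups:
        return True  # open access for public data
    return bool(USER_GROUPS.get(user, set()) & record.reader_groups)


if __name__ == "__main__":
    print(can_read("alice", "pid:example/climate-2010"))
    print(can_read("bob", "pid:example/climate-2010"))
    print(can_read("bob", "pid:example/open-survey"))
```

The access decision depends only on groups, so adding a user or a data set does not require editing per-user rules, which is the scalability argument made for group-based access control.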
e-Infrastructure Technology
e-Infrastructure in Europe has reached a production status over the last decade by
driving innovation in middleware and networking technology. This innovation needs to
continue over the next decade in areas such as:
» Virtualisation: High-quality hypervisors that underpin virtualisation in modern data
centres are becoming commonplace. Commercial solutions provide integration with
data centre operations. Open-source solutions, such as the OpenNebula environment,
are being used as powerful tools for innovation and interoperability in the research
community, and as platforms to implement new standards in cloud computing.
» Networking: Driven by the worldwide growth of the Internet, commercial networking
solutions are available for deployment to support public service activities. On-demand
cross-domain provisioning of high-speed data transfer links (light paths) with
defined service level agreements is an area that needs continuing investment.
» Software: The software platforms and services necessary to federate the virtualised
resources to provide seamless access and to run within the virtualised environments
continue to need investment. Increasingly, investment needs to take place through
acquisition of commercially provided software solutions where they exist and allowing
the research community to innovate through open-source software in areas where they
can add unique value beyond the scope of commercial solutions.
Enabling Standards
Standardisation and interoperability are essential to the successful
application of distributed computing.
The importance of the need for open standards to support interoperability goals is now
well documented in the e-business world. Of particular relevance to the e-research and
e-government communities are the statements made in the EICTA Interoperability White
Paper of 2004 [15], the ETSI White Paper No. 3 “Achieving Technical Interoperability” [16] and
the EC’s European Interoperability Strategy (EIS) [17] and Interoperability Framework (EIFv2) [18]
documents of 2010.
Given a policy of using open standards to achieve interoperability, the next question is
which standards? At present this is not easy to answer. There are many initiatives to define
the optimum set of standards to support all aspects of cloud computing [19], but as yet the full
set does not exist. Putting in place the necessary on-going procedures for tracking emerging
standards and technologies in order to a) set up and maintain a central agreed list of open
standards, and b) provide best practice advice to e-infrastructure projects, is a significant
task, and will require future investments. In an effort to align the needs of both the research
and e-government communities it may be beneficial to take into consideration current EC
work on Project CAMSS [20] and SEMIC.eu [21].
However, the following questions will persist for some time to come:
1. How does one proceed with interoperability if sufficient standards do not yet exist?
2. What happens if a large market develops for commercial offerings without open standard
specifications?
3. What if relevant open standard specifications exist but are not, or not yet, supported by industry?
[15] EICTA Interoperability white paper: www.digitaleurope.org/fileadmin/user_upload/document/document1166548285.pdf. In March 2009 EICTA was rebranded DIGITALEUROPE.
[16] ETSI White Paper No. 3, Achieving Technical Interoperability - the ETSI Approach. By Hans van der Veer (Alcatel-Lucent), Anthony Wiles (ETSI Secretariat). 3rd edition, April 2008. www.etsi.org/WebSite/document/whitepapers/IOP%20whitepaper%20Edition%203%20final.pdf
[17] COM(2010) 744 final, Annex 1: ec.europa.eu/isa/strategy/doc/annex_i_eis_en.pdf
[18] COM(2010) 744 final, Annex 2: ec.europa.eu/isa/strategy/doc/annex_ii_eif_en.pdf
[19] See, for example, forge.gridforum.org/sf/go/doc15990
[20] ec.europa.eu/isa/workprogramme/doc/detail_description_of_actions.pdf. CAMSS, an initiative of the European Commission’s IDABC programme, aims to initiate, support and coordinate the collaboration between volunteer Member States in defining a “Common Assessment Method for Standards and Specifications” and to share the assessment study results for the development of eGovernment services.
[21] www.semic.eu/semic/view/snav/shared-development.xhtml. SEMIC.EU is a participatory platform and a service by the European Commission that supports the sharing of assets of interoperability to be used in public administration and eGovernment.
The EIS/EIF provides the following pragmatic guidance on these questions which should be
equally applicable to the research communities:
» Public administrations may decide to use less open specifications, if open specifications
do not exist or do not meet functional interoperability needs.
» In some cases, public administrations may find that no suitable formalised specification is
available for a specific need in a specific area. If new specifications have to be developed,
public administrations may either develop the specifications themselves and put forward
the result for standardization, or request a new formalised specification to be developed
by standards developing organisations.
» Even where existing formalised specifications are available, they evolve over time
and experience shows that revisions often take a long time to be completed. Active
government participation in the standardization process mitigates concerns about
delays, improves alignment of the formalised specifications with public sector needs and
can help governments keep pace with technology innovation.
In the context of the SIENA Roadmap, it is essential that the research communities
who need e-infrastructures for their work define their requirements of the relevant
e-infrastructures. Without such definitions and conformance, little can be done to furnish
standards-compliant solutions that meet any community requirements. They should also
support and contribute to the current standardization initiatives and not seek to re-invent
wheels. As an interim measure they should consider building adapters to fill gaps in the
standards landscape, but adapters should not be seen as the long-term solution for achieving
interoperability.
International Co-ordination
Work on the SIENA roadmap complements that of the far larger US National Institute
of Standards and Technology (NIST) Cloud Computing Program [22]. A US Federal Cloud
Computing Strategy document has been released which outlines the Federal Government’s
approaches to Cloud Computing [23]. The SIENA project is concerned with e-infrastructure for
research including grids and clouds. The NIST program is concerned with government use
of cloud computing. The NIST SAJACC initiative [24] develops cloud system use cases to drive
the formation of cloud computing standards.
Cross communication between SIENA and the NIST program is proving beneficial. A number
of members of the SIENA REB are also participants in the NIST cloud computing expert
group.
Similar work is going on in Japan [25], China [26] and other countries. The NIST program in the
US, GICTF in Japan, and CESI in China are all potential partners in evaluating potential cloud
standards relevant for European e-infrastructure.
[22] www.nist.gov/itl/cloud/index.cfm, collaborate.nist.gov/twiki-cloud-computing/bin/view/CloudComputing/WebHome
[23] Federal Cloud Computing Strategy, Vivek Kundra, U.S. Chief Information Officer, February 8th 2011. www.nist.gov/itl/cloud/
[24] www.nist.gov/itl/cloud/sajacc.cfm
[25] See www.gictf.jp/index_e.html and the presentation “Smart Cloud Strategy in Japan” by Yasu Taniwaki, Division Director, ICT Strategy Division, Japanese Ministry of Internal Affairs and Communications, November 2010: items-int.eu/IMG/pdf/1011_Smart_Cloud_Strategy_Global_Forum_.pdf
[26] www.en.cesi.cn
Clouds Standards Coordination
Cloud standardisation efforts led by the Distributed Management Task Force (DMTF),
the Storage Networking Industry Association (SNIA) and the Open Grid Forum
(OGF) are frequently cited as being enablers that could have a major impact on compute
infrastructure in the future. Work on additional standards for various aspects of cloud-based
services is underway in the Organisation for Advancement of Structured Information
Standards (OASIS) and the Internet Engineering Task Force (IETF). At the same time, market
adoption of some of these standards is mixed, and different regions (US, China, Japan) are
still evaluating their approaches to cloud standards, so it is difficult to predict whether
consensus will emerge in the near term. The standards listed below that have emerged from
analysis of use cases collected to date are being coordinated through an alliance between
the OGF and SNIA as well as through a cross-SDO cloud standards collaboration group [27]:
» Open Virtualization Format (OVF) [28] developed by DMTF. OVF is a packaging standard
designed to address the portability and deployment of virtual appliances. It is
recognised as a DMTF and ANSI standard, categorised under IaaS and Interoperability. There are
firms that provide tools for conversion between various appliance formats, including
from OVF format to Amazon Machine Image (AMI) format [29].
» The Open Cloud Computing Interface (OCCI) [30] developed by the OGF. OCCI describes
application programming interfaces (APIs) that enable cloud providers to expose their
services. It focuses on IaaS-based clouds and allows the deployment, monitoring
and management of virtual workloads (like virtual machines), but is applicable to any
interaction with a virtual cloud resource through defined HTTP(S) header fields and
extensions. While there are several open-source implementations, OCCI has not yet been
widely adopted in commercial platforms. OCCI is also an input to the DMTF standard for
cloud management.
» The Cloud Data Management Interface (CDMI) [31] developed by SNIA. CDMI defines the
functional interface that applications use to create, retrieve, update and delete data
elements from the Cloud. CDMI is not yet widely implemented in commercial platforms.
Other standards may emerge that enable interoperability between clouds and grids. For
example, the OGF GLUE [32] standard provides one information model for describing grid
and cloud entities while the CIM model from DMTF [33] provides an alternative model used
frequently in industry.
[27] See the summary at www.ogf.org/standards/; the Cloud Standards Wiki is available at cloud-standards.org
[28] A description is available at dmtf.org/standards/ovf
[29] aws.amazon.com/amis/
[30] occi-wg.org/
[31] www.snia.org/tech_activities/standards/curr_standards/cdmi/
[32] GLUE Specification v. 2.0, by S. Andreozzi (INFN); S. Burke (RAL); F. Ehm (CERN); L. Field (CERN); G. Galang (ARCS); B. Konya (Lund University); M. Litmaath (CERN); P. Millar (DESY); JP Navarro (ANL). March 2009. www.ogf.org/documents/GFD.147.pdf
[33] www.dmtf.org/standards/cim
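The OCCI entry in this section mentions interaction with cloud resources through defined HTTP header fields. The following Python sketch shows what such a request might carry; it only builds a header dictionary and sends nothing over the network. The `Category` and `X-OCCI-Attribute` header names, the `text/occi` content type and the infrastructure scheme URI reflect our understanding of the OCCI HTTP rendering and should be checked against the OGF specification, while the helper name `occi_compute_headers` and the attribute values are purely illustrative assumptions.

```python
# Scheme URI for OCCI infrastructure kinds (assumed from the OCCI Infrastructure document).
INFRA_SCHEME = "http://schemas.ogf.org/occi/infrastructure#"


def occi_compute_headers(cores: int, memory_gb: float) -> dict:
    """Build HTTP headers describing a compute resource (hypothetical helper)."""
    if cores < 1 or memory_gb <= 0:
        raise ValueError("cores and memory_gb must be positive")
    # The Category header names the kind of resource being described.
    category = f'compute; scheme="{INFRA_SCHEME}"; class="kind"'
    # Attributes are rendered as comma-separated key=value pairs.
    attributes = ", ".join([
        f"occi.compute.cores={cores}",
        f"occi.compute.memory={memory_gb}",
    ])
    return {
        "Content-Type": "text/occi",
        "Category": category,
        "X-OCCI-Attribute": attributes,
    }


if __name__ == "__main__":
    for name, value in occi_compute_headers(2, 4.0).items():
        print(f"{name}: {value}")
```

Carrying the resource description in headers rather than in a provider-specific body is one way a single client could address several providers, which is the interoperability goal this section describes.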
Conclusions/Recommendations/Future Directions
The most important recommendation of this roadmap is to:
Undertake determined and targeted efforts to discourage fragmentation, while at
the same time preserving innovation in the development of e-infrastructure.
In support of this recommendation we believe the following actions are necessary by all
stakeholders to achieve the desired outcomes:
» Fund participation in the long-term development of an adequate set of open standards
to ensure the interoperability of future European infrastructures for research and
e-government.
» Public sector and commercial providers should engage more to explore shared standards
requirements.
» An ongoing process is needed to track emerging standards, technologies, and best practices
in order to create and maintain a structured repository of open standards (from various
SDOs) for grids and clouds, and provide updated guidance to European e-infrastructure
projects. This activity will benefit from interaction with worldwide initiatives and other
European projects (e.g. NIST, GICTF, CESI, CAMSS [34], SEMIC.eu [35], etc.).
» Encourage and fund the definition of sound security policies concerning the access, use
and provisioning of services within distributed infrastructures.
» Introduce guidelines for dealing with data privacy, long-term data curation, liability and
taxation issues in clouds and grids for work across legislative boundaries.
» Fund procurement of open-source or commercially provided software solutions,
allowing the research community to innovate in areas where they can add unique value
beyond the scope of commercial solutions.
» Fund on-demand cross-domain provisioning of high-speed data transfer links (light
paths) with defined service level agreements.
» Involve European citizens in e-science through volunteer computing (using, e.g.,
desktop grids and clouds).
[34] ec.europa.eu/idabc/en/document/7407.html. See also footnote 20.
[35] www.semic.eu/semic/. See also footnote 21.
Target Audience
This initial draft document is for circulation to the SIENA Roadmap Editorial Board (REB),
Industry Expert Group (IEG), Special Liaison Group (SLG) and the European Commission.
Timeline
Since October 2010, REB members have been contributing material to the SIENA Wiki. The
material is structured according to a table of contents for a final document. This initial draft
has been prepared as a SIENA deliverable to the EC. The REB has developed a publishable
version circulated at Cloudscape-III (Brussels, 15-16/03/2011). The REB will then integrate
further elements, namely the use cases presented at Cloudscape III from SIENA and NIST.
Scope
This document addresses requirements, technologies, and interoperability and standards
for e-infrastructure to support existing, ongoing, and future research in the European
Research Area. The term e-infrastructure encompasses the distributed information and
communications technologies (ICTs), together with federating software, that together
provide services and access to resources needed to support public sectors such as research
in the natural and social sciences and humanities. While not a focus of this specific
document, some consideration is given to aspects of e-infrastructure that apply also to
e-government. The most recent European Commission call under Framework Programme 7
for proposals relevant to e-infrastructure can be found in the European Commission Work
Programme 2011 Capacities Part 1 Research Infrastructures [36].
[36] cordis.europa.eu/fp7/wp-2011_en.html
Roadmap Editorial Board (REB) Member List
REB Member | Role & Organisation | Country
John Borras | Independent Consultant & OASIS | United Kingdom
Goetz-Philip Brasche | Program Director Cloud Computing EMIC & Venus-C representative | Germany
Mark Carlson | Senior Architect, Oracle & SNIA & DMTF representative | United States
Guy Coates | Group leader, Informatics systems group at Wellcome Trust Sanger Institute | United Kingdom
Juan Cáceres | Middleware Technologies Specialist, Telefónica I+D & StratusLab representative | Spain
Michel Drescher | EGI.eu Technical Manager | The Netherlands
Åke Edlund | KTH project manager and researcher & ECEE representative | Sweden
Mike Fisher | Distributed Computing Research Group Leader BT & Chair of Technical Committee, ETSI | United Kingdom
Patrick Guillemin | ETSI Secretariat, Strategy & New Initiatives | France
Jenny Huang | AT&T, OMG representative | United States
Gershon Janssen | Independent Consultant & OASIS Standards Group representative | The Netherlands
Craig Lee | The Aerospace Corporation | United States
Bob Marcus | ET-Strategies | United States
Ignacio Martin Llorente | Complutense University of Madrid & OpenNebula representative | Spain
Steven Newhouse | EGI.eu Director & EGI-InSPIRE Director | The Netherlands
Alexander Papaspyrou | Technische Univ. Dortmund & IGE representative | Germany
Morris Riedel | Jülich Supercomputing Centre & EMI representative | Germany
Alan Sill | VP of Standards, OGF & Senior Scientist, Texas Tech University | United States
Etienne Urbah | LAL, Univ Paris-Sud & EDGI representative | France
Martin Antony Walker | Independent Consultant & REB Chair | France
Roadmap content has been contributed by members of the SIENA Roadmap Editorial Board
(REB) and Industry Expert and Special Liaison Groups (IEG and SLG), who also contributed
to the editing process. Roadmap content structuring, production, and final editing were
done by Martin Antony Walker, REB chair, John Borras, co-chair, and Steven Newhouse,
Director of EGI.eu and EGI-InSPIRE, with contributions by Silvana Muscella, SIENA technical
coordinator, and James Ahtes, ATOS Origin. Organisation and coordination of the REB and
editorial activities have been carried out by the SIENA consortium.
SIENA Project Description – www.sienainitiative.eu
SIENA (RI-261575), the Standards and Interoperability for eInfrastructure Implementation
Initiative (2010-2012), is a Support Action funded by the European Commission
under Framework Programme 7 (2007-2013) Research infrastructures projects. SIENA will
contribute to defining a future eInfrastructures roadmap focusing on interoperability and
standards, in close collaboration with the European Commission, Distributed Computing
Infrastructures (DCI) projects and Standard Development Organisations (SDOs) to gain an
in-depth understanding of how distributed computing technology is being developed in
this context. The roadmap will define scenarios, identify trends, investigate the innovation
and impact sparked by cloud and grid computing, and deliver insight into how standards
and the policy framework are defining and shaping current and future development and
deployment in Europe and globally.
Use Cases &
Position Papers
15-16 March 2011
Brussels, Belgium
CloudScape III - Taking European Cloud Infrastructure Forward
Index
Introduction
Uses and perspectives from Science and Research
BiGGrid HPC Cloud
Biology on the Cloud
CONTRAIL - Open Computing Infrastructures for Elastic Services
RESERVOIR - IaaS Cloud Interoperability
TClouds - Trustworthy Cloud Computing
European Distributed Computing Infrastructures
EDGI, DEGISCO & IDGF - European Desktop Grid Initiative, Desktop Grids for International
Scientific Collaboration & International Desktop Grid Federation
EGI - European Grid Infrastructure
EMI - European Middleware Initiative
IGE - Initiative for Globus in Europe
StratusLab - Enhancing Grid Infrastructures with Virtualization and Cloud Technologies
VENUS-C - Virtual Multidisciplinary Environments using Cloud Infrastructures
Business & Government
The shift to cloud computing in government in the EU
G-Cloud - UK Government Cloud Computing Infrastructure
CitySourced/FreedomSpeaks citizen services platform
CUSTOM - Cultural Heritage & Tourism Store on the Cloud
Standards & Interfaces
OpenNebula - A reference open cloud stack to enable interoperable enterprise-class
cloud computing platforms
OCCI - Open Cloud Computing Interface specification set
Legal, Economic, Ethical and Security Issues
Cloud computing and its ethical challenges
VENUS-C study on economic and legal implications of sustainable scientific clouds
The Cloud: Understanding security, privacy and trust challenges
Introduction
Cloudscape III use cases and Position Papers for the SIENA
European Roadmap on Grid and Cloud Standards for
e-Science and Beyond
Cloud computing is expected to play a key role in the digital economy in Europe and beyond.
To ensure European citizens gain real benefits from the cloud, it is essential that we address
legal and institutional barriers, as well as technical challenges such as interoperability.
The SIENA Roadmap on grids and clouds for European research infrastructures and public
services addresses interoperability and standards and in the next 15 months is committed to
delivering a policy framework for distributed computing that ensures fair competition and
brings to bear European strategic priorities.
To help achieve these goals, the SIENA consortium is drawing on Cloudscape III to showcase
speakers from all over the globe who will offer their personal insights on specific use cases
or interoperability issues surrounding Cloud computing.
The following use cases and position papers have been collected for the Cloudscape III
event, serving primarily as a sample of the cloud computing landscape. They highlight
potential challenges for deliberation at Cloudscape III and for the SIENA Roadmap Editorial
Board in the coming months, with the aim of shaping future developments and the SIENA
Roadmap itself.
The full collection of use cases and position papers is available at
www.sienainitiative.eu/CloudscapeIII-UseCases&PositionPapers/
Uses and perspectives from Science and Research
BiGGrid HPC Cloud
General overview and field of application
With the newly developed BiGGrid High Performance Computing (HPC) Cloud environment,
scientific researchers get access to their own Virtual Private HPC Cluster. It is a virtualized HPC
Cluster that users can configure to exactly match their needs, without interfering with the needs
of other users. It is flexible, offers self-service and is dynamically scalable.
Users can start from existing templates (images), or build their own cluster from scratch. It is
even possible to make a copy from their current IT software environment (for example their
laptop or desktop PC) and turn that into an HPC cluster in our Cloud. In that way, there will be very
little difference between their development environment and their production environment.
There is no need for an (expensive) rewrite of their software, and scientific challenges can be
scaled up very easily from desktop scale to High Performance Compute cluster scale.
The importance of interoperability
For us the most important part of Cloud standards is that we offer infrastructure as a service,
but we want to hide all the differences and little details of hardware behind an abstract
interface or API. For example, it does not really matter which Cloud middleware we use or
which OS runs on the hosts to deploy our VMs: we use OCCI as an interface between our
GUI and OpenNebula. We are also finalizing an implementation of CDMI to have the same
setup for storage. CDMI will hide the complexities for users of where data is located in a
distributed cloud and which protocols they can use to access it. Also through CDMI users
can deploy a storage volume and manage their data, including fine grained authorizations,
without manual steps by our administrators.
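The idea of hiding the middleware behind one abstract interface can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ComputeBackend` interface, the `InMemoryBackend` class and the `start_cluster` helper are hypothetical names invented for this sketch, and they do not reproduce the OCCI or OpenNebula APIs.

```python
from abc import ABC, abstractmethod


class ComputeBackend(ABC):
    """Hypothetical abstract interface that a GUI could program against."""

    @abstractmethod
    def create_vm(self, template: str) -> str:
        """Start a VM from a template and return its identifier."""

    @abstractmethod
    def delete_vm(self, vm_id: str) -> None:
        """Stop and remove a VM."""


class InMemoryBackend(ComputeBackend):
    """Stand-in for a real middleware adapter; it only records VMs in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.vms: dict[str, str] = {}
        self._counter = 0

    def create_vm(self, template: str) -> str:
        self._counter += 1
        vm_id = f"{self.name}-vm-{self._counter}"
        self.vms[vm_id] = template
        return vm_id

    def delete_vm(self, vm_id: str) -> None:
        del self.vms[vm_id]


def start_cluster(backend: ComputeBackend, template: str, nodes: int) -> list[str]:
    """Caller code depends only on the abstract interface, not on the middleware."""
    return [backend.create_vm(template) for _ in range(nodes)]
```

Swapping the middleware then means providing another `ComputeBackend` subclass; `start_cluster` and any GUI code built on it stay unchanged.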
Adoption of emerging or existing standards
We are also starting to work on an API for network configurations. Our users will be able to
manage many network settings by themselves, for example the creation of a VLAN between
VMs, setting firewall rules and setting up secure connections to their virtual machines.
Our goal is that we fully automate the management of virtual HPC clusters. All (skilled) end
users can be completely self supporting and can access and configure their virtual private
HPC cluster in the BiGGrid HPC Cloud through a secure and functionally complete API.
When these standards for compute, storage and network are complete, they can also be used
between Cloud clusters/providers to (automatically) negotiate migration of workloads.
Security configurations are especially important for this use case.
Finally, standards should be open, so everybody can benefit and end users will actually have
a choice of where to deploy.
Possible future cooperation
ECEE – Enabling Clouds for eScience – is an open collaboration spot for cloud projects in
Europe. The purpose of ECEE is to share experiences to find out as much as possible, as
quickly as possible, about how clouds can help our users in their daily work.
eScience projects involved so far are NEON, BalticCloud, NGS, GRNET cloud, SARA cloud,
UCM (OpenNebula), StratusLab, VENUS-C, SEECCI and CESGA – which together represent
a fair share of the European cloud community. ECEE focuses on interoperability-now, sharing
its input and requirements with ongoing standardization efforts. Meeting twice a year since
OGF28 in March 2010, the projects together share roadmaps, experiences and issues – trying
to identify: a common roadmap over all; gap analysis; “Market analysis” – today’s users,
tomorrow’s; Guidelines – best practices, quick start one-pager, checklists and practical ‘rules
of thumb’. A number of Focus Areas were identified at an early stage including: Security,
Metering, Accounting, Billing, Business models, Federation of clouds, Network and Licences,
Scheduling, load balancing (resource sharing, application correlation) and in making a list of
tested solutions, and their pros and cons.
Contacts: Floris Sluiter; Åke Edlund
Organisation: SARA HPC centre; KTH Royal Institute of Technology
Contact details: floris@sara.nl; edlund@nada.kth.se
Web: www.cloud.sara.nl
Biology on the Cloud
The Cloud provides a wide range of infrastructure and software services that can be used
by the Biology user community. Indeed, experienced technical computing users are already
finding ways in which to use these services to augment their existing computing resources.
The greater promise of the cloud is that it can make technical computing pervasive,
opening up the field to new researchers who have not been traditional HPC users. These
researchers will be able to co-opt sophisticated cloud services provided by both academia
and commercial providers to aid them in their research. In this paper I will showcase two
Biology Cloud use cases which offer a number of advantages to users.
IaaS: Web-services Mirrors
The Ensembl project provides a variety of web services which allow researchers to visualise
and data-mine genomic data (www.ensembl.org). Ensembl has a world-wide audience and is
accessed 24 hours a day. Historically, the web service was hosted in a single UK datacentre.
Whilst this provided fast access to users in the UK and Europe, users in Asia and the
Americas found that access to the web services was slow, due to the large latencies involved
in serving requests across the globe. Single site hosting also made the website vulnerable
to datacentre and network outages.
The global, distributed nature of commercial Cloud IaaS makes it a useful building block
for providing world-wide availability and reach. Ensembl has used public IaaS providers to
build mirrors of its web services in the United States of America and Asia. Not only has this
massively increased the performance of the website for non-European users, but it also
provides continued availability of service when the UK datacentre is offline.
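The two benefits of mirroring described above (lower latency and continued availability) can be illustrated with a small routing sketch. It is not how Ensembl routes requests; the `Mirror` record, the region labels and the `pick_mirror` function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mirror:
    region: str        # hypothetical label, e.g. "uk", "us", "asia"
    latency_ms: float  # round-trip time measured from the client
    online: bool       # result of a health check


def pick_mirror(mirrors: list[Mirror]) -> Optional[Mirror]:
    """Return the reachable mirror with the lowest latency, or None if all are down."""
    candidates = [m for m in mirrors if m.online]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.latency_ms)
```

With a single site there is only one candidate, so an outage leaves no fallback; with several mirrors the same function both shortens the path for distant users and skips a site that is offline.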
Cloud hosting provides several advantages over hosting in a traditional co-location facility.
Installing real hardware in a remote co-location facility requires time-consuming and costly
logistics. Hardware has to be shipped to the facility and cleared through customs, and
staff need to be present on site to oversee hardware installation and initial provisioning.
In contrast, provisioning virtual hardware in a remote cloud IaaS facility can be done from
any location with internet access, whilst the “on-demand” facilities allow machines to be
provisioned within a matter of minutes.
SaaS: Providing Informatics services for Next-Generation
Sequencing (NGS)
SaaS provides new opportunities for organisations to provide IT services to researchers.
IT service provision for next-generation sequencing machines is a huge challenge. A single
sequencing instrument can produce approximately a terabyte of raw data per day and a
large sequencing study may end up with a total dataset of many hundreds of terabytes.
Dealing with this data is a challenge for organisations of all sizes, whether they are a small
lab with a single machine, or a large sequencing centre with many tens of machines.
Although sequencing manufacturers provide basic analysis software for their machines,
there is a whole extended eco-system of software that researchers typically want to run on
their data. The large volumes of data mean that labs need to integrate their instruments
with a LIMS (Laboratory Information Management System), in order to organise and track
their data. Researchers will also want to run down-stream analysis on their data once it
comes off the sequencers; raw sequence data is typically only the first stage in a scientific
investigation. Down-stream analysis software is typically complex, and requires a high-
performance computing (HPC) infrastructure.
Rather than having to provide software and HPC support in house, the Cloud SaaS model
allows researchers to obtain LIMS and data analysis services from specialised bio-informatics
suppliers.
Using this model, researchers run a sequencing experiment in-house, and the raw data is then
uploaded to the SaaS providers, who will then analyse, track and store their data. Researchers
are therefore freed from having to manage their own LIMS and HPC infrastructure.
Whilst most sequencing SaaS is currently provided by commercial entities (e.g.
https://www.seqcentral.com, www.dnanexus.com), opportunities also exist for academic cloud
providers. Many large scale sequencing projects are carried out by large academic consortia,
composed of many different organisations with differing specialities (e.g. the International
Cancer Genome Consortium, www.icgc.org). Members of the consortium with a high level
of IT expertise can provide SaaS services to the whole of the consortium. These services
may be hosted on the consortium’s own infrastructure, or on cloud IaaS provided by a third
party. Private cloud SaaS provision within a consortium may be especially useful when
data-privacy and security policies make it impractical to host data on third-party cloud
services.
Challenges remain. Although research organisations are connected by high speed networks,
these networks are currently not well connected to the commercial networks used by
commercial cloud providers. In practice, transferring large amounts of data into commercial
cloud providers is time-consuming, and can limit the usefulness of SaaS services for
sequencing applications, especially for organisations with limited network connectivity.
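A rough calculation shows why upload time matters at these data volumes. The sketch below is plain arithmetic, not a measurement: the link rates and the `efficiency` factor are assumptions chosen for illustration, and only the figure of about one terabyte per instrument per day comes from the text above.

```python
def transfer_hours(data_terabytes: float, link_mbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move a dataset over a network link.

    Uses decimal units (1 TB = 8e6 megabits). `efficiency` is the assumed
    fraction of the nominal link rate that is actually achieved.
    """
    if link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    megabits = data_terabytes * 8e6
    seconds = megabits / (link_mbit_per_s * efficiency)
    return seconds / 3600


# One day of output from a single instrument (about 1 TB):
hours_at_100_mbit = transfer_hours(1, 100)   # about 22.2 hours
hours_at_1_gbit = transfer_hours(1, 1000)    # about 2.2 hours
```

At a sustained 100 Mbit/s, uploading one day of data from a single instrument takes almost a full day, so a lab on such a link could barely keep pace with one sequencer.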
Contact: Guy Coates
Organisation: Wellcome Trust Sanger Institute
Contact details: gmpc@sanger.ac.uk
Web: www.ensembl.org
Relevant Links: www.seqcentral.com; www.dnanexus.com; www.icgc.org
CONTRAIL – Open Computing
Infrastructures for Elastic Services
General overview and field of application
The Contrail project will deliver federated access to cloud resources. Single registration and
account management are core features of the use cases, where “account management” also
includes roles and permissions, billing, resource allocations, etc. Services are selected based on
published service levels and “quality of protection,” as well as, of course, cost and permissions.
Federated access must be transparent, with the federation accessing, or enabling access to,
remote cloud services on behalf of the user, but of course without incurring unexpected
costs. Account management will thus need to include an internal economic model.
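The selection criteria named above (published service level, quality of protection, cost and permissions) can be pictured as a filter followed by a ranking. The sketch below is not Contrail code: the `Offer` fields, the numeric protection score and the `select_offer` function are hypothetical, and the budget cap is one simple way to model "no unexpected costs".

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str             # hypothetical provider name
    availability: float       # published service level, e.g. 0.999
    protection_level: int     # assumed ordinal "quality of protection" score
    cost_per_hour: float      # price in an arbitrary currency unit
    allowed_roles: frozenset  # roles permitted to use this offer


def select_offer(offers, role, min_availability, min_protection, budget_per_hour):
    """Return the cheapest offer that satisfies every constraint, or None."""
    eligible = [
        o for o in offers
        if role in o.allowed_roles
        and o.availability >= min_availability
        and o.protection_level >= min_protection
        and o.cost_per_hour <= budget_per_hour
    ]
    return min(eligible, key=lambda o: o.cost_per_hour, default=None)
```

Returning `None` when nothing fits within the budget forces the caller to ask the user before spending more, rather than silently choosing a costlier provider.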
Briefly, the use cases (case studies) cover geo-referenced data, processing streaming
multimedia, real-time high performance scientific data analysis, and drug discovery. Our
user communities cover both industry and academic users. (The mapping of use cases to
requirements is still ongoing.)
Contrail will provide both PaaS and IaaS. The PaaS services will be using existing components
for “structured storage” – a key/value store, a database infrastructure (using SQL), as well
as hosting services enabling hosting of PHP applications, MapReduce-enabled storage with
Hadoop, and “bag-of-tasks” services. In addition to the native interfaces, we will need
interfaces for provisioning and managing PaaS resources.
The importance of interoperability
Interoperability is very important to Contrail. As the federation accesses services on behalf
of users, having standard interfaces into clouds (such as OCCI from OGF and CDMI from
SNIA) will be very useful. Otherwise, we will need to code an interface for each service
provider which will limit the number of service providers we can support. As we currently
plan to work with OpenNebula, we will support their interfaces.
The other role of standards is to ensure that the interface remains stable: a proprietary
interface could be changed by its owner, potentially without consulting us, whereas a
standard managed by a standards body will have processes for updating protocols. In this
respect, it would be useful to focus on open standards bodies and/or working groups,
where the participation is open and not prohibitively expensive.
Whenever possible, we try to identify existing standards and evaluate them to see whether
they are appropriate for Contrail. If not, we consider working with the standards working
groups to augment their standard. While we reuse whenever possible, we will also seek
standardisation of our own work whenever appropriate. Having learnt from other EU
projects, we will identify work for potential standardisation and collaboration in standards
bodies at an early stage in the project, to ensure that such work has a reasonable chance of
completion during the lifetime of Contrail. We make as much use as possible of collaboration
events and are currently working on identifying peer projects for collaboration.
There are additional benefits to collaborating on standards: we avoid duplication of effort,
and get more effort behind the work by collaborating.
Adoption of emerging or existing standards
The maturity of standards – and their implementations – is very important: a standard which
has more than one implementation behind it, at least one in C or C++ and one in Java, where
the implementations are robust and independent of each other, and the underlying libraries are
themselves mature, will be much more useful. We could in principle use a protocol which has a
single implementation (most of our own code will be implemented in Java), but Contrail will also
need to interoperate with more than itself, so mature implementations should be preferred.
As an example, there are many security-related standards from IETF, W3C, OASIS, ITU which
are relevant to Contrail. We note that even very mature standards like X.509 certificates can
pose interoperation problems, and many later standards (e.g. in WS-Security) have themselves
taken a long time to mature, and not all of these are usable yet. There is also a risk with new
standards that implementations cover the specification only partially, in which case we will need
to know – or learn “the hard way” – which parts of the specification we can use.
We are still reviewing existing standards for suitability for Contrail, as well as related work
produced by other EU-funded projects. We are following interoperation activities in OGF
(e.g. GIN, PGI, and the proposed Cloud-BP; BP = Basic Profile, analogous to HPC-BP).
We see interoperation testing happening mainly in collaborations with peer projects, and/
or within the scope of standards bodies, not usually within Contrail itself.
It is possible that we can help emerging standards mature by using them both within Contrail
and in collaborations, but this will require more effort and will extend the development time
for our own components. So, all other things being equal, a mature standard is preferred.
We are likely to use (or at the very least evaluate) the following emerging standards:
OCCI from OGF; CDMI from SNIA; proposed extensions to XACML (to bring it in line with
functionality in POLPA); DMTF standards that may be relevant (OVF, “OVF+”); standards (if any)
for managing workflow; AMQP – Advanced Message Queuing Protocol (www.amqp.org).
Possible future cooperation
Existing projects:
SLA@SOI – SLA management, service management – uses Apache TASHI, and they claim
their service manager is “based on OCCI”(?); MASTER - protection profiles, risks, trusted
infrastructure; DEPLOY – formal methods; Cloud4SOA; RESERVOIR framework for business
applications – applications, SLA. Use of OpenNebula; StratusLab; mOSAIC.
Contact: Dr Christine MORIN; Dr Jens Jensen
Organisation: INRIA Rennes; Science and Technology Facilities Council
Contact details: contrail-contact@inria.fr; jens.jensen@stfc.ac.uk
Web: contrail-project.eu
RESERVOIR - IaaS Cloud Interoperability
General overview and field of application
The RESERVOIR project is developing an IaaS cloud computing platform with features that go
beyond current alternatives, such as automatic scalability and site federation.
The applications at which RESERVOIR is aimed are multi-tier services that are deployed and
managed using the RESERVOIR middleware. The services demonstrated in the project range
across application fields, from grid computing and corporate services (e.g. SAP) to eGovernment
and the telco industry. The RESERVOIR architecture provides site federation, and functionality
is split into three different middleware layers: Service Manager (SM), which provides holistic
service management; Virtual Execution Environment Management (VEEM), which manages
the virtual machines that compose the service implementing the federation capabilities;
and Virtual Execution Environment Host (VEEH) which implements the virtualization
platform (i.e. hypervisor).
The importance of interoperability
Interoperability is key in RESERVOIR and standards are used in three areas. Firstly, the
service packaging format should leverage standard formats, so the same services that
customers get from ISVs and deploy in their in-house IT infrastructure and/or other clouds can
also be seamlessly deployed in RESERVOIR. Secondly, the deployment and management
API used by users to interact with RESERVOIR cloud should be standardized. Thirdly, as
RESERVOIR is composed of three independent middleware layers (Service Manager, Virtual
Execution Environment Management and Virtual Execution Environment Host) that could
be developed and provided independently, standard APIs between them are needed.
Adoption of emerging or existing standards
In order to package the services that are deployed in RESERVOIR cloud, the Distributed
Management Task Force (DMTF)’s Open Virtualization Format (OVF) is used. The challenge
with OVF in RESERVOIR is how to adhere to the basic standard, which is widely used in
industry but lacks the advanced features of RESERVOIR (elasticity, deployment-time
configuration, deployment constraints, etc.), and at the same time how to introduce these
features without breaking it. The key to achieving this goal is using OVF’s built-in extensibility.
Apart from OVF, standard APIs are needed to allow the interaction between users and the
RESERVOIR cloud. In this area, we have found a lot of fragmentation, because each alternative
in the IaaS management API landscape is actually a vendor-specific API rather than a
standard one. However, efforts are emerging to define a truly standard
IaaS management API, and one of the most prominent is the work carried out in the
DMTF’s Cloud Management WG. In the RESERVOIR project, TCloud API has been defined
and used as IaaS management API and, in order to get a close alignment with the final DMTF
standard, we submitted this proposal to DMTF and actively participate in CMWG work.
Regarding interoperability between RESERVOIR middleware layers, standard alternatives
are also being explored and used: TCloud API (the “intra-layer” functionality being a
subset of the API exposed to cloud users) and libvirt. Once the DMTF’s CMWG API
consolidates, interoperability tests could be done between RESERVOIR and future vendors’
implementations.
Possible future cooperation
The standards consolidated in RESERVOIR (OVF and TCloud API) will continue their evolution
in other cloud-related projects involving the same partners (such as FP7 4CaaST, FP7
VISION or Spanish funded NUBA) and in the products developed by the industrial partners
in those consortia.
Contact: Fermín Galán Márquez
Organisation: Telefónica I+D
Contact details: fermin@tid.es
Web: www.reservoir-fp7.eu/
TClouds – Trustworthy Cloud Computing
General overview and field of application
The TClouds project investigates two use cases:
1. The Smart Grid Use Case
This case is based on a smart grid application that has been developed jointly by Portugal’s
main energy provider EDP (www.edp.pt) and the engineering company EFACEC (www.efacec.pt).
The application is in a pre-commercial stage and is currently being piloted with
public agencies. A central element is the real-time data generation, intelligent analysis
and smart control of public lighting.
2. The eHealth Use Case
This case is based on a patient monitoring, medical data analysis and remote diagnosis
application that is being developed by Philips (www.healthcare.philips.com) and the
St. Raffaele Hospital (www.sanraffaele.org) in Milan. The application is in the research
and development stage. Central requirements are differentiated data access according
to roles such as patient, doctor, pharmacist or patient family members. Also, strict
regulatory requirements need to be observed in order to protect the privacy of the
treated information.
TClouds investigates the migration of central elements of these applications into an
IaaS cloud environment – in particular the scalable operational data storage as well as
performance critical run-time components. In both cases specific regulatory conditions
apply that are derived from EU as well as national law. Both cases also imply specific
requirements for security and need to protect the application from external as well as
insider attacks from cloud provider maintenance personnel.
TClouds is specifically investigating the migration into a cloud-of-clouds environment
that is composed of multiple federated IaaS providers. For this reason, TClouds will set up
several test sites as well as use commercial IaaS providers.
The importance of interoperability
TClouds is on the one hand researching technologies that can provide external security
and privacy to any IaaS cloud – such as allowing computation with encrypted data in the
cloud or the automated integrity verification of results received from software components
deployed in a cloud.
However, complementary mechanisms that TClouds is developing will also involve
interfaces and interaction with the IaaS providers on the deployment and enforcement of
security and privacy policies. This relates to the IaaS service management interface level as
well as to the standards for deployment descriptions and monitoring.
Adoption of emerging or existing standards
TClouds is investigating two Open Source cloud platforms: OpenStack (www.openstack.org)
and Open Nebula (opennebula.org).
TClouds also envisages the adoption and extension of open cloud standards. Currently, the
following are examples of standards under consideration:
» The DMTF Open Virtualization Format (OVF)
» The OGF Open Cloud Computing Interface (OCCI)
» The SNIA Cloud Data Management Interface (CDMI)
» The NIST Cloud Standards Roadmap – e.g. SCAP / Security Content Automation Protocol
» Existing security standards – such as for identity and access management, encryption and
key management
Possible future cooperation
TClouds is collaborating with the following initiatives:
» Effectsplus – Networking of EU Security Projects
» FIA - European Future Internet Assembly
» NESSI – Networked European Software and Services ETP
Relevant EU cloud projects (only first indications):
» RESERVOIR (federated IaaS clouds)
» VISION (federated cloud storage)
» SAIL (cloud networking)
Contacts: Elmar Husmann; Matthias Schunter
Organisation: IBM Strategy & Change - Innovation; IBM Research – Zurich
Contact details: huselmar@de.ibm.com; mts@zurich.ibm.com
Web: www.tclouds-project.eu
European Distributed Computing Infrastructures
EDGI, DEGISCO & IDGF
General overview and field of application
The EDGI (European Desktop Grid Initiative) and DEGISCO (Desktop Grids for International
Scientific Collaboration) European projects, together with IDGF (International Desktop
Grid Federation), are expanding the power of eScience infrastructures such as EGI with
Desktop resources (which are numerous and cheap) and Cloud resources (which provide
Quality of Service) in full production.
On the e-Infrastructures side, we interface with the computing element by presenting the
collected Desktop resources as just another Batch System. On the Desktop Grid side, we
interface with the Desktop Grid server by submitting jobs to it. We interface with Clouds
by using their API.
Our ‘Application Repository’ middleware publishes applications from government, industry
or academia which have been adapted and validated for secure execution on Desktop
resources.
The importance of interoperability
Our projects are needed because of the current lack of interoperability between the various
middleware stacks for Grids, Desktop Grids and Clouds. In fact, we are providing practical
interoperation through our bridge, using ad-hoc adapters, converters and translators for
each connected Grid or Cloud middleware.
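The idea of one bridge with a separate adapter per connected middleware can be sketched as follows. This is only an illustration of the adapter pattern: the class names, the job dictionary and the returned strings are hypothetical and do not reflect the actual EDGI bridge code.

```python
from abc import ABC, abstractmethod


class BackendAdapter(ABC):
    """Common interface the bridge expects from every connected back end."""

    @abstractmethod
    def submit(self, job):
        """Translate a generic job description and hand it to the back end."""


class DesktopGridAdapter(BackendAdapter):
    def submit(self, job):
        # A real adapter would create a work unit on the Desktop Grid server.
        return f"desktop-grid work unit for {job['executable']}"


class CloudAdapter(BackendAdapter):
    def submit(self, job):
        # A real adapter would call the cloud provider's API.
        return f"cloud task for {job['executable']}"


class Bridge:
    """Routes a generic job to whichever adapter serves the chosen back end."""

    def __init__(self, adapters):
        self.adapters = adapters

    def submit(self, backend, job):
        return self.adapters[backend].submit(job)


bridge = Bridge({"desktop-grid": DesktopGridAdapter(), "cloud": CloudAdapter()})
result = bridge.submit("desktop-grid", {"executable": "simulate"})
```

Each new middleware costs one more adapter class, which is the maintenance burden that widely implemented common standards would reduce.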
Our work would be made much easier by common open Grid/Cloud standards which are not
only published, but also widely implemented in a truly interoperable manner. We present below
the relevant standardization domains in decreasing order of importance.
Adoption of emerging or existing standards
We are currently using many of the following de facto and official standards and we plan to
use more of them in the future:
» Information publication and discovery is standardized by OGF GLUE 2.0.
» Security is covered by IGTF, RFC 3820 compliant X.509 proxies, OGF VOMS, OASIS SAML
and EGI SPG.
» Log records will be standardized by OGF Activity Instance Document Schema.
» Accounting records are standardized by OGF Usage Record.
» Monitoring may be performed using the WLCG Nagios stack.
» Data management is standardized by OGF DFDL, OGF ByteIO, GridFTP, SRM, DMI and
SNIA CDMI.
» Virtual image format and definition by DMTF OVF.
» VM instantiation and management by OGF OCCI.
» Job description language by OGF JSDL; job management protocol by OGF BES and HPC
Basic Profile.
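As an illustration of one item in the list above, the following sketch builds a minimal JSDL job description with a POSIX application. The namespace URIs and element names are given as we understand JSDL 1.0 and should be checked against the OGF specification. The helper name `build_jsdl` and the example executable are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as we understand them from JSDL 1.0; verify against the spec.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(executable, arguments):
    """Return a minimal JSDL document (as a string) for a POSIX executable."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    description = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    application = ET.SubElement(description, f"{{{JSDL}}}Application")
    posix = ET.SubElement(application, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for argument in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = argument
    return ET.tostring(job, encoding="unicode")


document = build_jsdl("/bin/echo", ["hello"])
```

A document of this kind is what a bridge would pass to a BES-compliant job management service, independently of the middleware behind it.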
Possible future cooperation
IDGF and EDGI/DEGISCO are working in strong collaboration with EGI, EMI, NorduGrid,
UNICORE Forum and interested NGIs in order to reach the widest possible user and
resource provider communities.
IDGF is organising desktop grid operators and application developers. Standardization
activities are carried out mainly inside OGF. EDGI is carefully following any improvements
and further developments of ARC, gLite and UNICORE maintained by EMI in order to make
sure that the Service Grids to Desktop Grids bridge middleware developed by EDGI will be
compatible with any new versions of the ARC, gLite, UNICORE and UMD middleware stacks.
IDGF and EDGI/DEGISCO will explore integration into future e-Infrastructures. This includes
possible collaborations with Cloud research projects such as Contrail and mOSAIC, as well as
extending virtualization techniques to the Desktop Grid client.
Contact: Etienne Urbah
Organisation: LAL, Univ Paris-Sud
Contact details: urbah@lal.in2p3.fr
Web: edgi-project.eu
Relevant Links: desktopgridfederation.eu
EGI - European Grid Infrastructure
General overview and field of application
EGI provides an e-infrastructure to support the data analysis and computational needs
of its publicly funded and supported end-users from the research community within
Europe. Increasingly, this community has experimented with the interfaces provided by
commercial cloud providers (IaaS, PaaS & SaaS) and would like to experience similar ease of
use and flexibility, but with the efficiency, data transfer rates, control and cost (free at the
point of use) that they have experienced within publicly funded e-Infrastructure.
The main users of such an environment are not expected to be end-users directly.
Rather they will be experts associated with the Virtual Research Community (or Virtual
Organisation) that will manage the preparation, deployment and operation of the virtual
machines. These experts will come either from within the community or within an NGI
working on behalf of that community. These experts would decide on behalf of their
community the distribution of the services at the resource centres that they have access to,
when to deploy new software updates, and even the software that they would use.
Essential to this model is to federate the virtual resources located at the resource
infrastructure providers (the European NGIs and EIROs within EGI) to provide:
» Authentication and authorization model that permits the access to virtual machine
management functions (deploy, start, stop, inspect, etc.) located at sites in different
administrative domains
» Provisioning and maintenance of virtualized resources driven by locality to existing data
sources, data sinks, or high performance networking links
The importance of interoperability
Interoperability is essential to a federated virtualised infrastructure. Each resource centre
(site) will wish to make its own decision as to the underlying virtual machine management
system it uses. This capability will need to be exposed in a systematic and consistent way
to a distributed user group which will need to access many such centres. Standards such as
OCCI and other IaaS activity are essential for this usage model.
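To make the idea of a consistent interface across sites concrete, here is a minimal sketch of a request that creates a compute resource, described once and addressed to several resource centres. The header layout follows our understanding of the OCCI HTTP text rendering and should be verified against the OGF specification. The site URLs and the function name are hypothetical, and no request is actually sent.

```python
# Scheme URI for OCCI infrastructure kinds, as we understand the specification.
OCCI_INFRA = "http://schemas.ogf.org/occi/infrastructure#"


def occi_create_compute(cores, memory_gb):
    """Describe an HTTP request asking an OCCI endpoint for a new virtual machine."""
    return {
        "method": "POST",
        "path": "/compute/",
        "headers": {
            "Content-Type": "text/occi",
            "Category": f'compute; scheme="{OCCI_INFRA}"; class="kind"',
            "X-OCCI-Attribute": (
                f"occi.compute.cores={cores}, occi.compute.memory={memory_gb}"
            ),
        },
    }


# Hypothetical resource centres, each free to run a different VM manager.
SITES = ["https://site-a.example.org", "https://site-b.example.org"]

request = occi_create_compute(cores=2, memory_gb=4.0)
targets = [site + request["path"] for site in SITES]
```

The same request description is reused for every site, so the choice of underlying virtual machine management system stays a local decision.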
Likewise, coordination is a key aspect of any federated model. For a virtualised federated
infrastructure, the ability to manage consistent access to these resources demands a
common security model that scales with regards to authentication and authorization.
X.509-related technology coupled with the virtual organisation model has been shown to work
technically at this scale, and if its primary use is to govern access to the virtual machine
management functions (as opposed to access to the services run inside the virtual machine),
it provides a standards-based solution.
Key aspects of federation are resource discovery and usage reporting. Standards such as
GLUE2 are being used within EGI to describe resources and derivatives of the Usage Record
specification are used to aggregate accounting records on a Europe-wide basis. Much of
this information flow is now being supported by messaging technologies implementing the
JMS specification.
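The aggregation of accounting records across sites can be sketched in a few lines. The record fields below are a deliberately simplified, hypothetical stand-in and not the OGF Usage Record schema. The site names, VO names and figures are made up for illustration.

```python
from collections import defaultdict

# Simplified, made-up accounting records as they might arrive from sites.
records = [
    {"site": "site-a", "vo": "vo-a", "cpu_hours": 120},
    {"site": "site-b", "vo": "vo-a", "cpu_hours": 80},
    {"site": "site-b", "vo": "vo-b", "cpu_hours": 300},
]


def aggregate_by_vo(usage_records):
    """Sum CPU hours per virtual organisation across all reporting sites."""
    totals = defaultdict(int)
    for usage in usage_records:
        totals[usage["vo"]] += usage["cpu_hours"]
    return dict(totals)


totals = aggregate_by_vo(records)
```

Because every site reports in the same record format, the central aggregation needs no site-specific logic.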
Adoption of emerging or existing standards
Many of the emerging standards/specifications mentioned previously (GLUE2, Usage
Record, OCCI, JMS, X.509, etc.) have multiple server or client implementations and are
frequently sourced from communities beyond EGI. This not only gives us technical confidence
in adopting the technologies (they are proven to work in many other areas) but also gives us
adoption confidence, as there are multiple providers that already know that their work
needs to interoperate.
Any deployment of new technology releases will go through a staged rollout before wide-scale
production deployment to ensure that interoperability is actually achieved between
the critical components where it is needed. However, having to do explicit interoperability
tests with different technologies would demonstrate low confidence in the technology
provider, and such providers would not be ones we would choose to work with.
Possible future cooperation
The technologies emerging from the European Middleware Initiative, StratusLab and the
Initiative for Globus in Europe could all contribute to this activity. The Contrail project is
exploring how different resource sites can contribute to a cloud as an infrastructure,
as opposed to individual sites.
A missing capability in the open-source area seems to be the provisioning aspect across
multiple cloud providers. Dealing with the negotiation of resources from each provider to
match the high-level deployment plan coming from the requesting user seems to be a gap.
Likewise, linking a local virtualised network topology to existing high-speed networking
links between virtualised resources does not seem to have an integrated solution at the
moment.
Contact: Steven Newhouse
Organisation: European Grid Initiative
Contact details: steven.newhouse@egi.eu
Web: www.egi.eu
Relevant Links: Integration of Clouds and Virtualisation into the European production
infrastructure – go.egi.eu/258
EMI - European Middleware Initiative
General overview and field of application
As the European Middleware Initiative (EMI) is primarily a ‘research middleware provider’, its
use cases in the context of e-infrastructures are driven by ‘complex distributed high-level
scientific workflows’ that partly span different types of e-Infrastructures.
These require transparent access to different types of heterogeneous computational
resources (e.g. HPC and HTC), as well as storage management and the necessary
data transfers between resources. Different computational paradigms such as
HPC and HTC are needed in order to support the different low-level application programming
models commonly accepted by the scientific community (e.g. OpenMP and MPI vs. task farming).
This in turn points to requirements for common interfaces to computing resources and
storage management, and for commonly agreed data transfer interfaces
adopted by the middleware services that provide access to such resources. Related to this are
challenging security requirements, such as enabling single sign-on across e-Infrastructure
boundaries or even performing work on behalf of an identity other than that of the initial
middleware user (i.e. delegation of rights). Although many security models (PKI,
SLC-services, OpenID, etc.) and interfaces/standards (X.509, SAML, etc.) exist, they
have not been consistently adopted across technology providers. More recently, cloud
computing has been emerging, using virtualization technologies to form a dynamic kind of
‘on-demand e-Infrastructure’. EMI explores solutions that enable middleware services to take
advantage of such emerging virtualized infrastructures. In this context, we consider two
options: EMI services that are part of virtual machine appliances, and seamless access
to existing cloud infrastructures from already established and broadly used middleware
services/clients.
The importance of interoperability
The requirement for interoperability between existing middleware services that are
deployed as part of virtual appliances is relatively well supported by the available standards
that EMI is adopting during the course of the project (in the compute,
data, information and security areas, among others). However, end-users typically require
interoperability in order to take advantage of middleware services with unique capabilities
that specifically offer access to HPC, HTC, or storage resources across all kinds of
e-Infrastructures (e.g. PRACE, EGI, clouds). Since HPC-based clouds are rather rare, the
interoperability requirements we mostly experience concern middleware that works seamlessly
with existing cloud-based infrastructures (and their access and management interfaces)
offering HTC resources and dynamic storage capabilities. EMI will work towards interoperability
with implementations that provide emerging standards-based interfaces to existing cloud
infrastructures, with a particular focus on access to computing and data resources.
Scientific end-users already rely on commonly used middleware (client) tools,
which require seamless access to these infrastructures through interoperability
in the areas of security, job and data management, and accounting.
Adoption of emerging or existing standards
Several agreed standard interfaces/schemas for interoperability between established
middleware technologies are adopted and continuously tested for compliance during the
course of the EMI project (e.g. SRM, GLUE2, etc.). Nevertheless, from a ‘client perspective’,
several middleware services are expected to be compliant with emerging standard
interfaces of cloud-based infrastructures. At the time of writing, there is one
emerging standard, the Open Cloud Computing Interface (OCCI), that might become
relevant for EMI once it offers functionality at the PaaS and SaaS levels rather than at the
IaaS level as it does today. In terms of storage, the Cloud Data Management Interface
(CDMI) seems to be a promising standard for adoption by EMI services, although it
still needs to prove its relevance in industry. In both cases, EMI has to be aware
of the dynamics of virtual resources and at the same time make good use of them, ideally
through the adoption of commonly agreed standard interfaces.
Possible future cooperation
» StratusLab (Providing EMI middleware-based virtual machine appliances)
» VENUS-C (EMI clients might benefit via similar standard interfaces based on BES/JSDL)
Contact: Morris Riedel
Organisation: Jülich Supercomputing Centre
Contact details: m.riedel@fz-juelich.de
Web: www.eu-emi.eu/en
IGE - Initiative for Globus in Europe
General overview and field of application
IGE targets, as a base middleware provider, various fields of applications and does not limit
itself to a certain community. However, a strong focus lies on helping scientists in their daily
work, making the use of eInfrastructure as simple and seamless as possible while not trying
to cover specific issues, but rather cover general services. The two general use cases IGE has
collected from the user communities, and which are seen as the most important, are “Grid
on top of Cloud” and “Cloud on top of Grid”.
While the “Grid on top of Cloud” use case covers the exercise of running Grid middleware
services in an IaaS environment and is basically solved by technology providers from various
directions (the EGI roadmap, commercial IaaS vendors, open-source projects, infrastructure
standardization efforts, etc.), it still requires significant automation efforts to bring benefit
to the operators of such services.
The “Cloud on top of Grid” use case, in turn, requires an entirely new set of interfaces, which
are yet to be defined. For example, the typical IaaS model of managing virtual machines needs
to be mapped to current Grid middleware environments. A starting point for this is the Globus
Online effort, which is an integral part of the project for the European Research community.
The importance of interoperability
For the “Grid on top of Cloud” use case, interoperability is a key issue: the deployment of Grid
services should work as seamlessly as possible for the operators, even across infrastructures. As
such, common interfaces to the underlying infrastructure are crucial and should be available
as broadly as possible. One candidate for this process would be OCCI, but the area of “service
templates” and deployment automation, also with respect to instance-specific configuration
and adaptation, is yet to be resolved since no accepted standards are available here.
For the “Cloud on top of Grid” use case, the capabilities as defined by the EGI roadmap are a
starting point for possible standards. However, in this context, the applications and platforms
comprising the Cloud environment highly influence the requirements for such standards. Here it
would be necessary to collect Cloud application use cases that are eligible to run on top of Grid
infrastructure and extract common requirements that need to be addressed by the DCI projects.
Adoption of emerging or existing standards
At the moment, IGE is evaluating the applicability of Cloud standards to the project goals. As
noted above, a good candidate for the described use cases is the OCCI family of specifications.
Interoperability tests conducted by IGE would largely consider using Cloud interfaces from
the client perspective; as such, the project requirements are consumer-oriented regarding
IaaS services. From the provider perspective, the upcoming European deployment of a
cloud-based file transfer service on top of Grid infrastructure, Globus Online, will show
whether and how scalability is an issue, but is unlikely to touch interoperability issues on
the Cloud interface level.
Possible future cooperation
A main issue seems to lie in the field of usable templates in the context of virtualized
services. In particular, the post-template creation aspects such as individual VM modification
(tailoring towards the VRC that is to be targeted) seem to be an open issue. While the EGI
roadmap seems to touch this field, concrete steps are yet to be defined.
Contact: Alexander Papaspyrou
Organisation: Technische Universität Dortmund
Contact details: alexander.papaspyrou@tu-dortmund.de; eglo@ige-project.eu
Web: www.ige-project.eu
StratusLab – Enhancing Grid Infrastructures
with Virtualization and Cloud Technologies
General overview and field of application
The StratusLab project started in June 2010 with the purpose of investigating the impact
of the emerging cloud computing paradigm in the provision of grid computing services.
StratusLab focuses on the Infrastructure-as-a-Service (IaaS) cloud paradigm, which implies
the usage of virtualization technologies for the provision of computing resources. The
project is integrating a cloud distribution, based on the OpenNebula cloud management
toolkit, specifically designed with the purpose of hosting grid services. During the design
phase the specific requirements and/or restrictions of grid services are taken into account
in order to provide optimized cloud environments for deploying virtualized production
grid sites. The first version of the StratusLab distribution was released in October 2010. The
distribution is used by the project itself to set up and provide a reference cloud service.
Currently two capabilities are available to the public: a cloud IaaS service, giving users the
ability to instantiate and manage VMs, and an appliance repository where the VM images
are stored. This reference cloud service is used also internally by the project as a testbed
for deploying grid sites and in order to investigate potential implications of their operation
over the cloud.
The primary application domains that the project is targeting are similar to those of grid
computing, i.e. scientific applications either in research or production phase. In particular
the Bioinformatics group from CNRS/IBCP participates in the project offering the primary
use cases for end-user applications on the StratusLab infrastructure.
The importance of interoperability
Interoperability plays an important role for StratusLab as with any large scale shared
infrastructure environment. Currently the main focus is on IaaS interfaces, access to virtual
machine appliances and security. Another level of interoperability particularly important
for StratusLab is the one between grid middleware and the cloud management service. At this
level, issues of accounting and monitoring have been identified as a priority for investigation.
Adoption of emerging or existing standards
OpenNebula is at the core of the StratusLab distribution and has already adopted the OGF
OCCI standard. The toolkit’s development team, which also participates in StratusLab, plays
a central role in the standardization process of OCCI. Although OCCI support is
not yet integrated in the StratusLab distribution, it is scheduled for upcoming releases
of the project. Regarding security and authentication, StratusLab has adopted X.509
certificates and utilizes VOMS services for VO management and end-user authentication.
During the second year of the project we plan to investigate hybrid cloud solutions and
exploitation of commercial cloud infrastructures. In this case IaaS interoperability will
become even more relevant and may re-focus the development and integration activities
of the project.
Possible future cooperation
StratusLab keeps close contact with most of the DCI European projects currently under way.
In particular the project is in close collaboration with EGI-InSPIRE, EMI and EDGI projects.
These collaborations are being formalized with respective MoUs. The project is also
planning to collaborate with commercial cloud providers like ElasticHosts and Flexiscale in
order to test the application of the StratusLab distribution in hybrid cloud environments.
Contact: Vangelis Floros
Organisation: GRNET
Contact details: support@stratuslab.eu
Web: www.stratuslab.eu
VENUS-C – Virtual Multidisciplinary
Environments Using Cloud Infrastructures
General overview and field of application
The VENUS-C project is aimed at validating the use of cloud infrastructures to support
research in seven user scenarios, plus around ten more applications that will be identified
through an open call. Current user scenarios include seven applications across four
thematic areas: civil engineering, marine biodiversity, civil protection and emergencies and
biomedicine. Specifically, applications focus on 3D static and dynamic structural analysis
(Universidad Politecnica de Valencia), building information management (Collaboratorio),
marine biodiversity maps (National Research Council of Italy), wildfire risk prediction
and fire propagation simulation (University of the Aegean), bioinformatics (Universidad
Politecnica de Valencia), systems biology (Center for Computational and Systems Biology),
and drug discovery (Newcastle University), covering a wide range of scientific use cases
that target intensive computing and data storage.
Cloud infrastructures are envisaged as a way to access improved computing power beyond
users’ facilities (long-duration earthquake simulations, the alignment of large-scale
sequences with respect to public databases, drug discovery over large ligand databases,
biological systems simulation, and so on), by adapting computing kernels as worker roles
or complete virtual appliances. These working units are orchestrated in a coordinated