Autonomic Computing


ISA5428: Pervasive Computing


Autonomic Computing

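Prof. Chung-Ta King (金仲達)

Institute of Information Systems and Applications, National Tsing Hua University

Second Semester, Academic Year 91 (ROC calendar, 2002-2003)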


(Slides are taken from the presentations by Alan Ganek, Alfred Spector, Jeff Kephart of IBM)

1



Trillions of heterogeneous
computing devices connected to
the Internet


Dream of Pervasive Computing …


or Nightmare!

2

Core of the Problem


Complexity


in systems themselves and in the operating
environment


As systems become more interconnected and
diverse, architects are less able to anticipate and
design interactions among components


push to runtime, late binding


e.g., hot-plug, JVM, JIT compilation, service discovery, mobile agents, …


Complexity management


human intervention and IT costs

3

Need Complexity Management


But the complexity is beyond what humans can handle


Human out of the control loop: autonomic



Even though we are moving along this
direction, is there any systematic way of
addressing this issue?



Autonomic Computing

4

Alan G. Ganek

Vice President

IBM Autonomic Computing









http://www.ibm.com/autonomic/

Autonomic Computing

5

Complex Heterogeneous Infrastructures
Are a Reality!

6

Industry Trends


Administration of systems is increasingly difficult


100s of configuration, tuning parameters for DB2


Heterogeneous systems are increasingly connected


Integration becoming ever more difficult


Architects can't plan interactions among
components


Increasingly dynamic; frequently with unanticipated
components


More burden must be assumed at run time


But human administrators can't assume the burden


6:1 cost ratio between storage admin and storage


40% outages due to operator error


Need self-managing computing systems


Behavior specified by sys admins via high-level policies


System and its components figure out how to carry out policies
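
As a rough illustration of the last two bullets, the sketch below has an administrator state only a goal (a target response time) and lets the system work out how many servers that goal needs. The names Policy and servers_needed, and the simple capacity model, are assumptions made for this example; they are not taken from the slides or from any IBM product.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """High-level goal set by the administrator (hypothetical format)."""
    max_response_ms: float


def servers_needed(policy: Policy, requests_per_s: float, service_ms: float) -> int:
    """Pick the smallest server count that keeps utilization low enough.

    Assumed toy model: response time = service_ms / (1 - utilization),
    with load spread evenly across identical servers.
    """
    if policy.max_response_ms <= service_ms:
        raise ValueError("goal is tighter than the bare service time")
    # Highest utilization that still meets the response-time goal.
    max_util = 1.0 - service_ms / policy.max_response_ms
    # Offered load expressed as the number of fully busy servers.
    offered_load = requests_per_s * service_ms / 1000.0
    return max(1, math.ceil(offered_load / max_util))
```

The administrator only changes the Policy; the "how" (server count) is recomputed by the system whenever the workload changes.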

7





Autonomic Computing Vision



“Intelligent” open systems that…


Manage complexity


“Know” themselves


Continuously tune themselves


Adapt to unpredictable conditions


Prevent and recover from failures


Provide a safe environment


Self-management:


free administrators from details of operations


provide peak performance 24/7


Concentrate on high-level decisions and policies

8

Self-managing Systems That …

Increase Responsiveness: Adapt to dynamically changing environments

Business Resiliency: Discover, diagnose, and act to prevent disruptions

Operational Efficiency: Tune resources and balance workloads to maximize use of IT resources

Secure Information and Resources: Anticipate, detect, identify, and protect against attacks

Aware/Proactive

9

Self-Configuring Example: DB2 Configuration Advisor

10

Self-Healing Example: IBM Electronic Service Agent

11

Self-Optimizing: Enterprise Workload Management

Self-tuning, end-to-end performance management

Dynamic allocation of network resources

Workload balancing & routing

Cross platform reporting

Policy-based for various classes of users & applications

Heterogeneous, distributed components working together

12

Self-Protecting Example: IBM Tivoli Risk Manager

Rapid / automated analysis of complex situations

13

Evolving towards Self-management

Self-configure
Today: Corporate data centers are multi-vendor, multi-platform. Installing, configuring, integrating systems is time-consuming, error-prone.
The Autonomic Future: Automated configuration of components, systems according to high-level policies; rest of system adjusts automatically. Seamless, like adding new cell to body or new individual to population.

Self-heal
Today: Problem determination in large, complex systems can take a team of programmers weeks.
The Autonomic Future: Automated detection, diagnosis, and repair of localized software/hardware problems.

Self-optimize
Today: WebSphere, DB2 have hundreds of nonlinear tuning parameters; many new ones with each release.
The Autonomic Future: Components and systems will continually seek opportunities to improve their own performance and efficiency.

Self-protect
Today: Manual detection and recovery from attacks and cascading failures.
The Autonomic Future: Automated defense against malicious attacks or cascading failures; use early warning to anticipate and prevent system-wide failures.

14


Evolving to Autonomic Computing (from Manual to Autonomic)

Level 1: Basic
Characteristics: Multiple sources of system-generated data
Skills: Requires extensive, highly skilled IT staff
Benefits: Basic requirements met

Level 2: Managed
Characteristics: Consolidation of data and actions through management tools
Skills: IT staff analyzes and takes actions
Benefits: Greater system awareness; improved productivity

Level 3: Predictive
Characteristics: System monitors, correlates and recommends actions
Skills: IT staff approves and initiates actions
Benefits: Reduced dependency on deep skills; faster/better decision making

Level 4: Adaptive
Characteristics: System monitors, correlates and takes action
Skills: IT staff manages performance against SLAs
Benefits: Balanced human/system interaction; IT agility and resiliency

Level 5: Autonomic
Characteristics: Integrated components dynamically managed by business rules/policies
Skills: IT staff focuses on enabling business needs
Benefits: Business policy drives IT management; business agility and resiliency
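
One way to read the five levels is as a gradual shift of analysis, decision and action from IT staff to the system. The sketch below encodes that reading; the Level enum and the who_does_what function are hypothetical names, and the mapping is a simplification of the Characteristics and Skills entries, not an official IBM definition.

```python
from enum import IntEnum


class Level(IntEnum):
    BASIC = 1
    MANAGED = 2
    PREDICTIVE = 3
    ADAPTIVE = 4
    AUTONOMIC = 5


def who_does_what(level: Level) -> dict:
    """Return who analyzes, who decides and who acts at a given level."""
    return {
        # From Level 3 on, the system monitors, correlates and recommends.
        "analysis": "system" if level >= Level.PREDICTIVE else "IT staff",
        # From Level 4 on, the system also takes the action itself.
        "decision": "system" if level >= Level.ADAPTIVE else "IT staff",
        "action": "system" if level >= Level.ADAPTIVE else "IT staff",
        # Only at Level 5 do business rules/policies drive management.
        "goals_from": "business policy" if level == Level.AUTONOMIC else "IT staff",
    }
```

For example, who_does_what(Level.PREDICTIVE) reports that the system does the analysis while IT staff still approve and initiate the action.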

19

IBM’s Architecture Model


Intelligent control loop:


Implementing self-managing attributes involves an intelligent control loop

20

Control Loops Delivered in 2 Ways

Combinations of
Management Tools

Resource Provider

21

3 Layers of Control Loop Management


Composite resources tied to business decision-making


Composite resources decision-making, e.g., cluster servers


Resource elements managing themselves

22

Autonomic Element - Structure


Fundamental atom of the architecture


Managed element(s)


Database, storage


Autonomic manager


Responsible for:


Providing its service


Managing own behavior in accordance with policies


Interacting with other autonomic elements

[Figure: An Autonomic Element. An Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge, with Sensors and Effectors) and a Managed Element (with Sensors and Effectors).]
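
The structure above (an autonomic manager wrapped around a managed element, with Monitor, Analyze, Plan and Execute sharing Knowledge) can be sketched as a small control loop. This is a minimal illustration only: the queue-and-workers resource, the max_per_worker policy and all class and method names are assumptions made for the example, not IBM's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Toy managed resource: a queue served by some number of workers."""
    queue_length: int = 0
    workers: int = 1

    def sense(self) -> dict:
        # Sensor: expose the current state to the manager.
        return {"queue_length": self.queue_length, "workers": self.workers}

    def effect(self, workers: int) -> None:
        # Effector: let the manager change the element's configuration.
        self.workers = workers


@dataclass
class AutonomicManager:
    element: ManagedElement
    # Knowledge: a policy value plus a history of observations.
    knowledge: dict = field(
        default_factory=lambda: {"max_per_worker": 10, "history": []}
    )

    def monitor(self) -> dict:
        reading = self.element.sense()
        self.knowledge["history"].append(reading)
        return reading

    def analyze(self, reading: dict) -> bool:
        # Is the queue longer than the policy allows for this many workers?
        limit = self.knowledge["max_per_worker"] * reading["workers"]
        return reading["queue_length"] > limit

    def plan(self, reading: dict) -> int:
        # Smallest worker count that satisfies the policy (ceiling division).
        per_worker = self.knowledge["max_per_worker"]
        return -(-reading["queue_length"] // per_worker)

    def execute(self, workers: int) -> None:
        self.element.effect(workers)

    def run_once(self) -> None:
        reading = self.monitor()
        if self.analyze(reading):
            self.execute(self.plan(reading))
```

A real manager would run this loop continuously and would also shrink the worker pool; the sketch only scales up, to keep the four phases easy to follow.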

23

Autonomic Manager Substructure

Interfaces (Sensors, Effectors): Alerts, events & problem analysis request interface; SLA/Policy interface, interprets & translates into "control logic"

Monitor: Metric Managers, Filters, Simple Correlators

Analyze: Rules Engines, Analysis Engines

Plan: Policy Transforms, Plan Generators, Policy Interpreter

Execute: Service Dispatcher, Distribution Engine, Scheduler Engine, Workflow Engine

Knowledge: Policy, Calendar, Topology, Recent Activity Log

Also shown: Policy Validations, Policy Resolution

24

Autonomic Elements - Interaction


Relationships


Dynamic, ephemeral


Formed by agreement


May be negotiated


Full spectrum


Peer-to-peer


Hierarchical


Subject to policies
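
The bullets above describe relationships that are formed by agreement, may be negotiated, are ephemeral, and are subject to policies. A minimal sketch of that idea follows; the Agreement record, the negotiate function and the "maximum share per consumer" rule are all hypothetical and chosen only to make the four properties concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agreement:
    provider: str
    consumer: str
    capacity: int      # units of service granted
    expires_at: float  # relationships are ephemeral


def negotiate(provider: str, consumer: str, requested: int, available: int,
              policy_max_share: float, now: float, ttl: float) -> Optional[Agreement]:
    """Form an agreement if the provider's policy allows some of the request.

    policy_max_share is the largest fraction of the available capacity that
    the provider's policy lets it give to one consumer (assumed rule).
    """
    cap = int(available * policy_max_share)
    granted = min(requested, cap)
    if granted <= 0:
        return None  # no relationship is formed
    return Agreement(provider, consumer, granted, now + ttl)


def is_active(agreement: Agreement, now: float) -> bool:
    """An agreement lapses once its time-to-live has passed."""
    return now < agreement.expires_at
```

Here the consumer may receive less than it asked for, which stands in for a negotiated outcome; a fuller protocol would let the consumer counter-offer or look for another provider.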

25

Multiple Contexts for Autonomic Behavior

System Elements (Intra-element self-management): Servers, Storage, Network Devices, Middleware, Database, Applications

Groups of Elements (Inter-element self-management): Server Farm, Enterprise Network, Storage Pool

Business Solutions (Business Policies, Processes, Contracts): Customer Relationship Management, Enterprise Resource Planning

26

Mapping to IT Processes

27

Levels of Maturity

28

Autonomic Computing Requires Core Technologies

Core technologies: Administrative Console; Policy Infrastructure; Data Collection (Logging/Tracing) Infrastructure; Provisioning; Install/Dependency Management

Enabled capabilities: Heterogeneous Workload Management; Solution Management; Policy-based Management; End-to-end Problem Determination; Automated Root Cause Analysis; Auto-Update; Identity/Security Management; Auto-Detection; Dynamic Provisioning

29

Integrated Solutions Console for Common System Administration

Value:

One consistent interface across product portfolio

Common runtime infrastructure and development tools based on industry standards, component reuse

Provides a presentation framework for other autonomic core technologies

Customer pain point: Complexity of operations

Standards-based: J2EE, JSR168

30

Log and Trace Tool for Problem Determination

Value:

Introduces standard interfaces and formats for logging and tracing

Central point of interaction with multiple data sources

Correlated views of data

Reduced time spent in problem analysis

Customer pain point: Difficulty in analyzing problems in multi-component systems

Standards-based: JSR47, Apache
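
To make "standard formats" and "correlated views" concrete, the sketch below converts two differently shaped log sources into one common record and groups the records by request. The Event fields, both input formats and the function names are assumptions for this example; they do not describe the actual IBM tool or its event format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Common event record shared by all components (assumed fields)."""
    timestamp: float
    component: str
    request_id: str
    message: str


def from_web_log(line: str) -> Event:
    # Assumed text format: "<timestamp> <request_id> <message...>"
    ts, req, msg = line.split(" ", 2)
    return Event(float(ts), "web", req, msg)


def from_db_log(record: dict) -> Event:
    # Assumed record format: {"t": ..., "req": ..., "text": ...}
    return Event(float(record["t"]), "db", record["req"], record["text"])


def correlate(events: list) -> dict:
    """Group events by request id, each group ordered by time."""
    by_request = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        by_request[event.request_id].append(event)
    return dict(by_request)
```

Once every source is mapped to the same record, a single view per request shows the path of a problem across components, which is the point of the "central point of interaction" bullet.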

31

Install/Config Package for Solution Install

Value:

One consistent software installation technology across all products

Consistent and up-to-date configuration and dependency data, key to building self-configuring autonomic systems

Reduced deployment time with fewer errors

Reduced software maintenance time, improved analysis of failed system components

Component-based install for IBM and non-IBM products

Customer pain point: Difficulty of deployment in complex systems

Standards-based: OGSA, Web Services

Partnering with InstallShield
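
Dependency data is what lets an installer decide, without human help, in which order components must go in. The sketch below shows one simple way to do this with a topological sort from the Python standard library (graphlib, Python 3.9+). The install_order function and the component names used with it are hypothetical; the slides do not say how the IBM technology computes install order.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(dependencies: dict) -> list:
    """Return an order in which every component follows its prerequisites.

    dependencies maps each component to the set of components it needs.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # args[1] holds the cycle that graphlib detected.
        raise ValueError(f"circular dependency: {err.args[1]}") from err
```

For example, with {"app": {"db", "middleware"}, "middleware": {"jvm"}, "db": set(), "jvm": set()}, the returned list places "jvm" before "middleware" and both "db" and "middleware" before "app".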

32

Policy Tools for Policy-based Management

Value:

Uniform cross-product policy definition and management infrastructure, needed for delivering system-wide self-management capabilities

Simplifies management of multiple products; reduced TCO

Easier to dynamically change configuration in on-demand environment

Customer pain point: Complexity of product and systems management

Standards-based: DMTF, OASIS, OGSA

[Figure: policy management flow. Labels: Definition, Validation, Adaptation, Local Repository, Distribution (push or pull), Enforcement Point, Activate, Implement, Resource, Monitor, Facts, Analysis.]
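
The stages named in the figure (definition, validation, a repository, distribution to enforcement points, and monitoring of facts) can be sketched as follows. This is only one possible reading of the labels: the PolicyRule format, the threshold check and every class and method name are assumptions for illustration, and only the "push" form of distribution is shown.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PolicyRule:
    """A single policy definition (hypothetical format)."""
    name: str
    metric: str
    limit: float


def validate(rule: PolicyRule, known_metrics: set) -> None:
    """Reject rules that refer to unknown metrics or have no usable limit."""
    if rule.metric not in known_metrics:
        raise ValueError(f"unknown metric: {rule.metric}")
    if rule.limit <= 0:
        raise ValueError("limit must be positive")


@dataclass
class EnforcementPoint:
    resource: str
    active: dict = field(default_factory=dict)

    def activate(self, rule: PolicyRule) -> None:
        self.active[rule.name] = rule

    def violations(self, facts: dict) -> list:
        """Compare monitored facts with the active rules."""
        return [r.name for r in self.active.values()
                if facts.get(r.metric, 0.0) > r.limit]


@dataclass
class Repository:
    rules: dict = field(default_factory=dict)

    def store(self, rule: PolicyRule, known_metrics: set) -> None:
        validate(rule, known_metrics)  # only validated rules are stored
        self.rules[rule.name] = rule

    def push(self, points: list) -> None:
        """Distribute every stored rule to every enforcement point."""
        for point in points:
            for rule in self.rules.values():
                point.activate(rule)
```

A pull variant would have each enforcement point ask the repository for its rules instead; the validation and enforcement steps would stay the same.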

33

Technologies for Implementing Autonomic Managers

Value:

Components to simplify the incorporation of autonomic functions into applications

Building blocks for self-management

Monitoring, analysis, planning and execution components

Including autonomic computing technologies, grid tools, and services

Pluggable

Defines interfaces and provides implementations for each major toolkit component

Customer pain point: How to implement end-to-end autonomic solutions

Standards-based: OGSA, W3C

34

Summary of Autonomic Computing
Architecture


Based on a distributed, service-oriented architectural approach, e.g., OGSA


Every component provides or consumes services


Policy-based management


Autonomic elements


Make every component resilient, robust, self-managing


Behavior is specified and driven by policies


Relationships between autonomic elements


Based on agreements established and maintained by autonomic elements


Governed by policies


Give rise to resiliency, robustness, self-management of system

35

Summary