Management in the Virtual World


© 2010 VMware Inc. All rights reserved

Management in the Virtual World


Singapore, Q1 2013

Iwan ‘e1’ Rahabok

Staff SE, Strategic Accounts

e1@vmware.com

| virtual-red-dot.info | 9119-9226 | Linkedin.com/in/e1ang

VCAP-DCD, TOGAF Certified

2

Document Information


This deck is part of a series.


Part 1 is “Management in the Virtual World: a technical introduction.”
http://communities.vmware.com/docs/DOC-17841

Part 2 is “Resource Management in the Virtual World”
http://communities.vmware.com/docs/DOC-17417

Part 3 is “Performance Management in the Virtual World”
http://communities.vmware.com/docs/DOC-22034

Part 4 is “Capacity Management in the Virtual World”
http://communities.vmware.com/docs/DOC-21791

Part 5 is “Chargeback in the Virtual World”
http://communities.vmware.com/docs/DOC-18593

Part 6 is “Compliance & Configuration Management in the Virtual World”
To be written. Email me if you are keen to co-author it.


This is very long and technical material. Use the Sections feature to see how it is organised. Use the speaker notes.

3

Session Objective

Why?

What?

How?

Why is Management different in the virtual world?

Now that we know the difference, what exactly needs to be managed?

Now that we know what to manage, how do we do it?

4

Do we need to change the way we manage?

“A VM is just a Physical Machine virtualised. Even VMware said the Guest OS is not aware it’s virtualised and it does not run differently.”

“It is still about monitoring CPU, RAM, Disk, Net. No different.”

“Our management process does not have to change.”

“All these VMs must still feed into our main IT Mgmt system. This is how we run our business and it works.”

If only life were that simple… we would all be 100% virtualised and have 0% headache!

5

Why is it different?


Fundamental differences

Not all “virtualisation” is equal

Paradigm shift caused by virtualisation

3rd-party view on the impact

VM vs Physical Machine

Virtual DC vs Physical DC

vCloud Suite is a whole new world

Objects, Relationships, Properties, Events, Counters

A world of >1,000 counters and 100,000 metrics

The whole datacenter defined in software

6

Virtualisation >< Partitioning

[Diagram: three stacks compared. Hardware Partitioning (e.g. LPAR, LDOM, DSD): App/OS pairs on partitioned HW. OS Partitioning (e.g. Solaris Zones): multiple Apps on one OS; OS segregation can be logical or physical. Virtual Machines (e.g. vSphere, Hyper-V): App/OS pairs on VMMs atop a Hypervisor.]

Partitioning and Virtualisation are 2 different technologies, resulting in major differences in functionality. As both evolve, the gap gets wider. As a result, managing a Partition is different to managing a VM.

A new layer: Virtual Hardware. A new layer: Hypervisor. The OS’s purpose is reduced.

7

Virtualization’s Impact

The analysis below is dated before mid-2008. In 2013, virtualisation has further evolved into cloud, making the difference to the physical world even wider.

Virtualisation is taking IT into the “USB Data Center”, which is very different to a physical DC, hence the technology & thinking to manage it also change.

Virtualization will be the most impactful trend in infrastructure and operations through 2010, changing:

How you plan
How, what and when you buy
How and how quickly you deploy
How you manage
How you charge
Technology, process, culture

“Why can’t my Data Center be as easy to use as the USB port on my Macintosh? It’s plug and play, mix and match, multi-function, and it just works every time, and on the fly. That’s what I want, I want a USB Data Center! Whoever figures that out will really be on to something big…”

CIO, Fortune 200

8

Virtualisation: a paradigm shift

Virtualisation changes the architecture of IT, turning Operation As Usual from best practice to dated practice.

Backup: agentless and LAN-free.

Storage: 10% on SAN becomes 100% on Central Storage. Cluster file-system is free. Storage migration is live.

HA: from 10% of servers on a cluster to 100% of servers protected by HA. The cost & complexity of FT hardware are drastically reduced.

Server: 1000 becomes 100. Planned downtime eliminated.

OS: no more WWWW. Hardware upgrade is live.

Network: the access switch is 100% virtualised. The network becomes software.

Firewall: becomes a built-in property of the VM. Deployed everywhere, as it’s part of the hypervisor.

Anti Virus: no longer visible to malware. Agentless.

Disaster Recovery: automated, audit-proof, simplified and much lower cost.

Desktop: birth of the virtual desktop. 10,000 Windows upgrades done overnight.

Finance: Infra moves from project to shared service. Chargeback becomes mandatory.

9

A VM is different to a physical server

BIOS
- Physical Machine: unique BIOS for every model. BIOS needs updates & management.
- VM: 1 BIOS for the entire datacenter. BIOS needs no update & management.

Virtual HW
- Physical Machine: not applicable.
- VM: a new layer below the BIOS. Needs an update on every vSphere release.

Drivers
- Physical Machine: a lot of drivers loaded, bundled with the OS.
- VM: minimal drivers. VMware Tools.

Storage
- Physical Machine: sees the SAN. Needs HBA drivers. Has multi-pathing software. Has an advanced FS or Volume Manager. Storage QoS by the array.
- VM: sees local disk. No FC/NFS. Multi-pathing by vSphere. FS or Volume Manager not required. Storage QoS by vSphere.

Network
- Physical Machine: NIC teaming. 2 cables/server. VLAN aware; VLAN is normally used for segregation, bringing VLAN complexity. Impacted by spanning tree. The switch must learn MAC addresses. Network QoS by core switches.
- VM: NIC teaming provided by ESXi. VLAN provided by vSphere; VLAN is not required (the same VLAN can be blocked). No Spanning Tree, no need to learn MAC addresses. Network QoS by vSphere.
10

A VM is different to a physical server

HW upgrade
- Physical Machine: mostly offline.
- VM: mostly online. Operation changes.

Utilisation
- Physical Machine: 5%. No need to monitor closely.
- VM: 70%. Need to monitor closely.

Monitoring
- Physical Machine: the in-guest counter is accurate. HA provided by cluster-ware. Availability & performance monitored by Mgmt tools.
- VM: the in-guest counter is not accurate. HA is built-in by vSphere. Availability & performance monitoring is via vCenter.

Back up
- Physical Machine: backup agent and backup LAN needed.
- VM: not needed in 90% of cases.

Anti Virus
- Physical Machine: agent installed on the Guest. Consumes OS resources and can be seen by an attacker.
- VM: agent runs on ESXi as a VM. Does not consume OS resources. Can’t be seen by an attacker.

Firewall
- Physical Machine: centrally located. Another machine. Change IP = change rules.
- VM: distributed, attached to each VM. Rules not tied to IP or hostname.

Asset
- Physical Machine: the physical server is an Asset.
- VM: a VM is not an asset.

Apps
- Physical Machine: all apps can run & are supported.
- VM: most apps can run & are supported.

QoS
- Physical Machine: not possible.
- VM: the hypervisor enables it.

11

VM: many properties do not exist in Physical Server

…and the properties change with each vSphere release.

12

A Virtual DC is different to a Physical DC

Before: 200 physical servers. Various brands, models, configurations.
After: 300 VMs on 30 ESXi in 4 clusters. Only 1-2 configs (large ESX or small ESX).

Before: 200 “independent” machines. 90% not on SAN/NAS. Everyone is on dedicated hardware. QoS not required.
After: 300 inter-dependent machines. 100% on shared storage. Performance impacts one another. QoS required.

Before: 5% utilised.
After: 70% utilised.

Before: static environment.
After: dynamic environment.

Before: low complexity. Lots of work, but not a lot of expertise required.
After: high complexity. Less quantity, but deep expertise required.

Before: clustering for HA. Agent-based backup.
After: both don’t work well in vSphere. New solutions provided.
13

A Virtual DC is different to a Physical DC

Before: racks of physical boxes in a special air-conditioned room.
After: racks shrink drastically as servers are consolidated.

Before: no built-in bird’s-eye view of the entire DC. Mgmt is detached/separate from the boxes it manages.
After: built-in bird’s-eye view of the entire DC. Mgmt is integrated. In fact, it is the only point of entry.

Before: need to do stock take, as actual >< inventory.
After: stock take no longer applicable. Inventory is built-in.

Before: each server is an Asset. Need proper Asset Management, as components vary between servers.
After: each VM is not an Asset, as it has no accounting value. ESXi is the asset now. Asset Management is simplified, as ESXi is homogeneous.
14

Software-Defined Datacenter anyway

[Diagram: a Virtual Datacenter spanning Physical Datacenter 1 and Physical Datacenter 2. Each physical DC provides a Physical Compute Function (Compute Vendor 1, Compute Vendor 2), a Physical Network Function (Network Vendor 1, Network Vendor 2) and a Physical Storage Function (Storage Vendor 1, Storage Vendor 2).]

Shared Nothing Architecture. Not stretched between the 2 physical DCs. Production might be 10.10.x.x; DR might be 20.20.x.x.

Shared Nothing Architecture. No replication between the 2 physical DCs. Production might be FC; DR might be iSCSI.

Shared Nothing Architecture. No stretched cluster between the 2 physical DCs. Each site has its own vCenter.

15

Inventory Management System

Management System: relevance drops

Before: need to deploy a complex management system just to manage physical servers.
After: ESX is now the new asset to be managed, but it is much simpler and less complicated.

Before: each server has multiple hardware components (CPU, RAM, HDD, Network, PCI cards).
After: VM properties are no longer an asset.

Before: each hardware component has various properties (Part Number, Serial Number, Warranty, Driver).
After: no need to track individual hardware components; everything is virtual. No more unique drivers.

Before: what if someone unplugs a network cable from the server? Do you know where it is now?
After: missing hardware can be easily re-configured through vCenter. There is no physical network cable attached to a VM.

Before: headcounts dedicated to managing these assets.
After: re-allocate headcount to focus more on business aspects.

16

Virtual DC: Different Architecture and Operation

Data Center
- Physical: bound by 1 physical site.
- Virtual: DC migration is automated.

Disaster Recovery
- Physical: manual. Actual live DR rarely done, if ever. Done by each app.
- Virtual: automated. Actual live DR done frequently. Provided as a service by the platform.

Network
- Physical: no DR Test network. The same host name can’t exist on the DR Site. No QoS (no Shares concept).
- Virtual: DR Test Network required. The same host name can exist on any site. Built-in QoS.

Back up
- Physical: backup LAN + backup agent.
- Virtual: LAN-free and agent-less for most VMs.

Clustering
- Physical: MSCS.
- Virtual: vSphere HA + Symantec AppHA.

Firewall
- Physical: FW not part of the Server. FW scales separately. Rules based on IP.
- Virtual: rules embedded into the VM. Rules not limited to IP/Hostname. Engine embedded into the hypervisor.

DMZ Zone
- Physical: physically separate. IP-based separation. IDS/IPS limited to the DMZ.
- Virtual: logically separate. Not limited to IP. IDS/IPS in all zones.

Chargeback
- Physical: optional.
- Virtual: required.

Capacity Management
- Physical: simple.
- Virtual: complex. Tools required.

Asset Management
- Physical: complex & time-consuming.
- Virtual: much simpler.

Server life cycle
- Physical: manual provisioning & decomm.
- Virtual: automated provisioning & retiring.
17

vCloud Suite: A whole new world…

Objects & Relations: ESXi Host, Cluster, Data Center, Resource Pool, Folder, vCenter, vSwitch, Distributed vSwitch, vApp, vmnic, Port Group, Datastore, Datastore group, Agent VM, Devices, … many others!

Events: vMotion, DRS, DPM, Storage vMotion, Maintenance mode, VM Provisioning, Storage IOC kicks in, Network IOC kicks in, Hot Add, Hot Remove, Network LBT. Each object in vCloud Suite triggers many events.

Counters: CPU Ready, Co-Stop, Ballooning, KAVG, Memory compression, TPS, vSphere Replication. >100 counters have no physical equivalent!

Properties: Share, Limit, Reservation, Fault Tolerance, HA, Master, VM Boot order, Licensing, vSphere Replication. Each object in vCloud Suite has many properties.

18

Example of counters: Memory

ESX has 32 counters for RAM + 10 for vmkernel RAM. The vmkernel has ~55 processes which are tracked. A cluster of 8 ESX can have ~800 counters just for ESX RAM.

A VM has 28 counters for RAM. A farm with 1000 VMs will have 28,000 counters just for VM RAM. With RAM overcommit, TPS, Ballooning, etc, we can’t look at an individual VM in isolation.

A lot of these counters are not monitored and fully understood by traditional management tools. Partial understanding can lead to misunderstanding.
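The multiplication above is worth making concrete. A quick sketch, using only the per-VM figure quoted on this slide (28 RAM counters per VM; the host-side and vmkernel counters mentioned above come on top of this and are not modelled here):

```python
# Back-of-envelope scaling of VM RAM metrics, using the per-VM figure
# quoted on this slide (28 RAM counters per VM). Host-side and vmkernel
# counters are additional (~800 for an 8-host cluster, per the slide).
RAM_COUNTERS_PER_VM = 28

def vm_ram_metric_count(num_vms: int) -> int:
    """RAM-related metrics contributed by the VMs alone."""
    return num_vms * RAM_COUNTERS_PER_VM

for farm_size in (100, 1000, 5000):
    print(farm_size, "VMs ->", vm_ram_metric_count(farm_size), "RAM metrics")
# 1000 VMs -> 28000 RAM metrics, matching the figure above
```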

19

Example of counters: CPU

A VM has 14 counters for CPU. Each core has its own value too. A farm with 1000 VMs can have 50,000 counters just for VM CPU. With sharing, DRS and vNUMA, we can’t look at a VM in isolation.

ESX has 14 counters for CPU + 10 for vmkernel CPU. As each core has its own values, a 2-socket ESXi has >200 counters just for CPU alone. A cluster of 8 ESX can have >1,000 counters just for ESX CPU.

A new approach is required, as we have massive and related data. Intelligent analysis + derived metrics are needed.

20

Example of counters: Storage (ESXi host)

ESX has 20 counters for datastore, 25 for disk, 8 for path, 8 for adapters and 3 for vSphere Replication. A typical ESX host will have ~1000 counters for storage, as it has multiple datastores, disks, adapters and paths.

A VM has 10 counters for virtual disk, 1 for disk and 7 for datastore.

Storage has to be a first-class object in a virtual-world management tool.
21

Actual sample: 131 VM


767,000 metrics!

22

vCloud Suite: Datacenter defined in software

The complex relationships and dynamic nature of vCloud Suite require intelligence (instead of static rules) and derivatives (formulae based on multiple counters).

… and this screenshot is just vSphere! Adding vCD and vCNS will add complexity.

23

IDC’s take

http://www.apmdigest.com/idc-prediction-predictive-analytics-goes-mainstream-in-2012

Operational complexity in virtualized, scale-out, and cloud environments and composite Web-based applications will drive demand for automated analytic performance management and optimization tools that can quickly discover, filter, correlate, remediate, and ideally prevent performance and availability slowdowns, outages, and other service-interrupting incidents.

The need to rapidly sort through tens of thousands, or even hundreds of thousands, of monitor variables, alerts and events to quickly discover problems and pinpoint root causes far exceeds the capabilities of manual methods.

To meet this growing need, IDC expects powerful performance management tools, based on sophisticated statistical analysis and modeling techniques, to emerge from niche status and become a recognized mainstream technology during the coming year.

With the proliferation of scale-out architectures, VMs, and public and private clouds for applications deployment, the number of monitored elements increases rapidly and often results in a large stream of data with many variables that must be quickly scanned and analyzed to discover problems and find root causes. Multivariate statistical analysis and modeling are long-established mathematical techniques for analyzing large volumes of data, discovering meaningful relationships between variables, and building formulas that can be used to predict how related variables will behave in the future.

24

Recap: Components that change in SDDC

Performance management
- It gets harder, as the performance of ESX/VM/Datastore can impact one another.
- The environment is no longer static. VM activities like vMotion, Storage vMotion, provisioning, power-on, etc all add workload. Troubleshooting something with so many moving variables is difficult.

Availability Management
- vCloud Suite relies heavily on “central” storage.
- A failure of 1 ESX might impact >10 VMs. A cluster failure might impact >100 VMs.
- Backup can be mostly agentless and LAN-free.
- The DR architecture changes from 2 copies (with a standby server) to 1 VM. No manual synchronisation needed.
- DR: no longer any need to maintain a run book.

Capacity management
- A lot less physical resource means we need to watch capacity more closely.
- Capacity modelling becomes more difficult, as performance is dynamic.
- Capacity planning now requires us to see “the giant machine”.

Compliance management
- vCloud Suite is a big area that needs to be in compliance.
- Compliance becomes more challenging due to the lack of physical segregation.

Security
- vSphere is a world that needs to be properly secured.
- vCloud Suite has its own firewall, negating the need for a physical firewall.

Configuration management (related to Change Management)
- VM configuration changes need to be tracked, as they are more volatile.
- vSphere becomes another thing to which config management needs to be applied.
- Changes happen more often and faster. And users expect them faster too.
- vCloud Suite itself becomes another world where lots of changes can happen.

Patch management
- The datacenter itself becomes software, which needs patching and upgrades.
25

Recap: Components that change in SDDC (continued)

Seeing the “giant machine” (Big Picture)
- As resources are shared, ESX/VM/Datastore can impact one another. The first thing we check is the overall health, then we follow the hierarchy into a specific VM.
- The upgrade of this “giant machine” is a new project for IT. And it has to be handled carefully, as it is as good as upgrading the data center while the servers are all still running.
- A new requirement is application visibility. We can no longer troubleshoot in isolation.
- An HA event can bring down multiple VMs; we need to know what VMs or Services need to be restarted too.
- Application-level info aids in troubleshooting.

ITIL and CMDB
- vCloud Suite becomes the new source of truth, displacing the CMDB (as it is detached).
- ITIL principles do not change, but actual processes change drastically.
- Process changes (new processes, modified processes, irrelevant processes). For example, provisioning and decommissioning processes change drastically.

Financial Management
- Shared resources mean users do not expect to pay full price.
- Tiered Services mean IT needs to charge differently.

Asset Management
- Drastically simplified, as a VM is not an asset. Most network & storage appliances become software.
- ESXi is the new asset, but it can’t be changed without the central management getting alerted. The config is also standardised.

26

What needs to be managed in SDDC

New things to manage in the data center defined by software:

Performance: VM, ESXi, Cluster, Resource Pool, Datastore Cluster.

Availability: vCloud Suite, VM availability.

Capacity: CPU, RAM, Disk (capacity and IOPS), Network, at the VM level, ESX level, Cluster level and DC level.

Compliance: vCloud Suite, Guest OS compliance.

Security: vCloud Suite roles & permissions, VM firewall requirements, vCloud Network & Security.

Configuration: vSphere, vCloud Director, vShield Networking & Security, VM config. Some VM property changes no longer need CM, as they have become simple; e.g. server relocation within a data center has been automated with vMotion. Some VM property changes are new Change Management items, as a VM has much richer properties than a physical server.

Patch: vCloud Suite patches, Guest OS and application patches.

Big Picture: application dependency, licensing.

ITIL & CMDB: some processes become irrelevant, e.g. server relocation becomes vMotion. Some processes become simplified, e.g. storage relocation becomes Storage vMotion.

Financial: need to develop a new chargeback model that reflects the shared nature of resources.

Asset: software licensing provides new options to save money.
27

vCloud Suite 5.1 mapping

Performance: vCloud Suite. Performance monitoring and troubleshooting. Manages physical: Yes.

Capacity: vCloud Suite. vCenter Operations component. Manages physical: No.

Configuration: vCloud Suite. VCM component. Manages physical: Yes.

Compliance: vCloud Suite. VCM component. Manages physical: Yes.

Patch: vCloud Suite. VCM component. Manages physical: Yes.

Availability: vCloud Suite. vSphere Cluster, vCenter SRM. Manages physical: No.

Security: vCloud Suite. vCloud Network & Security. Manages physical: No.

Application: vCloud Suite. vCenter Infrastructure Navigator, vFabric Application Director, vFabric Data Director. Manages physical: No.

Workflow: vCloud Suite. Business workflow: vCloud Automation Center; technical workflow: Orchestrator. Manages physical: Yes.

Financial: vCloud Suite. Chargeback component in vCenter Operations; VMware ITBM. Manages physical: No (Chargeback), Yes (ITBM).

Change: Partner. Needs to be aware of vCloud Suite changes. Manages physical: Yes.

Asset: Partner. Significantly reduced in a software-defined DC. Manages physical: only physical.

28

A Differentiated Approach is required

SERVICE PROVISIONING | OPERATIONS MANAGEMENT | BUSINESS MANAGEMENT

INTEGRATED, AUTOMATED MANAGEMENT

29

VMware Service Provisioning

HYBRID, HETEROGENEOUS IAAS PROVISIONING (multi-platform, hybrid, multi-provider): provision infrastructure and application services on VMware private and public clouds, other hypervisors, physical, and Amazon EC2, based on business and IT policies.

APPLICATION PROVISIONING: model and automate deployment of applications to any registered cloud using blueprints and standardized application components and settings.

DESKTOP PROVISIONING: deliver a virtual desktop cloud by automating and orchestrating the rapid creation of desktops that meet the exact specifications of both the business and individual users.

30


Desktop | Production | Dev/Test (User Centric, Business Relevant)

vCloud Automation Center

Shared Infrastructure (Policies That Enforce A Business-Relevant Policy)
31

VMware ITBM Suite

32


SERVICE PROVISIONING: vCloud Automation Center 5.1, vFabric Application Director 5.0

BUSINESS MANAGEMENT: VMware IT Business Management Suite 7.5

OPERATIONS MANAGEMENT: vCenter Operations Management Suite 5.6

INTEGRATED, AUTOMATED MANAGEMENT

Operations
Management

33

Operations Management: Fundamental shift

Traditional DC: detached, separate system. Software-defined DC: embedded into the Platform.

Traditional DC: orchestration (workflow). Software-defined DC: automation (policy).

Traditional DC: static threshold (upper). Software-defined DC: dynamic threshold (both upper and lower).

Traditional DC: rules-based. Software-defined DC: analytics-based.

Traditional DC: little knowledge of virtualisation. Software-defined DC: deep knowledge of the Platform.

34

Embedded

35

Embedded

This is the new UI in vSphere 5.1.

Add-ons are being embedded into vSphere to give a single pane of glass:

- Workflow & Orchestration
- Backup
- Application Dependency
- Host-based Replication

36

Embedded

37

Policy-based, not workflow-based automation

Policy based

38

Policy-based, not workflow-based automation

Policy based

39

Dynamic threshold for a dynamic environment

Static thresholds can be misleading
- During peak, it is common for a VM to reach high utilisation. A static threshold will generate alerts when it should not.
- During non-peak, it might be abnormal for a VM to reach even moderate utilisation.

A high threshold alone is not sufficient
- Do you set a static threshold for when CPU or RAM utilisation drops below 5%?
- Each VM differs. And the same VM differs depending on the day/time.

Intelligence is required to analyse each metric and its expected “normal” behaviour.


Dynamic Threshold
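The idea can be sketched in a few lines: learn a separate “normal” band per hour of day instead of one static ceiling. This is a toy illustration only; the hourly-cycle assumption and the 2-sigma band width are my choices here, not vCenter Operations’ actual method (its analytics are described on the following slides):

```python
# Toy dynamic threshold: a separate normal band (mean +/- 2 sigma) per
# hour of day, instead of one static ceiling for all hours.
from statistics import mean, pstdev

def hourly_bands(samples_by_hour):
    """samples_by_hour: dict hour -> list of historical utilisation %."""
    bands = {}
    for hour, samples in samples_by_hour.items():
        m, s = mean(samples), pstdev(samples)
        bands[hour] = (max(0.0, m - 2 * s), m + 2 * s)
    return bands

history = {9: [80, 85, 82, 88], 3: [5, 6, 4, 5]}  # batch peak vs idle night
bands = hourly_bands(history)

def is_abnormal(hour, value):
    lo, hi = bands[hour]
    return not (lo <= value <= hi)

print(is_abnormal(9, 86))  # False: 86% at 9am is normal for this VM
print(is_abnormal(3, 40))  # True: 40% at 3am is abnormal, although "low"
```

Note how a static 80% alarm line would get both cases wrong: it would page on the healthy 9am peak and stay silent on the anomalous 3am activity.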

40

Common example: daily run

Time

Dynamic Threshold

Source: Michael Preston presentation at VMworld 2012.
blog.mwpreston.net


41

Common example: daily run

Dynamic Threshold

42

Counters and Badges

A vCenter farm with 100 VMs and 10 ESX will have >50,000 counters! It is not humanly possible to look at them, let alone analyse them.

vCenter presents raw counters
- e.g. What does a Ready Time of 1500 in the Real Time chart mean? Is a value of 2000 in the Real Time chart better than a value of 75000 in the Daily chart?
- e.g. Is memory.usage at 90% at the ESXi level good or bad?
- e.g. Is an IOPS of 300 good or bad for datastore XYZ?

A single counter can be misleading
- e.g. Low CPU usage does not mean the VM is getting the CPU, if there is a Limit, Contention or Co-Stop.
- e.g. To see disk performance, we need to see multiple counters at multiple layers (VM, kernel, physical).

Different counters have different units
- GHz, %, MB, kbps, ops/sec, ms
- This makes analysis even more complex.
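The Ready Time question above has a concrete answer once the raw “summation” value (milliseconds of ready time accumulated per sample) is normalised by the chart’s sample interval, the standard VMware conversion; the Real Time chart samples every 20 s, while the Daily chart rolls up to 300 s:

```python
# Convert vCenter's raw CPU Ready "summation" counter (milliseconds of
# ready time accumulated over one sample interval) into a percentage.
# Interval lengths: Real Time chart = 20 s, Daily chart = 300 s.
def cpu_ready_percent(summation_ms: float, interval_s: float) -> float:
    return summation_ms / (interval_s * 1000.0) * 100.0

print(cpu_ready_percent(2000, 20))    # Real Time chart: 10.0 (%)
print(cpu_ready_percent(75000, 300))  # Daily chart: 25.0 (%)
# So 75000 in the Daily chart is actually worse than 2000 in Real Time.
```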



Derived Counters
- Standardise the scale into 0-100: one universal unit, minimising the “translation” in our head. Can be >100 if demand is unmet.
- Universal: applies to CPU, RAM, Disk, Net, etc.
- Counters are derived using sophisticated formulae, not just aggregated. For the same counter, different objects use different formulae.

Analytics
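A minimal sketch of the standardisation idea described above. The demand/capacity ratio and the capacity figures here are illustrative only; vCenter Operations’ real derivations are far more sophisticated and differ per object type, as the slide says:

```python
# Illustrative normalisation of heterogeneous counters onto one 0-100
# "workload" scale, allowing >100 when demand exceeds capacity. The
# capacity numbers are made up for illustration.
def workload_score(demand: float, capacity: float) -> float:
    return demand / capacity * 100.0

cpu = workload_score(demand=5.0, capacity=4.0)       # GHz
ram = workload_score(demand=48.0, capacity=64.0)     # GB
net = workload_score(demand=400.0, capacity=1000.0)  # Mbps
print(cpu, ram, net)  # 125.0 75.0 40.0 -- one unit, directly comparable
```

With every resource on the same scale, “CPU demand is unmet (125) while RAM and network have headroom” falls out of a single glance instead of a unit conversion in the reader’s head.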

43

Samples of a Derived Metric: Health

Health Score of an Object = MAX (Abnormal Workload, Faults)

Abnormal Workload per Metric = Geometric Mean (MAX (Abnormality (Capacity/Entitlement), Abnormality (Demand/Usage)), Workload)

Abnormal Workload per Object = Score Aggregation (Abnormal Workload per Metric)

Fault depends on the object, as every object is different:

Cluster = HA Issues = MAX (HA Insufficient Failover Resources, HA Failover In Progress, HA Cannot Find Master)

Host = MAX (Hardware Issues, HA Issues)
  Hardware Issues = MAX (Network Issues, Storage Issues, Compute Issues, CIM Issues)
    Network Issues = MAX (Network, DVPort, VMNic)
      Network = Max_of_all_instances (Network Device)
      DVPort = Max_of_all_instances (DVPort Device)
      VMNic = Max_of_all_instances (VMNic Device)
    Storage Issues = MAX (Storage, SCSI, VMFS heartbeat, NFS server, CIM Storage)
      Storage = Max_of_all_instances (Storage Device)
      SCSI = Max_of_all_instances (SCSI Device)
      VMFS heartbeat = Max_of_all_instances (VMFS heartbeat Device)
      NFS server = Max_of_all_instances (NFS server Device)
    Compute Issues = MAX (Error, PCIe)
    CIM Issues = MAX (Processor, Memory, Fan, Voltage, Temperature, Power, System Board, Battery, Other Health, IPMI, BMC)
  HA Issues = HA Host Status

VM = MAX (FT Issues, HA Issues)
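The MAX-based roll-up can be rendered directly in code. This sketch covers only the Host fault branch of the hierarchy above, and all the sample scores are invented for illustration:

```python
# Sketch of the Host fault roll-up: every level is the MAX of its
# children, so the single worst sub-component drives the score.
# All sample scores below are invented for illustration.
network_issues = max(0.0, 10.0, 0.0)             # Network, DVPort, VMNic
storage_issues = max(0.0, 0.0, 75.0, 0.0, 0.0)   # Storage, SCSI, VMFS
                                                 # heartbeat, NFS, CIM Storage
compute_issues = max(0.0, 0.0)                   # Error, PCIe
cim_issues = max(0.0, 0.0, 20.0)                 # Fan, Voltage, Temp, ...
hardware_issues = max(network_issues, storage_issues,
                      compute_issues, cim_issues)
ha_issues = 0.0                                  # HA Host Status
host_faults = max(hardware_issues, ha_issues)
print(host_faults)  # 75.0 -- the VMFS-heartbeat problem dominates
```

The design choice matters: because faults use MAX rather than an average, one failing device cannot be diluted by dozens of healthy ones.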

Analytics

44

Dynamic Threshold Analysis

DT analysis runs nightly. New dynamic thresholds are computed for each metric.

Data categorization
- Tries to identify the stat as linear, multinomial, step function, etc.
- If one of those matches, that DT function is used.
- Otherwise: competition
  - Sigma: assumes hourly cycles
  - CCPD: tries to find normal cycles
  - ACPD: tries to find abnormal cycles
  - The winner is assigned based on metric-trending accuracy.

The same metric may get a different DT function on a different day.

[Flowchart: for each metric, Data Categorization routes the stat to one of Linear DT, Multinomial DT, Step Function DT, Quantile Sigma DT, Sparse Sigma DT, CCPD or ACPD; DT Scoring then produces the Dynamic Thresholds.]

Analytics
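The competition step can be sketched as a model bake-off: fit each candidate to the metric’s history, score it on held-out recent samples, and keep the winner for that day. This is a toy illustration with two trivial candidates; the real vCenter Operations models (Sigma, CCPD, ACPD) are far more sophisticated:

```python
# Toy bake-off between candidate threshold models, mirroring the
# "competition" described above: the winner is whichever model predicts
# held-out recent samples most accurately.
from statistics import mean

def score(model, history, holdout):
    """Lower is better: mean absolute error on held-out samples."""
    prediction = model(history)
    return mean(abs(x - prediction) for x in holdout)

def flat_model(history):
    return mean(history)        # one flat level, ignores recent shifts

def persistence_model(history):
    return history[-1]          # naive: tomorrow looks like today

history = [10, 12, 11, 50, 52, 51]   # workload ramped up mid-week
holdout = [53, 52, 54]               # most recent samples, held out

candidates = {"flat": flat_model, "persistence": persistence_model}
winner = min(candidates,
             key=lambda name: score(candidates[name], history, holdout))
print(winner)  # persistence -- the recent level predicts better
```

Re-running the bake-off nightly is what lets the same metric get a different DT function on a different day, exactly as the slide notes.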

45

Dynamic Threshold: Algorithm

[Equation slide: the mathematics behind one of the dynamic-threshold algorithms. It is a Bayesian model in which the transition probabilities p(i,j) carry Dirichlet priors, and the marginal distribution of the i-th row of the matrix P is itself a Dirichlet distribution. The equations did not survive extraction.]

It is pretty difficult for a human to beat the computer in the analysis of this big data.

The above is one of the many algorithms applied by vCenter Operations.

Analytics

46

Deep understanding of vCenter is required

Buy more RAM?

Here is a common example of why a deep understanding of vSphere counters makes a huge difference.

Deep knowledge

47

Deep understanding of vCenter is required

Yes, buy more RAM. The ESXi host has 32 GB RAM, and it is highly used.

Deep knowledge

48

Deep understanding of vCenter is required

What?! It’s been high constantly for the last 24 hours! Better buy more RAM now.

But hang on! This is the ESXi-06 host in the VMware ASEAN lab. I know who uses it…

vCenter Ops shows very different data. Memory is only 32%. Plenty of headroom.

Deep knowledge

49

vCenter Operations shows a different situation. Memory is only 32%. Plenty of headroom.

It just saved us from a costly RAM upgrade project.

Deep knowledge

50

New UI standard for SDDC

vCenter 5

vCenter Operations 5

The above shows the new standard VMware user interface, which started with vSphere 5. vCenter Operations 5.6, vCenter Orchestrator 5.1 and vCloud Director 5.1 use the same UI standard. There are plans to have all products use the same UI standard.

51

52

Eventual goal: 1 User Experience

53

Demo


UI integration


Backup: VDP


Replication: Host-based replication


vSphere calling VC Ops


VIN


Thank you