Disaster Recovery on a Budget


Dec 13, 2013


Santa Clara, CA USA

October 2013



Disaster Recovery on a Budget

Douglas Soltesz

VP CIO

Budd Van Lines

@DougSoltesz





About these slides

1. There is more to Disaster Recovery than can be covered during this presentation


2. These slides are available online


Don’t try to write down all the hyperlinks!


OSS will publish these slides online


http://www.doubleparity.com/OSS2013


3. Saving money by using free software can result in more time spent troubleshooting


Sometimes you get what you pay for








Budd Van Lines Lessons Learned


3 Major Storms in NJ over past 2 years


Power Loss at NJ HQ


Employees without power


Roadways closed for days


No Gas!!!






4 Steps to Disaster Readiness

1. Virtualize

2. Replicate

3. Automate

4. Communicate





Step 1


Virtualize Critical Systems


Servers that are virtualized can be moved easily between disparate hardware





Hypervisors on a Budget


VMware vSphere


Has a free edition, but its limitations can prevent successful automation of failover to the DR site


http://techhead.co/vmware-vsphere-5-1-hypervisor-free-esxi-5-1-limitations/



Essentials Kits available


Give access to vCLI, vCenter, and PowerShell, but still really limit functionality


http://store.vmware.com/store/vmware/en_US/cat/ThemeID.2485600/categoryID.66192900





Hypervisors on a Budget


Microsoft Hyper-V


Hyper-V is technically included with Windows Server licensing


Can be managed via PowerShell cmdlets


Manage multiple systems with:


Microsoft System Center Virtual Machine Manager (SCVMM) (NOT included / “free”)


5Nine Manager for Hyper-V (has free edition)


http://www.5nine.com/5nine-manager-for-hyper-v-product.aspx


Others exist but are not free



vtUtilities

http://vtutilities.com/





Hypervisors on a Budget


Xen / XCP


Built into many Linux distros


http://wiki.xen.org/wiki/Xen_Overview


XenServer (Citrix) is now open source and includes XenCenter for management


http://www.xenserver.org/



Many other free management tools


http://wiki.xen.org/wiki/XCP_Management_Tools


http://wiki.xen.org/wiki/Xen_Management_Tools





Hypervisors on a Budget


KVM (Kernel-based Virtual Machine)


Included in mainline Linux, as of 2.6.20


http://www.linux-kvm.org/page/Main_Page



Red Hat offers support & management


SmartOS (Joyent), based on illumos, adds ZFS, DTrace & management (Project FIFO)


http://wiki.smartos.org/display/DOC/Welcome+to+SmartOS



Many other free management tools


http://www.linux-kvm.org/page/Management_Tools





Virtualize on a Budget Summary


Virtualization abstracts a server OS from hardware, aiding Disaster Recovery in many areas


Live / Cold Migration


Gives High Availability during single host outage/maintenance


VMs are portable between sites during major disasters


Virtualized networking allows seamless failover between switches


Shared storage should be more fault tolerant than local storage





Step 2 - Replicate


A second datacenter is required


Cloud / Colocation / Branch Office


Connect with IPsec VPN over the Internet


Cable Modem / FiOS


Static IPs



Copy of critical VMs


RPO (Recovery Point Objective)


SAN vs VM replication





Replicate


Cloud / Colo / Branch


Branch


Your company is already paying for the space


Hand-me-downs from Primary Site


Your equipment, power, A/C


Colocation


Your equipment; Primary Site hand-me-downs


No need to manage power, A/C, Internet Access


Costs for a full rack around $12k/year


Cloud


No equipment, power, A/C, networking headaches


Only pay for VMs when running


Hard to get SAN to SAN replication


Around $100/TB/Month storage
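
The colocation and cloud prices above can be compared with quick arithmetic. This is a minimal sketch using only the two ballpark figures from this slide. It deliberately ignores hardware, bandwidth, and cloud compute charges, and the function names are illustrative.

```python
# Ballpark figures taken from the slide; everything else is illustrative.
COLO_RACK_PER_YEAR = 12_000        # USD for a full rack
CLOUD_STORAGE_PER_TB_MONTH = 100   # USD per TB per month


def colo_monthly_cost():
    """Monthly cost of a full colocation rack."""
    return COLO_RACK_PER_YEAR / 12


def cloud_monthly_storage_cost(tb):
    """Monthly cloud storage cost for `tb` terabytes of replicated data."""
    return tb * CLOUD_STORAGE_PER_TB_MONTH


def break_even_tb():
    """Replicated capacity at which cloud storage costs as much as the rack."""
    return colo_monthly_cost() / CLOUD_STORAGE_PER_TB_MONTH


if __name__ == "__main__":
    print("Colo per month:", colo_monthly_cost())
    print("Cloud per month for 32 TB:", cloud_monthly_storage_cost(32))
    print("Break-even TB:", break_even_tb())
```

On storage price alone, the two options cost the same at about 10 TB of replicated data; above that the rack is cheaper per month, before counting the equipment that goes in it.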





Replicate Critical VMs


VM Based Replication


Often a feature of 3rd party backup software


Hard to get a sub-15-minute RPO on VMs


Many rely on hypervisor snapshots


Leaf Coalesce can be an issue in Xen, Hyper-V, (KVM?)



SAN to SAN Replication


Requires same SAN OS on both sides


Can be more expensive


Works with any hypervisor


Lowest RPO can be achieved


VMs are “Crash Consistent”
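
The RPO of periodic, snapshot-based replication can be estimated before buying anything. This is a rough sketch under a simple assumed model that is not from the slides: a change made just after one snapshot is only safe once the next snapshot has been taken and fully copied, so the worst case is the replication interval plus the transfer time. The interval, change size, and link speed are made-up inputs.

```python
def transfer_minutes(changed_gb, link_mbps):
    """Minutes needed to copy `changed_gb` gigabytes over a `link_mbps` link.

    Uses decimal units (1 GB = 8000 megabits) and ignores protocol overhead.
    """
    megabits = changed_gb * 8 * 1000
    return megabits / link_mbps / 60


def worst_case_rpo_minutes(interval_min, changed_gb, link_mbps):
    """Assumed model: replication interval plus time to ship one cycle's changes."""
    return interval_min + transfer_minutes(changed_gb, link_mbps)


if __name__ == "__main__":
    # Hypothetical site: replicate every 15 minutes, 3 GB changed per cycle,
    # 100 Mbit/s of usable VPN bandwidth.
    print(worst_case_rpo_minutes(15, 3, 100))
```

With these example inputs the transfer takes 4 minutes, so the worst case is about 19 minutes. If the transfer time approaches the interval, the link rather than the replication software is the limit.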





SAN to SAN Replication on a Budget


Use ZFS (now OpenZFS)


Nexenta and TrueNAS offer ZFS systems with HA, replication and enterprise support


Install napp-it on OmniOS or OpenIndiana


http://www.napp-it.org


Install FreeNAS (ZFS on BSD)


http://www.freenas.org


Script ZFS send / receive on any OpenZFS system


http://open-zfs.org


http://www.aisecure.net/2012/01/11/automated-zfs-incremental-backups-over-ssh/
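
Scripting ZFS send / receive could look roughly like the sketch below. It is not the script from the linked article. The dataset names, snapshot names, and remote host are hypothetical, and it assumes key-based SSH access to the backup SAN and an earlier snapshot that already exists on both sides.

```python
import shlex
import subprocess


def snapshot_cmd(dataset, snap):
    """Command that takes a snapshot, e.g. tank/vmstore@hourly-2."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def incremental_send_cmd(dataset, prev_snap, new_snap):
    """Command that streams only the changes between two snapshots."""
    return ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]


def remote_receive_cmd(remote, remote_dataset):
    """Command that applies the stream on the backup site over SSH.

    -F rolls the target back to its most recent snapshot before receiving.
    """
    return ["ssh", remote, "zfs", "receive", "-F", shlex.quote(remote_dataset)]


def replicate(dataset, prev_snap, new_snap, remote, remote_dataset):
    """Snapshot locally, then pipe `zfs send` into `zfs receive` on the remote."""
    subprocess.run(snapshot_cmd(dataset, new_snap), check=True)
    sender = subprocess.Popen(
        incremental_send_cmd(dataset, prev_snap, new_snap),
        stdout=subprocess.PIPE,
    )
    receiver = subprocess.run(remote_receive_cmd(remote, remote_dataset),
                              stdin=sender.stdout)
    sender.stdout.close()
    if sender.wait() != 0 or receiver.returncode != 0:
        raise RuntimeError("replication failed; keep prev_snap as the baseline")


if __name__ == "__main__":
    # Hypothetical names; print the commands instead of running them.
    print(snapshot_cmd("tank/vmstore", "hourly-2"))
    print(incremental_send_cmd("tank/vmstore", "hourly-1", "hourly-2"))
    print(remote_receive_cmd("root@dr-san.example.com", "backup/vmstore"))
```

Run from cron, a script like this sets the replication interval and therefore the RPO. Keeping the previous snapshot until the new one has arrived gives the next run a baseline if a transfer fails.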





The VMware SAN Budd Built


NexentaStor 32TB license w/Gold Support & HA Plugin


JBOD


2 Supermicro SuperChassis 847E26-RJBOD1


2 STEC ZeusRAM Drives for ZIL


2 OCZ Talos 2 C (240GB) Drives for L2ARC


36 Seagate Constellation ES SAS 6Gb/s 1-TB HD


ST32000424SS


Set up in 18 Mirrored vDevs (RAID 10)


2 Controllers


Supermicro Chassis w/X8DAH+-F Motherboard


144GB RAM


Dual Intel E5606 Xeon


Quad Core @ 2.13GHz


LSI 9205-8e SAS Controller

Entire Solution


Running 100 VMs & File Server

$34,000
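
For a rough sense of what this build delivers, the numbers on the slide can be run through two lines of arithmetic. This is a minimal sketch, assuming the 1 TB drive size as listed above. It ignores ZFS metadata overhead, spares, and the file server's share of the cost, and the function names are illustrative.

```python
def mirrored_pool_raw_tb(vdevs, drive_tb):
    """Raw capacity of a pool of mirrored vdevs: one drive's worth per vdev."""
    return vdevs * drive_tb


def cost_per_vm(total_cost, vm_count):
    """Total hardware and license cost divided evenly across the VMs."""
    return total_cost / vm_count


if __name__ == "__main__":
    print("Raw pool capacity (TB):", mirrored_pool_raw_tb(18, 1))
    print("Cost per VM (USD):", cost_per_vm(34_000, 100))
```

Mirroring trades half of the 36 drives for redundancy and read performance, and the whole SAN comes to about $340 per VM.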





Notes on Building a ZFS SAN


Don’t skimp on the parts


You’re already saving a ton of $$$


Always use SAS over SATA


Don’t buy parts on eBay for mission critical data


Build in extra redundancy


Only use equipment on the Hardware Compatibility List (HCL)


Some vendors will sell you the hardware without the support / software





Step 3 - Automate


How quickly can you fail over to another site?


What is your company’s Recovery Time Objective (RTO)?


Have you created a Runbook?


http://en.wikipedia.org/wiki/Runbook


Very few products on the market offer an automated Runbook for Disaster Recovery


VMware SRM





Runbook Process Example


Failing over critical VMs on a single LUN

1. Start SAN replication to backup site

2. Power off critical VMs in assigned order

3. Unregister critical VMs from hosts

4. Unmap LUN from hosts

5. Rerun SAN replication to backup site

6. Reverse SAN replication (backup site to primary)

7. Map LUNs on backup site hosts

8. Register VMs on backup hosts

9. Power up VMs on backup hosts in assigned order

10. Re-IP VMs if subnets are different

11. Update routing, DNS, NAT
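
The steps above can be sketched as an ordered list that a script walks through. This is one possible shape for a home-grown runbook runner, not how VMware SRM or any other product works. The `Step` class, `run_runbook`, and the placeholder actions are all hypothetical; a real runbook would plug in the hypervisor and SAN commands for each step.

```python
from dataclasses import dataclass
from typing import Callable, List

# Step descriptions copied from the runbook example.
FAILOVER_STEPS = [
    "Start SAN replication to backup site",
    "Power off critical VMs in assigned order",
    "Unregister critical VMs from hosts",
    "Unmap LUN from hosts",
    "Rerun SAN replication to backup site",
    "Reverse SAN replication (backup site to primary)",
    "Map LUNs on backup site hosts",
    "Register VMs on backup hosts",
    "Power up VMs on backup hosts in assigned order",
    "Re-IP VMs if subnets are different",
    "Update routing, DNS, NAT",
]


@dataclass
class Step:
    description: str
    action: Callable[[], None]  # raises an exception on failure


def run_runbook(steps: List[Step], dry_run: bool = True, log=print) -> List[str]:
    """Run steps strictly in order; an exception stops the run at that step."""
    completed = []
    for number, step in enumerate(steps, start=1):
        log(f"[{number}/{len(steps)}] {step.description}")
        if not dry_run:
            step.action()
        completed.append(step.description)
    return completed


def placeholder():
    """Stand-in action; replace with a real command for each step."""


if __name__ == "__main__":
    runbook = [Step(text, placeholder) for text in FAILOVER_STEPS]
    run_runbook(runbook, dry_run=True)
```

Defaulting to a dry run lets the order be reviewed and rehearsed without touching production, and timing a real run gives a measured figure to compare against the RTO.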





How to Automate Failover


Build a system at the backup site to run failover scripts


Windows VM


Use PowerShell commands with VMware & Hyper-V


Use Plink to script against ZFS systems, KVM & Xen


http://the.earth.li/~sgtatham/putty/0.53b/htmldoc/Chapter7.html


Script commands together using PowerShell or System Center Orchestrator


VMware Orchestrator


http://www.vmware.com/products/vcenter-orchestrator/


Linux VM


Shell, Python, Perl Scripts to Automate failover


VMware vMA


https://my.vmware.com/web/vmware/details?downloadGroup=VSP510-VMA-510&productId=285





Automation Commands


VMware


PowerCLI

http://www.vmware.com/support/developer/PowerCLI/index.html


vMA vCLI & Perl
http://www.vmware.com/support/developer/vima/


Examples


Unmount NFS datastore


Remove-Datastore -Datastore Datastore -VMHost 10.23.112.234 -Confirm:$false


esxcli storage nfs remove -v NFS_Datastore_Name


Stop VM


Stop-VM -VM VM -Kill -Confirm:$false


esxcli vm process list


esxcli vm process kill --type=[soft,hard,force] --world-id=WorldNumber


Unregister VM


Remove-VM VM


vmware-cmd --server vcenter --vihost esxhost -s unregister path_to_vmx_file
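
When commands like these are driven from a script, it helps to build them in one place so that typos are caught before failover day. This is a small sketch that only constructs the esxcli argument lists shown above. The function names are illustrative, and how the commands reach the host (vMA, SSH, Plink) is left open.

```python
KILL_TYPES = ("soft", "hard", "force")  # gentlest first, as listed on the slide


def esxcli_remove_nfs(datastore_name):
    """Argument list for unmounting an NFS datastore by its volume name."""
    return ["esxcli", "storage", "nfs", "remove", "-v", datastore_name]


def esxcli_kill_vm(world_id, kill_type="soft"):
    """Argument list for stopping a VM process by world ID."""
    if kill_type not in KILL_TYPES:
        raise ValueError(f"kill_type must be one of {KILL_TYPES}")
    return ["esxcli", "vm", "process", "kill",
            f"--type={kill_type}", f"--world-id={world_id}"]


def escalating_kill_cmds(world_id):
    """Kill commands in escalating order, to try one at a time until the VM stops."""
    return [esxcli_kill_vm(world_id, kill_type) for kill_type in KILL_TYPES]


if __name__ == "__main__":
    # Hypothetical datastore name and world ID.
    print(" ".join(esxcli_remove_nfs("NFS_Datastore_Name")))
    for cmd in escalating_kill_cmds(12345):
        print(" ".join(cmd))
```

The world ID comes from `esxcli vm process list`; parsing that output is not shown here.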






Automation Commands


Hyper-V


PowerShell
http://technet.microsoft.com/en-us/library/hh848559.aspx


Xen


XenCenter XE CLI
http://docs.vmd.citrix.com/XenServer/6.0.0/1.0/en_gb/reference.html


XE
http://wiki.xen.org/wiki/XCP_Command_Line_Interface


KVM


Depends on Manager used


Nexenta


http://info.nexenta.com/rs/nexenta/images/NexentaStor_User_Guide_3.1.5.x.pdf


illumos ZFS


http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm










Step 4 - Communicate


If users cannot make it into the office, how will they work?


Post status updates to Facebook / LinkedIn if the corporate site is down


Forward office lines to cell phones


Deploy Disaster Readiness kits to key employees


Laptop


MiFi


Car Inverter











Step 4 - Communicate


Now that the systems are up and running, how will end users connect in?


VPN


http://openvpn.net


VDI


Host desktops at DR Site


Citrix / Remote Desktop / Terminal Server






Questions?

Slides available @

http://www.doubleparity.com/OSS2013

