MICROSOFT SQL SERVER HIGH AVAILABILITY AND DISASTER RECOVERY

grrrgrapeInternet και Εφαρμογές Web

31 Οκτ 2013 (πριν από 3 χρόνια και 9 μήνες)

74 εμφανίσεις

MICROSOFT SQL SERVER

HIGH AVAILABILITY

AND DISASTER RECOVERY

Michael Poremba // October 2008

Database HA & DR Experience…



Work with business to determine HA or DR
requirements for applications and data?



Design HA or DR solutions?



Administer HA or DR process?



Still learning MS SQL Server HA & DR capabilities?

2

Scope of this Presentation


Data Availability


Data recovery


High availability


Disaster recovery


Technology Focus


MS SQL Server


Physical servers


SANs


In
-
depth how
-
to

(available elsewhere)


Partitioned views (federated)


Advanced DBA techniques


Custom application logic


3
rd
-
party software solutions


Alternate DBMS engines

(e.g. Oracle; DB2)


HA on virtual machines


Complex scenarios & solutions


Load balancing

Presentation Focus

Beyond Scope of Presentation

3

So, you need to make your

production database bulletproof…

Introduction to Data Availability

4

Data Availability Continuum

Degrees of protection for information systems:

Business Risk

Solution

Data Recovery

D
ata loss

Redundant data

High Availability

Downtime

of

database service

Redundant system

components

Disaster Recovery

D
owntime of

business operations

Redundant systems

and facilities

5

Business Case for Availability


Keep business
-
critical
applications available


Secondary:


Server maintenance


Protect against loss of
data center


Secondary:


Application upgrades


Infrastructure upgrades


High Availability

Disaster Recovery

6

Service Level Agreement (SLA)


Permitted downtime (planned vs. unplanned?)







Acceptable data/transaction loss


Application response times


Mean time
to recovery


Note: Database uptime is not equivalent to application availability


Failures of other application services


Network outages

Uptime SLA

Downtime

per Year

Downtime

per Month

99.9%

8.76 hours

43.8

minutes

99.99%

52.6

minutes

4.38 minutes

99.999%

5.26

minutes

0.438 minutes

7

Protect What?


Application data stores


Databases


Files


Other data repositories


Database services


DBMS availability for applications


Application services


Application availability for users and external systems


Databases are the heart of most information systems;

they deserve the highest
affordable

protection.

8

Database Failure Scenarios


Storage subsystem


Disk


Controller


Network


Server


Power


Operator errors


DBMS interruption


Drops / deletes


Application defects


DBMS defects


Data corruption

Physical Infrastructure Failures

Logical Data Failures

9

Service Recovery Strategies

Standby

Mode

Failover

Behavior

SQL Server Feature

Cold

standby


Manual intervention required

to restore offline data copy


Backup and restore

Warm

standby


Data copy online and ready


Ma湵n氠fa楬潶敲 r敱畩u敤


Tra湳act楯渠汯l
s桩hp楮i


Database

m楲r潲楮i

Hot
standby


A畴潭at楣 fa楬潶敲


Database

m楲r潲楮i


䙡楬潶敲 c汵lt敲楮i

10

Data Recovery

Terminology

Terminology varies for source vs. copy

High Availability Strategy

Data Source

Data Copy

Backup

and Restore

Database

Backup

Log Shipping

Primary

Secondary

Standby

Database Mirroring

Principal

Mirror

Failover Clustering

Primary

Active

Secondary

Passive

Standby

Inactive

11

[Briefly…]

Data Recovery

12

Database Backups


Traditional backup types


Full backup


Differential backup


Transaction log backup


Disk is better than tape


First backup to disk (separate physical disk volume)


Detect exceptions encountered during backup


Verify backup files


Copy backup files to tape or remote disk


Data retention policy for backup files

13

Database Backup Strategy

Backup of user databases not sufficient for recovery


System database


Master database


MSDB database


Model database


External data stores…

14

Synch with External Data Stores

Synchronize recovered database with external data
stores:


Identity column seeds


Full
-
text indexes

(SQL Server 2000)


LDAP entries


File system objects


Other databases

15

Backup Retention Policy


Location of backup files


Duration of retention


Protection of sensitive data


Sarbanes/Oxley (SOX)


HIPAA


Internal policies for data management and protection


Access to backups from offsite data storage

16

Data Recovery Process


Backup file sets


Full baseline, differential, and
transaction logs


Retrieving backup files


Offsite storage


Tape


Network copy


Dependency on multiple
people to get access to
backup files



Recovery strategy depends on
failure scenario


Create comprehensive failure
matrix


Devise recovery strategy for
each scenario


Does worst
-
case recovery
scenario fit within SLA
parameters?


Recovery time; SLA


Include future data growth in
recovery plan


Fully test recovery
strategies

practice is
essential

17

High Availability

18

High Availability


Minimize or avoid service downtime


Whether planned or unplanned


When components fail,

service interruption is brief or non
-
existent


Automatic failover


Eliminate single points of failure (as affordable)


Redundant components


Fault
-
tolerant servers

19

Redundant Components

Objective: Avoid single points of failure (where affordable)

Approach: Use redundant components for database service


Database server nodes


Server components


ECC RAM; failure
-
tolerant HW & OS


DBMS instance


User databases


Storage devices


Storage unit components


MPIO: Interfaces; paths; switches; controllers


RAID: Disks


Networking


MPIO: Interfaces; paths; switches


Data copies


E.g. Recovering torn page from mirror in SQL Server 2008

20

Transaction Log Shipping


Warm standby solution


Duplicate user database


Copy transaction logs to standby server & restore


Database available for read
-
only access


Users must disconnect for logs to be applied


Two database licenses required if querying standby


Manual application failover


Supported on standard hardware


Possible data loss (unapplied transactions)

21

Database Mirroring


Redundancy at
user database

level


Duplicate copy of user database


Independent storage devices


Multiple copies of instance databases


Mirrored over private network channel


Mirror always redoing transactions from principal


Negligible impact on transaction throughput


Multiple mirroring modes:


High
-
availability
: commit @ log on mirror; automatic failover


High
-
protection
: commit @ log on mirror; manual failover


High
-
performance
: commit when logged on principal


Very fast automatic failover

seconds


Requires witness server


Mirror
-
aware application client connection


Provided by client library


Database connection string must specify both servers


Mirror may be available for read
-
only access (snapshots)


Works with standard hardware

22

Mirror Witness


With mirroring, more than one server is required to
decide on failover


Witness automates failover from primary to mirror


Watches database availability


Reports observations back to principal and mirror


Runs in separate SQL Server instance (Express is OK)


Prevents “split brain” scenario


Very low resource consumption


Can be witness for multiple databases


Not a single point of failure

23

SQL Server Failover Clustering


Two clustered nodes


Active/Passive
config


MS SQL services


Running on virtual server


Shared storage device


User databases


System databases


Quorum drive


Redundant internal
components


24

Active/Passive Failover Clustering


Redundancy at
database instance

level


All databases fail over together


Shared copy of system databases


Single data copy on shared storage
device


No I/O overhead reducing throughput


Storage unit is single point of failure for
cluster


All database services are clustered


SQL Agent; Analysis Services; Full
-
Text
engine, MS DTC


Automatic failover (up to minutes)


DBMS accessed over virtual IP


Database not available from inactive
node for DB client connections


Storage is controlled by one cluster node
at a time


Requires hardware certified by Microsoft
for Microsoft Cluster Service


25

HA Comparison


Scope: user DB


Standard hardware


One SQL license

(unless querying snapshots on
mirror)


Very fast failover (seconds)


OS flexible (e.g. 32/64)


Independent storage


Independent services


Reporting on mirror


Geographic separation OK


Scope: DBMS instance


Certified hardware


One SQL license

(only one node can access
database)


Automatic failover (up to minutes)


Enterprise OS


Shared storage


Clustered services


Standby not available


Servers are usually co
-
located

Database Mirroring

Failover Clustering

26

Considerations for HA


HA complements backup and recovery strategy


Does not

replace data recovery plan


Application service availability is often determined by
a network of interdependent services


Availability can be difficult to define (e.g. partial failures)


Failure probability difficult to measure or compute


Increased system complexity could lead to lower service
availability!


Operator error a leading cause of availability issues


Increased number/types of system components


More complex to configure and administer

27

Data Recovery Requirements

Requirements

Backup and

Recovery

Log Shipping


DB Mirroring


High
-
Performance


DB Mirroring


High
-
Protection


DB Mirroring



High
-
Availability


Failover Clustering

C
ost

Low

Low/Med

Medium

Medium

Medium

High

Relative complexity

Low

Low

Medium

Medium

High

High

Data loss

Possible

Latest log

Possible

None

None

None

Scope of duplication

Database

Database

Database

Database

Database

DBMS

Failover

Downtime

Downtime

Manual

Manual

Seconds

Up to minutes

Client redirect

Manual

Manual

Automatic

Automatic

Automatic

Automatic

Rolling upgrades &
maint
.

No

No

OS & DB

OS & DB

OS & DB

OS

Access

data on secondary

Restore

Read
-
only

Snapshot

Snapshot

Snapshot

No

Geographic separation

OK

OK

OK

Latency
?

Latency
?

Latency
?


28

Disaster Recovery

29

Disaster Recovery


Minimize downtime of business operations


Redundant systems and facilities


SQL Server features:


Transaction log shipping


Database mirroring


Failover clustering


Other technologies


Storage
-
based mirroring

30

Disaster Recovery Planning


Data security requirements


Clarify SLA, data loss allowance


Evaluate system cost vs. data protection


Failure analysis


System redundancy


Process validation


Training for personnel


Prevention practices


Executing disaster recovery and business continuity


Practice, practice, practice

31

Business Continuity Facility


System redundancy


Systems: Web servers app servers; database, etc.


Data: Databases; data files on OS; security info, etc.


Networking: Domain, routing, subnet, VIPs, etc.


Alternate facilities


Network bandwidth


Physical or network access by operations staff


Failover


Often a
deliberate decision
, using manual failover

32

Data Redundancy


Synchronous redundancy


Network bandwidth cost


Network latency and application performance


Network reliability


Asynchronous redundancy


Risk of data loss


More cost
-
effective


Resilient to network latency issues


Candidate Technologies


SQL Server database mirroring


Failover clustering with SAN
-
based mirroring

33

DR Using Database Mirroring


Two sites: Primary and DR location


Separate failover clusters at each site


SQL Server database mirroring between sites

34

DR Using SAN
-
Based Mirroring


Two sites: Primary and DR location


Four
-
node failover cluster; one virtual IP address


SAN
-
based mirroring between sites


Manual cluster failover

35

[Skip if time is running short.]

Complimentary Technologies

36

SAN
-
Based Data Mirroring


Data blocks duplicated at storage level


Similar to transaction log shipping


Copy performed in sequence and coordinated with
database checkpoint


Ensures consistency of mirrored data files


Synchronous or asynchronous mirroring


Co
-
located or geographically dispersed

both are OK


SAN link bandwidth must support database I/O rate


May require extra feature support from SAN vendor


Could rely on Failover Clustering for HA

37

SQL Server Database Snapshots


Read
-
only point
-
in
-
time database snapshot


No data is copied

instantaneous


Historical snapshot pages tracked separately from
changing pages


Snapshots can be maintained indefinitely


Limited only by available storage


Snapshot copy can be used for reporting


Read
-
only, so no locking issues


38

SQL Server Replication


Transactional replication


High transaction volume


Low data latency required


Mixed technologies:
Integrates with other DBMS


Merge replication


Bi
-
directional data changes


Typically server
-
to
-
client


Snapshot replication


Large, infrequent data
changes


Data change latency OK


Best for smaller data sets


Subscriber databases
available for reporting


Replicate data subsets


Some data loss is possible


Periodically validate
replicated data


39

App Development and Admin

40

Considerations for App Developers


App services tolerant to database service interruptions


Application transactions must be handled in code

data consistency


Exception handling for transaction retry, connection recovery


Requires coding standards, code reviews, and testing


Bulk data operations


Transaction volume impacts rollback time during failover


Batch jobs must be run on alternate nodes


Don’t bypass transaction logging


Synchronization with external data sources?


Be aware of database recovery model


Mirroring uses FailoverPartner in connection string


Use TCP/IP as client protocol

41

Considerations for Admins


Use identical server hardware, when possible


Design network redundancies, when feasible


Consider network latency for geographic separation


Always manage through virtual cluster, not individual cluster nodes


Retest failover/failback after HA maintenance


Diagnose after failover


Repair alternate node


Resynchronize data, as necessary


Be aware of primary/secondary locations


Ensure application services are connected and functioning properly


Keep server node configurations synchronized:


Service pack and patch levels


Duplicate non
-
redundant resources


Jobs; logins and permissions; OS & sys objects


42

HA Risks


System performance degradation


HA system complexity leads to availability issues


Some system failures not planned for


Backup and recovery planning incomplete


Administrators not fully trained or informed


User databases not synchronized with other data
sources


43

Common Admin Use Cases


Maintain HA nodes


Hardware maintenance


Rolling upgrades and software patches


Resynchronize the redundant copy


Re
-
synch mirror


Restart log shipping


Diagnose and repair


Diagnose cause of failover


Repair failed node and restore failover capabilities


Test failover and failback

44

Common Admin Actions

Train and practice administrators to:


Initiate a database mirror


Manually failover mirror database or cluster node


Add/remove passive node from mirror or cluster


Upgrade/patch servers nodes


Restart or redirect application services




45

More Information

46

References

Books


Microsoft SQL Server 2008 High
Availability with Clustering &
Database Mirroring

by Michael Otey, 2009.


Microsoft SQL Server High
Availability

by Paul Bertucci, 2004.


Pro SQL Server 2005 High
Availability

by Allan Hirt, 2007.


Pro SQL Server 2005 Replication

by Sujoy Paul, 2006.


Pro SQL Server 2005 Service Broker

by Klaus Aschenbrenner, 2007.


The Rational Guide to SQL Server
2005 Service Broker

by Roger Wolter, 2006.

High Availability

Related Topics

47

References

Presentations

48


Microsoft Load Balancing and Clustering

http://ce.sharif.edu/courses/84
-
85/2/ce317/resources/root/lecture%20slides/

14.%20Microsoft%20Load%20Balancing%20and%20Clustering.ppt


SQL Server 2005 High Availability

http://www.atlantamdf.com/Presentations/AtlantaMDF_111207HA.ppt


High Availability Technologies In SQL Server 2000 And SQL Server 2005

http://202.181.238.2/hk/teched2004/ppt/Day_2_Rm407/DAT431(1330
-
1445).ppt


Meeting the Availability Challenge

http://download.microsoft.com/download/E/D/C/EDCF54DB
-
19CD
-
4882
-
9FC4
-
4F7D46FCEAA6/HighAvailability.ppt


Disaster Recovery Mistakes

http://www.sqlsig.org/Oct%2011%20DASSUG%20
-
%20Jason%20Hall%2010
-
11
-
07%20MM.ppt


SQL Server 2005 High Availability

http://blogs.msdn.com/sql2005event/attachment/564303.ashx


Effective Usage of SQL Server 2005 Database Mirroring


http://www.sqlserver
-
qa.net/SSQA
-
Effective%20Usage%20of%20SQL%20Server%202005%20Database%20Mirroring_show.ppt

References

Articles


Achieve High Availability for SQL Server

http://technet.microsoft.com/en
-
us/magazine/cc162477.aspx


Geographically Dispersed Clusters in Windows
Server 2003

http://www.microsoft.com/windowsserver2003/techinfo/overview/clustergeo.mspx


Restoring file and filegroup backups
http://support.microsoft.com/kb/281122/en
-
us


Restoring specific tables or rows from backups

http://support.microsoft.com/kb/321836/en
-
us


Maintaining Availability During Upgrades

http://msdn.microsoft.com/en
-
us/library/ms191449.aspx

49