WebSphere Performance Isolation

Behavior Isolation in Enterprise Systems

Mohamed Mansour

mansour@cc.gatech.edu

Feb 14, 2007

Travel Industry Example

[Diagram: Client 1, Client 2 and Client 3; a message queue; the GDS; a clearinghouse; airlines]

GDS Scale

- Mission-critical environment
- 24/7
- 11.5 million queries/day
- 2-16 seconds processing time
- ~10 GB data set, 20% annual increase
- 8 updates per day, moving to seamless updates

[Diagram: message queue; GDS]

Effect of Request Stream

Why Do We Care?

- Business
  - Consumer loyalty
  - Violates contractual agreements
- Technical
  - Occurs even in highly engineered systems
  - Can cause ripple effects

Let's Just Fix It!

- Difficult to identify root cause
  - Constant data changes
  - Request stream dependency
- Sometimes can't fix root cause
  - 3rd-party libraries
  - Interactions with OS and H/W caches
  - Complex code base


I(solation) Queue

- Dynamic management of message streams
- Correlate message sequences with server behavior
  - Learning phase
- Isolate undesired sequences
  - Control phase
- Evaluation metrics
  - Quality of Information (QoI) metrics


Learning Phase

- Use online learning methods
  - Statistical correlation [ICSOC 06]
  - HMM [GIT-CERCS-06-11]
- Behavior model
  - Associate undesired behaviors with certain input patterns

[Diagram: message queue; message sequence monitoring; behavior monitoring; online learning]
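
A minimal sketch of the statistical-correlation idea, assuming the model simply counts how often each pair of consecutive message types on a server coincides with an undesired behavior. The class name `SequenceCorrelator`, the `min_samples` floor and the pair-sized window are illustrative assumptions, not the method of the cited papers.

```python
from collections import defaultdict


class SequenceCorrelator:
    """Online estimate of how often a message pair coincides with undesired behavior."""

    def __init__(self, min_samples=5):
        self.min_samples = min_samples   # hypothetical floor before a pair is trusted
        self.seen = defaultdict(int)     # (prev, cur) -> number of observations
        self.bad = defaultdict(int)      # (prev, cur) -> number of undesired outcomes
        self.last = {}                   # server id -> last message type it processed

    def observe(self, server, msg_type, undesired):
        """Record one processed message and whether the server misbehaved on it."""
        prev = self.last.get(server)
        if prev is not None:
            self.seen[(prev, msg_type)] += 1
            if undesired:
                self.bad[(prev, msg_type)] += 1
        self.last[server] = msg_type

    def risk(self, prev, msg_type):
        """Observed fraction of undesired outcomes; 0.0 until enough samples exist."""
        n = self.seen.get((prev, msg_type), 0)
        if n < self.min_samples:
            return 0.0
        return self.bad.get((prev, msg_type), 0) / n
```

Because the counters are updated per message, the model can keep learning while the system runs, which is the property the slide asks of an online method.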

Control Phase

- Observe input message sequence
- Control sequence dispatched to each server to maintain QoI
  - Dispatcher
  - Reordering messages in queue

[Diagram: message queue; observe message parameters; behavior model; control message ordering/dispatching]
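
One way to picture the reordering step is a dispatcher that looks a few messages ahead in the queue and skips those the behavior model flags as risky after the server's previous message. The function `pick_next`, its `window` and `threshold` parameters and the `risk` callback are hypothetical names for this sketch; the slides do not specify the actual policy.

```python
from collections import deque


def pick_next(queue, last_msg, risk, window=4, threshold=0.5):
    """Pop the first message in the lookahead window with acceptable predicted risk.

    queue:    deque of message types, oldest first (must not be empty).
    last_msg: message type the target server processed most recently.
    risk:     callable (prev, cur) -> value in [0, 1] from the behavior model.
    Falls back to the oldest message when every candidate in the window looks risky.
    """
    for i in range(min(window, len(queue))):
        if risk(last_msg, queue[i]) < threshold:
            msg = queue[i]
            del queue[i]          # remove the chosen message, keep the rest in order
            return msg
    return queue.popleft()


# Example: "B" right after "A" is considered risky, so "C" is dispatched first.
pending = deque(["B", "C", "A"])
toy_risk = lambda prev, cur: 0.9 if (prev, cur) == ("A", "B") else 0.1
first = pick_next(pending, "A", toy_risk)
```

A production dispatcher would also need a bound on how long a skipped message may wait; this sketch leaves that out.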

I-Queue Applied to Worldspan Pricing Engine

- Problem
  - Variable number of alternate fares for the same query
  - Root cause unknown
- Affects customer relations
- Possible impact on consumer experience
  - Fewer options
- Objective: return the maximum number of alternate fares


Establishing Behavior Model

- Heuristics point to query geographies
  - Geography based on From/To city pair, e.g. East Coast to EU
  - Fare data stored in disk files separated by geography
- Use geo-locality as our predictor
- Goal: improve geo-locality
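
To make the predictor concrete, here is a small sketch that maps a From/To pair to a geography code and scores a request stream by how often consecutive requests share a geography. The `REGION` table, the airport codes and the definition of `geo_locality` are assumptions for illustration; the slides give neither the real mapping nor an exact formula.

```python
# Hypothetical hand-coded table; the real city-to-region mapping is not in the slides.
REGION = {
    "ATL": "US-East",
    "JFK": "US-East",
    "LHR": "EU",
    "CDG": "EU",
    "NRT": "Asia",
}


def geography(origin, destination):
    """Geography code for a From/To city pair, e.g. ('US-East', 'EU')."""
    return (REGION[origin], REGION[destination])


def geo_locality(requests):
    """Fraction of requests whose geography equals that of the preceding request."""
    geos = [geography(o, d) for o, d in requests]
    if len(geos) < 2:
        return 0.0
    matches = sum(1 for prev, cur in zip(geos, geos[1:]) if prev == cur)
    return matches / (len(geos) - 1)


stream = [("ATL", "LHR"), ("JFK", "CDG"), ("LHR", "NRT"), ("CDG", "NRT")]
score = geo_locality(stream)   # two of the three consecutive pairs share a geography
```

Since fare data is stored in files separated by geography, a higher score plausibly means a server keeps working on data it has already touched.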

Modified Queue Dispatcher

- Dispatcher maintains server execution history
- Request routed to an available server with matching geography

[Diagram: message queue; GDS]
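
The routing rule on this slide can be sketched as follows, assuming the execution history is reduced to the last geography each server handled and that a request falls back to any idle server when no match exists. `GeoDispatcher`, `route` and `release` are hypothetical names, not Worldspan's API.

```python
class GeoDispatcher:
    """Routes a request to an idle server whose last request had the same geography."""

    def __init__(self, num_servers):
        self.last_geo = [None] * num_servers   # execution history (assumed: last geography only)
        self.busy = [False] * num_servers

    def route(self, geo):
        """Return (server index, matched), or None if every server is busy."""
        idle = [s for s in range(len(self.busy)) if not self.busy[s]]
        if not idle:
            return None
        matching = [s for s in idle if self.last_geo[s] == geo]
        server = matching[0] if matching else idle[0]   # fall back to the first idle server
        self.busy[server] = True
        self.last_geo[server] = geo
        return server, bool(matching)

    def release(self, server):
        """Mark a server idle again once it finishes its request."""
        self.busy[server] = False
```

Returning the `matched` flag alongside the server index makes it easy to tally a geo-match percentage while dispatching.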

Evaluation

- Used real traces from Worldspan
  - Set of about 1800 requests
  - 20% are processed in 16 seconds
- Geography extracted from messages
  - Hand-coded mapping from city pairs to geography code
- Processing times measured using Worldspan servers
  - Completely static environment
- Simulations to measure geo-matching
  - Compare different isolation points
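
A toy version of such a simulation, under strong simplifying assumptions: one request arrives per tick, every request takes the same `service_ticks`, and the baseline hands each request to the longest-idle server. The trace generator is a synthetic stand-in for the Worldspan traces, and all names and geography labels here are hypothetical.

```python
import random


def synthetic_trace(n, seed=0):
    """Stand-in trace of geography codes; the real traces are not available here."""
    rng = random.Random(seed)
    return [rng.choice(["US-EU", "US-US", "EU-EU", "US-Asia"]) for _ in range(n)]


def simulate(trace, num_servers, geo_aware, service_ticks=1):
    """Replay a trace and return the fraction of requests that hit a matching server."""
    free_at = [0] * num_servers       # tick at which each server becomes idle
    last_geo = [None] * num_servers   # geography of the last request on each server
    matches = 0
    now = 0
    for geo in trace:
        now = max(now + 1, min(free_at))   # wait when the whole farm is busy
        idle = [s for s in range(num_servers) if free_at[s] <= now]
        same = [s for s in idle if last_geo[s] == geo]
        if geo_aware and same:
            server = same[0]
        else:
            server = min(idle, key=lambda s: free_at[s])   # longest-idle server
        if last_geo[server] == geo:
            matches += 1
        last_geo[server] = geo
        free_at[server] = now + service_ticks
    return matches / len(trace) if trace else 0.0


def sweep(trace, farm_sizes, service_ticks=8):
    """Geo-match fraction per farm size as (servers, baseline, geo_aware) tuples."""
    return [
        (n, simulate(trace, n, False, service_ticks), simulate(trace, n, True, service_ticks))
        for n in farm_sizes
    ]
```

Calling `sweep(synthetic_trace(1800), range(10, 101, 10))` yields one row per farm size, which is the shape of comparison the evaluation describes; the numbers from a synthetic trace say nothing about the measured results.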

Improvement in Geo-locality

- Matching improves sixfold at the minimum farm size
- Matching can improve further by adding more servers

[Figure: % geo match (0-120%) vs. number of servers (10-100). Legend: Max. Geo Match %, Baseline, Min. servers to meet SLA]

Choosing the Right Metrics to Monitor

- Minimum of 28 servers to avoid queuing delays
- Geo-match increases with more servers
- Queuing delay is not the best metric to monitor

[Figure: % geo match (0-120%) vs. number of servers (10-100). Legend: Max. Geo Match %, Baseline, Min. servers to meet SLA]
[Figure: max. queue length (0-900) vs. number of servers (10-30)]

Future Directions