ppt - Northwestern University

Web Overview


The birth of the Web: 1989


Now the Web is about everything


Business (HR systems, e.g. NUHR)


Online Shopping (Amazon), Banking (Citibank, Chase)


Communications (Gmail, Facebook)


Has become mission-critical


Performance


Security



Web 2.0


Web 1.0


Basic HTML + Images


What is Web 2.0?


No one really gives a clear definition


Features


AJAX (Asynchronous JavaScript and XML)


DOM (Document Object Model)


Flash


CSS (Cascading Style Sheets)


User involvement: wikis, blogs, social networks





Web 2.0 Basics - JavaScript


JavaScript


A scripting language with C/C++-like syntax


Dynamic, weakly typed language


eval()


No need to declare object types
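
The points above can be seen in a few lines. This is a minimal sketch with made-up variable names, not code from the talk: a variable changes type without any declaration, operands are converted implicitly, and eval() runs a string as code.

```javascript
// A variable needs no declared type and can hold values of different types.
let value = 42;                  // a number
const firstType = typeof value;
value = "forty-two";             // now a string
const secondType = typeof value;

// Weak typing: operands are converted implicitly.
const concatenated = "5" + 1;    // the number is converted to a string
const subtracted = "5" - 1;      // the string is converted to a number

// eval() runs a string as code at run time.
const evaluated = eval("2 + 3 * 4");

console.log(firstType, secondType, concatenated, subtracted, evaluated);
// number string 51 4 14
```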


Web 2.0 websites are JavaScript heavy


Google Maps (510KB)


Google Calendar (152KB)


Facebook (558KB)


DOM (Document Object Model)


One of the first JavaScript/DOM heavy apps: Gmail


DOM Event API: Keyboard and mouse events


DOM CSS API

<html>
  <head>
    <title>Sample Document</title>
  </head>
  <body>
    <h1>An HTML Document</h1>
    <p>This is a <i>simple</i> document.</p>
  </body>
</html>

AJAX (Asynchronous JavaScript and XML)


Foundation of popular web apps: Google Maps, Gmail, Facebook, etc.


Can transfer any object between the browser and the Web server, e.g. XML or JSON (JavaScript Object Notation)


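var req = new XMLHttpRequest();

// Called with the response body once the request has succeeded.
function callback(text) { … }

// Runs on every readyState change; 4 means the response is complete.
function handler() {
  if (req.readyState == 4 && req.status == 200) {
    callback(req.responseText);
  }
}

req.onreadystatechange = handler;
req.open("GET", url, true);  // true = asynchronous
req.send(null);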



XMLHttpRequest


Register a callback function so that the request is asynchronous


Lets JavaScript visit the URL directly


The response can be either plain text or XML


WebProphet: Automating Performance Prediction for Web Services

Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, Albert Greenberg and Yi-min Wang

Northwestern University

Microsoft Research


Large-scale Web Services


Most large-scale online services today are web-based


Web search, maps, Webmail, calendar, online stores, etc.


Provided by Online Service Providers (OSPs)


MSN, Google, Yahoo, Amazon, etc.


Hosted by multiple data centers around the world


More and more complex


Yahoo Maps: 110 embedded objects, complex object dependencies and 670KB JavaScript



Performance Is Important


Amazon: 100 ms of extra delay costs 1% of sales


Google found that 500 ms of extra delay reduces revenue by up to 20%


[Figure: revenue of OSP A compared with OSP B; one provider is marked "SLOW!"]

Need a tool to understand and improve the user-perceived performance.


Large Web Services Are Complex

[Figure: Browser, DNS, Internet, Frontend DCs, OSP Internal Network, Backend DCs]

Potential Performance Problems

Complex UI -> large browser delay

Poor object dependency -> more RTTs (an online map needs 40~60 HTTP objects)

Complex DNS redirection (CNAME) -> long DNS query

Different servers -> more DNS queries

RTT; packet loss interacts with TCP

Overload -> long response time

Overload -> long response time for dynamic content

RTT; packet loss

Need a tool to diagnose why a page is slow and where the bottleneck is.

Performance Prediction Problem


Many techniques can be used for performance optimization. However, trying them one by one is too costly.


What will the performance be under hypothetical optimization strategies?


How can the predicted performance be evaluated quickly?

[Figure: the unknown relationship (???) between optimization and performance]


Outline


Motivation


Design


Dependency Extraction


Performance Prediction


Implementation


Evaluation


Conclusion

Client Side Performance Prediction


Provider-based techniques


Hard to consider multiple data sources


Object dependencies


Page rendering time


[Figure: Internet, CDN, and two data centers]

The Page Load Time Decomposition


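[Figure: decomposition tree. Page load time breaks down into object dependency and the load time of object i. The load time of object i breaks down into client delay, net delay, and server delay. Net delay covers DNS delay, the TCP 3-way handshake, and data transfer, which are affected by RTT and packet loss.]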

System Architecture


[Figure: Measurement Engine, Dependency Extractor, and Performance Predictor, with PDGs, new scenarios, and results flowing between them]


What are dependencies?


The embedded objects in an HTML page


Object requests generated by JavaScript depend on the corresponding .js files


External CSS and JavaScript files block the other embedded objects in the HTML page


Event triggers: for example, when image B fires its "onload" event, image A is loaded by JavaScript

Dependency Definitions


Descendant(X): objects that depend on X


Ancestor(X): objects that X depends on


Parent(X): the objects that X directly depends on. "Direct" means the object can be the last among the ancestors.


Build the PDG (parental dependency graph) based on the parent relationship


Discover Ancestors and Descendants


We discover the Descendant(X) sets by using time perturbation through an HTTP proxy.
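
One way to picture time perturbation is to compare two page loads: a baseline, and one where the proxy held back object X by a known delay. Objects whose start times move by about that delay are taken to be descendants of X. The sketch below is an illustration under those assumptions; the function name, the timing maps, and the `tolerance` threshold are hypothetical and not taken from the talk.

```javascript
// Return the objects whose start time moved when `delayed` was held back.
// baseline / perturbed: maps from object name to request start time (ms).
// tolerance: fraction of the injected delay that counts as "moved" (assumed).
function findDescendants(baseline, perturbed, delayed, delayMs, tolerance = 0.5) {
  const descendants = [];
  for (const name of Object.keys(baseline)) {
    if (name === delayed) continue;
    const shift = perturbed[name] - baseline[name];
    if (shift >= delayMs * tolerance) descendants.push(name);
  }
  return descendants.sort();
}

// Made-up timings: delaying A by 1000 ms pushes back B and D but not C.
const baseline  = { A: 0, B: 120,  C: 15, D: 300 };
const perturbed = { A: 0, B: 1125, C: 18, D: 1310 };
const descendantsOfA = findDescendants(baseline, perturbed, "A", 1000);
console.log(descendantsOfA);  // [ 'B', 'D' ]
```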




Extract non-stream parents


Stream vs. non-stream


HTML objects are stream objects; all other types of objects are non-stream


Non-stream parent extraction

[Figure: example objects A, B, C, D and X, Y, Z]

Descendant(A)={B,D}

Descendant(B)={D}
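
Given descendant sets like the two above, one plausible way to recover parents is a transitive reduction: X is treated as a parent of Y when Y is in Descendant(X) and no other descendant of X also leads to Y. The sketch below assumes that rule. It is a simplification for illustration, the helper name is hypothetical, and the procedure in the talk may keep more edges than this.

```javascript
// descendants: map from object name to the array of its descendants.
// Returns a map from each object to its inferred parents.
function inferParents(descendants) {
  const parents = {};
  for (const name of Object.keys(descendants)) parents[name] = [];
  for (const [x, descX] of Object.entries(descendants)) {
    for (const y of descX) {
      // y is reached indirectly if some other descendant z of x leads to y.
      const indirect = descX.some(
        (z) => z !== y && (descendants[z] || []).includes(y)
      );
      if (indirect) continue;
      if (!parents[y]) parents[y] = [];
      parents[y].push(x);
    }
  }
  return parents;
}

// Descendant(A)={B,D} and Descendant(B)={D}; C and D have no descendants.
const descendantSets = { A: ["B", "D"], B: ["D"], C: [], D: [] };
const inferred = inferParents(descendantSets);
console.log(inferred);  // { A: [], B: [ 'A' ], C: [], D: [ 'B' ] }
```

Here A is dropped as a parent of D because D is already reached through B.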

Extract stream parents


1) Load the HTML page very slowly


2) Delay the other known non-stream parents

[Figure: timelines of objects X, Y, Z showing Offset(Z) and Offset2(Z)]


Performance Prediction Procedure


Extract object timing information

Annotate client delay

Adjust each object according to the new scenario

Simulate the page load process

[Figure: flow through the four steps above, with the packet trace, the PDG, and the new scenario as inputs]

Object Timing Info


Basic object timing info


Adding client delay info

[Figure: timeline of object X after Parent(X): client delay, then DNS (DNS lookup time), TCP (TCP handshaking time), and HTTP (request transfer time, response time, reply transfer time)]

Adjust Object Timing Info


Adjust DNS lookup time directly


Server response time: change the response time


RTT:

[Figure: timeline segments adjusted by Δ RTT, m * Δ RTT, and n * Δ RTT]
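
The adjustment rules can be sketched as a small function. The mapping used here is an assumption made for illustration: the TCP handshake is shifted by one Δ RTT, the request transfer by m * Δ RTT, and the reply transfer by n * Δ RTT, where m and n are the round trips each transfer takes. All field names are hypothetical.

```javascript
// timing: per-object times in ms, plus the (assumed) number of round trips
// used by the request (m) and by the reply (n).
// scenario: hypothetical changes; any field may be omitted.
function adjustTiming(timing, scenario) {
  const dRtt = scenario.deltaRtt ?? 0;
  return {
    ...timing,
    dns: scenario.dns ?? timing.dns,                 // set DNS time directly
    response: scenario.response ?? timing.response,  // set server response time
    tcp: timing.tcp + dRtt,                          // assumed: one round trip
    request: timing.request + timing.m * dRtt,       // m round trips
    reply: timing.reply + timing.n * dRtt,           // n round trips
  };
}

function totalTime(t) {
  return t.dns + t.tcp + t.request + t.response + t.reply;
}

// Made-up object: 1 round trip for the request, 3 for the reply.
const measured = { dns: 30, tcp: 80, request: 80, response: 50, reply: 240, m: 1, n: 3 };
// Hypothetical scenario: RTT drops by 40 ms and DNS lookups take 5 ms.
const adjusted = adjustTiming(measured, { deltaRtt: -40, dns: 5 });
console.log(totalTime(measured), totalTime(adjusted));  // 480 255
```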

Simulating Page Load Process I


Browser behaviors

(a) with both DNS and TCP time

(b) with TCP time

(c) without DNS and TCP time

(d) with TCP waiting time

[Figure: timelines for cases (a) to (d), built from client delay, DNS time, TCP time or TCP waiting time, and HTTP time, and marked with Tp (last parent available), Tr (HTTP request ready), Tf, and Tl]
Simulating Page Load Process II


Page load process


Find the earliest candidate C from CandidateQueue


Load C according to the conditions on the previous slide


Find new candidates whose parents are all available


Adjust timings of new candidates


Insert new candidates into CandidateQueue
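
The loop above can be sketched as follows. This is a deliberately simplified illustration: each object carries a single `loadTime` that lumps DNS, TCP, and HTTP time together, so the per-case browser conditions and connection limits are ignored. The object fields and the function name are hypothetical.

```javascript
// objects: map from name to { parents: [...], clientDelay, loadTime } (ms).
// Returns the time at which each object becomes available.
function simulatePageLoad(objects) {
  const finish = {};         // name -> time the object became available
  const queue = [];          // candidates: { name, start }
  const queued = new Set();

  // Queue every not-yet-queued object whose parents are all available.
  function enqueueReady() {
    for (const [name, obj] of Object.entries(objects)) {
      if (queued.has(name)) continue;
      if (!obj.parents.every((p) => p in finish)) continue;
      // The request is ready after the last parent plus the client delay.
      const lastParent = Math.max(0, ...obj.parents.map((p) => finish[p]));
      queue.push({ name, start: lastParent + obj.clientDelay });
      queued.add(name);
    }
  }

  enqueueReady();
  while (queue.length > 0) {
    queue.sort((a, b) => a.start - b.start);  // earliest candidate first
    const c = queue.shift();
    finish[c.name] = c.start + objects[c.name].loadTime;
    enqueueReady();
  }
  return finish;
}

// Made-up page: CSS and JavaScript depend on the HTML, the image on both.
const samplePage = {
  html: { parents: [], clientDelay: 0, loadTime: 200 },
  css:  { parents: ["html"], clientDelay: 10, loadTime: 100 },
  js:   { parents: ["html"], clientDelay: 10, loadTime: 150 },
  img:  { parents: ["css", "js"], clientDelay: 20, loadTime: 80 },
};
const finishTimes = simulatePageLoad(samplePage);
const pageLoadTime = Math.max(...Object.values(finishTimes));
console.log(finishTimes, pageLoadTime);
// { html: 200, css: 310, js: 360, img: 460 } 460
```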


WebProphet Framework

[Figure: WebProphet framework. Components shown: browser, control plug-in, web robot, scripting API, application transaction script snippet, pcap trace logger, agent network, Web agent, Web proxy, dependency extractor, trace analyzer, and performance predictor (annotate object timing info, page simulator). Data shown: traces, PDGs, new scenario input, and results.]

The whole system is about 12,000 lines of code

Dependency Extraction Results


Google and Yahoo Search







Validation: manual code analysis

[Figure: PDGs for Google and Yahoo search; node labels include HTML, CSS, JavaScript, Image, and a group of 9 images]

Dependency Extraction Results


Google and Yahoo Maps








Validation: create fake pages with the same PDGs and validate the fake pages

[Figure: PDGs for Google Maps and Yahoo Maps; each node is labeled with its object counts, e.g. #HTML=1, #JS=1, #IMG=17]

Prediction Accuracy


Evaluate both median and 95th percentile


Controlled experiment


50% of cases have a prediction error of less than 6.1%


90% of cases have a prediction error of less than 16.2%


PlanetLab experiment


Prediction error of the median is less than 6.1%


Prediction error of the 95th percentile is less than 10.7%
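
Figures of this kind come from comparing predicted and measured page load times. A minimal sketch of that calculation, using made-up numbers and the nearest-rank percentile definition (an assumption, since the talk does not say which definition was used):

```javascript
// Relative error of each prediction, as a fraction of the measured value.
function relativeErrors(predicted, measured) {
  return predicted.map((p, i) => Math.abs(p - measured[i]) / measured[i]);
}

// Nearest-rank percentile (one of several common definitions).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up page load times in ms.
const measuredTimes  = [1000, 2000, 1500, 800, 1200];
const predictedTimes = [1050, 1900, 1500, 880, 1140];
const errors = relativeErrors(predictedTimes, measuredTimes);
const medianError = percentile(errors, 50);
const p90Error = percentile(errors, 90);
console.log(medianError, p90Error);  // 0.05 0.1
```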


Usage Scenarios


Analyze how to improve Yahoo Maps


Only want to optimize a small number of objects


Use a greedy search


Evaluate 2,176 hypothetical scenarios and find that:


Move 5 objects to CDN: 14.8%


Reduce client delays of 14 objects to half: 26.6%


Combine both: 40.1%
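
A greedy search of this kind can be sketched as follows: at each step, try optimizing every remaining object, keep the one that gives the lowest predicted load time, and stop when the budget runs out or nothing improves. The predictor here is a toy longest-path calculation standing in for the real simulator, and the page, the `halve` optimization, and all names are made up.

```javascript
// Toy predictor: page load time is the longest path through the graph.
// objects: map from name to { parents: [...], time } with time in ms.
function predictLoadTime(objects) {
  const memo = {};
  const finish = (name) => {
    if (!(name in memo)) {
      const o = objects[name];
      memo[name] = Math.max(0, ...o.parents.map((p) => finish(p))) + o.time;
    }
    return memo[name];
  };
  return Math.max(...Object.keys(objects).map((n) => finish(n)));
}

// Greedily pick up to `budget` objects; `optimize` returns a modified copy.
function greedySearch(objects, budget, optimize) {
  let current = objects;
  const chosen = [];
  for (let step = 0; step < budget; step++) {
    let best = null;
    for (const name of Object.keys(current)) {
      if (chosen.includes(name)) continue;
      const trial = { ...current, [name]: optimize(current[name]) };
      const time = predictLoadTime(trial);
      if (best === null || time < best.time) best = { name, trial, time };
    }
    if (best === null || best.time >= predictLoadTime(current)) break;
    chosen.push(best.name);
    current = best.trial;
  }
  return { chosen, time: predictLoadTime(current) };
}

const page = {
  html:  { parents: [], time: 200 },
  js:    { parents: ["html"], time: 300 },
  css:   { parents: ["html"], time: 100 },
  tiles: { parents: ["js", "css"], time: 400 },
};
const halve = (o) => ({ ...o, time: o.time / 2 });  // hypothetical optimization
const result = greedySearch(page, 2, halve);
console.log(predictLoadTime(page), result);
// 900 { chosen: [ 'tiles', 'js' ], time: 550 }
```

Greedy selection evaluates far fewer scenarios than trying every subset of objects, at the cost of possibly missing the best combination.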



Conclusions


Developed a novel technique to extract the object dependencies of complex web pages


Implemented a simple yet effective model to simulate the page load process


Applied WebProphet to Yahoo Maps to show that it can be useful for performance optimization