BLUEPRINT: Robust Prevention of Cross-site Scripting Attacks for Existing Browsers





Mike Ter Louw
V.N. Venkatakrishnan

University of Illinois at Chicago

Outline


Intro to Cross-site Scripting


Objective


Approach


Technical details


Evaluation


Related work



Cross-site scripting (XSS)


A widespread web application vulnerability


In the last few weeks…


Time magazine “Top 100 influential people” poll defaced by XSS (Apr 2009)


Twitter XSS worm (Apr 2009)


McAfee web site attacked (May 2009)



The #1 threat on the Internet (OWASP)


Problem: Malicious user-created content!

Benign comment: “Pete is…”

Malicious comment: “<script>doEvil()</script>”

Our Objective



To develop a robust defense against cross-site scripting attacks

Typical Web Application Goals


Allow user-created content to be expressive, containing rich HTML content
  Format text (<b>bold</b>, <i>italics</i>)
  Hyperlinks (<a href="http://g.com">…</a>)
  Embedded images

Prevent scripts in user-created content

Today’s web browsers and standards do not make it easy to meet these goals simultaneously

Content Isolation


User-created content should always be treated as “data”, never as “code”

Need to isolate user-created content as “data only”



Content Isolation for Browsers


Content isolation can be achieved for future browsers
  Requires changes to standards and browser parser implementations
  Standards and browser revision cycles may take several years

Today’s browsers remain vulnerable to XSS in the near term






Our Goal



Construct a robust defense against cross-site scripting attacks that
  permits rich HTML content
  works on today’s browsers
    configured to default settings
    without requiring changes of any form, including patches, plug-ins, add-ons, etc.



Most popular defense: Content filtering

Involves sanitization of untrusted HTML by removing script content
  Mainly done using regular expressions or HTML parsing

The absence of strong isolation facilities for HTML has made content filtering the current main line of defense


Problem with Content Filtering


The web application’s interpretation of sanitized content may differ from the browser’s interpretation

Example: +ADw-SCRIPT+AD4-attack();
  Web application’s understanding: raw text
  Browser’s understanding: “<SCRIPT>attack();”

The parsing “gap”
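To make the gap concrete, here is a minimal sketch in JavaScript (run under Node.js, assuming its `Buffer` API). The helper names `naiveFilterAccepts` and `decodeUtf7` are made up for illustration and are not part of Blueprint. The filter stands in for a server that looks for literal angle brackets; the decoder stands in for a browser that decides the page is UTF-7 encoded. The decoder covers only the simple `+...-` shifted sequences used in the example above.

```javascript
// Hypothetical server-side filter: accepts input that has no literal angle brackets.
function naiveFilterAccepts(input) {
  return !/[<>]/.test(input);
}

// Minimal UTF-7 decoder sketch: "+<base64>-" carries UTF-16BE code units.
// Only the simple shifted sequences needed for this example are handled.
function decodeUtf7(input) {
  return input.replace(/\+([A-Za-z0-9\/]*)-/g, (match, b64) => {
    if (b64 === "") return "+"; // "+-" encodes a literal plus sign
    const bytes = Buffer.from(b64, "base64");
    let out = "";
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode(bytes.readUInt16BE(i));
    }
    return out;
  });
}

const payload = "+ADw-SCRIPT+AD4-attack();";
console.log(naiveFilterAccepts(payload)); // true: the filter sees only raw text
console.log(decodeUtf7(payload));         // <SCRIPT>attack();
```

The same bytes pass the filter and still yield a script tag once a different decoder reads them, which is the kind of disagreement the slide describes.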



[Figure: the server-intended parse tree (div and text nodes only) shown beside the browser-generated parse tree, in which one node is parsed as a script element]

The XSS Cheat Sheet provides approximately 100 examples of such browser “quirks”


Our Approach: Reproduce the server-intended parse tree of untrusted content on the browser

[Figure: the server-intended parse tree (div and text nodes only) and the same tree reproduced on the browser]

Challenge: Parsers on existing browsers are unreliable

The Blueprint Approach


Take control of the content interpretation process on the browser
  Avoid parsing of untrusted content by the browser

No parsing of untrusted content by browser
No scripts identified in untrusted content!
Robust XSS prevention

High level overview


Generate a parse tree of untrusted content on the server
  Remove script content by applying a whitelist of known-static content types

Automatically generate a (trusted) JavaScript program to reconstruct this parse tree on the browser
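The whitelist step can be sketched as follows. This is an illustration only, assuming a parse tree made of plain objects of the form `{ tag, children }` or `{ text }`; the `ALLOWED_TAGS` set and the `prune` function are hypothetical and are not Blueprint's implementation (Blueprint runs this step on the server, and a real whitelist would also cover attributes).

```javascript
// Hypothetical whitelist of element types considered static (they cannot run script).
const ALLOWED_TAGS = new Set(["div", "p", "b", "i", "a", "img", "ul", "li"]);

// A node is either { text: "..." } or { tag: "...", children: [...] }.
// Returns a pruned copy, or null when the node is not on the whitelist.
function prune(node) {
  if (typeof node.text === "string") return { text: node.text };
  if (!ALLOWED_TAGS.has(node.tag)) return null; // drop the whole subtree
  const children = (node.children || []).map(prune).filter((c) => c !== null);
  return { tag: node.tag, children };
}

const parsed = {
  tag: "div",
  children: [
    { tag: "b", children: [{ text: "hello" }] },
    { tag: "script", children: [{ text: "doEvil()" }] },
  ],
};
console.log(JSON.stringify(prune(parsed)));
// {"tag":"div","children":[{"tag":"b","children":[{"text":"hello"}]}]}
```

Because the decision is made per node type on a tree the server built itself, nothing depends on how a browser would have tokenized the original markup.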


Approach Overview

[Figure: HTML parse tree built via document.createElement() et al.]

Problem: Transporting data without invoking browser’s parser

Parse tree is constructed using both JavaScript code and data
  Code constructs the various tree nodes (e.g. <div>)
  Data annotates the tree nodes (e.g. text content)

Exposing raw data to the browser parser may lead to unpredictable behavior

Our Solution: Encode data using a safe alphabet
  E.g. “a-z”
  Transport encoded data to the JavaScript interpreter
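One possible safe-alphabet encoding is sketched below. The slides do not specify the scheme Blueprint uses, so this is an assumption chosen for simplicity: each UTF-8 byte becomes two letters in the range `a` to `p`, one per 4-bit half. The function names `encodeSafe` and `decodeSafe` are hypothetical. It relies on the standard `TextEncoder` and `TextDecoder` globals.

```javascript
// Encode each UTF-8 byte as two letters from "a".."p" (one letter per 4-bit nibble).
function encodeSafe(text) {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(97 + (b >> 4), 97 + (b & 15));
  }
  return out;
}

// Reverse the mapping; rejects anything outside the safe alphabet.
function decodeSafe(encoded) {
  if (encoded.length % 2 !== 0 || /[^a-p]/.test(encoded)) {
    throw new Error("not a valid encoded string");
  }
  const bytes = new Uint8Array(encoded.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    const hi = encoded.charCodeAt(2 * i) - 97;
    const lo = encoded.charCodeAt(2 * i + 1) - 97;
    bytes[i] = (hi << 4) | lo;
  }
  return new TextDecoder().decode(bytes);
}

console.log(encodeSafe("<")); // dm
console.log(decodeSafe(encodeSafe("<script>doEvil()</script>")));
```

The encoded form contains only lowercase letters, so no HTML parser has markup to act on; the original characters reappear only inside the JavaScript interpreter, as a string value.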

Transporting data

[Figure: HTML parse tree built via document.createElement() et al.; labels: Text node, Plain text, String variable]

DOM API used

document.
  createElement()
  createTextNode()
  getElementById()

element.
  appendChild()
  insertBefore()
  parentNode
  removeChild()
  setAttribute()
  style[ ]
  style.setExpression()


Instrumenting web application with Blueprint

Original template:

    <?php foreach ($comments as $comment): ?>
      <li><?php echo($comment); ?></li>
    <?php endforeach; ?>

Instrumented template:

    <?php foreach ($comments as $comment): ?>
      <li><?php
        // Convert the untrusted comment into a Blueprint content model
        $model = Blueprint::cxPCData($comment);
        echo($model);
      ?></li>
    <?php endforeach; ?>

Transformed web application output

XSS Vector II: Cascading Style Sheets

CSS without XSS


Use the style object to apply style rules
  element.style['width'] = decode(untrusted);

Dynamic properties not allowed by whitelist
  element.style['behavior'] = …
  element.style['-moz-binding'] = …
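A minimal sketch of applying a style rule through the style object with a property whitelist. The `STATIC_CSS_PROPERTIES` set and `applyStyleRule` are hypothetical names, and the whitelist contents are an assumption. A plain object stands in for a DOM element so the sketch runs outside a browser. It covers only the property-name check; it does not address values that use `expression()`.

```javascript
// Hypothetical whitelist of CSS properties treated as static.
const STATIC_CSS_PROPERTIES = new Set(["width", "height", "color"]);

// Apply one rule through the style object. Returns false when the property
// is not whitelisted (for example "behavior" or "-moz-binding").
function applyStyleRule(element, property, value) {
  const name = String(property).toLowerCase();
  if (!STATIC_CSS_PROPERTIES.has(name)) return false;
  element.style[name] = String(value); // assigned as data, never parsed as a style sheet
  return true;
}

const fakeElement = { style: {} }; // stand-in for a DOM element
console.log(applyStyleRule(fakeElement, "width", "100px"));       // true
console.log(applyStyleRule(fakeElement, "behavior", "url(x.htc)")); // false
```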


CSS expression vector


Any “static” property can be promoted to dynamic via expression() syntax
  element.style["width"] = "expression(attack())";

Threat exists only on Internet Explorer
  IE has no DOM interface to directly force a static value

Protection against CSS expressions


Use setExpression( … ) to apply style rules
  Forces all CSS rules to be dynamic
  Trusted script invoked to retrieve property value
  Script looks up untrusted value in array, then returns it
  Returned value observed to be static

Evaluated unobfuscated expression() for all allowed CSS properties
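The lookup idea can be sketched as below. `setExpression` is a method of the legacy Internet Explorer style object only, so a small mock records the call here to keep the sketch runnable elsewhere. The names `untrustedValues`, `blueprintValue`, and `applyRuleViaExpression` are hypothetical, and how Blueprint actually structures this code is not shown in the slides.

```javascript
// Untrusted values are kept in an array that only trusted code reads.
const untrustedValues = [];

// Trusted lookup function that the generated expression calls.
function blueprintValue(index) {
  return untrustedValues[index];
}

// Apply a rule via setExpression: the expression text is always trusted code,
// and the untrusted string is only ever handed back as a return value.
function applyRuleViaExpression(style, property, untrusted) {
  const index = untrustedValues.push(String(untrusted)) - 1;
  style.setExpression(property, "blueprintValue(" + index + ")");
  return index;
}

// Mock of the legacy IE style object so the sketch runs outside IE.
const mockStyle = {
  expressions: {},
  setExpression(property, expression) {
    this.expressions[property] = expression;
  },
};

applyRuleViaExpression(mockStyle, "width", "expression(attack())");
console.log(mockStyle.expressions.width); // blueprintValue(0)
```

The attacker's string never becomes part of the expression source, which matches the slide's point that the script looks the value up in an array and returns it.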

XSS vector III: Uniform Resource Identifiers (URI)

URI


http://www.example.com/a.html?param#a

URI scheme indicates static / dynamic nature
  Static: http:, https:, ftp:, mailto:
  Dynamic: javascript:

No direct interface to URI parser to enforce a particular (whitelisted) scheme
  We use a 3-tiered defense

Evaluation

Effectiveness at preventing XSS attacks on existing browsers
Compatibility with common use cases
Performance overhead on server and browser

Browser evaluation

8 browsers tested
  Chrome 1
  Firefox 3
  Firefox 2
  Internet Explorer 7
  Internet Explorer 6
  Opera 9.6
  Safari 3.2
  Safari 3.1

Together over 96% market share of browsers in active use

Defense effectiveness


XSS Cheat Sheet [Ha09]
  94 XSS attack examples
  Designed to target server-side defenses
  Embedded in several syntactic contexts

Developed automated test platform
  Identified which attacks succeed on which browsers
  Evaluated defense effectiveness

All 94 attacks successfully defended on all 8 evaluated browsers

Compatibility


Modified source code for two popular web applications:


WordPress


MediaWiki


Modified output of two popular websites


NY Times blog


Slashdot.org

WordPress (compatibility)

Added protection for 3 low-integrity outputs (per user comment to blog article)
  Name (plain text)
  Website link (anchor element)
  Comment body (mixed HTML)

Allows testing of pages with hundreds of (relatively simple) models

Tested real-world blogs, 23 516 comments
  No negative compatibility impact observed

MediaWiki (compatibility)

Added protection for 2 low-integrity outputs
  Article (i.e., web page) title
  Article content

Allows testing of large, complex models

Tested “Featured” article from Wikipedia
  Content rendered very faithfully to original
  Problems:
    <imagemap> not in whitelist
    Relocate trusted script

Performance overhead measurements


Server page generation latency


Browser memory overhead

Browser page rendering latency

Combined effect of server and browser latencies

WordPress page generation latency

Measured significant overhead
  Partly due to redundant content filter (KSES)

MediaWiki page generation latency

Better performance than WordPress
  Redundant intermediate HTML stage

Client memory overhead

Minor overhead

WordPress page rendering latency

MediaWiki page rendering latency

User experience impact of combined latencies

Tested with Firefox 2 (middle-of-the-road performance)
  WordPress with 100 blog comments

Low perception of delays for common case

Related Work


Server-side (XSS-Guard, NeatHTML)
  Prevent injected scripts in final output
  Vulnerable to attacks exploiting parsing differences

Client-side (NoMoXSS, Noxes)
  Identification and prevention of data leaks
  Cannot detect XSS within same origin

Black box / proxy (XSS-DS, Taint inference)
  Server: Detect and prevent reflected scripts
  Client: Detect and prevent data leaks



Related work (cont.)


Server and browser collaboration (BEEP, DSI, Noncespaces)
  Server: Identify policy regions and declare policies
  Client: Enforce policies over policy regions
  Require browser changes

Systems supporting benign scripts in user-created content
  Caja, Web Sandbox, Facebook
  Complementary to our approach

Conclusion


Cross-site scripting attacks can be prevented entirely if browsers and web applications can come to a common understanding of the structure of untrusted content

Blueprint facilitates this goal and provides a novel defense for XSS

Project page: http://www.sisl.rites.uic.edu/blueprint

References


[Ha09] Hansen, Robert. XSS Cheat Sheet

[Di07] Di Paola, Stefano. Preventing XSS with Data Binding

XSS Detail


Challenge for attacker: Embed content the browser will interpret as script

Many vectors
  Script tags: <script> attack(); </script>
  Script attributes: onmousemove="attack();"
  CSS style rules: "width: expression( attack() );"
  URI: src="javascript:void attack()"


Encoding


Search engine optimization (SEO)


Screen readers


View source


Solutions:


Less destructive encoding


Modify reader


Add feature to browser

Dynamic attacks


User-created content (UCC) added to a page dynamically must also be protected
  Current implementation requires a remote procedure call (via XHR / AJAX) to request the model

Blueprint can ensure a base document free of user-embedded scripts
  Trusted code must then take precautions to maintain security

Whitelist


Whitelist can be site-specific

Whitelist can be grown, gradually adding content known to be static

Used off-the-shelf whitelist from HTMLPurifier



URI Defense


3-tiered defense:

1. Character-level whitelist
  Only allow syntactically inert untrusted chars

2. Parse behavior sensing
  a.protocol DOM property [Di07]
  Assumes URI parsing same for all contexts
    a.href, img.src, url()

3. Impact mitigation
  Rewrite URI pointing to redirection service
  Attacks execute in different origin, void of sensitive data
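The three tiers can be sketched as a single check. Everything named here is an assumption for illustration: the character set, the `ALLOWED_SCHEMES` list (taken from the static schemes named earlier in the deck), the `REDIRECTOR` address, and the function names. In a browser, tier 2 would read the `protocol` property of an anchor element; the standard `URL` class stands in for it so the sketch runs under Node.js, and as a side effect relative URIs are rejected.

```javascript
const ALLOWED_SCHEMES = new Set(["http:", "https:", "ftp:", "mailto:"]);
// Hypothetical redirection service hosted on a separate origin.
const REDIRECTOR = "https://redirect.example.net/?to=";

// Tier 1: allow only characters from a conservative set (assumed here).
function hasOnlyInertChars(uri) {
  return /^[A-Za-z0-9:\/?#@&=+%._~-]*$/.test(uri);
}

// Tier 2 stand-in: ask a URI parser which scheme it actually sees.
function sensedScheme(uri) {
  try {
    return new URL(uri).protocol;
  } catch (e) {
    return null; // not parseable as an absolute URI
  }
}

// Tier 3: rewrite the link so it goes through the redirection service.
// Returns null when an earlier tier rejects the URI.
function defendUri(uri) {
  if (!hasOnlyInertChars(uri)) return null;
  if (!ALLOWED_SCHEMES.has(sensedScheme(uri))) return null;
  return REDIRECTOR + encodeURIComponent(uri);
}

console.log(defendUri("http://www.example.com/a.html?param#a"));
console.log(defendUri("javascript:attack")); // null
```

Each tier is independent: a URI that slips past the character check still has to show a whitelisted scheme to the parser, and anything that survives both only ever loads via another origin.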

Eliminate dependency on browser parser

Transform user-created content into static content models on web server
  Model reflects approved content parse tree

Propagate static content models into JavaScript interpreter of web browser

Reconstruct server-approved parse tree using client-side model interpreter

Create static content model


Parse untrusted HTML

Prune resulting parse tree in accordance with whitelist of known-static node types

Serialize parse tree into stream of benign data characters

Wrap in <code> … </code> tags

Attach trusted script for invoking model interpreter
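The last three steps might look like the following sketch. It is written in JavaScript for consistency with the other examples, although Blueprint performs this on the server in PHP. The serialization (JSON, then two letters per byte), the hidden `<code>` styling, the `id` scheme, and the `blueprintInterpret` call are all assumptions; the slides do not show Blueprint's actual wire format.

```javascript
// Map each UTF-8 byte to two letters in "a".."p" (assumed encoding, not Blueprint's).
function toSafeAlphabet(text) {
  let out = "";
  for (const b of new TextEncoder().encode(text)) {
    out += String.fromCharCode(97 + (b >> 4), 97 + (b & 15));
  }
  return out;
}

// Emit one model: encoded data inside a <code> element, followed by a trusted
// script call (hypothetical name) that hands the element to the interpreter.
function emitModel(prunedTree, id) {
  if (!/^[a-z0-9]+$/.test(id)) throw new Error("id must be server-generated and inert");
  const data = toSafeAlphabet(JSON.stringify(prunedTree));
  return '<code style="display:none" id="' + id + '">' + data + "</code>" +
         '<script>blueprintInterpret("' + id + '");</script>';
}

const model = emitModel({ tag: "b", children: [{ text: "<script>doEvil()</script>" }] }, "m1");
console.log(model);
```

Whatever the user typed, the browser's HTML parser only ever sees lowercase letters between the `<code>` tags plus a script the server wrote itself.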

Model interpreter


Interprets model as stream of declarative statements

Uses reliable DOM API to generate content
  document.createElement( … )
  element.appendChild( … )

Enforces server-intended parse tree in browser
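A model interpreter along these lines could be sketched as follows. The statement format (`["open", tag]`, `["text", string]`, `["close"]`) is invented for illustration, as are `interpretModel`, `makeNode`, and `fakeDocument`. The interpreter calls only `createElement`, `createTextNode`, and `appendChild` on whatever document object it is given, so a small stand-in lets the sketch run outside a browser; in a page, the real `document` would be passed instead.

```javascript
// Hypothetical statement format: ["open", tag], ["text", string], ["close"].
function interpretModel(statements, doc, root) {
  const stack = [root];
  for (const [op, arg] of statements) {
    const parent = stack[stack.length - 1];
    if (op === "open") {
      const el = doc.createElement(arg);
      parent.appendChild(el);
      stack.push(el);
    } else if (op === "text") {
      // createTextNode never parses its argument as markup
      parent.appendChild(doc.createTextNode(arg));
    } else if (op === "close") {
      if (stack.length > 1) stack.pop();
    } else {
      throw new Error("unknown statement: " + op);
    }
  }
  return root;
}

// Tiny stand-in for the DOM so the sketch runs outside a browser.
function makeNode(name, text) {
  return {
    name,
    text,
    children: [],
    appendChild(child) { this.children.push(child); return child; },
  };
}
const fakeDocument = {
  createElement: (tag) => makeNode(tag, null),
  createTextNode: (text) => makeNode("#text", text),
};

const listItem = interpretModel(
  [["open", "b"], ["text", "<script>doEvil()</script>"], ["close"]],
  fakeDocument,
  makeNode("li", null)
);
console.log(listItem.children[0].name);                  // b
console.log(listItem.children[0].children[0].name);      // #text
```

The string that looks like a script tag ends up as the content of a text node, because the tree shape comes from the statements and not from parsing the string.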