SATE 2010 Background


Vadim Okun, NIST

vadim.okun@nist.gov

October 1, 2010


The SAMATE Project

http://samate.nist.gov/


Cautions on Using SATE Data


Our analysis procedure has limitations


In practice, users write special rules, suppress
false positives, and write code in certain ways
to minimize tool warnings.


There are many other factors that we did not
consider: user interface, integration, etc.



So do NOT use our analysis to rate/choose tools

Overview

[Diagram: each test program is examined by tools that work on source code (Tool A, Tool B, Tool C), by human experts (human findings), and via CVE entries. A grid marks (X) which weakness classes, e.g. Buf, Leak, Race, each one reports, and each warning is judged: security? quality? insignificant? false?]

SATE 2010 timeline


Choose test programs (2 C, 1 C++, 2 Java).
Provide them to tool makers (28 June)


Teams run their tools, return reports (30 July)


Analyze the tool reports (22 Sept)


Report at the workshop (1 Oct)


Teams submit a research paper (Dec)


Publish data (between Feb and May 2011)


Participating teams


Armorize CodeSecure


Concordia University Marfcat


Coverity Static Analysis for C/C++


Cppcheck


Grammatech CodeSonar


LDRA Testbed


Red Lizard Software Goanna


Seoul National University Sparrow


SofCheck Inspector


Veracode (a service company)


Test cases

Dovecot: secure IMAP and POP3 server (C)

Pebble: weblog server (Java)

Wireshark (C)

Google Chrome (C++)

Apache Tomcat (Java)

All are open source programs

All have aspects relevant to security

From 30k LoC (Pebble) to 4.7M LoC (Chrome)

Wireshark, Chrome, and Tomcat are CVE-based pairs (vulnerable and fixed)

Tool reports

[Diagram: original tool formats (XML, HTML, DB) are converted to the SATE format.]

Teams converted reports to the SATE format

Several teams also provided original reports

Described the environment in which they ran the tool

Some teams tuned their tools

Several teams provided analysis of their tool warnings
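To make the conversion step concrete, here is a minimal Python sketch (not any team's actual converter) that writes a list of already-parsed warnings into SATE-style XML. The wrapper element, the input dictionary shape, and the output file name are assumptions; the element and attribute names follow the format example later in this deck.

# Sketch: emit SATE-style XML from a hypothetical list of parsed tool warnings.
# The <weaknesses> wrapper, the dictionary keys, and the file name are assumptions;
# element/attribute names follow the SATE format example shown later in this deck.
import xml.etree.ElementTree as ET

warnings = [
    {"id": 23, "name": "SQL Injection", "cweid": 89, "severity": 2,
     "probability": 0.5, "path": "dir/f.c", "line": 71,
     "message": "Query is constructed with user supplied input"},
]

root = ET.Element("weaknesses")
for w in warnings:
    wk = ET.SubElement(root, "weakness", id=str(w["id"]))
    name = ET.SubElement(wk, "name", cweid=str(w["cweid"]))
    name.text = w["name"]
    ET.SubElement(wk, "location", id="1", path=w["path"], line=str(w["line"]))
    ET.SubElement(wk, "grade", severity=str(w["severity"]),
                  probability=str(w["probability"]))
    ET.SubElement(wk, "output").text = w["message"]

ET.ElementTree(root).write("report.sate.xml", encoding="utf-8", xml_declaration=True)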

Analysis procedure

[Diagram: ~60K tool warnings are narrowed by 3 selection methods (selected randomly, related to human findings, related to CVEs) to the selected warnings, which are analyzed for correctness and associated; the resulting data is then analyzed.]

Method 1 - Warning Selection

For Dovecot and Pebble only

We assigned severity if a tool did not

Mostly avoid warnings with severity 5 (lowest)

Statistically select from each warning class

Select more warnings from higher severities

Select 30 warnings from each of 10 tool reports

1 report had only 6 warnings

Did not analyze Marfcat warnings

Total is 276
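A rough Python sketch of this kind of severity-weighted selection; the weights, the quota handling, and the field names are illustrative assumptions, not the exact SATE parameters.

# Sketch of Method 1: pick a quota of warnings per tool report, sampling from
# each (severity, warning class) group and favoring higher severities.
# The weights and dictionary keys are assumptions, not the exact SATE procedure.
import random
from collections import defaultdict

SEVERITY_WEIGHT = {1: 5, 2: 4, 3: 3, 4: 2, 5: 0}  # 1 = highest; severity 5 mostly avoided

def select_warnings(report, quota=30, seed=0):
    """report: list of dicts with at least 'severity' and 'class' keys."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for w in report:
        by_group[(w["severity"], w["class"])].append(w)

    weights = {g: SEVERITY_WEIGHT.get(g[0], 1) for g in by_group}
    total = sum(weights.values()) or 1
    selected = []
    for group, ws in by_group.items():
        share = round(quota * weights[group] / total)  # proportional share of the quota
        selected.extend(rng.sample(ws, min(share, len(ws))))
    return selected[:quota]  # a report with fewer warnings simply yields fewer selections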

Method 2 - Human findings

For Dovecot and Pebble only

Security experts analyze the test cases

A small number of findings

Root cause, with an example trace

Find related warnings from tools

Goal: focus our analysis on the weaknesses that security experts consider most important
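A minimal sketch of the "find related warnings" step: a tool warning is treated as related to a human finding if it points at, or near, a location on the finding's example trace. The data shapes, the paths in the example, and the 5-line window are assumptions for illustration.

# Sketch of Method 2 matching: relate tool warnings to a human finding via the
# locations on the finding's example trace. Shapes and the window are assumptions.

def related_warnings(finding, warnings, window=5):
    """finding: {'root_cause': str, 'trace': [(path, line), ...]}
       warnings: [{'path': str, 'line': int, ...}, ...]"""
    related = []
    for w in warnings:
        for path, line in finding["trace"]:
            if w["path"] == path and abs(w["line"] - line) <= window:
                related.append(w)
                break
    return related

# Hypothetical example:
finding = {"root_cause": "missing length check", "trace": [("src/a.c", 120), ("src/b.c", 88)]}
warnings = [{"path": "src/a.c", "line": 118, "name": "Buffer Overflow"}]
print(related_warnings(finding, warnings))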

Method 3 - CVEs

For Wireshark, Chrome, and Tomcat

Identify the CVEs

Locations in code

Find related warnings from tools

Goal: focus our analysis on real-life exploitable vulnerabilities
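One plausible way (not necessarily the SATE procedure) to pin down the code locations for a CVE in these vulnerable/fixed pairs is to diff the vulnerable file against the fixed file; the changed lines become candidate locations, and tool warnings can then be matched against them as in Method 2. A minimal sketch, assuming both versions are available locally:

# Sketch: take the CVE-based pair (vulnerable and fixed versions of a file) and
# record the lines changed by the fix, in the vulnerable version's numbering,
# as candidate CVE locations. This is an illustration, not the SATE procedure.
import difflib

def changed_lines(vulnerable_path, fixed_path):
    with open(vulnerable_path, encoding="utf-8", errors="replace") as f:
        old = f.readlines()
    with open(fixed_path, encoding="utf-8", errors="replace") as f:
        new = f.readlines()
    lines = []
    for tag, i1, i2, _j1, _j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag in ("replace", "delete"):
            lines.extend(range(i1 + 1, i2 + 1))  # 1-based lines in the vulnerable file
        elif tag == "insert":
            lines.append(max(i1, 1))  # fix inserted code here; note the preceding line
    return lines

# Hypothetical usage:
# print(changed_lines("vulnerable/f.c", "fixed/f.c"))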

Correctness categories


True security weakness


True quality weakness


True but insignificant weakness


Weakness status unknown


Not a weakness

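For scripting the analysis, the five categories map naturally onto a small enumeration; a sketch follows (the class and member names are ours, not part of the SATE format).

# Sketch: the five correctness categories as an enumeration that analysis scripts
# could attach to each evaluated warning. The names are ours, not SATE's.
from enum import Enum

class Correctness(Enum):
    TRUE_SECURITY = "true security weakness"
    TRUE_QUALITY = "true quality weakness"
    TRUE_INSIGNIFICANT = "true but insignificant weakness"
    UNKNOWN = "weakness status unknown"
    NOT_A_WEAKNESS = "not a weakness"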

Differences from SATE 2009

Add CVE-selected test cases

Include a C++ test case

Larger test cases: Chrome (4.7 MLOC)

More correctness categories (true quality)

More detailed guidelines for analysis

Still, much can be improved…

Thanks


Romain Gaucher, Ram Sugasi


Aurelien Delaitre, Sue Wang, Paul Black,
Charline Cleraux, and other SAMATE team
members


Most of all, the participating teams!


Questions

What weaknesses exist in real programs?

What do tools report for real programs?

Do tools find important weaknesses?

Focus on tools that work on source code

Defects that may affect security

Goal is NOT to choose the “best” tools

This is the 3rd SATE (1st in 2008)

SATE goals


To enable empirical research based on large test sets


To encourage improvement and adoption of tools




NOT to choose the “best” tools


SATE common tool output format

<weakness id="23">
  <name cweid="89">SQL Injection</name>
  <location id="1" path="dir/f.c" line="71"/>   <!-- one or more traces: locations -->
  <grade severity="2" probability="0.5"/>       <!-- severity 1 to 5, with 1 the highest; probability (optional) that it is true -->
  <output>Query is constructed with user supplied input …</output>
  <!-- …and other annotation -->
</weakness>
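A minimal Python reader for this format; it assumes only that the <weakness> elements appear somewhere under the report's root element and carry the child elements shown above.

# Sketch: read a SATE-format report and yield one summary record per weakness.
# Assumes <weakness> elements appear somewhere under the report's root element.
import xml.etree.ElementTree as ET

def load_report(path):
    root = ET.parse(path).getroot()
    for wk in root.iter("weakness"):
        name = wk.find("name")
        grade = wk.find("grade")
        loc = next(wk.iter("location"), None)  # first location of the first trace
        yield {
            "id": wk.get("id"),
            "name": name.text if name is not None else "",
            "cweid": name.get("cweid") if name is not None else None,
            "severity": grade.get("severity") if grade is not None else None,
            "probability": grade.get("probability") if grade is not None else None,
            "path": loc.get("path") if loc is not None else None,
            "line": loc.get("line") if loc is not None else None,
        }

# Hypothetical usage:
# for w in load_report("report.sate.xml"):
#     print(w["severity"], w["cweid"], w["path"], w["line"], w["name"])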

Lessons learned

Guidelines for analysis are often ambiguous

they need to be refined even more

Our analysis has inconsistencies and lapses

Careful analysis takes longer than expected

We do not know the code well

The tool interface is important for understanding a weakness

Analysis procedure


We cannot know all weaknesses in the test cases


Impractical to analyze all tool warnings


So analyze the following:



Method 1. A subset of warnings from each tool
report


Method 2. Tool warnings related to manually
identified weaknesses


SATE tool output format

Common format in XML

For each weakness:

One or more traces; each trace is a list of locations (line number and pathname)

Name of weakness and (optional) CWE id

Severity: 1 to 5 (ordinal scale), with 1 the highest

Probability that the problem is a true positive

Original message from the tool

And other annotation

Our analysis


Correctness of warning


Associate warnings that refer to the same
(or similar) weakness
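A rough sketch of the association step: warnings in the same file whose line numbers are close together are grouped as referring to the same (or a similar) weakness. The 5-line window is an assumption, not a SATE rule.

# Sketch: group warnings that likely refer to the same (or a similar) weakness
# by bucketing on file path and merging warnings whose lines are close together.
from collections import defaultdict

def associate(warnings, window=5):
    """warnings: [{'path': str, 'line': int, ...}, ...] -> list of warning groups."""
    by_path = defaultdict(list)
    for w in warnings:
        by_path[w["path"]].append(w)

    groups = []
    for ws in by_path.values():
        ws.sort(key=lambda w: w["line"])
        current = [ws[0]]
        for w in ws[1:]:
            if w["line"] - current[-1]["line"] <= window:
                current.append(w)
            else:
                groups.append(current)
                current = [w]
        groups.append(current)
    return groups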