MURI Progress Update, January 31, 2008

1

Objective

Develop self-regenerative enterprise networks that recover and reconstitute themselves after attacks and failures



Develop a transaction-based model for commodity operating systems to determine where an attack occurred and what data or programs were altered, and to back out all these changes without affecting unrelated data/activities.



Automatically generate patches to make
systems more robust after attack.

DoD Benefit:



Uninterruptible service for critical network-centric warfare services



Error localization and tolerance in
applications



Automatic system recovery after attack
including quarantine of tainted processes and
data



Increased resiliency after attack through auto-patch generation

Technical Approach:



Develop a layered approach to self-regenerative systems:



application-level resilience using error virtualization and rescue points



system-level resilience using virtualization and transaction semantics for programs to roll back system state to the last known good continuation point



dynamic patching of applications to improve
resiliency after attack



roll forward with correction to quarantine tainted processes and files and back out changes

Budget (Planned/Actual, $K):

FY07     FY08     FY09     FY10     FY11     FY__
496.5    836.1    938.4    1004     980.7    570.9

Dates and location of Major Reviews/Meetings: July 2008, TBD

Autonomic Recovery of Enterprise-wide Systems After Attack or Failure with Forward Correction



Anup Ghosh, Sushil Jajodia: {aghosh1,jajodia}@gmu.edu; Angelos Keromytis, Sal Stolfo, Jason Nieh: {angelos,sal,nieh}@cs.columbia.edu; Peng Liu: pliu@ist.psu.edu

2

Technical Breakthroughs &
Accomplishments (1 of 3)

Uninterruptible Server


Developed an architecture,
algorithms, and system for
providing uninterruptible critical
network services in the face of
attack


Breakthroughs:


Supports use of buggy COTS software while still providing 100% availability


Experimental results show
resilience against classes of
malicious attack including
denial of service, worms, and
stealthy Trojans


Experimentally verified low overhead


Eliminates false negatives from
sensors, and automatically
handles false positives without
manual review

[Figure: Health status monitor for virtual machines and uninterruptible server]

[Figure: Architecture for uninterruptible servers. Labels: Sensors, State Estimator, Response Selector, Actuators, TC]

3

Technical Breakthroughs &
Accomplishments (2 of 3)


Self-Healing Systems


Developed an approach for a self-recoverable Linux file system


Developed Self-Healing PostgreSQL, a damage tracking, quarantine, and repair DBMS


The first COTS DBMS that satisfies
two essential enterprise health
requirements:



Near-zero run-time overhead: less than 8%

Zero system downtime: during online repair, its throughput degradation quickly improves from 40% to 10-20% within a few seconds
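
The damage tracking step behind a quarantine-and-repair DBMS can be illustrated with a small sketch. This is not the Self-Healing PostgreSQL implementation; it only assumes a commit-ordered log in which each transaction records the items it read and wrote, and the function name and log layout are hypothetical. A transaction is treated as damaged if it is known to be malicious or if it read an item last written by a damaged transaction.

```python
def tainted_transactions(log, malicious):
    """Return the ids of transactions affected by the malicious ones.

    log: list of (txn_id, read_set, write_set) tuples in commit order
         (an assumed layout, for illustration only).
    malicious: ids of transactions already identified as attacks.
    """
    tainted = set(malicious)
    dirty = set()  # items whose current value was written by a tainted txn
    for txn_id, reads, writes in log:
        if txn_id in tainted or dirty & set(reads):
            # The transaction is malicious or read damaged data,
            # so everything it wrote is damaged as well.
            tainted.add(txn_id)
            dirty |= set(writes)
        else:
            # A clean transaction that overwrites an item repairs it.
            dirty -= set(writes)
    return tainted


log = [
    ("T1", [], ["x"]),     # the attack writes x
    ("T2", ["x"], ["y"]),  # reads damaged x
    ("T3", ["z"], ["z"]),  # unrelated activity
    ("T4", ["y"], ["w"]),  # reads damaged y
    ("T5", [], ["x"]),     # clean blind write to x
    ("T6", ["x"], ["v"]),  # reads the clean x
]
damaged = tainted_transactions(log, {"T1"})
```

Only the transactions in the returned set need to be quarantined and repaired; unrelated work such as T3 is left alone, which is the property the slide describes as backing out damage without stopping the system.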


4

Technical Breakthroughs &
Accomplishments (3 of 3)

Application Recovery Through
Error Virtualization


Developed novel “error
virtualization with rescue
points” recovery technique


retrofits exception-handling capabilities into vulnerable code


allows for safe and
efficient application
recovery from failures
and attacks


Evaluated recovery mechanism with 6 open-source apps


90%+ success

5

Technical Approach (1 of 3)

Uninterruptible Server


Develop a scalable
architecture that virtualizes
diverse redundant copies of
critical network services


Create a trustworthy controller (TC) that uses automatic feedback control to manage the state of servers


Hide details of server
replication from clients


Revert servers to pristine
condition on attack or
corruption while continuing to
provide service


[Figure: Architecture for uninterruptible servers. Labels: TC, VS, VSH, Load Balancer, TC Control Station]
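
The feedback loop run by the trustworthy controller can be sketched roughly as follows. The stage names (sensors, state estimator, response selector, actuators) come from the architecture figure; everything else, including the `VirtualServer` class, the boolean alert format, and the revert-on-any-alert policy, is an assumption made for illustration and not the project's actual design.

```python
from dataclasses import dataclass


@dataclass
class VirtualServer:
    """Hypothetical stand-in for one virtualized copy of a network service."""
    name: str
    reverts: int = 0

    def revert_to_pristine(self) -> None:
        # Placeholder for restoring a known-clean snapshot of the server.
        self.reverts += 1


def estimate_state(alerts):
    """State estimator: turn raw sensor alerts into a health label per server."""
    return {name: "suspect" if alerted else "healthy"
            for name, alerted in alerts.items()}


def select_response(health):
    """Response selector: revert every suspect server, leave the others alone."""
    return {name: "revert" if label == "suspect" else "none"
            for name, label in health.items()}


def control_step(servers, alerts):
    """Run one sensors -> estimator -> selector -> actuators pass.

    Returns the names of the servers that keep serving during this step.
    """
    actions = select_response(estimate_state(alerts))
    serving = []
    for server in servers:
        if actions.get(server.name) == "revert":
            server.revert_to_pristine()  # actuator: restore pristine state
        else:
            serving.append(server.name)
    return serving


servers = [VirtualServer("vs1"), VirtualServer("vs2"), VirtualServer("vs3")]
serving = control_step(servers, {"vs1": False, "vs2": True, "vs3": False})
```

Under this assumed policy a false positive costs only one unnecessary revert, while the remaining copies continue to answer requests, so no manual review is needed before acting on an alert.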

6

Technical Approach (2 of 3)


Self-Healing Systems


Zero-downtime self-recoverable Linux file system


Create a DQR (dynamic quarantine and repair) hypervisor “underneath” User-Mode Linux (UML)


Zero-downtime quarantine and repair of infected application execution through processor emulation


Use process-level reconstruction to guide instruction-level quarantine controls


Selective replay to keep the results of good instructions while removing the effects of bad ones
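
The idea of selective replay can be shown at a much coarser granularity than the instruction-level replay described above. The sketch below assumes a clean checkpoint and an ordered log of operations on a small dictionary of state; the helper names `add`, `set_value`, and `selective_replay` are hypothetical. Good operations are re-executed in their original order from the checkpoint, and operations marked bad are skipped, so later good operations see corrected inputs.

```python
def add(key, amount):
    """Build an operation that adds `amount` to state[key]."""
    def op(state):
        state[key] = state.get(key, 0) + amount
    return op


def set_value(key, value):
    """Build an operation that overwrites state[key]."""
    def op(state):
        state[key] = value
    return op


def selective_replay(checkpoint, log, bad_ids):
    """Rebuild state from a clean checkpoint, skipping the bad operations.

    log: list of (op_id, operation) pairs in original execution order.
    bad_ids: ids of operations identified as malicious or corrupted.
    """
    state = dict(checkpoint)  # never mutate the checkpoint itself
    for op_id, operation in log:
        if op_id in bad_ids:
            continue  # drop the effect of a bad operation
        operation(state)  # re-run a good operation on corrected state
    return state


checkpoint = {"balance": 100}
log = [
    ("op1", add("balance", 50)),
    ("op2", set_value("balance", 0)),  # the corrupting operation
    ("op3", add("balance", 25)),
]
repaired = selective_replay(checkpoint, log, {"op2"})
```

Re-running op3 rather than restoring its logged result matters here: its original output was computed on top of the corrupted balance, so only re-execution yields the corrected value.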


[Figure: DQR architecture diagram. Labels: Display, Guest OS, VMM, Host Kernel, Hook, Task structure, Cache, Drivers, CPU, Gang A, Gang B, Timer, Keyboard, Ports, Disks, process auditor, Stack, Heap, Log, Surgery Agent, Quarantine, Dependency Analyzer, Roll-Forward Correction, Instruction Generator]

7

Technical Approach (3 of 3)








Overview:


Develop failure-agnostic, application-level mechanisms for recovering from faults and attacks


React to previously unknown (zero-day) observed attacks and software faults



Recover using program’s native error
handling


Develop map between set of faults
that could occur and explicit error
handling


Profile programs during “bad” runs


Discover candidate “rescue” points


Error Virtualization


Modify program execution so that
fault is translated into handled error

Application Recovery Through Error Virtualization
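
A rough illustration of translating a fault into a handled error is given below. It is a simplified analogy in Python, not the project's mechanism: the `rescue_point` decorator, the `-1` error value, and the example functions are all hypothetical, and the sketch does not restore any program state changed before the fault. The point is only that an unanticipated fault raised beneath a rescue point surfaces as the error value the caller already knows how to handle.

```python
import functools


def rescue_point(error_value, faults=(Exception,)):
    """Mark a function as a rescue point (hypothetical helper).

    Any listed fault raised inside the function is turned into
    `error_value`, which the existing callers already check for.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except faults:
                return error_value  # fault becomes an ordinary error return
        return wrapper
    return decorate


@rescue_point(error_value=-1, faults=(IndexError, ZeroDivisionError))
def average_payload(packet):
    count = packet[0]                # IndexError on an empty packet
    return sum(packet[1:]) // count  # ZeroDivisionError when count is 0


def handle(packet):
    result = average_payload(packet)
    if result == -1:  # the caller's pre-existing error path
        return "rejected"
    return f"ok:{result}"
```

With this wrapper, `handle([2, 10, 20])` follows the normal path, while `handle([])` and `handle([0, 5])` take the caller's existing rejection path instead of crashing the program.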


8

Impact of AFOSR Funding


Uninterruptible Servers


Complex and buggy COTS server software can be deployed in mission-critical systems without compromising reliability or security


Off-the-shelf intrusion sensors can be used, with false positives and false negatives handled automatically without human intervention


Self-Healing Systems


Local and remote surgical
corrections of corrupted
applications and operating
systems


Zero-downtime correction of corrupted processes


Application Recovery


Retrofitting third-party code with fault resilience by mapping potential faults to native error handling


Increased resiliency after attack through auto-patch generation

9

Collaboration & Funding Sources


AFOSR MURI funding is leveraging prior NSF- and DARPA-sponsored work


Breadth, duration, and funding of the MURI program make significant research breakthroughs feasible


Air Force Air Combat Command has assigned a technical officer to monitor progress and transition useful technologies




Besides GMU, Penn State, and Columbia University, collaborations with Dartmouth College and the University of Pennsylvania on application recovery have made an impact


Patents filed by GMU and
Columbia


Significant industry interest in
technologies being developed


License agreement with a VA-based start-up is being pursued


Test and evaluation discussions with a large systems engineering firm are under way