Research in Next-Generation Digital Forensics

Oct 19, 2013 (3 years and 9 months ago)

Golden G. Richard III, Ph.D.

Associate Professor

Dept. of Computer Science

golden@cs.uno.edu

http://www.cs.uno.edu/~golden


Digital Forensics Research Group


Fall 2006:


Thursdays @ 1pm in NSSAL (Math 322)


Primary Collaborators:


Vassil Roussev [UNO CS]


Vico Marziale [UNO Ph.D. student]


Frank Adelstein [ATC-NY]

Digital Forensics

Definition: “Tools and techniques to recover,
preserve, and examine digital evidence on or
transmitted by digital devices.”


Devices include computers, PDAs, cellular phones,
videogame consoles, copy machines, printers, …

Examples of Digital Evidence


Threatening emails


Documents (e.g., in places they shouldn’t be)


Suicide notes


Bomb-making diagrams


Malicious Software


Viruses


Worms





Child pornography (contraband)


Evidence that network connections were made
between machines


Cell phone SMS messages

Facts (or: Why Digital Forensics?)


Deleted files aren’t securely deleted


Recover deleted file + when it was deleted!


Renaming files to avoid detection is
pointless


Formatting disks doesn’t delete much data


Web-based email can be (partially)
recovered directly from a computer


Files transferred over a network can be
reassembled and used as evidence
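
The point about renaming can be illustrated with a short sketch: file types are usually recognizable from their leading bytes ("magic numbers"), whatever the file is called. The signature table and the function name sniff_type below are assumptions made for illustration, not part of any tool named in these slides.

```python
# A few well-known magic numbers. This table is a small, hypothetical sample;
# real forensic tools ship far larger signature databases.
SIGNATURES = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return "unknown"


# A GIF renamed to "notes.txt" still starts with the GIF magic number.
renamed_contents = b"GIF89a" + b"\x00" * 16
print(sniff_type(renamed_contents))
```

Because the check never looks at the name or extension, renaming a file changes nothing about the result.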

Facts (2)


Uninstalling applications is much more difficult than it
might appear…


“Volatile” data hangs around for a long time (even across
reboots)


Remnants from previously executed applications


Using encryption properly is difficult, because data isn’t
useful unless decrypted


Anti-forensics (privacy-enhancing) software is mostly
broken


“Big” magnets (generally) don’t work


Media mutilation (except in the extreme) doesn’t work


Basic enabler: Data is very hard to kill



Privacy Through Media Mutilation

[Figure: alternatives shown include a degausser or forensically-secure file deletion software (but make sure it works!)]

Digital Forensics Process


Identification of potential digital evidence


Where might the evidence be?


Which devices did the suspect use?


Preservation and copying of evidence


On the crime scene…


First, stabilize evidence…prevent loss and contamination


If possible, make identical copies of evidence for
examination


Careful examination of evidence


Presentation


“The FAT was fubared, but using a hex editor I changed the first
byte of directory entry 13 from 0xEF to 0x08 to restore
‘HITLIST.DOC’…”


“The suspect attempted to hide the Microsoft Word document
‘HITLIST.DOC’ but I was able to recover it without tampering with
the file contents.”


Legal: Balance of need to investigate vs. privacy

“Traditional” Digital Forensics


Pull the plug


“Image” (make bit-perfect copies) of hard drives,
floppies, USB keys, etc.


Use forensics software to analyze copies of
drives


Investigator typically uses a single computer to
perform investigation in the lab


Present results to client, to officer-in-charge,
court
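
The imaging step above is normally paired with a hash check so the copy can be shown to be bit-perfect. The following is a minimal sketch under simplifying assumptions: it copies an ordinary file rather than a raw device behind a write blocker, and the names sha256_of and image_and_verify are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_and_verify(source: Path, dest: Path) -> str:
    """Copy source to dest, then confirm both hash to the same value."""
    shutil.copyfile(source, dest)
    src_hash = sha256_of(source)
    dst_hash = sha256_of(dest)
    if src_hash != dst_hash:
        raise ValueError("copy does not match source")
    return src_hash
```

The returned hash can be recorded with the case notes, so that later analysis of the copy can be tied back to the original media.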


Traditional: Where’s the evidence?


Undeleted files, expect some names to be incorrect


Deleted files


Windows registry


Print spool files


Hibernation files


Temp files (all those .TMP files!)


Slack space


Swap files


Browser caches


Alternate or “hidden” partitions


On a variety of removable media (floppies, ZIP,
Jaz, tapes, …)

But Evidence is Also…


In RAM


“In” the network


On mission-critical machines


Can’t turn off without severe disruption


Can’t turn them ALL off just to see!


On huge storage devices


1TB server: image entire machine and drag it
back to the lab to see if it’s interesting?


10TB?


Next Generation: Needs


Broad:


Better design, better software


Yes, some of it is engineering (and hacking)


Someone has to do it


Better vision, application of ‘real’ CS to problems


More specific:


Need for speed


Machine correlation


Machine profiling


Better auditing of investigative process


On-the-spot forensics: Triage


Live forensics


Network forensics


Specific tools for detection and remediation of malware


Phishing investigation





Next Generation: UNO


Better file carving


Forensic-aware OS components


In-place file carving


Forensic accountability


On-the-spot forensics


Distributed digital forensics



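File Carving: Basic Idea

[Figure: a strip of disk blocks, with one cluster and one sector marked. Unrelated disk blocks surround an interesting file that begins with a header, e.g., 0x474946383761 (GIF), and ends with a footer, e.g., 0x003B (GIF), along with “milestones” or “anti-milestones”.]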
File Carving: Fragmentation

[Figure: a fragmented file with a header, e.g., 0x474946383761 (GIF), a footer, e.g., 0x003B (GIF), and “milestones” or “anti-milestones”.]
File Carving: Fragmentation

[Figure: the fragmented file, showing the header, e.g., 0x474946383761 (GIF), and the footer, e.g., 0x003B (GIF).]
File Carving: Damaged Files

[Figure: a damaged file with a header, e.g., 0x474946383761 (GIF), and “milestones” or “anti-milestones”, but no footer.]
File Carving: Doing a Better Job


Better design


Faster


Distributed implementation


More flexible description of file types


Automatic generation of type descriptions


Patterns


Rule sets


Multiple-pass carving


Carve, “remove” validated files from block list, re-carve, hope that some fragmented files coalesce


Block-sniffing

File Carving: Block Sniffing

[Figure: a header, e.g., 0x474946383761 (GIF), followed by candidate blocks.]


Do these blocks “smell” right?




N-gram analysis



entropy tests



parsing
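
One of the "smell" tests listed above, the entropy test, can be sketched as follows. It computes the Shannon entropy of a block's byte distribution; the function name shannon_entropy is an assumption, and any threshold for deciding whether a block belongs to a given file type would be a tuning choice the slides do not specify.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


zero_block = b"\x00" * 512             # e.g., an unused, zero-filled sector
uniform_block = bytes(range(256)) * 2  # every byte value equally likely
print(shannon_entropy(zero_block), shannon_entropy(uniform_block))
```

Compressed or encrypted data tends toward the upper end of the range, while plain text and sparse structures score much lower, which is what makes entropy a cheap first filter.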


Better Software: File Carving: Scalpel


Two-pass design


Minimizes:


Reads


Seeks


Writes


Data copying


Memory usage


Doesn’t yet incorporate all of the carving
wizardry we have in mind


G. G. Richard III, V. Roussev, "Scalpel: A Frugal, High Performance File Carver," Proceedings of the 2005 Digital Forensics Research Workshop (DFRWS 2005), New Orleans, LA.

Some Scalpel Results (1)

Big targets, large carve sizes, huge improvement (over 5 hours faster)

[Chart annotation: T_read + 238,270,750,000 bytes]

Some Scalpel Results (2)

Big targets, large carve sizes, huge improvement (over 7 hours faster)

[Chart annotation: T_read + 117,622,357,936 bytes]

OS Support for Digital Forensics


Export raw disk devices across network for
processing


Others: network block device (NBD)


Us: optimization


“In-place” file carving


Us: Export results from file carving as a
filesystem, w/ minimal extra storage


Better auditing of investigative process


Us: “digital evidence bag”-aware filesystems




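FUSE (Filesystem in User Space)

[Figure: FUSE architecture. In user space, a command such as dd if=/evidence/DEC/img.dd of=copy.dd calls read() through the C library. In kernel space, the Linux Virtual File System Interface (VFS) sits above ext3, reiserFS, and FUSE. FUSE connects back to user space, where the filesystem implementation runs on the FUSE library and the C library.]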

In-Place File Carving

[Figure: in-place carving architecture. Components shown: client applications, scalpel_fs on FUSE, Scalpel, a preview database, a local drive, and a remote drive reached over the network through an nbd client and an nbd server.]

G. G. Richard III, V. Roussev, V. Marziale, “In-Place File Carving,” submitted to the Third Annual IFIP WG 11.9 International Conference on Digital Forensics, 2007.

Better Auditing

Want: Digital Evidence Bags


See: P. Turner, “Unification of Digital Evidence from Disparate Sources (Digital Evidence Bags),” DFRWS 2005

See: Common Digital Evidence Storage Format (CDESF) working group, http://www.dfrws.org/CDESF/.


Better Auditing (2)

[Figure: FDAM architecture. Applications in user space (dd, scalpel, FTK, TSK) sit above the VFS interface. In the operating system kernel, FDAM offers filesystem data access and, through an FDAM block device, block-level data access. Underneath is a Digital Evidence Container (DEC; e.g., DEB, AFF, Gfzip, …) holding evidence data and an audit log, with import/export.]

G. G. Richard III, V. Roussev, "Toward Secure, Audited Processing of Digital Evidence: Filesystem Support for Digital Evidence Bags," Research Advances in Digital Forensics, Springer, 2006.

Bluepipe: On the Spot Digital Forensics

Y. Gao, G. G. Richard III, V. Roussev, “Bluepipe: An Architecture for On-the-Spot Digital Forensics,” International Journal of Digital Evidence (IJDE), 3(1), 2004.

Bluepipe Patterns

<BLUEPIPE NAME="findcacti">
<!-- find illegal cacti pics using MD5 hash dictionary -->
<DIR TARGET="/pics/" />
<FINDFILE
  USEHASHES=TRUE
  LOCALDIR="cactus"
  RECURSIVE=TRUE
  RETRIEVE=TRUE
  MSG="Found cactus %s with hash %h ">
<FILE ID=3d1e79d11443498df78a1981652be454/>
<FILE ID=6f5cd6182125fc4b9445aad18f412128/>
<FILE ID=7de79a1ed753ac2980ee2f8e7afa5005/>
<FILE ID=ab348734f7347a8a054aa2c774f7aae6/>
<FILE ID=b57af575deef030baa709f5bf32ac1ed/>
<FILE ID=7074c76fada0b4b419287ee28d705787/>
<FILE ID=9de757840cc33d807307e1278f901d3a/>
<FILE ID=b12fcf4144dc88cdb2927e91617842b0/>
<FILE ID=e7183e5eec7d186f7b5d0ce38e7eaaad/>
<FILE ID=808bac4a404911bf2facaa911651e051/>
<FILE ID=fffbf594bbae2b3dd6af84e1af4be79c/>
<FILE ID=b9776d04e384a10aef6d1c8258fdf054/>
</FINDFILE>
</BLUEPIPE>
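
The pattern above describes a recursive search of a directory for files whose MD5 hashes appear in a dictionary. A rough Python equivalent of that search is sketched below; it is not Bluepipe's code, and the names md5_of and find_by_hash are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_hash(root: Path, known_hashes: set[str]) -> list[tuple[Path, str]]:
    """Recursively list files under root whose MD5 is in known_hashes."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            file_hash = md5_of(path)
            if file_hash in known_hashes:
                hits.append((path, file_hash))
    return hits
```

A caller would pass the target directory and the set of hex digests from the pattern, then report each hit much as the MSG attribute does. Hash matching only finds exact copies; a single changed byte produces a different digest.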



Distributed Digital Forensics

V. Roussev, G. G. Richard III, "Breaking the Performance Wall: The Case for Distributed Digital Forensics," Proceedings of the 2004 Digital Forensics Research Workshop (DFRWS 2004), Baltimore, MD

[Figure labels: 750GB, 750GB, 300GB, 300GB]

Distributed Digital Forensics


Scalable


Want to support at least IMAGE_SIZE / RAM_PER_NODE nodes


Platform independent


Want to be able to incorporate any (reasonable) machine that’s
available


Lightweight


Horsepower is for forensics, not the framework

less fat


Highly interactive


Extensible


Allow incorporation of existing sequential tools


e.g., stegdetect, image thumbnailing, file classification, hashing,



Robust


Must handle failed nodes smoothly
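
The scalability target above, IMAGE SIZE / RAM_PER_NODE nodes, amounts to holding the whole image in the cluster's combined RAM. A small worked sketch follows, assuming (optimistically) that all of a node's RAM is available for caching; the function name min_nodes is hypothetical.

```python
def min_nodes(image_bytes: int, ram_per_node_bytes: int) -> int:
    """Smallest node count whose combined RAM can hold the whole image."""
    if ram_per_node_bytes <= 0:
        raise ValueError("ram_per_node_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-image_bytes // ram_per_node_bytes)


GIB = 2 ** 30
print(min_nodes(6 * GIB, 1 * GIB))    # a 6 GiB image on 1 GiB nodes
print(min_nodes(750 * GIB, 1 * GIB))  # a 750 GiB drive on 1 GiB nodes
```

In practice the operating system and the framework take their share of each node's memory, so the real requirement is somewhat higher than this lower bound.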


Distributed Digital Forensics (2)

Distributed Digital Forensics (3)

[Figure: cluster architecture. File server: 2x1.4GHz Xeon CPUs, 2GB RAM, 504GB SCSI RAID. Switch: 96-port, 10/100/1000 Mb, 24 Gb backplane. Node: 2.4 GHz Pentium 4 CPU, 1 GB RAM. Links are labeled 1Gb.]

Beowulf [RIP], Slayer of Computer
Criminals…

DDF: Results (1)


Live string search:


“Vassil Roussev”



Regular expression search:


v[a-z]*i[a-z]*a[a-z]*g[a-z]*r[a-z]*a
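
To show what the regular expression above matches, here is a minimal sketch that runs it over raw bytes, as one would over a cached image. The function name search_image is an assumption; the framework's own search code is not shown in these slides.

```python
import re

# The pattern from the slide, compiled for byte strings.
PATTERN = re.compile(rb"v[a-z]*i[a-z]*a[a-z]*g[a-z]*r[a-z]*a")


def search_image(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, matched bytes) for every non-overlapping match."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(data)]


for offset, text in search_image(b"buy viagra now, or vxixaxgxrxa later"):
    print(offset, text)
```

The pattern tolerates lower-case letters inserted between the key letters, which is the kind of obfuscation spam uses, but as written it is case-sensitive.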

DDF: Results (2)


Stego detection using Stegdetect 0.5 under RH9 Linux
on the cluster


Traditional:


6GB image mounted using loopback device


find /mnt/loop -exec ./stegdetect '{}' \;


790 seconds == 13:10 minutes


Using the distributed framework


Stegdetect 0.5 code incorporated into framework


Detection against cached files


“STEGO” command (after IMAGE/CACHE)


82 seconds == 1:22 minutes


9.6X faster with 8 machines


CPU bound operation
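
The speedup figure follows directly from the two timings above. In the arithmetic below, the symbols S (speedup), E (parallel efficiency), and N (number of machines) are introduced here for illustration and do not appear in the slides.

```latex
S = \frac{T_{\text{traditional}}}{T_{\text{distributed}}}
  = \frac{790\ \text{s}}{82\ \text{s}} \approx 9.63,
\qquad
E = \frac{S}{N} = \frac{9.63}{8} \approx 1.20
```

An efficiency above 1 means the speedup is better than linear in the number of machines. One plausible reading, given that detection ran against cached files, is that the distributed run also avoided disk I/O that the loopback-mounted run had to pay for.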



DDF: To Do List


User interface! (unless you love PuTTY)

DDF: To Do (2)


Case persistence


Secure support for overlapping cases


Better fault tolerance


Intelligent caching schemes to support larger
images


Collaboration with colleagues (you?) working in:


Image analysis/classification


Speech recognition


More stego


Other CPU horsepower-intensive, forensics-applicable stuff


We provide cycles…you provide…

Current: Live Forensics


Physical memory dumps


Hard to do when adversarial OS is present


Via USB hacking?


Firewire proof of concept developed by Maximillian
Dornseif


Defeating process hiding techniques, e.g., FU
“rootkit”


Check OS components from many angles


Remnants of applications (executed) past…


e.g., instant messenger fragments


e.g., recent invocations of process hiding


e.g., fingerprints of recently executed (or executing)
malware


Conclusion: Lots of Work To Do


Benevolent hacking (engineering) meets
science


Desperately need methods for pipelining
investigative process


Live forensics critically important


volatile computing


whole disk encryption


hardware-based whole disk encryption!


nasty malware


Conclusion (2)


Arguably, almost any field in CS can collaborate


All media handling needs work


Algorithms for dealing with huge, partially-organized
datasets


Attribution


Correlation


Profiling


Document similarity measures


Databases


High-performance computing


OS Internals


Random Bedside Reading…


http://www.dfrws.org (Digital Forensics Research Workshop)


http://www.ijde.org/ (International Journal of Digital Evidence)


F. Adelstein, “Live Forensics: Diagnosing Your System Without Killing it First,” Communications of the ACM, February 2006.


M. A. Caloyannides, Privacy Protection and Computer Forensics, Second Edition, 2004.


B. Carrier, File System Forensic Analysis, Addison-Wesley, 2005.


B. Carrier, “Risks of Live Digital Forensics Analysis,” Communications of the ACM, February 2006.


E. Casey, Digital Evidence and Computer Crime, Academic Press, 2004.


J. Chow, B. Pfaff, T. Garfinkel, M. Rosenblum, “Shredding Your Garbage: Reducing Data Lifetime Through Secure Deallocation,” 14th USENIX Security Symposium, 2005.


M. Geiger, “Evaluating Commercial Counter-Forensic Tools,” 5th Annual Digital Forensic Research Workshop (DFRWS 2005), New Orleans, 2005.


G. G. Richard III, V. Roussev, "Next Generation Digital Forensics," Communications of the ACM, February 2006.


G. G. Richard III, V. Roussev, “Digital Forensics Tools: The Next Generation,” invited chapter in Digital Crime and Forensic Science in Cyberspace, IDEA Group Publishing, 2005.


A. Schuster, “Searching for Processes and Threads in Microsoft Windows Memory Dumps,” 6th Annual Digital Forensic Research Workshop (DFRWS 2006), West Lafayette, IN, 2006.


S. Sparks, J. Butler, “Raising the Bar for Windows Rootkit Detection,” Phrack Issue # 63.


G. Hoglund, J. Butler, “Rootkits: Subverting the Windows Kernel,” Addison-Wesley, 2005.


Presentation available:

http://www.cs.uno.edu/~golden/teach.html


golden@cs.uno.edu


Security Lab (NSSAL): Math 322

?