Sensitive Data In a Wired World
Negative Representations of Data
Stephanie Forrest
Dept. of Computer Science
Univ. of New Mexico
Albuquerque, NM
http://cs.unm.edu/~forrest
forrest@cs.unm.edu
Introduction
•
Goal: Develop new approaches to data security and privacy that
incorporate design principles from living systems:
–
Survivability and evolvability
–
Autonomy
–
Robustness, adaptation and self repair
–
Diversity
•
Extends earlier work on computational properties of the immune
system:
–
Intrusion detection
–
Automated response
–
Collaborative information filtering
Project Overview
•
Immunology and data:
–
Negative representations of information
•
Epidemiology and the Internet:
–
Social networks matter
–
The real world is not always scale free
•
The social utility of privacy:
–
Why is privacy an important value in democratic societies?
–
Evolutionary perspective
Collaborations
•
Paul Helman and Cris Moore (UNM)
•
Robert Axelrod and Mark Newman (Univ. Michigan)
•
Matthew Williamson (Sana Security)
•
Rebecca Wright and Michael de Mare (Stevens)
•
Joan Feigenbaum and Avi Silberschatz (Yale)
–
Fernando Esponda's postdoc next year.
How the Immune System Distributes Detection
•
Advantages of distributed negative detection:
–
Localized (no communication costs)
–
Scalable and tunable
–
Robust (no single point of failure)
–
Private
•
Many small detectors matching nonself (negative detection).
•
Each detector matches multiple patterns (generalization).
Applications to Computing
•
Anomaly detectors (earlier work)
•
Information filters (earlier work)
•
Adaptive queries (future)
•
Negative representations (in progress)
–
A positive set DB is a set of fixed-length strings.
–
A negative set NDB represents all the strings not in DB.
–
Intuition: If an adversary obtains a string from NDB, little
information is revealed.
Example:
–
U = all possible four-character strings
–
DB = {juan, eric, dave}
–
U − DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
–
There are 26^4 − 3 = 456973 strings in U − DB.
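The count in this example can be checked with a short sketch. It assumes the alphabet is the 26 lowercase letters a to z, which is what the 26^4 figure implies; the names `universe_size` and `complement` are illustrative only.

```python
from itertools import product
from string import ascii_lowercase

DB = {"juan", "eric", "dave"}
LENGTH = 4  # all strings in this example have four characters

# Assumption: U is every four-character string over a-z.
universe_size = len(ascii_lowercase) ** LENGTH  # 26^4
complement_size = universe_size - len(DB)       # |U - DB|

def complement(db, length=LENGTH):
    """Lazily yield every string of the given length that is not in db."""
    for chars in product(ascii_lowercase, repeat=length):
        s = "".join(chars)
        if s not in db:
            yield s

print(universe_size, complement_size)
```

Listing U − DB explicitly is already several hundred thousand strings for a three-record DB, which is why a compact representation of the negative set is needed.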
Results
•
Can U − DB be represented efficiently, given |U − DB| >> |DB|?
–
YES: There is an algorithm that creates an NDB of size polynomial in the size of DB.
–
Strategy: Compress information using a don't-care symbol (*). Other representations?
•
What properties does the representation have?
–
Membership queries are tractable (linear time even without indexing).
•
Other queries, information leakage are future work.
–
Inferring information from a subset of NDB (next slide).
–
Inferring DB from NDB is NP-hard (note: not doing crypto):
•
Currently investigating instance difficulty.
•
Algorithms for increasing instance difficulty.
•
On-line insert/delete algorithms preserve problem difficulty.
•
Collaborations with R. Wright, M. de Mare, and C. Moore.
DB     U − DB    NDB
000    001       01*
101    010       0*1
111    011       1*0
       100
       110
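A minimal sketch of a membership query against the three-bit example, assuming `*` is the don't-care symbol and that a string belongs to DB exactly when no NDB record matches it. The helper names `matches` and `in_db` are hypothetical, not taken from the prototype implementations.

```python
from itertools import product

DB = {"000", "101", "111"}
NDB = ["01*", "0*1", "1*0"]

def matches(record, s):
    """True if s agrees with record at every position that is not '*'."""
    return len(record) == len(s) and all(
        r == "*" or r == c for r, c in zip(record, s)
    )

def in_db(s, ndb):
    """s is in DB iff no record of the negative database matches it.
    This is one linear scan over the NDB records, with no index."""
    return not any(matches(record, s) for record in ndb)

# Check that NDB really represents U - DB for all 3-bit strings.
universe = ["".join(bits) for bits in product("01", repeat=3)]
recovered = {s for s in universe if in_db(s, NDB)}
print(sorted(recovered))
```

Enumerating the universe to recover DB works here only because the strings are three bits long; for realistic lengths that enumeration is exponential.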
What information is revealed by queries?
(without assuming irreversibility)
•
Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
–
Assume NDB (or DB) is partitioned into n subsets.
•
To the query "Is x in DB?", what do I learn about x if x is not in my subset?
–
Must consult n subsets of NDB to conclude that x is in DB.
–
Must consult the subsets only until x is found (on average n/2).
–
Assumes that we care more about DB than U − DB.
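The asymmetry between the two cases can be illustrated with a toy partition of the three-bit NDB shown earlier (records 01*, 0*1, 1*0) into n = 3 subsets. This is only one reading of the bullets above: the scan stops at the first subset holding a matching record, and it must visit every subset before it can conclude that x is in DB. The function name `subsets_consulted` is hypothetical.

```python
def matches(record, s):
    """True if s agrees with record at every position that is not '*'."""
    return len(record) == len(s) and all(
        r == "*" or r == c for r, c in zip(record, s)
    )

def subsets_consulted(x, ndb_parts):
    """Return (x_in_db, number of NDB subsets that had to be consulted)."""
    for count, part in enumerate(ndb_parts, start=1):
        if any(matches(record, x) for record in part):
            return False, count      # x is in U - DB: stop early
    return True, len(ndb_parts)      # no match anywhere: x is in DB

# Assumed partition: one record per subset.
NDB_PARTS = [["01*"], ["0*1"], ["1*0"]]

for x in ("101", "010", "001", "110"):
    print(x, subsets_consulted(x, NDB_PARTS))
```

A party holding only one subset can therefore rule strings out of DB, but cannot confirm that any string is in DB.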
Probability and information content as the membership of strings is revealed.
DB contains 10% of all possible L-length strings (formulas).
Private Set Intersection
•
Determine which records are in the intersection of several databases, i.e.,
–
DB_1 ∩ DB_2 ∩ … ∩ DB_n
–
= U − (NDB_1 ∪ NDB_2 ∪ … ∪ NDB_n)
•
Each party may compute the intersection
–
DB_i − (NDB_1 ∪ NDB_2 ∪ … ∪ NDB_n)
•
Party i learns only the intersection of all the sets,
•
And not the cardinality of the other sets.
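The set identity behind this slide can be sketched as follows: a party keeps the records of its own DB that no other party's NDB matches. The second database and its NDB are made-up example data, and `intersect_with_negatives` is a hypothetical name. The sketch shows only the arithmetic; it is not a secure protocol, and it does not demonstrate the hardness of reversing an NDB, on which any privacy claim depends.

```python
def matches(record, s):
    """True if s agrees with record at every position that is not '*'."""
    return len(record) == len(s) and all(
        r == "*" or r == c for r, c in zip(record, s)
    )

def intersect_with_negatives(own_db, other_ndbs):
    """Keep the strings of own_db that are matched by none of the NDBs."""
    return {
        s for s in own_db
        if not any(matches(r, s) for ndb in other_ndbs for r in ndb)
    }

# Party 1: the three-bit example used earlier in the talk.
DB_1 = {"000", "101", "111"}
NDB_1 = ["01*", "0*1", "1*0"]

# Party 2: assumed example data; NDB_2 covers {000, 001, 010, 011, 100}.
DB_2 = {"101", "110", "111"}
NDB_2 = ["0**", "100"]

print(sorted(intersect_with_negatives(DB_1, [NDB_2])))
print(sorted(intersect_with_negatives(DB_2, [NDB_1])))
```

Each party publishes only its NDB and never its DB, and both arrive at the same intersection.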
Results cont.
•
How might these properties be useful?
–
Protect data from insider attacks
–
Computing set intersections
–
Surveys involving sensitive information
–
Anonymous digital credentials
–
Fingerprint databases
–
Other ideas?
•
Prototype implementations:
–
Perl, C
–
http://esa.ackleyshack.com/ndb
–
See demo
Computer Epidemiology
Justin Balthrop, Mark Newman, Matt Williamson
•
Information spreads over networks of social contacts between computers:
–
Email address books.
–
URL links.
•
Network topology affects the rate and extent of spreading:
–
Epidemiological models, and the epidemic threshold.
•
Controlling spread on scale-free networks:
–
Random vaccination is ineffective (e.g., anti-virus software).
–
Targeted vaccination of high-connectivity nodes.
–
Control degree distribution in time rather than space.
Science 304:527-529 (2004)
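A toy comparison of random and targeted vaccination, under loudly stated assumptions: the network is grown by a simple preferential-attachment rule (not the email or address-book data from the cited paper), vaccinated nodes are simply removed, and the size of the largest connected component among the remaining nodes is used as a crude proxy for how far an infection could spread. All names and parameters are illustrative.

```python
import random
from collections import deque

def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # each node appears once per incident edge
    for i in range(m + 1):  # seed: a small clique of m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

rng = random.Random(0)  # fixed seed so the run is repeatable
graph = preferential_attachment_graph(2000, 2, rng)
k = 200                 # vaccinate 10% of the nodes

random_vaccinated = set(rng.sample(sorted(graph), k))
hubs = set(sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k])

reachable_random = largest_component(graph, random_vaccinated)
reachable_targeted = largest_component(graph, hubs)
print(reachable_random, reachable_targeted)
```

Removing the highest-degree nodes fragments this kind of network far more than removing the same number of nodes at random, which is the intuition behind targeted vaccination.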
The Social Utility of Privacy
Robert Axelrod and Ryan Gerety
•
Typical framing:
–
Privacy values should remain as is (e.g., Lessig).
–
Individual rights vs. state (i.e., civil liberties vs. community safety / crime).
•
A community may have its own interest in defending individual privacy
(and not), independent of the civil liberties argument:
–
To promote innovation in changing environments.
–
To cope with distortions (e.g., overconfidence of middle managers).
–
To compensate for overgeneralized norms.
•
Not necessarily advocating more privacy:
–
From a societal/informational point of view how should appropriate bounds
on privacy be determined?
•
Current status:
–
Exploratory modeling based on simple games.
Next Steps: Negative Representations
•
Distributed negative representations
•
Leaking partial information
•
Relational algebra operators on the negative database:
–
Select, join, etc.
•
Instance difficulty:
–
Hiding given satisfying assignments in a SAT formula
–
Approximate representations
–
Other representations?
•
More realistic implementations
•
Negative data mining:
–
Is it easier/harder to find certain instances in NDB?
•
Imprecise representations:
–
Partial matching and queries
–
Learning algorithms
People
Stephanie Forrest
Elena Ackley
Fernando Esponda
Paul Helman
Publications
•
F. Esponda, S. Forrest, and P. Helman. "Negative representations of information." International Journal of Information Security (submitted March 2005).
•
F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Journal of Unconventional Computing (in press).
•
F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1, pp. 357-373 (2004).
•
J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the spread of computer viruses." Science 304:527-529 (2004).
•
H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing." 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).
•
F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third International Conference on Artificial Immune Systems (ICARIS), Best paper award (2004).
SUPPLEMENTARY MATERIAL
Probabilities
Generating Hard-to-Reverse Negative Databases
•
The randomized algorithm can be used to create a negative database.
•
Insert/Delete operations turn known hard formulas into negative databases.
•
The Morph operator may be used to search for hard instances.
H. Jia, C. Moore, and B. Selman. "From spin glasses to hard satisfiable formulas." SAT 2004.
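The link between negative databases and SAT formulas can be sketched on the three-bit example from the talk (NDB records 01*, 0*1, 1*0). The mapping assumed here is the standard one for binary strings: each NDB record becomes a clause that is false exactly on the strings the record matches, so the satisfying assignments of the formula are the strings of DB. Clauses are written as signed integers in the DIMACS style, and all function names are hypothetical.

```python
from itertools import product

def record_to_clause(record):
    """Clause that is false exactly on the strings matching record.
    Position i with bit '0' gives literal +i, bit '1' gives -i."""
    return [i if bit == "0" else -i
            for i, bit in enumerate(record, start=1) if bit != "*"]

def clause_to_record(clause, length):
    """Inverse mapping: a clause becomes an NDB record of the given length."""
    record = ["*"] * length
    for lit in clause:
        record[abs(lit) - 1] = "0" if lit > 0 else "1"
    return "".join(record)

def satisfies(assignment, clause):
    """assignment is a bit string; literal +i is true when bit i is '1'."""
    return any((assignment[abs(lit) - 1] == "1") == (lit > 0)
               for lit in clause)

NDB = ["01*", "0*1", "1*0"]
formula = [record_to_clause(r) for r in NDB]

# Brute-force the models; feasible only because the strings are 3 bits.
models = {
    "".join(bits) for bits in product("01", repeat=3)
    if all(satisfies("".join(bits), c) for c in formula)
}
print(formula, sorted(models))
```

Under this mapping, recovering DB from NDB means finding the satisfying assignments of the formula, and a formula known to be hard can be read directly as a negative database.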
Effect of the Morph operation
•
The Morph operation takes as input a negative database NDB and outputs NDB' that represents the same set U − DB.
•
The plot shows how the complexity of a database changes after applying the Morph operator.