Sensitive Data In a Wired World


Nov 30, 2013


Negative Representations of Data


Stephanie Forrest

Dept. of Computer Science

Univ. of New Mexico

Albuquerque, NM


http://cs.unm.edu/~forrest

forrest@cs.unm.edu



Introduction


Goal: Develop new approaches to data security and privacy that
incorporate design principles from living systems:


Survivability and evolvability


Autonomy


Robustness, adaptation and self repair


Diversity


Extends earlier work on computational properties of the immune
system:


Intrusion detection


Automated response


Collaborative information filtering

Project Overview


Immunology and data:


Negative representations of information


Epidemiology and the Internet:


Social networks matter


The real world is not always scale free


The social utility of privacy:


Why is privacy an important value in democratic societies?


Evolutionary perspective


Collaborations


Paul Helman and Cris Moore (UNM)


Robert Axelrod and Mark Newman (Univ. Michigan)


Matthew Williamson (Sana Security)


Rebecca Wright and Michael de Mare (Stevens)


Joan Feigenbaum and Avi Silberschatz (Yale)


Fernando Esponda’s postdoc next year.

How the Immune System Distributes Detection


Advantages of distributed negative detection:


Localized (no communication costs)


Scalable and tunable


Robust (no single point of failure)


Private


Many small detectors matching nonself (negative detection).


Each detector matches multiple patterns (generalization).

Applications to Computing


Anomaly detectors: earlier work


Information filters: earlier work


Adaptive queries: future


Negative representations: in progress


A positive set DB is a set of fixed-length strings.


A negative set NDB represents all the strings not in DB.


Intuition: If an adversary obtains a string from NDB, little
information is revealed.




Example:


U = all possible four-character strings


DB = {juan, eric, dave}


U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}


There are $26^4 - 3 = 456973$ strings in U-DB.
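The count in this example can be checked with a few lines of Python. This is a minimal sketch that assumes U is built from the 26 lowercase letters; the helper name `in_complement` is hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: U uses the 26 lowercase letters
LENGTH = 4
DB = {"juan", "eric", "dave"}

universe_size = len(ALPHABET) ** LENGTH      # |U|
complement_size = universe_size - len(DB)    # |U-DB|


def in_complement(s):
    """True if s is a four-character string of U that is not in DB."""
    return len(s) == LENGTH and all(c in ALPHABET for c in s) and s not in DB


print(universe_size, complement_size)
print(in_complement("cris"), in_complement("juan"))
```

Listing U-DB explicitly would take hundreds of thousands of entries for a three-record DB, which is why a compressed representation is needed.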


Results


Can U-DB be represented efficiently, given |U-DB| >> |DB|?


YES: There is an algorithm that creates an NDB of size polynomial in the size of DB.


Strategy: Compress information using a don’t-care symbol (*). Other
representations?
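One way to see that a compact NDB can exist is a prefix-based construction over binary strings. The sketch below is an illustration under stated assumptions, not necessarily the algorithm the slide refers to: the function names `prefix_ndb`, `matches`, and `represented` are hypothetical, and `*` stands for the don't-care symbol.

```python
from itertools import product


def matches(pattern, s):
    """A pattern matches s if every non-'*' position agrees with s."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))


def prefix_ndb(db, length):
    """Build don't-care patterns that together match every string not in db."""
    ndb = []
    prefixes = {""}  # prefixes of DB strings one position shorter than i
    for i in range(1, length + 1):
        db_prefixes = {s[:i] for s in db}
        for p in sorted(prefixes):
            for bit in "01":
                candidate = p + bit
                if candidate not in db_prefixes:
                    # No DB string starts with candidate, so pad with '*'.
                    ndb.append(candidate + "*" * (length - i))
        prefixes = db_prefixes
    return ndb


def represented(ndb, length):
    """Enumerate the strings matched by at least one pattern (small cases only)."""
    universe = ("".join(bits) for bits in product("01", repeat=length))
    return {s for s in universe if any(matches(p, s) for p in ndb)}


ndb = prefix_ndb({"000", "101", "111"}, 3)
print(ndb)
```

For a non-empty DB, each DB prefix contributes at most one pattern per position, so this construction emits at most length × |DB| patterns.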








What properties does the representation have?


Membership queries are tractable (linear time even without indexing).


Other queries and information leakage are future work.


Inferring information from a subset of NDB (next slide).


Inferring DB from NDB is NP-hard (note: not doing crypto):


Currently investigating instance difficulty.


Algorithms for increasing instance difficulty.


On-line insert/delete algorithms preserve problem difficulty.


Collaborations with R. Wright, M. de Mare, and C. Moore.

DB     U-DB    NDB
000    001     01*
101    010     0*1
111    011     1*0
       100
       110
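The three-bit example above can be checked directly. The sketch below, with the hypothetical helpers `matches` and `in_db`, answers the membership query with a single linear scan over the NDB patterns: a string is in DB exactly when no pattern matches it.

```python
from itertools import product

DB = {"000", "101", "111"}
NDB = ["01*", "0*1", "1*0"]  # '*' is the don't-care symbol


def matches(pattern, s):
    """A pattern matches s if every non-'*' position agrees with s."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))


def in_db(s, ndb):
    """Membership query against the negative database: one pass over ndb."""
    return not any(matches(p, s) for p in ndb)


for bits in product("01", repeat=3):
    s = "".join(bits)
    print(s, "in DB" if in_db(s, NDB) else "in U-DB")
```

Three patterns stand in for the five strings of U-DB, and the query never consults DB itself.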

What information is revealed by queries?

(without assuming irreversibility)


Having access to a subset of NDB (or DB) yields some information about strings
outside that subset:


Assume NDB (or DB) is partitioned into n subsets.


To the query “Is x in DB?”, what do I learn about x if x is not in my subset?


With NDB: must consult all n subsets to conclude that x is in DB.


With DB: must consult the subsets only until x is found (on average n/2).


Assumes that we care more about DB than U-DB.


Probability and information content as the membership of strings is revealed.
DB contains 10% of all possible L-length strings (formulas).
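The asymmetry between partitioned DB and partitioned NDB can be made concrete with a small count. This is a sketch under assumptions: the three-bit example sets and the one-entry-per-subset partition are chosen only for illustration, and the function names are hypothetical.

```python
def matches(pattern, s):
    """A pattern matches s if every non-'*' position agrees with s."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))


def consult_ndb(x, ndb_parts):
    """Return (subsets read, x in DB?) when only NDB subsets are available."""
    for count, part in enumerate(ndb_parts, start=1):
        if any(matches(p, x) for p in part):
            return count, False      # one match proves x is not in DB
    return len(ndb_parts), True      # every subset was needed to conclude x is in DB


def consult_db(x, db_parts):
    """Return (subsets read, x in DB?) when DB subsets are available."""
    for count, part in enumerate(db_parts, start=1):
        if x in part:
            return count, True       # stop as soon as x is found
    return len(db_parts), False


ndb_parts = [["01*"], ["0*1"], ["1*0"]]   # illustrative partition, n = 3
db_parts = [{"000"}, {"101"}, {"111"}]    # illustrative partition, n = 3

print(consult_ndb("101", ndb_parts))
print(consult_db("101", db_parts))
```

For a string that is in DB, the NDB holder learns nothing definite until the last subset has been read, while the DB holder can stop early.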

Private Set Intersection


Determine which records are in the intersection of several databases, i.e.


$DB_1 \cap DB_2 \cap \dots \cap DB_n = U - (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$


Each party may compute the intersection


$DB_i - (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$


Party i learns only the intersection of all the sets,


and not the cardinality of the other sets.
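The set identity behind this slide can be sketched in a few lines. The example databases, the patterns in `NDB2`, and the function name `intersect_with_ndbs` are assumptions made for illustration; the sketch only computes the intersection and does not model what an adversary could infer.

```python
def matches(pattern, s):
    """A pattern matches s if every non-'*' position agrees with s."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))


def intersect_with_ndbs(own_db, ndbs):
    """Keep the strings of own_db that no pattern in any party's NDB matches."""
    return {
        s for s in own_db
        if not any(matches(p, s) for ndb in ndbs for p in ndb)
    }


# Two parties over three-bit strings (illustrative data).
DB1 = {"000", "101", "111"}
NDB1 = ["01*", "0*1", "1*0"]     # represents U - DB1
DB2 = {"101", "110"}
NDB2 = ["0**", "100", "111"]     # represents U - DB2

print(intersect_with_ndbs(DB1, [NDB1, NDB2]))
print(intersect_with_ndbs(DB2, [NDB1, NDB2]))
```

Each party combines its own positive records with the published negative databases, so neither positive set is exchanged.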

Results cont.


How might these properties be useful?


Protect data from insider attacks


Computing set intersections


Surveys involving sensitive information


Anonymous digital credentials


Fingerprint databases


Other ideas?


Prototype implementations:


Perl, C


http://esa.ackleyshack.com/ndb



See demo


Computer Epidemiology

Justin Balthrop, Mark Newman, Matt Williamson


Information spreads over networks of social contacts between computers:


Email address books.


URL links.


Network topology affects the rate and extent of spreading:


Epidemiological models, and the epidemic threshold.


Controlling spread on scale-free networks:


Random vaccination is ineffective (e.g., anti-virus software).


Targeted vaccination of high-connectivity nodes.


Control degree distribution in time rather than space.

Science 304:527-529 (2004)
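One possible reading of "in time rather than space" is rate limiting: instead of removing a node's links, cap how quickly it may contact hosts it has not contacted recently. The sketch below assumes that reading; the class `Throttle`, its working-set size, and the one-release-per-tick rule are illustrative choices, not details taken from the slides.

```python
from collections import deque


class Throttle:
    """Delay connections to unfamiliar hosts, releasing one per time step."""

    def __init__(self, working_set_size=4):
        self.recent = deque(maxlen=working_set_size)  # recently contacted hosts
        self.pending = deque()                        # delayed connection requests

    def request(self, host):
        """Return True if the connection may proceed immediately."""
        if host in self.recent:
            return True
        self.pending.append(host)
        return False

    def tick(self):
        """Release at most one delayed connection; return its host or None."""
        if not self.pending:
            return None
        host = self.pending.popleft()
        self.recent.append(host)
        return host


throttle = Throttle()
burst = [f"host{i}" for i in range(10)]          # a scanning worm's burst
immediate = [throttle.request(h) for h in burst]
released = [throttle.tick() for _ in range(3)]
print(immediate.count(True), released, len(throttle.pending))
```

Normal traffic, which mostly revisits a few hosts, passes through the working set, while a burst of new destinations is stretched out over many time steps.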

The Social Utility of Privacy

Robert Axelrod and Ryan Gerety


Typical framing:


Privacy values should remain as is (e.g., Lessig).


Individual rights vs. state (i.e., civil liberties vs. community safety / crime).


A community may have its own interest in defending individual privacy
(and not), independent of the civil liberties argument:


To promote innovation in changing environments.


To cope with distortions (e.g., overconfidence of middle managers).


To compensate for overgeneralized norms.


Not necessarily advocating more privacy:


From a societal/informational point of view how should appropriate bounds
on privacy be determined?


Current status:


Exploratory modeling based on simple games.


Next Steps: Negative Representations


Distributed negative representations


Leaking partial information


Relational algebra operators on the negative database:


Select, join, etc.


Instance difficulty:


Hiding given satisfying assignments in a SAT formula


Approximate representations


Other representations?


More realistic implementations


Negative data mining:


Is it easier/harder to find certain instances in NDB?


Imprecise representations:


Partial matching and queries


Learning algorithms


People

Stephanie Forrest

Elena Ackley

Fernando Esponda

Paul Helman

Publications


F. Esponda, S. Forrest, and P. Helman. "Negative representations of information."
International Journal of Information Security (submitted March 2005).


F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases."
Journal of Unconventional Computing (in press).


F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative
detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1, pp. 357-373 (2004).


J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the
spread of computer viruses." Science 304:527-529 (2004).


H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing."
2005 International Conference on Programming Languages and Compilers (PLC'05) (in
press).


F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third
International Conference on Artificial Immune Systems (ICARIS), Best paper award (2004).


SUPPLEMENTARY MATERIAL

Probabilities


Generating Hard-to-Reverse Negative Databases


The randomized algorithm can be used to create a negative database.


Insert/Delete operations turn known hard formulas into negative databases.


The Morph operator may be used to search for hard instances.

H. Jia, C. Moore, and B. Selman. "From spin glasses to hard satisfiable
formulas." SAT 2004.

Effect of the Morph operation


The Morph operation takes as input a negative database NDB and outputs NDB’
that represents the same set U-DB.


The plot shows how the complexity of a database changes after applying the
Morph operator.
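The slides do not spell out how Morph works, so the sketch below is only a stand-in: a hypothetical `split_pattern` step that rewrites one pattern into two while leaving the represented set unchanged. It shows the invariant Morph must preserve (NDB and NDB’ describe the same U-DB), not the operator itself.

```python
from itertools import product


def matches(pattern, s):
    """A pattern matches s if every non-'*' position agrees with s."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))


def represented(ndb, length):
    """Enumerate the strings matched by at least one pattern (small cases only)."""
    universe = ("".join(bits) for bits in product("01", repeat=length))
    return {s for s in universe if any(matches(p, s) for p in ndb)}


def split_pattern(ndb, index):
    """Replace ndb[index] by two patterns fixing its first '*' to 0 and to 1."""
    pattern = ndb[index]
    pos = pattern.find("*")
    if pos == -1:
        return list(ndb)  # nothing to split; return an unchanged copy
    expanded = [pattern[:pos] + bit + pattern[pos + 1:] for bit in "01"]
    return ndb[:index] + expanded + ndb[index + 1:]


ndb = ["01*", "0*1", "1*0"]
ndb_prime = split_pattern(ndb, 0)
print(ndb_prime)
print(represented(ndb, 3) == represented(ndb_prime, 3))
```

Many syntactically different pattern sets represent the same U-DB, which is the freedom a search for hard-to-reverse instances can exploit.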