This work was supported by the TRUST Center (NSF award number CCF-0424422).
Introduction
With recent advances in technology comes an increase in the quantity of information available in the public domain, which raises concerns about individuals’ right to privacy. Our team is interested in understanding the public’s concerns about information privacy in general. To study this issue, we first looked for publicly available data. After exploring several sources, we chose Yahoo! Answers as an initial source of privacy complaint data because it offered a useful, free API and a vast amount of publicly available data, thus avoiding any violations of personal privacy that might otherwise arise. To collect this data, we wrote a Python script, run from the command line, that queries Yahoo! Answers for specified keywords and stores selected attributes of the matching questions in a MySQL database. My focus on this team was adding command-line flags, including additional parameters in the Yahoo! Answers URL, and creating a cron job to run the script automatically.
Methods
The flowchart below illustrates the design of the overall script. The process includes connecting to and querying Yahoo! Answers for a specified keyword and storing the results in a database. My focus is highlighted in purple.
Process Overview
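The storage step of the process can be sketched roughly as follows. This is a minimal illustration, not the team's script: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server, and the table name, column names, and the `open_store` and `store_questions` helpers are all hypothetical, since the poster does not list which question attributes were stored.

```python
import sqlite3

# Hypothetical schema: the poster does not say which attributes were kept.
SCHEMA = """
CREATE TABLE IF NOT EXISTS questions (
    question_id TEXT PRIMARY KEY,
    keyword     TEXT NOT NULL,
    subject     TEXT,
    content     TEXT
)
"""


def open_store(path=":memory:"):
    """Open the database (SQLite here, standing in for MySQL) and create the table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_questions(conn, keyword, questions):
    """Insert question dicts, skipping ids already stored; return the number of rows added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO questions (question_id, keyword, subject, content) "
        "VALUES (?, ?, ?, ?)",
        [(q["id"], keyword, q.get("subject"), q.get("content")) for q in questions],
    )
    conn.commit()
    return conn.total_changes - before
```

Keying the table on the question id means that repeated runs of the script do not store the same question twice, which matters once the script is run on a schedule.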
Script Refinement
Refinements of the script, which increased flexibility, autonomy and the quantity of data collected:
• Command line flags
• URL Parameters: start, sort
• Cronjob
• While loop (illustrated below)
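The first two refinements might look something like the sketch below, which reads flags with argparse and turns them into URL parameters. Only the `start` and `sort` parameter names come from the poster. The flag names, the `query` parameter, the default sort value, and the endpoint address are assumptions made for illustration; the address is a placeholder, not the real Yahoo! Answers API URL.

```python
import argparse
from urllib.parse import urlencode

# Placeholder endpoint for illustration only, not the real Yahoo! Answers address.
BASE_URL = "https://api.example.invalid/questionSearch"


def parse_args(argv=None):
    """Read the search settings from command-line flags (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Collect questions matching a keyword.")
    parser.add_argument("-k", "--keyword", required=True, help="search term")
    parser.add_argument("--start", type=int, default=0, help="offset of the first result")
    parser.add_argument("--sort", default="relevance", help="result ordering (assumed value)")
    return parser.parse_args(argv)


def build_url(keyword, start=0, sort="relevance"):
    """Build the query URL; 'query' is an assumed name, 'start' and 'sort' are from the poster."""
    params = {"query": keyword, "start": start, "sort": sort}
    return BASE_URL + "?" + urlencode(params)
```

For example, `args = parse_args(["--keyword", "identity theft", "--start", "50"])` followed by `build_url(args.keyword, args.start, args.sort)` yields a URL whose query string carries the keyword, the offset, and the sort order.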
While Loop Flowchart
The flowchart above illustrates the while loop refinement: Yahoo! Answers is queried and the ‘start’ parameter is incremented until an error message from Yahoo! is received.
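The loop described above can be sketched as follows, under stated assumptions. The service is simulated by a hypothetical `fake_fetch` function so the example runs offline, the `ApiError` exception stands in for the error message returned by Yahoo!, and the page size of 50 is an assumed increment that the poster does not specify.

```python
class ApiError(Exception):
    """Stands in for the error message the service returns (hypothetical)."""


def collect_all(fetch_page, keyword, page_size=50):
    """Query with an increasing 'start' offset until the service reports an error."""
    results = []
    start = 0
    while True:
        try:
            page = fetch_page(keyword, start)
        except ApiError:
            break  # an error from the service ends the collection
        results.extend(page)
        start += page_size  # move on to the next block of results
    return results


def fake_fetch(keyword, start, total=120, page_size=50):
    """Simulated service: serves `total` results in pages, then raises ApiError."""
    if start >= total:
        raise ApiError("start is out of range")
    return [f"{keyword}-{i}" for i in range(start, min(start + page_size, total))]
```

Calling `collect_all(fake_fetch, "privacy")` walks through the offsets 0, 50, and 100 and stops when the request at offset 150 fails. Treating the error as the stop condition means the loop does not need to know in advance how many results a keyword has.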
Results
After the script ran automatically every two hours for three days, over seven thousand questions had been added to the database.
Quantitative Analysis
Visualization Analysis
Conclusions and Next Steps
Both types of analysis reveal interesting facts about the data collected. They demonstrate which keywords are most effective in retrieving large quantities of questions from Yahoo! Answers. Furthermore, the more qualitative approach of the Many Eyes visualization shows not only the most common words appearing in the questions, but also the relationship of the searched word to the other words in the analyzed text.
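A rough idea of the two kinds of word-level summary mentioned here is sketched below. It is a simple stand-in, not the Many Eyes method: it counts how often each word appears and which word immediately follows the searched word. The `tokenize`, `word_counts`, and `following_words` helpers are hypothetical, and any input passed to them in an example would be made-up text rather than collected data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(questions):
    """Count how often each word appears across all questions."""
    counts = Counter()
    for question in questions:
        counts.update(tokenize(question))
    return counts


def following_words(questions, keyword):
    """Count the word that immediately follows each occurrence of the keyword."""
    counts = Counter()
    for question in questions:
        tokens = tokenize(question)
        for current, nxt in zip(tokens, tokens[1:]):
            if current == keyword:
                counts[nxt] += 1
    return counts
```

`word_counts(questions).most_common(10)` gives the most frequent words, and `following_words(questions, "privacy")` gives a crude view of which words tend to come right after the search term.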
The next steps for this research include additional natural language processing and visualizations, like those provided on the Many Eyes web site. Furthermore, this research contributes to the preliminary data collection stage of a larger project being conducted at the School of Information at UC Berkeley. For the project as a whole, the next steps and final goal are to produce a taxonomy of privacy terms.
Acknowledgments
I would like to thank the team with which I worked to produce the command-line tool discussed in this research, consisting of the following individuals: Christopher Castillo, German Gomez, Rafael Negron, and Anand Sonkar. In addition, I would like to thank my graduate student mentors, Nick Doty, MS and Jen King, and my faculty mentor, Professor Deirdre Mulligan. Finally, I would like to thank Dr. Kristen Gates, TRUST (The Team for Research in Ubiquitous Secure Technology), the NSF and UC Berkeley for the opportunity to conduct this research.
Investigating Privacy Complaints
Jennifer Felder¹, Jennifer King², Nick Doty², Prof. Deirdre Mulligan²
¹North Carolina State University, ²University of California Berkeley School of Information