Investigating Privacy Complaints

This work was supported by the TRUST Center (NSF award number CCF-0424422).

Introduction

With recent advances in technology comes an increase in the quantity of information available in the public domain, which raises concerns regarding individuals’ right to privacy. Our team is interested in understanding the public’s concerns about information privacy in general. To study this issue, we sought to identify publicly available data to analyze. After exploring several sources, we chose Yahoo! Answers as an initial source of privacy complaint data because it provided both a useful, free API and a vast amount of publicly available data, thus avoiding any violations of personal privacy that could arise.

To collect this data, we wrote a Python script that acts as a command-line tool: it queries Yahoo! Answers for specified keywords and stores selected attributes of the returned questions in a MySQL database. My focus in this team was on adding command-line flags, including additional parameters in the Yahoo! Answers URL, and creating a cronjob to run the script automatically.


Methods


The flowchart below illustrates the design of the overall script. The process consists of connecting to Yahoo! Answers, querying it for a specified keyword, and storing the results in a database. My focus is highlighted in purple.
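The storage step of this process can be sketched as follows. This is only an illustration under stated assumptions: the poster does not list which question attributes were stored, so the column names here are hypothetical, and Python's built-in sqlite3 module stands in for the MySQL database so that the sketch runs without a server.

```python
import sqlite3

# Hypothetical schema: the poster does not say which attributes were kept.
SCHEMA = """
CREATE TABLE IF NOT EXISTS questions (
    question_id TEXT PRIMARY KEY,
    keyword     TEXT NOT NULL,
    subject     TEXT,
    content     TEXT
)
"""

def store_questions(conn, keyword, questions):
    """Insert question dicts, skipping ids already stored; return rows added."""
    conn.execute(SCHEMA)

    def row_count():
        return conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]

    before = row_count()
    # INSERT OR IGNORE leaves existing rows alone, so repeated runs of the
    # script do not duplicate questions that were already collected.
    conn.executemany(
        "INSERT OR IGNORE INTO questions VALUES (?, ?, ?, ?)",
        [(q["id"], keyword, q.get("subject"), q.get("content")) for q in questions],
    )
    conn.commit()
    return row_count() - before
```

Keying the table on the question id is one simple way to let a scheduled script run many times while counting only newly seen questions.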


Process Overview

[Figure: flowchart of the overall script design; image not preserved in this text version]

Script Refinement


Refinements of the script, which increased flexibility, autonomy, and the quantity of data collected:

Command-line flags
URL parameters: start, sort
Cronjob
While loop (illustrated below)
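The first two refinements might look roughly like the sketch below. The flag names, the default values, and the endpoint are assumptions made for illustration; the poster names only the 'start' and 'sort' URL parameters and does not give the actual Yahoo! Answers URL or the tool's real flags.

```python
import argparse
from urllib.parse import urlencode

# Placeholder endpoint: the real Yahoo! Answers URL is not given in the poster.
BASE_URL = "https://api.example.invalid/questionSearch"

def parse_args(argv=None):
    """Parse hypothetical command-line flags for the collection tool."""
    parser = argparse.ArgumentParser(description="Query a Q&A search API for a keyword.")
    parser.add_argument("keyword", help="search term, e.g. 'privacy'")
    parser.add_argument("--start", type=int, default=0, help="offset of the first result")
    parser.add_argument("--sort", default="relevance", help="result ordering passed to the API")
    return parser.parse_args(argv)

def build_url(keyword, start, sort):
    """Append the query, start, and sort parameters to the base URL."""
    return BASE_URL + "?" + urlencode({"query": keyword, "start": start, "sort": sort})
```

Exposing the URL parameters as flags means a scheduled job can vary the keyword and ordering without any change to the script itself.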


While Loop Flowchart

[Figure: flowchart of the while loop; image not preserved in this text version]

The flowchart above illustrates the while loop refinement: Yahoo! Answers is queried and the ‘start’ parameter is incremented until an error message from Yahoo! is received.
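A minimal sketch of this loop is shown here. It assumes that 'start' advances by one page of results per request and that the error message surfaces as an exception; neither detail is stated in the poster. The fake_fetch function is a hypothetical stand-in for the real API call.

```python
def collect_all(fetch_page, page_size=50):
    """Call fetch_page(start) with growing offsets until it signals an error."""
    start = 0
    collected = []
    while True:
        try:
            page = fetch_page(start)
        except LookupError:  # stand-in for the error message returned by the API
            break
        collected.extend(page)
        start += page_size  # assumed step: one page of results per request
    return collected

def fake_fetch(start, total=120, page_size=50):
    """Hypothetical API stub: serves `total` results, then reports an error."""
    if start >= total:
        raise LookupError("start is past the last result")
    return list(range(start, min(start + page_size, total)))
```

Calling collect_all(fake_fetch) walks offsets 0, 50, and 100, then stops when the request at offset 150 fails.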

Results

After running the script automatically every two hours for three days, over seven thousand questions were added to the database.
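As a rough check on the scale of this collection, the schedule can be worked through in a few lines. The start date below is arbitrary, and the assumption that the job fired exactly on the hour (as a cron schedule such as `0 */2 * * *` would) is ours, not the poster's.

```python
from datetime import datetime, timedelta

def scheduled_runs(first_run, interval_hours, days):
    """Return the run times of a job firing every interval_hours for `days` days."""
    end = first_run + timedelta(days=days)
    runs = []
    current = first_run
    while current < end:
        runs.append(current)
        current += timedelta(hours=interval_hours)
    return runs

# Arbitrary start date, used only to count the runs.
runs = scheduled_runs(datetime(2010, 1, 1), interval_hours=2, days=3)
run_count = len(runs)            # 12 runs per day over 3 days
min_average = 7000 / run_count   # lower bound on new questions per run
```

Under these assumptions there are 36 runs, so "over seven thousand" questions corresponds to an average of more than about 194 new questions per run.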

Quantitative Analysis

[Figure not preserved in this text version]

Visualization Analysis

[Figure not preserved in this text version]

Conclusions and Next Steps

Both types of analysis reveal interesting facts about the data collected. They demonstrate which keywords are most effective in retrieving large quantities of questions from Yahoo! Answers. Furthermore, the more qualitative approach of the Many Eyes visualization shows not only the most common words appearing in the questions, but also the relationship between the searched-for word and the other words in the analyzed text.
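The two kinds of information described here, common words and the words that surround a search term, can be approximated with a short script. This is not how Many Eyes works internally; it is a simple stand-in that uses word counts and next-word counts, with a small illustrative stopword list and made-up sample sentences in the usage note.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "is", "my", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(texts, n=10):
    """Return the n most common non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in tokenize(text) if w not in STOPWORDS)
    return counts.most_common(n)

def following_words(texts, target):
    """Count the words that directly follow `target` in the texts."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(nxt for word, nxt in zip(words, words[1:]) if word == target)
    return counts
```

For example, given the two invented sentences "Is my privacy policy safe?" and "privacy settings and privacy policy", following_words(texts, "privacy") counts "policy" twice and "settings" once.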


The next steps for this research include additional natural language processing and visualizations, like those provided on the Many Eyes web site. Furthermore, this research contributes to the preliminary data collection stage of a larger project being conducted at the School of Information at UC Berkeley. For the project as a whole, the next steps and final goal are to produce a taxonomy of privacy terms.


Acknowledgments


I would like to thank the team with which I worked to produce the command-line tool discussed in this research, consisting of the following individuals: Christopher Castillo, German Gomez, Rafael Negron, and Anand Sonkar. In addition, I would like to thank my graduate student mentors, Nick Doty, MS and Jen King, and my faculty mentor, Professor Deirdre Mulligan. Finally, I would like to thank Dr. Kristen Gates, TRUST (The Team for Research in Ubiquitous Secure Technology), the NSF and UC Berkeley for the opportunity to conduct this research.



Jennifer Felder¹, Jennifer King², Nick Doty², Prof. Deirdre Mulligan²

¹North Carolina State University, ²University of California Berkeley School of Information