
Nov 20, 2013 (3 years and 8 months ago)


1

Research Using Data Mined from the Internet -- Regulatory Considerations


Laura Odwazny

Senior Attorney

Office of the General Counsel

U.S. Department of Health and Human Services


DOE CIRB meeting

June 14, 2012

2

Disclaimer


This presentation does not constitute legal advice. The views
expressed are the presenter’s own, and do not bind the U.S.
Department of Health and Human Services or its
components.

3

Do Note:


OHRP has no guidance on Internet research specifically


Many boards have separate guidelines and best practices for
Internet research

4


Internet Research



Internet research = research which utilizes the Internet to collect
information through an online tool, such as an online survey; studies
about how people use the Internet, e.g., through collecting data
and/or examining activities in or on any online environments;
and/or, uses of online datasets, databases, databanks, repositories.



Internet as a TOOL FOR research or…

Internet as a MEDIUM/LOCALE OF research



TOOL=search engines, databases, catalogs, etc…


MEDIUM/LOCALE=chat rooms, newsgroups, home pages,
multi-player gaming sites, blogs, Skype, tweeting, online
course software, etc.

5

Forms of Research: Exploring Where
Human Subjects Fit


Consider Methodologies, Venues, Types of Data Generated
through:


Quantitative Research



Data Aggregation, Scraping, Transaction Log Analysis,
Network Analysis, Statistical Analysis, etc.


Qualitative Research



Ethnography, Focus Groups, Observation, Surveys,
Content/Discourse Analysis, etc

6

Forms of Internet Research Venues


Email, IM, tweets


Listserves, chat rooms


Search engines, other archives


Social network sites, media sharing sites


Blogs and home pages


Virtual worlds


Online marketplaces, online gaming


Databanks, repositories


Venues other than “place-based”, e.g., mobile data
collection

7

E-Data Raises New Ethical Challenges


Trackability


“Dataveillance” = data monitoring + recording


“Greased”



“When information is computerized, it is greased to slide easily and
quickly to many ports of call. But legitimate concerns about privacy
arise when this speed and convenience lead to the improper exposure of
information. Greased information is information that moves like
lightning and is hard to hold onto.”


Malleability


Can be utilized in varied ways for multiple purposes


Invisibility Factor


Computer operations usually invisible; can allow for abuse


James Moor, 1985

8

9

Online Support Groups

10

11

Twitter


Blurs the boundaries between public/private


Tweeter A (private) followed by Tweeter B (public)

Tweeter B retweets A = Tweet A is now visible to Tweeter B’s (public) feed


Track-backability is increased; consider sensitivity, reputation, risk/benefit


Archived Tweet Data fields:


country code:


id:


klout score


link:


location


coord type:


location coords:


location displayname:


location type:


posted time:


real name:


rule match:


tweet url:


user twitter page:


username:

12

Regulatory considerations

13

Big regulatory issues…


What is “private”?


What is “identifiable”?


How to protect subjects’ privacy and confidentiality
interests?


Minimizing risk when using sensitive online data


Current sensitivity vs. future sensitivity


Informational risks


Data security

14

OHRP’s Analytic Framework for the
Common Rule: Always Start With…


Is the activity subject to regulation?


Conducted or supported by a Common Rule agency?


Covered under an applicable FWA?


Is it research?


Does it involve human subjects?


Is it exempt?


Keep in mind regulatory flexibilities:


Can it be expedited?


Waiver of informed consent?


Waiver of documentation of consent?
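
The order of these threshold questions can be sketched as a short decision function. This is only an illustrative Python sketch, not a legal determination: `Activity` and `common_rule_path` are hypothetical names, each flag stands in for a judgment that a person has to make, and treating either scope sub-question as sufficient is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Each flag stands in for a determination made by a person, not computed.
    agency_conducted_or_supported: bool
    covered_by_fwa: bool
    is_research: bool
    involves_human_subjects: bool
    is_exempt: bool


def common_rule_path(a: Activity) -> str:
    """Walk the threshold questions in the order the slide lists them."""
    # Assumption: either sub-question is enough to bring the activity in scope.
    if not (a.agency_conducted_or_supported or a.covered_by_fwa):
        return "not subject to the regulation"
    if not a.is_research:
        return "not research"
    if not a.involves_human_subjects:
        return "no human subjects"
    if a.is_exempt:
        return "exempt"
    # Only at this point do the flexibilities come up: expedited review, waivers.
    return "IRB review; consider expedited review and consent waivers"


# Hypothetical example: agency-supported research judged not to involve human subjects.
scraping_study = Activity(True, False, True, False, False)
print(common_rule_path(scraping_study))
```

The early returns mirror the "always start with" ordering: a later question is only reached if every earlier one has been answered in a way that keeps the activity under the rule.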

15

Human subject

.102(f): “a living individual about whom an investigator conducting
research obtains (1) data through intervention or interaction with
the individual, or (2) identifiable private information… Private
information includes information about behavior that occurs in a
context in which an individual can reasonably expect that no
observation or recording is taking place, and information which
has been provided for specific purposes by an individual and
which the individual can reasonably expect will not be made
public (for example, a medical record). Private information
must be individually identifiable (i.e., the identity of the
subject is or may readily be ascertained by the investigator or
associated with the information) in order for obtaining the
information to constitute research involving human subjects.”

(emphasis added)

16

Privacy in the Internet age

Private


How to interpret “reasonably expect that no observation
or recording is taking place” or “reasonably expect will
not be made public”


IMs, tweets, email, FB profile, chatroom discussions,
listserves


Must information be considered either “public” or “private”?


Members-only forum, community standards


Shifting norms about what information is “private”


What is a “reasonable” expectation of privacy in
grid/Internet/e-data?


Expectations of privacy vs. actual privacy

17

How should the IRB assess privacy?


What expectations of privacy are “reasonable”?


Get information about the environment


Get information about the users


Review Terms of Service


Data security consideration

18

Human subjects (2)

Identifiable


Individually identifiable = subject’s identity readily
ascertainable by the investigator or associated with the
information


Structure of social network, search terms, purchase habits,
movie ratings on Netflix may uniquely identify individual


Zip code + sex + DOB enough for Latanya Sweeney to identify


Given demonstrated ability to reidentify individuals from
anonymized or aggregated data, is this a meaningful
decision point?
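
Why a handful of ordinary fields can single someone out is easier to see with a small count. The Python sketch below uses made-up records and a hypothetical helper, `unique_fraction`, to measure how many rows become unique as zip code, sex, and date of birth are combined; it illustrates the general idea only and does not reproduce any published analysis.

```python
from collections import Counter

# Toy, made-up records: (zip code, sex, date of birth). No names are stored.
records = [
    ("02138", "F", "1970-03-14"),
    ("02138", "F", "1971-08-22"),
    ("02138", "M", "1982-11-02"),
    ("02139", "F", "1990-07-30"),
    ("02139", "F", "1990-07-30"),
    ("02139", "M", "1965-01-09"),
]


def unique_fraction(rows, columns):
    """Fraction of rows whose values in `columns` occur exactly once."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


zip_only = unique_fraction(records, [0])
zip_sex = unique_fraction(records, [0, 1])
all_three = unique_fraction(records, [0, 1, 2])
print(zip_only, zip_sex, all_three)
```

On this toy data no row is unique by zip code alone, two of six are unique by zip code plus sex, and four of six are unique once date of birth is added, even though no name appears anywhere in the records.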

19

How should the IRB assess identifiability?


When will the subject’s identity be “readily” ascertainable
by the investigator or associated with the information?


Consider the investigator, e.g. Professor Latanya
Sweeney vs. Professor Elizabeth Buchanan


Consider the potential identifiers


Consider likelihood of reidentification with triangulation
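
Triangulation can be pictured as a join between a "de-identified" dataset and some outside source that shares the same fields. The sketch below is a minimal Python illustration under stated assumptions: all rows are invented, and `study_rows`, `public_roster`, and `link` are hypothetical names rather than any real dataset or tool.

```python
# Made-up "de-identified" study rows: quasi-identifiers plus a sensitive value.
study_rows = [
    {"zip": "02138", "sex": "F", "dob": "1970-03-14", "topic": "support group A"},
    {"zip": "02139", "sex": "M", "dob": "1965-01-09", "topic": "support group B"},
    {"zip": "02139", "sex": "F", "dob": "1990-07-30", "topic": "support group C"},
]

# Made-up public roster that carries names alongside the same three fields.
public_roster = [
    {"name": "Person 1", "zip": "02138", "sex": "F", "dob": "1970-03-14"},
    {"name": "Person 2", "zip": "02139", "sex": "F", "dob": "1990-07-30"},
    {"name": "Person 3", "zip": "02139", "sex": "F", "dob": "1990-07-30"},
]

KEY = ("zip", "sex", "dob")


def link(study, roster, key=KEY):
    """Return {study row index: [candidate names]} by matching on `key`."""
    by_key = {}
    for person in roster:
        by_key.setdefault(tuple(person[k] for k in key), []).append(person["name"])
    return {
        i: by_key.get(tuple(row[k] for k in key), [])
        for i, row in enumerate(study)
    }


matches = link(study_rows, public_roster)
# A row counts as re-identified here only when exactly one candidate matches.
reidentified = [i for i, names in matches.items() if len(names) == 1]
print(matches, reidentified)
```

Whether a match like this makes identity "readily" ascertainable depends on who the investigator is and which outside sources they can reach, which is why the investigator and the potential identifiers are considered together.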

20

Exemption .101(b)(4)


Research involving the collection or study of existing
data, documents, records, pathological specimens, or
diagnostic specimens, if these sources are publicly
available or if the information is recorded by the
investigator in such a manner that subjects cannot be
identified, directly or through identifiers linked to the
subjects.

21

Exemption .101(b)(4) applied


When is information “recorded in an identifiable manner”?


Is an email address an identifier?


Do tweets contain identifiers?


Does the inclusion of IP address make information
identifiable?


When are data, documents, or records publicly available on
the internet?


Does “publicly available” include large datasets
purchased/obtained from Google or Facebook?


What if data are semi-restricted -- available only to
‘friends’, listserve members?

22

Key Considerations for IRB Review


What type of venue?


Expectations of privacy?


Consent procedures?


Sensitivity of data?


Harm/Risk?


Age verification?


Authentication of participants?


Identification of participants?


Use of encryption?


Storage/transmission of data?

23

Other potential issues


international research


“PI is proposing to collect data from publicly accessible
social media sites, some of which are hosted by servers
outside of the US. The PI will collect all data from his
computer in the US. Is the activity international research?”
(from IRB Forum)


Consider EU data protection directive, Canadian laws,
etc. if applicable!

24

Stay tuned

25

ANPRM


Implications for Internet research


Base concept of identifiability under Common Rule
on HIPAA Privacy Rule standards of identifiability?


To protect from informational risks (inappropriate
use/disclosure of information), mandatory data
security measures “modeled on” HIPAA?


Apply Common Rule to all institutions receiving
support from CR agency?


No continuing review for most minimal risk
research?

26

ANPRM


Proposals for “excused”
research


Additional requirements for “excused” (formerly exempt)
research?


Registration


Consent, oral or written, depending, with waiver
contemplated


Oral w/o documentation for educational tests, surveys,
focus groups, interviews


Data security standards


Retrospective auditing of portion of “excused” submissions

27

Proposal: Revised scope of
existing exemption 4


Expansion of .101(b)(4) by removing “existing” and
de-identified recording?


Keep collected for purposes other than the research

28

ANPRM


consent and exempt research


Additional consent requirements for “excused” (formerly
exempt) research?


Oral or written consent, depending, with waiver contemplated


Oral w/o documentation for educational tests, surveys, focus groups,
interviews (modifying exemption 46.101(b)(2))


Secondary use of data (modifying exemption 46.101(b)(4))


originally collected for research purposes, consent
required whether or not the researcher obtains identifiers


originally collected for non-research purposes, no change
(no consent required unless identifiers are obtained)

29

Your Experiences, Comments, Questions