Safety Data Mining:


Background and Current Issues

Ramin Arani, PhD



Safety Data Mining

Global Biometric Science

Bristol-Myers Squibb Company


SAMSI: July, 2006

Outline


Rationale for Pharmacovigilance

AERS Database

Database issues



Methodologies



BCNN (WHO)


MGPS (FDA)



Summary


Challenges and Opportunities



Pharmacovigilance - Rationale

Information obtained prior to first marketing is inadequate to cover all
aspects of drug safety:



tests in animals are insufficiently predictive of human safety,



in clinical trials patients are selected and limited in number,



conditions of use in trials differ from those in clinical practice,



duration of trials is limited



information about rare but serious adverse reactions, chronic
toxicity, use in special groups or drug interactions is often not
available.


Pre-Approval Data

- Controlled
- Limited no. of patients
- Safety data not mature

Post-Approval Data

- Real life; uncontrolled
- Off-label use
- Generic
- Solicited safety data
- Unsolicited safety data

Population

Subjects for approval

Pharmacovigilance - Rationale

Spontaneous AE Reports

Safety information from clinical trials is incomplete

° Few patients: rare events likely to be missed

° Not necessarily ‘real world’

Need info from post-marketing surveillance & spontaneous reports

Pharmacovigilance is carried out by regulatory agencies & manufacturers

Long history of research on the issue

° Finney (MIMed1974, SM1982), Royall (Bcs1971)

° Inman (BMedBull1970), Napke (CanPhJ1970)



Issues


Incomplete reports of events, not necessarily reactions

How to compute effect magnitude

Many events reported, many drugs reported

Bias & noise in system

Difficult to estimate incidence because no. of patients at risk and duration of exposure are seldom reliable

Appropriate use of computerized methods, e.g., supplementing standard pharmacovigilance to identify possible signals sooner (early warning signal)


Safety Signal:

Reported information on a possible causal relationship between
an adverse event and a drug.

Pharmacovigilance - Definition

Pharmacovigilance

Set of methods that aim to identify and quantitatively assess the risks related to the use of drugs in the entire population, or in specific population subgroups

Adverse Drug Reaction

A response to a drug which is harmful and unintended, and which occurs at doses normally used.


AERS Database


Database Origin 1969


SRS until 11/1/97; changed to AERS


3.0 million reports in database


All SRS data migrated into AERS


Contains Drug and "Therapeutic" Biologic Reports


exception = vaccines (VAERS)

Source of AERS Reports


Health Professionals, Consumers / Patients


Voluntary: direct to FDA and/or to manufacturer


Manufacturers: Regulations for Postmarketing Reporting

AERS Limitations



Different populations, co-morbidities, co-prescribing, off-label use, rare events


Report volume for a drug is affected by volume of use, publicity, type and severity of the event, and other factors; therefore the reporting rate is not a true measure of the rate or the risk


An observed event may be due to the indication for therapy rather than the therapy itself; therefore observed associations should be viewed as signals, and causal conclusions drawn with caution



Examples

Claritin and arrhythmias (channeling and need for detailed data not in database)

Increased number of reports due to preexisting condition. Selection of high-risk patients for the drug deemed safest for them.

Prozac and suicide (confounding by indication)

Large increase in reports following publicity and stimulated reporting


The Pharmacovigilance Process

Detect Signals

Traditional Methods

Data Mining

Generate Hypotheses

Refute/Verify

Type A (Mechanism-based)

Type B (Idiosyncratic)

Insight from Outliers

Estimate Incidence

Public Health Impact, Benefit/Risk

Act

Inform

Change Label

Restrict use/withdraw

Methodologies

Finding “Interestingly Large” Cell Counts in a Massive Frequency Table

Rows and Columns May Have Thousands of Categories

Most Cells Are Empty, even though $N_{++}$ Is Very Large

Only 386K out of 1331K Cells Have $N_{ij} > 0$

174 Drug-Event Combinations Have $N_{ij} > 1000$

No. Reports   AE 1         ...          AE n         Total
Drug 1        $N_{11}$     ...          $N_{1n}$     $N_{1+}$
:             :            $N_{ij}$     :            :
Drug m        $N_{m1}$     ...          $N_{mn}$     $N_{m+}$
Total         $N_{+1}$     ...          $N_{+n}$     $N_{++}$
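A minimal sketch of how such a table might be held in memory, assuming each report has already been reduced to one (drug, event) pair; real reports can mention several drugs and events, and the drug and event names below are made up. Only non-empty cells are stored, which suits a table where most cells are empty.

```python
from collections import Counter

def build_counts(reports):
    """Count drug-event cells and margins from (drug, event) pairs."""
    n_ij = Counter(reports)                        # cell counts N_ij (non-empty cells only)
    n_i = Counter(drug for drug, _ in reports)     # row totals N_i+
    n_j = Counter(event for _, event in reports)   # column totals N_+j
    return n_ij, n_i, n_j, len(reports)            # last value is N_++

# Hypothetical toy data, for illustration only.
reports = [
    ("drugA", "rash"), ("drugA", "rash"), ("drugA", "nausea"),
    ("drugB", "nausea"), ("drugB", "headache"), ("drugC", "rash"),
]
n_ij, n_i, n_j, n_total = build_counts(reports)
possible_cells = len(n_i) * len(n_j)
print(f"{len(n_ij)} of {possible_cells} cells are non-empty; N_++ = {n_total}")
```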


Method - Basics

Endpoint: No. of AEs

Most use variations of 2-way table statistics

No. Reports    Target AE   Other AE   Total
Target Drug    a           b          a+b
Other Drug     c           d          c+d
Total          a+c         b+d        n

Some possibilities

Reporting Ratio: $E(a) = (a+b)(a+c)/n$

Proportional Reporting Ratio: $E(a) = (a+b)\,c/(c+d)$

Odds Ratio: $E(a) = b\,c/d$

OR > PRR > RR when $a > E(a)$

Basic idea: Flag when $R = a/E(a)$ is “large”
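The three expected counts above can be compared side by side. The sketch below computes R = a/E(a) for each definition; the function name and the 2x2 counts are hypothetical, chosen only to illustrate the ordering OR > PRR > RR.

```python
def disproportionality(a, b, c, d):
    """Return R = a / E(a) under the RR, PRR and OR definitions of E(a)."""
    n = a + b + c + d
    expected = {
        "RR": (a + b) * (a + c) / n,    # reporting ratio
        "PRR": (a + b) * c / (c + d),   # proportional reporting ratio
        "OR": b * c / d,                # odds ratio
    }
    return {name: a / e for name, e in expected.items()}

# Hypothetical counts: 20 reports of the target AE with the target drug.
ratios = disproportionality(a=20, b=80, c=100, d=9800)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

The function divides by `d` and by `c`, so a table with an empty cell in those positions would need separate handling.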

Bayesian Approaches


Two current approaches: DuMouchel & WHO

Both use ratio $n_{ij} / E_{ij}$ where

$n_{ij}$ = no. of reports mentioning both drug i & event j

$E_{ij}$ = expected no. of reports of drug i & event j

Both report features of posterior dist’n of ‘information criterion’

$IC_{ij} = \log_2 (n_{ij} / E_{ij}) = \log_2 PRR_{ij}$

$E_{ij}$ usually computed assuming drug i & event j are mentioned independently

Ratio > 1 (IC > 0): combination mentioned more often than expected if independent
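As a point of reference before the Bayesian versions, the raw (non-Bayesian) information criterion under independence can be sketched as follows. The function name and counts are hypothetical. Note that the raw value is undefined when the observed count is zero.

```python
import math

def information_component(n_ij, n_i, n_j, n_total):
    """Raw IC = log2(observed / expected), with expected count under independence."""
    expected = n_i * n_j / n_total
    return math.log2(n_ij / expected)

# Hypothetical counts: the pair is reported far more often than independence predicts.
ic_signal = information_component(n_ij=20, n_i=100, n_j=120, n_total=10000)
# Here the observed count equals the expected count (12), so IC is 0.
ic_null = information_component(n_ij=12, n_i=100, n_j=1200, n_total=10000)
print(round(ic_signal, 2), ic_null)
```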

WHO (Bate et al, EurJClPhrm1998)



‘Bayesian Confidence Propagation Neural Network’ (BCNN) Model:

$n_{ij}$ = no. reports mentioning both drug i & event j

$n_{i+}$ = no. reports mentioning drug i

$n_{+j}$ = no. reports mentioning event j

Usual Bayesian inferential setup:

Binomial likelihoods for $n_{ij}$, $n_{i+}$, $n_{+j}$

Beta priors for the rate parameters ($r_{ij}$, $p_i$, $q_j$)

WHO, cont’d


Uses ‘delta method’ to approximate variance of $Q_{ij} = \ln (r_{ij} / (p_i q_j)) = (\ln 2)\, IC_{ij}$

However, can calculate exact mean and variance of $Q_{ij}$

WHO measure of importance = $E(IC_{ij}) - 2\,SD(IC_{ij})$

Test of signal detection predictive value by analysis of signals 1993-2000: Drug Safety 2000; 23:533-542

84% Negative Pred Val, 44% Positive Pred Val

Good filtering strategy for clinical assessment
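The measure E(IC) - 2 SD(IC) can be illustrated with a small Monte Carlo sketch. This is not the WHO implementation: it replaces the delta method with simulation, assumes uniform Beta(1, 1) priors, and treats the three posteriors as independent, all of which are simplifying assumptions made here for illustration. The function name and counts are hypothetical.

```python
import math
import random
import statistics

def ic_lower_bound(n_ij, n_i, n_j, n_total, draws=20000, seed=0):
    """Monte Carlo estimate of E(IC) - 2 SD(IC) under assumed Beta(1, 1) priors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Beta posteriors for the joint rate r_ij and the marginal rates p_i, q_j.
        r = rng.betavariate(1 + n_ij, 1 + n_total - n_ij)
        p = rng.betavariate(1 + n_i, 1 + n_total - n_i)
        q = rng.betavariate(1 + n_j, 1 + n_total - n_j)
        samples.append(math.log2(r / (p * q)))
    return statistics.mean(samples) - 2 * statistics.stdev(samples)

# Hypothetical counts: the same margins with 20 versus 1 joint reports.
strong = ic_lower_bound(20, 100, 120, 10000)
weak = ic_lower_bound(1, 100, 120, 10000)
print(f"strong: {strong:.2f}, weak: {weak:.2f}")
```

With a single joint report the bound falls below zero even though the raw ratio exceeds 1, which is the filtering behaviour the measure is meant to provide.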

WHO, cont’d


WHO (Orre et al 2000)

$$IC(A, D) = \log_2 \frac{P(A, D)}{P(A)\,P(D)}$$

$$I(A, D) = \sum_{i} \sum_{k} P(A_i, d_k) \log_2 \frac{P(A_i, d_k)}{P(A_i)\,P(d_k)}$$


Let $A$ denote adverse events and $D$ denote the drug.

Mutual information $I(A, D)$ is a measure of association
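As a rough illustration of mutual information as an association measure, the sketch below uses the standard definition, summing P(a, d) log2[P(a, d) / (P(a) P(d))] over all cells of a joint probability table. The function name and the two example tables are hypothetical.

```python
import math

def mutual_information(joint):
    """joint[i][k] = P(A = a_i, D = d_k); returns I(A, D) in bits."""
    p_a = [sum(row) for row in joint]            # marginal distribution of A
    p_d = [sum(col) for col in zip(*joint)]      # marginal distribution of D
    total = 0.0
    for i, row in enumerate(joint):
        for k, p in enumerate(row):
            if p > 0:                            # empty cells contribute nothing
                total += p * math.log2(p / (p_a[i] * p_d[k]))
    return total

independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
print(independent, dependent)
```

Independent variables give 0 bits, and two perfectly linked binary variables give 1 bit.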


DuMouchel (AmStat1999)


$E_{ij}$ known, computed using stratification of database:

$n_{i+}^{(k)}$ = no. reports of drug i in stratum k

$n_{+j}^{(k)}$ = no. reports of event j in stratum k

$N^{(k)}$ = total reports in stratum k

$$E_{ij} = \sum_k n_{i+}^{(k)}\, n_{+j}^{(k)} / N^{(k)}$$

($E(n_{ij})$ under independence)

$n_{ij} \sim \mathrm{Poisson}(\mu_{ij})$; interested in $\lambda_{ij} = \mu_{ij} / E_{ij}$

Prior dist’n for $\lambda$ = mixture of gamma dist’ns:

$$f(\lambda; a_1, b_1, a_2, b_2, \pi) = \pi\, g(\lambda; a_1, b_1) + (1 - \pi)\, g(\lambda; a_2, b_2)$$

where

$$g(\lambda; a, b) = b\,(b\lambda)^{a-1} e^{-b\lambda} / \Gamma(a)$$
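The stratified expected count defined on this slide can be sketched in a few lines. The function name and the two strata are hypothetical; the slide does not say which stratification variables are used.

```python
def stratified_expected(strata):
    """Sum n_i+(k) * n_+j(k) / N(k) over strata given as (n_i_k, n_j_k, N_k) tuples."""
    return sum(n_i_k * n_j_k / N_k for n_i_k, n_j_k, N_k in strata if N_k > 0)

# Hypothetical strata whose margins add up to 100 drug reports and 120 event reports.
e_stratified = stratified_expected([(30, 40, 2000), (70, 80, 8000)])
# Pooling the same reports into a single stratum gives a different expected count.
e_pooled = stratified_expected([(100, 120, 10000)])
print(e_stratified, e_pooled)
```

The two values differ (about 1.3 versus 1.2 here) because drug use and event reporting vary across strata, which is the reason for stratifying.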

DuMouchel, cont’d


Estimate $\pi, a_1, b_1, a_2, b_2$ using Empirical Bayes: marginal dist’n of $n_{ij}$ is mixture of negative binomials

Posterior density of $\lambda_{ij}$ also is mixture of gammas

$\log_2 \lambda_{ij} = IC_{ij}$

Easy to get 5% lower bound (i.e. $E(IC_{ij}) - 2\,SD(IC_{ij})$)
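A sketch of the posterior update for one cell, given the hyperparameters. It assumes the hyperparameters have already been estimated (the empirical Bayes fit itself is not shown), uses made-up values for them, and finds the 5% lower bound of IC by simulation rather than by the mean minus 2 SD shortcut. All function names are hypothetical.

```python
import math
import random

def log_negbin(n, a, b, e):
    """Log marginal probability of n when n ~ Poisson(lambda * e), lambda ~ Gamma(a, rate b)."""
    return (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
            + a * math.log(b / (b + e)) + n * math.log(e / (b + e)))

def posterior_mixture(n, e, a1, b1, a2, b2, weight):
    """Posterior weight of component 1 and the (shape, rate) of both updated gammas."""
    w1 = weight * math.exp(log_negbin(n, a1, b1, e))
    w2 = (1 - weight) * math.exp(log_negbin(n, a2, b2, e))
    q = w1 / (w1 + w2)
    return q, (a1 + n, b1 + e), (a2 + n, b2 + e)

def ic_lower_5pct(n, e, hyper, draws=20000, seed=0):
    """Simulated 5th percentile of IC = log2(lambda) under the posterior mixture."""
    rng = random.Random(seed)
    q, comp1, comp2 = posterior_mixture(n, e, *hyper)
    ics = []
    for _ in range(draws):
        shape, rate = comp1 if rng.random() < q else comp2
        # random.gammavariate takes a scale parameter, i.e. 1 / rate.
        ics.append(math.log2(rng.gammavariate(shape, 1.0 / rate)))
    ics.sort()
    return ics[int(0.05 * draws)]

# Hypothetical hyperparameters (a1, b1, a2, b2, mixing weight), not fitted values.
HYPER = (0.2, 0.1, 2.0, 4.0, 0.1)
print(ic_lower_5pct(20, 1.2, HYPER), ic_lower_5pct(1, 1.2, HYPER))
```

With 20 observed reports against 1.2 expected, the lower bound stays well above zero; with a single report it is shrunk below zero.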

The control group and the issue of ‘compared to what?’

Signal strategies compare a drug

- with itself from prior time periods
- with other drugs and events
- with external data sources of relative drug usage and exposure

Total frequency count for a drug is used as a relative surrogate for an external denominator of exposure: easy to use, quick and efficient

Analogy to case-control design where cases are a specific AE term, controls are other terms, and outcomes are presence or absence of exposure to a specific drug.

Other useful metrics and methods


Chi-square statistics

P-value type metric: overly influenced by sample size

Modeling association directly through a multivariate Poisson dist’n

Incorporation of a prior distribution on some drugs and/or events for which previous information is available, e.g. liver events or pre-market signals
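The sample-size sensitivity of the chi-square statistic can be seen with a small sketch. It uses the standard Pearson chi-square for a 2x2 table without continuity correction; the function name and counts are hypothetical. Both tables have the same proportional reporting ratio, but the second has ten times as many reports in every cell.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def prr(a, b, c, d):
    """Proportional reporting ratio a / E(a) with E(a) = (a + b) * c / (c + d)."""
    return (a / (a + b)) / (c / (c + d))

small = chi_square_2x2(2, 8, 10, 980)
large = chi_square_2x2(20, 80, 100, 9800)
print(f"PRR: {prr(2, 8, 10, 980):.1f} vs {prr(20, 80, 100, 9800):.1f}")
print(f"chi-square: {small:.1f} vs {large:.1f}")
```

The ratio is unchanged while the chi-square statistic grows tenfold, so a ranking by chi-square or p-value favours heavily reported drugs.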

Interpreting the Signal Through the Role of Visual Graphics


Four examples of spatial maps that reduce the scores to
patterns and user friendly graphs and help to interpret
many signals collectively


Example 1


A spatial map showing the “signal scores” for the most
frequently reported events (rows) and drugs (columns) in
the database by the intensity of the empirical Bayes signal
score (blue color is a stronger signal than purple)

Example 2


Spatial map showing ‘fingerprints’ of signal scores allowing one to visually compare the complexity of patterns for different drugs and events and to identify positive or negative co-occurrences

Example 3


Cumulative scores and numbers of reports according to the
year when the signal was first detected for selected drugs

Example 4


Differences in paired male-female signal scores for a specific adverse event across drugs with events reported (red means females greater, green means males greater)


Summary


1. There is no gold standard method for signal detection.

2. Signals become more stable over time; however, there is a limited time window of opportunity for signal detection.

3. Use time-slice evolution of the signal.
   - Fluctuation might reveal external risk factors.
   - Robustness can be assessed.

4. Consider other endpoints such as time to onset, duration of event, etc.

5. For spontaneous case reports, the means to improve content is to standardize and improve intake.

6. Data mining will likely generate many false positives and affirmations of what was previously known.

7. Causality assessments should largely be reserved for refining important signals.
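The time-slice idea in point 3 can be sketched by recomputing a score on cumulative counts year by year. The sketch uses the proportional reporting ratio as the score, which is a choice made here for simplicity; the function name and yearly counts are hypothetical.

```python
def cumulative_prr(yearly_counts):
    """yearly_counts: list of (year, a, b, c, d) for each year alone.

    Returns (year, PRR on counts accumulated up to that year) pairs.
    """
    totals = [0, 0, 0, 0]
    trajectory = []
    for year, *cells in yearly_counts:
        totals = [t + x for t, x in zip(totals, cells)]
        a, b, c, d = totals
        if a + b > 0 and c > 0:
            score = (a / (a + b)) / (c / (c + d))
        else:
            score = None   # not computable yet
        trajectory.append((year, score))
    return trajectory

# Hypothetical yearly 2x2 counts for one drug-event pair.
trajectory = cumulative_prr([
    (2001, 1, 9, 20, 1970),
    (2002, 4, 26, 25, 2445),
    (2003, 10, 50, 30, 2960),
])
for year, score in trajectory:
    print(year, round(score, 2))
```

Plotting such a trajectory shows when a score first crosses a threshold and how much it fluctuates afterwards.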



Challenges in the future


More real-time data analysis


More interactivity (visual data mining, e.g. ggobi)


Linkage with other databases to control the bias inherent in the database


Quality control strategies (e.g. identifying duplicates)


Methods to reduce false positives and false negatives?