Supplementary Material 2 – Biclustering
Background. The canonical data model for clustering assumes a set of observations, each of which is described by a set of features or variables. The clustering data is typically arranged in a matrix where rows represent observations and columns represent features or variables. In classical clustering the goal is to group observations that are similar into clusters, where similarity is defined or computed over the full set of variables. An alternative is to identify clusters consisting of observations that are similar based only on a subset of the original variables. Biclustering attempts to identify clusters of the latter type. Additionally, in biclustering the conceptual distinction between observations and variables need not be made, and because of that biclustering is often described as the simultaneous clustering of both observations and variables.
Current applications of clustering are characterized by massive high-dimensional (very large number of variables) data sets, which in turn pose several challenges that cannot be addressed by traditional clustering techniques. The main difficulties arise from the presence of noisy variables, the possibility that different variables are relevant to different clusters, and the fact that the concept of distance or similarity (e.g., Euclidean distance) becomes meaningless as the dimension increases (1). Clustering of drugs and adverse events (AEs), the subject of our paper, falls into this group of applications since typical SRS contain thousands of drugs and thousands of AEs, creating a clustering space of several thousand dimensions. In addition, this clustering space is very sparse (most cells in the data matrix contain zero) since drugs are typically reported or associated with far fewer AEs than the full set of AEs found in the SRS. Studies have shown (2) that the density (proportion of non-zero cells) of such drug-AE data matrices is around 30%, and much less if the terminology used is left unprocessed. As mentioned in the main paper, of the 308,520,126 (28,341 drugs × 10,886 AEs) cells of the initial data clustering matrix used in this study, over 99% contained zero.
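The cell count and the notion of density used above come down to simple counting. The following is a minimal Python sketch, assuming the matrix is held as a list of rows; the helper name `density` and the toy matrix are hypothetical and are not data from the study.

```python
def density(matrix):
    """Return the proportion of non-zero cells in a rectangular matrix."""
    total_cells = sum(len(row) for row in matrix)
    nonzero_cells = sum(1 for row in matrix for value in row if value != 0)
    return nonzero_cells / total_cells

# Size of the clustering matrix reported in the text: drugs x AEs.
n_drugs, n_aes = 28_341, 10_886
total_cells = n_drugs * n_aes  # 308,520,126 cells

# Hypothetical 3 x 4 toy matrix with two non-zero cells.
toy = [
    [0, 0, 1.5, 0],
    [0, 0, 0, 0],
    [2.0, 0, 0, 0],
]
toy_density = density(toy)  # 2 non-zero cells out of 12
```

A matrix in which over 99% of the cells are zero has a density below 0.01 under this definition.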
Biclustering is a relatively new clustering technique designed mainly to address these challenges. It originated in bioinformatics, where it is used to identify groups of genes that may be functionally related under certain environmental conditions (e.g., cancer samples) based on expression levels measured in a microarray experiment (1, 3). The microarray data is arranged in a matrix where rows correspond to genes, columns to environmental conditions, and each cell represents the expression level of a specific gene under a specific condition. Biclustering is then applied to identify sub-matrices representing genes that are similarly expressed under a subset of conditions. Biclustering has also been used in the domain of text mining to identify groups of documents that can be defined or described by a similar group of words, i.e., topics. Here a text corpus is represented by a data matrix whose rows represent documents, whose columns represent words, and whose cells contain the co-occurrence frequency of a particular word in a particular document (4). Similarly, biclustering can be used to cluster drugs and AEs in SRS, where drugs are analogous to genes or documents, AEs to environmental conditions or words, and drug-AE association strength to gene expression levels or document-word co-occurrence frequencies. A bicluster of drugs and AEs is then interpreted as a set of drugs that are related to each other by their common AE associations.
Many biclustering algorithms have been proposed in recent years. Although they all share the general bicluster concept, each formulates a different objective, problem setting, and data model (1, 3). The general objective of biclustering is to identify sub-matrices (biclusters) of the data matrix such that each bicluster satisfies a specific homogeneity criterion, which generally varies from approach to approach. The most common homogeneity criteria, or so-called bicluster models, are (3): biclusters with constant values, biclusters with constant values on rows or columns, and biclusters with coherent or correlated values. A relatively new biclustering model, which has gained popularity in recent years primarily due to its application to finding transcription factor modules responsible for gene regulation (5), assumes, as in this work, a binary data model. The binary model assumes two possible states: in gene expression analysis, a gene is either “on” (expressed) or “off” (not expressed) with respect to a control or condition, and in our case a drug is either associated with an adverse effect or not.
Disproportionality analysis in preparation for biclustering. Prior to clustering, the association strength value must be computed for each drug-AE pair in the data. Then, based on this value, biclustering will group drugs that are strongly associated with a common set of AEs. Disproportionality analysis is the approach used by most current computerized pharmacovigilance signal detection methods to quantify the association strength of a drug-AE combination and highlight strong associations as potential ADEs. Among the set of disproportionality measures is the relative reporting ratio (RR) (6), which is defined as the ratio of the observed incidence rate of a drug-AE association to its “baseline” expected rate under the assumption that the drug and AE occur independently. Based on the entries of Table 1 in this document, the expected number of reports containing a certain drug and AE combination under the assumption of independence is equal to $t \times (n/t) \times (m/t) = nm/t$, where n/t and m/t are estimates of the probabilities of the drug and AE respectively. Given that the observed number of such reports is equal to a, RR is defined as

$$\mathrm{RR} = \frac{a}{nm/t} = \frac{a\,t}{n\,m}$$
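The relative reporting ratio can be computed directly from the four cell counts of the contingency table. The sketch below is illustrative only: the function name and the counts are hypothetical, chosen so that the expected count under independence is 10.

```python
def relative_reporting_ratio(a, b, c, d):
    """RR = a / (n * m / t) for a 2x2 table of report counts (Table 1 layout)."""
    n = a + b          # reports mentioning the target drug
    m = a + c          # reports mentioning the target AE
    t = a + b + c + d  # all reports
    expected = n * m / t  # expected count under independence
    return a / expected

# Hypothetical counts: n = 100, m = 100, t = 1000, so expected = 10.
rr = relative_reporting_ratio(a=30, b=70, c=70, d=830)
```

With these counts the observed count of 30 is three times the expected count, so `rr` equals 3.0.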
A value of RR close to 1 indicates that there is no association between the drug and AE. A value of 3, for example, indicates that there are 3 times as many drug-AE reports in the database as would be expected, and might support the hypothesis of an ADE association. The Gamma Poisson Shrinker (GPS) (2), which is used in this work, is a pharmacovigilance signal detection method endorsed and used by the FDA (7) to monitor safety signals in their SRS. GPS is based on a Bayesian approach that attempts to account for the uncertainty in RR associated with small observed and expected counts by “shrinking” RR towards the baseline case of no association (value of 1) by an amount that is proportional to the variability of the RR statistic. The result of this shrinkage is a reduction of spurious associations when there is not enough data to support them. The general idea underlying GPS is to assume that the true measure of RR is unknown, and begin by making a prior assumption about the distribution of the RR values in the database. Then, based on a modeling framework called empirical Bayes, it uses the observed RR values for the entire set of associations in the data to estimate the parameters of this distribution. Having estimated these parameters, GPS then computes a measure called EBGM (empirical Bayes geometric mean), which is essentially an estimate of the posterior expected value of RR for a particular drug-AE pair. In this work, the GPS method was implemented exactly as specified in the original paper, including stratification (step 3 in the main paper) and recommended seeding parameters (2).
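The effect of shrinkage on small counts can be illustrated with a deliberately simplified model. The sketch below is not GPS and does not compute EBGM: it assumes a single gamma prior with fixed, hypothetical parameters `alpha` and `beta` (prior mean 1) rather than a prior estimated from the data, and it returns the posterior mean of the ratio. It only shows how the same raw RR is pulled toward 1 more strongly when it rests on fewer reports.

```python
def shrunk_ratio(observed, expected, alpha=2.0, beta=2.0):
    """Posterior mean of lam when observed ~ Poisson(lam * expected)
    and lam ~ Gamma(shape=alpha, rate=beta); the prior mean is alpha / beta.

    alpha and beta are hypothetical fixed values, not estimated from data.
    """
    return (alpha + observed) / (beta + expected)

raw_small = 3 / 1        # raw RR of 3 from small counts
raw_large = 300 / 100    # the same raw RR from large counts
shrunk_small = shrunk_ratio(3, 1)      # 5 / 3, pulled strongly toward 1
shrunk_large = shrunk_ratio(300, 100)  # 302 / 102, barely changed
```

Both pairs of counts give a raw RR of 3, but the estimate based on 3 observed reports moves much closer to 1 than the one based on 300.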
Table 1. Contingency table specifying the number of reports mentioning a specific drug and a specific adverse effect (AE).
Biclustering in AERS. Let $A_{m \times n}$ be our data matrix of m drugs and n AEs representing drug-AE associations in AERS, i.e., each cell $a_{ij}$ in this matrix contains the EBGM association strength value computed by GPS for the pair formed by the i-th drug and the j-th AE. In order to obtain a binary data model, this matrix was then transformed into a binary data matrix $B_{m \times n}$, where each cell $b_{ij}$ contained either a 1 or 0, representing the states of “strongly associated” or “weakly associated” respectively. The transformation was performed by selecting an “association strength threshold” T, which is used to qualify each association as either “strong” or “weak”. That is,

$$b_{ij} = \begin{cases} 1 & \text{if } a_{ij} \ge T \\ 0 & \text{if } a_{ij} < T \end{cases}$$
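The thresholding step can be sketched in a few lines of Python. The matrix values and the threshold T = 2.0 below are hypothetical, and treating a value exactly equal to T as “strong” is an assumption of this sketch.

```python
def binarize(A, T):
    """Map each association strength to 1 (>= T, "strong") or 0 (< T, "weak")."""
    return [[1 if a_ij >= T else 0 for a_ij in row] for row in A]

# Hypothetical 2 drugs x 3 AEs matrix of association strengths.
A = [
    [4.2, 0.8, 2.0],
    [1.1, 3.5, 0.0],
]
B = binarize(A, T=2.0)
```

Here `B` is `[[1, 0, 1], [0, 1, 0]]`: only the cells at or above the threshold are kept as strong associations.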
Given this matrix, a bicluster (D, E) corresponds to a sub-matrix of B in which all cells equal 1, i.e., a subset of drugs D that are jointly associated with a subset of events E, as in Figure 1 of this document. The problem of biclustering can be posed in a graph-theoretic setting of searching for bicliques (complete bipartite graphs), as depicted in Figure 1 of the main paper, where one set of nodes in the graph corresponds to drugs, the other set of nodes to AEs, and where an edge connects nodes between the two disjoint sets and represents a strong statistical association (denoted by 1 in the data matrix) between a drug and an AE. The bicluster is then interpreted as a set of drugs which are each associated with the same set of AEs, or alternately a group of drugs that potentially all cause the same set of AEs. Binary inclusion-maximal biclustering (Bimax) (8), which assumes the binary data model and is used in this work, uses a divide-and-conquer approach to identify biclusters (bicliques).
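The correspondence between an all-ones sub-matrix and a biclique can be made concrete with a small check. This is a sketch under stated assumptions: the helper names and the 3 × 3 binary matrix are hypothetical, and drugs and AEs are identified by their row and column indices.

```python
def edges_from_matrix(B):
    """Return the bipartite edge set {(drug index, AE index)} for cells equal to 1."""
    return {(i, j) for i, row in enumerate(B) for j, b in enumerate(row) if b == 1}

def is_biclique(edges, drugs, events):
    """True if every drug in `drugs` is connected to every AE in `events`."""
    return all((i, j) in edges for i in drugs for j in events)

# Hypothetical binary matrix: rows are drugs, columns are AEs.
B = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
edges = edges_from_matrix(B)
ok = is_biclique(edges, drugs={0, 1}, events={0, 1})         # all four cells are 1
not_ok = is_biclique(edges, drugs={0, 1, 2}, events={0, 1})  # cell (2, 0) is 0
```

A pair (D, E) passes this check exactly when the corresponding sub-matrix of B contains only 1's.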
Table 1 (cell counts):

                   Target AE    All other AEs    Total
Target drug        a            b                n = a + b
All other drugs    c            d                c + d
Total              m = a + c    b + d            t = a + b + c + d

The general idea behind Bimax is to decompose the binary data matrix into three sub-matrices, one of which contains only 0's and therefore can be discarded. The algorithm is then applied recursively to the two remaining sub-matrices, and the recursion ends when one of the sub-matrices represents a bicluster (contains only 1's). As mentioned in the main paper, in contrast to most clustering algorithms, Bimax is an exact algorithm able to find all of the biclusters that exist in the data.
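To make the notion of “all of the biclusters” concrete, the sketch below enumerates every inclusion-maximal all-ones sub-matrix of a tiny binary matrix by brute force. It is not the Bimax divide-and-conquer procedure: it tries every subset of rows and is therefore only feasible for very small inputs. The function name, the `min_rows` and `min_cols` size filters, and the example matrix are all hypothetical.

```python
from itertools import combinations

def maximal_biclusters(B, min_rows=1, min_cols=1):
    """Brute-force enumeration of inclusion-maximal all-ones sub-matrices.

    Not the Bimax algorithm: every row subset is tried, so this is only
    usable on tiny matrices. min_rows / min_cols are illustrative filters.
    """
    n_rows = len(B)
    n_cols = len(B[0]) if B else 0
    row_cols = [frozenset(j for j in range(n_cols) if B[i][j] == 1)
                for i in range(n_rows)]
    found = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns shared by every selected row.
            cols = frozenset.intersection(*(row_cols[i] for i in rows))
            if not cols:
                continue
            # All rows containing those columns; equal to `rows` iff maximal.
            closure = tuple(i for i in range(n_rows) if cols <= row_cols[i])
            if closure == rows and len(rows) >= min_rows and len(cols) >= min_cols:
                found.add((rows, tuple(sorted(cols))))
    return sorted(found)

# Hypothetical binary matrix: 4 drugs (rows) x 4 AEs (columns).
B = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
all_maximal = maximal_biclusters(B)
large_only = maximal_biclusters(B, min_rows=2, min_cols=2)
```

For this matrix, `large_only` contains two biclusters, rows (0, 1) with columns (0, 1) and rows (1, 2) with columns (1, 2). They overlap in row 1, which shows that a drug may belong to more than one bicluster.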
Fig. 1. Drug-AE biclusters in the binary data model. Left, the initial binary data matrix before clustering. Right, two biclusters identified (sub-matrices where all cells equal 1). The top bicluster is ({d1,d3,d6}, {e1,e5,e2,e4}) and the bottom bicluster is ({d2,d5}, {e2,e4,e3}).
Reference List
(1) Kriegel H, Kröger P, Zimek A. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 2009;3(1):1-58.
(2) DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System. Am Stat 1999;53(3):177-90.
(3) Madeira SC, Oliveira AL. Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Trans Comput Biol Bioinformatics 2004;1(1):24-45.
(4) Long B, Zhang Z, Yu PS. Co-clustering by block value decomposition. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining 2005;635-40.
(5) van Uitert M, Meuleman W, Wessels L. Biclustering sparse binary genomic data. J Comput Biol 2008;15(10):1329-45.
(6) Hauben M, Madigan D, Gerrits CM, Walsh L, van Puijenbroek EP. The role of data mining in pharmacovigilance. Expert Opin Drug Saf 2005 September;4(5):929-48.
(7) Szarfman A, Machado SG, O'Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA's spontaneous reports database. Drug Saf 2002;25(6):381-92.
(8) Prelic A, Bleuler S, Zimmermann P et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006 May 1;22(9):1122-9.