JCKBSE2010
Kaunas
Predicting Combinatorial Protein-Protein Interactions
from Protein Expression Data
Based on Correlation Coefficient
Sho Murakami, Takuya Yoshihiro,
Etsuko Inoue and Masaru Nakagawa
Faculty of Systems Engineering, Wakayama University
Agenda
Background
Combinatorial Protein-Protein Interactions
The Proposed Data Mining Method
Evaluation
Conclusion
Background
Finding interactions among genes/proteins is important
Many data mining algorithms to discover gene-gene (or protein-protein) interactions have been proposed so far.
One of the main sources is gene or protein expression data
2D electrophoresis (for protein expression): the size of a spot is the expression level
Microarray (for gene expression): the color strength is the expression level
Related Work for Interaction Discovery
Bayesian Networks
Discovering interactions from expression data based on conditional probability among events
Ex. To discover protein-protein interactions among proteins A, B and C:
1. Define events A, B and C
2. Compute conditional probabilities related to A, B and C
Figure: samples in which each event occurs, e.g. the event “C is expressed”. If the conditional probability is high, an interaction is predicted.
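The two steps above can be sketched as follows. This is only an illustration of estimating one conditional probability from samples, not a Bayesian network implementation. The threshold that defines the event "is expressed", the function name, and the sample values are all assumptions made for this example.

```python
def conditional_probability(samples, given, target, threshold=0.1):
    """Estimate P(target is expressed | every protein in `given` is expressed).

    `threshold` is a hypothetical cut-off that turns an expression level
    into the event "is expressed"; the slides do not specify one.
    """
    def expressed(sample, protein):
        return sample[protein] >= threshold

    matching = [s for s in samples if all(expressed(s, p) for p in given)]
    if not matching:
        return None  # no sample satisfies the condition, so nothing can be estimated
    return sum(expressed(s, target) for s in matching) / len(matching)


# Illustrative expression levels (not real data).
samples = [
    {"A": 0.50, "B": 0.20, "C": 0.17},
    {"A": 0.30, "B": 0.40, "C": 0.12},
    {"A": 0.75, "B": 0.10, "C": 0.08},
    {"A": 0.05, "B": 0.30, "C": 0.02},
]
p_c_given_ab = conditional_probability(samples, given=("A", "B"), target="C")
print(p_c_given_ab)
```

With only a handful of samples satisfying the condition, the estimate rests on very few observations, which is why this style of analysis needs many samples.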
Problems of Bayesian Networks
Bayesian Networks Require a Large Number of Samples
For genes: microarrays provide cheap and high-speed experiments
For proteins: 2D electrophoresis is time-consuming and expensive
Many samples are necessary to obtain statistically reliable results
Are there sufficient samples in the area?
Ex. To discover protein-protein interactions among proteins A, B and C:
1. Define events A, B and C
2. Compute conditional probabilities related to A, B and C
The Objective of our study
Finding combinatorial protein-protein interactions
from small-size protein expression data
Expression Data
2D electrophoresis is processed for each sample, yielding the expression level of each protein.
Expression levels: obtained by measuring the size of the spot areas
As pre-processing, normalization is applied
Sample (individual)   Protein ID
                      A      B      C      D      …
Sample1               0.50   0.20   0.17   0.06   …
Sample2               0.30   0.40   0.12   0.02   …
Sample3               0.75   0.10   0.08   0.02   …
…                     …      …      …      …      …
Figure: 2D electrophoresis images of sample1, sample2 and sample3. Each black area indicates a protein; the size of the area represents its expression level.
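The slides do not say which normalization is applied. As one plausible reading, the sketch below divides each spot area by the total spot area of its sample, so that expression levels become comparable across samples. The function name and the raw areas are hypothetical.

```python
def normalize_sample(spot_areas):
    """Scale the spot areas of one sample so that they sum to 1.

    This total-area normalization is an assumption; the actual
    pre-processing used by the authors is not described in the slides.
    """
    total = sum(spot_areas.values())
    if total <= 0:
        raise ValueError("a sample needs a positive total spot area")
    return {protein: area / total for protein, area in spot_areas.items()}


# Hypothetical raw spot areas (arbitrary units) for one sample.
raw_sample = {"A": 1200.0, "B": 480.0, "C": 408.0, "D": 312.0}
normalized = normalize_sample(raw_sample)
print(normalized)
```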
Model of Protein-Protein Interaction Considered
Model: two proteins A and B affect the expression level of another protein C
only when both A and B are expressed
We want to estimate this combinatorial effect!
Figure: a complex of A and B has an effect on the expression level of C.
Usually, the sole effects from A and from B on C are considered.
Only if both A and B exist does the combinatorial effect work on C!
Predicting Interactions by Correlation Coefficient
Compute the correlation coefficient of (A,B) and C
The correlation coefficient requires fewer samples
The amount of the complex (A,B) is estimated by min(A,B)
The total effect on C will be high if the correlation is high
Figure: for each sample, the expression levels of A and B give min(A,B), the estimated amount of the complex of A and B. This amount would have an effect on C, so the correlation of min(A,B) and C is computed.
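A minimal sketch of this step, assuming the Pearson correlation coefficient is meant. The helper names and the four-sample data are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation; treat as none
    return sxy / math.sqrt(sxx * syy)


def complex_amount(a, b):
    """Per-sample estimate of the amount of the complex (A,B): min(A, B)."""
    return [min(x, y) for x, y in zip(a, b)]


# Illustrative expression levels of A, B and C over four samples.
a = [0.50, 0.30, 0.75, 0.20]
b = [0.20, 0.40, 0.10, 0.35]
c = [0.17, 0.25, 0.08, 0.18]

r = pearson(complex_amount(a, b), c)
print(round(r, 3))
```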
The problem of scale difference
The expression level that corresponds to one molecule differs among proteins,
so A and B do not always combine in equal amounts of expression level.
Therefore, taking the min cannot always express the correct amount of complex
Solution: correct the scale of A
Figure: Scaling problem and solution. Without scale correction, the amount of complex estimated from the expression levels of proteins A and B is not correct; after the scale of A is corrected, taking the min leads to the correct amount of complex. Legend: the marked unit is the expression level required for a complex.
How to determine correct scale?
Figure: the expression level of A is rescaled as k1·A, k2·A, k3·A, … and compared with B
We compute Score S: the total effect of (A, B) on C
For each scale, compute the correlation of min(A,B) and C
Select the scale that leads to the maximum correlation coefficient of min(A,B) and C
If an interaction of our model exists, a high correlation value must appear.
Example: correlations of 0.1, 0.2, 0.3 and 0.7 over four scales give Score S = 0.7
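The scale search can be sketched as below. Only A is rescaled here, since the correlation does not change when A and B are multiplied by a common factor. The candidate grid of scales and the toy data (constructed so that C equals min(2·A, B)) are assumptions; the slides do not state which scales are tried.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def score_s(a, b, c, scales):
    """Return (Score S, best scale k): the maximum correlation of
    min(k*A, B) and C over the candidate scales."""
    best_r, best_k = -1.0, None
    for k in scales:
        estimated_complex = [min(k * x, y) for x, y in zip(a, b)]
        r = pearson(estimated_complex, c)
        if r > best_r:
            best_r, best_k = r, k
    return best_r, best_k


# Hypothetical log-spaced grid of scales from 1/16 to 16.
scales = [2 ** (e / 2) for e in range(-8, 9)]

# Toy data in which C is exactly min(2*A, B).
a = [0.10, 0.30, 0.20, 0.40, 0.15]
b = [0.50, 0.20, 0.60, 0.30, 0.10]
c = [min(2 * x, y) for x, y in zip(a, b)]

s, k = score_s(a, b, c, scales)
print(s, k)
```

A finer grid gives a better estimate of the scale at the cost of more correlation computations per combination.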
Estimating Combinatorial Effect from Score S
Score S consists of the “sole effect” and the “combinatorial effect”
Compute Score S’: Score S assuming no combinatorial effect
The difference between S and S’ is the level of the combinatorial effect
Figure: Score S (the sole effects of A and B on C plus the combinatorial effect of the complex) is compared with Score S’ (assuming no combinatorial effect), for which a statistical distribution is computed.
How to compute the distribution of score S’?
Assume that the expression levels of proteins A, B and C follow a normal distribution
Computer simulation yields the distribution of Score S’
① Randomly create distributions of A, B and C where the correlation coefficient of A-C is α and that of B-C is β
② Repeat the computation of score S to obtain the distribution of score S’
③ Create the table of the average and stddev for each α and β
Figure: example distributions of Score S’ for α=0.5, β=0.3 and for α=0.5, β=0.4. We can obtain the distribution for each α and β.
Table: average (upper) and stddev (lower) of Score S’, indexed by correlation α (A→C) and correlation β (B→C)
          0.05      0.10      0.15      0.20
0.05      0.06921   0.10113   0.14296   0.18771
          0.03089   0.03220   0.03206   0.03179
0.10                0.12239   0.15603   0.19713
                    0.03376   0.03392   0.03327
0.15                          0.17806   0.21105
                              0.03498   0.03506
0.20                                    0.23262
                                        0.03618
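The simulation could look roughly like the sketch below. Several details are assumptions, not taken from the slides: A and B are drawn independently, C is a linear mix of A, B and noise so that its correlations with A and B are α and β, expression levels are shifted to a positive mean, and the grid of scales is arbitrary. The numbers it produces are therefore not expected to reproduce the table.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def score_s(a, b, c, scales):
    """Maximum correlation of min(k*A, B) and C over the candidate scales k."""
    return max(pearson([min(k * x, y) for x, y in zip(a, b)], c) for k in scales)


def simulate_s_prime(alpha, beta, n_samples, n_trials, scales, rng, mean=5.0):
    """Return (average, stddev) of Score S' under no combinatorial effect.

    C depends on A and B only through their sole (linear) effects, with
    population correlations alpha (A-C) and beta (B-C).
    `mean` is a hypothetical offset that keeps expression levels positive.
    """
    if alpha ** 2 + beta ** 2 >= 1.0:
        raise ValueError("alpha^2 + beta^2 must be below 1")
    noise = math.sqrt(1.0 - alpha ** 2 - beta ** 2)
    scores = []
    for _ in range(n_trials):
        za = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        zb = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        c = [alpha * x + beta * y + noise * rng.gauss(0.0, 1.0)
             for x, y in zip(za, zb)]
        a = [mean + x for x in za]
        b = [mean + y for y in zb]
        scores.append(score_s(a, b, c, scales))
    return statistics.mean(scores), statistics.stdev(scores)


scales = [2 ** (e / 4) for e in range(-4, 5)]  # hypothetical grid, 0.5 to 2
avg, sd = simulate_s_prime(0.05, 0.10, n_samples=195, n_trials=50,
                           scales=scales, rng=random.Random(0))
print(avg, sd)
```

Repeating the call over a grid of α and β values would fill a lookup table of averages and standard deviations like the one on this slide.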
Computing Combinatorial Effect as Z-score
Place the score S in the distribution of S’
Z-score: measures the difference between score S and the average of S’ as a count of standard deviations
Z-score = (score S - avg(S’)) / stddev(S’)
The higher the z-score is, the stronger the combinatorial effect is!
Figure: the distribution of score S’ with its average, the position of the corresponding score S, and the z-score between them as the level of combinatorial effect.
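As a worked example of the formula, the sketch below uses the average and stddev of S’ listed for correlations 0.05 and 0.10 in the simulation table (0.10113 and 0.03220). The observed score S = 0.30 is a hypothetical value chosen for illustration.

```python
def z_score(score_s, avg_s_prime, stddev_s_prime):
    """Distance of Score S from the average of S', in standard deviations."""
    if stddev_s_prime <= 0:
        raise ValueError("stddev of S' must be positive")
    return (score_s - avg_s_prime) / stddev_s_prime


# avg and stddev of S' for correlations 0.05 and 0.10, as listed in the table;
# the score S of 0.30 is hypothetical.
z = z_score(0.30, avg_s_prime=0.10113, stddev_s_prime=0.03220)
print(round(z, 2))
```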
Summary of the proposed algorithm
① Try all combinations of A, B and C
② Compute the maximum correlation coefficient among all scales of A and B to obtain Score S
③ Compute z-scores from the distribution of S’ and create a ranking by them
Figure: starting from the expression data (proteins A, B, C, D, … for sample1, sample2, sample3, …):
1 Trying all combinations: (A,B)→C, (A,B)→D, (A,B)→E, (A,B)→F, …, (A,C)→B, (A,C)→D, (A,C)→E, (A,C)→F, …, (B,C)→A, (B,C)→D, (B,C)→E, (B,C)→F, …
2 Compute the max correlation among every scale: e.g. correlations of 0.3, 0.8 and 0.5 give Score S = 0.8
3 Compute z-scores from the distribution of S’: e.g. z-score = 5.5
4 Ranking by z-score:
rank   Combination   Z-score
1      (A,C)→B       5.5
2      (B,C)→E       4.9
3      (A,B)→F       4.7
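The enumeration and ranking steps can be sketched as follows. The pair (A,B) is treated as unordered and C is any third protein; this reading of "all combinations" is an assumption. The function names are hypothetical, and the z-scores are the example values from the ranking in the figure.

```python
from itertools import combinations


def candidate_triples(proteins):
    """Yield every candidate ((A, B), C): an unordered pair plus a third protein."""
    for a, b in combinations(proteins, 2):
        for c in proteins:
            if c != a and c != b:
                yield (a, b), c


def rank_by_z(z_scores, top=None):
    """Sort candidates by z-score, highest first; keep `top` entries if given."""
    ranked = sorted(z_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


triples = list(candidate_triples(["A", "B", "C", "D"]))
print(len(triples))

# Example z-scores taken from the ranking shown in the figure.
example_scores = {
    (("A", "B"), "F"): 4.7,
    (("A", "C"), "B"): 5.5,
    (("B", "C"), "E"): 4.9,
}
ranking = rank_by_z(example_scores)
for rank, (((a, b), c), z) in enumerate(ranking, start=1):
    print(rank, f"({a},{b})->{c}", z)
```

The number of candidates grows roughly with the cube of the number of proteins, so the per-candidate score computation needs to stay cheap.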
Evaluation
Applying our method to real expression data
Protein expression data of black cattle
# of samples is 195, # of proteins is 879
Finding combinatorial protein-protein interactions using our method
The Expression Data Follows Normal Distribution
Using the Jarque-Bera test at a confidence level of 95%,
we test whether the expression data follow a normal distribution.
Result:
454 proteins out of 879 proteins follow a normal distribution
Thus, we use these 454 proteins for evaluation
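The filtering step can be sketched with a hand-written Jarque-Bera statistic, JB = n/6 · (skewness² + (kurtosis − 3)²/4), compared against a chi-square distribution with 2 degrees of freedom (whose upper tail is exp(−JB/2)). This uses the asymptotic approximation; the function names and test data are illustrative, and the authors' exact procedure may differ.

```python
import math
from statistics import NormalDist


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("the sample has zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square upper tail with 2 degrees of freedom
    return jb, p_value


def follows_normal(values, significance=0.05):
    """True when normality is not rejected at the given significance level."""
    return jarque_bera(values)[1] >= significance


# Illustrative data: evenly spaced normal quantiles vs. a right-skewed series.
near_normal = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [math.exp(i / 20) for i in range(100)]
print(follows_normal(near_normal), follows_normal(skewed))
```

Applied per protein across the samples, a filter like this would keep only the proteins whose expression levels pass the test.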
Results
We found a great many combinations of proteins that would have a combinatorial effect
The maximum value of the z-score is 11.0
The combinations whose z-score is more than about 5.5 (p-value less than 0.000000019 (= 0.05/₄₅₄C₃)) would have a combinatorial effect at a confidence level of 95%.
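The quantities on this slide can be recomputed with the standard library, under two assumptions about how they were derived: the p-value is the one-sided upper tail of the standard normal distribution, and the number of trials is the number of ways to choose 3 proteins out of 454. Depending on how trials are counted and whether the test is one- or two-sided, the resulting threshold may differ from the figure quoted on the slide.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n_trials = math.comb(454, 3)            # assumed number of tested combinations
corrected_threshold = 0.05 / n_trials   # Bonferroni-style corrected level
p_at_5_5 = upper_tail_p(5.5)

print(n_trials)
print(corrected_threshold)
print(p_at_5_5)
```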
Figure: The histogram of z-scores of the real data (x-axis: z-score, in bins of 0.5 from 7.0 to 11.5; y-axis: # of combinations, 0 to 300)
Comparing z-scores with normal distribution
We compare the histogram with that of the case without combinatorial effect
Created by multiplying the normal distribution by the number of trials (₄₅₄C₃)
It is inferred that this data includes a considerable amount of combinatorial effect
Figure: Histogram of real data: estimated distribution of z-scores obtained from real data (x-axis: z-score, from 7.0 to 11.5; y-axis: # of combinations, 0 to 300)
Figure: Histogram without combinatorial effect: distribution of z-scores under the assumption of no combinatorial effect (x-axis: z-score, from 6 to 11.5; y-axis: expected # of combinations, 0 to 0.05)
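The reference histogram can be approximated as below: for each z-score bin, the standard normal probability of falling in the bin is multiplied by the number of trials. The bin edges follow the real-data histogram, and taking the number of trials as the number of 3-protein choices out of 454 follows the slide; the function names are hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_counts(bin_edges, n_trials):
    """Expected number of combinations per z-score bin under no combinatorial effect."""
    return [
        (lo, hi, n_trials * (upper_tail_p(lo) - upper_tail_p(hi)))
        for lo, hi in zip(bin_edges, bin_edges[1:])
    ]


edges = [7.0 + 0.5 * i for i in range(10)]  # 7.0, 7.5, ..., 11.5
null_histogram = expected_counts(edges, math.comb(454, 3))
for lo, hi, count in null_histogram:
    print(f"{lo:4.1f} to {hi:4.1f}: {count:.3e}")
```

Every expected count in this range is far below one combination, whereas the real-data histogram reaches hundreds of combinations per bin.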
The Ranking based on Z-score
The ranking table shows that combinations with low score S are retrieved.
The same protein tends to appear many times.
The ranking of Z-score obtained from real data
Rank   A (spot no.)   B (spot no.)   C (spot no.)   Cor(A,C)   Cor(B,C)   Score S (Sabc)   Z-score
1      5146           6239           4470           0.092      0.317      0.674            11.02
2      6154           6239           5418           0.071      0.339      0.674            11.01
3      2572           4292           6239           0.137      0.293      0.561            10.94
4      5146           6239           1468           0.173      0.371      0.729            10.29
5      5661           6281           5342           0.007      0.390      0.504            10.20
6      5146           6239           4478           0.089      0.315      0.648            10.19
7      5661           6281           5730           0.058      0.434      0.613            10.17
8      5026           6239           1333           0.052      0.350      0.560            10.15
9      5026           6239           3626           0.029      0.314      0.470            10.14
10     5695           6143           6042           0.148      0.444      0.640            10.12
…      …              …              …              …          …          …                …
Conclusion
Summary
We propose a method to estimate the combinatorial effect of three proteins from protein expression data
Applying the method to real data, we found many combinations that would have a combinatorial effect
Future work
To confirm the reliability, we are planning to study whether the found combinations include well-known protein-protein interactions or not.