Discovery of Hepatocellular Carcinoma-specific genes based on Multivariable Statistical Methodology


Dongkwon Lee¹, Jin Hyun Park*¹, Sang wook Choi², Myengsoo Kim*, In-Beum Lee¹,²,
Young Hee Kim³, Sang Uk Im³, Eun Jung Chung³, Moonkyu Kim³, Jungchul Kim³

* P&I Consulting Co., Ltd.
¹ Department of Chemical Engineering, POSTECH
² School of Science and Environmental Engineering, POSTECH
³ Department of Immunology, School of Medicine, Kyungpook National University


Introduction

Hepatocellular carcinoma (HCC) is the leading cause of death among men in Korea. Little
is known about the genetic events in the malignant transformation of liver cells. Although a
variety of morphological and clinical parameters have been used to classify HCC, patients have
received similar chemotherapy. This is due in part to the functional diversity of the liver
and the lack of genetic information from systematic analysis. Many criteria [1,2] for
classifying HCC have been suggested, but few methods are clearly defined and useful for
understanding the molecular characteristics of each patient with primary HCC. Therefore, a
systematic cancer classification and monitoring methodology is needed to understand the
characteristics of a given patient and to identify more efficient treatment methods in
molecular cancer research [3]. The specific genes that determine the morphological appearance
of HCC can be used as markers for diagnosis, prognosis, therapeutic monitoring and earlier
indication of relapse. For HCC therapy, the genes or gene products can also be targeted by
various agents such as antibodies or drugs. Moreover, therapeutic vaccines may be developed
from these genes.

Numerous approaches have been used to scrutinize the genetic causes or composition of
HCC. Methods used to identify overexpressed genes include serial analysis of gene
expression, PCR and Northern blotting. However, none of these techniques provides a
complete, easy, systematic and reliable comparison of gene expression between morphologically
different HCCs. We have analyzed the genetic composition of HCC by combining cDNA library
subtraction with a high-throughput microarray screening procedure. In this study, we present a
novel systematic methodology for discovering genes that determine the morphologic
subtypes of HCC. We mainly describe how multivariate data analysis can be applied to
molecular classification and to mining biologically valuable factors from the data, rather
than discussing the cDNA microarray experiment itself. The suggested methodology consists of
three major stages: identification of subtypes and their class prediction, HCC-specific gene
selection, and validation of the selected genes by clustering.


Microarray data analysis

Tumor samples and cDNA microarray data

A human 3k cDNA chip (3,136 cDNA probes) from the Bank of Human Stromal Cell cDNA at
Kyungpook National University was constructed for profiling. The 11 mRNA samples from primary
HCC (4 of the solid type, 4 of the pseudoglandular type, and 3 reserved for class prediction
by support vector machine) were obtained from the patients at the time of surgery. As
previously mentioned, liver tissue from cancer patients carries many sources of background
noise (e.g., cirrhosis and sclerosis), comparatively more than other organs. For this reason,
we used normal hepatic cells, extracted from the same patient by laser-capture
microdissection, as the control, since our focus is not cirrhosis or sclerosis but the type
of carcinoma. This comparison helps eliminate factors that are not of interest. For each of
these 11 patients, target cDNA was obtained from mRNA by reverse transcription and labeled
with a red-fluorescent dye, Cy5. The reference sample was prepared in the same manner and
labeled with a green-fluorescent dye, Cy3. All samples were hybridized to the human 3k cDNA
chip, and the microarray images were scanned with a confocal laser scanner. Image analysis
was performed with QuantArray ver. 3.0 software. After all experiments were finished, we
applied multivariate data analysis to the resulting cDNA microarray data. For
post-experimental analysis, the R/G ratio was used instead of raw intensity for chip-to-chip
comparison.
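
The ratio step can be illustrated with a minimal sketch. It assumes that background-corrected spot intensities for the two channels are available as plain lists; the names `cy5`, `cy3` and the `floor` guard are hypothetical, and the base-2 logarithm is a common convention rather than something the text specifies.

```python
import math

def log_ratios(cy5, cy3, floor=1.0):
    """Return log2(R/G) for each spot.

    cy5, cy3: red (sample) and green (reference) intensities, one value per spot.
    floor:    assumed lower bound applied to both channels so that the ratio
              and its logarithm are always defined.
    """
    if len(cy5) != len(cy3):
        raise ValueError("both channels must cover the same spots")
    ratios = []
    for red, green in zip(cy5, cy3):
        red = max(red, floor)
        green = max(green, floor)
        ratios.append(math.log2(red / green))
    return ratios

# Three hypothetical spots: up-regulated, down-regulated, unchanged.
print(log_ratios([800.0, 200.0, 400.0], [200.0, 800.0, 400.0]))  # [2.0, -2.0, 0.0]
```

Working with ratios rather than intensities removes spot-to-spot differences in the amount of printed probe, which is why a ratio is the natural quantity for comparing chips.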


Normalization

Normalization can be done in several ways depending on the experimental set-up, and each
method suits a different kind of experiment. For example, one approach considers all genes
on the array, while another uses only constantly expressed genes, the so-called
housekeeping genes. The former is comparatively better suited to cancer cell profiling
and the latter to cultured cell line profiling [4]. In this research, we use all genes on the
array as the control after removing false positives.
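
One way to read "all genes on the array as the control" is global median centering of the log ratios, sketched below. This is an assumption for illustration: the text does not state which all-gene normalization was used, and the function name and the `flagged` argument (indices of spots rejected as false positives) are hypothetical.

```python
import statistics

def median_center(log_ratios, flagged=None):
    """Shift log ratios so that the median over all retained spots is zero.

    log_ratios: one log2(R/G) value per spot.
    flagged:    indices of spots rejected beforehand (e.g. false positives);
                they are excluded from the median and returned as None.
    """
    flagged = set(flagged or ())
    kept = [v for i, v in enumerate(log_ratios) if i not in flagged]
    if not kept:
        raise ValueError("no spots left after filtering")
    shift = statistics.median(kept)
    return [None if i in flagged else v - shift
            for i, v in enumerate(log_ratios)]

# The fourth spot is treated as a false positive and ignored.
print(median_center([1.0, 0.5, 2.0, 9.0], flagged={3}))  # [0.0, -0.5, 1.0, None]
```

The median is used here instead of the mean so that a few strongly regulated genes do not move the reference level.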


Support Vector Machine

Like conventional neural networks and radial basis function networks, the support vector
machine (SVM), a universal approximator, has recently been widely used for pattern
recognition and nonlinear regression. Basically, the SVM is a linear machine that is an
approximate implementation of structural risk minimization, since it considers the
generalization error rate rather than only the training error rate. Accordingly, unlike
common pattern classifiers, the SVM is constructed from the few training samples that
contribute most to pattern separation.

Generally, the SVM has two computational steps for nonlinearly separable patterns. First, the
data are nonlinearly mapped from the original space into a high-dimensional feature space
where the patterns are linearly separable with high probability. Typical inner-product
kernels $K(x, x_i)$ of an input vector $x$ and an input pattern $x_i$ are polynomial and
radial basis functions. The inner-product kernel carries out the nonlinear transformation of
the original multi-dimensional inputs.
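
The two kernel families named above can be written down in a few lines. This is a minimal sketch in their usual textbook forms; the hyperparameters `degree` and `sigma` and their default values are assumptions, since the text does not give the settings used.

```python
import math

def polynomial_kernel(x, xi, degree=2):
    """Polynomial inner-product kernel: (x . xi + 1) ** degree."""
    dot = sum(a * b for a, b in zip(x, xi))
    return (dot + 1.0) ** degree

def rbf_kernel(x, xi, sigma=1.0):
    """Radial basis function kernel: exp(-||x - xi||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

x = [1.0, 2.0]
xi = [3.0, 4.0]
print(polynomial_kernel(x, xi))  # 144.0
print(rbf_kernel(x, x))          # 1.0
```

Both functions return the inner product of the two inputs in an implicit feature space, so the high-dimensional mapping itself never has to be computed.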

Then, the optimal hyperplane, defined as a linear function of the feature vector, is designed
to separate the features. We may state the primal problem for the constrained optimization of
a support vector machine. By solving this optimization problem, the weights and bias, that
is, the linear parameters of the discriminant function, can be obtained.
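
The primal problem is not written out in the text. For reference, a standard textbook soft-margin form is sketched below. The symbols are assumptions introduced here: $\mathbf{w}$ and $b$ are the weights and bias, $d_i \in \{-1, +1\}$ is the class label of training pattern $\mathbf{x}_i$, $\boldsymbol{\varphi}$ is the feature map with $K(\mathbf{x}, \mathbf{x}_i) = \boldsymbol{\varphi}(\mathbf{x})^{T}\boldsymbol{\varphi}(\mathbf{x}_i)$, $\xi_i$ are slack variables and $C$ is a user-chosen penalty. The authors may have used a different variant.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  \frac{1}{2}\,\mathbf{w}^{T}\mathbf{w} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
  d_i \left( \mathbf{w}^{T} \boldsymbol{\varphi}(\mathbf{x}_i) + b \right) \ge 1 - \xi_i,
  \quad \xi_i \ge 0, \quad i = 1, \dots, N
```

In this form the first term controls the margin (and hence the generalization error), while the second term penalizes training errors.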

In this stage, we set out to develop a systematic method for HCC classification and to test
the significance of the microarray data.


Gene selection

Biological comparisons made on microarrays are very specific in nature; that is, it is
believed that only a small number of genes on the array are differentially expressed [5].
Therefore, this stage is the most important for identifying the HCC subtypes and revealing
the cause of HCC.

We identified the possibility of discriminating two separate classes of HCC. Our interest
then concentrated on determining which genes are significant for extracting meaningful
outcomes. The key concept we suggest is to select the genes that discriminate the HCC classes
explicitly, based on the canonical correlation between the genes and the class labels.


$$\arg\max_{\text{genes}} \; \{\mathrm{CORR}(\text{genes}, \text{class labels})\}$$

where CORR is the function that measures how strongly the genes are correlated with the
class labels.

All well-known genes, which can be found in gene-related databases, should be covered by the
selected genes. Alternatively, to find the HCC-related genes, we used a canonical variable
that maximizes the separation of these two known classes. In this research, we combine the
two methods to confirm the selected genes and reduce the computational effort.
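
A simplified, per-gene version of this selection criterion can be sketched as follows. For a single gene and a two-class label, the canonical correlation reduces to the absolute Pearson correlation, so the sketch ranks genes by that value. This is an illustration under that assumption, not the multivariate procedure of the study; the function names, gene names and expression values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_genes(expression, labels, top=2):
    """Return the `top` genes with the largest |correlation| to the class labels.

    expression: dict mapping gene name -> list of log ratios, one per sample.
    labels:     class label per sample, coded 0 or 1.
    """
    scored = [(abs(pearson(values, labels)), gene)
              for gene, values in expression.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:top]]

# Hypothetical data: 4 solid-type samples (0) and 4 pseudoglandular samples (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "gene_a": [0.1, 0.2, 0.1, 0.2, 1.9, 2.0, 2.1, 2.0],      # higher in class 1
    "gene_b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # unrelated to class
    "gene_c": [2.0, 2.1, 1.9, 2.0, 0.0, 0.1, -0.1, 0.0],     # higher in class 0
}
print(rank_genes(expression, labels))
```

Taking the absolute value keeps genes that are repressed in one subtype as well as genes that are induced, since both separate the classes.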


Advanced K-means clustering

The K-means clustering method is used to validate the selected genes and to investigate
unknown HCC subtypes. The method is slightly modified to add a cluster-merging feature, so
we do not need to predetermine the number of data partitions. Nevertheless, it should be
noted that the number of radial basis functions determined by an arbitrary clustering
algorithm is not necessarily optimal, since it is chosen by considering only the input data
pattern, without the output data pattern. The cluster-merging K-means algorithm is as
follows.


a. Choose the initial number of data partitions, $M$ ($M < N$).
   a.1 For $i = 1, \ldots, M$:
       If $i = 1$, randomly select a cluster center among all $N$ data samples.
       Otherwise, [...]

b. While $i \le i_{\max}$:
   b.1 Reassign each sample to the corresponding cluster and calculate the overall cost
       function [...], where $G_j$ is the $j$th cluster group.
   b.2 If the number of samples belonging to $c_j$ is less than the predefined minimum
       number for each cluster, merge $c_j$ into the nearest cluster. Then, the number of
       clusters in the $(i+1)$th step is [...].

c. If [...] or $J \le J_{\lim}$, stop; otherwise, go to step b.
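
The steps above can be sketched in plain Python. Several details are assumptions, because the corresponding formulas are not recoverable from the text: all initial centers are drawn at random (or passed in through the hypothetical `init` argument), the cost is the sum of squared distances to the assigned center, and an undersized cluster is merged by dropping its center so that its samples join the nearest surviving cluster on the next pass. The names `merging_kmeans`, `min_size`, `max_iter` and `j_lim` are illustrative.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def assign(data, centers):
    # Index of the nearest center for each sample; ties go to the lowest index.
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in data]

def total_cost(data, centers, labels):
    return sum(sq_dist(p, centers[a]) for p, a in zip(data, labels))

def merging_kmeans(data, m, min_size=2, max_iter=50, j_lim=0.0, init=None, seed=0):
    """K-means that starts with m clusters and merges undersized ones."""
    rng = random.Random(seed)
    # Step a: initial centers (assumption: random samples unless `init` is given).
    centers = [list(c) for c in (init if init is not None else rng.sample(data, m))]
    for _ in range(max_iter):                       # step b: iteration limit
        labels = assign(data, centers)              # b.1: reassign samples
        if total_cost(data, centers, labels) <= j_lim:
            break                                   # step c: cost small enough
        groups = [[p for p, a in zip(data, labels) if a == j]
                  for j in range(len(centers))]
        kept = [g for g in groups if len(g) >= min_size]   # b.2: drop small clusters
        if not kept:
            kept = [list(data)]                     # degenerate case: one cluster
        updated = [centroid(g) for g in kept]
        if updated == centers:
            break                                   # step c: nothing changed
        centers = updated
    labels = assign(data, centers)
    return centers, labels, total_cost(data, centers, labels)

# Two well-separated groups; three initial centers, one of which gets merged away.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, labels, cost = merging_kmeans(
    data, m=3, init=[(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)])
print(centers)  # [[0.5, 0.5], [10.5, 10.5]]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
print(cost)     # 4.0
```

Starting with more clusters than expected and letting small ones merge is what removes the need to fix the number of partitions in advance; the result still depends on `min_size` and on the initial centers.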


Conclusion


We present here a new general methodology for discovering HCC-specific genes that
determine the morphologic appearance. The HCC-related genes can then be confirmed by
GenBank (www.ncbi.nlm.nih.gov), the published literature or Northern blot analysis. For
example, Cyr61, a member of the CCN family [6] found by the data analysis, is expressed in
developing mouse cartilaginous elements; in addition, it promotes tumor growth and
vascularization. As another example, osteonectin [7] is known to be a glycoprotein involved
in extracellular matrix remodeling. The selected genes include useful ones associated with
tumor growth, transcriptional regulation and intracellular signaling, such as
androgen-regulated protein FAR-17 [8], protective protein [9] and nuclear antigen H731 [10].


Most importantly, the proposed methodology provides a general diagnostic approach for
distinguishing cancer subtypes and for determining drug targets without prior biological
knowledge.


Acknowledgement

This work was supported by the Brain Korea 21 project.


Figure 1. Class discovery with MDS and class prediction of HCC based on SVM

References

(1) Llovet, J M; Brú, C; Bruix, J. Seminars in Liver Disease, Volume 19, Issue 3, 1999,
Pages 329-338.

(2) Llado, L; Virgili, J; Figueras, J; Valls, C; Dominguez, J; Rafecas, A; Torras, J;
Fabregat, J; Guardiola, J; Jaurrieta, E. Cancer, Volume 88, Issue 1, January 1, 2000,
Pages 50-57.

(3) Golub, T R; Slonim, D K; Tamayo, P; Huard, C; Gaasenbeek, M; Mesirov, J P; Coller, H;
Loh, M L; Downing, J R; Caligiuri, M A; Bloomfield, C D; Lander, E S. Science, Volume 286,
Issue 5439, October 15, 1999, Pages 531-537.

(4) Tseng, G C; Oh, M K; Rohlin, L; Liao, J C; Wong, W H. Nucleic Acids Research (Online),
Volume 29, Issue 12, June 15, 2001, Pages 2549-2557.

(5) Beissbarth, T; Fellenberg, K; Brors, B; Arribas-Prat, R; Boer, J; Hauser, N C;
Scheideler, M; Hoheisel, J D; Schütz, G; Poustka, A; Vingron, M. Bioinformatics (Oxford,
England), Volume 16, Issue 11, November 2000, Pages 1014-1022.

(6) Shoji Hirasaki, Norio Koide, Kozo Ujike, Toshiyuki Shinji and Takao Tsuji. Expression of
Nov, CYR61 and CTGF genes in human hepatocellular carcinoma. Hepatology Research, Volume 19,
Issue 3, 26 March 2001, Pages 294-305.

(7) Le Bail, B; Faouzi, S; Boussarie, L; Guirouilh, J; Blanc, J F; Carles, J; Bioulac-Sage, P;
Balabaud, C; Rosenbaum, J. Osteonectin/SPARC is overexpressed in human hepatocellular
carcinoma. The Journal of Pathology, Volume 189, Issue 1, September 1999, Pages 46-52.

(8) H. van Ojik, G. Groenewegen, R. Repp, T. Valerius, A. van Oers, N. A. C. Westerdaal,
Y. Deo, G. H. Blijham and J. G. J. van de Winkel. Phase I trial of MDXH210 (FcγRI ×
HER-2/neu, bispecific antibody) in combination with G-CSF in patients with metastatic breast
cancer. The Netherlands Journal of Medicine, Volume 48, Issue 5, May 1996, Page A59.

(9) Cardier, J E; Schulte, T; Kammer, H; Kwak, J; Cardier, M. Fas (CD95, APO-1) antigen
expression and function in murine liver endothelial cells: implications for the regulation
of apoptosis in liver endothelial cells. The FASEB Journal: Official Publication of the
Federation of American Societies for Experimental Biology, Volume 13, Issue 14,
November 1999, Pages 1950-1960.

(10) Cmarik, J L; Min, H; Hegamyer, G; Zhan, S; Kulesz-Martin, M; Yoshinaga, H;
Matsuhashi, S; Colburn, N H. Differentially expressed protein Pdcd4 inhibits tumor
promoter-induced neoplastic transformation. Proceedings of the National Academy of
Sciences of the United States of America, Volume 96, Issue 24, November 23, 1999,
Pages 14037-14042.