Bioinformatics - pluses and minuses - the Wales Cancer Institute

Posted by fleagoldfish in Biotechnology, 2 Oct 2013

Dr Paul Lewis

Lecturer in Bioinformatics

Swansea Medical School

p.d.lewis@swansea.ac.uk

Clinical Bioinformatics

Tissue


Data



Clinic

What is bioinformatics?

The NIH Biomedical Information Science and Technology Initiative Consortium has defined bioinformatics as:

‘research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioural or health data, including those to acquire, store, organize, archive, analyze or visualise such data’

Cancer Informatics Resources

UK
NCRI (National Cancer Research Institute)
UK Government, CRUK, MRC, Wellcome Trust, …….

USA
NCI (National Cancer Institute)
caBIG, Center for Bioinformatics

What is clinical (bio)informatics?

Translational Cancer Research & Clinical Bioinformatics

HETEROGENEOUS DATA: genomic, proteomic, clinical, transcriptomic, biological, histology, images, patient, biomarkers, clinical trial

DATAMINE

tissue, data, patient, demographic

Heterogeneous data means diverse types of data from different sources.

Data can be …

- heterogeneous by source … e.g. databases, text files, image files of various formats, questionnaires
- heterogeneous by constitution (type) … e.g. histology scores, microarray, proteomic, image, age, sex
- heterogeneous by structure … ‘raw’ data can be numerical, categorical etc

Heterogeneous data can vary internally.

Data Integration

HETEROGENEOUS DATA
- Microarray data
- Proteomic data
- Histology marker scores
- Patient data, e.g. age, sex

Data Structure: Numerical, Categorical
Data Source: Database, Text Files, Web, Image Bank

DATAMINE

Needs:
- Data integration methods
- Data standardisation
- High performance computing
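As a rough illustration of the integration and standardisation needs listed above, the following minimal Python sketch joins records from separate sources and standardises a numerical field. The patient identifiers, field names and values are all hypothetical, and the slides do not specify a schema or a join key; a shared patient ID is assumed here.

```python
from statistics import mean, stdev

# Hypothetical records from separate sources, keyed by an ASSUMED shared
# patient identifier (the slides do not define a key or schema).
microarray = {"P1": 2.1, "P2": 0.4, "P3": 1.3, "P4": 0.9}   # numerical
histology = {"P1": "high", "P2": "low", "P3": "high", "P4": "low"}  # categorical
patients = {
    "P1": {"age": 42, "sex": "F"},
    "P2": {"age": 51, "sex": "F"},
    "P3": {"age": 38, "sex": "F"},
}  # P4 is deliberately missing from this source


def zscore(values):
    """Standardise a {patient: number} mapping to mean 0 and sample SD 1."""
    m = mean(values.values())
    s = stdev(values.values())
    return {k: (v - m) / s for k, v in values.items()}


def integrate(microarray, histology, patients):
    """Join the sources on patient ID, keeping patients present in all of them.

    Expression values are standardised over every patient in the microarray
    source before the join.
    """
    expr = zscore(microarray)
    shared = sorted(set(expr) & set(histology) & set(patients))
    return [
        {"patient": pid, "expression_z": expr[pid],
         "histology": histology[pid], **patients[pid]}
        for pid in shared
    ]


records = integrate(microarray, histology, patients)
```

Each entry in `records` mixes a standardised numerical value with categorical and demographic fields, which is the kind of mixed record a downstream data-mining step would have to handle.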


Clinical trials involving microarray data

Nature 2002, 415: 530-535

- Breast cancer patients with the same stage of disease can have markedly different responses & overall outcome
- Strongest predictors for metastases (e.g. lymph node status, histological grade) fail to classify breast tumours accurately according to their clinical behaviour
- Chemotherapy or hormonal therapy reduces the risk of distant metastases by ~ 1/3
- 70-80% of patients receiving this treatment would have survived without it

Overall Objective

Find gene expression signatures of breast cancer that predict survival & allow for patient-tailored therapy strategies

Patients (young)

117 primary breast cancers (lymph node -ve, < 55 yrs)

98 ‘experimental’ tumours
- 34 developed distant metastases within 5 years
- 44 remained disease free after at least 5 years
- 18 with BRCA1 germline mutations
- 2 BRCA2 carriers

19 ‘test’ tumours
- 12 developed distant metastases within 5 years
- 7 remained disease free after at least 5 years

Group 1: 62 tumours; 34% of sporadic tumours developed distant metastases within 5 yrs (good prognosis)

Group 2: 36 tumours; 70% of sporadic tumours developed distant metastases within 5 yrs (poor prognosis)

This method distinguishes between good and bad prognosis tumours

Predictive Power of Classifier for Clinical Outcome

Used a statistical classification method to identify a 70-gene signature that predicted survival

Women under 55 diagnosed with lymph-node-negative breast cancer with a poor prognosis signature have a 15-fold odds ratio (OR) to develop metastases within 5 years compared to those that have a good prognosis signature

This predictive value of the classifier is superior to the currently available clinical & histopathological prognostic factors:

Factor                OR
High Grade            6.4
Tumour size > 2cm     4.4
Angioinvasion         4.2
Age <= 40             3.7
ER -ve                2.4
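For readers unfamiliar with the odds ratios quoted on this slide, the short Python sketch below shows how an OR is computed from a 2x2 table of outcomes. The counts are hypothetical and are not taken from the study; the function and argument names are illustrative only.

```python
def odds_ratio(group_a_events, group_a_no_events, group_b_events, group_b_no_events):
    """Odds ratio from a 2x2 table: (a / b) / (c / d) = (a * d) / (b * c)."""
    if group_a_no_events == 0 or group_b_events == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (group_a_events * group_b_no_events) / (group_a_no_events * group_b_events)


# HYPOTHETICAL counts, not from the published data:
# poor-prognosis signature: 20 with metastases, 10 disease free
# good-prognosis signature:  5 with metastases, 20 disease free
example_or = odds_ratio(20, 10, 5, 20)  # (20 * 20) / (10 * 5) = 8.0
```

An OR of 8 in this made-up table would mean the odds of metastasis are eight times higher in the first group than in the second; published ORs are normally reported with confidence intervals, which this sketch omits.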

Similar study by Wang et al., Lancet 2005

- Training set of 115 node -ve tumours
- Validated on 171 independent tumours
- Outperformed other standard univariate clinical parameters

However . . .

Despite similar clinical & statistical designs, the independent gene signatures only share TWO genes

Maybe due to use of different microarray platforms

Bioinformatic issues with predicting survival from microarray data

- Ein-Dor (Bioinformatics, 2005) reanalysed the van 't Veer data set and showed that the predictive signature was not unique and that multiple signatures exist within the data set that all correlate well with survival

- Gruvberger (Breast Cancer Research, 2002) and Brors (2004) used different classification methods (including support vector machines, ANN, multiple decision trees) on data from other samples and the same samples, and failed to predict relapse with statistical confidence or had different predictive power

- Eden (European Journal of Cancer, 2004) reanalysed the van 't Veer data and argued that “good old” clinical markers, if optimally analysed, might have a similar discriminating power for breast cancer prognosis as microarray gene expression profilers

- Huang (Lancet, 2003) could only identify 17 of the 70 van 't Veer predictor genes using Affymetrix: “….Genomic data will not replace traditional clinical factors, but will add substantial detail to this clinical information”




The way forward?



WHO says….

“Some of the claims for the medical benefits of genomics have undoubtedly been exaggerated, particularly with respect to the time-scale required for them to come to fruition. Because of these uncertainties, it is vital that genomic research is not pursued to the detriment of well-established methods of clinical and epidemiological research. Indeed, for its full exploitation it will need to be integrated into clinical research involving patients and into epidemiological studies in the community. It is crucially important that a balance is maintained in medical practice and research between genomics and these more conventional and well tried approaches.”





The way forward?

Brenton et al., Journal of Clinical Oncology 2005

Propose a way forward:

1. Data from existing signatures should be mined using different algorithms to find an overlapping consensus set of predictive genes

2. Large retrospective studies should be performed using legacy tumour banks to generate a more definitive breast cancer taxonomy, validated prospectively
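The first proposal, finding an overlapping consensus set of genes across signatures, can be sketched very simply in Python. This is only one possible reading of "consensus" (genes that recur in several signatures); the gene names, algorithm labels and the `min_support` threshold are hypothetical placeholders, not content from any published signature.

```python
from collections import Counter


def consensus_genes(signatures, min_support=2):
    """Return, sorted, the genes found in at least `min_support` signatures."""
    counts = Counter(
        gene for genes in signatures.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= min_support)


# HYPOTHETICAL signatures produced by three different algorithms.
signatures = {
    "algorithm_A": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "algorithm_B": ["GENE2", "GENE4", "GENE5"],
    "algorithm_C": ["GENE4", "GENE5", "GENE6"],
}

in_all = consensus_genes(signatures, min_support=len(signatures))
in_most = consensus_genes(signatures, min_support=2)
```

Requiring a gene to appear in every signature is the strictest choice; lowering `min_support` trades strictness for a larger candidate set.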

However…

Microarray tumour expression data brings additional knowledge to the table, in particular insights into the important genes & pathways that underlie the disease outcome

The way forward?

TISSUE MICROARRAYS (TMA)

The same bioinformatic approaches can be used on
histological data

The same bioinformatic approaches can be used on
mixed patient and clinical data

Unsupervised/supervised clustering

Underlying Groups

“Cluster”: HCA, SOM, K-Means, etc…

“Classify & Predict”: nearest neighbour, decision trees, neural nets, SVMs, rule induction, etc….

Data Reduction: PCA, CoA, MDS, etc…

New Disease Subtypes
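To make the “Classify & Predict” family concrete, here is a minimal Python sketch of the simplest method named, a 1-nearest-neighbour classifier using Euclidean distance. The two-feature samples and the "good"/"poor" labels are hypothetical; a real analysis would use many more features and proper validation.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training sample closest to `query`.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


# HYPOTHETICAL training samples: (feature vector, prognosis label).
train = [
    ((0.2, 0.1), "good"),
    ((0.3, 0.4), "good"),
    ((0.9, 0.8), "poor"),
    ((0.8, 1.0), "poor"),
]

prediction = nearest_neighbour(train, (0.9, 0.9))
```

The same function works for any numerical feature vector, whether the features come from expression values or histology scores, as long as they are on comparable scales.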

Data Integration & Analysis Platform

HETEROGENEOUS DATA: Clinical & Biological Data

DATAMINE / Knowledge Discovery

Data Exploration, Data Visualisation, Statistical Analysis, Model Building

Consensus . . .

Analysis of heterogeneous data sets will need supercomputing power

CTBi Copyright (2004) Paul Lewis



There are a number of important informatics issues that need to be addressed when considering data sets generated from tissue banks:
- Data integration
- Data mining

We can apply the same or similar methods to different data types individually
- How can we do this for mixed heterogeneous data sets?
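One commonly used way to compare records that mix numerical and categorical fields is a Gower-style distance; the slides do not name a method, so this is offered only as an illustrative option. In the Python sketch below, the field names, values and ranges are hypothetical, and the ranges are assumed to be the max minus min of each numerical field across the cohort.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style distance between two records with mixed feature types.

    Numerical features (those listed in `numeric_ranges`) contribute
    |a - b| / range; all other features are treated as categorical and
    contribute 0 when equal, 1 otherwise. The result is the mean
    contribution, which lies in [0, 1] when values fall within the ranges.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# HYPOTHETICAL patient records mixing numerical and categorical fields.
p1 = {"age": 40, "expression": 2.0, "grade": "high", "sex": "F"}
p2 = {"age": 50, "expression": 1.0, "grade": "low", "sex": "F"}
ranges = {"age": 20.0, "expression": 4.0}  # assumed cohort-wide max - min

d = gower_distance(p1, p2, ranges)  # (0.5 + 0.25 + 1 + 0) / 4 = 0.4375
```

A distance of this kind could be fed into clustering or nearest-neighbour methods, so that the same families of algorithm apply to mixed data sets.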


Conclusions

- Need an infrastructure for data integration
- Need standards for different data types
- Need supercomputing power for heterogeneous data analysis