UK e-Infrastructure: an Industry Perspective

Posted 6 Nov 2013

Darren Green FRSC

GlaxoSmithKline

UK e-infrastructure Leadership Council

Life sciences and the UK economy

“The UK life science industry is one of the world leaders; it is the third largest contributor to economic growth in the UK with more than 4,000 companies, employing around 160,000 people and with a total annual turnover of over £50 billion. Its success is key to future economic growth and to our goal to rebalance the economy towards making new products and selling them to the world. Globally the industry is changing with more focus on collaboration, out-sourcing of research and earlier clinical trials with patients”

David Cameron, 5th December 2011

The R&D Productivity Gap

Figure: New Drug Approvals (NMEs) and PhRMA Member R&D Spending ($ billions) by year, 1992 to 2007. Data labels are listed in chart order.

New Drug Approvals (NMEs): 26, 25, 22, 28, 53, 39, 30, 35, 27, 24, 17, 21, 31, 18, 18, 14

PhRMA Member R&D Spending ($ billions): 12, 13, 13, 15, 17, 19, 21, 23, 26, 30, 32, 33, 39, 39, 43, 54

Source: Burrill & Company; US Food and Drug Administration.

Note: NMEs do not include BLAs

UK “Big Pharma” Research sites 2001

UK “Big Pharma” Research sites 2012

GSK is evolving from a monolith

Virtualization of Drug Discovery: adding external efforts to internal research

Figure axes: Internal Resources to External Resources; Centralized Control/Management to De-Centralized Control/Management

Figure labels: Pharma, CEDDs, CEEDD

>40 internal engines

35 external engines

Corporate Venture Fund

New/expanded in 2008/2009

Lead Optimisation within Drug Discovery

Figure: gene, protein target, screen and identify lead (using chemical diversity from the compound library), lead optimisation, test safety & efficacy in animals and humans

Stages: Targets, Hits, Leads, Candidates, Drugs, Products

The Lead Optimisation cycle

“Rational” drug design

Most design methodologies are aimed at reducing the number of cycles in lead optimisation, ideally to 1!

All design methodologies, to date, have had limited success in this regard

A multi-objective optimisation

Figure labels: Lead (X) and Drug (X) plotted on axes PC1 and PC2; objectives Potency, Solubility, Absorption, Metabolic stability, Safety

Traditional Way: Sequential Process, Costly, Lengthy

Desired: faster navigation through multi-dimensional space, by reducing the cycles or speeding them up
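
The multi-objective framing can be made concrete with a small sketch. The Python below is not from the talk; it is one common way to handle competing objectives, namely keeping only the candidates on the Pareto front (those that no other candidate beats on every objective at once). The compound names and scores are invented for illustration, and all scores are assumed to be scaled so that higher is better.

```python
from typing import Dict, List

# Hypothetical per-compound scores, all scaled so that higher is better.
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        beaten = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return front


# Invented data: four compounds scored on three of the objectives named on the slide.
candidates = {
    "cmpd_A": {"potency": 0.9, "solubility": 0.2, "metabolic_stability": 0.5},
    "cmpd_B": {"potency": 0.6, "solubility": 0.7, "metabolic_stability": 0.6},
    "cmpd_C": {"potency": 0.5, "solubility": 0.6, "metabolic_stability": 0.5},
    "cmpd_D": {"potency": 0.9, "solubility": 0.2, "metabolic_stability": 0.4},
}
front = pareto_front(candidates)
print(front)
```

Here cmpd_C and cmpd_D drop out because cmpd_B and cmpd_A respectively are at least as good on every objective, while cmpd_A and cmpd_B remain as genuine trade-offs between potency and solubility.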

A huge search space

Small organic molecule property space:

Atomic basis set small for organic reagents: H, C, N, O, S, F, Cl, Br, P

Carbon connectivity is not just linear

Approximately 10^27 molecules of 25 atoms

References:

Fink & Reymond, J. Chem. Inf. Model. 47 (2007) 342-353

Fink et al., Angew. Chem. Int. Ed. 44 (2005) 1504-1508

http://www.dcb.unibe.ch/groups/reymond/

Typical HPC usage

Coarse grain parallelisation

Same calculation across large numbers of molecules

In decreasing frequency of use:

Simple properties

Docking/scoring

Quantum mechanics
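
As an illustration of the coarse-grained pattern, the sketch below applies one independent calculation (a simple property, molecular weight) to every molecule in a small library. It is a minimal example under stated assumptions, not a description of any system mentioned in the talk: the library is invented, molecules are represented as element counts, and a thread pool stands in for whatever scheduler or cluster would distribute the work in practice. For CPU-bound calculations the `executor_cls` argument could be swapped for `concurrent.futures.ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Rounded standard atomic weights for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(formula: Dict[str, int]) -> float:
    """A 'simple property': sum of atomic masses for one molecule."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def map_over_library(library: List[Dict[str, int]], workers: int = 4,
                     executor_cls=ThreadPoolExecutor) -> List[float]:
    """Run the same calculation on every molecule; each task is independent."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(molecular_weight, library))


# Invented three-molecule library, given as element counts.
library = [
    {"C": 2, "H": 6, "O": 1},  # ethanol
    {"C": 6, "H": 6},          # benzene
    {"C": 1, "H": 4},          # methane
]
weights = map_over_library(library)
print(weights)
```

Because no molecule's result depends on another's, the same structure applies to docking/scoring or quantum mechanics jobs; only the cost of each task changes.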


Green Chemistry

Sustainable Development: “meeting the needs of the present without compromising the ability of future generations to meet their own needs”.*

Green Chemistry**: “To promote innovative chemical technologies that reduce or eliminate the use or generation of hazardous substances in the design, manufacture and use of chemical products.”

* United Nations World Commission on Environment and Development, 1987

** US Environmental Protection Agency, 1990s

Enzyme design


Proteins that catalyse a chemical reaction


Substrate + Enzyme = Product + Enzyme







Proteins are linear assemblies of amino acids that
have a biological function



Example: Penicillin G Acylases in the production of semi-synthetic penicillins and cephalosporins

Pen G Acylase (PGA) has been used since the 60s to make 6-aminopenicillanic acid (6-APA) from Penicillin G

More recently, it has also been used in the reverse direction to synthesise penicillins and cephalosporins by catalysing the condensation of phenylacetic acid derivatives with a beta-lactam









Reaction scheme: Penicillin G, catalysed by PGA, gives 6-APA + Phenylacetic acid

The challenge

To be able to design enzymes which are able to synthesise precisely the drug substance that is required, with the efficiency needed for manufacturing

This will require:

Libraries of existing enzymes for standard chemical bond formation (e.g. amides)

Reliable methods for ab initio design/evolution of novel enzymes for specific purposes

Synthetic Biology has been identified by the Technology Strategy Board as a priority area of investment

A(nother) huge search space

Protein property space:

20 amino acids in ~10 groups: G, A, S/T, C, P, D/E, R/K, N/Q, H/F/W/Y, I/L/M/V

Linear combination of amino acids: 20^n permutations

For n = 100 (a rather small protein) the number 20^100 (~1.3x10^130) is already far greater than the number of atoms in the known universe. Even a library with the mass of the Earth itself (5.98x10^27 g) would comprise at most 3.3x10^47 different sequences
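
The arithmetic behind these two figures can be checked with a few lines of Python. This is a sketch under one assumption that is not stated on the slide: an average amino acid residue mass of about 110 Da, a widely used rule of thumb, so that a 100-residue protein has a molar mass of roughly 11,000 g/mol. The Earth mass is the value quoted on the slide.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
EARTH_MASS_G = 5.98e27        # grams, as quoted on the slide
AVG_RESIDUE_MASS_DA = 110.0   # assumption: rule-of-thumb average residue mass


def sequence_count(n_residues: int, alphabet: int = 20) -> int:
    """Number of distinct linear sequences of the given length."""
    return alphabet ** n_residues


def max_library_size(mass_g: float, n_residues: int) -> float:
    """Upper bound on protein molecules (one copy per sequence) in a given mass."""
    molar_mass = n_residues * AVG_RESIDUE_MASS_DA  # g/mol
    return mass_g / molar_mass * AVOGADRO


total = sequence_count(100)
library = max_library_size(EARTH_MASS_G, 100)
coverage = library / total  # fraction of sequence space such a library could hold

print(f"sequences of length 100: {total:.1e}")
print(f"Earth-mass library:      {library:.1e}")
print(f"fraction covered:        {coverage:.1e}")
```

Under that assumption both results agree with the slide to two significant figures, and the fraction covered shows why exhaustive enumeration is not an option.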

Rational approach

Use x-ray diffraction crystal structure information

View in graphics software

Identify binding pocket

Identify (or propose) binding mode: information from similar ligands or molecular docking software

Identify amino acids surrounding pocket

Find bacterial sequences with variants in pocket: use multiple sequence alignment
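
The last step, finding variants at pocket positions from an alignment, can be sketched as follows. This is an illustrative Python example only: the toy alignment, the sequence names and the pocket column indices are all invented, and in practice the aligned sequences would come from a multiple sequence alignment tool and the pocket columns from the structural analysis in the earlier steps.

```python
from collections import Counter
from typing import Dict, List


def pocket_variants(alignment: Dict[str, str], reference: str,
                    pocket_columns: List[int]) -> Dict[int, Counter]:
    """For each pocket column, count residues that differ from the reference.

    Gap characters ('-') are ignored. Columns are 0-based alignment indices.
    """
    ref_seq = alignment[reference]
    variants = {}
    for col in pocket_columns:
        variants[col] = Counter(
            seq[col]
            for name, seq in alignment.items()
            if name != reference and seq[col] not in ("-", ref_seq[col])
        )
    return variants


# Invented toy alignment: equal-length rows, '-' marks a gap.
alignment = {
    "ref":  "MKTAYFGS",
    "seqA": "MKSAYFGS",
    "seqB": "MKTAWFGS",
    "seqC": "MK-AWFGA",
}
result = pocket_variants(alignment, "ref", pocket_columns=[2, 4, 7])
for col, counts in result.items():
    print(col, dict(counts))
```

Each reported substitution is a naturally occurring variant at a pocket position, which makes it a candidate mutation to try in the reference enzyme.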

HPC applications

QM/Simulation for rational approaches

Ability to test millions of mutations in silico

Empirical/statistical algorithms for efficiently searching/sampling very large search spaces
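
To give a flavour of what searching such a space in silico looks like, here is a deliberately simple mutate-and-select loop in Python. It is a sketch, not a method described in the talk: the scoring function is a toy (similarity to an invented target sequence) standing in for whatever QM, simulation or statistical model would score a real mutant, and practical work would use more sophisticated sampling.

```python
import random
from typing import Callable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def mutate_and_select(start: str, score: Callable[[str], float],
                      steps: int, seed: int = 0) -> Tuple[str, float]:
    """Propose single-point mutations; keep a mutant if it scores no worse."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        candidate_score = score(candidate)
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-in for an expensive in silico evaluation (purely illustrative).
TARGET = "MKTAYIAKQR"


def toy_score(seq: str) -> float:
    return sum(a == b for a, b in zip(seq, TARGET))


start = "A" * len(TARGET)
best, best_score = mutate_and_select(start, toy_score, steps=2000)
print(best, best_score)
```

The loop evaluates 2,000 sequences out of 20^10 possibilities, which is the point of the slide: the search has to be guided, and each evaluation is where the HPC cost sits.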

Translational Medicine


Biomedical research that aims to translate between Clinical Practice and Laboratory research.

Most translational studies are focused on the identification and validation of biomarkers that are testable in patients, including markers that are predictive of:

the prognosis of disease (severity)

how well a patient may respond to a pharmacological therapy

the susceptibility of a patient to side effects of therapeutic intervention

the identification of subgroups that are at increased risk for disease


Potential Impact of Translational Medicine


Clinical trial design


Design of diagnostics


Targeted prescribing of medicines



Personalised Medicine


What needs to come together?

Scientific Discipline: Infrastructure Components

Clinical Sciences:
Document Management to manage trial approval and patient consent forms
Electronic Case Report Form (eCRF) data collection systems
Clinical Data Management platforms
Clinical Statistics Platforms
Medical History records (eHRs)

Biobank:
Document Management to manage trial approval and patient consent forms
Laboratory Information Management Systems (LIMS) for tracking the location of samples

Biological Sciences (Bench):
Electronic Notebooks to capture specific experiments

Biological Sciences (High Dimensional Biology):
LIMS systems to organise workflow and capture result files
Data Storage Archives to store large primary data files from analytical platforms (imaging, NGS, omics, etc)

Biostatistics/Bioinformatics:
Statistical/data programming environments for processing and analysing data
Reference Databases of biological information

Knowledge Management/Systems Biology:
Tools to capture results and output of all experiments
Modelling tools to combine data from all domains for analysis
Reference knowledge (literature, pathway knowledge, etc)

The infrastructure challenge


Re-useable, secure infrastructure service and components that can be rapidly re-deployed and configured for cross-organisational investigations.

The key features of such a platform include:

multi-terabytes of storage

rigorous access control (critical in handling patient data)

data governance and curation services

standardised dictionaries, ontologies and APIs

ETL tools to carry out loading of data, high bandwidth connections to data provision centres

data modules enabling the management of a wide range of data modalities

patient and sample level data tracking (enabling data retraction)

collaborative search and analytics tools

virtual team collaboration spaces

All of which are available as a sustainable service which can either host multiple collaborations or be flexibly deployed to meet the needs of specific collaborations.

On top of this such an infrastructure needs secure connections with medical eHR systems, biobanks and LIMS systems.

HPC usage by industry: current


Internal systems: Linux clusters

Commercial: Small use of commercial clouds

Some examples of large public cloud usage: Inhibox/Amazon



Industry use of UK e-infrastructure

“In the domain of high performance computing for life sciences, the Science and Technology Facilities Council (STFC) runs an e-science project with a 10-year history. We are not aware of any life science company that makes use of these resources”*

* Response from the industry leads of the EU OpenPhacts IMI project to UK Research Council 2012

Barriers we need to overcome


Industry engagement



Software



Security



Data transfer



Domain Knowledge



Summary


Industrial applications of HPC are emerging


Life science research increasingly involves collaboration

Requirements of life sciences companies are diverse

UK HPC will need to evolve and differentiate itself from commercial offerings


There is an opportunity for us to create
something unique