Life Sciences + IT = Bioinformatics

© 2004 IBM Corporation

EPCC MSc Lecture

27 February 2004

Life Sciences + IT = Bioinformatics

Chris Thomas

Programme Leader for Life Sciences and HPC

chrisw_thomas@uk.ibm.com

Hursley Technology & e-Solutions

Bioinformatics

The merger of biotechnology and information
technology with the goal of revealing new insights and
principles in biology

(National Center for Biotechnology Information)


Agenda


Introduction


A quick science lesson


The marketplace


Some examples


Proteomics


Focused pharmaceuticals


Biomedical simulation


Microarrays


mHealth


Data mining



Only a VERY brief look at a vast subject

IBM Hursley


IBM’s largest development lab outside US


$1 billion revenue


1,500+ employees


Focus on middleware software


CICS is “the” transaction processing s/w


MQ messaging


Web services


Java Technology Centre


Also a home for specialist services….

Some basic science - 1


Inside the cell nucleus, 6 feet of DNA is packaged into
23 pairs of chromosomes, one chromosome of each
pair coming from each parent.


Each of the roughly 100 trillion cells in the human body (except
mature red blood cells, which have no nucleus) contains the human
genome. This information is encoded in about 6 billion subunits of DNA
called base pairs


Each of the 46 human chromosomes contains the
DNA for hundreds or thousands of individual genes
which are the units of heredity.


The specific information contained within DNA is
coded in units called base pairs.

Some basic science - 2


Bases pair in the combinations A-T and G-C, joining two DNA
strands to form the double helix structure


DNA sequences are expressed simply as character
strings….

cctcactcac ttgcccctta caggactcag ctcttgaagg caatagcttt atagaaaaaa

cgaataggaa gacttgaagt gctatttttt tttttttttt tgtcaaggct gctgaagttt

attggcttct catcgtacct aagcctcctg gagcaataaa actgggagaa acttttacca

agatttttat ccctgccttg atatatactt tttcttccaa atgctttggt gggaagaagt

Guanine (G)

Adenine (A)

Thymine (T)

Cytosine (C)
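
Because a sequence is just a character string and the pairing rules are fixed (A-T, G-C), the opposite strand can be derived mechanically. A minimal Python sketch, assuming lower-case input with optional spaces as in the fragment above; the function name `reverse_complement` is illustrative only:

```python
# Pairing rules from the slide: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    cleaned = "".join(sequence.split()).lower()  # drop the spacing between blocks
    return cleaned.translate(COMPLEMENT)[::-1]


fragment = "cctcactcac ttgcccctta"  # first two blocks of the sequence shown above
print(reverse_complement(fragment))
```

Applying the function twice gives back the original string, which is a quick sanity check on the pairing table.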

Some basic science - 3

[Figure: flow of information from DNA to RNA to protein. Transcription (RNA synthesis) copies DNA into mRNA inside the nuclear membrane; translation (protein synthesis) at the ribosome turns mRNA into protein.]

Gene expression is the process whereby the information
“encoded” in the DNA is used to generate proteins
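
The two steps of gene expression can be sketched in a few lines of Python. This is a simplified illustration, not a model of the real cellular machinery: it assumes the input is the coding strand (so transcription is just T to U), uses the standard genetic code, and ignores splicing and regulation. The function names and the example sequence are made up for the illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (thymine is replaced by uracil)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> one-letter amino acid string, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGA")  # illustrative sequence: start codon, one codon, stop
print(mrna, translate(mrna))
```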

Some basic science - 4

DNA → RNA → Protein → Pathways → Phenotype


Historically, 220 targets have generated $3 trillion of value


Industrialised genome sequencing has created a target rich, lead
poor environment


Important to distinguish between


genotype: the genetic makeup, as distinguished from the physical
appearance, of an organism or a group of organisms


phenotype: the observable physical or biochemical characteristics of an
organism, as determined by both genetic makeup and environmental
influences



32,000 genes,
representing
less than 3% of
the genome.

Alternative splicing
turns 32,000 genes
into 500,000
messages

Post-translational
modification turns
500,000 messages
into 1.5 million
proteins

1.5 million proteins
interacting in
complex networks
create hundreds of
millions of metabolic
pathways

Hundreds of millions of
pathways influenced by
the environment and
stochastic processes
create 6 billion different
individuals

Human Genome Project

http://genome.gsc.riken.go.jp/hgmis/posters/chromosome/index.html

The Marketplace


Funded research plays a significant role in Discovery


Clinical development is primarily carried out by major pharma companies


Some funded development


Many small players focusing on niche technologies


Healthcare driven by numerous factors


Demographics


increasing age of population


Cost of care


especially in US


Increasing focus on proactive management of conditions

[Diagram: Discovery, Clinical Development and Healthcare Delivery, with feedback from diagnosis to discovery]

Basic proteomics and sequence matching


“Simplistic” approach to identifying genetic causes of disease


Process in brief



Take a cancer cell and a healthy cell


Extract proteins from each cell


Separate proteins using 2-D electrophoresis on a gel plate


Differences identified through fluorescent markers


Physically extract “novel” protein from gel by cutting plate


Identify protein through spectroscopic analysis


Protein linked back to genetic information


Action of protein in medical pathway investigated


Last stage takes several months of intensive research


Saying


“one gene, one post-doc”


Mixture of wet chemistry, high energy physics and IT

Basic Local Alignment Search Tool (BLAST)


A family of heuristic algorithms which search a database for high-scoring local
alignments to a query, approximating the optimal local alignment


May be used for both protein and nucleotide searches


Algorithm has to manage “gaps”


DNA can mutate, so some minor differences (insertions, substitutions or deletions)
will occur


Many databases exist which can be used for reference


Drosophila


Escherichia coli


Swiss-Prot and the Brookhaven Protein Data Bank (PDB)


GenBank


All variations scale well both horizontally and vertically


Basically integer/string intensive so ideal for low cost Intel/AMD platforms


Database is split and distributed over multiple nodes in a cluster to fit in memory


A query from a user is distributed across a group of nodes spanning the entire search
database


For multi-user scaling just keep adding groups of nodes and balancing across the farm
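
The ideas on this slide (local alignment with gaps, and a database split across nodes) can be illustrated with a small Python sketch. It is not BLAST: it uses the exact Smith-Waterman dynamic programme, which BLAST approximates heuristically for speed, and it simulates the cluster with an in-process loop over shards. The scoring values, record names and sequences are made-up illustrations.

```python
from heapq import nlargest


def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous[j - 1] + (match if q == s else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def split_database(records, node_count):
    """Deal (name, sequence) records round-robin into one shard per node."""
    return [records[i::node_count] for i in range(node_count)]


def search_shard(query, shard, top_n):
    """Work done by one node: score the query against its part of the database."""
    scored = [(local_alignment_score(query, seq), name) for name, seq in shard]
    return nlargest(top_n, scored)


def search_cluster(query, shards, top_n=2):
    """Send the query to every shard and merge the partial hit lists."""
    # On a real cluster each shard would be searched in parallel on its own node.
    partial_hits = [hit for shard in shards for hit in search_shard(query, shard, top_n)]
    return nlargest(top_n, partial_hits)


database = [  # hypothetical records
    ("seq_exact", "ttgacaggactcagctt"),
    ("seq_gap", "acaggctcag"),        # the query with one base deleted
    ("seq_unrelated", "gggggggggg"),
    ("seq_partial", "ccacagga"),
]
shards = split_database(database, node_count=2)
for score, name in search_cluster("acaggactcag", shards):
    print(name, score)
```

The record with a single deletion still scores highly because the gap penalty is small compared with the matches on either side, which is the point of managing "gaps".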

Example BLAST search

Growth in available data

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

http://www.rcsb.org/pdb/holdings.html

GenBank

Brookhaven PDB

Focused drug treatments

COPD = Chronic Obstructive Pulmonary Disease


Biomolecular pathways and simulation

In vivo

In vitro

In silico


Moving drug development from “live body” and “wet chemistry” to
computation


Should reduce both cost and time to bring new drugs to market


Based on simulation models of biomolecular pathways


Allows “what if?” type questions to be answered more easily


Can help focus ongoing in vivo/in vitro work more productively


Can help filter out “bad” drug candidates before clinical trial process
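
As a toy illustration of an "in silico" what-if question, the Python sketch below integrates a two-step pathway S -> I -> P and compares the normal case with one where a hypothetical drug inhibits the second step. The first-order kinetics, rate constants and step size are assumptions chosen for the example, not a real biological model.

```python
def simulate_pathway(k1, k2, steps=1000, dt=0.01, substrate=1.0):
    """Euler integration of S -> I -> P with first-order rate constants k1 and k2."""
    s, i, p = substrate, 0.0, 0.0
    for _ in range(steps):
        flux1 = k1 * s  # rate of S -> I
        flux2 = k2 * i  # rate of I -> P
        s -= flux1 * dt
        i += (flux1 - flux2) * dt
        p += flux2 * dt
    return s, i, p


baseline = simulate_pathway(k1=1.0, k2=0.5)
inhibited = simulate_pathway(k1=1.0, k2=0.05)  # what if 90% of step 2 is blocked?

print("product without inhibitor:", round(baseline[2], 3))
print("product with inhibitor:   ", round(inhibited[2], 3))
```

Changing one rate constant and re-running takes milliseconds, where the equivalent wet-lab experiment would take far longer.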

Metabolic Pathways


Understanding the relationships is not enough


The human body and cells are dynamic systems

Microarray technology


Provides insight into
what genes are expressed
in a particular cell type of an organism, at a
particular time, under particular conditions.


Can provide valuable information for tuning
dynamics of biosim models.


Allows disease “signatures” to be determined
resulting in the potential for specific tests at
gene level.


Research organisations developing new
database and query technologies to allow
exploitation of results


Yet more data!!!!

Actual size 0.5”

500,00 sites
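
One simple way to picture a disease "signature" is as the set of genes whose expression differs strongly between healthy and diseased samples. The Python sketch below uses a log2 fold-change threshold. The gene names and intensity values are invented placeholders, and real analyses add normalisation and statistical testing that are omitted here.

```python
from math import log2


def expression_signature(healthy, diseased, min_abs_log2_fold=1.0):
    """Return genes whose expression changes at least two-fold in either direction."""
    signature = {}
    for gene, healthy_level in healthy.items():
        fold = log2(diseased[gene] / healthy_level)
        if abs(fold) >= min_abs_log2_fold:
            signature[gene] = round(fold, 2)
    return signature


# Hypothetical spot intensities from two arrays (placeholder values).
healthy = {"gene_a": 100.0, "gene_b": 250.0, "gene_c": 80.0, "gene_d": 400.0}
diseased = {"gene_a": 400.0, "gene_b": 240.0, "gene_c": 20.0, "gene_d": 390.0}

print(expression_signature(healthy, diseased))
```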

mHealth - Compliance Management

[Diagram labels: Remote Web Browser; Compliance monitoring; Compliance device loaded; Patient diary; OTA Regime; Compliance alerts; Data; Prescription dispensed]

mHealth


Diagnostic Devices

Data Mining and Unstructured Information Management


Crawlers: gather interconnected, unstructured information (such as web pages)


Trawlers: systematically gather indexed information (such as databases, news archives)


Federated access gateways: provide a single interface to many different data repositories


Feature discovery: for extracting meta-information from raw data.


Summarization: automatic construction of a meaningful (textual) summary.


Clustering: to group similar items.


Classification: the allocation of an item to a pre-defined category within a taxonomy.


Which terabyte would you like?
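
The difference between classification and clustering can be shown with a deliberately naive Python sketch. Classification assigns a text to a category from a fixed taxonomy; clustering groups texts by similarity without any predefined categories. The taxonomy, keywords, threshold and sample texts are all made up for the illustration, and production systems use far richer features than word overlap.

```python
# Hypothetical taxonomy: category -> keywords.
TAXONOMY = {
    "genomics": {"gene", "genome", "sequence", "dna"},
    "proteomics": {"protein", "peptide", "electrophoresis", "spectrometry"},
    "clinical": {"patient", "trial", "dose", "diagnosis"},
}


def tokens(text):
    """Lower-cased set of words with surrounding punctuation stripped."""
    return {word.strip(".,;:()").lower() for word in text.split()}


def classify(text, taxonomy=TAXONOMY):
    """Pick the pre-defined category sharing the most keywords, or None."""
    words = tokens(text)
    scores = {category: len(words & keywords) for category, keywords in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def jaccard(a, b):
    """Similarity of two word sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, threshold=0.3):
    """Greedy clustering: join the first group whose seed document is similar enough."""
    clusters = []
    for doc in documents:
        for group in clusters:
            if jaccard(tokens(doc), tokens(group[0])) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


docs = [
    "protein separated by gel electrophoresis",
    "gel electrophoresis of protein extract",
    "patient enrolled in dose trial",
]
print([classify(doc) for doc in docs])
print(cluster(docs))
```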

IBM DiscoveryLink


Federates across multiple data sources and types


Allows complex queries to be constructed…


e.g. show me all the compounds similar to ketanserin that have been
tested against members of the serotonin family and have the
characteristics of a good drug
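
The shape of such a federated query can be sketched with Python's built-in `sqlite3` module, using two attached in-memory databases to stand in for separate data sources. This is not the DiscoveryLink API: the schemas, column names, similarity scores, the 0.7 threshold and the `drug_like` flag are all hypothetical placeholders.

```python
import sqlite3

# Two independent "sources": a compound library and an assay results store.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS assays")

conn.execute(
    "CREATE TABLE main.compounds "
    "(name TEXT, similarity_to_ketanserin REAL, drug_like INTEGER)"
)
conn.execute("CREATE TABLE assays.results (compound TEXT, receptor_family TEXT)")

# Placeholder rows, not real measurements.
conn.executemany(
    "INSERT INTO main.compounds VALUES (?, ?, ?)",
    [
        ("compound_1", 0.90, 1),
        ("compound_2", 0.80, 0),
        ("compound_3", 0.95, 1),
        ("compound_4", 0.30, 1),
    ],
)
conn.executemany(
    "INSERT INTO assays.results VALUES (?, ?)",
    [
        ("compound_1", "serotonin"),
        ("compound_2", "serotonin"),
        ("compound_3", "dopamine"),
        ("compound_4", "serotonin"),
    ],
)

# One query spanning both sources, mirroring the example question on the slide.
query = """
SELECT DISTINCT c.name
FROM main.compounds AS c
JOIN assays.results AS r ON r.compound = c.name
WHERE c.similarity_to_ketanserin >= 0.7
  AND r.receptor_family = 'serotonin'
  AND c.drug_like = 1
ORDER BY c.name
"""
hits = [row[0] for row in conn.execute(query)]
print(hits)
```

The user writes one query; the federation layer is responsible for knowing which source holds which table.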

From human genome to personal genome


Single molecule array technology
allows an individual’s genome to be
sequenced.


Will allow every individual to obtain
their personal genome


In days not 13 years


For $1000 not $3 billion


Still ongoing research but technology
looks promising


Interesting sequencing challenges to
piece the bits together


http://www.solexa.co.uk
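
The challenge of "piecing the bits together" can be illustrated with a greedy overlap assembler in Python. This is a toy sketch under strong assumptions (error-free reads, one strand, no long repeats, no read contained inside another); real assemblers are far more sophisticated. The reads below are overlapping pieces of the sequence fragment shown in the science section.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    fragments = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(fragments) > 1:
        best = (0, None, None)
        for i, left in enumerate(fragments):
            for j, right in enumerate(fragments):
                if i != j:
                    size = overlap(left, right, min_length)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more; return the remaining contigs
        merged = fragments[i] + fragments[j][size:]
        fragments = [f for k, f in enumerate(fragments) if k not in (i, j)] + [merged]
    return fragments


reads = ["ttgcccctta", "cctcactcac", "ctcacttgcc"]  # deliberately out of order
print(greedy_assemble(reads))
```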

Summary


The volume of biomedical data has increased dramatically in the last 10
years


Without the use of IT we would not even begin to exploit this.


We have moved from data gathering to the development of understanding.
True exploitation is still in its infancy.


Many technologies involved


Microarrays and gene expression databases


Computational Chemistry


Molecular Modelling


Sequence matching and homology


Unstructured Information Management


text searching


The pharmaceutical industry has an interesting 10 years ahead!


Business models are changing.


With such an open field even the smallest biotech company can make a huge impact.

Data is not Information. Information is not knowledge. Knowledge is not wisdom

With apologies to Frank Zappa

Some books for beginners :o)


One Renegade Cell: The Quest for the Origins of Cancer

Robert A. Weinberg


In the Beginning was the Worm: Finding the Secrets of
Life in a tiny Hermaphrodite

Andrew Brown


Almost Like a Whale: The ‘Origin of Species’ Updated

Steve Jones