Oct 15, 2013

The Role of Machine Learning in Modelling the Cell

John Hawkins

ARC Centre for Complex Systems

University of Queensland

Australia

Overview of Talk


Overview of cell biology


Modelling the cell


Subcellular localisation signals


Machine Learning in General


Neural networks


Feed Forward versus Recurrent


Cell Biology


Quick and Dirty


Membrane-bound Organelles


Nucleus


DNA -> RNA -> Protein


Transport, e.g.


Mitochondria


Peroxisome


Modification, e.g.


Disulphide Bond Formation


Glycosylation




Cell Feedback


At any particular time point, a set of genes will be expressed.


This set does not remain constant; instead, the emerging picture is that:


There is some essential cycle of gene expression


With a capacity to switch to alternative pathways of expression under external stimulus.


The pattern of expression is implemented by protein and RNA feedback onto the genes.



Modelling the cell


Ideally we would like to model the cell at the level of a 3D physical simulation.


Currently this is infeasible.


So numerous approaches are taken to form abstractions:


Gene Regulatory Networks


Differential equation models of particular pathways


Machine learning models of particular processes
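

As a rough illustration of the gene regulatory network abstraction, the sketch below simulates a toy Boolean network. Everything in it is an assumption made for illustration: three hypothetical genes arranged in a ring where each represses the next, synchronous updates, and a `stimulus` flag that forces the first gene on. It is not a model of any real pathway. Without the stimulus the network settles into a cycle of expression states; with the stimulus it is diverted to a fixed state.

```python
def step(state, stimulus=False):
    """One synchronous update of a hypothetical three-gene ring.

    Gene A represses B, B represses C, and C represses A.
    The external stimulus (an assumption of this toy model) forces A on.
    """
    a, b, c = state
    return (stimulus or not c, not a, not b)


def trajectory(state, steps, stimulus=False):
    """Return the list of states visited, starting from `state`."""
    states = [state]
    for _ in range(steps):
        state = step(state, stimulus)
        states.append(state)
    return states


start = (True, False, False)

# Without stimulus the expression pattern cycles and returns to the start.
for s in trajectory(start, 6):
    print(s)

# With stimulus the network leaves the cycle and stays in one state.
print(trajectory(start, 3, stimulus=True)[-1])
```

The point of the sketch is only that feedback between genes is enough to produce both a recurring cycle and a stimulus-dependent alternative.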

Biological Sequences


Many important biological molecules are polymers.


Thus representable as a sequence of discrete symbols.


Sequence $M = [m_1, m_2, \ldots, m_n]$ where:


DNA: $m_i \in \{A, T, G, C\}$


RNA: $m_i \in \{A, U, G, C\}$


Protein: $m_i \in \{G, A, V, L, I, P, S, C, T, M, D, E, H, K, R, N, Q, F, Y, W\}$
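

The three alphabets can be written down directly as sets of symbols. The following is a minimal sketch; the names `ALPHABETS`, `is_valid` and `possible_kinds` are illustrative and not taken from the talk.

```python
# Alphabets as listed on the slide.
ALPHABETS = {
    "DNA": set("ATGC"),
    "RNA": set("AUGC"),
    "Protein": set("GAVLIPSCTMDEHKRNQFYW"),
}


def is_valid(sequence, kind):
    """Return True if every symbol of `sequence` belongs to the `kind` alphabet."""
    return set(sequence.upper()) <= ALPHABETS[kind]


def possible_kinds(sequence):
    """List every alphabet the sequence is consistent with."""
    return [kind for kind in ALPHABETS if is_valid(sequence, kind)]


print(possible_kinds("ATGGCC"))  # ['DNA', 'Protein']
print(possible_kinds("AUGGCC"))  # ['RNA']
print(possible_kinds("MKVLAA"))  # ['Protein']
```

Note that A, T, G and C are also amino acid codes, so a short DNA string is a valid protein string as well; the alphabet alone does not identify the molecule type.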

Information Content


How much information in a linear sequence?


Two crucial elements to function


Physical/chemical properties


Molecular shape


Each residue has well known properties


Denaturation (Anfinsen, 1973).


Sequence defines the arrangement of chemical properties, which in turn defines folding.


Biological Patterns


Motifs


General term for patterns


Numerous Definitions & Visualisations


PROSITE Patterns


Regular Expression


PROSITE Profiles


Probability Matrix


LOGOs
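

To make the link between PROSITE patterns and regular expressions concrete, here is a minimal sketch of a translator. It assumes only the common parts of the pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for alternatives, `{...}` for exclusions, `(n)` or `(n,m)` for repeats, `<` and `>` as terminal anchors) and does not handle rarer forms such as anchors inside brackets. The function name is hypothetical. The example uses the widely cited N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.replace("<", "^").replace(">", "$")
        element = element.replace("x", ".")                    # any residue
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)  # N[^P][ST][^P]

match = re.search(regex, "AANGSAA")
print(match.start(), match.group())  # 2 NGSA
```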


Peroxisomal Localisation


Predominantly controlled by a C-terminal sequence called the PTS1 signal.


Roughly 12 residues long


Known dependencies between locations
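

Because the signal sits at the C-terminus, a predictor only needs the tail of each protein. The sketch below extracts that tail as a fixed-length window. The window length of 12 follows the slide; the padding character and the function name are assumptions made for illustration, and the example sequence is invented.

```python
def c_terminal_window(sequence, length=12, pad="-"):
    """Return the last `length` residues, left-padded if the protein is shorter."""
    tail = sequence[-length:]
    return tail.rjust(length, pad)


# Invented sequence ending in SKL, a commonly cited PTS1 tripeptide.
print(c_terminal_window("MAGTVLKEEAPRLQWDSKL"))  # EEAPRLQWDSKL
print(c_terminal_window("MSKL"))                 # --------MSKL
```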

Nuclear Export


Some proteins move continuously between the nucleus and cytoplasm of the cell.


Either as:


Transporters


Regulators

Machine Learning


Requires a set of examples, with


Raw input (sequence data), and


Known classes that the machine should predict


In essence, Function Approximation


Start with a general parametrised function over the input data


Adjust the parameters until the output of the function is a good approximation to the known classes of the examples.
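

The two steps above can be sketched with the simplest possible case. The code below assumes a logistic function as the parametrised function, stochastic gradient descent as the adjustment procedure, and a toy dataset (logical OR) standing in for labelled sequences. None of these choices come from the talk; they are only one instance of function approximation.

```python
import math


def predict(weights, bias, features):
    """Parametrised function: a weighted sum squashed into the range (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, epochs=2000, rate=0.5):
    """Adjust the parameters to reduce the error on the known classes."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - rate * error * x for w, x in zip(weights, features)]
            bias -= rate * error
    return weights, bias


# Toy examples with known classes (logical OR).
examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]

weights, bias = train(examples, labels)
predictions = [round(predict(weights, bias, x)) for x in examples]
print(predictions)  # [0, 1, 1, 1]
```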




Bias


Bias is generally unavoidable


(Mitchell, 1980)


Three Sources of Bias


Input Encoding


Function Structure (Architecture)


Parameter adjustment algorithm (learning)
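

Input encoding is the easiest of the three sources to show in code. The sketch below uses a one-hot encoding of amino acids, which is a common choice but an assumption here, since the talk does not say which encoding was used. The bias it introduces is that every pair of residues is treated as equally dissimilar, so known chemical similarities are hidden from the learner.

```python
# The 20 amino acid codes, in the order given on the earlier slide.
AMINO_ACIDS = "GAVLIPSCTMDEHKRNQFYW"


def one_hot(residue):
    """Encode one residue as a 20-element vector with a single 1."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_window(window):
    """Concatenate the one-hot vectors of every residue in the window."""
    vector = []
    for residue in window:
        vector.extend(one_hot(residue))
    return vector


encoded = encode_window("SKL")
print(len(encoded), sum(encoded))  # 60 3
```

An unknown symbol, such as a padding character, encodes as all zeros under this scheme.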


Neural Networks


Graphical model consisting of layers of nodes connected by weights


Feed forward neural networks


Fixed input window


Signal propagates in a single pass through the layers


Recurrent Neural Networks


Signal processed in parts


Recurrent connections maintain a memory state


Output generated after processing the last piece of the input signal

Simple Neural Networks


FFNN: $O_h = S(W_1 \cdot I_1 + W_2 \cdot I_2 + b)$


RNN: $O_h = S(W_1 \cdot I_2 + W_2 \cdot S(W_1 \cdot I_1 + b) + b)$
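

The difference between the two hidden-unit formulas can be checked numerically. The sketch below assumes scalar inputs and weights and takes S to be the logistic sigmoid; the slide does not define S, so that choice is an assumption. The feed-forward unit sees both inputs at once with separate weights, while the recurrent unit reuses W1 for each input in turn and feeds its previous output back through W2.

```python
import math


def S(x):
    """Squashing function, assumed here to be the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def ffnn_hidden(i1, i2, w1, w2, b):
    """Feed-forward unit: O_h = S(W1*I1 + W2*I2 + b)."""
    return S(w1 * i1 + w2 * i2 + b)


def rnn_hidden(inputs, w1, w2, b):
    """Recurrent unit: process inputs one at a time, carrying a memory state."""
    state = None
    for x in inputs:
        if state is None:
            state = S(w1 * x + b)               # first piece: no memory yet
        else:
            state = S(w1 * x + w2 * state + b)  # later pieces: input plus memory
    return state


i1, i2, w1, w2, b = 1.0, 0.0, 0.8, -0.5, 0.1
print(ffnn_hidden(i1, i2, w1, w2, b))
print(rnn_hidden([i1, i2], w1, w2, b))
```

For two inputs, `rnn_hidden` evaluates exactly S(W1*I2 + W2*S(W1*I1 + b) + b), and it extends to longer sequences without adding parameters.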

RNNs in Bioinformatics


Bi-Directional RNN


Applications


We have applied these techniques to


Subcellular Localisation to


Endoplasmic Reticulum


Mitochondria


Chloroplast


Peroxisome


http://pprowler.imb.uq.edu.au


Working with whole genome data and wet lab biologists to use these tools for data mining.

The End…

?