Intelligent Systems Research Centre


Oct 19, 2013



Use of Neural Networks to Predict and Analyze Membrane Proteins in the Proteome


Subrata Kumer Bose

Intelligent Systems Research Centre


London Metropolitan University, UK


Supervisors

Dr. Hassan Kazemian

Intelligent Systems Research Centre, CCTM

London Metropolitan University.


&

Dr. Kenneth White


Institute for Health Policy and Research


London Metropolitan University.




Collaboration with

Dr. Antony Browne

School of Computing


University of Surrey, Guildford, Surrey


Abstract


Transmembrane (TM) proteins are one of the most understudied groups of proteins in biochemical research, because of the technical difficulty of obtaining structural information about transmembrane regions. The 3D structures of about 15,000 proteins have been determined by X-ray crystallography, but only about 30 of these are transmembrane proteins, even though TM proteins may account for about 30% of the proteome. This project seeks to contribute to knowledge and understanding in the field of neural networks by developing a particular area of theory and applying a novel methodology. The project seeks to develop software that analyses protein sequences for the presence of membrane spanning regions using artificial neural network approaches. The expected benefits include a better understanding of how to create and train optimal neural networks for membrane protein datasets, which should be valuable in both academia and industry.


Introduction


Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development.


Bioinformatics



Introduction (contd.)



In recent years, many bioinformaticians have researched the prediction of globular proteins, which make up roughly 75% of the whole proteome.

However, membrane proteins, which make up 20-30% of the proteome and offer more novel targets for new drug development, are largely ignored.

There have been several attempts in the last 20 years to develop tools for predicting membrane-spanning regions, but current tools are far from achieving 95% reliability in prediction (Baldi, P., 2002).




Membrane Proteins



Introduction (contd.)


Data mining (or, more precisely, knowledge extraction) can be described as the process of discovering previously unknown dependencies and relationships in data sets.

A (learning) system may discover salient features in the input data whose importance was not previously recognized.

It is now established that algorithms can be designed which extract understandable representations from trained neural networks, enabling them to be used for data mining (Browne, A., 2004).


Data Mining



Introduction (contd.)


In the past, most data mining has been performed using symbolic artificial intelligence algorithms such as C4.5, C5 or CART.

Neural Networks (NNs) have in the past been treated as ‘black boxes’: systems unable to explain the process by which a decision or output has been reached.

Knowledge Extraction



Objectives of the investigation


The project seeks to develop software for analysing protein sequences for the presence of membrane spanning regions using artificial neural network approaches. Beyond simply identifying membrane spanning regions, the approach would be used to analyse biologically useful subsets of proteins with membrane spanning regions, which would include:


(i) The large family of G-protein coupled receptors (GPCRs). These form an important group of drug targets of interest to the pharmaceutical industry, and are the site of interaction of many hormones, neurotransmitters and other chemical stimuli around the body. Attempts have been made to develop methods for predicting the coupling specificity of GPCRs using Hidden Markov Models (Möller et al., 2001b), and the project would extend work in this area.





Objectives of the investigation

(ii) Membrane proteins with distinct cellular locations. Prediction of the localization of membrane proteins to the Golgi apparatus has been attempted (Yuan and Teasdale, 2002), and it would be useful to attempt analysis of proteins localized to other membrane compartments, such as the plasma membrane, endoplasmic reticulum, lysosomes and peroxisomes, to look for discriminating motifs in membrane spanning regions in addition to known localizing signals.



Objectives of the investigation

The methodology could also be applied to membrane proteins unique to bacteria and other micro-organisms, and could potentially identify new targets for antibiotics.


The relationship of this investigation to previous work in the area



A large number of researchers are investigating globular proteins because the data are readily available.

The prediction of membrane protein structures is a key area that remains unsolved (Baldi et al., 2002).

There have been several attempts over the last 20 years to develop tools for predicting membrane spanning regions, reviewed recently by Möller et al. (2001a).


Background


The relationship of this investigation to previous work in the area


The prediction problem is made topologically more complex by the presence of several transmembrane domains in many proteins, and the same authors (Möller et al., 2001b) conclude that current tools are far from achieving 95% reliability in prediction. The same group note that the software developed so far falls into two broad categories: local approaches and global approaches.


Background
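Assuming "local" here means methods that score each residue from a short window of its neighbours, the simplest example is a hydropathy scan. The sketch below averages the Kyte-Doolittle hydropathy scale over a 19-residue window and flags windows whose mean exceeds 1.6, values often quoted for this scale. It is not one of the tools reviewed by Möller et al., nor the neural method proposed in this project, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window].

    Raises KeyError on non-standard residues such as X, B or Z.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide one residue: add the new right-hand value, drop the old left one.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into half-open (start, end) spans."""
    segments = []
    for start, mean in enumerate(hydropathy_profile(seq, window)):
        if mean <= threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))
    return segments
```

On a toy sequence of 10 lysines, 20 leucines and 10 lysines, `candidate_tm_segments` returns `[(5, 35)]`: the flagged span is wider than the leucine block itself (positions 10 to 29), which illustrates how window averaging blurs segment boundaries.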




Neural Networks


An artificial neuron is an information-processing element that operates in a way that resembles, in simplified form, some operations of a biological neuron. A collection of such elements that process information in parallel and are connected to one another is a network of artificial neurons.


Neuron
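The description of an artificial neuron can be made concrete with a minimal sketch of one neuron and a layer of them. The weighted-sum-plus-sigmoid form is the standard textbook model rather than anything specific to this project, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, passed through a sigmoid."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(activation)


def layer(inputs, weight_rows, biases):
    """A layer: several neurons that all read the same inputs and are computed independently."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]
```

Because each neuron in `layer` depends only on the shared inputs and its own weights, the neurons could be evaluated in parallel, which is the sense in which the slide describes the network as processing information in parallel.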



Neural Networks


According to Haykin, S. (1994), a neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

1. Knowledge is acquired by the network through a learning process.

2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.


Definition
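Both of Haykin's points show up in the smallest possible example: a single sigmoid neuron trained with the delta rule (gradient descent on squared error) to reproduce logical OR. After training, everything the neuron has "learned" is held in its weights and bias. The task, learning rate and number of epochs are arbitrary illustrative choices, not settings from this project.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Output of the neuron for input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, epochs=2000, lr=0.5):
    """Delta-rule training of one sigmoid neuron on (inputs, target) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = predict(weights, bias, x)
            # Negative gradient of 0.5 * (target - y)^2 with respect to the
            # neuron's weighted sum (the sigmoid derivative is y * (1 - y)).
            delta = (target - y) * y * (1.0 - y)
            weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
            bias += lr * delta
    return weights, bias


# Learning process: the knowledge about OR ends up stored in (weights, bias).
OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(OR_SAMPLES)
```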



Architecture of the Neural Network

[Figure: Network used. The amino acid sequence supplies the input signals x1, x2, ..., xm to a layer of input neurons, which feeds a layer of hidden neurons and then a layer of output neurons with two outputs, y1 and y2, for Membrane Protein and Nonmembrane Protein.]
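The slides do not say how the amino acid sequence is turned into the input signals x1, ..., xm. One common choice, assumed here purely for illustration, is one-hot (orthogonal) encoding: each residue in a fixed-length window contributes 20 inputs, exactly one of which is set to 1, so m is 20 times the window length. The two target outputs y1 and y2 then code membrane versus nonmembrane. The names `one_hot_window`, `encode_example` and `TARGETS` are hypothetical.

```python
# The 20 standard amino acids, in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Target vectors for the two output neurons (y1, y2); this assignment is an assumption.
TARGETS = {"membrane": [1.0, 0.0], "nonmembrane": [0.0, 1.0]}


def one_hot_window(seq):
    """Encode a residue window as inputs x1..xm, with m = 20 * len(seq).

    Residue k occupies inputs 20*k .. 20*k + 19; non-standard letters
    (e.g. X, B, Z) are left as an all-zero block, which is an assumption.
    """
    x = [0.0] * (len(AMINO_ACIDS) * len(seq))
    for pos, aa in enumerate(seq.upper()):
        idx = INDEX.get(aa)
        if idx is not None:
            x[pos * len(AMINO_ACIDS) + idx] = 1.0
    return x


def encode_example(window, label):
    """Pair an encoded window with its target vector, ready for a supervised network."""
    return one_hot_window(window), TARGETS[label]
```

One-hot encoding imposes no artificial ordering on the amino acids, at the cost of a large input layer; physicochemical encodings (such as hydropathy values) are a more compact alternative.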



Architecture of the Neural Network


Data Collection: ensuring that the correct data are gathered.

Data Preparation: cleaning the data and ensuring that they are in the appropriate format for Neural Connection.

Design: choosing the best neural approach (here, an MLP).

Training and Testing: building the application.

Experimentation: tailoring the application to improve the results.

Implementation: producing the results.




Steps Involved
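The Data Preparation step can be sketched in a few lines, assuming the sequences arrive as FASTA text (the slides do not name a source format). The cleaning rules below (upper-casing, stripping a trailing '*', and dropping sequences that are too short, contain non-standard residues, or duplicate an earlier sequence) and the 20-residue minimum are illustrative assumptions; the export format expected by Neural Connection is not reproduced here.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean(records, min_length=20):
    """Keep only usable, non-duplicate sequences (rules are illustrative)."""
    seen, kept = set(), []
    for header, seq in records:
        seq = seq.upper().rstrip("*")  # drop a trailing stop marker, if any
        if len(seq) < min_length or not set(seq) <= STANDARD or seq in seen:
            continue
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

Removing exact duplicates matters for the later Training and Testing step: a sequence that appears in both sets would inflate the measured test accuracy.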



Data Collection

Architecture



Architecture of the Neural Network

Data Preparation


Architecture of the Neural Network


Design


The first consideration in designing the application was the neural technique to be adopted. A problem of this type, where we want to link a set of inputs (a sequence of amino acids) to an output (membrane, nonmembrane, or a subclassification of membrane proteins), should be solved using a supervised neural technique. There are three supervised neural techniques in Neural Connection: Radial Basis Function networks, the Bayesian Network and the Multi-Layer Perceptron.



Architecture of the Neural Network


Design


MLPs are the most commonly used neural computing technique.

The MLP differs from the Simple Perceptron in two major ways.

Firstly, it has an additional layer of neurons between the input and output layers, known as the hidden layer. This layer greatly increases the learning power of the MLP.

Secondly, it uses a transfer, or activation, function to modify the input to a neuron.

The activation of hidden and output layer neurons is computed as in the Simple Perceptron, as a weighted sum of the neuron's inputs, while the transfer function is a smooth nonlinear function, usually the sigmoid, rather than the hard threshold of the Simple Perceptron.
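Both differences appear in a forward pass through a small MLP with one hidden layer: each hidden and output neuron forms a weighted sum of its inputs, as in the Simple Perceptron, and then applies a sigmoid transfer function instead of a hard threshold. The layer sizes, the uniform(-0.5, 0.5) initialisation and the dictionary layout are assumptions made for this sketch.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic transfer function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases for an n_in-n_hidden-n_out network."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


def forward(net, x):
    """Return (hidden activations, outputs) for one input vector x."""
    # Hidden layer: weighted sum of the inputs, then the sigmoid.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    # Output layer: weighted sum of the hidden activations, then the sigmoid.
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
               for row, b in zip(net["W2"], net["b2"])]
    return hidden, outputs
```

With two output neurons, the outputs can be read as y1 and y2 for membrane and nonmembrane, matching the network diagram earlier in the deck.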


Architecture of the Neural Network


Training and Testing

Training Cycles
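Training can be sketched as online backpropagation: each training cycle (epoch) presents every training example once, in shuffled order, and moves all weights down the gradient of the squared error; testing then scores held-out examples by which of y1 and y2 is larger. This is a generic textbook implementation written for illustration, not the procedure used inside Neural Connection, and the learning rate, cycle count and helper names are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic transfer function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(n_in, n_hidden, n_out, seed=0):
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


def forward(net, x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
               for row, b in zip(net["W2"], net["b2"])]
    return hidden, outputs


def backprop_step(net, x, target, lr):
    """One online update for E = 0.5 * sum((y - t)^2); returns E before the update."""
    hidden, outputs = forward(net, x)
    # Output deltas: dE/d(weighted sum) for sigmoid output units.
    d_out = [(y - t) * y * (1.0 - y) for y, t in zip(outputs, target)]
    # Hidden deltas use the weights W2 as they were before this update.
    d_hid = [h * (1.0 - h) * sum(d_out[k] * net["W2"][k][j] for k in range(len(d_out)))
             for j, h in enumerate(hidden)]
    for k, d in enumerate(d_out):
        for j, h in enumerate(hidden):
            net["W2"][k][j] -= lr * d * h
        net["b2"][k] -= lr * d
    for j, d in enumerate(d_hid):
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * d * xi
        net["b1"][j] -= lr * d
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, target))


def mean_error(net, data):
    total = 0.0
    for x, target in data:
        outputs = forward(net, x)[1]
        total += 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, target))
    return total / len(data)


def train(net, train_set, cycles=1000, lr=0.5, seed=0):
    """Run the given number of training cycles, reshuffling the example order each cycle."""
    rng = random.Random(seed)
    order = list(range(len(train_set)))
    for _ in range(cycles):
        rng.shuffle(order)
        for i in order:
            x, target = train_set[i]
            backprop_step(net, x, target, lr)
    return net


def accuracy(net, test_set):
    """Fraction of examples where the larger output (y1 or y2) matches the target."""
    hits = 0
    for x, target in test_set:
        outputs = forward(net, x)[1]
        hits += outputs.index(max(outputs)) == target.index(max(target))
    return hits / len(test_set)
```

Recording `mean_error` on the training and test sets after each cycle gives the kind of error-versus-training-cycles curve this slide refers to; a test error that starts rising while the training error keeps falling is the usual signal to stop training.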


Results


Results


Conclusion


Modern data gathering techniques are producing vast amounts of data. However, data can be useless in the absence of understanding.

The extraction of decision trees from trained NNs is an important addition to the data mining toolkit of knowledge extraction techniques (Browne and Sun, 2001; Browne and Sun, 1999).

The combination of NNs with an algorithm to extract knowledge from the trained networks potentially offers the ‘best of both worlds’ to those attempting to make predictions on their data and simultaneously understand it.
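One simple way to extract a decision tree from a trained network is "pedagogical" extraction: treat the network as a black box, label a set of sample inputs with the network's own predictions, and grow a small tree that mimics those predictions. The sketch below does this with a depth-limited tree split on Gini impurity. It is a generic illustration, not the flexible decision tree extraction algorithm of Browne et al., and `network` is assumed to be any callable returning 0 or 1 (for the two-output MLP, for example, the index of the larger output).

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None if no split helps."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:
                best, best_score = (f, thr), score
    return best


def grow(X, y, depth):
    """Leaves are 0/1 class labels; internal nodes are (feature, threshold, left, right)."""
    majority = int(2 * sum(y) >= len(y))  # ties go to class 1
    if depth == 0 or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow([X[i] for i in left], [y[i] for i in left], depth - 1),
            grow([X[i] for i in right], [y[i] for i in right], depth - 1))


def extract_tree(network, samples, max_depth=3):
    """Pedagogical extraction: fit the tree to the network's outputs, not to the true labels."""
    return grow(samples, [network(x) for x in samples], max_depth)


def predict(tree, x):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def to_rules(tree, names, conditions=()):
    """Flatten the tree into readable IF ... THEN rules, one per leaf."""
    if not isinstance(tree, tuple):
        return ["IF " + (" AND ".join(conditions) or "TRUE") + f" THEN class {tree}"]
    f, thr, left, right = tree
    return (to_rules(left, names, conditions + (f"{names[f]} <= {thr:g}",))
            + to_rules(right, names, conditions + (f"{names[f]} > {thr:g}",)))
```

For example, with the stand-in network `lambda x: int(x[0] > 0.5)` queried on an 11 by 11 grid of points with coordinates 0, 0.1, ..., 1, `to_rules` on the extracted tree with feature names `["f0", "f1"]` gives `["IF f0 <= 0.55 THEN class 0", "IF f0 > 0.55 THEN class 1"]`. How faithfully the tree mimics a real network depends on how densely the samples cover the input space.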


Conclusions (contd.)

This technique demonstrates that it is possible to combine the generalization accuracy of NNs with the comprehensibility generated by the knowledge extraction method.

Preliminary results will be analysed and further improvements will be designed.



References


Bose, S., Browne, A., Kazemian, H. and White, K. (2003) Knowledge Discovery in Bioinformatics using Neural Networks. Proceedings of the 6th International Conference on Computer and Information Technology, Dhaka, Bangladesh.


Baldi, P. and Pollastri, G. (2002) "Machine Learning Structural and Functional Proteomics", IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.


Browne, A., Hudson, B. D., Whitley, D. C., Ford, M. G. and Picton, P. (2003) Biological Data Mining with Neural Networks: Implementation & Application of a Flexible Decision Tree Extraction Algorithm to Genomic Problem Domains. Neurocomputing: Special Issue on Neural Networks in Bioinformatics (In Press) ISSN: 0925-2312.


Browne, A., Hudson, B. D., Whitley, D. C., Ford, M. G. and Picton, P. (2004) Biological data mining with neural networks: implementation and application of a flexible decision tree extraction algorithm to genomic problem domains. Neurocomputing (In Press) ISSN: 0925-2312.


Browne, A. (2002) Representation and extrapolation in multi-layer perceptrons. Neural Computation, 14(7), 1739-1754. ISSN: 0899-7667.


Browne, A. and Sun, R. (2001) Connectionist inference models. Neural Networks, 14(10), 1331-1355. ISSN: 0893-6080.


Browne, A. and Sun, R. (1999) Connectionist variable binding. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, 16(3), 189-207. ISSN: 0266-4720.


Browne, A. and Picton, P. (1999) Two analysis techniques for feed-forward networks. Behaviormetrika: Special Issue on Analysis of Knowledge Representations in Neural Network Models, 26(1), 75-87. ISSN: 0385-7417.


Möller, Michael D. R. Croning, and Rolf Apweiler (2001a) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17, 646-653.


Möller, Jaak Vilo, and Michael D. R. Croning (2001b) Prediction of the coupling specificity of G protein coupled receptors to their G proteins. Bioinformatics, 17, 174S-181S.


Möller, Evgenia V. Kriventseva, and Rolf Apweiler (2000) A collection of well characterized integral membrane proteins. Bioinformatics, 16, 1159-1160.


Yang, S. and Browne, A. (2002a) Multistage neural networks: Adaptive combination of ensemble results. Proceedings of the Fourth International Conference on Recent Advances in Soft Computing (RASC2002), Nottingham, UK.


Yang, S. and Browne, A. (2002b) Multistage Neural Network Ensembles. Proceedings of the Third International Workshop on Multiple Classifier Systems, Cagliari, Italy, published as Lecture Notes in Computer Science 2364, Springer Verlag, Berlin, Heidelberg.




I am indebted to

Ray Jones


Acknowledgement


Ray Jones


Dr. Sahithi Siva


Dr. Robert Whitrow


Dr. Algirdas Pakstas


Dr. Karim Ouazzane