6. Prospects of Bioinformatics in Mauritius - Virtual Centre for ...


Oct 1, 2013 (3 years and 10 months ago)



Bioinformatics Opportunities for Mauritius

Oveeyen Moonian (1), Yasmina Jauferally-Fakim (2), Sunilduth Baichoo (1), Shakuntala Baichoo (1), Zahra Mungloo-Dilmohamud (1)

(1) Department of Computer Science and Engineering
(2) Department of Agricultural and Food Sciences
University of Mauritius

Abstract

Traditional research experiments in molecular biology have been largely overtaken by high-throughput methods, which generate far more data in much shorter periods of time. Genome sequences and gene expression studies have become crucial in understanding biological processes. The handling, analysis and storage of such data are only possible with computational tools, hence the development of the interdisciplinary field of bioinformatics, which brings together biology, biochemistry and computer science. Over the past decade or so, bioinformatics has moved to the forefront of research on living organisms. The challenges presented by the scale of data produced in the genomic and post-genomic era have been addressed by developers of computer programs in order to provide efficient means for data analysis and management.

A whole new realm of bioinformatics resources has become available to scientists, allowing for rapid discovery of new genes and proteins. All disciplines of the biological sciences, including medical, environmental, microbial and plant sciences, are set to benefit from such developments. This paper describes some of those resources and how they are being used. It also presents an overview of the different bioinformatics organisations which are the driving forces behind the rapid implementation of facilities in this field, and of the ethical issues related to bioinformatics development. Finally, the paper highlights opportunities for Mauritius in the field of bioinformatics.

1. Introduction

Advances in biological sciences over the past few decades have been marked by major developments in technical methods for studying living cells and tissues more closely. Primarily, the advent of molecular approaches, such as genetic engineering and DNA amplification, has revealed the complexities of cellular interactions which determine physiological and biochemical characteristics. Such methods were, however, relatively limited, given that only a single gene or a few genes could be studied at a time. High-throughput technologies have revolutionised experimental outputs to the point that the data coming out of research activities have to be analysed with powerful computational tools. DNA sequencing, microarray technology, DNA and protein chips, and molecular markers have provided new platforms for understanding how biological information is organised and utilised in different organisms. They have allowed an insight into the causes of diseases and how hosts and pathogens interact, and altogether depict a much more detailed picture of living organisms.

Bioinformatics is an area where computational applications are used for interpreting biological data, mainly from sequences of DNA, RNA or proteins, and from patterns of gene expression. Determination and comparison of protein structures have also become possible through various tools. For this purpose, specific software and algorithms have been developed for particular uses. The field of bioinformatics has developed very rapidly over the last decade and has become indispensable in life sciences research. It integrates various disciplines such as computer science, molecular biology and biochemistry, as well as statistics and mathematics. Data from experiments have to be captured, stored and made easily accessible to users. Large databases store vast amounts of information that can be retrieved and queried by scientists across the world. Many tools are integrated within web-based applications.

This paper discusses bioinformatics resources and tools that are currently used and the opportunities the area presents for Mauritius.

The rest of the paper is organized as follows: Section 2 covers the resources available to support research in the area. Section 3 discusses the initiatives taken to develop a bioinformatics industry in different regions. Section 4 discusses bioinformatics initiatives on the African continent. Section 5 draws attention to the legal and ethical issues to be handled when developing the area of bioinformatics. Section 6 discusses the prospects of bioinformatics for Mauritius. Section 7 makes recommendations for Mauritius to better seize the opportunities and meet potential challenges, and concludes the discussion.

2. Bioinformatics resources worldwide

In order to facilitate ongoing research in bioinformatics, a number of resources are available to researchers. These tools can be broadly categorized as programming tools, databases and data analysis tools.

2.1 Programming Tools in Bioinformatics

The main activity in the bioinformatics discipline is the analysis of biological data, which comprises the following sub-tasks:

• Alignment of DNA sequences for comparison
• Finding motifs within DNA sequences
• Genome assembly following sequencing
• Development of methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences
• Clustering protein sequences into families of related sequences and the development of protein models
• Aligning similar proteins and generating phylogenetic trees to determine evolutionary relationships

Programming tools are software development aids that can be used to create bioinformatics tools. These programming tools need to deal with a huge amount of scattered and complex information (data/text) accurately, reliably and effectively. Some of the main programming tools are as follows:



• BioJava (Biojava, 09): BioJava is an open source project that provides Java tools for processing biological data, including sequence manipulation features, dynamic programming, file parsers and simple statistical routines. It contains a collection of Java programs that represent and manipulate biological data and assist bioinformatics research. It was started at EBI/Sanger (European Bioinformatics Institute (EBI, 09)) in 1998 by Matthew Pocock and Thomas Down.



• BioPerl (BioPerl, 09): BioPerl consists of Perl tools for bioinformatics and provides online resources for modules, scripts and web links for developers of Perl-based software. It has a bioinformatics toolkit for:
    • format conversion
    • report processing
    • data manipulation
    • sequence analyses
    • batch processing



• Biopython (Biopython, 09): Biopython is also an open source project, with goals very similar to those of BioPerl. It is a set of freely available tools for biological computation written in Python, developed as a distributed collaborative effort to create Python libraries and applications which address the needs of current and future work in bioinformatics.



• MATLAB Bioinformatics Toolbox: Toolboxes (e.g., bioinformatics) are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. The Bioinformatics Toolbox extends MATLAB to provide an integrated and extendable software environment for genome and proteome analysis. Together, MATLAB and the Bioinformatics Toolbox give scientists and engineers a set of computational tools to solve problems and build applications in drug discovery, genetic engineering and biological research.



• R language for Statistical Computing (R-project, 09): R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. Its bioinformatics counterpart is Bioconductor (Bioconductor, 09), which provides tools for the analysis and comprehension of genomic data. The broad goals of Bioconductor are to:
    • provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data
    • facilitate the integration of biological metadata in the analysis of experimental data, e.g. literature data from PubMed and annotation data from LocusLink
    • allow the rapid development of extensible, scalable and interoperable software
    • promote high-quality and reproducible research
    • provide training in computational and statistical methods for the analysis of genomic data.


2.2 Databases for Bioinformatics

A very large number of databases covering a wide range of scientific data is available to researchers in bioinformatics (Zvelebil, 08). Data are often duplicated across different databases. An important feature of many databases is that they do not only store sequence data but also contain a lot of relevant non-sequence data, known as annotation, which can include links to related entries in other databases, interpretation of data and relevant research citations. In addition to simply providing information, some of the databases also provide web-based interfaces to programs for online analysis of their data.

A distinction is sometimes made between databases of primary data and those that contain secondary data derived from these primary sources. In some cases, the primary data include raw experimental results such as scans of gene-expression arrays and two-dimensional proteomic gels, but in many cases they include the initial experimental interpretation, e.g. nucleotide sequences. An example of a database containing primary data is SWISS-PROT for protein sequences. Examples of secondary databases are those that contain collections of conserved protein motifs, or comparisons of multiple sequences that give measures of sequence similarity and relatedness and are only based on data existing at that time.

Databases can be categorized as follows:

• Sequence databases

Nucleotide sequence related databases include major international collaborations such as GenBank (NCBI), the EMBL-EBI Nucleotide Sequence Database (EBI, 09) and the DNA Data Bank of Japan (DDBJ). In addition, there are resources that are more gene-specific, with information on introns, exons and splice sites, as well as motifs and transcriptional regulators and sites. There are a number of different types of DNA sequences stored in these databases, differing in the way they have been obtained, and each type provides different biological information. They are:



    • the raw genomic sequence, representing the sequence of chromosomal DNA, which is deposited in GenBank (produced at the National Center for Biotechnology Information (NCBI)) (NCBI, 09) and in the organism-specific DNA sequence databases
    • cDNAs, which are the sequences of DNA molecules that have been synthesized by reverse transcription of mRNA molecules, indicating the range of genes being expressed in the sample used at the time of experimentation
    • Expressed Sequence Tags (ESTs), which are partial cDNA sequences, also indicating the range of genes being expressed in the sample used at the time of experimentation.

Protein sequence databases include the major sequence databases such as UniProtKB (UniProt, 09) and the NCBI Protein Database (NCBI-Protein, 09), both being efforts to collect information on all protein sequences. These protein databases are often compiled from raw nucleotide sequence data. UniProtKB is produced by analysis of all translations of the EMBL database nucleotide sequences. It has two components, namely Swiss-Prot, which is manually annotated, and TrEMBL, which is only computer annotated.

In addition, a multitude of organism-specific or protein family databases have been set up, allowing a more structured organisation of information. Examples include FlyBase (Drosophila melanogaster), TAIR (Arabidopsis thaliana), VectorBase, PlasmoDB (malaria) and the KEGG Pathway Database, which provides pathway maps based on known molecular interactions.

Most of the databases also provide analysis tools for both DNA and proteins.



• Microarray databases and gene expression databases

Microarray databases are repositories of data from microarray experiments, often accompanied by data analysis and tools to visualize the raw image. Gene expression databases also contain expression data collected by other experimental methods such as SAGE (Serial Analysis of Gene Expression) and EST sequencing. The databases contain expression data and often extensive annotation, as well as tools to visualize the numerical data and statistical analysis programs. One such database is the Stanford Microarray Database (SMD), which includes data from more than 7000 microarray experiments. ArrayExpress (ArrayExpress, 09) is another repository for microarray data; it additionally includes the ArrayExpress Data Warehouse, which stores gene-indexed expression profiles from a curated subset of experiments from the database.



• Protein interaction databases

Proteins have to interact with other molecules, including other proteins, to carry out their functions. Protein interaction databases provide an understanding of the functions of proteins and help in building up biological networks that can be used in systems biology. There are a number of such databases, namely:

    • the Database of Interacting Proteins (DIP) (DIP, 09), which contains information only on protein-protein interactions
    • the Molecular INTeraction database (MINT) (MINT, 09), which contains additional information on protein, nucleic acid and lipid interactions
    • the Biomolecular Interaction Network Database (BIND) (BIND, 09), which describes interactions at the atomic level for protein, DNA and RNA
    • the protein Signaling, Transcriptional Interaction and Inflammation Networks Gateway (pSTIING) (pSTIING, 09), which is a web-based application as well as an interaction database for protein-protein and protein-anything else interactions, as well as transcriptional associations
    • the Munich Information Center for Protein Sequences (MIPS), which hosts a comprehensive, manually curated database of mammalian protein-protein interactions.

Proteome (Proteome, 10) is a useful reference for a list of protein interaction databases.





• Structural databases

Structural databases include those containing information on the structure of small molecules, carbohydrates, nucleic acids (DNA, RNA) and proteins. These are the results obtained using various experimental techniques, such as X-ray crystallography or Nuclear Magnetic Resonance (NMR). The most common structural databases are the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (rcsb, 09) and the Macromolecular Structure Database (MSD) (MSD, 09) at EBI.

CATH is a classification of protein structural domains. SCOP (Structural Classification of Proteins) provides detailed information on folds, superfamilies and families, with the aim of being able to reconstruct structural and evolutionary relationships among proteins.

2.3 Data analysis tools

A number of organisations which host databases for bioinformatics applications also provide data analysis tools. The two main ones are the EBI and the NCBI toolboxes.

2.3.1 Toolbox at EBI

The European Bioinformatics Institute (EBI) (EBITools, 09) provides a comprehensive range of tools for the field of bioinformatics. These are subdivided into the following categories:



• Homology and Similarity Programs

BLAST (Basic Local Alignment Search Tool) enables a researcher to compare a query sequence (protein or nucleotide) with a database of sequences, and to identify sequences that resemble the query sequence above a certain threshold.

The Smith-Waterman algorithm is used for performing local sequence alignment, that is, for determining similar regions between two protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.



• Protein Functional Analysis

EBI provides protein analysis applications via InterPro and the InterProScan tool (InterProScan, 09). InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. It classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. It adds in-depth annotation, including GO (Gene Ontology) terms, to the protein signatures.

The InterProScan tool allows users to query their protein sequences against InterPro and to search InterPro by accession number or sequence. It can be used to search for protein repeats, motifs, biochemical function and family.



• Structural Analysis

The determination of a protein's 2D/3D structure is crucial in the study of its functions. EBI provides a set of tools for protein structure analysis and secondary structure prediction. Some of them are:

    • DaliLite: used for pairwise structure comparison, i.e. it compares a given structure (first structure) to a reference structure (second structure).
    • EMSearch: a search tool for electron microscopy depositions.
    • MaxSprout: allows the reconstruction of 3D coordinates from a C(alpha) trace.
    • PQS and PQS-Quick: used to search for Protein Quaternary Structure.



• Sequence Analysis

Sequence analysis encompasses the use of various bioinformatics methods to determine the biological function and/or structure of genes and the proteins they code for. Unknown structure and function can be elucidated through comparison with databases of known structures, sequences and functions. EBI provides a number of tools for sequence analysis, some of which are:

    • ClustalW is a general-purpose multiple sequence alignment tool for nucleotides or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be examined by viewing cladograms or phylograms.
    • EMBOSS-Align contains two programs, each using a different algorithm. For an alignment that covers the whole length of both sequences, the Needle program (based on the Needleman-Wunsch algorithm (Needleman, 70)) is used. To find the best region of similarity between two sequences, the Water program (based on the Smith-Waterman algorithm (Waterman, 76)) is used.

There are also a number of gene finding tools and translation tools.
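As an illustration of the local alignment idea behind the Water program, the following is a minimal Python sketch of Smith-Waterman scoring with traceback. It is not the EMBOSS implementation: the function name smith_waterman, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not EMBOSS defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # The floor of 0 is what makes the alignment local:
            # a poor prefix is dropped instead of being carried along.
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

A global (Needleman-Wunsch style) variant would differ mainly in dropping the floor of 0, initialising the first row and column with cumulative gap penalties, and tracing back from the bottom-right cell.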

2.3.2 Tools at NCBI

The NCBI (NCBITools, 09) provides a comprehensive range of tools for the field of bioinformatics, which can be categorized as follows:



• Nucleotide Sequence Analysis

The nucleotide sequence analysis tools at the NCBI can be summarised as follows:

    • BLAST, used for comparing gene and protein sequences against others in public databases, comes in several forms including PSI-BLAST, PHI-BLAST and BLAST 2 Sequences. Specialized BLASTs are also available for human, microbial, malaria and other genomes, as well as for vector contamination, immunoglobulins and tentative human consensus sequences.
    • Electronic PCR allows a user to search a query DNA sequence for sequence tagged sites (STSs) that have been used as landmarks in various types of genomic maps. It compares the query sequence against data in NCBI's UniSTS, which is a unified, non-redundant view of STSs from a wide range of sources.
    • Model Maker allows a user to view the sequences (mRNAs, ESTs and gene predictions) that were aligned to assembled genomic sequence to build a gene model. It is then possible to edit the model by selecting or removing putative exons. The mRNA sequence and potential ORFs for the edited model can be viewed, and the mRNA sequence data saved for use in other programs. Model Maker is accessible from sequence maps that were analyzed at NCBI and displayed in Map Viewer.
    • ORF Finder identifies all possible open reading frames (ORFs) in a DNA sequence by locating the standard and alternative stop and start codons. The deduced amino acid sequences can then be used in BLAST searches against GenBank.
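The principle behind ORF finding can be sketched in a few lines of Python. This is a simplified illustration and not NCBI's ORF Finder: it assumes the standard genetic code, treats only ATG as a start codon, and scans only the three forward reading frames (a fuller tool would also handle alternative start codons and the reverse strand). The names CODON_TABLE and find_orfs are hypothetical.

```python
# Standard genetic code in the conventional compact layout (bases in TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end, protein) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG; end is the index just past the
    stop codon. Only ORFs with at least min_codons coding codons are kept.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a reading frame at the first ATG
            elif start is not None and CODON_TABLE.get(codon) == "*":
                # Stop codon closes the frame; translate ATG..codon before stop.
                protein = "".join(
                    CODON_TABLE.get(dna[p:p + 3], "X")  # X for ambiguous codons
                    for p in range(start, pos, 3)
                )
                if len(protein) >= min_codons:
                    orfs.append((start, pos + 3, protein))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGGCCTAAGG"))
```

Each returned protein string could then be submitted to a similarity search, mirroring the ORF Finder to BLAST workflow described above.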



• Protein Sequence Analysis and Proteomics

BLAST programs are also available for comparing protein sequences.

    • BLink ("BLAST Link") displays the results of BLAST searches that have been carried out for every protein sequence in the Entrez Proteins data domain.
    • CDART takes a given protein query sequence, displays the functional domains that make up the protein and lists proteins with similar domain architectures.
    • TaxPlot is a tool for 3-way comparisons of genomes on the basis of the protein sequences they encode. In TaxPlot, one selects a reference genome to which two other genomes are compared. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared.



• Structural Analysis

Cn3D is a helper application for web browsers that allows a user to view 3-dimensional structures from NCBI's Entrez retrieval service.

VAST Search is NCBI's structure-structure similarity search service. It compares the 3D coordinates of a newly determined protein structure to those in the MMDB/PDB (Molecular Modeling Database/Protein Data Bank) database.



• Genome Analysis

Entrez Genomes hosts whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids and organelles. Entrez Genomes provides graphical overviews of complete genomes/chromosomes and the ability to explore regions of interest in progressively greater detail.

Clusters of Orthologous Groups (COGs), a system of gene families, were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.



• Gene Expression

Gene Expression Omnibus (GEO) provides several tools to assist with the visualization and exploration of GEO data. Datasets may be viewed as hierarchical cluster heat maps, providing insight into the relationships between samples and co-regulated genes.

SAGEmap provides a tool for performing statistical tests designed specifically for differential-type analyses of Serial Analysis of Gene Expression (SAGE) data. The data include SAGE libraries generated by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP), which have been submitted to GEO.

The Cancer Genome Anatomy Project (CGAP) aims to decipher the molecular anatomy of cancer cells. CGAP develops profiles of cancer cells by comparing gene expression in normal, precancerous and malignant cells from a wide variety of tissues.

2.3.3 Swiss Institute of Bioinformatics

The Expert Protein Analysis System (ExPASy) is a proteomics server of the Swiss Institute of Bioinformatics (SIB) which hosts a variety of proteomics tools, including a structure viewer. Protein identification and characterisation can be performed using a number of different tools that distinguish the different molecular properties of proteins, such as isoelectric point, molecular weight and amino acid composition. Similarity searches as well as pattern and profile searches are also available. Also provided are ViralZone, and HAMAP for microbial proteomes.
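Two of the protein properties mentioned above, amino acid composition and molecular weight, are simple enough to illustrate directly. The Python sketch below is not an ExPASy tool; the function names are hypothetical, and the table holds approximate average residue masses in daltons, with one water molecule added for the free termini. It assumes an unmodified chain made up of the 20 standard amino acids.

```python
from collections import Counter

# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N- and C-terminal groups


def amino_acid_composition(sequence):
    """Return the fraction of each residue present in the sequence."""
    sequence = sequence.upper()
    if not sequence:
        return {}
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in sorted(counts)}


def molecular_weight(sequence):
    """Estimate the average molecular weight of an unmodified peptide."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    peptide = "MKWVTFISLL"  # arbitrary example sequence
    print(amino_acid_composition(peptide))
    print(round(molecular_weight(peptide), 2))
```

The isoelectric point is harder to estimate because it depends on the pKa values assumed for the ionisable groups, so it is left out of this sketch.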

3. Initiatives in other parts of the world

The opportunities presented by biotechnology and bioinformatics have motivated the setting up of research nodes by most nations throughout the world. Many associations share resources and make up regional networks. Three such associations are EMBnet (European Molecular Biology Network), RIBioNet (Latin America) and APBioNet (Asia-Pacific).

3.1 EMBnet

EMBnet (EMBnet, 09) is a science-based group of collaborating nodes throughout Europe, together with a number of nodes outside Europe. The combined expertise of the nodes allows EMBnet to provide services to the European molecular biology community which encompass more than can be provided by a single node. The EMBnet website gives an overview of the organization and of its members. It provides visitors with news of the EMBnet community and new links related to bioinformatics. It also combines the services available on the nodes and publishes EMBnet.news, the electronic newsletter devoted to providing information about what is happening at the national and special nodes.

Since its creation in 1988, EMBnet has evolved from an informal network of individuals in charge of maintaining biological databases into the only organization worldwide bringing bioinformatics professionals together to serve the expanding fields of genetics and molecular biology. Although composed predominantly of academic nodes, EMBnet gains an important added dimension from its industrial members. The success of EMBnet has attracted an increasing number of organizations outside Europe to join the group. EMBnet has a tried-and-tested infrastructure to organise training courses, give technical help and help its members to interact effectively and respond to the rapidly changing needs of biological research in a way no single institute is able to do. In 2005 the organization created an additional type of node, the "associated node", to allow more than one member per country.

The following are some of the main achievements of EMBnet:

• Development of the first complete e-learning system for teaching bioinformatics (EMBER)
• EMBnet's commitment to society is reflected in its active involvement in dealing with relevant problems and diseases (AntiSARS, RBMDB, p53FamTaG)
• EMBnet has pioneered the use of Grid technologies in the biosciences and has been involved in seminal Grid projects (SWEGrid, EGEE, EMBRACE, HealthGrid, WebServices)
• From very early on, EMBnet has promoted the development of distributed computing service initiatives to share workload among international servers (HASSLE, SRSfed, MRSfed, FedBLAST, SIMDAT)
• EMBnet is committed to bringing the latest software algorithms to the user free of charge (EGCG, Pratt, BITS, HoxPred), and continues to develop state-of-the-art public software (EMBOSS) and powerful, easy-to-use, intuitive interfaces (CINEMA, W2H, GeneDoc, WWW2GCG, TOPS, Jalview, wEMBOSS, Jemboss, STACKpack, EMBOSSrunner, eBiotools, WebLab, UTOPIA)
• EMBnet has made major contributions to supercomputing in the life sciences as a means to deliver more powerful and advanced services (Bioccelerator, MPSRCH, INSECTS+MOLLUSCS)
• EMBnet has contributed to the development and maintenance of advanced database systems for the life sciences (SRS, Bioimage, CpGisle, CLEANUP, Webin & Seqin, GQserv, PRINTS, InterPro, STACKdb, UniProt, NyHITS, ENSEMBL, MitoDrome, YeastBASE, MRS, MitoRes)
• EMBnet was the first to come up with advanced solutions for automated database distribution using the Internet (NDT, SynCron)
• The Ping project was for a long time the only existing project giving continuous information about network efficiency across the whole of Europe
• EMBnet had the first gopher and World Wide Web servers in biology (CSC BioBox).

3.2 APBioNet

The Asia Pacific Bioinformatics Network (APBioNet) (APBioNet, 09) is a non-profit, non-governmental, international organization. It focuses on the promotion of bioinformatics in the Asia-Pacific region. Since 1998, its mission has been to pioneer the growth and development of bioinformatics awareness, training, education, infrastructure, resources and research amongst member countries and economies. Its work includes technical coordination and liaison with other international bodies such as EMBnet.

APBioNet has more than 20 organizational and 300 individual members from over 12 countries in the region; its members come from industry, academia, research, government, investors and international organisations. APBioNet has coordinated or co-organised more than 20 international and national meetings in cooperation with members in different economies. It is spearheading a number of key bioinformatics initiatives in the region in collaboration with international organisations such as APAN, APEC, S* Alliance and A-IMBN.

4. The African initiatives

Africa is set to take up the challenge of bringing solutions to its major problems of health and food through the applications of biotechnology and bioinformatics. Capacity building in this area will ensure that scientists have the right tools to address the research issues relevant to the continent. South Africa dominates the scene, with well-established bioinformatics centres and various universities engaged in this direction in order to ensure manpower training. The research output is evidence of the high level of activity currently ongoing. Both Malawi and Zambia have many projects in the health and agricultural sectors that are molecular biology based and therefore need bioinformatics to make good progress. East Africa has several institutions engaged in the utilisation of bioinformatics applications. ILRI, the International Livestock Research Institute in Nairobi, has a state-of-the-art centre where several pathogen genomes have been sequenced. KEMRI, the Kenya Medical Research Institute, is also involved in the application of bioinformatics tools in malaria research. This institute has long-standing support from the Wellcome Trust in the UK, which coordinates major sequencing projects at the Sanger Institute, Cambridge, UK. North Africa is also active in developing bioinformatics; the Pasteur Institute in Tunis has close collaborations with French research centres while working on local problems. Similarly, West Africa runs several health-related projects where bioinformatics tools are widely applied.

The New Partnership for Africa’s Development (NEPAD), whose objective is to stimulate Africa’s development by bridging existing gaps in priority sectors, has identified that the future of Africa lies in the development of Science & Technology. In this respect, in 2003, it adopted an outline of an action plan containing a number of flagship programme areas.


It has been recognized that investment in biosciences can help Africa to ensure food security and better health for its population. Flagship programmes related to biosciences have been clustered to form the Bioscience initiative, which has created four regional networks on the continent. These are:

1. Biosciences Eastern and Central Africa Network (BecA Net).
2. Southern Africa Biosciences Network (SANBio).
3. West Africa Biosciences Network (WAB Net).
4. North Africa Biosciences Network (NAB Net).

Each of these networks consists of a hub and a number of nodes that work towards the development of biosciences, including bioinformatics, in the respective region. These networks provide coordination and financial support to the nodes for capacity building and the development of research projects.

The BecA Net has drawn up a 4-year business plan for achieving its objectives. The BecA Hub has a number of service units, one of which is for bioinformatics. Its bioinformatics platform is hosted on a High Performance Computer (HPC) platform located on the BecA Nairobi campus, and provides advanced computational capabilities in bioinformatics to all BecA Hub scientists to:

• Uncover the wealth of biological information hidden in the mass of DNA sequences, structure, literature and other biological data
• Obtain a clearer insight into the fundamental biology of organisms
• Use this information to enhance the standard of life for mankind.

The Southern Africa Biosciences Network (SANBio) is to cater for the development of biosciences and related areas in 12 countries of the Southern African region, including Mauritius. The strategic objectives of SANBio are to:

• Address Southern African problems in agriculture, health, and environment through the application of bioscience technologies

• Use new developments in biosciences to protect the environment and conserve biodiversity in Southern Africa

• Build and strengthen human capacity in biosciences in Southern Africa

• Promote access to affordable, world-class research facilities within Southern Africa

• Harness indigenous knowledge and technology of the Southern African people for sustainable utilization of natural resources and wealth generation.

Due to its ability to enhance research and development in biosciences, bioinformatics can play an important role in supporting the objectives of SANBio. It is acknowledged that in biosciences in general, including bioinformatics, capacity building is an important stepping stone. SANBio has recently launched a capacity building project for the training of scientists in the region in the various applications of bioinformatics. The aim is to equip university academics and researchers with the skills to teach and implement activities in this field. Several collaborations have been set up for this purpose with the European Molecular Biology Network and with ILRI. The University of Mauritius has been selected as the SANBio regional node for bioinformatics capacity building (Jauferally-Fakim et al., 09).

5. Legal and ethical issues

The prospects of bioinformatics have aroused a lot of interest and enthusiasm in the research community and the public at large. In the agricultural industry many plants have already been genetically modified (Steve Windley, 08) to produce fruits which are resistant to pests, cold and other adverse effects. Many benefits have been reported (Wolfenbarger L. L., and Phifer P. R., 00) from the use of genetically modified plants, such as reduced environmental impacts from pesticides, ease of soil conservation, increased yield and phytoremediation (remediation of polluted soils, sediments, surface waters, and aquifers).

Research in bioinformatics and genetic engineering is also being carried out on human cells to find more effective cures. Bernard Moss, in 1996, reported that the vaccinia virus, no longer required for immunization against smallpox, now serves as a unique vector for expressing genes within the cytoplasm of mammalian cells. As a research tool, recombinant vaccinia viruses are used to synthesize and analyze the structure-function relationships of proteins, to determine the targets of humoral and cell-mediated immunity, and to investigate the types of immune response needed for protection against specific infectious diseases and cancer. The vaccine potential of recombinant vaccinia virus has been realized in the form of an effective oral wildlife rabies vaccine, although no product for humans has been licensed. A genetically altered vaccinia virus that is unable to replicate in mammalian cells and produces diminished cytopathic effects retains the capacity for high-level gene expression and immunogenicity while promising exceptional safety for laboratory workers and potential vaccine recipients.

Rosenberg et al., in 2006, reported that they had achieved cancer regression in patients after the transfer of genetically engineered lymphocytes. The search for which gene is responsible for which disease is a very common topic of research in most groups. Some of the causes have already been identified and simple tests can now determine who is prone to which disease.

One can definitely appreciate all the benefits that biotechnology and bioinformatics have for the health sector and also as a solution to the food crisis. However, researchers have raised several concerns over the safety of genetically modified foods. Researchers are concerned about what effects might come from interfering with the DNA of these crops. What happens to the crops? What happens to the animals and the humans who eat them? Are these plants a problem now? Will they be a problem in the future? Can the bacteria and viruses used to alter the DNA in these plants also affect the bacteria in our body? These issues offer avenues for further research.

With the progress of the Human Genome Project, it will soon be possible to identify the genes which cause different diseases. Simple tests can determine that one is prone to certain diseases or has a high risk of developing certain severe diseases. This raises several ethical issues about how such information can be used. Can a parent decide to abort a child that may be at risk? Can insurance companies decide not to insure a person with a high risk? Can a company decide to reject a job application on the same basis? Will one want to check one's partner’s genetic information before getting into a relationship?

Béatrice Godard and her co-authors (Godard et al., 03) examine the professional and scientific views on the social, ethical and legal issues that impact on genetic information and testing in insurance and employment in Europe. For this purpose, many aspects were considered, such as the concerns of medical geneticists, of the insurers and employers, and of the public, as well as the regulatory frameworks and unresolved issues. The work was based on debates among 47 experts from 14 European countries invited to an international workshop organized by the European Society of Human Genetics Public and Professional Policy Committee in Manchester, UK, 25-27 February 2000. The results stress the need for clear definitions of terms used in genetics, for declaring the grounds on which genetic information is or is not used, and for promoting confidence between the public and the insurance industry. In Europe, there is currently very little use of genetic information in relation to employment, but the situation should be kept under review.


6. Prospects of Bioinformatics in Mauritius

Two of the areas impacted by bioinformatics that are of high relevance to Mauritius are healthcare applications and food security. However, the first line of action should target education at tertiary level. Bioinformatics has been introduced into existing programmes at UoM but it is crucial that additional resources be allocated for implementing programmes and initiating research in this area.

6.1 Healthcare Applications

Traditional drug discovery has relied on the isolation or synthesis of molecules whose activities are then screened through a lengthy and costly process. Pharmacokinetic properties and toxicity have to be determined. This is being replaced by a more molecular targeting approach in which compounds are screened in silico for their ability to bind to proteins and modify their function. This is possible thanks to improved knowledge of the basis of diseases. Most large pharmaceutical firms are already applying this technology. Drug targets can be validated through their 3-D structure using proteomics tools.

Molecular epidemiology of infectious diseases relies on knowledge of the genetic variability of the causative organisms in order to devise adequate control measures. Bacterial and protozoan genomes have become available over the past years and the sequences can be compared with appropriate comparison tools. These methods are more promising for vaccine development as well as for finding new antibiotics. In silico vaccinology allows the identification of appropriate binding molecules to antigenic epitopes that will enhance an immune response in the vaccinated individual.
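
To illustrate what a sequence comparison tool does at its core, the following is a minimal Python sketch of global pairwise alignment by dynamic programming, in the style of Needleman and Wunsch [Needleman, 70]. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration only; production tools use substitution matrices, affine gap penalties and heuristics suited to genome-scale data.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not recommended defaults.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_alignment("ACGT", "ACT")` aligns the shorter sequence with a single gap opposite the `G`. The table has (n+1) x (m+1) cells, so this direct form suits short genes or proteins rather than whole genomes.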

6.2 Food Security

Food production relies on a limited number of plant varieties which are bred for optimal yield and agronomic characters. Major crops, like rice, have already been sequenced while other cereals’ genomes are in the pipeline. It is expected that genome sequences of crops will help improve the quality of food products and ensure adequate production in the future. Bioinformatics is promising for finding useful genes and mapping them on the genomes of both plants and animals. DNA sequence data as well as expression patterns of genes offer promising means of finding ways to deal with insect vectors as well as disease-causing organisms. More effective vaccines are being designed this way.
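
As a simple illustration of locating candidate genes in sequence data, the sketch below scans the forward strand of a DNA string for open reading frames (a start codon ATG followed in-frame by a stop codon). The function name, the `min_codons` parameter and the example sequence are hypothetical. Real gene finding in plant and animal genomes must also handle the reverse strand, introns and statistical gene models, so this is only a toy under those stated simplifications.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end, sequence) tuples for forward-strand ORFs.

    start is 0-based and end is exclusive; each ORF runs from ATG through
    its stop codon. min_codons counts both the start and the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # close this ORF and look for the next ATG
    return sorted(orfs)


if __name__ == "__main__":
    # Hypothetical short sequence used only to exercise the function
    for start, end, seq in find_orfs("CCATGAAATAGGG"):
        print(start, end, seq)
```

The reported coordinates give each candidate's position on the input sequence, which is the simplest form of mapping a gene onto a genome.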

6.3 Opportunities for Mauritius

The ICT sector has been identified as one of the important pillars of the Mauritian economy. Software development is to play an important role in the ICT sector. This activity can be extended to include bioinformatics software development. Mauritius can participate actively in developing software for data mining, simulation and visualization. With the advent of Next Generation Sequencing there will be a high demand for trained manpower to work with applications in genome assembly and annotation. Mauritius can take advantage of such prospects in outsourcing.

However, to seize the opportunities, Mauritius will need to invest in the required resources to support bioinformatics activity. These include the development of the required human resources and high performance computing facilities to support the development of databases and computing tools.

Equally important, there is an urgent need to invest in research facilities to carry out studies in the fields of genomics and proteomics. Mauritius has a high degree of endemicity with unique terrestrial and marine species. The country can derive substantial economic benefits from studying the genomics of these different species, in particular those with medicinal properties. A database of the genomic information about these species would be extremely valuable.

The population of Mauritius comes from different origins, thus providing unique opportunities for understanding the effects of genotypes on diseases. This offers interesting prospects from the genomic perspective. Recent epidemics of both human and animal diseases in the region have resulted in severe setbacks in the economy, thus emphasizing the need for strengthening research in the area of molecular epidemiology of pathogens.

6.4 Bioinformatics at the UoM

In order to support the above-mentioned development, academic institutions need to take the lead to drive research and capacity building in the area. The University of Mauritius, conscious of its important role in this development, has been proactive in initiating appropriate steps. Researchers from the Faculties of Science, Agriculture and Engineering have joined efforts to embark on research in the field of bioinformatics. Among other initiatives, a Bioinformatics Computing Research Group was set up in 2006.

Additionally, an increasing number of programmes related to bioinformatics, or with bioinformatics components, are being offered at both undergraduate and postgraduate levels in the different faculties of the University of Mauritius. New programmes with greater emphasis on bioinformatics are in the pipeline.

Recently, the SANBio (SANBio, 09) Steering Committee approved the designation of the University of Mauritius as a SANBio Node for capacity building in bioinformatics. Among other activities, the University of Mauritius, through the Faculty of Agriculture, will be coordinating the implementation of training programmes in bioinformatics in the SADC region under the auspices of NEPAD. Under this initiative, a computer laboratory (equipped with the necessary hardware and software), sponsored by SANBio, is being set up at the University of Mauritius to support the capacity building.

7. Recommendations and Conclusion

Bioinformatics is relevant to many fields of life, namely:

• Basic science, for understanding living systems at the molecular level.

• Medicine, more specifically for clinical informatics.

• Agriculture and fisheries, so as to improve yield and disease resistance.

• Environment, so as to better understand the biosphere and do biological spill clean-up.

In Mauritius, a number of institutions are concerned with bioinformatics research due to the nature of their activities. Among others, we have the Mauritius Sugar Industry Research Institute (MSIRI), the Mauritius Oceanographic Institute (MOI), the Food and Agricultural Research Council (FARC), the Ministry of Agriculture, the Ministry of Health and academic institutions such as the University of Mauritius. Development of bioinformatics at the national level requires coordination and collaboration among these institutions.

Bioinformatics involves large amounts of data and intensive processing power. In order to support research in this area, there is a need to increase resources for information infrastructure and build the appropriate computing environment. Extensive training programmes in the field, including hands-on exposure to the above-mentioned tools, can kick-start research in the area of bioinformatics, and the University of Mauritius can play a key role in this respect.

Mauritius should aim at building the necessary infrastructure to maintain bioinformatics databases for storing and archiving local data. Such databases should be highly protected against piracy and unethical use; access to this data should therefore be properly controlled. However, overprotection may stifle useful research. Currently Mauritius is equipped only with the Data Protection Act 2004. More research should be conducted to fine-tune the legal aspects of data protection and use.

The field of bioinformatics presents a number of interesting challenges and opportunities for biologists, computer scientists, information scientists and bioinformaticians. These challenges sit at the intersection of biology and information. Ideally, larger scale work in this broad area involves a partnership between those with expertise in relevant foundational domains (e.g. computer scientists) and application domains (e.g. biologists) as well as bioinformaticians to serve as a bridge.

The potential benefits of addressing some of the above-mentioned challenges are numerous, both in terms of improving our understanding in general of how biological systems work and in terms of applying the knowledge to help improve health and treat diseases.

Above all, bioinformatics has brought together researchers, organisations and institutions from different areas with the aim of strengthening collaborative output in scientific discovery.


References

[APBioNet, 09] APBioNet Homepage, http://www.APBionet.org/, accessed on 17 Dec 2009

[ArrayExpress, 09] ArrayExpress Homepage, http://www.ebi.ac.uk/microarray-as/ae/, accessed on 16 Dec 2009

[BIND, 09] Biomolecular Interaction Database Homepage, http://www.ncbi.nlm.nih.gov/pubmed/11125103, accessed on 16 Dec 2009

[Bioconductor, 09] Bioconductor Homepage, http://www.bioconductor.org, accessed on 16 Dec 2009

[Biojava, 09] Biojava Homepage, http://www.biojava.org, accessed on 16 Dec 2009

[BioPerl, 09] BioPerl Homepage, http://www.bioperl.org, accessed on 16 Dec 2009

[Biopython, 09] BioPython Wiki, http://biopython.org/wiki/Main_Page, accessed on 16 Dec 2009

[DIP, 09] Database of Interacting Proteins Homepage, http://dip.doe-mbi.ucla.edu, accessed on 16 Dec 2009

[EBI, 09] EBI Homepage, http://www.ebi.ac.uk, accessed on 16 Dec 2009

[EBI, 09] EMBL-EBI Homepage, http://www.ebi.ac.uk/embl/, accessed on 16 Dec 2009

[EBITools, 09] EBI Tools Homepage, http://www.ebi.ac.uk/Tools/, accessed on 16 Dec 2009

[EMBnet, 09] EMBnet Homepage, http://www.Embnet.org/, accessed on 17 Dec 2009

[Godard, 03] Godard Béatrice, Raeburn Sandy, Pembrey Marcus, Bobrow Martin, Farndon Peter and Aymé Ségolène, “Genetic information and testing in insurance and employment: technical, social and ethical issues”, European Journal of Human Genetics (2003) 11, Suppl 2, S123-S142

[InterProScan, 09] InterProScan Sequence Search, http://www.ebi.ac.uk/Tools/InterProScan/, accessed on 16 Dec 2009

[Jauferally-Fakim, 09] Jauferally-Fakim Y., Puchooa D., Mumba L. “Status of Bioinformatics in Southern Africa: Challenges and Opportunities”, EMBnet.news, vol 15, No. 3, October 2009.

[MINT, 09] Molecular INTeraction Database Homepage, http://mint.bio.uniroma2.it/mint/Welcome.do, accessed on 16 Dec 2009

[Moss, 96] Moss Bernard, 1996, “Genetically engineered poxviruses for recombinant gene expression, vaccination, and safety”, Proc. Natl. Acad. Sci. USA Vol. 93, pp. 11341-11348, October 1996

[MSD, 09] Macromolecular Structure Database Homepage, http://www.ebi.ac.uk/msd/, accessed on 16 Dec 2009

[NCBI, 09] NCBI Homepage, http://www.ncbi.nlm.nih.gov, accessed on 16 Dec 2009

[NCBI-Protein, 09] NCBI Protein Database Homepage, http://www.ncbi.nlm.nih.gov/protein/, accessed on 16 Dec 2009

[NCBITools, 09] NCBI Tools, http://www.ncbi.nlm.nih.gov/Tools/index.html, accessed on 16 Dec 2009

[Needleman, 70] Needleman, S. B. & Wunsch, C. D. (1970). Journal of Molecular Biology, 48, 443-453.

[Proteome, 10] Proteome Homepage, http://proteome.wayne.edu/PIDBL.html, accessed on 11 Jan 2010

[pSTIING, 09] protein Signaling, Transcriptional Interaction and Inflammation Networks Gateway, http://pstiing.licr.org, accessed on 16 Dec 2009

[Rcsb, 09] Structural Bioinformatics Protein Databank Homepage, http://www.rcsb.org/pdb/home/home.do, accessed on 16 Dec 2009

[Rosenberg, 06] Rosenberg S. A., Morgan R. A., Dudley M. E., Wunderlich J. R., Hughes M. S., Yang J. C., Sherry R. M., Royal R. E., Topalian S. L., Kammula U. S., Restifo N. P., Zhili Zheng, Azam N., Christiaan R. de Vries, Linda J. Rogers-Freezer, Sharon A. M., 2006, “Cancer Regression in Patients After Transfer of Genetically Engineered Lymphocytes”, Science 6 October 2006, Vol. 314. no. 5796, pp. 126-129

[R-project, 09] R-Project Homepage, http://www.r-project.org/, accessed on 16 Dec 2009

[SANBio, 06] “Southern African Network For Biosciences (SANBio) Business Plan 2006-2011”, Prepared by SANBio Secretariat, c/o CSIR, Box 395, Pretoria 0001, Republic of South Africa, April 2006

[SANBio, 09] SANBio Home, http://www.san-bio.com/, accessed on 16 Dec 2009.

[Swiss-Prot, 09] Swiss-Prot Homepage, http://www.expasy.ch/sprot/, accessed on 16 Dec 2009

[UniProt, 09] UniProt Homepage, http://www.uniprot.org, accessed on 16 Dec 2009

[Waterman, 76] Waterman, M. S., Smith, T. F. & Beyer, W. A. (1976). Advances in Mathematics, 20, 367-387.

[Windley, 08] Windley Steve 2008, “Genetically Modified Foods”, PureHealthMD.com, Pure Health Corporation Fort Wayne IN USA, 2008.

[Wolfenbarger, 00] Wolfenbarger L. L., and Phifer P. R., 2000, “The Ecological Risks and Benefits of Genetically Engineered Plants”, Science 15 December 2000, Vol. 290. no. 5499, pp. 2088-2093

[Zvelebil, 08] Zvelebil M., Baum J.O., “Understanding Bioinformatics”, Garland Science, ISBN 0-8153-4024-9, 2008