Slide 1 - Digital World 2012

Oct 4, 2013

Bangladesh perspective & prospect on genomics: A few completed cases.

Haseena Khan

Department of Biochemistry and Molecular Biology

University of Dhaka

Bioinformatics represents a potentially new
growth area for Bangladesh to build upon,
taking into account the country's strong ICT
foundation and its commendable effort in
the biotechnology arena.


Bioinformatics cannot be disregarded by any country intending to remain up-to-date in the biomedical, biotechnological and agricultural sectors.


Bioinformatics is one of the most effective enabling technologies for several fields of biomedical and agricultural research.

Developing countries will have to use bioinformatics to manage their own specific data on indigenous biological species.


They will also have to use it in local epidemiology and biodiversity programmes.

As a specialist pursuit, Bioinformatics offers
substantial competitive opportunities to
smaller and developing countries, without the
requirement of prohibitive infrastructural
investment.


Public bioinformatics resources, such as databanks and
software tools that are crucial for biotechnology
projects, are today available via the internet.


Scientists need only a computer and an internet connection of adequate quality to use them.


If these conditions are met, a developing-country biologist is in the same situation as an academic biologist in an industrialized country.




Bioinformatics knowledge itself has not been
restricted by patenting.


Most algorithms are public, and much of the best software source code is freely accessible.


However, strategic support from government and policy makers is necessary.


No Patent Restriction

Making the results of bioinformatics accessible, that is, teaching users to understand bioinformatics, is a general problem.


Expertise is the bottleneck.


The first challenge is to teach the fundamentals of bioinformatics to university students, which is complicated because senior professors are often not familiar with the methodology.


The second difficulty is building and maintaining capacity.


There are few university-level bioinformatics curricula.

Instead of maintaining bioinformatics research groups, many universities choose to support bioinformatics teaching in the general framework of biological curricula.


Knowledgeable user-level teachers are involved in other areas of biological research at the same university.


One of the most important aspects of bioinformatics is
identifying genes within a long DNA sequence.


Until the development of bioinformatics, the only way to locate genes along the chromosome was to study their behavior in the organism (in vivo) or isolate the DNA and study it in a test tube (in vitro).


Bioinformatics allows scientists to make educated guesses about where genes are located simply by analyzing sequence data using a computer (in silico).
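To make the in silico idea concrete, the following minimal Python sketch scans the three forward reading frames of a DNA string for open reading frames: an ATG followed by an in-frame stop codon. It is only an illustration under simplifying assumptions. Real gene finders, such as those named later in the annotation pipeline, also model splice sites, codon statistics and the reverse strand. The `min_codons` cut-off is an arbitrary, hypothetical parameter.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF on the forward strand.

    `start` is the 0-based index of the ATG and `end` is the index just past
    the stop codon, so seq[start:end] is the ORF including its stop codon.
    `min_codons` (a hypothetical cut-off) counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon from the ATG to the first in-frame stop.
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop before the end of the sequence: ignore
                if (j - i) // 3 >= min_codons:
                    orfs.append((frame, i, j + 3))
                i = j + 3  # resume scanning after this ORF's stop codon
                continue
            i += 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 2, 14)]
```

In the usage line, the single ORF `ATGAAATTTTAG` is found in frame 2.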

Bioinformatics in our Laboratories

This makes characterizing a gene sequence obtained from an experiment relatively easy.


To this end, the biologist performs a database search on several of the publicly accessible and frequently updated sequence databases available on the internet.



The gene sequence is compared with the sequences in the DNA database, resulting in a ranked list of ‘hits’ to the most similar sequences found in the database.
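A toy scorer can illustrate the ranked hit list. This sketch is not BLAST or any real search tool. As a simplifying assumption, it scores each database entry by the fraction of the query's k-mers (substrings of length k) that the entry also contains, then sorts entries from most to least similar. The sequences and names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_hits(query: str, database: Dict[str, str], k: int = 4) -> List[Tuple[str, float]]:
    """Rank database entries by the fraction of query k-mers they share."""
    query_kmers = kmers(query.upper(), k)
    if not query_kmers:
        return []
    scored = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq.upper(), k))
        scored.append((name, shared / len(query_kmers)))
    # Highest score first; ties broken by name so the order is stable.
    scored.sort(key=lambda hit: (-hit[1], hit[0]))
    return scored


if __name__ == "__main__":
    database = {  # hypothetical entries
        "seqA": "ATGGCGTACGTTAGC",
        "seqB": "ATGGCGTACCCCCCC",
        "seqC": "TTTTTTTTTTTTTTT",
    }
    for name, score in rank_hits("ATGGCGTACGTT", database):
        print(name, round(score, 2))
```

With these entries, seqA contains every query k-mer (score 1.0), seqB shares 6 of 9 (about 0.67) and seqC shares none.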

Just a few sufficiently similar sequences are usually enough to predict the properties, and hence the natural function, of the new gene or protein with reasonable confidence.


If no obviously similar sequences are found in the databank, more sophisticated tools, such as pattern searching, can reveal characteristics that help predict the properties of unknown genes or proteins.
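Pattern searching can be sketched with regular expressions. The function below converts a simple PROSITE-style pattern into a Python regex and reports every match in a protein sequence. It handles only a subset of the PROSITE syntax; anchors, for example, are not supported. The example pattern N-{P}-[ST]-{P} (the PROSITE N-glycosylation site) and the protein string are illustrative choices, not taken from the work described here.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regex.

    Supports single residues, x (any residue), [..] (allowed set),
    {..} (forbidden set) and repeat counts such as x(2) or x(2,4).
    Anchors (< and >) are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # forbidden residues
        else:
            regex = core  # a residue or an [..] set is already a valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_motifs(protein: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based position, matched text) for every, even overlapping, match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))  # N[^P][ST][^P]
    print(find_motifs("MANGTAKNPSL", "N-{P}-[ST]-{P}"))  # [(2, 'NGTA')]
```

The N at position 7 is not reported because the residue after it is P, which the pattern forbids.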


The majority of current molecular biology research relies on these techniques.

JUTE: A Symbol of National Identity


For Bangladeshis, jute is not just a plant that produces fibre; it is a national icon, linked to the adage Sonar Bangla (“Golden Bengal”).



It is also linked to our quest for economic emancipation.



In mythical golden Bengal, around which much of our national lore is constructed, there were undulating rice fields together with fields of fibre, golden in colour.


Germination and growth of selected accessions in a growth chamber at 16 °C (at 35 days)

Identification of a polymorphic band with a RAPD primer

Legend: 1 - Var. O-4, 2 - Var. O-9897, 3 - Acc. 1805, 4 - Acc. 1540, 5 - Acc. 1805, 6 - Acc. 1852, 7 - Acc. 2015.

[Gel image: lanes 1 to 7, grouped as low temperature tolerant and low temperature sensitive; the 1200 bp polymorphic band is marked]

Low temperature sensitive (Var. O-9897) and tolerant (Acc. No. 1805) parents

An F2 population was raised by crossing these two varieties.

[Image: Var. O-9897, Acc. 1805 and hybrid (F2) plants]

3-day germination of parents and F2 seeds at 16 °C

Lanes 2, 4 and 6-9 are from susceptible plants, and lanes 3, 5 and 10-13 are from tolerant plants.

Amplified fragment closely linked to low temperature
tolerance: Sequence Analysis

Amplification with an arbitrary decamer primer has identified a fragment that correlates strongly with the low temperature resistance trait in jute.

The fragment is isolated from the gel and cloned into a plasmid.

The fragment is sequenced and analyzed with bioinformatic tools.

1200 bp polymorphic band: initial sequence of 1200 bp, with a 174 bp exon in it.

Key: 1. Branch point “A”; 2. Acceptor site “G”; 3. Stop codon; 4. Amplification primer; 5. Coding region; 6. Non-coding region.

Using sequence homology, this gene was identified as encoding a vacuolar protein sorting protein, VPS51.


Jute sequence (translated):
ESTLKLGSILTDGQVGIFKDRSAAAMSTFGDILPVQAGGLLSSFTTTRSDS-

Arabidopsis sequence (match from database):
ESTLKLGSILTDGQVGIFKDRSAAAMSTFGDILPAQAAGLLSSFTNTRSE-


Out of 51 amino acids: 46 identical + 3 highly conserved + 1 addition + 1 substitution.
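The identity count can be reproduced from the two sequences on the slide. In this sketch, the trailing "-" after each sequence is dropped. A single gap is placed at the end of the Arabidopsis sequence to account for the one-residue addition, and that placement is an assumption. Deciding which mismatches count as "highly conserved" would need a substitution matrix or residue grouping, which is not modeled here.

```python
def compare_aligned(seq1: str, seq2: str, gap: str = "-") -> dict:
    """Count identities, mismatches and gap positions in a pairwise alignment.

    Both strings must already be aligned: same length, gaps marked with '-'.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identical": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["identical"] += 1
        else:
            counts["mismatch"] += 1
    return counts


jute = "ESTLKLGSILTDGQVGIFKDRSAAAMSTFGDILPVQAGGLLSSFTTTRSDS"
# Arabidopsis sequence from the slide, padded with one gap (assumed
# placement) so both strings have the same length.
arabidopsis = "ESTLKLGSILTDGQVGIFKDRSAAAMSTFGDILPAQAAGLLSSFTNTRSE-"

result = compare_aligned(jute, arabidopsis)
print(result)  # {'identical': 46, 'mismatch': 4, 'gap': 1}
print(f"identity: {100 * result['identical'] / len(jute):.1f}%")  # identity: 90.2%
```

The 4 mismatches (V/A, G/A, T/N and D/E) correspond to the slide's 3 highly conserved pairs plus 1 substitution. The gap corresponds to the 1 addition.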

Bioinformatics analysis of the sequence

[Figure: intron and exon regions of the sequence]




Some gene-specific primers from the new sequences, and degenerate primers from the upstream region of the Arabidopsis VPS51 gene sequence, were designed. Further sequence was then explored by degenerate-primer-based gene walking and 5’ RACE.


Complete sequence of this putative gene: how do we get hold of it?

AGAACTTGTTCTGGAAGGTGATTTGGAACACAACAATATCTTGTCCAATTATCCTCTGCTTCAGCACCAGTGGCAGATCGTTGAGAGCTTTAAGGCT
CAGATTTCGCAGAGGAGTCGCGAGAGGTTGTTGGATCGAGGCCTTCCCGTTGCTGCCTATGCTGATGCCTTGGCTGCCGTGGCTGTAATTGATGAT
CTGGATCCTGAGCAGGCACTTGGGCTGTTTCTTGAAACGAGGAAGACTTGGATATTGCGTGCATTGAATGCTTTTGCTTCTGCTTCTGCTGGTAATG
CTGCTGATGCTACCTCTTCCATTGCCATTTCAGTGTTTTGTGATGTCTTGAGCATAATTCAAGTTAGCTTAGCGCAGATAGGGGAATTGTTTTTGCAT
GTTTTGAATGATGTGCCTCTCTTCTATAAGGTTATTTTGGGCTCTCCCCCTGCTTCTCAGTTGTATGGTGGGATACCAAATCCTGATGAGGAGGTAA
GGTTGTGGAAATCTTTTAGGGATAAATTAGAATCTGTAACAGTTATGCTCCCTA
AAACTTTCATTTCGAGCACTTGTTGGAATTGGTCGCTGTATTGC
GGAGAACAGATTGGCAATAAGATTAATGGGAGGTATCTCGTTGATGCCATACCAAGTGGCCAAGAACTTGCAACTTCTGAGAAGTTGATAAGGCA
CACAATAGAAAGCAAGGAGGTTTTGGAAGGGAGTTTGGAATGGCTTAAAAGTGTTTTTGGGTCTGAGATCGAGATGCCATGGGATAGGATTAGAG
AACTTGTTCTGGAAGGTGATTTGGATCTTTGGGATGAGATATTTGAAGATGCTTTCGTTAGGAGGATGAAAGTAATTATCGACTTACGATTTGAAGA
TCTGACGAGATCTGTCAATGTACCAGATGCAGTCCGTACTATTGTGGTCACAGCTGGTGAGAAGATGGATTTCCAGGCATATTTGAATAGGCCTTC
TAGGGGTGGGGGGATTTGGTTCACAGAACCTAATAATGTTAAGAAGCCTGTTCCACTATTGGGAAGTAAAGCATTAACTGAAGAAGATAATTTCCA
AAGTTGTCTCAATGCCTACTTTGGTCCTGAAGTGAGTCGAATTAGGGATATAGTAGACAGCTGCTGCAAAAGCATTCTTGAGGATCTATTGAGTTTC
TTAGAATCTGCCAAGGCATCTCTGAGGTTGAAGGATCTAGTTCCATATCTGCAGAATAAATGTTATGAAA


CTAGTTCCATATCTGCAGAAATAAAATGTTATGAAAGCATGTCAGCCATATTGAATGAACTAAAAACTGAGCTTGATATTTTATAC
ACGTCCATCGGAAGTGAACATAAGGAAGGTGATTCTGTGCCTCCTCCTATAATTGTTGAGAGATCCCTATTTATTGGCCGACTCAT
GTTTGCATTTGAGAAATACTCTAAACACATTCCTTTGATTCTTGGTTCTCCACGGTTCTGGGTGAAATACACATCCACTGCAGTTTT
TGAGAAGTTACCTTCCCTGTGGCAGTCTAAAGTTGCCACCGATTCTCCTCTCTCTAACGGCCTTGGAATACAAATGTTCAGTGGCT
CCCAGAGGCAAAGTTCGTCTACTACTTCCGCATTGCTTGGAGCAAATGAAAGTGCAAGCCCTAAACTTGACGAACTTGTTAAGAT
TACGCGAGAGCTCTGCATCAGAGCTTACAGCTTGTGGATATTATGGCTTTATGATGGGCTTTCAGTAATTCTCTCTCAGGAGCTTG
GACAAGATGATGGATTATCTGCAACATCTCCCTTAAGGGGTTGGGAAGAGACAGTTGTTAAGCAAGAACAGACCGATGAGGGGT
CATCAGAGATGAAAATATCACTACCGTCAATGCCTTCTCTTTATGTCATCTCCTCCTATGCCGAGCATGCAGTTCCGCA

CTGTATTGGAGGCCATGTTCTTGATAAATCCATTGTGAAAAAGTTTGCATCAAGCCTCACCGAAAAGGTCATTTCTGTCTACGAA
AATTTTCTCTCTAGTAAAGAAGCCTGTGGAGCTCAAGTGTCAGAGAAAGGAATTTTGCAGGTCTTGTTAGACATAAGATTTGCTA
CTGATATTCTTTCAGGTGGTGATTTCAATGTGAATGAAGAGTTATCTAGCACATCAAAGACAAAATCATCATTTAGAAGGAAGCA
GGATCAAATTCAGACAAAGTCTTTTATTAGAGAACGTGTTGATGGGTTAATCTATCGTCTTTCGCAAAAATTAGATCCCATTGATT
GGCTCACGTATGAGC

CATACTTATGGGAGAATGAAAGGCAAAAGTACCTCCGGCATGCTGTCCTCTTTGGGTTCTTTGTTCAACTTAATCGAATGTACAC
AGATACAATGCAAAAACTGCCTACAAATTCAGAGTCAAATATAATGAGATGTTCTGTGGTTCCACGGTTCAAATATCTTCCAATA
AGTGCTCCAGCATTGTCTTCTAGAGGGACTACTGGGGCATCTATTACAGCTGCCTCAAATGATATTGCTTCAAGAAGTTCCTGGA
GAGCTTATACAGATGGAGAGATTTCCCGGAAAGTTGATATGGATGACCAACAGAGTTTTGGTGTTGCAACGCCATTCCTAAAGT
CTTTCATGCAG

GTTGGAAGTAAATTCGGAGAGAGCACTTTAAAGTTGGGATCTATACTAACGGATGGGCAAGTGGGCATATTCAAGGATAGATC
AGCAGCTGCCATGTCAACATTTGGTGACATTTTACCTGTACAAGCTGGGGGATTTCTTTCTTCATTTACCACCACCAGATCAGA
TTCTTGA

Key to the sequence segments above: initial sequence from the polymorphic RAPD primer; 5’ RACE sequence; sequence from anchor PCR; 1st degenerate RT-PCR; 2nd degenerate RT-PCR.

Total cDNA sequence obtained so far: 2788 bp.
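Joining fragments from different experiments (RAPD, 5’ RACE, anchor PCR and RT-PCR) into one cDNA depends on finding where they overlap. The following is a minimal sketch that assumes exact overlaps and fragments already given in 5’ to 3’ order. Real data can differ by a few bases between experiments, which would call for an alignment that tolerates mismatches and indels. The toy fragments are hypothetical.

```python
from functools import reduce
from typing import List


def merge_overlap(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two fragments on the longest exact suffix/prefix overlap.

    Raises ValueError if no overlap of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("fragments do not overlap")


def assemble(fragments: List[str]) -> str:
    """Merge fragments that are already ordered 5' to 3'."""
    return reduce(merge_overlap, fragments)


if __name__ == "__main__":
    print(assemble(["ACGTTGCA", "TGCATTT", "TTTGGG"]))  # ACGTTGCATTTGGG
```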

TGCTCCAGCATTGTCTTCTAGAGGGACTACTGGGGCATCTATTACAGCTGCCTCAAATGATATTGCTTCAAGAAGTTCCTGGAG
AGCTTATACAGATGGAGAGATTTCCCGGAAAGTTGATATGGATGACCAACAGAGTTTTGGTGTTGCAACGCCATTCCTAAAGT
CTTTCATGCAG

[Figure: 5’ RACE sequence with intron and exon positions marked, 5’ to 3’]

Expression pattern of the putative vps gene

[Gel image: putative vps gene and actin bands in lanes SDLT-N, O-9897-N, SDLT-LowT, O-9897-LowT, SDLT-D, O-9897-D, SDLT-ABA, O-9897-ABA]

The expression pattern was analyzed under different stress conditions (low temperature, dehydration and abscisic acid) after 48 hours.

Expression was down-regulated in O-9897 but up-regulated in low temperature tolerant jute plants.
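A common way to compare band intensities across conditions is to divide the target band by the reference (actin) band in the same lane, then express each condition relative to the untreated control. The sketch assumes intensities have already been measured from the gel image. The numbers and the 1.5-fold threshold for calling up- or down-regulation are hypothetical.

```python
from typing import Dict


def relative_expression(target: Dict[str, float], reference: Dict[str, float],
                        control: str = "N") -> Dict[str, float]:
    """Normalize target intensities to a reference gene (e.g. actin), then
    express each condition as a fold change relative to the control."""
    normalized = {cond: target[cond] / reference[cond] for cond in target}
    baseline = normalized[control]
    return {cond: value / baseline for cond, value in normalized.items()}


def call_regulation(fold: float, threshold: float = 1.5) -> str:
    """Label a fold change as up, down or unchanged (threshold is arbitrary)."""
    if fold >= threshold:
        return "up"
    if fold <= 1 / threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units) for one variety.
    vps = {"N": 100.0, "LowT": 300.0, "D": 40.0}
    actin = {"N": 50.0, "LowT": 50.0, "D": 40.0}
    for cond, fold in relative_expression(vps, actin).items():
        print(cond, round(fold, 2), call_regulation(fold))
```

Normalizing to actin first corrects for unequal loading between lanes. Without that step, a lane that simply received more cDNA would look up-regulated.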

Primers were designed to explore the polymorphism between low temperature tolerant and sensitive varieties.

[Figure: primer positions on the intron and exon structure, with the 3’ and 5’ ends marked]

Polymorphic bands were found for the last intron of a putative vps51/vps67 gene in different species of jute and kenaf.

Lane key: L1 to L5: 1 Kb+ ladder; 1: C. fascicularis; 2: C. aestuans; 3: C. siliquosus; 4: C. tridens; 5: H. cannabinus; 6: H. radiatus; 7: H. acetosella; 8: H. sabdariffa; 9: C. caps (CC45); 10: OM1; 11: O-72; 12: O-4; 13: O-9897; 14: SDLT.

Lane order on the gel: L1, 1, 2, 3, 4, L2, 5, 6, 7, 8, L3, 9, L4, 10, 11, 12, L5, 13, 14.

Three bands for SDLT: 2000 bp (top band), 1750 bp (mid band) and 1500 bp (bottom band). Other band sizes marked on the gel: 1750 bp and 950 bp.
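Band sizes such as 2000, 1750 and 1500 bp are read off against the ladder lanes. One standard approach, assumed here rather than stated on the slide, fits log10(fragment size) as a straight line against migration distance over the ladder bands, then interpolates unknown bands. The distances used below are invented for illustration.

```python
import math
from typing import List, Tuple


def fit_ladder(ladder: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(size in bp) = slope * distance + intercept.

    `ladder` holds (migration distance, fragment size in bp) pairs.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    n = len(ladder)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance: float, slope: float, intercept: float) -> float:
    """Predicted fragment size (bp) for a band at the given distance."""
    return 10 ** (slope * distance + intercept)


if __name__ == "__main__":
    # Hypothetical ladder bands: (distance in mm, size in bp).
    ladder = [(12.0, 5000), (20.0, 2000), (26.0, 1000), (33.0, 500)]
    slope, intercept = fit_ladder(ladder)
    print(round(estimate_size(22.0, slope, intercept)))
```

The log-linear relation holds only over part of a gel's range, so estimates are most reliable between ladder bands of similar size.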

P-box and O2-CS elements were found in the conserved sequences of the last intron of the vps51 gene of jute (Software: TFSitescan).

Forward strand of the
conserved sequence

Reverse strand of the
conserved sequence

Protein binding motifs were searched for.

Motifs were confirmed with online motif analysis tools (Software: JASPAR and ConSite).

Motifs were further confirmed.
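Searching for binding motifs on both strands, as in the forward and reverse strand panels, amounts to scanning for the motif and for its reverse complement. This sketch uses exact matching only, whereas tools such as JASPAR and ConSite score position weight matrices. The motif string `TGTAAAG` and the test sequence are illustrative placeholders, not the consensus sequences reported here.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str) -> List[Tuple[str, int]]:
    """Find exact occurrences of `motif` on both strands of `seq`.

    Positions are 0-based forward-strand coordinates of the leftmost base of
    the site, so hits on the two strands can be compared directly.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append(("+", i))
        if window == rc_motif:
            hits.append(("-", i))
    return hits


if __name__ == "__main__":
    print(find_motif_both_strands("AATGTAAAGCCCTTTACAGG", "TGTAAAG"))
    # [('+', 2), ('-', 11)]
```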

Protein binding motifs in the conserved sequence suggest that the sequence might be a promoter itself or part of a promoter; this is now under study.

Genes for miRNAs were searched for in the introns, but surprisingly one was found in exon 5.

Pri-miRNA sequences were assumed to be transcribed from the reverse complementary sequence of exon 5 (Software: miR-abela).
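The pri-miRNA assumption can be expressed in a few lines: take the reverse complement of the exon DNA and write it as RNA. Any stretch of that RNA is then perfectly complementary, in antiparallel orientation, to the exon 5 mRNA. The exon string below is a hypothetical stand-in, not the jute exon 5 sequence. Real plant miRNA target prediction also allows some mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA strand as RNA with the same sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def pairs_perfectly(mirna: str, mrna: str) -> bool:
    """True if the miRNA is fully complementary to some stretch of the mRNA.

    Pairing is antiparallel, so the target site in the mRNA is the reverse
    complement of the miRNA, written as RNA.
    """
    site = to_rna(reverse_complement(mirna.upper().replace("U", "T")))
    return site in mrna.upper()


if __name__ == "__main__":
    exon = "ATGGCTAGCTTACG"  # hypothetical exon (coding strand)
    mrna = to_rna(exon)
    pri_mirna = to_rna(reverse_complement(exon))
    print(pri_mirna)  # CGUAAGCUAGCCAU
    print(pairs_perfectly(pri_mirna[2:10], mrna))  # True
```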

[Figure: proposed mechanism. Exon 5 undergoes strand separation; the reverse strand is transcribed to give the miRNA; the miRNA (3’ to 5’) binds exon 5 of the mRNA (5’ to 3’); the mRNA is degraded]

The pri-miRNA sequence is assumed to be transcribed from the reverse complementary sequence of exon 5.

miRNA ID: mature miRNA sequence

mir_1_18_24: 5’-AGAGGUCCUUGAAGACUUCGUUAU-3’
mir_1_24_20: 5’-CCUUGAAGACUUCGUUAUAG-3’
mir_1_59_22: 5’-UUAUCUACGGGGUCAUCAGGGA-3’
mir_1_63_23: 5’-CUACGGGGUCAUCAGGGAGAUCU-3’
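The IDs in this table appear to encode a precursor number, a start position and a length (mir_<precursor>_<start>_<length>). That reading is an inference from the numbers, not something the slide states. Under that assumption, a short check confirms two things: each sequence has the length its ID implies, and candidates from the same precursor agree wherever their positions overlap.

```python
from itertools import combinations

# Mature miRNA candidates copied from the table above.
CANDIDATES = {
    "mir_1_18_24": "AGAGGUCCUUGAAGACUUCGUUAU",
    "mir_1_24_20": "CCUUGAAGACUUCGUUAUAG",
    "mir_1_59_22": "UUAUCUACGGGGUCAUCAGGGA",
    "mir_1_63_23": "CUACGGGGUCAUCAGGGAGAUCU",
}


def parse_id(mirna_id: str):
    """Split an ID assumed (not documented) to read mir_<precursor>_<start>_<length>."""
    _, precursor, start, length = mirna_id.split("_")
    return precursor, int(start), int(length)


def check_candidates(candidates):
    """Return a list of problems; an empty list means all IDs are consistent."""
    problems = []
    for mid, seq in candidates.items():
        if len(seq) != parse_id(mid)[2]:
            problems.append(f"{mid}: length {len(seq)} does not match ID")
    for (id_a, seq_a), (id_b, seq_b) in combinations(candidates.items(), 2):
        pre_a, start_a, _ = parse_id(id_a)
        pre_b, start_b, _ = parse_id(id_b)
        if pre_a != pre_b:
            continue
        # Compare the precursor positions that both candidates cover.
        lo = max(start_a, start_b)
        hi = min(start_a + len(seq_a), start_b + len(seq_b))
        if lo < hi and seq_a[lo - start_a:hi - start_a] != seq_b[lo - start_b:hi - start_b]:
            problems.append(f"{id_a} and {id_b} disagree at positions {lo}-{hi - 1}")
    return problems


print(check_candidates(CANDIDATES))  # []
```

For example, mir_1_24_20 begins six bases into mir_1_18_24, and both read CCUUGAAGACUUCGUUAU over the shared positions.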


The pri-miRNA sequence was then subjected to mature miRNA sequence prediction (Software: miRPara).


mir_1_18_24 (5’-AGAGGUCCUUGAAGACUUCGUUAU-3’) was confirmed by stem-loop RT-PCR.


The putative miRNA possibly acts on the vps51 gene in jute.

Expression patterns of the putative miRNA and the vps51 gene

The expression patterns of the putative miRNA and of the vps51 gene were analyzed under different environmental stress conditions (low temperature, abscisic acid, dehydration, fungus and salt) at different time intervals (24, 48 and 72 hours). The analysis covered both the low temperature tolerant variety (SDLT-1) and the sensitive variety (O-9897).

Figure: A. Expression pattern of the Vps51 gene for SDLT-1 at 24 hours on a 1.5% agarose gel. B. Actin band for normalization. Legend: N: Normal; A: Abscisic acid; C: Cold; D: Dehydration; F: Fungus; S: Salt.

[Bar chart: SDLT-1 (Vps51, 24 hours), conditions N, A, C, D, F, S]

A. Expression pattern of the predicted miRNA. B. Actin band for normalization. Legend: N: Normal; A: Abscisic acid; C: Cold; D: Dehydration; F: Fungus; S: Salt.

[Bar chart: SDLT-1 (miRNA, 24 hours), miRNA after 24 hours of stress, conditions N, A, C, D, F, S]

From the jute gene, can we now think of the jute genome?

Options for dealing with uncharacterized genomes:

‘Borrow’ a reference genome from a phylogenetic neighbour

OR

Take a deep breath and ‘do de novo’

[Diagram: de novo genome or de novo transcriptome. DNA or RNA sequence data → assembly → gene annotation, genetic variation, non-coding RNA, transcript variation]
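A toy de Bruijn graph, the data structure behind many short-read assemblers, can illustrate de novo assembly. Every k-mer in the reads becomes an edge between its (k-1)-base prefix and its (k-1)-base suffix. A genome without repeats is then spelled out by walking the single path through the graph. This sketch assumes error-free reads and no repeats. Real assemblers must also handle sequencing errors, repeats, heterozygosity and uneven coverage.

```python
from collections import defaultdict
from typing import Dict, List


def de_bruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer in the reads is one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unique_path(graph: Dict[str, List[str]]) -> str:
    """Spell the contig of a graph that forms one unbranched path.

    Raises ValueError on branches or cycles, which in real data usually
    signal repeats or sequencing errors.
    """
    indegree: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    starts = [node for node in graph if indegree[node] == 0]
    if len(starts) != 1:
        raise ValueError("graph is not a single linear path")
    node = starts[0]
    contig, seen = node, {node}
    while node in graph:
        if len(graph[node]) != 1:
            raise ValueError(f"branch at {node}")
        node = graph[node][0]
        if node in seen:
            raise ValueError(f"cycle through {node}")
        seen.add(node)
        contig += node[-1]
    return contig


if __name__ == "__main__":
    reads = ["ATGCGTA", "CGTACCT", "ACCTTGA"]  # hypothetical error-free reads
    print(walk_unique_path(de_bruijn(reads, 4)))  # ATGCGTACCTTGA
```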

Plant Genomes: Haploid Size

[Figure: Human, Arabidopsis, Jute, Rice, Potato, Sugarcane, Cotton and Barley; circle diameter proportional to haploid genome size]

Raw Data → Assembly → Repeat Masking → Annotation (Structural Annotation, Functional Annotation)

JUTE GENOME ANNOTATION PIPELINE

Analyzing raw data (customized Perl script) → Assembly → Repeat masking

Structural Annotation
Gene prediction: de novo prediction (Glimmer, Augustus, SNAP, GeneMark); homology-based prediction (GeneWise, HalfWise)
Non-coding RNA prediction: tRNAscan (tRNA); snoScan and snoGPS (snoRNA)

Functional Annotation
Domain annotation: InterProScan, Pfam
GO (Gene Ontology) annotation: homology-based annotation transfer
Pathway annotation: KEGG (Kyoto Encyclopedia of Genes and Genomes)
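Repeat masking hides repetitive sequence so that gene predictors do not report it as genes. In practice, a tool such as RepeatMasker finds the repeats. The sketch below simply assumes the repeat coordinates are already known and shows the two usual conventions: soft masking (lowercase) and hard masking (N). The function names and the example interval are hypothetical.

```python
from typing import List, Tuple


def mask_repeats(seq: str, repeats: List[Tuple[int, int]], hard: bool = False) -> str:
    """Mask 0-based, half-open (start, end) repeat intervals in a sequence.

    Soft masking lowercases the repeat bases; hard masking replaces them with N.
    """
    chars = list(seq.upper())
    for start, end in repeats:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


def masked_fraction(masked: str) -> float:
    """Fraction of masked bases (lowercase or N).

    Assumes the unmasked input contained no N bases of its own.
    """
    if not masked:
        return 0.0
    return sum(1 for base in masked if base.islower() or base == "N") / len(masked)


if __name__ == "__main__":
    genome = "ACGTACGTAC"  # hypothetical sequence
    repeats = [(2, 5)]     # hypothetical repeat coordinates
    print(mask_repeats(genome, repeats))             # ACgtaCGTAC
    print(mask_repeats(genome, repeats, hard=True))  # ACNNNCGTAC
```

Soft masking is often preferred before gene prediction, because predictors can still read the bases while knowing they are repeats.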

Work is underway to analyze the genome sequence to discover biomarkers for critical conditions such as temperature, disease, drought and salinity.




The genome sequence is being used to study factors affecting growth, for example to find essential functional genomic mechanisms of synthesis in jute plants grown under stress.


It has created a huge collection of unique DNA ‘spellings’ (sequence variants) that will correlate with particular signs or outward characteristics, called phenotypes.

What did we gain?

More Success in Genomics: the Macrophomina genome, another case study

Characterizing the Macrophomina Genome

Sequence characterization
Sequence comparison
Structure comparison
Systems biology:
Knowledge-based predictions (e.g. text mining)
Protein-protein interaction (PPI)
Protein-small molecule interaction (PSMI)
Protein-macromolecule interactions (e.g. carbohydrate)
Gene regulatory networks (GRN)
Metabolic/reactome networks
Cellular networks
Cluster-based functional annotation




Macrophomina genome: overall statistics

Functional clusters in M. phaseolina

14,249 predicted proteins, using comparative genomics
Homologs for 7,999 proteins in nine fungal genomes
77 paralog families with more than six proteins
The largest paralog family contains 6,210 paralogs
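Paralog families like those counted are often defined by grouping proteins linked by significant similarity hits, either directly or through other members. The following is a minimal sketch using union-find (single-linkage grouping). It assumes the all-against-all hit pairs are already available. This is a simplification and not necessarily the method behind the slide's numbers. MCL-based clustering is often preferred because it limits chaining. The protein names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_families(proteins: Iterable[str],
                     hits: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group proteins into families: connected components of the hit graph.

    `hits` are pairs judged homologous, e.g. by an all-against-all
    similarity search with some cutoff (not shown here).
    """
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families: Dict[str, List[str]] = {}
    for p in parent:
        families.setdefault(find(p), []).append(p)
    return sorted(families.values(), key=len, reverse=True)


if __name__ == "__main__":
    proteins = ["p1", "p2", "p3", "p4", "p5", "p6"]  # hypothetical IDs
    hits = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
    families = cluster_families(proteins, hits)
    print(families)  # [['p1', 'p2', 'p3'], ['p4', 'p5'], ['p6']]
    print(len([f for f in families if len(f) > 2]))  # 1
```

With real data, the slide's criterion of families with more than six proteins would be `[f for f in families if len(f) > 6]`.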


Species-wide genome comparisons

13 fungal species have 58,314 proteins in common.
Among the 14,249 M. phaseolina genes, 7,767 (54.10%) are shared with F. oxysporum.
Both fungi have a broad host range.
M. phaseolina is phylogenetically closest to Botryosphaeria dothidea.


Prediction of transposable elements

Transposons are often important because they are associated with PAMPs (pathogen-associated molecular patterns).



The M. phaseolina genome comprises 2.84% repetitive DNA and 3.98% transposable elements. The transposable elements are classified into 11 families.

Prediction of carbohydrate-degrading enzymes

A cocktail of hydrolytic enzymes (including carbohydrate-active enzymes, CAZymes) is used to degrade the plant cell wall and penetrate the host tissue.



The M. phaseolina genome encodes 362 putative CAZymes, including 219 glycoside hydrolases (GH), 56 glycosyltransferases (GT), 65 carbohydrate esterases (CE), 6 carbohydrate binding modules (CBM) and 16 polysaccharide lyases (PL), comprising more than 80 distinct families. These enzymes do not appear to be tightly clustered.
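The class totals can be checked and tallied mechanically. CAZy family labels start with their class code (GH, GT, CE, CBM, PL, or AA for auxiliary activities), so counting by prefix gives per-class totals. The sketch also confirms that the slide's class counts add up to 362. The family labels in the usage line are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# CAZy class codes that open every family label (e.g. GH5, CBM1).
CAZY_CLASSES = ("GH", "GT", "CE", "CBM", "PL", "AA")


def cazy_class(family: str) -> str:
    """Return the CAZy class code for a family label such as 'GH28'."""
    label = family.upper()
    for code in CAZY_CLASSES:
        if label.startswith(code):
            return code
    raise ValueError(f"unrecognized CAZy family: {family}")


def tally(families: Iterable[str]) -> Dict[str, int]:
    """Count predicted CAZymes per class from their family labels."""
    return dict(Counter(cazy_class(f) for f in families))


# Class counts reported on the slide for the M. phaseolina genome.
REPORTED = {"GH": 219, "GT": 56, "CE": 65, "CBM": 6, "PL": 16}

if __name__ == "__main__":
    print(sum(REPORTED.values()))  # 362, the slide's total
    print(tally(["GH5", "GH28", "CBM1", "PL1"]))  # {'GH': 2, 'CBM': 1, 'PL': 1}
```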

Phenotype profiles

A phenotype profile for the fungus was generated by growing it under different conditions.






Pathogenicity regulatory network

A range of host-fungus interactions is responsible for pathogenesis.



Bioinformatics: a prolific field in
bioscience publications



The well-known journal Genome Research has published few papers with a bioinformatics focus. However, they account for 8 of the journal’s 12 most-cited papers (cited 5000 times on average), out of 2,663 publications as of 1 January 2009.




In Bangladesh, bioinformatics papers have also made it into journals with a good impact factor (IF). The following have been prolific in the field:

the Department of Biochemistry and Molecular Biology, the Department of Genetic Engineering and Biotechnology, and the Department of Computer Science and Engineering at the University of Dhaka;
the Department of Genetic Engineering and Department at Shahjalal University of Science & Technology;
the Department of Genetic Engineering and Biotechnology at Chittagong University.




In the last 3 months, roughly 20 papers by student researchers from the above institutions have been published or accepted in peer-reviewed journals. That is a staggering rate of roughly 7 publications per month (20/3 ≈ 6.7).



These statistics are even more staggering considering the meager research facilities available to the students.


In spite of these successes, we need to develop the necessary national bioinformatics network and human resource development programs.


We need to further develop the infrastructure, connectivity and resources for bioinformatics.