CAPSTONE - Bioinformatics


Developing novel web-based Bioinformatics analysis tools for Comparative Genomics



Kashi Vishwanath Revanna,

Capstone Presentation,

May 1, 2009


Primary Advisor: Dr. Qunfeng Dong,

The Center for Genomics and Bioinformatics (CGB)


Introduction


Comparative genomics

It is the analysis and comparison of genomes from different species.

Identify:
- gene duplications.
- gene inversions.
- gene translocations.
- gene clusters.
- orthologs and paralogs.





Overview


BLAST Output Visualization (BOV) Tool
- visual representation of BLAST output.
- Perl scripts from Rajesh Gollapudi, CGB.

Comparative Genome Cluster Viewer (CGCV)
- gene clusters across multiple genomes.
- database developed by Vivek Krishnakumar, CGB.

Multiple Genome Browser (MGB)
- synteny regions between genomes.




BOV: BLAST Output Visualization Tool


Motivation



Commonly used tool for comparative genomics
- Basic Local Alignment Search Tool (BLAST)*
  - web-based at NCBI or standalone local installation.
  - input: nucleotide/protein sequence(s).
  - database: nucleotide sequences of genes or genomes, or protein sequences.
  - output: textual format.
- BLAST output consists of High-scoring Segment Pairs (HSPs), each of which corresponds to a matching region between the query and the database hit sequence.
- Manual interpretation of these regions can be difficult.
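
To make the notion of an HSP concrete, the following is a minimal Python sketch of reading HSP coordinates from BLAST output. It is not the BOV code (BOV uses Perl and BioPerl). It assumes the 12-column tab-delimited BLAST format (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall) instead of the default textual report, and the `HSP` class, `parse_tabular_blast` function, and sample rows are hypothetical names and data made up for illustration.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    """One High-scoring Segment Pair (illustrative record, not BOV's schema)."""
    query_id: str
    hit_id: str
    percent_identity: float
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    evalue: float
    bit_score: float


def parse_tabular_blast(text: str) -> List[HSP]:
    """Parse 12-column tabular BLAST output into HSP records.

    Column order assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hsps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        hsps.append(HSP(
            query_id=fields[0],
            hit_id=fields[1],
            percent_identity=float(fields[2]),
            query_start=int(fields[6]),
            query_end=int(fields[7]),
            hit_start=int(fields[8]),
            hit_end=int(fields[9]),
            evalue=float(fields[10]),
            bit_score=float(fields[11]),
        ))
    return hsps


# Hypothetical example rows, not real BLAST results.
SAMPLE = "\n".join([
    "# made-up rows for illustration",
    "\t".join(["query1", "hitA", "98.50", "200", "3", "0",
               "1", "200", "501", "700", "1e-100", "350"]),
    "\t".join(["query1", "hitB", "87.39", "111", "14", "0",
               "10", "120", "33", "143", "2e-20", "95.5"]),
])

if __name__ == "__main__":
    for hsp in parse_tabular_blast(SAMPLE):
        print(hsp.query_id, hsp.hit_id, hsp.query_start, hsp.query_end,
              hsp.hit_start, hsp.hit_end)
```

Each record keeps the start and end coordinates on both the query and the hit, which is the information a graphical view of HSPs needs.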




*Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.

Requirement


Post-processing BLAST output.
- Programs are available to
  - flexibly select BLAST matching regions (e.g. MuSeqBox, BioParser).
  - parse the output into a database to facilitate keyword search (e.g. NuclearBLAST program, PLAN web server).

Need
- A tool that graphically represents the HSPs extracted from the BLAST output and provides options to interactively select and analyze them.





Specifications


The tool should
- parse uploaded BLAST output.
- extract HSP coordinates.
- store the information in the database.
- provide a summary of query sequences and corresponding hit sequences.
- generate a visual representation of HSPs.
- provide the ability to manipulate the HSPs.
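
The store-and-summarize steps in this list can be sketched as follows. This is an illustration under stated assumptions, not BOV's implementation: it uses Python's built-in `sqlite3` in place of MySQL, and the table name `hsp`, its columns, the helper functions, and the rows in `HSPS` are all hypothetical.

```python
import sqlite3

# Hypothetical HSP rows:
# (query_id, hit_id, q_start, q_end, h_start, h_end, bit_score)
HSPS = [
    ("query1", "hitA", 1, 200, 501, 700, 350.0),
    ("query1", "hitA", 250, 400, 760, 910, 210.0),
    ("query1", "hitB", 10, 120, 33, 143, 95.5),
    ("query2", "hitC", 5, 90, 1000, 1085, 120.0),
]


def build_database(hsps):
    """Store HSP coordinates in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hsp ("
        "query_id TEXT, hit_id TEXT, "
        "q_start INTEGER, q_end INTEGER, "
        "h_start INTEGER, h_end INTEGER, "
        "bit_score REAL)"
    )
    conn.executemany("INSERT INTO hsp VALUES (?, ?, ?, ?, ?, ?, ?)", hsps)
    conn.commit()
    return conn


def summarize(conn):
    """Per query: number of distinct hits, number of HSPs, best bit score."""
    cursor = conn.execute(
        "SELECT query_id, COUNT(DISTINCT hit_id), COUNT(*), MAX(bit_score) "
        "FROM hsp GROUP BY query_id ORDER BY query_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_database(HSPS)
    for query_id, n_hits, n_hsps, best in summarize(connection):
        print(f"{query_id}: {n_hits} hit(s), {n_hsps} HSP(s), best score {best}")
```

Keeping one row per HSP makes the per-query summary a single grouped query, and the same rows can feed the drawing step.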



Implementation

[Architecture diagram] Components:
- CGB server (Perl 5, Linux platform)
- Web interface (DHTML, Perl, CGI)
- BLAST output (BLASTN/P/X, TBLASTN/X)
- Perl scripts (BioPerl modules)
- MySQL (HSPs, projects, ..)
- Email
- Summary
- Create image (Perl GD library)
- Visualization (JavaScript)
- Download (sequences, HSP, image, ..)

Screenshots

- BLAST output submission
- Query information

Program Release


BOV version 1.0.7 is live and hosted at
http://bioportal.cgb.indiana.edu/bov

Web pages
- in-depth tutorial on using the tool.
- download and installation manual.

Publication

Rajesh Gollapudi*, Kashi Vishwanath Revanna*, Chris Hemmerich, Sarah Schaack, and Qunfeng Dong (2008) BOV - A Web-based BLAST Output Visualization Tool. BMC Genomics. 2008 Sep 15;9(1):414.

* contributed equally



CGCV: Comparative Genome Cluster Viewer


Motivation


Standard practice in comparative genomics
- identification of conserved gene clusters across multiple genomes.

Existing tools rely on pre-computation strategies and algorithms that are genome-wide and computationally intensive.
- Genome-wide orthologs for all gene families are identified from reciprocal best BLAST hits.

Limitations:
- no optimal universal BLAST parameters for all gene families.
- difficulty distinguishing orthologs from paralogs on a genome-wide scale.
- time-consuming updates when new organisms become available.
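
For reference, the reciprocal-best-hit criterion used by such pre-computed approaches can be sketched in a few lines of Python. This is a simplified illustration, not the code of any tool named here: it picks the best hit by bit score only, ignores ties and thresholds, and the gene names and scores are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical BLAST results between genome A and genome B.
A_VS_B = [("a1", "b1", 300), ("a1", "b2", 120),
          ("a2", "b2", 250), ("a3", "b1", 280)]
B_VS_A = [("b1", "a1", 310), ("b2", "a2", 240), ("b2", "a1", 110)]

if __name__ == "__main__":
    for gene_a, gene_b in reciprocal_best_hits(A_VS_B, B_VS_A):
        print(gene_a, "<->", gene_b)
```

In this made-up data, `a3` hits `b1` but `b1` prefers `a1`, so `a3` gets no ortholog call. The outcome depends on the BLAST parameters and scores used, which is the first limitation listed above.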


Requirement


- Updated database.
- A tool that considers only a set of genes, performs a dynamic search against selected genomes, and interactively visualizes gene cluster conservation across the selected genomes.



Specification


The web-based tool should
- maintain a database of prokaryotic and eukaryotic sequences and annotated gene information.
  - database kept in sync with NCBI and Ensembl.
- use the BLAST program to search uploaded query sequences against the database.
  - user selects the BLAST database and parameters.
- generate a Phylogenetic Profiling Table,
  - i.e., the count of HSPs against a given genome with respect to each query sequence.
- provide interactive tools to manipulate the visual representation of the gene clusters across genomes.
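
The Phylogenetic Profiling Table described above is a count matrix with one row per query sequence and one column per genome. A minimal Python sketch follows. It is not CGCV's code (CGCV is written in Perl), and the function names, gene names, and genome labels are hypothetical.

```python
from collections import Counter


def profiling_table(hsps, queries, genomes):
    """Count HSPs per (query, genome) pair.

    hsps: iterable of (query_id, genome) tuples, one per HSP.
    Returns a list of rows, one per query, with one count per genome.
    """
    counts = Counter(hsps)
    return [[counts[(query, genome)] for genome in genomes]
            for query in queries]


def format_table(table, queries, genomes):
    """Render the table as tab-separated text with a header row."""
    lines = ["\t".join(["query"] + list(genomes))]
    for query, row in zip(queries, table):
        lines.append("\t".join([query] + [str(count) for count in row]))
    return "\n".join(lines)


# Hypothetical HSPs: each tuple is one HSP of a query against a genome.
HSPS = [
    ("geneA", "genome1"),
    ("geneA", "genome1"),
    ("geneA", "genome2"),
    ("geneB", "genome1"),
]
QUERIES = ["geneA", "geneB"]
GENOMES = ["genome1", "genome2"]

if __name__ == "__main__":
    print(format_table(profiling_table(HSPS, QUERIES, GENOMES),
                       QUERIES, GENOMES))
```

A zero in a cell means the query had no HSP in that genome under the chosen BLAST parameters.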



Implementation

[Architecture diagram] Components:
- CGB server (Perl 5, Linux platform)
- Web interface (DHTML, Perl, CGI, Ajax)
  - Select genomes
  - Query sequences
- BLAST program
- Perl scripts (BioPerl modules)
- Email
- Phylogenetic Profiling Table
- Create image (Perl, GD library)
- Visualization (JavaScript)
- Download (BLAST output, ..)
- Database (CGB): MySQL (sequences, GFF, GTF)
- GFF format file
- NCBI
- Ensembl
- Perl scripts (download, daily updates)


Program Release


CGCV version 1.0.5 is live and hosted at
http://cgcv.cgb.indiana.edu/

Web pages also provide
- an in-depth tutorial on using the tool.
- a step-by-step procedure for local installation.
- update information on the database.

Publication:

Kashi Vishwanath Revanna, Vivek Krishnakumar & Qunfeng Dong (2009) A web-based software system for dynamic gene cluster comparison across multiple genomes. Bioinformatics, 25(7):956-957.



MGB: Multiple Genome Browser


Motivation


- Comparative genomics involves determining the syntenic regions between two or more genomes.
- Synteny is the preserved order of genes between related species.
- Currently available tools such as SynBrowse* visualize synteny between genomes, but they require pre-computation of alignments.

* Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics 2005, 21(17):3461-3468.
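
As a rough illustration of "preserved order of genes", the Python sketch below groups matched genes into runs whose positions are consecutive in both genomes. It is a deliberately strict toy definition (no gaps, no inversions), not the method used by MGB or SynBrowse, and the function name and gene positions are hypothetical.

```python
def syntenic_blocks(pairs, min_size=2):
    """Find runs of matched genes that are adjacent and in the same order.

    pairs: list of (position_in_genome_a, position_in_genome_b), where a
    position is the gene's ordinal index along the genome.
    A run continues only while both positions increase by exactly 1.
    """
    blocks = []
    current = []
    for pos_a, pos_b in sorted(pairs):
        if current and pos_a == current[-1][0] + 1 and pos_b == current[-1][1] + 1:
            current.append((pos_a, pos_b))
        else:
            if len(current) >= min_size:
                blocks.append(current)
            current = [(pos_a, pos_b)]
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical matched gene positions between genome A and genome B.
PAIRS = [(0, 10), (1, 11), (2, 12), (3, 40), (5, 20), (6, 21)]

if __name__ == "__main__":
    for block in syntenic_blocks(PAIRS):
        print("block:", block)
```

Real synteny detection usually tolerates gaps, rearrangements, and reversed orientation, and the matched pairs can come from any sequence comparison method.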


Specification


- To develop a web-based tool for visualizing synteny across multiple genomes.
- To allow users to determine synteny using their choice of sequence comparison methods/tools.
- To be portable, with a simple installation procedure.


Progress


- Currently building this tool.
- Expected time of completion: end of June.


Conclusion


- Web-based tools were built to assist biologists in comparative genomics.
- The work covered design, implementation, testing, maintenance, and user support.
- The tools balance usability, functionality, and portability.

Future work
- further development.
- help biologists incorporate these tools into their workflows.





References



Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.

Dong Q, Lawrence CJ, Schlueter SD, Wilkerson MD, Kurtz S, Lushbough C, Brendel V: Comparative plant genomics resources at PlantGDB. Plant Physiol 2005, 139(2):610-618.

Xing L, Brendel V: Multi-query sequence BLAST output examination with MuSeqBox. Bioinformatics 2001, 17(8):744-745.

Catanho M, Mascarenhas D, Degrave W, de Miranda AB: BioParser: a tool for processing of sequence similarity analysis reports. Appl Bioinformatics 2006, 5(1):49-53.

Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res 2002, 12(10):1611-1618.

Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics 2005, 21(17):3461-3468.

Wang H, Su Y, Mackey AJ, Kraemer ET, Kissinger JC: SynView: a GBrowse-compatible approach to visualizing comparative genome data. Bioinformatics 2006, 22(18):2308-2309.

Fong C, et al.: PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes. BMC Bioinformatics 2008, 9:170.

Koski LB, Golding GB: The closest BLAST hit is often not the nearest neighbor. J Mol Evol 2001, 52:540-542.

Markowitz VM, et al.: The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res 2008, 36:D528-D533.

Uchiyama I, et al.: CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes. BMC Bioinformatics 2006, 7:472.


Acknowledgment


- Dr. Qunfeng Dong, Bioinformatics Director, The Center for Genomics and Bioinformatics (CGB).
- Bioinformatics Faculty and Staff, School of Informatics.
- Friends and colleagues at CGB for their support and resources.
- Special thanks to my family.

Thank You.

