Seahawk

raviolirookery (Biotechnology)

2 Oct 2013






What is Seahawk?


Seahawk is a browser for Moby Web services, which are online tools that use a shared discovery registry and shared data formats. This commonality lets users chain multiple services together into an analysis pipeline; when such a pipeline is visual and executable by a computer, it is commonly referred to as a workflow.
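The chaining idea can be made concrete with a small sketch. This is not Seahawk's implementation: each hypothetical service below simply declares the data type it consumes and produces, and a chain is accepted only when every output type matches the next service's input type. The names runNCBIBlastp and PMUT echo services used later in this tutorial, but the type names BlastReport and PmutReport and the services' behaviour are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    """A hypothetical service description: name, input type, output type, action."""
    name: str
    input_type: str
    output_type: str
    run: Callable[[str], str]


def build_workflow(services: list[Service]) -> Callable[[str], str]:
    """Check that adjacent services have compatible types, then compose them in order."""
    for upstream, downstream in zip(services, services[1:]):
        if upstream.output_type != downstream.input_type:
            raise TypeError(
                f"{upstream.name} produces {upstream.output_type}, "
                f"but {downstream.name} expects {downstream.input_type}"
            )

    def workflow(data: str) -> str:
        # Feed each service's output into the next one.
        for service in services:
            data = service.run(data)
        return data

    return workflow


# Placeholder services: the lambdas only tag the data to show the order of calls.
blast = Service("runNCBIBlastp", "AminoAcidSequence", "BlastReport",
                lambda seq: f"blast({seq})")
pmut = Service("PMUT", "BlastReport", "PmutReport",
               lambda report: f"pmut({report})")

pipeline = build_workflow([blast, pmut])
print(pipeline("MKV"))  # pmut(blast(MKV))
```

Checking types before running anything mirrors why a shared data format matters: a chain such as `[pmut, blast]` is rejected up front with a `TypeError` instead of failing halfway through.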

The goal of Seahawk is to help biologists automate their analyses (i.e. create
workflows) without needing to explicitly write a program.


To make a wider array of tools available within Seahawk, the Daggoo system helps users adapt forms on existing Web sites to Moby's specifications. Seahawk has been developed so that Moby and external Web tools can be browsed to create workflows "by demonstration".



How is Seahawk different from other Moby clients?


Seahawk is an application that provides richer user interaction than Web page-based clients. Examples include drag ‘n’ drop utilities and tooltips. Because Seahawk is written in Java, it can also be embedded in other applications, such as Bluejay [2] or jORCA [3].


Seahawk is data-centric. There is only one screen type in Seahawk: the data display. Services are shown as choices in popup menus. Displayed data can be your own text files, binary files (e.g., sequence traces and images), Web pages, or MOBY objects.



Figure 1. Left: the Seahawk interface, with cascading menus of services that can be run on data (an NCBI GI number in this case). Right: a workflow generated by Seahawk, derived from interactively browsing services.


How do I use Seahawk?


A visual guide provided with this tutorial demonstrates how data creation by highlighting and hyperlink navigation work in Seahawk. It also shows how to save a tab’s navigation history as a Taverna [4] workflow.



References


[1] Gordon PMK, Sensen CW (2007) “Seahawk: Moving Beyond HTML in Web-based Bioinformatics Analysis.” BMC Bioinformatics 8:208.

[2] Soh J, Gordon PMK, Taschuk ML, Dong A, Ah-Seng AC, Turinsky AL, Sensen CW (2008) “Bluejay 1.0: genome browsing and comparison with rich customization provision and dynamic resource linking.” BMC Bioinformatics 9:450.

[3] Martín-Requena V, Ríos J, García M, Ramírez S, Trelles O (2010) “jORCA: easily integrating bioinformatics Web Services.” Bioinformatics 26(4):553-559.

[4] Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T (2006) “Taverna: a tool for building and running workflows of services.” Nucleic Acids Res. 34(Web Server issue):W729-W732.





Activities




The data for these exercises can be found at

http://moby.ucalgary.ca/acgc


1. Creating a Taverna workflow in Seahawk


Creating MOBY data by highlighting text:

1. Open Seahawk, then click on the Clipboard tab to give it the focus.
2. In a Web browser, go to the NCBI homepage and use the text box at the top of the screen to look up a gene of interest to you, or “ferredoxin” if nothing strikes your fancy.
3. Follow the appropriate links through to a GenBank protein record. Choose FASTA from the “Display:” drop-down menu in the upper left corner.
4. Highlight the sequence, then drag it onto the clipboard.
5. Note that Seahawk imports the data as a MOBY AminoAcidSequence.
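How Seahawk recognizes highlighted text is not described here, so the following is only a minimal sketch of the general idea: turning highlighted FASTA text into a typed protein-sequence object. The class name mirrors the Moby AminoAcidSequence type, but its fields, the function name parse_highlighted_fasta, and the accepted alphabet are assumptions made for illustration.

```python
from dataclasses import dataclass

# One-letter amino acid codes plus ambiguity codes (B, Z, X), U, O and the stop "*".
# This alphabet is an assumption for the sketch, not Seahawk's actual rule.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBZXUO*")


@dataclass
class AminoAcidSequence:
    """Hypothetical stand-in for the Moby AminoAcidSequence data type."""
    name: str
    sequence: str


def parse_highlighted_fasta(text: str) -> AminoAcidSequence:
    """Turn highlighted FASTA text (header plus sequence lines) into a typed object."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header_words = lines[0][1:].split()
    name = header_words[0] if header_words else ""
    # Join the wrapped sequence lines into one string.
    residues = "".join(lines[1:]).upper()
    if not residues:
        raise ValueError("no sequence found after the header")
    unexpected = set(residues) - PROTEIN_ALPHABET
    if unexpected:
        raise ValueError(f"not a protein sequence, unexpected symbols: {sorted(unexpected)}")
    return AminoAcidSequence(name=name, sequence=residues)


# Made-up example text, as it might look after highlighting a FASTA display.
record = parse_highlighted_fasta(""">example_protein made-up sequence for illustration
MAYKIVLVNG
TEIEVQ
""")
print(record.name, record.sequence)  # example_protein MAYKIVLVNGTEIEVQ
```

A nucleotide string such as ACGT would also pass this check, because A, C, G and T are valid amino acid codes, so a real recognizer needs more evidence than the alphabet alone.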


Chaining services:

1. Click the AminoAcidSequence link on the Seahawk Clipboard and run the runNCBIBlastp service with the default parameters.
2. Run a point mutation analysis (PMUT) on the BLAST results.
3. Save the PMUT analysis as an HTML document to your desktop.


Data collation with the clipboard:

1. Open the exercise’s first DNA sequence in Seahawk and add it to the clipboard.
2. Do the same with the second and third DNA sequences.
3. Run a multiple sequence alignment on the collection of three sequences.
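Collecting sequences on the clipboard builds a collection that the alignment service receives as a single input. How Seahawk packages that collection internally is not covered here; as a rough analogue, this sketch gathers hypothetical (name, sequence) records in a list and serialises them as multi-FASTA, a format most multiple-alignment tools accept. The sequence names and sequences are made up.

```python
def to_multi_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Serialise (name, sequence) pairs as multi-FASTA, wrapping sequence lines at width."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


clipboard = []  # stands in for the Seahawk clipboard collection
clipboard.append(("seq1", "ACGTACGTAA"))
clipboard.append(("seq2", "ACGTTCGTAA"))
clipboard.append(("seq3", "ACGAACGTAA"))
print(to_multi_fasta(clipboard), end="")
```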


Creating MOBY data by drag ‘n’ drop:

Drop the ABI trace file (a.k.a. sequence chromatogram) onto the Seahawk window.
Note that
Bluejay can’t show the contents of the binary trace file.

Find the chain of services that will do the following:

1. Do the base calling to turn the trace file into a textual DNA sequence “read”.
2. Trim the vector sequence from the read (the left and right vector sequences for the experiment are also found on the tutorial Web page).
3. Trim low-quality regions from the read (i.e., sections of 10 or more bases with more than 30% ambiguous calls).
4. BLAST the trimmed sequence against UniProt (a non-redundant reference gene dataset), setting the e-value threshold to 1.0 × 10^-5.
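The trimming and filtering steps in this chain are simple enough to sketch, although this is not the code behind the Moby services you will find. Assumptions: vector removal uses exact string matching (real vector screening tolerates mismatches and partial vector sequence at read ends); the quality rule is read as "keep the longest stretch in which every 10-base window has at most 30% ambiguous calls", counting IUPAC codes other than A, C, G and T as ambiguous; and BLAST hits arrive as hypothetical (identifier, e-value) pairs. Base calling and the BLAST search itself are left to the services.

```python
# IUPAC codes other than A, C, G, T are treated as ambiguous calls (assumption).
AMBIGUITY_CODES = "NRYKMSWBDHV"


def trim_vector(read: str, left_vector: str, right_vector: str) -> str:
    """Cut everything up to the end of the left vector and from the right vector on.
    Exact matching only: a simplification of real vector screening."""
    read = read.upper()
    left = read.find(left_vector.upper())
    if left != -1:
        read = read[left + len(left_vector):]
    right = read.find(right_vector.upper())
    if right != -1:
        read = read[:right]
    return read


def trim_low_quality(read: str, window: int = 10, max_fraction: float = 0.30) -> str:
    """Keep the longest stretch where every window has at most max_fraction ambiguous calls."""
    read = read.upper()
    if len(read) < window:
        return read.strip(AMBIGUITY_CODES)
    good = [
        sum(base in AMBIGUITY_CODES for base in read[i:i + window]) / window <= max_fraction
        for i in range(len(read) - window + 1)
    ]
    best_start, best_end = 0, 0
    run_start = None
    # A trailing False closes any run of good windows that reaches the end of the read.
    for i, ok in enumerate(good + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            end = i - 1 + window  # good windows run_start..i-1 cover bases up to here
            if end - run_start > best_end - best_start:
                best_start, best_end = run_start, end
            run_start = None
    # Window boundaries can leave a few ambiguous bases at the edges; drop them too.
    return read[best_start:best_end].strip(AMBIGUITY_CODES)


def filter_hits(hits: list[tuple[str, float]], max_evalue: float = 1.0e-5) -> list[tuple[str, float]]:
    """Keep BLAST hits whose e-value is at or below the threshold."""
    return [(hit_id, evalue) for hit_id, evalue in hits if evalue <= max_evalue]


# Made-up read: left vector, a noisy stretch, the insert, then the right vector.
read = "GGATCC" + "NNNNNNNNNN" + "ACGTTGCAACGTTGCAACGT" + "GAATTC"
insert = trim_vector(read, left_vector="GGATCC", right_vector="GAATTC")
print(trim_low_quality(insert))  # ACGTTGCAACGTTGCAACGT
print(filter_hits([("hitA", 1e-30), ("hitB", 0.2), ("hitC", 1e-5)]))
```

Order matters here as in the exercise: trimming the vector first keeps vector bases from being mistaken for part of the insert when the quality windows are evaluated.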


Workflow export:

Save the trace file exercise as a workflow, then try loading it in Taverna.




2. Creating a Moby Service from an Existing Web Service


What is this?


BioMoby tools and Web Services let you extract lots of information without writing Perl scripts (screen scraping)! By the end of this exercise, you should:




- Be able to find Web Services (described in WSDL) for online tools you use.
- Be able to extract specific data from a Web Service using Seahawk.


Finding Web Services


BioMoby clients: Seahawk, MOWserv, jORCA…

Biocatalogue: www.biocatalogue.org

Seekda: www.seekda.com

Google query: “toolname WSDL”


Using Web Services in Seahawk


Common Steps:

1. Find sample input.
2. Put the sample data on the clipboard.
3. Find the WSDL you need.
4. Drop the WSDL tab (the browser tab showing the WSDL document) onto Seahawk.
5. Wait for Seahawk to display a service form in your Web browser.
6. Drag the sample data from Seahawk onto the appropriate service form field, and fill in any other required fields.
7. Ensure the form says “Paste Noted” in the upper-right corner.
8. Click “Execute Service” and be patient!
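Step 4 hands Seahawk a WSDL document, which, among other things, lists the operations a service offers and the messages each one expects. As a rough illustration of what is inside such a document (not Seahawk's own code), this sketch uses Python's standard xml.etree.ElementTree to list the operations declared in a WSDL 1.1 portType. The WSDL fragment is made up and omits the types, message, binding and service sections a real one contains.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace
NS = {"wsdl": WSDL_NS}

# A made-up, minimal WSDL 1.1 fragment for illustration only.
SAMPLE_WSDL = f"""<definitions xmlns="{WSDL_NS}" xmlns:tns="urn:example"
             name="ExampleService" targetNamespace="urn:example">
  <portType name="ExamplePortType">
    <operation name="lookupRecord">
      <input message="tns:lookupRecordRequest"/>
      <output message="tns:lookupRecordResponse"/>
    </operation>
    <operation name="searchRecords">
      <input message="tns:searchRecordsRequest"/>
      <output message="tns:searchRecordsResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> list[tuple[str, str]]:
    """Return (operation name, input message) pairs from every portType."""
    root = ET.fromstring(wsdl_text)
    operations = []
    for port_type in root.findall("wsdl:portType", NS):
        for op in port_type.findall("wsdl:operation", NS):
            input_el = op.find("wsdl:input", NS)
            message = input_el.get("message") if input_el is not None else None
            operations.append((op.get("name"), message))
    return operations


print(list_operations(SAMPLE_WSDL))
# [('lookupRecord', 'tns:lookupRecordRequest'), ('searchRecords', 'tns:searchRecordsRequest')]
```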


Exercise 1

In these two exercises, we’ll try to get from an NCBI GI number to functional descriptions. First, we’ll map from the GI number to the Entrez Gene ID via the GI’s GenBank record. You’ll need to find the NCBI Entrez WSDL document (hint: try the Google query tip above).


For convenience, you can use the NCBI GI number in the Seahawk “Help” tab, accessed by clicking the Help icon. If you followed the steps listed above, Seahawk should recognize the Entrez Gene ID (and some other data) in the output. You can click on the link in the Seahawk browser and create a Moby service producing Entrez Gene IDs. All you need to do is fill in some metadata to label and describe the service.


HINT:
The NCBI WSDL main page lists the values you can use for the “db” field. Try “protein”.


Once you’ve created the service, you will be able to see it in the Seahawk service menu whenever any GI number is highlighted. Go ahead, try another GI!
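For comparison outside Seahawk, the same GI-to-Gene mapping can be requested from NCBI's E-utilities ELink endpoint, which is a different interface from the WSDL service this exercise uses. The sketch only builds the request URL and parses a made-up, abbreviated response assuming the usual eLinkResult layout; fetching the URL (for example with urllib.request) is left out so the sketch runs offline. The IDs 123456 and 7890 are placeholders, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_protein_to_gene_url(gi: str) -> str:
    """Build an ELink URL asking which Entrez Gene records a protein GI links to."""
    params = {"dbfrom": "protein", "db": "gene", "id": gi}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


def gene_ids_from_elink(xml_text: str) -> list[str]:
    """Pull linked Gene IDs out of an ELink XML response (assumed eLinkResult layout)."""
    root = ET.fromstring(xml_text)
    # Only IDs under LinkSetDb are links; IdList/Id holds the query ID itself.
    return [id_el.text for id_el in root.findall(".//LinkSetDb/Link/Id")]


# Placeholder values only: neither ID refers to a record chosen for this tutorial.
url = elink_protein_to_gene_url("123456")
sample_response = """<eLinkResult><LinkSet>
  <DbFrom>protein</DbFrom><IdList><Id>123456</Id></IdList>
  <LinkSetDb><DbTo>gene</DbTo><LinkName>protein_gene</LinkName>
    <Link><Id>7890</Id></Link>
  </LinkSetDb>
</LinkSet></eLinkResult>"""
print(url)
print(gene_ids_from_elink(sample_response))  # ['7890']
```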



Exercise 2


The recognized Entrez Gene ID format is shown in red, just above the Summary section. Create a service that extracts GeneRIFs (concise functional descriptions of a gene) from the Entrez Gene Web Service’s XML output. You will need to find the “generif” XML link, and on the next page pick the appropriate item from the drop-down menu to get function sentences.


In case you are not familiar with GeneRIFs, below is how they normally appear in a browser, with yellow highlight added for emphasis.
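Extracting GeneRIFs amounts to selecting particular elements from the XML output. This sketch uses Python's xml.etree.ElementTree on a made-up, heavily simplified fragment. It assumes GeneRIFs appear as Gene-commentary elements whose Gene-commentary_type carries value="generif", with the sentence in Gene-commentary_text; check these element names against real Entrez Gene XML before relying on them. The function name extract_generifs is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up fragment modelled on the assumed Entrez Gene XML layout;
# real records nest these elements much more deeply.
SAMPLE_GENE_XML = """<Entrezgene>
  <Entrezgene_comments>
    <Gene-commentary>
      <Gene-commentary_type value="generif"/>
      <Gene-commentary_text>Example function sentence one.</Gene-commentary_text>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_type value="comment"/>
      <Gene-commentary_text>Not a GeneRIF.</Gene-commentary_text>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_type value="generif"/>
      <Gene-commentary_text>Example function sentence two.</Gene-commentary_text>
    </Gene-commentary>
  </Entrezgene_comments>
</Entrezgene>"""


def extract_generifs(xml_text: str) -> list[str]:
    """Return the text of every commentary whose type value is 'generif'."""
    root = ET.fromstring(xml_text)
    rifs = []
    for commentary in root.iter("Gene-commentary"):
        kind = commentary.find("Gene-commentary_type")
        text = commentary.find("Gene-commentary_text")
        if kind is not None and kind.get("value") == "generif" and text is not None:
            rifs.append((text.text or "").strip())
    return rifs


print(extract_generifs(SAMPLE_GENE_XML))
# ['Example function sentence one.', 'Example function sentence two.']
```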














3. Create a Moby Service from a Web Form


If the resource you want to include in your workflow does not have a WSDL document, you can use a nearly identical technique to the last exercise to create a service from a standard Web form. Instead of dragging the WSDL URL onto Seahawk, drag the URL of the Web page containing the form. Note that Web forms can be quite complicated, so this will not work for every form. In class, you will be asked to perform one of these exercises manually and the other using Seahawk, to highlight why you might choose one method over the other.
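Turning a form into a service starts from the form's action URL, its submission method, and its named fields. The sketch below is not Daggoo's or Seahawk's actual code; it uses the standard library's html.parser to list those pieces for a made-up form. Pages that build forms with JavaScript, contain several forms, or depend on session state are among the reasons the approach does not work for every form.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL, method and named fields of each <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.in_form = True
            self.forms.append({
                "action": attributes.get("action") or "",
                "method": (attributes.get("method") or "get").lower(),
                "fields": [],
            })
        elif self.in_form and tag in ("input", "select", "textarea"):
            name = attributes.get("name")
            if name:  # controls without a name are not submitted with the form
                self.forms[-1]["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


# Made-up page for illustration; the action path is hypothetical.
SAMPLE_PAGE = """<html><body>
<form action="/cgi-bin/example_tool" method="POST">
  <textarea name="sequence"></textarea>
  <select name="matrix"><option>BLOSUM62</option></select>
  <input type="submit" value="Run">
</form>
</body></html>"""

parser = FormFieldParser()
parser.feed(SAMPLE_PAGE)
print(parser.forms)
# [{'action': '/cgi-bin/example_tool', 'method': 'post', 'fields': ['sequence', 'matrix']}]
```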


Exercise 1

Given the 9 GI numbers for nucleotide (EST) sequences in the online materials, retrieve the sequences. For each sequence that contains the promoter motif, design a hybridization probe using the online version of Primer3. Output the motif-containing sequences and their probes.


Exercise 2

Given the GO term “seed growth” (GO:0080112), find the related genes in the Arabidopsis genome, then BLAST their sequences to find the equivalent rice genes.

