20120203194431!EMBOSS_on_Biolinuxx

raviolirookery, Biotechnology

2 Oct 2013



Running BioLinux

The Desktop:

1) The desktop has some excellent icons to help you get started. There are two User Guides, a folder of Sample Data for your programs, and an icon that opens a search screen for the documentation of the installed programs.

The top User Guide has information and instructions for the administrator of the BioLinux virtual machine (VM). It contains information on setting up user accounts, getting software updates for the VM, system configuration, installing new software, and many other things.

The second User Guide is more of a user tutorial, and is more detailed than this handout. It explains the basic Linux commands, and then some of the installed BioInformatics programs. It also has a few exercises to help you learn to navigate the system. It is written for the Unix novice; experienced Unix users can skip to "Part Two: Introduction to Bioinformatics on Bio-Linux".

The icon called "BioInformatics Docs" contains some documentation on all of the pre-installed BioInformatics programs in Bio-Linux.

2) The OpenOffice software suite and the editors "gedit", "nano", "pico", and "vi" are included.

3) The top BioLinux taskbar. From left to right, the things you see in the top taskbar by default are:

a. Applications menu - the Bioinformatics sub-heading has links to many of the installed programs
b. Places menu - links to folders and installed hardware
c. System menu - to customize and administer your system
d. Firefox Web Browser
e. Ubuntu Help file
f. Evolution Mail (reading and sending emails)
g. Terminal (opens a terminal window)

You may or may not then find an exclamation mark in a bubble. If you do, it is the Package Manager alerting you that there are updates available for your system. Next there might be a network symbol, or up and down arrows; either one means you have a network connection. Alternatively, you may see a little red exclamation mark, which means you have no network connection.

h. The Volume Control icon for audio
i. The envelope icon allows you to set up Chat or Mail on the system.
j. The System Clock - you can also click on this to open a calendar.
k. The networking menu displays the username. Use this menu for social networking, including chat accounts, logging into Ubuntu One, etc.
l. The Power button - clicking on this brings up a menu with options to:
   - Lock screen
   - Log out
   - Restart, or
   - Shut down the computer

4) The bottom taskbar displays icons for open windows and, at the right-hand side, links to four virtual desktops and the Recycle bin.

Note: the BioLinux VM comes with software already installed, but not with data. The example files here contain an example Blast DB, but it is quite small, and insufficient for actual research work.

Running BioInformatics Programs (EMBOSS):

Exercise 1: EMBOSS Graphical Commands

EMBOSS is "The European Molecular Biology Open Software Suite", and it can be run graphically on BioLinux. (More information can be found in the documentation pages on BioLinux, or in the official EMBOSS overview at http://emboss.sourceforge.net/what/#Overview)

1) Start Jemboss: Click on the "Applications" menu on the top taskbar. Open Applications -> Bioinformatics -> Jemboss. Click on each of the categories (e.g., Alignment, Display, etc.) to see what programs are listed.

2) Then, click on "Feature Tables" and choose "coderet". At the bottom right-hand side of the window is an "i" button. Click on it to open the documentation window, and read what coderet does.

3) At the top of the Jemboss window, fill in a "Sequence Filename" (e.g., embl:BX255937).

4) Fill in an output filename in the "output file name" box (e.g., jemboss_bx.coderet). Remember that it is important to give your files descriptive and distinctive names: a new file will overwrite any earlier file with the same name.

5) Hit the "GO" button at the bottom of the window.

6) When the program has finished, a new window called "Saved Results" should appear. (Don't be fooled: your results haven't been saved yet!) There should be a number of tabs in that window. One will have the name you entered in the output file name box (e.g., jemboss_bx.coderet). The others will likely be called things like bx255937.mrna, bx255937.noncoding, etc.

7) Take a look at the type of information in each tab. Notice that:
   - Each of the tabs that contains sequence information contains multiple sequences.
   - The command line you would use to run this program exactly as you just ran it via Jemboss is provided under the "cmd" tab. This will be useful later.

8) Save the data to a local file. Click on the tab with the name ending in .mrna. Under the "File" menu, choose "Save to Local File..." and save this to a location you can find again (e.g., under your bioinf_files directory). Give it a name that will distinguish it from later work (e.g., jemboss_bx.mrna). Do not close the "Saved Results" window, as we want to refer to the information under the "cmd" tab later.

9) Go back to the main Jemboss window, and choose NUCLEIC -> REPEATS -> palindrome from the list of programs.
10) Next to the box under "Sequence Filename" (near the top of the page), there is a "Browse files..." button. Use that to find the file you just saved. Note that you'll have to change the "Files of Type:" option to "All Files" to find your saved file, because it has a .mrna suffix.

11) Check that you're happy with all the required options, and give a filename in the outfile name box (e.g., jemboss_palin.txt). Then press the GO button.

12) Scan through the results to see what has been returned to you.







EMBOSS Command Line Tools

Exercise 2: Use a sequence analysis tool that is available in BioLinux.

The European Molecular Biology Open Software Suite (EMBOSS) is a collection of many computational tools used in bioinformatics. In this exercise we will use some of these tools to retrieve and analyze DNA sequence data.

To use the EMBOSS command line programs, start BioLinux and open a terminal, using one of the methods described in Part One. Then:




Step 1: Get a Strawberry DNA Sequence using "seqret" (sequence retrieval). Type in the commands and responses shown below. An empty response means pressing Enter to accept the default shown in square brackets.

    $ seqret
    Reads and writes (returns) sequences.
    Input (gapped) sequence(s): embl:af193789
    output sequence(s) [af193789.fasta]:

    $ view af193789.fasta


Step 2: To get the complete database entry for the sequence, including its annotation, use "entret" (entry retrieval). Type in the commands and responses shown below:

    $ entret
    Retrieves sequence entries from flatfile databases and files.
    Input sequence(s): embl:af193789
    Output file: Full text of a sequence database entry [af193789.entret]:

    $ view af193789.entret
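Steps 1 and 2 answer the program prompts interactively. As a minimal sketch, the same two retrievals could be scripted by passing the answers as qualifiers. This assumes the standard EMBOSS qualifiers -sequence, -outseq, -outfile and -auto behave on your installation as they do in the EMBOSS documentation; the helper name emboss_command and the RUN_COMMANDS switch are hypothetical, introduced only for this illustration.

```python
import shutil
import subprocess

# Hypothetical switch: leave False to only print the commands.
RUN_COMMANDS = False


def emboss_command(program, sequence, out_flag, out_name):
    """Build the argument list for a non-interactive EMBOSS run."""
    # -auto asks EMBOSS not to prompt and to use defaults for anything unset.
    return [program, "-sequence", sequence, out_flag, out_name, "-auto"]


# seqret writes sequence output (-outseq); entret writes a text file (-outfile).
seqret_cmd = emboss_command("seqret", "embl:af193789", "-outseq", "af193789.fasta")
entret_cmd = emboss_command("entret", "embl:af193789", "-outfile", "af193789.entret")

for cmd in (seqret_cmd, entret_cmd):
    print(" ".join(cmd))
    # Only execute when requested and when the program is actually installed.
    if RUN_COMMANDS and shutil.which(cmd[0]):
        subprocess.run(cmd, check=True)
```

With RUN_COMMANDS left at False the script only prints the two command lines, which can then be compared with what Jemboss shows under its "cmd" tab for a similar run.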


Step 3: Find the coding region (the CDS line in the feature table) in the entret file and write it down.






Step 4: Translate Nucleic Acid Sequence to Amino Acid Sequence using "transeq". Type in the commands and responses shown below:

    $ transeq
    Translate nucleic acid sequences
    Input nucleotide sequence(s): embl:af193789
    protein output sequence(s) [af193789.pep]: protein1

    $ view protein1

    >AF193789_1 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
    TYFAKMEKIEVSINSKHTIKPSTSSTPLQPYKLTLLDQLTPPAYVPIVFFYPITDHDFNL
    PQTLADLRQALSETLTLYYPLSGRVKNNLYIDDFEEGVPYLEARVNCDMTDFLRLRKIEC
    LNEFVPIKPFSMEAISDERYPLLGVQVNVFDSGIAIGVSVSHKLIDGGTADCFLKSWGAV
    FRGCRENIIHPSLSEAALLFPPRDDLPEKYVDQMEALWFAGKKVATRRFVFGVKAISSIQ
    DEAKSESVPKPSRVHAVTGFLWKHLIAASRALTSGTTSTRLSIAAQAVNLRTRMNMETVL
    DNATGNLFWWAQAILELSHTTPEISDLKLCDLVNLLNGSVKQCNGDYFETFKGKEGYGRM
    CEYLDFQRTMSSMEPAPDIYLFSSWTNFFNPLDFGWGRTSWIGVAGKIESASCKFIILVP
    TQCGSGIEAWVNLEEEKMAMLEQDPHFLALASPKTLI*RY*LRKIMWLVQCFDFAVNKV*
    ISSPANQ*NASMIDFVYVCYPNVFPYACNQYSSLL*QMLY*ASSYKVIYLLKIKLWKFYQ
    KKKKKK
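The frame-1 translation above starts a few residues before the first methionine (M). As a rough cross-check on the coding region noted in Step 3, the sketch below converts the position of that first M back to a nucleotide position. It uses only the first residues of the protein1 output; the function name first_met_nucleotide is hypothetical, and the first M in a translation is not guaranteed to be the true start codon in general, so the annotated CDS in the entret file remains the reference.

```python
# First residues of the frame-1 translation shown in the protein1 output.
frame1_prefix = "TYFAKMEKIEVSINSKH"


def first_met_nucleotide(protein, frame=1):
    """Return the 1-based nucleotide position where the first 'M' codon starts.

    In frame f, residue number i (counting from 0) is encoded by the
    nucleotides starting at position 3*i + f. Returns None if there is no 'M'.
    """
    index = protein.find("M")
    if index == -1:
        return None
    return index * 3 + frame


print(first_met_nucleotide(frame1_prefix))  # prints 16
```

Five residues (TYFAK) precede the M, so its codon starts at nucleotide 5 * 3 + 1 = 16, which agrees with the region start used with -regions in Step 5.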


Step 5: Choose a region to translate. Type in the commands and responses shown below:

    $ transeq -regions 16:1374
    Translate nucleic acid sequences
    Input nucleotide sequence(s): embl:af193789
    protein output sequence(s) [af193789.pep]: protein2

    $ view protein2

    >AF193789_1 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
    MEKIEVSINSKHTIKPSTSSTPLQPYKLTLLDQLTPPAYVPIVFFYPITDHDFNLPQTLA
    DLRQALSETLTLYYPLSGRVKNNLYIDDFEEGVPYLEARVNCDMTDFLRLRKIECLNEFV
    PIKPFSMEAISDERYPLLGVQVNVFDSGIAIGVSVSHKLIDGGTADCFLKSWGAVFRGCR
    ENIIHPSLSEAALLFPPRDDLPEKYVDQMEALWFAGKKVATRRFVFGVKAISSIQDEAKS
    ESVPKPSRVHAVTGFLWKHLIAASRALTSGTTSTRLSIAAQAVNLRTRMNMETVLDNATG
    NLFWWAQAILELSHTTPEISDLKLCDLVNLLNGSVKQCNGDYFETFKGKEGYGRMCEYLD
    FQRTMSSMEPAPDIYLFSSWTNFFNPLDFGWGRTSWIGVAGKIESASCKFIILVPTQCGS
    GIEAWVNLEEEKMAMLEQDPHFLALASPKTLI*
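The region 16:1374 is given as 1-based, inclusive positions. A small sketch of the arithmetic behind it, assuming that interpretation; the helper names parse_region and region_to_slice are hypothetical, and the toy string is made up for illustration.

```python
def parse_region(region):
    """Split a 'start:end' region string into two integers."""
    start, end = (int(part) for part in region.split(":"))
    return start, end


def region_to_slice(start, end):
    """Convert a 1-based inclusive region to a 0-based Python slice."""
    return slice(start - 1, end)


start, end = parse_region("16:1374")
length = end - start + 1              # inclusive range, so add 1
codons, leftover = divmod(length, 3)  # whole codons and any leftover bases
print(length, codons, leftover)       # prints 1359 453 0

# Toy example (made-up sequence): positions 2..4 of ACGTACGTAC are CGT.
print("ACGTACGTAC"[region_to_slice(2, 4)])  # prints CGT
```

The region is 1359 nucleotides long, a whole number of codons (453). That appears consistent with the protein2 output above, which has 452 residues followed by the terminating *.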


Step 6: Choose a different frame to translate. Type in the commands and responses shown below:

    $ transeq -regions 16:1374 -frame=2
    Translate nucleic acid sequences
    Input nucleotide sequence(s): embl:af193789
    protein output sequence(s) [af193789.pep]: protein3

    $ view protein3

    >AF193789_2 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
    WRKLRSV*IPNTPSNHQLPLHHFSLTSLPSWTSSLLRRMSPSCSSTPLLTMTSIFLKP*L
    T*DKPFRRLSLCTIHSLEGSKTTYTSMILKKVSHTLRLE*IVT*LIF*GFGKSSALMSLF
    Q*NHLVWKQYLMSVTPCLEFKSTFSILE*QSVSPSLTSSSMEERQTVFSSPGVLFFEGVV
    KISYILVSLKQHCFSHREMTCLKSMSIRWKRYGLPEKKLLQGDLYLV*KPYLQFKMKRRA
    SPCPSHHEFMPSLVFSGNI*SLLLGH*HQVLLQQDFL*RPRQ*T*EHG*TWRQCWIMPLE
    TCSGGHRPY*S*VIQHQRSVILSCVTWLTCSMDLSNNVTVITLRLSRVKRDMEECASI*I
    FRGL*VLWNQHRIFIYSRAGLIFSTHLILDGGGHHGLELQEKLNLQVASS*Y*FQHNAVL
    ELKRG*I*KKRKWLC*NKIPIF*R*HLQRP*FX
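The frame-2 output above is full of stop codons (*) because shifting the reading frame by one base regroups every codon. The following is a minimal sketch of that effect using the standard genetic code on a short made-up sequence (not the real AF193789 sequence). It is not how transeq is implemented; in particular, this sketch silently drops an incomplete trailing codon, whereas the transeq output above ends in X.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate DNA in forward frame 1, 2 or 3; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    # Start at the frame offset and step one complete codon at a time.
    for i in range(frame - 1, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Made-up toy sequence: ATG GAA AAG ATT TAA in frame 1.
toy = "ATGGAAAAGATTTAA"
for frame in (1, 2, 3):
    print(frame, translate(toy, frame))
# prints:
# 1 MEKI*
# 2 WKRF
# 3 GKDL
```

Only frame 1 of the toy sequence reads as a start codon followed by residues and a stop; the other two frames give unrelated peptides, which is the same pattern seen when comparing the protein2 and protein3 files.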