Bioinformatics Exercise 1 - Cell Biology Promotion


Oct 1, 2013


Introductory molecular biology computing and bioinformatics
for Molecular Mechanisms of Development.

Barcelona 2006

Exercise Set 1


Find the protein amino acid sequence of the mouse haematopoietic cell
protein kinase called “tec” described by Mano et al. in 1993. Create a text
file (.txt) containing this sequence in FASTA format.
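
For reference, a FASTA record is a header line beginning with ">" followed by the sequence on one or more lines. The short Python sketch below shows one way to build such a record from pasted text. The function name to_fasta, the 60-residue line width and the example name are assumptions made for illustration, and the residues shown are placeholders, not the real tec sequence.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' header line, then the sequence wrapped to `width`."""
    # Keep letters only: this drops the spaces, digits and line breaks
    # that often come along when copying from a database page.
    residues = "".join(ch for ch in sequence if ch.isalpha()).upper()
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder residues for illustration only.
    print(to_fasta("tec_mouse", "1 mkvlaacdef 11 ghiklmnpqr", width=10), end="")
```

The returned text can be pasted into a plain text editor and saved with a .txt extension.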


Find the sequence of the protein with accession number AAA37592.
Create a text file (.txt) containing this sequence in FASTA format.


Find the complementary DNA (cDNA) sequence with accession number
BC018394. Translate this nucleotide sequence into the corresponding
protein amino acid sequence and save this file. Edit the protein sequence
into FASTA format and save it as a text file (.txt).
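
To make the translation step concrete, here is a minimal Python sketch using the standard genetic code. It assumes, purely for illustration, that the coding region starts at the first ATG in the sequence and runs to the first in-frame stop codon. For a real cDNA that assumption can be wrong, so a translation tool that shows all reading frames is the safer choice. The function name translate is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Assumption: the first ATG is the true start codon.
    Codons containing ambiguous bases are reported as 'X'.
    """
    seq = "".join(ch for ch in cdna.upper() if ch.isalpha()).replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy sequence, not BC018394.
    print(translate("ccATGGCCTAAgg"))
```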


Perform a BLAST search with the tec kinase sequence you have saved.
Make notes on any conserved domains that are expected to be present in
the protein. Format your output before proceeding.


Examine the “E values” or “Expect scores”. What have these scores been
used to do to your list of BLAST hits? Can you follow a link from the
BLAST page to notes on what this score means?
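
As background for reading the notes, the BLAST statistics documentation relates the Expect value to the bit score S' through E = m x n x 2^(-S'), where m and n are the lengths of the query and the database. The Python sketch below simply evaluates that relation. The function name expect_value is hypothetical, and because BLAST uses corrected "effective" lengths, the numbers will only approximate those on the results page.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S').

    Assumption: raw lengths are used here, whereas BLAST applies
    edge-effect corrections to both lengths.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Illustrative numbers only: a higher bit score gives a smaller E value.
    for score in (30.0, 50.0, 100.0):
        print(score, expect_value(score, 600, 10_000_000))
```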


From the BLAST search output find the protein sequence for Bruton
agammaglobulinemia / Bruton’s tyrosine kinase….. submitted by Tsukada
et al. in 1993. Create a text file (.txt) containing this sequence in FASTA
format.

From the BLAST search find the protein sequence for mouse BMX
non-receptor tyrosine kinase submitted by Ekman et al. in 1997. Create a text
file (.txt) containing this sequence in FASTA format.


Create a single text file (.txt) containing all of your saved protein
sequences. Edit the top identifier line (everything to the RHS of the >
symbol) so that each sequence is identified by a single short name, each
one beginning with a different letter or number. Eliminate any blank
lines to remove spaces between the blocks of text that describe each sequence.
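
The editing described above can also be sketched in Python. The example assumes each saved file holds exactly one sequence and that you supply the short names yourself. The function name merge_fasta and the names and residues in the example are hypothetical placeholders.

```python
from typing import Dict


def merge_fasta(records: Dict[str, str]) -> str:
    """Combine single-sequence FASTA texts into one block with short headers.

    `records` maps a short name to the full text of one saved FASTA file.
    Blank lines are dropped and each header is replaced by '>' + short name.
    """
    first_chars = [name[0] for name in records]
    if len(set(first_chars)) != len(first_chars):
        raise ValueError("each short name must begin with a different character")

    merged = []
    for short_name, text in records.items():
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # eliminate blank lines
            merged.append(">" + short_name if line.startswith(">") else line)
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    # Placeholder headers and residues for illustration only.
    print(merge_fasta({
        "Atec": ">long database header\nMKV\n\nLLA\n",
        "Bbtk": ">another long header\n\nMGG\n",
    }), end="")
```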


Use this file containing all of your sequences to obtain a multiple sequence
alignment (MSA) and save the alignment. You should use either ClustalW
or Multalin.

Please note the following points if you use ClustalW/Boxshade:


set the ClustalW output format to


follow the link to obtain the output file in .aln format (alignment file) and
copy the resulting page


paste the .aln output into the Boxshade window and set output to
either rtf new or rtf old and input to ALN


Use PFAM and InterPro (and any other programmes from the “Analysing
sequences” section of the tool box) to analyse your tec kinase protein
sequence. Determine which, if any, domains are common to all of the
sequences present in your MSA. Annotate your alignment to highlight this
region of similarity.
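
Once you have noted the domains reported for each sequence, finding those shared by all of them is a set intersection. The Python sketch below shows the idea. The function name common_domains is hypothetical, and the sequence and domain labels are placeholders, not real PFAM or InterPro results.

```python
from typing import Dict, Set


def common_domains(domains_by_sequence: Dict[str, Set[str]]) -> Set[str]:
    """Return the domain names present in every sequence."""
    if not domains_by_sequence:
        return set()
    return set.intersection(*(set(d) for d in domains_by_sequence.values()))


if __name__ == "__main__":
    # Placeholder labels: replace with the domains you actually recorded.
    noted = {
        "seq1": {"domain_A", "domain_B", "domain_C"},
        "seq2": {"domain_A", "domain_C"},
        "seq3": {"domain_C", "domain_D"},
    }
    print(sorted(common_domains(noted)))
```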

Geraint Thomas

Department of Physiology,