CISC 4020 Bioinformatics


Oct 1, 2013










Lab Exercise #4: Multiple Sequence Alignment


(1)

Practice using three NCBI resources to obtain groups of sequences in FASTA format that you can use for multiple sequence alignment. Select a keyword such as cytochrome (or ferritin, S100, or trypsin).

Approach 1: Enter the search term on the NCBI homepage and follow the link to HomoloGene. By default, the entries are displayed in summary format. Using the pull-down menu, change the display to Multiple Alignment. This allows you to scroll through a series of multiple sequence alignments. Select one for further study. It is helpful to choose one that contains some gaps, so that you can evaluate the performance of various software programs (in exercise #2). Once you identify a group of proteins, click to view that HomoloGene group and change the display to FASTA. Copy and save these sequences in a text file.

Approach 2: Repeat this exercise beginning at the NCBI homepage, and this time follow the link to CDD (Conserved Domain Database). Entries carry Pfam, CDD, SMART, and/or COG identifiers. Select an entry with a CDD identifier (such as cd00904 for ferritin); its page shows a multiple sequence alignment. Change the format to obtain the desired number of proteins in this family (e.g., up to 5, 10, or 20) in FASTA format; you may select the most diverse members of this group.
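Approach 2 suggests keeping only the most diverse members of a family. If you have the family as an alignment (equal-length rows with "-" for gaps), one way to make that choice reproducibly is a greedy max-min selection on pairwise identity, sketched below. This is an illustrative technique, not something CDD itself provides; the function names and the decision to seed with the first sequence are assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def most_diverse(aligned: dict[str, str], k: int) -> list[str]:
    """Greedy max-min selection (hypothetical helper): repeatedly add the
    sequence whose highest identity to any already-chosen sequence is lowest."""
    names = list(aligned)
    chosen = [names[0]]  # seed with the first sequence (an arbitrary choice)
    while len(chosen) < min(k, len(names)):
        best = min(
            (n for n in names if n not in chosen),
            key=lambda n: max(identity(aligned[n], aligned[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen
```

For example, most_diverse(aln, 10) on a CDD family alignment would return ten names, each added because it is the least similar to those already chosen. Identity here ignores gapped columns, which is a common convention but not the only one.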

Approach 3: Perform a blastp search using a query such as ferritin light chain (NP_000137) and inspect the pairwise alignments to the query. Select a group of ten proteins by clicking the box next to each, then click “Get selected sequences.” These ten proteins appear on an Entrez Protein page; change the display option to FASTA and use the pull-down menu option “send to text.”
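The same sequences can also be retrieved without the web interface through NCBI's E-utilities. The sketch below only builds an efetch request for protein records in FASTA format and downloads it with the Python standard library; the helper names are hypothetical, network access is assumed, and NCBI's usage guidelines (request rate, optional tool and email parameters) still apply.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions: list[str]) -> str:
    """Build an efetch URL that returns protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),  # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions: list[str]) -> str:
    """Download the FASTA text for the given accessions (needs network access)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fasta(["NP_000137"]) should return the ferritin light chain record as FASTA text that can be saved directly to a file for exercise #2.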


(2)

Using the FASTA-formatted sequences obtained in exercise #1, perform multiple sequence alignments with the programs available at the European Bioinformatics Institute (http://www.ebi.ac.uk/Tools/sequence.html): ClustalW, MAFFT, MUSCLE, and T-Coffee. Save and compare each result. How do they differ? How can you assess which is likely to be the most accurate? Where applicable, try adjusting parameters such as the scoring matrix, gap opening and extension penalties, or number of iterations to see their effects on the alignments.
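One way to quantify how two programs' outputs differ is to treat each alignment as the set of residue pairs it places in the same column and measure how many pairs of one alignment the other reproduces. This is the idea behind the sum-of-pairs (SP) score used by alignment benchmarks such as BAliBASE; the sketch below is a minimal version assuming each alignment is a dict of equal-length gapped strings with the same sequence names. Agreement between programs indicates consistency, not correctness: accuracy can only be measured against a trusted reference, such as a structure-based alignment.

```python
from itertools import combinations


def aligned_pairs(aln: dict[str, str]) -> set[tuple[str, int, str, int]]:
    """Every pair of residues placed in the same column, identified by
    (sequence name, residue index) so gap placement does not change identity."""
    counters = {name: 0 for name in aln}
    pairs = set()
    length = len(next(iter(aln.values())))
    for col in range(length):
        residues = []
        for name, row in aln.items():
            if row[col] != "-":
                residues.append((name, counters[name]))
                counters[name] += 1
        for (n1, i1), (n2, i2) in combinations(sorted(residues), 2):
            pairs.add((n1, i1, n2, i2))
    return pairs


def agreement(test: dict[str, str], ref: dict[str, str]) -> float:
    """Fraction of residue pairs aligned in `ref` that `test` also aligns."""
    ref_pairs = aligned_pairs(ref)
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)
```

For instance, a hypothetical agreement(mafft_aln, clustalw_aln) near 1.0 would mean MAFFT reproduces almost every pair ClustalW aligned; the measure is asymmetric, so it is worth comparing in both directions.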

(3)

Use the T-Coffee programs to evaluate the effect of structural information on your alignments. Follow these steps:



Obtain a group of five distantly related lipocalins. These include rat odorant-binding protein and human retinol-binding protein.

Eight distantly related lipocalin protein sequences:

>human_RBP4 gi|55743122|ref|NP_006735.2| retinol-binding protein 4, plasma precursor [Homo sapiens]
MKWVWALLLLAALGSGRAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQ
MSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCRL
LNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYCDGRSERNLL

>rat_OBP gi|20302101|ref|NP_620258.1| odorant binding protein I f [Rattus norvegicus]
MVKFLLIVLALGVSCAHHENLDISPSEVNGDWRTLYIVADNVEKVAEGGSLRAYFQHMECGDECQELKII
FNVKLDSECQTHTVVGQKHEDGRYTTDYSGRNYFHVLKKTDDIIFFHNVNVDESGRRQCDLVAGKREDLN
KAQKQELRKLAEEYNIPNENTQHLVPTDTCNQ

>1QWD NP_006735 retinol-binding protein 4 [Homo sapiens]
MKWVWALLLLAALGSGRAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQ
MSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCRL
LNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYCDGRSERNLL

>1QWD|A Bacterial Lipocalin Blc E. Coli
MSYYHHHHHHLESTSLYKKSSSTPPRGVTVVNNFDAKRYLGTWYEIARFDHRFERGLEKVTATYSLRDDG
GLNVINKGYNPDRGMWQQSEGKAYFTGAPTRAALKVSFFGPFYGGYNVIALDREYRHALVCGPDRDYLWI
LSRTPTISDEVKQEMLAVATREGFDVSKFIWVQQPGS

>1Z24|A Chain A, The Molecular Structure Of Insecticyanin From The Tobacco Hornworm Manduca Sexta L. At 2.6 A Resolution.
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYM
EGDLEIAPDAKYTKQGKYVMTFKFGQRVVNLVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKV
LEGNTKEVVDNVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH

>2BLG Bovine Beta-Lactoglobulin
LIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENDECAQKK
IIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLFCMENSAEPEQSLVCQCLVRTPEVDDEALEKFDKAL
KALPMHIRLSFNPTQLEEQCHI

>1PBO|A Bovine Odorant Binding Protein (Obp)
AQEEEAEQNLSELSGPWRTVYIGSTNPEKIQENGPFRTYFRELVFDDEKGTVDFYFSVKRDGKWKNVHVK
ATKQDDGTYVADYEGQNVFKIVSLSRTHLVAHNINVDKHGQKTELTGLFVKLNVEDEDLEKFWKLTEDKG
IDKKNVVNFLENEDHPHPE

>1E5P|A Aphrodisin Female Hamster
QDFAELQGKWYTIVIAADNLEKIEEGGPLRFYFRHIDCYKNCSEXEITFYVITNNQCSKTTVIGYLKGNG
TYETQFEGNNIFQPLYITSDKIFFTNKNXDRAGQETNXIVVAGKGNALTPEENEILVQFAHEKKIPVENI
LNILATDTCPE
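Before aligning, it is worth checking the input for redundancy: in the list above, the human_RBP4 and 1QWD NP_006735 records carry the same sequence, so one copy adds no new information to the alignment. A minimal sketch of such a check follows, assuming the records are pasted into a plain FASTA string; the function names are illustrative.

```python
from collections import defaultdict


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, skipping blank lines."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


def identical_groups(records: dict[str, str]) -> list[list[str]]:
    """Group headers whose sequences are identical (case-insensitive)."""
    by_seq = defaultdict(list)
    for header, seq in records.items():
        by_seq[seq.upper()].append(header)
    return [headers for headers in by_seq.values() if len(headers) > 1]
```

Running identical_groups(read_fasta(text)) on the eight records above should report the two retinol-binding protein 4 entries as one group; removing one of them leaves seven distinct sequences to align.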




Align the sequences using T-Coffee (http://www.tcoffee.org).



Evaluate the alignment with the iRMSD program (http://www.tcoffee.org). Include the information on two known lipocalin structures, and note the scores.



Align the same sequences using Expresso (http://www.tcoffee.org) to incorporate structural information. Did the score improve? Do the alignments differ?
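The iRMSD evaluation compares structures through their internal distances rather than by superposition. To make that idea concrete, the sketch below computes a plain distance RMSD over the residues an alignment pairs up: for every two aligned pairs, it compares the C-alpha distance within structure A with the corresponding distance within structure B. This is a simplified stand-in, not a reimplementation of the iRMSD program, which has its own rules (for example, considering only nearby residue pairs); reading coordinates from PDB entries such as 1Z24 or 1QWD is assumed to happen elsewhere, and the function name is hypothetical.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b, pairs):
    """Superposition-free RMSD of intramolecular distances.

    coords_a, coords_b: lists of (x, y, z) C-alpha coordinates for structures A and B.
    pairs: aligned residue index pairs (i_a, i_b) taken from a sequence alignment.
    """
    squared = []
    for (i, ip), (j, jp) in combinations(pairs, 2):
        d_a = math.dist(coords_a[i], coords_a[j])    # distance inside structure A
        d_b = math.dist(coords_b[ip], coords_b[jp])  # matching distance inside B
        squared.append((d_a - d_b) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

A lower value means the alignment pairs residues whose structural environments agree more closely, which is the sense in which a structure-aware Expresso alignment might be expected to score better than a sequence-only one.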

(4)

X-linked adrenoleukodystrophy (X-ALD) is the most common inherited disease affecting peroxisomes (subcellular organelles involved in lipid metabolism and other metabolic functions). The disease is caused by mutations in the ABCD1 gene on chromosome Xq28, which encodes the ALD protein (ALDP). In humans, there are thought to be four ALDP-related peroxisomal proteins: ALDP (NP_000024; 745 amino acid residues), ALDR (NP_005155; 740 residues), PMP70 (NP_002849; 659 residues), and PMP70R (NP_005041; 606 residues). Two yeast ALDP-like proteins have also been identified: Pxa1p (NP_015178) and Pxa2p (NP_012733). These proteins all belong to the much larger family of ATP-binding cassette (ABC) transporters, which includes the cystic fibrosis transmembrane conductance regulator (CFTR) and the multidrug resistance (MDR) proteins.

Create a multiple sequence alignment of the human, mouse, and yeast ALDP family of proteins.
Identify the presumed nucleotide binding site, GPNGCGKS. Is this motif perfectly conserved?
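Checking the motif by eye across several proteins of 600 to 750 residues is error-prone. A minimal sketch, assuming the alignment is available as a dict of equal-length gapped strings and that the motif occurs exactly in one chosen reference row (for example human ALDP), locates the motif's columns in that reference and reports what every sequence has in those columns; the function names and the toy alignment used in the test are hypothetical.

```python
def motif_columns(aligned_ref: str, motif: str) -> list[int]:
    """Alignment columns occupied by `motif` in the gapped reference row."""
    cols = [c for c, ch in enumerate(aligned_ref) if ch != "-"]  # column of each residue
    start = aligned_ref.replace("-", "").find(motif)
    if start == -1:
        raise ValueError(f"{motif} not found in reference")
    return cols[start:start + len(motif)]


def motif_report(aligned: dict[str, str], ref_name: str,
                 motif: str) -> dict[str, tuple[str, int]]:
    """For each sequence: residues in the motif columns and how many differ."""
    cols = motif_columns(aligned[ref_name], motif)
    report = {}
    for name, row in aligned.items():
        segment = "".join(row[c] for c in cols)
        mismatches = sum(a != b for a, b in zip(segment, motif))  # gaps count as mismatches
        report[name] = (segment, mismatches)
    return report
```

With a real alignment, a mismatch count of 0 in every row would indicate that GPNGCGKS is perfectly conserved; any nonzero count points to the substituted or gapped positions to inspect.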