MATLAB Bioinformatics Tools

2 Oct 2013


Rob Henson

The MathWorks, Inc.

Who Am I?


Development manager for Bioinformatics group at The MathWorks


Natick, MA


Software developer


Background in algorithm design and software engineering

What do I do?


Write software for bioinformatics


Sequence analysis


Microarray data analysis


Some consulting


Bioinformatics algorithm design


Machine learning tools


E.g. neural networks, HMMs, etc.

My solution to dotplot

>> map = eye(128);

>> spy(map(seq1,seq2))

Why does this work?

How could we make this better?
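One way to see why the one-liner works: when characters are used as indices, MATLAB uses their character codes, so `map(seq1,seq2)` picks rows and columns out of the identity matrix, and entry (i,j) is 1 exactly when `seq1(i)` and `seq2(j)` are the same character. The short check below compares that result with an explicit character-by-character comparison. The two sequences are made up for illustration and are not from the slides.

```matlab
% Illustrative sequences (assumed, not from the slides).
seq1 = 'GATTACA';
seq2 = 'GATCA';

map  = eye(128);          % identity matrix, one row/column per 7-bit character code
dots = map(seq1, seq2);   % characters act as indices through their codes

% Build the same matrix by comparing every pair of characters explicitly.
expected = zeros(numel(seq1), numel(seq2));
for i = 1:numel(seq1)
    for j = 1:numel(seq2)
        expected(i, j) = (seq1(i) == seq2(j));
    end
end

same = isequal(dots, expected);   % true when both constructions agree
```

Calling `spy(dots)` then draws one dot per matching pair of positions.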

Enhancements to dotplot


Does map need to be 128?


What is the right value?


Can we use less memory?


How do we deal with bad inputs?


Can we extend this to look for longer patterns?
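A possible answer to the first four questions, as a sketch rather than the presenter's solution: `map` only has to be as large as the largest character code that can occur, so after converting to upper case and offsetting by `'A'`, 26 is enough. A sparse identity keeps the result sparse, which saves memory when matches are rare. The input sequences, the error identifier, and the choice to accept any letter are assumptions made for illustration.

```matlab
% Illustrative inputs (assumed, not from the slides); mixed case on purpose.
seq1 = 'ACGTTGCA';
seq2 = 'acgtnGCA';

% Bad inputs: reject anything that is not a character array.
if ~ischar(seq1) || ~ischar(seq2)
    error('dotplot:badInput', 'Both sequences must be character arrays.');
end

% Force row vectors and a single case so that 'a' matches 'A'.
seq1 = upper(seq1(:).');
seq2 = upper(seq2(:).');

% Bad inputs: only letters can be mapped onto the 26-letter index range.
if ~all(isletter([seq1 seq2]))
    error('dotplot:badInput', 'Sequences may contain letters only.');
end

% 26-by-26 sparse identity: index 1 is 'A', index 26 is 'Z'.
map  = speye(26);
dots = map(seq1 - 'A' + 1, seq2 - 'A' + 1);   % sparse match matrix
```

`spy(dots)` works on the sparse result exactly as it does on the full one.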

Some useful tools


edit


dbstop


profiler


Getting help


Documentation


Technical Support Knowledge Base


Newsgroup

A full implementation of dotplot

function matches = dotplot(seq1,seq2,window,stringency)
% DOTPLOT Visualize sequence matches.
% DOTPLOT(S,T) plots the sequence matches of sequences S and T.
%
% DOTPLOT(S,T,WINDOW,NUM) plots sequence matches when there
% are at least NUM matches in a window of size WINDOW. For nucleotide
% sequences a WINDOW of 11 and NUM of 7 is recommended in the
% literature.
%
% MATCHES = DOTPLOT(...) returns the number of dots in the dotplot
% matrix.
%
% Example:
% moufflon = getgenbank('AB060288','sequence',true);
% takin = getgenbank('AB060290','sequence',true);
% dotplot(moufflon,takin,11,7)
%
% This shows the similarities between prion protein (PrP) nucleotide
% sequences of two ruminants, the moufflon and the golden takin.
%
% See also NWALIGN, SWALIGN.
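The slides give only the help text for this function. The sketch below shows one way the WINDOW and NUM behaviour described in that help could be computed; it is an assumption about a reasonable approach, not the actual body of the function. The variable names `dots`, `counts`, and `hits` and the two short sequences are hypothetical.

```matlab
% Illustrative inputs (assumed); the help text suggests 11 and 7 for real data.
seq1 = 'TTACGTT';
seq2 = 'GGACGGG';
window = 3;
stringency = 3;

% Match matrix: 1 where seq1(i) and seq2(j) are the same character.
dots = double(bsxfun(@eq, seq1(:), seq2(:).'));

% Convolving with an identity kernel adds up WINDOW consecutive entries
% along each diagonal, so counts(i,j) is the number of matches between
% seq1(i:i+window-1) and seq2(j:j+window-1).
counts = conv2(dots, eye(window), 'valid');

% Keep only the windows with at least STRINGENCY matches.
hits    = counts >= stringency;
matches = nnz(hits);   % number of dots that would be drawn
```

With these inputs the only window that survives is the shared `ACG`, so `spy(hits)` would show a single dot. Isolated single-letter matches, which clutter the unwindowed plot, are filtered out.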


Sequence properties


Amino acid composition


histc function


Molecular weight


Indexing and sum function


Hydrophobicity
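As a small illustration of the first item, amino acid composition can be obtained by counting character codes with `histc`. This is a sketch under the assumption that the sequence contains only upper-case letters. `histc` is the function named on the slide; newer MATLAB releases recommend `histcounts` instead.

```matlab
% Sequence used later in the slides.
seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';

% One bin per letter A..Z. Because the edges are consecutive integers,
% each bin counts exactly one character code.
edges  = double('A'):double('Z');
counts = histc(double(seq), edges);

% Composition as a percentage of the sequence length.
percent = 100 * counts / numel(seq);

% Look up a single residue, e.g. proline.
nProline = counts('P' - 'A' + 1);
```

`bar(percent)` gives a quick picture of the composition.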

Molecular weights

A: 89.000

R: 174.000

N: 132.000

D: 133.000

C: 121.000

Q: 146.000

E: 147.000

G: 75.000

H: 155.000

I: 131.000

L: 131.000

K: 146.000

M: 149.000

F: 165.000

P: 115.000

S: 105.000

T: 119.000

W: 204.000

Y: 181.000

V: 117.000

http://cn.expasy.org/tools/pscale/Molecularweight.html

mw = [ 89.0900;        0; 121.1500; 133.1000; 147.1300; ...  % A B C D E
      165.1900;  75.0700; 155.1600; 131.1700;        0; ...  % F G H I J
      146.1900; 131.1700; 149.2100; 132.1200;        0; ...  % K L M N O
      115.1300; 146.1500; 174.2000; 105.0900; 119.1200; ...  % P Q R S T
             0; 117.1500; 204.2300;        0; 181.1900];     % U V W X Y

seq = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';

seqmw = mw(seq - 'A' + 1);

plot(seqmw)
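The "indexing and sum" idea from the sequence properties slide can be taken one step further to get a single weight for the whole sequence. The sketch below rebuilds the lookup table from the values listed above so that it runs on its own. The water correction is an addition not shown on the slides: the listed values are for free amino acids, and each peptide bond releases one water molecule (about 18.015).

```matlab
% Free amino acid weights, in the order used on the "Molecular weights" slide.
letters = 'ARNDCQEGHILKMFPSTWYV';
values  = [ 89.09 174.20 132.12 133.10 121.15 146.15 147.13  75.07 155.16 131.17 ...
           131.17 146.19 149.21 165.19 115.13 105.09 119.12 204.23 181.19 117.15];

% Lookup table indexed by letter position (A = 1, ..., Z = 26).
mw = zeros(1, 26);
mw(letters - 'A' + 1) = values;

seq   = 'MATLAPEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSP';
seqmw = mw(seq - 'A' + 1);      % weight of each residue, by indexing

% Sum of the free amino acid weights.
sumFree = sum(seqmw);

% Assumed correction: subtract one water per peptide bond.
water     = 18.015;
peptideMW = sumFree - (numel(seq) - 1) * water;
```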

proteinplot

Assignments

1. Create a hydrophobicity plot

You can get the amino acid values from http://cn.expasy.org/cgi-bin/protscale.pl

Use Kyte & Doolittle’s values.

Create a function that has two inputs, the sequence and the window size. The function will create a hydrophobicity plot. The help for the function is on the next slide…

function hydrophobic(sequence, window_length)
% HYDROPHOBIC plots the hydrophobicity of an amino acid sequence
% HYDROPHOBIC(SEQUENCE,WINDOW_LENGTH) creates a hydrophobicity plot of
% SEQUENCE using a smoothing window of length WINDOW_LENGTH.
%
% SEQUENCE must be a valid amino acid sequence. If SEQUENCE contains any
% symbols other than the standard 20 amino acid letters, the function
% will give an error message. SEQUENCE can be either upper or lower case.
%
% WINDOW_LENGTH must be an odd positive integer.
%
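The core of the computation can be sketched in script form as a lookup followed by a moving average. This is one possible approach, not a reference solution; wrapping it in the function signature above is left as in the assignment. The hydropathy values are Kyte and Doolittle's. The variable names, the error identifiers, and the example inputs are assumptions.

```matlab
% Example inputs (assumed); the sequence is the one used earlier in the slides.
sequence      = 'matlapeepqsdpsvepplsqetfsdlwkllpennvlsp';
window_length = 7;

% Kyte & Doolittle hydropathy values for the 20 standard amino acids.
letters = 'ARNDCQEGHILKMFPSTWYV';
kd = [ 1.8 -4.5 -3.5 -3.5  2.5 -3.5 -3.5 -0.4 -3.2  4.5 ...
       3.8 -3.9  1.9  2.8 -1.6 -0.8 -0.7 -0.9 -1.3  4.2];

% Lookup table indexed by letter position; NaN marks non-standard letters.
scale = nan(1, 26);
scale(letters - 'A' + 1) = kd;

% Validate the window length: odd positive integer.
if window_length < 1 || window_length ~= fix(window_length) || mod(window_length, 2) ~= 1
    error('hydrophobic:badWindow', 'WINDOW_LENGTH must be an odd positive integer.');
end

% Validate the sequence: character array made of the 20 standard letters.
if ~ischar(sequence)
    error('hydrophobic:badSequence', 'SEQUENCE must be a character array.');
end
sequence = upper(sequence(:).');
if any(sequence < 'A' | sequence > 'Z') || any(isnan(scale(sequence - 'A' + 1)))
    error('hydrophobic:badSequence', 'SEQUENCE contains non-standard symbols.');
end

% Per-residue values, then a moving average over WINDOW_LENGTH residues.
raw      = scale(sequence - 'A' + 1);
smoothed = conv(raw, ones(1, window_length) / window_length, 'valid');

% Sequence positions at the centre of each window, for the x axis.
half    = (window_length - 1) / 2;
centres = (1 + half):(numel(sequence) - half);
```

`plot(centres, smoothed)` then draws the profile. Using the `'valid'` shape drops the ends of the sequence where a full window does not fit, which is one of several reasonable ways to treat the edges.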

Assignments

2. Modify the function to return the maximum and minimum hydrophobicity values in the plot.


Make appropriate changes to the help for the function.

Advanced example


Alignment significance


Alignment algorithms such as Smith-Waterman and Needleman-Wunsch always find some alignment. How do we know if what they find is significant or simply random?
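One common way to approach this question is a shuffle test: score the real pair, then score many pairs in which one sequence has been randomly permuted, and see how often chance does as well. The sketch below is an illustration of that idea, not necessarily the method the talk goes on to present. To stay self-contained it uses a simple stand-in score (the largest number of identities on any diagonal of the match matrix). With the Bioinformatics Toolbox, the score returned by `swalign` or `nwalign` could be substituted for `scoreFcn`. The sequences, `nShuffles`, and all variable names are assumptions.

```matlab
% Illustrative sequences (assumed, not from the slides).
seq1 = 'ACGTACGTGACCTGA';
seq2 = 'TTACGTACGAGGCTA';
nShuffles = 200;

% Stand-in score: best count of identical characters along one diagonal.
diagmax  = @(m) max(arrayfun(@(d) sum(diag(m, d)), ...
                             -(size(m, 1) - 1):(size(m, 2) - 1)));
scoreFcn = @(a, b) diagmax(double(bsxfun(@eq, a(:), b(:).')));

% Score of the real pair.
observed = scoreFcn(seq1, seq2);

% Null distribution: same composition, random order.
nullScores = zeros(nShuffles, 1);
for k = 1:nShuffles
    shuffled      = seq2(randperm(numel(seq2)));
    nullScores(k) = scoreFcn(seq1, shuffled);
end

% Empirical p-value: fraction of shuffles scoring at least as well as the
% real pair. The +1 terms keep the estimate from being exactly zero.
pValue = (1 + sum(nullScores >= observed)) / (1 + nShuffles);
```

A small `pValue` suggests that the observed score is unlikely to arise from composition alone. A histogram of `nullScores` with `observed` marked on it makes the comparison visible.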