First, the arguments can be passed to our verification ... - BioWiki

helmetpastoral, Software & Software Development

13 Dec 2013


First, the arguments can be passed to our verification program, “verify.pl”, as follows:

- verify.pl <hammerhead ribozyme sequence> 26 22

  o where <hammerhead ribozyme sequence> is
    GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGGCGAAACGGUGAAAGCCGUAGGUUGCCC,

  o 26 is the start coordinate of the OBS (oligonucleotide-binding site),

  o and 22 is the length of the OBS.
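As a minimal sketch of how verify.pl might check these three arguments before running RNAfold, the hypothetical sub parse_args below (not part of the original program) checks that the sequence uses only A, C, G and U and that the OBS lies within the sequence. It assumes 1-based coordinates, as in RNAfold's output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original verify.pl): check the three
# command-line arguments (sequence, OBS start, OBS length) and return them.
# Coordinates are assumed to be 1-based, as in RNAfold's dot plot.
sub parse_args {
    my ($seq, $start, $len) = @_;
    die "usage: verify.pl <sequence> <OBS start> <OBS length>\n"
        unless defined $len;
    die "the sequence may contain only A, C, G and U\n"
        unless $seq =~ /^[ACGU]+$/;
    die "OBS start and length must be positive integers\n"
        unless $start =~ /^[1-9][0-9]*$/ && $len =~ /^[1-9][0-9]*$/;
    die "the OBS runs past the end of the sequence\n"
        if $start + $len - 1 > length($seq);
    return (seq => $seq, start => $start, len => $len);
}

# Example with the hammerhead ribozyme sequence used in the text
# (in verify.pl this would be parse_args(@ARGV)).
my %args = parse_args(
    'GGGCGACCCUGAUGAGCUUGAGUUUAGCUCGUCACUGUCCAGGUUCAAUCAGG'
  . 'CGAAACGGUGAAAGCCGUAGGUUGCCC',
    26, 22,
);

# substr is 0-based, so the OBS begins at offset start - 1.
print "OBS: ", substr($args{seq}, $args{start} - 1, $args{len}), "\n";
```

With the example arguments, the 22 nucleotides starting at position 26 are AGCUCGUCACUGUCCAGGUUCA.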

The program verify.pl could run as follows:

- Automate RNAfold through Perl and have it consider the first of two distinct cases:

  o The first distinct case is when DNA-1 is not present.

    This entails that we would not use RNAfold's constraint option, -C (i.e., the OBS can base-pair with whatever it wants).

  o We'd pipe the sequence into RNAfold -p and save its screen output to a file, for example:

      open(my $rnafold, '|-', 'RNAfold -p > Tempfile')
          or die "Cannot run RNAfold: $!";
      print $rnafold "$ARGV[0]\n";
      close($rnafold);

  o In actuality, running RNAfold with -p saves the data we need to dot.ps, whereas Tempfile contains the information that would have been printed to the screen if RNAfold had been run manually. We could take the ubox or lbox data from dot.ps and use it to verify the formation of the different stems. Each line that stores a probability has the form:

      <i> <j> <probability of base-pairing> <ubox or lbox>

    where <i> is the coordinate of the first base in a base pair (corresponding to the vertical axis of the base-pairing probability matrix),

    <j> is the coordinate of the second base in the base pair (corresponding to the horizontal axis of the base-pairing probability matrix),

    <probability of base-pairing> is a 0, followed by a decimal point, followed by digits, and is actually the square root of the probability of base-pairing,

    and <ubox or lbox> is either ubox or lbox.

    The RNAfold manpage tells us that a ubox datum for a given base pair is present if the pair's probability of formation is > 10E-6.

    An lbox datum corresponds to a base pair in the minimum free energy (MFE) structure, i.e., the most stable structure predicted by RNAfold.

    We can thus use a regular expression to search for either ubox or lbox data. However, since the Penchovsky-Breaker article shows base-pairing probability plots that correspond to our ubox data, we will search for ubox data so that we can compare our results with the article's plot. Here is the regular expression, applied to each line read from dot.ps:



      if ($line =~ m/^([0-9]+)\s([0-9]+)\s0\.[0-9]+\subox/)



    On a match, this stores the i and j coordinates in $1 and $2, respectively. We can then store these in two arrays, one holding the i base and one holding the j base of each base pair. After searching the whole dot.ps file for ubox data, we are left with two arrays containing the coordinates of every base pair listed as ubox data, i.e., every pair whose probability exceeds the cutoff (the MFE pairs are the ones listed as lbox data).
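The ubox parsing described above could be factored into a small sub. The sketch below is one possible version: parse_ubox and the sample lines are hypothetical (the numbers are made up, not taken from a real dot.ps). It also squares the third field to recover the pair probability, since dot.ps stores its square root, and, unlike the regular expression above, it accepts any run of whitespace between fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the ubox entries ("i j value ubox") from
# dot.ps lines and return [i, j, p] triples. The value field holds the
# square root of the pair probability, so it is squared to get p.
sub parse_ubox {
    my @pairs;
    for my $line (@_) {
        if ($line =~ /^(\d+)\s+(\d+)\s+([0-9.]+)\s+ubox\b/) {
            push @pairs, [ $1, $2, $3 ** 2 ];
        }
    }
    return @pairs;
}

# Made-up lines in the style of dot.ps (values are illustrative only).
# In verify.pl the lines would come from the file, e.g. parse_ubox(<$data>).
my @sample = (
    "%data starts here",
    "1 80 0.998500000 ubox",
    "2 79 0.500000000 ubox",
    "1 80 0.950000000 lbox",
);

for my $pair (parse_ubox(@sample)) {
    printf "%d-%d  p = %.4f\n", @$pair;
}
```

The lbox line and the comment line are skipped, so only the two ubox pairs are returned.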

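  o Our code should now look something like this:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Fold the sequence given on the command line. With -p, RNAfold also
      # writes the base-pairing probability dot plot to dot.ps; its screen
      # output is redirected to Tempfile.
      open(my $rnafold, '|-', 'RNAfold -p > Tempfile')
          or die "Cannot run RNAfold: $!";
      print $rnafold "$ARGV[0]\n";
      close($rnafold);

      # Collect the i and j coordinates of every ubox line in dot.ps.
      open(my $data, '<', 'dot.ps') or die "Cannot open dot.ps: $!";
      my (@iCoord, @jCoord);
      while (my $line = <$data>) {
          chomp $line;
          if ($line =~ m/^([0-9]+)\s([0-9]+)\s0\.[0-9]+\subox/) {
              push @iCoord, $1;
              push @jCoord, $2;
          }
      }
      close($data);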

  o We can then test whether the i and j coordinates (stored in @iCoord and @jCoord, respectively) match the coordinates given in Figure 2a of the Penchovsky-Breaker article for the OFF and ON states, respectively.

    This could be done (rather tediously) by first reading the base pairs off the figures in the article and creating four arrays, @iCoordPBOFF, @jCoordPBOFF, @iCoordPBON, and @jCoordPBON, which contain the i and j bases of the i-j base pairs in the Penchovsky-Breaker diagrams for the OFF and ON states.

    If our two i and j arrays have different lengths from the i and j arrays constructed from the Penchovsky-Breaker article, then we already know that there is no complete match.

    If they are the same length, we can use a for loop to compare elements (note that the arrays from the Penchovsky-Breaker article must be sorted in the same order as our arrays, e.g., by i and then by j, for the comparison to work correctly).
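If sorting the Penchovsky-Breaker arrays by hand proves error-prone, one alternative (not the approach described above) is to compare the base pairs as sets. The hypothetical sub same_pairs below builds an "i-j" key for each pair, so the order of the arrays does not matter; the coordinates in the example are invented for illustration and are not read from Figure 2a.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare two base-pair lists as multisets, so neither
# list has to be sorted. Each list is two parallel array refs (i and j
# coordinates), matching the @iCoord/@jCoord layout used in the text.
sub same_pairs {
    my ($i1, $j1, $i2, $j2) = @_;
    return 0 unless @$i1 == @$i2;    # different number of pairs: no match
    my %count;
    $count{"$i1->[$_]-$j1->[$_]"}++ for 0 .. $#$i1;
    for my $k (0 .. $#$i2) {
        my $key = "$i2->[$k]-$j2->[$k]";
        return 0 unless $count{$key};    # pair missing (or already used)
        $count{$key}--;
    }
    return 1;
}

# Invented coordinates for illustration; they are not read from Figure 2a.
my @iCoord     = (1, 2, 3);
my @jCoord     = (80, 79, 78);
my @iCoordPBON = (3, 1, 2);
my @jCoordPBON = (78, 80, 79);

my $match = same_pairs(\@iCoord, \@jCoord, \@iCoordPBON, \@jCoordPBON);
print "ON state: ", ($match ? "match" : "no match"), "\n";
```

Because each key is counted, the check still requires the two lists to contain exactly the same pairs, just not in the same order.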

  o Since we are currently considering the case without DNA-1, the input DNA-1 is “false”, and:

    if our coordinates match the OFF state, then the output of the YES Gate has the truth value “false”;

    if our coordinates match the ON state, then the output of the YES Gate has the truth value “true”.

    If they match neither the OFF nor the ON state, then we have done something incorrect (or perhaps we'd assume that the YES Gate is inactive in this case and give it the truth value “false”).
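The decision rules above can be written as a small lookup. In this sketch, gate_output and its $strict switch are hypothetical: $strict chooses between the two readings of the "matches neither state" case given in the text (report an error, or treat the gate as inactive and output “false”).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the comparison results for one case into the
# YES Gate's output. $matches_on / $matches_off say whether our pairs matched
# the article's ON / OFF structure. When $strict is true, a structure that
# matches neither state is reported as an error (undef); otherwise the gate
# is treated as inactive and the output is "false".
sub gate_output {
    my ($matches_on, $matches_off, $strict) = @_;
    return "true"  if $matches_on;
    return "false" if $matches_off;
    return $strict ? undef : "false";
}

for my $case ([1, 0], [0, 1], [0, 0]) {
    my ($on, $off) = @$case;
    my $out = gate_output($on, $off, 1);
    printf "ON match=%d, OFF match=%d -> %s\n", $on, $off,
        defined $out ? $out : "error: matches neither state";
}
```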

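  o The code for comparing our i and j arrays with the ON-state arrays would be of the form:

      my $truth = "true";
      if (@iCoord != @iCoordPBON) {
          $truth = "false";    # different number of pairs: no complete match
      } else {
          for my $k (0 .. $#iCoord) {
              if ($iCoord[$k] != $iCoordPBON[$k] || $jCoord[$k] != $jCoordPBON[$k]) {
                  $truth = "false";
                  last;        # one mismatch is enough
              }
          }
      }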

  o The code above assumes that anything short of a complete match with the ON state automatically results in the truth value “false”, but it can be extended and modified, for example to compare against the OFF-state arrays as well.

- The Perl-automated RNAfold program should then consider the second of the two distinct cases:

  o The second distinct case is when DNA-1 is present.

    This entails that we should use RNAfold's -C option, with a constraint that models the OBS being bound by DNA-1.

    Since we passed in the coordinates of the OBS as input, we could construct the constraint by creating a string of ($startcoord - 1) whitespaces followed by a number of x's equal to $OBSlength. The x's indicate that the OBS is “not base-paired”: since it is actually base-paired to DNA-1, this effectively means that the OBS does not base-pair with any other part of the ribozyme.
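A sketch of building the constraint string from the OBS coordinates follows. Two assumptions differ from the description above and should be checked against your RNAfold version: unconstrained positions are written as '.', which RNAfold's documentation lists as “no constraint”, rather than as whitespace, and the string is padded with '.' to the full sequence length. obs_constraint is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an RNAfold constraint string of length $seq_len
# in which the OBS (1-based start $start, length $obs_len) is marked 'x'
# (base must not pair) and every other position is '.' (no constraint).
sub obs_constraint {
    my ($seq_len, $start, $obs_len) = @_;
    my $before = $start - 1;                      # bases before the OBS
    my $after  = $seq_len - $before - $obs_len;   # bases after the OBS
    die "the OBS does not fit in the sequence\n" if $before < 0 || $after < 0;
    my $c = ('.' x $before) . ('x' x $obs_len) . ('.' x $after);
    return $c;
}

# The 80-nt hammerhead ribozyme with the OBS at position 26, length 22.
my $constraint = obs_constraint(80, 26, 22);
print "$constraint\n";
```

For the example ribozyme this gives 25 dots, 22 x's and 33 dots.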

  o Again, we'd print all the data to a file, such as Tempfile or Tempfile2.

  o The dot.ps file would be overwritten, and we would then retrieve the ubox data from the new dot.ps file in the same way.
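The two cases differ only in the RNAfold options and in what is piped into the program. The hypothetical sub rnafold_job below returns both, so the same code could drive either case; with -C, RNAfold reads the constraint string from the line after the sequence. Actually running a job requires RNAfold on the PATH, so that part is shown only as a comment, and the short sequence and constraint in the demo are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the RNAfold command line and the text to pipe
# into it. Without a constraint (DNA-1 absent) the sequence folds freely;
# with one (DNA-1 present) -C is added and the constraint string follows
# the sequence on the next line.
sub rnafold_job {
    my ($seq, $constraint, $outfile) = @_;
    if (defined $constraint) {
        return ("RNAfold -p -C > $outfile", "$seq\n$constraint\n");
    }
    return ("RNAfold -p > $outfile", "$seq\n");
}

# Made-up sequence and constraint, just to show the two cases.
my ($cmd1, $in1) = rnafold_job('GGGCGACCCU', undef, 'Tempfile');
my ($cmd2, $in2) = rnafold_job('GGGCGACCCU', '..xxxx....', 'Tempfile2');
print "$cmd1\n$in1\n$cmd2\n$in2";

# Running a job needs RNAfold on the PATH, e.g.:
#   my ($cmd, $input) = rnafold_job($seq, $constraint, 'Tempfile2');
#   open(my $pipe, '|-', $cmd) or die "Cannot run RNAfold: $!";
#   print $pipe $input;
#   close($pipe);
# Each run overwrites dot.ps, so parse it before starting the next run.
```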

  o We would compare our coordinates with the Penchovsky-Breaker Figure 2a OFF and ON base-pair coordinates in the same way.

  o Since we are currently considering the case in which DNA-1 is present, the input has the truth value “true”, and:

    if our coordinates match the OFF state, then the output of the YES Gate has the truth value “false”;

    if our coordinates match the ON state, then the output of the YES Gate has the truth value “true”.

    If they match neither the OFF nor the ON state, then we have done something incorrect (or perhaps we'd assume that the YES Gate is inactive in this case and give it the truth value “false”).

- Finally, we'd print out the truth table. What we'd expect from the YES Gate is as follows:

      Input DNA-1    Output of YES Gate
      True           True
      False          False

- This means that the YES Gate preserves the truth value of the input; that is, the output has the same truth value as the input.
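As a final sketch, the truth table could be printed and the YES property checked in one step: the gate behaves as a YES Gate exactly when each output equals its input. truth_table is a hypothetical sub, and the outputs passed to it below are the expected values from the table above, not results computed by verify.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format the truth table from a hash that maps each
# input value ("true"/"false") to the observed gate output, and report
# whether every output equals its input (the defining YES Gate property).
sub truth_table {
    my (%output) = @_;
    my $table  = sprintf "%-12s %s\n", 'Input DNA-1', 'Output of YES Gate';
    my $is_yes = 1;
    for my $in ('true', 'false') {
        my $out = defined $output{$in} ? $output{$in} : 'undetermined';
        $table .= sprintf "%-12s %s\n", ucfirst($in), ucfirst($out);
        $is_yes = 0 unless $out eq $in;
    }
    return ($table, $is_yes);
}

# Expected outputs from the table above (not results computed by verify.pl).
my ($table, $is_yes) = truth_table(true => 'true', false => 'false');
print $table;
print "Behaves as a YES Gate: ", ($is_yes ? "yes" : "no"), "\n";
```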