
410.634 Practical Computer Concepts for Bioinformatics

Center for Biotechnology Education, The Johns Hopkins University

AAP Bioinformatics Program

FINAL EXAM (Take-home)


Spring 2012


You can use any Perl book or Web resource you can get your hands on, but you must do the work yourself. Duplication of another person's, web site's, or book's code (i.e. plagiarism) will lead to complete loss of credit for the problem(s). Use comments liberally!

If you can't get the program to work, you will get partial credit if you give me as much as you can and at least tell me what logic you wanted to execute for the rest!


Problems are worth 25 points each.

This exam is due May 6 by 11:59 PM EDT, submitted as an Assignment on Blackboard.


N.B. Please provide both Perl source code AND sample output for each of these problems to receive full credit:




- Carefully read the directions!!!! (Largest cause of losing points is failing to do so).
- You should test for bad input from the user.
- Please clean up any unused code: commented-out print statements are fine, but remove anything else unused.
- Close filehandles after opening them.
- Don't use subroutines if they don't return a value.
- Echo what's happening in the program for major steps.
- Don't use special variables (@_, etc.) for other than their intended use.
- Use -w or use warnings;, use strict;, and my().
- Generalize the program to accept any input file; don't hardcode.



1. Write a program that prompts the user for a nucleotide sequence and then determines its melting temperature. Warn the user if letters other than A, C, G, or T are included in the input. Otherwise, calculate and display the melting temperature of the entered nucleotide sequence.


Melting temperatures are calculated as follows:

For nucleotide sequences less than 15 bases long, use the following formula:

2 * (number of As + number of Ts) + 4 * (number of Cs + number of Gs)


For nucleotide sequences greater than or equal to 15 bases long, use the following formula:

64.9 + 0.41 * (%GC content) - (675 / length of the sequence)
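The two formulas can be sketched as a single routine. This is only an illustration under stated assumptions: the helper name melting_temp is hypothetical, %GC is taken as a percentage on a 0 to 100 scale, and the 675 / length term is subtracted, as in the common form of this formula. The prompt for user input is left out so that the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the melting temperature of a sequence,
# or undef if it is empty or contains letters other than A, C, G, T.
sub melting_temp {
    my ($seq) = @_;
    $seq = uc $seq;
    return unless $seq =~ /\A[ACGT]+\z/;

    my $length = length $seq;
    my $at     = ( $seq =~ tr/AT// );    # tr/// with an empty replacement only counts
    my $gc     = ( $seq =~ tr/CG// );

    # Short sequences: 2 * (A + T) + 4 * (C + G)
    return 2 * $at + 4 * $gc if $length < 15;

    # Longer sequences: assumes %GC on a 0-100 scale and a subtracted last term
    my $percent_gc = 100 * $gc / $length;
    return 64.9 + 0.41 * $percent_gc - 675 / $length;
}

print melting_temp('AATGCTTGGCCAT'), "\n";    # 13 bases: 2 * 7 + 4 * 6 = 38
```

Returning undef for bad input lets the calling code decide how to word the warning to the user.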



2. Copy the file ~jgreene/fexam/Q2seqs to your directory. This file contains several oligonucleotide sequences (e.g. 5' AATGCTTGGCCAT 3') with one sequence on each line. Write a program that 1) opens the file and finds the reverse complement of each sequence (for the example above, 5' ATGGCCAAGCATT 3'); 2) prints the reverse complement to an output file ending with a .rc suffix (e.g. Q2seqs.rc); and 3) notifies the user that processing of the input file (showing its name) is done.
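As a rough illustration of the reverse-complement step only (file handling is left out), one possible approach reverses the string and then swaps each base with tr///. The helper name reverse_complement is hypothetical, and the input is assumed to be a bare sequence without the 5' and 3' labels.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;              # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;       # swap each base for its complement
    return $rc;
}

print reverse_complement('AATGCTTGGCCAT'), "\n";    # prints ATGGCCAAGCATT
```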


You need to recall that the reverse() function not only works on arrays, but in a scalar context, reverses the letters of a text string, i.e.

    my $word = "biology";
    my $revword = reverse $word;
    # $revword now contains "ygoloib"
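The context matters here: in list context, reverse() treats the string as a one-element list and hands it back unchanged. The short sketch below contrasts the two cases; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $word = 'biology';

my $scalar_result = reverse $word;           # scalar context: "ygoloib"
my ($list_result) = reverse $word;           # list context: still "biology"
my $forced        = scalar reverse $word;    # scalar() forces scalar context

print "$scalar_result\n";       # ygoloib
print reverse($word), "\n";     # print supplies list context: biology
print "$forced\n";              # ygoloib
```

Assigning to a plain scalar variable, as in the first line, is usually the simplest way to get the string-reversing behavior.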


To get the filename from the command line (e.g. revcomp.pl Q2seqs), you simply shift off the first element of the @ARGV default array. This is so common that, outside a subroutine, you do not even have to write @ARGV; just do:

    my $filename = shift;    # $filename will contain "Q2seqs" for the command above
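A minimal sketch of how the command-line filename might feed into the output name follows. The helper rc_filename and the usage message are hypothetical, and the sketch assumes the .rc suffix is simply appended to whatever name was given.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the output name by appending the .rc suffix.
sub rc_filename {
    my ($filename) = @_;
    return "$filename.rc";
}

my $filename = shift;    # outside a subroutine, shift works on @ARGV

if ( defined $filename ) {
    print "Input file: $filename\n";
    print "Output file: ", rc_filename($filename), "\n";
}
else {
    # No argument was given, so remind the user how to call the program
    print "Usage: revcomp.pl <sequence file>\n";
}
```

Checking that $filename is defined before using it is one way to handle a user who forgets the argument.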



3. You are given a tab-delimited file of 12 columns and 8 rows, representing UV absorbance (OD260) data from a sample of a 96-well plate preparation of DNA (copy this file to your directory from ~jgreene/fexam/96well.txt).



Process this absorbance data to convert it into the concentration of DNA in each well, using the standard conversion factor of 1 OD260 = 50 micrograms/milliliter of DNA as well as by providing for a dilution factor used to sample the DNA from each well (just an integer constant by which each value is multiplied). So for example, you take 10 microliters out of your stock solution, and add it to 1000 microliters of buffer. That's a 100-fold dilution factor. If you get a resulting OD260 of 0.333, you would multiply 0.333 times the dilution factor of 100 times the conversion factor of 50 micrograms/milliliter to obtain a concentration of 1665 micrograms/ml.
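The arithmetic in this example can be checked with a few lines of Perl. This is a sketch only: the helper names are hypothetical, and the input check assumes the dilution factor must be a positive whole number, which goes slightly beyond the "integer constant" wording above.

```perl
use strict;
use warnings;

my $UG_PER_ML_PER_OD = 50;    # conversion factor: 1 OD260 = 50 micrograms/ml

# Hypothetical helper: true if the input looks like a positive integer.
sub is_valid_dilution {
    my ($input) = @_;
    return $input =~ /\A[1-9]\d*\z/ ? 1 : 0;
}

# Hypothetical helper: concentration = OD260 * dilution factor * 50.
sub od_to_concentration {
    my ( $od, $dilution ) = @_;
    return $od * $dilution * $UG_PER_ML_PER_OD;
}

printf "%.0f micrograms/ml\n", od_to_concentration( 0.333, 100 );    # 1665 micrograms/ml
```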


Inputs to the program you must collect from the user are 1) the path to the input file, 2) the dilution factor, and 3) the path to the output file.


Print the converted values out into another tab-delimited file of equal dimensions (12 x 8), with a header in the file to tell what was done. Besides a printout of the program output and the source code, print out your output file as well.
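One way to keep the output the same shape as the input is to convert a line at a time with split and join. The sketch below works on an in-memory string rather than a real file; the helper name convert_row, the header wording, and the one-decimal formatting are all assumptions made for illustration.

```perl
use strict;
use warnings;

my $UG_PER_ML_PER_OD = 50;    # conversion factor: 1 OD260 = 50 micrograms/ml

# Hypothetical helper: turns one tab-delimited line of OD260 readings
# into a tab-delimited line of concentrations with the same column count.
sub convert_row {
    my ( $line, $dilution ) = @_;
    chomp $line;
    my @od   = split /\t/, $line;
    my @conc = map { sprintf '%.1f', $_ * $dilution * $UG_PER_ML_PER_OD } @od;
    return join "\t", @conc;
}

# A made-up three-column row stands in for one line of the 12-column file
print "# DNA concentration (micrograms/ml) = OD260 x 100 x 50\n";
print convert_row( "0.100\t0.200\t0.333\n", 100 ), "\n";
```

Because each input line yields exactly one output line with the same number of fields, the 12 x 8 layout is preserved without tracking row or column numbers.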



4. Copy the BLASTN file from ~jgreene/fexam/finalfixed.blastn to your directory. Open the file, extract the query name and length for display in the output, and then parse just the accession number (e.g. A10416), length, and score for just the first 5 hits.


You must use regular expressions to precisely pull out the parts you want, which is the definition of parsing. You will probably need to use parentheses to put some parts of those expressions into temporary memory ($1).
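To illustrate how parentheses place a matched piece into $1, here is a small sketch. The three sample lines are made up and only resemble typical BLAST text output; the real layout of finalfixed.blastn may differ, so the patterns would need to be adjusted to the actual file. The helper name capture is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample lines, invented for illustration only
my $query_line = 'Query= sample_query';
my $hit_line   = '>gb|A10416|A10416 hypothetical description';
my $score_line = ' Score = 123 bits (62), Expect = 1e-30';

# Hypothetical helper: returns whatever the first set of parentheses
# captured ($1), or undef if the pattern did not match.
sub capture {
    my ( $text, $pattern ) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $name      = capture( $query_line, qr/^Query=\s*(\S+)/ );
my $accession = capture( $hit_line,   qr/^>\w+\|(\w+)\|/ );
my $score     = capture( $score_line, qr/Score\s*=\s*([\d.]+)\s+bits/ );

print "Query: $name\nAccession: $accession\nScore: $score\n";
```

Testing whether the match succeeded before reading $1 matters, because a failed match leaves $1 holding the value from an earlier successful match.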