IN HOUSE LOCAL BLAST SEARCH


Dec 13, 2013




To get started you need blastall.exe and formatdb.exe (from NCBI). The rest of the Perl and
batch programs can be downloaded from:
http://psi081.ba.ars.usda.gov/SGMD/software/blast/Blast.htm
You might need to change the directory paths they point to or the blast options they use.


For the programs to work without modifying the paths, the whole archive "Blast.zip" should be
unzipped to a folder "Blast" placed directly under the "C:" drive (that is, C:\Blast).


For questions or comments please contact: Imed Ben Chouikha, bchouikh@gmu.edu



I. Step one: Blasting


1) Download the database that you want to blast against, for example the NT database
from NCBI. If you want to use a local database, store all the sequences in a text
file (FASTA format).

The file provided by NCBI is compressed (nt.gz), so you have to decompress it first.
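
If no unzip utility is at hand, Perl itself can decompress the download. The following is a minimal sketch, assuming a Perl installation that includes the core module IO::Uncompress::Gunzip and that the downloaded file is called nt.gz. The sub name decompress_db and the script name gunzip_db.pl are made up for illustration and are not part of Blast.zip.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Decompress $in (for example "nt.gz") into $out (for example "nt").
# decompress_db is a hypothetical helper name, not one of the Blast.zip programs.
sub decompress_db {
    my ($in, $out) = @_;
    gunzip($in => $out)
        or die "gunzip failed for $in: $GunzipError\n";
    return $out;
}

# Runs only when two file names are given on the command line:
#   perl gunzip_db.pl nt.gz nt
decompress_db($ARGV[0], $ARGV[1]) if @ARGV == 2;
```

The decompressed file is the text file that formatdb reads in the next step.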


2) At the DOS prompt (which you can reach from Windows by choosing Start, Run, then
typing: command), run formatdb.exe to create a local database from that text file or the
downloaded database.


Usage:

    formatdb -t databasename -i inputfile -p F

Examples:

1) formatdb -t nt -i nt -p F

2) formatdb -t snc -i inputfile -p F


databasename is the name you want to give to your database.

inputfile is the name of the text file that contains your sequences, or the name of the
database that you downloaded from GenBank (technically also a text file of sequences).

-p F tells formatdb that the sequences are nucleotide rather than protein.
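
formatdb expects its input file in FASTA format: each record starts with a header line beginning with ">", followed by the sequence lines. Before formatting a hand-made local database it can help to check the file. The following is a minimal sketch of such a check. The sub name count_fasta_records and the example sequences are made up for illustration.

```perl
use strict;
use warnings;

# Count the FASTA records in a block of text.
# Returns the number of ">" header lines and dies if sequence data
# appears before the first header.
# count_fasta_records is a hypothetical helper name.
sub count_fasta_records {
    my ($text) = @_;
    my $records = 0;
    foreach my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>/) {
            $records++;                # a header line starts a new record
        }
        elsif ($records == 0) {
            die "sequence data found before the first '>' header\n";
        }
    }
    return $records;
}

# Example with two short, made-up sequences
my $example = ">seq1 first test sequence\nACGTACGT\nGGCC\n>seq2\nTTGACA\n";
print count_fasta_records($example), " records\n";
```

For the example text this prints "2 records".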


More information about formatdb.exe and its command options can be found here:

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/doc/formatdb.txt



3) Open the file BlastList.pl (using Notepad or your favorite text editor).

Make the two small changes as instructed in the file, then save it.

These are the only two changes that need to be made to run the program.


4) Run BlastList.pl as follows:

    C:\> cd Blast

    C:\Blast> perl BlastList.pl


BlastList.pl automatically creates a batch file "DosBlast.bat" based on the
list of sequence files to be blasted.
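
Each line of DosBlast.bat is one blastall call. The following is a minimal sketch of how a sequence file name maps to such a line, using the same blastall options as the BlastList.pl listing (-v 0 and -b 1 limit the report to no one-line descriptions and one alignment). Selecting files by a ".seq" extension is an assumption made for this sketch; BlastList.pl itself works with an exclusion list. The sub name blast_command and the file names are made up.

```perl
use strict;
use warnings;

# Build one blastall command line for a sequence file.
# The output file is the input name up to the first dot, plus ".txt".
# blast_command is a hypothetical helper name.
sub blast_command {
    my ($dbname, $seqfile) = @_;
    my ($base) = split /\./, $seqfile;     # text before the first dot
    my $outfile = $base . ".txt";
    return "blastall -p blastn -d $dbname -i $seqfile -o $outfile -v 0 -b 1";
}

# Made-up directory listing; only the ".seq" files are kept
my @listing = ("clone01.seq", "clone02.seq", "ReadMe.doc", "blastall.exe");
foreach my $file (grep { /\.seq$/i } @listing) {
    print blast_command("SCN_seq.fas", $file), "\n";
}
```

Filtering by extension avoids having to list every non-sequence file, at the cost of requiring that all sequence files share one extension.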


5) Run DosBlast.bat

    C:\Blast> DosBlast.bat

DosBlast.bat is the actual file that does the blast search.



II. Step two: Extracting data from blast results


6) Move all the resulting ".txt" files to BlastOut

7) Go to the directory BlastOut

    C:\Blast> cd BlastOut


8) Run Hits.pl

    C:\Blast\BlastOut> perl Hits.pl

This moves the files that returned no hits to a different directory (NoHits).
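
The test behind this step is a search for the "no hits" message in each result file. The following is a minimal sketch of that test on in-memory text. It assumes the message reads "No hits found" and matches it without regard to capitalization, since the exact wording in the blastall output may differ. The sub name has_no_hits and the report fragments are made up.

```perl
use strict;
use warnings;

# Return 1 if the report text contains the "no hits" message, else 0.
# The message wording is an assumption; has_no_hits is a hypothetical name.
sub has_no_hits {
    my ($report) = @_;
    return ($report =~ /No hits found/i) ? 1 : 0;
}

# Made-up report fragments for illustration
my %reports = (
    "clone01.txt" => "Query= clone01\n ***** No hits found ******\n",
    "clone02.txt" => "Query= clone02\n>made-up hit\n Score = 100 bits\n",
);
foreach my $name (sort keys %reports) {
    my $verdict = has_no_hits($reports{$name})
        ? "no hits, would be moved to NoHits"
        : "has hits, stays in BlastOut";
    print "$name: $verdict\n";
}
```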


9) Run DataExt.pl

    C:\Blast\BlastOut> perl DataExt.pl

The output will be written to the file Blasted.txt.

Open Blasted.txt with Excel (as a tab-delimited file).

It contains a summary of the blast results that you can save, edit, etc.
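
Blasted.txt can also be read back in Perl instead of Excel. The following is a minimal sketch that splits one line into its five tab-separated fields, in the order DataExt.pl writes them (file and hit name, length, score, identities, strand). The field names, the sub name parse_blasted_line and the example line are made up for illustration; the file itself has no header row.

```perl
use strict;
use warnings;

# Split one line of Blasted.txt into its five tab-separated fields.
# Field names are chosen for this sketch only.
sub parse_blasted_line {
    my ($line) = @_;
    $line =~ s/\r?\n$//;                               # drop the line ending
    my @fields = map { s/^\s+|\s+$//g; $_ } split /\t/, $line, -1;
    my %record;
    @record{qw(hit length score identities strand)} = @fields;
    return \%record;
}

# Made-up line in the layout written by DataExt.pl
my $line = "clone02.txt >made-up hit \t Length = 512\t Score = 100 bits (50)\t"
         . " Identities = 50/50 (100%)\t Strand = Plus / Plus\n";
my $rec = parse_blasted_line($line);
print "$rec->{hit}\n";
print "$rec->{score}\n";
```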


LIST OF PROGRAMS:


The required programs are available in this directory, but here is the code for four of them
in case you wish to make some modifications:

1) BlastList.pl

2) DosBlast.bat

3) Hits.pl

4) DataExt.pl



============================BlastList.pl==========================


#!/usr/local/bin/perl

## file: BlastList.pl
##
## Imed Ben Chouikha
##
## 04/24/03
##
## This file creates a batch file "DosBlast.bat" based on
## the list of ".seq" sequence files.
## The file "DosBlast.bat" runs separately. (For more information see "ReadMe.doc")
##
## send comments to: bchouikh@gmu.edu

# CHANGES TO MAKE
# 1) Change "SCN_seq.fas" (in the first line of the program) to the name of the local
#    database you are blasting against.
# 2) Exclude (only if needed) anything other than the sequence files by adding it to the
#    "unless" statement (below, in the middle of the code).

$DBNAME = "SCN_seq.fas"; ###### Replace "SCN_seq.fas" with local database name

$dirtoget = "C:/Blast";
opendir(IMD, $dirtoget) || die("Cannot open directory");

# Delete the old "DosBlast.bat" file that contains the list of sequences
# to be blasted
$dosfile = "DosBlast.bat";
unlink($dosfile);

# Get the list of the new sequence files to blast
@thefiles = readdir(IMD);
closedir(IMD);

# Create a new file "DosBlast.bat"
open(OUT, ">$dosfile") || die "cannot open file for writing: $!";

foreach $f (@thefiles)
{
    ####### Add to the list below everything other than the sequence files
    ####### Here is the unless statement:
    unless ( ($f eq ".") || ($f eq "..") || ($f eq "DosBlast.bat") || ($f eq "BlastList.pl") || ($f eq $DBNAME) ||
             ($f eq "BlastOut") || ($f eq "blastall.exe") || ($f eq "formatdb.exe") ||
             ($f eq "formatdb.log") || ($f eq "ReadMe.doc") ) {

        @myarray   = split(/\./, $f);           # Split the old file name at the dots
        $extension = ".txt";                    # This is the new file extension
        $newname   = $myarray[0] . $extension;  # Output name: first part of the old name plus ".txt"

        # One blastall command per sequence file
        print(OUT "blastall -p blastn -d $DBNAME -i $f -o $newname -v 0 -b 1\n");

    } # end of unless
} # end of foreach

close(OUT);




============================DosBlast.bat=========================

# This file is now generated automatically, so you do not have to worry about it.
# It contains lines of the form
# blastall -p blastn -d $DBNAME -i $f -o $newname -v 0 -b 1
# where $DBNAME is the database name, and $f and $newname are the input and output names
# read from the directory by BlastList.pl
=================================================================




============================Hits.pl==============================

#!/usr/local/bin/perl

##
## file: Hits.pl
##
## Imed Ben Chouikha
##
## 04/24/03
##
## This file moves all the files that returned "No Hits" to the directory called "NoHits"
## and deletes them from the current directory
##
## Send comments to: bchouikh@gmu.edu

$dirtoget = "C:/Blast/BlastOut";
opendir(IMD, $dirtoget) || die("Cannot open directory");
@thefiles = readdir(IMD);
closedir(IMD);

$odir  = "NoHits";         # directory that receives the files without hits
$nohit = "No Hits Found";  # text that marks a search without hits

# Make sure the target directory exists before moving anything
(-d $odir) || die("Cannot open directory $odir");

## loop over the files
foreach $f (@thefiles)
{
    unless ( ($f eq ".") || ($f eq "..") || ($f eq "Blasted.txt") || ($f eq "DataExt.pl") ||
             ($f eq "NoHits") || ($f eq "Hits.pl") ) {

        open(IN, $f) || die "cannot open file for reading: $!";
        $count = 0;

        while ($lines = <IN>) {
            chomp($lines);
            # Match without regard to case, since the capitalization of the
            # message in the blastall output may differ
            if ($lines =~ /$nohit/i) {
                $count += 1;
            }
        } # end of while loop

        # Always close the file first; an open file generally cannot be renamed on Windows
        close(IN);

        if ($count >= 1) {
            rename($f, "$odir/$f") || warn "cannot move $f to $odir: $!";
        }

    } # end of unless
} # end of foreach


===========================DataExt.pl============================


#!/usr/local/bin/perl

##
## file: DataExt.pl
##
## Imed Ben Chouikha
##
## 04/24/03
##
## This file extracts the E-values, best hits, and other values from the Blast result.
##
## send comments to: bchouikh@gmu.edu

$dirtoget = "C:/Blast/BlastOut";
opendir(IMD, $dirtoget) || die("Cannot open directory");
@thefiles = readdir(IMD);
closedir(IMD);

open(OUT, ">Blasted.txt") || die "cannot open file for writing: $!";

## loop over the files
foreach $f (@thefiles)
{
    unless ( ($f eq ".") || ($f eq "..") || ($f eq "Blasted.txt") || ($f eq "DataExt.pl") ||
             ($f eq "NoHits") || ($f eq "Hits.pl") ) {

        open(IN, $f) || die "cannot open file for reading: $!";

        # Reset the fields so that nothing carries over from the previous file
        ($name, $secondname, $length, $score, $identities, $strand) = ("", "", "", "", "", "");
        $line0 = "";   # the previous line

        while ($lines = <IN>) {   # main while loop
            chomp($lines);

            if ($lines =~ />/) {
                $name = $lines;
            }
            elsif ($lines =~ /Length =/) {
                $length = $lines;
                # A long hit name wraps onto a second line. Keep the previous line as
                # the second part of the name unless it is the ">" line itself.
                $secondname = ($line0 =~ />/) ? "" : $line0;
            }
            elsif ($lines =~ /Score = /) {
                $score = $lines;
            }
            elsif ($lines =~ /Identities = /) {
                $identities = $lines;
            }
            elsif ($lines =~ /Strand = /) {
                $strand = $lines;
                print(OUT "$f $name $secondname\t $length\t $score\t $identities\t $strand\n");
                last;   # only the best (first) hit is reported
            }

            $line0 = $lines;
        } # end of while loop

        close(IN);

    } # end of unless
} # end of foreach

close(OUT);