Convert strings to arrays

clumpfrustrated, Biotechnology

Oct 2, 2013


String ($var) and array (@array) conversion and substring extraction

Lecture 6

Split strings


The split function can be used to split (divide) data:

Strings into an array.

Strings into a list of scalars ($variables).

It can also split a string into its individual characters by using "" as the delimiter.



>192a8, the lactose gene, e. coli, cambridge university, january 1981


chomp($line = <>); # read the line into $line


@fields = split ',', $line;   # splits a string into an array


($clone, $laboratory, $left_oligo, $right_oligo) = split ',', $line;   # splits a string into a list of scalars


See SplitExample.pl
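The three uses of split described above can be sketched as follows. This is not the contents of SplitExample.pl, only a minimal illustration: the record is hard-coded instead of being read from <>, the scalar names ($id, $gene and so on) are hypothetical, and the pattern /,\s*/ is an assumption made here so that the fields do not keep the space after each comma.

```perl
use strict;
use warnings;

# Hard-coded copy of the sample record (the lecture reads it from <>).
my $line = '>192a8, the lactose gene, e. coli, cambridge university, january 1981';

# Split on a comma plus any following whitespace, into an array.
my @fields = split /,\s*/, $line;

# Split straight into a list of scalars (hypothetical variable names).
my ($id, $gene, $organism, $institution, $date) = split /,\s*/, $line;

# An empty pattern splits a string into its individual characters.
my @bases = split //, 'ACGT';

print scalar(@fields), " fields; organism: $organism\n";
print join('-', @bases), "\n";
```

With a plain ',' as the delimiter, as on the slide, every field after the first would start with a space.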




Join: elements of an array


The join function is the reverse of split: it converts an array (list) into a string.


#initialize an array


@seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg");



$CombinedSeq = join '', @seq;



Result of the join is:



aaaaaattttttccccccggggggg





See JoinExample.pl
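A short sketch of join, using the same array as above. It is not the contents of JoinExample.pl; the tab-separated variant and the round trip through split are additions made here to show that the first argument is the separator placed between elements.

```perl
use strict;
use warnings;

my @seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg");

# An empty separator glues the elements directly together.
my $combined = join '', @seq;

# Any string can serve as the separator, for example a tab.
my $tabbed = join "\t", @seq;

# split reverses the join as long as the separator does not occur in the data.
my @back = split /\t/, $tabbed;

print "$combined\n";
print scalar(@back), " elements recovered\n";
```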









Concatenation


To concatenate two strings, use the .= operator.


$seq starts as an empty (null) string: $seq = "";


We can add (concatenate) a sequence to this by:



$seq .= $input_seq2;



The .= operator can be used to read in sequences and join them together so they form one string.
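A minimal sketch of building one string from several input lines with .= is shown here. The sequence data is made up, and it is read from an in-memory filehandle (an assumption made so the example runs without a file); a real script would open a sequence file instead.

```perl
use strict;
use warnings;

# Hypothetical sequence data spread over several lines.
my $text = "ATGGCC\nTTTAAA\nGGG\n";

# Open the string as an in-memory filehandle so no file is needed.
open my $fh, '<', \$text or die "Cannot open in-memory data: $!";

my $seq = "";                 # start with an empty string
while (my $line = <$fh>) {
    chomp $line;              # remove the trailing newline
    $seq .= $line;            # append this line to the growing sequence
}
close $fh;

print "$seq\n";
```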

Extracting substrings


substr: a function for extracting a substring from a string.


Assume the string is: AAAAGGGGCCCCTTTT



To extract the sequence AGG (a codon) from the string we need to:

Move to the 4th character of the string (offset 3, because offsets are zero-based).

Extract 3 characters, i.e. a 3-character substring.



The syntax of the Perl substr (substring) function:

$sub = substr($string, OFFSET, LENGTH);

OFFSET is the position at which to begin extraction and LENGTH is the size of the substring. The offset is zero-based.




# more details on substrings can be found at:


# http://perlmeme.org/howtos/perlfunc/substr.html



Extract words from a sentence: Substring.pl

Extract a codon from a DNA sequence: substring.pl
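The AGG extraction described above can be sketched like this. It is an illustration rather than the contents of substring.pl; the two extra calls (omitting the length and using a negative offset) are standard substr behaviour added here for comparison.

```perl
use strict;
use warnings;

my $string = "AAAAGGGGCCCCTTTT";

# The 4th character sits at offset 3 because offsets are zero-based.
my $codon = substr($string, 3, 3);

# Omitting the length returns everything from the offset to the end.
my $tail = substr($string, 12);

# A negative offset counts back from the end of the string.
my $last3 = substr($string, -3);

print "codon: $codon, tail: $tail, last three: $last3\n";
```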


Perl functions for determining the ORF of DNA sequences


The unpack function: this is a function of the Perl language that extracts sets of characters from a sequence of characters and assigns them to an array.

It can therefore be used to extract groups of 3 bases from a DNA sequence (e.g. the codons of an open reading frame) and assign each group to an element of an array.



@triplets = unpack("a3" x (length($line)/3), $line);



To determine all possible open reading frames (ORFs) for a DNA sequence (reading frame 1, reading frame 2 and reading frame 3), shift by one base when going from reading frame 1 to reading frame 2, and by one more base when going from reading frame 2 to reading frame 3.




Frame shift (1 position to the right)


@triplets = unpack('a1' . "a3" x (length($line)/3), $line);   # $triplets[0] holds the single skipped base




Remember: if only 1 or 2 characters remain at the end (or are skipped at the beginning) of a sequence, unpack will still assign them to an element of the array. If the triplets are looked up in a hash table, do not forget that an exists check may be required.






See Unpack_codons.pl (run to show the output)
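One way to collect all three forward reading frames with unpack is sketched below. It is not the contents of Unpack_codons.pl and it differs from the slide's template in two ways, both assumptions made here: the shifted bases are skipped with the "x" template code instead of being captured with a1, and the group form "(a3)*" repeats until the input is exhausted instead of computing a repeat count. The 14-base sequence is made up.

```perl
use strict;
use warnings;

my $line = "ATGGCCTTTAAAGG";   # hypothetical 14-base sequence

my @frames;
for my $shift (0 .. 2) {
    # "x" skips one character per occurrence; "(a3)*" then takes
    # groups of 3 characters until the input runs out.
    my $template = ("x" x $shift) . "(a3)*";
    my @triplets = unpack $template, $line;
    push @frames, \@triplets;
    print "Frame ", $shift + 1, ": @triplets\n";
}

# A trailing group may be shorter than 3 bases; keep only complete codons.
my @complete = grep { length($_) == 3 } @{ $frames[0] };
print "Complete codons in frame 1: @complete\n";
```

Filtering by length is one alternative to an exists check when the triplets are later used as hash keys, since incomplete groups never reach the lookup.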

Sample Exercise


Write a script to read in the contents of a FASTA file (without the descriptor line) and print it out as a string containing all the DNA bases/amino acids.



Modify the unpack script to extract the codons with substr instead of unpack.