Bioinformatics Using Python_001x - bio-bio-1

raviolirookery, Biotechnology

2 Oct 2013



Problem No 1


Determination of GC Content

Among the four nucleotides {A, T, C, G}, the proportion of C and G in a DNA sequence carries
important signals. This proportion is measured as the “GC-content %”, using the following formula.


GC-content % = ((n(G) + n(C)) / len(DNA)) * 100%


Where

n(G) = number of G in the sequence
n(C) = number of C in the sequence
len(DNA) = length of the DNA sequence in base pairs (bp)


Write a Python program that performs as described below.


Input

1. DNA sequence as a string

Output

1. Length of the sequence
2. GC-content %


Example

Input

ATCG

Output

Length of the sequence = 4 bp
GC-content % = 50%
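
One possible way to approach this problem is sketched below. It applies the formula above directly. The function name gc_content, the conversion to uppercase, and the error on empty input are assumptions made for illustration; they are not part of the problem statement.

```python
def gc_content(dna):
    """Return (length in bp, GC-content in percent) for a DNA string."""
    # Assumption: input may be lowercase or padded with whitespace.
    seq = dna.strip().upper()
    if not seq:
        # Assumption: an empty sequence is treated as an error,
        # because the formula would divide by zero.
        raise ValueError("empty DNA sequence")
    gc = seq.count("G") + seq.count("C")   # n(G) + n(C)
    return len(seq), gc / len(seq) * 100   # len(DNA), GC-content %


length, gc = gc_content("ATCG")
print(f"Length of the sequence = {length} bp")
print(f"GC-content % = {gc:g}%")
```

For the example input ATCG this reports a length of 4 bp and a GC-content of 50%.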





Problem No 2


Complement DNA strand

DNA forms a double helix made of two strands. When we work with a DNA sequence we usually talk
about a single sequence (a single strand), but in the chromosome DNA remains in double-stranded
form. These two strands are called complements of each other. One is named 5’-3’ (forward strand)
and the other is named 3’-5’ (reverse strand). When the strand type (or direction) is not explicitly
mentioned, the DNA sequence is assumed to be the 5’-3’, or forward, strand.



5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’



In a complement DNA strand, each base of the original DNA sequence is replaced according to the
following interchange rule:

A is replaced by T and vice versa
C is replaced by G and vice versa

This is because, in the double helix structure, an A on one strand is connected to a T on the other
strand by hydrogen bonds, and the same holds for C and G.


Write a Python program that performs as described below.


Input

1. DNA sequence as a string

Output

1. Complement of input DNA sequence


Example

Input

ACCGTA

Output

Complement DNA Sequence = TGGCAT
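
A minimal sketch of the interchange rule follows, using Python's built-in str.maketrans and str.translate. The names COMPLEMENT and complement are illustrative. The sketch assumes the input contains only A, T, C and G; any other character is passed through unchanged.

```python
# Pairing rule from the problem: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def complement(dna):
    """Return the complement strand, base by base, in the same order."""
    # Assumption: lowercase input is accepted and converted to uppercase.
    return dna.upper().translate(COMPLEMENT)


print("Complement DNA Sequence =", complement("ACCGTA"))
```

A translation table keeps the rule in one place and avoids the common mistake of chaining replace() calls, where replacing A with T and then T with A would undo the first step.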






Problem No 3


Reverse Complement of a DNA Sequence

This problem can be thought of as an extension of Problem No 2 (read Problem No 2 first). In
bioinformatics analysis the concept of the reverse complement of a DNA sequence is encountered
very often. If the complement of a DNA sequence is reversed, the result is called the reverse
complement of the original DNA sequence.


5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’


Here the complement of (5’---ACCGTA---3’) is (3’---TGGCAT---5’), and the reverse of
(3’---TGGCAT---5’) is TACGGT, so the reverse complement of ACCGTA is TACGGT.


Write a Python program that performs as described below.


Input

1. DNA sequence as a string

Output

1. Reverse complement of input DNA sequence


Example

Input

ACCGTA

Output

Reverse Complement DNA Sequence = TACGGT
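
The two steps described above (complement, then reverse) can be sketched as follows. The names PAIRS and reverse_complement are illustrative. As an assumption of this sketch, a base other than A, T, C or G raises a KeyError rather than being silently kept.

```python
# Pairing rule: A <-> T and C <-> G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    # Walking the sequence backwards while complementing each base
    # gives the same result as complementing first and reversing after.
    return "".join(PAIRS[base] for base in reversed(dna.upper()))


print("Reverse Complement DNA Sequence =", reverse_complement("ACCGTA"))
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.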







Problem No 4


Codon List from a DNA sequence

Triplets of nucleotides (for example ATT, TCG, CCC, etc.) are called codons. Through the processes
of transcription and translation, each codon of a DNA sequence is responsible for producing one
amino acid, and finally the chain of amino acids builds a protein. 64 (4x4x4) different codons are
possible.


Let's take the DNA sequence ATTTCGAGGT. If we start parsing codons from left to right, the codons
will be ATT, TCG, AGG (ignore the right-most remaining part with length < 3 bp, in this case T).


Write a function/Python program that returns the list of codons for a DNA sequence. The program
should return/print the codons as a Python list.


Input

1. DNA sequence as a string

Output

1. List of codons


Example

Input

ATTTCGAGGT

Output

Codon-List = ['ATT', 'TCG', 'AGG']
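
A short sketch of the left-to-right parsing described above is given here. The function name codon_list is illustrative. The sketch reads codons starting at the first base only, as in the example, and drops a trailing remainder shorter than 3 bp.

```python
def codon_list(dna):
    """Split a DNA string into codons, left to right."""
    # Largest multiple of 3 that fits; the leftover 1 or 2 bases are ignored.
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


print("Codon-List =", codon_list("ATTTCGAGGT"))
```

Sequences shorter than 3 bp produce an empty list.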








Problem No 5


Translate a DNA Sequence

Each codon represents an amino acid (skim through Problem No 4). The standard codon-to-amino-acid
mapping table is called the “Standard Genetic Code Table” or “Codon Table”. It is built for codons
derived from RNA (the details will be discussed separately, beyond this problem), so you will find
U instead of T. For simplicity, in this problem you should use the customized (for DNA) genetic
code table.


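Standard Genetic Code

     U             C             A             G

U    UUU Phe (F)   UCU Ser (S)   UAU Tyr (Y)   UGU Cys (C)   U
     UUC Phe (F)   UCC Ser (S)   UAC Tyr (Y)   UGC Cys (C)   C
     UUA Leu (L)   UCA Ser (S)   UAA Stop      UGA Stop      A
     UUG Leu (L)   UCG Ser (S)   UAG Stop      UGG Trp (W)   G

C    CUU Leu (L)   CCU Pro (P)   CAU His (H)   CGU Arg (R)   U
     CUC Leu (L)   CCC Pro (P)   CAC His (H)   CGC Arg (R)   C
     CUA Leu (L)   CCA Pro (P)   CAA Gln (Q)   CGA Arg (R)   A
     CUG Leu (L)   CCG Pro (P)   CAG Gln (Q)   CGG Arg (R)   G

A    AUU Ile (I)   ACU Thr (T)   AAU Asn (N)   AGU Ser (S)   U
     AUC Ile (I)   ACC Thr (T)   AAC Asn (N)   AGC Ser (S)   C
     AUA Ile (I)   ACA Thr (T)   AAA Lys (K)   AGA Arg (R)   A
     AUG Met (M)   ACG Thr (T)   AAG Lys (K)   AGG Arg (R)   G

G    GUU Val (V)   GCU Ala (A)   GAU Asp (D)   GGU Gly (G)   U
     GUC Val (V)   GCC Ala (A)   GAC Asp (D)   GGC Gly (G)   C
     GUA Val (V)   GCA Ala (A)   GAA Glu (E)   GGA Gly (G)   A
     GUG Val (V)   GCG Ala (A)   GAG Glu (E)   GGG Gly (G)   G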




Customized (for DNA) Genetic Code

ttt: F    tct: S    tat: Y    tgt: C
ttc: F    tcc: S    tac: Y    tgc: C
tta: L    tca: S    taa: *    tga: *
ttg: L    tcg: S    tag: *    tgg: W

ctt: L    cct: P    cat: H    cgt: R
ctc: L    ccc: P    cac: H    cgc: R
cta: L    cca: P    caa: Q    cga: R
ctg: L    ccg: P    cag: Q    cgg: R

att: I    act: T    aat: N    agt: S
atc: I    acc: T    aac: N    agc: S
ata: I    aca: T    aaa: K    aga: R
atg: M    acg: T    aag: K    agg: R

gtt: V    gct: A    gat: D    ggt: G
gtc: V    gcc: A    gac: D    ggc: G
gta: V    gca: A    gaa: E    gga: G
gtg: V    gcg: A    gag: E    ggg: G



There are 20 different amino acids. The detailed table is given below.


20 Amino Acids and Their Codes

No.   1-Letter Code   3-Letter Code   Name
1     A               Ala             Alanine
2     R               Arg             Arginine
3     N               Asn             Asparagine
4     D               Asp             Aspartic acid
5     C               Cys             Cysteine
6     Q               Gln             Glutamine
7     E               Glu             Glutamic acid
8     G               Gly             Glycine
9     H               His             Histidine
10    I               Ile             Isoleucine
11    L               Leu             Leucine
12    K               Lys             Lysine
13    M               Met             Methionine
14    F               Phe             Phenylalanine
15    P               Pro             Proline
16    S               Ser             Serine
17    T               Thr             Threonine
18    W               Trp             Tryptophan
19    Y               Tyr             Tyrosine
20    V               Val             Valine



Write a function/program that takes a DNA sequence and returns/prints the translated protein
sequence (using the customized codon table, and representing amino acids by their 1-letter codes).

Ignore the right-most incomplete codon of length < 3 bp, as explained in Problem No 4.


Input

1. DNA sequence as a string

Output

1. Amino acid sequence of the protein


Example

Input

TTTCCTAATC

Output

Protein Sequence = FPN
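
One way to sketch the translation step is shown below. It builds the 64-entry DNA codon table from the standard genetic code with t in place of u (so tga and tag map to * and tgg maps to W), then looks up each codon in turn. The names BASES, AMINO, CODON_TABLE and translate are illustrative. As an assumption of this sketch, stop codons are written as * and translation continues past them, since the problem does not say to stop there.

```python
from itertools import product

# Codons enumerated with the first base changing slowest and the
# third base fastest, each in the order t, c, a, g.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W"    # first base t
         "LLLLPPPPHHQQRRRR"    # first base c
         "IIIMTTTTNNKKSSRR"    # first base a
         "VVVVAAAADDEEGGGG")   # first base g
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string into 1-letter amino acid codes."""
    seq = dna.lower()                  # table keys are lowercase
    usable = len(seq) - len(seq) % 3   # ignore the incomplete last codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print("Protein Sequence =", translate("TTTCCTAATC"))
```

For the example input the codons are TTT, CCT and AAT, giving FPN; the trailing C is ignored. A codon containing a character other than a, t, c or g raises a KeyError.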