Problem No 1
Determination of GC Content
Among the four nucleotides {A, T, C, G}, the proportion of C and G in a DNA sequence carries some very important signals. This proportion is measured as “GC-content %” using the following formula.
$$\text{GC-content}\ \% = \frac{n(G) + n(C)}{\mathrm{len}(DNA)} \times 100\%$$
Where
n(G) = number of G in the sequence
n(C) = number of C in the sequence
len(DNA) = length of the DNA sequence in base pairs (bp)
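As a worked instance of the formula, take the example sequence ATCG used below: it contains one G and one C in 4 bp, so

$$\text{GC-content}\ \% = \frac{n(G) + n(C)}{\mathrm{len}(DNA)} \times 100\% = \frac{1 + 1}{4} \times 100\% = 50\%$$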
Write a Python program that behaves as follows.
Input
1. DNA sequence as a string
Output
1. Length of the sequence
2. GC-content %
Example
Input
ATCG
Output
Length of the sequence = 4 bp
GC-content % = 50%
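A minimal Python sketch for Problem No 1 might look like the following. The function names `gc_content` and `gc_report` are illustrative, and treating lowercase input the same as uppercase is an assumption; the problem statement does not say. The `:g` format drops a trailing `.0`, so 50.0 prints as `50`, matching the example output.

```python
def gc_content(dna: str) -> float:
    """Return the GC-content % of a DNA sequence: (n(G) + n(C)) / len(DNA) * 100."""
    seq = dna.strip().upper()  # assumption: ignore surrounding whitespace and letter case
    if not seq:
        raise ValueError("DNA sequence must not be empty")
    n_g = seq.count("G")
    n_c = seq.count("C")
    return (n_g + n_c) / len(seq) * 100


def gc_report(dna: str) -> str:
    """Build the two output lines in the format of the example."""
    seq = dna.strip().upper()
    return (
        f"Length of the sequence = {len(seq)} bp\n"
        f"GC-content % = {gc_content(seq):g}%"
    )


print(gc_report("ATCG"))
# Length of the sequence = 4 bp
# GC-content % = 50%
```

To read the sequence from the keyboard instead, one could call `print(gc_report(input()))`.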
Problem No 2
Complement DNA Strand
DNA forms a double helix structure made of two strands. When we work with a DNA sequence, we usually talk about a single DNA sequence (a single strand), but in the chromosome DNA remains in double-stranded form. These two strands are called complements of each other. One is named the 5’-3’ (forward) strand and the other the 3’-5’ (reverse) strand. When the strand type (or direction) is not explicitly mentioned, the DNA sequence is assumed to be the 5’-3’, or forward, strand.
5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’
In the complement DNA strand, each base of the original DNA sequence is replaced according to the following interchange rule:
- A is replaced by T, and vice versa
- C is replaced by G, and vice versa
This is because, in the double helix structure, A on one strand is connected to T on the other strand by hydrogen bonds, and the same holds for C and G.
Write a Python program that behaves as follows.
Input
1. DNA sequence as a string
Output
1. Complement of the input DNA sequence
Example
Input
ACCGTA
Output
Complement DNA Sequence = TGGCAT
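One possible sketch for Problem No 2 uses Python's built-in `str.maketrans` and `str.translate` to apply the A/T and C/G interchange rule in a single pass. The names `COMPLEMENT_TABLE` and `complement`, the rejection of non-ACGT characters, and keeping lowercase input lowercase are assumptions, not part of the problem statement.

```python
# Base-pairing rule from the text: A <-> T and C <-> G (both cases handled).
COMPLEMENT_TABLE = str.maketrans("ATCGatcg", "TAGCtagc")


def complement(dna: str) -> str:
    """Return the complement strand, base by base, in the same left-to-right order."""
    seq = dna.strip()
    invalid = set(seq.upper()) - set("ATCG")
    if invalid:
        raise ValueError(f"Unexpected characters in DNA sequence: {sorted(invalid)}")
    return seq.translate(COMPLEMENT_TABLE)


print(f"Complement DNA Sequence = {complement('ACCGTA')}")
# Complement DNA Sequence = TGGCAT
```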
Problem No 3
Reverse Complement of a DNA Sequence
This problem can be thought of as an extension of Problem No 2 (read Problem No 2 first). In bioinformatics analysis, the reverse complement of a DNA sequence is encountered very often. If the complement of a DNA sequence is reversed, the result is called the reverse complement of the original DNA sequence.
5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’
Here the complement of (5’---ACCGTA---3’) is (3’---TGGCAT---5’), and the reverse of (3’---TGGCAT---5’) is TACGGT, so the reverse complement of ACCGTA is TACGGT.
Write a Python program that behaves as follows.
Input
1. DNA sequence as a string
Output
1. Reverse complement of the input DNA sequence
Example
Input
ACCGTA
Output
Reverse Complement DNA Sequence = TACGGT
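A minimal sketch for Problem No 3 follows the two steps described above literally: complement each base, then reverse the string with the slice `[::-1]`. The dictionary `PAIRS` and the function name `reverse_complement` are hypothetical; uppercasing the input is an assumption, and any character other than A, T, C or G raises a `KeyError`.

```python
# Pairing rule from Problem No 2: A <-> T and C <-> G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the result."""
    seq = dna.strip().upper()  # assumption: input case is not significant
    complement = "".join(PAIRS[base] for base in seq)  # KeyError on non-ACGT input
    return complement[::-1]


print(f"Reverse Complement DNA Sequence = {reverse_complement('ACCGTA')}")
# Reverse Complement DNA Sequence = TACGGT
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.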
Problem No 4
Codon List from a DNA Sequence
Triplets of nucleotides (for example ATT, TCG, CCC, etc.) are called codons. Through the processes of transcription and translation, each codon of a DNA sequence becomes responsible for producing an individual amino acid, and finally the chain of amino acids builds a protein. 64 (4 × 4 × 4) different codons are possible.
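The count of 64 can be checked by enumerating every ordered triplet of the four bases. This sketch uses `itertools.product` from the Python standard library; the names `BASES` and `all_codons` are illustrative.

```python
from itertools import product

# Every codon is an ordered triplet drawn from the four DNA bases,
# so there are 4 * 4 * 4 possible combinations.
BASES = "ACGT"
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(all_codons))   # 64
print(all_codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```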
Let's think of a DNA sequence as ATTTCGAGGT. If we parse codons from left to right, the codons will be ATT, TCG, AGG (ignore the right-most remaining part with length < 3 bp, in this case T).
Write a function/Python program that returns the list of codons for a DNA sequence. The program should return/print the codons as a Python “list” data structure.
Input
1. DNA sequence as a string
Output
1. List of codons
Example
Input
ATTTCGAGGT
Output
Codon-List = ['ATT', 'TCG', 'AGG']
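A minimal sketch for Problem No 4, assuming Python 3.9+ (for the `list[str]` annotation) and that input case does not matter. The name `codon_list` is hypothetical. Slicing in steps of 3 up to the largest multiple of 3 drops the incomplete right-most part, as the problem requires.

```python
def codon_list(dna: str) -> list[str]:
    """Split a DNA sequence into consecutive codons, reading left to right.

    A trailing fragment shorter than 3 bp is ignored.
    """
    seq = dna.strip().upper()  # assumption: input case is not significant
    usable = len(seq) - len(seq) % 3  # length of the part made of complete codons
    return [seq[i:i + 3] for i in range(0, usable, 3)]


print("Codon-List =", codon_list("ATTTCGAGGT"))
# Codon-List = ['ATT', 'TCG', 'AGG']
```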
Problem No 5
Translate a DNA Sequence
Each codon represents an amino acid (skim through Problem No 4). The standard codon-to-amino-acid mapping table is called the “Standard Genetic Code Table” or “Codon Table”. It is built for codons derived from RNA (details will be discussed separately, beyond this problem), so you will find U instead of T. For simplicity, in this specific problem you should use the customized (for DNA) genetic code table.
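Standard Genetic Code
UUU: Phe (F)   UCU: Ser (S)   UAU: Tyr (Y)   UGU: Cys (C)
UUC: Phe (F)   UCC: Ser (S)   UAC: Tyr (Y)   UGC: Cys (C)
UUA: Leu (L)   UCA: Ser (S)   UAA: Stop      UGA: Stop
UUG: Leu (L)   UCG: Ser (S)   UAG: Stop      UGG: Trp (W)
CUU: Leu (L)   CCU: Pro (P)   CAU: His (H)   CGU: Arg (R)
CUC: Leu (L)   CCC: Pro (P)   CAC: His (H)   CGC: Arg (R)
CUA: Leu (L)   CCA: Pro (P)   CAA: Gln (Q)   CGA: Arg (R)
CUG: Leu (L)   CCG: Pro (P)   CAG: Gln (Q)   CGG: Arg (R)
AUU: Ile (I)   ACU: Thr (T)   AAU: Asn (N)   AGU: Ser (S)
AUC: Ile (I)   ACC: Thr (T)   AAC: Asn (N)   AGC: Ser (S)
AUA: Ile (I)   ACA: Thr (T)   AAA: Lys (K)   AGA: Arg (R)
AUG: Met (M)   ACG: Thr (T)   AAG: Lys (K)   AGG: Arg (R)
GUU: Val (V)   GCU: Ala (A)   GAU: Asp (D)   GGU: Gly (G)
GUC: Val (V)   GCC: Ala (A)   GAC: Asp (D)   GGC: Gly (G)
GUA: Val (V)   GCA: Ala (A)   GAA: Glu (E)   GGA: Gly (G)
GUG: Val (V)   GCG: Ala (A)   GAG: Glu (E)   GGG: Gly (G)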
Customized (for DNA) Genetic Code
ttt: F tct: S tat: Y tgt: C
ttc: F tcc: S tac: Y tgc: C
tta: L tca: S taa: * tga: *
ttg: L tcg: S tag: * tgg: W
ctt: L cct: P cat: H cgt: R
ctc: L ccc: P cac: H cgc: R
cta: L cca: P caa: Q cga: R
ctg: L ccg: P cag: Q cgg: R
att: I act: T aat: N agt: S
atc: I acc: T aac: N agc: S
ata: I aca: T aaa: K aga: R
atg: M acg: T aag: K agg: R
gtt: V gct: A gat: D ggt: G
gtc: V gcc: A gac: D ggc: G
gta: V gca: A gaa: E gga: G
gtg: V gcg: A gag: E ggg: G
There are 20 different amino acids, listed in detail below.
20 Amino Acids and Their Codes
No.  1-Letter Code  3-Letter Code  Name
1    A              Ala            Alanine
2    R              Arg            Arginine
3    N              Asn            Asparagine
4    D              Asp            Aspartic acid
5    C              Cys            Cysteine
6    Q              Gln            Glutamine
7    E              Glu            Glutamic acid
8    G              Gly            Glycine
9    H              His            Histidine
10   I              Ile            Isoleucine
11   L              Leu            Leucine
12   K              Lys            Lysine
13   M              Met            Methionine
14   F              Phe            Phenylalanine
15   P              Pro            Proline
16   S              Ser            Serine
17   T              Thr            Threonine
18   W              Trp            Tryptophan
19   Y              Tyr            Tyrosine
20   V              Val            Valine
Write a function/program that takes a DNA sequence and returns/prints the translated protein sequence (using the customized codon table and representing amino acids with 1-letter codes). Ignore the right-most incomplete codon of length < 3 bp, as explained in Problem No 4.
Input
1. DNA sequence as a string
Output
1. Amino acid sequence of the protein
Example
Input
TTTCCTAATC
Output
Protein Sequence = FPN
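A sketch for Problem No 5 that reuses the codon-splitting idea of Problem No 4 and looks each codon up in a dictionary. The dictionary `CODON_TABLE` is written in uppercase and follows the standard genetic code with T in place of U, laid out in the same four-column order as the customized table; the function name `translate` is hypothetical. Keeping stop codons as `*` instead of ending translation at the first stop is an assumption, since the problem does not say how stops should be handled. A codon containing any other character (for example N) raises a `KeyError`.

```python
# Customized (for DNA) genetic code, uppercase; "*" marks a stop codon.
CODON_TABLE = {
    "TTT": "F", "TCT": "S", "TAT": "Y", "TGT": "C",
    "TTC": "F", "TCC": "S", "TAC": "Y", "TGC": "C",
    "TTA": "L", "TCA": "S", "TAA": "*", "TGA": "*",
    "TTG": "L", "TCG": "S", "TAG": "*", "TGG": "W",
    "CTT": "L", "CCT": "P", "CAT": "H", "CGT": "R",
    "CTC": "L", "CCC": "P", "CAC": "H", "CGC": "R",
    "CTA": "L", "CCA": "P", "CAA": "Q", "CGA": "R",
    "CTG": "L", "CCG": "P", "CAG": "Q", "CGG": "R",
    "ATT": "I", "ACT": "T", "AAT": "N", "AGT": "S",
    "ATC": "I", "ACC": "T", "AAC": "N", "AGC": "S",
    "ATA": "I", "ACA": "T", "AAA": "K", "AGA": "R",
    "ATG": "M", "ACG": "T", "AAG": "K", "AGG": "R",
    "GTT": "V", "GCT": "A", "GAT": "D", "GGT": "G",
    "GTC": "V", "GCC": "A", "GAC": "D", "GGC": "G",
    "GTA": "V", "GCA": "A", "GAA": "E", "GGA": "G",
    "GTG": "V", "GCG": "A", "GAG": "E", "GGG": "G",
}


def translate(dna: str) -> str:
    """Translate a DNA sequence into 1-letter amino acid codes.

    The right-most incomplete codon (< 3 bp) is ignored. Stop codons are
    kept as "*" rather than ending translation (an assumption).
    """
    seq = dna.strip().upper()  # assumption: input case is not significant
    usable = len(seq) - len(seq) % 3  # length of the part made of complete codons
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print(f"Protein Sequence = {translate('TTTCCTAATC')}")
# Protein Sequence = FPN
```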