Introduction to Bioinformatics


Oct 2, 2013


Introduction to Bioinformatics... Taking Biology to a New Dimension

Benny Shomer, November 2005

Exponential Growth Rate

Over the last two decades, nucleic acid data has accumulated in the EMBL
database at an exponential rate, currently totaling ~110 Gbases from
62M entries.


~200,000 Protein Entries

Currently stored in the UniProt database, with 70M amino acids.
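To put the database totals quoted above in perspective, the short sketch below divides each total by its entry count to get an average entry length. The figures are the approximate ones from the slides; the function and variable names are illustrative only.

```python
# Approximate figures as quoted on the slides (circa 2005).
EMBL_BASES = 110e9         # ~110 Gbases of nucleic acid data
EMBL_ENTRIES = 62e6        # 62M entries
UNIPROT_RESIDUES = 70e6    # 70M amino acids
UNIPROT_ENTRIES = 200e3    # ~200,000 protein entries


def mean_length(total_units: float, entries: float) -> float:
    """Average number of bases (or residues) per database entry."""
    return total_units / entries


avg_nucleotide_entry = mean_length(EMBL_BASES, EMBL_ENTRIES)
avg_protein_entry = mean_length(UNIPROT_RESIDUES, UNIPROT_ENTRIES)

print(f"Average nucleotide entry: ~{avg_nucleotide_entry:.0f} bases")
print(f"Average protein entry:    ~{avg_protein_entry:.0f} residues")
```

Because the inputs are rounded, the results are order-of-magnitude estimates rather than exact statistics.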

ERK2 MAP Kinase

The whole genomes of over 1500 viruses and 775 bacteria have been completely
sequenced or are in progress…

Haemophilus influenzae

Bacteriophage T4

Salmonella sp.

…as well as some 400 eukaryotic genomes, of which 135 are of parasites, fungi and other lower forms.

Trypanosoma brucei

Schizosaccharomyces pombe

Plasmodium falciparum

Leishmania major

More than 500 organelle genomes are in the databases

Mitochondrion 3D CAT

Chloroplast

Mitochondria

About 80 plants are being genome/EST sequenced or genetically mapped

Arabidopsis thaliana

There are currently ~170 genome projects of Metazoa

3.2 Gb

~30,000 genes.

"Owning a store of knowledge is no small happiness."

Socrates

"There's no need to blow things out of proportion; you have to keep your head on your hips."

Alon Mizrahi



Same Size Genome ~3Gb

About same number of genes (30,000)

Same gene contents

85-90% similarity between genes

(up to 98% similarity with apes)


Genes & Development Vol. 14, No. 20, pp. 2551-2569, October 15, 2000


The Basis for Bioinformatics

From Sequence to Biology



             1460      1470      1480      1490      1500
AF3071 TGGGCAATTCCCAGAAATTAATGGCTATGAGTTCTTTTTTGATCAACTCA
       :: :::::::  ::::::::::::: :::::::: : ::::::::::::
AF0712 TGTGCAATTCAAAGAAATTAATGGCCATGAGTTCCTATTTGATCAACTCC
              180       190       200       210       220

             1510      1520      1530      1540      1550
AF3071 AACTATGTCGACCCCAAGTTCCCTCCATGCGAGGAATATTCACAGAGCGA
       :::::::: ::::: ::::: :: :: :::::::::::::: ::::::::
AF0712 AACTATGTGGACCCTAAGTTTCCACCCTGCGAGGAATATTCCCAGAGCGA
              230       240       250       260       270

             1560      1570      1580      1590      1600
AF3071 TTACCTACCCAGCGACCACTCGCCCGGGTACTACGCCGGCGGCCAGAGGC
        :::::::::::    ::::: :: :   :::::  : ::: ::::::::
AF0712 CTACCTACCCAGT---CACTCTCCGG---ACTACTACAGCGCCCAGAGGC
              280          290          300       310

             1610      1620      1630      1640      1650
AF3071 GAGAGAGCAGCTTCCAGCCGGAGGCGGGCTTCGGGCGGCGCGCGGCGTGC
        :::   :   :::::::  ::: ::  :: :   : :::  :::  :::
AF0712 AAGACCCCTCGTTCCAGCATGAGTCGATCTACCACCAGCGGTCGGGCTGC
          320       330       340       350       360

Human

Zebrafish

HoxB4 local alignment
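One quick way to read an alignment like the HoxB4 one above is to count identical columns. The sketch below does this for the first and third blocks of that alignment, with sequences copied from the slide. The function name `percent_identity` is illustrative, and leaving gapped columns out of the denominator is an assumption, since conventions differ between tools.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Return (identical columns, ungapped columns, percent identity)
    for two rows of an existing alignment. Gaps are written as '-'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    # Assumption: columns containing a gap are not counted at all.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in columns)
    return matches, len(columns), 100 * matches / len(columns)


# First block of the alignment (no gaps).
block1_top = "TGGGCAATTCCCAGAAATTAATGGCTATGAGTTCTTTTTTGATCAACTCA"
block1_bottom = "TGTGCAATTCAAAGAAATTAATGGCCATGAGTTCCTATTTGATCAACTCC"

# Third block of the alignment (two 3-base gaps in the lower sequence).
block3_top = "TTACCTACCCAGCGACCACTCGCCCGGGTACTACGCCGGCGGCCAGAGGC"
block3_bottom = "CTACCTACCCAGT---CACTCTCCGG---ACTACTACAGCGCCCAGAGGC"

block1 = percent_identity(block1_top, block1_bottom)
block3 = percent_identity(block3_top, block3_bottom)
print("block 1: %d/%d identical (%.1f%%)" % block1)
print("block 3: %d/%d identical (%.1f%%)" % block3)
```

The counts agree with the number of ":" marks on the match lines of those two blocks.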

Local, Global, Multiple…
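As an illustration of what "local" alignment means, here is a minimal sketch of the Smith-Waterman dynamic-programming algorithm. The slides do not say which program or scoring scheme produced the alignments shown, so the scoring values (match +2, mismatch -1, linear gap -2) and the function name are assumptions chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b under a simple linear gap penalty.
    Returns (score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (not from the slides) sharing the segment GATTACA.
score, top, bottom = smith_waterman("AAAGATTACATTT", "CCGATTACAGG")
print(score, top, bottom)
```

A global alignment (Needleman-Wunsch) uses the same recurrence without the clamp at zero and traces back from the bottom-right corner instead of the best-scoring cell.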

Elongation Factor 1 alpha

>gi|28558768|sp|P53601|A4_MACFA Amyloid beta A4 protein precursor (APP)
          (ABPP) (Alzheimer's disease amyloid protein homolog)
          [Contains: Soluble APP-alpha (S-APP-alpha); Soluble APP-beta
          (S-APP-beta); C99; Beta-amyloid protein 42 (Beta-APP42);
          Beta-amyloid protein 40 (Beta-APP40); C83; P3(42); P3(40);
          Gamma-CTF(59) (Gamma-secretase C-terminal fragment 59);
          Gamma-CTF(57) (Gamma-secretase C-terminal fragment 57);
          Gamma-CTF(50) (Gamma-secretase C-terminal fragment 50); C31]
          Length = 770

 Score = 1277 bits (3305), Expect = 0.0
 Identities = 642/752 (85%), Positives = 643/752 (85%)

Query: 19  EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78
           EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP
Sbjct: 19  EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78

Query: 79  ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138
           ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ
Sbjct: 79  ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138

Query: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLXXXXXXXXX 198
           ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPL
Sbjct: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDS 198

Query: 199 XXXXXXXXXXWWGGADTDYADGSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 258
                     WWGGADTDYADGS
Sbjct: 199 ADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDEDDEDGDEVEEEA 258

Query: 259 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXCSEQAETGPCRAMISRWYFDVTEGKCAP 318
                                           CSEQAETGPCRAMISRWYFDVTEGKCAP
Sbjct: 259 EEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYFDVTEGKCAP 318

SNP

RNA secondary structure prediction

TF Analysis

Gene Analysis

Genome Level Annotation

Chromosome

Slider

Focus Position

Focus Area Overview

Chromosome Oriented

Focus Area Detailed View

Focus Area Basepair View

Gene Oriented

Protein properties

EX33 inflammation related GPCR analysis




. 10 . 20 . 30 . 40 . 50


MWNSSDANFSCYHESVLGYRYVAVSWGVVVAVTGTVGNVLTLLALAIQPK

helix HHHHHHH

sheet E E EEEEEEEEEEEEEEE EEEE E

turns TT TTTTT TTTT TTT T


coil CC CCCC


. 60 . 70 . 80 . 90 . 100


LRTRFNLLIANLTLADLLYCTLLQPFSVDTYLHLHWRTGATFCRVFGLLL

helix HHHHHHHH H

sheet EE EEEEEE EEEEEEEE E EEEE EEEEEEEE

turns T TT T TTTTTTT


coil C


. 110 . 120 . 130 . 140 . 150


FASNSVSILTLCLIALGRYLLIAHPKLFPQVFSAKGIVLALVSTWVVGVA

helix HH HHHHH HHHHHH

sheet EEEEEEE EEEE EEE EEEEEEEEEEEEEE

turns T TT T


coil C C CC C


. 160 . 170 . 180 . 190 . 200


SFAPLWPIYILVPVVCTCSFDRIRGRPYTTILMGIYFVLGLSSVGIFYCL

helix

sheet EEEEEEEEEEEE EEEEEEEEEEEE EEEEEE

turns T TTTTTTT T TT


coil CCCC C CC CC


Secondary Structure Prediction

Garnier

Plotstructure

PredictProtein

3D Structure analysis

ID   GATA_ZN_FINGER_1; PATTERN.
AC   PS00344;
DT   NOV-1990 (CREATED); NOV-1997 (DATA UPDATE); JUL-1998 (INFO UPDATE).
DE   GATA-type zinc finger domain.
PA   C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.
NR   /RELEASE=41.18,131945;
NR   /TOTAL=99(61); /POSITIVE=99(61); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=14; /PARTIAL=0;
CC   /TAXO-RANGE=??E??; /MAX-REPEAT=2;
CC   /SITE=1,zinc; /SITE=4,zinc; /SITE=15,zinc; /SITE=18,zinc;
DR   O13412, AREA_ASPNG, T; O13415, AREA_ASPOR, T; P17429, AREA_EMENI, T;


Pattern and Motif Analysis
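The PA line of the PROSITE entry above is a compact pattern language: `x` is any residue, `[DN]` is a set of allowed residues, and `x(4,5)` is a repeat range. The sketch below translates that subset of the syntax into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex` is illustrative, the test sequence is synthetic (built to satisfy the pattern, not taken from a real protein), and the `<` and `>` terminus anchors of PROSITE syntax are not handled.

```python
import re

# PA line from the PS00344 entry shown on the slide.
GATA_PATTERN = "C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C."


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into Python regex syntax.
    Handles x, single residues, [allowed], {excluded} and (n) / (n,m) repeats."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split the element into its core and an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            regex = core                      # one residue, or an [ABC] set
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


gata_regex = prosite_to_regex(GATA_PATTERN)
print(gata_regex)

# Synthetic sequence: two flanking residues on each side of a motif
# constructed to fit the pattern.
motif = "CADCAAAASAAWHRAAAGAAACNAC"
hit = re.search(gata_regex, "MK" + motif + "LL")
print(hit.span(), hit.group())
```

Because PROSITE sets such as `[DN]` are already valid regex character classes, most of the translation amounts to dropping the hyphens and rewriting `x` and the repeat counts.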

Protein Families

Pathway Analysis


Protein-Protein interaction

Triclosan-FabI

Data Sources:

Yeast Two Hybrid system

Surface Plasmon Resonance

Natural Language Processing

DNA Microarray & Expression Analysis

Cloning, Restriction & Mapping

PCR Design

Linguistics & Information systems