PowerPoint presentation - Howest.be

Oct 4, 2013

Login: BITseminar
Pass: BITseminar2011

BIOINFORMATICS

Bioinformatics


Combination of:

Theory and methods (algorithms, statistical methods, machine learning, …)
Applications (sequence analysis, genome assemblies, databases, ...)
Different kinds of datasets (sequence data, microarray, next-gen data, …)

Biology

Core Concepts


Molecular biology


Systems biology


Evolutionary theory


Common lab techniques


Sequence comparison


Phylogenetic analysis

Computer science


Programming


Database querying


Data mining


Visualization


Machine learning


Modeling




Data exceeds analysis

(Figure labels: Bioinformatician, data)

How to survive?

Knowledge of Linux/Unix
Scripting: Perl/Python
Network-based data storage
Knowledge of biology, genomics
Database structures
Try to keep up with all new tools!

Benefit of using (Bio)Perl, example

You have 1000 sequences to BLAST and analyse…
You can do this manually
Or… use a Perl script to do this for you and present you the final results!
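
To make the scripted route concrete, here is a minimal sketch in Python (the slides name Perl/Python as the scripting options). It assumes the BLAST search has already been run and saved in BLAST+ tabular format (-outfmt 6, default columns), and it only summarises the results by keeping the highest-scoring hit per query. The function name best_hits and the example rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: row}, keeping the hit with the highest bit score."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for fields in reader:
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Made-up rows, for illustration only (not real BLAST output).
EXAMPLE = (
    "seq1\tsubjA\t98.5\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
    "seq1\tsubjB\t90.0\t180\t18\t0\t5\t184\t1\t180\t1e-60\t230\n"
    "seq2\tsubjC\t100.0\t150\t0\t0\t1\t150\t1\t150\t2e-80\t278\n"
)

if __name__ == "__main__":
    for query, hit in best_hits(EXAMPLE).items():
        print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

With 1000 queries the same loop applies unchanged; only the input file grows.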

Good journals to keep up the pace

Bioinformatics (http://bioinformatics.oxfordjournals.org/)
BMC Bioinformatics (http://www.biomedcentral.com/bmcbioinformatics/)
PLoS Computational Biology (http://www.ploscompbiol.org/)



...

DATABASES

Types of databases


DNA databases


Protein databases
Genome databases
Microarray databases
Next-gen sequencing databases

What to find in databases?

Sequences
Motifs
Mutations, SNPs
Gene interaction profiles
Interactions (protein-protein interactions)
Transcription factor binding sites
Etc.


Databases? Good Reference

http://nar.oxfordjournals.org annual edition

NCBI: lots of options… feed the need

Amino acid databases

UniProt
SWISS-PROT
TrEMBL
PIR

UniProt

http://www.uniprot.org
Good quality, curated
Minimal redundancy
Extensive cross-linking to useful databases

Structural databases

Structure leads to function!
Protein Data Bank (PDB) http://www.pdb.org
SCOP & CATH databases (structural classification) http://scop.mrc-lmb.cam.ac.uk/scop/ ; http://www.cathdb.info/
Structure prediction (modeling)
SWISS-MODEL & Repository (http://swissmodel.expasy.org/)
MODELLER & MODBASE (http://salilab.org)
Study of interactions (docking) & drug design


SNPs and pharma

To collect, encode, and disseminate knowledge about the impact of human genetic variations on drug response.
http://www.pharmgkb.org/


DNA Microarray Databases


Standard: MIAME = minimum information about a microarray experiment
Databases:
ArrayExpress (EBI) http://www.ebi.ac.uk/arrayexpress/
GEO (NCBI) http://www.ncbi.nlm.nih.gov/geo/
Check the database before planning an experiment!

Next gen data database



http://www.ncbi.nlm.nih.gov/Traces/sra
http://www.ebi.ac.uk/ena
http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html

GENOME BROWSERS

Human reference sequences

Celera
HuRef
GRCh37

Three reference genomes. Keep this in mind when browsing databases!

Useful Genome Browsers

Ensembl: http://www.ensembl.org/
NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?
UCSC: http://genome.ucsc.edu/




Genome browser: Ensembl

EMBL

Problems

Lots of redundancy
Wrong or old annotations
Vector contamination
Errors in sequences

RefSeq

Better option, NCBI reference
Curated
Annotations are controlled
No redundancy

NCBI: GenBank vs RefSeq

http://www.ncbi.nlm.nih.gov/RefSeq/

Sequence records are created by scientists who submit sequence data to GenBank. As an archival database, GenBank may contain hundreds of records for the same gene. In addition, because there is no independent review system, the types of information may vary from record to record, and GenBank sequence data may contain errors and contaminant vector DNA.

To address some of the problems associated with GenBank sequence records, NCBI developed its RefSeq database.

RefSeq accession numbers

NM_ mRNA (provisional, predicted, reviewed)
NP_ protein (provisional, predicted, reviewed)
NR_ non-coding RNA (provisional, reviewed)
NG_ human genes (provisional, reviewed)
NC_ chromosomes, complete genomes (provisional, reviewed)

RefSeq accession numbers (2)

XM_ predicted mRNA (model)
XP_ predicted protein (model)
XR_ predicted non-coding RNA (model)
NT_ human and mouse genomic contigs (model)
NW_ mouse supercontigs (model)
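
The prefix lists above lend themselves to a simple lookup. The following Python sketch encodes only the prefixes and descriptions given on these two slides; the function name describe_accession and the example accession strings are illustrative assumptions, and other prefixes are reported as unknown rather than guessed.

```python
# Prefix table copied from the two slides above; nothing beyond them is assumed.
REFSEQ_PREFIXES = {
    "NM_": "mRNA (provisional, predicted, reviewed)",
    "NP_": "protein (provisional, predicted, reviewed)",
    "NR_": "non-coding RNA (provisional, reviewed)",
    "NG_": "human genes (provisional, reviewed)",
    "NC_": "chromosomes, complete genomes (provisional, reviewed)",
    "XM_": "predicted mRNA (model)",
    "XP_": "predicted protein (model)",
    "XR_": "predicted non-coding RNA (model)",
    "NT_": "human and mouse genomic contigs (model)",
    "NW_": "mouse supercontigs (model)",
}


def describe_accession(accession):
    """Return the slide's description for a RefSeq accession, based on its prefix."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix (not in the list above)")


if __name__ == "__main__":
    # Example accession strings used only to exercise the lookup.
    for acc in ["NM_000546.6", "XP_123456.1", "AB123456"]:
        print(acc, "->", describe_accession(acc))
```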


Genome browser: NCBI

Genome browser: UCSC

Example: UCSC

Good tutorial: http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

SNPS AND DISEASE RESEARCH

SNPs and disease research

Association analysis, disease related (?), mapping genome variation
Reference = dbSNP database

Example NCBI SNP database, SNP rs33957964


Other useful SNP databases

Genome variation center http://gvs.gs.washington.edu/GVS/
HapMap (Ensembl) http://hapmap.org/
List of all: http://www.hgvs.org/dblist/ccent.html



Clinical Bioinformatics

Microarrays, omics data (genomics, proteomics, interactomics, metabolomics, …)
Combination of bioinformatics and medical informatics

ALGORITHMS AND TOOLS

Algorithms

Fundamentals for bioinformatics tools
Implemented in ‘front end tools’ (website, Java applications)
Can be slow
Good for smaller analyses, quick mining
Scripts, programs - use in command line (e.g. local BLAST)
Usually local install on server
Faster
Large queries, long analysis time required
Knowledge of Linux/Unix essential
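
As a rough illustration of the command-line route, the Python sketch below only assembles a local BLAST+ blastn command; it does not run it. The file names, database name, E-value cutoff, and thread count are hypothetical placeholders, and running the command assumes BLAST+ is installed and a local database has already been built.

```python
import shlex


def build_blastn_command(query_fasta, database, out_path, evalue=1e-5, threads=4):
    """Assemble a blastn command line as an argument list (nothing is executed)."""
    return [
        "blastn",
        "-query", query_fasta,         # multi-FASTA file with all query sequences
        "-db", database,               # name of a local BLAST database
        "-out", out_path,              # where the results are written
        "-outfmt", "6",                # tabular output, easy to parse in a script
        "-evalue", str(evalue),        # E-value cutoff (assumed value)
        "-num_threads", str(threads),  # CPU threads to use (assumed value)
    ]


if __name__ == "__main__":
    # Hypothetical file and database names.
    cmd = build_blastn_command("sequences.fasta", "my_local_db", "hits.tsv")
    print(shlex.join(cmd))
    # On a server with BLAST+ installed, the search could then be started with:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.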



Hall of Fame


Linux operating system, MySQL database
(Bio)Perl: programming language making your life easier!
BLAST/BLAT: comparing sequences
Phylip: phylogenetic analysis, tree building
ClustalW: multiple alignment
MEGA5: multiple alignment and editing sequences
HMMER: comparative genomics
EMBOSS: combining several tools for sequence analysis
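
To give a feel for what "comparing sequences" means computationally, here is a minimal Python sketch of a Smith-Waterman local alignment score. This is not how BLAST or BLAT are implemented: both are much faster heuristics. The scoring values (match 2, mismatch -1, linear gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses dynamic programming with a linear gap penalty and keeps only
    two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            curr[j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The quadratic running time of this exact method is one reason heuristic tools are preferred for large database searches.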

Open source

Free to use and develop

Tools? Good Reference

http://nar.oxfordjournals.org/ - annual edition

Analysing next gen sequencing data

Different tools for different formats
Roche
Applied Biosystems
Illumina

Next gen tools


FastQC: quality assessment of FASTQ files
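
As a small illustration of what a quality check on FASTQ data involves, the Python sketch below parses four-line FASTQ records and computes the mean Phred quality per read. It is not FastQC's implementation, which reports many more metrics. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences; the function names and the example reads are made up.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must have a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


# Made-up reads, for illustration only.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n"

if __name__ == "__main__":
    for read_id, seq, qual in parse_fastq(EXAMPLE):
        print(read_id, len(seq), mean_quality(qual))
```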

Assembly tools next gen


A number of specialized tools exist: ABySS, gap4, Geneious, Mira, Newbler, SSAKE, SOAPdenovo, Velvet, …

Galaxy! http://galaxy.psu.edu/

Galaxy provides a web-based application for the analysis of sequence data
Includes many tools, including tools for NGS data
Makes your life easier, less Linux knowledge needed

On the cloud

Structure Galaxy


So this is why you need a bioinformatician in the lab!!