KNIME Bioinformatics Extensions


Oct 1, 2013 (3 years and 8 months ago)


Karol Kozak

ETH Zurich

January 2011, Zurich, KNIME SIG


Start HCS

HCS Experiments and Informatics

HCS & Open Source

[Diagram labels: HCDB; OpenBIS & Library Database; KNIME Matlab; WEKA Nodes; Cell Classifiers; Bioinformatics Off-Target; KNIME, Java; KNIME LIMS; KNIME Matlab; Spotfire; KNIME.]

Research interest

Software engineering + modeling
Database understanding
Annotation database
RNA technology

- New prediction algorithms, including kernel methods
- Post-analysis of existing hits
- Traditional bioinformatics: homology, BLAST, alignment score
- Database development

Role

Classification
Pattern recognition
Off-target prediction
2D/3D structure relationships
Gene functionality

Role of bioinformatics in RNAi technology to detect the functionality of genes in mammalian cells

RNAi Libraries

- Genome-wide
- Functional groups (kinases, ...)
- n-Oligonucleotides
- Pooled

RNAi Libraries

- Qiagen
- Thermo Fisher Scientific (Dharmacon)
- Applied Biosystems (Ambion)
- Sigma-Aldrich (esiRNA)

RNAi Library evolution

[Diagram: time axis from purchase to today with library versions V1, V2, V3. Annotation databases shown: Dharmacon, Thermo Fisher, Qiagen, Applied Biosystems, Sigma esiRNA. Changes over time: fewer genes, more known oligos, new design, transcript analysis, off-target information.]


Annotation of human genome & reliability of siRNA libraries

Qiagen genome-wide siRNA library
(HsDgV3 (Human Druggable Genome siRNA Set V3); HsNmV1 (Human Refseq Xm siRNA Set V1); HsXmV1 (Human Predicted genome Set V1))

2006: 22’832 genes / 90’728 siRNAs
2010: 16’199 genes

[Figure: pie charts. Gene categories: target genes; wrongly predicted genes; target genes with off-target(s). siRNA categories: siRNAs w/o off-target(s); siRNAs against wrongly predicted genes; siRNAs with off-target(s). Chart values: 39% / 41% / 20% and 71% / 16.5% / 12.5%. % of genes targeted by 1 to 9 siRNAs (values in extraction order): 33%, 25.6%, 10.3%, 30.6%, 0.23%, 0.11%, 0.07%, 0.01%, 0.01%.]

Meier Roger

Based on GENEID

- Many ribosomal siRNAs were eliminated.
- Many siRNAs against "membrane" proteins were eliminated (2061 in old list, 945 in clean list).
- Not many siRNAs against "kinase" proteins were eliminated (2624 in old list, 1605 in clean list).
- In addition, the 2800 discarded proteins are slightly enriched in virus-related pathways.
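The counts quoted above can be turned into retained fractions to compare the two protein classes directly. The following is a minimal sketch using only the four numbers from the slide; the function name `retained_fraction` is an illustrative choice, not part of any tool mentioned in the talk.

```python
# Counts quoted on the slide: (old list, clean list)
counts = {
    "membrane": (2061, 945),
    "kinase": (2624, 1605),
}


def retained_fraction(old: int, clean: int) -> float:
    """Fraction of siRNAs that remain after the clean-up."""
    return clean / old


for term, (old, clean) in counts.items():
    kept = retained_fraction(old, clean)
    print(f"{term}: kept {kept:.1%}, removed {1 - kept:.1%}")
```

On these numbers roughly 46% of the "membrane" siRNAs and roughly 61% of the "kinase" siRNAs remain in the clean list.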





Off-targets annotated by companies

How did the companies select these off-target siRNAs?

siRNA without geneID?

The web site shows the gene that the siRNA matched at the time it was selected. The database table from which we get the current annotation shows what the siRNA matches in the current RefSeq.
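The difference between the annotation at selection time and the current RefSeq match can be checked per siRNA. The sketch below assumes two simple mappings from siRNA identifier to gene symbol; the identifiers, gene names and the function `classify_annotation` are hypothetical and only illustrate the comparison.

```python
from typing import Dict, Optional


def classify_annotation(original: Dict[str, str],
                        current: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Compare the gene recorded at selection time with the current match."""
    status = {}
    for sirna_id, old_gene in original.items():
        new_gene = current.get(sirna_id)
        if new_gene is None:
            # The siRNA no longer matches any gene (siRNA without geneID).
            status[sirna_id] = "no current match"
        elif new_gene == old_gene:
            status[sirna_id] = "unchanged"
        else:
            status[sirna_id] = "retargeted"
    return status


# Hypothetical example data, not taken from any vendor library.
original = {"si_001": "GENE_A", "si_002": "GENE_B", "si_003": "GENE_C"}
current = {"si_001": "GENE_A", "si_002": "GENE_X", "si_003": None}
print(classify_annotation(original, current))
```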


Library reduction

Library handling

From pooled to oligo-based

Library analysis

Bioinformatics - RNAi

UUGCCGUACAGGAUGGACGtg
UUAACUGAUGUUCCAAUCCtg

- Off-target effect
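One common way to flag potential off-target effects is a seed match: nucleotides 2-8 of the guide strand are compared against a transcript. The sketch below assumes that the first sequence above is a guide strand written 5' to 3' with a two-base DNA overhang; that reading, the toy transcript and the function names are assumptions made for illustration, not the method used in the talk.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case the sequence and write DNA bases as RNA."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_positions(guide: str, mrna: str) -> list:
    """Return 0-based start positions in mrna that pair with the guide seed."""
    guide, mrna = to_rna(guide), to_rna(mrna)
    seed = guide[1:8]                # nucleotides 2-8 of the guide
    site = reverse_complement(seed)  # sequence the seed would pair with
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


guide = "UUGCCGUACAGGAUGGACGtg"  # sequence from the slide
toy_mrna = "AAGGUACGGCAUUU"      # hypothetical transcript fragment
print(seed_match_positions(guide, toy_mrna))
```

A seed match alone is only a candidate; real off-target prediction would also consider context such as the position of the site in the transcript.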








Bioinformatics

Sandra Kaestner


Workflow

Project: Ribosome biogenesis

[Diagram: ribosome biogenesis across nucleolus, nucleoplasm and cytoplasm. Labels: rDNA; Pol I; transcription of ribosomal RNAs; 35S rRNA; 18S, 5.8S, 25S; Pol III; 5S rRNA; Pol II; mRNA; transcription of ribosomal mRNAs; maturation and export of ribosomal mRNAs; translation of ribosomal proteins; small ribosomal proteins; large ribosomal proteins; trans-acting factors; 90S particle; pre-40S particle; pre-60S particle; 23S rRNA; 20S rRNA; 18S rRNA maturation; maturation and export of pre-40S particle; maturation and export of pre-60S particle; final maturation; mature 40S subunit; mature 60S subunit; 80S ribosome.]

The Rps2-YFP read out

[Diagram: nucleolus, nucleoplasm and cytoplasm. Labels: ribosomal proteins; mature 40S subunit; mature 60S subunit; 80S ribosome.]

Thomas Wild


Biogenesis

HCDC

RPS3, transcript NM_001005

siRNA: 84062 - DTNBP1, AACCTTCAAAGCTGAACTAGA

Biogenesis

[Diagram: RNAi library (e.g. 4 oligos) + results = hit list. Labels: potential off-targets; results; hit; "Wonder"; off-target in hit list; real hit list.]
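The idea of combining the library with screening results and then removing suspected off-targets from the hit list can be sketched as a small filter. Everything below is illustrative: the oligo and gene identifiers are hypothetical, and the rule that a gene needs at least two unflagged hit oligos is an assumption, not a criterion stated in the talk.

```python
from collections import defaultdict


def confirmed_hits(library, hit_oligos, flagged_off_target, min_oligos=2):
    """Return genes supported by at least min_oligos unflagged hit oligos.

    library maps oligo id -> target gene; hit_oligos and flagged_off_target
    are collections of oligo ids.
    """
    support = defaultdict(set)
    for oligo in hit_oligos:
        if oligo in flagged_off_target:
            continue  # drop oligos with a potential off-target
        support[library[oligo]].add(oligo)
    return sorted(gene for gene, oligos in support.items()
                  if len(oligos) >= min_oligos)


# Hypothetical library with 4 oligos per gene.
library = {f"o{i}": "GENE_A" for i in range(1, 5)}
library.update({f"o{i}": "GENE_B" for i in range(5, 9)})
hits = ["o1", "o2", "o5", "o6"]
flagged = {"o6"}
print(confirmed_hits(library, hits, flagged))
```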

Analysis of off-targets

Known off-targets

Build model

[Figure labels: AllStars, MVP_si03, MVP_si06; Uncoating; Acidification; AllStars, si03, si06; MVP; Tubulin.]

Virus screen

1 out of 3

HCDC

MAP2K4
6416|NM_003010|2890|AS
Qiagen off-target SM

MAP2K4 siRNA: TGGGCCTGAGATGCAGGTAAA
MVP NM_003010
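Annotation records such as `6416|NM_003010|2890|AS` are pipe-delimited and easy to split into typed fields. The slide does not say what each field means, so the field names below (`gene_id`, `transcript`, `position`, `tag`) are guesses used only to make the sketch readable.

```python
from typing import NamedTuple


class OligoAnnotation(NamedTuple):
    gene_id: int      # assumed: numeric gene identifier
    transcript: str   # assumed: RefSeq transcript accession
    position: int     # assumed: numeric position field
    tag: str          # assumed: free-text tag, meaning not given on the slide


def parse_annotation(record: str) -> OligoAnnotation:
    """Split a four-field pipe-delimited record into typed fields."""
    gene_id, transcript, position, tag = record.strip().split("|")
    return OligoAnnotation(int(gene_id), transcript, int(position), tag)


print(parse_annotation("6416|NM_003010|2890|AS"))
```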

mRNA 2D variants

- Relation of 2D structure to off-target effects
- We can model 2D structures quite robustly (Metaserver, Python)
- We can predict potential off-target effects
- We must find the relation

siRNA

mRNA

mRNA

We want to identify structural motifs in a set of mRNA sequences
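A very simple stand-in for structural motif search is to look for hairpin candidates: a short stem followed, after a loop, by its reverse complement. The sketch below is not the Metaserver approach mentioned earlier; it considers only Watson-Crick pairs, and the stem and loop lengths as well as the toy sequence are arbitrary assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def hairpin_candidates(rna: str, stem: int = 4,
                       min_loop: int = 3, max_loop: int = 8) -> list:
    """Return (stem_start, loop_length) pairs for perfect stem-loop candidates."""
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        right_needed = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the 3' half of the stem
            if rna[j:j + stem] == right_needed:
                hits.append((i, loop))
    return hits


# Toy sequence: GGGC, a UUUU loop, then GCCC closes the stem.
print(hairpin_candidates("AAGGGCUUUUGCCCAA"))
```

Running the same function over a set of mRNA sequences and counting shared stems would be one crude way to compare candidates across transcripts.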

mRNA 3D Structure

- Dicer, tRNA
- Off-target relations: RNA-RNA and RNP-RNA
- Model how RISC relates to microRNA/siRNA, and how it finds its target and binds to it

Nature Movie

RNA 3D

Workflow

ModeRNA

Python

BioPython for parsing structural data from the PDB format
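To show what parsing structural data from the PDB format involves, here is a standard-library-only sketch that reads the fixed-width ATOM/HETATM columns. In practice Biopython's `Bio.PDB.PDBParser` covers this and much more; the `Atom` tuple, the helper names and the sample atom below are illustrative assumptions.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def format_atom(atom: Atom) -> str:
    """Write a minimal ATOM record using the fixed PDB column layout."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % atom


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records; all other record types are skipped."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


# Hypothetical phosphate atom of a guanosine in chain A.
sample = format_atom(Atom(1, "P", "G", "A", 1, 50.193, 51.190, 50.534))
print(parse_atoms("HEADER    RNA\n" + sample + "\nEND"))
```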



Database architecture (LIMS)

HCS and databases

- Library and annotation
- Screening experiment
- Sample, Results, Management, View
- Public database
- Phenotypic data

Done, but needs maintenance

OpenBIS - Open Source database

Open Source database

HCDB - OpenBIS

Screen DB


Open Source database

OpenBIS (ETH)

- Web client
- Command line client
- Java technology, Google GWT


Adam Srebniak

Image Processing and KNIME

Adam Srebniak


Open Source

KNIME

WEB

DESKTOP

[Diagram: oligos from Dharmacon, Qiagen and Ambion, each with a target gene; Dharmacon, Qiagen and Ambion annotation databases; external databases; cross-reference database for gene/oligo annotation + workflow for off-target prediction. Other labels: OpenBIS; Gene; Library files; www.]
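A cross-reference database for gene/oligo annotation can be pictured as one table that collects oligos from all vendors under a shared gene identifier. The sketch below uses an in-memory SQLite database; the table layout, column names, oligo identifiers, sequences and gene ids are all hypothetical and are not the actual HCDB or OpenBIS schema.

```python
import sqlite3


def build_crossref(rows):
    """Create an in-memory table of (vendor, oligo_id, sequence, gene_id)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE oligo ("
        "vendor TEXT, oligo_id TEXT, sequence TEXT, gene_id INTEGER)"
    )
    con.executemany("INSERT INTO oligo VALUES (?, ?, ?, ?)", rows)
    return con


def oligos_for_gene(con, gene_id):
    """List every vendor oligo annotated against one gene."""
    cur = con.execute(
        "SELECT vendor, oligo_id FROM oligo "
        "WHERE gene_id = ? ORDER BY vendor, oligo_id",
        (gene_id,),
    )
    return cur.fetchall()


# Hypothetical rows; the sequences are placeholders.
rows = [
    ("Qiagen", "Q-1", "AAAACCCCGGGGTTTTAAAAC", 1001),
    ("Ambion", "A-7", "CCCCAAAATTTTGGGGCCCCA", 1001),
    ("Dharmacon", "D-3", "GGGGTTTTAAAACCCCGGGGT", 1002),
]
con = build_crossref(rows)
print(oligos_for_gene(con, 1001))
```

Keying every vendor's oligos on one gene identifier is what lets a re-annotation or off-target workflow update all libraries in a single pass.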

Image Processing

One of the first open-source tools: CellProfiler

High Content Image Processing (HCIP)


Teaching module

Slawek Mazur

HCDC-HITS

Bio-Formats, developed by OME software (Jason Swedlow), UW-Madison LOCI and Glencoe Software.


HCDC-HITS

Gabor Bakos


Visualization Improvement

Lukasz Zwolinski, ETH


PWR Student

Acknowledgement

TU Konstanz: Dorit Merhof
Bioquant Heidelberg and ETH: Holger Erfle, Karol Kozak, Berend Rind, Juergen Reymann, Gabor Csucs, Adam Srebniak, Slawek Mazur, Sandra Kaestner
Trinity College IR: Anthony Davies
MPI-IB, Berlin DE: Peter Braun, Andre Maeurer
TU Breslau: Karol Kozak, Lukasz Miroslaw

Welcome to join