BIOINFORMATICS 2012 - SocBiN

Sep 29, 2013 (4 years and 13 days ago)

BIOINFORMATICS 2012
STOCKHOLM, JUNE 11-14, 2012
http://socbin.org/bioinfo2012/

Welcome

SocBiN, in collaboration with the Center for Biomembrane Research, welcomes you to the 12th annual conference in bioinformatics. This year the conference will be held in beautiful Stockholm, starting at lunchtime on June 11 and ending at lunch on June 14. It will be held in the lecture hall "Berzelius" (Berzelius väg 3 / Tomtebodavägen, at the bus stop for SL bus 69) on the Karolinska Institutet campus, close to Science for Life Laboratory, Stockholm. We are looking forward to an exciting scientific program with 4 invited keynote speakers and 5 sessions (Molecular Machines, Using next generation sequence data, Data analysis of proteomics assays, Bioinformatics of chemical biology and RNA bioinformatics).

We wish you all a warm welcome.

The organization committee

Arne Elofsson, Department of Biochemistry and Biophysics, Stockholm University; Erik Lindahl, Theoretical Physics, KTH; and Bengt Persson, Linköping University

SocBiN

SocBiN (Society for Bioinformatics in Northern Europe) is a non-profit organization for people working with and interested in bioinformatics and computational biology. The members of the organization are predominantly from the Nordic and Baltic countries, but others are also welcome.

We are grateful for the help of our session chairs

Arne Elofsson, Science for Life Laboratory, Stockholm University, Sweden
Lukas Käll, Science for Life Laboratory, KTH, Sweden
Anders Andersson, Science for Life Laboratory, KTH, Sweden
Jens Carlsson, Center for Biomembrane Research, Stockholm University, Sweden
Janusz Bujnicki, International Institute of Molecular and Cell Biology in Warsaw, Poland
 
 
Program

Mon June 11

Data Analysis of Proteomics Assays

13:45-14:00 | Arne Elofsson | Welcome
14:00-14:30 | Ruedi Aebersold | Searching and Mining of Proteomic SWATH-MS datasets
14:30-15:00 | Lennart Martens | Snakes and ladders: where do proteomics assays fail and how can we fix them?
15:00-15:15 | Finn Drabløs | The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding
Coffee
15:45-16:15 | Edward Marcotte | Insights from proteomics into protein organization, evolution, and genetic disease
16:15-16:45 | Roman Zubarev | Pathway Analysis in Expression Proteomics
16:45-17:00 | Paul Horton | MoiraiSP: a novel mitochondrial cleavage site predictor
17:00-19:00 | Reception and poster session (Presentation by odd numbers)




 
 
Tue June 12

RNA Bioinformatics

09:30-10:00 | Bob Darnell
10:00-10:30 | Jan Gorodkin | Towards the search for RNA-RNA interaction based networks
10:30-10:45 | Mihaela Zavolan | A biophysical model to infer canonical and non-canonical microRNA-target interaction
Coffee
11:15-11:45 | Eric Westhof | The Detection of the Architectural Modules of RNA and Recent Progress in RNA Modelling
11:45-12:15 | Samuel Flores | A structural and dynamical model of human telomerase
12:15-12:30 | Nanjiang Shu | Computational analysis of membrane protein topology evolution
LUNCH

Keynotes Session

13:30-14:30 | Anders Krogh | On the accuracy of short read mapping
14:30-15:30 | Kerstin Lindblad-Toh
Coffee
16:00-17:00 | Jens Nielsen | Genome-Scale Metabolic Models: A Bridge between Bioinformatics and Systems Biology
17:00-18:00 | Paul Horton | Excavating human NUMTs
18:00-19:00 | Michael Levitt | Solving the Recalcitrant Crystal Structure of Group II Chaperonin TRiC/CCT by Mass Spectrometry and Sentinel Correlation Analysis
19:30-24:00 | Conference Dinner




 
 
Wed June 13

Bioinformatics of chemical biology

09:30-10:00 | Gert Vriend | What can we (not yet) learn from 70 GPCR structures
10:00-10:30 | Raymond Stevens | Understanding Human G-protein Coupled Receptor Structural Diversity and Modularity
10:30-10:45 | David Gloriam | Chemogenomic Discovery of Allosteric Antagonists at the GPRC6A Receptor
Coffee
11:15-11:45 | Helgi Schiöth | The origin of GPCRs, the largest family of membrane bound proteins
11:45-12:15 | Andreas Bender | Using Chemogenomics Approaches to Modulate Biological Systems
12:15-12:30 | Kentaro Tomii | PoSSuM: a database of known and potential ligand-binding sites in proteins

Using Next generation sequence data

14:00-14:30 | Jeroen Raes | Metagenomics data analysis: from the oceans to the human microbiome
14:30-15:00 | Christopher Quince | Extracting ecological signal from noisy microbiomics data
15:00-15:15 | Johan Bengtsson | Comprehensive Analysis of Antibiotic Resistance Genes in River Sediment, Well Water and Soil Microbial Communities Using Metagenomic DNA Sequencing
15:15-15:30 | Daniel Edsgärd | Allele specific expression changes after induction of inflammation
Coffee
16:00-16:30 | Erik van Nimwegen | Reconstructing transcription regulatory networks in mammals using a combination of modeling and next-generation sequencing data
16:30-17:00 | Joakim Lundeberg | Sequencing and assembly of the largest and most complex genome to date: the Norway spruce (Picea abies)
17:00-17:30 | Ivo Gut | High-resolution whole-genome analysis and cancer
17:30-19:00 | Poster session (Presentation by even numbers)




 
 
Thu June 14

Molecular machines

09:00-09:30 | Martin Weigt | From sequence variability to protein (complex) structure prediction
09:30-10:00 | Burkhard Rost | Evolution teaches protein prediction
10:00-10:15 | Janusz Bujnicki | If There's an Order in All of This Disorder…: Structural Bioinformatics of the Human Spliceosomal Proteome
10:15-10:30 | Joanna M Kasprzak | PyRy3D: a software tool for modelling of large macromolecular complexes
Coffee
11:00-11:30 | Ingemar André | Design and Prediction of Protein Self-assembly
11:30-12:00 | Rob Russell
12:00-12:15 | Closing words

 
List of participants
Conference "Bioinformatics 2012", June 11-14 at Karolinska Institutet, Stockholm, Sweden

First name | Surname/Family name | University/Organization | Country | E-mail address
Ruedi | Aebersold | ETH Zurich | SWITZERLAND | aebersold@imsb.biol.ethz.ch
Rahul | Agarwal | Chalmers | SWEDEN | deep_dude86@yahoo.com
Mehmood | Alam Khan | KTH | SWEDEN | malagori@kth.se
Raja Hashim | Ali | Kungliga Tekniska Hogskolan | SWEDEN | rhali@kth.se
Anders | Andersson | KTH | SWEDEN | anders.andersson@scilifelab.se
Ingemar | André | Lund University | SWEDEN | ingemar.andre@biochemistry.lu.se
Reidar | Andreson | University of Tartu | ESTONIA | reidar.andreson@ut.ee
Lars | Arvestad | Stockholm University | SWEDEN | arve@nada.su.se
Ahmad | Barghash | Saarland University | GERMANY | barghash@bioinformatik.uni-saarland.de
Walter | Basile | | SWEDEN | walter.basile@scilifelab.se
Johan | Bengtsson | University of Gothenburg | SWEDEN | johan.bengtsson@neuro.gu.se
Jorrit | Boekel | Scilifelab Stockholm | SWEDEN | jorrit.boekel@scilifelab.se
Mikael | Borg | | SWEDEN | mikael.borg@scilifelab.se
Susanne | Bornelöv | Uppsala university | SWEDEN | susanne.bornelov@icm.uu.se
John | Boss | Karolinska Institutet | SWEDEN | john.boss@ki.se
Fredrik | Boulund | Chalmers University of Technology | SWEDEN | fredrik.boulund@chalmers.se
Christian | Brüffer | Lund University | SWEDEN | christian.bruffer.679@student.lu.se
Torben | Brömstrup | KTH | SWEDEN | erik@kth.se
Janusz | Bujnicki | IIMCB | POLAND | iamb@genesilico.pl
Ignas | Bunikis | Uppsala University | SWEDEN | ignas.bunikis@igp.uu.se
Jens | Carlsson | Stockholm University | SWEDEN | jens.carlsson.lab@gmail.com
Alexey | Chernobrovkin | IBMC RAMS | RUSSIA | chernobrovkin@gmail.com
Anna | Czerwoniec | Adam Mickiewicz University | POLAND | aczerwo@amu.edu.pl
Robert | Darnell | The Rockefeller University | UNITED STATES | darnelr@rockefeller.edu
Carsten | Daub | Karolinska Institutet and SciLifeLab | SWEDEN | carsten.daub@ki.se
Ino | De Bruijn | Stockholm University | SWEDEN | ino.debruijn@scilifelab.se
Finn | Drablos | Norwegian Univ of Science and Technology | NORWAY | finn.drablos@ntnu.no
Lei | Du | Karolinska Institutet | SWEDEN | lei.du@ki.se
Stanislaw | Dunin-Horkawicz | IIMCB | POLAND | sdh@genesilico.pl
Daniel | Edsgärd | KTH, Science for Life Laboratory | SWEDEN | daniel.edsgard@scilifelab.se
Arne | Elofsson | Principal Investigator/Lab Head/Senior R | SWEDEN | arne@bioinfo.se
Olof | Emanuelsson | Kungliga Tekniska Högskolan | SWEDEN | olof.emanuelsson@scilifelab.se
Hassan | Foroughi Asl | Karolinska Institutet | SWEDEN | Hassan.foroughi@gmail.com
Oliver | Frings | | SWEDEN | oliver.frings@sbc.su.se
Mattias | Frånberg | Stockholms Universitet/Karolinska Instit | SWEDEN | mattias.franberg@ki.se
David | Gloriam | University of Copenhagen | DENMARK | dg@farma.ku.dk
David | Gomez-Cabrero | BILS | SWEDEN | david.gomezcabrero@bils.se
Jan | Gorodkin | University of Copenhagen | DENMARK | gorodkin@rth.dk
Viktor | Granholm | Stockholm University | SWEDEN | viktor.granholm@scilifelab.se
Svenn Helge | Grindhaug | Uni Research | NORWAY | svenn.grindhaug@uni.no
Dimitri | Guala | | SWEDEN | dimitri.guala@scilifelab.se
Ivo | Gut | Centro Nacional de Análisis Genómico | SPAIN | igut@pcb.ub.cat
Mohamed | Hamed | Saarland University | GERMANY | mhamed@bioinformatik.uni-saarland.de
Sampsa | Hautaniemi | University of Helsinki | FINLAND | sampsa.hautaniemi@helsinki.fi
Sikander | Hayat | | | hayat221@googlemail.com
Paul | Horton | AIST, Computational Biology Res. Ctr. | JAPAN | horton-p@aist.go.jp
Luisa | Hugerth | KTH | SWEDEN | luisa.hugerth@scilifelab.se
Lina | Hultin Rosenberg | Karolinska Institutet | SWEDEN | lina.hultin-rosenberg@ki.se
Lukasz | Huminiecki | | | Lukasz.Huminiecki@gmail.com
Katherine Abigail | Icay | University of Helsinki | FINLAND | katherine.icay@helsinki.fi
Henrik | Johansson | Karolinska Institute | SWEDEN | henrik.johansson@ki.se
Anna | Johnning | University of Gothenburg | SWEDEN | anna.johnning@gu.se
Viktor | Jonsson | University of Gothenburg | SWEDEN | viktor.jonsson@chalmers.se
Sini | Junttila | University of Turku | FINLAND | sjunttil@btk.fi
Mette | Jørgensen | University of Copenhagen | DENMARK | mette@binf.ku.dk
Yvonne | Kallberg | Karolinska Institutet | SWEDEN | yvonne.kallberg@ki.se
Joanna | Kasprzak | Adam Mickiewicz University | POLAND | jkasp@amu.edu.pl
Zeeshan | Khaliq | Uppsala University | SWEDEN | khaliq.zeeshan@gmail.com
Per | Kraulis | | SWEDEN | per.kraulis@scilifelab.se
Anders | Krogh | University of Copenhagen | DENMARK | krogh@binf.ku.dk
Deepak | Kumar | Adam Mickiewicz University | POLAND | deepak.k.choubey@gmail.com
Mayank | Kumar | Saarland University | GERMANY | mayankumar@gmail.com
Kanthida | Kusonmano | Uni Research AS | NORWAY | kanthida.kusonmano@uni.no
Leena | Kytömäki | University of Turku | FINLAND | leena.kytomaki@btk.fi
Lukas | Käll | Royal Institute of Technology (KTH) | SWEDEN | lukas.kall@scilifelab.se
Jens | Lagergren | KTH | SWEDEN | jensl@csc.kth.se
Silja | Laht | University of Tartu | ESTONIA | siljal@ut.ee
Dan | Larhammar | Uppsala University | SWEDEN | dan.larhammar@neuro.uu.se
Ksenia | Lavrichenko | University of Bergen | NORWAY | ksenia.lavrichenko@gmail.com
Fredrik | Levander | Lund University | SWEDEN | fredrik.levander@immun.lth.se
Michael | Levitt | Stanford University | UNITED STATES | michael.levitt@stanford.edu
Sara | Light | | SWEDEN | sara.light@scilifelab.se
Erik | Lindahl | KTH Royal Institute of Technology | SWEDEN | erik@kth.se
Jessica | Lindvall | Karolinska Institutet | SWEDEN | jessica.lindvall@ki.se
Joakim | Lundeberg | KTH, Science for Life Laboratory | SWEDEN | joalun@kth.se
Ingrid | Lundell | Uppsala University | SWEDEN | dan.larhammar@neuro.uu.se
Fredrik | Lysholm | Linköping University | SWEDEN | frely@ifm.liu.se
Ari | Löytynoja | University of Helsinki | FINLAND | ari.loytynoja@helsinki.fi
Muhammad Owais | Mahmudi | KTH | SWEDEN | mahmudi@kth.se
Edward | Marcotte | University of Texas | UNITED STATES | edward.marcotte@gmail.com
Tonu | Margus | University of Tartu | ESTONIA | tmargus@ebc.ee
Lennart | Martens | VIB and Ghent University | BELGIUM | lennart.martens@vib-ugent.be
Paula | Martinez | Chalmers University of Technology | SWEDEN | apaula@student.chalmers.se
Dorota | Matelska | IIMCB | POLAND | dmatelska@genesilico.pl
Veli | Mäkinen | University of Helsinki | FINLAND | vmakinen@cs.helsinki.fi
Jens | Nielsen | Chalmers University of Technology | SWEDEN | nielsenj@chalmers.se
Henrik | Nielsen | Technical University of Denmark | DENMARK | hnielsen@cbs.dtu.dk
Roland | Nilsson | Karolinska Institutet | SWEDEN | roland.nilsson@ki.se
Wieslaw | Nowak | Uniwersytet M.Kopernika w Toruniu | POLAND | wiesiek@fizyka.umk.pl
Johan | Nylander | Swedish Museum of Natural History | SWEDEN | johan.nylander@nrm.se
Pall Isolfur | Olason | Uppsala University | SWEDEN | pall.olason@ebc.uu.se
Ananta | Paine | Karolinska Institute | SWEDEN | ananta.paine@ki.se
Pekka | Parviainen | KTH | SWEDEN | pekka.parviainen@alumni.helsinki.fi
Maria | Pernemalm | Karolinska Institutet | SWEDEN | maria.pernemalm@ki.se
Bengt | Persson | Linköping University and BILS | SWEDEN | bpn@ifm.liu.se
Christoph | Peters | | SWEDEN | christoph.peters@scilifelab.se
Kjell | Petersen | Uni Research AS | NORWAY | Kjell.Petersen@uni.no
Robert | Pilstål | Linköping University | SWEDEN | robpi892@student.liu.se
Rui | Pinto | Umeå University | SWEDEN | rui.pinto@chem.umu.se
Iman | Pouya | Royal Institute of Technology | SWEDEN | iman.pouya@gmail.com
Jasna | Pruner | Uppsala University | SWEDEN | dan.larhammar@neuro.uu.se
Christopher | Quince | University of Glasgow | UNITED KINGDOM | quince@civil.gla.ac.uk
Jeroen | Raes | Vrije Universiteit Brussel | BELGIUM | jeroen.raes@gmail.com
Balaji | Rajashekar | Tartu University | ESTONIA | balaji@ut.ee
Anirudh | Ranganathan | | | anirudhranganathan@gmail.com
Henri | Raska | Tallinn University of Technology | ESTONIA | henri@tftak.eu
Johan | Reimegård | Royal Institute of Technology | SWEDEN | johan.reimegard@scilifelab.se
Maido | Remm | University of Tartu | ESTONIA | maido.remm@ut.ee
Dirk | Repsilber | Leibniz Inst. for Farm Animal Biology | GERMANY | repsilber@fbn-dummerstorf.de
Ana Maria | Rodriguez Sanchez | KAROLINSKA UNIVERSITY HOSPITAL | SWEDEN | ana.rodriguez@ki.se
Burkhard | Rost | Technische Universitaet Muenchen | GERMANY | assistant@rostlab.org
Arcadio | Rubio García | Technical University of Denmark | DENMARK | arcadio@cbs.dtu.dk
Robert | Russell | University of Heidelberg | GERMANY | robert.russell@bioquant.uni-heidelberg.de
Kristoffer | Sahlin | KTH/Science for life Laboratory | SWEDEN | ksahlin@kth.se
Helgi | Schiöth | Uppsala University | SWEDEN | helgis@bmc.uu.se
Thomas | Schmitt | | SWEDEN | thomas.schmitt@scilifelab.se
Sophie | Schwaiger | KTH | SWEDEN | cschwaig@sbc.su.se
Bengt | Sennblad | Karolinska Institutet | SWEDEN | bengt.sennblad@ki.se
Alexey | Sergushichev | University ITMO | RUSSIA | alsergbox@gmail.com
Hossein | Shahrabi Farahani | KTH | SWEDEN | farahani@kth.se
Nanjiang | Shu | | SWEDEN | nanjiang@sbc.su.se
Gilad | Silberberg | Stockholm University | SWEDEN | gilad.zil@molbio.su.se
Indranil | Sinha | Karolinska Institute | SWEDEN | indranil.sinha.2@ki.se
Joel | Sjöstrand | | SWEDEN | joel.sjostrand@scilifelab.se
Marcin | Skwark | | POLAND | marcin@skwark.pl
Erik | Sonnhammer | | SWEDEN | erik.sonnhammer@sbc.su.se
Matthew | Studham | | SWEDEN | matthew.studham@scilifelab.se
Shravan | Sukumar | University of Wisconsin-Madison | UNITED STATES | sukumar@wisc.edu
Valentine | Svensson | | SWEDEN | valentine@scilifelab.se
Thomas | Svensson | Science for Life Laboratory | SWEDEN | thomas.svensson@scilifelab.se
Christian | Tellgren-Roth | Uppsala University | SWEDEN | christian.tellgren@igp.uu.se
Andreas | Tjärnberg | | SWEDEN | andreas.tjarnberg@sbc.su.se
Kentaro | Tomii | AIST | JAPAN | tomii@cbrc.jp
Ikram | Ullah | KTH | SWEDEN | ikram.ullah@yahoo.com
Per | Unneberg | | SWEDEN | per.unneberg@scilifelab.se
Björn | Wallner | Linköping University | SWEDEN | bjornw@ifm.liu.se
Roman | Valls Guimera | | SWEDEN | roman@scilifelab.se
Erik | Van Nimwegen | University of Basel | SWITZERLAND | erik.vannimwegen@unibas.ch
Lixiao | Wang | Umeå University | SWEDEN | lixiao.wang@chem.umu.se
Per | Warholm | | SWEDEN | per.warholm@scilifelab.se
Martin | Weigt | University Pierre & Marie Curie | FRANCE | martin.weigt@upmc.fr
Björn | Wesén | KTH | SWEDEN | bjorn.wesen@gmail.com
Eric | Westhof | University of Strasbourg, IBMC-CNRS | FRANCE | e.westhof@ibmc-cnrs.unistra.fr
Francesco | Vezzi | KTH Royal Institute of Technology | SWEDEN | francesco.vezzi@scilifelab.se
Viola | Volpato | University College Dublin (UCD) | IRELAND | viola.volpato@ucdconnect.ie
Gert | Vriend | CMBI | NETHERLANDS, THE | vriend@cmbi.ru.nl
Bo | Xu | Uppsala University | SWEDEN | dan.larhammar@neuro.uu.se
Özge | Yoluk | KTH | SWEDEN | ozyo@kth.se
Katarzyna | Zaremba-Niedzwiedzka | Uppsala University | SWEDEN | Katarzyna.Zaremba@icm.uu.se
Weizhou | Zhao | Uppsala University | SWEDEN | weizhou.zhao.9350@student.uu.se
Marie | Öhman | Stockholm University | SWEDEN | marie.ohman@molbio.su.se
Linus | Östberg | Karolinska Institutet | SWEDEN | linus.ostberg@ki.se

Abstracts from invited speakers

Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich and Faculty of Science, University of Zurich

Searching and Mining of Proteomic SWATH-MS datasets

Recently we introduced a new data independent acquisition (DIA) method termed SWATH-MS (1). This method, in effect, is a time-and-mass segmented acquisition method where complex, high-specificity fragment ion maps of all precursor ions within a user-defined precursor RT and m/z space are being generated and recorded. This is accomplished by stepping the isolation window of a specifically tuned quadrupole time-of-flight (QqTOF) instrument in discrete increments recursively throughout the duration of the LC separation. The data acquired by SWATH-MS are not searchable by conventional database search engines, because each fragment ion spectrum is a composite of multiple, concurrently fragmented precursor ions.

In this presentation we will describe an automatic pipeline for peptide identification and quantification from SWATH-MS datasets. It is conceptually related to the mProphet algorithm developed for the analysis of S/MRM datasets (2). The algorithm applies a targeted search strategy, whereby peak groups uniquely identifying a particular peptide are extracted from the SWATH-MS dataset and assigned a probability of being correctly associated with the target peptide. The algorithm uses a system of individual feature score rankings that are then combined into a composite score.

The performance of the method will be illustrated with selected examples that indicate the power of the approach for the reproducible analysis of proteomes, the detection of modified peptides and the estimation of the absolute quantity of proteins and proteomes.
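
The scoring idea described above, ranking candidate peak groups on several individual features and combining the ranks into one composite score, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the pipeline presented in the talk: the feature names, the equal weighting of features, and the example values are all hypothetical.

```python
from typing import Dict, List


def rank_scores(values: List[float]) -> List[float]:
    """Map raw feature values to ranks scaled to [0, 1]; a higher value gets a higher rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position / (len(values) - 1) if len(values) > 1 else 1.0
    return ranks


def composite_scores(features: Dict[str, List[float]]) -> List[float]:
    """Average the per-feature ranks of each candidate peak group (equal weights assumed)."""
    per_feature = [rank_scores(vals) for vals in features.values()]
    n = len(per_feature[0])
    return [sum(r[i] for r in per_feature) / len(per_feature) for i in range(n)]


# Hypothetical feature values for three candidate peak groups of one target peptide.
peak_groups = {
    "coelution": [0.91, 0.40, 0.75],
    "intensity_correlation": [0.88, 0.35, 0.60],
    "rt_agreement": [0.95, 0.20, 0.70],
}
scores = composite_scores(peak_groups)
best = max(range(len(scores)), key=lambda i: scores[i])  # index of the top-scoring peak group
```

Turning such a composite score into a probability of correct assignment would additionally need a null model (for example decoy peak groups), which this sketch leaves out.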

1. Gillet LC, Navarro P, Tate S, Roest H, Selevsek N, Reiter L, Bonner R, Aebersold R. (2012) Targeted data extraction of the MS/MS spectra generated by data independent acquisition: a new concept for consistent and accurate proteome analysis. MCP [Epub ahead of print]

2. Reiter L, Rinner O, Picotti P, Huettenhain R, Beck M, Brusniak MY, Hengartner MO, Aebersold R. (2011) mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Methods 8(5):430-5.

Ingemar André, Center for Molecular Protein Science, Biochemistry and Structural Biology, Lund University
ingemar.andre@biochemistry.lu.se

Design and Prediction of Protein Self-assembly

Many of the largest protein complexes in biology are composed of a single type of subunit that is repeated a large number of times to generate a functional assembly. Such homomeric structures are often assembled spontaneously from individual components through the process of self-assembly. Research in our group is focused on the prediction of the three-dimensional structure of homomeric assemblies and the rational design of novel self-assembling proteins and peptides. Over the last several years we have developed computational methods to model the structure of homomeric assemblies using the powerful constraint of molecular symmetry. In this presentation I will illustrate how these prediction methods, in conjunction with limited experimental constraints, can be used to tackle important problems in structural biology. The second part of the talk will deal with the rational design of self-assembling proteins and peptides. We combine the powerful design template of self-assembly with structural modeling and computational protein design to design protein assemblies on an atomic level.



Andreas Bender, Cambridge University

Using Chemogenomics Approaches to Modulate Biological Systems

Modulating biological systems can be achieved via biological means (such as knock-out animals, or RNA interference etc.); however, chemical modulation by small molecules is an alternative method with significantly different properties, such as the ability to control dose and time course of the administration in detail. In this presentation, different methods for the analysis of the mode-of-action of small molecules which show an effect in phenotypic assays will be discussed, in order to understand small molecule action better. Also, reversing the direction of the analysis, we will outline how large bioactivity databases available today can be used to design molecules with the desired effect on a biological system, be it by modulating single targets or, becoming more popular recently, by modulating a defined set of target proteins.



Samuel Flores, Uppsala University

A structural and dynamical model of human telomerase

Mutations in the telomerase complex disrupt either nucleic acid binding or catalysis, and are the cause of numerous human diseases. Despite its importance, the structure of the human telomerase complex has not been observed crystallographically, nor are its dynamics understood in detail. Fragments of this complex from Tetrahymena thermophila and, more controversially, Tribolium castaneum have been crystallized. Biochemical probes provide important insight into dynamics. In this work we use available structural fragments to build a homology model of human TERT, and validate the result with functional assays. We then generate a trajectory of telomere elongation following a “typewriter” mechanism: the RNA template moves to keep the end of the growing telomere in the active site, disengaging after every 6-residue extension to execute a “carriage return” and go back to its starting position. A hairpin can easily form in the telomere, from DNA residues leaving the telomere-template duplex. The trajectory is consistent with available experimental evidence and suggests focused biochemical experiments for further validation.



Jan Gorodkin, Center for non-coding RNA in Technology and Health, Denmark

Towards the search for RNA-RNA interaction based networks

Within recent years the awareness of non-coding RNAs has increased rapidly, and experimental as well as in silico results elucidate their large potential. Here, our motivation starts from the thousands of in silico generated RNA structure candidates in the genome. A major challenge is to assign function to these. The first step is to search for RNA interactions with other RNAs (DNA or proteins). Searching for RNA-RNA interactions is in general a time-consuming task. As a first attempt we have developed an approach that searches for only near-complement interactions (ignoring intramolecular base pairs). We show that this approach is faster than existing methods while maintaining accuracy, and that the method can be used as a filter (on existing methods) for microRNA target search. In a case study on microRNAs, we combined target predictions (conserved in human and mouse) to protein-coding genes with literature mining and obtained a combined enrichment only for transcription factors (TFs), and subsequently found that TFs are also enriched for targeting microRNAs. Our results suggest a network of mutually activating and suppressive regulation.
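
To make the idea of a near-complement search concrete, the sketch below slides a short query RNA along a target RNA in antiparallel orientation and counts intermolecular base pairs, ignoring any intramolecular structure. It is a naive illustration and not the method presented in the talk: the function name, the mismatch threshold, the inclusion of G-U wobble pairs, and the example sequences are assumptions.

```python
# Watson-Crick pairs plus G-U wobble pairs (including wobble is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def best_near_complement(query: str, target: str, max_mismatches: int = 1):
    """Return (paired_bases, target_offset) for the best antiparallel placement.

    The query is read 5'->3' and paired against the target read 3'->5'.
    Placements with more than max_mismatches unpaired positions are skipped.
    Returns (0, -1) if no placement qualifies.
    """
    rev_query = query[::-1]
    best = (0, -1)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        paired = sum((q, t) in PAIRS for q, t in zip(rev_query, window))
        mismatches = len(query) - paired
        if mismatches <= max_mismatches and paired > best[0]:
            best = (paired, offset)
    return best


# Hypothetical sequences: the target contains the exact reverse complement of the query.
hit = best_near_complement("UAGCUU", "GGAAGCUACC")
```

A scan like this is quadratic in sequence length; a practical filter would index the target first, which is outside the scope of the sketch.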



Ivo Gut, Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028 Barcelona, Spain

High-resolution whole-genome analysis and cancer

The International Cancer Genome Consortium (ICGC) aims to exhaustively characterize 50 tumour/normal sample pairs for each of the 50 most common forms of cancer, and then to validate observations in a further 450 samples. The first three years of this project have seen huge advances in the development, implementation and standardisation of the methods for characterising samples, ethical approval, whole-genome sequencing, exome sequencing, RNA sequencing, epigenetic analysis, methods for validation, informatics analysis and databasing.

The Spanish contribution to the ICGC is on Chronic Lymphocytic Leukaemia (CLL). Our main responsibility has been whole-genome sequence analysis, exome analysis, RNA sequence analysis and epigenetic analysis. Complete genome sequencing of many samples requires bringing together many different elements, starting from samples, preparation for sequencing, sequencing itself, data analysis, through to verification of results and translating a result into biological knowledge. Thorough examination of the first 4 tumour/normal pairs and follow-up in a large replication set allowed us to identify four recurrent mutations, in the NOTCH1, XPO1, MYD88 and KLHL6 genes. In an extension we analysed 100 tumour/normal pairs by exome sequencing, which allowed the identification of further recurrent somatic mutations, the most frequent being in SF3B1 and POT1. Interestingly, the two recognised subtypes of CLL, immunoglobulin modulated and not, do not completely reflect themselves in the recurrent mutations. The methods and findings will be discussed.



Paul Horton, CBRC

Excavating human NUMTs

NUMTs (Nuclear mtDNA) are partial copies of the mitochondrial genome found in the nuclear genome. They are sometimes referred to as molecular fossils and, due to the higher mutation rate of mtDNA, can in some cases be more similar to parts of our ancestral mtDNA than our extant mtDNA genome is. The existence of NUMTs has been known for decades, and many informatics studies on NUMTs have attempted to elucidate the characteristics of their insertion sites. By showing that NUMTs are typically very clean insertions with only minimal deletion or duplication of the surrounding nuclear DNA, these studies have led to a consensus opinion that most NUMTs are likely inserted as filler DNA via NHEJ (Non-Homologous End Joining). Previous informatics studies have not shed much light upon the preferred insertion sites of NUMTs. Most of them conclude that NUMT insertion is random, except for contradictory reports that NUMTs correlate positively, or negatively, with retrotransposons. Fortunately, by employing more careful methodology, we were able to discover several as yet undiscovered aspects of this phenomenon. We found that inferred NUMT insertion sites strongly correlate with predicted physical properties of DNA (curvature and bendability) and A+T rich oligomers. Moreover, recently inserted NUMTs correlate strongly with nucleosome free regions as measured by DNase-seq and FAIRE-seq. We also firmly establish that NUMTs do indeed tend to co-occur with retrotransposons. As for the source mtDNA which is copied to create NUMTs, we find that part of the mtDNA D-loop region is very seldom copied. Relating these facts to concrete hypotheses regarding the mechanism of NUMT insertion proved very challenging, but also fascinating, as it touched upon diverse topics in molecular biology: from retrotransposon activity and DNA repair to evolutionary conservation of chromatin structure and the packaging of mtDNA.

REFERENCES

Tsuji et al., under revision, NAR



Anders Krogh, Copenhagen University, Denmark

On the accuracy of short read mapping

Next-generation DNA sequencing technologies produce huge amounts of DNA sequence reads. Often the initial bioinformatics task is to map these reads to a reference genome. For this, well-tested methods like Blast are far too slow and next-generation bioinformatics tools are needed. Several new methods have been developed, some of which build on the Burrows-Wheeler index, an elegant indexing of the genome that facilitates fast searches in a small memory footprint. These methods are based on mapping the reads exactly apart from a few mismatches and indels. Most of them do not report any significance or probability that a match is actually correct. In this talk I will briefly review the field, give some general results for mapping accuracy, and suggest a more precise notion of uniqueness. I will also present a probabilistic approach to short read mapping, which uses quality scores to calculate mapping probabilities. This can improve mapping accuracy, in particular when mapping very short reads, such as small RNAs, various tag sequences, and ancient DNA. The effect on mapping performance will be illustrated using both simulated and actual DNA reads.
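
As a rough illustration of how base quality scores can yield mapping probabilities, the sketch below converts Phred scores into per-base error probabilities, computes the likelihood of a read at each candidate location, and normalizes over the candidates. It is a generic textbook-style model and not necessarily the approach presented in the talk: independent base errors, a uniform prior over candidates, equal-length ungapped alignments, and the example read are all assumptions.

```python
from typing import List


def read_likelihood(read: str, ref: str, quals: List[int]) -> float:
    """P(read | read originates from ref), treating base errors as independent."""
    p = 1.0
    for base, ref_base, q in zip(read, ref, quals):
        err = 10 ** (-q / 10)  # Phred score -> error probability
        # A wrong base is assumed equally likely to be any of the 3 other bases.
        p *= (1 - err) if base == ref_base else err / 3
    return p


def mapping_posteriors(read: str, candidates: List[str], quals: List[int]) -> List[float]:
    """Posterior probability of each candidate location under a uniform prior."""
    likelihoods = [read_likelihood(read, ref, quals) for ref in candidates]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


read = "ACGT"
quals = [30, 30, 30, 10]                # the last base is low quality
candidates = ["ACGT", "ACGA", "TCGT"]   # hypothetical reference windows
post = mapping_posteriors(read, candidates, quals)
```

In this toy case a mismatch at the low-quality last base is penalized far less than a mismatch at a high-quality base, so the second candidate keeps a noticeable share of the probability while the third is almost excluded.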



Michael Levitt, Stanford, USA

Solving the Recalcitrant Crystal Structure of Group II Chaperonin TRiC/CCT by Mass Spectrometry and Sentinel Correlation Analysis

Eukaryotic group II Chaperonin TRiC or CCT is a 0.95 megadalton protein complex that is essential for the correct and efficient folding of cytosolic polypeptides. The closed form is a 16 nm sphere made of two hemi-spherical rings of 8 subunits (~550 residues/subunit) that rotate to open a central folding chamber. In eukaryotes, 8 different genes encode the subunits of this ATP-powered nanomachine. The high sequence identity of subunits made the 40,320 (=8 factorial) possible arrangements indistinguishable in previous cryo-electron microscopy and crystallographic analysis. We solve this problem by independent studies on bovine and yeast TRiC chaperonin.

First we use cross-linking, mass spectrometry and combinatorial homology modeling. We react bovine TRiC under native conditions with a lysine-specific cross-linker, follow up with trypsin digestion, and use mass spectrometry to identify 63 cross-linked pairs providing distance restraints. Independently of the cross-link set, we construct all 40,320 homology models of the TRiC particle. When we compare each model with the cross-link set, we find that one model is significantly more compatible than any other model. Bootstrapping analysis confirms that this model is 10 times more likely to result from this cross-link set than the next best-fitting model.

Second, we re-examine the 3.8 Å resolution X-ray data of yeast TRiC. Our method of Sentinel Correlation Analysis (SCA) exhaustively tests all 2,580,460 possible models. This unbiased analysis singles out with overwhelming significance one model, which is fully consistent with our previous biochemical data and refines to a much lower Rfree value than reported previously with the same X-ray data. With four-fold averaging, our structure reveals remarkably resolved details of the unique conformation of each subunit, and suggests a mechanism for the initiation of transition to the open state. More generally, we expect SCA to resolve ambiguity in future low-resolution crystallographic studies.
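
The combinatorial step, enumerating all 8! = 40,320 subunit arrangements and scoring each against a set of restraints, can be sketched in a few lines. This is a toy illustration only: the abstract scores full homology models against 63 cross-link distance restraints, whereas here the restraints are hypothetical "these two subunits are ring neighbours" pairs and the score is a simple count.

```python
from itertools import permutations
from math import factorial

SUBUNITS = range(8)  # eight distinct subunit types, labelled 0..7

# Hypothetical restraints: pairs of subunits assumed to be neighbours in the ring.
NEIGHBOUR_RESTRAINTS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]


def satisfied(order, restraints):
    """Count restraints whose two subunits sit next to each other in the ring."""
    position = {subunit: i for i, subunit in enumerate(order)}
    n = len(order)
    return sum(
        (position[a] - position[b]) % n in (1, n - 1) for a, b in restraints
    )


# Enumerate every ordering of the 8 subunits around the ring.
arrangements = list(permutations(SUBUNITS))
assert len(arrangements) == factorial(8) == 40320

# Rank arrangements by how many restraints they satisfy.
scored = sorted(
    arrangements, key=lambda o: satisfied(o, NEIGHBOUR_RESTRAINTS), reverse=True
)
best = scored[0]
```

Because this toy score depends only on ring adjacency, rotations and mirror images of the best arrangement tie with it; distinguishing them requires geometric restraints of the kind described in the abstract.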



Joakim Lundeberg, SciLifeLab, Sweden and The Spruce genome project

Sequencing and assembly of the largest and most complex genome to date: the Norway spruce (Picea abies)

Conifers are the dominant plant species in many ecosystems, including large areas in Sweden. Despite this, no conifer genome has yet been published, mainly owing to their large size and complexity. The lack of a genome sequence has hampered our understanding of conifer biology and evolution, as well as the development of potential novel breeding strategies for these economically important species.

We are currently performing whole genome sequencing and assembly of the 20 Gbp Norway spruce genome. This genome contains huge amounts of repeated elements, with an estimated gene density of only one gene per 500 kbp. In common with other tree genomes, heterozygosity is high, which further complicates the assembly process. The Spruce Genome Project is addressing questions of genome size, content and evolution, including analyses of gene families and repeats, and will establish Norway spruce as a prime model species for conifer research.

In this talk, we will present our main strategies concerning sequencing and assembly of this de novo genome, and give an update on the results obtained so far. In brief, we use a combination of whole genome shotgun and fosmid pool sequencing, followed by scaffolding and merging of the separate assemblies. This is complemented by a manually curated spruce-specific repeat library, sequencing of random fosmid clones for assembly benchmarking, as well as assemblies of the chloroplast and mitochondrial genomes.



Ed Marcotte, University of Texas Austin, US

Insights from proteomics into protein organization, evolution,
and genetic disease



Lennart Martens, VIB, Gent, Belgium

Snakes and ladders: where do proteomics assays fail and how
can we fix them?

Proteomics assays increasingly rely on two distinct and largely
independent informatics processing steps: identification and
quantification. Both processing steps can rely on a plethora of
available algorithms and tools, but the maturity of these algorithms is
quite distinct. Whereas identification is typically handled by venerable
algorithms called search engines, which have been in use for many
years, quantification algorithms are still continuously evolving to
accommodate the increasing resolution and sensitivity of modern
mass spectrometers. Despite this difference in maturity, both steps
can be improved. Indeed, the performance of current quantitative
workflows can be boosted by simply combining several of them into a
single, joint analysis, making the most of the specific sensitivities of
each of the algorithms used. On the other hand, the long-serving
search engines have also reached crucial limits in terms of
specificity, effectively preventing proteomics from reaching a central
status in the life sciences. Fortunately, this inherent limitation of
current search engines can be fixed by improving the way in which
we use the measurements provided by the mass spectrometer. We
will here discuss these developments, and highlight how both
quantification and identification can be improved; the former by
incremental advances, the latter by a more radical change in
approach.



Jens Nielsen, Department of Chemical and Biological
Engineering, Chalmers University of Technology, Gothenburg,
Sweden

Genome-Scale Metabolic Models: A Bridge between
Bioinformatics and Systems Biology

We are currently working on building a Human Metabolic Atlas, a
novel web-based database and modelling tool that can be used by
medical and pharmaceutical researchers to analyse clinical data with
the objectives of identifying biomarkers associated with disease
development and improving health care. The central technology in
the Human Metabolic Atlas is so-called genome-scale metabolic
models (GEMs), which will be made tissue-specific by using
different types of experimental data, e.g. from the Human Protein
Atlas. These models allow for context-dependent analysis of clinical
data, providing much more information than traditional statistical
correlation analysis, and hence advance the identification of
biomarkers from high-throughput experimental data that can be used
for early diagnosis of metabolism-related diseases. As part of the
Human Metabolic Atlas we are developing GEMs for the gut
microbiome. In this context we are using metagenomics for
identification of different metabolic functions that are associated with
human diseases. Here we are using metagenomic sequencing data
from the gut microbiome of patients with different diseases, e.g.
arteriosclerosis and type 2 diabetes. Through the combination of the
bacterial GEMs and metagenomics data we have identified enriched
metabolic functions in the microbiome, and based on this we point to
novel prospective biomarkers for disease development. We are
further integrating metagenomics information into predictive
metabolic models that have the prospect of simulating how the gut
microbiome will respond to diet.



Raymond Stevens, The Scripps Research Institute, USA

Understanding Human G-protein Coupled Receptor Structural
Diversity and Modularity

GPCRs constitute one of the largest protein families in the human
genome and play essential roles in normal cell processes, most
notably in cell signaling. The human GPCR family contains more than
800 members, recognizes thousands of different ligands, and
activates a number of signaling pathways through interactions with a
small number of binding partners. GPCRs have also been implicated
in numerous human diseases, and represent more than 40% of drug
targets. Delivering GPCR structures in close collaboration with
experts on specific receptor systems is of immense value to the basic
science community interested in cell signaling and molecular
recognition, as well as the applied science community interested in
drug discovery. This work is being followed up with additional
biophysical characterization including NMR spectroscopy, HDX mass
spectrometry, medicinal chemistry and community-wide assessments
with computational biology groups throughout the world. Crystal
structures are now available for rhodopsin, adrenergic, and
adenosine receptors in both inactive and activated forms, as well as
for chemokine, dopamine, histamine, S1P1, muscarinic, and opioid
receptors in inactive conformations. The common structural features
seen in these receptors will be reviewed; the scope of structural
diversity of GPCRs at different levels of homology provides insight
into our growing understanding of the biology of GPCR action and
their impact on drug discovery. Given the current
set of GPCR structural data, a distinct modularity is now being
observed between the extracellular (ligand-binding) and intracellular
(signaling) regions. The rapidly expanding repertoire of GPCR
structures provides a solid framework for experimental and molecular
modeling studies, and helps to chart a roadmap for comprehensive
structural coverage of the whole superfamily and an understanding of
GPCR biological and therapeutic mechanisms. The long-range goal
is to understand GPCR molecular recognition and evolution in
relation to human cognition.

This work was supported by NIGMS PSI:Biology for GPCR structure
processing (U54GM094618) and the NIH Roadmap Initiative
(JCIMPT) for technology development (GM073197).



Burkhard Rost, TU Munich

Evolution teaches protein prediction

The objective of our group is to predict aspects of protein function
from sequence. The only reason why we can pursue such an
ambitious goal is the wealth of evolutionary information available
through the comparison of the whole bio-diversity of species. Many
approaches have benefited substantially from using evolutionary
information; for some of these methods learning from evolution made
the difference between possible and impossible. In my talk I will
present examples of methods that target the prediction of protein
interactions, of protein disorder, and of the effect of single residue
mutations upon protein structure and function.



Schiöth HB.

The origin of GPCRs, the largest family of membrane bound
proteins

G protein-coupled receptors (GPCRs) are the largest superfamily
among membrane bound proteins. The GPCRs in humans are
classified into the five main families named Glutamate, Rhodopsin,
Adhesion, Frizzled and Secretin according to the GRAFS
classification. Several families of GPCRs, however, show no apparent
sequence similarity to each other, and it has been debated which of
them share a common origin. Mining of early vertebrates including
lancelet (Branchiostoma floridae) and one of the most primitive
animals, the cnidarian sea anemone (Nematostella vectensis),
provided considerable evidence suggesting that the Adhesion family
is ancestral to the peptide hormone binding Secretin family of
GPCRs. We also used integrated and independent HHsearch,
Needleman-Wunsch-based and motif analyses to determine the
relationships of the other main families. We found strong evidence that
the Adhesion and Frizzled families are children to the cyclic AMP
(cAMP) family while the large Rhodopsin family is likely a child of the
cAMP family. We suggest that the Adhesion and Frizzled families
originated from the cAMP family in an event close to that which gave
rise to the Rhodopsin family. We also found convincing evidence that
the Rhodopsin family is parent to the important sensory families
Taste 2 and Vomeronasal type 1, as well as the Nematode
chemoreceptor families. The insect odorant, gustatory, and
Trehalose receptors, frequently referred to as GPCRs, form a
separate cluster without relationship to the other families, and we
propose, based on these and other results, that these families are
ligand-gated ion channels rather than GPCRs. We suggest common
descent of at least 97% of the GPCR sequences found in humans.
Moreover, we provide the first evidence that four of the five main
mammalian families of GPCRs, namely Rhodopsin, Adhesion,
Glutamate and Frizzled, are present in Fungi. The unicellular
relatives of the Metazoan lineage, Salpingoeca rosetta and
Capsaspora owczarzaki, have a rich group of both the Adhesion and
Glutamate families, which in particular provided insight into the early
emergence of the N-terminal domains of the Adhesion family. Further
mining of Dictyostelium discoideum suggests that the Glutamate
family is as ancient as the cAMP receptor family. Together, these
studies clarify the early evolutionary history of the GPCR superfamily,
whose emergence can be traced back at least 1400 MYA.

Gert Vriend, Radboud University Nijmegen Medical Centre,
Netherlands

What can we (not yet) learn from 70 GPCR structures?

Headed by the next speaker, the crystallography community has
cracked the GPCR crystallisation problem, and in the past years we
have seen at least one new GPCR structure enter the PDB each
month. These structures are in an active state, semi-active state,
inactive state, or sometimes also an artefactual state. We have been
comparing all available structures, trying to average out the things
done to make the GPCRs crystallize (mutation of crucial residues;
adding llama antibodies; adding funny salts and lipids; cloning in
lysozyme). The sheer volume of data now allows us to extract the
beginning of a coherent story about the activation of GPCRs. Not
surprisingly, this story agrees more with basic laws of physics and
thermodynamics, and less with the myriads of funny activation
schemas that include distinct states like R, R*, etc., that have entered
the literature over the years.

Martin Weigt, University of Sorbonne, France

From sequence variability to protein (complex) structure
prediction

Many families of homologous proteins show a remarkable degree of
structural and functional conservation, despite their large variability in
amino acid sequences. We have developed a statistical-mechanics
inspired inference approach to link this variability (easy to observe) to
structure (hard to obtain), i.e. to infer directly co-evolving residue
pairs which turn out to form native contacts in the folded protein with
high accuracy. The gained information is used to guide tertiary and
quaternary structure prediction. As a specific example, I will discuss
the auto-phosphorylation complex of histidine kinases, which are
involved in the majority of signal transduction systems in bacteria.
Only a multidisciplinary approach integrating statistical genomics,
biophysical protein simulation, and mutagenesis experiments allows
us to predict and verify the (so far unknown) active kinase structure.



Eric Westhof Architecture et Réactivité de lʼARN, Université de
Strasbourg, Institut de Biologie Moléculaire et Cellulaire, CNRS,
15 rue René Descartes, 67084 Strasbourg, France

The Detection of the Architectural Modules of RNA and Recent
Progress in RNA Modelling

RNA architecture can be viewed as the hierarchical assembly of
preformed double-stranded helices defined by Watson-Crick base
pairs and RNA modules maintained by non-Watson-Crick base pairs.
RNA modules are recurrent ensembles of ordered non-Watson-Crick
base pairs. Such RNA modules constitute a signal for detecting
noncoding RNAs with specific biological functions. It is, therefore,
important to be able to recognize such genomic elements within
genomes. Through systematic comparisons between homologous
sequences and X-ray structures, followed by automatic clustering, the
whole range of sequence diversity in recurrent RNA modules has
been characterized. These data permitted the construction of a
computational pipeline for identifying known 3D structural modules in
single and multiple RNA sequences in the absence of any other
information. Any module can in principle be searched, but four can be
searched automatically: the G-bulged loop, the Kink-turn, the C-loop
and the tandem GA loop. The present pipeline can be used for RNA
2D structure refinement, 3D model assembly, and for searching and
annotating structured RNAs in genomic data. Following the recent
dramatic advances in tools aimed at RNA 3D modelling, a first,
collective, blind experiment in RNA three-dimensional structure
prediction has been performed. The goals are to assess the leading
edge of RNA structure prediction techniques, compare existing
methods and tools, and evaluate their relative strengths,
weaknesses, and limitations in terms of sequence length and
structural complexity. The results should give potential users insight
into the suitability of available methods for different applications and
facilitate the RNA structure prediction community's efforts to improve
their tools.



Roman A. Zubarev, Division of Physiological Chemistry I,
Department of Medical Biochemistry and Biophysics,
Karolinska Institutet, Scheeles väg 2, S-171 77 Stockholm,
Sweden

Pathway Analysis in Expression Proteomics

Proteomics studies have revealed the unexpected plasticity and dynamic
nature of the human proteome. The paradigm that the time evolution
of a biological system can be described by abundance variation of
relatively few “regulated” proteins has been shattered, being replaced
by the growing understanding that the whole proteome is regulated,
and virtually no protein remains unaffected when the system
undergoes transition from one state to another.

This finding underlines the importance of systems biology analysis of
expression proteomics data. Systems biology shifts the analytical
focus from thousands of proteins to hundreds of signaling pathways,
thus reducing the number of entities to be analyzed. Application of
these methods required the development of novel systems biology
tools, such as the pathway search engine (PSE [1-3]). These tools
can only be effective when they are quantitative, i.e. predict not only
the activated pathway, but also the relative degree of its activation.
Introducing the quantitative aspect in systems biology is one of the
greatest challenges this field is facing today, since the final goal of
pathway analysis is the creation of a quantitative predictive
model of the biological process under investigation.


1. Zubarev, R. A.; Nielsen, M. L.; Savitski, M. M.; Kel-Margoulis, O.;
Wingender, E.; Kel, A. Identification of dominant signaling pathways
from proteomics expression data, J. Proteomics, 2008, 1, 89-96.

2. Ståhl, S.; Fung, Y.M.E.; Adams, C. M.; Lengqvist, J.; Mörk, B.;
Stenerlöw, B.; Lewensohn, R.; Lehtiö, J.; Zubarev, R. A.; Viktorsson,
K. Proteomics and Pathway Analysis Identifies JNK-signaling as
Critical for High-LET Radiation-induced Apoptosis in Non-Small Lung
Cancer Cells, Mol. Cell Proteomics, 2009, 8, 1117-1129.

3. Marin-Vicente, C.; Zubarev, R. A. Search engine for proteomics,
Fact or Fiction? G.I.T. Lab J, 2009, 11-12, 10-11.

 
ScalaLife – Scalable Software Services for Life Science
Rossen Apostolov, KTH

The Life Sciences have rapidly become one of the major beneficiaries of the European
e-Infrastructures, placing a growing demand on the capabilities of simulation software and on the
support services. The ScalaLife project has set out to address some of the specific problems associated
with this growth, acting along two distinct and complementary directions.

On the one hand, the project is concerned with the discrepancy between the scalability advances
made by e-Infrastructure projects such as PRACE/DEISA on large molecular systems and the
reality of the typical Life Science simulation, which works predominantly with small-to-medium
systems. Thus, ScalaLife is implementing new techniques for efficient small-system parallelisation,
developing new hierarchical approaches (explicitly based on ensemble and high-throughput
computing for new multi-core and streaming/GPU architectures) and establishing open software
standards for data storage and exchange.

On the other hand, the project is committed to the long-term support of the Life Science users and
communities, providing both training and expert advice. First, ScalaLife is documenting and
developing training material for the new techniques and data storage formats implemented by the
project. Second, the project has created a pilot for a cross-disciplinary Competence Centre, which
enables the Life Science community to exploit the key European applications developed as part of
the project as well as the existing European e-Infrastructures effectively.

By providing a training and support infrastructure and by developing an adequate framework and
associated policies to foster collaboration, the Competence Centre establishes a long-term structure
for the maintenance and optimisation of Life Science software.

The ScalaLife Competence Centre welcomes developers of bioinformatics applications for
partnership projects!

Prokaryotic and eukaryotic genomes each encode hundreds of membrane
transporter proteins that play important roles for the cellular import and export of ions,
small molecules or macromolecules. Therefore, the functional classification of
membrane proteins is an important task in genome annotation. Experimental knowledge
about transporter function has been compiled in databases such as TCDB, TransportDB,
and Aramemnon. An important research question for membrane biology is whether two
membrane transporters in organisms X and Y that show a certain sequence similarity
will have the same function or not. Previous computational work in this area includes,
e.g., the tool TransportTP (Xhao 2009) and work by Gromiha (2008). Prediction
methods often include features such as sequence homology, enriched motifs, and amino
acid properties. Interestingly, no study has so far critically analyzed the reliability margins
of the individual features.

Here, we provide a benchmarking study of the transferability of functional classifications
of membrane transporters between organisms. We have tested the method using the
transporters of the two model organisms E. coli and Arabidopsis thaliana. 157
experimentally validated transporter sequences from E. coli were obtained from
TransportDB and 156 such sequences from A. thaliana were obtained from the
Aramemnon database. The statistical significance of sequence similarity between an
input sequence and sequences in the training set was determined using the well-known
tools BLAST and HMMER. The MEME program suite was used to identify enriched
motifs in different transporter families. Later, the MAST program from the MEME suite
provided a score for statistically significant motifs identified in the unknown sequence. If
all 3 approaches (BLAST, HMMER, MEME) assigned membership to the same TC
family, this was considered a high confidence annotation.

We tested at which E-value annotations could be reliably transferred between E. coli and
A. thaliana. For this purpose we created subsets according to (1) TC families, (2)
substrate annotations and (3) substrates split into TC families. According to the TC
system, transporters of the two organisms are annotated to 47 different TC families (E.
coli) and 29 (A. thaliana). 14 TC families are shared and could be used for testing.
Concerning the first subset, E-values of 10^-4, 10^-3 and 10^-8 were identified as reliable
thresholds for the three classifiers BLAST, HMMER and MEME, respectively. Different
thresholds were discovered for the other subsets. To the best of our knowledge, these
results provide the first benchmarking study for the transfer of functional annotations for
the important class of membrane transporter proteins.
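
The consensus rule and the thresholds described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the input layout (one best hit per tool) and the use of "less than or equal to" at the cutoff are assumptions, and the example hits are invented.

```python
from typing import Dict, Optional, Tuple

# E-value cutoffs reported in the abstract for the TC-family subset.
THRESHOLDS: Dict[str, float] = {"BLAST": 1e-4, "HMMER": 1e-3, "MEME": 1e-8}


def consensus_family(hits: Dict[str, Tuple[str, float]]) -> Optional[str]:
    """Return the TC family only if all three tools agree and pass their cutoff.

    `hits` maps a tool name to its best (family, e_value) pair (hypothetical layout).
    Returns None when a tool is missing, a hit is too weak, or the tools disagree.
    """
    families = set()
    for tool, cutoff in THRESHOLDS.items():
        if tool not in hits:
            return None
        family, e_value = hits[tool]
        if e_value > cutoff:  # assumption: a hit exactly at the cutoff is accepted
            return None
        families.add(family)
    return families.pop() if len(families) == 1 else None


if __name__ == "__main__":
    # Invented example hits for a single query sequence.
    example = {
        "BLAST": ("2.A.1", 1e-30),
        "HMMER": ("2.A.1", 1e-12),
        "MEME": ("2.A.1", 1e-10),
    }
    print(consensus_family(example))
```

Requiring agreement of all three classifiers trades coverage for confidence: a query with one weak or conflicting hit is simply left unannotated.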
Comprehensive Analysis of Antibiotic Resistance Genes in River Sediment, Well
Water and Soil Microbial Communities Using Metagenomic DNA Sequencing

Johan Bengtsson*, Fredrik Boulund^, Erik Kristiansson^, DG Joakim Larsson*

* Dept. of Neuroscience and Physiology, University of Gothenburg, Sweden
^ Dept. of Mathematical Statistics, Chalmers University of Technology, Sweden


The development and spread of antibiotic resistance across the globe has emerged as one
of the most serious health problems of modern times, further accentuated by the slow
pace of development of antibiotics with new functional mechanisms. While the role of
antibiotics use and abuse in resistance development has been extensively investigated,
examination of the impact of environmental antibiotic pollution in promoting emergence
and dissemination of resistance genes has been limited. We have shown that the selection
pressure of antimicrobial agents can be exceptionally high in environments contaminated
by wastewater from antibiotic manufacturing facilities, creating the kind of extreme
conditions that likely could drive mobilization of resistance genes.

We have compared bacterial genes within microbial communities from river sediments
upstream and downstream of a treatment plant in India receiving wastewater from the
pharmaceutical industry, and releasing effluent containing high concentrations of several
antibiotics into a small river. We have previously characterized these metagenomes using
454 pyrosequencing; however, to get a more thorough view of the community
composition and the resistance gene content, we have now sequenced the same
communities using high throughput Illumina sequencing. In addition, we have sampled
soil from nearby farmland, as well as water from wells in villages affected by antibiotics
pollution. From the DNA extracted from these microbial communities, we have generated
more than 650 million paired-end reads, corresponding to between 15 and 20 million pairs
of reads per sample.

In these data, we can identify a wide range of resistance gene types. Preliminary analysis of
the resistance gene content reveals clear differences in abundances between upstream and
downstream samples; for example the sul2 and sul3 genes are much more commonly
encountered downstream from the treatment plant. In addition, in a nearby lake polluted
by dumping of industrial waste, we find further deviations from the resistance gene pattern
of the river communities, with, for example, a higher abundance of sul1. The preliminary
data also indicate that there are substantial differences in the prevalence of antibiotic
resistance genes between bacterial communities from different wells.

Short-read sequencing technologies enable broader screening for antibiotic
resistance genes in various environments, as the vast number of reads generated by e.g.
Illumina sequencing allows for far deeper studies than the fairly limited pyrosequencing
approaches. Thus, we are able to search also for relatively rare types of resistance genes.
However, some caution should be exercised, as the complexity of the sampled community
may be too large to generate sufficiently long stretches of DNA to accurately identify and
classify resistance genes and mobile genetic elements. Nevertheless, the material
investigated allows more precise studies of the effect on resistance promotion in microbial
communities, and consequently risks for further dissemination to human pathogens as a
result of antibiotic pollution from manufacturing sites.
Searching metagenomes to identify and discover mobile
fluoroquinolone antibiotic resistance genes using hidden Markov
models
Fredrik Boulund (1), Anna Johnning (2), Mariana B. Pereira (1), Joakim D.G. Larsson (2), Erik Kristiansson (1)
(1) Department of Mathematical Sciences, Chalmers University of Technology and University of
Gothenburg, SE-412 96 Göteborg, Sweden
(2) Department of Neuroscience and Physiology, the
Sahlgrenska Academy at the University of Gothenburg, Box 434, SE-405 30 Göteborg, Sweden
Antibiotics are one of our most powerful tools for treating bacterial infections and have since their
introduction vastly improved human health and drastically reduced mortality rates. However, the
growing use of antibiotics has brought increased resistance in pathogens. Bacteria can acquire
resistance either through chromosomal mutations or via horizontal transfer of antibiotic resistance
genes. It is believed that there exists a vast and unexplored environmental library of mobile
resistance genes called the resistome. Many antibiotics are derived from compounds produced by
organisms in the environment and bacteria have therefore developed natural protection mechanisms
against such substances. Not surprisingly, it has been shown that several of the clinically important
resistance genes originate from the environment. Fluoroquinolones are a family of widely used
broad-spectrum antibiotics of synthetic origin, thus lacking any known natural production system.
Consequently, it was originally believed that they would lack any natural resistance mechanisms.
However, a class of mobile fluoroquinolone resistance genes called qnr was recently discovered.
There are currently five known subclasses of plasmid-mediated qnr genes, with the last novel
subclass discovered as late as 2009. It is unknown whether more subclasses exist in the
environmental resistome. The Qnr proteins are pentapeptide repeat proteins that display a repeating
pattern of five amino acid residues. Based on this distinctive sequence feature we created a hidden
Markov model from the sequences of all currently known plasmid-mediated subclasses and variants.
To enable identification of novel qnr-like gene variants or subclasses, we developed a classifier to
discriminate between putative novel qnr sequence fragments and non-qnr fragments in
metagenomic data. Evaluation of the model’s performance showed that the statistical power for
correctly classifying fragments from a novel class of qnr genes was more than 94% for input
sequences as short as 100 nucleotides. We applied the model to several large datasets containing
both annotated (e.g. NCBI GenBank) and metagenomic sequences produced with high-throughput
sequencing technologies (e.g. CAMERA, Meta-HIT). Using our method, we were able to identify all
previously known qnr-genes, as well as several putative novel variants. In addition, we discovered
several sequences in the annotated data sources where we could correct and improve annotation.
Two-site mechanism for the allosteric modulation of pentameric
ligand-gated ion channels by anesthetics and alcohols
Torben Broemstrup (a,c), Rebecca Howard (d), Samuel Murail (c), James Trudell (e), Adrian Harris (d), Erik Lindahl (a,b)
(a) Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University,
SE-10961 Stockholm, Sweden
(b) Theoretical and Computational Biophysics, Kungliga Tekniska högskolan Royal Institute of Technology,
SE-10691 Stockholm, Sweden
(c) Institut Pasteur, Groupe Récepteurs-Canaux, and Centre National de la Recherche Scientifique, Unité de
Recherche Associée 2182, F-75015 Paris, France
(d) Waggoner Center for Alcohol and Addiction Research, The University of Texas at Austin, Austin, Texas,
United States of America
(e) Department of Anesthesia and Beckman Program for Molecular and Genetic Medicine, Stanford
University School of Medicine, Stanford, United States of America
Pentameric ligand-gated ion channels (pLGICs) of the Cys-loop family mediate fast
chemo-electrical transduction. General anesthetics and n-alcohols alter nerve signaling by interacting
with pLGICs. Despite mutagenesis and labeling studies, the relevant anesthetic binding sites
remain controversial, as modeling studies have proposed diverse intrasubunit and intersubunit
binding sites. The recent determination of the crystal structure of GLIC, a prokaryotic member of
the pLGIC family, enables structural studies to characterize the anesthetic and alcohol binding
sites. Although it comes from a lower organism, GLIC reproduces the bimodal n-alcohol modulation
of eukaryotic channels: methanol and ethanol potentiate the channel, whereas longer n-alcohols
inhibit it.
Site-directed mutagenesis studies and a chimera between GLIC and the human glycine
receptor identified the transmembrane domain as the alcohol binding location. A single mutation in
GLIC was identified which turns the volatile anesthetics desflurane and chloroform from
inhibitors into activators. Furthermore, this mutation increases ethanol potentiation and extends
n-alcohol potentiation up to hexanol, while longer-chain alcohols still inhibit the channel; in the
wild type, only methanol and ethanol are potentiating. To explain the increased potentiation of the
GLIC mutant, the exact interaction sites of general anesthetics and n-alcohols need to be
characterized and the binding to the different sites needs to be quantified.
To this end we apply atomistic MD simulations and the Free Energy Perturbation (FEP) method
to obtain binding free energies for desflurane and chloroform as well as n-alcohols in the intra- and
intersubunit binding sites of GLIC.
Our results demonstrate two independent binding sites for alcohols and anesthetics in GLIC, an
inhibitory intrasubunit site and a potentiating intersubunit site. For example, the free energies of
binding show that the wild-type inhibition by desflurane correlates with superior intrasubunit
binding of desflurane (intra: -21.8 ± 0.3 kJ/mol versus inter: -14.4 ± 0.6 kJ/mol), while the
potentiation-enhancing mutation makes desflurane intersubunit binding superior to intrasubunit
binding (intra: -19.7 ± 0.4 kJ/mol versus inter: -23.2 ± 0.5 kJ/mol). Similarly, binding affinities of
n-alcohols are increased in the intersubunit site by the mutation, correlating with the increased
potentiation of GLIC by n-alcohols.
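
The site preference implied by the desflurane numbers above can be made explicit as a free energy difference between the two sites. The sketch below is only illustrative: the function names are hypothetical, and combining the two reported uncertainties in quadrature assumes they are independent, which the abstract does not state.

```python
import math
from typing import Tuple

Estimate = Tuple[float, float]  # (binding free energy, uncertainty) in kJ/mol


def site_preference(intra: Estimate, inter: Estimate) -> Estimate:
    """Return (ddG, error) with ddG = dG_inter - dG_intra, in kJ/mol.

    A negative ddG means the intersubunit site binds more strongly.
    Errors are combined in quadrature (assumption: independent estimates).
    """
    ddg = inter[0] - intra[0]
    err = math.hypot(intra[1], inter[1])
    return ddg, err


def favoured_site(ddg: float) -> str:
    """Name the site with the more negative binding free energy."""
    return "intersubunit" if ddg < 0 else "intrasubunit"


if __name__ == "__main__":
    # Desflurane values quoted in the abstract (kJ/mol).
    cases = {
        "wild type": ((-21.8, 0.3), (-14.4, 0.6)),
        "mutant": ((-19.7, 0.4), (-23.2, 0.5)),
    }
    for label, (intra, inter) in cases.items():
        ddg, err = site_preference(intra, inter)
        print(f"{label}: ddG = {ddg:+.1f} +/- {err:.1f} kJ/mol -> {favoured_site(ddg)}")
```

With these inputs the wild type favours the inhibitory intrasubunit site by about 7.4 kJ/mol, and the mutant favours the potentiating intersubunit site by about 3.5 kJ/mol, both differences being several times larger than the combined uncertainty.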
In conclusion, we present a two-site model for the modulation of pLGICs with an inhibitory
intrasubunit site and a potentiating intersubunit site. Computational prediction of the binding
affinities gives quantitative support for the two-site model, demonstrating that differential binding
to both sites results in differential modulation of pLGICs.
If There’s an Order in All of This Disorder…: Structural Bioinformatics of the Human
Spliceosomal Proteome

Iga Korneta (1), Marcin Magnus (1), Janusz M. Bujnicki (1,2,*)

(1) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular
and Cell Biology, Warsaw, PL-02-109, Poland
(2) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Faculty of
Biology, Adam Mickiewicz University, Poznań, PL-61-614, Poland
* iamb@genesilico.pl

The spliceosome is one of the largest molecular machines known. It performs the
excision of introns from eukaryotic pre-mRNAs. In human cells it comprises five RNAs, over
one hundred “core” proteins and more than one hundred additional associated proteins. The
details of the spliceosome mechanism of action are unclear, because only a small fraction of
spliceosomal proteins have been characterized structurally in high resolution. To aid structural
and functional analyses of the spliceosomal proteins and complexes, and to provide a starting
point for multiscale modeling, we carried out a comprehensive structural bioinformatics
analysis of the entire spliceosomal proteome.
First, we discovered that almost half of the combined sequence of proteins abundant
in the spliceosome is predicted to be intrinsically disordered, at least when the individual
proteins are considered in isolation. The distribution of intrinsic order and disorder throughout
the spliceosome is uneven, and is related to the various functions performed by the intrinsic
disorder of the spliceosomal proteins in the complex. In particular, proteins involved in the
secondary functions of the spliceosome, such as mRNA recognition, intron/exon definition
and spliceosomal assembly and dynamics, are more disordered than proteins directly involved
in assisting splicing catalysis. Conserved disordered regions in splicing proteins are
evolutionarily younger and less widespread than ordered domains of essential splicing
proteins at the core of the spliceosome, suggesting that disordered regions were added to a
preexistent ordered functional core. The spliceosomal proteome contains a much higher
amount of intrinsic disorder predicted to lack secondary structure than the proteome of the
ribosome, another large RNP machine. This result agrees with the currently recognized
different functions of proteins in these two complexes.
For the ordered part of the spliceosomal proteome, we have carried out protein
structure prediction. We identified new domains in spliceosomal proteins and predicted 3D
folds for many previously known domains. We also established a non-redundant set of
experimental models of spliceosomal proteins and constructed in silico models for
regions without an experimental structure. Altogether, over 90% of the ordered regions of the
spliceosomal proteome can be represented structurally with a high degree of confidence. The
combined set of structural models for the entire spliceosomal proteome is available for
download from the SpliProt3D database (http://iimcb.genesilico.pl/SpliProt3D).
Finally, we analyzed the reduced spliceosomal proteome of the intron-poor organism
Giardia lamblia, and as a result, we proposed a candidate set of ordered structural regions
necessary for a functional spliceosome.
The results of this work enable multiscale modeling of the structure and dynamics of
the entire spliceosome and its subcomplexes and will have a profound impact on the
understanding of the molecular mechanism of mRNA splicing.

COMPREHENSIVE ANALYSIS OF UNIDENTIFIED LC-MS
FEATURES FOR INVESTIGATING PROTEINS DIVERSITY IN
HIGH-THROUGHPUT PROTEOMICS EXPERIMENTS


A.L. Chernobrovkin*, V.G. Zgoda, A.V. Lisitsa and A.I. Archakov
Institute of Biomedical Chemistry RAMS, Moscow, Russia
e-mail: chernobrovkin@gmail.com
*Corresponding author

Key words: single amino-acid polymorphisms; LC-MS; protein identification

Motivation and Aim
More than 65 thousand nsSNPs are known to exist in the human genome, and more than
20% of them are associated with various diseases. However, the vast majority of annotated
nsSNPs have not yet been observed at the protein level. Investigating disease-related
nsSNPs at the protein level can shed light on the molecular nature of diseases and provide
additional information for molecular biomarker discovery.
Methods and Algorithms
According to a recent estimate, only small proteomes can be analyzed properly using
high-accuracy LC-MS without MS/MS for peptide identification [1]. Within the
human proteome, only 20% of peptides can be properly identified using only accurate
parent mass and retention time data. Here we propose a new strategy for analyzing
unidentified LC-MS features, which significantly increases the sequence coverage of
proteins identified using MS/MS data and reveals protein variants caused by translation
of non-synonymous nucleotide polymorphisms. The method analyzes accurate m/z and
retention time data to assign theoretical peptides of proteins identified by
MS/MS to the unidentified LC-MS features. As an additional resource for
resolving ambiguity in feature annotation, we use quantitative data on protein
abundance changes during cell differentiation.
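
The matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the tolerances (5 ppm, 2 min), the availability of a predicted retention time for each theoretical peptide, and the example masses are all assumptions made for illustration. A feature is assigned to a theoretical peptide when both its m/z and its retention time fall within tolerance.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da, mass added per charge in positive-ion mode


@dataclass
class Feature:
    mz: float      # observed m/z of the unidentified LC-MS feature
    rt: float      # observed retention time (minutes)
    charge: int    # charge state assigned by feature detection


@dataclass
class Peptide:
    name: str            # label of the theoretical peptide (hypothetical)
    mono_mass: float     # theoretical monoisotopic neutral mass (Da)
    predicted_rt: float  # predicted retention time (minutes), assumed available


def theoretical_mz(mass: float, charge: int) -> float:
    """m/z of a peptide of neutral mass `mass` carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


def ppm_error(observed: float, expected: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - expected) / expected * 1e6


def match_features(features, peptides, ppm_tol=5.0, rt_tol=2.0):
    """Return (feature, peptide) pairs that agree in both m/z and retention time.

    The tolerances are illustrative defaults, not values from the abstract.
    """
    matches = []
    for feature in features:
        for peptide in peptides:
            expected = theoretical_mz(peptide.mono_mass, feature.charge)
            mz_ok = abs(ppm_error(feature.mz, expected)) <= ppm_tol
            rt_ok = abs(feature.rt - peptide.predicted_rt) <= rt_tol
            if mz_ok and rt_ok:
                matches.append((feature, peptide))
    return matches


if __name__ == "__main__":
    # Made-up numbers, for demonstration only.
    peptides = [Peptide("pep_A", 1000.5, 30.0)]
    features = [Feature(501.2573, 30.5, 2), Feature(501.3000, 30.0, 2)]
    for feature, peptide in match_features(features, peptides):
        print(peptide.name, feature.mz, feature.rt)
```

A feature that matches several peptides remains ambiguous in this sketch. The abstract resolves such cases with protein abundance changes, which are not modeled here.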
Results
We identified 1370 proteins in HL60 cells using LC-MS/MS (LTQ Orbitrap
Velos, Thermo Scientific) analysis of tryptically digested cell lysates. Quantitative
analysis performed with Progenesis-LC-MS software revealed 300
proteins whose abundance changed more than 3-fold during cell
differentiation. LC-MS chromatograms were reanalyzed to select those features
that could be matched to the tryptic peptides of the selected proteins and their variants. This
procedure gives a two- to three-fold increase in the sequence coverage of the selected
proteins. Additionally, we observed 38 features that match 17 SAP-specific proteotypic
peptides of identified proteins.
Conclusion
The proposed approach makes it possible to decrease the number of unassigned features in
LC-MS based proteomics experiments. Assigning additional features to previously
identified proteins increases protein sequence coverage and reveals
variant-specific proteotypic peptides.
References
1. P. Bochet et al. (2010) Fragmentation-free LC-MS can identify hundreds of proteins,
Proteomics, 11(1): 22-32.
Structural bioinformatics analysis of pre-mRNA editing complex in Trypanosoma brucei

Anna Czerwoniec (1), Joanna Kasprzak (1), Patrycja Bytner (1), Janusz M. Bujnicki (1,2)

(1) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam
Mickiewicz University, Umultowska 89, PL-61-614 Poznan, Poland
(2) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular
and Cell Biology, Ks. Trojdena 4, PL-02-109 Warsaw, Poland

Corresponding author: aczewo@amu.edu.pl

Key words
structural bioinformatics, pre-mRNA editing complex, Trypanosoma brucei

Abstract
Mitochondrial pre-mRNA in trypanosomatid kinetoplastids undergoes an editing process to
become a translatable molecule. Insertion and deletion of uridine nucleotides are catalyzed by up
to 20 proteins acting in a series of catalytic steps. Despite intensive research on editing
complexes, their complete structure and the interactions between their components remain
unknown. Here we present a structural analysis of the ~20S pre-mRNA editing complexes of
Trypanosoma brucei. We built homology models for components of the ~20S complexes and
gathered information about disordered regions of proteins and about macromolecular
interactions between individual elements and within whole editing complexes. We then used
PyRy3D, software developed in our group, to build and visualize very low-resolution 3D models
of large macromolecular complexes fitted into density maps. The procedure represents
components as experimental structures (e.g. X-ray or NMR models), structural models (e.g.
homology models) or flexible shapes, and applies a Monte Carlo approach to find solutions that
fulfill experimental restraints. All generated models were clustered, scored and ranked, and the
best complexes are presented. The results provide information about macromolecular
interactions in pre-mRNA editing complexes.


Acknowledgments
This analysis was funded by the Polish Ministry of Science and Higher Education (grant to
AC - number 0083/IP1/2011/71, grant to JK - N N301 123138).


The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding

Karl Kornacker (1), Morten Beck Rye (2), Tony Håndstad (2), and Finn Drabløs (2)

(1) Division of Sensory Biophysics, Ohio State University, Columbus, Ohio, USA
(2) Department of Cancer Research and Molecular Medicine, Norwegian University of Science
and Technology (NTNU), NO-7491 Trondheim, Norway


Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) is
the most frequently used method to identify the binding sites of transcription factors. Active
binding sites can be seen as peaks in enrichment profiles when the sequencing reads are
mapped to a reference genome. However, the profiles are normally noisy, making it
challenging to identify all significantly enriched regions in a reliable way, and with an
acceptable false discovery rate.

We have developed the Triform algorithm, an improved approach to automatic peak finding
in ChIP-Seq enrichment profiles for transcription factors. The method uses model-free
statistics to identify peak-like distributions of sequencing reads, taking advantage of improved
peak definition in combination with known characteristics of ChIP-Seq data. The statistical
test in Triform is fully nonparametric, i.e. free from any assumed relationships or fitted
parameters. In particular, the test is free from any assumed background model and is therefore
more robust than model-based tests, which depend on locally uniform background models and
fitted background parameters.

Triform outperforms several existing methods (i.e. MACS, Meta, QuEST, PeakRanger, PICS,
FindPeaks, and TPic) in the identification of representative peak profiles in curated
benchmark data sets for the transcription factors NRSF/REST, SRF and MAX [1]. We also
show that Triform in many cases is able to identify peaks that are more consistent with
biological function, compared with other methods. In particular, we test for properties that are
significantly associated with peak regions identified by Triform, MACS, Meta, QuEST,
PeakRanger and TPic, using statistical overrepresentation analysis. Finally, we show that
Triform can be used to generate novel information on transcription factor binding in repeat
regions, which represents a particular challenge in many ChIP-Seq experiments.


1. Rye MB, Saetrom P, Drablos F: A manually curated ChIP-seq benchmark
demonstrates room for improvement in current peak-finder programs. Nucleic Acids
Res 2011, 39(4):e25.
HAMP domains: implications for transmembrane signal transduction
Stanisław Dunin-Horkawicz (a,b), Andrei Lupas (b)

(a) International Institute of Molecular and Cell Biology, Warsaw, Poland
(b) Max Planck Institute for Developmental Biology, Tuebingen, Germany
Homodimeric receptors with one or two transmembrane (TM) segments per monomer are
universal to life and represent the largest and most diverse group of cellular TM receptors. They
frequently share domain types across phyla and, in some cases, have been recombined
experimentally into functional chimeras (e.g., the bacterial aspartate chemoreceptor with the
human insulin receptor), suggesting that they have a common mechanism. We have proposed a
model for the transduction mechanism by axial helix rotation, based on the structure of a widespread
domain, HAMP, that frequently occurs in direct continuation of the last TM segment. Here we
show by statistical analysis that HAMP domain sequences have biophysical properties compatible
with the two conformations proposed by the model. The analysis also identifies networks of
coevolving residues, which allow the mechanism to be subdivided into individual steps. The most
extended of these networks is specific for membrane-bound HAMP domains and most likely
accepts the signal from the TM helices. In a classification based on sequence clustering, these
HAMPs form a central supercluster, surrounded by smaller clusters of divergent HAMPs, which
typically combine into arrays of up to 31 consecutive copies and accept conformational input
from other HAMP domains.
Allele specific expression changes after induction
of inflammation
Recent advances in RNA and DNA sequencing technology have enabled a
more detailed picture of gene expression and genomic differences to emerge.
One particularly interesting aspect is the difference in expression between the
two different alleles of a gene within a single individual, one inherited from
the mother and one from the father. Any such allele specific expression (ASE)
could indicate an allele-specific cis-acting genetic factor. ASE thereby provides
an efficient means to explore the functional effects of genomic variation and can
help in identifying functional variants in the extensive conserved non-coding
part of the genome.
In this study we assessed ASE in human white blood cells with and without
treatment with the immune-inducing chemical LPS by performing RNA-seq on
several individuals. This allowed us to study ASE of transcripts that are potentially
of special importance in inflammation. Further, to find candidate haplotypes
responsible for observed allelic differences, we conducted whole genome
genotyping of the RNA source subjects. Preliminary results indicate that about
5% of all genes show ASE. A search for variants where a change in allele
specificity was induced by the treatment detected a total of 117 unique significant
variants among all individuals, of which ten variants were found in two
or more individuals. To our knowledge, ASE analysis coupled with differential
expression analysis of inflammatory induced cells has not previously been done.
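
The abstract does not state which statistical test was used to call ASE. As one common possibility, the sketch below applies an exact two-sided binomial test to the read counts of the two alleles at a heterozygous site, under the null hypothesis that both alleles are expressed equally. The function name and the choice of test are assumptions for illustration.

```python
from math import comb


def ase_pvalue(ref_count: int, alt_count: int) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Null hypothesis: each read comes from either allele with probability 0.5.
    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one. Integer arithmetic keeps the sum exact
    even for large read counts.
    """
    n = ref_count + alt_count
    observed_weight = comb(n, ref_count)
    # Under p = 0.5 every outcome i has probability comb(n, i) / 2**n,
    # so comparing binomial coefficients compares probabilities.
    tail_weight = sum(
        comb(n, i) for i in range(n + 1) if comb(n, i) <= observed_weight
    )
    return tail_weight / 2 ** n


if __name__ == "__main__":
    # Made-up read counts for one heterozygous site in two conditions.
    print("untreated:", ase_pvalue(48, 52))
    print("treated:  ", ase_pvalue(80, 20))
```

In a genome-wide analysis the resulting p-values would still need correction for multiple testing, and a treatment-induced change in ASE would require comparing the allele ratios between the two conditions. Neither step is shown here.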
Statistical assessment of gene group crosstalk enrichment in networks
Oliver Frings (1,2), Theodore McCormack (1,2,‡), Andrey Alexeyenko (1,3), Erik L.L. Sonnhammer (1,2,4)

(1) Stockholm Bioinformatics Centre, Science for Life Laboratory, Box 1031, SE-17121 Solna,
Sweden.
(2) Department of Biochemistry and Biophysics, Stockholm University
(3) School of Biotechnology, Royal Institute of Technology
(4) Swedish eScience Research Center
Abstract
Motivation
Analyzing groups of functionally coupled genes or proteins in the context of global interaction
networks has become an important part of bioinformatic analysis. Typically, one wants to analyze
the crosstalk, that is, the extent of connectivity between or within functional groups. However, this
is only meaningful if statistical significance of the measured crosstalk enrichment is assessed.
Results
We present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment
between pairs of gene or protein groups in large undirected biological networks. We demonstrate
that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the
ability of four different methods to recover crosstalk within known biological pathways and
estimate the confidence of the findings. We conclude that the methods preserving second-order
topological properties perform the best for crosstalk analyses.
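
The idea of a crosstalk z-score can be sketched in Python as follows. This is not the CrossTalkZ implementation, which is written in C++. The function names are hypothetical, and the null model used here (degree-preserving double-edge swaps) is a simplifying assumption. It does not preserve the second-order topological properties that the abstract reports as performing best.

```python
import random
from statistics import mean, stdev


def count_crosstalk(edges, group_a, group_b):
    """Number of undirected links with one end in group_a and the other in group_b."""
    return sum(
        1
        for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )


def rewire(edges, n_swaps, rng):
    """Randomize a simple undirected network while keeping every node degree.

    Repeatedly picks two links (a, b) and (c, d) and proposes (a, d) and (c, b).
    Proposals that would create a self-loop or a duplicate link are rejected.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # choose one of the two possible rewirings
            c, d = d, c
        if a == d or c == b:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 == new2 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def crosstalk_z(edges, group_a, group_b, n_random=200, seed=0):
    """Return (observed, null mean, null sd, z) for crosstalk between two groups."""
    rng = random.Random(seed)
    observed = count_crosstalk(edges, group_a, group_b)
    null = [
        count_crosstalk(rewire(edges, 10 * len(edges), rng), group_a, group_b)
        for _ in range(n_random)
    ]
    mu, sigma = mean(null), stdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, sigma, z
```

A positive z indicates more links between the two groups than expected in the randomized networks. Converting z to a p-value assumes an approximately normal null distribution, which is why the abstract lists a test of normality among the reported statistics.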
Availability and Implementation
CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is
implemented in C++ and is fast, accepts various input file formats, and produces a number
of statistics. These include z-score, p-value, false discovery rate, and a test of normality for
the random distribution.
Associate Professor David E. Gloriam
University of Copenhagen, Department of Drug Design and Pharmacology, Universitetsparken 2,
2100 Copenhagen, E-mail: dg@farma.ku.dk

Chemogenomic Discovery of Allosteric Antagonists
at the GPRC6A Receptor
We have integrated chemogenomic ligand inference, homology modeling, compound synthesis, and
pharmacological mechanism-of-action studies to discover the most selective GPRC6A allosteric
antagonists reported to date (1). GPRC6A is a Family C G protein-coupled receptor recently discovered
and deorphanized by the Bräuner-Osborne group at University of Copenhagen. Three compounds with
at least ~3-fold selectivity for GPRC6A were discovered, which represent a significant step forward
compared with the previously published GPRC6A antagonists, calindol and NPS 2143, which instead
are ~30-fold selective for the calcium-sensing receptor. The antagonists constitute novel research tools
toward investigating the signaling mechanism of the GPRC6A receptor at the cellular level and serve
as initial ligands for further optimization of potency and selectivity enabling future ex vivo/in vivo
pharmacological studies.

Our chemogenomic lead identification is, to our knowledge, the first ligand inference between two
different GPCR families, Families A and C. The unprecedented inference of pharmacological activity
across GPCR families provides proof-of-concept for in silico approaches against Family C targets
based on Family A templates, greatly expanding the prospects of successful drug design and discovery.
Furthermore, ongoing work on the application of the chemogenomic method to a large number of
orphan receptors and drug targets will be described. Finally, a novel bioinformatic method for the
identification of endogenous peptide ligands will be presented.

(1) Gloriam, D. E. et al., Chem. Biol. 2011, 11, 1489-1498.