RNACluster: An integrated tool for RNA secondary structure comparison and clustering

http://csbl.bmb.uga.edu/publication/materials/qiliu/RNACluster.html


Qi Liu, V. Olman, Huiqing Liu, Xiuzi Ye, Shilun Qiu, Ying Xu. RNACluster: An integrated tool for RNA secondary structure comparison and clustering. To be submitted to the Journal of Computational Chemistry.

Version 1.0, October 2007



Qi Liu

Computational System Biology Lab

The University of Georgia

Athens, GA 30602

USA

RNACluster documentation (11/9/13)

Contents

1. Overview

2. Obtaining and installation of RNACluster
2.1. Obtaining RNACluster
2.2. Installing RNACluster

3. Tutorial
3.1. Functions and features
3.2. Inputs
3.3. Outputs
3.4. Interface
3.5. A simple example

4. References



1. Overview

RNACluster is an integrated software tool that implements six common structure distances for measuring the (dis)similarity of RNA secondary structures: base pair distance (Ding et al., 2005); mountain distance (Moulton et al., 2000); morphological distance (Moulton et al., 2000; Voß et al., 2004); tree edit distance (Shapiro and Zhang, 1990); string edit distance (Shapiro, 1988; Shapiro and Zhang, 1990); and our in-house structure matrix distance (Qi Liu et al., Fuzzy kernel clustering of RNA secondary structures using a novel similarity metric, paper submitted). It also provides an effective algorithm for clustering structure samples, based on the minimum spanning tree concept of graph theory (Xu et al., 2002; Olman et al., 2003).
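As an illustration of the simplest of these metrics, the sketch below computes a base pair distance between two dot-bracket structures. It assumes the common definition (the number of base pairs present in exactly one of the two structures); the function names parse_pairs and base_pair_distance are hypothetical and are not taken from RNACluster, whose implementation may differ in detail.

```python
def parse_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)                 # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.add((stack.pop(), j))     # pair with the most recent '('
        elif symbol != ".":
            raise ValueError("unexpected character %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def base_pair_distance(a, b):
    """Count base pairs present in exactly one of the two structures.

    Assumption: both structures belong to the same sequence, so they
    must have the same length.
    """
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return len(parse_pairs(a) ^ parse_pairs(b))
```

For example, base_pair_distance("((..))", "(....)") counts the single pair (1, 4) that only the first structure contains.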

This tool can be used to study the characteristics of RNA secondary structures, RNA structure conformational switches, and RNA conformational energy landscapes, and to predict RNA secondary structure based on the clustering of structure samples.


RNACluster is compiled for Windows and Linux and runs on both platforms.



Contact: qiliu@csbl.bmb.uga.edu






















2. Obtaining and Installation of RNACluster

2.1. Obtaining RNACluster

RNACluster can be downloaded at:

http://csbl.bmb.uga.edu/publications/materials/qiliu/RNACluster.html

The website contains:

Documentation (as a PDF file and as a Word file).

Three versions of RNACluster: a Windows console version, a Windows graphical version, and a Linux version.

Other related materials.



2.2. Installing RNACluster

For Linux users: download the Linux version package and follow the README file to install it.

For Windows users installing the graphical version: download setup.exe, run it, and follow its prompts to install RNACluster. The installer also creates shortcuts to the main program on the desktop and in the Start menu. The installer interface is shown in Figure 1.





Figure 1: Interface of the RNACluster Setup program


3. Tutorial

3.1. Functions and features


1. Integrates six different distances for measuring the (dis)similarity of RNA secondary structures.

2. Clusters RNA structure samples based on the minimum spanning tree (MST) algorithm.

3. Visualizes the MST construction procedure and plots the edge distances in the order in which Prim's algorithm selects them (the MST curve).

4. Identifies all possible clusters for the given structure ensemble and derives useful information about each cluster (cluster number, size of each cluster, p-value of each cluster, parent cluster index / number of children of the cluster, centroid of each cluster, lowest-energy structure in each cluster, compactness of each cluster, etc.).

5. Uses multithreading to exploit the processing power of Windows systems with multiple CPUs or multi-core CPUs, giving better speed when calculating distances for, and clustering, large sets of structure samples.

6. Friendly graphical interface.
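The MST curve mentioned in feature 3 can be sketched in a few lines of Python. This is a minimal illustration, not RNACluster's own code: it assumes the distance matrix is a full symmetric list of lists, and the function name prim_mst_curve and its start parameter are hypothetical.

```python
def prim_mst_curve(dist, start=0):
    """Run Prim's algorithm on a full distance matrix.

    Returns the MST edges as (parent, child, weight) tuples in the order
    Prim's algorithm selects them; the weights in that order form the
    MST curve.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n    # cheapest known link from the tree to each node
    parent = [-1] * n
    in_tree[start] = True
    for v in range(n):
        if v != start:
            best[v], parent[v] = dist[start][v], start
    edges = []
    for _ in range(n - 1):
        # select the node outside the tree that is closest to the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        edges.append((parent[u], u, best[u]))
        # the new node may offer cheaper links to the remaining nodes
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return edges
```

Plotting the third element of each returned tuple against its position in the list gives a curve of the kind shown in Figure 5; the clustering step that reads clusters off this curve is not reproduced here.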


3.2. Inputs

The input to RNACluster should be an RNA structure ensemble containing a set of suboptimal secondary structures. The suboptimal structures can be sampled within a user-defined energy range above the minimum free energy (MFE), or sampled randomly according to their Boltzmann weights (Ding et al., 2005). Several tools can generate such suboptimal structures, including the Vienna RNA package (Hofacker et al., 1994), MFold (Zuker, 1981), RNAshapes (Giegerich et al., 2004), and SFold (Ding and Lawrence, 2001), among others.

The format of the input file is straightforward, as shown in Figure 2: it begins with one RNA sequence on the first line, followed by the dot-bracket notation of each structure on the following lines.





Figure 2: An example of input file

Note:

1). The first line of the input file should be the RNA sequence, with no tabs, spaces, or line wraps in the sequence. (Make sure to leave no tabs or spaces at the end of the line.)

2). The following lines contain the structure samples to be analyzed, in dot-bracket notation, one structure per line. (Make sure to leave no tabs or spaces at the end of each line.)

3). The sequence may contain only the characters A/a, U/u, G/g, C/c, and T/t. Any other character in the sequence triggers an error notification in the program.

4). Click the OPEN FILE button to open the dialogue for selecting the input file.
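The format rules in notes 1) to 3) can be checked before a file is loaded. The sketch below is a hypothetical pre-check written for this documentation, not part of RNACluster. It makes three assumptions that the notes do not state: empty lines are ignored, every structure must have the same length as the sequence, and brackets must be balanced.

```python
ALLOWED_BASES = set("AUGCTaugct")


def parse_ensemble(lines):
    """Validate a sequence line followed by one dot-bracket structure per line.

    Returns (sequence, structures) or raises ValueError on the first problem.
    Stray tabs or spaces are rejected because they are not valid symbols.
    """
    rows = [line.rstrip("\r\n") for line in lines]
    rows = [row for row in rows if row]           # assumption: skip empty lines
    if not rows:
        raise ValueError("empty input")
    sequence, structures = rows[0], rows[1:]
    bad = set(sequence) - ALLOWED_BASES
    if bad:
        raise ValueError("invalid sequence characters: %s" % sorted(bad))
    for number, structure in enumerate(structures, start=1):
        if set(structure) - set("()."):
            raise ValueError("structure %d: not dot-bracket notation" % number)
        if len(structure) != len(sequence):       # assumption: lengths must match
            raise ValueError("structure %d: length differs from sequence" % number)
        depth = 0
        for symbol in structure:
            depth += {"(": 1, ")": -1, ".": 0}[symbol]
            if depth < 0:                         # a ')' closed nothing
                break
        if depth != 0:
            raise ValueError("structure %d: unbalanced brackets" % number)
    return sequence, structures
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example parse_ensemble(open("example.txt")).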

3.3. Outputs

All output files are stored in the same directory as the input file. There are two types of output files: the distance matrices and the final clustering result.

1. A separate distance matrix output file is written for each distance metric. The corresponding output files are:

a). Base pair distance: basepair_distance.txt

b). Mountain distance: mountain_distance.txt

c). Morphological distance: morphological_distance.txt

d). Tree edit distance: treeedit_distance.txt

e). String edit distance: stringedit_distance.txt

f). Structure matrix distance: structurematrix_distance.txt

2. The clustering result is written to cluster_result.txt. This file presents useful information about the clustering of the given structure ensemble.

3. A list box on the main interface of the software displays information about the running status of the software.



Note:

The distance matrix output files follow the input format of the programs fitch, kitsch, and neighbor in the PHYLIP package (Felsenstein, 1989), which can be used to construct a phylogenetic tree of the input RNA structure samples.
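For readers unfamiliar with that format, the sketch below writes a square distance matrix in the general PHYLIP layout: a first line with the number of entries, then one row per entry consisting of a name padded to ten characters followed by its distances. The labels S1, S2, ... and the four-decimal formatting are assumptions made for illustration; the exact labels and number format that RNACluster writes are not specified here.

```python
def format_phylip_matrix(dist, names=None):
    """Format a square distance matrix in the PHYLIP distance-matrix layout."""
    n = len(dist)
    if names is None:
        # hypothetical labels; RNACluster's own naming is not documented here
        names = ["S%d" % (i + 1) for i in range(n)]
    lines = ["%5d" % n]                            # first line: number of entries
    for name, row in zip(names, dist):
        label = name[:10].ljust(10)                # names occupy ten columns
        lines.append(label + " ".join("%.4f" % value for value in row))
    return "\n".join(lines) + "\n"
```

The returned string can be saved to a file and supplied as the input file to fitch, kitsch, or neighbor.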


3.4. Interface

The software provides a friendly interface with three operation areas (Figure 3):

MAIN WINDOWS:

OPEN FILE: read the input file

PARAMETER SETTING: choose the distance metric and set other clustering parameters

DISTANCE CALCULATION: compute the distance matrix

CLUSTERING: cluster the given structure ensemble based on the minimum spanning tree algorithm

CLUSTER VISUALIZATION: draw the MST curve

CLEAR: clear the running information in the list box

PROGRESS:

A progress button shows the running progress during distance calculation or clustering

OPERATION:

ONLINE HELP: link to the online help file

ABOUT: a brief introduction to the software

EXIT: exit






Figure 3: Main interface of RNACluster

3.5. A simple example


1. Start RNACluster by clicking the software icon on the desktop. This opens the main window of RNACluster.

2. Click OPEN FILE to select the input file and read it. For this example, use the file example.txt in the program's installation directory, which contains 13 structure samples of an RNA attenuator.

3. After the input file is read, the DISTANCE CALCULATION button is activated. Click this button to compute the distance matrix for the given ensemble. The default distance metric is base pair distance; click the PARAMETER SETTING button to change the distance metric (Figure 4). Users can also set the minimum cluster size before clustering. In addition, if FOCUSED STRUCTURE ID is set (0 means no structure is designated), the point corresponding to the focused structure is highlighted when the MST curve is drawn (Figure 5).


4. After the distance calculation finishes, the CLUSTERING and CLUSTER VISUALIZATION buttons are activated. Click them to cluster the given structure ensemble or to view the MST curve of the clustering (Figure 5).

5. The output files for the distance matrix and the clustering result are written to the same directory as the input file.









Figure 4: Parameter setting dialogue













Figure 5: MST curve

4. References

Voß, B., Meyer, C. and Giegerich, R. (2004). Evaluating the predictability of conformational switching in RNA. Bioinformatics, pages 1573-1582.

Ding, Y. and Lawrence, C.E. (2001). Statistical prediction of single-stranded regions in RNA secondary structure and application to predicting effective antisense target sites and beyond. Nucleic Acids Res. 29: 1034-1046.

Ding, Y., Chan, C.Y. and Lawrence, C.E. (2005). RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA 11: 1157-1166.

Felsenstein, J. (1989). PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.

Hofacker, I.L., Fontana, W., Stadler, P.F., et al. (1994). Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 125: 167-188.

Olman, V., Xu, D. and Xu, Y. (2003). CUBIC: identification of regulatory binding sites through data clustering. J Bioinform Comput Biol. 1(1): 21-40.



Giegerich, R., Voß, B. and Rehmsmeier, M. (2004). Abstract shapes of RNA. Nucleic Acids Research 32(16): 4843-4851.

Shapiro, B.A. (1988). An algorithm for comparing multiple RNA secondary structures. CABIOS 4: 381-393.

Shapiro, B.A. and Zhang, K. (1990). Comparing multiple RNA secondary structures using tree comparison. CABIOS 6: 309-318.

Moulton, V., Zuker, M., Steel, M., et al. (2000). Metrics on RNA secondary structures. Journal of Computational Biology, pp. 277-292.

Xu, Y., Olman, V. and Xu, D. (2002). Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics 18: 536-545.

Zuker, M. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucl. Acids Res. 9: 133-148.