Ontologizer Tutorial and Exercises


Politecnico di Milano

School of Information Engineering

Master Degree in Information Engineering

Course “Bioinformatics and Computational Biology for Medicine”




Arif Canakoglu

canakoglu@elet.polimi.it


(This tutorial was taken from the Ontologizer web application help.)

Ontologizer Tutorial

You can run the application from the link below (start it by clicking the Java Web Start button):

http://compbio.charite.de/contao/index.php/ontologizer2.html


Setting Up the Ontologizer

1. Download the file http://compbio.charite.de/tl_files/ontologizer/examples/yeastSampleFiles.zip and unpack it into a directory of your choice, e.g., the desktop. The archive contains sets of genes up- or downregulated following treatment with sulfometuron methyl, which is an inhibitor of amino acid biosynthesis. The data were gathered from Jia et al. (2000), "Global expression profiling of yeast treated with an inhibitor of amino acid biosynthesis, sulfometuron methyl".

2. If you need a proxy to access the internet, please open the Preferences Window via the Window > Preferences... menu entry within Ontologizer.


3. Enter your proxy configuration in the appropriate line and then press Ok.


File Sets





Each File Set contains a definition file and an association file, and each project uses one File Set to perform its analysis. This lets users manage different file configurations easily, for instance by using different versions of the definition file for different projects. It can also be useful to keep the same version of the definition file (which is frequently updated at the GO website) during development.

User-Supplied Files



In order to analyze your experimental data, you need to prepare one file for each of the groups of interest in your experiment (for instance, a list of genes differentially expressed at different time points). We refer to such groups as study sets. Additionally, you need to indicate the population set. In general, this will be a list of all genes that were tested (for instance, all genes represented on a microarray). The genes should be listed one per line in plain text. Alternatively, FASTA files can be used, provided the name of the gene directly follows the '>' sign.
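The two input layouts described above can be illustrated with a short Python sketch. It is not part of the Ontologizer; the function name read_gene_names and the sample gene names are hypothetical. It also assumes that, in FASTA headers, the gene name ends at the first whitespace character, which the text above does not state.

```python
from typing import Iterable, List


def read_gene_names(lines: Iterable[str]) -> List[str]:
    """Collect gene names from plain-text or FASTA-style input lines.

    Plain text: one gene name per line.
    FASTA: the gene name directly follows the '>' sign; sequence lines
    are ignored. Assumption: the name ends at the first whitespace.
    """
    stripped = [line.strip() for line in lines]
    # Treat the input as FASTA if any line starts with '>'.
    is_fasta = any(line.startswith(">") for line in stripped)

    names: List[str] = []
    for line in stripped:
        if not line:
            continue  # skip empty lines
        if is_fasta:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    names.append(tokens[0])
        else:
            names.append(line)
    return names


if __name__ == "__main__":
    # Hypothetical sample inputs, one per supported layout.
    plain = ["YAL001C\n", "\n", "YBR002W\n"]
    fasta = [">YAL001C some description\n", "ATGC\n", ">YBR002W\n", "GGCC\n"]
    print(read_gene_names(plain))
    print(read_gene_names(fasta))
```

A helper like this can be handy for checking that a study set is a subset of the population set before loading both into the application.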



For this tutorial, you can download the yeast study and population files from the Ontologizer website: http://www.charite.de/ch/medgen/ontologizer/howto/index.html. Unpack these files before use.

Creating a New Project

1. In order to create the new project, press the New Project button within the toolbar of Ontologizer's main window or select the Project > New > Project... menu entry.


2. This brings up the New Project Wizard. First enter a name for the project. For the tutorial, enter sulfometuronMethyl into the Project Name text field, then press the Next button to proceed to the next page.




3. Here you need to indicate the definition file (via the Ontology text field) and the association file (via the Association text field). The Ontologizer comes with predefined File Sets for frequently used species that can be downloaded automatically. We have downloaded the File Set for Yeast above. If we hadn't, the Ontologizer would now automatically download these files in the background. For this tutorial, click on the File Set combo box and choose Yeast. Then press Next, which brings you to the Population Edit page.


4. Now enter the genes of the population set. Use the study set/population set example files downloaded from the Ontologizer homepage as described above. Drag & drop the file called population.txt into the gene editor field, or use a file selection dialog by clicking on the Append Set... button. Notice that the names of genes with GO annotations are highlighted (you may have to wait for downloads or parsing to complete before the highlighting appears). You can hover the mouse over these entries to see more information about the gene's annotation. Proceed by clicking on the Next button.


5. Drag & drop a study file into the editor area (again, you can alternatively use a file selection dialog by clicking on the Append Set... button). Press Next and repeat the procedure for each study set (file).


6. Press Finish when you have added the last study set. The New Project Wizard window closes and you should now see your new project sulfometuronMethyl appearing in the main window.



Performing the Analysis

The Ontologizer offers multiple methods for searching for GO term overrepresentation and for multiple testing correction. For more information on these topics, please consult the Ontologizer homepage, where you will also find links to publications describing the Ontologizer. For the purposes of this tutorial, we will use the Parent-Child-Union method with a Bonferroni multiple testing correction.

1. Within the main window, select our project, which is sulfometuronMethyl.


2. From the combo boxes in the tool bar, choose Parent-Child-Union as the calculation method (first combo box) and Bonferroni as the multiple test correction (second combo box). Then press Analyze.
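To make the Bonferroni correction selected above more concrete, here is a minimal Python sketch of the textbook procedure: each raw p-value is multiplied by the number of tests and capped at 1. The function name bonferroni_adjust, the term labels, and the p-values are hypothetical, and the sketch assumes the number of tests equals the number of terms passed in. How the Ontologizer counts tests internally is not described here.

```python
from typing import Dict


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Return Bonferroni-adjusted p-values, keyed like the input.

    Assumption: the number of tests is the number of entries given.
    """
    n_tests = len(p_values)
    # Multiply each p-value by the number of tests and cap at 1.0.
    return {term: min(1.0, p * n_tests) for term, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical raw p-values for three terms.
    raw = {"term_a": 0.01, "term_b": 0.5, "term_c": 0.001}
    adjusted = bonferroni_adjust(raw)
    significance_level = 0.05
    for term, p in adjusted.items():
        flag = "significant" if p < significance_level else "not significant"
        print(term, p, flag)
```

Because the adjustment grows with the number of tests, terms that look significant without correction may no longer pass the threshold afterwards, which is worth keeping in mind for the exercises at the end of this tutorial.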

Exploring the Results

1. The Results Window now appears. Depending on the size and number of the study sets and the type of multiple testing correction desired, the analysis should complete in a few seconds to a few minutes. As individual study sets are completed, new tabs appear with the results. If you have used all the files of this tutorial, you should see seven tab folders corresponding to the names of the study sets once the analysis is completed. The first study set is activated, and within the tab folder the results are presented in the form of a table.





2. Notice that the background of terms whose adjusted p-value falls below the significance level (as given by the widget below the table) is colored according to the sub-ontology and the rank: significant terms from biological process are shown in green, terms from molecular function in yellow, and terms from cellular component in magenta.

3. Now click on one of the terms, e.g., amino acid and derivative. This refreshes the browser in the bottom part of the window to show information about the term, including its parents (more general terms), its children (more specific terms), and the names of the genes to which the term is annotated.

4. To get a graphical overview, press the Preview Graph button (the third from the left in the toolbar).

5. The graph consists of all active terms, as defined by the little checkboxes before every term, which by default are all significant terms.

6. The parameters of the graph view (i.e., the zoom factor and which extent is displayed) can be altered via the buttons or the context menu commands.

Question 1: Try the same analysis without any correction and compare the results.

Question 2: Try the same analysis with different correction methods and try to explain why the number of elements found differs between the cases.

Question 3: Use different Statistical Analysis methods and compare them.

Note: Statistical Analysis controls the method by which the annotated genes or gene products in the study set are analyzed for GO term overrepresentation with respect to the population set. The standard method has been to calculate the upper tail of the hypergeometric distribution (one-sided Fisher exact test) for each term separately. The Ontologizer also provides analysis by means of the parent-child approach, which has several advantages compared to the standard approach (see the Ontologizer homepage for further details and references).
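
As an illustration of the standard method mentioned in the note, the following Python sketch computes the upper tail of the hypergeometric distribution for a single term. It is only a sketch of the general formula, not the Ontologizer's implementation; the function name, parameter names, and example counts are hypothetical. It requires Python 3.8 or later for math.comb.

```python
from math import comb


def hypergeom_upper_tail(pop_size: int, pop_annotated: int,
                         study_size: int, study_annotated: int) -> float:
    """Probability of seeing at least `study_annotated` annotated genes
    when `study_size` genes are drawn without replacement from a
    population of `pop_size` genes, `pop_annotated` of which carry the term.
    """
    total = comb(pop_size, study_size)
    # The count of annotated genes in the study set cannot exceed either bound.
    max_hits = min(pop_annotated, study_size)
    tail = sum(
        comb(pop_annotated, k) * comb(pop_size - pop_annotated, study_size - k)
        for k in range(study_annotated, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 population genes, 4 annotated to the term,
    # 3 genes in the study set, 2 of them annotated to the term.
    print(hypergeom_upper_tail(10, 4, 3, 2))
```

A small p-value from this calculation indicates that the term is annotated to more study set genes than would be expected by chance given the population set. The parent-child approach is not covered by this sketch.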