

Oct 2, 2013



Bioinformatics in Functional genomics

Martti Tolvane

IMT Bioinformatics

University of

Course basics

extent: 4 cp, or 80 hours of work

duration: 12 weeks max.

starting: whenever it fits your personal
schedule, excepting holiday periods
when the application system is closed

Genome-wide bioinformatics (1)

comparative genomics

genome analyses

evolution studies

analyses of components in a ”complete” system

functional genomics = inferring functions from data

expression patterns, gene regulation

sequence comparisons, homologue relationships

studies of gene variation, altered phenotypes

Genome-wide bioinformatics (2)


expression proteomics = differential proteomics

interaction proteomics

functional proteomics

= systematic perturbation or
functional inactivation of proteins in a given

structural proteomics (with a frequently used
misnomer: structural genomics)

Topics of our FG course


Gene variations

DNA microarrays


Course topics


genome projects

genome annotation

analysis and prediction of functions and
orthologous genes

gene identification and prediction

Course topics (3)

gene variations

mutation data banks

DNA microarrays

data manipulation


data mining

Course topics


expression proteomics

2D electrophoresis

mass spectrometry


removed/inactivated genes (”knock-outs”)

RNA Interference (RNAi)

strongly emerging

structural proteomics

interaction proteomics

metabolic networks

How the course works

you need to choose your own focus
areas and plan your schedule at the
start of the course

main chapters (2.-5.) can be taken in
any order

a quick look at all material may help
you to decide how to proceed

How the course works (2)

all chapters provide learning goals and
exercises, but none of them are

it is up to each student to find
appropriate tasks to help them achieve
the goals they set for themselves

you need to document your course
work in the Learning diary

Learning diary

material you can enter in your Learning diary:

summaries of new things you have learned
and/or feel to be important

solutions to exercises, with descriptions of
how you found the solutions

article and Internet references which you have
studied carefully (especially ones you have
found outside the course material)

what you
write down: time you
spent each day

Goals of your Learning diary

documents your presence and the time you
spent in the course

you know when it is
time to finish

deeper learning when you produce texts of
your own from what you have read and done

you need to reserve a lot of time for diary
work; perhaps even half of your course time
should be spent in tasks aiming at specific
diary entries

Genomes and their annotation

complete genomes of many organisms
are available

goal: ”system-wide” understanding of
the biology of a given organism

= seeing ”parts lists” of everything an
organism needs, and figuring out how
they work together

Genomes and their annotation

gene finding is not always straightforward

problem: rare gene products, for which
you cannot find corresponding mRNA or
protein sequences in databanks

additional complication: alternative
splicing, many transcripts per gene
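
To illustrate why gene finding is not straightforward, here is a minimal sketch of the most naive approach: scanning the forward strand for open reading frames (ATG up to the next in-frame stop codon). The function name `find_orfs` and the `min_codons` parameter are hypothetical, chosen only for this illustration. The sketch ignores introns, alternative splicing, the reverse strand and alternative start codons, which is exactly where real gene prediction gets difficult.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Naive ORF scan on the forward strand only (illustrative, not a gene finder).

    Returns a list of (start, end, frame) tuples. Coordinates are 0-based,
    end-exclusive, and the reported span includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3  # counts the stop codon too
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence: one short ORF (ATG AAA TTT TAG) starting at position 2.
print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 2)]
```

A scan like this reports every ATG-to-stop stretch above the length cutoff, so on real genomic sequence it would produce many false candidates and miss spliced genes entirely.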

Genomes and their annotation

if you intend to analyze or just use data
from a databank, it is useful to know both
the goals and the reality of their
annotation level

inconsistencies, missing data

even well-annotated databanks provide
only a fraction of the biologically relevant
information on a gene or a molecule
(compared to the literature)

Annotation: a vision

databank content: all knowledge on functions
of a gene product

add structural information

insights into structure-function relationships

add data on expression patterns and regulation

understanding cell differentiation and other
big questions in biology at the molecular level

Introduction to DNA microarrays

massive data sets from simultaneous
expression levels of thousands of genes

impossible to grasp directly by the
human mind

methods are needed for finding
meaningful results and patterns from
the bulk of data

DNA microarray bioinformatics

data manipulation: normalization etc.

data clustering

genes which behave in a similar fashion

sample classification by profiles of predictive
genes (e.g. cancer typing)

data mining:

finding interpretations of clustering results

example: recognition of regulatory factor binding
sites in coexpressed genes

data from a
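
The data manipulation and clustering steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the intensities are invented toy values, the normalization is a simple log2 transform with per-array median centering, and the grouping is a greedy correlation threshold rather than a real clustering method such as hierarchical clustering or k-means. The function names and the 0.9 threshold are hypothetical.

```python
import math
from statistics import mean, median


def log2_median_center(raw):
    """raw: dict gene -> list of intensities, one value per array.

    Log2-transforms all values and subtracts each array's (column's) median,
    so that arrays become comparable with each other.
    """
    genes = list(raw)
    logs = {g: [math.log2(v) for v in raw[g]] for g in genes}
    n_arrays = len(logs[genes[0]])
    col_medians = [median(logs[g][j] for g in genes) for j in range(n_arrays)]
    return {g: [logs[g][j] - col_medians[j] for j in range(n_arrays)]
            for g in genes}


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is flat)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def group_coexpressed(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose first member
    (the seed) correlates with it at or above the threshold."""
    groups = []
    for gene, profile in profiles.items():
        for group in groups:
            if pearson(profile, profiles[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


# Invented toy data: four genes measured on four arrays.
raw = {
    "geneA": [100, 200, 400, 800],
    "geneB": [50, 100, 200, 400],
    "geneC": [800, 400, 200, 100],
    "geneD": [400, 200, 100, 50],
}
normalized = log2_median_center(raw)
print(group_coexpressed(normalized))
# [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Genes that end up in the same group are candidates for the data mining step, for example searching their upstream regions for shared regulatory factor binding sites.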

Introduction to Proteomics

as in the transcriptome, composition of the
proteome depends on cell type,
developmental phase and conditions

proteome analyses are still struggling to solve
the ”basic proteome” of different cells and
tissues or limited changes under changing
conditions or during processes

current methods can only ”see” the most
abundant proteins

Proteomics experiments

typically a combination of 2D protein
electrophoresis and mass spectrometry

labor-intensive, not really ”high-throughput” methods

more efficient ”protein array” methods
are emerging

Two-dimensional electrophoresis

Bioinformatics in proteomics