# Bioinformatics

Βιοτεχνολογία

2 Οκτ 2013 (πριν από 4 χρόνια και 7 μήνες)

85 εμφανίσεις

Computational Questions

Bioinformatics

Where CS and Biology Meet

Bioinformatics: Applications of CS to
the life sciences

What are the computational issues?

Storage and retrieval of genetic data, data
mining, tools

Analysis of genetic data: similarities,
differences, structure

Processing experimental data

Problem Solving in

Computer Science

Program: Sequence of instructions that
perform a particular

data (input), produce results (output)

From problems to programs

Formulate the problem

Develop and verify an
algorithm

Write and test the program

Algorithm Analysis

Algorithm: Conceptual/theoretical form
of a program

What is analyzed?

Correctness: does it solve the problem?

Complexity: how much resources (time
and memory) does it consume?

Tradeoffs: sometimes, we need to sacrifice
correctness for efficiency

Example 1: Searching for an
Element in a List

Problem formulation:

Input: sorted list L of n elements (e.g.,
names) and a target element x

Output: the position of the target element
if it exists in the list

Possible algorithms

Linear search

Binary search

Linear Search

Algorithm:

For each element in L (from the first to the
last element), compare it with x and return
the position if equal

Time complexity:

Up to n comparisons performed

On the average, n/2 comparisons

Runs in linear time (proportional to the list
size n)

Binary Search

Algorithm:

Compare middle element of the list with x,
return the position if equal; if not, reduce
the list to either the lower half or the upper
half of original list; repeat the process

Time complexity

Up to log
2
n comparisons performed

Runs in logarithmic time

Linear vs Logarithmic Time

n

n/2

log n

10

5

4

100

50

7

1000

500

10

1,000,000

500,000

20

2,000,000

1,000,000

21

Comparing Running Times

Exercise: tabulate values of the following
run
-
time functions for different values of n

Functions:

log n (logarithmic)

n (linear)

n
2

n
3
(cubic)

2
n

(exponential)

n!

Example 2: Substring Search

Problem formulation:

Input: Strings s and t of characters

Output: If s is a substring of t, its position
in t

Example:

Input: s = “ctct”, t = “agtctcttctaac”,

Output: 4

Algorithm? Time Complexity?

Example 3: Traveling
Salesman

Problem Formulation:

Input: n cities, distances between cities

Output: shortest tour of all cities

Algorithm:

Consider all permutations of the cities,
compute total distances for each
permutation, select the minimum among
all total distances

Exponential Algorithms and
Intractable Problems

The Traveling Salesman problem is an
example of an intractable (“NP
-
complete”)
problem

Characterized by:

The existence of a correct exponential algorithm

No known polynomial algorithm

Exponential algorithm is impractical. Now
what?

Heuristics

There are polynomial algorithms for
intractable problems that do not always yield

nearest unvisited city, repeat process

Not always correct. Counterexample?

Selection of nearest city is called a
heuristic

Compromise: Can prove some statements on
the (incorrect) algorithm and that may be
enough in practice

Back to Bioinformatics:

Some Objectives

Formulate problems relevant to biology

Devise/understand algorithms for these
problems

Computer scientists and biologists need to
talk more

Computer scientists have a tendency to make
(often unreasonable) assumptions

Biologists may place too much faith on results
returned by automated systems

Overview: Selected Problems
in Bioinformatics

Sequence alignment

Phylogeny

Dealing with experimental results

BLAST Search

Blast Results

DNA Sequence Databases

Data representation, integrity, accuracy

Search and scoring methods

Meaning and reliability of results

e.g., how does BLAST

(Basic Local Alignment Search Tool)
respond to random data?

Sequence Alignment Problem

Given two nucleotide sequence, obtain
an optimal alignment between the
sequences

Example:

AT
-
C
-
TGAT

-
TGCAT
-
A
-

Dynamic Programming

T

G

C

A

T

A

A

T

C

T

G

A

T

Phylogeny

Construction of phylogenetic trees
based on genomic distance

Problems to be solved:

Determining genomic distance

Tree construction from the distances

Determining Genomic Distance

Given two genomes, determine the
number of mutations necessary to
obtain one from the other

Common distance model (least number
of mutations)

Mutation on the genome level:
rearrangement (sorting!) operations on
permutations

Sorting Permutations and a
Graph Theoretic Model

0 3 5 6 7 2 1 4 8 9

0 1 2 7 6 5 3 4 8 9

Phylogenetic Tree
Reconstruction

Given a set of species and genomic
distances between the species,
construct a phylogenetic tree that is
(most) consistent with the distances

Problem shown to be NP
-
complete

This means we should try some heuristics

Phylogenetic Tree

Mouse

Human

Monkey

Experimental Results

Image or data directly drawn from a
device

e.g., microarray, scanner

Need to make objective, discrete
conclusions

e.g., pixel intensity vs. gene expression

Need to handle errors and
imperfections

Microarray Image

Image Analysis to Aid
Microarray Experiments

Automatically locating the grid of spots

Use Fourier transforms to compute periods
and offsets

Extracting intensity

Refine spot sample to collect significant,
normalized data

Make conclusions on genetic function

Summary

Bioinformatics: a perfect opportunity for
interdisciplinary research within the
sciences