Introduction to Bioinformatics - Department of Computer Science


Oct 2, 2013


Introduction to Bioinformatics

What is Bioinformatics

Easy Answer

Using computers to solve molecular biology
problems; Intersection of molecular biology and
computer science


Hard Answer

Computational techniques (e.g. algorithms, artificial
intelligence, databases) for management and
analysis of biological data and knowledge

Bioinformatics


Bioinformatics = Biology + Information



Biology is becoming an information science



Computational methods are necessary to
analyze the massive amount of information
coming out of the genome projects

Bioinformatics is Another
Revolution in Biology

Three concepts, which remain
central to Bioinformatics


Data representation

A complex, dynamic, three-dimensional molecule
is represented as a simple string of characters
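
As a minimal sketch of this idea, a DNA molecule can be held in an ordinary Python string and summarized with a few lines of code. The sequence below is a made-up fragment, and the helper names base_counts and gc_content are illustrative choices, not part of the original slides.

```python
from collections import Counter

# A short, made-up DNA fragment used purely for illustration.
dna = "ATGGCGTACGCTTAA"

def base_counts(seq):
    """Count how often each base letter occurs in the string."""
    return Counter(seq.upper())

def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

print(len(dna))                     # length of the molecule in bases
print(base_counts(dna))             # how many of each base
print(round(gc_content(dna), 3))    # fraction of G and C
```

Everything about the three-dimensional shape is lost in this representation, but the string is enough for counting, searching and comparing.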

Three concepts, which remain
central to Bioinformatics


The concept of similarity


Evolution has operated on every sequence


In biomolecular sequences (DNA, RNA or amino acid
sequences), high sequence similarity usually implies
significant functional or structural similarity.


The converse is not true: similar function or structure
does not imply similar sequence


Algorithms for comparing sequences and finding
similar regions are at the heart of bioinformatics
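
One common way to compare two sequences is global alignment by dynamic programming (the Needleman-Wunsch approach). The sketch below only computes the best alignment score. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not values taken from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # identical sequences
print(global_alignment_score("ACGT", "AGT"))         # one base missing
```

A higher score means the sequences are more similar under the chosen scoring scheme. Real tools use substitution matrices and more refined gap penalties.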

Three concepts, which remain
central to Bioinformatics


Bioinformatics is not a theoretical science; it is
driven by the data, which in turn is driven by the
needs of biology.



Sequences



Microarray technologies





GenBank Growth

Moore’s Law

What do you need to know?


It all depends on your background

Are you a …?


Biologist with some computer knowledge, or


Computer scientist with some biology
background


Few do both well

Background


Biology for Computer Scientists



Computer Science for Biologists

Biological Information Flow

Genome

Introns/Exons

Gene Sequence

Protein Sequence

Protein Functions

Protein Structure

Cellular Pathways

Bioinformatics
attempts to model
this pathway

Living Things


Entropy (the tendency toward disorder) always
increases in an isolated system



Living organisms have low entropy
compared with things like soil



They are relatively orderly…



The most critical task is to maintain the
distinction between inside and outside

Living Things


In order to maintain low entropy, living organisms
must expend energy to keep things orderly.



They figured out how to do this 4 billion years ago



The functions of life, therefore, are meant to
facilitate the acquisition and orderly expenditure
of energy

Living Things


The compartments with low entropy are
separated from “the world.”



Cells are the smallest unit of such
compartments.



Bacteria are single-cell organisms


Humans are multi-cell organisms

The “living things” have the
following tasks:


Gather energy from environment


Use energy to maintain inside/outside distinction


Use extra energy to reproduce


Develop strategies for being successful and
efficient at the above tasks


Develop ways to move around


Develop signal transduction capabilities (e.g. vision)


Develop methods for efficient energy capture (e.g.
digestion)


Develop ways to reproduce effectively


How to accomplish…?


Living compartments on earth have
developed three basic technologies


Ability to separate inside from outside (lipids)


Ability to build three-dimensional molecules
that assist in the critical functions of life
(Protein, RNA)


Ability to compress the information about how
(and when) to build these molecules in linear
code (DNA)

Bioinformatics Schematic of a
Cell

Lipids


Made of a hydrophilic (water-loving) molecular
fragment connected to hydrophobic fragments



Spontaneously form sheets (lipid membranes) in
which all the hydrophilic ends align on the
outside, and hydrophobic ends align on the inside



Creates a very stable separation, not easy to pass
through except for water and a few other small
atoms/molecules

What is a Nucleotide?


Pentose, base, phosphate group

Pentose: ribose in RNA, deoxyribose in DNA

Base


Adenine (A),
Cytosine (C),
Guanine (G),
Thymine (T, in DNA),
Uracil (U, in RNA).

Nucleic Acid Chain


Condensation reaction


Orientation


From 5’ to 3’


In DNA or RNA, a nucleic
acid chain is called a “strand”


DNA: double-stranded


RNA: single-stranded


The number of bases


Base pair (bp) in DNA
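
Because both strands of DNA are read from 5’ to 3’ and the bases pair A with T and G with C, the second strand can be derived from the first. The sketch below is illustrative only; the function name reverse_complement is a conventional choice rather than something defined in the slides.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the partner strand, also written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return strand.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.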

DNA Structure


RNA Structure and Function



The major role of RNA is to participate in protein
synthesis



Messenger RNA (mRNA)


Transfer RNA (tRNA)


Ribosomal RNA (rRNA)

mRNA

The Genetic Code

What is a gene?


A gene includes the entire nucleic acid
sequence necessary for the expression of its
product.


Such a sequence may be divided into


Regulatory region


Transcriptional region: exons and introns


Exons encode a peptide or functional RNA


Introns will be removed after transcription
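
The removal of introns can be sketched as simple string slicing. The transcript and the exon coordinates below are made up for illustration (uppercase marks exons, lowercase marks the intron), and the function name splice is a hypothetical helper.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as (start, end) half-open index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: two exons separated by one intron.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAAUAA"
exons = [(0, 6), (18, 24)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCGAAUAA
```

In a real genome the exon boundaries are not given in advance; finding them is itself a bioinformatics problem.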


Gene

Genome


The total genetic information of an
organism.


For most organisms, it is the complete DNA
sequence


For RNA viruses, the genome is the
complete RNA sequence

Genes and Control


The human genome has 3,000,000,000 bp divided into 23
linear segments (chromosomes)



A gene has on average 1,340 DNA bp, thus specifying a
protein of about ? (how many) amino acids



Humans have about 35,000 genes = 40,000,000 DNA bp
= 3% of total DNA in genome



Humans have another 2,960,000,000 bp for control
information (e.g. when, where, how long, etc.)
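
The numbers on this slide can be checked with a little arithmetic, using the fact that three bases (one codon) specify one amino acid. The sketch below takes the slide's figures at face value. Multiplying the gene count by the average gene length gives roughly 47 million bp, or about 1.6% of 3 billion, so the 40,000,000 bp and 3% quoted above are best read as rough, order-of-magnitude estimates.

```python
GENOME_BP = 3_000_000_000   # total genome size quoted on the slide
AVG_GENE_BP = 1340          # average gene length quoted on the slide
GENE_COUNT = 35_000         # gene count quoted on the slide
BASES_PER_CODON = 3         # three bases encode one amino acid

# Whole codons that fit into an average gene.
amino_acids_per_gene = AVG_GENE_BP // BASES_PER_CODON

# Total bases in genes, and their share of the genome.
coding_bp = GENE_COUNT * AVG_GENE_BP
coding_fraction = coding_bp / GENOME_BP

print(amino_acids_per_gene)          # 446
print(coding_bp)                     # 46900000
print(f"{coding_fraction:.1%}")      # 1.6%
```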

Gene Expression


An organism may contain many types of cells,
each with distinct shape and function



However, they all have the same genome



The genes in a genome do not have any effect on
cellular functions until they are “expressed”



Different types of cells express different sets of
genes, thereby exhibiting various shapes and
functions
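
One simple way to picture this is with sets: every cell carries the same genome, but each cell type expresses its own subset. The gene and cell-type names below are placeholders invented for illustration, not real genes.

```python
# The same genome is present in every cell (placeholder gene names).
genome = {"gene_A", "gene_B", "gene_C", "gene_D"}

# Each cell type expresses only a subset of that genome.
expressed = {
    "cell_type_1": {"gene_A", "gene_B"},
    "cell_type_2": {"gene_A", "gene_C"},
}

# Genes expressed in both cell types.
shared = expressed["cell_type_1"] & expressed["cell_type_2"]
# Genes expressed in cell type 1 but not in cell type 2.
specific = expressed["cell_type_1"] - expressed["cell_type_2"]
# Genes present in the genome but expressed in neither cell type.
silent = genome - set().union(*expressed.values())

print(shared, specific, silent)
```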

Gene Expression


The production of a protein or a functional
RNA from its gene


Several steps are required


Transcription


RNA processing


Nuclear transport


Protein synthesis


Gene Expression

Central Dogma

DNA → RNA → Protein
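
The flow from DNA to RNA to protein can be sketched in a few lines. This is a simplified illustration: it assumes the input is the coding strand, uses the standard genetic code, ignores RNA processing, and stops at the first stop codon. The function names transcribe and translate are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna):
    """DNA coding strand -> mRNA: same sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """mRNA -> protein in one-letter codes, stopping at the first stop codon."""
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].upper().replace("U", "T")
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```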

Next …


Protein Structure and Function

An Amino Acid


An amino acid is defined as a molecule
containing an amino group (NH2), a
carboxyl group (COOH) and an R group.


R-CH(NH2)-COOH



The R group differs among various amino acids.


In a protein, the R group is also called a side chain.

An Amino Acid

The Twenty Amino Acids of
Proteins


Protein


Peptide: a chain of amino acids linked
together by peptide bonds.



Polypeptides: long peptides



Oligopeptides: short peptides (< 10
amino acids)



Proteins are made up of one or more
polypeptides with more than 50 amino acids

Protein Structure


Primary Structure


Refers to its amino acid sequence

Secondary structure


Regular, repeated
patterns of folding of
the protein backbone.


Two most common
folding patterns


Alpha helix


Beta sheet

Tertiary Structure


The overall folding of the entire polypeptide
chain into a specific 3D shape

Quaternary Structure


Many proteins are formed from more than one
polypeptide chain


Describes the way in which the different
subunits are packed together to form the
overall structure of the protein


Hemoglobin molecule

Quaternary Structure

Evolution


Mutation: rare events, sometimes single base
changes, sometimes larger events


Recombination: how your genome was
constructed as a mixture of your two parents' genomes


Through Natural Selection


Homology (similarity due to shared ancestry): different
species are assumed to have common ancestors


The genetic variation between different people is

(surprisingly ..)
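
The single-base mutations mentioned above can be sketched on a string representation of a sequence, and the resulting divergence measured by counting differing positions (the Hamming distance). The sequences and function names below are made up for illustration.

```python
def point_mutation(seq, position, new_base):
    """Return a copy of seq with one base replaced at a 0-based position."""
    if seq[position] == new_base:
        raise ValueError("new base must differ from the original base")
    return seq[:position] + new_base + seq[position + 1:]

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical ancestral sequence and a descendant with two base changes.
ancestor = "ATGGCCTAA"
descendant = point_mutation(point_mutation(ancestor, 3, "A"), 5, "T")

print(descendant)                              # ATGACTTAA
print(hamming_distance(ancestor, descendant))  # 2
```

Hamming distance only handles substitutions in sequences of equal length. Insertions and deletions call for an alignment method instead.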

References


http://www.biology.arizona.edu/biochemistry/problem_sets/large_molecules/


http://helix-web.stanford.edu/bmi214/index2004.html


http://www.web-books.com/MoBio/


http://www.cs.sunysb.edu/~skiena/549/