How to use the web for bioinformatics - Q7.com


Oct 2, 2013


How to use the web for bioinformatics


Molecular Technologies

Ethan Strauss

ethan.strauss@promega.com

274-4330 x1171

http://www.q7.com/~ethan

Objectives


At the end of this session you should be able to do all
of the following using freely available tools on the
world wide web:


Use GenBank or a similar database to find nucleic
acid sequences of interest


Understand the parts of a GenBank entry


Use some of the databases at NCBI to find more
information about a sequence.


Perform an alignment of several nucleic acid
sequences


Find any other tool or database you need on the web.

How to find all those dang URLs!

http://q7.com/~ethan/molbio/

Outline


What is Bioinformatics?


Sequence Databases


What does a GenBank entry look like?


Other NCBI databases


Multiple Sequence Alignment


New Tools & Databases

What is Bioinformatics?

Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data (Wikipedia).

What is Bioinformatics?

(my working definition)

Anything done on a computer in which
knowledge of biology is helpful.

or

Anything done in biology in which
knowledge of computers is helpful.


What sort of questions can Bioinformatics answer?


Sequence analysis


Where are restriction sites?


How does an RNA molecule fold?


What changes can be made to a DNA sequence to get a
new protein with specific functional changes?


Computational evolutionary biology


How are two sequences related?


Analysis of gene expression


Is this gene highly expressed in cancer cells?
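Finding restriction sites, the first question above, boils down to a string search. A minimal sketch, assuming a plain DNA string and an enzyme described only by its recognition sequence (EcoRI's GAATTC is used as the illustration). Real tools also handle ambiguity codes, search both strands for non-palindromic sites, and report the cut position, none of which this does.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match of site."""
    seq = sequence.upper()
    motif = site.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # report 1-based coordinates, as GenBank does
        start = seq.find(motif, start + 1)  # advance by one so overlapping sites are found
    return positions


if __name__ == "__main__":
    demo = "ttGAATTCaaaGAATTCgg"  # made-up sequence with two EcoRI sites
    print(find_sites(demo, "GAATTC"))  # [3, 12]
```

Because GAATTC reads the same on both strands, one search covers both here; for an asymmetric site you would also search the reverse complement.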


What sort of work is done in Bioinformatics?


Measuring biodiversity


How diverse are individuals of a species?


Is it one species or two?


Analysis of regulation


What does this drug do to expression of a gene?


Analysis of mutations in cancer


What is different about these cancer cells compared
to non-cancer cells?


High-throughput image analysis


How can we analyze the effects of 1,000 different
compounds on the location of a specific protein?


And more!


Sequence Databases


NCBI databases


Nucleic acids, proteins, literature, genomes, taxonomy, SNPs and more!


EMBL



Nucleic acid, protein, structure,
microarray data and more.


DDBJ



Nucleic acid, protein.


Swiss-Prot


Very well annotated protein database.


Many other general and specialized databases exist.


Sequence Databases

NCBI/GenBank

National Center for Biotechnology Information (NCBI)

Sponsored and run by the US government.


Contains many different databases and huge amounts
of information.

Most or all data is freely downloadable.

This one site is probably sufficient for all your
nucleic acid and protein database needs!

Sequence Databases

Entrez


Allows you to search and access NCBI databases.
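Entrez searches can also be scripted through NCBI's E-utilities web service. The sketch below only builds request URLs (it never contacts NCBI): one `esearch` query and one `efetch` request for a GenBank flat file. The base URL and the `db`, `term`, `id`, `rettype` and `retmode` parameters reflect the E-utilities documentation as I understand it; check the current docs and NCBI's usage rules (rate limits, identifying yourself) before using it for real. The search term and record ID are simply reused from the links in this handout.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service (assumed current; verify against NCBI docs).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str) -> str:
    """URL asking Entrez for the IDs of records in `db` that match `term`."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_genbank_url(record_id: str) -> str:
    """URL returning one nucleotide record as a plain-text GenBank flat file."""
    params = {"db": "nucleotide", "id": record_id, "rettype": "gb", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("omim", "HFI"))
    print(efetch_genbank_url("58533118"))
    # To download for real, pass one of these URLs to urllib.request.urlopen(...).read().
```

`urlencode` takes care of escaping, so a multi-word term such as "BRCA1 human" becomes `term=BRCA1+human`.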



Sequence Databases

Sequence Records


LOCUS - Locus name (number), size, molecule type, topology, division, date


DEFINITION - Name (short description) of the sequence


ACCESSION - Unique ID number


VERSION - Accession number with a version suffix (older records also list a GI number)


KEYWORDS


SOURCE - What it was isolated from


  ORGANISM - More taxonomic detail


REFERENCE - Paper or papers about the sequence


  AUTHORS


  TITLE


  JOURNAL


FEATURES - A complete list of all of the features of a sequence. Can be very extensive and useful!


ORIGIN - The actual sequence!

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=58533118
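Because every record follows the same layout (keywords starting in the left margin, numbered sequence lines after ORIGIN, `//` at the end), the fields listed above are easy to pull out with a short script. A minimal sketch, assuming one well-formed record; the record below is an invented toy, not a real GenBank entry, and indented sub-fields and continuation lines are simply skipped. A full parser such as Biopython's `SeqIO` handles far more cases.

```python
# Invented toy record for illustration only (not a real GenBank entry).
TOY_RECORD = """\
LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Toy sequence invented for this example.
ACCESSION   TOY0001
VERSION     TOY0001.1
ORIGIN
        1 atggaagcag gtgcttacct gaat
//
"""


def parse_genbank(text: str) -> dict[str, str]:
    """Pull the top-level fields and the sequence out of one GenBank record."""
    fields: dict[str, str] = {}
    seq_parts: list[str] = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of the record
        if in_origin:
            # Sequence lines: a position number followed by blocks of 10 bases.
            seq_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].isalpha():
            # Top-level keywords start in column 1; the value follows after spaces.
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        # Indented lines (sub-fields, continuations, feature details) are ignored here.
    fields["SEQUENCE"] = "".join(seq_parts).upper()
    return fields


if __name__ == "__main__":
    record = parse_genbank(TOY_RECORD)
    print(record["ACCESSION"], record["DEFINITION"])
    print(record["SEQUENCE"])  # ATGGAAGCAGGTGCTTACCTGAAT
```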


Other NCBI databases

Online Mendelian Inheritance in Man (OMIM)

A catalog of human genes and genetic disorders with
links to other NCBI databases, including sequence
databases.

This is a good starting point if you want to get
sequences for a specific disorder.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=omim&term=HFI


Other NCBI databases

Gene Database

Gathers information about a single gene.

Exactly one entry per gene.

A good place to dig deeper into a single gene,
or to cut through the many redundant records
about that gene.

Other NCBI databases


HomoloGene

Gathers homologs from various species


3D Domains

Protein Structure collection


Taxonomy

Species information


GEO (Gene Expression Omnibus)

A gene expression/molecular abundance repository


General Utilities


http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html



Translation


Restriction Digestion


Reformatting (alternately, FASTA Formatter)


Complement/Reverse


Etc.
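Two of the utilities listed above, reverse-complementing and translating, are short enough to sketch directly. This assumes an unambiguous A/C/G/T sequence, the standard genetic code, and translation of reading frame 1 of the given strand only. The compact 64-letter codon-table string is a common programming idiom, not something taken from the site above.

```python
BASES = "TCAG"
# Standard genetic code: amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the other strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate frame 1 codon by codon; '*' marks a stop, leftover bases are dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))                  # GCAT
    print(translate("ATGGAAGCAGGTGCTTACCTGAAT"))       # MEAGAYLN
```

An ambiguous base such as N raises a `KeyError` here; a real utility would translate such codons as X instead.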


http://www.promega.com/biomath/calc11.htm



Melting Temperature of an oligo.
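For a quick sanity check on short oligos, a classic rule of thumb is the Wallace rule: Tm ≈ 2 °C × (number of A + T) + 4 °C × (number of G + C). It is intended only for short oligonucleotides and ignores salt and oligo concentration, so treat it as a rough estimate. It is not necessarily the method the Promega calculator above uses, which may apply a more detailed formula.

```python
def wallace_tm(oligo: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2 per A/T plus 4 per G/C."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(wallace_tm("GGGGCCCCAAAATTTT"))  # 8 G/C and 8 A/T: 4*8 + 2*8 = 48
```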

Database search by sequence similarity

Basic Local Alignment Search Tool (BLAST)
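BLAST itself is a fast heuristic and far too large to show here. What it approximates is a local alignment, which the Smith-Waterman dynamic-programming algorithm computes exactly (and much more slowly). A minimal score-only sketch, assuming a toy scoring scheme of +2 per match, -1 per mismatch and -2 per gap position; real searches use substitution matrices such as BLOSUM62 and affine gap penalties, and BLAST adds statistics (E-values) on top.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    # prev and curr are the previous and current rows of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Clamping at 0 lets an alignment start anywhere: this is what makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core ACGT scores 4 matches * 2 = 8; the unrelated flanks are ignored.
    print(local_alignment_score("TTACGTTT", "GGACGTGG"))  # 8
```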

Multiple Sequence Alignment

Many programs can align multiple sequences
with each other to find the best fit for all.

This is generally more biologically
meaningful for protein sequences since they
are more highly conserved.

Clustal is the most common.

Multiple Sequence Alignment


MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDSX ETIKALA
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDS...ETIKALA
MEA..YLNAII.VLV.TIIAVIS..L.RTEPC.IkITGESITV.ACklDa.....I..L.
MEAgaYLNAIIfVLVaTIIAVISrgLtRTEPCtIrITGESITVhAChiDsx etIkaLa

LK PLSLERLFQ
LK.PLSLERLFQ
......L.....
lk plsLerlfq


New Tools

Development of new tools and databases is
ongoing.

Your needs will probably change over time.

You can find new tools using


Google


Lists


Nucleic Acids Research annual Database issue

Homework

Assignments due next session

1. Find an entry of interest in OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)

2. Find a gene associated with that entry

   1. Click on the “links” link on the right and choose “Gene”

Homework

3. The Gene page has gathered scads of information about this one gene. Use it to find homologs in other species:

   From this page, again choose “links” and go to HomoloGene.

Homework

4. Gather the protein sequences for each homologous gene (or 5 of them if there are more than that).

   1. Click “DownLoad” in the HomoloGene listing

   2. Download everything with the default settings.

Homework

You will get a text file in FASTA format. Save it somewhere convenient.
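FASTA is a simple format: each record is a header line starting with `>`, followed by one or more lines of sequence. If you want to inspect or reuse the downloaded file outside the web tools, a minimal reader might look like this. It assumes a well-formed file, keeps the whole header line (minus the `>`) as the record name, and lets a repeated header overwrite the earlier one; the sample records in the usage line are made up.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joined onto one line."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # finish the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    sample = ">seq1 toy\nMEAGAY\nLNAIIF\n>seq2 toy\nMEAGAYLN\n"
    for name, seq in read_fasta(sample).items():
        print(name, len(seq), seq)
```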

Homework

Go to the Clustal server at
http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html


Paste your complete Fasta file contents into the input box and click submit.

This takes a while, so be patient. You will get output that looks something like this.


Homework

At the bottom of the alignment file are the same results in FASTA format. Copy the
complete FASTA results and paste them into the input box at a BoxShade server
(http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html)

Homework

Depending on the parameters chosen for BoxShade, you will see something like this.
Regions which are the same in all species are likely involved in function in some
way.
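The idea that columns identical in every species point to functionally important positions can also be checked directly on the aligned FASTA output. A minimal sketch, assuming the sequences are already aligned (all the same length, `-` for gaps): it builds a consensus line that shows the residue where every sequence agrees and `.` elsewhere. This only mimics the spirit of BoxShade's shading, not its actual algorithm or similarity thresholds.

```python
def consensus_line(aligned: list[str]) -> str:
    """Residue where all aligned sequences agree (ignoring case), '.' elsewhere."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = zip(*(s.upper() for s in aligned))
    return "".join(
        col[0] if col[0] != "-" and all(c == col[0] for c in col) else "."
        for col in columns
    )


if __name__ == "__main__":
    toy = ["MEAGAY", "MEA-AY", "MKAGAY"]  # made-up aligned fragments
    print(consensus_line(toy))  # M.A.AY
```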

Homework

After all that work, your boss comes to you and says that sequence comparison is
obsolete! He wants you to do structural alignments of these proteins. Figure out
what a structural alignment is, find two different tools for finding conserved 3D
structures, and choose which one you would use for this. Describe why this
tool is preferable to the other.


NOTE: You do not need to actually do any structural alignments. Just find out
how you would go about doing one if you had to.