Presentation


Ollie Bridle BSc. Hons., MA., MPhil.

oliver.bridle@ouls.ox.ac.uk

May 2008

Outline

1. Introduction.

2. Information sources in biology and associated problems.

3. What is bioinformatics?

4. DNA databases.

5. Entrez (+ exercise).

6. Summary.

Aims


Convince you that these bioinformatics
resources are valuable for research.


Give you some important searching
strategies.


Show you how to find what you want.


Suggest other resources and further help.


What I Won’t Cover


All the resources available.


Commercial software.


Huge amounts of scientific detail.


Bibliographic and abstract databases.


Check out some of the other WISER
sessions.

About Me…


Trainee librarian.


Formerly a biologist: degrees in Microbiology (BSc) and Microbial Genetics (MPhil).


Much less familiar with animal and population
genetics…but…


As far as searching databases goes, similar
principles apply.

Information Sources for Research: Key Questions


What is available?


Where do I find it?


How do I search it?


Information Sources for Research

Journals, books, theses, abstracts.

Technical literature (e.g. protocols,
equipment handbooks).

Conferences, seminars, meetings and
exhibitions.

Molecular biology databases.

Problems with Biological Data


Data collection.


The base of information is large, expanding and
diverse.


Organisation and accessibility.


Requirement for special search techniques. You can’t Google a DNA sequence…yet!


A student/researcher wants the right information
quickly!!!

The Good News


Large projects working to organise this
information.


Much is freely available over the internet.


The University subscribes to many e-journals and bibliographic databases, available through Oxlip.

A Definition of Bioinformatics



‘…information technology applied to the management and analysis of biological data’ (Attwood, T. K.)



A multidisciplinary subject.


Bioinformatics aims to…


…collect, organise, store, retrieve and analyse biological data using computers.



Scope of Bioinformatics

Protein structural modelling.

Taxonomy and phylogenetics.

DNA/protein sequence databases.

Protein interaction.

Gene expression studies.

E-journals and bibliographic databases.

What is a DNA Sequence?


The DNA double helix is made up of a series of chemical bases strung along a sugar-phosphate backbone.


There are 4 bases usually represented by
the letters A, T, C and G.


The linear sequence in which these bases occur encodes the instructions for building an organism.

What is a Protein Sequence?


Proteins are complex molecules which
control most aspects of cell biology.


Constructed from small subunits called amino acids.


There are 20 types of amino acid.


Assembled by ‘reading’ (or translating) the DNA sequence.


Every set of 3 bases (a codon, e.g. ATG) corresponds to an amino acid or to a ‘stop’ signal.


So a protein is built up one amino acid at a
time according to the DNA blueprint.
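As a rough illustration of the codon-by-codon reading described above, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1, written out with codons ordered T, C, A, G) and reads a single frame starting at the first base; the name translate is just illustrative. The three stop codons (TAA, TAG, TGA) are shown as '*'.

```python
# Standard genetic code (assumed: NCBI translation table 1).
# Codons are enumerated in T, C, A, G order for each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA three bases at a time; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(dna) - 2, 3):
        # 'X' stands in for codons containing anything other than A, C, G, T.
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)
```

For example, translate("ATGGCCTAA") gives "MA*": ATG is methionine, GCC is alanine and TAA ends the protein.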

In Summary…

DNA molecule → DNA sequence → proteins → complete organism

Looking at DNA Sequences I


Analysis of DNA or protein sequences is a
frequent requirement of research.


Locating genes within a sequence.


Comparing two sequences for similarity.


Searching for similar genes (orthologues) in
other organisms.
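Locating genes often starts by looking for open reading frames (ORFs): stretches that begin with the start codon ATG and run, in the same frame, to a stop codon. The sketch below is a minimal version under stated assumptions: it scans the forward strand only, and the minimum length of 30 codons is an arbitrary choice. find_orfs is a hypothetical helper name; real gene finders also scan the reverse strand and use much richer models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return (start, end) positions of ATG...stop ORFs on the forward strand only."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3  # end position includes the stop codon
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

With a short toy sequence, find_orfs("CCATGCCCTGA", min_codons=2) finds the single ORF (2, 11) in the third reading frame.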


Looking at DNA Sequences II


DNA sequences are easily stored, retrieved,
compared and manipulated on computers.


Just represent each base as a letter!


Computers can compare two or more
sequences and find similar regions.


Much analysis of genetic information now takes place in silico.
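To show what “finding similar regions” can mean for a computer, here is a sketch of the Smith-Waterman local alignment score, computed by dynamic programming. It returns only the best score, not the alignment itself. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not the parameters of any particular tool.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman score of the best local alignment between a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start afresh anywhere: that is
            # what makes the alignment "local".
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the score never drops below zero, local_alignment_score("CCGATTACACC", "GATTACA") scores the embedded GATTACA region (14) and ignores the unrelated flanking bases.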

Looking at DNA Sequences III


DNA sequences can be determined
experimentally.


Software allows biologists to construct and
view maps of DNA sequence.


The raw string of A, T, C and G bases is transformed into something much more human-friendly.


Artemis is one available map viewer.

Artemis Map Viewer

Using a DNA Sequence

Identifying genes of similar function

Determining protein composition

Identification

Classification

Medical diagnostics

Forensics



DNA Databases


Free access to vast numbers of
sequences deposited by researchers all
over the world.


Used alongside scientific papers.


Can be searched or ‘mined’ in a variety of
ways.

Global Bioinformatics Agencies

European Molecular Biology Laboratory (EMBL)

National Center for Biotechnology Information (NCBI)

DNA Data Bank of Japan (DDBJ)

International Nucleotide Sequence Database Collaboration (INSDC)



NCBI and GenBank


GenBank is NCBI’s DNA sequence database.


Extensive search and deposit capabilities.


A Practical Example


A researcher might start with a piece of DNA rather than a literature citation.


Here we will:


1. Search a DNA database using a piece of DNA sequence.

2. Use the results of the search to identify relevant literature.


The Experiment

1) Grow some bugs.

2) Extract the DNA.

3) Amplify the desired section of DNA.

4) Generate the sequence.

A DNA Sequence


The following sequence is in FASTA format.

>G08_CHEV11Fed.seq
GTCGACGCGCAAATGGTTCTATATCCATACCAATAGCAGTATCGTTGCCA
TTATCACGAATGGAATTAAGTAAAGTTTTCATTCTATCAATAGACTCTAA
AACCACATCCATGATATCTGGAGTTATTTTTAACTCGCCATGTCTTGCTT
TGTTTAAAACATCCTCCATGTGGTGAGTTAACTTTGTTAAAACATCAAAA
TTTAAGAAGCTTGATGATCCTTTAACCGTATGTGCAACACGGAAAATTCT
ATTTAATAATTCTAAATCTTCTGGATTTGATTCAAGCTCTACTAAATCAT
GGTCGATTTGCTCAACAAGCTCAAAAGCTTCAACCAAAAAGTCTTCAAGT
ATTTCTTGCATATCTTCCATATTTTACCCCTGTTCTTGAGATTGATGTTT
TTTAATAACCTTTGCAATTTCATTGAAGAAATCGCTAGCGTTAAATTTGA
CAAGATAGCCTTCTCCACCAGCTTCTTGAACACCTTTCTCATTCATAAAT
TCATTTGATAAAGATGAGTTAAAGACTATAGGAATATCTTTAAATCCGGG
ATCTTCTTTAATGCGTGCAGCGGATCCCGGGTACCTGCAGAATTCAGCTG
CGCCCTTTAGTTCCTAAAGGGTTTTTATCAGTGCGACAAACTGGGATTTT
ATTTATTCAGCAAGTCTTGTAATTCATCCAAAAAACGGCAAACATGAAAG
CCGTCACAAACGGCATGATGCACTTGAATCGATAAGGGAATATAGTATTT
TCCGCCCTCCTCATAATACTTCCCAAACGTAAATATCGGCAGTAGATAGT
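In FASTA format a line starting with '>' holds the sequence name, and the lines that follow hold the sequence itself, usually wrapped at a fixed width. A minimal parser sketch (parse_fasta is a hypothetical name) that joins the wrapped lines back into one string per record:

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its full, unwrapped sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records
```

Applied to the first two sequence lines above, the result is a single record named G08_CHEV11Fed.seq whose sequence is the 100 bases run together.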

A BLAST Search


Basic Local Alignment Search Tool.


Aimed at finding highly similar sequences in the database.


Let’s see how to submit a sequence query to the GenBank database.
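BLAST is fast because it does not fully align the query against every database entry. It first looks for short exact “words” shared by the query and a database sequence, and only extends those seeds into longer local alignments. The sketch below shows just that seeding step under simplifying assumptions: word_size=11 is chosen to match the traditional blastn word size, and the extension and statistics stages are left out. find_word_hits is a hypothetical name.

```python
from collections import defaultdict


def find_word_hits(query: str, subject: str, word_size: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact word is shared."""
    # Index every word of the query by its start position(s).
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Slide along the subject and look each word up in the index.
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits
```

In real BLAST each hit would then be extended in both directions and kept only if the extended score is high enough.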


BLAST Search Screen

Enter sequence.

Select database.

Select BLAST type.

BLAST Results I

The Statistics


The E (expect) value is the number of matches at least this good that would be expected purely by chance in a search of this size, so smaller is better.


Guidelines for evaluating E values (data from ‘Introduction to Bioinformatics’, Lesk, A., OUP (2005)):


E ≤ 0.02: sequences probably homologous (i.e. derived from a common ancestor).


E between 0.02 and 1: homology unproven but can’t be ruled out.


E > 1: expect a match this good by chance.


Putting the amino acid sequence NELLYTHEELEPHANT into a BLAST protein search produces results!


Best match E value = 9
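The Lesk guidelines above can be written as a small lookup, which makes the thresholds explicit. This is only a rough reading aid (interpret_e_value is a hypothetical name); the boundaries are the ones quoted on the slide, and values near a boundary still need judgement and cross-checking.

```python
def interpret_e_value(e: float) -> str:
    """Rough reading of a BLAST E value using the thresholds quoted from Lesk (2005)."""
    if e <= 0.02:
        return "probably homologous"
    if e <= 1:
        return "homology unproven but cannot be ruled out"
    return "a match this good is expected by chance"
```

For the NELLYTHEELEPHANT query, interpret_e_value(9) falls in the last band: the “hit” is what chance alone would produce.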

BLAST Results II

Two possible matches.

BLAST Results III

Literature references allow us to go straight to citations in PubMed relevant to the sequence we have found.

Here is the name of the
gene!

Evaluating the Data


There are errors in these databases!


Is a BLAST search appropriate?

What is the source of this sequence?

What are the statistics telling me?

Should I cross-reference?

Using Accession Numbers


Papers often contain accession numbers.


No database submission = No publication.


Using HTML versions of papers you can
link directly to the gene or protein
sequence.
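As a sketch of turning accession numbers in a paper into links, the snippet below picks out accession-like tokens and builds URLs for NCBI's Nucleotide database. The pattern is an assumption covering only the common GenBank shapes (one letter plus five digits, or two letters plus six digits, with an optional version suffix); newer formats would need more patterns. accession_links is a hypothetical name, and the accessions in the test are made up.

```python
import re

# Simplified, assumed pattern: 1 letter + 5 digits or 2 letters + 6 digits,
# optionally followed by a version such as ".1".
ACCESSION_PATTERN = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")


def accession_links(text: str) -> list[str]:
    """Turn accession-like tokens found in text into NCBI Nucleotide URLs."""
    return [f"https://www.ncbi.nlm.nih.gov/nuccore/{acc}"
            for acc in ACCESSION_PATTERN.findall(text)]
```

A pattern like this will also catch look-alike strings, so each link is worth checking before relying on it.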


Here’s one I made earlier…


Exploring Further


Start with a completely unknown sequence.


Searching for ‘CheV’ in Web of Science (WOS) will not bring up all the relevant papers.


Starting from a DNA sequence you have a new
way to search.


‘Having a BLAST with bioinformatics (and avoiding BLASTphemy)’, A. Pertsemlidis and J. W. Fondon III. Genome Biology (2001), 2(10), pp. 1-10.

Structure of Entrez


Powerful resource for research.


Entrez is a cross-database search engine.


Records are cross-referenced and linked.

Simple ‘one box’ search.

DNA databases

Literature database

Protein databases

Genome projects

Taxonomy databases

Entrez Main Screen

Single Keyword Search


Type a keyword into the search box and click ‘GO’.


The number of hits for the search term is shown next to each database.


Single keyword searches are limited.


Advanced search techniques refine results
and produce fewer irrelevant hits.



Using Boolean Operators


Boolean operators and phrases build
complex searches.


Use AND, OR and NOT to join terms, e.g.

Chemotaxis AND “Campylobacter jejuni”


Use UPPERCASE for the operators.


A phrase is enclosed in quotation marks, e.g. “Protein glycosylation”.
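The same Boolean syntax can also be sent to Entrez programmatically through NCBI's E-utilities. A minimal sketch that builds a query string (uppercase operators, multi-word phrases in quotation marks) and the corresponding ESearch URL. It only constructs the URL and fetches nothing; boolean_query and esearch_url are illustrative names.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def boolean_query(*terms: str, operator: str = "AND") -> str:
    """Join terms with an uppercase Boolean operator, quoting multi-word phrases."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return f" {op} ".join(quoted)


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch request URL for the given database and query (no request is made)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})
```

boolean_query("Chemotaxis", "Campylobacter jejuni") reproduces the slide's example, Chemotaxis AND "Campylobacter jejuni", and urlencode takes care of escaping the spaces and quotation marks in the URL.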



Your Turn!


A little practice using Entrez.


Follow the instructions on the handout.


Shout if you have problems.

10 Minutes

Notes on the Exercise


Using brackets with Boolean operators
refines search results.


Care with placing brackets is essential!
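Why bracket placement matters: PubMed's help describes Boolean operators as being processed left to right unless brackets group them. The toy example below uses Python sets as stand-ins for search results (the record numbers are invented) to compare chemotaxis AND jejuni OR coli with chemotaxis AND (jejuni OR coli).

```python
# Toy index: which (invented) record IDs contain each term.
INDEX = {
    "chemotaxis": {1, 2, 3, 4},
    "jejuni": {2, 5},
    "coli": {3, 6},
}


def left_to_right(first: str, *steps: tuple[str, str]) -> set[int]:
    """Apply (operator, term) steps strictly left to right, with no grouping."""
    result = set(INDEX[first])
    for op, term in steps:
        if op == "AND":
            result &= INDEX[term]
        elif op == "OR":
            result |= INDEX[term]
        elif op == "NOT":
            result -= INDEX[term]
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# chemotaxis AND jejuni OR coli  is read as  (chemotaxis AND jejuni) OR coli
without_brackets = left_to_right("chemotaxis", ("AND", "jejuni"), ("OR", "coli"))
# chemotaxis AND (jejuni OR coli)
with_brackets = INDEX["chemotaxis"] & (INDEX["jejuni"] | INDEX["coli"])
```

Without brackets the result is {2, 3, 6}: record 6 mentions coli but not chemotaxis, an irrelevant hit. With brackets only {2, 3} remain.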


The clipboard is helpful for recording
results of searches.

Refining Searches and Setting Limits


Within an individual database results may
be further refined by setting limits.


The number and type of limits will depend
on the database.


Click the ‘limits’ tab from within one of the
databases.


Steps in Setting a Limit

1. Select a field to limit the search by.

2. Type the limiting term in the search box.

3. Select other limiting options, e.g.

Publication date.

Database.

4. Hit ‘GO’ to retrieve the results.
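In PubMed, limits can also be typed straight into the search box as field tags, for example [ti] for words in the title and a [dp] publication-date range such as 2000:2008[dp]. A small sketch that wraps a term in such limits; add_limits is a hypothetical helper, the tags shown are PubMed's, and other Entrez databases use different fields.

```python
from typing import Optional, Tuple


def add_limits(term: str, field: Optional[str] = None,
               years: Optional[Tuple[int, int]] = None) -> str:
    """Restrict a query to one search field and/or a publication-date range."""
    query = f"{term}[{field}]" if field else term  # e.g. chemotaxis[ti]
    if years is not None:
        start, end = years
        # Brackets keep the original term grouped before the date limit is applied.
        query = f"({query}) AND {start}:{end}[dp]"
    return query
```

add_limits("chemotaxis", field="ti", years=(2000, 2008)) produces (chemotaxis[ti]) AND 2000:2008[dp], which could be pasted into the PubMed search box.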


Using the History


The history keeps track of previous
searches.


You can combine searches and limits
quickly and easily.


You can isolate records matching very
specific criteria.


A demonstration…
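The history works by numbering each search so that later queries can refer back to it, e.g. #1 AND #2. The class below is a toy model of that idea, not Entrez's implementation: results are sets of invented record IDs, terms are single words, and queries are evaluated left to right.

```python
class SearchHistory:
    """Toy search history: each search gets a number, and '#n' refers back to it."""

    def __init__(self, index: dict[str, set[int]]):
        self.index = index
        self.results: list[set[int]] = []

    def _lookup(self, token: str) -> set[int]:
        if token.startswith("#"):
            return self.results[int(token[1:]) - 1]  # '#1' is the first search
        return self.index.get(token.lower(), set())

    def search(self, query: str) -> int:
        """Run a query such as '#1 AND #2' left to right; return its history number."""
        tokens = query.split()
        result = set(self._lookup(tokens[0]))  # copy so stored results are not modified
        for op, term in zip(tokens[1::2], tokens[2::2]):
            other = self._lookup(term)
            if op == "AND":
                result &= other
            elif op == "OR":
                result |= other
            elif op == "NOT":
                result -= other
            else:
                raise ValueError(f"unknown operator: {op}")
        self.results.append(result)
        return len(self.results)
```

Running "chemotaxis", then "jejuni", then "#1 AND #2" combines the first two searches without retyping them, which is the convenience the history offers.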

Jumping Between Databases


Records in Entrez are extensively cross-linked.


The ‘links’ hyperlink next to each record lets you
jump between databases.
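The same cross-links are available programmatically through the E-utilities ELink service, which returns records in one database that are linked to given records in another. A sketch that only builds the request URL; elink_url is an illustrative name and the ID in the test is a placeholder, not a real record.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """Build an ELink URL asking for records in `db` linked to `ids` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return ELINK + "?" + urlencode(params)  # the URL is built, nothing is fetched
```

For example, elink_url("nuccore", "pubmed", [...]) would ask which PubMed articles are linked to a set of nucleotide records, the programmatic equivalent of following the ‘links’ hyperlink.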

Entrez in Summary


We’ve looked at



Simple and advanced searching.


Accessing and moving between records.


Using the clipboard.


Setting limits.


Using the history.


Sorting results.

Evaluating Entrez I


Advantages


Quickly cross-reference many databases.


Elaborate searches can be constructed within
each database.


Tools to save and modify searches.


Pools many resources.


Disadvantages


Can return many irrelevant results.


Syntax for advanced searching is complicated
(many databases = many fields).


Doesn’t cover everything!



Evaluating Entrez II

Summary


Bioinformatics resources help collect, organise
and analyse biological data.


Essential resources for biology research.


Bioinformatics databases can be searched in
unique ways.


Entrez provides a powerful cross-database searching tool.


Many more resources out there!

And Finally…


Thanks for listening!


Any Questions?

Resources

Search Engines and Software


NCBI BLAST



www.ncbi.nlm.nih.gov/blast/Blast.cgi


Entrez



www.ncbi.nlm.nih.gov/sites/gquery


SRS



Another cross-database search engine for bioinformatics data, similar in principle to Entrez.
http://srs.ebi.ac.uk/



EMBOSS Bioinformatics software



A whole suite of free applications for processing many kinds of biological data.
http://emboss.sourceforge.net/


ARTEMIS


A free sequence viewer and editor.
www.sanger.ac.uk/Software/Artemis/

Sources of Help I


EMBL, DDBJ and NCBI all provide reliable introductory information on
bioinformatics. They also have extensive documentation for the
databases and bioinformatics tools they support.


Tutorials


Try out the 2can tutorials provided by EMBL
www.ebi.ac.uk/2can/home.html


Entrez Help


The Entrez manual can be viewed online or downloaded as a PDF document.
www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.chapter.EntrezHelp


Sources of Help II

Subject Guides


Subject librarians have prepared a number of guides to research resources available in a range of scientific fields.
www.ouls.ox.ac.uk/rsl/e-resources


Books


A number of books are available through OULS. I’d particularly recommend
the following. Search the OLIS catalogue at
www.lib.ox.ac.uk/olis/


‘Essential Bioinformatics’ by Jin Xiong (2006), Cambridge University Press.


‘Bioinformatics: Sequence and Genome Analysis, 2nd Edition’ by D. W. Mount (2004), Cold Spring Harbor Laboratory Press.


Courses


Oxford University School of Continuing Education has a bioinformatics
programme offering short courses, diplomas and Masters qualifications.


http://bioinfomsc.stats.ox.ac.uk/