AI and Bioinformatics - Dave Reed

Oct 2, 2013

AI and Bioinformatics

From Database Mining to the Robot Scientist

History of Bioinformatics


The definition of bioinformatics is debated.


In 1973, Herbert Boyer and Stanley Cohen
invented DNA cloning.


By 1977, methods for sequencing DNA had been
developed.


In 1981, the Smith-Waterman algorithm for
sequence alignment was published.
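
As an illustration of what local sequence alignment computes, here is a minimal sketch of Smith-Waterman scoring with traceback. The function name and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for this example; practical tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + sub   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap         # gap in b
            left = score[i][j - 1] + gap       # gap in a
            # The floor at 0 is what makes the alignment local.
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

The table costs time and memory proportional to the product of the two sequence lengths, which is one reason faster heuristic search programs were developed for large databases.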



History of Bioinformatics


By 1981, 579 human genes had been
mapped



In 1985, the FASTP algorithm was published.



In 1988, the Human Genome Organisation
(HUGO) was founded.


History of Bioinformatics


Bioinformatics was fuelled by the need to create
huge databases.


AI and heuristic methods can provide key solutions
for the new challenges posed by the progressive
transformation of biology into a data-massive
science.


Data Mining


In 1990, the BLAST program was implemented.


BLAST: Basic Local Alignment Search Tool.


A program for searching biosequence databases

History of Bioinformatics


Scientists use computer scripting languages
such as Perl and Python.


By 1991, a total of 1879 human genes had been
mapped.


In 1996, Généthon published the final version of
the Human Genetic Map. This concluded the
first phase of the Human Genome Project.




History of Bioinformatics

Year    Organism                    Genome size (Mbp, millions of base pairs)

1995    Haemophilus influenzae      1.8
1996    Baker's yeast               12.1
1997    E. coli                     4.7
2000    Pseudomonas aeruginosa      6.3
2000    A. thaliana                 100
2000    D. melanogaster             180
2001    Human genome                3,000
2002    House mouse                 2,500

Bioinformatics Today


There are several important problems where AI
approaches are particularly promising


Prediction of Protein Structure


Semiautomatic drug design


Knowledge acquisition from genetic data

Functional Genomics and the Robot Scientist


Robot scientist developed by University of
Wales researchers


Designed for the study of functional genomics


Tested on yeast metabolic pathways


Utilizes logical and associationist knowledge
representation schemes

Ross D. King et al., Nature, January 2004

The Robot Scientist

Source: BBC News

Yeast Metabolic Pathways

Hypothesis Generation and Experimentation Loop

Ross D. King et al., Nature, January 2004
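
The hypothesis generation and experimentation loop can be sketched as a simple elimination procedure: run an experiment that the surviving hypotheses disagree about, then discard every hypothesis the result contradicts. The code below is a toy illustration only. The three-step pathway, the names `predict`, `make_observer`, and `run_loop`, and the simulated observation are all assumptions made for this example, not the system described by King et al.

```python
# Toy linear pathway (hypothetical): step1 makes B, step2 makes C, step3 makes D.
# Each hypothesis says which step the knocked-out gene encodes.
STEP_INDEX = {"step1": 1, "step2": 2, "step3": 3}
METABOLITE_INDEX = {"B": 1, "C": 2, "D": 3}


def predict(hypothesis, supplement):
    """Predict growth (True/False) when the medium is supplemented.

    Growth is restored only if the supplied metabolite lies at or after
    the blocked step, so the cell no longer needs the missing enzyme.
    """
    return METABOLITE_INDEX[supplement] >= STEP_INDEX[hypothesis]


def make_observer(true_step):
    """Simulate the laboratory: results follow the (hidden) true hypothesis."""
    return lambda supplement: predict(true_step, supplement)


def run_loop(hypotheses, experiments, predict, observe):
    """Eliminate hypotheses until one remains or no experiment can help."""
    remaining = set(hypotheses)
    untried = list(experiments)
    log = []
    while len(remaining) > 1 and untried:
        # Only experiments on which surviving hypotheses disagree are useful.
        informative = [
            e for e in untried
            if len({predict(h, e) for h in remaining}) > 1
        ]
        if not informative:
            break
        exp = informative[0]
        untried.remove(exp)
        result = observe(exp)
        # Keep only hypotheses consistent with the observed result.
        remaining = {h for h in remaining if predict(h, exp) == result}
        log.append((exp, result))
    return remaining, log


survivors, log = run_loop(STEP_INDEX, ["B", "C", "D"], predict, make_observer("step2"))
print(survivors)  # {'step2'}
print(log)        # [('B', False), ('C', True)]
```

Supplementing with D is never tried here because every hypothesis predicts growth for it, so its result could not distinguish between them.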

Integration of Artificial Intelligence


Utilizes a Prolog database to store
background biological information


Prolog can inspect biological information,
infer knowledge, and make predictions


Optimal hypothesis is determined using
machine learning, which looks at probabilities
and associated cost
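
One way to picture a choice that weighs probabilities against cost is to score each candidate experiment by how much it is expected to narrow down the hypotheses per unit of cost. The sketch below uses expected entropy reduction divided by cost as that score. This scoring rule is a stand-in chosen for illustration and is not claimed to be the cost function used by the Robot Scientist; the hypothesis names, probabilities, experiment names, and costs are all made up.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_gain(prior, predictions):
    """Expected entropy reduction from one experiment.

    prior:       dict mapping hypothesis -> probability
    predictions: dict mapping hypothesis -> outcome that hypothesis predicts
    """
    total = sum(prior.values())
    before = entropy([p / total for p in prior.values()])
    after = 0.0
    for outcome in set(predictions.values()):
        group = [prior[h] for h in prior if predictions[h] == outcome]
        mass = sum(group)
        if mass == 0:
            continue  # outcome predicted only by impossible hypotheses
        # Probability of this outcome times the uncertainty left if it occurs.
        after += (mass / total) * entropy([p / mass for p in group])
    return before - after


def choose_experiment(prior, experiments):
    """Pick the experiment with the highest expected gain per unit cost.

    experiments: dict mapping name -> (cost, predictions); cost must be > 0.
    """
    return max(
        experiments,
        key=lambda name: expected_gain(prior, experiments[name][1]) / experiments[name][0],
    )


# Hypothetical example data.
prior = {"h1": 0.5, "h2": 0.25, "h3": 0.25}
experiments = {
    "cheap_split": (1.0, {"h1": True, "h2": False, "h3": False}),
    "pricey_full": (5.0, {"h1": "a", "h2": "b", "h3": "c"}),
    "useless": (0.5, {"h1": True, "h2": True, "h3": True}),
}
print(choose_experiment(prior, experiments))  # cheap_split
```

In this example the expensive experiment would identify the correct hypothesis outright, but the cheap one removes one bit of uncertainty at a fifth of the price, so it scores higher per unit cost.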

Experimental Results


Performance similar to humans


Performance significantly better than “naïve” or
“random” selection of experiments


Ross D. King et al., Nature, January 2004

To reach 70% classification accuracy, the Robot Scientist's experiments cost:

A hundredth of the cost of random selection

A third of the cost of naive selection

Major Challenges and Research Issues


Requires individuals with knowledge of both
disciplines


Requires collaboration of individuals from diverse
disciplines

Major Challenges and Research Issues


Data generation in biology/bioinformatics is
outpacing methods of data analysis


Data interpretation and generation of hypotheses
requires intelligence


AI offers established methods for knowledge
representation and “intelligent” data interpretation


We predict that the use of AI in bioinformatics will increase



References and Additional Resources

Ross D. King, Kenneth E. Whelan, Ffion M. Jones, Philip G. K. Reiser, Christopher H.
Bryant, Stephen H. Muggleton, Douglas B. Kell & Stephen G. Oliver. Functional
Genomic Hypothesis Generation and Experimentation by a Robot Scientist.
Nature 427, 247-252 (15 January 2004).

A Short History of Bioinformatics - http://www.netsci.org/Science/Bioinform/feature06.html

History of Bioinformatics - http://www.geocities.com/bioinformaticsweb/his.html

National Center for Biotechnology Information - http://www.ncbi.nih.gov

Pubmed - http://www.pubmed.gov