Fungal Semantic Web


Oct 20, 2013



Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE)

Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences)

Steve Harris (Plant Pathology)

Motivation

Many fungal genomes being sequenced


100s in the next few years

Important fungal genetics work done by
large and strong UNL group


Have prototype fungal genome database

Numerous groups around the world are
developing disparate fungal genome
databases


Databases dissimilar and widely distributed


Difficult to unify others’ results with one’s own


Unnecessarily complicates research

But it’s impractical to unify the databases!


Semantic Web

Develop an ontology to describe the fungal
genome data


An ontology is a formal, explicit specification of shared
concepts


Allows both human and machine processing


Concepts shared between ontology files on the WWW


Ontology describes properties of genes, relations
between genes, and operations useful in analyzing
them

Participants keep their own data locally, but
represent it in a way consistent with this
framework

Captures the semantic meaning of the data,
facilitating automatic processing


This is where the fun starts
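
As a rough illustration of participants keeping data locally while describing it with shared concepts, the sketch below stores each site's assertions as (subject, predicate, object) triples checked against a common vocabulary. Every name in it (the predicates, gene and species identifiers, and helper functions) is hypothetical and invented for this example; it is not the proposed ontology or any existing tool.

```python
# Hypothetical shared vocabulary; these predicate names are assumptions.
SHARED_PROPERTIES = {"hasFunction", "foundIn", "evolvedFrom"}

def make_triple(subject, predicate, obj):
    """Return a (subject, predicate, object) triple, rejecting unknown predicates."""
    if predicate not in SHARED_PROPERTIES:
        raise ValueError(f"{predicate!r} is not in the shared vocabulary")
    return (subject, predicate, obj)

# Two groups keep their own data locally but use the same predicates.
site_a = [
    make_triple("geneA", "foundIn", "speciesX"),
    make_triple("geneA", "hasFunction", "cell wall synthesis"),
]
site_b = [make_triple("geneB", "evolvedFrom", "geneA")]

def query(stores, predicate):
    """Collect all triples with the given predicate across every local store."""
    return [t for store in stores for t in store if t[1] == predicate]

found = query([site_a, site_b], "foundIn")
```

Because both sites use the same predicate names, one query can span both stores without merging the underlying databases.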

Semantic Web Architecture

What we can do with it

Can do transitive reasoning on genes


E.g. if genes A and B are related via property 1 and B
and C are related via 2, then perhaps so are A and C

Inverse relationships to reduce data entry


E.g. entering “EvolvedFrom” data automatically
implies the inverse “EvolvedTo” relation

Consistency checking


E.g. verify that UNL’s assertions about fungal
genomes don’t contradict those by others on the
same genomes

Hypothesis building and testing


E.g. identification of genes that function in specific
cellular processes

Knowledge discovery and data mining


Ontology includes appropriate techniques for users to
apply to extract new knowledge from the data
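
The first three capabilities above can be sketched on plain triples. This is a minimal illustration under stated assumptions: the predicate names and gene identifiers are hypothetical, the transitive step handles only the simpler case of a single property chained with itself (not the two-property case in the example above), and the consistency check treats one predicate as single-valued. A real system would likely rely on an ontology reasoner instead.

```python
def add_inverses(triples, inverse_of):
    """Add the inverse triple for every predicate that has a declared inverse."""
    result = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            result.add((o, inverse_of[p], s))
    return result

def transitive_closure(triples, predicate):
    """Chain (a p b) and (b p c) into (a p c) until nothing new appears."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in result if p == predicate]
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2 and (a, predicate, c) not in result:
                    result.add((a, predicate, c))
                    changed = True
    return result

def find_conflicts(triples, single_valued_predicate):
    """Report subjects given more than one value for a single-valued predicate."""
    seen = {}
    for s, p, o in triples:
        if p == single_valued_predicate:
            seen.setdefault(s, set()).add(o)
    return {s: vals for s, vals in seen.items() if len(vals) > 1}

# Hypothetical data: local assertions plus two conflicting remote assertions.
local = {("geneC", "evolvedFrom", "geneB"), ("geneB", "evolvedFrom", "geneA")}
closed = transitive_closure(local, "evolvedFrom")
full = add_inverses(closed, {"evolvedFrom": "evolvedTo"})
merged = full | {
    ("geneA", "locatedOnChromosome", "1"),
    ("geneA", "locatedOnChromosome", "3"),
}
conflicts = find_conflicts(merged, "locatedOnChromosome")
```

Here `closed` gains the inferred link from geneC to geneA, `full` gains the inverse relations without extra data entry, and `conflicts` flags geneA as having contradictory chromosome assignments.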

What we can do with it (cont’d)

Uniform interface to the world’s collection of
genomic resources


Visualization, query & search


Instructional tool: Train postdocs/students as bi/tri-lingual
scientists who can understand molecular
(fungal) biology/genetics, bioinformatics, and
computer science

Can add active machine learning component to
facilitate querying of database to classify new
sequences


Computer learns how to classify biological sequences
via labeled examples, interaction with the user, and
interaction with other experts
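
One common form of active learning is uncertainty sampling: the learner asks the expert to label the example it is least sure about. The sketch below is a toy version under heavy assumptions: GC fraction as the only feature, a nearest-centroid classifier, and a stand-in `expert` function in place of a real user. All names and sequences are hypothetical, and this is not the component proposed here.

```python
def gc_fraction(seq):
    """Toy feature: fraction of G/C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def centroids(labeled):
    """Mean feature value per class label."""
    sums, counts = {}, {}
    for seq, label in labeled:
        sums[label] = sums.get(label, 0.0) + gc_fraction(seq)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(seq, cents):
    """Assign the label whose centroid is nearest to the sequence's feature."""
    return min(cents, key=lambda label: abs(gc_fraction(seq) - cents[label]))

def margin(seq, cents):
    """Gap between the two nearest centroid distances; small means uncertain."""
    dists = sorted(abs(gc_fraction(seq) - c) for c in cents.values())
    return dists[1] - dists[0]

def active_learn(labeled, pool, oracle, rounds):
    """Each round, ask the oracle to label the most uncertain pool sequence."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(rounds, len(pool))):
        cents = centroids(labeled)  # needs at least two classes in `labeled`
        query = min(pool, key=lambda s: margin(s, cents))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return centroids(labeled), labeled

# Hypothetical seed examples, unlabeled pool, and stand-in expert.
seed = [("GGCCGGCC", "typeA"), ("AATTAATT", "typeB")]
pool = ["GGCCGGCA", "GCATGCAT", "AATTAATA"]
expert = lambda s: "typeA" if gc_fraction(s) >= 0.5 else "typeB"
model, labeled = active_learn(seed, pool, expert, rounds=1)
```

With this data the learner queries the sequence lying midway between the two classes rather than one it can already classify confidently, which is the point of asking the user selectively.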

Prior Work

Application of semantic web technology to
bioinformatics is not new

Gene Ontology (http://www.geneontology.org)


Collection of ontologies related to molecular
functions, biological processes, and cellular
components


Takes a rather limited view of ontologies

Little (if any) use of quantifiers, shared concepts, etc.

Prior Work (cont’d)

Fungal Web (http://www.cs.concordia.ca/~baker/)


Built a fungal gene ontology based on GO


Developing technologies to parse on-line scientific
literature to add data to database


Tools to query databases and perform analysis

Similar to what we propose, but:


Their extensions to GO do not suit the needs of UNL
scientists or the broader fungal community

They focus on fungi that degrade cellulose

Their annotations too limited to represent entire fungal kingdom


They support machine learning, but not active learning


Extending Other Repositories

Use existing ontologies (e.g. GO) and data
stores as a basis for fungal ontologies


Utilize existing concepts in other gene
ontologies


Extend to meet needs of fungal genomes


Extensions can in turn be utilized by other
researchers, both fungal and other kingdoms

Because we use common concepts where
applicable

Funding Opportunities

NSF


Frontiers in Integrative Biological Research (FIBR): Oct
preproposal, Feb full
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf05597


Science and Engineering Information Integration and
Informatics (SEIII): December
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf04528

NIH


INNOVATIONS IN BIOMEDICAL COMPUTATIONAL
SCIENCE AND TECHNOLOGY: Sept LOI, Oct full
http://grants1.nih.gov/grants/guide/pa-files/PAR-03-106.html

Nebraska Research Initiative: November

Conclusions

Semantic web now popular within bioinformatics,
but no support for the work of UNL’s fungal
research community

We plan to build the necessary infrastructure to
unify disparate data sources and provide an
interface conducive to knowledge discovery,
hypothesis testing, and collaboration


Will build on existing fungal database here at UNL


Contributions: distributed infrastructure, means for
querying, drawing inferences

We should send someone to the Knowledge-Based
Bioinformatics Workshop to learn more
about the state of the art