Fungal Semantic Web

Stephen Scott, Scott Henninger, Leen-Kiat Soh
(CSE)

Etsuko Moriyama, Ken Nickerson, Audrey Atkin
(Biological Sciences)

Steve Harris (Plant Pathology)


Many fungal genomes being sequenced

100s in the next few years

Important fungal genetics work done by
large and strong UNL group

Have prototype fungal genome database

Numerous groups around the world are
developing disparate fungal genome databases

Databases dissimilar and widely distributed

Difficult to unify others’ results with one’s own

Unnecessarily complicates research

But it’s impractical to unify the databases!

Semantic Web

Develop an ontology to describe the fungal
genome data

An ontology is a formal, explicit specification of a shared
conceptualization

Allows both human and machine processing

Concepts shared between ontology files on the WWW

Ontology describes properties of genes, relations
between genes, and operations useful in analyzing

Participants keep their own data locally, but
represent it in a way consistent with this ontology

Captures the semantic meaning of the data,
facilitating automatic processing
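
As a rough illustration of the idea above (not the project's actual schema), gene data could be held as subject-property-object statements, with each participant keeping a local store but using an agreed set of property names. The property names, site stores, and gene identifiers below are hypothetical placeholders.

```python
from collections import namedtuple

# One statement about a gene: (subject, property, object).
Triple = namedtuple("Triple", ["subject", "prop", "obj"])

# Shared vocabulary all participants agree on (hypothetical names).
SHARED_PROPERTIES = {"hasFunction", "foundInSpecies", "homologOf"}

# Each site keeps its own data locally (hypothetical sites and gene IDs).
site_a_store = [
    Triple("geneA1", "foundInSpecies", "Aspergillus nidulans"),
    Triple("geneA1", "hasFunction", "hyphal growth"),
]
site_b_store = [
    Triple("geneB7", "foundInSpecies", "Candida albicans"),
    Triple("geneB7", "homologOf", "geneA1"),
]


def conforms(store):
    """True if every statement uses a property from the shared vocabulary."""
    return all(t.prop in SHARED_PROPERTIES for t in store)


def query(stores, prop, obj):
    """Search several independent stores for subjects with a given property value."""
    return sorted(t.subject for store in stores for t in store
                  if t.prop == prop and t.obj == obj)


# Because both sites use the same property names, one query spans both.
homologs = query([site_a_store, site_b_store], "homologOf", "geneA1")
```

A real deployment would more likely use a standard ontology language than Python tuples; the sketch only shows why a shared vocabulary lets separately held data be queried together.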

This is where the fun starts

Semantic Web Architecture

What we can do with it

Can do transitive reasoning on genes

E.g. if genes A and B are related via property 1 and B
and C are related via property 2, then perhaps so are A and C

Inverse relationships to reduce data entry

E.g. entering “EvolvedFrom” data automatically
implies the inverse “EvolvedTo” relation

Consistency checking

E.g. verify that UNL’s assertions about fungal
genomes don’t contradict those by others on the
same genomes

Hypothesis building and testing

E.g. identification of genes that function in specific
cellular processes

Knowledge discovery and data mining

Ontology includes appropriate techniques for users to
apply to extract new knowledge from the data
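
The transitive reasoning, inverse relationship, and consistency checking items above can be sketched in a few lines. This is a simplified sketch under stated assumptions: the transitive case is reduced to a single transitive property (the slide's example chains two different properties), and the "mutually exclusive" property pairs used for consistency checking are hypothetical, as are all gene names.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def add_inverses(triples, inverse_of):
    """Add the inverse statement for every property that declares one."""
    result = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            result.add((o, inverse_of[p], s))
    return result


def find_conflicts(triples, exclusive_pairs):
    """Return (subject, object) pairs asserted under two mutually exclusive properties."""
    seen = {}
    for s, p, o in triples:
        seen.setdefault((s, o), set()).add(p)
    conflicts = set()
    for key, props in seen.items():
        for p, q in exclusive_pairs:
            if p in props and q in props:
                conflicts.add(key)
    return conflicts


# Entering only "EvolvedFrom" yields "EvolvedTo" for free.
entered = {("geneB", "EvolvedFrom", "geneA")}
expanded = add_inverses(entered, {"EvolvedFrom": "EvolvedTo"})
```

For consistency checking, the statements from two sites would be pooled and passed to `find_conflicts` together with whatever exclusivity rules the ontology defines.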

What we can do with it (cont’d)

Uniform interface to the world’s collection of
genomic resources

Visualization, query & search

Instructional tool: Train postdocs/students as bi/tri-lingual
scientists who can understand molecular
(fungal) biology/genetics, bioinformatics, and
computer science

Can add an active machine learning component to
facilitate querying of the database to classify new
sequences
Computer learns how to classify biological sequences
via labeled examples, interaction with the user, and
interaction with other experts
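
One common form of active learning is uncertainty sampling: the learner asks for a label on the example it is least sure about. The sketch below is a toy version, assuming each sequence has already been reduced to a single numeric score (a hypothetical feature) and that the positive class tends to score higher; the `oracle` function stands in for the user or another expert.

```python
def fit_threshold(labeled):
    """Fit a 1-D threshold classifier: midpoint between the two class means.

    Assumes at least one example of each class is present.
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def most_uncertain(pool, threshold):
    """Pick the unlabeled example closest to the decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learn(labeled, pool, oracle, rounds):
    """Repeatedly ask the oracle to label the most uncertain example."""
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        x = most_uncertain(pool, threshold)
        pool.remove(x)
        labeled.append((x, oracle(x)))  # interaction with the user/expert
    return fit_threshold(labeled), labeled


# Two labeled seeds, four unlabeled scores, and a stand-in expert.
threshold, labeled = active_learn(
    labeled=[(0.1, 0), (0.9, 1)],
    pool=[0.2, 0.4, 0.55, 0.8],
    oracle=lambda x: int(x > 0.5),
    rounds=2,
)
```

The learner spends its two queries on the scores nearest the boundary rather than on examples it can already classify confidently, which is the point of making the learning active.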

Prior Work

Application of semantic web technology to
bioinformatics is not new

Gene Ontology (GO)

Collection of ontologies related to molecular
functions, biological processes, and cellular
components
Takes a rather limited view of ontologies

Little (if any) use of quantifiers, shared concepts, etc.

Prior Work (cont’d)

Fungal Web

Built a fungal gene ontology based on GO

Developing technologies to parse on-line scientific
literature to add data to database

Tools to query databases and perform analysis

Similar to what we propose, but:

Their extensions to GO do not suit the needs of UNL
scientists or the broader fungal community

They focus on fungi that degrade cellulose

Their annotations too limited to represent entire fungal kingdom

They support machine learning, but not active learning

Extending Other Repositories

Use existing ontologies (e.g. GO) and data
stores as a basis for fungal ontologies

Utilize existing concepts in other gene ontologies

Extend to meet needs of fungal genomes

Extensions can in turn be utilized by other
researchers, both fungal and other kingdoms

Because we use common concepts where possible
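
A minimal sketch of extending an existing term hierarchy rather than replacing it is shown below. The base terms stand in for an existing ontology such as GO; the term names are illustrative and have not been checked against any actual GO release, and the fungal terms are hypothetical.

```python
# Existing concepts: each term maps to its parent (None marks a root).
base_terms = {
    "biological_process": None,
    "cellular_process": "biological_process",
}

# Fungal-specific additions that attach to existing concepts (hypothetical).
fungal_terms = {
    "hyphal_growth": "cellular_process",
    "spore_germination": "cellular_process",
}


def extend(base, extension):
    """Return base plus extension, refusing to redefine or orphan a term."""
    for term in extension:
        if term in base:
            raise ValueError(f"{term} is already defined in the base ontology")
    merged = dict(base)
    merged.update(extension)
    for term, parent in extension.items():
        if parent is not None and parent not in merged:
            raise ValueError(f"{term} has unknown parent {parent}")
    return merged


def ancestors(term, is_a):
    """List a term's ancestors, nearest first, by walking up the hierarchy."""
    chain = []
    parent = is_a[term]
    while parent is not None:
        chain.append(parent)
        parent = is_a[parent]
    return chain


merged = extend(base_terms, fungal_terms)
lineage = ancestors("hyphal_growth", merged)
```

Because the new terms hang off shared concepts, a researcher who knows only the base ontology can still find fungal data by querying at the shared ancestor level.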

Funding Opportunities


Frontiers in Integrative Biological Research (FIBR): Oct
preproposal, Feb full proposal

Science and Engineering Information Integration and
Informatics (SEIII): December



Nebraska Research Initiative: November


Semantic web now popular within bioinformatics,
but no support for the work of UNL’s fungal
research community

We plan to build the necessary infrastructure to
unify disparate data sources and provide an
interface conducive to knowledge discovery,
hypothesis testing, and collaboration

Will build on existing fungal database here at UNL

Contributions: distributed infrastructure, means for
querying, drawing inferences

We should send someone to the
Based Bioinformatics Workshop

to learn more
about the state of the art