Exploiting Gene Ontology to Conceptualize Biomedical Document Collections


15 Nov 2013


BiKE


Hai-Tao Zheng, Charles Borchert, Hong-Gee Kim

Biomedical Knowledge Engineering Laboratory, Seoul National University


Contents

1. Motivation
2. The GOClonto method
3. Experiment
4. Conclusion and future work


Motivation



Much research has been proposed on utilizing ontologies to help users understand information easily.

However, most existing methods do not use ontologies to help users directly capture the key gene-related terms within biomedical documents.

Key gene-related terms are the most important gene-related terms to which a biomedical document collection is related.

Understanding key gene-related terms and their semantic relationships is essential for comprehending the conceptual structure of biomedical document collections and for avoiding information overload.



GOClonto



GOClonto identifies the key gene-related terms using LSA (Latent Semantic Analysis).

Utilizing GO (Gene Ontology), GOClonto automatically generates corpus-related gene ontologies based on these key gene-related terms, for the conceptualization of biomedical document collections.

Conceptualization of biomedical document collections here means representing a document collection with a set of key gene-related terms and their semantic relationships, which helps users understand biomedical document contents more easily.
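To make the LSA step more concrete, the following is a minimal sketch rather than GOClonto's actual procedure. It approximates only the first latent concept (the dominant left singular vector of a term-document matrix) by power iteration, and then picks the term with the largest loading on that concept. The function names, the toy matrix, and the rule "largest loading = key term" are all assumptions made for illustration.

```python
import math


def dominant_term_vector(matrix, iterations=50):
    """Approximate the dominant left singular vector of a terms x documents
    matrix by power iteration on A * A^T (a rank-1 simplification of LSA)."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    u = [1.0] * n_terms
    for _ in range(iterations):
        # v = A^T u: project the term vector onto the documents
        v = [sum(matrix[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]
        # u = A v: project back onto the terms
        u = [sum(matrix[i][j] * v[j] for j in range(n_docs)) for i in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def key_term(terms, matrix):
    """Return the term with the largest absolute loading on the first concept.
    This selection rule is an assumption, not taken from the slides."""
    loadings = dominant_term_vector(matrix)
    best = max(range(len(terms)), key=lambda i: abs(loadings[i]))
    return terms[best]


# Hypothetical toy weights: rows are GO-terms, columns are documents.
terms = ["centrosome", "microtubule", "membrane"]
weights = [
    [3.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(key_term(terms, weights))
```

A full LSA would keep several singular vectors, for example via a library SVD routine, and derive one key term per retained concept.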


The GOClonto Method


A biomedical document collection example


Preprocessing



Each biomedical document is split into sentences.

A Java-based conditional random fields POS tagger for English is used to perform the POS tagging.

CRFChunker, a Java-based conditional random fields phrase chunker, is used to identify the noun phrases in a document.

With the identified nouns and noun phrases, GOClonto determines whether each noun or noun phrase is a GO-term by referencing GO.

For the example, t=5 GO-terms appear more than once in the collection and thus are treated as frequent.
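The last two steps above (GO lookup and frequency filtering) can be sketched as follows. This is an illustration under stated assumptions, not the GOClonto implementation: the noun phrases are taken as already extracted by the tagger and chunker, `GO_TERMS` is a small hypothetical subset standing in for the full Gene Ontology vocabulary, matching is a case-insensitive exact match, and "more than once" is read as at least two occurrences across the whole collection.

```python
from collections import Counter

# Hypothetical subset of GO term names; a real run would load the full ontology.
GO_TERMS = {"centrosome", "microtubule", "membrane", "spindle"}


def frequent_go_terms(doc_phrases, go_terms, min_count=2):
    """Keep noun phrases that are GO-terms and occur at least `min_count`
    times across the whole collection (assumed reading of 'more than once')."""
    counts = Counter(
        phrase.lower()
        for phrases in doc_phrases
        for phrase in phrases
        if phrase.lower() in go_terms
    )
    return sorted(term for term, count in counts.items() if count >= min_count)


# Hypothetical noun phrases, one list per document.
collection = [
    ["Centrosome", "patient", "microtubule"],
    ["centrosome", "protein"],
    ["membrane", "microtubule", "spindle"],
]
print(frequent_go_terms(collection, GO_TERMS))
```

Exact string matching misses plurals and synonyms; a fuller version would normalize phrases or consult GO synonym lists before the lookup.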



Term-document Matrix Construction


The TF/IDF method is used to calculate the term weights. Here df_t is the document frequency of term t, that is, the number of documents in which term t appears.
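As an illustration of how such a term-document matrix could be built, the sketch below assumes the common weighting w(t, d) = tf(t, d) * log(N / df_t), where tf(t, d) is the number of times term t occurs in document d and N is the number of documents. The exact variant used by GOClonto may differ, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs, terms):
    """Build a terms x documents matrix with weights tf * log(N / df).
    `docs` is a list of documents, each given as a list of GO-terms."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # df_t: number of documents in which term t appears
    df = {t: sum(1 for c in counts if c[t] > 0) for t in terms}
    matrix = []
    for t in terms:
        idf = math.log(n_docs / df[t]) if df[t] else 0.0
        matrix.append([c[t] * idf for c in counts])
    return matrix


# Hypothetical GO-terms found in three documents.
docs = [
    ["centrosome", "centrosome", "microtubule"],
    ["microtubule"],
    ["membrane", "microtubule"],
]
terms = ["centrosome", "microtubule", "membrane"]
for term, row in zip(terms, tfidf_matrix(docs, terms)):
    print(term, [round(w, 3) for w in row])
```

With this weighting, a term that appears in every document (here "microtubule") receives weight zero, since it does not help distinguish documents from one another.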


Key GO-term Induction


Related Document Allocation



Ontology Generation Algorithm



GOClonto User Interface



Experiment


1. Clustering Results Evaluation


2. Generated Ontology Evaluation


Experiment


1. Clustering results evaluation

Based on the biomedical document collection, the key GO-terms extracted by GOClonto are: centrosome, microtubule, centriole, membrane, flagellum, spindle, cilium, growth, chromatin.


Generated Ontology Evaluation


Conclusion



GOClonto exploits GO to automatically generate corpus-related gene ontologies for users. The generated ontologies can help users conceptualize biomedical document collections.

The experimental results show that GOClonto is able to identify key GO-terms from document corpora. The generated ontologies are more informative than the hierarchical tree created by the Fuzzy Ants clustering algorithm.

We believe that GOClonto will play an important role in helping users visualize and conceptualize biomedical document collections.


Future Work



Since the experimental data set is relatively small, larger data sets should be used to evaluate GOClonto comprehensively.

More NLP methods should be studied to improve the precision of GO-term extraction.

Adding other visualization techniques alongside GOClonto can further aid user navigation of biomedical document collections.

Other biomedical ontologies can be used to generate the ontologies. Good examples are FMA (the Foundational Model of Anatomy) and SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms).
