Ontology-Based Phenotyping Systems, V. 0.3, 10/21/2013

A Guide to Ontology-Based Phenotyping Systems:
Rationale and Methods Based on the Rockefeller University Experience

Andreas C. Mauer, Edward M. Barbour, Nickolay A. Khazanov, Natasha Levenkova, Shamim A.
Mollah, Barry S. Coller


I. Background and Rationale

One of the major obstacles to clinical and translational science is the lack of a
standardized method for recording and retrieving clinical information, including medical
histories, physical examination findings, and details regarding responses to therapy.
Collectively, such information helps to define an individual's phenotype. The benefits of carefully
collecting and organizing longitudinal phenotypic information for research purposes are perhaps
best exemplified by the Framingham study, which has provided and continues to provide
enormously valuable clinical and translational information that has directly influenced medical
practice and led to improvements in human health.(1) The revolutionary advances in genomics
and the growing sophistication of proteomics reinforce the need for high-quality, detailed
phenotypic information because medically and scientifically meaningful gene-gene and
gene-environment interactions can only be identified when correlated with detailed and reliable
phenotypes. Yet, despite the general recognition of its importance to clinical and translational
research, phenotyping as a scientific discipline has lagged behind advances in genetics, a
deficiency that prompted Freimer and Sabatti to call for a “human phenome project.”(2)

Therefore, Rockefeller University investigators have undertaken an initiative to enhance
human phenotyping under the auspices of a Clinical and Translational Science Award (CTSA).
To address the deficiencies in current practices (including the lack of standardized, rigorous,
and comprehensive data recording instruments, the common practice of discarding case report
forms after study completion, and the use of differing instruments by different investigators), we
developed an electronic phenotyping system for use by investigators worldwide. This prototype
system uses the bleeding history as a paradigm.


In order to promote standardization, transparency, and aggregation of data from multiple
sources, as well as to facilitate data retrieval and analysis, the phenotyping system is grounded
in the creation of domain ontologies for the disorders under study. Ontologies help to achieve
these goals by explicitly defining the existing knowledge about a disorder. This allows a group of
investigators to formally define and encode that information. In this way, the ontology allows one
to develop a common understanding of a disorder among a community of investigators and
make assumptions about the disorder explicit. The ontology’s electronic structure facilitates the
organizational analysis of the encoded knowledge, including database design and the merger of
different databases. Examples of ontologies range from the Gene Ontology (GO)(3) to the
Internet search engine Yahoo (“Yet Another Hierarchical Officious Oracle”).


We set the following goals for the system:

1. Ensure the quality of the instrument by expert review.
2. Maximize the use of standardized vocabulary for medical terms.
3. Ensure the security of the system.
4. Ensure transparency by making the instrument publicly available.
5. Facilitate adoption of the recording instrument by investigators at other sites by making it
Web-accessible.
6. Connect the instrument to a scalable database.


II. Building the Bleeding History Phenotyping System


The Bleeding History Phenotyping System (BHPS, Figure 1) consists of a
comprehensive bleeding history questionnaire; a bleeding history ontology; an electronic
phenotype recording instrument (PRI); and a database.



The first step in developing the BHPS was the creation of a comprehensive Bleeding
History Questionnaire (BHQ). The BHQ was used as the reference for constructing a Bleeding
History Ontology (BHO). The BHO explicitly defines knowledge about, and relations between
and among, bleeding symptoms in an electronic format that is scalable, standardizable, and
tractable for database manipulation and machine learning applications. The BHO served as the
foundation for an electronic Phenotype Recording Instrument (PRI). The PRI employs logical
axioms to speed data collection as well as pictorial aids to facilitate accurate data collection; it
also includes integrated data representation and analysis utilities (see Section II.4: Phenotype
Recording Instrument). The PRI is available at https://bh.rockefeller.edu/prat/. Instructions for
use can be obtained from Dr. Andreas Mauer (smollah@rockefeller.edu). The BHO also serves
as the template for a Bleeding History Database (BHD) that stores de-identified demographic
data and question responses.



Figure 1: Bleeding History Phenotyping System.

After an extensive literature search and review by experts, a paper clinical reminder form was
converted into a comprehensive Bleeding History Questionnaire (BHQ). The questionnaire formed
the basis for a Bleeding History Ontology (BHO) as well as a Bleeding History Database (BHD)
and a graphical user interface and electronic recording instrument, the Phenotype Recording
Instrument (PRI).




As of August 4, 2009, the BHPS had been used by four investigators to collect
comprehensive phenotypic information on bleeding symptoms from 500 normal individuals
across three sites (an academic research facility and two community health centers). The BHPS
is freely available to investigators worldwide, and an administrative framework for the
dissemination of BHPS instruments to the hemostasis community has been established. The
BHPS methodology was presented at the American Medical Informatics Association 2009
Summit on Translational Bioinformatics,(4) and preliminary analyses of data collected with the
BHPS were presented at the XXII Congress of the International Society on Thrombosis and
Haemostasis.




We are eager to extend our approach to the phenotyping of other disorders and are
therefore pleased to offer our support to other investigators interested in developing
phenotyping instruments in their own fields of expertise.


1. Medical History, Physical Examination, and Laboratory Data Selection and
Organization


The first step of our phenotyping system methodology (Figure 2) entails the creation of a
comprehensive phenotyping questionnaire to collect signs and symptoms associated with the
disorder or group of disorders. Given the importance of expert opinion in knowledge
modeling,(5) it is vital that the ontologies reflect the most recent and complete information,
drawn from a thorough review of the literature and the opinion of experts in the field. In
addition, to standardize the language used in the ontology and questionnaire, we recommend
mapping as many terms as possible to the codes contained in controlled vocabularies such as
the Unified Medical Language System (UMLS).(6) Other mappings employed in the BHPS
include the International Classification of Diseases, 9th Revision (ICD-9)(7) for medical
diagnoses and Online Mendelian Inheritance in Man (OMIM)(8) for genetic information on
particular disorders.
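
As an illustration of the term-mapping recommendation above, the following minimal Python sketch
records which controlled-vocabulary codes have been attached to each questionnaire term and
reports the terms that still need mapping. It is not taken from the BHPS: the class and function
names are hypothetical, the example terms are illustrative, and every code shown is a placeholder
rather than a real UMLS or ICD-9 identifier.

```python
from dataclasses import dataclass, field


@dataclass
class TermMapping:
    """One questionnaire term and the external codes it has been mapped to."""
    term: str
    # vocabulary name -> code; every code used in this sketch is a placeholder
    codes: dict = field(default_factory=dict)


def unmapped_terms(mappings, vocabulary):
    """Return the terms that still lack a code in the given vocabulary."""
    return [m.term for m in mappings if vocabulary not in m.codes]


def coverage(mappings, vocabulary):
    """Return the fraction of terms that carry a code in the given vocabulary."""
    mapped = sum(1 for m in mappings if vocabulary in m.codes)
    return mapped / len(mappings)


# Hypothetical terms with placeholder codes (not real identifiers).
mappings = [
    TermMapping("epistaxis", {"UMLS": "CUI_PLACEHOLDER_1", "ICD-9": "ICD9_PLACEHOLDER_1"}),
    TermMapping("easy bruising", {"UMLS": "CUI_PLACEHOLDER_2"}),
    TermMapping("gum bleeding"),
]

print(unmapped_terms(mappings, "UMLS"))
print(unmapped_terms(mappings, "ICD-9"))
```

Tracking coverage per vocabulary in this way makes it easy to see how far the goal of
"as many terms as possible" has been met before the ontology is built.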



Figure 2: Sample Phenotyping System Methodology

A Phenotyping Questionnaire (1) is used to derive a Phenotype Ontology (2). The ontology is in
turn used to build a Phenotype Database (3) and an electronic Phenotype Recording Instrument (4).
Existing databases and/or registries may be incorporated using ontology-mediated approaches.
Possible applications for phenotyping systems include generation of phenotype scoring
instruments and analyses of genotype-phenotype correlations.


2. Ontology


The second step is ontology construction. Many existing ontologies may be adapted for
phenotyping applications, including those contained in public ontology repositories such as
BioPortal(9) and the Open Biomedical Ontologies (OBO) Foundry.(10) If no appropriate
ontology already exists for the desired purpose, a new ontology must be constructed. Numerous
methodologies for ontology construction have been proposed,(11-14) but all share a few
common principles (adapted from Uschold(14)):


1. Identify the purpose of the ontology. What will it be used for? Who will use it?


2. Define the level of formality. In general, the more informal the ontology, the easier it is
for humans to interpret. Conversely, more formal ontologies are more tractable for computerized
applications such as database mergers and automated reasoning.


3. Define the scope. Should all possible terms relevant to a given disorder be included,
or will a subset of terms suffice? The scope of the ontology will be directly related to the
ontology’s purpose.


4. Build the ontology. A variety of computer programs are available for ontology
development, but the standard in medical domains is Protégé,(15) an open-source ontology
editor supported by the National Center for Biomedical Ontology (NCBO), an element of the NIH
Roadmap for Medical Research. Protégé has gained wide acceptance in the biomedical
informatics community and supports several ontology formats in addition to database functions.
For this reason, the BHO constructed by the Rockefeller team was encoded using Protégé.


5. Make the ontology publicly available and continually re-evaluate and revise the
ontology. One of the benefits of an ontology is to help develop a consensus understanding of a
topic, and this is best achieved by making the ontology publicly available so that experts in the
disorder and experts in biomedical informatics can review and comment on its content and
organization. This can be achieved by uploading the ontology to one of the two leading
repositories of biomedical ontologies, BioPortal and the OBO Foundry. A systematic approach
to updating ontologies at regular intervals based on community feedback is particularly
important.
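
To make the idea of "explicitly defining knowledge" more concrete, the sketch below models a tiny
is-a hierarchy in plain Python. It is only an informal illustration under stated assumptions: the
concept names are hypothetical and are not drawn from the actual BHO, and a production ontology
would normally be authored in an editor such as Protégé rather than hand-coded like this.

```python
class Concept:
    """A node in a simple is-a hierarchy (an informal stand-in for an ontology class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Return the names of all broader concepts, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def descendants(self):
        """Return the names of all narrower concepts in depth-first order."""
        names = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.descendants())
        return names


# Hypothetical concepts, chosen only for illustration.
root = Concept("bleeding symptom")
nosebleed = Concept("nosebleed", parent=root)
spontaneous = Concept("spontaneous nosebleed", parent=nosebleed)
bruising = Concept("bruising", parent=root)

print(spontaneous.ancestors())
print(root.descendants())
```

Even this informal structure shows the trade-off described in step 2: it is easy for a person to
read, but it carries none of the formal axioms that automated reasoners rely on.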




For additional details on the purpose of ontologies and methods for their construction,
reviews by Noy(12) and Uschold(14) can be consulted.

3. Database


After ontology construction is complete, the ontology structure can be used as the
template for building a database. Because one aim of ontology-based phenotyping systems is to
make data sources freely accessible via the Internet, “database” refers here to relational
database systems such as Oracle, Microsoft SQL Server, or MySQL.(16) The Rockefeller BHPS
is implemented in MySQL because this database package is open source, fast, and works well
with programming languages such as Python and Perl that are useful for Web design.
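
The following sketch suggests one way an ontology structure could act as the template for a
relational schema. It is an assumption-laden illustration, not the BHD design: the table and
column names, the ontology fragment, and the subject code are all hypothetical, and Python's
built-in sqlite3 module is used in place of MySQL only so that the example runs without a
database server.

```python
import sqlite3

# Hypothetical ontology fragment as (concept, parent concept); parents are listed first.
ONTOLOGY = [
    ("bleeding symptom", None),
    ("nosebleed", "bleeding symptom"),
    ("bruising", "bleeding symptom"),
]


def build_database(conn):
    """Create concept and response tables, then load the ontology fragment."""
    conn.executescript(
        """
        CREATE TABLE concept (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            parent_id INTEGER REFERENCES concept(id)
        );
        CREATE TABLE response (
            id INTEGER PRIMARY KEY,
            subject_code TEXT NOT NULL,   -- de-identified subject code
            concept_id INTEGER NOT NULL REFERENCES concept(id),
            answer TEXT NOT NULL
        );
        """
    )
    ids = {}
    for name, parent in ONTOLOGY:
        cursor = conn.execute(
            "INSERT INTO concept (name, parent_id) VALUES (?, ?)",
            (name, ids.get(parent)),  # root concepts get a NULL parent
        )
        ids[name] = cursor.lastrowid
    return ids


conn = sqlite3.connect(":memory:")
concept_ids = build_database(conn)

# Store one hypothetical response and read it back through the concept table.
conn.execute(
    "INSERT INTO response (subject_code, concept_id, answer) VALUES (?, ?, ?)",
    ("S001", concept_ids["nosebleed"], "Yes"),
)
rows = conn.execute(
    "SELECT c.name, r.answer FROM response r JOIN concept c ON c.id = r.concept_id"
).fetchall()
print(rows)
```

Because every response row points at a concept row, data from different sites that share the same
ontology can be pooled without reconciling free-text question labels.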

4. Phenotype Recording Instrument


The ontology can also serve as the basis for a comprehensive, Web-based PRI similar
to that used by the BHPS. The Rockefeller PRI was developed using the Python programming
language and the Django Web Application Framework,(17) but numerous other options such as
Adobe Dreamweaver(18) can also be used to design a PRI.


Within our PRI, each group of phenotypic symptoms is independently accessible so as
to create convenient, modular questionnaire sections. Within sections, logical axioms are
implemented to speed questionnaire completion. For instance, if a subject answers “Yes” to the
question “Have you ever had or do you currently have spontaneous nosebleeds?” the PRI will
direct the subject to appropriate follow-up questions; in contrast, if the answer is “No,” the subject
will not be asked any further questions about nosebleeds and will be immediately directed to the
next question module.
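
The branching behavior described above can be sketched in a few lines of Python. This is a
simplified illustration rather than the PRI's actual implementation: the data layout and function
name are hypothetical, and all question wording other than the nosebleed question quoted in the
text is invented for the example.

```python
# Each module has one gate question and follow-ups that are shown only on a "Yes" answer.
MODULES = [
    {
        "gate": "Have you ever had or do you currently have spontaneous nosebleeds?",
        "follow_ups": [
            "How often do the nosebleeds occur?",   # hypothetical follow-up
            "How long do they usually last?",       # hypothetical follow-up
        ],
    },
    {
        "gate": "Do you bruise easily?",            # hypothetical gate question
        "follow_ups": ["Where on the body do the bruises appear?"],
    },
]


def questions_to_ask(modules, answer_for):
    """Return the questions presented, skipping follow-ups when a gate is answered 'No'."""
    asked = []
    for module in modules:
        asked.append(module["gate"])
        if answer_for(module["gate"]) == "Yes":
            asked.extend(module["follow_ups"])
        # On any other answer, move straight to the next module.
    return asked


# A subject who answers "No" to nosebleeds and "Yes" to easy bruising.
answers = {MODULES[0]["gate"]: "No", MODULES[1]["gate"]: "Yes"}
asked = questions_to_ask(MODULES, lambda question: answers.get(question, "No"))
print(asked)
```

In this run the two nosebleed follow-ups are never presented, which is the time saving the logical
axioms are meant to provide.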


The Rockefeller PRI is time-stamped so that the time required to complete the study can
be analyzed. Users can log off and log on as they wish so the PRI does not have to be
completed in a single session. Visual aids such as high-quality photographs (Figure 3) can be
included to help individuals understand the questions and provide accurate responses. In
addition, data representation utilities (Figure 4) can be implemented to help investigators review
their data.
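
Because the instrument may be completed over several sessions, completion time has to be summed
across log-on and log-off pairs. The sketch below shows one way this could be computed with
Python's standard datetime module; the function name and the session timestamps are hypothetical
and do not reflect how the Rockefeller PRI stores its time stamps.

```python
from datetime import datetime, timedelta


def total_completion_time(sessions):
    """Sum the durations of (log_on, log_off) timestamp pairs."""
    total = timedelta()
    for log_on, log_off in sessions:
        if log_off < log_on:
            raise ValueError("log-off precedes log-on")
        total += log_off - log_on
    return total


# Hypothetical subject who completed the questionnaire in two sittings.
sessions = [
    (datetime(2013, 1, 7, 9, 0), datetime(2013, 1, 7, 9, 20)),
    (datetime(2013, 1, 8, 14, 0), datetime(2013, 1, 8, 14, 15)),
]
elapsed = total_completion_time(sessions)
print(elapsed)
```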

Figure 3: Phenotype Recording Instrument


Figure 4: Data representation utilities







The process of questionnaire, ontology, database, and PRI development requires the
collaboration of biomedical informaticists and clinicians. At Rockefeller, it required approximately
one year to complete all of the steps in constructing the BHPS. The time required for other
systems will depend on the nature of available data collection instruments, as well as the
availability of clinical and biomedical informatics expertise. We are eager to help other groups
develop ontology-driven phenotyping systems for other disorders by sharing our experience and
offering guidance on their construction and deployment.



III. Bibliography


(1) Shindler E. Framingham Heart Study. 2008 December 3. Available from: URL:
http://www.framinghamheartstudy.org/about/milestones.html

(2) Freimer N, Sabatti C. The human phenome project. Nat Genet 2003 May;34(1):15-21.

(3) Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS,
Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE,
Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nat Genet 2000 May;25(1):25-9.

(4) Mauer AC, Barbour E, Khazanov N, Levenkova N, Mollah S, Coller BS. An Ontology-Driven
Bleeding History Phenotyping System to Pool Data Across Sites. 2009 Mar 15; American Medical
Informatics Association; 2009 p. 176.

(5) Gomez Perez A, Benjamins VR. Overview of Knowledge Sharing and Reuse Components:
Ontologies and Problem-Solving Methods. 2009 Aug 2; 2009.

(6) Kashyap V, Borgida A. Representing the UMLS Semantic Network using OWL: (Or "What's in a
Semantic Web link?"). In: Fensel D, Sycara K, Mylopoulos J, editors. The Semantic Web -
International Semantic Web Conference. Springer-Verlag, Heidelberg; 2003. p. 1-16.

(7) National Center for Health Statistics. International Classification of Diseases, 9th Revision.
2009 April 16. Available from: URL: http://www.cdc.gov/nchs/about/major/dvs/icd9des.htm

(8) Online Mendelian Inheritance in Man. 2009 January 5. Available from: URL:
http://www.ncbi.nlm.nih.gov/omim/

(9) National Center for Biomedical Ontology. BioPortal. 2009. Available from: URL:
http://bioportal.bioontology.org/

(10) Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A,
Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N,
Whetzel PL, Lewis S. The OBO Foundry: coordinated evolution of ontologies to support
biomedical data integration. Nat Biotechnol 2007 November;25(11):1251-5.

(11) Gruber T. A translation approach to portable ontology specifications. Knowledge Acquisition
1993;5(2):199-220.

(12) Noy N, McGuinness D. Ontology Development 101: A Guide to Creating Your First Ontology.
2008 December 30. Available from: URL:
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html




(13) Stevens R, Goble CA, Bechhofer S. Ontology-based knowledge representation for bioinformatics.
Brief Bioinform 2000 November;1(4):398-414.

(14) Uschold M. Building Ontologies: Towards a Unified Methodology. 1996 Dec 16; 1996.

(15) Protege-OWL. 2009 January 5. Available from: URL:
http://protege.stanford.edu/overview/protege-owl.html

(16) MySQL. 2009 April 2. Available from: URL: http://www.mysql.com

(17) Django. 2009 April 2. Available from: URL: www.djangoproject.com/

(18) Adobe Dreamweaver. 2009 April 2. Available from: URL:
http://www.adobe.com/products/dreamweaver/