CDISC EU Interchange 2012

Semantic Models for CDISC Based Standards and Metadata Management

Introduction

We may have reached a critical turning point in the way clinical data can be managed, used,
and reused within and across organizations. The coverage and maturity of existing CDISC
standards, the establishment of these standards within the industry at large, the use of these
standards as a foundation for metadata driven systems, and the upcoming role of semantic
standards are all converging to create new and unique opportunities. In this presentation we look
at the implications and challenges of integrating CDISC standards, metadata, and information
models into a single framework. We also show how semantic standards can provide a solid
foundation for building such a framework.

CDISC Standards

The role of data standards for the management of clinical data has shifted significantly over the
past few years, largely due to the establishment of CDISC standards across the pharmaceutical
industry. Not so long ago, sponsors had to consider if and when they should use SDTM standards
for FDA submissions. Today, those questions have changed. Not if and when, but how best to
adopt CDISC based data standards is becoming the leading question. This change in mindset is
in itself a major step forward, but it also leads to formidable challenges for CDISC as the
owner of the standards, for sponsors integrating these standards into their own organizations, for
vendors providing products and services, and for regulatory organizations reviewing submitted
data.

A key challenge for any set of standards is to be consistent and complete. Looking at the CDISC
standards, we see a variety of standards at different levels of maturity. The SDTM standards,
domains, and terminology seem to have the highest level of adoption to date, but as more
sponsors submit data according to those standards, their shortcomings become magnified. SDTM is
an informal model and in many instances open to interpretation. This leads to inconsistencies in
how collected data is mapped to SDTM, potentially across studies from a single sponsor, but
definitely across studies from different sponsors. As sponsors get comfortable adopting the
SDTM standards, they naturally venture into the CDASH and ADaM standards. These standards
have had a shorter lifetime and have not yet reached the maturity level of SDTM, while suffering
from similar problems. In addition, issues about consistency at the content and representational
levels across the CDISC standards come into focus as well. This is highlighted by the disconnect
between the standards just mentioned and the BRIDG model, a comprehensive domain analysis
model for protocol-driven biomedical and clinical research, captured as a UML model.

Sponsors adopting CDISC have to deal with these issues. They also face the challenge of managing
and integrating CDISC based data standards within their respective organizations at the
information architecture, process, and systems application levels. In the following sections we
outline some fundamental principles that can help meet these challenges.

Information Architecture

We already indicated the importance for a set of standards to be complete and consistent. Formal
models make these notions precise. Another observation is that the content of the CDISC
standards depends on the meaning of what is studied in the biological and clinical reality (often
referred to as concepts), and on how these concepts are represented by data elements from protocol
to submission, i.e. we are dealing with semantic and metadata information about biomedical and
clinical research knowledge and data. The conclusion follows directly: an information
architecture taking this into account needs to be based on a formal ontological metadata model.

Well placed to get the job done are semantic models based on the W3C semantic web standards
(RDF, OWL, SKOS). These standards provide the means to define a formal representation of a
body of knowledge. In short, the Resource Description Framework (RDF) specifies a general
model of how any piece of knowledge can be represented by statements of the form
Subject-Predicate-Object or Subject-Predicate-Value, called triples. Each part of a triple
(except Value) has a Uniform Resource Identifier (URI), and triples can be aggregated into graphs
with subjects and objects as nodes, and predicates as arcs. The Web Ontology Language (OWL) adds
a typing mechanism to classify subjects and objects into a hierarchy of classes and defines
modeling constructs to express knowledge about predicates. This gives a rich modeling vocabulary
to build schemas and the capability to derive new triples from existing triples (inference).
Finally, the Simple Knowledge Organization System (SKOS) is a thin RDF based vocabulary that can
be used to build terminologies. See [2] for more information on RDF based standards.
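
To make the triple model and the idea of inference concrete, here is a minimal sketch in plain
Python, not a use of any RDF library. Triples are tuples, the graph is a set of tuples, and a
single subclass rule derives new type statements. All names with the "ex:" prefix are
hypothetical, and prefixed names stand in for full URIs.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
# All "ex:" names are hypothetical; rdf:type and rdfs:subClassOf are real
# RDF/RDFS terms, written here as prefixed names instead of full URIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

graph = {
    ("ex:VitalSign", SUBCLASS_OF, "ex:Observation"),
    ("ex:BloodPressure", SUBCLASS_OF, "ex:VitalSign"),
    ("ex:obs1", RDF_TYPE, "ex:BloodPressure"),
    ("ex:obs1", "ex:hasUnit", "mmHg"),  # Subject-Predicate-Value form
}


def infer_types(triples):
    """Add the rdf:type triples implied by the class hierarchy.

    Rule: if (x rdf:type C) and (C rdfs:subClassOf D), then (x rdf:type D).
    The rule is applied repeatedly until no new triple appears.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            if p != RDF_TYPE:
                continue
            for c, q, parent in list(closed):
                if q == SUBCLASS_OF and c == o:
                    new = (s, RDF_TYPE, parent)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


inferred = infer_types(graph)
derived = sorted(inferred - graph)  # triples that were not stated explicitly
for triple in derived:
    print(triple)
```

The two derived triples state that ex:obs1 is also an ex:VitalSign and an ex:Observation, although
neither fact was asserted. OWL reasoners support many more rules than this one; the sketch only
shows the principle of deriving triples from triples.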

A knowledge base written in RDF can easily be shared between systems by serializing it into
formats such as RDF/XML. RDF knowledge bases are also easy to federate and cross-reference,
as witnessed by the development of the Linked Open Data (LOD) cloud, a large collection of open
and cross-linked RDF data sets available on the web today. In this context it should be noted that
an OWL version of the NCI Thesaurus (the source for CDISC’s controlled terminologies) is
freely available today in an RDF/XML format. Also, an effort is well under way to port the
BRIDG UML model to an OWL based ontology.
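
The following sketch illustrates why federation is easy when everything is a triple: merging two
knowledge bases is a set union, and shared URIs connect them without any mapping step. It is a
plain Python illustration under stated assumptions. All URIs under http://example.org/ are
hypothetical, and a simple line-based, N-Triples-style output is used in place of RDF/XML to keep
the example short.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Two independently maintained knowledge bases that share one URI.
standards = {
    (EX + "sdtm/VSTESTCD", EX + "schema/usesCodeList", EX + "terminology/VSTESTCD"),
}
terminology = {
    (EX + "terminology/VSTESTCD", EX + "schema/definedIn", EX + "thesaurus/ConceptX"),
}


def to_lines(triples):
    """Serialize URI-only triples as N-Triples-style lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def reachable(triples, start):
    """Return every node reachable from start by following subject -> object links."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


# Federation is a set union; the shared URI links the two graphs.
federated = standards | terminology

print(to_lines(federated))
print(sorted(reachable(federated, EX + "sdtm/VSTESTCD")))
```

Starting from the standards element, the thesaurus entry is reachable only in the federated
graph, which is the cross-referencing effect described above.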

Looking across the CDISC standards, we notice that the content is itself metadata, hence the
RDF schema we have in mind corresponds to a level 3 meta-model. A good starting point here is
the ISO 11179 standard for metadata registries (MDR). This standard is a bit elaborate and not
that widely adopted, but it does provide a good basis for developing a small and generic
OWL vocabulary for metadata models, including most notably the capability of item level
versioning for anything that goes into a metadata registry. Using an ISO 11179 based OWL
vocabulary, it is fairly straightforward to create a knowledge base for the CDASH, SDTM, and
ADaM standards.
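
Item level versioning can be pictured as follows. This is a hypothetical sketch in plain Python,
not the ISO 11179 model itself and not any existing MDR implementation: the class names, field
names, and the identifier "ex:SystolicBP" are assumptions chosen for illustration. The point is
that each registered item carries its own version, and a revision adds a new version instead of
overwriting the old one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegisteredItem:
    """One immutable version of a registry item (hypothetical structure)."""
    identifier: str
    version: int
    definition: str


class Registry:
    """Keeps every version of every item; a revision never overwrites history."""

    def __init__(self):
        self._items = {}  # (identifier, version) -> RegisteredItem

    def register(self, identifier, definition):
        """Create version 1 of a new item."""
        if any(i == identifier for i, _ in self._items):
            raise ValueError(f"{identifier} is already registered")
        item = RegisteredItem(identifier, 1, definition)
        self._items[(identifier, 1)] = item
        return item

    def latest(self, identifier):
        """Return the highest version of an item."""
        versions = [v for i, v in self._items if i == identifier]
        if not versions:
            raise KeyError(identifier)
        return self._items[(identifier, max(versions))]

    def revise(self, identifier, definition):
        """Store a new version next to the existing ones."""
        current = self.latest(identifier)
        item = replace(current, version=current.version + 1, definition=definition)
        self._items[(identifier, item.version)] = item
        return item

    def get(self, identifier, version):
        """Return one specific version of an item."""
        return self._items[(identifier, version)]


registry = Registry()
registry.register("ex:SystolicBP", "Systolic blood pressure")
registry.revise("ex:SystolicBP", "Systolic blood pressure in mmHg")
print(registry.latest("ex:SystolicBP"))
print(registry.get("ex:SystolicBP", 1))
```

Because versions are tracked per item, a study built against version 1 of an element can keep
referring to it after the element has been revised.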

Finally, there is a need to eliminate any room for interpretation and to guarantee consistency
between the different CDISC standards. A biomedical concept model, representing the meaning
of what is studied in the biological and clinical reality, can provide the glue to hold everything
together. It provides common and precise semantic content for any CDASH, SDTM, and ADaM
data element, and restricts these standards to having only representational capabilities. On the
other side of the coin, an RDF based biomedical concept model can link directly into other RDF
sources with semantic content, such as the NCI Thesaurus, and BRIDG once its OWL
representation is available.
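
As a rough illustration of the concept model acting as glue, the sketch below links data elements
from three standards to one shared concept and checks that no element points at an undefined
concept. It is plain Python under stated assumptions: the predicate "ex:represents", the concept
identifiers with the "bc:" prefix, and all element names are hypothetical and do not come from
the published CDASH, SDTM, or ADaM specifications.

```python
from collections import defaultdict

REPRESENTS = "ex:represents"  # hypothetical predicate: data element -> concept

# Concepts defined in the (hypothetical) biomedical concept model.
known_concepts = {"bc:SystolicBloodPressure"}

# Data elements from three standards, each pointing at the concept it represents.
triples = [
    ("cdash:SysBP_Collected", REPRESENTS, "bc:SystolicBloodPressure"),
    ("sdtm:SysBP_Tabulated", REPRESENTS, "bc:SystolicBloodPressure"),
    ("adam:SysBP_Analysis", REPRESENTS, "bc:SystolicBloodPressure"),
    ("sdtm:Orphan_Element", REPRESENTS, "bc:UndefinedConcept"),
]


def elements_by_concept(triples):
    """Group data elements by the concept they represent."""
    grouped = defaultdict(list)
    for element, predicate, concept in triples:
        if predicate == REPRESENTS:
            grouped[concept].append(element)
    return {concept: sorted(elements) for concept, elements in grouped.items()}


def dangling_elements(triples, concepts):
    """List data elements whose concept is not defined in the concept model."""
    return sorted(e for e, p, c in triples if p == REPRESENTS and c not in concepts)


print(elements_by_concept(triples)["bc:SystolicBloodPressure"])
print(dangling_elements(triples, known_concepts))
```

The first query answers which collection, tabulation, and analysis elements carry the same
meaning; the second is the kind of consistency check that a shared concept layer makes possible.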

Our considerations on an information architecture for CDISC standards based on semantic web
standards lead to the following RDF based information stack.


[Figure 1: RDF based information stack. Components shown: RDF, OWL, and SKOS; ISO 11179 MDR
Schema (subset); BRIDG and ISO 21090; Biomedical Concept Model; CDASH, SDTM, and ADaM;
Sponsor Extensions; NCI Thesaurus.]

Notice that the top layer offers sponsors the opportunity to extend content based on existing RDF
schemas, e.g. sponsors may add additional SDTM data elements as supplemental qualifiers, or
introduce additional RDF schemas to cover new types of content.

CDISC Considerations

The CDISC standards have come a long way, both in terms of maturity and adoption, but they also
face considerable challenges as more sponsors use the standards, and even more so as substantial
content is expected to be added for therapeutic areas. A layered information architecture based
on semantic standards can provide a solid foundation to systematically address these challenges.

The CDISC SHARE project may be the best place to get such an effort under way, but it will
require substantial commitment from CDISC as a whole to be successful. Just recently we have
provided a first draft OWL model to give a home to the ideas that the SHARE team has been
working on over the past few years. The future roadmap, however, seems to be unclear at best,
with no firm commitment to implementation goals and timelines. At the same time, the SHARE
team is already producing much valuable content that fits extremely well in the biomedical
concept model.

Sponsor and Vendor Considerations

Right now we seem to have reached a turning point, driven by the widespread adoption of CDISC
standards and an emerging need for sponsors to establish a standards management function
within their respective organizations. Large organizations have increasing difficulty just dealing
with the resulting workload of managing and applying clinical data standards. This naturally
leads to the need for a metadata repository (MDR).

The same arguments for the information architecture given earlier apply even more here.
RDF/XML serves as an RDF interface format for MDR content. As indicated before, such content
can easily be shared and federated, but it can also be loaded into a triple store database. Since
an RDF knowledge base can carry its own schema and everything is represented by triples, the
triple store load is immediate and the RDF knowledge base directly represents the MDR content.
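
The remark that the load is immediate can be sketched as follows. Because schema statements and
content statements are both triples, a single three-column table can hold the whole knowledge
base and one load step covers everything. This is an illustration only: an in-memory SQLite table
stands in for a real triple store database, and all "ex:" names are hypothetical.

```python
import sqlite3

# Schema and content are both plain triples, so they load the same way.
triples = [
    ("ex:DataElement", "rdf:type", "owl:Class"),           # schema
    ("ex:hasLabel", "rdf:type", "owl:DatatypeProperty"),   # schema
    ("ex:element1", "rdf:type", "ex:DataElement"),         # content
    ("ex:element1", "ex:hasLabel", "Systolic Blood Pressure"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)  # the whole load


def match(conn, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    query = "SELECT s, p, o FROM triple" + where + " ORDER BY s, p, o"
    return conn.execute(query, params).fetchall()


classes = match(conn, p="rdf:type", o="owl:Class")         # query the schema
instances = match(conn, p="rdf:type", o="ex:DataElement")  # query the content
print(classes)
print(instances)
```

No table design per content type is needed: adding a new kind of MDR content means adding
triples, not changing the database schema.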

Two examples illustrate how sponsors have started to implement semantic standards and apply
linked data principles. At Roche this is done by implementing an internally built MDR; see more
details below. At AstraZeneca the requirements on a commercial MDR product will include an
interface to MDR content based on semantic standards and linked data principles. This is part of
a larger effort called integrative informatics (i2), establishing the components to let a Linked
Data cloud grow across AstraZeneca R&D.

MDR Based Standards Implementation at Roche

In a first phase, Roche has successfully defined a set of clinical trial data standards based on
CDISC, ISO 11179 MDR, and the W3C semantic standards, following the architecture shown
earlier in Figure 1. In this implementation, the biomedical concept model has deliberately been
designed as a thin layer in anticipation that CDISC SHARE is going to provide this part of the
stack later on. BRIDG can be added as soon as its OWL representation becomes available. The data
collection and data tabulation standards cover all of safety and the Roche therapeutic areas, but
are only partially based on CDASH. Data analysis standards are still in their infancy.

In a second phase, Roche built an MDR and an application infrastructure in 2011. This
includes a controlled mechanism to publish the RDF stack to a triple store database, a web
browser application to deliver the content to end-users, and a set of web services to provide
access to other applications. The MDR includes item level versioning following ISO 11179 and
is deployed in a high availability IT production environment. The next release is scheduled to
include semantic search and linking from the biomedical concept model into the NCI Thesaurus.
The good news for sponsors is that semantic technology has proven to work at all levels, from
W3C standards to semantic toolsets such as modeling workbenches, triple store databases, and
application programming interfaces (APIs).

Roche is now entering a third phase to establish MDR driven workflow automation from
protocol to submission. The idea is to implement a semantic representation of the protocol and
data analysis plan, and from there use the MDR content to support study build, provide data
transformation services to derive SDTM mappings, and finally support the production of data
analysis and submission deliverables.

References

1. To read more on knowledge systems and semantic modeling, the following are recommended.

   Dean Allemang and Jim Hendler. Semantic Web for the Working Ontologist. Second Edition.
   Morgan Kaufmann, 2011. This is an excellent, well-written book, specifically on the modeling
   aspects of RDF and OWL.

   Christopher Walton. Agency and the Semantic Web. Oxford University Press, 2007. This
   book gives a broad outlook on knowledge systems and the semantic web, including more
   academic background on the computational aspects of the subject.

   Dragan Gasevic, Dragan Djuric, and Vladan Devedzic. Model Driven Engineering and
   Ontology Development. Second Edition. Springer, 2009. This book provides valuable
   insight on knowledge engineering and the relationship between the different modeling
   spaces.

2. Here is a good entry page to locate the W3C standards for the semantic web, in particular the
   RDF, RDFS, OWL, and SKOS standards:
   http://www.w3.org/2001/sw/wiki/Main_Page

3. To see what the National Cancer Institute (NCI) is doing in the area of controlled
   terminologies and ontology modeling, have a look here:
   https://cabig.nci.nih.gov/concepts/EVS/

4. The National Center for Biomedical Ontology (NCBO) is a great resource for biomedical
   ontologies and related technologies. It can be accessed here:
   http://www.bioontology.org/