CruxFinalReport-01-27-11-0101-gully - MSU Forge


Nov 2, 2013 (3 years and 7 months ago)


Michael J. Fox Foundation for Parkinson’s Research

Grant Award


Progress Report





AWARD DURATION (in years):





Insert project aims or goals from the original submitted application.













A set of standard terminology curated under the OBI formalism for the MJFF-sponsored experiments.














Data pertaining to curation times and workload imposed on grantees by using the system. This specifically pertains to how the system will scale to accommodate new data from other grantees. Issues such as (a) the challenges of working with large data sets, (b) dealing with disparate data formats, (c) data security models and methods, (d) ease of use.


We will provide MJFF with a representation of the accomplishments of this project including a clear estimation of the feasibility of constructing a production-level KEfED-based system for working on their complete research portfolio. The implementation developed in this project will certainly be incomplete but should provide a clear indicator of the difficulty of the task and the suitability of our solution.


Use whatever space necessary to concisely summarize progress made toward each stated goal (photos, charts,
and other supplementary materials may be attached to this report).

Aim 1:

A working, populated demonstration system for the two experiments described.

The experiments referred to are projects funded by the Kinetics Foundation (infusion experiments by Marina Emborg at the Wisconsin National Primate Center) and the Michael J Fox Foundation (infusion experiments conducted by Codman and Shurtleff, Inc.). The development of the base software application forms the main technical component of the project and provides the focal point of interaction between the three teams.


and progress over the year.

Within this report package we include interim reports from the course of development of the project that we cite in this section. These reports are:



use case design, system walkthroughs, dashboards, preliminary models and terminology.


acquiring data from the MS Access infusion database developed by Martin Brady, populating
KEfED models, developing the KEfED editor system, and formulating the ontology
modeling work within Crux.


description of
data gathering from

). Specifically
modeling the Rh2266 case in detail, including pressure data files and images).

These reports describe the initial design phase and the project formulation for this work, charting progress over the course
of the year.

Additionally, in August, we demonstrated the prototype system to Michael Weiner to showcase our capabilities and to have him vet our progress for the foundation. He was initially very impressed with the system’s underlying design and concept and wanted us to work with him to develop a solution for his laboratory, which we attempted to pursue as an element of the core Crux project. It became clear that Dr Weiner needed a short-term solution for his group and that, since Crux was an alpha prototype, we would not be able to provide a full production-level system immediately. We regard his interest as encouraging and feel that we should revisit attempting to interest him in using our system as it matures.

Final product deployment

Since the overall system is built as a locally accessible Ruby on Rails application, we distribute the final system as a Virtual Machine (‘VM’, via the VirtualBox virtualization software package, distributed under either the GNU GPL
). This allows software engineers at both the Michael J Fox and Kinetics Foundation to install a complete virtual machine containing the application and its data on any system (Linux, PC, or Mac).

This approach is somewhat cumbersome (the VM to be installed is a large artifact
taking in excess of 2 GB in disk space), but it is completely self-contained, it can be
installed on any platform and has allowed us to leverage the Ruby on Rails web
application architecture from Gwen Jacobs’ group. This provides fully
for the application as well since the web
server within the VM can only be
accessed from a browser running from within the VM as well. This method avoids
any issues of platform incompatibility and it is robust for low level scripting
approaches that we might use to automate execution. It should not be considered the
long-term solution but serves as a delivery mechanism for this prototype.

The system consists of several components, which provided a mechanism for separating activities and assigning responsibilities to each part of the team (see Fig. 1). The central point of control of the application is the Ruby on Rails Crux Server, which serves the HTML client system to the web browser to be accessed by the end user. The Crux server also embeds and controls the KEfED Editor within the client as a plugin component. Both the Crux Server and KEfED Editor store their data via a Persevere NoSQL database server (which provides a REST Service interface for storage, editing and deletion). The ontologies under development are also stored on the VM within an OpenLink Virtuoso server that provides a standard RDF SPARQL endpoint so that the KEfED Editor can access the relevant terminology.

Gwen Jacobs’ group at MSU provided the overall framework for this architecture, except for the KEfED Editor system (provided by Gully Burns’ group at ISI) and the ontology repository and server (from Alan Ruttenberg at ScienceCommons).

Prototype functionality

Figure 2: Complete state machine showing system-level transitions between pages of the Crux HTML Client. Screenshots of these elements are shown in Figs. 3-7. Note that simple dialog boxes are not included in this description.


Figure 1: The component architecture of the Crux system year 1 prototype.

Program flow within the Crux system radiates from the central ‘Index’ page where all the individual experiments are listed. For each experiment, it is then possible to navigate to perform a number of different activities on either the model or the data contained in the model. We here include screenshots of the various steps to facilitate this explanation.
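The page flow described here can be sketched as a small transition table. This is an illustrative sketch only: the page identifiers are paraphrased from the figure captions in this report, and the return transitions (for example from the dashboard back to the index) are assumptions rather than a transcription of the actual state machine.

```python
# Page names paraphrase the figure captions; the identifiers are illustrative,
# and the "back" transitions are assumptions rather than documented behaviour.
TRANSITIONS = {
    "index": {"dashboard", "design", "settings"},
    "dashboard": {"index", "variable_popup"},
    "variable_popup": {"dashboard", "view_all_data"},
    "view_all_data": {"dashboard"},
    "design": {"index"},
    "settings": {"index"},
}


def navigate(start: str, path: list[str]) -> str:
    """Follow `path` from `start`, raising ValueError on an undefined transition."""
    current = start
    for target in path:
        if target not in TRANSITIONS.get(current, set()):
            raise ValueError(f"no transition from {current!r} to {target!r}")
        current = target
    return current
```

For example, navigate("index", ["dashboard", "variable_popup", "view_all_data"]) follows the route from the experiment list to the full data table for one variable.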

Figure 3: Crux Index Page. This shows a list of experiments curated into the system. Each experiment is designated with an Investigator and an Experimental Description. It is possible to navigate to the Experimental Dashboard (Fig. 5), the Experimental Design (Fig. 4), and the experiment settings (not shown) pages.

Figure 4: KEfED editor page. This shows the experiment design tools of the system. On the left-hand panel is a ‘canvas’ that permits the user to draw an experimental design as a diagram.

Figure 5: Experiment Dashboard. This screen shows a two
representation of the data in the Crux system. Each measurement variable is listed as a link that expands to show the substructure of the variable. The number of individual data tuples for each variable is listed by the variable name. Notably the dashboard system uses the ‘KEfED
a non-editable component that permits the end user to click on an element within the model and then select the corresponding variable in the Crux system (see Fig. 6).

Figure 6: Pop-up information for a specific variable. As shown in Figure 5, each variable on the dashboard may be expanded to show how the variable is itself indexed by Parameter variables (or within the ‘Quickview’ panel, where some examples of the data contained in the variable are listed). Clicking on the ‘View All Data’ link navigates to the screen shown in Figure 7.

Figure 7: Measurement data for a specific variable. Note that the number of columns can become quite large for complex designs. The ‘Import CSV’ button and ‘Download CSV’ link permit the user to upload and download a comma-delimited file for the data contents of the system.

The KEfED Editor

Many novel aspects of this system are wrapped up in the functionality of the KEfED editor system. We will therefore describe the functionality of this specific component of the system in more depth here. The main idea of applying the ‘Knowledge Engineering from Experimental Design’ model is to provide a lightweight, intuitive approach to representing the design of an experiment in terms of the interrelationships between experimental variables.

Figure 8: A cartoon depicting an example usage of the KEfED editor to drag an ‘activity’ (in this case ‘injection’, as opposed to
on’ see Fig. 9 and 10) from the palette into the canvas to add it to the current model. The focus of the right-hand panel shifts to a specialized view that permits the user to enter specific data pertaining to this element. Within the middle pane, the user may then search for underlying ontology terminology by clicking the ‘Search’ button. A search window opens, automatically performing a query on the NCBO’s ontology server for the term ‘injection’. Terms may then be checked in the form and then added to the element in the model by clicking the ‘Add’ button.

The functionality of the KEfED editor (developed within Crux leveraging work from the BioScholar project) permits the construction of models expressed as workflows with additional semantics. The example shown in Fig. 8 illustrates the intuitive drag-and-drop functionality available within Flex and the way that the system includes linkages to ontology terminology from OBI. Features of this system specifically include structured data.

The KEfED editor provides a typed, structured approach to defining variables within the KEfED Editor. Each variable may be defined as one of the following set of types.

Simple Types: True/False, Integer, Decimal.

Scientific Types: Decimal with units.

Ontology Types: Term (ontology terms), Region (a specialized ontology object for brain structures).

Text Types: Text, Text List, Long Text.

Time-related Types: Date, Time, DateTime.

File-based Types: File, Image.

Types: Table (a structured object with attributes set to the other basic types).
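To make the idea of typed variables concrete, the following is a minimal sketch of how a subset of the types listed above could be checked when raw text values are loaded. The enum, the function name and the accepted text formats (ISO 8601 dates and times, and a "number unit" form for decimals with units) are assumptions for illustration and do not describe the KEfED Editor's actual implementation.

```python
from datetime import date, datetime, time
from enum import Enum


class VarType(Enum):
    """A subset of the variable types listed in the report."""
    BOOLEAN = "True/False"
    INTEGER = "Integer"
    DECIMAL = "Decimal"
    DECIMAL_WITH_UNITS = "Decimal with units"
    TEXT = "Text"
    DATE = "Date"
    TIME = "Time"
    DATETIME = "DateTime"


def parse_value(var_type: VarType, raw: str):
    """Convert `raw` to a Python value of the given type, or raise ValueError."""
    raw = raw.strip()
    if var_type is VarType.BOOLEAN:
        lowered = raw.lower()
        if lowered not in ("true", "false"):
            raise ValueError(f"not a True/False value: {raw!r}")
        return lowered == "true"
    if var_type is VarType.INTEGER:
        return int(raw)
    if var_type is VarType.DECIMAL:
        return float(raw)
    if var_type is VarType.DECIMAL_WITH_UNITS:
        # Assumed format: a number, a space, then the unit, e.g. "2.5 ul".
        number, _, unit = raw.partition(" ")
        if not unit.strip():
            raise ValueError(f"missing unit in {raw!r}")
        return float(number), unit.strip()
    if var_type is VarType.DATE:
        return date.fromisoformat(raw)
    if var_type is VarType.TIME:
        return time.fromisoformat(raw)
    if var_type is VarType.DATETIME:
        return datetime.fromisoformat(raw)
    return raw  # TEXT: keep the stripped string as-is
```

Ontology, file-based and table types are left out of the sketch because they would need references to terms, stored files or nested structures rather than simple text parsing.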

The KEfED model framework therefore provides a general purpose framework for modeling (A) the protocol, (B) parameters and (C) measurement variables for an experiment. The KEfED Editor provides a simple, easy-to-use interface (available as a component within the BioScholar web application and as a separate plug-in component in Crux).
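One way to picture the three parts of such a model is as a small workflow graph with variables attached to its steps. The sketch below is an assumption-laden illustration: the class, the step names and the rule that a measurement is indexed by every parameter attached to a step upstream of it are ours for the purpose of the example, and the parameter slice_thickness is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KefedModel:
    # edges[a] lists the protocol steps that directly follow step a
    edges: dict[str, list[str]] = field(default_factory=dict)
    # parameters and measurements map a variable name to the step it is attached to
    parameters: dict[str, str] = field(default_factory=dict)
    measurements: dict[str, str] = field(default_factory=dict)

    def upstream_steps(self, step: str) -> set[str]:
        """Return `step` together with every step from which it can be reached."""
        reverse: dict[str, list[str]] = {}
        for src, dsts in self.edges.items():
            for dst in dsts:
                reverse.setdefault(dst, []).append(src)
        seen: set[str] = set()
        stack = [step]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(reverse.get(node, []))
        return seen

    def indexing_parameters(self, measurement: str) -> list[str]:
        """Parameters attached to the measurement's step or to any upstream step."""
        steps = self.upstream_steps(self.measurements[measurement])
        return sorted(name for name, step in self.parameters.items() if step in steps)


# Illustrative model loosely based on the infusion experiments described in this report.
model = KefedModel(
    edges={
        "place_catheter": ["infuse"],
        "infuse": ["measure_pressure", "acquire_mr_image"],
    },
    parameters={
        "pump_configuration": "infuse",
        "time": "measure_pressure",
        "slice_thickness": "acquire_mr_image",  # hypothetical parameter
    },
    measurements={"line_pressure": "measure_pressure"},
)
```

Here model.indexing_parameters("line_pressure") returns the pump configuration and time but not the imaging parameter, because the imaging step is on a separate branch of the workflow.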

KEfED Models for the Codman and Emborg experiments

Our primary effort has centered around models for the experiments of the Codman and Shurtleff project and work by Marina Emborg from the Wisconsin National Primate Research Center. These are shown below in detail.

Experimental KEfED Models For Codman.

(A) Codman experimental design: these experiments involve studying the dynamics of infusing compounds into gel flasks and measuring both the pressure and volume of the resulting bolus. The studies also involve performing MR imaging on the flask to obtain a concentration map of the infusate within the flask on which relaxivity calculations could be based. (B) Codman experimental design: these experiments involve placing a catheter into the brain of an experimental subject and then studying the relation between pressure and volume of the infusate within the sample. Concurrently with this process, MR imaging is performed on this sample. Following the looped section of the experiment, the experimental subject is perfused and the brain is extracted and processed for immunohistochemistry to generate slides which are then imaged to provide an image stack (which we would store as an archived file within the system).

Both the MR imaging and pressure measurements in both
experiments are performed over time (which is used as an indexing
variable over a sampling loop).



Experimental KEfED Models For Emborg.

(A) Emborg design: as was the case with the Codman experiments, a catheter was placed into a gel flask and an infusion of a specific chemical was started. Over the time course of this infusion, pressure at the tip of the catheter was measured and then analysed to provide summary calculations of a number of variables (flow
pressure, peak
) over the course of the experiment. Concurrently with this, MR images are acquired so that volumetric calculations can be made.

(B) Emborg experimental design: With this experiment we were only able to find pressure data from the Access database made available to us, and so we have modeled the link pressure data from there. Ex vivo brains were infused and pressure measurements and subsequent calculations were

(C) Emborg experimental design: Live animals had Navigus infusion systems surgically implanted so that catheters could be placed into position. Initial scans were also made to determine the target coordinates and then to confirm the accuracy of the placement. Over the time course of an infusion (indexed by time), a compound was infused under a specific pump configuration and the line pressure was measured. These data were then analyzed to generate calculated values of peak pressure and equilibrium pressure. During the same experimental acquisition, MR imaging was performed and a number of data transformations applied to eventually calculate a
map’ of the infusion site. This map could be imaged (which would then be placed into a written report of the experiment) and processed to calculate the value of the following crucial measurement values: total
volume (Vi), total
amount (Ai),
volume (Vd), and distribution
target (Vt).




KEfED Data for the Codman and Emborg experiments

As described, we have curated data from the Emborg experiments only. Codman’s IP interests prohibited any access to the actual data pertaining to the models that we constructed. Thus, we populated the Emborg models from two sources: the Microsoft Access Database constructed by Martin Brady (Infusion.mdb) and the data repository. Our main focus for this activity was the Emborg data set as a complex experimental design requiring multiple passes over the design to add stages where it was clear that data needed to be added. For example, within an early draft of the experimental model we encapsulated the processing of MRI data to generate volumetric measurements as a single step called “MRI Data Processing”. It became clear that this step involved several stages for which the intermediate data would typically be stored and would need to be tracked within the data management system. Within the current data management strategy used by this project (a centralized MS Access database to manage metadata and individual data files stored in a shared document repository), we observed that finding and retrieving these intermediate data files was complicated. Using the KEfED model to track and organize these data files seems to provide a natural approach that requires little additional extension of the existing approach. It was largely because of this observation that we added the ‘file’ measurement type. The brainfu repository contained a very large number of files (since each ‘image stack’ was itself a set of files that we downloaded one by one and then zipped into a single archive).
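The step of bundling an image stack into a single archive can be automated with a few lines of scripting. The following is a minimal sketch using only the Python standard library; the function name and the directory layout (one directory per image stack) are assumptions for illustration and do not describe how the files were actually processed in this project.

```python
import zipfile
from pathlib import Path


def archive_image_stack(stack_dir: Path, archive_path: Path) -> int:
    """Zip every regular file under `stack_dir`; return the number of files stored."""
    # Collect the file list before creating the archive so the archive itself
    # is never picked up if it is written inside `stack_dir`.
    files = sorted(p for p in stack_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to the stack directory to keep the archive portable.
            zf.write(path, arcname=path.relative_to(stack_dir).as_posix())
    return len(files)
```

The resulting archive could then be attached to a single ‘file’ measurement variable rather than tracking each image separately.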

POL: Anything else needed for this section?

Aim 2: A set of standard terminology curated under the OBI formalism for the MJFF-sponsored experiments.

Alan, can you provide a write-up of your work with the ontology work here?

Aim 3:
Data pertaining to curation times and workload imposed on grantees by using the system. This specifically pertains to how the system will scale to accommodate new data from other grantees. Issues such as (a) the challenges of working with large data sets, (b) dealing with disparate data formats, (c) data security models and methods, (d) ease of use.

Our original plan was to recruit users from the two Foundations into the team and to perform a planned experiment with clear metrics. It became clear that this would be unfeasible for the current way that the project is organized, largely due to the workload actively present for likely collaborators from within the Foundations (and the early stage of development of the software). Martin Brady provided very valuable support for this process but we absolutely need to work closely with a member of the end-user community (i.e. a grant administrator).

Here, we candidly describe our experiences of curating data into the Crux system with the stated caveat that the developers and inventors of the system performed this work. We will offset the natural bias of these observations by focusing on difficulties and possible issues as we have observed them within our use of the system.

The curation times for the data described in Aims 1 and 2 were short (of the order of 2 hours for the entire MS Access database for the experiments) since the data was already formatted appropriately. The process of generating a spreadsheet for the data was simple and populating the system only required that the columns in the spreadsheet matched the names (and types) of variables in the KEfED model. Given the ease with which Excel and other spreadsheet software can manipulate very large arrays
such tables manually (and the ease with which scripts and lightweight data processing systems manipulate such tables), we feel that the process of manipulating data to be uploaded into the system is very straightforward. Given that the process of generating the target spreadsheets from a KEfED model is fully automated, we feel that populating such tables is laborious and repetitive but
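The requirement that spreadsheet columns match the variable names in the model can be sketched as follows. This is an illustration only: the function names are hypothetical, the column layout (one column per indexing parameter plus one for the measurement) is our reading of the description above, and type checking is omitted.

```python
import csv
import io


def template_header(parameters: list[str], measurement: str) -> list[str]:
    """Columns for one measurement variable: its parameters, then the measurement."""
    return list(parameters) + [measurement]


def read_rows(csv_text: str, expected_header: list[str]) -> list[dict[str, str]]:
    """Parse CSV text, insisting that its columns match the expected variable names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV input")
    header = [name.strip() for name in header]
    missing = [name for name in expected_header if name not in header]
    extra = [name for name in header if name not in expected_header]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, unexpected={extra}")
    # Skip blank lines; keep every value as text for a later type-checking step.
    return [dict(zip(header, row)) for row in reader if row]
```

A spreadsheet exported as CSV with columns time, pump_configuration and line_pressure would pass this check for a line_pressure variable indexed by those two parameters.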

A technical issue that we intend to solve going forward is that each individual measurement variable generates an entire spreadsheet, and if the variable has many parameter dependencies, every single parameter will have a great many columns. We found that this was functional, but somewhat inelegant and should be improved in future developments. One suggestion would be that we could reuse the same spreadsheet for multiple measurement variables.

By far, the most uncertain aspect of the evaluation of this approach is whether the underlying KEfED modeling formulation could be combined with the technically challenging elements of ontology curation to permit the easy and straightforward development of practically viable models for experiments under the full workload of the Michael J Fox Foundation’s project load. One very positive outcome of the Crux project, achieved by having expert OBI curators work directly on the interactions between the development of the KEfED modeling approach and the OBI ontology, has been to elucidate the process of developing the elements of a model that can be sc
and performed effectively by a scientist who has not been trained as an ontologist (and consequently embedded into the logic of the drawing interface). From a modeling perspective, the tasks that a user of the Crux system would have to perform are (a) use the workflow primitives to describe accurately the complete experimental protocol at an effective level of granularity and (b) use the variable modeling approaches within the system to describe the data dependencies in the system.

A particular feature of this project that requires a little additional explanation is that of versioning. Scientists tend to iterate over experimental designs, requiring that their models be quite flexible and extensible. Within a conventional database system, this might mean that columns and tables would need to be added over time, extending the schema. Naturally, these changes then affect the data that was previously curated under a simpler experimental design by adding columns with no data. This situation is not tenable in the long run. We therefore envisage a flexible database with varying numbers of columns depending on the version of the experimental design that is being used and populated. This is a foundational principle of the KEfED approach.
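The idea of a table whose columns vary with the version of the experimental design can be illustrated with a small in-memory sketch. The class and method names below are hypothetical and the storage is deliberately simplistic; the point is only that each record is checked against the columns of the design version it was curated under, so older records never acquire empty columns.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    # versions maps a version number to the column names of that design version
    versions: dict[int, tuple[str, ...]] = field(default_factory=dict)
    # each stored row remembers which design version it was curated under
    rows: list[tuple[int, dict]] = field(default_factory=list)

    def add_version(self, columns: list[str]) -> int:
        """Register a new design version and return its number (1, 2, ...)."""
        number = len(self.versions) + 1
        self.versions[number] = tuple(columns)
        return number

    def insert(self, version: int, record: dict) -> None:
        """Store `record` if its keys are exactly the columns of `version`."""
        expected = set(self.versions[version])
        if set(record) != expected:
            raise ValueError(f"record keys {sorted(record)} do not match version {version}")
        self.rows.append((version, dict(record)))

    def rows_for(self, version: int) -> list[dict]:
        """Return only the records curated under the given design version."""
        return [record for v, record in self.rows if v == version]
```

Adding a pump_configuration column in a second version of a design would then leave the records stored under the first version untouched.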

The security issues invoked by the system simply relied on keeping all services implemented within the system encrypted and secure. This technology is mature and given the restricted way that the system is deployed, we anticipate no difficulties. If, in the future, the system must be deployed over the network (which would likely be the preferred way to deploy and use this), the approach would need to be adjusted so that different roles could be assigned for different users.

The ease-of-use issues are challenging to address without direct feedback from grant administrators. As described above, it is essential to perform usability experiments within the foundation to investigate this question.

Aim 4:
We will provide MJFF with a representation of the accomplishments of this project including a clear estimation of the feasibility of constructing a production-level KEfED-based system for working on their complete research portfolio. The implementation developed in this project will certainly be incomplete but should provide a clear indicator of the difficulty of the task and the suitability of our solution.

The project as it stands at the present time is a prototype system that demonstrates the feasibility of using KEfED as an underlying approach for constructing a knowledge management system for MJFF’s grants. The development work performed within the group has provided such a tool, but with caveats.


The system is currently implemented as part of the Yogo framework (from MSU), using the core KEfED editor system (under development within ISI) and making use of the OBI ontology. Constructing the system on a single platform within a community with additional resources (and also other end-users and use-cases) is vital to the process of converting this demonstration prototype to a functional operational system.


The development task of building a KEfED-enabled database is tractable.


Scaling up such a system is primarily complicated by the possible complexity of dealing with a large number of different experiments and experimental types. Our experiences of working with (a) the preliminary modeling approach via the interface suggest that constructing such models based on the experience of grant officers (which we assume involves a deeper understanding of the underlying science than members of our team) should be straightforward, (b) it is not entirely clear, however, that the coordination required to link terms together from these many models would be straightforward to do (and is the perennial challenge of developing shared knowledge representations for reuse). It simply remains that working with only five separate experimental designs that are all closely related in this project is insufficient to fully inform us of the challenges that working with several hundred such designs would pose. These questions remain an important next step of the development of the KEfED approach in conjunction with community-driven ontology development.


Discuss your final conclusions.

We feel that this project demonstrated the feasibility of this approach by leveraging existing systems together into a working application. Since these systems originated under different design goals and implementations, the major challenges of this work were concerned with coordination of the team and bringing together these technical components. The system itself is deployable, functional and could be used beyond the scope of the existing project to manage experimental designs in its own right. The representations of the example experiments are preliminary and would need to be validated and improved upon through work with expert grant managers.


Identify unresolved issues and/or next steps for the project going forward.

This work has brought to light several representational issues that will improve the formulation:


Extensions of the representation of the experimental protocol to include loops and their indexing variables (such as repeating an experimental assay at different times).


As mentioned above, versioning an experimental design.


Managing the relationships of parameters within data transformation steps is more complex than that from experimental workflows that give rise to measurements. In order to effectively address this question, we will use the same approaches as those developed for computational workflows. There may be a substantial overlap between the two technologies which we will actively investigate and exploit.


Data types and their relationship to measurement scales is an interesting issue that may make the way we model data in the
system more modular and tractable.

The MSU group has other commitments that prevent them from participating
directly in the second year of funding (but see the
section below on ‘Collaboration’).

We are excited though to bring the system directly into activities of the Biomedical Information Research Network (BIRN,
), of which Dr Gully Burns is the chair of the Knowledge Engineering Working Group. This is a large
consortium effort of experts in Data Management, Information Integration, Knowledge Engineering, Workflows, Genetics, Security and Operations. The environment is ideal to take the preliminary results of this project and convert them into a fully-fledged working system. In addition, work in the BioScholar project (in which the KEfED Editor originated) has progressed and will serve as a suitable launching platform for future work.

Importantly, BIRN itself has a dedicated team of senior scientists (headed by Dr Karl Helmer) who are responsible for outreach activities. We anticipate expanding the scope of this work to other disease foundations (and possibly to other funding organizations and consortia). BIRN will provide direct support for this activity and have agreed to pursue such matters going forward (Dr Joe Ames, Dr Karl Helmer).

Future work must include direct and active participation with end-users; we will perform usability and performance evaluations of the system on an ongoing basis.

Additionally, we will write this year’s work into a scientific publication based on non-confidential data from outside of the





Report (and include copies of) any abstracts, findings, or papers (submitted for publication, in press, or published) that resulted from the funds associated with this award

Dr Gully Burns presented this project at the International Biocuration meeting in 2009, predating the contract but nonetheless attracting a very positive response.

Dr Gwen Jacobs presented Crux as a poster at the Society for Neuroscience annual meeting in 2009.




(a) Indicate funding you have received from other sources for work based on this MJFF-funded award. Please include the name of the funder (e.g. NIH) and the work funded (title, $ amount).


Please also indicate if you
to seek

additional funds for work related to this MJFF grant.

Dr Gwen Jacobs’ Yogo project is an R01 that is due for renewal this year. She will be directly leveraging results from this year’s work in securing future funding from NIH. In addition, Dr Burns and Ruttenberg will continue to serve as consultants on that project.

Gully Burns’ BioScholar project is also an R01 in its third year. Since BioScholar is based on using KEfED modeling with the literature there is enormous scope to include a data-oriented approach under the same technology.



Describe any resources (including but not limited to resources such as cell lines and mouse models) that resulted from your work as well as any patents and/or licenses that resulted from discovery on this project.



Will any collaboration result from this work? (Note: Collaboration can include work with
pharma and biotech as well as academic or nonprofit collaborations.)

This project has inherently involved a collaborative effort between three groups, leading to many technical challenges that we have overcome. As described above, a direct collaboration will be the continued work by all three partners within the next phase of the Yogo group.

The development of the project within BIRN will expose the system not only to a range of bioinformatics technology developers but also to their users, including the Nonhuman Primate Centers (including the center where Dr Marina Emborg works in Wisconsin). We anticipate pushing this connection if it is in the interests of the Foundation.

There was significant interest in this work from Mike Weiner of the ADNI group. Unfortunately, the system in its present form is too preliminary since he needed access to a fully production-level system immediately. Nonetheless Dr Weiner remains a very interesting possible collaborator in the future.

The OBI consortium includes a large number of different members (including Susanne Sansone, the originator of the ISA framework). We will pursue possible collaboration with her and Dr Tim Clark, the originator of the SWAN argumentation ontology, the main developer of the PD Research Online web community and the head of the W3C HCLS Scientific Discourse group. The KEfED approach may be used within Dr Clark’s framework as the basis for structured nanopublications as a direct result of presenting the KEfED methodology at the Beyond the PDF meeting in January 2011.


I certify that the statements herein are true, complete and accurate to the best of my knowledge