CruxFinalReport-01-27-11-0101-gully - MSU Forge


2 Nov 2013



Michael J. Fox Foundation for Parkinson’s Research

Grant Award Final Progress Report



PRINCIPAL INVESTIGATOR:

DATE:

MJFF PROGRAM:

AWARD START DATE:

AWARD DURATION (in years):

PROJECT TITLE:






PROGRESS TOWARDS INITIAL AIMS

1. Insert project aims or goals from the original submitted application.


1. A working, populated demonstration system for the two experiments described.

2. A set of standard terminology curated under the OBI formalism for the MJFF-sponsored experiments.

3. Data pertaining to curation times and the workload imposed on grantees by using the system. This specifically pertains to how the system will scale to accommodate new data from other grantees. Issues include (a) the challenges of working with large data sets, (b) dealing with disparate data formats, (c) data security models and methods, and (d) ease of use.

4. We will provide MJFF with a representation of the accomplishments of this project, including a clear estimation of the feasibility of constructing a production-level KEfED-based system for working on their complete research portfolio. The implementation developed in this project will certainly be incomplete but should provide a clear indicator of the difficulty of the task and the suitability of our solution.


2. Use whatever space necessary to concisely summarize progress made toward each stated goal (photos, charts, and other supplementary materials may be attached to this report).


Aim 1:

A working, populated demonstration system for the two experiments described.


The experiments referred to are projects funded by the Kinetics Foundation (infusion experiments by Marina Emborg at the Wisconsin National Primate Research Center) and the Michael J Fox Foundation (infusion experiments conducted by Codman and Shurtleff, Inc.). The development of the base software application forms the main technical component of the project and provides the focal point of interaction between the three teams.


Milestones and progress over the year


This report package includes the interim reports produced during the development of the project, which we cite in this section. These reports are:


1. CruxReport-2010-02-Feb.pdf: initial use case design, system walkthroughs, dashboards, preliminary models and terminology.

2. CruxReport-2010-04-April.pdf: acquiring data from the MS Access infusion database developed by Martin Brady, populating KEfED models, developing the KEfED editor system, and formulating the ontology modeling work within Crux.

3. CruxReport-2010-06-June.pdf: description of data gathering from https://forsher.org (now https://brainfu.org/), specifically modeling the Rh2266 case in detail, including pressure data files and images.


These reports describe the initial design phase and the project formulation for this work, charting progress over the course of the year.


Additionally, in August, we demonstrated the prototype system to Michael Weiner to showcase our capabilities and to have him vet our progress for the Foundation. He was initially very impressed with the system’s underlying design and concept, and wanted us to work with him to develop a solution for his laboratory, which we attempted to pursue as an addition to the core Crux project. It became clear that Dr Weiner needed a short-term solution for his group and that, since Crux was an alpha prototype, we would not be able to provide a full production-level system immediately. We regard his interest as encouraging and feel that we should revisit interesting him in our system as it matures.




Final product deployment


Since the overall system is built as a locally accessible Ruby-on-Rails application, we distribute the final system as a Virtual Machine (‘VM’) using the VirtualBox virtualization software package, distributed under the GNU GPL from http://www.virtualbox.org/. This allows software engineers at both the Michael J Fox and Kinetics Foundations to install a complete virtual machine containing the application and its data on any system (Linux, PC, or Mac).


This approach is somewhat cumbersome (the VM to be installed is a large artifact, taking more than 2 GB of disk space), but it is completely self-contained, it can be installed on any platform, and it has allowed us to leverage the Ruby-on-Rails web-application architecture from Gwen Jacobs’ group. It also provides fully packaged security for the application, since the web server within the VM can only be accessed from a browser running within the VM. This method avoids any issues of platform incompatibility and is robust for low-level scripting approaches that we might use to automate execution. It should not be considered the long-term solution, but it serves as a delivery mechanism for this prototype.


The system consists of several components, which provided a mechanism for separating activities and assigning responsibilities to each part of the team (see Fig. 1). The central point of control of the application is the Ruby-on-Rails Crux Server, which serves the HTML client system to the end user’s web browser. The Crux Server also embeds and controls the KEfED Editor within the client as a plug-in component. Both the Crux Server and the KEfED Editor store their data via a Persevere NoSQL database server (which provides a REST service interface for storage, editing and deletion). The ontologies under development are also stored on the VM, within an OpenLink Virtuoso server that provides a standard RDF SPARQL endpoint so that the KEfED Editor can access the relevant terminology.
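As a rough illustration of the terminology lookup described above, the Python sketch below builds a SPARQL query and an HTTP GET request for a Virtuoso endpoint and parses the standard SPARQL JSON results format. It is not the KEfED Editor's own code; the endpoint address `http://localhost:8890/sparql` (Virtuoso's usual default), the label-matching query, and the function names are all assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Virtuoso's usual default SPARQL endpoint; the address actually
# configured inside the Crux VM is not stated in this report.
SPARQL_ENDPOINT = "http://localhost:8890/sparql"


def build_label_query(label: str, limit: int = 10) -> str:
    """SPARQL query for terms whose rdfs:label contains `label` (case-insensitive)."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?term ?label WHERE {\n"
        "  ?term rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> urllib.request.Request:
    """GET request in the style of the SPARQL 1.1 Protocol, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: bytes) -> list[tuple[str, str]]:
    """Extract (term IRI, label) pairs from a SPARQL JSON results document."""
    bindings = json.loads(body)["results"]["bindings"]
    return [(b["term"]["value"], b["label"]["value"]) for b in bindings]
```

Inside the VM, `parse_bindings(urllib.request.urlopen(build_request(build_label_query("infusion"))).read())` would return (IRI, label) pairs, assuming the loaded ontologies annotate their terms with `rdfs:label`.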


Gwen Jacobs’ group at MSU provided the overall framework for this architecture, except for the KEfED Editor system (provided by Gully Burns’ group at ISI) and the ontology repository and server (from Alan Ruttenberg at ScienceCommons).


Prototype functionality



Figure 2: Complete state machine showing system-level transitions between pages of the Crux HTML Client. Screenshots of these elements are shown in Figs. 3-7. Note that simple dialog boxes are not included in this description.















Figure 1: The component architecture of the Crux system year-1 prototype.




Program flow within the Crux system radiates from the central ‘Index’ page, where all the individual experiments are listed. For each experiment, the user can then navigate to a number of different activities on either the model or the data contained in the model. We include screenshots of the various steps to illustrate this explanation.
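The page flow can be sketched as a small state machine. The Python below is a hypothetical reconstruction from the captions of Figs. 3-7, not the Crux implementation: the page identifiers are invented, and the links back to the Index and Dashboard pages are assumed rather than stated in this report. It checks whether a navigation path is allowed and which pages can be reached from the Index.

```python
from collections import deque

# Hypothetical page identifiers reconstructed from the figure captions.
# Links back to "index"/"dashboard" are assumptions, not stated in the report.
TRANSITIONS: dict[str, set[str]] = {
    "index": {"dashboard", "design", "settings"},      # Fig. 3: Index page
    "design": {"index"},                               # Fig. 4: KEfED Editor page
    "settings": {"index"},                             # experiment settings (not shown)
    "dashboard": {"index", "variable_popup"},          # Fig. 5: Experiment Dashboard
    "variable_popup": {"dashboard", "variable_data"},  # Fig. 6: 'View All Data' link
    "variable_data": {"dashboard"},                    # Fig. 7: measurement data table
}


def is_valid_path(path: list[str]) -> bool:
    """True if every consecutive pair of pages is an allowed transition."""
    return all(nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:]))


def reachable(start: str = "index") -> set[str]:
    """Breadth-first search over the page graph from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in TRANSITIONS.get(page, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

With these assumed transitions, `["index", "dashboard", "variable_popup", "variable_data"]` is a valid path, while jumping straight from the Index to a variable's data table is not.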


Figure 3: Crux Index Page. This shows a list of experiments curated into the system. Each experiment is designated with an Investigator and an Experimental Description. From here, the user can navigate to the Experimental Dashboard (Fig. 5), the Experimental Design (Fig. 4), and the experiment settings (not shown) pages.


















Figure 4: KEfED editor page. This shows the experiment design tools of the system. The left-hand panel is a ‘canvas’ that permits the user to draw an experimental design as a diagram.























Figure 5: Experiment Dashboard. This screen shows a two-fold representation of the data in the Crux system. Each measurement variable is listed as a link that expands to show the substructure of the variable. The number of individual data tuples for each variable is listed beside the variable name. Notably, the dashboard uses the ‘KEfED-Navigator’, a non-editable component that permits the end user to click on an element within the model and thereby select the corresponding variable in the Crux system (see Fig. 6).
















Figure 6: Pop-up information for a specific variable. As shown in Figure 5, each variable on the dashboard may be expanded to show how the variable is itself indexed by Parameter variables; the ‘Quickview’ panel also lists some examples of the data contained in the variable. Clicking on the ‘View All Data’ link navigates to the screen shown in Figure 7.




Figure 7: Measurement data for a specific variable. Note that the number of columns can become quite large for complex designs. The ‘Import CSV’ button and ‘Download CSV’ link permit the user to upload and download a comma-delimited file of the data contents of the system.





The KEfED Editor


Many novel aspects of this system are wrapped up in the functionality of the KEfED editor. We therefore describe this component in more depth here. The main idea of applying the ‘Knowledge Engineering from Experimental Design’ (KEfED) model is to provide a lightweight, intuitive approach to representing the design of an experiment in terms of the interrelationships between experimental variables.







Figure 8: A cartoon depicting an example usage of the KEfED editor: the user drags an ‘activity’ (in this case ‘injection’, as opposed to ‘infusion’; see Figs. 9 and 10) from the palette onto the canvas to add it to the current model. The focus of the right-hand panel shifts to a specialized view that permits the user to enter specific data pertaining to this element. Within the middle pane, the user may then search for underlying ontology terminology by clicking the ‘Search’ button. A search window opens, automatically performing a query on the NCBO’s ontology server for the term ‘injection’. Terms may then be checked in the form and added to the element in the model by clicking the ‘Add’ button.
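The search-and-add step in Fig. 8 might look roughly like the Python sketch below. It targets today's NCBO BioPortal REST search endpoint (`https://data.bioontology.org/search`, with `q`, `ontologies` and `apikey` query parameters), which may differ from the service the 2010 editor actually called; the element dictionary, the function names, and the assumption that each search hit carries `@id` and `prefLabel` fields are illustrative only.

```python
import urllib.parse

# Assumption: the current BioPortal search endpoint, not necessarily the
# service queried by the KEfED Editor described in this report.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"


def search_url(term: str, ontologies: list[str], api_key: str) -> str:
    """URL for a BioPortal term search restricted to the given ontology acronyms."""
    params = {"q": term, "ontologies": ",".join(ontologies), "apikey": api_key}
    return BIOPORTAL_SEARCH + "?" + urllib.parse.urlencode(params)


def add_checked_terms(element: dict, hits: list[dict], checked: set[str]) -> dict:
    """Attach the search hits the user ticked to a model element (the 'Add' button).

    Each hit is assumed to carry '@id' and 'prefLabel' keys.
    """
    chosen = [{"id": h["@id"], "label": h["prefLabel"]} for h in hits if h["@id"] in checked]
    element.setdefault("terms", []).extend(chosen)
    return element
```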

















The functionality of the KEfED editor (developed within Crux, leveraging work from the BioScholar project, R01-GM083871) permits the construction of models expressed as workflows with additional semantics. The example shown in Fig. 8 illustrates the intuitive drag-and-drop functionality available within the Flex framework and the way that the system links model elements to ontology terminology from OBI. Features of this system specifically include structured data types for variables.


The KEfED editor provides a typed, structured approach to defining variables. Each variable may be defined as one of the following types:




Simple Types: True/False, Integer, Decimal.

Scientific Types: Decimal with units.

Ontology Types: Term (ontology terms), Region (a specialized ontology object for brain structures).

Text Types: Text, Text List, Long Text.

Time-related Types: Date, Time, DateTime.

File-based Types: File, Image.

Structured Types: Table (a structured object with attributes set to the other basic types).
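To make the typing concrete, here is a hedged Python sketch that converts raw spreadsheet cells into the listed types. The parsing rules are our assumptions, not the editor's: ISO 8601 dates and times, a `<number> <unit>` layout for decimals with units, semicolon-separated text lists, and only a non-empty check for Term, Region, File and Image values; Table is omitted.

```python
import re
from datetime import date, datetime, time
from decimal import Decimal, InvalidOperation


def _bool(s: str) -> bool:
    v = s.strip().lower()
    if v not in ("true", "false"):
        raise ValueError(f"not True/False: {s!r}")
    return v == "true"


def _decimal(s: str) -> Decimal:
    try:
        return Decimal(s.strip())
    except InvalidOperation:
        raise ValueError(f"not a decimal: {s!r}") from None


# Assumed layout for 'Decimal with units': a number, whitespace, then the unit.
_WITH_UNITS = re.compile(r"^\s*(\S+)\s+(\S.*?)\s*$")


def _decimal_with_units(s: str) -> tuple[Decimal, str]:
    m = _WITH_UNITS.match(s)
    if m is None:
        raise ValueError(f"expected '<number> <unit>': {s!r}")
    return _decimal(m.group(1)), m.group(2)


def _nonempty(s: str) -> str:
    if not s.strip():
        raise ValueError("empty value")
    return s.strip()


# Parsers keyed by the type names listed above (Table omitted). Term, Region,
# File and Image are only checked for being non-empty: an assumption.
PARSERS = {
    "True/False": _bool,
    "Integer": lambda s: int(s.strip()),
    "Decimal": _decimal,
    "Decimal with units": _decimal_with_units,
    "Term": _nonempty,
    "Region": _nonempty,
    "Text": str,
    "Text List": lambda s: [part.strip() for part in s.split(";") if part.strip()],
    "Long Text": str,
    "Date": lambda s: date.fromisoformat(s.strip()),
    "Time": lambda s: time.fromisoformat(s.strip()),
    "DateTime": lambda s: datetime.fromisoformat(s.strip()),
    "File": _nonempty,
    "Image": _nonempty,
}


def parse_value(type_name: str, raw: str):
    """Convert a raw cell (e.g. from an uploaded CSV) into the declared type."""
    return PARSERS[type_name](raw)
```

Typed parsing of this kind is what lets an upload fail early on a malformed cell instead of storing it silently.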


The KEfED model framework therefore provides a general-purpose framework for modeling (A) the protocol, (B) the parameters and (C) the measurement variables of an experiment. The KEfED Editor provides a simple, easy-to-use interface (available as a component within the BioScholar web application and as a separate plug-in component in Crux).
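A simplified reading of how the three parts fit together: parameters belong to activities in the protocol, and a measurement is indexed by the parameters of the activity that produces it and of every activity upstream of it. The Python sketch below encodes that rule over a protocol graph; the class names and the example activities and variables are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    parameters: list[str] = field(default_factory=list)
    measurements: list[str] = field(default_factory=list)


@dataclass
class Protocol:
    activities: dict[str, Activity] = field(default_factory=dict)
    predecessors: dict[str, list[str]] = field(default_factory=dict)

    def add(self, activity: Activity, after: tuple[str, ...] = ()) -> None:
        """Add an activity that follows the named activities in the protocol."""
        self.activities[activity.name] = activity
        self.predecessors[activity.name] = list(after)

    def upstream(self, name: str) -> set[str]:
        """The activity itself plus every activity that precedes it."""
        seen: set[str] = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.predecessors.get(current, []))
        return seen

    def dependencies(self, measurement: str) -> list[str]:
        """Parameters that index `measurement`: those of all upstream activities."""
        for activity in self.activities.values():
            if measurement in activity.measurements:
                return sorted(
                    p for a in self.upstream(activity.name) for p in self.activities[a].parameters
                )
        raise KeyError(measurement)
```

For example, a protocol with "implant catheter" (parameters animal, target), followed by "infuse" (compound, pump configuration), followed by "measure line pressure" (parameter time, measurement line pressure) gives `dependencies("line pressure") == ['animal', 'compound', 'pump configuration', 'target', 'time']`.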


KEfED Models for the Codman and Emborg experiments


Our primary effort has centered on models for the experiments of the Codman and Shurtleff project and the work of Marina Emborg at the Wisconsin National Primate Research Center. These are shown below in detail.


Experimental KEfED Models For Codman.






























Figure 9: (A) Codman in-vitro experimental design: these experiments involve studying the dynamics of infusing compounds into gel flasks and measuring both the pressure and volume of the resulting bolus. The studies also involve performing MR imaging on the flask to obtain a concentration map of the infusate within the flask, on which relaxivity calculations could be based. (B) Codman in-vivo experimental design: these experiments involve placing a catheter into the brain of an experimental subject and then studying the relation between pressure and volume of the infusate within the sample while MR imaging is performed concurrently. Following the looped section of the experiment, the experimental subject is perfused and the brain is extracted and processed for immunohistochemistry to generate slides, which are then imaged to provide an image stack (which we would store as an archived file within the system).


In both experiments, the MR imaging and pressure measurements are performed over time, which is used as the indexing variable of a sampling loop.
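One way to picture a sampling loop's effect on the data: the loop variable (time) becomes one more index column, so each combination of the other parameter values is repeated once per sample. A minimal Python sketch, with hypothetical variable names and sample times:

```python
from itertools import product
from typing import Any


def index_rows(parameters: dict[str, list[Any]], loop_var: str, samples: list[Any]) -> list[dict[str, Any]]:
    """One index row per combination of parameter values, loop variable innermost.

    Each row is the key under which one measurement tuple (e.g. a pressure
    reading) would be stored; the loop adds exactly one more index column.
    """
    names = [*parameters, loop_var]
    return [dict(zip(names, combo)) for combo in product(*parameters.values(), samples)]
```

`index_rows({"subject": ["S1", "S2"]}, "time_s", [0, 30, 60])` yields six rows, from `{'subject': 'S1', 'time_s': 0}` to `{'subject': 'S2', 'time_s': 60}`.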






Experimental KEfED Models For Emborg.





Figure 10: (A) Emborg in-vitro experimental design: as with the Codman experiments, a catheter was placed into a gel flask and an infusion of a specific chemical was started. Over the time-course of this infusion, pressure at the tip of the catheter was measured and then analysed to provide summary calculations of a number of variables (flow-time, start-pressure, peak-pressure, etc.) over the course of the experiment. Concurrently, MR images were acquired so that volumetric calculations could be made.


(B) Emborg ex-vivo experimental design: for this experiment we were only able to find pressure data in the Access database made available to us, so we have modeled the linked pressure data from there. Ex-vivo brains were infused, and pressure measurements and subsequent calculations were made.


(C) Emborg in-vivo experimental design: live animals had Navigus infusion systems surgically implanted so that catheters could be placed into position. Initial scans were also made to determine the target coordinates and then to confirm the accuracy of the placement. Over the time-course of an infusion (indexed by time), a compound was infused under a specific pump configuration and the line-pressure was measured. These data were subsequently analyzed to generate calculated values of peak-pressure and equilibrium pressure. During the same experimental acquisition, MR imaging was performed and a number of data transformations were applied to eventually calculate a ‘scaled-concentration-map’ of the infusion site. This map could be imaged (and the image then placed into a written report of the experiment) and processed to calculate the following crucial measurement values: total-infusate-volume (Vi), total-infusate-amount (Ai), distribution-volume (Vd), and distribution-volume-in-target (Vt).
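The pressure analyses mentioned in (A) and (C) reduce a time-indexed pressure trace to a few summary values. The Python sketch below shows one plausible way to compute them; the definitions are our assumptions (start-pressure as the first sample, peak-pressure as the maximum, equilibrium pressure as the mean of the final 20% of samples, flow-time as the span of the recording) and are not taken from the Emborg analyses. No units are assumed.

```python
from statistics import mean


def summarize_pressure(trace: list[tuple[float, float]], plateau_fraction: float = 0.2) -> dict[str, float]:
    """Summary values from a (time, pressure) trace ordered by time.

    Assumed definitions, for illustration only: start-pressure is the first
    sample, peak-pressure the maximum, equilibrium pressure the mean of the
    final `plateau_fraction` of samples, and flow-time the recording span.
    """
    if not trace:
        raise ValueError("empty pressure trace")
    times = [t for t, _ in trace]
    pressures = [p for _, p in trace]
    n_tail = max(1, int(len(pressures) * plateau_fraction))
    return {
        "start_pressure": pressures[0],
        "peak_pressure": max(pressures),
        "equilibrium_pressure": mean(pressures[-n_tail:]),
        "flow_time": times[-1] - times[0],
    }
```

For the trace `[(0, 1.0), (10, 5.0), (20, 3.0), (30, 2.0), (40, 2.0)]` this gives a start-pressure of 1.0, a peak-pressure of 5.0, an equilibrium pressure of 2.0 and a flow-time of 40.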




KEfED Data for the Codman and Emborg experiments


As described, we have curated data from the Emborg experiments only. Codman’s IP interests prohibited any access to the actual data pertaining to the models that we constructed. Thus, we populated the Emborg models from two sources: the Microsoft Access database constructed by Martin Brady (Infusion.mdb) and the https://brainfu.org/ data repository. Our main focus for this activity was the Emborg in-vivo data set, a complex experimental design that required multiple passes over the design to add stages where it was clear that data needed to be added. For example, in an early draft of the experimental model we encapsulated the processing of MRI data to generate volumetric measurements as a single step called “MRI Data Processing”. It became clear that this step involved several stages for which the intermediate data would typically be stored and would need to be tracked within the data management system.

Within the current data management strategy used by this project (a centralized MS Access database to manage metadata, with individual data files stored in a shared document repository), we observed that finding and retrieving these intermediate data files was complicated. Using the KEfED model to track and organize these data files seems to provide a natural approach that requires little extension of the existing approach. It was largely because of this observation that we added the ‘file’ measurement type. The brainfu repository contained a very large number of files, since each ‘image stack’ was itself a set of files, which we downloaded one-by-one and then zipped into a single archive.
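The consolidation step for image stacks (many files per stack, bundled into a single archive that a ‘file’ variable can point to) can be sketched with Python's standard `zipfile` module. The function name and directory layout are hypothetical, and the archive should be written outside the stack directory so that it does not include itself.

```python
import zipfile
from pathlib import Path


def archive_image_stack(stack_dir: Path, archive_path: Path) -> list[str]:
    """Zip every file under `stack_dir` into one archive; return the stored names.

    `archive_path` must lie outside `stack_dir`, otherwise the archive being
    written would be picked up by the directory walk.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(stack_dir.rglob("*")):
            if f.is_file():
                arcname = f.relative_to(stack_dir).as_posix()  # keep sub-folder layout
                zf.write(f, arcname)
                names.append(arcname)
    return names
```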


POL: Anything else needed for this section?


Aim 2: A set of standard terminology curated under the OBI formalism for the MJFF-sponsored experiments.


Alan, can you provide a write-up of your ontology work here?


Aim 3: Data pertaining to curation times and the workload imposed on grantees by using the system. This specifically pertains to how the system will scale to accommodate new data from other grantees. Issues include (a) the challenges of working with large data sets, (b) dealing with disparate data formats, (c) data security models and methods, and (d) ease of use.



Our original plan was to recruit users from the two Foundations into the team and to perform a planned experiment with clear usability metrics. It became clear that this would be unfeasible given the way the project is currently organized, largely due to the existing workload of likely collaborators within the Foundations (and the early stage of development of the software). Martin Brady provided very valuable support for this process, but we absolutely need to work closely with a member of the end-user community (i.e. a grant administrator).


Here, we candidly describe our experiences of curating data into the Crux system, with the stated caveat that the developers and inventors of the system performed this work. We offset the natural bias of these observations by focusing on difficulties and possible issues as we observed them in our use of the system.


The curation times for the data described in Aims 1 and 2 were short (on the order of 2 hours for the entire MS Access database for the Emborg ex-vivo and in-vivo experiments), since the data were already formatted appropriately. Generating a spreadsheet for the data was simple, and populating the system only required that the columns in the spreadsheet matched the names (and types) of variables in the KEfED model. Given the ease with which Excel and other spreadsheet software can manipulate very large arrays of such tables manually (and the ease with which scripts and lightweight data processing systems manipulate such tables), we feel that manipulating data to be uploaded into the system is very straightforward. Since generating the target spreadsheets from a KEfED model is fully automated, the remaining work of populating them is laborious and repetitive, but straightforward.
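The two halves of that workflow, generating an empty spreadsheet from the model and checking that an uploaded sheet's columns match the model's variable names, can be sketched in Python with the standard `csv` module. This is an illustration, not the Crux upload code; the function names are hypothetical, columns are assumed to be the measurement's parameters followed by the measurement itself, and type checking is omitted.

```python
import csv
import io


def template_csv(parameters: list[str], measurement: str) -> str:
    """Header-only spreadsheet for one measurement: its index parameters, then the value."""
    buf = io.StringIO()
    csv.writer(buf).writerow([*parameters, measurement])
    return buf.getvalue()


def read_upload(text: str, parameters: list[str], measurement: str) -> list[dict[str, str]]:
    """Parse an uploaded CSV, rejecting it unless its columns match the model's variable names."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    expected = {*parameters, measurement}
    found = set(fieldnames)
    if found != expected or len(fieldnames) != len(expected):  # also catches duplicates
        raise ValueError(
            f"column mismatch: missing={sorted(expected - found)}, extra={sorted(found - expected)}"
        )
    return list(reader)
```

Matching on names (rather than column positions) keeps the check tolerant of users reordering columns in a spreadsheet program.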


A technical issue that we intend to solve going forward is that each individual measurement variable generates an entire spreadsheet, and if the variable has many parameter dependencies, that spreadsheet will have a great many columns. We found that this was functional but somewhat inelegant, and it should be improved in future developments. One suggestion would be to reuse the same spreadsheet for multiple measurement variables.
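The suggested fix could start by grouping measurement variables that share exactly the same parameter dependencies, so that each group needs only one spreadsheet. A hypothetical Python sketch (variable names invented for illustration):

```python
from collections import defaultdict


def group_by_parameters(deps: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Map each distinct parameter set to the measurements indexed by exactly that set."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for measurement, params in deps.items():
        groups[tuple(sorted(params))].append(measurement)
    return {params: sorted(ms) for params, ms in groups.items()}


def sheet_headers(groups: dict[tuple[str, ...], list[str]]) -> list[list[str]]:
    """One shared spreadsheet header per group: parameters first, then measurements."""
    return [[*params, *measurements] for params, measurements in groups.items()]
```

With peak and equilibrium pressure both indexed by (animal, compound), they would share one sheet, while a time-indexed line pressure keeps its own.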


By far the most uncertain aspect of the evaluation of this approach is whether the underlying KEfED modeling formulation could be combined with the technically challenging elements of ontology curation to permit the easy and straightforward development of practically viable models for experiments under the full project load of the Michael J Fox Foundation. One very positive outcome of the Crux project, achieved by having expert OBI curators work directly on the interactions between the KEfED modeling approach and the OBI ontology, has been to elucidate which elements of a model can be scoped and developed effectively by a scientist who has not been trained as an ontologist (with the remaining ontological work consequently embedded into the logic of the drawing interface). From a modeling perspective, the tasks that a user of the Crux system would have to perform are (a) using the workflow primitives to describe the complete experimental protocol accurately, at an effective level of granularity, and (b) using the variable modeling approaches within the system to describe the data dependencies in the system.


A particular feature of this project that requires a little additional explanation is versioning. Scientists tend to iterate over experimental designs, requiring that their models be quite flexible and extensible. Within a conventional database system, this might mean that columns and tables would need to be added over time, extending the schema. Naturally, these changes then affect the data that was previously curated under a simpler experimental design by adding columns with no data. This situation is not tenable in the long run. We therefore envisage a flexible database with varying numbers of columns depending on the version of the experimental design that is being used and populated. This is a foundational principle of the KEfED approach.
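The versioning idea can be sketched as a table whose rows are stored under the design version that produced them, each version carrying its own column list, so that adding a variable in version 2 does not pad version-1 rows with empty columns. This Python is an illustration under that assumption, not the KEfED storage model; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Rows for one experiment, each stored under the design version that produced it."""

    schemas: dict[int, list[str]] = field(default_factory=dict)
    rows: list[tuple[int, dict]] = field(default_factory=list)

    def add_version(self, version: int, columns: list[str]) -> None:
        self.schemas[version] = list(columns)

    def insert(self, version: int, row: dict) -> None:
        columns = self.schemas[version]
        unknown = set(row) - set(columns)
        if unknown:
            raise ValueError(f"columns not in design v{version}: {sorted(unknown)}")
        # Only this version's columns are stored; later versions add nothing here.
        self.rows.append((version, {c: row.get(c) for c in columns}))

    def rows_for(self, version: int) -> list[dict]:
        """Rows curated under `version`, with only that version's columns."""
        return [r for v, r in self.rows if v == version]
```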


The security of the system simply relied on keeping all services implemented within the system encrypted and secure. This technology is mature and, given the restricted way that the system is deployed, we anticipate no difficulties. If, in the future, the system must be deployed over the network (which would likely be the preferred way to deploy and use it), the approach would need to be adjusted so that different roles could be assigned to different users.
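A networked deployment would need role-based access of roughly the following shape. The roles and permissions in this Python sketch are hypothetical; the report does not define them.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    EDIT_DATA = auto()
    EDIT_DESIGN = auto()


# Hypothetical roles; the report only says that roles would need to be assignable.
ROLES: dict[str, Permission] = {
    "viewer": Permission.READ,
    "grantee": Permission.READ | Permission.EDIT_DATA,
    "grant_administrator": Permission.READ | Permission.EDIT_DATA | Permission.EDIT_DESIGN,
}


def is_allowed(assignments: dict[str, str], user: str, needed: Permission) -> bool:
    """True if the user's role grants `needed`; unknown users or roles get nothing."""
    granted = ROLES.get(assignments.get(user, ""), Permission(0))
    return needed in granted
```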


Ease-of-use issues are challenging to address without direct feedback from grant administrators. As described above, it is essential to perform usability experiments within the Foundation to investigate this question.


Aim 4: We will provide MJFF with a representation of the accomplishments of this project, including a clear estimation of the feasibility of constructing a production-level KEfED-based system for working on their complete research portfolio. The implementation developed in this project will certainly be incomplete but should provide a clear indicator of the difficulty of the task and the suitability of our solution.


The project as it stands at present is a prototype system that demonstrates the feasibility of using KEfED as an underlying approach for constructing a knowledge management system for MJFF’s grants. The development work performed within the group has provided such a tool, but with caveats:


1) The system is currently implemented as part of the Yogo framework (from MSU), using the core KEfED-editor system (under development within ISI) and making use of the OBI ontology. Constructing the system on a single platform within a community with additional resources (and also other end-users and use-cases) is vital to the process of converting this demonstration prototype into a functional operational system.

2) The development task of building a KEfED-enabled database is tractable.

3) Scaling up such a system is primarily complicated by the possible complexity of dealing with a large number of different experiments and experimental types. Our experience of working with the preliminary modeling approach via the interface suggests that (a) constructing such models based on the experience of grant officers (which we assume involves a deeper understanding of the underlying science than our team members have) should be straightforward, but (b) it is not entirely clear that the coordination required to link terms together across these many models would be straightforward (this is the perennial challenge of developing shared knowledge representations for reuse). Working with only five separate, closely related experimental designs in this project is simply insufficient to inform us fully of the challenges that working with several hundred such designs would pose. These questions remain an important next step in the development of the KEfED approach in conjunction with community-driven ontology development.



3. Discuss your final conclusions.


We feel that this project demonstrated the feasibility of this approach by combining existing systems into a working application. Since these systems originated under different design goals and implementations, the major challenges of this work concerned coordinating the team and bringing together these technical components. The system itself is deployable and functional, and could be used beyond the scope of the existing project to manage experimental designs in its own right. The representations of the example experiments are preliminary and would need to be validated and improved through work with expert grant managers.



4. Identify unresolved issues and/or next steps for the project going forward.


This work has brought to light several representational issues whose resolution will improve the formulation:


1) Extensions of the representation of the experimental protocol to include loops and their indexing variables (such as repeating an experimental assay at different times).

2) As mentioned above, versioning an experimental design.

3) Managing the relationships of parameters within data transformation steps is more complex than managing those in the experimental workflows that give rise to measurements. To address this question effectively, we will use the same approaches as those developed for computational workflows. There may be a substantial overlap between the two technologies, which we will actively investigate and exploit.

4) The relationship between data types and measurement scales is an interesting issue that may make the way we model data in the system more modular and tractable.
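Point 3 borrows from computational workflows: each transformation step records its own parameters, so a derived value (and every stored intermediate) carries the accumulated parameters of all the steps that produced it. A hedged Python sketch, with hypothetical arithmetic steps standing in for, say, the stages of MRI data processing:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    name: str
    func: Callable[..., Any]
    params: dict[str, Any] = field(default_factory=dict)


def run_pipeline(value: Any, steps: list[Step]) -> tuple[Any, list[tuple[str, dict[str, Any], Any]]]:
    """Apply steps in order, recording each intermediate value together with
    the step-qualified parameters accumulated so far (its provenance)."""
    effective: dict[str, Any] = {}
    intermediates = []
    for step in steps:
        value = step.func(value, **step.params)
        effective.update({f"{step.name}.{k}": v for k, v in step.params.items()})
        intermediates.append((step.name, dict(effective), value))
    return value, intermediates
```

Running `[Step("scale", lambda x, factor: x * factor, {"factor": 2}), Step("offset", lambda x, shift: x + shift, {"shift": 1})]` on 3 returns 7, with the final intermediate indexed by both `scale.factor` and `offset.shift`.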


The MSU group has other commitments that prevent them from participating
directly in the second year of funding (but see the
section below on ‘Collaboration’).


We are excited, though, to bring the system directly into the activities of the Biomedical Informatics Research Network (BIRN, http://www.birncommunity.org/), where Dr Gully Burns chairs the Knowledge Engineering Working Group. This is a large consortium effort of experts in Data Management, Information Integration, Knowledge Engineering, Workflows, Genetics, Security and Operations. The environment is ideal for taking the preliminary results of this project and converting them into a fully-fledged working system. In addition, work in the BioScholar project (in which the KEfED Editor originated) has progressed and will serve as a suitable launching platform for future work.


Importantly, BIRN itself has a dedicated team of senior scientists (headed by Dr Karl Helmer) who are responsible for outreach activities. We anticipate expanding the scope of this work to other disease foundations (and possibly to other funding organizations and consortia). BIRN will provide direct support for this activity and has agreed to pursue such matters going forward (Dr Joe Ames, Dr Karl Helmer).


Future work must include direct and active participation by end-users; we will perform ongoing usability and performance evaluations of the system.


Additionally, we will write up this year’s work as a scientific publication based on non-confidential data from outside the Foundation.


OUTCOME MEASUREMENT

1. PRESENTATIONS AND PUBLICATIONS


Report (and include copies of) any presentations, abstracts, findings, or papers (submitted for publication, in press, or published) that resulted from the funds associated with this award.


Dr Gully Burns presented this project at the International Biocuration meeting in 2009, predating the contract but nonetheless attracting a very positive response.


Dr Gwen Jacobs presented Crux as a poster at the Society for Neuroscience annual meeting in 2009.



2. ADDITIONAL FUNDING


(a) Indicate funding you have received from other sources for work based on this MJFF funded award. Please include the name of the funder (e.g. NIH) and the work funded (title, $ amount).

(b) Please also indicate if you intend to seek additional funds for work related to this MJFF grant.


Dr Gwen Jacobs’ Yogo project is an R01 that is due for renewal this year. She will be directly leveraging results from this year’s work in securing future funding from NIH. In addition, Drs Burns and Ruttenberg will continue to serve as consultants on that project.


Gully Burns’ BioScholar project is also an R01, now in its third year. Since BioScholar is based on applying KEfED modeling to the literature, there is enormous scope to include a data-oriented approach under the same technology.






3. RESOURCE, PATENTS, AND LICENSES


Describe any resources (including but not limited to resources such as cell lines and mouse models) that resulted from your work as well as any patents and/or licenses that resulted from discovery on this project.


None.






4. COLLABORATION


Will any collaboration result from this work? (Note: Collaboration can include work with
pharma and biotech as well as academic or nonprofit collaborations.)


This project has inherently involved a collaborative effort between three groups, leading to many technical challenges that we have overcome. As described above, a direct collaboration will be the continued work by all three partners within the next phase of the Yogo project.


Developing the project within BIRN will expose the system not only to a range of bioinformatics technology developers but also to their users, including the Nonhuman Primate Centers (among them the center in Wisconsin where Dr Marina Emborg works). We anticipate pursuing this connection if it is in the interests of the Foundation.


There was significant interest in this work from Mike Weiner of the ADNI group. Unfortunately, the system in its present form was too preliminary, since he needed immediate access to a fully working, production-level system. Nonetheless, Dr Weiner remains a very interesting possible collaborator in the future.


The OBI consortium includes a large number of different members (including Susanne Sansone, the originator of the ISA framework). We will pursue possible collaboration with her and with Dr Tim Clark, the originator of the SWAN argumentation ontology, the main developer of the PD Research Online web community and the head of the W3C HCLS Scientific Discourse group. As a direct result of presenting the KEfED methodology at the Beyond the PDF meeting in January 2011, the KEfED approach may be used within Dr Clark’s framework as the basis for structured nanopublications.










I certify that the statements herein are true, complete and accurate to the best of my knowledge.


PRINCIPAL INVESTIGATOR:








DATE: