wwPDB Common D&A Project January 28, 2010


Worldwide Protein Data Bank

www.wwpdb.org

wwPDB Common D&A Project

January 28, 2010


Steering Committee

Project Update

Worldwide Protein Data Bank

Common D&A Project January 2010 Update

Update report


Status of D&A initial production deliverable:


Sequence Editor tool development


Integration within existing pipelines


Status of WF infrastructure initial implementation:


Sequence Processing components (external search, internal
analysis, etc.) integrated by the WF engine and manager into the “new”
Sequence Processing Module.


Integration of the Sequence Processing Module into the existing pipeline:
RECONSIDER timeline estimate and strategy


Next Phase


Ligand Processing: Planning




Overview of deliverable status for:

Sequence Editor tool


Deliverable timelines have been extended to enable a full
response to user testing input (expanded requirements)
and to ensure that development follows the agreed-upon design.


Completion of interface with additional prioritized
requirements: projected Feb 15


Integration within current production pipelines


Initial implementation of Master Format and format conversion
support


In use by annotators by Feb 25





Sequence Editor Tool

Technologies and Standards


Model-View-Controller (MVC) design


Separates data and application logic from presentation as much as
possible


Client/server protocol


AJAX with JSON-encoded messages


REST-style service definitions


Server


Apache with embedded WSGI (mod_wsgi)


Application


Python with C++ extensions (Boost.Python)

All the good acronyms!
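To make the stack above concrete, here is a minimal sketch of a WSGI callable of the kind mod_wsgi hosts (mod_wsgi looks for a callable named `application` by default), answering an AJAX GET with JSON and keeping model, view and controller in separate functions. The route `/service/sequence/<id>`, the in-memory `SEQUENCES` store and the helper names are hypothetical illustrations, not the tool's actual service definitions.

```python
import json

# Model (hypothetical): an in-memory stand-in for the sequence data store.
SEQUENCES = {
    "1abc": {"chain": "A", "sequence": "MKTAYIAKQR"},
}


def get_sequence(entry_id):
    """Model access: return the stored record, or None if the entry is unknown."""
    return SEQUENCES.get(entry_id)


def render_json(start_response, status, payload):
    """View: serialize the payload as JSON for the AJAX client."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def application(environ, start_response):
    """Controller: map the REST-style path GET /service/sequence/<id> onto the model."""
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        return render_json(start_response, "405 Method Not Allowed",
                           {"error": "only GET is supported"})
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    if len(parts) == 3 and parts[:2] == ["service", "sequence"]:
        record = get_sequence(parts[2])
        if record is not None:
            return render_json(start_response, "200 OK", record)
    return render_json(start_response, "404 Not Found", {"error": "not found"})
```

For local testing without Apache, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves the same callable.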


Sequence Editor Tool

Architecture for Current and Future Deployment

[Figure: current and future workflow diagrams. Components: Sequence Data Store, DP Pipeline, WFE/WFM, Sequence Editor Tool, Annotated Sequence Data; file formats: PDB/FASTA, PDBx/PreBlast, PDB/PDBx.]


Accomplishments


Annotator graphical interface for Sequence Editing


Prototype evaluation and prioritization of additional requirements by
annotators at all sites completed Jan 12


Expanded functionality expected to be completed and
available for user testing Feb 15, including:


Incremental undo of individual processing steps (UNDO)


Summarization of sequence conflicts


Global editing features


Integration of this Sequence Editor tool (interface) into the
existing data processing pipelines (Feb 26)


Input accepts existing sequence data files at PDBe and RCSB (e.g., PDBx
+ BLAST report or PDB + FASTA)


Output returned as an intermediate file to be integrated via MAXIT
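The slides do not say how UNDO is implemented. One simple approach, shown here as a sketch with hypothetical names (`SequenceEditSession`, `apply`, `replace_all`, `undo`), is a snapshot stack: every edit, including a global edit, pushes the previous sequence so that `undo()` can restore earlier states one step at a time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceEditSession:
    """Hypothetical editing session: the current sequence plus a stack of earlier states."""
    sequence: str
    history: List[str] = field(default_factory=list, repr=False)

    def apply(self, position: int, residue: str) -> None:
        """Replace the residue at a 0-based position, saving the prior state first."""
        if not 0 <= position < len(self.sequence):
            raise IndexError("position out of range")
        self.history.append(self.sequence)
        self.sequence = self.sequence[:position] + residue + self.sequence[position + 1:]

    def replace_all(self, old: str, new: str) -> None:
        """Global edit: replace every occurrence of `old` as a single undoable step."""
        self.history.append(self.sequence)
        self.sequence = self.sequence.replace(old, new)

    def undo(self) -> bool:
        """Restore the previous state; return False when there is nothing to undo."""
        if not self.history:
            return False
        self.sequence = self.history.pop()
        return True
```

Full snapshots are cheap at typical polymer sequence lengths; a command-style design that stores inverse operations would be the alternative if the edited state grew large.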





Accomplishments



Master Format implementation (for the current data model)


PDB to Master Format translation working with MAXIT


Final test at PDBe


Validation and testing at all sites.


PDBj has created a new tool for Master Format validation with
extended diagnostics.


Issues with the Master Format will be ongoing, with evolution of the
PDB format, hybrid methods, etc.
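The deck does not describe the Master Format syntax or the diagnostics in PDBj's validation tool. Assuming a PDBx/mmCIF-style layout of single-line `_category.item value` entries, a validator that reports line-numbered diagnostics might look like the sketch below; `REQUIRED_ITEMS`, the function names and the message wording are hypothetical, and `loop_` tables and multi-line values are deliberately out of scope (they are reported as unrecognized).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mandatory items; the real Master Format rules are not given in the slides.
REQUIRED_ITEMS = ["_entry.id", "_entity_poly.pdbx_seq_one_letter_code"]

# One single-line item: "_category.item value" (loop_ tables and multi-line values not handled).
ITEM_LINE = re.compile(r"^(_\w+\.[\w\-]+)\s+(.+)$")


def parse_items(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Collect single-line items and return them with line-numbered diagnostics."""
    items: Dict[str, str] = {}
    diagnostics: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue  # skip blank lines, comments and the data block header
        match = ITEM_LINE.match(line)
        if match is None:
            diagnostics.append(f"line {lineno}: unrecognized syntax: {line!r}")
            continue
        name, value = match.groups()
        if name in items:
            diagnostics.append(f"line {lineno}: duplicate item {name}")
        items[name] = value.strip("'\"")  # drop surrounding quotes from the value
    return items, diagnostics


def validate(text: str) -> List[str]:
    """Return all diagnostics, including any missing required items."""
    items, diagnostics = parse_items(text)
    for name in REQUIRED_ITEMS:
        if name not in items:
            diagnostics.append(f"missing required item {name}")
    return diagnostics
```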





Sequence Editor Tool Development

Lessons Learned


Iterative development and active annotator involvement
are essential, and take time.


Addressing integration issues with existing systems in
terms of modularity, process ordering and data
availability poses significant challenges.


An agile development and planning process supports
adaptation to evolving requirements.


In future planning, we will need to consider further the most
efficient level of granularity for deploying new functionality
into existing systems.



Design Convergence Accomplishments

Master Format, API, WFM, WFE, UI

Distributed development on a complex project is challenging

Tag-team development of WFE and APIs


Straw-man articulation to flesh out WFE/API requirements
for representative use cases


WFE pseudocode developed against straw men.


API integration layer will be developed against this
pseudocode.


WFE will then be implemented against the API
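The sequence described above (straw men, then WFE pseudocode, then an API integration layer, then the WFE itself) can be sketched by writing the engine against an abstract interface and exercising it with a straw-man stub before the real API exists. `DataApplicationApi`, `run_workflow`, `StubApi` and the step names are hypothetical.

```python
from typing import Dict, List, Protocol, Tuple


class DataApplicationApi(Protocol):
    """Hypothetical API surface that the WFE is written against."""

    def run_application(self, name: str, inputs: Dict[str, str]) -> Dict[str, str]: ...

    def set_status(self, dep_id: str, step: str, status: str) -> None: ...


def run_workflow(api: DataApplicationApi, dep_id: str, steps: List[str]) -> Dict[str, str]:
    """Run steps in order, passing accumulated outputs forward and recording status."""
    data: Dict[str, str] = {"dep_id": dep_id}
    for step in steps:
        api.set_status(dep_id, step, "running")
        try:
            data.update(api.run_application(step, data))
        except Exception:
            api.set_status(dep_id, step, "failed")
            raise
        api.set_status(dep_id, step, "finished")
    return data


class StubApi:
    """Straw-man implementation used to exercise the WFE before the real API exists."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str, str]] = []

    def run_application(self, name: str, inputs: Dict[str, str]) -> Dict[str, str]:
        return {name: f"{name} output for {inputs['dep_id']}"}

    def set_status(self, dep_id: str, step: str, status: str) -> None:
        self.log.append((dep_id, step, status))
```

Because the engine only depends on the interface, the stub can later be swapped for the real integration layer without touching `run_workflow`.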


Accomplishments: WF infrastructure (integration of Sequence Processing)



Tracking and Status DB developed and installed at
RCSB and PDBe for development purposes.


Work Flow Manager (WFM)


Prototype user testing ongoing


Requirements refined and prototype updated


Infrastructure complete, to be deployed for testing this week


Work Flow Manager User Interface (WFM UI)


User prototype created, input received and prototype enhanced


Initial Level 1 annotator interface signed off by annotators


Level 2/3/4 interfaces prototyped and under review


Level 3/4 under further development
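A minimal sketch of a tracking-and-status store, using Python's built-in `sqlite3` as a stand-in for the project's MySQL server; the `process_status` table, its columns and the status strings are assumptions, not the deployed schema.

```python
import sqlite3
from typing import Optional


def open_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical status table (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS process_status (
               dep_id  TEXT NOT NULL,
               step    TEXT NOT NULL,
               status  TEXT NOT NULL,
               updated TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               PRIMARY KEY (dep_id, step))"""
    )
    return conn


def set_status(conn: sqlite3.Connection, dep_id: str, step: str, status: str) -> None:
    """Insert or overwrite the status of one processing step for a deposition."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO process_status (dep_id, step, status) VALUES (?, ?, ?)",
            (dep_id, step, status),
        )


def get_status(conn: sqlite3.Connection, dep_id: str, step: str) -> Optional[str]:
    """Return the current status of a step, or None if it has never been recorded."""
    row = conn.execute(
        "SELECT status FROM process_status WHERE dep_id = ? AND step = ?", (dep_id, step)
    ).fetchone()
    return row[0] if row else None
```

On MySQL the upsert would instead be written with `INSERT ... ON DUPLICATE KEY UPDATE`.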




PDBe resource



Workflow XML


Luana/Tom: 1 day total to complete annotator requirements


WFE component supporting Sequence Processing:


Tom, 1-2 days per week ongoing; estimating 5-6 days (3 actual
weeks) to complete once all APIs are in place


WFM


Luana: currently full time; work is being prioritized to define the
subset of requirements to be delivered in March.


Web resources: interfaces and WFM


External services: technology requirements have been defined.
Timeline TBD; on the critical path.


Other resources


Wim: Python expertise


Swanand: Python expertise (after 13th Feb), fall-back
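The Workflow XML item in the PDBe list refers to a format not shown in these slides. Purely as an illustration, with an invented schema (`<workflow>`, `<task>` and the `after` and `interactive` attributes are all hypothetical), a definition could be parsed with the standard-library `xml.etree.ElementTree` and ordered by its dependencies:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical workflow definition; the project's actual Workflow XML schema is not shown.
EXAMPLE_WORKFLOW = """
<workflow name="SequenceProcessing">
  <task id="search" application="external_search"/>
  <task id="analysis" application="internal_analysis" after="search"/>
  <task id="edit" application="sequence_editor" after="analysis" interactive="true"/>
</workflow>
"""


def load_tasks(xml_text: str) -> List[Tuple[str, str, bool]]:
    """Return (task id, application, interactive flag) in an order that respects `after`."""
    root = ET.fromstring(xml_text.strip())
    tasks = {t.get("id"): t for t in root.iter("task")}
    ordered: List[Tuple[str, str, bool]] = []
    done = set()
    while len(done) < len(tasks):
        progressed = False
        for task_id, task in tasks.items():
            dep = task.get("after")
            if task_id not in done and (dep is None or dep in done):
                ordered.append((task_id, task.get("application"),
                                task.get("interactive") == "true"))
                done.add(task_id)
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unresolved 'after' dependency")
    return ordered
```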




RCSB Resources




Web Tools


Currently supporting development and alpha-testing sites


Will add a production site for the Feb deployment


Database Support




MySQL database server for status and tracking database


Application Support


Project SVN code repository


JIRA issue tracking system


Project documentation and information site (Drupal)


Automated build system for API and application tools


People



Vladimir


API and build system (Python/C++)


Li


DB system and status/tracking API (Python/SQL)


Rahip


Sequence Editor Tool (JavaScript/CSS)


Zukang/Raul/John


DP applications (C++/Python)









Updated Timeline Summary

Sequence Processing

1. Sequence Editor Tool


Completion of interface with prioritized additional requirements
and beginning of final user testing: projected Feb 15


Integration with current pipelines using Master Format: in test
by annotators by Feb 25


In production: best estimate early March

2. Integration of Sequence Processing components with
new architecture (WFE/API and WFM)


User testing: April

3. Integration of module into pipeline


Plan by end of March



Competing/Complementary Priorities


Address ongoing data quality issues and remediation


Three validation task forces


Implementation of recommendations


New PDB Format: within the next 6 months?


De-programming Kim for Ligand Processing: timeline end of March /
early April



Other strategic considerations


Stakeholders


Stress testing of new solutions against expectations and
existing solutions must be managed and will take some time.



Next Phase: Timeline

Ligand Processing


Requirements


Plans in place for annotator exchange


March: requirements consolidation, initial design plan


March: create overview plan and initial timeline


Kick off development


Deployment


Strategy to be defined based on current and ongoing lessons
learned.



Things that have kept us up at night


These are cornerstone deliverables requiring intense
study and design consideration beyond the proof of
concept.


Organization of data, communication protocols, etc.


Reaching clear consensus on design features has required an evolution of
understanding, which in turn has required getting our hands wet


Ramp-up of skill sets: Python, mmCIF (PDBe)


EBI external services: web-service setup


Site-specific integration challenges


Resource issues





BACKUP SLIDES


Data and Application API Design


Unified Python language implementation


Provides all access to data and applications for the
workflow manager and workflow engine


Subcomponents of the API provide access to:


Data objects and data values


Applications and tools


Tracking and status information


Site-level configuration information
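One way to realize a unified Python API with these four subcomponents is a single entry-point object exposing one attribute per subcomponent. The sketch below is hypothetical throughout: `Api`, `DataAccess`, `ApplicationRunner`, `StatusTracker`, `SiteConfig` and their methods are illustrative names backed by in-memory dictionaries, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class DataAccess:
    """Data objects and data values, keyed by object name, then item name."""
    store: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def get_value(self, obj: str, item: str) -> Any:
        return self.store[obj][item]

    def set_value(self, obj: str, item: str, value: Any) -> None:
        self.store.setdefault(obj, {})[item] = value


@dataclass
class ApplicationRunner:
    """Applications and tools, registered here as plain Python callables."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def run(self, name: str, **kwargs: Any) -> Any:
        return self.tools[name](**kwargs)


@dataclass
class StatusTracker:
    """Tracking and status information, one state string per deposition."""
    status: Dict[str, str] = field(default_factory=dict)

    def set(self, dep_id: str, state: str) -> None:
        self.status[dep_id] = state


@dataclass
class SiteConfig:
    """Site-level configuration values (hypothetical keys)."""
    values: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str, default: str = "") -> str:
        return self.values.get(key, default)


@dataclass
class Api:
    """Single entry point shared by the workflow manager and the workflow engine."""
    data: DataAccess = field(default_factory=DataAccess)
    apps: ApplicationRunner = field(default_factory=ApplicationRunner)
    tracking: StatusTracker = field(default_factory=StatusTracker)
    config: SiteConfig = field(default_factory=SiteConfig)
```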



Deliverable update: WFM Design


Functional architectural design


Will present progress and tracking information


Will start, stop and restart the workflow engine when executing data
processing tasks


Will work in a fully distributed, web-based mode


Will provide a launch point for tasks requiring interactive or
graphical input. Two modes are defined:


Immediate mode: all processing occurs in a single session
(simple case).


Deferred mode: requests for input are registered with the
workflow manager for later processing by an annotator.
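The two modes can be sketched as follows, assuming a hypothetical `WorkflowManager` that either calls an input handler at once (immediate mode) or registers an `InputRequest` on a queue for an annotator to pick up later (deferred mode). All names here are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class InputRequest:
    """A request for annotator input on one task of one deposition (hypothetical)."""
    dep_id: str
    task: str


Handler = Callable[[InputRequest], str]


class WorkflowManager:
    """Hypothetical WFM: handles interactive tasks now or queues them for later."""

    def __init__(self) -> None:
        self.pending: Deque[InputRequest] = deque()
        self.completed: List[str] = []

    def request_input(self, dep_id: str, task: str, handler: Handler,
                      immediate: bool) -> None:
        request = InputRequest(dep_id, task)
        if immediate:
            # Immediate mode: all processing happens within the current session.
            self.completed.append(handler(request))
        else:
            # Deferred mode: register the request for later processing by an annotator.
            self.pending.append(request)

    def process_next(self, handler: Handler) -> bool:
        """Handle the oldest deferred request; return False if none is waiting."""
        if not self.pending:
            return False
        self.completed.append(handler(self.pending.popleft()))
        return True
```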




Process Overview with GO BACK functionality