ConceptWiki for Open PHACTS


13 Nov 2013



C. Chichester, K. Burger, D. van Enckevort, F. de Bruijn, H. Mei, R. Hooft

V0.2 from 29-10-2012

Table of Contents

INTRODUCTION
Overview of ConceptWiki
Overview of ConceptWiki applications in Open PHACTS
Overview of ConceptWiki architecture (diagram of components)
IMPLEMENTATION
Brief overview of deployment of ConceptWiki components
Specifications for minimal deployment
Specifications for redundant deployment
Technical dependencies
Relational Database Servers
Graph Database Servers
SOLR Servers
Web Servers
Deployment Dependencies Servers
Deployment Dependencies Web Service and Front End
ConceptWiki code
Unit tests
UUID generation
Accessing the ConceptWiki data
API methods
Curation Interface
Importing Data
Unified Medical Language System (UMLS) import
SwissProt Import
ChemSpider Import
PubMed Import
LICENSES
Software
Data


INTRODUCTION

Overview of ConceptWiki

The challenge is to create a system for Open PHACTS that manages the heterogeneity of vocabularies and provides interoperability using open and extensible standards. The ultimate goal is to create a sustainable future for a large-scale, community-editable store of disambiguated scientific concepts, exemplifying a new paradigm in life sciences data accumulation. The ConceptWiki is an open access system that accepts essentially unlimited numbers of synonyms, in multiple languages, and maps all the terms correctly back to one unique concept identifier, alleviating problems caused by differences in vocabularies and identifiers. The ConceptWiki distinguishes between 'authority' and 'community' data and permits general editing only on the community branch of the data. This distinction is the key innovation that convinces authorities that it is prudent to donate and integrate their data into the system. Additionally, comparing the authority branches with the community branches allows users to make their own value judgments about the displayed data. Presently the ConceptWiki contains the biomedical terminology of the Unified Medical Language System (UMLS), mapped where appropriate to the protein terminology from SwissProt, the chemical terminology from ChemSpider for biologically relevant chemical molecules, and comprehensive names for the ChEMBL target classification. Each concept in the ConceptWiki is annotated with one or more semantic types and basic information such as a definition.

Overview of ConceptWiki applications in Open PHACTS

The ConceptWiki plays several roles in the Open PHACTS architecture. The technical aspects of these roles are described later in this document; the purpose here is to give an overview of the services that the ConceptWiki supplies to the Open PHACTS project.

1. Identity Resolution Service. The ConceptWiki provides the channel through which scientist users start searching the Open PHACTS triple store using textual input.

a. Auto-complete on search terms. The user begins typing into the search box, and once three characters have been typed, a search of the ConceptWiki concepts is invoked. The search box shows the concepts that correspond to the text being entered and allows the user to select the concept of preference.

b. Text to identifier. Once the user has selected a concept via a textual search term, the ConceptWiki returns the identifier (UUID) corresponding to the selected concept, which is necessary for querying other data in the platform.

2. Mapping supplier. Data exported from the ConceptWiki are delivered to the IMS as linksets for mapping between data sets contained within the core platform. Mappings from concepts to the URLs of several data sets are also used by exemplar applications.

3. Curation interface. The ConceptWiki has a curation interface that allows users to change existing terms or add new terms to the concepts present in the system.
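The auto-complete behaviour in 1a can be sketched as follows. This is a minimal, hypothetical illustration: Candidate, AutoComplete and the in-memory candidate list are not ConceptWiki classes, and a simple case-insensitive prefix match stands in for the real SOLR-backed search. It only shows the three-character threshold and how a selected suggestion yields the UUID used in step 1b.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical stand-in for one entry of the ConceptWiki search results.
final class Candidate {
    final String uuid;
    final String label;

    Candidate(String uuid, String label) {
        this.uuid = uuid;
        this.label = label;
    }
}

final class AutoComplete {
    // The search is only invoked once three characters have been typed.
    static final int MIN_CHARS = 3;

    static boolean shouldSearch(String typed) {
        return typed != null && typed.trim().length() >= MIN_CHARS;
    }

    // Returns the candidates whose label starts with the typed text (case-insensitive).
    // The in-memory list replaces the real search index for illustration only.
    static List<Candidate> suggest(String typed, List<Candidate> index) {
        if (!shouldSearch(typed)) {
            return List.of();
        }
        String prefix = typed.trim().toLowerCase(Locale.ROOT);
        return index.stream()
                .filter(c -> c.label.toLowerCase(Locale.ROOT).startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

When the user picks one of the returned suggestions, its uuid field is the identifier that would be passed on to query other data in the platform.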

Overview of ConceptWiki architecture (diagram of components)

The ConceptWiki has been implemented as a set of modules. Each module could, in principle, be replaced by a different component with similar features.

The modules are shown and described below.



1. Backend server modules:

a. Storage service. The storage layer, written in Java, is responsible for creating, retrieving, updating and deleting the domain objects (concepts, labels, definitions, etc.). It runs consistency checks to determine whether a UUID is already present, and it keeps the edit history of each domain object in a MySQL database; the current deployment of ConceptWiki uses MySQL version 5.1. The storage layer is used by the service layer as a component.

i. Neo4j. The Neo4j graph database, version 1.7, was downloaded from www.neo4j.org and is available under the GPL v3.0 license. Neo4j is a NoSQL graph database that interacts with the storage layer via Java.

ii. MySQL. The MySQL database stores the edit history.

iii. SOLR. The SOLR index contains all concepts.

b. Concept Service. The service layer, written in Java, validates incoming requests by checking the format of the input data.

c. Remote Method Invocation (RMI) Service. The service layer communicates with the front end and the web services over remote method invocation (RMI). The storage layer is embedded in the service layer as a dependency.

2. Front end server modules:

a. Web Application. The front end is a Java web application that uses Wicket for its components. It is the curation interface and the entry point for individual scientists.

b. Web services. The web services are a Java web application that exposes REST operations to search, retrieve, create and update concepts. The resulting data are in JSON format.
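The division of work between the storage layer (create/retrieve/update/delete, duplicate-UUID check, edit history) and the service layer (format validation) can be sketched in memory. This is a hypothetical illustration, not the ConceptWiki code: ConceptStore and its methods are invented names, a HashMap stands in for Neo4j, and a list of strings stands in for the edit history kept in the relational database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical in-memory sketch of the storage/service responsibilities described above.
// The real modules use Neo4j for the concepts and a relational database for the edit history.
final class ConceptStore {
    private static final Pattern UUID_FORMAT = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private final Map<String, String> labels = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    // Service-layer style format check on an incoming identifier.
    static String validateUuid(String raw) {
        if (raw == null || !UUID_FORMAT.matcher(raw).matches()) {
            throw new IllegalArgumentException("not a well-formed UUID: " + raw);
        }
        return raw.toLowerCase(Locale.ROOT);
    }

    // Create, with the consistency check that the UUID is not already present.
    void create(String uuid, String label) {
        String id = validateUuid(uuid);
        if (labels.containsKey(id)) {
            throw new IllegalStateException("UUID already present: " + id);
        }
        labels.put(id, label);
        history.add("CREATE " + id + " " + label);
    }

    Optional<String> retrieve(String uuid) {
        return Optional.ofNullable(labels.get(validateUuid(uuid)));
    }

    void update(String uuid, String label) {
        String id = validateUuid(uuid);
        if (!labels.containsKey(id)) {
            throw new IllegalStateException("unknown UUID: " + id);
        }
        labels.put(id, label);
        history.add("UPDATE " + id + " " + label);
    }

    void delete(String uuid) {
        String id = validateUuid(uuid);
        if (labels.remove(id) != null) {
            history.add("DELETE " + id);
        }
    }

    // Edit history, oldest entry first.
    List<String> history() {
        return Collections.unmodifiableList(new ArrayList<>(history));
    }
}
```

A rejected duplicate create leaves both the concept and the history unchanged, which mirrors the idea that the consistency check runs before anything is written.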

IMPLEMENTATION

This section covers the deployment, the API, and the loading of data in the
ConceptWiki.

Brief overview of deployment of ConceptWiki components

The main modules of the ConceptWiki (Concept Service, Web Application, and Web Services) are packaged in separate Java WAR files. Each module is deployed in a servlet container that runs the Java servlets. The current deployment of ConceptWiki uses Apache Tomcat version 7.0.30, although other servlet containers could be used.

The ConceptWiki also uses a search index to perform fast lookups. The storage layer interacts with the index to serve all search requests. The current ConceptWiki deployment uses Apache Solr, version 3.5.0.

The code for deployment of the ConceptWiki can be found here:
https://trac.nbic.nl/svn/conceptwiki/trunk/code/conceptwiki

The specifications for deployment on one or more Linux servers are detailed below. Two scenarios are provided: the minimal deployment on a single server and the fully redundant deployment on multiple servers. The similarities between the two types of deployment are discussed. A list of dependencies and their versions is given in the 'Technical dependencies' section.

Specifications for minimal deployment

The minimal deployment consists of a single machine that hosts all components. This machine needs to meet all the dependencies listed in the 'Technical dependencies' section.


The minimal deployment runs on an embedded Neo4j graph database and an embedded relational database. The different components are all deployed in Apache Tomcat using the following structure:

/opt/apache-tomcat-7.0.32    Shared codebase for Apache Tomcat
/opt/conceptwiki             ConceptWiki-specific codebase and data
/opt/solr                    SOLR-specific codebase and data

SOLR and ConceptWiki each run as their own non-privileged system user, whose home directory is the respective subdirectory of /opt. Each home directory contains a tomcat directory that holds the non-shareable parts of Apache Tomcat.

For ConceptWiki there is a separate data directory that holds the Neo4j graph database and the H2 relational database files. Their locations are configured in service.properties in the service component, with the neo4j.storeDir and jdbc.host properties.

For SOLR, the SOLR home directory specified in the Context holds the data files. It is defined in /opt/solr/tomcat/conf/Catalina/localhost/solr.xml with the following line:

<Environment name="solr/home" type="java.lang.String" value="/opt/solr/solrhome" />

Configuration of the applications

The three applications that make up ConceptWiki are deployed as 'service' (service-impl.war), 'ws' (web-ws.war) and 'wiki' (web-gwt.war). For the deployment, the WAR files need to be extracted into a subdirectory of /opt/conceptwiki/tomcat/webapps, where the subdirectory is named after the context (service, ws, wiki). Each component is configured with a properties file that resides in the WEB-INF subdirectory.

The service implementation needs to expose the conceptService through RMI to the web service and web application. It also must access the SOLR index, the relational database and the Neo4j graph database. These are configured through the service.properties file.

# setup RMI properties
rmi.service=conceptService
rmi.host=localhost
rmi.port=10999

# jdbc properties
jdbc.driver=org.h2.Driver
jdbc.host=jdbc:h2:file:target/h2-db/db
jdbc.user=sa
jdbc.pass=

# Location of the neo4j datastore
neo4j.storeDir=workshop/neo4j_workshopdata

# Solr properties
solr.url=http://localhost:10080/solr
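Because a missing key in service.properties tends to surface only at startup, it can help to check the file before deployment. The sketch below uses java.util.Properties to load the file contents and reports which of the keys shown above are absent or blank. ServiceConfig and the REQUIRED list are hypothetical helpers, not part of ConceptWiki; jdbc.pass is deliberately left out because it may be empty, as in the H2 example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

final class ServiceConfig {
    // Keys taken from the service.properties example above (jdbc.pass may be empty).
    static final List<String> REQUIRED = List.of(
            "rmi.service", "rmi.host", "rmi.port",
            "jdbc.driver", "jdbc.host", "jdbc.user",
            "neo4j.storeDir", "solr.url");

    // Parses properties-file text; reading from a String cannot really fail.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the required keys that are absent or blank.
    static List<String> missingKeys(Properties props) {
        return REQUIRED.stream()
                .filter(key -> props.getProperty(key) == null || props.getProperty(key).isBlank())
                .collect(Collectors.toList());
    }
}
```

In a real deployment the text would come from WEB-INF/service.properties of the extracted service application; how ConceptWiki itself loads that file is not described here.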

The web service and web application deployments need to refer to the conceptService exposed through RMI by the service implementation. The web service is configured through the ws.properties file and the web application through the web-gwt.properties file. Both files have the same contents:

# setup RMI properties

rmi.host=localhost

rmi.port=10999

rmi.service=conceptService
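The three rmi.* keys together identify the remote conceptService. The sketch below shows how they combine into a standard rmi://host:port/name URL and how a plain java.rmi client could look it up. RmiEndpoint is a hypothetical helper; ConceptWiki itself uses the Spring RMI client, so this is only an illustration of what the properties mean, and the 1099 fallback is simply the default RMI registry port.

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.util.Properties;

final class RmiEndpoint {
    // Builds rmi://host:port/service from the ws.properties / web-gwt.properties keys.
    static String url(Properties p) {
        String host = p.getProperty("rmi.host", "localhost");
        int port = Integer.parseInt(p.getProperty("rmi.port", "1099"));
        String service = p.getProperty("rmi.service");
        if (service == null || service.isEmpty()) {
            throw new IllegalArgumentException("rmi.service is required");
        }
        return "rmi://" + host + ":" + port + "/" + service;
    }

    // Plain java.rmi lookup of the exported service (requires a running service instance).
    static Remote lookup(Properties p) throws Exception {
        return Naming.lookup(url(p));
    }
}
```

With the values above, url() yields rmi://localhost:10999/conceptService. In the redundant setup, rmi.host would instead hold the service IP address of the load balancer.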

Specifications for redundant deployment

For the redundant setup of ConceptWiki, both the front-end servers and the back-end servers are involved.

Front-end Servers

The web service is stateless and can be deployed multiple times without problems. The web application, however, needs to preserve state in the session. You therefore need either to deploy it in a J2EE application server that supports session replication (e.g. JBoss, GlassFish) or to use sticky-session support in the load balancer, although the latter solution drops sessions when a node fails. With session replication, ConceptWiki supports both an Active-Active and an Active-Standby setup.

Backend Servers

The backend servers provide an RMI endpoint. The Spring RMI client does not natively support clustering, but does support retries and reconnecting to the RMI service. This means that a layer 3 load balancer is needed to provide redundancy (Active-Standby), or a layer 4 load balancer for a load-balancing setup (Active-Active). Since the ConceptWiki service is completely stateless, it can run on as many machines as necessary. The web service and web application should be configured with the service IP address of the load balancer. The ConceptWiki service needs to be configured for redundancy as described in the following section.
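The Active-Standby behaviour that the load balancer provides can be illustrated with a small, generic sketch: a request goes to the primary endpoint first and falls back to a standby when the primary fails. This is not ConceptWiki code and not how the Spring RMI client works internally; ActiveStandby and the endpoint names are hypothetical, and in the deployment described above this logic lives in the load balancer rather than in the client.

```java
import java.util.List;
import java.util.function.Function;

final class ActiveStandby {
    // Tries each endpoint in order (primary first, then standbys) and returns the first success.
    static <T> T call(List<String> endpoints, Function<String, T> request) {
        RuntimeException last = null;
        for (String endpoint : endpoints) {
            try {
                return request.apply(endpoint);
            } catch (RuntimeException e) {
                last = e; // endpoint unavailable: fall through to the next one
            }
        }
        throw new IllegalStateException("all endpoints failed", last);
    }
}
```

In an Active-Active setup the load balancer would instead spread requests over all healthy nodes, which is possible because the service keeps no state between calls.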

Configuring ConceptWiki service for redundancy

To create a fully redundant setup, all the components need to be redundant. This requires decoupling the databases (Neo4j, RDBMS, SOLR) from the ConceptWiki deployment. A redundant setup of these components falls outside the scope of this document.

The connections to the databases are configured in the service.properties file. For the JDBC driver you can use a regular JDBC connection string for the jdbc.host property. Make sure the jdbc.driver property points to the correct driver for your database and that this driver supports clustering or another HA setup. The driver needs to be made available on the Java class path. The SOLR index is configured with the solr.url property; as of this writing SOLR only supports fail-over, which requires an external load-balancing solution. The solr.url property should therefore contain the service URL configured on the load balancer. The Neo4j connection is configured in storage-neo4j-context.xml, where you will need to replace the graphDb bean with an implementation that supports HA. See the Neo4j HA setup documentation for more information.

Apache Tomcat setup

As specified above, the Apache Tomcat installation is split into a shared part and a deployment-specific part. This section outlines how to set up Tomcat and create a startup script that runs the service as the correct user.

cd /opt
curl -o apache-tomcat-7.0.32.tar.gz http://apache.mirror.versatel.nl/tomcat/tomcat-7/v7.0.32/bin/apache-tomcat-7.0.32.tar.gz
tar xvfz apache-tomcat-7.0.32.tar.gz

# Run with bash: the {bin,conf,...} brace expansion is not supported by plain sh
for SERVICE in conceptwiki solr
do
    useradd -r -b /opt -c "Apache Tomcat user for the $SERVICE service" $SERVICE
    mkdir -p /opt/$SERVICE/tomcat/{bin,conf,logs,temp,webapps,work}
    cp -a /opt/apache-tomcat-7.0.32/conf/* /opt/$SERVICE/tomcat/conf
    # chmod and insserv need the init script /opt/$SERVICE/tomcat/bin/$SERVICE,
    # which is created in the next step; re-run them once that script exists
    chmod +x /opt/$SERVICE/tomcat/bin/$SERVICE
    chown -R $SERVICE:$SERVICE /opt/$SERVICE
    ln -s /opt/$SERVICE/tomcat/bin/$SERVICE /etc/init.d/$SERVICE
    insserv /etc/init.d/$SERVICE
done

Create an init script at /opt/conceptwiki/tomcat/bin/conceptwiki:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          conceptwiki
# Required-Start:    $local_fs $network apache2 solr
# Required-Stop:     $local_fs $network apache2 solr
# Should-Start:      $named $time
# Should-Stop:       $named $time
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: ConceptWiki service
# Description:       ConceptWiki service
### END INIT INFO

export CATALINA_HOME=/opt/apache-tomcat-7.0.32
export CATALINA_BASE=/opt/conceptwiki/tomcat
export CATALINA_USER=conceptwiki

cd "/opt/${CATALINA_USER}"
su -p "${CATALINA_USER}" "${CATALINA_HOME}/bin/catalina.sh" "$@"

Create an init script at /opt/solr/tomcat/bin/solr:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          solr
# Required-Start:    $local_fs $network
# Required-Stop:     $local_fs $network
# Should-Start:      $named $time
# Should-Stop:       $named $time
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Apache SOLR service
# Description:       Apache SOLR service
### END INIT INFO

export CATALINA_HOME=/opt/apache-tomcat-7.0.32
export CATALINA_BASE=/opt/solr/tomcat
export CATALINA_USER=solr

cd "/opt/${CATALINA_USER}"
su -p "${CATALINA_USER}" "${CATALINA_HOME}/bin/catalina.sh" "$@"

Modify /opt/solr/tomcat/conf/server.xml and change the listening ports so they do not conflict with the ConceptWiki Tomcat instance. The port used here should also be used for the solr.url in the service properties of ConceptWiki.

Apache HTTPD reverse proxy setup

The ConceptWiki web service and web application are hosted behind an Apache HTTPD reverse proxy. This allows the Apache Tomcat service to run as a non-privileged user and provides flexibility in creating a fully redundant setup.

# Required modules for proxy support
LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
# mod_rewrite is needed for the RewriteEngine and RewriteRule directives below
LoadModule rewrite_module /usr/lib/apache2/modules/mod_rewrite.so

# Only enable ProxyRequests for forward proxy, disable for reverse proxy
ProxyRequests Off

# ConceptWiki Web Service
ProxyPass / http://127.0.0.1:8080/ retry=5 timeout=60 ttl=300
ProxyPassReverse / http://127.0.0.1:8080/
RewriteEngine On
RewriteRule ^/$ http://ops.conceptwiki.org/wiki/

The old ConceptWiki uses the following configuration so that URLs from the cache keep working:

RewriteRule ^/concept/(.*)/$ http://www.conceptwiki.org/wiki/#/concept/$1/view [NE,L]
RewriteRule ^/concept/(.*)$ http://www.conceptwiki.org/wiki/#/concept/$1/view [NE,L]
RewriteRule ^/wiki/(.*) http://ops.conceptwiki.org/wiki/$1 [NE,P,L]
ProxyPassReverse /wiki/ http://ops.conceptwiki.org/wiki/

<Location "/wiki/">
    Allow from all
</Location>

<Location "/concept/">
    Allow from all
</Location>

Technical dependencies

All servers run Linux (Ubuntu 10.04 LTS), although any distribution with long-term support would do (Red Hat, SuSE, Ubuntu, Oracle).


Relational Database Servers

Percona Server 5.5.27-28.1-log Percona Server (GPL), Release 28.1. In theory any other relational database with a JDBC connector should work, but this would need testing.

Graph Database Servers

Neo4j standalone 1.7.2 running Java 6 SE 1.6.0_26-b03 or Java 7 SE 1.7.0_07-b10


SOLR Servers

Apache Tomcat 7.0.30 running Java 6 SE 1.6.0_26-b03 or Java 7 SE 1.7.0_07-b10, and SOLR 3.5.0

Web Servers

Apache HTTPD 2.2.22 with reverse proxy, rewrite engine, caching and SSL enabled, and Apache Tomcat 7.0.30


Deployment Dependencies Servers

MySQL JDBC Connector: Connector/J 5.1.22. The TCP/IP connection to the SOLR server port can be configured.

Deployment Dependencies Web Service and Front End

The RMI connection to the service module can be configured.

ConceptWiki code

Unit tests

Each module of the ConceptWiki has its own unit tests, which follow the normal Maven project conventions. In short, the unit tests are designed to make the functionality and the usage of the tested code clear to other developers who may encounter the code. Presently, unit tests cover 71% of the classes, 65% of the lines of code, and 70% of the methods.

The Jenkins build server is configured to stop the deployment if a unit test fails.

UUID generation

For its pseudo-randomly generated concept identifiers, the ConceptWiki uses a Java library provided by the Open Web Application Security Project (OWASP), tailored toward cryptographic security, to produce Version 4 UUIDs. Each UUID is generated using an algorithm that sets the version number as well as two reserved bits; all other bits are set using a random or pseudorandom data source. Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, where x is any hexadecimal digit and y is one of 8, 9, A, or B, e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.
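The version 4 layout can be checked with the JDK alone. This sketch swaps in java.util.UUID.randomUUID(), which is documented to use a cryptographically strong pseudo-random number generator, in place of the OWASP library, whose API is not described in this document. ConceptIds and its methods are hypothetical names; the regular expression encodes the xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx pattern given above.

```java
import java.util.UUID;
import java.util.regex.Pattern;

final class ConceptIds {
    // xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with y in {8, 9, a, b}
    static final Pattern V4 = Pattern.compile(
            "[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
            Pattern.CASE_INSENSITIVE);

    // randomUUID() fills the random bits and sets the version (4) and variant bits.
    static String newConceptId() {
        return UUID.randomUUID().toString();
    }

    static boolean isVersion4(String s) {
        return V4.matcher(s).matches();
    }
}
```

For the example identifier above, isVersion4("f47ac10b-58cc-4372-a567-0e02b2c3d479") is true: the third group starts with 4 and the fourth with a.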

Accessing the ConceptWiki data


API methods

ConceptWiki data can be accessed programmatically via the web services module. The current APIs are exposed as a RESTful web service with JSON responses.

Below is an overview table of the services available:

Name | Description | Status
--- | --- | ---
search | Returns all concepts that match the free text input. The concepts are matched on their labels. | available
search by tag | Returns all concepts for a given semantic type. | available
search for url | Returns all URLs for concepts that match the free text input. | available
get concept | Returns a concept object for a given UUID. | available
authority parameter | Returns a concept object for a given authority, when the specific authority parameter is included. | available
limit parameter | Returns a specific number of concepts, corresponding to the limit set in the parameter. | available


Generic parameters

Limit parameter: All search methods support an optional 'limit' parameter. By default, the limit value is 10. Any value of 1 or greater is allowed; there is no upper bound. A limit smaller than 1 is considered invalid.

Authority parameter: A number of search methods support an optional 'authority' parameter, which filters the results of the API call on data source authority. The parameter values are currently integer identifiers for the authorities known in ConceptWiki:

1 = Community

2 = UMLS

3 = SwissProt

4 = ChemSpider

5 = ConceptWiki
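The two generic parameters can be modelled on the client side as sketched below: an enum for the authority identifiers listed above and a helper that applies the documented default and lower bound for 'limit'. Authority and SearchParams are hypothetical client-side names, not classes from the ConceptWiki code base.

```java
import java.util.Arrays;

// Authority identifiers as listed above.
enum Authority {
    COMMUNITY(1), UMLS(2), SWISSPROT(3), CHEMSPIDER(4), CONCEPTWIKI(5);

    final int id;

    Authority(int id) {
        this.id = id;
    }

    static Authority fromId(int id) {
        return Arrays.stream(values())
                .filter(a -> a.id == id)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown authority: " + id));
    }
}

final class SearchParams {
    static final int DEFAULT_LIMIT = 10;

    // No limit given: use the default of 10. Values below 1 are invalid; there is no upper bound.
    static int effectiveLimit(Integer requested) {
        if (requested == null) {
            return DEFAULT_LIMIT;
        }
        if (requested < 1) {
            throw new IllegalArgumentException("limit must be >= 1, got " + requested);
        }
        return requested;
    }
}
```

Rejecting an invalid limit on the client avoids a round trip that the service would refuse anyway.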

Specific parameters for each service

search
URL: http://ops.conceptwiki.org/web-ws/concept/search/
Example: http://ops.conceptwiki.org/web-ws/concept/search/?q=malaria
Required parameters: q, the free text query input (at least 3 characters)
Response: an array of concepts, each with the following properties:
uuid, the unique identifier within the ConceptWiki
labels, the preferred and alternative labels
tags, the semantic types that classify the concept

search by tag
URL: http://ops.conceptwiki.org/web-ws/concept/search/byTag
Example: http://ops.conceptwiki.org/web-ws/concept/search/byTag?uuid=b946958d-b46f-4de3-aa55-63684b301cf1&q=malaria
Required parameters: uuid, the UUID of the tag concept; q, the search query
Response: an array of concepts, each with the following properties:
uuid, the unique identifier within the ConceptWiki
labels, the preferred and alternative labels

search for url
URL: http://ops.conceptwiki.org/web-ws/concept/search/forUrl
Example: http://ops.conceptwiki.org/web-ws/concept/search/forUrl/?q=malaria
Required parameters: q, the free text query input (at least 3 characters)
Response: an array of concepts, each with the following properties:
uuid, the unique identifier within the ConceptWiki
urls, the external URLs that describe the concept

get concept
URL: http://ops.conceptwiki.org/web-ws/concept/get/
Example: http://ops.conceptwiki.org/web-ws/concept/get/?uuid=d19a73ff-579c-46c0-af47-52290ae06186
Required parameters: uuid, the UUID of the concept
Response: a concept object with the following properties:
uuid, the unique identifier within the ConceptWiki
labels, the preferred and alternative labels
notations
notes
tags, the semantic types that classify the concept
urls, the external URLs that describe the concept

Error handling

If the concept is not found, a 404 error is returned.

If a parameter is missing from the query, a 400 error is returned.
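A minimal client for these endpoints can be sketched with the JDK's java.net.http.HttpClient (Java 11 or later). The URL paths and the q, uuid, limit and authority parameter names come from the tables above; ConceptWikiClient and its method names are hypothetical. The sketch returns the raw JSON body and leaves parsing to a JSON library of your choice; only the URL building and status mapping are exercised without network access.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class ConceptWikiClient {
    static final String BASE = "http://ops.conceptwiki.org/web-ws/concept";

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // search: q must have at least 3 characters; authority is optional (null = no filter).
    static String searchUrl(String q, int limit, Integer authority) {
        if (q == null || q.length() < 3) {
            throw new IllegalArgumentException("q must have at least 3 characters");
        }
        String url = BASE + "/search/?q=" + enc(q) + "&limit=" + limit;
        return authority == null ? url : url + "&authority=" + authority;
    }

    static String searchByTagUrl(String tagUuid, String q) {
        return BASE + "/search/byTag?uuid=" + enc(tagUuid) + "&q=" + enc(q);
    }

    static String searchForUrlUrl(String q) {
        return BASE + "/search/forUrl/?q=" + enc(q);
    }

    static String getUrl(String uuid) {
        return BASE + "/get/?uuid=" + enc(uuid);
    }

    // Maps the documented status codes to a readable message.
    static String describeStatus(int status) {
        switch (status) {
            case 200: return "ok";
            case 400: return "missing parameter";
            case 404: return "concept not found";
            default:  return "unexpected status " + status;
        }
    }

    // Performs the GET and returns the raw JSON body, failing on any non-200 status.
    static String fetch(String url) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException(describeStatus(response.statusCode()));
        }
        return response.body();
    }
}
```

For example, getUrl("d19a73ff-579c-46c0-af47-52290ae06186") reproduces the get concept example URL, and a 404 from fetch() surfaces as "concept not found".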

Curation Interface

The ConceptWiki home page for the Open PHACTS project can be found here: http://ops.conceptwiki.org/wiki/. The ConceptWiki user interface has been developed for manual access to the ConceptWiki by scientist-curators. From this web interface, scientists can edit and add synonyms that are missing or misaligned. Individual ConceptWiki concepts can be accessed via an HTML interface available from http://ops.conceptwiki.org/wiki/#/concept/[UUID]/view (where [UUID] represents the ConceptWiki identifier of the concept to be visualized).

Importing Data

The data sources currently residing in the ConceptWiki are added to the Neo4j backend by specific import scripts that take into account mapping files between data sources.

The imported data sources are listed below, along with key features of each import script.

Unified Medical Language System (UMLS) import

Data source: UMLS MySQL database (e.g. UMLS2012AB_meshgo).

Import data sets concerning MeSH & GO (as of 2012-2-21)

Code: https://trac.nbic.nl/svn/conceptwiki/trunk/code/conceptwiki/imports/imports-umls/src/main/java/nl/nbic/conceptwiki/imports/umls/UMLSImport.java

Methods:

importTermsNotationsUrls()

add MeSH url: http://purl.bioontology.org/ontology/MSH/<CUI>

add GO url: http://purl.org/obo/owl/GO#<GO_ID>

importDefinitions()

importTags()

importSemanticNetworkTypes()

importSemanticNetworkRelations()

importSemanticTypeAsTags()
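The URL step of importTermsNotationsUrls() can be sketched as a template substitution. The two URL templates are the ones listed above; choosing between them by the UMLS source abbreviation (SAB) of a term, "MSH" or "GO", is an assumption made for illustration, and UmlsUrls is a hypothetical class rather than code from UMLSImport.java.

```java
import java.util.Optional;

final class UmlsUrls {
    static final String MESH_TEMPLATE = "http://purl.bioontology.org/ontology/MSH/<CUI>";
    static final String GO_TEMPLATE = "http://purl.org/obo/owl/GO#<GO_ID>";

    // Hypothetical dispatch on the UMLS source abbreviation of a term.
    // Terms from other sources get no URL in this sketch.
    static Optional<String> urlFor(String sab, String id) {
        switch (sab) {
            case "MSH": return Optional.of(MESH_TEMPLATE.replace("<CUI>", id));
            case "GO":  return Optional.of(GO_TEMPLATE.replace("<GO_ID>", id));
            default:    return Optional.empty();
        }
    }
}
```

Keeping the templates as constants with the same placeholders as the documentation makes it easy to compare the code with the list of methods above.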

SwissProt Import

Data source: uniprot_sprot.xml. The 2012-01 release was used in the first import. The 2012-07 release was used as the first update run (for more specific information see the data set update page).

Code: https://trac.nbic.nl/svn/conceptwiki/trunk/code/conceptwiki/imports/imports-swissprot/src/main/java/nl/nbic/conceptwiki/imports/swissprot/UniprotImport.java

Import steps:

1) Update/create protein concept: protein name plus species name in parentheses as preferred name (e.g. Tissue-type plasminogen activator (Pongo abelii))

2) Update/create gene concept: only some SP entries have gene concepts

3) Add EC number (e.g. EC 3.4.21.68) as synonym

4) Add PDB DB reference as URL: http://www.pdb.org/pdb/explore/explore.do?pdbId=<PDB id>
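Steps 1, 3 and 4 reduce to simple string construction, sketched below. SwissProtConcept and its methods are hypothetical names, not part of UniprotImport.java; the input values (protein name, species, EC number, PDB id) are assumed to have already been extracted from uniprot_sprot.xml.

```java
final class SwissProtConcept {
    // Step 1: protein name plus species name in parentheses as preferred label.
    static String preferredLabel(String proteinName, String species) {
        return proteinName + " (" + species + ")";
    }

    // Step 3: EC number added as a synonym, e.g. "EC 3.4.21.68".
    static String ecSynonym(String ecNumber) {
        return "EC " + ecNumber;
    }

    // Step 4: PDB cross-reference added as a URL.
    static String pdbUrl(String pdbId) {
        return "http://www.pdb.org/pdb/explore/explore.do?pdbId=" + pdbId;
    }
}
```

Including the species in the preferred label keeps orthologous proteins, which often share a name, distinguishable as separate concepts.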

Mappings to be supplied:

1) Specific protein mapping (SP2UMLS), mapping file <SPid|UMLSid>: https://trac.nbic.nl/src/main/resources/UMLS11AB_SP201201_specificproteins.txt

2) Generic protein mapping (SP2UMLS), mapping file <SPid|UMLSid>: https://trac.nbic.nl/src/main/resources/UMLS11AB_SP201201_genericproteins.txt

3) Generic gene mapping (SP2UMLS), mapping file <SPid|UMLSid>: https://trac.nbic.nl/src/main/resources/UMLS11AB_SP201201_genericgenes.txt

4) BioPortal mapping (SP2UMLS URL): http://purl.bioontology.org/ontology/NCIM/

5) DrugBank mapping (SP2drugbank URL), mapping file <SPid|drugURL>: https://trac.nbic.nl/src/main/resources/drugbank_SP_drugURL_mapping.txt. Added as URL: http://drugbank.ca/drugs/<DrugBankID>

6) DrugBank target mapping (SP2targetID), mapping file <SPid|targetID>: https://trac.nbic.nl/src/main/resources/drugbank_target_SP_mapping.txt. Added as URL: http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/<targetID>

7) ChEMBL mapping (SP2chemblUrl), mapping query: chembl_12 MySQL database. Added as URL: http://chem2bio2rdf.org/chembl/resource/chembl_targets/<tid>
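The mapping files above are described as <SPid|UMLSid>, <SPid|drugURL> and <SPid|targetID>. Assuming one mapping per line, separated by the first '|' character, a reader can be sketched as below. MappingFile is a hypothetical helper, not the importer's code; it skips blank lines, rejects lines without both parts, and allows one SwissProt id to map to several targets.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MappingFile {
    // Parses lines of the form "SPid|targetId" into SPid -> target ids (in file order).
    static Map<String, List<String>> parse(Reader in) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                // Split on the first '|' only, so URL targets stay intact.
                int bar = line.indexOf('|');
                if (bar <= 0 || bar == line.length() - 1) {
                    throw new IllegalArgumentException("line " + lineNo + " is not SPid|id: " + line);
                }
                String spId = line.substring(0, bar).trim();
                String target = line.substring(bar + 1).trim();
                map.computeIfAbsent(spId, k -> new ArrayList<>()).add(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }
}
```

Failing on a malformed line, rather than skipping it, makes gaps in a mapping file visible during the import instead of silently producing unmapped concepts.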

ChemSpider Import

Data sources: files from the ChemSpider dropbox: ChEMBL.ttl, DrugBank.ttl, MeSH.ttl, PDB.ttl (see the data set page for current update information)

Code: https://trac.nbic.nl/conceptwiki/browser/trunk/code/conceptwiki/imports/imports-chemspider/src/main/java/nl/nbic/conceptwiki/imports/chemspider?rev=753

Import steps for the ChemSpider-MeSH file:

1) Preprocessing of files

a. Check ChemSpider files with scripts to correct for unescaped "

b. Run ChemSpider files through a script that checks for two ChemSpider ids associated with one external id

2) Update chemical concept:

a. Use the UMLS semantic type "Chemical Viewed Structurally" as tag for all ChemSpider concepts. The tagging relationships should be added in the ConceptWiki branch.

b. When multiple ChemSpider entries map to one MeSH concept, the MeSH concept is used as tag
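The check in preprocessing step 1b (several ChemSpider ids associated with one external id) can be sketched as grouping (external id, ChemSpider id) pairs and keeping the groups with more than one distinct ChemSpider id. AmbiguousMappings is a hypothetical helper; how the pairs are extracted from the .ttl files is not described here and is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AmbiguousMappings {
    // Each pair is {externalId, chemSpiderId}. Returns the external ids that map to
    // more than one distinct ChemSpider id, together with those ids (sorted).
    static Map<String, Set<String>> find(List<String[]> pairs) {
        Map<String, Set<String>> byExternal = new TreeMap<>();
        for (String[] pair : pairs) {
            byExternal.computeIfAbsent(pair[0], k -> new TreeSet<>()).add(pair[1]);
        }
        // Unambiguous external ids (a single ChemSpider id) are not reported.
        byExternal.values().removeIf(ids -> ids.size() < 2);
        return byExternal;
    }
}
```

Using sets means a pair that is simply repeated in a file is not reported as a conflict; only genuinely different ChemSpider ids are.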

Import steps for ChemSpider files other than MeSH:

1) Preprocessing of files

a. Check ChemSpider files with scripts to correct for unescaped "

b. Run ChemSpider files through a script that checks for two ChemSpider ids associated with one external id, e.g. from the PDB file:

OLC 21428104 26328367, OLC 21428104 26328377, OLC 21428104 26328376

2) Update/create chemical concept: chemical concepts found in MeSH should already be mapped to ChemSpider.

a. Other ChemSpider chemical concepts, not in MeSH, will need to be created. These "new" concepts may recur in several files (e.g. PDB, DrugBank, ChEMBL) and should be updated with the appropriate URL.

b. Add the tag "chemical viewed functionally" to all ChemSpider concepts

c. Add the external DB reference as URL only when the external reference is an exact match to the ConceptWiki concept:

i. For PDB: http://www.pdb.org/pdb/ligand/ligandsummary.do?hetId=<PDBid e.g. LOV>

ii. For DrugBank: http://www.drugbank.ca/drugs/<DBid>

iii. For ChEMBL:
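Step 2c can be sketched as a guarded URL lookup: a URL is produced only when the external reference is an exact match, using the PDB and DrugBank templates listed above. ChemSpiderUrls is a hypothetical helper, and the exactMatch flag stands in for whatever matching test the import script applies, which is not described here. No ChEMBL template is given in this document, so the sketch returns nothing for ChEMBL.

```java
import java.util.Optional;

final class ChemSpiderUrls {
    // Returns the external URL for a reference, but only when it is an exact match to the concept.
    static Optional<String> externalUrl(String source, String id, boolean exactMatch) {
        if (!exactMatch) {
            return Optional.empty();
        }
        switch (source) {
            case "PDB":      return Optional.of("http://www.pdb.org/pdb/ligand/ligandsummary.do?hetId=" + id);
            case "DrugBank": return Optional.of("http://www.drugbank.ca/drugs/" + id);
            default:         return Optional.empty(); // e.g. ChEMBL: no URL template given
        }
    }
}
```

Restricting URLs to exact matches keeps links from ChemSpider concepts to external records conservative: a related but non-identical ligand or drug does not get linked.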

PubMed Import

Not currently done.

NB: To speed up the data import (100x), Solr indexing at the concept service layer should be disabled.

LICENSES

Software

The ConceptWiki is an open source, public domain project. The Neo4j component is licensed under the GPL; although the GPL 3.0 license is compatible with the Apache license, the Neo4j component is neither distributed with nor linked into the ConceptWiki code.

The specialized code for the Storage, Service, Web Service, and Interface layers of the ConceptWiki is licensed under the Apache License, Version 2.0, http://www.apache.org/licenses/LICENSE-2.0.html


Data

The data in the ConceptWiki will be licensed under CC-BY-SA. When users sign in for the first time, they will need to agree to adding their edits under this license.

There is also a take-down notice, which is in essence the same as the take-down policy for the data in the Virtuoso triple store.