This document describes the envisaged generic storage service designed to be used in the Dicode project. The objective of the project is to facilitate and augment collaboration and decision making in data-intensive and cognitively complex settings. In such collaborative scenarios, users need to share and exchange information, files, reports, etc. This service arises as a solution to this need.
The main purpose of this service is to provide Dicode users with a permanent and reliable place to keep files in the cloud. The service will be as generic as possible to allow storing any kind of file (plain text, doc, pdf, html, xml, zip…). The service will provide mechanisms to upload files and retrieve them. Metadata about the files will also be stored to facilitate their search and location by search engines or services. The metadata will contain information such as the type of file (pdf, html, xml, etc.) or the type of content (Dynamika report, DNA sequence, etc.). These types of files and contents are enumerated in the Dicode ONtology (DON).
The storage service is envisaged as a centralized repository from the end-user perspective, but it will follow a mixed approach between centralized and distributed storage. When users want to use this service to share a file, they will have the option of uploading the file to the central repository or providing a URL accessible from the Internet where the file will be available. This mixed approach makes it possible both to maintain the privacy of files and to provide a storage place “in the cloud” to those users that cannot directly provide access to files from their organizations.
This storage service is intended for use within the Dicode project, but the generic approach presented allows it to be used from outside as well.
Description of the service
All services will be implemented as RESTful services. The logical vision (end-user vision) of the storage service is depicted in figure 1.
Figure 1. Logical vision of the storage service
In this general scenario, a user needs to share a file and decides to use the storage service. The file is uploaded and stored in a database in the cloud. The storage service assigns a unique identifier to the file. When another user, or the same one, wants to retrieve the file, s/he invokes the storage service using that identifier. This is what happens from the point of view of the end user.
The storage service consists of three main components:
The database, directly managed by the storage service, is used to physically store the files uploaded by the users.
The metadata registry, also directly managed by the storage service, is used to store the metadata about the files uploaded/published by the users. Such metadata contains information about the users, file annotations, creation and modification dates, etc.
The semantic services are used to annotate the files. The storage service only accesses these services, which are deployed over the Internet (cloud), to get the tags/classes of the DON that are useful for annotation.
The storage service follows a mixed approach. When users want to share/upload a file, two different scenarios are possible:
The user wants to store the file in a centralized repository directly managed by the storage service.
The user wants to keep the file locally and provides a public and accessible URI to retrieve it.
Figure 2. Uploading a file to the centralized repository. Double red arrows represent interactions between the user and the storage service, double blue arrows identify interactions between the storage service and the semantic services, single red arrows represent the data flow between the storage service and its internal components, and finally, double green dotted arrows indicate that references/information are shared by the components, i.e. the metadata registry contains some tags belonging to the DON to annotate the files and also contains references to the files in the database.
The first scenario is presented in figure 2. The user wants to share a file but cannot provide a permanent link to retrieve it, so s/he decides to use the storage service to upload the file. Before uploading the file, the user has to provide some extra information about it, such as the file format or the type of content. The different formats and types of content supported are retrieved from the semantic services and presented to the user. Then, the user sends the file and the metadata to the storage service. The storage service creates a unique identifier for the new file and stores the file in its own local database and the metadata in the local registry. Finally, the user is provided with the complete URI where the file is available.
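The steps of this first scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the in-memory dictionaries standing in for the database and the metadata registry, the hard-coded tag sets standing in for the lists retrieved from the semantic services, and the base URI are all hypothetical and not part of the design described here.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical stand-ins for the tags that would be retrieved from the semantic services (DON).
SUPPORTED_FORMATS = {"pdf", "html", "xml"}
SUPPORTED_CONTENT_TYPES = {"Dynamika report", "DNA sequence"}

# Hypothetical base URI; the real service address is not given in this document.
BASE_URI = "http://storage.example.org/files/"


@dataclass
class StorageService:
    database: dict = field(default_factory=dict)  # file identifier -> file content
    registry: dict = field(default_factory=dict)  # file identifier -> metadata

    def upload(self, content: bytes, file_format: str, content_type: str) -> str:
        """Store a file centrally and return the URI where it is available."""
        # Only formats and content types known to the ontology are accepted.
        if file_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported file format: {file_format}")
        if content_type not in SUPPORTED_CONTENT_TYPES:
            raise ValueError(f"unsupported content type: {content_type}")
        file_id = uuid.uuid4().hex  # unique identifier for the new file
        self.database[file_id] = content
        self.registry[file_id] = {"format": file_format, "content_type": content_type}
        return BASE_URI + file_id
```

In this sketch the file goes to the database and the metadata to the registry under the same identifier, which is what later allows a file to be found from its identifier alone.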
The second scenario is presented in figure 3. In this case the user does not want to upload the file anywhere because of, for instance, privacy concerns or legal issues. So, s/he decides to provide only the URI where the file is available. The process is the same as described for scenario 1. The user sends the metadata to the storage service plus a reference/URI to the file. The storage service stores all this information in the metadata registry (including the reference to the file) and nothing is stored in the local database.
Figure 3. Uploading a reference to the file (scenario 2). The green arrow represents that the file is accessible through the Internet/cloud. The rest of the arrows have the same meaning as in figure 2.
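The second scenario can be sketched as follows. The function name, the dictionary used as metadata registry, and the rule that only absolute http(s) URIs are accepted are assumptions made for illustration; the document does not specify how a public URI is to be checked.

```python
import uuid
from urllib.parse import urlparse


def publish_reference(registry: dict, metadata: dict, uri: str) -> str:
    """Register a file by reference only and return its identifier."""
    parsed = urlparse(uri)
    # Assumption: only absolute http(s) URIs count as "public and accessible".
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a public URI: {uri}")
    file_id = uuid.uuid4().hex
    # The reference is kept in the metadata registry next to the other metadata;
    # no file content is stored anywhere by the service.
    registry[file_id] = {**metadata, "reference": uri}
    return file_id
```

Note that the function never touches a file database: in this scenario the registry entry is the only trace of the file inside the service.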
In both scenarios, metadata about the files is stored in the same registry. Particularly relevant are the metadata concerning the file format and content type. This information will be stored as tags using concepts from the DON, so these concepts have to be previously defined in the DON.
To retrieve any file from the storage service, the user only has to indicate the file identifier assigned by the storage service. The storage service will implement the logic needed to retrieve the file wherever it is stored. This will be transparent for the end user.
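A minimal sketch of this retrieval logic is shown below, assuming (hypothetically) that the metadata registry and the database are dictionaries keyed by file identifier, and that a registry entry carries a "reference" key only when the file was published as a URI. The remote download is passed in as a function so that the sketch stays independent of any particular HTTP client.

```python
from typing import Callable


def retrieve(file_id: str, registry: dict, database: dict,
             fetch_remote: Callable[[str], bytes]) -> bytes:
    """Return the content of the file with the given identifier, wherever it is stored."""
    metadata = registry[file_id]  # raises KeyError for unknown identifiers
    reference = metadata.get("reference")
    if reference is None:
        # The file was uploaded to the central repository.
        return database[file_id]
    # The file stayed at its original location; follow the stored reference.
    return fetch_remote(reference)
```

The caller only supplies the identifier and does not need to know which branch is taken. With the standard library, `fetch_remote` could for instance be `lambda uri: urllib.request.urlopen(uri).read()`.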
The storage service might be useful both to store user files and to store files resulting from the execution of other services. In the latter case, the user would be the application that uses the storage service. All kinds of users will be able to:
Upload and annotate files
Update already existing files and annotations
List the existing files in a repository
All these functionalities will be implemented as RESTful services.
To ensure the privacy of files, the storage service will manage different repositories. For instance, in the Dicode Workbench different workspaces (working areas) will be defined, one per use case. Each workspace might use the storage service, but it will only have access to its own repository. Users of workspace 1 will not be able to see or access the repository of workspace 2. The Dicode Workbench will manage this issue.
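The separation between repositories can be illustrated with a small sketch. The class and method names are hypothetical, and the sketch assumes that the caller (for example the workbench) passes the name of the workspace with every request; how workspaces are actually identified and authenticated is not specified here.

```python
from typing import Dict, List


class RepositoryManager:
    """Keeps one file repository per workspace."""

    def __init__(self) -> None:
        self._repositories: Dict[str, Dict[str, bytes]] = {}

    def store(self, workspace: str, file_id: str, content: bytes) -> None:
        # Each workspace gets its own repository, created on first use.
        self._repositories.setdefault(workspace, {})[file_id] = content

    def list_files(self, workspace: str) -> List[str]:
        # Only the files of the requesting workspace are listed.
        return sorted(self._repositories.get(workspace, {}))

    def retrieve(self, workspace: str, file_id: str) -> bytes:
        repository = self._repositories.get(workspace, {})
        if file_id not in repository:
            # Files of other workspaces look exactly like files that do not exist.
            raise KeyError(f"no file {file_id!r} in workspace {workspace!r}")
        return repository[file_id]
```

Because every lookup starts from the workspace's own repository, a file stored by workspace 1 can be neither listed nor retrieved through workspace 2.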
Dicode user interface
In order to be integrated within the Dicode Workbench, a widget-based interface will be developed. A preliminary design of such an interface is shown in figure 4.
Figure 4. Preliminary design of the widget-based interface of the storage service for the Dicode workbench
This widget will display at the top a generic label to identify the service (“Storage service”). There will be a menu with, at least, three options:
An “upload file…” option, which will allow users to upload files to the storage service. A new window (figure 5) will be opened to capture the metadata from the user and to select the file. Once the uploading process has finished successfully, the new file will be shown within the widget.
A configuration option, which will allow users to configure some parameters of the storage service.
An information option, which will display information about the developers, dates, licenses and other useful information about the service.
In the widget body, the list of available files will be shown using a tree view. Users will be able to retrieve a file by just clicking on its name. Additionally, the widget will allow dragging any file and dropping it over another service/widget within the workbench. Scroll bars will be shown whenever needed.
As mentioned before, when users want to upload a file, they will click on the “upload file…” option and a new window will appear. This window will contain a form to capture the metadata information about the file. A preliminary design of this form is presented in figure 5.
Figure 5. Preliminary design of the form to capture the information about the file
The form in figure 5 presents the following fields to be completed by the user:
A textual identifier of the file, which will be used to display it in the tree view.
A textual description of the file and its contents.
The file format. This field allows users to specify the format of the file, which will be selected from a fixed list. The list of supported file formats will be retrieved using the semantic services (DON).
The content type. This field allows users to specify the contents of the file. The content type will be selected from a fixed list, and the list of supported contents will be retrieved using the semantic services (DON). A protocol should be defined to update and maintain the list of formats and contents supported in the DON.
The storage location. The user can decide whether the file will be uploaded and stored in the cloud or will remain in its original location, in which case a public URI will be provided to access it.
When the user selects “In the cloud”, s/he has to select a file from his/her local machine using the “Browse…” button. If the user selects “URI”, s/he must specify the public URI to the file.
Apart from this information, the Dicode workbench will send extra information to the storage service about the user uploading the file or the workspace/repository the file belongs to.
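The information gathered by the form, together with the extra data sent by the workbench, could be represented and checked as in the sketch below. All field names and the two location values are hypothetical, since the form labels are only shown in the figure. The lists of supported formats and content types are passed in because they would come from the semantic services.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadRequest:
    name: str                         # textual identifier shown in the tree view
    description: str
    file_format: str                  # chosen from the list of supported formats
    content_type: str                 # chosen from the list of supported contents
    location: str                     # "cloud" or "uri" (hypothetical values)
    local_path: Optional[str] = None  # file selected with the "Browse…" button
    public_uri: Optional[str] = None  # public URI given by the user
    user: str = ""                    # extra information sent by the workbench
    workspace: str = ""               # extra information sent by the workbench

    def validate(self, formats: set, content_types: set) -> list:
        """Return a list of problems; an empty list means the form is complete."""
        problems = []
        if not self.name:
            problems.append("name is required")
        if self.file_format not in formats:
            problems.append("unsupported file format")
        if self.content_type not in content_types:
            problems.append("unsupported content type")
        if self.location == "cloud":
            if not self.local_path:
                problems.append("a local file must be selected")
        elif self.location == "uri":
            if not self.public_uri:
                problems.append("a public URI must be specified")
        else:
            problems.append("location must be 'cloud' or 'uri'")
        return problems
```

Collecting the problems in a list, rather than stopping at the first one, would let the form report every incomplete field to the user at once.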
[Figure 5 form text: window title “Uploading a file”; location option “In the cloud”; hint “you can select whether to store the file in the cloud or to provide a public URI to access the file”]