Project Document Cover Sheet


Project Information

Project Acronym: WebTracks
Project Title: Infrastructure for Integration in Structural Sciences
Start Date: 1st Aug 2010
End Date: 30th Nov 2011
Lead Institution: University of Southampton
Project Director: Simon Coles
Project Manager & contact details: Brian Matthews, brian.matthews@stfc.ac.uk
Partner Institutions: STFC
Project Web URL: http://webtracks.jiscinvolve.org/wp/
Programme Name (and number): Managing Research Data (Citing, Linking and Integrating Research Data)
Programme Manager: Simon Hodson

Document Name

Document Title: Intercom: A protocol for link notification
Deliverable: 1.1
Author(s): Shirley Crompton, Brian Matthews, John Casson, Arif Shaon, Mark Borkum
Date: 13/04/2012
Filename:
URL: if document is posted on project web site
Access: [X] Project and JISC internal  [ ] General dissemination



Document History

Version | Date | Comments
0.1 | 29/09/2010 | Initial version (John Casson)
0.2 | 20/07/2011 | Heavily revised (Shirley Crompton)
0.3 | 12/08/2011 | Revision and expansion, after discussion between Crompton, Matthews and Shaon
0.4 | 15/08/2011 | Added introduction (Brian Matthews)
0.5 | 06/09/2011 | Minor editorial changes, added Section 2.1.1 and updated examples in Section (Shirley Crompton)
0.6 | 14/02/2012 | Revisions, history and use cases
0.7 | 13/04/2012 | Final revisions, history and use cases



Table of Contents

1 Introduction
  1.1 History and related work
  1.2 Conventions in this document
2 Use cases
  2.1 Use Case 1: Citation Tracking
  2.2 Use Case 2: Data provenance tracking
  2.3 Use Case 3: Linking via Annotation
3 Link Notification
  3.1 General Principles
  3.2 Architecture for the Link Notification Service
4 Terminology Applicable to this Document
5 Protocol Description
  5.1 Technical Details
    5.1.1 InteRCom Namespace
    5.1.2 InteRCom Ping URL
    5.1.3 Metadata Format
    5.1.4 GET Metadata Requests
    5.1.5 POST Requests
    5.1.6 Logging Requirements
    5.1.7 REST Interface
  5.2 Conformance Requirements
6 Example
  6.1 Linking Resources in Managed Archives
  6.2 Linking Resources in Managed Archives - Initiated by a Third Party
7 Security Considerations
Acknowledgements
References




InteRCom Specification 1.0 (Draft)

Editors:
John Casson <john.casson@stfc.ac.uk>
Shirley Crompton <shirley.crompton@stfc.ac.uk>
Brian Matthews <brian.matthews@stfc.ac.uk>
Arif Shaon <arif.shaon@stfc.ac.uk>
Mark Borkum <m.i.borkum@soton.ac.uk>

Date: 13/04/2012



Abstract

InteRCom is a method for managed systems to establish semantically annotated links between digital artefacts published on the web. In a typical use case, a researcher in the course of scientific research writes articles on results obtained from analysing primary data from experiments, refers to other prior work, and creates derived data. The holding entities need to be notified so that they can provide a “link-back” corresponding to the citation. Aggregating links between digital research resources provides an RDF graph of citation and provenance that captures the research process in context. These graphs can be traversed on the web and interrogated to support value-added services such as impact evaluation. Reverse linking is supported, as link assertions are intended to be stored on both the Source and Target Resources.

1 Introduction

The Inter-Repository Communication protocol (InteRCom) is a general-purpose application-layer protocol for linking digital data resources of any type across the web. It provides an HTTP REST-based mechanism for managed resource archives or data management tools to create link requests and to exchange metadata on web-based representations of heterogeneous research objects. InteRCom is a peer-to-peer protocol with no requirement for centralised services.


1.1 History and related work

The origin of this work dates back to the CLADDIER project [Claddier 2007], which discussed the problem of linking citations between published resources. In this project, a use case of relating publications to associated raw data was developed, and the problem of tracing “forward” and “backward” citations, and of tracking these across a number of participating repositories, was identified. The project produced a discussion of the problem [Matthews et. al. 2007] and considered a number of protocols available to provide notification of such citations. These included harvesting protocols (e.g. OAI-PMH) and pull protocols (e.g. SWORD or those based on RSS or ATOM).

A “push” protocol, where the notification of the citation is actively directed at the participating repository, was suggested as being suitable. Claddier then considered a number of “linkback”¹ protocols which have been proposed for this purpose, and proposed to use the well-known TrackBack protocol [Trackback 2008], which uses the REST web service model [Jacobs 2001], as the basis for the notification protocol. This is a simple and established protocol, based on HTTP, and thus a straightforward extension to existing practice. An initial prototype was produced within the STFC ePubs and BADC repositories [Matthews et. al. 2007b] [Matthews et. al. 2008].

The work was subsequently extended in the StoreLink project [Matthews et. al. 2009], which added whitelists to the protocol and provided an ePrints implementation in the National Crystallography Service. The StoreLink approach has advantages over harvesting methods: it is peer-to-peer, which increases the chance of identification of the source and target node; it supplies the context of the link (link semantics); it is simple; and it does not rely on an aggregator service. There are also advantages over “pull” approaches (e.g. Atom), as a link is propagated directly and therefore there is no reliance on discovery by subscriber services.




¹ “A linkback is a method for Web authors to obtain notifications when other authors link to one of their documents”, http://en.wikipedia.org/wiki/Linkback [retrieved 15th August 2011]


Two observations on the protocol that arose in StoreLink were that:

a) the step of “discovery” of the location of the notification receiver could be separated from the transmission of the link, and
b) the protocol should be made ‘general purpose’ in order to propagate links in context between any digital objects.

A similar approach is being taken by the Semantic Pingback project, which uses a Remote Procedure Call [SPB 2010]. While this project has already recognised the value of a general-purpose notification protocol, it uses RPC, and thus requires a different communications protocol rather than building on widely used HTTP and REST-based services.

Another approach is taken by the Salmon project [Salmon 2011]. This does use an HTTP protocol, but does not use general-purpose RDF-based ontologies as the basis for representing the information.

Thus we propose the InteRCom protocol as a two-stage inter-repository communication protocol. It is more flexible than StoreLink, as it does not specify a fixed format for the metadata ontology and it allows the metadata properties to be defined per link. StoreLink, in contrast, specified a static list of fields to be sent.

In InteRCom, a link is represented as an RDF triple. The source and target resources form the Subject and Object URIs of the assertion, and the link type is the Predicate (Figure 1). Using this approach, InteRCom can represent a wide range of links between different types of data resources.





Figure 1: An Example Link Assertion

1.2 Conventions in this document

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].



2 Use cases

We give a number of use cases where the InteRCom protocol would be appropriate.

2.1 Use Case 1: Citation Tracking

In this scenario we wish to trace the citation graph between research papers.

Traditional publishing uses citations entered by the author, who annotates the current publication with prior publications in order to reference work previously carried out and published by other authors. Thus credit for ideas and work can be properly attributed. However, this method is intrinsically one-directional. In a traditional publication system it is difficult to track citations forward, so that readers can discover further work which builds upon the current publication.

Such forward citation tracking has become of greater importance due to the requirement for citation and impact metrics within research assessment. Not only the number of papers generated by research is required, but also an evaluation of their impact. This can be estimated by the number of citations of the original work. Traditional citation indexes are generated via aggregation services such as Web of Knowledge, which harvest citation information from a pool of recognised journals and generate a citation index. However, a linked web of repositories (held in institutions or by publishers) could perform the same function. Repositories could harvest citation information from their publications and record the cross-citation information between papers recorded within their databases. However, each repository would have access only to the papers entered by its own user community, so while the repositories taken as a whole would have access to the information, each individual one would not. A mechanism is required for repositories to propagate citation information in a targeted manner, so that the right paper in the right repository can be identified, given a URI of the paper identifying where it is being held.

In this case, an effective method of propagating citation information is a peer-to-peer link notification method, as illustrated in Figure 2. When a paper is ingested into a repository:

1. Its citations are identified.
2. The location of each cited paper within a repository (held within an institution or by a publisher) is identified. This could be via the URL of the paper or via its DOI.
3. The citation of the paper is transmitted to a citation notification service, which records the citation.
4. The citation is recorded as ‘Cited By’ by the citing paper.

Figure 2: Generating the Citation Graph

In this case, the CiTO ontology is used to form the appropriate links [Shotton & Peroni 2011].

2.2 Use Case 2: Data provenance tracking

In this scenario, we wish to link a research object with another research object. For example, we wish to associate a paper with the dataset which was used to derive the result reported in the paper. Further, we may wish to associate other information with the object, such as the raw data collected, the software packages used to undertake analysis, the research project which funded the research, and the people and organisations involved in the scientific process. Thus we would create a graph of Provenance to trace the derivation of the research results, so that the quality of the research process can be made transparent in future assessment, and earlier components of the process can be reused. Such provenance trails are supported by notations such as the Open Provenance Model [Moreau et. al. 2010] and the emerging W3C Provenance Data Model and Ontology [W3C 2012].

Typically citations will only reference publications. Data archives wish to track who has been using data resources and thus want to keep track of forward links (“cited-by” links); they may be informed of a citation from a communication, or from a usage report, for example. Once a data archive has recorded a paper as arising from a particular dataset, the citation from the paper to the data set can be added, using the data citation form discussed above; this is not necessarily added by the author, but rather by the repository managers.

We assume that the publication P is held in a library’s publication repository A, and the data set D is held in a research department’s data repository B, and that the information that the link should be created is initiated within repository A. Thus:

- Repository A can add the link P uses data from D to its knowledge base.
- Repository A can notify B of the link P uses data from D.
- Repository B can add the link P uses data from D to its knowledge base.
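The three steps above can be sketched with two in-memory stand-ins for repositories A and B. This is a toy model under stated assumptions: the `Repository` class and its methods are hypothetical, and the notification is a direct method call where InteRCom would use an HTTP POST to the other repository's ping URL.

```python
class Repository:
    """Toy stand-in for a repository with a knowledge base of triples."""

    def __init__(self, name):
        self.name = name
        self.knowledge_base = set()

    def add_link(self, link):
        self.knowledge_base.add(link)

    def notify(self, other, link):
        # In InteRCom this would be an HTTP POST to the other
        # repository's ping URL; here it is a direct method call.
        other.receive(link)

    def receive(self, link):
        # The receiver decides what to do with a notification; this
        # toy receiver simply records it.
        self.add_link(link)


repo_a = Repository("A")  # holds publication P
repo_b = Repository("B")  # holds data set D
link = ("P", "uses data from", "D")

repo_a.add_link(link)        # A adds the link to its knowledge base
repo_a.notify(repo_b, link)  # A notifies B, and B records the link
```

After these calls both knowledge bases hold the same triple, which is the state the use case aims for.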


This process can be taken further to relate other entities within the provenance graph. Thus the data set can propagate the relationship that it is derived from raw data R generated and held at Facility C, and has used a software package S held at software repository D. In this way, a provenance graph can be generated and propagated around the interested parties. The relationships are illustrated in Figure 3; note that in this diagram a blank node representing the activity of using the software package to generate the derived data is included.

Figure 3: Provenance Graph showing derivation of published data


2.3 Use Case 3: Linking via Annotation

Third parties which do not hold the entities being related can also create links between entities. For example, to continue the theme of annotating research artefacts with their provenance, an electronic laboratory notebook may add the annotation that a data set has been derived by an analysis process on a raw data set. In this case, the link has been recorded by the notebook, and both the repositories holding the raw and derived data would need to be notified that such a link exists in order to have a complete record of the provenance.

Suppose that an electronic lab notebook is used to create the citation that data set A is derived from data set B, where A is held in repository X and B in repository Y. In this case, the link is transmitted to both repositories, so that each can add it to its triple store. This is illustrated in Figure 4.

Figure 4: Notification of a link via a third party


3 Link Notification

In order to complete the desired link graph we need to populate the repositories with links. In particular, we need to inform repositories that their entries have been linked, so that they can add the annotated link entries to their triple stores. We propose that this would be undertaken by a Link Notification Service.

3.1 General Principles

A number of principles are adopted in the design of the link notification service.

1. The notification should be on a peer-to-peer basis.
2. The service should be generic across different types of repositories and repository software.
3. The service should be generic across different digital object types and metadata formats (i.e. RDF vocabularies).
4. The notification should exchange appropriate metadata on the link from link holders to other parties with an interest in recording the link.
5. The notification system should not determine what the target repository does with the notification of the link.
6. The mechanism should fail gracefully in the event that a target does not exist or does not recognise the notification.
7. The mechanism should identify the sender of the notification and defend against bogus notifications of links.

Further, it was seen as desirable that existing off-the-shelf tools and mechanisms be adapted, to build on established practice and save on development effort.


3.2 Architecture for the Link Notification Service

We base the link notification service on Linkback, a peer-to-peer push protocol. The Linkback model establishes a direct notification between repositories, as in Figure 5, and operates as follows:

1. Link-holding repositories identify the resources involved in a link and the likely holder repository of those resources, to identify appropriate target repositories.
2. Link-holding repositories notify the target repository directly of the link.
3. Target repositories accept the notification of the link.

This architecture is similar to a Linkback protocol, such as Trackback [Trackback 2008] or Pingback [Langridge & Hickson 2002]. Linkback protocols have been developed largely within the blogging community to allow notification of cross-references between weblogs, so that authors can keep track of who is linking to, or referring to, their articles.

Figure 5: Linkback Model for Notification Service

This architecture needs the following components, with the following functionality.

1. Publishers, which:
   i. Identify likely target holders of linked resources.
   ii. Send link data in an appropriate format to target holders within the appropriate Linkback protocol.
2. Subscribers, which:
   i. Receive link data in an appropriate format from source holders within the appropriate Linkback protocol.
   ii. Digest link data appropriately.

Note that in this model there is no “registration of interests” with a broker; a source repository decides not “who” to notify, but merely “where” to notify: an appropriate end-point for the notification based on its URL.

Advantages

1. No centralised broker service.
2. No negotiation of registration with broker.
3. No definition of interests or harvesting of catalogue required.
4. If notification is not acknowledged, then there is no need to continue.

Disadvantages

1. Identification of target repositories is dependent on the URL, which may be missing.
2. Less flexibility in who can receive what (a repository can only get linkbacks for those resources it hosts).
3. Linkback protocols are well known for being vulnerable to “spamming” by bogus notifications. As a consequence, there may be additions to the protocol, such as registration of trusted repositories, or signatures. While we recommend such safeguards, we regard them as out of scope of this protocol.

A number of Linkback specifications exist², including Trackback, Pingback and Refback³.




- Refback uses the information sent when a user clicks on a link to register the back link to the HTTP Referer (i.e. the page on which the link was made), which can then be harvested. Refback is thus dependent on users clicking on a link, which is not guaranteed, and the back link could be made to any reference to the digital object, not necessarily citations.
- Pingback uses an XML-RPC call rather than a plain HTTP request. This reduces spam, and potentially richer metadata can be sent across this protocol. However, the protocol is not widely supported.
- Trackback is a simple “framework for peer-to-peer communication”. Essentially, TrackBack involves sending a “ping” via an HTTP POST request, saying “resource A has a link to (cites) resource B”. TrackBack is supported by blogging software such as Movable Type⁴. Its metadata transmission is relatively simple in its basic form, but it has a straightforward mechanism for extension of the metadata, as it uses the POST mechanism. Problems with spamming are well known, and mechanisms can be added to mitigate this problem.




² For a summary see: http://en.wikipedia.org/wiki/Linkback
³ http://en.wikipedia.org/wiki/Refback
⁴ http://www.movabletype.org/



Consequently, Trackback was chosen as the basis of the Link Notification Service.

4 Terminology Applicable to this Document

We give some basic definitions as used in this protocol specification.

- InteRCom-enabled Resource: A digital research object accessible on the web by an HTTP-based URI and which also supports the InteRCom GET and POST methods.
- InteRCom User Agent: An entity that enacts the InteRCom protocol for a given link assertion.
- Ping: An HTTP POST request sent from an InteRCom agent to a server for the purpose of establishing an explicit relationship between Web resources.
- Receiving/Target Resource: A Web resource to which a Ping is directed for the purpose of establishing a link between it and a Source Resource.
- Sender/Source Resource: A Web resource containing a link to the Target Resource.
- Security Guard: A generic entity that handles authentication and/or authorisation for the Receiving Resource.
- TrackBack Ping URL: The HTTP URI to which TrackBack Ping requests are posted.
- URI: An HTTP-based Uniform Resource Identifier that can be de-referenced to a digital representation of a Resource.
- URL: An HTTP-based Uniform Resource Locator that points to a digital representation of a Resource.


5 Protocol Description

5.1 Technical Details

The InteRCom mechanism uses REST GET and POST requests to exchange metadata and establish a link between web-based resources. For simplicity, the protocol is designed to be “fire and forget” from the point of view of the invoking application. Should it fail at any point in the interactions, it fails silently without interrupting the processing of the invoking application. It is strongly recommended that an error message is logged by the InteRCom User Agent to facilitate error resolution (see Section 5.1.6).


5.1.1 InteRCom Namespace

InteRCom defines a namespace, with conventional prefix as follows:

    xmlns:intercom="http://intercom.stfc.ac.uk/2011/"


5.1.2 InteRCom Ping URL

An InteRCom-enabled resource has an HTTP-based URI or a URN (e.g. a DOI) that can be resolved to the resource home location (the resource located at the resource owner). The resource should have the capability to serve RDF, either providing a complete RDF representation or RDF embedded in an HTML representation of the resource. The RDF should include the InteRCom ping URL for the resource. This ping URL should be used to manipulate links associated with the resource (e.g. via GET, POST, PUT and DELETE requests; see Section 5.1.7). The ping URL should follow this format:

URI format: http://authority/context/resourcepath/links

Concrete example: http://example.com:8080/webtracks/resources/id/1/links

The ping URI MUST expose the resource’s links resource, in line with the SOA approach. (For this reason, the path segment ‘/links’ is appended to the resource path.)
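A resource owner might compose ping URLs with a small helper like the one below. This is a minimal sketch: the function name `ping_url_for` is hypothetical, and a sending User Agent would normally read the ping URL from the resource's RDF rather than compute it.

```python
def ping_url_for(resource_url: str) -> str:
    """Append the '/links' path segment to a resource URL."""
    # Strip any trailing slash first so the result never contains '//links'.
    return resource_url.rstrip("/") + "/links"


print(ping_url_for("http://example.com:8080/webtracks/resources/id/1"))
```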


Note that how the invoking application is informed of the Source/Target Resource URIs is out of scope of this specification. A typical managed resource such as STFC e-Publications⁵ may provide a form for users to input the relevant link creation information. Alternatively, the relevant information may be “scraped” from electronic citation records using a suitable tool (e.g. see [Bergmark 2000]).

In addition, resource owners SHOULD expose the links aggregated by their resources via a SPARQL endpoint. This access method will enable third-party value-added services to execute specific queries directly on the aggregated link RDF statements, in accordance with Linked Data Principles [Berners-Lee 2006].

⁵ http://epubs.stfc.ac.uk


5.1.3 Metadata Format

The protocol mandates only three properties to be communicated when creating a link:

- The Subject: the Source Resource URI
- The Object: the Target Resource URI
- The Predicate: the type of linkage

For consistency and persistence, cool URIs SHOULD be used to identify the resources. InteRCom by design aims to be flexible and purposely does not constrain the metadata being exchanged, except that a representation in XML-formatted RDF should be available (see Section 5.1.4). Both the number and type of metadata properties can be specified on a link-by-link basis. Resource owners are free to propagate all relevant properties, which may range from metadata describing the Resource to Rights information relating to its usage.

Note that publication of the metadata does not enforce any client behaviour. It is up to the client to interpret the information in order to adopt the most respectful behaviour towards the resources offered. To facilitate interpretation and the correct usage of the resource, it is strongly recommended that common and well-understood ontologies, e.g. SPAR [Shotton 2010], are used.


5.1.4 GET Metadata Requests

Each digital resource will be associated with an HTTP URI that can be de-referenced to an XML-formatted RDF representation of the resource’s metadata. It is strongly recommended that the URI also provides a representation of the metadata suitable for viewing through a web browser.

An example reply to a GET metadata request is provided below:

Figure 6: An Example Reply to a GET metadata Request.

The digital resource described is specified in an rdf:about attribute of an rdf:Description tag. The child tags contain arbitrary metadata on the resource. The above example uses the Dublin Core [DCMI 2010] standard to describe the metadata. Note that it includes the InteRCom ping URL associated with the resource to support the InteRCom server discovery process.
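To make the structure described above concrete, the following Python sketch parses a reply of that general shape and extracts the resource URI, the ping URL and the remaining metadata. The sample reply is an assumption constructed for illustration: the resource URIs, the title, and in particular the property name `intercom:pingURL` are hypothetical, since this text does not name the property that carries the ping URL.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
INTERCOM = "http://intercom.stfc.ac.uk/2011/"

# Illustrative reply only: the URIs, the title and the property name
# intercom:pingURL are assumptions, not taken from the specification.
SAMPLE_REPLY = f"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:intercom="{INTERCOM}">
  <rdf:Description rdf:about="http://example.com:8080/webtracks/resources/id/1">
    <dc:title>An example data set</dc:title>
    <intercom:pingURL rdf:resource="http://example.com:8080/webtracks/resources/id/1/links"/>
  </rdf:Description>
</rdf:RDF>"""


def parse_metadata_reply(xml_text):
    """Return (resource URI, ping URL or None, other metadata as a dict)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    description = root.find(f"{{{RDF}}}Description")
    resource_uri = description.get(f"{{{RDF}}}about")
    ping_url = None
    metadata = {}
    for child in description:
        if child.tag == f"{{{INTERCOM}}}pingURL":
            # The ping URL is carried as an rdf:resource attribute.
            ping_url = child.get(f"{{{RDF}}}resource")
        else:
            # Everything else is treated as descriptive metadata.
            metadata[child.tag] = child.text
    return resource_uri, ping_url, metadata


resource_uri, ping_url, metadata = parse_metadata_reply(SAMPLE_REPLY)
```

A sending User Agent would then POST its link creation request to `ping_url` and handle `metadata` according to local policy.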



5.1.5 POST Requests

The link creation request should be POSTed to the InteRCom ping URL advertised in the resource’s RDF (see Section 5.1.2). The receiver could be either the Subject or the Object of the link assertion. The Content-Type header of the HTTP POST request MUST be ‘application/rdf+xml’ with a character encoding of UTF-8. The User-Agent field should be set to an appropriate version of the InteRCom User Agent.

The entity payload of the POST request is XML-formatted RDF containing the link assertion. The format of the RDF/XML is similar to the metadata format specified in Section 5.1.4. All references to an rdf namespace refer to the W3C’s RDF ontology [RDF]. The subject is specified in an rdf:about attribute of an rdf:Description tag. The object should be specified as an rdf:resource attribute to a tag specifying the predicate. The predicate can be from any appropriate ontology (but see Section 5.1.3).

Figure 7 gives an example of a minimal InteRCom POST request. It links a publication (the Source Resource) in the STFC ePublications archive (epubs.stfc.ac.uk) to a beamline Investigation (experiment) managed by the ISIS ICAT data catalogue (data.isis.stfc.ac.uk) using a predicate of cito:usesDataFrom from the CiTO ontology [CIT].

Figure 7: Example minimal POST Request for a link creation.
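A minimal request of the kind described in this section could be assembled as in the sketch below, which uses only the Python standard library and builds the request without sending it. The function name, the example.org URIs and the User-Agent string are assumptions for illustration; the header values and the rdf:about / rdf:resource layout follow the text above.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CITO = "http://purl.org/spar/cito/"

# Use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("cito", CITO)


def build_link_request(ping_url, subject, predicate, obj):
    """Build (but do not send) a minimal link creation POST request."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # The subject goes in the rdf:about attribute of rdf:Description.
    description = ET.SubElement(root, f"{{{RDF}}}Description",
                                {f"{{{RDF}}}about": subject})
    # The predicate is the child tag; the object is its rdf:resource.
    ET.SubElement(description, predicate, {f"{{{RDF}}}resource": obj})
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        ping_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/rdf+xml; charset=utf-8",
            # Hypothetical agent string; the specification does not fix one.
            "User-Agent": "InteRCom-example-agent/0.1",
        },
    )


# Hypothetical identifiers, for illustration only.
request = build_link_request(
    ping_url="http://example.org/investigations/1/links",
    subject="http://example.org/publications/1",
    predicate=f"{{{CITO}}}usesDataFrom",
    obj="http://example.org/investigations/1",
)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.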


If the Resource being linked to is managed by another authority, the Sender SHOULD also include metadata on the Source Resource in the request RDF. The Target Resource may use the information to complete or update its own sparse description of the Source Resource. Note that it is not necessary to send metadata about a Resource already stored in the receiving repository.

An example InteRCom POST request with metadata on the Source Resource is given below.

Figure 8: Example POST Request for a Link Creation

5.1.5.1 POST Request Responses

Upon successful submission of a POST request, the InteRCom User Agent should set the HTTP response status code to 202.

5.1.6 Logging Requirements

While the protocol does not specify how logging should be managed or formatted, compliant InteRCom services SHOULD log messages sent and received, to allow audit of messages and tracing of possible spurious, misdirected, or maliciously sent messages.



5.1.7 REST Interface

InteRCom has a RESTful interface with these HTTP methods:

- GET: Get metadata on the requested Resource URI. Produces: application/rdf+xml
- POST: Request the creation of a link with the Target (Receiving) Resource URI. Consumes: application/rdf+xml
- PUT: Update a specific link on the requested Link Resource URI. Consumes: application/rdf+xml
- DELETE: Remove a specific link on the requested Link Resource URI. Consumes: application/rdf+xml
- GET: Get a specific link on the requested Link Resource URI. Produces: application/rdf+xml


5.2 Conformance Requirements

To claim conformance to this specification, an InteRCom-enabled repository MUST provide resource-specific InteRCom Ping URLs and publish these as part of the resources’ metadata. The repository is strongly recommended to publish a SPARQL endpoint to support queries over the aggregated link RDF resource.

To claim conformance to this specification, an InteRCom-enabled Resource MUST support the GET Resource metadata and POST link creation methods as described in Sections 5.1.4 and 5.1.5.

To claim conformance to this specification, an InteRCom User Agent must be able to compose and parse XML-formatted RDF representations of InteRCom messages.

6

Example

6.1

Linking Resources in Managed Archives



Figure
9

: Interactions between Managed Archives


This example describes the full InteRCom protocol

being

used in creating a
link between two Resources held in separate managed archives.

This
satisfies the
Use Case 1: Citation Tracking

above.


1.

Repository A

publishes/updates a resource which has
a
n RDF

link

assertion

to

a third party resource
.

2.

The
Source
Resource passes the
link

assertion

to
its
InteRCom User
Agent
.

3.

The InteRCom User Agent (
Source
) extracts the Targ
et

(Receiving)

Resource URI from the

link

assertion.

(The target Resource could be
either
the Subject or Object of the RDF statement as the link could be a
backward or forward one).

It sends a GET metadata request to the
Target Resource URI.

4.

The Target R
esource
GET
method v
alidates the request and passes
the request to its InteRCom User Agent (
Target
).

5. The User Agent (Target) obtains metadata on the requested resource, composes the Target's InteRCom Ping URL and returns these in XML-formatted RDF (see Figure 2).

6. The User Agent (Source) parses the response, extracts the Ping URL and processes the metadata according to local policy.

7. The User Agent (Source) prepares the link creation POST request. It requests metadata on the Source Resource from the local repository.

8. The User Agent (Source) composes the link creation POST request, which contains the InteRCom Ping URL and metadata on the Source Resource (see Figure 4).

9. The User Agent (Source) POSTs the request to the Target Resource's Ping URL.

10. The Target Resource POST method validates the request and passes the request to its InteRCom User Agent (Target).

11. The User Agent (Target) processes the link creation request according to its local policy, which may include specific authentication and authorisation, before adding the link to its knowledge base.

12. The User Agent sets an HTTP status code of 200 OK to send back to the requester.


Note that should the interaction fail at any point, it fails silently. In this event, an error message SHOULD be logged to facilitate error resolution.
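The Source side of the interaction (steps 3 to 9), together with the silent-failure behaviour, can be sketched as follows. This is an illustrative outline under stated assumptions, not a reference implementation: the function names are hypothetical, the network calls are injected as callables so that no particular HTTP library is implied, and the rule that local Resources share a URI prefix (`local_prefix`) is a simplification introduced here to decide which end of the link is the Target.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("intercom.source")

# (subject URI, predicate URI, object URI)
Triple = Tuple[str, str, str]


def pick_target(triple: Triple, local_prefix: str) -> Optional[str]:
    """Step 3: return the end of the link that is not held locally.

    Assumption: all local Resources share the URI prefix `local_prefix`.
    """
    subject, _predicate, obj = triple
    subject_local = subject.startswith(local_prefix)
    object_local = obj.startswith(local_prefix)
    if subject_local and not object_local:
        return obj      # forward link: the Object is the Target
    if object_local and not subject_local:
        return subject  # backward link: the Subject is the Target
    return None         # both local or both remote: nothing to notify


def notify_target(
    triple: Triple,
    local_prefix: str,
    http_get: Callable[[str], str],
    extract_ping_url: Callable[[str], str],
    build_request: Callable[[Triple, str], str],
    http_post: Callable[[str, str], int],
) -> bool:
    """Run steps 3 to 9; log and return False on any failure."""
    try:
        target = pick_target(triple, local_prefix)
        if target is None:
            return False
        metadata = http_get(target)                # steps 3-5: GET Target metadata
        ping_url = extract_ping_url(metadata)      # step 6: extract the Ping URL
        request = build_request(triple, ping_url)  # steps 7-8: compose the POST body
        status = http_post(ping_url, request)      # step 9: POST to the Ping URL
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status}")
        return True
    except Exception:
        # Fail silently towards the caller, but keep a record for resolution.
        log.exception("link notification failed for %r", triple)
        return False
```

Injecting the transport functions keeps the sketch testable offline; a deployment would supply real HTTP calls and an RDF/XML parser in their place.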


6.2 Linking Resources in Managed Archives Initiated by a Third Party




Figure 10: Protocol Initiated by a Third Party Data Management Tool.


This example is similar to the one described in Section 6.1, except that the request comes from a Third Party Data Management Tool. This satisfies Use Case 3: Linking via Annotation, above.


1. A Third Party Data Management Tool creates/updates a resource that contains an RDF link between two web resources. This is the result of a local triggering event, outside the scope of the protocol.

2. This triggers a call to the User Agent to initiate link creation processing.

3. For each URI in the link, the User Agent sends a GET request for metadata, which includes the InteRCom Ping URL.

4. The Receiving Resource processes the GET request as per steps 4 and 5 detailed in the previous example.

5. The User Agent (Resource X) parses the response to extract the Ping URL.

6. The User Agent (Resource X) composes the link creation POST request.

7. The remaining processes follow steps 9-12 detailed in the previous example.
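The main difference from the previous example is that the third party tool owns neither end of the link, so it contacts both. A minimal sketch of that loop is given below, assuming that each end is notified independently and that a failure at one end does not stop the other; the names `notify_both_ends`, `fetch_ping_url` and `post_link` are hypothetical.

```python
import logging
from typing import Callable, Dict, Tuple

log = logging.getLogger("intercom.thirdparty")

# (subject URI, predicate URI, object URI)
Link = Tuple[str, str, str]


def notify_both_ends(
    link: Link,
    fetch_ping_url: Callable[[str], str],
    post_link: Callable[[str, Link], int],
) -> Dict[str, bool]:
    """Notify the Subject and the Object of the link; report success per URI."""
    subject, _predicate, obj = link
    outcome: Dict[str, bool] = {}
    for uri in (subject, obj):
        try:
            ping_url = fetch_ping_url(uri)                   # steps 3-5: GET metadata
            outcome[uri] = post_link(ping_url, link) == 200  # steps 6-7: POST the link
        except Exception:
            # As in Section 6.1, fail silently but log the error.
            log.exception("could not notify %s", uri)
            outcome[uri] = False
    return outcome
```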


7 Security Considerations

As InteRCom is based on HTTP, it is subject to the same security considerations as that specification [RFC2616].

InteRCom does not specify any normative measures regarding security or the prevention of malicious spamming, as these are policy issues best addressed by the resource owners. There are various security frameworks and libraries, such as Public Key Infrastructure [PKI 2000] and OAuth [OAuth 2012], that can be applied to protect InteRCom-enabled Resources.

Similarly, there are standard defensive practices for preventing spamming. For instance, Storelink only accepts TrackBack calls from white-listed IP addresses. Some TrackBack servers scan Originating Resources for legitimate links to the Target Resource (see [Trackback 2008]). In the case of managed public archives, active moderation, with an administrator screening the incoming TrackBack Pings, has also proven effective.
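Two of the defensive practices mentioned above, IP white-listing and checking that the originating resource really refers to the target, could be approximated as in the sketch below. It is illustrative only: the function names are hypothetical, the substring test is a deliberately crude stand-in for parsing the source's RDF, and neither check is mandated by InteRCom.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(remote_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if the caller's IP address falls in an allowed network."""
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as not allowed
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


def source_mentions_target(source_document: str, target_uri: str) -> bool:
    """Crude legitimacy check: does the source document contain the target URI?"""
    return target_uri in source_document
```

A repository might apply checks like these in the local policy step of the Target User Agent before a link is added to its knowledge base.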

Acknowledgements

The InteRCom protocol is developed as part of the WebTracks Project (http://www.jisc.ac.uk/whatwedo/programmes/mrd/clip/webtracks.aspx), which is funded by the JISCMRD Programme (http://www.jisc.ac.uk/whatwedo/programmes/mrd.aspx).

References

[Bergmark 2000] Donna Bergmark, Automatic extraction of Reference Linking Information from Online Documents. Cornell Digital Library Research Group CSTR 2000-1821

[Berners-Lee 2006] Berners-Lee, T. (2006). Linked Data - Design Issues. Retrieved 10 April 2012, http://www.w3.org/DesignIssues/LinkedData.html



[Claddier 2007] CITATION, LOCATION, And DEPOSITION IN DISCIPLINE & INSTITUTIONAL REPOSITORIES (CLADDIER) JISC Project 2005-7. http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005/claddier

[DCMI 2010] Dublin Core Metadata Initiative specifications http://dublincore.org/specifications/

[Langridge & Hickson 2002] Stuart Langridge and Ian Hickson. Pingback 1.0, 2002. http://www.hixie.ch/specs/pingback/pingback




[Jacobs 2001] Ian Jacobs, Architecture of the World Wide Web, Volume One. http://www.w3.org/2001/tag/webarch/

[Matthews et. al. 2007a] Brian Matthews, Katherine Portwin, Catherine Jones, and Bryan Lawrence. Recommendations for Data/Publication Linkage, CLADDIER Project Report III. Nov 2007 http://epubs.stfc.ac.uk/work-details?w=42017

[Matthews et. al. 2007b] Brian Matthews, Katherine Bouton, Jessie Hey, Catherine Jones, Sue Latham, Bryan Lawrence, Alistair Miles, Sam Pepler, Katherine Portwin. Cross-linking and referencing data and publications in Claddier. Proc. UK e-Science 2007 All Hands Meeting, 10-13 Sep 2007

[Matthews et. al. 2008] Brian Matthews, Katherine Portwin, Catherine Jones, Bryan Lawrence. Using Trackback to Support Citation Notification Services. XTech 2008: The Web on the Move (XTech2008), Dublin, Ireland, 06-09 May 2008

[Matthews et. al. 2009] Brian Matthews, Alastair Duncan, Catherine Jones, Cameron Neylon, Mark Borkum, Simon Coles, Philip Hunter, A Protocol for Exchanging Scientific Citations, e-science, pp.171-177, 2009 Fifth IEEE International Conference on e-Science, 2009 http://www.computer.org/portal/web/csdl/doi/10.1109/e-Science.2009.32

[Moreau et. al. 2010] Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, Jan Van den Bussche. The open provenance model core specification (v1.1). Future Generation Computer Systems, July 2010.

[OAuth 2012] OAuth Work 2.0 http://oauth.net/

[PKI 2000] Public Key Infrastructure http://www.opengroup.org/security/pki/


[RFC2119] S. Bradner. Key words for use in RFCs to Indicate Requirement Levels, IETF, March 1997. http://www.normos.org/ietf/rfc/rfc2119.txt

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, June 1999 http://www.ietf.org/rfc/rfc2616.txt

[Salmon 2011] Salmon Protocol http://www.salmon-protocol.org/


[Shotton 2010] David Shotton. Introducing the Semantic Publishing and Referencing (SPAR) Ontologies. http://opencitations.wordpress.com/2010/10/14/introducing-the-semantic-publishing-and-referencing-spar-ontologies/


[Shotton & Peroni 2011] David Shotton, Silvio Peroni, Citation Typing Ontology (CiTO). http://speroni.web.cs.unibo.it/cgi-bin/lode/req.py?req=http://purl.org/spar/cito

[SPB 2010] Semantic Pingback http://aksw.org/Projects/SemanticPingBack

[Trackback 2008] TrackBack Technical Specification http://www.movabletype.org/documentation/trackback/specification.html


[W3C 2012] W3C Provenance Working Group http://www.w3.org/2011/prov/wiki/Main_Page