U-P2P: A Peer-to-Peer Framework for Universal Resource Sharing and Discovery


Neal Arthorne, Babak Esfandiari, Aloke Mukherjee

Department of Systems and Computer Engineering,

Carleton University, Ottawa, Ontario, Canada

narthorn@connectmail.carleton.ca, babak@sce.carleton.ca, alokem@cisco.com

Abstract


We present U-P2P, an open source framework for developing, deploying and discovering file-sharing
communities. We address the problem of search in peer-to-peer file sharing by allowing the end user
to add metadata to shared documents. Each community allows the sharing of a particular structured
document. Communities are themselves modeled as structured documents, thus enabling their sharing
and discovery just like any other document. The creator of a particular community specifies, among
other properties, the document type that it shares and the deployment model. U-P2P’s extensible
architecture allows developers to create new properties or extend existing ones, such as providing
new deployment models or custom privacy and authentication features. U-P2P makes use of other open
source projects such as Jakarta Tomcat and eXist, an XML database system.

I - Introduction


The current success of peer-to-peer (P2P) file sharing applications has highlighted the benefits of
distribution and redundancy of resources. However, to truly exploit such advantages, a few
roadblocks remain to be cleared. In particular, search and discovery of resources is still quite
difficult. Most known approaches rely on simple schemes such as a search for the resource name or
type. File sharing communities are most efficient when the name of the file carries most if not all
of the needed information. As a result, most file swapping communities are restricted to swapping
music and video files. Even when the exchange of non-music files is possible, the difficulty of
finding such files has been a sufficient deterrent. Lack of metadata is the obvious problem.


Another roadblock to more general applications of P2P has been the difficulty of creating
communities for specific purposes. This arises partially from the difficulty of defining custom
metadata about different types of files. Another important problem is the current fractured state of
peer-to-peer communities. Again, the lack of metadata about communities makes it difficult to know
what there is to look for in the first place.


We propose a peer-to-peer framework called Universal Peer-to-Peer (U-P2P) that simplifies the
sharing of custom metadata formats as well as the creation, configuration and discovery of
peer-to-peer communities. In U-P2P, each community is described in part by the metadata of the files
it exchanges. The format of a community’s metadata is specified using the XML Schema language, thus
allowing new communities centered on that file type to be created in any text editor. U-P2P allows
these custom resources to be created and shared as in traditional file-sharing services.


By logical extension, the description of the community itself (the metadata format of its files, the
protocol used for search, etc.) is encapsulated in an XML file. In U-P2P, creating, sharing or
discovering a community follows the same principles as creating, sharing or discovering a file
within that community. As a result, file-swapping communities such as Napster or Kazaa can be seen
as instances of U-P2P devoted to sharing a few specific file types and utilizing a given P2P
protocol.

II - Related Work: the Discovery Problem in Peer-to-Peer Systems


In Napster [1], the only files that could be shared on the network were MP3 audio files. Search
was based on filenames and relied on users encoding the artist and title of each song in the MP3’s
title. Although metadata such as encoding rate could be used to sort the results, there was no way
to search on these metadata or define other parameters.

In a Gnutella [1] network, any type of file can be shared but there is no explicit metadata
handling. Search strings are passed between peers without processing and their interpretation is
left to the receiving peer. Each peer must implement its own search algorithm using the search
string as input. Most Gnutella implementations, like Napster, simply return filenames that contain
the search string. But Gnutella does not stop designers from designing overlay protocols to encode
and decode metadata from search strings. This has led to proposals in the Gnutella developer
community for richer metadata searches [1, 2]. Schemas are defined for common file types: for
example, an audio file might be defined to have properties such as artist, title, bit rate, album,
etc. The schema defines a structured format for searching MP3 metadata that is sent as a search
string to other Gnutella nodes. Responding clients use the query to search local files annotated
using the schema and return the results using the same structured format. In practice though, all
members of a given community must be able to speak a common language in order to communicate.


Other P2P systems such as FastTrack [3], Opencola [3] or Bitzi [3] propose variations on that
idea, but they are still limited to a number of predefined schemas. The latter two have the
capability to extract metadata information from a file given the file format, but this is only
possible for certain formats.


Clearly, there is a need for shared ontologies if we want to allow search for any type of file. In
the Internet world, XML Schema [3] is the new format of choice to represent ontologies, replacing
Document Type Definition (DTD) [4], the schema language defined in the original XML specification.
XML Schema supports the creation of custom and complex data types for XML tags, which is essential
to describe resources of a composite nature. For richer semantic descriptions, for example to allow
software agents to perform search instead of humans, there can be a need to describe relationships
between resources. RDF [4] and more generally the Semantic Web [5] effort address that need. It is
worth mentioning at this point the Edutella project [6], which uses a P2P network for sharing
metadata described using RDF, as a technology for distributed learning. However, in Edutella
metadata is the resource, not the means to describe one.


A P2P system using a semantic layering approach is not without its drawbacks. First and foremost
is getting users of the system to supply metadata for their resources. The addition of metadata
requires user-friendly tools for authoring RDF and schemas and a simplified approach for users
who are not familiar with XML languages.


What we propose in U-P2P starts, as in Edutella, with the sharing and discovery of metadata,
described this time using XML Schema. Once metadata is discovered, it is used to instantiate a
particular resource, which is in turn shared. The next section gives a high-level description of the
principles behind U-P2P as well as a glimpse of its design.

III - U-P2P Concepts


U-P2P provides four fundamental services: search, create, browse local and view. Each of these
services is provided in the context of a community. We can imagine the existence of a “stamps”
community that trades pictures and descriptions of stamps from around the world. On entering the
stamp community, the U-P2P Search function would offer fields to search for stamps of a given year
and/or from a given country. Similarly, the Create function would prompt you to upload a picture of
the stamp, Browse Local would show the stamp objects that you have already downloaded and View would
display a picture as well as the attributes for one of your downloaded stamps.


In traditional P2P applications this functionality would require downloading a client that knew
about the format of a stamp object and contained customized search, create and view screens for
such an object. In U-P2P, we use the power of metadata to simplify this task. Consider the
following XML schema describing a stamp object:


<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="stamps">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="description" type="xsd:string"/>
        <xsd:element name="picture" type="xsd:anyURI"/>
        <xsd:element name="country" type="xsd:string"/>
        <xsd:element name="year" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>


The above describes the expected elements of a stamp object and their data types. It is possible to
generate a form from this specification using XSL Transformations (XSLT) [7]. Here is an excerpt
from a stylesheet that generates a Search form from the above XML schema:


<xsl:template name="SchemaTemplate" match="*[local-name()='schema']">
  <h3>Search for a Resource</h3>
  <p>Enter keywords in any of the fields below to perform a search.</p>
  <form action="search" method="post">
    <table border="1" cellpadding="5" cellspacing="0">
      <tr><th>Property</th><th>Value</th></tr>
      <xsl:for-each select="descendant::*[local-name()='element' and count(./child::*) = 0]">
        <xsl:call-template name="ElementTemplate"/>
      </xsl:for-each>
    </table>
    <p><input type="hidden" name="up2p:community"><xsl:attribute name="value"><xsl:value-of select="$communityName"/></xsl:attribute></input>
    <input type="submit" value="Search"/></p>
  </form>
</xsl:template>
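As a rough illustration of how such a stylesheet could be applied, the following sketch runs a
much-reduced version of the template over an abridged stamp schema using the JDK's standard
javax.xml.transform API. The class name SearchFormSketch, the method render and the simplified
stylesheet (plain text inputs in place of the ElementTemplate call) are assumptions made for this
example; they are not taken from the U-P2P code base.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of U-P2P: turns a schema into a bare search form.
class SearchFormSketch {

    // Abridged copy of the stamp schema (three of the five elements).
    static final String STAMP_SCHEMA =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"stamps\"><xsd:complexType><xsd:sequence>"
      + "<xsd:element name=\"name\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"country\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"year\" type=\"xsd:string\"/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Simplified stylesheet: one text input per leaf element declaration.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"*[local-name()='schema']\">"
      + "<form action=\"search\" method=\"post\">"
      + "<xsl:for-each select=\"descendant::*[local-name()='element'"
      + " and count(./child::*) = 0]\">"
      + "<input type=\"text\" name=\"{@name}\"/>"
      + "</xsl:for-each>"
      + "</form>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given schema and returns the generated markup.
    static String render(String schemaXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter form = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(schemaXml)),
                                  new StreamResult(form));
            return form.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(STAMP_SCHEMA));
    }
}
```

Only leaf declarations (name, country, year) become form fields; the enclosing stamps element is
skipped because it has a child, which is the effect of the count(./child::*) = 0 test in the
excerpt.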


Similarly, stylesheets can be produced that render Create forms or display a stamp object. With
nothing more than a few XML documents (a schema and stylesheets for creating, searching and
displaying), it is possible to define a whole new P2P file-sharing application! In fact, U-P2P
provides default stylesheets for handling display and forms for resources made up of common types,
making the transformation processes transparent to the novice user. Figure 1 shows the
relationships between the U-P2P functions, and how they are accessed through XSL transformations.





















So how can a stamp community be found or shared in the first place? The idea in U-P2P is to see
a file-sharing community as just another type of resource. This is analogous to the idea of a class
in object-oriented programming, which specifies the structure of objects. In pure object-oriented
languages such as Smalltalk, a class is merely another type of object whose structure is specified
by a metaclass. Traversing the analogy in the opposite direction, a specific U-P2P community can
be seen as a class instantiated by a more general metaclass: a Community-sharing community
(in short: a community community) shares Community objects.



    an_object is an instance of a_class, which is an instance of metaclass
    mp3 belongs to mp3 community, which belongs to community community


Like a class, a specific community (e.g. a stamp community or an MP3 community) is just another
object. In U-P2P the problem of discovering the existence of a community is thus reduced to the
problem of finding an object. This provides a standard way to discover the existence of
resource-sharing communities.
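The analogy can be made concrete with a small object model. The sketch below is only an
illustration: the class names Resource and Community, the in-memory member list and the
substring-based search are all hypothetical, and letting the root community belong to itself is an
assumption of this example rather than something the design prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "a community is just another resource"; not U-P2P code.
class CommunitySketch {

    /** Any shared object; it belongs to exactly one community. */
    static class Resource {
        final String name;
        private final Community community; // null only for the root community

        Resource(String name, Community community) {
            this.name = name;
            this.community = community;
            if (community != null) {
                community.members.add(this); // share the resource in its community
            }
        }

        Community community() {
            return community;
        }
    }

    /** A community is itself a resource, shared in the community community. */
    static class Community extends Resource {
        final List<Resource> members = new ArrayList<>();

        Community(String name, Community parent) {
            super(name, parent);
        }

        // Assumption of this sketch: the root community belongs to itself.
        @Override
        Community community() {
            Community parent = super.community();
            return parent != null ? parent : this;
        }

        /** The same operation finds a file in a community or a community in the root. */
        List<Resource> search(String keyword) {
            List<Resource> hits = new ArrayList<>();
            for (Resource member : members) {
                if (member.name.contains(keyword)) {
                    hits.add(member);
                }
            }
            return hits;
        }
    }

    /** Creates the "community community" that shares Community objects. */
    static Community newRootCommunity() {
        return new Community("community", null);
    }
}
```

With this model, root.search("stamp") and stamps.search("bluenose") go through the same code path,
which is the point of treating community discovery as ordinary resource discovery.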


To facilitate this, U-P2P comes packaged with one “bootstrap” schema that can be used to search
for and, more importantly, create communities:


<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="community">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="displayLocation" minOccurs="0" type="xsd:anyURI"/>
        <xsd:element name="searchLocation" minOccurs="0" type="xsd:anyURI"/>
        <xsd:element name="createLocation" minOccurs="0" type="xsd:anyURI"/>
        <xsd:element name="schemaLocation" type="xsd:anyURI"/>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="category" minOccurs="0" type="xsd:string"/>
        <xsd:element name="keywords" minOccurs="0" type="xsd:string"/>
        <xsd:element name="description" minOccurs="0" type="xsd:string"/>
        <xsd:element name="protocol" minOccurs="0" type="protocolType"/>
      </xsd:all>
      <xsd:attribute name="title" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="protocolType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value=""/>
      <xsd:enumeration value="Generic Central Server"/>
      <xsd:enumeration value="Gnutella"/>
      <xsd:enumeration value="JXTA"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

[Figure 1: Generation of resource-specific displays. Labels recovered from the figure: Resource XML
Schema, XSL, Resource Create form, Resource Search form, Resource View, Resource (instantiates).]


As can be seen, a community can have many attributes: keywords, deployment protocol, security,
and so on. This means that a community can be created by choosing specific values for such
attributes (e.g. keywords: stamps, Canadian; deployment: Napster-style). The search for a
community is made in similar fashion, by filling out a similar form. As of now, however, we only
provide one type of deployment protocol, “Napster-style”, in which the peer that creates the
community also acts as a broker. Other deployment models, such as “Gnutella-style” (no broker
required) and “centralized repository” (a client-server option with no local copies of files) are
currently being developed. This means that searching for a document could either not rely on a
central server at all or, at the other extreme, that full persistence of files could be assured by
central storage. The decision on the type of deployment will be entirely up to the creator of the
given community. Also, we have not yet explored various security or privacy schemes. Such
possibilities are discussed later in this paper, in the section on design. The modular aspect of
our design should allow the open source development community to provide support for many more of
these attributes.
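To make the bootstrap schema more tangible, the sketch below validates a hypothetical stamp
community document against an abridged copy of it, using the JDK's javax.xml.validation API. The
class name, the method isValidCommunity and the sample document are invented for this example, and
the use of “Generic Central Server” for a Napster-style deployment is an assumption based on the
enumeration values in the schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of U-P2P: checks a community document against the schema.
class CommunityValidationSketch {

    // Abridged bootstrap schema: only four of the nine child elements are kept.
    static final String BOOTSTRAP_SCHEMA =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"community\"><xsd:complexType><xsd:all>"
      + "<xsd:element name=\"schemaLocation\" type=\"xsd:anyURI\"/>"
      + "<xsd:element name=\"name\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"keywords\" minOccurs=\"0\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"protocol\" minOccurs=\"0\" type=\"protocolType\"/>"
      + "</xsd:all>"
      + "<xsd:attribute name=\"title\" type=\"xsd:string\"/>"
      + "</xsd:complexType></xsd:element>"
      + "<xsd:simpleType name=\"protocolType\"><xsd:restriction base=\"xsd:string\">"
      + "<xsd:enumeration value=\"\"/>"
      + "<xsd:enumeration value=\"Generic Central Server\"/>"
      + "<xsd:enumeration value=\"Gnutella\"/>"
      + "<xsd:enumeration value=\"JXTA\"/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "</xsd:schema>";

    // Invented example of a community document describing a stamp community.
    static final String STAMP_COMMUNITY =
        "<community title=\"Stamps\">"
      + "<name>stamps</name>"
      + "<schemaLocation>stamps.xsd</schemaLocation>"
      + "<keywords>stamps, Canadian</keywords>"
      + "<protocol>Generic Central Server</protocol>"
      + "</community>";

    // Returns true if the document conforms to the abridged bootstrap schema.
    static boolean isValidCommunity(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(BOOTSTRAP_SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document does not conform to the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidCommunity(STAMP_COMMUNITY));
    }
}
```

Because the children sit inside xsd:all, they may appear in any order; a document that omits
schemaLocation or names a protocol outside the enumeration is rejected.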


The schema combined with default stylesheets as described above allows U-P2P to become an
engine for creating and searching for communities which trade all sorts of different types of
files. A stamp collector using U-P2P for the first time will go to the community search form and
type “stamp” into the keyword field. Upon finding a community of collectors, he might download the
community, including the community’s schema and stylesheets. U-P2P would then offer the choice of
entering the community. Once in the community, he can perform all the actions one would expect of a
file-sharing application devoted specifically to sharing stamps.


Figure 2 shows respective snapshots of a stamp view, a stamp search form and finally the root
community view, the highest level of abstraction in U-P2P.







IV - U-P2P Architecture and Design


Like other file-sharing services, U-P2P consists of a client and a file server running on a user’s
computer. The client part of U-P2P is implemented using JavaServer Pages [8] running on a local
web server. The prototype uses Jakarta Tomcat [8], but any server capable of serving JSPs may
be used. The user connects to the U-P2P network by pointing their browser at the address of the
local web server, typically localhost:8080. The web server serves the GUI to the user, dispatches
search and create requests to the other peers, and both serves files to and downloads files from
remote U-P2P nodes. In the current prototype, U-P2P follows the Napster model: there is also a
central server that acts as a database for information about all shared objects in the system. As
with Napster, the information about the location of files is stored centrally but file transfers
are conducted between peers. We are considering other possible peer configurations, such as the
Gnutella distributed model and a hybrid one like FastTrack.


U-P2P is designed with three major components: the WebAdapter, the PeerNetworkAdapter, and the
Repository. These components form the core of U-P2P and provide all the services needed to
share and discover resources on a peer-to-peer network.

Figure 2: The “stamps” community




WebAdapter: Glues the components together and provides a single point of access for the user
interface. If the user interface were not a web browser, this component would be replaced with a
suitable adapter.


Repository: Stores all shared XML resources in a persistent XML database (using the XML:DB
Database API [8]) and provides local search capabilities to either the WebAdapter or the
PeerNetworkAdapter. Note that this allows for distributed P2P topologies where each node must
be able to execute searches against its own set of shared resources.


PeerNetworkAdapter: Provides an interface to the underlying peer-to-peer network and is
responsible for servicing search requests, publishing resources and downloading resources from
the network.
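The division of labour between these components, combined with the Napster-style deployment of the
prototype, might look roughly like the following sketch. Every method signature here is
hypothetical and greatly simplified (plain strings instead of XML documents, substring matching
instead of a real query), the WebAdapter is left out, and the central server is reduced to an
in-memory index that records a single owner per resource.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of the component roles; not the real U-P2P interfaces.
class ArchitectureSketch {

    /** Local store of shared XML resources. */
    interface Repository {
        void store(String resourceId, String xml);
        List<String> search(String keyword); // returns matching resource IDs
    }

    /** Bridge to the underlying peer-to-peer network. */
    interface PeerNetworkAdapter {
        void publish(String resourceId, String xml);
        List<String> search(String keyword); // returns "peer/resourceId" locations
    }

    /** Keeps resources in a map; a stand-in for the XML database. */
    static class InMemoryRepository implements Repository {
        private final Map<String, String> resources = new LinkedHashMap<>();

        public void store(String resourceId, String xml) {
            resources.put(resourceId, xml);
        }

        public List<String> search(String keyword) {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String> entry : resources.entrySet()) {
                if (entry.getValue().contains(keyword)) {
                    ids.add(entry.getKey());
                }
            }
            return ids;
        }
    }

    /** Stand-in for the central server: records which peer shares which resource. */
    static class CentralIndex {
        private final Repository metadata = new InMemoryRepository();
        private final Map<String, String> ownerById = new LinkedHashMap<>();

        void register(String peer, String resourceId, String xml) {
            metadata.store(resourceId, xml);
            ownerById.put(resourceId, peer);
        }

        /** Returns locations only; the file itself stays on the owning peer. */
        List<String> locate(String keyword) {
            List<String> locations = new ArrayList<>();
            for (String id : metadata.search(keyword)) {
                locations.add(ownerById.get(id) + "/" + id);
            }
            return locations;
        }
    }

    /** Napster-style adapter: publishes to and searches through the central index. */
    static class CentralServerAdapter implements PeerNetworkAdapter {
        private final String peer;
        private final CentralIndex index;
        private final Repository local = new InMemoryRepository();

        CentralServerAdapter(String peer, CentralIndex index) {
            this.peer = peer;
            this.index = index;
        }

        public void publish(String resourceId, String xml) {
            local.store(resourceId, xml);          // keep the resource on this peer
            index.register(peer, resourceId, xml); // advertise it centrally
        }

        public List<String> search(String keyword) {
            return index.locate(keyword);
        }
    }
}
```

A fully distributed deployment would replace CentralServerAdapter with an adapter that forwards the
query to neighbouring peers, each of which answers from its own Repository.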


The above components are modeled as Java interfaces, with DefaultWebAdapter, DefaultRepository and
GenericPeerAdapter as their respective implementations. Additional classes include:


FileMapper: Maps resource IDs to real files on the local file system. When a file is ‘uploaded’
it is assigned an ID and mapped without modifying the file. The FileMapper is persisted only on
shutdown of the U-P2P client and file mappings are restored on startup.


FileMapEntry: Holds a reference to a resource file and all its attachments, pulled from the
resource in the upload process. Attachment names must be unique within a resource. Resource
IDs are generated from the content of the XML file using an MD5 hashing function. When
hashing, a special ResourceProcessor is used that omits any attachment links within the XML.
This allows the hash to stay consistent when the links are changed by another peer upon
download of the resource.


BasePeerNetworkAdapter: A skeleton class that holds a reference to a DefaultRepository and
implements the accessor for the repository. The GenericPeerAdapter is the generic P2P
implementation included with U-P2P, which follows a Napster-type model. Any developer
wishing to provide an alternative peer-to-peer deployment, such as a fully distributed one, or one
that would plug into an existing network, would have to provide a different adapter.

[Figure 3: U-P2P architecture. Labels recovered from the figure: Web Browser, Jakarta Tomcat,
Java Servlets & JSPs, WebAdapter, Repository, XMLdb, PeerNetworkAdapter, P2P Network.]


DatabaseAdapter: Handles the details needed to get an XML database up and running and to
configure the port that it runs on. The current implementation uses eXist 0.8, an open source XML
database [9]. If a switch were made to a different XMLdb implementation, this class would be
sub-classed or replaced.


The class diagram in Figure 4 illustrates the relationships between these classes.







The web-based user interface requires that dynamic pages be served up to the user for such
activities as Search, Create and View. The general flow of events in one of these activities
involves the user accessing a JSP, the JSP submitting a form to a Servlet and the Servlet
talking to the WebAdapter and then returning through a JSP.


Figure 5 shows the pages and Servlets involved in each activity as well as the two extra Servlets
needed for diagnostic reasons and for servicing download requests.


Figure 4: U-P2P core classes



Security in U-P2P: In peer-to-peer and other distributed systems, there are concerns about data
integrity, authentication and authorization that are common to other network communication
systems. In U-P2P we are concerned with a layering of metadata that is used on top of an
existing network that may or may not be secure. For this reason, the bulk of security measures are
left up to the network adapter used to communicate with the underlying network. As each peer
network has its own requirements for security, it would not be suitable for U-P2P to impose a
minimum level of security for all networks using the U-P2P layering, as this would restrict the
ability to join public, non-secure networks such as Gnutella or Freenet. Instead, it is up to the
founder of a community to decide which network adapter to use, and the level of security provided
by the adapter will then be used in all network communication.


It is reasonable to assume that a set of standard adapters could be made available alongside a
secured version of each adapter. The currently available centralized peer-to-peer adapter could,
for example, communicate through Secure Sockets Layer (SSL) [ref] channels and wrap all XML
resources with an XML Signature [ref], thus assuring that the content of the resources has not
been tampered with while in transit or when shared by another user. The current Tomcat 4
platform used by U-P2P has full provisions for SSL communication, but the current release of
U-P2P is not secured. U-P2P also uses the Java Servlet standard, which provides role-based security
suitable for deployment and integration with existing infrastructure.

[Figure 5: U-P2P Servlets. Labels recovered from the figure: create.jsp, view.jsp, download.jsp,
displayResults.jsp, search.jsp; CreateServlet, UploadServlet, SearchServlet, DownloadServlet;
WebAdapter; Create, View, Search; External Download Requests; DatabaseViewer (Diagnostic Tool).]



It should be noted that the current implementation of U-P2P uses MD5 sums to generate a unique
ID when a resource is first uploaded to the network. This ID is not intended for security purposes,
but as a simple means to check if multiple users are sharing the same resource. In a future
release, the core of U-P2P could make use of the MD5 sum and XML Signatures for integrity and
authentication of shared resources, with the security of communications remaining in the network
adapter.
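The ID generation described for FileMapEntry (an MD5 hash of the XML content with attachment links
left out) can be approximated as follows. This is a minimal sketch under stated assumptions, not
the ResourceProcessor used by U-P2P: it assumes the caller knows which element names carry
attachment links, blanks their content with the standard DOM API, and hashes the re-serialized
document with java.security.MessageDigest. The class and method names are invented for this
example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of U-P2P: derives a resource ID from XML content.
class ResourceIdSketch {

    /**
     * Returns a 32-character hexadecimal MD5 digest of the XML, ignoring the text
     * of the elements named in attachmentElements (assumed to hold attachment links).
     */
    static String resourceId(String xml, String... attachmentElements) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Blank every attachment link so that a rewritten link does not change the ID.
            for (String tag : attachmentElements) {
                NodeList nodes = doc.getElementsByTagName(tag);
                for (int i = 0; i < nodes.getLength(); i++) {
                    nodes.item(i).setTextContent("");
                }
            }

            // Serialize the stripped document back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter stripped = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(stripped));

            // Hash the stripped text and format the digest as zero-padded hex.
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(stripped.toString().getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (Exception e) {
            throw new IllegalStateException("could not compute resource ID", e);
        }
    }
}
```

Two copies of the same stamp whose picture links point to different peers receive the same ID,
while a change to any other element yields a different one. As the text notes, such an ID only
detects identical resources; MD5 should not be relied on for security.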


V - Case Study: Design Patterns


The Carleton Pattern Repository [10] was started in 1999. It serves as a repository for software
design patterns and provides extensive search capabilities over an as yet small list of patterns.
The patterns are represented in XML using a DTD designed especially for the repository project.


The repository website contains papers on representing design patterns in XML, searching over
design patterns and even a small mention of a distributed model for the repository [11]. The
distributed model proposed was for each author group to have a repository server with a fixed list
of the other servers in the network. The servers would presumably form a highly distributed mesh
and send out their searches to all other servers. This model was not implemented and, evidently,
no one else has pursued the idea.


Using the DTD as a basis, we have developed an XML Schema for representing design patterns
[12]. This is used as the basis of a file-sharing community for design patterns. In addition to the
schema, a custom stylesheet was required to render this complex object since the default
stylesheet is tailored to simpler formats. Another design problem is deciding which parts of
the design pattern should be indexed. The community designer can also control this by
implementing a stylesheet to filter indexable attributes from the XML object before submitting
them to the local or remote database.


To our knowledge, prior to our work there has been no way to share design patterns in a
peer-to-peer fashion that incorporates metadata search. When fully implemented, this U-P2P based
system will expand the benefits of peer-to-peer file-sharing to this area. Such a system would
allow computer scientists and students to publish a rich collection of patterns into an underlying
peer-to-peer network, search them using rich queries and replicate popular patterns to increase
their accessibility. The community-discovery aspect could also be used to access sub-communities
devoted to different classes of design patterns or based on different underlying networks.


VI - Conclusion and Future Work


U-P2P is a peer-to-peer framework that allows a user to describe, share and discover communities
just like any other resource. Once a community is found, its schema and associated stylesheets are
downloaded and are used to perform search and publishing of resources specific to that
community. Communities play here a generative role similar to metaclasses in object-oriented
languages. Possible applications of U-P2P range from sharing resources such as resumes, to
knowledge management in a corporate setting, to distributed repositories for design patterns and
software components.


A major direction for future work is in demonstrating the protocol independence of U-P2P. By
developing PeerNetworkAdapters to interface to existing networks such as Freenet or Gnutella,
U-P2P could become a metadata layer that would provide an enhanced community-based search
capability.


U-P2P is an open source application licensed under the GPL, and makes use of other open source
products such as Jakarta Tomcat [8] and eXist [9]. Complete source code and documentation as
well as guides and presentations are accessible at http://u-p2p.sourceforge.net.



References


[1] Napster, http://www.napster.com

[1] Gnutella, http://www.gnutella.net

[1] Thadani, Sumeet. "Meta Information Searches on the Gnutella Network",
http://www.limewire.com/index.jsp/metainfo_searches, August 2001.

[2] Thadani, Sumeet. "Meta Data searches on the Gnutella Network (addendum)",
http://www.limewire.com/developer/MetaProposal2.htm, July 2001.

[3] FastTrack, http://www.fasttrack.nu

[3] OpenCola project, http://www.opencola.com

[3] Bitzi, http://bitzi.com

[3] XML Schema Part 0: Primer, http://www.w3.org/TR/xmlschema-0/

[4] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000,
Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler.

[4] Resource Description Framework, http://www.w3.org/RDF/

[5] The Semantic Web, Scientific American, May 2001, Tim Berners-Lee, James Hendler and
Ora Lassila.

[6] Nejdl, Wolfgang et al. "EDUTELLA: A P2P Networking Infrastructure Based on RDF",
http://edutella.jxta.org/reports/edutella-whitepaper.pdf, November 14, 2001.

[7] Extensible Stylesheet Language Transformations (XSLT) Version 1.0, W3C Recommendation,
16 November 1999, James Clark (Editor), http://www.w3.org/TR/xslt

[7] Gong, Li. "JXTA: A Network Programming Environment." IEEE Internet Computing Vol. 5, No. 3,
May/June 2001.

[8] JavaServer Pages, Sun Microsystems, http://java.sun.com/products/jsp/

[8] Jakarta Tomcat, Apache Software Foundation, http://jakarta.apache.org/tomcat

[8] XML:DB API, Working Draft, 20 September 2001, Kimbro Staken (Editor),
http://www.xmldb.org/xapi/xapi-draft.html

[9] eXist 0.8, http://exist.sourceforge.net

[10] The SSL Protocol Version 3.0, Internet-Draft, 18 November 1996,
http://wp.netscape.com/eng/ssl3/draft302.txt

[11] XML-Signature Syntax and Processing, W3C Recommendation, 12 February 2002,
http://www.w3.org/TR/xmldsig-core/

[10] Dwight Deugo, Darrell Ferguson. Carleton Pattern Repository.
http://muffin.nexus.carleton.ca/~darrell/repo/

[11] Darrell Ferguson. “Updates to the Pattern Repository.”
http://muffin.nexus.carleton.ca/~darrell/papers/UpdateReport.pdf

[12] Neal Arthorne. A XML Schema for Design Patterns.
http://chat.carleton.ca/~narthorn/project/patterns/pattern.xsd