Technical and Privacy Challenges for Integrating FOAF into Existing Applications


Joseph Smarr

Plaxo, Inc.

1300 Crittenden Lane, Suite 300

Mountain View, CA 94043 USA

joseph@plaxo.com


Abstract


The appeal of FOAF-enabled applications, and thus to a large extent the success of FOAF
itself, depends on having enough people creating and maintaining FOAF files that the
network achieves critical mass. While initial FOAF adoption by individual users on their
Web sites (or using certain publishing tools) has been encouraging, chances for
widespread adoption will be greatly increased if existing companies and organizations
with large user bases can be persuaded to offer FOAF support for their existing users’
data. In particular, social networking and contact management services represent an
important conquest for the FOAF community because they are building and maintaining
exactly the set of data that FOAF describes (personal information and links to other
people). It is thus vital that we understand what hurdles these services face when
considering the addition of FOAF-enabled features.


In this paper, we examine a number of specific technical and privacy issues facing
organizations that are considering integrating FOAF support into their current
applications. The problems discussed here emerged from the research and engineering
efforts of a startup (Plaxo, Inc.) with FOAF. They represent both practical problems and
ethical dilemmas, all of which will need to be addressed by any organization
considering the use of FOAF. We believe that widespread adoption of FOAF will only be
possible once organizations have a roadmap that is both technically clear and
economically and ethically sound. The goal of this paper is thus to articulate the issues,
discuss some possible responses, and elicit community feedback in order to craft real
solutions.


Introduction


The Friend-of-a-Friend (FOAF) project is an instance of a popular and growing trend in
Internet technology development towards open, machine-readable access to data and
computer programs distributed across the world. The general idea is that if it’s easy to get
programs to talk to other programs and share data, without requiring that the engineers of
each program form an explicit partnership, a whole new class of Semantic Web
services can be built that automatically stitch together disparate aspects of life. This in
theory should decrease friction and redundancy in the user experience (e.g., having to
enter the same information twice for two different services) and enable novel extensions
to existing services to be built in a distributed fashion (e.g., visualizing your
contact list as a network).


In the case of FOAF, the innovation is to provide a standard format (based on RDF) for
describing a person’s contact information, as well as his/her relationships to other people.
(FOAF is also used for describing a person’s projects and other material, but for the
purposes of this paper we will focus on contact information and links to other people,
since this is the core data that enables a network of people to be assembled.) In addition
to providing your contact information in the form of a business card or on your Web
page, if you make it available in FOAF, other programs can find and interpret it
automatically. This would make it easy to pop up a quick biographical sketch of a person
anytime you visit a page they made, or to add that person’s contact information to your
address book. In addition, if you list your friends and associates in your FOAF file,
programs can go find these people, and if they too use FOAF, the programs can show you
their contact info and their friends, and so on across an emergent worldwide social network.


Proprietary social networks like Friendster, Orkut, LinkedIn, and countless others have
attracted millions of users to fill out a personal profile and link to their friends, usually
with the objective of dating or business networking. The success of these networks
depends on having enough people join that you can find people you’re interested in: a
small network holds little value because you don’t know or care about any of the people
on it. Thus as new networking services are released, there is pressure to grow them as
quickly as possible. Given the time-consuming and annoying task of entering your
contact information and friend list into yet-another-social-networking-service (YASNS),
people have called upon these sites to offer import and export of member data in a
standard format such as FOAF. In February 2004, Tribe.net announced their intention to
support FOAF, which sparked a flurry of additional calls for similar services to follow
suit.


In theory, offering FOAF support for a social networking service could have several
benefits in addition to easing user acquisition (input). By allowing members to publish
their data in FOAF (export), third parties could build additional features that leverage and
extend the value of the original network. Just as many services now offer open APIs to
support the development of plug-ins, opening the data on social networks would let
people build new and exciting applications that members could immediately take
advantage of.


The author of this paper is a senior software engineer at Plaxo, a company that helps
people keep their address books up-to-date by automatically synchronizing contact
information between friends and associates. Members each maintain their own contact
information and decide with whom they want to share it. If Todd has permission to see
Ryan’s home information, for example, then when Ryan gets a new apartment and
updates his Plaxo cards, Todd’s address book will automatically be updated with Ryan’s
new address. We’re not strictly a social networking service (though we’re often
considered one): like most social networking services, members provide Plaxo with their
own contact information and their address book, but unlike most of these sites, a
member’s address book is kept strictly private (since it contains potentially sensitive
contact information) and thus there is no social browsing from friend to friend.


At Plaxo, we’ve long been interested in the FOAF project, since Plaxo maintains exactly
the data that FOAF describes, and our goal is to create a global and ubiquitous contact
network that is available in all applications requiring contact information. The frustration
mentioned above at having to enter contact information over and over again is precisely
the problem that Plaxo is designed to solve. And while initially we chose to focus on
integrating with Microsoft Outlook and Outlook Express because of their large market
share, we would like people to be able to Plaxo-enable all of their favorite applications.
In support of this vision, we created a SOAP API that partners can use to talk to the Plaxo
network. We also created a number of internal prototypes that use FOAF to respond to
contact information requests (input) and publish members’ contact information and
address books on the Web (output). These prototypes quickly raised a number of
technical issues and privacy concerns for which we couldn’t find easy answers, and
which prevented us from releasing most of this work publicly (though we have released
limited FOAF output for Plaxo members as a hidden feature).



After Tribe’s announcement renewed the clamor of calls for companies to support FOAF,
we decided to ask the FOAF community for advice on how to solve the problems we had
encountered. We published an article on Plaxo’s Blog entitled “Plaxo and FOAF: What’s
the right model?” which started a lively and productive discussion. The goal of this paper
is to elaborate on the issues that we found while trying to answer the rallying cry of the
FOAF supporters and to continue discussing solutions.


In order to get more companies to answer that cry, the FOAF community should work to
develop a practical “FOAF integration guide” that explains how to add FOAF-enabled
features and provides a compelling case for doing so. This involves addressing both
technical challenges (how to make it work) and privacy concerns (how to make it
acceptable). The balance of this paper explores these issues in turn, concluding with a
discussion of their implications for the companies and for FOAF itself.


Technical Challenges


Even if companies understand the potential benefits of supporting FOAF, there are still a
number of technical issues that must be addressed before support becomes practical.
These issues range in scale from logistical details to additional infrastructure requirements.
But confronting them is a necessary part of “scaling up” FOAF to tackle today’s
large-scale applications of social software.


Extensibility

While the data Plaxo stores for each member closely resembles the data stored in a FOAF
file, Plaxo members store more contact fields in their cards than FOAF currently supports
(examples include job title and birthday). Plaxo stores essentially the same set of contact
fields used by Microsoft Outlook and the standard vCard format (it’s worth noting that
Plaxo members can publish their contact information as a vCard that always stays
up-to-date). When importing or exporting FOAF, we try to map the contact fields that have an
equivalent representation (e.g., mbox, phone, and workplaceHomepage), but ideally
members should be able to preserve the full richness of their contact information (and the
contact information in their address book) when moving in and out of FOAF. Of course
the FOAF specification could be amended to store additional contact fields, but the more
general problem is that any service will have some of its own special fields that aren’t
covered in the standard. The solution is either to adopt a lowest-common-denominator
approach in which extra information is simply lost, or to provide a standard mechanism
for extending FOAF in such a way that this information can be preserved and eventually
understood by additional services.


Since FOAF is based on RDF, and commonly represented in XML documents that can
mix tags from multiple namespaces, in principle this should be possible. For example,
there’s a W3C submission on representing vCards in RDF that’s used by eventSherpa
(among others) to augment the FOAF files they generate for members (see, for example,
Paul Cowles’s profile). However, the proposed schema appears not to have been updated
since February 2001, and it has not been adopted as a recommended standard yet. There
is also ContactML, but it too appears to have limited support and no official status.
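
To make the two options concrete, here is a minimal sketch of exporting a contact card as
FOAF in RDF/XML. It is an illustration under stated assumptions, not Plaxo’s exporter: the
card field names, the card_to_foaf function, and the ext namespace URI are hypothetical, and
only the FOAF and RDF namespace URIs and the mbox, phone, and workplaceHomepage properties
come from the existing vocabularies. Fields with a FOAF equivalent are mapped to it; the
rest are either dropped (lowest common denominator) or kept under the extension namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
EXT = "http://example.com/ns/contact-ext#"  # hypothetical extension namespace

for prefix, uri in (("rdf", RDF), ("foaf", FOAF), ("ext", EXT)):
    ET.register_namespace(prefix, uri)

# Hypothetical card field -> (FOAF property, URI scheme prefix or None for literals)
FOAF_FIELDS = {
    "name": ("name", None),
    "email": ("mbox", "mailto:"),
    "work_phone": ("phone", "tel:"),
    "work_url": ("workplaceHomepage", ""),
}


def card_to_foaf(card, keep_extra=True):
    """Serialize a dict of contact fields as a FOAF Person in RDF/XML.

    With keep_extra=False, fields that have no FOAF equivalent are lost;
    with keep_extra=True they are preserved under the extension namespace.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    for field, value in card.items():
        if field in FOAF_FIELDS:
            prop, scheme = FOAF_FIELDS[field]
            element = ET.SubElement(person, f"{{{FOAF}}}{prop}")
            if scheme is None:
                element.text = value  # plain literal, e.g. a name
            else:
                element.set(f"{{{RDF}}}resource", scheme + value)
        elif keep_extra:
            ET.SubElement(person, f"{{{EXT}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling card_to_foaf with a card that contains a job_title field yields an ext:job_title
element alongside the FOAF properties, whereas keep_extra=False silently discards it. A
consumer that does not know the extension namespace can ignore those elements and still
read the FOAF portion.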


To address extensibility, we believe the FOAF community should do two things. First, it
should establish a set of “best practices” and examples of how to extend FOAF when
additional data needs to be represented. If there are clear limits to what data people
should and shouldn’t try to add to a FOAF file, those should be articulated. Second, the
community should maintain a repository for common extensions to FOAF, so that if a
number of organizations want to represent a similar set of additional fields, those fields
can either be added to the FOAF standard itself, or else standard extensions can be
agreed upon so that sites can continue to read each other’s output. To ignore the need for
extensibility and instead focus only on sharing the current information expressible in
FOAF would be a mistake, as it would prevent full integration with existing services and
stifle innovation. However, to go about extending FOAF in a haphazard and disorganized
manner would risk losing the standard format that makes FOAF exciting in the first
place.



Permissions

FOAF files today are static and public: users decide what information to include and
everyone in the world can see that information. In contrast, Plaxo members can set up
detailed permissions about which users can see which parts of their information. Most
members allow people who already know their e-mail address to see their public (business)
information, but restrict access to their private (home) information to a select group of
friends and associates to whom they grant explicit permission. Furthermore, the contents
of a member’s address book are completely hidden from other members, though one
could also imagine letting members opt to share their address book with trusted contacts.
Given the strong emphasis that Plaxo places on protecting its members’ privacy and
giving them complete control over what information is shared and with whom, it is
difficult to release much if any of their data as open FOAF files available to any Web
surfer.


In our current feature that lets people publish their contact information on a Web page
(mentioned above), members check whether to publish their business information or
personal information (or both) and a URL is generated with a special key that will only
display the requested information. This is also how HowdyCard handles privacy: users
can keep several cards, each with its own password/URL to hand out to qualified
recipients. A similar approach could be taken for publishing FOAF files, but it has a
number of drawbacks. First, it results in several different URLs being generated to share
different amounts of information (one for business information, another for personal
information, etc.). This is cumbersome and potentially confusing, both for users (“which
URLs do I give to whom?”) and for computer programs like aggregators that might use
the data (“how do these files relate to one another?”). Second, there is nothing directly
preventing unauthorized users from viewing the protected information; it is up to
members to keep their private URLs private, but this can be difficult when trying to share
them with friends over e-mail and the Web, which are generally insecure and prone to
leaving a trail.
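
The keyed-URL scheme, and both of its drawbacks, can be seen in a small sketch. This is
not Plaxo’s or HowdyCard’s implementation: the KeyedPublisher class, the section names, and
the example.com base URL are assumptions made for illustration.

```python
import secrets


class KeyedPublisher:
    """Hands out one unguessable URL per published view of a member's data."""

    def __init__(self, base_url="https://example.com/foaf/"):  # hypothetical URL
        self.base_url = base_url
        self._views = {}  # secret key -> (member id, sections that key reveals)

    def publish(self, member_id, sections):
        # Each call mints a fresh random key, so every combination of
        # sections a member shares ends up with its own URL.
        key = secrets.token_urlsafe(16)
        self._views[key] = (member_id, frozenset(sections))
        return self.base_url + key

    def resolve(self, url):
        # No check of who is asking: possession of the URL is the only
        # credential, so a forwarded or leaked URL grants the same access.
        key = url.rsplit("/", 1)[-1]
        return self._views.get(key)
```

Publishing a business-only view and a business-plus-personal view for the same member
returns two unrelated URLs, which is the first drawback; resolve cannot tell an intended
recipient from anyone else holding the URL, which is the second.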


One solution that has been proposed by useful inc. is to encrypt sensitive information
using PGP and to store it in a separate file that is linked to the main FOAF file using the
seeAlso relation and the wot vocabulary. The idea is that anyone can see the public
information, but only people with the right private keys can decrypt the additional
sensitive information, and there is only one URL for everyone (since the extra
information is linked to from within the main document). This is a clever technique, but it
requires that everyone who wants access to the private information get a public/private
key pair and publish the public key in advance. Furthermore, the sensitive information
needs to be encrypted using all of the public keys of would-be recipients. This means that
if you want to give a new person access to your personal information, you need to get
their public key and re-encrypt your sensitive file with this additional key. An alternative
might be to use a single private key as a “password” and to give this key to trusted
contacts, but while this simplifies publishing and permissioning, it ends up being more
like the original scheme of distributing secret URLs (which are essentially passwords)
and relying on security through obscurity.


Like extensibility, adding support for permissions and restricted viewing is essential for
breaking FOAF out of a one-size-fits-all model and allowing more complex services to
embrace FOAF without compromising data richness or privacy. We don’t have a clear
solution at this time, and community feedback will be essential in deciding how to
leverage existing technologies and how much responsibility should be placed on FOAF
itself.


Authentication

The main difficulty in adding a permission model to FOAF is that there’s no
authentication of the person viewing a given FOAF file. Since FOAF files are static Web
pages, they’re available to any person (or computer program) that happens upon them,
and there’s no way to look up who’s requesting to see a member’s FOAF file and what
permissions they’ve been granted. Within Plaxo, all our members are authenticated by
their e-mail addresses (which are verified with a round-trip) and they each have a
password, so when one member wants to look at another member’s contact information,
Plaxo knows exactly what information to show. Thus Plaxo internally supports restricted
access to information without requiring extra URLs, private keys, or other shared
passwords. If Plaxo were to let members publish their contact information using FOAF,
we could provide a single URL for each member and require that other members log in
before getting access to the FOAF file. Once a viewer had logged in, we could dynamically
generate the FOAF file with the information the viewer was authorized to see.
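
A rough sketch of that per-viewer selection follows. It is a simplified model of the rules
described in this paper rather than Plaxo’s actual code: the Member class, its field names,
and the visible_fields function are hypothetical, and the sketch assumes the viewer has
already been authenticated (how anonymous viewers are treated is left out).

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    email: str
    business: dict  # "public" (business) contact fields
    personal: dict  # "private" (home) contact fields
    # E-mail addresses of members explicitly granted access to personal fields
    personal_grants: set = field(default_factory=set)


def visible_fields(owner, viewer_email, viewer_address_book):
    """Return the fields of owner that an authenticated viewer may see.

    Business fields are shown to viewers who already know the owner's
    e-mail address; personal fields additionally require an explicit grant.
    """
    visible = {}
    if owner.email in viewer_address_book:
        visible.update(owner.business)
    if viewer_email in owner.personal_grants:
        visible.update(owner.personal)
    return visible
```

The returned dictionary would then be rendered as a FOAF file for that one request, so a
single URL per member can serve different content to different viewers.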


This is fine for Plaxo members viewing each other’s contact information, but the premise
of FOAF is that it’s an open and distributed standard in which no one company or
organization is in control. If a non-Plaxo member wants to view a member’s FOAF file,
Plaxo has no choice but to present only public information. What’s needed is an
authentication scheme that’s as distributed as FOAF itself: one in which users can grant
each other permissions in a standard format, and people who want access to private
information first authenticate using a standard mechanism. The complex and well-studied
technical details of distributed authentication schemes are outside the scope of
this paper, but see for instance the Liberty Alliance project or Drupal. Whatever the
details, some form of distributed authentication is required to properly enable
permissions on top of FOAF files.


One interesting approach to distributed permissions for contact information is offered by
clink systems. Each user is identified by a “contact link” (or “clink” for short), which is
essentially a URL like joseph.plaxo.com. Rather than being granted by a central
authority, anyone can make a clink for themselves using their own domain. Users grant
each other permissions by attaching each other’s clinks to their contact info. So to reuse
the example above, Ryan can give Todd permission to his contact info by attaching
Todd’s clink to it. When Todd requests Ryan’s information, his clink is there so he is
granted access (assuming that both Todd and Ryan are using clink-enabled servers). Todd
still has to log in to a clink server to access any data that’s been granted to him
(authentication is handled via public-key encryption) so he can distribute his clink
without compromising his own privacy. This is essentially a simplified and customized
form of PKI authentication, but it still requires that everyone generate and distribute
clinks (that don’t change) and maintain private passwords. It also requires that everyone
hosting sensitive data run a clink server to handle authentication. It is thus somewhere in
between a data standard like FOAF and a web service like an API.



Privacy Challenges


While the technical logistics of extending FOAF and adding authenticated permission
controls are certainly important and need to be worked out in more detail, the more
fundamental issue holding back widespread adoption of FOAF is privacy. As described
above, FOAF files are inherently public and essentially make a person’s contact
information and address book accessible to everyone. Many current social networking and
related services have found that their customers regard some of this information as
extremely sensitive, and thus provide members with a great deal of control over what gets
shared with whom. For example, Orkut lets members restrict most of their information to
just their list of friends, or additionally all friends of friends (as opposed to making it
available to everyone). Different fields can be marked with different levels of permission,
so members can decide individually what information is particularly sensitive. Similarly,
Plaxo lets members decide separately who has access to their work information and their
home information, and a member’s address book (both who’s in it and their respective
contact information) is kept strictly private. These privacy safeguards are essential for
users to trust and feel comfortable using these services. While the previous section
discussed technical challenges in implementing a permission system on top of FOAF, this
section discusses what’s at stake with sharing data in the first place.


Deciding how much to share

Currently Plaxo, and organizations with similar privacy standards, would be unable to
publish much of their members’ data as FOAF (particularly the contents of members’
address books) without violating their own privacy policies. FOAF files without any
foaf:knows links are of limited value since they are isolated nodes that cannot be
connected to the larger network. Even if we only published biographical data in FOAF, it
would still be difficult to automatically release much information, since even what
members designate “public” is currently understood to mean “available to other
authenticated members of Plaxo that already know my e-mail address”. This is a much
stricter standard than “anyone with a Web browser”. Each service has its own privacy
policy, but few if any would allow member data to be released publicly without explicit
user consent. Thus FOAF support will likely need to be an opt-in feature, in which little
or no information is made available by default, but members can elect to share more
information if they see it as valuable.


While requiring users to opt in to making their information available as FOAF is
necessary to ensure privacy, it introduces a number of additional challenges and
drawbacks. First, it will realistically mean that only a small fraction of the network will
end up with FOAF files, since opting in requires learning about the feature,
understanding the benefits, and going through an activation process. At Plaxo, we have
found that most users tend to keep the default settings we provide, and relatively few
power-users spend a lot of time exploring the other options that are available. Given
that the impetus for getting major social networking sites to support FOAF is to
dramatically increase the number of available FOAF files, the challenge is to protect
privacy without destroying the value that FOAF adoption was supposed to create in the
first place.


Even if members are told that they can publish their information as FOAF, it may be
difficult to explain the immediate benefit of doing so. Currently there are very few
compelling applications that take advantage of FOAF, largely because there are very few
FOAF users driving such applications to be built. This is a classic chicken-and-the-egg
problem, in which the draw to get users to embrace FOAF depends in a sense on their
having already embraced it. Furthermore, every feature that is added to a service is a risk:
it complicates the interface, dilutes the existing feature set, and requires explanation and
support. While awareness of FOAF has spread quickly in its short life, in a mass
consumer service it is a safe assumption that the vast majority of users will have never
heard of it. Thus it will be asking a lot of companies to potentially compromise their
members’ privacy (even if it just means asking their users to opt in to such a service)
given that the take-rate and immediate benefits are both likely to be low.


Data ownership vs. privacy

The conclusion from above is that in order to protect privacy, users will generally be
required to opt in to sharing their contact information and address books using FOAF.
However, even getting the consent of the address book owner is not enough to remove
privacy from the equation. If members publish the names and contact information of
people in their address books, they are also potentially compromising the privacy of those
contacts, who were never given the opportunity to object. There is a subtle but extremely
important tension between the rights of the data owner (in this case the owner of an
address book) and the rights of the person whom the data describes. For example, if Todd
wants to publish Ryan’s contact information on the Web using FOAF, Ryan may object
on the grounds that his privacy is being compromised. While his objection is
understandable, should Ryan be able to tell Todd what he can and can’t do with his own
address book? Both parties are somewhat entitled, and clearly it can’t be both ways: the
decision must ultimately fall to either the owner of the address book or to the person
contained therein.


According to US law, the information in a person’s address book is the property of the
owner of the address book and not the property of the individuals to whom the
information pertains. In the case of Plaxo, no one can demand that their information be
removed from someone else’s address book: once it’s there, it’s the property of the
owner. To make a physical analogy, once you give someone your business card, they can
do with it what they want, and you can’t demand it back. It seems ludicrous to suggest
that you should be able to break into someone’s house and steal back your business card,
but in the digital world it becomes less clear, and some have called for an alternative
scheme in which the person described by a piece of data is always in control of it.



At Plaxo, we sometimes get requests from a non-member that we remove all copies of
his/her information from our members’ address books on the grounds that he/she never
consented to our storing of it in the first place. Our standard response is to remind the
requestor of US data ownership law, but as a courtesy we contact the members and
request that they remove his/her data anyway. Of course in practice our members are
always happy to accommodate such requests, so the conflict between ownership and
privacy has remained fairly benign.


This delicate balance is upset if ownership of data now potentially includes publishing
that data on the Web. When using Plaxo, members’ address books are only available to
themselves and to Plaxo (so we can synchronize between multiple computers and provide
secure online access). Thus the major objection of people asking for removal of their
information is either ideological or out of mistrust for Plaxo as a company. It is not, in
other words, a reaction to any harm currently being done, but only an attempt to
prevent future harm. If members are allowed to share their address books as FOAF,
suddenly non-members may find their personal contact information being published on
the Web, which can be demonstrably harmful. While it wouldn’t technically break any
laws for Plaxo to let members publish their address books on the Web, it would likely
exacerbate a sensitive privacy concern, and thus we are wary of doing so.


The challenge for the FOAF community is to articulate a middle path by which people
can create and publish FOAF files, including the list of people they know, while
respecting the privacy of those people whose information is released in the process. For
example, members of Ecademy can publish their information and contact list in FOAF,
but the only information that gets included about other members is their name and e-mail
address: enough to establish a link, but not enough to give away sensitive contact
information. In fact, even the e-mail address is obscured using a one-way hash (SHA1).
This makes it possible to look up a given e-mail address in a member’s FOAF contact list
(by comparing the hashed addresses) but the original e-mail address can’t be recovered
by a stranger and used for spamming. The advantage of this approach is that each
member decides how much information about themselves to share, and the foaf:knows
links are only used to “wire up” the network. The disadvantage is that members can’t
fully export their address books in FOAF: they lose all the extra contact fields in the
process.
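
The hashed-address approach can be sketched in a few lines. FOAF’s mbox_sha1sum property
is defined as the SHA1 hash of the mailbox URI, including the “mailto:” prefix; everything
else here (the function names and the dictionary layout of an address book entry) is
assumed for illustration and is not taken from Ecademy’s implementation.

```python
import hashlib


def mbox_sha1sum(email):
    """SHA1 hex digest of the mailto: URI, as used by foaf:mbox_sha1sum."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def knows_entries(address_book):
    """Reduce each address book entry to a name plus a hashed e-mail address.

    Every other contact field is deliberately dropped, which is both the
    privacy benefit and the export limitation of this approach.
    """
    return [
        {"name": contact["name"], "mbox_sha1sum": mbox_sha1sum(contact["email"])}
        for contact in address_book
    ]


def is_listed(email, entries):
    """Check whether an already-known e-mail address appears in the list."""
    target = mbox_sha1sum(email)
    return any(entry["mbox_sha1sum"] == target for entry in entries)
```

Someone who already knows an address can confirm the link with is_listed, while a reader
of the published entries sees only 40-character digests and names.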


E-mail vs. SHA1

The sensitivity of publishing e-mail addresses is particularly high given the current
prevalence of spam. An early innovation of FOAF was to offer SHA1-hashed e-mail
addresses as an alternative to displaying them raw. As mentioned above, this is sufficient
for verifying a link between two known e-mail addresses, but it prevents unknown
addresses from being leaked. However, clearly e-mail addresses need to be shared at
some point if people want to communicate with one another. Thus a sub-challenge in
sorting out when and how much information to display is developing a set of “best
practices” for when to display raw addresses and when to hash them.


In the case of Plaxo, links between members are established by e-mail address (if I have
your e-mail address in my address book and you join Plaxo and register that address, I
can get your public information) so in all cases where data is being shared, the e-mail
address is already known. Thus we have not needed to hash or otherwise encrypt
addresses, and in fact having raw addresses is critical to establishing connections. The
compromise of Ecademy is to display raw addresses for users’ own contact information
(if they so desire), but to always use SHA1 for entries in their contact lists. This
basically avoids the issue of ownership vs. privacy by side-stepping it (though someone
could still complain that even their name was being published). But it follows the
philosophy that a FOAF document is primarily about the author, and while the
foaf:knows links can be used to establish relationships with other people, they are not
intended to store their contact information.


If everyone has their own FOAF file, of course, there is no need to store information
about other people because they are storing it themselves, but until this is the case,
restricting the use of foaf:knows means deliberately dropping data that would otherwise
be available. Many social networking sites require you to import your address book and
store several contact fields for each entry. If FOAF is supposed to make it easy to
maintain your data across several such sites, such a restriction may run counter to its
original intent.


Conclusions


Widespread adoption of FOAF would be greatly aided if existing applications with large
user bases could be convinced to let their users publish their contact information and
address books using FOAF. But merely shouting to these companies that they should
embrace open standards or die is unlikely to be sufficient because there are real technical
and privacy issues involved in such a release of information. Thus the best course of
action for the FOAF community is to carefully consider these problems and develop a set
of technical solutions and privacy-conscious best practices that they can clearly articulate.
These include supporting extensibility, adding permissions with authentication, deciding
what information should be shared in what circumstances, and respecting the rights and
wishes of both the people that own the data and the people that data describes.


Developing a FOAF integration roadmap will do more than just make it easier for
companies to offer FOAF support. It will also serve to direct the evolution of FOAF
itself. The issues raised above suggest several underlying tensions in the design of FOAF.
Is FOAF just a static data format, or could it evolve to be more like an API, including a
mechanism for granting permissions and authenticating requestors? Should foaf:knows
links be used only to point to other FOAF files, or can they safely be used to richly
represent a user’s address book? How should FOAF trade off the use of raw e-mail
addresses and SHA1 sums? These questions require pragmatic answers for FOAF to
transition from a research project to a mainstream technology. But they also offer new
research challenges that will provide the fuel for a new phase of experimentation.