PPT - Fermilab


Oct 21, 2013


1

Social Networks, the Semantic
Web, and the Future of Online
Scientific Collaboration

Jennifer Golbeck

University of Maryland, College Park

2

Overview


What is the Semantic Web?


How can it help us do science?


About Web-based Social Networks


Combining the Semantic Web, Social
Nets, Science, and Provenance

3

What is the Semantic Web


Extension of the current web


Make information machine processable


Supported at the W3C

4

Current Web to Semantic Web


HTML is designed to make documents
on the web easy to read for humans


Computers have difficulty
“understanding” what is on the web


We do ok with keywords for text


What about videos, pictures, songs, data?

5

Stuff We Want


Find me the mp3 of a song that was on the
Billboard top 10 that uses a cowbell


Show me the URLs of the blogs written by
people my friends know


Get a video where it’s snowing



All of this is hard to do on the web as it stands


6

Making it Easier


On the Semantic Web, data is represented
in a machine readable standard format


Some created automatically, some by humans


Ontologies add semantics


Each datum is uniquely identified by a URI


Distributed data can be aggregated and
integrated into one model

7

Semantic Web Technologies


URIs


Ontologies


Standard Languages


RDF


RDFS



OWL


SPARQL

8

Example: A Video of it
Snowing


On the Semantic Web, people will annotate
their data, but they won’t annotate
everything


If my video is of two government officials
meeting, the weather may be irrelevant to
me


How can the semantic web solve this? Do
people have to annotate everything?

9

Linking Distributed Data

[Diagram: linked nodes labeled Video, Location, Date, President, Prime Minister, Camera Info, NWS Weather Data, Precipitation, Temperature, and More data]

10

Data Aggregation


URIs are unique.


If the same URI is used in two files, it
refers to the same object


Semantic Web tools (e.g. things like
databases that understand the
semantics of the languages) build
models that merge information about
the same URI


Model can be queried, filtered, used
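
The merging idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URIs, the property names (recordedAt, recordedOn, location, date, precipitation), and the helper functions are all invented here, and plain tuples stand in for RDF triples rather than any real Semantic Web toolkit.

```python
# Hypothetical URIs and property names, invented for this illustration.
VIDEO = "http://example.org/video/42"
PLACE = "http://example.org/place/dc"
OBS = "http://example.org/weather/dc/2006-02-12"

# "File" 1: the video owner annotated only place and date, not the weather.
video_file = {
    (VIDEO, "recordedAt", PLACE),
    (VIDEO, "recordedOn", "2006-02-12"),
}

# "File" 2: a weather source describes the same place URI.
weather_file = {
    (OBS, "location", PLACE),
    (OBS, "date", "2006-02-12"),
    (OBS, "precipitation", "snow"),
}


def merge(*files):
    """Union the triples; statements about the same URI land on one node."""
    model = set()
    for triples in files:
        model |= triples
    return model


def objects(model, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the model."""
    return {o for s, p, o in model if s == subject and p == predicate}


def snowy_videos(model):
    """Videos whose place and date match an observation of snow."""
    snowy = {
        (place, day)
        for obs, p, value in model
        if p == "precipitation" and value == "snow"
        for place in objects(model, obs, "location")
        for day in objects(model, obs, "date")
    }
    return {
        s
        for s, p, place in model
        if p == "recordedAt"
        for day in objects(model, s, "recordedOn")
        if (place, day) in snowy
    }
```

Neither file alone says the video shows snow; only the merged model, joined on the shared place URI and date, can answer the question.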

11

Semantic Web for Science

12

Provenance


The history of a file or resource


Files that were used in its creation


Processes executed to create it


When, where it was created


Who created it

13

Why is it important?


People in the scientific and intelligence
communities are very interested in
provenance


Science: provenance of data can be
used to recreate them


Intelligence: provenance of information
is important to determine its reliability

14

Example in Science


We want to track the workflow that led
to a given scientific image:


What were the files used to create it?


What is the provenance of those files?


What process was performed to create
the file?


When was that file created?


Who executed the processes?

15

Case Study: A Semantic Web
Approach to the Provenance
Challenge

16

The Provenance Challenge


Tracking provenance is a growing topic of
interest to computer scientists


Applications to grid computing, file systems,
databases, etc


The challenge is to build a system that will
track the provenance of files produced from a
workflow


Series of procedures performed to produce output


functional Magnetic Resonance Imaging (fMRI) is
the example in the challenge

17


18

Challenge


Represent all data that we consider
relevant about the history of each file


Answer as many queries as possible

19

Queries


Find everything that caused a given Graphic
to be as it is.


Find all invocations of procedure align_warp
using a twelfth order nonlinear 1365
parameter model that ran on a Monday.


Find all images where at least one of the
input files had an entry global
maximum=4095.



A user has annotated some images with a
key-value pair center=UChicago. Find the
outputs of align_warp where the inputs are
annotated with center=UChicago.
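
To make the first two queries concrete, here is a minimal Python sketch over an in-memory provenance log. It is not the approach used in the challenge entry (which the later slides describe as OWL plus SPARQL); the Invocation record, the file names, the reslice step, and the dates are hypothetical and chosen only so the example runs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invocation:
    """One execution of a procedure (a hypothetical record layout)."""
    procedure: str
    inputs: tuple
    output: str
    ran_on: date


# Made-up two-step workflow; 2006-09-11 was a Monday.
RUNS = [
    Invocation("align_warp", ("anatomy1.img", "reference.img"),
               "warp1.warp", date(2006, 9, 11)),
    Invocation("reslice", ("warp1.warp",),
               "resliced1.img", date(2006, 9, 12)),
]


def causes(target, runs):
    """Every file and procedure the target depends on, transitively."""
    found = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for run in runs:
            if run.output == current:
                found.add(run.procedure)
                for name in run.inputs:
                    if name not in found:
                        found.add(name)
                        stack.append(name)
    return found


def ran_on_monday(procedure, runs):
    """Invocations of a procedure executed on a Monday (weekday() == 0)."""
    return [r for r in runs
            if r.procedure == procedure and r.ran_on.weekday() == 0]
```

The first query is a transitive walk backwards through outputs and inputs; the second is a filter on recorded execution metadata.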

20

Semantic Web Approach


Each procedure in the workflow is
encoded as a web service


Workflow is an execution of a series of
web services


Web Services take files as input and
output files to the web

21

Semantic Web Approach


Ontology represents information about
the execution of services and the
dependencies of files


22

Provenance.owl

23

Answering the Queries


SPARQL, a W3C standard, is used to
formulate queries


Reasoning with the semantics of OWL
and some rules

24

Results


We were easily able to answer all nine
queries for the challenge


Semantic Web is an easy and natural
format for representing the provenance
of scientific information



So, with a format for representing data
and metadata, what next?

25

Social Networks: The
Phenomenon

26

What are Web-based Social Networks?


Websites where users set up accounts
and list friends


Users can browse through friend links to
explore the network


Some are just for entertainment, others
have business/religious/political
purposes


E.g. MySpace, Friendster, Orkut,
LinkedIn

27

Growth of Social Nets


The big web phenomenon


About 150 different social networking
websites (that meet the definition that they
can be browsed)


275,000,000 user accounts among the
networks


Number of users has doubled in the last 18
months


Full list at http://trust.mindswap.org


28

Biggest Networks

 1. MySpace               120,000,000
 2. Adult Friend Finder    23,000,000
 3. Friendster             21,000,000
 4. Tickle                 20,000,000
 5. BlackPlanet            17,000,000
 6. Hi5                    14,000,000
 7. LiveJournal*           10,000,000
 8. Orkut                   8,500,000
 9. Facebook                8,000,000
10. Asia Friend Finder      6,000,000

29

Social Networks on the
Semantic Web


FOAF (Friend Of A Friend)


A simple ontology for representing
information about people and who they
know


About 20,000,000 social network
profiles are available in FOAF format


Approximately 60% of all semantic web
data is FOAF data

30

Structure of Social Nets


Small World Networks


AKA Six degrees of separation (or six degrees of
Kevin Bacon)


Popularized by Stanley Milgram's small-world experiment, 1967


Math of Small Worlds


Average shortest path length grows logarithmically
with the size of the network


Short average path length


High clustering coefficient (friends of mine who are
friends with other friends of mine)
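
Both small-world measures can be computed directly on a toy graph. The sketch below assumes an undirected friendship network stored as an adjacency map; the names and the graph itself are made up for illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected friendship graph as an adjacency map.
GRAPH = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "eve"},
    "eve": {"dan"},
}


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs."""
    total = 0
    pairs = 0
    for start in graph:
        # Breadth-first search gives shortest hop counts from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's friends who are friends with each other."""
    friends = graph[node]
    if len(friends) < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(friends, 2) if b in graph[a])
    possible = len(friends) * (len(friends) - 1) / 2
    return linked / possible
```

On this toy graph ann has three friends but only one of the three possible friend pairs (bob and cat) know each other, so her clustering coefficient is 1/3.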

31

Trust in Social Networks


People annotate their relationships with
information about how much they trust their
friends


Trust can be binary (trust or don’t trust) or on
some scale


This work uses a 1-10 scale where 1 is low trust
and 10 is high trust


At least 8 social networks have some
mechanism for expressing trust explicitly,
several dozen have implicit trust information

32

Using Trust from Social
Networks


If we have trust available from a social
network, how can we use that?


Trust in people can influence how likely
we are to


Give them access to information


Accept information from them at all


Consider the quality of information from
them

33

Examples


Only people I trust can see my phone
number


I will only accept emails from people I
trust

34

Challenges to Using Trust


Each person only knows a very very
small part of the network


For people we know, some automatic
use of trust may be helpful, but it does
not provide any new information


If we have access to the network, we
need a way to compute how much we
should trust others

35

Inferring Trust

The Goal: Select two individuals, the
source (node A) and the sink (node C), and
recommend to the source how much to
trust the sink.

[Diagram: nodes A, B, and C, with trust values t_AB, t_BC, and t_AC]

36

Caveats and Insights


Trust is contextual


Trust is asymmetric


Trust is not exactly transitive


37

[Figure: network diagram with the Source and Sink labeled]

38

Trust Algorithm


If the source does not know the sink, the
source asks all of its friends how much to
trust the sink, and computes a trust value by
a weighted average


Neighbors repeat the process if they do not
have a direct rating for the sink
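
A simplified sketch of this recursive weighted average follows. The slide does not specify details such as cycle handling, depth limits, or what to do when no path exists, so those choices, the function name, and the ratings are assumptions made for illustration, not the author's published algorithm.

```python
# Hypothetical ratings on the 1-10 scale: TRUST[rater][ratee].
TRUST = {
    "A": {"B": 9, "D": 4},
    "B": {"C": 8},
    "D": {"C": 2},
    "C": {},
}


def infer_trust(source, sink, trust, visited=frozenset()):
    """Recommend how much source should trust sink; None if no path exists."""
    ratings = trust.get(source, {})
    if sink in ratings:
        return ratings[sink]  # a direct rating is used as-is
    visited = visited | {source}
    weighted_sum = 0
    weight_total = 0
    for neighbor, weight in ratings.items():
        if neighbor in visited:
            continue  # assumption: skip nodes already on this path
        value = infer_trust(neighbor, sink, trust, visited)
        if value is not None:
            # Each neighbor's answer counts in proportion to how much
            # the source trusts that neighbor.
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0:
        return None
    return weighted_sum / weight_total
```

Here A does not know C, so it asks B (trusted at 9, who rates C an 8) and D (trusted at 4, who rates C a 2); the result, (9*8 + 4*2) / 13, leans toward the more trusted neighbor's opinion.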

39

How Well Does It Work?


Pretty well


On networks where we have tested it,
trust is computed accurately within
about 10%


Test this by taking a known trust value,
deleting the edge between those people,
comparing the known value with the value
we compute


10% is very good for social systems with
lots of noise
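
The edge-deletion test can be sketched as a leave-one-out loop. The slides do not say how the 10% figure is normalized; this sketch assumes error is measured as a fraction of the 1-10 rating range. The one-hop inference function is a deliberately simple stand-in, and all names and ratings are hypothetical.

```python
import copy

# Hypothetical ratings on the 1-10 scale: TRUST[rater][ratee].
TRUST = {
    "A": {"B": 9, "C": 8},
    "B": {"C": 7},
    "C": {},
}


def one_hop_infer(source, sink, trust):
    """Stand-in inference: weighted average over neighbors rating the sink."""
    pairs = [
        (weight, trust[neighbor][sink])
        for neighbor, weight in trust.get(source, {}).items()
        if sink in trust.get(neighbor, {})
    ]
    total = sum(weight for weight, _ in pairs)
    if total == 0:
        return None
    return sum(weight * value for weight, value in pairs) / total


def leave_one_out_error(trust, infer, scale=9):
    """Hide each known rating, re-infer it, and average the absolute error
    as a fraction of the rating range (assumed to be 10 - 1 = 9)."""
    errors = []
    for source, ratings in trust.items():
        for sink, known in ratings.items():
            pruned = copy.deepcopy(trust)
            del pruned[source][sink]  # delete the edge under test
            guess = infer(source, sink, pruned)
            if guess is not None:  # skip edges that cannot be re-inferred
                errors.append(abs(guess - known) / scale)
    if not errors:
        return None
    return sum(errors) / len(errors)
```

In this tiny network only the A-to-C rating can be re-inferred once deleted (via B), so the reported error comes from that single comparison.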

40

Applications of Trust


With direct knowledge or a
recommendation about how much to
trust people, this value can be used as
a filter in many applications


Since social networks are so prominent
on the web, it is a public, accessible
data source for determining the quality
of annotations and information

41

Ordering


Use trust to determine the order in which
information is presented


Aggregating



If data is aggregated, we can use trust to
determine how much weight is given to
different sources
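
Both uses can be shown with a short Python sketch. The sources, values, and trust ratings are invented, and a trust-weighted mean is just one plausible way to give weight to different sources.

```python
# Hypothetical claims about one quantity from sources with different trust.
CLAIMS = [
    {"source": "lab_a", "value": 10.0, "trust": 9},
    {"source": "lab_b", "value": 14.0, "trust": 3},
    {"source": "lab_c", "value": 11.0, "trust": 6},
]


def order_by_trust(claims):
    """Ordering: present the most trusted sources first."""
    return sorted(claims, key=lambda c: c["trust"], reverse=True)


def trust_weighted_mean(claims):
    """Aggregating: weight each value by the trust in its source."""
    total = sum(c["trust"] for c in claims)
    return sum(c["trust"] * c["value"] for c in claims) / total
```

With these numbers the plain average is about 11.67, while the trust-weighted mean is 11.0, because the outlying value comes from the least trusted source.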

42

Social Networks for Science

Data + Provenance + Social Networks =
Social Policies

43

Policies on the Web


Policies on the web are used to filter and
restrict access to information for


Security


Privacy


Trust


Information filtering


Accountability


Important because of the open nature of the
web

44

Applications of the policy
aware web



Website access



Network routing



Storage management



Grid computing



Pervasive computing



Information filtering



Digital rights management



Collaboration

45

Applications and Industrial
Interest


Internet Content Rating Agency


Using policies and rules to develop content ratings
for websites


Efforts underway at


Microsoft, IBM, Sun, BEA, Oracle


Heavily discussed at W3C Workshop on
Constraints and Capabilities for Web Services


http://www.w3.org/2004/09/ws-cc-program.html


46

Example Policies


Only allow members of my research
group to access this data set


Reject messages from anyone whose
address is not on my list of verified
senders

47

Policies and Trust


Only users whose inferred trust rating is a 9
or 10 may run processes on this shared
computing resource


Preprints of this paper are accessible only to
trusted Fermilab personnel, members of the
research team at other institutions, or the
NSF advisory board


Include information in my knowledge base
only if it, and all the files and processes in its
provenance, were created or executed by
people I trust at a level 7 or above
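
The last policy combines provenance with trust, and can be sketched as a walk over provenance records. The record layout, file names, people, and ratings below are hypothetical, and treating items with unknown history as untrusted is an assumption of this sketch.

```python
# Hypothetical provenance: who created each item and what it derives from.
PROVENANCE = {
    "atlas.gif": {"creator": "carol", "derived_from": ["atlas.img"]},
    "atlas.img": {"creator": "bob", "derived_from": ["scan.img"]},
    "scan.img": {"creator": "dave", "derived_from": []},
}

# Hypothetical direct or inferred trust ratings on the 1-10 scale.
MY_TRUST = {"carol": 9, "bob": 8, "dave": 5}


def admissible(item, provenance, trust, threshold=7):
    """True if the item and everything in its provenance were created
    by people trusted at or above the threshold."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        record = provenance.get(current)
        if record is None:
            return False  # assumption: unknown history is not trusted
        if trust.get(record["creator"], 0) < threshold:
            return False
        stack.extend(record["derived_from"])
    return True
```

Here atlas.gif is rejected at level 7 even though its own creator is highly trusted, because the original scan in its provenance was created by someone rated 5.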

48

Extending Trust to Science


In collaborative scientific environments,
some data and resources require strict
access control (username / password)


For others, this level of control is
unnecessary and cumbersome

49

Trust for Access Control


With a scientific social network, trust can be
used to restrict access to


Data


Computing resources

and to


Limit what data is integrated into a knowledge
base


Weight conflicting information from different
sources according to the trustworthiness of the
source


50

Leading to Collaboration


The semantic web with social networks
provides a platform for


Publishing data


Publishing metadata (so experiments can
be verified)


Limiting/granting access to sensitive data


Gathering data from other sources


Filtering data from the web

51

What do we need to do?


“Easy” Steps


Building ontologies for representing
scientific data / metadata


Publishing data on the web

52

What do we need to do?


Hard Steps (because people don’t want to do it)


Developing web policies for limiting
access to non-critical data


Webmasters can do this, with training
and collaboration with data owners


Motivating scientists into social
networks

53

Forcing the Anti-Social Into
Social Nets


Can’t expect scientists to use a
Facebook/MySpace style social network

(and we probably don’t want to see that
anyway…)


Integrate social networking into other
activities


E.g. email

54

The Payoff


A whole new way of working over the
web


Multiple levels of collaboration


New ways of sharing data and working
together

55

Conclusions


The intersection of the Semantic Web,
social networks, and science holds
great promise for revolutionizing
collaboration over the web


Steps to achieving it are mostly social,
not technological


Motivating the use of these technologies
among everyone involved with data


Introducing new ways to collaborate and
encouraging adoption of new techniques

56

Questions


Jennifer Golbeck


Golbeck@cs.umd.edu


http://trust.mindswap.org