Analyzing Social Bookmarking Systems: A del.icio.us Cookbook

1

20 July, 2008

Analyzing Social Bookmarking Systems:

A del.icio.us Cookbook


Robert Wetzker, Carsten Zimmermann, Christian Bauckhage

Workshop on Mining Social Data, ECAI 2008

5 November, 2013






Dipl.-Ing. Robert Wetzker | robert.wetzker@dai-labor.de


Why this paper?

Why social bookmarking?


Provides a vast amount of user-generated annotations for web content.

Reflects the interests of millions of users.

Wisdom-of-crowds.

Research areas:

(Web-) Search

(Web-) Content classification

Ontology building

Trend detection

Recommendation





Outline

1. The del.icio.us bookmarking service
2. Bookmarking patterns
3. Tagging patterns
4. Social bookmarking and spam
5. Conclusions and future work



The del.icio.us bookmarking service




The growth of del.icio.us





The dataset




We recursively crawled del.icio.us tag-wise, starting with the tag “web2.0” (Oct.-Dec. 2007).

From the retrieved corpus of 45 million bookmarks we extracted the 1 million most frequent users and downloaded the bookmarks of these users (Dec. 2007 to Apr. 2008).

For the analysis presented here, we considered only the 142 million bookmarks obtained from the user-wise crawling.

Corpus details

> 80% of del.icio.us
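
The tag-wise crawl described above can be read as a breadth-first traversal over tags. The slides give no crawler details, so the following is only a minimal sketch under assumptions: `TOY_INDEX` and `fetch_bookmarks_for_tag` are hypothetical stand-ins for requests to the service, and the bookmark layout `(user, url, tags)` is chosen for illustration.

```python
from collections import deque

# Toy stand-in for the service (assumption): tag -> bookmarks carrying that tag.
TOY_INDEX = {
    "web2.0": [("alice", "http://a.example", ["web2.0", "ajax"]),
               ("bob", "http://b.example", ["web2.0", "design"])],
    "ajax": [("alice", "http://a.example", ["web2.0", "ajax"]),
             ("carol", "http://c.example", ["ajax", "javascript"])],
    "design": [("bob", "http://b.example", ["web2.0", "design"])],
    "javascript": [("carol", "http://c.example", ["ajax", "javascript"])],
}


def fetch_bookmarks_for_tag(tag):
    """Hypothetical fetcher; a real crawler would request the tag's listing."""
    return TOY_INDEX.get(tag, [])


def crawl_tagwise(seed_tag):
    """Follow co-occurring tags breadth-first, collecting every bookmark seen."""
    seen_tags = {seed_tag}
    queue = deque([seed_tag])
    bookmarks = set()
    while queue:
        tag = queue.popleft()
        for user, url, tags in fetch_bookmarks_for_tag(tag):
            bookmarks.add((user, url, tuple(tags)))
            for other in tags:
                if other not in seen_tags:
                    seen_tags.add(other)
                    queue.append(other)
    return bookmarks, seen_tags


bookmarks, tags = crawl_tagwise("web2.0")
# The user names found this way would seed the second, user-wise crawl.
users = sorted({user for user, _, _ in bookmarks})
```

Starting from a single seed tag only reaches tags connected to it through co-occurrence, which is one reason a second, user-wise pass is useful.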



Bookmarking patterns


Top 10 most frequent URLs in the corpus



The del.icio.us community is biased toward content related to web communities and web technology.


Top 10 most frequent domains in the corpus







The top 1% of users contribute 22% of all bookmarks.



39% of all bookmarks link to 1% of all URLs.
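
Concentration figures such as these can be computed from per-user (or per-URL) bookmark counts. The sketch below is illustrative only: the function name `top_share` and the toy counts are assumptions, not taken from the slides.

```python
def top_share(counts, percent=1):
    """Share of the total held by the top `percent` percent of items."""
    ordered = sorted(counts, reverse=True)
    # Use at least one item so that small samples still yield a result.
    k = max(1, (len(ordered) * percent) // 100)
    return sum(ordered[:k]) / sum(ordered)


# Toy data (assumption): one heavy user and 99 light users.
bookmarks_per_user = [500] + [5] * 99
share = top_share(bookmarks_per_user, percent=1)
print(f"top 1% of users hold {share:.1%} of all bookmarks")
```

The same function applies to bookmarks per URL when measuring how many bookmarks point to the most popular 1% of URLs.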



The del.icio.us community pays attention to new content only for a very short
period of time.



Tagging patterns



Each bookmark is labeled with 3.16 tags on average.


About 7% of all bookmarks are not tagged at all.

Top 20 most frequent tags in the corpus




700 of 7,000,000 tags account for 50% of all labels.



55% of all tags appear only once.
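
The tagging statistics reported in this section (average tags per bookmark, share of untagged bookmarks, size of the head that covers half of all labels, share of tags used once) can all be derived from a list of per-bookmark tag lists. The following is a minimal sketch on toy data; the function name, the returned keys, and the sample bookmarks are assumptions, and the input is assumed to contain at least one tag.

```python
from collections import Counter


def tag_statistics(bookmark_tags):
    """Summarize a corpus given one list of tags per bookmark."""
    counts = Counter(tag for tags in bookmark_tags for tag in tags)
    total = sum(counts.values())

    # Count how many of the most frequent tags are needed to cover
    # at least half of all tag assignments.
    covered, head_size = 0, 0
    for _, count in counts.most_common():
        covered += count
        head_size += 1
        if 2 * covered >= total:
            break

    return {
        "mean_tags_per_bookmark": total / len(bookmark_tags),
        "untagged_share": sum(1 for t in bookmark_tags if not t) / len(bookmark_tags),
        "tags_covering_half": head_size,
        "singleton_share": sum(1 for c in counts.values() if c == 1) / len(counts),
    }


# Toy corpus (assumption): five bookmarks, one of them untagged.
stats = tag_statistics([["web", "design"], ["web"], ["web", "css"], [], ["web", "blog"]])
print(stats)
```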




Tendencies in the del.icio.us tag distribution strongly correlate with upcoming and periodic external events.

Occurrence of 5 sample tags in 2007.



Social bookmarking and spam






Del.icio.us is highly vulnerable to spam.

19 of the top 20 users are apparently of non-human origin, accounting for 1.3 million bookmarks, around 1% of the corpus.

We find spammers to exhibit one or more of the following characteristics:

very high activity

bookmarking only a few domains

high tagging rate

very low tagging rate

bulk posts

a combination of the above
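
The characteristics listed above could be turned into simple per-user indicators. The slides do not specify thresholds or a detection procedure, so everything below is an assumption made for illustration: the threshold values, the function name `spam_indicators`, and the bookmark layout `(url, tags, timestamp)`.

```python
from collections import Counter
from urllib.parse import urlparse


def spam_indicators(bookmarks, high_activity=40, min_posts=10, few_domains=1,
                    high_tag_rate=10.0, low_tag_rate=0.5, bulk_size=20):
    """Return the set of spam characteristics a user's bookmarks exhibit.

    `bookmarks` is a non-empty list of (url, tags, timestamp) tuples.
    All default thresholds are illustrative assumptions.
    """
    n = len(bookmarks)
    domains = {urlparse(url).netloc for url, _, _ in bookmarks}
    avg_tags = sum(len(tags) for _, tags, _ in bookmarks) / n
    posts_per_timestamp = Counter(ts for _, _, ts in bookmarks)

    flags = set()
    if n >= high_activity:
        flags.add("very high activity")
    if n >= min_posts and len(domains) <= few_domains:
        flags.add("few domains")
    if avg_tags >= high_tag_rate:
        flags.add("high tagging rate")
    if avg_tags <= low_tag_rate:
        flags.add("very low tagging rate")
    if max(posts_per_timestamp.values()) >= bulk_size:
        flags.add("bulk posts")
    return flags


# Toy users (assumption): one automated account and one ordinary account.
bot = [(f"http://spam.example/p{i}", ["t"] * 20, 1000) for i in range(50)]
human = [("http://a.example/x", ["web", "css"], 10),
         ("http://b.example/y", ["news", "tech"], 500),
         ("http://c.example/z", ["blog", "rss"], 9000)]
bot_flags = spam_indicators(bot)
human_flags = spam_indicators(human)
```

A user with several flags matches the "combination of the above" case. Fixed thresholds like these are crude, and any real use would need them tuned against labeled data.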



The number of bookmarks and the number of users linking to a domain.



The number of user bookmarks and the average number of tags per bookmark.



The diffusion of attention






In some cases spam detection may prove computationally expensive or ambiguous.

The diffusion of attention concept reduces the effect of spam on the tag distribution without requiring actual spam detection.

We define the attention given to a tag as the number of users using the tag.

The diffusion of attention for a tag is then given by the number of users that assign the tag for the first time in a given period.
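
To make the definition concrete, the sketch below contrasts plain tag occurrence with diffusion of attention on toy data. The input layout `(user, tag, period)` and the function names are assumptions for illustration. Periods are taken to be comparable values such as week numbers.

```python
from collections import Counter


def occurrence(assignments):
    """Number of times each tag was assigned in each period."""
    return Counter((tag, period) for _, tag, period in assignments)


def diffusion_of_attention(assignments):
    """Number of users assigning each tag for the first time in each period."""
    first_use = {}
    for user, tag, period in assignments:
        key = (user, tag)
        if key not in first_use or period < first_use[key]:
            first_use[key] = period
    return Counter((tag, period) for (_, tag), period in first_use.items())


# Toy data (assumption): two ordinary users and one account posting in bulk.
assignments = [("alice", "iphone", 1), ("bob", "iphone", 2), ("alice", "iphone", 2)]
assignments += [("bot", "pills", 2)] * 100

occ = occurrence(assignments)
diff = diffusion_of_attention(assignments)
print(occ[("pills", 2)], diff[("pills", 2)])    # raw count vs. new users
print(occ[("iphone", 2)], diff[("iphone", 2)])
```

Because each user contributes at most once per tag, a single account posting the same tag in bulk adds only one unit, which is how the measure dampens spam without detecting it.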



Tagging trends by tag occurrence.


Tagging trends by diffusion of attention.



Future work




Provide automatic and scalable spam detection methods.



Topic-aware detection of trends.

Follow-up paper:

Detecting Trends in Social Bookmarking Systems using a Probabilistic Generative Model and Smoothing, R. Wetzker, T. Plumbaum, A. Korth, C. Bauckhage, T. Alpcan, F. Metze, International Conference on Pattern Recognition (ICPR), 2008, Tampa, USA (to appear)


Questions?

Thank you.



Social bookmarking and spam


The number of bookmarks and the number of users linking to a domain.

http://d.hatena.ne.jp