Assignment 2 Report

02815 Web 2.0 og mobil interaktion, 16 October 2011




Jannik Lind Andreasen, s080031@student.dtu.dk
Svante T. H. Jørgensen, s083143@student.dtu.dk
Casper Skipper Olsen, s081155@student.dtu.dk


1. Harvesting data from Facebook (Svante)

In this task we harvest an ego network from Facebook for at least one person in the group.

We harvest from all three group members, since this gives us a chance to compare them later in the assignment. We believe that the structure of a user's ego-network on Facebook will tell us a lot about how the user uses Facebook.


Svante, Facebook id = '732484421'
Casper, Facebook id = '1067851594'
Jannik, Facebook id = '1071121150'


We planned how to test this before we started. We gave each other a brief description of how each of us uses Facebook:


Svante uses his network to keep track of friends, mainly from different schools, university and family. He does not use his account very often, but keeps it handy in case he needs to get in contact with old acquaintances.

Casper has a Facebook account but is not actively using it. Casper's friend network is very small and mainly consists of family and close friends. This results in a network with very few weak links.

Jannik mostly uses Facebook to communicate with the friends he sees daily in real life. He uses it to plan training sessions and vacations with close friends. Jannik has lived in the same town all his life, and therefore many of his friends live close to each other.


From the descriptions above we expect Jannik's Facebook network to be more strongly connected than Svante's: Svante uses Facebook to communicate with friends at a long distance, whereas Jannik uses it mostly to communicate with friends in his neighborhood. This gives a higher chance that Jannik's friends know each other, since they live in the same neighborhood.



How we constructed the solution with Python:

The code for this assignment is inspired by the scripts provided with the course textbook (Mining the Social Web). The code that constructs the ego network is inspired by the “facebook__get_friends_rgraph.py” script. Besides this ego network script we make use of all the Facebook functionality scripts also provided by the book, such as the login and API scripts. Throughout this assignment we use the FQL query API to access the open Facebook graph.


In this first step we find all the friends of the ego person. This is done by accessing the “connection” table. By providing the id of the ego person as “source_id” we receive all the “target_id” values. This list of “target_id” values corresponds to all the friends of the ego person.


FQL query:
SELECT target_id FROM connection WHERE source_id = me() and target_type = 'user'
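
The step from query result to friend list can be sketched as follows. This is only an illustration: the row layout (a list of dictionaries with a "target_id" field) is an assumption about what the API returns, the helper name extract_friend_ids is hypothetical, and the sample rows are made up.

```python
# FQL query from the report; me() refers to the logged-in user.
FRIENDS_QUERY = (
    "SELECT target_id FROM connection "
    "WHERE source_id = me() and target_type = 'user'"
)


def extract_friend_ids(rows):
    """Return the de-duplicated friend ids as a sorted list of strings.

    `rows` is assumed to be a list of dicts, one per result row, each
    with a "target_id" field (an assumption, not shown in the report).
    """
    return sorted({str(row["target_id"]) for row in rows})


# Made-up rows standing in for an API response.
sample_rows = [{"target_id": 1001}, {"target_id": "1002"}, {"target_id": 1001}]
friend_ids = extract_friend_ids(sample_rows)
print(friend_ids)
```

Storing the ids as strings keeps them usable both as dictionary keys and as text when they are pasted into later queries.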


2. Generate ego-network without the central person (Casper)

In this task we use the harvested data from task 1 to generate an ego-network, but without the central person and all links connecting to that person, i.e. an ego-network without the ego.

We generate a network for each of the group members.


How we constructed the solution with Python:

To generate the ego network we have to find the friendships among the previously found ego friends. This is done by accessing the “friend” table. We pass in the friend list both as “uid1” and as “uid2”. In this way the result is a list of all the edges between the friends in the list.

FQL query:
SELECT uid1, uid2 FROM friend WHERE uid1 in (%s) and uid2 in (%s)
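
A minimal sketch of how the two %s placeholders could be filled and how the returned rows could be reduced to edges. The helper names are hypothetical, and the row layout (dicts with "uid1" and "uid2") as well as the possibility that a friendship is returned once per direction are assumptions.

```python
EDGE_QUERY = "SELECT uid1, uid2 FROM friend WHERE uid1 in (%s) and uid2 in (%s)"


def build_edge_query(friend_ids):
    """Fill both placeholders with the same comma-separated id list."""
    # int() raises ValueError for anything that is not a numeric id.
    id_list = ",".join(str(int(fid)) for fid in friend_ids)
    return EDGE_QUERY % (id_list, id_list)


def rows_to_edges(rows):
    """Collapse (uid1, uid2) rows into a set of undirected edges.

    Each pair is stored once, in sorted order, so a friendship that
    appears in both directions yields a single edge.
    """
    edges = set()
    for row in rows:
        a, b = str(row["uid1"]), str(row["uid2"])
        if a != b:
            edges.add(tuple(sorted((a, b))))
    return edges


query = build_edge_query(["1", "2"])
edges = rows_to_edges([{"uid1": 1, "uid2": 2}, {"uid1": 2, "uid2": 1}])
```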


Next we access the “user” table to harvest some data about each friend in the friend list. The data we harvest is the sex and the name of each friend. The data is stored in two dictionaries with the user id as key.

To collect all the data in one place, a dictionary called “friendships” is created. The key of this dictionary is the user id. The data for each key is the name, the sex and a list of the friends of that user id.

Finally a “networkx” graph is created based on the “friendships” dictionary. This is done in order to store the graph in a pickle file for later use. Before the graph is stored, the ego node is removed.
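
The steps above can be sketched with the standard library alone. Note that plain dictionaries stand in for the networkx graph here so that the example has no third-party dependency; the function names and the toy data are hypothetical.

```python
import pickle


def build_friendships(names, sexes, edges):
    """Merge the name and sex dictionaries and the edges into one dict keyed by user id."""
    friendships = {
        uid: {"name": names[uid], "sex": sexes.get(uid), "friends": set()}
        for uid in names
    }
    for a, b in edges:
        if a in friendships and b in friendships:
            friendships[a]["friends"].add(b)
            friendships[b]["friends"].add(a)
    return friendships


def remove_ego(friendships, ego_id):
    """Return a copy without the ego node and without any link to it."""
    return {
        uid: {**data, "friends": data["friends"] - {ego_id}}
        for uid, data in friendships.items()
        if uid != ego_id
    }


# Toy data for illustration only.
names = {"ego": "Ego", "a": "Anna", "b": "Bo"}
sexes = {"ego": "male", "a": "female", "b": "male"}
edges = [("ego", "a"), ("ego", "b"), ("a", "b")]

network = remove_ego(build_friendships(names, sexes, edges), "ego")
restored = pickle.loads(pickle.dumps(network))  # same round trip a pickle file gives
```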


3. Plotting the network (Jannik)

In this task we will plot the networks generated in
task 2.


We use Cytoscape to visualize the network.


How we constructed the solution with Python:

To visualise the graph in Cytoscape the data has to be stored in text files. For importing the actual graph a netlist file is generated. Next, two files with the node attributes are generated: one with the gender data and one with the names of the nodes.
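
A minimal sketch of generating such text files. The report does not show its exact file layouts, so the ones used here are assumptions: a tab-separated edge list with an interaction label, and attribute files consisting of a header line followed by "node = value" lines. All function names are hypothetical.

```python
def edge_list_text(edges, interaction="friend"):
    """One line per edge: source, interaction label, target, tab-separated."""
    return "".join(f"{a}\t{interaction}\t{b}\n" for a, b in sorted(edges))


def node_attribute_text(attribute_name, values):
    """Header line with the attribute name, then one 'node = value' line per node."""
    lines = [attribute_name]
    lines += [f"{node} = {value}" for node, value in sorted(values.items())]
    return "\n".join(lines) + "\n"


def write_text(path, text):
    """Write the generated text to a file for import into the visualisation tool."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Toy data for illustration only.
network_text = edge_list_text([("b", "c"), ("a", "b")])
gender_text = node_attribute_text("Gender", {"a": "female", "b": "male"})
```

Calling write_text("network.sif", network_text) would then produce the file to import; the file name is only an example.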

The visualization result follows in task 4.


4. Visualization of number of likes per person (Casper)

In this task we harvest the number of likes for everyone in the network, and visualize the number of likes per person by node size in the graph.


How we constructed the solution with Python:

To harvest the like button activity of each person in the ego network, the Facebook Graph API is used. For each node in the networkx graph the number of likes is found with the following API call: https://graph.facebook.com/[user_id]/likes. This call returns a list of all the likes for the particular person. The size of this array is stored in a dictionary. The data is finally stored in an attribute text file.
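
Counting the likes in one response can be sketched as below. The sample response is made up, the function name is hypothetical, and the sketch only counts the entries of a single response page; whether the original script followed further result pages is not stated in the report.

```python
import json

# URL pattern from the report, with the user id as placeholder.
LIKES_URL = "https://graph.facebook.com/%s/likes"


def count_likes(response_text):
    """Count the entries in the "data" array of one JSON response."""
    payload = json.loads(response_text)
    return len(payload.get("data", []))


# Made-up response body for illustration only.
sample_response = '{"data": [{"name": "Page A", "id": "1"}, {"name": "Page B", "id": "2"}]}'
likes_count = {"1001": count_likes(sample_response)}
```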


Here are the results:

Graph attributes:
Red node - Female
Green node - Male
Node size - The likes count is represented by the size of the node.

Notice: The node sizes are not comparable between the graphs.

Here are the networks:

Svante's network: [figure]

Casper's network: [figure]

Jannik's network: [figure]


We see that the way people use likes varies a lot, but we do not recognize any pattern telling us that the way people use likes depends on which community they are part of.















5. Discovering network communities (Jannik)

In this task we try to discover communities within each network.

We use CFinder to find the communities.

Here are the results:



k       Svante  Casper  Jannik
k=3     7       1       1
k=4     8       1       1
k=5     7       1       2
k=6     5       2       2
k=7     3       1       1
k=8     3       n/a     1
k=9     2       n/a     1
k=10    2       n/a     1
k=11    n/a     n/a     1
k=12    n/a     n/a     2
k=13    n/a     n/a     1
k=14    n/a     n/a     1
k=15    n/a     n/a     1
k=16    n/a     n/a     1
k=17    n/a     n/a     1
k=18    n/a     n/a     1
k=19    n/a     n/a     1
k=20    n/a     n/a     2
k=21    n/a     n/a     1



We first notice that there are no significant sub-communities in Jannik's network. This is not seen very often and is again an indication that this network is very strongly connected.
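
CFinder is based on the clique percolation method: a community for a given k is the union of all k-cliques that can be reached from each other through cliques sharing k-1 nodes. The sketch below is a brute-force illustration of that definition for small graphs, not CFinder's own implementation; the function names and toy graphs are hypothetical.

```python
from itertools import combinations


def to_adjacency(edges):
    """Build a symmetric node -> set-of-neighbours mapping from an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def k_clique_communities(adjacency, k):
    """Return the k-clique communities as a list of node sets (brute force)."""
    nodes = sorted(adjacency)
    # 1. All k-cliques: k-subsets in which every pair of nodes is linked.
    cliques = [
        frozenset(subset)
        for subset in combinations(nodes, k)
        if all(b in adjacency[a] for a, b in combinations(subset, 2))
    ]
    # 2. Merge cliques that share k-1 nodes (union-find over clique indices).
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # 3. A community is the union of all cliques in one merged group.
    groups = {}
    for index, clique in enumerate(cliques):
        groups.setdefault(find(index), set()).update(clique)
    return list(groups.values())


# Two triangles sharing only node "c": they do not merge for k=3.
bowtie = to_adjacency([("a", "b"), ("a", "c"), ("b", "c"),
                       ("c", "d"), ("c", "e"), ("d", "e")])
# Two triangles sharing the edge b-c: they merge into one community for k=3.
diamond = to_adjacency([("a", "b"), ("a", "c"), ("b", "c"),
                        ("b", "d"), ("c", "d")])
```

Because every k-subset is tested, this only scales to small toy graphs. It does show why a node can belong to several communities at once, as node "c" does in the first toy graph.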


6. Investigation of how the like button is used (Svante)

In this task we compare how the like button is used in different groups. For each group we take the mean number of likes per person and use this to give us an idea of how it is used.


How we constructed the solution with Python and CFinder:

To find each person's likes we used the “facebook__graph_query.py” script, which queries the Facebook API, and modified it to return the result of the query

“q = select uid, first_name, last_name, sex, likes_count from user where uid in (%s)”

where %s is a list of the user IDs of all users in Svante's network, found in task 2. We save the “userID = numberOfLikes” pairs in a dict.


Using the results from task 5 we chose the k value 6 in CFinder, as we felt that it best reflected the real groups.


With the groups provided by CFinder and the results from the Facebook API query, we calculated the mean number of likes per person in each group. The results were as follows:


Svante's network for k=6
Mean Likes for group 1 = 106
Mean Likes for group 2 = 10
Mean Likes for group 3 = 66
Mean Likes for group 4 = 43
Mean Likes for group 5 = 59
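
The per-group means listed above could be computed along the following lines. This is a sketch with made-up toy numbers, not the group data from the report; the function name is hypothetical, and skipping members without a likes count is an assumption.

```python
def mean_likes_per_group(groups, likes):
    """Mean likes count per group; members without a likes count are skipped."""
    means = {}
    for group_name, members in groups.items():
        counts = [likes[uid] for uid in members if uid in likes]
        means[group_name] = sum(counts) / len(counts) if counts else None
    return means


# Toy data for illustration only. "u2" is in both groups, since
# communities are allowed to overlap; "u4" has no likes count.
groups = {"group 1": ["u1", "u2"], "group 2": ["u2", "u3", "u4"]}
likes = {"u1": 10, "u2": 30, "u3": 0}
means = mean_likes_per_group(groups, likes)
```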


Analysis of the results

From the results we see 3 kinds of groups. Groups 3 to 5 have a number of likes close to the total average of likes, group 2 has a very low average of likes and group 1 has a very high average of likes. From this we can conclude that there are indeed very big differences in how much the Like button is used, even as much as a factor of 10 (106 for group 1 against 10 for group 2).

Looking at the general activity (number of posts, number of friends and so on) of the users in the different kinds of groups, we clearly see a correlation between the number of likes and general activity. But as with any other data mining, it is difficult to prove any causal relationships.


7. Harvesting wall posts on Facebook (Svante)

In this task we harvest the 300 most recent wall posts from each person in each network.


How we constructed the solution with Python:

To harvest the wall posts for each user in the network we access the “stream” table. For each “source_id” we get all the messages and their post_id returned. These returned messages correspond to the messages posted on the wall.

FQL query:
SELECT post_id, message FROM stream WHERE source_id = %s

To get data which emulates actual conversations between friends we also harvest the comments posted on each message. This is done by accessing the “comment” table. For each post_id in the list of messages we get all the comments returned.

FQL query:
SELECT fromid, text FROM comment WHERE post_id IN (%s)

All the data, messages and comments, is stored in a dictionary with the user_id as key. The dictionary with all the documents for each user is saved in a pickle file.
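
A minimal sketch of how the two queries and the resulting dictionary could fit together. The helper names and sample rows are hypothetical. Quoting the post ids as strings and cutting the post list to 300 entries in Python are both assumptions, since the report does not show how either was done.

```python
import pickle

STREAM_QUERY = "SELECT post_id, message FROM stream WHERE source_id = %s"
COMMENT_QUERY = "SELECT fromid, text FROM comment WHERE post_id IN (%s)"


def comment_query_for(post_ids):
    """Quote each post_id and join them for the IN (...) clause."""
    quoted = ",".join("'%s'" % pid for pid in post_ids)
    return COMMENT_QUERY % quoted


def collect_documents(user_id, post_rows, comment_rows, limit=300):
    """Return {user_id: [texts]}: post messages first, then comment texts."""
    posts = [row["message"] for row in post_rows[:limit] if row.get("message")]
    comments = [row["text"] for row in comment_rows if row.get("text")]
    return {user_id: posts + comments}


# Made-up rows standing in for API responses.
post_rows = [{"post_id": "1001_1", "message": "hello"},
             {"post_id": "1001_2", "message": ""}]
comment_rows = [{"fromid": 1002, "text": "hi"}]

comment_query = comment_query_for([row["post_id"] for row in post_rows])
documents = collect_documents("1001", post_rows, comment_rows)
restored = pickle.loads(pickle.dumps(documents))  # same round trip a pickle file gives
```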


8. Discovering topics in communities based on text analysis of wall posts (Jannik)

In this task we try to discover conversation topics in communities with text analysis of posts from the Facebook wall.


How we constructed the solution with Python:

We harvested all wall posts per user in task 7 and created a dictionary with the information. The dictionary was saved in a pickle file. The dictionary allows us to look up all the wall posts from a user by the user's id.

We load the pickle file to get the dictionary, and for each person in the community we get the wall posts and append them to one document for the entire community.
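
Building the community document can be sketched as follows. The function name and toy data are hypothetical, and treating members without harvested posts as empty is an assumption.

```python
import pickle


def community_document(documents, members):
    """Concatenate the texts of all community members into one list of texts."""
    combined = []
    for uid in members:
        # Members without harvested posts contribute nothing.
        combined.extend(documents.get(uid, []))
    return combined


# Toy data for illustration only; the bytes stand in for the pickle file.
stored = pickle.dumps({"u1": ["tillykke med dagen"], "u2": ["haha", "tak"]})
documents = pickle.loads(stored)
community_text = community_document(documents, ["u1", "u2", "u3"])
```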

We then use a stop list to filter out words that occur very frequently in natural language. We did this using a stop list for the Danish language, since we know that Danish is the most written language in all of the Facebook networks that we are investigating in this assignment.

We got the stop list from nltk with this piece of code:

import nltk
stoplist = nltk.corpus.stopwords.words('danish')


We had some unicode problems, which meant that we needed to append some of the words ourselves. We did this with the following piece of code:

stoplist.append(u'så')


Then we do another filtering and remove all
words that only occur once.
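
The two filtering steps can be sketched as below. A tiny hand-written stop list stands in for the nltk one so that the example runs without downloading a corpus; the tokenizer and the function names are assumptions, as the report does not show how the text was split into words.

```python
import re
from collections import Counter

# Tiny stand-in stop list; the report uses nltk's Danish stop words instead.
STOPLIST = {"og", "i", "det", "er", "så"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"\w+", text.lower())


def filter_tokens(texts, stoplist=STOPLIST):
    """Drop stop words, then drop every word that occurs only once overall."""
    tokenized = [[t for t in tokenize(text) if t not in stoplist] for text in texts]
    counts = Counter(token for tokens in tokenized for token in tokens)
    return [[t for t in tokens if counts[t] > 1] for tokens in tokenized]


# Toy posts for illustration only.
texts = ["Tillykke med dagen og så", "tillykke med fødselsdagen", "det er fodbold"]
filtered = filter_tokens(texts)
```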

Finally we use TF-IDF and LSI to find the topics in the document.
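
The report does not say which library was used for this step. As an illustration of the TF-IDF part only, here is a standard-library sketch of one common weighting variant (term count times log2 of the inverse document frequency). The LSI step, which applies a truncated singular value decomposition to the weighted matrix, is not shown. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    documents: list of token lists.
    weight = count of the term in the document
             * log2(number of documents / documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: count * math.log2(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents for illustration only.
docs = [["tillykke", "tillykke", "fodbold"], ["tillykke", "politik"]]
weights = tfidf(docs)
```

With this weighting a word that appears in every document gets weight zero, however often it is used, while words specific to one document keep a positive weight.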


Results:

From the results of all communities we noticed that Facebook is really one big birthday card, and therefore we added the following words to the stop list: “tillykke” (congratulations), “håber” (hope), “dagen” (the day) and “tak” (thanks).

But then we got some other silly topics like “:-)”, “elsker” (love) and “haha”.


We did not get topics as meaningful as we were hoping for. We had hoped that one community was talking about soccer (in Danish: fodbold) and another about politics (in Danish: politik).


All we got was happy birthday wishes, a lot of love and some laughter. In the end we can at least conclude that Facebook is mostly a happy place to be and most of the people seem to be in a good mood.


We also believe that on Facebook the topics change every day except the birthday topic, since people always have birthdays. That is also the reason why this topic is so dominant.


Comparison of the networks (Casper)

In this chapter we compare the different networks that we have worked with and check our hypothesis that the structure of a user's ego-network on Facebook tells us a lot about how the user uses Facebook.

(Please refer to the graphs in task 4.)


We first notice the difference in how people are connected to each other. We see that Jannik's network is strongly connected, i.e. almost everybody knows each other. We did expect Jannik's network to be more closely connected than Svante's, since Jannik only has friends on Facebook whom he lives close to. Casper's network is much like Jannik's network, just a smaller one.