A Few Chirps about Twitter

B. Krishnamurthy, P. Gill and M. Arlitt

Proceedings of the ACM SIGCOMM Workshop on Online Social Networks

Barcelona, Spain

August 2008

Micro-Content Networks

Examples:

Jaiku: share activity stream

Dodgeball: users update status with geo location

GyPSii: geo location with pictures

Twitter: short, text-only messages


Average YouTube video is large (~10 Mbytes)


Micro-content network messages are small (< 1 Kbyte)

One-to-many communication possible

Often, a publish-subscribe system with control on subscribers

Senders and recipients can choose how to send/receive messages


E.g. Twitter via cell phone, Facebook, email, RSS feed, or IM

Twitter


Started October 2006, written using Ruby on Rails [16]

Scripting with Web application framework

Allows users to send short messages (“tweets”)

Max length 140 characters (compatible with SMS)

Notion of following (friends) and followers (subscribers)

with permission, if desired

“Micro-blogging”

Used to transmit messages during the 2007 California fires, riots in Kenya, and the “Arab Spring” in 2011
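
The following/followers model with optional permission and the 140-character limit can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not how Twitter is implemented: the class `MicroBlog`, its method names, and the user names are all hypothetical.

```python
from collections import defaultdict

MAX_TWEET_LEN = 140  # limit stated on the slide (SMS-compatible)


class MicroBlog:
    """Toy publish-subscribe model (hypothetical; not Twitter's implementation)."""

    def __init__(self):
        self.followers = defaultdict(set)  # user -> users subscribed to them
        self.protected = set()             # users who must approve followers
        self.pending = defaultdict(set)    # user -> follow requests awaiting approval
        self.inbox = defaultdict(list)     # user -> received (sender, text) pairs

    def follow(self, follower, target):
        """Subscribe directly, or queue a request if target requires permission."""
        if target in self.protected:
            self.pending[target].add(follower)
        else:
            self.followers[target].add(follower)

    def approve(self, target, follower):
        """Target grants a pending follow request."""
        if follower in self.pending[target]:
            self.pending[target].remove(follower)
            self.followers[target].add(follower)

    def tweet(self, sender, text):
        """Deliver one short message to every current follower (one-to-many)."""
        if len(text) > MAX_TWEET_LEN:
            raise ValueError("tweet exceeds 140 characters")
        for user in self.followers[sender]:
            self.inbox[user].append((sender, text))


blog = MicroBlog()
blog.protected.add("alice")
blog.follow("bob", "alice")         # queued: alice requires permission
blog.follow("carol", "news")        # immediate: news is public
blog.tweet("alice", "hello")        # nobody receives this yet
blog.approve("alice", "bob")
blog.tweet("alice", "hello again")  # bob now receives alice's tweets
blog.tweet("news", "headline")
```

The subscriber, not the sender, decides whom to follow, while a protected sender keeps control over who may subscribe.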

Interfacing with Twitter

Outline


Micro-content Networks (done)

Data Collection (next)


Characterization


Conclusions

Data Collection


API functions provided by Twitter enable crawl

January to February 2008

Public timeline data (35,978 users)

Sampled at 2:00, 8:00, 14:00, 20:00

Most recent messages available on demand

20 per user, from those with custom profile pictures and unrestricted privacy settings

Provided starting user “seeds” for crawl

Constrained crawl (67,527 users)

Constrained by Twitter API rate limiting

Collecting a partial set of each user's following (crawl the median number for each)

No way to get the reverse, i.e. who’s following a user

Metropolized random walk (31,579 users)

Used to validate constrained crawl

Only follow one child for each (moves to another “neighborhood” more quickly)

Previously used for unbiased sampling of peer-to-peer networks
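
The two crawling strategies can be sketched as follows. This is a minimal illustration under assumptions, not the authors' crawler: the graph is an in-memory dictionary standing in for rate-limited API requests, the function names are hypothetical, and the walk uses the textbook Metropolis-Hastings acceptance rule on an undirected graph, whereas the real following graph is directed and the exact variant used in the study is not given here.

```python
import random
from collections import deque


def constrained_crawl(following, seeds, m, max_users):
    """Breadth-first crawl that keeps only the first m followed users per node.

    `following` maps a user to the list of users they follow; in a real crawl
    this lookup would be a rate-limited API request (hypothetical here).
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(seen) < max_users:
        user = queue.popleft()
        for friend in following.get(user, [])[:m]:
            if friend not in seen and len(seen) < max_users:
                seen.add(friend)
                queue.append(friend)
    return seen


def metropolized_walk(neighbors, start, steps, rng):
    """Metropolis-Hastings random walk on an undirected graph.

    A move from u to a uniformly chosen neighbor v is accepted with
    probability min(1, deg(u) / deg(v)); otherwise the walk stays at u.
    This removes the bias of a plain random walk towards high-degree nodes.
    """
    current = start
    visited = [current]
    for _ in range(steps):
        candidate = rng.choice(neighbors[current])
        if rng.random() < len(neighbors[current]) / len(neighbors[candidate]):
            current = candidate
        visited.append(current)
    return visited


# Toy star graph: node 0 is a hub connected to nine leaves.
star = {0: list(range(1, 10)), **{leaf: [0] for leaf in range(1, 10)}}
walk = metropolized_walk(star, start=0, steps=20000, rng=random.Random(1))
hub_share = walk.count(0) / len(walk)  # close to 1/10; a plain walk gives about 1/2

# Toy following lists: with m=2, user "d" is never reached from seed "a".
following = {"a": ["b", "c", "d"], "b": ["e"], "c": ["a"], "e": []}
crawled = constrained_crawl(following, seeds=["a"], m=2, max_users=10)
```

On the star graph a plain random walk would spend about half its time at the hub, while the Metropolized walk visits every node about equally often, which is why it is useful for validating a crawl that may favor well-connected users.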

Outline


Micro-content Networks (done)

Data Collection (done)

Characterization (next)


Conclusions

Higher Order Results


Following versus Followers


Relationships not always symmetric


Different classes of users


Not all human


Number of tweets varies significantly


Geographic patterns vary


A few countries dominate

Characterization


User relationships


Properties of tweets


What tools are used to post tweets?


When are Twitter users active?


How many tweets do users have?


Other properties of Twitter users


UTC offsets in the datasets


Popular users posting


Geographical spread of Twitter

Characterizing User Relationships


Followers: people who subscribe to your tweets

Following: people whose tweets you subscribe to

Relationships are not necessarily symmetric

Note: different from “friends” in many OSNs, which are symmetric


Combine all three datasets for analysis
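
Since only following lists can be collected directly, follower sets have to be derived by inverting them, and the degree of asymmetry can then be measured. A minimal sketch, assuming the crawl result is a dictionary of following sets; the function names and the toy data are made up for illustration.

```python
from collections import defaultdict


def invert_following(following):
    """Build user -> set of followers from user -> set of followed users."""
    followers = defaultdict(set)
    for user, followed in following.items():
        for target in followed:
            followers[target].add(user)
    return followers


def reciprocity(following):
    """Fraction of follow edges (u -> v) that are matched by (v -> u)."""
    edges = [(u, v) for u, followed in following.items() for v in followed]
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if u in following.get(v, set()))
    return mutual / len(edges)


# Made-up toy data: everyone follows "news", which follows nobody.
following = {
    "ann": {"bob", "news"},
    "bob": {"ann", "news"},
    "cat": {"news"},
    "news": set(),
}
followers = invert_following(following)
symmetric_share = reciprocity(following)  # 2 of the 5 edges are reciprocated
```

In a network of symmetric "friends" the reciprocity would be 1.0 by construction; values below that reflect one-way subscriptions.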

User Relationships

Can analyze clusters of regions, as well as frequency of tweets

User Relationships - Broadcasters

News outlets, radio stations

No reason to follow anyone

Post playlist, headlines

“Green” points in top 1% of all tweeters

Note: points at 1, 2, … are typically broadcasters, too, following one primary broadcaster

User Relationships - Acquaintances

Similar number of followers and following

Along diagonal

Many green and purple tweeters

Green are especially close to y=x

Typical of other OSNs (e.g. Facebook)

User Relationships - Oddballs

Some people follow many users

Evangelists

Hoping some will follow them back

Miscreants

Spammers or stalkers

Actual celebrities (at top)

Most not among top tweeters
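
The three groups (broadcasters, acquaintances, and evangelists or miscreants) differ mainly in how followers and following balance out, so a rough labeling rule can be sketched as below. The cutoff `ratio=10.0` and the function name `classify` are illustrative assumptions, not values or methods from the study, which identifies the groups from a scatter plot.

```python
def classify(followers, following, ratio=10.0):
    """Label a user by the balance of followers to following.

    `ratio` is an illustrative cutoff, not a value from the study.
    """
    if followers >= ratio * max(following, 1):
        return "broadcaster"           # many followers, follows few
    if following >= ratio * max(followers, 1):
        return "evangelist/miscreant"  # follows many, few follow back
    return "acquaintance"              # roughly along the diagonal


# Made-up (followers, following) pairs.
users = {"radio": (5000, 1), "friend": (120, 110), "spammer": (15, 4000)}
labels = {name: classify(*counts) for name, counts in users.items()}
```

A count-based rule like this cannot tell an evangelist from a spammer; that distinction needs the content or behavior of the account.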

Characterizing User Tweets


Where do tweets come from?


When are people tweeting?


How many tweets do users have?


Where do Tweets come From?

Web includes twitter.com and other unregistered apps

Registered apps are significant, enabled by open API

OS (e.g. twitterrific, twitterwindows)

Browsers (e.g. twitterfox)

OSNs (e.g. Facebook)
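
Tallying tweets by the tool that posted them could look like the sketch below. It assumes each collected tweet carries a source label; the label strings, the `CATEGORY` mapping, and the sample list are assumptions made for illustration.

```python
from collections import Counter

# Illustrative mapping from a tweet's source label to the categories above;
# the label strings and the grouping are assumptions.
CATEGORY = {
    "web": "Web",
    "twitterwindows": "OS",
    "twitterfox": "Browser",
    "facebook": "OSN",
}


def source_shares(sources):
    """Return the fraction of tweets per category; unknown labels go to 'Other'."""
    counts = Counter(CATEGORY.get(s.lower(), "Other") for s in sources)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Made-up sample of source labels.
shares = source_shares(["web", "web", "twitterfox", "Facebook", "im", "web"])
```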

When are People Tweeting?

Steady during the day, drop-off during late night

Tweet length shows no correlation with time of day (not shown)
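
A time-of-day profile only makes sense in each user's local time. One way to build it, assuming each tweet comes with a UTC timestamp and the user's UTC offset in hours, is sketched here; the function names and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def local_hour(utc_time, utc_offset_hours):
    """Hour of day (0-23) in the user's local time, given their UTC offset."""
    return (utc_time + timedelta(hours=utc_offset_hours)).hour


def hourly_activity(tweets):
    """Count tweets per local hour; `tweets` holds (utc_time, utc_offset) pairs."""
    return Counter(local_hour(t, offset) for t, offset in tweets)


# Made-up sample: three tweets sent at the same instant from two offsets.
sent = datetime(2008, 2, 1, 18, 30, tzinfo=timezone.utc)
activity = hourly_activity([(sent, 9), (sent, -8), (sent, -8)])
```

The same instant falls at 03:30 for a GMT+9 user and 10:30 for a GMT-8 user, so counting in UTC would blur the late-night drop-off.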

Number of Tweets per User


Metropolized random walk (M-H) is not biased towards well-connected users, since the random walk roughly avoids topology

M-H looks similar to the full crawl, but shows fewer updates than active users (timeline)

Crawl-recent is a crawl of only those who tweeted during data gathering

Other Properties of Twitter Users


UTC Offsets


Popular users


Geographical spread of users

Comparison of UTC Offsets

More users in the Japan time zone are not captured in crawl compared to timeline

(Users with a UTC offset of GMT+9 align with the .jp domain)

Suggests: use of kanji excludes English-only users
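
Comparing datasets by UTC offset amounts to comparing the share of users at each offset. A minimal sketch, with made-up offsets and hypothetical function names:

```python
from collections import Counter


def offset_shares(offsets):
    """Fraction of users at each UTC offset (in hours)."""
    counts = Counter(offsets)
    total = len(offsets)
    return {offset: n / total for offset, n in counts.items()}


def share_gap(a, b):
    """Per-offset difference in share between two datasets (a minus b)."""
    return {o: a.get(o, 0.0) - b.get(o, 0.0) for o in sorted(set(a) | set(b))}


# Made-up offsets; +9 corresponds to Japan.
timeline = offset_shares([-8, -5, -5, 0, 9, 9, 9, 9])
crawl = offset_shares([-8, -8, -5, -5, -5, 0, 0, 9])
gap = share_gap(timeline, crawl)
largest = max(gap, key=gap.get)  # offset most under-represented in the crawl
```

A large positive gap at one offset points to a community that the crawl, starting from its seeds, rarely reaches.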

Popular Users versus Followers


Choose 250 since that is top 95% of all data sets


Popular users (more than 250 followers) tend to tweet more than those that follow them
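
One possible way to check this claim is to pair each popular user's tweet count with the median tweet count of their followers. The sketch below assumes follower lists and per-user tweet counts are available; the function name, data layout, and sample data are hypothetical, and the study may have made the comparison differently.

```python
from statistics import median

POPULAR_THRESHOLD = 250  # follower cutoff stated on the slide


def popular_vs_followers(tweet_counts, followers):
    """For each popular user, pair their tweet count with their followers' median.

    `tweet_counts` maps user -> number of tweets; `followers` maps
    user -> list of follower names. Followers missing from
    `tweet_counts` are skipped.
    """
    rows = {}
    for user, fans in followers.items():
        if len(fans) <= POPULAR_THRESHOLD:
            continue
        known = [tweet_counts[f] for f in fans if f in tweet_counts]
        if known and user in tweet_counts:
            rows[user] = (tweet_counts[user], median(known))
    return rows


# Made-up data: one popular user with 300 followers, one ordinary user.
fans = [f"fan{i}" for i in range(300)]
tweet_counts = {"star": 900, "joe": 12, **{f: i % 50 for i, f in enumerate(fans)}}
followers = {"star": fans, "joe": fans[:10]}
rows = popular_vs_followers(tweet_counts, followers)
```

The median is used rather than the mean so that a handful of very active followers does not dominate the comparison.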

Geographical Presence of Twitter

Analyze assigned IDs (linear in assignment modulo over time), correlated to time zone, to estimate users in each area

North America has the most users, but Japan, added late, shows quick growth

Related Work


Earlier work looked at Twitter, too [11]

But only a single source (3 used here)

No UTC/geography analysis

No relative popularity analysis

Other OSN work (e.g. Flickr and Yahoo! 360 [12])

Properties of a few users with many connections similar

But Twitter has “broadcasters”


Relative demand on system not similar (tweets small)




Summary


Examined Twitter, a micro-content network

Diversity of access methods (e.g. crawl, timeline)

Found:

Presence of interesting communities (e.g. broadcasters)

Open API leading to many sources (e.g. registered apps)

Communities of users not in crawl (e.g. Japan)