Facebook App: RepCheck


MIS 510 Project Final Report,

Dr. Chen


Justin Frey

John Gastreich

Jeff Jenkins


5/10/2010







Contents

1 Overview
2 Product Description
2.1 Homepage and Posts
2.2 Sentiment Analysis
2.3 Making a Post
2.4 Legislation Bill Sorter
2.5 Representatives
3 Business Model
3.1 Revenues and Profits
3.2 Exit Strategy
3.3 Competitive Analysis
4 Novelty
5 System Design
5.1 Architecture
5.2 APIs
5.3 Database Component
5.4 Data Mining/AI Component
6 Lessons Learned
7 Project Management
Bibliography


1 Overview

Young people want to get and stay informed on political issues, candidates, and local governmental representatives. These young users want to get information in an entertaining and interactive way. They are savvy Internet users who enjoy using Web 2.0 technologies. They want to be entertained and share with their 130 Facebook friends. They also want to feel that their participation is meaningful and that their voices are being heard.

The following are some typical behaviors of Facebook users (Facebook):

• More than 350 million active users (over 100 million in the US)
• 50% of active users log on to Facebook in any given day
• The average user spends more than 55 minutes per day on Facebook
• The average user has 130 friends on the site

Web 2.0 technologies and Social Networking Sites (SNSs) are influencing politics through moments such as Obama Girl, Jib Jab, and the YouTube political debates (Obama Girl, JibJab, YouTube Debates Named Top Political Web Moments By Webbys, 2008). Politicians realize the importance of online communication with young savvy Internet users and are spending more money than ever to capture online audiences. Obama received over $700 million (Federal Election Commission) in campaign funds and spent approximately 4% online. Campaigners in 2010 are expected to spend nearly 10% of funds on the Net (Hart, 2010).

RepCheck has been created to address the need for quick and easy access to political information through the social media and networking tool Facebook. The remainder of this report provides a product description of RepCheck (section 2) along with screen captures of its applications. A business model and competitive analysis for RepCheck have also been established (section 3). Product novelty is explained (section 4) and the system design is detailed (section 5).


2 Product Description

RepCheck was created to capture the attention of social media and networking users and to provide governmental information in an easy-to-use, informative, and interactive format. RepCheck resides as a Facebook application and is therefore accessible through Facebook and Facebook authentication. The following images and descriptions show and explain the unique functionality of RepCheck.



2.1 Homepage and Posts

Upon first accessing RepCheck, the user arrives at the “Posts” screen, as seen in Screen Capture 1 below. The user can view an automated visual analysis of what people are posting in the Posts Word Cloud. The user can quickly recognize which words are being discussed and, from the size of each word, how frequently it is discussed.
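The sizing rule described above can be sketched in a few lines of Python. This is only an illustration, not RepCheck's code: the stop-word list, the pixel range, and the linear scaling are all assumptions, since the report does not say how word sizes are computed.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the report does not say which words are filtered.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "on", "for", "our"}


def keyword_counts(posts):
    """Count how often each non-stop-word appears across all posts."""
    counts = Counter()
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def font_sizes(counts, min_px=12, max_px=48):
    """Map each keyword's frequency linearly onto a font size in pixels."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_px + (n - lo) * (max_px - min_px) / span)
        for word, n in counts.items()
    }


# Made-up posts, used only to exercise the functions.
posts = [
    "Health care reform is overdue",
    "The health care bill helps our economy",
    "Reform the economy",
]
sizes = font_sizes(keyword_counts(posts))
```

Words mentioned most often get the largest size and words mentioned once get the smallest, which is the visual cue the Word Cloud relies on.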

From anywhere in RepCheck, a user is able to access the information on the right-hand side of the screen, which includes the ability to post their own comments, view advertising, and access the latest political news.

Screen Capture 1: Posts (callouts: Make Posts, Advertising, News, Word Cloud)


2.2 Sentiment Analysis

As demonstrated in Screen Capture 2 below, selecting a word from the Word Cloud opens the “Posts” window, which displays the collective sentiment analysis as a pie chart indicating the percentage of positive versus negative postings. Individual comments and the poster’s name and photo from Facebook are also displayed in the Posts window.

The sentiment analysis process used the J48 decision tree algorithm with ten-fold cross-validation. To train the algorithm, previously existing posts from Facebook and Twitter fan clubs and hate clubs were used: 380 Facebook posts and 600 Twitter posts, which achieved a 78.06% accuracy rate. If-else statements were then programmed to reproduce the decision tree generated by the J48 algorithm.
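To illustrate what converting a decision tree into if-else statements looks like, the following Python sketch hand-codes a small tree over word-presence tests and computes the pie-chart percentages. The words, the branch order, and the default leaf are hypothetical; the tree RepCheck actually learned is not given in this report.

```python
import re


def classify_post(text):
    """Return 'positive' or 'negative' using a hand-coded decision tree.

    Each test asks whether a word is present, the way a tree trained on
    word-presence features might branch. The words and their order are
    hypothetical, not the tree RepCheck learned.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    if "hate" in words:
        return "negative"
    if "love" in words:
        return "positive"
    if "support" in words:
        # Negation flips the meaning of an otherwise positive word.
        return "negative" if "not" in words else "positive"
    if "against" in words:
        return "negative"
    return "positive"  # hypothetical default leaf


def sentiment_shares(posts):
    """Percentage of positive and negative posts, as shown in the pie chart."""
    if not posts:
        return {"positive": 0.0, "negative": 0.0}
    labels = [classify_post(p) for p in posts]
    positive = 100.0 * labels.count("positive") / len(labels)
    return {"positive": positive, "negative": 100.0 - positive}


# Made-up posts, used only to exercise the functions.
shares = sentiment_shares([
    "I love this bill",
    "I hate this bill",
    "I do not support it",
    "We support the senator",
])
```

Hard-coding the tree this way keeps classification fast and dependency-free at run time, at the cost of having to regenerate the statements whenever the model is retrained.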

Splice, a sentiment analysis engine, is utilized by RepCheck to determine the sentiment of the posts. Splice is a project currently under development by PhD student Jeff Jenkins in the Center for Management of Information at the Eller College of Management, University of Arizona.

Screen Capture 2: Sentiment Analysis (callouts: Posting Key Word, Poster & Comment, Sentiment of posts)



2.3 Making a Post

RepCheck users are able to post comments from anywhere within RepCheck by clicking the “Be Heard, Make a Difference Post!” button. As demonstrated in Screen Capture 3, clicking this button opens the “Make Post” window. From this point, users can enter their comments and click the “Post” button to submit them. Key words from a user’s comment will be added to the Word Cloud and compiled as part of the sentiment analysis. Additionally, the user’s comments will be visible to other users via the “Posts” window and also on the user’s own wall within their Facebook profile, as shown in Screen Capture 4.

Screen Capture 3: Make Post (callouts: Make Posts, Post Window)

Screen Capture 4: Facebook Wall (callout: Comment added to user’s Facebook wall)


2.4 Legislation Bill Sorter

Federal legislation is associated with one or more of 16 different categories. When the bill sorter is launched, as shown in Screen Capture 5, users can use slider bars to assign a level of importance to the categories of their choosing. The bill sorter then returns a list of bills that meet the criteria the user has entered through the slider bars. The bill sorter uses the Jaccard function to determine which bills to return and assigns a relevance score to each suggested bill.
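The Jaccard function scores two sets by the size of their intersection divided by the size of their union. A minimal Python sketch of how such a score could rank bills follows. The slider threshold, the category names, and the bill titles are assumptions made for illustration; the report does not describe how slider positions are turned into a category set.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_bills(slider_values, bills, threshold=0.5):
    """Score each bill against the categories the user marked as important.

    Assumption: a category counts as "of interest" when its slider value
    (0.0 to 1.0) is at or above `threshold`.
    """
    wanted = {cat for cat, value in slider_values.items() if value >= threshold}
    scored = [(jaccard(wanted, cats), title) for title, cats in bills.items()]
    # Keep only bills that share at least one category, best match first.
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


# Hypothetical slider input and bill data.
sliders = {"Health": 0.9, "Education": 0.6, "Defense": 0.1}
bills = {
    "Bill A": {"Health", "Education"},
    "Bill B": {"Health", "Taxes"},
    "Bill C": {"Defense"},
}
ranked = rank_bills(sliders, bills)
```

Here "Bill A" matches both selected categories exactly and scores 1.0, "Bill B" shares one of three categories in the union and scores about 0.33, and "Bill C" is dropped because it shares none.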

Screen Capture 5: Legislation Bill Sorter (callouts: User Input: Category Importance, Returned Legislation)


2.5 Representatives

As shown in Screen Capture 6, information about the user’s governmental representatives can be located by entering the ZIP code of the geographic area in question. Upon entering the ZIP code, the user is presented with a list of governmental representatives from the national level down to the local level of representation.

Selecting one of the representatives’ names in the list opens a pop-up window that displays more detailed information about the selected representative.

Screen Capture 6: Representatives (callouts: Enter Zip Code, Select Rep’s Name, Detailed Rep Information)


3 Business Model

The business model proposed here is similar to that of PayPal and Facebook. It is a business model that depends on the network effect, where the value of each node increases as the number of nodes increases. Revenues will come from candidates advertising on the site to influence voters.

3.1 Revenues and Profits

Based on the 2009 Operating Results of SNAP Interactive, a publicly traded company that uses a business model similar to RepCheck’s, advertising revenues at roughly 20 million users can be estimated at approximately $0.16 per user per year. Table 1 below shows estimated revenues of such a business at various levels of use.

Table 1: Revenues and Profits

| No. of users | Revenue per user per year (a) | Yearly revenue | Expenses (% of revenue) | Expenses per year | Profit per year |
|---|---|---|---|---|---|
| 500,000 | $0.16 | $80,000 | 100% | $80,000 | $0 |
| 1,000,000 | $0.16 | $160,000 | 95% | $152,000 | $8,000 |
| 5,000,000 | $0.16 | $800,000 | 90% | $720,000 | $80,000 |
| 20,000,000 | $0.16 | $3,200,000 | 85% | $2,720,000 | $480,000 |

a. Based on SNAP Interactive 2009 Operating Results. $3,170,725 / 19M users ≈ $0.167, taken as $0.16 per user per year. http://www.marketwire.com/press-release/SNAP-Interactive-Releases-2009-Operating-Results-1141844.htm


Table 1 also assumes that expenses as a percentage of revenue will decrease from 100% to 85% as the business matures. These numbers are based on the entrepreneurial experience of one of this report’s authors as the CEO of a similar startup company at similar levels of users, revenues, and expenses.

Under the assumptions given, the forecasted profits of this business increase from $0 at 500,000 users to nearly half a million dollars per year at 20 million users. The final figure can be justified by SNAP’s 2008 Operating Results.
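The arithmetic behind Table 1 can be reproduced with a short Python sketch. The revenue-per-user figure and the expense ratios are the ones stated in the table; the function and variable names are illustrative only.

```python
REVENUE_PER_USER = 0.16  # dollars per user per year, from footnote a of Table 1

# (number of users, expenses as a fraction of revenue), as listed in Table 1
SCENARIOS = [
    (500_000, 1.00),
    (1_000_000, 0.95),
    (5_000_000, 0.90),
    (20_000_000, 0.85),
]


def forecast(users, expense_ratio, revenue_per_user=REVENUE_PER_USER):
    """Return (yearly revenue, yearly expenses, yearly profit) in whole dollars."""
    revenue = users * revenue_per_user
    expenses = revenue * expense_ratio
    return round(revenue), round(expenses), round(revenue - expenses)


rows = [forecast(users, ratio) for users, ratio in SCENARIOS]
for (users, _), (revenue, expenses, profit) in zip(SCENARIOS, rows):
    print(f"{users:>10,}  ${revenue:>9,}  ${expenses:>9,}  ${profit:>7,}")
```

Because revenue per user is held constant, profit in this model grows only through more users and a falling expense ratio.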

3.2 Exit Strategy

Next, an exit strategy is considered for each of the user levels. Based on SNAP’s market capitalization of $8.23M at 19M users (about $0.43 per user), it is assumed that RepCheck would be valued at approximately $8.6M at 20 million users. Table 2 below shows that the value of each user at sale time depends on the total number of users, as dictated by the network effect. Therefore, at 500,000 users, it is assumed that the value of each user ($0.10) is only approximately one-quarter of the value of each user at 20 million users ($0.43).


Table 2: Exit Strategy

| No. of users | Per user (b) | Sale price (c) |
|---|---|---|
| 500,000 | $0.10 | $50,000 |
| 1,000,000 | $0.15 | $150,000 |
| 5,000,000 | $0.20 | $1,000,000 |
| 20,000,000 | $0.43 | $8,600,000 |

b. Based on SNAP Interactive’s market cap of $8.23M / 19M users ≈ $0.43 per user (4/27/2010), reduced at lower user counts to reflect the network effect.

c. Sell to online media company, e.g. SNAP Interactive.
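As a quick check of Table 2, the sale prices follow from multiplying each user count by its assumed per-user value. The Python sketch below uses only the figures from the table and its footnotes; the variable names are illustrative.

```python
# Per-user value implied by SNAP Interactive's market cap (footnote b)
snap_value_per_user = 8_230_000 / 19_000_000  # about 0.43 dollars

# (number of users, assumed value per user in dollars), as listed in Table 2
PER_USER_VALUE = [
    (500_000, 0.10),
    (1_000_000, 0.15),
    (5_000_000, 0.20),
    (20_000_000, 0.43),
]

# Sale price in whole dollars at each user level
sale_prices = {users: round(users * value) for users, value in PER_USER_VALUE}
```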


This exit strategy assumes that RepCheck will be able to find a buyer to acquire the
company outright. T
his is highly speculative but not unrealistic. Recent acquisition
activity from SNAP and other companies in this niche market has shown that
companies are taking advantage of acquisitions to strengthen their positions.

3.3 Competitive Analysis

Three competitors to RepCheck have been identified and analyzed. Based on a comparison of the competitors’ products, it has been determined that RepCheck is unique in offering User Sentiment Analysis, Capitol Words Chatter, Legislation Recommendation, and links to relevant political news items.

These items are highlighted in Table 3 below. The main aspect that makes RepCheck unique, as compared to its competitors, is its intelligent analysis. That is to say, RepCheck receives information from the user and applies a decision algorithm to choose and return information that has been customized to the user’s interests.

Table 3: Competitive Analysis

| Feature | RepCheck | Visible Vote | Govtrack | Open Congress |
|---|---|---|---|---|
| User Sentiment Analysis | X | | | |
| Capitol Words Chatter | X | | | |
| Legislation Recommendation | X | | | |
| Facebook App | X | X | | |
| Reps by User’s Region | X | X | | X |
| Representative Biographies | X | X | | X |
| Videos | X | | X | |
| Links to News | X | | | |
| Post to Wall | X | X | | |
| Attractive, Easy to Use Interface | X | | | X |
| Job Approval Ratings | | X | | |
| Discussion Forums | | X | X | |


4 Novelty

RepCheck is uniquely positioning itself by compiling information from multiple sources and integrating a variety of functions into one location. The integrated features and descriptions of each are presented in Table 4 below.


Table 4: Novelty

| Feature | Description |
|---|---|
| Sentiment Analysis | Users post their comments in RepCheck. The sentiment analysis system rates the comment as positive or negative. A pie chart indicates the percentage of negative vs. positive comments for the particular topic. |
| Recommendation System | All federal legislation going through Congress is placed in one or more categories. The Bill Sorter system receives input from users selecting how important a particular category is to them. The bill sorter returns bills that pertain to a category that the user has indicated is of interest to them. |
| Facebook | The social media tool allows users to access applications and the applications to interact with the user’s Facebook profile. RepCheck uses Facebook authentication and also places comments posted within RepCheck on the user’s Facebook wall. |
| Unique APIs | Splice API: Sentiment analysis. Vote Smart API: Collects and shares government data. Capitol Words API: Most common words used at the Capitol. |
| Google Web Toolkit & App Engine | Used to develop the application and data store. The site and data store are hosted in the Google cloud. |
| Visualization | Keywords from the RepCheck community are displayed in a Word Cloud. The size of the keyword indicates the frequency of occurrence of the keyword. A pie chart displays the sentiment analysis and a bar chart shows the keywords used on Capitol Hill. |
| Revenues | Ads for political fundraising & campaigning. |


5 System Design

5.1 Architecture

The RepCheck web application is built using a three-tier model/view/controller architecture. In addition, the application implements two API-based architectures to ensure extensibility/scalability and robustness. The first is the Google App Engine architecture and API, which allows for easy deployment of the web application to the Google App Engine cloud. The Google App Engine cloud allows almost unlimited extensibility, with the first 1,300,000 daily requests and 10,000,000 daily calls to the data store free of charge. The second is the Google Web Toolkit (GWT) APIs, which ensure robustness across browsers. GWT is a Java-based programming API that creates very robust AJAX components.

5.2 APIs

The APIs were chosen to appeal to our target market based on the following criteria: (1) entertaining, (2) ease of use, and (3) interest in politics. The following APIs were implemented in our application:

• Facebook API: integrates with Facebook services to customize content to the user
• Splice API: program used for sentiment analysis
• VoteSmart API: provides access to all political information that we need
• YouTube API: brings in entertaining videos based on political and personal interests of the user
• Google News API: brings in news articles relevant to the user and to politics
• Google Web Toolkit: allows interactive and robust Ajax components
• Google App Engine and JDO API: provides almost unlimited extensibility/scalability and platform independence for the data store
• Capitol Words API: lists the key words used on Capitol Hill each day


5.3 Database Component

The database component of the project focuses on extensibility and platform independence. A Google data store (database) is implemented using Java Data Objects (JDO). JDO allows storage of Java domain model instances in the persistent store (database). JDO was chosen rather than JDBC or EJB because it allows seamless integration with Google App Engine and is easily integrated with most major databases. The database keeps track of user information (e.g., ZIP code, interests, and historical behavioral data).


5.4 Data Mining/AI Component

Hundreds of bills go through the national Congress each year. Any person might have dozens of governmental representatives for whom they vote. Tens of thousands of political comments are posted each day on the Internet. RepCheck helps users make sense of all this information and distill what they need and want to view.

The government data used by RepCheck comes from the Vote Smart API. Vote Smart compiles thousands of records each year and makes the information available through its website and accompanying API. The RepCheck recommendation system uses the Jaccard function, input from the user, and the legislative bill data provided by Vote Smart to recommend bills that would be of interest to the user.

Data used for sentiment analysis training was mined from 380 comments posted on Facebook and 600 posted on Twitter. Comments drawn from fan groups produced words with a mostly positive sentiment, whereas comments collected from hate groups produced words with a mostly negative sentiment.

When users post comments to RepCheck, the comments are analyzed using Splice and the J48 decision tree, which was trained on this data and evaluated with ten-fold cross-validation, to determine whether each comment is positive or negative in nature. Sentiment polarity is then made available through a pie chart on the RepCheck user interface. An accuracy rate of 78.06% has been achieved with this method.
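For readers unfamiliar with ten-fold validation, the Python sketch below shows the general procedure: the labeled posts are split into ten folds, and each fold in turn is held out for testing while the other nine are used for training. It is a plain, unstratified illustration rather than the Weka procedure used in the project, and the `train` and `predict` callables and the majority-class baseline are hypothetical stand-ins for a real classifier. It assumes there are at least as many posts as folds.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train, predict, k=10):
    """Average accuracy over k rounds, each holding out one fold for testing."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Hypothetical baseline: always predict the most common training label.
def train_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# 380 Facebook posts + 600 Twitter posts = 980 training posts in the report.
folds = k_fold_indices(980)
```

With 980 posts, each of the ten folds holds 98 posts, so every post is used for testing exactly once and for training nine times.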


6 Lessons Learned

Throughout the project, each team member was able to gain from the experience of putting the project together. The following are the main lessons learned by all team members.

Working with sentiment analysis, decision algorithms, Weka, and converting decision trees to programming statements was a new process for each of the team members. One team member had prior experience that allowed him to ramp up more quickly. However, the other two members had to spend a fair amount of time learning about these concepts and technologies in order to apply them to the project.

Using the Google Web Toolkit and App Engine to create the data store and host the application in the Google cloud was a great opportunity to learn about these new technologies. It was interesting to learn that the “cloud” was much more accessible and easy to use than was initially thought. The cloud can be easily scaled to match demand and is very cost effective. For this particular project it was free.

Web mining is fascinating and requires creativity to match its powers to real business applications. As the team considered the options for data mining applications, the team found that understanding the power of web mining was only half the battle. Having the creativity to create interesting functionality is just as important. Tools are only as good as their applications for useful purposes.

Building the business case is an essential ingredient to a successful project. Understanding the competitive landscape and business model is important for design and for the attitude of the team.

Team members need to find what they are capable of creating and contributing to the project. One of the main concerns was that Jeff Jenkins would have to do an unfairly large amount of the work considering the programming capabilities of the other two members. The other two members focused on what they could do to make the workload as even as possible by contributing to the project in any way they could.

7 Project Management

The team members played the following roles. Most of the technical work was completed by the team leader, Jeff Jenkins (see Table 5 below).

Table 5: Team Members' Roles

| Team Member | Role |
|---|---|
| Jeff Jenkins (Team Leader) | Sentiment Analysis; API Research & Implementation; GWT/UI Implementation; Database Design/Implementation; Project Management |
| Justin Frey | Datastore Development; API Research & Implementation (YouTube); Decision Algorithm Research & Prototyping; User Interface Design; Competitive Analysis |
| John Gastreich | Jaccard Function Research & Prototyping; API Research & Implementation (Google News); Decision Algorithm Research & Prototyping; Business Case & Business Model |

The team followed the Project Schedule outlined in Table 6 below. Although there was little deviation from the original plan, the team did need to push hard during the last two weeks on implementing the Jaccard function and the sentiment analysis (which was not in the original plan).


Table 6: Project Schedule

| Week | Date | Task(s) | Deliverable(s) |
|---|---|---|---|
| 0 | Jan 11 | Form team | |
| 1 | Jan 18 | Idea generation and research | |
| 2 | Jan 25 | Create project proposal | |
| 3 | Feb 01 | Get acceptance for project, mock-up / storyboards | Project proposal delivered |
| 4 | Feb 08 | Database, data mining, API, and interface research and planning | |
| 5 | Feb 15 | Basic implementation | |
| 6 | Feb 22 | | |
| 7 | Mar 01 | | Version 0.5 completed (internal milestone) |
| 8 | Mar 08 | Add APIs, functionality | |
| | Mar 15 | Spring Break | None |
| 9 | Mar 22 | | |
| 10 | Mar 29 | | |
| 11 | Apr 05 | Present Demo 1.0, get feedback from Dr. Chen | Project Presentation/Demo 1.0 |
| 12 | Apr 12 | Implement suggestions | |
| 13 | Apr 19 | Prepare for final presentation | |
| 14 | Apr 26 | Present Demo 2.0 | Final Presentation/Demo 2.0 |


Bibliography

Facebook. (n.d.). Press Room Statistics. Retrieved January 26, 2010, from Facebook: http://www.facebook.com/press/info.php?statistics

Federal Election Commission. (n.d.). Presidential Campaign Finance. Retrieved January 27, 2010, from Federal Election Commission: http://www.fec.gov/DisclosureSearch/MapAppRefreshCandList.do

Hart, K. (2010, April 3). Google, Facebook prepare for political ad bonanza in midterm elections. Retrieved May 10, 2010, from The Hill: http://thehill.com/blogs/hillicon-valley/technology/90483-google-facebook-prepare-for-political-ad-bonanza-in-mid-term-elections

Obama Girl, JibJab, YouTube Debates Named Top Political Web Moments By Webbys. (2008, October 28). Retrieved January 26, 2010, from CyberJournalist.net: http://www.cyberjournalist.net/obama-girl-jibjab-youtube-debates-named-top-political-web-moments-by-webbys/