Dec 13, 2013

Tweetool (0.1 version)

Final Report

Yilei Qian

Computer Science

University of Southern California

qianyilei.usc@gmail.com


A Twitter Recommendation System Based on Topic Modeling

Ideas


Following too many accounts on Twitter

Too much news every day

Cannot find the interesting and valuable news

Users don't know the names of the accounts they want to follow

Need someone to recommend who to follow

Need someone to recommend the hottest news

Use topic modeling to re-rank all the users

Traditional Method

Topic Modeling


A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.

Widely used in natural language processing.


Reference Papers:

Steyvers, M. and Griffiths, T., “Probabilistic topic models,” Handbook of Latent Semantic Analysis

Blei, D.M., Ng, A.Y., and Jordan, M.I., “Latent Dirichlet Allocation,” The Journal of Machine Learning Research, 2003

Label-Based LDA

Steps:

1. Build the LDA model
2. Train the model instance on the training documents
3. Run LDA on all the data using the trained model instance

Problems:

1. Punctuation marks, e.g. “”,.={}() …
2. Frequent words, e.g. I, you, …
3. Other noise
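
The first two problems above (punctuation marks and frequent words) are commonly handled by cleaning each tweet before it reaches the topic model. The following is a minimal sketch of that idea, not the Tweetool implementation: the class name `TweetCleaner`, the method `clean`, and the short stop-word list are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TweetCleaner {
    // Hypothetical stop-word list (an assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("i", "you", "the", "a", "an", "is", "to", "and", "at"));

    /** Lower-cases the text, drops punctuation, and removes stop words. */
    static List<String> clean(String tweet) {
        // Replace every character that is not a letter, digit, or whitespace with a space.
        String stripped = tweet.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");
        List<String> tokens = new ArrayList<String>();
        for (String token : stripped.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(clean("I saw you at the concert (great music)!"));
    }
}
```

The remaining tokens would then be passed to the model in place of the raw tweet text. Other noise, such as abbreviations, would need additional rules that this sketch does not attempt.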

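Result Generation

1. By Angle

Value = $\cos^{-1}\left( \frac{x \cdot y}{\|x\| \, \|y\|} \right)$

2. By Distance

Value = $\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

where $x$ and $y$ are the two topic vectors being compared and $n$ is the number of topic dimensions.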
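
The two measures above, the angle between two vectors and the Euclidean distance between them, can be sketched in a few lines of Java. This is an illustrative sketch only: the class name `TopicSimilarity` and the parameter names `x` and `y` are assumptions, and the clamping step is an added safeguard against floating-point rounding rather than something stated in the slides.

```java
final class TopicSimilarity {
    /** Angle in radians between two equal-length vectors: acos(x.y / (|x| |y|)). */
    static double angle(double[] x, double[] y) {
        checkSameLength(x, y);
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        if (normX == 0.0 || normY == 0.0) {
            throw new IllegalArgumentException("angle is undefined for a zero vector");
        }
        double cosine = dot / (Math.sqrt(normX) * Math.sqrt(normY));
        // Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        return Math.acos(Math.max(-1.0, Math.min(1.0, cosine)));
    }

    /** Euclidean distance: sqrt(sum over i of (x_i - y_i)^2). */
    static double distance(double[] x, double[] y) {
        checkSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static void checkSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }
}
```

With either measure, a smaller value means the two vectors are more similar. The angle ignores vector length, while the distance does not.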

13-Dimension Topics

1. Art & Design
2. Book
3. Business
4. Charity
5. Entertainment
6. Family
7. Fashion
8. Food & Drink
9. Health
10. Music
11. News
12. Science & Technology
13. Sports
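
One plausible way to use the 13 topics is to treat each user as a 13-dimensional vector, one weight per topic, and to re-rank candidate users by their angle to an interest vector. The sketch below illustrates that idea under stated assumptions: the class `TopicRanker`, the helper `vector`, and the example weights are hypothetical, and the slides do not confirm that Tweetool ranks users exactly this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class TopicRanker {
    /** The 13 topic labels from the list above; ordinal() is used as the vector index. */
    enum Topic {
        ART_DESIGN, BOOK, BUSINESS, CHARITY, ENTERTAINMENT, FAMILY, FASHION,
        FOOD_DRINK, HEALTH, MUSIC, NEWS, SCIENCE_TECHNOLOGY, SPORTS
    }

    /** Hypothetical helper: a 13-dimensional vector with one non-zero topic weight. */
    static double[] vector(Topic topic, double weight) {
        double[] v = new double[Topic.values().length];
        v[topic.ordinal()] = weight;
        return v;
    }

    /** Angle in radians between two non-zero vectors of equal length. */
    static double angle(double[] x, double[] y) {
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        if (normX == 0.0 || normY == 0.0) {
            throw new IllegalArgumentException("angle is undefined for a zero vector");
        }
        double cosine = dot / (Math.sqrt(normX) * Math.sqrt(normY));
        return Math.acos(Math.max(-1.0, Math.min(1.0, cosine)));
    }

    /** Returns user names sorted by ascending angle to the interest vector. */
    static List<String> rank(final Map<String, double[]> users, final double[] interest) {
        List<String> names = new ArrayList<String>(users.keySet());
        Collections.sort(names, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Double.compare(angle(users.get(a), interest),
                                      angle(users.get(b), interest));
            }
        });
        return names;
    }
}
```

For example, given an interest vector weighted entirely on Music, a user whose vector is concentrated on Music would be ranked ahead of one split between Music and Sports, who in turn would be ranked ahead of one concentrated on Sports.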

Languages & Tools

Web UI: HTML + AJAX (unfinished) + CSS (unfinished) + Twitter REST API

Android UI: Java, Android 2.1 (unfinished)

Server Side: Java 1.6, Servlet 2.0, Spring 3.0, Hibernate 3.3

Twitter API: Twitter4j 2.2.1 (300 requests per hour)

Server: Tomcat 7.08

Database: MySQL 5.5

Data Package: JSON

Development Platform: Eclipse 3.4

Total code lines: 2000(+) + 2421 + 462 = 5000(+)

Subversion: http://tweetool-yilei.googlecode.com/svn/trunk/tweetool-yilei-read-only
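
The Twitter API entry above notes a limit of 300 requests per hour, so a crawler has to keep track of how much of that budget it has used. The following is a minimal sketch of a fixed-window request counter, assuming the limit resets once per hour. The class `RequestBudget` and its methods are hypothetical; the sketch deliberately does not call Twitter4j, and the real reset behaviour of the Twitter API may differ.

```java
final class RequestBudget {
    private final int maxRequests;
    private final long windowMillis;
    private long windowStart;
    private int used;

    /** nowMillis is passed in explicitly so the class is easy to test without a real clock. */
    RequestBudget(int maxRequests, long windowMillis, long nowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.windowStart = nowMillis;
        this.used = 0;
    }

    /** Consumes one request and returns true if the current window still has budget. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // The previous window has elapsed: start a new one with a full budget.
            windowStart = nowMillis;
            used = 0;
        }
        if (used < maxRequests) {
            used++;
            return true;
        }
        return false;
    }

    /** Milliseconds until the current window resets (0 if it has already elapsed). */
    synchronized long millisUntilReset(long nowMillis) {
        return Math.max(0L, windowStart + windowMillis - nowMillis);
    }

    public static void main(String[] args) {
        // 300 requests per hour, as stated in the tool list.
        RequestBudget budget = new RequestBudget(300, 60L * 60L * 1000L, System.currentTimeMillis());
        System.out.println(budget.tryAcquire(System.currentTimeMillis()));
    }
}
```

A fetch loop would call `tryAcquire` before each API request and, when it returns false, wait for `millisUntilReset` before trying again.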


Architecture

[Diagram: Twitter fetch, LLDA, Tweetool, Hibernate DAO, DB, Work Flow, Servlets, APPLICATION CONTEXT, HTML, Mobile Device]

Distributed Crawler & Computing

Problems (endless T_T)

1. High noise in the topic model: few words, odd marks, abbreviations
2. Unfamiliar with the Twitter API; a lot of bugs
3. Transaction problems
4. The ugly UI
5. Poor performance
6. Not enough time; many functions are unfinished
7. The Tweetool system should be reconstructed!!!

Environment: 7000+ users, 220,000+ tweets

Future Work

1. Try to finish it
2. Debug
3. Build a better training file
4. Add a feedback function
5. Better topic classification



Web UI (Design Version)

Android UI

[Mockup: a Main Menu screen with a title and four function buttons, and a News Menu screen with a title and a list of news items]