Text Retrieval and Data Mining in SI - An Introduction


20 Nov 2013


2010 © University of Michigan



Qiaozhu Mei

School of Information

Computer Science and Engineering

University of Michigan

qmei@umich.edu



Challenge of Data Mining


Published content: 3-4 GB/day
User-generated data: 8-10 GB/day
Private text data: 3 TB/day

Source: Ramakrishnan and Tomkins 2007


What do We Do in this Battle?


Crowd: Social networks, Online communities, Academic networks, Information networks

Context: time, location, authorship, sentiments, impact, event

Content: Query logs, Social bookmarks, Scientific Literature, News articles, blogs, tweets, Web pages, Social media, EHR

Topics, User

Research areas: Contextual Text Mining, Social Data Mining, Information Retrieval, Social Network Mining, Health Informatics, Bioinformatics, Statistical Topic Modeling, Web Search


Personalization vs. Diversification


MSR

PageRank: Mountain Safety Research +; MSR Tents +; MSR Wheels +; Microsoft Research; ?

Personalized Rank: Microsoft Research +; Microsoft Research Redmond +; Microsoft Research Asia …; ?

Diverse Rank: Mountain Safety Research +; Microsoft Research +; Metropolis Street Racer …; ?

Joint work with Jian Guo, Qian Zhen
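The three rankings above can be illustrated with a small sketch, assuming a toy graph of related queries for "MSR". The graph, the group labels, the penalty value, and the helpers `pagerank`, `top_k`, and `diversify` are all hypothetical. Personalization is shown as standard personalized PageRank (the random jump is biased toward nodes a user cares about); diversification is shown as a simple greedy redundancy penalty, a generic stand-in and not necessarily the method of the joint work cited on the slide.

```python
# Hypothetical toy data: edges stand for "related query" links and the group
# labels are invented for illustration; neither comes from the slide.
GRAPH = {
    "Mountain Safety Research": ["MSR Tents", "MSR Wheels"],
    "MSR Tents": ["Mountain Safety Research"],
    "MSR Wheels": ["Mountain Safety Research"],
    "Microsoft Research": ["Microsoft Research Redmond", "Microsoft Research Asia"],
    "Microsoft Research Redmond": ["Microsoft Research"],
    "Microsoft Research Asia": ["Microsoft Research"],
    "Metropolis Street Racer": ["Mountain Safety Research"],
}
GROUPS = {
    "Mountain Safety Research": "outdoor", "MSR Tents": "outdoor",
    "MSR Wheels": "outdoor", "Microsoft Research": "microsoft",
    "Microsoft Research Redmond": "microsoft",
    "Microsoft Research Asia": "microsoft", "Metropolis Street Racer": "game",
}


def pagerank(graph, damping=0.85, teleport=None, iters=100):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}.

    teleport=None: the random surfer jumps uniformly (plain PageRank).
    teleport={node: weight}: jumps land only on those nodes (personalized PageRank).
    """
    nodes = sorted(graph)
    if teleport is None:
        teleport = {v: 1.0 for v in nodes}
    total = sum(teleport.get(v, 0.0) for v in nodes)
    jump = {v: teleport.get(v, 0.0) / total for v in nodes}
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) * jump[v] for v in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                for v in out:
                    new[v] += damping * rank[u] / len(out)
            else:
                # Dangling node: redistribute its mass along the jump vector.
                for v in nodes:
                    new[v] += damping * rank[u] * jump[v]
        rank = new
    return rank


def top_k(scores, k):
    """The k highest-scoring items, ties broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


def diversify(scores, groups, k, penalty=0.1):
    """Greedy re-ranking: an item's score is multiplied by penalty**c, where c
    counts already-chosen items from the same group (hypothetical scheme)."""
    chosen, covered = [], {}
    remaining = sorted(scores)
    while remaining and len(chosen) < k:
        # max() keeps the first maximum, so ties go to the alphabetically first item.
        best = max(remaining,
                   key=lambda v: scores[v] * penalty ** covered.get(groups[v], 0))
        chosen.append(best)
        remaining.remove(best)
        covered[groups[best]] = covered.get(groups[best], 0) + 1
    return chosen


pr = pagerank(GRAPH)
ppr = pagerank(GRAPH, teleport={"Microsoft Research": 1.0})
print("PageRank:", top_k(pr, 3))
print("Personalized:", top_k(ppr, 3))
print("Diversified:", diversify(pr, GROUPS, 3))
```

With these toy numbers, plain PageRank puts Mountain Safety Research first, the Microsoft-biased run fills its top three with Microsoft Research entries, and the greedy re-ranking returns one result per group, mirroring the pattern on the slide.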


Topic Evolution and Trends

What’s hot in literature/Twitter?

(Figure: Hot Topics in SIGMOD)
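One simple way to ask "what's hot" is to compare how large a share of each period's tokens a term takes up, then rank terms by how much that share grew. The sketch below assumes hypothetical toy documents and hypothetical helpers (`term_shares`, `rising_terms`); it is a plain frequency-based stand-in, not a statistical topic model.

```python
from collections import Counter


def term_shares(docs):
    """Map each period to {term: share of that period's tokens}."""
    counts = {}
    for period, text in docs:
        counts.setdefault(period, Counter()).update(text.lower().split())
    shares = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        shares[period] = {term: c / total for term, c in counter.items()}
    return shares


def rising_terms(docs, earlier, later, k=3):
    """Terms whose token share grew most from `earlier` to `later`."""
    shares = term_shares(docs)
    old, new = shares.get(earlier, {}), shares.get(later, {})
    delta = {t: new.get(t, 0.0) - old.get(t, 0.0) for t in set(old) | set(new)}
    return sorted(delta, key=lambda t: (-delta[t], t))[:k]


# Hypothetical toy titles, not real SIGMOD or Twitter data.
DOCS = [
    (2005, "xml query processing"),
    (2005, "xml indexing"),
    (2009, "stream processing"),
    (2009, "stream query"),
]
print(rising_terms(DOCS, 2005, 2009))
```

Comparing shares rather than raw counts keeps a period with more documents from looking "hotter" across the board.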



Modeling Spatiotemporal Topic Diffusion

How does discussion spread?

Topic = “government response in Hurricane Katrina”

(Figure panel: One Week Later)
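A minimal way to see how a topic spreads is to count matching posts per region and week, then note when each region first picks the topic up. Everything here (the `topic_coverage` and `first_week` helpers, the region codes, the keywords, and the posts) is hypothetical, and keyword matching is only a crude stand-in for a probabilistic spatiotemporal topic model.

```python
from collections import defaultdict


def topic_coverage(posts, keywords, week_len=7):
    """Count posts mentioning any keyword, per (region, week index)."""
    keywords = set(keywords)
    grid = defaultdict(int)
    for region, day, text in posts:
        if set(text.lower().split()) & keywords:
            grid[(region, day // week_len)] += 1
    return dict(grid)


def first_week(grid):
    """Earliest week in which each region discusses the topic."""
    first = {}
    for region, week in grid:
        first[region] = min(week, first.get(region, week))
    return first


# Hypothetical posts: (region, day number, text).
POSTS = [
    ("LA", 0, "levee breach in new orleans"),
    ("TX", 2, "fema response criticized"),
    ("NY", 9, "government response hearing"),
    ("NY", 10, "weather today"),
]
KEYWORDS = {"fema", "levee", "response"}  # hypothetical topic keywords

grid = topic_coverage(POSTS, KEYWORDS)
print(first_week(grid))
```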


Summarizing and Tracking Opinions

What is good and what is bad?

Blogs; customer reviews

Example: The Da Vinci Code

"Tom Hanks, who is my favorite movie star act the leading role."

"protesting... will lose your faith by watching the movie."

"a good book to past time."

"... so sick of people making such a big deal about a fiction book"
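The "what is good and what is bad" question can be sketched with a tiny lexicon-based polarity classifier applied to the example snippets above. The word lists `POSITIVE` and `NEGATIVE` and the helpers `polarity` and `summarize` are hypothetical, and the lexicon is deliberately tailored to these four snippets; a real system would need a much larger lexicon or a learned model, and this sketch does not attempt the tracking-over-time part.

```python
import re

# Hypothetical toy lexicon, tailored to the example snippets.
POSITIVE = {"good", "favorite", "great", "love"}
NEGATIVE = {"sick", "lose", "protesting", "bad"}


def polarity(text):
    """Label a snippet by (#positive words - #negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(snippets):
    """Group snippets by polarity: a minimal 'good vs. bad' summary."""
    summary = {"positive": [], "negative": [], "neutral": []}
    for s in snippets:
        summary[polarity(s)].append(s)
    return summary


SNIPPETS = [
    "Tom Hanks, who is my favorite movie star act the leading role.",
    "protesting... will lose your faith by watching the movie.",
    "a good book to past time.",
    "... so sick of people making such a big deal about a fiction book",
]
for label, items in summarize(SNIPPETS).items():
    print(label, len(items))
```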


Topical Community Detection

Who works together on what?

Social/Academic Network + Text Content

Communities: Information retrieval community; Machine learning community; Data mining community
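A crude heuristic for "who works together on what" keeps a co-author link only when the two authors' vocabularies overlap enough, then reads communities off the connected components. The author IDs, term sets, edges, threshold, and the helpers `jaccard` and `topical_communities` are hypothetical; this is an illustrative stand-in, not the detection method behind the slide.

```python
def jaccard(a, b):
    """Overlap of two term sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topical_communities(coauthor_edges, author_terms, threshold=0.3):
    """Connected components of the co-author graph, keeping only edges whose
    endpoints' term sets have Jaccard similarity >= threshold."""
    parent = {a: a for a in author_terms}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in coauthor_edges:
        if jaccard(author_terms[u], author_terms[v]) >= threshold:
            parent[find(u)] = find(v)
    groups = {}
    for a in author_terms:
        groups.setdefault(find(a), set()).add(a)
    return sorted(groups.values(), key=sorted)


# Hypothetical authors (network) and their paper vocabulary (text content).
AUTHOR_TERMS = {
    "ir1": {"retrieval", "ranking", "query"},
    "ir2": {"retrieval", "query", "feedback"},
    "ml1": {"learning", "kernel", "svm"},
    "ml2": {"learning", "svm", "boosting"},
    "dm1": {"mining", "patterns", "learning"},
}
EDGES = [("ir1", "ir2"), ("ml1", "ml2"), ("ml2", "dm1"), ("ir2", "ml1")]

print(topical_communities(EDGES, AUTHOR_TERMS))
```

Raising the threshold splits off collaborators whose topics only loosely overlap (here `dm1`), while the purely cross-topic link `ir2`-`ml1` is never kept.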


Thanks!


Joint work with Cheng Zhai, Ken Church, Bruce Schatz, Ravi Kumar, Andrew Tomkins, Denny Zhou, Jian Guo, Qian Zhen, Xu Ling, Duo Zhang, Deng Cai, Dong Xin, Chao Liu ...