VISA: A VIsual Sentiment Analysis System


Dec 13, 2013




Sept. 2012

Dongxu Duan (1), Weihong Qian (1), Shimei Pan (2), Lei Shi (3), Chuang Lin (4)

(1) IBM Research - China
(2) IBM T. J. Watson Research Center
(3) Institute of Software, Chinese Academy of Sciences
(4) Tsinghua University

What is Sentiment Analysis

"Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials."
---- From Wikipedia






* A survey of sentiment analysis work by Pang and Lee (2008), “Opinion mining and sentiment analysis”, is cited 1189 times in Google Scholar and itself includes 326 references

* Probably the earliest study:

Motivation

The truth: sentiment analysis is becoming even more important

Corporate
* Brand analysis, sales campaign design, etc.
* Crisis relationship management

Government
* As we all know ..

Observations:
* Sentiment analysis technologies are going deeper and becoming more versatile: aspect-oriented analysis, domain-specific lexicon expansion, MT technology
* Average users are still leveraging rather simple sentiment results
* It is hard for them (even domain experts) to understand sophisticated SA results
* There is a big gap and huge potential for sentiment visualization (visual opinion mining)

Agenda

* Related Works
* Research Problem and Challenges
* Sentiment-Tuple Based Data Model
* VISA System Framework
* Visualization Optimizations
* Cases
* User Studies
* Summary





Basic Sentiment Representation
Raw text/table or simple visualization
Brand Association Map

COBRA (COrporate Brand and Reputation Analysis)
Behal et al. (HCI 2009)

Opinion Observer
Liu et al. (KDD 2005); Liu et al. (IW3C2 2005)

Visual Sentiment Analysis of RSS News Feeds
Wanner et al. (VISSW 2009)

Pulse: Mining Customer Opinions from Free Text
Gamon et al. (IDA 2005)

Visualizing Sentiments in Financial Texts
Ahmad and Almas (IV 2005)

Visual Analysis of Conflicting Opinions
Chen et al. (VAST 2006)

Who Votes For What? A Visual Query Language for Opinion Data
Draper and Riesenfeld (Vis 2008)

Visual Opinion Analysis of Customer Feedback Data
Summary Report of printers
Scatterplot of customer reviews on printers
Circular Correlation Map
Oelke et al. (VAST 2009)

OpinionSeer: Interactive Visualization of Hotel Customer Feedback
Wu et al. (InfoVis 2010)

Taking the Pulse of the Web: Assessing Sentiment on Topics in Online Media
Brew et al. (WebSci 2010)

Understanding Text Corpora with Multiple Facets
Shi et al. (VAST 2010)

Research Problem

Can we design a sentiment visualization system that:
* Shows how the sentiment evolves over time (trend)
* Visualizes both the sentiment analysis results and the structured facet data, e.g. the profile of the reviewer (facet)
* Rather than only showing which document or feature tends to be positive or negative, also demonstrates how the positives/negatives are described in documents (context)

Most existing sentiment visualizations fail to meet all the requirements simultaneously

Our VISA design is based on the TIARA prototype, which already brings together most features (trend, context, facet switching)

Retrospect on TIARA Visualization (Emergency Room Record)

Challenges for TIARA Sentiment Visualization

* Failure of the document trend visualization
  * Binary/ternary/scored classification of document-level sentiments will drop valuable pieces
  * BUT: It has BED BUGS and they BITE me!!!

Challenges for TIARA Sentiment Visualization

* Keyword Summarization
  * The content visualized consists of keywords summarized from all the text, which does not reflect the sentiment-centric design
* Structured Facet
  * Sentiment-aware facet associations and distributions
  * Spatial (location) information
* Comparison
  * Categorical and temporal comparison, and sentiment comparison as well
* Compatibility with sentiment analysis engines
  * Consumability of all kinds of sentiment analysis results


Sentiment Tuple

{Aspect, feature, opinion, polarity}

* Aspect: a sub-topic shared by some documents
  * In a hotel review: the room, the view, or the service
* Feature: the specific object the users are commenting on
  * An entity, person, location, or abstract concept
* Opinion: a particular word or phrase describing a feature
* Polarity: the polarity of the opinion word/phrase in the context
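As a minimal sketch of the tuple defined above, the four fields can be held in a small immutable record. The class name, the integer encoding of polarity, and the example values are assumptions made for illustration; the slides do not specify how VISA stores its tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentTuple:
    """One {aspect, feature, opinion, polarity} record (hypothetical layout)."""
    aspect: str    # sub-topic shared by some documents, e.g. "view"
    feature: str   # specific object being commented on
    opinion: str   # word or phrase describing the feature
    polarity: int  # assumed encoding: +1 positive, -1 negative, 0 neutral


# Hypothetical tuple extracted from a hotel review sentence.
example = SentimentTuple(aspect="view", feature="harbor",
                         opinion="stunning", polarity=1)
print(example)
```

Freezing the record makes tuples hashable, so they can be counted or used as dictionary keys when they are aggregated later.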






[Figure: documents pass through the Sentiment Analysis Model, which outputs lists of "aspect: feature: opinion: polarity" tuples; an Aggregate step combines them into entries such as { “view”, + }]
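The Aggregate step in the diagram can be sketched as a simple count of polarity per aspect. This is an illustrative assumption: the slides only show an aggregated entry of the form { “view”, + } and do not describe the actual aggregation rule, and the tuples below are made up.

```python
from collections import Counter

# Hypothetical tuples in (aspect, feature, opinion, polarity) order.
tuples = [
    ("view", "harbor", "stunning", "+"),
    ("view", "harbor", "great", "+"),
    ("room", "bed", "dirty", "-"),
]


def aggregate_by_aspect(sentiment_tuples):
    """Count how often each (aspect, polarity) pair occurs."""
    return Counter(
        (aspect, polarity)
        for aspect, _feature, _opinion, polarity in sentiment_tuples
    )


counts = aggregate_by_aspect(tuples)
print(counts[("view", "+")])
```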

Keyword Summarization (TIARA)

* A set of topics: $\{T_1, \dots, T_i, \dots, T_N\}$
* A set of keywords: $\{W_1, \dots, W_j, \dots, W_M\}$
* A set of topic probabilities: $\{\dots, P(T_i \mid D_k), \dots\}$, where $D_k$ is the $k$th document in the collection
* A set of word probabilities: $\{\dots, P(W_j \mid T_i), \dots\}$
* Rank the topics to present the most valuable ones first
* Select a keyword subset for each time segment for content summary: $\{\dots\}_{t-1}, \{\dots, W_j, \dots\}_t, \{\dots\}_{t+1}, \dots$
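The two ranking steps above can be sketched as follows. The slides do not state which criterion TIARA uses to rank topics, so this sketch assumes ranking by total probability mass over the collection; the function names and the sample probabilities are hypothetical, and the per-time-segment selection is left out.

```python
def rank_topics(topic_given_doc):
    """topic_given_doc[k][i] holds P(T_i | D_k).

    Ranks topic indices by summed probability over all documents
    (an assumed criterion, not taken from the slides).
    """
    n_topics = len(topic_given_doc[0])
    mass = [sum(doc[i] for doc in topic_given_doc) for i in range(n_topics)]
    return sorted(range(n_topics), key=lambda i: mass[i], reverse=True)


def top_keywords(word_given_topic, topic, k):
    """word_given_topic[i] maps keyword W_j to P(W_j | T_i)."""
    probs = word_given_topic[topic]
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Hypothetical model output for 3 documents, 2 topics.
topic_given_doc = [[0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
word_given_topic = [
    {"pain": 0.5, "fever": 0.3, "cough": 0.2},
    {"room": 0.6, "view": 0.3, "staff": 0.1},
]

ranking = rank_topics(topic_given_doc)
print(ranking, top_keywords(word_given_topic, ranking[0], 2))
```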

VISA Sentiment Keyword Summarization

* Aspects/Hotels: $\{C_1, \dots, C_i, \dots, C_N\}$
* A set of sentiment keywords (opinions/features): $\{W_1, \dots, W_j, \dots, W_M\}$
* A set of topic probabilities: $\{\dots, P(T_i \mid D_k), \dots\}$, where $D_k$ is the $k$th document in the collection
* A set of word probabilities: $\{\dots, P(W_j \mid T_i), \dots\}$
* Let the user select whether to compare aspects of a hotel or an aspect of several hotels
* Select a keyword subset for each time segment for sentiment summary: $\{\dots\}_{t-1}, \{\dots, W_j, \dots\}_t, \{\dots\}_{t+1}, \dots$
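A minimal sketch of selecting sentiment keywords per time segment for one chosen aspect is shown below. It assumes plain frequency counting as the selection rule, which the slides do not confirm; the record layout, the function name, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (time_segment, aspect, sentiment_keyword).
records = [
    ("2011-06", "room", "clean"),
    ("2011-06", "room", "clean"),
    ("2011-06", "room", "noisy"),
    ("2011-07", "room", "small"),
    ("2011-07", "view", "stunning"),
]


def keywords_per_segment(records, aspect, k):
    """Return the k most frequent sentiment keywords of one aspect per segment."""
    per_segment = defaultdict(Counter)
    for segment, rec_aspect, keyword in records:
        if rec_aspect == aspect:
            per_segment[segment][keyword] += 1
    return {
        segment: [word for word, _ in counts.most_common(k)]
        for segment, counts in sorted(per_segment.items())
    }


summary = keywords_per_segment(records, "room", 1)
print(summary)
```

Comparing one aspect across several hotels would follow the same pattern with a hotel identifier in place of the aspect.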

VISA Mashup Visualization

[Figure: annotated VISA interface. Labels: Sentiment Tuple Trend, Facet Correlations, Sentiment Snippets, Search, Sentiment-Centric Document Ranking, Filters]

VISA Sentiment Visualization Framework

* Offline:
  * Document pre-processing
  * Sentiment analysis
  * Metadata parsing
  * Indexing
* Online:
  * Data retrieval
  * Visualization
  * Interactions
Offline Analysis

[Figure: offline analysis pipeline. Labels: Raw Data, Reader, Extractor, StatisticManager, Dictionary, IndexWriter, Index, Meta Data, Sentiment Data, Segment Extractor, Sentence Extractor, Text Extractor, Entity Policy, Filter, OpenNLP (Sentiment, Entity Class, No/Not); output tuples of the form aspect: feature: opinion: polarity]

Data Analysis Framework

[Figure: offline analysis with an external engine. Labels: Raw Data, Reader, 3rd Party Sentiment Analysis Framework, IndexWriter, Index, Meta Data, Sentiment Data; output tuples of the form aspect: feature: opinion: polarity]

Data Server

[Figure: data server architecture. Labels: Query Parser, Data Retrieval, Lucene, Hermes, Index, HttpServlet, VISA, Data Adapter]

Sentiment Trend Optimizations

* Sentiment-tuple based negative/positive/(neutral) trends

[Figure: positive and negative sentiment trends. Y axis: sentiment value; X axis: time]
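A sentiment-tuple based trend can be sketched by counting positive and negative tuples per time bin, giving the two series plotted against time. The counting rule, the polarity encoding (+1/-1/0), and the sample observations are assumptions; the slides do not define how the sentiment value on the Y axis is computed.

```python
from collections import defaultdict


def sentiment_trend(observations):
    """observations: iterable of (time_bin, polarity), polarity in {+1, 0, -1}.

    Returns {time_bin: (positive_count, negative_count)}, sorted by time bin.
    Neutral tuples (polarity 0) are ignored in this sketch.
    """
    trend = defaultdict(lambda: [0, 0])
    for time_bin, polarity in observations:
        if polarity > 0:
            trend[time_bin][0] += 1
        elif polarity < 0:
            trend[time_bin][1] += 1
    return {t: tuple(c) for t, c in sorted(trend.items())}


# Hypothetical tuple polarities binned by month.
observations = [("03", 1), ("03", -1), ("03", 1), ("04", -1)]
trend = sentiment_trend(observations)
print(trend)
```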

Time-Sensitive Feature/Opinion Words

Sentiment-Centric Interactions

Case Study ---- Summarizing Hotel Reviews

* Initial view
* Switch to the “Family” type only (traveling in this type)
* Click on the “Free” sentiment word (want to enjoy the free time or free breakfast?)
  * It’s 30 min distance from the harbor!
* For two selected hotels
  * Drill down to the “cleanliness” and “room” aspects
  * Switch to the negative sentiments
* Comparing the recent reviews

Case Study ---- NFL on Twitter

* Tweets on the topic of the National Football League (NFL) were crawled from Twitter, from 03/2011 to 08/2011 (when the famous lockout happened)
* 665360 tweets from 307973 users, with an average length of 16.8 words
* Tweet collection pre-processing:
  * Classify into 5 content topics: “season play”, “player draft”, “lockout bad”, “lockout end” and “football return”
  * Categorize according to the subject of the sentiments: 32 NFL teams, by manually creating a relevant subject keyword list for each team (full name/nickname, city, stadium, head coach, owner and superstars)
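The keyword-list categorization can be sketched as a case-insensitive substring match against a per-team list. The lists below are short illustrative samples written for this example, not the authors' manually built lists, and the function name is hypothetical.

```python
# Illustrative keyword lists for two of the 32 teams (not the authors' lists).
TEAM_KEYWORDS = {
    "Green Bay Packers": ["packers", "green bay", "lambeau"],
    "Pittsburgh Steelers": ["steelers", "pittsburgh", "heinz field"],
}


def subjects_of(tweet, team_keywords=TEAM_KEYWORDS):
    """Return the teams whose keyword list matches the tweet text."""
    text = tweet.lower()
    return [
        team
        for team, keywords in team_keywords.items()
        if any(keyword in text for keyword in keywords)
    ]


print(subjects_of("Packers fans are tired of the Favre news"))
```

Plain substring matching is easy to audit but can over-match (a city name may refer to something other than the team), so a real pipeline would likely need additional filtering.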




* Overview of sentiments on content topics
  * Reaches a peak in July, when the new CBA was signed
* Subject-comparing view on 4 NFL teams: “Green Bay Packers”, “Pittsburgh Steelers”, “New York Jets”, “New England Patriots”
  * A very large RED “CBA” for the Steelers: the only team to vote “NO” to the CBA
  * “Brett Favre” for the Packers: the former NFL all-star quarterback of the Packers, who has claimed several times that he would return. The fans are tired of this kind of news.

User Study ---- Setup

* Subject
  * VISA System with all functionalities
  * TripAdvisor.com
  * A plain text editor with search function
* Data
  * HK hotel cases with 3 hotels’ reviews
  * Both structured (ratings) and unstructured (review comments) data inputs
* User
  * 12 users (7 male, 5 female), age 26~35
  * Each is given a gift as an incentive
* Task
  * T1: look up specific sentiment-related information of a hotel (e.g. travelers’ ratings)
  * T2: summarize opinions on a general aspect of a hotel (e.g. the view of a hotel)
* Procedure
  * Within-subject design: each user performs all tasks with all the systems
  * Record user demographics, time of completion, satisfaction, and answers to open-ended questions

[Screenshots: TripAdvisor, Text Editor, VISA]

User Study ---- Objective Results

Three metrics: elapsed time (in minutes), task completion rate, and task correctness.

System        Time (min)   Completion   Correctness
VISA          1.66         1            0.75
TripAdvisor   2.94         0.81         0.42
TextEditor    2.69         0.86         0.67

Significant advantages of VISA over the compared systems (t-test significance p < 0.004~0.034)

User Study ---- Subjective Results

Three metrics: usefulness, usability, and satisfaction.

Subjective Evaluation Results

System        Usefulness   Usability   Satisfaction
VISA          4.58         4.08        4.29
TripAdvisor   2.46         2.67        2.38
TextEditor    2.5          2.33        2.17

User Study ---- Open Surveys

* Why VISA is thought better than the baseline systems: “mash-up visualizations” and “rich interactions”
  * “Mash-up visualizations provide more information and it’s quite intuitive”, “rich interactions make it easy to search what I want to know”
* Improvements to VISA: “it now needs some learning efforts to use VISA”, “It could introduce better UI design and richer interactions”.

Summary

* We have presented the VISA system for generic sentiment visualization purposes
* The backend core is the new sentiment-tuple definition, as well as the faceted data model
* In visualization, we introduce several critical optimizations over TIARA in sentiment visualization scenarios: sentiment-tuple based trending, sentiment keywords, comparison, sentiment in document context, interactions
* Evaluated with two real-life case studies
* Conducted a formal user study to compare with two baseline systems and demonstrate the clear advantage

Thank You / Merci / Grazie / Gracias / Obrigado / Danke
