Sentiment Analysis

Oct 24, 2013

Sentiment Analysis

An Overview of Concepts and
Selected Techniques

Terms


Sentiment


A thought, view, or attitude, especially one
based mainly on emotion instead of reason


Sentiment Analysis



aka opinion mining


use of natural language processing (NLP) and
computational techniques to automate the
extraction or classification of sentiment from
typically unstructured text

Motivation


Consumer information


Product reviews


Marketing


Consumer attitudes


Trends


Politics


Politicians want to know voters’ views


Voters want to know politicians’ stances and who else
supports them


Social


Find like-minded individuals or communities

Problem


Which features to use?


Words (unigrams)


Phrases/n-grams


Sentences


How to interpret features for sentiment
detection?


Bag of words (IR)


Annotated lexicons (WordNet, SentiWordNet)


Syntactic patterns


Paragraph structure
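
The word and n-gram features listed above can be sketched in a few lines. This is a minimal illustration, not a method from the cited papers: the tokenizer is deliberately naive, and the names tokenize, ngrams and bag_of_ngrams are made up for this example.

```python
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace,
    # strip surrounding punctuation, drop empty tokens.
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def ngrams(tokens, n):
    # Contiguous n-token sequences; n=1 gives unigrams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(text, n=1):
    # Bag-of-words style counts: order beyond the n-gram window is discarded.
    return Counter(ngrams(tokenize(text), n))

review = "Not a good movie, not good at all."
unigrams = bag_of_ngrams(review, 1)
bigrams = bag_of_ngrams(review, 2)
```

In the unigram bag, "good" appears twice with nothing to show it is negated. The bigram bag keeps ("not", "good") as a feature, which is one reason phrases are considered alongside single words.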


Challenges


Harder than topical classification, where
bag-of-words features perform well


Must consider other features due to…


Subtlety of sentiment expression


irony


expression of sentiment using neutral words


Domain/context dependence


words/phrases can mean different things in different
contexts and domains


Effect of syntax on semantics

Approaches


Machine learning


Naïve Bayes


Maximum Entropy Classifier


SVM


Markov Blanket Classifier


Accounts for conditional feature dependencies


Allowed reduction of discriminating features from
thousands of words to about 20 (movie review
domain)


Unsupervised methods


Use lexicons


Assume pairwise
independent features
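
Of the machine learning approaches listed above, Naïve Bayes is the simplest to sketch. The following is a minimal multinomial Naïve Bayes with add-one smoothing over unigram counts. The class name, the four training sentences and the whitespace tokenization are all assumptions made for illustration, and it does not reproduce the setup of any cited paper.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # add-alpha smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, doc):
        # log P(label) + sum of log P(word | label), treating words as
        # conditionally independent given the label.
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in doc.lower().split():
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)

# Toy training data, invented for this sketch.
train_docs = [
    "a great and moving film",
    "great acting great story",
    "a dull and boring film",
    "boring plot and terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]
clf = NaiveBayes().fit(train_docs, train_labels)
```

With this toy data, clf.predict("great story") returns "pos" and clf.predict("boring film") returns "neg".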


LingPipe Polarity Classifier


First eliminate objective sentences, then
use remaining sentences to classify
document polarity (reduce noise)



LingPipe Polarity Classifier


Uses unigram features extracted from
movie review data


Assumes that adjacent sentences are
likely to have similar subjective-objective
(SO) polarity


Uses a min-cut algorithm to efficiently
extract subjective sentences


LingPipe Polarity Classifier

Graph for classifying three items.
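
The min-cut idea can be sketched on a small graph. Assumptions in this sketch: each sentence has a per-sentence subjectivity estimate (ind_subj, invented numbers here), adjacent sentences are tied together by a single association weight (assoc), and the cut is found with a plain Edmonds-Karp max-flow. This is not LingPipe's code, and all function names are hypothetical.

```python
from collections import defaultdict, deque

def min_cut_source_side(capacity, source, sink):
    # Edmonds-Karp max flow. Returns the nodes still reachable from the
    # source in the final residual graph: the source side of a minimum cut.
    residual = defaultdict(lambda: defaultdict(float))
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual[u][v] += cap
            residual[v][u] += 0.0  # make sure the reverse edge exists

    def bfs_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            return set(parents)
        # Walk back from the sink to collect the augmenting path.
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck

def subjective_sentences(ind_subj, assoc):
    # Source "s" stands for the subjective class, sink "t" for objective.
    cap = defaultdict(dict)
    for i, p in enumerate(ind_subj):
        cap["s"][i] = p        # cut (paid) if sentence i is labeled objective
        cap[i]["t"] = 1.0 - p  # cut (paid) if sentence i is labeled subjective
    for i in range(len(ind_subj) - 1):
        cap[i][i + 1] = assoc  # cut if neighbours get different labels
        cap[i + 1][i] = assoc
    side = min_cut_source_side(cap, "s", "t")
    return sorted(n for n in side if n != "s")

scores = [0.9, 0.45, 0.8, 0.1]  # invented per-sentence subjectivity estimates
with_proximity = subjective_sentences(scores, assoc=0.2)
without_proximity = subjective_sentences(scores, assoc=0.0)
```

With assoc=0.0 each sentence is judged on its own score, so sentence 1 (0.45) is dropped. With assoc=0.2 it is pulled into the subjective set by its two subjective neighbours, which is the adjacency assumption at work.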

LingPipe Polarity Classifier


As accurate as the baseline, but uses on average
only 22% of the content in the test data


Metrics suggest properties of movie
review structure

SentiWordNet


Based on WordNet “synsets”


http://wordnet.princeton.edu/


Ternary classifier


Positive, negative, and neutral scores for each
synset


Provides means of gauging sentiment for
a text



SentiWordNet: Construction


Created training sets of synsets, L_p and L_n


Start with small number of synsets with fundamentally
positive or negative semantics, e.g., “nice” and “nasty”


Use WordNet relations, e.g., direct antonymy, similarity,
derived-from, to expand L_p and L_n over K iterations


L_o (objective) is the set of synsets not in L_p or L_n


Trained classifiers on training set


Rocchio and SVM


Use four values of K to create eight classifiers with
different precision/recall characteristics


As K increases, P decreases and R increases
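
The seed expansion step can be sketched as follows. This is a rough illustration under stated assumptions: plain words stand in for synsets, the relation tables are toy dictionaries rather than WordNet, similarity is taken to preserve polarity and antonymy to flip it, and expand_seeds is a hypothetical name.

```python
def expand_seeds(pos_seeds, neg_seeds, similar, antonyms, k):
    # Grow L_p and L_n for k iterations over the relation tables.
    l_p, l_n = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_p, new_n = set(l_p), set(l_n)
        for term in l_p:
            new_p.update(similar.get(term, ()))   # same polarity
            new_n.update(antonyms.get(term, ()))  # opposite polarity
        for term in l_n:
            new_n.update(similar.get(term, ()))
            new_p.update(antonyms.get(term, ()))
        l_p, l_n = new_p, new_n
    return l_p, l_n

# Toy relations, invented for this sketch (not taken from WordNet).
similar = {
    "nice": ["pleasant"],
    "pleasant": ["agreeable"],
    "nasty": ["awful"],
    "awful": ["dreadful"],
}
antonyms = {"nice": ["nasty"], "pleasant": ["unpleasant"]}
all_terms = {"nice", "pleasant", "agreeable", "nasty", "awful",
             "dreadful", "unpleasant", "table"}

l_p1, l_n1 = expand_seeds({"nice"}, {"nasty"}, similar, antonyms, k=1)
l_p2, l_n2 = expand_seeds({"nice"}, {"nasty"}, similar, antonyms, k=2)
l_o = all_terms - l_p2 - l_n2  # objective: everything not reached
```

Each extra iteration reaches terms further from the seeds, so the training sets grow but their members are less reliably polar. That is consistent with the precision/recall trade-off noted for increasing K.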


SentiWordNet: Results


24.6% synsets with Objective<1.0


Many terms are classified with some degree of
subjectivity


10.45% with Objective<=0.5


0.56% with Objective<=0.125


Only a few terms are classified as definitively
subjective


Difficult (if not impossible) to accurately
assess performance

SentiWordNet: How to use it


Use score to select features (+/-)


e.g. Zhang and Zhang (2006) used words in
corpus with subjectivity score of 0.5 or greater


Combine pos/neg/objective scores to
calculate document-level score


e.g. Devitt and Ahmad (2007) conflated
polarity scores with a WordNet-based graph
representation of documents to create
predictive metrics
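
Both uses above can be sketched with a toy lexicon. The scores below are invented and are not real SentiWordNet entries. The objective score is taken as 1 minus the positive and negative scores. Averaging (pos - neg) over the selected words is a simple assumption made for illustration, not the method of Zhang and Zhang or of Devitt and Ahmad.

```python
# word -> (positive score, negative score); invented values for illustration.
LEXICON = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "movie": (0.0, 0.0),
    "slow": (0.0, 0.25),
    "brilliant": (0.875, 0.0),
}

def subjectivity(word):
    # pos + neg, i.e. 1 minus the objective score; unknown words count as 0.
    pos, neg = LEXICON.get(word, (0.0, 0.0))
    return pos + neg

def select_features(tokens, threshold=0.5):
    # Keep only words whose subjectivity meets the threshold.
    return [t for t in tokens if subjectivity(t) >= threshold]

def document_score(tokens, threshold=0.5):
    # Mean of (pos - neg) over the selected words; 0.0 if none qualify.
    selected = select_features(tokens, threshold)
    if not selected:
        return 0.0
    return sum(LEXICON[t][0] - LEXICON[t][1] for t in selected) / len(selected)

doc_a = "a slow but brilliant movie".split()
doc_b = "good movie bad ending".split()
score_a = document_score(doc_a)
score_b = document_score(doc_b)
```

Here doc_a keeps only "brilliant" ("slow" falls below the 0.5 threshold), while doc_b keeps "good" and "bad", which nearly cancel out.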



References

1. http://www.answers.com/sentiment, 9/22/08

2. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment
classification using machine learning techniques,” in Proc. Conf.
on Empirical Methods in Natural Language Processing (EMNLP),
pp. 79-86, 2002.

3. A. Esuli and F. Sebastiani, “SentiWordNet: A Publicly Available
Lexical Resource for Opinion Mining,” in Proc. of LREC 2006 - 5th
Conf. on Language Resources and Evaluation, 2006.

4. E. Zhang and Y. Zhang, “UCSC on TREC 2006 Blog Opinion Mining,”
TREC 2006 Blog Track, Opinion Retrieval Task.

5. A. Devitt and K. Ahmad, “Sentiment Polarity Identification in
Financial News: A Cohesion-based Approach,” ACL 2007.

6. B. Pang and L. Lee, “A sentimental education: sentiment
analysis using subjectivity summarization based on minimum
cuts,” Proceedings of the 42nd Annual Meeting on Association for
Computational Linguistics, p. 271-es, July 21-26, 2004.