Emotional Annotation of Text




David Gallagher

Department of Computer Science

University of Wisconsin-Platteville

GallagherD@uwplatt.edu



Abstract


Emotions play an important role in human intelligence, rational decision making, social interaction, perception, memory, learning, creativity, and more [4]. Automatic detection of emotions in text is progressively becoming an area of research demanding attention, since emotions convey a vast amount of information that is difficult to comprehend and often difficult to express in words, particularly in text. Effective analysis of text can lead to a vast array of applications, such as opinion mining, market analysis, affective computing, and natural language interfaces such as e-learning environments or educational/edutainment games. Emotions have been deeply studied in order to gain a better understanding of human interaction. While the subject of emotions has classically been researched by the fields of psychology and behavioral sciences, with increasingly integrated communications the subject is also applicable to the field of computer science, particularly the field of human-computer interaction.





Introduction


Emotions have been a classical area of study for various disciplines such as psychology and behavioral sciences, as emotions play an intrinsic role in human nature. In particular, emotions convey a vast amount of subtext or connotation that can greatly change the context of the communications that individuals share with one another. Researchers in the field of computer science have carried out various studies on the emotions represented by facial expressions [6] and on the recognition of these emotions through a variety of sensors that analyze the facial expressions associated with each emotion [3].


The most natural way for a computer to recognize the emotion of the user is to detect his or her emotional state from the text that the user entered, be it from a blog, an online chat site, or another form of text [2]. The automatic emotional annotation of text is becoming increasingly important from an applicative point of view, for the advancement of many related fields. Affective computing and natural language interfaces such as e-learning environments or educational/edutainment games would benefit significantly from automated emotional annotation of text. Such annotation would also greatly increase the effectiveness of machine-driven tasks such as opinion mining and market analysis.


For example, the following are specific areas in which the application of automated affective analysis could make intriguing and invaluable advancements:


Sentiment Analysis.

Computer Assisted Creativity.

Verbal Expressivity in Human Computer Interaction.

Artificial Intelligence.


Knowledge-based approaches and machine learning approaches have been adopted for the automatic analysis of emotions in text, aiming to detect the writer’s emotional state. Knowledge-based approaches consist of using linguistic models or prior knowledge to classify emotional text. Machine learning approaches use supervised learning algorithms to build models from annotated corpora, that is, large and structured sets of texts. Research in the field of sentiment analysis has also applied various linguistic models and different learning algorithms. Machine learning techniques tend to perform better than lexical-based techniques because they can adapt well to different domains [2].


This paper highlights the background of affective analysis and its related fields, along with a simplified process for conducting computer-assisted emotional annotation of text.



Framework and Background


Deriving the emotional content of a text through linguistic analysis is an extremely, even infamously, difficult task. Many fields, such as psychology, sociology, and philosophy, have proposed approaches to emotion detection. These fields have studied emotions with respect to facial expressions, action tendencies, physiological activity, and subjective experience [3].


A text-based emotion prediction system would benefit from identifying the emotional affinity of sentences. Emotion analysis at the sentence level may also be important for more detailed emotion analysis systems [5].


Several researchers have attempted to solve this problem in distinct ways. Cecilia Ovesdotter Alm, a professor at Rochester Institute of Technology, explored the text-based emotion prediction problem. In order to classify the emotional affinity of sentences in the narrative domain of children’s fairy tales, Alm and her colleagues annotated a corpus of 22 Grimms’ tales at the sentence level with eight emotion categories, including angry, disgusted, fearful, happy, sad, positively surprised, and negatively surprised.


Alena Neviarouskaya, a JSPS Postdoctoral Researcher in the Knowledge Data Engineering and Information Retrieval Laboratory, Department of Computer Science and Engineering, Toyohashi University of Technology, addressed the tasks of recognition and interpretation of affect communicated through text messaging [1]. Classifying the mood of a single text is a hard task; state-of-the-art methods in text classification achieve only modest performance in this domain. In this area, some of the hardest problems involve acquiring large collections of text tagged with detailed linguistic expressions that indicate emotion.


Development


To lay the framework for the development of automatic emotional annotation, one must first decide which emotions are the most basic. This is an important step, and several emotional models have been developed that may be used as a pivotal resource. These emotional models include:




Plutchik’s Model. Robert Plutchik, a psychology professor emeritus at the Albert Einstein College of Medicine and adjunct professor at the University of South Florida, proposes that there is a small number of basic emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. All other emotions are derivative states; that is, they occur as combinations, mixtures, or compounds of the primary emotions. Plutchik states that all emotions vary in their degree of similarity to one another and that each emotion can exist in varying degrees of intensity or levels of arousal (See Figure 1).





Ekman. Paul Ekman, a widely renowned American psychologist, has focused on a set of six basic emotions that have associated facial expressions: anger, disgust, fear, joy, sadness and surprise. These emotions are distinguished, among other properties, by the facial expression characteristic of each one.



OCC Model. The OCC Model has become the authoritative model for emotional synthesis. It presents its 22 emotional categories in pairs of an emotion and its antithesis: pride-shame, love-hate, hope-fear and so on.



Parrott. Parrott categorizes emotions in a short tree structure. This tree has three levels: primary emotions, secondary emotions and tertiary emotions. Parrott presents love, joy, surprise, anger, sadness and fear as the primary emotions.


Even with a model that can accurately annotate text with values of the chosen basic emotions, several emotions may have similar or ambiguous meanings, such as happy and contented. One technique for distinguishing such words is the use of emotional dimensions. There are three emotional dimensions: evaluation, activation and power. Evaluation represents how positive or negative an emotion is. Activation represents an active or passive scale for emotions. Power represents the control that is exerted: at one end of the scale are emotions that are submissive, and at the other end are emotions that are dominant.
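As a minimal sketch of how the three dimensions can separate near-synonyms, the following Python fragment stores each emotion word as an (evaluation, activation, power) triple and reports the dimension on which two words differ most. The 1 to 9 scale, the numeric scores, and the function names are illustrative assumptions, not values taken from any published affective lexicon.

```python
# Hypothetical (evaluation, activation, power) scores on an assumed 1-9 scale.
# The numbers are illustrative placeholders, not values from a real lexicon.
DIMENSIONS = {
    "happy":     (8.0, 6.5, 6.5),
    "contented": (7.5, 3.0, 6.0),
    "angry":     (2.5, 7.5, 6.0),
    "afraid":    (2.0, 7.0, 3.0),
}

DIMENSION_NAMES = ("evaluation", "activation", "power")


def dimension_gap(word_a, word_b):
    """Return the per-dimension difference between two emotion words."""
    a, b = DIMENSIONS[word_a], DIMENSIONS[word_b]
    return {name: round(x - y, 2) for name, x, y in zip(DIMENSION_NAMES, a, b)}


def most_distinguishing_dimension(word_a, word_b):
    """Return the name of the dimension with the largest absolute difference."""
    gap = dimension_gap(word_a, word_b)
    return max(gap, key=lambda name: abs(gap[name]))


if __name__ == "__main__":
    print(dimension_gap("happy", "contented"))
    print(most_distinguishing_dimension("happy", "contented"))
```

Under these assumed scores, happy and contented are close in evaluation and power but far apart in activation, which is the kind of distinction a single category label cannot express.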


As this paper has alluded to, there are many ways to develop an effective automatic emotional annotation algorithm. To highlight one approach, we will analyze the Sequential Minimal Optimization (SMO) implementation of the Support Vector Machine (SVM) illustrated by Soumaya Chaffar, a researcher, and Diana Inkpen, a professor of computer science at the University of Ottawa.

Figure 1. Plutchik’s Wheel


One must first develop a dataset or emotional dictionary that can be used for emotion look-up and detection in text. It is useful to have a variety of datasets collected from different sources, as each may be better suited to a different type of development. Next, in order to further analyze the sentence, a feature set can be applied to highlight specific things such as negative words, conjunctions, punctuation, contexts and so on. At this point an algorithm may be applied to derive the emotional annotation of a text. Finally, the algorithm’s output can be evaluated to analyze its ability to distinguish emotions.
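The first three steps above (dictionary look-up, feature extraction, annotation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny lexicon, the negation word list, the "uncertain" and "neutral" fallback labels, and all function names are hypothetical and are not part of Chaffar and Inkpen's experiment.

```python
import re
from collections import Counter

# Hypothetical miniature emotion dictionary (word -> emotion label).
EMOTION_LEXICON = {
    "happy": "joy", "delighted": "joy",
    "sad": "sadness", "cried": "sadness",
    "afraid": "fear", "terrified": "fear",
    "furious": "anger",
}
# Hypothetical list of negation words used as a simple feature.
NEGATIONS = {"not", "never", "no"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extract_features(tokens):
    """Count lexicon hits per emotion and flag the presence of a negation word."""
    hits = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    return {"emotion_hits": hits,
            "has_negation": any(t in NEGATIONS for t in tokens)}


def annotate(sentence):
    """Return an emotion label from look-up counts (assumed decision rule)."""
    features = extract_features(tokenize(sentence))
    if not features["emotion_hits"]:
        return "neutral"
    if features["has_negation"]:
        # A plain look-up cannot tell how negation changes the emotion.
        return "uncertain"
    return features["emotion_hits"].most_common(1)[0][0]


if __name__ == "__main__":
    for text in ("She was delighted and happy.", "I'm not happy"):
        print(text, "->", annotate(text))
```

The negation flag shows why a feature set is needed on top of plain look-up: the word "happy" alone would label "I'm not happy" as joy.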


Datasets


Five datasets were used in the experiment by Chaffar and Inkpen; these are detailed below.


Text Affect


This dataset consisted of two separate parts drawn from news headlines from renowned newspapers, as well as from the Google News search engine. The first part was developed for training and was composed of 250 annotated sentences. The second part was designed for testing and consisted of 1,000 annotated sentences. Six emotions (anger, disgust, fear, joy, sadness and surprise, similar to the Ekman model) were used to annotate sentences according to the degree of emotional load.


Neviarouskaya et al.’s Dataset


This dataset was developed by Neviarouskaya and others. In these datasets, ten labels were used by three annotators to annotate sentences. These labels consist of the nine emotional categories defined by Izard (anger, disgust, fear, guilt, interest, joy, sadness, shame and surprise) and a neutral category. For their experiment, Chaffar and Inkpen only considered sentences on which two or more annotators completely agreed on the emotion category.




Dataset 1. This dataset includes 1000 sentences extracted from various stories in 13 diverse categories such as health, education and wellness.

Dataset 2. This dataset includes 700 sentences from a collection of diary-like blog posts.
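The agreement rule described for these datasets (keep a sentence only when at least two of the three annotators chose the same category) can be sketched as follows. The sample sentences, labels, and function names are hypothetical and serve only to illustrate the filtering step.

```python
from collections import Counter


def majority_label(labels, min_agree=2):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def filter_by_agreement(annotated, min_agree=2):
    """Keep (sentence, label) pairs whose annotators reached the agreement threshold."""
    kept = []
    for sentence, labels in annotated:
        label = majority_label(labels, min_agree)
        if label is not None:
            kept.append((sentence, label))
    return kept


if __name__ == "__main__":
    # Hypothetical annotations from three annotators per sentence.
    sample = [
        ("I passed the exam!", ["joy", "joy", "surprise"]),
        ("He slammed the door.", ["anger", "fear", "sadness"]),
        ("It is Tuesday.", ["neutral", "neutral", "neutral"]),
    ]
    for sentence, label in filter_by_agreement(sample):
        print(label, "|", sentence)
```

Raising `min_agree` to the number of annotators gives the stricter rule in which all annotators must agree.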


Alm’s Dataset


Alm’s Dataset contains annotated sentences from fairy tales, namely Grimms’ Fairy Tales. In the highlighted experiment, only sentences with high annotation agreement were used. Ekman’s list of basic emotions was used for sentence annotation. Because of data sparsity and the related semantics of anger and disgust, these two emotions were merged together by Alm. This resulted in the five emotions of happy, fearful, sad, surprised and angry-disgusted.


Aman’s Dataset


This dataset consists of emotion-rich sentences collected from blogs. Ekman’s basic emotions (happiness, sadness, anger, disgust, surprise and fear) and also a neutral category were used for sentence annotation. The sentences were labeled with emotions by four annotators. The experiment considered only sentences for which the annotators agreed on the emotion category.


Feature Sets


Feature sets can be applied in order to highlight specific things such as negative words, conjunctions, punctuation and contexts, which could drastically change the meaning and emotional load of a sentence. To ensure proper emotional classification of text, it is essential to choose the relevant feature sets to be considered. Various feature sets are described below:




Bag-Of-Words (BOW). Each sentence in the dataset was represented by a feature vector composed of Boolean attributes for each word that occurs in the sentence. If a word occurs in a given sentence, its corresponding attribute is set to 1; otherwise it is set to 0. BOW considers words as independent entities and does not take into consideration any semantic information from the text. However, it generally performs very well in text classification.



N-grams. N-grams are defined as sequences of words of length n. N-grams can be used for catching syntactic patterns in text and may include important text features such as negations, for example “not happy”. Negation is an important feature for the analysis of emotion in text because it can totally change the expressed emotion of a sentence. For instance, the sentence “I’m not happy” should be classified into the sadness category and not into the happiness category.



Lexical emotion features. This kind of feature set represents the set of emotional words extracted from affective lexical repositories such as WordNetAffect. The highlighted experiment used all the emotional words from WordNetAffect (WNA) associated with the six basic emotions.



Dependency analysis. MiniPar is an example of a program that can be used to derive features of a sentence by breaking the contents down and showing how the words are related to one another. In MiniPar, nodes are numbered and each arc between nodes is a dependency relation. Each dependency relation is labeled with a tag to identify the kind of relation that these nodes share (See Table 1).









Emotional dimension analysis. An EmoTag is based on the emotional dimensions of a sentence. Words are filtered using a stop list, and dependency analysis is used to identify the scope of negation. The emotion value of each word is looked up in an affective dictionary; the emotion value is inverted for words that fall within the scope of negation. Once all the words of the sentence have been evaluated, the average value for each dimension is calculated (See Table 2).
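To make the first two feature sets concrete, the sketch below builds Boolean feature vectors over unigrams (bag-of-words) and over bigrams for two short sentences. The tokenizer, the sorted vocabulary order, and the function names are assumptions made for illustration; the original experiment built its features with Weka rather than with this code.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def ngrams(tokens, n):
    """Return all contiguous word sequences of length n, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(sentences, n=1):
    """Collect every n-gram seen in the corpus, in sorted order."""
    return sorted({g for s in sentences for g in ngrams(tokenize(s), n)})


def boolean_vector(sentence, vocabulary, n=1):
    """Set an attribute to 1 if its n-gram occurs in the sentence, otherwise 0."""
    present = set(ngrams(tokenize(sentence), n))
    return [1 if term in present else 0 for term in vocabulary]


if __name__ == "__main__":
    corpus = ["I'm happy", "I'm not happy"]
    for n in (1, 2):
        vocabulary = build_vocabulary(corpus, n)
        print(n, vocabulary)
        for sentence in corpus:
            print("  ", sentence, boolean_vector(sentence, vocabulary, n))
```

With unigrams the two sentences differ only in the attribute for "not", while with bigrams they share no attribute at all, since "not happy" is captured as a feature of its own.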





Table 1. MiniPar example of a dependency tree for the sentence “two of her tears wetted his eyes and they grew clear again”

Table 2. Fragment of a marked up table.


Algorithm Application


Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Using various Weka software tools, Chaffar and Inkpen’s experiment illustrates the effectiveness of different algorithms on the various datasets discussed earlier: J48 for decision trees, Naïve Bayes for a Bayesian classifier, and the SMO implementation of SVM. The Weka ZeroR classifier was used as a baseline because this classifier does not take into account any sort of feature set (See Table 3).
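The role of the ZeroR baseline can be illustrated with a small Python re-implementation of the idea: it ignores the features entirely and always predicts the most frequent label seen in training. This is a sketch for illustration only, not Weka's own code; the class interface, the `accuracy` helper, and the sample labels are assumptions.

```python
from collections import Counter


class ZeroR:
    """Majority-class baseline: predicts the most frequent training label."""

    def fit(self, features, labels):
        # The features are deliberately ignored; only the label counts matter.
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.prediction_ for _ in features]


def accuracy(predicted, actual):
    """Return the accuracy rate as a percentage."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    # Hypothetical labels; the feature vectors are placeholders.
    train_x, train_y = [[1], [0], [1], [0]], ["joy", "joy", "sadness", "fear"]
    test_x, test_y = [[0], [1], [1], [0]], ["joy", "sadness", "joy", "joy"]
    baseline = ZeroR().fit(train_x, train_y)
    print(accuracy(baseline.predict(test_x), test_y))
```

Any classifier that actually uses a feature set should exceed this accuracy rate on the same test data; otherwise the features are not contributing useful information.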





From Table 3 we can see that the SMO algorithm distinguishes itself as the premier algorithm in this selection, as it has the highest accuracy for matching emotions. In the next section of their experiment, they applied the SMO algorithm with various feature sets to see which achieved the highest accuracy. In this section they used different datasets to test generalization to unseen examples (See Table 4).



Table 3. Results for the training datasets using the accuracy rate (%)

It is interesting to note that, across the various datasets, the simple BOW approach seems to achieve the highest accuracy on most of the test sets. This could be explained in several ways: the SMO algorithm may not accurately account for the various features presented by the feature sets, the feature sets themselves may contain some type of fundamental error, or the test sets presented may not accurately meet the expectations of the algorithm.


Conclusions


Written language is one of our most common forms of communication and is only increasing in popularity. Besides transmitting informative content, it also transmits information about the user’s attitude, including the user’s emotional state. While there have been many studies carried out in the field of human-computer interaction, comparatively little research has been devoted to the detection of emotions in text [6]. There is a smorgasbord of related fields that could benefit from and contribute to the advancement of automated emotional annotation of text. Before that can happen, further research and work are needed to aid the advancement of this field and the proper development of the semantic web, which could be used to boost the development of the emotional annotation of text.


References


[1] Alm, Roth and Sproat. "Emotions from Text: Machine Learning for Text-based Emotion Prediction." Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, pp. 579-586, 2005.

[2] Chaffar, Soumaya, and Diana Inkpen. "Using a Heterogeneous Dataset for Emotion Analysis in Text." School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada. University of Ottawa, 2011. Web. 7 Oct. 2012. <http://www.site.uottawa.ca/~diana/publications/SoumayaCANAI2011final.pdf>.

[3] Devillers, Laurence, Laurence Vidrascu, and Lori Lamel. "Challenges in real-life emotion annotation and machine learning based detection." Neural Networks. 18.4 (2005): 407-422, ISSN 0893-6080.

[4] Picard, R. W. "Affective Computing." The MIT Press, MA, USA, 1997. <http://affect.media.mit.edu/pdfs/99.picard-hci.pdf>

[5] Quan, Changqin, and Fuji Ren. "Sentence Emotion Analysis and Recognition Based on Emotion Words Using Ren-CECps." International Journal of Advanced Intelligence. 2.1 (2010): 105-117. Web. 7 Oct. 2012. <http://aia-i.com/ijai/sample/vol2/no1/105-117.pdf>.

[6] Strapparava, Carlo, and Rada Mihalcea. "Learning to Identify Emotions in Text." Fortaleza, Brazil: 2008. Web. 7 Oct. 2012. <http://www.cse.unt.edu/~rada/papers>