Team Research Proposal

Team POLITIC

Political Opinions in Literature: Identifying Themes in International Compositions


Robert Cai, Matthew Carr, Adam Elrafei, Alexander Goniprow, Adrian Hamins-Puertolas, Manpreet Khural, Andrew Li, Alexandra Winter, Soumya Yanamandra, Dan Yang, and Kay Zhang


University of Maryland Gemstone Program

Mentor: Dr. Peter Mallios

Librarian: Timothy Hackman

and

The Maryland Institute for Technology in the Humanities


We pledge on our honor that we have not given or received any unauthorized assistance on this assignment.


Introduction


The United States was involved in numerous international conflicts throughout the 20th century. A prevalent theory suggests deeper public understanding of foreign cultures might have allowed the United States to avoid several of these conflicts, including the Iran Hostage Crisis and the Vietnam War (Li). Since the United States is a democracy, citizen perception of foreign countries has a direct relationship with the foreign policies enacted. A thorough understanding of how the American public gathers its perceptions of foreign cultures is crucial to fully comprehending American foreign policy and international relations. Foreign literature is one important medium that exposes the United States to the political and cultural ideologies of other countries (Griswold 1077). The American public reads novels by foreign authors to gain an intimate perspective on foreign societies’ views that is unavailable through domestic media. Readers can also connect to other cultures because novels create emotional ties by appealing to universal human themes (Aubry 27). At the same time, international and domestic political concerns guide the United States’ public interest in foreign literature. For instance, it is not a coincidence that the peaceful writings of Gandhi became important in the United States during the Civil Rights Movement (Mallios 10-19).


However, different foreign authors often provide opposing viewpoints of their societies. The most popular works form a selective base of foreign literature that potentially accommodates elites’ self-serving political biases. Using experimental methods, Gilens asserts that ignorance and misinformation in the United States “leads many [citizens] to hold political views different from those they would hold otherwise” (379). Therefore, understanding public intent and attitude requires knowing why certain novels and authors seem representative of a cultural canon. To become a better-informed political citizen of the United States, one must think critically about the uses of foreign literature.

Our study will investigate how publicly available United States media received foreign novels and authors and how these portrayals work toward social and political ends of government support and criticism (Mallios 10-19). Specifically, we will conduct a low-constraint case study of Russian literature to address the following question: Did the reception of Russian novels and authors in the United States and United States foreign policy toward Russia reflect each other from 1900 to 1923? We hypothesize that the reception of Russian literature in the United States significantly correlates with United States policies toward Russia, due to inherent ties between literary evaluation and political understanding. Scholars, politicians, and other government officials will likely take interest in our study.



We will use the portrayals of selected Russian novels and authors in nationally available print media to define the reception of Russian literature in the United States during this time period. We recognize scholars could investigate how alternative forms of media, such as pictures or political cartoons, influence public understanding. However, we chose print media because it is the easiest to analyze quantitatively. We will define United States foreign policy toward Russia through quantifiable measures such as foreign aid, military investment, and trade deals from 1900 to 1923. This will take the form of overarching topics that describe the types of policies enacted, such as interventionism and humanitarianism. Our analysis will include keyword searches relative to both literary reception and foreign policy. We will track how these themes have evolved over time using techniques of topic modeling.[1]




Our study does not seek to determine a relationship between political climates and messages found in novels, opinions held by authors, or motivations behind translators. Instead, we will determine the extent to which there is a relationship between media reception of Russian literature in the United States and the political climate. Our research is distinguished from previous studies in two ways: it analyzes reception in United States media and not the intent of authors or translators, and it will accomplish this analysis through quantitative, not just qualitative, methods.

[1] See Appendix H for an example of topic modeling output.


Throughout the rest of our proposal, we will summarize our literature review, outline our methodology, explain the limitations of our research, list confounding variables, and conclude with descriptions of our anticipated results, our budget, our timeline, and the statistical tools we will use throughout the project.

Literature Review

Introduction of Russian Literature in the Western World

The publication of Eugene-Melchior de Vogue's Le Roman Russe (The Russian Novel) in 1886 represented the increasing interest in Russian literature in Western Europe and America. Many writers, including Isabel Hapgood and Constance Garnett, published English translations of Russian novels, short stories, and poems to critical acclaim in subsequent decades (Moser 431). In other words, the late nineteenth and early twentieth centuries marked the point at which Russian literature became available to the US public and intellectuals.


Many studies have sought to understand literary themes found in major Russian works. For example, Emerson analyzes Leo Tolstoy’s views on war through a close reading of his many texts (1855). However, only a few studies address Russian literary reception in the United States during the early twentieth century. One of these rarities is Goldfarb’s account of how a prominent literary critic, William Dean Howells, supported Tolstoy’s works in the United States during the twentieth century (318). However, this study is limited in that it only contemplates Russian literary reception through Howells’ and his critics’ views. We intend to expand on such studies by using comprehensive statistical tools to analyze a wider base of reception material.

Canon Formation and Politics


Political motivations shape a nation’s literary canon, which in turn projects that nation’s identity. The idea of a national literature emerged in the late eighteenth century as a way of proving cultural independence on an international level (Corse, Nationalism and Literature 7-14). Original research studies suggest canonical or high-culture literature does not reveal how citizens perceive themselves, but rather how elites want to envision their nation (ibid 74). These previous studies turn to college syllabi and literary prizes to define the most frequently appearing works as canonical or high-culture (Brown 1; Corse, Nations and Novels 1279-82). Unlike bestsellers or popular culture novels, canonical texts differ greatly between countries, as they are symbolic in value and not simply “economic commodities.” Theories of canon formation state that novels have to experience a conjunction of large sales and certain types of recognition to reach canonical status (Ohmann 206). This recognition refers to the critical reception of works found in publications that “carried special weight in forming cultural judgments,” such as the New York Times Book Review and the New Republic (204). However, scholars have never specified the ways in which elites have translated cross-cultural differences into literature.

Topic Modeling

Researchers use topic modeling to analyze large corpora of data. Topic modeling assumes that “documents are mixtures of topics, where a topic is a probability distribution over words” (Steyvers 2). Furthermore, Latent Dirichlet Allocation (LDA), a more specific type of topic modeling, asserts that each document from a larger corpus consists of a plurality of topics (Chaney and Blei 2). In past studies, researchers have used topic modeling in general and LDA specifically to analyze large corpora of data. For example, a 100-topic LDA model generated word probabilities under each topic for all articles in the journal Science between 1880 and 2002 (ibid 4).
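
The quoted definition can be made concrete with a small numerical sketch. The two topics, their word probabilities, and the document's topic proportions below are invented purely for illustration and are not drawn from any corpus in this study; the function name word_probability is likewise hypothetical. Under the mixture assumption, the probability of a word in a document is the sum, over topics, of the document's proportion for that topic times that topic's probability for the word.

```python
# Toy illustration of "documents are mixtures of topics".
# All numbers are invented for illustration only.

# Each topic is a probability distribution over words (values sum to 1).
topics = {
    "war": {"army": 0.5, "peace": 0.3, "novel": 0.2},
    "literature": {"army": 0.1, "peace": 0.2, "novel": 0.7},
}

# A document is a mixture of topics (proportions sum to 1).
doc_mixture = {"war": 0.25, "literature": 0.75}


def word_probability(word, mixture, topic_word):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(weight * topic_word[topic].get(word, 0.0)
               for topic, weight in mixture.items())


probs = {w: word_probability(w, doc_mixture, topics)
         for w in ("army", "peace", "novel")}
```

A topic model such as LDA works in the opposite direction: it starts from the observed words and estimates topics and mixtures that would make those words probable.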

More complex versions of topic modeling, however, can gather more information from our Russian author database. For example, Topics over Time (TOT) models can account for the chronology of documents in a corpus (ibid 9). Since our documents are dynamic in that they change over time, LDA would confound the topics’ changes and lose any perceivable patterns. Xuerui Wang and Andrew McCallum explain the topic analysis of US Presidential State-of-the-Union addresses, where LDA “confounds Mexican-American War (1846-1848) with some aspects of World War I (1914-1918)” since it is “unaware of the 70-year separation between the two events” (1). Modeling topics over time serves to address this issue.


In their study, Wang and McCallum incorporated timestamps to help track “changes in the occurrence of the topics themselves” as a function of time (2). They tested their model on three data sets: “more than two centuries of U.S. Presidential State-of-the-Union addresses,” “17-year history of the NIPS [Neural Information Processing Systems] conference,” and “nine months of email archive” (ibid). The results of their study show the TOT model is able to predict the timestamps of documents and generates topics that are “more distinct from each other than LDA topics” (ibid 5). In our research, we will also use a TOT model on the databases we anticipate constructing to account for time.

Furthermore, modified versions of LDA can relate metadata to topics. Metadata is information about the documents we collect such as “author, title, geographic location, [and] links” (Blei 10). Therefore, we can also correlate influences such as the gender and ethnicity of the authors of the reception material to word probabilities found in topics in our corpus.


Sentiment Analysis


Sentiment analysis is also useful for sorting through large corpora of data. While topic modeling focuses on the subject of the data in question, sentiment analysis focuses on the opinion expressed about the subject matter of the data (Lee and Pang 1). Multiple methods can determine the sentiment of a piece of data. Lee and Pang compared three different algorithms used for sentiment analysis: Naive Bayes, maximum entropy classification, and support vector machines (ibid 3). Naive Bayes is a simple algorithm. It may not sustain high accuracy rates with complicated sets of data, but it “tends to perform surprisingly well” and is even the ideal algorithm for use with “problem classes with highly dependent features” (ibid). Maximum entropy classification and support vector machines are both much more sophisticated methods. Maximum entropy classification algorithms “make no assumptions about the relationships between features”, which may allow them to perform better than Naive Bayes when the features are not independent of one another (ibid 4). Support vector machines differ from both of the previous methods in that they do not focus on probability, which brings them much closer to traditional methods used for normal topic modeling adapted to work with sentiment analysis (ibid 4).
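
As a rough illustration of the simplest of the three approaches, the following is a minimal sketch of a Naive Bayes sentiment classifier with add-one smoothing. It is not the implementation Lee and Pang evaluated, nor the software this project will use; the function names and the four training snippets are invented for illustration and are not taken from any periodical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count words per label; labeled_docs is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training snippets, for illustration only.
training = [
    ("a profound and moving novel", "positive"),
    ("a masterful and humane story", "positive"),
    ("a gloomy and tedious novel", "negative"),
    ("morbid tedious and crude", "negative"),
]
model = train_naive_bayes(training)
prediction = classify("a moving and humane novel", model)
```

The "naive" part is the inner loop: each word contributes to the score independently of every other word, which is exactly the independence assumption that maximum entropy classification avoids.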


For our project, sentiment analysis methods will allow us to quickly categorize articles by gauging how American periodicals perceived and discussed Russian authors and novels during the time period of interest. In addition, incorporating a sentiment categorization into our database will allow future researchers to quickly add to and examine our data.

Foreign Policy Analysis



Political scientists have devised several models and theories to explain how foreign policy develops (Boyer 185). One such theory is the rational actor model, which states that stimuli and immediate responses lead to the creation of foreign policy (Boyer 189). However, the political aspect of our study does not seek to determine how political leaders create foreign policy, but rather attempts to measure and quantify it. Many previous studies have determined United States foreign policy towards various nations by analyzing its components. For example, Rick Travis analyzes foreign policy towards Africa by focusing on foreign aid to the continent (798). Haslam focuses on direct foreign investment and the corresponding treaties to determine United States foreign policy toward other nations (1182). For our study, we will gather data on “exports, imports, investments, arms sales, and categories of foreign aid (bilateral, aggregate, and per capita)” between the United States and the Russian Empire (and later the Soviet Union) to define United States foreign policy (Watson 253).

Methodology

Our first tasks were to determine a time range and a country to investigate, as outlined in the literature review. We selected an upper time bound of 1923, since all preceding publications are in the public domain and we can publicly release all collected data. We chose 1900 as our lower time bound to guarantee a significant number of periodicals will be available.[2] Time allowing, we may be able to expand the time period of interest, guaranteeing more articles for analysis.

We decided to investigate Russian literature for several reasons. First, Russia was a focal point for the United States during the twentieth century. World War I, the Bolshevik Revolution, and the threat of communism led to increased public and governmental interest in Russia during our selected time period. Second, only a relatively small number of significant Russian authors had works available in English at the time. A narrow range of Russian literary figures suggests American periodicals interested in examining Russian literature had to invoke certain Russian literary figures and works frequently, leading to larger sample sizes for the selected authors. Consequently, we will be able to construct a more exhaustive corpus[3] of Russian literature than of the more readily available literature from other countries, such as Britain or France.

[2] We anticipate finding a significant number of periodicals referencing Russian literary figures during the selected time period, as shown in Appendix E. By the beginning of our time period of interest, many national periodicals had already been well established (Baldasty).



To decide which literary figures to study, we compiled a list of Russian literary figures whose works had English translations during our time period of interest. Using that list, we cataloged the number of search results found in the Readers’ Guide Retrospective[4] for each literary figure of interest.[5] From this preliminary summary of the availability of periodicals in the United States specifically discussing Russian literary figures, we chose to investigate Dostoevsky and Tolstoy to maintain the feasibility of our study.

We bring some bias to our selection of literary figures, as we have chosen two of the most renowned Russian literary figures in the United States. Therefore, our data regarding the reception of selected Russian literary figures in the United States will not be representative of the entirety of Russian literary figures. We could add one or two minor Russian authors to our research to increase the external validity of our project if time permits.



We resolved to capture a large, representative sample of the body of articles that explicitly mention our selected Russian literary figures in periodicals popular in the United States between 1900 and 1923. We will construct a database containing these articles using the Readers’ Guide Retrospective index. The Retrospective’s emphasis on more popular periodicals fits well with our intent to gain an understanding of how the general American public perceived significant Russian literary figures in the early twentieth century. We will use a subject search of selected literary authors to explore the Readers’ Guide Retrospective and find articles appropriate for the constructed database.

[3] See our Glossary of Terms in Appendix H.

[4] The Readers’ Guide Retrospective is a comprehensive index of 608 popular periodicals published in the United States spanning from 1890 to 1982. 224 periodicals in the Readers’ Guide Retrospective, almost 37% of the database, are available prior to 1923. See Appendix G.

[5] An abridged version of this list can be found in Appendix E.

Scanning

Since most articles in the Readers’ Guide are not digitized, we have to digitize the physical or microfilm versions of articles that fall within search parameters. We are currently scanning articles by using publicly available resources at the University of Maryland McKeldin Library. Therefore, our initial database construction will contain only articles available within the University of Maryland archive system. Should time permit, it may be feasible to explore other academic archives for articles from the Readers’ Guide Retrospective.


We have standardized scanning techniques to reduce preventable variations in image quality and size.[6] Systematic errors, including the presence of dust particles, stains, and other debris on the scanning glass, also contribute to poor image quality and complicate analysis of the database. We will therefore wipe down the scanning glass with glass cleaner solution and a microfiber cloth before and after each scan to reduce this source of error.

Preservation of the scanned material is essential to data accuracy and reliability. During microfilm scanning, an auto-adjust function adjusts the brightness and scanned size of each page to produce an optimally clear image. Furthermore, we must adjust the resolution of the scanner up from the default 300 dots per inch (DPI) to the maximum setting of 600 DPI. Similar settings are also present on the non-print source scanners. Once saved, the file is left unmodified with the exception of cropping. We will not otherwise manipulate images after scanning, in order to retain the original image data, quality, and integrity.

[6] Examples of standardization in scanning articles include uniform scanner type, scanner settings, and format in which material is saved. Images will be saved in the Tagged Image File Format, a standard “for distributing high quality scanned images or finished photographic files” (“TIFF Files”).


We will convert these files to readable documents through Optical Character Recognition software. We are using ABBYY FineReader 11 to save the files as plain text documents, DjVu files, and FineReader documents. Topic modeling and sentiment analysis software can analyze plain text files; the DjVu format compresses documents and maintains the layout of text on each page; and we save FineReader files to document the transition from scanned image to readable text. At this stage, we remove pictures from the pages.


Foreign Policy Analysis

The second portion of the project focuses on United States foreign policy toward Russia. Our goal is to quantify the United States’ changing attitude and foreign policy towards Russia over the time period established for the study of the authors. As mentioned earlier, one method of defining this relationship is to examine statistical data that relates to foreign policy, including foreign aid to Russia, trade relations, and America’s military presence in Russia. We will also examine Presidential speeches delivered during the time period of interest; we will simply run searches for references to Russia and transfer Presidential speeches that produce hits into a database for future analysis.

With sufficient time, we will also collect and analyze newspaper editorials in a similar manner. A theory discovered in preliminary research indicates that editorials of major newspapers of the late nineteenth and early twentieth centuries, specifically The New York Times, reflected political motivations of the United States government (“Deductions” 42; Lippmann and Merz 3). If pursued, a newspaper editorial database would give our project a wider scope because it provides an additional level of comparison with other foreign policy data.




Annotation


As we assemble a corpus of articles regarding literary authors of interest, one priority is to ensure we effectively organize the constructed database. We can more easily analyze an organized corpus, which makes generating metadata[7] essential. Beyond ease of analysis, metadata will give us the ability to categorize and analyze articles that deal with a specific topic or exhibit similar traits, an approach that will yield more significant and interesting results than a simple keyword search. The assembled corpus’s metadata will include, at a minimum, historical and archival data concerning each article. We will also attempt to capture metadata regarding the characteristics of each article, such as whether articles include explicit references to radical politics, by annotating[8] each article.


Annotation questions may reflect biases and stereotypes that we bring individually to the project, and it is difficult to ensure uniformity in our annotation. We determined what kind of metadata to capture and refined annotation questions by annotating a sample of articles from the assembled database.[9] The goal of refining annotation questions is to confirm we will arrive at similar answers if annotating independently.

In conjunction with the Maryland Institute for Technology in the Humanities (MITH), we will attempt to automate the process by which we construct metadata, reducing time spent on this portion of our methodology. It is feasible to automate metadata collection through computer scripts, including collection of spelling variations in literary author names across the constructed corpus,[10] or, more abstractly, performing sentiment analysis on articles in the corpus.

[7] According to the National Information Standards Organization, metadata is “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource…metadata is often called data about data” (National Information Standards Organization).

[8] Annotation is a way to produce variables that will allow us to understand the political significance of Russian literature in the United States and catalog the constructed corpus.

[9] Reference to revisions of Annotation Questions in Appendix D.

The end goal of our research project is to form conclusions about the relationship between the reception of Russian literature in the United States and United States foreign policies. To reach these conclusions, we will need to analyze both an annotated database of articles that pertain to literature and an annotated database of articles that pertain to foreign policies.


In the data analysis section of the methodology, we expect to discover trends in the databases that provide answers to certain questions. For the Russian literature database, the questions will focus on the discourse throughout the United States surrounding the predominant Russian authors.[11]


To conduct this style of data analysis, we will use a collection of data mining strategies. Data mining refers to the process of discovering previously unknown properties of a database. Two basic strategies are keyword frequencies[12] and semantic parsing.[13]


The most important data mining analysis we plan to conduct is probabilistic topic modeling, “a suite of algorithms that aim to discover and annotate large archives of documents with thematic information” (Blei 2). A topic is a collection of words that all have a high probability of being associated with one another. The basic probabilistic topic model is Latent Dirichlet Allocation (LDA), as described in our literature review. The end result is that all the articles in the database will have labels with proportions of various topics, and the articles can then be categorized based on topic frequency. By comparison, we will implement a supervised version of LDA (sLDA) in the automation of metadata creation.[14] Finally, the last form of topic modeling that we will use is the Topics Over Time model (TOT), described in the literature review, which will introduce a time variable into our analysis (Wang and McCallum 5).

[10] The names of Russian authors often have a number of accepted spellings and are subject to frequent mistranslation (Pasterczyk). We will catalog alternative spellings of selected literary figures. The use of Boolean operators to search for common name variations in a keyword search of the Readers’ Guide Retrospective will increase the number of articles found that relate to Russian literary authors of interest. An example of common name variations can be found in Appendix F.

[11] See Appendix C for current annotation guidelines.

[12] Keyword frequencies, achieved by using the publicly available Text Analysis Portal for Research (TAPoR) tool, will allow for organization of data on a more general level (Berson). Examples of the information that TAPoR can provide are the frequencies of author references and how often author names are found near each other.

[13] We will achieve semantic parsing by using the software programs Shalmaneser and FrameNet, the latter developed by the International Computer Science Institute at the University of California, Berkeley. These programs will allow us to analyze databases using ‘frames,’ which, according to FrameNet, are semantic representations of situations. These tools highlight the types of sentences used in specific articles. For example, if an article contains many sentences framed under the semantic categories of ‘Judgment’ and ‘Assessment,’ we can safely conclude that the article contains a number of opinionated statements. See Appendix H for more information.


At the conclusion of this step in the process, we will have fully annotated and labeled the databases by all the various data mining strategies. From this data, we can determine certain trends in the topics in the articles. It is these trends that will allow us to make certain inferences about the relationship between the reception of Russian literature in the United States and United States foreign policy.
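
One simple way such a relationship could be summarized, once both databases yield a number per year, is a Pearson correlation coefficient. The sketch below is only an assumption about how that comparison might look; the proposal does not commit to this statistic, the function name pearson_r is hypothetical, and the two short series are placeholder values, not results or data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: neither series is real data from this study.
reception_score = [0.2, 0.4, 0.1, 0.5, 0.3]  # e.g. yearly mean sentiment
policy_index = [1.0, 2.0, 0.5, 2.0, 2.0]     # e.g. yearly policy measure
r = pearson_r(reception_score, policy_index)
```

A coefficient near 1 or -1 would indicate that the two series move together or in opposition; it would not by itself show that one causes the other.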

Conclusion


Our research aims to provide new insight into how the United States receives foreign authors and novels and how this reception relates to US foreign policy. Our anticipated results are vital to a recent development in the humanities known as the globalization of American literary studies, given that “the mechanisms by which [differences between countries] are translated into literature have never been fully specified” (Corse, Nations and Novels 1279). Foreign novels are an inherent part of United States culture, and if one were to ignore the presence of foreign literature in United States politics, then one would be ignoring a major factor that shaped both the citizens and government of the United States. “A sound public opinion cannot exist without access to the news” and “evidence is needed” to reveal inherent biases in publicly available portrayals of political events (Lippmann and Merz 1). Experts in fields of literary studies claim scholars reach “little agreement about what constitutes literary value in this field” and there exists “unnecessary confusion as to clear standards and goals” in evaluating these types of literature (Brown 1-8). We are also pioneering relatively new software and technology in the realm of literary analysis.

[14] In sLDA, “each document is paired with a response. The goal is to infer latent topics predictive of the response” (Blei and McAuliffe 1). Instead of letting the software construct its own distribution over topics, we will provide a fitted model, specifically the annotation form previously mentioned in the methodology (ibid). Then the software can predict a response for the previously designated topics, such as sentiment, nationality, racism, politics, etc.
By May 9th, 2012, we plan to have compiled a sample database of several hundred articles scanned and processed through the OCR software in preparation for a technical seminar with MITH. Our annotation team hopes to annotate 150 of these articles. The goal of this seminar is to experiment with some of the available database analysis software to determine how effectively the computer programs can learn to annotate articles independently and whether any trends in the metadata begin to surface. We anticipate finding a distinct correlation between the reception of foreign literature and public attitudes toward foreign policy. We will compile our completed findings into an additive online database, to which other scholars can contribute similar research. Over time, our foundation will pave the way to understanding overall patterns in foreign literature reception.




Works Cited

Aubry, Timothy. "Afghanistan Meets the Amazon:
Reading the Kite Runner in America."


PMLA: Publications of the Modern Language Association of America
124.1 (2009): 25
-

43.
EBSCO.
Web. 10 Sept. 2011.

Baldasty, Gerald J.
E.W. Scripps and the Business of Newspapers.
Urbana
-
Champaign: U of


Illinois P,
1999. Print.

Berson, Alex, Stephen Smith, and Kurt Thearling.
Building Data Mining Applications for CRM.


New York: McGraw Hill, (1999): n. pag. Print.

Blei, David M., and Jon D. McAuliffe. “Supervised Topic Models.”

Princeton U and U of


California, Ber
keley, 2010. Web. 17 Mar. 2012.

Blei, David M. “Introduction to Probabilistic Topic Models.”
Communications of the ACM.


Princeton U,
n.d. Web. 17 Mar. 2012.

Boyer, Mark A. "Issue Definition and Two
-
Level Negotiations: An Application to the American

Foreign Policy Process."
Diplomacy & Statecraft

11.2 (2000): 185
-
212.
America: History

and Life with Full Text
. Web. 27 Nov. 2011.

Brown, Joan L., and Crista Johnson. "Required Reading: The Canon in Spanish and Spanish


American Literature."
Hispania

81.
1 (1998): 1
-
19.
JSTOR.

Web. 12 Sept. 2011.

Chaney, Allison J.B., and David M. Blei.
“Visualizing Topic Models.”
International AAAI


Conference on Social Media and Weblogs.

Princeton U

Dept. of Computer Science,


2012. Web. 15 Mar. 2012.

Corse, Sarah M.

Nationalism and Literature: The Politics of Culture in Canada and the United


States
. Cambridge: Cambridge University Press, 1997. Print.


Team POLITIC

17

---. “Nations and Novels: Cultural Politics and Literary Use.” Social Forces 73.4 (1995): 1279-308. JSTOR. Web. 8 Sept. 2011.

“Deductions.” New Republic 4 Aug. 1920: 42-3. EBSCOhost. Web. 20 Mar. 2012.

Emerson, Caryl. “Leo Tolstoy On Peace And War.” PMLA: Publications Of The Modern Language Association Of America 124.5 (2009): 1855-58. Academic Search Premier. Web. 15 Mar. 2012.

Gilens, Martin. “Political Ignorance and Collective Policy Preferences.” American Political Science Review 95.2 (2001): 379-96. Web. 29 Nov. 2011.

Goldfarb, Charles. “William Dean Howells: An American Reaction to Tolstoy.” Comparative Literature Studies 8.4 (1971): 317-37. JSTOR. Web. 12 Mar. 2012.

Griswold, Wendy. “The Fabrication of Meaning: Literary Interpretation in the United States, Great Britain, and the West Indies.” American Journal of Sociology 92.5 (1987): 1077-115. JSTOR. Web. 13 Sept. 2011.

Haslam, Paul Alexander. “The Evolution of the Foreign Direct Investment Regime in the Americas.” Third World Quarterly 31.7 (2010): 1181-203. Academic Search Premier. Web. 27 Nov. 2011.

Lee, Lillian, and Bo Pang. “Sentiment of Two Women: Sentiment Analysis and Social Media.” 1900 University Avenue, Cornell University, New York. 22 Mar. 2011. Lecture.

Li, V. “Misgivings of a Tongue-Tied Nation.” Editorial Research Reports 2 (1990): n. pag. CQ Researcher. Web. 13 Sept. 2011.

Lippmann, Walter, and Charles Merz. “A Test of the News: Introduction.” New Republic 4 Aug. 1920: 1-4. EBSCOhost. Web. 17 Mar. 2012.


Mallios, Peter Lancelot. Our Conrad: Constituting American Modernity. Stanford: Stanford UP, 2010. Google Books. Web. 15 Sept. 2011.

Moser, Charles A. “The Achievement Of Constance Garnett.” American Scholar 57.3 (1988): 431. Academic Search Premier. Web. 20 Mar. 2012.

National Information Standards Organization. Understanding Metadata. Bethesda: NISO P, 2004. Web. 17 Mar. 2012.

Ohmann, Richard. “The Shaping Of A Canon: U.S. Fiction, 1960-1975.” Critical Inquiry 10.1 (1983): 199-223. MLA International Bibliography. Web. 13 Nov. 2011.

Pasterczyk, Catherine E. “Russian Transliteration Variations for Searchers.” Education Resources Information Center 8.1 (1985): n. pag. Web. 20 Mar. 2012.

Steyvers, Mark. “Probabilistic Topic Models.” Handbook of Latent Semantic Analysis. Mahwah, NJ: Lawrence Erlbaum Associates, 2007.

“TIFF Files.” John Salim Photographic Glossary of Terms. 2012. Web. 20 Mar. 2012.

Travis, Rick. “Problems, Politics, and Policy Streams: A Reconsideration US Foreign Aid Behavior toward Africa.” International Studies Quarterly 54.3 (2010): 797-821. Academic Search Premier. Web. 27 Nov. 2011.

Wang, Xuerui, and Andrew McCallum. “Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends.” U of Massachusetts Dept. of Computer Science, 2006. Web. 15 Mar. 2012.

Watson, Robert P., and Sean McCluskie. “Human Rights Considerations and U.S. Foreign Policy: The Latin American Experience.” Social Science Journal 34.2 (1997): 249-57. Academic Search Premier. Web. 27 Nov. 2011.



Appendices

Appendix A: Team Budget


Item | Cost Per Item | Cost
--- | --- | ---
Immediate Expenses: | |
MLA Guide Book (already purchased from MLA.org) | $22.00 |
Large External Hard Drive (1+ Terabyte) | $300.00 |
Subtotal: | | $322.00
Foreseeable Expenses: | |
Hiring Technical Consultant for Enhancement of Existing Tools | $1,500.00 |
Travel Expenses (Conferences) | $3,000.00 |
Subtotal: | | $4,500.00
TOTAL: | | $4,822.00




Appendix B: Team Timeline

Spring 2012
o Complete team website
o Continue literature review
o Begin scanning periodicals into constructed Russian literature database
o Begin annotating Russian literature database and select metadata to capture
o Begin coordination with MITH and start to familiarize team with methods of constructing and analyzing databases
    • Attempt to automate metadata collection

Summer 2012
o Continue scanning and annotation of Russian literature database

Fall 2012
o Prepare for and present at Junior Colloquium
o Determine methods by which to quantify American foreign policies
    • Begin construction of foreign attitude / policy database

Spring 2013
o Present at Undergraduate Research Day
o Begin drafting team thesis

Summer 2013
o Continue to draft team thesis

Fall 2013
o Obtain feedback for our thesis paper from Dr. Mallios
o Gather data regarding American foreign policy toward Russia
o Draw conclusions regarding the relationship between American foreign policies and reception of Russian literature

Winter 2013-14
o Prepare presentation for Thesis Conference
o Revise and edit team thesis

Spring 2014
o Present at Senior Thesis Conference



Appendix C: Current Annotation Guidelines

1. Author (or authors) of principal concern in article. What literary author or authors, if any, is this article primarily about?

Spelling:

-- Be sure to spell any names given in answer to this question as accurately as possible, exactly reproducing how the name is spelled in this article. (Spellings will differ between articles: we want to capture the differences.)

-- Include the fullest version of the author’s name included in the article: i.e., include an author’s first and/or middle names and/or initials if these names are included at any point in the article.

Individuals: Only literary authors named by personal name (i.e., not anonymous figures or those referenced only by job title) and who are persons (i.e., not publications) count as “authors” for purposes of this question.

• “Literary author” means an author of fiction, poetry, plays, or related forms of creative writing. This applies whether the author is being invoked in his or her capacity as a literary writer or not. Academic professors, literary critics, and journalistic and other commentators on literature do not fall into this category, unless they have significant literary accomplishments of their own.

• An author is of “principal” or “primary” concern in an article when an author is a major, continual, or focal concern that runs and receives explicit mention throughout an article as part of its general field of concerns, not just in discrete or severable paragraphs of it.

• Some more rules of thumb on identifying whether an author is a “primary” or “principal” concern in an article:

• if a literary author’s name is included in the article’s title, it is likely that s/he should be included in the answer to this question

• if there is a large disproportion between the number of times different authors are mentioned or referred to, this is a good indicator that those mentioned less should likely not be included in the answer to this question



• if the excising of relatively few paragraphs from this article would result in the elimination of reference to an author, that author should generally not be included in the answer to this question

• as a general matter, construe answers to this question narrowly: only an author (or authors) comprising the main and consistent focus of an article should be included, although articles whose explicit focus is evenly to compare two (or more) authors throughout may be described as having multiple “principal” authors

2. Sentiment Analysis 1: the Opinion of the Article Writer. Which of the following ratings comes closest to the article writer’s expressed opinion of the literary author(s) this article principally concerns? [Note: this question concerns the opinion ultimately taken by the article writer him/herself on the literary authors in question. This is so even though the article writer may quote or reference opposing opinions along the way.] This question should be answered separately for each author named in question 1.

2  A Positive Opinion: a generally or ultimately positive opinion as an overall matter.

0  A Negative Opinion: a generally or ultimately negative opinion as an overall matter.

U  A Mixed or Unclear Opinion, or No Opinion Offered: it is not possible to say whether the writer’s overall opinion of an author is either positive or negative because the writer’s opinions are mixed, unclear, or not offered at all.

3. Sentiment Analysis 2: Uncertainty of Article Writer’s Opinion. If the answer to Question 2 is “U,” answer the following question; if not, skip it. Which of the following ratings comes closest to describing why the article writer’s opinion of a principal literary author is unclear? This question should be answered separately for each author named in question 1.

1  A Mixed or Unclear Opinion: the article writer either expresses mixed opinions about the literary author, or does not make clear how the opinions, judgments, or values s/he holds relate to the literary author

X  Straight Factual Account: this is not an article in which the article writer’s personality, opinions, or judgments are in evidence; the article writer assumes the position of the “straight,” factual, objective newspaper reporter; the article writer’s stance is neutral with respect to his/her own opinions and values, not evaluative.

4. Sentiment Analysis 3: Principal Author as Subject of Debate. (Y/N) Does this article contain any explicit reference to the literary author(s) it principally concerns as a subject of debate, either because interpretations of that literary author’s meaning are explicitly disputed, or because opposing positive and negative opinions of an author are explicitly referenced?



5. Books mentioned? (Y/N). Does this article explicitly mention by title any specific books, poems, or texts written by any literary author it is principally about? Note: this question should be answered separately for each author named in question 1.

6. National identification. (Y/N) Does this article specifically identify the nationality of any literary author it is principally about? Note: this question should be answered separately for each author named in question 1.

7. Style or literary artistry as issue. (Y/N) With respect to any literary author this article is principally about, is the author explicitly described in terms of “art” or as an “artist” or in terms of his or her “artistic” vision, or is at least one paragraph of the article devoted to the style (not the content) of his or her writing? (A “yes” answer to any part of this question means a YES answer to the question as a whole.) Note: this question should be answered separately for each author named in question 1.

8. Foreign Place Names. (Y/N) Are there any non-U.S. place names mentioned in this article?

9. Gender of Article Writer. Use the following scale to identify the apparent gender of the writer of this article (i.e., not the gender of the literary figure(s) in question, but the gender of the article writer who is writing about the literary figure(s)):

M  Male

F  Female

U  Unclear (i.e., because name is ambiguous or initials are used; the article is unsigned; or for another reason)

10. Gender as Issue. (Y/N) Is gender ever explicitly discussed as an issue in this article?

• Note: The fact that a character or author discussed in the article is a man or woman is not sufficient to constitute a Yes answer to this question; there needs to be some explicit attention drawn to gender as a matter of significance (if only in a single phrase) -- or reflection on or significance attributed to the categories of “man” or “woman,” “masculine” or “feminine,” or other gender ideas.

11. Race as Issue. (Y/N) Is race ever explicitly raised as an issue in this article?

• Note: this question should be answered “Yes” only if: (i) the article explicitly uses the term “race” (or some direct variant on it: “racial,” “racism,” etc.); (ii) there is explicit discussion about general ideas of race; or (iii) one of the following racialized categories is explicitly invoked: black or African; white or Aryan or Caucasian; Slavic; Jewish or Hebrew.



12. Socioeconomic class as issue. (Y/N) Does socioeconomic class receive explicit discussion in this article?

• Note: Any explicit mention of social class (for example, “aristocratic,” “peasant,” “the poor,” “Count,” “prince”) will qualify as a YES answer to this question. (Czar, however, as a state figure, does not alone qualify.)

13. Religion as Issue. (Y/N) Does religion receive explicit discussion in this article?

14. Radical Politics as issue. (Y/N) Do any radical political movements including anarchism, nihilism, bolshevism, socialism, or communism receive explicit mention in this article?

15. America/West invoked as a point of similarity with Russia. (Y/N) Does this article make any specific and explicit claims that Russia shares any quality in common with the U.S., “the West,” or any of the countries, cultures, and/or literatures of Western Europe?

16. America/West invoked as point of contrast with Russia. (Y/N) Does this article draw any specific and explicit contrasts between Russia or anything Russian and any qualities or aspects of the U.S., “the West,” or any of the countries, cultures, and/or literatures of Western Europe?
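
The dependency between questions 2 and 3 (question 3 is answered only when question 2 is “U”) is the kind of rule that could be checked automatically when annotations are entered into a database. The following Python sketch illustrates one possible check. The class name, field names, and function name are hypothetical and are not part of the team's tools; only the codes 2, 0, U, 1, and X come from the guidelines.

```python
from dataclasses import dataclass
from typing import List, Optional

# Codes taken from questions 2 and 3 of the annotation guidelines.
OPINION_CODES = {"2", "0", "U"}   # positive, negative, mixed/unclear/none
UNCERTAINTY_CODES = {"1", "X"}    # mixed or unclear, straight factual account


@dataclass
class AuthorAnnotation:
    """Answers recorded for one principal author (hypothetical field names)."""
    author: str                        # question 1, spelled as in the article
    opinion: str                       # question 2
    uncertainty: Optional[str] = None  # question 3, only when opinion is "U"


def validate(annotation: AuthorAnnotation) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    errors = []
    if not annotation.author.strip():
        errors.append("question 1: author name is missing")
    if annotation.opinion not in OPINION_CODES:
        errors.append("question 2: opinion must be 2, 0, or U")
    if annotation.opinion == "U":
        # Question 3 is required only for a "U" answer to question 2.
        if annotation.uncertainty not in UNCERTAINTY_CODES:
            errors.append("question 3: must be 1 or X when question 2 is U")
    elif annotation.uncertainty is not None:
        errors.append("question 3: must be skipped unless question 2 is U")
    return errors


if __name__ == "__main__":
    record = AuthorAnnotation(author="Tolstoi", opinion="U", uncertainty="X")
    print(validate(record))
```

Because questions 2 and 3 are answered separately for each author named in question 1, an article with two principal authors would simply carry two such records.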





Appendix D: Sample Annotation Question Evolution

Current Sample Annotation Question

4. Sentiment Analysis: Principal Author as Subject of Debate. (Y/N) Does this article contain any explicit reference to the literary author(s) it principally concerns as a subject of debate, either because interpretations of that literary author’s meaning are explicitly disputed, or because opposing positive and negative opinions of an author are explicitly referenced?

Original Sample Annotation Question

4. Sentiment Analysis: All Opinions Expressed in the Article. [This question concerns all opinions expressed in the article concerning the literary writers in question, whether they express the article’s own point of view or other perspectives quoted and referenced in the article.] Which of the following ratings comes closest to the entire field of opinions quoted or mentioned in this article concerning each of the literary authors the article principally concerns? Note: this question should be answered separately for each author named in question 1.

2  A Positive Opinion: a generally or ultimately positive opinion as an overall matter

1  A Mixed or Unclear Opinion: such that it is not possible to say whether the article’s overall opinion of an author is positive or negative

0  A Negative Opinion: a generally or ultimately negative opinion as an overall matter

X  Neutral: This article is not evaluative: it does not express opinions about the author(s) in question, but is rather strictly and neutrally factual





Appendix E: Search Results Using the Readers’ Guide Retrospective

Author / Subject | Total # Search Results
--- | ---
Tolstoy, L.N. | 432
Chekhov, A.P. | 266
“russian literature” | 19[?]
Dostoevsky, F.M. | 12[?]
Gorky, M. | 12[?]
Turgenev, I.S. |
Breshko-Breshkovskaya, E.[?]. |

Appendix F: Alternative spellings of “Dostoevsky”

“dostoevsky” OR “dostoyevsky” OR “dostoevskii” OR “dostoyevskii” OR “dostojevsky” OR “dostojevskii” OR “dostoeffsky” OR “dostoyeffsky” OR “dostoeffskii” OR “dostoyeffskii” OR “dostoieffsky” OR “dostoievsky” OR “dostoieffskii” OR “dostoievskii” OR “dosteovsky” OR “dostoyefsky” OR “dostoievski” OR “dosteoffsky” OR “dosteovskii” OR “dostoefsky” OR “dostoefskii” OR “dostojefsky” OR “dostojefskii” OR “dostojefski” OR “dostoevski” OR “dosteovski” OR “dostoyevski” OR “dostojevski” OR “dostojeffski” OR “dostoyeffski” OR “dostoeffski” OR “dostoieffski” OR “dostoievski” OR “dostojefski” OR “dostoyefski” OR “dostoefski” OR “dostoiefski”

Alternative spellings research conducted by Nick Slaughter of the Foreign Literatures in America project.
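
A query of this form can be assembled from a list of variants rather than typed by hand, and the same list can be used to find which spelling a given article actually uses (the annotation guidelines ask for the spelling exactly as printed). The Python sketch below is one way this might be done. The function names are hypothetical, only a few of the variants above are listed for brevity, and the straight quotation marks in the generated query are an assumption about what a database search box accepts.

```python
import re
from typing import Iterable, List, Pattern

# A short subset of the variant spellings listed in this appendix.
VARIANTS = ["dostoevsky", "dostoyevsky", "dostoevskii", "dostoievsky", "dostoieffsky"]


def build_or_query(variants: Iterable[str]) -> str:
    """Join variants into a quoted OR query, dropping repeated spellings."""
    unique = list(dict.fromkeys(v.lower() for v in variants))  # keeps first-seen order
    return " OR ".join('"{}"'.format(v) for v in unique)


def build_pattern(variants: Iterable[str]) -> Pattern[str]:
    """Compile a case-insensitive whole-word pattern matching any variant."""
    unique = sorted({v.lower() for v in variants}, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in unique)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def find_spellings(text: str, pattern: Pattern[str]) -> List[str]:
    """Return each matched spelling exactly as it appears in the text."""
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(build_or_query(VARIANTS))
    sample = "Dostoievsky is compared with Tolstoy; Dostoevskii is the later spelling."
    print(find_spellings(sample, build_pattern(VARIANTS)))
```

Dropping repeated spellings matters here because a hand-built list of this length can easily contain the same variant twice.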




Appendix G: Sample Chart of Periodicals within Readers’ Guide Retrospective: 1890-1982

Source Type | ISSN / ISBN | Publication Name | Publisher | Indexing Start | Indexing Stop
--- | --- | --- | --- | --- | ---
Magazine | 0163-2027 | 50 Plus | Reader's Digest Association, Inc. | 1/1/83 | 11/1/88
Magazine | 1548-2014 | AARP the Magazine. | AARP | 5/1/03 |
Magazine | 1041-102X | Ad Astra | National Space Society | 1/1/89 |
Magazine | 0955-2308 | Adults Learning | National Institute of Adult Continuing Education | 1/1/95 |
Magazine | 0001-8996 | Advocate | Regent Media | 1/16/01 |
Magazine | 0002-0966 | Aging | Superintendent of Documents | 11/1/82 | 1/1/96
Academic Journal | 1205-7398 | Alternatives Journal | University of Waterloo | 1/1/05 |
Magazine | | Amazing Wellness | Active Interest Media, Inc. | 1/1/10 |
Magazine | 1545-8741 | Amber Waves: The Economics of Food, Farming, Natural Resources, & Rural America | U.S. Dept. of Agriculture Economic Research Service | 2/2/04 |
Magazine | 0002-7049 | America | America Press | 1/1/83 |
Magazine | 0002-7375 | American Artist | Interweave Press, LLC | 1/1/83 |
Magazine | 1540-966X | American Conservative | American Conservative | 1/16/06 |
Magazine | 1079-3690 | American Cowboy | Active Interest Media, Inc. | 2/1/11 |
Magazine | 0194-8008 | American Craft | American Craft Council | 2/1/83 |
Academic Journal | 0002-8304 | American Education | US Department of Education | 12/1/82 | 1/3/85
Magazine | 0361-4751 | American Film | BPI Communications | 1/1/88 | 1/1/92
Magazine | 0002-8541 | American Forests | American Forests | 9/1/92 |
Academic Journal | 1549-4934 | American Geographical Society's Focus on Geography | Wiley-Blackwell | 10/15/05 |
Magazine | 1523-3359 | American Health | RD Publications Inc. | 1/1/99 | 10/1/99
Magazine | 0730-7004 | American Health (0730-7004) | RD Publications Inc. | 1/1/88 | 1/1/97
Magazine | 1092-1656 | American Health for Women | RD Publications Inc. | 12/1/96 | 1/1/99
Magazine | 0002-8738 | American Heritage | AHMC Inc. | 2/1/83 |
Magazine | 1076-8866 | American History | Weider History Group | 6/1/94 |
Magazine | 0002-8770 | American History Illustrated | Weider History Group | 1/1/83 | 3/1/94
Academic Journal | 0095-182X | American Indian Quarterly | University of Nebraska Press | 1/1/90 |
Academic Journal | 1067-8654 | American Journalism Review | University of Maryland | 3/1/93 |
Academic Journal | 0003-0937 | American Scholar | Phi Beta Kappa Society | 1/15/83 |


Appendix H: Glossary of Terms

Classification: Supervised (requires human input) method of analyzing text in which the user first defines labels of how they want a collection of words, sentences, etc. to be classified. Next, the user creates a training corpus of words, sentences, etc. that is already classified according to the specified labels to train the software. The user can then input the collection of words, sentences, etc. they want to “classify” by the labels.

Corpus: A large body of texts, often the entirety of works by an author, articles by a newspaper, or writings about a certain subject

Keyword frequencies: How often a word appears in literature

Latent Dirichlet Allocation: Abbreviated LDA, attributes each word in a written document to a select number of topics determined to compose the document

Optical Character Recognition Software: Abbreviated OCR, translates PDFs and scans of either handwritten or typed texts into electronic machine readable text

Semantic Parsing/Analysis: Also known as opinion mining, using text analysis to determine subjective information in written works

Shalmaneser: A supervised tool (requires human input) for semantic and syntactic parsing, which automatically assigns text to semantic and syntactic classes. Generates output such as the following figure:







Where the original sentence is: Creeping in its shadow I reached a point whence I could look straight through the uncurtained window.

The green text is the generated analysis of the semantics of the sentence and the gray text is the generated analysis of the syntax of the sentence.

Text Analysis Portal for Research: Abbreviated TAPoR, a collaborative project that permits researchers to use text analysis tools for the Humanities

Topic modeling: The use of a type of statistical model that generates abstract “topics” in a database of documents
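
Two of the glossary terms, keyword frequencies and classification, can be illustrated together in a few lines of Python. The sketch below counts how often each word appears in a text, then uses those counts in a small naive Bayes classifier: the user supplies labels and a training corpus that is already labeled, and the classifier assigns one of those labels to new text. Naive Bayes is chosen here only because it is simple to show; it is an assumption, not the method the team has committed to, and the training sentences are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_frequencies(text: str) -> Counter:
    """Count how often each word appears in the text."""
    return Counter(tokenize(text))


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocabulary = set()

    def train(self, labeled_texts: Iterable[Tuple[str, str]]) -> None:
        """Learn word frequencies from (text, label) pairs."""
        for text, label in labeled_texts:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def classify(self, text: str) -> str:
        """Return the label with the highest log-probability for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training corpus, for illustration only.
TRAINING = [
    ("a masterful and moving novel", "positive"),
    ("a brilliant portrait of peasant life", "positive"),
    ("a tedious and gloomy novel", "negative"),
    ("dull morbid and tedious writing", "negative"),
]

if __name__ == "__main__":
    classifier = NaiveBayesClassifier()
    classifier.train(TRAINING)
    print(keyword_frequencies("War and peace, and war again").most_common(2))
    print(classifier.classify("a moving and brilliant portrait"))
```

In practice the training corpus would be the manually annotated articles, with the labels drawn from the annotation guidelines, and far more than four examples would be needed for the results to mean anything.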