Word Document - Bryan Kane


Oct 25, 2013


Bryan Kane

4: Senior


Writing 2: Functional specs and use

Functional Overview

EasyRead is a system designed to simplify text content to a given reading level, based on the targeted
audience. Once a user submits text (e.g. a news article, blog post, or short story) and a reading level,
EasyRead will process the text and identify difficult passages. The user will be presented with tips and
suggested replacements that can be accepted to simplify the material. Additionally, EasyRead will offer a
platform for users to submit their simplified content to a central repository, for other people to use and
rate. This crowdsourcing facet will not only provide additional material for users, but also allow for
automatic training of the EasyRead system, which will lead to better results in the future.

This system is critically important in many elementary school environments and for adult learners who are
learning how to read. In education environments, there is a huge demand for relevant, but simplified content.
For an elementary school student, reading an article in the New York Times is extremely difficult and will just
lead to frustration due to the writing style of the article. However, repeatedly reading antiquated passages
from a textbook or journal becomes boring and uninteresting. For adult learners no longer in a formal school
environment, finding appropriate content that is both interesting and relevant becomes even more challenging.

Currently, many teachers will either use these uninteresting texts, or painstakingly modify current news articles
into a simpler version for classroom use. However, it is difficult for instructors to find all of the ways that
content can be simplified without losing any context or critical information. By turning the simplification of
text into an easy and quick task, instructors will be able to use much more relevant material, increasing student
engagement and willingness to read while concurrently teaching and discussing current events / other

In order to accomplish such tasks, EasyRead will require technologically challenging components.

The first aspect of the system, which identifies and helps correct passages, will make use of natural
language processing. Natural language processing, or NLP, helps analyze written natural languages
(such as English). One common approach to analyzing language is to use language parsers to tag the words
in sentences and phrases by their parts of speech.

Through the use of the Natural Language Toolkit (NLTK), a popular platform that allows for advanced NLP
techniques, EasyRead will be able to analyze any natural language document. With an implementation of
NLTK functions, EasyRead will identify difficult words and known trouble phrases, and suggest
alternatives. In the English language, there are common patterns of parts of speech, and the ordering of
the parts of speech in a phrase can be used to determine the difficulty level (relative to other orderings).
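
As a rough illustration of the part-of-speech ordering idea, the sketch below scores a phrase from its tag sequence. It assumes the phrase has already been tagged as (word, tag) pairs, which is the shape returned by NLTK's `nltk.pos_tag`. The `PATTERN_DIFFICULTY` table, its scores, and the function names are invented for this example and are not part of EasyRead.

```python
from typing import List, Tuple

# Hypothetical difficulty scores for part-of-speech bigrams (Penn Treebank tags).
# A real table would be derived from data; these values are placeholders.
PATTERN_DIFFICULTY = {
    ("DT", "NN"): 1,  # determiner + noun, e.g. "the dog"
    ("JJ", "NN"): 2,  # adjective + noun, e.g. "red ball"
    ("JJ", "JJ"): 3,  # stacked adjectives
    ("NN", "NN"): 3,  # noun compounds
}
DEFAULT_SCORE = 2  # assumed score for bigrams missing from the table


def tag_bigrams(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the adjacent pairs of tags in a tagged phrase."""
    tags = [tag for _, tag in tagged]
    return list(zip(tags, tags[1:]))


def phrase_difficulty(tagged: List[Tuple[str, str]]) -> float:
    """Average the bigram scores; phrases with fewer than two tokens score 0."""
    bigrams = tag_bigrams(tagged)
    if not bigrams:
        return 0.0
    total = sum(PATTERN_DIFFICULTY.get(pair, DEFAULT_SCORE) for pair in bigrams)
    return total / len(bigrams)


easy = [("the", "DT"), ("dog", "NN")]
hard = [("large", "JJ"), ("municipal", "JJ"), ("water", "NN"), ("system", "NN")]
print(phrase_difficulty(easy), phrase_difficulty(hard))
```

The score is only meaningful relative to other orderings, which matches the relative notion of difficulty described here; the absolute numbers carry no meaning on their own.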

EasyRead also faces the challenge of analyzing simplified content: upon submission of simplified
content, EasyRead will analyze the changes made to the document and attempt to find patterns across multiple
submissions. Eventually, once enough content is available to be deeply analyzed, EasyRead will learn from
patterns found in user submissions, which will provide more accurate suggestions. By determining which
sentences are rephrased, added, or removed, EasyRead will be better able to detect difficult or easy passages,
and provide better recommendations for changes.
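
One possible way to determine which sentences were rephrased, added, or removed is a sentence-level diff. The sketch below uses Python's standard `difflib.SequenceMatcher`; the naive sentence splitter and the function names are assumptions made for the example, not the EasyRead design.

```python
import difflib
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_changes(original: str, simplified: str) -> Dict[str, list]:
    """Group sentence-level differences into rephrased, added, and removed."""
    before = split_sentences(original)
    after = split_sentences(simplified)
    changes: Dict[str, list] = {"rephrased": [], "added": [], "removed": []}
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            # A run of original sentences swapped for a run of new ones
            changes["rephrased"].append((before[i1:i2], after[j1:j2]))
        elif op == "insert":
            changes["added"].extend(after[j1:j2])
        elif op == "delete":
            changes["removed"].extend(before[i1:i2])
    return changes


original = ("The council convened at noon. It discussed the budget. "
            "Deliberations were protracted. The vote passed.")
simplified = ("The council met at noon. It discussed the budget. "
              "The vote passed. People were happy.")
result = classify_changes(original, simplified)
```

Because the diff works on whole sentences, a deletion that sits directly next to a rephrased sentence is reported together with it as one "rephrased" run; a fuller implementation would likely need a similarity measure to separate the two.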

Core Requirements

EasyRead must be able to identify difficult words based on a given reading level.

Through a large mapping of tens of thousands of words to difficulty levels, specific words can
be targeted for replacement.

Additionally, the system must be able to identify phrases that are worded in a difficult manner.

The user should be presented with suggestions for rephrasing a sentence in an
easier-to-understand manner.

There should also be context associated with these suggestions, explaining what
makes the current phrasing difficult to read.
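
A minimal sketch of the word-level requirement, under stated assumptions: `WORD_LEVELS` is a tiny hypothetical stand-in for the large word-to-difficulty mapping, the numeric levels are invented, and the `reason` string stands in for the context shown alongside a suggestion.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical word -> difficulty level mapping. The real mapping would hold
# tens of thousands of entries; these few are placeholders.
WORD_LEVELS: Dict[str, int] = {
    "convene": 9,
    "meet": 2,
    "purchase": 6,
    "buy": 1,
    "dog": 1,
}


class Flag(NamedTuple):
    word: str
    level: int
    reason: str


def flag_difficult_words(text: str, target_level: int) -> List[Flag]:
    """Flag every known word whose level is above the target reading level."""
    flags = []
    for word in re.findall(r"[a-z']+", text.lower()):
        level = WORD_LEVELS.get(word)  # words missing from the mapping are skipped
        if level is not None and level > target_level:
            reason = (f"'{word}' is rated level {level}, "
                      f"above the target level {target_level}")
            flags.append(Flag(word, level, reason))
    return flags


flags = flag_difficult_words("They convene to purchase a dog.", target_level=3)
for flag in flags:
    print(flag.reason)
```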

After a user submits their changes to a piece of text, the EasyRead system should analyze the changes
that are made.

Through a comparison of the original text to the modified text, the system will
determine which areas were modified, and the types of changes that were made.

For example, if a sentence included 3 adjectives of medium difficulty, and was
altered to include only 2 easy-to-understand adjectives, that pattern should be stored.
If it is a commonly seen pattern across multiple pieces of text, it should start to be
suggested for new text that comes in.
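
The store-then-suggest behavior in the adjective example could be sketched as a simple counter with a threshold. The `PatternStore` class, the string encoding of a pattern, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import List


class PatternStore:
    """Counts (before, after) change patterns and suggests the common ones."""

    def __init__(self, min_count: int = 3) -> None:
        self.counts: Counter = Counter()
        self.min_count = min_count  # assumed cutoff for "commonly seen"

    def record(self, before: str, after: str) -> None:
        """Store one observed change from a user submission."""
        self.counts[(before, after)] += 1

    def suggestions_for(self, before: str) -> List[str]:
        """Return replacements seen at least min_count times, most common first."""
        ranked = [
            (count, after)
            for (seen_before, after), count in self.counts.items()
            if seen_before == before and count >= self.min_count
        ]
        ranked.sort(key=lambda pair: -pair[0])
        return [after for _, after in ranked]


store = PatternStore(min_count=2)
store.record("3 medium adjectives", "2 easy adjectives")
store.record("3 medium adjectives", "2 easy adjectives")
store.record("3 medium adjectives", "1 easy adjective")
print(store.suggestions_for("3 medium adjectives"))
```

Only the pattern seen twice clears the threshold here, so the one-off change is stored but not yet suggested.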

List of Functions

Perhaps the most important aspect of EasyRead will be the crowdsourced repository of simplified
content.

Not only will this make it easier for users to find pre-simplified content for use, but it
will also provide a vast range of data for analysis.

Through supervised machine learning, given the simplified content (along with its
original, unsimplified version), EasyRead will be able to analyze the specific changes
that are made in sentence structure, word choice, and style, which will allow for
better simplification suggestions for future users.

In order to generate the repository of simplified content, EasyRead must provide an easy
interface for generating the simplified content.

The web-based interface will allow users to upload text content to be analyzed.

Difficult areas should be identified on the screen, along with an explanation of what
makes the text difficult, and possible ways to simplify the content.

Of a much lower priority is the integration of the service with third-party sources, such as CNN,
Reuters, Google News, popular blogs, and other sources of relevant, but sometimes difficult
content.

These sources will be consumed through public third-party APIs.

Data Structures

While it is still too early in the design process to determine exactly what data structures will be required,
general ideas and estimations can be made. The web-based system that interacts with users will require a
database (likely MySQL) and a web server with some temporary storage. All of the documents, and the
data that is parsed from each document, will live on this server. The advanced machine learning
computation will run as a separate service. Documents, once they are submitted to the
server, will be placed on a queue to be processed and analyzed.
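
The queue-based hand-off described above might look roughly like the following. This is a single-process sketch using Python's standard `queue` and `threading` modules; the worker thread stands in for the separate analysis service, and `process_documents`, `run_worker`, and the `analyze` callback are hypothetical names.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_worker(jobs: "queue.Queue", analyze: Callable[[str], str],
               results: List[str]) -> None:
    """Pull documents off the queue and analyze them until a None sentinel arrives."""
    while True:
        document = jobs.get()
        if document is None:  # sentinel: no more documents
            jobs.task_done()
            break
        results.append(analyze(document))
        jobs.task_done()


def process_documents(documents: Iterable[str],
                      analyze: Callable[[str], str]) -> List[str]:
    """Queue submitted documents and let one worker process them in order."""
    jobs: "queue.Queue" = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(target=run_worker, args=(jobs, analyze, results))
    worker.start()
    for document in documents:
        jobs.put(document)
    jobs.put(None)
    jobs.join()    # wait until every queued item has been handled
    worker.join()
    return results


# Stand-in analysis step: count the words in each document
print(process_documents(["first submitted text", "second one"],
                        lambda doc: f"{len(doc.split())} words"))
```

In a deployment along the lines sketched in this section, the in-memory queue would presumably be replaced by a persistent queue shared between the web server and the machine learning service.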