Word Document - Bryan Kane

25 Oct 2013

Bryan Kane

CS4243-4: Senior Design

Writing 2: Functional specs and use-cases

Functional Overview

EasyRead is a system designed to simplify text content to a chosen reading level, based on the targeted
audience. Once a user submits content (e.g. a news article, blog post, or short story) and a desired reading
level, EasyRead will process the text and identify difficult passages. The user will be presented with tips and
suggested replacements that can be accepted to simplify the material. Additionally, EasyRead will offer a
platform for users to submit their post-simplification content to a central repository for other people to use
and rate. This crowdsourcing facet will not only provide additional material for users, but also allow for
automatic training of the EasyRead system, which will lead to better results in the future.

This system is critically important in many elementary school environments and for adult learners who are
learning how to read. In education environments, there is a huge demand for relevant but simplified content.
For an elementary school student, reading an article in the New York Times is extremely difficult and will just
lead to frustration due to the writing style of the article. However, repeatedly reading antiquated passages
from a textbook or journal becomes boring. For adult learners no longer in a formal school environment, finding
appropriate content that is both interesting and relevant becomes even more challenging.

Currently, many teachers either use these uninteresting texts or painstakingly rewrite current news articles
into a simpler version for classroom use. However, it is difficult for instructors to find all of the ways that
content can be simplified without losing context or critical information. By turning the simplification of
text into an easy and quick task, EasyRead will let instructors use much more relevant material, increasing
student engagement and willingness to read while concurrently teaching and discussing current events and other
interesting information.

In order to accomplish these tasks, EasyRead will require many technologically challenging implementations.
The first aspect of the system, which identifies and helps correct passages, will make use of natural-language
processing. Natural-language processing, or NLP, helps analyze written natural languages (such as English). One
common approach to analyzing language is to use language parsers to categorize words in sentences and phrases
by their parts of speech.

Through the use of the Natural Language Toolkit (NLTK), a popular platform that allows for advanced NLP
techniques, EasyRead will be able to analyze any English-language document. With an implementation of NLTK
functions, EasyRead will identify difficult words and known trouble phrases, and suggest possible alternatives.
In the English language, there are common patterns of parts of speech, and the ordering of the parts of speech
in a phrase can be used to determine its difficulty level (relative to other orderings).
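
The part-of-speech ordering idea can be pictured with a minimal sketch. It assumes sentences have already been tagged as (word, tag) pairs, which is the shape that NLTK's `nltk.pos_tag` returns for a token list; the toy data below is hand-tagged so the example runs without NLTK. The function names and the "fraction of unseen tag pairs" score are illustrative assumptions, not part of the EasyRead design.

```python
from collections import Counter

def tag_bigrams(tagged_sentence):
    """Return the consecutive part-of-speech tag pairs of one tagged sentence."""
    tags = [tag for _word, tag in tagged_sentence]
    return list(zip(tags, tags[1:]))

def build_bigram_counts(easy_sentences):
    """Count tag pairs across sentences assumed to be easy to read."""
    counts = Counter()
    for sentence in easy_sentences:
        counts.update(tag_bigrams(sentence))
    return counts

def rarity_score(tagged_sentence, counts):
    """Fraction of tag pairs in the sentence never seen in the easy corpus.

    0.0 means every ordering was seen before; 1.0 means none were.
    """
    bigrams = tag_bigrams(tagged_sentence)
    if not bigrams:
        return 0.0
    unseen = sum(1 for pair in bigrams if counts[pair] == 0)
    return unseen / len(bigrams)

# Hand-tagged toy data (Penn Treebank tags), standing in for tagger output.
easy = [
    [("The", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("A", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
counts = build_bigram_counts(easy)
simple = [("The", "DT"), ("bird", "NN"), ("sings", "VBZ")]
harder = [("Running", "VBG"), ("quickly", "RB"), ("the", "DT"), ("dog", "NN")]
simple_score = rarity_score(simple, counts)
harder_score = rarity_score(harder, counts)
```

A sentence whose tag orderings all appear in the easy reference text scores 0.0, while unfamiliar orderings push the score toward 1.0. A real system would likely need longer tag sequences and a much larger reference corpus.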

EasyRead also faces the challenge of analyzing simplified content: upon submission of user-simplified content,
EasyRead will analyze the changes made to the document and attempt to find patterns across multiple
submissions. Eventually, once enough content is available to be deeply analyzed, EasyRead will learn from
patterns found in user submissions, which will provide more accurate suggestions.
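
One simple way to picture "finding patterns across multiple submissions" is a counter over observed changes that only starts suggesting a change once it has been seen often enough. The sketch below is an assumption-laden illustration: the `PatternStore` class, the string labels used for patterns, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter

class PatternStore:
    """Counts how often each change pattern appears across submissions."""

    def __init__(self, min_count=3):
        # min_count is an assumed threshold, not a value from the spec.
        self.min_count = min_count
        self.counts = Counter()

    def record(self, before, after):
        """Store one observed change from a `before` pattern to an `after` pattern."""
        self.counts[(before, after)] += 1

    def suggestions_for(self, before):
        """Return replacements for `before` seen at least min_count times,
        most frequently seen first."""
        return [
            after
            for (seen_before, after), n in self.counts.most_common()
            if seen_before == before and n >= self.min_count
        ]

store = PatternStore(min_count=2)
store.record("JJ JJ JJ NN", "JJ NN")
store.record("JJ JJ JJ NN", "JJ NN")
store.record("JJ JJ JJ NN", "NN")
common = store.suggestions_for("JJ JJ JJ NN")
```

Here the change seen twice is suggested and the change seen once is held back until more submissions confirm it.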

By determining which sentences are rephrased, added, or removed, EasyRead will be better able to detect
difficult or easy passages and provide better recommendations for changes.
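
Detecting which sentences were rephrased, added, or removed can be sketched with Python's standard `difflib` module. This is only an illustration under simplifying assumptions: the sentence splitter is deliberately naive, and the function names and the tuple layout of the result are hypothetical.

```python
import difflib
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def classify_changes(original, simplified):
    """Label each differing sentence as removed, added, or rephrased.

    Returns a list of (kind, old_text, new_text) tuples.
    """
    old = split_sentences(original)
    new = split_sentences(simplified)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":
            changes += [("removed", s, None) for s in old[i1:i2]]
        elif op == "insert":
            changes += [("added", None, s) for s in new[j1:j2]]
        elif op == "replace":
            # Treat a replaced run as one rephrasing; the run sizes may differ.
            changes.append(
                ("rephrased", " ".join(old[i1:i2]), " ".join(new[j1:j2]))
            )
    return changes

original = "The senator obfuscated the issue. The vote passed. Critics were displeased."
simplified = "The senator confused the issue. The vote passed."
changes = classify_changes(original, simplified)
```

For this pair, the first sentence is reported as rephrased and the last as removed, while the unchanged middle sentence produces no entry.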


Core Requirements



- EasyRead must have the capability to identify difficult words based on a given reading level.
    - Through a large mapping of tens of thousands of words to difficulty levels, specific words can be
      targeted for replacement.

- Additionally, the system must be able to identify phrases that are worded in a difficult manner.
    - Suggestions should be presented to the user that could rephrase a sentence in an
      easier-to-understand manner.
    - There should also be context associated with these suggestions, explaining what makes the current
      phrasing difficult to read.

- After a user submits their changes to a piece of text, the EasyRead system should analyze the changes
  that are made.
    - Through a comparison of the original text to the modified text, the system will determine which
      areas were modified, and the types of changes that were made.
    - For example, if a sentence included 3 adjectives of medium difficulty, and was altered to include
      only 2 easy-to-understand adjectives, that pattern should be stored. If it is a commonly seen
      pattern across multiple pieces of text, it should start to be suggested for new text that comes in.
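
The word-to-difficulty mapping in the first requirement could be used roughly as follows. This is a minimal sketch: the `WORD_LEVELS` sample, its grade-style level numbers, and the function name are invented for illustration, and the real mapping would hold tens of thousands of entries.

```python
import re

# Hypothetical sample of the word-to-level mapping; the levels are invented.
WORD_LEVELS = {
    "dog": 1,
    "run": 1,
    "issue": 4,
    "confuse": 5,
    "antiquated": 9,
    "obfuscate": 11,
}

def find_difficult_words(text, target_level, word_levels=WORD_LEVELS):
    """Return (word, level) pairs for known words above the target level,
    in the order they appear in the text."""
    flagged = []
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        level = word_levels.get(word)
        if level is not None and level > target_level:
            flagged.append((word, level))
    return flagged

flagged = find_difficult_words("Do not obfuscate the issue.", target_level=4)
```

Words missing from the mapping are skipped rather than guessed at, and inflected forms such as "obfuscated" would need stemming or lemmatization before lookup, which this sketch leaves out.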


List of Functions



- Perhaps the most important aspect of EasyRead will be the crowd-sourced repository of simplified content.
    - Not only will this make it easier for users to find pre-simplified content for use, but it will
      provide a vast range of data available for analysis.
    - Through supervised machine learning, given the simplified content (along with its original,
      unsimplified version), EasyRead will be able to analyze the specific changes that are made in
      sentence structure, word choice, and style, which will allow for better simplification suggestions
      for future users.

- In order to generate the repository of simplified content, EasyRead must provide an easy-to-use
  interface for generating the simplified content.
    - The web-based interface will allow users to upload text content to be analyzed.
    - Problem areas should be identified on the screen, along with an explanation of what makes the text
      difficult, and possible ways to simplify the content.

- Of a much lower priority is the integration of the service with third-party sources, such as CNN,
  Reuters, Google News, popular blogs, and other sources of relevant, but sometimes difficult-to-read,
  content.
    - These sources will be consumed through public third-party APIs.
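
For the web-based interface in the list above, each problem area shown on screen needs a location in the text, an explanation, and candidate simplifications. A minimal sketch of such a record follows; the `ProblemArea` name, its fields, and the helper function are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ProblemArea:
    """One flagged span of the uploaded text (hypothetical structure)."""
    start: int          # index of the first flagged character
    end: int            # index one past the last flagged character
    explanation: str    # why the span is considered difficult
    replacements: list = field(default_factory=list)  # suggested simpler wordings

def apply_replacement(text, area, choice=0):
    """Return the text with the chosen replacement substituted for the flagged span."""
    return text[:area.start] + area.replacements[choice] + text[area.end:]

text = "They obfuscate the issue."
area = ProblemArea(
    start=5,
    end=14,
    explanation="'obfuscate' is above the target reading level",
    replacements=["confuse", "hide"],
)
accepted = apply_replacement(text, area)
```

Keeping the explanation next to the replacements lets the interface show why a span was flagged at the moment the user decides whether to accept a suggestion.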

Data Structures

While it is still too early in the design process to determine exactly what data structures will be required,
general ideas and estimations can be made. The web-based system that interacts with users will require a
database (likely MySQL) and a web server with some temporary storage. All of the documents, and the data that
is parsed from each document, will live on this server. The advanced machine-learning computation will run as
a separate service. Documents, once they are submitted to the server, will be placed on a queue to be
processed and analyzed.
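
The submit-then-process flow can be sketched with an in-process queue and a single worker thread. This only illustrates the queueing idea: a deployed system would more likely use a persistent job queue between the web server and the analysis service, and the `analyze` placeholder (which just counts words) and the document dictionaries are hypothetical.

```python
import queue
import threading

def analyze(document):
    """Placeholder for the analysis service; here it only counts words."""
    return {"id": document["id"], "word_count": len(document["text"].split())}

def worker(pending, results):
    """Pull documents off the queue until a None sentinel arrives."""
    while True:
        document = pending.get()
        if document is None:
            pending.task_done()
            break
        results.append(analyze(document))
        pending.task_done()

pending = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(pending, results))
thread.start()

# Submitted documents are queued in arrival order.
pending.put({"id": 1, "text": "A short news article."})
pending.put({"id": 2, "text": "Another submitted blog post to process."})
pending.put(None)  # sentinel: no more documents

pending.join()
thread.join()
```

Because the queue is first-in, first-out and there is a single worker, documents are analyzed in the order they were submitted.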