Jersey LVI 2013


Oct 15, 2013


A Right of Access Implies a Right to Know:
An Open Online Readability Research
Platform

Michael Curtotti and Eric McCreath


Motivation


Is there a problem with the readability of legislation?

Legislation has a significant new audience: the general public

Historical audience: lawyers, judges, government officers

Conclusions that can be drawn from existing research:

Plain language drafting does improve the readability of legislation

Researchers often conclude that legislation is very hard for large proportions of the population, even in those cases where plain language drafting is used.


Outline




Full paper: available via LVI2013 & SSRN

Existing readability research is extensive & covers fields under discussion (refer to paper)

I. The Readability Research Platform & Approaches for Evaluating Readability

II. Initial findings on legislation and graded readers using RRP and Weka machine learning package

III. Workshop using ipython and the RRP to extract readability/linguistic data

IV. Readability Research Possibilities


Readability Research Platform


Platform Features


Traditional Readability Metrics


Cloze Tests


Subjective User Evaluations


Natural Language Processing and Machine Learning


Command line tools for remote data extraction


RRP Performance


Approaches to Assessing Readability


Traditional readability metrics


Human evaluation


Comprehension testing


Cloze Testing


Crowd Sourcing


Natural Language Processing and Machine Learning



Readability Metrics


Indirectly measure vocabulary and syntactic complexity

Over 200 measures developed

Primarily designed for grading reading materials for learner readers, typically passages of 100 words or more

Not designed for measuring the difficulty of single sentences

Not designed for measuring the readability of legislation




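Coleman-Liau Index = 0.0588 * L - 0.296 * S - 15.8 (L = average letters per 100 words, S = average sentences per 100 words)

SMOG index = 1.0430 * sqrt(polysyllables * (30 / sentences)) + 3.1291 (polysyllables = number of words of three or more syllables)

Dale-Chall uses a list of 3000 'easy' words and their cognates, together with average sentence length

ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43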
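As a rough illustration of how two of these metrics can be computed, the sketch below uses the standard published coefficients (0.0588, 0.296 and 15.8 for Coleman-Liau; 4.71, 0.5 and 21.43 for ARI). It is not the RRP's own implementation: the function names and the regex-based word and sentence splitting are assumptions made for the example, and real tools tokenise more carefully.

```python
import re

def text_counts(text):
    """Return (characters, words, sentences) using simple regex heuristics."""
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        raise ValueError("text contains no words")
    # Assumption: each run of ., ! or ? ends one sentence (at least one sentence).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    # Count only letters and digits, so apostrophes are excluded.
    characters = sum(ch.isalnum() for word in words for ch in word)
    return characters, len(words), sentences

def coleman_liau(text):
    """Coleman-Liau Index from letters and sentences per 100 words."""
    characters, words, sentences = text_counts(text)
    letters_per_100 = characters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

def ari(text):
    """Automated Readability Index from characters/word and words/sentence."""
    characters, words, sentences = text_counts(text)
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

sample = "The cat sat on the mat. The dog ran."
print(round(coleman_liau(sample), 2), round(ari(sample), 2))
```

Very short inputs such as the sample give scores below zero, which is consistent with the point that these metrics were designed for passages of 100 words or more rather than single sentences.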


Cloze Test


Cloze Test results


0-35%: indicates reader frustration

35-49%: instructional; the reader needs assistance to understand the material

50%+: independent reader
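The scoring bands above can be sketched in a few lines. This is a minimal example under stated assumptions, not the RRP's code: it deletes every fifth word (the slides do not say which deletion scheme the platform uses), scores only exact matches, and treats a score of exactly 35% as instructional because the bands as given overlap at that value. All function names are hypothetical.

```python
def make_cloze(text, n=5, blank="_____"):
    """Replace every n-th word with a blank; return (cloze_text, answers)."""
    words = text.split()  # punctuation stays attached to its word
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

def score_cloze(responses, answers):
    """Percentage of blanks filled with exactly the deleted word."""
    if not answers:
        raise ValueError("no blanks to score")
    correct = sum(
        given.strip().lower() == expected.strip().lower()
        for given, expected in zip(responses, answers)
    )
    return 100.0 * correct / len(answers)

def cloze_level(percent):
    """Map a cloze score to the three reader levels."""
    if percent < 35:
        return "frustration"
    if percent < 50:
        return "instructional"
    return "independent"

cloze_text, answers = make_cloze("one two three four five six seven eight nine ten")
score = score_cloze(["five", "eleven"], answers)
print(cloze_text, score, cloze_level(score))
```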


Crowd sourcing & Subjective Evaluation


Natural Language Processing


Scope of NLP

Characters

Syllables / Morphemes

Lemmas/Words / Parts of Speech

Phrases / Chunks / ngrams

Clauses

Trees

Sentences

…..

Vocabulary

Syntax

Named Entities

Relations

Discourse Features

Current scope of RRP


Machine learning for readability

Labelled or Unlabelled Data


Research Questions & Initial Findings


1. Do traditional readability metrics or surface features of a sentence assist us in assessing the readability of the sentence?

2. Do parts of speech or chunk data from a sentence assist in assessing its readability?

3. Do features such as the above provide us with a measure of whether legislative 'sentences' are 'normal' English?


Question 1: very little

Question 2: It helps, but accuracy is low

Visualization produced using Weka Software


Question 3: Yes, legislative English is very different (within sample)

Visualizations produced using Weka Software


Question 3: Yes, legislative English is very different

Parts of speech data

PCA on Brown Corpus & Legislative Corpus

Visualizations produced using Weka Software


Machine learning can use POS to distinguish legislative sentences from a wide range of other English sentences.
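To make the idea concrete, the sketch below turns a POS-tagged sentence into a vector of tag proportions and assigns it to the class with the nearest centroid. This is a deliberately simple stand-in, not the Weka classifiers used for the findings above: the coarse tagset, the helper names and the two hand-tagged training sentences are all hypothetical and serve only to show the shape of the approach.

```python
from collections import Counter

TAGSET = ["NOUN", "VERB", "DET", "ADP", "PRON", "ADJ"]  # illustrative coarse tagset

def pos_proportions(tagged_sentence, tagset=TAGSET):
    """Share of each POS tag among the tokens of one tagged sentence."""
    counts = Counter(tag for _word, tag in tagged_sentence)
    total = len(tagged_sentence)
    return [counts[tag] / total for tag in tagset]

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest_label(vector, centroids):
    """Label whose centroid has the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(vector, centroids[label]))

def tag(words, tags):
    """Pair space-separated words with space-separated tags."""
    return list(zip(words.split(), tags.split()))

# Hand-tagged toy training data (hypothetical, one sentence per class).
training = {
    "legislative": [tag(
        "The Minister may by notice revoke the approval of the scheme",
        "DET NOUN VERB ADP NOUN VERB DET NOUN ADP DET NOUN")],
    "general": [tag("She said he was happy", "PRON VERB PRON VERB ADJ")],
}
centroids = {
    label: centroid([pos_proportions(sentence) for sentence in sentences])
    for label, sentences in training.items()
}

query = tag("The owner of the land must notify the authority",
            "DET NOUN ADP DET NOUN VERB VERB DET NOUN")
print(nearest_label(pos_proportions(query), centroids))
```

In practice the tags would come from an automatic tagger and the training data from full corpora; the toy sentences here only demonstrate the feature representation.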


Using the RRP for research:

Sending GET requests using the browser address bar


Using the RRP for Research


Sending POST or GET requests using ipython
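A request of this kind could be prepared from an ipython session roughly as follows. The slides do not give the RRP's address or parameter names, so the base URL and the `text` parameter below are placeholders, not the platform's actual interface. The sketch only builds the request objects using the Python standard library and does not contact any server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; substitute the real RRP endpoint.
BASE_URL = "http://example.org/rrp/metrics"

def build_get(text, base_url=BASE_URL):
    """GET request with the text in the query string (as in a browser address bar)."""
    return Request(base_url + "?" + urlencode({"text": text}), method="GET")

def build_post(text, base_url=BASE_URL):
    """POST request with the text form-encoded in the body (better for long texts)."""
    body = urlencode({"text": text}).encode("utf-8")
    return Request(base_url, data=body, method="POST")

get_request = build_get("A person must not")
post_request = build_post("A person must not")
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Once the real endpoint is known, either object can be sent with `urllib.request.urlopen(request)` and the response body read from the returned object.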


Crowd sourced data collection


Research possibilities