An Evaluation Tool for Natural Language Processing Systems



Audrey N. Mbeje

Department of Computer Science
Ball State University

November 09, 2000



Problem Description
Significance of the Study
Definition of Terms
  Computational Linguistics
  Presupposition
Literature Review
Methodology
Anticipated Results
Time Schedule
Future Research & Conclusion

Problem Description

Problem Background:

Human interactive discourse poses many challenges for natural language processing (NLP) systems. One of the main challenges is representing the speaker’s intended meaning in its context. Thus, the focus of current research on NLP has been to develop technology that will enable the computer to understand news events in the context in which they occur in the real world. The evolving technology, however, is linguistically inclined and is less concerned with the quality of the software. Additionally, it does not reflect uniform principles of software evaluation.


The goal of the proposed study is to improve the quality of natural language processing technology by assessing newly developed NLP systems for linguistic and technical quality before they are implemented. We propose a natural language processing system evaluation tool that will provide both linguistic and software quality assurance. The proposed study is based on the assumption that progress in developing NLP technology depends on using evaluation methods that better model speakers’ natural discourse and software quality.

Significance of the Study

The study will benefit the theory of natural language processing, particularly the research area concerned with context in NLP systems.

The study proposes integrating linguistic principles and software design principles in NLP system evaluation, which would contribute to current progress in NLP technology.

The proposed tool will improve the usability of NLP systems by providing quality assurance of the software’s reliability and validity, both technically and linguistically.

Definition of Terms

Computational Linguistics:

A discipline between linguistics and computer science that is concerned with the computational aspects of the human language faculty.

It belongs to the cognitive sciences, specifically artificial intelligence (AI).

It has two components: applied and theoretical.

Definition of Terms (cont’d)

In the applied component, the interest is in the practical outcome of modeling human language use. The goal is to create software products that have some knowledge of human language.

The theoretical component deals with formal theories about the linguistic knowledge that a human needs for generating and understanding language.

(The proposed evaluation tool is intended for the applied component of computational linguistics.)

Definition of Terms (cont’d)


Presupposition:

Rough definition of the term:

We say that an utterance x presupposes a fact y if uttering x only makes sense when the context (e.g., world knowledge or an earlier utterance in the same conversation) provides enough information to conclude that y is the case. Consider example (2a):

(2a) Mary’s husband is out of town.

The noun phrase “Mary’s husband” presupposes that Mary is married.

Computational linguists are concerned with making NLP

systems understand such contextual information.
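As a rough illustration of the simplest level of this problem, the Python sketch below flags presuppositions triggered by a possessive noun phrase such as “Mary’s husband”. The lexicon `RELATIONAL_NOUNS`, the regular expression, and the function `presuppositions` are hypothetical and cover only this one pattern; they are an assumption for illustration, not a method from the proposal. Real presupposition handling would need parsing and world knowledge.

```python
import re

# Hypothetical lexicon: relational nouns whose possessive use triggers a
# presupposition about the possessor (illustrative entries only).
RELATIONAL_NOUNS = {
    "husband": "{owner} is married",
    "wife": "{owner} is married",
    "sister": "{owner} has a sister",
}

# Matches a possessive noun phrase such as "Mary's husband" (straight or curly apostrophe).
POSSESSIVE_NP = re.compile(r"\b([A-Z][a-z]+)['’]s\s+([a-z]+)\b")


def presuppositions(utterance: str) -> list[str]:
    """Return the presuppositions triggered by possessive NPs in the utterance."""
    found = []
    for owner, noun in POSSESSIVE_NP.findall(utterance):
        template = RELATIONAL_NOUNS.get(noun)
        if template is not None:
            found.append(template.format(owner=owner))
    return found


print(presuppositions("Mary’s husband is out of town."))  # ['Mary is married']
```

A lookup table keeps the sketch transparent: each trigger maps directly to the fact it presupposes, so nouns outside the table (for example “car”) trigger nothing.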

Literature Review

Much research on the problem of in-depth story understanding by computer was performed starting in the 1970s.

In the 1990s, the interest shifted towards information extraction and word sense disambiguation.

The end of the 1990s marked another shift in focus, back to in-depth story understanding by the computer.

McCarthy (1990) discusses the problem of getting the

computer to understand the following text from the New

York Times:

A 61-year-old furniture salesman was pushed down the shaft of a freight elevator yesterday in his downtown Brooklyn store by two robbers while a third attempted to crush him with the elevator car because they were dissatisfied with the $1,200 they had forced him to give them. The buffer springs at the bottom of the shaft prevented the car from crushing the salesman John J. Hug, after he was pushed from the first floor to the basement. The car stopped about 12 inches above him as he flattened himself at the bottom of the pit.

(Mueller, 1999)

McCarthy’s concern went beyond mere word sense disambiguation and information extraction. He suggested that the computer should be able to answer such contextual questions as:

Who was in the store when the events began?

Who had the money at the end?

What would have happened if Mr. Hug had not flattened himself at the bottom of the pit?

Literature Review (cont’d)

Current research on contextual understanding is concerned with problems such as the one stated above.

Several NLP systems have been suggested whose orientation is mainly linguistic.

This study proposes an evaluation tool for such NLP systems that integrates linguistic principles with technical ones, namely speed.


Methodology

Create an algorithm simulating aspects of the human language faculty, namely speed and the ability to decode contextual discourse.

Evaluate NLP systems for context decoding and speed using existing evaluation technology.

Methodology (cont’d)

Run the same test using the proposed tool.

Compare the results.
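A minimal sketch of what one run of such a test might measure, assuming an NLP system can be wrapped as a function from (context, question) to an answer. The names `System`, `TestItem`, `Report`, `evaluate`, and `compare`, the exact-match scoring, and the test items are all hypothetical; the proposal does not specify an interface, metric, or data set. Context decoding is approximated by question-answering accuracy and speed by mean wall-clock latency.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a system under test maps (context, question) to an answer.
System = Callable[[str, str], str]


@dataclass
class TestItem:
    context: str   # discourse the system must interpret
    question: str  # contextual question about that discourse
    expected: str  # gold answer supplied by the evaluator


@dataclass
class Report:
    accuracy: float        # share of questions answered correctly (context decoding)
    mean_latency_s: float  # average seconds per question (speed)


def evaluate(system: System, items: list[TestItem]) -> Report:
    """Score one system on context decoding and speed over a fixed test set."""
    if not items:
        raise ValueError("test set must not be empty")
    correct = 0
    elapsed = 0.0
    for item in items:
        start = time.perf_counter()
        answer = system(item.context, item.question)
        elapsed += time.perf_counter() - start
        # Case-insensitive exact match; a real tool would need a richer comparison.
        if answer.strip().lower() == item.expected.strip().lower():
            correct += 1
    return Report(accuracy=correct / len(items), mean_latency_s=elapsed / len(items))


def compare(existing: Report, proposed: Report) -> dict[str, float]:
    """Proposed-minus-existing score differences, assuming both tools report the same metrics."""
    return {
        "accuracy_diff": proposed.accuracy - existing.accuracy,
        "latency_diff_s": proposed.mean_latency_s - existing.mean_latency_s,
    }


# Illustrative items built from the presupposition example; gold answers are assumptions.
items = [
    TestItem("Mary’s husband is out of town.", "Is Mary married?", "yes"),
    TestItem("Mary is out of town.", "Is Mary married?", "unknown"),
]


def always_yes(context: str, question: str) -> str:
    """A deliberately naive system that ignores the context."""
    return "yes"


report = evaluate(always_yes, items)
print(report.accuracy)  # 0.5
```

The naive system scores 0.5 because the second item requires noticing that nothing in the context establishes that Mary is married.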

Note: Before its implementation, the proposed evaluation tool will itself be evaluated for validity and reliability using an outside researcher’s evaluation tool.
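The proposal does not name a statistic for this check. One common way to quantify how closely two tools agree on the same items is Cohen’s kappa, which corrects raw agreement for agreement expected by chance. The sketch below applies it to pass/fail verdicts; the choice of kappa, the function name `cohens_kappa`, and the verdict lists are assumptions for illustration only.

```python
from collections import Counter


def cohens_kappa(a: list[bool], b: list[bool]) -> float:
    """Chance-corrected agreement between two tools' pass/fail verdicts on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty verdict lists of equal length")
    n = len(a)
    # Observed agreement: share of items on which both tools give the same verdict.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each tool assigned verdicts independently at its own rates.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum((count_a[v] / n) * (count_b[v] / n) for v in (True, False))
    if expected == 1.0:
        # Both tools gave one identical verdict to every item; kappa is undefined,
        # so treat it as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


proposed = [True, True, False, True, False]  # hypothetical verdicts from the proposed tool
outside = [True, False, False, True, False]  # hypothetical verdicts from the outside tool
print(round(cohens_kappa(proposed, outside), 3))  # 0.615
```

Here the tools agree on 4 of 5 items (0.8), but chance alone would give 0.48 agreement at these verdict rates, so kappa is (0.8 − 0.48) / (1 − 0.48) ≈ 0.615.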

Anticipated Results

The proposed tool should effectively evaluate NLP

systems for context and speed.

Time Schedule


November: Proposal Writing & Presentation

December: Proposal Review

Literature Review

Data Gathering

Evaluation Tool Designing

Evaluation Tool Testing

Thesis Writing & Defense

Natural Language Processing Evaluation Tool

Research Presentation at a Conference

Research Publication

Conclusion and Future Research

Computing the context of a natural language discourse is an essential task for a natural language processing system.
The proposed evaluation tool for NLP systems will have the potential to be modified to incorporate new design principles for improved usability.

The End