Working with Natural Language Text: Tools and Techniques

24 Oct 2013

1


Nestor Rychtyckyj


Advanced & Manufacturing Engineering Systems

Ford Motor Company

2

Agenda


Introduction


Description of problem


Why is language so important?


Dealing with Natural Language Text


Application Examples


Machine Translation


Future Directions


Conclusions

3

Natural Language Text is “everywhere”


Internet


Web sites


Blogs


Customer Feedback


Dealer Feedback


Lessons Learned


Corporate Knowledge


Warranty Claims


Internal documentation


Spoken Dialog systems

4

Dealing With Text Information


Search Engines (Google, askjeeves.com)


Excel


Commercial Text Mining Tools (WordStat, SAS Text Miner, SMART Text Miner, etc.)

Open Source tools (WordNet, SenseClusters, etc.)


Controlled Languages


Ontologies


Natural Language Processing


Semantic Web



5

Present Status


Mostly keyword-based

Very little intelligence, no background knowledge or context

Limited natural language dialog interpretation

Most of the processing is left to the human user

Difficult to build computer systems that can retrieve information in an “intelligent” manner


6

Future State


Semantic Web: information on the web is organized using structured tagging based on XML, RDF, OWL, SWRL

Machine-processable data on the web

Standard interface to data

Rich knowledge representations through ontologies

Allows for the development of systems that can retrieve information in an intelligent manner

7

Semantic Web Architecture

Source: Tim Berners-Lee, 2000

8

Artificial Intelligence (AI)


Study of how to build human-level intelligence into computer applications

Uses learning, representation of human knowledge, understanding of language, vision, speech, etc.

Applies the built-in knowledge using inference and reasoning

Has been very successful in limited problem domains, less so for general applications

Integrated into many application areas including manufacturing, planning, search, speech recognition, financial analysis, games, customer analysis, commercial fishing, etc.


9

Current use of AI in Manufacturing at Ford

AI applications for manufacturing

Bring appropriate knowledge about manufacturing to the proper people at the right time

Improve manufacturing efficiency

Reduce workplace injuries through better up-front ergonomics analysis

Make assembly build instructions available to operators in other languages

Develop common framework for representing knowledge and exchanging it between different systems




10

Knowledge Sources in Manufacturing


Process Build Information


Required Tooling


Part Information


Ergonomics Analysis


Plant Layout Information


Assembly Visualization


Safety Concerns


Manufacturing “Best Practices”



11

Global Study Process Allocation System (GSPAS)

The allocation system used to assign manufacturing processes to plant operation resources.

Process sheets use STANDARD LANGUAGE (159 verbs), such as insert, select, grasp, load …

12

Global Study Process Allocation System (GSPAS)

Global system to handle Manufacturing Costing, Process and Labor Management for vehicle assembly.

Standard Language and AI are integral parts of GSPAS.

Launched in North America and Europe in 1998 to support the Focus program.

Currently deployed for almost all car and truck manufacturing at Vehicle Operations assembly plants worldwide.

13

Step by Step Instructions



Process sheets specify the operations, tasks, parts and tools required to support the production of a vehicle.


14

Standard Language


Controlled language in which the grammar and syntax are restricted.

Developed at Ford Body & Assembly to describe the vehicle assembly process.

Contains information about tools, parts and work required to build a vehicle.

Contains over 5000 words and 1000 abbreviations that can be used by the process engineers.

Standard Language is checked by an Artificial Intelligence (AI) system.



15

Examples of Standard Language

1. ALIGN-AND-SEAT DOOR TRIM PROTECTOR

2. FIRMLY PRESS SEALER INTO JOINT TO AFFECT A POSITIVE SEAL

3. APPLY DAUB OF SEALER TO THE JOINT OF THE CENTER FLOOR PAN AND FRONT FLOOR PAN AT ROCKER PANEL

4. PUSH SEAT REARWARD TO EXPOSE FRONT ATTACHMENTS

16

Standard Language Rules


Imperative form

Sentence must start with a verb clause followed by a noun phrase.

Only one Standard Language (main action) verb per sentence.

Some prepositions have special meaning (“using”, “with”).

Size modifiers may follow nouns (“bumper large”).

Free form allowed for certain verbs (“verify that..”)
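A few of the rules above can be sketched as a simple checker. This is only an illustration, not the AI system used in GSPAS: the verb set is a small hypothetical subset of the 159 Standard Language verbs, `check_sentence` is an invented name, and matching on bare tokens ignores the fact that a word such as LOAD can also be a noun.

```python
# Hypothetical subset of the Standard Language verb list (assumption).
STANDARD_VERBS = {"INSERT", "SELECT", "GRASP", "LOAD", "SECURE", "APPLY", "PUSH"}
# Verbs after which free-form text is allowed, e.g. "verify that.." (assumption).
FREE_FORM_VERBS = {"VERIFY"}


def check_sentence(sentence):
    """Return a list of rule violations; an empty list means the sentence passes."""
    words = sentence.upper().split()
    if not words:
        return ["empty sentence"]
    if words[0] in FREE_FORM_VERBS:
        # Free form is allowed after these verbs, so no further checks apply.
        return []

    problems = []
    # Rule: imperative form, the sentence starts with the verb.
    if words[0] not in STANDARD_VERBS:
        problems.append(f"must start with a Standard Language verb, found {words[0]!r}")
    # Rule: only one main-action verb per sentence.
    verbs = [w for w in words if w in STANDARD_VERBS]
    if len(verbs) > 1:
        problems.append(f"more than one main-action verb: {verbs}")
    return problems


if __name__ == "__main__":
    for text in ("SECURE BRACKET USING POWER TOOL",
                 "ROBOT APPLY 50 MM TAPE-STRIPE",
                 "VERIFY THAT CLIP IS SEATED"):
        print(text, "->", check_sentence(text) or "OK")
```

A production checker would work on a parse of the sentence rather than on raw tokens, so that part of speech and the special prepositions could be taken into account.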


17

Standard Language Process Sheet

Process Sheet Written in Standard Language from CAP (Focus) deck

TITLE: ASSEMBLE IMMERSION HEATER TO ENGINE
10 OBTAIN ENGINE BLOCK HEATER ASSEMBLY FROM STOCK
20 LOOSEN HEATER ASSEMBLY TURNSCREW USING POWER TOOL
30 APPLY GREASE TO RUBBER O-RING AND CORE OPENING
40 INSERT HEATER ASSEMBLY INTO RIGHT REAR CORE PLUG HOSE
50 ALIGN SCREW HEAD TO TOP OF HEATER
TOOL 20 1 P AAPTCA TSEQ RT ANGLE NUTRUNNER
TOOL 30 1 C COMM TSEQ GREASE BRUSH

Resulting Work Instructions Generated by DLMS For Line 20

LOOSEN HEATER ASSEMBLY TURNSCREW USING POWER TOOL
005 GRASP POWER TOOL (RT ANGLE NUTRUNNER) <01M4G1>
010 POSITION POWER TOOL (RT ANGLE NUTRUNNER) <01M4P2>
015 ACTIVATE POWER TOOL (RT ANGLE NUTRUNNER) <01M1P0>
020 REMOVE POWER TOOL (RT ANGLE NUTRUNNER) <01M4P0>
025 RELEASE POWER TOOL (RT ANGLE NUTRUNNER) <01M4P0>
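To show how a process sheet like this one could be turned into structured data, here is a minimal parsing sketch. The field layout is an assumption read off the example: the number after `TOOL` is taken to be the step the tool belongs to, and the text after `TSEQ` is taken to be the tool name. `parse_sheet` and the record structure are hypothetical and are not part of GSPAS or DLMS.

```python
import re

SHEET = """\
TITLE: ASSEMBLE IMMERSION HEATER TO ENGINE
10 OBTAIN ENGINE BLOCK HEATER ASSEMBLY FROM STOCK
20 LOOSEN HEATER ASSEMBLY TURNSCREW USING POWER TOOL
30 APPLY GREASE TO RUBBER O-RING AND CORE OPENING
TOOL 20 1 P AAPTCA TSEQ RT ANGLE NUTRUNNER
TOOL 30 1 C COMM TSEQ GREASE BRUSH
"""

# "<step number> <verb> <rest of sentence>"
STEP_RE = re.compile(r"^(\d+)\s+(\S+)\s+(.*)$")
# Assumed layout: "TOOL <step number> ... TSEQ <tool name>"
TOOL_RE = re.compile(r"^TOOL\s+(\d+)\s+.*?\bTSEQ\s+(.*)$")


def parse_sheet(text):
    """Parse a process sheet into a title and a dict of steps keyed by number."""
    sheet = {"title": None, "steps": {}}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("TITLE:"):
            sheet["title"] = line[len("TITLE:"):].strip()
        elif (m := TOOL_RE.match(line)):
            # Attach the tool to its step if that step has been seen already.
            step = sheet["steps"].get(int(m.group(1)))
            if step is not None:
                step["tools"].append(m.group(2))
        elif (m := STEP_RE.match(line)):
            number, verb, rest = m.groups()
            sheet["steps"][int(number)] = {"verb": verb, "text": rest, "tools": []}
    return sheet


if __name__ == "__main__":
    parsed = parse_sheet(SHEET)
    print(parsed["title"])
    for number, step in parsed["steps"].items():
        print(number, step["verb"], step["tools"])
```

With the tool attached to its step, a generator of detailed work instructions has what it needs to substitute the specific tool name for the generic POWER TOOL in step 20.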

18

Natural Language Parsing

[Figure: parse tree for “Secure bracket using multiple motor nutrunner”. Nodes: Verb Phrase (Verb “Secure”; Noun Phrase with Noun “Bracket”) and Prepositional Phrase (Preposition “Using”; Noun Phrase with Noun)]

19

Process for Natural Language Processing

Parse the text (sentence by sentence) into a parse tree structure

Bypass/ignore common words (articles, common terms)

Stemming (get the root of the word)

Word lookup (synonyms, misspellings, acronyms)

Word understanding (deeper-level ontologies)

Controlled languages with automated checking
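The first few steps listed above (sentence splitting, skipping common words, stemming and word lookup) can be sketched in plain Python. Everything here is a simplifying assumption: the stop-word list, the lookup table and the suffix-stripping stemmer are hypothetical stand-ins, and the parse-tree and ontology steps are left out.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and"}       # assumed list
LOOKUP = {"asy": "assembly", "assy": "assembly"}         # assumed abbreviations
SUFFIXES = ("ing", "ed", "es", "s")                       # naive stemming rules


def split_sentences(text):
    """Split text into sentences on ., ! or ? and drop empty pieces."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Return one list of normalized tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]  # bypass common words
        tokens = [stem(t) for t in tokens]                   # stemming
        tokens = [LOOKUP.get(t, t) for t in tokens]          # word lookup
        result.append(tokens)
    return result


if __name__ == "__main__":
    print(preprocess("Secure the brackets. Install the ASY."))
```

A real system would replace the suffix stripper with an established stemming algorithm and the lookup table with the domain glossary.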


20

Parsing Information in Standard Language

Example of Standard Language parsing:

“Feed 2 150 mm wire assemblies through hole in liftgate panel”

(S (VP (VERB FEED))
   (NP (SIMPLE-NP (QUANTIFIER 2)
                  (DIM (QUANTIFIER 150) (DIM-UNIT-1 MM))
                  (ADJECTIVE WIRE)
                  (NOUN ASSEMBLY)))
   (S-PP (S-PREP THROUGH)
         (NP (SIMPLE-NP (NOUN HOLE)
                        (N-PP (N-PREP in)
                              (NP (SIMPLE-NP (ADJECTIVE LIFTGATE)
                                             (ADJECTIVE OUTER)
                                             (NOUN PANEL))))))))
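A bracketed parse like the one above is straightforward to load into a nested data structure. The sketch below reads the parenthesized text into nested Python lists and pulls out the words under a given label. The function names `read_tree` and `find` are hypothetical, and this is not the parser that produces the tree, only a reader for its output.

```python
import re

PARSE = """
(S (VP (VERB FEED))
   (NP (SIMPLE-NP (QUANTIFIER 2)
                  (DIM (QUANTIFIER 150) (DIM-UNIT-1 MM))
                  (ADJECTIVE WIRE)
                  (NOUN ASSEMBLY)))
   (S-PP (S-PREP THROUGH)
         (NP (SIMPLE-NP (NOUN HOLE)
                        (N-PP (N-PREP in)
                              (NP (SIMPLE-NP (ADJECTIVE LIFTGATE)
                                             (ADJECTIVE OUTER)
                                             (NOUN PANEL))))))))
"""


def read_tree(text):
    """Read a parenthesized parse into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token            # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                    # consume the closing parenthesis
        return node

    return read()


def find(tree, label):
    """Yield the word under every leaf node that carries the given label."""
    if not isinstance(tree, list):
        return
    if tree[0] == label and len(tree) == 2 and isinstance(tree[1], str):
        yield tree[1]
    for child in tree[1:]:
        yield from find(child, label)


if __name__ == "__main__":
    tree = read_tree(PARSE)
    print("verbs:", list(find(tree, "VERB")))
    print("nouns:", list(find(tree, "NOUN")))
```

Once the tree is in this form, later steps such as identifying the action verb or the object being manipulated become simple tree walks.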



21

Ontology


Used to represent knowledge


Individuals


Classes (with hierarchy); think sets


Properties (w/ hierarchy); not part of class


Equivalence


Property characteristics/restrictions


Complex classes


22

GSPAS Ontology

[Figure: GSPAS ontology diagram. Top-level node Thing with Tools, Parts, Lexical Nodes and Operations beneath it; intervening concept nodes; example node HAMMER with attributes Size, Part of Speech, Subsystem-id, etc.]
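A class hierarchy of this kind can be illustrated with a very small data structure. The sketch below is an assumption for illustration only: the top-level names come from the diagram, but the placement of HAMMER under Tools, the intervening node "Hand Tools" and the attribute values are hypothetical, and the real GSPAS knowledge base is far richer.

```python
# Child -> parent links. Top-level names are from the diagram;
# "Hand Tools" and the position of HAMMER are hypothetical.
PARENT = {
    "Tools": "Thing",
    "Parts": "Thing",
    "Lexical Nodes": "Thing",
    "Operations": "Thing",
    "Hand Tools": "Tools",      # hypothetical intervening concept node
    "HAMMER": "Hand Tools",
}

# Attribute names follow the diagram; the values are made up.
ATTRIBUTES = {
    "HAMMER": {"size": "small", "part_of_speech": "noun", "subsystem_id": None},
}


def ancestors(concept):
    """Return the chain of parents from the concept up to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if the concept is the category or inherits from it."""
    return concept == category or category in ancestors(concept)


if __name__ == "__main__":
    print(ancestors("HAMMER"))
    print(is_a("HAMMER", "Tools"), is_a("HAMMER", "Parts"))
```

The useful property is inheritance: a rule written against Tools applies to every concept beneath it without being restated.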

23

GSPAS Knowledge Base

24

Ergonomics Analysis


Check the assembly work instructions to determine what type of physical action is being described

Check the assembly work instruction to determine what object is manipulated

Check the associated parts and tools for part weight and tool properties

Flag potential ergonomics concerns at the process level and at the work allocation level

Knowledge can be represented as a business rule
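As an illustration of knowledge represented as a business rule, the sketch below flags an instruction when a lifting-type action is combined with a heavy part. The verb set, the weight threshold and all names are hypothetical and are not Ford's actual ergonomics criteria.

```python
from dataclasses import dataclass

# Hypothetical rule parameters (assumptions, not real ergonomics limits).
LIFTING_VERBS = {"LIFT", "LOAD", "OBTAIN", "POSITION"}
MAX_PART_WEIGHT_KG = 10.0


@dataclass
class Instruction:
    verb: str               # physical action described by the instruction
    obj: str                # object being manipulated
    part_weight_kg: float   # weight taken from the associated part data


def ergonomic_flags(instruction):
    """Return a list of potential ergonomics concerns for one instruction."""
    flags = []
    if (instruction.verb in LIFTING_VERBS
            and instruction.part_weight_kg > MAX_PART_WEIGHT_KG):
        flags.append(
            f"{instruction.verb} {instruction.obj}: part weight "
            f"{instruction.part_weight_kg} kg exceeds {MAX_PART_WEIGHT_KG} kg"
        )
    return flags


if __name__ == "__main__":
    print(ergonomic_flags(Instruction("LOAD", "BUMPER", 14.0)))
    print(ergonomic_flags(Instruction("ALIGN", "BUMPER", 14.0)))
```

The rule reads its inputs from the three checks listed above: the action, the object, and the part and tool properties.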

25

Machine Translation


“The Spirit is willing but the flesh is weak”

"The vodka is tempting, but the meat's a bit suspect".

“The alcohol is arranged, but the meat is weak.”

“This kind of spirit is wants, but the flesh and blood is weak.”

“The spirit is willing, but the flesh is impossible”

“The spirit puts out the flag and does, the flesh omits but.”

26

Machine Translation



Use of computers to translate from one language to another

Examples: Babelfish

Translation accuracy is highly dependent on the quality of the source text

Use proper grammar, punctuation, shorter sentences, active voice to improve quality

Customize translation systems for each application domain

27

Problem Description


Need to translate assembly build instructions from English to the language used at the assembly plants

A single vehicle may require several thousand process sheets to describe the assembly process

A large number of assembly instructions are frequently modified

Large volume of translations precludes the use of human translators

Specialized terminology requires technical glossaries

MT performance can be improved greatly by improving the source text


28

Application Description


Machine Translation is integrated into the manufacturing process planning system known as GSPAS (Global Study Process Allocation System)

The translation process is fully automated and does not require human intervention

Translation occurs automatically after a process sheet is validated by the AI system and before it is released to the assembly plants.

We currently translate build instructions for 26 different vehicle lines in 5 languages (we also have a separate glossary for Mexican Spanish)

Data is read in from an Oracle database, processed through the translation system, and the output is then written back to the Oracle database


29

Machine Translation


Source: Process build instructions in English

Target: Process build instructions in Spanish, German, Portuguese, Dutch & Turkish

Translate both controlled language and embedded free-form text

Example: SECURE BUMPER BRACKET {FOR LHS ONLY} TO VEHICLE BODY USING POWER TOOL

Utilize customized SYSTRAN translation engine, automotive and Ford-specific terminology glossaries and embedded tagging

Future plans include additional parsing and tagging information to improve translation accuracy
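The example on this slide mixes controlled language with a free-form comment in braces. One way to handle the two kinds of text differently is to separate them first, as in this sketch. It is only an assumption about how such a split could be done, not the embedded tagging used with the translation engine, and `split_comments` is a hypothetical name.

```python
import re

# Matches one embedded comment, e.g. "{FOR LHS ONLY}".
COMMENT_RE = re.compile(r"\{([^{}]*)\}")


def split_comments(text):
    """Return (controlled_text, comments) for one build instruction."""
    comments = [c.strip() for c in COMMENT_RE.findall(text)]
    # Remove the comments and collapse the whitespace they leave behind.
    controlled = " ".join(COMMENT_RE.sub(" ", text).split())
    return controlled, comments


if __name__ == "__main__":
    line = "SECURE BUMPER BRACKET {FOR LHS ONLY} TO VEHICLE BODY USING POWER TOOL"
    controlled, comments = split_comments(line)
    print(controlled)
    print(comments)
```

The controlled part can then go through glossary-driven translation, while each comment is treated as ordinary free-form text.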


30

Machine Translation Implementation in GSPAS

Worked with SYSTRAN & AppTek to customize their translation software for our requirements.

Developed technical dictionaries that contain Ford terminology with the correct translation for each language pair.

Developed and integrated the translation process into GSPAS.

Developed a system to check and improve the source text prior to translation.

31

Translation Statistics


Language pairs being translated: English/German, English/Spanish, English/Dutch, English/Portuguese, English/Spanish (Mexican), English/Turkish

Ford-specific terminology in Standard Language: over 5000 words, 13,000 noun phrases, over 1000 abbreviations and acronyms.

Typically translate over 200,000 records each month

Over 10,000,000 records already translated.

32

GSPAS Translation Process

33

Standard Language Translation Issues

Sentence structure is not grammatical English (ROBOT APPLY 50 MM TAPE-STRIPE)

Ford terminology is complex and must be explicitly translated as an entire phrase (INSULATION ASSEMBLY BODY PILLAR)

Use of abbreviations, misspellings, acronyms (ABS, A.B.S)

Use of compound verbs (PICK-AND-SPOON)

Inverted phrase structure with modifiers (BODY PANEL LRG)

Embedded comments (LOAD BUMPER {LOWER} TO VEHICLE)




34

Standard Language Translation


Use of slang (“shotgun”)


Articles are seldom used (HAMMER HAMMER).


Need to handle “British” English as well as “American” English (terminology, use, spellings).

Source text is incorrectly written and not understandable.

Punctuation is rarely used.

Standard Language is always evolving and needs to be maintained.

35

Uses of AI Technology


Apply natural language processing (NLP) along with knowledge representation and reasoning to improve the source text

Analyze the source text; utilize the ontology to identify terminology

Convert the source text to a more “translatable” form by adding articles, replacing abbreviations, improving grammar and punctuation

Utilize XML tagging and ontology lookup to improve the structure of free-form source text


36

Improving Translation Quality


Process the source text prior to translation (Standard Language pre-processor).

Add articles before the nouns.

Adjust the word order to deal with size modifiers coming after nouns.

Replace acronyms and synonyms with the original expanded text (ASY -> ASSEMBLY)

Verify that punctuation is correct.

Pre-process the embedded comments to improve translation quality.
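Three of the pre-processing steps above (expanding abbreviations, moving a trailing size modifier in front of its noun phrase, and adding articles) can be sketched as follows. This is a simplified illustration rather than the actual Standard Language pre-processor: it assumes the first word is the verb and that noun phrases are delimited by a small, hypothetical list of prepositions. The ASY expansion comes from the slide, while the LRG expansion and the name `improve_source` are assumptions.

```python
ABBREVIATIONS = {"ASY": "ASSEMBLY", "LRG": "LARGE"}   # LRG expansion is assumed
SIZE_MODIFIERS = {"LARGE", "SMALL"}                    # assumed list
PREPOSITIONS = {"TO", "USING", "WITH", "INTO", "FROM", "THROUGH"}  # assumed list


def fix_noun_phrase(words):
    """Move a trailing size modifier to the front and add an article."""
    if not words:
        return []
    if len(words) > 1 and words[-1] in SIZE_MODIFIERS:
        words = [words[-1]] + words[:-1]
    return ["THE"] + words


def improve_source(sentence):
    """Rewrite one Standard Language sentence into a more translatable form."""
    words = [ABBREVIATIONS.get(w, w) for w in sentence.upper().split()]
    if not words:
        return ""
    output, phrase = [words[0]], []      # the first word is taken to be the verb
    for word in words[1:]:
        if word in PREPOSITIONS:
            # A preposition closes the current noun phrase.
            output += fix_noun_phrase(phrase) + [word]
            phrase = []
        else:
            phrase.append(word)
    output += fix_noun_phrase(phrase)
    return " ".join(output)


if __name__ == "__main__":
    print(improve_source("LOAD BODY PANEL LRG TO VEHICLE"))
    print(improve_source("INSERT HEATER ASY INTO HOSE"))
```

A real pre-processor would rely on the parse and the ontology to find noun phrases, since a fixed preposition list cannot handle every sentence.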


37

Issues with Machine Translation Quality

Localization issues (even with technical terminology): Spanish in Spain, Mexico, Argentina, etc.

Ensure that the system correctly displays special characters (umlauts, accents, etc.)

Have additional space available on screen, as target languages require more room than English.


38

Conclusions


Machine Translation is a cost-effective way to translate information with high quality if you are willing to customize the application to your requirements

Machine Translation is not an “out of the box” solution

Machine Translation accuracy can be greatly improved by controlling and improving the quality of the source text


39

Where are we going?


Intelligent search w/ context and understanding


Sharing of knowledge through ontologies


Growth of user-defined knowledge (folksonomies)

Intelligent Dialog Systems: integration of speech recognition w/ intelligent engines (“Sync”)


Automate the process of information retrieval


40

Questions?