Infrastructure for Machine Translation


Hassanin M. Al-Barhamtoshy, Mostafa S. Saleh and Reddah A. Al-Kheribi

Faculty of Computing and Information Technology, KAU, Jeddah

Abstract

This research presents an infrastructure for Machine Translation (MT) to enable translation to/from the Arabic language with different forms of input: text, speech, or unstructured documents; and different forms of output: text, speech, sign language, or structured documents.

This approach requires the establishment of dictionaries, lexicons, and corpora, along with an ontological representation of the entities in the domain of discourse. Consequently, this notion lays the cornerstones for a national Arabic wordnet, either by wrapping the international WordNet with additional layers or by building an Arabic wordnet from scratch.

The research therefore describes an approach to represent and generate Arabic Sign Language (ArSL) from English or Arabic text. ArSL uses the analyzed grammatical, morphological, syntactic and semantic features to generate scripted signs. The model is based on a fully animated 3D avatar to improve information accessibility for the majority of deaf adults with limited Arabic literacy.

Understanding the spoken input is one module in the system model. Arabic speech synthesis of the translated text will also be added in the output module. Mixed-language support in the text-to-speech module enables a single phrase that includes several different languages to be analyzed and communicated effectively.

The model can also perform semantic matching between the analyzed text and the domain ontology concepts and relations to produce a structured document that represents the information within the input unstructured documents.

Keywords: Automatic translation, Sign language, Embedded tags, Ontology, Syntax analyzer, Semantic analyzer, Domain ontology, Semantic matching.

1. Introduction

This research describes a system for generating natural-language sentences from syntactic and lexical structures, taking as its point of departure an internal (interlingual) representation. The model will be developed as part of an English-Arabic Machine Translation (MT) system; however, it is designed to be usable for many other MT language pairs and natural language applications.

The following subsections introduce the related fields and related works.


1.1 Sign Language (SL)

The following subsections introduce previous work on sign languages. British Sign Language (BSL), American Sign Language (ASL), French Sign Language, and Arabic Sign Language will be introduced.

ASL is a linguistic structure distinct from English, used for communication by approximately one half million deaf people in the United States (Neidle et al., 2000; Liddell, 2003; Mitchell, 2004). Technology for the deaf rarely addresses this field, so many deaf people find it difficult to read text on electronic devices. Software tools for translating English text into animations of a computer-generated character performing ASL can make a variety of English text sources accessible to the deaf (Huenerfauth, 2005). Language processing and machine translation (MT) can also be used in educational software for deaf children to help them improve their English literacy skills.

Several papers describe the design of an English-to-ASL MT system (Huenerfauth, 2004a, 2004b, 2005) and ASL generation. This overview illustrates important correspondences between the problem of ASL natural language generation (NLG) and related research in multimodal NLG.

The objective of this proposal is to design and implement the TRGM model to accept an Arabic text, analyze and parse that input, represent the analysis in an interlingua semantic representation, and then employ NLP techniques to drive the sign language synthesis stages.

Section 2 briefly describes relevant aspects of sign languages which challenge a translation system. Sections 3 and 4 are devoted to the overall text processing architecture, text analysis and text understanding. Consequently, the proposed model will be presented. The subsections describe the syntactic parsing, the translation to the interlingua semantic representation, and the pronoun resolution stage, respectively. Current progress in the realization of the natural language component is also outlined in Section 3.

During the design and implementation of the proposed model, we will review our experience with constructing the dictionary, corpus and lexicon. The research also discusses part-of-speech (POS) tags.

1.1.1. British Sign Language (BSL)

The following subsections describe BSL features.

(a) Sign Order

BSL has a topic-comment structure, in which the subject or topic is signed first. The topic is the framework within which the predication takes place. After the topic has been identified, the rest of the sentence is the comment, the new information about it. Furthermore, BSL has no fixed order of the basic elements (Subject, Verb, Object). This flexibility is due to the extra information carried in the directional verbs (see later) and eye-gaze (Eva Sáfár and Ian Marshall, 2002).

(b) Signing Space, Placement and Pronouns

In many sign languages, including BSL, signers exploit the signing space in front of their body. In a discourse, the components of a description can be situated in that space: first the area is defined, and then all items or actions are related to that area. Thus, BSL has more pronouns than English; they are articulated by pointing to a location previously associated with a noun. This means that English is underspecified when using plural pronouns.

(c) Sign Agreement Verbs

Agreement verbs include information about the person and number of the subject and object. This is realized by moving the verb in the syntactic space in which the subject and the object are placed around the signer (Eva Sáfár and Ian Marshall, 2002). The signing of the verb begins at the position of the subject and ends at the position of the object (GIVE, TELL, etc.); some verbs begin at the object and finish at the subject (BORROW).

(d) Sign Classifiers

Classifiers are hand-shapes that can denote an object from a group of semantically related objects. They are used with verbs which require a classifier, so that when combined with location, orientation, movement and non-manual features the composite forms a predicate. The hand-shape is used to denote a referent from a class of objects that have similar features (J. R. Kennaway, 2001; Kopp, S., Tepper, P., and Cassell, J., 2004).

(e) Sign Tenses

BSL has no tense system. Rather than expressing temporal information through morphological or syntactic features associated with verbs, it is expressed with the help of four time lines in the signing space or by the ordering of the propositions in the discourse.

1.1.2. American Sign Language (ASL) and English Linguistics

In many sign languages, and especially in ASL, several parts of the body convey meaning in parallel: hands (location, orientation, shape), eye gaze, mouth shape, facial expression, head-tilt, and shoulder-tilt. Signers may also interleave lexical signing (LS) with classifier predicates (CP) during a performance. During LS, a signer builds ASL sentences by syntactically combining ASL lexical items (arranging individual signs into sentences). The signer may also associate entities under discussion with locations in space around their body; these locations are used in pronominal reference (pointing to a location) or verb agreement (Huenerfauth, 2006).

During generation, signers' hands draw a 3D scene in the space in front of their torso. One could imagine invisible placeholders floating in front of a signer representing real-world objects in a scene. To represent each object, the signer places his/her hand in a special hand shape (used specifically for objects of that semantic type: moving vehicles, seated animals, upright humans, etc.). The hand is moved to show a 3D location, movement path, or surface contour of the object being described.

For example, to convey the English sentence "the car parked next to the house," signers would indicate a location in space to represent the house using a special hand shape for 'bulky objects.' Next, they would use a 'moving vehicle' hand shape to trace a 3D path for the car which stops next to the house.


The following subsections describe four systems that translate English text into American Sign Language (ASL):

- The VisiCAST translator (Marshall & Safar, 2002),
- The ZARDOZ system (Veale et al., 1998),
- The ASL Workbench (Speers, 2001), and
- The TEAM system (Zhao et al., 2000).

These systems are considered along the following dimensions:

- Machine Translation (MT) architecture,
- Grammar formalisms,
- Linguistic representations,
- Lexicon format,
- Grammatical rules, and
- Development time.

(a) The VisiCAST Translator

VisiCAST was introduced as part of a European Union (EU) project; the University of East Anglia implemented a system for translating English text into British Sign Language (BSL) (Marshall, 2002). The approach uses the CMU link parser to analyze the input English text and uses Prolog grammar rules to convert this output into a discourse representation structure. Head-driven phrase structure rules are then used to produce a symbolic sign language representation script. This script is defined as a "signing gesture markup language", and it is based on a scheme of the movements required to perform natural sign language (Kennaway, 2001).

ViSiCAST is an EU Framework V supported project which builds on work supported by the UK Independent Television Commission and Post Office. The project develops virtual signing technology in order to provide information access and services to Deaf people (Eva Sáfár and Ian Marshall, 2002).


(b) The ZARDOZ System

This system was proposed to translate English text into sign language using a set of hand-coded schemata as an interlingua representation (Huenerfauth, 2003). The authors were developing their framework for British, Irish and Japanese sign languages.

(c) The ASL Workbench

This system is based on lexical functional grammar (LFG) to analyze English text; transfer rules are then used to convert an English f-structure into ASL output. When the system encounters difficulties in analysis or other translation tasks, an additional step asks the user of the system for advice.

(d) The TEAM Project

At the University of Pennsylvania, the TEAM project builds an ASL syntactic structure from English text, based on the tree produced during analysis (Zhao, 2000).

2. Text Processing Architecture

The organization of the English text language processing component is shown in Figure 1. It is organized as a collection of automatic transformation components augmented by user interaction.

2.1. Noun

In many languages (Arabic, English, etc.), a noun or noun substantive (مساك ةلمجلا يف لمعتسم) is a lexical category which can co-occur with (in)definite articles (لا, لاك, etc.) and attributive adjectives (اذه, هذه, etc.), and can function as the head of a noun phrase.

The word "noun" comes from the Latin nomen, meaning "name." Nouns can be inflected for grammatical case, such as dative or accusative. Nouns cannot be inflected for tense, such as past, present or future. Vinokurova (Vinokurova, Nadezhda, 2005) gives a more detailed discussion of the historical origin of the notion of a noun in her dissertation.

2.1.1. Different Definitions of Nouns

Expressions of natural language have formal properties, such as what kinds of morphological prefixes, suffixes or infixes they can take. But they also have semantic properties, i.e. properties pertaining to their meaning. Due to the specialty and sometimes the peculiarity of languages, general grammar does not apply to the nouns of all languages. For example, in Russian there are no definite articles, so one cannot define nouns by means of those. There have also been several attempts to define nouns in terms of their semantic properties; some of these are discussed below.

2.1.2. Names for Things

In traditional grammars [http://en.wikipedia.org/wiki/Noun], one definition of nouns is that they are expressions that refer to a person, place, thing, event, substance, quality, idea, etc. This is a semantic definition; part of it is related to the definition that makes use of relatively general nouns ("thing," "phenomenon," "event") to define what nouns are.

The existence of such general nouns shows us that nouns are organized in taxonomic hierarchies. But other kinds of expressions are also organized in hierarchies. For example, all of the verbs "لوجتي," "ىشمتي," "وطخي," and "ودعي" are more specific words than the more general "ريسي". The latter is more specific than the verb "كرحتي." But it is unlikely that such hierarchies can be used to define nouns and verbs. Similarly, adjectives like "yellow رافصلا" or "difficult ريسع/بعص" might be thought to refer to qualities, and adverbs (لاحلا/فرظلا) like "outside جراخ" or "upstairs قوف/يولع" seem to refer to places. Worse still, a trip into the woods can be referred to by the verbs "لوجتي stroll" or "يشمتي walk." But verbs, adjectives and adverbs are not nouns, and nouns are not verbs. So the definition is related to part of speech.

2.1.3. Classification of Arabic Nouns: Proper Nouns and Common Nouns

Proper nouns or proper names are the names of unique entities. For example, "دمحأ Ahmad", "يرتشملا بكوك Jupiter" and "رصم Egypt" are proper nouns. Proper nouns are usually capitalized in English and most other languages that use the Latin alphabet, and this is one easy way to recognize them. In Arabic, however, nouns of all types are written in a general form, like Arabic verbs, adjectives and adverbs (there is no capital or small alphabet).

All other nouns are called common nouns; e.g. "تنب girl", "بكوك planet", and "دلب country" are common nouns.

The common meaning of the word or words constituting a proper noun may be unrelated to the object to which the proper noun refers. Consequently, someone might be named "الله دبع" despite being neither a "دبع slave" nor a "الله God". For this reason, proper nouns are usually not translated between languages, although they may be transliterated. For example (as cited in [http://en.wikipedia.org/wiki/Noun]), the German surname Knödel becomes Knodel or Knoedel in English (not the literal Dumpling). However, in the translation of place names, the Portuguese word Lisboa becomes Lisbon in English, the English London becomes Londres in French, and the Greek Aristotelēs becomes Aristotle in English (Vinokurova, Nadezhda, 2005).

2.1.4. Count Nouns and Mass Nouns

Count nouns [http://en.wikipedia.org/wiki/Noun] (or countable nouns) are common nouns that can take a plural, can combine with numerals or quantifiers (e.g. "one", "two", "several", "every", "most"), and can take an indefinite article ("لا" or "لاب"). Examples of count nouns are "يسرك/دعقم chair", "فنأ nose", and "ناجرهم/لافتحا occasion".

Mass nouns (or non-countable nouns) differ from count nouns in precisely that respect: they cannot take a plural or combine with number words or quantifiers. Examples from Arabic include "كحاض laughter" and "ثاثأ furniture". It is not possible to refer to a single "ثاثأ a furniture" or to "three furnitures". This is true even though the furniture referred to could be counted. Thus the distinction between mass and count nouns should not be made in terms of what sorts of things the nouns refer to, but rather in terms of how the nouns present these entities [Baker, Mark, 2005; Krifka, Manfred, 1989; Borer, Hagit, 2005].

Some words function in the singular as a count noun and, without a change in spelling, as a mass noun in the plural; consequently, these words can take additional words like "ضعب some" and "لك all".

2.1.5. Number in Nouns and Broken Plurals in Arabic

Semitic languages originally had three grammatical numbers: singular, dual, and plural. The dual continues to be used in contemporary dialects of Arabic, as in the name of the nation of Bahrain (نيرحب: baħr "sea" + -ayn "two"), and in (ةنس; šana "one year"; نيتنس; šnatayim "two years"; نينس; šanin "years"). The curious phenomenon of broken plurals (ريسكتلا عمج), e.g. Arabic دس sadd "one dam" vs. دودس sudūd "dams", is found most profusely in the languages of Arabia.

In Arabic, the regular way of forming a plural for a masculine noun is to add the suffix (نو- -ūn) at the end. For feminine nouns, the regular way is to add the suffix (تا- -āt). Yet one finds that fewer than 10% of all plurals used in everyday speech or in written texts (modern and classical, even the Qur'an) adhere to these simple rules. Instead, spoken and written Arabic produces plurals using a system of groups based on the vocalization of the word.

2.1.6. Collective Nouns

Collective nouns are nouns that refer to groups consisting of more than one individual or entity, even when they are inflected for the singular. Examples include "ةنجل committee," "روهمج/عيطق herd" and "بهذم/ةسردم school" (of herring). These nouns have slightly different grammatical properties than other nouns. For example, the noun phrases that they head can serve as the subject of a collective predicate even when they are inflected for the singular. A collective predicate is a predicate that normally cannot take a singular subject.

2.1.7. Concrete Nouns and Abstract Nouns

Concrete nouns refer to definite objects perceived through the senses (e.g. دمحأ Ahmad, ةحافت apple). Abstract nouns, on the other hand, refer to ideas or concepts, such as "لدع justice" or "هرك hate". While this distinction is sometimes useful, the boundary between the two is not always clear. In Arabic, many abstract nouns are formed by adding a noun-forming suffix ("ة" or "نا" at the end) and affixes ("ا" or "و" in the middle) to verbs. Examples are "ةداعس happiness", "نارود circulation" and "نوكس serenity".

2.1.8. Nouns and Pronouns

Noun phrases can be replaced by pronouns, such as "وه he", "يه she", "يذلا or يتلا which", and "ءلاؤه those", in order to avoid repetition or explicit identification, or for other reasons. The English word one can replace parts of noun phrases, and it sometimes stands in for a noun. For example, in the following sentence, one stands in for new car:

This new car is cheaper than that one.

In Arabic, "ىلولأا" (one) can also stand in for larger subparts of a noun phrase. For example, in the following sentence, "ىلولأا" stands in for "ةرايسلا":

.ىلولأا نم صخرأ ةرايسلا هذه

2.1.9. Broken Plurals in Arabic

In Arabic, the regular way of forming a plural for a masculine noun is to add the suffix نو- (-ūn) at the end. For feminine nouns, the regular way is to add the suffix تا- (-āt). Yet one finds that fewer than 10% of all plurals used in everyday speech or in written texts (modern and classical, even the Qur'an) adhere to these simple rules. Instead, spoken and written Arabic produces plurals using a system of groups based on the vocalization of the word. This system is not fully regular, as can be seen in the examples below.

Broken plurals are known as "Jam' Takseer" (ريسكت عمج) in Arabic grammar. These plurals are one of the most remarkable aspects of the language, given the very strong and highly detailed grammar and derivation rules that govern the written language. Full knowledge of these plurals comes with extended exposure to the language. Much like spelling in English, the system has so many special cases that they can be learned only by reading a lot of Arabic text. Semitic languages typically form tri-consonantal roots, forming a "grid" into which vowels may be inserted without affecting the basic root.

Here are a few examples; note that the commonality is in the consonants, not the vowels [http://en.wikipedia.org/wiki/Noun]:



- KiTāB باتك "book" → KuTuB بتك "books"
- KāTiB بتاك "writer, scribe" → KuTTāB باتك "writers, scribes"
- maKTūB بوتكم "letter" → maKāTīB بيتاكم "letters"
  (note: these three words all share the common root K-T-B ك ت ب "to write")
- WaLaD دلو "boy" → aWLāD دلاوأ "boys, children"
- WaRaQ قرو "paper" → aWRāQ قاروأ "papers"
- SHaJaR رجش "tree" → aSHJāR راجشأ "trees, timber"
- but: JaMaL لمج "camel" → JiMāL لامج "camels"
- maKTaB بتكم "desk, office" → maKāTiB بتاكم "desks, offices"
- maLBaS سبلم "dress, garb" → maLāBiS سبلام "apparel, clothes"
- JaDD دج "grandfather" → JuDūD دودج "grandfathers"
- FaNN نف "art" → FuNūN نونف "arts"
- but: RaBB بر "master, owner" → aRBāB بابرأ "masters"

2.3. Verb and Tense

In general, two main verb aspects exist: perfect, for completed action (with pronominal suffixes), and imperfect, for uncompleted action (with pronominal prefixes and suffixes).

Morphology: tri-literal roots

All Semitic languages exhibit a unique pattern of stems consisting of "tri-literal" or consonantal roots (normally consisting of three consonants), from which nouns, adjectives, and verbs are formed by inserting vowels with, potentially, prefixes, suffixes, or infixes.

For instance, the root (بتك; K-T-B), "write", yields in Arabic:

- kataba بتك "he wrote"
- kutiba بتك "it was written" (masculine)
- kutibat تبتك "it was written" (feminine)
- kitāb باتك "book"
- kutub بتك "books"
- kutayyib بيتك "booklet" (diminutive)
- kitāba ةباتك "writing"
- kātib بتاك "writer" (masculine)
- kātiba ةبتاك "writer" (feminine)
- kuttāb باتك "writers"
- kataba ةبتك "writers"
- maktab بتكم "desk"
- maktaba ةبتكم "library"
- maktūb بوتكم "written"


2.4. Word Order

The standard word order is Verb-Subject-Object (V-S-O). In Classical and Modern Standard Arabic, this is still the dominant order: (اديرف ٌدمحم ىئر; ra'ā muħammadun farīdan; "Muhammad saw Farid."). However, VSO has given way in most modern Semitic languages to typologically more common orders (e.g. SVO) [M. A. Abdel-Fattah, 2005]; in many modern Arabic dialects, for example, the classical order VSO has given way to SVO, and the same happened in Hebrew (due to Europeanization) [Croft, William, 1993].

As illustrated in many works, Arabic Sign Language is ordered in the form S-V-O.

3. Speech to Speech Translation

The goal of Speech-to-Speech Translation (S2S) research is to enable real-time, interpersonal communication via natural spoken language for people who do not share a common language [Bowen Zhou, 2004]. English-to-Arabic S2S is one of the proposed features of this project. This requires a speech recognizer module as an input to the natural language understanding module. In the output module, a statistical natural language generation (NLG) model is proposed to be used on the output of the phrase/word translator, together with the information extracted by the information extractor module, in order to generate the information needed by the speech synthesizer [Fu Hua Liu et al., 2003].

A statistical model, e.g. a maximum entropy probability model [Liang Gu et al., 2004], is used in the NLG system. Both sentence-level and concept-level classes are used as constituents. The features used in the model include the previous symbols, the local sentence or phrase type in the semantic tree, and the list of concepts that remain to be generated before the current symbol. During translation, a recursive search is performed on the parsed tree of the input sentence in a bottom-up manner to generate the output word sequence in the target language. After each non-terminal node is traversed, the resultant symbol string is appended to the output. At the end of the search, the concepts are substituted with their variables. The output from the NLG is input to a trainable speech synthesis subsystem to synthesize Arabic speech. The proposed text-to-speech has the ability to generate speech across different languages [Ruhi Sarikaya, 2005].
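A minimal sketch of this bottom-up generation pass is shown below. It is ours, not the paper's implementation: the toy tree, node labels and generate function are invented for illustration, and the statistical scoring of candidate symbol strings is omitted.

# Illustrative post-order (bottom-up) traversal of a parsed semantic tree: each
# node's resultant symbol string is appended to the output after its children
# have been traversed. In the real system the string for a non-terminal would
# come from the statistical NLG model; here we simply echo the node labels.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)

def generate(node: Node, output: List[str]) -> None:
    for child in node.children:
        generate(child, output)
    output.append(node.symbol)

# Toy parse of "the car parked next to the house", reduced to concepts.
tree = Node("S", [
    Node("NP", [Node("car")]),
    Node("VP", [Node("parked"),
                Node("PP", [Node("next-to"), Node("house")])]),
])

words: List[str] = []
generate(tree, words)
print(" ".join(words))  # car NP parked next-to house PP VP S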


4. Semantic and Ontology

The World Wide Web currently contains about 50 million Web sites hosting more than 8 billion pages, which are accessed by more than 600 million users internationally. This vast amount of data and information could help users reach their goals through various ways and sources, but the problem is that most of the information available on the Web, including that obtained from legacy paper-based documents, is in human-comprehensible text form, not readily accessible to or understood by computer programs. Moreover, the Web "is not only written in a human-readable language (usually, the author's natural language) but in a human-vision-oriented layout (HTML with tables, frames, etc.) and with human-only-readable graphics" [http://www.cs.umd.edu/projects/plus/SHOE/index.html] [K. Thirunarayan, 2005].

The enormity of the available information, coupled with its inability to be processed by computers, has made it very difficult to accurately search, present, summarize, and maintain it for a variety of users. The machine handles information poorly because it is not programmed to handle the information in the way humans do, and the information itself is not represented for the machine in the way it is represented in the human mind [J. Davies, 2003].

The Semantic Web attempts to enrich the available information with machine-processable semantics by representing the information in a way similar to the human way. This enables computer and human to complement each other cooperatively [D. Fensel, 2003] [http://www.semanticWeb.org/]. As a result, the machine can take over the complex repetitive tasks of annotating, summarizing, and reasoning within documents, saving these labor- and money-consuming tasks.

The realization of the Semantic Web needs languages to represent the knowledge. HTML is used to present the data in a graphical way for the user, but it pays no attention to the semantic side of the document. XML presents a structural representation of the document, but it also gives no explicit indication of the document's semantics. For that purpose, the W3C presented the Resource Description Framework (RDF) to provide that missing semantic structure.
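As a small illustration of the difference (ours, not the paper's), the same fact can be stated as an RDF-style subject-predicate-object triple rather than as presentation markup. The sketch below uses the rdflib library; the namespace and resource names are invented for the example.

# Illustrative sketch (not from the paper): the same fact expressed as an RDF
# triple instead of presentation-oriented HTML/XML markup.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/mt/")   # invented namespace for the example

g = Graph()
g.bind("ex", EX)
# "Jupiter is a planet" as a machine-processable subject-predicate-object triple.
g.add((EX.Jupiter, EX.instanceOf, EX.Planet))
g.add((EX.Jupiter, EX.label, Literal("Jupiter")))

print(g.serialize(format="turtle"))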

Dorai et al. [G. Dorai, 2002] [G. Dorai, 2001] proposed to capture natural language queries using embedded grammar tags. The grammar used can provide a unified component for speech recognition engines, semantic web representation, semantic search, and speech output. In that work, however, the embedded grammar tags are produced and embedded by the author of the web page.

Thirunarayan [K. Thirunarayan, 2005] proposed a modular technique to embed machine-processable semantics into text documents with tabular data using annotation. He used a semi-automatic way to extract and update. The approach enhances traceability and facilitates querying, but it is not convenient for document integration or for automatic tagging of tables.

Chan and Franklin [Samuel W. K. Chan, 2003] described a comprehensive framework for text understanding based on the representation of context. It is designed to serve as a representation of semantics for the full range of interpretive and inferential needs of general natural language processing. Its most distinctive feature is its uniform representation of the various simple and independent linguistic sources that play a role in determining meaning, such as lexical associations, syntactic restrictions, case-role expectations, and, most importantly, contextual effects.

5. The Proposed Model

Figure 1 shows the traditional MT pyramid (Hutchins and Somers, 1992). A machine translation system takes source text as input and performs some function to produce target text as output. The figure shows interlingual machine translation at the peak of the pyramid, followed by transfer-based and then direct translation.

Direct MT systems perform a direct translation, substituting words from a bilingual dictionary and employing superficial surface word order changes. Analysis is done first, and generation is done second. The interlingua (at the top of the pyramid) represents a language-neutral abstraction that is the output of analysis and the input to generation. In an interlingua-based system, all of the work of translation is done in analysis and generation. In the middle are transfer-based systems, which use a combination of analysis, transfer and generation.

Fig. (1): Standard Translation Model (Hutchins and Somers, 1992)

In the analysis phase, morphological analysis is employed first, then syntactic analysis, and finally lexical and semantic analysis.

5.1 Natural Language Analysis

Figure 2 shows the steps of the suggested model when a user enters his/her English/Arabic text and requests a translation. The steps of the algorithm are divided into three main stages: 1. Analysis and understanding stage; 2. Transfer stage; and 3. Generation stage.

The proposed model will describe these stages in more detail, but we summarize them here in order to provide a clear picture of the events.

Stage 1: Analysis and understanding stage (a skeletal sketch of this stage follows the list)

1. The user inputs his/her English/Arabic text or speech.
2. The model recognizes the language of the text or speech and therefore selects the intended dictionary and corpus.
3. Morphological, syntactic and semantic analysis takes place and is used to analyze and understand the entered text.
4. The whole analyzed text is stored into a set of synset instructions (interlingual format) using lexical meaning and ontology rules.
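The following skeleton is our own illustration (the function names and the AnalyzedText container are assumptions, not the paper's code); it shows how the four steps above could be chained.

# Skeletal sketch of Stage 1 (analysis and understanding). All names here are
# illustrative placeholders; the paper does not publish an implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class AnalyzedText:
    language: str            # "ar" or "en", detected in step 2
    tokens: List[str]        # output of morphological analysis (step 3)
    parse: object            # syntactic/semantic analysis result (step 3)
    synsets: List[str]       # interlingual synset instructions (step 4)

def detect_language(text: str) -> str:
    # Placeholder: a real system would use a language-identification model.
    return "ar" if any("\u0600" <= ch <= "\u06FF" for ch in text) else "en"

def analyze(text: str) -> AnalyzedText:
    lang = detect_language(text)                   # step 2: choose dictionary/corpus
    tokens = text.split()                          # step 3 (stub): morphological analysis
    parse = None                                   # step 3 (stub): syntax + semantics
    synsets = [f"{lang}:{tok}" for tok in tokens]  # step 4 (stub): interlingual synsets
    return AnalyzedText(lang, tokens, parse, synsets)

print(analyze("ra'ā muħammadun farīdan").synsets)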

Stage 2: Transfer stage (a sketch of the reordering step follows the list)

1. The understood text (interlingua) and the analyzed components are translated into a set of sign scripts.
2. The model rearranges the sign script to form the sign order (<Subject> <Verb> <Object> (S-V-O) or <Verb> <Subject> <Object> (V-S-O)) according to the target language.
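A minimal sketch of this reordering step is given below (ours; the role labels and the reorder helper are assumptions): constituents labelled with grammatical roles are emitted in the order required by the target language.

# Illustrative sketch of the Stage 2 reordering step: role-labelled constituents
# are emitted in the target order (e.g. V-S-O for Modern Standard Arabic text,
# S-V-O as reported for Arabic Sign Language).
from typing import Dict, List

def reorder(constituents: Dict[str, str], target_order: str) -> List[str]:
    """target_order is a string such as 'SVO' or 'VSO'; keys are 'S', 'V', 'O'."""
    return [constituents[role] for role in target_order if role in constituents]

# "ra'ā muħammadun farīdan" -- "Muhammad saw Farid" (V-S-O in the source text)
clause = {"V": "ra'ā", "S": "muħammadun", "O": "farīdan"}

print(" ".join(reorder(clause, "VSO")))  # ra'ā muħammadun farīdan
print(" ".join(reorder(clause, "SVO")))  # muħammadun ra'ā farīdan  (ArSL sign order)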












Fig. (2): The Proposed Model Structure

Stage 3: Generation stage

1. Open the sign or speech dictionary.
2. Map the sign script to the related scenes from the opened dictionary.
3. Generate the complete scene.
4. Display the complete scene on a video screen.


[Figure 2 shows the three stages as blocks: an Analysis Stage (syntactic analysis, morphological analysis, lexical indicators, semantic analysis services) supported by a dictionary, lexicon and corpus together with affixes rules and syntactic rules; a Translation Stage (syntactic transfer of the surface structure, deep/internal structure, semantic transfer services) supported by a translation dictionary, lexicon and corpus and the domain ontology; and a Generation Stage (1. M/C translation, 2. sign language (ArSL), 3. knowledge extraction, 4. speech translation) supported by SL generation and speech synthesizer dictionaries.]


Figure 3 shows a functional description of the proposed model.


Fig. (3): Functional Description of the Model

5.2 The Morphological Analyzer and Arabic Dictionary

This phase takes its input stream and breaks it into separate tokens; each token is then analyzed into a stream of features (root, token type, lexical features, etc.). This phase uses a dictionary knowledge base (Arabic rules and Arabic feature templates), and the output of this module is the roots of the input stream. The proposed model will consist of separate components: word analysis rules, Arabic template matching rules, feature extraction, and the dictionary.

The morphological analysis first applies prefix, suffix and infix rules; second, it checks a list of templates to determine whether the remainder could be a known root or not. Third, if a root is identified, it is returned; otherwise, the original word is returned.

Arabic verbs are triliteral, quadriliteral, or pentaliteral. Every Arabic verb has its own derivatives, and these derivatives depend on its type. About 30 types of triliteral verbs contain 5,321 verbs [Antoine El-Dahdah, 2002]. This can produce 20,000 templates. If affix rules are applied to these templates (4 prefixes and 30 suffixes), the total number of Arabic words derived from verbs is 28,140,005 derivations.
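A highly simplified sketch of this strip-and-match procedure is given below. It is illustrative only: the tiny affix and root lists are placeholders for the Arabic rules and feature templates in the dictionary knowledge base described above, and the real analyzer operates on Arabic script rather than transliterations.

# Simplified sketch of the affix-stripping / template-matching root extractor.
PREFIXES = ["wa", "al", "ma"]          # step 1: candidate prefixes
SUFFIXES = ["at", "a", "un"]           # step 1: candidate suffixes
KNOWN_ROOTS = {"ktb", "drs", "slm"}    # step 2: roots listed in the dictionary

def strip_affixes(word: str) -> str:
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[: -len(s)]
            break
    return word

def find_root(word: str) -> str:
    """Return the root if the stripped remainder matches a known template, else the word."""
    stem = strip_affixes(word)
    skeleton = "".join(ch for ch in stem if ch not in "aiu")  # drop short vowels
    return skeleton if skeleton in KNOWN_ROOTS else word      # step 3

print(find_root("maktab"))   # -> ktb
print(find_root("kataba"))   # -> ktb
print(find_root("qalam"))    # -> qalam (unknown root, original word returned)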


[Figure 3 shows the data flow: input text, or input speech passed through a speech recognizer, enters the analysis stage (phrase/word translator, information extractor, semantic lexicon, semantic tree, semantic/syntactic cues); the generation stage (statistical NLG with statistical models) then drives the sign language generator, the speech synthesizer for output speech, and the output text.]

5.3 Information Extraction for Language Translation Systems

Several NLP technologies (such as information retrieval, topic detection, and, for the most part, information extraction) have reached a level of quality that is comparable to human performance on similar tasks. However, the quality of machine translation (MT) is still far behind the quality of professional human translation. The problem of how to bridge the gap between human quality and MT quality is central to research and development efforts in the field. There is recognition that at this stage we do not adequately understand the cognitive processes involved in human translation, such as language comprehension, production, and the application of translation strategies and procedures, so we do not have appropriate models to implement in MT systems.

There have been a number of suggestions for how MT quality can be improved, e.g., using anaphora resolution, disambiguation (including word sense and syntactic disambiguation), term extraction, and representations of the rhetorical structure of texts and of common-sense knowledge (Wilks, 2003: 203). The majority of these suggestions were based on abstract theoretical lines of reasoning and on isolated, artificially constructed examples rather than on empirical corpus-based data. Little effort has been made to rank the importance of the problems and to identify which difficulties are most typical for state-of-the-art MT systems.

Combining IE technology with MT may offer great potential for improving the state of the art in output quality. Taking advantage of efforts to resolve specific linguistic problems, such as named entity problems, may improve not only the treatment of that phenomenon by MT, but also morphosyntactic and lexical well-formedness more generally in the wider context of the target, thus boosting the overall quality of MT.

Machine translation systems based on speech recognition have to cope with the fact that word recognition modules do not work with 100% accuracy. Moreover, the user's spoken language is usually far from ideal language use. Information extraction facilities should be added, for example, in order to follow the course of the input speech dialog. Speech processing systems must be able to cope with recognition errors or ungrammatical input from the users. This has to be done by processing specific language-relevant information to extract expressions representing the relevant data.

5.4 The Proposed ArSL Model

The proposed model is used to analyze, translate, recognize and generate Arabic signs. The model uses multiple retrieved files, including a dictionary, lexicon and corpus. Consequently, the translation, recognition and generation model (TRGM) is a software tool for generating gesture animation based on a 3D avatar (virtual human).

The concepts presented in this research have been implemented; possible future applications of this work, including animated output (a 3D character), tagged script for linguistic generation, and shared lexicons for sign standardization, are also being designed and implemented.

Arabic words can be classified into many types: noun (Fig. 4-a), verb (Fig. 4-b), special character, determiners, prepositions (Fig. 4-c), and adjectives (Fig. 4-d). There are relations between some of these types according to position, place and/or frequency.

Some of these types can be included in an intensifier (adverb) model (e.g. "every day": repeating the sign to show frequency).


5.5 Semantic Level

The semantic level deals with the assignment of meaning to individual words and sentences. This task is achieved through the conditions and actions associated with grammar rules. The semantic analysis stage faces the problem of semantic ambiguity, i.e. there is no clear, deterministic semantic meaning for a word sense.

WordNet contains information about the nouns, verbs, adjectives and adverbs in English. Words of the same part of speech are organized into synsets to reflect their synonymy relationships [J. Gonzalo, 2004]. The main types of semantic ambiguity are synonymy and polysemy. In a later section about problems in the model, we handle this problem using domain ontology and namespace definitions.



Fig. (4-a): Sign of a noun (باتك: book). Fig. (4-b): Sign of a verb (بتكي: write). Fig. (4-c): Sign of a special character (يف: in). Fig. (4-d): Sign of an adjective (ليمج: beautiful).

5.6 WordNet, The Lexical Ontology

WordNet is an online lexical reference system developed by the Cognitive Science Laboratory at Princeton University by a group led by Professor George A. Miller [Miller G. A., 1999]. At present, WordNet contains more than 150,000 different unique terms. These are organized into some 115,000 word meanings, or sets of synonyms called synsets. The lexicon in WordNet is divided into nouns, verbs, adjectives and adverbs [S. Chua, 2004]. Figure 5 shows the WordNet lexical ontology.








By checking the word "happy" using the WordNet browser, we can find the output in Figure 6, which shows the different senses of the word "happy".

Fig. (6): A screen shot from the WordNet browser for the word "happy"


WordNet is an electronic lexical database considered to be the most important resource available to researchers in computational linguistics, text analysis, and many related areas. This lexicon is organized in terms of word meanings rather than word forms [Elkateb, S. and Black, B. (2004)]. It is neither a traditional dictionary nor a thesaurus; it combines features of both types of lexical reference resources.

As a thesaurus, the synset (synonymy set) is considered the building block of WordNet, consisting of all the words that express a given concept [Fellbaum, C. (1998)]. It organizes the lexicon by semantic relations on the basis of synonymy [Elkateb, S. and Black, B. (2004)]. On the other hand, WordNet differs from a traditional dictionary in that it gives definitions and sample sentences for most of its synsets. In addition, it contains information about morphologically related words [Fellbaum, C. (1998)]. WordNet comprises four part-of-speech databases corresponding to nouns, verbs, adjectives, and adverbs [4]. These parts are organized into synonymy sets, each representing one underlying lexicalized concept.
Fig. 5: Lexical ontology for WordNet (lexical concepts, divided into nouns, verbs, adjectives, adjective satellites and adverbs, linked to word forms, literals and glossary entries through relations such as antonym-of and hyponym-of)


Different relations link these sets [Mandala R., Tak
enobu T. and Hozumi T. (1998)].

Mainly, WordNet was developed to be as a monolingual English Language online lexical resource
[Elkateb, S. and Black, B. (2004)], by George Miller, Princeton University. Later, several WordNets have
been developed to serve d
ifferent languages. Such WordNets is the EuroWordNet which is a multilingual
database with WordNets for several European languages including Dutch, Italian, Spanish, German,
French, Czech, and Estonian.

Moreover, specific
-
language WordNets are available to
day that include Russian WordNet, ItalWordNet
(Italian), Hindi WordNet, Korean WordNet, Chinese WordNet, Japanese WordNet, and others.

The success of developing WordNets for different languages depends on the language properties
including its structure, se
mantics, and syntax. The problem becomes even more challenging when the
language is with inadequate automatic knowledge resources such as Arabic. We tried to search out the
attempts done to develop an Arabic version of WordNet. We noticed that, to date, th
ere is no such
developed application of an Arabic WordNet. However, there are a few attempts that have been made to
develop such a WordNet.

Mona Diab has worked on bootstrapping an Arabic WordNet on the lexeme level using Arabic
-
English
parallel corpora ba
sed on the English WordNet. Her attempt is based on the idea of Arabic cross
translation of every node in an English WordNet and finding correspondences between specific English
synsets in the English WordNet and their Arabic counterparts [Diab, M. (2004)]

[Diab, M., Hacioglu, K.
and Jurafsky, D. (2004)].

Another attempt was done by Sabri Elkateb and Bill Black. They have worked on a computerized
bilingual English
-
Arabic dictionary for translators. It is an attempt to develop a model whose query
mechanism i
s mainly based on the query process implemented in English WordNet, in the form of a
conceptual dictionary. This dictionary can meet the needs of various groups of users in addition to Arab
translators who seek to have satisfactory information about a word

and an adequate representation of its
form, structure, and senses. It supports queries based on words, roots or patterns, as well as via
synonymy, hyponymy and the other WordNet relations, and by English translation [Elkateb, S. and Black,
B. (2004)].

5.7 Designing and Implementing the Proposed Arabic WordNet

In our design, each synset has an identification number, a part of speech (noun, verb, adjective, or adverb), a word, a word number to determine the order of the word in this synset, the word origin, and the etymon.

Each of these synsets has the following relations: synonyms, antonyms, parts, part-of, types, type-of, and the domain; each of these relations has the same attributes as the synset (Figure 7).

Each synset has the following attributes (a sketch of this structure follows the list):

1. ID, 2. Word, 3. wordNum, 4. wordOrigin, 5. POS (part of speech), and 6. etymon
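A possible in-memory rendering of this design is sketched below; the class and field names mirror the attributes and relations listed above, but the code itself is our illustration, not the authors' implementation.

# Illustrative data structure for the proposed Arabic WordNet synset design.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SynsetEntry:
    id: int                 # identification number
    word: str               # the word itself
    word_num: int           # order of the word within the synset
    word_origin: str        # word origin
    pos: str                # part of speech: noun, verb, adjective, adverb
    etymon: str             # etymon
    # relations such as synonyms, antonyms, part-of, type-of, domain;
    # each relation points to entries carrying the same attributes.
    relations: Dict[str, List["SynsetEntry"]] = field(default_factory=dict)

book = SynsetEntry(id=1, word="كتاب", word_num=1, word_origin="كتب",
                   pos="noun", etymon="كتب")
booklet = SynsetEntry(id=2, word="كتيب", word_num=1, word_origin="كتب",
                      pos="noun", etymon="كتب")
book.relations["type-of"] = [booklet]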


Fig. (7): System design model of the Arabic semantic lexicon

Figure 7 shows the semantic dictionary of the proposed Arabic WordNet. Figure (8-a) describes the meaning of the context (عبصا).

Fig. (8-a): Description of the meaning of the context (عبصا)

Fig. (8-b): Synonyms of the Arabic word (ديلوت)

Fig. (8-c): Antonyms of the Arabic word (ديلوت)



5.8 Domain Ontology

A domain-specific ontology serves as a means for establishing a conceptually concise basis for communicating knowledge. A domain-specific ontology is a shared and common understanding of a particular domain. It includes a representational vocabulary of terms that are precisely defined and specified together with the relationships between terms. These terms may be considered as semantically rich metadata capturing the information content of the underlying information sources. The use of ontologies with such semantically rich descriptions offers a promising way to deal with semantic heterogeneity [H. Yang and M. Zhang, 2005].

A domain-specific ontology specifies the conceptual terms of a specific domain. Each concept represents a class for a specific set of entities and must be characterized by a unique label name in the ontology. The concepts are typically organized into a taxonomy tree, where each node represents a concept. Concepts are linked together by means of their semantic relationships. The set of concepts together with their links forms a semantic network. Various kinds of semantic relationships are maintained between the concepts [H. Yang and M. Zhang, 2005].

Figure 9 shows the graph representation for the domain of economics: the Economic concept has the parts Currency (with instances Euro and USD), Metal (with instances Gold and Silver) and Bank, linked by part-of and instance-of relations (the effect may be [win | lose]).

Fig. (9): Simple domain ontology for the Economic domain.
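The taxonomy in Figure 9 can be written down directly as a small graph; the sketch below is ours, with the concept and relation names taken from the figure, and shows one way to encode the part-of / instance-of links.

# Illustrative encoding of the Figure 9 economics ontology as labelled edges.
# Each triple is (child, relation, parent); relations follow the figure.
ONTOLOGY = [
    ("Currency", "part-of", "Economic"),
    ("Metal",    "part-of", "Economic"),
    ("Bank",     "part-of", "Economic"),
    ("Euro",     "instance-of", "Currency"),
    ("USD",      "instance-of", "Currency"),
    ("Gold",     "instance-of", "Metal"),
    ("Silver",   "instance-of", "Metal"),
]

def children(concept: str) -> list:
    """Return all concepts directly linked below the given concept."""
    return [c for c, _rel, parent in ONTOLOGY if parent == concept]

print(children("Economic"))   # ['Currency', 'Metal', 'Bank']
print(children("Currency"))   # ['Euro', 'USD']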

6. Conclusion

This research has illustrated the design and implementation of a model to translate to/from the Arabic language with different forms of input: text, speech, or unstructured documents; and different forms of output: text, speech, sign language, or structured documents. Linguistic background has been employed, taking into consideration morphology, syntax, semantics and ontology analysis.

Several of the important challenges in developing MT methods for ArSL have also been described, to show how studying ArSL can push the boundaries of current MT methodologies.

The analyzed text (interlingua script) is interpreted and translated into a scene visualization to produce a 3D model of the objects under discussion.

The research reported the steps of the analysis and understanding, translation and generation stages of the TRGM model, and a partial implementation of the translation model from English to ArSL will be presented. A bilingual English-Arabic dictionary will be employed to store the lexical and conceptual relations. The ArSL dictionary will be implemented to be used in the generation phase.

The proposed model will present the detailed design of the system, taking into consideration the different analysis stages: morphological and lexical analysis, syntactic analysis, semantic analysis, and document embedding and updating.

References


1. Antoine El-Dahdah (2002). A Dictionary of Arabic Grammar in Charts and Tables, Lebanon.
2. Baker, Mark. 2005. Lexical Categories: Verbs, Nouns and Adjectives. Cambridge University Press.
3. Borer, Hagit. 2005. In Name Only. Structuring Sense, Volume I. Oxford: Oxford University Press. Cassell, J., Sullivan, J., Prevost, S., and Churchill, E. (Eds.). 2000. Embodied Conversational Agents. Cambridge, MA: MIT Press.
4. Bowen Zhou, Daniel Dechelotte and Yuqing Gao. "Two-way Speech-to-Speech Translation on Handheld Devices", Int. Conf. of Spoken Language Processing (ICSLP), Korea, Oct. 2004.
5. Croft, William. 1993. "A noun is a noun is a noun - or is it? Some reflections on the universality of semantics." Proceedings of the Nineteenth Annual Meeting of the Berkeley Linguistics Society, ed. Joshua S. Guenter, Barbara A. Kaiser and Cheryl C. Zoll, 369-80. Berkeley: Berkeley Linguistics.
6. D. Fensel, J. Hendler, H. Lieberman, and W. Wahlster (eds.), 2003. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. The MIT Press.
7. Diab, M. (2004). "The Feasibility of Bootstrapping an Arabic WordNet leveraging Parallel Corpora and an English WordNet". Second International WordNet Conference, Brno, Czech Republic, January 20-23, 2004.
8. Diab, M., Hacioglu, K. and Jurafsky, D. (2004). "Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks". Proceedings of HLT-NAACL, 2004.
9. Elkateb, S. and Black, B. (2004). "English-Arabic Dictionary for Translators", Second International WordNet Conference, Brno, Czech Republic, January 20-23, 2004.
10. Eva Sáfár and Ian Marshall. 2002. Translation of English Text to a DRS-based, Sign Language Oriented Semantic Representation. School of Information Systems, University of East Anglia, Norwich, NR4 7TJ, United Kingdom.
11. Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, 1998.
12. Fu Hua Liu, Liang Gu, Yuqing Gao and Michael Picheny. Use of Statistical N-Gram Models in Natural Language Generation for Machine Translation. ICASSP 2003, IEEE, April 2003.
13. G. Dorai and Y. Yacoob, 2001. "Facilitating Semantic Web Search with Embedded Grammar Tags", Workshop on E-Business & the Intelligent Web, USA, August 2001.
14. G. Dorai and Y. Yacoob, 2002. "Embedded grammar tags: advancing natural language interaction on the Web", IEEE Intelligent Systems, Volume 17, Issue 1, Jan/Feb 2002.
15. H. Yang and M. Zhang, 2005. "Ontology-based Resource Descriptions for Distributed Information Sources", ICITA'05.
16. HLT/NAACL 2004, Sign Language Animation. In Proceedings of the Student Workshop of the Human Language Technologies Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting, Boston, MA, USA.
17. Holt, J. 1991. Demographic, Stanford Achievement Test - 8th Edition for Deaf and Hard of Hearing Students: Reading Comprehension Subgroup Results.
18. http://en.wikipedia.org/wiki/Noun
19. http://www.cs.umd.edu/projects/plus/SHOE/index.html, retrieved 4/20/2005.
20. http://www.semanticWeb.org/, retrieved 4/20/2005.
21. Huenerfauth, M. 2003. Survey and Critique of ASL Natural Language Generation and Machine Translation Systems. Technical Report MS-CIS-03-32, Computer and Information Science, University of Pennsylvania.
22. Huenerfauth, M. 2004a. A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation.
23. Huenerfauth, M. 2004b. Spatial and Planning Models of ASL Classifier Predicates for Machine Translation. 10th International Conference on Theoretical and Methodological Issues in Machine Translation: TMI 2004, Baltimore, MD.
24. Huenerfauth, M. 2005. American Sign Language Spatial Representations for an Accessible User-Interface. In 3rd International Conference on Universal Access in Human-Computer Interaction. Las Vegas, NV, USA.
25. J. Gonzalo, 2004. "Sense Proximity versus Sense Relations", The Second Global WordNet Conference.
26. J. R. Kennaway. 2001. Synthetic animation of deaf signing gestures. 4th International Workshop on Gesture and Sign Language Based Human-Computer Interaction, London. Lecture Notes in Artificial Intelligence vol. 2298 (eds. Ipke Wachsmuth and Timo Sowa).
27. J. Davies, D. Fensel, and F. van Harmelen (eds.), 2003. Towards the Semantic Web: Ontology-Driven Knowledge Management. John Wiley and Sons, Inc.
28. K. Thirunarayan, 2005. "On Embedding Machine-Processable Semantics into Documents", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 7, July 2005.
29. Kopp, S., Tepper, P., and Cassell, J. 2004. Towards Integrated Microplanning of Language and Iconic Gesture for Multimodal Output. Int'l Conference on Multimodal Interfaces, State College, PA, USA.
30. Krifka, Manfred. 1989. "Nominal Reference, Temporal Constitution and Quantification in Event Semantics". In R. Bartsch, J. van Benthem, P. von Emde Boas (eds.), Semantics and Contextual Expression, Dordrecht: Foris Publication.
31. L. Zhao, K. Kipper, W. Schuler, C. Vogler, N. Badler, and M. Palmer. 2000. A Machine Translation System from English to American Sign Language. Association for Machine Translation in the Americas.
32. Liang Gu and Yuqing Gao. "On Feature Selection in Maximum Entropy Approach to Statistical Concept-based Speech-to-Speech Translation", Int. Workshop on Spoken Language Translation, Kyoto, Japan, Oct. 2004.
33. Liddell, S. 2003. Grammar, Gesture, and Meaning in American Sign Language. UK: Cambridge U. Press.
34. M. A. Abdel-Fattah, 2005. Arabic Sign Language: A Perspective. Department of Languages and Translation, Birzeit University, Journal of Deaf Studies and Deaf Education, vol. 10, no. 2.
35. Mandala R., Takenobu T. and Hozumi T. (1998). "The use of WordNet in information retrieval", in Proceedings of ACL Workshop on the Usage of WordNet in Natural Language Processing Systems (Montréal, Canada, Aug. 1998), pp. 31-37.
36. Miller G. A., Beckwidth, R., Fellbaum, C., Gross, D. and Miller, K. J., 1999. "Introduction to WordNet: An On-line Lexical Database", International Journal of Lexicography, Vol. 3, No. 4 (Winter 1990), pp. 235-244.
37. Mitchell, R. 2004. How many deaf people are there in the United States. Gallaudet Research Institute, Grad School & Prof. Progs., Gallaudet U., June 28. http://gri.gallaudet.edu/Demographics/deaf-US.php
38. Morford, J., and MacFarlane, J. 2003. Frequency Characteristics of ASL. Sign Language Studies, 3:2.
39. Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., and Lee, R.G. 2000. The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. Cambridge, MA: The MIT Press.
40. Parsons, Terence. 1990. Events in the Semantics of English: a Study in Subatomic Semantics. Cambridge, Mass.: MIT Press.
41. Popov, B., Kirilov, A., Maynard, D., & Manov, D. (2004). Creation of reusable components and language resources for Named Entity Recognition in Russian. LREC-2004.
42. Ruhi Sarikaya, Yuqing Gao, Michael Picheny and Hakan Erdogan. "Semantic Confidence Measurement for Spoken Dialog Systems", IEEE Trans. Speech and Audio Processing, July 2005.
43. S. Chua and N. Kulathuramaiyer, 2004. "Semantic Feature Selection Using WordNet", Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), 0-7695-2100-2/04.
44. Saggion, H., Cunningham, H., Bontcheva, K., Maynard, D., Hamza, O., & Wilks, Y. (2004). Multimedia indexing through multi-source and multi-language information extraction: the MUMIS project. Data & Knowledge Engineering, 48(2), 247-264.
45. Samuel W. K. Chan, 2003. "Dynamic Context Generation for Natural Language Understanding: A Multifaceted Knowledge Approach", IEEE Trans. Syst., Man, Cybern., vol. 33, no. 1, Jan. 2003.
46. Somers, H. (2003). Machine Translation: latest developments. In R. Mitkov (Ed.), The Oxford Handbook of Computational Linguistics (pp. 512-528). Oxford, NY: Oxford University Press.
47. Vinokurova, Nadezhda. 2005. Lexical Categories and Argument Structure: a Study with Reference to Sakha. Ph.D. diss., University of Utrecht.


ACKNOWLEDGEMENT

We wish to thank Basil Ahmed Ba-aziz, Sakher F. Ghanem, and Khalid Al-Harbi, Faculty of Computing and Information Technology, King Abdulaziz University in Saudi Arabia, for their effort in implementation and testing during the preparation of this research. We also gratefully acknowledge the support of the Faculty of Computing and Information Technology, Jeddah, Saudi Arabia.

Hassanin M. Al-Barhamtoshy received the B.S. degree in electronic and communication engineering from Cairo University, Egypt, in 1978, and the M.S. degree in systems and computers engineering from Al-Azhar University, Cairo, in 1985. In 1992, he received the Ph.D. degree in systems and computers engineering from Al-Azhar University, Cairo. During 1992-1997 he was an Assistant Professor in the Department of Systems and Computer Engineering at Al-Azhar University. During 1996-1997 he was an Assistant Professor in Computer Science at KAU University, Jeddah, Saudi Arabia. During 1998-2002 he was an Associate Professor in Computer Science at KAU University, Jeddah, Saudi Arabia. He is currently a Professor in the Department of Computer Science and Information Technology at the Faculty of Computing and Information Technology, KAU University. His research interests include language processing, software engineering, intelligent systems, speech processing, E-learning, and RFID.


Arabic Abstract (translated)

Infrastructure for Machine Translation
Hassanin M. Al-Barhamtoshy, Mostafa S. Saleh and Reddah A. Al-Kheribi
Faculty of Computing and Information Technology, King Abdulaziz University

This research presents an infrastructure for machine translation from/to the Arabic language with different forms of input: texts, speech, or unstructured documents. The outputs likewise take several forms: texts, speech, sign language, or structured documents that a computer can easily understand.

The research describes a mechanism for understanding the input text based on morphological, syntactic and semantic analysis, in order to generate a descriptive sign notation used to build an Arabic sign language. This sign language generation employs a fully animated 3D avatar to communicate with people with special needs.

The proposed system relies on a set of data files that include the dictionary and the lexicon of the languages used. All of the concepts presented in this research will be constructed and built; consequently, a comprehensive framework will be established covering the source-language analysis stage, including the dictionary and lexicons, the animated 3D character, the speech processor (recognition and speech synthesis), the generation of a linguistic description, and the construction of lexicons and dictionaries for a standard sign language.

The framework therefore begins by understanding the text of the documents in use, relying on natural language text understanding through morphological, syntactic and semantic analysis. The system then performs semantic matching between this text and the domain of semantic concepts (ontology) on the Internet.

This work establishes an important step towards creating a strategic plan for the Semantic Web.