Arabic Morphology Template Grammar-based


Oct 25, 2013 (3 years and 5 months ago)



Hassanin M. Al-Barhamtoshy, Khalid O. Thabit and Basil Ba-Aziz

KAU, Faculty of Computing and Information Technology, Jeddah

Abstract

This research presents a multi-language natural language processing model for use in machine translation and language processing systems. We describe problems of analysis, with particular attention to ambiguity, both lexical and syntactic. Different types of linguistic and non-linguistic knowledge are necessary to resolve these ambiguities, and in this research we examine in more detail how to represent this knowledge.

In addition, the research describes a system for generating natural-language sentences from syntactic and lexical structures, which we treat as an internal (or interlingual) representation. The model will be developed as part of an Arabic-English Machine Translation (MT) system; however, it is designed to be usable for many other MT language pairs and natural language applications. Consequently, the contributions of this work include building a dictionary to be used in automatic translation.


1. Introduction

A good natural language processing (NLP) component for translation is built from several sub-models. The following subsections describe these sub-models.

1.1. Dictionary

Dictionaries are the largest components of machine translation (MT, or automatic translation) systems in terms of the amount of information they hold. If they are more than simple word lists, they may well be the most expensive components to construct [1-3]. Consequently, a user must be able to make additions to the system dictionaries to make a system useful.

To give an idea of the amount of dictionary information that may be needed, a lexicon with 20 000 entries is often considered the minimum for commercial purposes. Existing dictionaries contain far more words: the Oxford English Dictionary contains about 250 000 entries without being exhaustive even of general usage. As a matter of fact, no dictionary can ever be complete [1, 2].




1.2. Word Types

It is useful to distinguish between the inherent properties of a word and the properties it acquires from its place in a sentence, that is, from its grammatical environment. Each word has a type with respect to its morphological analysis. These types include grammatical properties, such as the indication of gender in some languages (for example in the Arabic or the French part of a bilingual dictionary entry) and the indication of number on nouns. Typically, the citation form of nouns is the singular form [1-5].


1.3. Dictionaries and Morphology

Morphology is concerned with the internal structure of words and how words can be formed. In Arabic it is usual to distinguish three different word formation processes [1,4,7]:

1. Inflectional processes, by means of which a word is derived from another word form, acquiring certain grammatical features but maintaining the same part of speech or category (e.g. walk, walks);

2. Derivational processes, in which a word of a different category is derived from another word or word stem by the application of some process (e.g. grammar → grammatical, grammatical → grammaticality);

3. Compounding, in which independent words come together in some way to form a new sentence unit (in Arabic, شكرناهم).

In Arabic, inflectional and derivational processes involve prefixes (as in فنشكر) and suffixes (as in شكرناهم), and what is called pronoun inflection, or subwords. In other languages, a range of devices such as changes in the vowel patterns of words, doubling or reduplication of syllables, etc., are also found. Clearly, these prefixes and suffixes (collectively known as affixes) cannot "stand alone" as words. Compounding is quite different in that the parts can each occur as individual words.

1.4. Ambiguity

Much natural language processing would be simpler if every word and sentence had only one meaning. However, as we all know, this is not the case. When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous [4,5].



1.5. Semantics

Semantics is concerned with the meaning of words and how they combine to form sentence meanings [5]. It is useful to distinguish lexical semantics from structural semantics: the former has to do with the meanings of words, the latter with the meanings of phrases, including sentences [6].

There are many ways of thinking about and representing word meanings, but one that has proved useful in the field of machine translation involves associating words with semantic features which correspond to their sense components. For example, the words man, woman, boy, and girl might be represented as [1, 5, 6]:

man = (+HUMAN, +MASCULINE, +ADULT)

woman = (+HUMAN, -MASCULINE, +ADULT)

boy = (+HUMAN, +MASCULINE, -ADULT)

girl = (+HUMAN, -MASCULINE, -ADULT)
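The feature notation above maps naturally onto a small lookup table. The following Python sketch is only an illustration: it assumes that "+" and "-" can be stored as booleans, and the names SEMANTIC_FEATURES, format_features and differing_features are hypothetical, not part of the described system.

```python
# Hypothetical feature inventory: True stands for "+", False for "-".
SEMANTIC_FEATURES = {
    "man":   {"HUMAN": True, "MASCULINE": True,  "ADULT": True},
    "woman": {"HUMAN": True, "MASCULINE": False, "ADULT": True},
    "boy":   {"HUMAN": True, "MASCULINE": True,  "ADULT": False},
    "girl":  {"HUMAN": True, "MASCULINE": False, "ADULT": False},
}


def format_features(word):
    """Render a word's features in the +/- notation used in the text."""
    features = SEMANTIC_FEATURES[word]
    return ", ".join(("+" if value else "-") + name
                     for name, value in features.items())


def differing_features(word_a, word_b):
    """Return the sorted names of the features on which two words disagree."""
    a, b = SEMANTIC_FEATURES[word_a], SEMANTIC_FEATURES[word_b]
    return sorted(name for name in a if a[name] != b[name])


print(format_features("boy"))
print(differing_features("man", "girl"))
```

Comparing feature sets in this way is one simple means of telling related senses apart, for instance when choosing between candidate translations.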

When designing an Arabic translation dictionary, the entries must meet the standard of a professional linguist's translation. Figures 1 and 2 give example case studies of English to Arabic and French to English translation [6].


Fig. (1): English to Arabic simple translator



Fig. (2): French to English simple translator


2. Building Arabic Dictionary

In information retrieval systems such as cross-language information retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable Dictionaries (MRD) and Machine Translation (MT) are important resources for query translation in CLIR [8]. Aljlayl and Frieder investigate MT and MRD for Arabic-English CLIR. The translation ambiguity associated with these resources is the key problem. They present three methods of query translation using a bilingual dictionary for Arabic-English CLIR [8].

Out-of-vocabulary (OOV) words are problematic for cross-language information retrieval. One way to deal with OOV words when the two languages have different alphabets is to transliterate the unknown words, that is, to render them in the orthography of the second language. The study in [9] presents a simple statistical technique to train an English to Arabic transliteration model from pairs of names.

Arabic requires good stemming for effective information retrieval because it is highly inflected and derivational, yet no standard approach to stemming has emerged [10-13]. Several light stemmers based on heuristics, and a statistical stemmer based on co-occurrence, have been developed for Arabic retrieval. The retrieval effectiveness of these stemmers is compared with that of a morphological analyzer on the TREC-2001 data [10].

The inflectional structure of words affects the retrieval accuracy of information retrieval systems for Latin-based languages. Different stemming algorithms for Arabic information retrieval systems are presented in [11-18]. The effectiveness of surface-based retrieval is also investigated. This approach degrades retrieval precision since Arabic is a highly inflected language. Therefore, a root-based retrieval model is proposed [11], and a statistically significant improvement over the surface-based approach is noted.

Arabic inflectional morphology requires infixation, prefixation and suffixation, giving rise to a large space of morphological variation [12]. In this project, an approach is described for reducing the complexity of Arabic morphology generation using grammar-based rules. By decoupling the problem of stem changes from that of prefixes and suffixes, a significant reduction is gained in the number of rules required, as much as a factor of three for certain verb types [18].

Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English to compare them with the topic models. A native language hypothesis has been proposed, stating that comparisons would be more effective in the original language of the story [21].



Due to the high number of inflectional variations of Arabic words, empirical results suggest that stemming is essential for Arabic information retrieval. Light stemming in particular has led to improvements in information retrieval. However, current light stemming algorithms do not extract the correct stem of irregular (so-called broken) plurals, which constitute ~10% of Arabic texts and ~41% of plurals [22].

There have been advances in Cross-Language Information Retrieval (CLIR) in recent years. One of the major remaining reasons that CLIR does not perform as well as monolingual retrieval is the presence of out-of-vocabulary (OOV) terms. Previous work either has relied on manual intervention or has only been partially successful in solving this problem. The method in [23] extends earlier work in this area by augmenting it with statistical analysis and corpus-based translation.

Another paper describes a system that recognizes place names in natural language text in order to produce geographic maps and animations showing the geographical coverage of texts about a certain subject as it changes over time. As the system is built to analyze texts in many different languages, it restricts the usage of linguistic analysis tools to the minimum. Instead, it relies on a gazetteer (geo dictionary) containing place names in different languages and uses heuristics for disambiguation purposes [24].

A methodology for implementing natural language morphology in the functional language Haskell is introduced in [25]. The main idea, as stated in [25], is simple: instead of working with untyped regular expressions, which are the state of the art of morphology in computational linguistics, finite functions and algebraic data types are used. The definitions of these data types and functions are the language-dependent part of the morphology.

For cross-language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends upon lexical coverage in the dictionary. This is especially true for languages possessing few inter-language cognates, such as Japanese and English. The article [26] describes a method for automatically creating and validating candidate Japanese transliterated terms of English words. A phonetic English dictionary and a set of probabilistic mapping rules are used [26].

As participants in the TIDES Surprise language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi [27].

A novel two-step fuzzy translation technique is presented for cross-lingual spelling variants. In the first stage, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second stage, the intermediate forms obtained in the first stage are translated into a target language using fuzzy matching [28].

While many investigations have explored the use of query expansion techniques to combat errors induced by translation, the effectiveness of these techniques across resources of varying quality had not been examined. The study in [29] presents results using parallel corpora and bilingual wordlists that have been deliberately degraded prior to query [29].

A cross-lingual question-answering (CLQA) system for Hindi and English is developed in [30]. It accepts questions in English, finds candidate answers in Hindi newspapers, and translates the answer candidates into English along with the context surrounding each answer. The system was developed as part of the surprise language exercise (SLE) within the TIDES program [30].


3. Proposed Model System Structure

The proposed model includes the following steps:

Step 1: The Arabic words are looked up in an Arabic electronic dictionary. The model then employs the morphological component, which contains specific rules that deal with the regularities of inflection. The appropriate category (for example noun, verb or special character) is assigned.

Step 2: Some rules of an Arabic grammar are used to try to parse the entire word. An advanced parser might work out that it is in fact a measure modifier. However, it is quite possible that the parser parses the entire word to find out its components (extracting its implicit pronouns from affixes). This is because the difference between the Arabic and some possible English translations is not great.

Step 3: The engine now applies source to target language (Arabic to English) transformation rules. The first step here is to find translations of the Arabic words in an Arabic to English dictionary.

We can now summarize some of the distinctive design features of this engine:

• Input sentences are automatically parsed only as far as is necessary for the successful operation of the various morphological and lexical rules (structure-based) and phrasal transformation rules. The transformer engine is often content to find out just a few incomplete pieces of information about the structure of some of the phrases in a sentence, and where the main verb might be.

• Morphological rules are applied first, over all the possible derivation rules for all the words in the sentence. In practice, the transformer model takes some of the analyzed features and then translates them into the target features. Thus in the Arabic to English transformer system, we assume that the grammar covers only some features of Arabic.

• Syntactic rules take these analyzed features in addition to the extracted features, and from them find the syntactic form of the sentence (surface representation).

• The lexical rules are applied to find out whether such a representation has a meaning or not.

• The use of limited grammars and incomplete parsing means that transformer systems do not generally construct complex representations of input sentences; in many cases, not even the simplest surface constituent structure tree.

• Most of the engine's translational competence lies in the rules which transform bits of input sentence into bits of output sentence, including the bilingual dictionary rules. In a sense a transformer system has some knowledge of the comparative grammar of the two languages, of what makes the one structurally different from the other.

The proposed model is based on a bilingual dictionary. Therefore, we create a new dictionary based on the philosophy of the WordNet dictionary [31]. Consequently, the design and model implementation are illustrated and executed on a bilingual Arabic/English dictionary. As a matter of fact, a relational database may be employed to store the syntactic and lexical indicators and conceptual relations.

3.1. Model Activity Diagram

As described in the literature, an activity diagram shows the flow of control using rounded rectangles. Figure 3 shows the flow of control for the Find Root activity for a verb. All transitions between activities are represented by arrows. Horizontal bars are used to represent activities performed in parallel.

The model is based on Arabic template dictionaries (Arabic verb types, roots and template patterns). Consequently, each rule is illustrated according to the relational database dictionary.

Fig. 3: Find Arabic Root Activity Diagram

3.2. Generate Non Diacritic Arabic Word

This function generates a non-diacritic Arabic word from two inputs: the Arabic root and the template for the word. The following example explains this further.

Parameter | Value
Arabic Root | شرب
Template | ي 1 2 3 ون

The letter mask, described later in Section 4, uses the symbols in the following table:

Letter | Represents | Symbol
1 | First letter of the root |
2 | Second letter of the root |
3 | Third letter of the root |
Any Arabic letter | The same Arabic letter | أبتثجحخدذرزسشصضطظعغفقكلمنهوي
Extended Arabic letters | | إأآؤءئ

The output will be:

Output | Value
Generate Non Diacritic Arabic word | يشربون
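The substitution described in this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes a triliteral root, assumes that the digits 1, 2 and 3 are the only mask symbols, and uses the hypothetical function name generate_word. The example values are the root شرب and the expected output يشربون.

```python
def generate_word(root, template):
    """Fill a template with the letters of a triliteral root.

    The digits 1, 2 and 3 stand for the first, second and third root
    letters; every other character of the template is copied unchanged.
    """
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    return "".join(slots.get(symbol, symbol) for symbol in template)


# The template is built by concatenation so that the logical character
# order is unambiguous in editors that reorder right-to-left text.
TEMPLATE = "ي" + "123" + "ون"
print(generate_word("شرب", TEMPLATE))
```

Because the template is plain data, adding a new derivation pattern means adding a row to the template dictionary rather than writing new code.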

3.3. Generate Diacritic Arabic Word

This function generates a diacritic Arabic word from two inputs: the Arabic root and the template for the word. The following example explains this further.

Parameter | Value
Arabic Root | شرب
Template | ي Q 1 Q 2 Q 3 X

The vowel and letter mask, described later in Section 4, uses the symbols in the following table:

Letter | Represents | Symbol
1 | First letter of the root |
2 | Second letter of the root |
3 | Third letter of the root |
Q | Fatha | َ
X | Sukun | ْ
Any Arabic letter | The same Arabic letter | أبتثجحخدذرزسشصضطظعغفقكلمنهوي
Extended Arabic letters | | إأآؤءئ

The output will be:

Output | Value
Generate Diacritic Arabic word | يَشَرَبْ
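Adding vowel symbols to the mask only requires a larger substitution table. The Python sketch below is an assumption-laden illustration: it maps Q to the Unicode fatha (U+064E) and X to the Unicode sukun (U+0652), places each diacritic wherever its symbol occurs in the template, and uses the hypothetical names generate_diacritized and strip_diacritics.

```python
FATHA = "\u064E"  # Arabic fatha, written Q in the mask
SUKUN = "\u0652"  # Arabic sukun, written X in the mask


def generate_diacritized(root, template):
    """Fill a template that mixes root slots (1, 2, 3) and vowel marks (Q, X)."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    symbols = {"1": root[0], "2": root[1], "3": root[2],
               "Q": FATHA, "X": SUKUN}
    return "".join(symbols.get(symbol, symbol) for symbol in template)


def strip_diacritics(word):
    """Drop the two diacritics handled here, leaving the bare letters."""
    return "".join(ch for ch in word if ch not in (FATHA, SUKUN))


word = generate_diacritized("شرب", "ي" + "Q1Q2Q3X")
print(word)
print(strip_diacritics(word))
```

A fuller mask would add symbols for the remaining short vowels, shadda and tanween in the same table.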

3.3. Extract the Root of Arabic word

Extracting the root of an Arabic word uses a somewhat complex algorithm involving multiple functions and multiple masks. First, the function finds the templates that match the input word. It then removes all characters that are not required and keeps the original verb characters. The output is all matched roots.
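The two stages, matching templates and then keeping only the root characters, can be sketched as follows. This Python example is a simplified assumption of how such matching might work: templates are strings in which 1, 2 and 3 mark root positions, the template list is a toy stand-in for the template dictionary, and match_template and extract_roots are hypothetical names. A real system would also validate each candidate against the root dictionary.

```python
def match_template(word, template):
    """Return the root if the word fits the template, otherwise None."""
    if len(word) != len(template):
        return None
    root = {}
    for letter, symbol in zip(word, template):
        if symbol in ("1", "2", "3"):
            # A repeated slot must always capture the same letter.
            if root.get(symbol, letter) != letter:
                return None
            root[symbol] = letter
        elif symbol != letter:
            # A literal affix character must match exactly.
            return None
    if set(root) != {"1", "2", "3"}:
        return None
    return root["1"] + root["2"] + root["3"]


def extract_roots(word, templates):
    """Collect the distinct roots produced by every matching template."""
    roots = []
    for template in templates:
        candidate = match_template(word, template)
        if candidate is not None and candidate not in roots:
            roots.append(candidate)
    return roots


TEMPLATES = ["ي" + "123" + "ون", "123", "ت" + "123" + "ين"]
print(extract_roots("يشربون", TEMPLATES))
```

Returning a list rather than a single root reflects the statement that the output is all matched roots.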

3.4. Generate all possible derivative pattern

Generating all possible derivative patterns uses several functions. First we find the type of the input root verb and the kinds of templates that apply to this verb, for example:

verb | TypeID | Present | RealRoot
وفى | 29 | 2 | وَفَى
شرب | 1 | 1 | شَرَبَ
شرب | 1 | 5 | شَرِبَ

As we see in the table, for the verb وفى the verb type is 29 and the present type is 2, which matches only one table.


4. Arabic Template Rules of the Proposed Model

The three kinds of affixes (prefixes, infixes and suffixes) can be used to extract the roots from Arabic words using derivation templates. Conversely, derived Arabic words can be generated from Arabic roots by applying the three affix templates.

The input Arabic word is combined with a second input (the affix template, called the mask) to find the Arabic root, as shown in Figure 4.

Fig. 4: Arabic Template Mask

4.1. Unsetting Rules

Some rules use the AND operator, while others use the OR or XOR operators. To unset characters, use an unsetting mask with the same character length as the input word. The rules for extracting the root can be summarized as follows:

1. To unset a character in the input Arabic word, use 0 for the corresponding character in the mask.

2. To leave a character in the input Arabic word unchanged, use 1 for the corresponding character in the mask.

3. Use the AND/OR/XOR operators to extract the Arabic root and additional indicators.
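Rules 1 and 2 behave like a character-level AND. The Python sketch below is one possible reading, not the authors' code: it assumes the mask is a string of 0 and 1 characters, treats an unset character as removed from the result, and uses the hypothetical name unset_chars. The example word يشكرون is the one used for the mask example in this section.

```python
def unset_chars(word, mask):
    """Apply an unsetting mask: keep characters under 1, drop those under 0."""
    if len(word) != len(mask):
        raise ValueError("mask must have the same length as the word")
    if set(mask) - {"0", "1"}:
        raise ValueError("mask may contain only 0 and 1")
    return "".join(ch for ch, bit in zip(word, mask) if bit == "1")


# The prefix (one letter) and the suffix (two letters) are unset,
# leaving the three root letters.
print(unset_chars("يشكرون", "011100"))
```

Trying several masks against the same word and keeping the results that appear in the root dictionary would give the set of candidate roots.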

To understand how these rules work, refer to Figure 5 as an example.

Fig. (5): Example of Arabic Template Mask for (يشكرون)

4.2. Setting Rules

This rule is employed to find other derivations of a word, either after the first rule (unsetting rules) or on its own. A setting mask is used in the same manner, except that the OR operator is used instead of AND. The setting rules algorithm can be summarized in the following steps:

1. To set a character in the input word, use 1 for the corresponding character in the mask.

2. To leave a character in the input word unchanged, use 0 for the corresponding character in the mask.
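The setting rules can be read as a character-level OR. The Python sketch below rests on several assumptions that are not stated in the text: the input word is padded with a placeholder character where affixes may be attached, a separate fill string supplies the characters to set, and placeholders left unset are dropped from the result. The names PAD and set_chars are hypothetical.

```python
PAD = "_"  # hypothetical placeholder for an empty affix position


def set_chars(padded_word, mask, fill):
    """Apply a setting mask: take fill characters under 1, keep the word under 0."""
    if not (len(padded_word) == len(mask) == len(fill)):
        raise ValueError("word, mask and fill must have the same length")
    merged = (f if bit == "1" else w
              for w, bit, f in zip(padded_word, mask, fill))
    # Positions that are still empty after setting are removed.
    return "".join(ch for ch in merged if ch != PAD)


padded = PAD + "شكر" + PAD * 2
# Each (mask, fill) pair is one alternative derivation of the same root.
alternatives = [
    ("100011", "ي" + PAD * 3 + "ون"),
    ("100000", "ت" + PAD * 5),
]
for mask, fill in alternatives:
    print(set_chars(padded, mask, fill))
```

Iterating over a list of such pairs corresponds to the stream of alternative masks used to enumerate the derivations of a word.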

To illustrate these rules, refer to the characteristic of the OR operator as shown in Figure 6, and assume that the input Arabic word is (شكر). The mask should have a stream of alternatives to find all the possible derivations of the word (شكر).

Fig. (6): Example of Arabic Template Mask for (شكر) to find out its Arabic derivations


5. Results and Discussion

Arabic verbs are triliteral, quadriliteral, or pentaliteral. Every Arabic verb has its own derivatives, and these derivatives depend on its type. There are about 30 types of triliteral verbs, containing 5321 verbs. These can produce 20000 templates. If affix rules (4 prefixes and 30 suffixes) are applied to these templates, the total number of Arabic words derived from verbs is 28,140,005 derivations.


5.1 Testing

Artificial Arabic words were used in testing the proposed model. These words include all their various possible derived verbs, nouns, adjectives, adverbs, etc., and various combinations of affixes (prefixes, suffixes, infixes, and connected pronouns). The testing sample included 50 roots and their derivations. The results of this experiment are presented in Table (1). Of the sample, 60% (30 roots) was derived from sound verbs and 40% (20 roots) belonged to weak verbs.

Table (1): Results of Testing the Proposed Model

 | Total number of hits | Correct ratio | Error ratio
Sound Verbs | 30 | 100 % | 0 %
Weak Verbs | 20 | 98 % | 2 %
Total | 50 | 99 % | 1 %

The testing is used to find out:

(1) Roots of entire Arabic words (Figures 7-a, b and c).

(2) Morphological analysis of the entire Arabic words with associated analyzed properties (Figures 8-a and b).

(3) Possible diacritization of the entire Arabic words (Figure 9).


Fig. (7-a): Example to find Root of the Arabic word (فسيكفيكهما)

Fig. (7-b): Example to find Root of the Arabic word (المهزوم)

Fig. (7-c): Example to find Root of the Arabic word (ق)

Fig. (8-a): Example to find Properties of the Arabic word (فسيكفيكهم)

Fig. (8-b): Example to find Properties of the Arabic word (ضارب)

Fig. (9): Example to find Properties of the Arabic word (ضارب)





5.2 Complexity

Turning to how morphological analysis is conducted by the proposed model, we find that the running time cost is determined by the three components of the following algorithm:

Step 1: Checking the existence of the entire Arabic word and the order of the root using the proposed Arabic dictionary.

Step 2: Validating prefixes and suffixes of the entire Arabic word using the proposed Arabic template grammar.

Step 3: Validating infixes of the entire Arabic word, if needed.

For the first step, the comparison is carried out character by character, so we assume that the number of comparisons is:

$T_1 = n$

where $n$ is the length of the entire Arabic word ($n = 3$ for triliteral, 4 for quadriliteral, or 5 for pentaliteral).

At the second step, if the entire Arabic word exists in a proper sequence, its prefixes and suffixes are validated by checking them against a list of stored prefixes and suffixes. The number of comparisons is determined as follows:

$T_2 = \log N_{ps}$

where $N_{ps}$ is the number of prefixes and suffixes.

The validation of word infixes depends on two factors [32]: the size of the difference between the positions of the root letters in the entire word, and the list of infix letters to be checked. Accordingly, the number of comparisons is calculated as follows:

$T_3 = D + M$

where $D$ is the number of comparisons for checking the difference and $M$ is the number of character comparisons to match an infix against the set of infixes.

Consequently, the overall running time for our proposed model is computed as the sum of the three factors listed above:

$T = T_1 + T_2 + T_3 = n + \log N_{ps} + (D + M)$



References

[1] Doug Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler, Machine Translation: An Introductory Guide, 1995.

[2] W.J. Hutchins and H.L. Somers, An Introduction to Machine Translation, Academic Press, London, 1992.

[3] http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/

[4] Hassanin M. Al-Barhamtoshy, Understanding of Arabic Text, Ph.D. dissertation, Al-Azhar University, 1992.

[5] Ronnie Cann, Formal Semantics, Cambridge University Press, Cambridge, 1993.

[6] http://www.worldlingo.com/products_services/worldlingo_translator.html

[7] Ashraf I. Madkour and Hassanin M. Al-Barhamtoshy, Arabic Morphological Analyzer, Al-Azhar Engineering International Conference, AEIC 1993, Cairo, Egypt.

[8] Mohammed Aljlayl, Ophir Frieder, Corpus Linguistics: Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation, Proceedings of the tenth international conference on Information and knowledge management, October 2001.

[9] Nasreen AbdulJaleel, Leah S. Larkey, Information retrieval session 3: cross language retrieval: Statistical transliteration for English-Arabic cross language information retrieval, Proceedings of the twelfth international conference on Information and knowledge management, November 2003.

[10] Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell, Arabic Information Retrieval: Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002.

[11] Mohammed Aljlayl, Ophir Frieder, Information retrieval 1: On Arabic search: improving the retrieval effectiveness via a light stemming approach, Proceedings of the eleventh international conference on Information and knowledge management, November 2002.

[12] M. A. Madkour, A. Al-Samahy and Hassanin M. Al-Barhamtoshy, "An Arabic Morphological Analyzer", Al-Azhar Engineering International Conference, AEIC 1991, Cairo, December 1991.

[13] N. H. Hegazi and A. A. Elsharkawi, "An Approach to a Computerized Lexical Analyzer for Natural Arabic Text", Proceedings of the Arabic Language Conference, Kuwait, 1985.

[14] M. Geith, T. El-Sadany, "Arabic Morphological Analyzer on a Personal Computer", Proceedings of the First KSU Symposium on Computer Arabization, 1987.

[15] S. S. Al-Fadaghi and F. S. Al-Anzi, "A New Algorithm to Generate Root-Pattern Forms", Proceedings of the 11th National Computer Conference, KFUPM, p. 391, 1989.

[16] Y. Hilal, "Morphological Analysis of Arabic Morphology", Computer Processing of the Arabic Language, Workshop Papers, vol. I, Kuwait, April 1985.

[17] Botrous Thalouth and Abdullah Al-Dannan, "A Comprehensive Arabic Morphological Analyzer/Generator", IBM Kuwait Scientific Center, Feb. 1987.

[18] Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi, "Arabic Morphological Analysis Techniques: A Comprehensive Survey", CERI internal report, KACST, 2003.



[19] Project: Building the Morphological Base for Arabic Vocabulary Using the Linguistic Corpus (in Arabic), King Abdulaziz City for Science and Technology, Computer and Electronics Research Institute, 3/4/1424 AH.

[20] Violetta Cavalli-Sforza, Abdelhadi Soudi, Teruko Mitamura, Arabic morphology generation using a concatenative strategy, Proceedings of the first conference on North American chapter of the Association for Computational Linguistics, April 2000.

[21] Leah S. Larkey, Fangfang Feng, Margaret Connell, Victor Lavrenko, Machine learning for IR: Language-specific models in multilingual topic tracking, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004.

[22] Abduelbaset Goweder, Massimo Poesio, Anne De Roeck, Posters: Broken plural detection for Arabic information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004.

[23] Ying Zhang, Phil Vines, Cross-language information retrieval: Using the web for automated translation extraction in cross-language information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004.

[24] Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Tom De Groeve, Information access and retrieval (IAR): Geographical information recognition and visualization in texts written in various languages, Proceedings of the 2004 ACM symposium on Applied computing, March 2004.

[25] Markus Forsberg, Aarne Ranta, Functional morphology, ACM SIGPLAN Notices, Proceedings of the ninth ACM SIGPLAN international conference on Functional programming, Volume 39 Issue 9, September 2004.

[26] Yan Qu, Gregory Grefenstette, David A. Evans, Cross-lingual information retrieval: Automatic transliteration for Japanese-to-English text retrieval, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, July 2003.

[27] Leah S. Larkey, Margaret E. Connell, Nasreen Abduljaleel, Hindi CLIR in thirty days, ACM Transactions on Asian Language Information Processing (TALIP), Volume 2 Issue 2, June 2003.

[28] Ari Pirkola, Jarmo Toivonen, Heikki Keskustalo, Kari Visala, Kalervo Järvelin, Cross-lingual information retrieval: Fuzzy translation of cross-lingual spelling variants, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, July 2003.

[29] Paul McNamee, James Mayfield, Cross-language Information Retrieval: Comparing cross-language query expansion techniques by degrading translation resources, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002.

[30] Satoshi Sekine, Ralph Grishman, Hindi-English cross-lingual question-answering system, ACM Transactions on Asian Language Information Processing (TALIP), Volume 2 Issue 3, September 2003.

[31] William J. Black and Sabri El-Kateb, A Prototype English-Arabic Dictionary based on WordNet, UMIST, Department of Computation, Manchester, M60 1QD, UK, Piek Vossen (Eds): GWC 2004, Proceedings, pp. 67-74.

[32] Suleiman H. Mustafa, A morphology-driven string matching approach to Arabic text searching, The Journal of Systems and Software 67 (2003) 77-87.



Hassanin M. Al-Barhamtoshy is a professor of computer science in the Department of Information Technology at King Abdulaziz University (Jeddah, Saudi Arabia). He earned his Ph.D. in computers and systems engineering from the University of Al-Azhar (Egypt) in 1992. He was granted several academic awards and scholarships. After graduation, he worked at Al-Azhar University for four years, during which he chaired many external projects. In 1996 he went on leave from Al-Azhar for six years, during which he worked in the Department of Computer Science at King Abdulaziz University, Faculty of Science. He is at present a full professor at KAU, Faculty of Computing and Information Technology. He has published several papers in a number of research areas in computer science and computer engineering, including natural language processing (especially Arabization of computers), database and information retrieval systems, software engineering and artificial intelligence.