Comparative Study of Topic Segmentation Algorithms Based on Lexical Cohesion: Experimental Results on Arabic Language
Harrag, Fouzi; Hamdi-Cherif, Aboubekeur; Al-Salman, AbdulMalik

Title

Comparative Study of Topic Segmentation Algorithms Based on Lexical Cohesion: Experimental Results on Arabic Language

Authors

Harrag, Fouzi; Hamdi-Cherif, Aboubekeur; Al-Salman, AbdulMalik

Contact Info

College of Computer and Information Sciences, King Saud University, PO Box 51178, Riyadh 11543, Saudi Arabia


Department

Computer Science

Major

Computer Science

Citation

The Arabian Journal for Science and Engineering, Volume 35, Number 2C, pp. 183-202


Year of Publication

2010

Publisher

The Arabian Journal for Science and Engineering

Sponsor


Type of Publication

Journal paper

ISSN

1319-8025


URI/DOI

http://ajse.kfupm.edu.sa/articles/352C_P.12.pdf

Full Text (Yes/No)

Yes

Key words

natural language processing, Arabic language processing, information retrieval, topic segmentation, text tiling algorithm, C99 algorithm


Abstract

Topic segmentation is essential for many Natural Language Processing (NLP) applications, such as text summarization or information extraction. The objective of this research is to evaluate the effectiveness of topic segmentation algorithms in identifying the thematic breaks in Arabic texts. To this end, a group of 7 readers was asked to identify the changes of theme that they discerned in 5 Arabic texts from different domains. The resulting judgments are used to evaluate the relative performance of two of the main segmentation algorithms proposed in the literature, C99 and Text Tiling, using the classical Recall/Precision evaluation metrics and the recently introduced Reader Judgment method. The experimental results show that, with only a few improvements, existing algorithms for segmenting English texts are also efficient for segmenting Arabic texts.
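
The Recall/Precision evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exact-match scoring, the vote threshold used to merge reader judgments into a single reference, the function names, and the sample boundary positions are all assumptions made for illustration. Boundaries are represented as sentence indices after which a theme change is marked.

```python
def majority_boundaries(judgments, min_votes):
    """Keep positions marked as a boundary by at least min_votes readers.

    This aggregation rule is an assumption, not the paper's
    Reader Judgment method.
    """
    votes = {}
    for marks in judgments:
        for pos in set(marks):  # count each reader at most once per position
            votes[pos] = votes.get(pos, 0) + 1
    return sorted(pos for pos, n in votes.items() if n >= min_votes)


def boundary_precision_recall(reference, hypothesis):
    """Exact-match precision and recall over sets of boundary positions."""
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical judgments from 7 readers on one text (invented data).
readers = [
    [3, 7, 12], [3, 8, 12], [3, 7], [4, 7, 12],
    [3, 7, 12], [7, 12], [3, 12],
]
reference = majority_boundaries(readers, min_votes=4)

# Hypothetical output of a segmentation algorithm on the same text.
hypothesis = [3, 9, 12]
precision, recall = boundary_precision_recall(reference, hypothesis)
print(reference, round(precision, 2), round(recall, 2))
```

Exact matching is strict: a boundary placed one sentence away from the reference counts as both a miss and a false alarm, so a tolerance window around each reference boundary is a common relaxation.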