ArtsSemNet: From Bilingual Dictionary to Bilingual Semantic Network



Ivanka Atanassova (1), Svetlin Nakov (2), Preslav Nakov (3)

(1) University of Veliko Turnovo “St. Cyril and St. Methodius”, V. Turnovo, Bulgaria
(2) Sofia University “St. Kliment Ohridski”, FMI, Sofia, Bulgaria
(3) University of California at Berkeley, EECS, Berkeley CA 94720, USA

(1) ivanka49@mail.bg, (2) http://www.nakov.com, (3) nakov@eecs.berkeley.edu



Abstract:

The paper presents two bilingual lexicographic resources for the terminology of fine arts, the ArtsDict electronic dictionary and the ArtsSemNet semantic network, and describes the process of transforming the former into the latter. ArtsDict combines a broad range of information sources and is currently the most complete dictionary of fine arts terminology for both Bulgarian and Russian, not only among electronic dictionaries but in general. It contains 2,900 Bulgarian and 2,644 Russian terms, each annotated with complete dictionary definitions. These are further augmented with various terminological relations (polysemy, synonymy, homonymy, antonymy and hyponymy) and organised into a bilingual semantic network similar to WordNet. In addition, a specialised hypertext browser has been implemented to enable intuitive querying of and navigation through the network.

Keywords:

semantic network, terminology, polysemy, homonymy, hyponymy, antonymy, synonymy.


1. Introduction

Contemporary dictionary development has been deeply affected by the wide spread of personal computers. Nowadays, a fast-growing number of users have already forgotten the tedious lookups in huge paper-based dictionaries and have started using their computer equivalents. Although the first computer dictionaries were often worse than the traditional ones, their potential was beyond question. As early as 1992, the creators of the Oxford English Dictionary [OED] invested $13.5 million in a five-year project to develop an electronic version. It soon became clear that computer dictionaries could potentially provide far richer capabilities. In the meantime, other resources, such as thesauri, arose (e.g. Roget’s Thesaurus [RT]), which provided users with synonymy information. Soon, lexicographers started combining dictionaries and thesauri, which resulted in semantic networks (e.g. WordNet [Fellbaum,1998; Miller&al.,1990; WordNet]) that include not just term glosses and synonym lists, but also links to antonyms, hyponyms, etc.

The work presented below progressed in a similar fashion: we started with electronic dictionaries and later transformed them into semantic networks with various terminological relations. We concentrated on fine arts terminology for two closely related Slavonic languages that are easy to combine and suitable for comparative research: Bulgarian and Russian. Although we initially focused on Bulgarian, Russian support has been added for two reasons: to illustrate multilingual support (at present the dictionary interface is bilingual, while the semantic network allows several languages to be used in parallel) and to make use of the rich Russian language material we already had. Adding other Balkan languages, in combination with or instead of Bulgarian/Russian, would be attractive once the necessary data is collected and made available.

2. ArtsDict: Bilingual Terminological Dictionary

ArtsDict has been created to allow easy creation and use of parallel bilingual terminological dictionaries for the purpose of lexicographic research. The dictionary data consists of a set of navigable dictionary entries, each made of a term (a single-word term, SWT, or a multi-word term, MWT) and one or more glosses describing its sense(s). The main screen of ArtsDict is split both horizontally (between the dictionaries) and vertically: the SWTs and MWTs, including doublets and variants, appear on the left in alphabetical order, while their glosses are listed on the right. Although the user interface imposes no such restrictions, we enforced strict rules for the contents of the separate fields. For example, after the term we add in brackets its origin, when it is a foreign word, and its singular form, when it is presented in the plural. The doublets¹ and variants² appear horizontally, comma-separated, after the term. Similarly, after a neutral term its stylistic relative synonyms are listed (again comma-separated), since they represent the same notion.

¹ We consider doublets and variants to be absolute synonyms, the difference being that the former share the same root, while the latter do not.

² In fact, phonetic and orthographic variants are lexico-grammatical variants of the same word (allolexes), not distinct words (synonyms). We treat them as separate words (i.e. synonyms) for two reasons: 1. to preserve a unified approach to all groups of variants, which represent distinct words or terminological collocations; 2. because phonetic and graphemic variants can be stylistic relative synonyms, e.g. in fine arts terminology, Bulgarian зограф and изограф (dialectal for зографа), whereas the lexico-grammatical variants of a single word cannot be related to different styles.

Аквамарин (нем. Aquamarin, по лат. aqua 'вода' + marinus 'морски')
    Минерал, разновидност на берила, силикат на берилия и алуминия, скъпоценен камък, с цвят от светлозелен до небесносин, използван като материал за художествени изделия.
Акварел (рус. акварель, фр. aquarelle, от ит. acquarello, от лат. aqua 'вода')
    1. Акварелни бои - бои, състоящи се от пигмент и свързващо вещество (растително лепило с примеси на мед, захар, глицерин);
    2. Акварелна техника - живописна техника, използваща акварелни бои;
    3. Произведение на живописта, изпълнено с акварелна техника.
Акварелен портрет
    Разновидност на портретния жанр, включваща портрети, изпълнени в акварелна техника.
Акварелист (от ит. acquarello)
    вж. Художник-акварелист
Акварелистка (от акварелист, от ит. acquarello)
    вж. Художничка-акварелистка.
Акварелна техника
    вж. Акварел във 2 знач.
Акварелни бои, Водни бои
    вж. Акварел в 1 знач.

Table 1. Extract from the Bulgarian dictionary contents.

Аквамарин (нем. Aquamarin, по лат. aqua marina 'морская вода')
    Минерал, прозрачная разновидность берилла, синевато-зеленой или голубой окраски, драгоценный камень, применяемый как материал для художественных изделий.
Акварелист (ит. acquarello)
    см. Художник-акварелист.
Акварелистка (от акварелист, от ит. acquarello)
    см. Художница-акварелистка.
Акварель (фр. aquarelle, ит. acquarello, от лат. aqua 'вода')
    1. Красочный материал, предназначенный для акварельной живописи, состоящий из пигмента и большого процента клеящих веществ в качестве связующего (которым служит растительный клей с примесью меда, сахара, глицерина);
    2. Техника живописи, выполняемая акварельными красками;
    3. Произведение искусства, выполненное акварельными красками в соответствующей технике.
Акварельная живопись
    см. Акварельная техника.
Акварельная техника, Акварельная живопись, Живопись акварелью, Живопись водяными красками
    см. Акварель во 2 знач.
Акварельные краски (ед. ч. краска), Водяные краски
    см. Акварель в 1 знач.

Table 2. Extract from the Russian dictionary contents.

Олово (Bulgarian)
    Тежък мек ковък метал със сивосинкав цвят, използван като материал за художествени произведения.
Олово (Russian)
    Химический элемент, мягкий, ковкий, серебристо-белый металл, применяемый в изобразительном искусстве как материал для художественных изделий. Translated into Bulgarian as калай.

Table 3. Example of translingual homonymy (Russian).

This arrangement of variants, doublets and stylistic synonyms allows equivalent terms in the two dictionaries (i.e. the two languages) to be examined in parallel for short entries, and sequentially for longer ones (see Tables 1, 2). Parallel exploration simplifies not only the unification of the dictionaries (by adding the corresponding equivalent: see Table 5) but also the search for translingual homonyms (see Table 3).
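Because both dictionaries are available side by side, candidate translingual homonyms can be surfaced mechanically as headwords spelled identically in both languages, with the final decision left to the lexicographer. The following is a minimal Python sketch under stated assumptions: each dictionary is a hypothetical in-memory mapping from headword to its list of glosses, matching is case-insensitive on the exact spelling, and the shortened glosses are taken from Tables 1, 3 and 5 for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory dictionary: headword -> list of glosses.
Dictionary = Dict[str, List[str]]


def shared_headwords(bg: Dictionary, ru: Dictionary) -> List[Tuple[str, List[str], List[str]]]:
    """Return headwords spelled identically in both dictionaries.

    Each result carries both gloss lists, so the lexicographer can decide
    whether the pair is a parallel notion or a translingual homonym.
    """
    ru_index = {word.lower(): word for word in ru}
    candidates = []
    for word in sorted(bg):
        match = ru_index.get(word.lower())
        if match is not None:
            candidates.append((word, bg[word], ru[match]))
    return candidates


bg_dict = {
    "Олово": ["Тежък мек ковък метал със сивосинкав цвят ..."],
    "Натюрморт": ["Един от жанровете на изобразителното изкуство ..."],
    "Акварел": ["Акварелни бои ..."],
}
ru_dict = {
    "Олово": ["Химический элемент, мягкий, ковкий, серебристо-белый металл ..."],
    "Натюрморт": ["Один из жанров изобразительного искусства ..."],
    "Акварель": ["Красочный материал ..."],
}

for word, bg_glosses, ru_glosses in shared_headwords(bg_dict, ru_dict):
    print(word, "| BG:", bg_glosses[0], "| RU:", ru_glosses[0])
```

Both returned headwords are only candidates: Натюрморт turns out to be a parallel notion (Table 5), while Олово is a translingual homonym (Table 3).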

We would like to note that the dictionaries presented here are the most complete fine arts terminological dictionaries for both Bulgarian and Russian. They have been built using a broad range of resources: scientific, popular-scientific, fine arts, publicist, socio-political and others (journals, specialised scientific and popular-scientific literature, catalogues, etc. [Flerov,1981; Odnoralova,1982; Pavlovsky,1975; Tsonev,1957; Vinner,1954]). In addition, Russian and Bulgarian dictionaries have been used: terminological (e.g. [SDFAT,1965; SDFAT,1970]), encyclopaedic (e.g. [EFAB,1987]), orthographic and etymological dictionaries, dictionaries of foreign words, term lists from fine arts sources, etc. Terms in the strict sense, professional slang and nomenclatures are grouped together and considered within a unified terminological framework (see [Atanasova,2003] for details).

Figure 1. Screenshot from ArtsDict.

3. ArtsSemNet: Semantic Network

3.1. Creation

The ArtsSemNet semantic network was built around the contents of the ArtsDict dictionaries. For this purpose, we investigated and completely annotated (manually, but with partial computer automation using the formal and semantic techniques described below) several important terminological relations: polysemy, homonymy, synonymy, antonymy and hyponymy. The result is a WordNet-like semantic network, hierarchically organised around the hyponymy relation. At the time of writing, it contained:

• lexemes: 2,900 Bulgarian and 2,644 Russian;
• hyponym chains: 276 Bulgarian and 283 Russian;
• antonym chains: 157 Bulgarian and 134 Russian;
• absolute synonym chains: 483 Bulgarian and 458 Russian;
• relative synonym chains: 136 Bulgarian and 114 Russian;
• homonyms: 14 Bulgarian and 6 Russian;
• polysemous words: see Table 4.

The direct extraction of homonyms, synonyms (stylistic and relative) and polysemous terms from the dictionary entries was simplified by the organisation of ArtsDict. Hyponyms and antonyms, however, posed a problem. To extract hyponyms sharing a common term element (a root/stem, an affix, a word as a component of an MWT or another complex word, or an MWT), not necessarily also shared by the hypernym, we used a formal technique. ArtsDict was given a hyponym or hypernym, expressed as an SWT or MWT, and it produced chains of SWTs and MWTs containing the target term element. These chains were investigated further and the hyponyms were sieved out by the lexicographer [Atanassova&al.,2002].

A similar technique was used to facilitate the extraction of antonyms sharing a common term element, as well as of shared-root synonyms (including those with a common suffix or prefix).
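The formal technique used for hyponyms, antonyms and shared-root synonyms amounts to collecting every SWT and MWT that contains a given term element and handing the list to the lexicographer. A minimal Python sketch, assuming plain case-insensitive substring matching over a hypothetical in-memory list of headwords; how the original tool delimited roots and affixes is not described, so its actual matching may differ.

```python
from typing import Iterable, List


def term_element_chain(element: str, terms: Iterable[str]) -> List[str]:
    """Collect all single-word and multi-word terms containing `element`.

    The element may be a root/stem, an affix or a whole word; matching is a
    simple case-insensitive substring test. The result is only a candidate
    chain: true hyponyms (or antonyms, synonyms) are kept or discarded by
    the lexicographer.
    """
    needle = element.lower()
    return [term for term in terms if needle in term.lower()]


headwords = [
    "Портрет", "Автопортрет", "Акварелен портрет", "Групов портрет",
    "Херма", "Пейзаж", "Морски пейзаж", "Парков пейзаж",
]

# "портрет" used as a term element: matches the SWT and the MWTs containing it.
print(term_element_chain("портрет", headwords))
```

Херма, a hyponym of Портрет in Table 6, shares no term element with it and is therefore missed; such cases are what the LSA-based extraction described below is meant to catch.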


Number of senses    1        2      3     4    5    6    7
Bulgarian           2,571    273    49    4    2    1    0
Russian             2,313    263    56    9    2    0    1

Table 4. Polysemy of terms: number of terms by number of senses.
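Table 4 can be cross-checked against the lexeme counts given in Section 3.1: each row sums to the total number of terms for its language, and subtracting the single-sense column gives the number of polysemous terms. A small Python check that uses only the figures from the table:

```python
# Number of terms having 1..7 senses, copied from Table 4.
senses_histogram = {
    "Bulgarian": [2571, 273, 49, 4, 2, 1, 0],
    "Russian": [2313, 263, 56, 9, 2, 0, 1],
}

for language, counts in senses_histogram.items():
    total_terms = sum(counts)
    polysemous = total_terms - counts[0]  # terms with two or more senses
    # Total number of senses: each count weighted by its number of senses.
    total_senses = sum(n * c for n, c in enumerate(counts, start=1))
    print(f"{language}: {total_terms} terms, {polysemous} polysemous, {total_senses} senses")
```

The rows sum to 2,900 and 2,644, matching the lexeme counts, and yield 329 polysemous Bulgarian and 331 polysemous Russian terms.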

To extract hyponyms sharing no term element, we used latent semantic analysis (LSA). This is a popular technique for indexing, retrieval and analysis of textual data that assumes a set of mutual latent dependencies between terms and the contexts they are used in. This allows LSA to deal successfully with synonymy and partially with polysemy, which are the major problems for word-based text processing techniques (due to the freedom and variability of expression). LSA is a two-stage process consisting of learning and analysis. During the learning phase, it is given a text collection and produces a real-valued vector for each term and for each document. The second phase is the analysis, in which the proximity between a pair of documents or terms is calculated as the dot product of their normalised LSA vectors (see [Landauer&al.,1998] for an introduction to LSA).
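The two LSA phases can be sketched with NumPy as follows. This is a minimal illustration, not the engine used in the work: it assumes a toy term-by-document count matrix, a rank-k truncated SVD, and no term weighting (practical systems usually apply a weighting such as log-entropy first). Term vectors are taken as the rows of U_k S_k, document vectors as the rows of V_k S_k, and proximity is the dot product of the normalised vectors, i.e. the cosine.

```python
import numpy as np


def lsa_fit(counts: np.ndarray, k: int):
    """Learning phase: rank-k truncated SVD of a term-by-document matrix.

    Returns (term_vectors, doc_vectors): one real-valued row per term and
    one per document, both in the same k-dimensional latent space.
    """
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    term_vectors = u[:, :k] * s[:k]      # rows of U_k S_k
    doc_vectors = vt[:k, :].T * s[:k]    # rows of V_k S_k
    return term_vectors, doc_vectors


def proximity(a: np.ndarray, b: np.ndarray) -> float:
    """Analysis phase: dot product of the two L2-normalised vectors."""
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))


# Toy collection: 4 terms (rows) x 4 documents (columns), raw counts.
terms = ["акварел", "бои", "портрет", "бюст"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
], dtype=float)

term_vecs, doc_vecs = lsa_fit(counts, k=2)
print(round(proximity(term_vecs[0], term_vecs[1]), 3))
print(round(proximity(term_vecs[0], term_vecs[2]), 3))
```

In this toy example, акварел and бои, which occur in the same documents, get a proximity close to 1, while акварел and портрет, which never share a document, get a proximity close to 0.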

As features we tried both raw and segmented words (after removing stop words and infrequent words; the SWTs and MWTs from the dictionary were treated as single words), and the former were found to be more suitable for our task (see [Atanassova&Nakov,2001a] for details). During both training and analysis, the engine was used with one language at a time: Bulgarian or Russian.
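The raw-word preprocessing described above can be sketched as follows. The sketch assumes whitespace tokenisation, a hypothetical stop-word list and a hypothetical minimum frequency `min_freq`; known MWTs are first joined into single underscore-separated tokens so that LSA treats them as single words. Segmentation, the variant found less suitable, is not shown.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set


def join_mwts(tokens: List[str], mwts: Iterable[str]) -> List[str]:
    """Replace every occurrence of a known multi-word term by one token."""
    # Longer MWTs are tried first so they win over MWTs nested inside them.
    patterns = sorted((m.lower().split() for m in mwts if m.split()),
                      key=len, reverse=True)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for pattern in patterns:
            if tokens[i:i + len(pattern)] == pattern:
                out.append("_".join(pattern))
                i += len(pattern)
                break
        else:  # no MWT starts here: keep the single word
            out.append(tokens[i])
            i += 1
    return out


def preprocess(docs: Sequence[str], mwts: Iterable[str],
               stop_words: Set[str], min_freq: int = 2) -> List[List[str]]:
    """Lower-case and tokenise, merge MWTs, drop stop words and rare words."""
    mwts = list(mwts)
    tokenised = [join_mwts(doc.lower().split(), mwts) for doc in docs]
    freq = Counter(tok for doc in tokenised for tok in doc)
    return [[tok for tok in doc if tok not in stop_words and freq[tok] >= min_freq]
            for doc in tokenised]


docs = [
    "акварелна техника и акварелни бои",
    "акварелни бои и темпера",
    "акварелна техника",
]
mwts = ["акварелна техника", "акварелни бои"]
print(preprocess(docs, mwts, stop_words={"и"}, min_freq=2))
```

Here the stop word и and the single occurrence of темпера are dropped, while the two MWTs survive as single tokens.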

In the analysis phase, LSA was given a hyponym or a hypernym, expressed as an SWT or MWT, and produced a ranked list sorted by semantic proximity to the target. The lexicographer manually investigated the result and kept only the true hyponyms. Although LSA was intended to focus on hyponyms with no shared term elements, the returned list could also contain hyponyms that do share them, as long as the LSA engine considered them semantically close enough (see [Nakov&Atanassova,2001]).
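Producing the ranked candidate list amounts to sorting all other terms by their proximity to the target. A minimal sketch, assuming the LSA term vectors are already available in a dictionary (the 2-D vectors below are made up for illustration) and a hypothetical cut-off `top_n`; the output is only a proposal that the lexicographer filters.

```python
from typing import Dict, List, Tuple

import numpy as np


def rank_candidates(target: str, term_vectors: Dict[str, np.ndarray],
                    top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank all terms except `target` by cosine proximity to it."""
    t = term_vectors[target]
    t = t / np.linalg.norm(t)
    scored = []
    for term, vec in term_vectors.items():
        if term == target:
            continue
        scored.append((term, float(np.dot(t, vec / np.linalg.norm(vec)))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up 2-D vectors standing in for real LSA term vectors.
vectors = {
    "портрет": np.array([1.0, 0.1]),
    "херма": np.array([0.9, 0.2]),
    "бюст": np.array([0.8, 0.3]),
    "марина": np.array([0.1, 1.0]),
}
for term, score in rank_candidates("портрет", vectors, top_n=3):
    print(f"{term}\t{score:.3f}")
```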

The dual nature of LSA allowed us to measure proximity not only between terms (SWTs or MWTs) but also between their glosses (see [Atanassova&Nakov,2001b]). As targets we used the glosses of the target hypernym (or the glosses of some of its known hyponyms), but also the hypernym itself (using some of its known hyponyms was another option we found useful). In the latter case we compared the target against the term vectors, in the former against the document vectors. Querying with terms performed better, but the two variants were used in parallel, since they proposed different arrangements of the potential hyponyms and each was useful to the lexicographer, who was not willing to miss any potential hyponym.
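Comparing a gloss against the document vectors requires mapping a text that may not be part of the training collection into the latent space. A common way to do this in LSA is folding-in; the paper does not say which convention its engine used, so the following is a sketch under that assumption. A query gloss encoded as a term-count vector q is projected as q^T U_k, which is directly comparable with the document vectors taken as rows of V_k S_k; the toy matrix and query are made up.

```python
from typing import List, Tuple

import numpy as np


def lsa_fit(counts: np.ndarray, k: int):
    """Truncated SVD; returns U_k and the document vectors V_k S_k."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k], vt[:k, :].T * s[:k]


def fold_in(query_counts: np.ndarray, u_k: np.ndarray) -> np.ndarray:
    """Project a pseudo-document (e.g. a gloss) into the latent space.

    For a training document column a_j of A = U S V^T we have
    a_j^T U_k = (row j of V_k S_k), so q^T U_k is comparable with them.
    """
    return query_counts @ u_k


def rank_documents(query_counts: np.ndarray, u_k: np.ndarray,
                   doc_vectors: np.ndarray) -> List[Tuple[int, float]]:
    """Rank documents (glosses) by cosine proximity to the folded-in query."""
    q = fold_in(query_counts, u_k)
    q = q / np.linalg.norm(q)
    scores = doc_vectors @ q / np.linalg.norm(doc_vectors, axis=1)
    return [(int(i), float(scores[i])) for i in np.argsort(-scores)]


# Toy term-by-gloss matrix: rows are terms, columns are dictionary glosses.
terms = ["акварел", "бои", "портрет", "бюст"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
], dtype=float)
u_k, doc_vecs = lsa_fit(counts, k=2)

# A target gloss used as the query, encoded as term counts over `terms`.
query = np.array([0, 0, 1, 1], dtype=float)
for doc_id, score in rank_documents(query, u_k, doc_vecs):
    print(doc_id, round(score, 3))
```

A query mentioning портрет and бюст ranks the two glosses built from those terms first; the other two get a proximity close to 0.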

3.2. Functionality

The primary purpose of ArtsSemNet is to assist the lexicographer by providing a tool for fast and easy access to rich fine arts terminology (see [Atanassova&al.,2003]). When a search for a particular term is performed, ArtsSemNet displays its glosses, homonyms, synonyms (both absolute and relative) and synonym chains, antonyms and antonym chains, as well as the hyponym chains the target term is part of (either as a hyponym or as a hypernym). ArtsSemNet offers a clean and intuitive interface. The user can input a term to be explored, change the language being used or specify different search criteria. The information displayed for a given term includes:

• the list of term glosses;
• the list of homonyms;
• absolute synonym chains;
• relative synonym chains;
• antonym chains;
• hyponym chains with the target term as a hypernym;
• hyponym chains with the target term as a co-hyponym.

The system offers several options: whether the term is to be matched exactly or partial matches (e.g. by root or prefix) should be considered as well; and whether homonyms, synonyms and synonym chains, antonyms and antonym chains, and hyponyms and hyponym chains should be displayed.
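The exact/partial search option can be illustrated with a small matching function. A minimal sketch with a hypothetical `mode` parameter taking three values: exact match, prefix match, and a substring test used here as a rough stand-in for root matching (real root matching would need morphological knowledge that the sketch does not have).

```python
from typing import Iterable, List

MODES = ("exact", "prefix", "substring")


def _matches(term: str, query: str, mode: str) -> bool:
    """Case-insensitive comparison of one term against the query."""
    t, q = term.lower(), query.lower()
    if mode == "exact":
        return t == q
    if mode == "prefix":
        return t.startswith(q)
    return q in t  # "substring": rough stand-in for a root match


def find_terms(query: str, terms: Iterable[str], mode: str = "exact") -> List[str]:
    """Return the terms matching `query` under the chosen search mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [term for term in terms if _matches(term, query, mode)]


terms = ["Акварел", "Акварелен портрет", "Акварелист",
         "Акварелна техника", "Художник-акварелист"]
print(find_terms("акварел", terms))
print(find_terms("акварел", terms, mode="prefix"))
print(find_terms("акварел", terms, mode="substring"))
```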

Glosses are presented as plain text, one per line, with numbers added in front when the target term has more than one gloss. Homonyms are listed one per line. Absolute synonyms, relative synonyms and antonyms are hyphen-separated. If a relative synonym of the target term has absolute synonyms, these are listed after it, comma-separated. The same applies to the absolute synonyms of the antonyms.

Hyponym chains are displayed as term lists in which the hypernym comes first, followed by its hyponyms. Again, if a term has absolute synonyms, these are shown along with it, separated by commas. If a polysemous term is the hypernym of more than one hyponym chain, the corresponding gloss is displayed in brackets for each chain. This is similar to the synsets in WordNet. The user interface also allows displaying separately each hyponym that is itself the hypernym of hyponym chains, together with those chains.
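The display conventions described above (numbered glosses, hyphen-separated synonyms, comma-separated absolute synonyms, hypernym first in a hyponym chain, with its sense in brackets when the hypernym is polysemous) can be captured by a few formatting helpers. This is a sketch under assumed data shapes: each term is a hypothetical pair of a headword and its absolute synonyms, a short sense label (as in Table 6) stands in for the bracketed gloss, and the exact layout of the real interface is not reproduced.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical display unit: a headword plus its absolute synonyms.
TermGroup = Tuple[str, Sequence[str]]


def format_group(group: TermGroup) -> str:
    """A term followed by its absolute synonyms, comma-separated."""
    term, absolute_synonyms = group
    return ", ".join([term, *absolute_synonyms])


def format_glosses(glosses: Sequence[str]) -> str:
    """One gloss per line, numbered only when there is more than one."""
    if len(glosses) == 1:
        return glosses[0]
    return "\n".join(f"{i}. {g}" for i, g in enumerate(glosses, start=1))


def format_relative_synonyms(groups: Sequence[TermGroup]) -> str:
    """Relative synonyms are hyphen-separated; each keeps its absolute synonyms."""
    return " - ".join(format_group(g) for g in groups)


def format_hyponym_chain(hypernym: TermGroup, hyponyms: Sequence[TermGroup],
                         sense: Optional[str] = None) -> List[str]:
    """Hypernym first (with its sense in brackets if given), then its hyponyms."""
    head = format_group(hypernym)
    if sense is not None:
        head += f" ({sense})"
    return [head] + ["    " + format_group(h) for h in hyponyms]


chain = format_hyponym_chain(("Пейзаж", ["Ландшафт"]),
                             [("Ведута", []), ("Морски пейзаж", ["Марина"])],
                             sense="произведение")
print("\n".join(chain))
print(format_relative_synonyms([("бристол", []), ("ватман", []), ("торшон", [])]))
```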

Whenever term lists are displayed, each distinct term is presented as a hyperlink. When a hyperlink is followed, the target term changes and the corresponding information about the new term is displayed (which in turn contains hyperlinks to other terms, and so on). The navigation mechanism is similar to the one provided by a standard Web browser: there are even the standard back and forward buttons, visualised as left and right arrows, so that the user can navigate back to already visited terms and then forward again. Figure 2 shows ArtsSemNet after a successful search for the Bulgarian term надлъжна гравюра.

Figure 2. Screenshot from ArtsSemNet.
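Browser-like back/forward navigation is usually modelled with two stacks of visited items. A minimal Python sketch of that model, not the actual Delphi implementation; the class and method names are hypothetical, and the terms in the usage example are simply a sequence of visits.

```python
from typing import List, Optional


class NavigationHistory:
    """Back/forward history of visited terms, as in a Web browser."""

    def __init__(self) -> None:
        self._back: List[str] = []
        self._forward: List[str] = []
        self.current: Optional[str] = None

    def visit(self, term: str) -> None:
        """Follow a hyperlink or run a search: the forward history is discarded."""
        if self.current is not None:
            self._back.append(self.current)
        self.current = term
        self._forward.clear()

    def back(self) -> Optional[str]:
        """Left arrow: return to the previously visited term, if any."""
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self) -> Optional[str]:
        """Right arrow: redo a step undone with back(), if any."""
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current


history = NavigationHistory()
for term in ["гравюра", "надлъжна гравюра", "напречна гравюра"]:
    history.visit(term)
print(history.back(), "|", history.back(), "|", history.forward())
```

Visiting a new term after going back discards the forward history, which is the usual browser behaviour.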

ArtsSemNet is implemented in Borland Delphi, using the Microsoft Access relational database management system for the storage and retrieval of the fine arts terms; the database is designed to ensure efficient processing of the kinds of queries needed.
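The paper does not describe the database schema. The sketch below shows one plausible relational layout, with hypothetical table and column names, using Python's built-in sqlite3 module instead of Microsoft Access, together with the kind of query the browser needs: all hyponyms of a given term. The sample data is the Перо (инструмент) chain from Table 7, without the pseudosynset split.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE term (
    id       INTEGER PRIMARY KEY,
    language TEXT NOT NULL,              -- e.g. 'bg' or 'ru'
    headword TEXT NOT NULL
);
CREATE TABLE gloss (
    term_id  INTEGER NOT NULL REFERENCES term(id),
    sense_no INTEGER NOT NULL,
    text     TEXT NOT NULL
);
-- One row per chain: absolute/relative synonyms, antonyms or hyponyms.
CREATE TABLE chain (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('abs_syn', 'rel_syn', 'ant', 'hypo'))
);
CREATE TABLE chain_member (
    chain_id INTEGER NOT NULL REFERENCES chain(id),
    term_id  INTEGER NOT NULL REFERENCES term(id),
    role     TEXT NOT NULL DEFAULT 'member'  -- 'hypernym'/'hyponym' in 'hypo' chains
);
"""


def hyponyms_of(conn: sqlite3.Connection, headword: str, language: str) -> List[str]:
    """Hyponyms from every hyponym chain in which `headword` is the hypernym."""
    rows = conn.execute(
        """
        SELECT DISTINCT hypo.headword
        FROM term AS t
        JOIN chain_member AS hm ON hm.term_id = t.id AND hm.role = 'hypernym'
        JOIN chain AS c ON c.id = hm.chain_id AND c.kind = 'hypo'
        JOIN chain_member AS cm ON cm.chain_id = c.id AND cm.role = 'hyponym'
        JOIN term AS hypo ON hypo.id = cm.term_id
        WHERE t.headword = ? AND t.language = ?
        ORDER BY hypo.headword
        """,
        (headword, language),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO term VALUES (?, ?, ?)", [
    (1, "ru", "Перо"), (2, "ru", "Гусиное перо"), (3, "ru", "Рейсфедер"),
    (4, "ru", "Рондо"), (5, "ru", "Тростниковое перо"),
])
conn.execute("INSERT INTO chain VALUES (1, 'hypo')")
conn.executemany("INSERT INTO chain_member VALUES (?, ?, ?)",
                 [(1, 1, "hypernym")] + [(1, i, "hyponym") for i in range(2, 6)])
print(hyponyms_of(conn, "Перо", "ru"))
```

Keeping every relation as a chain with member roles lets synonym, antonym and hyponym chains share one table pair, which keeps the queries for the browser uniform.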

4. Related Work


WordNet. WordNet was developed by psycholinguists from the Cognitive Science Laboratory of Princeton University as a computational model of human lexical memory. Since then the project has evolved into a general lexical reference system comprising thousands of words and their corresponding glosses, organised into a semantic network. The terms (lexemes) in WordNet are represented as one or more synsets (i.e. synonym sets). A synset groups a term with some of its synonyms, which taken as a whole represent a particular lexical sense of that term (see [Fellbaum,1998; Miller&al.,1990]). A lexically ambiguous term is included in more than one synset: one for each of its senses (according to the level of sense granularity chosen by the network). The synsets are hierarchically interconnected by the hyponymy and meronymy (part-whole) relations and are further distinguished by more specific properties. Work on the project continues, and the latest version of WordNet, 2.0, includes 115,424 synsets: 79,689 nouns, 13,508 verbs, 18,563 adjectives and 3,664 adverbs [WordNet]. WordNet is among the most important resources for natural language processing, machine translation, word sense disambiguation, information extraction, information retrieval, etc.

EuroWordNet. The success of WordNet provoked interest in the development of similar resources for other languages. In 1996 the European Commission funded the EuroWordNet project, covering 7 European languages in parallel (see [EuroWordNet; Vossen,1998]): Czech, Dutch, Estonian, French, German, Italian and Spanish. Each part of EuroWordNet uses its own language-specific synsets, but all are interconnected by means of a common index based on WordNet, so that navigation between similar words in different languages is possible in all directions. While the EuroWordNet project was finished in 1999 (as opposed to WordNet, which has always been active), work on other European languages continues. WordNets are already available for Basque, Portuguese and Swedish. Under development are WordNets for Bulgarian, Danish, Greek, Icelandic, Latvian, Moldavian, Norwegian, Romanian, Russian (see [RWN]), Serbian, Slovenian, Swedish and Turkish. Several non-European languages also have projects under development (see the Web page of the Global WordNet Association for details, [GWA]).

Натюрморт (Bulgarian)
    1. Един от жанровете на изобразителното изкуство, който изобразява битови предмети, зеленчуци, плодове, убит дивеч, цветя и др.;
    2. Отделно произведение от този жанр.
Натюрморт (Russian)
    1. Один из жанров изобразительного искусства, посвященный воспроизведению предметов обихода, снеди (овощи, мясо, битая дичь, фрукты), цветов и пр.;
    2. Отдельное произведение этого жанра.

Table 5. Parallel notions in Bulgarian and Russian.

There have also been some attempts to integrate domain-specific terminologies into EuroWordNet [Magnini&Speranza,2001; Stamou&al.,2002].

BalkaNet. This is an ongoing project whose aim is to create a multilingual lexical database consisting of WordNets for the following, mostly Balkan, languages: Greek, Turkish, Romanian, Bulgarian, Czech and Serbian (in fact Czech is not a Balkan language, but it is Slavonic, just like Bulgarian and Serbian). The objective is to collect some 15,000 comparable synsets (around 30,000 literals) in each language, covering generic vocabulary and distributed across the following POS categories: 65% nouns, 25% verbs, 5% adjectives and 5% adverbs (see [BalkaNet]). The data will later be incorporated into EuroWordNet.

The first attempts to build a Bulgarian WordNet focused on automatic construction from English-Bulgarian and Bulgarian-English electronic dictionaries (see [Nikolov&Petrova,2001]). For the BalkaNet project, however, everything has been created from scratch. At the time of writing, the Bulgarian WordNet contained about 8,000 synsets (see [BWN]).

5. ArtsSemNet and WordNet

WordNet and ArtsSemNet have similar functionality, but there are also some important differences. As mentioned above, terms in WordNet are represented not as entities of their own but as synsets. Although this is a clean way to express lexical relations as holding between senses rather than between the terms themselves, it is also partly due to the fact that WordNet was designed for English, where the same word can often belong to several parts of speech (e.g. noun, adjective and verb), which implies different senses in WordNet. This is highly unlikely in the Slavonic languages: while they are rich in homographs, these mostly involve inflected word forms and only occasionally hold between two or more lemmas. In addition, ArtsSemNet currently covers nouns only, while homographs in the Slavonic languages mostly involve words with different POS.

The synset organisation of WordNet also implies some interface differences. When the user enters a query word, WordNet displays all the synsets it is included in, along with their glosses. In addition, the synonyms, co-hyponyms, hyponyms and hyponym chains, meronyms/holonyms, antonyms and coordinate words can be shown. All this information is related to the corresponding synsets of the target. A summary of the major differences between ArtsSemNet and WordNet follows:



• ArtsSemNet is term-centred, while WordNet is built on synsets (senses). ArtsSemNet also includes some internal organisation similar to synsets, but only where it is really necessary to split the term for a particular relation (e.g. hyponymy, see Tables 6, 7). These pseudosynsets do not necessarily correspond to different glosses. Even when a term has different glosses (i.e. senses), this does not imply that the distinction matters for all the relations it is involved in (e.g. due to systematic relations). Following the WordNet approach for a focused, domain-specific terminological network would result in several parallel sense-to-sense relations (see Tables 6, 7), which we wanted to avoid.

• WordNet does not distinguish between absolute and relative synonyms as ArtsSemNet does, which, in our opinion, is an important distinction for a domain-specific terminology. Examples of absolute synonyms: Bulgarian (готически стил - готика; изумруд - смарагд; историческо платно - историческа картина; накити - бижу; торсо - торс; морски пейзаж - марина; разяждане - ецване) and Russian (муштабель - палка; арабеска - арабеск; барбы - заусенцы; восковая живопись - энкаустика; гематит - кровавик; отпечаток - оттиск; оклад - басма; мягкий кракелюр - плывучий кракелюр). Examples of relative synonyms: Bulgarian (бристол - ватман - торшон; кукери - бабугери; мартеница - китица - гадалушка; пафти - чапрази - куки; златарство - куюмджийство; ножарство - бучакчийство) and Russian (мастихин - шпатель; картинная галерея - пинакотека; гиацинт - жёлтый яхонт; рубин - красный яхонт).

• WordNet does not explicitly distinguish between homonymy and polysemy, a distinction that has been shown to be important for some applications, e.g. information retrieval (see [Krovetz,1993]).

• ArtsSemNet does not support the meronymy/holonymy relation (“X is part of Y”) present in WordNet. This is because we follow the Bulgarian and Russian linguistic tradition, where meronymy is considered a special kind of hyponymy/hypernymy rather than a separate relation.

• The user interface of WordNet does not provide automatic hyperlink-based navigation between terms (as ArtsSemNet does), but WordNet has a programming interface. ArtsSemNet is kept in a relational database, which allows simple programmatic access, although a specialised programming interface is not supported at the moment.

• ArtsSemNet supports both Bulgarian and Russian, while the original WordNet is for English only (EuroWordNet supports another set of 7 European languages, but at the moment neither Bulgarian nor Russian, although both are already under development).

Пейзаж, Ландшафт (жанр)
    Градски пейзаж
    Исторически пейзаж
    Морски пейзаж, Марина
    Парков пейзаж
Пейзаж, Ландшафт (произведение)
    Ведута
    Морски пейзаж, Марина
Портрет (жанр)
    Автопортрет
    Акварелен портрет
    Бюст, Бюстов портрет
    Групов портрет
    Кавалетен портрет
    Камерен портрет
    Ктиторски портрети
    Параден портрет
    Психологически портрет
    Скулптурен портрет
    Социален портрет
    Фаюмски портрет
    Херма
Портрет (произведение)
    Автопортрет
    Бюст, Бюстов портрет
    Херма

Table 6. Pseudosynsets and parallel homonymy in Bulgarian.

Перо (инструмент)
    Гусиное перо
    Рейсфедер
    Рондо
    Тростниковое перо, Калам
Перо (техника)
    Гусиное перо
    Тростниковое перо, Калам

Table 7. Pseudosynsets and parallel homonymy in Russian.
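The term-centred organisation with pseudosynsets can be pictured as a hyponymy index keyed by the term, with an optional sense label used only where the term must be split for this relation. A minimal sketch with hypothetical class and method names, populated with the Russian data from Table 7 (absolute synonyms omitted); it is not the actual ArtsSemNet storage format.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Key of a hyponymy node: the term itself, plus a sense label only when the
# term has to be split for this relation (a "pseudosynset"); otherwise None.
NodeKey = Tuple[str, Optional[str]]


class HyponymyIndex:
    def __init__(self) -> None:
        self._children: Dict[NodeKey, List[str]] = defaultdict(list)

    def add(self, hypernym: str, hyponym: str, sense: Optional[str] = None) -> None:
        self._children[(hypernym, sense)].append(hyponym)

    def senses(self, term: str) -> List[Optional[str]]:
        """Sense labels under which `term` acts as a hypernym."""
        return [sense for (t, sense) in self._children if t == term]

    def hyponyms(self, term: str, sense: Optional[str] = None) -> List[str]:
        """Hyponyms of one pseudosynset (or of the unsplit term if sense is None)."""
        return list(self._children.get((term, sense), []))

    def all_hyponyms(self, term: str) -> List[str]:
        """Union over all pseudosynsets of `term`, in first-seen order."""
        seen: Dict[str, None] = {}
        for sense in self.senses(term):
            for hyponym in self._children[(term, sense)]:
                seen.setdefault(hyponym, None)
        return list(seen)


index = HyponymyIndex()
for h in ["Гусиное перо", "Рейсфедер", "Рондо", "Тростниковое перо"]:
    index.add("Перо", h, sense="инструмент")
for h in ["Гусиное перо", "Тростниковое перо"]:
    index.add("Перо", h, sense="техника")
print(index.senses("Перо"))
print(index.all_hyponyms("Перо"))
```

A term that needs no split is simply stored with `sense=None`, so the split is paid for only where the hyponymy relation requires it.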

We would like to point out, though, that we have two separate networks without links between them. They are accessed via the same interface, so that a term can be looked up in either language, but there is no common index. Many terms are present in both languages, yet they do not necessarily represent parallel notions (Table 5); they may also be translingual homonyms (Table 3), etc. The lack of a common index is due to problems with language-specific terminology (crafts, materials, instruments, techniques) originating from differences in culture, traditions, climate, etc. Examples of Russian terms with no analogues in Bulgarian are: клееварка (клеянка), портретная (room for portraits), резьба по газобетону, резьба по ганчу, хохломская роспись (хохлома), палехская миниатюра, сграффито с инкрустацией цветных штукатурок. Some terms specific to Bulgarian include: каменина, ковано желязо, пастирска резба (овчарска резба), чипровски килим. Another source of differences is the language-specific absence of whole classes of terms, e.g. names of particular female professionals: Bulgarian-only (графичка, декораторка, дизайнерка, експресионистка, калиграфка, керамичка, маринистка, натуралистка, реставраторка) and Russian-only (лепщица, медальерка, миниатюристка, силуэтистка, юмористка). Unlike EuroWordNet, which is a general semantic network, we wanted to build one that is both specialised and as complete as possible. We were not willing to sacrifice coverage in either language for the sake of a cross-language index.

6. Availability and Usage

Both ArtsDict and ArtsSemNet are freely available for research purposes, and the latest versions (the applications and the database for Bulgarian and Russian) can be found on the Web: www.cs.berkeley.edu/~nakov/artssemnet.

There are two distribution variants: 1) a Microsoft Access .mdb file; and 2) an SQL script that creates the database schema and populates the data. The first is oriented to Windows applications and is suitable even for users who are not familiar with relational databases. The second can be used by a software developer to import the data into a standard RDBMS (e.g. MySQL, Oracle, SQL Server) and then access it from his/her favourite programming language (e.g. Java, Perl, C++, C#).

Technically, the software part of ArtsSemNet (both the application and the database) is not limited in any way to Bulgarian/Russian or to fine arts terminology. It can be used with any terminology in any language (except where the writing system may be a concern, e.g. Chinese), as long as information about the terms, glosses and relations is available. Since the data is currently stored in an MS Access-compatible format, MS Access itself can be used as an alternative way to explore and edit the data: to add a new term, gloss or relation, or even a new language. The changes will then be recognised automatically and be ready for use in the ArtsSemNet interface presented above.

7. Future Work

There are several directions for further improvement and development of ArtsSemNet. First, some minor functional additions are possible, e.g. enabling direct search for co-hyponyms. Second, it would be good to provide more intuitive navigation, e.g. displaying the hyponymy hierarchy as a tree or graph, thus giving a better visual idea of the relations holding between the different terms. Other relations, e.g. holonymy, could also benefit from hierarchical visualisation. A graphical representation similar to the one used in the QuickGO browser (see [QuickGO]) for the Gene Ontology Web interface is another interesting option. It would also be good to allow editing, adding and deleting terms, glosses and relations directly from the browser interface. It would be worth trying to interconnect the two languages (perhaps partially), similarly to EuroWordNet. Adding more languages is another possibility.

8. References

[
Atanasova,2003
] Atanasova I. Fine Arts Ter
mi
no
logy in Rus
si
an
and Bulgarian (semasiological and onomasio
lo
gi
cal as
pect. Ph.D.
thesis. Ve
li
ko Tur
no
vo, Bulgaria, 2003.

[
Atanassova&al.,2003
] Atanassova I., S. Nakov, P. Nakov. Arts
-
SemNet: A Bilingual Semantic Network for Bulgarian and Rus
sian
Fine Arts Termi
nology. Proceedings of BulMET, Var
na, Bulgaria,
2003

[
Atanassova&al.,2002
] Atanassova I., Nakov P, Nakov S. In
-
formation Technologies Helping the Lin
guist
-
Explorer. Proc. VIII
th

International Sim
po
sium MAPRIAL 2002. pp. 304
-
309. Veliko Tur
-
no
vo, Bulga
ria, 2002.

[Atanassova&Nakov,2001a] Atanassova I., Nakov P. The Impact of the Segmentation on the Automatic Hyponyms Extraction from Terminological Dictionaries. Proc. Conference on Contemporary Achievements in the Philological Sciences and the Foreign Language University Education. Veliko Turnovo, Bulgaria, 2001.

[Atanassova&Nakov,2001b] Atanassova I., Nakov P. Term and Document from the Point of View of the Latent Semantic Analysis. Proc. International Conference “Technologies, Safety and Ecology”, pp. (69)193-205. Veliko Turnovo, Bulgaria, 2001.

[BalkaNet] BalkaNet: http://www.ceid.upatras.gr/Balkanet/

[BWN] Bulgarian WordNet: http://www.ibl.bas.bg/balk_en.htm

[EFAB,1987] Encyclopaedia of Fine Arts in Bulgaria (Bulgarian: Енциклопедия на изобразителните изкуства в България), vol. I-II. Sofia, 1987.

[EuroWordNet] EuroWordNet: http://www.illc.uva.nl/EuroWordNet/

[Fellbaum,1998] Fellbaum C. (ed.). WordNet: An Electronic Lexical Database. MIT Press, 1998.

[Flerov,1981] Flerov A. Material Knowledge and Technology of the Artistic Treatment of Metals (Russian: Материаловедение и технология художественной обработки металлов). Vysshaya shkola. Moscow, 1981.

[GWA] Global WordNet Association: http://www.globalwordnet.org/

[Krovetz,1993] Krovetz R. Viewing Morphology as an Inference Process. Proc. 16th ACM SIGIR Conf. on R&D in IR, pp. 191-202. ACM, New York, 1993.

[Landauer&al.,1998] Landauer T., Foltz P., Laham D. Introduction to LSA. Discourse Processes, vol. 25, pp. 259-284, 1998.

[Magnini&Speranza,2001] Magnini B., Speranza M. Integrating Generic and Specialized Wordnets. Proc. Euroconference RANLP, pp. 149-153. Tzigov Chark, Bulgaria, 2001.

[Miller&al.,1990] Miller G., Beckwith R., Fellbaum C., Gross D., Miller K. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), pp. 235-244, 1990.

[Nakov&Atanassova,2001] Nakov P., Atanassova I. Automatic Hyponymy Extraction from Bulgarian and Russian Terminological Dictionaries. Proc. Naval Scientific Forum, vol. 3, pp. 327-335. Varna, Bulgaria, 2001.

[Nikolov&Petrova,2001] Nikolov T., Petrova K. Towards Building Bulgarian WordNet. Proc. Euroconference Recent Advances in Natural Language Processing (eds. G. Angelova, K. Bontcheva, R. Mitkov, N. Nicolov, N. Nikolov), pp. 199-203. Tzigov Chark, Bulgaria, 2001.

[Novikov,1982] Novikov L. Semantics of the Russian Language (Russian: Семантика русского языка). Vysshaya shkola. Moscow, 1982.

[Odnoralova,1982] Odnoralova N. Sculpture and Sculptural Materials (Russian: Скульптура и скульптурные материалы). Izobrazitel’noe iskusstvo. Moscow, 1982.

[OED] Oxford English Dictionary: http://www.oed.com

[Pavlovsky,1975] Pavlovsky A. Monumental Decorative Arts Materials and Technique (Russian: Материалы и техника монументально-декоративного искусства). Sovetsky hudozhnik. Moscow, 1975.

[QuickGO] QuickGO: GO Browser: http://www.ebi.ac.uk/ego

[RT] Roget’s Thesaurus: http://www.bartleby.com/thesauri

[RWN] Russian WordNet: http://www.phil.pu.ru/depts/12/RN/Main.html

[SDFAT,1970] Short Dictionary of Fine Arts Terminology (Bulgarian: Кратък речник на термините в изобразителното изкуство). Bulgarski Hudozhnik. Sofia, 1970.

[SDFAT,1965] Short Dictionary of Fine Arts Terminology (Russian: Краткий словарь терминов изобразительного искусства). Sovremenniy hudozhnik. Moscow, 1965.

[Stamou&al.,2002] Stamou S., Ntoulas A., Kyriakopoulou M., Christodoulakis D. Expanding EuroWordNet with Domain-Specific Terminology Using Common Lexical Resources: Vocabulary Completeness and Coverage Issues. Proc. First International WordNet Conference. Mysore, India, 2002.

[Tsonev,1957] Tsonev K. Painter’s Technical Guide (Bulgarian: Технически наръчник на художника). Nauka i izkustvo, 1957.

[Vossen,1998] Vossen P. (ed.). EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht, 1998.

[Vinner,1954] Vinner A. Art of Painting Materials (Russian: Материалы живописи). Sovetsky hudozhnik. Moscow, 1954.

[WordNet] WordNet: http://www.cogsci.princeton.edu/~wn