An Efficient Algorithm for Near Optimal Data Allocation on
Multiple Broadcast Channels

Chih-Hao Hsu+, Guanling Lee# and Arbee L.P. Chen*

+Department of Computer Science
National Tsing Hua University
Hsinchu, Taiwan 300, R.O.C.

#Department of Computer Science and Information Engineering
National Dong Hwa University
Hualien, Taiwan 973, R.O.C.

*Department of Computer Science
National Chengchi University
Taipei, Taiwan, R.O.C.


Abstract

In a wireless environment, the bandwidth of the channels and the energy of the portable devices are limited. Data broadcast has become an excellent method for efficient data dissemination. In this paper, the problem of generating a broadcast program for a set of data items with associated access frequencies on multiple channels is explored. In our approach, a minimal expected average access time of the broadcast data items is first derived. The broadcast program is then generated so as to minimize this minimal expected average access time. Simulation is performed to compare the performance of our approach with two existing approaches. The results of the experiments show that our approach outperforms the others and is in fact close to the optimal.

Keywords: Wireless Environment, Multiple Broadcast Channels, Data Allocation, Broadcast Program

1. Introduction

With the development of wireless technologies, people can now access information anytime and anywhere via wireless communications. However, unlike in traditional wired networks, some issues must be considered in the wireless environment. First, the bandwidth of the wireless network and the energy available to portable devices are limited. Second, the environment is asymmetric; that is, from the power consumption point of view, sending data is more costly than receiving data for a portable computer. Due to these issues, the traditional request-response system [TO98] is no longer suitable for data dissemination in the wireless environment. Therefore, data dissemination in the wireless environment has become an interesting research problem [AK93][IB93][PS98].

Broadcast-based information systems disseminate information at a cost independent of the number of clients, which compensates for the limited bandwidth in the wireless environment. Moreover, the clients can retrieve the broadcast data by just tuning to the broadcast channel, which results in a certain degree of energy saving. Therefore, data broadcast has become an attractive solution for information dissemination. However, in the broadcast-based system, the clients have to access data items in the broadcast channel sequentially. Therefore, how to allocate data items in the broadcast channel for efficient data access becomes an important issue.

To evaluate the efficiency of data access, access time is used. Access time is the time elapsed from the moment a client first tunes into the broadcast channel to the moment the desired data are acquired. In the broadcast-based system, a broadcast program needs to be constructed to determine the order of data items to be broadcast. The main goal in generating a broadcast program is to minimize the average access time, which saves bandwidth and energy in a mobile computing system. Many researchers focus on generating broadcast programs for a single broadcast channel. In [BGH92][HGL87], the server uniformly broadcasts each requested data item. In practice, however, some data items are more frequently accessed than others. Acharya et al. [AAF95] propose the concept of broadcast disks, in which all data items are partitioned into several groups such that the groups containing data items with higher access frequencies have shorter broadcast periods. As a result, the average access time decreases. The performance of broadcast disks is further improved in [AFZ95][AFZ96a][AFZ96b]. Moreover, approaches for broadcasting variable-sized data items are proposed in [HV99][VH99]. The problem of broadcasting location-dependent data is proposed and discussed in [XZL03][ZXL04].

In recent years, many researchers have focused on generating a broadcast program for efficient access to multiple data items. In [BS99], a method for finding the optimal broadcast program for two dependent files is proposed. A lower bound on the average access time of the optimal broadcast program for queries which require only two data items is derived in [BNS00]. In [CK99], a scheduling method is presented for answering queries on multiple data items where there is no access order constraint among the required data items. The broadcast order is decided by a greedy method based on the frequencies of queries. Based on [CK99], [LYL02][LL03] propose a more efficient algorithm to solve this problem. In [CHK99], a scheduling method for answering queries on dependent data items is discussed. The broadcast order is decided by a set of heuristic rules. In [LLC02], an efficient algorithm to generate a broadcast program for accessing dependent data items is proposed. In the proposed method, frequently co-accessed data are not only allocated close to each other, but also placed in a particular order which optimizes the performance of query processing.

The issue of data allocation on multiple channels has been widely discussed recently. Shivakumar et al. [SV96] extend the Alphabetic Huffman Tree to a k-nary search tree and allocate this index tree to multiple channels. However, this is inflexible because the number of channels must equal the height of the tree. Lo and Chen [LC00] propose a solution for optimal index and data allocation, which minimizes the average access time for any number of broadcast channels. In [HLC02], the concept of broadcast disks is used to allocate the data and index on multiple broadcast channels. In [HCH00], the issue of allocating dependent data on multiple channels is discussed. A heuristic algorithm is proposed to cluster related data items to minimize the average access time. Furthermore, the problems of broadcasting dependent data with data replication and of dependent data access on multiple channels are considered in [HC04] and [HCP03], respectively.

In fact, the concept of broadcast disks can be used to generate broadcast programs on multiple channels. That is, the data items in each group can be allocated into a channel, where the groups containing data items with higher access frequencies have fewer data items, such that the average access time for these data items is reduced. Peng and Chen [PC00] construct a channel allocation tree with variant fan-out and propose a heuristic algorithm VF^K to generate a broadcast program. This approach works well only when the number of channels is a power of 2. The reason is as follows. VF^K partitions one channel into two channels to minimize the average access time of these two channels. However, this partition yields only a local optimum. In [HLC01], a near optimal algorithm for allocating data items of the same size on multiple broadcast channels is proposed. In [YNO02], an approximation algorithm for generating broadcast programs on multiple channels is proposed. Our approach allocates data items in a channel with the goal of minimizing the average access time for all data items. Moreover, we also show that our work can be easily extended to deal with data items with variable sizes. The rest of this paper is organized as follows. The problem of generating broadcast programs on multiple broadcast channels is formulated in Section 2. In Section 3, the technique for generating broadcast programs on multiple channels is proposed. The performance analysis is given in Section 4. Finally, in Section 5, the conclusion and future work are presented.

2. Problem Description

2.1 Preliminaries


In a broadcast-based information system, the server generates a broadcast program and periodically broadcasts the data items accordingly on the broadcast channels. The periodic broadcast forms a broadcast cycle. It is possible for a data item to appear more than once in a broadcast cycle. We assume that the data items are of equal size. Therefore, the broadcast bandwidth needed to allocate a data item is the same for every item, and is denoted as a time slot. Each data item $d_i$ has a corresponding access frequency $f_i$, which denotes the probability that data item $d_i$ is needed by the clients. Moreover, $\sum_{i=1}^{N} f_i = 1$, where $N$ denotes the number of data items to be broadcast. The broadcast program is generated according to this probability distribution. An instance of a data item is defined as an appearance of the data item on the broadcast channel. When the distance between any two consecutive instances of data item $d_i$ is the same, we say $d_i$ is equally spaced with the distance $s_i$. The reciprocal of $s_i$ is denoted as $p_i$, which is the probability that $d_i$ will be selected to broadcast in each time slot. The average access time for each data item $d_i$ is denoted $t_i$. Also, $t_{total}^{(1:M)}$ is the average access time for all data items, that is,

$$t_{total}^{(1:M)} = \sum_{i=1}^{N} f_i t_i .$$

Wong [Won88] shows that, for data items of equal size, the average access time is minimized if each data item is equally spaced and, for any two data items $d_i$ and $d_j$, $p_i / p_j = \sqrt{f_i} / \sqrt{f_j}$. In the single broadcast channel environment, $\sum_{i=1}^{N} p_i = 1$. It is easy to show that $\sum_{i=1}^{N} p_i = M$ in the M broadcast channels environment. According to this property, the minimal average access time of all data items on multiple channels can be derived as follows:

Lemma 1. Assume that each data item is equally spaced. The minimal average access time for all data items on M channels, denoted $t_{min}^{(1:M)}$, is given by

$$t_{min}^{(1:M)} = \frac{1}{2M}\Big(\sum_{i=1}^{N}\sqrt{f_i}\Big)^{2}$$

Proof. With the assumption that each data item is equally spaced, the average access time of data item $d_i$ is $s_i/2$. Therefore, the average access time of all data items is

$$t_{total}^{(1:M)} = \frac{1}{2}\sum_{i=1}^{N} s_i f_i$$

According to the property shown in [Won88], the average access time of all data items is minimized if $p_i / p_j = \sqrt{f_i} / \sqrt{f_j}$, that is, $p_i = a\sqrt{f_i}$ where $a$ is a constant. Moreover, since $\sum_{i=1}^{N} p_i = M$, we get $a = M / \sum_{i=1}^{N}\sqrt{f_i}$. Therefore, the minimal average access time on M broadcast channels is

$$t_{min}^{(1:M)} = Min\big(t_{total}^{(1:M)}\big) = Min\Big(\frac{1}{2}\sum_{i=1}^{N} s_i f_i\Big) = \frac{1}{2}\sum_{i=1}^{N}\frac{f_i}{p_i} = \frac{1}{2}\sum_{i=1}^{N}\frac{f_i}{a\sqrt{f_i}} = \frac{1}{2M}\Big(\sum_{i=1}^{N}\sqrt{f_i}\Big)\Big(\sum_{i=1}^{N}\sqrt{f_i}\Big) = \frac{1}{2M}\Big(\sum_{i=1}^{N}\sqrt{f_i}\Big)^{2}$$
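The bound of Lemma 1 is easy to evaluate numerically. The following is a minimal sketch, not code from the paper; the function name `min_avg_access_time` and its parameters are hypothetical, and the frequencies are assumed to sum to 1.

```python
import math


def min_avg_access_time(freqs, num_channels):
    """Lemma 1 lower bound: (1 / (2M)) * (sum_i sqrt(f_i))^2.

    freqs: access frequencies f_i (assumed to sum to 1).
    num_channels: the number of broadcast channels M.
    The bound assumes every item can be equally spaced, which a real
    broadcast program may not be able to achieve.
    """
    root_sum = sum(math.sqrt(f) for f in freqs)
    return root_sum ** 2 / (2 * num_channels)
```

For four equally popular items on one channel the bound is 2 time slots, which matches the intuition that a client waits half of a four-slot cycle on average.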




In Lemma 1, the minimal average access time for all data items to be allocated from the 1st to the M-th channel is derived. In general, the minimal average access time for allocating data items from the $i$-th to the $j$-th channel can be formulated as follows.

Lemma 2. Assume the summation of access frequencies for the data items to be allocated from the $i$-th to the $j$-th channel is $F$. The minimal average access time is given by

$$t_{min}^{(i:j)} = \frac{1}{2(j-i+1)}\Big(\sum_{k}\sqrt{\frac{f_k}{F}}\Big)^{2}$$

where the sum ranges over the data items $d_k$ to be allocated from the $i$-th to the $j$-th channel.

Proof. In Lemma 1, the summation of access frequencies for all data items is equal to 1. When the summation of access frequencies for the data items is $F$, the access frequency of each data item has to be divided by $F$ so that the normalized access frequencies sum to 1. Moreover, the number of channels is $j-i+1$ instead of $M$. Therefore, the equation of minimal average access time shown in Lemma 1 can be transformed into

$$t_{min}^{(i:j)} = \frac{1}{2(j-i+1)}\Big(\sum_{k}\sqrt{\frac{f_k}{F}}\Big)^{2}$$
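Lemma 2 only adds a normalization step to Lemma 1. A minimal sketch under that reading follows; the name `min_avg_access_time_range` is hypothetical, and `num_channels` stands for $j-i+1$.

```python
import math


def min_avg_access_time_range(freqs, num_channels):
    """Lemma 2 bound for a subset of items spread over j - i + 1 channels.

    freqs: access frequencies of the items assigned to those channels
           (they need not sum to 1; F is their sum).
    num_channels: j - i + 1.
    """
    total = sum(freqs)  # F in Lemma 2
    root_sum = sum(math.sqrt(f / total) for f in freqs)
    return root_sum ** 2 / (2 * num_channels)
```

Because the frequencies are divided by $F$, scaling all of them by a common factor leaves the result unchanged.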



The minimal average access time is based on the assumption that each data item is equally spaced. However, in most cases, it is difficult to generate this kind of broadcast program. For example, assume that $p_1 = 1/2$, $p_2 = 1/3$ and $p_3 = 1/6$. It is impossible to generate a broadcast program that broadcasts data item 1 exactly every two time slots, data item 2 exactly every three time slots and data item 3 exactly every six time slots in one channel. Therefore, the minimal average access time is a lower bound for a broadcast program. In Section 3, a heuristic algorithm will be proposed to generate a near optimal broadcast program on multiple channels.
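The impossibility claimed for the periods 2, 3 and 6 can be checked by brute force. The sketch below is illustrative only; `collision_free_offsets` is a hypothetical helper that tries every start offset for each item and keeps the combinations in which no two items claim the same time slot of a single channel.

```python
from itertools import product
from math import gcd


def collision_free_offsets(periods):
    """Return every tuple of start offsets for which items with the given
    exact periods never occupy the same slot of a single channel."""
    cycle = 1
    for p in periods:
        cycle = cycle * p // gcd(cycle, p)  # least common multiple
    feasible = []
    for offsets in product(*(range(p) for p in periods)):
        used = set()
        ok = True
        for period, offset in zip(periods, offsets):
            slots = set(range(offset, cycle, period))
            if used & slots:  # two items want the same slot
                ok = False
                break
            used |= slots
        if ok:
            feasible.append(offsets)
    return feasible
```

For the periods `[2, 3, 6]` the list is empty: a slot pattern with period 2 and one with period 3 always meet somewhere within six slots. Periods such as `[2, 4, 4]`, by contrast, do admit an exact schedule.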

2.2 Problem Formulation

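[Figure 1: Generating broadcast program on multiple channels. (a) Three groups for the partition problem: Group 1 holds data item 1 (access frequency 0.4), Group 2 holds data items 2 and 3 (0.15 each), and Group 3 holds data items 4, 5 and 6 (0.1 each). (b) Channel allocation corresponding to (a): Channel 1 repeats data item 1, Channel 2 cycles through data items 2 and 3, and Channel 3 cycles through data items 4, 5 and 6.]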

In the multiple channels environment, generating broadcast programs can be treated as a partition problem. That is, the data items can be partitioned into groups according to the number of channels, and the groups are then broadcast on the respective channels. For example, assume there are six data items and three broadcast channels as shown in Figure 1. The data items are partitioned into three groups, Group 1, Group 2 and Group 3, and broadcast on Channel 1, Channel 2 and Channel 3, respectively. In the broadcast-based environment, as the number of data items in a broadcast channel increases, the average access time of these data items also increases. Therefore, the data items with higher access frequencies have to be allocated to a channel containing fewer data items so that the average access time is minimized. The problem of generating broadcast programs on multiple channels is formulated as follows:

Problem of generating broadcast programs on multiple channels: Given M channels and a set of data items, each data item is associated with an access frequency, which represents the probability that the data item is needed by the clients. Our problem is to partition the data items into M groups and allocate the data items in each group to an individual channel, such that the average access time for all data items is minimized. The average access time of a broadcast program can be analyzed as follows:

Definition

$M$: The number of channels.

$Group_i$: The set of data items in group $i$, where $i \in \{1, 2, \ldots, M\}$.

$|Group_i|$: The number of data items in $Group_i$.

$Data_{ij}$: The $j$-th data item in $Group_i$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, |Group_i|\}$.

$f_{Data_{ij}}$: The access frequency of $Data_{ij}$.


Assume the clients tune into the broadcast channel at random. The average access time of data items in channel $i$, denoted $t^{i}_{total}$, can be derived as follows:

$$t^{i}_{total} = \frac{1}{|Group_i|}\int_{0}^{|Group_i|} t\,dt = \frac{1}{|Group_i|}\cdot\frac{1}{2}t^{2}\Big|_{0}^{|Group_i|} = \frac{|Group_i|}{2}$$

Therefore, the average access time of all data items, denoted $t_{total}$, is

$$t_{total} = \sum_{i=1}^{N} f_i t_i = \sum_{i=1}^{M}\sum_{j=1}^{|Group_i|}\big(f_{Data_{ij}} \times t^{i}_{total}\big) = \frac{1}{2}\sum_{i=1}^{M}\Big(|Group_i| \times \sum_{j=1}^{|Group_i|} f_{Data_{ij}}\Big)$$
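As an illustration of the formula for $t_{total}$, the sketch below evaluates a given partition. It is a hedged example rather than the authors' code; `average_access_time` is a hypothetical name, and each group is represented simply as a list of access frequencies.

```python
def average_access_time(groups):
    """t_total = 1/2 * sum_i (|Group_i| * sum_j f_Data_ij).

    groups: one list of access frequencies per channel; each channel is
    assumed to broadcast its items once per cycle, one item per time slot.
    """
    return 0.5 * sum(len(group) * sum(group) for group in groups)


# The partition of Figure 1: frequencies 0.4 | 0.15, 0.15 | 0.1, 0.1, 0.1
figure1 = [[0.4], [0.15, 0.15], [0.1, 0.1, 0.1]]
cost = average_access_time(figure1)  # 0.5 * (0.4 + 0.6 + 0.9)
```

Putting all six items on a single channel would instead cost 3 time slots on average, which shows why hot items belong on short channels.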


[Figure 2: Partition problem for generating broadcast program on multiple channels. The data items, ordered from high to low access frequency, are split into Group 1, Group 2, Group 3, ..., Group M.]

3. Allocating Data Items on Multiple Channels


In this section, a heuristic algorithm is proposed to generate a near optimal broadcast program on multiple channels. As shown in Figure 2, data items are first sorted in descending order according to the access frequencies. The algorithm allocates the data items to the channels according to this order. The number of data items to be allocated to a channel is determined as follows.

Assume the first $i-1$ channels have been allocated and we are deciding the number of data items to be allocated to the $i$-th channel. Given a certain number of data items to be allocated to the $i$-th channel, we can compute the minimal expected average access time (denoted MEAAT) of all data items. By computing the MEAAT for each number in a certain range, the number with the minimal MEAAT is selected as the number of data items to allocate to the $i$-th channel. In Subsection 3.1, we derive the equation to compute the MEAAT, and in Subsection 3.2, we derive the range of possible numbers for which the MEAAT is computed.

3.1 Computing the MEAAT

Assume the first $i-1$ channels have been allocated the data items in $Group_1$ to $Group_{i-1}$, respectively. The average access time of the data items in these channels is denoted $t_{total}^{(1:i-1)}$. Given the number of data items to be allocated to the $i$-th channel, $|Group_i|$, the MEAAT of all data items can be computed by

$$T^{i}_{min} = \Big(\sum_{j=1}^{i-1}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times t_{total}^{(1:i-1)} + \Big(\sum_{j=1}^{|Group_i|} f_{Data_{ij}}\Big)\times\frac{|Group_i|}{2} + \Big(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times t_{min}^{(i+1:M)},$$

where $t_{min}^{(i+1:M)}$ denotes the minimal average access time of the data items allocated to the $(i+1)$-th to M-th channels. By Lemma 2,

$$t_{min}^{(i+1:M)} = \frac{1}{2(M-i)}\Bigg(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|}\sqrt{\frac{f_{Data_{jk}}}{\sum_{m=i+1}^{M}\sum_{n=1}^{|Group_m|} f_{Data_{mn}}}}\Bigg)^{2},$$

we get

$$\begin{aligned}
T^{i}_{min} &= \Big(\sum_{j=1}^{i-1}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times t_{total}^{(1:i-1)} + \Big(\sum_{j=1}^{|Group_i|} f_{Data_{ij}}\Big)\times\frac{|Group_i|}{2} + \Big(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times\frac{1}{2(M-i)}\Bigg(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|}\sqrt{\frac{f_{Data_{jk}}}{\sum_{m=i+1}^{M}\sum_{n=1}^{|Group_m|} f_{Data_{mn}}}}\Bigg)^{2} \\
&= \Big(\sum_{j=1}^{i-1}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times t_{total}^{(1:i-1)} + \Big(\sum_{j=1}^{|Group_i|} f_{Data_{ij}}\Big)\times\frac{|Group_i|}{2} + \Big(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times\frac{1}{2(M-i)}\times\frac{\Big(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|}\sqrt{f_{Data_{jk}}}\Big)^{2}}{\sum_{m=i+1}^{M}\sum_{n=1}^{|Group_m|} f_{Data_{mn}}} \\
&= \Big(\sum_{j=1}^{i-1}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\Big)\times t_{total}^{(1:i-1)} + \Big(\sum_{j=1}^{|Group_i|} f_{Data_{ij}}\Big)\times\frac{|Group_i|}{2} + \frac{1}{2(M-i)}\Big(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|}\sqrt{f_{Data_{jk}}}\Big)^{2}
\end{aligned}$$

Essentially, the MEAAT $T^{i}_{min}$ consists of three parts: $t_{total}^{(1:i-1)}$, $|Group_i|/2$ and $t_{min}^{(i+1:M)}$. $t_{total}^{(1:i-1)}$ is the average access time of the data items in the first $i-1$ channels. $|Group_i|/2$ is the average access time of the data items in the $i$-th channel. And $t_{min}^{(i+1:M)}$ is the minimal average access time of the data items in the $(i+1)$-th to M-th channels. Each part is weighted by the total access frequency of the data items it covers. The three values are used to estimate the minimal average access time of all data items in order to determine the allocation of data items in the $i$-th channel.
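The three parts of the MEAAT translate directly into code. The sketch below is a hedged illustration, not the authors' implementation: `meaat` and its parameter names are hypothetical, the frequencies are assumed to be sorted in descending order, and the call requires $i < M$ so that at least one channel remains for the third part. The sample frequencies are those of the worked example in Subsection 3.3 (Table 1).

```python
import math


def meaat(freqs, prev_sizes, size_i, num_channels):
    """Minimal expected average access time T^i_min for one candidate size.

    freqs:        access frequencies sorted in descending order.
    prev_sizes:   sizes of Group_1 .. Group_{i-1}, already fixed.
    size_i:       candidate number of items for channel i.
    num_channels: M (the call assumes i < M).
    """
    i = len(prev_sizes) + 1
    # Part 1: weighted access time of the channels already filled.
    done, start = 0.0, 0
    for size in prev_sizes:
        done += sum(freqs[start:start + size]) * size / 2
        start += size
    # Part 2: channel i broadcasts size_i items, average wait size_i / 2.
    current = sum(freqs[start:start + size_i]) * size_i / 2
    # Part 3: Lemma 2 bound for the remaining items on M - i channels.
    rest = freqs[start + size_i:]
    bound = sum(math.sqrt(f) for f in rest) ** 2 / (2 * (num_channels - i))
    return done + current + bound


table1 = [0.212, 0.136, 0.114, 0.091, 0.083, 0.076,
          0.068, 0.068, 0.061, 0.053, 0.03, 0.008]
costs = {size: meaat(table1, [], size, 4) for size in (1, 2, 3)}
```

With these rounded frequencies, `costs` is smallest for a group size of 2, at roughly 1.36, in line with the value 1.3590 reported in Table 2(a).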

3.2 Deciding the Range

[Figure 3: A hierarchical broadcast program for M channels. Channel 1 repeatedly broadcasts Group 1, Channel 2 repeatedly broadcasts Group 2, Channel 3 repeatedly broadcasts Group 3, and so on up to Channel M, which broadcasts Group M; the lower-numbered channels repeat their groups more often.]

As shown in Figure 3, in our approach, a hierarchical broadcast program is generated and the data items in each group are allocated to the respective channel. In the hierarchical broadcast program, we have the following properties:

Lemma 3. For an optimal solution of the partition problem, $|Group_1| \le |Group_2| \le \cdots \le |Group_M|$.

Proof. Assume that there is an optimal solution for the partition problem in which $|Group_i| > |Group_j|$ for some $i < j$. We can move the data item $d_k$ with the lowest access frequency in $Group_i$ to $Group_j$ so that $|Group_i|$ decreases by 1 and $|Group_j|$ increases by 1. Consequently, the average access time for the data items in $Group_i$ decreases by

$$\frac{1}{2}\sum_{k=1}^{|Group_i|} f_{Data_{ik}}$$

and the average access time for the data items in $Group_j$ increases by

$$\frac{1}{2}\sum_{k=1}^{|Group_j|} f_{Data_{jk}} + \Big(\frac{|Group_j|+1}{2} - \frac{|Group_i|}{2}\Big),$$

where $\frac{|Group_j|+1}{2} - \frac{|Group_i|}{2}$ is the change of the average access time for $d_k$. Because $|Group_i| > |Group_j|$, we get $\frac{|Group_j|+1}{2} - \frac{|Group_i|}{2} \le 0$. Moreover, since in our approach the data items allocated to $Group_i$ have higher access frequencies than those allocated to $Group_j$, we get

$$\frac{1}{2}\sum_{k=1}^{|Group_i|} f_{Data_{ik}} > \frac{1}{2}\sum_{k=1}^{|Group_j|} f_{Data_{jk}},$$

that is, the average access time of all data items decreases. As a result, the assumed solution is not an optimal solution for the partition problem, which is a contradiction. Therefore, for an optimal solution of the partition problem, $|Group_1| \le |Group_2| \le \cdots \le |Group_M|$.




According to Lemma 3, the range of the number of data items in each channel can be derived as follows.

Lemma 4. For an optimal solution of the partition problem,

$$|Group_{i-1}| \le |Group_i| \le \frac{N - \sum_{j=1}^{i-1}|Group_j|}{M - i + 1}$$

Proof. According to Lemma 3, we get $|Group_{i-1}| \le |Group_i|$. And $\frac{N - \sum_{j=1}^{i-1}|Group_j|}{M - i + 1}$ is the mean number of the remaining data items per remaining channel. If $|Group_i| > \frac{N - \sum_{j=1}^{i-1}|Group_j|}{M - i + 1}$, there must be a group $j$ with $j > i$ such that $|Group_j| < \frac{N - \sum_{j=1}^{i-1}|Group_j|}{M - i + 1}$, and hence $|Group_j| < |Group_i|$. This conflicts with the property shown in Lemma 3. Therefore, we get that

$$|Group_{i-1}| \le |Group_i| \le \frac{N - \sum_{j=1}^{i-1}|Group_j|}{M - i + 1}$$



According to Lemma 4, the range of the number of data items in each channel can be determined.
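A small sketch of how the bounds of Lemma 4 yield the candidate sizes for channel $i$ is given here. The helper `candidate_sizes` and its parameters are hypothetical; using 1 as the lower bound for the first channel is an assumption (the lemma does not define $|Group_0|$), chosen because the worked example in Subsection 3.3 starts its range at 1.

```python
def candidate_sizes(prev_size, allocated, num_items, num_channels, i):
    """Candidate values of |Group_i| according to Lemma 4.

    prev_size:    |Group_{i-1}| (pass 0 for the first channel).
    allocated:    number of items already placed on channels 1 .. i-1.
    num_items:    N.
    num_channels: M.
    i:            index of the channel being filled (1-based).
    """
    lower = max(prev_size, 1)  # assumption: every channel gets at least one item
    upper = (num_items - allocated) // (num_channels - i + 1)
    return list(range(lower, upper + 1))
```

For $N = 12$ and $M = 4$, the first channel may receive 1 to 3 items; once two items are placed on channel 1, channel 2 may receive 2 or 3 items.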

3.3 The Heuristic Algorithm and Example


The heuristic algorithm to generate broadcast programs on multiple channels is presented as follows:

Algorithm

Input: The set of data items D = {d_1, d_2, ..., d_N} and the corresponding access frequencies f_i; the number of channels, M.

Output: A broadcast program.

Begin

1. Sort all data items in descending order according to the corresponding access frequencies f_i.

2. For i = 1 to M-1
   Begin
      Compute the range of the number of data items to be allocated in channel i.
      The number of data items to be allocated in channel i is the number in the range which minimizes the MEAAT of all data items.
   End

3. Allocate the remaining data items into the last channel.

End.
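The three steps of the algorithm can be sketched end to end as follows. This is a hedged reading of the pseudo-code rather than the authors' implementation: the function name `allocate` is hypothetical, the lower bound of 1 for the first channel is an assumption, and the sketch assumes there are at least as many data items as channels.

```python
import math


def allocate(freqs, num_channels):
    """Return the group sizes |Group_1| .. |Group_M| chosen by the heuristic.

    Assumes len(freqs) >= num_channels; frequencies need not be pre-sorted.
    """
    freqs = sorted(freqs, reverse=True)  # step 1: descending access frequency
    n = len(freqs)
    sizes, start, done = [], 0, 0.0
    for i in range(1, num_channels):  # step 2: channels 1 .. M-1
        lower = sizes[-1] if sizes else 1  # Lemma 4 lower bound
        upper = (n - start) // (num_channels - i + 1)  # Lemma 4 upper bound
        best_size, best_cost = None, math.inf
        for size in range(lower, upper + 1):
            current = sum(freqs[start:start + size]) * size / 2
            rest = freqs[start + size:]
            bound = sum(math.sqrt(f) for f in rest) ** 2 / (2 * (num_channels - i))
            cost = done + current + bound  # MEAAT for this candidate size
            if cost < best_cost:
                best_size, best_cost = size, cost
        sizes.append(best_size)
        done += sum(freqs[start:start + best_size]) * best_size / 2
        start += best_size
    sizes.append(n - start)  # step 3: the rest goes to the last channel
    return sizes


table1 = [0.212, 0.136, 0.114, 0.091, 0.083, 0.076,
          0.068, 0.068, 0.061, 0.053, 0.03, 0.008]
group_sizes = allocate(table1, 4)
```

On the twelve frequencies of Table 1 with four channels, the sketch selects group sizes 2, 3, 3 and 4.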

The following example is used to illustrate our algorithm.

Table 1 shows the set of data items and their corresponding access frequencies. The data items are sorted in descending order according to the access frequencies. The number of channels available is assumed to be 4. As shown in Table 2(a), the number of data items to be allocated on channel 1 is determined first. According to Lemma 4, the number of data items to be allocated on channel 1 ranges from 1 to 3. When $|Group_1| = 2$, the MEAAT is minimal (1.3590). Therefore, we allocate D1 and D2 on channel 1. Similarly, D3, D4 and D5 are allocated on channel 2 as shown in Table 2(b), and D6, D7 and D8 are allocated on channel 3 as shown in Table 2(c). Finally, the remaining data items are allocated on channel 4, which is also shown in Table 2(c). In Table 2(c), the average access time of all data items, $t_{total}^{(1:4)}$ = 0.3485 + 0.4318 + 0.3182 + 0.3030 = 1.4015, is also shown.


Table 1: Data items for the example

Data item          D1     D2     D3     D4     D5     D6     D7     D8     D9     D10    D11    D12
Access frequency   0.212  0.136  0.114  0.091  0.083  0.076  0.068  0.068  0.061  0.053  0.03   0.008

Table 2: Generating a near optimal broadcast program on multiple channels

In each sub-table, the three parts of $T^{i}_{min}$ are:
First part:  $\big(\sum_{j=1}^{i-1}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\big)\times t_{total}^{(1:i-1)}$
Second part: $\big(\sum_{j=1}^{|Group_i|} f_{Data_{ij}}\big)\times |Group_i|/2$
Third part:  $\big(\sum_{j=i+1}^{M}\sum_{k=1}^{|Group_j|} f_{Data_{jk}}\big)\times t_{min}^{(i+1:M)}$
An asterisk marks the minimal MEAAT.

(a) Determine the number of data items to be allocated on channel 1

|Group_1|   First part   Second part   Third part   T^1_min
1           0            0.1061        1.3363       1.4424
2           0            0.3485        1.0105       1.3590*
3           0            0.6932        0.7528       1.4460

Channel i      1        2        3        4
|Group_i|      2        0        0        0
Second part    0.3485   0        0        0

(b) Determine the number of data items to be allocated on channel 2

|Group_2|   First part   Second part   Third part   T^2_min
2           0.3485       0.2045        0.8345       1.3875
3           0.3485       0.4318        0.5891       1.3694*

Channel i      1        2        3        4
|Group_i|      2        3        0        0
Second part    0.3485   0.4318   0        0

(c) Determine the number of data items to be allocated on channel 3 and 4

|Group_3|   First part   Second part   Third part   T^3_min
3           0.7803       0.3182        0.3030       1.4015*

Channel i      1        2        3        4
|Group_i|      2        3        3        4
Second part    0.3485   0.4318   0.3182   0.3030

4. Performance Evaluation


In order to evaluate the performance of the proposed algorithm, a series of experiments is performed on different broadcast data sets. In the simulation, we assume that all data items have the same size and that it takes one time unit to access a data item. The cost metric is the average access time of all data items. We compare the cost of our approach with that of two other algorithms, proposed in [PHO00] and [PC00]. In [PHO00], an approach called step broadcast is proposed. In step broadcast, the summation of the access frequencies of the data items is almost the same in each group. In [PC00], an algorithm VF^K is proposed to construct a channel allocation tree with variant fan-out to minimize the average access time of all data items on multiple channels. First, the algorithm VF^K attaches all data items to the root node. After that, some data items with smaller access frequencies are moved to the lower level so as to reduce the average access time of all data items. The partition is evaluated iteratively with the objective of minimizing the average access time of all data items until the depth of the channel allocation tree is equal to the number of channels.

4.1 Simulation Model


The following parameters are used to generate different broadcast data sets.

PARAMETERS

N: The number of data items to be broadcast.

M: The number of channels.

θ: The parameter of the Zipf distribution.

Parameters                 Default value   Range
Number of data items (N)   200             20 ~ 1000
Number of channels (M)     6               3 ~ (illegible)
Zipf parameter (θ)         (illegible)     (illegible) ~ 0.99

Table 3. Parameter Settings

The parameter settings for our experiments are listed in Table 3. The access frequencies of the data items are generated based on the Zipf distribution [GSE94]. In the Zipf distribution, the access frequencies of the data items follow the 80/20 rule, that is, 80 percent of the clients are usually interested in 20 percent of the data items.
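The exact Zipf formula used in the simulation is not given in the text. The sketch below assumes the form commonly used in the data broadcast literature, $f_i = (1/i)^{\theta} / \sum_{j=1}^{N} (1/j)^{\theta}$; both this formula and the name `zipf_frequencies` are assumptions made for illustration.

```python
def zipf_frequencies(num_items, theta):
    """Access frequencies under an assumed Zipf form:
    f_i = (1/i)^theta / sum_j (1/j)^theta, for i = 1 .. num_items.

    theta = 0 gives a uniform distribution; larger theta gives more skew.
    The result is already sorted in descending order and sums to 1.
    """
    weights = [(1.0 / rank) ** theta for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

Frequencies generated this way can serve as input when experimenting with different values of N and θ.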

4.2 Performance Evaluation

4.2.1 Effect of the Number of Channels


In this simulation, the effect of the number of channels is considered. The result is shown in Figure 4. As shown in Figure 4(a), the average access time of all data items decreases as the number of channels increases. Intuitively, as the number of channels increases, the number of data items allocated in each channel decreases. Therefore, the average access time of all data items is reduced. In Figure 4(b), we show the ratio (average access time - lower bound) / lower bound of the three approaches. Our approach outperforms the other two approaches. The reason is that our approach can predict the average access time of all data items in the partition operation. Therefore, we can allocate data items in each channel with the goal of minimizing the average access time for all data items. On the other hand, the VF^K approach partitions one channel into two channels to minimize the average access time of these two channels. This partition yields only a local optimum. Therefore, the VF^K approach works well only when the number of channels is a power of 2.

[Figure 4: Effect of the number of channels. (a) The average access time of our approach, VF^K, step broadcast and the lower bound (x-axis: number of channels; y-axis: average access time). (b) Comparison with the lower bound by the ratio (x-axis: number of channels; y-axis: ratio).]

4.2.2 Effect of the Node Number


Another factor that affects the performance of the broadcast program is the number of data items. The simulation result is shown in Figure 5. As the number of data items increases, the number of data items allocated in each channel increases. Intuitively, the time spent to access a data item also increases. The result shown in Figure 5(a) confirms this intuition. As before, our approach performs better than step broadcast and VF^K. This is shown in Figure 5(b).

[Figure 5: Effect of the number of nodes. (a) The average access time of our approach, VF^K, step broadcast and the lower bound (x-axis: number of nodes; y-axis: average access time). (b) Comparison with the lower bound by the ratio (x-axis: number of nodes; y-axis: ratio).]

4.2.3 Effect of the Zipf Parameter



[Figure 6: Effect of the Zipf parameter. (a) The average access time of our approach, VF^K, step broadcast and the lower bound (x-axis: theta; y-axis: average access time). (b) Comparison with the lower bound by the ratio (x-axis: theta; y-axis: ratio).]

Figure 6 shows the effect of the Zipf parameter θ on the average access time of all data items for the three approaches. The Zipf parameter θ, which ranges from 0 to 1, is used to adjust the degree of skew of the access frequencies. As θ increases, the access frequencies of the data items become increasingly skewed. Highly skewed access frequencies mean that a small number of data items are accessed frequently. This explains why the average access time of all data items, shown in Figure 6(a), decreases as θ tends to 1. The ratio of the three approaches is shown in Figure 6(b). Our approach outperforms the others. In contrast, step broadcast performs the worst when θ tends to 1. The reason is that in step broadcast, the summation of access frequencies of the data items is almost the same in each group. As the Zipf parameter θ tends to 1, the access frequencies of the data items become skewed, that is, a few data items hold high access frequencies. As a result, many data items with low access frequencies are allocated to the same channel, and the average access time of all data items becomes worse.

4.3 An Extension to Data Items with Variable Sizes

The simulation shows that the performance of our approach is close to that of the optimal one when the data items have the same size. Our work can also be easily extended to handle data items with variable sizes. Vaidya and Hameed [VH99] showed that, for data items with variable sizes, the average access time is minimized if each data item is equally spaced and, for any two data items $d_i$ and $d_j$, $p_i / p_j = \sqrt{(f_i l_j)/(f_j l_i)}$, where $l_x$ denotes the length of $d_x$. According to this property, the minimal average access time $t_{min}^{(1:M)}$ can be extended to deal with data items with variable sizes as follows.

Lemma 5. Assume that each data item is equally spaced. The minimal average access time for all data items on M channels, denoted $t_{min}^{(1:M)}$, is given by

$$t_{min}^{(1:M)} = \frac{1}{2M}\Big(\sum_{i=1}^{N}\sqrt{f_i l_i}\Big)\Big(\sum_{i=1}^{N}\sqrt{\frac{f_i}{l_i}}\Big)$$

Proof. With the assumption that each data item is equally spaced, the average access time of data item $d_i$ is $s_i/2$. Therefore, the average access time of all data items is

$$t_{total}^{(1:M)} = \frac{1}{2}\sum_{i=1}^{N} s_i f_i$$

According to the property shown in [VH99], the average access time of all data items is minimized if $p_i / p_j = \sqrt{(f_i l_j)/(f_j l_i)}$, that is, $p_i = a\sqrt{f_i / l_i}$ where $a$ is a constant. Moreover, $\sum_{i=1}^{N} p_i = M$. We have $a = M / \sum_{i=1}^{N}\sqrt{f_i / l_i}$. Therefore, the minimal average access time on M broadcast channels is

$$t_{min}^{(1:M)} = Min\big(t_{total}^{(1:M)}\big) = Min\Big(\frac{1}{2}\sum_{i=1}^{N} s_i f_i\Big) = \frac{1}{2}\sum_{i=1}^{N}\frac{f_i}{p_i} = \frac{1}{2}\sum_{i=1}^{N}\frac{f_i}{a\sqrt{f_i / l_i}} = \frac{1}{2a}\sum_{i=1}^{N}\sqrt{f_i l_i} = \frac{1}{2M}\Big(\sum_{i=1}^{N}\sqrt{f_i l_i}\Big)\Big(\sum_{i=1}^{N}\sqrt{\frac{f_i}{l_i}}\Big)$$
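The variable-size bound can be evaluated in the same way as the equal-size bound of Lemma 1. The sketch below follows the reconstruction of Lemma 5 given above and is illustrative only; `min_avg_access_time_var` and its parameters are hypothetical names.

```python
import math


def min_avg_access_time_var(freqs, lengths, num_channels):
    """Lemma 5 bound: (1 / (2M)) * (sum sqrt(f_i * l_i)) * (sum sqrt(f_i / l_i)).

    freqs:   access frequencies f_i (assumed to sum to 1).
    lengths: item lengths l_i, in time slots, in the same order as freqs.
    """
    weighted = sum(math.sqrt(f * l) for f, l in zip(freqs, lengths))
    scaled = sum(math.sqrt(f / l) for f, l in zip(freqs, lengths))
    return weighted * scaled / (2 * num_channels)
```

When every length equals 1, both sums reduce to $\sum_i \sqrt{f_i}$ and the expression coincides with Lemma 1.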



According to Lemma 2 and Lemma 5, the minimal average access time for allocating data items on the $i$-th to $j$-th channels can be formulated as

$$t_{\min}^{(i:j)} = \frac{1}{2(j-i+1)} \left( \sum_{k} \sqrt{\frac{f_k}{F} l_k} \right) \left( \sum_{k} \sqrt{\frac{f_k}{F} l_k} \right)$$

where the sums range over the data items $d_k$ to be allocated on the $i$-th to $j$-th channels and $F$ denotes the sum of the access frequencies of these data items.
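The formula for a range of channels can be evaluated in the same way. The sketch below is illustrative only: `min_access_time_range` and its arguments are hypothetical names, and the caller is assumed to pass exactly the frequencies and lengths of the items destined for the $i$-th to $j$-th channels.

```python
import math

def min_access_time_range(freqs, lengths, first_channel, last_channel):
    """t_min^(i:j) for the items assigned to channels i..j (1-based, inclusive)."""
    channels = last_channel - first_channel + 1
    total_freq = sum(freqs)  # F: summed access frequency of these items
    root_sum = sum(math.sqrt(f / total_freq * l) for f, l in zip(freqs, lengths))
    return root_sum ** 2 / (2 * channels)

# Two hypothetical unit-length items with equal frequencies on channels 2..3.
t_23 = min_access_time_range([0.2, 0.2], [1.0, 1.0], first_channel=2, last_channel=3)
```

When the range covers all M channels and the frequencies sum to 1, the function returns the same value as the bound in Lemma 5.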



After calculating the minimum average access time for data items with variable sizes, we need to decide the range of the data items which can be allocated on each channel. To deal with data items with variable sizes, the data items are first sorted in decreasing order by their corresponding $f_i / l_i$. As mentioned in [VH99], the average access time can be minimized if each data item is equally spaced and, for any two data items $d_i$ and $d_j$, $p_i / p_j = \sqrt{f_i l_j} / \sqrt{f_j l_i}$. That is, the higher $f_i / l_i$ is, the more frequently the data item should be broadcast. Assume $Group_k$ denotes the set of data items allocated on channel $k$. Let $\|Group_k\|$ denote the broadcast length of $Group_k$, i.e., $\|Group_k\| = \sum_{d_i \in Group_k} l_i$. For the optimal broadcast program, the data items with a higher $f_i / l_i$ will be put into the Group with a smaller $\|Group\|$. Therefore, we get $\|Group_1\| \le \|Group_2\| \le \dots \le \|Group_M\|$.
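The sorting step and the ordering of the group lengths can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the tuple layout `(name, frequency, length)`, the helper names, and the sample items are all hypothetical.

```python
def sort_by_density(items):
    """Sort (name, frequency, length) tuples by f / l, largest first."""
    return sorted(items, key=lambda item: item[1] / item[2], reverse=True)

def group_lengths(sorted_items, cut_points):
    """Split the sorted list at cut_points and return every ||Group_k||."""
    bounds = [0, *cut_points, len(sorted_items)]
    return [
        sum(length for _, _, length in sorted_items[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
    ]

def is_nondecreasing(values):
    """True when ||Group_1|| <= ||Group_2|| <= ... holds for the given lengths."""
    return all(x <= y for x, y in zip(values, values[1:]))

# Hypothetical items: (name, access frequency, length).
items = [("c", 0.2, 2), ("a", 0.4, 1), ("d", 0.1, 3), ("b", 0.3, 1)]
ordered = sort_by_density(items)
lengths_ok = group_lengths(ordered, cut_points=[1, 2])   # groups: [a], [b], [c, d]
lengths_bad = group_lengths(ordered, cut_points=[3])     # groups: [a, b, c], [d]
```

Here the first partition satisfies the ordering of group lengths, while the second puts too much length on the first channel and can be discarded.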


According to the previous discussion, the range of the data items which can be allocated on each channel can be derived as follows.

Lemma 6. For an optimal solution of the partition problem,

$$\|Group_{i-1}\| \le \|Group_i\| \le \frac{TL - \sum_{j=1}^{i-1} \|Group_j\|}{M - i + 1}$$

where TL denotes the total length of the data items to be broadcast, i.e., $TL = \sum_{i=1}^{N} l_i$.

Proof. According to the previous discussion, we get $\|Group_{i-1}\| \le \|Group_i\|$. Moreover, $\frac{TL - \sum_{j=1}^{i-1} \|Group_j\|}{M - i + 1}$ is the mean length of the remaining data items to be allocated on the remaining channels. If $\|Group_i\| > \frac{TL - \sum_{j=1}^{i-1} \|Group_j\|}{M - i + 1}$, then certainly there is a $Group_j$ with $j > i$ such that $\|Group_j\| < \frac{TL - \sum_{j=1}^{i-1} \|Group_j\|}{M - i + 1}$, and hence $\|Group_j\| < \|Group_i\|$. This conflicts with the property discussed above. Therefore, we get that

$$\|Group_{i-1}\| \le \|Group_i\| \le \frac{TL - \sum_{j=1}^{i-1} \|Group_j\|}{M - i + 1}.$$
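The bound in Lemma 6 can be used to prune candidate allocations for a channel. The following sketch illustrates only the bound itself, not the complete partition algorithm; `group_length_bounds`, `candidate_counts`, and the sample lengths are hypothetical, and the items are assumed to be already sorted by $f_i / l_i$ and to have positive lengths.

```python
def group_length_bounds(total_length, previous_lengths, num_channels):
    """Return (lower, upper) bounds on ||Group_i|| from Lemma 6.

    previous_lengths holds ||Group_1||, ..., ||Group_{i-1}|| already fixed.
    """
    i = len(previous_lengths) + 1
    lower = previous_lengths[-1] if previous_lengths else 0
    upper = (total_length - sum(previous_lengths)) / (num_channels - i + 1)
    return lower, upper

def candidate_counts(remaining_lengths, total_length, previous_lengths, num_channels):
    """Numbers of leading items whose summed length satisfies both bounds."""
    lower, upper = group_length_bounds(total_length, previous_lengths, num_channels)
    counts, running = [], 0
    for count, length in enumerate(remaining_lengths, start=1):
        running += length
        if running > upper:
            break  # lengths are positive, so longer prefixes also exceed the bound
        if running >= lower:
            counts.append(count)
    return counts

# Hypothetical sorted item lengths with TL = 8 on M = 3 channels.
sorted_lengths = [1, 1, 2, 4]
first = candidate_counts(sorted_lengths, 8, [], 3)         # choices for channel 1
second = candidate_counts(sorted_lengths[1:], 8, [1], 3)   # channel 2 after ||Group_1|| = 1
```

Each returned count is the number of consecutive items that may be placed on the current channel without violating the lemma.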






5. Conclusion

In this paper, an approach for generating broadcast programs on multiple channels is proposed. In this approach, we determine the number of data items to be allocated on each channel so that the average access time of all data items is minimized. Simulation is performed to compare the performance of our approach with that of two other approaches. The experimental results show that our approach outperforms the others. In fact, the average access time of all data items incurred by the broadcast program generated by our approach is close to its lower bound. Moreover, we show that our approach can be easily extended to deal with data items with variable sizes.

There are many applications that allow clients to access multiple data items simultaneously in the broadcast channels. How to allocate these data items on multiple channels to minimize the average access time is a challenge. Moreover, how to generate the broadcast program to adapt to changing access frequencies is also a problem to be solved.

References

[AAF95] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, "Broadcast Disks: Data Management for Asymmetric Communication Environments," Proc. ACM International Conference on Management of Data, pages 199-210, May 1995.

[AFZ95] S. Acharya, M. Franklin, and S. Zdonik, "Dissemination-based Data Delivery Using Broadcast Disks," IEEE Personal Communications, 2(6), Dec. 1995.

[AFZ96a] S. Acharya, M. Franklin, and S. Zdonik, "Disseminating Updates on Broadcast Disks," Proc. VLDB Conference, pages 354-365, 1996.

[AFZ96b] S. Acharya, M. Franklin, and S. Zdonik, "Prefetching from a Broadcast Disk," Proc. IEEE International Conference on Data Engineering, pages 276-285, 1996.


[AK93] R. Alonso and H. Korth, "Database Systems in Nomadic Computing," Proc. ACM International Conference on Management of Data, pages 388-392, 1993.

[BGH92] T.F. Bowen, G. Gopal, G. Herman, T. Hickey, K.C. Lee, W.H. Mansfield, J. Raitz, and A. Weinrib, "The Datacycle Architecture," Communications of the ACM, pages 850-857, 1995.

[BNS00] A. Bar-Noy, J. Naor, and B. Schieber, "Pushing Dependent Data in Clients-Providers-Servers Systems," MOBICOM Conference, 2000.

[BS99] A. Bar-Noy and Y. Shilo, "Optimal Broadcasting of Two Files over an Asymmetric Channel," IEEE INFOCOM Conference, 1999.

[CK99] Y. D. Chung and M.-H. Kim, "QEM: A Scheduling Method for Wireless Broadcast Data," Proc. 6th International Conference on Database Systems for Advanced Applications (DASFAA), Hsinchu, Taiwan, April 1999.

[CHK99] Y. C. Chehadeh, A. R. Hurson, and M. Kavehrad, "Object Organization on a Single Broadcast Channel in the Mobile Computing Environment," Multimedia Tools and Applications, Vol. 9, No. 1, July 1999.

[GSE94] J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger, "Quickly Generating Billion-Record Synthetic Databases," Proc. ACM International Conference on Management of Data, pages 243-252, 1994.

[HC04] J.-L. Huang and M.-S. Chen, "Dependent Data Broadcasting for Unordered Queries in a Multiple Channel Mobile Environment," IEEE Transactions on Knowledge and Data Engineering, 2004.

[HCH00] A.R. Hurson, Y.C. Chehadeh, and J. Hannan, "Object Organization on Parallel Broadcast Channels in a Global Information Sharing Environment," Proc. IEEE International Performance, Computing, and Communications Conference, Feb. 2000.

[HCP03] J.-L. Huang, M.-S. Chen, and W.-C. Peng, "Broadcasting Dependent Data for Ordered Queries without Replication in a Multi-Channel Mobile Environment," Proc. IEEE International Conference on Data Engineering, Mar. 5-8, 2003.

[HGL87] G. Herman, G. Gopal, K.C. Lee, and A. Weinrib, "The Datacycle Architecture for Very High Throughput Database Systems," Proc. ACM International Conference on Management of Data, pages 97-103, 1987.

[HLC01] Chih-Hao Hsu, Guanling Lee, and A.L.P. Chen, "A Near Optimal Algorithm for Generating Broadcast Programs on Multiple Channels," ACM CIKM 2001 (Tenth International Conference on Information and Knowledge Management).

[HLC02] Chih-Hao Hsu, Guanling Lee, and Arbee L.P. Chen, "Index and Data Allocation on Multiple Broadcast Channels Considering Data Access Frequencies," International Conference on Mobile Data Management, 2002.

[HV99] S. Hameed and N. Vaidya, "Efficient Algorithms for Scheduling Data Broadcast," ACM/Baltzer Wireless Networks, 5(3):183-193, 1999.

[IB93] T. Imielinski and B.R. Badrinath, "Data Management for Mobile Computing," SIGMOD Record, 22(1):34-39, 1993.

[LC00] S.C. Lo and A. L. P. Chen, "Optimal Index and Data Allocation in Multiple Broadcast Channels," Proc. IEEE International Conference on Data Engineering, pages 293-302, 2000.

[LLC02] Guanling Lee, S.C. Lo, and A.L.P. Chen, "Data Allocation on the Wireless Broadcast Channel for Efficient Query Processing," IEEE Trans. on Computers, Special Section on Data Management Systems and Mobile Computing, Vol. 51, pages 1237-1252, October 2002.

[LL03] Guanling Lee and Shou-Chih Lo, "Broadcast Data Allocation for Efficient Access of Multiple Data Items in Mobile Environments," ACM/Baltzer Mobile Networks and Applications (MONET), Vol. 8, pages 365-375, August 2003.



[LYL02] Guanling Lee, Meng-Shin Yeh, Shou-Chih Lo, and Arbee L.P. Chen, "A Strategy for Efficient Access of Multiple Data Items in Mobile Environments," International Conference on Mobile Data Management, 2002.

[PC00] W.C. Peng and M.S. Chen, "Dynamic Generation of Data Broadcast Programs for a Broadcast Disk Array in a Mobile Computing Environment," Proc. ACM International Conference on Information and Knowledge Management, Nov. 2000.

[PHO00] Kiran Prabhakra, Kien A. Hua, and JungHwan Oh, "Multi-Level Multi-Channel Air Cache Designs for Broadcasting in a Mobile Environment," Proc. IEEE International Conference on Data Engineering, pages 167-176, 2000.

[PS98] E. Pitoura and G. Samaras, "Data Management for Mobile Computing," Kluwer Academic Publishers, 1998.

[SV96] N. Shivakumar and S. Venkatasubramanian, "Energy-Efficient Indexing For Information Dissemination In Wireless Systems," ACM Journal of Wireless and Nomadic Application, 1996.

[TO98] K.L. Tan and B.C. Ooi, "Batch Scheduling for Demand-driven Servers in Wireless Environment," Information Sciences, 109:281-298, 1998.

[VH99] N. Vaidya and S. Hameed, "Scheduling Data Broadcast in Asymmetric Communication Environments," ACM/Baltzer Wireless Networks, 5(3):171-182, 1999.

[Won88] J.W. Wong, "Broadcast Delivery," Proc. of the IEEE, 76(12):1566-1577, 1988.

[XZL03] J. L. Xu, B. Zheng, W.-C. Lee, and D. K. Lee, "Energy Efficient Index for Querying Location-Dependent Data in Mobile Broadcast Environments," Proc. 19th International Conference on Data Engineering, March 2003.

[YNO02] W.G. Yee, S.B. Navathe, E. Omiecinski, and C. Jermaine, "Efficient Data Allocation Over Multiple Channels at Broadcast Servers," IEEE Trans. on Computers, pages 1231-1236, 2002.

[ZXL04] B. Zheng, J. Xu, W.-C. Lee, and D. L. Lee, "Energy-Conserving Air Indexes for Nearest Neighbor Search," Proc. 9th International Conference on Extending Database Technology, March 2004.