An Efficient Algorithm for Near Optimal Data Allocation on
Multiple Broadcast Channels
Chih-Hao Hsu+, Guanling Lee# and Arbee L.P. Chen*
+Department of Computer Science
National Tsing Hua University
Hsinchu, Taiwan 300, R.O.C.
#Department of Computer Science and Information Engineering
National Dong Hwa University
Hualien, Taiwan 973, R.O.C.
*Department of Computer Science
National Chengchi University
Taipei, Taiwan, R.O.C.
Abstract
In a wireless environment, the bandwidth of the channels and the energy of the portable devices are limited. Data broadcast has become an excellent method for efficient data dissemination. In this paper, the problem of generating a broadcast program for a set of data items with associated access frequencies on multiple channels is explored. In our approach, a minimal expected average access time of the broadcast data items is first derived. The broadcast program that minimizes this minimal expected average access time is then generated. Simulation is performed to compare the performance of our approach with that of two existing approaches. The experimental results show that our approach outperforms the others and is in fact close to the optimal.
Keywords: Wireless Environment, Multiple Broadcast Channels, Data Allocation, Broadcast Program
1. Introduction
With the development of wireless technologies, people can now access information anytime, anywhere via wireless communications. However, unlike traditional wired networks, the wireless environment raises some issues that should be considered. First, the bandwidth of the wireless network and the energy available to portable devices are limited. Second, the environment is asymmetric, that is, from the power consumption point of view, sending data is more costly than receiving data for a portable computer. Due to these issues, the traditional request-response system [TO98] is no longer suitable for data dissemination in the wireless environment. Therefore, data dissemination in the wireless environment has become an interesting research problem [AK93][IB93][PS98].
Broadcast-based information systems disseminate information at a cost independent of the number of clients, which compensates for the limited bandwidth in the wireless environment. Moreover, the clients can retrieve the broadcast data by just tuning to the broadcast channel, which results in a certain degree of energy saving. Therefore, data broadcast has become an attractive solution for information dissemination. However, in a broadcast-based system, the clients have to access data items in the broadcast channel sequentially. Therefore, how to allocate data items in the broadcast channel for efficient data access becomes an important issue.
To evaluate the efficiency of data access, access time is used. Access time is the time elapsed from the moment a client first tunes into the broadcast channel to the moment the desired data are acquired. In a broadcast-based system, a broadcast program needs to be constructed to determine the order of data items to be broadcast. The main issue in generating a broadcast program is to minimize the average access time in order to save bandwidth and energy in a mobile computing system.
Many researchers focus on generating broadcast programs for a single broadcast channel. In [BGH92][HGL87], the server uniformly broadcasts each requested data item. However, in fact, some data items are more frequently accessed than others. Acharya et al. [AAF95] propose the concept of broadcast disks, in which all data items are partitioned into several groups such that the groups containing data items with higher access frequencies have shorter broadcast periods. As a result, the average access time decreases. The performance of broadcast disks is further improved in [AFZ95][AFZ96a][AFZ96b]. Moreover, approaches for broadcasting variable-sized data items are proposed in [HV99][VH99]. The problem of broadcasting location-dependent data is proposed and discussed in [XZL03][ZXL04].
In recent years, many researchers have focused on generating a broadcast program for efficient access to multiple data items. In [BS99], a method for finding the optimal broadcast program for two dependent files is proposed. A lower bound on the average access time of the optimal broadcast program for queries which require only two data items is derived in [BNS00]. In [CK99], a scheduling method for answering multiple-data-item queries where there is no access order constraint among the required data items is presented. The broadcast order is decided by a greedy method based on the frequencies of queries. Based on [CK99], [LYL02][LL03] propose a more efficient algorithm to solve this problem. In [CHK99], a scheduling method for answering dependent-data-item queries is discussed. The broadcast order is decided by a set of heuristic rules. In [LLC02], an efficient algorithm to generate a broadcast program for accessing dependent data items is proposed. In the proposed method, frequently co-accessed data are not only allocated close to each other, but also in a particular order which optimizes the performance of query processing.
The issue of data allocation on multiple channels has been widely discussed recently. Shivakumar et al. [SV96] extend the Alphabetic Huffman Tree to a k-nary search tree and allocate this index tree to multiple channels. However, it is inflexible because the number of channels must equal the height of the tree. Lo and Chen [LC00] propose a solution for optimal index and data allocation, which minimizes the average access time for any number of broadcast channels. In [HLC02], the concept of broadcast disks is used to allocate the data and index on multiple broadcast channels. In [HCH00], the issue of allocating dependent data on multiple channels is discussed. A heuristic algorithm is proposed to cluster related data items to minimize the average access time. Furthermore, the problems of broadcasting dependent data with data replication and dependent data access on multiple channels are considered in [HC04] and [HCP03], respectively.
In fact, the concept of broadcast disks can be used to generate broadcast programs on multiple channels. That is, the data items in each group can be allocated to a channel, where the groups containing data items with higher access frequencies have fewer data items such that the average access time for these data items is reduced. Peng and Chen [PC00] construct a channel allocation tree with variant fan-out and propose a heuristic algorithm $VF^K$ to generate a broadcast program. This approach only works well when the number of channels is a power of 2. The reason is as follows. $VF^K$ partitions one channel into two channels to minimize the average access time of these two channels. However, this partition only achieves a local optimum. In [HLC01], a near optimal algorithm for allocating data items of the same size on multiple broadcast channels is proposed. In [YNO02], an approximation algorithm for generating broadcast programs on multiple channels is proposed. Our approach considers allocating data items in a channel with the goal of minimizing the average access time for all data items. Moreover, we also show that our work can be easily extended to deal with data items with variable sizes. The rest of this paper is organized as follows. The problem of generating broadcast programs on multiple broadcast channels is formulated in Section 2. In Section 3, the technique for generating broadcast programs on multiple channels is proposed. The performance analysis is given in Section 4. Finally, in Section 5, the conclusion and future work are presented.
2. Problem Description
2.1 Preliminaries
In a broadcast-based information system, the server generates a broadcast program and periodically broadcasts the data items accordingly on the broadcast channels. The periodic broadcast forms a broadcast cycle. It is possible for a data item to appear more than once in a broadcast cycle. We assume that the data items are of equal size. Therefore, the broadcast bandwidth needed to allocate a data item is the same for every item, and is denoted as a time slot. Each data item $d_i$ has a corresponding access frequency $f_i$, which denotes the probability that data item $d_i$ is needed by the clients. Moreover, $\sum_{i=1}^{N} f_i = 1$, where $N$ denotes the number of data items to be broadcast. The broadcast program is generated according to this probability distribution. An instance of a data item is defined as an appearance of the data item on the broadcast channel. When the distance between any two consecutive instances of data item $d_i$ is the same, we say $d_i$ is equally spaced with the distance $s_i$. The reciprocal of $s_i$ is denoted as $p_i$, which is the probability that $d_i$ will be selected for broadcast in each time slot. The average access time for each data item $d_i$ is denoted $t_i$. Also, $t_{total}^{(1:M)}$ is the average access time for all data items, that is,
\[ t_{total}^{(1:M)} = \sum_{i=1}^{N} f_i t_i . \]
Wong [Won88] shows that, for data items of equal size, the average access time is minimized if each data item is equally spaced and, for any two data items $d_i$ and $d_j$, $p_i / p_j = \sqrt{f_i} / \sqrt{f_j}$. In the single broadcast channel environment, $\sum_{i=1}^{N} p_i = 1$. It is easy to show that $\sum_{i=1}^{N} p_i = M$ in the M broadcast channels environment. According to this property, the minimal average access time of all data items on multiple channels can be derived as follows:
Lemma 1. Assume that each data item is equally spaced. The minimal average access time for all data items on M channels, denoted $t_{min}^{(1:M)}$, is given by
\[ t_{min}^{(1:M)} = \frac{1}{2M} \left( \sum_{i=1}^{N} \sqrt{f_i} \right)^2 . \]
Proof. With the assumption that each data item is equally spaced, the average access time of data item $d_i$ is $s_i/2$. Therefore, the average access time of all data items is
\[ t_{total}^{(1:M)} = \frac{1}{2} \sum_{i=1}^{N} s_i f_i . \]
According to the property shown in [Won88], the average access time of all data items is minimized if $p_i / p_j = \sqrt{f_i} / \sqrt{f_j}$, that is, $p_i = a \sqrt{f_i}$ where $a$ is a constant. Moreover, since $\sum_{i=1}^{N} p_i = M$, we get $a = M / \sum_{i=1}^{N} \sqrt{f_i}$. Therefore, the minimal average access time on M broadcast channels is
\[ t_{min}^{(1:M)} = Min\left( t_{total}^{(1:M)} \right) = Min\left( \frac{1}{2} \sum_{i=1}^{N} s_i f_i \right) = \frac{1}{2} \sum_{i=1}^{N} \frac{f_i}{p_i} = \frac{1}{2} \sum_{i=1}^{N} \frac{f_i \sum_{j=1}^{N} \sqrt{f_j}}{M \sqrt{f_i}} = \frac{1}{2M} \left( \sum_{i=1}^{N} \sqrt{f_i} \right)^2 . \]
■
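The bound in Lemma 1 is easy to evaluate numerically. The following is a minimal Python sketch, not part of the original paper; the function name `lower_bound` is chosen here for illustration only.

```python
import math

def lower_bound(freqs, num_channels):
    """Lemma 1: (1 / (2M)) * (sum_i sqrt(f_i)) ** 2 for frequencies summing to 1."""
    root_sum = sum(math.sqrt(f) for f in freqs)
    return root_sum ** 2 / (2 * num_channels)

# Four equally likely items: on one channel the bound equals half the cycle length.
uniform = [0.25] * 4
print(lower_bound(uniform, 1))  # 2.0
print(lower_bound(uniform, 2))  # 1.0
```

For uniform frequencies the bound coincides with a plain round-robin cycle, which is a quick sanity check on the formula.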
In Lemma 1, the minimal average access time for all data items to be allocated on the 1st to Mth channels is derived. In general, the minimal average access time for allocating data items on the ith to jth channels can be formulated as follows.
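Lemma 2. Assume the summation of the access frequencies of the data items to be allocated on the ith to jth channels is F. The minimal average access time of these data items is given by
\[ t_{min}^{(i:j)} = \frac{1}{2(j-i+1)} \left( \sum_{k} \sqrt{\frac{f_k}{F}} \right)^2 , \]
where the sum ranges over the data items $d_k$ to be allocated on these channels.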
Proof. In Lemma 1, the summation of the access frequencies of all data items is equal to 1. When the summation of the access frequencies of the data items is F, the access frequency of each data item has to be divided by F so that the summation of the normalized access frequencies is equal to 1. Moreover, the number of channels is $j-i+1$. Therefore, the equation of the minimal average access time shown in Lemma 1 can be transformed into
\[ t_{min}^{(i:j)} = \frac{1}{2(j-i+1)} \left( \sum_{k} \sqrt{\frac{f_k}{F}} \right)^2 . \]
■
The minimal average access time is based on the assumption that each data item is equally spaced. However, in most cases, it is difficult to generate this kind of broadcast program. For example, assume that $p_1 = 1/2$, $p_2 = 1/3$ and $p_3 = 1/6$. It is impossible to generate a broadcast program that broadcasts data item 1 exactly every two time slots, data item 2 exactly every three time slots and data item 3 exactly every six time slots on one channel: data item 1 occupies every slot of one parity, while two consecutive instances of data item 2 always fall on slots of different parity, so the two items must collide. Therefore, the minimal average access time is a lower bound for a broadcast program. In Section 3, a heuristic algorithm will be proposed to generate a near optimal broadcast program on multiple channels.
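The infeasibility of the spacings 2, 3 and 6 on a single channel can be checked by brute force. The sketch below is an illustration added here, not the authors' code; the function name `equally_spaced_schedule_exists` is hypothetical. It tries every combination of start offsets within one cycle and reports whether any of them avoids slot collisions.

```python
import math
from itertools import product

def equally_spaced_schedule_exists(spacings):
    """Return True if every item can repeat at its exact spacing on one channel."""
    # The schedule repeats after the least common multiple of the spacings.
    cycle = 1
    for s in spacings:
        cycle = cycle * s // math.gcd(cycle, s)
    for offsets in product(*(range(s) for s in spacings)):
        used = set()
        needed = 0
        for offset, s in zip(offsets, spacings):
            slots = range(offset, cycle, s)
            needed += len(slots)
            used.update(slots)
        if len(used) == needed:  # no slot is claimed by two items
            return True
    return False

print(equally_spaced_schedule_exists([2, 3, 6]))  # False
print(equally_spaced_schedule_exists([2, 4, 4]))  # True
```

The second call shows a feasible case for contrast: spacings 2, 4 and 4 fit into a cycle of four slots.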
2.2 Problem Formulation
[Figure 1: Generating broadcast program on multiple channels. (a) Three groups for the partition problem: Group 1 holds data item 1 (access frequency 0.4), Group 2 holds data items 2 and 3 (0.15 each), and Group 3 holds data items 4, 5 and 6 (0.1 each). (b) Channel allocation corresponding to (a): Channel 1 repeats item 1, Channel 2 repeats items 2 and 3, and Channel 3 repeats items 4, 5 and 6.]
In the multiple channels environment, generating broadcast programs can be treated as a partition problem. That is, the data items can be partitioned into groups according to the number of channels, which are then broadcast on the respective channels. For example, assume there are six data items and three broadcast channels as shown in Figure 1. The data items are partitioned into three groups, Group 1, Group 2 and Group 3, and broadcast on Channel 1, Channel 2 and Channel 3, respectively. In the broadcast-based environment, as the number of data items in a broadcast channel increases, the average access time of these data items also increases. Therefore, the data items with higher access frequencies have to be allocated to a channel containing fewer data items so that the average access time is minimized. The problem of generating broadcast programs on multiple channels is formulated as follows:
Problem of generating broadcast programs on multiple channels: Given M channels and a set of data items, each data item is associated with an access frequency, which represents the probability that the data item is needed by the clients. Our problem is to partition the data items into M groups and allocate the data items in each group to an individual channel, such that the average access time for all data items is minimized. The average access time of a broadcast program can be analyzed as follows:
Definition
$M$: The number of channels.
$Group_i$: The set of data items in group $i$, where $i \in \{1, 2, \ldots, M\}$.
$|Group_i|$: The number of data items in $Group_i$.
$Data_{ij}$: The $j$th data item in $Group_i$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, |Group_i|\}$.
$f_{Data_{ij}}$: The access frequency of $Data_{ij}$.
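Assume the clients tune into the broadcast channel at random. The average access time of the data items in channel $i$, denoted $t_{total}^{i}$, can be derived as follows:
\[ t_{total}^{i} = \frac{1}{|Group_i|} \int_{0}^{|Group_i|} t \, dt = \frac{1}{|Group_i|} \left[ \frac{1}{2} t^2 \right]_{0}^{|Group_i|} = \frac{|Group_i|}{2} . \]
Therefore, the average access time of all data items, denoted $t_{total}$, is
\[ t_{total} = \sum_{i=1}^{N} f_i t_i = \sum_{i=1}^{M} \left( t_{total}^{i} \sum_{j=1}^{|Group_i|} f_{Data_{ij}} \right) = \frac{1}{2} \sum_{i=1}^{M} \left( |Group_i| \sum_{j=1}^{|Group_i|} f_{Data_{ij}} \right) . \]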
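As a small worked instance of this cost, the sketch below evaluates the partition drawn in Figure 1 (frequencies 0.4; 0.15, 0.15; 0.1, 0.1, 0.1). It is an illustration written for this text, and the helper name `average_access_time` is an assumption rather than notation from the paper.

```python
def average_access_time(groups):
    """groups[i] lists the access frequencies broadcast on channel i + 1.

    Each channel contributes |Group_i| / 2 weighted by the total frequency
    of the items it carries.
    """
    return sum(len(group) / 2 * sum(group) for group in groups)

figure1 = [[0.4], [0.15, 0.15], [0.1, 0.1, 0.1]]
print(round(average_access_time(figure1), 4))  # 0.95
```

The three channels contribute 0.2, 0.3 and 0.45 respectively, which illustrates why popular items are placed on short channels.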
[Figure 2: Partition problem for generating broadcast program on multiple channels. The data items, ordered from high to low access frequency, are partitioned into Group 1, Group 2, Group 3, ..., Group M.]
3. Allocating Data Items on Multiple Channels
In this section, a heuristic algorithm is proposed to generate a near optimal broadcast program on multiple channels. As shown in Figure 2, data items are first sorted in descending order according to their access frequencies. The algorithm allocates the data items to the channels according to this order. The number of data items to be allocated to a channel is determined as follows. Assume the first $i-1$ channels have been allocated and we are deciding the number of data items to be allocated to the $i$th channel. Given a certain number of data items to be allocated to the $i$th channel, we can compute the minimal expected average access time (denoted MEAAT) of all data items. By computing the MEAAT for each number in a certain range, the number with the minimal MEAAT is selected as the number of data items to allocate to the $i$th channel. In Subsection 3.1, we derive the equation to compute the MEAAT, and in Subsection 3.2, we derive the range of possible numbers for which the MEAAT is computed.
3.1 Computing the MEAAT
Assume the first $i-1$ channels have been allocated with the data items in $Group_1$ to $Group_{i-1}$, respectively. The average access time of the data items in these channels is denoted $t_{total}^{(1:i-1)}$. Given the number of data items to be allocated to the $i$th channel, $|Group_i|$, the MEAAT of all data items can be computed by
\[ T_{min}^{i} = t_{total}^{(1:i-1)} \sum_{j=1}^{i-1} \sum_{k=1}^{|Group_j|} f_{Data_{jk}} + \frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}} + t_{min}^{(i+1:M)} \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} f_{Data_{jk}} , \]
where $t_{min}^{(i+1:M)}$ denotes the minimal average access time of the data items allocated on the $(i+1)$th to $M$th channels. By Lemma 2,
\[ t_{min}^{(i+1:M)} = \frac{1}{2(M-i)} \left( \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} \sqrt{ \frac{f_{Data_{jk}}}{\sum_{m=i+1}^{M} \sum_{n=1}^{|Group_m|} f_{Data_{mn}}} } \right)^2 , \]
we get
\[ T_{min}^{i} = t_{total}^{(1:i-1)} \sum_{j=1}^{i-1} \sum_{k=1}^{|Group_j|} f_{Data_{jk}} + \frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}} + \frac{\sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}}{2(M-i)} \left( \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} \sqrt{ \frac{f_{Data_{jk}}}{\sum_{m=i+1}^{M} \sum_{n=1}^{|Group_m|} f_{Data_{mn}}} } \right)^2 \]
\[ = t_{total}^{(1:i-1)} \sum_{j=1}^{i-1} \sum_{k=1}^{|Group_j|} f_{Data_{jk}} + \frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}} + \frac{1}{2(M-i)} \left( \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} \sqrt{f_{Data_{jk}}} \right)^2 . \]
■
Essentially, the MEAAT $T_{min}^{i}$ consists of three parts: $t_{total}^{(1:i-1)}$, $|Group_i|/2$ and $t_{min}^{(i+1:M)}$. $t_{total}^{(1:i-1)}$ is the average access time of the data items in the first $i-1$ channels. $|Group_i|/2$ is the average access time of the data items in the $i$th channel. And $t_{min}^{(i+1:M)}$ is the minimal average access time of the data items in the $(i+1)$th to $M$th channels. The three values, each weighted by the total access frequency of the corresponding data items, are used to estimate the minimal average access time of all data items in order to determine the allocation of data items in the $i$th channel.
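The three-part estimate can be written as a short function. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the name `meaat` and its parameters are hypothetical, `prior_cost` stands for the already weighted cost of the first i - 1 channels, and the frequencies are assumed to be sorted in descending order.

```python
import math

def meaat(freqs, prior_cost, start, size, channels_left):
    """Estimate T_min^i when `size` items, freqs[start:start + size], go to channel i.

    prior_cost    -- weighted access-time cost of the channels already filled
    channels_left -- number of channels after the current one (M - i, at least 1)
    """
    current = freqs[start:start + size]
    rest = freqs[start + size:]
    current_cost = size / 2 * sum(current)
    # Lemma 2 weighted by the frequency mass of the remaining items.
    rest_cost = sum(math.sqrt(f) for f in rest) ** 2 / (2 * channels_left)
    return prior_cost + current_cost + rest_cost

# Rounded frequencies of the twelve items used in the paper's example (Table 1).
table1 = [0.212, 0.136, 0.114, 0.091, 0.083, 0.076,
          0.068, 0.068, 0.061, 0.053, 0.03, 0.008]
print(meaat(table1, 0.0, 0, 2, 3))  # close to the 1.3590 reported in Table 2(a)
```

Because Table 1 lists rounded frequencies, the printed value differs from the paper's figure in the third decimal place.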
3.2 Deciding the Range
[Figure 3: A hierarchical broadcast program for M channels. Channel 1 repeatedly broadcasts Group 1, Channel 2 repeatedly broadcasts Group 2, Channel 3 repeatedly broadcasts Group 3, ..., and Channel M broadcasts Group M; groups on lower-numbered channels are shorter and therefore repeat more often.]
As shown in Figure 3, in our approach, a hierarchical broadcast program is generated and the data items in each group are allocated to the respective channel. The hierarchical broadcast program has the following properties:
Lemma 3. For an optimal solution of the partition problem, $|Group_1| \le |Group_2| \le \ldots \le |Group_M|$.
Proof. Assume that a given partition is an optimal solution for the partition problem and $|Group_i| > |Group_j|$ for $i < j$. We can move the data item $d_k$ with the lowest access frequency in $Group_i$ to $Group_j$ so that $|Group_i|$ decreases by 1 and $|Group_j|$ increases by 1. Consequently, the average access time for the data items in $Group_i$ decreases by $\frac{1}{2} \sum_{k=1}^{|Group_i|} f_{Data_{ik}}$ and the average access time for the data items in $Group_j$ increases by $\frac{1}{2} \sum_{k=1}^{|Group_j|} f_{Data_{jk}} + \left( \frac{|Group_j|+1}{2} - \frac{|Group_i|}{2} \right)$, where $\frac{|Group_j|+1}{2} - \frac{|Group_i|}{2}$ is the change of the average access time for $d_k$. Because $|Group_i| > |Group_j|$, we can get $\frac{|Group_j|+1}{2} - \frac{|Group_i|}{2} \le 0$. Moreover, in our approach, the data items allocated in $Group_i$ have higher access frequencies than those allocated in $Group_j$, so we get $\frac{1}{2} \sum_{k=1}^{|Group_i|} f_{Data_{ik}} > \frac{1}{2} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$, that is, the average access time of all data items decreases. As a result, the given partition is not an optimal solution for the partition problem. Therefore, for an optimal solution of the partition problem, $|Group_1| \le |Group_2| \le \ldots \le |Group_M|$.
■
According to Lemma 3, the range of the number of data items in each channel can be derived as
follows.
Lemma 4. For an optimal solution of the partition problem,
\[ |Group_{i-1}| \le |Group_i| \le \frac{N - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1} . \]
Proof. According to Lemma 3, we get $|Group_{i-1}| \le |Group_i|$. And $\frac{N - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$ is the mean number of data items per channel for the remaining data items to be allocated on the remaining channels. If $|Group_i| > \frac{N - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$, certainly there is a group $j$ where $j > i$ such that $|Group_j| < \frac{N - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$. This conflicts with the property shown in Lemma 3. Therefore, we get that
\[ |Group_{i-1}| \le |Group_i| \le \frac{N - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1} . \]
■
According to Lemma 4, the range of the number of data items in each channel can be determined.
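The range of Lemma 4 translates directly into code. The sketch below is illustrative only: the function name `candidate_sizes` is hypothetical, the lower end for the first channel is assumed to be 1 (there is no Group 0), and the upper end is rounded down because a group size is an integer.

```python
def candidate_sizes(num_items, num_channels, channel, sizes_so_far):
    """Candidate values of |Group_i| for the 1-based channel index `channel`."""
    lower = sizes_so_far[-1] if sizes_so_far else 1      # assumption for channel 1
    remaining_items = num_items - sum(sizes_so_far)
    remaining_channels = num_channels - channel + 1
    upper = remaining_items // remaining_channels         # floor of the mean
    return list(range(lower, upper + 1))

# Twelve items on four channels, as in the example of Subsection 3.3.
print(candidate_sizes(12, 4, 1, []))      # [1, 2, 3]
print(candidate_sizes(12, 4, 2, [2]))     # [2, 3]
print(candidate_sizes(12, 4, 3, [2, 3]))  # [3]
```

The three calls reproduce the candidate columns of Tables 2(a), 2(b) and 2(c).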
3.3 The Heuristic Algorithm and Example
The heuristic algorithm to generate broadcast programs on multiple channels is presented as follows:
Algorithm
Input: The set of data items D = {d_1, d_2, …, d_N} and the corresponding access frequencies f_i; the number of channels, M.
Output: A broadcast program.
Begin
1. Sort all data items in descending order according to the corresponding access frequencies f_i.
2. For i = 1 to M − 1
   Begin
     Compute the range of the number of data items to be allocated in channel i.
     The number of data items to be allocated in channel i is the number in the range which minimizes the MEAAT of all data items.
   End
3. Allocate the remaining data items into the last channel.
End.
The following example is used to illustrate our algorithm. Table 1 shows the set of data items and their corresponding access frequencies. The data items are sorted in descending order according to the access frequencies. The number of channels available is assumed to be 4. As shown in Table 2(a), the number of data items to be allocated on channel 1 is determined first. According to Lemma 4, the range of the number of data items to be allocated on channel 1 is from 1 to 3. When $|Group_1| = 2$, the MEAAT is minimal (1.3590). Therefore, we allocate D1 and D2 on channel 1. Similarly, D3, D4 and D5 are allocated on channel 2 as shown in Table 2(b), and D6, D7 and D8 are allocated on channel 3 as shown in Table 2(c). Finally, the remaining data items are allocated on channel 4, which is also shown in Table 2(c). In Table 2(c), the average access time of all data items, $t_{total}^{(1:4)}$ = 0.3485 + 0.4318 + 0.3182 + 0.3030 = 1.4015, is also shown.
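The whole heuristic can be sketched in a few lines of Python. This is a minimal illustration written for this text, not the authors' code. The function name `allocate` is hypothetical, the number of items is assumed to be at least the number of channels, the lower end of the range for the first channel is assumed to be 1, and the input uses the rounded frequencies printed in Table 1.

```python
import math

def allocate(freqs, num_channels):
    """Greedy MEAAT heuristic: return (group sizes, average access time)."""
    freqs = sorted(freqs, reverse=True)
    n = len(freqs)
    sizes, start, prior_cost = [], 0, 0.0
    for channel in range(1, num_channels):            # channels 1 .. M-1
        lower = sizes[-1] if sizes else 1              # Lemma 4, lower end
        upper = (n - start) // (num_channels - channel + 1)
        best = None
        for size in range(lower, upper + 1):
            current = size / 2 * sum(freqs[start:start + size])
            rest = (sum(math.sqrt(f) for f in freqs[start + size:]) ** 2
                    / (2 * (num_channels - channel)))
            estimate = prior_cost + current + rest     # the MEAAT
            if best is None or estimate < best[0]:
                best = (estimate, size, current)
        _, size, current = best
        sizes.append(size)
        prior_cost += current
        start += size
    last = n - start                                   # step 3: last channel
    sizes.append(last)
    return sizes, prior_cost + last / 2 * sum(freqs[start:])

table1 = [0.212, 0.136, 0.114, 0.091, 0.083, 0.076,
          0.068, 0.068, 0.061, 0.053, 0.03, 0.008]
sizes, total = allocate(table1, 4)
print(sizes, round(total, 3))  # [2, 3, 3, 4] 1.402
```

The group sizes match Table 2; the total differs from the paper's 1.4015 only in the fourth decimal place because Table 1 lists rounded frequencies.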
Data Items           D1     D2     D3     D4     D5     D6     D7     D8     D9     D10    D11    D12
Access Frequencies   0.212  0.136  0.114  0.091  0.083  0.076  0.068  0.068  0.061  0.053  0.03   0.008
Table 1: Data items for the example
In each table, the first block lists the candidate values and the resulting MEAAT (* marks the minimum); the second block lists the allocation after the decision.

$|Group_1|$ :   1        2        3
$t_{total}^{(1:i-1)} \sum_{j=1}^{i-1} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$ :   0        0        0
$\frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}}$ :   0.1061   0.3485   0.6932
$t_{min}^{(i+1:M)} \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$ :   1.3363   1.0105   0.7528
$T_{min}^{1}$ :   1.4424   1.3590*  1.4460

Channel i :   1        2        3        4
$|Group_i|$ :   2        0        0        0
$\frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}}$ :   0.3485   0        0        0
(a) Determine the number of data items to be allocated on channel 1

$|Group_2|$ :   2        3
$t_{total}^{(1:i-1)} \sum_{j=1}^{i-1} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$ :   0.3485   0.3485
$\frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}}$ :   0.2045   0.4318
$t_{min}^{(i+1:M)} \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$ :   0.8345   0.5891
$T_{min}^{2}$ :   1.3875   1.3694*

Channel i :   1        2        3        4
$|Group_i|$ :   2        3        0        0
$\frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}}$ :   0.3485   0.4318   0        0
(b) Determine the number of data items to be allocated on channel 2

$|Group_3|$ :   3
$t_{total}^{(1:i-1)} \sum_{j=1}^{i-1} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$ :   0.7803
$\frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}}$ :   0.3182
$t_{min}^{(i+1:M)} \sum_{j=i+1}^{M} \sum_{k=1}^{|Group_j|} f_{Data_{jk}}$ :   0.3030
$T_{min}^{3}$ :   1.4015*

Channel i :   1        2        3        4
$|Group_i|$ :   2        3        3        4
$\frac{|Group_i|}{2} \sum_{j=1}^{|Group_i|} f_{Data_{ij}}$ :   0.3485   0.4318   0.3182   0.3030
(c) Determine the number of data items to be allocated on channels 3 and 4
Table 2: Generating a near optimal broadcast program on multiple channels
4. Performance Evaluation
In order to evaluate the performance of the proposed algorithm, a series of experiments is performed based on different broadcast data sets. In the simulation, we assume that all data items have the same size and that it takes one time unit to access a data item. The cost metric is the average access time of all data items. We compare the cost of our approach with that of two other algorithms, proposed in [PHO00] and [PC00]. In [PHO00], an approach called step broadcast is proposed. In step broadcast, the summation of the access frequencies of the data items is almost the same in each group. In [PC00], an algorithm $VF^K$ is proposed to construct a channel allocation tree with variant fan-out to minimize the average access time of all data items on multiple channels. First, the algorithm $VF^K$ attaches all data items to the root node. After that, some data items with smaller access frequencies are moved to the lower level so as to reduce the average access time of all data items. The partition is evaluated iteratively with the objective of minimizing the average access time of all data items until the depth of the channel allocation tree is equal to the number of channels.
4.1 Simulation Model
The following parameters are used to generate different broadcast data sets.
PARAMETERS
N: The number of data items to be broadcast.
M: The number of channels.
θ: The parameter of the Zipf distribution.

Parameters                   Default value   Ranges
Number of data items (N)     200             20 - 1000
Number of channels (M)       6               3 - [illegible]
Zipf parameter (θ)           [illegible]     0 - 0.99
Table 3. Parameter Settings
The parameter settings for our experiments are listed in Table 3. The access frequencies of the data items are generated based on the Zipf distribution [GSE94]. In the Zipf distribution, the access frequencies of the data items follow the 80/20 rule, that is, 80 percent of the clients are usually interested in 20 percent of the data items.
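The paper does not spell out the formula used to draw the access frequencies. A common parameterization in the data broadcast literature, assumed here purely for illustration, gives the item of rank i a weight proportional to (1/i)^θ and normalizes the weights to sum to 1. The function name `zipf_frequencies` is hypothetical.

```python
def zipf_frequencies(num_items, theta):
    """Assumed Zipf form: f_i proportional to (1 / i) ** theta, normalized."""
    weights = [(1.0 / rank) ** theta for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

flat = zipf_frequencies(5, 0.0)      # theta = 0 gives a uniform distribution
skewed = zipf_frequencies(5, 0.99)   # larger theta concentrates the mass
print(flat[0], skewed[0] > flat[0])
```

Under this assumption the generated list is already in descending order, which is the order the allocation heuristic expects.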
4.2 Performance Evaluation
4.2.1 Effect of the Number of Channels
In this simulation, the effect of the number of channels is considered. The result is shown in Figure 4. As shown in Figure 4(a), the average access time of all data items decreases as the number of channels increases. Intuitively, as the number of channels increases, the number of data items allocated in each channel decreases. Therefore, the average access time of all data items is reduced. In Figure 4(b), we show the ratio (average access time − lower bound) / (lower bound) of the three approaches. Our approach clearly outperforms the other two approaches. The reason is that our approach can predict the average access time of all data items in the partition operation. Therefore, we can allocate data items in each channel with the goal of minimizing the average access time for all data items. On the other hand, the $VF^K$ approach partitions one channel into two channels to minimize the average access time of these two channels. This partition only achieves a local optimum. Therefore, the $VF^K$ approach only works well when the number of channels is a power of 2.
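The ratio plotted in Figure 4(b) measures the relative gap to the lower bound of Lemma 1. As a small illustration (not taken from the paper's simulation), the sketch below applies it to the four-channel example of Subsection 3.3, whose average access time is 1.4015; the function names are hypothetical and the frequencies are the rounded values of Table 1.

```python
import math

def lower_bound(freqs, num_channels):
    """Lemma 1: (1 / (2M)) * (sum_i sqrt(f_i)) ** 2."""
    return sum(math.sqrt(f) for f in freqs) ** 2 / (2 * num_channels)

def ratio_to_lower_bound(average_access_time, freqs, num_channels):
    """(average access time - lower bound) / lower bound."""
    bound = lower_bound(freqs, num_channels)
    return (average_access_time - bound) / bound

table1 = [0.212, 0.136, 0.114, 0.091, 0.083, 0.076,
          0.068, 0.068, 0.061, 0.053, 0.03, 0.008]
print(ratio_to_lower_bound(1.4015, table1, 4))
```

For this small example the gap is a few percent of the lower bound.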
[Figure 4: Effect of the number of channels. (a) The average access time of our approach, $VF^K$, step broadcast and the lower bound (x-axis: number of channels; y-axis: average access time). (b) Comparison with the lower bound by the ratio (x-axis: number of channels; y-axis: ratio).]
4.2.2 Effect of the Node Number
Another factor that affects the performance of the broadcast program is the number of data items. The simulation result is shown in Figure 5. As the number of data items increases, the number of data items allocated in each channel increases. Intuitively, the time spent to access a data item also increases. The result shown in Figure 5(a) confirms this intuition. Similarly, compared with step broadcast and $VF^K$, our approach performs better. This is shown in Figure 5(b).
[Figure 5: Effect of the number of nodes. (a) The average access time of our approach, $VF^K$, step broadcast and the lower bound (x-axis: number of nodes; y-axis: average access time). (b) Comparison with the lower bound by the ratio (x-axis: number of nodes; y-axis: ratio).]
4.2.3 Effect of the Zipf Parameter
[Figure 6: Effect of the Zipf parameter. (a) The average access time of our approach, $VF^K$, step broadcast and the lower bound (x-axis: Theta; y-axis: average access time). (b) Comparison with the lower bound by the ratio (x-axis: Theta; y-axis: ratio).]
Figure 6 shows the effect of the Zipf parameter θ on the average access time of all data items for the three approaches. The Zipf parameter θ, ranging from 0 to 1, is used to adjust the degree of skew of the access frequencies. As θ increases, the access frequencies of the data items become increasingly skewed. Highly skewed access frequencies mean that a small number of data items are accessed frequently. This explains why the average access time of all data items, shown in Figure 6(a), decreases as θ tends to 1. The ratio of the three approaches is shown in Figure 6(b). Our approach outperforms the others. In contrast, step broadcast performs the worst when θ tends to 1. The reason is that in step broadcast, the summation of the access frequencies of the data items is almost the same in each group. As the Zipf parameter θ tends to 1, the access frequencies of the data items become skewed, that is, few data items hold high access frequencies. This results in a situation where many data items with low access frequencies are allocated in the same channel, so the average access time of all data items becomes worse.
4.3 An Extension to Data Items with Variable Sizes
The simulation shows that the performance of our approach is similar to that of the optimal one when the data items have the same size. Actually, our work can be easily extended to handle data items with variable sizes. Vaidya and Hameed [VH99] showed that, for data items with variable sizes, the average access time can be minimized if each data item is equally spaced and, for any two data items $d_i$ and $d_j$, $p_i / p_j = \sqrt{(f_i l_j) / (f_j l_i)}$, where $l_x$ denotes the length of $d_x$. According to this property, the minimal average access time $t_{min}^{(1:M)}$ can be extended to deal with data items with variable sizes as follows.
Lemma 5. Assume that each data item is equally spaced. The minimal average access time for all data items on M channels, denoted $t_{min}^{(1:M)}$, is given by
\[ t_{min}^{(1:M)} = \frac{1}{2M} \left( \sum_{i=1}^{N} \sqrt{\frac{f_i}{l_i}} \right) \left( \sum_{i=1}^{N} \sqrt{f_i l_i} \right) . \]
Proof. With the assumption that each data item is equally spaced, the average access time of data item $d_i$ is $s_i/2$. Therefore, the average access time of all data items is
\[ t_{total}^{(1:M)} = \frac{1}{2} \sum_{i=1}^{N} s_i f_i . \]
According to the property shown in [VH99], the average access time of all data items is minimized if $p_i / p_j = \sqrt{(f_i l_j) / (f_j l_i)}$, that is, $p_i = a \sqrt{f_i / l_i}$ where $a$ is a constant. Moreover, $\sum_{i=1}^{N} p_i = M$. We have $a = M / \sum_{i=1}^{N} \sqrt{f_i / l_i}$. Therefore, the minimal average access time on M broadcast channels is
\[ t_{min}^{(1:M)} = Min\left( t_{total}^{(1:M)} \right) = Min\left( \frac{1}{2} \sum_{i=1}^{N} s_i f_i \right) = \frac{1}{2} \sum_{i=1}^{N} \frac{f_i}{p_i} = \frac{1}{2} \sum_{i=1}^{N} \frac{f_i \sum_{j=1}^{N} \sqrt{f_j / l_j}}{M \sqrt{f_i / l_i}} = \frac{1}{2M} \left( \sum_{i=1}^{N} \sqrt{\frac{f_i}{l_i}} \right) \left( \sum_{i=1}^{N} \sqrt{f_i l_i} \right) . \]
■
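A short numerical sketch of the variable-size bound follows. It assumes the product form $\frac{1}{2M} (\sum_i \sqrt{f_i / l_i}) (\sum_i \sqrt{f_i l_i})$, which is how the garbled formula of Lemma 5 is read here; the function name `lower_bound_variable` is hypothetical.

```python
import math

def lower_bound_variable(freqs, lengths, num_channels):
    """Assumed Lemma 5 form: (1 / (2M)) * sum sqrt(f/l) * sum sqrt(f*l)."""
    per_length = sum(math.sqrt(f / l) for f, l in zip(freqs, lengths))
    times_length = sum(math.sqrt(f * l) for f, l in zip(freqs, lengths))
    return per_length * times_length / (2 * num_channels)

# With unit lengths the expression reduces to the equal-size bound of Lemma 1.
print(lower_bound_variable([0.25] * 4, [1, 1, 1, 1], 1))  # 2.0
```

The unit-length case is a useful consistency check, since both sums then equal $\sum_i \sqrt{f_i}$.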
According to Lemma 2 and Lemma 5, it is easy to see that the minimal average access time for allocating data items on the ith to jth channels can be formulated as
\[ t_{min}^{(i:j)} = \frac{1}{2(j-i+1)} \left( \sum_{k} \sqrt{\frac{f_k}{F \, l_k}} \right) \left( \sum_{k} \sqrt{\frac{f_k l_k}{F}} \right) , \]
where F denotes the summation of the access frequencies of the data items to be allocated on the ith to jth channels and the sums range over these data items.
After calculating the minimal average access time for data items with variable sizes, we need to decide the range of the data items which can be allocated on each channel. To deal with data items with variable sizes, the data items are first sorted in decreasing order of their corresponding $f_i / l_i$. As mentioned in [VH99], the average access time can be minimized if each data item is equally spaced and, for any two data items $d_i$ and $d_j$, $p_i / p_j = \sqrt{(f_i l_j) / (f_j l_i)}$. That is, the higher $f_i / l_i$ is, the more frequently the data item should be broadcast. Assume $Group_k$ denotes the set of data items allocated on channel k. Let $|Group_k|$ denote the broadcast length of $Group_k$, i.e., $|Group_k| = \sum_{d_i \in Group_k} l_i$. For the optimal broadcast program, the data items with a higher $f_i / l_i$ will be put into the group with a smaller $|Group|$. Therefore, we get $|Group_1| \le |Group_2| \le \ldots \le |Group_M|$. According to the previous discussion, the range of the data items which can be allocated in each channel can be derived as follows.
Lemma 6. For an optimal solution of the partition problem,
\[ |Group_{i-1}| \le |Group_i| \le \frac{TL - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1} , \]
where TL denotes the total length of the data items to be broadcast, i.e., $TL = \sum_{i=1}^{N} l_i$.
Proof. According to the previous discussion, we get $|Group_{i-1}| \le |Group_i|$. Moreover, $\frac{TL - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$ is the mean length of the remaining data items to be allocated in the remaining channels. If $|Group_i| > \frac{TL - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$, there must be a $Group_j$ with $j > i$ such that $|Group_j| < \frac{TL - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$, and hence $|Group_j| < |Group_i|$. This conflicts with the property discussed above. Therefore, we get $|Group_{i-1}| \le |Group_i| \le \frac{TL - \sum_{j=1}^{i-1} |Group_j|}{M - i + 1}$.
■
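The bound of Lemma 6 can be evaluated channel by channel. The sketch below is illustrative only: the function names and the tolerance parameter are assumptions, and it merely tests a given list of group lengths against the lemma rather than searching for the optimal partition.

```python
def lemma6_upper_bound(total_length, prior_group_lengths, num_channels):
    """Return (TL - sum_{j<i} |Group_j|) / (M - i + 1) for the next channel i."""
    i = len(prior_group_lengths) + 1
    if i > num_channels:
        raise ValueError("all channels have already been assigned")
    remaining = total_length - sum(prior_group_lengths)
    return remaining / (num_channels - i + 1)


def satisfies_lemma6(group_lens, tol=1e-9):
    """Check |Group_{i-1}| <= |Group_i| <= upper bound for every channel i."""
    total_length = sum(group_lens)
    num_channels = len(group_lens)
    for idx, current in enumerate(group_lens):
        lower = group_lens[idx - 1] if idx > 0 else 0
        upper = lemma6_upper_bound(total_length, group_lens[:idx], num_channels)
        if current < lower - tol or current > upper + tol:
            return False
    return True


# Illustrative data only: TL = 8 split over M = 3 channels.
bound_for_second_channel = lemma6_upper_bound(8, [1], 3)
ok_partition = satisfies_lemma6([1, 1, 6])
bad_partition = satisfies_lemma6([6, 1, 1])
```

Such a check can be used to discard candidate partitions whose early channels are longer than the mean length left for the remaining channels.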
5. Conclusion
In this paper, an approach for generating broadcast programs on multiple channels is proposed. In this approach, we determine the number of data items to be allocated in each channel so that the average access time of all data items is minimized. Simulation is performed to compare the performance of our approach with that of two other approaches. The experimental results show that our approach outperforms the others. In fact, the average access time of all data items incurred by the broadcast program generated by our approach is close to its lower bound. Moreover, we show that our approach can be easily extended to deal with data items of variable sizes.
There are many applications that allow clients to access multiple data items simultaneously in the broadcast channels. How to allocate these data items on multiple channels to minimize the average access time is a challenge. Moreover, how to generate a broadcast program that adapts to changing access frequencies is also a problem to be solved.
References
[AAF95] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, "Broadcast Disks: Data Management for Asymmetric Communication Environments," Proc. ACM International Conference on Management of Data, pages 199-210, May 1995.
[AFZ95] S. Acharya, M. Franklin, and S. Zdonik, "Dissemination-based Data Delivery Using Broadcast Disks," IEEE Personal Communications, 2(6), Dec. 1995.
[AFZ96a] S. Acharya, M. Franklin, and S. Zdonik, "Disseminating Updates on Broadcast Disks," Proc. VLDB Conference, pages 354-365, 1996.
[AFZ96b] S. Acharya, M. Franklin, and S. Zdonik, "Prefetching from a Broadcast Disk," Proc. IEEE International Conference on Data Engineering, pages 276-285, 1996.
[AK93] R. Alonso and H. Korth, "Database Systems in Nomadic Computing," Proc. ACM International Conference on Management of Data, pages 388-392, 1993.
[BGH92] T.F. Bowen, G. Gopal, G. Herman, T. Hickey, K.C. Lee, W.H. Mansfield, J. Raitz, and A. Weinrib, "The Datacycle Architecture," Communications of the ACM, pages 850-857, 1995.
[BNS00] A. Bar-Noy, J. Naor, and B. Schieber, "Pushing Dependent Data in Clients-Providers-Servers Systems," MOBICOM Conference, 2000.
[BS99] A. Bar-Noy and Y. Shilo, "Optimal Broadcasting of Two Files over an Asymmetric Channel," IEEE INFOCOM Conference, 1999.
[CK99] Y. D. Chung and M.-H. Kim, "QEM: A Scheduling Method for Wireless Broadcast Data," Proc. 6th International Conference on Database Systems for Advanced Applications (DASFAA), Hsinchu, Taiwan, April 1999.
[CHK99] Y. C. Chehadeh, A. R. Hurson, and M. Kavehrad, "Object Organization on a Single Broadcast Channel in the Mobile Computing Environment," Multimedia Tools and Applications, Vol. 9, No. 1, July 1999.
[GSE94] J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger, "Quickly Generating Billion-Record Synthetic Databases," Proc. ACM International Conference on Management of Data, pages 243-252, 1994.
[HC04] J.-L. Huang and M.-S. Chen, "Dependent Data Broadcasting for Unordered Queries in a Multiple Channel Mobile Environment," IEEE Transactions on Knowledge and Data Engineering, 2004.
[HCH00] A.R. Hurson, Y.C. Chehadeh, and J. Hannan, "Object Organization on Parallel Broadcast Channels in a Global Information Sharing Environment," Proc. IEEE International Performance, Computing, and Communications Conference, Feb. 2000.
[HCP03] J.-L. Huang, M.-S. Chen, and W.-C. Peng, "Broadcasting Dependent Data for Ordered Queries without Replication in a Multi-Channel Mobile Environment," Proc. IEEE International Conference on Data Engineering, Mar. 5-8, 2003.
[HGL87] G. Herman, G. Gopal, K.C. Lee, and A. Weinrib, "The Datacycle Architecture for Very High Throughput Database Systems," Proc. ACM International Conference on Management of Data, pages 97-103, 1987.
[HLC01] Chih-Hao Hsu, Guanling Lee, and A.L.P. Chen, "A Near Optimal Algorithm for Generating Broadcast Programs on Multiple Channels," ACM CIKM 2001 (Tenth International Conference on Information and Knowledge Management).
[HLC02] Chih-Hao Hsu, Guanling Lee, and Arbee L.P. Chen, "Index and Data Allocation on Multiple Broadcast Channels Considering Data Access Frequencies," International Conference on Mobile Data Management, 2002.
[HV99] S. Hameed and N. Vaidya, "Efficient Algorithms for Scheduling Data Broadcast," ACM/Baltzer Wireless Networks, 5(3):183-193, 1999.
[IB93] T. Imielinski and B.R. Badrinath, "Data Management for Mobile Computing," SIGMOD Record, 22(1):34-39, 1993.
[LC00] S.C. Lo and A. L. P. Chen, "Optimal Index and Data Allocation in Multiple Broadcast Channels," Proc. IEEE International Conference on Data Engineering, pages 293-302, 2000.
[LLC02] Guanling Lee, S.C. Lo, and A.L.P. Chen, "Data Allocation on the Wireless Broadcast Channel for Efficient Query Processing," IEEE Trans. on Computers, Special Section on Data Management Systems and Mobile Computing, Vol. 51, pp. 1237-1252, October 2002.
[LL03] Guanling Lee and Shou-Chih Lo, "Broadcast Data Allocation for Efficient Access of Multiple Data Items in Mobile Environments," ACM/Baltzer Mobile Networks and Applications (MONET), Vol. 8, pp. 365-375, August 2003.
[LYL02] Guanling Lee, Meng-Shin Yeh, Shou-Chih Lo, and Arbee L.P. Chen, "A Strategy for Efficient Access of Multiple Data Items in Mobile Environments," International Conference on Mobile Data Management, 2002.
[PC00] W.C. Peng and M.S. Chen, "Dynamic Generation of Data Broadcast Programs for a Broadcast Disk Array in a Mobile Computing Environment," Proc. ACM International Conference on Information and Knowledge Management, Nov. 2000.
[PHO00] Kiran Prabhakra, Kien A. Hua, and JungHwan Oh, "Multi-Level Multi-Channel Air Cache Designs for Broadcasting in a Mobile Environment," Proc. IEEE International Conference on Data Engineering, pages 167-176, 2000.
[PS98] E. Pitoura and G. Samaras, "Data Management for Mobile Computing," Kluwer Academic Publishers, 1998.
[SV96] N. Shivakumar and S. Venkatasubramanian, "Energy-Efficient Indexing for Information Dissemination in Wireless Systems," ACM Journal of Wireless and Nomadic Application, 1996.
[TO98] K.L. Tan and B.C. Ooi, "Batch Scheduling for Demand-driven Servers in Wireless Environment," Information Sciences, 109:281-298, 1998.
[VH99] N. Vaidya and S. Hameed, "Scheduling Data Broadcast in Asymmetric Communication Environments," ACM/Baltzer Wireless Networks, 5(3):171-182, 1999.
[Won88] J.W. Wong, "Broadcast Delivery," Proc. of the IEEE, 76(12):1566-1577, 1988.
[XZL03] J. L. Xu, B. Zheng, W.-C. Lee, and D. K. Lee, "Energy Efficient Index for Querying Location-Dependent Data in Mobile Broadcast Environments," Proc. 19th International Conference on Data Engineering, March 2003.
[YNO02] W.G. Yee, S.B. Navathe, E. Omiecinski, and C. Jermaine, "Efficient Data Allocation Over Multiple Channels at Broadcast Servers," IEEE Trans. on Computers, pp. 1231-1236, 2002.
[ZXL04] B. Zheng, J. Xu, W.-C. Lee, and D. L. Lee, "Energy-Conserving Air Indexes for Nearest Neighbor Search," Proc. 9th International Conference on Extending Database Technology, March 2004.