Ch. 5 Dictionary techniques


26 Oct 2013


Adapted from K. Sayood, "Introduction to Data Compression," 4th edition, Morgan Kaufmann, 2012



LZ, LZ77 (or LZ1), LZ78 (or LZ2), LZW

Lempel-Ziv-Welch algorithm

Applications

Unix compress command, V.42 bis, PKZip, Zip, GIF, LHarc, PNG, gzip, and ARJ

Text sources / computer commands

(Sources that generate a relatively small number of patterns quite frequently.)

Applications: text compression, modem communications, image compression.


Techniques that incorporate structure in the data in order to increase compression:

1) Static

2) Dynamic (Adaptive)

Identify commonly occurring patterns and develop an index for them.

Most useful with sources that generate a relatively small number of patterns quite frequently, such as text sources and computer commands.

The class of frequently occurring patterns (the size of the dictionary) must be much smaller than the number of all possible patterns.



DICTIONARY

Ex: Consider 4-character words: 3 characters from the lowercase English alphabet (26 letters) and one character from six punctuation marks (, ? . ! ; :).

Alphabet size = 32 (26 letters + 6 punctuation marks)

Number of 4-character patterns = 32^4 = 2^20 = 1,048,576

Need 20 bits (5 bits/character) to code each pattern. Assume the 256 most likely patterns are placed into a dictionary.


1-bit flag:

0 (in the dictionary) + 8 bits for the pattern's index in the dictionary (total 9 bits)

1 (not in the dictionary) + 20 bits for the pattern (total 21 bits)

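p = probability of encountering a pattern from the dictionary

Average number of bits per pattern = R

R = 9p + 21(1 - p) = 21 - 12p    (5.1)

Setting R = 20 gives the break-even point:

20 = 21 - 12p

12p = 1, p = 1/12 ≈ 0.0833

So R < 20 requires p > 1/12 (p ≥ 0.084 when rounded up to three decimal places).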

p should be as large as possible. Carefully select the patterns that are most likely to occur as entries in the dictionary.
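The rate calculation in (5.1) can be checked numerically. The following is a minimal sketch; the function name average_rate and the constant names are illustrative choices, not part of the source.

```python
from fractions import Fraction

FLAG_BITS = 1    # 1-bit flag: 0 = in dictionary, 1 = not in dictionary
INDEX_BITS = 8   # 256 dictionary entries need an 8-bit index
RAW_BITS = 20    # 4 characters at 5 bits each


def average_rate(p):
    """Average bits per pattern when a pattern is in the dictionary with probability p."""
    in_dictionary = FLAG_BITS + INDEX_BITS    # 9 bits
    not_in_dictionary = FLAG_BITS + RAW_BITS  # 21 bits
    return in_dictionary * p + not_in_dictionary * (1 - p)


# Evaluate R = 21 - 12p at a few probabilities, including the break-even point 1/12.
for p in (Fraction(0), Fraction(1, 12), Fraction(1, 2), Fraction(1)):
    print(f"p = {float(p):.4f}  R = {float(average_rate(p)):.2f} bits")
```

Exact fractions are used so that the break-even value p = 1/12 gives exactly 20 bits rather than a rounded float.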

Static approach: dictionary developed before encoding.

Adaptive or dynamic approach: dictionary developed on the fly.




5.3 Static Dictionary

Most appropriate when considerable prior knowledge about the source is available.

Ex. Student records, bank statements, credit card statements

Efficient for a specific application.

An application-specific or data-specific static-dictionary-based coding scheme is the most efficient. A coding scheme designed for a specific application may not work well for a different application.


5.3.1 Digram Coding

Static dictionary coding.

Digrams: pairs of letters

ASCII characters

Digram coding: a static dictionary technique that is less specific to a single application.


Ex 5.3.1, p. 119 (Source)

5-letter alphabet A = {a, b, c, d, r}

Encode 'abracadabra'

Table 5.1: A sample dictionary

Code   Entry   Code   Entry
000    a       100    r
001    b       101    ab
010    c       110    ac
011    d       111    ad

Encoder output: 101100110111101100000
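The digram encoding of 'abracadabra' with Table 5.1 can be reproduced with a short sketch. It assumes the usual digram rule (read two characters; if the pair is in the dictionary, emit its code, otherwise emit the code of the first character and continue from the second). The function name digram_encode is illustrative.

```python
# Dictionary copied from Table 5.1: five single letters plus three digrams.
DICTIONARY = {
    "a": "000", "b": "001", "c": "010", "d": "011",
    "r": "100", "ab": "101", "ac": "110", "ad": "111",
}


def digram_encode(text, dictionary=DICTIONARY):
    """Encode text by preferring a two-character match over a single character."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            codes.append(dictionary[pair])   # digram found: consume two characters
            i += 2
        else:
            codes.append(dictionary[text[i]])  # fall back to the single character
            i += 1
    return "".join(codes)


print(digram_encode("abracadabra"))
```

The result uses 21 bits, compared with 33 bits if each of the 11 characters were coded separately with 3 bits.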
























A dictionary designed for LaTeX (Table 5.2) is not suitable for C programs.

nl = new line
= space

A technique is needed that generates the dictionary so that it adapts to the source output characteristics.

Table 5.2 (LaTeX document) and Table 5.3 (C programs) in Ch. 5: these tables are different.


5.4 Adaptive dictionary-based techniques (LZ77)

Lempel-Ziv 1977: LZ1

Lempel-Ziv 1978: LZ2

Lempel-Ziv-Welch: LZW













WAN data communication products use the LZ77 or LZ78 algorithm (see Table 7.4, p. 186, Hoffman, "Data compression in digital systems," Kluwer, 1995).

Publishing: text, graphics, and print-ready images are compressed with LZW and other lossless algorithms (ibid., p. 292).




Ex. 5.4.4

LZW algorithm decoding

Encoder output sequence:

5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4 (see Table 5)

The decoder starts with the same initial dictionary as the encoder (Table 3). In what follows, ␣ denotes the space character.

Table 3: Initial LZW dictionary (indices 1 to 5) and the first entries added during decoding (indices 6 to 10)

Index   Entry   Index   Entry
1       ␣       6       wa
2       a       7       ab
3       b       8       bb
4       o       9       ba
5       w       10      a␣

Start with index 5, which corresponds to 'w'. Decode it (it is already in the dictionary).

The next decoder input is 2 (index), which corresponds to 'a'.

Decode 'a' and concatenate it with the current pattern to form 'wa'. This is not in the dictionary. Add it as the 6th element of the dictionary and start a new pattern beginning with 'a'.

The next four inputs, 3 3 2 1, correspond to b b a ␣.

These generate dictionary entries 7 to 10: ab, bb, ba, and a␣.

The next input is 6, which corresponds to 'wa'.

Concatenate ␣ with w to form ␣w (entry 11).

The new pattern starts with w ('wa' is already in the dictionary).

The next input is index 8, which corresponds to 'bb'.

Concatenate 'wa' with 'b' to form 'wab' (entry 12).

Continue the construction (decoding) of the LZW dictionary.
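The dictionary construction in this example can be traced with a small sketch. It assumes that index 1 of the initial dictionary is the space character, and it handles only indices that are already in the dictionary when they arrive, which holds for the inputs used here. The name lzw_decode_trace is illustrative.

```python
def lzw_decode_trace(indices, initial):
    """Decode LZW indices; return the decoded text and the dictionary that was built.

    Assumes every index is already present in the dictionary when it arrives.
    """
    dictionary = dict(initial)
    next_index = max(dictionary) + 1
    previous = dictionary[indices[0]]
    output = [previous]
    for index in indices[1:]:
        current = dictionary[index]  # raises KeyError if the entry is not built yet
        # New entry: previous pattern plus the first symbol of the current one.
        dictionary[next_index] = previous + current[0]
        next_index += 1
        output.append(current)
        previous = current
    return "".join(output), dictionary


# Assumed initial dictionary: index 1 is the space character.
initial = {1: " ", 2: "a", 3: "b", 4: "o", 5: "w"}
text, dictionary = lzw_decode_trace([5, 2, 3, 3, 2, 1, 6, 8], initial)
print(repr(text))
for index in range(6, 13):
    print(index, repr(dictionary[index]))
```

Only the first eight inputs of the encoder output are decoded here, which is enough to build entries 6 through 12.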



Situation where LZW decoding breaks down

Table 5.10: Initial dictionary for abababab

Index   Entry
1       a
2       b

Table 5.11: Final dictionary for abababab

Index   Entry   Index   Entry
1       a       9       ababa
2       b       10      ababab
3       ab      11      babab
4       ba      12      bababa
5       aba     13      abababa
6       abab    14      abababab
7       bab     15      bababab
8       baba



Source alphabet A = {a, b}

Encode the sequence ababababab...

Transmitted sequence: 1 2 3 5 ...
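The start of the transmitted sequence can be reproduced with a minimal LZW encoder sketch. It assumes every input symbol is present in the initial dictionary. The name lzw_encode is illustrative.

```python
def lzw_encode(text, initial):
    """Encode text with LZW; return the index sequence and the final dictionary.

    `initial` maps each single symbol to its index; every symbol of `text`
    is assumed to be present in it.
    """
    dictionary = dict(initial)  # pattern -> index
    next_index = max(dictionary.values()) + 1
    output = []
    pattern = ""
    for symbol in text:
        if pattern + symbol in dictionary:
            pattern += symbol  # keep extending the current match
        else:
            output.append(dictionary[pattern])        # emit the longest match
            dictionary[pattern + symbol] = next_index  # add the new pattern
            next_index += 1
            pattern = symbol
    if pattern:
        output.append(dictionary[pattern])  # flush the final match
    return output, dictionary


codes, dictionary = lzw_encode("ababababab", {"a": 1, "b": 2})
print(codes)
```

For this ten-symbol input the first four indices are 1 2 3 5. The index 5 is sent immediately after the encoder creates entry 5, which is what causes the decoder difficulty discussed next.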

Decoding: begin with the initial dictionary (Table 5.10).

The inputs (1, 2) are decoded as (a, b), which leads to the 3rd entry, ab. The next input is 3 (gives ab), which leads to the 4th entry, ba. See Table 5.14. The next input is 5, which is not yet in the dictionary.


5.5 Applications: LZW is one of the most widely used compression algorithms.


Table 5.13: Constructing the fifth entry (stage one)

Index   Entry
1       a
2       b
3       ab
4       ba
5       a…

Table 5.14: Constructing the fifth entry (stage two)

Index   Entry
1       a
2       b
3       ab
4       ba
5       ab

Table 5.15: Completion of the fifth entry

Index   Entry
1       a
2       b
3       ab
4       ba
5       aba
6       a…

See Problem 8, p. 140.

Programs: diffim, huff_enc

(Unix compress command)

The LZW decoder has to contain an exception handler for the special case of decoding an index that does not yet have a corresponding complete entry in the decoder dictionary.
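One common way to write this exception handler is sketched below. When the received index is exactly the next entry to be created, the missing entry must be the previous pattern followed by its own first symbol. This is a sketch of the standard treatment rather than the source's own code, and the name lzw_decode is illustrative.

```python
def lzw_decode(indices, initial):
    """Decode LZW indices, including the index-not-yet-in-dictionary case."""
    dictionary = dict(initial)  # index -> pattern
    next_index = max(dictionary) + 1
    previous = dictionary[indices[0]]
    output = [previous]
    for index in indices[1:]:
        if index in dictionary:
            current = dictionary[index]
        elif index == next_index:
            # Special case: the encoder used the entry it had just created,
            # so the entry starts with `previous` and ends with its first symbol.
            current = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW index {index}")
        dictionary[next_index] = previous + current[0]
        next_index += 1
        output.append(current)
        previous = current
    return "".join(output)


# Transmitted sequence from the abababab example; index 5 triggers the special case.
print(lzw_decode([1, 2, 3, 5], {1: "a", 2: "b"}))
```

With the initial dictionary of Table 5.10, index 5 arrives when only entries 1 to 4 exist, and the handler completes entry 5 as aba.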

(See Tables 4.7 and 4.8)

Table 5.16: Comparison of GIF with arithmetic coding

Image    GIF      Arithmetic Coding    Arithmetic Coding
                  of Pixel Values      of Pixel Differences
Sena     51,085   53,431               31,847
Sensin   60,649   58,306               37,126
Earth    34,276   38,248               32,137
Omaha    61,580   56,061               51,393


5.5.2 GIF (Image Compression)

Developed by CompuServe Information Service to encode graphical images (for details see pages 151, 152).

GIF is very popular for encoding all kinds of images, both computer-generated and natural. It is not very efficient at losslessly compressing images of natural scenes, photographs, satellite images, etc. (see Table 5.16 above).
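The gap shown in Table 5.16 can be quantified with a short sketch that divides each GIF size by the corresponding size for arithmetic coding of pixel differences. The sizes are copied from the table, in the table's own units. The name gif_overhead is illustrative.

```python
# Sizes from Table 5.16: (GIF, arithmetic coding of pixel values,
# arithmetic coding of pixel differences).
SIZES = {
    "Sena":   (51085, 53431, 31847),
    "Sensin": (60649, 58306, 37126),
    "Earth":  (34276, 38248, 32137),
    "Omaha":  (61580, 56061, 51393),
}


def gif_overhead(image):
    """Ratio of the GIF size to the size from arithmetic coding of pixel differences."""
    gif, _pixel_values, pixel_differences = SIZES[image]
    return gif / pixel_differences


for image in SIZES:
    print(f"{image}: GIF is {gif_overhead(image):.2f} times the pixel-difference size")
```

A ratio above 1 means the GIF file is larger, which is the case for all four images in the table.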







References

1. J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. on Information Theory, vol. IT-23, pp. 337-343, May 1977.

2. J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Information Theory, vol. IT-24, pp. 530-536, Sept. 1978.

3. J. A. Storer and T. G. Szymanski, "Data Compression via Textual Substitution," Journal of the ACM, pp. 928-951, 1982.

4. T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Comm., vol. COM-34, pp. 1176-1182, Dec. 1986.

5. T. A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, pp. 8-19, June 1984.

6. T. C. Bell, J. G. Cleary, and I. H. Witten, "Text Compression," Advanced Reference Series. Englewood Cliffs, NJ: Prentice Hall, 1990.

7. M. Nelson, "The Data Compression Book," New York: M&T Books, 1991.

8. G. Held and T. R. Marshall, "Data Compression," New York: Wiley, third edition, 1991.

9. P. Marchand, "Graphics and GUI's with MATLAB," Boca Raton, FL: CRC Press, 1996.

10. W. Kou, "Digital Image Compression Algorithms and Standards," Amsterdam, Kluwer Academic, 1995.

11. G. Louchard and W. Szpankowski, "Generalized Lempel-Ziv parsing scheme and its preliminary analysis of the average profile," DCC '95 Data Compression Conf., pp. , Snowbird, UT, March 1995.

12. R. Horspool, "The effect of non-greedy parsing Lempel-Ziv compression methods," DCC '95 Data Compression Conf., pp. , Snowbird, UT, March 1995.

13. G. Louchard and W. Szpankowski, "On the Average Redundancy Rate of the Lempel-Ziv Code," DCC '96, Data Compression Conf., Snowbird, UT, April 1996.

14. J. A. Storer, "Lossless Image Compression Using Generalized LZ1-Type Methods," DCC '96, Data Compression Conf., UT, April 1996.

15. C. T. Chen and L. G. Chen, "A novel architecture for Lempel-Ziv based data compression," IEEE ICCE, Chicago, IL, June 1996.

16. D. Sheinwald, "On the Ziv-Lempel proof and related topics," Proc. IEEE, vol. 82, pp. 866-871, June 1994.

17. A. D. Wyner and J. Ziv, "The sliding window Lempel-Ziv algorithm is asymptotically optimal," Proc. IEEE, vol. 82, pp. 872-877, June 1994.

18. Y. F. Hu and X. S. Wu, "The methods of improving the compression ratio of LZ77 family data compression algorithms," ICSP, Beijing, China, Oct. 1996.

19. V. G. Ruiz and I. Garcia, "A lossy data compressor based on the LZW algorithm," ICSPAT 96, pp. 1002-1006, Boston, MA, Oct. 1996.

20. S. A. Savari, "Redundancy of the Lempel-Ziv-Welch Code," Data Compression Conf. (DCC 97), Snowbird, UT, March 1997.

21. S. R. Kosaraju and G. Manzini, "Compression of low entropy strings with Lempel-Ziv algorithms," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

22. J. I. Lathrop and M. Strauss, "A universal upper bound on the performance of the Lempel-Ziv algorithm on maliciously-constructed data," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

23. D. Greene et al., "A progressive Ziv-Lempel algorithm for image compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

24. M. Cohn and H. Helfgott, "Asymmetry in Ziv-Lempel compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

25. S. De Agostino, "A parallel decoder for LZ2 compression using the ID update heuristic," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

26. R. H. Wyman and P. Y. K. Cheung, "Bit plane differential LZW for the compression of video for variable bandwidth channels," IEEE ISCAS '97, Hong Kong, June 1997.

27. C. Su, C-F. Yan and J-C. Yo, "Hardware efficient updating technique for LZW codec design," IEEE ISCAS '97, Hong Kong, June 1997.

28. C. T. Chen and L. G. Chen, "High-Speed VLSI design of the LZ-based data compression," IEEE ISCAS '97, Hong Kong, June 1997.

29. G. Held, "Data and image compression: Tools and techniques," 4th Edition, New York, NY: Wiley, 1996.

30. P. Tischer, "A modified LZW data compression scheme," Australian Computer Science Commun., vol. 9, pp. 262-272, 1987.

31. R. Hoffman, "Data compression in digital systems," New York, NY: Chapman & Hall, 1997.

32. D. J. Craft, "ADLC and a pre-processor extension, BDLC, provides ultra fast compression for general-purpose bit-mapped image data," Data Compression Conf., p. 400, IEEE Computer Society Press, 1995. (ADLC: adaptive lossless data compression; BDLC: bit-mapped lossless data compression, an LZ77 variant.)

33. T. Kida et al., "Multiple pattern matching in LZW compressed text," IEEE DCC Conf., UT, Mar. 1998.

34. S. Even, "Four value adding algorithms," IEEE Spectrum, vol. 35, pp. 33-38, May 1998.

35. J. C. Kieffer, T. H. Park and Y. Xu, "Progressive lossless image coding via self-referential partitions," IEEE ICIP, pp. , Chicago, IL, Oct. 1998.

36. C-Ho Cheung, C. S-Wai and P. Lai-Man, "Predictive lossy LZSS algorithm for fidelity constrained image coding," Intl. Forum cum Conf. on Info. Technology and Commun. at the dawn of the new Millennium, Bangkok, Thailand, Aug. 2000.

37. Y-K. Lai and K-C. Chen, "A novel VLSI architecture for Lempel-Ziv based data compression," IEEE ISCAS, Geneva, Switzerland, May 2000.

38. L. P. Deutsch, "Deflate compressed data format specification," Request for Comments (RFC) 1951, available in ftp://ftp.uu.net/pub/archiving/zip/doc/, 1996.

39. J. Miano, "Compressed image file formats: JPEG, PNG, GIF, XBM, BMP," Addison-Wesley, 1999. (software on disk)

40. H. H. Shih, S. S. Narayanan and C.-C. Jay Kuo, "Automatic main melody extraction from MIDI files with a modified Lempel-Ziv algorithm," IEEE ISIMP 2001, Hong Kong, May 2001.

41. M. J. Weinberger and Ordentlich, "On-line decision making for a class of loss functions via Lempel-Ziv parsing," DCC 2000, Snowbird, UT, March 2000, http://www.cs.brandeis.edu/~dcc

42. Y. Reznik and W. Szpankowski, "On the average redundancy rate of the Lempel-Ziv code with K-error protocol," DCC 2000 (Data Compression Conference).

43. S. De Agostino, "Work-optimal parallel decoders for LZ2 data compression," DCC 2000.

44. N. J. Brittain and M. R. El-Sakka, "Grayscale true two-dimensional dictionary based image compression," JVCIR, vol. 18, pp. 35-44, Feb. 2007. (2D-LZ)

45. J. D. Gibson et al., "Digital compression for multimedia," San Diego, CA: Academic Press, 1998 (see Appendices E and F).

46. M. Aboy, R. Hornero, D. Abasolo, and D. Alvarez, "Interpretation of Lempel-Ziv complexity measure in the context of biomedical signal analysis," IEEE Transactions on Biomedical Engineering, 53(11):2282-2288, Nov. 2006.

47. N. Radhakrishnan and B. N. Gangadhar, "Estimating regularity in epileptic seizure time-series data," IEEE Engineering in Medicine and Biology Magazine, 17:89-94, 1998.

48. X.-S. Zhang, R. J. Roy, and E. W. Jensen, "EEG complexity as a measure of depth of anesthesia for patients," IEEE Transactions on Biomedical Engineering, 48(12):1424-1433, Dec. 2001.

49. D. Abasolo, R. Hornero, C. Gomez, M. Garcia, and M. Lopez, "Analysis of EEG background activity in Alzheimer's disease patients with Lempel-Ziv complexity and central tendency measure," Medical Engineering Physics, 28(4):315-322, 2006.

50. H. Zhang, Y. Zhu, and Z. Wang, "Complexity measure and complexity rate information based detection of ventricular tachycardia and fibrillation," Medical and Biological Engineering and Computing, 38:553-557, 2000.











Further Reading

1. Text Compression, by T.C. Bell, J.G. Cleary, and I.H. Witten. Advanced Reference Series. Prentice Hall, Englewood Cliffs, New Jersey, 1990. This provides an excellent exposition of dictionary-based coding techniques.

2. The Data Compression Book, by M. Nelson and J.-L. Gailly. This also does a good job of describing the Ziv-Lempel algorithms. There is also a very nice description of some of the software implementation aspects.

3. Data Compression, by G. Held and T.R. Marshall. Wiley, third edition, 1991. This contains a description of digram coding under the name "diatomic coding." The book also includes BASIC programs that help in the design of dictionaries.

4. The PNG algorithm is described in a very accessible manner in "PNG Lossless Compression," by G. Roelofs. In K. Sayood, editor, Lossless Compression Handbook, pages 371-390. Academic Press, 2003.

5. A more in-depth look at dictionary compression is provided in "Dictionary-Based Data Compression: An Algorithmic Perspective," by S.C. Sahinalp and N.M. Rajpoot. In K. Sayood, editor, Lossless Compression Handbook, pages 153-168. Academic Press, 2003.