Adapted from K. Sayood, “Introduction to Data Compression”, 4th edition, Morgan Kaufmann, 2012
Ch. 5 Dictionary techniques
LZ, LZ77 (or LZ1), LZ78 (or LZ2), LZW
Unix compress command
…, PNG, gzip and …
Text sources / computer commands
(Sources that generate a relatively small number of patterns quite frequently.)
Applications: file compression, modem communications, image compression.
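Dictionary techniques incorporate the structure in the data in order to increase compression. Two approaches:
1) Static
2) Dynamic (Adaptive)
Idea: keep a list (dictionary) of commonly occurring patterns and develop an index for these; encode each such pattern by its index.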
Most useful with sources that generate a relatively small number of patterns quite frequently, such as text sources and computer commands.
The class of frequently occurring patterns (the size of the dictionary) must be much smaller than the number of all possible patterns.
Ex.: Consider 4-character words: 3 characters from the lowercase English alphabet (26 letters) followed by one character from six punctuation marks (, ? . ! ; :).
Alphabet size = 32 (26 letters + 6 punctuation marks)
Number of 4-character patterns = $32^4 = 2^{20} = 1{,}048{,}576$
Need 20 bits (5 bits/character) to code each pattern.
Assume the 256 most likely patterns are placed into a dictionary.
Pattern in the dictionary: flag 0 + 8-bit index of the pattern in the dictionary (total 9 bits)
Pattern not in the dictionary: flag 1 + 20 bits for the pattern (total 21 bits)
p = probability that a pattern is in the dictionary
R = average number of bits per pattern
$R = 9p + 21(1 - p) = 21 - 12p$
For R < 20: $21 - 12p < 20 \Rightarrow 12p > 1 \Rightarrow p > 1/12 \approx 0.083$
So p should be as large as possible: carefully select the patterns that are most likely to occur as entries in the dictionary.
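A minimal sketch of the rate calculation above, using exact fractions so that the break-even point comes out exactly. The constant and function names are illustrative; the bit costs are the ones assumed in the example (1-bit flag, 8-bit index, 20-bit raw pattern).

```python
from fractions import Fraction

# Bit costs assumed in the 4-character-word example
FLAG_BITS = 1    # 0 = pattern is in the dictionary, 1 = it is not
INDEX_BITS = 8   # 256-entry dictionary -> 8-bit index
RAW_BITS = 20    # 4 characters x 5 bits/character

def average_bits_per_pattern(p):
    """Average rate R = 9p + 21(1 - p) for dictionary hit probability p."""
    hit_cost = FLAG_BITS + INDEX_BITS    # 9 bits
    miss_cost = FLAG_BITS + RAW_BITS     # 21 bits
    return hit_cost * p + miss_cost * (1 - p)

print(average_bits_per_pattern(Fraction(1, 12)))  # 20 (break-even with plain 20-bit coding)
print(average_bits_per_pattern(Fraction(1, 2)))   # 15
```

Any hit probability above 1/12 gives fewer than 20 bits per pattern on average.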
Static approach: the dictionary is developed before encoding begins.
Adaptive or dynamic approach: the dictionary is developed on the fly.
A static dictionary is appropriate when considerable prior knowledge about the source is available.
Ex.: student records, bank statements, credit card statements
For a specific application or type of data, a static dictionary-based coding scheme is the most efficient. However, a coding scheme designed for a specific application may not work well for a different application.
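Digram coding: a static dictionary technique that is less specific to a single application. The dictionary consists of all letters of the source alphabet followed by as many pairs of letters (digrams) as the dictionary can accommodate.
Ex.: a dictionary built from the printable ASCII characters plus the most frequently occurring pairs of characters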
Ex 5.3.1/ p 119 (Source)
A sample dictionary
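The digram encoding rule can be sketched as follows: look at the next two symbols, emit the pair's code if the pair is in the dictionary, and otherwise emit the code of the first symbol alone. The 3-bit dictionary below (alphabet {a, b, c, d, r} plus the digrams ab, ac, ad) is an assumption chosen for illustration, not necessarily the sample dictionary of Ex. 5.3.1.

```python
# Hypothetical static dictionary: every single letter of the alphabet
# {a, b, c, d, r} plus three digrams, each with a 3-bit codeword.
DICTIONARY = {
    "a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
    "ab": "101", "ac": "110", "ad": "111",
}

def digram_encode(text, dictionary=DICTIONARY):
    """Static digram coding: try the next two symbols as a pair first,
    otherwise fall back to coding a single symbol."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            codes.append(dictionary[pair])     # one codeword for two symbols
            i += 2
        else:
            codes.append(dictionary[text[i]])  # single letters are always present
            i += 1
    return codes

codes = digram_encode("abracadabra")
print(codes)                       # ['101', '100', '110', '111', '101', '100', '000']
print(len("".join(codes)), "bits")  # 21 bits
```

Seven 3-bit codewords (21 bits) replace the 33 bits needed to send the 11 letters one at a time with the same 3-bit codes.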
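The dictionary designed for LaTeX (Table 5.2) is not suitable for C programs: the tables of the most frequently occurring pairs for the two kinds of source are different.
Hence the need for a technique that generates the dictionary so that it adapts to the source output characteristics: the adaptive dictionary-based technique (LZ77).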
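To make the adaptive idea concrete, here is a minimal LZ77-style sketch in which the "dictionary" is simply the most recently coded text (the search buffer). Each step emits a triple (offset, match length, next symbol). The buffer sizes (window=8, lookahead=7) and the function names are assumptions; practical LZ77 implementations differ in buffer sizes, tie-breaking and how the triples are coded.

```python
def lz77_encode(data, window=8, lookahead=7):
    """Emit (offset, length, next_symbol) triples; offset 0 means no match."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past i into the look-ahead buffer (overlap);
            # always leave one symbol to send as next_symbol.
            while (length < lookahead - 1 and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    out = []
    for off, length, symbol in triples:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])  # symbol-by-symbol copy handles overlap
        out.append(symbol)
    return "".join(out)

triples = lz77_encode("abababab")
print(triples)               # [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]
print(lz77_decode(triples))  # abababab
```

Because the match (2, 5) overlaps the symbols it is copying, the decoder must copy one symbol at a time rather than slicing the buffer once.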
WAN data communication products use the LZ77 or LZ78 algorithm (see Table 7.4 in Hoffman, “Data Compression in Digital Systems”, Kluwer, 1995).
Publishing: text, graphics and print-ready images are compressed with LZW and other lossless algorithms (ibid., p. 292).
LZW algorithm decoding
Encoder output sequence:
5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4 (see Table 5)
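The decoder starts with the same initial dictionary as the encoder (Table 3).
Table 3 Initial LZW dictionary
Index   Entry
1       ␣ (blank)
2       a
3       b
4       o
5       w
Start with index 5: decode it as ‘w’ (already in the dictionary); ‘w’ becomes the current pattern.
The next decoder input is 2, which corresponds to ‘a’. Concatenate it with the current pattern to form ‘wa’. This is not in the dictionary, so add it as the 6th element of the dictionary and start a new pattern beginning with ‘a’.
The next four inputs, 3 3 2 1, decode to b b a ␣ and add ‘ab’, ‘bb’, ‘ba’ and ‘a␣’ as entries 7 to 10; the current pattern is now ‘␣’.
The next input is 6, the index of ‘wa’. Concatenating its first letter ‘w’ with the current pattern gives ‘␣w’, which is not in the dictionary, so it becomes entry 11. The new pattern starts with ‘w’; appending ‘a’ gives ‘wa’ (already in the dictionary).
The next input is 8 (‘bb’). Concatenating ‘wa’ with ‘b’ gives ‘wab’, which becomes entry 12, and the new pattern starts with ‘b’.
Continue the construction of the LZW dictionary.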
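The decoding steps can be sketched in Python. The initial dictionary (1 = blank, 2 = a, 3 = b, 4 = o, 5 = w) is an assumption consistent with inputs 3 3 2 1 decoding to b, b, a and a blank, and the index list includes 23 between 21 and 4, which the assumed input requires. The `elif` branch is the exception handler for an index whose entry is not yet complete; the wabba input never triggers it, but the last call does.

```python
def lzw_decode(codes, alphabet):
    """Decode LZW indices; the initial dictionary is numbered from 1."""
    dictionary = {i: s for i, s in enumerate(alphabet, start=1)}
    next_index = len(alphabet) + 1
    if not codes:
        return "", dictionary
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_index:
            # Exception handler: the index names the entry still being built,
            # so its last symbol equals its own first symbol.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW index {code}")
        # New entry = previous pattern + first symbol of the current pattern
        dictionary[next_index] = previous + entry[0]
        next_index += 1
        output.append(entry)
        previous = entry
    return "".join(output), dictionary

ALPHABET = [" ", "a", "b", "o", "w"]  # assumed: 1 = blank, ..., 5 = w
codes = [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
text, table = lzw_decode(codes, ALPHABET)
print(text)                              # wabba wabba wabba wabba woo woo woo
print([table[i] for i in range(6, 13)])  # ['wa', 'ab', 'bb', 'ba', 'a ', ' w', 'wab']
print(lzw_decode([1, 2, 3, 5], ["a", "b"])[0])  # abababa (uses the exception handler)
```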
Situation where LZW decoding breaks down
Table 5.10: Initial dictionary for abababab (1 = a, 2 = b)
Table 5.11: Final dictionary for abababab
Encode a sequence beginning with abababab
Transmitted sequence begins 1 2 3 5
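Decoding: Begin with the initial dictionary (Table 5.10).
(1, 2) are decoded as (a, b); the pattern ‘ab’ leads to entry 3.
Next input is 3, decoded as ‘ab’; the pattern ‘ba’ leads to entry 4.
Next input is 5, which is not in the dictionary: entry 5 is only partially constructed (see Table 5.14). Its first two letters are the current pattern ‘ab’. Its last letter is the first letter of the next decoded pattern, which is entry 5 itself, i.e., ‘a’. Hence entry 5 is ‘aba’.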
5.5 Applications: LZW is one of the most widely used compression algorithms.
Table 5.13: Constructing the fifth entry (stage one)
Table 5.14: Constructing the fifth entry (stage two)
Table 5.14: Completion of the fifth entry.
See Problem 8, p. 140
Programs diffim, huff_enc
(Unix compress command)
The LZW decoder has to contain an exception handler to handle the special case of decoding an index that does not have a corresponding complete entry in the decoder dictionary (see Tables 4.7 and 4.8).
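Why does the decoder meet an index it cannot yet resolve? A minimal LZW encoder sketch makes it visible. On the input abababab with initial dictionary 1 = a, 2 = b (assumed here), the encoder creates entry 5 (‘aba’) and transmits index 5 at the very next step, before the decoder has enough information to finish that entry. The function name and the returned log of new entries are illustrative.

```python
def lzw_encode(text, alphabet):
    """LZW encoder; returns the index sequence and a log of new entries."""
    dictionary = {s: i for i, s in enumerate(alphabet, start=1)}
    next_index = len(alphabet) + 1
    codes, new_entries = [], []
    pattern = ""
    for symbol in text:
        if pattern + symbol in dictionary:
            pattern += symbol                      # keep extending the match
        else:
            codes.append(dictionary[pattern])      # emit the longest match
            dictionary[pattern + symbol] = next_index
            new_entries.append((next_index, pattern + symbol))
            next_index += 1
            pattern = symbol                       # restart with current symbol
    if pattern:
        codes.append(dictionary[pattern])
    return codes, new_entries

codes, entries = lzw_encode("abababab", ["a", "b"])
print(codes)    # [1, 2, 3, 5, 2]
print(entries)  # [(3, 'ab'), (4, 'ba'), (5, 'aba'), (6, 'abab')]
```

Index 5 is sent immediately after entry 5 is created, which is exactly the case the decoder's exception handler must cover.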
Table 5.16: Comparison of GIF with arithmetic coding of pixel values and of pixel differences
GIF (Graphics Interchange Format) was developed by CompuServe Information Service to encode graphical images (for details …).
GIF is very popular for encoding all kinds of images, both computer-generated and natural. However, it is not very efficient at losslessly compressing images of natural scenes, photographs, satellite images, etc. (see Table 5.16 above).
J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. on Information Theory, vol. IT-23, pp. 337-343, May 1977.
J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Information Theory, vol. IT-24, pp. 530-536, Sept. 1978.
J. A. Storer and T. G. Szymanski, "Data Compression via Textual Substitution," Journal of the ACM, pp. 928-….
T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Comm., vol. COM-…, pp. …-1182, Dec. 1986.
T. A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, pp. …-19, June 1984.
T. C. Bell, J. G. Cleary, and I. H. Witten, "Text Compression," Advanced Reference Series, Englewood Cliffs, NJ: Prentice Hall, 1990.
M. Nelson, "The Data Compression Book," New York: M&T Books, 1991.
G. Held and T. R. Marshall, "Data Compression," New York: Wiley, third edition, 1991.
P. Marchand, "Graphics and GUIs with MATLAB," Boca Raton, FL: CRC Press.
W. Kou, "Digital Image Compression Algorithms and Standards," Amsterdam: Kluwer.
G. Louchard and W. Szpankowski, "Generalized Lempel-Ziv parsing scheme and its preliminary analysis of the average profile," DCC '95 Data Compression Conf., Snowbird, UT, March 1995.
R. Horspool, "The effect of non-greedy parsing in Lempel-Ziv compression methods," DCC '95 Data Compression Conf., Snowbird, UT, March 1995.
G. Louchard and W. Szpankowski, "On the Average Redundancy Rate of the Lempel-Ziv Code," DCC '96 Data Compression Conf., Snowbird, UT, April 1996.
J. A. Storer, "Lossless Image Compression Using Generalized LZ1-Type Methods," DCC '96 Data Compression Conf., Snowbird, UT, April 1996.
C. T. Chen and L. G. Chen, "A novel architecture for Lempel-Ziv based data compression," ICCE, Chicago, IL, June 1996.
D. Sheinwald, "On the Ziv-Lempel proof and related topics," Proc. IEEE, vol. 82, pp. …-871, June 1994.
A. D. Wyner and J. Ziv, "The sliding-window Lempel-Ziv algorithm is asymptotically optimal," Proc. IEEE, vol. 82, pp. 872-877, June 1994.
Y. F. Hu and X. S. Wu, "The methods of improving the compression ratio of LZ77 family data compression algorithms," ICSP, Beijing, China, Oct. 1996.
V. G. Ruiz and I. Garcia, "A lossy data compressor based on the LZW algorithm," ICSPAT 96, pp. 1002-1006, Boston, MA, Oct. 1996.
S. A. Savari, "Redundancy of the Lempel-Ziv-Welch Code," Data Compression Conf. (DCC 97), Snowbird, UT, March 1997.
S. R. Kosaraju and G. Manzini, "Compression of low entropy strings with Lempel-Ziv algorithms," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
J. I. Lathrop and M. Strauss, "A universal upper bound on the performance of the Lempel-Ziv algorithm on maliciously constructed data," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
D. Greene et al., "A progressive Ziv-Lempel algorithm for image compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
M. Cohn and H. Helfgott, "Asymmetry in Ziv-Lempel compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
S. De Agostino, "A parallel decoder for LZ2 compression using the ID heuristic," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.
H. Wyman and P. Y. K. Cheung, "Bit plane differe… LZW for the compression of video for variable bandwidth channels," IEEE ISCAS '97, Hong Kong, June 1997.
C. Su, C.-F. Yan, and J.-C. Yo, "Hardware efficient updating technique for LZW codec …," ISCAS '97, Hong Kong, June 1997.
C. T. Chen and L. G. Chen, "High-speed VLSI design of the LZ…," ISCAS '97, Hong Kong, June 1997.
G. Held, "Data and image compression: Tools and techniques," 4th Edition,
P. Tischer, "A modified LZW data compression scheme," Australian Computer Science Commun., vol. 9, pp. 262-….
Hoffman, "Data Compression in Digital Systems," New York, NY: Chapman & Hall.
D. J. Craft, "ALDC and a pre-processor extension, BDLC, provides ultra fast compression … bit-mapped image data," Data Compression Conf., p. 400, IEEE Computer Society Press, 1995. (ALDC: adaptive lossless data compression; BDLC: bit-mapped lossless data compression, an LZ77 variant.)
T. Kida et al., "Multiple pattern matching in LZW compressed text," IEEE DCC Conf.
S. Even, "Four value-adding algorithms," IEEE Spectrum, vol. 35, pp. 33-38, May 1998.
J. C. Kieffer, T. H. Park, and Y. Xu, "Progressive lossless image coding via self-referential partitions," IEEE ICIP, Chicago, IL, Oct. 1998.
Ho Cheung, C. S-Wai, and P. Lai-Man, "Predictive lossy LZSS algorithm for fidelity-constrained image coding," Intl. Forum cum Conf. on Info. Technology and Commun. at the Dawn of the New Millennium, Bangkok, Thailand, Aug. 2000.
K. Lai and K.-C. Chen, "A novel VLSI architecture for Lempel-Ziv based data compression," IEEE ISCAS, Geneva, Switzerland, May 2000.
L. P. Deutsch, "DEFLATE compressed data format specification," Request for Comments (RFC) 1951, available in ftp…
J. Miano, "Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP," Addison-Wesley, 1999 (software on disk).
H. H. Shih, S. S. Narayanan, and C.-C. Jay Kuo, "Automatic main melody extraction from MIDI files with a modified Lempel-Ziv algorithm," IEEE ISIMP 2001, Hong Kong, May 2001.
M. J. Weinberger and E. Ordentlich, "On-line decision making for a class of loss functions via Lempel-Ziv parsing," DCC 2000, Snowbird, UT, March 2000.
… and W. Szpankowski, "On the average redundancy rate of the Lempel-Ziv code with … error protocol," DCC 2000. (DCC: Data Compression Conference.)
S. De Agostino, "Work-optimal parallel decoders for LZ2 data compression," DCC 2000.
N. J. Brittain and M. R. El-…, "Grayscale true two-… image compression," JVCIR, vol. 18, pp. 35-44, Feb. 2007.
J. D. Gibson et al., "Digital Compression for Multimedia," San Diego, CA: Academic Press, 1998 (see Appendices E and F).
M. Aboy, R. Hornero, D. Abasolo, and D. Alvarez, "Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis," IEEE Transactions on …
N. Radhakrishnan and B. N. Gangadhar, "Estimating regularity in epileptic seizure time-series data," IEEE Engineering in Medicine and Biology Magazine, 17:89-….
Zhang, R. J. Roy, and E. W. Jensen, "EEG complexity as a measure of depth of anesthesia for patients," IEEE Transactions on Biomedical Engineering, 48(12):1424-….
D. Abasolo, R. Hornero, C. Gomez, M. Garcia, and M. Lopez, "Analysis of EEG background activity in Alzheimer's disease patients with Lempel-Ziv complexity and central tendency measure," Medical Engineering & Physics, 28(4):315-….
H. Zhang, Y. Zhu, and Z. Wang, "Complexity measure and complexity rate information … of ventricular tachycardia and fibrillation," Medical and Biological Engineering and Computing, 38:553-….
Text Compression, by T. C. Bell, J. G. Cleary, and I. H. Witten (Advanced Reference Series, Prentice Hall, Englewood Cliffs, New Jersey, 1990), is an excellent exposition of dictionary-based coding techniques.
The Data Compression Book, by M. Nelson and J. Gailly, also does a good job of describing the Ziv-Lempel algorithms. It also includes a very nice description of some of the software implementation aspects.
Data Compression, by G. Held and T. R. Marshall (Wiley, third edition, 1991), contains a description of digram coding under the name "diatomic coding." The book also includes BASIC programs that help in the design of dictionaries.
The PNG algorithm is described in a very accessible manner in "PNG Lossless Compression," by G. Roelofs, in K. Sayood, editor, Lossless Compression Handbook, pages 371-390, Academic Press, 2003.
A more in-depth look at dictionary compression is provided in "Dictionary-Based Data Compression: An Algorithmic Perspective," by S. C. Sahinalp and N. M. Rajpoot, in K. Sayood, editor, Lossless Compression Handbook, pages 153-168, Academic Press, 2003.