Image Compression from DCT to Wavelets : A Review
by Subhasis Saha
Introduction
Uncompressed multimedia (graphics, audio and video) data requires considerable
storage capacity and transmission bandwidth. Despite rapid progress in
mass-storage density, processor speeds, and digital communication system
performance, demand for data storage capacity and data-transmission bandwidth
continues to outstrip the capabilities of available technologies. The recent
growth of data intensive multimedia-based web applications have not only
sustained the need for more efficient ways to encode signals and images
but have made compression of such signals central to storage and communication
technology.
For still image compression, the `Joint Photographic Experts Group'
or JPEG[19] standard has been established by ISO (International
Standards Organization) and IEC (International Electro-Technical Commission).
The performance of these coders generally degrades at low bit-rates mainly
because of the underlying block-based Discrete Cosine Transform (DCT)[20]
scheme. More recently, the wavelet transform has emerged as a cutting edge
technology, within the field of image compression. Wavelet-based coding[29]
provides substantial improvements in picture quality at higher compression
ratios.
Over the past few years, a variety of powerful and sophisticated wavelet-based
schemes for image compression, as discussed later, have been developed
and implemented. Because of the many advantages, the top contenders in
the upcoming JPEG-2000 standard[14] are all wavelet-based
compression algorithms.
The goal of this article is two-fold. First, for readers new to compression,
we briefly review some basic concepts on image compression and present
a short overview of the DCT-based JPEG standard and the more popular wavelet-based
image coding schemes. Second, for more advanced readers, we mention a few
sophisticated, modern, and popular wavelet-based techniques including one
we are currently pursuing. The goal of the upcoming JPEG-2000 image compression
standard, which is going to be wavelet-based, is briefly presented. For
those who are curious, a number of useful references
are given. There is also abundance of information about image compression
on the Internet.
Background
Why do we need compression?
The figures in Table 1 show the qualitative transition
from simple text to full-motion video data and the disk space, transmission
bandwidth, and transmission time needed to store and transmit such uncompressed
data.
Table 1 Multimedia data types and uncompressed
storage space, transmission bandwidth, and transmission time required.
The prefix kilo- denotes a factor of 1000 rather than 1024.
|
Multimedia Data
|
Size/Duration
|
Bits/Pixel or
Bits/Sample
|
Uncompressed
Size
(B for bytes)
|
Transmission
Bandwidth
(b for bits)
|
Transmission
Time (using a
28.8K Modem)
|
|
A page of text
|
11'' x 8.5''
|
Varying resolution
|
4-8 KB
|
32-64 Kb/page
|
1.1 - 2.2 sec
|
|
Telephone quality speech
|
10 sec
|
8 bps
|
80 KB
|
64 Kb/sec
|
22.2 sec
|
|
Grayscale Image
|
512 x 512
|
8 bpp
|
262 KB
|
2.1 Mb/image
|
1 min 13 sec
|
|
Color Image
|
512 x 512
|
24 bpp
|
786 KB
|
6.29 Mb/image
|
3 min 39 sec
|
|
Medical Image
|
2048 x 1680
|
12 bpp
|
5.16 MB
|
41.3 Mb/image
|
23 min 54 sec
|
|
SHD Image
|
2048 x 2048
|
24 bpp
|
12.58 MB
|
100 Mb/image
|
58 min 15 sec
|
|
Full-motion Video
|
640 x 480, 1 min
(30 frames/sec)
|
24 bpp
|
1.66 GB
|
221 Mb/sec
|
5 days 8 hrs
|
The examples above clearly illustrate the need for sufficient storage
space, large transmission bandwidth, and long transmission time for image,
audio, and video data. At the present state of technology, the only solution
is to compress multimedia data before its storage and transmission, and
decompress it at the receiver for play back. For example, with a compression
ratio of 32:1, the space, bandwidth, and transmission time requirements
can be reduced by a factor of 32, with acceptable quality.
What are the principles behind compression?
A common characteristic of most images is that the neighboring pixels are
correlated and therefore contain redundant information. The
foremost task then is to find less correlated representation of the image. Two fundamental components of compression are redundancy and irrelevancy reduction. Redundancy reduction aims at removing duplication from the signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the Human Visual System (HVS). In general, three types of redundancy can be identified:
-
Spatial Redundancy or correlation between neighboring pixel values.
-
Spectral Redundancy or correlation between different color planes
or spectral bands.
-
Temporal Redundancy or correlation between adjacent frames in a
sequence of images (in video applications).
Image compression research aims at reducing the number of bits needed to
represent an image by removing the spatial and spectral redundancies as
much as possible. Since we will focus only on still image compression,
we will not worry about temporal redundancy.
What are the different classes of compression techniques?
Two ways of classifying compression techniques are mentioned here.
(a) Lossless vs. Lossy compression: In lossless compression schemes,
the reconstructed image, after compression, is numerically identical to the
original image. However lossless compression can only a achieve a modest amount of compression. An image reconstructed following lossy compression contains degradation relative to the original. Often this is because the compression scheme completely discards redundant information.
However, lossy schemes are capable of achieving much higher compression. Under normal viewing conditions, no visible loss is perceived (visually lossless).
(b) Predictive vs. Transform coding: In predictive coding, information already sent or available is used to predict future values, and the difference
is coded. Since this is done in the image or spatial domain, it is relatively
simple to implement and is readily adapted to local image characteristics.
Differential Pulse Code Modulation (DPCM) is one particular example of
predictive coding. Transform coding, on the other hand, first transforms
the image from its spatial domain representation to a different type of
representation using some well-known transform and then codes the transformed
values (coefficients). This method provides greater data compression compared
to predictive methods, although at the expense of greater computation.
What does a typical image coder look like?
A typical lossy image compression system is shown in Fig.
1. It consists of three closely connected components namely (a) Source
Encoder (b) Quantizer, and (c) Entropy Encoder. Compression is accomplished
by applying a linear transform to decorrelate the image data, quantizing
the resulting transform coefficients, and entropy coding the quantized
values.
Fig. 1. A Typical Lossy Signal/Image Encoder
Source Encoder (or Linear Transformer)
Over the years, a variety of linear transforms have been developed
which include Discrete Fourier Transform (DFT), Discrete Cosine Transform
(DCT) [1], Discrete Wavelet Transform (DWT)[29]
and many more, each with its own advantages and disadvantages.
Quantizer
A quantizer simply reduces the number of bits needed to store the transformed
coefficients by reducing the precision of those values. Since this is a
many-to-one mapping, it is a lossy process and is the main source of compression
in an encoder. Quantization can be performed on each individual coefficient,
which is known as Scalar Quantization (SQ). Quantization can also be performed
on a group of coefficients together, and this is known as Vector Quantization
(VQ). Both uniform and non-uniform quantizers can be used depending on
the problem at hand. For an analysis on different quantization schemes,
see [12].
Entropy Encoder
An entropy encoder further compresses the quantized values losslessly
to give better overall compression. It uses a model to accurately determine
the probabilities for each quantized value and produces an appropriate
code based on these probabilities so that the resultant output code stream
will be smaller than the input stream. The most commonly used entropy encoders
are the Huffman encoder and the arithmetic encoder, although for applications
requiring fast execution, simple run-length encoding (RLE) has proven very
effective. An overview on various entropy encoding techniques can be found
in [12,18].
It is important to note that a properly designed quantizer and entropy
encoder are absolutely necessary along with optimum signal transformation
to get the best possible compression.
JPEG : DCT-Based Image Coding Standard
The idea of compressing an image is not new. The discovery of DCT in 1974
[1] is an important achievement for the research community
working on image compression. The DCT can be regarded as a discrete-time
version of the Fourier-Cosine series. It is a close relative of DFT, a
technique for converting a signal into elementary frequency components.
Thus DCT can be computed with a Fast Fourier Transform (FFT) like algorithm
in O(n log n) operations. Unlike DFT, DCT is real-valued and provides
a better approximation of a signal with fewer coefficients. The DCT of
a discrete signal x(n), n=0, 1, .. , N-1 is defined as:
where, C(u) = 0.707 for u = 0 and
= 1
otherwise.
An excellent analysis of DCT and related transforms and their applications
can be found in [26].
In 1992, JPEG established the first international standard for still image compression where the encoders and decoders are DCT-based. The JPEG standard
specifies three modes namely sequential, progressive, and hierarchical for
lossy encoding, and one mode of lossless encoding. The `baseline JPEG coder'
[19,30] which is the sequential
encoding in its simplest form, will be briefly discussed here. Fig. 2(a)
and 2(b) show the key processing steps in such an
encoder and decoder for grayscale images. Color image compression can be
approximately regarded as compression of multiple grayscale images, which
are either compressed entirely one at a time, or are compressed by alternately
interleaving 8x8 sample blocks from each in turn. In this article, we focus
on grayscale images only.
Fig. 2(a) JPEG Encoder Block Diagram
Fig. 2(b) JPEG Decoder Block Diagram
The DCT-based encoder can be thought of as essentially compression of
a stream of 8x8 blocks of image samples. Each 8x8 block makes its way through
each processing step, and yields output in compressed form into the data
stream. Because adjacent image pixels are highly correlated, the `forward'
DCT (FDCT) processing step lays the foundation for achieving data compression
by concentrating most of the signal in the lower spatial frequencies. For
a typical 8x8 sample block from a typical source image, most of the spatial
frequencies have zero or near-zero amplitude and need not be encoded. In
principle, the DCT introduces no loss to the source image samples; it merely
transforms them to a domain in which they can be more efficiently encoded.
After output from the FDCT, each of the 64 DCT coefficients is uniformly
quantized in conjunction with a carefully designed 64-element Quantization
Table (QT)[19]. At the decoder, the quantized values
are multiplied by the corresponding QT elements to recover the original
unquantized values. After quantization, all of the quantized coefficients
are ordered into the "zig-zag" sequence as shown in Fig.
3. This ordering helps to facilitate entropy encoding by placing
low-frequency non-zero coefficients before high-frequency coefficients.
The DC coefficient, which contains a significant fraction of the total
image energy, is differentially encoded.
Fig. 3. Zig-Zag sequence
Entropy Coding (EC) achieves additional compression losslessly by encoding
the quantized DCT coefficients more compactly based on their statistical
characteristics. The JPEG proposal specifies both Huffman coding and arithmetic
coding. The baseline sequential codec uses Huffman coding,
but codecs with both methods are specified for all modes of operation.
Arithmetic coding, though more complex, normally achieves 5-10% better
compression than Huffman coding.
Wavelets and Image Compression
What is a Wavelet Transform ?
Wavelets are functions defined over a finite interval and having an average
value of zero. The basic idea of the wavelet transform is to represent
any arbitrary function (t) as a superposition of a set of such
wavelets or basis functions. These basis functions or baby wavelets are
obtained from a single prototype wavelet called the mother wavelet, by
dilations or contractions (scaling) and translations (shifts). The Discrete
Wavelet Transform of a finite length signal x(n) having N
components, for example, is expressed by an N x N matrix.
For a simple and excellent introduction to wavelets, see [6].
For a thorough analysis and applications of wavelets and filterbanks, see
[24,29].
Why Wavelet-based Compression?
Despite all the advantages of JPEG compression schemes based on DCT namely
simplicity, satisfactory performance, and availability of special purpose
hardware for implementation, these are not without their shortcomings.
Since the input image needs to be ``blocked,'' correlation across the block
boundaries is not eliminated. This results in noticeable and annoying ``blocking
artifacts'' particularly at low bit rates as shown in Fig.
4. Lapped Orthogonal Transforms (LOT) [16] attempt
to solve this problem by using smoothly overlapping blocks. Although blocking
effects are reduced in LOT compressed images, increased computational complexity
of such algorithms do not justify wide replacement of DCT by LOT.
|
|
|
(a)
|
(b)
|
Fig. 4(a) Original Lena Image, and (b) Reconstructed
Lena with DC component only, to show blocking artifacts
Over the past several years, the wavelet transform has gained widespread
acceptance in signal processing in general, and in image compression research
in particular. In many applications wavelet-based schemes (also referred
as subband coding) outperform other coding schemes like the one based on
DCT. Since there is no need to block the input image and its basis functions
have variable length, wavelet coding schemes at higher compression avoid
blocking artifacts. Wavelet-based coding is more robust under transmission
and decoding errors, and also facilitates progressive transmission of images.
In addition, they are better matched to the HVS characteristics.
Because of their inherent multiresolution nature[17],
wavelet coding schemes are especially suitable for applications where scalability
and tolerable degradation are important.
Subband Coding
The fundamental concept behind Subband Coding (SBC) is to split up the
frequency band of a signal (image in our case) and then to code each subband
using a coder and bit rate accurately matched to the statistics of the
band. SBC has been used extensively first in speech coding [10]
and later in image coding [31] because of its inherent
advantages namely variable bit assignment among the subbands as well as coding
error confinement within the subbands.
|
|
|
(a)
|
(b)
|
Fig. 5(a) Separable 4-subband Filterbank, and 5(b) Partition
of the Frequency Domain
Woods and O'Neil [31] used a separable combination
of one-dimensional Quadrature Mirror Filterbanks (QMF) to perform a 4-band
decomposition by the row-column approach as shown in Fig.
5(a). Corresponding division of the frequency spectrum is shown in
Fig.
5(b). The process can be iterated to obtain higher band decomposition
filter trees. At the decoder, the subband signals are decoded, upsampled
and passed through a bank of synthesis filters and properly summed up to
yield the reconstructed image. Interested readers may look into a number
of books and papers [24,27,28,29]
dealing with single and multi-dimensional QMF design and applications.
From Subband to Wavelet Coding
Over the years, there have been many efforts leading to improved and efficient
design of filterbanks and subband coding techniques. Since 1990, methods
very similar and closely related to subband coding have been proposed by
various researchers under the name of Wavelet Coding (WC) using
filters specifically designed for this purpose. Such filters must meet
additional and often conflicting requirements [24,29].
These include short impulse response of the analysis filters to preserve
the localization of image features as well as to have fast computation,
short impulse response of the synthesis filters to prevent spreading of
artifacts (ringing around edges) resulting from quantization errors, and
linear phase of both types of filters since nonlinear phase introduce unpleasant
waveform distortions around edges. Orthogonality is another useful requirement
since orthogonal filters, in addition to preservation of energy, implement
a unitary transform between the input and the subbands. But, as in the
case of 1-D, in two-band Finite Impulse Response (FIR) systems linear phase
and orthogonality are mutually exclusive, and so orthogonality is sacrificed
to achieve linear phase.
Link between Wavelet Transform and Filterbank
Construction of orthonormal families of wavelet basis functions can be
carried out in continuous time. However, the same can also be derived by
starting from discrete-time filters. Daubechies [9]
was the first to discover that the discrete-time filters or QMFs can be
iterated and under certain regularity conditions will lead to continuous-time
wavelets. This is a very practical and extremely useful wavelet decomposition
scheme, since FIR discrete-time filters can be used to implement them.
It follows that the orthonormal bases in [9] correspond
to a subband coding scheme with exact reconstruction property, using the
same FIR filters for reconstruction as for decomposition. So, subband coding
developed earlier is in fact a form of wavelet coding in disguise. Wavelets
did not gain popularity in image coding until Daubechies established this
link in late 1980s. Later a systematic way of constructing a family of
compactly supported biorthogonal wavelets was developed by Cohen, Daubechies,
and Feauveau (CDF) [7]. Although the design and choice
of various filters and the construction of different wavelets from the
iteration of such filters are very important, it is beyond the scope of
this article.
An Example of Wavelet Decomposition
There are several ways wavelet transforms can decompose
a signal into various subbands. These include uniform decomposition, octave-band
decomposition, and adaptive or wavelet-packet decomposition [29].
Out of these, octave-band decomposition is the most widely used. This is
a non-uniform band splitting method that decomposes the lower frequency
part into narrower bands and the high-pass output at each level is left
without any further decomposition. Fig. 6(a)
shows the various subband images of a 3-level octave-band decomposed Lena
using the popular CDF-9/7 [7] biorthogonal wavelet.
|
|
|
(a)
|
(b)
|
Fig. 6(a) Three level octave-band decomposition of Lena image,
and (b) Spectral decomposition and ordering.
Most of the subband and wavelet coding schemes can also be described
in terms of the general framework depicted as in Fig. 1. The main difference
from the JPEG standard is the use of DWT rather than DCT. Also, the image need
not be split into 8 x 8 disjoint blocks. Of course, many enhancements have
been made to the standard quantization and encoding techniques to take
advantage of how the wavelet transforms works on an image and the properties
and statistics of transformed coefficients so generated. These will be
discussed next.
Advanced Wavelet Coding Schemes
A. Recent Developments in Subband and Wavelet Coding
The interplay between the three components of any image coder cannot be
over-emphasized since a properly designed quantizer and entropy encoder
are absolutely necessary along with optimum signal transformation to get
the best possible compression. Many enhancements have been made to the standard
quantizers and encoders to take advantage of how the wavelet transform
works on an image, the properties of the HVS, and the statistics of transformed
coefficients. A number of more sophisticated variations of the standard
entropy encoders have also been developed. These include Q, QM, ELS, Z,
and ZP coders [3,25]. These have
lead to improved results in terms of lower bit rates for a required image
quality and better image quality for a given bit rate.
Over the past few years, a variety of novel and sophisticated wavelet-based
image coding schemes have been developed. These include EZW[23],
SPIHT[22], SFQ[32], CREW[2],
EPWIC[4], EBCOT[25], SR[26],
Second Generation Image Coding[11], Image Coding using
Wavelet Packets[8], Wavelet Image Coding using VQ[12],
and Lossless Image Compression using Integer Lifting[5]. This list is by no means exhaustive and many more such
innovative techniques are being developed as this article is written. We
will briefly discuss a few of these interesting algorithms here.
1. Embedded Zerotree Wavelet (EZW) Compression
In octave-band wavelet decomposition, shown in Fig. 7(a), each coefficient
in the high-pass bands of the wavelet transform has four coefficients corresponding
to its spatial position in the octave band above in frequency. Because
of this very structure of the decomposition, it probably needed a smarter
way of encoding its coefficients to achieve better compression results.
Lewis and Knowles [15] in 1992 were the first to introduce
a tree-like data structure to represent the coefficients of the octave
decomposition.
Later, in 1993 Shapiro [23] called this structure
"zerotree" of wavelet coefficients, and presented his elegant algorithm
for entropy encoding called "Embedded Zerotree Wavelet" (EZW) algorithm.
The zerotree is based on the hypothesis that if a wavelet coefficient at
a coarse scale is insignificant with respect to a given threshold T, then
all wavelet coefficients of the same orientation in the same spatial location
at a finer scales are likely to be insignificant with respect to T. The
idea is to define a tree of zero symbols which starts at a root which is
also zero and labeled as end-of-block. Fig. 7(a) and
7(b)
shows a similar zerotree structure. Many insignificant coefficients at
higher frequency subbands (finer resolutions) can be discarded, because
the tree grows as powers of four. The EZW algorithm encodes the tree structure
so obtained. This results in bits that are generated in order of importance,
yielding a fully embedded code. The main advantage of this encoding is
that the encoder can terminate the encoding at any point, thereby allowing
a target bit rate to be met exactly. Similarly, the decoder can also stop
decoding at any point resulting in the image that would have been produced
at the rate of the truncated bit stream. The algorithm produces excellent
results without any pre-stored tables or codebooks, training, or prior
knowledge of the image source.
|
|
|
(a)
|
(b)
|
Fig. 7(a) Structure of zerotrees,
and 7(b) Scanning order of subbands for encoding
Since its inception in 1993, many enhancements have been made to make
the EZW algorithm more robust and efficient. One very popular and improved
variation of the EZW is the SPIHT algorithm.
2. Set Partitioning in Hierarchical Trees (SPIHT) Algorithm
Said and Pearlman [22], offered an alternative explanation
of the principles of operation of the EZW algorithm to better understand
the reasons for its excellent performance. According to them, partial ordering
by magnitude of the transformed coefficients with a set partitioning sorting
algorithm, ordered bitplane transmission of refinement bits, and exploitation
of self-similarity of the image wavelet transform across different scales
of an image are the three key concepts in EZW. In addition, they offer
a new and more effective implementation of the modified EZW algorithm based
on set partitioning in hierarchical trees, and call it the SPIHT algorithm.
They also present a scheme for progressive transmission of the coefficient
values that incorporates the concepts of ordering the coefficients by magnitude
and transmitting the most significant bits first. They use a uniform scalar
quantizer and claim that the ordering information made this simple quantization
method more efficient than expected. An efficient way to code the ordering
information is also proposed. According to them, results from the SPIHT
oding algorithm in most cases surpass those obtained from EZQ algorithm.
3. Scalable Image Compression with EBCOT
This algorithm is based on independent Embedded Block Coding with Optimized
Truncation of the embedded bit-streams (EBCOT)[25]
which identifies some of the major contributions of the algorithm. The
EBCOT algorithm is related in various degrees to much earlier work on scalable
image compression. Noteworthy among its early predecessors are: the EZW algorithm,
SPIHT algorithm, and Taubman and Zakhor's LZC (Layered Zero Coding) algorithm.
Like each of these, the EBCOT algorithm uses a wavelet
transform to generate the subband coefficients which are then
quantized and coded. Although the usual dyadic wavelet
decomposition is typical, other "packet" decompositions are
also supported and occasionally preferable.
Scalable compression refers to the generation of a bit-stream which
contains embedded subsets, each of which represents an efficient compression
of the original image at a reduced resolution or increased distortion.
A key advantage of scalable compression is that the target bit-rate or
reconstruction resolution need not be known at the time of compression.
Another advantage of practical significance is that the image need not
be compressed multiple times in order to achieve a target bit-rate, as
is common with the existing JPEG compression standard. Rather than focusing
on generating a single scalable bit-stream to represent the entire image,
EBCOT partitions each subband into relatively small blocks of samples and
generates a separate highly scalable bit-stream to represent each so-called
code-block. The algorithm exhibits state-of-the-art compression performance
while producing a bit-stream with an unprecedented feature set, including
resolution and SNR scalability together with a random access property.
The algorithm has modest complexity and is extremely well suited to applications
involving remote browsing of large compressed images.
4. Lossless Image Compression using Integer to Integer WT
Although we have concentrated in this article mainly on lossy image coding,
lossless coding is important for high-fidelity images like medical images,
seismic data, satellite images, and images generated from studio-quality
video. The JPEG standard specifies a lossless coding scheme which simply codes
the difference between each pixel and the predicted value for the pixel.
The sequence of differences is encoded using Huffman or arithmetic encoding.
Unfortunately, the huge size of the images for which lossless compression
is required makes it necessary to have encoding methods that can support
storage and progressive transmission of images at a spectrum of resolutions
and encoding fidelities from lossy to lossless.
The multiresolution nature[17] of the wavelet transform
makes it an ideal candidate for progressive transmission. However, when wavelet
filtering is applied to an input image (set of integer pixel values), since
the filter coefficients are not necessarily integers, the resultant filtered
output is no longer integers but floating point values. For lossless encoding,
to make the decoding process exactly reversible, it is required that the
filtered coefficients should be represented with integer values.
In [5] approaches to build invertible wavelet transforms
that map integers to integers are described. These invertible
integer-to-integer wavelet transforms are useful for lossless image coding.
The
approach is based upon the idea of factoring wavelet transforms into lifting
steps thereby, allowing the construction of an integer version of every
wavelet transform. The construction is based on writing a wavelet transform
in terms of "lifting", which is a flexible technique that has been applied
to the construction of wavelets through an iterative process of updating
a subband from an appropriate linear combination of the other subbands.
A number of invertible integer wavelet transforms are implemented and applied
to lossless compression of images in [5]. Although
the results are mixed in terms of performance of these newly developed
filters, it is concluded that using such wavelet transforms permits lossless
representation of images thereby easily allowing progressive transmission
where a lower resolution version of the image is transmitted first followed
by transmission of successive details.
5. Image Coding using Adaptive Wavelets
Now a few words about image coding using adaptive wavelets. The main idea
behind the research work that we are currently pursuing is that all images
are not equal, and so in wavelet-based image coding, the wavelet filter
should be chosen adaptively depending on the statistical nature of image
being coded. We have experimented with a variety of wavelets to compress
a variety of images of different types at various compression ratios. Our
results [21] show that the performance in lossy coders
is image dependent; while some wavelet filters perform better than others
depending on the image, no specific wavelet filter performs uniformly better
than others on all images. Similar results have also been observed in the
context of lossless compression using various integer-to-integer wavelet
transforms. This adaptive filter selection is important because, when the
performance of the wavelet filter is poor in the first place, use of even
sophisticated quantization and context modeling of the transform coefficients
may not always provide significant enough gain. Hence, the importance of
searching and using good wavelet filters in most coding schemes
cannot be over emphasized. We are currently working on algorithms to dynamically
determine the right wavelet filter based on the type and statistical
nature of the input image to be coded.
B. Performance Comparison: DCT vs. DWT
A final word on the performance of wavelet-based and JPEG coders. The peak
signal to noise ratios (PSNR) of several different wavelet compression
techniques applied to the 512 x 512, 8-bpp Lena image as well as the performance
of a baseline JPEG image compressor are compared in [13]
and are reproduced in Fig. 8. It is seen that, at compression
ratios less than 25:1 or so, the JPEG performs better numerically than
the simple wavelet coders. At compression ratios above 30:1, JPEG performance
rapidly deteriorates, while wavelet coders degrade gracefully well beyond
ratios of 100:1. The graphs also show that both the encoding technique
and the particular wavelet used can make a significant difference in the
performance of a compression system: the zerotree coder performs the best;
biorthogonal perform better than W6; and variable length coders(VLC) perform
better than fixed length coders(FLC).
Fig 8. Comparison of Wavelet
Compression methods [13]
More detailed PSNR comparison results of many wavelet-based algorithms
mentioned here can be found at the Image Communications Laboratory's Web
site at UCLA: http://www.icsl.ucla.edu/~ipl/
Future Image Compression Standard and Conclusion
JPEG-2000: Image compression standard for the next millennium
A lot of research work has been done on still image compression since the
establishment of the JPEG standard in 1992. To bring these research efforts
into a focus, a new standard called JPEG-2000[14]
for coding of still images is currently under development, and should be
completed by the end of year 2000. This standard is intended to advance
standardized image coding systems to serve applications into the next millennium.
It will provide a set of features vital to many high-end and emerging image
applications by taking advantage of new modern technologies. Specifically,
this new standard will address areas where current standards fail to produce
the best quality or performance. It will also provide capabilities to markets
that currently do not use compression. The standard will strive for openness
and royalty-free licensing. It is intended to compliment, not replace,
the current JPEG standards. This standard will include many modern features
including improved low bit-rate compression performance, lossless and lossy
compression, continuous-tone and bi-level compression, compression of large
images, single decompression architecture, transmission in noisy environments
including robustness to bit-errors, progressive transmission by pixel accuracy
and resolution, content-based description, and protective image security.
One important point to note is that a vast majority of the 22 candidate
algorithms under consideration are wavelet-based, and it is almost certain
that JPEG-2000 will be based on the DWT rather than DCT. More information
on JPEG-2000 activity can be found in JPEG's website (URL http://www.jpeg.org
), although most of this information is limited to JPEG members only.
Conclusion
While the DCT-based image coders perform very well at moderate bit rates,
at higher compression ratios, image quality degrades because of the artifacts
resulting from the block-based DCT scheme. Wavelet-based coding on the
other hand provides substantial improvement in picture quality at low bit
rates because of overlapping basis functions and better energy compaction property
of wavelet transforms. Because of the inherent multiresolution nature,
wavelet-based coders facilitate progressive transmission of images thereby
allowing variable bit rates. We have briefly reviewed some of
the more sophisticated techniques that take advantage of the statistics
of the wavelet coefficients. The upcoming JPEG-2000 standard will incorporate
many of these research works and will address many important aspects in
image coding for the next millennium. However, the current data compression
methods might be far away from the ultimate limits imposed by the underlying
structure of specific data sources such as images. Interesting issues like
obtaining accurate models of images, optimal representations of such models,
and rapidly computing such optimal representations are the "Grand Challenges"
facing the data compression community. Interaction of harmonic analysis
with data compression, joint source-channel coding, image coding based
on models of human perception, scalability, robustness, error resilience,
and complexity are a few of the many outstanding challenges in image coding
to be fully resolved and may affect image data compression performance
in the years to come.
Acknowledgements
I would like to thank Prof. Rao Vemuri for his inspiration and guidance
in writing this article. I would also like to thank the anonymous reviewers
for their constructive criticism and feedback.
Glossary of Terms
Please visit my glossary
page (http://www.engr.ucdavis.edu/~ssaha/glossary.html) for terminology
related to data compression.
References
- 1
- Ahmed, N., Natarajan, T., and Rao, K. R. Discrete
Cosine Transform, IEEE Trans. Computers, vol. C-23, Jan. 1974, pp.
90-93.
- 2
- Boliek, M., Gormish, M. J., Schwartz, E. L.,
and Keith, A. Next Generation Image Compression and Manipulation Using
CREW,
Proc. IEEE ICIP, 1997, http://www.crc.ricoh.com/CREW.
- 3
- Bottou, L., Howard, P. G., and Bengio, Y. The
Z-Coder Adaptive Binary Coder, Proc. IEEE DCC, Mar. 1998, pp. 13-22, .
- 4
- Buccigrossi, R., and Simoncelli, E. P. EPWIC:
Embedded Predictive Wavelet Image Coder, GRASP Laboratory, TR #414.
- 5
- Calderbank, R. C., Daubechies, I., Sweldens,
W., and Yeo, B. L. Wavelet Transforms that Map Integers to Integers, Applied
and Computational Harmonic Analysis (ACHA), vol. 5, no. 3, pp. 332-369,
1998, http://cm.bell-labs.com/who/wim/papers/integer.pdf.
- 6
- Chan, Y. T. Wavelet Basics, Kluwer
Academic Publishers, Norwell, MA, 1995.
- 7
- Cohen, A., Daubechies, I., and Feauveau,
J. C. Biorthogonal Bases of Compactly Supported Wavelets, Comm. on Pure
and Applied Mathematics, 1992, vol. XLV, pp. 485-560.
- 8
- Coifman, R. R. and Wickerhauser, M. V.
Entropy Based Algorithms for Best Basis Selection, IEEE Trans. Information
Theory, vol. 38, no. 2, Mar. 1992, pp. 713-718.
- 9
- Daubechies, I. Orthonormal Bases of Compactly
Supported Wavelets, Comm. Pure and Applied Math., vol. 41, Nov.
1988, pp. 909-996.
- 10
- Estaban, D. and Galand, C. Application
of Quadrature Mirror Filters to Split Band Voice Coding Schemes, Proc.
ICASSP, May 1977, pp. 191-195.
- 11
- Froment , J. and Mallat, S. Second Generation
Compact Image Coding with Wavelets, in C.K. Chui, editor, Wavelets:
A Tutorial in Theory and Applications, vol. 2, Academic Press, NY,
1992.
- 12
- Gersho, A. and Gray, R. M. Vector
Quantization and Signal Compression, Kluwer Academic Publishers,
1991.
- 13
- Hilton, M. L., Jawerth, B. D.,
and Sengupta, A. Compressing Still and Moving Images with Wavelets, Multimedia
Systems, vol. 2 no. 3, April, 1994, ftp://ftp.math.scarolina.edu/pub/wavelet/papers/varia/tutorial/tutorial.ps.Z.
- 14
- ISO/IEC/JTC1/SC29/WG1 N390R, JPEG 2000 Image
Coding System, Mar. 1997, http://www.jpeg.org/public/wg1n505.pdf.
- 15
- Lewis, A. S. and Knowles, G. Image Compression
Using the 2-D Wavelet Transform, IEEE Trans. IP, vol. 1, no. 2,
April 1992, pp. 244-250.
- 16
- Malavar, H. S. Signal Processing with
Lapped Transforms, Norwood, MA, Artech House, 1992.
- 17
- Mallat, S. G. A Theory for Multiresolution
Signal Decomposition: The Wavelet Representation, IEEE Trans. PAMI,
vol. 11, no. 7, July 1989, pp. 674-693.
- 18
- Nelson, M. The Data Compression Book,2nd
ed., M&T books, Nov. 1995, http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=1558514341.
- 19
- Pennebaker, W. B. and Mitchell, J. L.
JPEG
- Still Image Data Compression Standards, Van Nostrand Reinhold, 1993.
- 20
- Rao, K. R. and Yip, P. Discrete Cosine
Transforms - Algorithms, Advantages, Applications, Academic Press,
1990.
- 21
- Saha, S. and Vemuri, R. Adaptive Wavelet Coding
of Multimedia Images, Proc. ACM Multimedia Conference, Nov. 1999,
to appear.
- 22
- Said, A. and Pearlman, W. A. A New, Fast,
and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees,
IEEE
Trans. CSVT, vol. 6, no. 3, June 1996, pp. 243-250, ftp://ipl.rpi.edu/pub/EW_Code/SPIHT.ps.gz.
- 23
- Shapiro, J. M. Embedded Image Coding
Using Zerotrees of Wavelet Coefficients, IEEE Trans. SP, vol. 41,
no. 12, Dec. 1993, pp. 3445-3462.
- 24
- Strang, G. and Nguyen, T. Wavelets
and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996, http://www-math.mit.edu/~gs/books/wfb.html.
- 25
- Taubman, D. High Performance Scalable Image
Compression with EBCOT, submitted to IEEE Tran. IP, Mar. 1999.
- 26
- Tsai, M. J., Villasenor, J. D., and Chen, F.
Stack-Run Image Coding, IEEE Trans. CSVT, vol. 6, no. 5, Oct. 1996, pp.
519-521.
- 27
- Vaidyanathan, P. P. Multirate Systems
and Filter Banks, Prentice Hall, Englewood Cliffs, N.J., 1993.
- 28
- Vetterli, M. and Herley, C. Wavelets
and Filter Banks: Theory and Design, IEEE Trans. SP, vol. 40, no.
9, Sep. 1992, pp. 2207-2232.
- 29
- Vetterli, M. and Kovacevic, J. Wavelets
and Subband Coding, Englewood Cliffs, NJ, Prentice Hall, 1995, .
- 30
- Wallace, G. K. The JPEG Still Picture
Compression Standard, Comm. ACM, vol. 34, no. 4, April 1991, pp.
30-44.
- 31
- Woods, J. W. and O'Neil, S. D. Subband Coding
of Images IEEE Trans. ASSP, vol. 34, no. 5, October 1986, pp. 1278-1288.
- 32
- Xiong, Z., Ramachandran, K. and Orchard,
M. T. Space-Frequency Quantization for Wavelet Image Coding, IEEE Trans.
IP, vol. 6, no. 5, May 1997, pp. 677-693, http://troi.ifp.uiuc.edu/~kannan/my_papers/xiong_ramchandran_orchard_ip97.ps.gz.
Biography Subhasis
Saha (saha@llnl.gov ) received the
B.E. degree in Electronics and Tele-Communications Engineering from Bengal
Engineering College, University of Calcutta, and the M.Tech. degree in
Computer Science from Indian Institute of Technology, Kharagpur, India.
He is currently pursuing a Ph. D. from the University of California, Davis
with a Student Employee-ship from Lawrence Livermore National Laboratory.
Prior to joining the Ph. D. program, he was a technical lead at Apple Computer
Inc. in Cupertino, California. His current research interests include image
and video coding, pattern recognition, neural networks, and soft computing.
Copyright 2000 Subhasis Saha
Last Modified:Sunday, 21-Jan-01 18:08:26
MIND-IT-IGNORE
© Copyright 2000-2002 by ACM, Inc.
|