Pyramid coder with nonlinear prediction
Laurent MEUNIER
Antoine MANENS
EE368b (Image and Video Compression)
Fall 2000
Introduction
The idea of a pyramidal image decomposition was first introduced by Burt
& Adelson [1] as a Compact Image Code. They envisioned it as a powerful
lossless technique to remove pixel-to-pixel correlation as well as a useful
tool for progressive transmission.
A pyramidal image is composed of one low-resolution image along with all the higher-resolution error images. For this
reason, the initial image can be recovered losslessly, which is a fundamental
requirement in many fields (medical imaging and astrophysics are two examples
of such applications). The data reduction is obtained through a filtering
process that minimizes the variance of the error images.
In a progressive transmission scheme, an image is first sent at reduced
resolution (top of the pyramid) and the error images are sent afterwards, each
bringing finer details to be superimposed on the interpolated low-resolution
picture. In a similar way, we could imagine a protocol that adapts the
resolution to the rate of the connection: a PC user behind a DSL connection
does not necessarily have the same requirements as a Palm user.
In this context, a pyramid coder is the ideal tool.
Unlike most traditional techniques such as transform coding,
statistical modeling or linear filtering, pyramid coding offers a lot of
freedom in the choice of its operators. Linear operators
have been studied extensively, but there is no reason to limit ourselves to them,
and some gains can be obtained by looking at non-linear operators.
We started by reviewing the existing linear and non-linear techniques.
The original paper by Burt & Adelson gave birth to all kinds of linear
kernels, yet no work has been done to synthesize the reconstruction
performance of the existing filters in terms of PSNR and visual quality.
The observed limitations of the linear kernels motivated our non-linear approach,
which gave us some improvement in the context of lossless coding.
Framework
Algorithm
The pyramidal algorithm has two operators, sketched below:
- REDUCE: a pre-sampling filter followed by downsampling by a factor of 2
- EXPAND: upsampling by a factor of 2 followed by an interpolation filter
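As an illustration, here is a minimal Matlab sketch of both operators, assuming the separable 5-tap generating kernel of [1] (a = 0.4) and simple zero-padding at the borders; the function names are ours, and any other filter, linear or not, fits the same interface.

    function small = reduce(img)
        w = [0.05 0.25 0.40 0.25 0.05];          % 1-D generating kernel (a = 0.4)
        filtered = conv2(img, w' * w, 'same');   % pre-sampling low-pass filter
        small = filtered(1:2:end, 1:2:end);      % downsampling by 2 in each direction
    end

    function big = expand(small)
        w = [0.05 0.25 0.40 0.25 0.05];
        big = zeros(2 * size(small));            % upsampling: insert zeros
        big(1:2:end, 1:2:end) = small;
        big = conv2(big, 4 * (w' * w), 'same');  % interpolation filter (gain of 4)
    end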
The REDUCE operator produces a low-pass version of the input signal; the error
signal is the difference between the input and the result of EXPAND applied to
this reduced version. The output signal is made of all the error images together
with the last low-pass version: it contains all the information necessary to
reconstruct the original image (modulo the quantization errors).
Figure: Open-loop pyramid coder. The information sent consists of the error images and the reduced image.
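Under the same assumptions, the analysis step can be sketched as follows (the function name is ours; image dimensions are assumed divisible by 2^depth):

    function levels = build_pyramid(img, depth)
        levels = cell(1, depth + 1);
        for k = 1:depth
            low       = reduce(img);           % low-pass, subsampled version
            levels{k} = img - expand(low);     % error image at this resolution
            img       = low;                   % recurse on the reduced image
        end
        levels{depth + 1} = img;               % top of the pyramid: the reduced image
    end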
From a compression point of view, the role of the pyramidal decomposition
is to concentrate the energy of the image in the subsampled low-pass version.
If we achieve this, the error images will have a low variance and we will
therefore be able to code them efficiently.
From a perceptual point of view, we want to reduce the aliasing
of the higher frequencies so that an image reconstructed to full size from
a lower-resolution version gives a satisfying result.
Figure: Images transmitted in the pyramidal algorithm
Note that there is no restriction on the REDUCE and EXPAND operators
(unlike in wavelet decomposition or in subband coding): in the general
case they can be non-linear.
The receiver gets the reduced image and the error images. The reduced
image is expanded using the same EXPAND operator as the coder, and the error
images are added at each step of the pyramid to restore the details.
Figure: Expansion of the image.
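A minimal sketch of this receiver-side synthesis, mirroring the coder and using the same hypothetical helper functions as above:

    function img = reconstruct(levels)
        img = levels{end};                     % reduced image (top of the pyramid)
        for k = numel(levels) - 1 : -1 : 1
            img = expand(img) + levels{k};     % superimpose the details at each step
        end
    end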
One of the features we focused on in this study is progressive transmission.
While the cost of transmitting the reduced image is low, the cost of
transmitting the error images increases by a factor of 4 at each level.
However, as shown in the next figure, not all levels are necessary
to obtain a recognizable picture.
Figure: Different steps of the reconstruction of the image (left to right, top to bottom)
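For reference, since each successive error image holds four times as many pixels as the one above it, the total amount of data in the pyramid stays bounded: for an N-pixel original,

    N (1 + 1/4 + 1/16 + ...) < 4N/3 samples,

so the full pyramid carries at most one third more samples than the original image.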
Quantization
Entropy can be substantially reduced by quantizing the pixel values in
each level of the Laplacian pyramid [1]. With a proper choice of the number
and distribution of the quantization levels, the degradation due to
quantization is almost imperceptible (the human eye is fairly sensitive to
contrast perturbations at low and medium spatial frequencies but relatively
less so at high spatial frequencies).
However, we did not concentrate on this aspect. We concentrated on lossless
coding, and the quantization simply consisted of rounding to the nearest
integer value. This way, the distinction between open- and closed-loop
pyramidal coding is no longer relevant since the reconstruction is nearly
perfect at each step. Most importantly, the inherent quality of the
REDUCE-EXPAND scheme can be assessed without any side effects coming
from other aspects of the coding pipeline.
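A quick check of this, assuming the reduce/expand sketches above and a hypothetical integer test image: rounding the error image to the nearest integer changes the reconstruction by at most half a grey level, consistent with the nearly perfect reconstruction mentioned above.

    img = round(255 * rand(256));              % hypothetical 256x256 integer test image
    low = reduce(img);
    err = round(img - expand(low));            % error image rounded to nearest integer
    rec = expand(low) + err;                   % receiver-side reconstruction
    max(abs(rec(:) - img(:)))                  % never exceeds 0.5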
Coding
We made a few assumptions as far as how the coding was done. All the levels
of the pyramid are coded separately, each with its own Variable Length Code:
this permitted a significant improvement over coding the whole pyramid with a
single VLC. While the error images have more or less the same histogram of
values, the reduced-resolution image has completely different statistical
properties.
The final rate was obtained by adding the weighted entropies of the images
(i.e. assuming an ideal VLC coder) rather than actually coding the images
with Huffman coding. Moreover, we did not include the cost of sending the
coding table, which can be non-negligible as we increase the number of levels
of the pyramid: it would probably be more realistic to code the smaller error
images with the same VLC. Another approach could consist of using a precomputed
VLC coder based on the statistics of the pyramid levels.
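A minimal sketch of this rate estimate, assuming the pyramid is stored as a cell array as in the sketches above (the function name is ours): each level contributes its first-order entropy, weighted by its share of the original pixel count.

    function rate = pyramid_rate(levels, npix_original)
        % levels: {err_1, ..., err_K, reduced}; returns an ideal-VLC rate in bits/pixel
        rate = 0;
        for k = 1:numel(levels)
            v = round(levels{k}(:));               % integer symbols of this level
            [~, ~, idx] = unique(v);               % map each symbol to its histogram bin
            p = accumarray(idx, 1) / numel(v);     % empirical probabilities
            H = -sum(p .* log2(p));                % first-order entropy of the level
            rate = rate + H * numel(v) / npix_original;
        end
    end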
Implementation
Matlab was used for all the programming. The performance was quite good
because the pyramidal algorithm is naturally expressed in terms of matrix
operations.
Criteria
We based our comparison of the filters on several criteria:
- compression rate: how efficient is the transmission scheme?
- visual aspect: how well are the edges preserved at lower resolution, how strong is the blurring, etc.?
- aliasing: this point is highly related to the previous one; the only source of aliasing is the pre-sampling filter used in REDUCE.
- flops: we don't want the method to be too computationally expensive.