I. Introduction
II. Non-parametric Sampling
III. Estimation Using Subband Decomposition of Wavelet
Coefficients
IV. Estimation by Induced Correlation
V. Conclusions
Our target is to fill the empty area so that the resulting
image approaches the original image.
|
image with half of its area as a hole |
|
Original Image | Extrapolated Image (patch hold the whole feature) : 23.1338 dB |
Original Image | Patch used | Extrapolated Image : 20.8683 dB |
Our target is to explore into using these decomposed subband
images or transform coefficients to estimate unknown subband components.
|
|
L1 plane |
|
|
|
|
When using the Laplacian pyramid structure or the wavelet decomposition,
there is a decrease in the size of the plane(image) from subband to subband.
For example, if the lowest Laplacian plane, L0, is of size N by N, then
the next plane L1 is of size N/2 by N/2. Therefore, if we can estimate
the L0 from a lowpass filtered image of size N/2 by N/2, we can achieve
compression by transmitting only the N/2 by N/2 image and estimating the
high frequency subband at the decoder.
Because there is a reduction of four to one from (higher frequency)
subband to (next-lower frequency) subband, the reverse process of estimation
involves one-to-many mapping[4]. Estimating from the
(lower frequency) subband to (next-higher) subband requires each pixel
in the lower subband (we will refer these pixels as the parent pixels)
to predict 4 pixels in the higher subband (we will refer these pixels as
the children pixels). Therefore, our model has each parent pixel associated
with four functions that map the intensity value of the parent pixel to
each of the children pixels, respectively. It is in finding these functions
that define our 'relations' among subbands. Under the assumption of self-similarity
of the pixels among subbands, the four functions were simplified in our
model to scaling functions of the parent pixel value. In other words, the
children pixel values are described by the pixel value of its parent times
a scalar that characterizes the mapping from the parent to each child.
The visual explanation of this paragraph is given in the next figure.
|
On the other hand, wavelet transform coefficients are characterized by their similarity and locality we assumed in our model and thus fit perfectly as our source of experimentation. To have a better understanding of the structure of the wavelet transform coefficients and the inherent spatial similarities, experiment started with a simple synthetic image - checkerboard. This image has ideal horizontal, vertical, and diagonal structure. When a 3 level wavelet transform was used with biorthogonal filter of length 2.2, the resulting coefficients present a very simple structure and similarity.
Unlike the Laplacian images, the wavelet coefficients had a large different
in the range of values from plane to plane. Normalization was necessary
to correctly determine the scalar function mapping of each parent pixel
to its children pixels. It was also needed for comparison of blocks in
adjacent planes of the decomposition.
1) decompose input image - 3 levels (A, B, C) |
2) normalize and by normalizing factor to get and |
3) find the mapping from pixels in C to B
. |
4) find the best match in the minimum MSE or maximum correlation
sense best matching pixel coordinate
. or . |
5) if multiple best matches, find the one closest in Euclidean distance to the parent pixel |
6) estimate the 4 children pixel of the current pixel in B
. |
|
|
|
|
|
|
Estimated Image (MMSE) : 24.3735 dB | Estimated Image (Correlation) : 23.7204 dB |
Statistical characteristics of the coefficient values in each subband plane shows increase in the variance and range of values from the highest frequency subband(D1) to the lowest one(D3). The range of the value increases because the D3 planes are formed after filtering 3 times as compared to a single filtering for the D1 planes. Since the wavelet transform is a filtering process, the coefficients are determined by the filter taps and neighboring pixels that are input to the filter. For this reason, ranges varied for different images and for different filters. This range difference between subbands was the major cause of error and difficulty. As mentioned, normalization was necessary to compare blocks in D2 and D3. It was also needed to determine the scalar factors in the parent to children mapping. Although efforts were made on figuring out the range differences, there was no general trend. Normalization by the dynamic range of coefficient pixels or the variance of the pixels in each plane were attempted, it did not improve the results when real texture images were used. Another major problem was the near zero mean and the difference in variance from plane to plane. Near zero values of the parent pixels were set to zero and consequently they were non-significant coefficients. If near zero values were used in mapping and estimation, they resulted in erroneous large scaling coefficients, so they need to be set to zero. Since the mean values of each plane was near zero and the variance decreased from D3 to D1, many valid mappings in D3-D2 pair were not being fully utilized in D2-D1 pair mapping due to the thresholding process described. An appropriate tradeoff point was not available from experimentations.
The assumption of mapping propagation from one adjacent subband pair
to another does not generally hold for texture images. Figure shows the
horizontal detail planes (D1, D2, and D3) for a real texture image and
an ideal dense checkerboard pattern image. From the visualization of the
texture image coefficients, structure is very difficult to find in the
D3 plane. There is an increase in structure in D2 and D3 planes.
Thus, trying to find a mapping in the more unstructured coefficients to
estimate more structured coefficient mapping results in a random like behavior
of the estimation. Even with the same checkerboard pattern differing
only in the density of the patterns in the image, there is a noticeable
difference in the wavelet coefficient decomposition. Specifically, due
to lack of similarity in D3 and D2, the mapping of D3 to D2 would not be
the same as from D2 to D1. Therefore, a mapping acquired from D2
and D3 does not generally provide a correct mapping of the coefficients
from D2 to D1 which is our source of estimation.
|
|
|
|
|
|
<Fig.10> Vertical Detail Coefficients of Wavelet Transform (bior 2.2)
Here we approach this basic problem with a new paradigm. Our basic problem
is to reconstruct the original image well by transmitting only a fraction
of the original image. So, rather than just using the planes provided by
wavelet transform (or subsampling) with unknown statistics, we construct
a new transformation in order to construct the set of layer where we can
control the statistical relation between layers completely.
Our target in this system is to find the filter B
which maximizes the PSNR at the end of receiver.
1. Eliminate the correlation between each component completely using
KLT.
This step is necessary to make the correlation between each component
under our control completely later by correlation filter, B. KLT
means xKLT= A x , where A is the Hermitian of
the eigenmatrix of the original image.
|
2. Put xKLT into an invertible filter B such
that the correlation matrix of y ( =B xKLT ) has a "good"
property.
Filter B should be invertible so the estimated image can be reconstructed
later.
|
3. Choose l from y and transmit only l.
y=[y1, ..., yn, ..., ym]
(
=B xKLT ) and our definition of layers are u=[y1,...,
yn], l=[yn+1,..., ym].
So only (m-n)/m of the original image is transmitted.
|
4. Estimate u using l.
In order to maximize SNR, the mean square error between estimated u and real u should be minimized. From linear estimation theory that filter C=E[ulH](E[llH])-1 minimizes the mean square error. Then mean square error is tr(E[uuH]-E[ulH](E[llH])-1 E[luH]). So we have to find the matrix filter B so that tr(E[uuH])/tr(E[uuH]-E[ulH](E[llH])-1 E[luH]) is maximized.
|
5. Reconstruct xKLT using estimated y. |
6. Reconstruct x. |
So the next job is to find the filter B which maximizes the PSNR
at the end of receiver.
The estimation error for u is
. So noise term at PSNR is the trace of the following matrix.
|
To find the invertible matrix B to minimize the noise is very difficult job. So we could not solve this for general cases. Even though we could not solve this problem, this is very interesting problem since it will guide the systematic approach to construct the planes to reconstruct the original image by containing of a fraction of the image.
But for the case when n=1 and m=2, we could solve this
problem.
Suppose
and . Then it could be found that is minimized to 0 when ad-bc=0, which means B is not invertible. So we know that the global optimal solution resides in the non-invertible matrix. So we know that the optimal solution can't be found and "good" sub-optimal solution can increase the PSNR as high as we want as long as no computation error happens during the inverse process of B. |
Now the best approach we can do is to find the best-possible sub-optimal
invertible solution. We found the two B below using our test images.
More B could be tested, but this 2 matrixes indeed shows the general
trend quite well.
General B | Special B |
One of the easiest solution is to set
when e is not 0.
Then we can estimate u from l by multiplying l by
Experimentally we know that the energy concentration property of KLT is very good. (Indeed in terms of energy concentration property, KLT is the optimal of all the other transforms.). So even for different images, the noise term takes relatively stable value. |
For specific images we searched a good matrix filter B. Below
is the filter B, which was designed for the first image in Section
4. Here we call this filter as a special filter.
|
Original Image | Extrapolated Image with general B : 17.18 dB | Extrapolated Image with special B : 19.8187dB | Interpolated Image : 20.9868dB |
In preserving the details of textures, this extrapolation method works much better than interpolation, but in preserving the intensity of the general shape, interpolation works better. By designing a special filter, B, the performance gets better. As we explained above, by spending more time in looking for the best filter B, the performance could improve.
Original Image | Extrapolated Image with general B : 15.3313 dB | Extrapolated Image with special B : 14.7275dB | Interpolated Image : 12.7236dB |
In this case, generally extrapolation cases work better than interpolation. That is because this image has a lot of high frequency component, so interpolation loses a lot of original image property. On the other hand, extrapolated image loses the property of original image rather equally through the whole images, so the degradation is spreaded over all frequencies. What is also notable is that in case of special filter, here this performs worse than general filter since the special filter was designed for the first texture above.
Original Image | Extrapolated Image with general B : 29.2755 dB | Extrapolated Image with special B : 32.2265 dB | Interpolated Image : 33.6150 dB |
From here we can see that this extrapolation method work for general
images also. In case of interpolation, the high frequency component of
image is lost, but in case of extrapolation, the jaggy effect appears on
the edges. So it is hard to say what is better than the other.
Considering the small number of data that B and KLT filter has, the optimal way to transmit low rate image is to find a good B for each images, and then transmit only a fraction of image with a good B and KLT filter. As original image grows bigger, the overload of transmitting B and KLT filter becomes negligible.
We carried out 3 different approaches to achieve compression for
texture images. The first approach was an experimentation and extension
to the ideas in [1] and [2].
The second approach was an investigation into the utilization of correlation
inherent among subbands, especially by use of wavelet transform coefficients.
Third approach was, to an extent, the reverse idea from the second - we
try to control the correlation among the data.
We found out that for texture images, non-parametric sampling method proves a good method in its performance of achieving compression. The drawback is the long computation time involved and the dependency of the window size in the implementation.
Contrary to the visual similarities of the wavelet coefficients images, there was lack of information to be can be shared between different pairs of subbands. As a result, trying to find correlation between the highest two subbands from lower subbands relations does not yield fruitful results. A statistical framework in the highest two subbands would be the natural step in further investigating into this approach.
Decorrelating the image pixels by KLT and inducing controlled correlation
is a novel idea that proved to have benefits as well as pitfalls.
Though there were limitations due to lack of mathematical tool to find
and elegant analytical form, this approach is also open to more investigation.