Abstract: Elder and Zucker propose a novel image compression scheme that transmits only edge and blur information rather than the entire image. This technique is especially appealing because the number of edge pixels grows much more slowly than the number of image pixels, so larger images should achieve higher compression ratios. While the resulting reconstructions can contain artifacts, the results seem suitable for ultra-low data rate applications.
Image compression is an extremely important component of many modern information systems and a problem worthy of thorough investigation.
Current compression techniques apply lossy and lossless coding over the entire image. Elder proposes a technique to encode and transmit only a small portion of the image--the areas around significant edges in the original. His argument is that in natural images most scenes are relatively smooth between edges, while large discontinuities occur at edge locations. Thus, much of the information between edges is redundant and subject to increased compression. Elder proposes transmitting two quantities: the intensity at edge locations in the image and an estimate of the blur at those locations. Using these quantities, Elder is able to reconstruct an image perceptually similar to the original.
This is an example of a second-generation coding technique, i.e. one that attempts to break an image into primitives such as contours or edges and then reconstructs an approximation to the original image from these primitives. Second-generation techniques are generally believed to become advantageous at compression ratios above about 50, in contrast to waveform- or fractal-based techniques.
In addition to the work proposed by Elder, we also analyze the compression of these significant quantities using run-length and quadruple coding techniques.
Image edges are usually found where there is a sudden change in image intensity. Such a change produces a local minimum or maximum of the first derivative of the intensity; equivalently, the same location has a zero-crossing of the second derivative.
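A minimal 1D Matlab illustration (the signal is contrived, not one of our test images): a step blurred by a Gaussian has a first-derivative extremum and a second-derivative zero-crossing at the edge location.

x   = -10:0.1:10;
u   = 0.5 * (1 + erf(x / (2*sqrt(2))));   % step edge blurred by a Gaussian, sigma = 2
du  = gradient(u, 0.1);                   % first derivative: peaks at x = 0
d2u = gradient(du, 0.1);                  % second derivative: crosses zero at x = 0
[~, imax] = max(du);                      % x(imax) locates the edge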
Unfortunately, real images are corrupted with noise, which can change faster than the underlying image data. An effective edge detector must be able to differentiate between real intensity transitions and sudden (and possibly random) noise transitions. The difficulty of discriminating between these two kinds of transitions is what makes edge detection in digital images a hard problem.
Traditional edge detectors filter noise by using an isotropic Gaussian filter. The standard deviation--usually referred to as sigma--of this filter determines how many neighboring points are considered significant by the filter kernel. The larger the standard deviation, the more weight neighboring points carry in the filtered result, leading to greater smoothing of both noise and image detail. As a result, too small a sigma provides insufficient noise filtration, while too large a sigma smooths out distinct transitions. This creates the problem of finding the "optimal" sigma for a given image (or subsection of an image, an even harder problem).
Elder proposes a multi-scale or multi-resolution approach in which several values of sigma are employed. For example, sharp images with little noise require a narrow filter (one that averages over a small scale), while blurry, noisy images require a broad filter (large scale). For each point in the gradient map, the Elder-Zucker algorithm determines the "minimum reliable scale" (the smallest filter that can be used reliably) and uses that scale locally. After computing the "minimum reliable gradient," it uses that information to compute a second "minimum reliable Laplacian" steered in the gradient direction (the direction of the edge normal). In this way, it only computes a 1D Laplacian locally across edges. Zero-crossings in the "minimum reliable Laplacian" indicate edge locations.
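The following Matlab sketch conveys the flavor of the minimum-reliable-scale gradient computation; it is not the full Elder-Zucker implementation, and the reliability test here uses simple white-noise propagation through the derivative filters times a confidence factor, rather than the exact statistical thresholds of the paper. It assumes a grayscale image im stored as a double array and a sensor noise estimate s_n.

scales  = [0.5 1 2 4 8];              % candidate filter scales (assumed list)
alpha   = 5;                          % confidence multiplier (assumption)
gradmag = zeros(size(im));
minsc   = inf(size(im));              % minimum reliable scale per pixel
for s = scales
    r  = ceil(3*s); [x, y] = meshgrid(-r:r, -r:r);
    g  = exp(-(x.^2 + y.^2) / (2*s^2)); g = g / sum(g(:));
    gx = -x/(s^2) .* g;  gy = -y/(s^2) .* g;    % derivative-of-Gaussian kernels
    dx = conv2(im, gx, 'same');  dy = conv2(im, gy, 'same');
    m  = sqrt(dx.^2 + dy.^2);
    nstd = s_n * sqrt(sum(gx(:).^2));           % noise std of the filtered response
    reliable = (m > alpha*nstd) & isinf(minsc); % reliable here, not yet assigned
    gradmag(reliable) = m(reliable);
    minsc(reliable)   = s;
end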
The thresholds and definition of reliable scale are based on statistical arguments and some a priori knowledge about the image noise characteristics. While the algorithm is much less sensitive to finding the "correct" noise parameters, it can still be affected by incorrect values.
For a detailed discussion of the implementation, see Anat Caspi's discussion.
The rationale behind this correction is straightforward: d is an estimate of the sharpness of the edge, but we cannot estimate it more finely than the minimum reliable scale, so we correct for that scale in the blur estimate. Because of the uncertainty introduced by the minimum reliable scale, we err on the side of less blurring, blurring only those pixels that we are sure need blur.
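The expression itself is not reproduced above. One form consistent with Elder and Zucker's derivation--assuming d measures the width of the edge as the distance between the extrema of the steered second-derivative response, and s2 is the minimum reliable scale at that pixel--is:

d  = 4.0;  s2 = 1.0;                        % example values, for illustration only
sigma_b = sqrt(max((d/2)^2 - s2^2, 0));     % clamp at zero: err toward less blur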
The reconstruction algorithm requires intensity values on either side of a detected edge and a blur estimate along the edge. We propose two schemes to compress this data.
Scheme #1: Run-length encode each map, followed by entropy coding. The code to do this is available in compress.m, and an improved scheme is available in cfncompress.m. (A simplified sketch of both schemes appears after Scheme #2.)
Scheme #2: Encode all relevant information as an ordered quadruple: (Row, Column, Intensity, Blur). Information can be grouped by row or column to reduce the size of each tuple.
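Below is a simplified sketch of both schemes; it is not the code in compress.m, cfncompress.m, or compress2.m, and it assumes sparse maps intensitymap and blurmap whose zero entries mark pixels that carry no edge data.

% Scheme #1 (sketch): run-length encode runs of zeros as (0, runlength)
% pairs, leaving nonzero values as literal symbols.
v = intensitymap(:)';  symbols = [];
i = 1;
while i <= numel(v)
    if v(i) == 0
        run = find(v(i:end) ~= 0, 1, 'first');
        if isempty(run), run = numel(v) - i + 1; else run = run - 1; end
        symbols = [symbols, 0, run];
        i = i + run;
    else
        symbols = [symbols, v(i)];
        i = i + 1;
    end
end

% Scheme #2 (sketch): one ordered quadruple (row, column, intensity, blur)
% per nonzero pixel.
[r, c] = find(intensitymap ~= 0);
idx    = sub2ind(size(intensitymap), r, c);
quads  = [r, c, intensitymap(idx), blurmap(idx)];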
The first step in reconstruction is to interpolate across all the edge information obtained from the original picture, in order to fill the gaps between edges. The same technique is also used to turn the estimated edge blur into a smoothly interpolated blur map, which is later used to soften the overly sharp edges of the reconstructed image.
The interpolation technique used is based on the solution of a diffusive partial differential equation, namely the Laplace equation ∇²u = 0.
This can be interpreted as the steady-state solution of the standard diffusive heat equation ∂u/∂t = c ∇²u.
As the change with time goes to zero, the solution of this diffusive equation becomes a surface that minimizes the "energy" over the entire image, producing a smooth interpolation.
Here is an example of such a smooth solution, given contrived boundary conditions that are non-smooth. The image on the left represents the boundary points, with the zero-valued points being the variables to solve for. The image on the right is the reconstructed solution that smoothly interpolates between the center spike, and the values along the border.
In order to solve the Laplace equation numerically, it must first be discretized. The discrete problem can then be written as the matrix equation Au = b, where A is a discretized version of the Laplacian operator and the right-hand side b contains zeros everywhere except at the "boundary data." Solving this equation for u reconstructs the image.
To do this, consider the reconstructed image as a grid, with each pixel a gridpoint. The "boundary values" in the reconstruction equation are the known edge values at those pixels. Using an example 3x3 grid, we can relabel the points in a linear fashion, u1 through u9.
The four-point Laplacian differential operator is defined as ∇²u ≈ (u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}) / h², which, setting the grid spacing h = 1 and using the linear relabeling, can be rewritten as one linear equation per grid point: four times the point's value equals the sum of its four neighbors.
So, if we assume a single "boundary" value of 1 at the center pixel of the 3x3 grid, and use the four-point Laplacian differential operator, we can construct the matrix equation Au = b that must be solved for this simple diffusion problem.
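A minimal Matlab sketch of this construction and its direct solution follows; this is not the code in Laplacian_mat.m, lap_bound_mat.m, or pde_solve.m, and the variable names and masks are our own. Pixels outside the grid are implicitly treated as zero-valued boundary data rather than with the reflective conditions discussed below.

m = 3; n = 3;                                  % grid size
known = false(m, n); known(2, 2) = true;       % a single "boundary" pixel...
vals  = zeros(m, n); vals(2, 2)  = 1;          % ...with value 1
N = m * n;  e = ones(N, 1);
A = spdiags([e e -4*e e e], [-m -1 0 1 m], N, N);   % each pixel couples to its four neighbors
for k = m:m:N-1                                % remove wrap-around couplings
    A(k, k+1) = 0; A(k+1, k) = 0;              % between adjacent grid columns
end
b = zeros(N, 1);
fixed = find(known(:));
A(fixed, :) = 0;                               % rows of known pixels become u = value
A(sub2ind([N N], fixed, fixed)) = 1;
b(fixed) = vals(fixed);
u = reshape(A \ b, m, n);                      % direct sparse solve of Au = b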
In order to solve this problem numerically, a number of techniques can be used. As can be seen from the equation above, the solution involves solving a large, sparse matrix equation. Basic Gauss-Jordan elimination is possible, but because of the large matrices, this is prohibitively expensive for anything but small images. Other techniques include iterative methods, fast Fourier transform techniques, and multigrid methods.
Multigrid methods have the advantage of running in O(n) time and space, and therefore are probably the recommended method for any practical implementation. Multigrid methods work by solving the Laplace equation on a coarse grid, then using smoothing and interpolation operations to transfer this solution to finer grids.
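For illustration, a plain Jacobi sweep (one of the iterative methods mentioned above, not a multigrid solver) for the same interpolation problem might look like the following; u0 and known are assumed to hold the known edge values and a logical mask of those pixels, and the replicated border is only a crude stand-in for the reflective boundary conditions discussed later.

u = u0;
for it = 1:5000
    % average of the four neighbors, replicating the border rows/columns
    up    = u([1 1:end-1], :);  down  = u([2:end end], :);
    left  = u(:, [1 1:end-1]);  right = u(:, [2:end end]);
    unew  = (up + down + left + right) / 4;
    unew(known) = u0(known);                  % re-impose the known edge data
    if max(abs(unew(:) - u(:))) < 1e-4, u = unew; break; end
    u = unew;
end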
The final point about this method of reconstruction is that there are several design choices to be made in what data to use in the reconstruction. The method described above uses a single set of boundary conditions, namely the image values at the edge points. The paper by Elder and Zucker describes a slightly more sophisticated method that uses a gradient approximation to solve for two different edge values, the "light" and the "dark." In general, this should give somewhat better approximations in highly discontinuous edge areas, at the cost of more data to compress. A possible advantage is that fewer edges might be needed to reconstruct an accurate image, resulting in an overall win in compression.
Also, points on the border of the image need to be solved using "reflective" boundary conditions, which require a more specialized solver than the general method described above. This can be easily implemented in a multigrid solver, but as a compromise for the general case, the border pixels of the picture can be treated as boundary data in the edge map.
Sample Matlab code that implements a direct Laplacian solver is available in test_solve.m, pde_solve.m, Laplacian_mat.m, and lap_bound_mat.m.
Reconstructing from the intensity values alone yields images with uniformly sharp edges. While these images carry almost all of the perceptually relevant information, sharp edges are only correct if the original image had sharp edges; if it contained blurry edges, we need to blur them a bit. The information needed to perform this reblurring is contained in the blur map, which is encoded along with the intensity information and reconstructed with the same pde solution.
To reblur, for each pixel we create a 2D Gaussian filter with a standard deviation equal to the value given in the reconstructed blur map, and filter the intensity image at that pixel with that Gaussian kernel.
In practice, we found it easier to quantize the blur to the nearest 1/2 pixel and then perform many passes of global filtering, one kernel per quantization level, keeping only the relevant values from each pass. The code for this is available in reblur.m.
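A rough sketch of this quantized reblurring follows; it is not the code in reblur.m, and the quantization step and Gaussian construction are our own assumptions. It expects the reconstructed intensity image recon and blur map blurmap as doubles.

q   = round(2 * blurmap) / 2;                  % quantize blur to the nearest 0.5 pixel
out = recon;
for s = unique(q(q > 0))'                      % one global filtering pass per blur level
    r = ceil(3*s); [x, y] = meshgrid(-r:r, -r:r);
    g = exp(-(x.^2 + y.^2) / (2*s^2)); g = g / sum(g(:));
    blurred = conv2(recon, g, 'same');
    out(q == s) = blurred(q == s);             % keep only the pixels assigned this level
end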
We have implemented a basic version of the Elder Zucker Edge Detector. The code for this implementation is available here. Various results are shown below. The first is the "doll" figure from Elder's paper[3], which may be available from his Publications page:
These edge maps range from poor to acceptable. Unfortunately, none are as good as the edge maps shown in Elder's paper. This may indicate a problem with our implementation, or improper estimation of the image noise characteristics. Even so, as the figures show, a good estimate of the "sensor noise" is required for proper results.
Famous Terrorist, sigma = 1 | 42nd President of USA, sigma = 1
Not all intensity information picked out is relevant. In the peppers image, many spurious edges are detected, while, as we will see later, not all the relevant edges are picked out of the long narrow pepper in the foreground. This will lead to unnaturally smooth images in the reconstruction.
The edge map of Bin Laden is very good; most of the important details are captured. Not surprisingly, this leads to the best reconstruction results. Clinton's edge map is quite good, but unfortunately part of his chin is missing, as well as part of his eyebrow. This will lead to problems later as well.
Once we have the edge map, we need to transmit intensity values on both sides of each edge. Our simple implementation transmits all intensity values within one pixel of a detected edge. The code is available in getcompressedimage.m, and examples are shown below:
This simple implementation may be redundant. For every edgel, we may send 4 intensity values instead of 2. However, for the purposes of this project, we err on the side of greater information and less compression.
original image | sigma = 1 edge map | reconstructed blur map | reconstructed intensity values only | reblurred result with good blur | reblurred result with unmodified blur
In the images above, we are astounded by how well the reconstructor does with the limited data provided to it. Indeed, at first glance the images are very similar to the originals.
First, let's examine the reconstructed intensity values only. These images are very good, considering the low data rate. The artifacts remaining in Bin Laden's image are mostly visible in his forehead. Careful examination reveals that the values, though close to the original, are clearly filled in by the smooth pde solution, which removes much of the texture from the forehead. The edge map shows why: no edges were detected in his forehead, and thus no intensity values were transmitted. The same is true for his head scarf.
Clinton's image is more distorted in the intensity reconstruction, especially in his chin and cheek regions. Here the edge detector failed to detect any edges, so we have a large region where the pde solution must smoothly fill in intensity values. This leads to the smooth, "baby face" appearance. While this may be good cosmetically (we think it takes years off of the president's face), it is not so great for distortion metrics!
Reblurred images are also presented. At the far right, we see the results when the blur map is applied as is. These results are of very poor quality, indicating that the blur estimates are in general too high. We therefore manually modified the blur maps by taking the square root and dividing the result by a constant. This effectively reduced the mean blur by quite a bit while retaining the relative scaling of the blur information. The results produced with these modified maps are more pleasing, but we are not sure they are qualitatively better than the intensity-only reconstruction.
Reconstruction results for the "doll" image.
This image best illustrates the need for the blur information. The scene has a varying degree of blur: the doll and the near parts of the shadow are sharp, while the far parts of the shadow are blurry. In the intensity reconstruction, the blurry parts of the shadow are sharpened by the pde solution. Here the blur information becomes very important, because we must know how much to reblur the reconstructed image.
This image also contains many artifacts of the pde reconstruction. Near the shadow's legs, you see some bright spots--they are discontinuous artifacts of the pde solution. Notice how these spots correlate highly with spurious edgels in the edge map presented above. Clearly, the edge detector must be improved to raise the quality of the reconstructed images!
original image
However, Elder's results look much better than those of our implementation. This is probably due to the better quality of his edge detector. We would greatly like to know the differences between the two implementations in order to understand how to improve our own.
There are two possibilities for encoding the original image: for the purposes of our limited reconstructor, we must send the intensity values along the border of the image, which clearly increases the number of bits we must send. However, a true multigrid pde solver should be able to reflect the solution at the boundaries, eliminating the need to send the border information and ultimately increasing the compression ratio. For comparison, we include results both with and without the border pixels.
Using scheme #1 on the "mann picture," run-length encoding brings the original 512x768 image (393,216 pixels, or about 3.1 Mbits at 8 bits/pixel) down to 50,272 run-length encoded pixels (with border pixels included) or 49,068 (without). The entropy of the symbols is 6.47 bits/pixel (with border) or 6.36 bits/pixel (without), yielding a total bit allocation of roughly 325,000 bits (with border) or 312,000 bits (without) and compressing the intensity map to roughly 10% of the original image size. As the border information makes only a slight difference (roughly 4% fewer bits without the border pixels in this implementation), we omit the without-border calculation from here on for the sake of brevity.
Performing the same analysis on the blur map, run-length encoding brings it down to 23,982 pixels. The entropy of the symbols is 4.6 bits/pixel, yielding a total bit allocation of 110,280 bits and a compression percentage of roughly 3% for the blur map alone. The blur map, as implemented, does not include the image border, because blur is only defined at edge points.
The total bit allocation for the two separately encoded maps is then 435,280 bits, a compression percentage of roughly 14%.
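These figures are simple products of symbol counts and entropies; the snippet below merely restates the arithmetic from the text.

pixels_total   = 512 * 768;                  % 393,216 pixels, ~3.1 Mbits at 8 bits/pixel
bits_intensity = 50272 * 6.47;               % ~325,000 bits for the intensity map
bits_blur      = 23982 * 4.6;                % ~110,000 bits for the blur map
ratio = (bits_intensity + bits_blur) / (pixels_total * 8);   % ~0.14, i.e. ~14%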
Code to perform run length encoding and entropy calculation is available in compress.m.
image | image size in bits (uncompressed) | # of run-length encoded pixels | entropy of pixels (bits/pixel) | total bits (compressed) | compression percentage
| | |||||
| | |||||
| | |||||
| |
As we conjectured above, the compression ratio does increase with image size. Unfortunately, the larger images are also very expensive to reconstruct: mann.pgm takes almost an hour on the Sun workstations in Sweet Hall using the current Matlab code.
Next, for one particular intensity map, we quantize the intensity values in incremental powers of two. The results of reconstructing intensity values from these quantized intensity edge maps are shown below.
These results are not as good as we had hoped. So, being intrepid daredevils, we tried to come up with a better compression scheme the night before the report was due. In this second scheme, we compress the edge maps by encoding each run of zeros as a single negative number (whereas before we encoded a run of zeros as a zero followed by a positive number giving the length of the run). It is available here.
We plotted the rate-distortion curve for laden.pgm. One should note that this image had one of the lowest compression ratios of all the images tested; presumably other, larger images would perform better. Unfortunately, computation on the larger images is very expensive using the simple solver and could not be completed in time for this report.
The Elder-Zucker curve falls significantly short of jpg performance for this image, while slightly beating the entropy-constrained VLC results from homework 2. However, MSE distortion does not capture all of the effects on the human perceptual system; Elder-Zucker may be of comparable perceptual quality at similar data rates because it lacks blocking artifacts.
Using scheme #2 on peppers (512x512), we created ordered quadruples using the function compress2.m. For this image, the edge map produced 59,936 ordered quadruples (row, column, intensity, blur). These values had an entropy of 7.8 bits per value, or 31.2 bits per quadruple. When encoded independently, they yield a total bit allocation of 1.9 Mbits for the entire image--not a great savings.
Clearly, the reconstruction results are heavily dependent on the quality of the edge detector. While Elder's method attempts to detect edges automatically on the fly, it is very sensitive to the estimate of sensor noise. This sensitivity can drastically affect the quality of the reconstructed images: if too many edges are detected, the compression will be poor; if too few edges are detected, the reconstruction will be of very poor quality, missing much of the detail of the original image.
However, the results seem to be of higher perceptual quality than jpg images at low data rates, even though MSE measures may not capture that fact entirely. This method may therefore be better suited than jpg for ultra-low data rate applications.
Our compression calculations are rough, back-of-the-envelope calculations. More effort should be placed on maximizing the compressibility of these edge map images. By doing so, we may increase the quality of images at low data rates, thus improving the rate-distortion performance of this compression scheme. One technique we could have implemented is an ECSQ or ECVQ on the edge map quantization; undoubtedly, this would produce additional compression. An additional possibility would be to jpg encode the edge maps and reconstruct from the jpg-decoded edge maps.
The edge detector can be improved--at least to the quality presented in Elder's paper. In addition, steps should be taken toward a more automated algorithm (a hard problem indeed!).
[1] Caspi, A. "Local Scale Control for Edge Detection and Blur Estimation." http://robotics.stanford.edu/~acaspi/cs223b, 1999.
[2] Elder, J.H. and S.W. Zucker. "Local Scale Control for Edge Detection and Blur Estimation." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 7, July 1998.
[3] Elder, J.H. "Are Edges Incomplete?" International Journal of Computer Vision, Nov. 1999.
[4] Marr, D. Vision. San Francisco: W.H. Freeman and Company, 1995. Chapter 2.
[5] Marr, D. and E. Hildreth. "Theory of Edge Detection." Proc. R. Soc. Lond. B 207, 1980, p. 197.
[6] Olding, B. "Implementing the Marr-Hildreth Algorithm." http://www.stanford.edu/~ben/cs223b/final.htm, 1999.
[7] Trucco, E. and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, 1998. Chapters 3-4.
[8] Kunt, M., A. Ikonomopoulos, and M. Kocher. "Second-Generation Image Coding Techniques." Proceedings of the IEEE, Vol. 73, No. 4, pp. 549-574, April 1985.
[9] Demmel, J.W. Applied Numerical Linear Algebra. SIAM Press, 1997.
[10] Asmar, N. Partial Differential Equations and Boundary Value Problems. Prentice Hall, 2000.