Rate-constrained Conditional Replenishment Video Coding with Adaptive Change Detection

 

Xinqiao Liu

December 1, 2000

Abstract

For conditional replenishment video coding algorithms, typically a change detection mask is transmitted as side information in addition to the signal within the changed areas. The change detector should therefore achieve the best performance without exceeding a certain bit-rate available for encoding the change detection mask and the signal. To find the optimal change detection algorithm, a Context-based Arithmetic Encoder (CAE) is first implemented to transmit the binary change detection mask. The CAE is originally used for shape coding in the object-based MPEG-4 standard. Based on this compression scheme, two direct change detection algorithms are implemented and their performances are analyzed using rate-distortion theory. Further, we propose an adaptive change detection algorithm that is based on the inherent noise characteristics of digital imaging system. We demonstrate how the change detection threshold is locally adapted to each pixel’s luminance value. For the rate-constrained conditional replenishment compression, simulation proves that our adaptive change detection algorithm achieves the best performance among all the three algorithms.

I. Introduction

Conditional replenishment (see, e.g. [1] [2] [3]) has been proposed as a technique for taking advantage of the similarity between successive frames in video-telephony or video conferencing where cameras typically are stationary and scenes usually change slowly. The rudimentary algorithm behind conditional replenishment is to transmit the pixel intensity value plus an address for each picture element that is changed by more than a certain threshold since the previous displayed frame. A binary change detection mask is used to represent the address information and transmitted separately. Conditional replenishment is significantly simpler than other video compression methods in terms of computational complexity, while is still able to achieve respectful compression ratio for certain video applications. Experiments with desktop video without motion compensation reveal that on average less than 3% of the pixels need to be replenished in most head-and-shoulders scenes with CIF/Q-NTSC frame sizes [5].

There have been some studies on conditional replenishment video coding (see e.g. [1] [2]). Most of the research, however, concentrate mainly on the image quality. The correlation between transmission bit-rate and the choice of change detection schemes still need to be explored. Recently, in [6], a perception-based change detection method in conditional replenishment video coding was presented. Based on the contrast sensitivity of human visual system, the authors suggest that only the pixel whose intensity differences relative to the stimulus background exceed the visibility threshold should be encoded. Therefore, the proposed method reduces the perceptual redundancy in addition to the spatial and temporal redundancy. However, the paper only studied the case of distortionless coding, and the threshold values used in their simulation are arbitrary and lack of theoretical support.

Figure1 shows the block diagram of conditional replenishment video coding method [4].

Figure 1 Block diagram of conditional replenishment.

Typically, the information in the changed area is decomposed into a change detection mask and the signal within the changed area. They are encoded and transmitted separately through the transmission channel under certain bit-rate constrain. In this study, we assume that the encoders for both paths are lossless. A Context-based Arithmetic Encoder (CAE) is used to encoding the binary change detection mask. The CAE method is originally used for shape coding in the object-based MPEG-4 standard. Here we don’t specify the compression scheme used for encoding the signal within the changed area. However, it is reasonable to assume that the bit-rate required for encoding the signal (R2) is proportional to the bit-rate required for encoding the change mask (R1) since they are all proportional to the number of pixels need to be replenished. Based on this assumption, we then are able to look at the trade-off between different change detection schemes and the bit rate required using rate-distortion theory. Figure 2 shows the model we will be using through this study.

Figure 2 Conditional replenishment model used in this study

Given limited channel bandwidth, there are several ways to control the bit rate in the change detector: i) by subsampling the change detection mask and thus reducing the number of pixels within the changed area; ii) by adjusting the detection threshold for the whole frame; iii) by using adaptive threshold based on the noise characteristics, which eliminates those pixels that have changed due to noise rather than the input. As the number of changed pixels decreases, however, the distortion of the reconstructed frame increases. In this report, we simulate the rate-distortion relationship of the above three methods. The unconstrained Lagrangian cost function is then introduced in order to find the optimum detection parameters for each method. And finally, we demonstrate that the proposed noise adaptive threshold method generate the best image quality given the same channel bandwidth among the three methods.

This report is organized as follows: in section II, we describe the context-based arithmetic encoding (CAE) scheme used in encoding the change detection mask; in section III, we present the rate-distortion result from both the subsampling change detection method and threshold-adjusting change detection method; in section III, we describe the noise characteristics in a typical digital imaging system and propose an adaptive change detection algorithm. The performance of these three algorithms is compared in section IV.

II. Context-based Arithmetic Encoder (CAE)

Context-based arithmetic encoding [8] is one of the binary bitmap-based shape coding scheme used in the object-based MPEG-4 standard [7]. Here we only consider the intraframe mode compression.

The binary change detection mask is first divided into 16x16 macroblocks. Within each macroblock, the coder exploits the spatial redundancy of the binary shape information to be coded. Pixels are coded in scan-line order and row-by-row. Three different types of the macroblocks are defined: "black" block in which none of the pixel has changed (all pixels are 0); "white" block whose pixels are all changed and need to be replenished (all pixels are 1); and those boundary macroblock which contains both changed and unchanged pixels. For black and white macroblocks, encoder only need to signal the macroblock type and the number of bits it used is negligible. For those boundary macroblocks, a template of 10 pixels is used to define the causal context for predicting the binary value of the current pixel (S0). The template is shown in Figure 2.

Figure 3 Template for defining the context of the pixel to be coded

The template extends up to 2 pixels to the left, to the right and to the top of the pixel (S0) to be coded. For encoding the pixels in the two top and left rows of a macroblock, parts of the template are defined by the shape information of the already transmitted macroblocks on the top and left side of the current macroblock. For the two right-most columns, each undefined pixel of the context is set to the value of its closest neighbor inside the macroblock.

For encoding the state transition, a context-based arithmetic encoder is used. For this project, however, the arithmetic encoder is not implemented. Instead, the theoretical conditional entropy based this template is calculated and this entropy is approximated as the bit-rate required for encoding the change mask. In order to calculate the conditional entropy, conditional probability table with size of 1024 contexts is first derived from a bigger training sequence. The conditional entropy is given by:

 

Same as predictive coding, the context-based conditional entropy coding greatly reduces the statistical dependencies between adjacent pixels, since:

III. Direct change detection methods for rate-constrained conditional replenishment

A standard change detector is shown in the following figure:

Figure 4 Change detector block diagram

Suppose the previous frame is represented by matrix A1, the current frame is represented by matrix A2, after change detection, the change mask is given by matrix C, which is binary with 0 represents unchanged pixels and 1 represent changed pixels. The signal in the changed area is thus given by  and the reconstructed frame is:

The mean-square distortion is defined as:

Assume the bit-rate to transmit C is R1 where R1=H(C), and the bit-rate to transmit the change signal is R2=H(A2C). Since both R1 and R2 are proportional to the number of changed pixels, it is reasonable to assume that  even though we don’t specify the compression scheme used to encoding A2C. Thus the total bit-rate R is:

where k is the ratio between R2 and R1.  Based on this assumption, we can just study the rate-distortion between D and R1 instead.

The Lagrangian cost function given Lagrange multiplier l is defined as

which is under the constrain that:

The test video sequence used in this study is captured by a stationary high-speed digital camera with a person moving cross the screen. Following is two consecutive sample frames and an example of the detected change mask:

Figure 5 Example of two continuous frames and the change mask.

There are two direct ways to reduce the number of detected pixels in the change detection: subsampling and threshold adjusting.

A.   Subsampling change detection

To reduce the bit-rate as well as allow lossy compression, the macroblock can be subsampled by a factor of 2, 4 or 8, resulting in a subblock of size 8x8, 4x4 or 2x2, respectively. The subblock is encoded using the CAE encoder as described above. The signal in the changed area is subsampled as well by the same ratio. The encoder transmits to the decoder the subsampling factor such that the decoder decodes the change mask and the signal and then upsamples them. The upsampling filter used in this study is a simple pixel replication filter combined with a 3x3 median filter.

Following figure shows the change mask generated at different subsampling ratio:

Figure 6 Change mask generated at different subsampling ratio. (a) 1:1, (b) 1:2, (c) 1:4, (d) 1:8

Following figures show that as the subsampling ratio increases, the bit-rate decreases while the distortion increases:

MATLAB Handle GraphicsMATLAB Handle Graphics

The rate distortion curve and the Lagrangian cost function at different Lagrange multiplier l are plotted as follows:

 

MATLAB Handle GraphicsMATLAB Handle Graphics

B.   Threshold-adjusting change detection

Another method to control the bit-rate is by adjusting the change detector threshold. As the threshold increased, few pixels will be detected. Following figure shows the change detection mask under different threshold values.

MATLAB Handle Graphics

Following figures show that as detection threshold increases, the bit-rate decreases while the distortion increases.

MATLAB Handle GraphicsMATLAB Handle Graphics

The rate distortion curve and the Lagrangian cost function at different Lagrange multiplier l are plotted as follows:

MATLAB Handle GraphicsMATLAB Handle Graphics

IV. Adaptive rate constrained change detection

A fundamental problem in designing an optimum change detector is how to separate pixels whose change is due to noise from pixels whose value change is due to real input signal change. The two algorithms described above do not provide a good solution. For example, in the threshold-adjusting scheme, even though increasing threshold reduces the bit rate, pixels whose input have relative small change will be neglected while pixels with large noise variance will still be detected and transmitted. This will increase the distortion of the reconstructed frame and degrade the compression performance. Therefore, a noise adaptive change detection algorithm is necessary.

A.   Noise characteristic of signals in digital imaging system

Today, most of images and videos are captured by either CCD or CMOS image sensors[9]. During frame exposure time, photo-detector inside each pixel converts incident light into photocurrent. The photocurrent is integrated onto a capacitor and the charge Qi,j (or voltage) is read out at the end of exposure time (here i and j are the pixel index). In this process, additive noise corrupts the output signal charge. The noise can be expressed as the sum of two independent components, (i) shot noise Ui,j which is zero mean signal dependent gaussian distribution with:

 

where q is the electron charge and (ii) readout circuit and reset noise Vi,j (including quantization noise) with zero mean and variance dV2. Thus the total noise variance of pixel (i,j) is:

The above equation implies that the noise is signal dependent, especially when the shot noise is larger than the readout noise. The stronger the luminance level, the noisier the pixel will be. Therefore, to separate pixels whose change is due to noise from pixels whose value change is due to real input signal change, the detection threshold should be locally adaptive.

B.   Noise adaptive change detection algorithm

In the proposed adaptive detection algorithm, we first calculate the local average value by averaging signals over a small area with size 8x8. The threshold Ti,j is then set to be multiples of the standard deviation of the noise of local average signals, i.e.,

where m is the sensitivity factor that is set globally over the whole frame. Note that by changing m, we effectively adjusting the detection sensitivity while the threshold is still locally adapted. Following figure shows the relation between input signal value and the adapted threshold value:

MATLAB Handle Graphics

Following figure shows the threshold varying across the whole frame:

MATLAB Handle Graphics

Using this adaptive detection algorithm, we can analysis the rate-distortion relation by varying the sensitivity factor m. Following figure shows the change detection mask under different m values.

MATLAB Handle Graphics

Following figures show that as detection sensitivity factor m increases, the bit-rate decreases while the distortion increases.

MATLAB Handle GraphicsMATLAB Handle Graphics

The rate distortion curve and the Lagrangian cost function at different Lagrange multiplier l are plotted as follows:

MATLAB Handle GraphicsMATLAB Handle Graphics

 

V. Comparison among different detection methods

In this section we compare the performance of the three detection methods presented in this report. First we define the frame PSNR as:

Following figure shows the comparison between subsampling change detection and threshold-adjusting change detection algorithm. It can be seen that given the same bit-rate, the threshold-adjusting achieves much higher PSNR than the subsampling method. On the other hand, the subsampling method can achieve lower bit-rate than the threshold-adjusting method.

MATLAB Handle Graphics

Following figure shows the comparison between the adaptive detection method and the threshold-adjusting method. The adaptive algorithm achieves slightly higher PSNR. The gain of the adaptive detection algorithm should be higher when evaluated using different criteria. The reason is that our definition of the distortion is referenced to the current frame A2, which include some noisy pixels in the unchanged region. Since our adaptive detection algorithm eliminates those pixels, the still background should looks smother from frame to frame when using conditional replenishment than from the original video sequence.

MATLAB Handle Graphics

 

VI. Conclusion

This project studied the optimum change detection algorithm in rate-constrained conditional replenishment video coding scheme. Based on the inherent noise characteristics of digital imaging system, we presented an adaptive change detection algorithm that can efficiently separate pixels whose change is due to noise from pixels whose value change is due to real input signal change. Two other change detection algorithms were also implemented together with a context-based arithmetic encoder (CAE) that is originally used for shape coding in MPEG-4 standard. Simulation proves that our adaptive change detection algorithm achieves the best PSNR among all the three algorithms. 

REFERENCES

[1] B. Haskell, "Frame replenishment coding of television," in Image Transmission Techniques, W.K. Pratt, Ed. New York: Academic, 1979.

[2] B. Haskell, F.W. Mounts and J.C. Candy, "Interframe coding of videotelephone pictures," Proc. Of IEEE, Vol. 60, pp. 792-800, July 1972.

[3] A.N. Netravali and B.G. Haskell, Digital pictures: representation, compression and standards, 2nd ed. Holmdel, NJ, 1995

[4] B. Girod, Image and video compression, EE368b class note, Fall 2000-2001.

[5] Y.J. Chiu and T. Berger, "Video compression using fax techniques," in Proc. 1996 IEEE Data Compression Conf., Apr. 1-3, 1996.

[6] Y.J. Chiu and T. Berger, "A software-only videocodec using pixelwise conditional differential replenishment and perceptual enhancements," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 3, pp. 438-450, April 1999.

[7] A.K. Katsaggelos, L.P. Kondi, F.W. Meier, J. Ostermann, and G.M. Schuster, "MPEG-4 and rate-distortion-based shape-coding techniques", Proc. of the IEEE, Volume: 86 6, pp. 1126-1154, June 1998.

[8] N. Brady, F. Bossen, N. Murphy, “Context-based arithmetic encoding of 2D shape sequences,” ICIP97, Santa Barbara, CA 1997

[9] A. J.P. Theuwissen, Solid-State Imaging with Charge-Coupled Devices, Kluwer Academic, 1995