Motion-Compensated Prediction

Motion-compensated prediction (MCP) reduces the number of bits needed after quantization by encoding the prediction error of the current frame rather than the frame itself [3,4]. A reconstructed copy of the previous frame is kept at both the encoder and decoder. The current frame is divided into blocks, usually of 8x8 or 16x16 pixels. For each block, the co-located block in the previous reconstructed frame is shifted one pel at a time, up to a maximum displacement in each direction. At each shift, the displaced reconstructed block is compared with the current block by computing the sum of squared differences (SSD) or the sum of absolute differences (SAD). The shift that gives the smallest value is selected. The corresponding error block (the current block minus the displaced reconstructed block) is saved in an error image the same size as the current frame, and the displacement (dx,dy) is saved for the block. The error blocks are quantized and sent along with the displacement vectors, which are usually differentially coded and then Huffman coded.
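
The block search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [3,4]: frames are assumed to be lists of rows of integer pixel values, the function names (sad, best_match, error_block) and the default block size and search range are hypothetical, and the vector (dx,dy) is taken to mean that the current block at (bx,by) is predicted from the reconstructed block at (bx+dx,by+dy).

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_match(cur, ref, bx, by, n=8, max_disp=7):
    """Full search over integer-pel shifts in [-max_disp, max_disp].
    Returns ((dx, dy), sad) for the candidate with the smallest SAD,
    or None if no candidate lies inside the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Assumption: candidates reaching outside the frame are skipped.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def error_block(cur, ref, bx, by, dx, dy, n=8):
    """Prediction error: current block minus the displaced reference block."""
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]
```

Calling best_match once per block and passing the winning (dx,dy) to error_block yields the two things the encoder sends: the error block (before quantization) and the displacement vector. Replacing abs(...) with a squared difference would give the SSD criterion instead.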

Spending extra bits on displacement vectors is justified only when the bits saved by quantizing the prediction error, rather than the frame itself, clearly exceed the bits needed to encode the vectors. When the quantized error costs about as much as, or more than, the quantized original image, prediction saves nothing and the displacement vectors are pure overhead, so the current frame should be sent to the quantizer instead. This can occur at a scene cut in a TV video sequence, for example.
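
The trade-off in the preceding paragraph amounts to a comparison of bit counts. The sketch below assumes the encoder can measure or estimate three quantities, none of which is defined in the text: the bits to code the frame (or block) directly, the bits to code its quantized prediction error, and the bits to code its displacement vectors. The function name choose_mode and the labels "inter" and "intra" are illustrative.

```python
def choose_mode(intra_bits, residual_bits, vector_bits):
    """Return 'inter' when motion-compensated prediction saves bits overall,
    otherwise 'intra' (send the current frame to the quantizer directly)."""
    inter_bits = residual_bits + vector_bits  # total cost of the predicted version
    return "inter" if inter_bits < intra_bits else "intra"
```

At a scene cut the prediction error is roughly as expensive as the frame itself, so residual_bits is close to intra_bits and the added vector_bits tips the decision toward "intra".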

Motion-compensated prediction can be improved to exploit the temporal statistics of the image sequence by extending the motion estimation search to the same search region in multiple previous frames. A buffer of several reconstructed frames is kept at both the encoder and decoder. Instead of searching only the region of possible displacements in the previous reconstructed frame, the encoder searches that region in every frame in the buffer for the minimum SAD. The displacement vector for each block now includes a time delay that indicates which frame in the buffer holds the block matched by the spatial displacement.
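
A minimal sketch of the multi-frame search follows, under the same assumptions as a single-frame full search: frames are lists of rows of integer pixels and candidates outside a frame are skipped. The names block_sad and best_match_multi are hypothetical, and frame_buffer[0] is assumed to be the most recent reconstructed frame, so the returned delay is simply the buffer index.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block at (bx, by) and the reference block
    displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_match_multi(cur, frame_buffer, bx, by, n=8, max_disp=7):
    """Search the same displacement region in every buffered frame.
    Returns ((dx, dy, delay), sad) for the overall minimum SAD,
    or None if no candidate is valid."""
    best = None
    for delay, ref in enumerate(frame_buffer):
        h, w = len(ref), len(ref[0])
        for dy in range(-max_disp, max_disp + 1):
            for dx in range(-max_disp, max_disp + 1):
                if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                    continue
                cost = block_sad(cur, ref, bx, by, dx, dy, n)
                # Strict '<' keeps the most recent frame when costs tie.
                if best is None or cost < best[1]:
                    best = ((dx, dy, delay), cost)
    return best
```

The search cost grows linearly with the number of buffered frames, since the full displacement region is visited once per frame.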

Just as the size of a block's search region depends on the number of bits allocated to encode the spatial displacement, the size of the frame buffer depends on the number of bits allocated to encode the frame buffer index. Long-term memory prediction should therefore be used only when the bits saved through reduced motion estimation error exceed the bits needed to encode the frame buffer index for each block.
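
As a rough illustration of this condition, the sketch below assumes the frame buffer index is sent as a fixed-length code with no entropy coding, which the text does not specify. The names index_bits and long_term_pays_off are hypothetical, and the residual bit counts are taken as given by the encoder.

```python
def index_bits(num_frames):
    """Bits for a fixed-length frame-buffer index: ceil(log2(num_frames)).
    A single-frame buffer needs no index at all."""
    return (num_frames - 1).bit_length()


def long_term_pays_off(residual_bits_single, residual_bits_multi, num_frames):
    """True when the per-block saving in residual bits from searching the
    whole buffer exceeds the per-block cost of signalling the frame index."""
    saving = residual_bits_single - residual_bits_multi
    return saving > index_bits(num_frames)
```

With a 10-frame buffer, for instance, the index costs 4 bits per block under this assumption, so the multi-frame search is worthwhile only for blocks whose residual shrinks by more than 4 bits.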