What is compression?
Compression is a reversible conversion (encoding) of data into a representation that contains fewer bits, allowing more efficient storage and transmission of the data. The inverse process is called decompression (decoding). Software and hardware that perform encoding and decoding are called encoders and decoders, respectively. Both combined form a codec, which should not be confused with the terms data container or compression algorithm.
Lossless compression allows a 100% recovery of the original data. It is usually used for text or executable files, where a loss of information would be a major damage. These compression algorithms often use statistical information to reduce redundancies. Huffman coding [1] and Run Length Encoding [2] are two popular examples, allowing high compression ratios depending on the data.

Lossy compression does not allow an exact recovery of the original data. Nevertheless, it can be used for data that is not very sensitive to losses and that contains a lot of redundancy, such as images, video or sound. Lossy compression allows higher compression ratios than lossless compression.

Why is video compression used?
A simple calculation shows that an uncompressed video produces an enormous amount of data: a resolution of 720x576 pixels (PAL), with a refresh rate of 25 fps and 8-bit colour depth, would require the following bandwidth:

720 x 576 x 25 x 8 + 2 x (360 x 576 x 25 x 8) = 165,888,000 b/s ≈ 166 Mb/s (luminance + chrominance)

For High Definition Television (HDTV):

1920 x 1080 x 60 x 8 + 2 x (960 x 1080 x 60 x 8) = 1,990,656,000 b/s ≈ 1.99 Gb/s

Even with powerful computer systems (storage, processor power, network bandwidth), such data amounts cause extremely high demands for managing the data. Fortunately, digital video contains a great deal of redundancy and is therefore well suited for compression, which can reduce these problems significantly. Lossy compression techniques in particular deliver high compression ratios for video data. However, one must keep in mind that there is always a trade-off between data size (and therefore computational time) and quality: the higher the compression ratio, the smaller the size and the lower the quality. The encoding and decoding process itself also needs computational resources, which have to be taken into consideration. It makes no sense, for example, for a real-time application with low bandwidth requirements to compress the video with a computationally expensive algorithm that takes too long to encode and decode the data.
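The arithmetic above is easy to reproduce. The following minimal Python sketch computes the uncompressed bitrate; the function name is illustrative, and the chroma layout (two planes at half horizontal resolution) matches the figures above:

def bitrate_bps(width, height, fps, bit_depth=8):
    """Luminance at full resolution plus two chroma planes at half width."""
    luma = width * height * fps * bit_depth
    chroma = 2 * (width // 2) * height * fps * bit_depth
    return luma + chroma

print(f"PAL:  {bitrate_bps(720, 576, 25) / 1e6:.1f} Mb/s")    # ~165.9 Mb/s
print(f"HDTV: {bitrate_bps(1920, 1080, 60) / 1e9:.2f} Gb/s")  # ~1.99 Gb/s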
The MPEG Compression
The MPEG compression algorithm encodes the data in five steps [6], [8]: first, a reduction of the resolution is done, followed by motion compensation in order to reduce temporal redundancy. The next steps are the Discrete Cosine Transform (DCT) and quantization, as used in JPEG compression; this reduces the spatial redundancy (referring to human visual perception). The final step is entropy coding, using Run Length Encoding and the Huffman coding algorithm.

Step 1: Reduction of the Resolution
The human eye has a lower sensitivity to colour information than to dark-bright contrasts. A conversion from the RGB colour space into YUV colour components helps to exploit this effect for compression. The chrominance components U and V can be reduced (subsampled) to half of the pixels in the horizontal direction (4:2:2), or to half of the pixels in both the horizontal and vertical directions (4:2:0).
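As an illustration of Step 1, here is a minimal numpy sketch, assuming an RGB frame stored as an H x W x 3 float array with even dimensions. The BT.601 conversion coefficients are standard; the rgb_to_yuv420 helper and the averaging-based subsampling are illustrative choices, not the normative MPEG procedure:

import numpy as np

def rgb_to_yuv420(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b       # luminance
    u = -0.147 * r - 0.289 * g + 0.436 * b      # chrominance
    v = 0.615 * r - 0.515 * g - 0.100 * b
    # 4:2:0 -- keep one chroma sample per 2x2 block of pixels
    u420 = u.reshape(u.shape[0] // 2, 2, u.shape[1] // 2, 2).mean(axis=(1, 3))
    v420 = v.reshape(v.shape[0] // 2, 2, v.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, u420, v420

The luminance plane Y keeps its full resolution, while each chroma plane shrinks to a quarter of its original size, cutting the raw data roughly in half before any further coding.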
Step 2: Motion Estimation
An MPEG video can be understood as a sequence of frames. Because two successive frames of a video sequence often differ only slightly (except at scene changes), the MPEG standard offers a way of reducing this temporal redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional). The I-frames are "key frames", which have no reference to other frames, so their compression is not that high. The P-frames can be predicted from an earlier I-frame or P-frame. P-frames cannot be reconstructed without their reference frame, but they need less space than the I-frames, because only the differences are stored. The B-frames are a bidirectional version of the P-frame, referring to both directions (one forward frame and one backward frame). B-frames cannot be referenced by other P- or B-frames, because they are interpolated from forward and backward frames. P-frames and B-frames are called inter-coded frames, whereas I-frames are known as intra-coded frames.
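The differences stored in P- and B-frames are found by block matching. A minimal sketch, assuming greyscale frames as numpy arrays: the exhaustive search, the sum-of-absolute-differences (SAD) cost and the find_motion_vector helper are illustrative simplifications (real encoders use much faster search strategies):

import numpy as np

def find_motion_vector(ref, cur, bx, by, block=16, radius=8):
    """Return (dy, dx) minimising SAD for the block at (by, bx) in cur."""
    target = cur[by:by + block, bx:bx + block].astype(int)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y + block, x:x + block].astype(int) - target).sum()
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv  # only the vector and the residual block need be stored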
Step 3: Discrete Cosine Transform (DCT)
The DCT allows, similarly to the Fast Fourier Transform (FFT), a representation of image data in terms of frequency components. Thus the frame blocks (8x8 or 16x16 pixels) can be represented as frequency components.
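A minimal sketch of a 2-D DCT on an 8x8 block using scipy; production encoders use dedicated fast implementations, and the level shift by 128 follows common JPEG practice:

import numpy as np
from scipy.fft import dctn

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level shift
coeffs = dctn(block, norm="ortho")  # coeffs[0, 0] is the DC term;
                                    # higher indices hold higher frequencies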
Step 4: Quantization
During quantization, which is the primary source of data loss, the DCT terms are divided by a quantization matrix that takes human visual perception into account. The human eye is more sensitive to low frequencies than to high ones, so the higher frequencies end up with a zero entry after quantization and the range of values is reduced significantly:

F_quantized(u, v) = F(u, v) DIV Q(u, v)

where Q is the quantization matrix of dimension N and DIV denotes integer division. The way Q is chosen defines the final compression level and therefore the quality. After quantization, the DC and AC terms are treated separately. As the correlation between adjacent blocks is high, only the differences between the DC terms are stored, instead of storing all values independently. The AC terms are then stored along a zig-zag path with increasing frequency values. This representation is optimal for the next coding step, because equal values are stored next to each other; as mentioned, most of the higher frequencies are zero after the division by Q.
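A minimal sketch of Step 4, assuming coeffs is the 8x8 array of DCT terms from the previous step. The matrix Q shown is JPEG's well-known example luminance matrix, used here only for illustration; MPEG defines its own default intra matrix. The quantize and zigzag helpers are illustrative names:

import numpy as np

Q = np.array([[16, 11, 10, 16,  24,  40,  51,  61],
              [12, 12, 14, 19,  26,  58,  60,  55],
              [14, 13, 16, 24,  40,  57,  69,  56],
              [14, 17, 22, 29,  51,  87,  80,  62],
              [18, 22, 37, 56,  68, 109, 103,  77],
              [24, 35, 55, 64,  81, 104, 113,  92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103,  99]])

def quantize(coeffs, Q):
    # rounding division shown; the formula's DIV would truncate instead
    return np.round(coeffs / Q).astype(int)   # most high-frequency terms -> 0

def zigzag(block):
    """Read an NxN block along anti-diagonals, in increasing frequency order."""
    n = block.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [int(block[i, j]) for i, j in order]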
Step 5: Entropy Coding
The entropy coding takes two steps: Run Length Encoding (RLE) [2] and Huffman coding [1]. These are well-known lossless compression methods, which can compress data, depending on its redundancy, by an additional factor of 3 to 4.
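A minimal sketch of Step 5: run-length encode the zig-zag sequence, then build a Huffman code over the resulting (value, run) symbols. Note that real MPEG streams use predefined variable-length code tables rather than per-block trees; the helper names are illustrative:

import heapq
from collections import Counter

def rle(seq):
    """Collapse the zig-zag output into (value, run_length) pairs."""
    if not seq:
        return []
    out, run = [], 1
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            run += 1
        else:
            out.append((prev, run))
            run = 1
    out.append((seq[-1], run))
    return out

def huffman_codes(symbols):
    """Build a prefix code: frequent symbols receive shorter bit strings."""
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in set(symbols)}
    while len(heap) > 1:
        c1, _, syms1 = heapq.heappop(heap)   # merge the two rarest nodes,
        c2, i2, syms2 = heapq.heappop(heap)  # prepending one bit to each side
        for s in syms1:
            codes[s] = "0" + codes[s]
        for s in syms2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (c1 + c2, i2, syms1 + syms2))
    return codes

Applied to the zig-zag output of a quantized block, the long runs of zeros collapse into a few pairs, and the most frequent pairs receive the shortest codes, which is where the additional factor of 3 to 4 comes from.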