
How video compression works

What is compression?

Compression is a reversible conversion (encoding) of data into a representation that contains fewer bits, allowing more efficient storage and transmission of the data. The inverse process is called decompression (decoding). Software and hardware that encode are called encoders; those that decode are called decoders. Both combined form a codec, which should not be confused with the terms data container or compression algorithm.



Lossless compression allows a 100% recovery of the original data. It is usually used for text or executable files, where any loss of information would be serious damage. These compression algorithms often use statistical information to reduce redundancies. Huffman coding [1] and run-length encoding (RLE) [2] are two popular examples, allowing high compression ratios depending on the data. Lossy compression, in contrast, does not allow an exact recovery of the original data. Nevertheless, it can be used for data that is not very sensitive to losses and contains a lot of redundancy, such as images, video or sound. Lossy compression allows higher compression ratios than lossless compression.

Why is video compression used?

A simple calculation shows that uncompressed video produces an enormous amount of data. A resolution of 720x576 pixels (PAL) with a refresh rate of 25 fps and 8-bit colour depth would require the following bandwidth:

720 x 576 x 25 x 8 + 2 x (360 x 576 x 25 x 8) = 166 Mb/s (luminance + chrominance)

For High Definition Television (HDTV):

1920 x 1080 x 60 x 8 + 2 x (960 x 1080 x 60 x 8) = 1.99 Gb/s

Even with powerful computer systems (storage, processing power, network bandwidth), such data volumes place extremely high computational demands on managing the data. Fortunately, digital video contains a great deal of redundancy and is therefore well suited to compression, which can reduce these problems significantly. Lossy compression techniques in particular deliver high compression ratios for video data. However, one must keep in mind that there is always a trade-off between data size (and therefore computational time) and quality: the higher the compression ratio, the smaller the size and the lower the quality. The encoding and decoding processes themselves also need computational resources, which have to be taken into consideration. It makes no sense, for example, for a real-time application with low bandwidth requirements to compress the video with a computationally expensive algorithm that takes too long to encode and decode the data.
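As a quick sanity check of these numbers, here is the same calculation in Python (a minimal sketch; the function name is mine):

    def raw_bitrate(width, height, fps, bit_depth):
        # Raw bitrate in bits per second: full-resolution luminance plus
        # two chrominance planes subsampled to half width, as in the formulas above.
        luminance = width * height * fps * bit_depth
        chrominance = 2 * ((width // 2) * height * fps * bit_depth)
        return luminance + chrominance

    print(raw_bitrate(720, 576, 25, 8) / 1e6, "Mb/s")    # PAL: ~166 Mb/s
    print(raw_bitrate(1920, 1080, 60, 8) / 1e9, "Gb/s")  # HDTV: ~1.99 Gb/s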
The MPEG Compression

The MPEG compression algorithm encodes the data in five steps [6], [8]: first a reduction of the resolution is done, followed by motion compensation in order to reduce temporal redundancy. The next steps are the Discrete Cosine Transform (DCT) and a quantization, as used in JPEG compression; this reduces the spatial redundancy (with reference to human visual perception). The final step is entropy coding, using run-length encoding and the Huffman coding algorithm.

Step 1: Reduction of the Resolution

The human eye is less sensitive to colour information than to dark-bright contrasts. A conversion from the RGB colour space into YUV colour components helps exploit this effect for compression. The chrominance components U and V can be reduced (subsampled) to half of the pixels in the horizontal direction (4:2:2), or to half of the pixels in both the horizontal and vertical directions (4:2:0).
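To make the subsampling concrete, here is a minimal sketch in Python with numpy (assuming the frame has already been converted to YUV planes with even dimensions; the function name is mine):

    import numpy as np

    def subsample_420(y, u, v):
        # 4:2:0: keep the luminance plane Y at full resolution and average
        # the chrominance planes U and V over 2x2 pixel blocks.
        h, w = u.shape
        u_sub = u.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        v_sub = v.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return y, u_sub, v_sub

Compared to three full planes, 4:2:0 stores 1 + 1/4 + 1/4 = 1.5 planes, i.e. half the raw data, with hardly any visible difference.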


Step 2: Motion Estimation

An MPEG video can be understood as a sequence of frames. Because two successive frames of a video sequence often show only small differences (except at scene changes), the MPEG standard offers a way of reducing this temporal redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional). The I-frames are "key frames", which have no reference to other frames, and their compression is not that high. The P-frames can be predicted from an earlier I-frame or P-frame. P-frames cannot be reconstructed without their reference frame, but they need less space than I-frames, because only the differences are stored. The B-frames are a bidirectional version of the P-frame, referring to both directions (one forward frame and one backward frame). B-frames cannot be referenced by other P- or B-frames, because they are interpolated from forward and backward frames. P-frames and B-frames are called inter-coded frames, whereas I-frames are known as intra-coded frames.
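The standard specifies the frame types but not how an encoder finds the motion between frames; a common textbook approach is exhaustive block matching. Here is a minimal sketch (my own naming, assuming numpy and two grayscale frames):

    import numpy as np

    def best_motion_vector(ref, cur, bx, by, block=16, search=8):
        # Find the motion vector for the block at (bx, by) in the current
        # frame by exhaustive search in the reference frame, minimising
        # the sum of absolute differences (SAD).
        target = cur[by:by + block, bx:bx + block].astype(int)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y0, x0 = by + dy, bx + dx
                if y0 < 0 or x0 < 0 or y0 + block > ref.shape[0] or x0 + block > ref.shape[1]:
                    continue  # candidate block falls outside the reference frame
                candidate = ref[y0:y0 + block, x0:x0 + block].astype(int)
                sad = np.abs(candidate - target).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
        return best_mv

For a P-frame, the encoder then stores the motion vector plus the (DCT-coded) difference between the predicted block and the actual block.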

Step 3: Discrete Cosine Transform (DCT)

Like the Fast Fourier Transform (FFT), the DCT represents image data in terms of frequency components. Each frame block (8x8 or 16x16 pixels) can thus be represented as a set of frequency components.
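For illustration, here is a direct implementation of the standard 2D DCT-II for one block (numpy; real codecs use fast factorisations, this only spells out the definition):

    import numpy as np

    def dct_2d(block):
        # 2D DCT-II of an NxN pixel block, returning frequency coefficients
        # F(u, v). F(0, 0) is the DC term (the block's average brightness);
        # the remaining entries are the AC terms for increasing frequencies.
        n = block.shape[0]
        c = lambda k: np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
        F = np.zeros((n, n))
        for u in range(n):
            for v in range(n):
                s = 0.0
                for x in range(n):
                    for y in range(n):
                        s += (block[x, y]
                              * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                              * np.cos((2 * y + 1) * v * np.pi / (2 * n)))
                F[u, v] = c(u) * c(v) * s
        return F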

Step 4: Quantization

Quantization, which is the primary source of data loss, divides the DCT terms by a quantization matrix that takes human visual perception into account: the human eye is more sensitive to low frequencies than to high ones. Most higher frequencies end up with a zero entry after quantization, so the data is reduced significantly:

F_quantised(u, v) = F(u, v) DIV Q(u, v)

where Q is the N x N quantisation matrix. The way Q is chosen defines the final compression level and therefore the quality. After quantization, the DC and AC terms are treated separately. As the correlation between adjacent blocks is high, only the differences between the DC terms are stored, instead of storing all values independently. The AC terms are then stored along a zig-zag path with increasing frequency values. This representation is optimal for the next coding step, because equal values are stored next to each other; as mentioned, most of the higher frequencies are zero after the division by Q.
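A sketch of the quantization and the zig-zag scan in Python (the matrix Q below is the widely quoted example luminance matrix from the JPEG standard; MPEG defines its own default matrices, so treat it as illustrative only):

    import numpy as np

    Q = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99],
    ])

    def quantise(F):
        # Divide each DCT coefficient by its matrix entry and keep the
        # integer part (the DIV in the formula above); most high
        # frequencies become 0.
        return (F / Q).astype(int)

    def zigzag(block):
        # Read a square block along its anti-diagonals (the zig-zag path),
        # DC term first, alternating direction on each diagonal.
        n = block.shape[0]
        order = sorted(((x, y) for x in range(n) for y in range(n)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return [block[x, y] for x, y in order]

The zig-zag ordering groups the (mostly zero) high-frequency coefficients at the end of the list, which is exactly what the run-length encoding in the next step exploits.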



Step 5: Entropy Coding

The entropy coding takes two steps: run-length encoding (RLE) [2] and Huffman coding [1]. These are well-known lossless compression methods, which can compress the data, depending on its redundancy, by an additional factor of 3 to 4.
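A minimal sketch of the run-length step on the zig-zag-scanned coefficients (Huffman coding would then assign shorter bit codes to the more frequent symbols; the EOB marker name is mine):

    def run_length_encode(coeffs):
        # Encode a list of quantised coefficients as (zero_run, value) pairs.
        # The long run of trailing zeros typical after quantisation collapses
        # into a single end-of-block marker.
        pairs, run = [], 0
        for c in coeffs:
            if c == 0:
                run += 1
            else:
                pairs.append((run, c))
                run = 0
        pairs.append("EOB")
        return pairs

    print(run_length_encode([42, 0, 0, -3, 1, 0, 0, 0, 0]))
    # [(0, 42), (2, -3), (0, 1), 'EOB']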

