
How video compression works

What is compression? Compression is a reversible conversion (encoding) of data into a representation that contains fewer bits, allowing more efficient storage and transmission of the data. The inverse process is called decompression (decoding). Software or hardware that encodes is called an encoder, and that which decodes is called a decoder; the two combined form a codec, which should not be confused with a data container or with the compression algorithm itself.



Lossless compression allows a 100% recovery of the original data. It is usually used for text or executable files, where any loss of information would cause serious damage. These compression algorithms often use statistical information to reduce redundancies. Huffman coding [1] and Run Length Encoding [2] are two popular examples, allowing high compression ratios depending on the data. Lossy compression does not allow an exact recovery of the original data. Nevertheless, it can be used for data that is not very sensitive to losses and that contains a lot of redundancy, such as images, video or sound. Lossy compression achieves considerably higher compression ratios than lossless compression.

Why is video compression used? A simple calculation shows that uncompressed video produces an enormous amount of data: a resolution of 720x576 pixels (PAL), with a refresh rate of 25 fps and 8-bit colour depth, would require the following bandwidth:

720 x 576 x 25 x 8 + 2 x (360 x 576 x 25 x 8) = 166 Mbit/s (luminance + chrominance)

For High Definition Television (HDTV):

1920 x 1080 x 60 x 8 + 2 x (960 x 1080 x 60 x 8) = 1.99 Gbit/s

Even with powerful computer systems (storage, processor power, network bandwidth), such amounts of data place extremely high demands on managing the data. Fortunately, digital video contains a great deal of redundancy and is therefore well suited to compression, which can reduce these problems significantly. Lossy compression techniques in particular deliver high compression ratios for video data. However, one must keep in mind that there is always a trade-off between data size (and therefore computational time) and quality: the higher the compression ratio, the smaller the size and the lower the quality. The encoding and decoding process itself also needs computational resources, which have to be taken into consideration. It makes no sense, for example, for a real-time application with low bandwidth requirements to compress the video with a computationally expensive algorithm that takes too long to encode and decode the data.
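As a sanity check of these figures, the bandwidth formula above can be reproduced in a few lines of Python (a minimal sketch assuming the same 8-bit samples and a chroma layout with two half-width colour planes, as in the calculation above):

def uncompressed_bitrate(width, height, fps, bit_depth=8):
    # Luminance plane plus two half-width chrominance planes, in bits per second.
    luma = width * height * fps * bit_depth
    chroma = 2 * ((width // 2) * height * fps * bit_depth)
    return luma + chroma

print(uncompressed_bitrate(720, 576, 25) / 1e6, "Mbit/s")    # PAL: ~166 Mbit/s
print(uncompressed_bitrate(1920, 1080, 60) / 1e9, "Gbit/s")  # HDTV: ~1.99 Gbit/s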
The MPEG Compression

The MPEG compression algorithm encodes the data in five steps [6], [8]: first a reduction of the resolution is done, followed by motion compensation in order to reduce temporal redundancy. The next steps are the Discrete Cosine Transform (DCT) and a quantization, as used in JPEG compression; this reduces the spatial redundancy (with respect to human visual perception). The final step is entropy coding, using Run Length Encoding and the Huffman coding algorithm.

Step 1: Reduction of the Resolution The human eye is less sensitive to colour information than to dark-bright contrasts. A conversion from the RGB colour space into YUV colour components helps to exploit this effect for compression. The chrominance components U and V can be subsampled to half of the pixels in the horizontal direction (4:2:2), or to half of the pixels in both the horizontal and vertical directions (4:2:0).
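To illustrate step 1, here is a minimal sketch of 4:2:0 subsampling in Python with NumPy. The RGB-to-YUV conversion uses the common BT.601 coefficients (an assumption, not stated in the text above), and the chroma planes are simply decimated; a real encoder would typically filter or average before subsampling.

import numpy as np

def rgb_to_yuv420(rgb):
    # rgb: array of shape (height, width, 3) with values 0..255.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b            # luminance
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0   # chrominance
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0
    # 4:2:0: keep every second chroma sample horizontally and vertically.
    return y, u[::2, ::2], v[::2, ::2]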


Step 2: Motion Estimation An MPEG video can be understood as a sequence of frames. Because two successive frames of a video sequence often differ only slightly (except at scene changes), the MPEG standard offers a way of reducing this temporal redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional). The I-frames are "key frames", which have no reference to other frames, and their compression is not very high. The P-frames can be predicted from an earlier I-frame or P-frame. P-frames cannot be reconstructed without their reference frame, but they need less space than I-frames, because only the differences are stored. The B-frames are a bidirectional version of the P-frame, referring to both directions (one forward frame and one backward frame). B-frames cannot be referenced by other P- or B-frames, because they are interpolated from forward and backward frames. P-frames and B-frames are called inter coded frames, whereas I-frames are known as intra coded frames.
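A P-frame is typically built by searching the reference frame for the block that best matches each block of the current frame, and storing only the motion vector plus the remaining difference. The following sketch shows a brute-force full search using the sum of absolute differences (SAD); the block size and search range are illustrative values, not ones prescribed by the standard.

import numpy as np

def find_motion_vector(ref, cur, top, left, block=16, search=8):
    # Full search: return the displacement (dy, dx) of the reference block
    # that best matches the current block, measured by the sum of absolute differences.
    target = cur[top:top + block, left:left + block].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block would fall outside the reference frame
            sad = np.abs(ref[y:y + block, x:x + block].astype(int) - target).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv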

Step 3: Discrete Cosine Transform (DCT) The DCT, similar to the Fast Fourier Transform (FFT), represents image data in terms of frequency components. Each frame block (8x8 or 16x16 pixels) can thus be expressed as a set of frequency components.
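For an 8x8 block the 2D DCT can be computed as the matrix product C · B · Cᵀ, where C is the orthonormal DCT-II basis matrix. A small sketch in Python follows (a library DCT such as SciPy's would do the same job, but the explicit matrix makes the idea visible):

import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix.
    c = np.zeros((n, n))
    for u in range(n):
        alpha = np.sqrt(1.0 / n) if u == 0 else np.sqrt(2.0 / n)
        for x in range(n):
            c[u, x] = alpha * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    return c

def dct2(block):
    # 2D DCT of a square pixel block; low frequencies end up near the top-left corner.
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T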

Step 4: Quantization Quantization, which is the primary source of data loss, divides the DCT terms by a quantization matrix that takes human visual perception into account. The human eye is more sensitive to low frequencies than to high ones, so most of the higher frequencies end up with a zero entry after quantization and the value range is reduced significantly.

F_quantised(u, v) = F(u, v) DIV Q(u, v)

where Q is the quantization matrix of dimension N x N. The way Q is chosen defines the final compression level and therefore the quality. After quantization the DC and AC terms are treated separately. As the correlation between adjacent blocks is high, only the differences between the DC terms are stored, instead of storing all values independently. The AC terms are then stored along a zig-zag path of increasing frequency. This representation is optimal for the next coding step, because equal values are stored next to each other; as mentioned, most of the higher frequencies are zero after the division by Q.
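Continuing the sketch from step 3, quantization is an element-wise integer division of the DCT block by Q, and the zig-zag readout then orders the coefficients by increasing frequency. The quantization matrix below is purely illustrative (coarser steps for higher frequencies), not the table defined by the MPEG standard.

import numpy as np

# Illustrative quantization matrix: larger divisors for higher frequencies.
Q = np.fromfunction(lambda u, v: 8.0 + 4.0 * (u + v), (8, 8))

def quantize(coeffs, q=Q):
    # F_quantised(u, v) = F(u, v) DIV Q(u, v) -- the lossy step.
    return np.trunc(coeffs / q).astype(int)

def zigzag(block):
    # Read an NxN block along anti-diagonals so that low frequencies come first
    # and the trailing high-frequency zeros are grouped together.
    n = block.shape[0]
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u, v] for u, v in order]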



Step 5: Entropy Coding The entropy coding takes two steps: Run Length Encoding (RLE) [2] and Huffman coding [1]. These are well-known lossless compression methods, which can compress the data, depending on its redundancy, by an additional factor of 3 to 4.
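As an illustration of this last step, the sketch below run-length encodes the zig-zag sequence from step 4 and then builds a Huffman code over the resulting (value, run) symbols. This is a minimal textbook version; MPEG itself uses predefined variable-length-code tables rather than building a tree per stream.

import heapq
from collections import Counter
from itertools import groupby

def run_length_encode(values):
    # Collapse runs of identical values into (value, run_length) pairs.
    return [(v, len(list(g))) for v, g in groupby(values)]

def huffman_code(symbols):
    # Build a Huffman code (symbol -> bit string): rarer symbols get longer codes.
    heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        lo[2] = {s: "0" + c for s, c in lo[2].items()}
        hi[2] = {s: "1" + c for s, c in hi[2].items()}
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]

The more skewed the distribution of symbols, the shorter the average code word becomes, which is where the additional factor of 3 to 4 comes from.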

