Video processing: inter-frame prediction

I need to do "Inter-frame Prediction" and "Motion Compensation" over 30 frames for video processing in Matlab. I am working with the "mother-daughter" test sequence.


What I have done so far is take the very first frame and:

  • divided it into 8x8 blocks
  • applied the DCT to each block
  • quantized it
  • de-quantized it
  • performed the inverse DCT.

I know that no motion estimation is required for the first frame. The reconstructed first frame is then used as the reference for the second frame, and so on. To estimate motion, I need to implement the "Full Search Block Matching Algorithm".

Question 1: What is frame reconstruction? Is it the DCT and quantization process I listed above?

Question 2: What is the full search block matching algorithm?





1 answer


I'm going to assume that you mean the MPEG family of video compression standards (MPEG-1, MPEG-2, H.264, etc.). Let me answer each question one at a time:

Question #1 - Frame reconstruction

For a single frame, the forward transform basically consists of decomposing the frame into 8 x 8 non-overlapping blocks, performing an 8 x 8 DCT on each block, quantizing the blocks, and then doing some further steps such as zig-zag ordering, run-length encoding, etc.

Basically, your frame ends up represented as a compressed sequence of bits. Frame reconstruction simply runs in reverse, so you are almost right. It consists of decoding the bit sequence and undoing the zig-zag ordering, then de-quantizing each block, and then applying the IDCT. The reason it is called "reconstruction" is that the frame was represented in a different format: you are converting the frame back to what it was (approximately) before it was compressed.

One thing you may already know is that the quantization step is the reason this methodology is lossy. This means you won't be able to recover the original frame exactly, but you can get very close to it. The advantage, however, is that lossy algorithms give you high compression ratios, which means the video file is smaller and can be transmitted easily.
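As a concrete illustration, here is a minimal sketch of that forward-transform / reconstruction round trip. It is written in plain Python rather than Matlab purely for illustration, with a hand-rolled 8 x 8 DCT, and the standard JPEG luminance table standing in for whatever quantization matrix you actually use:

```python
import math

N = 8  # block size

def dct2(block):
    """2-D DCT-II of an 8x8 block (orthonormal scaling)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def idct2(coeffs):
    """Inverse 2-D DCT, the exact inverse of dct2 above."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out

# Stand-in quantization matrix (the JPEG luminance table).
Q = [[16, 11, 10, 16, 24, 40, 51, 61],
     [12, 12, 14, 19, 26, 58, 60, 55],
     [14, 13, 16, 24, 40, 57, 69, 56],
     [14, 17, 22, 29, 51, 87, 80, 62],
     [18, 22, 37, 56, 68, 109, 103, 77],
     [24, 35, 55, 64, 81, 104, 113, 92],
     [49, 64, 78, 87, 103, 121, 120, 101],
     [72, 92, 95, 98, 112, 100, 103, 99]]

def quantize(coeffs):
    # Rounding here is what makes the scheme lossy.
    return [[round(coeffs[u][v] / Q[u][v]) for v in range(N)] for u in range(N)]

def dequantize(q):
    return [[q[u][v] * Q[u][v] for v in range(N)] for u in range(N)]

# Round trip on a toy gradient block: reconstruction is close, not exact.
block = [[(x * 8 + y) * 2 for y in range(8)] for x in range(8)]
recon = idct2(dequantize(quantize(dct2(block))))
max_err = max(abs(block[x][y] - recon[x][y])
              for x in range(8) for y in range(8))
```

Running this shows a small but nonzero reconstruction error; without the quantize/dequantize pair, the DCT round trip is exact up to floating-point precision.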

In fact, if you take a single frame, run the forward transform and then the reverse transform, and compare the two frames pixel by pixel, you will see some subtle differences, but not enough to write home about. The parameters and design of the compression scheme have been tuned so that the average person's visual system will not notice the differences between the original and the reconstructed frame.

So why accept the loss at all, you may ask? The reason is that the MPEG consortium favoured making video highly compressible and easily transmittable over preserving exact video quality. This is because quality has always been a subjective measure, even when numerical metrics (such as PSNR) exist to quantify image quality.

So the moral of this story is: reconstruction undoes the forward transform that compressed the video frame. The result won't be exactly the same as the original frame, but it is close enough that a normal person won't complain.




Question #2 - Full search block matching algorithm

The basic idea of motion estimation is that, to save bandwidth, we don't want to transmit every frame as a full video frame. In the MPEG family of video compression algorithms, there are three classes of encoded frames in your video:

  • I-Frames are what are known as intra-coded frames. These frames have the complete compression algorithm performed on them (DCT, quantization, etc.). We don't make a video consisting entirely of I-Frames, as the video size would be quite large. Instead, I-frames are used as reference points, and difference frames are sent after each of them, where for each block of the I-frame a motion vector is transmitted. More details follow.

  • P-Frames - Instead of sending another I-frame, we send a predicted frame, or P-Frame. For each block from the reference I-frame, the P-Frame essentially tells us where that block has moved from one frame to the next. These are what are known as motion vectors, one per block. The rationale is that video is usually captured at such a high frame rate that consecutive frames differ very little, so most blocks either stay put or move only slightly. Eventually you hit a point where the scene changes dramatically, or there is so much motion that, even at a high frame rate, P-Frames alone cannot capture all of the movement. This is usually visible when you watch MPEG video with a lot of fast motion - you see a lot of "blockiness", and that blockiness is due to this fact. At that point another I-Frame is encoded as a quick refresh, and encoding continues from there. So most video files have frames encoded such that you get one I-frame, then a bunch of P-frames, then another I-frame, followed by a bunch of P-frames, and so on.

  • B-Frames are what are known as bi-directionally predicted frames. These frames use information from both the frame (or frames) before them and the frame (or frames) after them. How this works exactly is beyond the scope of this post, but I mention it briefly so the answer is self-contained.
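To make the motion-vector idea concrete, here is a sketch of the decoder-side step, motion compensation, in Python. The frame layout (nested lists) and the `vectors` dictionary keyed by block index are my own illustrative choices, not any standard's API: each block of the predicted frame is copied from the reference frame, shifted by that block's motion vector.

```python
BLOCK = 8  # block size used throughout this example

def motion_compensate(reference, vectors, width, height):
    """Rebuild a predicted frame from a reference frame.

    For each 8x8 block at (bx, by), vectors[(by // 8, bx // 8)] gives a
    (dy, dx) displacement; the block is filled with reference pixels taken
    from the block's position offset by that displacement.
    """
    predicted = [[0] * width for _ in range(height)]
    for by in range(0, height, BLOCK):
        for bx in range(0, width, BLOCK):
            dy, dx = vectors[(by // BLOCK, bx // BLOCK)]
            for y in range(BLOCK):
                for x in range(BLOCK):
                    # Clamp at the frame edges so displaced reads stay in bounds.
                    sy = min(max(by + dy + y, 0), height - 1)
                    sx = min(max(bx + dx + x, 0), width - 1)
                    predicted[by + y][bx + x] = reference[sy][sx]
    return predicted
```

The encoder's job (motion estimation, the subject of your second question) is to choose those vectors; the decoder just applies them like this, and only a small residual plus the vectors needs to be transmitted.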

Thus, one possible sequence of encoded frames looks like this:

IPPPBPPPIPPPBPPPI...


It all depends on how your encoder is configured, but we'll leave that aside.

How is this relevant, you ask? Your question about the Full Search Block Matching Algorithm deals with how P-frames are generated. For a given block in an I-frame, where is the best location this block has moved to in the next frame? To figure that out, we look at the blocks in the next frame and find the one most similar to the block in the I-Frame. You are probably asking yourself: whoa... isn't that a lot of blocks to search? The answer is yes. The full search block matching algorithm does exactly that: it searches the entire frame for the best matching block. This is quite computationally intensive, so most encoders actually limit the search to a finite window of moderate size around the block's location. A full block matching search gives the best results, but it takes too long and is usually not worth it, because we can exploit the fact that most blocks don't actually move very far - assuming, again, that the video was captured at a high frame rate.
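Here is a sketch of that full search in Python, using the sum of absolute differences (SAD) as the matching cost; the nested-list frames and parameter names are my own illustrative setup. Passing a small `search_range` turns it into the windowed search that most encoders use in practice:

```python
def sad(cur, ref, cx, cy, rx, ry, bsize):
    """Sum of absolute differences between the block of the current frame
    at (cx, cy) and the candidate block of the reference frame at (rx, ry)."""
    return sum(abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
               for y in range(bsize) for x in range(bsize))

def full_search(cur, ref, cx, cy, bsize=8, search_range=None):
    """Find the displacement (dx, dy) into the reference frame that best
    matches the current block at (cx, cy).

    search_range=None scans every block position in the frame (true full
    search); a small integer restricts the scan to a window of that radius
    around (cx, cy), which is what real encoders do to stay fast.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_vec = float("inf"), (0, 0)
    for ry in range(height - bsize + 1):
        for rx in range(width - bsize + 1):
            if search_range is not None and (
                    abs(rx - cx) > search_range or abs(ry - cy) > search_range):
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, bsize)
            if cost < best_cost:
                best_cost, best_vec = cost, (rx - cx, ry - cy)
    return best_vec, best_cost
```

For a current frame that is just the reference shifted a couple of pixels, this recovers exactly that shift as the motion vector, with zero cost; the windowed variant finds the same answer in a fraction of the comparisons.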




Hope this answers your questions!









