WO2002001848A2 - System and method for reducing the computational complexity of mpeg video decoding - Google Patents

System and method for reducing the computational complexity of mpeg video decoding Download PDF

Info

Publication number
WO2002001848A2
WO2002001848A2 (PCT/US2001/020661)
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
video
block
frame
decoded
Prior art date
Application number
PCT/US2001/020661
Other languages
French (fr)
Other versions
WO2002001848A3 (en)
Inventor
Shahab Layeghi
Andy Hung
Original Assignee
Intervideo, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intervideo, Inc.
Priority to AU2001273059A
Publication of WO2002001848A2
Publication of WO2002001848A3

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/523: Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/577: Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/172: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a picture, frame or field
    • H04N19/186: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Abstract

A system and corresponding method for reducing the computational complexity in decoding an MPEG encoded signal is disclosed. The disclosed system is a digital video disk player capable of decoding and constructing a previously encoded MPEG video signal based on a subset of information contained in the encoded input signal (Fig. 4). The system of the present invention also includes means for determining whether the particular block to be decoded is of a predetermined type and decoding such block accordingly.

Description

SYSTEM AND METHOD FOR REDUCING THE COMPUTATIONAL COMPLEXITY OF MPEG VIDEO DECODING
NOTICE OF COPYRIGHT A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. FIELD OF THE INVENTION
The present invention generally relates to MPEG video decoders and, more particularly, to a system and corresponding algorithm for reducing the complexity and time associated with decoding MPEG video signals.
BACKGROUND OF THE INVENTION In the Moving Pictures Experts Group (MPEG) video standard, a sequence is comprised of a series of video frames. Each video frame in the sequence is subdivided into a number of rectangular information blocks, each containing a pixel portion of the image. These information blocks are referred to as macroblocks. The pixel portions of the image are represented by a series of bits of data. In the MPEG video standard, the data bits are encoded in a particular fashion. The encoded bitstream contains compressed information based on encoding each macroblock of an image.
An MPEG decoder is used to decode each of the compressed macroblocks in the MPEG encoded bitstream based on previously transmitted and decoded video frames, called reference frames. In order to accommodate motion in the video frames, each decoded macroblock refers to one or more image regions in a previously transmitted reference frame to use as a prediction source for decoding the current frame. The displacement between the macroblock and the image regions of interest is called a motion vector. For MPEG, the displacement motion vectors are computed with half pixel resolution, referred to as half-pel prediction. Thus, the motion vectors represent displacement on a half-pel grid.
Computing the motion prediction is one of the most time consuming steps in decoding MPEG video signals. As discussed above, motion vectors in MPEG are computed with half pixel (half-pel) resolution. In situations where the horizontal or vertical components of the motion vector are an even number of half-pels, the displacement in the corresponding decoded block is the entire pixel displacement value as provided in the reference frame. Consequently, prediction of the corresponding macroblock in the decoded frame does not have to be computed. In situations where the horizontal or vertical components of the motion vector are an odd number of half-pels, the prediction macroblock is reconstructed by taking the average of all the surrounding pixels. There are four cases of motion prediction associated with any video image, depending on whether the x (horizontal) and y (vertical) components of the motion vector are even or odd: (1) x is even and y is even, this represents the situation where the motion prediction does not have to be computed as both the horizontal and vertical components have even values; (2) x is even and y is odd, this represents half-pel vertical prediction as illustrated in Fig. 1(a); (3) x is odd and y is even, this represents half-pel horizontal prediction as illustrated in Fig. 1(b); and (4) x is odd and y is odd, this represents half-pel horizontal and vertical prediction as illustrated in Fig. 1(c).
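The four prediction cases can be summarized by a short routine. The following is only an illustrative sketch of the conventional half-pel interpolation, not text from this disclosure: the function and variable names are assumptions, an 8-bit reference plane addressed with a row stride is assumed, and MPEG-style rounded averaging is used. It shows how one, two or four reference pixels are averaged per output pixel depending on the parity of the motion vector components.

    #include <stdint.h>

    /* Conventional half-pel prediction of one 16x16 block (illustrative sketch).
     * mv_x and mv_y are motion vector components in half-pel units; ref points to
     * the reference frame plane, stride is its width in bytes, pred receives the
     * 16x16 prediction block. */
    void predict_block_halfpel(const uint8_t *ref, int stride,
                               int mv_x, int mv_y, uint8_t *pred)
    {
        int full_x = mv_x >> 1, full_y = mv_y >> 1;   /* integer-pel displacement    */
        int half_x = mv_x & 1,  half_y = mv_y & 1;    /* 1 when the component is odd */
        const uint8_t *src = ref + full_y * stride + full_x;

        for (int y = 0; y < 16; y++) {
            for (int x = 0; x < 16; x++) {
                const uint8_t *p = src + y * stride + x;
                int v;
                if (!half_x && !half_y)       /* case 1: x even, y even: plain copy         */
                    v = p[0];
                else if (!half_x && half_y)   /* case 2: x even, y odd: half-pel vertical   */
                    v = (p[0] + p[stride] + 1) >> 1;
                else if (half_x && !half_y)   /* case 3: x odd, y even: half-pel horizontal */
                    v = (p[0] + p[1] + 1) >> 1;
                else                          /* case 4: x odd, y odd: average four pixels  */
                    v = (p[0] + p[1] + p[stride] + p[stride + 1] + 2) >> 2;
                pred[y * 16 + x] = (uint8_t)v;
            }
        }
    }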
SUMMARY OF THE INVENTION The present invention is directed to a digital video system which incorporates an algorithm that adjusts the motion vector used to decode a corresponding macroblock of a digital video frame based on a subset of information contained in an encoded input video signal. The digital video system comprises means for providing an input signal, said input signal including a plurality of encoded macroblocks; means for constructing a plurality of decoded macroblocks in response to a subset of data present in said plurality of encoded macroblocks; and means for providing an output signal in response to said plurality of decoded macroblocks. The video system further includes a means for displaying the constructed video signal.
In an exemplary embodiment of the present invention, the constructing means is a decoder present within a larger digital video disk player which constructs a decoded video frame from an encoded reference frame by retrieving a motion vector from the reference frame; determining whether the block of the reference frame is of a particular type; constructing a modified motion vector from a subset of data contained within the reference frame; and applying the modified motion vector to a corresponding block within the reference frame. The algorithm performed by the decoder is most advantageously used in conjunction with motion vectors containing an odd number of half-pels, wherein the motion vector provided by the decoder is comprised of horizontal components and vertical components having values equal to the nearest even half-pel value of the corresponding reference value.
An advantage of the present invention is that it improves video decoding efficiency by reducing the number of calculations that need to be performed on a given frame.
Another advantage of the present invention is that it is straightforward to implement.
A feature of the present invention is that it has minimal effect on video image quality.
BRIEF DESCRIPTION OF THE DRAWINGS The aforementioned and related advantages and features of the present invention will become apparent upon review of the following detailed description of the invention, taken in conjunction with the following drawings, where like numerals represent like elements, in which:
Figures 1(a) - (c) are schematic representations of the calculations made to determine the motion vector components of a half-pel using conventional decoding techniques;
Figure 2 is a schematic representation of the structure of a video frame; Figures 3(a) - (c) are schematic representations of the types of macroblocks that comprise a video frame;
Figure 4 is a block diagram of a digital video disk player incorporating a decoder module that performs the improved computation algorithm according to the present invention;
Figure 5 is a flow chart illustrating the operating steps performed by the decoder module in decoding a macroblock according to the improved computation algorithm of the present invention; and Figures 6(a) - (b) are schematic representations of a macroblock being constructed using the motion vector calculated according to the improved computation algorithm of the present invention. DETAILED DESCRIPTION OF THE INVENTION
The system and corresponding method of decoding previously encoded MPEG video signals will now be described with reference to Figures 2 - 6. In the MPEG video standard, a digital image is comprised of a series of frames. The types of frames that make up the video image are illustrated in Figure 2. As shown, a video sequence is comprised of an I-frame 20, a number of B-frames 22 and a P-frame 24. I-frames, also referred to as intra-frames, are video frames that are encoded as stand-alone still images. I-frames allow random access points within the video stream. In application, I-frames are used where scene cuts occur. B-frames, or bi-directional frames, provide the most compression and decrease noise by averaging the pixel information contained in the frames that are used to decode (or predict) the contents of the B-frame. P-frames, or predicted frames, are frames that are encoded relative to the nearest I-frame or P-frame, resulting in forward prediction processing. The I-frame 20, B-frame 22 and P-frame 24 can be encoded using any one of several types (e.g., Huffman) of encoding schemes.
In application, the data (pixel representation) of the B-frame to be decoded is predicted by a preceding I-frame or P-frame and a subsequent I-frame or P-frame. For example, as shown in Figure 2, the pixel data of B-frame 22 is decoded by the data contained within the preceding I-frame 20 and a subsequent P-frame 24. The same decoding scheme may be applied to decode the contents of the second B-frame
23 in the frame series. The content of B-frame 23 is decoded by using the I-frame 20 and the subsequent P-frame 24. In an alternate embodiment of the present invention, the I-frame 20, alone, can be used to decode P-frame 24. Moreover, the P-frame 24 also can be used to decode either B-frame 23, B-frame 25 or a subsequent P-frame (not shown). In accordance with the present invention, the B-frame is not used to decode (predict the contents of) any other video frame. In a preferred embodiment of the present invention, the P-frame 24 is a fixed reference frame.
As shown in greater detail in Figure 3, each video frame is comprised of a series of blocks, referred to as macroblocks. Each macroblock is comprised of a luminance component and two chrominance components, referred to as U and V, respectively. Macroblocks contain a series of pixels (represented as dots in Figure 3(c)) which represent a larger image. In application, each pixel is comprised of N bits of information, where N is an integer. In a preferred embodiment of the present invention, N equals 8. Thus, each pixel is comprised of 8 bits of data. In a preferred embodiment, each macroblock is 16 pixels x 16 pixels in size. The luminance component of each macroblock contains 16 x 16 x 8 bits of information. The chrominance components of the macroblock contain two corresponding 8 x 8 x 8 bit blocks of information corresponding to the U and V portions, respectively. The decoding of the pixel information contained in a current macroblock, for example, current macroblock 32, is a combination of the changes present in a corresponding reference macroblock 32', plus the displacement of the reference macroblock 32' from a standard reference location. The displacement of the reference macroblock 32' from the standard reference location is referred to as the motion vector (Vm). Vm is comprised of two components: (1) the distance along the horizontal direction (x-axis) from the standard reference location (x-component or Vmx); and (2) the distance along the vertical direction (y-axis) from the standard reference location (y-component or Vmy). In application, the reference macroblock 32', as shown in Figure 3(a), consists of information from a plurality of macroblocks that are determined during frame encoding and the corresponding Vm which provides the displacement of the reference macroblock 32' from the corner of the reference frame. After encoding, Vmx and Vmy may have either an integer value or a non-integer value measured on a pel (pixel) unit scale. When motion vectors are measured in half-pel units, an integer pel value results in an even half-pel component and a fractional pel value results in an odd half-pel component. Half-pel reference motion vectors require additional calculations before a corresponding Vm can be obtained. In those situations where the pixel displacement of the reference macroblock falls midway between pixel locations in the base (reference) frame, the prediction of that pixel requires averaging all four surrounding pixel locations as illustrated in Figure 1(c).
This requires that four additional pixel average calculations be completed per pixel. As B-frames may have both forward and backward prediction, twice the amount of additional calculations may be required. This can significantly increase video frame decoding time.
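For orientation, the macroblock layout described above can be expressed as a small data structure. The field names below are illustrative assumptions only and do not appear in this disclosure; the sizes follow the 16 x 16 x 8 bit luminance block, the two 8 x 8 x 8 bit chrominance blocks, and the two motion vector components Vmx and Vmy measured in half-pel units.

    #include <stdint.h>

    /* Illustrative macroblock layout; names are assumptions, not the patent's. */
    typedef struct {
        uint8_t luma[16][16];    /* Y:  16 x 16 pixels, 8 bits each (N = 8)   */
        uint8_t chroma_u[8][8];  /* U:   8 x  8 pixels, 8 bits each           */
        uint8_t chroma_v[8][8];  /* V:   8 x  8 pixels, 8 bits each           */
        int     vm_x;            /* Vmx: horizontal displacement in half-pels */
        int     vm_y;            /* Vmy: vertical displacement in half-pels   */
    } Macroblock;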
The present invention is directed to a motion vector computation method which reduces the number of calculations that have to be performed when computing Vm when the reference displacement vector is a half-pel. The computed Vm is then used to generate a corresponding decoded macroblock. The computation method will now be described with reference to Figures 4-6.
Figure 4 is a block diagram of a digital versatile disk (DVD) player 40 that performs the improved computation algorithm according to the present invention. The DVD player 40 includes a navigation unit 42 and a corresponding video unit 44 which provides an output video signal to a display device 48 on line 47. In a preferred embodiment of the present invention, the display device 48 is a progressive display device, such as a computer monitor. In an alternate embodiment of the present invention, the display device 48 is an interlaced display device. The video unit 44 includes a video decoder module 45 and a video display module 46. The decoder module 45 decodes (constructs) the input video signal provided by the navigation unit 42 according to the improved computation algorithm of the present invention.
The navigation unit 42 accepts a digital media element such as, for example, a digital versatile disk 11 having digital information, i.e., audio, video and complementary information stored thereon. The navigation unit 42 is capable of differentiating between the different types of information stored on the disk 11 and providing the encoded video information on a first data line (VIDEO). The audio and other complementary information stored on the disk 11 are provided on an AUDIO line and a COMP line, respectively.
The encoded video information present on the VIDEO line is transferred to the video unit 44 through the video decoder module 45. The video decoder module 45 receives the encoded video bit stream from the navigation unit 42 and reconstructs the I-frames and P-frames of the reference frame 32' (Figure 3). Using the I-frame 20 and the P-frame 24, and the reference motion vector (Vm') from the reference frame 32', the individual macroblocks of the B-frame 22 are decoded by the video decoder module 45 based on the following representative algorithm:

    For each macroblock in the image
    {
        Find motion vector components mv_x, mv_y;
        if (picture_coding_type == B_TYPE)
        {
            if (luminance_block)
            {
                mv_x = mv_x & ~1;    /* force horizontal component to an even half-pel */
            }
            if (chrominance_block)
            {
                mv_x = mv_x & ~1;    /* force both components to even half-pels */
                mv_y = mv_y & ~1;
            }
        }
        Do motion compensation for the block;
    }

As illustrated by the pseudo-code provided above, the video decoder module 45 generates a modified motion vector (Vm) for each macroblock based on a subset of the displacement data (the horizontal (x) and vertical (y) components of the motion vector) present in the reference motion vector Vm'. After the motion vector for the current macroblock being decoded is generated, it is used to obtain the decoded macroblock and provide a modified (decoded) video signal. The modified video signal is then transferred to the video display module 46 on line 43.
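A self-contained rendering of the pseudo-code above might look as follows. This is a sketch under assumed names: the enum values and function name are illustrative and are not part of this disclosure. The bit-mask & ~1 clears the least-significant half-pel bit, i.e. it rounds an odd half-pel component down to the adjacent even value.

    typedef enum { I_TYPE, P_TYPE, B_TYPE } PictureType;
    typedef enum { LUMINANCE_BLOCK, CHROMINANCE_BLOCK } BlockType;

    /* Adjust the half-pel motion vector components of one block as described
     * above (illustrative sketch). Only blocks in B-frames are modified, since
     * B-frames are never used as references when decoding later frames. */
    void adjust_motion_vector(PictureType picture_coding_type, BlockType block_type,
                              int *mv_x, int *mv_y)
    {
        if (picture_coding_type != B_TYPE)
            return;                /* I- and P-frame blocks keep full half-pel accuracy */

        *mv_x &= ~1;               /* horizontal component forced to an even half-pel   */
        if (block_type == CHROMINANCE_BLOCK)
            *mv_y &= ~1;           /* chrominance: vertical component also forced even  */
    }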
The video display module 46 includes a detection unit (not shown) and a processing unit (not shown) which are capable of detecting the modified video signal provided by the video decoder module 45 and converting the modified video signal into the output video signal that is transferred to the computer monitor 48 on line 47.
The computation steps performed by the video decoder module 45 to construct the modified video signal from the encoded input video signal will now be described with reference to Figure 5. Figure 5 is a flow chart illustrating the improved computation algorithm according to the present invention. In a first step 60, the horizontal (x-component) and vertical (y-component) components of the reference motion vector are obtained. Next, in step 62, a determination is made as to whether the current macroblock to be decoded is within a B-frame. This is done by detecting whether the picture coding type of the current frame is the bi-directional type. This can be accomplished, for example, by detecting the presence of a flag bit preceding, within, or subsequent to the data bits that comprise the current macroblock. If the current macroblock to be decoded is not in a B-frame, then control is passed to step 68 where conventional motion compensation is performed on the macroblock using the reference motion vector. The computation algorithm employed by the present invention is only used on B-frames. The reason for applying the computation algorithm only to B-frames is that they are not reference frames used for later decoding. The I-frames and P-frames are reference frames and are used to decode B-frames. On the other hand, if the current frame to be decoded is a B-frame, control is passed to step 64 where a determination is made as to whether the current block is the luminance portion of the block. The luminance portion of the block includes data representative of the brightness of the corresponding image. For a luminance block, control is then passed to step 65 where the horizontal displacement (Vmx) of the modified motion vector is approximated to the nearest even number of half-pels relative to the value provided by the reference motion vector. For example, if the horizontal displacement of the reference motion vector has a value of 9, the algorithm of the present invention approximates Vmx to have a value of 8 or 10. The vertical displacement, Vmy, retains its current value as provided by the reference motion vector. Thus, no approximation is performed on the y-component (or vertical displacement) of the reference motion vector. By approximating the value of Vmx, the additional computation steps that are performed in conventional decoding schemes to determine the horizontal displacement of the current motion vector are eliminated. This results in increased decoding speed. After Vmx has been determined, motion compensation is performed on the current macroblock in step 68 where the modified motion vector is applied to the current macroblock to place the current macroblock in the correct position with respect to the frame being decoded.
If the current portion of the block being decoded is not a luminance portion, then the current portion is the chrominance portion of the macroblock, and the decoder module of the present invention approximates the horizontal displacement (x-component) and the vertical displacement (y-component) of the current motion vector to have a value equal to the nearest lower or higher integer of the corresponding values in the reference motion vector in step 66. More specifically, if the horizontal displacement of the reference motion vector has a value of 9, the algorithm of the present invention approximates Vmx to have a value of 8 or 10, the nearest integers. Correspondingly, if the vertical displacement of the reference motion vector has a value of 7, the algorithm of the present invention approximates Vmy to have a value of
6 or 8, the nearest integers. Thus, if the current macroblock to be decoded is a color block, the entire motion vector is calculated according to the present invention. In this fashion, decoding time is significantly reduced. In experiments performed by the inventors, it was determined that using the computation scheme of the present invention, decoding time is decreased by 25% as compared to conventional decoding schemes, with no significant degradation in resulting video image quality.
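Continuing the illustrative sketch given earlier (same assumed names), a chrominance block of a B-frame whose reference motion vector is (9, 7) in half-pel units would be adjusted as shown below; the & ~1 mask rounds each odd component down to the adjacent even value, here 8 and 6.

    int mv_x = 9, mv_y = 7;   /* odd half-pel components from the reference motion vector */
    adjust_motion_vector(B_TYPE, CHROMINANCE_BLOCK, &mv_x, &mv_y);
    /* mv_x is now 8 and mv_y is now 6: integer-pel positions, so no averaging is needed */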
After the motion vector of the current macroblock has been calculated in step 66, standard motion compensation is then performed to recover the pixel data from the macroblock in step 68 using the Vm calculated in step 66 as shown in Figure 6. As shown in Figure 6, the currently decoded frame 60' (Fig. 6(b)) contains the same pixel information contained in the reference frame 60 (Fig. 6(a)), shifted by the amount of the motion vector Vm.
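Once both components of Vm are even numbers of half-pels, motion compensation reduces to copying a block displaced by a whole number of pixels, as Figure 6 illustrates. The routine below is an illustrative sketch only; the names and the 8-bit, 16x16 assumptions are not taken from the patent.

    #include <stdint.h>

    /* Copy the 16x16 prediction block displaced by an even half-pel motion vector.
     * Because mv_x and mv_y are even, the displacement is a whole number of pixels
     * and no pixel averaging is required (illustrative sketch). */
    void motion_compensate_even(const uint8_t *ref, int stride,
                                int mv_x, int mv_y, uint8_t *dst)
    {
        const uint8_t *src = ref + (mv_y >> 1) * stride + (mv_x >> 1);
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                dst[y * 16 + x] = src[y * stride + x];
    }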
The foregoing detailed description of the invention has been provided for the purposes of illustration and description. Although an exemplary embodiment of the present invention has been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiment disclosed, and that various changes and modifications to the invention are possible in light of the above teaching. Accordingly, the scope of the present invention is to be defined by the claims appended hereto.

Claims

WHAT IS CLAIMED IS:
1. A digital video system, comprising: means for providing an input video signal, said input video signal including a plurality of encoded blocks; means for constructing a plurality of decoded blocks on a block by block basis in response to a subset of data present in said plurality of encoded blocks; and means for providing an output video signal in response to said plurality of decoded blocks, wherein said output video signal is formatted to be displayed as a conventional video image.
2. The video system of Claim 1, wherein said video image is displayed on a progressive display device.
3. The video system of Claim 1, wherein said video image is displayed on an interlaced display device.
4. The video system of Claim 1, wherein said constructing means comprises a decoder capable of generating a decoded motion vector as a function of a subset of information present in said encoded block, said decoder including means for generating a decoded block in response to said motion vector.
5. The video system of Claim 4, wherein said motion vector includes a horizontal component and a vertical component, and said decoder constructs said video image by displacing said previously decoded blocks by an amount corresponding to said motion vector.
6. The video system of Claim 5, wherein said decoder detects whether said encoded block is a luminance block and in response to the detection of a luminance block, adjusting the horizontal half-pel component of said motion vector to the nearest even integer value.
7. The video system of Claim 5, wherein said decoder detects whether said encoded block is a chrominance block and in response to the detection of a chrominance block, adjusting the horizontal and vertical half-pel components of said motion vector to the nearest integer value.
8. The video system of Claim 1, wherein said providing means is a navigation unit for generating said input video signal in response to information read from a digital media element.
9. The video system of Claim 1, wherein said providing means comprises a video display module capable of combining said plurality of decoded blocks into an output video signal.
10. A method of constructing a video frame signal from a reference frame comprising a plurality of coded blocks, comprising the steps of:
(a) retrieving a reference motion vector from one of said plurality of coded blocks;
(b) detecting whether the current block to be decoded is a luminance block;
(c) constructing a modified motion vector based on a subset of information contained in said reference frame and said current frame; and
(d) applying the modified motion vector constructed in step (c) to a corresponding block within said current frame.
11. The method of Claim 10, wherein step (c) comprises the step of:
(c1) adjusting a component value of said motion vector to the nearest even value.
12. The method of Claim 11, wherein the component part of step (c1) is the horizontal half-pel component value of said motion vector.
13. A method of constructing a video frame signal from a reference frame comprising a plurality of coded blocks, comprising the steps of:
(a) retrieving a reference motion vector from one of said plurality of coded blocks; (b) detecting whether the current block to be decoded is a chrominance block;
(c) constructing a modified motion vector by adjusting the component values of said reference frame; and
(d) applying the modified motion vector constructed in step (c) to the corresponding block to be decoded.
14. The method of Claim 13, wherein step (c) comprises the steps of: (c1) adjusting the horizontal component value of said motion vector to the nearest integer; and (c2) adjusting the vertical component value of said motion vector to the nearest integer.
PCT/US2001/020661 2000-06-27 2001-06-27 System and method for reducing the computational complexity of mpeg video decoding WO2002001848A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001273059A AU2001273059A1 (en) 2000-06-27 2001-06-27 System and method for reducing the computational complexity of mpeg video decoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60509200A 2000-06-27 2000-06-27
US09/605,092 2000-06-27

Publications (2)

Publication Number Publication Date
WO2002001848A2 true WO2002001848A2 (en) 2002-01-03
WO2002001848A3 WO2002001848A3 (en) 2002-04-11

Family

ID=24422222

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/020661 WO2002001848A2 (en) 2000-06-27 2001-06-27 System and method for reducing the computational complexity of mpeg video decoding

Country Status (3)

Country Link
AU (1) AU2001273059A1 (en)
TW (1) TW535441B (en)
WO (1) WO2002001848A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5317397A (en) * 1991-05-31 1994-05-31 Kabushiki Kaisha Toshiba Predictive coding using spatial-temporal filtering and plural motion vectors
US5510834A (en) * 1992-04-13 1996-04-23 Dv Sweden Ab Method for adaptive estimation of unwanted global picture instabilities in picture sequences in digital video signals

Also Published As

Publication number Publication date
TW535441B (en) 2003-06-01
WO2002001848A3 (en) 2002-04-11
AU2001273059A1 (en) 2002-01-08

Similar Documents

Publication Publication Date Title
US8358701B2 (en) Switching decode resolution during video decoding
EP1528813B1 (en) Improved video coding using adaptive coding of block parameters for coded/uncoded blocks
US8385427B2 (en) Reduced resolution video decode
US6301304B1 (en) Architecture and method for inverse quantization of discrete cosine transform coefficients in MPEG decoders
US6415055B1 (en) Moving image encoding method and apparatus, and moving image decoding method and apparatus
JP3302939B2 (en) Video signal decompressor for independently compressed even and odd field data
US20060072673A1 (en) Decoding variable coded resolution video with native range/resolution post-processing operation
US5739862A (en) Reverse playback of MPEG video
JP2003304542A (en) Video signal decompression apparatus
JP4875007B2 (en) Moving picture coding apparatus, moving picture coding method, and moving picture decoding apparatus
JPH09224254A (en) Device and method for estimating motion
US20020150159A1 (en) Decoding system and method for proper interpolation for motion compensation
KR100260475B1 (en) Methods and devices for encoding and decoding frame signals and recording medium therefor
US5991445A (en) Image processing apparatus
JPH09200695A (en) Method and device for decoding video data for high-speed reproduction
US7116718B2 (en) Unified memory address generation system and method for fetching and storing MPEG video data
JP3078991B2 (en) Low delay mode image decoding method and apparatus
JP2003333540A (en) Frame rate converting apparatus, video display apparatus using the same, and a television broadcast receiving apparatus
US6556714B2 (en) Signal processing apparatus and method
JP2006246277A (en) Re-encoding apparatus, re-encoding method, and re-encoding program
JP3061125B2 (en) MPEG image reproducing apparatus and MPEG image reproducing method
JPH0795536A (en) Device and method for reversely reproducing moving image
WO2002001848A2 (en) System and method for reducing the computational complexity of mpeg video decoding
KR100636465B1 (en) Data processing device and data processing method
JPH1032826A (en) Animation image processor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP