WO1999033275A1 - Partial decoding of compressed video sequences - Google Patents

Partial decoding of compressed video sequences

Publication number
WO1999033275A1
Authority
WO
WIPO (PCT)
Prior art keywords
low
block
image data
transform
compressed video
Application number
PCT/US1998/027223
Other languages
French (fr)
Inventor
Stuart Jay Golin
Charles Martin Wine
Original Assignee
Sarnoff Corporation
Application filed by Sarnoff Corporation
Priority to EP98964199A (EP1048174A4)
Priority to CA002310652A (CA2310652C)
Priority to JP2000526055A (JP2001527352A)
Priority to KR1020007007059A (KR20010033550A)
Priority to AU19377/99A (AU1937799A)
Publication of WO1999033275A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region
    • H04N5/147 Scene change detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/23 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with coding of regions that are present throughout a whole video segment, e.g. sprites, background or mosaic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142 Detection of scene cut or scene change
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/179 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording

Definitions

  • the present invention relates to video processing, and, in particular, to the decoding of compressed video sequences.
  • Video images (i.e., frames) in a digital video sequence are typically represented by arrays of picture elements or pixels, where each pixel is represented by one or more different components.
  • each pixel is represented by a component whose value corresponds to the intensity of the pixel.
  • each pixel is represented by a red component, a green component, and a blue component.
  • each pixel is represented by an intensity (or luminance) component Y and two color (or chrominance) components U and V.
  • each pixel component is represented by an 8-bit value.
  • a typical video sequence is made up of sets of consecutive frames called shots, where the frames of a given shot correspond to the same basic scene.
  • a shot is an unbroken sequence of frames from one camera.
  • Parsing is also useful for video editing and compression.
  • One way to distinguish different shots in a digital video sequence is to analyze histograms corresponding to the video frames.
  • a frame histogram is a representation of the distribution of component values for a frame of video data.
  • a 32-bin histogram may be generated for the 8-bit Y components of a video frame represented in a 24-bit YUV color format, where the first bin indicates how many pixels in the frame have Y values between 0 and 7 inclusive, the second bin indicates how many pixels have Y values between 8 and 15 inclusive, and so on until the thirty-second bin indicates how many pixels have Y values between 248 and 255 inclusive.
  • Multi-dimensional histograms can also be generated based on combinations of bits from different components. For example, a three-dimensional (3D) histogram can be generated for YUV data using the 4 most significant bits (MSBs) of the Y components and the 3 MSBs of the U and V components, where each bin in the 3D histogram corresponds to the number of pixels in the frame having the same 4 MSBs of Y, the same 3 MSBs of U, and the same 3 MSBs of V.
  • One general characteristic of video sequences is that, for certain types of histograms, the histograms for frames within a given shot are typically more similar than the histograms for frames in shots corresponding to different scenes.
  • one way to parse digital video into its constituent shots is to look for the transitions between shots by comparing histograms generated for the frames in the video sequence.
  • frames with similar histograms are more likely to correspond to the same scene than frames with dissimilar histograms.
  • Video sequences are typically stored and/or transmitted in a compressed digital form, in which the original pixel data are processed to exploit redundancy that occurs both within a video frame and between video frames.
  • a transform such as a discrete cosine transform (DCT)
  • a DCT transform may be applied directly to pixel values or, when motion estimation is employed, the DCT transform is applied to inter-frame pixel differences corresponding to differences between pixel data in the current frame and motion-compensated pixel data from a reference frame.
  • Inter-frame differencing can be implemented with or without motion estimation and motion compensation.
  • the resulting DCT coefficients may be quantized and then run-length (RL) encoded using, for example, a zig-zag RL pattern.
  • the RL encoded data may then optionally be further encoded for storage and/or transmission as a compressed video data stream.
  • the motion vectors derived during motion estimation and used during motion-compensated inter-frame differencing are also encoded into the compressed video data stream.
  • the different planes of pixel component values for each video frame are typically compressed separately.
  • the encoding steps are reversed to recover full video images for display.
  • the compressed data are RL decoded and dequantized, as necessary, to recover decoded DCT coefficients.
  • An inverse DCT transform is then applied to the decoded DCT coefficients to recover pixel data. If the pixel data correspond to inter-frame differences, then motion-compensated inter-frame addition is performed using motion vectors (decoded from the compressed video stream) to generate decoded pixel intensity values for the current frame of the decoded video sequence.
  • One way to parse a compressed video sequence into its constituent shots is to completely decompress the compressed video stream to recover fully decoded video frames, and then apply histogram analysis to the decompressed video sequence. This can be computationally expensive and slow, because, for example, the inverse DCT transform is computationally intense.
  • An alternative technique for parsing compressed video sequences that have been generated using a compression algorithm based on a transform, such as the DCT transform, is based on a low-resolution decoding scheme.
  • In this low-resolution decoding scheme, all but the lowest spatial frequency DCT coefficient in the compressed video stream (i.e., the DC coefficient for the DCT transform) are ignored and only the DC coefficient is used to generate a low-resolution decoded image for each frame in the compressed video stream, where each pixel in the low-resolution decoded image is the DC coefficient for a block of pixels in the corresponding frame of the original video sequence. Histogram analysis is then performed on the low-resolution images.
  • Figs. 1(A) and 1(B) show original frame 102 in an original video sequence and low-resolution frame 104 in a corresponding low-resolution decoded video sequence, respectively.
  • Original frame 102 has 480 rows and 512 columns of pixels.
  • original frame 102 is divided into (8x8) blocks of pixels, wherein original frame 102 has 60 rows and 64 columns of such (8x8) blocks.
  • Motion estimation is performed for each (8x8) block to identify a motion vector that relates each (8x8) block to a corresponding (8x8) block in a reference frame (not shown).
  • Motion-compensated inter-frame differencing is applied to generate an (8x8) block of inter-frame pixel differences for each (8x8) block in original frame 102.
  • An (8x8) DCT transform is applied to each (8x8) block of inter-frame pixel differences to generate an (8x8) block of DCT coefficients, where the DC coefficient is typically located in the upper left corner.
  • Each (8x8) block of DCT coefficients is then quantized, run-length encoded, and possibly further encoded, along with the motion vectors, for storage and/or transmission as a compressed video stream.
  • run-length decoding is applied to the compressed video data to recover the quantized DCT coefficients.
  • the quantized DC coefficient from each set of DCT coefficients is dequantized and motion-compensated inter-frame addition is applied to the decoded DC coefficients (using decoded motion vectors properly scaled by a factor of 8) to generate low-resolution frame 104 of Fig. 1, where each pixel in low-resolution frame 104 corresponds to only the decoded DC coefficient.
  • each (8x8) block in original frame 102 is represented by a single pixel in low-resolution frame 104, which therefore has only 60 (i.e., 480/8) rows and 64 (i.e., 512/8) columns of pixels. Since the DC coefficient of the DCT transform is equivalent to the average intensity value for the 64 pixels in the corresponding (8x8) pixel block, low-resolution frame 104 is a low-resolution approximation of original frame 102.
  • Histogram analysis is then applied to the sequence of low-resolution frames to parse the compressed video sequence into its constituent shots. Since this low-resolution decoding scheme avoids the computationally expensive inverse DCT processing, parsing of compressed video sequences can be accomplished faster and more cheaply than if the histogram analysis is applied to fully decoded images that are generated using inverse DCT processing. Unfortunately, the resolution of the decoded frames using this conventional low-resolution decoding scheme may be too low to provide accurate parsing results, leading to too many false positives (i.e., identification of transitions between shots that are not true transitions in the original video sequence) and/or false negatives (i.e., missing true transitions in the original video sequence).
  • the present invention is directed to a scheme for partially decoding compressed video streams for such applications as video parsing.
  • the compressed video stream is decoded to recover one or more low-frequency transform coefficients for each block of original image data.
  • a block of low-frequency image data is generated from each set of low-frequency transform coefficients corresponding to each block of original image data.
  • Motion-compensated inter-frame addition is applied to each block of low-frequency image data to generate a partially decoded image for each frame in the compressed video stream.
  • Figs. 1(A) and 1(B) show an original frame in an original video sequence and a low-resolution frame in a corresponding low-resolution decoded video sequence, respectively;
  • Fig. 2 shows a flow diagram of the processing, according to one embodiment of the present invention;
  • Figs. 3(A) and 3(B) show an original frame in an original video sequence and a partially decoded frame in a corresponding partially decoded video sequence, respectively, according to the processing of Fig. 2;
  • Fig. 3(C) shows a (4x4) block of replicated DC coefficients representative of the sub-blocks used to generate the partially decoded image of Fig. 3(B); and Figs. 4(A)-4(D) show graphical representations of inter-frame histogram differences (with arbitrary scale along the Y axis) plotted against frame number for a 1500-frame test sequence encoded using the Px64 video compression scheme for four different DC-component block sizes.
  • a transform-based compressed video stream is partially decoded to generate partially decoded images that may then be subjected to subsequent processing, such as histogram analysis for video parsing.
  • the partially decoded images are generated by building blocks using only the decoded DC transform coefficients from the compressed video stream.
  • the compressed video stream is partially decoded to recover the DC coefficients of the encoded transform coefficients (step 202 of Fig. 2).
  • step 202 would involve decoding of the compressed video stream (e.g., run-length decoding and possibly dequantization) just enough to recover from the bitstream the decoded DC DCT coefficient corresponding to each (8x8) block of pixels in the original video sequence.
  • Blocks of image data are then generated using only the DC coefficients by replicating the corresponding DC coefficient for each pixel in a block (step 204).
  • each block is a sub-block that is smaller than the corresponding region of the original video sequence used to generate the transform coefficients. For example, if the DCT transform was applied to (8x8) blocks in the original video sequence, then the sub-blocks generated in step 204 might be only (2x2) or (4x4) (although other sizes can also be used). Alternatively, the blocks of replicated DC coefficients could be the same size (e.g., 8x8) as the transform.
  • motion-compensated inter-frame addition is then performed to generate partially decoded images (step 206), which can then be subjected to additional processing, such as histogram analysis for video parsing (step 208). If the blocks of replicated DC coefficients are smaller than the size of the original transform, then the decoded motion vectors used during motion-compensated inter-frame addition must be scaled accordingly.
  • One advantage to using (4x4) or (2x2) sub-blocks is that motion vectors can be scaled down by factors of 2 or 4, respectively, simply by shifting bits, rather than having to implement a divide operation.
  • Figs. 3(A) and 3(B) show original frame 302 in an original video sequence and partially decoded frame 304 in a corresponding partially decoded video sequence, respectively, according to one implementation of the processing of Fig. 2.
  • Original frame 302 is a (480x512) frame, similar to frame 102 of Fig. 1(A).
  • Partially decoded image 304 is generated from (4x4) blocks of replicated DC coefficients and therefore has 240 rows and 256 columns of pixels.
  • Fig. 3(C) shows a (4x4) block 306 of replicated DC coefficients representative of the sub-blocks used to generate partially decoded image 304 of Fig. 3(B).
  • Although each pixel in a given sub-block of replicated DC coefficients contains the same piece of information, when motion-compensated inter-frame addition is performed using properly scaled decoded motion vectors, the corresponding sub-blocks in the resulting partially decoded images will typically not contain replications of the same data, since most motion vectors will have components other than integer multiples of 4.
  • this partial decoding scheme provides better video parsing of compressed video streams (i.e., fewer false positives and/or fewer false negatives) than does the low-resolution decoding scheme of the prior art.
  • the partial decoding scheme of the present invention shares the advantage of avoiding implementation of the reverse transform of the prior-art low-resolution decoding scheme.
  • the present invention may provide better results at an affordable increase in computational cost.
  • Figs. 4(A)-4(D) show graphical representations of inter-frame histogram differences (with arbitrary scale along the Y axis) plotted against frame number for a 1500-frame test sequence encoded using the Px64 video compression scheme for four different DC-component block sizes.
  • Fig. 4(A) corresponds to the prior-art processing in which each (8x8) block of pixels is represented by a single DC value (i.e., a (1x1) block) in a low-resolution image.
  • Fig. 4(B) corresponds to processing according to the present invention in which each (8x8) block of pixels is represented by a (2x2) block of replicated DC values in a partially decoded image.
  • Figs. 4(C) and 4(D) correspond to processing according to the present invention in which each (8x8) block of pixels is represented by a (4x4) and an (8x8) block, respectively, of replicated DC values in a partially decoded image.
  • the large peaks in these figures indicate scene changes in the video sequence.
  • Figs. 4(A)-4(D) show that as the block of replicated DC values increases from (1x1) to (8x8), the background noise level in the results decreases. This is one of the advantages of the present invention over the prior-art, low-resolution scheme of Fig. 4(A).
  • Although the present invention has been described in the context of generating partially decoded images using only the DC transform coefficients, those skilled in the art will understand that, in alternative implementations, two or more of the low-frequency transform coefficients (including the DC coefficient) can be used to generate partially decoded images.
  • Although the present invention has been described in the context of video parsing based on histogram analysis, those skilled in the art will understand that the present invention can be used for other applications in which low-resolution images are acceptable, such as picture-in-a-picture generation, fast-forward replays of video sequences, target recognition, and motion detection.
  • the present invention can be embodied in the form of methods and apparatuses for practicing those methods.
  • the present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
  • the present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
  • When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.

Abstract

Compressed video sequences generated using a (e.g., DCT) transform are partially decoded (202) to recover (204) the low-frequency transform coefficients (e.g., only the DC coefficients) from the encoded bitstream. The low-frequency coefficients are then used to generate blocks of image data that are subjected to motion-compensated inter-frame addition (206) to generate partially decoded images that can be used for subsequent processing, such as histogram analysis (208) for video parsing. The present invention avoids having to perform the computationally expensive inverse transform operations, while still achieving satisfactory results.
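
The steps summarized above (replicating each DC coefficient into a sub-block, scaling the motion vectors, and performing motion-compensated inter-frame addition) can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the function names, the list-of-rows frame layout, the (dy, dx) motion-vector convention, and the clamping of references at the frame edges are all assumptions made for illustration.

```python
def replicate_dc(dc_values, sub):
    """Expand each DC value into a (sub x sub) block of identical pixels."""
    out = []
    for row in dc_values:
        expanded = [v for v in row for _ in range(sub)]
        out.extend(list(expanded) for _ in range(sub))
    return out


def scale_motion_vector(mv, sub, transform_size=8):
    """Scale a full-resolution (dy, dx) vector to the partially decoded grid.

    For power-of-two ratios the division is a right shift. Python's >>
    rounds negative values toward minus infinity (an illustrative choice).
    """
    shift = (transform_size // sub).bit_length() - 1
    return (mv[0] >> shift, mv[1] >> shift)


def motion_compensated_add(reference, residual, motion_vectors, sub,
                           transform_size=8):
    """Add each replicated-DC block to its motion-shifted reference block.

    reference      -- previous partially decoded frame (list of rows)
    residual       -- replicated-DC frame of the same size
    motion_vectors -- one full-resolution (dy, dx) per transform block
    """
    height, width = len(residual), len(residual[0])
    out = [[0] * width for _ in range(height)]
    for by, mv_row in enumerate(motion_vectors):
        for bx, mv in enumerate(mv_row):
            dy, dx = scale_motion_vector(mv, sub, transform_size)
            for y in range(by * sub, (by + 1) * sub):
                for x in range(bx * sub, (bx + 1) * sub):
                    # Clamp at the frame edges (assumption, not from the text).
                    ry = min(max(y + dy, 0), height - 1)
                    rx = min(max(x + dx, 0), width - 1)
                    out[y][x] = reference[ry][rx] + residual[y][x]
    return out
```

With (4x4) sub-blocks, a 60-by-64 array of DC coefficients expands to a 240-by-256 partially decoded frame, and a full-resolution motion vector is scaled down by a factor of 2 with a one-bit shift.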

Description

PARTIAL DECODING OF COMPRESSED VIDEO SEQUENCES
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to video processing, and, in particular, to the decoding of compressed video sequences.
Cross-Reference to Related Applications
This application claims the benefit of the filing date of U.S. provisional application no. 60/068,774, filed on 12/23/97.
Statement Regarding Federally Sponsored Research or Development
The Government of the United States of America has certain rights in at least part of this invention pursuant to Government Contract No. MDA-904-95-C-3126.
Description of the Related Art
Video images (i.e., frames) in a digital video sequence are typically represented by arrays of picture elements or pixels, where each pixel is represented by one or more different components. For example, in a monochrome gray-scale image, each pixel is represented by a component whose value corresponds to the intensity of the pixel. In an RGB color format, each pixel is represented by a red component, a green component, and a blue component. Similarly, in a YUV color format, each pixel is represented by an intensity (or luminance) component Y and two color (or chrominance) components U and V. In 24-bit versions of these color formats, each pixel component is represented by an 8-bit value. A typical video sequence is made up of sets of consecutive frames called shots, where the frames of a given shot correspond to the same basic scene. A shot is an unbroken sequence of frames from one camera. There is currently considerable interest in the parsing of digital video into its constituent shots. This is driven by the rapidly increasing availability of video material, and the need to create indexes for video databases. Parsing is also useful for video editing and compression. One way to distinguish different shots in a digital video sequence is to analyze histograms corresponding to the video frames. A frame histogram is a representation of the distribution of component values for a frame of video data. For example, a 32-bin histogram may be generated for the 8-bit Y components of a video frame represented in a 24-bit YUV color format, where the first bin indicates how many pixels in the frame have Y values between 0 and 7 inclusive, the second bin indicates how many pixels have Y values between 8 and 15 inclusive, and so on until the thirty-second bin indicates how many pixels have Y values between 248 and 255 inclusive. Multi-dimensional histograms can also be generated based on combinations of bits from different components. 
For example, a three-dimensional (3D) histogram can be generated for YUV data using the 4 most significant bits (MSBs) of the Y components and the 3 MSBs of the U and V components, where each bin in the 3D histogram corresponds to the number of pixels in the frame having the same 4 MSBs of Y, the same 3 MSBs of U, and the same 3 MSBs of V.
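
The two histograms described above can be sketched in Python as follows. The function names and the flat bin layout of the 3D histogram are illustrative assumptions; only the bin boundaries and bit counts come from the text.

```python
def y_histogram_32(y_values):
    """32-bin histogram of 8-bit Y values: bin k counts values in [8k, 8k+7]."""
    bins = [0] * 32
    for y in y_values:
        bins[y >> 3] += 1  # dropping the 3 LSBs maps 0..255 onto 0..31
    return bins


def yuv_histogram_3d(pixels):
    """3D histogram keyed on the 4 MSBs of Y and the 3 MSBs of U and V.

    pixels is an iterable of (y, u, v) tuples of 8-bit values. The result
    is a flat list of 16 * 8 * 8 = 1024 bins (layout is an assumption).
    """
    bins = [0] * (16 * 8 * 8)
    for y, u, v in pixels:
        index = ((y >> 4) << 6) | ((u >> 5) << 3) | (v >> 5)
        bins[index] += 1
    return bins
```

Shifting away the low-order bits is what groups neighboring component values into the same bin, so no division or lookup table is needed.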
One general characteristic of video sequences is that, for certain types of histograms, the histograms for frames within a given shot are typically more similar than the histograms for frames in shots corresponding to different scenes. As such, one way to parse digital video into its constituent shots is to look for the transitions between shots by comparing histograms generated for the frames in the video sequence. In general, frames with similar histograms are more likely to correspond to the same scene than frames with dissimilar histograms.
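
As a rough illustration of histogram-based parsing, the sketch below flags a shot transition wherever consecutive frame histograms differ by more than a threshold. The text does not specify a difference measure or a decision rule; the sum of absolute bin differences and the fixed threshold used here are assumptions, as are the function names.

```python
def histogram_difference(h1, h2):
    """Sum of absolute bin-by-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_shot_boundaries(histograms, threshold):
    """Return each frame index i whose histogram differs from that of
    frame i - 1 by more than threshold (a hypothetical decision rule)."""
    return [i for i in range(1, len(histograms))
            if histogram_difference(histograms[i - 1], histograms[i]) > threshold]
```

For example, with two-bin histograms `[[10, 0], [9, 1], [0, 10], [1, 9]]` and a threshold of 10, only the change between the second and third frames exceeds the threshold.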
Video sequences are typically stored and/or transmitted in a compressed digital form, in which the original pixel data are processed to exploit redundancy that occurs both within a video frame and between video frames. There are many known algorithms for compressing video data. Some of these algorithms employ a transform, such as a discrete cosine transform (DCT), that transforms pixel intensity data into transform coefficients in a spatial frequency domain. A DCT transform may be applied directly to pixel values or, when motion estimation is employed, the DCT transform is applied to inter-frame pixel differences corresponding to differences between pixel data in the current frame and motion-compensated pixel data from a reference frame. Inter-frame differencing can be implemented with or without motion estimation and motion compensation. After applying the DCT transform to the appropriate pixel data, the resulting DCT coefficients may be quantized and then run-length (RL) encoded using, for example, a zig-zag RL pattern. The RL encoded data may then optionally be further encoded for storage and/or transmission as a compressed video data stream. The motion vectors derived during motion estimation and used during motion-compensated inter-frame differencing are also encoded into the compressed video data stream. For multi-component video data, such as RGB and YUV data, the different planes of pixel component values for each video frame are typically compressed separately.
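
The zig-zag run-length step mentioned above can be sketched as follows. The text does not fix a symbol format, so the (zero run, value) pairs, the implied end of block, and the function names are assumptions; the scan order is the usual zig-zag in which the DC coefficient comes first.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    def key(rc):
        s = rc[0] + rc[1]  # index of the anti-diagonal
        # Odd diagonals run top to bottom, even diagonals bottom to top.
        return (s, rc[0] if s % 2 else -rc[0])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length_encode(block):
    """Encode quantized coefficients as (zero_run, value) pairs.

    Trailing zeros are dropped; a real bitstream would signal them with
    an end-of-block code (assumption for illustration).
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

Because the DC coefficient is the first symbol in the scan, a decoder that needs only the DC value can stop after reading the first pair of each block.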
When video sequences are to be recovered from a compressed video stream, the encoding steps are reversed to recover full video images for display. For example, for the compression algorithm described above, the compressed data are RL decoded and dequantized, as necessary, to recover decoded DCT coefficients. An inverse DCT transform is then applied to the decoded DCT coefficients to recover pixel data. If the pixel data correspond to inter-frame differences, then motion-compensated inter-frame addition is performed using motion vectors (decoded from the compressed video stream) to generate decoded pixel intensity values for the current frame of the decoded video sequence. One way to parse a compressed video sequence into its constituent shots is to completely decompress the compressed video stream to recover fully decoded video frames, and then apply histogram analysis to the decompressed video sequence. This can be computationally expensive and slow, because, for example, the inverse DCT transform is computationally intense.
An alternative technique for parsing compressed video sequences that have been generated using a compression algorithm based on a transform, such as the DCT transform, is based on a low-resolution decoding scheme. In this low-resolution decoding scheme, all but the lowest spatial frequency DCT coefficient in the compressed video stream (i.e., the DC coefficient for the DCT transform) are ignored and only the DC coefficient is used to generate a low-resolution decoded image for each frame in the compressed video stream, where each pixel in the low-resolution decoded image is the DC coefficient for a block of pixels in the corresponding frame of the original video sequence. Histogram analysis is then performed on the low-resolution images.
Figs. 1(A) and 1(B) show original frame 102 in an original video sequence and low-resolution frame 104 in a corresponding low-resolution decoded video sequence, respectively. Original frame 102 has 480 rows and 512 columns of pixels. According to one possible compression algorithm, original frame 102 is divided into (8x8) blocks of pixels, wherein original frame 102 has 60 rows and 64 columns of such (8x8) blocks. Motion estimation is performed for each (8x8) block to identify a motion vector that relates each (8x8) block to a corresponding (8x8) block in a reference frame (not shown). Motion-compensated inter-frame differencing is applied to generate an (8x8) block of inter-frame pixel differences for each (8x8) block in original frame 102. An (8x8) DCT transform is applied to each (8x8) block of inter-frame pixel differences to generate an (8x8) block of DCT coefficients, where the DC coefficient is typically located in the upper left corner. Each (8x8) block of DCT coefficients is then quantized, run-length encoded, and possibly further encoded, along with the motion vectors, for storage and/or transmission as a compressed video stream.
According to the low-resolution decoding scheme, in order to parse the compressed video stream (e.g., to identify transitions between shots), run-length decoding is applied to the compressed video data to recover the quantized DCT coefficients. If appropriate, the quantized DC coefficient from each set of DCT coefficients is dequantized and motion-compensated inter-frame addition is applied to the decoded DC coefficients (using decoded motion vectors properly scaled by a factor of 8) to generate low-resolution frame 104 of Fig. 1, where each pixel in low-resolution frame 104 corresponds to only the decoded DC coefficient. As such, each (8x8) block in original frame 102 is represented by a single pixel in low-resolution frame 104, which therefore has only 60 (i.e., 480/8) rows and 64 (i.e., 512/8) columns of pixels. Since the DC coefficient of the DCT transform is equivalent to the average intensity value for the 64 pixels in the corresponding (8x8) pixel block, low-resolution frame 104 is a low-resolution approximation of original frame 102.
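
The relationship between the DC coefficient and the block average can be checked numerically. The sketch below evaluates a single 2D DCT-II coefficient directly from its definition, assuming the orthonormal scaling commonly used for (8x8) blocks; under that assumed scaling the DC coefficient equals 8 times the mean of the 64 pixels, so it carries the same information as the average up to a constant factor. The function name is illustrative.

```python
import math


def dct_coefficient(block, u, v):
    """Coefficient (u, v) of the orthonormal 2D DCT-II of a square block."""
    n = len(block)

    def alpha(k):
        # Orthonormal scaling factor (assumed convention).
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return alpha(u) * alpha(v) * total
```

For a constant block every coefficient other than the DC coefficient is zero, which is why a flat region is fully described by its DC value.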
Histogram analysis is then applied to the sequence of low-resolution frames to parse the compressed video sequence into its constituent shots. Since this low-resolution decoding scheme avoids the computationally expensive inverse DCT processing, parsing of compressed video sequences can be accomplished faster and more cheaply than if the histogram analysis is applied to fully decoded images that are generated using inverse DCT processing. Unfortunately, the resolution of the decoded frames using this conventional low-resolution decoding scheme may be too low to provide accurate parsing results, leading to too many false positives (i.e., identification of transitions between shots that are not true transitions in the original video sequence) and/or false negatives (i.e., missing true transitions in the original video sequence).
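The histogram-based parsing described above can be sketched as follows. This is an illustration under stated assumptions, not the specific analysis used by the inventors: the number of bins, the sum-of-absolute-differences metric, the fixed threshold, and all function names are assumptions introduced here.

```python
def histogram(frame, bins=16, max_value=256):
    """Count the pixels of a frame (a list of rows) in equal-width intensity bins."""
    counts = [0] * bins
    for row in frame:
        for pixel in row:
            index = int(pixel) * bins // max_value
            index = min(max(index, 0), bins - 1)  # clamp out-of-range values
            counts[index] += 1
    return counts


def histogram_difference(h1, h2):
    """Sum of absolute bin differences between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_transitions(frames, threshold):
    """Return the frame numbers whose histogram differs from the previous frame
    by more than the threshold."""
    transitions = []
    previous = histogram(frames[0])
    for number in range(1, len(frames)):
        current = histogram(frames[number])
        if histogram_difference(previous, current) > threshold:
            transitions.append(number)
        previous = current
    return transitions


# Toy sequence: three dark frames followed by two bright frames.
dark = [[20] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
shots = find_transitions([dark, dark, dark, bright, bright], threshold=16)
```

A false positive in this sketch corresponds to a difference that exceeds the threshold without a true shot change, and a false negative to a true change whose difference stays below it.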
SUMMARY OF THE INVENTION
The present invention is directed to a scheme for partially decoding compressed video streams for such applications as video parsing. According to one embodiment, the compressed video stream is decoded to recover one or more low-frequency transform coefficients for each block of original image data. A block of low-frequency image data is generated from each set of low-frequency transform coefficients corresponding to each block of original image data. Motion-compensated inter-frame differencing is applied to each block of low-frequency image data to generate a partially decoded image for each frame in the compressed video stream.
BRIEF DESCRIPTION OF THE DRAWINGS
Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which:
Figs. 1(A) and 1(B) show an original frame in an original video sequence and a low-resolution frame in a corresponding low-resolution decoded video sequence, respectively;
Fig. 2 shows a flow diagram of the processing, according to one embodiment of the present invention;
Figs. 3(A) and 3(B) show an original frame in an original video sequence and a partially decoded frame in a corresponding partially decoded video sequence, respectively, according to the processing of Fig. 2;
Fig. 3(C) shows a (4x4) block of replicated DC coefficients representative of the sub-blocks used to generate the partially decoded image of Fig. 3(B); and
Figs. 4(A)-4(D) show graphical representations of inter-frame histogram differences (with arbitrary scale along the Y axis) plotted against frame number for a 1500-frame test sequence encoded using the Px64 video compression scheme for four different DC-component block sizes.
DETAILED DESCRIPTION
According to embodiments of the present invention, a transform-based compressed video stream is partially decoded to generate partially decoded images that may then be subjected to subsequent processing, such as histogram analysis for video parsing. The partially decoded images are generated by building blocks using only the decoded DC transform coefficients from the compressed video stream.
Fig. 2 shows a flow diagram of the processing, according to one embodiment of the present invention. The compressed video stream is partially decoded to recover the DC coefficients of the encoded transform coefficients (step 202 of Fig. 2). For example, if the compressed video stream was encoded using an (8x8) DCT transform, step 202 would involve decoding of the compressed video stream (e.g., run-length decoding and possibly dequantization) just enough to recover from the bitstream the decoded DC DCT coefficient corresponding to each (8x8) block of pixels in the original video sequence.
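Step 202 can be illustrated with a deliberately simplified model. The sketch below assumes that each block arrives as a list of (run, level) pairs in scan order, where run is the number of zero coefficients preceding the non-zero level, and that dequantization is a plain multiplication by a quantizer step. Real bitstream syntaxes differ in detail, so the data layout, the uniform quantizer, and the name recover_dc are all hypothetical.

```python
def recover_dc(run_level_pairs, quantizer_step):
    """Recover the dequantized DC coefficient of one block.

    Assumed model: the DC coefficient is the first coefficient in scan order,
    so it is non-zero only when the first pair has a run of zero. All later
    pairs (the AC coefficients) are ignored, which is what makes the decoding
    partial.
    """
    if run_level_pairs and run_level_pairs[0][0] == 0:
        return run_level_pairs[0][1] * quantizer_step
    return 0


# First pair has run 0, so its level (12) is the quantized DC coefficient.
dc_a = recover_dc([(0, 12), (3, -1)], quantizer_step=8)
# First pair has run 2, so the DC coefficient itself is zero.
dc_b = recover_dc([(2, 5)], quantizer_step=8)
```

The point of the sketch is that no inverse transform and no AC coefficient processing is needed to obtain the value used in the later steps.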
Blocks of image data are then generated using only the DC coefficients by replicating the corresponding DC coefficient for each pixel in a block (step 204). In one embodiment, each block is a sub-block that is smaller than the corresponding region of the original video sequence used to generate the transform coefficients. For example, if the DCT transform was applied to (8x8) blocks in the original video sequence, then the sub-blocks generated in step 204 might be only (2x2) or (4x4) (although other sizes can also be used). Alternatively, the blocks of replicated DC coefficients could be the same size (e.g., 8x8) as the transform. If appropriate, motion-compensated inter-frame addition is then performed to generate partially decoded images (step 206), which can then be subjected to additional processing, such as histogram analysis for video parsing (step 208). If the blocks of replicated DC coefficients are smaller than the size of the original transform, then the decoded motion vectors used during motion-compensated inter-frame addition must be scaled accordingly. One advantage to using (4x4) or (2x2) sub-blocks is that motion vectors can be scaled down by factors of 2 or 4, respectively, simply by shifting bits, rather than having to implement a divide operation.
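The replication of step 204 and the shift-based motion-vector scaling can be sketched as follows. The function names and the (dy, dx) vector layout are assumptions made for illustration, and the sketch assumes that both the transform size and the sub-block size are powers of two.

```python
def replicate_dc(dc_grid, sub):
    """Expand a grid of DC values into an image in which each DC value
    fills a (sub x sub) block of pixels."""
    image = []
    for dc_row in dc_grid:
        pixel_row = []
        for dc in dc_row:
            pixel_row.extend([dc] * sub)
        for _ in range(sub):
            image.append(list(pixel_row))
    return image


def scale_motion_vector(vector, sub, transform=8):
    """Scale a (dy, dx) motion vector down by transform/sub using a bit shift.

    Note: Python's >> rounds toward negative infinity for negative values;
    other rounding rules are possible and the text does not specify one.
    """
    shift = (transform // sub).bit_length() - 1  # log2 of the scale factor
    dy, dx = vector
    return (dy >> shift, dx >> shift)


image = replicate_dc([[1, 2], [3, 4]], sub=2)
mv_4x4 = scale_motion_vector((8, -4), sub=4)  # scaled down by 2
mv_2x2 = scale_motion_vector((8, -4), sub=2)  # scaled down by 4
```

For (8x8) sub-blocks the shift is zero and the vectors are used unchanged, while for the prior-art (1x1) case the shift is three bits, matching the factor of 8 mentioned earlier.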
Figs. 3(A) and 3(B) show original frame 302 in an original video sequence and partially decoded frame 304 in a corresponding partially decoded video sequence, respectively, according to one implementation of the processing of Fig. 2. Original frame 302 is a (480x512) frame, similar to frame 102 of Fig. 1(A). Partially decoded image 304 is generated from (4x4) blocks of replicated DC coefficients and therefore has 240 rows and 256 columns of pixels.
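The frame dimensions quoted above follow from simple arithmetic: the number of transform blocks in each direction multiplied by the sub-block size. A small sketch (the helper name partially_decoded_size is hypothetical) reproduces the figures for the different sub-block sizes discussed in this description.

```python
def partially_decoded_size(rows, cols, transform=8, sub=4):
    """Return (rows, cols) of the partially decoded image when each
    (transform x transform) block is represented by a (sub x sub) block."""
    return (rows // transform) * sub, (cols // transform) * sub


size_4x4 = partially_decoded_size(480, 512, sub=4)  # frame 304 of Fig. 3(B)
size_1x1 = partially_decoded_size(480, 512, sub=1)  # frame 104 of Fig. 1(B)
size_2x2 = partially_decoded_size(480, 512, sub=2)
```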
Fig. 3(C) shows a (4x4) block 306 of replicated DC coefficients representative of the sub-blocks used to generate partially decoded image 304 of Fig. 3(B). Although each pixel for a given sub-block of replicated DC coefficients contains the same piece of information, when motion-compensated inter-frame addition is performed using properly scaled decoded motion vectors, the corresponding sub-blocks in the resulting partially decoded images will typically not contain replications of the same data, since most motion vectors will have components other than integer multiples of 4.
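The effect described above can be seen in a toy example of motion-compensated inter-frame addition. The sketch below is illustrative only: it assumes that the DC values have already been scaled to pixel-intensity units, that motion vectors are (dy, dx) pairs already scaled to the partially decoded resolution, and that references outside the frame are clamped to the nearest edge pixel. None of these details, nor the function name, come from the text.

```python
def motion_compensated_add(reference, dc_values, motion_vectors, sub):
    """Add replicated DC residuals to motion-shifted pixels of a reference frame.

    reference      : previous partially decoded frame (list of rows)
    dc_values      : one residual DC value per block
    motion_vectors : one scaled (dy, dx) vector per block
    sub            : side length of each replicated block
    """
    rows = len(dc_values) * sub
    cols = len(dc_values[0]) * sub
    out = [[0] * cols for _ in range(rows)]
    for by, dc_row in enumerate(dc_values):
        for bx, dc in enumerate(dc_row):
            dy, dx = motion_vectors[by][bx]
            for y in range(by * sub, (by + 1) * sub):
                for x in range(bx * sub, (bx + 1) * sub):
                    ry = min(max(y + dy, 0), rows - 1)  # clamp to frame
                    rx = min(max(x + dx, 0), cols - 1)
                    out[y][x] = reference[ry][rx] + dc
    return out


# Reference frame: a (4x4) block of 10s next to a (4x4) block of 50s.
reference = [[10] * 4 + [50] * 4 for _ in range(4)]
# The left block uses a horizontal vector of 2, which is not a multiple of 4.
frame = motion_compensated_add(
    reference,
    dc_values=[[0, 5]],
    motion_vectors=[[(0, 2), (0, 0)]],
    sub=4,
)
```

The left block of the result straddles two reference blocks and therefore holds two different values, even though its own residual is a single replicated DC value.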
The inventors have found that this partial decoding scheme provides better video parsing of compressed video streams (i.e., fewer false positives and/or fewer false negatives) than does the low-resolution decoding scheme of the prior art. Although there is some additional computational load due, for example, to performing motion-compensated inter-frame addition for larger decoded images, the partial decoding scheme of the present invention shares with the prior-art low-resolution decoding scheme the advantage of avoiding implementation of the inverse transform. Thus, depending on the processing constraints, the present invention may provide better results at an affordable increase in computational cost.
Figs. 4(A)-4(D) show graphical representations of inter-frame histogram differences (with arbitrary scale along the Y axis) plotted against frame number for a 1500-frame test sequence encoded using the Px64 video compression scheme for four different DC-component block sizes. Fig. 4(A) corresponds to the prior-art processing in which each (8x8) block of pixels is represented by a single DC value (i.e., a (1x1) block) in a low-resolution image. Fig. 4(B) corresponds to processing according to the present invention in which each (8x8) block of pixels is represented by a (2x2) block of replicated DC values in a partially decoded image. Similarly, Figs. 4(C) and 4(D) correspond to processing according to the present invention in which each (8x8) block of pixels is represented by a (4x4) and an (8x8) block, respectively, of replicated DC values in a partially decoded image. The large peaks in these figures indicate scene changes in the video sequence.
Figs. 4(A)-4(D) show that as the block of replicated DC values increases from (1x1) to (8x8), the background noise level in the results decreases. This is one of the advantages of the present invention over the prior-art, low-resolution scheme of Fig. 4(A).
Although the present invention has been described in the context of two-dimensional (8x8) DCT transforms, those skilled in the art will understand that the present invention can be implemented using other DCT transforms, such as two-dimensional DCT transforms of sizes other than (8x8) and one-dimensional DCT transforms, as well as other transforms, such as one-dimensional or two-dimensional slant or Haar transforms. Similarly, although the invention has been described in the context of motion estimation and motion compensation being performed on (8x8) blocks of pixel data, it will be understood that motion analysis can be performed on blocks of other sizes and that these other sizes may differ from the size of the transform. For example, a common video compression scheme has motion estimation and compensation performed on (16x16) blocks of pixel data, while the transform is an (8x8) DCT transform. Moreover, although the present invention has been described in the context of generating partially decoded images using only the DC transform coefficients, those skilled in the art will understand that, in alternative implementations, two or more of the low-frequency transform coefficients (including the DC coefficient) can be used to generate partially decoded images. Similarly, although the present invention has been described in the context of video parsing based on histogram analysis, those skilled in the art will understand that the present invention can be used for other applications in which low-resolution images are acceptable, such as picture-in-a-picture generation, fast-forward replays of video sequences, target recognition, and motion detection.
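The text does not say how a block would be built from two or more low-frequency coefficients. One possibility, shown here purely as an assumption, is to apply a small (k x k) inverse DCT to the (k x k) lowest-frequency coefficients of the (n x n) transform and rescale by k/n. The sketch assumes orthonormal DCT scaling; the function name low_frequency_block is hypothetical. For k = 1 it reduces to using the (scaled) DC coefficient alone.

```python
import math


def low_frequency_block(coeffs, k, n=8):
    """Build a (k x k) pixel block from the (k x k) lowest-frequency
    coefficients of an (n x n) orthonormal DCT.

    Assumed technique: (k x k) orthonormal inverse DCT followed by a k/n
    rescale so that a flat block keeps its original intensity.
    """
    def c(i):
        return math.sqrt(1.0 / k) if i == 0 else math.sqrt(2.0 / k)

    scale = k / n
    block = []
    for y in range(k):
        row = []
        for x in range(k):
            total = 0.0
            for u in range(k):
                for v in range(k):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * k))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * k)))
            row.append(scale * total)
        block.append(row)
    return block


# A flat (8x8) block of intensity 100 has DC = 800 and no other coefficients.
flat = low_frequency_block([[800.0, 0.0], [0.0, 0.0]], k=2)
# Adding a horizontal low-frequency term makes the two columns differ.
edge = low_frequency_block([[800.0, 80.0], [0.0, 0.0]], k=2)
```

Unlike pure DC replication, the second example yields a block whose pixels already differ before any motion compensation is applied, at the cost of a small inverse transform per block.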
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as expressed in the following claims.

Claims

What is claimed is:
1. A method for partially decoding a transform-based compressed video stream, comprising the steps of:
(a) decoding the compressed video stream to recover one or more low-frequency transform coefficients for each block of original image data;
(b) generating a block of low-frequency image data from each set of low-frequency transform coefficients corresponding to each block of original image data; and
(c) applying motion-compensated inter-frame differencing to each block of low-frequency image data to generate a partially decoded image for each frame in the compressed video stream.
2. The invention of claim 1, further comprising the step of applying histogram analysis to the partially decoded images to parse the compressed video stream.
3. The invention of claim 1, wherein each block of low-frequency image data is smaller than the size of the transform.
4. The invention of claim 1, wherein the transform is a discrete cosine transform (DCT).
5. The invention of claim 4, wherein the transform is an (8x8) DCT transform and each block of low-frequency image data is either (2x2) or (4x4).
6. The invention of claim 5, wherein each block of low-frequency image data is smaller than the size of the transform and further comprising the step of applying histogram analysis to the partially decoded images to parse the compressed video stream.
7. The invention of claim 1, wherein: step (a) comprises the step of decoding the compressed video stream to recover only the DC transform coefficient for each block of original image data; and step (b) comprises the step of generating a block of low-frequency image data from each DC transform coefficient.
8. The invention of claim 7, wherein each block of low-frequency image data is generated by replicating the corresponding DC transform coefficient.
9. An apparatus for partially decoding a transform-based compressed video stream, comprising:
(a) means for decoding the compressed video stream to recover one or more low-frequency transform coefficients for each block of original image data;
(b) means for generating a block of low-frequency image data from each set of low-frequency transform coefficients corresponding to each block of original image data; and
(c) means for applying motion-compensated inter-frame differencing to each block of low-frequency image data to generate a partially decoded image for each frame in the compressed video stream.
10. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to implement a method for partially decoding a transform-based compressed video stream, the method comprising the steps of:
(a) decoding the compressed video stream to recover one or more low-frequency transform coefficients for each block of original image data;
(b) generating a block of low-frequency image data from each set of low-frequency transform coefficients corresponding to each block of original image data; and
(c) applying motion-compensated inter-frame differencing to each block of low-frequency image data to generate a partially decoded image for each frame in the compressed video stream.
PCT/US1998/027223 1997-12-23 1998-12-22 Partial decoding of compressed video sequences WO1999033275A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP98964199A EP1048174A4 (en) 1997-12-23 1998-12-22 Partial decoding of compressed video sequences
CA002310652A CA2310652C (en) 1997-12-23 1998-12-22 Partial decoding of compressed video sequences
JP2000526055A JP2001527352A (en) 1997-12-23 1998-12-22 Partial decoding of compressed video sequences
KR1020007007059A KR20010033550A (en) 1997-12-23 1998-12-22 Partial decoding of compressed video sequences
AU19377/99A AU1937799A (en) 1997-12-23 1998-12-22 Partial decoding of compressed video sequences

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US6877497P 1997-12-23 1997-12-23
US60/068,774 1997-12-23
US10574698A 1998-06-26 1998-06-26
US09/105,746 1998-06-26

Publications (1)

Publication Number Publication Date
WO1999033275A1 (en) 1999-07-01

Family

ID=26749360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/027223 WO1999033275A1 (en) 1997-12-23 1998-12-22 Partial decoding of compressed video sequences

Country Status (6)

Country Link
EP (1) EP1048174A4 (en)
JP (1) JP2001527352A (en)
KR (1) KR20010033550A (en)
AU (1) AU1937799A (en)
CA (1) CA2310652C (en)
WO (1) WO1999033275A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3593944B2 (en) 2000-03-08 2004-11-24 日本電気株式会社 Image data processing apparatus and motion compensation processing method used therefor
EP2713619A3 (en) * 2003-11-18 2015-01-07 Mobile Imaging in Sweden AB Method for processing a digital image and image representation format
US8611414B2 (en) * 2010-02-17 2013-12-17 University-Industry Cooperation Group Of Kyung Hee University Video signal processing and encoding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5262854A (en) * 1992-02-21 1993-11-16 Rca Thomson Licensing Corporation Lower resolution HDTV receivers

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4920426A (en) * 1986-11-10 1990-04-24 Kokusai Denshin Denwa Co., Ltd. Image coding system coding digital image signals by forming a histogram of a coefficient signal sequence to estimate an amount of information
US4887156A (en) * 1987-04-30 1989-12-12 Nec Corporation Method and system for transform coding of video signals
US5109451A (en) * 1988-04-28 1992-04-28 Sharp Kabushiki Kaisha Orthogonal transform coding system for image data
US5006931A (en) * 1989-02-28 1991-04-09 Sony Corporation Highly efficient coding apparatus
US5097331A (en) * 1990-08-24 1992-03-17 Bell Communications Research, Inc. Multiple block-size transform video coding using an asymmetric sub-band structure
US5049992A (en) * 1990-08-27 1991-09-17 Zenith Electronics Corporation HDTV system with receivers operable at different levels of resolution
US5189526A (en) * 1990-09-21 1993-02-23 Eastman Kodak Company Method and apparatus for performing image compression using discrete cosine transform
US5150208A (en) * 1990-10-19 1992-09-22 Matsushita Electric Industrial Co., Ltd. Encoding apparatus
US5235420A (en) * 1991-03-22 1993-08-10 Bell Communications Research, Inc. Multilayer universal video coder
US5412429A (en) * 1993-03-11 1995-05-02 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Picture data compression coder using subband/transform coding with a Lempel-Ziv-based coder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1048174A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019156287A1 (en) * 2018-02-08 2019-08-15 Samsung Electronics Co., Ltd. Progressive compressed domain computer vision and deep learning systems
US11025942B2 (en) 2018-02-08 2021-06-01 Samsung Electronics Co., Ltd. Progressive compressed domain computer vision and deep learning systems

Also Published As

Publication number Publication date
CA2310652A1 (en) 1999-07-01
EP1048174A4 (en) 2004-04-07
AU1937799A (en) 1999-07-12
EP1048174A1 (en) 2000-11-02
JP2001527352A (en) 2001-12-25
CA2310652C (en) 2008-07-22
KR20010033550A (en) 2001-04-25

Similar Documents

Publication Publication Date Title
US11089311B2 (en) Parameterization for fading compensation
US6058210A (en) Using encoding cost data for segmentation of compressed image sequences
Shen et al. A fast algorithm for video parsing using MPEG compressed sequences
CN101222644B (en) Moving image encoding/decoding device and moving image encoding/decoding method
US7920628B2 (en) Noise filter for video compression
US6327390B1 (en) Methods of scene fade detection for indexing of video sequences
US5864637A (en) Method and apparatus for improved video decompression by selective reduction of spatial resolution
US6983078B2 (en) System and method for improving image quality in processed images
US20060233259A1 (en) Switching decode resolution during video decoding
US7463684B2 (en) Fading estimation/compensation
US20080031518A1 (en) Method and apparatus for encoding/decoding color image
US6643410B1 (en) Method of determining the extent of blocking artifacts in a digital image
US20070025626A1 (en) Method, medium, and system encoding/decoding image data
US6463102B1 (en) Digital video compressor with border processor
US20040047416A1 (en) Image processor
WO2020263442A1 (en) Transform-skip residual coding of video data
US9106925B2 (en) WEAV video compression system
CA2310652C (en) Partial decoding of compressed video sequences
US8023559B2 (en) Minimizing blocking artifacts in videos
US20050157790A1 (en) Apparatus and mehtod of coding moving picture
JP2002064823A (en) Apparatus and method for detecting scene change of compressed dynamic image as well as recording medium recording its program
JPH07152779A (en) Processing method for detecting moving picture index and moving picture processor having moving picture index detection processing function
Afshin et al. A dictionary based approach to JPEG anti-forensics
Hashim et al. Correlated Block Quad-Tree Segmented and DCT based Scheme for Color Image Compression
US7209591B2 (en) Motion compensation method for video sequence encoding in low bit rate systems

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref document number: 2310652

Country of ref document: CA

Ref country code: CA

Ref document number: 2310652

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1998964199

Country of ref document: EP

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 526055

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1020007007059

Country of ref document: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1998964199

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020007007059

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1998964199

Country of ref document: EP

WWR Wipo information: refused in national office

Ref document number: 1020007007059

Country of ref document: KR