WO1997030544A2

WO1997030544A2 - Method and apparatus for transitions, reverse play and other special effects in digital motion video

Info

Publication number: WO1997030544A2
Application number: PCT/US1997/002953
Authority: WO
Inventors: John A. Toebes VIII; Douglas J. Walker
Original assignee: SAS Institute Inc.
Priority date: 1996-02-20
Filing date: 1997-02-20
Publication date: 1997-08-21
Also published as: EP0882358A2; WO1997030544A3

Abstract

Transitions between two video frames are effectuated by selecting a FROM frame and a TO frame (31), generating a stream of bidirectionally-dependent duplicator frames which vary, placing the FROM frame in the past buffer of a decoder, placing the TO frame in the future buffer of a decoder, feeding the stream of duplicator frames to the decoder, causing the duplicator frames to be displayed, and beginning normal playback of the video stream containing the TO frame at the TO frame position. Frame-specific access is accomplished by determining the location and type of the target frame, identifying the reference frames to which the target frame directly and indirectly refers, parsing the reference frames with a decoder while the decoder's video display is suppressed, enabling the video display, and beginning normal decoder playback at the target frame location in the video bitstream. Reverse play is similar.

Description

METHOD AND APPARATUS FOR TRANSITIONS, REVERSE PLAY AND OTHER SPECIAL EFFECTS IN DIGITAL MOTION VIDEO

BACKGROUND

1. Field of the Invention

The present invention relates to the field of digital motion video and more particularly to a system and techniques for altering and decompressing digital motion video signals in a manner which allows efficient reverse play of the motion video as well as efficient, frame-level access and play of the motion video stream for creation of other special video effects. The system and techniques are compatible with the MPEG-1 standard adopted by the International Standards Organization's (ISO's) Moving Picture Experts Group (MPEG); however, the invention taught herein may also be applied to other video coding algorithms which share some of the features of the MPEG algorithm, such as Intel Corporation's Indeo™ and Indeo Video Interactive algorithms, the Fractal Codec algorithm from Iterated Systems of Atlanta, Georgia, MVI from Sirius Publishing of Scottsdale, Arizona, Cinepak from Radius of Sunnyvale, California, and the Smacker 2.0 algorithm from RAD Software of Salt Lake City, Utah.

2. Environment

The present invention relates generally to the field of digital video and more specifically to the coding and compression of analog video signals into digital video and the decoding and decompression of the digital bitstream into a displayable video signal. Digital video compression is used in a variety of applications where video images are displayed in a system where available bandwidth is limited, such as video telephone, digital television and interactive multimedia using such digital storage technology as CD-ROM, digital audio tape and magnetic disk. Such applications require digital video coding, or video compression, to achieve the necessary high data transfer rates over relatively low bandwidth channels.

Various standards have been proposed and are in use for video coding. The standards vary from application to application in resolution and frames per minute allowed based, among other things, on the bandwidth available in the particular application. Several of these standards involve algorithms based on a common core of compression techniques, including transform coding, such as that employing the Discrete Cosine Transform. See K.R. Rao and P. Yip, DISCRETE COSINE TRANSFORM: ALGORITHMS, ADVANTAGES, APPLICATIONS, San Diego, California, Academic Press, 1990, and N. Ahmed, T. Natarajan, and K.R. Rao, Discrete Cosine Transform, IEEE TRANSACTIONS ON COMPUTERS, pp. 90-93, January 1974. See also U.S. Pat. No. 4,791,598 entitled "Two-Dimensional Discrete Cosine Transform Processor," issued Dec. 13, 1988.

This invention relates most specifically to those digital video applications where the user interacts with the system in ways which can modify the video display, such as in interactive computer games or other interactive multimedia applications. In particular, digital video systems, such as MPEG video players in personal computers or video game machines would benefit from use of the apparatus and methods of the present invention to allow more efficient and realistic navigation through a video world, creation of special effects, frame specific search and access to a video stream and reverse playback of a video stream.

The MPEG-1 Video Compression Algorithm

The ISO's MPEG-1 algorithm is designed to yield a true TV-like image with compression ratios around 180:1 at data rates low enough for use in storage applications with data transfer rates at or below 1.5 Mb/s (megabits/sec), comparable to those used on CD-ROM drives on personal computers. While the algorithm is designed for such data rates, it is usable at higher data rates. The inventor routinely uses data rates of 2 to 2.5 Mb/s. MPEG-1 is designed to work with images having one-fourth of broadcast-quality resolution: 352 by 240 pels. This is approximately the quality of a picture presented by standard VHS video cassettes.

An MPEG-1 stream may consist of 0 to 16 separate video streams, 0 to 32 separate audio streams, any of which may be in stereo, and possibly other customized streams carrying user-specified information and padding bytes. The various streams are multiplexed into a single MPEG composite stream called a "system stream." This invention relates to the manipulation of an MPEG-1 video stream. It also relates to ways of de-multiplexing the system stream to create an actual or virtual non-multiplexed, valid MPEG stream.

The further aspects of the MPEG video standard, including the other data streams which comprise the MPEG system stream, are well known in the art and are extensively discussed in the literature, including International Standard ISO/IEC 11172-2, entitled "Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s -- Part 2: Video," dated August 1, 1993, and will not be further discussed here. Similarly, the application of the present invention to other systems of video data compression does not require discussion of analogous aspects of those systems.

A single video stream consists of a sequence of pictures. These pictures are also referred to as "frames." The MPEG video stream is normally created by subjecting video data representing a video picture or frame to several digital compression steps. The MPEG-1 encoding scheme includes intra-frame compression, which seeks to reduce redundancies within a frame, and inter-frame compression, which uses motion compensation to identify and eliminate redundancies between sequential frames. Motion compensation takes advantage of the movement of picture elements that remain approximately the same within a series of sequential frames but change position from frame to frame.

In MPEG, motion compensation is accomplished by employing a sequence of types of frames with various characteristics within a related group of pictures. The three types of frames possible in normal MPEG video are I-frames (Intra), P-frames (Predictive), and B-frames (Bidirectional). A fourth type of frame, the D-frame, is defined in the standard but is intended for use only as an indexing and overview feature and cannot be mixed with I, B, and P frames. I and P frames are collectively called reference frames since other frames can be based on them. I frames contain all of the information needed to reconstruct one frame of video. P frames can use information from the previously displayed reference frame and can add new information. B frames can use information from either the previously displayed reference frame, the next reference frame that will be displayed, or both, and can also add new information.

Since B frames can depend on a frame that will be displayed at some point in the future, the pictures in the MPEG bitstream are encoded and stored in a different order than they will be displayed. The order that the pictures are intended to appear on the screen is referred to as the "display order," and the order that the pictures appear in the bitstream is referred to as the "bitstream order." Bitstream order is optimized to provide the necessary reference pictures at the appropriate time to allow efficient parsing and decoding when the stream is played forward, removing the need to back up or skip forward to display the stream. Because MPEG is optimized for forward play, backward play is especially challenging. Further, before the current invention, efficient backward play, that is, backward play of acceptable speed and quality, was not obtainable on a machine with only the memory resources required for acceptable forward play.
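By way of illustration only, the relationship between display order and bitstream order can be sketched in a few lines of Python. The function name `display_to_bitstream` and the seven-frame example are hypothetical and do not appear in the specification; the sketch assumes simply that each reference frame (I or P) is emitted ahead of the B frames that precede it in display order.

```python
def display_to_bitstream(display):
    """Reorder frames from display order to bitstream order.

    `display` is a list of (frame_type, display_index) tuples, where
    frame_type is "I", "P" or "B". Each reference frame is moved ahead
    of the B frames that come before it in display order.
    """
    bitstream = []
    pending_b = []                       # B frames awaiting their future reference
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            bitstream.append(frame)      # the reference frame goes first
            bitstream.extend(pending_b)  # then the B frames that depend on it
            pending_b = []
    # Leftover B frames have no future reference; they are kept only so
    # that no input frame is silently dropped.
    bitstream.extend(pending_b)
    return bitstream


# Hypothetical seven-frame stream in display order: I B B P B B P
display = [(kind, i) for i, kind in enumerate("IBBPBBP")]
print(["%s%d" % frame for frame in display_to_bitstream(display)])
# prints ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Reading the printed list from left to right shows why forward play needs no backing up: every B frame arrives after both of the reference frames on which it may depend.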

The MPEG standard also defines the concept of a Group of Pictures (GOP). Each GOP contains at least one I-frame and may contain additional I, B, and P frames. There is no limit on the size of a GOP. The GOP may begin in display order with one or more B-frames that refer to the last reference frame in the previous GOP, but no GOP may end with a frame that refers to the next GOP in the display order. Each GOP begins with a header which contains parameters to assist in decoding the video stream. While such parameters can be different for every GOP in the stream, they typically are the same for all GOPs in a stream.

Finally, the MPEG standard defines a "sequence." A sequence is a sequential group of GOPs in an MPEG stream. Each sequence begins with a sequence header which contains parameters which may be used to assist in decoding the sequence.

The video data is broken down into a luminance or Y component and two color difference components, Cr (red chrominance) and Cb (blue chrominance). The individual pictures can be represented as arrays of Y, Cr and Cb values. The Cr and Cb values are subsampled with respect to the Y values by 2:1 in both the horizontal and vertical directions; therefore, there is one Cr and one Cb value for each four Y values.

Pictures are broken down into macroblocks, which are contiguous regions of 16x16 pels. The Y component is represented by four 8x8 contiguous blocks for each 16x16 macroblock. The Cr and Cb components are represented by a single 8x8 block for each component, but due to the subsampling discussed above the Cb 8x8 block and the Cr 8x8 block each cover the same area of the screen as the four 8x8 Y blocks. The macroblock, therefore, consists of six 8x8 blocks, each limited to one component and all superimposed in the 16x16 pel area of the display covered by the four 8x8 Y blocks.
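The arithmetic implied by the macroblock structure may be checked with a short sketch. The function names below are hypothetical; the picture size of 352 by 240 pels, the 16x16 macroblock, the 8x8 block and the 2:1 subsampling on both axes are taken from the text above.

```python
MB = 16      # macroblock edge, in pels
BLOCK = 8    # DCT block edge, in pels


def macroblock_layout(width, height):
    """Return (macroblock columns, rows, macroblocks, 8x8 blocks) per picture."""
    mb_cols = width // MB
    mb_rows = height // MB
    macroblocks = mb_cols * mb_rows
    y_blocks = macroblocks * (MB // BLOCK) ** 2   # four Y blocks per macroblock
    c_blocks = macroblocks * 2                    # one Cr and one Cb block each
    return mb_cols, mb_rows, macroblocks, y_blocks + c_blocks


def samples_per_picture(width, height):
    """Return the number of (Y, Cr, Cb) samples in one picture."""
    y = width * height
    c = (width // 2) * (height // 2)              # 2:1 subsampling on both axes
    return y, c, c


print(macroblock_layout(352, 240))     # prints (22, 15, 330, 1980)
print(samples_per_picture(352, 240))   # prints (84480, 21120, 21120)
```

For a 352 by 240 picture this gives 330 macroblocks of six blocks each, and one Cr and one Cb sample for every four Y samples, in agreement with the description.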

In MPEG-1 coding the six blocks comprising the macroblock are each subjected to a Discrete Cosine Transformation (DCT) algorithm that transforms them losslessly into 8x8 matrices that represent on each axis increasing horizontal and vertical frequency. Further compression steps take place to reduce the range of the values and encode them using a Huffman-type compression algorithm, but the specific algorithms used in the further compression steps are not relevant to the present invention. All macroblocks in an I picture are intra-coded. This means that all their DCT coefficients are encoded directly into the bitstream with no references to other pictures.

Each macroblock in a P picture may or may not have its DCT coefficients directly coded (herein referred to as "intra-coded" information), and each may or may not have a reference to a 16x16 pel area in the most recently displayed reference frame (such references are referred to as "motion vectors" in the ISO/IEC standard 11172-2 or as "inter-coded" information). If both motion vector information and intra-coded information are present, the values are added. Either intra-coded information or an inter-coded reference to the previous reference frame must be supplied for each macroblock.

Each macroblock in a B picture may or may not have intra-coded information, may or may not have a reference to a 16x16 pel area in the most recently displayed reference frame, and may or may not have a reference to a 16x16 pel area in the next reference frame that will be displayed. If references to both the previous and next reference frames are present, the values in the two frames are averaged and added to the intra-coded information, if any. Either intra-coded information or an inter-coded reference, whether to the next reference frame, the previous reference frame, or both, must be available for each macroblock, although the standard allows the information to be inherited from previous macroblocks in some cases.
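The combination rule just described (average the referenced areas when both are present, then add any intra-coded values) can be illustrated by the following simplified sketch. The function `reconstruct_block` is hypothetical, operates on short flat lists rather than 16x16 areas, and omits the rounding and clamping that a real decoder performs.

```python
def reconstruct_block(intra=None, past=None, future=None):
    """Combine referenced pel values and intra-coded values for one block.

    Each argument is a flat list of values, or None when that kind of
    information is absent from the macroblock.
    """
    references = [ref for ref in (past, future) if ref is not None]
    if intra is None and not references:
        raise ValueError("a macroblock needs intra data or a reference")
    length = len(intra) if intra is not None else len(references[0])
    result = []
    for i in range(length):
        if references:
            # One reference is used as is; two references are averaged.
            prediction = sum(ref[i] for ref in references) / len(references)
        else:
            prediction = 0
        correction = intra[i] if intra is not None else 0
        result.append(prediction + correction)
    return result


# B-picture case: both references present plus intra-coded values.
print(reconstruct_block(intra=[1, 1], past=[10, 20], future=[20, 40]))
# prints [16.0, 31.0]

# P-picture case: a reference to the past frame only.
print(reconstruct_block(past=[10, 20]))
# prints [10.0, 20.0]
```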

As the stream is parsed and decoded, the MPEG player constantly keeps the last two reference frames available for use in decoding B and P frames when they appear. The first reference frame decoded is placed in the future buffer. When a new reference frame is encountered in the decoder's parsing of the bitstream (bitstream order), the previous "future" frame becomes the "past" frame and is normally displayed at that time. The new reference frame is read into the future buffer and becomes the future frame. These available reference frames are known as the "past" and "future" frames or pictures and are normally kept in portions of the computer or decoder memory known respectively as the "past" and "future" buffers. As mentioned above, P frames may refer to past reference frames, and B frames may refer to past and/or future reference frames. The appropriate reference frames must be in the appropriate buffers of the MPEG player for the P and B frames to be properly decoded. The MPEG bitstream is designed so that the proper frames will always be in the appropriate buffers when dependent frames are presented for decoding. If the past and future buffers contain the correct values and the MPEG decoder decodes the B picture, the correct picture will be displayed on the screen. However, the contents of these buffers change frequently during normal play. This makes it difficult to play a dependent frame except in the original linear video order.

As used in this disclosure, the terms "parsed" and "decoded" are virtually synonymous. They both refer to the various processes employed by the computer and MPEG player whereby the digital information contained in the compressed video stream is accessed, manipulated, converted into bitmaps and displayed in the proper order. However, as explained herein, a compressed, digitized video stream may be partially or completely parsed or decoded. Thus, parsing or decoding may refer to only one or less than all steps necessary for complete decoding of the stream information. Similarly, as is here made evident, a GOP, picture, or portion of a picture may be completely or incompletely parsed for reasons other than display. Depending on the context in which they are used, the terms "parse" or "decode" may refer only to preliminary steps in the decoding process, such as those steps necessary to determine whether a certain picture is an I, P or B frame, or may refer to the entire process of decoding the picture and displaying the resultant bitmap.
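The rotation of the past and future buffers during forward play, as described above, can be modelled by the following sketch. The generator `play` and the frame labels are hypothetical, and the final flush of the future buffer at end of stream is an assumption made so that the last reference frame is displayed.

```python
def play(bitstream):
    """Yield frame labels in display order, given labels in bitstream order.

    Labels are strings such as "I0" or "B1" whose first character is the
    frame type.
    """
    past = None
    future = None
    for frame in bitstream:
        if frame[0] in "IP":
            # A new reference frame arrives: the previous future frame
            # becomes the past frame and is displayed at that time.
            if future is not None:
                yield future
            past, future = future, frame
        else:
            # A B frame is decoded against the past and future buffers
            # and displayed immediately.
            yield frame
    if future is not None:
        yield future   # assumption: flush the last reference frame


bitstream = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
print(list(play(bitstream)))
# prints ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```

The sketch tracks only which frame occupies each buffer; it makes visible why a dependent frame cannot be played in isolation, since the buffer contents at any moment depend on everything parsed before it.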

As used in this disclosure, the terms "frame" and "picture" are also virtually synonymous. They both refer to a single video picture, whether or not it is coded.

Frame accurate access to the video stream is not necessary for broadcast, satellite or cable video programming applications. However, it is desirable for many other uses of MPEG, particularly in interactive, multimedia computer applications such as computer games. It would be desirable to use MPEG video "worlds" in interactive educational and game programs. It would further be desirable to have frame accurate access to the MPEG video streams comprising such video worlds, subplots, and the like. Although there are suggestions in ISO/IEC 11172-2 regarding random access, reverse play and other special effects, no adequate methodology has been provided for achieving random access at frames other than I frames or for achieving reverse play of MPEG video with computer memory resources no greater than those required for forward play.

The MPEG standard has been designed primarily to support normal, forward linear playback of a digital video stream in display order. However, the standard also refers to possible additional operations including random access, fast search, reverse playback, error recovery, and editing. Reverse playback poses particular problems because of the directionality enforced by the MPEG standard in encoding groups of pictures. Only I frames can be individually accessed and decoded. Neither B nor P pictures contain sufficient information to generate a complete frame without reference to previous (bitstream order) pictures. As with other digital streams, an MPEG stream has directionality and is incomprehensible if read backwards bit by bit. Further, the bitstream order of an MPEG stream has a definite directionality on the Group of Pictures level as well. Consequently, only reverse play of I frames can be achieved by simply reading the frames into the decoder in either reverse display order or in reverse bitstream order.
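The dependence of P and B pictures on earlier pictures in bitstream order can be made concrete with the following sketch, which lists the reference frames that would have to be parsed before a given picture could be reconstructed. It is offered as an illustration of the dependency only, not as the procedure of the invention. The function names and the example stream are hypothetical, and the sketch assumes that the stream begins with an I frame and that each P frame refers to the reference frame immediately preceding it.

```python
def reference_chain(kinds, pos):
    """Return bitstream positions of the reference frames from the nearest
    preceding I frame up to and including the reference frame at `pos`."""
    chain = []
    for i in range(pos, -1, -1):
        if kinds[i] in "IP":
            chain.append(i)
            if kinds[i] == "I":
                break
    return chain[::-1]


def frames_needed(kinds, target):
    """Return positions of the reference frames that must be parsed before
    the picture at `target` (bitstream order) can be reconstructed."""
    if kinds[target] == "I":
        return []                              # I frames are self-contained
    prior = [i for i in range(target) if kinds[i] in "IP"]
    # A P frame refers to one earlier reference frame, a B frame to two.
    direct = prior[-1:] if kinds[target] == "P" else prior[-2:]
    needed = set()
    for ref in direct:
        needed.update(reference_chain(kinds, ref))
    return sorted(needed)


kinds = "IPBBPBBIBB"             # hypothetical stream, bitstream order
print(frames_needed(kinds, 7))   # I frame: prints []
print(frames_needed(kinds, 4))   # P frame: prints [0, 1]
print(frames_needed(kinds, 8))   # B frame: prints [0, 1, 4, 7]
```

The last case shows a B frame at the start of a GOP whose past reference lies in the previous GOP, so the whole chain of the previous GOP is needed in addition to the new I frame.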

The MPEG standard suggests performing reverse playback by decoding GOPs in the ordinary fashion, storing the decoded bitmaps in a memory buffer and then displaying the bitmaps in reverse order. While this method results in a reverse playback with equal quality to the forward playback, by requiring storing of decoded pictures before playback, it places significantly greater demands on computing resources, particularly memory resources, than does forward MPEG play. Another method is to decode only the I frames in each group of pictures. While this method eliminates the bitmap buffer requirement, it results in either loss of temporal resolution (where a significant number of B and P pictures are skipped) or loss of compression (where the original video sequence is coded primarily in I frames to allow for smoother reverse playback).
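The memory burden of the decode-and-store method can be estimated with the following sketch. The figures are illustrative assumptions rather than values from the specification: decoded pictures are taken to be held in subsampled Y, Cr, Cb form at 8 bits per sample, the GOP is taken to contain 15 pictures, and forward play is taken to hold three pictures (past, future and the picture being decoded).

```python
def frame_bytes(width=352, height=240):
    """Bytes for one decoded picture with 2:1 chroma subsampling on both
    axes, assuming 8 bits per sample."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


def buffered_bytes(pictures, width=352, height=240):
    """Bytes needed to hold `pictures` decoded pictures at once."""
    return pictures * frame_bytes(width, height)


print(frame_bytes())         # prints 126720
print(buffered_bytes(3))     # forward play (assumed):  prints 380160
print(buffered_bytes(15))    # 15-picture GOP (assumed): prints 1900800
```

Under these assumptions, buffering a whole GOP for reverse display takes five times the picture memory of forward play, and the ratio grows with GOP length, which the standard does not limit.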

Another method of creating a similar effect would be to avoid reverse play of an MPEG stream by storing "forward" and "backward" video contents in standard unidirectional MPEG streams to simulate reverse play by having the run-time system switch from the appropriate forward stream to the corresponding reverse stream when the "reverse" command is given by the user. While eliminating the need for memory resources which the bitmap storage method requires and eliminating the loss of temporal resolution which the I frame only method may involve, such a system would double the storage requirements for the video information files which are to be made available for forward and reverse play. Further, such a solution would require limitation of the points along the video stream where a reverse command could be executed and/or complete synchronization of the forward and reverse MPEG streams. Such a solution would also require a seek to the reverse stream every time a reverse command is given, slowing down navigation of the video world.

While reverse play is not necessary for broadcast, satellite, or cable video programming applications, it is desirable for many other uses of MPEG, particularly in interactive, multimedia computer applications, such as computer games. It would be desirable to use MPEG video "worlds" in interactive educational and game programs. It would further be desirable to have such worlds navigable in the forward and reverse directions without doubling the MPEG storage requirements for creation of such a world.

Further, no method has been suggested for creating meaningful transitions between separate MPEG video streams, or for solving the problem of delay in the display of information during the seek time required to transition from one video stream to another. As all of the methods of MPEG frame specific access, reverse play and stream to stream transitions attempted to date have limitations which restrict their use in interactive multimedia personal computer applications, more efficient methods are needed to accomplish these functions.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a method and apparatus for providing in a personal computing system with relatively modest resources random, frame accurate access to an MPEG video stream at any frame. It is a further object of this invention to provide in such a computing system high quality reverse play of MPEG video streams using computing resources approximately the same as those required for forward MPEG play. It is a further object of this invention to provide meaningful transitions which can enhance the user experience and can also serve to provide meaningful video content while masking seeking time delays.

A method and system according to the present invention comprises a self-contained interactive multimedia computer system, such as a Microsoft Windows compatible personal computer with a 90 MHz Pentium processor, SVGA video display and software or hardware MPEG decoder, which is pre-programmed to allow the user to access MPEG video streams on a frame specific basis, play such video streams in the reverse direction, and create or play meaningful MPEG transitions which can be played while the system is seeking the "to" video stream of the transition, and to construct, edit and navigate multimedia video based environments and applications based thereon having such capabilities.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects, features and advantages of the present invention will become more fully apparent from the following detailed description of a preferred embodiment, the appended claims and the accompanying drawings in which:

FIG. 1 is a block diagram of a video system for displaying video images in a PC environment, according to a preferred embodiment of the present invention;

FIG. 2 is a block diagram of a system, including an MPEG streamer, according to a preferred embodiment of the present invention;

FIG. 3 is an example of a video stream index according to a preferred embodiment of the present invention;

FIG. 4a is an example of a video stream shown in display order;

FIG. 4b is an example of the video stream of FIG. 4a shown in bitstream order;

FIG. 5 is a process flow diagram showing the process of a preferred embodiment of the present invention for frame specific access to a B frame in the video stream of FIG. 4.

FIG. 6 is a process flow diagram showing the process of a preferred embodiment of the present invention for frame specific access to a P frame in the video stream of FIG. 4.

FIG. 7 is a process flow diagram showing the process of a preferred embodiment of the present invention for frame specific access to an I frame in the video stream of FIG. 4.

FIG. 8 is a generalized flow diagram showing the process of frame specific access to a specific frame.

FIG. 9 is a process flow diagram showing the process of a preferred embodiment of the present invention for reverse play of the video stream of FIG. 4.

FIG. 10 is a process flow diagram showing the process of a preferred embodiment of the invention for transition from one video stream to another.

FIG. 11 and its parts 11a through 11k is an illustration of a transition according to the invention.

FIG. 12 is an illustration of the appearance of successive displays during a push right transition.

FIG. 13 is a process flow diagram showing the process of a preferred embodiment of a turn transition, incorporating panning and composite pictures according to the invention.

FIG. 14 is an illustration of the relationship of panning the display over the FROM picture, composite picture and TO picture during a right turn transition according to the invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to providing an apparatus and method for random playback and reverse playback of digitally compressed video bitstreams without loss of quality or temporal resolution while minimizing the impact of such playback on processing time and memory resources and to providing meaningful transitions between different video streams or video frames. By random playback, we mean that any frame in a video stream may be the first in that stream to be displayed. Once such random access is achieved, the invention allows for normal forward play to continue from the randomly accessed frame or for reverse playback to begin from such frame. By reverse playback, we mean the display of an MPEG video stream in reverse display order beginning at any selected frame within the video stream.

In the preferred embodiment, the GOP and Sequence header portions of the MPEG video stream are ignored and the MPEG player is set to use the default setting for all quantization matrices. It is a trivial matter to cause the MPEG player to load quantization matrices in the parsing process should this be desired in a particular application of the invention. Although in the MPEG ISO/IEC standard the sequence header portion of the MPEG video stream can be used to adjust the bit rate, in practice available decoders ignore this parameter. The bit rate is not an issue as long as it is within the decoder's upper bound of bit rate handling capacity. Consequently, although it would be a trivial matter to have the MPEG streamer of the present invention adjust the bit rate parameter in the sequence header, the preferred embodiment, as do currently available MPEG players, ignores bit rate information in the sequence header.

Further, the preferred embodiment of the present invention does not use the temporal reference field of the MPEG stream. As a practical matter, available decoders do not use temporal reference data. However, it would be a trivial matter to cause the streamer to adjust the temporal reference field, should it be desired to use the invention in combination with an MPEG player which uses this information.

The present invention may comprise either a hardware or software MPEG player. Hardware players use special video processing chips, either on the computer's motherboard, or more commonly, on a special video card in the computer. Typically these chips speed up the process of decoding the MPEG video and/or the process of drawing the decoded video on the screen. Software players, on the other hand, do most of the work with program logic, and use only conventional computer hardware to display the decoded video.

The method and apparatus of the present invention preferably requires an MPEG player capable of parsing certain arbitrary sections of the bitstream without updating the display to show the resulting pictures, leaving the previously displayed picture on the screen, and without playing any audio associated with the sections parsed. We refer to the operation of an MPEG player where updating the display and audio is suppressed as playing the MPEG player in "suppression mode." The invention also preferably requires use of an MPEG player capable of accessing MPEG data stored in a memory buffer instead of merely the MPEG file. In the case of hardware or combination hardware/software decoders, this ability is usually a function of the player's software drivers, rather than a limitation of the hardware itself. Therefore, it can be solved in the case of such MPEG players by replacing the available drivers with more sophisticated ones. In the preferred embodiment, the computer system incorporates a software MPEG decoder, such as SoftMotion from SAS Institute Inc. of Cary, North Carolina.

Referring now to FIG. 1, there is shown a block diagram of a video system comprised of a multimedia computer system 10 which could be employed to implement the present invention. In FIG. 1, a computer system 10 is shown having a Pentium® 90 MHz CPU based computer 1 with a 540MB internal hard drive, a CD ROM drive 8, a disk drive 9, a SVGA monitor 2, speakers 3, a keyboard 6, and a pointing device 7.

The computer could be connected to a local area network and/or modem for accessing resources not located within the computer's local drives. Implementation of other user interface devices, add-ons, operating systems and peripherals would be obvious to those skilled in the art and will not be discussed further herein.

Referring now to FIG. 2, there is shown a block diagram of a hardware/software system according to the present invention. The central component in such a system is the MPEG Streamer 23. The MPEG Streamer can retrieve the video stream from an MPEG disk file 11, usually contained on a CD-ROM disk in a CD-ROM drive 8 as shown in FIG. 1. As the data on the CD-ROM disk file is multiplexed with audio data and other data into a system stream 13, the data must first be de-multiplexed by a de-multiplexer 15 in order for the video stream data 21 to be separated from the other types of data in the system stream. Among the other types of data in the stream is the audio stream, which is fed by the de-multiplexer to the audio player 19. The video stream data fed to the MPEG Streamer 23 also includes video indices which are placed by the streamer in the appropriate buffer 33, "TO" frames placed by the streamer in the "TO" frame buffer 27 and pre-constructed synthetic MPEG transition sequences which are placed in the transition buffer 29. Alternatively or in addition, the system may include a transition generator 31 which generates the desired synthetic MPEG for transitions in response to a request for such from the MPEG streamer 23.

MPEG Streamer

The MPEG streamer function is an important component of the preferred embodiment of the invention. It essentially constructs a continuous virtual MPEG stream out of the source MPEG to send to the decoder. As far as the MPEG player is concerned, the MPEG stream never ends and no seeking is ever requested. Consequently, the MPEG decoder never needs to initialize as it otherwise would have to do every time a seek is needed. Decoder initialization sequences may cause the skipping of some frames of a stream and will cause a delay as the player decodes initial frames before beginning to display video.

The streamer manufactures the continuous MPEG stream by assembling MPEG data from selected segments of the MPEG stream(s), and from transition data. In effect, the streamer does the seeking required outside of the MPEG player. When a seek is done, the streamer joins the new MPEG frames to the old ones so that the MPEG player sees what looks like a single MPEG stream with no interruptions. The streamer also injects the "synthetic" MPEG frames generated during transitions into its output stream. The MPEG streamer accepts manual inputs as well as responding to positional data contained within the program to seek and access the appropriate video data to send to the MPEG player 37. The video stream 39 sent to the MPEG player is referred to as the virtual video stream because it may contain frames which are synthesized by the MPEG streamer and inserted into a "video stream" according to the invention, although in reality no preexisting stream with these characteristics exists.

The MPEG streamer 23, usually a software entity, is constructed so that, in conjunction with a de-multiplexer 15, it reads a base MPEG video stream, such as that on a video disc, and constructs a second valid MPEG stream 39 derived from the base stream and, possibly, other sources for the MPEG player to play. Preferably such MPEG streamer creates an actual or virtual non-multiplexed MPEG video stream capable of being manipulated according to the method of the invention. Such component or software entity may also include the ability to accept user input 25 and execute various seeking transactions as described herein to queue other data for "streaming" into the MPEG player.

The method and apparatus of the present invention uses the MPEG streamer's ability to construct a derived stream from various components and to manipulate the derived stream according to the invention to accomplish frame accurate access, reverse playback and other special effects. The streamer has the ability to duplicate, omit or reorder pictures present in the base video stream and the ability to insert pictures into the video stream from other sources.

In the preferred embodiment the MPEG player does not make use of the temporal reference field described in the MPEG standard and available in the MPEG stream. If the MPEG player on which the invention is to be practiced does use the temporal reference field, the MPEG streamer must contain means for adjusting the temporal reference data in the MPEG stream it creates to have the correct temporal reference data. The functionality of the MPEG streamer used in the preferred embodiment of the invention is shown in the annotated source code for the interface definition of such a streamer attached hereto as Attachment 1 and incorporated herein by reference.
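By way of illustration only, and not as a reproduction of Attachment 1, the adjustment of temporal reference data for a player that does use the field might be sketched as follows. The sketch relies on the MPEG-1 picture header layout, in which the picture start code 00 00 01 00 is followed by a 10-bit temporal reference and a 3-bit picture coding type. The function names are hypothetical, and the choice of which value to write for a given picture in the virtual stream is left to the streamer and is not addressed here.

```python
PICTURE_START_CODE = b"\x00\x00\x01\x00"


def get_temporal_reference(picture: bytes) -> int:
    """Read the 10-bit temporal reference that follows the picture start code."""
    if picture[:4] != PICTURE_START_CODE or len(picture) < 6:
        raise ValueError("data does not begin with a complete picture header")
    # The field occupies all of byte 4 and the top two bits of byte 5.
    return (picture[4] << 2) | (picture[5] >> 6)


def set_temporal_reference(picture: bytes, value: int) -> bytes:
    """Return a copy of the picture with its temporal reference replaced.

    Hypothetical helper: the streamer would call this on each picture it
    copies into the virtual stream when the player honours the field.
    """
    if picture[:4] != PICTURE_START_CODE or len(picture) < 6:
        raise ValueError("data does not begin with a complete picture header")
    if not 0 <= value < 1024:
        raise ValueError("temporal reference is a 10-bit field")
    patched = bytearray(picture)
    patched[4] = value >> 2
    # Keep the low six bits of byte 5 (coding type and start of vbv delay).
    patched[5] = (patched[5] & 0x3F) | ((value & 0x03) << 6)
    return bytes(patched)
```

Because only the ten bits of the field are rewritten, the picture coding type and the remainder of the picture data pass through unchanged.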

Video Stream Frame Index

Further according to the invention, although the method will function without such an entity, the method preferably creates an index of the MPEG stream to provide more rapid searching of said stream. Such index preferably is an array of video stream offset numbers in bitstream order which indicates at what byte each picture starts and whether the picture is an I, P or B picture. If the MPEG video stream being decoded uses the GOP headers or sequence headers within the stream to alter the MPEG player's state, the index should also contain flags indicating whether the picture is the last picture in a GOP (in bitstream order) or the last picture in a sequence (in bitstream order). These last two flags allow the streamer to locate, read and parse the appropriate GOP and sequence headers where these are important in placing the player in the appropriate state. While such an index can be created during runtime, for greater efficiency such index is preferably created for each MPEG stream which will be present in a product incorporating the invention during the creation of the product rather than being created during play of the product. Such pre-constructed indices are loaded into a buffer when appropriate during runtime. The index may be stored in a separate file or in the file containing the video stream. If it is stored with the video stream, it may be stored as a composite stream, as user data, in the fields of the video stream (e.g., GOP timestamp, temporal reference, vbv delay, etc.), as a non-standard extension of the video data, or as a single chunk at a known location within the video stream. FIG. 3 shows an example of an index according to the invention. The headings are provided for illustration and convenience and are not necessarily present in the software index entity. The first column in FIG. 3 indicates the type of frame of all of the frames in the MPEG stream indexed, in bitstream order. The second column indicates whether the frame is the last frame in a GOP. The third column indicates the offset byte number showing the location of the frame in the bitstream. Preferably the index refers to the de-multiplexed MPEG stream offset number rather than the multiplexed system stream offset number. The fourth column indicates whether the frame is the last frame in a sequence.
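An index with the four columns described for FIG. 3 might be built from a de-multiplexed MPEG-1 video stream roughly as sketched below. This is an illustrative sketch only: the names IndexEntry and build_index are hypothetical, the scan relies on the standard start codes (00 00 01 00 for a picture, 00 00 01 B8 for a GOP header, 00 00 01 B3 for a sequence header), and a picture is simply flagged as last in its GOP or sequence when the corresponding header, or the end of the data, follows it.

```python
from dataclasses import dataclass
from typing import List

START_PREFIX = b"\x00\x00\x01"
PICTURE_CODE, SEQUENCE_CODE, GOP_CODE = 0x00, 0xB3, 0xB8
PICTURE_TYPES = {1: "I", 2: "P", 3: "B"}


@dataclass
class IndexEntry:
    picture_type: str        # "I", "P" or "B" ("?" for any other coding type)
    last_in_gop: bool        # last picture of its GOP, in bitstream order
    offset: int              # byte offset of the picture start code
    last_in_sequence: bool   # last picture of its sequence, in bitstream order


def build_index(video: bytes) -> List[IndexEntry]:
    """Scan a video-only stream and list its pictures in bitstream order."""
    index: List[IndexEntry] = []
    pos = video.find(START_PREFIX)
    while pos != -1 and pos + 3 < len(video):
        code = video[pos + 3]
        if code == PICTURE_CODE and pos + 5 < len(video):
            # The 3-bit coding type follows the 10-bit temporal reference.
            coding_type = (video[pos + 5] >> 3) & 0x07
            kind = PICTURE_TYPES.get(coding_type, "?")
            index.append(IndexEntry(kind, False, pos, False))
        elif code in (GOP_CODE, SEQUENCE_CODE) and index:
            # A new header closes the GOP (or sequence) of the previous picture.
            index[-1].last_in_gop = True
            if code == SEQUENCE_CODE:
                index[-1].last_in_sequence = True
        pos = video.find(START_PREFIX, pos + 3)
    if index:
        index[-1].last_in_gop = True
        index[-1].last_in_sequence = True
    return index
```

In a product of the kind described, such a routine would more likely be run once when the product is created, with the resulting array stored alongside the video, than during play.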

Frame Specific Access to MPEG Stream

According to the present invention, an MPEG player with the capability of suppressing display of a parsed MPEG stream while continuing to display the previously displayed picture and with the capability of generating a derived MPEG video stream can be made to exactly reproduce the state of the MPEG player when playing any given frame of video.

In normal forward play of an MPEG video stream, the decoder retains information from the most recently displayed reference frame and the next reference frame to be displayed in its past and future buffers respectively. It also retains information from the most recently encountered GOP header and sequence header. This information is used to parse and display B-frames and P-frames which are displayed before the reference frame in the future buffer but occur later in bitstream order and is referred to as "state" information for the MPEG player. Thus there is a difference between display order and bitstream order. FIG. 4a and FIG. 4b illustrate typical frame sequences in an MPEG stream. FIG. 4a shows the sequence in display order. FIG. 4b shows the same sequence in bitstream order. Notice that the reference frames required to decode the intermediately displayed B frames appear before the intermediate B frames in bitstream order. In display order the intermediate B frames appear between the past and future reference frames on which they depend for complete decoding. Thus, each P picture depends on the preceding I or P picture in bitstream order. Of course, where a P picture depends on a previous P picture, that reference picture in turn depends on an earlier reference frame (either P or I). Each B frame depends on the previous two I or P pictures in bitstream order. If these necessary reference pictures are not completely parsed and present in the buffers, the decoder is not in the proper state to completely decode the dependent frames and the P or B pictures which depend on the reference pictures cannot be displayed correctly.

Further, as discussed above, in addition to having the required pictures parsed, decoded and present in the buffers, the relevant GOP header and sequence header information must also be presented to the player before the frames depending on it are parsed. If all of this "state" information is not correct the player will not properly decode the B frames and P frames which occur later in bitstream order. No doubt due to this sequential dependence, the MPEG standard only suggests how to access I frames which can be read and displayed without reference to any other frames.

If one wishes to display an arbitrary frame other than an I frame, the past and future buffers in the player must be placed in the proper state to accurately parse the desired frame. In order to practice the invention, the MPEG player must be capable of accepting MPEG data from a memory buffer instead of merely from a file, such as found on a video disk or CD. Further, the MPEG player must be capable of executing instructions, according to the invention, requiring the player to parse certain arbitrary sections of the bitstream without updating the display to display the resulting pictures on the screen, leaving the previously displayed picture on the screen and without playing any audio associated with the sections parsed. For the purposes of this disclosure, operation of an MPEG player where updating the display is suppressed is defined as operation of the player in "suppression mode." In the preferred embodiment, when operating in the suppression mode the MPEG player ignores the nominal picture rate of the stream and decodes the frames presented as quickly as possible. Many existing hardware players are capable of executing instructions according to the invention and operating in suppression mode. An example of a hardware player which is capable of executing the process of the invention with its current drivers is the Jakarta MPEG Video Graphics Accelerator sold by Jazz Multimedia, Santa Clara, California. Applicants know of only one software player having the required capabilities: the "SoftMotion" MPEG player distributed by SAS Institute of Cary, North Carolina. While there has been little need for players able to perform this function prior to the current invention, the knowledge required to create such a player is well known in the art and will not be further discussed herein.

Any frame in a bitstream will be properly parsed and displayed if the decoder is in the proper state to parse and display it. In ordinary play, the bitstream order of the frames, GOP headers and sequence headers automatically place the player in the proper state. In its preferred embodiment, the method of the invention accomplishes the task of placing the MPEG player in the proper state to display an arbitrary "target" frame by using a previously constructed MPEG stream index as described above and illustrated in FIG. 3. While such data could easily be obtained from the MPEG video stream or from an array constructed during runtime, the use of a preconstructed array reduces processing time.

When a target frame is selected for access, according to the preferred embodiment of the method of the invention the index is analyzed to determine which I and P frames must be parsed prior to the parsing of the target frame in order to place the player in the proper state for parsing the target frame. Once this is determined, the player is directed to parse these I and P frames while in suppression mode. Then the picture display and audio are enabled. Finally, the MPEG player is directed to begin playing in the ordinary forward mode at the target location.

While the technique described assumes a video-only MPEG stream, it works equally well in multiplexed MPEG system streams which combine one or more multiplexed video and audio streams. In such cases a de-multiplexer is used to create a single video stream. The index is created from such single video stream rather than from the multiplexed system stream.

The process of frame specific access is easier to understand if specific examples are provided. For the purpose of the following examples, assume the video stream represented in FIG. 4a. The top row of FIG. 4a shows the picture type, either I, B or P. The second row provides the picture display order. FIG. 4b shows the same video stream in bitstream order.

As previously discussed, each P picture depends on the preceding I or P picture in bitstream order or in display order. Therefore, in normal play I and P pictures are always parsed and displayed in the same order. Thus the position of I and P pictures relative to each other does not change from display order to bitstream order.

Notice by comparing FIG. 4a and FIG. 4b that the only difference between display order and bitstream order is the location of the B frames relative to the I and P frames on which they depend. In bitstream order both the past and future reference I or P frames are placed before the depending B frames. In display order B frames are displayed after the reference (I or P) frame on which they depend for a past reference and before the reference (I or P) frame on which they depend for the future reference. As the parser must have access to the information in both the past and future reference frames to properly decode these B frames, both reference frames appear before the depending B frames in bitstream order. They are parsed and read into the appropriate buffer within the player so that they are available to reconstruct the B frames. In normal play an I or P frame is parsed and placed in the future buffer. If the frame is a P frame, the parsing includes references to the preceding P frame or I frame which is present in the past buffer. Any B frames preceding the I or P frame and depending on it are then parsed and displayed. The I or P frame is then also displayed as it is read into the past buffer. The next I or P frame is then parsed and read into the future buffer. The process then repeats itself through the entire video stream.
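The relationship between the two orderings can be sketched in a few lines of Python. The sketch is illustrative only: the function name to_bitstream_order is hypothetical, and the ten-picture stream used in the usage lines is the one implied by Examples 1 through 3 (an I frame at picture 0, P frames at pictures 3, 6 and 9, and B frames elsewhere), since the figures themselves are not reproduced in this text.

```python
from typing import List, Tuple

Picture = Tuple[str, int]  # (picture type, display order number)


def to_bitstream_order(display_order: List[Picture]) -> List[Picture]:
    """Move each reference frame ahead of the B frames displayed before it."""
    bitstream: List[Picture] = []
    pending_b: List[Picture] = []
    for picture in display_order:
        if picture[0] == "B":
            # B frames wait until their future reference has been emitted.
            pending_b.append(picture)
        else:
            bitstream.append(picture)
            bitstream.extend(pending_b)
            pending_b = []
    # Any trailing B frames have no later reference; emit them unchanged.
    bitstream.extend(pending_b)
    return bitstream


display = [("I", 0), ("B", 1), ("B", 2), ("P", 3), ("B", 4),
           ("B", 5), ("P", 6), ("B", 7), ("B", 8), ("P", 9)]
bitstream = to_bitstream_order(display)
```

The I and P pictures keep their relative order in the result; only the B frames change position relative to them, as the text observes.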

In order to provide frame specific access to an arbitrary target frame other than an I frame, the necessary reference frames must be properly parsed and residing in the appropriate buffer in order for the player to be in the proper state to parse the target frame. If the appropriate frames are not parsed and read into the appropriate player buffers, the P or B picture cannot be parsed and displayed correctly. Where the target frame is an I frame the display still must be disabled while seeking to the appropriate frame. Further, as I and P frames are not displayed until they are replaced in the future buffer by the next reference frame, the next reference frame must be placed in the future buffer in order to display the target frame.

After it decides the list of frames which must be parsed in order to parse the target frame, the MPEG streamer must construct a virtual MPEG stream out of the correct components of the original stream. The virtual stream will be sent to the MPEG player to reproduce the correct system state. For the MPEG streamer to create a correct virtual stream, it may need more information than just the picture data. MPEG player state information can be modified by MPEG sequence headers and by MPEG GOP headers.

Where the technique is being used to access a specific frame in a video stream where state information is being altered from GOP header to GOP header and from sequence header to sequence header, the streamer performs the following steps:

For each frame that is on the list to be parsed, determine which sequence header and which GOP header apply to that frame. If the sequence header and/or GOP header of the first picture are different from the last ones the player decoded, the different header must be copied in before the data for the first picture it applies to. For all pictures after the first, the sequence header and/or GOP header must be inserted if it is different from the one that applies to the preceding picture in the virtual stream.

While this step is technically necessary according to the MPEG standard, in practice we have found that virtually no streams actually use sequence headers and GOP headers to change player state. Thus, as long as we are merely changing the file position within the same stream, the preferred implementation does not insert sequence headers or GOP headers.
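For streams that do use headers to change player state, the header-insertion step described above might be sketched as follows. This is a minimal sketch under stated assumptions: the function name with_headers and the tuple layout are hypothetical, header identities are represented by opaque labels rather than by header bytes, and a changed sequence header is emitted ahead of a changed GOP header because that is their order in an MPEG-1 video stream.

```python
from typing import Hashable, List, Optional, Tuple

# (picture number, applicable sequence header, applicable GOP header)
Frame = Tuple[int, Hashable, Hashable]
Item = Tuple[str, Hashable]


def with_headers(frames: List[Frame],
                 last_seq: Optional[Hashable] = None,
                 last_gop: Optional[Hashable] = None) -> List[Item]:
    """Build the virtual stream, copying in a header only when it changes.

    last_seq and last_gop are the headers the player most recently decoded.
    """
    virtual: List[Item] = []
    for picture, seq, gop in frames:
        if seq != last_seq:
            virtual.append(("SEQ", seq))
            last_seq = seq
        if gop != last_gop:
            virtual.append(("GOP", gop))
            last_gop = gop
        virtual.append(("PIC", picture))
    return virtual
```

A call such as with_headers([(0, "S1", "G1"), (3, "S1", "G1")], last_seq="S1", last_gop="G0") would place the GOP header "G1" before picture 0 and nothing further before picture 3.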

The following examples use the above described video stream to illustrate frame specific access to B, P and I frames according to the method of the invention. The following illustrated examples are illustrations of the preferred embodiment and do not contain the extra steps necessary to insert sequence headers or GOP headers into the virtual stream.

Example 1. Frame specific access to a B frame

FIG. 5 shows the preferred process of the current invention used to access the video stream of FIG. 4 at picture 8. The first step in the process illustrated in FIG. 5 is to analyze the bitstream. The identity of the target frame is determined preferably by using the index. We see that picture 8 is a B frame. Once the streamer determines that the target frame is a B frame, the index is further analyzed to determine the location in the video bitstream of the reference frames on which the B frame depends. The target frame number 8 depends on the two preceding I or P pictures in bitstream order. In the example, these pictures are pictures 9 and 6. Thus, pictures 9 and 6 must be parsed correctly and resident in the appropriate buffers for frame 8 to be parsed. Also since picture 6 is a P frame, we must first parse the preceding I frame to correctly parse picture 6. Therefore, we must parse pictures 0, 3, 6 and 9 before attempting to parse and play picture 8.

The second step in the FIG. 5 process is to place the player in suppression mode. In the third step, the MPEG streamer creates a virtual video stream containing these reference frames (pictures 0, 3, 6 and 9) in the proper order and sends them to the player for parsing. Of course, if GOP header and sequence header information were important for accurate parsing, these headers would also be sent to the decoder at the proper time. The streamer then skips to the beginning of picture 8. Note that this sequence does not require skipping backward. The fourth step in the process is to take the player out of suppression mode, re-enabling the picture display and audio.

Finally, the MPEG player begins normal play at the current file position.

Example 2. Frame specific access to a P frame

FIG. 6 illustrates the preferred process for frame specific access to a reference frame (I or P frame). Notice that the process is slightly different from that for a B frame. One reason for the difference is that some of the B frames which would appear before the reference frame in display order occur after the reference picture in bitstream order. Thus "normal" play cannot be achieved immediately after parsing the desired reference frame. The process of the invention uses the analysis obtained from the index to direct the player to skip these dependent B pictures which follow the target reference frame in bitstream order. Since I and P frames are not displayed immediately upon parsing, but are held in the future and/or past buffers until their appropriate place in the display order, we can parse the target reference frame itself while the display is in suppression mode, then seek to the next reference frame after the target frame to skip the B pictures which are before the target frame in display order but after the target frame in bitstream order. Using the video stream of FIG. 4, FIG. 6 illustrates the preferred process of the invention for frame specific access to picture 6:

Again, the first step is to analyze the bitstream. In the preferred embodiment this step is accomplished at greater speed by referring to the preconstructed index. From the analysis we see that picture 6, the target frame, is a P frame, so it depends on the preceding I or P frame in bitstream order. Therefore picture 3 must be parsed correctly in order for picture 6 to be properly parsed. In addition, as picture 3 is also a P frame requiring picture 0, picture 0 must also be parsed. Since picture 6 is a reference frame, we will need to parse it before re-enabling the display. We also note that the reference picture following the target frame is picture 9. Once the streamer, using the index, determines the list of necessary pictures, it can also determine which GOP and sequence headers will be required at which points in the bitstream, if the MPEG stream is using these headers to convey state information to the player.

The second step is to place the player in suppression mode.

The third step is to parse pictures 0, 3 and 6 while the player is in suppression mode, then to seek to the beginning of picture 9.

The fourth step is to take the player out of suppression mode by re-enabling picture display and audio.

Finally, begin normal play at the current file position. When the player encounters picture 9, it will put picture 6 on the screen as it is moved to the past buffer and hold picture 9 in the future buffer for later display.

Example 3. Frame specific access to an I frame

FIG. 7 illustrates the preferred embodiment of the present invention to access an I frame. This example uses the same video stream shown in FIG. 4 as the previous examples to illustrate the steps required to access picture 0 (an I frame).

The first step is to use the pre-constructed index to analyze the bitstream. Picture 0 is an I frame, so it does not depend on other frames. As the preferred embodiment uses an MPEG stream which does not convey state information to the player using sequence headers or GOP headers, the steps related to such headers, discussed above, do not need to be performed. The second step is to place the player in suppression mode.

The third step is to parse picture 0, then seek to the beginning of picture 3 while the player remains in the suppression mode. In this particular stream, this action does nothing since pictures 0 and 3 are adjacent in bitstream order, but in other streams there may be B frames that are displayed before picture 0 that occur after picture 0 in the bitstream. In such a case the streamer would seek to the beginning of picture 3.

The fourth step is to take the player out of suppression mode by re-enabling the picture display and audio.

Finally, the last step is to begin normal play at the current file position. While the process actually carried out to perform frame specific access according to the invention varies depending on the type of picture sought, the logic is consistent. This logic is shown on the flow chart of FIG. 8.

As shown in FIG. 8, the overall process is initiated in response to user input or program instruction. For example, in a computer game the user may "click" on a particular area of the screen within the current video frame being displayed. A click in this area may direct the program to begin playing a video stream at a particular location within the stream. This instruction will then cause the system to select a particular target frame in the video stream to be accessed (either the same video stream or another video stream). In response to the same "click" the system also puts the player in suppression mode and analyzes the target frame context. In the preferred embodiment, the system accesses a preconstructed index such as that illustrated in FIG. 3. This index allows the system to determine the location of the target frame within the target video stream, whether the target frame is I, P or B and, if the target frame is B or P, which reference frames must be parsed to place the player in the appropriate state to play the target frame. FIG. 8 then shows that, once the target frame type and location are identified, the process branches into three alternatives, which instruct the system how to get the MPEG player in the proper state for the three possible types of target frames. The specific instructions are then carried out as described above and in FIG. 8, using, in the preferred embodiment, the MPEG streamer to locate and read the appropriate frames into the streamer buffer in the appropriate order to be fed by the streamer to the player while the player is in suppression mode. Once the player is placed in the proper state, the display and audio are enabled and the system begins normal play at its current position. As the three alternatives in FIG. 8 illustrate, the bitstream position varies depending on the type of target frame. For B frames the MPEG streamer is directed to skip or seek to the beginning of the target B frame once the preceding (in bitstream order) reference frames are parsed. Then the display is enabled and normal play is begun. As B frames are displayed immediately as parsed, the B frame will be parsed and displayed immediately upon resumption of normal play. The appropriate I and P frames necessary to place the player in the appropriate state to accurately parse the B frame will have already been parsed into the appropriate buffers so that, once normal play is resumed, the B frame will be displayed as it is parsed.

Under normal MPEG player playing conditions reference frames, either I or P, will not be displayed when initially parsed. Rather, reference frames are first parsed into the future buffer where they are used, along with the reference frame in the past buffer, to decode any intermediate B frames. The reference frame is not displayed until another reference frame is parsed into the future buffer, forcing the parsed reference frame into the past buffer and simultaneously displaying it. The current invention takes advantage of this characteristic of MPEG players. Recall that P frames are dependent on the reference frame in the past buffer for accurate decoding. If the frame in the past buffer is a P frame, it in turn is dependent on any previous P frames between it and the closest previous I frame. Therefore, in order to have the past buffer in the proper state to parse the target P frame, the MPEG streamer is directed to skip or seek to the most recent previous I frame which it sends to the player, then it sends in normal order all P frames between the I frame and the target P frame. Of course, if the streamer/player is not capable of skipping, all intermediate frames can be parsed while the player is in suppression mode as the streamer/player combination seeks toward the target frame. The target P frame is then parsed into the future buffer and the player/streamer instructed to skip or seek to the next reference frame. The display is enabled and normal play is resumed at this position.

Immediately upon the resumption of normal play, the next reference frame is parsed into the future buffer, forcing the target frame into the past buffer and simultaneously displaying it.

Where the target frame is an I frame, which can be completely parsed without reference to other frames, we do not need to be concerned with the state of the MPEG player's past buffer. The process of the invention merely assures that the I frame is parsed into the future buffer and the streamer/player is poised to parse the next reference frame upon enablement of the display and resumption of normal play. The parsing of the next reference frame upon resumption of normal play will force the target I frame into the past buffer and simultaneously cause it to be displayed.
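The three branches of the frame specific access logic might be sketched as a single planning routine, as below. This is an illustrative sketch rather than the implementation of the preferred embodiment: the name plan_seek is hypothetical, the stream is given as a list of (type, display number) pairs in bitstream order such as could be read from the index, header handling is omitted, and the sample stream is the one implied by Examples 1 through 3. The routine returns the pictures to parse in suppression mode and the picture at which normal play resumes.

```python
from typing import List, Optional, Tuple

Picture = Tuple[str, int]  # (picture type, display order number)


def plan_seek(stream: List[Picture],
              target: int) -> Tuple[List[Picture], Optional[Picture]]:
    """Return (pictures to parse while suppressed, picture to resume at)."""
    t = next(i for i, (_, n) in enumerate(stream) if n == target)
    kind = stream[t][0]
    # Positions of reference pictures that precede the target in the bitstream.
    refs = [i for i in range(t) if stream[i][0] != "B"]
    chain: List[int] = []
    if kind != "I":
        # A B frame needs the two preceding references, a P frame needs one;
        # the oldest of these is then traced back to its I frame.
        start = len(refs) - (2 if kind == "B" else 1)
        while start >= 0 and stream[refs[start]][0] != "I":
            start -= 1
        if start < 0:
            raise ValueError("required reference frames are not in the stream")
        chain = refs[start:]
    suppressed = [stream[i] for i in chain]
    if kind == "B":
        # B frames display as parsed, so play resumes at the target itself.
        return suppressed, stream[t]
    # A reference target is parsed while suppressed; play resumes at the next
    # reference frame, which pushes the target to the screen.
    suppressed.append(stream[t])
    following = next((p for p in stream[t + 1:] if p[0] != "B"), None)
    return suppressed, following


fig4_bitstream = [("I", 0), ("P", 3), ("B", 1), ("B", 2), ("P", 6),
                  ("B", 4), ("B", 5), ("P", 9), ("B", 7), ("B", 8)]
```

For this stream, plan_seek(fig4_bitstream, 8) yields pictures 0, 3, 6 and 9 followed by a resume at picture 8, in agreement with Example 1. Where no reference frame follows a reference target, the sketch returns None for the resume position, and some other means of displaying the target would then be needed.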

Reverse Play

ISO 11172-2 is the document defining the MPEG video standard. Appendix D, section 6.7 of this document contains a brief discussion of "Coding at lower picture rates." The standard recommends the use of a B or P picture inserted between other pictures in the stream to duplicate the previously displayed picture. Such B or P pictures would contain only zero motion vectors to the past reference frame, which would be in the player's past buffer. Such pictures essentially duplicate the picture in the past buffer in its entirety. This previously disclosed technique can only be used to duplicate reference frames, not B pictures. Appendix D also suggests encoding a stream at a lower temporal resolution (i.e. fewer pictures per second) and then "padding" the stream with inserted B or P pictures. However, such a stream must be encoded with all reference frames for such padding to work. This is the only use of such duplicator frames in the prior art.

The current invention uses duplicator frames in a novel and non-obvious way to facilitate reverse play and other special effects. Specifically, one technique used in the present invention is the creation of a B picture by the MPEG streamer that contains only zero-motion-vector references to either the past or the future reference picture. Such a past or future duplicator B picture effectively reproduces the picture in the past or future buffer in its entirety.

Such B pictures consist of the same bit sequence regardless of the contents of the picture being duplicated. Consequently the same past or future duplicator bit sequence can be used over and over again regardless of the content of the pictures being duplicated. Duplicator B pictures which reproduce the picture in the past buffer are called past duplicators. Those which reproduce the picture in the future buffer are called future duplicators. The present invention makes use of the uniform property of such duplicators to create novel video effects, such as efficient reverse play.

It is possible to play MPEG video in reverse simply by successively reproducing the state of the player for each frame in the sequence in reverse order, that is, by using the method described above to achieve frame accurate access for each frame in the video in reverse order. While this avoids the need for additional memory requirements which prior art reverse play methods require, it does require a great deal of extra processing overhead. If the frame to be displayed is near the end of a long GOP, the player may have to parse and suppress the display of many pictures to get the desired picture on the screen. Reverse MPEG can be accomplished with less processing overhead by the following methodology:

1. If the last frame in the sequence to be played in reverse is a B frame, display it using the technique described above for frame specific access to a B frame.

2. If the next frame to be displayed in reverse display order is another B frame, the past and future buffers are already set up correctly. Parse and display the B frame, then repeat this step for the next frame to be displayed.

3. If the frame to be displayed is a P frame or an I frame, skip to the beginning of the GOP and parse but do not display all P and I frames up to and including the frame to be displayed. Then generate a future duplicator B frame, parse it and display it. Go back to step 2.
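The three steps above can be sketched and checked against a toy model of the player, as follows. Everything here is an illustrative assumption rather than the implementation of the invention: the names reverse_play_ops and ToyPlayer are hypothetical, the stream is a single GOP described by a string of picture types in display order, and the toy player models only the past buffer, the future buffer and suppression mode, raising an error if a picture is parsed without its references in place.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, Optional[int]]


def reverse_play_ops(types: str, start: int) -> List[Op]:
    """List streamer operations that show pictures start, start-1, ..., 0."""
    ops: List[Op] = []

    def load_refs(upto: int) -> None:
        # Parse, with display suppressed, every I and P frame from the
        # start of the GOP up to and including picture `upto`.
        first = max(i for i in range(upto + 1) if types[i] == "I")
        ops.append(("suppress_on", None))
        ops.extend(("parse", i) for i in range(first, upto + 1)
                   if types[i] != "B")
        ops.append(("suppress_off", None))

    if types[start] == "B":
        # Step 1: set up both references of the first B frame.
        load_refs(min(i for i in range(start + 1, len(types))
                      if types[i] != "B"))
    for n in range(start, -1, -1):
        if types[n] == "B":
            ops.append(("parse", n))          # Step 2: buffers already set.
        else:
            load_refs(n)                      # Step 3: rebuild the chain,
            ops.append(("future_dup", None))  # then show the future buffer.
    return ops


class ToyPlayer:
    """Models only the past/future buffers and suppression mode."""

    def __init__(self, types: str) -> None:
        self.types = types
        self.past: Optional[int] = None
        self.future: Optional[int] = None
        self.suppressed = False
        self.shown: List[int] = []

    def _show(self, n: Optional[int]) -> None:
        if n is not None and not self.suppressed:
            self.shown.append(n)

    def _neighbours(self, n: int) -> Tuple[int, Optional[int]]:
        before = max(i for i in range(n) if self.types[i] != "B")
        after = [i for i in range(n + 1, len(self.types))
                 if self.types[i] != "B"]
        return before, (after[0] if after else None)

    def run(self, ops: List[Op]) -> List[int]:
        for op, n in ops:
            if op == "suppress_on":
                self.suppressed = True
            elif op == "suppress_off":
                self.suppressed = False
            elif op == "future_dup":
                self._show(self.future)
            elif self.types[n] == "B":
                if (self.past, self.future) != self._neighbours(n):
                    raise RuntimeError("B frame parsed in wrong state")
                self._show(n)
            else:
                if self.types[n] == "P" and self.future != self._neighbours(n)[0]:
                    raise RuntimeError("P frame parsed in wrong state")
                self._show(self.future)  # outgoing reference is displayed
                self.past, self.future = self.future, n
        return self.shown
```

Running ToyPlayer("IBBPBBPBBP").run(reverse_play_ops("IBBPBBPBBP", 9)) on the stream implied by the examples shows the pictures in the order 9 down to 0. Streams in which B frames are displayed before the I frame of their GOP would need the reference chain of the preceding GOP as well, which this sketch does not attempt.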

Example 4. Reverse play of an MPEG sequence:

Using the stream illustrated in FIG. 4, FIG. 9 illustrates the process of a preferred embodiment of the invention for playing the FIG. 4 stream in reverse order. The first step is to analyze the bitstream to be displayed in reverse order using the index. The analysis reveals that the first frame to be played in reverse order is picture 9, a P frame. The analysis also reveals that in order to place the player in the proper state to parse picture 9, the pictures on which it depends (pictures 0, 3 and 6) must first be parsed.

The second step is to place the player in suppression mode and to parse pictures 0, 3, 6 and 9 while the player remains in suppression mode. This puts picture 6 in the past buffer and picture 9 in the future buffer.

The third step is to take the player out of suppression mode by re-enabling the display and audio.

The fourth step is to create, or retrieve from a buffer, a future duplicator B picture, parse it, and display it. This has the effect of displaying picture 9. The fifth step is to parse and display pictures 8 and 7, which the analysis revealed to be B frames which depend on picture 9.

The sixth step is to place the player back in suppression mode and to parse pictures 0, 3 and 6 while the player is in suppression mode. This places the player in the proper state to decode pictures 5 and 4, which are B frames, but does not display picture 6, a P frame which remains in the future buffer. In order to display picture 6 while keeping the player in the proper state to decode pictures 5 and 4, the seventh step is performed. The seventh step is to create a future duplicator B frame and display it. This has the effect of displaying an exact duplicate of picture 6. The eighth step is to parse and display pictures 5 and 4, now that picture 6 has been displayed by duplication, although it remains in the future buffer where its presence is required to properly decode B frame pictures 5 and 4. The ninth step is to place the player back in suppression mode and to parse pictures 0 and 3. This places the player in the proper state to decode pictures 1 and 2, which are B frames, but does not display picture 3. In order to display the contents of picture 3 without disturbing the future buffer, step 10 is performed. Step 10 is the construction, or retrieval from a buffer, of a future duplicator B frame which is then parsed and displayed. The eleventh step is the parsing and displaying of pictures 2 and 1. The twelfth step is the parsing of picture 0. However, as picture 0 is an I frame, it will not be displayed unless another reference frame is decoded to display it as it moves from the future buffer to the past buffer. The thirteenth and final step is to display the contents of picture 0 by creating or retrieving a future duplicator B frame and displaying it while picture 0 is in the future buffer.

This technique can be used regardless of where GOP boundaries fall. However, where GOP header and sequence headers are employed in the parent video stream to convey changing state information to the player, such headers must be copied into the virtual MPEG stream along with the frames to which they apply.

Synthetic MPEG Transitions and Special Effects

The current invention also uses duplicator frames in a further novel and non-obvious way to create a meaningful transition from one video stream to another. The method requires two video frames, a "FROM" frame, which must be either an I or P frame of the pre-transition video stream and must be in the player's "future" buffer; and a "TO" frame which must be an I frame and is either the first frame to be decoded from the "TO" video stream or an I frame which duplicates the contents of this frame, such as one stored in a separate TO frame cache. Note that the requirement that the TO frame be an I frame does not require the target frame in the parent video stream to be an I frame. If the target frame in the TO video stream is not an I frame it can still function as the TO frame provided the I frame duplicate of the TO frame is in the TO buffer. As such frames are preferably selected during the creation of the program when allowable transitions are identified, the recoding of the required TO frames into I frames is trivial.

It is also within the present invention to avoid the restrictions that the FROM frame be an I or P frame and that the TO frame be an I frame by using a decoder which is capable of treating the decoded data of a non-reference frame as if it had come from an I or P frame. To avoid the need to use reference frames for FROM frames or I frames for TO frames, one must use a decoder able to copy the decoded data from the non-reference frame being used as a FROM frame directly into the past or future buffer and able to copy the decoded data from the P frame or non-reference frame being used as a TO frame directly into the future buffer. Where such a decoder is used, a transition according to the invention can be initiated from any FROM frame to any TO frame regardless of the frame type. The first step of such a transition would be to copy the decoded information from the displayed FROM frame into the future buffer at the beginning of the transition. If the TO frame being used is an I frame the transition can proceed normally, with the TO frame being decoded into the future buffer, forcing the FROM frame into the past buffer. Of course, the decoder could also be directed to copy the FROM frame directly into the past buffer rather than using the TO frame to force it into the past buffer as would happen in ordinary play. Using a non-reference frame for the FROM frame with a decoder capable of this manipulation does not create any additional computational burdens on the system; however, use of TO frames other than I frames does. The TO frame must be decoded and the decoded frame information copied into the future buffer.

Unlike the case for a FROM frame where such decoding will be accomplished in the ordinary decoding and playing of the FROM video stream prior to the transition, the use of a TO frame other than an I frame will require the system to first accurately decode the desired TO frame before it can be copied into the future buffer. Thus, the system will be required at a minimum to parse all the reference frames upon which the TO frame depends in order to properly decode it. The frame specific access technique disclosed herein could be employed to achieve the proper decoding of the TO frame. While this would allow for transitions to any frame as a TO frame, such flexibility would come at the cost of additional seek and computation time while the frame specific access technique was being executed. Thus, in the preferred embodiment the TO frames are I frames, either through the use of cached I frame duplicates of the TO frames or through restricting transitions to TO frames in the TO stream which are coded as I frames.
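
To make the cost of a non-I TO frame concrete, the following sketch lists the reference frames that would have to be parsed before a given target frame can be decoded. It is a simplified illustration under stated assumptions: frame types are supplied as a display-order string, the group begins with an I frame, and the name `required_references` is hypothetical.

```python
def required_references(frame_types, target):
    """Return display-order indices of the reference frames needed by `target`.

    Assumes a display-order string such as "IBBPBBP" whose first frame is an
    I frame (a simplification; open GOPs are not handled).
    """
    kind = frame_types[target]
    if kind == "I":
        return []  # an I frame depends on no other frame
    # The dependency chain starts at the nearest I frame before the target.
    start = max(i for i in range(target) if frame_types[i] == "I")
    refs = [i for i in range(start, target) if frame_types[i] in "IP"]
    if kind == "B":
        # A B frame also needs the next reference frame that follows it.
        refs.append(next(i for i in range(target + 1, len(frame_types))
                         if frame_types[i] in "IP"))
    return refs


needed_for_b = required_references("IBBPBBP", 5)
needed_for_p = required_references("IBBPBBP", 6)
needed_for_i = required_references("IBBPBBP", 0)
```

An I frame target needs no prior parsing, which is consistent with the preference for I frames as TO frames.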

In the preferred embodiment, the transition is used not only to provide a meaningful transition from one video stream to the other or from one part of a stream to another part of the same stream by providing visual cues as to the direction of the transition thereby helping the viewer maintain orientation within the video environment, but also to provide such meaningful transition information during the time when the computer is seeking the TO video stream. However, the transition technique of the present invention can be used for either one of these purposes or for both. The example provided below employs the processes required to perform both functions, although it is understood that both functions are not necessary to practice the invention and the use of synthetic MPEG as disclosed herein is within the scope of the invention whether or not coupled with use of visually meaningful orientation cues and whether or not used to mask seeking time.

The video and film industry has used many transitions in the prior art, and many of these can be accomplished using the above technique. For the specific purpose of navigating through a videotaped environment, however, the two most important transitions are what we will call the "push left" and the "push right" transitions. In a push left, the TO picture enters the screen from the left and pushes the FROM picture off the right-hand side. This appears to the user as if he had turned to the left. In a push right, the TO picture enters the screen from the right and pushes the FROM picture off the left-hand side. This appears to the user as if he had turned to the right. If care is taken when shooting the source video, the edges of the FROM and TO frames will be a close match, enhancing the appearance of turning. Similar transitions can be accomplished for "push up", "push down" and other common and uncommon transitional techniques present in the digital and analog prior art.

According to the preferred embodiment of the present invention, the MPEG streamer of the multimedia product has an additional buffer in which are maintained I frames corresponding to the TO frames which the run-time system would seek for each possible transition which the user might initiate from that point of the video environment. For example, if the environment was a department store interior, the buffered frames might be left view, right view, and 180 degree turn midway on a department store aisle and might include left turn, right turn and 180 degree turn at the intersection of two aisles. FIG. 2 illustrates an embodiment containing such a TO frame buffer. In order to reduce memory requirements and access time, the contents of the TO frame buffer may change from time to time depending on the location of the user within the video environment. Alternatively, the transition can be performed after the TO frame is located in the target video stream sequence where the entire sequence is stored.

A system according to the invention is illustrated in FIG. 2. This illustration shows the preferred embodiment containing a dynamic TO frame buffer 27. The embodiment illustrated in FIG. 2 also has a pre-constructed group of synthetic MPEG transition streams 29 (FIG. 2) which can be readily accessed to provide the desired push left, push right or other enabled transition effect. FIG. 2 also illustrates use of a transition generator 31 which can be employed to create the desired synthetic MPEG stream on-the-fly. The method of the invention can be practiced with either or both a pre-constructed synthetic MPEG transition buffer or a generator.

The Push Transition

The transition methods of the invention are useful whether or not they are used to mask seek time. Where seek time masking is not desired, the TO frame buffer can be eliminated and the TO frame sought directly from the target video stream. Whether the TO frame comes from the additional buffer or is copied from the actual TO frame, the streamer inserts the TO frame into the stream and causes the player to parse it. This forces the FROM frame onto the screen.

The player has now set up a situation in which the FROM frame is in the past buffer and the TO frame is in the future buffer. The streamer can generate a series of B frames, all of which refer to either the FROM frame, the TO frame, or both, which appears to the user, when played, as if a transition is occurring.
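
The buffer behavior relied on here can be modeled in a few lines. This is a toy model, not a decoder: the class `TwoBufferDecoder` and its method names are hypothetical, and frames are represented only by labels.

```python
class TwoBufferDecoder:
    """Toy model of a player's past and future reference buffers."""

    def __init__(self):
        self.past = None
        self.future = None
        self.screen = []  # labels of pictures in the order they are displayed

    def parse_reference(self, label):
        # A new I or P frame enters the future buffer. The frame it displaces
        # moves to the past buffer and is the one that reaches the screen.
        if self.future is not None:
            self.screen.append(self.future)
        self.past, self.future = self.future, label

    def parse_b(self, label):
        # A B frame is displayed at once and leaves both buffers untouched.
        self.screen.append(label)


dec = TwoBufferDecoder()
dec.parse_reference("FROM")   # FROM is in the future buffer during normal play
dec.parse_reference("TO")     # inserting TO forces FROM onto the screen
dec.parse_b("B1")             # transition B frames may now refer to both
```

After these calls the model holds FROM in the past buffer and TO in the future buffer, which is the state the transition B frames require.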

The transition process is illustrated in the process flow diagram of FIG. 10. For the purposes of this illustration, assume the user of a multimedia software product is "navigating" along a path in a video environment, which path is represented by a single MPEG stream, and the user wishes to turn to the right at an intersection in the video environment being displayed by the computer and proceed in this new direction. Assume further that the view to the right and the navigable path to the right is contained on a second video stream here called the target stream. It could just as easily be contained on a remote area of the first video stream.

In the first step of the push right transition illustrated in FIG. 10, the user, by means of any input device, executes the appropriate navigational command which is recognized by the computer to require a push right transition to the target video stream at the TO frame. As the preferred embodiment of the invention uses a transition frame cache or buffer, only selected, matched frames may be used for transitions. Therefore, upon receiving the push right instruction, in step two of the process the MPEG streamer continues to play the current FROM video stream until it encounters the next FROM frame in the stream for which a push right TO frame is correlated. Such a frame may simply provide a view of a side wall or inaccessible portion of the environment, in which case no target video stream is sought, merely cached "side view" frames. However, in this illustration, the system indicates the availability of a path to the right; therefore, there is not only a cached "right view" TO frame, but such TO frame also represents an entry point on a target video stream. Therefore, step two leads not only to step three, where the FROM frame is sent to the MPEG player by the MPEG streamer, but also to steps A and B where, simultaneously, the streamer seeks the target video stream on the video stream storage media. Steps A and B can be further facilitated by use of a pre-constructed video stream index 33 (FIG. 2). While the seeking steps are being performed, the streamer sends the appropriate TO frame from the TO frame buffer 27 (FIG. 2) to the player, where it is parsed into the future buffer, forcing the FROM frame into the past buffer as shown in step four of FIG. 10. The MPEG streamer then either retrieves the appropriate pre-constructed synthetic MPEG transition stream from the transition buffer 29 (FIG. 2) or employs the transition generator 31 (FIG. 2) to create the appropriate synthetic MPEG stream on-the-fly as illustrated in step five of FIG. 10.
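
The overlap between seeking (steps A and B) and feeding the player (steps three through five) might be arranged as in the sketch below. Everything named here is a stand-in chosen for illustration: `seek_target_stream` simulates storage latency with a short sleep, the player is represented by a plain list, and the frame labels and entry point value are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def seek_target_stream(entry_point):
    """Stand-in for steps A and B: locate the entry point on the storage media."""
    time.sleep(0.05)  # simulated seek latency
    return entry_point


def push_right(player_log, from_frame, to_frame, transition_frames, entry_point):
    """Feed the player while the seek runs in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(seek_target_stream, entry_point)  # steps A and B
        player_log.append(from_frame)          # step three
        player_log.append(to_frame)            # step four: cached I frame
        player_log.extend(transition_frames)   # step five: synthetic B frames
        return pending.result()                # seek is complete (or awaited) here


log = []
position = push_right(log, "FROM", "TO(I)", ["B1", "B2", "B3"], entry_point=1200)
```

Because none of the frames fed to the player in this sketch come from the storage device, the seek can proceed undisturbed for the whole duration of the transition.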

The construction and function of the synthetic MPEG stream is discussed in more detail below.

As the last of the synthetic MPEG frames reaches the player and the transition is completed, the seek function will also be completed and the player can resume normal play at the new position in the TO video stream. Note that if the TO frame is not an I frame it must be converted to an I frame using standard MPEG decoding/encoding steps before it can be injected into the TO frame cache. The actual target frame will remain unaltered, but its representation in the cache will be as an I frame.

Construction of Synthetic MPEG for use with the Push Transition.

The synthetic MPEG which is sent to the player from the transition buffer or transition generator to create the push right transition desired by the user is created by generating a series of B frames. Each macroblock in the generated B frames refers to either the picture in the future buffer or the picture in the past buffer. The first frame in the series contains mostly references to the past buffer, but all macroblocks referring to the past buffer use a motion vector that copies the macroblock in the FROM frame that is in the same row, but in the next column to the right. The last column of macroblocks uses a motion vector that copies the macroblock in the TO frame that is in the same row, but in the first column. This results in a picture that consists mostly of the FROM picture, but shifted to the left, and a small amount of the TO picture. Succeeding pictures repeat the process, gradually shifting more of the FROM picture off the left edge of the screen and more of the TO picture onto the screen, until only one column remains of the FROM picture. At this point, the streamer would initiate frame accurate access to the TO frame in the TO video stream, causing the player to perform the functions illustrated above to place itself in the appropriate state to play the target TO frame. Normal forward play can then be resumed, with the decoder now decoding pictures from the new stream or position. Since the streamer does not refer to any MPEG streams "on disk" while generating and displaying the transition synthetic MPEG, it is free to issue an asynchronous seek command to the storage device containing the MPEG streams immediately after receiving the turn command.
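
The column-by-column construction just described can be expressed as a small planning function. The sketch below is not an MPEG encoder: it only records, for each macroblock column of the k-th intermediate B frame, which buffer is referenced, which source column is copied, and the horizontal displacement in pels from the destination to the source. The name `push_right_frame` and the tuple layout are assumptions made for illustration.

```python
MB = 16  # macroblock width in pels


def push_right_frame(width_mb, k):
    """Column plan for the k-th intermediate B frame, 1 <= k < width_mb.

    Each entry is (buffer, source_column, displacement_in_pels), where the
    displacement is source position minus destination position.
    """
    plan = []
    for col in range(width_mb):
        src = col + k
        if src < width_mb:
            # Still inside the FROM picture: copy from k columns to the right.
            plan.append(("past", src, (src - col) * MB))
        else:
            # Past the right edge of FROM: continue with the TO picture.
            src -= width_mb
            plan.append(("future", src, (src - col) * MB))
    return plan


first = push_right_frame(10, 1)   # mostly FROM, one column of TO
last = push_right_frame(10, 9)    # one column of FROM remains
```

Since the plan depends only on the picture width and on k, the same plans can be reused for any pair of FROM and TO pictures of that width.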

This process is illustrated in FIG. 11, subparts a through k. Assume that the FROM picture is picture F and the TO picture is picture T. Assume further that each picture is 10 macroblocks wide and 7 macroblocks high (160x112) for purposes of illustration, although the technique applies to any size picture. FIG. 11a labels each macroblock in the FROM picture with a code consisting of the letter of that picture, a digit indicating the row of the macroblock, and a digit indicating the column of the macroblock. FIG. 11b does the same for the TO picture. FIG. 11c through FIG. 11k show the contents of the intermediate B frames that would be generated to display a push right transition from picture F to picture T. Note that in the push right transition, all motion vectors are an even multiple of 16 pels, meaning that an exact copy of a macroblock in either the FROM picture or the TO picture is placed in each macroblock of the intermediate pictures.

FIG. 12 shows an illustration of a push right transition according to the invention as it would appear in the display. In FIG. 12, the pictures illustrated are assumed to be only 6 macroblocks wide, although in actuality a 16 pel macroblock width increment would be substantially smaller than that illustrated. Further, in the FIG. 12 illustration the display is assumed to coincide with the bitmap of the picture. That is, it is assumed that there are no undisplayed portions of the picture. The transition works equally well where the picture display is less than the entire bitmap. Further, transitions can be improved if the player used has the ability to "pan" the bitmap. This process is discussed in detail below in the Turn Transition section.

At the top left of FIG. 12 the FROM frame is shown. The width of a column of macroblocks is illustrated by a dotted line 51 extending vertically through the FROM picture of FIG. 12. This width is denoted to be one macroblock column (16 pels) by the MB measurement shown 53. The right margin of the FROM picture is depicted in pictures T1 through T5 of FIG. 12 by a vertical line 55. Pictures T1 through T5 are sequential transition pictures generated according to the push right embodiment of the invention. T1 contains all of the macroblock columns of the FROM picture except for the one to the farthest left. The leftmost column of T1 is the macroblock column which was second from the leftmost in the FROM picture. T1 also contains as its farthest right macroblock column the farthest left column of macroblocks from the TO picture. Similarly, T2 contains all of the macroblocks of the FROM picture except for the two columns farthest to the left, with the macroblock column which was third from the left in the FROM picture as the leftmost column of T2. T2 contains the leftmost two macroblock columns of the TO picture as its rightmost two columns. The successive transitional pictures, T3, T4, and T5, contain increasing numbers of columns of the TO picture and decreasing numbers of columns of macroblocks from the FROM picture until T5, which contains only the rightmost column of the FROM picture as its leftmost column and contains all columns of the TO picture except for the rightmost. When played sequentially between a FROM picture and a TO picture whose borders are well aligned, the effect of this transition is a pan to the right.

The speed of the transition can be varied by use of repeated transition frames. Further, the speed of the transition does not have to be constant. Use of varying numbers of repeated frames can create the illusion of acceleration or deceleration in turn speed. Use of this embodiment of the invention does have a minimum transition increment, however. The smallest increment of change from transition to transition is the macroblock. Thus in a push right transition, a column of macroblocks is the smallest incremental change from picture to picture.
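
Varying the number of repeats per transition frame is simple to express. In this sketch the function name `expand_with_repeats` and the particular repeat counts are illustrative choices, not values taken from the disclosure.

```python
def expand_with_repeats(num_transition_frames, repeats):
    """Return the transition frame numbers in the order sent to the player."""
    if len(repeats) != num_transition_frames:
        raise ValueError("one repeat count per transition frame is required")
    sequence = []
    for frame_no, count in enumerate(repeats, start=1):
        sequence.extend([frame_no] * count)  # hold this frame for `count` display periods
    return sequence


# Five transition frames (T1..T5); early frames are held longer, so the
# push appears to accelerate.
accelerating = expand_with_repeats(5, [4, 3, 2, 1, 1])
```

Reversing the repeat counts would give the impression of a decelerating push instead.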

Transitions are not limited to push rights and push lefts. Indeed, many transition effects can be generated since any of the macroblocks in the intermediate frames can display any 16x16 pel region in the FROM picture, any 16x16 pel region in the TO picture, or, by using both forward and backward motion vectors, the average of any 16x16 pel region in the FROM picture and any other 16x16 pel region in the TO picture.
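
The three choices available for each macroblock can be illustrated on plain lists of pel values. This is a sketch of the prediction result only, with no residual data; the function name `predict_macroblock`, the mode labels, and the round-half-up averaging are assumptions made for the example rather than statements about any particular decoder.

```python
def predict_macroblock(mode, from_block, to_block):
    """Return the 16x16 block a B-frame macroblock would show with no residual.

    mode is "forward" (copy from FROM, the past buffer), "backward" (copy from
    TO, the future buffer) or "both" (average of the two regions).
    """
    if mode == "forward":
        return [row[:] for row in from_block]
    if mode == "backward":
        return [row[:] for row in to_block]
    if mode == "both":
        # Assumed rounding: halves round up.
        return [[(a + b + 1) // 2 for a, b in zip(f_row, t_row)]
                for f_row, t_row in zip(from_block, to_block)]
    raise ValueError(f"unknown mode: {mode}")


from_region = [[100] * 16 for _ in range(16)]
to_region = [[201] * 16 for _ in range(16)]
blended = predict_macroblock("both", from_region, to_region)
```

With only these three choices, an intermediate frame can show FROM content, TO content, or an equal mix of the two in each macroblock.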

Note that it is possible for transitions similar to these to be generated in software without the use of an MPEG player or an MPEG stream. Some of these exist in the prior art. In order to create a transition without resort to the present invention, the decoder would need to make the bitmaps of the FROM and TO pictures available to the software, which would then scroll them across the screen in a meaningful way. However, this does require the transition software to be in control with the resulting increase in computational demands. If the transition is done solely as an MPEG stream, as taught in the present invention, any MPEG hardware that is available will be able to run the transition with minimal CPU intervention. If the transition is done in software, rather than utilizing the present invention, the CPU will be quite busy displaying it. If the CPU is less busy performing the transition, it can spend more time anticipating the user's needs and reduce response time to the user's requests.

Note further that the transitions described here are "invariant." In other words, the intermediate sequence of B frames for a given transition, such as a push right, is identical, regardless of the content of the FROM and TO pictures. This has significant advantages at run-time in that it is not necessary to perform laborious calculations or encoding to run the transition; a static block of data can be inserted into the bitstream by the streamer and then fed to the player to implement the transition. Rather than generating transitions during runtime as described above, the transitions can be performed during editing of a video work and the resulting linear video stream can be stored for future playback. Consequently, the part of the invention dealing with transitions also has application in more traditional linear-style editing of MPEG video. A linear editor traditionally uses cut and paste techniques to create a new video stream out of one or more component streams, possibly with transitions between sections of the component streams. The transition methods of the current invention could be employed to generate smooth transitions between the cut and pasted sequences at edit time and written to the disk as a part of the resulting edited video stream, thus creating a new linear video stream that has the effect of playing the transition. Thus the present invention can be employed in a runtime system to allow a large number of transitions to be accommodated, or can be used to splice video streams or sequences together at edit time.

Turn Transition

One drawback of the push transition described above is that the smallest incremental change from intermediate picture to intermediate picture is a macroblock. If the transition is being performed in video coded to the MPEG-1 standard, this means that the smallest unit of change is a 16x16 pel region of the display. While this type of transition is more than adequate for many applications, there are applications where a smoother transition is desired.

The transitions aspect of the present invention can also be used in conjunction with intra-bitmap panning to generate a smooth transition. This type of transition, called a "turn" transition, is a hybrid of the techniques previously described and existing techniques for panning bitmaps. In the preferred embodiment, we are primarily concerned with "turn left" and "turn right" transitions, analogous to "push left" and "push right", but "turn up" and "turn down" would also be possible. Turn transitions are much like push transitions, but can be made to occur much more smoothly than pushes since the FROM and TO images do not have to move in 16-pel increments.

To execute a turn transition, the player must be placed in a mode which displays less than all of the video data in the pictures in the stream. At least a 16 pel column (where the turn transition is to the left or right) or row (where the turn transition is up or down) of video data must be offscreen at all times. This could be accomplished by any means known in the art. For example, pictures larger than the display can be accomplished by "stretching" the remaining pels to fill the onscreen area, or by reducing the size of the onscreen area so that only part of the source video is available.

Furthermore, in order to execute the hybrid "turn" transition, the MPEG player must have the capability of "panning" over the source video. Panning involves mapping a portion of the source video to the onscreen display area, leaving some portion of the source video offscreen. For example, if the pel with X coordinate of 16 and Y coordinate of 0 is onscreen at coordinate 0,0 (the upper left portion of the video display area) and coordinates are increasing down and to the right, the video picture is said to be panned 16 pels right. In this case, 16 pels of source video would be said to be offscreen to the left of the onscreen area. If no pels are offscreen to the left, the video is said to be panned fully left; if no pels are offscreen to the right, it is said to be panned fully right.
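
The panning terminology reduces to simple arithmetic on the pan position. In the sketch below, the function names and the 704-pel and 352-pel widths used in the example calls are illustrative assumptions.

```python
def offscreen_pels(pan_x, picture_width, display_width):
    """Return (pels offscreen to the left, pels offscreen to the right)."""
    max_pan = picture_width - display_width
    if not 0 <= pan_x <= max_pan:
        raise ValueError("pan position lies outside the source picture")
    return pan_x, max_pan - pan_x


def describe_pan(pan_x, picture_width, display_width):
    """Name the pan state using the terms defined in the text."""
    left, right = offscreen_pels(pan_x, picture_width, display_width)
    if left == 0:
        return "fully left"
    if right == 0:
        return "fully right"
    return f"panned {left} pels right"


state = describe_pan(16, 704, 352)
```

A pan position of 16 therefore leaves 16 pels offscreen to the left, matching the example in the text.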

Once the player is set up correctly, the streamer executes the turn right by the following steps as illustrated for a right turn transition in FIG. 13:

In the first step the player's past and future buffers are set up as for a push transition as illustrated in FIG. 10, steps 1 through 4. The future buffer should contain a copy of the TO frame; the past buffer should contain a copy of the FROM frame.

In the second step, the streamer or other runtime component decides on the number of pels the FROM and TO frames should appear to move between frames. This number reflects the speed at which the "turn" will be perceived by the viewer of the display. This decision can be made in a variety of ways. For example, the speed of the turn could vary according to the magnitude of the turn command from a user input device, such as a joystick or mouse. In the example of FIG. 13 the display will change at the constant rate of 4 pels per frame. However, it is well within the invention for the speed of the transition to be much less or much more depending on the effect desired. Further, the speed of the transition does not have to be constant. Note also that a variable number could be chosen for each frame to give the impression of accelerating or decelerating during the turn. The system could accelerate or decelerate or both during all or portions of the turn transition.

The third step illustrated in FIG. 13 requires the player to be instructed to pan the video display port to the right such that, at the end of the series of pans which make up this step, there are fewer than one panning increment's worth of pels of the FROM picture offscreen to the right of the display. This is done iteratively by sending past duplicator B frames to the screen and iteratively adjusting the panning amount by the panning increment. For example, if 80 pels are offscreen to the right when step 3 begins, and if the panning increment is 4, then 20 past duplicator frames will be sent as the picture is panned to the right 4 pels at a time. Notice that the duplicator frames in this panning step refer only to the FROM frame and contain no information from the TO frame.
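
The count of past duplicator frames in this step follows directly from the number of pels offscreen to the right and the panning increment. The sketch below reproduces the 80-pel, 4-pel example; the function name and the picture and display widths (chosen so that 80 pels start offscreen) are assumptions.

```python
def past_duplicator_pans(current_pan, picture_width, display_width, increment):
    """Pan positions for step three, one per past duplicator B frame.

    Panning stops when fewer than `increment` pels remain offscreen on the right.
    """
    max_pan = picture_width - display_width
    pans = []
    pan = current_pan
    while max_pan - pan >= increment:
        pan += increment
        pans.append(pan)
    return pans


# 80 pels offscreen to the right at the start, panning increment of 4 pels.
pans = past_duplicator_pans(current_pan=0, picture_width=432,
                            display_width=352, increment=4)
```

Each position in the returned list corresponds to one past duplicator frame sent to the player.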

In the fourth step illustrated in FIG. 13 the streamer determines which pel column of the FROM picture will be the leftmost pel column of the onscreen display area for the next frame by adding the panning increment to the leftmost currently visible pel column, which has the effect of shifting the leftmost column to be displayed one panning increment to the right of the leftmost column displayed in the previous picture. Exactly where in the FROM picture this leftmost displayed column will be depends on the relationship between the width of the display port and the width of the picture being displayed. Once this determination is made the streamer constructs or retrieves a composite B frame which places the macroblock column of the FROM picture which contains the desired pel column in the leftmost macroblock column of the composite B frame. Succeeding columns contain the rest of the FROM picture and the first column(s) of the TO frame. These composite B frames are similar to the ones used in the PUSH transition discussed above.

The fifth step illustrated in FIG. 13 is to send the composite picture to the player while also instructing the player to adjust its panning mode to the left by the amount that the FROM picture was shifted when creating the composite picture minus one panning increment. The onscreen appearance will be that the FROM picture has shifted one panning increment to the left and the onscreen pel columns to the right of the FROM picture contain the first pel columns of the TO picture.

In the sixth step illustrated in FIG. 13 the streamer sends the player another copy of the composite video frame used in step 5, but with a panning instruction which serves to move the display one panning increment (eight pels) to the right. This step is repeated with incremental panning adjustments until the number of pels remaining offscreen on the right side of the picture is less than one panning increment.

The seventh step illustrated in FIG. 13 is designed to determine whether further duplicator B frames must be constructed or retrieved. In the seventh step the MPEG streamer determines which pel column should appear in the leftmost pel column of the visible display area by adding the panning increment to the leftmost currently visible pel column. If this addition results in a leftmost visible pel column which is still in the range of the FROM picture's width, further intermediate B frames must be constructed or retrieved, and the streamer proceeds to step 8. If the resulting pel column would be outside the FROM picture's width, no further frames need be constructed; the TO picture can be used instead, and the streamer proceeds to step 9, which is the last step in the process.

In step eight the streamer determines which macroblock column in the FROM picture contains the pel column that should be on the left-hand edge of the visible portion of the next frame and constructs or retrieves a duplicator B frame with the appropriate past and future buffer references to place this macroblock column in the leftmost macroblock column of the B picture. Succeeding macroblock columns contain the portions of the FROM picture further to the right, if any, followed by the first columns of the TO picture. After constructing the intermediate B frame, the streamer proceeds back to step five to display it appropriately.
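
Steps three through nine can be read as a single loop over the leftmost visible pel column of a virtual strip formed by placing the TO picture immediately to the right of the FROM picture. The sketch below is one interpretation of those steps, not the disclosed implementation: the function name, the (shift, pan) representation and the example dimensions (a 704 pel picture, a 352 pel display, a starting pan of 176 and an 8 pel increment) are assumptions.

```python
MB = 16  # macroblock width in pels


def turn_right_plan(picture_width, display_width, start_pan, increment):
    """Return one (shift, pan) pair per frame displayed during a right turn.

    shift is the number of pels by which the FROM picture has been moved left
    inside the frame sent to the player (0 means a past duplicator of the
    unshifted FROM picture); pan is the panning position within that frame.
    """
    max_pan = picture_width - display_width
    plan = []
    strip_col = start_pan   # leftmost visible column in the strip [FROM | TO]
    shift = 0
    while True:
        strip_col += increment                 # steps four and seven
        if strip_col >= picture_width:
            return plan                        # step nine: resume play from TO
        if strip_col - shift > max_pan:
            # Steps four and eight: start the composite at the macroblock
            # column that contains the desired pel column.
            shift = (strip_col // MB) * MB
        plan.append((shift, strip_col - shift))  # steps three, five and six


plan = turn_right_plan(704, 352, 176, 8)
```

With these example dimensions only one composite frame is ever needed, the one shifted by 352 pels, and it is simply repeated with successive pan positions.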

Once the streamer determines that no more portions of the FROM picture are in the visible display area, normal play is resumed from the TO picture. Note that steps A and B illustrated in FIG. 13 may be executed during the transition to mask seek time so that the TO video stream has been located and is ready to be sent to the streamer at the completion of the transition and the system is ready to accept any other user input available from this position on the TO frame.

Note that in the best mode implementation of the invention the correlated FROM and TO pictures in all transitions, whether "hybrid" panning transitions or pure synthetic MPEG transitions, are visually well matched. For example, in a push right or right turn transition, there is preferably no overlap and no visual gaps between the right edge of the FROM picture and the left edge of the TO picture. Visual elements that span the two pictures should appear normal if the two pictures are displayed adjacent to each other. If video or film is used to create the MPEG video, the camera position must be carefully controlled with respect to its elevation and position. The difference in the camera angles between the two pictures should also be equal to the camera's field of view, to prevent gaps or overlaps. Also, optical or post-production correction techniques may be needed to correct for any optical distortion of the lens at the edges of the picture. These techniques are well known in the art. Most of these problems disappear if computer-generated images are used to generate the MPEG video. In the preferred embodiment of the system, a camera lens or computerized rendering option is chosen so that a 90 degree field of view is obtained, thus giving the ability to make 90 degree left or right turns conveniently. In this implementation, it is possible to show exactly half of the resulting video, giving an effective 45 degree field of view onscreen. This in turn allows the turn transitions to occur using only one intermediate frame that consists of half of the FROM picture and half of the TO picture. The intermediate frame is repeated multiple times while the panning is adjusted to reveal different portions of it.

It is also important to note that, as is the case in the push transition, in the turn transition the duplicate B frames used to accomplish panning are invariant and independent of picture content. Further, in the preferred embodiment using a 90 degree field of view for the picture and a 45 degree field of view for the display, composite B frames containing references to portions of the FROM picture and to portions of the TO picture used in right turns in the production are also invariant. Consequently, a single sequence of transition B frames, for example, one for use in all right turns, can be either precreated and stored at edit time or generated on-the-fly during runtime.

FIG. 14 illustrates a right turn according to the invention. In FIG. 14 there are two video pictures, picture F, the FROM picture, and picture T, the TO picture. These pictures, which were derived from actual "shot" scenes or rendered scenes, are referred to as video resources to distinguish them from composite pictures created with the use of duplicator B frames according to the invention. These composite pictures are sometimes referred to herein as manufactured resources. For the purposes of this illustration it is assumed that picture F and picture T each measure 704 pels by 240 pels and that the onscreen visible area is 352 pels by 240 pels. It is also assumed that the panning increment is a constant 8 pels, and that the topmost, leftmost pel in a picture is pel 0,0. To assist in the explanation of the transition in FIG. 14 the pictures are shown divided into two regions. For example, picture F has a left region F_L 206 consisting of the first 352 pel columns of F (columns 0 through 351) and a right region F_R 207 consisting of pel columns 352 through 703. Similarly, picture T has a left region T_L 208 consisting of pel columns 0 through 351 and a right region T_R 209 consisting of the remaining 352 pel columns of the picture. The four subparts of FIG. 14, FIGS. 14a, 14b, 14c and 14d, illustrate a right turn transition from F to T. In FIG. 14 the active video picture, that is, the picture of which a portion is being displayed by the system, is shown by a solid-lined rectangle. Where the active video picture is a video resource 201, the lines of the rectangle are thin. Where the active video picture is a manufactured resource 202, the lines of the rectangle are thick. Where the video picture is inactive, that is, not currently being displayed, it is represented by a rectangle outlined by a single, dotted line 204. A double, dotted line is also used to delineate between the left and right regions of a picture 205, whether or not the picture is an active picture.

The onscreen region of the video picture being displayed is shown as a shaded gray area 203. Depending on the circumstances of the turn, the FROM frame could be either a video resource 201 or a manufactured resource 202. In the preferred embodiment, the FROM frame is usually a video resource either taken directly from the video stream being played or taken from a cache of pictures of views which may or may not be present on a video stream. Of course, if the original FROM picture is not a reference frame it must be converted to a reference frame as discussed above in the section dealing with push transitions.

FIG. 14a shows the visible portion of the screen 203 before the transition is executed. The middle portion of picture F is being displayed. That is, the displayed portion of F consists of the right portion of region F_L 206 and the left portion of region F_R 207, namely pel columns 176 through 527 of F. The panning position is therefore 176 pels from the left edge of F. To begin the transition, the streamer must first add the FROM picture to the past buffer and the TO picture to the future buffer by passing them to the MPEG player to parse. Then the streamer generates a series of 22 past duplicator pictures while the panning position is adjusted to the following positions:

184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, and 352

As the twenty-second duplicator is displayed the picture is panned fully to the right, with no pels offscreen on the right. That is, the display boundary coincides with the boundaries of F_R and the displayed portion of the picture 203 covers all of F_R. This position is shown in FIG. 14b.
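The first phase of the pan can be sketched as follows. This is only an illustration under the assumptions of the FIG. 14 example (352-pel display, 8-pel increment, starting position 176); the helper names `panPositions` and `visibleColumns` are hypothetical and are not part of the streamer interface.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: pan positions visited after `start`,
// up to and including `end`, in increments of `step` pels.
std::vector<int> panPositions(int start, int end, int step) {
    assert(step > 0 && start <= end);
    std::vector<int> positions;
    for (int p = start + step; p <= end; p += step) {
        positions.push_back(p);
    }
    return positions;
}

// Hypothetical helper: first and last picture columns that are
// onscreen when `pan` pels are offscreen on the left.
std::pair<int, int> visibleColumns(int pan, int displayWidth) {
    return {pan, pan + displayWidth - 1};
}
```

Called as `panPositions(176, 352, 8)`, the sketch yields the 22 positions listed above, and `visibleColumns(352, 352)` gives columns 352 through 703, that is, all of F_R.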

The streamer then obtains or creates a composite B picture consisting of the right 352 pel columns of picture F and the left 352 pel columns of picture T made in accordance with the invention. We refer to this composite picture as F_RT_L. As it is the active picture in FIG. 14c, it is shown there in a bold rectangle 202. In the preferred embodiment, the leftmost panning increment of F_RT_L will not be displayed, as it duplicates the last display of picture F. The first display of F_RT_L will be one panning increment to the right of the picture's left margin. This is accomplished by the streamer sending picture F_RT_L to the player and simultaneously resetting the panning position such that 8 pels are offscreen on the left and 344 pels are offscreen on the right. The visible appearance is that the portion of picture F_LF_R that was previously visible has shifted left 8 pels and 8 pels of picture T_LT_R are visible on the right edge of the picture. The streamer continues to send copies of F_RT_L to the player while the panning position is adjusted to the following positions:

16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, and 344

FIG. 14c shows the panning process at the intermediate point where the panning position reaches 176 pels from the right margin of picture F_RT_L. When the panning position reaches 344 we have the option of panning one further increment to 352 and directing the player to pan picture T_LT_R 8 pels to the right of the left margin; however, we have chosen to cease the pan at the 344 pel position and, preferably using the frame-specific access techniques described elsewhere in this disclosure, place the T_LT_R picture on the screen and simultaneously change the panning instruction to the player so that no pels are offscreen to the left. This position is shown in FIG. 14d.
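The complete right turn of FIGS. 14a through 14d can be summarized as a schedule of (picture, panning position) pairs. The sketch below is one possible way to express it, assuming each region is as wide as the display; the names `Resource`, `PanStep`, `rightTurnSchedule` and `panoramaColumn` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <vector>

// Which picture the player is panning over at a given step.
enum class Resource { From, Composite, To };

struct PanStep {
    Resource resource;  // picture being displayed
    int pan;            // pels offscreen on the left of that picture
};

// Hypothetical sketch of the right-turn schedule.
// regionWidth is the width of one region (and of the display).
std::vector<PanStep> rightTurnSchedule(int startPan, int regionWidth, int step) {
    assert(step > 0 && regionWidth % step == 0 && startPan % step == 0);
    assert(startPan >= 0 && startPan <= regionWidth);
    std::vector<PanStep> schedule;
    // Phase 1: pan across FROM until F_R fills the display.
    for (int p = startPan + step; p <= regionWidth; p += step) {
        schedule.push_back({Resource::From, p});
    }
    // Phase 2: pan across the composite F_R T_L, skipping position 0,
    // which would repeat the last display of FROM.
    for (int p = step; p < regionWidth; p += step) {
        schedule.push_back({Resource::Composite, p});
    }
    // Phase 3: switch to TO with no pels offscreen on the left.
    schedule.push_back({Resource::To, 0});
    return schedule;
}

// Left edge of the display within the virtual panorama F_L F_R T_L T_R.
int panoramaColumn(const PanStep &s, int regionWidth) {
    switch (s.resource) {
        case Resource::From:      return s.pan;
        case Resource::Composite: return regionWidth + s.pan;
        case Resource::To:        return 2 * regionWidth + s.pan;
    }
    return -1;  // not reached for valid enum values
}
```

For the example values (`rightTurnSchedule(176, 352, 8)`), the schedule has 22 steps on F, 43 steps on the composite (positions 8 through 344) and one final step on T. Mapping each step through `panoramaColumn` shows that the visible window advances by exactly one panning increment at every step, including the two points where the active picture changes.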

From this point, based on the user's input, we are free to continue panning so that frame T_LT_R is centered, to play video in a forward or backward direction, to execute a further right turn, resulting in a 360 degree turn, to execute a left turn back to F_LF_R (through use of a left turn according to the invention), or to perform any other interaction incorporated into the system and chosen by the user.

It is important to repeat that the inventions described herein are adaptable to any digital video system, particularly those which use reference frames and dependent frames and are not limited to use in digital video complying with the MPEG standard.

ATTACHMENT 1

MPEG STREAMER INTERFACE DEFINITION

typedef long SVPRC;              // Return code value
typedef void *SVPSuperHandle;    // Blind handle used by streamer

// This structure defines an MPEG transition
typedef struct tagMPEGTRANS {
    long m_lLength;              // Length of data in m_pData
    unsigned char *m_pData;      // Actual MPEG data for transition (all B frames),
                                 // 'm_lLength' bytes long
    unsigned long m_ulFlags;     // Flags, see MPTF defines below
    short m_sNumFrames;          // Number of B-frames in this transition
    SVPRect *m_pPans;            // Amount by which to pan, m_sNumFrames elements or NULL
                                 // if the MPTF_PANS bit is not on
} MPEGTRANS;

#define MPTF_PANS 0x01           // Transition has panning information

// This structure is used for seeking and to indicate the current frame in any of the
// frame sync functions. Note that when you call MpegSeekSMPTE or MpegSeekFrame, only the
// SMPTE or FrameNum portions are looked at for input and the other portion is appropriately
// updated on return.
typedef struct tagSYNCINFO {
    long m_lFrameNum;            // INOUT - Current frame number - 0 is the first frame
    BYTE m_nSMPTEHour;           // INOUT - Hour component of SMPTE time code
    BYTE m_nSMPTEMin;            // INOUT - Minute component of SMPTE time code
    BYTE m_nSMPTESec;            // INOUT - Second component of SMPTE time code
    BYTE m_nSMPTEFrame;          // INOUT - Frame count component of SMPTE time code
    BYTE m_nFrameType;           // INOUT - Type of the current FRAME (I, P, B, D)
} SYNCINFO;

#define MAX_PICS_PER_BUFFER 20   // Max number of pictures per MPPBuffer

typedef struct MPPPicInfoTag {
    long m_lPicNum;              // Picture number (0-origin)
    long m_lOffset;              // Offset of this picture in the buffer
} MPPPicInfo;

// Codes for the m_nInUse field of MPPBuffer
#define MPPBuffer_FREE   0       // Buffer is available
#define MPPBuffer_PLAYER 1       // Buffer is in use by player
#define MPPBuffer_UPDATE 2       // Buffer is being updated, do not use

typedef struct MPPBufferTag {
    struct MPPBufferTag *m_pNext;   // Next buffer in chain (used by streamer only)
    void *m_pvData;                 // MPEG data
    long m_lFileOffset;             // File offset of MPEG data
    long m_lLength;                 // Length of MPEG data
    long m_lMax;                    // Maximum length allocated
    long m_lTag;                    // Tag of MPPBuffer
    long m_lNumFrames;              // Number of frames in MPEG data
    int m_nInUse;                   // See MPPBuffer_ codes above
    MPPPicInfo m_PicInfo[MAX_PICS_PER_BUFFER];
                                    // List of picture number/offset combinations,
                                    // sorted by offsets.
} MPPBuffer;
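The SYNCINFO comment above says that a seek reads either the frame number or the SMPTE fields and updates the other on return. A minimal sketch of that relationship is given below. It assumes a non-drop-frame time code at a fixed integer frame rate, which the interface definition does not specify; `BYTE` is assumed to be an unsigned 8-bit type, and `SyncInfoSketch`, `frameNumToSmpte` and `smpteToFrameNum` are hypothetical names, not part of the streamer.

```cpp
#include <cassert>

typedef unsigned char BYTE;  // assumption: BYTE is an 8-bit unsigned type

// Mirrors only the SYNCINFO fields needed for the conversion.
struct SyncInfoSketch {
    long m_lFrameNum;
    BYTE m_nSMPTEHour;
    BYTE m_nSMPTEMin;
    BYTE m_nSMPTESec;
    BYTE m_nSMPTEFrame;
};

// Fill the SMPTE fields from m_lFrameNum (non-drop-frame, integer rate).
void frameNumToSmpte(SyncInfoSketch &s, int framesPerSecond) {
    assert(framesPerSecond > 0 && s.m_lFrameNum >= 0);
    long totalSeconds = s.m_lFrameNum / framesPerSecond;
    s.m_nSMPTEFrame = static_cast<BYTE>(s.m_lFrameNum % framesPerSecond);
    s.m_nSMPTESec = static_cast<BYTE>(totalSeconds % 60);
    s.m_nSMPTEMin = static_cast<BYTE>((totalSeconds / 60) % 60);
    s.m_nSMPTEHour = static_cast<BYTE>(totalSeconds / 3600);
}

// Fill m_lFrameNum from the SMPTE fields.
void smpteToFrameNum(SyncInfoSketch &s, int framesPerSecond) {
    assert(framesPerSecond > 0);
    long totalSeconds =
        (s.m_nSMPTEHour * 60L + s.m_nSMPTEMin) * 60L + s.m_nSMPTESec;
    s.m_lFrameNum = totalSeconds * framesPerSecond + s.m_nSMPTEFrame;
}
```

A caller seeking by frame number would fill `m_lFrameNum` and let the first function derive the time code; a caller seeking by time code would do the reverse with the second.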

// MPSOpen initializes the streamer on the given file. The file will usually be the
// clip hunk generated by the binder.
SVPRC MPSOpen(SVPSuperHandle &sup, const char *strFileName);

// MPSClose closes a previously-opened file.
SVPRC MPSClose(SVPSuperHandle sup);

// MPSSeek seeks the currently-open stream to the given frame number.
SVPRC MPSSeek(SVPSuperHandle sup, long lFrameNum);

// MPSCacheFrame copies the specified frame number into the frame cache without
// displaying it. Typically the cache ID is the first frame of the actual TO video clip,
// while the lFrameNum parameter is the frame number of the cached copy of the TO frame
// that the binder copies into the clip hunk.
SVPRC MPSCacheFrame(SVPSuperHandle sup, long lFrameNum, long lCacheID);

// MPSDeCacheFrame removes a previously-cached frame from the cache.
SVPRC MPSDeCacheFrame(SVPSuperHandle sup, long lCacheID);

// MPSDeCacheAll removes all previously-cached frames from the cache.
SVPRC MPSDeCacheAll(SVPSuperHandle sup);

// MPSPlayTransition causes a transition to play using the MPEG frame
// that is currently in the future buffer as the FROM frame and the
// specified cached frame as the TO frame. The transition to use is
// specified by the pMpegTrans structure passed in.
SVPRC MPSPlayTransition(SVPSuperHandle sup, MPEGTRANS *pMpegTrans,
                        long lCacheID, long lSeekFrame = -1);

// MPSWork must be called periodically to give the streamer a chance to
// read data and to give a software player, if one is being used, a
// chance to do its work.
SVPRC MPSWork(SVPSuperHandle sup, SYNCINFO *pSyncInfo);

// MPSChopStream terminates the currently-playing MPEG stream after the
// specified frame number. Used for eliminating information that has
// already been read but has not yet been sent to the MPEG
// player in preparation for exiting the current clip.
SVPRC MPSChopStream(SVPSuperHandle sup, long lFrameNum);

// MPSHoldVideo is used to prevent the streamer from reading any more
// video from the clip hunk. Used before an MPSChopStream call to
// make sure the streamer doesn't read any more video before the
// transition takes place.
SVPRC MPSHoldVideo(SVPSuperHandle sup, int hold);
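One plausible calling order for a transition, pieced together from the comments in the interface above, is sketched here. The function bodies are stand-in stubs that only record which call was made, so that the sketch is self-contained. Several details are assumptions that the attachment does not state: that a return code of 0 means success, that a nonzero `hold` argument engages the hold and 0 releases it, and that the hold is released after the transition has been queued. `playCachedTransition` and `g_calls` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef long SVPRC;
typedef void *SVPSuperHandle;
struct MPEGTRANS;  // left opaque in this sketch

// Stand-in stubs: they record the call order and return 0 (assumed success).
std::vector<std::string> g_calls;

SVPRC MPSCacheFrame(SVPSuperHandle, long, long) {
    g_calls.push_back("MPSCacheFrame");
    return 0;
}
SVPRC MPSHoldVideo(SVPSuperHandle, int) {
    g_calls.push_back("MPSHoldVideo");
    return 0;
}
SVPRC MPSChopStream(SVPSuperHandle, long) {
    g_calls.push_back("MPSChopStream");
    return 0;
}
SVPRC MPSPlayTransition(SVPSuperHandle, MPEGTRANS *, long, long lSeekFrame = -1) {
    (void)lSeekFrame;  // unused by the stub
    g_calls.push_back("MPSPlayTransition");
    return 0;
}

// Hypothetical driver: cache the TO frame, stop further reads, chop the
// stream after the FROM frame, then play the transition to the cached frame.
SVPRC playCachedTransition(SVPSuperHandle sup, MPEGTRANS *trans,
                           long fromFrame, long cachedToFrame, long cacheId) {
    SVPRC rc = MPSCacheFrame(sup, cachedToFrame, cacheId);
    if (rc != 0) return rc;
    rc = MPSHoldVideo(sup, 1);            // assumed: nonzero engages the hold
    if (rc != 0) return rc;
    rc = MPSChopStream(sup, fromFrame);   // FROM frame becomes the last frame sent
    if (rc != 0) return rc;
    rc = MPSPlayTransition(sup, trans, cacheId);  // lSeekFrame left at its default
    if (rc != 0) return rc;
    return MPSHoldVideo(sup, 0);          // assumed: zero releases the hold
}
```

In a real application MPSWork would continue to be called periodically around this sequence so that the streamer and any software player can make progress.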

Claims

What is claimed is:

1. A computer-implemented method of generating transition effects between two frames of digitally compressed video, comprising the steps of: a. selecting a FROM frame and a TO frame; b. generating a stream of bidirectionally dependent duplicator frames wherein the members of said series vary in their motion vector references to said FROM frame and said TO frame according to a predefined pattern; c. placing the FROM frame in the past buffer of a decoder; d. placing the TO frame in the future buffer of a decoder; e. feeding said stream of duplicator frames to said decoder, causing said duplicator frames to be displayed; and f. beginning normal playback of the video stream containing the TO frame at the TO frame position.

2. A computer-based system for generating transition effects between two frames of digitally compressed video, comprising: a. means for selecting a FROM frame and a TO frame; b. means for generating a stream of bidirectionally dependent duplicator frames wherein the members of said series vary in their motion vector references to said FROM frame and said TO frame according to a predefined pattern; c. means for placing the FROM frame in the past buffer of a decoder; d. means for placing the TO frame in the future buffer of a decoder; e. means for feeding said stream of duplicator frames to said decoder, causing said duplicator frames to be displayed; and f. means for beginning normal playback of the video stream containing the TO frame at the TO frame position.

3. A system according to claim 2 wherein said selecting means comprises identification of permitted FROM and TO frames during the edit process.

4. A system according to claim 2 wherein said generating means comprises creation of said duplicator frame stream during the edit process.

5. A system according to claim 2 wherein said generating means comprises creation of said duplicator frame stream on-the-fly during playback.

6. A system according to claim 2 wherein the transition effects are used to mask seeking time.

7. A system according to claim 2 wherein said generating means, said TO frame placing means, said FROM frame placing means, said feeding means and said resuming means comprise use of a streamer means wherein said decoder is fed a digital video stream from a streamer buffer in which various video frames from various sources are sent to said streamer buffer where they are combined into a virtual video stream.

8. A system for generating transition effects between a FROM frame and a TO frame of digitally compressed video, comprising: a. a computer; b. said computer being programmed to: i. generate a stream of bidirectionally dependent duplicator frames wherein the members of said series vary in their motion vector references to said FROM frame and said TO frame according to a predefined pattern, ii. place the FROM frame in the past buffer of a decoder, iii. place the TO frame in the future buffer of a decoder, iv. feed said stream of duplicator frames to said decoder, causing said duplicator frames to be displayed, and v. begin normal playback of the video stream containing the TO frame at the TO frame position.

9. A computer-readable medium for causing a computer to generate transition effects between a FROM frame and a TO frame of digitally compressed video, comprising: a. a computer-readable storage medium; and b. a computer program stored on said storage medium; c. said computer program comprising: i. means for generating a stream of bidirectionally dependent duplicator frames wherein the members of said series vary in their motion vector references to said FROM frame and said TO frame according to a predefined pattern; ii. means for placing the FROM frame in the past buffer of a decoder; iii. means for placing the TO frame in the future buffer of a decoder; iv. means for feeding said stream of duplicator frames to said decoder, causing said duplicator frames to be displayed; and v. means for beginning normal playback of the video stream containing the TO frame at the TO frame position.

10. A computer-readable medium according to claim 9 wherein: a. the FROM frame is a non-reference frame; and b. said means for placing the FROM frame in the past buffer or the future buffer of the decoder comprises means for copying the decoded data of the FROM frame directly into the past buffer or the future buffer of the decoder.

11. A computer-readable medium according to claim 9 wherein: a. the TO frame is a non-reference frame; and b. said means for placing the TO frame in the past buffer or the future buffer of the decoder comprises means for copying the decoded data of the TO frame directly into the past buffer or the future buffer of the decoder.

12. A computer-readable medium according to claim 9 wherein said transition effects are generated in response to user input or program signal.

13. A computer-readable medium according to claim 9 wherein said means for placing said TO frame in said future buffer comprises: a. means for storing at least one possible TO frame in at least one TO frame buffer; and b. means for sending the appropriate TO frame to said decoder after said decoder receives said FROM frame.

14. A computer-readable medium according to claim 9 further comprising upgrading means whereby said FROM and said TO frames which were not originally encoded as reference frames are upgraded to reference frames.

15. A computer-readable medium according to claim 14 wherein said TO frames are upgraded to reference frames with no dependencies on other frames.

16. A computer-readable medium according to claim 14 wherein said FROM frames and said TO frames are upgraded by reencoding the video streams where the encoder is instructed to encode said FROM frames and said TO frames as the required frame type.

17. A computer-implemented method of initiating playback of a video from a digitally compressed MPEG video stream, which stream contains reference frames and dependent frames, at an arbitrarily selected target frame within said video stream, comprising the steps of: a. determining the location of the target frame; b. determining the type of the target frame; c. identifying the reference frames to which the target frame directly and indirectly refers; d. parsing the reference frames with a decoder while said decoder is in suppression mode; e. enabling said video display so that subsequently decoded frames are displayed; and f. beginning normal decoder playback at the target frame location in said video bitstream.

18. A method according to claim 17, wherein said steps of determining the location of the target frame, determining the type of the target frame, and identifying the reference frames are all accomplished using a pre-constructed index.

19. A method according to claim 18, wherein said pre-constructed index is stored in a manner selected from the group consisting of: a. in a file separate from the video stream; b. in a separate stream within the file containing the video stream; c. distributed throughout the video stream as user data; d. encoded in fields within the video stream; e. distributed throughout the video stream as a non-standard extension of the video data; and f. in a single chunk at a known position in the video stream.

20. A computer-based system for initiating playback of a video from a digitally compressed MPEG video stream, which stream contains reference frames, dependent frames, GOP headers and sequence headers, beginning at an arbitrarily selected frame within said video stream comprising: a. means for determining the location of the target frame; b. means for determining the type of the target frame; c. means for identifying the reference frames to which the target frame directly and indirectly refers; d. means for parsing said reference frames with a decoder while said decoder is in suppression mode; e. means for enabling said video display so that subsequently decoded frames are displayed; and f. means for beginning normal decoder playback at the target frame location in said video bitstream.

21. A computer-based system according to claim 20, wherein said target frame locating means, said means for determining the type of the target frame, and said means for identifying reference frames comprise a pre-constructed index.

22. A computer-based system according to claim 21, wherein said pre-constructed index is stored in a manner selected from the group consisting of: a. in a file separate from the video stream; b. in a separate stream within the file containing the video stream; c. distributed throughout the video stream as user data; d. encoded in fields within the video stream; e. distributed throughout the video stream as a non-standard extension of the video data; and f. in a single chunk at a known position in the video stream.

23. A system for initiating playback of a video from a digitally compressed MPEG video stream, which stream contains reference frames, dependent frames, GOP headers and sequence headers, beginning at an arbitrarily selected frame within said video stream comprising: a. a computer; b. said computer being programmed to determine the location of the target frame, determine the type of the target frame, identify the reference frames to which the target frame directly and indirectly refers, parse said reference frames with a decoder while said decoder is in suppression mode, enable said video display so that subsequently decoded frames are displayed, and begin normal decoder playback at the target frame location in said video bitstream.

24. A system according to claim 23, wherein said computer program determines the location of the target frame, determines the type of the target frame, and identifies the reference frames to which the target frame directly and indirectly refers using a pre-constructed index.

25. A computer-readable medium according to claim 24, wherein said pre-constructed index is stored in a manner selected from the group consisting of: a. in a file separate from the video stream; b. in a separate stream within the file containing the video stream; c. distributed throughout the video stream as user data; d. encoded in fields within the video stream; e. distributed throughout the video stream as a non-standard extension of the video data; and f. in a single chunk at a known position in the video stream.

26. A computer-readable medium for causing a computer to initiate playback of a video from a digitally compressed MPEG video stream, which stream contains reference frames, dependent frames, GOP headers and sequence headers, beginning at an arbitrarily selected frame within said video stream comprising: a. a computer-readable storage medium; and b. a computer program stored on said medium; c. said computer program comprising: i. means for determining the location of the target frame, ii. means for determining the type of the target frame, iii. means for identifying the reference frames to which the target frame directly and indirectly refers, iv. means for parsing said reference frames with a decoder while said decoder is in suppression mode, v. means for enabling said video display so that subsequently decoded frames are displayed, and vi. means for beginning normal decoder playback at the target frame location in said video bitstream.

27. A computer-readable medium according to claim 26, wherein said target frame locating means, said means for determining the type of the target frame, and said means for identifying reference frames comprise a pre-constructed index.

28. A computer-readable medium according to claim 27, wherein said pre-constructed index is stored in a manner selected from the group consisting of: a. in a file separate from the video stream; b. in a separate stream within the file containing the video stream; c. distributed throughout the video stream as user data; d. encoded in fields within the video stream; e. distributed throughout the video stream as a non-standard extension of the video data; and f. in a single chunk at a known position in the video stream.

29. A computer-readable medium according to claim 26, wherein said target frame locating means is a pre-constructed index.

30. A computer-readable medium according to claim 26, wherein said means for determining the type of the target frame is a pre-constructed index.

31. A computer-readable medium according to claim 26, wherein said means for identifying said reference frames, including said reference frame's reference frames, is a pre-constructed index.

32. A computer-readable medium according to claim 26, wherein said parsing means is means for sending said reference frames to said decoder in bitstream order.

33. A computer-readable medium according to claim 32, wherein said parsing means further comprises: a. means for seeking to the first reference frame in bitstream order and sending it to said decoder; b. means for seeking to the next reference frame in bitstream order and sending it to said decoder; and c. means for continuing said seeking and sending function until all reference frames have been sent, in bitstream order, to said decoder.

34. A computer-readable medium according to claim 32, wherein said parsing means further comprises means for seeking to the first reference frame in bitstream order and beginning video stream playback in suppression mode in bitstream order from the location of said reference frame until said target frame is reached.

35. A computer-readable medium according to claim 26, wherein said target frame access is accomplished in response to user input.

36. A computer-readable medium according to claim 26, wherein said target frame access is accomplished in response to a computer program instruction.

37. A computer-readable medium according to claim 26, wherein said computer program further comprises means for communicating to said decoder GOP header and sequence header state information for each frame to be parsed.

38. A computer-readable medium according to claim 26, wherein said computer program further comprises: a. means for determining which sequence header applies to each frame to be parsed; and b. means for copying said sequence header and sending it to said decoder prior to sending said frame.

39. A computer-readable medium according to claim 38, wherein said determining means is an index.

40. A computer-readable medium according to claim 26, wherein said computer program further comprises: a. means for determining which GOP header applies to each frame to be parsed; and b. means for copying said GOP header and sending it to said decoder prior to sending said frame.

41. A computer-readable medium according to claim 40, wherein said determining means is an index.

42. A computer-readable medium according to claim 26, wherein said computer program further comprises: a. means for comparing the GOP header and sequence header which applies to each frame to be decoded to the GOP header and sequence header which applies to the frames most recently decoded by said decoder; b. where said comparison means indicates at least one is different, means for copying the different GOP header or sequence header which applies to said frame to be decoded and sending it to said decoder prior to sending said frame to be decoded.

43. A computer-readable medium according to claim 26, wherein said computer program further comprises means for suppressing audio play during said parsing of said reference frames and means for enabling said audio play when said video display is enabled.

44. A computer-readable medium according to claim 26, wherein reverse play of said video stream is accomplished from the location of said target frame by successively employing the means of said system for successive target frames where each target frame chosen immediately precedes the previous target frame in display order.

45. A computer-readable medium according to claim 26 wherein reverse play of said video stream is accomplished from the location of said target frame by successively employing the means of said system for successive target frames where each target frame chosen precedes the previous target frame by a preselected number of frames in display order.

46. A computer-implemented method of playing a video sequence from a digitally compressed video stream, which stream contains reference frames and dependent frames, in reverse display order beginning at an arbitrarily selected target frame within said video stream, comprising the steps of: a. determining the location of the target frame; b. determining the type of the target frame; c. identifying the reference frames to which the target frame directly and indirectly refers; d. parsing the reference frames with a decoder while said decoder is in suppression mode; e. displaying the target frame; f. designating the frame immediately preceding the current target frame in display order as the new target frame; and g. repeating steps (a) through (f) until all desired frames have been displayed.

47. A computer-based system for playing a video sequence from a digitally compressed video stream in reverse display order beginning at an arbitrarily selected target frame within said video stream, comprising: a. means for determining the location of each frame to be displayed; b. means for determining the type of each frame to be displayed; c. means for identifying the frames to which each frame to be displayed directly and indirectly refers; d. means for parsing said direct and indirect reference frames in bitstream order with a decoder while said decoder is in suppression mode; e. means for enabling said video display; and f. means for parsing and displaying each frame to be displayed.

48. A system for playing a video sequence from a digitally compressed video stream in reverse display order beginning at an arbitrarily selected target frame within said video stream, comprising: a. a computer; b. said computer being programmed to determine the location of each frame to be displayed, determine the type of each frame to be displayed, identify the frames to which each frame to be displayed directly and indirectly refers, parse said direct and indirect reference frames in bitstream order with a decoder while said decoder is in suppression mode, enable said video display, and parse and display each frame to be displayed.

49. A computer-readable medium for causing a computer to play a video sequence from a digitally compressed video stream, which stream has headers containing state information, in reverse display order beginning at an arbitrarily selected target frame within said video stream, comprising: a. a computer-readable storage medium; and b. a computer program stored on said storage medium; c. said computer program comprising: i. means for determining the location of each frame to be displayed, ii. means for determining the type of each frame to be displayed, iii. means for identifying the frames to which each frame to be displayed directly and indirectly refers, iv. means for parsing said direct and indirect reference frames in bitstream order with a decoder while said decoder is in suppression mode, v. means for enabling said video display, and vi. means for parsing and displaying each frame to be displayed.

50. A computer-readable medium according to claim 49, wherein said computer program further comprises: a. where the type of the next frame to be displayed is a B frame, means for sending said B frame to said decoder after enabling said video display; b. where the type of the next frame to be displayed is an I frame or a P frame, means for creating or retrieving a future duplicator B frame and sending said future duplicator B frame to said decoder after enabling said video display.

51. A computer-readable medium according to claim 49, wherein said computer program further comprises: a. where the type of the next frame to be displayed is a non-reference frame, means for sending said non-reference frame to said decoder after enabling said video display; and b. where the type of frame to be displayed is a reference frame, means for creating or retrieving a future duplicator non-reference frame and sending said future duplicator non-reference frame to said decoder after enabling said video display.

52. A computer-readable medium according to claim 49, wherein said computer program further comprises means for communicating to said decoder additional state information for each frame to be parsed.

53. A computer-readable medium according to claim 49, wherein said computer program further comprises: a. means for determining which header applies to each frame to be parsed; and b. means for copying said header and sending it to said decoder prior to sending said frame.

54. A computer-readable medium according to claim 53, wherein said determining means comprises an index.

55. A computer-readable medium according to claim 53, wherein said determining means comprises a global variable.

56. A computer-readable medium according to claim 53, wherein said determining means comprises a state table.

57. A computer-readable medium according to claim 49, wherein said computer program further comprises: a. means for comparing the header which applies to each frame to be decoded to the header which applies to the frames most recently decoded by said decoder; b. where said comparison means indicates a difference, means for copying the header which applies to said frame to be decoded and sending it to said decoder prior to sending said frame to be decoded.

58. A computer-readable medium according to claim 49, wherein said video stream is an MPEG video stream containing GOP headers and sequence headers.

59. A computer-readable medium according to claim 58, wherein said computer program further comprises: a. means for determining which GOP header applies to each frame to be parsed; and b. means for copying said GOP header and sending it to said decoder prior to sending said frame.

60. A computer-readable medium according to claim 58, wherein said computer program further comprises: a. means for determining which sequence header applies to each frame to be parsed; and b. means for copying said sequence header and sending it to said decoder prior to sending said frame.