WO2008084179A1 - Buffer management - Google Patents

Info

Publication number
WO2008084179A1
Authority
WO
WIPO (PCT)
Prior art keywords
buffer
packets
fill
level
timestamps
Prior art date
Application number
PCT/GB2007/000032
Other languages
French (fr)
Inventor
Ray Taylor
Original Assignee
Nds Limited
Priority date
Filing date
Publication date
Application filed by NDS Limited
Priority to PCT/GB2007/000032
Publication of WO2008084179A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23406 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving management of server-side video buffer
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2365 Multiplexing of several video streams
    • H04N21/23655 Statistical multiplexing, e.g. by controlling the encoder to alter its bitrate to optimize the bandwidth utilization
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26208 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists the scheduling operation being performed under constraints
    • H04N21/26216 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists the scheduling operation being performed under constraints involving the channel capacity, e.g. network bandwidth
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/4325 Content retrieval operation from a local storage medium, e.g. hard-disk by playing back content from the storage medium
    • H04N21/436 Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/43615 Interfacing a Home Network, e.g. for connecting the client to a plurality of peripherals
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4392 Processing of audio elementary streams involving audio buffer management
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44004 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/4424 Monitoring of the internal components or processes of the client device, e.g. CPU or memory load, processing speed, timer, counter or percentage of the hard disk space used

Definitions

  • the present invention relates to video decoders, and in particular, to buffer fill-level management in video decoders.
  • bandwidth constrained networks for example, but not limited to, wireless networks such as wireless home networks, or wired networks such as wired home networks and the Internet, may suffer periods of reduced throughput. Therefore, the available bandwidth for a video service, transmitted via a bandwidth-constrained network to receivers, may be less than that required.
  • a buffer in a receiver may start to empty until, at some point, the buffer runs out of data. Whilst the buffer fullness remains above zero, there is no noticeable effect on the video or audio output. However, if the buffer empties, the output will suffer, usually noticeable as a glitch or freeze in the video and audio output.
  • the present invention seeks to provide an improved buffer fill-level management system.
  • the present invention in preferred embodiments thereof, monitors a fill-level of a receive buffer. If the receive buffer starts to empty, an action is performed to correct the situation.
  • the fill-level of the receive buffer is managed by varying decoder playback speed based on the fill-level of the receive buffer. If the receive buffer has an adequate fill-level, then playback proceeds at normal play speeds. If the receive buffer reduces below a threshold fill-level, the play speed is reduced in order to replenish the receive buffer. Further play speed reductions may be necessary. If the receive buffer is very depleted, an action such as re-establishing the network connection may be performed.
  • VBR: variable bitrate.
  • the fill-level of the receive buffer is preferably measured using timing information in the stream in order to determine the amount of playback time in the receive buffer, thereby controlling the playback speed based on time thresholds rather than data thresholds. It will be appreciated by those ordinarily skilled in the art that controlling the playback speed based on timing information can be used for VBR services and non-VBR services alike. If playback speed is reduced by simply playing video frames and audio samples slower, then the audio is pitch-shifted down giving the actors deeper voices, for instance females would start to sound like males. Although playing the video slightly slower is generally not noticed by the viewers, a small change in audio pitch is readily heard.
  • pitch shifting technology is employed to pitch-shift the audio during the decoding process to cancel out the pitch-shifting introduced by varying the playback speed of the stream.
  • the actors have the same tone of voice, but are delivering the lines at a slower or faster rate.
  • a system for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the packets being at least partially transferred via a bandwidth-constrained network
  • the system including a receiver to receive the packets, a buffer, operationally connected to the receiver, to store the packets, a decoder, operationally connected to the buffer, to receive the packets from the buffer and decode the packets, a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer, and a newest timestamp of the timestamps of the packets currently stored in the buffer, and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level.
  • the fill-level manager is operative to adjust the playback speed of the media stream by the decoder at a plurality of thresholds of the buffer fill-level.
  • the system includes an audio module to adjust the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed of the decoder.
  • the audio module is operative to adjust the pitch using pitch shifting.
  • the fill-level manager is operative to reestablish a connection with the network.
  • the timestamps are assigned by the encoder at a time of encoding of each of the packets.
  • the timestamps are program clock references.
  • the buffer is a receive buffer, the decoder including a decode buffer.
  • the buffer includes a receive buffer and a decode buffer.
  • a method for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the packets being at least partially transferred via a bandwidth-constrained network
  • the method including receiving the packets, storing the packets in a buffer, decoding the packets, determining a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer, and a newest timestamp of the timestamps of the packets currently stored in the buffer, and performing an action based on the determined fill-level.
  • performing the action includes adjusting a playback speed of the media stream based on the determined fill-level of the buffer.
  • the adjusting of the playback speed is performed at a plurality of thresholds of the buffer fill-level.
  • the method includes adjusting the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed. Further in accordance with a preferred embodiment of the present invention, the adjusting of the pitch includes pitch shifting.
  • performing the action includes reestablishing a connection with the network.
  • the timestamps are assigned by the encoder at a time of encoding of the packets.
  • the timestamps are program clock references.
  • the buffer is a receive buffer, the decoder including a decode buffer.
  • the buffer includes a receive buffer and a decode buffer.
  • Fig. 1 is a partly pictorial, partly block diagram view of a media stream system constructed and operative in accordance with a preferred embodiment of the present invention;
  • Fig. 2 is a partly pictorial, partly block diagram view of a set-top box for use in the system of Fig. 1 constructed and operative in accordance with a preferred embodiment of the present invention;
  • Fig. 3 is a flow chart showing a preferred method of operation of the set-top box of Fig. 2;
  • Fig. 4 is a partly pictorial, partly block diagram view of a media stream system constructed and operative in accordance with another preferred embodiment of the present invention.
  • Fig. 1 is a partly pictorial, partly block diagram view of a media stream system 10 constructed and operative in accordance with a preferred embodiment of the present invention.
  • the system 10 preferably includes a broadcaster Headend 12 having an encoder 14 for encoding a media stream 16.
  • the media stream 16 generally includes a plurality of packets 18.
  • the Headend 12 also typically includes a clock 22, preferably operationally associated with the encoder 14.
  • the packets 18 are typically associated with a plurality of timestamps 20.
  • the timestamps 20 are preferably assigned by the encoder 14 based on the time provided by the clock 22, such that each packet 18 is associated with one timestamp 20 assigned at the time of encoding each packet 18.
  • the timestamps 20 are typically program clock references (PCRs).
  • the Headend 12 also typically includes a transmitter 24 for broadcasting the media stream 16 to a plurality of subscribers 28 (only one shown for the sake of clarity) via a satellite 26.
  • the media stream 16 may be transmitted by any suitable transmission method, for example, but not limited to, cable, terrestrial communication or Internet Protocol (IP).
  • the media stream 16 is preferably received by a satellite dish 32 attached to a house 34 of the subscriber 28.
  • the media stream 16 is then typically received by a receiver decoder 30 which is operationally connected to the satellite dish 32.
  • the receiver decoder 30 is a personal video recorder incorporating set-top box functionality with video recording functionality.
  • the receiver decoder 30 may be any suitable receiving device such as a set-top box or a suitable computer.
  • the receiver decoder 30 is typically connected to other set-top boxes 36 in the house 34 via a home network 38.
  • the home network 38 connects the set-top boxes 36 to the receiver decoder 30 enabling the set-top boxes 36 to play content received via the receiver decoder 30.
  • the home network 38 is typically a bandwidth-constrained network which is a wired or wireless network. Therefore, the packets 18 of the media stream 16, received by the set-top boxes 36, are at least partially transferred via a bandwidth constrained network.
  • the media stream 16 is transmitted by a transmitter 40 of the home network 38 from the receiver decoder 30 to the set-top boxes 36.
  • the term "transmitter" as used in the specification and claims is defined as an arrangement to send the media stream 16 from one device to another, via a wired or wireless network.
  • Some systems use rate adaptive encoding in the transmitter 40 to adjust the service bitrate of the media stream 16 to the available bandwidth of the network 38.
  • Adaptive encoding introduces an inherent reduction in quality, due to the non-perfect decode-encode stage.
  • picture quality may be reduced considerably when reducing the bitrate during a period of poor performance of the home network 38.
  • the system 10 of the present invention in preferred embodiments thereof, generally maintains the same video encoding as the original broadcast, preferably eliminating the encode-decode process, which is expensive.
  • adaptive encoding can be implemented with a preferred embodiment of the present invention.
  • the media stream 16 is preferably transmitted by the transmitter 40 as fast as the network 38 allows.
  • the transmitter 40 typically needs a buffer (not shown) to store data which cannot be sent immediately.
  • the buffer of the transmitter 40 generally does not incorporate another delay into the system 10 as the default state of the buffer of the transmitter 40 is empty (whereas the default state of a buffer of a receiver is generally full or close to full).
  • Fig. 2 is a partly pictorial, partly block diagram view of one of the set-top boxes 36, namely, a set-top box 42, for use in the system 10 of Fig. 1, constructed and operative in accordance with a preferred embodiment of the present invention.
  • Fig. 3 is a flow chart showing a preferred method of operation of the set-top box 42 of Fig. 2.
  • the set-top box 42 preferably includes a receiver 44, a receive buffer 46, a decoder 48, a fill-level determiner 50 and a fill-level manager 52.
  • the receiver 44 is preferably operative to receive the packets 18 from the home network 38 (block 74).
  • the receive buffer 46 is preferably operationally connected to the receiver 44.
  • the receive buffer 46 is preferably operative to store the packets 18 (block 76).
  • the decoder 48 which is preferably operationally connected to the receive buffer 46, is preferably operative to receive the packets 18 from the receive buffer 46 and decode the packets 18 (block 78).
  • the decoder 48 typically includes a decode buffer 45 to receive the packets 18 from the receive buffer 46 prior to decoding.
  • the decode buffer 45 is typically the MPEG variable bitrate (VBR) buffer whose level may vary wildly.
  • the decoder 48 is generally in complete control of the decode buffer 45 and typically makes sure that the decode buffer 45 never runs out of data. Ascertaining any information about the level of the decode buffer 45 is generally practically impossible.
  • the decode buffer 45 and receive buffer 46 are typically implemented in a single physical buffer, but are logically separate.
  • the decode buffer 45 and the receive buffer 46 may be physically and logically separate.
  • the decode buffer 45 and the receive buffer 46 may be included in a single physical and logical buffer (a hybrid buffer 47) requiring special treatment described below in more detail.
  • the fill-level determiner 50, preferably operationally connected to the receive buffer 46, is preferably operative to determine a fill-level of the receive buffer 46 as a time difference between: an oldest timestamp 56 of the timestamps 20 of the packets 18 currently stored in the receive buffer 46; and a newest timestamp 58 of the timestamps 20 of the packets 18 currently stored in the receive buffer 46 (block 80).
  • the timestamps 20 used in the fill-level determination are preferably of the same type (for example, both the oldest timestamp 56 and the newest timestamp 58 are PCRs) and not a combination of different types of timestamps.
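  • As a minimal sketch of the fill-level determination described above, assuming the timestamps are PCRs expressed as 27 MHz tick counts (a 33-bit base multiplied by 300 plus a 9-bit extension, as in MPEG-2 transport streams), the fill-level can be computed as the wrap-aware difference between the newest and the oldest timestamp. The names `PCR_HZ`, `PCR_MODULUS` and `fill_level_ms` are illustrative and do not appear in the specification.

```python
from collections import deque

PCR_HZ = 27_000_000            # PCR tick rate (assumption: MPEG-2 TS style PCRs)
PCR_MODULUS = (1 << 33) * 300  # PCR value at which the counter wraps to zero


def fill_level_ms(buffered_pcrs):
    """Return the playback time held in the buffer, in milliseconds.

    buffered_pcrs holds one PCR per buffered packet, oldest first.
    The buffer is assumed to span less than one full PCR wrap.
    """
    if len(buffered_pcrs) < 2:
        return 0.0
    oldest = buffered_pcrs[0]
    newest = buffered_pcrs[-1]
    # Modular subtraction keeps the result correct across a PCR wrap.
    ticks = (newest - oldest) % PCR_MODULUS
    return ticks * 1000.0 / PCR_HZ


receive_buffer = deque([0, 13_500_000, 27_000_000])
print(fill_level_ms(receive_buffer))  # 1000.0
```

  • Because the result is a duration and not a byte count, the same thresholds can be applied whatever the instantaneous bitrate of the stream.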
  • each packet 18 may have more than one type of timestamp, for example, but not limited to, timestamps generated by the encoder 14, a multiplexer (not shown) and/or the transmitter 40, such as program clock references (PCRs), frame decode time stamps (DTSs), frame presentation time stamps (PTSs), time stamps of IP packets (e.g.: reference time stamps (RTSs)), and timestamps or time codes originating from video sources (e.g.: vertical interval time code (VITC)).
  • when the hybrid buffer 47 is used, the fill-level determiner 50 needs to take into account the decode timestamps (not shown) of the packets 18, allowing calculation of what the decode buffer 45 fill-level would have been if the decode buffer 45 were separate from the receive buffer 46 (at least logically separate).
  • the decode buffer 45 fill-level then needs to be subtracted from the total fill-level of the hybrid buffer 47, leaving a logical receive buffer fill-level.
  • the total fill-level of the hybrid buffer is determined as a time difference between an oldest timestamp (not shown) of the timestamps 20 of the packets 18 currently stored in the hybrid buffer; and a newest timestamp (not shown) of the timestamps 20 of the packets 18 currently stored in the hybrid buffer.
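  • The hybrid-buffer subtraction can be sketched as follows. The text does not say how the decode buffer fill-level is derived from the decode timestamps, so this sketch assumes one plausible interpretation: the packet at the head of the hybrid buffer would have waited in a separate decode buffer for the interval between its arrival-clock timestamp and its decode timestamp. The `Packet` type, the millisecond units and all function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class Packet(NamedTuple):
    pcr_ms: float  # arrival-clock timestamp, in ms (assumed unit)
    dts_ms: float  # decode timestamp, in ms (assumed unit)


def total_fill_ms(packets: Sequence[Packet]) -> float:
    """Newest minus oldest timestamp over the whole hybrid buffer."""
    if len(packets) < 2:
        return 0.0
    return packets[-1].pcr_ms - packets[0].pcr_ms


def decode_fill_ms(packets: Sequence[Packet]) -> float:
    """Assumed model: time the head packet would sit in a separate decode buffer."""
    if not packets:
        return 0.0
    head = packets[0]
    return max(0.0, head.dts_ms - head.pcr_ms)


def logical_receive_fill_ms(packets: Sequence[Packet]) -> float:
    """Total hybrid fill-level minus the estimated decode buffer fill-level."""
    total = total_fill_ms(packets)
    return total - min(total, decode_fill_ms(packets))


hybrid = [Packet(0.0, 300.0), Packet(500.0, 800.0), Packet(1000.0, 1300.0)]
print(logical_receive_fill_ms(hybrid))  # 700.0
```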
  • the fill-level manager 52, which is preferably operationally connected to the receive buffer 46, is preferably operative to perform an action based on the determined fill-level.
  • the action typically includes adjusting the playback speed of the media stream 16 by the decoder 48 based on the determined fill-level of the receive buffer 46 (block 82).
  • the playback speed is preferably adjusted in accordance with predefined threshold levels of the buffer fill-level, so that when the buffer fill-level falls below a certain level the playback speed is reduced. When the buffer fill-level falls below another threshold level, the playback speed is reduced again, and so on.
  • the adjustment of the playback speed is performed such that the playback speed is proportional to fill-level, so that the lower the fill-level, the lower the playback speed. Therefore, the playback speed is generally decreased (either smoothly or in steps) as the fill-level of the receive buffer 46 drops, reducing the rate that the receive buffer 46 empties. If the consumption rate of the media stream 16 is reduced to less than the current network 38 throughput, then the receive buffer 46 generally starts to fill.
  • the receive buffer 46 fill-level is typically in effect acting as a feedback mechanism to match the rate of data consumption from the receive buffer 46 to the rate of data acquisition from the network 38.
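  • The two speed-control variants described above, stepped thresholds and a speed proportional to the fill-level, might be sketched as follows. The threshold values are borrowed from the worked example later in the text, the nominal level of 1000 ms is an assumption, and the function names are illustrative.

```python
# (fill-level floor in ms, playback speed while the fill-level is above that floor)
# Values follow the worked example in the text; they are examples only.
STEPS = [(500, 1.0), (400, 0.8), (300, 0.6), (200, 0.4), (100, 0.2)]


def stepped_speed(fill_ms):
    """Return the playback speed for a fill-level using fixed thresholds."""
    for floor_ms, speed in STEPS:
        if fill_ms > floor_ms:
            return speed
    return 0.0  # buffer nearly empty: freeze playback


def proportional_speed(fill_ms, nominal_ms=1000.0):
    """Return a speed that falls smoothly with the fill-level, clamped to [0, 1]."""
    return max(0.0, min(1.0, fill_ms / nominal_ms))


for level in (1000, 500, 400, 100):
    print(level, stepped_speed(level), proportional_speed(level))
```

  • The stepped variant changes speed only at a few well-defined points, whereas the proportional variant reacts to every change in the fill-level.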
  • the decoder 48 is preferably allowed to play faster than real-time once network throughput is restored, thereby allowing the transmitter 40 to transmit the media stream 16 faster than real-time, emptying the buffer of the transmitter 40 and restoring the system 10 to the default steady state of an empty buffer of the transmitter 40 and a full receive buffer 46.
  • allowing the decoder 48 to determine when to play faster than real-time typically requires the receive buffer 46 to be maintained at a nominal full-level which is slightly less than the full capacity of the receive buffer 46. If the receive buffer 46 fill-level exceeds the nominal full-level, then the decoder 48 typically plays faster than real time.
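  • The feedback behaviour, including faster-than-real-time playback above the nominal full-level, can be illustrated with a small simulation. This is a sketch under stated assumptions: the nominal level, capacity, threshold, playback rates and tick length are all hypothetical, and `throughput` is expressed as media time delivered per unit of real time (1.0 meaning real-time delivery).

```python
NOMINAL_MS = 1000.0   # nominal full-level (assumed)
CAPACITY_MS = 1200.0  # full capacity, slightly above the nominal level (assumed)
LOW_MS = 500.0        # slow-down threshold (assumed)


def choose_speed(fill_ms):
    """Pick a playback speed from the current fill-level."""
    if fill_ms > NOMINAL_MS:
        return 1.1  # faster than real-time (hypothetical rate)
    if fill_ms > LOW_MS:
        return 1.0
    return 0.8      # slower than real-time


def simulate(throughputs, fill_ms=NOMINAL_MS, tick_ms=100.0):
    """Return the fill-level after each tick for a sequence of network throughputs."""
    history = []
    for throughput in throughputs:
        speed = choose_speed(fill_ms)
        # The buffer gains what the network delivers and loses what the decoder plays.
        fill_ms += tick_ms * (throughput - speed)
        fill_ms = max(0.0, min(CAPACITY_MS, fill_ms))
        history.append(fill_ms)
    return history


# Congestion at half rate, then recovery at 1.5 times real-time.
print(simulate([0.5] * 10 + [1.5] * 10))
```

  • With a sustained throughput of 0.9, this sketch keeps the fill-level hovering near the 500 ms threshold instead of draining, which is the rate-matching effect described above.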
  • the size and cost of the receive buffer 46, as well as the delay caused by the receive buffer 46, are generally minimized. Additionally, by maintaining the receive buffer 46 fill-level above zero, the noticeable glitches and stutters in the audio and video caused by reduced network throughput are generally eliminated.
  • the action performed by the fill-level manager 52 includes reestablishing a connection with the network 38, for example, if the fill-level of the receive buffer 46 drops below a predetermined level or if the fill-level is below a predetermined level for a predetermined time period.
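  • The two reconnection triggers named above (a drop below a predetermined level, or remaining below a predetermined level for a predetermined time period) might be tracked as in the following sketch. The class name `ReconnectPolicy` and all threshold values are hypothetical.

```python
class ReconnectPolicy:
    """Decide when the fill-level manager should reestablish the connection."""

    def __init__(self, critical_ms=100.0, low_ms=300.0, max_low_duration_ms=2000.0):
        self.critical_ms = critical_ms                  # reconnect immediately below this
        self.low_ms = low_ms                            # start timing below this
        self.max_low_duration_ms = max_low_duration_ms  # tolerated time below low_ms
        self._low_since = None                          # time the fill-level first went low

    def should_reconnect(self, fill_ms, now_ms):
        if fill_ms < self.critical_ms:
            return True
        if fill_ms < self.low_ms:
            if self._low_since is None:
                self._low_since = now_ms
            return now_ms - self._low_since >= self.max_low_duration_ms
        # Fill-level has recovered: reset the timer.
        self._low_since = None
        return False


policy = ReconnectPolicy()
print(policy.should_reconnect(250.0, 0.0))     # False
print(policy.should_reconnect(250.0, 2000.0))  # True
```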
  • a sensible receive-buffer level in a non-variable playback speed set-top box may be equal to the maximum expected delay (1 second) multiplied by the maximum bitrate (10 megabits per second) multiplied by a safety factor of 2, giving 20 megabits or 2.5 megabytes. So a buffer of 2.5 megabytes generally insulates against receive buffer under-runs, but typically at a considerable cost in terms of total delay. For example, if the client tunes to the service whilst the service is in the 1 megabit per second mode then 20 seconds of data are buffered prior to decoding.
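  • The sizing arithmetic above can be reproduced directly. The function names are illustrative; the figures (1 second, 10 megabits per second, safety factor of 2, and the 1 megabit per second mode) are those given in the text.

```python
def buffer_size_bits(max_delay_s, max_bitrate_bps, safety_factor):
    """Data-threshold sizing: delay x peak bitrate x safety factor."""
    return max_delay_s * max_bitrate_bps * safety_factor


def startup_delay_s(size_bits, current_bitrate_bps):
    """Time needed to fill the buffer at the current service bitrate."""
    return size_bits / current_bitrate_bps


size_bits = buffer_size_bits(1, 10_000_000, 2)
print(size_bits)                               # 20000000 bits
print(size_bits / 8 / 1_000_000)               # 2.5 megabytes
print(startup_delay_s(size_bits, 10_000_000))  # 2.0 seconds in the high-bitrate mode
print(startup_delay_s(size_bits, 1_000_000))   # 20.0 seconds in the low-bitrate mode
```

  • The tenfold spread in start-up delay comes entirely from sizing the buffer in bits; a buffer measured in playback time does not have this dependence on the bitrate.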
  • the playback speed of the decoder 48 is adjustable allowing the receive buffer 46 to refill. Therefore, in the above example, where the home network 38 usually introduces a delay of less than 500 ms, but occasionally introduces a delay of up to 1 second, the set-top box 42 only needs to delay the media stream 16 by slightly more than the usual network delay (of less than 500 ms) to say, 1000 ms.
  • any usual delay (of less than 500 ms) would reduce the receive buffer 46 fill-level from 1000 ms to 501 ms or more and not affect the playback speed.
  • An occasional network delay of 500 ms or more reduces the determined fill-level of the receive buffer 46 (as determined by the fill-level determiner 50) to 500 ms or less, triggering slower than real-time playback at 80% of the real-time playback speed.
  • the amount of effective time left in the receive buffer 46, known as the new effective fill-level, is inversely proportional to the playback speed.
  • the effective amount of data in the receive buffer 46 increases as the data is used at a slower rate. Therefore, if there is 500 ms of data in the receive buffer 46 when the decoder 48 is playing at 100% playback speed, there is effectively 625 ms of data if played at 80% speed (500 ms divided by 80%). Therefore, there is an effective fill level at the new 80% speed of 625 ms.
  • the effective fill level has now also dropped from 625 ms (row 2 of table 1) to 500 ms (row 3 of table 1).
  • the effective fill level is calculated by dividing the determined fill-level of 400 ms by the speed of 80%, giving 500 ms.
  • the drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 400 ms) triggers a further reduction in the speed of the decoder 48 to 60% speed in order to compensate for the additional network delay, resulting in a new effective fill-level of 667 ms (400 ms divided by 60%).
  • an additional delay is introduced, for example, 250 ms (giving a total delay of 1042 ms).
  • the additional delay of 250 ms is associated with a reduction in the determined fill-level by 100 ms, from 300 ms to 200 ms (as determined by the fill-level determiner 50) at 40% speed (100 ms divided by 40% equals 250 ms).
  • the effective fill level has now also dropped from 750 ms (row 4 of table 1) to 500 ms (row 5 of table 1).
  • the effective fill level is calculated by dividing the determined fill-level of 200 ms by the speed of 40%, giving 500 ms.
  • an additional delay is introduced, for example, 500 ms (giving a total delay of 1542 ms).
  • the additional delay of 500 ms is associated with a reduction in the determined fill-level by 100 ms, from 200 ms to 100 ms (as determined by the fill-level determiner 50) at 20% speed (100 ms divided by 20% equals 500 ms).
  • the effective fill level has now also dropped from 1000 ms (row 5 of table 1) to 500 ms (row 6 of table 1).
  • the effective fill level is calculated by dividing the determined fill-level of 100 ms by the speed of 20%, giving 500 ms.
  • the drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 100 ms), triggers a further reduction in the speed of the decoder 48 to 0% speed, effectively freezing the decoder 48 and also typically triggering an action to reestablish a connection with the network 38. It will be appreciated by those ordinarily skilled in the art that an action to reestablish a connection with the network 38 may be taken at any other suitable trigger point and/or after a predetermined time period of reduced network throughput.
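  • The figures quoted from table 1 can be recomputed with a short sketch, assuming a starting fill-level of 1000 ms and speed steps of 80%, 60%, 40%, 20% and 0% triggered at determined fill-levels of 500, 400, 300, 200 and 100 ms. The 300 ms step is inferred from the surrounding figures (750 ms effective at 40% speed), and the function and variable names are illustrative. Speeds are kept as integer percentages to avoid rounding noise.

```python
import math

START_FILL_MS = 1000
# (determined fill-level in ms that triggers the step, new speed in percent)
STEPS = [(500, 80), (400, 60), (300, 40), (200, 20), (100, 0)]


def effective_fill_ms(determined_ms, speed_pct):
    """Playback time remaining at the given speed (infinite when frozen)."""
    if speed_pct == 0:
        return math.inf
    return determined_ms * 100 / speed_pct


def walk_through():
    """Return rows of (fill, new speed, effective before, effective after, total delay)."""
    rows = []
    speed_pct = 100
    prev_fill = START_FILL_MS
    total_delay = 0.0
    for fill, new_speed in STEPS:
        # Real time by which the network fell behind while the fill-level
        # dropped from prev_fill to fill at the current playback speed.
        total_delay += (prev_fill - fill) * 100 / speed_pct
        rows.append((fill, new_speed,
                     effective_fill_ms(fill, speed_pct),
                     effective_fill_ms(fill, new_speed),
                     round(total_delay)))
        speed_pct, prev_fill = new_speed, fill
    return rows


for row in walk_through():
    print(row)
```

  • Each step is triggered when the effective fill-level has fallen back to 500 ms, and the cumulative delays of 1042 ms and 1542 ms match the totals quoted in the text.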
  • the delay after channel change has generally reduced from a variable 2 to 20 seconds to a fixed 1 second.
  • the decoder 48 does not typically under-run when the VBR coding is high (for example, but not limited to, 10 megabits per second) or slow down unnecessarily when the VBR coding is low (for example, but not limited to, 1 megabit per second). Additionally, the set-top box 42 generally does not cause a short delay (for example, but not limited to, 2 seconds) when tuned to a service operating in a high coding phase, or a long delay (for example, but not limited to, 20 seconds) when tuned to a service operating in a low coding phase.
  • a steady delay of 1 second is preferably achieved no matter what phase the VBR coding is in during channel change. Reducing the delay through the system 10 is generally very beneficial, as less memory is typically required in the receive buffer 46, and the system 10 is generally far more responsive to the user.
  • the parameters used in table 1 are only examples. In practice, values are typically chosen based on the statistical distribution of the delays in the system 10, the amount of memory available in the set-top box 42, and the amount of delay the user and/or operator feels is acceptable.
  • Data may be delayed in the network 38 due to congestion. However, after the congestion clears, the components in the network 38 will generally try to deliver the data as quickly as possible. In wireless networking, the post-congestion delivery may not be significantly faster, as the network 38 normally runs close to capacity. Hence, there is little additional bandwidth to overwhelm the set-top boxes 36. In wired networks, and some wireless networks, data can be transferred many times faster than the normal rate. Therefore, following a period of congestion, the decoder 48 may receive more data than can be handled.
  • the receiver decoder 30 in Fig. 1 is the server of the data in the system 10. The server then has several options to resolve the problem. First, provide a sufficiently large transmit queue to cater for the variable consumption of the receiver. Second, reduce the bitrate of the media stream 16 to compensate for the lower throughput (if transcoding or similar is available).
  • the original timing of the media stream 16 is typically maintained. However, a transcoder (not shown) may generate new timing. If new timing is generated, the fill-level of the receive buffer 46 may be based on the new timing or the original timing. Although basing the fill-level on the new timing is preferred, the clock drift between the original and new timestamps is generally so small as to have a negligible effect on the receive buffer 46.
  • the media stream 16 can be recorded to disk.
  • the media stream 16 is then played from the disk so that the disk acts as a suitable buffer.
  • the network 38 may be reconfigured, or streams dropped to provide the best service for the available network.
  • the set-top box 42 includes an audio module 54.
  • the audio module 54 is preferably operative to adjust the pitch of an audio element (not shown) of the media stream 16 in order to compensate for the adjustment of the playback speed of the decoder 48 using pitch shifting (block 86).
  • a preferred method to correct the pitch error caused by adjusting the playback speed is to apply a Fourier transform to convert the audio into the frequency domain. Then, the frequency domain values are shifted up or down in frequency, as appropriate. Finally, applying an inverse Fourier transform converts the audio back to the time domain. Playing the audio at the adjusted playback speed causes the pitch of the audio to be shifted, canceling out the effect of the pitch shift performed with the Fourier transforms so that the pitch of the actor's voice remains constant and the user typically does not perceive the change in playback speed. In effect, it generally appears as if the actor is talking slightly faster, or slower, but still with the same tone of voice.
  • Pitch shifting is generally quite simple for digitally compressed audio decoders to achieve.
  • In digital audio, the compressed input data is typically already in the frequency domain, and so the first Fourier transform is generally unnecessary.
  • the audio decoding generally requires an inverse Fourier transform to be applied anyway. Typically, the only difference is the shifting of the frequency samples up or down by the appropriate amount.
  • the pitch shifting technique is typically implemented in set-top boxes as part of the review buffer functionality allowing viewers to start watching a live program from the beginning of the program even after the start time.
  • the content is generally played slightly faster than real-time to enable the viewer to gradually catch up with the live action.
  • Fig. 4 is a partly pictorial, partly block diagram view of a media stream system 60 constructed and operative in accordance with another preferred embodiment of the present invention.
  • the system 60 is substantially the same as the system 10 of Figs. 1-3 except for the following differences.
  • the system 60 includes a Headend 62 with an encoder 64.
  • the Headend 62 is preferably operative to broadcast programming via an Internet Protocol 66 (IP) to subscribers including a subscriber 68.
  • the subscriber 68 receives programming from the Headend 62 via a set-top box 70 (or PVR).
  • the set-top box 70 is connected to the Internet Protocol 66 via a residential gateway 72.
  • the system 60 is typically an Internet Protocol Television (IPTV) system and/or a Video-on-demand (VOD) system, by way of example only.
  • Bandwidth may be restricted at various sections of the system 60.
  • When the Headend 62 is located at the office of the content provider (not shown), the content first needs to be sent across the Internet 66 to the Internet Service Provider (ISP) (not shown) for broadcast to the subscribers.
  • When the Headend 62 is located in the server room, for example, the transfer to the ISP across the Internet 66 is avoided. Congestion may occur at any point in the Internet 66 as well as in the home network.

Abstract

A system for processing packets of a media stream, the packets being associated with a plurality of timestamps, the packets being at least partially transferred via a bandwidth constrained network, the system including a receiver to receive the packets, a buffer, operationally connected to the receiver, to store the packets, a decoder, operationally connected to the buffer, to receive the packets from the buffer and to decode the packets, a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer and a newest timestamp of the timestamps of the packets currently stored in the buffer, and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level. Related apparatus and methods are also described.

Description

BUFFER MANAGEMENT
FIELD OF THE INVENTION
The present invention relates to video decoders, and in particular, to buffer fill-level management in video decoders.
BACKGROUND OF THE INVENTION
By way of introduction, bandwidth constrained networks, for example, but not limited to, wireless networks such as wireless home networks, or wired networks such as wired home networks and the Internet, may suffer periods of reduced throughput. Therefore, the available bandwidth for a video service, transmitted via a bandwidth-constrained network to receivers, may be less than that required.
During periods of reduced throughput, a buffer in a receiver may start to empty, until at some point, the buffer runs out of data. Whilst the buffer fullness remains above zero, there is no noticeable effect on the video or audio output. However, if the buffer empties, the output will suffer, usually noticeable as a glitch or freeze in the video and audio output.
Increasing the buffer size allows longer periods of reduced network throughput. However, there is more data in the buffer, which means that the time delay between entering the buffer and leaving the buffer is increased. This has two disadvantages. First, the cost of the extra memory cannot always be justified. Second, the response time to key presses such as channel change, pause and rewind will always be greater than the delay through the buffer, typically making the system too unresponsive for the user.
Another technique to cope with periods of reduced throughput is to recode the video stream in order to reduce the bitrate in line with the available network bandwidth. However, this technique has disadvantages. First, an additional decoder and encoder are required in the transmission line increasing costs. Second, the resulting video is of a lower quality than the originally broadcast video. The following references are also believed to represent the state of the art:
US Patent 4,413,289 to Weaver, et al.;
US Patent 5,396,497 to Veltman;
US Patent 5,526,362 to Thompson, et al.;
US Patent 5,644,677 to Park, et al.;
US Patent 5,805,602 to Cloutier, et al.;
US Patent 6,108,286 to Eastty;
US Patent 6,493,298 to Youn;
US Patent 6,553,455 to Asano, et al.;
US Patent 6,970,895 to Vaidyanathan, et al.;
US Patent 6,999,447 to D'Amico, et al.;
US Published Patent Application 2003/0208359 of Kang, et al.;
US Published Patent Application 2003/0165326 of Blair, et al.;
US Published Patent Application 2004/0204945 of Okuda, et al.;
US Published Patent Application 2004/0019491 of Rhee;
US Published Patent Application 2005/0100054 of Scott, et al.;
US Published Patent Application 2005/0265334 of Koguchi;
US Published Patent Application 2006/0092282 of Herley, et al.;
US Published Patent Application 2006/0015348 of Cooper, et al.;
PCT Published Patent Application WO 2004/064301 of Thompson Licensing S. A.;
PCT Published Patent Application WO 2004/062291 of Koninklijke Philips Electronics N.V.;
European Published Patent Application EP 1 394 975 of Zarlink Semiconductor Limited; and
Abstract of Japanese Published Patent Application JP2005322995 of Nippon Telegraph and Telephone.
The disclosures of all references mentioned above and throughout the present specification, as well as the disclosures of all references mentioned in those references, are hereby incorporated herein by reference.
SUMMARY OF THE INVENTION
The present invention seeks to provide an improved buffer fill-level management system.
The present invention, in preferred embodiments thereof, monitors a fill-level of a receive buffer. If the receive buffer starts to empty, an action is performed to correct the situation.
Typically, the fill-level of the receive buffer is managed by varying decoder playback speed based on the fill-level of the receive buffer. If the receive buffer has an adequate fill-level, then playback proceeds at normal play speeds. If the receive buffer reduces below a threshold fill-level, the play speed is reduced in order to replenish the receive buffer. Further play speed reductions may be necessary. If the receive buffer is very depleted, an action such as re-establishing the network connection may be performed.
In a variable bitrate (VBR) service, sometimes referred to as a statistically multiplexed or a stat-muxed service, the amount of data sent per second changes quite dramatically, usually between 1 megabit per second and 10 megabits per second, by way of example only. Therefore, it is very difficult to set a data threshold for the receive buffer for use in controlling the variable speed playback. For example, a threshold requiring the video to be played slower than real-time for a 1 megabit per second stream would typically be far lower than that for a 10 megabits per second stream. However, as the stream bitrate is changing frame by frame in an unpredictable way, there is generally not a stable data-threshold which would be appropriate.
The fill-level of the receive buffer is preferably measured using timing information in the stream in order to determine the amount of playback time in the receive buffer, thereby controlling the playback speed based on time thresholds rather than data thresholds. It will be appreciated by those ordinarily skilled in the art that controlling the playback speed based on timing information can be used for VBR services and non-VBR services alike. If playback speed is reduced by simply playing video frames and audio samples slower, then the audio is pitch-shifted down giving the actors deeper voices, for instance females would start to sound like males. Although playing the video slightly slower is generally not noticed by the viewers, a small change in audio pitch is readily heard.
Therefore, in accordance with preferred embodiments of the present invention, pitch shifting technology is employed to pitch-shift the audio during the decoding process to cancel out the pitch-shifting introduced by varying the playback speed of the stream. The actors have the same tone of voice, but are delivering the lines at a slower or faster rate.
There is thus provided in accordance with a preferred embodiment of the present invention a system for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the packets being at least partially transferred via a bandwidth constrained network, the system including a receiver to receive the packets, a buffer, operationally connected to the receiver, to store the packets, a decoder, operationally connected to the buffer, to receive the packets from the buffer and decode the packets, a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer, and a newest timestamp of the timestamps of the packets currently stored in the buffer, and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level. Further in accordance with a preferred embodiment of the present invention the fill-level manager is operative to adjust a playback speed of the media stream by the decoder based on the determined fill-level of the buffer.
Still further in accordance with a preferred embodiment of the present invention the fill-level manager is operative to adjust the playback speed of the media stream by the decoder at a plurality of thresholds of the buffer fill-level. Additionally in accordance with a preferred embodiment of the present invention, the system includes an audio module to adjust the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed of the decoder. Moreover in accordance with a preferred embodiment of the present invention the audio module is operative to adjust the pitch using pitch shifting.
Further in accordance with a preferred embodiment of the present invention the fill-level manager is operative to reestablish a connection with the network. Still further in accordance with a preferred embodiment of the present invention the timestamps are assigned by the encoder at a time of encoding of each of the packets.
Additionally in accordance with a preferred embodiment of the present invention the timestamps are program clock references. Moreover in accordance with a preferred embodiment of the present invention the buffer is a receive buffer, the decoder including a decode buffer.
Further in accordance with a preferred embodiment of the present invention the buffer includes a receive buffer and a decode buffer.
There is also provided in accordance with still another preferred embodiment of the present invention a method for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the packets being at least partially transferred via a bandwidth constrained network, the method including receiving the packets, storing the packets in a buffer, decoding the packets, determining a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer, and a newest timestamp of the timestamps of the packets currently stored in the buffer, and performing an action based on the determined fill-level. Still further in accordance with a preferred embodiment of the present invention performing the action includes adjusting a playback speed of the media stream based on the determined fill-level of the buffer.
Additionally in accordance with a preferred embodiment of the present invention the adjusting of the playback speed is performed at a plurality of thresholds of the buffer fill-level.
Moreover in accordance with a preferred embodiment of the present invention, the method includes adjusting the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed. Further in accordance with a preferred embodiment of the present invention the adjusting the pitch includes pitch shifting.
Still further in accordance with a preferred embodiment of the present invention performing the action includes reestablishing a connection with the network. Additionally in accordance with a preferred embodiment of the present invention the timestamps are assigned by the encoder at a time of encoding of the packets.
Moreover in accordance with a preferred embodiment of the present invention the timestamps are program clock references. Further in accordance with a preferred embodiment of the present invention the buffer is a receive buffer, the decoder including a decode buffer.
Still further in accordance with a preferred embodiment of the present invention the buffer includes a receive buffer and a decode buffer.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:
Fig. 1 is a partly pictorial, partly block diagram view of a media stream system constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 2 is a partly pictorial, partly block diagram view of a set-top box for use in the system of Fig. 1 constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 3 is a flow chart showing a preferred method of operation of the set-top box of Fig. 2; and
Fig. 4 is a partly pictorial, partly block diagram view of a media stream system constructed and operative in accordance with another preferred embodiment of the present invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
Reference is now made to Fig. 1, which is a partly pictorial, partly block diagram view of a media stream system 10 constructed and operative in accordance with a preferred embodiment of the present invention. The system 10 preferably includes a broadcaster Headend 12 having an encoder 14 for encoding a media stream 16. The media stream 16 generally includes a plurality of packets 18. The Headend 12 also typically includes a clock 22, preferably operationally associated with the encoder 14. The packets 18 are typically associated with a plurality of timestamps 20. The timestamps 20 are preferably assigned by the encoder 14 based on the time provided by the clock 22, such that each packet 18 is associated with one timestamp 20 assigned at the time of encoding each packet 18. By way of example, in an MPEG system, the timestamps 20 are typically program clock references (PCRs).
The Headend 12 also typically includes a transmitter 24 for broadcasting the media stream 16 to a plurality of subscribers 28 (only one shown for the sake of clarity) via a satellite 26. However, it will be appreciated by those ordinarily skilled in the art that the media stream 16 may be transmitted by any suitable transmission method, for example, but not limited to, cable, terrestrial communication or Internet Protocol (IP). The media stream 16 is preferably received by a satellite dish 32 attached to a house 34 of the subscriber 28. The media stream 16 is then typically received by a receiver decoder 30 which is operationally connected to the satellite dish 32. In the example of Fig. 1, the receiver decoder 30 is a personal video recorder incorporating set-top box functionality with video recording functionality. However, it will be appreciated by those ordinarily skilled in the art that the receiver decoder 30 may be any suitable receiving device such as a set-top box or a suitable computer. The receiver decoder 30 is typically connected to other set-top boxes 36 in the house 34 via a home network 38. The home network 38 connects the set-top boxes 36 to the receiver decoder 30 enabling the set-top boxes 36 to play content received via the receiver decoder 30. The home network 38 is typically a bandwidth-constrained network which is a wired or wireless network. Therefore, the packets 18 of the media stream 16, received by the set-top boxes 36, are at least partially transferred via a bandwidth constrained network.
The media stream 16 is transmitted by a transmitter 40 of the home network 38 from the receiver decoder 30 to the set-top boxes 36. The term "transmitter" as used in the specification and claims is defined as an arrangement to send the media stream 16 from one device to another, via a wired or wireless network. Some systems use rate adaptive encoding in the transmitter 40 to adjust the service bitrate of the media stream 16 to the available bandwidth of the network 38. Adaptive encoding introduces an inherent reduction in quality, due to the non-perfect decode-encode stage. Moreover, picture quality may be reduced considerably when reducing the bitrate during a period of poor performance of the home network 38. The system 10 of the present invention, in preferred embodiments thereof, generally maintains the same video encoding as the original broadcast, preferably eliminating the encode-decode process, which is expensive. However, it will be appreciated by those ordinarily skilled in the art that adaptive encoding can be implemented with a preferred embodiment of the present invention.
Additionally, the media stream 16 is preferably transmitted by the transmitter 40 as fast as the network 38 allows. The transmitter 40 typically needs a buffer (not shown) to store data which cannot be sent immediately. The buffer of the transmitter 40 generally does not incorporate another delay into the system 10 as the default state of the buffer of the transmitter 40 is empty (whereas the default state of a buffer of a receiver is generally full or close to full).
Reference is now made to Figs. 2 and 3. Fig. 2 is a partly pictorial, partly block diagram view of one of the set-top boxes 36, namely, a set-top box 42, for use in the system 10 of Fig. 1, constructed and operative in accordance with a preferred embodiment of the present invention. Fig. 3 is a flow chart showing a preferred method of operation of the set-top box 42 of Fig. 2.
The set-top box 42 preferably includes a receiver 44, a receive buffer 46, a decoder 48, a fill-level determiner 50 and a fill-level manager 52. The receiver 44 is preferably operative to receive the packets 18 from the home network 38 (block 74).
The receive buffer 46 is preferably operationally connected to the receiver 44. The receive buffer 46 is preferably operative to store the packets 18 (block 76).
The decoder 48, which is preferably operationally connected to the receive buffer 46, is preferably operative to receive the packets 18 from the receive buffer 46 and decode the packets 18 (block 78). The decoder 48 typically includes a decode buffer 45 to receive the packets 18 from the receive buffer 46 prior to decoding. In an MPEG system, the decode buffer 45 is typically the MPEG variable bitrate (VBR) buffer whose level may vary wildly. The decoder 48 is generally in complete control of the decode buffer 45 and typically makes sure that the decode buffer 45 never runs out of data. Ascertaining any information about the level of the decode buffer 45 is generally practically impossible. The decode buffer 45 and receive buffer 46 are typically implemented in a single physical buffer, but are logically separate.
Alternatively, the decode buffer 45 and the receive buffer 46 may be physically and logically separate.
Alternatively, the decode buffer 45 and the receive buffer 46 may be included in a single physical and logical buffer (a hybrid buffer 47) requiring special treatment described below in more detail.
The fill-level determiner 50, preferably operationally connected to the receive buffer 46, is preferably operative to determine a fill-level of the receive buffer 46 as a time difference between: an oldest timestamp 56 of the timestamps 20 of the packets 18 currently stored in the receive buffer 46; and a newest timestamp 58 of the timestamps 20 of the packets 18 currently stored in the receive buffer 46 (block 80).
The timestamps 20 used in the fill-level determination are preferably of the same type (for example, both the oldest timestamp 56 and the newest timestamp 58 are PCRs) and not a combination of different types of timestamps. It should be noted that each packet 18 may have more than one type of timestamp, for example, but not limited to, timestamps generated by the encoder 14, a multiplexer (not shown) and/or the transmitter 40, such as program clock references (PCRs), frame decode time stamps (DTSs), frame presentation time stamps (PTSs), time stamps of IP packets (e.g.: reference time stamps (RTSs)), and timestamps or time codes originating from video sources (e.g.: vertical interval time code (VITC)).
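The time-based fill-level determination described above can be illustrated with a minimal Python sketch. The `Packet` record, the millisecond timestamp unit and the function name `fill_level_ms` are assumptions made for illustration only; timestamp wrap-around (as occurs with real PCRs) is deliberately ignored.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record: one timestamp (in ms) plus its payload."""
    timestamp_ms: int
    payload: bytes = b""


def fill_level_ms(buffer):
    """Return the playback time held in the buffer, in milliseconds.

    The level is the newest stored timestamp minus the oldest stored
    timestamp, so it does not depend on how many bytes the packets occupy.
    All timestamps are assumed to be of the same type (e.g. all PCR-derived).
    """
    if not buffer:
        return 0
    timestamps = [packet.timestamp_ms for packet in buffer]
    return max(timestamps) - min(timestamps)


# Example: packets of very different sizes, as in a VBR stream.
receive_buffer = deque([
    Packet(1000, b"x" * 10),
    Packet(1040, b"x" * 5000),
    Packet(1500, b"x" * 200),
])
level = fill_level_ms(receive_buffer)
```

In this sketch the level is 500 ms regardless of the payload sizes, which is the property that makes a time threshold usable for a variable bitrate stream.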
If the set-top box 42 is implemented with the hybrid buffer 47 combining the receive buffer 46 and the decode buffer 45, then the fill-level determiner 50 needs to take into account the decode timestamps (not shown) of the packets 18, allowing calculation of what the decode buffer 45 fill-level would have been if the decode buffer 45 was separate from the receive buffer 46 (at least logically separate). The decode buffer 45 fill-level then needs to be subtracted from the total fill-level of the hybrid buffer leaving a logical receive buffer fill-level. The total fill-level of the hybrid buffer is determined as a time difference between an oldest timestamp (not shown) of the timestamps 20 of the packets 18 currently stored in the hybrid buffer; and a newest timestamp (not shown) of the timestamps 20 of the packets 18 currently stored in the hybrid buffer.
The fill-level manager 52, which is preferably operationally connected to the receive buffer, is preferably operative to perform an action based on the determined fill-level. The action typically includes adjusting the playback speed of the media stream 16 by the decoder 48 based on the determined fill-level of the receive buffer 46 (block 82).
The playback speed is preferably adjusted in accordance with predefined threshold levels of the buffer fill-level, so that when the buffer fill-level falls below a certain level the playback speed is reduced. When the buffer fill-level falls below another threshold level, the playback speed is reduced again, and so on. Alternatively, the adjustment of the playback speed is performed such that the playback speed is proportional to fill-level, so that the lower the fill-level, the lower the playback speed. Therefore, the playback speed is generally decreased (either smoothly or in steps) as the fill-level of the receive buffer 46 drops, reducing the rate that the receive buffer 46 empties. If the consumption rate of the media stream 16 is reduced to less than the current network 38 throughput, then the receive buffer 46 generally starts to fill. The receive buffer 46 fill-level is typically in effect acting as a feedback mechanism to match the rate of data consumption from the receive buffer 46 to the rate of data acquisition from the network 38. Ensuring that the buffer of the transmitter 40 (Fig. 1) does not continue to grow indefinitely generally requires the decoder 48 to play faster than real-time once network throughput is restored, thereby allowing the transmitter 40 to transmit the media stream 16 faster than real-time, emptying the buffer of the transmitter 40, and restoring the system 10 to the default steady state of empty buffer of the transmitter 40 and full receive buffer 46.
Allowing the decoder 48 to determine when to play faster than real-time typically requires the receive buffer 46 to be maintained at a nominal full-level which is slightly less than full capacity of the receive buffer 46. If the receive buffer 46 fill-level exceeds the nominal full-level, then the decoder 48 typically plays faster than real time.
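The stepped and proportional speed adjustments described above might be sketched in Python as follows. The threshold/speed pairs are borrowed from the example values of table 1 in this description; the nominal full-level of 1000 ms, the catch-up speed of 110% and all function and constant names are assumptions chosen purely for illustration.

```python
# (threshold in ms, playback speed) pairs, checked from lowest to highest.
# Values follow the examples of table 1; they are not the only possible choice.
SPEED_STEPS = [(100, 0.0), (200, 0.2), (300, 0.4), (400, 0.6), (500, 0.8)]

NOMINAL_FULL_MS = 1000   # assumed nominal full-level of the receive buffer
CATCH_UP_SPEED = 1.1     # assumed faster-than-real-time speed


def stepped_speed(fill_ms):
    """Pick a playback speed from stepped thresholds of the fill-level."""
    for threshold_ms, speed in SPEED_STEPS:
        if fill_ms <= threshold_ms:
            return speed
    if fill_ms > NOMINAL_FULL_MS:
        # Above the nominal full-level: play faster so the transmitter
        # buffer can drain.
        return CATCH_UP_SPEED
    return 1.0


def proportional_speed(fill_ms, nominal_ms=NOMINAL_FULL_MS):
    """Alternative: playback speed proportional to fill-level, capped at 1.0."""
    return min(1.0, max(0.0, fill_ms / nominal_ms))
```

With these assumed values, a fill-level of 700 ms plays at normal speed, 450 ms plays at 80%, and 1200 ms triggers the catch-up speed.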
By employing the above method of buffer fill-level control, the size and cost of the receive buffer 46, as well as the delay caused by the receive buffer 46, are generally minimized. Additionally, by maintaining the receive buffer 46 fill-level above zero, the noticeable glitches and stutters in the audio and video caused by reduced network throughput are generally eliminated.
Alternatively or additionally, the action performed by the fill-level manager 52 includes reestablishing a connection with the network 38, for example, if the fill-level of the receive buffer 46 drops below a predetermined level or if the fill-level is below a predetermined level for a predetermined time period (block 84).
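One way the "below a predetermined level for a predetermined time period" trigger could be tracked is sketched below in Python. The class name `ReconnectMonitor`, its parameters and the example values are hypothetical; the description does not prescribe any particular implementation.

```python
class ReconnectMonitor:
    """Hypothetical helper: flags when the fill-level has stayed below a
    predetermined level for at least a predetermined time period."""

    def __init__(self, low_level_ms, hold_time_ms):
        self.low_level_ms = low_level_ms
        self.hold_time_ms = hold_time_ms
        self._below_since_ms = None  # time at which the level first dropped

    def should_reconnect(self, fill_ms, now_ms):
        if fill_ms >= self.low_level_ms:
            # Level recovered: reset the timer.
            self._below_since_ms = None
            return False
        if self._below_since_ms is None:
            self._below_since_ms = now_ms
        return now_ms - self._below_since_ms >= self.hold_time_ms


# Example with assumed values: level of 100 ms, hold time of 2 seconds.
monitor = ReconnectMonitor(low_level_ms=100, hold_time_ms=2000)
results = [
    monitor.should_reconnect(300, 0),     # healthy
    monitor.should_reconnect(50, 1000),   # just dropped
    monitor.should_reconnect(50, 2500),   # low for 1.5 s
    monitor.should_reconnect(50, 3000),   # low for 2 s
    monitor.should_reconnect(200, 3100),  # recovered
]
```

Setting the hold time to zero reduces this to the simpler variant in which the action is taken as soon as the fill-level drops below the predetermined level.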
Some advantages of controlling the buffer 46 based on a time related buffer fill-level are now described below. By way of introduction, in a variable bitrate (VBR) service, the amount of data sent per second typically changes quite dramatically, usually between 1 megabit per second and 10 megabits per second, by way of example only. Therefore, it is very difficult to set a data threshold for the receive buffer 46 for use in controlling the variable speed playback. For example, a threshold requiring the video to be played slower than real-time for a 1 megabit per second stream would typically be far lower than that for a 10 megabits per second stream.
However, as the stream bitrate is typically changing frame by frame in an unpredictable way, there is generally not a stable data-threshold which would be appropriate.
For example, if the network 38 usually introduces a delay of less than 500 milliseconds (ms), but occasionally introduces a delay of up to 1 second, then a sensible receive-buffer level in a non-variable playback speed set-top box may be equal to the maximum expected delay (1 second) multiplied by the maximum bitrate (10 megabits per second) multiplied by a safety factor of 2, giving 20 megabits or 2.5 megabytes. So a buffer of 2.5 megabytes generally insulates against receive buffer under-runs, but typically at a considerable cost in terms of total delay. For example, if the client tunes to the service whilst the service is in the 1 megabit per second mode then 20 seconds of data are buffered prior to decoding.
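The arithmetic of this example can be checked with a short Python sketch. The function and variable names are illustrative assumptions; the numbers are those given in the paragraph above.

```python
def data_buffer_size_bits(max_delay_s, max_bitrate_bps, safety_factor):
    """Size a data-threshold buffer: delay x peak bitrate x safety factor."""
    return max_delay_s * max_bitrate_bps * safety_factor


size_bits = data_buffer_size_bits(1, 10_000_000, 2)  # 20 megabits
size_bytes = size_bits // 8                           # 2.5 megabytes

# Time needed to play out (or pre-fill) that much data at each bitrate.
delay_at_low_rate_s = size_bits / 1_000_000    # service in 1 Mbit/s phase
delay_at_high_rate_s = size_bits / 10_000_000  # service in 10 Mbit/s phase
```

The same buffer therefore represents 2 seconds of content in the high coding phase but 20 seconds in the low coding phase, which is why a data threshold gives such a variable delay.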
As described above, with the set-top box 42, the playback speed of the decoder 48 is adjustable allowing the receive buffer 46 to refill. Therefore, in the above example, where the home network 38 usually introduces a delay of less than 500 ms, but occasionally introduces a delay of up to 1 second, the set-top box 42 only needs to delay the media stream 16 by slightly more than the usual network delay (of less than 500 ms) to, say, 1000 ms.
Reference is now made to row 1 of table 1. If a fill-level threshold is set at 500 ms then any usual delay (of less than 500 ms) would reduce the receive buffer 46 fill-level from 1000 ms to 501 ms or more and not affect the playback speed. Reference is now made to row 2 of table 1. An occasional network delay of 500 ms or more reduces the determined fill-level of the receive buffer 46 (as determined by the fill-level determiner 50) to 500 ms or less, triggering slower than real-time playback at 80% of the real-time playback speed. The amount of effective time left in the receive buffer 46, known as the new effective fill-level, is inversely proportional to the playback speed. In other words, as the playback speed is reduced, the effective amount of data in the receive buffer 46 increases as the data is used at a slower rate. Therefore, if there is 500 ms of data in the receive buffer 46 when the decoder 48 is playing at 100% playback speed, there is effectively 625 ms of data if played at 80% speed (500 ms divided by 80%). Therefore, there is an effective fill level at the new 80% speed of 625 ms.
Table 1
Network    Determined    Old      Effective fill-level    New      Effective fill-level
delay      fill-level    speed    at old speed            speed    at new speed
<500 ms    >500 ms       100%     >500 ms                 100%     >500 ms
500 ms     500 ms        100%     500 ms                  80%      625 ms
625 ms     400 ms        80%      500 ms                  60%      667 ms
792 ms     300 ms        60%      500 ms                  40%      750 ms
1042 ms    200 ms        40%      500 ms                  20%      1000 ms
1542 ms    100 ms        20%      500 ms                  0%       infinity
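The effective fill-level columns of table 1 follow from dividing the determined fill-level by the playback speed. A minimal Python sketch, with a hypothetical function name and speeds expressed as integer percentages to keep the arithmetic exact, reproduces those columns:

```python
import math


def effective_fill_ms(determined_ms, speed_percent):
    """Effective fill-level: determined fill-level divided by playback speed."""
    if speed_percent == 0:
        return math.inf  # a frozen decoder never drains the buffer
    return determined_ms * 100 / speed_percent


# (determined fill-level in ms, old speed %, new speed %) for rows 2 to 6.
rows = [(500, 100, 80), (400, 80, 60), (300, 60, 40), (200, 40, 20), (100, 20, 0)]

at_old_speed = [effective_fill_ms(d, old) for d, old, _ in rows]
at_new_speed = [effective_fill_ms(d, new) for d, _, new in rows]
```

Every entry at the old speed evaluates to 500 ms, which is the trigger point, and the entries at the new speed round to the values shown in the last column of the table.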
Reference is now made to row 3 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay, of say 125 ms, is introduced (giving a total delay of 625 ms). The additional delay of 125 ms is associated with a reduction in the determined fill-level by 100 ms, from 500 ms to 400 ms (as determined by the fill-level determiner 50) at 80% speed (100 ms divided by 80% equals 125 ms). The effective fill level has now also dropped from 625 ms (row 2 of table 1) to 500 ms (row 3 of table 1). The effective fill level is calculated by dividing the determined fill-level of 400 ms by the speed of 80%, giving 500 ms. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 400 ms) triggers a further reduction in the speed of the decoder 48 to 60% speed in order to compensate for the additional network delay, resulting in a new effective fill-level of 667 ms (400 ms divided by 60%).
Reference is now made to row 4 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay is introduced, of say 167 ms (giving a total delay of 792 ms). The additional delay of 167 ms is associated with a reduction in the determined fill-level by 100 ms, from 400 ms to 300 ms (as determined by the fill-level determiner 50) at 60% speed (100 ms divided by 60% equals 167 ms). The effective fill level has now also dropped from 667 ms (row 3 of table 1) to 500 ms (row 4 of table 1). The effective fill level is calculated by dividing the determined fill-level of 300 ms by the speed of 60%, giving 500 ms. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 300 ms) triggers a further reduction in the speed of the decoder 48 to 40% speed in order to compensate for the additional network delay, resulting in a new effective fill-level of 750 ms (300 ms divided by 40%).
Reference is now made to row 5 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay is introduced, for example, 250 ms (giving a total delay of 1042 ms). The additional delay of 250 ms is associated with a reduction in the determined fill-level by 100 ms, from 300 ms to 200 ms (as determined by the fill-level determiner 50) at 40% speed (100 ms divided by 40% equals 250 ms). The effective fill level has now also dropped from 750 ms (row 4 of table 1) to 500 ms (row 5 of table 1). The effective fill level is calculated by dividing the determined fill-level of 200 ms by the speed of 40%, giving 500 ms. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 200 ms), triggers a further reduction in the speed of the decoder 48 to 20% speed in order to compensate for the additional network delay, resulting in a new effective fill-level of 1000 ms (200 ms divided by 20%).
Reference is now made to row 6 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay is introduced, for example, 500 ms (giving a total delay of 1542 ms). The additional delay of 500 ms is associated with a reduction in the determined fill-level by 100 ms, from 200 ms to 100 ms (as determined by the fill-level determiner 50) at 20% speed (100 ms divided by 20% equals 500 ms). The effective fill level has now also dropped from 1000 ms (row 5 of table 1) to 500 ms (row 6 of table 1). The effective fill level is calculated by dividing the determined fill-level of 100 ms by the speed of 20%, giving 500 ms. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 100 ms), triggers a further reduction in the speed of the decoder 48 to 0% speed, effectively freezing the decoder 48 and also typically triggering an action to reestablish a connection with the network 38. It will be appreciated by those ordinarily skilled in the art that an action to reestablish a connection with the network 38 may be taken at any other suitable trigger point and/or after a predetermined time period of reduced network throughput.
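The arithmetic behind rows 3 to 6 of table 1 can be reproduced with a short sketch. The following Python is illustrative only and is not part of the described embodiment: the names `Row`, `STEPS` and `walk_table` are hypothetical, and the starting total delay of 500 ms is an inference from row 3 (625 ms less the 125 ms increment).

```python
from dataclasses import dataclass


@dataclass
class Row:
    total_delay_ms: float         # cumulative network delay
    fill_ms: int                  # determined fill-level of the receive buffer
    speed_pct: int                # decoder speed while the buffer drained to fill_ms
    effective_fill_ms: float      # fill_ms divided by speed_pct
    new_speed_pct: int            # speed selected once fill_ms is reached
    new_effective_fill_ms: float  # fill_ms divided by new_speed_pct


def effective_fill_ms(fill_ms, speed_pct):
    """Playback time that the buffered content lasts at the given speed."""
    return float("inf") if speed_pct == 0 else fill_ms * 100 / speed_pct


# (determined fill-level, speed while draining, new speed), as in rows 3 to 6.
STEPS = [(400, 80, 60), (300, 60, 40), (200, 40, 20), (100, 20, 0)]


def walk_table(start_delay_ms=500.0, step_ms=100):
    """Recompute rows 3 to 6; start_delay_ms is an inferred assumption."""
    rows, total = [], start_delay_ms
    for fill, speed, new_speed in STEPS:
        # Draining step_ms of content at reduced speed absorbs
        # step_ms / speed of additional network delay.
        total += step_ms * 100 / speed
        rows.append(Row(total, fill, speed,
                        effective_fill_ms(fill, speed),
                        new_speed,
                        effective_fill_ms(fill, new_speed)))
    return rows


if __name__ == "__main__":
    for row in walk_table():
        print(f"{row.total_delay_ms:.0f} ms  {row.fill_ms} ms  {row.speed_pct}%  "
              f"{row.effective_fill_ms:.0f} ms  {row.new_speed_pct}%  "
              f"{row.new_effective_fill_ms:.0f} ms")
```

Each increment of delay equals the 100 ms drop in fill-level divided by the current speed, which is why the increments grow (125 ms, 167 ms, 250 ms, 500 ms) as the decoder slows.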
Using the parameters of table 1 with the set-top box 42 typically ensures that the receive buffer 46 does not under-run, but the playback speed reduces quite harshly as the delay increases. Choosing different trigger levels and playback speeds for the trigger levels may achieve a more gradual effect, by way of example only. It will be appreciated by those ordinarily skilled in the art that choosing different trigger levels and playback speeds for the trigger levels may achieve an even harsher effect.
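By way of a hedged illustration, the choice of trigger levels can be expressed as a lookup from determined fill-level to playback speed. In the sketch below, `TABLE_1` follows the trigger levels discussed for rows 2 to 6; the `GENTLE` profile is purely hypothetical and invented for illustration, as are the function name `select_speed` and the assumption that playback runs at 100% above the highest trigger level.

```python
# Trigger levels as discussed for table 1:
# (fill-level at or below which the speed applies, in ms; playback speed in percent).
TABLE_1 = [(100, 0), (200, 20), (300, 40), (400, 60), (500, 80)]

# Hypothetical gentler profile; these values do not come from the description.
GENTLE = [(100, 0), (200, 70), (300, 80), (400, 90), (500, 95)]


def select_speed(fill_ms, triggers=TABLE_1, normal_speed_pct=100):
    """Return the playback speed (percent) for a determined fill-level."""
    # Check the lowest trigger level first so the most severe reduction wins.
    for level_ms, speed_pct in sorted(triggers):
        if fill_ms <= level_ms:
            return speed_pct
    # Assumption: above every trigger level the decoder plays at normal speed.
    return normal_speed_pct


if __name__ == "__main__":
    for fill in (900, 450, 250, 100):
        print(fill, select_speed(fill), select_speed(fill, GENTLE))
```

Swapping the trigger table changes how sharply playback slows without changing the selection logic, which is one way the more gradual (or harsher) behaviour mentioned above might be configured.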
In the above extreme example, the delay after channel change has generally reduced from a variable 2 to 20 seconds to a fixed 1 second.
As the playback speed of the decoder 48 is generally adjusted based on the time dependent fill-level of the receive buffer 46, the decoder 48 does not typically under-run when the VBR coding is high (for example, but not limited to, 10 megabits per second) or slow down unnecessarily when the VBR coding is low (for example, but not limited to, 1 megabit per second). Additionally, the set-top box 42 generally does not cause a short delay (for example, but not limited to, 2 seconds) when tuned to a service operating in a high coding phase, or a long delay (for example, but not limited to, 20 seconds) when tuned to a service operating in a low coding phase. Instead, a steady delay of 1 second is preferably achieved no matter what phase the VBR coding is in during channel change. Reducing the delay through the system 10 is generally very beneficial, as less memory is typically required in the receive buffer 46, and the system 10 is generally far more responsive to the user. The parameters used in table 1 are only examples. In practice, values are typically chosen based on the statistical distribution of the delays in the system 10, the amount of memory available in the set-top box 42, and the amount of delay the user and/or operator feels is acceptable.
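The independence of a time-based fill-level from the coding rate can be shown with a small sketch. The fill-level here is taken as the difference between the newest and oldest timestamps of the packets currently stored, as recited in the claims; the class `ReceiveBuffer`, the helper `buffer_one_second`, the 10 ms packet spacing and the constant per-packet size are all hypothetical simplifications, and timestamp wrap-around is ignored.

```python
from collections import deque


class ReceiveBuffer:
    """Toy receive buffer holding (timestamp_ms, size_bytes) entries."""

    def __init__(self):
        self._packets = deque()

    def push(self, timestamp_ms, size_bytes):
        self._packets.append((timestamp_ms, size_bytes))

    def pop(self):
        return self._packets.popleft()

    def fill_level_ms(self):
        # Newest timestamp minus oldest timestamp of the stored packets.
        # Timestamp wrap-around is not handled in this sketch.
        if len(self._packets) < 2:
            return 0
        return self._packets[-1][0] - self._packets[0][0]

    def fill_level_bytes(self):
        return sum(size for _, size in self._packets)


def buffer_one_second(bitrate_bps, interval_ms=10):
    """Fill a buffer with one second of packets, one every interval_ms."""
    buf = ReceiveBuffer()
    size_bytes = bitrate_bps * interval_ms // 8000  # bits per interval to bytes
    for timestamp_ms in range(0, 1000 + interval_ms, interval_ms):
        buf.push(timestamp_ms, size_bytes)
    return buf


if __name__ == "__main__":
    for rate in (10_000_000, 1_000_000):
        buf = buffer_one_second(rate)
        print(rate, buf.fill_level_ms(), buf.fill_level_bytes())
```

Both buffers report the same fill-level in milliseconds although the byte counts differ by a factor of ten, which is why a byte-based threshold would behave differently in high and low coding phases while a time-based one would not.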
Data may be delayed in the network 38 due to congestion. However, after the congestion clears, the components in the network 38 will generally try to deliver the data as quickly as possible. In wireless networking, the post-congestion delivery may not be significantly faster, as the network 38 normally runs close to capacity. Hence, there is little additional bandwidth to overwhelm the set-top boxes 36. In wired networks, and some wireless networks, data can be transferred many times faster than the normal rate. Therefore, following a period of congestion, the decoder 48 may receive more data than can be handled.
Therefore, during design of the receiver (for example, but not limited to, the set-top box 42 in Fig. 2), sufficient memory is preferably provisioned to meet the demands of the system 10, and suitable constraints are preferably placed on the data path in the network 38, so that bursts are controlled and the memory of the set-top box 42 is not exhausted.
Longer-term throughput problems may result in "back-pressure" on the server of the data. The receiver decoder 30 in Fig. 1 is the server of the data in the system 10. The server then has several options to resolve the problem. First, provide a sufficiently large transmit queue to cater for the variable consumption of the receiver. Second, reduce the bitrate of the media stream 16 to compensate for the lower throughput (if transcoding or similar is available). If transcoding is used, the original timing of the media stream 16 is typically maintained. However, a transcoder (not shown) may generate new timing. If new timing is generated, the fill-level of the receive buffer 46 may be based on the new timing or the original timing. Although basing the fill-level on the new timing is preferred, the clock drift between the original and new timestamps is generally so small as to have a negligible effect on the receive buffer 46.
Third, if the media stream 16 is a live stream, the media stream 16 can be recorded to disk. The media stream 16 is then played from the disk so that the disk acts as a suitable buffer. Fourth, the network 38 may be reconfigured, or streams dropped to provide the best service for the available network.
By way of introduction, although the viewer generally does not perceive the slight reduction or increase in video speed, the same is typically not true for audio. If audio is played slower than real-time, the pitch of the actors' voices will be lowered. The opposite is true if the audio is played at a higher rate: the pitch goes up. The frequency shift is generally unacceptable to a listener.
Therefore, in accordance with a most preferred embodiment of the present invention, the set-top box 42 includes an audio module 54.
The audio module 54 is preferably operative to adjust the pitch of an audio element (not shown) of the media stream 16 in order to compensate for the adjustment of the playback speed of the decoder 48 using pitch shifting (block 86).
A preferred method to correct the pitch error caused by adjusting the playback speed is to apply a Fourier transform to convert the audio into the frequency domain. Then, the frequency domain values are shifted up or down in frequency, as appropriate. Finally, applying an inverse Fourier transform converts the audio back to the time domain. Playing the audio at the adjusted playback speed causes the pitch of the audio to be shifted, canceling out the effect of the pitch shift performed with the Fourier transforms so that the pitch of the actor's voice remains constant and the user typically does not perceive the change in playback speed. In effect, it generally appears as if the actor is talking slightly faster, or slower, but still with the same tone of voice.
Pitch shifting is generally quite simple for digitally compressed audio decoders to achieve. In digital audio, the compressed input data is typically already in the frequency domain, and so the first Fourier transform is generally unnecessary. Secondly, the audio decoding generally requires an inverse Fourier transform to be applied anyway. Typically, the only difference is the shifting of the frequency samples up or down by the appropriate amount.
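The three steps described above (transform, shift, inverse transform) can be sketched on a single frame of samples. This is a deliberately naive illustration and not the decoder's actual method: it uses a slow textbook discrete Fourier transform from the Python standard library, moves each frequency bin to the nearest bin at the compensated frequency, and ignores the windowing and phase handling that a practical pitch shifter would need. The function names are hypothetical, and `playback_speed` is assumed to be greater than zero.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def pitch_shift_frame(samples, playback_speed):
    """Pre-shift frequencies by 1 / playback_speed so that playing the frame
    at playback_speed restores the original pitch."""
    n = len(samples)
    half = n // 2
    spectrum = dft(samples)
    shifted = [0j] * n
    shifted[0] = spectrum[0]  # keep the DC component where it is
    for k in range(1, half):  # positive-frequency bins below the Nyquist bin
        target = round(k / playback_speed)
        if 0 < target < half:  # bins shifted out of range are dropped
            shifted[target] += spectrum[k]
            # Mirror the conjugate so the inverse transform yields real samples.
            shifted[n - target] += spectrum[k].conjugate()
    return idft(shifted)


def dominant_bin(samples):
    """Index of the strongest positive-frequency bin in the frame."""
    spectrum = dft(samples)
    return max(range(1, len(samples) // 2), key=lambda k: abs(spectrum[k]))


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
    print(dominant_bin(tone), dominant_bin(pitch_shift_frame(tone, 0.8)))
```

For playback at 80% speed, a tone in bin 4 is moved up to bin 5 (4 divided by 0.8), so that slowing the audio down brings it back to its original pitch.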
The pitch shifting technique is typically implemented in set-top boxes as part of the review buffer functionality allowing viewers to start watching a live program from the beginning of the program even after the start time. The content is generally played slightly faster than real-time to enable the viewer to gradually catch up with the live action.
Reference is now made to Fig. 4, which is a partly pictorial, partly block diagram view of a media stream system 60 constructed and operative in accordance with another preferred embodiment of the present invention. The system 60 is substantially the same as the system 10 of Figs. 1-3 except for the following differences. The system 60 includes a Headend 62 with an encoder 64. The Headend 62 is preferably operative to broadcast programming via an Internet Protocol 66 (IP) to subscribers including a subscriber 68. The subscriber 68 receives programming from the Headend 62 via a set-top box 70 (or PVR). The set-top box 70 is connected to the Internet Protocol 66 via a residential gateway 72. The system 60 is typically an Internet Protocol Television (IPTV) system and/or a Video-on-demand (VOD) system, by way of example only.
Bandwidth may be restricted at various sections of the system 60. When the Headend 62 is located at the office of the content provider (not shown), the content first needs to be sent across the Internet 66 to the Internet Service Provider (ISP) (not shown) for broadcast to the subscribers. When the Headend 62 is located in the server room, for example, the transfer to the ISP across the Internet 66 is avoided. Congestion may occur at any point in the Internet 66 as well as in the home network.
It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It will be appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination. It will also be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the invention is defined only by the claims which follow.

Claims

What is claimed is:
1. A system for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the system comprising: a receiver to receive the packets; a buffer, operationally connected to the receiver, to store the packets; a decoder, operationally connected to the buffer, to receive the packets from the buffer and decode the packets; a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between: an oldest timestamp of the timestamps of the packets currently stored in the buffer; and a newest timestamp of the timestamps of the packets currently stored in the buffer; and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level.
2. The system according to claim 1, wherein the fill-level manager is operative to adjust a playback speed of the media stream by the decoder based on the determined fill-level of the buffer.
3. The system according to claim 2, wherein the fill-level manager is operative to adjust the playback speed of the media stream by the decoder according to a plurality of thresholds of the buffer fill-level.
4. The system according to claim 2 or claim 3, further comprising an audio module to adjust the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed of the decoder.
5. The system according to claim 4, wherein the audio module is operative to adjust the pitch using pitch shifting.
6. The system according to claim 1, wherein the fill-level manager is operative to reestablish a connection with the network.
7. The system according to any of claims 1-6, wherein the timestamps are assigned by the encoder at a time of encoding of each of the packets.
8. The system according to claim 7, wherein the timestamps are program clock references.
9. The system according to any of claims 1-8, wherein the buffer is a receive buffer, the decoder including a decode buffer.
10. The system according to any of claims 1-8, wherein the buffer includes a receive buffer and a decode buffer.
11. The system according to any of claims 1-10, the packets being at least partially transferred via a bandwidth constrained network.
12. A method for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the method comprising: receiving the packets; storing the packets in a buffer; decoding the packets; determining a fill-level of the buffer based on a time difference between: an oldest timestamp of the timestamps of the packets currently stored in the buffer; and a newest timestamp of the timestamps of the packets currently stored in the buffer; and performing an action based on the determined fill-level.
13. The method according to claim 12, wherein performing the action includes adjusting a playback speed of the media stream based on the determined fill-level of the buffer.
14. The method according to claim 13, wherein the adjusting of the playback speed is performed according to a plurality of thresholds of the buffer fill-level.
15. The method according to claim 13 or claim 14, further comprising adjusting the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed.
16. The method according to claim 15, wherein the adjusting the pitch includes pitch shifting.
17. The method according to claim 12, wherein performing the action includes reestablishing a connection with the network.
18. The method according to any of claims 12-17, wherein the timestamps are assigned by the encoder at a time of encoding of the packets.
19. The method according to claim 18, wherein the timestamps are program clock references.
20. The method according to any of claims 12-19, wherein the buffer is a receive buffer, the decoder including a decode buffer.
21. The method according to any of claims 12-19, wherein the buffer includes a receive buffer and a decode buffer.
22. The method according to any of claims 12-21, the packets being at least partially transferred via a bandwidth constrained network.
PCT/GB2007/000032 2007-01-08 2007-01-08 Buffer management WO2008084179A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2007/000032 WO2008084179A1 (en) 2007-01-08 2007-01-08 Buffer management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB2007/000032 WO2008084179A1 (en) 2007-01-08 2007-01-08 Buffer management

Publications (1)

Publication Number Publication Date
WO2008084179A1 (en) 2008-07-17

Family

ID=38328271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2007/000032 WO2008084179A1 (en) 2007-01-08 2007-01-08 Buffer management

Country Status (1)

Country Link
WO (1) WO2008084179A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822537A (en) * 1994-02-24 1998-10-13 At&T Corp. Multimedia networked system detecting congestion by monitoring buffers' threshold and compensating by reducing video transmittal rate then reducing audio playback rate
US6665751B1 (en) * 1999-04-17 2003-12-16 International Business Machines Corporation Streaming media player varying a play speed from an original to a maximum allowable slowdown proportionally in accordance with a buffer state
WO2004062291A1 (en) * 2003-01-07 2004-07-22 Koninklijke Philips Electronics N.V. Audio-visual content transmission

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010046531A1 (en) * 2008-10-22 2010-04-29 Teliasonera Ab Method for streaming media playback and terminal device
EP2180708A1 (en) * 2008-10-22 2010-04-28 TeliaSonera AB Method for streaming media playback and terminal device
WO2010076649A2 (en) * 2008-12-31 2010-07-08 Transwitch India Pvt. Ltd. Packet processing system on chip device
WO2010076649A3 (en) * 2008-12-31 2011-11-24 Transwitch India Pvt. Ltd. Packet processing system on chip device
US8499059B2 (en) 2009-05-04 2013-07-30 Rovi Solutions Corporation System and methods for buffering of real-time data streams
WO2010129435A1 (en) * 2009-05-04 2010-11-11 Rovi Solutions Corporation System and methods for buffering of real-time data streams
DE102010011098A1 (en) * 2010-03-11 2011-11-17 Daimler Ag Audio and video data playback device installed in motor car, has buffer unit in which playback velocity of audio and video is reduced in relation to normal velocity, until preset quantity of audio and video is stored in buffer unit
US20120233288A1 (en) * 2011-02-11 2012-09-13 Research In Motion Limited Apparatus, and associated method, by which to play out media data pursuant to a media data service
WO2015182189A1 (en) * 2014-05-28 2015-12-03 ソニー株式会社 Information processing apparatus, information processing method, and program
JPWO2015182189A1 (en) * 2014-05-28 2017-04-20 ソニー株式会社 Information processing apparatus, information processing method, and program
EP3462745A1 (en) * 2017-09-27 2019-04-03 Nokia Solutions and Networks Oy Modifying a buffer size
WO2019223040A1 (en) * 2018-05-25 2019-11-28 网宿科技股份有限公司 Method and device for synthesizing audio and video data stream
CN108810656A (en) * 2018-06-12 2018-11-13 深圳国微视安科技有限公司 A kind of the debounce processing method and processing system of real-time live broadcast TS streams
CN113038128A (en) * 2021-01-25 2021-06-25 腾讯科技(深圳)有限公司 Data transmission method and device, electronic equipment and storage medium
CN113038128B (en) * 2021-01-25 2022-07-26 腾讯科技(深圳)有限公司 Data transmission method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07700335

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07700335

Country of ref document: EP

Kind code of ref document: A1