WO1998035447A2 - Audio coding method and apparatus - Google Patents

Audio coding method and apparatus

Info

Publication number
WO1998035447A2
Authority
WO
WIPO (PCT)
Prior art keywords
spectral
values
value
stream
prediction coefficients
Prior art date
Application number
PCT/FI1998/000029
Other languages
French (fr)
Other versions
WO1998035447A3 (en)
Inventor
Lin Yin
Original Assignee
Nokia Mobile Phones Limited
Priority date
Filing date
Publication date
Application filed by Nokia Mobile Phones Limited
Priority to AU56648/98A (AU5664898A)
Publication of WO1998035447A2
Publication of WO1998035447A3

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06: Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients

Definitions

  • the present invention relates to a method for coding and decoding electronic signals and to apparatus for carrying out such a method. It is well known that the transmission of data in digital form provides for increased signal to noise ratios and increased information capacity along the transmission channel. There is however a continuing desire to further increase channel capacity by compressing digital signals to an ever greater extent. In relation to audio signals, two basic compression principles are conventionally applied. The first of these involves removing the statistical or deterministic redundancies in the source signal whilst the second involves suppressing or eliminating from the source signal elements which are redundant in so far as human perception is concerned.
  • a particular form of adaptive prediction is known as 'backward adaptive lattice prediction'.
  • Fuchs et al., 'Improving MPEG Audio Coding by Backward Adaptive Linear Stereo Prediction', AES Convention, New York, Preprint 4086, Oct. 1995, describes one such backward adaptive lattice prediction algorithm.
  • For each spectral value (the 'current' value) of each frequency component, backward adaptive lattice prediction generates a set of prediction coefficients in the coder from the previously calculated spectral values of that component (via the intermediate calculation of quantized spectral values). These coefficients are then used to predict the value of the current spectral value.
  • the error between the current spectral value and the predicted spectral value is determined and it is this error value (after quantisation) which is transmitted to the receiver. It will be appreciated that at any given time, the current prediction coefficients have effectively been derived from all previously received sample values. At the receiver, the coefficients are similarly calculated and reconstructed spectral values obtained by combining the predicted spectral values with the received error values.
  • the new MPEG-2 AAC standard employs psychoacoustic modeling and backward adaptive linear prediction with 1024 frequency components. It is envisaged that the new MPEG-4 VM standard will have similar requirements. However, such a large number of frequency components results in a large computational overhead due to the complexity of the prediction algorithm and also requires the availability of large areas of memory to store the calculated coefficients. Additionally, with backward adaptive lattice prediction, even when the predictors are turned 'off' (e.g. when no compression advantage can be obtained by transmitting the error values), the decoder must continue to determine the coefficients so that the predictors can be turned 'on' again when required without any temporary degradation in performance. This provides an additional computation overhead.
  • This object is achieved by utilising a backward adaptive prediction algorithm which acts upon a relatively large number of frequency components of an audio signal to be coded and which calculates prediction coefficients for a component from a predetermined number of previously received sample values of that component.
  • a method of coding an audio electrical signal using backward adaptive prediction comprising the steps of:
  • (b) transforming the time frame into the frequency domain to generate a frequency spectrum having 512 or more spectral components; (c) receiving subsequent time frames of said audio electrical signal and repeating step (b) for these frames in sequence to generate a stream of spectral data values for each spectral component; (d) for each said stream, calculating a set of prediction coefficients for each spectral value using the covariances of a predetermined number of previously determined reconstructed spectral values of the stream, using said set of prediction coefficients to generate a predicted spectral value, and calculating the error between the predicted spectral value and the corresponding actual spectral value, wherein the calculated errors provide a coded representation of the spectral value stream and said errors can be recombined with predicted spectral values to obtain reconstructed spectral values.
  • the method of the present invention does not directly calculate a set of prediction coefficients from all preceding spectral components as is the case with conventional backward adaptive prediction algorithms. That is to say that the prediction coefficients are recalculated for each spectral value and are not merely adapted from the previously calculated set. Thus, during periods when the predictor is turned off, there is no requirement to continue updating the coefficients at the decoder. It has been discovered that, whilst backward adaptive prediction algorithms which calculate prediction coefficients from the covariances of a predetermined number of previous spectral values are generally not suitable for coding audio signals subdivided into a relatively small number of frequency sub-bands (e.g.
  • the prediction order is one or two. More preferably, the prediction order is two.
  • said predetermined number of previously received consecutive spectral values are used to derive a corresponding number of quantized spectral values. It is then the quantized values which are used to calculate said prediction coefficients.
  • the time windows taken from the audio signal are overlapping.
  • each window may contain 2048 sample points with adjacent windows having a 50% overlap.
  • the windows may also be contiguous.
  • a new set of prediction coefficients may be calculated for each and every spectral value.
  • the lower limit on the predetermined number of previously received sample points used to calculate each set of prediction coefficients is determined by the coding quality required. Preferably however, the number is four or more. The upper limit on this number is determined by memory and computational constraints. Preferably the number is ten or less. More preferably the predetermined number is six.
  • Any suitable method for evaluating the prediction coefficients may be used, e.g. an autocorrelation method. However, it has been found that the least squares method is particularly advantageous.
  • the prediction coefficients used to calculate predicted spectral values are linear prediction coefficients.
  • a method of decoding an audio electrical signal encoded using the method of the above first aspect comprising the steps of: receiving as an input signal a sequence of error values corresponding to the coded audio signal and separating these values into spectral component streams; for each stream, determining a corresponding predicted spectral component value for each error value using a set of prediction coefficients, the prediction coefficients being calculated using covariances of a predetermined number of previously determined consecutive predicted spectral component values for that stream, and combining the error value and the predicted spectral value to provide a reconstructed spectral value; and substantially reconstructing said audio signal by combining and frequency-to-time transforming the reconstructed spectral values of all of the streams.
  • apparatus for coding an audio electrical signal using backward adaptive prediction comprising: an input for receiving an audio electrical signal to be coded; a time-to-frequency domain transformer for transforming sequentially received time frames of the received signal from the time domain to the frequency domain to provide frequency spectra having 512 or more spectral components; signal processing means associated with each spectral component for receiving as a stream the associated spectral values, for calculating for each spectral value a set of prediction coefficients using covariances of a predetermined number of previously reconstructed spectral values, for using said set of prediction coefficients to generate a predicted spectral value, and for calculating the error between the predicted value and the corresponding actual spectral value, the calculated errors providing a coded representation of the received spectral value stream and wherein said errors can be recombined with predicted spectral values to obtain reconstructed spectral values.
  • apparatus for decoding an audio electrical signal encoded using the apparatus of the above third aspect of the present invention, the apparatus comprising: an input for receiving a sequence of error values corresponding to the coded audio signal; and signal processing means for separating said sequence of values into separate spectral component streams and for determining for each error value a corresponding predicted spectral value using a set of prediction coefficients, the signal processing means being arranged to calculate the prediction coefficients using covariances of a predetermined number of previously determined consecutive reconstructed spectral values, the signal processing means being further arranged to combine each error value with the corresponding predicted spectral value to provide a reconstructed spectral value and to substantially reconstruct said audio signal by combining and frequency-to-time transforming the reconstructed spectral values of all of the sub-bands.
  • a communications system comprising in combination the apparatus of the third and fourth aspect of the present invention.
  • a mobile communication device comprising apparatus according to the third and fourth aspect of the present invention.
  • Figure 1 shows schematically apparatus for coding an audio signal using backward adaptive prediction according to an embodiment of the present invention
  • Figure 2 shows schematically apparatus for decoding an audio signal encoded with the apparatus of Figure 1;
  • Figure 3 shows a mobile telephone incorporating the apparatus of Figures 1 and 2.
  • a pulse code modulated (PCM) audio input signal g(t) to be coded is provided at the input to a first signal processing unit 1 of a coding apparatus.
  • This first unit 1 is arranged to transform the input signal g(t) from the time to the frequency domain on a frame by frame basis, each frame n consisting of 2048 sample values and adjacent frames having a 50% overlap.
  • the unit 1 employs a modified discrete cosine transform (MDCT) to transform the signal into the frequency domain such that the output of the unit 1 consists of 1024 separate streams of spectral values $X_j(n)$, each stream j corresponding to a different spectral component.
  • MDCT: modified discrete cosine transform
  • other transform methods may be used, e.g. a Fourier transform.
  • Each stream of data values $X_j(n)$ is provided to the corresponding input of a backward adaptive predictor 2, the operation of which is described in detail below.
  • the predictor 2 calculates a set of prediction coefficients $a_j(n)$ using subsequently derived reconstructed quantized spectral values, in turn derived from previously received spectral values of that stream.
  • the prediction coefficients are in turn used to calculate an error value $e_j(n)$ for the spectral value.
  • the error values for each stream are provided to the input of a quantiser 3 which is arranged to generate quantized errors $\tilde{e}_j(n)$ for subsequent digital transmission.
  • the quantized errors $\tilde{e}_j(n)$ are provided to a multiplexer 4, which generates a multiplexed error signal 9 for transmission, and are also fed back to the predictor 2.
  • a further signal processing unit 5 is also provided for controlling the operation of the signal processing unit 1 and the quantiser 3 in dependence upon the psychoacoustic characteristics of the input audio signal g(t).
  • the operation of this unit is conventional and will not be described in detail here.
  • $x(n)$, $\hat{x}(n)$, and $\tilde{x}(n)$ are the input signal to the predictor 2, the predictor output signal, and the reconstructed quantized signal, respectively, and $e(n)$ and $\tilde{e}(n)$ are the prediction error signal and the quantized prediction error signal.
  • the output signal of the predictor 2, $\hat{x}(n)$, is calculated by:
  • the linear predictors can be obtained by solving the normal equation.
  • a least squares algorithm is presented to estimate the linear predictor coefficients sample by sample.
  • the least squares method often gives a better estimate of the linear prediction coefficients than the autocorrelation method, especially when the amount of available data is small. It will be shown in the following that when the order of the predictor is low, in particular only two, the complexity of the least squares algorithm is comparable to or less than that of the adaptive lattice algorithm of the prior art.
  • the reconstructed quantized signal is denoted by $\tilde{x}(n)$.
  • the covariances of the reconstructed signal are computed by
  • linear prediction coefficients are derived from a predetermined or fixed, relatively small, number of previous spectral values. Calculation of the coefficients is not dependent upon every previously received spectral value.
  • bandwidth expansion can be performed after the linear prediction coefficients are obtained.
  • ⁇ 0 l .
  • the bandwidth expansion operation replaces each $a_i$ by $\gamma^i a_i$, where $\gamma$ is a constant slightly less than unity.
  • the covariance functions are updated sample by sample.
  • the linear prediction coefficients can also be obtained sample by sample by solving the normal equation.
  • the linear prediction coefficients can be calculated less frequently. For example, the linear prediction coefficients may be calculated once every two samples.
  • the loss of the average prediction gain is negligible.
  • the loss of the prediction gain is clearly noticeable upon occurrence of a transient in the audio signal to be coded.
  • a transient detector 10 is therefore included which switches the predictor from a normal low coefficient update rate (e.g. every second spectral value) to a high update rate (e.g. every spectral value) when a transient is detected.
  • the high update rate may be maintained for a short period after detection of the transient.
  • $N_s$ is the number of scalefactor bands.
  • G compensates the additional bit need for the predictor side information, i.e., G > T (dB) or prediction gain does not drop dramatically, i.e., G ⁇ 86 "' - G Prcv '°" s ⁇ T 2 (dB)
  • the complete side information is transmitted and the predictors which produce positive gains are switched on; otherwise, the predictors are not used, which also indicates that a transient has occurred.
  • the backward adaptive prediction coefficients are calculated sample by sample. After a certain number of samples, the prediction coefficients are calculated every second sample.
  • Figure 2 illustrates apparatus for decoding a signal encoded using the method described in detail above.
  • the received multiplexed error signal 9 is provided at the input of a demultiplexer 6 which separates the signal into 1024 spectral value streams $\tilde{e}_j(n)$. These streams are then passed to a signal processing unit 7. For each stream, this unit 7 calculates for each error value a predicted or estimated spectral value. A predetermined number of these predicted values are in turn used to calculate linear prediction coefficients to allow the calculation of a predicted value for a current sample. This process is identical to that described for the coding process. A reconstructed spectral value is obtained by combining the received error signal with the corresponding predicted value.
  • the streams of reconstructed spectral values are provided to a further processing unit 8 which carries out an inverse MDCT on the data to substantially regenerate the original audio signal.
  • Figure 3 shows a mobile telephone 11 incorporating in its transmitter, apparatus 12 (corresponding to the apparatus of Figure 1) for coding a radio telephone signal using the coding method described above.
  • the telephone also incorporates in its receiver, apparatus 13 (corresponding to the apparatus of Figure 2) for decoding a received encoded telephone signal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

A method of coding an audio electrical signal using backward adaptive prediction. A first time frame of the audio electrical signal to be coded is received and transformed into the frequency domain using a modified discrete cosine transform (MDCT). The resulting frequency spectrum has 1024 spectral components. Subsequent time frames of the audio electrical signal are then received and the MDCT is applied to each in turn so as to generate a stream of spectral data values for each spectral component. For each stream, a set of prediction coefficients is calculated for each spectral value using a predetermined number of previously received consecutive spectral values of the stream. Using the set of linear prediction coefficients, a predicted spectral value is generated and the error between the predicted spectral value and the corresponding actual spectral value calculated. The calculated errors provide a coded representation of the spectral value stream.

Description

Audio Coding Method and Apparatus
The present invention relates to a method for coding and decoding electronic signals and to apparatus for carrying out such a method. It is well known that the transmission of data in digital form provides for increased signal to noise ratios and increased information capacity along the transmission channel. There is however a continuing desire to further increase channel capacity by compressing digital signals to an ever greater extent. In relation to audio signals, two basic compression principles are conventionally applied. The first of these involves removing the statistical or deterministic redundancies in the source signal whilst the second involves suppressing or eliminating from the source signal elements which are redundant in so far as human perception is concerned. Recently, the latter principle has become predominant in high quality audio applications and typically involves the separation of an audio signal into frequency components (sometimes called 'sub-bands'), each of which is analysed and quantized with a quantisation accuracy determined to remove data irrelevancy (to the listener). The ISO (International Standards Organisation) MPEG (Moving Pictures Expert Group) audio coding standard and other audio coding standards employ and further define this principle. However, MPEG (and other standards) also employs a technique known as 'adaptive prediction' to produce a further reduction in data rate.
A particular form of adaptive prediction is known as 'backward adaptive lattice prediction'. Fuchs et al., 'Improving MPEG Audio Coding by Backward Adaptive Linear Stereo Prediction', AES Convention, New York, Preprint 4086, Oct. 1995, describes one such backward adaptive lattice prediction algorithm. For each spectral value (the 'current' value) of each frequency component, backward adaptive lattice prediction generates a set of prediction coefficients in the coder from the previously calculated spectral values of that component (via the intermediate calculation of quantized spectral values). These coefficients are then used to predict the value of the current spectral value. The error between the current spectral value and the predicted spectral value is determined and it is this error value (after quantisation) which is transmitted to the receiver. It will be appreciated that at any given time, the current prediction coefficients have effectively been derived from all previously received sample values. At the receiver, the coefficients are similarly calculated and reconstructed spectral values obtained by combining the predicted spectral values with the received error values.
In certain algorithms employing backward adaptive prediction, it is often the case that a measure of the compression achieved is determined during the compression process and the error values sent only if positive compression gain is achieved. If not, then the actual quantized frequency component signals are transmitted instead. The new MPEG-2 AAC standard employs psychoacoustic modeling and backward adaptive linear prediction with 1024 frequency components. It is envisaged that the new MPEG-4 VM standard will have similar requirements. However, such a large number of frequency components results in a large computational overhead due to the complexity of the prediction algorithm and also requires the availability of large areas of memory to store the calculated coefficients. Additionally, with backward adaptive lattice prediction, even when the predictors are turned 'off' (e.g. when no compression advantage can be obtained by transmitting the error values), the decoder must continue to determine the coefficients so that the predictors can be turned 'on' again when required without any temporary degradation in performance. This provides an additional computation overhead.
It is an object of the present invention to overcome or at least mitigate one or more of the above disadvantages. This object is achieved by utilising a backward adaptive prediction algorithm which acts upon a relatively large number of frequency components of an audio signal to be coded and which calculates prediction coefficients for a component from a predetermined number of previously received sample values of that component.
According to a first aspect of the present invention there is provided a method of coding an audio electrical signal using backward adaptive prediction, the method comprising the steps of:
(a) receiving a first time frame of an audio electrical signal to be coded;
(b) transforming the time frame into the frequency domain to generate a frequency spectrum having 512 or more spectral components;
(c) receiving subsequent time frames of said audio electrical signal and repeating step (b) for these frames in sequence to generate a stream of spectral data values for each spectral component;
(d) for each said stream, calculating a set of prediction coefficients for each spectral value using the covariances of a predetermined number of previously determined reconstructed spectral values of the stream, using said set of prediction coefficients to generate a predicted spectral value, and calculating the error between the predicted spectral value and the corresponding actual spectral value, wherein the calculated errors provide a coded representation of the spectral value stream and said errors can be recombined with predicted spectral values to obtain reconstructed spectral values.
The method of the present invention does not directly calculate a set of prediction coefficients from all preceding spectral components as is the case with conventional backward adaptive prediction algorithms. That is to say that the prediction coefficients are recalculated for each spectral value and are not merely adapted from the previously calculated set. Thus, during periods when the predictor is turned off, there is no requirement to continue updating the coefficients at the decoder. It has been discovered that, whilst backward adaptive prediction algorithms which calculate prediction coefficients from the covariances of a predetermined number of previous spectral values are generally not suitable for coding audio signals subdivided into a relatively small number of frequency sub-bands (e.g. 32), such prediction algorithms are appropriate when the audio signal is sub-divided into a relatively large number of frequency sub-bands (e.g. 1024 as defined in the draft MPEG-4 standard). This is because, when a large number of sub-bands are defined, the order of the prediction algorithm (that is the number of prediction coefficients) can be low and algorithms embodying the present invention offer high performance and are computationally efficient for low orders. Preferably, the prediction order is one or two. More preferably, the prediction order is two.
Preferably, said predetermined number of previously received consecutive spectral values are used to derive a corresponding number of quantized spectral values. It is then the quantized values which are used to calculate said prediction coefficients. Preferably, the time windows taken from the audio signal are overlapping. For example, each window may contain 2048 sample points with adjacent windows having a 50% overlap. However, the windows may also be contiguous. In certain embodiments of the invention, a new set of prediction coefficients may be calculated for each and every spectral value. However, in other embodiments it may be more computationally efficient to recalculate the prediction coefficients for only every second or third (or other multiple) spectral value and to use the same coefficients for several consecutive spectral values. It may also be appropriate to provide for switching between a low coefficient update rate (e.g. every second value) and a high update rate (e.g. for every spectral value) immediately upon detection of a transient in the audio signal.
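The switching between a low and a high coefficient update rate can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `update_schedule`, the parameters `low_period` and `hold`, and the idea of feeding in ready-made transient flags are hypothetical, and how a transient is actually detected is outside the sketch.

```python
def update_schedule(transient_flags, low_period=2, hold=3):
    """Decide, per spectral value, whether the coefficients are recalculated.

    transient_flags: one boolean per spectral value (True = transient detected).
    low_period:      recalculate every low_period-th value in normal operation
                     (assumed default: every second value).
    hold:            number of values, starting at the transient, for which the
                     high update rate is kept (assumed, purely illustrative).
    """
    schedule = []
    hold_left = 0
    for n, transient in enumerate(transient_flags):
        if transient:
            hold_left = hold  # switch to the high update rate immediately
        if hold_left > 0:
            schedule.append(True)  # high rate: recalculate for every value
            hold_left -= 1
        else:
            schedule.append(n % low_period == 0)  # low rate
    return schedule
```

Where the schedule holds False, the previously calculated coefficients would simply be reused for that spectral value.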
The lower limit on the predetermined number of previously received sample points used to calculate each set of prediction coefficients is determined by the coding quality required. Preferably however, the number is four or more. The upper limit on this number is determined by memory and computational constraints. Preferably the number is ten or less. More preferably the predetermined number is six.
Any suitable method for evaluating the prediction coefficients may be used, e.g. an autocorrelation method. However, it has been found that the least squares method is particularly advantageous.
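As a rough illustration of a least squares estimate over a short window, the following Python sketch computes order-two coefficients from the covariances of a handful of reconstructed values (six in the usage shown) by solving the 2x2 normal equations directly. It is a sketch under assumptions rather than the patented procedure: the names `ls_coefficients` and `predict_next`, the sign convention of the predictor and the fallback for a singular system are all choices made for this example.

```python
def ls_coefficients(history):
    """Order-2 least-squares coefficients (a1, a2) from a short window.

    history: reconstructed spectral values of one stream, oldest first.
    Assumed predictor form: x_hat(n) = a1 * x(n-1) + a2 * x(n-2).
    """
    order = 2
    n = len(history)

    def phi(i, k):
        # Covariance of the window at lags i and k.
        return sum(history[m - i] * history[m - k] for m in range(order, n))

    c11, c12, c22 = phi(1, 1), phi(1, 2), phi(2, 2)
    c01, c02 = phi(0, 1), phi(0, 2)
    det = c11 * c22 - c12 * c12
    if det <= 1e-12 * c11 * c22:
        # Singular normal equations (too little data or collinear values):
        # fall back to order one, or to no prediction at all.
        return (c01 / c11, 0.0) if c11 > 0.0 else (0.0, 0.0)
    a1 = (c01 * c22 - c02 * c12) / det
    a2 = (c02 * c11 - c01 * c12) / det
    return a1, a2


def predict_next(history, coeffs):
    """Predict the next value from the two most recent reconstructed values."""
    a1, a2 = coeffs
    return a1 * history[-1] + a2 * history[-2]
```

For example, six values generated by x(n) = 1.5 x(n-1) - 0.7 x(n-2) yield coefficients close to (1.5, -0.7), because the window is then exactly predictable at order two.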
Preferably, the prediction coefficients used to calculate predicted spectral values are linear prediction coefficients.
It will be appreciated that the present invention is intended for use with psychoacoustic compensation and that quantisation of the error signals may be controlled accordingly.
According to a second aspect of the present invention there is provided a method of decoding an audio electrical signal encoded using the method of the above first aspect, the decoding method comprising the steps of: receiving as an input signal a sequence of error values corresponding to the coded audio signal and separating these values into spectral component streams; for each stream, determining a corresponding predicted spectral component value for each error value using a set of prediction coefficients, the prediction coefficients being calculated using covariances of a predetermined number of previously determined consecutive predicted spectral component values for that stream, and combining the error value and the predicted spectral value to provide a reconstructed spectral value; and substantially reconstructing said audio signal by combining and frequency-to-time transforming the reconstructed spectral values of all of the streams.
It will be appreciated that the specific implementation details of the coding method will to a large extent determine the implementation details of the decoding method, e.g. prediction order.
According to a third aspect of the present invention there is provided apparatus for coding an audio electrical signal using backward adaptive prediction, the apparatus comprising: an input for receiving an audio electrical signal to be coded; a time-to-frequency domain transformer for transforming sequentially received time frames of the received signal from the time domain to the frequency domain to provide frequency spectra having 512 or more spectral components; signal processing means associated with each spectral component for receiving as a stream the associated spectral values, for calculating for each spectral value a set of prediction coefficients using covariances of a predetermined number of previously reconstructed spectral values, for using said set of prediction coefficients to generate a predicted spectral value, and for calculating the error between the predicted value and the corresponding actual spectral value, the calculated errors providing a coded representation of the received spectral value stream and wherein said errors can be recombined with predicted spectral values to obtain reconstructed spectral values.
According to a fourth aspect of the present invention there is provided apparatus for decoding an audio electrical signal encoded using the apparatus of the above third aspect of the present invention, the apparatus comprising: an input for receiving a sequence of error values corresponding to the coded audio signal; and signal processing means for separating said sequence of values into separate spectral component streams and for determining for each error value a corresponding predicted spectral value using a set of prediction coefficients, the signal processing means being arranged to calculate the prediction coefficients using covariances of a predetermined number of previously determined consecutive reconstructed spectral values, the signal processing means being further arranged to combine each error value with the corresponding predicted spectral value to provide a reconstructed spectral value and to substantially reconstruct said audio signal by combining and frequency-to-time transforming the reconstructed spectral values of all of the sub-bands.
According to a fifth aspect of the present invention there is provided a communications system comprising in combination the apparatus of the third and fourth aspect of the present invention.
According to a sixth aspect of the present invention there is provided a mobile communication device comprising apparatus according to the third and fourth aspect of the present invention.
For a better understanding of the present invention and in order to show how the same may be carried into effect, reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 shows schematically apparatus for coding an audio signal using backward adaptive prediction according to an embodiment of the present invention;
Figure 2 shows schematically apparatus for decoding an audio signal encoded with the apparatus of Figure 1; and
Figure 3 shows a mobile telephone incorporating the apparatus of Figures 1 and 2.
With reference to Figure 1, a pulse code modulated (PCM) audio input signal g(t) to be coded is provided at the input to a first signal processing unit 1 of a coding apparatus. This first unit 1 is arranged to transform the input signal g(t) from the time to the frequency domain on a frame by frame basis, each frame n consisting of 2048 sample values and adjacent frames having a 50% overlap. More particularly, the unit 1 employs a modified discrete cosine transform (MDCT) to transform the signal into the frequency domain such that the output of the unit 1 consists of 1024 separate streams of spectral values $X_j(n)$, each stream j corresponding to a different spectral component. It is noted that other transform methods may be used, e.g. a Fourier transform.
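The framing described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `mdct` and `spectral_streams` are hypothetical, no analysis window is applied, the transform is evaluated directly from its textbook definition rather than by a fast algorithm, and a frame length of 16 stands in for the 2048 samples of the embodiment.

```python
import math

def mdct(frame):
    """Direct-form MDCT: 2N time samples -> N spectral values (O(N^2), no window)."""
    two_n = len(frame)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0
    return [
        sum(frame[t] * math.cos(math.pi / n_half * (t + n0) * (k + 0.5))
            for t in range(two_n))
        for k in range(n_half)
    ]

def spectral_streams(signal, frame_len):
    """Cut `signal` into 50%-overlapping frames and return one stream per component."""
    hop = frame_len // 2
    spectra = [mdct(signal[s:s + frame_len])
               for s in range(0, len(signal) - frame_len + 1, hop)]
    # Transpose so that streams[j][n] is the spectral value X_j(n) of frame n.
    return [list(column) for column in zip(*spectra)]

signal = [math.sin(0.3 * t) for t in range(64)]
streams = spectral_streams(signal, frame_len=16)
print(len(streams), len(streams[0]))
```

With a frame length of 2048 the same code would yield 1024 streams, one per spectral component, each of which is then handled by its own predictor.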
Each stream of data values $X_j(n)$ is provided to the corresponding input of a backward adaptive predictor 2, the operation of which is described in detail below. In general terms, for each spectral value $X_j(n)$ of each stream, the predictor 2 calculates a set of prediction coefficients $a_j(n)$ using previously derived reconstructed quantized spectral values, in turn derived from previously received spectral values of that stream. The prediction coefficients are in turn used to calculate an error value $e_j(n)$ for the spectral value. The error values for each stream are provided to the input of a quantiser 3 which is arranged to generate quantized errors $\tilde{e}_j(n)$ for subsequent digital transmission. The quantized errors $\tilde{e}_j(n)$ are provided to a multiplexer 4, which generates a multiplexed error signal 9 for transmission, and are also fed back to the predictor 2.
A further signal processing unit 5 is also provided for controlling the operation of the signal processing unit 1 and the quantiser 3 in dependence upon the psychoacoustic characteristics of the input audio signal g(t). The operation of this unit is conventional and will not be described in detail here.
For each spectral component $j$, $x(n)$, $\hat{x}(n)$, and $\tilde{x}(n)$ are the input signal to the predictor 2, a predictor output signal, and a reconstructed quantized signal, and $e(n)$ and $\tilde{e}(n)$ are a prediction error signal and a quantized prediction error signal. The set of prediction coefficients can be represented by:

$$\mathbf{a}(n) = [a_1(n), a_2(n), \ldots, a_P(n)]^T$$

which is time dependent and where superscript $T$ represents the transpose. The output signal of the predictor 2, $\hat{x}(n)$, is calculated by:

$$\hat{x}(n) = \mathbf{a}(n)^T \tilde{\mathbf{x}}(n) = \sum_{i=1}^{P} a_i(n)\,\tilde{x}(n-i)$$

where

$$\tilde{\mathbf{x}}(n) = [\tilde{x}(n-1), \tilde{x}(n-2), \ldots, \tilde{x}(n-P)]^T$$

and $P$ is the prediction order, i.e. the number of coefficients. The prediction error is

$$e(n) = x(n) - \hat{x}(n)$$

and the reconstructed quantized signal is

$$\tilde{x}(n) = \hat{x}(n) + \tilde{e}(n)$$
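The prediction, error and reconstruction relations for a single stream can be sketched as a closed loop. To isolate the loop itself, this sketch assumes fixed coefficients (in the apparatus they are re-estimated from the reconstructed values) and a plain uniform quantizer standing in for the quantiser 3; the names `quantize` and `encode_stream` are hypothetical.

```python
def quantize(e, step):
    """Uniform scalar quantizer (an assumed stand-in for quantiser 3)."""
    return step * round(e / step)

def encode_stream(x, coeffs, step):
    """Return the quantized errors and the reconstructed values for one stream."""
    p = len(coeffs)
    recon = [0.0] * p                      # reconstructed history starts at zero
    errors = []
    for sample in x:
        # Predicted value: sum over i of a_i * reconstructed value i samples back.
        pred = sum(a * recon[-i] for i, a in enumerate(coeffs, start=1))
        e_q = quantize(sample - pred, step)    # quantized prediction error
        errors.append(e_q)
        recon.append(pred + e_q)               # reconstructed value = prediction + error
    return errors, recon[p:]

x = [0.0, 1.0, 1.8, 2.1, 1.9]
errors, recon = encode_stream(x, coeffs=[0.9], step=0.25)
print(errors)
print(recon)
```

Because the prediction is formed from reconstructed rather than original values, the quantization error does not accumulate: each reconstructed value stays within half a quantizer step of the input.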
The calculation of the predictor coefficients is based on minimizing the mean square prediction error. $\mathbf{a}(n)$ can be expressed as

$$\mathbf{a}(n) = \mathbf{R}^{-1}(n)\,\mathbf{r}(n)$$

where $\mathbf{R}(n) = E[\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^T(n)]$ and $\mathbf{r}(n) = E[\tilde{\mathbf{x}}(n)x(n)]$, and the symbol $E$ represents the expectation.
It will be appreciated that once the autocorrelation functions r(n) are obtained, the linear predictors can be obtained by solving the normal equation. However, here a least squares algorithm is presented to estimate the linear predictor coefficients sample by sample. The least squares method often gives better linear prediction coefficient estimates than the autocorrelation method, especially when the amount of available data is small. It will be shown in the following that when the order of the predictor is low, in particular only two, the complexity of the least squares algorithm is comparable to or less than that of the adaptive lattice algorithm of the prior art.
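Assume again that the reconstructed quantized signal is denoted by $\tilde{x}(n)$. For a prediction order of two and a block length of $L$, the covariances of the reconstructed signal are computed by

$$r_{0,0} = \sum_{i=2}^{L-1} \tilde{x}^2(n-i), \qquad r_{1,1} = \sum_{i=2}^{L-1} \tilde{x}^2(n-i+1), \qquad r_{0,1} = r_{1,0} = \sum_{i=2}^{L-1} \tilde{x}(n-i+1)\,\tilde{x}(n-i)$$

$$r_1 = \sum_{i=2}^{L-1} \tilde{x}(n-i+2)\,\tilde{x}(n-i), \qquad r_2 = \sum_{i=2}^{L-1} \tilde{x}(n-i+2)\,\tilde{x}(n-i+1)$$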
An efficient algorithm would be

$$temp_1 = \sum_{i=2}^{L-2} \tilde{x}^2(n-i), \qquad r_{0,0} = \tilde{x}^2(n-L+1) + temp_1, \qquad r_{1,1} = temp_1 + \tilde{x}^2(n-1)$$

$$temp_2 = \sum_{i=2}^{L-2} \tilde{x}(n-i+1)\,\tilde{x}(n-i), \qquad r_{0,1} = r_{1,0} = \tilde{x}(n-L+1)\,\tilde{x}(n-L+2) + temp_2$$

$$r_2 = temp_2 + \tilde{x}(n-1)\,\tilde{x}(n), \qquad r_1 = \sum_{i=2}^{L-1} \tilde{x}(n-i+2)\,\tilde{x}(n-i)$$

With these covariances, the two linear predictor coefficients can be calculated as follows:
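$$a_1 = \frac{r_{1,1}\,r_1 - r_{0,1}\,r_2}{r_{0,0}\,r_{1,1} - r_{0,1}^2}, \qquad a_2 = \frac{r_{0,0}\,r_2 - r_{0,1}\,r_1}{r_{0,0}\,r_{1,1} - r_{0,1}^2}$$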
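As an illustration, the covariance sums, the shortcut that shares two partial sums, and the closed-form coefficient expressions recited in claims 9 and 10 can be transcribed as below. Several points are assumptions of this sketch rather than statements of the specification: the function names, the list `xt` indexed so that `xt[k]` holds the reconstructed value at time k, the oldest term in the shortcut for r00 being taken at index n-L+1 (the choice that makes it equal the direct sum), and the fallback to zero coefficients when the determinant vanishes.

```python
def covariances(xt, n, L):
    """Direct sums over i = 2..L-1 of the reconstructed values xt."""
    idx = range(2, L)
    r00 = sum(xt[n - i] ** 2 for i in idx)
    r11 = sum(xt[n - i + 1] ** 2 for i in idx)
    r01 = sum(xt[n - i + 1] * xt[n - i] for i in idx)
    r1 = sum(xt[n - i + 2] * xt[n - i] for i in idx)
    r2 = sum(xt[n - i + 2] * xt[n - i + 1] for i in idx)
    return r00, r11, r01, r1, r2

def covariances_shared(xt, n, L):
    """Same quantities, reusing two partial sums taken over i = 2..L-2."""
    temp1 = sum(xt[n - i] ** 2 for i in range(2, L - 1))
    temp2 = sum(xt[n - i + 1] * xt[n - i] for i in range(2, L - 1))
    r00 = xt[n - L + 1] ** 2 + temp1
    r11 = temp1 + xt[n - 1] ** 2
    r01 = xt[n - L + 1] * xt[n - L + 2] + temp2
    r2 = temp2 + xt[n - 1] * xt[n]
    r1 = sum(xt[n - i + 2] * xt[n - i] for i in range(2, L))
    return r00, r11, r01, r1, r2

def coefficients(r00, r11, r01, r1, r2, eps=1e-12):
    """Closed-form solution of the 2x2 system; zero fallback is an assumption."""
    det = r00 * r11 - r01 * r01
    if abs(det) < eps:
        return 0.0, 0.0
    a1 = (r11 * r1 - r01 * r2) / det
    a2 = (r00 * r2 - r01 * r1) / det
    return a1, a2

xt = [0.5, -1.0, 2.0, 1.5, -0.5, 3.0, 2.5, 1.0]
cov = covariances(xt, n=7, L=6)
a1, a2 = coefficients(*cov)
print(cov, a1, a2)
```

The two coefficient expressions are Cramer's rule applied to the system r00·a1 + r01·a2 = r1 and r01·a1 + r11·a2 = r2, so no general matrix inversion is needed for a second-order predictor.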
It will be appreciated that the linear prediction coefficients are derived from a predetermined or fixed, relatively small, number of previous spectral values. Calculation of the coefficients is not dependent upon every previously received spectral value.
In order to enhance the robustness of the backward adaptive prediction against channel errors and numerical round-off errors, bandwidth expansion can be performed after the linear prediction coefficients are obtained. Let the linear prediction coefficients calculated by the above equations be $a_i$, $i = 0, 1, 2$, where $a_0 = 1$. The bandwidth expansion operation replaces each $a_i$ by $\gamma^i a_i$, where $\gamma$ is a constant slightly less than unity.
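A minimal sketch of the bandwidth expansion step follows. The function name `bandwidth_expand` and the default value 0.98 are assumptions made for illustration; the text only requires a constant slightly less than unity.

```python
def bandwidth_expand(coeffs, gamma=0.98):
    """Scale coefficient i (i = 1..P) by gamma**i; the implicit leading 1 is unchanged."""
    return [a * gamma ** i for i, a in enumerate(coeffs, start=1)]

print(bandwidth_expand([1.0, 1.0], gamma=0.5))
```

Higher-order coefficients are attenuated more strongly, which damps the memory of the predictor and so limits how long a channel or round-off error can propagate.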
As can be seen from the previous section, the covariance functions are updated sample by sample. Correspondingly, the linear prediction coefficients can also be obtained sample by sample by solving the normal equation. However, in order to save computation, the linear prediction coefficients can be calculated less frequently. For example, the linear prediction coefficients may be calculated once every two samples. The loss of the average prediction gain is negligible. However, the loss of the prediction gain is clearly noticeable upon occurrence of a transient in the audio signal to be coded. A transient detector 10 is therefore included which switches the predictor from a normal low coefficient update rate (e.g. every second spectral value) to a high update rate (e.g. every spectral value) when a transient is detected. The high update rate may be maintained for a short period after detection of the transient.
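The switching between update rates might be sketched as follows. The function name `update_schedule`, the hold length, and the representation of the transient detector output as a list of booleans are all assumptions; the text states only that the high rate may be maintained for a short period after a transient.

```python
def update_schedule(transient_flags, low_period=2, hold=4):
    """Return, per sample, whether the prediction coefficients are recalculated."""
    schedule = []
    hold_left = 0
    for n, is_transient in enumerate(transient_flags):
        if is_transient:
            hold_left = hold               # enter (or extend) the high-rate period
        if hold_left > 0:
            schedule.append(True)          # high rate: update on every sample
            hold_left -= 1
        else:
            schedule.append(n % low_period == 0)   # low rate: every low_period-th sample
    return schedule

flags = [False] * 4 + [True] + [False] * 7
print(update_schedule(flags, low_period=2, hold=3))
```

On samples where no update is scheduled, the most recently calculated coefficients would simply be reused.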
Assume that $G_l$ denotes the prediction gain in scalefactor band $l$. If $G_l > 0$, the predictor in this subband can be switched on depending on the overall prediction gain, which is calculated as follows:

$$G = \sum_{\substack{l=1 \\ G_l > 0}}^{N_s} G_l$$

where $N_s$ is the number of scalefactor bands. If $G$ compensates for the additional bits needed for the predictor side information, i.e. $G > T_1$ (dB), or the prediction gain does not drop dramatically, i.e. $G^{\text{present}} - G^{\text{previous}} < T_2$ (dB), the complete side information is transmitted and the predictors which produce positive gains are switched on; otherwise, the predictors are not used, which also indicates that a transient has occurred. After the transient frames are detected, the backward adaptive prediction coefficients are calculated sample by sample. After a certain number of samples, the prediction coefficients are calculated every second sample.
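The overall gain and the per-band switching can be sketched as below. Only the summation over bands with positive gain and the resulting per-band flags are shown; the frame-level threshold tests are deliberately left to the caller and passed in as `prediction_enabled`. Both function names are hypothetical.

```python
def overall_gain(band_gains_db):
    """Sum the gains of those scalefactor bands whose gain is positive."""
    return sum(g for g in band_gains_db if g > 0)

def predictor_flags(band_gains_db, prediction_enabled):
    """Per-band on/off flags: only bands with positive gain are switched on."""
    return [prediction_enabled and g > 0 for g in band_gains_db]

gains = [1.5, -0.5, 2.0, 0.0]
print(overall_gain(gains))
print(predictor_flags(gains, prediction_enabled=True))
```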
Figure 2 illustrates apparatus for decoding a signal encoded using the method described in detail above. The received multiplexed error signal 9 is provided at the input of a demultiplexer 6 which separates the signal into 1024 spectral value streams $\tilde{e}_j(n)$. These streams are then passed to a signal processing unit 7. For each stream, this unit 7 calculates for each error value a predicted or estimated spectral value. A predetermined number of these predicted values are in turn used to calculate linear prediction coefficients to allow the calculation of a predicted value for a current sample. This process is identical to that described for the coding process. A reconstructed spectral value is obtained by combining the received error signal with the corresponding predicted value. The streams of reconstructed spectral values are provided to a further processing unit 8 which carries out an inverse MDCT on the data to substantially regenerate the original audio signal.
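The symmetry between coder and decoder can be illustrated with a small round trip for one stream. This sketch makes several assumptions of its own: it fits a generic second-order least-squares predictor to the last `block_len` reconstructed values using its own variable naming (not the indexing of the covariance formulas in the specification), it uses a uniform quantizer, and it falls back to zero coefficients when the fit is degenerate. The names `fit_order2`, `predict`, `encode` and `decode` are hypothetical.

```python
import math

def fit_order2(hist):
    """Least-squares fit of hist[k] ~ a1*hist[k-1] + a2*hist[k-2] over the block."""
    c11 = c12 = c22 = d1 = d2 = 0.0
    for k in range(2, len(hist)):
        u, v, y = hist[k - 1], hist[k - 2], hist[k]
        c11 += u * u
        c12 += u * v
        c22 += v * v
        d1 += y * u
        d2 += y * v
    det = c11 * c22 - c12 * c12
    if abs(det) < 1e-12:                   # degenerate block: no prediction (assumption)
        return 0.0, 0.0
    return (d1 * c22 - d2 * c12) / det, (d2 * c11 - d1 * c12) / det

def predict(recon, block_len):
    """Predict the next value from reconstructed values only."""
    a1, a2 = fit_order2(recon[-block_len:])
    return a1 * recon[-1] + a2 * recon[-2]

def encode(x, step, block_len=6):
    recon, errors = [0.0, 0.0], []
    for sample in x:
        pred = predict(recon, block_len)
        e_q = step * round((sample - pred) / step)   # quantized prediction error
        errors.append(e_q)
        recon.append(pred + e_q)
    return errors, recon[2:]

def decode(errors, block_len=6):
    recon = [0.0, 0.0]
    for e_q in errors:
        recon.append(predict(recon, block_len) + e_q)
    return recon[2:]

x = [math.sin(0.2 * n) for n in range(50)]
errors, encoder_recon = encode(x, step=0.1)
decoder_recon = decode(errors)
print(decoder_recon == encoder_recon)
```

Since both sides derive the coefficients from the same reconstructed history, only the quantized errors need to be conveyed for the decoder to follow the coder; no coefficients appear in the `errors` list.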
Figure 3 shows a mobile telephone 11 incorporating in its transmitter, apparatus 12 (corresponding to the apparatus of Figure 1) for coding a radio telephone signal using the coding method described above. The telephone also incorporates in its receiver, apparatus 13 (corresponding to the apparatus of Figure 2) for decoding a received encoded telephone signal.

Claims

1. A method of coding an audio electrical signal using backward adaptive prediction, the method comprising the steps of:
(a) receiving a first time frame of an audio electrical signal to be coded;
(b) transforming the time frame into the frequency domain to generate a frequency spectrum having 512 or more spectral components;
(c) receiving subsequent time frames of said audio electrical signal and repeating step (b) for these frames in sequence to generate a stream of spectral data values for each spectral component;
(e) for each said stream, calculating a set of prediction coefficients for each spectral value using the covariances of a predetermined number of previously determined reconstructed spectral values of the stream, using said set of prediction coefficients to generate a predicted spectral value, and calculating the error between the predicted spectral value and the corresponding actual spectral value, wherein the calculated errors provide a coded representation of the spectral value stream and said errors can be recombined with predicted spectral values to obtain reconstructed spectral values.
2. A method according to claim 1, wherein the prediction order is two.
3. A method according to claim 1 or 2 and comprising recalculating the prediction coefficients only after receipt of multiple spectral values and using the same coefficients for several consecutive spectral values.
4. A method according to claim 3, wherein said multiple is two.
5. A method according to claim 3 or 4 and comprising switching between a low coefficient update rate and a high update rate immediately upon detection of a transient in the audio signal to be coded.
6. A method according to any one of the preceding claims, wherein said predetermined number of spectral values is four or more.
7. A method according to any one of the preceding claims, wherein said predetermined number of spectral values is ten or less.
8. A method according to any one of the preceding claims, wherein a least squares method is used for evaluating the prediction coefficients.
9. A method according to claim 8 when appended to claim 2, wherein said covariances are determined as:
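$$r_{0,0} = \sum_{i=2}^{L-1} \tilde{x}^2(n-i), \qquad r_{1,1} = \sum_{i=2}^{L-1} \tilde{x}^2(n-i+1), \qquad r_{0,1} = r_{1,0} = \sum_{i=2}^{L-1} \tilde{x}(n-i+1)\,\tilde{x}(n-i)$$

$$r_1 = \sum_{i=2}^{L-1} \tilde{x}(n-i+2)\,\tilde{x}(n-i), \qquad r_2 = \sum_{i=2}^{L-1} \tilde{x}(n-i+2)\,\tilde{x}(n-i+1)$$

$$temp_1 = \sum_{i=2}^{L-2} \tilde{x}^2(n-i), \qquad r_{0,0} = \tilde{x}^2(n-L+1) + temp_1, \qquad r_{1,1} = temp_1 + \tilde{x}^2(n-1)$$

$$temp_2 = \sum_{i=2}^{L-2} \tilde{x}(n-i+1)\,\tilde{x}(n-i), \qquad r_{0,1} = r_{1,0} = \tilde{x}(n-L+1)\,\tilde{x}(n-L+2) + temp_2$$

$$r_2 = temp_2 + \tilde{x}(n-1)\,\tilde{x}(n), \qquad r_1 = \sum_{i=2}^{L-1} \tilde{x}(n-i+2)\,\tilde{x}(n-i).$$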
10. A method according to claim 9, wherein the prediction coefficients are determined according to:

$$a_1 = \frac{r_{1,1}\,r_1 - r_{0,1}\,r_2}{r_{0,0}\,r_{1,1} - r_{0,1}^2}, \qquad a_2 = \frac{r_{0,0}\,r_2 - r_{0,1}\,r_1}{r_{0,0}\,r_{1,1} - r_{0,1}^2}.$$
11. A method of decoding an audio electrical signal encoded, the decoding method comprising the steps of: receiving as an input signal a sequence of error values corresponding to the coded audio signal and separating these values into spectral component streams; for each stream, determining a corresponding predicted spectral component value for each error value using a set of prediction coefficients, the prediction coefficients being calculated using covariances of a predetermined number of previously determined consecutive predicted spectral component values for that stream, and combining the error value and the predicted spectral value to provide a reconstructed spectral value; and substantially reconstructing said audio signal by combining and frequency-to-time transforming the reconstructed spectral values of all of the streams.
12. Apparatus for coding an audio electrical signal using backward adaptive prediction, the apparatus comprising: an input for receiving an audio electrical signal to be coded; a time-to-frequency domain transformer for transforming sequentially received time frames of the received signal from the time domain to the frequency domain to provide frequency spectra having 512 or more spectral components; signal processing means associated with each spectral component for receiving as a stream the associated spectral values, for calculating for each spectral value a set of prediction coefficients using covariances of a predetermined number of previously reconstructed spectral values, for using said set of prediction coefficients to generate a predicted spectral value, and for calculating the error between the predicted value and the corresponding actual spectral value, the calculated errors providing a coded representation of the received spectral value stream and wherein said errors can be recombined with predicted spectral values to obtain reconstructed spectral values.
13. Apparatus for decoding an audio electrical signal encoded, the apparatus comprising: an input for receiving a sequence of error values corresponding to the coded audio signal; and signal processing means for separating said sequence of values into separate spectral component streams and for determining for each error value a corresponding predicted spectral value using a set of prediction coefficients, the signal processing means being arranged to calculate the prediction coefficients using covariances of a predetermined number of previously determined consecutive reconstructed spectral values, the signal processing means being further arranged to combine each error value with the corresponding predicted spectral value to provide a reconstructed spectral value and to substantially reconstruct said audio signal by combining and frequency-to-time transforming the reconstructed spectral values of all of the sub-bands.
14. A communications system comprising in combination the apparatus of claims 12 and 13.
15. A mobile communication device comprising in combination the apparatus of claims 12 and 13.
PCT/FI1998/000029 1997-02-07 1998-01-15 Audio coding method and apparatus WO1998035447A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU56648/98A AU5664898A (en) 1997-02-07 1998-01-15 Audio coding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI970553 1997-02-07
FI970553A FI970553A (en) 1997-02-07 1997-02-07 Audio coding method and device

Publications (2)

Publication Number Publication Date
WO1998035447A2 true WO1998035447A2 (en) 1998-08-13
WO1998035447A3 WO1998035447A3 (en) 1998-11-19

Family

ID=8548146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI1998/000029 WO1998035447A2 (en) 1997-02-07 1998-01-15 Audio coding method and apparatus

Country Status (9)

Country Link
JP (1) JPH10260699A (en)
CN (1) CN1202513C (en)
AU (1) AU5664898A (en)
DE (1) DE19804584A1 (en)
FI (1) FI970553A (en)
FR (1) FR2759510A1 (en)
GB (1) GB2322776B (en)
SE (1) SE9800338L (en)
WO (1) WO1998035447A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7610195B2 (en) 2006-06-01 2009-10-27 Nokia Corporation Decoding of predictively coded data using buffer adaptation

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8332216B2 (en) 2006-01-12 2012-12-11 Stmicroelectronics Asia Pacific Pte., Ltd. System and method for low power stereo perceptual audio coding using adaptive masking threshold
US8665945B2 (en) * 2009-03-10 2014-03-04 Nippon Telegraph And Telephone Corporation Encoding method, decoding method, encoding device, decoding device, program, and recording medium
CN106409299B (en) * 2012-03-29 2019-11-05 华为技术有限公司 Signal coding and decoded method and apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0673014A2 (en) * 1994-03-17 1995-09-20 Nippon Telegraph And Telephone Corporation Acoustic signal transform coding method and decoding method
US5473727A (en) * 1992-10-31 1995-12-05 Sony Corporation Voice encoding method and voice decoding method
EP0692881A1 (en) * 1993-11-09 1996-01-17 Sony Corporation Quantization apparatus, quantization method, high efficiency encoder, high efficiency encoding method, decoder, high efficiency encoder and recording media

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02131038A (en) * 1988-11-10 1990-05-18 Pioneer Electron Corp Signal transmitter
DE19526366A1 (en) * 1995-07-20 1997-01-23 Bosch Gmbh Robert Redundancy reduction method for coding multichannel signals and device for decoding redundancy-reduced multichannel signals
GB2318029B (en) * 1996-10-01 2000-11-08 Nokia Mobile Phones Ltd Audio coding method and apparatus

Also Published As

Publication number Publication date
AU5664898A (en) 1998-08-26
FR2759510A1 (en) 1998-08-14
GB9802611D0 (en) 1998-04-01
GB2322776B (en) 2002-03-13
DE19804584A1 (en) 1998-08-13
CN1199959A (en) 1998-11-25
CN1202513C (en) 2005-05-18
SE9800338D0 (en) 1998-02-05
SE9800338L (en) 1998-08-08
FI970553A0 (en) 1997-02-07
GB2322776A (en) 1998-09-02
FI970553A (en) 1998-08-08
WO1998035447A3 (en) 1998-11-19
JPH10260699A (en) 1998-09-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM GW HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM GW HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase in:

Ref country code: JP

Ref document number: 1998533804

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase