WO2002011125A1 - Attenuation of background noise and echoes in audio signal - Google Patents

Publication number
WO2002011125A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
procedure
follows
sub
per application
Prior art date
Application number
PCT/EP2001/008827
Other languages
French (fr)
Inventor
László GYIMESI
Rudolf FÖLDVÁRI
Original Assignee
Herterkom Gmbh
Priority date
Filing date
Publication date
Application filed by Herterkom Gmbh
Priority to AU2001282047A
Publication of WO2002011125A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02168 Noise filtering characterised by the method used for estimating noise, the estimation exclusively taking place during speech pauses
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, using spectral analysis, using subband decomposition

Definitions

  • the background noise affects the period times detected in the various sub-bands differently. If the noise contains no periodic or quasi-periodic component, that is, if the noise spectra in the sub-bands are independent, then in the sum of the period times detected in the channels the period times themselves add up directly, whilst the components causing the differences add up as the square root of the sum of their squares: the identical components "as per voltage", the independent components "as per power".
  • the time functions within the so determined period times provide the time function of each voiced period. These voiced periods do not change much within a voiced phoneme but in case of a heavy background noise the subsequent voiced periods will be different due to the noise.
  • the echoing speech is hardly intelligible and heavily degrades the efficiency of the speech recognition system.
  • the methods described above are primarily able to separate the signal from the noise, but control using both the minimum and the maximum reduces the effect of the echo as well.
  • Fig. 3. shows the layout of the signal cleaning procedure described above extended with echo-reduction.
  • the values of the amplitude correction member (12) and the run time correction member (13) are set by the measuring generator/analyzer unit (14): the loudspeaker (15) is supplied with the measuring signal, the received signal is analyzed, and the corrections required in the sub-bands are determined using the Nyquist method followed by iteration. If possible, the loudspeaker should be placed at the location of the source of the useful sound, e.g. the speaker, or in its vicinity.
  • Fig. 4. shows the layout that is able to separate quasi-periodic signals.
  • the quadrature components $x_{k0}(t)$, $y_{k0}(t)$
  • Their instantaneous parameters ($A_{k0}(t)$, $\omega_{k0}(t)$) can be generated by the unit (17) that determines the instantaneous parameters of the quadrature components (GAFT).
  • with the filter unit (18) designated to filter the instantaneous parameters, with the transformer unit (19) performing the frequency shift and the inverse transformation, and with the summing of the signals $Y_k(t)$ generated in this way by the adder (8), the quasi-periodic signal separated from the background noise is created.
  • the asymmetric Zwicker filter bank is not suitable directly for the implementation with quadrature modulation.
  • the requirement of asymmetric attenuation is met first by the low pass filter bank (20), which implements the asymmetry of the Zwicker filters in such a way that the remaining attenuation is symmetric about the mid-band, as shown in Fig. 5. The remaining attenuation can then be realized with quadrature modulation and low pass filtering, using the procedure already shown in Fig. 4.

Abstract

The subject of the invention is a signal cleaning procedure for isolating audio-frequency signals from background noise. In the procedure, the signal to be cleaned is fed into a filter bank, and the resulting sub-band signals are passed to a transformer unit that applies a transformation to them to produce a cleaned signal. The procedure is characterized in that the transformer unit first produces the instantaneous amplitude, or an approximation of the amplitude, of each sub-band signal appearing at the output (1a) of the filter bank (1); the instantaneous amplitude values are then averaged, the sub-band signals are weighted using the averages, and finally the weighted signals are summed to give the cleaned signal.

Description

ATTENUATION OF BACKGROUND NOISE AND ECHOES IN AUDIO SIGNAL
The invention relates to a signal cleaning procedure for isolating audio-frequency signals from background noise. In the procedure, the signal to be cleaned is fed into a filter bank, and the resulting sub-band signals are passed to a transformer unit that applies a transformation to them to produce a cleaned signal.
Several solutions are already known for separating sounds from a noisy background. Their common principle is that the input signal is passed through a filter bank and a transformation is applied to the sub-band signals of given bandwidth to produce the cleaned signal. Examples are described in patents US 4.809.331, HU 183.491 and EP 240.329.
However, the solution described in EP Patent No. 240.329 can essentially be used only for speech recognition, and its transformation of the separated signals is not reliable enough. A further deficiency is that it requires so-called "training", which makes rapid and flexible use of the procedure harder.
The version described in Patent No. HU 183.491 can essentially serve as one step of a noise cleaning procedure, but it is not suitable for filtering different kinds of sounds.
The objective of the invention was to eliminate these deficiencies and to create a version able to separate voice and other sounds appropriately from a noisy background.
The invention is based on the idea that the task can be performed with a non-linear impact model consisting essentially of two components: first, a filter bank made up of a Zwicker filter series, and second, a non-linear transformation producing a generalized amplitude/frequency function.
The invention also rests on the recognition that speech always contains pauses of 100-200 ms. The minimum of the signal measured over a period containing such a pause is therefore the background noise itself, because the combined level of the voice and the background noise can only be larger than that of the background noise alone.
In accordance with this objective, the procedure of the invention for isolating audio-frequency signals from background noise, in which the signal to be cleaned is fed to a filter bank and the sub-band signals are passed to a transformer unit that applies a transformation to them to produce a cleaned signal, is based on the following principle. First, the transformer unit produces the instantaneous amplitude, or an approximation of the amplitude, of each sub-band signal appearing at the output of the filter bank. The instantaneous amplitude values are then averaged, the sub-band signals are weighted using these averages, and finally the weighted signals are summed to give the cleaned signal.
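The pause-based noise estimate described above can be illustrated with a short sketch. It is not taken from the patent: the function name `sliding_minimum`, the window length and the sample envelope values are assumptions chosen only to show how a trailing minimum of a channel's averaged amplitude tracks the noise floor while speech is present.

```python
from collections import deque


def sliding_minimum(envelope, window):
    """Return, for each sample, the minimum of the last `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    candidates = deque()  # indices whose values increase from front to back
    minima = []
    for i, value in enumerate(envelope):
        # Older candidates that are not smaller can never be the minimum again.
        while candidates and envelope[candidates[-1]] >= value:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it has left the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        minima.append(envelope[candidates[0]])
    return minima


# Hypothetical averaged amplitude: a noise floor near 0.2 with a speech burst.
envelope = [0.2, 0.25, 0.2, 1.0, 1.4, 1.1, 0.9, 0.22, 0.2, 0.21]
noise_floor = sliding_minimum(envelope, window=8)
```

In this toy example the window is longer than the burst, so the estimate stays at the level observed in the pauses. In practice the window would have to span at least one speech pause.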
Another feature of the invention can be that a Zwicker filter series is used as a filter bank.
In one version of the procedure, the sub-band signals are fed into a squaring high pass filter (9), which removes the slowly changing component, and the period is determined from the resulting signal using a null detector (10).
In a different version of the invention, the successive periods are determined using the period obtained from the sub-bands, their weighted average is formed, and the time functions obtained this way are summed. To determine the period, the high-pass-filtered squared amplitudes of several sub-bands are summed and this sum is used. The degree of curvature of the non-linear characteristic used to determine the weighting factors is controlled by the minimum found in the average of each channel over a time period, expediently 0.5 to 1 s. In another use of the procedure, the curvature is controlled by both the minimum and the maximum found in the average of each channel over a time period.
For the procedure of the invention, it may be advantageous to place microphones at different distances from the sound source to reduce echo. The signal of the nearby microphone is delayed, the signal of the remote microphone is shaped to resemble the delayed signal of the nearby microphone as closely as possible, and the parameters required for this equalization are stored.
In another implementation of the procedure, amplitude and delay-time corrections are applied in the individual channels as follows: a standard test signal is generated and fed to a loudspeaker, the amplitude and run time of the signal received from the surrounding space are measured at the mid-band of the filters of the filter series, and the correction values are derived from the measured values, repeating the procedure one or more times (iteration).
In another version of the invention, the microphone producing the input signal and the loudspeaker fed with the output signal are placed in the same space.
The instantaneous parameters are generated from the input signal using quadrature modulation and low pass filters around zero frequency. The instantaneous parameters obtained this way are then averaged, the cleaned sub-bands are obtained from these averages by a further modulation, and the cleaned output signal is obtained by summing them.
It can also be useful for the procedure if the input signal is first filtered by a low pass filter bank implementing the asymmetry of the Zwicker filter series and quadrature modulation is then applied to it. In another implementation of the invention, the instantaneous parameters are separated into DC and AC components by an alternate filter, and the instantaneous parameters of the AC components are then generated; that is, the transformation is repeated on the result of the generalized amplitude and frequency transformation (GAFT).
The advantage of the procedure of the invention is that it separates musical sounds and other quasi-periodic audio-frequency signals from a noisy environment with favourable results. The same procedure and tool kit can therefore be used widely and flexibly for different purposes.
Using the procedure of the invention, for example, the efficiency of hearing aids can be improved and voice and musical sounds can be separated from background noise. The procedure can also enhance recordings made in studios of moderate or poor quality and restore old sound recordings.
A further advantage is that the procedure can be used to actively improve the acoustics of rooms by means of echo suppression implemented with an acoustic chain containing a microphone and loudspeakers.
It is also an advantage that the procedure can separate quasi-periodic signals, for instance the engine sounds of vehicles, from background noise in order to determine their movement. It can also be used to examine rotating components, e.g. bearings and wheels, and to reveal anomalies in their operation.
The principle of the procedure according to the invention is the creation and application of a non-linear hearing model at the circuit layout level, in which the filter bank, unlike in other solutions, is built from Zwicker filters. Zwicker published in 1957 that human hearing perceives loudness differently inside a critical band than outside it. Zwicker and Feldtkeller determined the slope of the filters' cut-off region from the masking phenomenon. These filters are approximately one third of an octave wide, asymmetrical, and exceptionally steep on the upper side. This filtering effect already takes place on the basilar membrane of the inner ear. The data of the filter series are fixed by an international standard. The combined characteristic of the 25 filters covers the entire audible spectrum; approximately 14-15 filters are enough for processing speech, while studio quality requires 18 filters.
The output signals of the filters are transformed: we generate their instantaneous parameters, which in principle means the functions of the instantaneous amplitude and the instantaneous frequency, defined as follows:
Figure imgf000006_0001
where
Figure imgf000006_0002
In these formulas, x(t) is the output signal of the band-pass filter, y(t) is its Hilbert pair, and x'(t) and y'(t) are their time derivatives. This non-linear transformation is called the generalized amplitude and frequency transformation (GAFT).
The result of the above transformation, or an approximation of it, can be determined in several ways. The transformation can be interpreted as the simultaneous operation of mutually independent ideal AM and FM demodulators; the result of the GAFT can therefore be determined or approximated with any AM and FM demodulator.
There is no neurological proof for the further part of the model, namely that each instantaneous parameter is followed by an alternate filter; this is only assumed. However, the results of psychophysical examinations can be explained well with this model.
The invention is described in more detail below with reference to the drawings, which show the following:
Fig. 1 shows the diagram of the simplest version of the signal cleaning procedure of the invention.
Fig. 2 shows another possible layout of the signal cleaning process.
Fig. 3 is the block diagram of the signal cleaning procedure supplemented with echo reduction.
Fig. 4 is the schematic diagram of the layout required for the separation of quasi-periodic signals.
Fig. 5 shows another layout that is able to separate quasi-periodic signals.
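As an illustration of the instantaneous parameters, the following sketch evaluates them at a single instant. The defining formulas survive here only as image placeholders, so the sketch assumes the standard definitions for a signal x(t) and its Hilbert pair y(t): amplitude sqrt(x² + y²) and angular frequency (x·y' − y·x') / (x² + y²). The function name and the test tone are hypothetical.

```python
import math


def instantaneous_parameters(x, y, dx, dy):
    """Amplitude and angular frequency of the analytic signal x + j*y.

    x, y   : the signal and its Hilbert pair at one instant
    dx, dy : their time derivatives at the same instant
    Assumes the standard definitions, not the patent's own figures.
    """
    power = x * x + y * y
    amplitude = math.sqrt(power)
    # d/dt atan2(y, x) = (x*y' - y*x') / (x^2 + y^2)
    omega = (x * dy - y * dx) / power if power > 0.0 else 0.0
    return amplitude, omega


# Test tone: x(t) = a*cos(w*t), whose Hilbert pair is y(t) = a*sin(w*t).
a, w, t = 0.5, 2.0 * math.pi * 440.0, 0.0123
x, y = a * math.cos(w * t), a * math.sin(w * t)
dx, dy = -a * w * math.sin(w * t), a * w * math.cos(w * t)
amplitude, omega = instantaneous_parameters(x, y, dx, dy)
```

For a pure tone the sketch recovers the constant amplitude a and the constant angular frequency w, which is the behaviour expected of an ideal AM and FM demodulator pair.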
Fig. 1 shows the simplest block diagram of the signal cleaning procedure of the invention. The signal X(t) to be cleaned is fed to the input and split into sub-bands by the filter bank (1), which expediently contains a Zwicker filter series. The instantaneous amplitude of each sub-band is then determined by the unit (2) implementing the non-linear transformation, and the average of each instantaneous amplitude is formed by the low pass filter (3). From these averages, the memoryless non-linear characteristic (4) determines the weighting factors, which lie between zero and one. The sub-band signals, delayed by the delay (6), are weighted with these factors, in the present version by multiplication in the multiplier (7), and the cleaned signal is then formed by the adder (8). The degree of curvature of the non-linear characteristic (4) is set by the extreme value determiner (5), which determines the minimum, or the minimum and the maximum.
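The chain of blocks (2) to (8) can be sketched as follows, purely as an illustration under stated assumptions. The filter bank (1) is assumed to have been applied already, so the input is a list of non-empty sub-band sample lists of equal length. Rectification stands in for the instantaneous amplitude, a one-pole smoother for the low pass filter (3), a trailing-window minimum for the extreme value determiner (5), and the curve 1 − 1/z² for the non-linear characteristic (4). All names and default values are hypothetical.

```python
def smooth(values, alpha):
    """One-pole low-pass filter (block 3): running average of the amplitude."""
    out, state = [], values[0]
    for v in values:
        state += alpha * (v - state)
        out.append(state)
    return out


def clean(sub_bands, alpha=0.05, window=200, delay=0):
    """Weight each sub-band by a factor in [0, 1] and sum the results."""
    n = len(sub_bands[0])
    output = [0.0] * n
    for band in sub_bands:
        amplitude = [abs(v) for v in band]          # block 2 (approximation)
        average = smooth(amplitude, alpha)          # block 3
        for i in range(n):
            start = max(0, i - window + 1)
            floor = min(average[start:i + 1])       # block 5 (trailing minimum)
            z = average[i] / floor if floor > 0.0 else 1.0
            weight = max(0.0, 1.0 - 1.0 / (z * z))  # block 4 (assumed curve)
            j = i - delay                           # block 6
            if j >= 0:
                output[i] += weight * band[j]       # blocks 7 and 8
    return output


# Two hypothetical sub-bands: steady noise, and noise followed by a louder signal.
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
speech = [(0.1 if i < 50 else 1.0) * (1 if i % 2 == 0 else -1) for i in range(100)]
out = clean([noise, speech])
```

A channel whose averaged amplitude never rises above its own minimum is suppressed entirely, while a channel that rises well above its minimum passes almost unchanged.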
The so-called "optimum filter" provides the theoretical basis of this procedure. If a system input receives a noisy signal, that is,
x(t) = s(t) + n(t),
then it is reasonable to choose the parameters of the system so that the output signal y(t) is "as similar to the signal s(t) as possible". The "as possible" condition can be formulated in various ways. It is easiest to handle when the mean square of the error, that is, the expected value of its square, is minimized; in other words, when we look for the system for which the expected value of the square of the
e(t) = y(t) - s(t)
error signal is at a minimum. In the general case this minimum cannot be determined, so it is reasonable to restrict the solution to linear systems. The system obtained under this restriction is the so-called "optimum filter", since a filter is a linear system with a general characteristic. The calculation leads to a Wiener-Hopf integral equation, whose solution gives the input/output characteristic K(ω) of the system sought, expressed in terms of the power density spectra of the signal and the noise as follows:
Figure imgf000008_0001
The first part of the non-linear impact model, namely the filter bank (1) assembled from the Zwicker filter series, is still linear, so, based on the solution of the Wiener-Hopf integral equation, the weighting factor $A_k(t)$ is to be set to the following value:
Figure imgf000008_0002
In this equation the power density spectra are not known, but very good estimates can be given for them. Starting from the observation that speech always contains pauses, it is reasonable to assume that the square of the minimum of $A_k^L(t)$ is proportional to the noise power, that is,
Figure imgf000008_0003
and, in addition, the square of the instantaneous value of $A_k^L(t)$ is proportional to the sum of the signal power and the noise power, because the signal and the noise are mutually independent processes. On this basis it can be written that
$\Phi_s(\omega_k) + \Phi_n(\omega_k) = \beta \left(A_k^L(t)\right)^2$, that is,
Figure imgf000009_0001
Let us normalize the value of $A_k^L(t)$ by its minimum, that is, let us introduce the following simplified symbol:
Figure imgf000009_0002
Using the above, after a few simple transformations the value of the weighting factor becomes:
Figure imgf000009_0003
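The derivation of the weighting factor can be checked numerically with a small sketch. The equations themselves are available here only as image placeholders, so the sketch assumes the classical Wiener form, signal power divided by the sum of signal and noise power, together with the two proportionality assumptions stated in the text. The names `wiener_gain`, `weight_from_normalized_amplitude`, the symbol z for the normalized amplitude and the numeric values are assumptions.

```python
def wiener_gain(signal_power, noise_power):
    """Classical optimum-filter gain: S / (S + N)."""
    return signal_power / (signal_power + noise_power)


def weight_from_normalized_amplitude(z):
    """Weighting factor 1 - 1/z**2 for a normalized amplitude z >= 1."""
    if z < 1.0:
        raise ValueError("the normalized amplitude cannot be below 1")
    return 1.0 - 1.0 / (z * z)


# Hypothetical channel: averaged amplitude 0.5 now, 0.2 at its minimum.
beta = 3.0                    # proportionality constant; it cancels out
a_min, a_now = 0.2, 0.5
noise_power = beta * a_min ** 2                 # minimum squared ~ noise power
signal_power = beta * a_now ** 2 - noise_power  # total power minus noise power
gain = wiener_gain(signal_power, noise_power)
weight = weight_from_normalized_amplitude(a_now / a_min)
```

Under these assumptions both routes give the same value, 0.84 in this example, and the weight falls to exactly zero when the amplitude equals its minimum.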
Experience shows that this curve has a sharp break at the minimum value. As a result, near the minimum the signal of the channel either stops completely or appears with a significant amplitude. This phenomenon is known from experience to be very disturbing, so it is worth choosing a weighting characteristic with a continuous transition. A weighting function derived from an arctangent (arctg) characteristic serves the purpose and can be formulated like this:
Figure imgf000009_0004
where, by choosing the parameters A and B, equality with the weighting factor obtained from the Wiener-Hopf solution can be reached at any two points. Naturally, the theoretical characteristic can be approximated with several functions, and equality can be reached at more than two points, but the accuracy of the approximation is not of great importance. It is essential, however, that the transition should not contain such a sharp break in the vicinity of the minimum as the theoretical characteristic does.

The value of the weighting factor can also be determined with a different method. It can be assumed that the local maximum is created jointly by the signal, e.g. speech, and the background noise. That is, let us adopt the hypothesis that the signal always contains speech in the vicinity of the maximum values (this is always true if the maximum of the signal is greater than the maximum of the noise) and that the vicinity of the minimum is determined by the noise only.
The parameters A and B of the arctg characteristic can therefore also be determined in such a way that the value of the weighting factor is near 1 at the maximum and near 0 at the minimum, e.g. 0.9 at the maximum and 0.1 at the minimum. Let us denote the maximum of the normalized amplitude by z_max; the normalized minimum is trivially 1, so the following can be written:
Figure imgf000010_0001
Figure imgf000010_0002
Solving the set of equations above gives:
Figure imgf000010_0003
It can be seen that in this case the values of the parameters are determined by the ratio of the minimum and the maximum. It is worth mentioning that control based on the minimum alone is more pleasant for the listener, whereas for a speech recognition procedure using both the minimum and the maximum may result in a significant improvement.
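The two-point fit described above can be sketched in a few lines. The exact arctg formula is not recoverable from this text, so the form w(z) = 0.5 + arctan(A·(z − B))/π used here is an assumption chosen only because it runs smoothly from 0 to 1; the function names and the name `z_max` for the maximum of the normalized amplitude are likewise hypothetical.

```python
import math


def fit_arctan_params(z_max, w_min=0.1, w_max=0.9):
    """Fit A and B so that weight(1) == w_min and weight(z_max) == w_max.

    Assumed form (not taken from the source):
        w(z) = 0.5 + atan(A * (z - B)) / pi
    """
    if z_max <= 1.0:
        raise ValueError("z_max must exceed the normalized minimum of 1")
    lo = math.tan(math.pi * (w_min - 0.5))  # required value of A * (1 - B)
    hi = math.tan(math.pi * (w_max - 0.5))  # required value of A * (z_max - B)
    a = (hi - lo) / (z_max - 1.0)
    b = 1.0 - lo / a
    return a, b


def arctan_weight(z, a, b):
    """Smooth weighting factor between 0 and 1 for normalized amplitude z."""
    return 0.5 + math.atan(a * (z - b)) / math.pi


if __name__ == "__main__":
    a, b = fit_arctan_params(z_max=5.0)
    for z in (1.0, 2.0, 3.0, 5.0):
        print(z, round(arctan_weight(z, a, b), 3))
```

With this assumed form, A and B depend only on `z_max`, that is, on the ratio of the maximum to the minimum, which matches the remark in the text.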
Fig. 2 shows another possible layout of the signal cleaning procedure. The squaring high pass filter (9) removes the slowly changing component and determines the signal A_k^H(t), whose base frequency is the base frequency of the signal, that is, the pitch frequency itself. The base frequency of the A_k^H(t) signal is obtained by using the null detector (10) or the autocorrelation function. The period times determined in the individual channels are identical only in theory, or in the absence of noise.
The background noise affects the period times detected in the various sub-bands differently. If the noise does not contain a periodic or quasi-periodic component, that is, if the noise spectra in the sub-bands are independent, then in the sum of the period times detected in the channels the identical components add linearly ("as per voltage"), whilst the components causing the differences add as the square root of the sum of their squares ("as per power").
If the noise is not independent across the sub-bands, then the procedure does not improve the accuracy of the period time determination, but it does not degrade it either.
The time functions within the period times determined in this way provide the time function of each voiced period. These voiced periods do not change much within a voiced phoneme, but in the case of heavy background noise the subsequent voiced periods will differ because of the noise.
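As an illustration of the null detector step only, the sketch below estimates a period time from the spacing of upward zero crossings. It assumes the input has already passed the squaring high pass filter, so that it oscillates around zero; the function name and the linear interpolation between samples are choices made for this example and are not taken from the source.

```python
import math


def period_from_zero_crossings(samples, sample_rate):
    """Return the mean spacing in seconds between upward zero crossings.

    Returns None when fewer than two crossings are found.
    """
    crossings = []
    for n in range(1, len(samples)):
        prev, cur = samples[n - 1], samples[n]
        if prev < 0.0 <= cur:
            # Linear interpolation locates the crossing between the two samples.
            frac = -prev / (cur - prev)
            crossings.append((n - 1 + frac) / sample_rate)
    if len(crossings) < 2:
        return None
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2.0 * math.pi * 100.0 * n / fs + 0.3) for n in range(800)]
    print(period_from_zero_crossings(tone, fs))
```

An autocorrelation-based estimate, which the text names as the alternative, would be more robust to noise but is not shown here.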
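The "as per voltage" versus "as per power" argument can be made concrete with a small calculation. The sketch assumes every channel reports the same true period plus an independent error of equal standard deviation; the function name and that equal-deviation assumption are illustrative only.

```python
import math


def sum_over_channels(common, independent_std, n_channels):
    """Sum identical and independent components over n_channels channels.

    The identical component adds linearly ("as per voltage"), while the
    independent components add as the root of the sum of squares
    ("as per power").
    """
    coherent = n_channels * common
    incoherent = math.sqrt(n_channels) * independent_std
    return coherent, incoherent


if __name__ == "__main__":
    period, error = 0.01, 0.001  # seconds, hypothetical values
    total, spread = sum_over_channels(period, error, n_channels=4)
    print("relative error, one channel:", error / period)
    print("relative error, four channels:", spread / total)
```

Under these assumptions the relative error of the summed period time falls by the square root of the number of channels.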
A further improvement in separating the speech from the background noise can be achieved if the subsequent periods are averaged. Let us denote the durations of the subsequent voiced periods by T_{j-1}, T_j, T_{j+1}. The (j-1)th and (j+1)th parts of the time function are stretched or shrunk to the same length as the jth part, then all of these are averaged (in a weighted way) and the voiced period at the jth position is replaced with the averaged time function. Naturally, the averaging can be carried out over more than three periods, but it is not reasonable to use more than 5-6 periods, because then the changes in the speech are also "equalized" and the speech becomes unnatural. This task is performed by the voiced averager (11); the cleaned signal is then weighted with the A_k(t) weighting factor, e.g. multiplied in the multiplier (7), and these time functions are then summed up.
Not only the background noise but also the echo is disturbing: echoing speech is hardly intelligible and heavily degrades the efficiency of a speech recognition system. The methods described above are primarily able to separate the speech from the noise, but control based on the minimum and the maximum reduces the effect of the echo as well.
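The stretch-and-average step can be sketched as follows. Linear interpolation is used for the stretching and the weights (0.25, 0.5, 0.25) are placeholders; the source specifies neither, so both are assumptions, as are the function names.

```python
def resample_linear(segment, length):
    """Stretch or shrink a segment to `length` samples by linear interpolation."""
    if length == 1 or len(segment) == 1:
        return [segment[0]] * length
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        left = min(int(pos), len(segment) - 2)
        frac = pos - left
        out.append(segment[left] * (1.0 - frac) + segment[left + 1] * frac)
    return out


def average_periods(periods, j, weights=(0.25, 0.5, 0.25)):
    """Return the weighted average of periods j-1, j and j+1.

    The neighbouring periods are first resampled to the length of period j.
    """
    target_len = len(periods[j])
    aligned = [resample_linear(periods[i], target_len) for i in (j - 1, j, j + 1)]
    total = sum(weights)
    return [
        sum(w * seg[n] for w, seg in zip(weights, aligned)) / total
        for n in range(target_len)
    ]


if __name__ == "__main__":
    periods = [[0.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0, 1.0], [2.0, 2.0]]
    print(average_periods(periods, j=1))
```

Extending the window to five or six periods only requires a longer index tuple and weight vector; the text advises against going beyond that.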
Fig. 3 shows the layout of the signal cleaning procedure described above, extended with echo reduction. The values of the amplitude correction member (12) and the run time correction member (13) are set by the measuring generator/analyzer unit (14): the loudspeaker (15) is supplied with the signal, the received signal is analyzed, and the corrections required in the sub-bands are determined using the Nyquist method followed by iteration. If possible, the loudspeaker should be placed at, or close to, the location of the source of the useful sound, e.g. the speaker.
Fig. 4 shows a layout that is able to separate quasi-periodic signals. Using the quadrature modulator and difference signal separator (16), the quadrature components (x_k0(t), y_k0(t)) can easily be generated in the vicinity of zero frequency. Their instantaneous parameters (A_k0(t), W_k0(t)) can be generated by the procedure (17) that determines the instantaneous parameters of the quadrature components (GAFT). The instantaneous parameters are filtered by the filter unit (18), the transformer unit (19) performs the frequency shift and the inverse transformation, and the adder (8) sums the Y_k(t) signals generated in this way to create the quasi-periodic signal separated from the background noise.
It should be noted here that the asymmetric Zwicker filter bank is not directly suitable for implementation with quadrature modulation. However, if the asymmetric attenuation is realized first by the low pass filter bank (20), which implements the asymmetry of the Zwicker filters in such a way that the remaining attenuation is symmetric about the mid-band, as shown in Fig. 5, then the remaining attenuation can be realized with quadrature modulation and low pass filtering, using the procedure already shown in Fig. 4.
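The first stage of this layout, generating quadrature components near zero frequency and deriving an instantaneous amplitude from them, can be sketched as below. A plain moving average stands in for the low pass filter that keeps the difference-frequency component, and only the amplitude is computed; the GAFT procedure itself is not described in enough detail here to reproduce, so this is a generic quadrature demodulator with hypothetical names.

```python
import math


def instantaneous_amplitude(samples, sample_rate, centre_freq, window):
    """Quadrature-demodulate around centre_freq and return an amplitude track."""
    x, y = [], []
    for n, s in enumerate(samples):
        phase = 2.0 * math.pi * centre_freq * n / sample_rate
        x.append(s * math.cos(phase))   # in-phase product
        y.append(-s * math.sin(phase))  # quadrature product
    amplitudes = []
    for start in range(len(samples) - window + 1):
        # The moving average removes the sum-frequency term and keeps the
        # component near zero frequency.
        xm = sum(x[start:start + window]) / window
        ym = sum(y[start:start + window]) / window
        amplitudes.append(2.0 * math.hypot(xm, ym))
    return amplitudes


if __name__ == "__main__":
    fs, f0 = 8000, 100.0
    tone = [0.5 * math.cos(2.0 * math.pi * f0 * n / fs + 0.7) for n in range(400)]
    track = instantaneous_amplitude(tone, fs, f0, window=80)
    print(min(track), max(track))
```

The window of 80 samples spans exactly one cycle of the 100 Hz centre frequency at 8 kHz, which is why the sum-frequency term cancels completely in this example; a practical low pass filter would be designed for the sub-band width instead.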

Claims

1. The subject of the invention is a signal cleaning procedure to isolate audio-frequency signals from background noise. During the procedure, the signal to be cleaned is led into a filter bank and the signals separated into sub-bands are sent into a transformer unit, where a transformation is performed on them to produce a cleaned signal, characterized in that the instantaneous amplitude, or an approximated value of the amplitude, of the signal of each sub-band appearing at the output (1a) of the filter bank (1) is first produced in the transformer unit, then the instantaneous amplitude values produced in this way are averaged and the signals of the sub-bands are weighted using the averages, and finally the weighted signals are summed up to determine the cleaned signal.
2. The procedure as per Application Item 1 and characterized as follows: a Zwicker filter series is used as a filter bank (1).
3. The procedure as per Application Items 1 or 2 and characterized as follows: the signals of the sub-bands are led into a squaring high pass filter (9) and using this, the slowly changing component is separated, then the period time is determined from this signal using a null detector (10).
4. Any procedure as per Application Items 1,2, or 3 and characterized as follows: subsequent periods are determined using the period times generated by sub-bands, a weighted average of these is calculated and the time functions generated this way are summed up.
5. Any procedure as per Application Items 1 through 4 and characterized as follows: in order to determine the period time, the high pass filtered (9) signals of the squared amplitudes of several sub-bands are summed up and the sum of these is used later on.
6. Any procedure as per Application Items 1 through 5 and characterized as follows: the degree of curvature of the non-linear characteristics (4) used in determining the weighting factors is controlled by the minimum to be found in a time period in the average of each channel, expediently a time period of 0.5 to 1 s.
7. Any procedure as per Application Items 1 through 5 and characterized as follows: the degree of curvature of the non-linear characteristics (4) used in determining the weighting factors is controlled by the minimum and maximum to be found in a time period of the average of each channel.
8. Any procedure as per Application Items 1 through 7 and characterized as follows: microphones are placed at different distances from the source of the sound to reduce the echo, the signal of the nearby microphone is delayed, the signal of the remote microphone is shaped to resemble the delayed signal of the nearby microphone as closely as possible, and the parameters required for the equalization are stored.
9. Any procedure as per Application Items 1 through 8 and characterized as follows: corrections of the amplitude and the delay time are applied in the particular channels in such a way that the signal of the unit (14) generating and evaluating the standard signal is led to a loudspeaker (15), then the amplitude and run time data of the signal received from the surrounding space are measured at the midrange of the filter series, and the values of the corrections are generated using the measured values and by repeating the procedure one or more times (iteration).
10. Any procedure as per Application Items 1 through 6 and 8 and characterized as follows: the microphone producing the input signal and the loudspeaker fed with the output signal are placed in the same space.
11. Any procedure as per Application Item 1 and characterized as follows: the instantaneous parameters are generated from the input signal using quadrature modulation and low pass filters in the environment of the zero frequency, then the instantaneous parameters acquired this way will be averaged and using another modulation the cleaned sub-bands will be acquired from these averages and the cleaned output signal will be acquired from summing these up.
12. Any procedure as per Application Items 1 through 10 and characterized as follows: the input signal will be filtered first by a low pass filter bank (20) implementing the asymmetry of the Zwicker filter series and then quadrature modulation will be applied on it.
13. Any procedure as per Application Items 1 and 11 and characterized as follows: the instantaneous parameters will be separated into DC and AC components by an alternate filter, then the instantaneous parameters of the AC components will be generated, that is, the transformation will be repeated on the result of the GAFT.
PCT/EP2001/008827 2000-07-31 2001-07-31 Attenuation of background noise and echoes in audio signal WO2002011125A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001282047A AU2001282047A1 (en) 2000-07-31 2001-07-31 Attenuation of background noise and echoes in audio signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
HU0003010A HUP0003010A2 (en) 2000-07-31 2000-07-31 Signal purification method for the discrimination of a signal from background noise
HUP0003010 2000-07-31

Publications (1)

Publication Number Publication Date
WO2002011125A1 (en) 2002-02-07

Family

ID=89978510

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/008827 WO2002011125A1 (en) 2000-07-31 2001-07-31 Attenuation of background noise and echoes in audio signal

Country Status (3)

Country Link
AU (1) AU2001282047A1 (en)
HU (1) HUP0003010A2 (en)
WO (1) WO2002011125A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5574824A (en) * 1994-04-11 1996-11-12 The United States Of America As Represented By The Secretary Of The Air Force Analysis/synthesis-based microphone array speech enhancer with variable signal distortion
US5933495A (en) * 1997-02-07 1999-08-03 Texas Instruments Incorporated Subband acoustic noise suppression
EP1006510A2 (en) * 1994-03-18 2000-06-07 Mitsubishi Denki Kabushiki Kaisha Signal encoding and decoding system
WO2000041169A1 (en) * 1999-01-07 2000-07-13 Tellabs Operations, Inc. Method and apparatus for adaptively suppressing noise

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARBONNIER ET AL: "Design of nearly perfect non uniform QMF filter banks", ICASSP'88, vol. 3, 11 April 1988 (1988-04-11) - 14 April 1988 (1988-04-14), New York, pages 1786 - 1789, XP010072670 *
RUDOLF FÖLDVÁRI AND LÁSZLO GYIMESI: "Very low bit rate voice coder based on a nonlinear hearing model", EUROSPEECH'99, vol. 4, 5 September 1999 (1999-09-05) - 9 September 1999 (1999-09-09), Budapest, Hungary, pages 1547 - 1550, XP002180861 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009043066A1 (en) * 2007-10-02 2009-04-09 Akg Acoustics Gmbh Method and device for low-latency auditory model-based single-channel speech enhancement
GB2465910A (en) * 2007-10-02 2010-06-09 Akg Acoustics Gmbh Method and device for low-latency auditory model-based single-channel speech enhancement
GB2465910B (en) * 2007-10-02 2012-02-15 Akg Acoustics Gmbh Method and device for low-latency auditory model-based single-channel speech enhancement
CN111145770A (en) * 2018-11-02 2020-05-12 北京微播视界科技有限公司 Audio processing method and device

Also Published As

Publication number Publication date
AU2001282047A1 (en) 2002-02-13
HUP0003010A2 (en) 2002-08-28
HU0003010D0 (en) 2000-10-28

Similar Documents

Publication Publication Date Title
AU666161B2 (en) Noise attenuation system for voice signals
US8005246B2 (en) Hearing aid apparatus
EP2375785B1 (en) Stability improvements in hearing aids
EP2579252B1 (en) Stability and speech audibility improvements in hearing devices
WO2004040555A1 (en) Voice intensifier
JP2012517124A (en) Reinforced envelope coded sound, speech processing apparatus and system
JP2003520469A (en) Noise reduction apparatus and method
JPH09503590A (en) Background noise reduction to improve conversation quality
EP2560410B1 (en) Control of output modulation in a hearing instrument
JPH09258787A (en) Frequency band expanding circuit for narrow band voice signal
CN111107478B (en) Sound enhancement method and sound enhancement system
CN111182431A (en) Howling suppression method for conference sound reinforcement system
CN103827967A (en) Audio signal restoration device and audio signal restoration method
AU2002300314B2 (en) Apparatus And Method For Frequency Transposition In Hearing Aids
WO1999001942A2 (en) A method of noise reduction in speech signals and an apparatus for performing the method
CN113993053B (en) Channel self-adaptive digital hearing aid wide dynamic range compression method
Sondhi et al. Improving the quality of a noisy speech signal
JPH0968997A (en) Method and device for processing voice
WO2002011125A1 (en) Attenuation of background noise and echoes in audio signal
JP2007251354A (en) Microphone and sound generation method
RU2589298C1 (en) Method of increasing legible and informative audio signals in the noise situation
JPH07146700A (en) Pitch emphasizing method and device and hearing acuity compensating device
Zou et al. Design of compensated multi-channel dynamic-range compressor for hearing aid devices using polyphase implementation
Moore Computational models for predicting sound quality
JPH0956000A (en) Hearing aid

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP