WO1998058448A1

WO1998058448A1 - Method and apparatus for low complexity noise reduction

Info

Publication number: WO1998058448A1
Application number: PCT/SE1998/001161
Authority: WO
Inventors: Björn Olof Peter EKELUND
Original assignee: Telefonaktiebolaget Lm Ericsson
Priority date: 1997-06-16
Filing date: 1998-06-16
Publication date: 1998-12-23
Also published as: AU8050998A

Abstract

A method and apparatus for improving sound quality in a voice transmission or recording system is disclosed. The system involves controlling the amount of gain used in amplifying a signal, wherein a nominal gain characteristic is substantially invoked in the presence of a voice signal or information signal, while the gain is decreased rapidly in the absence of a voice or information signal. A background noise characteristic assists in controlling the nominal gain characteristic. In an exemplary embodiment, a gain adjustable amplifier is controlled by a gain control signal that is the arithmetic product of an input signal, derived from an environment that may contain background noise, as modulated by a background noise estimate signal. Amplification in the gain adjustable amplifier is thus adjusted so as to reduce the background noise perceived at the far end (i.e., in the transmitted signal).

Description

METHOD AND APPARATUS FOR LOW COMPLEXITY NOISE REDUCTION

Background of the Invention

1) Field of the Invention

The present invention is directed to a method and apparatus for reducing perceived background noise in a signal pick-up system, such as a hands-free microphone for a car mounted cellular telephone.

2) Discussion of Related Art

Car safety regulations in many countries prohibit a user of a cellular telephone from using a handset while driving a car. This has increased the demand for "hands-free" operation of cellular telephones, particularly when the cellular telephone is mounted in a car. There are many ways to implement hands-free operation of a telephone, most of which rely on a separate loudspeaker and an omni-directional microphone. When provided in an automobile, the microphone is normally mounted to the ceiling on a gooseneck rod, or on the visor. The loudspeaker is usually integrated into the car stereo, but may be provided in a dash-mounted arrangement. A problem associated with an omni-directional microphone spatially separated from the user is its tendency to pick up sounds other than the user's voice. In particular, the microphone may pick up background noise, which in a car can be substantial. Background noise can greatly distort a transmitted signal. The European Telecommunications Standards Institute (ETSI) is in the process of implementing GSM standards (ETSI Recommendation GSM 11.10) on the extent to which physically distributed background noise sources are to be attenuated compared to a user's voice. Attenuation of background noise can be provided, in part, by careful mechanical and acoustical microphone design and/or prudent configuration of the setting in which the microphone is used. However, such designs tend to be environment-specific and fail to offer flexibility. For instance, a directional microphone can be used so that it picks up sound from a limited area, such as a zone in which a driver's head is typically positioned. However, the voice of a user in a passenger seat may then be too attenuated. Systems can be designed for specific passenger compartments, but this option necessitates a different system design for each type of car. Hence, mechanical and acoustical solutions do not lend themselves to universal applicability in a variety of settings.

Active background noise reduction, where background noise is electronically suppressed, is not known for use on a commercial scale in cellular telephony. In military communications, active background noise reduction is used, particularly in aircraft communication systems.

The methods used in military communications are typically complex and consequently, expensive. Such systems are normally based on either spectral properties of the background noise or linear prediction and estimation of background noise. Methods based on spectral properties of the background noise are computationally intense and utilize Fast Fourier Transforms (FFT's). Linear prediction and estimation of background noise methods include techniques such as Linear Predictive Coding (LPC). LPC estimates the spectral characteristics of, e.g., a speech signal, by calculating "best fit" parameters in a parameterized model and suppressing the background noise it represents. The technique of expanding the dynamics of a voice signal can be used to reduce the perceived intensity of irritating background static, hiss, noise or hum. The terms "expansion" or "expanding," as used herein, refer to increasing the dynamic range of a signal by amplifying strong components more than weak components. This technique has been used in many types of electronic equipment, such as home audio equipment (e.g., U.S. Reissue Patent No. RE30,468 entitled "Signal Compressors and Expanders" issued on December 30, 1980 to Dolby et al.) or in commercial radio systems (e.g. companded frequency modulation). Conventional systems that utilize expansion usually involve some degree of signal pre-compression to achieve end-to-end linearity of the sound signal. However, linearity is not necessary in voice communications if a dynamic characteristic of the expansion mechanism is tuned properly.
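To make the definition of expansion concrete, a generic static downward expander can be sketched in a few lines of C. This is a textbook illustration, not the mechanism claimed in this application: levels are in decibels, and below an assumed reference level `ref_db` every decibel of input change becomes `ratio` decibels of output change, so weak components are pushed down further than strong ones. The function name and both parameters are hypothetical.

```c
/* Hypothetical static downward expander operating on levels in dB.
   Inputs at or above ref_db pass unchanged; below ref_db, each dB of
   input becomes `ratio` dB of output (ratio > 1 widens dynamic range). */
double expand_db(double in_db, double ref_db, double ratio)
{
    if (in_db >= ref_db)
        return in_db;                         /* strong components: untouched */
    return ref_db + ratio * (in_db - ref_db); /* weak components: pushed down */
}
```

With `ratio` equal to 2, a component 10 dB below the reference comes out 20 dB below it, while components above the reference are unchanged; the 10 dB input spread becomes a 20 dB output spread.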

Because it is difficult to define a typical environment in which hands-free communications are used, it is necessary to provide a noise-suppression mechanism having an adaptation, or learning capability. That is, the noise suppression mechanism should be able to adapt to a particular environment in which the hands-free communication device is used.

According to U.S. Patent No. 4,847,897 entitled "Adaptive Expander for Telephones" issued to Donald Means on July 11, 1989, the performance of a conventional wireline telephone operating in a full duplex mode in a noisy environment can be improved by use of an adaptive expander. In the absence of a strong speech signal, the adaptive expander for a wireline telephone in Means reduces the gain of a transmitting amplifier. The gain is raised when a strong speech signal is detected. Background noise is distinguished from speech by a time-averaging technique. However, the adaptive expander of the Means patent is somewhat primitive because it operates dependent on an instantaneous voice level, resulting in sound quality that makes a user sound short of breath. This is caused by gain variation in a signal transmission amplifier during speech. This may be acceptable in an application where costs are critical, but it leaves much to be desired because the perceived audio quality is compromised by the shortness-of-breath effect. Furthermore, the device disclosed by Means performs poorly in noisy environments and does not handle DTMF signalling well. Hence, the perceived sound quality remains marginal.

Mobile cellular telephones may be used in a variety of environments, each having unique sound performance requirements. In a noisy environment, intelligibility is key, whereas in a quiet environment, a clean, clear sound quality is desirable. This illustrates the need for a noise suppression technique that is capable of carefully adapting its operation, i.e., adjusting its expansion characteristics, for a given setting in such a way as to avoid the common pitfalls associated with conventional techniques, such as the one set forth in the Means patent.

Summary of the Invention

A system in accordance with the invention is based on using a dynamic range expander to reduce perceived background noise during speech pauses. The system uses an amplifier with a gain that is responsive, in a non-linear manner, to both an instantaneous sound level as well as a background noise level.

In the presence of speech, the gain of the amplifier is kept at a substantially constant "nominal level", while the gain is rapidly reduced during speech pauses. The gain reduction during speech pauses is proportional to the amount of background noise. The gain level during speech, that is, the nominal gain, is reduced in correspondence with increasing background noise. The nominal gain reduction, in a preferred embodiment, is approximately 2 decibels for every 3-decibel increase in background noise. The amount of noise reduction is thus proportional to the amount of background noise present. Thus, in the absence of noise, the nominal gain is equal to the maximum gain, while the gain reduction in speech pauses is zero. A cellular telephone noise reduction circuit can incorporate the invention to reduce the background noise perceived by a person at the far end of a telephone call. This is done, in accordance with an embodiment of the invention, by instantaneously and dynamically varying the sensitivity of a microphone in accordance with the background noise level. This can be done using analog circuitry, but in a preferred embodiment is carried out using digital circuitry and/or processing.
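The 2-dB-per-3-dB rule for the nominal gain can be sketched as follows. This is a minimal illustration under stated assumptions: the background noise is expressed as its rise, in decibels, above a noise-free reference level, and the function name `nominal_gain_db` and the clamp at zero rise are hypothetical choices, not taken from the source.

```c
/* Hypothetical helper: nominal (speech) gain in dB as a function of how far
   the background noise has risen, in dB, above an assumed noise-free
   reference. Applies the 2 dB per 3 dB rule of the preferred embodiment;
   with no noise rise the nominal gain equals the maximum gain. */
double nominal_gain_db(double max_gain_db, double noise_rise_db)
{
    if (noise_rise_db <= 0.0)
        return max_gain_db;                          /* noise-free: maximum gain */
    return max_gain_db - (2.0 / 3.0) * noise_rise_db; /* 2 dB per 3 dB of noise */
}
```

For example, a 9 dB rise in background noise lowers a 20 dB nominal gain to about 14 dB. Working directly in decibels, as here, is one way of obtaining the logarithmic adjustment the description mentions as a refinement of a purely linear one.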

A noise reduction circuit in accordance with an exemplary embodiment of the invention can handle input from a port for receiving a sound signal from a microphone, and rectify the sound signal using a rectifier circuit. A preferably nonlinear low pass filter is used to extract the signal envelope of the rectified sound signal. An instantaneous level of the extracted signal is modulated with a signal proportional to an estimated background noise level to produce a signal which is used to control a variable gain amplifier for amplifying the sound signal.

By use of such a noise reduction circuit, the amplification of the sound signal, and thus the background noise reduction, is dynamically adjusted. Consequently, in a situation where no background noise is present, the amplifier gain is constant and no signal expansion takes place. With increasing background noise, the expansion is correspondingly increased. However, in preferred embodiments of the invention, the gain level during speech is relatively constant at the nominal level. This has the effect of diminishing the shortness of breath characteristics inherent in prior art systems, such as Means.

A system in accordance with the invention exploits certain properties and factors inherent in radio communications systems and in human speech to provide enhanced sound quality. For instance, humans have a tendency to speak at greater volume levels in noisy environments. Typically, a person increases his or her voice volume level by two decibels for every three-decibel increase in the background noise level. This is known as the Lombard effect. Systems in accordance with the present invention operate to decrease the nominal gain to adjust for the Lombard effect, thereby decreasing the amount of background noise perceived by a far-end party.
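The effect of matching the nominal gain reduction to the Lombard effect can be checked with a worked example. Assume, purely for illustration, a background noise rise of $\Delta N = 9$ dB, a talker who raises speech level by $\Delta S$ according to the 2-dB-per-3-dB rule, and a nominal gain change $\Delta G$ following the same rule:

$$\Delta N = 9\ \text{dB}, \qquad \Delta S = \tfrac{2}{3}\,\Delta N = 6\ \text{dB}, \qquad \Delta G = -\tfrac{2}{3}\,\Delta N = -6\ \text{dB}$$

$$\Delta S_{\text{out}} = \Delta S + \Delta G = 0\ \text{dB}, \qquad \Delta N_{\text{out}} = \Delta N + \Delta G = 3\ \text{dB}$$

Under these assumptions, the speech level delivered to the far end is unchanged, while the transmitted background noise rises by only 3 dB instead of 9 dB.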

Embodiments of the present invention can take advantage of the fact that background noise measurement is mandated under many existing cellular standards, including the Global System for Mobile Communication (GSM) standard. Therefore, background noise information (e.g., a background noise estimate parameter) is often readily available in a system which operates in accordance with standards requiring such a measurement. The invention may operate in a terminal or land station in the GSM system. In such an application, the aforementioned background noise estimate is provided by a voice activity detector (VAD) that operates according to ETSI GSM Recommendation 06.32, which is incorporated herein by reference.

In addition, under many cellular standards, there is no requirement for end-to-end sound level linearity. Hence, a device operating in accordance with the present invention may be used. This is not the case with prior art systems, such as U.S. RE30,468 to Dolby, wherein end-to-end linearity is required. Furthermore, exemplary embodiments of the present invention are well suited for implementation in a Digital Signal Processing (DSP) environment. Consequently, advanced algorithms for controlling gain and estimating background noise, that may be difficult to implement in systems operating with analog circuitry (e.g., that of Means), are relatively simple to carry out.

Brief Description of the Drawings

The present invention will now be described in conjunction with the exemplary embodiments as shown in the following figures, wherein:

Figure 1 is a block diagram of an exemplary embodiment of the invention including a radio transceiver having a background noise suppressor;

Figure 2 is a background noise suppressor in accordance with an embodiment of the invention;

Figure 3A is a graph of adaptive noise suppressor input-output characteristics wherein a linear low-pass filter is used, and wherein curves 0 through 3 correspond to increasing levels of background noise, respectively;

Figure 3B is a graph of adaptive noise suppressor input-output characteristics wherein a non-linear low-pass filter is used, and wherein curves 0 through 3 correspond to increasing levels of background noise, respectively;

Figure 4A depicts a rectified speech signal;

Figure 4B depicts an input-to-output gain variation for a noise suppression arrangement, at a certain level of background noise, in accordance with an embodiment of the invention wherein linear low-pass filtering is used;

Figure 4C depicts an input-to-output gain variation for a noise suppression arrangement, at a certain level of background noise, in accordance with an embodiment of the invention wherein non-linear low-pass filtering is used;

Figure 5 is a conventional analog voice activity detector which can be used in accordance with the invention;

Figures 6A and 6B are alternative background noise estimators having a voice activity detector which can be used in accordance with the present invention;

Figure 7 is a more detailed block diagram of a GSM voice activity detector;

Figures 8A and 8B are a flow chart depicting the voice activity detection threshold value determination procedure in a GSM VAD; and

Figure 9 depicts an I/O-scenario for a prior art system.

Detailed Description of the Preferred Embodiments

Figure 1 depicts an exemplary embodiment of the invention as applied in a cellular telephone transceiver. The transceiver includes a background noise suppressor 12 that uses a dynamic range expander to reduce the perceived background noise during speech pauses. The background noise suppressor 12 is connected between a microphone 11 and an input port 13 of a communications terminal 18 in a mobile telephone. The communications terminal 18 comprises a transmitting circuit 18A, a receiving circuit 18B, and a duplexer 18C for selectively connecting the transmitting circuit 18A or the receiving circuit 18B to antenna 18D. The receiving circuit 18B outputs a received sound signal to a speaker 19. The transmitting circuit 18A transmits a signal to a base station (not shown).

Figure 2 depicts an exemplary noise suppressor 12 shown in Figure 1 and in accordance with the invention. An input signal received at the input port 10 is provided to a gain adjustable amplifier 22, to a rectifying circuit 20, and also to a background noise estimator (BNE) 24. The output of the rectifying circuit 20 is supplied to a low pass filter (LPF) 21. The LPF 21 may be linear or non-linear. An output of the LPF 21 is provided to a first input port of an arithmetic unit 23. Output from the background noise estimator 24 is provided to a second input port of the arithmetic unit 23. In accordance with a preferred embodiment, the arithmetic unit 23 performs the operation given by Equation 1:

Y = A + (BX-C)*W = A + BXW - CW (Equation 1)

Where: A, B and C are constants whose values are system dependent;

Y is the output of the arithmetic unit 23;

X is the output of the LPF 21; and

W is the output of the BNE 24.

The term A controls the gain in noise-free conditions (i.e., when W = 0), the term CW decreases the gain with increasing background noise (e.g., to adjust for the Lombard effect), and the term BXW controls the degree of expansion. The output Y from the arithmetic unit 23 is the output of the LPF 21 modulated by the background noise estimate from the background noise estimator 24. The output signal Y is provided to the gain adjustable amplifier 22 to control its gain.
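Equation 1 maps directly onto a one-line function. The sketch below is a literal transcription; the function name `arithmetic_unit` is hypothetical, and the constant values used in the usage note are arbitrary illustrations rather than system values from the source.

```c
/* Arithmetic unit 23 per Equation 1: Y = A + (B*X - C)*W.
   x: output X of LPF 21; w: background noise estimate W from BNE 24;
   a, b, c: the system-dependent constants A, B and C. */
double arithmetic_unit(double x, double w, double a, double b, double c)
{
    return a + (b * x - c) * w;
}
```

With W = 0 the result equals A regardless of X, matching the noise-free case. With A = 1, B = 0.5, C = 0.25 and W = 2, a silent input (X = 0) yields Y = 0.5, while X = 1 yields Y = 1.5: the CW term pulls the gain down in speech pauses and the BXW term raises it again when the filtered level is high, which is the expansion.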

A time constant and voltage transfer function of the low pass filter (LPF) 21 are selected to control the gain of the adjustable amplifier 22 to be substantially constant during speech bursts, but fall rapidly in speech pauses.

In a software context, such as a program in a DSP, the functionality of the elements in the exemplary embodiment depicted in Figure 2 could be carried out, at least in part, in accordance with the following C-language code segment:

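```c
#include <math.h>

float lookup_table(float x); /* look-up table for the non-linear function */

void NoiseSupp(float input, float *output, float background, float tk1,
               float tk2, float exp1, float exp2, float maxgain)
{
    static float state = 0.0f;  /* LPF state, retained between samples */
    float rect, lpf, gain;

    rect = fabsf(input);        /* rectify */

    /* this is a non-linear low-pass filter: tk1 sets attack, tk2 sets decay */
    if (rect > state)
        state += tk1 * (rect - state);
    else
        state += tk2 * (rect - state);

    /* obtain the output by a non-linear function (level off at high input levels) */
    lpf = lookup_table(state);

    /* calculate gain (Equation 1 with A = maxgain, B = exp1, C = exp2, X = lpf) */
    gain = maxgain + (exp1 * lpf - exp2) * background;
    *output = input * gain;
}
```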

Referring to Figure 2: "input" is the input signal at 10; "output" is the output of the adjustable amplifier 22; "rect" is the output of rectifier 20; "lpf" is the output of low pass filter 21; "gain" is the output of arithmetic unit 23;

"tk1" controls the time constant for increasing input signals; and "tk2" controls the time constant for decreasing input signals.

Generally speaking, if tk1 is larger than tk2, the filter will have a faster attack time than decay time. Accordingly, the gain will increase more rapidly with an increase in the input level than it will drop when the signal disappears. The gain computed by the arithmetic unit 23 and supplied to the adjustable amplifier 22 is applied to the input of the adjustable amplifier 22, thereby generating the "output". The background noise estimate is not generated in the foregoing code segment, but can be supplied as a parameter, such as a signal value ordinarily available in a GSM system. The parameter "maxgain" is the nominal gain value for a noiseless condition. The amount of expansion applied corresponds to "exp1", while "exp2" corresponds to the variation of the nominal gain with changes in background noise. The value of exp2 is preferably chosen so that "gain" is reduced by approximately 2 dB for each 3 dB rise in the background noise value ("background"). A simple linear adjustment is shown in the example. A more exact adjustment can be obtained by use of logarithms, as would be readily apparent to those skilled in the digital signal processing arts.

The function "lookup table" symbolizes the use of a look-up table for the non-linear function used to achieve substantially constant gain during speech and to rapidly decrease gain at lower sound levels. An exemplary function that can be used to generate the look-up table is f(x) = x^k/((x+1)^k - 1), where k is desirably between 1.1 and 2. The value of the function increases rapidly for small values of x, but then levels off to a substantially constant value for large values of x.

In an exemplary embodiment, the arithmetic unit stores a plurality of gain curves which are selected in accordance with a level of background noise. Figure 3A depicts gain curves in an embodiment of the invention where a linear LPF 21 is used. Each curve corresponds to a particular level of background noise. The curves are substantially linear and increase in slope depending on the level of background noise present. Curve 0 corresponds to a situation where there is no background noise. In such a situation, the input/output relationship corresponds to a 1:1 gain. Linear gain curves 1, 2, and 3, with gain rates greater than 1:1, correspond respectively to increasing levels of background noise. In contrast, the arrangement disclosed in the patent to Means does not reduce gain with increasing background noise. As a result, the family of curves in Means intersects at the 0 dB input level point. Any other curve arrangement is exceedingly difficult to provide using the analog circuitry proposed by Means. The slope of a curve can be adjusted in accordance with increasing background noise to adaptively adjust gain to compensate for the Lombard effect. Specifically, the greater the amount of background noise, the steeper the gain curve, i.e., the higher the input/output gain ratio.

Figure 3B depicts gain curves provided when the LPF 21 is non-linear. Each curve corresponds to a particular level of background noise. A situation where there is no background noise is shown by curve 0, in which there is a 1:1 relationship between input and output. A curve for a situation where there is background noise is non-linear. Curve 1 corresponds to a situation where the product of the arithmetic unit 23 corresponds to mild background noise. Curves 2 and 3 correspond to increasing amounts of background noise. It is worth noting that the non-linear LPF 21 is designed to ensure that gain curves 1, 2, and 3 have very steep roll-offs so that signal levels swiftly drop below a voice signal level in the absence of speech. One having ordinary skill in the art will readily be able to select components and/or programming variables to achieve such goals.

In a simple system, the transfer characteristic of the noise suppressor is constant, as shown by the suppression characteristic line in Figure 9. Such a characteristic results in constant operation of dynamic expansion, regardless of the actual background noise level. The slope of the suppression characteristic line is selected so that the sensitivity (i.e., gain control) of the microphone is varied with the level of incoming sound, forming a reasonable compromise between noise suppression and sound quality reduction. For example, the slope can vary by approximately two decibels for every three-decibel increase in noise. In the simplest embodiment, both the arithmetic unit and the background noise estimator can be eliminated by selecting low-pass filter characteristics corresponding to the appropriate suppression curve. The line marked "linear" represents no suppression of noise.

In an exemplary embodiment of the present invention, the noise suppressor 12 can operate to continuously change the manifest sensitivity of the microphone 11 (i.e., amplification characteristics of a gain adjustable amplifier associated therewith). Since dynamic expansion will introduce a slight reduction of perceived sound quality because of constantly changing sensitivity of the microphone, in a preferred embodiment the gain variation of the inventive noise suppression unit is based on need, i.e., the actual background noise level.

If the slope and intercept of the transfer characteristic in Figure 9 are changed to provide non-linear operation of the low-pass filter based on the measured background noise, a negative effect of the noise suppressor is substantially eliminated. That negative effect arises when background noise is absent, in which case varying gain may produce undesirable sound characteristics. This means that the suppressor would not affect the speech when the background noise is below a certain threshold level, and would continuously increase its effect as the background noise rises above this threshold level. Referring to Figure 3B, the slope of the curve during speech pauses (i.e., below 0 dB) increases dramatically with increasing background noise. The sensitivity of the microphone is reduced, in the preferred embodiment, by approximately two decibels for every three decibels of increased noise. The slope of the transfer characteristic may increase from one decibel per decibel of voice to two, or even three, decibels per decibel of voice. The actual characteristics of a particular embodiment are determined empirically, since they depend on the acoustic properties of the cellular setting and its installation, the spectral properties of the background noise, etc.

Figure 4A represents a typical voice, or other received signal, as it appears after having been rectified by the rectifier 20, and Figure 4B represents the rectified speech signal after it is low-pass filtered by the LPF 21. The signals of Figures 4B and 4C are processed with the output from the background noise estimator in the arithmetic unit 23. Figure 4B depicts the input-output characteristics of the noise suppression arrangement when the LPF 21 is linear. Figure 4C depicts the input-output characteristics of a preferred noise suppression arrangement when the LPF 21 is non-linear. The gain of the amplifier 22 is controlled by the output of the LPF 21, as modulated by a background noise energy estimate within the arithmetic unit 23. This means, for example, that the amount of dynamic expansion applied to the microphone output signal can be substantially controlled by the amount of background noise. In a preferred embodiment, the low pass filter 21 is non-linear in that it has different attack and decay times. For example, a first-order non-linear filter has differing step responses depending on the polarity of a signal, in accordance with the following:

$$s_{+}(t) = 1 - e^{-t/\tau_{+}} \qquad \text{(Equation 2)}$$

$$s_{-}(t) = 1 - e^{-t/\tau_{-}} \qquad \text{(Equation 3)}$$

where τ+ is the time constant for increasing input signals, and τ- is the time constant for decreasing input signals.

Such a filter is complicated to design using analog circuitry, but is relatively simple to implement in software or digital circuitry.

Conventional systems differ from systems incorporating the invention in that previous uses of expansion have been applied in systems requiring end-to-end linearity of the voice or sound transfer. These conventional systems, such as that disclosed in the aforementioned Means patent, suffer from the undesirable "shortness of breath" audible effect due to their amplification scheme. Systems in accordance with the present invention operate on the premise that end-to-end linearity is not necessary for voice communications, provided that the dynamic behavior of an expansion mechanism is tuned properly (i.e., by selection of an appropriate linear or non-linear LPF, or by adjusting the A, B and C constants in the arithmetic unit processing). In embodiments of the invention using a non-linear LPF, performance is greatly enhanced. Use of a non-linear LPF provides a nominal gain characteristic that averts the shortness-of-breath audible effect. More specifically, the gain of the adjustable amplifier 22 is preferably controlled to remain fairly constant during speech, and to drop rapidly in speech pauses. The signal depicted in Figure 4C shows a preferred input-output response from a sound quality perspective. The plateaus in Figure 4C represent the nominal gain. This is the gain provided by the amplifier 22 in accordance with a control signal received from the arithmetic unit 23. The gain adjustable amplifier 22 should be capable of fast pick-up, to avoid the loss of syllables at the beginning of words and the shortness-of-breath effect. The linear portion of the Figure 3B curves corresponds to the plateaus of Figure 4C. One skilled in the art may advantageously include the use of a voice activity detector (VAD), embodiments of which are described below. These embodiments recognize that there is no need for noise suppression in the absence of noise. Gain variation in the amplifier 22 can degrade sound quality in the absence of background noise.
In a preferred embodiment of the invention, degradation from amplification and degradation from background noise itself are balanced to ensure that there is an action, or lack thereof, that results in a net improvement. Thus, in a noise-free situation, noise suppression is suspended, provided that the noise suppressor can identify this situation and disable the noise suppression (i.e., disable dynamic expansion). Recognition and differentiation of background noise or noise-free situations requires very accurate measurement. Such measurement can be provided in the context of voice activity detector operation. There are a number of methods and devices for detecting the presence of a human voice. A voice activity detector (VAD) typically operates based on the premise that background noise varies much more slowly than speech, and that speech typically has a greater energy level than that of background noise. Figure 5 shows a simple VAD 50. This type of VAD can be used in low-cost applications, such as in speaker phones or answering machines. Two low pass filters 51 and 52 have different time constants. A first low pass filter 51 might have a shorter time constant, thereby generating a signal envelope reflecting the fast transitions of a voice signal, whereas the second low pass filter 52 may have a longer time constant, tending to reflect a background noise level. Signals output by the filters are input to a comparator 53, which generates a "voice" indicator when a voice signal from the first low pass filter 51 is detected by comparison to the background signal of the second low pass filter 52. The comparator 53 incorporates hysteresis to assist in the (voice/no voice) decision.
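The Figure 5 arrangement can be sketched in software as two one-pole envelope followers with different coefficients feeding a ratio comparator with hysteresis. This is a minimal sketch, not a circuit from the source: the input rectification, the ratio-based comparison, and all names and numeric values (`SimpleVad`, `vad_step`, the coefficients and on/off ratios) are assumptions chosen for illustration.

```c
/* Hypothetical software version of the Figure 5 VAD. */
typedef struct {
    double fast, slow;      /* envelopes standing in for filters 51 and 52 */
    double a_fast, a_slow;  /* smoothing coefficients, a_fast > a_slow */
    double on_ratio;        /* fast/slow ratio that declares voice */
    double off_ratio;       /* lower ratio that ends voice (hysteresis) */
    int voice;              /* current decision: 1 = voice, 0 = no voice */
} SimpleVad;

int vad_step(SimpleVad *v, double sample)
{
    double rect = sample < 0.0 ? -sample : sample;  /* rectify (assumed) */
    v->fast += v->a_fast * (rect - v->fast);        /* short time constant */
    v->slow += v->a_slow * (rect - v->slow);        /* long time constant */

    /* comparator 53 with hysteresis: separate on and off thresholds */
    if (!v->voice && v->fast > v->on_ratio * v->slow)
        v->voice = 1;
    else if (v->voice && v->fast < v->off_ratio * v->slow)
        v->voice = 0;
    return v->voice;
}
```

Using an off ratio below the on ratio keeps the decision from chattering when the fast envelope hovers near the switching point, which is the purpose of the hysteresis in comparator 53.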

Figures 6A and 6B illustrate exemplary background noise energy estimators (BNEs) 60 which operate in concert with a VAD 64. A BNE may be used to determine which of the possible gain curves, such as those shown in Figure 3A, are to be used, or to supply input parameters to Equation 1 above. In the absence of speech, as determined by the VAD 64, a background noise measurement can be taken. In Figure 6A, a switch 61 is closed during non-speech segments of a sound signal, as determined by the VAD 64. An amplifier 62, operating in combination with a resistor R and a capacitor C, forms a low pass filter with a hold function. When the switch 61 is opened, the output voltage of the low pass filter is held constant and an input signal to the low pass filter is ignored. Because of the low output impedance of the amplifier 62, the resistor R has no effect when the switch 61 is open. An input signal generated by microphone 11 is rectified by a rectifier 63. The output of the rectifier 63 as it is preferable to average background noise power rather than to average background noise voltage. In Figure 6B, a hold circuit 67 (which replaces the switch 61 in Figure 6A) blocks input to LPF 66 when speech is detected, so that the background noise estimate is only updated during pauses in speech.
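The update-or-hold behavior of Figures 6A and 6B can be sketched as a low-pass filter on squared samples that is updated only when the VAD reports no speech. Squaring is used here to follow the stated preference for averaging noise power rather than voltage; the structure name `NoiseEstimator`, the function `bne_step`, and the coefficient value are hypothetical.

```c
/* Hypothetical software version of the Figure 6A/6B estimator. */
typedef struct {
    double power;  /* smoothed background noise power estimate */
    double a;      /* low-pass coefficient, 0 < a <= 1 */
} NoiseEstimator;

double bne_step(NoiseEstimator *e, double sample, int voice)
{
    if (!voice)  /* switch 61 closed / hold 67 released: update estimate */
        e->power += e->a * (sample * sample - e->power);
    return e->power;  /* during speech the previous estimate is held */
}
```

Feeding the VAD decision in as `voice` means the estimate freezes at its last noise-only value while someone is talking, so speech energy does not inflate the background noise level passed to the arithmetic unit.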

The VAD circuit of Figure 5 can be used as the VAD 64 in either Figure 6A or Figure 6B. The VAD of Figure 5 can operate to open or close the switch 61 in Figure 6A, and to engage the hold circuit 67 in Figure 6B. In Figure 5, when a signal is not present, i.e., not detected, the output is zero and the output of the amplifier 62 or the LPF 66 in Figures 6A and 6B, respectively, is held constant since the background noise is not being estimated. A more complex VAD may operate on some unique property of human speech, such as the presence of a vocal pitch or spectral characteristics. Such a VAD may be incorporated into a background noise estimator (BNE) such as those shown in Figures 6A and 6B. Alternatively, such a BNE may be a component, function or aspect of a GSM VAD, of a similar standardized system (e.g., IS-95), or even of an analog system.

An aspect of systems operating in accordance with the present invention is the ability to utilize existing parameters maintained in standardized cellular stations. For example, in an application in a GSM standard-based cellular mobile station, an embodiment of the present invention can utilize existing software and hardware outputs and parameters for high performance background noise estimation.

An example is the use of the GSM-specified Voice Activity Detector defined in GSM Recommendation 06.32. Such a VAD is connected directly to, or integrated in, a speech coder/decoder (codec). In the 06.32 VAD (GSM VAD), the process of recognizing the presence of speech involves the determination of an accurate background noise estimate. The GSM VAD is used to determine when speech is not present, and at those instances to measure and average the energy of the background noise, thereby providing an accurate background noise estimate.

Background noise in a GSM VAD is measured during speech pauses. During such pauses, a linear predictive coding (LPC) algorithm within an encoder element is used to measure the characteristics of background noise. Measurements include the spectral distribution and energy levels of the background noise. The measured properties of the background noise are averaged to enhance a degree of confidence in the measurements. The speech encoder performs measurement of background noise every 20 milliseconds during speech pauses. The GSM VAD provides an extremely accurate background noise estimate and generally outperforms prior art analog systems, such as the one disclosed in the Means patent. In accordance with a preferred embodiment of the present invention, the background noise measurement, or background noise estimate (BNE) provided by the GSM VAD speech encoder can be used to assist in controlling an amount of dynamic range expansion applied by a noise suppressor, as the BNE measurement is readily available for use in any system operating in accordance with the GSM standard, or any derivative thereof (e.g., the U.S. PCS 1900 standard).

The GSM VAD (hereinafter referred to as "VAD") functions to distinguish between a signal having noise with speech present, and a signal having noise with no speech present. The mobile environment poses difficulties because of the low speech-to-noise level ratios involved. Furthermore, background noise in a mobile environment may vary, and the spectrum of noise may also change between vehicles. The GSM VAD operates as an energy detector where speech presence is indicated when a threshold value is exceeded. However, the threshold value for voice detection in one situation may not be suitable for another. This problem is remedied in the GSM environment by providing a VAD having an adaptable voice detection threshold value. To provide reliable voice detection, the VAD threshold level is adjusted to be sufficiently higher than a noise level to avoid the noise being mistaken for speech, but not so far above the noise level that low level speech is not detected, or is mistaken for noise. The GSM VAD threshold is adapted during periods when speech is not detected (i.e., in concert with a background noise estimate). In order to ensure that speech is not present during threshold adaptation several factors are analyzed. These factors include determining whether a detected signal's frequency domain appears to be stationary, and whether the signal contains a pitch component inherent in speech or information tones.

A GSM VAD operates in concert with an adaptive filter whose coefficients are also updated during periods when speech is not present. Figure 7 depicts a circuit block diagram for determining the GSM VAD threshold value. In the depicted block diagram, a speech encoder 200 receives an input signal 220 (e.g., a voice signal or information tone). Autocorrelated coefficients (ACF) in the speech encoder 200 are provided to a filter 202 and to coefficient assessment modules 206 and 208. The autocorrelated coefficients are evaluated by a spectral comparison module 210 to determine whether their spectrum is stationary. In addition, a periodicity detector module 214 provides an indicator, based on speech encoder output, of whether the signal is periodic. If there is an indication of a stationary spectral condition and no indication of a periodic condition, i.e., no speech present, then a threshold determination module 212 proceeds to determine a VAD threshold value. The VAD threshold value is compared with coefficient energy values from the filter 202 by VAD decision module 216 to determine whether speech is present. More specifically, autocorrelated coefficients (ACF) from the speech encoder 200 are provided to the filter 202 and to a coefficient averaging module 206. In order to assess spectral steadiness, the coefficient averaging module 206 maintains a current average (av0), which is provided to a spectral comparison module 210, and a delayed average (av1), which is provided to a coefficient predictor module 208. The spectral comparison module 210 compares the current average (av0) with predicted coefficient values (r_av1) from the coefficient predictor module 208 in order to determine whether there is a stationary spectral condition over a given sample period. The result of this determination is provided to the threshold determination module 212, as are the predicted coefficient values (r_av1).
The periodicity detector module 214 provides the threshold determination module 212 with an indication (ptch) of whether the input contains a periodic component, such as the pitch of voiced speech or an information tone. The threshold determination module 212 will typically update the VAD threshold value (th_vad) when the autocorrelated coefficients (ACF) have a very low level, or when the probability that speech is absent is high. The VAD threshold value (th_vad) is provided to a VAD decision module 216, which compares it with an autocorrelation energy value (p_vad) determined by coefficient energy computing module 204. The result of the comparison determines whether or not speech is present. To compensate for bursty speech, a hangover may be added by a hangover addition module 218, as needed. The energy in a current frame (p_vad) output by the filter 202 is determined according to the following equation:

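$$p_{\mathrm{vad}} = r_{\mathrm{vad}}[0]\,\mathrm{ACF}[0] + 2\sum_{i=1}^{8} r_{\mathrm{vad}}[i]\,\mathrm{ACF}[i]$$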

where r_vad are the autocorrelated predictor values determined by coefficient predictor module 208, and ACF are the autocorrelation coefficients provided by speech encoder 200. The equation above corresponds to 8th order block filtering of the input samples to the speech encoder 200. Threshold adaptation is typically assessed about every 20 milliseconds, i.e., once per frame. However, the spectral characteristics of the input signal have to be obtained from blocks that are longer than a 20 ms frame. Consequently, the coefficient averaging module 206 averages the autocorrelation values over several consecutive frames, as given by the following equations:

$$\mathrm{av0}(n)[i] = \sum_{j=0}^{f-1} \mathrm{ACF}(n-j)[i], \qquad i = 0, \ldots, 8$$

$$\mathrm{av1}(n)[i] = \mathrm{av0}(n-f)[i], \qquad i = 0, \ldots, 8$$

where n represents the current frame, n-1 represents the previous frame, and f is the number of frames (typically set to 4).
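The frame energy and the two running averages can be sketched as follows. This is a floating-point illustration under stated assumptions: the class and function names are hypothetical, ACF vectors are plain lists of nine values (lags 0 to 8), and during start-up av0 simply sums however many frames are available while av1 is reported as `None` until f + 1 frames have been seen. A bit-exact codec implementation would use fixed-point arithmetic instead.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

ORDER = 8  # autocorrelation lags 0..8, as in the equations above


def frame_energy(r_vad: Sequence[float], acf: Sequence[float]) -> float:
    """p_vad = r_vad[0]*ACF[0] + 2 * sum over i = 1..8 of r_vad[i]*ACF[i]."""
    return r_vad[0] * acf[0] + 2.0 * sum(r_vad[i] * acf[i] for i in range(1, ORDER + 1))


class AcfAverager:
    """Keeps av0(n), the sum of the last f ACF vectors, and av1(n) = av0(n - f)."""

    def __init__(self, f: int = 4) -> None:
        self.f = f
        self.frames: Deque[List[float]] = deque(maxlen=f)  # last f ACF vectors
        # Holds av0(n - f) .. av0(n) once f + 1 frames have been processed.
        self.av0_history: Deque[List[float]] = deque(maxlen=f + 1)

    def update(self, acf: Sequence[float]) -> Tuple[List[float], Optional[List[float]]]:
        self.frames.append(list(acf))
        av0 = [sum(frame[i] for frame in self.frames) for i in range(ORDER + 1)]
        self.av0_history.append(av0)
        # av1 is av0 delayed by f frames; None until enough history exists.
        av1 = self.av0_history[0] if len(self.av0_history) == self.f + 1 else None
        return av0, av1
```

Delaying av1 by f frames is what allows av1, rather than av0, to serve as the speech-free reference used below.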

Coefficient predictor determination by coefficient predictor module 208 is performed by solving the matrix equation:

$$R \cdot a = -v$$

where R is the 8 × 8 symmetric Toeplitz matrix with entries $R_{jk} = \mathrm{av1}[\,|j-k|\,]$:

$$R = \begin{bmatrix}
\mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] & \cdots & \mathrm{av1}[7] \\
\mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] & \cdots & \mathrm{av1}[6] \\
\mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0] & \cdots & \mathrm{av1}[5] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{av1}[7] & \mathrm{av1}[6] & \mathrm{av1}[5] & \cdots & \mathrm{av1}[0]
\end{bmatrix}$$

and

$$v = \begin{bmatrix} \mathrm{av1}[1] \\ \mathrm{av1}[2] \\ \vdots \\ \mathrm{av1}[8] \end{bmatrix}, \qquad a = \begin{bmatrix} \mathrm{aav1}[1] \\ \mathrm{aav1}[2] \\ \vdots \\ \mathrm{aav1}[8] \end{bmatrix}$$

It is worth noting that av1 is used rather than av0, because av0 may contain speech. From the resulting predictor coefficients (aav1), the autocorrelated predictor values (r_av1) are calculated according to the equation:

$$r_{\mathrm{av1}}[i] = \sum_{k=0}^{8-i} \mathrm{aav1}[k]\,\mathrm{aav1}[k+i], \qquad i = 0, \ldots, 8, \qquad \mathrm{aav1}[0] = 1$$
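A floating-point sketch of modules 206 to 208 follows. It solves the Toeplitz system R·a = −v with the Levinson-Durbin recursion, which is a standard technique chosen here for illustration; the text does not say which solver is used, and a fixed-point codec would likely use its own bit-exact routine. It returns inverse-filter coefficients with aav1[0] = 1 and then forms r_av1 from them. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

ORDER = 8


def predictor_from_autocorrelation(av1: Sequence[float],
                                   order: int = ORDER) -> Tuple[List[float], float]:
    """Solve R a = -v, with R Toeplitz in av1[0..order-1] and v = av1[1..order].

    Returns (aav1, err): aav1[0] = 1 followed by the order predictor
    coefficients, and the final prediction-error energy.
    """
    if av1[0] <= 0.0:
        raise ValueError("av1[0] must be positive")
    a = [1.0] + [0.0] * order
    err = av1[0]
    for m in range(1, order + 1):
        acc = av1[m] + sum(a[k] * av1[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient of stage m
        updated = a[:]
        for k in range(1, m):
            updated[k] = a[k] + k_m * a[m - k]
        updated[m] = k_m
        a = updated
        err *= 1.0 - k_m * k_m
        if err <= 0.0:
            break  # perfectly predictable input: remaining coefficients stay 0
    return a, err


def predictor_autocorrelation(aav1: Sequence[float]) -> List[float]:
    """r_av1[i] = sum over k = 0..p-i of aav1[k] * aav1[k + i], for i = 0..p."""
    p = len(aav1) - 1
    return [sum(aav1[k] * aav1[k + i] for k in range(p - i + 1)) for i in range(p + 1)]
```

For an input whose autocorrelation decays as 0.5 to the power k, the recursion yields aav1 = [1, −0.5, 0, …, 0], so r_av1 starts with 1.25 and −0.5.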

The spectral comparison is made using the autocorrelated predictor values (r_av1) and the averaged autocorrelation values (av0) to determine a distortion measure (dm), given by:

$$dm = \frac{r_{\mathrm{av1}}[0]\,\mathrm{av0}[0] + 2\sum_{i=1}^{8} r_{\mathrm{av1}}[i]\,\mathrm{av0}[i]}{\mathrm{av0}[0]}$$

The present distortion measure (dm) is compared with the previous distortion measure (lastdm), after which lastdm is set to dm. A "stationary" indication is provided if the difference between the two is less than a preset threshold value. The frequency spectrum of noise in, for example, an automobile application is generally stationary over relatively long periods. The coefficients (r_vad) of the filter 202 are updated when this type of stationary condition is present. Vowel sounds and information tones may also have this stationary characteristic, but they can be accounted for (i.e., a false indication that speech is absent can be avoided) by detecting the periodicity of these sounds using long term predictor lag values (N_j) from the speech encoder 200. Consecutive lag values are compared, and an indication (ptch) is made if a periodic condition is determined.
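The spectral comparison and the pitch check can be sketched as below. The distortion measure follows the equation above. The stationarity threshold, the initial lastdm of 0, and the periodicity rule (counting consecutive lag pairs that differ by at most `tolerance` and requiring `min_matches` of them) are hypothetical choices for illustration, since the text only states that consecutive lags are compared.

```python
from typing import Sequence


def distortion_measure(r_av1: Sequence[float], av0: Sequence[float]) -> float:
    """dm = (r_av1[0]*av0[0] + 2 * sum over i = 1..8 of r_av1[i]*av0[i]) / av0[0]."""
    if av0[0] <= 0.0:
        raise ValueError("av0[0] must be positive")
    num = r_av1[0] * av0[0] + 2.0 * sum(r_av1[i] * av0[i] for i in range(1, len(r_av1)))
    return num / av0[0]


class SpectralStationarity:
    """Flags a stationary spectrum when dm changes little from frame to frame."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical preset threshold value
        self.lastdm = 0.0           # arbitrary start value before the first frame

    def update(self, r_av1: Sequence[float], av0: Sequence[float]) -> bool:
        dm = distortion_measure(r_av1, av0)
        stationary = abs(dm - self.lastdm) < self.threshold
        self.lastdm = dm  # remember the present measure for the next frame
        return stationary


def periodicity_flag(lags: Sequence[int], tolerance: int = 2, min_matches: int = 3) -> bool:
    """ptch: True when enough consecutive long-term-predictor lags N_j agree."""
    matches = sum(1 for prev, cur in zip(lags, lags[1:]) if abs(cur - prev) <= tolerance)
    return matches >= min_matches
```

Because the predictor is fitted to the older average av1, dm stays close to the normalised prediction error only while the current spectrum (av0) still matches it; a change in spectrum makes dm jump between frames.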

As mentioned above, threshold adaptation is performed by threshold determination module 212 when the spectral stationarity and periodicity conditions are proper. These conditions are checked every 20 milliseconds. The process undertaken by module 212 is depicted in Figures 8A and 8B. A first assessment is made at decision block 310 to determine whether the energy of the input autocorrelated coefficients (ACF0) is greater than a preset threshold constant (pth). If ACF0 is greater than pth, such a condition might indicate the presence of speech, in which case the voice activity detection threshold (th_vad) is set to a preset constant (plev) at block 316, and the process then terminates until the next condition check. If ACF0 is less than pth, this indicates that speech might not be present, allowing a determination of the spectral stationarity and periodicity conditions at decision block 314. If the conditions are not proper for adaptation (i.e., speech may be present), an adaptation counter (adaptcount) is set to zero at block 320, and the process terminates. If the conditions are found to be proper, adaptcount is incremented at block 312, and at decision block 318 adaptcount is checked to see whether it is greater than a number of frames (adp) over which only a small spectral difference may occur for adaptation processing to be allowed. If adp is not exceeded, the process terminates; if adp is exceeded, threshold adaptation occurs.
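The counter logic of blocks 312, 318 and 320 might be sketched as follows. The sketch assumes that "proper" conditions mean a stationary spectrum with no periodicity indication, as the discussion of vowels and information tones suggests. The class name is hypothetical, and the preceding ACF0/pth check at decision block 310 is deliberately left out.

```python
class AdaptationGate:
    """Tracks adaptcount and reports when threshold adaptation may run."""

    def __init__(self, adp: int) -> None:
        self.adp = adp          # frames of steady, non-periodic input required
        self.adaptcount = 0

    def ready(self, stationary: bool, periodic: bool) -> bool:
        if not stationary or periodic:
            self.adaptcount = 0  # block 320: conditions not proper, restart count
            return False
        self.adaptcount += 1     # block 312
        return self.adaptcount > self.adp  # block 318
```

With adp = 2, three consecutive steady, non-periodic frames are needed before the gate opens, and a single periodic frame resets it.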

Adaptation of the VAD threshold value (th_vad) takes place in proportional steps rather than in constant increments. The current value is first multiplied by a proportional factor of (1 - 1/dec), as shown at block 322. Block 324 determines whether the value given by block 322 is greater than or equal to the energy of the current input frame (p_vad) multiplied by a constant factor (fac). If so, the decrease of th_vad is proper and th_vad is left at its new value. If, on the other hand, the value given by block 322 is less than p_vad*fac, th_vad may require an increase. Such an increase is made at block 326, where th_vad is raised to p_vad*fac, unless this would exceed an increase by a factor of (1 + 1/inc), in which case th_vad is increased by a factor of (1 + 1/inc). In either case, th_vad must never exceed p_vad plus a constant (margin), as checked at block 330. If the adjusted value of th_vad does exceed p_vad + margin, th_vad is set to p_vad + margin at block 328. Finally, the predicted values of the autocorrelation coefficients (r_vad) are sent to the filter 202 at block 332, and adaptcount is incremented at block 334.
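The proportional update of blocks 322 to 330 reduces to a few lines. The text gives no values for dec, inc, fac or margin, so they are plain parameters here, and the numbers in the usage line are arbitrary.

```python
def adapt_threshold(th_vad: float, p_vad: float, dec: float, inc: float,
                    fac: float, margin: float) -> float:
    """One proportional adaptation step of the VAD threshold."""
    th = th_vad * (1.0 - 1.0 / dec)          # block 322: proportional decrease
    target = p_vad * fac
    if th < target:                          # block 324: decrease went too far
        # Block 326: rise toward p_vad*fac, limited to a (1 + 1/inc) step.
        th = min(th * (1.0 + 1.0 / inc), target)
    return min(th, p_vad + margin)           # blocks 328/330: cap at p_vad + margin
```

For example, `adapt_threshold(100.0, 50.0, dec=4, inc=4, fac=3.0, margin=1000.0)` first lowers the threshold to 75 and then raises it by one bounded step to 93.75, short of the target of 150. Proportional steps let the threshold track noise levels that differ by orders of magnitude at a constant relative rate.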

The value for th_vad is used by VAD decision module 216 to determine whether an input signal contains speech. Such an indication is made when energy in a current frame output by filter 202 (p_vad) exceeds th_vad. In the event that the input signal has burst characteristics, it may be necessary to add VAD hangover. This is done, if needed, by VAD hangover addition module 218.
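A minimal sketch of the decision in module 216 together with one common burst/hangover scheme for module 218 is shown below. The text does not specify how the hangover is computed, so the arming rule (a hangover of `hang_frames` frames after at least `burst_frames` consecutive speech frames) and both parameter names are assumptions.

```python
class VadDecision:
    """Energy-threshold VAD decision with a simple hangover after speech bursts."""

    def __init__(self, burst_frames: int = 3, hang_frames: int = 5) -> None:
        self.burst_frames = burst_frames  # hypothetical: burst length that arms hangover
        self.hang_frames = hang_frames    # hypothetical: frames of hangover
        self.burstcount = 0
        self.hangcount = 0

    def decide(self, p_vad: float, th_vad: float) -> bool:
        if p_vad > th_vad:                # module 216: raw speech decision
            self.burstcount += 1
            if self.burstcount >= self.burst_frames:
                self.hangcount = self.hang_frames  # arm hangover after a burst
            return True
        self.burstcount = 0
        if self.hangcount > 0:            # module 218: extend the speech flag
            self.hangcount -= 1
            return True
        return False
```

The hangover keeps the flag set across short dips in energy at the end of words, while isolated one-frame energy spikes do not arm it.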

In a GSM system, the standard speech encoder contains both a voice activity detector and a background noise estimator. In a digital signal processor (DSP), implementation of the invention is very simple. An implementation in a DSP in a GSM terminal could consume as few as 100 instructions and a dozen storage locations for, inter alia, the curves of Figures 3A and 3B. A typical off-the-shelf DSP would currently have a program memory of 8192 locations and a storage memory of 2048 locations, for example. The implementation of the invention for other systems, e.g., PDC (Personal Digital Cellular, based on the RCR-27 standard) or D-AMPS (Digital AMPS, based on the EIA IS-54 standard), would be understood by one skilled in the art. The inventive concept disclosed in this application applies not only to cellular telephones but also to intercoms, public address systems and other voice transmission systems. The described noise suppression method can also be used in situations other than hands-free operation. For instance, it could be used to improve the perceived sound quality when using the built-in microphone of any device in a noisy environment. The present invention may, of course, be carried out in other specific ways than those set forth herein without departing from the spirit and central characteristics of the invention. The present embodiments are, therefore, to be construed in all respects as illustrative and not restrictive, and all changes coming within the meaning and range of equivalency of the appended claims are intended to be embraced herein.
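The claims below recite a nominal gain that is highest when no background noise is present (claims 5 and 13) and that drops by roughly 2 dB for every 3 dB rise in background noise (claims 4, 12 and 19). As a hedged illustration of how such a curve could fit in about a dozen storage locations, the sketch stores one nominal gain per 3 dB noise step and looks it up with clamping. The table size, the 0 dB maximum gain and the idea of measuring noise relative to a quiet reference level are assumptions; the actual curves of Figures 3A and 3B are not reproduced here.

```python
from typing import List


def build_gain_table(entries: int = 12, step_gain_db: float = 2.0,
                     max_gain_db: float = 0.0) -> List[float]:
    """Entry k: nominal gain (dB) when noise is k 3-dB steps above the reference."""
    return [max_gain_db - step_gain_db * k for k in range(entries)]


def lookup_gain_db(table: List[float], noise_above_ref_db: float,
                   step_db: float = 3.0) -> float:
    """Select the table entry for a noise estimate given in dB above the reference."""
    k = int(noise_above_ref_db // step_db)
    return table[min(max(k, 0), len(table) - 1)]  # clamp to the stored range
```

A noise estimate 7 dB above the reference falls in the third 3 dB step and yields a nominal gain of −4 dB; estimates outside the table reuse the first or last entry.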

Claims

What is claimed is:

1. An apparatus for reducing perceived noise in a transmit signal, the apparatus comprising: means for receiving a first signal from an environment that may include background noise; means for measuring background noise in the environment and for supplying a second signal corresponding to an amount of background noise present in the environment; an arithmetic unit, responsive to the first signal and to the second signal, for generating a gain control signal; and a variable gain amplifier for amplifying the first signal to generate the transmit signal, a gain of the variable gain amplifier being responsive to the gain control signal; the variable gain amplifier being controlled such that the gain is kept at a nominal level that is substantially constant when at least one of a voice signal and an information signal is present in the first signal.

2. The apparatus of claim 1, wherein the variable gain amplifier is controlled so that gain is rapidly reduced in the absence of at least one of a voice signal and an information signal in the first signal.

3. The apparatus of claim 1, wherein the nominal level of amplification is reduced for an increase in the level of background noise.

4. The apparatus of claim 3, wherein the nominal level of amplification is reduced by approximately 2 dB for an approximate 3 dB increase in the level of background noise.

5. The apparatus of claim 1, wherein the nominal level of amplification is at a maximum level when the background noise signal indicates an absence of background noise.

6. The apparatus of claim 1, wherein a rate of amplification reduction is proportional to a level of the background noise signal.

7. The apparatus of claim 6, wherein the rate of amplification reduction is greater for a higher level of background noise than for a lower level of background noise.

8. The apparatus of claim 1, further comprising: means for rectifying the first signal to produce a rectified signal; and means for low-pass filtering the rectified signal to produce a filtered signal; wherein the filtered signal is supplied to the arithmetic unit.

9. The apparatus of claim 1, wherein the gain control signal produced conforms to one of a plurality of input-output curves each of which curves corresponds to a background noise signal level.

10. A method of reducing noise perceived in a transmit signal, the method comprising the steps of: receiving a first signal from an environment that may include background noise; measuring a level of background noise in the environment and generating a second signal corresponding thereto; modulating the first signal with the second signal to produce a gain control signal; supplying the gain control signal to a gain adjustable amplifier to provide a nominal gain characteristic in the gain adjustable amplifier; and amplifying the first signal with the gain adjustable amplifier to produce the transmit signal; wherein the modulation of the first signal with the second signal to produce the gain control signal provides the nominal gain characteristic when one of at least a voice signal and an information signal is present in the first signal.

11. The method of claim 10, wherein the nominal level of amplification is reduced for an increase in the level of background noise.

12. The method of claim 11, wherein the nominal level of amplification is reduced by approximately 2 dB for an approximate 3 dB increase in the level of background noise.

13. The method of claim 10, wherein the nominal level of amplification is at a maximum level when the background noise signal indicates an absence of background noise.

14. The method of claim 10, wherein a rate of amplification reduction is proportional to the level of background noise.

15. The method of claim 14, wherein the rate of amplification reduction is greater for a higher level of background noise than for a lower level of background noise.

16. The method of claim 10, further comprising the steps of: rectifying the input signal to produce a rectified signal; low-pass filtering the rectified signal to produce a filtered signal; and supplying the filtered signal to the arithmetic unit.

17. The method of claim 10, wherein the gain control signal produced conforms to one of a plurality of input-output curves each of which curves corresponds to a background noise signal level.

18. The method of claim 10, further comprising the step of: controlling the variable gain amplifier so that gain is rapidly reduced in the absence of at least a voice signal and an information signal in the first signal.

19. An apparatus for reducing the amount of noise perceived by a far end user, the apparatus comprising: a background noise estimator for supplying a background noise estimate; a variable gain amplifier; and an arithmetic unit for providing an amplifier gain control signal to the variable gain amplifier; wherein a gain control signal supplied to the variable gain amplifier varies amplification by approximately 2dB for every 3dB change in background noise indicated in the background noise estimate.

20. An apparatus for reducing noise transmitted to a far end user in a cellular telephone operating in accordance with a cellular telephone standard having a requirement for measuring a background noise, the apparatus comprising: a gain controllable amplifier; an arithmetic unit for supplying gain control to the amplifier; wherein the arithmetic unit provides a gain control signal corresponding to a background noise estimate supplied in accordance with the cellular telephone standard.

21. The apparatus for reducing noise claimed in claim 2, wherein: the gain controllable amplifier operates at a substantially constant gain level in the presence of speech.

22. The apparatus for reducing noise claimed in claim 2, further comprising: a non-linear low pass filter for filtering a voice signal to generate a filtered voice signal, wherein the filtered voice signal is supplied to the arithmetic unit to be processed with the background noise estimate to provide the gain control signal.

23. The apparatus for reducing noise claimed in claim 2, wherein: a gain control signal provided by the arithmetic unit conforms to one of a plurality of input-output curves each of which corresponds to a level of a given background noise estimate.

24. The apparatus for reducing noise claimed in claim 5, wherein: the plurality of input-output curves correspond to increasing background noise to adaptively adjust in accordance with the Lombard effect.

25. The apparatus for reducing noise claimed in claim 2, wherein the cellular telephone operating environment in accordance with a cellular telephone standard is the Global System for Mobile Communications (GSM) standard.

26. An apparatus for reducing noise transmitted to a far end user in a cellular telephone operating in accordance with a cellular telephone standard having a requirement for measuring a background noise, the apparatus comprising: means for amplifying a signal to be transmitted; means for controlling an amount of gain in the amplifying means; wherein the controlling means provides a gain control signal corresponding to a background noise estimate supplied in accordance with the cellular telephone standard.

27. The apparatus for reducing noise claimed in claim 6, wherein: the amplifying means operates at a substantially constant gain level in the presence of speech.

28. The apparatus for reducing noise claimed in claim 6, further comprising: means for low-pass filtering for filtering a voice signal to generate a filtered voice signal, wherein the filtered voice signal is supplied to the controlling means to be processed with the background noise estimate to provide the gain control signal.

29. The apparatus for reducing noise claimed in claim 8, wherein: a gain control signal provided by the controlling means conforms to one of a plurality of input-output curves each of which corresponds to a level of a given background noise estimate.

30. The apparatus for reducing noise claimed in claim 11, wherein: the plurality of input-output curves correspond to increasing background noise to adaptively compensate gain in accordance with the Lombard effect.

31. A method for reducing noise transmitted to a far end user in a cellular telephone operating in accordance with a cellular telephone standard having a requirement for measuring a background noise, the method comprising the steps of: receiving a signal to be transmitted; producing an amplification gain control signal in accordance with a background noise estimate supplied in accordance with the cellular telephone standard; supplying the amplification gain control signal to a gain controllable amplifier; and amplifying the signal to be transmitted in accordance with the amplification gain control signal.

32. The method for reducing noise claimed in claim 13, wherein: amplification is performed at a substantially constant gain level in the presence of speech.

33. The method for reducing noise claimed in claim 13, further comprising the step of: low-pass filtering a signal to be transmitted to generate a filtered signal, wherein the filtered signal is processed with the background noise estimate to provide the gain control signal.

34. The method for reducing noise claimed in claim 13, wherein: a gain control signal supplied conforms to one of a plurality of input-output curves each of which corresponds to a level of a given background noise estimate.

35. The method for reducing noise claimed in claim 16, wherein: the plurality of input-output curves correspond to increasing background noise to adaptively compensate gain in accordance with the Lombard effect.