WO2001031972A1 - System and method for adaptive interference canceling - Google Patents

System and method for adaptive interference canceling

Info

Publication number
WO2001031972A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
microphone
frequency
noise
detecting
Prior art date
Application number
PCT/US2000/029336
Other languages
French (fr)
Inventor
Douglas Andrea
Baruch Berdugo
Joseph Marash
Original Assignee
Andrea Electronics Corporation
Priority date
Filing date
Publication date
Application filed by Andrea Electronics Corporation filed Critical Andrea Electronics Corporation
Publication of WO2001031972A1 publication Critical patent/WO2001031972A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S1/00Two-channel systems
    • H04S1/007Two-channel systems in which the audio signals are in digital form
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/06Receivers
    • H04B1/10Means associated with receiver for limiting or suppressing noise or interference
    • H04B1/12Neutralising, balancing, or compensation arrangements
    • H04B1/123Neutralising, balancing, or compensation arrangements using adaptive balancing or compensation means
    • H04B1/126Neutralising, balancing, or compensation arrangements using adaptive balancing or compensation means having multiple inputs, e.g. auxiliary antenna for receiving interfering signal

Definitions

  • the present invention relates generally to integrating a DSDA (Digital Super Directional Array).
  • FIG. 1 is a block diagram of an overall system
  • FIG. 2 is a block diagram of a sampling unit
  • FIG. 3 is a block diagram of an alternative embodiment of a sampling unit
  • FIG. 4 is a schematic depiction of tapped delay lines used in a main channel matrix and a reference matrix unit
  • FIG. 5 is a schematic depiction of a main channel matrix unit
  • FIG. 6 is a schematic depiction of a reference channel matrix unit
  • FIG. 7 is a schematic depiction of a decolorizing filter
  • FIG. 8 is a schematic depiction of an inhibiting unit based on directional interference
  • FIG. 9 is a schematic depiction of a frequency-selective constraint adaptive filter
  • FIG. 10 is a block diagram of a frequency-selective weight-constraint unit
  • FIG. 11 is a flow chart depicting the operation of a program that can be used to implement the invention.
  • FIGS. 12A-H illustrate the DSDA integrated according to the present invention
  • FIGS. 13-18C-2 illustrate the universal interface in accordance with the present invention
  • FIG. 19 is a block diagram of a system using sub-band processing
  • FIG. 20 is a block diagram of a system using broadband processing with frequency-limited adaptation
  • FIG. 21 is a block diagram of a system using broadband processing with an external main-channel generator
  • FIGS. 22A-22D are a flow chart depicting the operation of a program that can be used to implement a method using sub-band processing
  • FIGS. 23A-23C are a flow chart depicting the operation of a program that can be used to implement a method using broad-band processing with frequency-limited adaptation;
  • FIGS. 24A-24C are a flow chart depicting the operation of a program that can be used to implement a method using broad-band processing with an external main-channel generator;
  • FIG. 25 is a functional diagram of the overall system including a microphone array, an A-to-D converter, a band-pass filter, an approximate-direction finder, a precise-direction finder, and a measurement qualification unit in accordance with the present invention
  • FIG. 26 is a perspective view showing the arrangement of a particular embodiment of the microphone array of FIG. 25;
  • FIG. 27 is a functional diagram of an embodiment of the approximate and exact direction finder of FIG. 25;
  • FIG. 28 is a functional diagram of an embodiment of the precise-direction finder of FIG. 25;
  • FIG. 29 is a functional diagram of an embodiment of the exact-direction finder of FIG. 25;
  • FIG. 30 is the 3-D coordinate system used to describe the present invention;
  • FIG. 31A is a functional diagram of a first embodiment of the measurement qualification unit of FIG. 25;
  • FIG. 31B is a functional diagram of a second embodiment of the measurement qualification unit of FIG. 25;
  • FIG. 31C is a functional diagram of a third embodiment of the measurement qualification unit of FIG. 25;
  • FIG. 31D is a functional diagram of a fourth embodiment of the measurement qualification unit of FIG. 25;
  • FIGS. 32A - 32D are a flow chart depicting the operation of a program that can be used to implement the method in accordance with the present invention.
  • FIGS. 33A-33J are diagrams of the present invention.
  • FIG. 1 is a block diagram of a system in accordance with a preferred embodiment of the present invention.
  • the system illustrated has a sensor array 1, a sampling unit 2, a main channel matrix unit 3, a reference channel matrix unit 4, a set of decolorizing filters 5, a set of frequency-selective constrained adaptive filters 6, a delay 7, a difference unit 8, an inhibiting unit 9, and an output D/A unit 10.
  • Sensor array 1, having individual sensors 1a-1d, receives signals from a signal source located on-axis with respect to the system and from interference sources located off-axis from the system.
  • the sensor array is connected to sampling unit 2 for sampling the received signals, having individual sampling elements, 2a-2d, where each element is connected to the corresponding individual sensor to produce digital signals 11.
  • the outputs of sampling unit 2 are connected to main channel matrix unit 3, producing a main channel 12 representing signals received in the direction of the source.
  • the main channel contains both a source signal component and an interference signal component.
  • the outputs of sampling unit 2 are also connected to reference channel matrix unit 4, which generates reference channels 13 representing signals received from directions other than that of the signal source. Thus, the reference channels represent interference signals.
  • the reference channels are filtered through decolorizing filters 5, which generate flat-frequency reference channels 14 having a frequency spectrum whose magnitude is substantially flat over a frequency range of interest.
  • Flat-frequency reference channels 14 are fed into the set of frequency-selective constraint adaptive filters 6, which generate canceling signals 15.
  • main channel 12 is delayed through delay 7 so that it is synchronized with canceling signals 15.
  • Difference unit 8 then subtracts canceling signals 15 from the delayed main channel to generate a digital output signal 16, which is converted by D/A unit 10 into analog form.
  • Digital output signal 16 is fed back to the adaptive filters to update the filter weights of the adaptive filters.
  • Flat-frequency reference channels 14 are fed to inhibiting unit 9, which estimates the power of each flat-frequency reference channel as well as the power of the main channel and generates an inhibit signal 19 to prevent signal leakage.
  • FIG. 2 depicts a preferred embodiment of the sampling unit.
  • a sensor array 21, having sensor elements 21a-21d, is connected to an analog front end 22, having amplifier elements 22a-22d, where each amplifier element is connected to the output of the corresponding sensor element.
  • each sensor can be either a directional or omnidirectional microphone.
  • the analog front end amplifies the received analog sensor signals to match the input requirement of the sampling elements.
  • the outputs from the analog front ends are connected to a set of delta-sigma A/D converters, 23, where each converter samples and digitizes the amplified analog signals.
  • the delta-sigma sampling is a well-known A/D technique using both oversampling and digital filtering. For details on delta-sigma A/D sampling, see Crystal Semiconductor Corporation, Application Note: Delta-Sigma Techniques, 1989.
  • FIG. 3 shows an alternative embodiment of the sampling unit.
  • a sensor array 31, having sensor elements 31a-31d is connected to an amplifier 32, having amplifier elements 32a-32d, where each amplifier element amplifies the received signals from the corresponding sensor element.
  • the outputs of the amplifier are connected to a sample & hold (S/H) unit 33 having sample & hold elements 33a-33d, where each S/H element samples the amplified analog signal from the corresponding amplifier element to produce a discrete signal.
  • the outputs from the S/H unit are multiplexed into a single signal through a multiplexor 34.
  • the output of the multiplexor is connected to a conventional A/D converter 35 to produce a digital signal.
  • FIG. 4 is a schematic depiction of tapped delay lines used in the main channel matrix unit and the reference channel matrix in accordance with a preferred embodiment of the present invention.
  • the tapped delay line used here is defined as a nonrecursive digital filter, also known in the art as a transversal filter, a finite impulse response filter or an FIR filter.
  • the illustrated embodiment has 4 tapped delay lines, 40a-40d.
  • Each tapped delay line includes delay elements 41, multipliers 42 and adders 43.
  • Digital signals 44a-44d are fed into the set of tapped delay lines 40a-40d. Delayed signals through delay elements 41 are multiplied by filter coefficients Fi,j 45 and added to produce outputs 46a-46d.
  • the n-th sample of the output from the i-th tapped delay line, Yi(n), can then be expressed as Yi(n) = Σj Fi,j · Xi(n−j), where Xi(n) is the i-th input digital signal.
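The tapped-delay-line (FIR) operation above can be illustrated with a minimal sketch (Python/NumPy; the function name, tap values and input signal are hypothetical, not taken from the patent):

```python
import numpy as np

def tapped_delay_line(x, coeffs):
    """FIR (transversal) filter: y(n) = sum_j coeffs[j] * x(n - j)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for j, f in enumerate(coeffs):
            if n - j >= 0:
                y[n] += f * x[n - j]
    return y

# two-tap example: average the current and previous sample
x = np.array([1.0, 2.0, 3.0, 4.0])
print(tapped_delay_line(x, [0.5, 0.5]))  # [0.5 1.5 2.5 3.5]
```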
  • FIG. 5 depicts the main channel matrix unit for generating a main channel in accordance with a preferred embodiment of the present invention.
  • the unit has tapped delay lines, 50a-50d, as an input section taking inputs 51a-51d from the sampling unit.
  • Its output section includes multipliers, 52a-52d, where each multiplier is connected to the corresponding tapped delay line and an adder 53, which sums all output signals from the multipliers.
  • the unit generates a main channel 54, as a weighted sum of outputs from all multipliers.
  • the filter weights 55a-55d can be any combination of fractions as long as their sum is 1. For example, if 4 microphones are used, the embodiment may use filter weights of 1/4 in order to take into account the contribution of each microphone.
  • the unit acts as a beamformer, a spatial filter which filters signals arriving from all directions to produce a signal arriving from a specific direction without physically moving the sensor array.
  • the coefficients of the tapped delay lines and the filter weights are set in such a way that the received signals are spatially filtered to maximize the sensitivity toward the signal source.
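As a sketch of the main-channel combining step (names are hypothetical; a real implementation would first apply the per-sensor tapped-delay-line filtering), the main channel is a weighted sum whose weights sum to 1:

```python
import numpy as np

def main_channel(filtered_channels, weights):
    """Weighted sum of per-sensor (filtered) channels; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * ch for w, ch in zip(weights, filtered_channels))

# four sensors, equal weights of 1/4 as in the example above
channels = [np.ones(8) for _ in range(4)]
main = main_channel(channels, [0.25, 0.25, 0.25, 0.25])
# an identical on-axis signal on every sensor passes through unchanged
```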
  • FIG. 6 depicts the reference channel matrix unit for generating reference matrix channels in accordance with a preferred embodiment of the present invention. It has tapped delay lines, 60a-60d, as an input section taking inputs 61a-61d from the sampling unit. The same tapped delay lines as that of FIG. 4 may be used, in which case the tapped delay lines may be shared by the main and reference channel matrix units.
  • Its output section includes multipliers, 62a-62d, 63a-63d, 64a-64d and adders 65a-65c, where each multiplier is connected to the corresponding tapped delay line and adder.
  • the unit acts as a beamformer which generates the reference channels 66a-66c representing signals arriving off-axis from the signal source by obtaining the weighted differences of certain combinations of outputs from the tapped delay lines.
  • the filter weight combinations can be any numbers as long as the sum of the filter weights used to form a given reference channel is 0.
  • the net effect is placing a null (low sensitivity) in the receiving gain of the beamformer toward the signal source.
  • the reference channels represent interference signals in directions other than that of the signal source.
  • the unit "steers" the input digital data to obtain interference signals without physically moving the sensor array.
  • FIG. 7 is a schematic depiction of the decolorizing filter in accordance with a preferred embodiment of the present invention. It is a tapped delay line including delay elements 71, multipliers 72 and adders 73. A reference channel 74 is fed into the tapped delay line. Delayed signals are multiplied by filter coefficients Fj 75 and added to produce an output 76. The filter coefficients are set in such a way that the filter amplifies the low-magnitude frequency components of an input signal to obtain an output signal having a substantially flat frequency spectrum.
  • the output of a conventional adaptive beamformer suffers a non-uniform frequency behavior.
  • the reference channels do not have a flat frequency spectrum.
  • the receiving sensitivity of a beamformer toward a particular angular direction is often described in terms of a gain curve.
  • the reference channel is obtained by placing a null in the gain curve (making the sensor array insensitive) in the direction of the signal source.
  • the resulting gain curve has a lower gain for lower frequency signals than for higher frequency signals. Since the reference channel is modified to generate a canceling signal, a non-flat frequency spectrum of the reference channel translates into non-uniform frequency behavior in the system output.
  • the decolorizing filter is a fixed-coefficient filter which flattens the frequency spectrum of the reference channel (thus “decolorizing" the reference channel) by boosting the low frequency portion of the reference channel.
  • the decolorizing filter in the illustrated embodiment uses a tapped delay line filter which is the same as a finite impulse response (FIR) filter, but other kinds of filters such as an infinite impulse response (IIR) filter can also be used for the decolorizing filter in an alternative embodiment.
  • FIR finite impulse response
  • IIR infinite impulse response
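One way to sketch a fixed decolorizing (whitening) filter is to build an FIR whose magnitude response approximates the inverse of an estimated reference-channel spectrum, thereby boosting the weak low-frequency end. The design method, names and numbers here are illustrative assumptions, not the patent's:

```python
import numpy as np

def decolorizing_fir(ref_mag, n_taps=32):
    """Fixed FIR approximating 1/|R(f)|, flattening the reference spectrum."""
    desired = 1.0 / np.maximum(ref_mag, 1e-6)   # boost low-magnitude bins
    h = np.fft.irfft(desired)                   # zero-phase prototype kernel
    h = np.roll(h, n_taps // 2)[:n_taps]        # shift to make causal, truncate
    return h * np.hanning(n_taps)               # window to reduce truncation ripple

# hypothetical reference spectrum that rolls off toward low frequencies
ref_mag = 0.1 + np.linspace(0.0, 1.0, 65)
h = decolorizing_fir(ref_mag)                   # 32 fixed filter coefficients
```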
  • FIG. 8 depicts schematically the inhibiting unit in accordance with a preferred embodiment of the present invention. It includes power estimation units 81, 82 which estimate the power of a main channel 83 and each reference channel 84, respectively.
  • a sample power estimation unit 85 calculates the power of each sample.
  • a multiplier 86 multiplies the power of each sample by a fraction, α, which is the reciprocal of the number of samples in a given averaging period, to obtain an average sample power 87.
  • An adder 88 adds the average sample power to the output of another multiplier 89, which multiplies a previously calculated main channel power average 90 by (1−α).
  • a new main channel power average is thus obtained as (new sample power) × α + (old power average) × (1−α).
  • For example, if the averaging period is 100 samples, α = 0.01 and the updated power average will be (new sample power) × 0.01 + (old power average) × 0.99. In this way, the updated power average is available at each sampling instant rather than only after an averaging period.
  • while the illustrated embodiment shows an on-the-fly method of estimating the power average, other power estimation methods can also be used in alternative embodiments.
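The running power estimate described above can be sketched in a few lines (the value of α and the input are illustrative):

```python
def update_power_average(old_avg, sample, alpha=0.01):
    """Exponential running average of sample power:
    new = alpha * sample^2 + (1 - alpha) * old, with alpha = 1 / averaging period."""
    return alpha * sample ** 2 + (1.0 - alpha) * old_avg

# a constant unit-amplitude input: the estimate converges toward power 1.0,
# and an updated value is available at every sampling instant
avg = 0.0
for s in [1.0] * 500:
    avg = update_power_average(avg, s)
```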
  • a multiplier 91 multiplies the main channel power average by a threshold 92 to produce a normalized main channel power average, from which the total reference channel power average is subtracted.
  • If the difference is positive, a comparator 97 generates an inhibit signal 98.
  • the inhibit signal is provided to the adaptive filters to stop the adaptation process to prevent signal leakage.
  • an alternative embodiment may normalize the reference channel power average instead of the main channel power average. For example, if the threshold 92 in the illustrated embodiment is 0.25, the same effect can be obtained in the alternative embodiment by normalizing each reference channel power average by multiplying it by 4.
  • This inhibition approach is different from the prior art SNR-based inhibition approach mentioned in the background section in that it detects the presence of significant directional interference which the prior art approach does not consider. As a result, the directional-interference-based inhibition approach stops the adaptation process when there is no significant directional interference to be eliminated, whereas the prior art approach does not.
  • the SNR-based approach would allow the adaptive filter to continue adapting due to the small SNR.
  • the continued adaptation process is not desirable because there is very little directional interference to be eliminated in the first place, and the adaptation process searches in vain for new filter weights to eliminate the uncorrelated noise, which often results in canceling the source signal component of the received signal.
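The directional-interference inhibition test can be sketched as a single comparison (the threshold value and the function name are assumptions for illustration):

```python
def should_inhibit(main_power_avg, ref_power_avgs, threshold=0.25):
    """Inhibit adaptation when the normalized main-channel power exceeds the
    total reference-channel power, i.e. when little directional interference
    remains and further adaptation risks canceling the source signal."""
    return main_power_avg * threshold > sum(ref_power_avgs)

# strong on-axis speech, weak off-axis interference -> pause adaptation
assert should_inhibit(10.0, [0.3, 0.4, 0.2])
# significant directional interference -> keep adapting
assert not should_inhibit(10.0, [2.0, 1.5, 1.0])
```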
  • FIG. 9 shows the frequency-selective constraint adaptive filter together with the difference unit in accordance with a preferred embodiment of the present invention.
  • the frequency-selective constraint adaptive filter 101 includes a finite impulse response (FIR) filter 102, an LMS weight updating unit 103 and a frequency-selective weight-constraint unit 104.
  • a flat-frequency reference channel 105 passes through FIR filter 102, whose filter weights are adjusted to produce a canceling signal 106 which closely approximates the actual interference signal component present in a main channel 107.
  • the main channel is obtained from the main channel matrix unit after a delay in order to synchronize the main channel with the canceling signal.
  • there is a delay between the main channel and the canceling signal because the canceling signal is obtained by processing reference channels through extra stages of delay, i.e., the decolorization filters and adaptive filters.
  • the main channel directly from the main channel matrix unit may be used if the delay is not significant.
  • a difference unit 108 subtracts canceling signal 106 from main channel 107 to generate an output signal 109.
  • Adaptive filter 101 adjusts its filter weights, W1 through Wn, to minimize the power of the output signal. When the filter weights settle, output signal 109 reproduces the source signal substantially free of the actual interference signal component because canceling signal 106 closely tracks the interference signal component. Output signal 109 is sent to the output D/A unit to produce an analog output signal. Output signal 109 is also used to adjust the adaptive filter weights to further reduce the interference signal component.
  • LMS Least Mean-Square
  • RLS Recursive Least Square
  • the adaptive filter weights are updated according to the following:
  • Wp(n+1) = Wp(n) + 2μ · r(n−p) · e(n), where:
  • n is a discrete time index;
  • Wp is the p-th filter weight of the adaptive filter;
  • e(n) is the difference signal between the main channel signal and the canceling signal;
  • r(n) is the reference channel signal; and
  • μ is an adaptation constant that controls the speed of adaptation.
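A toy iteration following the LMS update rule above (the signals and the adaptation constant are invented for illustration):

```python
import numpy as np

def lms_step(w, r_hist, d, mu=0.05):
    """One LMS step: y = w . r, e = d - y, then W_p += 2*mu*r(n-p)*e(n)."""
    y = np.dot(w, r_hist)            # canceling signal from the FIR filter
    e = d - y                        # output: main channel minus canceling signal
    return w + 2.0 * mu * e * r_hist, e

# the main channel is a scaled copy of the reference interference; the weights
# converge toward [0.5, 0, 0, 0], driving the output power toward zero
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(2000):
    r = rng.standard_normal(4)       # reference-channel history r(n) .. r(n-3)
    w, e = lms_step(w, r, d=0.5 * r[0])
```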
  • FIG. 10 depicts a preferred embodiment of the frequency-selective weight-constraint unit.
  • the frequency-selective weight-control unit 110 includes a Fast Fourier Transform (FFT) unit 112, a set of frequency bins 114, a set of truncating units 115, a set of storage cells 116, and an Inverse Fast Fourier Transform (IFFT) unit 117, connected in series.
  • the FFT unit 112 receives adaptive filter weights 111 and performs the Fast Fourier Transform to obtain the frequency representation values of the filter weights.
  • Each frequency bin stores the frequency representation values within a specific bandwidth assigned to each bin.
  • the values represent the operation of the adaptive filter with respect to a specific frequency component of the source signal.
  • Each of the truncating units 115a-115h compares the frequency representation values with a threshold assigned to each bin, and truncates the values if they exceed the threshold.
  • the truncated frequency representation values are temporarily stored in 116a-116h before the IFFT unit 117 converts them back to new filter weight values 118.
  • the frequency-selective weight-constraint unit further controls the adaptation process based on the frequency spectrum of the received source signal.
  • the weight-constraint mechanism is based on the observation that a large increase in the adaptive filter weight values hints at signal leakage. If the adaptive filter works properly, there is no need for the filter to increase the filter weights to large values. But, if the filter is not working properly, the filter weights tend to grow to large values.
  • One way to curb the growth is to use a simple truncating mechanism that truncates the values of the filter weights to predetermined threshold values. In this way, even if the overall signal power is not high enough to trigger the inhibition mechanism, the weight-constraint mechanism can still prevent the signal leakage.
  • the filter weight values in the frequency domain will indicate some increase because they represent the operation of the adaptive filter in response to a specific frequency component of the source signal.
  • the frequency-selective weight-constraint unit detects that condition by sensing a large increase in the frequency representation values of the filter weights. By truncating the frequency representation values in the narrow frequency band of interest and inverse-transforming them back to the time domain, the unit acts to prevent the signal leakage involving narrow band signals.
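The FFT-truncate-IFFT constraint can be sketched as follows (the bin thresholds and sizes are hypothetical):

```python
import numpy as np

def constrain_weights(w, bin_thresholds):
    """Transform the filter weights, clip each frequency bin's magnitude at its
    threshold, and transform back, limiting growth only in the offending bands."""
    W = np.fft.rfft(w)
    scale = np.minimum(1.0, bin_thresholds / np.maximum(np.abs(W), 1e-12))
    return np.fft.irfft(W * scale, n=len(w))

w = np.array([4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # |W| = 4 in every bin
w_c = constrain_weights(w, np.full(5, 1.0))               # clip each bin to 1
# every bin is truncated, so the constrained weights become [1, 0, 0, ..., 0]
```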
  • DSP digital signal processing
  • FIG. 11 shows a flow chart depicting the operation of a program for a DSP processor in accordance with a preferred embodiment of the present invention.
  • After the program starts at step 100, the program initializes registers and pointers as well as buffers (step 110). The program then waits for an interrupt from the sampling unit requesting processing of samples received from the array of sensors (step 120).
  • When the sampling unit sends an interrupt (step 131) indicating that the samples are ready, the program reads the sample values (step 130) and stores the values (step 140). The program filters the stored values using a routine implementing a tapped delay line and stores the filtered input values (step 141).
  • the program then retrieves the filtered input values (step 151) and main channel matrix coefficients (step 152) to generate a main channel (step 150) by multiplying the two and to store the result (step 160).
  • the program retrieves the filtered input values (step 171) and reference channel matrix coefficients (step 172) to generate a reference channel (reference channel #1) by multiplying the two (step 170) and to store the result (step 180). Steps 170 and 180 are repeated to generate the other reference channels.
  • the program retrieves one of the reference channels (step 201) and the decolorization filter coefficients for the corresponding reference channel (step 202) to generate a flat-frequency reference channel by multiplying the two (step 200) and stores the result (step 210). Steps 200 and 210 are repeated for all other reference channels (step 220).
  • the program retrieves one of the flat-frequency reference channels (step 231) and adaptive filter coefficients (step 232) to generate a canceling signal by multiplying the two (step 230) and to store the result (step 240). Steps 230 and 240 are repeated for all other reference channels to generate more canceling signals (step 250).
  • the program retrieves canceling signals (steps 262-263) to subtract them from the main channel (retrieved at step 261) to cancel the interference signal component in the main channel (step 260).
  • the output is sent to a D/A unit to reproduce the signal without interference in analog form (step 264).
  • the output is also stored (step 270).
  • the program calculates the power of a reference channel sample (step 281) and retrieves an old reference channel power average (step 282).
  • the program multiplies the sample power by α and the old power average by (1−α), sums them (step 280), and stores the result as a new power average (step 290).
  • This process is repeated for all other reference channels (step 300) and the total sum of power averages of all reference channels is stored (step 310).
  • the program multiplies the power of a main channel sample (retrieved at step 321) by α and an old main channel power average (retrieved at step 322) by (1−α), sums them (step 320) and stores the result as a new main channel power average (step 330).
  • the program then multiplies the main channel power average by a threshold to obtain a normalized main channel power average (step 340).
  • the program subtracts the total reference channel power average (retrieved at step 341) from the normalized main channel power average to produce a difference (step 350). If the difference is positive, the program goes back to step 120, where it simply waits for more samples.
  • Otherwise, the program enters a weight-updating routine.
  • the program calculates a new filter weight by adding [2 x adaptation constant x reference channel sample (retrieved at step 361) x output (retrieved at step 362)] to an old filter weight (retrieved at step 363) to update the weight (step 360) and stores the result (step 370).
  • the program performs the FFT of the new filter weights to obtain their frequency representation (step 380).
  • the frequency representation values are divided into several frequency bands and stored into a set of frequency bins (step 390).
  • the frequency representation values in each bin are compared with a threshold associated with each frequency bin (step 400). If the values exceed the threshold, the values are truncated to the threshold (step 410).
  • the program performs the IFFT to convert the truncated frequency representation values back to filter weight values (step 420) and stores them (step 430).
  • the program repeats the weight-updating routine, steps 360-430, for all other reference channels and associated adaptive filters (step 440).
  • the program then goes back to step 120 to wait for an interrupt for a new round of processing samples (step 450).
  • the microphone array of the present invention may be embodied as a digital super directional array™ (DSDA) 120 shown in Fig. 12A.
  • the DSDA 120 shown in Fig. 12A is formed of a substantially cylindrical housing or "wand" which is elongated in one direction with the microphone elements of the array arranged therein and aligned with slats or any other suitable spacing for allowing sound to be received by the microphones in the array.
  • the DSDA 120 may be incorporated into a keyboard 122 as shown in Fig. 12B, an automobile visor 124 as shown in Fig. 12C, an automobile mirror 126 as shown in Fig. 12D, a mouse 128 as shown in Fig. 12E or a video camera 130 as shown in Fig. 12F.
  • the keyboard 122 shown in Fig. 12B incorporates the DSDA 120 therein with the microphones aligned with slats or any other suitable spacing for allowing sound to be received by the microphone.
  • the DSDA processing may be performed by hardware implementing the adaptive beamforming technique of the present invention or may couple the microphone array signals to a computer (not shown) through the serial keyboard port, COM port, LPT port, USB port or other suitable means such as radio frequency or infra-red transmission.
  • the adaptive beamforming technique is performed by software installed in the computer.
  • the DSDA may be flush with the keyboard so as not to be distinguishable or may be formed within a raised portion which serves to position the DSDA closer to the computer user's mouth.
  • the raised portion may be elongated in a direction toward a position commensurate with a typical position of the computer user's mouth such as directly in front of and above the keyboard.
  • the DSDA may be housed in a boom or a wand which is either attached to the computer at one end fixedly or non-fixedly such as a hinge or may be coupled by a connecting wire.
  • the DSDA may be wirelessly coupled to the keyboard by any suitable wireless transmission means.
  • the keyboard may include a receiving platform such as a stand or a depression for receiving the DSDA, whereby the DSDA is removed by the user from the receiving platform and either spoken into like a hand-held microphone or placed by the user in any convenient location.
  • the DSDA array may be configured as a microphone at one or more corners of the keyboard including all four corners.
  • the DSDA may be configured as a plurality of microphones in any or all of the corners of the keyboard such as two or four microphones in each corner.
  • the DSDA may be embodied integrally with the keyboard, such as a flip-up style accessory built into the keyboard which otherwise is unnoticeable when concealed within the keyboard and, when tilted upward, advances beyond the surface of the keyboard to expose the microphone array.
  • Fig. 12B further illustrates the DSDA integrated into the bottom of the keyboard with the slats or other means for allowing sound to enter the microphones adjacent thereto to receive audio signals.
  • a small gap between the bottom of the keyboard and a supporting surface such as a desktop creates a pressure zone microphone effect between the downward-facing DSDA and the supporting surface, which minimizes acoustic reflections to achieve direct sound reception by the DSDA.
  • the DSDA is configured on the bottom of the keyboard, adjacent to or at the leading edge. The instant invention takes into account the tendency of digital processing to fail to remove acoustic reflections caused by audible sounds reflecting off surfaces in a room or other objects therein.
  • Fig. 12C illustrates a further embodiment of the DSDA which is housed in a substantially flat housing 134 having two substantially parallel sides elongated in one direction with one side including slats or other suitable means for allowing sound to be received by the microphones arranged adjacent thereto.
  • the DSDA has a substantially flat profile such that the DSDA may fit snugly between an upper surface of the visor 124 of the automobile and a ceiling of the interior of the automobile.
  • a holding member 136 may be included for holding the DSDA which includes a pair of opposed pincer-like arms 138 which receive and hold therein the DSDA. It is possible that each arm includes a distal portion formed such that a spacing between opposed distal portions is slightly less than a width of the DSDA and a spacing between opposed arms is substantially the same width as the DSDA.
  • the DSDA is inserted into said holding member by forcibly inserting the DSDA between said pair of opposed arms, causing the opposed arms to be slightly separated such that the DSDA slides therebetween, after which the DSDA is snapped in place in an area formed between the opposed arms.
  • the DSDA is slid into the area formed between the opposed arms from one side.
  • the DSDA housing may be formed with a longitudinal groove which meets a protruding portion of the housing member such as the distal portion of one or more of the arms or a nodule formed in said housing specifically constructed for meeting the groove for holding the DSDA within the housing.
  • the DSDA may be affixed to the visor, or any portion of the automobile for that matter, by use of suction cups (such as on the window or the dashboard), magnetic strips formed on the DSDA and the surface where the DSDA is to be mounted, or Velcro™, for example.
  • hooking members 132 substantially formed in a shape resembling clothespins may be provided for hooking the DSDA or holding member which holds the DSDA to the visor whereby the hooking members include a spacing between flexibly rigid opposed members which are curved to provide increasing resistance when spread apart.
  • one side of the hooking member engages an opening or slot 140 formed within the holding member while the opposite side of the hooking member engages the bottom of the visor such that the visor is sandwiched between the opposed sides of the hooking member firmly enough such that the visor can be swung open and closed without the hooking members losing the holding member or the DSDA held therein.
  • the DSDA coupled to the visor of an automobile creates a small air gap between the DSDA and the ceiling of the automobile thereby generating a pressure zone effect and minimizing acoustic reflections within the automobile.
  • Fig. 12D illustrates the DSDA integrated into a rear-view mirror 126 of an automobile.
  • the microphones of the array are spaced along the rim of the rear-view mirror.
  • this microphone arrangement has similar properties to the array shown in Fig. 26.
  • a flexible, tubular housing may be provided for housing the microphones in the array such that the tubular housing may be applied by the driver by fitting the tubular housing around the rear-view mirror for ease of installation.
  • the DSDA may be provided in a long wand-like member attached to the rear-view mirror 126.
  • the processing hardware for processing the sound received by the microphone array may be incorporated into the interior of the rear-view mirror or, alternatively, transmitted via wired or wireless transmission means to a remote processor within the automobile. It is within the scope of the invention to provide, as a separate processor or integrated with the processor for processing the audio signal, a processor for controlling automobile components such as the radio, car phone or global positioning satellite navigation system, for example, in accordance with the processed sound received by the DSDA.
  • Fig. 12E illustrates the DSDA integrated within a mouse 128 wherein the microphones of the microphone array are disposed adjacent slats or other means for suitably allowing sound to be received by the microphones.
  • the mouse is otherwise a standard mouse except for the DSDA.
  • additional features may be linked to the mouse keys which affect array performance, such as array volume, beam direction, setting an array type, array tuning, etc.
  • the mouse may be wired or wirelessly connected to the DSDA processing circuitry and/or a personal computer.
  • Fig. 12F illustrates the DSDA integrated with a video camera such as that used for video teleconferencing over, for example, the internet.
  • the DSDA may be incorporated as a peripheral to the video camera, coupled to the video camera electrically by coupling means such as a microphone plug and mechanically by, for example, a holding member or VelcroTM.
  • the DSDA in this embodiment may be incorporated into the video camera.
  • the DSDA may include wireless transmission to the video camera such that the video operator may place the DSDA in the vicinity of the talent, such as an actor or actress, and record the scene from a distance.
  • Fig. 12G illustrates the noise canceling stethoscope of the present invention which incorporates the DSDA.
  • the noise canceling stethoscope is incorporated herein by reference to U.S. Patent Serial No. 08/963,164 filed November
  • the present invention is applicable to medical applications including ultrasound for canceling noise when reading ultrasound vibrations echoed in a body and retrieved for reconstruction on a display of the portion of the body including, for example, ultrasound examinations for imaging fetuses. It will be readily recognized that removing noise in such medical applications, whether from sound received by the noise canceling stethoscope or from ultrasound, provides improved audio signals for the noise canceling stethoscope and improved imaging in ultrasound.
  • the DSDA or microphone may be incorporated in the noise canceling stethoscope or ultrasound device and/or the hardware/software for processing the audio to remove the noise may be incorporated in those devices as well.
  • the DSDA of the present invention is incorporated into the remote control and keypad for the set top box as illustrated in Fig. 12H. The remote control, as described in copending U.S. Appln. Ser. No.
  • UVI may be incorporated into the remote control to interface the speech signals received by the DSDA or the remote control may include the noise cancellation/reduction processing herein described.
  • the set top box may include the speech processing.
  • the set top box as described interfaces to a television to provide both operation of the television and external sources such as cable services, internet or other on-line service. To that end, the set top box may incorporate the processing ability for supporting internet access.
  • Fig. 13 illustrates the universal voice interface 142 which may embody the DSDA. It will be appreciated that any type of microphone may be incorporated as the UVI, including a dielectret, a stereo, unidirectional or multi-directional microphone.
  • the received audio signals are transferred from the universal voice interface by any appropriate means including wired or wireless transmission such as infrared or radio frequency transmission.
  • the universal voice interface may comprise the microphone by itself or include interface circuitry such as analog-to-digital converters and a multiplexer for interfacing into a computer processor.
  • the audio signals are received by any known communication port of a computer including the serial or parallel port or for that matter the USB port.
  • a device driver may be included for driving the processor 144, or the processor itself may strobe the appropriate port register for the audio signals converted into digital data.
  • the audio signals may be input to a sound card installed in the computer and then forwarded by the appropriate device driver to the processor 144.
  • either the device driver or the sound card may provide the processing circuitry or software for processing the audio signal to remove noise.
  • the audio signals are signal processed to remove the noise in accordance with the adaptive beam forming techniques described herein.
  • the audio processing may include noise cancellation in which the noise portion of the signal, extracted by a separate microphone or by spectral subtraction processing, is inverted and subtracted from the main reference signal.
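By way of a non-limiting sketch, the invert-and-subtract form of noise cancellation described above can be illustrated as follows. The signal values are hypothetical, as is the idealized assumption that the separate microphone captures only the noise:

```python
import numpy as np

# Hypothetical illustration (not the patent's exact algorithm): a main
# microphone picks up speech plus noise, while a separate reference
# microphone picks up (ideally) only the noise; inverting the extracted
# noise and adding it to the main signal cancels the noise portion.
rng = np.random.default_rng(0)
t = np.arange(1000) / 8000.0             # 8 kHz sample clock (assumed)
speech = np.sin(2 * np.pi * 440 * t)     # stand-in for the desired signal
noise = rng.normal(scale=0.5, size=t.size)

main_mic = speech + noise                # main reference signal
noise_mic = noise                        # noise extracted by a separate microphone

cleaned = main_mic + (-noise_mic)        # invert the noise and subtract it
assert np.allclose(cleaned, speech)
```

In practice the noise at the reference microphone differs from the noise in the main channel, which is why the adaptive filtering described elsewhere in this document is needed.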
  • Fig. 14 illustrates the universal voice interface circuitry.
  • the universal voice interface may be provided by software.
  • the universal voice interface is coupled to the DSDA incorporated around the rim of the rear-view mirror 126. It will be appreciated that other microphone arrangements may be incorporated with the universal voice interface circuitry such as those illustrated in Figs. 12A-F.
  • the universal voice interface circuitry may be incorporated into the microphone arrangement or coupled thereto by any appropriate transmission means including wired or wireless transmission.
  • the UVI interfaces the analog audio signals received by the microphone arrangement to a digital processor such as that found in a personal computer and, therefore, includes a series of analog-to-digital converters 146, each A/D converter in the series corresponding to a microphone in the DSDA.
  • a single analog-to-digital converter may be provided where, for example, a single microphone element is employed.
  • the A/D converters 146 are driven by a 44 kHz clock controlled by the microprocessor 150; but the clock may be of any clock speed corresponding to the processor speed of the system.
  • the A/D converters 146 output 16-bit samples; but the sample size may vary for different systems or applications.
  • the digitized samples are coupled to a multiplexer
  • the microprocessor 150 may include processing hardware/software for processing the audio samples in a format which agrees with the later digital speech processor.
  • the microprocessor 150 may provide the audio processing such as the adaptive beam forming techniques or noise cancellation techniques herein described. In this example, the microprocessor 150 is manufactured as an application-specific integrated circuit (ASIC); but the present invention may also be practiced with other IC structures.
  • the processed audio signal is forwarded to a digital speech processor, such as the speech recognition unit which recognizes and controls audio driven components.
  • Fig. 15 shows the adaptive beam forming technique of the present invention incorporated in the UVI.
  • the audio signals received by each microphone in the DSDA are received by the A/D converter 146 and, once digitally converted, are coupled to corresponding band pass filters 154 which act as digital samplers.
  • the A/D converter 146 may also act as a band pass filter whose filtering characteristic is controlled by the clock rate which operates the A/D converter.
  • a direction calculation unit 156 is provided for calculating the direction of an audio sound source which drives the band pass filters to steer the direction in which sound is primarily received in accordance with the direction calculated by the direction calculation unit 156.
  • the main channel matrix 158 is provided for receiving the signals in the main channel of the beam formed by the direction calculation unit 156 in accordance with weights provided from the direction calculation unit.
  • the reference channel matrix 160 is provided for receiving the audio signals which are substantially not within the beam formed by the direction calculation unit 156. It will be appreciated that the direction calculation unit is controlled by the system controller 172. Down converters 162, 164 are provided to down convert the signals received by the main and reference channels respectively.
  • the dedicated adaptive filter 166 adaptively processes the audio signals in accordance with the adaptive beam forming techniques described herein.
  • An arithmetic logic unit 168 is provided for subtracting from the main channel the adaptively formed reference channel noise as controlled by the system controller 172.
  • the resulting substantially noise-free signal is provided to a multiplexer 170 which multiplexes the noise-free signal with the main channel signal as controlled by the system controller 172.
  • the system controller 172 controls the multiplexer 170 to either select the main channel or the channel with noise removed.
  • the multiplexer is a four input multiplexer; however, any other equivalent means may be provided for selecting between the signals.
  • the multiplexed signal is output to a digital speech processor such as a speech recognition processor which recognizes speech and controls audio-driven components in response thereto.
  • Figs. 16 and 17 illustrate the operation of the universal voice interface.
  • In step 174 the system is reset.
  • Control advances to step 176 wherein the direction calculation unit is enabled.
  • a determination is made in step 178 whether the direction calculation is in error; if so, control advances to step 180 where a system alarm is raised. Otherwise, the direction calculation is correct and control advances to step 182 wherein a direction result is awaited.
  • control advances to step 184 wherein the main and reference channel weights are set for the respective main and reference channel matrices.
  • In step 186 the system awaits a ready signal and, upon receiving the ready signal, step 188 enables the down conversion. The operation is further described in Fig.
  • step 190 enables the dedicated adaptive filter. If a filter error is detected in step 192, an alarm is raised and the system is reset in step 196. In step 194, the filtered result is awaited and, upon receiving the filtered result, the arithmetic logic unit is enabled in step 198. If the ALU commits an error as determined by step 200, an alarm is raised and the system is reset in step 210. Otherwise, the multiplexer is enabled in step 212.
  • Fig. 18 shows the universal voice interface incorporated with a computer monitor wherein the microphones of the DSDA array are situated around the perimeter of the face of the monitor facing the computer user.
  • the UVI incorporates A/D converters 146 which digitize the audio signals received from the respective microphones in the array.
  • the A/D converters 146 are incorporated into the body of a standard personal computer plug which plugs into any standard type of personal computer port.
  • the plug is an RS-232 Interface/Parallel Port Plug which plugs into a corresponding RS-232/Parallel port.
  • while shown incorporated with a monitor, it is well within the scope of the present invention to incorporate the microphone arrangement, which includes either the microphone array or the several types of microphones described, in virtually any fixture or appliance such as those shown in Figs. 12A-F.
  • FIG. 19 shows one preferred embodiment of the present invention using sub-bands, wherein an adaptive filter is driven from the sub-bands rather than the entire bandwidth of the input signal.
  • Sub-bands result from partitioning a broader band in any manner as long as the sub-bands can be combined together so that the broader band can be reconstructed without distortions.
  • For perfect reconstruction structures, see P.P. Vaidyanathan, Quadrature Mirror Filter Banks, M-Band Extensions and Perfect-Reconstruction Techniques, IEEE ASSP Magazine, pp. 4-20, July 1987.
  • a broader band is partitioned into sub-bands, using several partitioning steps successively through intermediate bands.
  • Broadband inputs from an array of sensors, 191a-191d, are sampled at an appropriate sampling frequency and entered into a main-channel matrix 192 and a reference-channel matrix 193.
  • the main-channel matrix generates a main channel, a signal received in the main looking direction of the sensor array, which contains a target signal component and an interference component.
  • F1, 194, and F2, 195, are splitters which first split the main channel into two intermediate bands, followed by down-sampling by two. Down-sampling is a well-known procedure in digital signal processing.
  • Down-sampling by two is a process of sub-sampling by taking every other data point. Down-sampling is indicated by a downward arrow in the figure.
  • Splitters F3, 196 and F4, 197 further split the lower intermediate band into two sub-bands followed by down-sampling by two.
  • the result is a 0-4 kHz lower sub-band with 1/4 of the input sampling rate, a 4-8 kHz upper sub-band with 1/4 of the input sampling rate, and an 8-16 kHz upper intermediate band with 1/2 of the input sampling rate.
  • the reference channels are processed in the same way by filters F1, 198, and F2, 199, to provide only the lower sub-band with 1/4 of the input sampling rate, while the other sub-bands are discarded.
  • the lower sub-bands of the reference channels are fed into an adaptive filter 1910, which generates canceling signals approximating interferences present in the main channel.
  • a subtracter 1911 subtracts the canceling signals from the lower sub-band of the main channel to generate an output in the lower sub-band.
  • the output is fed back to the adaptive filter for updating the filter weights.
  • the adaptive filter processing and the subtraction is performed at the lower sampling rate appropriate for the lower sub-band.
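The adaptive filter 1910 and subtracter 1911 loop, running at the reduced sub-band rate with the output fed back to update the filter weights, can be sketched with a normalized LMS (NLMS) update. The tap count, step size, and noise path below are illustrative assumptions, not parameters taken from the patent:

```python
import numpy as np

# Minimal NLMS sketch of the adaptive-filter/subtracter loop operating at
# the reduced sub-band sampling rate (assumed parameters throughout).
rng = np.random.default_rng(1)
n, taps, mu = 4000, 8, 0.5
ref = rng.normal(size=n)                   # reference-channel lower sub-band (noise)
h = np.array([0.6, -0.3, 0.1])             # assumed noise path into the main channel
main = np.convolve(ref, h)[:n]             # main-channel lower sub-band (no target, for clarity)

w = np.zeros(taps)
out = np.zeros(n)
for i in range(taps - 1, n):
    x = ref[i - taps + 1:i + 1][::-1]      # most-recent-first tap vector
    out[i] = main[i] - w @ x               # subtract the canceling signal
    w += mu * out[i] * x / (x @ x + 1e-8)  # output fed back to update the weights

early = np.mean(out[taps - 1:500] ** 2)    # residual while still adapting
late = np.mean(out[-500:] ** 2)            # residual after convergence
assert late < early                        # interference is progressively canceled
```

Because the loop runs at 1/4 of the input sampling rate, the per-second cost of the weight updates is correspondingly reduced, which is the motivation for sub-band adaptation.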
  • the other upper bands of the main channel are delayed by delay units, 1912 and 1913, each by an appropriate time, to compensate for various delays caused by the different processing each sub-band is going through, and to synchronize them with the other sub-bands.
  • the delay units can be implemented by a series of registers or a programmable delay.
  • the output from the subtracter is combined with the other two sub-bands of the main channel through the reconstruction filters H1-H4, 1914-1917, to reconstruct a broadband output.
  • H1-H4 may be designed such that they together with F1-F4 provide a theoretically perfect reconstruction without any distortions.
  • Reconstructors H3 and H4 combine the lower and upper sub-bands into a low intermediate band, followed by an interpolation by two.
  • Interpolation is a well-known procedure in digital signal processing. Interpolation by two, for example, is an up-sampling process that increases the number of samples by inserting an interpolated sample between every pair of data points. Up-sampling is indicated by an upward arrow in the figure.
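The down-sampling-by-two and interpolation-by-two operations can be sketched as follows; the linear-interpolation fill is one simple choice of interpolator, not necessarily the one used in the preferred embodiment:

```python
import numpy as np

# Hedged sketch of interpolation by two: up-sample by inserting a new
# sample between every pair of data points (here by linear interpolation).
def interpolate_by_two(x):
    x = np.asarray(x, dtype=float)
    out = np.empty(2 * len(x) - 1)
    out[0::2] = x                          # keep the original samples
    out[1::2] = (x[:-1] + x[1:]) / 2.0     # fill interpolated samples in between
    return out

def downsample_by_two(x):
    return np.asarray(x)[::2]              # take every other data point

x = np.array([0.0, 2.0, 4.0, 6.0])
up = interpolate_by_two(x)                 # [0, 1, 2, 3, 4, 5, 6]
assert np.allclose(downsample_by_two(up), x)  # down-sampling undoes it
```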
  • the reconstructors H1, 1916 and H2, 1917 further combine the two intermediate bands into a broadband.
  • non-adaptive filter processing is performed in the upper bands of 4-16 kHz.
  • Adaptive filter processing is performed in the lower sub-band of 0-4 kHz where most of the interferences are located. Since there is little computation overhead involved in the non-adaptive filter processing, the use of non-adaptive filter processing in the upper sub-band can reduce the computational burden significantly. The result is superior performance without an expensive increase in the required hardware.
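The split-and-reconstruct property required of the F1-F4 and H1-H4 filter pairs can be demonstrated with the simplest perfect-reconstruction filter bank, the two-tap Haar pair. This is only a stand-in for the patent's actual filter designs, which are not specified here:

```python
import numpy as np

# Minimal two-band perfect-reconstruction sketch in the spirit of the
# F1/F2 analysis and H1/H2 synthesis pairs, using Haar filters (an
# illustrative assumption, not the patent's filters).
def analyze(x):
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2)   # lower band, down-sampled by two
    high = (x[0::2] - x[1::2]) / np.sqrt(2)  # upper band, down-sampled by two
    return low, high

def reconstruct(low, high):
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)      # up-sample and recombine
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.arange(8, dtype=float)
low, high = analyze(x)
assert np.allclose(reconstruct(low, high), x)  # reconstruction without distortion
```

Either band can be processed independently (adaptively in the lower band, non-adaptively in the upper) before reconstruction, which is the point of the sub-band structure.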
  • FIG. 20 shows another preferred embodiment using broadband processing with band-limited adaptation.
  • the embodiment uses broadband canceling signals which act on a broadband main channel. However, since adaptive filter processing is done in the low-frequency domain, the resulting low-frequency canceling signals are converted to broadband signals so that they can be subtracted from the broadband main channel.
  • broadband inputs from an array of sensors, 2021a-2021d are sampled at an appropriate sampling frequency and entered into a main-channel matrix 2022 and a reference-channel matrix 2023.
  • the main-channel matrix generates a main channel, a signal received in the main-looking direction, which has a target signal component and an interference component.
  • the reference-channel matrix generates reference channels representing interferences received from all other directions.
  • a low-pass filter 2025 filters the reference channels and down-samples them to provide low-frequency signals to an adaptive filter 2026.
  • the adaptive filter 2026 acts on these low-frequency signals to generate low-frequency canceling signals which estimate a low-frequency portion of the interference component of the main channel.
  • the low-frequency canceling signals are converted to broadband signals by an interpolator 2028 so that they can be subtracted from the main channel by a subtracter 2029 to produce a broadband output.
  • the broadband output is low-pass filtered and down-sampled by a filter 2024 to provide a low-frequency feedback signal to the adaptive filter 2026.
  • the main channel is delayed by a delay unit 2027 to synchronize it with the canceling signals from the adaptive filter 2026.
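The FIG. 20 pipeline (down-sample the reference, adapt at the low rate, interpolate the canceling signal back to broadband, then subtract) can be sketched end to end. The signal frequencies, decimation factor, and NLMS settings are illustrative assumptions, and the low-pass stage is idealized because the test interference is already band-limited:

```python
import numpy as np

# Structural sketch of broadband processing with band-limited adaptation:
# adapt at 1/4 of the broadband rate, interpolate the canceling signal,
# subtract from the broadband main channel (assumed parameters throughout).
fs, dec = 8000, 4
t = np.arange(4096) / fs
interference = np.sin(2 * np.pi * 100 * t)  # low-frequency interference
main = interference                         # broadband main channel (no target, for clarity)
ref = interference                          # reference channel observing the noise

ref_lo = ref[::dec]                         # low-pass assumed ideal; down-sample
main_lo = main[::dec]

taps, mu = 4, 0.5                           # NLMS adaptation at the low rate
w = np.zeros(taps)
cancel_lo = np.zeros(len(ref_lo))
for i in range(taps - 1, len(ref_lo)):
    x = ref_lo[i - taps + 1:i + 1][::-1]
    cancel_lo[i] = w @ x
    e = main_lo[i] - cancel_lo[i]           # low-frequency feedback signal
    w += mu * e * x / (x @ x + 1e-8)

# Interpolate the canceling signal to the broadband rate and subtract.
n = len(main)
cancel = np.interp(np.arange(n), np.arange(0, n, dec), cancel_lo)
output = main - cancel
assert np.mean(output[n // 2:] ** 2) < 0.01 * np.mean(main ** 2)
```

Only the adaptation runs at the reduced rate; the subtraction itself acts on the full broadband main channel, as in the figure.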
  • FIG. 21 shows yet another preferred embodiment similar to the previous embodiment except that an external main-channel generator is used instead of a main-channel matrix to obtain a broadband main channel.
  • This embodiment is useful when it is desired to take advantage of the broadband capabilities of commercially available hi-fi microphones.
  • a broadband input is obtained by using an external main-channel generator, such as a shotgun microphone 2143 or a parabolic dish 2144.
  • the broadband input is sampled through a high fidelity A-to-D converter 2145.
  • the sampling rate should preferably be high enough to maintain the broad bandwidth and the audio quality of the external main-channel generator.
  • a reference-channel matrix 2142 is used to obtain low-frequency reference channels representing interferences in the low-frequency domain. Since adaptive filter processing is done in the low-frequency domain, the reference-channel matrix does not need a broadband capability.
  • a subtracter 2150 is used to subtract canceling signals estimating interferences from the broadband input.
  • the broadband output is filtered by a low-pass filter 2146 which also performs down-sampling.
  • the low-pass filtered output and the low-frequency reference channels are provided to an adaptive filter 2147.
  • the adaptive filter acts on these low frequency signals to generate low-frequency canceling signals.
  • the broadband input is delayed by a delay unit 2148 so that it can be synchronized with the canceling signals from the adaptive filter 2147.
  • the delay unit can be implemented by a series of registers or by a programmable delay.
  • the low-frequency canceling signals are converted to broadband canceling signals by an interpolator 2149 so that they can be subtracted from the broadband main channel to produce the broadband output.
  • FIGS. 22A-22D are a flow chart depicting the operation of a program in accordance with the first preferred embodiment of the present invention using sub-band processing.
  • Upon starting at step 22100, the program initializes registers and pointers as well as buffers (steps 22110-22120).
  • When a sampling unit sends an interrupt (step 22131) indicating that samples are ready, the program reads the sample values (step 22130) and stores them in memory (step 22140).
  • the program retrieves the input values (step 22151) and main-channel matrix coefficients (step 22152) to generate a main channel by filtering the input values using the coefficients (step 22150), and then stores the result in memory (step 22160).
  • the program retrieves the input values (step 22171) and reference-channel matrix coefficients (step 22172) to generate a reference channel by filtering the input values using the coefficients (step 22170), and then stores the result (step 22180). Steps 22170 and 22180 are repeated to generate all other reference channels (step 22190).
  • the program retrieves the main channel (step 22201) and the F1 filter coefficients (step 22202) to generate a lower intermediate band with 1/2 of the sampling rate appropriate for the whole main channel by filtering the main channel with the coefficients and down-sampling the filtered output (step 22210), and then stores the result (step 22220).
  • the F2 filter coefficients are used to generate an upper intermediate band with 1/2 of the sampling rate (step 22240).
  • the F3 and F4 filter coefficients are used to further generate a lower sub-band with 1/4 of the sampling rate (step 22260) and an upper sub-band with 1/4 of the sampling rate (step 22280).
  • the program retrieves one of the reference channels (step 22291) and the F1 filter coefficients (step 22292) to generate an intermediate band with 1/2 of the sampling rate by filtering the reference channel with the coefficients and down-sampling the filtered output (step 22290), and then stores the result (step 22300). Similarly, the F2 filter coefficients are used to generate a lower sub-band with 1/4 of the sampling rate (step 22320). Steps 22290-22320 are repeated for all the other reference channels (step 22330).
  • the program retrieves the reference channels (step 22341) and the main channel (step 22342) to generate canceling signals using an adaptive beamforming process routine (step 22340). The program subtracts the canceling signals from the main channel to cancel the interference component in the main channel (step 22350).
  • the program then interpolates the output from the adaptive beamforming process routine (step 22360) and filters the output with the H3 filter coefficients (step 22361) to obtain an up-sampled version (step 22370).
  • the program also interpolates the main channel in the lower band (step 22380) and filters it with the H4 filter coefficients (step 22381) to obtain an up-sampled version (step 22390).
  • the program combines the up-sampled versions to obtain a lower intermediate main channel (step 22400).
  • the program interpolates the lower intermediate main channel and filters it with the H1 filter coefficients (step 22420) to obtain an up-sampled version.
  • the program also interpolates the upper intermediate main channel (step 22430) and filters it with the H2 filter coefficients (step 22431) to obtain an up-sampled version (step 22440).
  • the program combines the up-sampled versions to obtain a broadband output (step 22450).
  • FIGS. 23A-C are a flow chart depicting the operation of a program in accordance with the second preferred embodiment of the present invention using broadband processing with frequency-limited adaptation.
  • Upon starting at step 23500, the program initializes registers and pointers as well as buffers (steps 23510-23520). When a sampling unit sends an interrupt indicating that samples are ready, the program reads the sample values (step 23530), and stores them in memory (step 23540).
  • the program retrieves the broadband sample values (step 23551) and the main-channel matrix coefficients (step 23552) to generate a broadband main channel by filtering the broadband sample values with the coefficients (step 23550), and then stores the result in memory (step 23560).
  • the program retrieves the broadband samples (step 23571) and reference-channel matrix coefficients (step 23572) to generate a broadband reference channel by filtering the samples using the coefficients (step 23570), and then stores the result (step 23580). Steps 23570 and 23580 are repeated to generate all the other reference channels (step 23590).
  • the program retrieves the reference channels (step 23601) which are down-sampled (step 23602), the main channel (step 23603) which is also down-sampled to the low sampling rate (step 23604), and the low-frequency output (step 23605) to generate a low-frequency canceling signal (step 23600) using an adaptive beamforming process routine.
  • the program updates the adaptive filter weights (step 23610) and interpolates the low-frequency canceling signal to generate a broadband canceling signal (step 23620). Steps 23610-23620 are repeated for all the other reference channels (step 23630).
  • the program subtracts the canceling signals from the main channel to cancel the interference component in the main channel (step 23640).
  • the program low-pass filters and interpolates the broadband output (step 23650) so that the low-frequency output can be fed back to update the adaptive filter weights.
  • FIGS. 24A-24C are a flow chart depicting the operation of a program in accordance with the third preferred embodiment of the present invention using broadband processing with an external main-channel generator.
  • Upon starting at step 24700, the program initializes registers and pointers as well as buffers (steps 24710-24720). When a sampling unit sends an interrupt (step 24731) that samples are ready, the program reads the sample values (step 24730), and stores them in memory (step 24740).
  • the program then reads a broadband input from the external main-channel generator (step 24750), and stores it as a main channel (step 24760).
  • the program retrieves the low-frequency input (step 24771) and reference-channel matrix coefficients (step 24772) to generate a reference channel by multiplying the two (step 24770), and then stores the result (step 24780). Steps 24770 and 24780 are repeated to generate all the other reference channels (step 24790).
  • the program retrieves the low-frequency reference channels (step 24801), the main channel (step 24802) which is down-sampled (step 24803), and a low-frequency output (step 24604) to generate low-frequency canceling signals (step 24600) using an adaptive beamforming process routine.
  • the program updates the adaptive filter weights (step 24810) and interpolates the low-frequency canceling signal to generate the broadband canceling signal (step 24820). Steps 24810-24820 are repeated for all the other reference channels (step 24830). The program subtracts the broadband canceling signals from the broadband main channel to generate the broadband output with substantially reduced interferences (step 24840).
  • the program low-pass filters and interpolates the broadband output (step 24850) so that the low-frequency output can be fed back to update the adaptive filter weights.
  • FIG. 25 shows the functional blocks of a preferred embodiment in accordance with the present invention.
  • the embodiment deals with finding the direction of a sound source, but the invention is not limited to such. It will be understood to those skilled in the art that the invention can be readily used for finding the direction of other wave sources such as an electromagnetic wave source.
  • the system includes an array of microphones 2501 that sense or measure sound from a particular sound source and that produce analog signals 2507 representing the measured sound. The analog signals 2507 are then sampled and converted to corresponding digital signals 2508 by an analog-to-digital (A-to-D) converter 2502.
  • the digital signals 2508 are filtered by a band-pass filter 2503 so that the filtered signals 2509 contain only the frequencies in a specific bandwidth of interest for the purpose of determining the direction of the sound source.
  • the filtered signals 2509 are then fed into an approximate-direction finder 2504 which calculates an approximate direction 2510 in terms of a microphone pair selected among the microphones.
  • the precise-direction finder 2505 estimates the precise direction 2511 of the sound source based on the approximate direction.
  • the validity of the precise-direction 2511 is checked by a measurement qualification unit 2506, which invalidates the precise direction if it does not satisfy a set of measurement criteria.
  • FIG. 26 shows an example of the array of microphones 1 that may be used in accordance with the present invention.
  • the microphones sense or measure the incident sound waves from a sound source and generate electronic signals (analog signals) representing the sound.
  • the microphones may be omni, cardioid, or dipole microphones, or any combinations of such microphones.
  • the example shows a cylindrical structure 2621 with six microphones
  • the microphone array may take on a variety of different geometries such as a linear array or a rectangular array.
  • the analog signals representing the sound sensed or measured by the microphones are converted to digital signals by the A-to-D converter 2502, which samples the analog signals at an appropriate sampling frequency.
  • the converter may employ a well-known technique of sigma-delta sampling, which consists of oversampling and built-in low-pass filtering followed by decimation to avoid aliasing, a phenomenon due to inadequate sampling.
  • When an analog signal is sampled, the sampling process creates a mirror representation of the original frequencies of the analog signal around the frequencies that are multiples of the sampling frequency. "Aliasing" refers to the situation where the analog signal contains information at frequencies above one half of the sampling frequency so that the reflected frequencies cross over the original frequencies, thereby distorting the original signal. In order to avoid aliasing, an analog signal should be sampled at a rate at least twice its maximum frequency component; this minimum rate is known as the Nyquist rate.
  • a sampling frequency far greater than the Nyquist rate is used to avoid aliasing problems with system noise and less-than-ideal filter responses.
  • This oversampling is followed by low-pass filtering to cut off the frequency components above the maximum frequency component of the original analog signal.
  • If the oversampling frequency is n times the Nyquist rate, the rate of the digital signal after oversampling must be reduced by decimation, which keeps one sample out of every n input samples.
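The decimation step described above can be sketched as follows. This is a minimal illustration (the function name and the omission of the preceding low-pass filter are the author's assumptions, not taken from the patent); the text elsewhere uses n = 5, reducing 64 kHz samples to 12.8 kHz.

```c
#include <stddef.h>

/* Decimate by factor n: keep one sample out of every n input samples.
   A real implementation low-pass filters before decimating to avoid
   aliasing; here the filtering stage is assumed to have run already. */
size_t decimate(const double *in, size_t len, int n, double *out)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i += (size_t)n)
        out[count++] = in[i];
    return count;  /* number of output samples */
}
```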
  • the purpose of the bandpass filter 2503 is to filter the signals sensed or measured by the microphones so that the filtered signals contain those frequencies optimal for detecting or determining the direction of the signals. Signals of too low a frequency do not produce enough phase difference at the microphones to accurately detect the direction. Signals of too high a frequency have less signal energy and are thus more subject to noise. By suppressing signals of the extreme high and low frequencies, the bandpass filter 2503 passes those signals of a specific bandwidth that can be further processed to detect or determine the direction of the sound source. The specific values of the bandwidth depend on the type of target wave source. If the source is a human speaker, the bandwidth may be between 300 Hz and 1500 Hz, where typical speech signals have most of their energy concentrated.
  • the bandwidth may also be changed by a calibration process, a trial-and-error process. Instead of using a fixed bandwidth during the operation, initially a certain bandwidth is tried. If too many measurement errors result, the bandwidth is adjusted to decrease the measurement errors so as to arrive at the optimal bandwidth.
  • the system first finds the approximate direction of the sound source, without the burden of heavy computation, and subsequently calculates the precise direction by using more computation power.
  • the approximate direction is also used to determine the subset of microphones that are relevant to subsequent refinement of the approximate direction.
  • some of the microphones may not have a line of sight to the source, and thus may create phase errors if they participate in further refinement of the approximate direction. Therefore, a subset of microphones is selected that is relevant to further refinement of the source direction.
  • FIG. 27 shows the approximate and exact direction finding units 2721, 2722.
  • FIG. 28 shows the approximate-direction finder 2821 in detail. It is based on the idea of specifying the approximate direction of the sound source in terms of a direction perpendicular to a pair of microphones. Peripheral microphone pairs are pairs of microphones located adjacent to each other around the periphery of the structure holding the microphones; a microphone located at the center of the structure, if any, is excluded. For each peripheral microphone pair, the "pair direction" is defined as the direction in the horizontal plane, pointing from the center of the pair outward from the structure, perpendicular to the line connecting the pair. The "sector direction" is then defined as the pair direction closest to the source direction, selected among the possible pair directions. If there are n pairs of peripheral microphones, there are n candidates for the sector direction.
  • the sector direction corresponding to the sound source is determined using a zero-delay cross-correlation.
  • a correlation calculator 2831 calculates a zero-delay cross-correlation of the two signals received from the microphone pair, X_i(t) and X_j(t). It is known to those skilled in the art that such a zero-delay cross-correlation function, R_ij(0), over a time period T can be defined by the following formula:

    R_ij(0) = ∫₀ᵀ X_i(t) X_j(t) dt
  • a correlation calculator is well-known to those skilled in the art and may be available as an integrated circuit. Otherwise, it is well-known that such a correlation calculator can be built using discrete electronic components such as multipliers, adders, and shift registers.
  • block 2832 finds the sector direction by selecting the microphone pair that produces the maximum correlation. Since signals having the same or similar phase are correlated with each other, the result is to find the pair with the same phase (equi-phase) or having the least phase difference. Since the plane of equi-phase is perpendicular to the propagation direction of the sound wave, the pair direction of the maximum-correlation pair is, then, the sector direction, i.e., the pair direction closest to the source direction.
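The zero-delay cross-correlation and the maximum-correlation pair selection described above can be sketched in C as follows. The function names and the pair-indexing convention (adjacent entries form peripheral pairs, wrapping around) are illustrative assumptions, not the patent's own code.

```c
#include <stddef.h>

/* Zero-delay cross-correlation over n samples:
   R_ij(0) = sum over t of x_i(t) * x_j(t). */
double zero_delay_xcorr(const double *xi, const double *xj, size_t n)
{
    double r = 0.0;
    for (size_t t = 0; t < n; t++)
        r += xi[t] * xj[t];
    return r;
}

/* Select the peripheral pair with the maximum zero-delay correlation.
   mics holds n_mics pointers to sample buffers; adjacent entries form
   the peripheral pairs (wrapping around).  Returns the index of the
   first microphone of the winning pair, i.e., the sector direction. */
int max_corr_pair(const double **mics, int n_mics, size_t n_samples)
{
    int best = 0;
    double best_r = -1e300;
    for (int i = 0; i < n_mics; i++) {
        double r = zero_delay_xcorr(mics[i], mics[(i + 1) % n_mics], n_samples);
        if (r > best_r) { best_r = r; best = i; }
    }
    return best;
}
```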
  • block 2833 identifies the microphones that participate in further refinement of the approximate direction.
  • "Sector" is defined as the subset of the microphones in the microphone array, which participate in calculating the precise direction of the sound source. For example, where some of the microphones in the array are blocked by a mechanical structure, the signals received by those microphones are not likely to be from direct-travelling waves, and thus such microphones should be excluded from the sector.
  • the sector includes the maximum-correlation peripheral microphone pair, another peripheral microphone adjacent to the pair, and a center microphone, if any.
  • Of the two peripheral microphones adjacent to the maximum-correlation peripheral microphone pair, the one with the higher zero-delay cross-correlation is selected.
  • the inclusion of the center microphone is optional, but the inclusion helps to improve the accuracy of the source direction, because otherwise three adjacent microphones would be arranged almost in a straight line.
  • There may be other ways of selecting the microphones to be included in the sector and the information about such selection schemes may be stored in computer memory for an easy retrieval during the operation of the system.
  • the precise-direction finder 5 calculates the precise direction of the sound source using a full cross-correlation.
  • Block 2941 first identifies all possible combinations of microphone pairs within the sector. For each microphone pair identified, block 2942 calculates a full cross-correlation, R_ij(τ), over a time period T using the following formula, well known to those skilled in the art:

    R_ij(τ) = ∫₀ᵀ X_i(t) X_j(t + τ) dt
  • a correlation calculator is well-known to those skilled in the art and may be available as an integrated circuit. Otherwise, it is well-known that such a correlation calculator can be built using discrete electronic components such as multipliers, adders, and shift registers.
  • R_ij(τ) can be plotted as a cross-correlation curve.
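The full cross-correlation can be evaluated on sampled data and its peak located by scanning the candidate lags, as sketched below. The function name and the lag-scanning loop are illustrative assumptions; the patent itself only specifies the correlation formula and that the maximum is found.

```c
#include <stddef.h>

/* Discrete full cross-correlation R_ij(lag) = sum_t x_i(t) * x_j(t + lag),
   evaluated over the overlapping region for lag in [-max_lag, max_lag].
   Returns the integer lag at which the correlation peaks. */
int peak_lag(const double *xi, const double *xj, size_t n, int max_lag)
{
    int best_lag = 0;
    double best_r = -1e300;
    for (int lag = -max_lag; lag <= max_lag; lag++) {
        double r = 0.0;
        for (size_t t = 0; t < n; t++) {
            long u = (long)t + lag;
            if (u >= 0 && u < (long)n)
                r += xi[t] * xj[u];
        }
        if (r > best_r) { best_r = r; best_lag = lag; }
    }
    return best_lag;
}
```

A signal delayed by two samples relative to its pair produces a peak at lag 2, which is the measured delay fed into the direction calculation.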
  • the time delay between any two sensors is equal to the projection of the distance vector between them along the K vector divided by the sound velocity.
  • The T_d vector can be expressed as follows:

    T_d = (−R · K) / c

    where c is the speed of sound and R denotes the matrix representing the geometry of the microphone array in terms of position differences among the microphones.
  • the B matrix, which inverts this linear relation to recover K from the measured delays, depends only on the geometry of the microphone array, and thus can be computed off-line, without burdening the computation requirement during the direction determination.
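The exact definition of B is in a figure omitted from this extraction; assuming B is the least-squares (Moore-Penrose) inverse of R, a standard choice for an overdetermined delay system, the relation can be written:

```latex
% Delay model: T_d collects the pairwise delays, R the position
% differences, K the propagation vector, c the speed of sound.
T_d = -\frac{1}{c}\, R\, K
\qquad\Longrightarrow\qquad
K = -c\,\underbrace{(R^{\top}R)^{-1}R^{\top}}_{B}\; T_d .
% B depends only on the microphone geometry, so it can be
% precomputed off-line, as the text notes.
```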
  • Block 2946 converts K into polar coordinates.
  • FIG. 5 shows the 3- dimensional coordinate system used in the present invention.
  • An azimuth angle, θ, is defined as the angle of the source direction in the horizontal plane, measured clockwise from a reference horizontal direction (e.g. the x-axis).
  • An elevation angle, ⁇ is defined as the vertical angle of the source direction measured from the vertical axis (z-axis).
  • Block 2946 calculates θ and φ from K_x, K_y, and K_z by converting the Cartesian components of K into polar coordinates.
  • the algorithm can function even when the microphones are arranged in a 2-dimensional arrangement and is still capable of resolving both the azimuth and the elevation.
  • the purpose of the measurement qualification unit 2505 is to evaluate the soundness or validity of the precise direction using a variety of measurement criteria and to invalidate the measurements if the criteria are not satisfied.
  • FIGS. 31a, 31b, 31c, and 31d show different embodiments of the measurement qualification unit, each using a different measurement criterion. These embodiments may be used individually or in any combination.
  • FIG. 31a shows a first embodiment of the qualification unit that uses a signal-to-noise ratio (SNR) as a measurement criterion.
  • the SNR is defined as a ratio of a signal power to a noise power.
  • the measured signals are divided into blocks of signals having a predetermined period such as 40 milliseconds.
  • Block 3161 calculates the signal power for each signal block by calculating the square-sum of the sampled signals within the block.
  • the noise power can be measured in many ways, but one convenient way of measuring the noise power may be to pick the signal power of the signal block having the minimum signal power and to use it as the noise power.
  • Block 3162 selects the signal block having the minimum power over a predetermined interval such as 2 seconds.
  • Block 3163 calculates the SNR as the ratio of the signal power of the current block to that of the noise power.
  • Block 3164 invalidates the precise direction if the SNR is below a certain threshold.
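The SNR qualification steps above (blocks 3161 through 3164) can be sketched as follows; the function names, and using a history of block powers to stand in for the 2-second minimum search, are illustrative assumptions.

```c
#include <stddef.h>

/* Power of one signal block: the square-sum of its samples (block 3161). */
double block_power(const double *x, size_t n)
{
    double p = 0.0;
    for (size_t i = 0; i < n; i++)
        p += x[i] * x[i];
    return p;
}

/* SNR check: the noise power is the minimum block power over the
   recent history (block 3162); the measurement is invalidated when
   signal/noise falls below the threshold (blocks 3163-3164). */
int snr_valid(const double *block_powers, size_t n_blocks,
              double current_power, double threshold)
{
    double noise = block_powers[0];
    for (size_t i = 1; i < n_blocks; i++)
        if (block_powers[i] < noise)
            noise = block_powers[i];
    return (current_power / noise) >= threshold;  /* 1 = valid */
}
```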
  • FIG. 31b shows a second embodiment of the measurement qualification unit that uses a spread (range of distribution) of individual measured delays as a measurement criterion.
  • the precise source direction calculated by the precise-direction finder represents an average direction among the individual directions measured by the microphone pairs in the sector. Since delays are directly related to direction angles, the spread of the individual measured delays with respect to the individual estimated delays indicates how widely the individual directions vary with respect to the precise direction. Thus, the spread gives a good indication as to the validity of the measurements. For example, if the individual measured delays are too widely spread, it is likely to indicate some kind of measurement error.
  • T_e is defined as the vector of individual estimated delays τ_e corresponding to the precise direction, K.
  • Block 3171 calculates T_e from K based on the linear relation between K and T_e:

    T_e = (−R · K) / c

    where R denotes the position-difference matrix representing the geometry of the microphone array.
  • If the spread exceeds a threshold, block 3173 invalidates the precise source direction.
  • the spread can be calculated directly from the individual measured and estimated delays as the sum of squared differences:

    spread = Σᵢ (τ_m,i − τ_e,i)²

    where τ_m,i and τ_e,i are the measured and estimated delays for the i-th microphone pair.
  • FIG. 31c shows a third embodiment of the measurement qualification unit that uses the azimuth angle, ⁇ , as a measurement criterion. If ⁇ deviates significantly from the sector direction (the approximate source direction), it is likely to indicate that the precise direction is false. Therefore, if ⁇ is not within a permissible range of angles (e.g. within +/- 60 degrees) of the sector direction, the precise direction is invalidated.
  • the above embodiments can be used selectively or combined to produce a single quality figure of measurement, Q, which may be sent to a target system such as a controller for a videoconferencing system. For example, Q may be set to 0 if any of the error conditions above occurs and set to the SNR otherwise.
  • the direction finding system of the present invention can be used in combination with a directional microphone system, which may include an adaptive filter.
  • the present invention is not limited to a particular kind of adaptive filter.
  • the adaptive filter may include weight constraining means for truncating updated filter weight values to predetermined threshold values when an updated filter weight value exceeds its corresponding threshold value.
  • the adaptive filter may further include inhibiting means for estimating the power of the main channel and the power of the reference channels and for generating an inhibit signal to the weight updating means based on normalized power difference between the main channel and the reference channels.
  • the weight constraining means may include a frequency-selective weight-control unit, which includes a Fast Fourier Transform (FFT) unit for receiving adaptive filter weights and performing the FFT of the filter weights to obtain frequency representation values, a set of frequency bins for storing the frequency representation values divided into a set of frequency bands, a set of truncating units for comparing the frequency representation values with a threshold assigned to each bin and for truncating the values if they exceed the threshold, a set of storage cells for temporarily storing the truncated values, and an Inverse Fast Fourier Transform (IFFT) unit for converting them back to the adaptive filter weights.
  • the adaptive filter in the directional microphone may also employ dual-processing interference canceling system where adaptive filter processing is used for a subset of a frequency range and fixed filter processing is used for another subset of the frequency range.
  • adaptive filter processing portion of the dual processing may also employ the adaptive filter processing disclosed in applicant's commonly assigned and co-pending U.S. patent application Serial No. 08/672,899, filed
  • FIGS. 32A-32D show a flow chart depicting the operation of a program in accordance with a preferred embodiment of the present invention.
  • the program uses measurement flags to indicate various error conditions.
  • When the program starts (step 32100), it resets the system (step 32101) by resetting system variables, including various measurement flags used for indicating error conditions.
  • the program then reads into registers microphone inputs sampled at the sampling frequency of 64 KHz (step 32102), which is oversampling over the Nyquist rate. As mentioned in Section 5.2, oversampling allows anti-aliasing filters to be realized with a much gentler cut-off characteristic.
  • Upon reading every 5 samples (step 32103), the program performs a low-pass filter operation and a decimation by taking one sample out of every 5 samples for each microphone (step 32104). The decimated samples are stored in the registers (step 32105).
  • the program performs a bandpass filter operation on the decimated samples so that the output contains frequencies ranging from 1.5 to 2.5 KHz (step 32106).
  • the output is stored in input memory (step 32107).
  • the program repeats the above procedure until 512 new samples are obtained (step 32108). When the 512 new samples are reached, the program takes each pair of adjacent microphones, multiplies the received signals, and adds the products to obtain the zero-delay cross-correlation (step 32200); the results are stored (step 32206). The calculation of the zero-delay cross-correlation is repeated for all adjacent microphone pairs, not involving the center microphone (step 32201).
  • the microphone pair having the highest zero-delay cross-correlation is selected (step 32202) and the value is stored as the signal power (step 32207), which will be used later.
  • for the two peripheral microphones adjacent to the selected pair, the program calculates the zero-delay cross-correlation (step 32203) and the microphone having the higher correlation is selected (step 32204).
  • the program determines the sector by including the selected microphone pair, the neighboring microphone selected, and the center microphone, if there is one.
  • the program calculates the average power of the 512 samples taken from the center microphone (step 32300).
  • the lowest average energy during the latest 2 seconds is set to be the noise power (steps 32301-32305).
  • the program calculates the full cross-correlation of signals received by each microphone pair in the sector (step 32306).
  • the program finds the peak cross-correlation delay, τ_s, where the correlation is maximum (step 32307). τ_s lies on a sampling point, but the actual maximum-correlation delay, τ_d, may occur between two sampling points. If τ_s is either the maximum or minimum possible delay (step 32308), τ_d is set to τ_s (step 32309). Otherwise, the program finds the actual maximum-correlation delay using the parabolic interpolation formula described in Section 5.4.1 (steps 32310-32312). The above steps are repeated for all the microphone pairs in the sector (step 32313).
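Section 5.4.1's exact formula is not reproduced in this extraction; the standard three-point parabolic fit, shown below as an assumed stand-in, estimates the fractional delay between sampling points from the correlation values around the sampled peak.

```c
/* Parabolic interpolation of the true correlation peak: given the
   correlation at the sampled peak (r0) and at its two neighbors
   (rm at lag-1, rp at lag+1), fit a parabola through the three
   points and return the fractional offset of its vertex from the
   sampled lag.  tau_d = tau_s + offset * sampling_period. */
double parabolic_offset(double rm, double r0, double rp)
{
    double denom = rm - 2.0 * r0 + rp;
    if (denom == 0.0)
        return 0.0;  /* flat top: keep the sampled lag */
    return 0.5 * (rm - rp) / denom;
}
```

A symmetric peak yields offset 0, and an asymmetric one shifts the estimate toward the larger neighbor.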
  • the program then calculates the azimuth angle, ⁇ , and the elevation angle, ⁇ , corresponding to the direction vector obtained (step 32401).
  • the program calculates the SNR as the ratio of the signal power and the noise power (step 32402). If the SNR falls below a threshold (step 32403), the program raises the SNR Flag (step 32404).
  • the program then evaluates the elevation angle, ⁇ . If ⁇ is not within a permissible range of angles (e.g. from 30° to 150°) (step 32405), the Elevation Flag is raised (step 32406).
  • the program calculates the corresponding individual estimated delays from the precise direction (step 32407).
  • The program calculates a delay spread as the sum of squares of the differences between the individual measured delays and the individual estimated delays (step 32408). If the delay spread exceeds a certain threshold (step 32409), the Delay Spread Flag is raised (step 32410).
  • the program calculates the quality figure of measurement, Q, as a combination of all or part of the measurement criteria above (step 32411). For example, Q may be set to 0 if any of the measurement flags was raised and set to the SNR otherwise.
  • the program transfers ⁇ , ⁇ , and Q to a target system, such as an automatic camera tracking system used in a video conferencing application (step 32412).
  • the program resets the measurement flags (step 32413) and goes back to the beginning of the program (step 32414).
  • Figure 33A illustrates an embodiment of the present invention 33100.
  • the system receives a digital audio signal at input 33102 sampled at a frequency which is at least twice the bandwidth of the audio signal.
  • the signal is derived from a microphone signal that has been processed through an analog front end.
  • Alternatively, the input is taken from the output of a beamformer or even an adaptive beamformer. In that case the signal has been processed to eliminate noises arriving from directions other than the desired one, leaving mainly noises originating from the desired direction.
  • the input signal can be obtained from a sound board when the processing is implemented on a PC processor or similar computer processor.
  • the input samples are stored in a temporary buffer 33104 of 256 points.
  • the new 256 points are combined in a combiner 33106 with the previous 256 points to provide 512 input points.
  • the 512 input points are multiplied by multiplier 33108 with a shading window with the length of 512 points.
  • the shading window contains coefficients that are multiplied with the input data accordingly.
  • the shading window can be a Hanning window or another window, and it serves two goals: the first is to smooth the transients between two processed blocks (together with the overlap process); the second is to reduce the side lobes in the frequency domain and hence prevent the masking of low-energy tonals by high-energy side lobes.
  • the shaded results are converted to the frequency domain through an FFT (Fast Fourier Transform) processor 33110.
  • the FFT output is a complex vector of 256 significant points (the other 256 points are an anti-symmetric replica of the first 256 points).
  • the points are processed in the noise processing block 33112, which includes the noise magnitude estimation for each frequency bin, the subtraction process that estimates the noise-free complex value for each frequency bin, and the residual noise reduction process.
  • the noise-processed frequency points are converted back to the time domain through an IFFT (Inverse Fast Fourier Transform). The first 256 time domain points are summed by the summer 33116 with the last 256 data points of the previous block to compensate for the input overlap and shading process, and are output at output terminal 33118. The remaining 256 points are saved for the next iteration.
  • Figure 33B is a detailed description of the noise processing block 33200.
  • the magnitude of each frequency bin (n) 33202 is estimated.
  • the straightforward approach is to estimate the magnitude by calculating:

    Y(n) = √( Re²(n) + Im²(n) )
  • the present invention implements a 2D smoothing process.
  • Each bin is replaced with the average of its value and the two neighboring bins' value (of the same time frame) by a first averager 33206.
  • the smoothed value of each smoothed bin is further smoothed by a second averager 33208 using a time exponential average with a time constant of 0.7 (which is the equivalent of averaging over 3 time frames).
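The 2D smoothing (first averager 33206 over neighboring bins, second averager 33208 over time) can be sketched as follows. Two assumptions are labeled in the comments: edge bins are clamped rather than wrapped, and the "time constant of 0.7" is read as the weight on the previous frame; the patent text does not pin down either detail.

```c
/* 2D smoothing of the magnitude spectrum: each bin is first averaged
   with its two spectral neighbors (edge bins clamped - an assumption),
   then exponentially averaged over time, with 0.7 assumed to be the
   weight on the previous frame (roughly a 3-frame memory).  `prev`
   holds the previous frame's smoothed spectrum, updated in place. */
void smooth2d(const double *mag, double *prev, double *out, int nbins)
{
    const double alpha = 0.7;
    for (int n = 0; n < nbins; n++) {
        int lo = (n > 0) ? n - 1 : n;
        int hi = (n < nbins - 1) ? n + 1 : n;
        double freq_avg = (mag[lo] + mag[n] + mag[hi]) / 3.0;
        out[n] = alpha * prev[n] + (1.0 - alpha) * freq_avg;
        prev[n] = out[n];
    }
}
```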
  • the 2D-smoothed value is then used by two processes - the noise estimation process by noise estimation processor 33212 and the subtraction process by subtractor 33210.
  • the noise estimation process estimates the noise at each frequency bin and the result is used by the noise subtraction process.
  • the output of the noise subtraction is fed into a residual noise reduction processor 33216 to further reduce the noise.
  • the time domain signal is also used by the residual noise processor 33216 to determine the speech free segments.
  • the noise free signal is moved to the IFFT process to obtain the time domain output 33218.
  • FIG. 33C is a detailed description of the noise estimation processor.
  • the noise should be estimated by taking a long time average of the signal magnitude (Y) of non-speech time intervals.
  • a voice switch may be used to detect the speech/non-speech intervals.
  • too sensitive a switch may result in the use of a speech signal for the noise estimation, which will distort the voice signal.
  • a less sensitive switch may dramatically reduce the length of the noise time intervals (especially in continuous speech cases) and degrade the validity of the noise estimation.
  • a separate adaptive threshold is implemented for each frequency bin 33302. This allows the location of noise elements for each bin separately without the examination of the overall signal energy.
  • the logic behind this method is that, for each syllable, the energy may appear at different frequency bands. At the same time, other frequency bands may contain noise elements. It is therefore possible to apply a non-sensitive threshold for the noise and yet locate many non-speech data points for each bin, even within a continuous speech case.
  • the advantage of this method is that it allows the collection of many noise segments for a good and stable estimation of the noise, even within continuous speech segments.
  • a future minimum value is initiated every 5 seconds at 33304 with the value of the current magnitude (Y(n)) and replaced with a smaller minimal value over the next 5 seconds through the following process.
  • the future minimum value of each bin is compared with the current magnitude value of the signal. If the current magnitude is smaller than the future minimum, the future minimum is replaced with the magnitude which becomes the new future minimum.
  • a current minimum value is calculated at 33306. The current minimum is initiated every 5 seconds with the value of the future minimum that was determined over the previous 5 seconds and follows the minimum value of the signal for the next 5 seconds by comparing its value with the current magnitude value.
  • the current minimum value is used by the subtraction process, while the future minimum is used for the initiation and refreshing of the current minimum.
  • the noise estimation mechanism of the present invention ensures a tight and quick estimation of the noise value, with a limited memory of the process (5 seconds), while preventing too high an estimation of the noise.
  • Each bin's magnitude (Y(n)) is compared with four times the current minimum value of that bin by comparator 33308 - which serves as the adaptive threshold for that bin. If the magnitude is within the range (hence below the threshold), it is allowed as noise and used by an exponential averaging unit 33310 that determines the level of the noise 33312 of that frequency. If the magnitude is above the threshold it is rejected for the noise estimation.
  • the time constant for the exponential averaging is typically 0.95, which may be interpreted as taking the average of the last 20 frames.
  • the threshold of 4*minimum value may be changed for some applications.
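The per-bin minimum-tracking noise estimator described above can be sketched as a small state machine. The struct and function names are illustrative, and the refresh interval is expressed in frames (250 frames standing in for the 5-second window, assuming one frame per 20 ms); the 4x threshold and 0.95 averaging constant are taken from the text.

```c
/* Per-bin noise estimator: tracks a current and a future minimum of
   the smoothed magnitude Y.  Every REFRESH_FRAMES frames the current
   minimum is re-initialized from the future minimum and the future
   minimum restarts from the current magnitude.  A magnitude below
   4 * current_min is accepted as noise and folded into an exponential
   average with constant 0.95; larger magnitudes are rejected. */
typedef struct {
    double cur_min, fut_min, noise;
    int frame;
} NoiseEst;

#define REFRESH_FRAMES 250  /* assumed: ~5 s at one frame per 20 ms */

void noise_est_init(NoiseEst *e, double y0)
{
    e->cur_min = e->fut_min = y0;
    e->noise = y0;
    e->frame = 0;
}

double noise_est_update(NoiseEst *e, double y)
{
    if (y < e->fut_min) e->fut_min = y;
    if (y < e->cur_min) e->cur_min = y;
    if (++e->frame >= REFRESH_FRAMES) {   /* periodic refresh */
        e->cur_min = e->fut_min;
        e->fut_min = y;
        e->frame = 0;
    }
    if (y < 4.0 * e->cur_min)             /* adaptive threshold */
        e->noise = 0.95 * e->noise + 0.05 * y;
    return e->noise;
}
```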
  • Figure 33D is a detailed description of the subtraction processor 33400.
  • the value of the estimated bin noise magnitude is subtracted from the current bin magnitude.
  • the phase of the current bin is calculated and used in conjunction with the result of the subtraction to obtain the Real and Imaginary parts of the result.
  • This approach is very expensive in terms of processing and memory because it requires the calculation of the sine and cosine arguments of the complex vector, with consideration of the four quadrants in which the complex vector may be positioned.
  • An alternative approach used in the present invention is a filter approach.
  • the subtraction is interpreted as a filter multiplication performed by filter 33402, where H (the filter coefficient) is:

    H(n) = | Y(n) − N(n) | / Y(n)

  • E, the noise-free complex value, is obtained by multiplying H(n) with the complex value of bin n.
  • the subtraction may result in a negative value of magnitude.
  • This value can be either replaced with zero (half-wave rectification) or replaced with a positive value equal to the negative one (full-wave rectification).
  • the filter approach results in the full-wave rectification directly.
  • the full-wave rectification provides a little less noise reduction but introduces far fewer artifacts to the signal. It will be appreciated that this filter can be modified to effect a half-wave rectification by taking the non-absolute value of the numerator and replacing negative values with zeros.
  • the values of Y in the figures are the smoothed values of Y after averaging over neighboring spectral bins and over time frames (2D smoothing). Another approach is to use the smoothed Y only for the noise estimation (N), and to use the unsmoothed Y for the calculation of H.
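The filter coefficient in both rectification variants can be sketched as a single function; the function name and the boolean switch are illustrative assumptions. Multiplying the complex bin value by H subtracts the noise magnitude while reusing the noisy phase, which is what avoids the explicit sine/cosine phase computation criticized above.

```c
#include <math.h>

/* Spectral-subtraction filter coefficient for one bin:
   full-wave:  H = |Y - N| / Y   (negative results are reflected)
   half-wave:  H = max(Y - N, 0) / Y  (negative results clamp to zero)
   Y is the (smoothed) bin magnitude, N the estimated noise magnitude. */
double sub_filter(double y, double noise, int full_wave)
{
    double num = y - noise;
    if (full_wave)
        num = fabs(num);
    else if (num < 0.0)
        num = 0.0;
    return num / y;
}
```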
  • Figure 33E illustrates the residual noise reduction processor 33500.
  • the residual noise is defined as the remaining noise during non-speech intervals.
  • the noise in these intervals is first reduced by the subtraction process which does not differentiate between speech and non-speech time intervals.
  • the remaining residual noise can be reduced further by using a voice switch 33502 and either multiplying the residual noise by a decaying factor or replacing it with zeros. Another alternative to the zeroing is replacing the residual noise with a minimum value of noise at 33504.
  • the residual noise reduction processor 33506 applies a similar threshold used by the noise estimator at 33508 on the noise free output bin and replaces or decays the result when it is lower than the threshold at 33510.
  • the result of the residual noise processing of the present invention is a quieter sound in the non-speech intervals.
  • the appearance of artifacts such as a pumping noise when the noise level is switched between the speech interval and the non-speech interval may occur in some applications.
  • the spectral subtraction technique of the present invention can be utilized in conjunction with the array techniques, close talk microphone technique or as a stand alone system.
  • the spectral subtraction of the present invention can be implemented on an embedded hardware (DSP) as a stand alone system, as part of other embedded algorithms such as adaptive beamforming, or as a software application running on a PC using data obtained from a sound port.
  • the present invention may be implemented as a software application.
  • in step 33602, the input samples are read.
  • the read samples are stored in a buffer. If 256 new points are accumulated in step 33604, program control advances to step 33606 - otherwise control returns to step 33600 where additional samples are read.
  • the last 512 points are moved to the processing buffer in step 33606.
  • the 256 new samples stored are combined with the previous 256 points in step 33608 to obtain the 512 points.
  • a Fourier Transform is performed on the 512 points. Of course, another transform may be employed to obtain the spectral noise signal.
  • the 256 significant complex points resulting from the transformation are stored in the buffer.
  • the second 256 points are a conjugate replica of the first 256 points and are redundant for real inputs.
  • the stored data in step 33614 includes the 256 real points and the 256 imaginary points.
  • in Figure 33H, the noise processing is performed, wherein the magnitude of the signal is estimated in step 33700.
  • the straightforward approach may be employed but, as discussed with reference to Figure 33B, it requires extraneous processing time and complexity.
  • in step 33702, the stored complex points are read from the buffer and the magnitude is calculated using the estimation equation shown in step 33700. The result is stored in step 33704.
  • a 2-dimensional (2D) smoothing process is effected in steps 33706 and 33708 wherein, in step 33706, the estimate at each point is averaged with the estimates of adjacent points and, in step 33708, the estimate is averaged using an exponential average having the effect of averaging the estimate at each point over, for example, 3 time samples of each bin.
  • the smoothed estimate is employed to determine the future minimum value and the current minimum value. If the smoothed estimate is less than the calculated future minimum value as determined in step 33710, the future minimum value is replaced with the smoothed estimate and stored in step 33714.
  • step 33712 determines whether the smoothed estimate is less than the current minimum value. If it is determined at step 33712 that the smoothed estimate is less than the current minimum value, then the current minimum is replaced with the smoothed estimate value and stored in step 33720.
  • the future and current minimum values are calculated continuously and initiated periodically, for example, every 5 seconds as determined in step 33724, and control is advanced to steps 33722 and 33726 wherein the new future and current minimums are calculated. Afterwards, control advances to Figure 33I as indicated by the circumscribed letter B, where the subtraction and residual noise reduction are effected.
  • in step 33800 it is determined whether the samples are less than a threshold amount.
  • where the samples are within the threshold, the samples undergo an exponential averaging in step 33804 and are stored in the buffer at step 33802. Otherwise, control advances directly to step 33808.
  • the filter coefficients are determined from the signal samples retrieved in step 33806 and the estimated noise samples retrieved from step 33810.
  • although the straightforward approach, by which the phase is estimated and applied, may be used, the alternative Wiener filter approach is preferred since it saves processing time and complexity.
  • step 33814 the filter transform is multiplied by the samples retrieved from steps 33816 and stored in step 33812.
  • the residual noise reduction process is performed wherein, in step 33818, if the processed noise signal is within a threshold, control advances to step 33820 wherein the processed noise is subjected to replacement, for example, a decay.
  • the residual noise reduction process may not be suitable in some applications, where the application is negatively affected. It will be appreciated that, while specific values are used in the several equations and calculations employed in the present invention, these values may differ from those shown.
  • the Inverse Fourier Transform is generated in step 902 on the basis of the noise-processed audio signal recovered in step 904 and stored in step 900.
  • the time-domain signals are overlaid in order to regenerate the audio signal substantially without noise.
  • the present invention may be practiced as a software application, preferably written using C or any other programming language, which may be embedded on, for example, a programmable memory chip or stored on a computer-readable medium such as, for example, an optical disk, and retrieved therefrom to drive a computer processor.
  • Sample code representative of the present invention is illustrated in Appendix A which, as will be appreciated by those skilled in the art, may be modified to accommodate various operating systems and compilers or to include various bells and whistles without departing from the spirit and scope of the present invention.
  • a spectral subtraction system that has a simple, yet efficient mechanism, to estimate the noise magnitude spectrum even in poor signal to noise ratio situations and in continuous fast speech cases.
  • An efficient mechanism is provided that can perform the magnitude estimation with little cost, and will overcome the problem of phase association.
  • a stable mechanism is provided to estimate the noise spectral magnitude without the smearing of the data.
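The noise-estimation steps above — 2D smoothing, exponential averaging, future/current minimum tracking with periodic re-initialization, and subtraction with a residual floor — can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the class and function names, the 3-frame averaging constant, the 50-frame (e.g. 5-second) window, and the 0.1 residual floor are all assumptions.

```python
# Minimum-statistics noise floor tracking and spectral subtraction,
# sketched from the flow of steps 33706-33726. Constants and names
# are illustrative assumptions.

ALPHA = 1.0 / 3.0       # exponential average over ~3 time frames per bin
WINDOW_FRAMES = 50      # re-initialize minima periodically (e.g. every 5 s)

def smooth_2d(mag):
    """Average each bin with its adjacent bins (step 33706)."""
    n = len(mag)
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        out.append(sum(mag[lo:hi]) / (hi - lo))
    return out

class NoiseFloorTracker:
    def __init__(self, nbins):
        self.smoothed = [0.0] * nbins
        self.current_min = [float("inf")] * nbins   # in-use noise estimate
        self.future_min = [float("inf")] * nbins    # candidate for next window
        self.frames = 0

    def update(self, mag):
        """Track minima of the smoothed magnitude spectrum per bin."""
        mag = smooth_2d(mag)
        for i, m in enumerate(mag):
            # exponential average in time (step 33708)
            self.smoothed[i] += ALPHA * (m - self.smoothed[i])
            s = self.smoothed[i]
            if s < self.future_min[i]:      # steps 33710 / 33714
                self.future_min[i] = s
            if s < self.current_min[i]:     # steps 33712 / 33720
                self.current_min[i] = s
        self.frames += 1
        if self.frames >= WINDOW_FRAMES:    # step 33724: periodic restart
            self.frames = 0
            self.current_min = self.future_min       # steps 33722 / 33726
            self.future_min = list(self.smoothed)
        return self.current_min

def subtract(mag, noise, floor=0.1):
    """Subtract the noise estimate, keeping a residual floor per bin."""
    return [max(m - n, floor * m) for m, n in zip(mag, noise)]
```

In use, each audio frame's magnitude spectrum is passed to `update`, and `subtract` is applied before the inverse transform; the minimum statistic tracks the noise floor even through continuous fast speech because speech rarely stays active in a bin for an entire window.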

Abstract

A digital super directional array (DSDA) integrated into a peripheral device. The DSDA may be incorporated into a wand. The DSDA may be incorporated into a keyboard, wherein a wand housing the microphone array is coupled to the keyboard, a plurality of microphones are coupled to corners of the keyboard, the DSDA is integrated as a pop-up portion, the DSDA is formed as a protrusion from the keyboard, or the DSDA is formed on the bottom of the keyboard at an edge thereof to create a pressure zone effect to eliminate acoustic reflections. The DSDA may be integrated into a substantially flat, elongated housing for positioning between a visor and the roof of an automobile, wherein a pressure zone microphone effect is created between the visor and the roof of the automobile. The DSDA may be integrated into a rear view mirror, a mouse or a video camera. The DSDA may be integrated with a universal voice interface which interfaces the DSDA to audio processing hardware/software. The DSDA may be integrated with adaptive beam forming and directing techniques.

Description

SYSTEM AND METHOD FOR ADAPTIVE INTERFERENCE CANCELING
RELATED APPLICATIONS INCORPORATED BY REFERENCE.
This is a Continuation-In-Part of U.S. Patent Serial No. 09/059,503 filed August 6, 1998, U.S. Patent Serial No. 08/840,159 filed April 14, 1997, U.S. Patent Serial No. 09/055,709 filed April 7, 1998, U.S. Patent Serial No. 09/130,923, filed August 6, 1998 and U.S. Patent Serial No. 09/157,035 filed September 18, 1998, each of which is hereby incorporated herein by reference.
The following applications and patent(s) are cited and hereby herein incorporated by reference: U.S. Patent Serial No. 60/126,567 filed March 26, 1999; U.S. Patent Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Serial No. 09/130,923 filed August 6, 1998, U.S. Patent Serial No. 09/055,709 filed April 7, 1998, U.S. Patent Serial No. 09/059,503 filed April 13, 1998, U.S. Patent Serial No. 08/840,159 filed April 14, 1997, U.S. Patent Serial No. 09/050,196 filed March 30, 1998, U.S. Patent Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Serial No. 08/672,899 now U.S. Patent No. 5,825,898 issued October 20, 1998; and U.S. Patent Serial No. 09/089,710 filed June 3, 1998, U.S. Patent No. 5,825,897 issued October 20, 1998, U.S. Patent No. 5,732,143 issued March 24, 1998, U.S. Patent No. 5,673,325 issued September 30, 1997, U.S. Patent No. 5,381,473 issued January 10, 1995, International Application No. PCT/US 99/06764 filed March 29, 1999, U.S. patent Serial No. 60/126,567 filed March 26, 1999, International Application No. PCT/US99/08012 filed April 13, 1999. And, all documents cited herein are incorporated herein by reference, as are documents cited or referenced in documents cited herein.
BACKGROUND OF THE INVENTION
The present invention relates generally to integrating a DSDA (Digital Super Directional Array).
SUMMARY OF THE INVENTION
It is an object of the present invention to provide an integrated DSDA.
BRIEF DESCRIPTION OF THE DRAWINGS
The objects, features and advantages of the present invention will be more readily apparent from the following detailed description of the invention in which:
FIG. 1 is a block diagram of an overall system; FIG. 2 is a block diagram of a sampling unit;
FIG. 3 is a block diagram of an alternative embodiment of a sampling unit;
FIG. 4 is a schematic depiction of tapped delay lines used in a main channel matrix and a reference matrix unit; FIG. 5 is a schematic depiction of a main channel matrix unit;
FIG. 6 is a schematic depiction of a reference channel matrix unit;
FIG. 7 is a schematic depiction of a decolorizing filter;
FIG. 8 is a schematic depiction of an inhibiting unit based on directional interference; FIG. 9 is a schematic depiction of a frequency-selective constraint adaptive filter;
FIG. 10 is a block diagram of a frequency-selective weight-constraint unit;
FIG. 11 is a flow chart depicting the operation of a program that can be used to implement the invention;
FIGS. 12A-H illustrate the DSDA integrated according to the present invention;
FIGS. 13-18C-2 illustrate the universal interface in accordance with the present invention; FIG. 19 is a block diagram of a system using sub-band processing;
FIG. 20 is a block diagram of a system using broadband processing with frequency-limited adaptation;
FIG. 21 is a block diagram of a system using broadband processing with an external main-channel generator; FIGS. 22A-22D are a flow chart depicting the operation of a program that can be used to implement a method using sub-band processing;
FIGS. 23A-23C are a flow chart depicting the operation of a program that can be used to implement a method using broad-band processing with frequency- limited adaptation;
FIGS. 24A-24C are a flow chart depicting the operation of a program that can be used to implement a method using broad-band processing with an external main-channel generator;
FIG. 25 is a functional diagram of the overall system including a microphone array, an A-to-D converter, a band-pass filter, an approximate-direction finder, a precise-direction finder, and a measurement qualification unit in accordance with the present invention;
FIG. 26 is a perspective view showing the arrangement of a particular embodiment of the microphone array of FIG. 25; FIG. 27 is a functional diagram of an embodiment of the approximate and exact direction finder of FIG. 25;
FIG. 28 is a functional diagram of an embodiment of the precise- direction finder of FIG. 25;
FIG. 29 is a functional diagram of the exact-direction finder of FIG. 25; FIG. 30 is the 3-D coordinate system used to describe the present invention;
FIG. 31 A is a functional diagram of a first embodiment of the measurement qualification unit of FIG. 25;
FIG. 31B is a functional diagram of a second embodiment of the measurement qualification unit of FIG. 25;
FIG. 31C is a functional diagram of a third embodiment of the measurement qualification unit of FIG. 25;
FIG. 31D is a functional diagram of a fourth embodiment of the measurement qualification unit of FIG. 25; FIGS. 32A - 32D are a flow chart depicting the operation of a program that can be used to implement the method in accordance with the present invention; and
FIGS. 33A-33J are diagrams of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a block diagram of a system in accordance with a preferred embodiment of the present invention. The system illustrated has a sensor array 1, a sampling unit 2, a main channel matrix unit 3, a reference channel matrix unit 4, a set of decolorizing filters 5, a set of frequency-selective constrained adaptive filters 6, a delay 7, a difference unit 8, an inhibiting unit 9, and an output D/A unit 10.
Sensor array 1, having individual sensors 1a-1d, receives signals from a signal source on-axis from the system and from interference sources located off-axis from the system. The sensor array is connected to sampling unit 2 for sampling the received signals, having individual sampling elements, 2a-2d, where each element is connected to the corresponding individual sensor to produce digital signals 11.
The outputs of sampling unit 2 are connected to main channel matrix unit 3 producing a main channel 12 representing signals received in the direction of a source. The main channel contains both a source signal component and an interference signal component. The outputs of sampling unit 2 are also connected to reference channel matrix unit 4, which generates reference channels 13 representing signals received from directions other than that of the signal source. Thus, the reference channels represent interference signals.
The reference channels are filtered through decolorizing filters 5, which generate flat-frequency reference channels 14 having a frequency spectrum whose magnitude is substantially flat over a frequency range of interest. Flat-frequency reference channels 14 are fed into the set of frequency-selective constraint adaptive filters 6, which generate canceling signals 15.
Meanwhile, main channel 12 is delayed through delay 7 so that it is synchronized with canceling signals 15. Difference unit 8 then subtracts canceling signals 15 from the delayed main channel to generate a digital output signal 16, which is converted by D/A unit 10 into analog form. Digital output signal 16 is fed back to the adaptive filters to update the filter weights of the adaptive filters. Flat-frequency reference channels 14 are fed to inhibiting unit 9, which estimates the power of each flat-frequency reference channel as well as the power of the main channel and generates an inhibit signal 19 to prevent signal leakage.
FIG. 2 depicts a preferred embodiment of the sampling unit. A sensor array 21, having sensor elements 21a-21d, is connected to an analog front end 22, having amplifier elements 22a-22d, where each amplifier element is connected to the output of the corresponding sensor element. In a directional microphone application, each sensor can be either a directional or omnidirectional microphone. The analog front end amplifies the received analog sensor signals to match the input requirement of the sampling elements. The outputs from the analog front ends are connected to a set of delta-sigma A/D converters, 23, where each converter samples and digitizes the amplified analog signals. The delta-sigma sampling is a well-known A/D technique using both oversampling and digital filtering. For details on delta-sigma A/D sampling, see Crystal Semiconductor Corporation, Application Note: Delta-Sigma Techniques, 1989.
FIG. 3 shows an alternative embodiment of the sampling unit. A sensor array 31, having sensor elements 31a-31d, is connected to an amplifier 32, having amplifier elements 32a-32d, where each amplifier element amplifies the received signals from the corresponding sensor element. The outputs of the amplifier are connected to a sample & hold (S/H) unit 33 having sample & hold elements 33a-33d, where each S/H element samples the amplified analog signal from the corresponding amplifier element to produce a discrete signal. The outputs from the S/H unit are multiplexed into a single signal through a multiplexor 34. The output of the multiplexor is connected to a conventional A/D converter 35 to produce a digital signal.
FIG. 4 is a schematic depiction of tapped delay lines used in the main channel matrix unit and the reference channel matrix in accordance with a preferred embodiment of the present invention. The tapped delay line used here is defined as a nonrecursive digital filter, also known in the art as a transversal filter, a finite impulse response filter or an FIR filter. The illustrated embodiment has 4 tapped delay lines, 40a-40d. Each tapped delay line includes delay elements 41, multipliers 42 and adders 43. Digital signals, 44a-44d, are fed into the set of tapped delay lines 40a-40d. Delayed signals through delay elements 41 are multiplied by filter coefficients, Fi,j, 45 and added to produce outputs, 46a-46d.
The n-th sample of an output from the i-th tapped delay line, Yi(n), can then be expressed as:
Yi(n) = Σ(j=0 to k) Fi,j Xi(n-j), where k is the length of the filter, and Xi(n) is the n-th sample of an input to the i-th tapped delay line.
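The tapped delay line equation above can be sketched directly; this is an illustrative implementation (function name and coefficients assumed, not taken from the patent):

```python
# A tapped delay line (FIR filter) realizing Yi(n) = sum_j Fi,j * Xi(n-j),
# as in elements 41-43 of FIG. 4.

def tapped_delay_line(x, coeffs):
    """Filter input samples x with coefficients Fi,j for one delay line."""
    k = len(coeffs)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j in range(k):
            if n - j >= 0:          # samples before the start are zero
                acc += coeffs[j] * x[n - j]
        y.append(acc)
    return y
```

Feeding a unit impulse through the line returns the coefficients themselves, which is a quick way to verify the tap ordering.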
FIG. 5 depicts the main channel matrix unit for generating a main channel in accordance with a preferred embodiment of the present invention. The unit has tapped delay lines, 50a-50d, as an input section taking inputs 51a-51d from the sampling unit. Its output section includes multipliers, 52a-52d, where each multiplier is connected to the corresponding tapped delay line, and an adder 53, which sums all output signals from the multipliers. The unit generates a main channel 54 as a weighted sum of outputs from all multipliers. The filter weights 55a-55d can be any combination of fractions as long as their sum is 1. For example, if 4 microphones are used, the embodiment may use filter weights of 1/4 in order to take into account the contribution of each microphone.
The unit acts as a beamformer, a spatial filter which filters a signal coming in all directions to produce a signal coming in a specific direction without physically moving the sensor array. The coefficients of the tapped delay lines and the filter weights are set in such a way that the received signals are spatially filtered to maximize the sensitivity toward the signal source.
Since some interference signals find their way into the main channel due to many factors such as the reverberation of a room, main channel 54, representing the received signal in the direction of the signal source, contains not only a source signal component, but also an interference signal component.
FIG. 6 depicts the reference channel matrix unit for generating reference matrix channels in accordance with a preferred embodiment of the present invention. It has tapped delay lines, 60a-60d, as an input section taking inputs 61a-61d from the sampling unit. The same tapped delay lines as those of FIG. 4 may be used, in which case the tapped delay lines may be shared by the main and reference channel matrix units.
Its output section includes multipliers, 62a-62d, 63a-63d, 64a-64d and adders 65a-65c, where each multiplier is connected to the corresponding tapped delay line and adder. The unit acts as a beamformer which generates the reference channels 66a-66c representing signals arriving off-axis from the signal source by obtaining the weighted differences of certain combinations of outputs from the tapped delay lines. The filter weight combinations can be any numbers as long as the sum of the filter weights for combining a given reference channel is 0. For example, the illustrated embodiment may use a filter weight combination, (W11, W12, W13, W14) = (0.25, 0.25, 0.25, -0.75), in order to combine signals 61a-61d to produce reference channel 66a.
The net effect is placing a null (low sensitivity) in the receiving gain of the beamformer toward the signal source. As a result, the reference channels represent interference signals in directions other than that of the signal source. In other words, the unit "steers" the input digital data to obtain interference signals without physically moving the sensor array.
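The sum-to-1 and sum-to-0 weight rules can be illustrated with a minimal sketch (names and the particular weight values, which match the text's examples, are otherwise illustrative):

```python
# Main and reference channels as weighted sums across sensors (FIGS. 5-6).
# Main-channel weights sum to 1; reference-channel weights sum to 0,
# which places a null toward an on-axis source.

MAIN_W = [0.25, 0.25, 0.25, 0.25]   # sum = 1
REF_W = [0.25, 0.25, 0.25, -0.75]   # sum = 0

def combine(channels, weights):
    """Weighted sum of per-sensor samples at one time instant."""
    return sum(w * c for w, c in zip(weights, channels))
```

For an on-axis source every sensor sees the same sample, so the reference channel combines to zero (the null) while the main channel passes the source unchanged, which is exactly the "steering" described above.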
FIG. 7 is a schematic depiction of the decolorizing filter in accordance with a preferred embodiment of the present invention. It is a tapped delay line including delay elements 71, multipliers 72 and adders 73. A reference channel 74 is fed into the tapped delay line. Delayed signals are multiplied by filter coefficients, Fj, 75 and added to produce an output 76. The filter coefficients are set in such a way that the filter amplifies the low-magnitude frequency components of an input signal to obtain an output signal having a substantially flat frequency spectrum.
As mentioned before in the background section, the output of a conventional adaptive beamformer suffers from non-uniform frequency behavior. This is because the reference channels do not have a flat frequency spectrum. The receiving sensitivity of a beamformer toward a particular angular direction is often described in terms of a gain curve. As mentioned before, the reference channel is obtained by placing a null in the gain curve (making the sensor array insensitive) in the direction of the signal source. The resulting gain curve has a lower gain for lower frequency signals than for higher frequency signals. Since the reference channel is modified to generate a canceling signal, a non-flat frequency spectrum of the reference channel is translated to non-uniform frequency behavior in the system output.
The decolorizing filter is a fixed-coefficient filter which flattens the frequency spectrum of the reference channel (thus "decolorizing" the reference channel) by boosting the low frequency portion of the reference channel. By adding the decolorizing filters to all outputs of the reference channel matrix unit, a substantially flat frequency response in all directions is obtained.
The decolorizing filter in the illustrated embodiment uses a tapped delay line filter which is the same as a finite impulse response (FIR) filter, but other kinds of filters such as an infinite impulse response (IIR) filter can also be used for the decolorizing filter in an alternative embodiment.
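As an illustrative sketch only — these are not the patent's coefficients: a reference channel formed as a first difference between sensors has a rising, high-pass-shaped spectrum, and a fixed leaky integrator (a one-pole IIR filter, as the alternative embodiment permits) approximately inverts that tilt to flatten it.

```python
# Illustrative decolorization: a differencing reference channel boosts
# high frequencies; a fixed leaky integrator boosts low frequencies,
# approximately restoring a flat spectrum. Names and the 0.95 leak
# factor are assumptions.

def first_difference(x):
    """Crude high-pass reference channel: d(n) = x(n) - x(n-1)."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def leaky_integrator(d, leak=0.95):
    """One-pole IIR decolorizer: y(n) = d(n) + leak * y(n-1)."""
    y, prev = [], 0.0
    for s in d:
        prev = s + leak * prev
        y.append(prev)
    return y
```

With a leak of exactly 1 the integrator is the exact inverse of the difference, recovering the input; a leak slightly below 1 keeps the filter stable while still flattening the spectrum over the band of interest.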
FIG. 8 depicts schematically the inhibiting unit in accordance with a preferred embodiment of the present invention. It includes power estimation units 81, 82 which estimate the power of a main channel 83 and each reference channel 84, respectively. A sample power estimation unit 85 calculates the power of each sample. A multiplier 86 multiplies the power of each sample by a fraction, α, which is the reciprocal of the number of samples for a given averaging period to obtain an average sample power 87. An adder 88 adds the average sample power to the output of another multiplier 89 which multiplies a previously calculated main channel power average 90 by (1-α). A new main channel power average is obtained by (new sample power) x α + (old power average) x (1-α). For example, if a 100-sample average is used, α = 0.01. The updated power average will be (new sample power) x 0.01 + (old power average) x 0.99. In this way, the updated power average will be available at each sampling instant rather than after an averaging period. Although the illustrated embodiment shows an on-the-fly estimation method of the power average, other kinds of power estimation methods can also be used in an alternative embodiment.
A multiplier 91 multiplies the main channel power average 90 with a threshold 92 to obtain a normalized main channel power average 93. An adder 94 subtracts reference channel power averages 95 from the normalized main channel power average 93 to produce a difference 96. If the difference is positive, a comparator 97 generates an inhibit signal 98. The inhibit signal is provided to the adaptive filters to stop the adaptation process to prevent signal leakage.
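The running power average and inhibit decision described above can be sketched as follows; the α = 0.01 (100-sample average) and 0.25 threshold match the text's examples, while the function names are illustrative:

```python
# Exponential power averaging and the directional-interference inhibit
# test of FIG. 8: new_avg = sample_power * alpha + old_avg * (1 - alpha),
# inhibit when (main_avg * threshold) - sum(ref_avgs) > 0.

ALPHA = 0.01        # reciprocal of a 100-sample averaging period
THRESHOLD = 0.25

def update_power(avg, sample):
    """Fold one sample's power into the running average on the fly."""
    return (sample * sample) * ALPHA + avg * (1.0 - ALPHA)

def inhibit(main_avg, ref_avgs):
    """True when there is no significant directional interference,
    so adaptation should stop to prevent signal leakage."""
    return main_avg * THRESHOLD - sum(ref_avgs) > 0.0
```

Because the average is updated at every sample rather than once per block, the inhibit decision is available at each sampling instant, as the text notes.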
Although the illustrated embodiment normalizes the main channel power average, an alternative embodiment may normalize the reference channel power average instead of the main channel power average. For example, if the threshold 92 in the illustrated embodiment is 0.25, the same effect can be obtained in the alternative embodiment by normalizing each reference channel power average by multiplying it by 4. This inhibition approach is different from the prior art SNR-based inhibition approach mentioned in the background section in that it detects the presence of significant directional interference which the prior art approach does not consider. As a result, the directional-interference-based inhibition approach stops the adaptation process when there is no significant directional interference to be eliminated, whereas the prior art approach does not.
For example, where there is a weak source signal (e.g. during speech intermission) and there is almost no directional interference except some uncorrelated noise (such as noise due to wind or mechanical vibrations on the sensor structure), the SNR-based approach would allow the adaptive filter to continue adapting due to the small SNR. The continued adaptation process is not desirable because there is very little directional interference to be eliminated in the first place, and the adaptation process searches in vain for new filter weights to eliminate the uncorrelated noise, which often results in canceling the source signal component of the received signal.
By contrast, the directional-interference-based inhibition mechanism will inhibit the adaptation process in such a case because the strength of directional interference as reflected in the reference channel power average will be smaller than the normalized main channel power average, producing a positive normalized power difference. The adaptive process is inhibited as a result until there is some directional interference to be eliminated.
FIG. 9 shows the frequency-selective constraint adaptive filter together with the difference unit in accordance with a preferred embodiment of the present invention. The frequency-selective constraint adaptive filter 101 includes a finite impulse response (FIR) filter 102, an LMS weight updating unit 103 and a frequency-selective weight-constraint unit 104. In an alternative embodiment, an infinite impulse response (IIR) filter can be used instead of the FIR filter. A flat-frequency reference channel 105 passes through FIR filter 102, whose filter weights are adjusted to produce a canceling signal 106 which closely approximates the actual interference signal component present in a main channel 107. In a preferred embodiment, the main channel is obtained from the main channel matrix unit after a delay in order to synchronize the main channel with the canceling signal. In general, there is a delay between the main channel and the canceling signal because the canceling signal is obtained by processing reference channels through extra stages of delay, i.e., the decolorization filters and adaptive filters. In an alternative embodiment, the main channel directly from the main channel matrix unit may be used if the delay is not significant.
A difference unit 108 subtracts canceling signal 106 from main channel 107 to generate an output signal 109. Adaptive filter 101 adjusts filter weights, W1-Wn, to minimize the power of the output signal. When the filter weights settle, output signal 109 reproduces the source signal substantially free of the actual interference signal component because canceling signal 106 closely tracks the interference signal component. Output signal 109 is sent to the output D/A unit to produce an analog output signal. Output signal 109 is also used to adjust the adaptive filter weights to further reduce the interference signal component.
There are many techniques to continuously update the values of the filter weights. The preferred embodiment uses the Least Mean-Square (LMS) algorithm, which minimizes the mean-square value of the difference between the main channel and the canceling signal, but in an alternative embodiment, other algorithms such as Recursive Least Square (RLS) can also be used.
Under the LMS algorithm, the adaptive filter weights are updated according to the following:
Wp(n+1) = Wp(n) + 2 μ r(n-p) e(n)
where n is a discrete time index; Wp is the p-th filter weight of the adaptive filter; e(n) is the difference signal between the main channel signal and the canceling signal; r(n) is a reference channel; and μ is an adaptation constant that controls the speed of adaptation.
FIG. 10 depicts a preferred embodiment of the frequency-selective weight-constraint unit. The frequency-selective weight-constraint unit 110 includes a Fast Fourier Transform (FFT) unit 112, a set of frequency bins 114, a set of truncating units 115, a set of storage cells 116, and an Inverse Fast Fourier Transform (IFFT) unit 117, connected in series. The FFT unit 112 receives adaptive filter weights 111 and performs the FFT of the filter weights 111 to obtain frequency representation values 113. The frequency representation values are then divided into a set of frequency bands and stored into the frequency bins 114a-114h. Each frequency bin stores the frequency representation values within a specific bandwidth assigned to each bin. The values represent the operation of the adaptive filter with respect to a specific frequency component of the source signal. Each of the truncating units 115a-115h compares the frequency representation values with a threshold assigned to each bin, and truncates the values if they exceed the threshold. The truncated frequency representation values are temporarily stored in 116a-116h before the IFFT unit 117 converts them back to new filter weight values 118.
In addition to the inhibiting mechanism based on directional interference, the frequency-selective weight-constraint unit further controls the adaptation process based on the frequency spectrum of the received source signal. Once the adaptive filter starts working, the performance change in the output of the filter, better or worse, becomes drastic. Uncontrolled adaptation can quickly lead to a drastic performance degradation.
The weight-constraint mechanism is based on the observation that a large increase in the adaptive filter weight values hints at signal leakage. If the adaptive filter works properly, there is no need for the filter to increase the filter weights to large values. But, if the filter is not working properly, the filter weights tend to grow to large values.
One way to curb the growth is to use a simple truncating mechanism to truncate the values of filter weights to predetermined threshold values. In this way, even if the overall signal power is high enough to trigger the inhibition mechanism, the weight-constraint mechanism can still prevent the signal leakage.
For narrow band signals, such as a speech signal or a tonal signal, having their power spectral density concentrated in a narrow frequency range, signal leakage may not be manifested in a large growth of the filter weight values in the time domain. However, the filter weight values in the frequency domain will indicate some increase because they represent the operation of the adaptive filter in response to a specific frequency component of the source signal. The frequency-selective weight-constraint unit detects that condition by sensing a large increase in the frequency representation values of the filter weights. By truncating the frequency representation values in the narrow frequency band of interest and inverse-transforming them back to the time domain, the unit acts to prevent the signal leakage involving narrow band signals.
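The LMS update and the frequency-selective weight constraint can be sketched together. This is an illustrative reconstruction (names, μ, and the per-bin thresholds are assumptions), scaling an over-threshold bin's magnitude down to its threshold rather than clipping real and imaginary parts separately:

```python
# LMS weight update Wp(n+1) = Wp(n) + 2*mu*r(n-p)*e(n), followed by the
# frequency-selective weight constraint of FIG. 10: FFT the weights,
# limit each bin's magnitude to a per-bin threshold, IFFT back.
import numpy as np

def lms_update(w, r_hist, e, mu=0.01):
    """One LMS step; r_hist[p] holds r(n-p), e is the output sample e(n)."""
    return [wp + 2.0 * mu * rp * e for wp, rp in zip(w, r_hist)]

def constrain_weights(w, bin_thresholds):
    """Truncate the magnitude of each frequency bin of the weights."""
    spec = np.fft.fft(np.asarray(w, dtype=float))
    mag = np.abs(spec)
    # scale down any bin whose magnitude exceeds its threshold
    scale = np.where(mag > bin_thresholds,
                     bin_thresholds / np.maximum(mag, 1e-12), 1.0)
    return np.real(np.fft.ifft(spec * scale))
```

Running `constrain_weights` after each LMS step bounds the filter's response in every band, so a narrow-band leakage that barely moves the time-domain weights is still caught in its frequency bin.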
The system described herein may be implemented using commercially available digital signal processing (DSP) systems such as the Analog Devices 2100 series.
FIG. 11 shows a flow chart depicting the operation of a program for a DSP processor in accordance with a preferred embodiment of the present invention.
After the program starts at step 100, the program initializes registers and pointers as well as buffers (step 110). The program then waits for an interrupt from a sampling unit requesting processing of samples received from the array of sensors (step 120). When the sampling unit sends an interrupt (step 131) indicating that the samples are ready, the program reads the sample values (step 130) and stores the values (step 140). The program filters the stored values using a routine implementing a tapped delay line and stores the filtered input values (step 141).
The program then retrieves the filtered input values (step 151) and main channel matrix coefficients (step 152) to generate a main channel (step 150) by multiplying the two and to store the result (step 160).
The program retrieves the filtered input values (step 171) and reference channel matrix coefficients (step 172) to generate a reference channel (reference channel #1) by multiplying the two (step 170) and to store the result (step 180). Steps 170 and 180 are repeated to generate all other reference channels (step 190). The program retrieves one of the reference channels (step 201) and decolorization filter coefficients for the corresponding reference channel (step 202) to generate a flat-frequency reference channel by multiplying the two (step 200) and stores the result (step 210). Steps 200 and 210 are repeated for all other reference channels (step 220). The program retrieves one of the flat-frequency reference channels (step 231) and adaptive filter coefficients (step 232) to generate a canceling signal (step 230) by multiplying the two and to store the result (step 240). Steps 230 and 240 are repeated for all other reference channels to generate more canceling signals (step 250).
The program retrieves canceling signals (steps 262-263) to subtract them from the main channel (retrieved at step 261) to cancel the interference signal component in the main channel (step 260). The output is sent to a D/A unit to reproduce the signal without interference in analog form (step 264). The output is also stored (step 270). The program calculates the power of a reference channel sample (step 281) and retrieves an old reference channel power average (step 282). The program multiplies the sample power by α and the old power average by (1-α), sums them (step 280), and stores the result as a new power average (step 290). This process is repeated for all other reference channels (step 300) and the total sum of power averages of all reference channels is stored (step 310). The program multiplies the power of a main channel sample (retrieved at step 321) by α and an old main channel power average (retrieved at step 322) by (1-α), sums them (step 320) and stores them as a new main channel power average (step 330).
The program then multiplies the main channel power with a threshold to obtain a normalized main channel power average (step 340). The program subtracts the total reference channel power average (retrieved at step 341) from the normalized main channel power average to produce a difference (step 350). If the difference is positive, the program goes back to step 120 where it simply waits for further samples.
If the difference is negative, the program enters a weight-updating routine. The program calculates a new filter weight by adding [2 x adaptation constant x reference channel sample (retrieved at step 361) x output (retrieved at step 362)] to an old filter weight (retrieved at step 363) to update the weight (step 360) and stores the result (step 370).
The program performs the FFT of the new filter weights to obtain their frequency representation (step 380). The frequency representation values are divided into several frequency bands and stored into a set of frequency bins (step 390). The frequency representation values in each bin are compared with a threshold associated with each frequency bin (step 400). If the values exceed the threshold, the values are truncated to the threshold (step 410). The program performs the IFFT to convert the truncated frequency representation values back to filter weight values (step 420) and stores them (step 430). The program repeats the weight-updating routine, steps 360-430, for all other reference channels and associated adaptive filters (step 440). The program then goes back to step 120 to wait for an interrupt for a new round of processing samples (step 450).
The microphone array of the present invention may be embodied as a digital super directional array™ (DSDA) 120 shown in Fig. 12A. The DSDA 120 shown in Fig. 12A is formed of a substantially cylindrical housing or "wand" which is elongated in one direction with the microphone elements of the array arranged therein and aligned with slats or any other suitable spacing for allowing sound to be received by the microphones in the array. It will be appreciated that the DSDA 120 may be incorporated into a keyboard 122 as shown in Fig. 12B, an automobile visor 124 as shown in Fig. 12C, an automobile mirror 126 as shown in Fig. 12D, a mouse 128 as shown in Fig. 12E or a video camera 130 as shown in Fig. 12F.
The keyboard 122 shown in Fig. 12B incorporates the DSDA 120 therein with the microphones aligned with slats or any other suitable spacing for allowing sound to be received by the microphone. The DSDA processing may be performed by hardware implementing the adaptive beamforming technique of the present invention, or the microphone array signals may be coupled to a computer (not shown) through the serial keyboard port, COM port, LPT port, USB port or other suitable means such as radio frequency or infra-red transmission. In the computer implementation, the adaptive beamforming technique is performed by software installed in the computer. The DSDA may be flush with the keyboard so as not to be distinguishable or may be formed within a raised portion which serves to position the DSDA closer to the computer user's mouth. The raised portion may be elongated in a direction toward a position commensurate with a typical position of the computer user's mouth such as directly in front of and above the keyboard. Alternatively, the DSDA may be housed in a boom or a wand which is attached to the computer at one end, either fixedly or non-fixedly such as by a hinge, or may be coupled by a connecting wire. Alternatively, the DSDA may be wirelessly coupled to the keyboard by any suitable wireless transmission means. In the last instance, the keyboard may include a receiving platform such as a stand or a depression for receiving the DSDA, whereby the DSDA is removed by the user from the receiving platform and either spoken into like a hand-held microphone or placed by the user in any convenient location. Fig. 12B further shows that the DSDA array may be configured as a microphone at one or more corners of the keyboard including all four corners. In addition, the DSDA may be configured as a plurality of microphones in any or all of the corners of the keyboard such as two or four microphones in each corner.
The DSDA may be embodied integrally with the keyboard, such as a flip-up style accessory built into the keyboard which otherwise is unnoticeable when concealed within the keyboard and, when tilted upward, advances beyond the surface of the keyboard to expose the microphone array.
Fig. 12B further illustrates the DSDA integrated into the bottom of the keyboard with the slats or other means for allowing sound to enter the microphones adjacent thereto to receive audio signals. With this arrangement, there is provided a small gap between the bottom of the keyboard and a supporting surface such as a desktop, which creates a pressure zone microphone effect between the downward-facing DSDA and the supporting surface and minimizes acoustic reflections to achieve direct sound reception by the DSDA. In at least one embodiment, the DSDA is configured on the bottom of the keyboard adjacent to or at the leading edge. The instant invention takes into account the tendency of digital processing to fail to remove acoustic reflections caused by audible sounds reflecting off surfaces in a room or other objects therein. This so-called "hollow effect" is minimized in the present invention by providing the DSDA beneath the keyboard whereby the acoustic reflections are minimized due to the slight air gap between the DSDA and the supporting surface which creates the pressure zone effect. It will be appreciated that the pressure zone microphone in accordance with the present invention may be created with any peripheral, including those shown in Figs. 12A-F, by forming the DSDA between the peripheral and a supporting surface. Fig. 12C illustrates a further embodiment of the DSDA which is housed in a substantially flat housing 134 having two substantially parallel sides elongated in one direction, with one side including slats or other suitable means for allowing sound to be received by the microphones arranged adjacent thereto. It will be appreciated that the DSDA has a substantially flat profile such that the DSDA may fit snugly between an upper surface of the visor 124 of the automobile and a ceiling of the interior of the automobile.
A holding member 136 may be included for holding the DSDA which includes a pair of opposed pincer-like arms 138 which receive and hold therein the DSDA. It is possible that each arm includes a distal portion formed such that a spacing between opposed distal portions is slightly less than a width of the DSDA and a spacing between opposed arms is substantially the same width as the DSDA. The DSDA is inserted into said holding member by forcibly inserting the DSDA between said pair of opposed arms, causing the opposed arms to be slightly separated such that the DSDA is slid therebetween and afterward the DSDA is snapped in place in an area formed between the opposed arms. Alternatively, the DSDA is slid into the area formed between the opposed arms from one side. In this last instance, the DSDA housing may be formed with a longitudinal groove which meets a protruding portion of the holding member, such as the distal portion of one or more of the arms or a nodule formed in said holding member specifically constructed for meeting the groove, for holding the DSDA within the holding member. The DSDA may be affixed to the visor, or any portion of the automobile for that matter, by use of suction cups such as on the window or the dashboard, magnetic strips formed on the DSDA and the surface where the DSDA is to be mounted or Velcro™, for example. In addition, hooking members 132 substantially formed in a shape resembling clothespins may be provided for hooking the DSDA or holding member which holds the DSDA to the visor whereby the hooking members include a spacing between flexibly rigid opposed members which are curved to provide increasing resistance when spread apart.
In one embodiment, one side of the hooking member engages an opening or slot 140 formed within the holding member while the opposite side of the hooking member engages the bottom of the visor such that the visor is sandwiched between the opposed sides of the hooking member firmly enough such that the visor can be swung open and closed without the hooking members losing the holding member or the DSDA held therein.
It will be appreciated that the DSDA coupled to the visor of an automobile creates a small air gap between the DSDA and the ceiling of the automobile thereby generating a pressure zone effect and minimizing acoustic reflections within the automobile.
Fig. 12D illustrates the DSDA integrated into a rear-view mirror 126 of an automobile. In this instance, the microphones of the array are spaced along the rim of the rear-view mirror. It will be appreciated that this microphone arrangement has similar properties to the array shown in Fig. 26. A flexible, tubular housing may be provided for housing the microphones in the array such that the tubular housing may be applied by the driver by fitting the tubular housing around the rear-view mirror for ease of installation. In the alternative, the DSDA may be provided in a long wand-like member attached to the rear-view mirror 126. The processing hardware for processing the sound received by the microphone array may be incorporated into the interior of the rear-view mirror or, alternatively, the sound may be transmitted via wired or wireless transmission means to a remote processor within the automobile. It is within the scope of the invention to provide, as a separate processor or integrated with the processor for processing the audio signal, a processor for controlling automobile components such as the radio, carphone or global positioning satellite navigation system, for example, in accordance with the processed sound received by the DSDA.
Fig. 12E illustrates the DSDA integrated within a mouse 128 wherein the microphones of the microphone array are disposed adjacent slats or other means for suitably allowing sound to be received by the microphones. The mouse is otherwise a standard mouse except for the DSDA. However, additional features may be linked to the mouse keys which affect array performance such as array volume, beam direction, setting an array type, array tuning, etc. As with the other embodiments of the DSDA, the mouse may be wired or wirelessly connected to the DSDA processing circuitry and/or a personal computer. Fig. 12F illustrates the DSDA integrated with a video camera such as that used for video teleconferencing over, for example, the internet. In this instance, the DSDA may be incorporated as a peripheral to the video camera, coupled to the video camera by coupling means such as a microphone plug and attached by, for example, a holding member or Velcro™. In the alternative, the DSDA in this embodiment may be incorporated into the video camera. As with the other embodiments of the DSDA, the DSDA may include wireless transmission to the video camera such that the video operator may place the DSDA in the vicinity of the talent, such as an actor or actress, and record the scene from a distance.
Fig. 12G illustrates the noise canceling stethoscope of the present invention which incorporates the DSDA. The noise canceling stethoscope is described in U.S. Patent Serial No. 08/963,164 filed November 4, 1997 (now U.S. Patent No. 5,909,495 issued June 1, 1999), incorporated herein by reference, which one skilled in the art will appreciate may be integrated with the spectral subtraction techniques herein described and/or microphone array technology. Thus, the present invention is applicable to medical applications including ultrasound for canceling noise when reading ultrasound vibrations echoed in a body and retrieved for reconstruction on a display of the portion of the body including, for example, ultrasound examinations for imaging fetuses. It will be instantly recognized that removing noise in such medical applications from either sound received by the noise canceling stethoscope or ultrasound advantageously cancels noise, thereby providing improved audio signals for the noise canceling stethoscope or for imaging in ultrasound. It will be appreciated that the DSDA or microphone may be incorporated in the noise canceling stethoscope or ultrasound device and/or the hardware/software for processing the audio to remove the noise may be incorporated in those devices as well. In addition, the DSDA of the present invention is incorporated into the remote control and keypad for the set top box as illustrated in Fig. 12H. The remote control, as described in copending U.S. Appln. Ser. No. 09/050,196 (filed March 30, 1998) and copending international application PCT/US99/06764 (filed March 29, 1999), both incorporated herein by reference, is operable to input textual data on numeric and/or alphabetic keypad(s) and includes the ability to wirelessly transmit speech commands received by the microphone (DSDA) to the set top box. The UVI may be incorporated into the remote control to interface the speech signals received by the DSDA or the remote control may include the noise cancellation/reduction processing herein described. In the alternative, the set top box may include the speech processing.
The set top box as described interfaces to a television to provide both operation of the television and external sources such as cable services, internet or other on-line service. To that end, the set top box may incorporate the processing ability for supporting internet access. Fig. 13 illustrates the universal voice interface 142 which may embody the DSDA. It will be appreciated that any type of microphone may be incorporated as the UVI, including an electret, stereo, unidirectional or multi-directional microphone. The received audio signals are transferred from the universal voice interface by any appropriate means including wired or wireless transmission such as infrared or radio frequency transmission. It will be appreciated that the universal voice interface may comprise the microphone by itself or include interface circuitry such as analog-to-digital converters and a multiplexer for interface into a computer processor. The audio signals are received by any known communication port of a computer including the serial or parallel port or, for that matter, the USB port. A device driver may be included for driving the processor 144 or the processor itself may strobe the appropriate port register for the audio signals converted into digital data. It will also be appreciated that the audio signals may be input to a sound card installed in the computer and then forwarded by the appropriate device driver to the processor 144. On the other hand, either the device driver or the sound card may provide the processing circuitry or software for processing the audio signal to remove noise. In any case, the audio signals are signal processed to remove the noise in accordance with the adaptive beam forming techniques described herein.
In addition, or in the alternative, the audio processing may include noise cancellation, which inverts the noise portion of the signal, extracted by a separate microphone or by spectral subtraction processing, and subtracts it from the main reference signal.
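A minimal single-frame sketch of the spectral-subtraction idea mentioned here; the frame-based structure, the spectral floor, and all names are illustrative assumptions:

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.01):
    """Minimal spectral-subtraction sketch (one analysis frame).

    noise_mag is a magnitude spectrum estimated during a noise-only
    period; the floor value is an illustrative assumption.
    """
    spectrum = np.fft.rfft(frame)
    mags = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Subtract the noise magnitude estimate; clamp at a small spectral
    # floor to avoid negative magnitudes (a source of "musical noise").
    clean = np.maximum(mags - noise_mag, floor * mags)
    return np.fft.irfft(clean * np.exp(1j * phase), n=len(frame))
```

A practical implementation would apply this per overlapping windowed frame and update the noise estimate during speech pauses.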
Fig. 14 illustrates the universal voice interface circuitry. Of course, the universal voice interface may be provided by software. In this example, the universal voice interface is coupled to the DSDA incorporated around the rim of the rear-view mirror 126. It will be appreciated that other microphone arrangements may be incorporated with the universal voice interface circuitry such as those illustrated in Figs. 12A-F. The universal voice interface circuitry may be incorporated into the microphone arrangement or coupled thereto by any appropriate transmission means including wired or wireless transmission. The UVI interfaces the analog audio signals received by the microphone arrangement to a digital processor such as that found in a personal computer and, therefore, includes an analog-to-digital converter series 146, each A/D converter in the series corresponding to a microphone in the DSDA. Of course, a single analog-to-digital converter may be provided where, for example, a single microphone element is employed. In this instance, the A/D converters 146 are driven by a 44KHz clock controlled by the microprocessor 150; but the clock may be of any clock speed corresponding to the processor speed of the system. Also in this example, the A/D converters 146 output 16-bit samples; but the sample size may vary for different systems or applications. The digitized samples are coupled to a multiplexer
148 where they are multiplexed at the control of the microprocessor in a predetermined order and forwarded to the microprocessor 150. The figure shows that the samples are forwarded to the microprocessor on a 16-bit channel; but the channels may be of any bandwidth including a bit stream. The microprocessor may include processing hardware/software for processing the audio samples in a format which agrees with the later digital speech processor. In addition, the microprocessor 150 may provide the audio processing such as the adaptive beam forming techniques or noise cancellation techniques herein described. This example illustrates that the microprocessor 150 is manufactured as an application-specific integrated circuit (ASIC); but the present invention also may be practiced as other IC structures. There is provided a dedicated adaptive filter 152 for assisting the microprocessor 150 in the adaptive beam forming techniques herein described. The processed audio signal is forwarded to a digital speech processor, such as the speech recognition unit which recognizes and controls audio-driven components.
Fig. 15 shows the adaptive beam forming technique of the present invention incorporated in the UVI. In the example shown in Fig. 15, the audio signals received by each microphone in the DSDA are received by the A/D converter 146 and, once digitally converted, are coupled to corresponding band pass filters 154 which act as digital samplers. It will be appreciated that the A/D converter 146 may also act as a band pass filter whose filtering characteristic is controlled by the clock rate which operates the A/D converter. A direction calculation unit 156 is provided for calculating the direction of an audio sound source, which drives the band pass filters to steer the direction in which sound is primarily received in accordance with the direction calculated by the direction calculation unit 156. The main channel matrix 158 is provided for receiving the signals in the main channel of the beam formed by the direction calculation unit 156 in accordance with weights provided from the direction calculation unit. The reference channel matrix 160 is provided for receiving the audio signals which are substantially not within the beam formed by the direction calculation unit 156. It will be appreciated that the direction calculation unit is controlled by the system controller 172. Down converters 162, 164 are provided to down convert the signals received by the main and reference channels respectively. The dedicated adaptive filter 166 adaptively processes the audio signals in accordance with the adaptive beam forming techniques described herein. An arithmetic logic unit 168 is provided for subtracting from the main channel the adaptively formed reference channel noise as controlled by the system controller 172. The resulting substantially noise-free signal is provided to a multiplexer 170 which multiplexes the noise-free signal with the main channel signal as controlled by the system controller 172.
In this manner, the system controller 172 controls the multiplexer 170 to either select the main channel or the channel with noise removed. It is shown in the figure that the multiplexer is a four input multiplexer; however, any other equivalent means may be provided for selecting between the signals. The multiplexed signal is output to a digital speech processor such as a speech recognition processor which recognizes speech and controls audio-driven components in response thereto.
Figs. 16 and 17 illustrate the operation of the universal voice interface. In step 174 the system is reset. Control advances to step 176 wherein the direction calculation unit is enabled. A determination is made in step 178 whether the direction calculation is in error and control advances to step 180 where a system alarm is raised if the answer to the determination is in the affirmative. Otherwise, the direction calculation is correct and control advances to step 182 wherein a direction result is awaited. When the direction result is obtained, control advances to step 184 wherein the main and reference channel weights are set for the respective main and reference channel matrices. Control advances to step 186 wherein the system awaits a ready signal and, upon receiving the ready signal, step 188 enables the down conversion. The operation is further described in Fig. 17, wherein step 190 enables the dedicated adaptive filter. If a filter error is detected in step 192, an alarm is raised and the system is reset in step 196. In step 194, the filtered result is awaited and, upon receiving the filtered result, the arithmetic logic unit is enabled in step 198. If the ALU commits an error as determined by step 200, an alarm is raised and the system is reset in step 210. Otherwise, the multiplexer is enabled in step 212.
Fig. 18 shows the universal voice interface incorporated with a computer monitor wherein the microphones of the DSDA array are situated around the perimeter of the face of the monitor facing the computer user. In this example, the UVI incorporates A/D converters 146 which digitize the audio signals received from the respective microphones in the array. In at least one embodiment, the A/D converters 146 are incorporated into the body of a standard personal computer plug which plugs into any standard type of personal computer port. In addition, it is within the scope of the present invention to power the analog-to-digital converters 146 through the power pin of the personal computer port. In the example shown in the figure, the plug is an RS-232 Interface/Parallel Port Plug which plugs into a corresponding RS-232/Parallel port. In the example, there is shown a monitor; however, it is well within the scope of the present invention to incorporate the microphone arrangement, which includes either the microphone array or the several types of microphones described, in virtually any fixture or appliance such as those shown in Figs. 12A-F.
A. System Implementation
1. Sub-band Processing
FIG. 19 shows one preferred embodiment of the present invention using sub-bands where an adaptive filter is driven from the sub-bands rather than the entire bandwidth of the input signal. Sub-bands result from partitioning a broader band in any manner as long as the sub-bands can be combined together so that the broader band can be reconstructed without distortions. One may use a so-called "perfect reconstruction structure" as known in the art to split the broadband into sub-bands and to combine the sub-bands together substantially without distortion. For details on perfect reconstruction structures, see P.P. Vaidyanathan, Quadrature Mirror Filter Banks, M-Band Extensions and Perfect-Reconstruction Techniques, IEEE ASSP Magazine, pp. 4-20, July 1987. In the preferred embodiment, a broader band is partitioned into sub-bands, using several partitioning steps successively through intermediate bands. Broadband inputs from an array of sensors, 191a-191d, are sampled at an appropriate sampling frequency and entered into a main-channel matrix 192 and a reference-channel matrix 193. The main-channel matrix generates a main channel, a signal received in the main looking direction of the sensor array, which contains a target signal component and an interference component. F1, 194, and F2, 195, are splitters which first split the main channel into two intermediate bands, followed by down-sampling by two. Down-sampling is a well-known procedure in digital signal processing. Down-sampling by two, for example, is a process of sub-sampling by taking every other data point. Down-sampling is indicated by a downward arrow in the figure. Splitters F3, 196 and F4, 197 further split the lower intermediate band into two sub-bands followed by down-sampling by two.
In an example using a 16 Khz input signal, the result is a 0-4 Khz lower sub-band with 1/4 of the input sampling rate, a 4-8 Khz upper sub-band with 1/4 of the input sampling rate, and another upper 8-16 Khz intermediate band with 1/2 of the input sampling rate.
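A minimal sketch of one halfband split with down-sampling by two; a Haar pair is used purely for brevity, whereas a practical system would use longer quadrature-mirror filters such as F1-F4:

```python
import numpy as np

def haar_split(x):
    """One halfband split with down-sampling by two (Haar QMF sketch).

    Haar is the shortest perfect-reconstruction filter pair; it stands
    in here for the patent's F1-F4 splitters.
    """
    x = x[: len(x) // 2 * 2]                   # force even length
    low = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # lowpass branch, decimated by 2
    high = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # highpass branch, decimated by 2
    return low, high
```

Applied twice, as in the 16 Khz example above, the first split yields the 8-16 Khz band at half rate, and splitting the lower branch again yields the 0-4 Khz and 4-8 Khz sub-bands at a quarter of the input rate.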
The reference channels are processed in the same way by filters F1, 198, and F2, 199, to provide only the lower sub-band with 1/4 of the input sampling rate, while the other sub-bands are discarded. The lower sub-bands of the reference channels are fed into an adaptive filter 1910, which generates canceling signals approximating interferences present in the main channel. A subtracter 1911 subtracts the canceling signals from the lower sub-band of the main channel to generate an output in the lower sub-band. The output is fed back to the adaptive filter for updating the filter weights. The adaptive filter processing and the subtraction are performed at the lower sampling rate appropriate for the lower sub-band. At the same time the other upper bands of the main channel are delayed by delay units, 1912 and 1913, each by an appropriate time, to compensate for various delays caused by the different processing each sub-band is going through, and to synchronize them with the other sub-bands. The delay units can be implemented by a series of registers or a programmable delay. The output from the subtracter is combined with the other two sub-bands of the main channel through the reconstruction filters H1-H4, 1914-1917, to reconstruct a broadband output. H1-H4 may be designed such that they, together with F1-F4, provide a theoretically perfect reconstruction without any distortions. Reconstructors H3 and H4 combine the lower and upper sub-bands into a low intermediate band, followed by an interpolation by two. Interpolation is a well-known procedure in digital signal processing. Interpolation by two, for example, is an up-sampling process which increases the number of samples by inserting interpolated values between the existing data points. Up-sampling is indicated by an upward arrow in the figure. The reconstructors H1, 1916 and H2, 1917 further combine the two intermediate bands into a broadband.
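The reconstruction side can be sketched as the inverse of a Haar halfband split (an illustrative stand-in for the H1-H4 filters); up-sampling and combining the two half-rate bands recovers the original samples exactly, which is the "perfect reconstruction" property cited above:

```python
import numpy as np

def haar_merge(low, high):
    """Inverse of a Haar halfband split: up-sample by two and combine.

    With the matching split (low = (x[2n]+x[2n+1])/sqrt(2),
    high = (x[2n]-x[2n+1])/sqrt(2)), reconstruction is distortion-free.
    """
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)  # even samples
    x[1::2] = (low - high) / np.sqrt(2.0)  # odd samples
    return x
```

Longer filter banks achieve the same property only when the analysis and synthesis filters are jointly designed, which is why H1-H4 must be matched to F1-F4.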
In the preferred embodiment described, non-adaptive filter processing is performed in the upper bands of 4-16 Khz. Adaptive filter processing is performed in the lower sub-band of 0-4 Khz where most of the interferences are located. Since there is little computation overhead involved in the non-adaptive filter processing, the use of non-adaptive filter processing in the upper bands can reduce the computational burden significantly. The result is superior performance without an expensive increase in the required hardware.
2. Broadband Processing with Band-Limited Adaptation
FIG. 20 shows another preferred embodiment using broadband processing with band-limited adaptation. Instead of using sub-band canceling signals which act on a sub-band main channel, the embodiment uses broadband canceling signals which act on a broadband main channel. But, since adaptive filter processing is done in a low-frequency domain, the resulting canceling signals are converted to broadband signals so that they can be subtracted from the broadband main channel.
As before, broadband inputs from an array of sensors, 2021a-2021d, are sampled at an appropriate sampling frequency and entered into a main-channel matrix 2022 and a reference-channel matrix 2023. The main-channel matrix generates a main channel, a signal received in the main-looking direction, which has a target signal component and an interference component. The reference-channel matrix generates reference channels representing interferences received from all other directions. A low-pass filter 2025 filters the reference channels and down-samples them to provide low-frequency signals to an adaptive filter 2026.
The adaptive filter 2026 acts on these low-frequency signals to generate low-frequency canceling signals which estimate a low-frequency portion of the interference component of the main channel. The low-frequency canceling signals are converted to broadband signals by an interpolator 2028 so that they can be subtracted from the main channel by a subtracter 2029 to produce a broadband output.
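The Fig. 20 data flow can be sketched as a block routine; the crude averaging lowpass, zero-order-hold interpolator, function name, and constants below are illustrative assumptions rather than the patent's actual filters:

```python
import numpy as np

def band_limited_cancel(main, ref, mu=0.05, factor=4, taps=8):
    """Block sketch of broadband processing with band-limited adaptation.

    The reference channel is down-sampled, an LMS filter adapts at the
    low rate, and the canceling signal is interpolated back to the
    broadband rate before subtraction (all names are hypothetical).
    """
    def down(x):  # crude lowpass (block average) + down-sample by `factor`
        n = len(x) // factor * factor
        return x[:n].reshape(-1, factor).mean(axis=1)

    ref_lo = down(ref)
    main_lo = down(main)              # low-frequency feedback path
    w = np.zeros(taps)
    buf = np.zeros(taps)
    cancel_lo = np.zeros(len(ref_lo))
    for n in range(len(ref_lo)):      # low-rate adaptive filtering (LMS)
        buf = np.roll(buf, 1)
        buf[0] = ref_lo[n]
        cancel_lo[n] = w @ buf
        err = main_lo[n] - cancel_lo[n]
        w += 2.0 * mu * err * buf     # LMS weight update at the low rate
    # Interpolate the canceling signal back to the broadband rate
    # (zero-order hold here) and subtract from the main channel.
    cancel = np.repeat(cancel_lo, factor)
    return main[: len(cancel)] - cancel
```

In the patent's structure the subtraction output, not the raw main channel, is low-pass filtered for feedback, and the main channel is delayed to stay aligned with the interpolated canceling signal; both refinements are omitted here for brevity.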
The broadband output is low-pass filtered and down-sampled by a filter 2024 to provide a low-frequency feedback signal to the adaptive filter 2026. In the meantime, the main channel is delayed by a delay unit 2027 to synchronize it with the canceling signals from the adaptive filter 2026.
3. Broadband Processing with an External Main-Channel Generator
FIG. 21 shows yet another preferred embodiment similar to the previous embodiment except that an external main-channel generator is used instead of a main-channel matrix to obtain a broadband main channel. This embodiment is useful when it is desired to take advantage of the broadband capabilities of commercially available hi-fi microphones.
A broadband input is obtained by using an external main-channel generator, such as a shotgun microphone 2143 or a parabolic dish 2144. The broadband input is sampled through a high fidelity A-to-D converter 2145. The sampling rate should preferably be high enough to maintain the broad bandwidth and the audio quality of the external main-channel generator.
A reference-channel matrix 2142 is used to obtain low-frequency reference channels representing interferences in the low-frequency domain. Since adaptive filter processing is done in the low-frequency domain, the reference-channel matrix does not need a broadband capability.
A subtracter 2150 is used to subtract canceling signals estimating interferences from the broadband input. The broadband output is filtered by a low-pass filter 2146 which also performs down-sampling. The low-pass filtered output and the low-frequency reference channels are provided to an adaptive filter 2147. The adaptive filter acts on these low-frequency signals to generate low-frequency canceling signals. In the meantime, the broadband input is delayed by a delay unit 2148 so that it can be synchronized with the canceling signals from the adaptive filter 2147. The delay unit can be implemented by a series of registers or by a programmable delay. The low-frequency canceling signals are converted to broadband canceling signals by an interpolator 2149 so that they can be subtracted from the broadband main channel to produce the broadband output.
B. Software Implementation
The invention described herein may be implemented using a commercially available digital signal processor (DSP) such as Analog Devices' ADSP-2100 Series or any other general-purpose microprocessor. For more information on the ADSP-2100 Series, see Analog Devices, ADSP-2100 Family User's Manual, 3rd Ed., 1995.
1. Sub-Band Processing
FIGS. 22A-22D are a flow chart depicting the operation of a program in accordance with the first preferred embodiment of the present invention using sub-band processing.
Upon starting at step 22100, the program initializes registers and pointers as well as buffers (steps 22110-22120). When a sampling unit sends an interrupt (step 22131) that samples are ready, the program reads the sample values (step 22130), and stores them in memory (step 22140). The program retrieves the input values (step 22151) and main-channel matrix coefficients (step 22152) to generate a main channel by filtering the inputs values using the coefficients (step 22150), and then stores the result in memory (step 22160).
The program retrieves the input values (step 22171) and reference-channel matrix coefficients (step 22172) to generate a reference channel by filtering the input values using the coefficients (step 22170), and then stores the result (step 22180). Steps 22170 and 22180 are repeated to generate all other reference channels (step 22190).
The program retrieves the main channel (step 22201) and the F1 filter coefficients (step 22202) to generate a lower intermediate band with 1/2 of the sampling rate appropriate for the whole main channel by filtering the main channel with the coefficients and down-sampling the filtered output (step 22210), and then stores the result (step 22220). Similarly, the F2 filter coefficients are used to generate an upper intermediate band with 1/2 of the sampling rate (step 22240). The F3 and F4 filter coefficients are used to further generate a lower sub-band with 1/4 of the sampling rate (step 22260) and an upper sub-band with 1/4 of the sampling rate (step 22280).
The program retrieves one of the reference channels (step 22291) and the F1 filter coefficients (step 22292) to generate an intermediate band with 1/2 of the sampling rate by filtering the reference channel with the coefficients and down-sampling the filtered output (step 22290), and then stores the result (step 22300). Similarly, the F2 filter coefficients are used to generate a lower sub-band with 1/4 of the sampling rate (step 22320). Steps 22290-22320 are repeated for all the other reference channels (step 22330). The program retrieves the reference channels (step 22341) and the main channel (step 22342) to generate canceling signals using an adaptive beamforming process routine (step 22340). The program subtracts the canceling signals from the main channel to cancel the interference component in the main channel (step 22350).
The program then interpolates the output from the adaptive beamforming process routine (step 22360) and filters it with the H3 filter coefficients (step 22361) to obtain an up-sampled version (step 22370). The program also interpolates the main channel in the lower band (step 22380) and filters it with the H4 filter coefficients (step 22381) to obtain an up-sampled version (step 22390). The program combines the up-sampled versions to obtain a lower intermediate main channel (step 22400). The program interpolates the lower intermediate main channel (step 22410) and filters it with the H1 filter coefficients (step 22411) to obtain an up-sampled version (step 22420). The program also interpolates the upper intermediate main channel (step 22430) and filters it with the H2 filter coefficients (step 22431) to obtain an up-sampled version (step 22440). The program combines the up-sampled versions to obtain a broadband output (step 22450).
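The filter-then-resample pattern that the steps above walk through can be illustrated in a few lines. The following is a minimal sketch only, using a hypothetical 3-tap half-band low-pass as a stand-in for the F1/H1 coefficient sets (the actual coefficients are not given in this section); it shows one analysis branch (filter and down-sample by 2) and the matching synthesis branch (zero-insert up-sample by 2 and interpolate).

```python
import numpy as np

def analyze(x, h):
    """One analysis branch: FIR-filter x with h, then down-sample by 2."""
    y = np.convolve(x, h)[:len(x)]
    return y[::2]

def synthesize(sub, h, n):
    """Matching synthesis branch: up-sample by 2 (zero insertion), then interpolate."""
    up = np.zeros(n)
    up[::2] = sub
    return 2.0 * np.convolve(up, h)[:n]  # factor 2 restores the gain lost to zero insertion

# Hypothetical stand-in coefficients (not the patent's F1/H1 sets).
h = np.array([0.25, 0.5, 0.25])

x = np.random.default_rng(0).standard_normal(64)
low = analyze(x, h)             # lower intermediate band at 1/2 the sampling rate
lowlow = analyze(low, h)        # lower sub-band at 1/4 the sampling rate
y = synthesize(low, h, len(x))  # up-sampled version at the original rate
```

Each further sub-band is obtained by applying the same analysis branch again to a half-rate band, exactly as steps 22260-22280 apply F3/F4 to the intermediate bands.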
2. Broadband Processing with Frequency-Limited Adaptation
FIGS. 23A-C are a flow chart depicting the operation of a program in accordance with the second preferred embodiment of the present invention using broadband processing with frequency-limited adaptation. Upon starting at step 23500, the program initializes registers and pointers as well as buffers (steps 23510-23520). When a sampling unit sends an interrupt (step
23531) that the samples are ready, the program reads the sample values (step 23530), and stores them in memory (step 23540). The program retrieves the broadband sample values (step 23551) and the main-channel matrix coefficients (step 23552) to generate a broadband main channel by filtering the broadband sample values with the coefficients (step 23550), and then stores the result in memory (step 23560).
The program retrieves the broadband samples (step 23571) and reference-channel matrix coefficients (step 23572) to generate a broadband reference channel by filtering the samples using the coefficients (step 23570), and then stores the result (step 23580). Steps 23570 and 23580 are repeated to generate all the other reference channels (step 23590).
The program retrieves the reference channels (step 23601), which are down-sampled (step 23602), the main channel (step 23603), which is also down-sampled to the low sampling rate (step 23604), and the low-frequency output (step 23605) to generate a low-frequency canceling signal (step 23600) using an adaptive beamforming process routine. The program updates the adaptive filter weights (step 23610) and interpolates the low-frequency canceling signal to generate a broadband canceling signal (step 23620). Steps 23610-23620 are repeated for all the other reference channels (step 23630).
The program subtracts the canceling signals from the main channel to cancel the interference component in the main channel (step 23640).
The program low-pass filters and interpolates the broadband output (step 23650) so that the low-frequency output can be fed back to update the adaptive filter weights.
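The adaptive beamforming routine that generates the canceling signal is not spelled out in this section, but its core operation can be illustrated with a plain LMS noise canceller: the reference channel is adaptively filtered to predict the interference in the main channel, and the prediction is subtracted. The following is a minimal sketch under assumed parameters (8 taps, step size 0.05), not the patent's actual routine.

```python
import numpy as np

def lms_cancel(main, ref, taps=8, mu=0.05):
    """Adapt an FIR filter so that the filtered reference channel predicts
    (and thus cancels) the interference component of the main channel."""
    w = np.zeros(taps)
    out = np.zeros(len(main))
    for t in range(taps, len(main)):
        x = ref[t - taps:t][::-1]   # most recent reference samples, newest first
        y = w @ x                   # canceling signal
        out[t] = main[t] - y        # interference-reduced output
        w += mu * out[t] * x        # LMS weight update driven by the residual
    return out, w

rng = np.random.default_rng(1)
ref = rng.standard_normal(4000)   # reference channel (interference)
main = 0.8 * np.roll(ref, 2)      # main channel: delayed, scaled interference
out, w = lms_cancel(main, ref)
```

After convergence, the residual power of `out` is far below that of `main`, which is the intended effect of step 23640.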
3. Broadband Processing with an External Main-Channel Generator
FIGS. 24A-24C are a flow chart depicting the operation of a program in accordance with the third preferred embodiment of the present invention using broadband processing with an external main-channel generator.
Upon starting at step 24700, the program initializes registers and pointers as well as buffers (steps 24710-24720). When a sampling unit sends an interrupt (step 24731) that samples are ready, the program reads the sample values (step 24730), and stores them in memory (step 24740).
The program then reads a broadband input from the external main-channel generator (step 24750), and stores it as a main channel (step 24760). The program retrieves the low-frequency input (step 24771) and reference-channel matrix coefficients (step 24772) to generate a reference channel by multiplying the two (step 24770), and then stores the result (step 24780). Steps 24770 and 24780 are repeated to generate all the other reference channels (step 24790). The program retrieves the low-frequency reference channels (step 24801), the main channel (step 24802), which is down-sampled (step 24803), and a low-frequency output (step 24804) to generate low-frequency canceling signals (step 24800) using an adaptive beamforming process routine. The program updates the adaptive filter weights (step 24810) and interpolates the low-frequency canceling signal to generate the broadband canceling signal (step 24820). Steps 24810-24820 are repeated for all the other reference channels (step 24830). The program subtracts the broadband canceling signals from the broadband main channel to generate the broadband output with substantially reduced interference (step 24840).
The program low-pass filters and interpolates the broadband output (step 24850) so that the low-frequency output can be fed back to update the adaptive filter weights.
FIG. 25 shows the functional blocks of a preferred embodiment in accordance with the present invention. The embodiment deals with finding the direction of a sound source, but the invention is not limited to such. It will be understood by those skilled in the art that the invention can be readily used for finding the direction of other wave sources, such as an electromagnetic wave source. The system includes an array of microphones 2501 that sense or measure sound from a particular sound source and produce analog signals 2507 representing the measured sound. The analog signals 2507 are then sampled and converted to corresponding digital signals 2508 by an analog-to-digital (A-to-D) converter 2502. The digital signals 2508 are filtered by a band-pass filter 2503 so that the filtered signals 2509 contain only the frequencies in a specific bandwidth of interest for the purpose of determining the direction of the sound source. The filtered signals 2509 are then fed into an approximate-direction finder 2504, which calculates an approximate direction 2510 in terms of a microphone pair selected among the microphones. The precise-direction finder 2505 estimates the precise direction 2511 of the sound source based on the approximate direction. The validity of the precise direction 2511 is checked by a measurement qualification unit 2506, which invalidates the precise direction if it does not satisfy a set of measurement criteria. Each functional block is explained in more detail below.
5.1. Microphone Array
FIG. 26 shows an example of the array of microphones 2501 that may be used in accordance with the present invention. The microphones sense or measure the incident sound waves from a sound source and generate electronic signals (analog signals) representing the sound. The microphones may be omni, cardioid, or dipole microphones, or any combination of such microphones.
The example shows a cylindrical structure 2621 with six microphones
2622-2627 mounted around its periphery, and an upper, center microphone 2628 mounted at the center of the upper surface of the structure. The upper, center microphone is optional, but its presence improves the accuracy of the precise direction, especially the elevation angle. Although the example shows the microphones of the array in a circular arrangement, the microphone array may take on a variety of different geometries, such as a linear array or a rectangular array.
5.2 A-to-D Converter
The analog signals representing the sound sensed or measured by the microphones are converted to digital signals by the A-to-D converter 2502, which samples the analog signals at an appropriate sampling frequency. The converter may employ a well-known technique of sigma-delta sampling, which consists of oversampling and built-in low-pass filtering followed by decimation to avoid aliasing, a phenomenon due to inadequate sampling.
When an analog signal is sampled, the sampling process creates a mirror image of the original frequencies of the analog signal around the frequencies that are multiples of the sampling frequency. "Aliasing" refers to the situation where the analog signal contains information at frequencies above one half of the sampling frequency, so that the reflected frequencies cross over the original frequencies, thereby distorting the original signal. To avoid aliasing, an analog signal should be sampled at a rate of at least twice its maximum frequency component, known as the Nyquist rate.
In practice, a sampling frequency far greater than the Nyquist rate is used to avoid aliasing problems caused by system noise and less-than-ideal filter responses. This oversampling is followed by low-pass filtering to cut off the frequency components above the maximum frequency component of the original analog signal. Once the digital signal is band-limited in this way, its rate can be reduced by decimation: if the oversampling frequency is n times the Nyquist rate, decimation keeps one sample out of every n input samples.
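As a concrete illustration of the oversample, filter, and decimate sequence, the sketch below builds a windowed-sinc low-pass with its cut-off at half the decimated rate, then keeps one sample in n. The tap count and the Hamming window are illustrative choices, not values taken from the text.

```python
import numpy as np

def decimate(x, n, taps=31):
    """Low-pass filter to the new Nyquist band, then keep one sample out of every n."""
    m = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(m / n) / n          # ideal low-pass, cut-off at fs / (2 n)
    h *= np.hamming(taps)           # window to tame the truncation ripple
    h /= h.sum()                    # unity gain at DC
    y = np.convolve(x, h, mode="same")
    return y[::n]

out = decimate(np.ones(100), 5)     # a DC signal survives 5:1 decimation unchanged
```

The same structure, with different coefficients, underlies the 5:1 decimation performed at steps 32103-32105 of the software implementation.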
An alternative approach to avoid aliasing is to limit the bandwidth of the signals with an analog filter cutting off at half the sampling frequency, before the sampling process. This approach, however, would require an analog filter with a very sharp frequency cut-off characteristic.
5.3 Bandpass Filter
The purpose of the bandpass filter 2503 is to filter the signals sensed or measured by the microphones so that the filtered signals contain those frequencies optimal for detecting or determining the direction of the signals. Signals of too low a frequency do not produce enough phase difference at the microphones to accurately detect the direction. Signals of too high a frequency have less signal energy and are thus more subject to noise. By suppressing signals of the extreme high and low frequencies, the bandpass filter 2503 passes those signals of a specific bandwidth that can be further processed to detect or determine the direction of the sound source. The specific values of the bandwidth depend on the type of target wave source. If the source is a human speaker, the bandwidth may be between 300 Hz and 1500 Hz, where typical speech signals have most of their energy concentrated. The bandwidth may also be set by a calibration process, i.e., by trial and error: instead of using a fixed bandwidth during operation, a certain bandwidth is tried initially, and if too many measurement errors result, the bandwidth is adjusted to decrease the measurement errors so as to arrive at the optimal bandwidth.
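One way to realize such a band-pass filter is as the difference of two low-pass filters, one cutting at the upper band edge and one at the lower. The sketch below assumes an 8 kHz sampling rate and a 101-tap windowed-sinc design; both are illustrative values, not taken from the text.

```python
import numpy as np

def lowpass(fc, fs, taps=101):
    """Windowed-sinc low-pass FIR with cut-off fc (Hz) at sampling rate fs (Hz)."""
    m = np.arange(taps) - (taps - 1) / 2
    h = 2 * fc / fs * np.sinc(2 * fc / fs * m) * np.hamming(taps)
    return h / h.sum()              # normalize DC gain to exactly 1

fs = 8000.0
h_bp = lowpass(1500, fs) - lowpass(300, fs)   # pass band roughly 300-1500 Hz

def gain(h, f, fs):
    """Magnitude response of FIR h at frequency f (Hz)."""
    k = np.arange(len(h))
    return abs(np.sum(h * np.exp(-2j * np.pi * f / fs * k)))
```

A mid-band tone around 900 Hz passes essentially unattenuated, while DC and high frequencies are suppressed, which is the behavior the direction finder needs.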
5.4 Direction Estimation
For efficiency of computation, the system first finds the approximate direction of the sound source, without the burden of heavy computation, and subsequently calculates the precise direction by using more computation power. The approximate direction is also used to determine the subset of microphones that are relevant to subsequent refinement of the approximate direction. In some configurations, some of the microphones may not have a line of sight to the source, and thus may create phase errors if they participate in further refinement of the approximate direction. Therefore, a subset of microphones is selected that would be relevant to further refinement of the source direction.
5.4.1 Approximate-Direction Finding
FIG. 27 shows the approximate and exact direction finding units 2721, 2722.
FIG. 28 shows the approximate-direction finder 2821 in detail. It is based on the idea of specifying the approximate direction of the sound source in terms of a direction perpendicular to a pair of microphones. Let peripheral microphone pairs be the microphones located adjacent to each other around the periphery of the structure holding the microphones, except that a microphone located at the center of the structure, if any, is excluded. For each peripheral microphone pair, "pair direction" is defined as the direction in the horizontal plane, pointing from the center of the pair outward from the structure, perpendicular to the line connecting the peripheral microphone pair. "Sector direction" is then defined as the pair direction closest to the source direction, selected among the possible pair directions. If there are n pairs of peripheral microphones, there are n candidates for the sector direction.
The sector direction corresponding to the sound source is determined using a zero-delay cross-correlation. For each peripheral microphone pair, a correlation calculator 2831 calculates a zero-delay cross-correlation of the two signals received from the microphone pair, Xi(t) and Xj(t). It is known to those skilled in the art that such a zero-delay cross-correlation function, Rij(0), over a time period T can be defined by the following formula:

          T
Rij(0) =  Σ  Xi(t) Xj(t)
         t=0
It is noted that a correlation calculator is well-known to those skilled in the art and may be available as an integrated circuit. Otherwise, it is well-known that such a correlation calculator can be built using discrete electronic components such as multipliers, adders, and shift registers.
Among the peripheral microphone pairs, block 2832 finds the sector direction by selecting the microphone pair that produces the maximum correlation. Since signals having the same or similar phase are correlated with each other, the result is to find the pair with the same phase (equi-phase) or the least phase difference. Since the plane of equi-phase is perpendicular to the propagation direction of the sound wave, the pair direction of the maximum-correlation pair is, then, the sector direction, i.e., the pair direction closest to the source direction.
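A minimal sketch of this selection step: compute the zero-delay cross-correlation for each adjacent pair of signals and keep the pair with the maximum. The three-microphone layout below is hypothetical, chosen so that one pair is exactly in phase.

```python
import numpy as np

def sector_pair(signals, pairs):
    """Return the microphone pair with the largest zero-delay cross-correlation."""
    corr = [float(np.dot(signals[i], signals[j])) for i, j in pairs]
    best = int(np.argmax(corr))
    return pairs[best], corr[best]

# Hypothetical example: mics 0 and 1 receive the wave in phase, mic 2 with a delay.
t = np.arange(256)
s = np.sin(0.2 * t)
signals = np.stack([s, s, np.roll(s, 5)])
pair, peak = sector_pair(signals, [(0, 1), (1, 2), (2, 0)])
```

The in-phase pair (0, 1) wins, and its pair direction would be taken as the sector direction.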
Once the sector direction is found, block 2833 identifies the microphones that participate in further refinement of the approximate direction. "Sector" is defined as the subset of the microphones in the microphone array, which participate in calculating the precise direction of the sound source. For example, where some of the microphones in the array are blocked by a mechanical structure, the signals received by those microphones are not likely to be from direct-travelling waves, and thus such microphones should be excluded from the sector.
In one preferred embodiment, the sector includes the maximum-correlation peripheral microphone pair, another peripheral microphone adjacent to the pair, and a center microphone, if any. Of the two peripheral microphones adjacent to the maximum-correlation peripheral microphone pair, the one with the higher zero-delay cross-correlation is selected. The inclusion of the center microphone is optional, but it helps to improve the accuracy of the source direction, because otherwise three adjacent microphones would be arranged almost in a straight line. There may be other ways of selecting the microphones to be included in the sector, and information about such selection schemes may be stored in computer memory for easy retrieval during the operation of the system.
5.4.2 Precise-Direction Finding
The precise-direction finder 2722 (FIG. 29) calculates the precise direction of the sound source using a full cross-correlation. Block 2941 first identifies all possible combinations of microphone pairs within the sector. For each microphone pair identified, block 2942 calculates a full cross-correlation, Rij(τ), over a time period T using the following formula, well known to those skilled in the art:

          T
Rij(τ) =  Σ  Xi(t) Xj(t-τ)
         t=0
As mentioned before, a correlation calculator is well-known to those skilled in the art and may be available as an integrated circuit. Otherwise, it is well-known that such a correlation calculator can be built using discrete electronic components such as multipliers, adders, and shift registers.
Rij(τ) can be plotted as a cross-correlation curve. For each Rij(τ), block 2943 finds the delay, τs, corresponding to the peak point of the cross-correlation curve. Note that this peak-correlation delay τs lies at a sampling point. In reality, however, the maximum-correlation point may be located between sampling points. Therefore, block 2944 calculates such a maximum-correlation delay (which may lie between sampling points), τd, by interpolating the cross-correlation function with a parabolic curve (y = p x^2 + q x + r) fitted through the three samples around the peak:

C(k-1) = p (k-1)^2 + q (k-1) + r
C(k)   = p k^2     + q k     + r
C(k+1) = p (k+1)^2 + q (k+1) + r

By solving the above equations for p, q, and r, the maximum point is obtained by taking the derivative of the parabolic curve and setting it to zero. The maximum point lies at -q/(2p), which yields:

τd = (1/fs) ( k + (C(k-1) - C(k+1)) / (2 (C(k-1) - 2 C(k) + C(k+1))) )

where fs denotes the sampling frequency, k denotes the sampling point corresponding to τs, and C(k) is the cross-correlation value at sampling point k. The use of the interpolation technique improves the accuracy of the maximum-correlation delay, while eliminating the need for a very high sampling rate.
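The peak search plus parabolic refinement of blocks 2943-2944 can be sketched directly from the formula above; `c` holds the sampled cross-correlation values, and the boundary check mirrors the program flow described later in the software implementation (steps 32308-32309).

```python
import numpy as np

def peak_delay(c, fs):
    """Sub-sample delay of the cross-correlation peak via parabolic interpolation."""
    k = int(np.argmax(c))
    if k == 0 or k == len(c) - 1:          # peak at a boundary: no refinement possible
        return k / fs
    num = c[k - 1] - c[k + 1]
    den = 2.0 * (c[k - 1] - 2.0 * c[k] + c[k + 1])
    return (k + num / den) / fs            # tau_d, possibly between sampling points

# A correlation curve that is exactly parabolic, peaking at 2.3 samples,
# is recovered exactly by the interpolation.
c = -(np.arange(7) - 2.3) ** 2
tau = peak_delay(c, fs=1.0)
```

Because the interpolation is exact for a parabola, the recovered delay matches the true vertex even though no sample lies on it.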
Since each maximum-correlation delay calculated for each microphone pair indicates the direction of the sound source as measured by an individual microphone pair, the individual maximum-correlation delays are combined to estimate an average direction of the sound source. The estimation process provides a better indication of the source direction than each individual measured direction because it eliminates ambiguity problems inherent to each individual pair and provides a mechanism to verify the relevancy of the individual measurements by possibly eliminating those individual measurements that are far off from the source direction. Block 2945 calculates the precise direction of the sound source in terms of a vector in Cartesian coordinates, K = (Kx, Ky, Kz), from the vector of individual measured delays, Td, by solving the linear equation between K and Td. The time delay between any two sensors is equal to the projection of the distance vector between them along the K vector, divided by the sound velocity. Thus, the Td vector can be expressed as follows:
Td = - ( R K ) / c

where c is the speed of sound, and R denotes the matrix representing the geometry of the microphone array in terms of position differences among the microphones:

    [ X2-X1  Y2-Y1  Z2-Z1 ]
R = [  ...    ...    ...  ]
    [ Xn-X1  Yn-Y1  Zn-Z1 ]
Since the above equation is over-determined in that there are more constraints than variables, the least-square (LS) method is used to obtain the optimal solution. Defining the error as the difference between the measured time delay vector and the evaluated time delay, the error vector ε is given by:

ε = ( R K / c ) + Td
The solution depends on the covariance matrix Λ of the delay measurements, which is defined by

Λ = E{Td Td^T} - E{Td} E{Td}^T = COV{Td}

where E{} denotes the expected-value operator and {*}^T denotes the transpose of a matrix. The LS estimated solution, K, is then expressed by the following formula:

K = - c (R^T Λ^-1 R)^-1 R^T Λ^-1 Td
  = - c B Td

where {*}^-1 denotes the inverse of a matrix. For derivation of the equation, see A. Gelb,
Applied Optimal Estimation, The MIT Press, p. 103. Note that the B matrix depends only on the geometry of the microphone array, and thus can be computed off-line, without adding to the computational burden during direction determination.
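A numeric sketch of the LS step, using a hypothetical four-microphone geometry and an identity covariance Λ. The B matrix is formed once from the geometry alone; noiseless delays generated from a known direction are then recovered exactly by K = -c B Td.

```python
import numpy as np

c = 343.0                                  # speed of sound, m/s (assumed value)
# Hypothetical microphone positions (meters); mic 0 is the reference.
pos = np.array([[0.1, 0.0, 0.0],
                [0.0, 0.1, 0.0],
                [-0.1, 0.0, 0.0],
                [0.0, 0.0, 0.05]])
R = pos[1:] - pos[0]                       # position-difference matrix
Lam_inv = np.linalg.inv(np.eye(3))         # inverse covariance of the delay measurements
B = np.linalg.inv(R.T @ Lam_inv @ R) @ R.T @ Lam_inv  # geometry-only, precomputable

K_true = np.array([1.0, 0.0, 0.0])         # source direction along +x
Td = -(R @ K_true) / c                     # ideal measured delays: Td = -(R K) / c
K_est = -c * (B @ Td)                      # LS estimate: K = -c B Td
```

With noisy delays, K_est would be the least-square compromise among the pairs; here, with consistent delays, it reproduces K_true.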
Block 2946 converts K into polar coordinates. FIG. 30 shows the 3-dimensional coordinate system used in the present invention. An azimuth angle, φ, is defined as the angle of the source direction in the horizontal plane, measured clockwise from a reference horizontal direction (e.g. the x-axis). An elevation angle, Θ, is defined as the vertical angle of the source direction measured from the vertical axis (z-axis).
Block 2946 calculates φ and Θ from Kx, Ky, and Kz by converting the Cartesian coordinates to polar coordinates, solving the nonlinear equation between (Kx, Ky, Kz) and (φ, Θ):
[Kx]   [ sin(Θ) cos(φ) ]
[Ky] = [ sin(Θ) sin(φ) ]
[Kz]   [ cos(Θ)        ]

In the case of a 3-dimensional microphone array (FIG. 30) (with the upper microphone), the above equation yields three non-linear equations with two unknowns (φ, Θ). The problem is over-determined in that there are more equations than variables. The LS solution for (φ, Θ) has no closed-form expression, but a suboptimal, closed-form estimate can be found as:

φ = tan^-1( Ky / Kx )
Θ = tan^-1( sqrt(Kx^2 + Ky^2) / Kz )
If a 2-dimensional microphone array were used (without the upper microphone), block 2946 calculates φ and Θ from Kx and Ky using the following formulas:

φ = tan^-1( Ky / Kx )
Θ = sin^-1( sqrt(Kx^2 + Ky^2) )

Note that the algorithm can thus function even when the microphones are arranged in a 2-dimensional arrangement, and is still capable of resolving both the azimuth and the elevation.
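The coordinate conversion can be sketched as below; `arctan2` is used as the quadrant-aware form of the tan^-1 expressions above (a plain tan^-1(Ky/Kx) loses the quadrant of φ). The round-trip check builds K from known angles and recovers them.

```python
import numpy as np

def to_polar(K):
    """Convert a unit direction vector (Kx, Ky, Kz) to azimuth and elevation (degrees)."""
    Kx, Ky, Kz = K
    phi = np.degrees(np.arctan2(Ky, Kx))                  # azimuth in the horizontal plane
    theta = np.degrees(np.arctan2(np.hypot(Kx, Ky), Kz))  # elevation from the z-axis
    return phi, theta

# Round-trip check: build K from known angles, then recover them.
phi0, theta0 = np.radians(45.0), np.radians(60.0)
K = np.array([np.sin(theta0) * np.cos(phi0),
              np.sin(theta0) * np.sin(phi0),
              np.cos(theta0)])
phi, theta = to_polar(K)
```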
5.5 Measurement Qualification Unit
When the precise-direction finder 2722 calculates the precise direction of the sound source, the result may not reflect the true direction of the sound source due to various noise and measurement errors. The purpose of the measurement qualification unit 2505 is to evaluate the soundness or validity of the precise direction using a variety of measurement criteria, and to invalidate the measurement if the criteria are not satisfied. FIGS. 31a-31d show different embodiments of the measurement qualification unit, each using a different measurement criterion. These embodiments may be used individually or in any combination.
FIG. 31a shows a first embodiment of the qualification unit that uses a signal-to-noise ratio (SNR) as a measurement criterion. The SNR is defined as the ratio of a signal power to a noise power. To calculate the SNR, the measured signals are divided into blocks of signals having a predetermined period, such as 40 milliseconds. Block 3161 calculates the signal power for each signal block by calculating the square-sum of the sampled signals within the block. The noise power can be measured in many ways, but one convenient way is to pick the signal power of the signal block having the minimum signal power and to use it as the noise power. Block 3162 selects the signal block having the minimum power over a predetermined interval, such as 2 seconds. Block 3163 calculates the SNR as the ratio of the signal power of the current block to the noise power. Block 3164 invalidates the precise direction if the SNR is below a certain threshold.
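The block-power bookkeeping of this embodiment can be sketched as follows; the block layout and the 6 dB threshold below are illustrative stand-ins for whatever values an implementation would choose.

```python
import numpy as np

def snr_check(blocks, threshold_db=6.0):
    """blocks: 2-D array, one sample block per row, covering the observation window.
    Returns (SNR in dB, True if the direction measurement should be kept)."""
    powers = (blocks ** 2).sum(axis=1)   # square-sum of samples = power per block
    noise = powers.min()                 # quietest block stands in for the noise power
    signal = powers[-1]                  # power of the current block
    snr_db = 10.0 * np.log10(signal / noise)
    return snr_db, snr_db >= threshold_db

# Four quiet blocks followed by one loud (speech-bearing) block.
blocks = np.vstack([0.1 * np.ones((4, 8)), np.ones((1, 8))])
snr_db, keep = snr_check(blocks)
```

A current block 20 dB above the quietest block passes; a block near the noise floor would fail and the precise direction would be invalidated.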
FIG. 31b shows a second embodiment of the measurement qualification unit that uses a spread (range of distribution) of the individual measured delays as a measurement criterion. The precise source direction calculated by the precise-direction finder represents an average direction among the individual directions measured by the microphone pairs in the sector. Since delays are directly related to direction angles, the spread of the individual measured delays with respect to the individual estimated delays indicates how widely the individual directions vary with respect to the precise direction. Thus, the spread gives a good indication as to the validity of the measurements. For example, if the individual measured delays are too widely spread, some kind of measurement error is likely.
Te is defined as a vector representing the set of individual estimated delays τe corresponding to the precise direction, K. Block 3171 calculates Te from K based on the linear relation between K and Te:

Te = - ( R K ) / c

where R denotes the position-difference matrix representing the geometry of the microphone array:

    [ X2-X1  Y2-Y1  Z2-Z1 ]
R = [  ...    ...    ...  ]
    [ Xn-X1  Yn-Y1  Zn-Z1 ]

and c is the propagation velocity of the sound waves. Block 3172 compares the individual measured delays τd with the individual estimated delays τe and calculates the spread of the individual measured delays using the following measure:

Σ e^2 = Σ (τd - τe)^2
If this spread exceeds a certain threshold, block 3173 invalidates the precise source direction.
Alternatively, the spread can be calculated directly from the individual measured delays, using the residual vector

e = E Td, so that Σ e^2 = ||E Td||^2

where E = R (R^T R)^-1 R^T - I, and I is the identity matrix.
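Both forms of the spread can be checked numerically. The sketch below assumes a hypothetical four-pair geometry; a delay vector that is exactly consistent with some direction K has zero spread, while a perturbed one does not.

```python
import numpy as np

def delay_spread(R, Td):
    """Sum of squared residuals between Td and its projection onto the column
    space of R, i.e. the alternative form with E = R (R^T R)^-1 R^T - I."""
    E = R @ np.linalg.inv(R.T @ R) @ R.T - np.eye(len(Td))
    e = E @ Td
    return float(e @ e)

# Hypothetical position-difference matrix for four microphone pairs.
R = np.array([[0.2, 0.0, 0.0],
              [0.0, 0.2, 0.0],
              [0.0, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
K = np.array([0.6, 0.8, 0.0])
Td = -(R @ K) / 343.0                                    # delays exactly consistent with K
clean = delay_spread(R, Td)                              # ~0: delays fit a single direction
noisy = delay_spread(R, Td + np.array([1e-4, 0, 0, 0]))  # perturbed delays: positive spread
```

A threshold on this quantity is then all block 3173 needs to accept or invalidate the measurement.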
FIG. 31c shows a third embodiment of the measurement qualification unit that uses the azimuth angle, φ, as a measurement criterion. If φ deviates significantly from the sector direction (the approximate source direction), it is likely to indicate that the precise direction is false. Therefore, if φ is not within a permissible range of angles (e.g. within +/- 60 degrees) of the sector direction, the precise direction is invalidated.
FIG. 31d shows a fourth embodiment of the measurement qualification unit that uses the elevation angle, Θ, as a measurement criterion. If Θ deviates significantly from the horizontal direction (where Θ = 90°), it is likely to indicate the direction of sound waves reflected from the ceiling or the floor rather than that of direct sound waves. Therefore, if Θ is not within a range of allowable angles (e.g. from 30° to 150°), the precise direction is invalidated. As mentioned before, the above embodiments can be used selectively or combined to produce a single quality figure of measurement, Q, which may be sent to a target system such as a controller for a videoconferencing system. For example, Q may be set to 0 if any of the error conditions above occurs, and set to the SNR otherwise.
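Combining the flags into the single figure Q is straightforward; the sketch below simply mirrors the example given (Q = 0 on any failure, otherwise the SNR).

```python
def quality_figure(snr, flags):
    """Q = 0 if any measurement-error flag is raised, otherwise the SNR."""
    return 0.0 if any(flags) else snr

q_ok = quality_figure(18.5, [False, False, False])   # all criteria satisfied
q_bad = quality_figure(18.5, [False, True, False])   # e.g. elevation flag raised
```

A target system such as a camera controller can then treat Q = 0 as "ignore this measurement" and any positive Q as a confidence weight.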
The direction finding system of the present invention can be used in combination with a directional microphone system, which may include an adaptive filter. Such an adaptive filter is not limited to a particular kind of adaptive filter. For example, one can practice the present invention in combination with the invention disclosed in applicant's commonly assigned and co-pending U.S. patent application Serial No. 08/672,899, filed June 27, 1996, entitled 'System and Method for Adaptive Interference Canceling,' by inventor Joseph Marash, and its corresponding PCT application WO 97/50186, published December 31, 1997. Both applications are incorporated by reference herein in their entirety.
Specifically, the adaptive filter may include weight constraining means for truncating updated filter weight values to predetermined threshold values when each of the updated filter weight value exceeds the corresponding threshold value. The adaptive filter may further include inhibiting means for estimating the power of the main channel and the power of the reference channels and for generating an inhibit signal to the weight updating means based on normalized power difference between the main channel and the reference channels.
The weight constraining means may include a frequency-selective weight-control unit, which includes: a Fast Fourier Transform (FFT) unit for receiving the adaptive filter weights and performing the FFT of the filter weights to obtain frequency representation values; a set of frequency bins for storing the frequency representation values divided into a set of frequency bands; a set of truncating units for comparing the frequency representation values with a threshold assigned to each bin and for truncating the values if they exceed the threshold; a set of storage cells for temporarily storing the truncated values; and an Inverse Fast Fourier Transform (IFFT) unit for converting them back to adaptive filter weights. The adaptive filter in the directional microphone that may be used in combination with the present invention may also employ a dual-processing interference canceling system, where adaptive filter processing is used for one subset of a frequency range and fixed filter processing is used for another subset of the frequency range. For example, one can practice the present invention in combination with the invention disclosed in applicant's commonly assigned and co-pending U.S. patent application Serial No. 08/840,159, filed April 14, 1997, entitled 'Dual-Processing Interference Canceling System,' by inventor Joseph Marash, and the corresponding continuation-in-part application, filed April 8, 1997. Both applications are incorporated by reference herein in their entirety. It is noted that the adaptive filter processing portion of the dual processing may also employ the adaptive filter processing disclosed in applicant's commonly assigned and co-pending U.S. patent application Serial No. 08/672,899, filed June 27, 1996, entitled 'System and Method for Adaptive Interference Canceling,' by inventor Joseph Marash, and its corresponding PCT application WO 97/50186, published December 31, 1997.
5.6 Software Implementation
The present invention described herein may be implemented using a commercially available digital signal processor (DSP) such as Analog Devices' 2100 Series, or any other general purpose microprocessor. For more information on the Analog Devices 2100 Series, see Analog Devices, ADSP-2100 Family User's Manual, 3rd Ed., 1995.
FIGS. 32A-32D show a flow chart depicting the operation of a program in accordance with a preferred embodiment of the present invention. The program uses measurement flags to indicate various error conditions.
When the program starts (step 32100), it resets the system (step 32101) by resetting system variables, including the various measurement flags used for indicating error conditions. The program then reads into registers the microphone inputs sampled at a frequency of 64 KHz (step 32102), which is oversampling relative to the Nyquist rate. As mentioned in Section 5.2, oversampling allows the anti-aliasing filters to be realized with a much gentler cut-off characteristic. Upon reading every 5 samples (step 32103), the program performs a low-pass filter operation and a decimation by taking one sample out of every 5 samples for each microphone (step 32104). The decimated samples are stored in the registers (step 32105).
The program performs a bandpass filter operation on the decimated samples so that the output contains frequencies ranging from 1.5 to 2.5 KHz (step 32106). The output is stored in input memory (step 32107). The program repeats the above procedure until 512 new samples are obtained (step 32108). Once 512 new samples have been collected, the program takes each adjacent microphone pair, multiplies the received signals, and adds the products to obtain the zero-delay cross-correlation (step 32200), and the results are stored (step 32206). The calculation of the zero-delay cross-correlation is repeated for all adjacent microphone pairs, not involving the center microphone (step 32201).
The microphone pair having the highest zero-delay cross-correlation is selected (step 32202) and the value is stored as the signal power (step 32207), which will be used later. Of the two microphones adjacent to the selected pair, the program calculates the zero-delay cross-correlation (step 32203) and the microphone having the higher correlation is selected (step 32204). The program determines the sector by including the selected microphone pair, the neighboring microphone selected, and the center microphone, if there is one.
The program calculates the average power of the 512 samples taken from the center microphone (step 32300). The lowest average energy during the latest 2 seconds is set to be the noise power (steps 32301-32305).
The program calculates the full cross-correlation of the signals received by each microphone pair in the sector (step 32306). The program finds the peak cross-correlation delay, τs, where the correlation is maximum (step 32307). τs lies on a sampling point, but the actual maximum-correlation delay, τd, may occur between two sampling points. If τs is either the maximum or minimum possible delay (step 32308), τd is set to τs (step 32309). Otherwise, the program finds the actual maximum-correlation delays using the parabolic interpolation formula described in Section 5.4.2 (steps 32310-32312). The above steps are repeated for all the microphone pairs in the sector (step 32313). The program uses the B matrix mentioned in Section 5.4.2 to obtain the direction vector K = [Kx, Ky, Kz] from the set of time delays (step 32400).
The program then calculates the azimuth angle, φ, and the elevation angle, Θ, corresponding to the direction vector obtained (step 32401). The program calculates the SNR as the ratio of the signal power to the noise power (step 32402). If the SNR falls below a threshold (step 32403), the program raises the SNR Flag (step 32404).
The program then evaluates the elevation angle, Θ. If Θ is not within a permissible range of angles (e.g. from 30° to 150°) (step 32405), the Elevation Flag is raised (step 32406).
The program calculates corresponding delays from the precise direction (step 32407). The program calculates a delay spread as the sum of squares of the differences between the individual measured delays and the individual estimated delays (step 32408). If the delay spread exceeds a certain threshold (step 32409), the Delay Spread Flag is raised (step 32410).
The program calculates the quality figure of measurement, Q, as a combination of all or part of the measurement criteria above (step 32411). For example, Q may be set to 0 if any of the measurement flags was raised and set to the SNR otherwise.
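As a minimal sketch of the example combination given above (Q set to 0 when a measurement flag was raised, and to the SNR otherwise), in Python with hypothetical values:

```python
def quality_figure(snr, flags):
    """One combination suggested in the text: Q is set to 0 if any
    of the measurement flags was raised, and to the SNR otherwise."""
    return 0.0 if any(flags) else snr

# Hypothetical measurements; flags here stand for e.g. the
# Elevation Flag and the Delay Spread Flag.
q_good = quality_figure(12.5, (False, False))  # no flags: Q = SNR
q_bad = quality_figure(12.5, (False, True))    # flag raised: Q = 0
```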
The program transfers φ, Θ, and Q to a target system, such as an automatic camera tracking system used in a video conferencing application (step 32412). The program resets the measurement flags (step 32413) and goes back to the beginning of the program (step 32414). While the invention has been described with reference to several preferred embodiments, it is not intended to be limited to those embodiments. It will be appreciated by those of ordinary skill in the art that many modifications can be made to the structure and form of the described embodiments without departing from the spirit and scope of the invention, which is defined and limited only in the following claims. For example, the present invention can be used to locate a direction of a source transmitting electromagnetic waves.
SPECTRAL SUBTRACTION The several embodiments of the present invention are practicable with spectral subtraction noise and echo cancellation and, in particular, may be integrated with the DSDA technology and incorporated in a keyboard.
Figure 33 A illustrates an embodiment of the present invention 33100. The system receives a digital audio signal at input 33102 sampled at a frequency which is at least twice the bandwidth of the audio signal. In one embodiment, the signal is derived from a microphone signal that has been processed through an analog front end,
A/D converter and a decimation filter to obtain the required sampling frequency. In another embodiment, the input is taken from the output of a beamformer or even an adaptive beamformer. In that case, the signal has been processed to eliminate noises arriving from directions other than the desired one, leaving mainly noises originating from the same direction as the desired signal. In yet another embodiment, the input signal can be obtained from a sound board when the processing is implemented on a PC processor or similar computer processor. The input samples are stored in a temporary buffer 33104 of 256 points.
When the buffer is full, the new 256 points are combined in a combiner 33106 with the previous 256 points to provide 512 input points. The 512 input points are multiplied by multiplier 33108 with a shading window with the length of 512 points. The shading window contains coefficients that are multiplied with the input data accordingly. The shading window can be a Hanning window or another window type, and it serves two goals: the first is to smooth the transients between two processed blocks (together with the overlap process); the second is to reduce the side lobes in the frequency domain and hence prevent the masking of low-energy tonals by high-energy side lobes. The shaded results are converted to the frequency domain through an FFT (Fast Fourier Transform) processor 33110. Other lengths of the FFT samples (and accordingly input buffers) are possible, including 256 points or 1024 points.
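The choice of a Hanning shading window at 50% overlap can be checked numerically: shifted copies of the window sum to a constant, which is what allows the later overlap-and-add step to reconstruct the signal. A Python sketch with the frame sizes from the text (the periodic window definition is an assumption):

```python
import math

N = 512       # FFT frame length used in the text
HOP = N // 2  # 256 new samples per frame (50% overlap)

# Periodic Hanning (Hann) shading window of length N.
hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]

# Constant-overlap-add check: every output sample is covered by two
# shifted windows whose values sum to exactly 1, so summing the first
# half of each processed frame with the saved second half of the
# previous frame restores the input scale.
cola_error = max(abs(hann[n] + hann[n + HOP] - 1.0) for n in range(HOP))
```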
The FFT output is a complex vector of 256 significant points (the other 256 points are an anti-symmetric replica of the first 256 points). The points are processed in the noise processing block 33112, which includes the noise magnitude estimation for each frequency bin, the subtraction process that estimates the noise-free complex value for each frequency bin, and the residual noise reduction process. An
IFFT (Inverse Fast Fourier Transform) processor 33114 performs the Inverse Fourier
Transform on the complex noise free data to provide 512 time domain points. The first
256 time domain points are summed by the summer 33116 with the previous last 256 data points to compensate for the input overlap and shading process and output at output terminal 33118. The remaining 256 points are saved for the next iteration.
It will be appreciated that, while specific transforms are utilized in the preferred embodiments, other transforms may be applied to the present invention to obtain the spectral noise signal. Figure 33B is a detailed description of the noise processing block 33200.
First, the magnitude of each frequency bin (n) 33202 is estimated. The straightforward approach is to estimate the magnitude by calculating:
Y(n) = sqrt( Real(n)^2 + Imag(n)^2 )
In order to save processing time and complexity, the signal magnitude (Y) is estimated by an estimator 33204 using an approximation formula instead:
Y(n) = Max[ |Real(n)|, |Imag(n)| ] + 0.4 * Min[ |Real(n)|, |Imag(n)| ]
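A Python sketch of this approximation, compared against the exact magnitude for one hypothetical bin (the worst-case error figure in the comment is our own estimate, not from the text):

```python
import math

def mag_approx(re, im):
    """Magnitude approximation as given in the text:
    Max(|Real|, |Imag|) + 0.4 * Min(|Real|, |Imag|).
    Stays within roughly 8% of the true magnitude while avoiding
    the square and square-root operations."""
    a, b = abs(re), abs(im)
    return max(a, b) + 0.4 * min(a, b)

exact = math.hypot(3.0, 4.0)   # 5.0
approx = mag_approx(3.0, 4.0)  # 4 + 0.4*3 = 5.2, about 4% high here
```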
In order to reduce the instability of the spectral estimation, which typically plagues the FFT process (see ref. [2], Oppenheim and Schafer, Digital Signal Processing, Prentice Hall, pp. 542-545), the present invention implements a 2D smoothing process. Each bin is replaced with the average of its value and the two neighboring bins' values (of the same time frame) by a first averager 33206. In addition, the smoothed value of each smoothed bin is further smoothed by a second averager 33208 using a time exponential average with a time constant of 0.7 (which is the equivalent of averaging over 3 time frames). The 2D-smoothed value is then used by two processes - the noise estimation process by noise estimation processor 33212 and the subtraction process by subtractor 33210. The noise estimation process estimates the noise at each frequency bin, and the result is used by the noise subtraction process. The output of the noise subtraction is fed into a residual noise reduction processor 33216 to further reduce the noise. In one embodiment, the time domain signal is also used by the residual noise processor 33216 to determine the speech-free segments. The noise-free signal is moved to the IFFT process to obtain the time domain output 33218.
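A Python sketch of the 2D smoothing described above. The edge-bin handling (clamping) and the exact exponential-average convention (new = 0.7 x old + 0.3 x current) are assumptions, since the text does not spell them out:

```python
def smooth_2d(mags, prev=None, alpha=0.7):
    """2D smoothing of a magnitude spectrum: each bin is replaced by
    the average of itself and its two frequency neighbors, then
    exponentially averaged over time with constant 0.7 (roughly a
    3-frame average). Edge bins are clamped to the spectrum bounds
    (an assumption; the text does not specify edge handling)."""
    n = len(mags)
    freq = [(mags[max(i - 1, 0)] + mags[i] + mags[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]
    if prev is None:
        return freq  # first frame: no time history yet
    return [alpha * p + (1.0 - alpha) * f for p, f in zip(prev, freq)]

# A single-bin spike is spread across its neighbors:
frame1 = smooth_2d([0.0, 3.0, 0.0])          # -> [1.0, 1.0, 1.0]
frame2 = smooth_2d([0.0, 3.0, 0.0], frame1)  # time average, stays near 1.0
```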
Figure 33C is a detailed description of the noise estimation processor
33300. Theoretically, the noise should be estimated by taking a long time average of the signal magnitude (Y) over non-speech time intervals. This requires that a voice switch be used to detect the speech/non-speech intervals. However, too sensitive a switch may result in the use of a speech signal for the noise estimation, which will degrade the voice signal. A less sensitive switch, on the other hand, may dramatically reduce the length of the noise time intervals (especially in continuous speech cases) and degrade the validity of the noise estimation.
In the present invention, a separate adaptive threshold is implemented for each frequency bin 33302. This allows the location of noise elements for each bin separately without the examination of the overall signal energy. The logic behind this method is that, for each syllable, the energy may appear at different frequency bands. At the same time, other frequency bands may contain noise elements. It is therefore possible to apply a non-sensitive threshold for the noise and yet locate many non-speech data points for each bin, even within a continuous speech case. The advantage of this method is that it allows the collection of many noise segments for a good and stable estimation of the noise, even within continuous speech segments.
In the threshold determination process, for each frequency bin, two minimum values are calculated. A future minimum value is initiated every 5 seconds at 33304 with the value of the current magnitude (Y(n)) and replaced with a smaller minimal value over the next 5 seconds through the following process. The future minimum value of each bin is compared with the current magnitude value of the signal. If the current magnitude is smaller than the future minimum, the future minimum is replaced with the magnitude which becomes the new future minimum. At the same time, a current minimum value is calculated at 33306. The current minimum is initiated every 5 seconds with the value of the future minimum that was determined over the previous 5 seconds and follows the minimum value of the signal for the next 5 seconds by comparing its value with the current magnitude value. The current minimum value is used by the subtraction process, while the future minimum is used for the initiation and refreshing of the current minimum.
The noise estimation mechanism of the present invention ensures a tight and quick estimation of the noise value, with limited memory of the process (5 seconds), while preventing too high an estimation of the noise. Each bin's magnitude (Y(n)) is compared by comparator 33308 with four times the current minimum value of that bin, which serves as the adaptive threshold for that bin. If the magnitude is within the range (hence below the threshold), it is accepted as noise and used by an exponential averaging unit 33310 that determines the level of the noise 33312 at that frequency. If the magnitude is above the threshold, it is rejected for the noise estimation. The time constant for the exponential averaging is typically 0.95, which may be interpreted as taking the average of the last 20 frames. The threshold of 4 times the minimum value may be changed for some applications.
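The minimum-tracking estimator for a single bin might be sketched as follows in Python; counting frames instead of wall-clock seconds, and the initialization details, are assumptions made for the sketch:

```python
class BinNoiseEstimator:
    """Noise estimation for one frequency bin, as a sketch: a
    'future' and a 'current' minimum of the (smoothed) magnitude are
    tracked and re-initialized every `window` frames (the text uses
    5 seconds), and magnitudes below 4x the current minimum are
    accepted as noise into an exponential average (constant 0.95)."""

    def __init__(self, window=100, ratio=4.0, alpha=0.95):
        self.window, self.ratio, self.alpha = window, ratio, alpha
        self.count = 0
        self.future_min = self.current_min = None
        self.noise = 0.0

    def update(self, y):
        if self.count % self.window == 0:
            # Refresh: the current minimum restarts from the future
            # minimum of the previous window; the future restarts at y.
            self.current_min = y if self.future_min is None else self.future_min
            self.future_min = y
        self.count += 1
        self.future_min = min(self.future_min, y)
        self.current_min = min(self.current_min, y)
        if y <= self.ratio * self.current_min:
            # Below the adaptive threshold: accept as noise.
            self.noise = self.alpha * self.noise + (1.0 - self.alpha) * y
        return self.noise

# A steady noise floor drives the estimate toward 1.0; a loud burst
# (speech-like, above 4x the minimum) is rejected from the average.
est = BinNoiseEstimator(window=10)
for _ in range(200):
    est.update(1.0)
noise_before = est.noise
est.update(100.0)  # rejected: above the adaptive threshold
```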
Figure 33D is a detailed description of the subtraction processor 33400. In a straightforward approach, the value of the estimated bin noise magnitude is subtracted from the current bin magnitude. The phase of the current bin is calculated and used in conjunction with the result of the subtraction to obtain the Real and Imaginary parts of the result. This approach is very expensive in terms of processing and memory because it requires the calculation of the Sine and Cosine arguments of the complex vector with consideration of the four quadrants in which the complex vector may be positioned. An alternative approach, used in the present invention, is a filter approach. The subtraction is interpreted as a filter multiplication performed by filter 33402, where H (the filter coefficient) is:
H(n) = | |Y(n)| - |N(n)| | / |Y(n)|
Where Y(n) is the magnitude of the current bin and N(n) is the noise estimation of that bin. The value H of the filter coefficient (of each bin separately) is multiplied by the Real and Imaginary parts of the current bin at 33404:
E(Real)=Y(Real)*H ; E(Imag)=Y(Imag)*H
Where E is the noise-free complex value. In the straightforward approach the subtraction may result in a negative value of magnitude. This value can be either replaced with zero (half-wave rectification) or replaced with a positive value equal to the negative one (full-wave rectification). The filter approach, as expressed here, results in the full-wave rectification directly. The full-wave rectification provides slightly less noise reduction but introduces far fewer artifacts to the signal. It will be appreciated that this filter can be modified to effect a half-wave rectification by taking the non-absolute value of the numerator and replacing negative values with zeros. Note also that the values of Y in the figures are the smoothed values of Y after averaging over neighboring spectral bins and over time frames (2D smoothing). Another approach is to use the smoothed Y only for the noise estimation (N), and the unsmoothed Y for the calculation of H.
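A Python sketch of the filter-style subtraction for one bin, showing the full-wave rectification the formula yields directly and the half-wave variant the text mentions:

```python
def subtract_bin(re, im, y_mag, n_mag, half_wave=False):
    """Subtraction expressed as a per-bin filter gain:
    H = | |Y| - |N| | / |Y|   (full-wave rectification), or with the
    non-absolute numerator clamped at zero for the half-wave variant.
    y_mag may be the 2D-smoothed magnitude; n_mag is the noise
    estimate for the bin. Applying H to the Real and Imaginary parts
    avoids computing the phase explicitly."""
    if y_mag == 0.0:
        return 0.0, 0.0  # guard against an empty bin
    num = y_mag - n_mag
    h = (max(num, 0.0) if half_wave else abs(num)) / y_mag
    return re * h, im * h

# |Y| = 5, |N| = 2  ->  H = 0.6 applied to both parts:
e_re, e_im = subtract_bin(3.0, 4.0, 5.0, 2.0)
# Noise overestimate |N| = 7: full-wave folds the gain back to 0.4,
# while half-wave zeroes the bin.
full = subtract_bin(3.0, 4.0, 5.0, 7.0)
half = subtract_bin(3.0, 4.0, 5.0, 7.0, half_wave=True)
```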
Figure 33E illustrates the residual noise reduction processor 33500. The residual noise is defined as the remaining noise during non-speech intervals. The noise in these intervals is first reduced by the subtraction process which does not differentiate between speech and non-speech time intervals. The remaining residual noise can be reduced further by using a voice switch 33502 and either multiplying the residual noise by a decaying factor or replacing it with zeros. Another alternative to the zeroing is replacing the residual noise with a minimum value of noise at 33504.
Yet another approach, which avoids the voice switch, is illustrated in
Figure 33F. The residual noise reduction processor 33506 applies, at 33508, a threshold similar to that used by the noise estimator to each noise-free output bin, and replaces or decays the result when it is lower than the threshold at 33510. The result of the residual noise processing of the present invention is a quieter sound in the non-speech intervals. However, artifacts such as a pumping noise, as the noise level switches between the speech intervals and the non-speech intervals, may appear in some applications.
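The switch-free residual reduction of Figure 33F might be sketched as follows in Python; the decay factor value is a hypothetical choice:

```python
def reduce_residual(bins, noise, ratio=4.0, decay=0.1):
    """Switch-free residual reduction in the style of Figure 33F:
    output bins whose magnitude stays below the noise-estimation
    threshold (ratio x the per-bin noise estimate) are decayed; they
    could instead be zeroed or replaced with a minimum noise floor.
    The decay value 0.1 is a hypothetical choice for illustration."""
    return [y * decay if y < ratio * n else y for y, n in zip(bins, noise)]

# Bin 0 is residual noise (below 4x its estimate); bin 1 holds speech.
out = reduce_residual([2.0, 50.0], [1.0, 1.0])
```

Decaying rather than zeroing leaves a faint noise floor, which tends to reduce the pumping artifact noted in the text.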
The spectral subtraction technique of the present invention can be utilized in conjunction with the array techniques, with a close-talk microphone technique, or as a stand-alone system. The spectral subtraction of the present invention can be implemented on embedded hardware (DSP) as a stand-alone system, as part of other embedded algorithms such as adaptive beamforming, or as a software application running on a PC using data obtained from a sound port.
As illustrated in Figures 33G-33J, for example, the present invention may be implemented as a software application. In step 33600, the input samples are read. At step 33602, the read samples are stored in a buffer. If 256 new points are accumulated in step 33604, program control advances to step 33606 - otherwise control returns to step 33600 where additional samples are read. Once 256 new samples are read, the last 512 points are moved to the processing buffer in step 33606. The 256 new samples stored are combined with the previous 256 points in step 33608 to obtain the 512 points. In step 33610, a Fourier Transform is performed on the 512 points. Of course, another transform may be employed to obtain the spectral noise signal. In step
33612, the 256 significant complex points resulting from the transformation are stored in the buffer. The second 256 points are a conjugate replica of the first 256 points and are redundant for real inputs. The stored data in step 33614 includes the 256 real points and the 256 imaginary points. Next, control advances to Figure 33H as indicated by the circumscribed letter A.
In Figure 33H, the noise processing is performed wherein the magnitude of the signal is estimated in step 33700. Of course, the straightforward approach may be employed but, as discussed with reference to Figure 33B, the straightforward approach requires extraneous processing time and complexity. In step 33702, the stored complex points are read from the buffer and calculated using the estimation equation shown in step 33700. The result is stored in step 33704. A 2-dimensional (2D) smoothing process is effected in steps 33706 and 33708 wherein, in step 33706, the estimate at each point is averaged with the estimates of adjacent points and, in step 33708, the estimate is averaged using an exponential average having the effect of averaging the estimate at each point over, for example, 3 time samples of each bin. In steps 33710 and 33712, the smoothed estimate is employed to determine the future minimum value and the current minimum value. If the smoothed estimate is less than the calculated future minimum value as determined in step 33710, the future minimum value is replaced with the smoothed estimate and stored in step 33714.
Meanwhile, if it is determined at step 33712 that the smoothed estimate is less than the current minimum value, then the current minimum is replaced with the smoothed estimate value and stored in step 33720. The future and current minimum values are calculated continuously and initiated periodically, for example, every 5 seconds as determined in step 33724, and control is advanced to steps 33722 and 33726 wherein the new future and current minimums are calculated. Afterwards, control advances to Figure 33I as indicated by the circumscribed letter B, where the subtraction and residual noise reduction are effected.
In Figure 33I, it is determined whether the samples are less than a threshold amount in step 33800. In step 33804, where the samples are within the threshold, the samples undergo an exponential averaging and are stored in the buffer at step 33802. Otherwise, control advances directly to step 33808. At step 33808, the filter coefficients are determined from the signal samples retrieved in step 33806 and the estimated noise samples retrieved from step 33810. Although the straightforward approach may be used by which phase is estimated and applied, the alternative Wiener filter is preferred since this saves processing time and complexity. In step 33814, the filter transform is multiplied by the samples retrieved from step 33816 and stored in step 33812.
In steps 33818 and 33820, the residual noise reduction process is performed wherein, in step 33818, if the processed noise signal is within a threshold, control advances to step 33820 wherein the processed noise is subjected to replacement, for example, a decay. However, the residual noise reduction process may not be suitable in some applications where the application is negatively affected. It will be appreciated that, while specific values are used in the several equations and calculations employed in the present invention, these values may be different than those shown.
In Figure 33J, the Inverse Fourier Transform is generated in step 902 on the basis of the noise-processed audio signal recovered in step 904 and stored in step 900. In step 906, the time-domain signals are overlapped and added in order to regenerate the audio signal substantially without noise.
It will be appreciated that the present invention may be practiced as a software application, preferably written using C or any other programming language, which may be embedded on, for example, a programmable memory chip or stored on a computer-readable medium such as, for example, an optical disk, and retrieved therefrom to drive a computer processor. Sample code representative of the present invention is illustrated in Appendix A which, as will be appreciated by those skilled in the art, may be modified to accommodate various operating systems and compilers or to include various bells and whistles without departing from the spirit and scope of the present invention.
With the present invention, a spectral subtraction system is provided that has a simple, yet efficient mechanism to estimate the noise magnitude spectrum, even in poor signal-to-noise-ratio situations and in continuous fast speech cases. An efficient mechanism is provided that can perform the magnitude estimation with little cost and overcome the problem of phase association. A stable mechanism is provided to estimate the noise spectral magnitude without the smearing of the data. Although preferred embodiments of the present invention and modifications thereof have been described in detail herein, it is to be understood that this invention is not limited to those precise embodiments and modifications, and that other modifications and variations may be effected by one skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; and a keyboard coupled to said array of microphones.
2. The microphone array of claim 1, wherein said microphone elements are disposed adjacent a bottom surface of said keyboard and directed downward toward a supporting surface such that a pressure zone microphone effect is created, thereby reducing acoustic reflections.
3. The microphone array according to claim 2, wherein said microphone elements are arranged adjacent an edge of said keyboard.
4. The microphone array of claim 1, wherein said microphone elements are individual microphones located at a number of corners of said keyboard.
5. The microphone array of claim 4, wherein each corner of said keyboard incorporates a number of said independent microphones.
6. The microphone array of claim 1, wherein said keyboard comprises a pop-up housing which houses said microphone elements in a structure which is substantially within said keyboard in a closed position and substantially above a surface of said keyboard in an open position for receiving said audio signals.
7. The microphone array of claim 1 , wherein said keyboard comprises a raised surface for housing said microphone elements.
8. A microphone array comprising: a number of microphone elements for receiving respective audio signals corresponding to a main channel and a reference channel wherein said main channel receives a desired audio signal and said reference channel receives a noise component of said desired audio signal; and an elongated housing having a substantially flat profile for insertion between small gaps.
9. The microphone array of claim 8, wherein said housing comprises holding means for holding said housing to a portion of an automobile.
10. The microphone array of claim 9, wherein said portion of the automobile is a visor.
11. The microphone array of claim 10, wherein a small gap is formed between said housing and a roof of said automobile thereby creating a pressure zone microphone effect for reducing acoustic reflections in said automobile.
12. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; a rear view mirror of an automobile having a reflective portion bounded by a perimeter, wherein said microphone elements are disposed along said perimeter of said rear view mirror.
13. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of a desired audio signal is received; and a mouse peripheral for use with a computer for housing said microphone elements.
14. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; and a video camera for housing said microphone elements.
15. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; and a universal voice interface for interfacing the desired audio signal and the noise component to a computer processor.
16. The microphone array according to claim 15, wherein said universal voice interface includes converting means for converting said desired audio signal and said noise component to digital form.
17. The microphone array according to claim 16, wherein said universal voice interface is incorporated within a rear view mirror of an automobile.
18. The microphone array according to claim 16, wherein said universal voice interface provides audio noise canceling of said noise component.
19. The microphone array according to claim 16, wherein said universal voice interface is incorporated in a port plug for a standard computer.
20. The microphone array according to claim 19, wherein said microphone elements are disposed along a perimeter of a computer monitor.
21. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; and a noise canceling stethoscope for housing said microphone elements.
22. A microphone array comprising: a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; and an ultrasound device for housing said microphone elements.
23. The microphone array according to claim 1 , further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
24. The microphone array according to claim 8, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
25. The microphone array according to claim 12, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
26. The microphone array according to claim 13, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
27. The microphone array according to claim 14, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
28. The microphone array according to claim 15, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
29. The microphone array according to claim 21, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
30. The microphone array according to claim 22, further comprising: an input for inputting an audio signal which includes a noise signal; a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and a threshold detector for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
31. The microphone array according to claim 1, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
32. The microphone array according to claim 8, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
33. The microphone array according to claim 12, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
34. The microphone array according to claim 13, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
35. The microphone array according to claim 14, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
36. The microphone array according to claim 15, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
37. The microphone array according to claim 21, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
38. The microphone array according to claim 22, further comprising: input means for inputting an audio signal which includes a noise signal; frequency spectrum generating means for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal; and threshold detecting means for detecting for each frequency bin whether a respective frequency bin is within said threshold thereby detecting the position of noise elements for each frequency bin.
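The processing recited in claims 33–38 — transforming the audio signal into frequency bins and flagging, bin by bin, which bins fall within a threshold — can be illustrated with a minimal NumPy sketch. The window length, sample rate, and threshold value below are illustrative assumptions, not values taken from the claims:

```python
import numpy as np

def detect_noise_bins(audio_frame, threshold):
    """Generate the frequency spectrum of an audio frame and detect,
    for each frequency bin, whether that bin is within the threshold,
    thereby locating the noise elements per bin (cf. claims 33-38)."""
    spectrum = np.fft.rfft(audio_frame)     # frequency spectrum of the signal
    magnitudes = np.abs(spectrum)           # one magnitude per frequency bin
    noise_bins = magnitudes < threshold     # True where a bin is within the threshold
    return np.nonzero(noise_bins)[0]        # positions of the noise elements

# Example: a 1 kHz tone plus weak broadband noise, sampled at 8 kHz.
fs, n = 8000, 256
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 1000 * t) + 0.01 * np.random.randn(n)
noisy = detect_noise_bins(frame, threshold=1.0)  # every bin except the tone's (bin 32)
```

With a 256-point frame at 8 kHz, the 1 kHz tone lands exactly in bin 32, so that bin's magnitude exceeds the threshold while the remaining bins are flagged as noise positions.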
39. A handheld digital assistant, comprising: a body for housing a processing means and a display; and a microphone array integral to said body and including a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; wherein said processing means is operable to receive microphone signals generated by said array and perform tasks based on said microphone signals.
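The main/reference channel structure recited in claim 39 — a main channel carrying the desired audio signal and a reference channel carrying its noise component — can be sketched with a standard sum/difference formation over two microphones. This weighting is a textbook illustration of forming such channels, not necessarily the patent's particular channel matrix:

```python
import numpy as np

def main_and_reference(mic_signals):
    """Form a main channel (desired signal) and a reference channel
    (noise estimate) from two microphone signals that receive the
    desired source in phase. Sum/difference weights are illustrative."""
    mics = np.asarray(mic_signals, dtype=float)
    main = mics.mean(axis=0)        # in-phase sum passes the desired source
    reference = mics[0] - mics[1]   # difference cancels it, leaving noise only
    return main, reference

# A desired signal arriving in phase at both mics, plus uncorrelated noise.
desired = np.sin(np.linspace(0, 2 * np.pi, 100))
noise1 = 0.1 * np.random.randn(100)
noise2 = 0.1 * np.random.randn(100)
main, ref = main_and_reference([desired + noise1, desired + noise2])
# ref contains only noise1 - noise2; the desired signal cancels exactly.
```

Because the desired source appears identically in both inputs, it vanishes from the reference channel, which is what makes the reference usable for adaptive noise cancellation against the main channel.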
40. The handheld digital assistant of claim 39, wherein said microphone array is selectively operable in a near field audio reception mode and a far field audio reception mode.
41. A personal computer system, comprising: a computer body for housing a processing means and a monitor; a microphone array including a number of microphone elements for independently receiving main and reference channel matrix audio signals corresponding to a main channel wherein a desired audio signal is received and a reference channel wherein a noise component of the desired audio signal is received; and means for coupling said microphone array to said computer body; wherein said processing means is operable to receive microphone signals generated by said array and perform tasks based on said microphone signals.
42. The personal computer system of claim 41, wherein said microphone array is integrated into said monitor.
43. The personal computer system of claim 41, wherein said microphone array has four microphones, said means for coupling said microphone array to said computer body is two dual channel audio lines, and each microphone has its signals transmitted to the computer body via a dedicated one of said channels.
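Claim 43's wiring — four microphones carried to the computer body over two dual channel audio lines, one microphone per dedicated channel — implies a simple de-interleaving step on the host side. The interleaved left/right sample layout below is an assumption for illustration; the claim does not specify a buffer format:

```python
import numpy as np

def demux_four_mics(stereo_line_a, stereo_line_b):
    """Split two interleaved stereo streams into four microphone
    signals, one per dedicated channel (cf. claim 43)."""
    # Each line carries interleaved [L, R, L, R, ...] samples.
    mic1, mic2 = stereo_line_a[0::2], stereo_line_a[1::2]
    mic3, mic4 = stereo_line_b[0::2], stereo_line_b[1::2]
    return mic1, mic2, mic3, mic4

# Example: two 8-sample interleaved stereo buffers.
line_a = np.array([1, 2, 1, 2, 1, 2, 1, 2], dtype=float)
line_b = np.array([3, 4, 3, 4, 3, 4, 3, 4], dtype=float)
m1, m2, m3, m4 = demux_four_mics(line_a, line_b)
```

Routing each microphone to its own channel keeps the four signals independent, which the array processing in the earlier claims relies on.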
PCT/US2000/029336 1999-10-22 2000-10-23 System and method for adaptive interference canceling WO2001031972A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US42579099A 1999-10-22 1999-10-22
US09/425,790 1999-10-22

Publications (1)

Publication Number Publication Date
WO2001031972A1 true WO2001031972A1 (en) 2001-05-03

Family

ID=23688055

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/029336 WO2001031972A1 (en) 1999-10-22 2000-10-23 System and method for adaptive interference canceling

Country Status (1)

Country Link
WO (1) WO2001031972A1 (en)


Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5492129A (en) * 1993-12-03 1996-02-20 Greenberger; Hal Noise-reducing stethoscope
US5539831A (en) * 1993-08-16 1996-07-23 The University Of Mississippi Active noise control stethoscope
US5631638A (en) * 1993-07-09 1997-05-20 Hohe Gmbh & Co.Kg. Information system in a motor vehicle
US5631669A (en) * 1994-02-04 1997-05-20 Stobbs; Gregory A. Pointing device with integral microphone
US5657393A (en) * 1993-07-30 1997-08-12 Crow; Robert P. Beamed linear array microphone system
US5673325A (en) * 1992-10-29 1997-09-30 Andrea Electronics Corporation Noise cancellation apparatus
US5717430A (en) * 1994-08-18 1998-02-10 Sc&T International, Inc. Multimedia computer keyboard
US5727073A (en) * 1995-06-30 1998-03-10 Nec Corporation Noise cancelling method and noise canceller with variable step size based on SNR
US5825898A (en) * 1996-06-27 1998-10-20 Lamar Signal Processing Ltd. System and method for adaptive interference cancelling
US5835732A (en) * 1993-10-28 1998-11-10 Elonex Ip Holdings, Ltd. Miniature digital assistant having enhanced host communication
US5862240A (en) * 1995-02-10 1999-01-19 Sony Corporation Microphone device
US5900907A (en) * 1997-10-17 1999-05-04 Polycom, Inc. Integrated videoconferencing unit
US5946403A (en) * 1993-06-23 1999-08-31 Apple Computer, Inc. Directional microphone for computer visual display monitor and method for construction
US6026162A (en) * 1993-02-02 2000-02-15 Palett; Anthony P. Mirror mounted mobile telephone system
US6083167A (en) * 1998-02-10 2000-07-04 Emory University Systems and methods for providing radiation therapy and catheter guides


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1464048A4 (en) * 2001-12-31 2005-05-04 Speechgear Inc Translation device with planar microphone array
WO2003058606A1 (en) 2001-12-31 2003-07-17 Speechgear, Inc. Translation device with planar microphone array
EP1464048A1 (en) * 2001-12-31 2004-10-06 Speechgear, Inc. Translation device with planar microphone array
US7519085B2 (en) 2002-10-18 2009-04-14 Temic Automotive Of North America, Inc. Control unit for transmitting audio signals over an optical network and methods of doing the same
EP1411517A1 (en) * 2002-10-18 2004-04-21 Motorola, Inc. Control unit and method for transmitting audio signals over an optical network
US8094847B2 (en) 2003-03-03 2012-01-10 Phonak Ag Method for manufacturing acoustical devices and for reducing especially wind disturbances
EP1339256A3 (en) * 2003-03-03 2005-06-22 Phonak Ag Method for manufacturing acoustical devices and for reducing wind disturbances
US7127076B2 (en) 2003-03-03 2006-10-24 Phonak Ag Method for manufacturing acoustical devices and for reducing especially wind disturbances
US7492916B2 (en) 2003-03-03 2009-02-17 Phonak Ag Method for manufacturing acoustical devices and for reducing especially wind disturbances
EP1524879B1 (en) * 2003-06-30 2014-05-07 Nuance Communications, Inc. Handsfree system for use in a vehicle
EP1524879A1 (en) 2003-06-30 2005-04-20 Harman Becker Automotive Systems GmbH Handsfree system for use in a vehicle
US8923529B2 (en) 2008-08-29 2014-12-30 Biamp Systems Corporation Microphone array system and method for sound acquisition
US9462380B2 (en) 2008-08-29 2016-10-04 Biamp Systems Corporation Microphone array system and a method for sound acquisition
US9889931B2 (en) 2014-08-29 2018-02-13 Sz Dji Technology, Co., Ltd Unmanned aerial vehicle (UAV) for collecting audio data
US10850839B2 (en) 2014-08-29 2020-12-01 SZ DJI Technology Co., Ltd. Unmanned aerial vehicle (UAV) for collecting audio data
EP4064725A1 (en) * 2021-03-23 2022-09-28 Sagemcom Broadband Sas Method for dynamic selection of microphones
FR3121260A1 (en) * 2021-03-23 2022-09-30 Sagemcom Broadband Sas Dynamic microphone selection method

Similar Documents

Publication Publication Date Title
EP1658751B1 (en) Audio input system
KR101449433B1 (en) Noise cancelling method and apparatus from the sound signal through the microphone
JP4782202B2 (en) Method and apparatus for improving noise discrimination using enhanced phase difference values
AU2005200699B2 (en) A system and method for beamforming using a microphone array
US6157403A (en) Apparatus for detecting position of object capable of simultaneously detecting plural objects and detection method therefor
US6668062B1 (en) FFT-based technique for adaptive directionality of dual microphones
RU2759715C2 (en) Sound recording using formation of directional diagram
JP2009506363A (en) Method and apparatus for adapting to device and / or signal mismatch in a sensor array
JPH09512676A (en) Adaptive beamforming method and apparatus
US10887691B2 (en) Audio capture using beamforming
JP2009506672A (en) Method and apparatus for improving noise discrimination using attenuation factors
WO1999053336A1 (en) Wave source direction determination with sensor array
US20200145752A1 (en) Method and apparatus for audio capture using beamforming
CN111078185A (en) Method and equipment for recording sound
Trucco et al. A stochastic approach to the synthesis of a robust frequency-invariant filter-and-sum beamformer
US6718041B2 (en) Echo attenuating method and device
WO2001031972A1 (en) System and method for adaptive interference canceling
Benesty et al. Array beamforming with linear difference equations
Kompis et al. Simulating transfer functions in a reverberant room including source directivity and head‐shadow effects
Mabande et al. On 2D localization of reflectors using robust beamforming techniques
CN114001816B (en) Acoustic imager audio acquisition system based on MPSOC
Grbic et al. Optimal FIR subband beamforming for speech enhancement in multipath environments
Liu et al. Simulation of fixed microphone arrays for directional hearing aids
Wang et al. A high performance microphone array system for hearing aid applications
Zheng et al. A nested sensor array focusing on near field targets

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA IL JP SG

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP