CN102496366A - Speaker identification method irrelevant with text - Google Patents

Speaker identification method irrelevant with text

Info

Publication number
CN102496366A
CN102496366A, CN2011104283792A, CN201110428379A
Authority
CN
China
Prior art keywords
speaker
characteristic parameter
frequency
sequence
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011104283792A
Other languages
Chinese (zh)
Other versions
CN102496366B (en)
Inventor
朱坚民
黄之文
李孝茹
李海伟
王军
翟东婷
毛得吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201110428379.2A
Publication of CN102496366A
Application granted
Publication of CN102496366B
Legal status: Expired - Fee Related (anticipated expiration)

Abstract

The invention relates to a text-independent speaker identification method. The method mainly comprises the following steps: (1) acquiring the speaker's voice signal and preprocessing it to obtain a preprocessed voice signal; (2) performing feature extraction on the preprocessed voice signal to obtain the speaker's characteristic parameters in the identification system; (3) repeating the above two steps several times to obtain the characteristic parameter sequences of each registered speaker and building the characteristic parameter reference library of all registered speakers; (4) acquiring the characteristic parameter sequence of the speaker to be identified and calculating the weighted grey relational degree between the speaker to be identified and every registered speaker; (5) extracting the maximum of all the weighted grey relational degrees and comparing it with the weighted grey relational degree recognition threshold to obtain the identification result. The invention belongs to the field of biometric identification technology, and in particular to the field of speaker recognition. Existing text-independent speaker identification techniques suffer from a high error rate; the method of the invention solves this problem and has wide application prospects.

Description

Text-independent speaker identification method
Technical field
The present invention relates to biometric identification technology, and in particular to a text-independent speaker identification method based on the third-octave spectrum and the grey relational degree.
Technical background
With the development of computer technology and the increasing informatization of society, identification or verification based on human biometric features (such as fingerprints, voiceprints, and images) has become an important advanced technology in the information industry. Speaker recognition uses a person's speech to identify or verify the speaker's identity, and can be widely applied in fields such as police and judicial work, business transactions, banking and finance, protection of personal privacy, and security inspection.
Research in speaker recognition focuses on the extraction of characteristic parameters and the construction of recognition algorithms. Feature extraction means extracting, from the speaker's voice signal, characteristic parameters that express the voice comprehensively and accurately. At present, the characteristic parameters used in speech recognition are the LPCC (Linear Prediction Cepstrum Coefficient) parameters based on the vocal tract model, the MFCC (Mel Frequency Cepstrum Coefficient) parameters based on auditory mechanisms, and their improvements and combinations; however, the amount of voice information these parameters characterize is insufficient. The present invention therefore proposes to extract characteristic parameters from the voice signal with the third-octave spectrum analysis method. This method divides the whole 20 Hz-20 kHz audio range audible to the human ear into 30 bands of constant bandwidth ratio and performs spectrum analysis on the signal falling in each band; it expresses the information contained in the speaker's voice signal more accurately and strengthens the robustness of the speaker's characteristic parameters.
In speech technology research and applications there are three kinds of recognition algorithms for voice signals: methods based on vocal tract models and phonetic knowledge, template matching methods, and methods using artificial neural network models. Although research on vocal tract models and phonetic knowledge started early, it is too complex and has not yet achieved good practical results. Template matching methods include dynamic time warping (DTW), hidden Markov model (HMM) theory, and vector quantization (VQ) techniques; these algorithms have poor anti-interference capability in noisy environments and cannot reach a good recognition performance. Artificial neural networks offer adaptivity, parallelism, robustness, fault tolerance, and learning ability, and their powerful classification and input-output mapping abilities are attractive in speech recognition, but their overly long training and recognition times prevent good practical results. The present invention proposes to perform speaker identification with a method based on the grey relational degree, which considers both the information contained in the speaker's voice signal and the effect of its variation on speaker identification, and thereby significantly improves the recognition rate of the voice signal.
Speaker recognition can further be divided into text-dependent and text-independent recognition; both identify the speaker from the characteristic information contained in the voice signal. Text-dependent recognition restricts the spoken text and examines only one or a few characteristic parameters of the speaker's voice signal, so it is easier to impersonate and the confidentiality of the recognition system is low. Text-independent recognition uses arbitrary spoken text and gives the recognition system good flexibility; however, because of the richness of the characteristic information contained in the voice signal and the complexity of noise in real environments, the steps of traditional speaker identification methods are cumbersome.
Summary of the invention
To overcome the above defects and improve the text-independent speaker identification rate, the present invention provides a text-independent speaker identification method based on the third-octave spectrum and the grey relational degree. The method extracts features from the speaker's voice signal with the third-octave spectrum analysis method and performs speaker identification with a grey relational degree algorithm; it is a reliable, effective, text-independent speaker identification method with good robustness.
To achieve the above objective, the method of the invention comprises the following steps:
One, build the voice feature reference library of N speakers, N being an integer greater than or equal to 1, as follows:
A. Acquire the 1st voice signal segment of the 1st speaker and apply, in sequence, sampling and quantization, zero-drift removal, pre-emphasis, and windowing, obtaining the windowed speech frames F_m′(n) of segment 1-1;
B. Apply the third-octave spectrum analysis method to the frames F_m′(n) of segment 1-1 to obtain characteristic parameter 1-1, the characteristic parameter being the sequence of power spectrum values of the band in which each centre frequency lies; "1-1" denotes the 1st voice segment of the 1st speaker;
C. Repeat steps A and B M times for each of the N speakers, obtaining N × M characteristic parameters, which form the characteristic parameter reference library; "N × M" denotes M feature extractions for each of N speakers;
Two, obtain N grey relational degrees, as follows:
I. Acquire the characteristic parameter X of the speaker to be identified through steps A and B;
II. Add the sequence of characteristic parameter X to each reference library; according to the time invariance of the frequency-domain signal, assign the same weight coefficient uniformly to the N characteristic parameter sequences; recombine them into N weighted-mean characteristic parameter sequences and obtain N grey relational degree values;
Three, identification matching: extract the maximum R_max of the N grey relational degree values and compare it with the recognition threshold R_θ; if R_max ≥ R_θ the match succeeds, otherwise there is no match.
According to one embodiment of the text-independent speaker identification method of the present invention, the feature extraction of step B comprises:
(A) Signal time-frequency conversion: convert the time-domain speech signal to the frequency domain with the radix-2 FFT and compute the power spectrum of the speaker's voice signal;
(B) Determine the centre frequencies f_c of the third-octave spectrum analysis;
(C) Compute the upper and lower limit frequencies, which relate to the centre frequency by
f_u / f_d = 2^{1/3}, f_c / f_d = 2^{1/6}, f_u / f_c = 2^{1/6};
(D) Sound pressure level conversion:
L_p = 20 lg(P / P_0) (dB)
where P_0 is the reference sound pressure, 2 × 10^{-5} Pa;
(E) Compute the mean value of the power spectrum in the band of each centre frequency f_c: divide the frequencies in the power spectrum into bands according to the upper and lower limit frequencies and the centre frequencies of the third-octave, superpose all power amplitudes within each band logarithmically, and obtain the third-octave spectrum, whose amplitudes are the characteristic parameters.
According to one embodiment of the text-independent speaker identification method of the present invention, the grey relational degree calculation of step II comprises:
(F) Extract the characteristic parameter sequences: obtain the sequence X0 of the characteristic parameter X of the speaker to be identified, and extract every characteristic parameter sequence from the reference libraries of all registered speakers, i.e. the sequences A1, A2, …, AN of registered speaker A, the sequences B1, B2, …, BN of registered speaker B, and so on;
(G) Construct the weighted-mean characteristic parameter sequences: add the characteristic parameter sequence of the speaker to be identified to the reference library of each registered speaker in the recognition system and, according to the time invariance of the frequency-domain signal, assign the same weight coefficient uniformly to these sequences, so that the speaker to be identified recombines with each registered speaker into a weighted-mean characteristic parameter sequence. That is, registered speaker A and the speaker X to be identified form the sequence ω_{11}A1, ω_{12}A2, …, ω_{1n}AN, ω_{1x}X0, where ω_{11} = ω_{12} = ⋯ = ω_{1n} = ω_{1x} and ω_{11} + ω_{12} + ⋯ + ω_{1n} + ω_{1x} = 1; registered speaker B and speaker X form the sequence ω_{21}B1, ω_{22}B2, …, ω_{2n}BN, ω_{2x}X0, where ω_{21} = ω_{22} = ⋯ = ω_{2n} = ω_{2x} and ω_{21} + ω_{22} + ⋯ + ω_{2n} + ω_{2x} = 1, and so on;
(H) Superpose to generate the weighted-mean grey relational characteristic parameter sequences: by the superposition principle, obtain the weighted-mean grey relational characteristic parameter sequence of the speaker to be identified with each registered speaker in the recognition system, i.e. registered speaker A and speaker X form the new characteristic parameter sequence AY = ω_{11}A1 + ω_{12}A2 + ⋯ + ω_{1n}AN + ω_{1x}X0, registered speaker B and speaker X form the new characteristic parameter sequence BY = ω_{21}B1 + ω_{22}B2 + ⋯ + ω_{2n}BN + ω_{2x}X0, and so on;
(I) Compute the grey relational degree: compute the grey relational degree between the speaker to be identified and each registered speaker with the grey relational degree algorithm, i.e. RA for registered speaker A and speaker X to be identified, RB for registered speaker B and speaker X, and so on, obtaining N grey relational degrees R.
According to one embodiment of the text-independent speaker identification method of the present invention, the centre frequencies of the third-octave spectrum analysis are determined as follows:
The third-octave centre frequencies are f_c = 1000 × 10^{3n/30} Hz (n = 0, ±1, ±2, …);
approximate values of these centre frequencies are chosen, namely: 20 Hz, 25 Hz, 31.5 Hz, 40 Hz, 50 Hz, 63 Hz, 80 Hz, 100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1000 Hz, 1250 Hz, 1600 Hz, 2000 Hz, 2500 Hz, 3150 Hz, 4000 Hz, 5000 Hz, 6300 Hz, 8000 Hz, 10000 Hz, 12500 Hz, 16000 Hz.
According to one embodiment of the text-independent speaker identification method of the present invention, the grey relational degree algorithm is:
Let X = {x_σ(t) | σ = 0, 1, 2, …, m} be the set of sequence correlation factors, i.e. the reference library; x_0 is the reference sequence (mother factor), i.e. one registered speaker;
x_i is a comparison sequence (child factor), i.e. the characteristic factor X of the speaker to be identified; x_σ(k) is the value of x_σ at point k, where i = 1, 2, …, m and k = 1, 2, …, n.
For x_0 and x_i, let
ζ_i(k) = ξ · max_i max_k |x_0(k) − x_i(k)| / ( λ_1 |x_0(k) − x_i(k)| + λ_2 |x_0′(k) − x_i′(k)| + ξ · max_i max_k |x_0(k) − x_i(k)| )
Then the grey relational degree of x_i with respect to x_0 is
γ_i = γ(x_0, x_i) = (1/n) · Σ_{k=1}^{n} ζ_i(k)
where 0 < ξ < 1, λ_1, λ_2 ≥ 0, and λ_1 + λ_2 = 1; the constant ξ is the resolution coefficient, and λ_1, λ_2 are the displacement and rate-of-change weighting coefficients respectively. In practice ξ, λ_1, and λ_2 can be chosen appropriately for the situation.
The beneficial effects of the invention are as follows. The invention extracts characteristic parameters from the speaker's voice signal with the third-octave spectrum analysis method, which extracts more fully the information contained in the voice signal over the whole 20 Hz-20 kHz range audible to the human ear and reduces the adverse effect of incomplete characteristic information on the speaker identification process. The invention performs speaker identification with the grey relational degree algorithm, which considers both the information contained in the speaker's voice signal and the effect of its variation, and reduces the error rate of speaker identification. This text-independent speaker identification method based on the third-octave spectrum and the grey relational degree achieves robust text-independent speaker identification, significantly improves the recognition rate of text-independent speaker voice signals, and has wide application prospects.
Description of drawings
Fig. 1 is the flow chart of the method provided by the invention;
Fig. 2 is the flow chart of the third-octave feature extraction of the invention;
Fig. 3 is the FFT butterfly operation diagram of the invention;
Fig. 4 is the flow chart of the grey relational degree algorithm of the invention;
Fig. 5 is the flow chart of the identification matching and decision of the invention;
Fig. 6 shows one voice signal segment of speaker A of the invention;
Fig. 7 shows the frame signal of one preprocessed voice segment of speaker A of the invention;
Fig. 8 shows a third-octave spectrogram of speaker A of the invention.
Embodiment
The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings and an embodiment. The method of the invention is divided into five steps, as shown in Fig. 1.
Step 1: voice signal preprocessing
1. Sampling and quantization
a) Filter the voice signal with an FIR band-pass filter so that the Nyquist frequency F_N is 20 kHz;
b) Set the speech sampling frequency F ≥ 2 F_N; in the embodiment of the invention F = 51200 Hz;
c) Sample the analog voice signal s_a(t) at the sampling period T = 1/F to obtain the voice signal amplitude sequence s(n) = s_a(nT), where t indicates that the voice signal is continuous in time and n indexes the discrete sequence, taking consecutive natural numbers;
d) Quantize and encode the amplitude sequence s(n) of the digital voice signal; the quantized amplitude sequence is represented in pulse code modulation (PCM) as s′(n).
2. Zero-drift removal
a) Compute the mean value of the quantized amplitude sequence s′(n);
b) Subtract the mean value from each amplitude in the sequence, obtaining the zero-mean amplitude sequence s″(n).
3. Pre-emphasis
a) Set the pre-emphasis factor a in the transfer function H(z) = 1 − a z^{-1} of the digital filter; a takes a value slightly smaller than 1, and in the present embodiment a = 0.96;
b) Pass s″(n) through the digital filter, obtaining the amplitude sequence s‴(n) in which the high-, mid-, and low-frequency amplitudes of the voice signal are suitably balanced.
4. Windowing
a) Compute the frame length N of a speech frame, which satisfies
20 ≤ 1000 N / F ≤ 30,
where F is the speech sampling rate in Hz, i.e. the frame length is 20-30 ms;
b) With frame length N and frame shift N/2, divide s‴(n) into a series of speech frames F_m, each containing N voice signal samples;
c) Compute the Hamming window function
ω(n) = 0.54 − 0.46 cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1,
where N is the frame length of each speech frame F_m;
d) Apply the Hamming window to each speech frame F_m using F_m′(n) = ω(n) × F_m(n), obtaining the windowed frames F_m′(n).
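For illustration only, the following minimal Python sketch (assuming numpy and scipy are available) reproduces this preprocessing chain; the pre-emphasis factor 0.96 and the frame shift N/2 follow the embodiment, while the FIR filter order and the 25 ms frame length are illustrative choices within the stated 20-30 ms constraint:

```python
import numpy as np
from scipy.signal import firwin, lfilter

F = 51200      # speech sampling rate in Hz, as in the embodiment
A_PRE = 0.96   # pre-emphasis factor a from the embodiment

def preprocess(x):
    """Return the windowed frames F_m'(n) of a raw voice signal x."""
    x = np.asarray(x, dtype=float)
    # FIR band-pass limiting the signal to the audible 20 Hz - 20 kHz range
    taps = firwin(255, [20.0, 20000.0], pass_zero=False, fs=F)
    x = lfilter(taps, 1.0, x)
    # Zero-drift removal: subtract the mean so the sequence has mean 0
    x = x - x.mean()
    # Pre-emphasis with H(z) = 1 - a*z^-1
    x = lfilter([1.0, -A_PRE], [1.0], x)
    # Framing: frame length N with 20 <= 1000*N/F <= 30, frame shift N/2
    N = int(0.025 * F)            # 25 ms -> N = 1280 samples
    w = np.hamming(N)             # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = [w * x[i:i + N] for i in range(0, len(x) - N + 1, N // 2)]
    return np.array(frames)       # one row per windowed frame F_m'(n)
```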
Step 2: characteristic parameter extraction
The invention extracts the characteristic parameters of the preprocessed speaker voice signal based on the third-octave spectrum. The algorithm flow is shown in Fig. 2 and detailed as follows:
1. Fast Fourier Transform (FFT) to obtain the power spectrum
The invention converts the time-domain speaker voice signal to the frequency domain with the radix-2 FFT and computes the power spectrum sequence of the speaker's voice signal.
a) Apply radix-2 decimation in time to the voice signal sequence x(n), obtaining the subsequences
x_1(r) = x(2r), r = 0, 1, 2, …, N/2 − 1
x_2(r) = x(2r + 1), r = 0, 1, 2, …, N/2 − 1
where N is the length of the voice signal sequence.
b) Apply the discrete Fourier transform (DFT) to x(n), obtaining the frequency-domain signal of the speaker's voice:
X(k) = Σ_{r=0}^{N/2−1} x_1(r) W_N^{2kr} + W_N^k Σ_{r=0}^{N/2−1} x_2(r) W_N^{2kr}
Because
W_N^{2kr} = e^{−j(2π/N)·2kr} = e^{−j(4π/N)kr} = W_{N/2}^{kr},
the frequency-domain signal of the speaker's voice is
X(k) = X_1(k) + W_N^k X_2(k), k = 0, 1, 2, …, N − 1
where X_1(k) and X_2(k) are the N/2-point DFTs of x_1(r) and x_2(r) respectively:
X_1(k) = Σ_{r=0}^{N/2−1} x_1(r) W_{N/2}^{kr} = DFT[x_1(r)]
X_2(k) = Σ_{r=0}^{N/2−1} x_2(r) W_{N/2}^{kr} = DFT[x_2(r)]
c) Using the periodicity of X_1(k) and X_2(k) (period N/2) and the symmetry W_N^{k+N/2} = −W_N^k, the FFT spectrum sequence is obtained:
X(k) = X_1(k) + W_N^k X_2(k), k = 0, 1, 2, …, N/2 − 1
X(k + N/2) = X_1(k) − W_N^k X_2(k), k = 0, 1, 2, …, N/2 − 1
This butterfly operation is shown in Fig. 3, from which the FFT frequency-domain power spectrum of the preprocessed voice signal is obtained.
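As a sketch (assuming numpy), the power spectrum of one windowed frame can be computed as follows; for a frame length that is a power of two, numpy's FFT corresponds to the radix-2 butterfly scheme derived above:

```python
import numpy as np

def frame_power_spectrum(frame, fs=51200):
    """Power spectrum and line frequencies of one windowed frame."""
    N = len(frame)
    X = np.fft.rfft(frame)                   # DFT of the real-valued frame
    power = np.abs(X) ** 2 / N               # power spectrum estimate
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)   # frequency (Hz) of each line
    return freqs, power
```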
2. Determine the centre frequencies
The third-octave centre frequencies f_c are
f_c = 1000 × 10^{3n/30} Hz (n = 0, ±1, ±2, …)
The invention uses their approximate values, namely: 20 Hz, 25 Hz, 31.5 Hz, 40 Hz, 50 Hz, 63 Hz, 80 Hz, 100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1000 Hz, 1250 Hz, 1600 Hz, 2000 Hz, 2500 Hz, 3150 Hz, 4000 Hz, 5000 Hz, 6300 Hz, 8000 Hz, 10000 Hz, 12500 Hz, 16000 Hz.
3. Compute the band-edge frequencies
The band of each third-octave centre frequency f_c lies between the upper limit frequency f_u and the lower limit frequency f_d. These are related to the centre frequency by
f_u / f_d = 2^{1/3}, f_c / f_d = 2^{1/6}, f_u / f_c = 2^{1/6};
and the bandwidth of the band of each centre frequency f_c is
Δf = f_u − f_d = (2^{1/6} − 2^{−1/6}) f_c.
4. Sound pressure level conversion
The third-octave spectrum analysis divides the whole 20 Hz-20 kHz audio range audible to the human ear into 30 bands of constant bandwidth ratio and computes the sound pressure level of the sound signal falling in each band.
The sound pressure level is obtained from the sound pressure of the signal by
L_p = 20 lg(P / P_0) (dB)
where P_0 is the reference sound pressure, 2 × 10^{-5} Pa.
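A small sketch of this conversion (assuming numpy):

```python
import numpy as np

P0 = 2e-5   # reference sound pressure P_0 = 2 x 10^-5 Pa

def sound_pressure_level(p):
    """Sound pressure level L_p = 20*lg(P/P0) in dB."""
    return 20.0 * np.log10(np.asarray(p, dtype=float) / P0)
```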
5. Compute the spectrum value in the band of each centre frequency f_c
Divide the frequencies in the power spectrum into bands according to the upper and lower limit frequencies and the centre frequencies, and synthesize the power spectrum into the third-octave power spectrum of constant bandwidth ratio. The power spectrum within one third-octave band is synthesized as
S_x(f_n) = ∫_{f_d}^{f_u} S_x(f) df
where S_x(f_n) is the synthesized power spectrum in one third-octave band and S_x(f) is the discrete power spectrum within that band.
For the discrete power spectrum, the power spectrum of the n-th band is
S_{x,n} = Σ_{f_{d,n} ≤ f_i < f_{u,n}} ln(S_{x,n}(f_i))
where S_{x,n}(f_i) is the power spectrum amplitude at each discrete frequency in the band.
The mean value of the band power spectrum is the amplitude A_n of the band:
A_n = (1/n) S_{x,n}
where n here is the number of discrete spectral lines in the band. The amplitudes corresponding to the 30 bands of constant bandwidth ratio in the spectrum are the speaker's characteristic parameters, and these 30 characteristic parameters constitute the speaker's characteristic parameter sequence.
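The band synthesis just described can be sketched as follows (assuming numpy and the frame_power_spectrum helper above; the guard against log(0) and the handling of bands containing no spectral line are implementation choices, not part of the patent):

```python
import numpy as np

# Nominal third-octave centre frequencies in Hz (the 30 bands from step 2)
CENTRES = [20, 25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315,
           400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150,
           4000, 5000, 6300, 8000, 10000, 12500, 16000]

def third_octave_features(freqs, power):
    """30-element characteristic sequence: per-band mean of the
    logarithmically superposed power spectrum."""
    feats = []
    for fc in CENTRES:
        fd, fu = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)  # band edges f_d, f_u
        idx = (freqs >= fd) & (freqs < fu)
        n_lines = max(int(np.sum(idx)), 1)             # lines in the band
        s_xn = np.sum(np.log(power[idx] + 1e-12))      # logarithmic superposition
        feats.append(s_xn / n_lines)                   # band amplitude A_n
    return np.array(feats)
```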
Step 3: build the speaker reference library
Repeat steps 1 and 2 several times and build the characteristic parameter reference library of every registered speaker in the speaker recognition system: the characteristic parameter sequences A1, A2, …, AN of registered speaker A constitute A's reference library, the characteristic parameter sequences B1, B2, …, BN of registered speaker B constitute B's reference library, and so on, until the reference libraries of all registered speakers in the system are built. In the present embodiment there are 14 registered speakers, with 5 characteristic parameter sequences in each speaker's reference library.
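A sketch of building the reference library from raw segments (using the helpers above; note the patent does not spell out how the per-frame features of a segment are pooled into one 30-element sequence, so averaging over frames is an assumption here):

```python
import numpy as np

def segment_features(segment):
    """One 30-element characteristic parameter sequence per voice segment.
    Averaging the per-frame features over the segment is an assumption;
    the patent does not specify the frame-to-segment pooling."""
    frames = preprocess(segment)
    per_frame = [third_octave_features(*frame_power_spectrum(f, fs=F))
                 for f in frames]
    return np.mean(per_frame, axis=0)

def build_reference_library(signals_by_speaker):
    """signals_by_speaker: {speaker name: list of raw voice segments}.
    Returns {speaker name: list of characteristic parameter sequences}."""
    return {name: [segment_features(seg) for seg in segments]
            for name, segments in signals_by_speaker.items()}
```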
Step 4: compute the grey relational degree
The flow of the grey relational degree algorithm of the invention is shown in Fig. 4 and detailed as follows:
1. Construct the characteristic parameter correlation groups
a) Obtain the characteristic parameter sequence X0 of the speaker X to be identified, and extract every characteristic parameter sequence from the reference libraries of all registered speakers, i.e. the sequences A1, A2, …, AN of registered speaker A, the sequences B1, B2, …, BN of registered speaker B, and so on.
b) Add the characteristic parameter sequence of the speaker to be identified to the reference library of each registered speaker in the recognition system and, according to the time invariance of the frequency-domain signal, assign the same weight coefficient uniformly to these sequences, so that the speaker to be identified recombines with each registered speaker into a weighted-mean characteristic parameter sequence. That is, registered speaker A and the speaker X to be identified form the sequence ω_{11}A1, ω_{12}A2, …, ω_{1n}AN, ω_{1x}X0, where ω_{11} = ω_{12} = ⋯ = ω_{1n} = ω_{1x} and ω_{11} + ω_{12} + ⋯ + ω_{1n} + ω_{1x} = 1; registered speaker B and speaker X form the sequence ω_{21}B1, ω_{22}B2, …, ω_{2n}BN, ω_{2x}X0, where ω_{21} = ω_{22} = ⋯ = ω_{2n} = ω_{2x} and ω_{21} + ω_{22} + ⋯ + ω_{2n} + ω_{2x} = 1; and so on.
c) By the superposition principle, obtain the grey relational weighted-mean characteristic parameter sequence of the speaker to be identified with each registered speaker in the recognition system, i.e. registered speaker A and speaker X form the new characteristic parameter sequence AY = ω_{11}A1 + ω_{12}A2 + ⋯ + ω_{1n}AN + ω_{1x}X0, registered speaker B and speaker X form the new characteristic parameter sequence BY = ω_{21}B1 + ω_{22}B2 + ⋯ + ω_{2n}BN + ω_{2x}X0, and so on.
d) Let X = {x_σ(t) | σ = 0, 1, 2, …, m} be the set of sequence correlation factors, x_0 the reference sequence (mother factor), x_i a comparison sequence (child factor), and x_σ(k) the value of x_σ at point k, where i = 1, 2, …, m and k = 1, 2, …, n.
For x_0 and x_i, let
ζ_i(k) = ξ · max_i max_k |x_0(k) − x_i(k)| / ( λ_1 |x_0(k) − x_i(k)| + λ_2 |x_0′(k) − x_i′(k)| + ξ · max_i max_k |x_0(k) − x_i(k)| )
which gives the grey relational degree of x_i with respect to x_0:
γ_i = γ(x_0, x_i) = (1/n) · Σ_{k=1}^{n} ζ_i(k)
where 0 < ξ < 1, λ_1, λ_2 ≥ 0, and λ_1 + λ_2 = 1; the constant ξ is the resolution coefficient, and λ_1, λ_2 are the displacement and rate-of-change weighting coefficients respectively. In practice ξ, λ_1, and λ_2 can be chosen appropriately for the situation.
In the present embodiment the resolution coefficient is ξ = 0.9, the displacement weighting coefficient λ_1 = 0.95, and the rate-of-change weighting coefficient λ_2 = 0.05. Following the above steps, the grey relational degree between the speaker to be identified and every registered speaker is computed, i.e. RA for registered speaker A and speaker X to be identified, RB for registered speaker B and speaker X, and so on.
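A sketch of this computation (assuming numpy; the coefficient values follow the embodiment, while pairing the weighted-mean sequence against X0 as reference and comparison sequences is one plausible reading of steps a) to d), which the patent leaves implicit):

```python
import numpy as np

XI, LAM1, LAM2 = 0.9, 0.95, 0.05   # resolution, displacement and
                                   # rate-of-change coefficients (embodiment)

def grey_relational_degree(x0, xi):
    """Grey relational degree gamma(x0, xi) with displacement and
    rate-of-change terms; with a single comparison sequence the double
    max over i and k reduces to a max over k."""
    x0, xi = np.asarray(x0, float), np.asarray(xi, float)
    d0 = np.abs(x0 - xi)                            # |x0(k) - xi(k)|
    d1 = np.abs(np.gradient(x0) - np.gradient(xi))  # |x0'(k) - xi'(k)| estimate
    dmax = d0.max()
    zeta = XI * dmax / (LAM1 * d0 + LAM2 * d1 + XI * dmax)
    return zeta.mean()                              # (1/n) * sum_k zeta(k)

def match_score(library, x0_seq):
    """Weighted-mean sequence AY = w*A1 + ... + w*AN + w*X0 with identical
    weights summing to 1, then its grey relational degree to X0."""
    seqs = [np.asarray(s, float) for s in library] + [np.asarray(x0_seq, float)]
    weighted_mean = np.sum(seqs, axis=0) / len(seqs)
    return grey_relational_degree(np.asarray(x0_seq, float), weighted_mean)
```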
Step 5: identification matching and decision
The speaker identification matching and decision process of the invention is shown in Fig. 5 and proceeds as follows:
1. Obtain the maximum grey relational degree
From the grey relational degrees between the speaker to be identified and all registered speakers, extract the maximum, R_max = max{RA, RB, …}, where RA is the grey relational degree between the speaker X to be identified and registered speaker A, RB that between speaker X and registered speaker B, and so on.
2. Speaker identification matching and decision
Compare the extracted maximum R_max with the grey relational degree recognition threshold R_θ. If R_max ≥ R_θ the match succeeds, i.e. the speaker to be identified is the registered speaker with the maximum weighted grey relational degree; otherwise the match fails and the speaker to be identified is not a registered speaker of the recognition system. The recognition threshold R_θ is obtained from statistical analysis of a large number of experiments.
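A sketch of the final decision (using the match_score helper above; the threshold value 0.9 follows the embodiment):

```python
R_THETA = 0.9   # grey relational degree recognition threshold (embodiment)

def identify(reference_libraries, x0_seq):
    """Return the best-matching registered speaker, or None if R_max < R_theta."""
    scores = {name: match_score(lib, x0_seq)
              for name, lib in reference_libraries.items()}
    best = max(scores, key=scores.get)   # speaker with R_max = max{RA, RB, ...}
    return best if scores[best] >= R_THETA else None
```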
The present embodiment collects the voice signals of 14 speakers (7 male, 7 female); each speaker records 10 segments with different text content, each lasting 28 seconds, and the text content also differs between speakers. To reduce the voice variation introduced at the beginning and end of a recording, the first and last 3 seconds of each segment are clipped, leaving 22 seconds per segment. On this basis, 5 voice segments are selected at random for each speaker, preprocessed and feature-extracted as described above, and the registered speakers' characteristic parameter reference library is built; then one of the remaining segments is taken at random, preprocessed and feature-extracted as described above to obtain the characteristic parameter sequence of the speaker to be identified, and the grey relational degrees are computed; finally the maximum weighted grey relational degree is extracted and compared with the recognition threshold to give the speaker identification result. The speakers are denoted A, B, C, D, E, F, G, H, I, J, K, L, M, N; the concrete implementation steps of this embodiment are detailed below.
One collected voice segment of speaker A is extracted; its time-domain signal is shown in Fig. 6. It is sampled and quantized, zero-drift removed, pre-emphasized, and windowed in sequence as described above to obtain the preprocessed voice signal, whose frame signal is shown in Fig. 7. The third-octave spectrum analysis method is then applied to the preprocessed signal, giving the third-octave spectrum shown in Fig. 8 and the characteristic parameter sequence listed in Table 1.
Table 1: characteristic parameter sequence of registered speaker A (values not reproduced here)
Following the above steps, features are extracted from the other four voice segments of speaker A to obtain their characteristic parameter sequences; all characteristic parameter sequences of speaker A are then combined to build A's characteristic parameter reference library, shown in Table 2. Following the same procedure, the characteristic parameter reference libraries of speakers B, C, D, E, F, G, H, I, J, K, L, M, and N are built in turn.
Table 2: characteristic parameter reference library of registered speaker A (values not reproduced here)
One remaining voice segment of speaker A is taken at random and, following the above implementation steps, preprocessed and feature-extracted in sequence to obtain the characteristic parameter sequence of the speaker to be identified. With the grey relational degree algorithm provided by the invention, the grey relational degrees between speaker A to be identified and registered speakers A, B, C, D, E, F, G, H, I, J, K, L, M, N are computed; the results are shown in Table 3.
Table 3: grey relational degrees between speaker A to be identified and the registered speakers

      A      B      C      D      E      F      G
A  0.9528 0.8006 0.7440 0.8039 0.7995 0.8598 0.8016

      H      I      J      K      L      M      N
A  0.7903 0.8267 0.7804 0.8741 0.8057 0.8887 0.7945
One remaining voice segment of each of the other speakers is extracted in turn and, following the same procedure as for speaker A to be identified, the grey relational degrees between speakers B, C, D, E, F, G, H, I, J, K, L, M, N to be identified and all registered speakers are computed. The results are shown in Table 4, where the column letters denote the registered speakers and the row letters denote the speakers to be identified.
Table 4: grey relational degrees between all speakers to be identified and all registered speakers
A B C D E F G
A 0.9528 0.8006 0.7440 0.8039 0.7995 0.8598 0.8016
B 0.8295 0.9050 0.8281 0.8699 0.8693 0.8387 0.8967
C 0.7306 0.8556 0.9628 0.8324 0.7968 0.7509 0.8407
D 0.7935 0.8371 0.7769 0.8762 0.8421 0.8335 0.8324
E 0.8214 0.8601 0.8119 0.8426 0.9645 0.8501 0.8921
F 0.8659 0.8292 0.7851 0.8391 0.8647 0.9489 0.8447
G 0.7940 0.9030 0.8868 0.8750 0.8899 0.8159 0.9324
H 0.7799 0.7990 0.8216 0.7979 0.7488 0.7641 0.7857
I 0.7949 0.8201 0.7710 0.8335 0.8437 0.8091 0.8178
J 0.8086 0.7748 0.8327 0.8450 0.8106 0.8024 0.8251
K 0.8710 0.7829 0.7517 0.8055 0.7924 0.8763 0.8041
L 0.8142 0.8276 0.8629 0.8865 0.9038 0.8343 0.9274
M 0.8958 0.8350 0.7777 0.8239 0.8207 0.8965 0.8273
N 0.8103 0.8896 0.8593 0.8784 0.8838 0.8242 0.9081
H I J K L M N
A 0.7903 0.8267 0.7804 0.8741 0.8057 0.8887 0.7945
B 0.7761 0.8681 0.7816 0.8188 0.8749 0.8415 0.8675
C 0.798 0.8425 0.8151 0.7278 0.8138 0.7466 0.8425
D 0.7182 0.8530 0.7202 0.7804 0.8238 0.7953 0.8465
E 0.7697 0.8717 0.7671 0.8049 0.9012 0.8349 0.8842
F 0.7909 0.8717 0.7925 0.8900 0.8479 0.9072 0.8325
G 0.8190 0.8892 0.8326 0.7916 0.9209 0.8058 0.9047
H 0.9432 0.8127 0.8982 0.8106 0.7702 0.8063 0.7913
I 0.7299 0.9198 0.7214 0.7715 0.8157 0.7775 0.8432
J 0.8935 0.7634 0.9605 0.8445 0.8095 0.8514 0.8099
K 0.8380 0.8286 0.8370 0.9502 0.8075 0.9011 0.7990
L 0.8127 0.8667 0.8234 0.8117 0.9435 0.8227 0.9051
M 0.8359 0.8318 0.8401 0.9094 0.8235 0.9565 0.815
N 0.8053 0.8598 0.8058 0.805 0.8984 0.8158 0.9310
The maximum grey relational degree between each speaker to be identified and all registered speakers is extracted as described above; see the bold values in Table 4. Based on the analysis of a large number of experimental results, the grey relational degree recognition threshold chosen in this embodiment is 0.9. Comparing each extracted maximum with the threshold gives the speaker identification results shown in Table 5.
Table 5: speaker identification results

Total number of speakers to be identified: 14
Number whose maximum grey relational degree exceeds the recognition threshold: 13
Accuracy of text-independent speaker identification: 92.86%
The recognition results in Table 5 show that the text-independent speaker identification method based on the third-octave spectrum and the grey relational degree provided by the invention, which extracts characteristic parameters from the speaker's voice signal with the third-octave spectrum analysis method and performs speaker identification with the grey relational degree algorithm, improves the accuracy of text-independent speaker identification, achieves robust text-independent speaker identification, and has wide application prospects.
The text-independent speaker identification method based on the third-octave spectrum and the grey relational degree provided by the present invention has been described above in detail, and its principle and implementation have been further elaborated through a concrete embodiment. The description of the embodiment is intended only to help understand the method of the invention and its core idea, not to limit the invention; any modification or variation of the invention within the spirit of the invention and the protection scope of the claims falls within the protection scope of the invention.

Claims (5)

1. A text-independent speaker identification method, characterized in that it comprises the steps of:
One, building the voice feature reference library of N speakers and setting the grey relational degree recognition threshold R_θ, N being an integer greater than or equal to 1, as follows:
A. acquiring the 1st voice signal segment of the 1st speaker and applying, in sequence, sampling and quantization, zero-drift removal, pre-emphasis, and windowing, obtaining the windowed speech frames F_m′(n) of segment 1-1;
B. applying the third-octave spectrum analysis method to the frames F_m′(n) of segment 1-1 to obtain characteristic parameter 1-1, the characteristic parameter being the sequence of power spectrum values of the band in which each centre frequency lies;
C. performing steps A and B M times for each of the N speakers in turn, obtaining N × M characteristic parameters, the N characteristic parameter sequences forming the voice feature reference library;
Two, obtaining N grey relational degrees, as follows:
I. acquiring the characteristic parameter X of the speaker to be identified through steps A and B;
II. adding the sequence of characteristic parameter X to each reference library, assigning, according to the time invariance of the frequency-domain signal, the same weight coefficient uniformly to the N characteristic parameter sequences, recombining them into N weighted-mean characteristic parameter sequences, and obtaining N grey relational degree values;
Three, identification matching: extracting the maximum R_max of the N grey relational degree values and comparing it with R_θ; if R_max ≥ R_θ the match succeeds, otherwise there is no match.
2. The text-independent speaker identification method according to claim 1, characterized in that the feature extraction of step B comprises:
(A) signal time-frequency conversion: converting the time-domain speech signal to the frequency domain with the radix-2 FFT and computing the power spectrum of the speaker's voice signal;
(B) determining the centre frequencies f_c of the third-octave spectrum analysis;
(C) computing the upper and lower limit frequencies, which relate to the centre frequency by
f_u / f_d = 2^{1/3}, f_c / f_d = 2^{1/6}, f_u / f_c = 2^{1/6};
(D) sound pressure level conversion:
L_p = 20 lg(P / P_0) (dB)
where P_0 is the reference sound pressure, 2 × 10^{-5} Pa;
(E) computing the mean value of the power spectrum in the band of each centre frequency f_c: dividing the frequencies in the power spectrum into bands according to the upper and lower limit frequencies and the centre frequencies of the third-octave, superposing all power amplitudes within each band logarithmically, and obtaining the third-octave spectrum, whose amplitudes are the characteristic parameters.
3. The text-independent speaker identification method according to claim 1, characterized in that the grey relational degree calculation of step II comprises:
(F) extracting the characteristic parameter sequences: obtaining the sequence X0 of the characteristic parameter X of the speaker to be identified, and extracting every characteristic parameter sequence from the reference libraries of all registered speakers, i.e. the sequences A1, A2, …, AN of registered speaker A, the sequences B1, B2, …, BN of registered speaker B, and so on;
(G) constructing the weighted-mean characteristic parameter sequences: adding the characteristic parameter sequence of the speaker to be identified to the reference library of each registered speaker in the recognition system and, according to the time invariance of the frequency-domain signal, assigning the same weight coefficient uniformly to these sequences, so that the speaker to be identified recombines with each registered speaker into a weighted-mean characteristic parameter sequence; that is, registered speaker A and the speaker X to be identified form the sequence ω_{11}A1, ω_{12}A2, …, ω_{1n}AN, ω_{1x}X0, where ω_{11} = ω_{12} = ⋯ = ω_{1n} = ω_{1x} and ω_{11} + ω_{12} + ⋯ + ω_{1n} + ω_{1x} = 1; registered speaker B and speaker X form the sequence ω_{21}B1, ω_{22}B2, …, ω_{2n}BN, ω_{2x}X0, where ω_{21} = ω_{22} = ⋯ = ω_{2n} = ω_{2x} and ω_{21} + ω_{22} + ⋯ + ω_{2n} + ω_{2x} = 1, and so on;
(H) superposing to generate the weighted-mean grey relational characteristic parameter sequences: by the superposition principle, obtaining the weighted-mean grey relational characteristic parameter sequence of the speaker to be identified with each registered speaker in the recognition system, i.e. registered speaker A and speaker X form the new characteristic parameter sequence AY = ω_{11}A1 + ω_{12}A2 + ⋯ + ω_{1n}AN + ω_{1x}X0, registered speaker B and speaker X form the new characteristic parameter sequence BY = ω_{21}B1 + ω_{22}B2 + ⋯ + ω_{2n}BN + ω_{2x}X0, and so on;
(I) computing the grey relational degree: computing the grey relational degree between the speaker to be identified and each registered speaker with the grey relational degree algorithm, i.e. RA for registered speaker A and speaker X to be identified, RB for registered speaker B and speaker X, and so on, obtaining N grey relational degrees R.
4. The text-independent speaker identification method according to claim 2, characterized in that the centre frequencies of the third-octave spectrum analysis are determined as follows:
the third-octave centre frequencies are f_c = 1000 × 10^{3n/30} Hz (n = 0, ±1, ±2, …);
approximate values of the centre frequencies are chosen, namely: 20 Hz, 25 Hz, 31.5 Hz, 40 Hz, 50 Hz, 63 Hz, 80 Hz, 100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1000 Hz, 1250 Hz, 1600 Hz, 2000 Hz, 2500 Hz, 3150 Hz, 4000 Hz, 5000 Hz, 6300 Hz, 8000 Hz, 10000 Hz, 12500 Hz, 16000 Hz.
5. The text-independent speaker identification method according to claim 3, characterized in that the grey relational degree algorithm is:
let X = {x_σ(t) | σ = 0, 1, 2, …, m} be the set of sequence correlation factors, i.e. the reference library; x_0 is the reference sequence (mother factor), i.e. one registered speaker;
x_i is a comparison sequence (child factor), i.e. the characteristic factor X of the speaker to be identified; x_σ(k) is the value of x_σ at point k, where i = 1, 2, …, m and k = 1, 2, …, n;
for x_0 and x_i, let
ζ_i(k) = ξ · max_i max_k |x_0(k) − x_i(k)| / ( λ_1 |x_0(k) − x_i(k)| + λ_2 |x_0′(k) − x_i′(k)| + ξ · max_i max_k |x_0(k) − x_i(k)| )
then the grey relational degree of x_i with respect to x_0 is
γ_i = γ(x_0, x_i) = (1/n) · Σ_{k=1}^{n} ζ_i(k)
where 0 < ξ < 1, λ_1, λ_2 ≥ 0, and λ_1 + λ_2 = 1; the constant ξ is the resolution coefficient, and λ_1, λ_2 are the displacement and rate-of-change weighting coefficients respectively; in practice ξ, λ_1, and λ_2 can be chosen appropriately for the situation.
CN201110428379.2A 2011-12-20 2011-12-20 Speaker identification method irrelevant with text Expired - Fee Related CN102496366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110428379.2A CN102496366B (en) 2011-12-20 2011-12-20 Speaker identification method irrelevant with text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110428379.2A CN102496366B (en) 2011-12-20 2011-12-20 Speaker identification method irrelevant with text

Publications (2)

Publication Number Publication Date
CN102496366A true CN102496366A (en) 2012-06-13
CN102496366B CN102496366B (en) 2014-04-09

Family

ID=46188183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110428379.2A Expired - Fee Related CN102496366B (en) 2011-12-20 2011-12-20 Speaker identification method irrelevant with text

Country Status (1)

Country Link
CN (1) CN102496366B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104167208A (en) * 2014-08-08 2014-11-26 中国科学院深圳先进技术研究院 Speaker recognition method and device
CN105244031A (en) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker identification method and device
CN106328168A (en) * 2016-08-30 2017-01-11 成都普创通信技术股份有限公司 Voice signal similarity detection method
CN108154189A (en) * 2018-01-10 2018-06-12 重庆邮电大学 Grey relational cluster method based on LDTW distances
CN109065026A (en) * 2018-09-14 2018-12-21 海信集团有限公司 A kind of recording control method and device
CN112885355A (en) * 2021-01-25 2021-06-01 上海头趣科技有限公司 Speech recognition method based on multiple features

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1986005618A1 (en) * 1985-03-21 1986-09-25 American Telephone & Telegraph Company Individual recognition by voice analysis
US5548647A (en) * 1987-04-03 1996-08-20 Texas Instruments Incorporated Fixed text speaker verification method and apparatus
US5950157A (en) * 1997-02-28 1999-09-07 Sri International Method for establishing handset-dependent normalizing models for speaker recognition
CN1941080A (en) * 2005-09-26 2007-04-04 吴田平 Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building
CN101266792A (en) * 2007-03-16 2008-09-17 Fujitsu Ltd. Speech recognition system and method for speech recognition
CN101405739A (en) * 2002-12-26 2009-04-08 Motorola Inc. (a Delaware corporation) Identification apparatus and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1986005618A1 (en) * 1985-03-21 1986-09-25 American Telephone & Telegraph Company Individual recognition by voice analysis
US5548647A (en) * 1987-04-03 1996-08-20 Texas Instruments Incorporated Fixed text speaker verification method and apparatus
US5950157A (en) * 1997-02-28 1999-09-07 Sri International Method for establishing handset-dependent normalizing models for speaker recognition
CN101405739A (en) * 2002-12-26 2009-04-08 Motorola Inc. (a Delaware corporation) Identification apparatus and method
CN1941080A (en) * 2005-09-26 2007-04-04 吴田平 Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building
CN101266792A (en) * 2007-03-16 2008-09-17 Fujitsu Ltd. Speech recognition system and method for speech recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Hong et al., "Text-independent speaker identification based on the long-term average spectrum", Technical Acoustics, vol. 21, no. 2, 31 December 2002, pages 59-62 *
Zeng Yumin et al., "Anti-noise speaker recognition based on weighted sub-band reconstruction of the harmonic spectrum of voiced speech", Journal of Southeast University (Natural Science Edition), vol. 38, no. 6, 30 November 2008, pages 925-941 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104167208A (en) * 2014-08-08 2014-11-26 中国科学院深圳先进技术研究院 Speaker recognition method and device
CN104167208B (en) * 2014-08-08 2017-09-15 中国科学院深圳先进技术研究院 A kind of method for distinguishing speek person and device
CN105244031A (en) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker identification method and device
CN106328168A (en) * 2016-08-30 2017-01-11 成都普创通信技术股份有限公司 Voice signal similarity detection method
CN108154189A (en) * 2018-01-10 2018-06-12 重庆邮电大学 Grey relational cluster method based on LDTW distances
CN109065026A (en) * 2018-09-14 2018-12-21 海信集团有限公司 A kind of recording control method and device
CN112885355A (en) * 2021-01-25 2021-06-01 上海头趣科技有限公司 Speech recognition method based on multiple features

Also Published As

Publication number Publication date
CN102496366B (en) 2014-04-09

Similar Documents

Publication Publication Date Title
CN102893326B (en) Chinese voice emotion extraction and modeling method combining emotion points
CN101178897B (en) Speaking man recognizing method using base frequency envelope to eliminate emotion voice
CN102496366A (en) Speaker identification method irrelevant with text
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
CN101226743A (en) Method for recognizing speaker based on conversion of neutral and affection sound-groove model
CN106024010B (en) A kind of voice signal dynamic feature extraction method based on formant curve
CN102968990A (en) Speaker identifying method and system
CN104887263A (en) Identity recognition algorithm based on heart sound multi-dimension feature extraction and system thereof
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
Waghmare et al. Emotion recognition system from artificial marathi speech using MFCC and LDA techniques
CN103456302A (en) Emotion speaker recognition method based on emotion GMM model weight synthesis
CN109272986A (en) A kind of dog sound sensibility classification method based on artificial neural network
Abdallah et al. Text-independent speaker identification using hidden Markov model
Linh et al. MFCC-DTW algorithm for speech recognition in an intelligent wheelchair
Fagerlund et al. New parametric representations of bird sounds for automatic classification
Kumar et al. Hybrid of wavelet and MFCC features for speaker verification
Martin et al. Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification
Kumar et al. Text dependent speaker identification in noisy environment
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Aggarwal et al. Performance evaluation of artificial neural networks for isolated Hindi digit recognition with LPC and MFCC
Islam et al. A Novel Approach for Text-Independent Speaker Identification Using Artificial Neural Network
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language
GS et al. Synthetic speech classification using bidirectional LSTM Networks
Abdulwahid et al. Arabic Speaker Identification System for Forensic Authentication Using K-NN Algorithm
Dua et al. Speaker recognition using noise robust features and LSTM-RNN

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140409

Termination date: 20161220

CF01 Termination of patent right due to non-payment of annual fee