CN102496366A - Speaker identification method irrelevant with text - Google Patents
- Publication number: CN102496366A (application CN201110428379A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention relates to a text-independent speaker recognition method. The method mainly comprises the following steps: (1) acquiring a speaker's voice signal and preprocessing it to obtain a preprocessed voice signal; (2) performing feature extraction on the preprocessed voice signal to obtain a characteristic parameter of the speaker in the recognition system; (3) repeating the above two steps several times to obtain a characteristic parameter sequence for each registered speaker and establish a characteristic parameter reference library of all registered speakers; (4) acquiring the characteristic parameter sequence of the speaker to be identified and calculating a weighted grey relational degree between the speaker to be identified and each registered speaker; (5) extracting the maximum of the weighted grey relational degrees and comparing it with a recognition threshold to obtain the identification result. The invention belongs to the field of biometric identification technology, in particular to speaker recognition. Existing text-independent speaker recognition techniques suffer from high error rates; the method of the invention addresses this problem and has wide application prospects.
Description
Technical field
The present invention relates to biometric identification technology, and in particular to a text-independent speaker recognition method based on third-octave spectral analysis and grey relational analysis.
Technical background
With the development of computer technology and the growing informatization of society, identification or verification based on human biometric features (such as fingerprints, voiceprints and images) has become a very important advanced technology in the information industry. Speaker recognition uses a person's speech to identify or verify the speaker's identity, and can be widely applied in fields such as police and judicial work, business transactions, banking and finance, protection of personal secrets, and security inspection.
Research in speaker recognition focuses on the extraction of characteristic parameters and the construction of recognition algorithms. Feature extraction means extracting from the speaker's voice signal characteristic parameters that express the voice comprehensively and accurately. At present, the characteristic parameters used in speech recognition are the LPCC (Linear Prediction Cepstrum Coefficient) parameters based on the vocal tract model, the MFCC (Mel Frequency Cepstrum Coefficient) parameters based on auditory mechanisms, and their improvements and combinations, but the amount of voice information these parameters capture is insufficient. The present invention therefore proposes to extract characteristic parameters from the voice signal with the third-octave spectral analysis method. This method divides the entire audible range of 20 Hz-20 kHz into 30 bands of constant relative bandwidth and performs spectral analysis on the sound signal falling in each band; it expresses the information contained in the speaker's voice signal more accurately and thereby strengthens the robustness of the speaker characteristic parameters.
In speech technology research and applications, there are three kinds of recognition algorithms: methods based on the vocal tract model and phonetic knowledge, template matching, and artificial neural networks. Although research on vocal-tract and phonetic-knowledge methods started early, they are too complex and have not yet achieved good practical results. Template matching methods include dynamic time warping (DTW), hidden Markov model (HMM) theory and vector quantization (VQ); these algorithms have poor anti-interference ability in noisy environments and cannot reach a good recognition effect. Artificial neural networks offer adaptivity, parallelism, robustness, fault tolerance and learning ability, and their powerful classification and input-output mapping abilities are attractive in speech recognition, but their overly long training and recognition times prevent good practical results. The present invention proposes to perform speaker recognition with a method based on the grey relational degree, which considers both the information contained in the speaker's voice signal and the effect of its changes, and significantly improves the recognition rate of the voice signal.
Speaker recognition can be divided into text-dependent and text-independent; both identify the speaker from the characteristic information contained in the voice signal. Text-dependent recognition restricts the spoken text and examines only one or a few characteristic parameters of the speaker's voice, so it is easier to impersonate and the confidentiality of the recognition system is low. Text-independent recognition allows arbitrary spoken text and gives the recognition system good flexibility; however, because of the richness of the characteristic information in the voice signal and the complexity of noise in real environments, the steps of traditional speaker recognition methods are cumbersome.
Summary of the invention
To remedy the defects of the above techniques and improve the text-independent speaker recognition rate, the present invention provides a text-independent speaker recognition method based on third-octave spectral analysis and grey relational analysis. The method extracts features from the speaker's voice signal with the third-octave spectral analysis method and performs speaker recognition with a grey relational degree algorithm; it is a reliable and effective text-independent speaker recognition method with good robustness.
To reach the above goal of the invention, the method comprises the following steps:
One: establish a speech feature reference library for N speakers, where N is an integer greater than or equal to 1:
A. Acquire the 1st voice segment of the 1st speaker and successively apply sampling and quantization, zero-drift removal, pre-emphasis and windowing, obtaining the windowed audio frames F_m'(n);
B. Apply the third-octave spectral analysis method to the frames F_m'(n) to obtain characteristic parameter 1-1, where a characteristic parameter is the sequence of power spectrum values of the bands around the centre frequencies, and "1-1" denotes the 1st voice segment of the 1st speaker;
C. Each of the N speakers performs steps A and B M times, successively yielding N × M characteristic parameters, which form the characteristic parameter reference library;
Two: obtain N grey relational degrees:
I. Acquire the characteristic parameter X of the speaker to be identified through steps A and B;
II. Add the sequence of characteristic parameter X to each of the sequences in the reference library; according to the time invariance of the frequency-domain signal, give the N characteristic parameter sequences identical weight coefficients; recombine them into N weighted-average characteristic parameter sequences and obtain N grey relational degree values;
Three: recognition matching: extract the maximum R_max of the N grey relational degree values and compare it with the recognition threshold R_θ; if R_max ≥ R_θ, the match succeeds, otherwise there is no match.
According to one embodiment of the text-independent speaker recognition method of the present invention, the feature extraction in step B comprises:
(A) Time-frequency conversion: convert the time-domain speech signal into a frequency-domain signal with the radix-2 FFT algorithm and compute the power spectrum of the speaker's voice signal;
(B) Determine the centre frequencies f_c of the third-octave spectral analysis method;
(C) Compute the upper and lower limit frequencies: the upper and lower limit frequencies of a third-octave band relate to its centre frequency by f_u = 2^(1/6) · f_c and f_d = 2^(-1/6) · f_c;
(D) Convert to sound pressure level, i.e. L_p = 20 lg(P / P_0), where P_0 is the reference sound pressure, whose value is 2 × 10^-5 Pa;
(E) Compute the mean power spectrum value of the band around each centre frequency f_c: partition the frequencies of the power spectrum into bands according to the third-octave upper and lower limit and centre frequencies, and within each band superpose all power amplitudes logarithmically, obtaining the third-octave spectrum, whose amplitudes are the characteristic parameters.
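Steps (A)-(E) above can be sketched end to end in Python. This is an illustrative simplification under stated assumptions: the function name, the frame length, and the plain power averaging (which skips the patent's logarithmic sound-pressure-level superposition) are assumptions, not the patent's exact implementation.

```python
import numpy as np

def third_octave_features(frame, fs):
    """Sketch of the feature extraction: FFT power spectrum, third-octave
    band partition with edges 2^(±1/6)*fc, and per-band averaging.
    Simplified: averages raw power rather than sound pressure levels."""
    n = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n      # power spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # 30 nominal third-octave centre frequencies, 20 Hz - 16 kHz
    fc = np.array([20, 25, 31.5, 40, 50, 63, 80, 100, 125, 160,
                   200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600,
                   2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000,
                   12500, 16000], dtype=float)
    features = []
    for f in fc:
        lo, hi = f * 2 ** (-1 / 6), f * 2 ** (1 / 6)  # band edges f_d, f_u
        band = power[(freqs >= lo) & (freqs < hi)]
        features.append(band.mean() if band.size else 0.0)
    return np.array(features)                          # 30 parameters
```

For a 1 kHz tone sampled at 51200 Hz, the largest feature value falls in the band whose centre frequency is 1000 Hz, as expected.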
According to one embodiment of the text-independent speaker recognition method of the present invention, the grey relational degree calculation in step II comprises:
(F) Extract the characteristic parameter sequences: obtain the sequence X0 of the characteristic parameter X of the speaker to be identified, and extract the characteristic parameter sequences from every registered speaker's reference library, i.e. sequences A1, A2, …, AN of registered speaker A, sequences B1, B2, …, BN of registered speaker B, and so on;
(G) Construct the weighted-average characteristic parameter sequences: add the sequence of the speaker to be identified to every registered speaker's reference library in the recognition system, and, according to the time invariance of the frequency-domain signal, give these characteristic parameter sequences identical weight coefficients, so that the speaker to be identified recombines with each registered speaker into a weighted-average characteristic parameter sequence. That is, registered speaker A and speaker X to be identified form the sequences ω11·A1, ω12·A2, …, ω1n·AN, ω1x·X0, where ω11 = ω12 = … = ω1n = ω1x and ω11 + ω12 + … + ω1n + ω1x = 1; registered speaker B and speaker X form ω21·B1, ω22·B2, …, ω2n·BN, ω2x·X0, where ω21 = ω22 = … = ω2n = ω2x and ω21 + ω22 + … + ω2n + ω2x = 1; and so on;
(H) Superpose into weighted-average grey relational characteristic parameter sequences: according to the superposition principle, compute the weighted-average grey relational characteristic parameter sequence of the speaker to be identified with every registered speaker in the recognition system, i.e. registered speaker A and speaker X form the new sequence AY = ω11·A1 + ω12·A2 + … + ω1n·AN + ω1x·X0, registered speaker B and speaker X form BY = ω21·B1 + ω22·B2 + … + ω2n·BN + ω2x·X0, and so on;
(I) Compute the grey relational degrees: compute the grey relational degree between the speaker to be identified and each registered speaker by the grey relational degree algorithm, i.e. RA for registered speaker A, RB for registered speaker B, and so on, obtaining N grey relational degree values R.
According to one embodiment of the text-independent speaker recognition method of the present invention, the centre frequencies of the third-octave spectral analysis method are determined as follows:
The centre frequencies of the third octave are f_c = 1000 × 10^(3n/30) Hz (n = 0, ±1, ±2, …);
The nominal (rounded) centre frequencies are chosen, namely: 20 Hz, 25 Hz, 31.5 Hz, 40 Hz, 50 Hz, 63 Hz, 80 Hz, 100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1000 Hz, 1250 Hz, 1600 Hz, 2000 Hz, 2500 Hz, 3150 Hz, 4000 Hz, 5000 Hz, 6300 Hz, 8000 Hz, 10000 Hz, 12500 Hz, 16000 Hz.
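The centre-frequency formula can be checked numerically. The index range -17…12 below is an assumption chosen so that the 30 exact values round to the nominal list above:

```python
def third_octave_centres(n_low=-17, n_high=12):
    """Exact third-octave centre frequencies f_c = 1000 * 10^(3n/30) Hz;
    the nominal values in the text are their rounded forms."""
    return [1000 * 10 ** (3 * n / 30) for n in range(n_low, n_high + 1)]
```

For example, n = -17 gives about 19.95 Hz (nominal 20 Hz) and n = 12 gives about 15849 Hz (nominal 16000 Hz).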
According to one embodiment of the text-independent speaker recognition method of the present invention, the grey relational degree algorithm is as follows:
Let X = {x_σ(t) | σ = 0, 1, 2, …, m} be the set of relational factor sequences, i.e. the reference library; x_0 is the reference function (parent factor), i.e. one registered speaker; x_i is the comparison function (child factor), i.e. the characteristic factor X of the speaker to be identified; x_σ(k) is the value of x_σ at point k, where i = 1, 2, …, m and k = 1, 2, …, n.
For x_0 and x_i, the grey relational coefficients are formed, and from them the grey relational degree of x_i with respect to x_0 is obtained, where 0 < ξ < 1, λ_1, λ_2 ≥ 0 and λ_1 + λ_2 = 1; the constant ξ is the resolution coefficient, and λ_1 and λ_2 are the displacement and rate-of-change weighting coefficients respectively. In practical applications, ξ, λ_1 and λ_2 can be chosen appropriately for the circumstances.
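For reference, Deng's classical grey relational degree, the base form that the λ_1/λ_2-weighted variant above extends, can be sketched as follows. The patent's exact weighted formula appears only as a figure in the original, so this is an illustrative stand-in, not the claimed algorithm:

```python
def grey_relational_degree(x0, xi, resolution=0.9):
    """Deng's grey relational degree of comparison sequence xi with
    respect to reference sequence x0, with resolution coefficient ξ."""
    deltas = [abs(a - b) for a, b in zip(x0, xi)]
    dmax = max(deltas)
    if dmax == 0:
        return 1.0                       # identical sequences
    dmin = min(deltas)
    coeffs = [(dmin + resolution * dmax) / (d + resolution * dmax)
              for d in deltas]
    return sum(coeffs) / len(coeffs)     # mean of relational coefficients
```

The degree equals 1 for identical sequences and decreases as the sequences diverge, which is the property the recognition step relies on.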
The beneficial effects of the present invention are as follows. The invention extracts characteristic parameters from the speaker's voice signal with the third-octave spectral analysis method, which extracts the information contained in the voice signal over the entire audible range of 20 Hz-20 kHz more fully and reduces the adverse effect of incomplete characteristic information during speaker recognition. The invention performs speaker recognition with a grey relational degree algorithm, which considers both the information contained in the speaker's voice signal and the effect of its changes, and reduces the error rate of speaker recognition. This text-independent speaker recognition method based on third-octave spectral analysis and grey relational analysis achieves robust text-independent speaker recognition, significantly improves the recognition rate of text-independent speech, and has wide application prospects.
Description of drawings
Fig. 1 is the flow chart of the method provided by the invention;
Fig. 2 is the flow chart of the third-octave feature extraction of the invention;
Fig. 3 is the FFT butterfly operation diagram of the invention;
Fig. 4 is the flow chart of the grey relational degree algorithm of the invention;
Fig. 5 is the flow chart of recognition matching and decision of the invention;
Fig. 6 is a voice signal of speaker A of the invention;
Fig. 7 is a frame of speaker A's voice after preprocessing;
Fig. 8 is a third-octave spectrogram of speaker A of the invention.
Embodiment
The technical scheme of the present invention is described in further detail below through the accompanying drawings and an embodiment. The method of the invention is divided into five steps, as shown in Fig. 1.
Step 1: voice signal preprocessing
1. Sampling and quantization
a) Filter the voice signal with an FIR band-pass filter so that the Nyquist frequency F_N is 20 kHz;
b) Set the speech sampling frequency F ≥ 2·F_N; in the embodiment of the invention F = 51200 Hz;
c) Sample the voice signal s_a(t) periodically to obtain the voice signal amplitude sequence s(n), where t indicates that the voice signal is continuous in time and n indexes the discrete signal sequence, taking consecutive natural numbers;
d) Quantize and encode the amplitude sequence s(n) of the digital voice signal, representing the quantized amplitude sequence s'(n) in pulse code modulation (PCM).
2. Zero-drift removal
a) Compute the mean value of the quantized amplitude sequence s'(n);
b) Subtract the mean value from each amplitude in the sequence, obtaining the zero-mean amplitude sequence s''(n).
3. Pre-emphasis
a) Set the pre-emphasis factor a in the digital filter transfer function H(z) = 1 - a·z^-1; a should be slightly smaller than 1, and in the present embodiment a = 0.96;
b) Pass s''(n) through the digital filter, obtaining the sequence s'''(n) in which the high-, middle- and low-frequency amplitudes of the voice signal are balanced.
4. Windowing
a) Compute the frame length N of a voice frame from the speech sampling rate F, where F is in Hz;
b) With N as the frame length and N/2 as the frame shift, divide s'''(n) into a series of speech frames F_m, each containing N voice signal samples;
c) Compute the Hamming window function ω(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1, where N is the frame length of each audio frame F_m;
d) Apply the Hamming window to each speech frame F_m using F_m'(n) = ω(n) × F_m(n), obtaining the windowed audio frames F_m'(n).
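The preprocessing chain above can be sketched as follows. The frame length of 1024 samples is an assumed value, since the patent derives N from the sampling rate:

```python
import numpy as np

def preprocess(signal, a=0.96, frame_len=1024):
    """Sketch of the preprocessing chain: zero-drift removal, pre-emphasis
    H(z) = 1 - a*z^-1, framing with a half-frame shift, and a Hamming
    window applied to every frame."""
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()                              # zero-drift removal
    s = np.append(s[0], s[1:] - a * s[:-1])       # pre-emphasis filter
    hop = frame_len // 2                          # frame shift N/2
    n_frames = 1 + (len(s) - frame_len) // hop
    window = np.hamming(frame_len)                # 0.54 - 0.46 cos(2πn/(N-1))
    return np.array([s[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])
```

A 4096-sample signal with frame length 1024 and shift 512 yields 7 half-overlapping windowed frames.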
Step 2: characteristic parameter extraction
The present invention extracts the characteristic parameters of the preprocessed speaker voice signal based on the third octave. The algorithm flow is shown in Fig. 2 and detailed as follows:
1. Compute the power spectrum with the fast Fourier transform (FFT)
The invention converts the time-domain signal of the speaker's voice into a frequency-domain signal with the radix-2 FFT algorithm and computes the power spectrum sequence of the voice signal.
a) Perform radix-2 decimation in time on the voice signal sequence x(n), obtaining the time-decimated subsequences
x1(r) = x(2r), r = 0, 1, 2, …, N/2 - 1
x2(r) = x(2r + 1), r = 0, 1, 2, …, N/2 - 1
where N is the length of the voice signal sequence.
b) Perform the discrete Fourier transform (DFT) on x(n) to obtain the frequency-domain signal of the speaker's voice:
X(k) = X1(k) + W_N^k · X2(k), k = 0, 1, …, N/2 - 1
where X1(k) and X2(k) are the N/2-point DFTs of x1(r) and x2(r) respectively, and W_N = e^(-j2π/N) is the twiddle factor.
c) Using the periodicity of X1(k) and X2(k) (period N/2) and the symmetry W_N^(k+N/2) = -W_N^k, obtain the remaining half of the FFT spectrum sequence:
X(k + N/2) = X1(k) - W_N^k · X2(k), k = 0, 1, …, N/2 - 1
This butterfly operation is shown in Fig. 3; from it the FFT frequency-domain power spectrum of the preprocessed voice signal can be obtained.
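The decimation-in-time recursion can be written out directly. This recursive sketch trades the iterative butterfly structure of Fig. 3 for clarity:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT implementing the butterfly
    relations X(k) = X1(k) + W_N^k X2(k) and
    X(k + N/2) = X1(k) - W_N^k X2(k); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    x1 = fft_radix2(x[0::2])          # even-indexed subsequence x(2r)
    x2 = fft_radix2(x[1::2])          # odd-indexed subsequence x(2r+1)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * x2[k]   # W_N^k * X2(k)
        out[k] = x1[k] + t
        out[k + n // 2] = x1[k] - t
    return out
```

The power spectrum then follows by taking the squared magnitude of each output value.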
2. Determine the centre frequencies
The centre frequencies f_c of the third octave are
f_c = 1000 × 10^(3n/30) Hz (n = 0, ±1, ±2, …)
The invention adopts the nominal (rounded) values, namely: 20 Hz, 25 Hz, 31.5 Hz, 40 Hz, 50 Hz, 63 Hz, 80 Hz, 100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1000 Hz, 1250 Hz, 1600 Hz, 2000 Hz, 2500 Hz, 3150 Hz, 4000 Hz, 5000 Hz, 6300 Hz, 8000 Hz, 10000 Hz, 12500 Hz, 16000 Hz.
3. Compute the limit frequencies
The band around each third-octave centre frequency f_c extends from the lower limit frequency f_d to the upper limit frequency f_u. The relation between f_u, f_d and f_c is:
f_u = 2^(1/6) · f_c, f_d = 2^(-1/6) · f_c
The bandwidth of the band around each centre frequency f_c is:
Δf = f_u - f_d = (2^(1/6) - 2^(-1/6)) · f_c
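These relations are easy to check numerically; the helper name `band_edges` is illustrative:

```python
def band_edges(fc):
    """Lower/upper limit frequencies and bandwidth of the third-octave band
    around centre frequency fc: f_d = 2^(-1/6) fc, f_u = 2^(1/6) fc."""
    fd = fc * 2 ** (-1 / 6)
    fu = fc * 2 ** (1 / 6)
    return fd, fu, fu - fd
```

For f_c = 1000 Hz this gives roughly f_d ≈ 890.9 Hz, f_u ≈ 1122.5 Hz and Δf ≈ 231.6 Hz.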
4. Sound pressure level conversion
The third-octave spectral analysis divides the entire audible range of 20 Hz-20 kHz into 30 bands of constant relative bandwidth and computes the sound pressure level of the sound signal falling in each band.
The sound pressure level is obtained from the sound pressure of the signal by the conversion relation
L_p = 20 lg(P / P_0)
where P_0 is the reference sound pressure, whose value is 2 × 10^-5 Pa.
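The conversion above is the standard decibel sound-pressure-level formula and can be sketched as:

```python
import math

def sound_pressure_level(p, p0=2e-5):
    """Sound pressure level in dB for pressure p (Pa) relative to the
    reference pressure p0 = 2e-5 Pa: L_p = 20 lg(p / p0)."""
    return 20 * math.log10(p / p0)
```

The reference pressure itself maps to 0 dB, and 0.2 Pa maps to 80 dB.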
5. Compute the spectrum value of the band around each centre frequency f_c
Partition the frequencies of the power spectrum into bands according to the upper and lower limit frequencies and the centre frequencies, and synthesize the power spectrum into the third-octave power spectrum of constant relative bandwidth. Within one third-octave band, the synthesized power spectrum S_x(f_n) is obtained from S_x(f), the discrete power spectrum inside that band.
For the discrete power spectrum, the power spectrum of the nth band is formed from S_x,n(f_i), the power spectrum amplitudes of the discrete frequencies in that band.
The mean value of the band power spectrum is the amplitude A_n of that band.
The amplitudes corresponding to the 30 bands of constant relative bandwidth in the spectrum are the speaker's characteristic parameters, and these 30 characteristic parameters constitute the speaker's characteristic parameter sequence.
Step 3: establish the speaker reference library
Repeat steps 1 and 2 several times to establish the characteristic parameter reference library of every registered speaker in the speaker recognition system; that is, the characteristic parameter sequences A1, A2, …, AN of registered speaker A constitute A's reference library, the sequences B1, B2, …, BN of registered speaker B constitute B's reference library, and so on for all registered speakers. In the present embodiment there are 14 registered speakers, with 5 characteristic parameter sequences in each speaker's reference library.
Step 4: compute the grey relational degrees
The flow of the grey relational degree algorithm of the invention is shown in Fig. 4 and detailed as follows:
1. Construct the characteristic parameter relational groups
a) Obtain the characteristic parameter sequence X0 of speaker X to be identified, and extract the characteristic parameter sequences from every registered speaker's reference library, i.e. sequences A1, A2, …, AN of registered speaker A, sequences B1, B2, …, BN of registered speaker B, and so on.
b) Add the sequence of the speaker to be identified to every registered speaker's reference library in the recognition system, and, according to the time invariance of the frequency-domain signal, give these characteristic parameter sequences identical weight coefficients, so that the speaker to be identified recombines with each registered speaker into a weighted-average characteristic parameter sequence. That is, registered speaker A and speaker X form the sequences ω11·A1, ω12·A2, …, ω1n·AN, ω1x·X0, where ω11 = ω12 = … = ω1n = ω1x and ω11 + ω12 + … + ω1n + ω1x = 1; registered speaker B and speaker X form ω21·B1, ω22·B2, …, ω2n·BN, ω2x·X0, where ω21 = ω22 = … = ω2n = ω2x and ω21 + ω22 + … + ω2n + ω2x = 1; and so on.
c) According to the superposition principle, compute the grey relational weighted-average characteristic parameter sequence of the speaker to be identified with each registered speaker in the recognition system, i.e. registered speaker A and speaker X form the new sequence AY = ω11·A1 + ω12·A2 + … + ω1n·AN + ω1x·X0, registered speaker B and speaker X form BY = ω21·B1 + ω22·B2 + … + ω2n·BN + ω2x·X0, and so on.
d) Let X = {x_σ(t) | σ = 0, 1, 2, …, m} be the set of relational factor sequences, x_0 the reference function (parent factor), x_i the comparison function (child factor), and x_σ(k) the value of x_σ at point k, where i = 1, 2, …, m and k = 1, 2, …, n.
For x_0 and x_i, form the grey relational coefficients and from them obtain the grey relational degree of x_i with respect to x_0, where 0 < ξ < 1, λ_1, λ_2 ≥ 0 and λ_1 + λ_2 = 1; the constant ξ is the resolution coefficient, and λ_1 and λ_2 are the displacement and rate-of-change weighting coefficients respectively; in practical applications, ξ, λ_1 and λ_2 can be chosen appropriately for the circumstances.
In the present embodiment, the resolution coefficient is ξ = 0.9, the displacement weighting coefficient λ_1 = 0.95 and the rate-of-change weighting coefficient λ_2 = 0.05. Compute the grey relational degree value of the speaker to be identified with every registered speaker according to the above steps, i.e. the value RA for registered speaker A, the value RB for registered speaker B, and so on.
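The equal-weighting and superposition sub-steps above can be sketched as follows; the helper name and the plain-list representation of the characteristic parameter sequences are assumptions for illustration:

```python
def weighted_average_sequence(ref_seqs, x0):
    """Joins the test sequence X0 to a registered speaker's N reference
    sequences with identical weights ω = 1/(N+1) that sum to 1, then
    superposes them into one weighted-average characteristic sequence."""
    seqs = [list(s) for s in ref_seqs] + [list(x0)]
    w = 1.0 / len(seqs)                  # ω11 = ω12 = ... = ω1x, Σω = 1
    length = len(seqs[0])
    return [w * sum(s[k] for s in seqs) for k in range(length)]
```

With two reference sequences and one test sequence, each weight is 1/3, so the result is the element-wise mean of the three sequences.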
Step 5: recognition matching and decision
The speaker recognition matching and decision process of the invention is shown in Fig. 5 and is as follows:
1. Obtain the maximum grey relational degree
From the grey relational degree values of the speaker to be identified with all registered speakers, extract the maximum value R_max = max{RA, RB, …}, where RA is the grey relational degree value of speaker X to be identified with registered speaker A, RB that with registered speaker B, and so on.
2. Speaker recognition matching and decision
Compare the extracted maximum grey relational degree R_max with the grey relational degree recognition threshold R_θ. If R_max ≥ R_θ, the match succeeds, i.e. the speaker to be identified is the registered speaker with the maximum weighted grey relational degree value in the recognition system; otherwise the match fails, i.e. the speaker to be identified is not a registered speaker in the recognition system. The recognition threshold R_θ is obtained from the statistical analysis of a large number of experiments.
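The step-5 decision rule can be sketched directly; the dictionary representation of the per-speaker scores is an assumption:

```python
def decide(scores, threshold=0.9):
    """Returns the registered speaker with the maximum grey relational
    degree if it reaches the threshold R_θ, else None (unregistered).
    threshold=0.9 is the value chosen in the embodiment below."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

For example, with speaker A's scores from Table 3 (RA = 0.9528 being the maximum), R_max ≥ 0.9 and the decision is speaker A.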
The present embodiment collects voice signals from 14 speakers (7 male, 7 female); each speaker records 10 segments with different text content, each 28 seconds long, and the text content also differs between speakers. To reduce the voice differences caused by irregular speech at the beginning and end of each recording, the first and last 3 seconds of every segment are clipped, leaving segments of 22 seconds. On this basis, 5 voice segments are selected at random from each speaker and, following the embodiment above, preprocessed and feature-extracted to establish the registered speakers' characteristic parameter reference libraries; then one of the remaining segments is taken at random, preprocessed and feature-extracted in the same way to obtain the characteristic parameter sequence of the speaker to be identified, and the grey relational degrees are computed; finally the maximum weighted grey relational degree value is extracted and compared with the recognition threshold to give the speaker recognition result. The speakers are denoted A, B, C, D, E, F, G, H, I, J, K, L, M and N; the concrete implementation steps of the present embodiment are detailed below.
Extract one voice segment of the collected speaker A; its time-domain signal is shown in Fig. 6. Following the embodiment above, successively apply sampling and quantization, zero-drift removal, pre-emphasis and windowing to obtain the preprocessed voice signal; one frame of the voice is shown in Fig. 7. Then extract the features of the preprocessed voice signal with the third-octave spectral analysis method, obtaining the third-octave spectrum shown in Fig. 8 and the characteristic parameter sequence shown in Table 1.
Table 1: characteristic parameter sequence of registered speaker A
Following the above steps, extract the features of the other four voice segments of speaker A to obtain their characteristic parameter sequences, then combine all characteristic parameter sequences of speaker A to establish the characteristic parameter reference library of speaker A, as shown in Table 2. Following the steps used to establish speaker A's reference library, successively establish the characteristic parameter reference libraries of speakers B, C, D, E, F, G, H, I, J, K, L, M and N.
Table 2: characteristic parameter reference library of registered speaker A
Take at random one of the remaining voice segments of speaker A and, following the implementation steps above, successively apply preprocessing and feature extraction to obtain the characteristic parameter sequence of the person to be identified. Using the grey relational degree algorithm provided by the present invention, compute the grey relational degrees of speaker A to be identified with registered speakers A, B, C, D, E, F, G, H, I, J, K, L, M and N; the results are shown in Table 3.
Table 3: grey relational degrees between speaker A to be identified and all registered speakers
| A | B | C | D | E | F | G |
A | 0.9528 | 0.8006 | 0.7440 | 0.8039 | 0.7995 | 0.8598 | 0.8016 |
| H | I | J | K | L | M | N |
A | 0.7903 | 0.8267 | 0.7804 | 0.8741 | 0.8057 | 0.8887 | 0.7945 |
Successively take at random one remaining voice segment of each of the other speakers and, following the procedure used for speaker A, compute the grey relational degrees of speakers B, C, D, E, F, G, H, I, J, K, L, M and N to be identified with all registered speakers; the results are shown in Table 4, where the horizontal letters denote the registered speakers and the vertical letters the speakers to be identified.
Table 4: grey relational degrees between all speakers to be identified and all registered speakers
A | B | C | D | E | F | G | |
A | 0.9528 | 0.8006 | 0.7440 | 0.8039 | 0.7995 | 0.8598 | 0.8016 |
B | 0.8295 | 0.9050 | 0.8281 | 0.8699 | 0.8693 | 0.8387 | 0.8967 |
C | 0.7306 | 0.8556 | 0.9628 | 0.8324 | 0.7968 | 0.7509 | 0.8407 |
D | 0.7935 | 0.8371 | 0.7769 | 0.8762 | 0.8421 | 0.8335 | 0.8324 |
E | 0.8214 | 0.8601 | 0.8119 | 0.8426 | 0.9645 | 0.8501 | 0.8921 |
F | 0.8659 | 0.8292 | 0.7851 | 0.8391 | 0.8647 | 0.9489 | 0.8447 |
G | 0.7940 | 0.9030 | 0.8868 | 0.8750 | 0.8899 | 0.8159 | 0.9324 |
H | 0.7799 | 0.7990 | 0.8216 | 0.7979 | 0.7488 | 0.7641 | 0.7857 |
I | 0.7949 | 0.8201 | 0.7710 | 0.8335 | 0.8437 | 0.8091 | 0.8178 |
J | 0.8086 | 0.7748 | 0.8327 | 0.8450 | 0.8106 | 0.8024 | 0.8251 |
K | 0.8710 | 0.7829 | 0.7517 | 0.8055 | 0.7924 | 0.8763 | 0.8041 |
L | 0.8142 | 0.8276 | 0.8629 | 0.8865 | 0.9038 | 0.8343 | 0.9274 |
M | 0.8958 | 0.8350 | 0.7777 | 0.8239 | 0.8207 | 0.8965 | 0.8273 |
N | 0.8103 | 0.8896 | 0.8593 | 0.8784 | 0.8838 | 0.8242 | 0.9081 |
| | H | I | J | K | L | M | N |
|---|---|---|---|---|---|---|---|
| A | 0.7903 | 0.8267 | 0.7804 | 0.8741 | 0.8057 | 0.8887 | 0.7945 |
| B | 0.7761 | 0.8681 | 0.7816 | 0.8188 | 0.8749 | 0.8415 | 0.8675 |
| C | 0.798 | 0.8425 | 0.8151 | 0.7278 | 0.8138 | 0.7466 | 0.8425 |
| D | 0.7182 | 0.8530 | 0.7202 | 0.7804 | 0.8238 | 0.7953 | 0.8465 |
| E | 0.7697 | 0.8717 | 0.7671 | 0.8049 | 0.9012 | 0.8349 | 0.8842 |
| F | 0.7909 | 0.8717 | 0.7925 | 0.8900 | 0.8479 | 0.9072 | 0.8325 |
| G | 0.8190 | 0.8892 | 0.8326 | 0.7916 | 0.9209 | 0.8058 | 0.9047 |
| H | **0.9432** | 0.8127 | 0.8982 | 0.8106 | 0.7702 | 0.8063 | 0.7913 |
| I | 0.7299 | **0.9198** | 0.7214 | 0.7715 | 0.8157 | 0.7775 | 0.8432 |
| J | 0.8935 | 0.7634 | **0.9605** | 0.8445 | 0.8095 | 0.8514 | 0.8099 |
| K | 0.8380 | 0.8286 | 0.8370 | **0.9502** | 0.8075 | 0.9011 | 0.7990 |
| L | 0.8127 | 0.8667 | 0.8234 | 0.8117 | **0.9435** | 0.8227 | 0.9051 |
| M | 0.8359 | 0.8318 | 0.8401 | 0.9094 | 0.8235 | **0.9565** | 0.815 |
| N | 0.8053 | 0.8598 | 0.8058 | 0.805 | 0.8984 | 0.8158 | **0.9310** |
From the embodiment above, extract the maximum grey relational degree between each speaker to be identified and all registered speakers; these maxima are the bold values in Table 4. Based on analysis of a large number of experimental results, this embodiment sets the grey relational degree recognition threshold for speaker identification to 0.9. Comparing each maximum with this threshold yields the speaker identification results shown in Table 5.

Table 5: Speaker identification results
| Total number of speakers to be identified | 14 |
|---|---|
| Number whose maximum grey relational degree exceeds the recognition threshold | 13 |
| Accuracy of text-independent speaker identification | 92.86% |
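As an illustration, the threshold decision behind Tables 4 and 5 can be sketched as follows. The function name and dictionary layout are illustrative, not from the patent; the degree values are rows "A" and "D" of Table 4.

```python
# Sketch of the decision step: take the maximum grey relational degree
# over all registered speakers and compare it with the recognition
# threshold 0.9 chosen in this embodiment.

THRESHOLD = 0.9

def identify(degrees, threshold=THRESHOLD):
    """Return (matched_speaker, R_max); matched_speaker is None when
    R_max falls below the threshold."""
    best = max(degrees, key=degrees.get)
    r_max = degrees[best]
    return (best if r_max >= threshold else None, r_max)

# Rows "A" and "D" of Table 4 (degrees against registered speakers A..N).
row_a = {"A": 0.9528, "B": 0.8006, "C": 0.7440, "D": 0.8039, "E": 0.7995,
         "F": 0.8598, "G": 0.8016, "H": 0.7903, "I": 0.8267, "J": 0.7804,
         "K": 0.8741, "L": 0.8057, "M": 0.8887, "N": 0.7945}
row_d = {"A": 0.7935, "B": 0.8371, "C": 0.7769, "D": 0.8762, "E": 0.8421,
         "F": 0.8335, "G": 0.8324, "H": 0.7182, "I": 0.8530, "J": 0.7202,
         "K": 0.7804, "L": 0.8238, "M": 0.7953, "N": 0.8465}

print(identify(row_a))  # ('A', 0.9528): correctly matched
print(identify(row_d))  # (None, 0.8762): the single miss behind the 13/14 result
```

Note that speaker D's maximum (0.8762, on the diagonal) falls below 0.9, which accounts for the one unrecognized speaker in Table 5.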
The recognition results in Table 5 show that the text-independent speaker identification method provided by the present invention, based on third-octave analysis and grey relational analysis, extracts characteristic parameters from the speaker's voice signal using the third-octave spectrum analysis method and performs speaker identification using the grey relational degree algorithm. It improves the accuracy of text-independent speaker identification, achieves robust text-independent speaker identification, and has broad application prospects.
The text-independent speaker identification method based on third-octave analysis and grey relational analysis provided by the present invention has been described above in detail, and its principle and implementation have been further elaborated through specific embodiments. The description of the above embodiments is intended only to aid understanding of the method of the present invention and its core idea, not to limit the invention; any modification or variation of the present invention that falls within the spirit of the invention and the protection scope of the claims falls within the protection scope of the present invention.
Claims (5)
1. A text-independent speaker identification method, characterized by comprising the following steps:

One: establish a voice-feature reference library for N speakers and set the grey relational degree recognition threshold R_θ, where N is an integer greater than or equal to 1, as follows:

A. acquire the first voice-signal segment of the first speaker and apply, in turn, sampling and quantization, zero-drift removal, pre-emphasis, and windowing, obtaining the windowed audio frame F_m'(n);

B. apply the third-octave spectrum analysis method to the audio frame F_m'(n) to obtain the first characteristic parameter, the characteristic parameter being the sequence of power-spectrum values over the band in which each center frequency lies;

C. perform steps A and B M times for each of the N speakers, obtaining N × M characteristic parameters in turn; the N characteristic-parameter sequences form the voice-feature reference library;

Two: obtain N grey relational degrees, as follows:

I. acquire the characteristic parameter X of the speaker to be identified through steps A and B;

II. add the sequence of the characteristic parameter X to each sequence in the reference library; in accordance with the time-invariance of the frequency-domain signal, uniformly assign identical weight coefficients to the N characteristic-parameter sequences; recombine them to form N weighted-average characteristic-parameter sequences and obtain N grey relational degree values;

Three: identification and matching: extract the maximum value R_max of the N grey relational degree values and compare it with R_θ; if R_max ≥ R_θ, the speaker is matched; otherwise the speaker is not matched.
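As a rough illustration of step A's pre-processing chain (assuming the signal has already been sampled and quantized), the sketch below applies zero-drift removal, pre-emphasis, and a Hamming window. The pre-emphasis coefficient 0.97 is a common speech-processing choice assumed here, not specified by the claim.

```python
import numpy as np

def preprocess_frame(frame, alpha=0.97):
    """Step A sketch: remove zero drift (DC offset), pre-emphasize, and
    apply a Hamming window. `alpha` is an assumed typical pre-emphasis
    coefficient, not taken from the patent."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                             # zero-drift removal
    x = np.append(x[0], x[1:] - alpha * x[:-1])  # pre-emphasis high-pass filter
    return x * np.hamming(len(x))                # windowed frame F_m'(n)

rng = np.random.default_rng(0)
frame = preprocess_frame(rng.standard_normal(256))
print(frame.shape)  # (256,)
```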
2. The text-independent speaker identification method according to claim 1, characterized in that the feature extraction described in step B comprises:

(A) signal time-frequency conversion: convert the time-domain speech signal into a frequency-domain signal with a radix-2 FFT and compute the power spectrum of the speaker's voice signal;

(B) determine the center frequencies f_c of the third-octave spectrum analysis method;

(C) compute the upper and lower limit frequencies: the relation between the third-octave upper and lower limit frequencies and the center frequency is f_lower = 2^(-1/6) · f_c and f_upper = 2^(1/6) · f_c;

(D) sound-pressure-level conversion, namely L_p = 20 lg(P/P_0), where P_0 is the reference sound pressure, with value 2 × 10^(-5) Pa;

(E) compute the mean value of the power spectrum over the band in which each center frequency f_c lies: partition the frequencies of the power spectrum into bands according to the third-octave upper and lower limit frequencies and center frequencies; within each band, sum all power amplitudes logarithmically to obtain the third-octave spectrum, whose amplitudes are the characteristic parameter.
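A minimal sketch of steps (A) through (E), under assumptions the claim leaves open (FFT length, exact level scaling): compute the radix-2 FFT power spectrum, split it into 1/3-octave bands with edges f_c · 2^(±1/6), sum the power in each band, and express it as a level relative to P_0 = 2 × 10^(-5). All parameter names are illustrative.

```python
import numpy as np

def third_octave_features(signal, fs, centers):
    """Steps (A)-(E) sketch: power spectrum via FFT, then one level
    value per 1/3-octave band around each center frequency. The exact
    level scaling is an assumption, not the patent's formula."""
    n = len(signal)                              # should be a power of two (radix-2 FFT)
    power = np.abs(np.fft.rfft(signal)) ** 2 / n # power spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    p0 = 2e-5                                    # reference sound pressure (Pa)
    feats = []
    for fc in centers:
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)       # band edges
        band_power = power[(freqs >= lo) & (freqs < hi)].sum()
        feats.append(10 * np.log10(band_power / p0 ** 2) if band_power > 0 else 0.0)
    return np.array(feats)

fs, n = 8000, 1024
t = np.arange(n) / fs
sig = np.sin(2 * np.pi * 1000 * t)               # pure 1 kHz tone
centers = [500, 630, 800, 1000, 1250, 1600, 2000]
feats = third_octave_features(sig, fs, centers)
print(centers[int(np.argmax(feats))])            # 1000: the band holding the tone
```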
3. The text-independent speaker identification method according to claim 1, characterized in that the grey relational degree calculation described in step II comprises:

(F) extract the characteristic-parameter sequences: obtain the sequence X0 of the characteristic parameter X of the speaker to be identified, and extract every characteristic-parameter sequence from each registered speaker's reference library, namely the sequences A1, A2, …, AN of registered speaker A, the sequences B1, B2, …, BN of registered speaker B, and so on;

(G) construct the weighted-average characteristic-parameter sequences: add the characteristic-parameter sequence of the speaker to be identified to each registered speaker's reference library in the recognition system, and, in accordance with the time-invariance of the frequency-domain signal, uniformly assign identical weight coefficients to these characteristic-parameter sequences, so that the speaker to be identified is recombined with each registered speaker into a weighted-average characteristic-parameter sequence. That is, registered speaker A and speaker X to be identified form the sequence ω_11·A1, ω_12·A2, …, ω_1n·AN, ω_1x·X0, where ω_11 = ω_12 = … = ω_1n = ω_1x and ω_11 + ω_12 + … + ω_1n + ω_1x = 1; registered speaker B and speaker X to be identified form the sequence ω_21·B1, ω_22·B2, …, ω_2n·BN, ω_2x·X0, where ω_21 = ω_22 = … = ω_2n = ω_2x and ω_21 + ω_22 + … + ω_2n + ω_2x = 1; and so on;

(H) superpose to generate the weighted-average grey relational characteristic-parameter sequences: according to the superposition principle, obtain the weighted-average grey relational characteristic-parameter sequence of the speaker to be identified with each registered speaker in the recognition system, namely registered speaker A and speaker X to be identified form the new characteristic-parameter sequence AY = ω_11·A1 + ω_12·A2 + … + ω_1n·AN + ω_1x·X0; registered speaker B and speaker X to be identified form the new characteristic-parameter sequence BY = ω_21·B1 + ω_22·B2 + … + ω_2n·BN + ω_2x·X0; and so on;

(I) compute the grey relational degrees: compute the grey relational degree between the speaker to be identified and each registered speaker by the grey relational degree algorithm, namely the grey relational degree RA of registered speaker A and speaker X to be identified, the grey relational degree RB of registered speaker B and speaker X to be identified, and so on, obtaining N grey relational degrees R.
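Steps (G) and (H) above can be sketched as follows; the sequence values are toy numbers, and the equal weights ω sum to 1 as the claim requires.

```python
import numpy as np

def weighted_average_sequence(ref_seqs, x0):
    """Steps (G)-(H) sketch: append the test sequence X0 to a registered
    speaker's sequences A1..AN, weight all of them identically so the
    weights sum to 1, then superpose into the new sequence AY."""
    seqs = [np.asarray(s, dtype=float) for s in ref_seqs]
    seqs.append(np.asarray(x0, dtype=float))
    w = 1.0 / len(seqs)                 # omega_11 = ... = omega_1x, sum = 1
    return sum(w * s for s in seqs)     # AY = w*A1 + w*A2 + ... + w*X0

a1, a2 = [1.0, 2.0, 3.0], [1.2, 2.2, 3.2]   # toy reference sequences
x0 = [1.1, 2.1, 3.1]                         # toy test sequence
ay = weighted_average_sequence([a1, a2], x0)
print(ay)  # [1.1 2.1 3.1]
```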
4. The text-independent speaker identification method according to claim 2, characterized in that the center frequencies of the third-octave spectrum analysis method are determined as follows:

The exact center frequencies of the third octave are f_c = 1000 × 10^(3n/30) Hz (n = 0, ±1, ±2, …);

The approximate (nominal) center frequencies are then chosen, namely: 20 Hz, 25 Hz, 31.5 Hz, 40 Hz, 50 Hz, 63 Hz, 80 Hz, 100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1000 Hz, 1250 Hz, 1600 Hz, 2000 Hz, 2500 Hz, 3150 Hz, 4000 Hz, 5000 Hz, 6300 Hz, 8000 Hz, 10000 Hz, 12500 Hz, 16000 Hz.
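The center-frequency formula can be checked directly: rounding the exact values f_c = 1000 × 10^(3n/30) Hz yields the nominal frequencies listed in the claim (for example, 10^0.1 ≈ 1.2589, so n = 1 gives about 1259 Hz, nominal 1250 Hz).

```python
# Claim 4 sketch: exact 1/3-octave center frequencies, which round to
# the nominal values listed in the claim.
def exact_center(n):
    return 1000.0 * 10 ** (3 * n / 30)

print(round(exact_center(0)))    # 1000
print(round(exact_center(1)))    # 1259 (nominal 1250 Hz)
print(round(exact_center(-17)))  # 20   (lowest band in the claim)
print(round(exact_center(12)))   # 15849 (nominal 16000 Hz)
```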
5. The text-independent speaker identification method according to claim 3, characterized in that the grey relational degree algorithm is:

Let X = {x_σ(t) | σ = 0, 1, 2, …, m} be the relational factor set, i.e., the reference library; let x_0 be the reference function (mother factor), i.e., one of the registered speakers; let x_i be the comparison function (child factor), i.e., the characteristic factor X of the speaker to be identified; and let x_σ(k) be the value of x_σ at point k, where i = 1, 2, …, m and k = 1, 2, …, n.

For x_0 and x_i, define the relational coefficient (formula given only as an image in the original); the grey relational degree of x_i with respect to x_0 is then (formula given only as an image in the original),

where 0 < ξ < 1, λ_1, λ_2 ≥ 0, and λ_1 + λ_2 = 1; the constant ξ is the resolution coefficient, and λ_1 and λ_2 are the weighting coefficients for displacement and rate of change, respectively. In practical applications, ξ, λ_1, and λ_2 may be chosen appropriately according to the specific circumstances.
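Since the claim's formulas appear only as images in this text, the sketch below uses the classical Deng grey relational degree as a stand-in: the resolution coefficient ξ (0 < ξ < 1) plays the role of the claim's resolution ratio, while the λ_1/λ_2 displacement and rate-of-change weighting of the patent's exact formula is not reproduced.

```python
import numpy as np

def deng_grey_degree(x0, xi_seq, xi=0.5):
    """Classical Deng grey relational degree between reference function
    x0 and comparison function xi_seq; `xi` is the resolution
    coefficient. A stand-in, not the patent's exact formula. Assumes
    the two sequences are not identical (d_max > 0)."""
    x0 = np.asarray(x0, dtype=float)
    xi_seq = np.asarray(xi_seq, dtype=float)
    delta = np.abs(x0 - xi_seq)            # pointwise displacement
    d_min, d_max = delta.min(), delta.max()
    coef = (d_min + xi * d_max) / (delta + xi * d_max)
    return float(coef.mean())

x0 = [0.95, 0.80, 0.74]   # reference function (mother factor)
x1 = [0.93, 0.81, 0.75]   # close comparison function
x2 = [0.60, 0.95, 0.40]   # distant comparison function
print(deng_grey_degree(x0, x1) > deng_grey_degree(x0, x2))  # True
```

Closer sequences yield a higher degree, which is the property the identification step in claim 1 relies on.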
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110428379.2A CN102496366B (en) | 2011-12-20 | 2011-12-20 | Speaker identification method irrelevant with text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102496366A true CN102496366A (en) | 2012-06-13 |
CN102496366B CN102496366B (en) | 2014-04-09 |
Family
ID=46188183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110428379.2A Expired - Fee Related CN102496366B (en) | 2011-12-20 | 2011-12-20 | Speaker identification method irrelevant with text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102496366B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104167208A (en) * | 2014-08-08 | 2014-11-26 | 中国科学院深圳先进技术研究院 | Speaker recognition method and device |
CN105244031A (en) * | 2015-10-26 | 2016-01-13 | 北京锐安科技有限公司 | Speaker identification method and device |
CN106328168A (en) * | 2016-08-30 | 2017-01-11 | 成都普创通信技术股份有限公司 | Voice signal similarity detection method |
CN108154189A (en) * | 2018-01-10 | 2018-06-12 | 重庆邮电大学 | Grey relational cluster method based on LDTW distances |
CN109065026A (en) * | 2018-09-14 | 2018-12-21 | 海信集团有限公司 | A kind of recording control method and device |
CN112885355A (en) * | 2021-01-25 | 2021-06-01 | 上海头趣科技有限公司 | Speech recognition method based on multiple features |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1986005618A1 (en) * | 1985-03-21 | 1986-09-25 | American Telephone & Telegraph Company | Individual recognition by voice analysis |
US5548647A (en) * | 1987-04-03 | 1996-08-20 | Texas Instruments Incorporated | Fixed text speaker verification method and apparatus |
US5950157A (en) * | 1997-02-28 | 1999-09-07 | Sri International | Method for establishing handset-dependent normalizing models for speaker recognition |
CN1941080A (en) * | 2005-09-26 | 2007-04-04 | 吴田平 | Soundwave discriminating unlocking module and unlocking method for interactive device at gate of building |
CN101266792A (en) * | 2007-03-16 | 2008-09-17 | 富士通株式会社 | Speech recognition system and method for speech recognition |
CN101405739A (en) * | 2002-12-26 | 2009-04-08 | 摩托罗拉公司(在特拉华州注册的公司) | Identification apparatus and method |
Non-Patent Citations (3)
Title |
---|
Wang Hong et al., "Text-independent speaker recognition based on long-term average spectrum", Technical Acoustics, vol. 21, no. 2, 31 December 2002, pages 59-62 *
Zeng Yumin et al., "Anti-noise speaker recognition based on weighted reconstruction of voiced-speech harmonic-spectrum sub-bands", Journal of Southeast University (Natural Science Edition), vol. 38, no. 06, 30 November 2008, pages 925-941 *
Also Published As
Publication number | Publication date |
---|---|
CN102496366B (en) | 2014-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102893326B (en) | Chinese voice emotion extraction and modeling method combining emotion points | |
CN101178897B (en) | Speaking man recognizing method using base frequency envelope to eliminate emotion voice | |
CN102496366A (en) | Speaker identification method irrelevant with text | |
CN102509547A (en) | Method and system for voiceprint recognition based on vector quantization based | |
CN101226743A (en) | Method for recognizing speaker based on conversion of neutral and affection sound-groove model | |
CN106024010B (en) | A kind of voice signal dynamic feature extraction method based on formant curve | |
CN102968990A (en) | Speaker identifying method and system | |
CN104887263A (en) | Identity recognition algorithm based on heart sound multi-dimension feature extraction and system thereof | |
CN106531174A (en) | Animal sound recognition method based on wavelet packet decomposition and spectrogram features | |
Waghmare et al. | Emotion recognition system from artificial marathi speech using MFCC and LDA techniques | |
CN103456302A (en) | Emotion speaker recognition method based on emotion GMM model weight synthesis | |
CN109272986A (en) | A kind of dog sound sensibility classification method based on artificial neural network | |
Abdallah et al. | Text-independent speaker identification using hidden Markov model | |
Linh et al. | MFCC-DTW algorithm for speech recognition in an intelligent wheelchair | |
Fagerlund et al. | New parametric representations of bird sounds for automatic classification | |
Kumar et al. | Hybrid of wavelet and MFCC features for speaker verification | |
Martin et al. | Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification | |
Kumar et al. | Text dependent speaker identification in noisy environment | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
Aggarwal et al. | Performance evaluation of artificial neural networks for isolated Hindi digit recognition with LPC and MFCC | |
Islam et al. | A Novel Approach for Text-Independent Speaker Identification Using Artificial Neural Network | |
Bansod et al. | Speaker Recognition using Marathi (Varhadi) Language | |
GS et al. | Synthetic speech classification using bidirectional LSTM Networks | |
Abdulwahid et al. | Arabic Speaker Identification System for Forensic Authentication Using K-NN Algorithm | |
Dua et al. | Speaker recognition using noise robust features and LSTM-RNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20140409; Termination date: 20161220 | |