CN102592593A - Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech - Google Patents
Legal status: Granted
Abstract
The invention discloses a method for extracting emotional features from speech by exploiting the sparsity of multilinear groups. The method comprises the following steps: taking into account the multiple factors contained in a speech signal, including time, frequency, scale, and direction information; performing feature extraction with a multilinear group sparse decomposition; building a multilinear representation of the signal's energy spectrum with Gabor functions of different scales and directions; solving for the feature projection matrices with a group sparse tensor decomposition; computing the feature projection along the frequency mode; decorrelating the features with a discrete cosine transform; and finally computing first- and second-order difference coefficients to obtain the emotional features of the speech. Because the time, frequency, scale, and direction factors of the speech signal are all used for feature extraction, and the feature projection is computed with a group sparse tensor decomposition, the accuracy of multi-class speech emotion recognition is ultimately improved.
Description
Technical field
The present invention relates to a speech emotion feature extraction method for improving speech emotion recognition performance, and belongs to the field of speech signal processing.
Background technology
Speech is one of the most convenient ways for people to communicate in daily life, which has led researchers to explore how speech can serve as a communication tool between humans and machines. Beyond traditional interaction modes such as speech recognition, the speaker's emotion is also an important kind of interactive information, and the ability of a machine to automatically recognize and understand the speaker's emotion is one of the hallmarks of intelligent human-computer interaction.
Speech emotion recognition has significant value in signal processing and intelligent human-computer interaction, with many potential applications. In human-computer interaction, recognizing the speaker's emotion can make a system friendlier and more accurate; for example, a distance-education system can adjust its course in time by recognizing a student's emotion, thereby improving teaching effectiveness. In call centers and mobile communications, a user's emotional state can be obtained promptly to improve the quality of service. An in-vehicle system can detect through emotion recognition whether the driver is concentrating, and issue an appropriate auxiliary warning. In medicine, speech-based emotion recognition can serve as a tool that helps doctors diagnose a patient's condition.
For speech emotion recognition, an important problem is how to extract effective features to represent different emotions. In traditional feature extraction, a speech signal is usually divided into multiple frames to obtain approximately stationary segments. Features computed from each frame, such as pitch and energy, are called local features; their advantage is that existing classifiers can use them to estimate the parameters of different emotional states fairly accurately, while their disadvantage is that the feature dimensionality and the number of samples are large, which slows down feature extraction and classification. Features obtained by computing statistics over the whole utterance are called global features; they yield better classification accuracy and speed, but lose the temporal information of the speech signal and easily run into a shortage of training samples. In general, commonly used features for speech emotion recognition fall into the following categories: continuous acoustic features, spectral features, and features based on the Teager energy operator.
According to research in psychology and prosody, the most intuitive cues to a speaker's emotion in speech are the continuous prosodic features, such as pitch, energy, and speaking rate. The corresponding global features include the mean, median, standard deviation, maximum, and minimum of pitch or energy, as well as the first and second formants.
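Such utterance-level statistics are straightforward to compute from frame-level contours. A minimal sketch (the pitch and energy tracks below are random stand-ins for real frame-level measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
pitch = rng.uniform(80, 300, size=120)   # stand-in frame-level F0 track (Hz)
energy = rng.random(120)                 # stand-in frame-level energy track

def global_stats(track):
    """Utterance-level statistics of a frame-level contour."""
    return {
        'mean': float(np.mean(track)),
        'median': float(np.median(track)),
        'std': float(np.std(track)),
        'max': float(np.max(track)),
        'min': float(np.min(track)),
    }

# concatenate pitch and energy statistics into one global feature vector
features = {**{'pitch_' + k: v for k, v in global_stats(pitch).items()},
            **{'energy_' + k: v for k, v in global_stats(energy).items()}}
print(len(features))  # 10 global prosodic features
```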
Spectral features provide useful frequency information about the speech signal and are also an important feature extraction approach in speech emotion recognition. Commonly used spectral features include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP).
Speech is produced by nonlinear airflow in the vocal system. The Teager energy operator (TEO), proposed by Teager et al., is an operator that can rapidly track signal energy changes within a glottal cycle and is used to analyze the fine structure of speech. Under different emotional states, muscle tension affects the airflow in the vocal system; according to the findings of Bou-Ghazale et al., TEO-based features can be used to detect stress in speech.
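The discrete Teager energy operator has the well-known form ψ(x[n]) = x[n]² − x[n−1]·x[n+1]; for a pure tone A·sin(Ωn) it evaluates exactly to the constant A²·sin²(Ω). A minimal sketch:

```python
import numpy as np

def teager(x):
    """Discrete Teager energy operator: psi(x[n]) = x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# for a pure tone A*sin(omega*n), the TEO output is exactly A^2 * sin(omega)^2
n = np.arange(1000)
tone = 2.0 * np.sin(0.3 * n)
psi = teager(tone)
print(round(float(psi.mean()), 3))  # → 0.349, i.e. 4*sin(0.3)**2
```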
Numerous experimental evaluations show that, for speech emotion recognition, suitable features should be chosen for each classification task: Teager-energy-based features are suited to detecting stress in the speech signal; continuous acoustic features are suited to distinguishing high-arousal from low-arousal emotions; and for multi-class emotion classification, spectral features provide the most suitable speech representation. Combining spectral features with continuous acoustic features, or performing a joint analysis of multiple factors, can further improve classification accuracy.
Classification is the other important stage after feature extraction and selection are complete. At present, various classifiers from pattern recognition are used to classify speech emotion features, including hidden Markov models (HMM), Gaussian mixture models (GMM), support vector machines (SVM), linear discriminant analysis (LDA), and ensemble classifiers. The HMM is one of the most widely used recognizers in speech emotion recognition; this stems from its broad use in speech processing and its suitability for data with sequential structure, and existing studies show that HMM-based emotion recognition systems can achieve high classification accuracy. A Gaussian mixture model can be regarded as an HMM with a single state and is well suited to modeling multivariate distributions; Breazeal et al. used a GMM classifier on the KISMET speech database to classify five types of emotions. Support vector machines are widely used in pattern recognition; their basic principle is to project features into a higher-dimensional space through a kernel function so that they become linearly separable. Compared with HMM and GMM, SVMs have the advantages of a globally optimal training algorithm and data-dependent generalization bounds, and many studies have obtained good classification results using SVMs as the classifier for speech emotion recognition.
As shown in Figure 1, a traditional spectral-feature-based speech emotion recognition method usually adopts the following steps:
1) preprocess the input speech signal, including windowing, filtering, and pre-emphasis;
2) apply the short-time Fourier transform to the signal, filter it with a Mel filter bank, and take the logarithm to obtain the log spectrum;
3) compute the cepstrum with a discrete cosine transform, then apply weighting and cepstral mean subtraction, and compute difference coefficients;
4) train Gaussian mixture models (GMM) to obtain a model for each emotion;
5) recognize the test data with the trained emotion models to obtain the recognition accuracy.
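The front-end part of this traditional pipeline can be sketched as follows; this is a minimal MFCC computation under assumed parameters (26 Mel bands, 13 cepstral coefficients, 23 ms window and 10 ms hop at 8 kHz), not the exact configuration of any particular system:

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import stft

def mfcc(signal, fs=8000, n_mels=26, n_ceps=13):
    """Minimal MFCC front end: pre-emphasis, STFT, Mel filter bank, log, DCT."""
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    f, t, S = stft(emphasized, fs=fs, window='hamming', nperseg=184, noverlap=104)
    power = np.abs(S) ** 2                       # short-time power spectrum
    # triangular Mel filter bank
    mel = lambda hz: 2595 * np.log10(1 + hz / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(0, mel(fs / 2), n_mels + 2))
    bins = np.floor((184 + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_mels, power.shape[0]))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    logspec = np.log(fbank @ power + 1e-10)      # log Mel spectrum
    return dct(logspec, type=2, axis=0, norm='ortho')[:n_ceps]  # cepstrum

rng = np.random.default_rng(0)
feats = mfcc(rng.standard_normal(8000))          # 1 s of stand-in audio
print(feats.shape[0])  # 13 cepstral coefficients per frame
```

The GMM training and recognition of steps 4)-5) then operate on these per-frame coefficient vectors.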
For two-class emotion classification, such as negative versus neutral emotion, relatively good classification accuracy has already been achieved. For multi-class emotion classification, however, the imbalance of the data and the fact that only a single factor (frequency or time) is considered make the features poorly discriminative, so the classification accuracy is relatively low, and the application of speech-based emotion recognition systems is therefore limited.
Summary of the invention
Because feature extraction in traditional speech emotion recognition considers only a single factor, such as frequency or time, the resulting features are poorly discriminative. The present invention proposes a speech emotion feature extraction method that considers the multilinear group sparsity in speech and can improve the recognition accuracy of multi-class emotions.
The emotional feature extraction method of the present invention, which considers the multilinear group sparsity in speech, is as follows:
The multiple factors contained in the speech signal, namely time, frequency, scale, and direction information, are taken into account. Feature extraction is carried out with a multilinear group sparse decomposition: the energy spectrum of the speech signal is given a multilinear representation by Gabor functions of different scales and directions; the feature projection matrices are solved for by a group sparse tensor decomposition; the feature projection along the frequency mode is computed; the features are decorrelated by a discrete cosine transform; and the first- and second-order difference coefficients of the features are computed. The method specifically comprises the following steps:
(1) Collect the speech signal s(t) (through a device such as a microphone), transform s(t) to the time-frequency domain with the short-time Fourier transform, and obtain the time-frequency representation S(f, t) and the energy spectrum P(f, t);
(2) filter the energy spectrum by convolution with two-dimensional Gabor functions of different scales and directions, parameterized as follows: P(f, t) is the element of the energy spectrum at frame t and frequency f; the scale and direction of the function are controlled by a vector with magnitude k_v = 2^(-(v+2)/2)·π and phase φ = uπ/K, where j denotes the imaginary unit, u indexes the direction of the function, v indexes its scale, and K is the total number of directions; σ is a constant determining the function envelope, set to 2π.
The result of convolving the energy spectrum P(f, t) with the Gabor functions is the multilinear representation of the speech signal: a 5th-order tensor whose modes represent time, frequency, direction, scale, and class, respectively. The frequency mode of this tensor is then filtered with a Mel filter bank, yielding a new 5th-order tensor 𝒫 of size N₁ × N₂ × N₃ × N₄ × N₅, where the length of mode i is N_i, i = 1, …, 5;
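Steps (1)-(2) can be sketched as follows. Since the original formula image is not preserved, the kernel below assumes a standard DC-compensated complex 2-D Gabor built from the surviving parameters k_v, φ, and σ; the kernel size and toy signal are illustrative:

```python
import numpy as np
from scipy.signal import stft, fftconvolve

def gabor_kernel(u, v, K=4, size=15, sigma=2 * np.pi):
    """Assumed 2-D complex Gabor with k_v = 2**(-(v+2)/2)*pi and phi = u*pi/K."""
    k = 2.0 ** (-(v + 2) / 2.0) * np.pi
    phi = u * np.pi / K
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(phi) + y * np.sin(phi)          # coordinate along the direction phi
    envelope = (k**2 / sigma**2) * np.exp(-k**2 * (x**2 + y**2) / (2 * sigma**2))
    return envelope * (np.exp(1j * k * rot) - np.exp(-sigma**2 / 2))

# step (1): energy spectrum of the speech signal via the STFT
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)                      # stand-in for a recorded signal
f, frames, S = stft(s, fs=fs, nperseg=184, noverlap=104)  # ~23 ms window, ~10 ms hop
P = np.abs(S) ** 2                                   # energy spectrum P(f, t)

# step (2): convolve P with Gabor filters of 4 scales x 4 directions
reps = np.stack([
    np.abs(fftconvolve(P, gabor_kernel(u, v), mode='same'))
    for u in range(4) for v in range(4)
])
print(reps.shape)  # (16, n_freqs, n_frames): direction/scale slices of the tensor
```

Stacking these slices over utterances of different classes yields the 5th-order tensor described above.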
(3) Apply the group sparse tensor decomposition to the multilinear representation 𝒫 and compute the projection matrices U^(i), i = 1, …, 5, over the different factors, so as to carry out the feature projection. The following decomposition model is established:
𝒫 ≈ Λ ×₁ U^(1) ×₂ U^(2) ×₃ U^(3) ×₄ U^(4) ×₅ U^(5)
where U^(i) is the projection matrix of size N_i × K obtained by the decomposition; Λ is a 5th-order tensor of size K × K × K × K × K whose diagonal elements equal 1; and ×_i denotes the mode-i product of a tensor 𝒳 of size N₁ × … × N_M with a matrix A of size N_i × K, formed from the elements of 𝒳 and A.
The concrete procedure for computing the projection matrices U^(i), i = 1, …, I, where i indexes the modes (corresponding to the different factors) and I = 5, is as follows:
1. initialize U^(i) ≥ 0, i = 1, …, I, by alternating least squares or randomly;
2. normalize each column vector of the projection matrices U^(i), i = 1, …, I, k = 1, …, K;
3. while the error objective function E is greater than a given threshold, repeat the following operation:
● for n = 1 to I in turn, update U^(n) according to the group sparse objective, in which ‖·‖_F denotes the Frobenius norm, 𝒫_(i) denotes the mode-i matrix unfolding of the tensor 𝒫, ⊙ denotes the Khatri-Rao product of matrices, and λ_k and q_i are weight coefficients between 0 and 1 that regulate the sparsity of the objective-function terms;
4. when the objective function E falls below the threshold, end the loop; the projection matrices U^(i), i = 1, …, I, have been computed;
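The mode-n product and Khatri-Rao product used above can be sketched as follows. Because the patent's exact update formula is not preserved, the fitting loop below substitutes a standard l1-regularized nonnegative CP multiplicative update as an assumed stand-in for the group sparse update:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: mode n becomes the rows, remaining modes are flattened."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold for a target tensor shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([M.shape[0]] + rest), 0, n)

def mode_n_product(X, A, n):
    """Mode-n tensor-matrix product X x_n A, with A of size (J, X.shape[n])."""
    shape = list(X.shape)
    shape[n] = A.shape[0]
    return fold(A @ unfold(X, n), n, shape)

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product."""
    K = A.shape[1]
    return np.einsum('ik,jk->ijk', A, B).reshape(-1, K)

# assumed stand-in: l1-regularized nonnegative CP fit via multiplicative updates
rng = np.random.default_rng(0)
X = rng.random((4, 5, 6))                    # toy 3rd-order tensor (patent uses 5th-order)
K, lam = 3, 0.1
U = [rng.random((n, K)) for n in X.shape]
for _ in range(100):
    for n in range(3):
        others = [U[m] for m in range(3) if m != n]
        KR = khatri_rao(others[0], others[1])          # Khatri-Rao of the other factors
        U[n] *= (unfold(X, n) @ KR) / (U[n] @ (KR.T @ KR) + lam + 1e-9)
full = np.einsum('ir,jr,kr->ijk', U[0], U[1], U[2])    # reconstruct the tensor
err = np.linalg.norm(X - full) / np.linalg.norm(X)
print(round(err, 3))
```

The same unfold/product machinery extends directly to a 5th-order tensor; only the loop bounds and the einsum signature change.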
(4) Use the projection matrix U^(2) corresponding to the frequency domain to project the multilinear representation 𝒫 of the speech signal: the feature projection is the mode-2 product of 𝒫 with the matrix formed by the nonzero elements of the pseudoinverse of U^(2);
(5) fix the time mode and apply the tensor unfolding operation to the resulting sparse multilinear representation 𝒮, obtaining the feature matrix S^(f);
(6) decorrelate S^(f) with the discrete cosine transform to obtain the speech emotion feature F, and compute the first- and second-order difference coefficients of the feature to obtain the final emotional features.
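Steps (4)-(6) can be sketched as follows. The toy tensor drops the class mode, uses the full pseudoinverse (rather than only its nonzero elements), and assumes a standard regression-based delta formula; all sizes are illustrative:

```python
import numpy as np
from scipy.fft import dct

def deltas(F, width=2):
    """Regression-based difference coefficients along the frame axis (assumption)."""
    pad = np.pad(F, ((0, 0), (width, width)), mode='edge')
    num = sum(w * (pad[:, width + w: F.shape[1] + width + w] -
                   pad[:, width - w: F.shape[1] + width - w])
              for w in range(1, width + 1))
    return num / (2 * sum(w * w for w in range(1, width + 1)))

rng = np.random.default_rng(0)
P = rng.random((40, 36, 4, 4))        # toy tensor: time x frequency x direction x scale
U2 = rng.random((36, 12))             # frequency-mode projection matrix (N2 x K)

# step (4): mode-2 (frequency) projection with the pseudoinverse of U2
W = np.linalg.pinv(U2)                # (K x N2)
S = np.einsum('kf,tfds->tkds', W, P)  # sparse multilinear representation

# step (5): fix the time mode and unfold the remaining modes into a feature matrix
Sf = S.reshape(S.shape[0], -1).T      # features x frames

# step (6): DCT decorrelation, then first- and second-order differences
F = dct(Sf, type=2, axis=0, norm='ortho')[:13]
feat = np.vstack([F, deltas(F), deltas(deltas(F))])
print(feat.shape)  # (39, 40)
```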
The present invention uses the time, frequency, scale, and direction factors of the speech signal for emotion feature extraction and performs the feature projection with a group sparse tensor decomposition, ultimately improving the accuracy of multi-class speech emotion recognition.
Description of drawings
Fig. 1 is a schematic block diagram of a traditional speech emotion recognition process;
Fig. 2 is a schematic diagram of the feature extraction method of the present invention;
Fig. 3 is a schematic block diagram of a speech emotion recognition process adopting the present invention;
Fig. 4 is a comparison chart of experimental results for recognizing four classes of speech emotions.
Embodiment
As shown in Fig. 2, the speech emotion recognition method of the present invention based on multilinear group sparse features specifically comprises the following steps:
(1) Collect the speech signal s(t) through a device such as a microphone, transform s(t) to the time-frequency domain with the short-time Fourier transform, and obtain the time-frequency representation S(f, t) and the energy spectrum P(f, t);
(2) filter the energy spectrum by convolution with two-dimensional Gabor functions of different scales and directions to obtain the multilinear representation of the speech signal, a 5th-order tensor whose modes represent time, frequency, direction, scale, and class; then filter the frequency mode with a Mel filter bank to obtain a new 5th-order tensor 𝒫 of size N₁ × N₂ × N₃ × N₄ × N₅, where the length of mode i is N_i, i = 1, …, 5. The Gabor functions are parameterized as in the Summary above: k_v = 2^(-(v+2)/2)·π and φ = uπ/K, with u the direction index, v the scale index, K the total number of directions, j the imaginary unit, and the envelope constant σ set to 2π;
(3) apply the group sparse tensor decomposition to 𝒫 and compute the projection matrices U^(i), i = 1, …, 5, using the decomposition model 𝒫 ≈ Λ ×₁ U^(1) ×₂ U^(2) ×₃ U^(3) ×₄ U^(4) ×₅ U^(5), where U^(i) is of size N_i × K, Λ is a 5th-order tensor of size K × K × K × K × K whose diagonal elements equal 1, and ×_i is the mode-i product of a tensor with a matrix. The projection matrices are computed by the iterative procedure described in the Summary: a) initialize U^(i) ≥ 0, i = 1, …, I, I = 5, by alternating least squares or randomly; b) normalize each column vector of the U^(i); c) while the error objective function E exceeds a given threshold, update U^(n) for n = 1 to I in turn, where ‖·‖_F is the Frobenius norm, ⊙ is the Khatri-Rao product, and λ_k and q_i are sparsity weight coefficients between 0 and 1; d) when E falls below the threshold, end the loop;
(4) use the projection matrix U^(2) corresponding to the frequency domain to project 𝒫: take the mode-2 product of 𝒫 with the matrix formed by the nonzero elements of the pseudoinverse of U^(2);
(5) fix the time mode and unfold the resulting sparse multilinear representation 𝒮 into the feature matrix S^(f);
(6) decorrelate S^(f) with the discrete cosine transform to obtain the speech emotion feature F, and compute the first- and second-order difference coefficients to obtain the final emotional features.
As shown in Fig. 3, the process of speech emotion recognition using the above feature extraction method comprises the following steps:
1) obtain speech signal data s_l(t), l = 1, …, L, with different emotion labels, covering J classes of emotions in total;
2) extract the features F_l of the different emotions with the feature extraction method shown in Fig. 2;
3) model the different emotional features with Gaussian mixture models (GMM), obtaining through training the emotion model M_j corresponding to the j-th emotion class;
4) when testing a speech signal of unknown emotion class, evaluate the GMM emotion models M_j, j = 1, …, J, in turn, compute the maximum posterior probability, and take the emotion class with the highest probability as the emotion recognition result for that speech signal.
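Steps 3)-4) can be sketched as follows. To keep the sketch dependency-free, a single diagonal Gaussian per class stands in for the patent's full GMMs, and random vectors stand in for the extracted features F_l:

```python
import numpy as np

rng = np.random.default_rng(0)
J = 3                                   # number of emotion classes
# toy training features per class: frames x feature-dim (stand-ins for extracted F_l)
train = [rng.normal(loc=3.0 * j, size=(200, 5)) for j in range(J)]

# step 3): fit one (single-component, diagonal) Gaussian model per class
models = [(X.mean(axis=0), X.var(axis=0)) for X in train]

def log_likelihood(utt, model):
    """Total diagonal-Gaussian log-likelihood of an utterance's frames."""
    mu, var = model
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (utt - mu) ** 2 / var)))

# step 4): score a test utterance under every class model, pick the most likely
def classify(utt, models):
    return int(np.argmax([log_likelihood(utt, m) for m in models]))

test_utt = rng.normal(loc=3.0, size=(50, 5))   # generated like class 1
print(classify(test_utt, models))  # → 1
```

With equal class priors, maximizing the likelihood is equivalent to maximizing the posterior probability, as in step 4).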
The effect of the present invention can be further illustrated by experiment.
The experiment tested the recognition performance of the proposed feature extraction method on the FAU Aibo data set, recognizing four classes of emotions (Anger, Emphatic, Neutral, Rest). The sampling rate of the speech signals was 8 kHz; a Hamming window was used, with a window length of 23 ms and a window shift of 10 ms. The energy spectrum was computed with the short-time Fourier transform; Gabor functions with 4 different scales and 4 different directions performed the time-frequency convolutional filtering on the energy spectrum; a Mel filter bank of size 36 computed the Mel power spectrum; the projection matrix performed the feature projection along the frequency mode; and the DCT decorrelated the features.
Fig. 4 compares the recognition performance of the method proposed by the present invention with existing feature extraction techniques (MFCC and LFPC features). The final recognition accuracies show that, after adopting the present invention, the accuracy of multi-class speech emotion recognition is effectively improved: by 6.1% over the MFCC method and by 5.8% over the LFPC method.
Claims (2)
1. A speech emotion feature extraction method considering multilinear group sparse features in speech, characterized in that:
the multiple factors contained in the speech signal, namely time, frequency, scale, and direction information, are taken into account; feature extraction is carried out with a multilinear group sparse decomposition; the energy spectrum of the speech signal is given a multilinear representation by Gabor functions of different scales and directions; the feature projection matrices are solved for by a group sparse tensor decomposition; the feature projection along the frequency mode is computed; the features are decorrelated by a discrete cosine transform; and the first- and second-order difference coefficients of the features are computed; the method specifically comprises the following steps:
(1) collecting the speech signal s(t), transforming s(t) to the time-frequency domain with the short-time Fourier transform, and obtaining the time-frequency representation S(f, t) and the energy spectrum P(f, t);
(2) filtering the energy spectrum by convolution with two-dimensional Gabor functions of different scales and directions, parameterized as follows: P(f, t) is the element of the energy spectrum at frame t and frequency f; the scale and direction of the function are controlled by a vector with magnitude k_v = 2^(-(v+2)/2)·π and phase φ = uπ/K, where j denotes the imaginary unit, u indexes the direction of the function, v indexes its scale, and K is the total number of directions; σ is a constant determining the function envelope, set to 2π; the result of convolving the energy spectrum P(f, t) with the Gabor functions is the multilinear representation of the speech signal, a 5th-order tensor whose modes represent time, frequency, direction, scale, and class, respectively; the frequency mode is then filtered with a Mel filter bank, yielding a new 5th-order tensor 𝒫 of size N₁ × N₂ × N₃ × N₄ × N₅, where the length of mode i is N_i, i = 1, …, 5;
(3) applying the group sparse tensor decomposition to the multilinear representation 𝒫 and computing the projection matrices U^(i), i = 1, …, 5, over the different factors for the feature projection, by establishing the decomposition model
𝒫 ≈ Λ ×₁ U^(1) ×₂ U^(2) ×₃ U^(3) ×₄ U^(4) ×₅ U^(5)
where U^(i) is the projection matrix of size N_i × K obtained by the decomposition, Λ is a 5th-order tensor of size K × K × K × K × K whose diagonal elements equal 1, and ×_i denotes the mode-i product of a tensor 𝒳 of size N₁ × … × N_M with a matrix A of size N_i × K, formed from the elements of 𝒳 and A;
(4) using the projection matrix U^(2) corresponding to the frequency domain to project the multilinear representation 𝒫: taking the mode-2 product of 𝒫 with the matrix formed by the nonzero elements of the pseudoinverse of U^(2);
(5) fixing the time mode and unfolding the resulting sparse multilinear representation 𝒮 into the feature matrix S^(f);
(6) decorrelating S^(f) with the discrete cosine transform to obtain the speech emotion feature F, and computing the first- and second-order difference coefficients of the feature to obtain the final emotional features.
2. The speech emotion feature extraction method based on multilinear group sparse features according to claim 1, characterized in that the concrete procedure for computing the projection matrices U^(i), i = 1, …, I, where i indexes the modes (corresponding to the different factors) and I = 5, is as follows:
1. initialize U^(i) ≥ 0, i = 1, …, I, by alternating least squares or randomly;
2. normalize each column vector of the projection matrices U^(i), i = 1, …, I, k = 1, …, K;
3. while the error objective function E is greater than a given threshold, repeat the following operation:
● for n = 1 to I in turn, update U^(n) according to the group sparse objective, in which ‖·‖_F denotes the Frobenius norm, 𝒫_(i) denotes the mode-i matrix unfolding of the tensor 𝒫, ⊙ denotes the Khatri-Rao product of matrices, and λ_k and q_i are weight coefficients between 0 and 1 that regulate the sparsity of the objective-function terms;
4. when the objective function E falls below the threshold, end the loop; the projection matrices U^(i), i = 1, …, I, are obtained.
Priority Application

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201210091525.1A | 2012-03-31 | 2012-03-31 | Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech

Publications (2)

Publication Number | Publication Date
---|---
CN102592593A | 2012-07-18
CN102592593B | 2014-01-01
Family: CN201210091525.1A, filed 2012-03-31, granted as CN102592593B; legal status: Expired - Fee Related.
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102833918A (en) * | 2012-08-30 | 2012-12-19 | 四川长虹电器股份有限公司 | Emotional recognition-based intelligent illumination interactive method |
CN103245376A (en) * | 2013-04-10 | 2013-08-14 | 中国科学院上海微系统与信息技术研究所 | Weak signal target detection method |
CN103531199A (en) * | 2013-10-11 | 2014-01-22 | 福州大学 | Ecological sound identification method on basis of rapid sparse decomposition and deep learning |
CN103531206A (en) * | 2013-09-30 | 2014-01-22 | 华南理工大学 | Voice affective characteristic extraction method capable of combining local information and global information |
CN103825678A (en) * | 2014-03-06 | 2014-05-28 | 重庆邮电大学 | Three-dimensional multi-user multi-input and multi-output (3D MU-MIMO) precoding method based on Khatri-Rao product |
CN105047194A (en) * | 2015-07-28 | 2015-11-11 | 东南大学 | Self-learning spectrogram feature extraction method for speech emotion recognition |
CN107886942A (en) * | 2017-10-31 | 2018-04-06 | 东南大学 | A kind of voice signal emotion identification method returned based on local punishment random spectrum |
CN109060371A (en) * | 2018-07-04 | 2018-12-21 | 深圳万发创新进出口贸易有限公司 | A kind of auto parts and components abnormal sound detection device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020135618A1 (en) * | 2001-02-05 | 2002-09-26 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
CN101030316A (en) * | 2007-04-17 | 2007-09-05 | 北京中星微电子有限公司 | Safety driving monitoring system and method for vehicle |
CN101404060A (en) * | 2008-11-10 | 2009-04-08 | 北京航空航天大学 | Human face recognition method based on visible light and near-infrared Gabor information amalgamation |
US20110034176A1 (en) * | 2009-05-01 | 2011-02-10 | Lord John D | Methods and Systems for Content Processing |
-
2012
- 2012-03-31 CN CN201210091525.1A patent/CN102592593B/en not_active Expired - Fee Related
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020135618A1 (en) * | 2001-02-05 | 2002-09-26 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
CN101030316A (en) * | 2007-04-17 | 2007-09-05 | 北京中星微电子有限公司 | Safety driving monitoring system and method for vehicle |
CN101404060A (en) * | 2008-11-10 | 2009-04-08 | 北京航空航天大学 | Human face recognition method based on visible light and near-infrared Gabor information amalgamation |
US20110034176A1 (en) * | 2009-05-01 | 2011-02-10 | Lord John D | Methods and Systems for Content Processing |
Non-Patent Citations (3)
Title |
---|
DAHMANE, MOHAMED; MEUNIER, JEAN: "Continuous Emotion Recognition Using Gabor Energy Filters", 《4TH BI-ANNUAL INTERNATIONAL CONFERENCE OF THE HUMAINE ASSOCIATION ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION》, 31 December 2011 (2011-12-31) * |
MORALES-PEREZ,M. ET AL: "Feature extraction of speech signals in emotion identification", 《30TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE-ENGINEERING-IN-MEDICINE-AND-BIOLOGY-SOCIETY》, 31 December 2008 (2008-12-31) * |
TU, BINBIN; YU, FENGQIN: "Bimodal Emotion Recognition Based on Speech Signals and Facial Expression", 《6TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING》, 31 December 2011 (2011-12-31) * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102833918B (en) * | 2012-08-30 | 2015-07-15 | 四川长虹电器股份有限公司 | Emotional recognition-based intelligent illumination interactive method |
CN102833918A (en) * | 2012-08-30 | 2012-12-19 | 四川长虹电器股份有限公司 | Emotional recognition-based intelligent illumination interactive method |
CN103245376A (en) * | 2013-04-10 | 2013-08-14 | 中国科学院上海微系统与信息技术研究所 | Weak signal target detection method |
CN103245376B (en) * | 2013-04-10 | 2016-01-20 | 中国科学院上海微系统与信息技术研究所 | A kind of weak signal target detection method |
CN103531206A (en) * | 2013-09-30 | 2014-01-22 | South China University of Technology | Speech emotional feature extraction method combining local and global information |
CN103531206B (en) * | 2013-09-30 | 2017-09-29 | South China University of Technology | Speech emotional feature extraction method combining local and global information |
CN103531199B (en) * | 2013-10-11 | 2016-03-09 | Fuzhou University | Ecological sound identification method based on rapid sparse decomposition and deep learning |
CN103531199A (en) * | 2013-10-11 | 2014-01-22 | Fuzhou University | Ecological sound identification method based on rapid sparse decomposition and deep learning |
CN103825678B (en) * | 2014-03-06 | 2017-03-08 | Chongqing University of Posts and Telecommunications | 3D MU-MIMO precoding method based on the Khatri-Rao product |
CN103825678A (en) * | 2014-03-06 | 2014-05-28 | Chongqing University of Posts and Telecommunications | Three-dimensional multi-user multi-input multi-output (3D MU-MIMO) precoding method based on the Khatri-Rao product |
CN105047194A (en) * | 2015-07-28 | 2015-11-11 | Southeast University | Self-learning spectrogram feature extraction method for speech emotion recognition |
CN105047194B (en) * | 2015-07-28 | 2018-08-28 | Southeast University | Self-learning spectrogram feature extraction method for speech emotion recognition |
CN107886942A (en) * | 2017-10-31 | 2018-04-06 | Southeast University | Speech signal emotion recognition method based on locally penalized random spectral regression |
CN107886942B (en) * | 2017-10-31 | 2021-09-28 | Southeast University | Speech signal emotion recognition method based on locally penalized random spectral regression |
CN109060371A (en) * | 2018-07-04 | 2018-12-21 | Shenzhen Wanfa Innovation Import and Export Trade Co., Ltd. | Abnormal sound detection device for auto parts and components |
Also Published As
Publication number | Publication date |
---|---|
CN102592593B (en) | 2014-01-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102592593B (en) | Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech | |
An et al. | Deep CNNs with self-attention for speaker identification | |
CN106057212B (en) | Driving fatigue detection method based on voice personal characteristics and model adaptation | |
Zhang et al. | Robust sound event recognition using convolutional neural networks | |
CN110457432B (en) | Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium | |
CN102142253B (en) | Voice emotion identification equipment and method | |
CN101923855A (en) | Text-independent voiceprint identification system |
Jancovic et al. | Bird species recognition using unsupervised modeling of individual vocalization elements | |
CN103985381B (en) | Audio indexing method based on parameter fusion and optimal decision-making |
CN112259106A (en) | Voiceprint recognition method and device, storage medium and computer equipment | |
CN105895078A (en) | Speech recognition method and device for dynamically selecting speech models |
CN101930735A (en) | Speech emotion recognition equipment and speech emotion recognition method | |
CN102723079B (en) | Automatic music-chord identification method based on sparse representation |
CN110222841A (en) | Neural network training method and device based on a margin loss function |
CN104978507A (en) | Intelligent well logging evaluation expert system identity authentication method based on voiceprint recognition | |
CN103456302A (en) | Emotion speaker recognition method based on emotion GMM model weight synthesis | |
CN105702251A (en) | Speech emotion identifying method based on Top-k enhanced audio bag-of-word model | |
CN101419799A (en) | Speaker identification method based on a mixed t model |
CN103578480B (en) | Speech emotion recognition method based on context correction in negative emotion detection |
Ranjard et al. | Integration over song classification replicates: Song variant analysis in the hihi | |
Praksah et al. | Analysis of emotion recognition system through speech signal using KNN, GMM & SVM classifier | |
CN106448660A (en) | Natural language fuzzy boundary determination method incorporating big data analysis |
CN105006231A (en) | Distributed large population speaker recognition method based on fuzzy clustering decision tree | |
Pan et al. | Robust Speech Recognition by DHMM with A Codebook Trained by Genetic Algorithm. | |
Li et al. | Feature extraction with convolutional restricted boltzmann machine for audio classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 2014-01-01; Termination date: 2017-03-31 |