US20150066549A1 - System, Method and Apparatus for Voice Analytics of Recorded Audio - Google Patents
- Publication number
- US20150066549A1 (application US14/298,457)
- Authority
- US
- United States
- Prior art keywords
- features
- audio features
- telephone calls
- model
- business
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
- G06Q10/06375—Prediction of business process outcome or impact based on a proposed change
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/42221—Conversation recording systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
Definitions
- This invention relates to a method for analyzing recorded speech. More specifically, the present invention relates to a method for predicting business outcomes by analyzing recorded speech.
- Call centers usually deal with calls, complaints, and requests coming from a large number of clients. These conversations can support several business processes, such as customer retention, sales, fraud management, etc. For instance, in the case of an inbound sales business process, the agent asks the client whether he or she is interested in a new product or service. The question is whether any alternative product or service should be offered in the form of a direct telephone call to clients who did not purchase anything during the previous telephone conversation (i.e. in the “inbound” sales phase).
- Prior solutions utilize computer technology to recognize content (e.g. keywords, full sentences, etc.) in digitally recorded conversations.
- Other solutions have utilized signal-based information from human voice such as loudness, speed, and changes in intonation, but have done so based upon keyword or full sentence content.
- Such approaches require sophisticated and expert work to configure and maintain their operability.
- the present invention eliminates the need for expensive and burdensome speech analytics and allows companies to use speech analytics to enhance a company's business performance.
- the present invention is created for companies in need of speech analytics to enhance a company's business performance.
- Another object of this invention is to predict business outcomes based upon audio recordings.
- a further object of this invention is to create a model for the given business target, based upon annotated sound conversations.
- the invention relates to a unique approach for maximizing the number of positive business outcomes at the top of the ranked set by using predictions based on a model.
- the business outcome is set directly by the user of the system of the present invention by feeding annotated conversations into the system.
- the annotation consists of sound files labeled as successful/not successful business event.
- the prediction is based on a model which relies on the extracted features of digitally recorded conversations.
- a method, system, and/or apparatus may include the following steps: initial model, generate rank, validate model, and refine model.
- Initial model may include providing the method, system, and/or apparatus with a collection of annotated audio recordings, such as telephone conversations. These audio recordings are preferably annotated in accordance with the business result desired (i.e. successful/unsuccessful sales calls).
- the generate rank step includes extracting pre-selected features from each recorded conversation and then running the prediction engine.
- the validate model may include comparing the desired business outcome with the ranked audio conversations.
- the refine model step learns from the validation phase and may upgrade the initial model or build a new one based upon the desired business results. This validation stage may also include feedback from the customer.
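The four-step workflow above (initial model, generate rank, validate model, refine model) can be sketched as a simple control loop. All function names and the toy "success profile" model below are hypothetical illustrations, not the patent's proprietary algorithm:

```python
# Illustrative sketch of the four steps: build an initial model from annotated
# calls, rank new calls, and validate the ranking against actual outcomes.

def build_initial_model(annotated_calls):
    # annotated_calls: list of (feature_vector, success_flag) pairs
    successes = [f for f, ok in annotated_calls if ok]
    n = len(successes[0])
    # toy "model": the average feature profile of successful calls
    return [sum(f[i] for f in successes) / len(successes) for i in range(n)]

def rank_calls(model, calls):
    # calls: dict of call_id -> feature_vector; calls whose features lie
    # closer to the success profile are ranked first
    def dist(f):
        return sum(abs(a - b) for a, b in zip(f, model))
    return sorted(calls, key=lambda cid: dist(calls[cid]))

def validate_model(ranking, actual_success, top_k):
    # share of real successes captured in the top-k of the ranking
    return sum(1 for cid in ranking[:top_k] if actual_success[cid]) / top_k
```

A refine step would compare the validation score against expectations and either retrain the profile on the enlarged annotated set or rebuild it from scratch.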
- FIG. 1 is a flow chart illustrating the steps associated with the method of the present invention.
- FIG. 2 a is a table of extracted audio features.
- FIG. 2 b is a table of extracted audio features tabulated relative to desired business outcomes.
- FIG. 3 is a diagram illustrating the initial model, rank generation, model validation and model refinement.
- FIG. 4 is a flow chart showing different steps of both the learning and the evaluation phases in parallel.
- the present invention relates to a method for analyzing recorded telephone calls in order to predict business outcomes.
- the method involves recording a series of initial telephone calls and analyzing those calls for particular audio features.
- the audio features are tabulated and annotated to specify whether the telephone call resulted in a pre-determined business outcome.
- a model is then built whereby the pre-determined business outcome can be predicted based upon the presence of certain audio features.
- the model can subsequently be used to predict whether future calls are likely to result in the pre-determined business outcome.
- a user may feed the system with annotated conversations (successful/not successful event) and the system builds a model based on this train information.
- the system of the present invention constructs an equal number of features for each conversation and runs the prediction engine.
- the system of the present invention reveals the business outcome of the ranked conversations by the underlying business process.
- the system of the present invention learns from the validation phase and is capable of either upgrading the initial model or building a new model.
- the speech analytics and data mining software techniques of the present invention comprise a layered software architecture consisting of a core engine, a scheduler and a graphical user interface.
- the core engines of the system consist of all speech analytics and data mining modules necessary for feature extraction, automatic model building and predictions.
- Core engines are the most essential part of the present invention due to the proprietary, self-researched algorithms in the field of machine learning and parallel processing.
- the system does not apply conventional text mining approaches by converting speech to text. Instead, the system extracts features directly from recorded voice. As a result, the system is not language specific and can be used on multiple different languages.
- the system 10 of the present invention has three core engines. These include a feature extraction engine 20 , a model builder engine 30 , and a prediction engine 40 . Furthermore, an additional refining engine 50 can be included for refining the initial model. The function and operation of each of these engines is described next.
- the feature extraction engine 20 of the present invention is a module that processes the sound files and extracts an equal number of features from each. This feature matrix is later used by the model building and prediction engines.
- the extraction engine 20 records a series of telephone calls. These are ideally calls from individuals to a business. The calls are digitally processed as sound files. Thereafter, the sound files are analyzed and various audio features are extracted from each of the telephone calls.
- the extracted features may be, for example, volume, tone, pitch, speed, etc. Other features can be extracted that are indicative of certain emotions, such as anger, happiness, or frustration. Thereafter, the extraction engine 20 builds a feature matrix whereby a plurality of the extracted audio features are tabulated. An example of such a tabulation is included as FIG. 2 a.
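A minimal sketch of such a feature matrix can be built with elementary signal measurements. The specific features below (RMS loudness, zero-crossing rate as a crude pitch proxy, duration) are illustrative assumptions; the patent does not disclose its exact feature set:

```python
import math

def extract_features(samples, sample_rate=8000):
    """Toy per-call audio features from a list of PCM samples:
    RMS loudness, zero-crossing rate (a crude pitch proxy), and duration."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    zero_crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = zero_crossings / n
    duration = n / sample_rate
    return [rms, zcr, duration]

def feature_matrix(calls):
    """An equal number of features for every conversation, one row per call."""
    return [extract_features(s) for s in calls]
```

Because every call yields the same number of features, the rows can be tabulated directly, as in the FIG. 2a example.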
- the model builder engine 30 of the invention automatically builds a model based on whether the individual telephone calls resulted in a pre-determined business outcome.
- Pre-determined business outcome can be any of a wide variety of favorable business outcomes.
- the pre-determined outcome may be, for example, whether a sale is completed, whether the individual participated in a survey, or whether the individual chose to sign up for a mailing list. Any of a number of other pre-determined business outcomes can likewise be used in connection with the invention.
- the model builder engine 30 updates the feature matrix by indicating whether the pre-determined business outcome was achieved for a particular call. The annotated feature matrix is indicated in FIG. 2 b.
- the annotated feature matrix is used to establish a model whereby the pre-determined business outcome is associated with a subset of the extracted audio features.
- the model may specify that a particular voice tone, inflection, and speed are more frequently associated with the pre-determined business outcome.
- a tone, inflection, or speed above or below a pre-determined threshold value can be associated with a business outcome.
- the referenced measurements can be made in decibels, voice frequency, or words per minute. The model would, therefore, specify that callers matching those audio features are more likely to result in the favorable business outcome.
- the engine's output is a model that is used for future predictions. For example, with reference to FIG. 2 a , the model may predict a positive outcome when audio features 1 and 2 are present.
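The threshold idea described above can be sketched as follows. This is a deliberately simple stand-in (per-feature midpoint thresholds between success and failure means), not the patent's proprietary model builder:

```python
def learn_model(features, outcomes):
    """For each feature, record the midpoint between the success and failure
    means, and which side of it the successes fall on. Illustrative only."""
    model = []
    for j in range(len(features[0])):
        pos = [row[j] for row, ok in zip(features, outcomes) if ok]
        neg = [row[j] for row, ok in zip(features, outcomes) if not ok]
        p_mean, n_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        model.append(((p_mean + n_mean) / 2, p_mean > n_mean))
    return model

def predict_score(model, row):
    """Fraction of features on the 'successful' side of their threshold;
    higher scores predict the favorable business outcome."""
    votes = sum(
        (value > mid) == above for value, (mid, above) in zip(row, model)
    )
    return votes / len(model)
```

Sorting new calls by `predict_score` yields the ranked set used for routing and prioritization.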
- the prediction engine 40 uses the model and the extracted features to predict the business outcome for new sound files.
- the prediction engine 40 records additional telephone calls from individuals calling the business. These calls are likewise processed as sound files. Various audio features are again extracted from each of the sound files. Thereafter, the engine 40 predicts whether the pre-determined business outcome will occur based upon the established model. Namely, the established model is compared to the newly extracted voice feature to predict the likelihood of a desired business result.
- the telephone calls can be placed in a ranked order based upon the prediction. This ranking, in turn, can be used to properly route and/or prioritize calls within the business.
- the refining engine 50 can be used to further develop the model. This is accomplished by determining whether each additional telephone call, in fact, resulted in the actual occurrence of the pre-determined business outcome. Next, a comparison is made between the actual occurrence with the prediction. In other words, the actual occurrence can be used to either validate or refute the prediction. The model can then be modified as needed based upon the comparison between the actual and predicted outcomes.
- the scheduler of the present invention is a component to start, stop and supervise the system's core processes. Additionally, the scheduler features both a single and a multi-threaded mode to maximize performance.
- the graphical user interface of the present invention enables end users to build, run and refine models, administer projects and browse the ranking results.
- the graphical user interface is an intuitive, lightweight and web-browser based software.
- the system of the present invention is able to run in command line (scripting)/service mode in order to enable system-level integration with different systems.
- the present invention may be referred to below as Rank Miner or the present system.
- One of the unique qualities of the present system is that it is capable of building its model and optimizing it for successful business outcomes. This means the following: based on each conversation the company initiates certain business activities. Each activity may have an associated successful and unsuccessful event (e.g. a successful sale).
- the system can draw a profit curve based on validation results and on expected costs and revenues. We suggest using the ranked list up to the point of maximum profit; the profit curve makes it easy to decide this threshold.
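A profit curve of this kind can be sketched as a cumulative sum along the ranked list. The per-call cost and per-success revenue figures are hypothetical inputs, as is the whole accounting scheme below:

```python
def profit_curve(ranked_outcomes, revenue_per_success, cost_per_call):
    """Cumulative profit after contacting the first k clients of the
    ranked list, for k = 1..n."""
    profits, total = [], 0.0
    for outcome in ranked_outcomes:
        total += (revenue_per_success if outcome else 0.0) - cost_per_call
        profits.append(total)
    return profits

def best_cutoff(profits):
    """Number of calls to place: the 1-based index at the curve's maximum."""
    return max(range(len(profits)), key=profits.__getitem__) + 1
```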
- the present system's machine learning method may be able to learn these business results; the ranked list is then optimized directly for the business outcome, and the customer can decide the optimal percentage of the ranked list to use. This constitutes a unique differentiation from the existing prior art.
- Another distinguishing feature of the system may be the learning of ranking: our research provides a dedicated solution to this intensely researched machine learning problem.
- numeric values assigned to (telephone) conversations constitute the input of the learning process.
- Binary 1/0 values indicate whether the business event was successful; for example, whether the client bought the product/service in question during the second direct call.
- the present system's unique approach includes maximizing the number of positive business outcomes at the top of the ranked set by using predictions based on a model.
- the business outcome is set directly by the user of the system by feeding annotated conversations into the system based on the customer's historical business data.
- the annotation may consist of sound files labeled as successful/not successful business event.
- the prediction is based on a model which relies on the extracted features of digitally recorded conversations and optimizes directly for the customer's business profit.
- the present method may include four steps as noted in FIG. 3 :
- the system extracts an equal number of features from each conversation and runs the prediction engine.
- Ranking can be optimized for lift measure.
- the system learns from the validation phase and is able to either upgrade the initial model or build a new one.
- the refining process uses the customer's feedback thus providing a dynamic, continuous prediction model optimization.
- the steps of the evaluation phase follow those of the learning phase. Exactly the same steps must be executed on the input data during the use of the ranking model as in the learning phase. The only difference is that the success index is not part of the system's input, since that is precisely what we want to estimate.
- As the output of the evaluation phase we receive a prediction to the incoming conversations concerning their success indexes. On the basis of this, we can rank clients who are yet to be contacted directly.
- FIG. 4 shows different steps of both the learning and the evaluation phases parallel to one another from input to output.
- the learning phase displays the methods of model construction
- the evaluation phase presents the application of the model for the cases of telephone conversations that are unknown to the model.
- in the spectrum received after noise filtering we may attempt to separate the different speakers. The first step may be a dividing process that produces chunks, i.e. 1-3-second pieces of the signal, cut along minimal energy-level points.
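The dividing process can be sketched as follows: compute per-frame energies and, within each admissible 1-3-second window, cut at the lowest-energy frame. The frame length and window parameters are assumptions for illustration:

```python
def chunk_signal(samples, rate, min_len=1.0, max_len=3.0, frame=0.02):
    """Split a signal into roughly 1-3 second chunks by cutting at the
    lowest-energy frame inside each allowed window (illustrative sketch)."""
    f = round(rate * frame)                      # samples per frame
    energies = [sum(s * s for s in samples[i:i + f])
                for i in range(0, len(samples), f)]
    lo, hi = round(min_len / frame), round(max_len / frame)  # window, in frames
    cuts, start = [], 0
    while start + hi < len(energies):
        window = energies[start + lo:start + hi]
        # cut at the minimal-energy frame within the admissible window
        cut = start + lo + min(range(len(window)), key=window.__getitem__)
        cuts.append(cut * f)                     # cutting point, in samples
        start = cut
    return cuts
```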
- a KTM kernel-based non-linear learning process is applied. A learning database was constructed during the process, whose aforementioned four classes each contained at least 5000 samples.
- For the representation of the samples to be learned we extract power spectral density features and the histogram of changes in the mel-frequency cepstral coefficients. Detection accuracy was measured on an independent database, where we observed higher than 96% accuracy for each class.
- segments identified as homogeneous human speech must then be partitioned into two groups: agent segments and client segments.
- we define a dissimilarity function on segment pairs with the help of machine learning. This function assigns a value near 0 to segment pairs selected from the same speaker, and a value near 1 to segment pairs selected from different speakers.
- an appropriate hierarchical clustering method (agglomerative nesting, in our case) then performs this partitioning.
- the statistical functions of the extracted information (e.g. minimum, maximum, median, mean, variance, skewness, etc.) constitute the extracted features.
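A sketch of those statistical summary features, using only the standard library (the exact set and naming are assumptions based on the list above):

```python
import statistics as st

def stat_features(values):
    """Statistical summary of one measurement track: min, max, median,
    mean, variance, skewness (population moments, illustrative)."""
    mean = st.mean(values)
    var = st.pvariance(values)
    std = var ** 0.5
    skew = (
        sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)
        if std else 0.0
    )
    return {
        "min": min(values), "max": max(values),
        "median": st.median(values), "mean": mean,
        "variance": var, "skewness": skew,
    }
```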
- the emotional transition statistics belong to the third class of extracted features. We examine whether consecutive agent and client tracks separately contained changes in different emotional states. For instance, if the client was angry at one moment, but spoke calmly after the agent segment following the client's angry utterance, that constitutes information for us.
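Emotional transition statistics of this kind reduce to counting consecutive state pairs on a speaker's track. A minimal sketch (the emotion labels are illustrative):

```python
from collections import Counter

def transition_stats(emotion_track):
    """Count transitions between consecutive emotional states on one
    speaker's track, e.g. an ('angry', 'calm') transition after an
    agent reply calmed the client down."""
    return Counter(zip(emotion_track, emotion_track[1:]))
```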
- we use Lift, a measure popular in data mining terminology, as the learning measure.
- the Lift measure is an index of the model's performance, indicating how great the improvement is compared to an arbitrarily constructed (random) model.
- Lift_p stands for the Lift value computed on the first p percent of the ranked list. That way, Lift_p = f_S(p) / f_R(p), where f_S(p) is the accuracy of the new model S on the first p percent of the elements and f_R(p) is the corresponding performance of the arbitrary (random) model.
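Concretely, lift at p percent can be computed as the success rate in the top p percent of the ranked list divided by the overall success rate. This is the standard data mining definition and is assumed here, not quoted from the patent:

```python
def lift_p(ranked_outcomes, p):
    """Lift at p percent: success rate among the top p% of the ranked list,
    divided by the overall success rate (the random model's baseline)."""
    k = max(1, int(len(ranked_outcomes) * p / 100))
    top_rate = sum(ranked_outcomes[:k]) / k
    base_rate = sum(ranked_outcomes) / len(ranked_outcomes)
    return top_rate / base_rate
```

A perfect ranking that places every success first maximizes this value; a random ranking yields a lift near 1.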
- the pair (D,T) and the value p constituted the learning input, where D denotes the samples' feature space and T the target values.
- the parameter p specifies the cutoff used in the calculation of the Lift measure.
- the ranking model M whose Lift_p value is the highest among the constructed model variations constitutes the output of the algorithm.
- Model variations are going to be constructed with the help of feature space transformational processes and the execution of dimension reduction of various degrees.
- Feature space transformational processes attempt to further modify features extracted from conversations in order to make the D->T learning process more efficient.
- these methods follow the underlying principle that it is worth suppressing noise-like features while truly relevant features are extracted.
- these methods have two basic types: supervised and unsupervised processes.
- supervised processes use the T target values along with the D samples, whereas unsupervised methods rely only on the D feature space.
- out of the unsupervised methods we applied principal component analysis (PCA), and out of the supervised methods we tried the Spring-Based Discriminant Analysis (SDA) defined by us, which is similar to Linear Discriminant Analysis.
- kernel variants of both methods may also be applied: Kernel PCA and Kernel SDA.
- we indicate the feature space transformation with T and the transformed basis vectors with Tv_1, …, Tv_n, where the components of the vectors Tv_i are ordered so that dimensions with smaller indexes represent more relevant feature space components (in the case of spectral feature extraction processes, the dimensions of the new transformed feature space are ordered along the dominant eigenvectors of the respective algorithm-specific matrices). That is why we can apply a dimension reduction step as well, by keeping only the first few dimensions of the m-dimensional space received after transformation. If we keep the first t of the Tv_i vectors, we indicate the transformation with T_t, and the whole transformed database with (T_tD,T).
- the applied ranking algorithm follows the transformation of the feature space with a chosen feature space transformation process; we then consecutively execute the (T_1D,T), …, (T_mD,T) dimension reduction steps.
- the pseudo-code of the algorithm can be found in the chart Algorithm 1.
- the system is applicable to the previously discussed Profit optimization, in which case we use Profit_p instead of Lift_p, where Profit_p is a numeric value derived from income and cost input data.
- Profit optimization is a further distinguishing feature of the present system.
- Algorithm 1:
- T ← the feature space transformation provided by PCA, SDA, KPCA or KSDA applied on the training data (D,T)
- For i ← 1 to m: M_i ← learn the KTM model on the training data (T_iD,T) // T_i represents the i-dimensional cut after transformation T
- End
- M ← Argmax_i Lift_p(M_i)
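The model-selection loop over dimension cuts can be sketched as follows. A nearest-success-centroid scorer stands in for the proprietary KTM learner, and truncating each row to its first t features stands in for the T_i cuts; both substitutions are assumptions for illustration:

```python
def select_model(features, outcomes, p=30):
    """Train one toy model per dimension cut t (keeping the first t
    transformed features) and return the cut with the highest Lift_p."""
    def lift(ranked, pct):
        k = max(1, len(ranked) * pct // 100)
        base = sum(outcomes) / len(outcomes)
        return (sum(ranked[:k]) / k) / base

    best = (0.0, None)
    for t in range(1, len(features[0]) + 1):
        cut = [row[:t] for row in features]          # t-dimensional cut
        centroid = [
            sum(row[j] for row, ok in zip(cut, outcomes) if ok)
            / sum(outcomes) for j in range(t)
        ]
        # rank samples by distance to the success centroid (closest first)
        order = sorted(
            range(len(cut)),
            key=lambda i: sum(abs(a - b) for a, b in zip(cut[i], centroid)),
        )
        score = lift([outcomes[i] for i in order], p)
        if score > best[0]:
            best = (score, t)
    return best  # (best Lift_p, best dimension cut)
```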
- the system's layered software architecture may be made up of a core engine, a scheduler and a GUI.
- Core engines consist of all speech analytics and data mining modules that are needed for feature extraction, building an automatic model and making predictions. Core engines are the most essential part of the system due to the proprietary, self-researched algorithms in the field of machine learning and parallel processing.
- the system has three core engines: a feature extraction engine, a model builder engine and a prediction engine.
- Feature extraction module processes the sound files and extracts an equal number of features from all conversations. This feature matrix is later used for model builder and prediction engines.
- the features extracted may include:
- Model builder engine builds a model automatically based on the feature matrix.
- the engine's output is a model that is used for predictions.
- Prediction engine uses the previously built business-tailored model to predict the business outcome of the business activity. Using the prediction, the system provides a business-optimized ranked set. The system's uniqueness lies in providing support for the customer's own business activity based on the customer's own historical data by predicting future outcomes.
- Scheduler is a system component to start, stop and supervise the core processes of the system. Scheduler features single and multi-threaded modes to maximize performance.
- the GUI (graphical user interface) enables end users to build, run and refine models, administer projects and browse the ranking results.
- the GUI is an intuitive, lightweight, web-browser-based software.
- the system is able to run in command line (scripting)/service mode in order to enable system-level integration with different systems.
Abstract
Disclosed is a method for analyzing recorded telephone calls in order to predict business outcomes. The method involves recording a series of initial telephone calls and analyzing those calls for particular audio features. The audio features are tabulated and annotated to specify whether the telephone call resulted in a pre-determined business outcome. A model is then built whereby the pre-determined business outcome can be predicted based upon the presence of certain audio features. The model can subsequently be used to predict whether future calls are likely to result in the pre-determined business outcome.
Description
- This application claims priority to co-pending provisional application Ser. No. 61/655,594 filed on Jun. 5, 2012 and entitled “System, Method, and Apparatus for Voice Analytics of Recorded Audio.” The contents of this co-pending application are fully incorporated herein for all purposes.
- 1. Field of the Invention
- This invention relates to a method for analyzing recorded speech. More specifically, the present invention relates to a method for predicting business outcomes by analyzing recorded speech.
- 2. Description of the Background Art
- Presently, there are methods that enable companies to use speech analytics to enhance business performance. Such methods, however, are costly and unnecessarily complex.
- Therefore, it is an object of this invention to provide an improvement which overcomes the aforementioned inadequacies of the prior art systems and provides an improvement which is a significant contribution to the advancement of the art of speech analytics.
- The foregoing has outlined some of the pertinent objects of the invention. These objects should be construed to be merely illustrative of some of the more prominent features and applications of the intended invention. Many other beneficial results can be attained by applying the disclosed invention in a different manner or modifying the invention within the scope of the disclosure. Accordingly, other objects and a fuller understanding of the invention may be had by referring to the summary of the invention and the detailed description of the preferred embodiment in addition to the scope of the invention defined by the claims taken in conjunction with the accompanying drawings.
- For a fuller understanding of the nature and objects of the invention, reference should be had to the following detailed description taken in connection with the accompanying drawings.
-
FIG. 1 is a flow chart illustrating the steps associated with the method of the present invention. -
FIG. 2 a is a table of extracted audio features. -
FIG. 2 b is a table of extracted audio features tabulated relative to desired business outcomes. -
FIG. 3 is a diagram illustrating the initial model, rank generation, model validation and model refinement. -
FIG. 4 is a flow chart showing different steps of both the learning and the evaluation phases in parallel. - The present invention relates to a method for analyzing recorded telephone calls in order to predict business outcomes. The method involves recording a series of initial telephone calls and analyzing those calls for particular audio features. The audio features are tabulated and annotated to specify whether the telephone call resulted in a pre-determined business outcome. A model is then built whereby the pre-determined business outcome can be predicted based upon the presence of certain audio features. The model can subsequently be used to predict whether future calls are likely to result in the pre-determined business outcome. The various features of the present invention, and the manner in which they interrelate, are described in greater detail hereinafter.
- In one preferred embodiment (the initial model) of the system or method in accordance with the present disclosure, a user may feed the system with annotated conversations (successful/not successful event) and the system builds a model based on this train information. During ranking, the system of the present invention constructs an equal number of features for each conversation and runs the prediction engine. During validation, the system of the present invention reveals the business outcome of the ranked conversations by the underlying business process.
- The system of the present invention learns from the validation phase and is capable to either upgrade initial model or build a new model. The speech analytics and data mining software techniques of the present invention are comprised of layered software architecture consisting of a core engine, a scheduler and a graphical user interface.
- The core engines of the system consist of all speech analytics and data mining modules necessary for feature extraction, automatic model building and predictions. Core engines are the most essential part of the present invention due to the proprietary, self-researched algorithms in the field of machine learning and parallel processing. Notably, the system does not apply conventional text mining approaches by converting speech to text. Instead, the system extracts features directly from recorded voice. As a result, the system is not language specific and can be used on multiple different languages.
- As noted in
FIG. 1 , thesystem 10 of the present invention has three core engines. These include afeature extraction engine 20, amodel builder engine 30, and aprediction engine 40. Furthermore, anadditional refining engine 50 can be included for refining the initial model. The function and operation of each of these engines is described next. - The
feature extraction engine 20 of the present invention is a module that processes the sound files and extracts an equal number of features from each. The resulting feature matrix is later used by the model builder and prediction engines. The extraction engine 20 records a series of telephone calls. These are ideally calls from individuals to a business. The calls are digitally processed as sound files. Thereafter, the sound files are analyzed and various audio features are extracted from each of the telephone calls. The extracted features may be, for example, volume, tone, pitch, speed, etc. Other features can be extracted that are indicative of certain emotions, such as anger, happiness, or frustration. Thereafter, the extraction engine 20 builds a feature matrix whereby a plurality of the extracted audio features are tabulated. An example of such a tabulation is included as FIG. 2a. - The
model builder engine 30 of the invention automatically builds a model based on whether the individual telephone calls resulted in a pre-determined business outcome. The pre-determined business outcome can be any of a wide variety of favorable business outcomes. The pre-determined outcome may be, for example, whether a sale is completed, whether the individual participated in a survey, or whether the individual chose to sign up for a mailing list. Any of a number of other pre-determined business outcomes can likewise be used in connection with the invention. The model builder engine 30 updates the feature matrix by indicating whether the pre-determined business outcome was achieved for a particular call. The annotated feature matrix is indicated in FIG. 2b. - Thereafter, the annotated feature matrix is used to establish a model whereby the pre-determined business outcome is associated with a subset of the extracted audio features. For example, the model may specify that a particular voice tone, inflection, and speed are more frequently associated with the pre-determined business outcome. Likewise, a tone, inflection, or speed above or below a pre-determined threshold value can be associated with a business outcome. The referenced measurements can be made in decibels, voice frequency, or words per minute. The model would, therefore, specify that callers matching those audio features are more likely to result in the favorable business outcome. The engine's output is a model that is used for future predictions. For example, with reference to
FIG. 2a, the model may predict a positive outcome when audio features 1 and 2 are present. - The
prediction engine 40 uses the model and the extracted features to predict the business outcome for new sound files. The prediction engine 40 records additional telephone calls from individuals calling the business. These calls are likewise processed as sound files. Various audio features are again extracted from each of the sound files. Thereafter, the engine 40 predicts whether the pre-determined business outcome will occur based upon the established model. Namely, the established model is compared to the newly extracted voice features to predict the likelihood of a desired business result. The telephone calls can be placed in a ranked order based upon the prediction. This ranking, in turn, can be used to properly route and/or prioritize calls within the business. - Following the disposition of the call, the
refining engine 50 can be used to further develop the model. This is accomplished by determining whether each additional telephone call, in fact, resulted in the actual occurrence of the pre-determined business outcome. Next, a comparison is made between the actual occurrence with the prediction. In other words, the actual occurrence can be used to either validate or refute the prediction. The model can then be modified as needed based upon the comparison between the actual and predicted outcomes. - The scheduler of the present invention is a component to start, stop and supervise the system's core processes. Additionally, the scheduler features both a single and a multi-threaded mode to maximize performance.
- The graphical user interface of the present invention enables end users to build, run and refine models, administer projects and browse the ranking results. The graphical user interface is an intuitive, lightweight and web-browser based software. In addition to the graphical user interface, the system of the present invention is able to run in command line (scripting)/service mode in order to enable system-level integration with different systems.
- The present invention may be referred to below as Rank Miner or the present system. One of the unique qualities of the present system is that it is capable of building its model and optimizing it for successful business outcomes. This means the following: based on each conversation the company initiates certain business activities. Each activity may have an associated successful and unsuccessful event (e.g. a successful sale).
- We may also associate revenue and cost with each activity (e.g. cost of call, wages, etc.). If the business activity (e.g. sales activity) is successful, that means revenue and profit for the company. The costs as well as the revenues may have a different structure; for example, a cost may arise after each call or after every 10 successful sales. If the business activity is not successful, only the associated cost remains. The system can draw a profit curve based on validation results, expected costs and revenues. It is suggested to use the ranked list up to the point of maximum profit; the profit curve makes it easy to decide the threshold of maximum profit. Based on historical business activity results, the present system's machine learning method may be able to learn these business results, and the ranked list will be optimized directly for the business outcome. The customer can then decide which percentage of the ranked list is optimal to use, which is a unique differentiation from the existing prior art.
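The profit-curve idea above can be illustrated with a minimal sketch. The revenue and cost figures, and the validated outcome sequence, are purely illustrative; the specification does not fix any particular values or cost structure.

```python
# Sketch: drawing a profit curve over a ranked list of validated calls and
# picking the cut-off with maximum profit. revenue_per_sale and
# cost_per_call are illustrative figures, not values from the specification.
def profit_curve(ranked_outcomes, revenue_per_sale, cost_per_call):
    """Cumulative profit after calling the first k entries of the ranked list."""
    curve, profit = [], 0.0
    for outcome in ranked_outcomes:  # 1 = successful sale, 0 = no sale
        profit += outcome * revenue_per_sale - cost_per_call
        curve.append(profit)
    return curve

def best_cutoff(curve):
    """Number of calls to place for maximum expected profit."""
    return max(range(len(curve)), key=lambda k: curve[k]) + 1

ranked = [1, 1, 0, 1, 0, 0, 0, 0]  # validated outcomes in ranked order
curve = profit_curve(ranked, revenue_per_sale=100.0, cost_per_call=20.0)
print(best_cutoff(curve), max(curve))  # stop calling where profit peaks
```

Here the curve rises while successful sales outweigh call costs and falls afterwards, so the threshold of maximum profit is simply the peak of the curve.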
- Another distinguishing feature of the system may be the learning of ranking: the present research gives a special solution to an intensively-researched machine learning problem. In our case, numeric values assigned to (telephone) conversations constitute the input of the learning process. In the binary case, 1/0 values indicate whether the business event was successful; that is, for example, whether the client bought the product/service in question during the second direct call or not.
- The present system's unique approach includes maximizing the number of positive business outcomes at the top of the ranked set by using predictions based on a model. The business outcome is set directly by the user of the system by feeding annotated conversations into the system based on the customer's historical business data. The annotation may consist of sound files labeled as successful/not successful business event. The prediction is based on a model which relies on the extracted features of digitally recorded conversations and optimizes directly for the customer's business profit.
- The present method may include four steps as noted in
FIG. 3 : - Feeding the system with annotated conversations (successful/not successful events). The system builds a model based on this train information.
- During ranking, the system extracts an equal number of features from each conversation and runs the prediction engine. Ranking can be optimized for the Lift measure.
- During validation the business outcome of the ranked conversations is revealed by the underlying business process.
- The system learns from the validation phase and is able to either upgrade the initial model or build a new one. The refining process uses the customer's feedback thus providing a dynamic, continuous prediction model optimization.
- The phases of the ranking process can be seen in
FIG. 4 . The discussion below is exemplary only. - On the left side of
FIG. 4 we can follow the learning phase, and parallel to it, the evaluation phase can be seen on the right. Agent-client conversations in WAV file format with telephone sampling (usually 8 kHz, 8-bit mono) constitute the input of both phases. Let us note that both phases can be divided into three functional blocks: preprocessing
- During the learning phase, telephone conversations and numeric values indicating whether the business event occurred or not (0/1) constitute the input. First, we filter noises from the signal, then we attempt to separate the speech segments of agent and client. We consider certain segments where neither the agent nor the client is speaking as unproductive. The next step is feature extraction, during which we examine the characteristics of the dialogue and emotional features of agent and client. We extract the same number of features from each conversation, and convert them into a vector containing numeric elements. In the learning phase we build a model to find the correlations between these vectors and the assigned success indexes.
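The conversion of each conversation into a fixed-length numeric vector, as described above, might be sketched as follows. The feature names and values are illustrative stand-ins, not features named in the specification.

```python
# Sketch: tabulating an equal number of features per conversation into a
# feature matrix with one row per call and one column per feature. The
# feature names (volume, pitch, speed) and values are illustrative only.
FEATURES = ["volume_db", "pitch_hz", "speed_wpm"]

def build_feature_matrix(calls):
    """Every call contributes the same feature columns, in the same order."""
    matrix = []
    for call in calls:
        matrix.append([call[name] for name in FEATURES])
    return matrix

calls = [
    {"volume_db": 62.0, "pitch_hz": 180.0, "speed_wpm": 150.0},
    {"volume_db": 55.5, "pitch_hz": 210.0, "speed_wpm": 130.0},
]
matrix = build_feature_matrix(calls)
print(matrix)  # one row per call, one column per feature
```

Because every row has the same length and column order, the rows can be paired directly with the 0/1 success indexes during model building.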
- As regards its distinct steps, the evaluation phase follows those of the learning phase. Exactly the same steps must be executed on the input data during the use of the ranking model as in the learning phase. The only difference lies in the fact that the success index is not part of the system's input, since that is exactly what we are going to estimate. We execute the steps already taken in the learning phase up to the feature extraction phase, and then evaluate the ranking model on the features received. As the output of the evaluation phase, we receive a prediction for the incoming conversations concerning their success indexes. On this basis, we can rank clients who are yet to be contacted directly.
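The ranking at the end of the evaluation phase might look like the following sketch. The scoring function is a hypothetical stand-in for the prediction engine, not the proprietary model described in the specification.

```python
# Sketch: ranking new conversations by a predicted success index.
# score_call is a toy stand-in for the prediction engine: a linear score
# squashed through a logistic function into (0, 1).
import math

def score_call(features, weights, bias=0.0):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_calls(calls, weights):
    """Return call ids ordered from most to least promising."""
    scored = [(score_call(f, weights), call_id) for call_id, f in calls]
    scored.sort(reverse=True)
    return [call_id for _, call_id in scored]

calls = [("A", [0.2, 0.1]), ("B", [0.9, 0.8]), ("C", [0.5, 0.4])]
order = rank_calls(calls, weights=[1.0, 1.0])
print(order)  # most promising call first
```

Clients yet to be contacted would then be called in list order, from the top down.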
-
FIG. 4 shows different steps of both the learning and the evaluation phases parallel to one another from input to output. The learning phase displays the methods of model construction, and the evaluation phase presents the application of the model for the cases of telephone conversations that are unknown to the model. - Further components of the present disclosure are discussed below.
- Telephone conversations undergo several signal processing steps. First, we may normalize the signal by its average and deviation, which is followed by the removal of the noise, for which we apply the Spectral Mean Subtraction method. We may construct the spectrum using slices of 10 ms samples and a window size of 1024, with the fast Fourier transform along with a Hamming window.
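The noise-removal step might be sketched as below. Computing the spectra themselves (10 ms slices, a 1024-point FFT with a Hamming window) is left out; the frames are toy magnitude values, and the per-bin mean is used as the stationary-noise estimate.

```python
# Sketch of Spectral Mean Subtraction on per-frame magnitude spectra:
# subtract the per-bin mean magnitude (a stationary-noise estimate) from
# every frame, flooring the result at zero.
def spectral_mean_subtraction(frames):
    n_bins = len(frames[0])
    noise = [sum(f[b] for f in frames) / len(frames) for b in range(n_bins)]
    return [[max(f[b] - noise[b], 0.0) for b in range(n_bins)] for f in frames]

frames = [
    [1.0, 2.0, 3.0],  # toy magnitude spectra, one list per 10 ms frame
    [1.0, 4.0, 3.0],
    [1.0, 6.0, 3.0],
]
clean = spectral_mean_subtraction(frames)
print(clean)  # bins that never vary are treated as noise and zeroed
```

Bins whose magnitude is constant across frames (stationary noise) are driven to zero, while bins that fluctuate above the mean retain their excess energy.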
- Using the spectrum received after noise filtering, we may attempt to separate the different speakers. Its first step may be a dividing process that produces chunks, i.e. 1-3-second pieces of the signal, cut along minimal energy-level points. We identify the pieces with a learning algorithm as homogeneous human speech, simultaneous speech, music or non-informative parts. For the classification we have chosen the KTM kernel-based non-linear learning process. A learning database was constructed during the process, in which each of the aforementioned four classes contained at least 5000 samples. For the representation of the samples to be learned, we extract power spectral density features and the histogram of changes in the mel-frequency cepstral coefficients. The accuracy of detection was measured on an independent database, on which we measured higher than 96% accuracy for each class. As the next step, segments identified as homogeneous human speech must be partitioned into two classes: agent and client segments. To achieve this, we trained a dissimilarity function on segment pairs with the help of machine learning. This function assigns a value near 0 to segment pairs selected from the same speaker, and a value near 1 to segment pairs selected from different speakers. In the learning process of the dissimilarity function we used 10000 segment pairs belonging to the same speaker and 10000 belonging to different speakers. On the basis of the learned dissimilarity function we construct the dissimilarity matrix over the homogeneous segments, then with an appropriate hierarchical clustering application (agglomerative nesting, in our case) we divide the segments into two classes (agent/client). Finally, we remove segments containing nonhomogeneous speech excerpts from the conversation, and contract the parts belonging to the same speaker that are adjacent in time. 
After contraction, we obtain an “ABABAB”-type analysis of the conversation, which prepares the field for the execution of further operations.
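The agglomerative division of segments into two speaker classes might look like the following minimal sketch. The learned dissimilarity function is replaced here by a fixed toy matrix in which values near 0 mean "same speaker"; the clustering itself is a plain average-linkage agglomerative nesting.

```python
# Sketch: dividing homogeneous speech segments into two speaker classes
# (agent/client) by agglomerative clustering of a dissimilarity matrix.
def agglomerate_two(dissim):
    """Average-linkage agglomerative nesting down to two clusters."""
    clusters = [[i] for i in range(len(dissim))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average dissimilarity between clusters a and b.
                d = sum(dissim[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy matrix: segments 0 and 2 resemble one speaker; 1 and 3 the other.
dissim = [
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
]
print(agglomerate_two(dissim))
```

With the real learned dissimilarity function in place of the toy matrix, the two resulting clusters correspond to the agent and client tracks of the conversation.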
- As the next step, we may extract three different types of features. We determine communicational features on the basis of speaker changes and the length of consecutive agent-client segments and their different statistical functions.
- We extract emotional information from the segments of both the agent and the client. The statistical functions of the extracted information (e.g. minimum, maximum, median, mean, variance, skewness, etc.) constitute the extracted features.
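The statistical functionals named above might be computed as in this sketch; the per-segment "anger score" track is an illustrative stand-in for whatever emotional information the extraction actually produces.

```python
# Sketch: reducing a variable-length track of per-segment emotional scores
# to fixed statistical functionals (minimum, maximum, median, mean,
# variance, skewness), so every conversation yields the same feature count.
import statistics

def functionals(values):
    mean = statistics.mean(values)
    var = statistics.pvariance(values)
    std = var ** 0.5
    # Population skewness; zero for a perfectly symmetric track.
    skew = (sum((v - mean) ** 3 for v in values) / len(values)) / std ** 3 if std else 0.0
    return {
        "min": min(values), "max": max(values),
        "median": statistics.median(values),
        "mean": mean, "variance": var, "skewness": skew,
    }

track = [0.1, 0.2, 0.3, 0.4, 0.5]  # e.g. per-segment anger scores
f = functionals(track)
print(f)
```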
- The emotional transition statistics belong to the third class of extracted features. We examine whether consecutive agent and client tracks separately contained changes in different emotional states. For instance, if the client was angry at one moment, but spoke calmly after the agent segment following the client's angry utterance, that constitutes information for us.
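Counting such transitions along one track might be sketched as below; the emotion labels are illustrative, and per-segment labeling itself is assumed to have been done upstream.

```python
# Sketch: emotional transition statistics over one speaker track of an
# "ABABAB"-type dialogue. We count how often consecutive segments change
# emotional state, e.g. an angry client later speaking calmly.
from collections import Counter

def transition_counts(states):
    """Count each (previous, next) emotion transition along one track."""
    return Counter(zip(states, states[1:]))

client_track = ["angry", "angry", "calm", "calm", "happy"]
counts = transition_counts(client_track)
print(counts[("angry", "calm")])  # the angry-then-calm transition noted above
```

The resulting counts (or rates derived from them) join the communicational and emotional functionals in the conversation's feature vector.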
- During the learning phase, correlations may be found between features extracted from conversations and target values assigned to conversations. In our case, the traditional classification and regression approaches of machine learning are not necessarily sufficient by themselves. Our goal is to make a prediction with the help of which those cases end up at the top of the list that more probably lead to successful purchases after ranking based on the prediction value. Ranking methodology perfectly suits real applications if we only intend to highlight a subset of a large set where cases favorable to us are gathered. We wish to have an overview of the elements of the highlighted subset organized in a list based on relevance, similarly to a browser search.
- The following questions arise in some ranking learning problems:
- i) how to select the ranking measure relevant to us,
- ii) considering that the learning of different ranking measures regularly leads to learning processes of great time complexity, how to get an appropriate suboptimal solution quickly.
- We have chosen Lift, a measure popular in data mining terminology, as the learning measure. The Lift measure is an index of the model's performance, indicating how great the improvement is compared to an arbitrarily constructed model. Lift_p stands for the value computed on the first p percent of the list. That way
-
Lift_p(S) = f_S(p) / f_R(p), - where f_S(p) is the accuracy of the new model S on the first p percent of the elements, and f_R(p) provides the percentage of the arbitrary model's performance.
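A concrete computation of the measure might look like this sketch, assuming the random baseline finds p percent of the positives in the first p percent of the list (positives spread uniformly).

```python
# Sketch: computing Lift_p for a ranked list against a random baseline.
def lift(ranked_outcomes, p):
    """Lift of the top p percent of a ranked list (1 = positive outcome)."""
    total_pos = sum(ranked_outcomes)
    k = max(1, round(len(ranked_outcomes) * p / 100))
    found = sum(ranked_outcomes[:k])       # positives in the top p percent
    model_rate = found / total_pos         # share of positives the model found
    random_rate = p / 100                  # share a random ordering would find
    return model_rate / random_rate

# 4 positives among 20 calls, all ranked to the top by a perfect model.
ranked = [1, 1, 1, 1] + [0] * 16
print(lift(ranked, p=10))
```

At p = 100 the model and the baseline see the whole list, so the lift of any ordering degrades to 1, matching the last row of the ranking results table below in the testing section.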
- Ranking learning was performed with the following algorithm. The pair (D,T) and the value p constituted the learning input. (D,T) indicates the learning database, where the vectors D=[v1, . . . , vn] are the m-dimensional feature vectors assigned to the conversations, and the numbers T=[c1, . . . , cn] are the assigned success indexes. Furthermore, the parameter p stands for the cut-off in the calculation of the Lift measure. The ranking model M, whose Lift_p value is the highest among the constructed model variations, constitutes the output of the algorithm.
- Model variations are going to be constructed with the help of feature space transformational processes and the execution of dimension reduction of various degrees. Feature space transformational processes attempt to further modify the features extracted from conversations in order to make the D->T learning process more efficient. The methods follow the underlying principle that it is worth suppressing noise-like features while truly relevant features are being extracted. These methods have two basic types: supervised and unsupervised processes. During the establishment of the feature space transformation, supervised processes use the T target values along with the D samples, whereas unsupervised methods focus only on the D feature space. Out of the unsupervised methods, we applied principal component analysis, and, out of the supervised methods, we tried the Spring-Based Discriminating Analysis defined by us, which is similar to Linear Discriminant Analysis. The two applied processes have versions applying non-linear kernel functions (Kernel PCA, Kernel SDA), which we also incorporated into our research. After this, if we indicate the feature space transformation with T, then by that we mean the execution of one of the algorithms PCA, SDA, KPCA, KSDA in practice. Samples transformed with T are indicated by (TD,T). After the feature space transformation, the D=[v1, . . . , vn] vectors take the shape TD=[Tv1, . . . , Tvn], where the components of each vector Tvi are ordered so that dimensions with smaller indexes represent a more relevant feature space component (in the case of spectral feature extraction processes, the dimensions of the new transformed feature space are ordered along the dominant eigenvectors of the respective algorithm-specific matrices). That is why we can also apply a dimension reduction step by keeping only the first few dimensions of the m-dimensional space received after the transformation. 
If we keep the first t elements of the Tvi vectors, we indicate the truncated vectors with T_t vi, and the whole database with (T_t D,T).
- The applied ranking algorithm follows the transformation of the feature space with a chosen feature space transformation process, and then we consecutively execute the (T_1 D,T), . . . , (T_m D,T) dimension reduction steps. We consistently apply the aforementioned KTM non-linear regression process to the learning tasks received that way. Finally, we select the best model on the basis of the Lift_p measure. The pseudo-code of the algorithm can be found in the
chart Algorithm 1. - The system is applicable for the previously discussed profit optimization, in which case we use Profit_p instead of Lift_p, and the value of Profit_p is a numeric value derived from income and cost input data. A further distinguishing feature of the present system is this profit optimization.
- Input:
- (D,T)—training data
- D=[v1, . . . , vn]—m-dimensional feature vectors
- T=[c1, . . . , cn]—corresponding target values
- p—Lift measure parameter
- Output:
- M—ranking model optimizing Lift_p
- Algorithm:
-
T = the feature space transformation provided by PCA, SDA, KPCA or KSDA, applied on the training data (D,T)
For i = 1 to m do
    M_i = learn the KTM model on training data (T_i D, T)
    // T_i represents an i-dimensional cut after transformation T
End
M = argmax_i Lift_p(M_i)
- The system's layered software architecture may be made up of a core engine, a scheduler and a GUI.
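Algorithm 1 above can be made runnable as the following sketch, with two loudly-labeled stand-ins: the feature space transformation is the identity, and the proprietary KTM learner is replaced by a toy scorer that averages the first i feature dimensions. Only the select-the-best-cut-by-Lift_p skeleton mirrors the algorithm itself.

```python
# Sketch of Algorithm 1 with stand-in components (identity transform,
# toy scorer instead of the proprietary KTM model).
def lift_p(scores, targets, p):
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    top = max(1, round(len(scores) * p / 100))
    found = sum(targets[k] for k in order[:top]) / sum(targets)
    return found / (p / 100)

def toy_rank_learn(D, T, p):
    m = len(D[0])
    best_i, best_lift = None, -1.0
    for i in range(1, m + 1):                  # i-dimensional cuts
        scores = [sum(v[:i]) / i for v in D]   # stand-in for the KTM model
        l = lift_p(scores, T, p)
        if l > best_lift:                      # keep the cut with highest Lift_p
            best_i, best_lift = i, l
    return best_i, best_lift

# First dimension is informative; the second is noise that inverts the order.
D = [[0.9, 0.0], [0.8, 0.0], [0.2, 0.9], [0.1, 0.9]]
T = [1, 1, 0, 0]
print(toy_rank_learn(D, T, p=50))
```

The loop correctly discards the noisy second dimension, illustrating why scanning the dimension cuts and keeping the best Lift_p value is worthwhile.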
- Core engines consist of all speech analytics and data mining modules that are needed for feature extraction, building an automatic model and making predictions. Core engines are the most essential part of the system due to the proprietary, self-researched algorithms in the field of machine learning and parallel processing. The system has three core engines, a feature extraction, a model builder and a prediction engine.
- Feature extraction module processes the sound files and extracts an equal number of features from all conversations. This feature matrix is later used for model builder and prediction engines. The features extracted may include:
-
- speaker separation
- human speech segmentation
- extraction of communication and intonation features
- construction of an equal number of features for each conversation
- working on narrow-bandwidth cell phone/landline quality
- Model builder engine builds a model automatically based on the feature matrix. The engine's output is a model that is used for predictions.
- Prediction engine uses the previously built business-tailored model to predict the business outcome of the business activity. Using the prediction, the system provides a business-optimized ranked set. The system's uniqueness lies in providing support for the customer's own business activity based on the customer's own historical data by predicting future outcomes.
- Scheduler is a system component to start, stop and supervise the core processes of the system. Scheduler features single and multi-threaded modes to maximize performance.
- The system's graphical user interface enables end users to build, run and refine models, administer projects and browse the ranking results. GUI is an intuitive, lightweight, web-browser-based software. In addition to GUI, the system is able to run in command line (scripting)/service mode in order to enable system-level integration with different systems.
- We conducted the tests at the customer service center of a company trading in holiday rights. Employees of the customer service had to call potential clients and invite them to a personal meeting. Clients who did not refuse the offer completely were recalled after a while. We consider a call successful if the client accepted the invitation, and unsuccessful if he or she did not. With the help of the ranking system, we endeavored to rank the clients after the first conversation on the basis of success probability, which would potentially result in a higher number of clients after the second call. The learning of the system was conducted on a database of more than 8000 labeled samples with 408 positive cases. During testing, the sample database was divided: 60% of the samples formed the training domain and 40% the testing domain. This way, we measured the ranking of a list containing 3200 conversations, out of which 167 were known to be positive outputs. We found the following results:
-
TABLE 1: Ranking Results

Percentage of calls | Unranked list: # of attendees | Ranked list: # of attendees | Lift measure (efficiency multiplicator)
---|---|---|---
1% | 1.67 | 11.67 | 7.01
2% | 3.33 | 15.57 | 4.67
5% | 8.33 | 54.5 | 6.54
10% | 16.67 | 70.83 | 4.25
25% | 41.67 | 82.92 | 1.99
50% | 83.33 | 142.5 | 1.71
75% | 125 | 155 | 1.24
90% | 150 | 159 | 1.06
100% | 166.67 | 166.67 | (—)

- More than 7 times more positive outputs reached the upper 1% of the ranked list than under unranked selection, and even at the 10% level there are more than 4 times more. In other words, calling 50% of the numbers formerly made it possible to reach 50% of the participants (83 clients); with the help of ranking, it is 85% (143 clients), which represents a 72% improvement. With 75% of the calls, 75% of the clients (125 clients) were formerly successfully invited; with ranking, it is 93% (155 clients) of all the participants, which represents a 24% improvement. All in all, we can assume that calls made in accordance with the ranked list result in considerable additional profit.
- The present disclosure includes that contained in the appended claims, as well as that of the foregoing description. Although this invention has been described in its preferred form with a certain degree of particularity, it is understood that the present disclosure of the preferred form has been made only by way of example and that numerous changes in the details of construction and the combination and arrangement of parts may be resorted to without departing from the spirit and scope of the invention.
Claims (17)
1. A method for predicting business outcomes based upon recorded telephone calls, the method comprising:
using a feature extraction engine for using a series of recorded telephone calls of individuals calling a business, processing the recorded telephone calls as sound files, extracting various audio features from each of the telephone calls, and building a feature matrix whereby a plurality of extracted audio features are tabulated, wherein each audio feature is a first dimension in the feature matrix and each telephone call is a second dimension in the feature matrix, wherein the first dimension being one of a row or a column and the second dimension being the other one of the row or the column, wherein extracting various audio features comprises extracting emotional information comprising changes in emotional states;
using a model builder engine for determining whether the telephone calls resulted in a pre-determined business outcome, annotating the feature matrix by indicating whether the pre-determined business outcome was achieved by adding to the first dimension of the feature matrix a business outcome, and establishing a model whereby the pre-determined business outcome is associated with a subset of the extracted audio features;
using a prediction engine for using additional telephone calls from individuals calling the business, processing the additional telephone calls as sound files, extracting various audio features from each of the sound files, predicting whether the pre-determined business outcome will occur based upon the established model, and ranking the additional telephone calls based upon the prediction; and
using a refining engine for determining whether each additional telephone call resulted in the actual occurrence of the pre-determined business outcome, comparing the actual occurrence with the prediction and modifying the established model on the basis of the comparison.
2. A method for predicting business outcomes comprising:
using a feature extraction engine for using a series of recorded telephone calls, processing the recorded telephone calls as sound files, and extracting various audio features from each of the telephone calls, wherein extracting various audio features comprises extracting emotional information comprising changes in emotional states;
using a model builder engine for determining whether the telephone calls resulted in a pre-determined business outcome, and establishing a model whereby the pre-determined business outcome is associated with a subset of the extracted audio features; and
using a prediction engine for using additional telephone calls, processing the additional telephone calls as sound files, and extracting various audio features from each of the sound files, and predicting whether the pre-determined business outcome will occur based upon the established model.
3. The method as described in claim 2 , further comprising ranking the additional telephone calls based upon the prediction.
4. The method as described in claim 3 wherein the rankings determine how the additional telephone calls are handled.
5. The method as described in claim 2 , further comprising using a refining engine for determining whether each additional telephone call resulted in the actual occurrence of the pre-determined business outcome, comparing the actual occurrence with the prediction and modifying the established model on the basis of the comparison.
6. The method as described in claim 2 , further comprising:
building a feature matrix whereby a plurality of extracted audio features are tabulated, wherein each audio feature is a first dimension in the feature matrix and each telephone call is a second dimension in the feature matrix, wherein the first dimension being one of a row or a column and the second dimension being the other one of the row or the column; and
annotating the feature matrix by indicating whether the pre-determined business outcome was achieved for a particular telephone call by adding to the first dimension of the feature matrix a business outcome.
7. The method as described in claim 2 wherein the audio features further comprise voice volume, tone, emotion, and inflection.
8. The method as described in claim 2 wherein the pre-determined business outcome is whether or not a sale was completed as a result of the telephone call.
9. A method for predicting business outcomes comprising:
using a feature extraction engine for processing a series of telephone calls, and extracting various audio features from each of the telephone calls, wherein extracting various audio features comprises extracting emotional information comprising changes in emotional states;
using a model builder engine for determining whether the telephone calls resulted in a favorable business outcome, and establishing a model whereby the favorable business outcome is associated with the extracted audio features; and
using a prediction engine for processing an additional telephone call, extracting various audio features from the additional telephone call, and predicting whether the favorable business outcome will occur based upon the established model.
10. The method as described in claim 1 wherein the extracted audio features are in the English language.
11. The method as described in claim 1 wherein the extracted audio feature is in a language other than English.
12. The method of claim 1 , wherein extracting various audio features further comprises extracting features based on emotional changes in speakers and consecutive agent-client segments.
13. The method of claim 2 , wherein extracting various audio features further comprises extracting features based on emotional changes in speakers and consecutive agent-client segments.
14. The method of claim 9 , wherein extracting various audio features further comprises extracting features based on emotional changes in speakers and consecutive agent-client segments.
15. The method of claim 1 , wherein extracting various audio features further comprises extracting features comprising speaker separation, human speech segmentation and communication and intonation features.
16. The method of claim 2 , wherein extracting various audio features further comprises extracting features comprising speaker separation, human speech segmentation and communication and intonation features.
17. The method of claim 9 , wherein extracting various audio features further comprises extracting features comprising speaker separation, human speech segmentation and communication and intonation features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/298,457 US20150066549A1 (en) | 2012-06-05 | 2014-06-06 | System, Method and Apparatus for Voice Analytics of Recorded Audio |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261655594P | 2012-06-05 | 2012-06-05 | |
US13/909,351 US8781880B2 (en) | 2012-06-05 | 2013-06-04 | System, method and apparatus for voice analytics of recorded audio |
US14/298,457 US20150066549A1 (en) | 2012-06-05 | 2014-06-06 | System, Method and Apparatus for Voice Analytics of Recorded Audio |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/909,351 Continuation US8781880B2 (en) | 2012-06-05 | 2013-06-04 | System, method and apparatus for voice analytics of recorded audio |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150066549A1 true US20150066549A1 (en) | 2015-03-05 |
Family
ID=49671390
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/909,351 Expired - Fee Related US8781880B2 (en) | 2012-06-05 | 2013-06-04 | System, method and apparatus for voice analytics of recorded audio |
US14/298,457 Abandoned US20150066549A1 (en) | 2012-06-05 | 2014-06-06 | System, Method and Apparatus for Voice Analytics of Recorded Audio |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/909,351 Expired - Fee Related US8781880B2 (en) | 2012-06-05 | 2013-06-04 | System, method and apparatus for voice analytics of recorded audio |
Country Status (2)
Country | Link |
---|---|
US (2) | US8781880B2 (en) |
WO (1) | WO2013184667A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8914285B2 (en) * | 2012-07-17 | 2014-12-16 | Nice-Systems Ltd | Predicting a sales success probability score from a distance vector between speech of a customer and speech of an organization representative |
PL3011554T3 (en) * | 2013-06-21 | 2019-12-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Pitch lag estimation |
US9641681B2 (en) | 2015-04-27 | 2017-05-02 | TalkIQ, Inc. | Methods and systems for determining conversation quality |
US10282456B2 (en) | 2015-10-01 | 2019-05-07 | Avaya Inc. | Managing contact center metrics |
US9792908B1 (en) * | 2016-10-28 | 2017-10-17 | International Business Machines Corporation | Analyzing speech delivery |
CN109272164B (en) * | 2018-09-29 | 2021-09-28 | 清华大学深圳研究生院 | Method, apparatus, device, and storage medium for dynamic prediction of learning behavior
CN110245301A (en) * | 2018-11-29 | 2019-09-17 | 腾讯科技(深圳)有限公司 | Recommendation method, apparatus, and storage medium
US11736610B2 (en) | 2021-09-28 | 2023-08-22 | Optum, Inc. | Audio-based machine learning frameworks utilizing similarity determination machine learning models |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020194002A1 (en) * | 1999-08-31 | 2002-12-19 | Accenture Llp | Detecting emotions using voice signal analysis |
US20040098274A1 (en) * | 2002-11-15 | 2004-05-20 | Dezonno Anthony J. | System and method for predicting customer contact outcomes |
US20040096050A1 (en) * | 2002-11-19 | 2004-05-20 | Das Sharmistha Sarkar | Accent-based matching of a communicant with a call-center agent |
US20060265089A1 (en) * | 2005-05-18 | 2006-11-23 | Kelly Conway | Method and software for analyzing voice data of a telephonic communication and generating a retention strategy therefrom |
US20110307257A1 (en) * | 2010-06-10 | 2011-12-15 | Nice Systems Ltd. | Methods and apparatus for real-time interaction analysis in call centers |
Family Cites Families (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9620082D0 (en) | 1996-09-26 | 1996-11-13 | Eyretel Ltd | Signal monitoring apparatus |
US6665644B1 (en) | 1999-08-10 | 2003-12-16 | International Business Machines Corporation | Conversational data mining |
US6600821B1 (en) | 1999-10-26 | 2003-07-29 | Rockwell Electronic Commerce Corp. | System and method for automatically detecting problematic calls |
US7346186B2 (en) | 2001-01-30 | 2008-03-18 | Nice Systems Ltd | Video and audio content analysis system |
US7191133B1 (en) | 2001-02-15 | 2007-03-13 | West Corporation | Script compliance using speech recognition |
US6493668B1 (en) | 2001-06-15 | 2002-12-10 | Yigal Brandman | Speech feature extraction system |
US7010115B2 (en) * | 2001-12-13 | 2006-03-07 | Rockwell Electronic Commerce Technologies, Llc | System and method for predictive contacts |
US8055503B2 (en) | 2002-10-18 | 2011-11-08 | Siemens Enterprise Communications, Inc. | Methods and apparatus for audio data analysis and data mining using speech recognition |
US7546173B2 (en) | 2003-08-18 | 2009-06-09 | Nice Systems, Ltd. | Apparatus and method for audio content analysis, marking and summing |
US7412377B2 (en) | 2003-12-19 | 2008-08-12 | International Business Machines Corporation | Voice model for speech processing based on ordered average ranks of spectral features |
US8094803B2 (en) | 2005-05-18 | 2012-01-10 | Mattersight Corporation | Method and system for analyzing separated voice data of a telephonic communication between a customer and a contact center by applying a psychological behavioral model thereto |
US7983910B2 (en) * | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
US20080015910A1 (en) | 2006-07-11 | 2008-01-17 | Claudia Reisz | Ranking-based method and system for evaluating customer predication models |
US8005676B2 (en) | 2006-09-29 | 2011-08-23 | Verint Americas, Inc. | Speech analysis using statistical learning |
US7752043B2 (en) | 2006-09-29 | 2010-07-06 | Verint Americas Inc. | Multi-pass speech analytics |
US7577246B2 (en) | 2006-12-20 | 2009-08-18 | Nice Systems Ltd. | Method and system for automatic quality evaluation |
US20080189171A1 (en) | 2007-02-01 | 2008-08-07 | Nice Systems Ltd. | Method and apparatus for call categorization |
US7599475B2 (en) | 2007-03-12 | 2009-10-06 | Nice Systems, Ltd. | Method and apparatus for generic analytics |
US8611523B2 (en) * | 2007-09-28 | 2013-12-17 | Mattersight Corporation | Methods and systems for determining segments of a telephonic communication between a customer and a contact center to classify each segment of the communication, assess negotiations, and automate setup time calculation |
US7788095B2 (en) * | 2007-11-18 | 2010-08-31 | Nice Systems, Ltd. | Method and apparatus for fast search in call-center monitoring |
US7953692B2 (en) | 2007-12-07 | 2011-05-31 | Microsoft Corporation | Predicting candidates using information sources |
US8126723B1 (en) | 2007-12-19 | 2012-02-28 | Convergys Cmg Utah, Inc. | System and method for improving tuning using caller provided satisfaction scores |
US8615419B2 (en) | 2008-05-07 | 2013-12-24 | Nice Systems Ltd | Method and apparatus for predicting customer churn |
US8145482B2 (en) | 2008-05-25 | 2012-03-27 | Ezra Daya | Enhancing analysis of test key phrases from acoustic sources with key phrase training models |
US8548812B2 (en) | 2008-12-22 | 2013-10-01 | Avaya Inc. | Method and system for detecting a relevant utterance in a voice session |
US8295471B2 (en) | 2009-01-16 | 2012-10-23 | The Resource Group International | Selective mapping of callers in a call-center routing system based on individual agent settings |
US20100191689A1 (en) * | 2009-01-27 | 2010-07-29 | Google Inc. | Video content analysis for automatic demographics recognition of users and videos |
US20100332287A1 (en) * | 2009-06-24 | 2010-12-30 | International Business Machines Corporation | System and method for real-time prediction of customer satisfaction |
US8494133B2 (en) * | 2009-06-24 | 2013-07-23 | Nexidia Inc. | Enterprise speech intelligence analysis |
US8411841B2 (en) | 2009-08-06 | 2013-04-02 | Nexidia Inc. | Real-time agent assistance |
US20120016674A1 (en) | 2010-07-16 | 2012-01-19 | International Business Machines Corporation | Modification of Speech Quality in Conversations Over Voice Channels |
US20130121580A1 (en) * | 2011-11-11 | 2013-05-16 | International Business Machines Corporation | Analysis of service delivery processes based on interrogation of work assisted devices |
2013
- 2013-06-04 WO PCT/US2013/044091 patent/WO2013184667A1/en active Application Filing
- 2013-06-04 US US13/909,351 patent/US8781880B2/en not_active Expired - Fee Related
2014
- 2014-06-06 US US14/298,457 patent/US20150066549A1/en not_active Abandoned
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016195897A1 (en) * | 2015-06-04 | 2016-12-08 | Intel Corporation | System for analytic model development |
US20160357886A1 (en) * | 2015-06-04 | 2016-12-08 | Intel Corporation | System for analytic model development |
CN108536811A (en) * | 2018-04-04 | 2018-09-14 | 上海智臻智能网络科技股份有限公司 | Machine-learning-based voice interaction path determination method and apparatus, storage medium, and terminal
Also Published As
Publication number | Publication date |
---|---|
US20130325560A1 (en) | 2013-12-05 |
WO2013184667A1 (en) | 2013-12-12 |
US8781880B2 (en) | 2014-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8781880B2 (en) | System, method and apparatus for voice analytics of recorded audio | |
US11004013B2 (en) | Training of chatbots from corpus of human-to-human chats | |
US11005995B2 (en) | System and method for performing agent behavioral analytics | |
US8768686B2 (en) | Machine translation with side information | |
US20190325324A1 (en) | Automated ontology development | |
US9477752B1 (en) | Ontology administration and application to enhance communication data analytics | |
US20180239822A1 (en) | Unsupervised automated topic detection, segmentation and labeling of conversations | |
US8676586B2 (en) | Method and apparatus for interaction or discourse analytics | |
US8914285B2 (en) | Predicting a sales success probability score from a distance vector between speech of a customer and speech of an organization representative | |
US10860566B1 (en) | Themes surfacing for communication data analysis | |
Li et al. | Maec: A multimodal aligned earnings conference call dataset for financial risk prediction | |
US11315569B1 (en) | Transcription and analysis of meeting recordings | |
US20120209606A1 (en) | Method and apparatus for information extraction from interactions | |
US20120209605A1 (en) | Method and apparatus for data exploration of interactions | |
US20080154579A1 (en) | Method of analyzing conversational transcripts | |
KR102100214B1 (en) | Method and apparatus for analysing sales conversation based on voice recognition | |
CN110475032A (en) | Multi-service interface switching method and apparatus, computer device, and storage medium | |
US20160299965A1 (en) | Prioritizing survey text responses | |
CN110633912A (en) | Method and system for monitoring service quality of service personnel | |
US20230267927A1 (en) | Unsupervised automated extraction of conversation structure from recorded conversations | |
Bockhorst et al. | Predicting self-reported customer satisfaction of interactions with a corporate call center | |
CN113505606B (en) | Training information acquisition method and device, electronic equipment and storage medium | |
Pallotta et al. | Interaction Mining: the new Frontier of Call Center Analytics. | |
KR20210009266A (en) | Method and apparatus for analysing sales conversation based on voice recognition | |
Okada et al. | A time series structure analysis method of a meeting using text data and a visualization method of state transitions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: RANK MINER, INC., FLORIDA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOCSOR, ANDRAS;KOVACS, KORNEL;BODOGH, ATTILA;AND OTHERS;REEL/FRAME:036021/0101
Effective date: 20130514
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION