CN103325371A

CN103325371A - Voice recognition system and method based on cloud

Info

Publication number: CN103325371A
Application number: CN2013102227104A
Authority: CN
Inventors: 熊伟; 刘伟; 谢良伟; 王飞浪; 陈鑫娜; 张俱扬; 熊鑫; 倪中恩; 栾新
Original assignee: Hangzhou Wangdou Digital Technology Co Ltd
Current assignee: Hangzhou Wangdou Digital Technology Co Ltd
Priority date: 2013-06-05
Filing date: 2013-06-05
Publication date: 2013-09-25

Abstract

The invention relates to a speech recognition system and method, in particular to a cloud-based speech recognition system and method. The speech recognition system comprises an intelligent voice input terminal, a load balancing server, a speech recognition server cluster and a database server cluster, wherein the intelligent voice input terminal is connected to the load balancing server, the load balancing server is connected to the speech recognition server cluster, and the speech recognition server cluster is connected to the database server cluster. Because cloud computing technology is introduced into the speech recognition application service, the structure and method can effectively address the low recognition rate of existing speech recognition systems and the slow response to concurrent access caused by limited storage and computing resources. They also shorten comparison time and extend the applicability of the speech recognition system and method.

Description

Speech recognition system and method based on cloud

Technical field

The present invention relates to a speech recognition system and method, and in particular to a cloud-based speech recognition system and method.

Background technology

Traditional human-computer interaction relies on complicated keyboards or buttons. With the development of science and technology, new modes of human-computer interaction have emerged and brought people a brand-new experience. Interaction based on speech recognition is one of the currently popular technologies. Speech recognition studies how to make a machine accurately recognize the content of human speech, that is, accurately identify what is being said. Speech is the most convenient and efficient means by which humans communicate with one another. Speech recognition processing involves several frontier research topics and is a broadly interdisciplinary field. It is one of the fastest-developing research areas in information science, and it is closely related to disciplines such as phonetics, linguistics, mathematical statistics and neurophysiology.

In the prior art, however, speech recognition technology can only be applied within a limited scope. Because of constraints such as hardware, data storage capacity and geographic location, running time increases considerably as the amount of data grows, and remote speech recognition is difficult to achieve. The applicability of speech recognition technology is therefore very narrow.

Summary of the invention

The object of the invention is to provide a cloud-based speech recognition system and method with short running time and wide applicability.

The technical solution adopted by the present invention to solve this technical problem is a cloud-based speech recognition system comprising an intelligent voice input terminal, a load balancing server, a speech recognition server cluster and a database server cluster. The intelligent voice input terminal is connected to the load balancing server, the load balancing server is connected to the speech recognition server cluster, and the speech recognition server cluster is connected to the database server cluster.

In a further arrangement of the invention, the speech recognition server cluster comprises at least two speech recognition servers, each able to provide the speech recognition function independently, and the speech recognition servers are connected to one another in a cloud network connection mode.

In a further technical solution adopted by the invention, the intelligent voice input terminal collects user voice data and transfers it to the load balancing server. The load balancing server dynamically assigns the recognition task to an idle speech recognition server according to the load of the speech recognition server cluster. The database server cluster is mainly used to store and manage the speech library templates of the hidden Markov model (HMM). The speech recognition server quickly compares the user voice data against the data in the database server cluster and returns the recognition result to the user in time.

By introducing cloud computing technology into the speech recognition application service, the above structure and method can effectively address the low recognition rate of existing speech recognition systems and the slow response to concurrent access caused by scarce storage and computing resources. This not only shortens comparison time but also extends the applicability of the invention.

Description of drawings

To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic block diagram of the invention.

Embodiment

Referring to Fig. 1, a cloud-based speech recognition system of the present invention comprises an intelligent voice input terminal, a load balancing server, a speech recognition server cluster and a database server cluster. The intelligent voice input terminal is connected to the load balancing server, the load balancing server is connected to the speech recognition server cluster, and the speech recognition server cluster is connected to the database server cluster. The speech recognition server cluster comprises at least two speech recognition servers, each able to provide the speech recognition function independently, and the speech recognition servers are connected to one another in a cloud network connection mode.

Based on the above system, the present invention adopts a cloud-based speech recognition method, as follows. The intelligent voice input terminal collects user voice data and transfers it to the load balancing server. The load balancing server dynamically assigns the recognition task to an idle speech recognition server according to the load of the speech recognition server cluster. The database server cluster is mainly used to store and manage the speech library templates of the hidden Markov model (HMM). The speech recognition server quickly compares the user voice data against the data in the database server cluster and returns the recognition result to the user in time.
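The dispatch step of the method can be illustrated with a minimal sketch. It assumes that "load" is measured as the number of tasks a server is currently handling and that the least-loaded server is treated as the idle one; the class names `RecognitionServer` and `LoadBalancer`, the `active_tasks` counter and the placeholder `recognize` method are all hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass


@dataclass
class RecognitionServer:
    """Hypothetical stand-in for one speech recognition server in the cluster."""
    name: str
    active_tasks: int = 0  # assumed load metric seen by the load balancer

    def recognize(self, voice_data: bytes) -> str:
        # Placeholder: a real server would compare the voice data against the
        # HMM templates held in the database server cluster.
        return f"{self.name} processed {len(voice_data)} bytes"


class LoadBalancer:
    """Assigns each recognition task to the least-loaded server."""

    def __init__(self, servers):
        # The cluster is described as having at least two servers.
        if len(servers) < 2:
            raise ValueError("the cluster needs at least two recognition servers")
        self.servers = list(servers)

    def pick_server(self) -> RecognitionServer:
        # min() returns the first server with the smallest load on ties.
        return min(self.servers, key=lambda s: s.active_tasks)

    def dispatch(self, voice_data: bytes) -> str:
        server = self.pick_server()
        server.active_tasks += 1
        try:
            return server.recognize(voice_data)
        finally:
            # Release the slot whether or not recognition succeeded.
            server.active_tasks -= 1
```

In this sketch a terminal would call `LoadBalancer.dispatch` with the collected voice data; a production balancer would obtain load figures from the servers themselves rather than from a local counter.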

Here, a hidden Markov model (HMM) is a kind of Markov chain whose states cannot be observed directly but can be inferred from a sequence of observation vectors. Each observation vector is generated by a state according to that state's probability density distribution, so the observation sequence is produced by a state sequence with the corresponding probability density distributions. A hidden Markov model is therefore a doubly stochastic process: a Markov chain with a certain number of states, together with a set of random functions associated with those states.
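The definition above can be written compactly in the standard textbook notation. The symbols below ($\lambda$, $A$, $B$, $\pi$, $q_t$, $o_t$) are conventional and are assumptions of this sketch; the original text does not define them.

```latex
% An HMM with N hidden states is specified by \lambda = (A, B, \pi):
\begin{align}
  a_{ij}  &= P(q_{t+1} = j \mid q_t = i)  && \text{state transition probabilities, } A = [a_{ij}] \\
  b_j(o)  &= p(o_t = o \mid q_t = j)      && \text{observation densities, } B = \{b_j\} \\
  \pi_i   &= P(q_1 = i)                   && \text{initial state distribution}
\end{align}
% Joint probability of an observation sequence O = (o_1, \dots, o_T)
% and a hidden state sequence Q = (q_1, \dots, q_T):
\begin{equation}
  P(O, Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
\end{equation}
```

The Markov chain over $q_t$ is the first random process and the emission of $o_t$ from each state is the second, which is what makes the model doubly stochastic.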

Recognition algorithms based on the hidden Markov model (HMM) have been widely studied and applied in recent years because of their high recognition accuracy and stability. The HMM approach is a speech recognition algorithm introduced into the field by Rabiner et al. in the 1980s. The algorithm builds statistical models of the recognition entries from a large amount of speech data, extracts features from the speech to be recognized, matches these features against the models, and obtains the recognition result by comparing the matching scores. Given a large amount of speech, a robust statistical model can be obtained that adapts to the variations found in real speech. The HMM algorithm therefore has good recognition performance and noise robustness. A recognition system based on HMM technology can be speaker-independent and does not require training by the user in advance. The shortcoming of HMM technology is that building the statistical models requires a large speech library. In practical applications, the storage required by the models and the computation required for matching (including the output probabilities of the feature vectors) are both large, and an intelligent terminal based on a thin client can hardly meet the HMM algorithm's demand for system resources. Speech recognition systems based on the HMM algorithm therefore mostly adopt a C/S (client-server) architecture. This architecture has two main drawbacks. First, recognition performance is seriously limited: for example, response is very slow when running large-scale speech recognition. Second, it cannot support multi-user applications with high concurrency and high capacity. Combining the recognition algorithm with the cloud addresses these problems and thereby achieves the object of the invention.
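The matching step described above (score the features against each model, then compare the scores) can be sketched with the forward algorithm. This is a simplified illustration, not the patent's implementation: it assumes discrete observation symbols instead of continuous feature vectors, a non-empty observation sequence, and no log-scaling, and the function names and the two toy models are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(start)
    # Initialisation with the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Induction over the remaining observations.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Pick the entry whose HMM gives the observation sequence the highest score."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Two toy left-to-right models over a two-symbol alphabet (made-up numbers).
models = {
    "yes": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "no":  ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}

best = recognize([0, 1, 1], models)  # selects "yes" for this sequence
```

Each call to `forward_likelihood` costs on the order of N²·T operations for N states and T observations, and it is repeated for every entry in the vocabulary, which is why the matching load grows quickly with the size of the speech library.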

The computational burden of hidden Markov model recognition is handled by load balancing technology. Dynamic load balancing uses tools to analyze data packets in real time, tracks the traffic in the network, and distributes tasks accordingly. Structurally, load balancing is divided into local load balancing and global server load balancing (GSLB). The former balances load across a local server cluster; the latter balances load across server clusters placed in different geographic locations and on different networks. In a server cluster, each server runs a separate copy of the required server program, such as a Web, FTP, Telnet or e-mail server program. For some services (such as those running on a Web server), a copy of the program runs on every host in the cluster, and network load balancing distributes the workload among these hosts. For other services (for example e-mail), only one host handles the workload; for these services, network load balancing directs the network traffic to that host and moves the traffic to another host when that host fails. On this basis, the problems that arise in hidden Markov model recognition computation can be solved well.
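The two routing modes described above can be sketched as follows. This is an illustrative assumption, not taken from the original: replicated services are routed round-robin over healthy hosts, while single-host services use priority-ordered failover. The `Host` class, its `healthy` flag and both function names are hypothetical.

```python
class Host:
    """Hypothetical host record; `healthy` would come from a real health check."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy


def route_replicated(hosts, counter):
    """Round-robin over healthy hosts, for services replicated on every host."""
    alive = [h for h in hosts if h.healthy]
    if not alive:
        raise RuntimeError("no healthy host available")
    return alive[counter % len(alive)]


def route_single_host(hosts):
    """Send all traffic to the first healthy host in priority order (failover)."""
    for host in hosts:
        if host.healthy:
            return host
    raise RuntimeError("no healthy host available")
```

With hosts listed in priority order, `route_single_host` keeps returning the primary host until its `healthy` flag is cleared, after which traffic moves to the next host in the list.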

In the present invention, the intelligent voice input terminal collects and preprocesses voice information; the load balancing server distributes and balances user access requests according to the load of the back-end speech recognition servers; and the back-end speech recognition module applies the hidden Markov model algorithm to the preprocessed voice data and returns the recognition result. By introducing cloud computing load balancing technology into the speech recognition application service, the invention can effectively address the low recognition rate of existing speech recognition systems and the slow response to concurrent access caused by scarce storage and computing resources.

Obviously, the above embodiment is merely an example given for clarity of explanation and does not limit the embodiments. Those skilled in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious changes or variations derived from this description remain within the scope of protection of the present invention.

Claims

1. A cloud-based speech recognition system, characterized in that it comprises an intelligent voice input terminal, a load balancing server, a speech recognition server cluster and a database server cluster, wherein the intelligent voice input terminal is connected to the load balancing server, the load balancing server is connected to the speech recognition server cluster, and the speech recognition server cluster is connected to the database server cluster.

2. The cloud-based speech recognition system according to claim 1, characterized in that the speech recognition server cluster comprises at least two speech recognition servers, each able to provide the speech recognition function independently, and the speech recognition servers are connected to one another in a cloud network connection mode.

3. A cloud-based speech recognition method, the method being as follows: the intelligent voice input terminal collects user voice data; the user voice data is transferred to the load balancing server; the load balancing server dynamically assigns the recognition task to an idle speech recognition server according to the load of the speech recognition server cluster; the database server cluster is mainly used to store and manage the speech library templates of the hidden Markov model (HMM); and the speech recognition server quickly compares the user voice data against the data in the database server cluster and returns the recognition result to the user in time.